BUAD 2070 final
A monthly time series has a seasonal pattern. If a linear regression model is used, how many dummy variables must be used to represent this seasonality?
11
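Why 11 and not 12: a categorical variable with k categories needs k - 1 dummies. The omitted month becomes the baseline that the intercept absorbs. Adding a 12th dummy would make the dummies sum to 1 in every row, which is perfectly collinear with the intercept. The sketch below is a minimal illustration that assumes December is the baseline month; the function name is hypothetical.

```python
def month_dummies(month: int) -> list[int]:
    """Encode month 1..12 as 11 indicators [Month1, ..., Month11].

    Hypothetical helper: December (12) is the baseline, so it is all zeros.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return [1 if month == m else 0 for m in range(1, 12)]


print(month_dummies(3))   # March: a single 1 in the third position
print(month_dummies(12))  # December: eleven zeros (baseline)
```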
Within a given range of cells, the number of times a particular condition is satisfied is computed by using the _____ function
COUNTIF
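For intuition, the sketch below shows roughly what a formula such as =COUNTIF(range, ">50") computes. It is a simplification: the cell range is a Python list, and the criterion is a predicate function rather than Excel's criteria string. The sales figures and the `countif` name are hypothetical.

```python
def countif(cells, condition):
    """Count how many cells satisfy the condition (given as a predicate)."""
    return sum(1 for value in cells if condition(value))


sales = [42, 75, 51, 50, 90]  # hypothetical cell values
print(countif(sales, lambda v: v > 50))  # 3 (75, 51 and 90)
```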
______ is the vector of the averages computed for each variable across all cluster observations.
Centroid
_____ measures dissimilarity between two clusters by using the distance between the two cluster centroids.
Centroid linkage
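The two cards above fit together: compute each cluster's centroid, then measure centroid linkage as the distance between the two centroids. The minimal sketch below assumes Euclidean distance and uses made-up two-variable data.

```python
import math


def centroid(cluster):
    """Average of each variable across all observations in the cluster."""
    n = len(cluster)
    return [sum(obs[j] for obs in cluster) / n for j in range(len(cluster[0]))]


def centroid_linkage(a, b):
    """Dissimilarity of clusters a and b = Euclidean distance between centroids."""
    return math.dist(centroid(a), centroid(b))


cluster_a = [(1.0, 2.0), (3.0, 4.0)]  # centroid (2, 3)
cluster_b = [(6.0, 6.0), (8.0, 8.0)]  # centroid (7, 7)
print(centroid(cluster_a))                     # [2.0, 3.0]
print(centroid_linkage(cluster_a, cluster_b))  # sqrt(41), about 6.403
```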
_____ is the process of estimating the value of a categorical outcome variable
Classification
___ is a category of data-mining techniques that detect patterns and relationships in the data.
Descriptive data mining
Which of the following is true of bottom-up hierarchical clustering?
It starts with each observation in its own cluster and then iteratively combines the two most similar clusters.
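The bottom-up procedure fits in a few lines. The sketch below starts with singleton clusters and repeatedly merges the closest pair, recording each merge; that recorded sequence is what a dendrogram draws. Single linkage (the minimum pairwise distance) is an assumption here, since the card does not specify a linkage, and `agglomerate` is a hypothetical name.

```python
import math


def single_linkage(a, b):
    """Smallest Euclidean distance between any observation in a and any in b."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Bottom-up clustering: merge the two most similar clusters until one remains."""
    clusters = [[p] for p in points]  # each observation starts in its own cluster
    merges = []  # (cluster_i, cluster_j, distance) for every merge, in order
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


for left, right, dist in agglomerate([(0, 0), (0, 1), (5, 5), (5, 6)]):
    print(left, "+", right, "at distance", round(dist, 3))
```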
_____ is a generalization of linear regression for predicting an outcome of a binary variable
Logistic regression
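Linear regression predicts the outcome directly. Logistic regression instead models the log odds as a linear function of the inputs and maps the result into a probability between 0 and 1. The minimal sketch below shows only that mapping. The coefficients are hypothetical; in practice they are estimated from training data.

```python
import math


def logistic_probability(intercept, coefs, x):
    """P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bq*xq)))."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model with one input: b0 = -3.0, b1 = 0.5
print(logistic_probability(-3.0, [0.5], [6.0]))   # z = 0, so probability 0.5
print(logistic_probability(-3.0, [0.5], [10.0]))  # larger z, probability closer to 1
```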
The _____ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results
SUMPRODUCT
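A common use of SUMPRODUCT is total revenue: units times price, summed. The sketch below mimics that pairing-and-summing in Python with made-up data. Raising an error on arrays of unequal length is a choice made here; Excel itself returns an error value in that case.

```python
def sumproduct(first, second):
    """Pair elements positionally, multiply each pair, and add the products."""
    if len(first) != len(second):
        raise ValueError("arrays must have the same dimensions")
    return sum(a * b for a, b in zip(first, second))


units = [10, 20, 5]         # hypothetical quantities
prices = [2.0, 1.5, 4.0]    # hypothetical unit prices
print(sumproduct(units, prices))  # 20 + 30 + 20 = 70.0
```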
Which of the following reasons is responsible for the increase in the use of data-mining techniques in business?
The ability to electronically warehouse data
Which of the following approaches is a good way to proceed with building an influence diagram for a problem?
The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled.
Which of the following statements describes the objective of the moving averages and exponential smoothing methods?
To smooth out random fluctuations in the time series
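The smoothing effect of a moving average comes from using the mean of the k most recent values as the forecast for the next period. The sketch below shows this with hypothetical demand data and k = 3.

```python
def moving_average_forecast(series, k):
    """Forecast for the next period = mean of the k most recent observations."""
    if k < 1 or k > len(series):
        raise ValueError("k must be between 1 and len(series)")
    return sum(series[-k:]) / k


demand = [17, 21, 19, 23, 18, 16]  # hypothetical time series
print(moving_average_forecast(demand, 3))  # (23 + 18 + 16) / 3 = 19.0
```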
A graphic presentation of the expected gain from the various options open to the decision maker is called
a decision tree
A one-way data table summarizes
a single input's impact on the output of interest
A group of observations measured at successive time intervals is known as
a time series
The time series pattern that reflects gradual increase or decrease in values over a long time period is called
a trend
Spreadsheet models are referred to as what-if models because they
allow easy, instantaneous recalculation when model inputs change
A forecasting method that uses a weighted average of all past values is known as
exponential smoothing
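The recursive update is F(t+1) = alpha * Y(t) + (1 - alpha) * F(t). Unrolling it shows that every past value gets a weight, with weights alpha, alpha(1 - alpha), alpha(1 - alpha)^2, and so on, moving back in time. That is why the forecast is a weighted average of all past values. The sketch below assumes the common convention of setting the first forecast F(2) equal to Y(1); the data and alpha are hypothetical.

```python
def exponential_smoothing_forecast(series, alpha):
    """Return the forecast for the period after the last observation.

    F(t+1) = alpha * Y(t) + (1 - alpha) * F(t), starting from F(2) = Y(1).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = series[0]  # assumed starting convention: F(2) = Y(1)
    for y in series[1:]:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast


print(exponential_smoothing_forecast([17, 21, 19, 23], alpha=0.2))  # about 19.032
```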
Nodes of a decision tree indicating points where the outcomes do not depend on a decision maker are known as
chance nodes
The data-mining method that can be used in market segmentation to divide consumers into different homogeneous groups is _____.
cluster analysis
The modeling process begins with the framing of the _____ that shows the relationships between the various parts of the problem being modeled
conceptual model
The effectiveness of a classification method can be judged by computing the misclassification errors and summarizing them in a
confusion matrix
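For a binary classifier, the confusion matrix counts each combination of actual and predicted class. The overall misclassification error rate is the off-diagonal count divided by the number of observations. The sketch below assumes class 1 is the "positive" class and uses made-up labels.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for a 0/1 classifier; class 1 = positive."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)], "FN": counts[(1, 0)],
        "FP": counts[(0, 1)], "TN": counts[(0, 0)],
    }


actual = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical true classes
predicted = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical classifier output
cm = confusion_matrix(actual, predicted)
error_rate = (cm["FN"] + cm["FP"]) / len(actual)
print(cm)          # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
print(error_rate)  # 0.25
```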
The time series pattern which reflects a multi-year pattern of being above and below the trend line is
cyclical
The pattern of a time series in business forecasting that is most difficult to predict is
cyclical pattern
Nodes of a decision tree indicating points where a decision maker chooses a decision alternative are known as
decision nodes
A(n) _____ refers to a model input that the decision maker can control in a what-if model.
decision variable
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____.
dendrogram
Jaccard's coefficient is different from the matching coefficient in that the former
does not count matching zero entries while the latter does
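The difference is easy to see on two binary (0/1) observations that share many zeros. The matching coefficient counts 0-0 agreements as similarity. Jaccard's coefficient drops them from both the numerator and the denominator. The data below is hypothetical.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


def jaccard_coefficient(u, v):
    """1-1 matches divided by the variables that are not 0-0 matches."""
    both_one = sum(a == 1 and b == 1 for a, b in zip(u, v))
    not_both_zero = sum(not (a == 0 and b == 0) for a, b in zip(u, v))
    if not_both_zero == 0:
        raise ValueError("Jaccard's coefficient is undefined when both are all zeros")
    return both_one / not_both_zero


u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 4 of 5 agree -> 0.8
print(jaccard_coefficient(u, v))   # 1 of 2 non-0-0 positions -> 0.5
```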
The test set is the data set used to
estimate the accuracy of the final model on unseen data
A(n) _____ is a visual representation that shows which entities influence others in a model.
influence diagram
In _____ decision making, companies decide whether to manufacture a product themselves or outsource its production to another firm.
make-versus-buy
An analysis of items frequently co-occurring in transactions (such as purchases) is known as _____.
market basket analysis
The simplest measure of similarity between observations consisting solely of categorical variables is given by _____.
matching coefficient
The endpoint of a k-means clustering algorithm occurs when
no further changes are observed in cluster structure and number
K-means clustering is the process of
organizing observations into one of k groups based on a measure of similarity
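The cards above describe the k-means loop: assign each observation to its nearest centroid, recompute the centroids, and stop once no observation changes cluster. The minimal sketch below assumes Euclidean distance and hand-picked starting centroids; real implementations usually choose the starting points randomly. The data and the `kmeans` name are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means; stops when the cluster assignments no longer change."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign each observation to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no observation changed cluster: the endpoint is reached
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


pts = [(1, 1), (1.5, 2), (8, 8), (9, 8.5)]
labels, centers = kmeans(pts, initial_centroids=[(1, 1), (9, 8.5)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(1.25, 1.5), (8.5, 8.25)]
```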
With reference to a what-if model, an uncontrollable model input is known as a(n) _____.
parameter
A tabular representation of gains or losses for a decision problem is called a
payoff table
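A payoff table is often evaluated with the expected value approach: weight each payoff by the probability of its state of nature, sum across each row, and pick the best row. This is the same "expected gain" a decision tree shows graphically. The decisions, payoffs, and probabilities below are all hypothetical.

```python
# Hypothetical payoffs ($1,000s) for each decision alternative under each state of nature
payoffs = {
    "small plant": {"strong demand": 8, "weak demand": 7},
    "large plant": {"strong demand": 14, "weak demand": 5},
}
probabilities = {"strong demand": 0.6, "weak demand": 0.4}  # assumed probabilities


def expected_value(row, probs):
    """Probability-weighted payoff of one decision alternative."""
    return sum(probs[state] * payoff for state, payoff in row.items())


ev = {decision: expected_value(row, probabilities) for decision, row in payoffs.items()}
best = max(ev, key=ev.get)
print(ev)    # small plant about 7.6, large plant about 10.4
print(best)  # large plant
```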
In classification, which of the following would be considered a categorical variable for a credit approval decision for a requester?
reject or accept credit approval
If data for a time series analysis is collected on an annual basis only, which pattern may be ignored?
seasonal
The time series pattern that reflects variability during a single year is called
seasonal
Observation refers to the
set of recorded values of variables associated with a single entity
The uncontrollable future events that affect the outcome of a decision are known as
states of nature
A _____ refers to the number of times that a collection of items occurs together in a transaction data set.
support count
Average group linkage measures dissimilarity between two clusters by considering
the average distance over all pairs of observations between these clusters
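Average group linkage takes every pair formed by one observation from each cluster and averages those pairwise distances. The sketch below assumes Euclidean distance and uses made-up points.

```python
import math
from itertools import product


def average_linkage(a, b):
    """Mean Euclidean distance over all (observation in a, observation in b) pairs."""
    pairs = list(product(a, b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0)]
print(average_linkage(a, b))  # (3 + sqrt(13)) / 2, about 3.303
```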
In the k-nearest neighbor method, when the value of k is set to 1
the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
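The sketch below contrasts k = 1 with a larger k on the same hypothetical training data. With k = 1 only the single nearest observation decides the class. With k = 5, the majority vote of all five neighbors overturns it. Euclidean distance is assumed. Vote ties go to the label seen first among the sorted neighbors, which is a choice made here, not a standard rule.

```python
import math
from collections import Counter


def knn_classify(training, new_point, k=1):
    """training: list of (features, label). Majority vote among the k nearest."""
    neighbors = sorted(training, key=lambda row: math.dist(row[0], new_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


training = [  # hypothetical labeled observations
    ((1, 1), "accept"), ((2, 2), "accept"),
    ((6, 6), "reject"), ((7, 5), "reject"), ((6.5, 7), "reject"),
]
print(knn_classify(training, (3, 3), k=1))  # accept: nearest neighbor is (2, 2)
print(knn_classify(training, (3, 3), k=5))  # reject: 3 of the 5 neighbors
```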
The arguments supplied to the IF function are
the condition for execution, the result if condition is true, and the result if condition is false
In the theory of association rules in data mining, by confidence we mean an estimated probability that
the consequent occurs given that the antecedent occurs
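Support count and confidence work together. Confidence of the rule "antecedent implies consequent" is the support count of both item sets together divided by the support count of the antecedent alone. The transaction data below is hypothetical.

```python
transactions = [  # hypothetical market baskets
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from the transaction data."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


print(support_count({"diapers", "beer"}, transactions))  # 3
print(confidence({"diapers"}, {"beer"}, transactions))   # 3 / 4 = 0.75
```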
If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations?
the hypotenuse
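In two dimensions, the differences along each variable are the legs of the right triangle, and the Euclidean distance is the hypotenuse. The same square-root-of-summed-squares formula extends to any number of variables. The points below are illustrative.

```python
import math


def euclidean_distance(u, v):
    """Square root of the summed squared differences across variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


print(euclidean_distance((1, 2), (4, 6)))  # legs 3 and 4, hypotenuse 5.0
```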
One measure of the accuracy of a forecasting method is
the mean square error
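The mean squared error averages the squared forecast errors (actual minus forecast). The sketch below uses hypothetical data. The forecasts come from the naive method, where each period's forecast is the previous period's actual value. That method is used here only to produce numbers to score.

```python
def mean_squared_error(actual, forecast):
    """Average of the squared forecast errors (actual - forecast)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(e * e for e in errors) / len(errors)


series = [17, 21, 19, 23, 18]  # hypothetical time series
actual = series[1:]            # periods 2..5
forecast = series[:-1]         # naive forecasts: previous period's value
print(mean_squared_error(actual, forecast))  # (16 + 4 + 16 + 25) / 4 = 15.25
```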
An observation is classified as Class 1 if
the predicted probability of this observation to be in Class 1 is greater than or equal to the cutoff value
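The sketch below applies the rule to some hypothetical predicted probabilities. It shows that a probability exactly equal to the cutoff goes to Class 1, and that raising the cutoff shifts observations into Class 0. The default cutoff of 0.5 is an assumption; in practice the cutoff is a tunable choice.

```python
def classify(probability, cutoff=0.5):
    """Class 1 if the predicted probability is at least the cutoff, else Class 0."""
    return 1 if probability >= cutoff else 0


probs = [0.82, 0.50, 0.31, 0.67]  # hypothetical predicted probabilities
print([classify(p) for p in probs])              # [1, 1, 0, 1]
print([classify(p, cutoff=0.7) for p in probs])  # [1, 0, 0, 0]
```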
The impact of two inputs on the output of interest can be examined by a _____.
two-way data table
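A two-way data table recalculates the model for every combination of two inputs. A one-way data table is the same idea with a single input varied. The sketch below imitates that recalculation with a hypothetical profit model: price and units sold are the two inputs, and unit cost and fixed cost are treated as fixed parameters.

```python
def profit(price, units_sold, unit_cost=6.0, fixed_cost=1000.0):
    """Hypothetical what-if model; the output of interest is profit."""
    return (price - unit_cost) * units_sold - fixed_cost


prices = [8.0, 10.0, 12.0]  # row input values
volumes = [300, 500]        # column input values

# Recalculate the model for every (price, volume) pair, as a data table does.
table = {(p, q): profit(p, q) for p in prices for q in volumes}
print("price", volumes)
for p in prices:
    print(p, [table[(p, q)] for q in volumes])
```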