ITM 466
The lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is
0.75
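The arithmetic behind this answer can be sketched in a few lines. This is a minimal illustration, assuming lift is computed as confidence divided by the benchmark confidence (the share of all transactions that contain the consequent); the function name `lift_ratio` is hypothetical.

```python
def lift_ratio(confidence, consequent_count, total_transactions):
    """Lift = confidence / benchmark confidence.

    Benchmark confidence is the fraction of all transactions
    that contain the consequent.
    """
    benchmark_confidence = consequent_count / total_transactions
    return confidence / benchmark_confidence


# Confidence 0.45, consequent present in 6 of 10 transactions.
print(round(lift_ratio(0.45, 6, 10), 2))  # 0.75
```

A lift below 1, as here, suggests the antecedent makes the consequent less likely than it is in the data set overall.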
The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?
1
Which of the following states the objective of time series analysis?
To uncover a pattern in a time series and then extrapolate the pattern into the future
__________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.
Unsupervised learning
Hierarchical clustering using __________ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level
Ward's method
__________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation
Ward's method
In which of the following scenarios would it be appropriate to use hierarchical clustering?
When binary or ordinal data needs to be clustered
The moving averages and exponential smoothing methods are appropriate for a time series exhibiting
a horizontal pattern.
Average linkage is a measure of calculating dissimilarity between two clusters by
computing the average distance between every pair of observations between two clusters
Single linkage is a measure of calculating dissimilarity between clusters by
considering only the two most similar observations in the two clusters
In preparing categorical variables for analysis, it is usually best to
convert the categories to binary, dummy variables
A collection of text documents to be analyzed is called a ___________.
corpus
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a
dendrogram.
The mean absolute error, mean squared error, and mean absolute percentage error are all methods to measure the accuracy of a forecast. These methods measure forecast accuracy by
determining how well a particular forecasting method is able to reproduce the time series data that are already available
Jaccard's coefficient is different from the matching coefficient in that the former
does not count matching zero entries while the latter does.
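The difference between the two coefficients is easiest to see on a small example. The sketch below assumes each observation is a list of 0/1 dummy variables of equal length; the function names and the sample vectors are made up for illustration.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Like the matching coefficient, but 0-0 matches are ignored."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    at_least_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    # If neither observation has any 1s, treat the similarity as 0.
    return both_one / at_least_one if at_least_one else 0.0


u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```

The three shared zeros raise the matching coefficient to 0.8, while Jaccard's coefficient looks only at the two positions where at least one observation has a 1.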
A time series with a seasonal pattern can be modeled by treating the season as a
dummy variable.
A cluster's __________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.
durability
The __________ the lift ratio, the __________ the association rule.
higher; stronger
Forecast error
is associated with measuring forecast accuracy
The value of an independent variable from the prior period is referred to as a
lagged variable
The strength of the association rule is known as __________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence
lift
An analysis of items frequently co-occurring in transactions is known as
market basket analysis.
When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the
matching coefficient
Complete linkage can be used to measure the distance between clusters that are the __________ in cluster analysis
most different
Single linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.
most similar
The endpoint of a k-means clustering algorithm occurs when
no further changes are observed in cluster structure and number.
The process of __________ might be used to determine the value of the smoothing constant that minimizes the mean squared error
nonlinear optimization
In k-means clustering, k represents the
number of clusters.
In the moving averages method, the order k determines the
number of time series values under consideration
Euclidean distance can be used to measure the distance between __________ in cluster analysis
observations
Autoregressive models
occur whenever all the independent variables are previous values of the time series.
k-means clustering is the process of
organizing observations into distinct groups based on a measure of similarity.
Causal models
relate a time series to other variables that are believed to explain or cause its behavior.
For causal modeling, __________ are used to detect linear or nonlinear relationships between the independent and dependent variables
scatter charts
A time series that shows a recurring pattern over one year or less is said to follow a
seasonal pattern
Observation refers to the
set of recorded values of variables associated with a single entity.
With reference to time series data patterns, a cyclical pattern is the component of the time series that
shows a periodic pattern lasting more than one year
With reference to exponential forecasting models, a parameter that provides the weight given to the most recent time series value in the calculation of the forecast value is known as the
smoothing constant.
Using a large value for order k in the moving averages method is effective in
smoothing out random fluctuations
A method for modifying variables that reduces bias prior to cluster analysis is
standardization.
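One common way to standardize is the z-score, sketched below. The card does not say which form of standardization is meant, so treating it as "subtract the mean, divide by the sample standard deviation" is an assumption, as are the function name and the sample values.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation.

    Assumption: z-score scaling; other scalings (e.g. min-max) also exist.
    """
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]


ages = [25, 53, 40, 31]            # hypothetical values, in years
spend = [350, 420, 1200, 90]       # hypothetical values, in dollars
print(standardize(ages))
print(standardize(spend))
```

After scaling, both variables have mean 0 and standard deviation 1, so a variable measured in large units (dollars) no longer dominates a distance calculation over one measured in small units (years).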
The process of converting a word to its stem, or root word, is referred to as
stemming
A __________ refers to the number of times a collection of items occurs together in a transaction data set.
support count
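A support count can be computed by checking, for each transaction, whether it contains every item in the item set. The transactions below are invented for illustration, and `support_count` is a hypothetical helper name.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


# Hypothetical transaction data.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"jelly"},
    {"bread"},
]
print(support_count(transactions, {"bread", "milk"}))  # 2
print(support_count(transactions, {"bread"}))          # 3
```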
The process of extracting useful information from text data is known as __________.
text mining
Trend refers to
the long-run shift or movement in the time series observable over several periods of time
An exponential trend pattern occurs when
the percentage change between periods in the value of the variable is relatively constant.
If a time series plot exhibits a horizontal pattern, then
there is still not enough evidence to conclude that the time series is stationary
A set of observations on a variable measured at successive points in time or over successive periods of time constitutes a
time series
The process of dividing text into separate terms is referred to as __________.
tokenization
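As a rough illustration of tokenization, the sketch below lowercases the text and splits it into runs of letters and digits. Real text-mining tools use more careful rules; this regex-based approach and the function name `tokenize` are assumptions made for the example.

```python
import re


def tokenize(text):
    """Split text into lowercase terms made of letters and digits.

    Assumption: punctuation and whitespace act as separators.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


print(tokenize("Text mining extracts useful information from text data."))
# ['text', 'mining', 'extracts', 'useful', 'information', 'from', 'text', 'data']
```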
In the text mining process, the text is first preprocessed by deriving a smaller set of _________ from the larger set of words contained in a collection of documents
tokens
A positive forecast error indicates that the forecasting method ________ the dependent variable.
underestimated
The goal of __________ is to use the variable values to identify relationships between observations.
unsupervised learning
The moving averages method refers to a forecasting method that
uses the average of the most recent data values in the time series as the forecast for the next period.
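A minimal sketch of the moving averages forecast follows. The series values are illustrative and the function name is hypothetical; the order k is the number of most recent values averaged.

```python
def moving_average_forecast(series, k):
    """Forecast the next period as the mean of the last k observed values."""
    if k < 1 or k > len(series):
        raise ValueError("k must be between 1 and the length of the series")
    return sum(series[-k:]) / k


demand = [12, 15, 14, 18]  # illustrative values
print(moving_average_forecast(demand, 3))  # mean of 15, 14, 18
print(moving_average_forecast(demand, 1))  # 18.0, the most recent value only
```

A larger k averages over more periods, which smooths random fluctuations more but reacts more slowly to a real shift in level.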
Complete linkage can be used to measure the distance between _________ in cluster analysis
clusters
__________ uses a weighted average of past time series values as the forecast.
Exponential smoothing
__________ is the amount by which the predicted value differs from the observed value of the time series variable.
Forecast error
Which of the following statements is the objective of the moving averages and exponential smoothing methods?
To smooth out random fluctuations in the time series
Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?
100
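This answer comes from rearranging the lift formula. Assuming lift = confidence / (consequent count / total transactions), the consequent count can be recovered as sketched below; the function name is hypothetical.

```python
def consequent_support_count(confidence, lift, total_transactions):
    """Solve lift = confidence / (count / total) for count."""
    benchmark_confidence = confidence / lift  # share of transactions with the consequent
    return benchmark_confidence * total_transactions


# Confidence 0.75, lift 1.875, 250 transactions.
print(round(consequent_support_count(0.75, 1.875, 250)))  # 100
```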
Suppose the dissimilarity between clusters A and C has the value 24 and the dissimilarity between clusters B and C has the value 12. Use McQuitty's method to determine the dissimilarity between the merged cluster AB and cluster C.
18
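McQuitty's method takes the dissimilarity between a merged cluster AB and a third cluster C as the simple average of the A-to-C and B-to-C dissimilarities. The sketch below reads the values 24 and 12 as those two dissimilarities, which is the reading that produces 18; the function name is hypothetical.

```python
def mcquitty_dissimilarity(d_a_c, d_b_c):
    """Dissimilarity between merged cluster AB and cluster C.

    McQuitty's method: the unweighted average of d(A, C) and d(B, C).
    """
    return (d_a_c + d_b_c) / 2


print(mcquitty_dissimilarity(24, 12))  # 18.0
```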
Demand for a product and the forecasting department's forecast (naïve model) are shown below. Compute the mean absolute error.
Period   Actual Demand   Forecasted Demand
1        12              -
2        15              12
3        14              15
4        18              16
2
If the forecasted value of the time series variable for period 2 is 22.5 and the actual value observed for period 2 is 25, what is the forecast error in period 2?
2.5
Suppose for a particular week, the forecasted sales were $4,000. The actual sales were $3,000. What is the value of the mean absolute percentage error?
33.3%
Demand for a product and the forecasting department's forecast (naïve model) are shown below. Compute the mean squared error.
Period   Actual Demand   Forecasted Demand
1        12              -
2        15              12
3        14              15
4        18              16
4.67
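The accuracy measures on these cards can be checked with a short sketch. It assumes forecast error is actual minus forecast (consistent with a positive error meaning the forecast was too low) and that periods without a forecast are left out; the function names are hypothetical.

```python
def forecast_errors(actual, forecast):
    """Error per period, defined as actual minus forecast."""
    return [a - f for a, f in zip(actual, forecast)]


def mean_absolute_error(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)


def mean_squared_error(actual, forecast):
    errors = forecast_errors(actual, forecast)
    return sum(e ** 2 for e in errors) / len(errors)


def mean_absolute_percentage_error(actual, forecast):
    """Average of |error| / actual, expressed as a percentage."""
    errors = forecast_errors(actual, forecast)
    return 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / len(errors)


# Periods 2-4 of the demand table (period 1 has no forecast).
actual = [15, 14, 18]
forecast = [12, 15, 16]
print(mean_absolute_error(actual, forecast))           # 2.0
print(round(mean_squared_error(actual, forecast), 2))  # 4.67

# Single week: forecast $4,000, actual $3,000.
print(round(mean_absolute_percentage_error([3000], [4000]), 1))  # 33.3
```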
Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.
75.39
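The value 75.39 follows from the coordinates u = (25, 350) and v = (53, 420). A minimal sketch of the calculation, with a hypothetical function name:

```python
import math


def euclidean_distance(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = (25, 350)  # age, dollars spent
v = (53, 420)
print(round(euclidean_distance(u, v), 2))  # 75.39
```

The spending difference (70) contributes far more than the age difference (28), which is why variables are often standardized before clustering.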
A forecast is defined as a(n)
prediction of future values of a time series.
Which is not true regarding trend patterns?
Can result when business conditions shift to a new level at some point in time
__________ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters
Centroid linkage
__________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.
Complete linkage
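The linkage measures on these cards differ only in how they summarize distances between two clusters. The sketch below compares four of them on two small, made-up clusters of 2-D points; the function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import product


def pairwise_distances(c1, c2):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(c1, c2)]


def single_linkage(c1, c2):
    return min(pairwise_distances(c1, c2))   # two most similar observations


def complete_linkage(c1, c2):
    return max(pairwise_distances(c1, c2))   # two most dissimilar observations


def average_linkage(c1, c2):
    d = pairwise_distances(c1, c2)
    return sum(d) / len(d)                   # mean over all cross-cluster pairs


def centroid_linkage(c1, c2):
    def centroid(c):
        return tuple(sum(x) / len(c) for x in zip(*c))
    return math.dist(centroid(c1), centroid(c2))


# Hypothetical clusters.
a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 2)]
print(single_linkage(a, b))    # 3.0
print(complete_linkage(a, b))  # sqrt(13)
print(average_linkage(a, b))
print(centroid_linkage(a, b))  # 3.0
```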
Which of the following is true of the exponential smoothing coefficient?
It is chosen as the value that minimizes a selected measure of forecast accuracy such as the mean squared error.
Which of the following is true of Euclidean distances?
It is commonly used as a method of measuring dissimilarity between quantitative observations.
Which statement is true of an association rule?
It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
__________ is a measure that computes the dissimilarity between a cluster AB and a cluster C by averaging the distance between A and C and the distance between B and C.
McQuitty's method
Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another?
Mean forecast error
Which of the following is not present in a time series?
Operational variations
If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?
The hypotenuse
Which of the following is not true of a stationary time series?
The time series plot is a straight line.
The exponential smoothing forecast for period t + 1 is a weighted average of the
actual value in period t with weight α and the forecast for period t with weight 1 - α.
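The weighted-average update can be sketched as follows. The series values are illustrative, the function name is hypothetical, and starting with the period 2 forecast equal to the period 1 actual value is a common convention assumed here rather than something stated on the card.

```python
def exponential_smoothing(series, alpha):
    """Return forecasts for periods 2 .. len(series) + 1.

    Update: F[t+1] = alpha * Y[t] + (1 - alpha) * F[t].
    Assumption: the forecast for period 2 is the actual value of period 1.
    """
    forecasts = [series[0]]
    for y in series[1:]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


demand = [12, 15, 14, 18]  # illustrative values
for period, f in enumerate(exponential_smoothing(demand, 0.2), start=2):
    print(period, round(f, 3))
```

A smoothing constant near 1 makes the forecast track the most recent value closely; a value near 0 gives more weight to the accumulated history.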
A causal model provides evidence of __________ between an independent variable and the variable to be forecast
an association
To identify patterns across transactions, we can use
association rules.
The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called
cluster analysis