ITM 466


The lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is

0.75
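As a quick check of the arithmetic on this card, here is a minimal sketch. It assumes lift is the rule's confidence divided by the benchmark confidence, which is the consequent's share of all transactions. The function name `lift_ratio` is illustrative only.

```python
def lift_ratio(confidence: float, consequent_support: float) -> float:
    """Lift = rule confidence / benchmark confidence (share of transactions with the consequent)."""
    return confidence / consequent_support

# Consequent occurs in 6 of 10 cases; rule confidence is 0.45.
print(round(lift_ratio(0.45, 6 / 10), 2))
```

A lift below 1, as here, suggests the antecedent makes the consequent less likely than it is at baseline.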

The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?

1

Which of the following states the objective of time series analysis?

To uncover a pattern in a time series and then extrapolate the pattern into the future

__________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.

Unsupervised learning

Hierarchical clustering using __________ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level

Ward's method

__________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation

Ward's method

In which of the following scenarios would it be appropriate to use hierarchical clustering?

When binary or ordinal data needs to be clustered

The moving averages and exponential smoothing methods are appropriate for a time series exhibiting

a horizontal pattern.

Average linkage is a measure of calculating dissimilarity between two clusters by

computing the average distance between every pair of observations between two clusters

Single linkage is a measure of calculating dissimilarity between clusters by

considering only the two most similar observations in the two clusters

In preparing categorical variables for analysis, it is usually best to

convert the categories to binary, dummy variables

A collection of text documents to be analyzed is called a ___________.

corpus

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a

dendrogram.

The mean absolute error, mean squared error, and mean absolute percentage error are all methods to measure the accuracy of a forecast. These methods measure forecast accuracy by

determining how well a particular forecasting method is able to reproduce the time series data that are already available

Jaccard's coefficient is different from the matching coefficient in that the former

does not count matching zero entries while the latter does.
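The difference between the two coefficients can be sketched for 0/1 dummy-variable observations as follows. The function names and the sample vectors are made up for illustration.

```python
def matching_coefficient(u, v):
    """Share of variables on which u and v agree (1-1 and 0-0 matches both count)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of agreeing variables after discarding the 0-0 matches."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    denominator = len(u) - both_zero
    return both_one / denominator if denominator else 0.0

# Hypothetical observations described by five dummy variables.
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
print(matching_coefficient(u, v))  # 4 of 5 variables agree
print(jaccard_coefficient(u, v))   # 1 shared "1" out of 2 non-(0,0) variables
```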

A time series with a seasonal pattern can be modeled by treating the season as a

dummy variable.

A cluster's __________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.

durability

The __________ the lift ratio, the __________ the association rule.

higher; stronger

Forecast error

is associated with measuring forecast accuracy

The value of an independent variable from the prior period is referred to as a

lagged variable

The strength of the association rule is known as __________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence

lift

An analysis of items frequently co-occurring in transactions is known as

market basket analysis.

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

matching coefficient

Complete linkage can be used to measure the distance between clusters that are the __________ in cluster analysis

most different

Single linkage can be used to measure the distance between clusters that are the __________ in cluster analysis.

most similar

The endpoint of a k-means clustering algorithm occurs when

no further changes are observed in cluster structure and number.

The process of __________ might be used to determine the value of the smoothing constant that minimizes the mean squared error

nonlinear optimization

In k-means clustering, k represents the

number of clusters.

In the moving averages method, the order k determines the

number of time series values under consideration

Euclidean distance can be used to measure the distance between __________ in cluster analysis

observations

Autoregressive models

occur whenever all the independent variables are previous values of the time series.

k-means clustering is the process of

organizing observations into distinct groups based on a measure of similarity.

Causal models

relate a time series to other variables that are believed to explain or cause its behavior.

For causal modeling, __________ are used to detect linear or nonlinear relationships between the independent and dependent variables

scatter charts

A time series that shows a recurring pattern over one year or less is said to follow a

seasonal pattern

Observation refers to the

set of recorded values of variables associated with a single entity.

With reference to time series data patterns, a cyclical pattern is the component of the time series that

shows a periodic pattern lasting more than one year

With reference to exponential forecasting models, a parameter that provides the weight given to the most recent time series value in the calculation of the forecast value is known as the

smoothing constant.

Using a large value for order k in the moving averages method is effective in

smoothing out random fluctuations

A method for modifying variables that reduces bias prior to cluster analysis is

standardization.
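One common form of standardization is the z-score, which puts every variable on the same scale so that a variable measured in large units does not dominate the distance calculation. This is a minimal sketch using the sample standard deviation. The data values are hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(x - mean) / stdev for x in values]

# Hypothetical customer ages and annual spending on very different scales.
ages = [25, 53, 36]
spending = [350, 420, 1200]
print(standardize(ages))
print(standardize(spending))
```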

The process of converting a word to its stem, or root word, is referred to as

stemming

A __________ refers to the number of times a collection of items occurs together in a transaction data set.

support count
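A support count can be sketched as a simple tally of the transactions that contain every item in the item set. The transactions below are invented for illustration.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for transaction in transactions if items <= set(transaction))

# Hypothetical transaction data set.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"milk"},
    {"bread", "jelly"},
]
print(support_count(transactions, {"bread", "milk"}))
```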

The process of extracting useful information from text data is known as __________.

text mining

Trend refers to

the long-run shift or movement in the time series observable over several periods of time

An exponential trend pattern occurs when

the percentage change between periods in the value of the variable is relatively constant.

If a time series plot exhibits a horizontal pattern, then

there is still not enough evidence to conclude that the time series is stationary

A set of observations on a variable measured at successive points in time or over successive periods of time constitutes a

time series

The process of dividing text into separate terms is referred to as __________.

tokenization

In the text mining process, the text is first preprocessed by deriving a smaller set of _________ from the larger set of words contained in a collection of documents

tokens

A positive forecast error indicates that the forecasting method ________ the dependent variable.

underestimated

The goal of __________ is to use the variable values to identify relationships between observations.

unsupervised learning

The moving averages method refers to a forecasting method that

uses the average of the most recent data values in the time series as the forecast for the next period.
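The moving averages forecast described on this card can be sketched as below, where `k` is the order (the number of most recent values averaged). The series values are illustrative.

```python
def moving_average_forecast(series, k):
    """Forecast for the next period: the mean of the k most recent values."""
    if k < 1 or k > len(series):
        raise ValueError("k must be between 1 and the length of the series")
    recent = series[-k:]
    return sum(recent) / k

# Illustrative demand series; forecast for period 5 with order k = 3.
demand = [12, 15, 14, 18]
print(round(moving_average_forecast(demand, 3), 2))
```

A larger `k` averages over more periods, which smooths out random fluctuations more but reacts more slowly to real shifts.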

Complete linkage can be used to measure the distance between _________ in cluster analysis

clusters

__________ uses a weighted average of past time series values as the forecast.

Exponential smoothing

__________ is the amount by which the predicted value differs from the observed value of the time series variable.

Forecast error

Which of the following statements is the objective of the moving averages and exponential smoothing methods?

To smooth out random fluctuations in the time series

Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875?

100
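The answer follows from rearranging the lift formula: the benchmark confidence equals confidence divided by lift, and multiplying that share by the number of transactions gives the count. A small sketch, with an illustrative function name:

```python
def consequent_count(confidence, lift, total_transactions):
    """Transactions containing the consequent, from lift = confidence / (count / total)."""
    benchmark_confidence = confidence / lift
    return benchmark_confidence * total_transactions

print(round(consequent_count(0.75, 1.875, 250)))
```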

Suppose the dissimilarity between clusters A and C has the value 24 and the dissimilarity between clusters B and C has the value 12. Use McQuitty's method to determine the dissimilarity between cluster AB and cluster C

18
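McQuitty's method averages the dissimilarity between A and C and the dissimilarity between B and C. This sketch assumes the values 24 and 12 on the card are those two dissimilarities.

```python
def mcquitty_dissimilarity(d_ac: float, d_bc: float) -> float:
    """Dissimilarity between merged cluster AB and cluster C: the mean of d(A, C) and d(B, C)."""
    return (d_ac + d_bc) / 2

print(mcquitty_dissimilarity(24, 12))
```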

Demand for a product and the forecasting department's forecast (naïve model) for the product are shown below. Compute the mean absolute error.

Period  Actual Demand  Forecasted Demand
1       12             -
2       15             12
3       14             15
4       18             16

2

If the forecasted value of the time series variable for period 2 is 22.5 and the actual value observed for period 2 is 25, what is the forecast error in period 2?

2.5

Suppose for a particular week, the forecasted sales were $4,000. The actual sales were $3,000. What is the value of the mean absolute percentage error?

33.3%
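The forecast error and percentage error on the last two cards can be checked with a short sketch. It assumes forecast error is actual minus forecast and that the percentage error is taken relative to the actual value.

```python
def forecast_error(actual, forecast):
    """Positive when the forecast underestimated the actual value."""
    return actual - forecast

def mean_absolute_percentage_error(actuals, forecasts):
    """Average of |error / actual|, expressed as a percentage."""
    percentage_errors = [abs((a - f) / a) * 100 for a, f in zip(actuals, forecasts)]
    return sum(percentage_errors) / len(percentage_errors)

print(forecast_error(25, 22.5))                                   # period 2 example
print(round(mean_absolute_percentage_error([3000], [4000]), 1))   # single-week sales example
```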

Demand for a product and the forecasting department's forecast (naïve model) for the product are shown below. Compute the mean squared error.

Period  Actual Demand  Forecasted Demand
1       12             -
2       15             12
3       14             15
4       18             16

4.67
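Both accuracy measures for the demand table can be reproduced with a short sketch. Period 1 is skipped because it has no forecast, and the forecast values are taken from the card as given.

```python
def mean_absolute_error(actuals, forecasts):
    """Average of the absolute forecast errors."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

def mean_squared_error(actuals, forecasts):
    """Average of the squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)

# Periods 2 through 4 from the table.
actual_demand = [15, 14, 18]
forecasted_demand = [12, 15, 16]
print(mean_absolute_error(actual_demand, forecasted_demand))
print(round(mean_squared_error(actual_demand, forecasted_demand), 2))
```

The errors are 3, -1, and 2, so the absolute errors average to 2 and the squared errors (9, 1, 4) average to about 4.67.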

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

75.39
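The calculation can be sketched directly from the coordinates u = (25, 350) and v = (53, 420). The variables are used unstandardized here, as on the card.

```python
import math

def euclidean_distance(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = (25, 350)  # (age, dollars spent)
v = (53, 420)
print(round(euclidean_distance(u, v), 2))
```

The differences are 28 years and 70 dollars, so the distance is the square root of 784 + 4900 = 5684.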

A forecast is defined as a(n)

prediction of future values of a time series.

Which is not true regarding trend patterns?

Can result when business conditions shift to a new level at some point in time

__________ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters

Centroid linkage

__________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

Complete linkage
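The four linkage definitions in this set (single, complete, average, centroid) can be compared side by side. This is a sketch on two small made-up clusters of 2-D points, not a full clustering routine.

```python
import math
from itertools import product

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pairwise_distances(cluster_a, cluster_b):
    """Distances between every observation in A and every observation in B."""
    return [distance(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(a, b):
    return min(pairwise_distances(a, b))   # two most similar observations

def complete_linkage(a, b):
    return max(pairwise_distances(a, b))   # two most dissimilar observations

def average_linkage(a, b):
    d = pairwise_distances(a, b)
    return sum(d) / len(d)                 # mean over every cross-cluster pair

def centroid(cluster):
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def centroid_linkage(a, b):
    return distance(centroid(a), centroid(b))

# Hypothetical clusters.
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (5, 0)]
for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("centroid", centroid_linkage)]:
    print(name, round(fn(cluster_a, cluster_b), 3))
```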

Which of the following is true of the exponential smoothing coefficient?

It is chosen as the value that minimizes a selected measure of forecast accuracy such as the mean squared error.

Which of the following is true of Euclidean distances?

It is commonly used as a method of measuring dissimilarity between quantitative observations.

Which statement is true of an association rule?

It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

__________ is a measure that computes the dissimilarity between a cluster AB and a cluster C by averaging the distance between A and C and the distance between B and C.

McQuitty's method

Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another?

Mean forecast error

Which of the following is not present in a time series?

Operational variations

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?

The hypotenuse

Which of the following is not true of a stationary time series?

The time series plot is a straight line.

The exponential smoothing forecast for period t + 1 is a weighted average of the

actual value in period t with weight α and the forecast for period t with weight 1 - α.
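The weighted-average update on this card can be sketched as below. The choice to start with the period 2 forecast equal to the period 1 actual value is a common convention assumed here, and the series and α = 0.2 are illustrative.

```python
def exponential_smoothing_forecasts(series, alpha):
    """Return forecasts for periods 2 .. n+1 using F[t+1] = alpha*Y[t] + (1 - alpha)*F[t]."""
    forecasts = [series[0]]  # assumption: forecast for period 2 equals the period 1 actual
    for actual in series[1:]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

demand = [12, 15, 14, 18]
print([round(f, 3) for f in exponential_smoothing_forecasts(demand, alpha=0.2)])
```

In practice α would be chosen to minimize an accuracy measure such as the mean squared error rather than fixed in advance.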

A causal model provides evidence of __________ between an independent variable and the variable to be forecast

an association

To identify patterns across transactions, we can use

association rules.

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called

cluster analysis

