Business Analytics Chapter 5: Descriptive Data Mining

The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters? a. 1 b. 2 c. 0.5 d. 1.5

a. 1

In preparing categorical variables for analysis, it is usually best to _____. a. convert the categories to binary, dummy variables b. combine as many categories as possible c. convert the categories to numeric representations d. let them remain categorical

a. convert the categories to binary, dummy variables
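
To make the dummy-variable conversion concrete, here is a minimal Python sketch. The helper name `to_dummies` and the color values are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
def to_dummies(values):
    """Convert a list of category labels into binary (0/1) dummy columns."""
    categories = sorted(set(values))  # one dummy column per distinct category
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
    }

colors = ["red", "green", "red", "blue"]  # hypothetical categorical variable
dummies = to_dummies(colors)
for cat, column in dummies.items():
    print(cat, column)
```

Each observation gets a 1 in exactly one column, so distance measures can treat the categories without implying any numeric order among them.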

Single linkage can be used to measure the distance between clusters that are the _____ in cluster analysis. a. most similar b. farthest apart c. closest d. most different

a. most similar

Suppose the dissimilarity between clusters A and C has the value 24 and the dissimilarity between clusters B and C has the value 12. Use McQuitty's method to determine the dissimilarity between clusters AB and C. a. 18 b. 24 c. 12 d. 36

a. 18
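
The answer is the simple average of the two given dissimilarities, 24 and 12. A minimal sketch of that arithmetic, with the function name `mcquitty` chosen here for illustration:

```python
def mcquitty(d_first, d_second):
    """McQuitty's method: average the two dissimilarities to the third cluster."""
    return (d_first + d_second) / 2

print(mcquitty(24, 12))  # 18.0
```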

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. a. 75.39 b. 66.21 c. 72.28 d. 88.57

a. 75.39
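
The calculation can be checked with a short Python sketch using the vectors u = (25, 350) and v = (53, 420) from the question. It relies on `math.dist` from the standard library (Python 3.8 or later).

```python
import math

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

# sqrt((53 - 25)^2 + (420 - 350)^2) = sqrt(784 + 4900) = sqrt(5684)
distance = math.dist(u, v)
print(round(distance, 2))  # 75.39
```

The spending difference (70) contributes far more to the result than the age difference (28), which is one reason variables are often standardized before clustering.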

_____ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters. a. Complete linkage b. Average linkage c. Single linkage d. Centroid linkage

d. Centroid linkage
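
As a rough illustration of centroid linkage, the sketch below averages the points in each cluster and then takes the Euclidean distance between the two averages. The points and the helper names `centroid` and `centroid_linkage` are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def centroid_linkage(cluster_a, cluster_b):
    """Dissimilarity between two clusters as the distance between their centroids."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

cluster_a = [(1.0, 1.0), (3.0, 1.0)]  # centroid is (2.0, 1.0)
cluster_b = [(4.0, 5.0), (6.0, 5.0)]  # centroid is (5.0, 5.0)
print(centroid_linkage(cluster_a, cluster_b))
```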

_____ is used to measure the dissimilarity between text documents. a. Corpus b. Word cloud c. Dendrogram d. Cosine distance

d. Cosine distance
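
A minimal sketch of cosine distance between term-count vectors, assuming the common definition of one minus cosine similarity (the chapter's exact formula may differ). The three document vectors are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

doc1 = [2, 1, 0, 0]  # term counts for a short document
doc2 = [4, 2, 0, 0]  # same term proportions, twice as long
doc3 = [0, 0, 3, 1]  # shares no terms with doc1

print(cosine_distance(doc1, doc2))  # close to 0: same direction
print(cosine_distance(doc1, doc3))  # 1.0: no terms in common
```

Because only the direction of the vectors matters, a long document and a short one with the same mix of terms are treated as nearly identical, which is why this measure suits text.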

Jaccard's coefficient is different from the matching coefficient in that the former _____. a. is affected by the scale used to measure variables while the latter is not b. measures overlap while the latter measures dissimilarity c. deals with categorical variables while the latter deals with continuous variables d. does not count matching zero entries while the latter does

d. does not count matching zero entries while the latter does
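
The difference shows up clearly on a small example. The sketch below computes both coefficients for two hypothetical binary observations; the function names are illustrative.

```python
def matching_coefficient(a, b):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def jaccard_coefficient(a, b):
    """Share of agreeing variables after discarding matching zeros.

    Assumes at least one variable is nonzero in a or b.
    """
    both_one = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    both_zero = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return both_one / (len(a) - both_zero)

a = [1, 0, 0, 0, 1]
b = [1, 0, 0, 0, 0]

print(matching_coefficient(a, b))  # 0.8 (four of five entries agree)
print(jaccard_coefficient(a, b))   # 0.5 (three 0-0 matches are ignored)
```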

In k-means clustering, k represents the _____. a. number of variables b. number of observations in a cluster c. mean of the cluster d. number of clusters

d. number of clusters

Euclidean distance can be used to measure the distance between _____ in cluster analysis. a. objects b. ward c. clusters d. observations

d. observations

A popular measure for weighting terms based on frequency and uniqueness is _____. a. corpus b. cosine distance c. word cloud d. term frequency times inverse document frequency

d. term frequency times inverse document frequency
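
A minimal sketch of the idea, assuming raw term counts for term frequency and log(N / document frequency) for inverse document frequency; textbooks and libraries use several variants, so treat the exact formula as an assumption. The three-document corpus is hypothetical.

```python
import math

def tf_idf(term, doc, corpus):
    """Term frequency in `doc` times log(N / number of documents containing term)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term never appears in the corpus
    return tf * math.log(len(corpus) / df)

corpus = [
    ["great", "service", "great", "food"],
    ["slow", "service"],
    ["great", "price"],
]

for term in ["food", "service", "great"]:
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

In the first document, "food" outscores "service" even though each appears once, because "food" occurs in fewer documents of the corpus.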

_____ is a measure that computes the dissimilarity between a cluster AB and a cluster C by averaging the distance between A and C and the distance between B and C. a. McQuitty's method b. Jaccard's coefficient c. Ward's method d. None of these choices are correct.

a. McQuitty's method

A cluster's _____ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram. a. durability b. dimension c. affordability d. span

a. durability

_____ is the dissimilarity measure that is more robust to outliers than Euclidean distance. a. Matching distance b. Manhattan distance c. Jaccard distance d. Matching coefficient

b. Manhattan distance

_____ refers to the number of times a collection of items occurs together in a transaction data set. a. A consequent b. Antecedent c. Support d. Validation count

c. Support

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster? a. The short leg b. The long leg c. The hypotenuse d. Euclidean distance is not related to right triangles.

c. The hypotenuse

The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence. a. antecedent b. consequent c. lift d. support count

c. lift
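
Support, confidence, and lift can be tied together in a short sketch. The transactions and the function names below are hypothetical; confidence is taken as the support of antecedent and consequent together divided by the support of the antecedent, and benchmark confidence as the share of all transactions containing the consequent.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the benchmark confidence."""
    confidence = (
        support_count(antecedent | consequent, transactions)
        / support_count(antecedent, transactions)
    )
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence / benchmark

transactions = [
    {"bread", "jelly", "peanut butter"},
    {"bread", "peanut butter"},
    {"bread", "milk"},
    {"milk", "soda"},
    {"bread", "jelly"},
]

# Rule: if bread, then jelly. Confidence = 2/4, benchmark = 2/5.
print(lift({"bread"}, {"jelly"}, transactions))
```

A lift above 1 suggests the antecedent makes the consequent more likely than it is across all transactions.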

