Stats 2 Ch. 4(2) print 19-48
12. A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____.
a. dendrogram
3. ___ is a category of data-mining techniques that detect patterns and relationships in the data.
a. descriptive data-mining
6. k-means clustering is the process of:
b. organizing observations into one of k groups based on a measure of similarity.
4. The data-mining method that can be used in market segmentation to divide consumers into different homogeneous groups is _____.
b. cluster analysis
8. Jaccard's coefficient is different from the matching coefficient in that the former:
b. does not count matching zero entries while the latter does.
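A minimal sketch of the difference, assuming two observations made up only of 0/1 variables. The function names and the sample vectors u and v are illustrative, not taken from the chapter.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 and 0-0 both count)."""
    agree = sum(1 for a, b in zip(u, v) if a == b)
    return agree / len(u)


def jaccard_coefficient(u, v):
    """Share of 1-1 matches among the variables that are not 0-0 matches."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    considered = len(u) - both_zero
    return both_one / considered if considered else 0.0


# Hypothetical binary observations (assumed data)
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

matching = matching_coefficient(u, v)  # 4 of the 5 variables agree
jaccard = jaccard_coefficient(u, v)    # 1 shared "1" among the 2 variables that are not 0-0
print(matching, jaccard)
```

The three 0-0 matches raise the matching coefficient but are left out of Jaccard's coefficient, so the two measures can differ noticeably when most entries are zero.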
15. An analysis of items frequently co-occurring in transactions (such as purchases) is known as _____.
b. market basket analysis
2. Observation refers to the:
b. set of recorded values of variables associated with a single entity
9. Average group linkage measures dissimilarity between two clusters by considering:
b. the average distance over all pairs of observations between these clusters.
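As a rough illustration of the definition in this answer, the sketch below averages the Euclidean distance over every cross-cluster pair. The function name and the two small clusters are assumptions made up for the example.

```python
import math
from itertools import product


def average_pairwise_distance(cluster_a, cluster_b):
    """Average Euclidean distance over all pairs (one point from each cluster)."""
    distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return sum(distances) / len(distances)


# Hypothetical clusters (assumed data)
cluster_a = [(0, 0)]
cluster_b = [(3, 4), (6, 8)]

# The two cross-cluster distances are 5 and 10, so the average is 7.5
linkage = average_pairwise_distance(cluster_a, cluster_b)
print(linkage)
```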
13. If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations?
b. the hypotenuse
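A short sketch of why the hypotenuse is the answer, assuming two observations measured on two variables. The points u and v are made-up values chosen to give a 3-4-5 triangle.

```python
import math

# Hypothetical observations on two variables (assumed data)
u = (1, 2)
v = (4, 6)

# The legs of the right triangle are the differences on each variable
leg_x = v[0] - u[0]  # 3
leg_y = v[1] - u[1]  # 4

# Euclidean distance is the length of the hypotenuse
distance = math.sqrt(leg_x ** 2 + leg_y ** 2)
print(distance)  # 5.0
```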
5. Which of the following is true of bottom-up hierarchical clustering?
c. it starts with each observation in its own cluster and then iteratively combines the two most similar clusters
7. The simplest measure of similarity between observations consisting solely of categorical variables is given by _____.
c. matching coefficient
14. The endpoint of a k-means clustering algorithm occurs when:
c. no further changes are observed in cluster structure and number.
16. A _____ refers to the number of times that a collection of items occurs together in a transaction data set.
c. support count
1. Which of the following reasons is responsible for the increase in the use of data-mining techniques in business?
c. the ability to electronically warehouse data
17. In the theory of association rules in data mining, by confidence we mean an estimated probability that
c. the consequent occurs given that the antecedent occurs
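A small sketch of support count and confidence, assuming a toy transaction data set. The items, the transactions, and the function names are invented for illustration.

```python
# Hypothetical transactions (assumed data)
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "jelly"},
]


def support_count(items):
    """Number of transactions that contain every item in the collection."""
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given that the antecedent occurs."""
    return support_count(antecedent | consequent) / support_count(antecedent)


bread_count = support_count({"bread"})            # bread appears in 4 transactions
both_count = support_count({"bread", "milk"})     # bread and milk appear together in 2
rule_confidence = confidence({"bread"}, {"milk"})  # 2 / 4
print(bread_count, both_count, rule_confidence)
```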
18. The lift ratio of an association rule with a confidence value of 0.88 and in which the consequent occurs in 60 out of 100 cases is:
d. 1.47
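The arithmetic behind this answer can be checked with a short sketch. It assumes the usual definition of lift ratio as the rule's confidence divided by the share of transactions that contain the consequent; the variable names are illustrative.

```python
confidence = 0.88            # given in the question
consequent_count = 60        # transactions containing the consequent
total_transactions = 100

# Share of all transactions that contain the consequent
consequent_support = consequent_count / total_transactions

# Lift ratio: confidence relative to that baseline share
lift_ratio = confidence / consequent_support
print(round(lift_ratio, 2))  # 1.47
```

A lift ratio above 1 suggests the antecedent makes the consequent more likely than it is across all transactions.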
11. ______ is the vector of the averages computed for each variable across all cluster observations.
d. centroid
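A minimal sketch of a centroid calculation, assuming a cluster of three observations on two variables. The observation values are made up for the example.

```python
# Hypothetical cluster: each tuple is one observation on two variables (assumed data)
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]

# Average each variable across all observations in the cluster
centroid = tuple(sum(values) / len(values) for values in zip(*cluster))
print(centroid)  # (3.0, 5.0)
```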
10. _____ measures dissimilarity between two clusters by using the distance between the two cluster centroids.
d. centroid linkage