Problem Set #5


The __________ the lift ratio, the __________ the association rule. a) higher; weaker b) lower; weaker c) higher; stronger d) lower; stronger

higher; stronger

An analysis of items frequently co-occurring in transactions is known as a) market segmentation. b) market basket analysis. c) regression analysis. d) cluster analysis.

market basket analysis
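
Market basket analysis starts by counting which items show up together. The Python sketch below tallies how often each unordered pair of items appears in the same transaction. The transactions and the helper name `count_item_pairs` are hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import combinations


def count_item_pairs(transactions):
    """Count how often each unordered pair of items appears in the same transaction."""
    pair_counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("bread", "milk") and ("milk", "bread") are one pair
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    return pair_counts


# Hypothetical transactions, for illustration only
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
counts = count_item_pairs(transactions)
print(counts.most_common(3))
```

Pairs with high counts are candidates for association rules.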

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the a) matching coefficient. b) Jaccard's coefficient. c) Euclidean distance. d) antecedent.

matching coefficient
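
The matching coefficient is the number of dummy variables on which two observations agree (both 1 or both 0), divided by the total number of variables. Below is a minimal Python sketch. Jaccard's coefficient, which ignores 0-0 matches, is included only for contrast. The function names and the two observations are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of dummy variables on which two observations agree (both 1 or both 0)."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Like the matching coefficient, but 0-0 matches are left out of both counts."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    if both_zero == len(u):
        raise ValueError("Jaccard's coefficient is undefined when both observations are all zeros")
    return both_one / (len(u) - both_zero)


# Hypothetical observations coded as five 0/1 dummy variables
u = [1, 0, 0, 1, 0]
v = [1, 0, 1, 1, 0]
print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))
```

Here the observations agree on 4 of 5 variables. Jaccard drops the two shared zeros, which leaves 2 shared ones out of 3 remaining variables.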

Single linkage can be used to measure the distance between clusters that are the __________ in cluster analysis. a) most similar b) most different c) farthest apart d) closest

most similar
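
Single linkage defines the distance between two clusters as the distance between their closest (most similar) pair of observations, with one observation taken from each cluster. A minimal Python sketch, assuming Euclidean distance and hypothetical two-variable clusters:

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Smallest Euclidean distance over all cross-cluster pairs of observations."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Hypothetical clusters
A = [(0, 0), (1, 0)]
B = [(4, 0), (6, 0)]
print(single_linkage(A, B))  # 3.0, from (1, 0) to (4, 0)
```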

The process of extracting useful information from text data is known as __________. a) corpus b) stemming c) tokenization d) text mining

Text mining
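
A common first step in text mining is to tokenize each document in a corpus and record which terms it contains. The Python sketch below builds term counts and a binary term-document matrix. The regular expression, the two-document corpus, and the names are assumptions. Stemming and stop-word removal are left out to keep it short.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes as tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Hypothetical corpus of two short reviews
corpus = [
    "The battery life is great, great value.",
    "Battery died after a week. Terrible value!",
]
term_counts = [Counter(tokenize(doc)) for doc in corpus]

# Binary term-document matrix: one row per document, one column per term (1 = term present)
vocabulary = sorted(set().union(*term_counts))
matrix = [[1 if term in counts else 0 for term in vocabulary] for counts in term_counts]
print(vocabulary)
print(matrix)
```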

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a(n) __________. a) dendrogram b) scatter chart c) decile-wise lift chart d) cumulative lift tree

dendrogram
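
A dendrogram draws the merge sequence of agglomerative hierarchical clustering, and the height of each join is the distance at which the two clusters merged. The naive Python sketch below records that sequence using single linkage on hypothetical one-variable data. It is written for clarity rather than speed, and the function names are illustrative.

```python
import math
from itertools import combinations


def single_link(a, b):
    """Single-linkage distance: closest pair of observations across the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return (left, right, height) for each merge."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        d, i, j = min(
            (single_link(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j first keeps index i valid
        del clusters[i]
        clusters.append(merged)
    return merges


# Hypothetical one-variable observations
points = [(0.0,), (1.0,), (5.0,), (7.0,)]
for left, right, height in agglomerate(points):
    print(left, "+", right, "at height", height)
```

Reading the merges bottom-up gives the dendrogram: {0, 1} join at 1.0, {5, 7} join at 2.0, and the two groups join at 4.0.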

The strength of a cluster can be measured by comparing the average distance between observations within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters? a) 0.5 b) 1 c) 1.5 d) 2

1
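
The rule of thumb can be checked numerically. The Python sketch below assumes that within-cluster distance is the average pairwise Euclidean distance among a cluster's observations, and that between-cluster distance is the distance between centroids. The clusters and function names are hypothetical.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of the observations in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def avg_within_distance(cluster):
    """Average Euclidean distance over all pairs of observations in the cluster (assumed definition)."""
    if len(cluster) < 2:
        raise ValueError("need at least two observations to measure within-cluster distance")
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def strength_ratio(cluster, other):
    """Centroid-to-centroid distance divided by the cluster's average within-cluster distance."""
    between = math.dist(centroid(cluster), centroid(other))
    return between / avg_within_distance(cluster)


# Hypothetical two-variable clusters
A = [(0, 0), (1, 0), (0, 1)]
B = [(5, 5), (6, 5), (5, 6)]
ratio = strength_ratio(A, B)
print(round(ratio, 2), ratio > 1)
```

A ratio well above 1 means cluster A is much tighter than its separation from B.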

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. a) 66.21 b) 72.28 c) 75.39 d) 88.57

75.39
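
Worked out: the differences are 53 - 25 = 28 years and 420 - 350 = 70 dollars, so the distance is sqrt(28^2 + 70^2) = sqrt(784 + 4900) = sqrt(5684), which is about 75.39. The same calculation as a short Python sketch (the function name is illustrative):

```python
import math


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = (25, 350)  # (age in years, spending in dollars)
v = (53, 420)
print(round(euclidean(u, v), 2))  # 75.39
```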

__________ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters. a) Single linkage b) Complete linkage c) Average linkage d) Centroid linkage

Centroid linkage
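
Centroid linkage first averages each cluster's observations into a centroid, then measures the distance between the two centroids. A minimal Python sketch, assuming Euclidean distance and hypothetical clusters:

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the observations in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def centroid_linkage(cluster_a, cluster_b):
    """Euclidean distance between the two cluster centroids."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical clusters
A = [(0, 0), (1, 0)]
B = [(4, 0), (6, 0)]
print(centroid_linkage(A, B))  # 4.5, from (0.5, 0.0) to (5.0, 0.0)
```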

Complete linkage can be used to measure the distance between clusters that are the __________ in cluster analysis. a) most similar b) most different c) farthest apart d) closest

Most different
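
Complete linkage defines the distance between two clusters as the distance between their most different (farthest apart) pair of observations, with one observation taken from each cluster. A minimal Python sketch, assuming Euclidean distance and hypothetical clusters:

```python
import math


def complete_linkage(cluster_a, cluster_b):
    """Largest Euclidean distance over all cross-cluster pairs of observations."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Hypothetical clusters
A = [(0, 0), (1, 0)]
B = [(4, 0), (6, 0)]
print(complete_linkage(A, B))  # 6.0, from (0, 0) to (6, 0)
```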

k-means clustering is the process of __________ a) reducing the number of variables to consider in data mining. b) estimating the value of a continuous outcome variable. c) agglomerating observations into a series of nested groups based on a measure of similarity. d) organizing observations into distinct groups based on a measure of similarity.

organizing observations into distinct groups based on a measure of similarity
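
k-means is usually computed with Lloyd's algorithm: assign each observation to its nearest centroid, move each centroid to the mean of its assigned observations, and repeat until the centroids stop changing. The Python sketch below is an illustration under stated assumptions. The random initialization from k observations, the seed, the iteration cap, and the data are all hypothetical choices.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct observations
    for _ in range(iterations):
        # Assignment step: each observation joins the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-variable data with two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, 2)
print(centroids)
print(clusters)
```

Unlike agglomerative clustering, k-means does not build nested groups. It produces exactly k distinct groups, and the user must choose k.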

The lift ratio of an association rule with a confidence value of 0.45 and in which the consequent occurs in 6 out of 10 cases is a) 1.40. b) 0.54. c) 1.00. d) 0.75

0.75
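
The lift ratio divides the rule's confidence by the benchmark confidence, which is the share of all transactions that contain the consequent: 0.45 / (6 / 10) = 0.45 / 0.6 = 0.75. A lift below 1 means the antecedent makes the consequent less likely than it is overall. A small Python sketch of the arithmetic (the function name is illustrative):

```python
def lift_ratio(confidence, consequent_share):
    """Lift = rule confidence / benchmark confidence (share of transactions with the consequent)."""
    return confidence / consequent_share


lift = lift_ratio(0.45, 6 / 10)
print(round(lift, 2))  # 0.75
```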

Which of the following is true of Euclidean distance? a) It is used to measure dissimilarity between categorical variable observations. b) It is not affected by the scale on which variables are measured. c) It is commonly used as a method of measuring dissimilarity between quantitative observations. d) It increases as the similarity between variable values increases.

It is commonly used as a method of measuring dissimilarity between quantitative observations.
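
The scale option is false because Euclidean distance depends on the units of each variable. The Python sketch below uses three hypothetical customers. When spending is recorded in dollars, customer b is nearest to a. When spending is recorded in hundreds of dollars, c is nearest instead. Standardizing each variable to z-scores, a common remedy, removes the dependence on units. The data and helper names are assumptions.

```python
import math
from statistics import mean, stdev


def nearest_to(target, data):
    """Name of the observation closest to `target` by Euclidean distance."""
    others = [name for name in data if name != target]
    return min(others, key=lambda name: math.dist(data[target], data[name]))


def zscores(column):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


# Hypothetical customers: (age in years, spending in dollars)
customers = {"a": (25, 350), "b": (53, 360), "c": (27, 420)}

# Same data with spending recorded in hundreds of dollars
in_hundreds = {name: (age, spend / 100) for name, (age, spend) in customers.items()}

# Standardize each variable so neither dominates because of its units
names = list(customers)
ages = zscores([customers[n][0] for n in names])
spends = zscores([customers[n][1] for n in names])
standardized = {n: (x, y) for n, x, y in zip(names, ages, spends)}

print(nearest_to("a", customers), nearest_to("a", in_hundreds), nearest_to("a", standardized))  # b c b
```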

To identify patterns across transactions, we can use a) association rules. b) complete linkage. c) centroid linkage. d) k-means

association rules
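
An association rule "if antecedent, then consequent" is usually judged by its support, its confidence, and its lift. The Python sketch below computes all three from raw transactions. The transactions, the rule {diapers} -> {beer}, and the function name are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    a, c = set(antecedent), set(consequent)
    n = len(baskets)
    n_a = sum(1 for b in baskets if a <= b)         # transactions containing the antecedent
    n_c = sum(1 for b in baskets if c <= b)         # transactions containing the consequent
    n_ac = sum(1 for b in baskets if (a | c) <= b)  # transactions containing both
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each appear in at least one transaction")
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)  # confidence relative to the benchmark confidence
    return support, confidence, lift


# Hypothetical transactions
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
print(rule_metrics(transactions, {"diapers"}, {"beer"}))
```

Here beer appears in 3 of 5 transactions (benchmark 0.6) but in 3 of the 4 that contain diapers (confidence 0.75), so the lift is 0.75 / 0.6 = 1.25.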

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called a) cluster analysis. b) supervised learning. c) market analysis. d) data visualization.

cluster analysis

In preparing categorical variables for analysis, it is usually best to __________ a) convert the categories to binary, dummy variables. b) let them remain categorical. c) combine as many categories as possible. d) convert the categories to numeric representations, for example, convert them to 1, 2, 3.

it is usually best to convert the categories to binary, dummy variables.
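
Coding categories as 1, 2, 3 invents an order and a distance. For example, it makes "red" closer to "blue" than to "green". Dummy variables avoid that problem by giving each category its own 0/1 column. A minimal Python sketch with hypothetical data (some analyses, such as regression, drop one column to avoid redundancy):

```python
def to_dummies(values):
    """One 0/1 column per distinct category; returns (categories, rows)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical categorical variable
colors = ["red", "blue", "green", "blue"]
cats, rows = to_dummies(colors)
print(cats)  # ['blue', 'green', 'red']
print(rows)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```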

The strength of the association rule is known as __________ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence. a) lift b) antecedent c) support count d) consequent

lift
