ISDS 361B - Ch. 4

The __________ the lift ratio, the ____________ the association rule. a. lower; stronger b. lower; weaker c. higher; stronger d. higher; weaker

c. higher; stronger

Which is NOT a primary option for addressing missing data? a. To discard any variable with missing values b. To fill in missing entries with estimated values c. To discard observations with any missing values d. To generate random data to replace the missing values

d. To generate random data to replace the missing values

____________________ measures cluster similarity by calculating the distance between the centroids of the two clusters. a. Centroid linkage b. Complete linkage c. Average linkage d. Single linkage

a. Centroid linkage

In which of the following scenarios would it be appropriate to use hierarchical clustering? a. When the number of clusters is known beforehand. b. When binary or ordinal data needs to be clustered. c. When the number of observations in the dataset is relatively high. d. When it is not necessary to know the nesting of clusters.

b. When binary or ordinal data needs to be clustered.

In preparing categorical variables for analysis, it is usually best to a. convert the categories to numeric representations. b. convert the categories to binary, dummy variables. c. combine as many categories as possible. d. let them remain categorical.

b. convert the categories to binary, dummy variables.

Suppose that the confidence of an association rule is 0.75 and the total number of transactions is 250. How many of those transactions support the consequent if the lift ratio is 1.875? a. 100 b. 150 c. 125 d. 175

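a. 100. The lift ratio is the confidence divided by the consequent's share of all transactions, so that share is 0.75 / 1.875 = 0.4, and 0.4 × 250 = 100.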
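
The arithmetic above can be checked with a short sketch. It assumes the standard definition of lift (confidence divided by the fraction of transactions that contain the consequent); the variable names are illustrative only.

```python
# Lift ratio = confidence / (consequent count / total transactions).
confidence = 0.75
lift_ratio = 1.875
total_transactions = 250

# Rearranged: fraction of transactions that contain the consequent.
consequent_share = confidence / lift_ratio

# Convert the fraction into a count of transactions.
consequent_count = consequent_share * total_transactions

print(round(consequent_share, 3), round(consequent_count))  # 0.4 100
```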

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a a. dendrogram. b. decile-wise lift chart. c. cumulative lift tree. d. scatter chart.

a. dendrogram.

Average linkage is a measure of calculating dissimilarity between two clusters by a. finding the distance between the two most dissimilar observations in the two clusters. b. computing the average distance between every pair of observations between two clusters. c. computing the distance between the cluster centroids. d. finding the distance between the two closest observations in the two clusters.

b. computing the average distance between every pair of observations between two clusters.
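
A minimal sketch comparing the linkage measures that appear in these cards (single, complete, average, centroid). The two small clusters are made-up points, and Euclidean distance is assumed as the underlying dissimilarity measure.

```python
import math
from itertools import product


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def linkage_distances(a, b):
    """Return the four linkage distances between clusters a and b."""
    # Distance for every pair with one point from each cluster.
    pair_distances = [math.dist(p, q) for p, q in product(a, b)]
    return {
        "single": min(pair_distances),    # two closest observations
        "complete": max(pair_distances),  # two most dissimilar observations
        "average": sum(pair_distances) / len(pair_distances),
        "centroid": math.dist(centroid(a), centroid(b)),
    }


# Hypothetical clusters, chosen only for illustration.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

distances = linkage_distances(cluster_a, cluster_b)
for name, value in distances.items():
    print(f"{name}: {value:.3f}")
```

Single linkage always gives the smallest of the pairwise values and complete linkage the largest, with average linkage in between.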

In k-means clustering, k represents the a. number of observations in a cluster. b. number of clusters. c. number of variables. d. mean of the cluster.

b. number of clusters.

The strength of a cluster can be measured by comparing the average distance within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters? a. 1 b. 2 c. 0.5 d. 1.5

a. 1

k-means clustering is the process of a. estimating the value of a continuous outcome variable. b. agglomerating observations into a series of nested groups based on a measure of similarity. c. reducing the number of variables to consider in data-mining. d. organizing observations into distinct groups based on a measure of similarity.

d. organizing observations into distinct groups based on a measure of similarity.

Complete linkage can be used to measure the distance between _________ in cluster analysis. a. wards b. observations c. objects d. clusters

d. clusters

Complete linkage can be used to measure the distance between clusters that are the _________________ in cluster analysis. a. most different b. closest c. farthest apart d. most similar

a. most different

In which of the following data-mining process steps is the data manipulated to make it suitable for formal modeling? a. Data sampling b. Data preparation c. Model construction d. Model assessment

b. Data preparation

Hierarchical clustering using ____________ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level. a. centroid linkage b. Ward's method c. median linkage d. McQuitty's method

b. Ward's method

___________________ can be used to partition observations so as to obtain clusters with the least amount of information loss due to the aggregation. a. Average group linkage b. Ward's method c. Dendrogram d. Single linkage

b. Ward's method

Complete linkage can be used to measure the distance between _________ in cluster analysis. a. objects b. clusters c. observations d. wards

b. clusters

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. a. 88.57 b. 66.21 c. 75.39 d. 72.28

c. 75.39
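
The calculation can be reproduced with a short sketch, using the coordinates u = (25, 350) and v = (53, 420) from the question and no standardization of the variables.

```python
import math

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

# Square root of the sum of squared coordinate differences:
# sqrt((25 - 53)^2 + (350 - 420)^2) = sqrt(784 + 4900) = sqrt(5684)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(round(distance, 2))  # 75.39
```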

Which of the following reasons contributes to the increase in the use of data-mining techniques in business? a. The ability to manually analyze all the data b. The dearth of information to analyze and interpret c. The ability to electronically warehouse data d. The lack of methods to electronically track data

c. The ability to electronically warehouse data

Jaccard's coefficient is different from the matching coefficient in that the former a. is affected by the scale used to measure variables while the latter is not. b. measures overlap while the latter measures dissimilarity. c. does not count matching zero entries while the latter does. d. deals with categorical variables while the latter deals with continuous variables.

c. does not count matching zero entries while the latter does.
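
A small sketch of the difference, using two made-up binary observations. Returning 0.0 when every entry is a matching zero is an assumption of this sketch, since Jaccard's coefficient is undefined in that case.

```python
def matching_coefficient(x, y):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard_coefficient(x, y):
    """Share of 1-1 matches among variables that are not 0-0 matches."""
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    considered = len(x) - both_zero
    # Assumption: report 0.0 when there is nothing left to compare.
    return both_one / considered if considered else 0.0


# Hypothetical observations over five binary variables.
x = [1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 0]

print(matching_coefficient(x, y))  # 0.8
print(jaccard_coefficient(x, y))   # 0.5
```

The three shared zeros raise the matching coefficient to 0.8, while Jaccard's coefficient ignores them and reports 0.5.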

The endpoint of a k-means clustering algorithm occurs when a. all of the observations are encompassed within a single large cluster with mean k. b. Euclidean distance between observations in a cluster is maximized. c. no further changes are observed in cluster structure and number. d. Euclidean distance between clusters is minimized.

c. no further changes are observed in cluster structure and number.
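
The stopping condition can be illustrated with a bare-bones sketch of the k-means loop. The points and starting centroids are made up, Euclidean distance is assumed, and the `max_iterations` cap is a safety guard added for this sketch rather than part of the definition.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid updates until assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iterations):
        # Assign each point to the index of its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Endpoint: cluster membership did not change in this pass.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Move each centroid to the mean of its current members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


# Hypothetical data: two visibly separate groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignments, centroids = k_means(points, [(1, 1), (8, 8)])

print(assignments)  # [0, 0, 0, 1, 1, 1]
```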

Single linkage can be used to measure the distance between clusters that are the _______________ in cluster analysis. a. most different b. closest c. most similar d. farthest apart

c. most similar

A method for modifying variables that reduces bias prior to cluster analysis is a. weighting. b. removing outliers. c. standardization. d. randomizing.

c. standardization.
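
A minimal sketch of standardization with z-scores. Without it, a variable measured on a large scale (such as dollars) dominates a distance calculation over one on a small scale (such as age). The data are made up, and the use of the sample standard deviation is an assumption of this sketch.

```python
import statistics


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the std. dev."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / stdev for v in values]


# Hypothetical customers: age in years and spending in dollars.
ages = [25, 53, 40, 32]
spend = [350, 420, 1200, 800]

z_ages = standardize(ages)
z_spend = standardize(spend)

# After standardization both variables have mean 0 and standard deviation 1,
# so neither dominates a Euclidean distance purely because of its units.
print([round(z, 2) for z in z_ages])
print([round(z, 2) for z in z_spend])
```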

If a model's implications depend on the inclusion or exclusion of outliers, one should spend additional time to track down a. another source of data. b. the missing values. c. the cause of the outliers. d. a better estimation of the outliers.

c. the cause of the outliers.

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster? a. the short leg b. Euclidean distance is not related to right triangles. c. the hypotenuse d. the long leg

c. the hypotenuse
