Chapter 5 Review Questions


Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"? a. 000 b. 100 c. 010 d. 001

d. 001
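
As a minimal sketch of why the answer is 001, the encoding can be written out in Python. The function name `one_hot` and the list `OPTIONS` are illustrative names, not part of the original question; the sketch assumes one dummy variable per option, in the order the question lists them.

```python
# Options in the order given in the question; the order fixes the bit positions.
OPTIONS = ["hear account information", "billing questions", "customer service"]

def one_hot(choice, options=OPTIONS):
    """Return a 0-1 string with a 1 in the position of `choice` (hypothetical helper)."""
    if choice not in options:
        raise ValueError(f"unknown option: {choice!r}")
    return "".join("1" if option == choice else "0" for option in options)

# Encode every option so the three codes can be compared side by side.
codes = {option: one_hot(option) for option in OPTIONS}
```

Under this scheme "customer service" is the third option, so its 1 falls in the third position.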

The process of dividing text into separate terms is referred to as __________. a. data cleaning b. stemming c. tokenization d. stacking

c. tokenization
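
A rough illustration of tokenization, assuming a deliberately simple rule (lowercase the text, then keep runs of letters, digits, and apostrophes). Real text-mining tools apply more careful rules; `tokenize` and the sample sentence are made up for this sketch.

```python
import re

def tokenize(text):
    """Split text into lowercase terms using a simple character-class rule."""
    return re.findall(r"[a-z0-9']+", text.lower())

# A made-up sentence, used only to show the split into terms.
tokens = tokenize("The agent resolved my billing question quickly.")
```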

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a a. dendrogram. b. scatter chart. c. decile-wise lift chart. d. cumulative lift tree.

a. dendrogram.

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the a. matching coefficient. b. Jaccard's coefficient. c. Euclidean distance. d. antecedent.

a. matching coefficient.

Single linkage can be used to measure the distance between clusters that are the __________ in cluster analysis. a. most similar b. most different c. farthest apart d. closest

a. most similar

The process of extracting useful information from text data is known as __________. a. text mining b. tokenization c. stemming d. corpus

a. text mining

The goal of __________ is to use the variable values to identify relationships between observations. a. unsupervised learning b. data mining c. McQuitty's method d. Ward's method

a. unsupervised learning

__________ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation. a. Single linkage b. Ward's method c. Average group linkage d. Dendrogram

b. Ward's method

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called a. data visualization. b. cluster analysis. c. market analysis. d. supervised learning.

b. cluster analysis.

Average linkage is a measure of calculating dissimilarity between two clusters by a. finding the distance between the two most dissimilar observations in the two clusters. b. computing the average distance between every pair of observations between two clusters. c. finding the distance between the two closest observations in the two clusters. d. computing the distance between the cluster centroids.

b. computing the average distance between every pair of observations between two clusters.
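
The answer choices describe three different linkage rules, which can be compared in a short sketch. The function names and the two toy clusters are assumptions made for illustration, and Euclidean distance is assumed as the distance between observations.

```python
from itertools import product
from math import dist

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one observation from each cluster."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the two closest observations (choice c)."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar observations (choice a)."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Average distance over every between-cluster pair (choice b)."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Toy clusters, invented for this example.
cluster_a = [(0.0, 0.0), (0.0, 3.0)]
cluster_b = [(4.0, 0.0), (4.0, 3.0)]
```

For these toy clusters the four pairwise distances are 4, 5, 5, and 4, so single linkage gives 4, complete linkage gives 5, and average linkage gives 4.5.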

In preparing categorical variables for analysis, it is usually best to a. convert the categories to numeric representations. b. convert the categories to binary, dummy variables. c. combine as many categories as possible. d. let them remain categorical.

b. convert the categories to binary, dummy variables.

A collection of text documents to be analyzed is called a ___________. a. book b. corpus c. library d. consequent

b. corpus

Jaccard's coefficient is different from the matching coefficient in that the former a. measures overlap while the latter measures dissimilarity. b. does not count matching zero entries while the latter does. c. deals with categorical variables while the latter deals with continuous variables. d. is affected by the scale used to measure variables while the latter is not.

b. does not count matching zero entries while the latter does.
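
The difference between the two coefficients can be sketched as follows. The function names and the two example observations are hypothetical. Returning 0.0 when every entry is a matching zero is an assumption of this sketch, since Jaccard's coefficient is undefined in that case.

```python
def matching_coefficient(u, v):
    """Share of dummy variables on which the observations agree (1-1 or 0-0)."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches after discarding the variables where both are 0."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    considered = len(u) - both_zero
    # Assumption: treat the all-zero case as 0.0 rather than raising an error.
    return both_one / considered if considered else 0.0

# Two made-up observations described by five dummy variables.
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]
```

Here the observations agree on four of five variables, so the matching coefficient is 0.8. Three of those agreements are 0-0 matches, which Jaccard's coefficient ignores, leaving one 1-1 match out of two remaining variables, or 0.5.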

An analysis of items frequently co-occurring in transactions is known as a. market segmentation. b. market basket analysis. c. regression analysis. d. cluster analysis.

b. market basket analysis.

k-means clustering is the process of a. agglomerating observations into a series of nested groups based on a measure of similarity. b. organizing observations into distinct groups based on a measure of similarity. c. reducing the number of variables to consider in data-mining. d. estimating the value of a continuous outcome variable.

b. organizing observations into distinct groups based on a measure of similarity.
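
One common way to carry out k-means is to alternate between assigning each observation to its nearest centroid and recomputing the centroids. The sketch below assumes Euclidean distance and caller-supplied starting centroids; the function name `k_means`, the sample points, and the choice of starting centroids are all illustrative.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up observations forming two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
```

With these points the procedure settles on two distinct groups: the three observations near (1, 1) and the three near (8, 8).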

Observation refers to the a. estimated continuous outcome variable. b. set of recorded values of variables associated with a single entity. c. goal of predicting a categorical outcome based on a set of variables. d. mean of all variable values associated with one particular entity.

b. set of recorded values of variables associated with a single entity.

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. a. 66.21 b. 72.28 c. 75.39 d. 88.57

c. 75.39
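
The arithmetic can be checked with a short sketch using the vectors u = (25, 350) and v = (53, 420) from the question. The variable names are illustrative.

```python
from math import sqrt

u = (25, 350)  # (age, dollars spent)
v = (53, 420)

# Sum the squared differences in each variable, then take the square root.
squared_differences = [(a - b) ** 2 for a, b in zip(u, v)]  # 28**2 and 70**2
distance = sqrt(sum(squared_differences))
rounded = round(distance, 2)
```

The squared differences are 784 and 4,900, which sum to 5,684, and the square root of 5,684 is about 75.39. The spending difference dominates the result, which illustrates that Euclidean distance depends on the scale of each variable.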

__________ is a measure that computes the dissimilarity between a cluster AB and a cluster C by averaging the distance between A and C and the distance between B and C. a. Ward's method b. Jaccard's coefficient c. McQuitty's method d. None of these are correct.

c. McQuitty's method
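
As a small sketch of the rule described in the question, the distance from the merged cluster AB to C is the plain average of the two earlier distances. The function name and the example distances 6 and 10 are made up for illustration.

```python
def mcquitty_distance(d_a_c, d_b_c):
    """Distance from merged cluster AB to C: the average of d(A, C) and d(B, C)."""
    return (d_a_c + d_b_c) / 2

# Hypothetical inputs: d(A, C) = 6 and d(B, C) = 10.
d_ab_c = mcquitty_distance(6.0, 10.0)
```

Note that the average uses only the two cluster-level distances, so the sizes of A and B do not enter the calculation.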

Euclidean distance can be used to measure the distance between __________ in cluster analysis. a. objects b. clusters c. observations d. ward

c. observations

__________ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters. a. Single linkage b. Complete linkage c. Average linkage d. Centroid linkage

d. Centroid linkage
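
Centroid linkage can be sketched in two steps: average the observations in each cluster to get its centroid, then measure the distance between the two centroids. The function names and toy clusters are assumptions for this example, and Euclidean distance is assumed.

```python
from math import dist

def centroid(cluster):
    """Mean of each variable across the observations in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def centroid_linkage(cluster_a, cluster_b):
    """Euclidean distance between the two cluster centroids."""
    return dist(centroid(cluster_a), centroid(cluster_b))

# Toy clusters, invented for this example.
cluster_a = [(0.0, 0.0), (0.0, 3.0)]
cluster_b = [(4.0, 0.0), (4.0, 3.0)]
linkage = centroid_linkage(cluster_a, cluster_b)
```

The centroids here are (0, 1.5) and (4, 1.5), so the centroid linkage distance is 4.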

Which of the following is true of Euclidean distances? a. It is used to measure dissimilarity between categorical variable observations. b. It is not affected by the scale on which variables are measured. c. It increases with the increase in similarity between variable values. d. It is commonly used as a method of measuring dissimilarity between quantitative observations.

d. It is commonly used as a method of measuring dissimilarity between quantitative observations.

