Chapter 5


In the text mining process, the text is first preprocessed by deriving a smaller set of _____ from the larger set of words contained in a collection of documents. a. tokens b. terms c. stack d. stems

a. tokens

A collection of text documents to be analyzed is called a _____. a. library b. corpus c. consequent d. book

b. corpus

The process of extracting useful information from text data is known as _____. a. corpus b. text mining c. tokenization d. stemming

b. text mining

Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variable, which of the following combinations would encode the entry "customer service"? a. 000 b. 100 c. 010 d. 001

d. 001
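
As a rough illustration of the encoding in this question, the sketch below builds the 0-1 string for each option in the stated order. The helper name `one_hot` and the `OPTIONS` list are made up for this example; they are not from the chapter.

```python
# Options in the order given by the question.
OPTIONS = ["hear account information", "billing questions", "customer service"]


def one_hot(choice, options=OPTIONS):
    """Return a 0-1 string with a 1 in the position of `choice` (hypothetical helper)."""
    return "".join("1" if option == choice else "0" for option in options)


# Map every option to its dummy-variable code.
codes = {option: one_hot(option) for option in OPTIONS}
for option, code in codes.items():
    print(option, "->", code)
```

With three options listed in this order, "customer service" occupies the third position, which is why its code is 001.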

_____ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters. a. Centroid linkage b. Complete linkage c. Average linkage d. Single linkage

a. Centroid linkage

Average linkage is a measure of calculating dissimilarity between two clusters by _____. a. finding the distance between the two most dissimilar observations in the two clusters b. computing the average distance between every pair of observations between two clusters c. computing the distance between the cluster centroids d. finding the distance between the two closest observations in the two clusters

b. computing the average distance between every pair of observations between two clusters

_____ refers to the number of times a collection of items occurs together in a transaction data set. a. Validation count b. A consequent c. Antecedent d. Support count

d. Support count
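
A minimal sketch of a support count, assuming transactions are stored as Python sets. The sample transactions and the function name `support_count` are invented for illustration.

```python
# Made-up transaction data: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"milk"},
    {"bread", "jelly"},
]


def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)


print(support_count({"bread", "milk"}, transactions))  # 2
```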

_____ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables. a. Data mining b. Data sampling c. Dimension reduction d. Unsupervised learning

d. Unsupervised learning

To identify patterns across transactions, we can use _____. a. centroid linkage b. k-means c. complete linkage d. association rules

d. association rules

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called _____. a. data visualization b. supervised learning c. market analysis d. cluster analysis

d. cluster analysis

In preparing categorical variables for analysis, it is usually best to _____. a. let them remain categorical b. convert the categories to numeric representations c. combine as many categories as possible d. convert the categories to binary, dummy variables

d. convert the categories to binary, dummy variables

Single linkage can be used to measure the distance between clusters that are the _____ in cluster analysis. a. closest b. farthest apart c. most different d. most similar

d. most similar

Euclidean distance can be used to measure the distance between _____ in cluster analysis. a. clusters b. objects c. ward d. observations

d. observations

The process of converting a word to its stem, or root word, is referred to as _____. a. tokenization b. stacking c. data cleaning d. stemming

d. stemming

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called _____. a. Jaccard's coefficient b. Euclidean distance c. the antecedent d. the matching coefficient

d. the matching coefficient

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. a. 72.28 b. 75.39 c. 88.57 d. 66.21

b. 75.39
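
The arithmetic behind this answer can be checked with a short sketch. The vectors u = (25, 350) and v = (53, 420) come from the question; the function name `euclidean_distance` is just a label chosen here.

```python
import math


def euclidean_distance(u, v):
    """Square root of the summed squared differences between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = (25, 350)  # age, dollars spent
v = (53, 420)

# (53 - 25)^2 + (420 - 350)^2 = 784 + 4900 = 5684
distance = euclidean_distance(u, v)
print(round(distance, 2))  # 75.39
```

Note that the dollar difference dominates the result because the two variables are on different scales and are not standardized here.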

In which of the following scenarios would it be appropriate to use hierarchical clustering? a. When the number of observations in the dataset is relatively high b. When binary or ordinal data needs to be clustered c. When the number of clusters is known beforehand d. When it is not necessary to know the nesting of clusters

b. When binary or ordinal data needs to be clustered

The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence. a. consequent b. lift c. support count d. antecedent

b. lift
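
A hedged sketch of the lift calculation described above. It assumes the usual definitions: confidence is the support count of antecedent and consequent together divided by the support count of the antecedent, and benchmark confidence is the support count of the consequent divided by the total number of transactions. The transaction data and function names are invented for illustration.

```python
# Made-up transaction data: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"milk"},
    {"bread", "jelly"},
]


def support_count(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)


def confidence(antecedent, consequent, transactions):
    """Share of antecedent transactions that also contain the consequent."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the benchmark confidence."""
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence(antecedent, consequent, transactions) / benchmark


# Rule: if bread, then jelly. Confidence = 2/3, benchmark = 2/4.
rule_lift = lift({"bread"}, {"jelly"}, transactions)
print(rule_lift)
```

A lift ratio above 1 suggests the antecedent makes the consequent more likely than it is across all transactions.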

Which statement is true of an association rule? a. It is a data reduction technique that reduces large information into smaller homogeneous groups. b. It is ultimately judged on how actionable it is and how well it explains the relationship between item sets. c. It uses analytic models to describe the relationship between metrics that drive business performance. d. It seeks to classify a categorical outcome into two or more categories.

b. It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

_____ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters. a. Average linkage b. Average group linkage c. Complete linkage d. Single linkage

c. Complete linkage
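
To compare the linkage measures that appear in these cards, the sketch below computes single, complete, average, and centroid linkage for two small clusters. The cluster coordinates and the function names are made up for illustration, and Euclidean distance between observations is assumed.

```python
import math
from itertools import product


def euclid(p, q):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(cluster):
    """Coordinate-wise mean of the observations in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def linkages(c1, c2):
    """Return the four dissimilarity measures between clusters c1 and c2."""
    pair_distances = [euclid(p, q) for p, q in product(c1, c2)]
    return {
        "single": min(pair_distances),       # two closest observations
        "complete": max(pair_distances),     # two most dissimilar observations
        "average": sum(pair_distances) / len(pair_distances),  # mean over all pairs
        "centroid": euclid(centroid(c1), centroid(c2)),        # distance between centroids
    }


cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
result = linkages(cluster_a, cluster_b)
for name, value in result.items():
    print(name, round(value, 3))
```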

Jaccard's coefficient is different from the matching coefficient in that the former _____. a. deals with categorical variables while the latter deals with continuous variables b. is affected by the scale used to measure variables while the latter is not c. does not count matching zero entries while the latter does d. measures overlap while the latter measures dissimilarity

c. does not count matching zero entries while the latter does
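
The difference can be seen on a pair of 0-1 vectors. This is a minimal sketch with invented observations; returning 0.0 when both vectors are all zeros is an assumption made here to avoid division by zero, not something stated in the chapter.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Share of agreeing variables after discarding the matching 0-0 entries."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    remaining = len(u) - both_zero
    # Assumption: define the coefficient as 0.0 when nothing remains to compare.
    return both_one / remaining if remaining else 0.0


u = (1, 0, 0, 1, 0)
v = (1, 0, 0, 0, 0)

print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```

The three shared zeros raise the matching coefficient to 0.8, while Jaccard's coefficient ignores them and reports 0.5.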

The _____ the lift ratio, the _____ the association rule. a. higher; weaker b. lower; stronger c. higher; stronger d. lower; weaker

c. higher; stronger

An analysis of items frequently co-occurring in transactions is known as _____. a. market segmentation b. regression analysis c. market basket analysis d. cluster analysis

c. market basket analysis

In k-means clustering, k represents the _____. a. number of variables b. mean of the cluster c. number of clusters d. number of observations in a cluster

c. number of clusters

The process of dividing text into separate terms is referred to as _____. a. stemming b. data cleaning c. tokenization d. stacking

c. tokenization
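
A rough sketch of tokenization, assuming a simple rule: lowercase the text and keep runs of letters, digits, and apostrophes. Real text mining tools use more careful rules; the function name `tokenize` and the sample sentence are made up for this example.

```python
import re


def tokenize(text):
    """Split text into lowercase terms, dropping punctuation and whitespace."""
    return re.findall(r"[a-z0-9']+", text.lower())


document = "Text mining extracts useful information from text data."
tokens = tokenize(document)
print(tokens)

# The distinct terms form a smaller set than the full list of words.
terms = sorted(set(tokens))
print(terms)
```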

