BUSI 3304 CH 5 Quiz


The process of extracting useful information from text data is known as _____. a.text mining b.corpus c.stemming d.tokenization

a.text mining

In which of the following scenarios would it be appropriate to use hierarchical clustering? a.When the number of clusters is known beforehand b.When the number of observations in the dataset is relatively high c.When binary or ordinal data needs to be clustered d.When it is not necessary to know the nesting of clusters

c.When binary or ordinal data needs to be clustered

In k-means clustering, k represents the _____. a.number of variables b.mean of the cluster c.number of observations in a cluster d.number of clusters

d.number of clusters

The _____ the lift ratio, the _____ the association rule. a.higher; weaker b.higher; stronger c.lower; weaker d.lower; stronger

b.higher; stronger

In the text mining process, the text is first preprocessed by deriving a smaller set of _____ from the larger set of words contained in a collection of documents. a.stems b.stack c.tokens d.terms

c.tokens

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called _____. a.market analysis b.data visualization c.supervised learning d.cluster analysis

d.cluster analysis

Average linkage is a measure of calculating dissimilarity between two clusters by _____. a.finding the distance between the two most dissimilar observations in the two clusters b.computing the average distance between every pair of observations between two clusters c.computing the distance between the cluster centroids d.finding the distance between the two closest observations in the two clusters

b.computing the average distance between every pair of observations between two clusters

Which statement is true of an association rule? a.It seeks to classify a categorical outcome into two or more categories. b.It is a data reduction technique that reduces large information into smaller homogeneous groups. c.It is ultimately judged on how actionable it is and how well it explains the relationship between item sets. d.It uses analytic models to describe the relationship between metrics that drive business performance.

c.It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

_____ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters. a.Centroid linkage b.Average linkage c.Single linkage d.Complete linkage

a.Centroid linkage

_____ refers to the number of times a collection of items occurs together in a transaction data set. a.Support count b.Validation count c.A consequent d.Antecedent

a.Support count
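
As a rough illustration of support count, the sketch below counts how many transactions contain every item in an item set. The transactions and the function name `support_count` are invented for this example and do not come from the quiz.

```python
# Hypothetical transactions, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread", "jelly"},
    {"milk", "soda"},
    {"bread", "milk", "soda"},
]

def support_count(itemset, transactions):
    """Return the number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

count = support_count({"bread", "milk"}, transactions)
print(count)  # 3: the first, second, and fifth transactions contain both items
```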

In preparing categorical variables for analysis, it is usually best to _____. a.convert the categories to binary, dummy variables b.combine as many categories as possible c.let them remain categorical d.convert the categories to numeric representations

a.convert the categories to binary, dummy variables

A collection of text documents to be analyzed is called a _____. a.book b.corpus c.consequent d.library

b.corpus

Jaccard's coefficient is different from the matching coefficient in that the former _____. a.deals with categorical variables while the latter deals with continuous variables b.does not count matching zero entries while the latter does c.is affected by the scale used to measure variables while the latter is not d.measures overlap while the latter measures dissimilarity

b.does not count matching zero entries while the latter does
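
The difference can be seen in a small sketch on two 0-1 vectors. The vectors `u` and `v` and both function names are made up for illustration. Returning 0.0 when every entry is a shared zero is an assumed convention, not something the quiz specifies.

```python
def matching_coefficient(u, v):
    """Share of positions where the two 0-1 vectors agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among positions that are not 0-0 matches."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    denom = len(u) - both_zero
    # Assumed convention: no informative positions means similarity 0.0.
    return both_one / denom if denom else 0.0

# Hypothetical observations encoded with five dummy variables.
u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

matching = matching_coefficient(u, v)  # 4 agreeing positions out of 5
jaccard = jaccard_coefficient(u, v)    # 1 shared one out of 2 non-zero-zero positions
print(matching, jaccard)
```

The three shared zeros raise the matching coefficient to 0.8, while Jaccard's coefficient ignores them and gives 0.5.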

Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"? a.010 b.100 c.001 d.000

c.001
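
A minimal sketch of this encoding, assuming one 0-1 dummy variable per option in the order given in the question. The helper name `encode` is hypothetical.

```python
# Options in the order stated in the question.
options = ["hear account information", "billing questions", "customer service"]

def encode(choice, categories):
    """Build a 0-1 string with a 1 in the position of the chosen category."""
    return "".join("1" if choice == c else "0" for c in categories)

code = encode("customer service", options)
print(code)  # 001
```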

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance. a.72.28 b.88.57 c.75.39 d.66.21

c.75.39 =SQRT((25-53)^2 + (350-420)^2)
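
The same calculation can be checked with a short Python sketch. The variable names are chosen here for illustration, and the values are taken from the question.

```python
import math

# (age, dollars spent) for the two customers in the question.
u = (25, 350)
v = (53, 420)

# Square root of the sum of squared differences across both variables.
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
print(round(distance, 2))  # 75.39
```

Because spending is on a much larger scale than age, the dollar difference dominates the result in this unstandardized form.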

_____ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters. a.Average group linkage b.Average linkage c.Complete linkage d.Single linkage

c.Complete linkage
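
As a sketch of how the linkage measures in this set differ, the code below computes single, complete, average, and centroid linkage for two small clusters. The points and helper names are hypothetical, and Euclidean distance between observations is an assumption.

```python
import math
from itertools import product

# Hypothetical two-dimensional observations in two clusters.
cluster_a = [(1.0, 1.0), (2.0, 1.0)]
cluster_b = [(5.0, 1.0), (6.0, 1.0)]

def pairwise_distances(a, b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(a, b)]

def centroid(points):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

d = pairwise_distances(cluster_a, cluster_b)
single = min(d)              # two closest observations
complete = max(d)            # two most dissimilar observations
average = sum(d) / len(d)    # mean over every between-cluster pair
centroid_link = math.dist(centroid(cluster_a), centroid(cluster_b))
print(single, complete, average, centroid_link)
```

For these points the four measures come out to 3.0, 5.0, 4.0, and 4.0 respectively.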

An analysis of items frequently co-occurring in transactions is known as _____. a.regression analysis b.cluster analysis c.market basket analysis d.market segmentation

c.market basket analysis

Single linkage can be used to measure the distance between clusters that are the _____ in cluster analysis. a.closest b.most different c.most similar d.farthest apart

c.most similar

Euclidean distance can be used to measure the distance between _____ in cluster analysis. a.ward b.objects c.observations d.clusters

c.observations

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called _____. a.Euclidean distance b.Jaccard's coefficient c.the matching coefficient d.the antecedent

c.the matching coefficient

The process of dividing text into separate terms is referred to as _____. a.stemming b.stacking c.tokenization d.data cleaning

c.tokenization
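
A minimal sketch of tokenization, assuming that lowercasing and splitting on anything other than letters, digits, and apostrophes is acceptable. Real text mining tools apply more careful rules, and the function name `tokenize` and the sample sentence are made up for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("The service was slow, but the staff was friendly.")
print(tokens)
```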

_____ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables. a.Data sampling b.Dimension reduction c.Data mining d.Unsupervised learning

d.Unsupervised learning

To identify patterns across transactions, we can use _____. a.k-means b.centroid linkage c.complete linkage d.association rules

d.association rules

The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence. a.support count b.antecedent c.consequent d.lift

d.lift
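
As an illustration of the ratio described above, the sketch below computes lift for a single rule. It assumes the usual definitions: confidence is the support count of antecedent and consequent together divided by the support count of the antecedent, and benchmark confidence is the support count of the consequent divided by the total number of transactions. The transactions and function names are hypothetical.

```python
# Hypothetical transactions, made up purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "milk"},
    {"bread", "jelly"},
    {"milk", "soda"},
    {"bread", "milk", "soda"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by its benchmark confidence."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    confidence = both / support_count(antecedent, transactions)
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence / benchmark

# Rule: if {bread}, then {jelly}. Confidence is 2/4 and benchmark confidence is 2/5.
rule_lift = lift({"bread"}, {"jelly"}, transactions)
print(rule_lift)
```

The result is about 1.25, so in this made-up data a transaction with bread is more likely to contain jelly than a randomly chosen transaction. The sketch does not guard against an antecedent or consequent that never appears, which would cause a division by zero.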

The process of converting a word to its stem, or root word, is referred to as _____. a.data cleaning b.tokenization c.stacking d.stemming

d.stemming

