test 2

Observation refers to the a. set of recorded values of variables associated with a single entity. b. mean of all variable values associated with one particular entity. c. goal of predicting a categorical outcome based on a set of variables. d. estimated continuous outcome variable.

A

To identify patterns across transactions, we can use a. association rules. b. k-means. c. complete linkage. d. centroid linkage.

A

corpus

A collection of documents to be analyzed.

observation

A set of observed values of variables associated with a single entity, often displayed as a row in a spreadsheet or database.

Euclidean distance can be used to measure the distance between __________ in cluster analysis. a. clusters b. observations c. ward d. objects

B

In k-means clustering, k represents the a. number of observations in a cluster. b. number of clusters. c. mean of the cluster. d. number of variables.

B

In the text mining process, the text is first preprocessed by deriving a smaller set of _________ from the larger set of words contained in a collection of documents. a. terms b. tokens c. stack d. stems

B

The process of extracting useful information from text data is known as __________. a. stemming b. text mining c. corpus d. tokenization

B

A __________ refers to the number of times a collection of items occurs together in a transaction data set. a. antecedent b. validation count c. support count d. consequent

C
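As a rough illustration of a support count, the sketch below counts how many transactions contain every item in an item set. The transactions and the helper name `support_count` are made up for this example and do not come from the study set.

```python
def support_count(transactions, item_set):
    """Number of transactions that contain every item in item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if items <= set(t))

# Hypothetical transaction data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk", "jelly"},
    {"milk", "soda"},
]

bread_milk_count = support_count(transactions, {"bread", "milk"})
print(bread_milk_count)
```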

An analysis of items frequently co-occurring in transactions is known as a. cluster analysis. b. market segmentation. c. market basket analysis. d. regression analysis.

C

The process of converting a word to its stem, or root word, is referred to as __________. a. tokenization b. data cleaning c. stemming d. stacking

C
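A minimal sketch of tokenization followed by stemming is shown below. The suffix list and the function `toy_stem` are deliberately crude assumptions made for illustration; this is not the Porter stemmer or any other published algorithm.

```python
import re

def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Assumed, deliberately tiny suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("Clustering groups similar observations.")
stems = [toy_stem(t) for t in tokens]
print(tokens)
print(stems)
```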

Which statement is true of an association rule? a. It is a data reduction technique that reduces large information into smaller homogeneous groups. b. It uses analytic models to describe the relationship between metrics that drive business performance. c. It is ultimately judged on how actionable it is and how well it explains the relationship between item sets. d. It seeks to classify a categorical outcome into two or more categories.

C

unsupervised learning

Category of data-mining techniques in which an algorithm explains relationships without an outcome variable to guide the process.

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called a. supervised learning. b. market analysis. c. data visualization. d. cluster analysis.

D

Jaccard's coefficient

Measure of similarity between observations consisting solely of binary categorical variables that considers only matches of nonzero entries.
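One way to read this definition: count the positions where both observations are 1, then divide by the positions where at least one is 1, so that shared zeros are ignored. The sketch below follows that reading. The function name and the choice to return 0.0 when neither observation has a nonzero entry are assumptions of this example.

```python
def jaccard_coefficient(u, v):
    """Matching 1s divided by positions where at least one entry is 1."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    # Assumption: treat the undefined 0/0 case as zero similarity.
    return both / either if either else 0.0

# Hypothetical binary observations (1 = attribute present).
u = [1, 0, 1, 0, 0]
v = [1, 1, 0, 0, 0]
similarity = jaccard_coefficient(u, v)
print(similarity)
```

Here the two shared zeros in the last positions do not raise the similarity, which is the point of counting only nonzero matches.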

hierarchical clustering

Process of agglomerating observations into a series of nested groups based on a measure of similarity.
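The nesting can be seen in a small sketch that starts with every observation in its own cluster and repeatedly merges the two closest clusters. Single linkage (the minimum pairwise distance) is used here only as one possible similarity measure, and the one-dimensional data are invented for the example.

```python
import math

def single_linkage(c1, c2):
    """Distance between clusters: the closest pair of members."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerate(points):
    """Merge the two closest clusters until one remains; return each merge."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical one-dimensional observations.
points = [(0,), (1,), (5,), (6,), (20,)]
merges = agglomerate(points)
for step, group in enumerate(merges, start=1):
    print(step, group)
```

Each later group contains the earlier groups it was built from, which is what makes the result a series of nested groups.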

Key performance indicators (KPIs)

Quantifiable measures of performance used to gauge progress toward strategic objectives or agreed standards of performance.

Which of the following is true of Euclidean distances? a. It is commonly used as a method of measuring dissimilarity between quantitative observations. b. It increases with the increase in similarity between variable values. c. It is not affected by the scale on which variables are measured. d. It is used to measure dissimilarity between categorical variable observations.

A
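Option c is false because Euclidean distance depends on the units of each variable. The sketch below computes the distance between the same two hypothetical observations twice, once with income in dollars and once in thousands of dollars. The data and variable names are invented for illustration.

```python
import math

def euclidean_distance(u, v):
    """Square root of the sum of squared differences across variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical observations: (age in years, income).
d_dollars = euclidean_distance((30, 50000), (40, 51000))  # income in dollars
d_thousands = euclidean_distance((30, 50), (40, 51))      # income in thousands

print(d_dollars, d_thousands)
```

With income in dollars the income difference dominates the distance. With income in thousands the age difference dominates, which is why variables are often standardized before clustering.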

__________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables. a. Unsupervised learning b. Data mining c. Dimension reduction d. Data sampling

A

k-means clustering is the process of a. organizing observations into distinct groups based on a measure of similarity. b. agglomerating observations into a series of nested groups based on a measure of similarity. c. reducing the number of variables to consider in data-mining. d. estimating the value of a continuous outcome variable.

A
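A minimal sketch of the usual iterative k-means procedure (assign each observation to its nearest centroid, then recompute the centroids) is given below. Here k is the number of starting centroids supplied. The data, the starting centroids, and the function name `k_means` are assumptions for this example; it uses `math.dist`, available in Python 3.8 and later.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional observations and k = 2 starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```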

Geographic Information System (GIS)

A computer system that stores, organizes, analyzes, and displays geographic data.

Crosstabulation

A tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns.
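A crosstabulation can be built by counting how often each pair of classes occurs. The sketch below does this with the standard library; the region and channel values are invented for illustration.

```python
from collections import Counter

# Hypothetical records: (region, sales channel).
records = [
    ("North", "Online"),
    ("North", "Store"),
    ("South", "Online"),
    ("North", "Online"),
    ("South", "Store"),
    ("South", "Store"),
]

counts = Counter(records)
row_labels = sorted({region for region, _ in records})    # rows: regions
col_labels = sorted({channel for _, channel in records})  # columns: channels
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]

print("      ", col_labels)
for label, row in zip(row_labels, table):
    print(label, row)
```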

data-ink ratio

The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table or chart. Ink used that is not necessary to convey information reduces the data-ink ratio.

lift ratio

The ratio of the confidence of an association rule to the benchmark confidence.
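The sketch below works through this ratio under the usual definitions: confidence is the support count of the antecedent and consequent together divided by the support count of the antecedent, and benchmark confidence is the support count of the consequent divided by the total number of transactions. The transactions and function names are hypothetical.

```python
def support_count(transactions, item_set):
    """Number of transactions that contain every item in item_set."""
    items = set(item_set)
    return sum(1 for t in transactions if items <= set(t))

def lift_ratio(transactions, antecedent, consequent):
    """Confidence of the rule divided by its benchmark confidence."""
    both = set(antecedent) | set(consequent)
    confidence = support_count(transactions, both) / support_count(transactions, antecedent)
    benchmark = support_count(transactions, consequent) / len(transactions)
    return confidence / benchmark

# Hypothetical transaction data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "jelly", "peanut butter"},
    {"bread", "milk", "jelly"},
    {"milk", "soda"},
]

lift = lift_ratio(transactions, {"bread"}, {"jelly"})
print(lift)
```

A lift ratio above 1 suggests the consequent appears more often alongside the antecedent than it does across all transactions.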
