test 2

Observation refers to the a. set of recorded values of variables associated with a single entity. b. mean of all variable values associated with one particular entity. c. goal of predicting a categorical outcome based on a set of variables. d. estimated continuous outcome variable.

To identify patterns across transactions, we can use a. association rules. b. k-means. c. complete linkage. d. centroid linkage.

corpus

A collection of documents to be analyzed.

observation

A set of observed values of variables associated with a single entity, often displayed as a row in a spreadsheet or database.

Euclidean distance can be used to measure the distance between __________ in cluster analysis. a. clusters b. observations c. Ward's method d. objects

In k-means clustering, k represents the a. number of observations in a cluster. b. number of clusters. c. mean of the cluster. d. number of variables.

In the text mining process, the text is first preprocessed by deriving a smaller set of _________ from the larger set of words contained in a collection of documents. a. terms b. tokens c. stack d. stems

The process of extracting useful information from text data is known as __________. a. stemming b. text mining c. corpus d. tokenization

A __________ refers to the number of times a collection of items occurs together in a transaction data set. a. antecedent b. validation count c. support count d. consequent
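
A minimal Python sketch of computing a support count, assuming each transaction is stored as a set of item names. The transactions and item names are made up for illustration, and `support_count` is a hypothetical helper, not a library function. The `Counter` at the end tallies every item pair at once, which is how pair counts are typically gathered before rules are formed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data set: each transaction is a set of items
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

# Support count of every item pair, keyed by the alphabetically sorted pair
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)

print(support_count({"bread", "milk"}, transactions))  # 3
print(pair_counts[("bread", "milk")])  # 3
```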

An analysis of items frequently co-occurring in transactions is known as a. cluster analysis. b. market segmentation. c. market basket analysis. d. regression analysis.

The process of converting a word to its stem, or root word, is referred to as __________. a. tokenization b. data cleaning c. stemming d. stacking
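
A toy Python sketch of tokenization followed by stemming. The suffix list and the three-character minimum are assumptions made for illustration; this is a naive suffix stripper, not the Porter stemmer or any other published algorithm, which use many more rules.

```python
import re

# Hypothetical suffix list, checked in order; real stemmers use far more rules
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

tokens = tokenize("Customers complained; the customer was complaining.")
terms = [stem(t) for t in tokens]
print(terms)  # ['customer', 'complain', 'the', 'customer', 'was', 'complain']
```

Stemming maps "customers"/"customer" and "complained"/"complaining" to shared terms, which shrinks the set of terms derived from the documents.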

Which statement is true of an association rule? a. It is a data reduction technique that reduces large information into smaller homogeneous groups. b. It uses analytic models to describe the relationship between metrics that drive business performance. c. It is ultimately judged on how actionable it is and how well it explains the relationship between item sets. d. It seeks to classify a categorical outcome into two or more categories.

unsupervised learning

Category of data-mining techniques in which an algorithm explains relationships without an outcome variable to guide the process.

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called a. supervised learning. b. market analysis. c. data visualization. d. cluster analysis.

Jaccard's coefficient

Measure of similarity between observations consisting solely of binary categorical variables that considers only matches of nonzero entries.
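
A minimal Python sketch of Jaccard's coefficient, assuming each observation is a sequence of 0/1 values. Positions where both observations are 0 are ignored, so only matches of nonzero entries count. The coefficient is undefined when both observations are all zeros; raising an error in that case is a choice made for this sketch.

```python
def jaccard_coefficient(u, v):
    """Jaccard's coefficient for two observations of binary (0/1) variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    at_least_one = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    if at_least_one == 0:
        raise ValueError("undefined when both observations are all zeros")
    return both_one / at_least_one

# Hypothetical observations of six binary variables
u = (1, 0, 0, 1, 1, 0)
v = (1, 1, 0, 1, 0, 0)
print(jaccard_coefficient(u, v))  # 0.5
```

Here the two 0-0 positions are skipped: 2 positions where both are 1, out of 4 positions where at least one is 1. A simple matching coefficient would instead count the 0-0 matches and give 4/6.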

hierarchical clustering

Process of agglomerating observations into a series of nested groups based on a measure of similarity.
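
A naive Python sketch of agglomerative hierarchical clustering, assuming numeric observations stored as tuples and Euclidean distance. It supports single linkage (minimum pairwise distance between clusters) and complete linkage (maximum). The function `agglomerate` is hypothetical, and it recomputes all pairwise distances at every step, so it only suits small examples. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def agglomerate(points, linkage="single"):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    combine = {"single": min, "complete": max}[linkage]
    clusters = [(p,) for p in points]  # start with one cluster per observation
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: combine(
                math.dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        # i < j, so delete j first to keep index i valid
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges

# Hypothetical 2-D observations
points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]
for group in agglomerate(points):
    print(group)
```

Each printed group contains the earlier ones, which is the series of nested groups the definition describes.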

Key performance indicators (KPIs)

Quantifiable measures of performance used to gauge progress toward strategic objectives or agreed standards of performance.

Which of the following is true of Euclidean distances? a. It is commonly used as a method of measuring dissimilarity between quantitative observations. b. It increases with the increase in similarity between variable values. c. It is not affected by the scale on which variables are measured. d. It is used to measure dissimilarity between categorical variable observations.
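
A minimal Python sketch of Euclidean distance between quantitative observations. The observations are made up; they show that the distance grows as observations become less similar and that it depends on the scale of each variable, which is why variables are often standardized (for example, converted to z-scores) before clustering.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two observations of numeric variables."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical observations: (age in years, annual income in dollars)
a = (25, 50_000)
b = (45, 51_000)  # 20 years older, similar income
c = (26, 60_000)  # 1 year older, much higher income

# Income dominates because it is measured on a far larger scale,
# so a looks closer to b than to c despite the large age gap.
print(round(euclidean_distance(a, b), 1))  # 1000.2
print(round(euclidean_distance(a, c), 1))  # 10000.0
```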

__________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables. a. Unsupervised learning b. Data mining c. Dimension reduction d. Data sampling

k-means clustering is the process of a. organizing observations into distinct groups based on a measure of similarity. b. agglomerating observations into a series of nested groups based on a measure of similarity. c. reducing the number of variables to consider in data-mining. d. estimating the value of a continuous outcome variable.
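
A minimal Python sketch of k-means using the usual assign-then-update loop (Lloyd's algorithm), where k is the number of clusters. It assumes numeric observations stored as tuples and draws the k initial centroids at random from the data; `k_means` and its parameters are illustrative, not from any library. `math.dist` requires Python 3.8 or later.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Partition points into k distinct clusters; return centroids and clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids chosen from the data
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(x) / len(cluster) for x in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical observations forming two well-separated groups
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Unlike hierarchical clustering, each observation ends up in exactly one of k groups rather than in a nested series. Results can depend on the initial centroids, so k-means is often rerun with several seeds.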

Geographic Information System (GIS)

A computer system that stores, organizes, analyzes, and displays geographic data.

Crosstabulation

A tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns.
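
A minimal Python sketch of building a crosstabulation from paired observations, with the classes of the first variable as rows and the classes of the second as columns. The data and the helper `crosstab` are hypothetical; in pandas, `pandas.crosstab` builds the same kind of table.

```python
from collections import Counter

# Hypothetical observations: (region, preferred product)
data = [
    ("East", "A"), ("East", "B"), ("West", "A"),
    ("West", "A"), ("East", "A"), ("West", "B"),
]

def crosstab(pairs):
    """Count observations for each (row class, column class) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = crosstab(data)
print("\t" + "\t".join(cols))
for r in rows:
    print(r + "\t" + "\t".join(str(table[r][c]) for c in cols))
```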

data-ink ratio

The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table or chart. Ink used that is not necessary to convey information reduces the data-ink ratio.

lift ratio

The ratio of the confidence of an association rule to the benchmark confidence.
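
A minimal Python sketch of the lift ratio, assuming transactions stored as sets. The confidence of a rule is the number of transactions containing both the antecedent and the consequent divided by the number containing the antecedent; the benchmark confidence is the fraction of all transactions containing the consequent. The transactions and the helper `lift_ratio` are hypothetical.

```python
# Hypothetical transaction data set
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def lift_ratio(antecedent, consequent, transactions):
    """Lift ratio of the association rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_both = sum(1 for t in transactions if (a | c) <= t)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = count_both / count_a
    benchmark_confidence = count_c / len(transactions)
    return confidence / benchmark_confidence

print(round(lift_ratio({"diapers"}, {"beer"}, transactions), 3))  # 1.111
print(round(lift_ratio({"bread"}, {"milk"}, transactions), 3))  # 0.938
```

A lift ratio above 1 (diapers -> beer) means the consequent appears more often with the antecedent than in transactions overall; below 1 (bread -> milk) means less often.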
