INFO320 CHAPTER 4


__________________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

complete linkage

Average linkage is a measure of calculating dissimilarity between two clusters by

computing the average distance between every pair of observations between two clusters.

Single linkage is a measure of calculating dissimilarity between clusters by

considering only the two most similar observations in the two clusters.
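
The three linkage measures above can be compared side by side. The following is a minimal sketch, assuming observations are numeric tuples and that Euclidean distance is the dissimilarity measure; the function names and sample clusters are illustrative only.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """Distances between every pair with one observation from each cluster."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Distance between the two most similar (closest) observations."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar (farthest) observations."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Average distance over every cross-cluster pair of observations."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical sample clusters on a line.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 6.0
print(average_linkage(cluster_a, cluster_b))   # 4.5
```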

In preparing categorical variables for analysis, it is usually best to

convert the categories to binary, dummy variables
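
As a rough illustration of the conversion, the sketch below turns one categorical variable into binary dummy variables. The helper name `to_dummies` and the `is_` prefix are assumptions made for this example, not part of any library.

```python
def to_dummies(values):
    """Return one dict of 0/1 dummy variables per observation."""
    categories = sorted(set(values))
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical categorical variable.
colors = ["red", "green", "red", "blue"]
rows = to_dummies(colors)

print(rows[0])  # {'is_blue': 0, 'is_green': 0, 'is_red': 1}
```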

The process of eliminating variables from formal analysis without losing any crucial information is called

dimension reduction

Jaccard's coefficient is different from the matching coefficient in that the former

does not count matching zero entries while the latter does
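
The difference is easiest to see on a small example. This sketch assumes each observation is a list of 0/1 dummy values of equal length; the function names and data are illustrative.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


def jaccard_coefficient(u, v):
    """Share of agreeing variables after discarding matching zero entries."""
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    considered = len(u) - both_zero
    # Assumption: report 0.0 when every entry is a matching zero.
    return both_one / considered if considered else 0.0


u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

print(matching_coefficient(u, v))  # 0.8 (4 of 5 entries agree)
print(jaccard_coefficient(u, v))   # 0.5 (1 of the 2 non-zero-matching entries)
```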

A cluster's _____________ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.

durability

The __________ the lift ratio, the ____________ the association rule.

higher, stronger

The strength of an association rule is known as ____________ and is calculated as the ratio of the confidence of the rule to the benchmark confidence.

lift
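
A worked sketch of the calculation, assuming transactions are stored as Python sets and that the benchmark confidence is the share of all transactions containing the consequent. The transactions and function names are made up for illustration.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    both = support_count(transactions, antecedent | consequent)
    return both / support_count(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule divided by the benchmark confidence."""
    benchmark = support_count(transactions, consequent) / len(transactions)
    return confidence(transactions, antecedent, consequent) / benchmark


# Hypothetical transaction data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

rule_confidence = confidence(transactions, {"bread"}, {"butter"})  # 2 / 4
rule_lift = lift(transactions, {"bread"}, {"butter"})              # 0.5 / 0.4
print(rule_confidence, rule_lift)
```

Here the lift is about 1.25, so a transaction containing bread is more likely to contain butter than a randomly chosen transaction.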

An analysis of items frequently co-occurring in transactions is known as

market basket analysis

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the

matching coefficient

Options for replacing the missing entries for a variable include replacing the missing value with the variable's mode, mean, or median. Imputing values in this manner is truly valid only if variable values are

missing completely at random
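
A minimal sketch of the three replacement options, assuming missing entries are recorded as `None`. The `impute` helper and the sample values are hypothetical.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of observed values."""
    observed = [v for v in values if v is not None]
    fill_value = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill_value if v is None else v for v in values]


# Hypothetical variable with two missing entries.
ages = [20, None, 30, 30, None, 60]

mean_filled = impute(ages, "mean")      # missing entries become 35
median_filled = impute(ages, "median")  # missing entries become 30.0
mode_filled = impute(ages, "mode")      # missing entries become 30
print(mean_filled)
```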

The endpoint of a k-means clustering algorithm occurs when

no further changes are observed in cluster structure and number

Euclidean distance can be used to measure the distance between ________________ in cluster analysis.

observations

k-means clustering is the process of

organizing observations into one of k groups based on a measure of similarity
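
The process can be sketched in a few lines. This is a simplified illustration, assuming Euclidean distance as the similarity measure and caller-supplied starting centroids; it stops when cluster assignments no longer change. The function name and data are hypothetical.

```python
from math import dist


def k_means(points, initial_centroids, max_iter=100):
    """Assign each point to its nearest centroid until assignments stop changing."""
    centroids = list(initial_centroids)  # k is the number of centroids
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # endpoint: no further change in cluster structure
        assignments = new_assignments
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


# Hypothetical observations and starting centroids (k = 2).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignments, centroids = k_means(points, [(0, 0), (10, 10)])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```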

An observation refers to

the set of recorded values of variables associated with a single entity

A ___________ refers to the number of times a collection of items occur together in a transaction data set.

support count

Which of the following reasons contributes to the increase in the use of data-mining techniques in business?

the ability to electronically warehouse data

If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?

the hypotenuse
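
For two observations with two variables each, the differences along each variable form the legs of the right triangle. A short sketch with made-up values:

```python
from math import hypot, sqrt


def euclidean_distance(u, v):
    """Square root of the sum of squared differences across variables."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = (1.0, 2.0)
v = (4.0, 6.0)

leg_x = abs(u[0] - v[0])  # 3.0
leg_y = abs(u[1] - v[1])  # 4.0

print(euclidean_distance(u, v))  # 5.0
print(hypot(leg_x, leg_y))       # 5.0, the hypotenuse of the 3-4-5 triangle
```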

In k-means clustering, k represents

the number of clusters

Which is NOT a primary option for addressing missing data?

to generate random data to replace missing data

The goal of _______ is to use the variable values to identify relationships between observations.

unsupervised learning

_______________ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.

unsupervised learning

A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a

dendrogram

The strength of a cluster can be measured by comparing the average distance in a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?

1
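
One way to check the rule of thumb is sketched below. It assumes "average distance in a cluster" means the average Euclidean distance over all pairs of observations in that cluster; other definitions (such as average distance to the centroid) are also possible. The data and function names are illustrative.

```python
from itertools import combinations
from math import dist


def centroid(cluster):
    """Variable-by-variable mean of the observations in a cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def average_within_distance(cluster):
    """Average distance over all pairs of observations in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(10.0, 0.0), (10.0, 2.0)]

between = dist(centroid(cluster_a), centroid(cluster_b))  # 10.0
ratio_a = between / average_within_distance(cluster_a)    # 10.0 / 2.0
print(ratio_a, ratio_a > 1)  # 5.0 True
```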

In which of the following scenarios would it be appropriate to use hierarchical clustering?

??

Which of the following is true for Euclidean distances?

It is commonly used as a method of measuring dissimilarity between quantitative observations.

Which statement is true of an association rule?

It is ultimately judged on how actionable it is and how well it explains the relationship between item sets

Data preparation includes all of the following except which task?

calculating the confidence ratio for all association rules

If a model's implications depend on the inclusion or exclusion of outliers, then one should spend additional time to track down

the cause of the outliers

____________________ measures cluster similarity by calculating the distance between the centroids of the two clusters.

centroid linkage
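
A brief sketch of the calculation, assuming numeric observations and Euclidean distance; the clusters and function names are invented for illustration.

```python
from math import dist


def centroid(cluster):
    """Variable-by-variable mean of the observations in a cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def centroid_linkage(cluster_a, cluster_b):
    """Distance between the centroids of the two clusters."""
    return dist(centroid(cluster_a), centroid(cluster_b))


cluster_a = [(0.0, 0.0), (2.0, 0.0)]  # centroid (1.0, 0.0)
cluster_b = [(4.0, 0.0), (6.0, 0.0)]  # centroid (5.0, 0.0)

print(centroid_linkage(cluster_a, cluster_b))  # 4.0
```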

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called

cluster analysis

In which of the following data-mining process steps is the data manipulated to make it suitable for formal modeling?

data preparation

