Clustering


The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.

lift
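A minimal sketch of the lift calculation, assuming a small made-up set of transactions (the items and the helper names `support_count`, `confidence`, and `lift` are illustrative, not from the source). The benchmark confidence is taken here as the share of all transactions that contain the consequent.

```python
# Hypothetical transactions, invented purely for illustration.
transactions = [
    {"bread", "jelly", "peanut butter"},
    {"bread", "peanut butter"},
    {"bread", "milk"},
    {"milk", "jelly"},
    {"bread", "jelly", "peanut butter", "milk"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the benchmark confidence."""
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence(antecedent, consequent, transactions) / benchmark


# Rule: if bread, then peanut butter.
rule_confidence = confidence({"bread"}, {"peanut butter"}, transactions)
rule_lift = lift({"bread"}, {"peanut butter"}, transactions)
print(rule_confidence, rule_lift)
```

A lift above 1 suggests the antecedent makes the consequent more likely than it is in the data set as a whole.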

_____ refers to the number of times a collection of items occurs together in a transaction data set.

Support

Hierarchical clustering using _____ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level.

Ward's method

In which of the following scenarios would it be appropriate to use hierarchical clustering?

When binary or ordinal data needs to be clustered

In k-means clustering, k represents the _____.

number of clusters

Euclidean distance can be used to measure the distance between _____ in cluster analysis.

observations

k-means clustering is the process of _____.

organizing observations into distinct groups based on a measure of similarity
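A rough sketch of the k-means idea, assuming two-dimensional points, Euclidean distance, and starting centroids supplied by the caller (the function name `k_means`, the data, and the fixed iteration count are all assumptions for illustration). Each pass assigns every observation to its nearest centroid, then moves each centroid to the mean of its assigned observations.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical observations forming two visibly separate groups; k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The number of starting centroids passed in plays the role of k.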

A method for modifying variables that reduces bias prior to cluster analysis is _____.

standardization
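One common form of standardization is the z-score: subtract the variable's mean and divide by its standard deviation, so that variables measured on large scales do not dominate the distance calculation. The sketch below assumes the population standard deviation and uses made-up values; a textbook may use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical ages and annual spending for three customers.
ages = [25, 53, 39]
spend = [350.0, 420.0, 385.0]

z_ages = standardize(ages)
z_spend = standardize(spend)
print(z_ages)
print(z_spend)
```

After standardization both variables have mean 0 and standard deviation 1, so neither dominates simply because of its units.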

When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the _____.

matching coefficient
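A short sketch of the matching coefficient, assuming it is defined as the number of dummy variables on which two observations agree divided by the total number of variables (the function name and the two example observations are hypothetical).

```python
def matching_coefficient(u, v):
    """Share of variables on which two observations have the same value."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)


# Two hypothetical observations described by five 0/1 dummy variables.
u = [1, 0, 1, 0, 1]
v = [1, 1, 1, 0, 0]

similarity = matching_coefficient(u, v)
print(similarity)  # 3 of the 5 variables match
```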

The _____ the lift ratio, the _____ the association rule.

higher; stronger

The strength of a cluster can be measured by comparing the average distance between observations within a cluster to the distance between cluster centroids. One rule of thumb is that the ratio of between-cluster distance to within-cluster distance should exceed what value for useful clusters?

1
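A sketch of the rule of thumb, under two assumptions that the card does not spell out: the within-cluster distance is the average pairwise Euclidean distance between observations in a cluster, and the between-cluster distance is the Euclidean distance between the two centroids. The points and helper names are invented for illustration.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a cluster's observations."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def average_within_distance(points):
    """Average pairwise distance between observations in one cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Two hypothetical clusters on a line.
cluster_a = [(0.0, 0.0), (2.0, 0.0)]
cluster_b = [(10.0, 0.0), (12.0, 0.0)]

between = math.dist(centroid(cluster_a), centroid(cluster_b))
within_a = average_within_distance(cluster_a)
ratio = between / within_a
print(between, within_a, ratio)
```

A ratio above 1 indicates that the cluster is tighter than its separation from the other cluster.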

In preparing categorical variables for analysis, it is usually best to _____.

convert the categories to binary, dummy variables

Which of the following is true of Euclidean distances?

It is commonly used as a method of measuring dissimilarity between quantitative observations.

An analysis of items frequently co-occurring in transactions is known as _____.

market basket analysis

Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer who spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer who spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.

75.39
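The arithmetic can be checked with the coordinates u = (25, 350) and v = (53, 420) given in the card: the squared differences are 28² = 784 and 70² = 4900, and the square root of their sum, 5684, is about 75.39. A minimal sketch (the variable names are illustrative):

```python
import math

u = (25, 350)  # age, dollars spent
v = (53, 420)

# Square root of the sum of squared coordinate differences.
squared_differences = [(a - b) ** 2 for a, b in zip(u, v)]
distance = math.sqrt(sum(squared_differences))
print(round(distance, 2))
```

Note that the dollar amounts contribute far more to this distance than the ages, which is one reason variables are often standardized first.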

_____ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.

Complete linkage
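A small sketch of complete linkage, assuming Euclidean distance between observations: the dissimilarity between two clusters is the largest distance over all pairs made of one observation from each cluster. The clusters and the function name are hypothetical.

```python
import math
from itertools import product


def complete_linkage(cluster_a, cluster_b):
    """Largest distance between any observation in A and any observation in B."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


# Two hypothetical clusters on a line.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

dissimilarity = complete_linkage(cluster_a, cluster_b)
print(dissimilarity)  # driven by the pair (0, 0) and (6, 0)
```

Replacing `max` with `min` would give single linkage, which looks at the two most similar observations instead.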

Which statement is true of an association rule?

It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

To identify patterns across transactions, we can use _____.

association rules

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called _____.

cluster analysis

