Chapter 4

Observation

a set of observed values associated with a single entity, often displayed as a row in a spreadsheet or database

Dendrogram

a tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering

Market basket analysis

analysis of items frequently co-occurring in transactions (such as purchases)

Unsupervised learning

category of data-mining techniques in which an algorithm explains relationships without an outcome variable to guide the process

Confidence

conditional probability that the consequent of an association rule occurs given the antecedent occurs
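
A minimal sketch of how confidence could be computed from raw transactions. The toy transactions and the helper names `support_count` and `confidence` are illustrative assumptions, not part of the study set.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from support counts."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

# Rule "if bread, then milk": 2 of the 3 bread transactions also contain milk.
conf = confidence({"bread"}, {"milk"}, transactions)
```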

Euclidean distance

geometric measure of dissimilarity between observations based on the Pythagorean theorem
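
As a small illustration, the distance is the square root of the summed squared differences across variables. The function name and sample points below are assumptions chosen for the example.

```python
import math

def euclidean_distance(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# The differences are 3 and 4, so this is the hypotenuse of a 3-4-5 triangle.
d = euclidean_distance((1.0, 2.0), (4.0, 6.0))
```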

Association rule

if-then statement describing the relationship between item sets

Antecedent

item set corresponding to the if portion of an if-then association rule

Consequent

item set corresponding to the then portion of an if-then association rule

Single linkage

measure of dissimilarity between two clusters that considers only the two most similar observations, one from each cluster

Complete linkage

measure of dissimilarity between two clusters that considers only the two most dissimilar observations, one from each cluster

Group average linkage

measure of dissimilarity between two clusters that averages the distances between every pair of observations, one from each cluster
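
The single, complete, and group average linkage measures can be compared side by side. This is a rough sketch that assumes Euclidean distance as the underlying dissimilarity; the cluster data and function names are made up for illustration.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Distances between every pair of observations, one from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the two most similar (closest) observations."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar (farthest) observations."""
    return max(pairwise_distances(cluster_a, cluster_b))

def group_average_linkage(cluster_a, cluster_b):
    """Mean distance over all cross-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

single = single_linkage(cluster_a, cluster_b)
complete = complete_linkage(cluster_a, cluster_b)
average = group_average_linkage(cluster_a, cluster_b)
```

For any pair of clusters, the group average value lies between the single and complete linkage values.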

Matching coefficient

measure of similarity between observations based on the number of matching values of categorical variables

Jaccard's coefficient

measure of similarity between observations consisting solely of binary categorical variables that considers only matches of nonzero entries
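
A sketch contrasting the matching coefficient with Jaccard's coefficient on the same pair of binary observations. It assumes the common formulas (matches divided by number of variables, and 1-1 matches divided by variables that are not 0-0 matches); the data and function names are hypothetical.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations have the same value."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among variables that are not 0-0 matches.

    Assumes at least one variable is nonzero in u or v; otherwise the
    denominator would be zero.
    """
    both_one = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    both_zero = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    return both_one / (len(u) - both_zero)

u = [1, 0, 0, 1, 0]
v = [1, 0, 0, 0, 0]

matching = matching_coefficient(u, v)  # 4 of 5 variables agree
jaccard = jaccard_coefficient(u, v)    # 1 shared nonzero entry out of 2 non-0-0 variables
```

The shared zeros raise the matching coefficient but are ignored by Jaccard's coefficient, which is why the two values differ here.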

McQuitty's method

measure that computes the dissimilarity between a cluster AB (formed by merging clusters A and B) and a cluster C by averaging the distance between A and C and the distance between B and C
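
The update described above is a plain (unweighted) average of two existing distances. The function name and the distances 4.0 and 6.0 are assumptions used only to illustrate the arithmetic.

```python
def mcquitty_distance(d_ac, d_bc):
    """Distance from merged cluster AB to C: the simple average of d(A, C)
    and d(B, C), regardless of how many observations A and B contain."""
    return (d_ac + d_bc) / 2

# Hypothetical distances before the merge: d(A, C) = 4.0 and d(B, C) = 6.0.
d_ab_c = mcquitty_distance(4.0, 6.0)
```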

Median linkage

method that computes the similarity between two clusters as the median of the similarities between each pair of observations in the two clusters

Ward's method

procedure that partitions observations in a manner to obtain clusters with the least amount of information loss due to the aggregation

Hierarchical clustering

process of agglomerating observations into a series of nested groups based on a measure of similarity

k-Means clustering

process of organizing observations into one of k groups based on a measure of similarity
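
One common way to carry this out is to alternate between assigning each observation to its nearest centroid and recomputing the centroids. The sketch below assumes Euclidean distance and caller-supplied starting centroids; the function names and data are illustrative, not a reference implementation.

```python
import math

def assign(points, centroids):
    """For each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Keep the old centroid if no point was assigned to this cluster.
            new_centroids.append(old)
        else:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
    return new_centroids

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

Here k is 2 because two starting centroids are supplied; the result depends on that starting choice.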

Dimension reduction

process of reducing the number of variables to consider in a data-mining approach

Lift ratio

ratio of the confidence of an association rule to the benchmark confidence
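
A sketch of the lift ratio, assuming the usual definition of benchmark confidence as the fraction of all transactions that contain the consequent (the card itself does not define it). The toy transactions and function names are hypothetical.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def lift_ratio(antecedent, consequent, transactions):
    """Confidence of the rule divided by the benchmark confidence."""
    confidence = (
        support_count(antecedent | consequent, transactions)
        / support_count(antecedent, transactions)
    )
    # Assumed benchmark: how often the consequent occurs overall.
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence / benchmark

# Rule "if bread, then milk": confidence 2/3, benchmark 3/4.
lift = lift_ratio({"bread"}, {"milk"}, transactions)
```

A lift ratio above 1 suggests the antecedent makes the consequent more likely than it is overall; in this toy data the value is below 1.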

Missing at random (MAR)

the case when whether data for a variable is missing is related to the values of other variables

Missing not at random (MNAR)

the case when whether data for a variable is missing is related to the unrecorded value itself

Missing completely at random (MCAR)

the case when data for a variable is missing purely due to random chance

Support count

the number of times that a collection of items occurs together in a transaction data set
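
A short sketch of counting support in a transaction data set. The transactions and the function name `support_count` are assumptions made for the example.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

count = support_count({"bread", "milk"}, transactions)
```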

Centroid linkage

uses the averaging concept of cluster centroids to define between-cluster similarity
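
One way to read this definition: average each cluster's observations into a centroid, then measure the distance between the two centroids. The sketch assumes Euclidean distance; the data and function names are illustrative.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the observations in a cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def centroid_linkage(cluster_a, cluster_b):
    """Distance between the centroids of the two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

cluster_a = [(0.0, 0.0), (0.0, 2.0)]  # centroid (0.0, 1.0)
cluster_b = [(3.0, 1.0), (5.0, 1.0)]  # centroid (4.0, 1.0)

d = centroid_linkage(cluster_a, cluster_b)
```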

