exam 2 slides

item support formula

supp(X) = (# transactions containing X) / (# total transactions)
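
As a rough illustration of the support formula, the sketch below counts how many transactions contain an item set and divides by the total. The transactions and item names are made up for this example and are not from the slides.

```python
# Hypothetical toy transactions; the item names are invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

bread_support = support({"bread"}, transactions)                # 3 of 4 transactions
bread_milk_support = support({"bread", "milk"}, transactions)   # 2 of 4 transactions
```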

Association analysis algorithms

-Apriori -Eclat -ZeroR -FP Growth -ETC

Uses for cluster results

-Data segmentation -Categories for classifying new data -Labeled data for classification -Anomaly detection

the three similarity measures

-Euclidean Distance -Manhattan Distance -Cosine Similarity

Cluster Analysis Characteristics

-Unsupervised -No labels for clusters -No 'correct' clustering

Association analysis steps

-create item sets -identify frequent item sets -generate rules
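
The three steps above can be sketched with a brute-force enumeration. This is only a minimal illustration on invented data: it is not Apriori or FP Growth, and the thresholds `min_support` and `min_confidence` are assumed values chosen for the example.

```python
from itertools import combinations

# Hypothetical toy transactions and thresholds (assumptions for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
min_support = 0.5
min_confidence = 0.6

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 1: create item sets (here, every non-empty combination of items).
items = sorted(set().union(*transactions))
candidates = [
    frozenset(combo)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
]

# Step 2: identify frequent item sets (support at or above the threshold).
frequent = [c for c in candidates if support(c) >= min_support]

# Step 3: generate rules X --> Y from each frequent item set,
# keeping only rules whose confidence meets the threshold.
rules = []
for itemset in frequent:
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            consequent = itemset - antecedent
            confidence = support(itemset) / support(antecedent)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
```

Enumerating every combination grows exponentially with the number of items, which is why the dedicated algorithms prune candidates instead.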

how to evaluate cluster results

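-find each sample's error (distance to its centroid) -square each error -sum the squared errors between all samples and their centroids -this sum is the WSSE (within-cluster sum of squared errors) -if WSSE1 < WSSE2 -> clustering 1 is numerically better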
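
A minimal sketch of the evaluation, assuming Euclidean distance as the error and two hand-made clusterings of the same invented samples. The function name `wsse` and all data are illustrative.

```python
import math

def wsse(samples, labels, centroids):
    """Sum of squared errors between each sample and its assigned centroid."""
    total = 0.0
    for sample, label in zip(samples, labels):
        error = math.dist(sample, centroids[label])  # error = distance to centroid
        total += error ** 2                          # square error
    return total

# Hypothetical 2-D samples.
samples = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]

# Clustering 1: two samples per cluster, centroids at the cluster means.
labels_1 = [0, 0, 1, 1]
centroids_1 = [(1.0, 2.0), (6.0, 5.0)]

# Clustering 2: one sample alone, the other three together.
labels_2 = [0, 1, 1, 1]
centroids_2 = [(1.0, 1.0), (13 / 3, 13 / 3)]

wsse_1 = wsse(samples, labels_1, centroids_1)
wsse_2 = wsse(samples, labels_2, centroids_2)
better = 1 if wsse_1 < wsse_2 else 2  # lower WSSE is numerically better
```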

key uses of rule confidence

-filter the rules generated from frequent item sets, keeping only the significant rules (those with high confidence)

When to stop iterating

-no changes to centroids -number of samples changing clusters falls below a threshold

K-Means algorithm steps

-select K initial centroids -assign each sample to the closest centroid -calculate each cluster's mean to find its new centroid -repeat the assignment and mean steps until a stopping criterion is met
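
The steps above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance, random samples as initial centroids, and "no sample changes cluster" as the stopping criterion. The function name, parameters, and data are all hypothetical.

```python
import math
import random

def k_means(samples, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Select K initial centroids (here: K distinct samples picked at random).
    centroids = rng.sample(samples, k)
    labels = None
    for _ in range(max_iters):
        # Assign each sample to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        # Stop when no sample changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its cluster members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical data: two well-separated groups of three points.
samples = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = k_means(samples, k=2)
```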

Association analysis characteristics

-unsupervised -usefulness of rules is subjective -need to determine how the rules will be used

residual distance in least squares method

vertical distance between a data point and the regression line; least squares minimizes the sum of the squared residuals
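
To make the residual concrete, here is a small sketch that fits a line to invented data with the standard closed-form least squares formulas for one input variable, then computes each residual and the sum of squared residuals. The data values are assumptions for illustration.

```python
# Hypothetical data points (x, y).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least squares fit for a single input variable.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Residual: observed y minus the y predicted by the regression line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Least squares picks the line that minimizes this quantity.
sum_squared_residuals = sum(r ** 2 for r in residuals)
```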

Manhattan Distance

A measure of travel through a grid system like navigating around the buildings and blocks of Manhattan, NYC via horizontal and vertical paths
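
In formula terms this is the sum of the absolute differences along each coordinate. A minimal sketch, with made-up points:

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between points a and b."""
    return sum(abs(x - y) for x, y in zip(a, b))

# 3 blocks horizontally plus 4 blocks vertically.
blocks = manhattan_distance((1, 2), (4, 6))
```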

cosine similarity

Cosine of the angle between vectors A and B.
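
A minimal sketch of the computation, using the usual dot-product form (dot product divided by the product of the vector lengths). The example vectors are invented, and the function assumes neither vector is all zeros.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity((1, 2), (2, 4))  # parallel vectors, close to 1
orthogonal = cosine_similarity((1, 0), (0, 1))      # perpendicular vectors, 0
```

Unlike the two distance measures, cosine similarity ignores vector length and compares direction only.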

Goal of regression analysis

Given input variables, predict a numerical output

Error in cluster analysis

distance between sample and centroid

rule confidence formula

conf(X --> Y) = supp(X ∪ Y) / supp(X)
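
A short sketch of the confidence formula on invented transactions. It also shows that confidence is not symmetric: swapping antecedent and consequent can change the value.

```python
# Hypothetical toy transactions; item names are invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """conf(X --> Y) = supp(X u Y) / supp(X)."""
    return support(antecedent | consequent) / support(antecedent)

beer_to_diapers = confidence({"beer"}, {"diapers"})   # 0.5 / 0.5
diapers_to_beer = confidence({"diapers"}, {"beer"})   # 0.5 / 0.75
```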

square error

error^2 (the squared distance between a sample and its centroid)

Issue with initial centroid

final clusters are sensitive to the initial centroids

Goal of association analysis

find rules to capture associations between items/events

cluster analysis goal

Organize similar items into groups (clusters). Differences between items within a cluster are minimized, while differences between items in different clusters are maximized.

Solution to initial centroid issue

run k-means multiple times with different initial centroids and choose the best result (e.g. the lowest WSSE)
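
One way to sketch this: run a small k-means with several random seeds and keep the run with the lowest WSSE. The k-means here is a bare-bones illustration under the same assumptions as any toy version (Euclidean distance, random samples as initial centroids). The number of restarts and the data are arbitrary choices for the example.

```python
import math
import random

def k_means(samples, k, seed, max_iters=100):
    rng = random.Random(seed)
    centroids = rng.sample(samples, k)  # different seed, different initial centroids
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:  # no sample changed cluster
            break
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

def wsse(samples, labels, centroids):
    """Sum of squared distances between samples and their centroids."""
    return sum(math.dist(s, centroids[lab]) ** 2 for s, lab in zip(samples, labels))

# Hypothetical data: two well-separated groups of three points.
samples = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

# Run k-means several times and record the WSSE of each run.
runs = []
for seed in range(10):
    centroids, labels = k_means(samples, k=2, seed=seed)
    runs.append((wsse(samples, labels, centroids), centroids, labels))

# Keep the run with the lowest WSSE.
best_wsse, best_centroids, best_labels = min(runs, key=lambda run: run[0])
```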

Euclidean Distance

The distance between two points measured along a straight line.
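
In formula terms this is the square root of the sum of squared coordinate differences. A minimal sketch, with made-up points:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between points a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A 3-4-5 right triangle: differences of 3 and 4 give a distance of 5.
straight_line = euclidean_distance((1, 2), (4, 6))
```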

X --> Y rule

X is the antecedent; Y is the consequent

