MIS 441: Clustering and K-means Clustering

stopping criteria for K-means clustering

1) object partition does not change, 2) centroid positions don't change, 3) a fixed number of iterations run

steps to K-means clustering (6)

1) pick initial centroids, 2) assign each point to the cluster with the closest centroid, 3) compute centroids, 4) reassign points to clusters, 5) recompute centroids, 6) repeat until convergence
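The steps and stopping criteria above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, its parameters (`max_iter`, `seed`), and the sample points are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick K initial centroids at random from the data.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):  # stopping criterion 3: fixed number of iterations
        # Steps 2 and 4: assign every point to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Steps 3 and 5: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Assumption: an empty cluster simply keeps its old centroid.
                new_centroids.append(centroids[c])
        # Step 6: stop when the partition (criterion 1) or the
        # centroid positions (criterion 2) no longer change.
        converged = new_labels == labels or new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
```

With data this well separated, the first three points end up in one cluster and the last three in the other, whichever two points are drawn as initial centroids.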

________ is the most commonly used example of partitional clustering

K-means

partitional clustering definition

a division of data objects into non-overlapping subsets; each data object is in exactly one subset

hierarchical clustering definition

a set of nested clusters organized as a hierarchical tree; clusters are nested inside larger ones until only one remains

to assess inter-cluster similarity...

calculate the distance between centroids

each cluster is associated with a randomly chosen ________, the number of which is determined by ______

centroid, K

each point is assigned to the cluster with the __________

closest centroid

when a K-means clustering ________, it reaches a state where the clusters remain unchanged

converges

limitations of K-means clustering

it struggles with clusters of differing sizes, differing densities, and non-globular shapes

the value of K (the number of clusters) depends on...

domain knowledge, software/hardware constraints

cluster analysis is an __________ data analysis tool used to sort objects into groups

exploratory

clustering breaks large __________ populations into smaller _______ groups

heterogeneous, homogeneous

a good clustering produces high quality clusters where intra-cluster similarity is _____ while inter-cluster similarity is _____

high, low

you can reduce the sum of squared error by...

increasing K

solution to the limitations of K-means clustering

increasing K (using more clusters)

DBI (Davies-Bouldin index) is an index calculated from...

inter-cluster distances for pairs of clusters and intra-cluster distances of all clusters; lower = better
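As a rough sketch of how such an index combines the two kinds of distance, the snippet below follows the usual Davies-Bouldin formulation: each cluster's scatter is the mean distance of its members to the centroid, each pair of clusters is scored as (scatter_i + scatter_j) / centroid distance, and the index averages each cluster's worst pairing. The function name `davies_bouldin` and the toy clusters are assumptions for illustration.

```python
import math

def davies_bouldin(clusters):
    """Illustrative Davies-Bouldin index for a list of clusters (lists of tuples)."""
    # Centroid of each cluster: the per-axis mean of its members.
    centroids = [
        tuple(sum(axis) / len(c) for axis in zip(*c)) for c in clusters
    ]
    # Intra-cluster scatter: mean distance from members to their centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    # For each cluster, keep its worst (largest) ratio against any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k

# Hypothetical clusterings: same cluster shapes, different separation.
far_apart = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_by = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
dbi_far = davies_bouldin(far_apart)
dbi_close = davies_bouldin(close_by)
```

The better-separated clustering gets the lower index, which matches the "lower = better" reading.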

______________ distances should be minimized while ___________ distances should be maximized

intra-cluster, inter-cluster

the quality of a clustering depends on...

the object representation and the similarity measure used

clustering definition

the process of grouping a set of objects into classes of similar objects

external criteria for evaluating clusters include...

purity, rand index

purity definition

ratio between the number of objects in the cluster's dominant class and the size of the cluster as a whole
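A small sketch of the purity of a single cluster, assuming the true class of every member is known. The helper name `cluster_purity` and the example labels are hypothetical.

```python
from collections import Counter

def cluster_purity(true_classes):
    """Share of a cluster's members that belong to its dominant class."""
    counts = Counter(true_classes)
    dominant_count = counts.most_common(1)[0][1]
    return dominant_count / len(true_classes)

# Hypothetical cluster: three members of class "a", one of class "b".
purity = cluster_purity(["a", "a", "a", "b"])
```

Here the dominant class covers 3 of 4 members, so the purity is 0.75.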

rand index definition

ratio between the number of sample pairs that are clustered correctly (grouped together or kept apart in agreement with the true classes) and the total number of sample pairs
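In its standard form, the Rand index is counted over pairs of samples: a pair is "right" when the clustering and the true classes agree on whether the two samples belong together. The sketch below assumes that pairwise form; the function name `rand_index` and the label lists are made up for the example.

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Hypothetical labels: the clustering wrongly puts sample 2 in cluster 0.
true_labels = [0, 0, 1, 1]
pred_labels = [0, 0, 0, 1]
ri = rand_index(true_labels, pred_labels)
```

Of the six pairs, three are treated the same way by both labelings, so the index is 0.5; a clustering identical to the true classes scores 1.0.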

supervised classification, simple segmentation, and query results are...

not cluster analysis

to assess intra-cluster similarity, use the...

sum of squared error (SSE); the clustering with the smaller error is better
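A minimal sketch of the sum of squared error: for every point, take the squared distance to the centroid of its assigned cluster and add the results up. The function name `sse` and the four sample points are assumptions for illustration.

```python
def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((a - b) ** 2 for a, b in zip(point, centroid))
    return total

# Hypothetical data: two pairs of points, far apart horizontally.
points = [(1.0, 1.0), (1.0, 3.0), (7.0, 1.0), (7.0, 3.0)]

# K = 1: every point belongs to one cluster centered at the overall mean.
sse_k1 = sse(points, [0, 0, 0, 0], [(4.0, 2.0)])

# K = 2: one cluster per pair, each centered at its own mean.
sse_k2 = sse(points, [0, 0, 1, 1], [(1.0, 2.0), (7.0, 2.0)])
```

On this data the error drops from 40.0 with one cluster to 4.0 with two, which also illustrates why increasing K reduces the sum of squared error.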

K stands for...

the number of clusters

clustering is an ________ data mining technique because...

undirected; it identifies hidden patterns and structures without a prior hypothesis, and it discovers structure without explaining it

clustering is the most common form of _______________

unsupervised learning

