Clustering

How does single linkage clustering work?

- treat each point as a cluster, then repeatedly merge the two clusters with the smallest distance until a single cluster remains. You end up with a tree-structured dendrogram
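
A minimal sketch of this merge process using SciPy's hierarchical clustering (assuming SciPy is available; the toy points are made up for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])

# Single linkage: merge the two clusters whose closest points are nearest.
Z = linkage(X, method="single")  # Z encodes the dendrogram as a merge tree

# Cut the tree to get flat cluster labels, e.g. 2 clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 2 2]
```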

What happens when we make cluster assignments using argmax in EM?

- it becomes K-Means, because we pick the single best cluster for each point rather than assigning a probability to each cluster
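
A tiny illustration of this argmax step (the responsibility values below are hypothetical):

```python
import numpy as np

# Each row: one point's probability of belonging to each of 3 clusters.
responsibilities = np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.1, 0.8]])

hard_labels = responsibilities.argmax(axis=1)  # pick the single best cluster
print(hard_labels)  # [0 2]
```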

How can we implement the EM algorithm?

- using the Gaussian Mixture Model (GMM) algorithm, which models each cluster as a Gaussian with its own parameters (mean and standard deviation); to estimate these values we use Bayes' rule and the Gaussian density formula
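
A sketch of that Bayes-rule computation for a 1-D mixture (function and variable names are my own, not from the source):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, means, stds, weights):
    """P(cluster j | x) = w_j * N(x; mu_j, sigma_j) / sum_k w_k * N(x; mu_k, sigma_k)."""
    likelihoods = weights * norm.pdf(x, loc=means, scale=stds)  # numerator terms
    return likelihoods / likelihoods.sum()                      # normalize (Bayes' rule)

print(responsibilities(1.0,
                       means=np.array([0.0, 5.0]),
                       stds=np.array([1.0, 1.0]),
                       weights=np.array([0.5, 0.5])))
```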

Can K-Means get stuck?

- yes, K-Means can get stuck in local optima. The way to deal with this is to do random restarts, selecting initial centroids far apart, and then pick the solution that minimizes the SSE
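
One way to get random restarts in practice is scikit-learn's n_init parameter, which runs K-Means several times from different random centroids and keeps the run with the lowest SSE (assuming scikit-learn; the data here is a toy stand-in):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                   # toy data for illustration
km = KMeans(n_clusters=3, n_init=10).fit(X)  # best of 10 random restarts
print(km.inertia_)                           # SSE of the winning run
```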

Is K-Means sensitive to outliers?

- yes, because K-Means uses the mean of the cluster's data points to compute the centroids, and the mean is pulled toward outliers

What are some ways to terminate K-Means?

- after a fixed number of iterations
- when assignments of points do not change between iterations
- when the centroids do not change
- when the RSS falls below a threshold

When can K-Means fail to give good results?

- data with outliers
- clusters with different densities
- clusters with non-convex shapes

Clustering is affected by correlated features because

- they carry the same weight as any other feature, so the redundant information they share is effectively counted twice in the distance computation

As you increase K, will you always get a better likelihood of the data?

- False; the likelihood won't improve once K > N, since each point can already have its own cluster

What is the K-Means clustering algorithm?

1. Initialize k random centroids
2. Assign each point to the closest centroid
3. Update each centroid to the mean of its assigned points
4. Repeat steps 2 and 3 until convergence (the centroids stop moving)
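
A minimal NumPy sketch of these four steps (the helper name is my own, and it assumes no cluster ever goes empty):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (this sketch assumes no cluster ends up empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```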

In K-Means the training error does not increase as the number of iterations increases

- True; the error gets smaller and smaller (or stays the same) at each iteration

Can decision trees be used for clustering?

- Yes; you can custom-construct a tree to capture the probability distribution of the data and then prune the tree down to useful clusters. The resulting clusters divide the feature space into rectangular regions.

How is K-Means related to Hill climbing?

- because in K-Means, as in hill climbing, you move in the direction that reduces error. In K-Means, we only move the centroids if the error (total distance) gets smaller.

What are some assumptions made by K-Means?

- clusters are spherical in shape - clusters are similar in size

What are the properties of K-Means clustering?

- each iteration is polynomial: O(kn)
- the number of iterations is finite but can be exponential: O(k^n)
- the error always decreases, provided ties are broken consistently

Properties of EM

- each iteration monotonically does not decrease the likelihood (it never gets worse)
- in theory it does not converge (there are infinitely many possible parameter configurations), but in practice it does
- it does not diverge
- it works with any probability distribution

What can you do to K-Means to find clusters with arbitrary shapes?

- extend K-Means to a probabilistic model such as a GMM, which models points with Gaussian probability distributions

What tree structure is single-link clustering equivalent to?

- greedy algorithm for finding the minimum spanning tree

What can you do to improve K-Means performance?

- initialize centroids at different locations - adjust the number of iterations - find out the optimal number of clusters (k)

What are the advantages of using hierarchical clustering over K-Means, and vice versa?

- K-Means is often fairer, has a clear objective function, and favors circular cluster shapes
- hierarchical clustering produces many nested clusters, so it is slower, but there is no need to specify k, and it can learn long, skinny clusters that K-Means cannot

How do you choose k?

- run K-Means with different values of k and plot the mean distance (SSE, the squared distance from each point to its centroid) as a function of k, then observe the elbow point: the point at which you stop getting large reductions in SSE
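
A sketch of the elbow plot with scikit-learn and matplotlib (toy data; library availability assumed):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # toy data for illustration
ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("k")
plt.ylabel("SSE (inertia)")
plt.show()  # pick k at the elbow, where the curve stops dropping sharply
```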

EM Algorithm

1. Start with k randomly placed Gaussians (mean, variance)
2. E-step: soft-assign each point to the clusters
3. M-step: re-estimate each mean and variance to fit the new assignments
4. Iterate steps 2 and 3 until convergence
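
A compact NumPy sketch of this loop for a 1-D Gaussian mixture (names are my own; it runs a fixed number of iterations rather than testing convergence):

```python
import numpy as np

def em_gmm_1d(x, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Start with k randomly placed Gaussians.
    means = rng.choice(x, size=k, replace=False)
    stds = np.full(k, x.std())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # 2. E-step: soft-assign every point to every cluster (Bayes' rule).
        dens = weights * np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2) \
               / (stds * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)  # shape (n, k)
        # 3. M-step: re-estimate the parameters to fit the soft assignments.
        nk = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
        weights = nk / len(x)
    return means, stds, weights
```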

What makes K-Means fast?

- the main loop, which assigns points to clusters and recomputes the centroids until nothing changes, usually takes only a few iterations, and each iteration's complexity is O(k*n), where k = number of centroids and n = number of points

What is the objective function of K-Means?

- minimizes the sum of squared distances from points to their centroids; with Euclidean distance this is equivalent to minimizing the within-cluster variance
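
Written out as code, the objective is just (a hypothetical helper for illustration):

```python
import numpy as np

def sse(X, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return ((X - centroids[labels]) ** 2).sum()
```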

What is a way of finding the optimal k value for K-Means?

- try multiple values of k and select the k at the elbow point of the distortion measure (the point where further increases in k stop reducing it substantially)

What is the impossibility theorem? Does K-Means or EM suffer from this theorem?

The impossibility theorem states that no clustering method can have more than two of the following properties: richness, scale invariance, and consistency. K-Means and EM have richness and scale invariance, but not consistency. For example, if we shrink the distances between points inside a cluster, the algorithm will not necessarily produce the same clustering.

Which clustering algorithms suffer from convergence at local optima?

K-Means and EM, not Hierarchical Clustering

What is the goal of K-Means? What is the objective function trying to minimize?

K-Means tries to minimize the total intracluster variance or average distance error.

Compare K-Means and EM clustering algorithms?

K-Means:
- hard clustering: assigns each data point to a single cluster, so no overlapping clusters
- biased toward spherical clusters
- uses the mean to assign points to centroids
- minimizes the SSE

EM:
- soft clustering: assigns each point a probability of belonging to each cluster
- can model overlapping clusters
- calculates the likelihood of points being in a cluster based on a probability distribution
- maximizes the likelihood function

Both:
- can converge to a local optimum
- need to be initialized with a value of k
- are iterative algorithms that assign points to clusters

What type of clustering method does K-Means use? Hard or Soft clustering?

K-Means uses hard clustering to partition data. Each point must belong to only 1 cluster

Compare KNN and K-Means ?

KNN and K-Means both use similarity to assign points, to a class or to a cluster respectively; however, they are very different. KNN is a supervised learning algorithm that needs labels in order to make predictions. K-Means is an unsupervised learning algorithm that needs no labels, just the number of clusters k and the input data.
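
A side-by-side sketch in scikit-learn, showing that KNN needs labels while K-Means does not (toy data for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)       # toy features
y = (X[:, 0] > 0.5).astype(int)  # toy labels for the supervised case

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # supervised: needs y
km = KMeans(n_clusters=2, n_init=10).fit(X)          # unsupervised: only X and k
print(knn.predict(X[:3]), km.labels_[:3])
```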

For two runs of K-Means, do you expect the same clustering results?

No; K-Means convergence depends on the centroid initialization. It is suggested to run K-Means multiple times and keep the best solution.

Explain the complexity of K-Means

O(t*k*n*d): in each of t iterations, all n points are compared to the k centroids to find the nearest one, and each distance computation costs O(d), where d is the number of features.

Clustering properties: richness, scale invariance, and consistency. Do K-Means and EM have these properties?

Richness: for any assignment of points to clusters, there is some distance matrix D such that the clustering function P_D returns that clustering.
Scale invariance: scaling all distances by a positive value does not change the clustering.
Consistency: shrinking intracluster distances and expanding intercluster distances does not change the clustering.

Both K-Means and EM have richness and scale invariance but not consistency, in line with the impossibility theorem.

Single Link Clustering vs Complete and Average

Single-link clustering:
- distance between the closest points in the two clusters
- produces long chains
- ensures that nearby points end up in the same cluster

Complete-link clustering:
- distance between the furthest points in the two clusters
- forces spherical clusters

Average-link clustering:
- average distance between points in the two clusters
- less affected by outliers

The output of these clustering techniques is a dendrogram: a tree-structured diagram. A comparison sketch follows.
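
A quick way to compare the three merge rules is SciPy's linkage function on the same data (toy data; assuming SciPy):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.rand(20, 2)  # toy data
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)  # Z is the dendrogram encoded as a merge tree
    print(method, Z[-1, 2])        # distance at which the final merge happens
```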

When would you use Single-link clustering over K-Means and vice versa?

Single-link clustering can find non-circular shapes, unlike K-Means, and SLC has no need to specify k. K-Means is faster than SLC and fairer.

What is soft clustering?

Soft clustering assigns each point a probability of belonging to each cluster. It allows points to be shared by multiple clusters, so overlapping clusters are allowed.

What is the goal of EM? What is it trying to maximize?

The goal of EM is to estimate the parameters (the mean and the standard deviation of each cluster) that maximize the likelihood function of the data.

K-Means gives more weight to large clusters?

True, because it depends on the mean

Each iteration of EM algorithm increases the likelihood of the data, unless?

True, unless you are exactly at a local optimum.

Feature scaling is an important step before applying K-Means

True. It ensures that all features get the same weight in the clustering analysis.

Can you use EM to estimate values of missing data?

Yes, as long as you have some knowledge of the probability distribution

Does EM become K-Means as the variance becomes infinitely small?

Yes; as the variance approaches 0, the cluster membership probabilities become 0 or 1, and each data point is fully assigned to its nearest cluster, as in K-Means.
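
A numerical sketch of this limit: as a shared variance shrinks, the membership probabilities for a point between two means collapse to 0/1 (function and variable names are my own):

```python
import numpy as np

def resp(x, means, sigma):
    logp = -0.5 * ((x - means) / sigma) ** 2  # log-density up to a constant
    logp -= logp.max()                        # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum()

means = np.array([0.0, 1.0])
for sigma in (1.0, 0.1, 0.01):
    print(sigma, resp(0.4, means, sigma))  # second entry shrinks toward 0
```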

Suppose N data points are clustered using K-Means and a GMM. In both cases 5 clusters are found and the cluster centers are the same. Can 3 points that are in different clusters under K-Means be assigned to the same cluster under the GMM?

Yes, because the GMM softly (probabilistically) assigns each data point. So even if the cluster centers are identical, if the GMM mixture components have large variances (the components spread widely around their centers), points on the edges may all end up assigned to the same component.

What is the minimum number of features required to do clustering?

at least 1 feature.

What is Expectation Maximization (EM)?

alternating between soft cluster assignments and recomputing the means; it is like K-Means, but you assign each point a probability of being in each cluster. Another definition: instead of assigning points to clusters based on the mean alone, EM computes the probabilities of points belonging to clusters based on one or more probability distributions (Gaussians).

The highest point of the silhouette coefficient represents

the best choice of k clusters
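
A sketch of using the silhouette coefficient to pick k with scikit-learn (toy data; assuming scikit-learn):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(200, 2)  # toy data
for k in range(2, 7):       # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    print(k, silhouette_score(X, labels))  # pick k with the highest score
```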

Define K-Means

an unsupervised learning algorithm that takes unlabeled data as input and partitions it into k clusters based on feature similarity

Why would you use the median in single-linkage clustering?

when the value of the distance itself is not as important as its ordering

Can EM get stuck in local optima?

Yes; like K-Means, you will need to use random restarts, picking the best result that maximizes the likelihood function.

