CS 3654: Quiz 5: PCA and Clustering
If points (0,3), (2,1), and (-2,2) are the only points assigned to a cluster, what is the centroid for this cluster?
(0,2)
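The centroid is the coordinate-wise mean of the cluster's points. A minimal sketch to check the arithmetic (the helper name `centroid` is only illustrative):

```python
def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    # zip(*points) groups all x values together, then all y values, and so on.
    return tuple(sum(coord) / n for coord in zip(*points))

cluster = [(0, 3), (2, 1), (-2, 2)]
print(centroid(cluster))  # (0.0, 2.0)
```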
Which of the following can act as possible termination conditions in k-means fitting?
-A fixed number of iterations have passed
-Centroids do not change position between successive iterations
-The within-cluster variance falls below a pre-set threshold
-The assignment of items to clusters does not change between iterations
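A rough sketch of where each of these stopping conditions could sit in a Lloyd-style k-means loop. This is not the course's reference implementation: the function name `kmeans`, its parameters (`max_iters`, `tol`, `sse_threshold`, `seed`), and the choice to initialize from random data points are all assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, max_iters=100, tol=1e-9, sse_threshold=None, seed=0):
    """Lloyd-style k-means on a list of tuples; returns (centroids, labels, reason)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    labels = None
    for _ in range(max_iters):  # condition: fixed number of iterations
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # condition: assignments did not change
            return centroids, labels, "assignments unchanged"
        labels = new_labels
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:  # condition: centroids did not move
            return centroids, labels, "centroids stopped moving"
        # Sum of squared distances to assigned centroids (within-cluster variation).
        sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if sse_threshold is not None and sse <= sse_threshold:
            return centroids, labels, "within-cluster variation below threshold"
    return centroids, labels, "iteration budget exhausted"

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, reason = kmeans(points, k=2)
print(labels, reason)
```

Practical implementations usually combine several of these checks, with the iteration cap as a safety net.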
Which of the following statements are true about cluster analysis in general?
-Agglomerative clustering is an example of a distance-based clustering method
-Different clustering algorithms are able to detect clusters of different sizes and shapes
-In order to perform cluster analysis, we need to have a similarity or distance measure between objects
-Cluster analysis doesn't require labeled training data; the goal is to find the structure in the data
Feature normalization is an important step to perform before running most clustering algorithms. What is the reason behind this?
-All features will have approximately equal influence on distance calculations
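A small sketch of why this matters, using made-up data (height in meters, income in dollars; both the values and the helper name `zscore_columns` are hypothetical). Before scaling, Euclidean distance is driven almost entirely by the income column; after z-scoring, both columns contribute on the same scale. The sketch assumes no column is constant, since that would give a standard deviation of zero.

```python
import math
import statistics

def zscore_columns(rows):
    """Z-score each column: subtract its mean, divide by its population std dev."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # assumes no constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

# Hypothetical data: (height in meters, income in dollars)
rows = [(1.6, 30000), (1.7, 60000), (1.8, 90000)]
scaled = zscore_columns(rows)

# Raw distance is dominated by the income column.
print(math.dist(rows[0], rows[1]))
# After scaling, both features influence the distance comparably.
print(math.dist(scaled[0], scaled[1]))
```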
Which of the following are reasons to use PCA for dimensionality reduction?
-Reducing the data dimensionality to 2D or 3D can give you an intuition of the shape of the data by plotting.
Which of the following statements about Principal Component Analysis are true?
-The number of possible principal components is equal to the number of input dimensions.
-We should z-score normalize data dimensions prior to running PCA
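To make the first statement concrete, here is a minimal PCA sketch restricted to 2D data, where the eigen-decomposition of the 2x2 covariance matrix has a closed form. Two input dimensions give exactly two principal components. The function name `pca_2d` is illustrative, and the sketch only mean-centers the data; z-scoring would be applied beforehand when the features are on different scales.

```python
import math

def pca_2d(points):
    """PCA for 2D points; returns (eigenvalues, components), largest variance first."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    if b == 0:
        # Features are uncorrelated: the coordinate axes are the components.
        if a >= c:
            return [a, c], [(1.0, 0.0), (0.0, 1.0)]
        return [c, a], [(0.0, 1.0), (1.0, 0.0)]
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half_gap, mid - half_gap]
    components = []
    for lam in eigenvalues:
        vx, vy = lam - c, b  # eigenvector of the symmetric 2x2 matrix for lam
        norm = math.hypot(vx, vy)
        components.append((vx / norm, vy / norm))
    return eigenvalues, components

eigenvalues, components = pca_2d([(0, 0), (1, 1), (2, 2), (3, 3)])
print(len(components))                    # 2
print(eigenvalues[0] / sum(eigenvalues))  # share of variance on the first component
```

For points lying on a line, as in this toy input, essentially all of the variance falls on the first component, which is the situation where dropping the second component loses nothing.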
Which of these are true about the k-means algorithm we discussed in class?
-k-means always selects exactly k clusters
-k-means can produce different clusterings for the same data and same k
The choice of k, the number of clusters to partition a set into:
...depends on why and how you are clustering the data.
Which of the following is the best choice for the number of clusters given the plot below?
6
Are two runs of k-means clustering expected to yield the same clustering results?
No
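The reason is that the result depends on the initial centroids, which are typically chosen at random. A small sketch with the starting centroids passed in explicitly, so the effect is reproducible (the function name `lloyd` and the four-point dataset are made up for illustration):

```python
import math

def lloyd(points, centroids, max_iters=100):
    """Run k-means from the given starting centroids; return the final labels."""
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            )
        centroids = updated
    return labels

# Corners of a wide rectangle, k = 2
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
run_a = lloyd(points, [(0, 0), (4, 0)])  # starts split left/right
run_b = lloyd(points, [(0, 0), (0, 1)])  # starts split bottom/top
print(run_a)  # [0, 0, 1, 1]
print(run_b)  # [0, 1, 0, 1]
```

Both runs converge, but to different partitions: the second one is stuck in a worse local optimum. This is why k-means is often run several times with different initializations, keeping the result with the lowest within-cluster variation.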
As the k-means algorithm runs, we currently have 3 centroids (0,1), (2,1), and (-1,2). Will points (2,3) and (2,0.5) be assigned to the same cluster in the next iteration?
Yes
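Each point goes to its nearest centroid, assuming Euclidean distance. Point (2,3) is at distance 2 from (2,1), versus about 2.83 from (0,1) and about 3.16 from (-1,2); point (2,0.5) is at distance 0.5 from (2,1). A short check (the helper name `nearest` is illustrative):

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

centroids = [(0, 1), (2, 1), (-1, 2)]
print(nearest((2, 3), centroids), nearest((2, 0.5), centroids))  # 1 1
```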
