Module 7: Clustering Quiz
Considering the K-means algorithm, if points (-1, 3), (-3, 1), and (-2, -1) are the only points currently assigned to the first cluster, what is the new centroid for this cluster? (1, 2) (-2, 1) (-2, 2) (0, 0)
(-2, 1)
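The update step just averages the assigned points; a minimal numpy sketch (numpy is not part of the quiz, used here only to check the arithmetic):

```python
import numpy as np

# Points currently assigned to the first cluster.
points = np.array([[-1, 3], [-3, 1], [-2, -1]])

# The K-means update step moves the centroid to the mean of its points.
centroid = points.mean(axis=0)
print(centroid)  # [-2.  1.]
```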
What is the minimum number of variables/features required to perform clustering? > 10 1 0 2
1
What is a dendrogram? A graph structure A hierarchical structure A diagram structure
A hierarchical structure
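A dendrogram is the merge tree that hierarchical clustering builds; a minimal sketch using scipy and matplotlib (illustrative toy data, not from the quiz):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Toy data: five 2-D points (made up for illustration).
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])

# Agglomerative clustering with Ward linkage; the result is a merge tree.
Z = linkage(X, method="ward")

# The dendrogram draws that tree: leaves are points, heights are merge distances.
dendrogram(Z)
plt.show()
```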
Which of the following is required by K-means clustering? number of clusters initial guess as to cluster centroids defined distance metric All of these
All of these
Why do we need Cluster Validity? To avoid finding patterns in noise. To compare two sets of clusters. All of the above. To compare clustering algorithms.
All of the above.
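One common validity measure is the silhouette score; a minimal sketch using scikit-learn (the synthetic data and library are assumptions, not part of the quiz) that uses it to compare clusterings with different k:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 3 true clusters (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
# A higher silhouette indicates better-separated clusters, which lets us
# compare clusterings and guard against "structure" that is really noise.
```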
Which of the following statements are true? Clustering analysis has a wide range of applications in tasks such as data summarization, dynamic trend detection, multimedia analysis, and biological network analysis. It is impossible to cluster objects in a data stream. We must have all the data objects that we need to cluster ready before clustering can be performed. When clustering, we want to put two dissimilar data objects into the same cluster. Clustering analysis is supervised learning since it does not require labeled training data.
Clustering analysis has a wide range of applications in tasks such as data summarization, dynamic trend detection, multimedia analysis, and biological network analysis.
Which of the following methods is used for finding the optimal number of clusters in the K-means algorithm? All of the above Euclidean method Manhattan method Elbow method
Elbow method
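A minimal sketch of the elbow method using scikit-learn (the synthetic data and library are assumptions for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 true clusters (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit K-means for a range of k and record the inertia (within-cluster SSE).
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
# Plotting inertia against k shows a sharp drop up to k=3 and then a flat
# tail; the "elbow" heuristic picks k at that bend.
```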
What is the most commonly used measure of similarity (or its quadratic form)? Euclidean distance Chebyshev's distance Manhattan distance City-block distance
Euclidean distance
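For comparison, the candidate distance measures side by side, using scipy (an assumption; note that Manhattan and city-block are two names for the same metric):

```python
from scipy.spatial.distance import euclidean, cityblock, chebyshev

a, b = [0, 0], [3, 4]

print(euclidean(a, b))   # 5.0 -- straight-line distance; its quadratic form avoids the sqrt
print(cityblock(a, b))   # 7   -- Manhattan / city-block distance (same measure, two names)
print(chebyshev(a, b))   # 4   -- maximum coordinate difference
```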
The k-NN algorithm is a clustering algorithm.
False
For which of the following tasks might clustering be a suitable approach? Given sales data from a large number of products in a supermarket, estimate future sales for each of these products. Given historical weather records, predict if tomorrow's weather will be sunny or rainy. Given a database of information about your users, automatically group them into different market segments. From the user's usage patterns on a website, predict user engagement patterns.
Given a database of information about your users, automatically group them into different market segments.
Which of the following is INCORRECT? K-means clustering is a vector quantization method. K-nearest neighbor is the same as K-means. None K-means clustering tries to group n observations into K clusters.
K-nearest neighbor is the same as K-means.
Feature scaling is an important step before applying the K-means algorithm. What is the reason behind this? Feature scaling has no effect on the final clustering. None of the above. In Manhattan distance it is an important step, but in Euclidean it is not. Without feature scaling, all features will have the same weight.
None of the above.
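Why scaling matters: with unscaled features, the feature with the larger numeric range dominates the Euclidean distance. A minimal scikit-learn sketch with made-up age/income data (both the data and the library are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: income (~1e5) dwarfs age (~1e1).
X = np.array([[25, 20_000], [30, 22_000], [55, 21_000], [60, 120_000]], dtype=float)

labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
labels_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)
print(labels_raw)     # grouping driven almost entirely by income
print(labels_scaled)  # after scaling, both features contribute comparably
```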
Which of the following statements about the K-means algorithm are correct? The K-means algorithm is sensitive to outliers. For different initializations, the K-means algorithm will definitely give the same clustering results. The K-means algorithm can detect non-convex clusters. The centroids in the K-means algorithm should be one of the observed data points.
The K-means algorithm is sensitive to outliers.
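Outlier sensitivity follows from the centroid being a mean; a quick numpy illustration with toy numbers:

```python
import numpy as np

cluster = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]])
print(cluster.mean(axis=0))       # roughly [1.03, 0.97]

# A single far-away outlier drags the mean (and hence the centroid) a long way.
with_outlier = np.vstack([cluster, [50.0, 50.0]])
print(with_outlier.mean(axis=0))  # roughly [13.3, 13.2]
```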
In k-NN, what will happen when you increase/decrease the value of k? Smoothness of the decision boundary doesn't depend on the value of K The decision boundary becomes smoother with increasing value of K The decision boundary becomes smoother with decreasing value of K None of these
The decision boundary becomes smoother with increasing value of K
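A quick way to see the smoothing effect, sketched with scikit-learn on synthetic data (both are assumptions, not from the quiz):

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# A small k fits every local wiggle; a large k averages over many
# neighbors, so the decision boundary gets smoother.
for k in (1, 15, 99):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, round(clf.score(X, y), 2))  # training accuracy falls as the boundary smooths
```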
What is WRONG about some common considerations and requirements for clustering? We need to consider whether we want a partitional or hierarchical clustering scheme. We need to be able to handle a mixture of different types of attributes (e.g., numerical, categorical). In order to perform cluster analysis, we need to have a similarity measure between data objects. We must know the number of output clusters a priori for all clustering algorithms.
We must know the number of output clusters a priori for all clustering algorithms.
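As a counterexample to the wrong statement, density-based methods such as DBSCAN discover the number of clusters rather than taking it as input; a minimal scikit-learn sketch (synthetic data and parameter values are illustrative assumptions):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# DBSCAN needs a neighborhood radius and a density threshold, but no k:
# the number of clusters falls out of the density structure of the data.
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(sorted(set(labels)))  # discovered cluster ids; -1 marks noise points
```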
Considering the K-means algorithm, after the current iteration we have 3 centroids: (0, 1), (2, 1), (-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?
Yes
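Checking the assignment step directly with numpy (a sketch of the nearest-centroid rule; numpy is an assumption):

```python
import numpy as np

centroids = np.array([[0, 1], [2, 1], [-1, 2]])

for p in np.array([[2, 3], [2, 0.5]]):
    # Assignment step: each point goes to the nearest centroid (Euclidean).
    d = np.linalg.norm(centroids - p, axis=1)
    print(p, "-> centroid", centroids[d.argmin()], "distances:", d.round(2))
# Both points are closest to (2, 1), so yes, they land in the same cluster.
```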
Which of the following is finally produced by hierarchical clustering? a dendrogram showing how close things are to each other final estimate of cluster centroids none of these assignment of each point to clusters
a dendrogram showing how close things are to each other
The k-means algorithm: always converges to a clustering that minimizes the mean-square vector-representative distance. is typically done by hand, using paper and pencil. can converge to different final clusterings, depending on initial choice of representatives. should only be attempted by trained professionals.
can converge to different final clusterings, depending on initial choice of representatives.
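A sketch of the initialization sensitivity using scikit-learn (synthetic data; a single random start per run so each seed stands on its own):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.5, random_state=1)

# With one random start (n_init=1), different seeds can land in different
# local minima of the within-cluster sum of squares.
for seed in range(3):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    print(seed, round(km.inertia_, 1))
```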
The choice of k, the number of clusters to partition a set of vectors into, is a personal choice that shouldn't be discussed in public should be as small as possible should always be as large as your computer system can handle depends on why you are clustering the vectors
depends on why you are clustering the vectors
Collaborative filtering can be used to _____. none of the others recommend new products identify credit defaulters predict revenues
recommend new products
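A minimal sketch of user-based collaborative filtering (the rating matrix and the cosine-similarity weighting are illustrative assumptions, not from the quiz):

```python
import numpy as np

# Tiny user-item rating matrix (rows = users, cols = products); 0 = unrated.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# User-based CF: score user 0's unrated items by similarity-weighted ratings.
target = 0
sims = np.array([cosine(R[target], R[u]) for u in range(len(R))])
for item in np.where(R[target] == 0)[0]:
    rated = R[:, item] > 0
    score = sims[rated] @ R[rated, item] / sims[rated].sum()
    print("item", item, "predicted rating", round(score, 2))
# High-scoring unrated items become the recommended new products.
```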