ADDA Lect. 11: Clustering and Multidimensional Scaling
What are the possible rules for distance between clusters used in hierarchical clustering analysis?
- Single linkage/nearest neighbor: nearest neighbors across clusters are linked to each other.
- Complete linkage/furthest neighbor: furthest neighbors across clusters are linked to each other.
- Average linkage within groups: average distance among all members of the new (merged) group.
- Average linkage between groups: average distance over all pairs across clusters.
- Ward's method: a variance-based method. Distance is the sum of squares between two clusters, summed over all variables. The within-cluster sum of squares is then minimized.
- Centroid method: distance between the means of all variables.
- Median method: similar to centroid, but small clusters are weighted equally with large clusters.
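The three pairwise rules above (single, complete, average between groups) can be sketched as follows. This is a minimal illustration, not SPSS's implementation; the distance matrix and the function names are hypothetical.

```python
from itertools import product

def cross_distances(dist, cluster_a, cluster_b):
    """All distances between one member of cluster_a and one member of cluster_b."""
    return [dist[i][j] for i, j in product(cluster_a, cluster_b)]

def single_linkage(dist, a, b):
    # Nearest neighbour: smallest distance across the two clusters.
    return min(cross_distances(dist, a, b))

def complete_linkage(dist, a, b):
    # Furthest neighbour: largest distance across the two clusters.
    return max(cross_distances(dist, a, b))

def average_linkage_between(dist, a, b):
    # Average distance over all pairs across the two clusters.
    pairs = cross_distances(dist, a, b)
    return sum(pairs) / len(pairs)

# Hypothetical symmetric distance matrix for four objects (indices 0..3).
dist = [
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 5.0, 9.0],
    [6.0, 5.0, 0.0, 4.0],
    [10.0, 9.0, 4.0, 0.0],
]
a, b = [0, 1], [2, 3]
print(single_linkage(dist, a, b))           # 5.0
print(complete_linkage(dist, a, b))         # 10.0
print(average_linkage_between(dist, a, b))  # 7.5
```

The same two clusters are 5.0, 10.0 or 7.5 apart depending on the rule, which is why the choice of linkage changes the clusters that form.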
What two procedures does SPSS use to move data points in MDS and which is better?
1. ALSCAL uses the 'method of steepest descent' 2. PROXSCAL uses 'iterative majorization' • The latter is better: it doesn't need a good starting position and it always converges to minimum stress
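To make 'steepest descent' concrete, here is a generic sketch that moves points downhill on raw stress (the sum of squared differences between fitted and target distances). It is not ALSCAL's or PROXSCAL's actual algorithm; the function names, learning rate, step count and the unit-square target are all assumptions made for illustration.

```python
import math
import random

def pairwise(points):
    """Euclidean distance between every pair of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def raw_stress(d, target):
    """Sum of squared differences between fitted and target distances."""
    n = len(d)
    return sum((d[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def steepest_descent_mds(target, dims=2, steps=500, rate=0.01, seed=0):
    rng = random.Random(seed)
    n = len(target)
    # Random starting configuration (the result can depend on this start).
    points = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    history = []
    for _ in range(steps):
        d = pairwise(points)
        history.append(raw_stress(d, target))
        moved = []
        for i in range(n):
            grad = [0.0] * dims
            for j in range(n):
                if i == j or d[i][j] == 0.0:
                    continue
                # Gradient of (d_ij - target_ij)^2 with respect to point i.
                coeff = 2.0 * (d[i][j] - target[i][j]) / d[i][j]
                for k in range(dims):
                    grad[k] += coeff * (points[i][k] - points[j][k])
            # Step against the gradient, i.e. in the direction of steepest descent.
            moved.append([points[i][k] - rate * grad[k] for k in range(dims)])
        points = moved
    history.append(raw_stress(pairwise(points), target))
    return points, history

# Hypothetical target: distances between the corners of a unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target = pairwise(corners)
points, history = steepest_descent_mds(target)
print(history[0], history[-1])  # stress at the start and at the end
```

Because each step only follows the local slope, a procedure of this kind can settle in a local minimum if the starting configuration is poor, which is the weakness the answer above attributes to steepest descent.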
Describe two measures for assessing good fit based on stress of data in space?
1. As a rule of thumb, values less than 0.15 indicate good fit 2. If our stress is less than that for random data ‐ accept the solution
7 steps of K-means clustering
1. Define the number of clusters k.
2. Set initial cluster means.
3. Find the squared Euclidean distance from each case to each cluster mean.
4. Allocate each case to the closest cluster.
5. Recalculate the mean of each cluster.
6. Find the new distances.
7. Reallocate cases. If no change, stop; otherwise go back to step 5.
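The seven steps can be sketched in a few lines. This is a minimal illustration under assumed data and starting means (both hypothetical), not the SPSS procedure.

```python
def squared_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(cases, initial_means, max_iter=100):
    means = [tuple(m) for m in initial_means]  # steps 1-2: k and the initial means
    labels = None
    for _ in range(max_iter):
        # Steps 3-4 (and 6-7 on later passes): allocate each case to the closest mean.
        new_labels = [
            min(range(len(means)), key=lambda c: squared_euclidean(case, means[c]))
            for case in cases
        ]
        if new_labels == labels:  # no case changed cluster: stop
            break
        labels = new_labels
        # Step 5: recalculate the mean of each cluster.
        for c in range(len(means)):
            members = [case for case, lab in zip(cases, labels) if lab == c]
            if members:  # keep the old mean if a cluster ends up empty
                means[c] = mean_point(members)
    return labels, means

# Hypothetical cases in two dimensions, with k = 2 starting means.
cases = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, means = k_means(cases, initial_means=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(means)
```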
What are the distance measures used in hierarchical cluster analysis?
1. Euclidean Distance 2. Block 3. Minkowski‐r 4. Squared Euclidean Distance 5. Power
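A short sketch of the five measures follows. It assumes the usual definitions: Minkowski-r is the r-th root of the summed absolute differences raised to the power r, and Power is the r-th root of the summed absolute differences raised to the power p. The function names and the two score profiles are hypothetical.

```python
def minkowski(p, q, r):
    """r-th root of the sum of absolute differences raised to the power r."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

def euclidean(p, q):
    return minkowski(p, q, 2)

def block(p, q):
    """Sum of absolute differences (also called city-block or Manhattan)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def squared_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def power_distance(p, q, power, root):
    """Assumed form: root-th root of the sum of absolute differences to the given power."""
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1.0 / root)

# Hypothetical scores for two cases on two variables (differences of 3 and 4).
x, y = (1.0, 2.0), (4.0, 6.0)
print(euclidean(x, y))             # 5.0
print(block(x, y))                 # 7.0
print(squared_euclidean(x, y))     # 25.0
print(minkowski(x, y, 3))          # cube root of 91
print(power_distance(x, y, 2, 1))  # 25.0, same as squared Euclidean
```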
What are the three types of cluster analysis?
1. Hierarchical cluster analysis 2. k‐means cluster analysis 3. Two step cluster analysis (SPSS)
What are the two ways of interpreting MDS?
1. Subjective interpretation: Examine an MDS diagram to see what patterns are observable and how the elements group together based on your knowledge of the variables. 2. Statistical interpretation (Stalans, 1997): Regression of dimensional coordinates of an MDS solution against other variables.
Here is a distance matrix (see image) to be used in single link (nearest neighbour) clustering. The first step is to combine V2 and V3 into a single cluster. Under single link the distance from V1 is A. 14.8 B. 12.1 C. 9.5 D. Average of 9.5 and 12.1
B. 12.1
With MDS we are looking for A. Coordinates in 2 (or 1) dimensional space that reflect the distances in r-dimensional space B. Coordinates in 2 (or 1) dimensional space that are different to the distances in r-dimensional space C. Similar points to be far apart D. Dissimilar points to be close together
A. Coordinates in 2 (or 1) dimensional space that reflect the distances in r-dimensional space • If a distance is the largest in r dimensions we want it to be the largest in 2 dimensions • Or distances between the points in this space are in the SAME RANK ORDER (or as close as possible) as the size of the distances in the data.
A. Hierarchical cluster analysis can be used for clustering variables, while K-means clustering is only used for clustering cases. B. K-means clustering can be used for clustering variables, while hierarchical cluster analysis is only used for clustering cases. C. Hierarchical cluster analysis and K-means clustering can be used for clustering either variables or cases. D. None of the above
A. Hierarchical cluster analysis can be used for clustering variables, while K-means clustering is only used for clustering cases. NB: two-step is also only used for cases.
Stress is A. A measure of fit and should be maximised B. A measure of fit and should be minimised C. A complex mathematical procedure D. A rank order transformation
B. A measure of fit and should be minimised
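As a rough illustration of what is being minimised, the sketch below assumes Kruskal's Stress-1 form: the square root of the summed squared differences between fitted distances and their disparities (the monotonically transformed proximities), divided by the summed squared distances. The function name and the numbers are hypothetical, and SPSS procedures may report differently normalised variants.

```python
import math

def kruskal_stress(distances, disparities):
    """Stress-1 (assumed form): sqrt(sum((d - d_hat)^2) / sum(d^2))."""
    numerator = sum((d - d_hat) ** 2 for d, d_hat in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)

# Perfect fit: every fitted distance equals its disparity.
print(kruskal_stress([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0

# Hypothetical imperfect fit: one distance is off by 1.
print(kruskal_stress([3.0, 4.0], [3.0, 3.0]))  # about 0.2
```

Against the 0.15 rule of thumb, the second configuration (about 0.2) would not count as a good fit.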
In the simplest kind of MDS, Classical MDS (metric scaling): A. One proximity matrix, typically based on distances (dissimilarities), and interval (often ratio) data is used. B. One distance matrix, typically based on distances (dissimilarities), and interval (often ratio) data is used. C. One distance matrix, typically based on distances (dissimilarities), and ordinal data is used. D. One proximity matrix, typically based on proximity (similarities), and interval (often ratio) data is used.
A. One proximity matrix, typically based on distances (dissimilarities), and interval (often ratio) data is used. Not much used, factor or principal components analysis generally preferred.
Agglomerative clustering: A. Proceeds by combining the two closest clusters at every step. B. Requires an icicle plot or dendrogram. C. Allocates objects to nearest cluster after setting initial cluster means. D. Can only be used for clustering cases.
A. Proceeds by combining the two closest clusters at every step
Nearest neighbour method: A. Uses the smallest distance between any element of A and element of B as the definition of distance between clusters A and B B. Uses the mean distance between any element of A and element of B as the definition of distance between clusters A and B C. Uses the largest distance between any element of A and element of B as the definition of distance between clusters A and B D. Is called double link method
A. Uses the smallest distance between any element of A and element of B as the definition of distance between clusters A and B. Is also called the single link method.
Describe 5 advantages, and 2 disadvantages of the two-step clustering method?
Advantages
• Combines both hierarchical and k‐means.
• Handles outliers (stops them forming 'nuisance clusters').
• Allows for both categorical and continuous measures.
• The researcher can either set the number of clusters, or allow the program to determine the number of clusters.
• Has fancy output diagrams.
Disadvantages
• Only clusters cases.
• Cluster membership can depend on the order of cases in the data file, a particular problem for small data sets.
MDS transforms ordinal proximity into distance data. What are the main assumptions that must hold for this?
Assumes that the relationship between proximity data and derived distances is smooth
What is the criterion for average link between groups method and what is the effect of this method?
Average distance over all pairs across clusters. A compromise between the single linkage and complete linkage variations.
When looking at Kruskal's stress measure: A. Low values = poor fit B. Low values = better fit C. Higher dimensions result in high stress D. More variables result in lower stress
B. Low values = better fit • Higher dimensions result in lower stress • More variables result in higher stress
An MDS solution does the following: A. Always produces a 2‐dimensional plot. B. Reduces dimensionality of the original data. C. Minimizes stress to produce a good fit. D. Uses a scree plot to determine the best number of variables.
B. Reduces dimensionality of the original data
In MDS each object (ie case or variable) is represented as a point in multidimensional space - so that: A. two similar objects are far apart. B. two similar objects are close together. C. two dissimilar objects are close together. D. MDS has nothing to do with distances
B. two similar objects are close together.
A. Interpreting an MDS requires an understanding of what the axes mean. B. Interpreting an MDS requires that the chart be a simplex or circumplex. C. Interpreting an MDS is often subjective. D. None of the above.
C. Interpreting an MDS is often subjective
A. Non metric MDS requires interval or ratio data. B. Classical MDS is preferred for ordinal data. C. Non‐metric MDS uses rank ordered distances. D. Classical MDS uses rank ordered distances.
C. Non‐metric MDS uses rank ordered distances.
In non-metric MDS: A. One distance matrix is used, and there is an assumption that only interval data is used. B. One proximity matrix is used, and there is an assumption that only interval data is used. C. One distance matrix is used, and there is an assumption that only ordinal data is used. D. None of the above
C. One distance matrix is used, and there is an assumption that only ordinal data is used. - most frequently used
In MDS: A. if in 2 dimensions, the objects can be represented on a line, and if in 1 dimension, the objects can be represented on a chart. B. if in 2 dimensions, the objects can be represented on a line, and if in 1 dimension, the objects can also be represented on a line. C. if in 2 dimensions, the objects can be represented on a chart, and if in 1 dimension, the objects can be represented on a line. D. if in 2 dimensions, the objects can be represented only in the real world, and if in 1 dimension, the objects can be represented on a line.
C. if in 2 dimensions, the objects can be represented on a chart, and if in 1 dimension, the objects can be represented on a line.
A three cluster solution for this data is A. {Amphetamine, Marijuana, Alcohol} B.{ Cigarettes}{Amphetamine, Marijuana, Alcohol} C.{ Cigarettes}{Amphetamine, Marijuana} {Alcohol} D.{ Cigarettes, Alcohol}{Amphetamine} {Marijuana}
C.{ Cigarettes}{Amphetamine, Marijuana} {Alcohol}
When is the two-step cluster analysis approach particularly problematic?
Cluster membership can depend on the order of cases in the data file - a particular problem for small data sets
As a rule of thumb, in MDS values less than 0.15 indicate good fit BUT: A. Higher dimensions result in lower stress B. More variables result in higher stress C. If our stress is less than that for random data ‐ accept the solution D. All of the above
D. All of the above
Block metric is A. The most commonly used method for cluster analysis. B. Based on Pythagoras's theorem, so is the squared distance between points. C. Is a correlational not a proximity measure. D. Is based on the absolute difference between pairs of scores.
D. Is based on the absolute difference between pairs of scores
Clustering is one of the simplest means of looking for: A. The average statistic among participants B. The maximum number of homogeneous groups C. The maximum number of heterogeneous groups D. Latent classes among participants
D. Latent classes among participants
A. Stress is a measure of how well a cluster analysis fits the data. B. Stress greater than 0.15 indicates that the solution has good fit. C. Higher dimensions result in higher stress. D. More variables result in higher stress.
D. More variables result in higher stress
In a proximity matrix: A. Numbers further away from zero indicate variables are closer together and therefore more likely to be part of the same cluster. B. Numbers further away from zero indicate variables are further apart and therefore more likely to be part of the same cluster. C. Numbers closer to zero indicate variables are further apart, and therefore more likely to be part of the same cluster. D. Numbers closer to zero indicate variables are closer together, and therefore more likely to be part of the same cluster.
D. Numbers closer to zero indicate variables are closer together, and therefore more likely to be part of the same cluster.
Kruskal devised a method A. of rank-order transformations where high values mean better fit B. of rank-order transformations where more variables mean lower stress C. of rank-order transformations called stress D. of rank-order transformations called monotone regression
D. of rank-order transformations called monotone regression
Monotone regression (Kruskal)
Kruskal devised a method of rank‐order transformation, called monotone regression, whereby the distances in the proximity matrix are converted into a rank ordering for the purposes of the MDS, so there is no need for assumptions about ratio scales. Fit is then judged by stress: low values = better fit.
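One common way to carry out a monotone (isotonic) regression is the pool-adjacent-violators algorithm, sketched below. It is an assumption that this matches the exact procedure used in any particular MDS program; the function name and data are hypothetical. The input is the list of fitted distances sorted by the rank of the corresponding proximities, and the output is the closest non-decreasing sequence in the least-squares sense.

```python
def monotone_regression(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Hypothetical fitted distances, listed in order of increasing dissimilarity rank.
# The second and third are out of order, so they are pooled to their mean.
print(monotone_regression([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```

Pooled values of this kind show up as flat steps in a transformation plot.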
What is the criterion for Ward's method and what is the effect of this method?
Distance is the sum of squares between two clusters, summed over all variables; the within-cluster sum of squares is then minimised. Tends to combine clusters with small and equal numbers of data points.
What is the criterion for average link within groups method and what is the effect of this method?
Average distance among all members of the new (merged) group. Similar to complete linkage in that it produces tight clusters.
What measure of similarity between the variables do we use in hierarchical cluster analysis?
Distance scores - Don't use correlations because they assess similar variation, not similar scores.
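A small made-up example shows why correlation and distance answer different questions. The two score profiles below are hypothetical; they rise and fall together perfectly, yet their scores are far apart.

```python
import math

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical profiles: the second is the first shifted up by 10 points.
low = [1.0, 2.0, 3.0, 4.0, 5.0]
high = [11.0, 12.0, 13.0, 14.0, 15.0]

print(pearson(low, high))    # 1.0: identical pattern of variation
print(euclidean(low, high))  # about 22.4: very different scores
```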
What is a subjective interpretation of MDS?
Examine an MDS diagram to see what patterns are observable and how the elements group together based on your knowledge of the variables.
What is the criterion for complete linkage method and what is the effect of this method?
Furthest neighbours across clusters are linked. Produces tight clusters.
What clustering method would you use if you have outliers?
If outliers, complete linkage preferred (more stable than single)
In MDS when two variables are similar they appear as points______ in the space, and when two variables are very dissimilar they appear as points _______ in the space.
In MDS when two variables are similar they appear as points close together in the space, and when two variables are very dissimilar they appear as points very distant from each other in the space.
How do we determine how many clusters to use in hierarchical cluster analysis?
Largely a matter of interpretation and choice, you may have other theoretical or empirical reasons to expect a certain number of clusters. OR: • Can look at the Agglomeration schedule: Agglomeration coefficient tells us how alike the two clusters being joined at a particular step are. Choose a solution when the increase in the coefficient becomes large.
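The 'large increase in the coefficient' rule can be sketched as follows. This minimal illustration assumes single linkage and treats the merge distance as the agglomeration coefficient; SPSS's coefficient depends on the chosen method, and the data and function names here are hypothetical.

```python
def agglomeration_schedule(dist):
    """Single-linkage merges as (coefficient, clusters remaining after the merge)."""
    clusters = [[i] for i in range(len(dist))]
    schedule = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        schedule.append((d, len(clusters)))
    return schedule

def suggest_cluster_count(schedule):
    """Number of clusters present just before the largest jump in the coefficient."""
    jumps = [
        (schedule[k + 1][0] - schedule[k][0], schedule[k][1])
        for k in range(len(schedule) - 1)
    ]
    return max(jumps, key=lambda jump: jump[0])[1]

# Hypothetical one-dimensional scores forming two obvious groups.
scores = [1.0, 2.0, 3.0, 20.0, 21.0, 22.0]
dist = [[abs(p - q) for q in scores] for p in scores]

schedule = agglomeration_schedule(dist)
print([coefficient for coefficient, _ in schedule])  # [1.0, 1.0, 1.0, 1.0, 17.0]
print(suggest_cluster_count(schedule))               # 2
```

The coefficient stays at 1.0 until the final merge, where it jumps to 17.0, so the solution just before that merge (two clusters) is chosen.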
How are multidimensional scaling and cluster analysis similar?
Like cluster analysis, MDS is a measure of association between variables, analyzes distances (dissimilarities), and can analyze cases (individuals) or variables. Multidimensional scaling is older (1930s vs 1960s).
In hierarchical clustering analysis, what is the method for combining clusters whereby the distance between cluster A and B is defined as the smallest distance between any element (variable) of A and any element (variable) B?
Nearest neighbor rule (also called single link)
What is the criterion for single linkage method and what is the effect of this method?
Nearest neighbours across clusters are linked to each other. Produces large, sometimes straggly clusters.
Do the coordinate axes in an MDS have meaning?
No. - What is important is the position of each element relative to other elements. - Elements that are close together provide evidence that they tend to be associated.
What is Degeneracy in MDS and how do we check for it?
Points of the representation are located in a few tight clusters: • These clusters may be only a small part of the structure of the data but may swamp the interpretation, and the stress value may be close to 0. • Can check this with the aid of charts. Inspect the transformation plot: if it is reasonably smooth, then the solution is OK; if it has some obvious steps, then there may be problems.
K-means clustering
Produces reasonable numbers in clusters
What is a statistical interpretation of MDS?
Run Regression of dimensional coordinates of an MDS solution against other variables.
What is the calculation for working out distance in hierarchical clustering using Squared Euclidean Distance?
Takes the difference between each pair of scores, squares each difference to get rid of negative values, and sums the squares.
What is the goal of MDS?
The goal is dimension reduction • You start with (say) n variables, so you have an n‐dimensional space to begin with. • You want to "shrink" this to a mere 2 dimensions. • This simplification is intended to make the data more understandable. • A 1‐dimensional or 2‐dimensional solution is usually preferred because it can be visualized on a page
In which type of cluster analysis does the researcher not determine how many clusters will be produced in the final model?
The researcher does not have to determine how many clusters will be produced in two-step cluster analysis, which has inferential techniques to assist with decisions on the number of clusters.
When wouldn't you use Average or Ward's methods?
When there are outliers in the data
When a researcher is deciding on the method, what are they asking themselves?
Which is more important: The pattern of scores or the distance between the scores
What clustering method would you use if you have well behaved data?
With well behaved data, use average linkage.
How is k-means clustering different to hierarchical clustering?
Works on segments of the data and not hierarchies. For example, it is used in market research to 'segment' populations, e.g. those who are price conscious vs quality conscious.
In MDS what do we refer to the space as?
r-dimensional
What is the calculation for working out distance in hierarchical clustering using the block metric?
Takes the absolute difference between each pair of scores and sums the absolute differences.
What is the calculation for working out distance in hierarchical clustering using Euclidean Distance?
Takes the square root of the sum of the squared differences between each pair of scores.
What is the ideal number of dimensions in MDS?
• A 1‐dimensional or 2‐dimensional solution is usually preferred because it can be visualized on a page
What is cluster analysis?
• A simple approach to forming groups of variables or cases • Individuals or variables that are "similar" to one another are grouped into the same cluster. • Essentially an exploratory technique
What are the advantages of multidimensional scaling over cluster analysis?
• Clustering has weak or no model • Multidimensional scaling has explicit model
How is MDS different?
• Displays distance‐like data as a geometrical picture.
Is cluster analysis a confirmatory or exploratory technique?
• Essentially an exploratory technique • Individuals or variables that are "similar" to one another are grouped into the same cluster.
What is the main difference between the goal of classic MDS and modern non-metric MDS?
• In the classical form of multidimensional scaling, the function is a simple linear one (linear regression), so that: d_jk = a + b * r_jk
• The innovation underlying modern non-metric MDS is to replace the linear regression function with a rank‐ordered one.
• The task is to find a set of coordinates for points such that the distances between the points in this space are in the SAME RANK ORDER (or as close as possible) as the size of the distances in the data.
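The contrast between the two criteria can be sketched with made-up numbers. Below, the fitted distances are the squares of the proximities, so they keep the rank order exactly but do not lie on a straight line. The function names and data are hypothetical.

```python
def linear_fit(r, d):
    """Least-squares a and b for the classical form d = a + b * r."""
    mean_r, mean_d = sum(r) / len(r), sum(d) / len(d)
    s_rd = sum((x - mean_r) * (y - mean_d) for x, y in zip(r, d))
    s_rr = sum((x - mean_r) ** 2 for x in r)
    b = s_rd / s_rr
    return mean_d - b * mean_r, b

def same_rank_order(r, d):
    """True if d never decreases when the pairs are sorted by r (the non-metric criterion)."""
    order = sorted(range(len(r)), key=lambda k: r[k])
    ordered = [d[k] for k in order]
    return all(x <= y for x, y in zip(ordered, ordered[1:]))

# Hypothetical proximities and fitted distances (each distance is r squared).
r = [1.0, 2.0, 3.0, 4.0]
d = [1.0, 4.0, 9.0, 16.0]

a, b = linear_fit(r, d)
residuals = [y - (a + b * x) for x, y in zip(r, d)]
print(a, b)                   # -5.0 5.0
print(residuals)              # [1.0, -1.0, -1.0, 1.0]: the linear model misfits
print(same_rank_order(r, d))  # True: the rank order is fully preserved
```

A configuration like this is penalised under the linear criterion but counts as a perfect fit under the rank-order criterion.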
What are the two steps involved in the two step clustering method?
• In the first step cases are grouped into a reasonably large number of small sub‐clusters (by a technique involving cluster trees) • In the second step the sub‐clusters are clustered using a standard hierarchical agglomerative procedure (SPSS does not say which one) to produce the final clusters.
What are four methods for identifying regions in the space?
• Informal groups • Partitioning the space • Spatial manifolds: the simplex • Spatial manifolds: the circumplex
How are multidimensional scaling (MDS) and cluster analysis similar in terms of what they measure and how?
• Both are measures of association between variables and analyse distances (dissimilarities) • Both can analyse cases (individuals) or variables
What can we tell from a transformation plot in terms of monotone regression?
• Rank order has been retained because the plot is heading in the same direction - ie: monotonic increase (if it went up and came down, then that would be a problem) • There are steps, but not like a staircase, so will not be a problem
What measure of similarity between the variables do we use in hierarchical cluster analysis compared to EFA or CFA?
• We are using Distance scores • Unlike EFA and CFA that use correlations - they assess similar variation, not similar scores.
If we have a data set with outliers, which linkage rule would we use and why?
• Wouldn't use single linkage: Cheng & Milligan describe it as theoretically elegant but not superior.
• Ward's method and the average method have generally been shown to perform well, although not when there are outliers in the data.
• Cheng & Milligan suggest average linkage only with well-behaved data.
• Complete linkage should therefore be used with outliers.
• No method is always superior (Milligan, 1980).
With non-metric MDS what can you do with a likert scale?
• You can check that the distance between scale items is equal