STAT242: Principal Component Analysis
Outliers in PCA?
- PCA assumes there are no significant outliers, because outliers have a disproportionate influence on the results
- Outliers are component scores more than 3 standard deviations away from the mean
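A minimal sketch of the 3 standard deviation rule, assuming a covariance-based PCA computed with NumPy. The function name `flag_score_outliers`, the simulated data, and the planted extreme row are all hypothetical illustrations, not part of the course material.

```python
import numpy as np

def flag_score_outliers(X, threshold=3.0):
    """Flag rows whose score on any component lies more than
    `threshold` standard deviations from the mean score."""
    Xc = X - X.mean(axis=0)                  # centre each variable
    S = np.cov(Xc, rowvar=False)             # covariance matrix
    _, eigvecs = np.linalg.eigh(S)           # columns are the component directions
    scores = Xc @ eigvecs                    # component scores (mean zero)
    z = scores / scores.std(axis=0, ddof=1)  # scores in standard deviation units
    return (np.abs(z) > threshold).any(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))    # simulated data (assumption)
X[0] = [15.0, -15.0, 15.0]       # planted extreme case (assumption)
mask = flag_score_outliers(X)
print("Flagged rows:", np.flatnonzero(mask))
```

Flagged cases would normally be inspected before deciding whether to remove them.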
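Bartlett's test of sphericity
- P value smaller than 0.05 = the correlation matrix differs significantly from an identity matrix, so the variables are correlated enough for PCA (the test itself assumes multivariate normality)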
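As a rough sketch, the test statistic can be computed from the determinant of the correlation matrix using the usual chi-square approximation. The function name `bartlett_sphericity` and the simulated one-factor data are assumptions made for illustration. The example compares the statistic with 7.815, the 5% critical value of a chi-square distribution with 3 degrees of freedom; an exact p value could be obtained from a chi-square survival function such as `scipy.stats.chi2.sf`.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return (chi-square statistic, degrees of freedom) for the
    null hypothesis that the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)             # log of det(R), numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df

rng = np.random.default_rng(1)
common = rng.normal(size=300)                    # shared factor (assumption)
X = common[:, None] + 0.5 * rng.normal(size=(300, 3))
stat, df = bartlett_sphericity(X)
critical_5pct_df3 = 7.815                        # chi-square critical value for df = 3
print(f"chi2 = {stat:.1f}, df = {df}, reject sphericity: {stat > critical_5pct_df3}")
```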
What are eigenvalues in PCA?
- Each eigenvalue is the variance of its corresponding principal component
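A small numerical check of this, assuming simulated data and a covariance-based PCA done directly with NumPy: the sample variance of each column of component scores should match the eigenvalue for that component.

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated correlated data (assumption): random mixing of 4 normal variables.
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)

# eigh returns eigenvalues in ascending order; put the largest first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                   # principal component scores
score_var = scores.var(axis=0, ddof=1)  # variance of each component

print("Eigenvalues:    ", np.round(eigvals, 4))
print("Score variances:", np.round(score_var, 4))
```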
How many principal components can you have?
- You can have as many principal components as there are original variables
Principal component with reversed sign...?
- The component still measures the same aspect of the data, but in the opposite direction
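A short sketch of why the sign is arbitrary, using simulated data (an assumption): if v is an eigenvector of the covariance matrix, so is -v with the same eigenvalue, and the scores simply change sign while their variance stays the same.

```python
import numpy as np

rng = np.random.default_rng(7)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])     # arbitrary mixing matrix (assumption)
X = rng.normal(size=(100, 3)) @ mixing

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)

v = eigvecs[:, -1]        # first component (eigh sorts ascending, so take the last)
lam = eigvals[-1]
flipped = -v              # same component with reversed sign

scores = Xc @ v
scores_flipped = Xc @ flipped

print("Still an eigenvector:", np.allclose(S @ flipped, lam * flipped))
print("Scores just change sign:", np.allclose(scores_flipped, -scores))
```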
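If the original variables have very different variances, should you use a correlation or covariance matrix?
- Correlation matrix used - Standardises the variables so that those with the largest variances do not dominate the components - Leads to a more sensible analysis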
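To illustrate how much the choice of matrix matters when variances differ greatly, here is a sketch with three simulated, independent variables on very different scales. The variable names and scales are hypothetical. With the covariance matrix the first component is essentially just the variable with the largest variance; with the correlation matrix every variable gets equal weight.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
# Hypothetical variables with very different variances.
height_mm = rng.normal(1700, 100, n)   # variance around 10000
weight_kg = rng.normal(70, 10, n)      # variance around 100
ratio = rng.normal(0.5, 0.05, n)       # variance around 0.0025
X = np.column_stack([height_mm, weight_kg, ratio])

def first_component(M):
    """Return (share of total variance, loadings) for the first component of M."""
    eigvals, eigvecs = np.linalg.eigh(M)   # ascending order
    return eigvals[-1] / eigvals.sum(), eigvecs[:, -1]

share_cov, load_cov = first_component(np.cov(X, rowvar=False))
share_corr, load_corr = first_component(np.corrcoef(X, rowvar=False))

print(f"Covariance:  PC1 share = {share_cov:.3f}, loadings = {np.round(load_cov, 3)}")
print(f"Correlation: PC1 share = {share_corr:.3f}, loadings = {np.round(load_corr, 3)}")
```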
What kind of variables are used in PCA?
- Mostly continuous variables, but ordinal variables can also be used
What is the aim of principal component analysis?
- Data reduction technique
- Produces a small number of derived variables to be used in place of a larger number of original variables
- The derived variables are linear combinations of the original variables and are UNCORRELATED (they measure different dimensions in the data)
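A sketch of the reduction idea under simple assumptions: six simulated variables driven by two hypothetical latent dimensions are replaced by two component scores from a correlation-based PCA. The scores are linear combinations of the standardised variables, and their correlation should be zero up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300
# Two hypothetical latent dimensions drive six observed variables.
latent = rng.normal(size=(n, 2))
loadings = np.array([[1.0, 0.9, 0.8, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 0.9, 0.8]])
X = latent @ loadings + 0.3 * rng.normal(size=(n, 6))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised variables
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                    # number of components kept
scores = Z @ eigvecs[:, :k]              # derived variables (linear combinations)
score_corr = np.corrcoef(scores, rowvar=False)
retained = eigvals[:k].sum() / eigvals.sum()

print(f"6 variables reduced to {k}; share of variance retained = {retained:.2f}")
print("Correlation between the two components:", round(float(score_corr[0, 1]), 10))
```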
Sampling Adequacy needed
- Generally 5 to 10 cases per variable
Can you complete a principal component analysis if the original variables are uncorrelated?
- No, not usefully. Nothing is achieved by PCA if the original variables are not correlated, because there is no shared variance to summarise in fewer components
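This can be seen directly from the eigenvalues. The sketch below compares two illustrative correlation matrices for four variables: an identity matrix (no correlation at all) and one where every pair has a correlation of 0.8 (a value chosen purely as an assumption). With no correlation, every component carries the same share of variance, so none can stand in for the others.

```python
import numpy as np

p = 4
R_uncorrelated = np.eye(p)               # no correlation between variables
R_correlated = np.full((p, p), 0.8)      # every pair correlated at 0.8 (assumption)
np.fill_diagonal(R_correlated, 1.0)

def variance_shares(R):
    """Share of total variance carried by each component, largest first."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    return eigvals / eigvals.sum()

shares_uncorrelated = variance_shares(R_uncorrelated)
shares_correlated = variance_shares(R_correlated)

print("Uncorrelated:", np.round(shares_uncorrelated, 3))
print("Correlated:  ", np.round(shares_correlated, 3))
```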
What is the sum of the eigenvalues in PCA equal to?
- The sum of the principal component variances (the eigenvalues) is equal to the sum of the variances of the original variables
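A quick check on simulated data (an assumption): the eigenvalues of the covariance matrix add up to its trace, which is the sum of the original variances. For a correlation matrix the same total is simply the number of variables, since each standardised variable has variance 1.

```python
import numpy as np

rng = np.random.default_rng(5)
# Four simulated variables with different spreads (assumption).
X = rng.normal(size=(250, 4)) * np.array([1.0, 2.0, 3.0, 4.0])

S = np.cov(X, rowvar=False)
R = np.corrcoef(X, rowvar=False)

total_pc_variance = np.linalg.eigvalsh(S).sum()   # sum of eigenvalues
total_original_variance = np.trace(S)             # sum of original variances
total_from_correlation = np.linalg.eigvalsh(R).sum()

print(round(float(total_pc_variance), 6), round(float(total_original_variance), 6))
print("Correlation-matrix total:", round(float(total_from_correlation), 6))
```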
Scree plot
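- Plots the eigenvalues against the component number - The point where the curve levels off (the "elbow") suggests how many principal components to use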
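A rough text-only sketch of the values a scree plot displays, assuming six simulated variables driven by two hypothetical latent dimensions. A plotting library would normally draw this as a line chart; the bars here only stand in for that, and the sharp drop after the second component is the kind of elbow to look for.

```python
import numpy as np

rng = np.random.default_rng(21)
n = 300
# Two hypothetical latent dimensions behind six observed variables.
latent = rng.normal(size=(n, 2))
loadings = np.array([[1.0, 0.9, 0.8, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 0.9, 0.8]])
X = latent @ loadings + 0.3 * rng.normal(size=(n, 6))

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest eigenvalue first

# One bar per component; bar length is proportional to the eigenvalue.
for i, ev in enumerate(eigvals, start=1):
    bar = "#" * int(round(ev * 10))
    print(f"PC{i}: {ev:5.2f} {bar}")
```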
KMO statistic: what does it tell you?
- Measures sampling adequacy
- The overall value must be bigger than 0.5
- For individual variables (MSA): 0.9 is excellent, 0.8 is good, 0.5 is poor
- When adequacy is too low, the variable with the lowest MSA is removed and the analysis is repeated
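A minimal sketch of how the overall KMO and the per-variable MSA values can be computed, assuming the standard definition based on squared correlations and squared partial correlations (the partial correlations are taken from the inverse of the correlation matrix). The function name `kmo` and the simulated one-factor data are hypothetical.

```python
import numpy as np

def kmo(X):
    """Return (overall KMO, per-variable MSA) for a data matrix X."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)          # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off_diag, R ** 2, 0.0)       # squared correlations
    q2 = np.where(off_diag, partial ** 2, 0.0) # squared partial correlations
    msa_per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    msa_overall = r2.sum() / (r2.sum() + q2.sum())
    return msa_overall, msa_per_variable

rng = np.random.default_rng(9)
common = rng.normal(size=300)                  # shared factor (assumption)
X = common[:, None] + 0.5 * rng.normal(size=(300, 5))
msa_overall, msa_per_variable = kmo(X)

print(f"Overall KMO = {msa_overall:.2f}")
print("Per-variable MSA:", np.round(msa_per_variable, 2))
print("Lowest MSA is variable index", int(np.argmin(msa_per_variable)))
```

The index printed last identifies the candidate for removal if the adequacy were judged too low.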