M1 : Principal Component Analysis (PCA)
Why is high variation important?
More variation in a predictor gives a model more information to work with when building an explanatory or predictive model
Example of the importance of decorrelating data
x1 = radio advertising (predictor), x2 = TV advertising (predictor), y = sales of product (response). If we have multicollinearity, it is hard to determine what effect radio advertising is having on sales of the product compared to what effect TV advertising is having.
How does PCA remove correlation within the data?
by changing the coordinate system
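A minimal sketch of that coordinate change in Python, on made-up 2D data (numpy/scikit-learn; the data and variable names are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=200)  # correlated with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(X)
print(pca.components_)  # rows are the new axis directions (unit vectors)

# Transforming the data = projecting the centered data onto those new axes
T = pca.transform(X)
Xc = X - X.mean(axis=0)
print(np.allclose(T, Xc @ pca.components_.T))  # True
```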
What is the significance of PCR?
by using PCR you can perform dimensionality reduction on a high-dimensional dataset and THEN fit a linear regression model to a smaller set of variables, while at the same time keeping most of the variability of the original predictors
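A sketch of PCR along those lines, chaining PCA with linear regression in a scikit-learn pipeline (the toy data and the choice of 3 components are assumptions for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))                      # 10 original predictors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

# PCR: scale, reduce 10 predictors to 3 components, then regress on those
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(X, y)

# Fraction of the original predictors' variability the 3 components keep
print(pcr.named_steps["pca"].explained_variance_ratio_.sum())
```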
What is the purpose of feature extraction?
dealing with: 1. high-dimensional data 2. highly correlated data
Multicollinearity
when multiple variables (factors) that influence a dependent variable (response) are themselves highly correlated with one another
What are the benefits of ranking coordinates?
reduces the effect of randomness - the earlier principal components are likely to be driven by a higher ratio of actual effects to random effects
Eigenvalue (lambda)
the factor by which a linear transformation scales its eigenvector, making it longer or shorter: Av = lambda v
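A tiny numeric illustration of that scaling, with a hand-picked matrix whose eigenvector and eigenvalue are known (illustrative values):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
v = np.array([1.0, 0.0])  # eigenvector of A with eigenvalue lambda = 2

print(A @ v)  # [2. 0.]: same direction as v, stretched by the factor lambda
```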
How does finding implied coefficients for original factors help us?
- gives an intuitive explanation for the model - in other words, the PCA model can be expressed over the original factor space, so an intuitive explanation in terms of those variables can be given
How do we interpret the new model in terms of the original factors?
- if we plug the transformation formula into the model for each t vector, we can find the implied coefficient aj for each of our original factors j
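A sketch of recovering those implied coefficients, assuming a PCA-plus-regression fit as above; with scikit-learn's layout (rows of components_ are the eigenvectors), the implied coefficients come out as components_.T @ b (the data here is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=100)

pca = PCA(n_components=3).fit(X)
T = pca.transform(X)                    # the new t vectors
b = LinearRegression().fit(T, y).coef_  # coefficients on the components

# Plug t = V^T (x - mean) back into the regression: implied a = V b
a = pca.components_.T @ b
print(a)  # one implied coefficient a_j per original factor j
```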
What are the goals of Principal Component Analysis?
1. Reduce the number of predictors we use 2. Eliminate the need for large sets of data 3. Remove correlation within the data 4. Rank coordinates by importance
How does PCA rank coordinates by importance?
By ranking the coordinate dimensions in order of the amount of variance in each
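In scikit-learn this ranking shows up directly as explained_variance_ratio_, which is sorted in decreasing order (toy data chosen to have unequal spread per dimension):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.1])

pca = PCA().fit(X)
# Each component captures at least as much variance as the next one
print(pca.explained_variance_ratio_)
```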
Eigenvalue Math Theory
Every value of lambda for which det(A - lambda I) = 0 is an eigenvalue of A - once we have an eigenvalue we can solve (A - lambda I)v = 0 for the corresponding eigenvector v
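A numeric check of that recipe, assuming a small symmetric matrix A (numpy's eig solves the same characteristic equation):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, V = np.linalg.eig(A)  # eigenvalues of A and their eigenvectors (columns)
for l, v in zip(lam, V.T):
    print(np.linalg.det(A - l * np.eye(2)))  # det(A - lambda I) ~ 0
    print(np.allclose(A @ v, l * v))         # and A v = lambda v holds
```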
What is a term for the first ranked coordinates?
The first n principal components
Eigenvector
If we start with some vector v and apply a linear transformation A to it, we end up with a vector that points along the exact same line, just scaled: Av = lambda v
is PCA only linear?
No - you can use kernels (kernel PCA) to capture nonlinear structure
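A sketch with scikit-learn's KernelPCA, using an RBF kernel on data whose structure is circular rather than linear (the dataset and gamma value are illustrative choices):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# An RBF kernel lets PCA pick up the nonlinear (circular) structure
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
T = kpca.fit_transform(X)
print(T[:5])  # components in the kernel-induced feature space
```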
Perfect Collinearity
one variable exactly determines the value of another - e.g., if you knew x1, you would know exactly what x2 would be
PCA Change of Coordinate System
PCA changes the coordinate system so that the new axes point along uncorrelated directions of the data, creating an uncorrelated scenario
Why is it important to uncorrelate data?
Performing PCA on the raw data produces linear combinations of the predictors that are uncorrelated - therefore it allows you to disentangle the effects different predictors have on the response
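A quick check of that decorrelation, reusing the radio/TV setup from the earlier example (toy data, names illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
radio = rng.normal(size=300)
tv = 0.8 * radio + 0.2 * rng.normal(size=300)  # correlated with radio
X = np.column_stack([radio, tv])

T = PCA().fit_transform(X)
print(np.corrcoef(X, rowvar=False))  # large off-diagonal: entangled effects
print(np.corrcoef(T, rowvar=False))  # off-diagonal ~0: disentangled
```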
What is PCR?
Principal Component Regression
What are Eigenvalues used for? (Math Theory)
The first step of PCA is to find all the eigenvectors of the matrix X^T X - we then multiply X by each eigenvector to find each of the principal components
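A from-scratch sketch of those two steps with numpy, assuming the data is centered first (as PCA requires):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)            # center each column

# Step 1: eigenvectors of X^T X (eigh, since X^T X is symmetric)
lam, V = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(lam)[::-1]      # largest eigenvalue first
V = V[:, order]

# Step 2: multiply X by each eigenvector to get the principal components
T = Xc @ V
print(T.shape)  # (100, 3): one column per principal component
```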
What are Eigenvalues used for? (Principle)
PCA uses the properties of eigenvectors and eigenvalues to obtain the transformed set of coordinate directions and to ensure those directions are all orthogonal to each other