DM Chapter 6: Multiple Linear Regression
Be able to compare (similarities and differences) explanatory modeling vs predictive modeling.
- BOTH involve using a dataset to fit a model, checking model validity, assessing its performance, and comparing it to other models.
- Explanatory modeling FOCUSES on the average record, trying to fit the best model to the data in an attempt to learn about the underlying relationship. --> small sample, few variables --> retrospective (looking back)
- In predictive modeling, the GOAL is to find the regression model that BEST predicts new individual records. --> large sample, many variables --> prospective (how to deploy in the future)
Stochastic View vs. Deterministic View
- The Stochastic View in multiple linear regression is that we need an error variable because the predictors (the x's) don't perfectly predict y.
- The Deterministic View in multiple linear regression is that the x's perfectly predict y (no error term).
The traditional statistical use of regression modeling is (1) ______________________ modeling while data mining focuses on (2) ______________________ modeling.
1) Explanatory 2) Predictive
What are three other methods for finding the best subset which are reasonable for situations with large numbers of predictors?
1) Forward Selection 2) Backward Selection 3) Stepwise Selection
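A minimal sketch of forward and backward selection using scikit-learn's SequentialFeatureSelector (the synthetic dataset and the number of features to keep are illustrative assumptions):

```python
# Sketch: forward and backward subset selection with scikit-learn.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=4, random_state=1)

# Forward selection: start empty, add the predictor that helps most at each step.
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="forward"
).fit(X, y)

# Backward selection: start with all predictors, drop the least useful one each step.
backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="backward"
).fit(X, y)

print("forward keeps: ", np.flatnonzero(forward.get_support()))
print("backward keeps:", np.flatnonzero(backward.get_support()))
```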
What is the rule of thumb when splitting data for partitioning?
60% Training Data 40% Validation Data
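A quick sketch of the 60/40 rule of thumb using scikit-learn's train_test_split (the data here are synthetic, just for illustration):

```python
# Sketch: 60% training / 40% validation partition.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=5, random_state=1)

# test_size=0.4 reserves 40% of the records for the validation set.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.4, random_state=1)
print(X_train.shape, X_valid.shape)  # (60, 5) (40, 5)
```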
When selecting variables in multiple linear regression, which measure should be used for choosing the "BEST" model in terms of predictive power?
Adjusted R-squared
- Higher values of adjusted R-squared indicate better fit.
- Adjusted R-squared applies a penalty for the number of predictors (regular R-squared does not account for the number of predictors).
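A small sketch of the penalty at work, using the standard adjusted R-squared formula (the R-squared values and sample size below are made up for illustration):

```python
# Sketch: adjusted R-squared penalizes the number of predictors p.
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R-squared, more predictors -> lower adjusted R-squared (the penalty at work).
print(adjusted_r2(0.80, n=100, p=3))   # ~0.794
print(adjusted_r2(0.80, n=100, p=20))  # ~0.749
```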
Equation for SSE (Sum of Squared Errors)
$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $e_i = y_i - \hat{y}_i$ is the error (residual) for record $i$.
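A direct translation of the definition into code, with made-up numbers:

```python
# Sketch: SSE computed straight from the definition.
import numpy as np

y = np.array([3.0, 5.0, 7.0])      # observed y_i
y_hat = np.array([2.5, 5.5, 6.0])  # fitted y_i-hat
e = y - y_hat                      # errors e_i
sse = np.sum(e ** 2)               # SSE = sum of squared errors
print(sse)                         # 0.25 + 0.25 + 1.0 = 1.5
```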
Explain Stepwise Selection.
In stepwise regression, predictors are added one at a time as in forward selection, but at each step we also consider dropping predictors that are no longer statistically significant.
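A hand-rolled sketch of one common p-value-based stepwise variant, built on statsmodels OLS (the entry/removal thresholds and the synthetic data are illustrative assumptions; real implementations differ in details):

```python
# Sketch: bidirectional stepwise selection driven by OLS p-values.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def stepwise(X: pd.DataFrame, y, enter: float = 0.05, remove: float = 0.10):
    included = []
    while True:
        changed = False
        # Forward step: among excluded predictors, add the one with the
        # smallest p-value, if it is below the entry threshold.
        excluded = [c for c in X.columns if c not in included]
        pvals = pd.Series(
            {c: sm.OLS(y, sm.add_constant(X[included + [c]])).fit().pvalues[c]
             for c in excluded},
            dtype=float,
        )
        if not pvals.empty and pvals.min() < enter:
            included.append(pvals.idxmin())
            changed = True
        # Backward step: drop the included predictor with the largest
        # p-value, if it is above the removal threshold.
        if included:
            fit = sm.OLS(y, sm.add_constant(X[included])).fit()
            worst = fit.pvalues.drop("const")
            if worst.max() > remove:
                included.remove(worst.idxmax())
                changed = True
        if not changed:
            return included

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("abcde"))
y = 2 * X["a"] - 3 * X["c"] + rng.normal(size=100)
print(stepwise(X, y))  # expected to keep 'a' and 'c'
```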
Explain Cp.
Mallows's Cp assumes the full model (with all predictors) is unbiased. If a subset model with p predictors is also unbiased, its Cp value should, on average, equal p + 1. Identify subsets with small bias by looking for models whose Cp is near p + 1 and whose p is small (models of small size).
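A small sketch of the standard Cp formula, $C_p = \mathrm{SSE}_p/\mathrm{MSE}_{full} + 2(p+1) - n$, where $\mathrm{MSE}_{full} = \mathrm{SSE}_{full}/(n-k-1)$ for a full model with k predictors (the function name and numbers are illustrative):

```python
# Sketch: Mallows's Cp for a subset model with p predictors.
def mallows_cp(sse_subset: float, p: int, sse_full: float, k: int, n: int) -> float:
    mse_full = sse_full / (n - k - 1)  # error variance estimate from the full model
    return sse_subset / mse_full + 2 * (p + 1) - n

# For the full model itself, Cp equals p + 1 exactly:
print(mallows_cp(sse_subset=50.0, p=4, sse_full=50.0, k=4, n=100))  # 5.0
```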
When explaining the relationship in multiple linear regression, what does the Method of Least Squares do?
The Method of Least Squares picks the line that minimizes the SSE (Sum of Squared Errors).
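A numeric sketch showing that the least-squares fit really does minimize SSE, using np.linalg.lstsq on synthetic data:

```python
# Sketch: nudging the least-squares coefficients can only increase the SSE.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.5 + 2.0 * x + rng.normal(size=50)

A = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS estimates (b0, b1)

def sse(b):
    return np.sum((y - A @ b) ** 2)

print(sse(beta))                 # minimum SSE
print(sse(beta + [0.1, -0.05]))  # any perturbation gives a larger SSE
```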
What does OLS stand for?
Ordinary Least Squares
True or False: Cp, R-squared, and adjusted R-squared select the same subset when comparing models with the same # of predictors.
TRUE
When do you use OLS in the multiple linear regression process?
Use OLS to estimate coefficients in the regression formula
When explaining the relationship in multiple linear regression, what do we use the Root Mean Square Error for?
Use the Root Mean Square Error (RMSE) to assess the model's predictive accuracy. Formula: $\mathrm{RMSE} = \sqrt{\mathrm{SSE}/(n-1)}$
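A sketch following the card's formula (note: other texts divide by n, or by n - p - 1 on training data):

```python
# Sketch: RMSE using the card's sqrt(SSE / (n - 1)) convention.
import numpy as np

y = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.5, 6.0])
sse = np.sum((y - y_hat) ** 2)   # 1.5, as in the SSE sketch above
rmse = np.sqrt(sse / (len(y) - 1))
print(rmse)                      # sqrt(1.5 / 2) ~ 0.866
```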
What does the regression formula look like?
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$
- Remember to put hats above the output $y$ and the coefficients when the model is estimated from a sample.
- $\beta_0$ is the intercept.
- The other $\beta$'s are coefficients for each predictor variable.
- $\varepsilon$ is the error term (it captures the randomness the x's can't explain); the "Garbage Collector."
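A simulation sketch of the stochastic view: generate y from assumed betas plus an error term, then recover the beta-hats with OLS (all numbers are illustrative):

```python
# Sketch: simulate y = b0 + b1*x1 + b2*x2 + e, then estimate the beta-hats.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))
eps = rng.normal(scale=1.0, size=n)            # the error term ("garbage collector")
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + eps  # true b0=3, b1=1.5, b2=-2

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.params)  # beta-hats: close to [3.0, 1.5, -2.0], but not exact, because of eps
```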