Ch 6 - Multiple Linear Regression

outcome variable

the Y - what you are trying to predict

stepwise

- Like forward selection, except at each step we also consider dropping predictors that are not statistically significant
- Use step() to run stepwise regression on a fitted lm model; set direction = "both" (see the step() sketch after the forward selection card)

backward elimination

- Start with all predictors
- Successively eliminate the least useful predictor one at a time
- Stop when all remaining predictors have a statistically significant contribution
- In the chapter example, yields a model with 7 predictors
- Use step() on a fitted lm model; set direction = "backward" (see the sketch after the forward selection card)

forward selection

- Start with no predictors
- Add them one by one
- Stop when the next addition is not statistically significant
- Use step() with direction = "forward"; a minimal sketch of all three directions follows
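All three methods use the same step() function from base R, differing only in the direction argument. A minimal sketch, assuming a training data frame train.df with outcome Price (as in the Toyota Corolla example used elsewhere in this chapter):

# full model fit on the training data
car.lm <- lm(Price ~ ., data = train.df)

# backward elimination: start from the full model
car.lm.back <- step(car.lm, direction = "backward")

# forward selection: start from the intercept-only model,
# using the full model as the upper limit of the search
car.lm.null <- lm(Price ~ 1, data = train.df)
car.lm.fwd <- step(car.lm.null, direction = "forward",
                   scope = list(lower = ~ 1, upper = formula(car.lm)))

# stepwise: consider both adding and dropping a predictor at each step
car.lm.step <- step(car.lm, direction = "both")

summary(car.lm.step)  # inspect the selected model

Note that step() selects models by AIC rather than by p-values, so its stopping rule differs slightly from the significance-based descriptions above.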

popular subset selection algorithms

- Partial, iterative searches through the space of all possible regression models: forward selection, backward elimination, and stepwise regression

exhaustive search

- Also called "best subset": evaluates all possible subsets of predictors (singles, pairs, triplets, etc.)
- Shows the best model for each number of predictors
- Goal: find a parsimonious model (the simplest model that performs sufficiently well), which is more robust and tends to have higher predictive accuracy
- Predictive accuracy is assessed on validation data
- Subset selection methods only help find "good" candidate models; these should then be run and assessed
- Computationally intensive and not feasible for big data, which is why the partial search algorithms (forward, backward, stepwise) exist
- Judge candidate models by adjusted R2
- Use regsubsets() in package leaps; requires manually coding categorical predictors into binary dummies first:

library(leaps)

# recode the categorical Fuel_Type (column 4) into binary dummies
Fuel_Type <- as.data.frame(model.matrix(~ 0 + Fuel_Type, data = train.df))
train.df <- cbind(train.df[, -4], Fuel_Type)
head(train.df)

search <- regsubsets(Price ~ ., data = train.df, nbest = 1,
                     nvmax = dim(train.df)[2], method = "exhaustive")
sum <- summary(search)
sum$which    # which predictors appear in the best model of each size
sum$rsq      # R-squared of each model
sum$adjr2    # adjusted R-squared of each model
sum$Cp       # Mallows' Cp of each model
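To pick the winning subset from that output, one option (using the names from the sketch above, where each row of sum$which is the best model of a given size) is:

which.max(sum$adjr2)                 # number of predictors with the highest adjusted R2
sum$which[which.max(sum$adjr2), ]    # the predictors included in that model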

Adjusted R-squared

- R-squared adjusted for the number of predictors (independent variables) used to make the estimate; it penalizes models that add predictors without improving fit
- Adjusted R-squared for the models with 1 predictor, 2 predictors, 3 predictors, etc. (exhaustive search method):

> sum$adjr2
 [1] 0.7556359 0.7922356 0.8267935 0.8436895 0.8494282 0.8534911
 [7] 0.8544782 0.8545430 0.8543943 0.8542602 0.8540382

- Adjusted R-squared climbs until you hit 7 predictors, then stabilizes (and eventually declines slightly), so choose the model with 7 predictors, according to the adjusted R-squared criterion
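For reference, the standard formula, with n = number of records and p = number of predictors:

adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)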

explanatory modeling

- Explaining or quantifying the average effect of inputs on an outcome
- A descriptive task: learn about the underlying relationship in the population, find a trend or pattern
- Goal: explain the relationship between the predictors (explanatory variables) and the target
- The familiar use of regression in data analysis
- Model goal: fit the data well and understand the contribution of the explanatory variables to the model
- "Goodness of fit": R2, residual analysis, p-values

predictive modeling

- Predicting the outcome value for new records, given their input values; a predictive task
- Goal: predict target values in other data where we have predictor values but not target values
- The classic data mining context
- Model goal: good predictive accuracy
- Predictive models are fit to training data, and predictive accuracy is evaluated on a separate validation (hold-out) data set
- Explaining the role of predictors is not the primary purpose (though still useful)
- A minimal sketch of the workflow follows
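A minimal sketch of the train/validation workflow, assuming a data frame df with numeric outcome Price (the data frame and column names are illustrative, not from the text):

set.seed(1)  # reproducible partition
train.rows <- sample(nrow(df), round(0.6 * nrow(df)))  # 60% training
train.df <- df[train.rows, ]
valid.df <- df[-train.rows, ]

car.lm <- lm(Price ~ ., data = train.df)      # fit on training data only
pred <- predict(car.lm, newdata = valid.df)   # predict the hold-out records

# predictive accuracy on the validation data: root mean squared error
sqrt(mean((valid.df$Price - pred)^2))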

Ordinary least squares

- The estimation method used to find the coefficients that minimize the sum of squared deviations between the actual outcome values (Y) and the predicted values based on the model (ŷ)
- Provided the standard linear model assumptions hold, these predictions will be unbiased and will have the smallest mean squared error
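In symbols, with n training records and p predictors, OLS chooses the coefficients b0, b1, ..., bp that minimize

sum over i = 1, ..., n of (y_i - ŷ_i)^2,   where ŷ_i = b0 + b1 x_i1 + ... + bp x_ip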

predictors

- The X variables - what you are inputting
- Removing redundant predictors is key to achieving predictive accuracy and robustness

how to reduce the number of predictors

- Use domain knowledge: reduce the predictors to a sensible set that reflects the problem at hand
- Make use of computational power and statistical performance metrics

multiple linear regression model p.153

- Used to fit a relationship between a numerical outcome variable (Y) and a set of predictor variables (X's)
- The goal is to predict new records, not simply to identify which variables are significant
- Can be used for both explanatory and predictive modeling; the model equation is written out below
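Written out, the model with p predictors takes the standard form

Y = β0 + β1 X1 + β2 X2 + ... + βp Xp + ε

where β0, ..., βp are coefficients estimated from the data (by ordinary least squares, above) and ε is the unexplained noise.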

overfitting

- When a model is too complex: it fits the training data so closely that it captures the noise as well as the signal, and so predicts new records poorly

underfitting

- When a model is too simple to capture the underlying relationship, so it performs poorly even on the training data

explanatory vs predictive p.154-155

1.) Explanatory models should fit the data well; predictive models should predict new records accurately
2.) Explanatory modeling uses the whole data set for estimation; predictive modeling splits the data into training and validation sets
3.) An explanatory model performs well if it fits the data well; a predictive model performs well if it predicts accurately
4.) Explanatory models focus on the coefficients; predictive models focus on the predictions

the 3 algos of iterative search methods are

1.) Forward selection: start with no predictors, then add predictors one by one
2.) Backward elimination: start with all predictors, then eliminate the least useful ones one by one
3.) Stepwise regression: like forward selection, but at each step also consider dropping predictors that are not statistically significant

Understand the methods of variable selection in linear regression p.161-166

Backward elimination, forward selection, and stepwise regression (direction = "both" in step())

