Multiple Linear Regression

Reducing Multicollinearity

1. Do not include redundant independent variables
2. Add more observations
3. Use the RIDGE method of regression to estimate the parameters of the model (see the sketch below)
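
A minimal sketch of option 3, using scikit-learn's Ridge estimator on synthetic, deliberately collinear data; the data and the penalty strength alpha=1.0 are illustrative assumptions, not part of the original card.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)    # nearly redundant with x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + 2 * x2 + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)          # the penalty shrinks unstable coefficients

print("OLS coefficients:  ", ols.coef_)     # can be inflated and mutually offsetting
print("Ridge coefficients:", ridge.coef_)   # shrunk toward more stable values
```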

Variable Selection Procedures

1. All possible subsets
2. Forward selection
3. Backward elimination
4. Stepwise selection

VIF equation

VIFj = 1 / (1 - Rj^2), where Rj^2 is the R-squared from the regression of the j-th predictor on all the other predictors.

Multiple Regression

A technique for describing the relationship between a continuous response Y and a set of two or more explanatory variables. It is the simple linear model extended to additional predictors.

Variable Selection: Backward Elimination

All variables are initially brought into the equation. The coefficient of the least significant variable is then tested. If it is significant, all variables are kept in the model and the process stops. If it is not significant, the variable is removed, the regression equation is refitted without it, and the process is repeated (a sketch of this loop follows below).
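
A sketch of this loop using statsmodels p-values; the data, the 0.05 threshold, and the helper name backward_eliminate are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Drop the least significant predictor until every remaining p-value is below alpha."""
    kept = list(X.columns)
    while kept:
        model = sm.OLS(y, sm.add_constant(X[kept])).fit()
        pvals = model.pvalues.drop("const")      # test each coefficient, ignoring the intercept
        worst = pvals.idxmax()
        if pvals[worst] < alpha:                 # everything significant: keep all and stop
            return model, kept
        kept.remove(worst)                       # remove the weakest variable and refit
    return None, []

# Illustrative data: x3 is pure noise and is typically eliminated first.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 2 * X["x1"] - X["x2"] + rng.normal(size=200)
model, kept = backward_eliminate(X, y)
print(kept)
```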

Dummy variables

Binary variables in the regression model with values 0 or 1 for each individual observation; the regression coefficient indicates the average difference in the dependent variable between the groups.
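
A short sketch of how a 0/1 dummy enters the model and how its coefficient is read, using a statsmodels formula; the data and group labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": rng.choice(["control", "treated"], size=200),
    "age": rng.normal(50, 10, size=200),
})
# True group effect is 3: treated observations average 3 units higher at any given age.
df["y"] = 5 + 0.2 * df["age"] + 3 * (df["group"] == "treated") + rng.normal(size=200)

# C(group) is coded 0/1 automatically; its coefficient estimates the average
# difference in y between the two groups.
fit = smf.ols("y ~ age + C(group)", data=df).fit()
print(fit.params)
```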

Variable Selection: Stepwise Selection

Combines the features of forward selection and backward elimination. Variables are added one by one to the model according to their significance. After a variable is added, all variables already in the model are re-examined and any that is no longer significant is deleted.

Cook's D

Measures how much the fitted regression line would change if an observation were removed. Values larger than 4/n are considered potentially highly influential.
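
A sketch of flagging observations with the 4/n rule using statsmodels; the data and the single corrupted point are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
y[0] += 10                                   # one deliberately influential observation

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance
print(np.where(cooks_d > 4 / n)[0])          # indices exceeding the 4/n cutoff
```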

DFBETA Cut-off

An observation has the potential to be influential if |DFBETA| > 2/sqrt(n); an absolute DFBETA greater than 1 is also used as a simpler cutoff.

Variable Selection: Forward Selection

Variables are included in the model one at a time (according to the strength of their association with Y) until the coefficient of the next variable would not be significantly different from zero. 1) All simple linear regression models are considered to find the one that gives the best fit based on the F-statistic; this variable is brought into the regression equation first. 2) The remaining variables are then tried one at a time, and the one that most improves the fit is added next, continuing until no further variable gives a significant improvement.

VIF

Measure of how highly correlated each independent variable is with the other predictors in the model; a VIF larger than 10, or a mean VIF substantially larger than 1, usually indicates that multicollinearity may be influencing the least squares estimates.
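
A sketch of computing VIFs with statsmodels' variance_inflation_factor; the data (x2 built to be nearly collinear with x1) are an illustrative assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=100),   # nearly collinear with x1
    "x3": rng.normal(size=100),
})

exog = sm.add_constant(X)
for i, name in enumerate(X.columns, start=1):     # skip the constant column
    print(name, variance_inflation_factor(exog.values, i))
# x1 and x2 show VIFs well above 10; x3 stays near 1.
```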

DFBETA

Measure of how much an observation has affected the estimate of a regression coefficient (there is one DFBETA per regression coefficient for each observation).
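
A sketch of extracting standardized DFBETAs from statsmodels and applying the 2/sqrt(n) cutoff; the data are an illustrative assumption.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 60
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

fit = sm.OLS(y, X).fit()
dfbetas = fit.get_influence().dfbetas        # one row per observation, one column per coefficient
rows, cols = np.where(np.abs(dfbetas) > 2 / np.sqrt(n))
print(list(zip(rows, cols)))                 # (observation, coefficient) pairs worth inspecting
```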

Interactions

Accounts for a difference in slopes between groups, i.e. an interaction between a continuous variable and a dummy variable, by including an interaction term (the two explanatory variables multiplied together).
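
A sketch of an interaction between a continuous variable and a dummy, using a statsmodels formula; the data and the true slopes are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "x": rng.normal(size=200),
    "group": rng.choice([0, 1], size=200),   # dummy variable
})
# The slope of x is 2 in group 0 and 2 + 1.5 = 3.5 in group 1.
df["y"] = 1 + 2 * df["x"] + 1.5 * df["x"] * df["group"] + df["group"] + rng.normal(size=200)

# "x * group" expands to x + group + x:group; the x:group coefficient is the
# interaction term (the two explanatory variables multiplied together), i.e.
# the estimated difference in slopes between the groups.
fit = smf.ols("y ~ x * group", data=df).fit()
print(fit.params)
```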

Coefficient of Multiple Determination

R-squared never decreases as variables are added; it is affected by the number of parameters and the sample size, so R-squared values cannot be compared between studies or models with different sample sizes and numbers of parameters.

Overall Significance of Model

Same as simple linear regression: TOTAL SS (df = n-1) = REGRESSION SS (df = p) + RESIDUAL SS (df = n-p-1). The F-test provides a composite test of the null hypothesis H0: beta1 = beta2 = ... = betap = 0 (the predictor variables are irrelevant) against HA: at least one beta is not 0 (the predictors do better than just the mean).
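
A sketch of the overall F-test, reading it from a statsmodels fit and recomputing it from the sums of squares; the data are an illustrative assumption.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 0.5 * X[:, 0] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.fvalue, fit.f_pvalue)              # F on (p, n - p - 1) df for H0: all betas = 0

# The same statistic from the decomposition TOTAL SS = REGRESSION SS + RESIDUAL SS:
F = (fit.ess / p) / (fit.ssr / (n - p - 1))
print(F)
```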

Interpretation of Multiple Regression

The population slope Beta_i is the amount by which Y changes on average when Xi changes by one unit and all other X variables remain constant. In the case of two predictor variables Xi and Xii, the regression of Y on Xi and Xii is equivalent to: 1) the regression of Y on Xi; 2) the regression of Xii on Xi; 3) the regression of the residuals from 1 (the variation in Y not explained by Xi) on the residuals from 2 (the variation in Xii not explained by Xi).
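
A numerical sketch of this residual-regression equivalence (the Frisch-Waugh-Lovell result); the data are an illustrative assumption.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

# Full multiple regression of Y on Xi and Xii.
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Step 1: regress Y on Xi and keep the residuals (variation in Y not explained by Xi).
ry = sm.OLS(y, sm.add_constant(x1)).fit().resid
# Step 2: regress Xii on Xi and keep the residuals (variation in Xii not explained by Xi).
rx2 = sm.OLS(x2, sm.add_constant(x1)).fit().resid
# Step 3: regress the residuals from step 1 on the residuals from step 2.
partial = sm.OLS(ry, rx2).fit()

print(full.params[2], partial.params[0])     # both equal the coefficient on Xii
```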

Collinearity

The correlation between the explanatory variables; pairwise correlations of 0.9 and above are cause for concern. The VIF can also be calculated.

Fitting a Model for Multiple Regression

Use the method of least squares, exactly as in simple linear regression, with the additional independent variables included in the equation.

Adjusted R-squared

Takes into account the chance contribution of each variable included. It is calculated as Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors.
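
A quick check of this formula against the value statsmodels reports; the data are an illustrative assumption.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n, p = 80, 4
X = rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
adj_by_hand = 1 - (1 - fit.rsquared) * (n - 1) / (n - p - 1)
print(adj_by_hand, fit.rsquared_adj)         # the two values agree
```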

Leverage

A point with an unusual X profile; a measure of how far an observation is from the others in terms of the levels of the INDEPENDENT variables. Observations with values larger than 2(k+1)/n, where k is the number of predictors, are considered potentially highly influential.
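
A sketch of flagging high-leverage points via the hat-matrix diagonal and the 2(k+1)/n cutoff; the data and the planted outlying X profile are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n, k = 50, 2
X = rng.normal(size=(n, k))
X[0] = [8.0, -8.0]                           # an unusual X profile (high leverage)
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
h = fit.get_influence().hat_matrix_diag      # leverage = diagonal of the hat matrix
print(np.where(h > 2 * (k + 1) / n)[0])      # flags observation 0
```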

Variable Selection: All Possible Subsets

If there are k variables under consideration, there are 2^k - 1 possible non-empty subsets. If k is large this becomes expensive, but the subsets can be compared using the following criteria: R^2 (maximize within each subset size), adjusted R^2 (maximize across subset sizes), and AIC (Akaike's Information Criterion, minimize).
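
A brute-force sketch over the 2^k - 1 non-empty subsets, scored by R^2, adjusted R^2, and AIC; the data and the choice to select by adjusted R^2 are illustrative assumptions.

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(11)
X = pd.DataFrame(rng.normal(size=(150, 4)), columns=["x1", "x2", "x3", "x4"])
y = 2 * X["x1"] - X["x2"] + rng.normal(size=150)

results = []
cols = list(X.columns)
for size in range(1, len(cols) + 1):
    for subset in itertools.combinations(cols, size):     # 2^k - 1 non-empty subsets in total
        fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
        results.append((subset, fit.rsquared, fit.rsquared_adj, fit.aic))

best = max(results, key=lambda r: r[2])      # here: pick the subset maximizing adjusted R^2
print(best[0])
```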

Categorical (factor) explanatory variables

Multiple regression does not require the X variables to be normally distributed, nor does it require them to be continuous.

