Multiple Regression


Gross Effect

For every one-dollar change in the independent variable, the dependent variable changes, on average, by the amount of the coefficient. A gross effect is an average computed over the range of variation of all the other factors that influence price.

Include lagged variables only if

we believe the benefits of adding them outweigh the loss of an observation and the "penalty" imposed by the adjustment to R-squared.

Gross Coefficient

"gross" with respect to all omitted variables -takes all indep. variables into account --> hard to disentangle the true effect of single indep. variable

Steps when interpreting the coefficients of an independent variable in a multiple regression

1. Look at the p-value to see if the independent variable is statistically significant (see the sketch below for where these values appear).
2. Check the sign of the coefficient to see if it makes sense given your understanding of the situation.
3. Look at the magnitude of the coefficient to understand the structural relationship with the dependent variable.
4. Note which other independent variables are included in the regression so you can interpret the coefficient as an appropriate net or gross effect.
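
The p-values, signs, and magnitudes for steps 1-3 all appear in a regression summary. A minimal sketch using Python's statsmodels, with fabricated data:

```python
# Sketch: the regression summary shows p-values (step 1), coefficient signs
# (step 2), and magnitudes (step 3). Data are made up for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=40)
x2 = rng.normal(size=40)
y = 5 + 2 * x1 - 1 * x2 + rng.normal(size=40)

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept + two variables
model = sm.OLS(y, X).fit()
print(model.summary())  # inspect p-values, signs, and magnitudes
```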

Net Coefficient

A coefficient is "net" with respect to all variables included in the regression - only see effect of that one indep. variable --> all other indep. variables are held constant

indication of lurking multicollinearity

A common indication of lurking multicollinearity in a regression is a high adjusted R-squared accompanied by low significance for one or more of the independent variables. Also check whether the p-value of an independent variable rises when a new independent variable is added; this suggests strong correlation between those independent variables.

Equation for multiple regression

Estimate formula: y = a + (b1 * x1) + (b2 * x2)
Dependent = intercept + (slope1 * independent1) + (slope2 * independent2)
True relationship formula: y = a + (b1 * x1) + (b2 * x2) + error term
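
A minimal sketch of fitting this equation by least squares in Python (the data values are made up for illustration):

```python
# Fit y = a + b1*x1 + b2*x2 with ordinary least squares.
import numpy as np

x1 = np.array([1500., 2000., 1700., 2400., 1900.])  # e.g., house size
x2 = np.array([10., 5., 8., 3., 6.])                # e.g., distance
y = np.array([200., 280., 230., 330., 260.])        # e.g., price

# Design matrix: a column of ones for the intercept a, then x1 and x2.
X = np.column_stack([np.ones_like(x1), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"y_hat = {a:.2f} + {b1:.4f}*x1 + {b2:.4f}*x2")
```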

Dummy Variables

How do we incorporate categorical variables into a regression analysis? Many variables we study are qualitative or categorical: they do not naturally take on numerical values, but can be classified into categories. To study the effects of a category, we use a type of categorical variable called a dummy variable. A dummy variable takes on one of two values, 0 or 1, to indicate which of two categories a data point falls into. Example: this dummy variable, which we'll call "Poulk! flavor," is set to 1 for years when EasyMeat ads feature Poulk!, and 0 for years when they feature Classic.
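
A minimal sketch of the encoding, using the "Poulk! flavor" example with made-up years:

```python
# Set the dummy to 1 for years featuring Poulk!, 0 for Classic.
flavors = ["Classic", "Poulk!", "Poulk!", "Classic", "Poulk!"]
poulk_dummy = [1 if f == "Poulk!" else 0 for f in flavors]
print(poulk_dummy)  # [0, 1, 1, 0, 1]
```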

Cheating

Improving R-squared by adding irrelevant variables: we can always increase R-squared to 100% by adding independent variables until we have one fewer independent variable than the number of observations.
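
A small demonstration of the trick on pure noise: with n observations and n - 1 independent variables plus an intercept, the fit is exact.

```python
# With n observations and n-1 independent variables (plus intercept),
# the regression fits exactly, so R-squared hits 100% even on noise.
import numpy as np

rng = np.random.default_rng(1)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, n - 1))])
y = rng.normal(size=n)  # pure noise, unrelated to X

y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R-squared = {r2:.6f}")  # 1.000000 (up to rounding)
```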

Quantifying the Predictive Power of multiple regression

R-squared is the percentage of variation in the dependent variable explained by the independent variables.
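
A minimal sketch of the computation, with made-up actual and predicted values:

```python
# R-squared = 1 - (unexplained variation / total variation).
import numpy as np

y = np.array([200., 280., 230., 330., 260.])      # actual values (made up)
y_hat = np.array([205., 270., 235., 325., 265.])  # predicted values (made up)

ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in y
print(f"R-squared = {1 - ss_res / ss_tot:.3f}")
```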

Residual Analysis

Residual error = actual value minus predicted value of the dependent variable. Plot the residuals against each independent variable. We rely more heavily on residual plots in multiple regression because the full relationship among the independent variables is difficult or impossible to represent in a single scatter diagram. Residual plots reveal nonlinearity and heteroskedasticity.
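
A minimal sketch of one residual plot, with made-up data:

```python
# Plot residuals (actual - predicted) against one independent variable;
# a curve suggests nonlinearity, a fan shape suggests heteroskedasticity.
import numpy as np
import matplotlib.pyplot as plt

x1 = np.array([1500., 2000., 1700., 2400., 1900.])
y = np.array([200., 280., 230., 330., 260.])
y_hat = np.array([205., 270., 235., 325., 265.])

plt.scatter(x1, y - y_hat)
plt.axhline(0, linestyle="--")
plt.xlabel("independent variable x1")
plt.ylabel("residual (actual - predicted)")
plt.show()
```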

Lagged Variables

Sometimes, the value of the dependent variable in a given period is affected by the value of an independent variable in an earlier period. We incorporate the delayed effect of an independent variable on a dependent variable using a lagged variable.

Excel Multiple Regression

Start like any other regression:
1. Input the Y range.
2. Input the X range by entering the cell reference of the top cell of the independent variable appearing farthest to the left; following a colon, enter the cell reference of the bottom cell of the independent variable appearing farthest to the right.
3. Select Labels.
4. Select Residual Plots.
5. Enter the Confidence Level.

Adding a lagged variable is costly in two ways.

The loss of a data point decreases our sample size, which reduces the precision of our estimates of the regression coefficients. At the same time, because we are adding another variable, we incur the adjustment penalty that lowers adjusted R-squared.

Adjusted R-squared

To balance out the effect of the difference between the number of observations and the number of independent variables, we modify R-squared by an adjustment factor. The transformation looks quite complicated, but notice that it is largely determined by n - k, the difference between the number of observations n and the number of independent variables k. This adjustment balances out the apparent advantage gained just by increasing the number of independent variables.

When the adjusted R-squared of the multiple regression of house price versus house size and distance is greater than the adjusted R-squared of either simple regression, we can conclude that we gained real predictive power by considering both independent variables simultaneously. We should never compare R-squared or adjusted R-squared values for regressions with different dependent variables.
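
A sketch of the adjustment, assuming the common form adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); texts write the penalty slightly differently, but it is driven by the gap between n and k:

```python
# Adjusted R-squared penalizes adding independent variables.
def adjusted_r_squared(r_squared, n, k):
    """n = number of observations, k = number of independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(0.914, n=30, k=3))  # example values, made up
```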

Regression on Lagged Variables

To run a regression on the lagged variable, we first need to prepare the data for the lagged variable: we copy the column of advertising data over to a new column, shifted down by one row. Since we need observations with data on all variables, we are forced to discard the first observation as well as the extraneous piece of information in the lagged variable column.
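
A minimal sketch of this data preparation using pandas (the column names and values are made up):

```python
# Shift advertising down one row to create the lag, then drop the first
# observation, which has no lagged value.
import pandas as pd

df = pd.DataFrame({
    "sales":       [100, 120, 115, 140, 150],
    "advertising": [10, 12, 11, 15, 16],
})
df["advertising_lag1"] = df["advertising"].shift(1)
df = df.dropna()  # discard the first observation
print(df)
```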

True or False: R-squared cannot decrease when we add another independent variable to a regression

True. R-squared cannot decrease when we add another independent variable to a regression; it can only stay the same or increase, even if the new independent variable is completely unrelated to the dependent variable.
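
A quick demonstration on random data: adding an unrelated variable cannot lower R-squared.

```python
# Adding a junk variable never lowers R-squared (it can only help the fit).
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
y = 2 + 3 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)  # completely unrelated to y

def r2(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(r2(x1, y))                           # simple regression
print(r2(np.column_stack([x1, junk]), y))  # same or higher, never lower
```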

multicollinearity

When two of the independent variables are highly correlated, one is essentially a proxy for the other. But because these two variables are closely correlated in our data set, there is not enough information in the data to discern how their combined contributions should be attributed to each of the two variables.

Example: without house size, adjusted R-squared is 90.89%, slightly lower than 91.40%, the adjusted R-squared for the regression including house size. Thus, although the regression model cannot accurately estimate the effect of house size when we control for lot size and distance, the addition of house size does help explain a bit more of the variance in selling price.

Proxy Variable

a variable that is closely correlated with the variable we want to investigate, but typically has more readily available data.

multiple regression

taking into account the influence of several variables during regression analysis.

multicollinearity is not a problem when

Using it to make predictions. If we're using a regression to make predictions, multicollinearity is not a problem, assuming as always that the historically observed relationships among the variables continue to hold going forward.

multicollinearity is a serious problem that must be addressed when

We're trying to understand the net relationships of the independent variables. How to reduce multicollinearity:
1. Increase the sample size. The more observations we have, the easier it will be to discern the net effects of the individual independent variables.
2. Remove one of the collinear independent variables.
A quick correlation check is sketched below.
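
A simple first check for collinearity is the pairwise correlation between independent variables. A minimal sketch with made-up, nearly proportional data:

```python
# A correlation near +/-1 between two independent variables signals
# possible multicollinearity.
import numpy as np

house_size = np.array([1500., 2000., 1700., 2400., 1900.])
lot_size = house_size * 4 + np.array([50., -30., 20., 10., -40.])

print(np.corrcoef(house_size, lot_size)[0, 1])  # close to 1 here
```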

"over-fitting"

When we obtain a regression equation by piling on independent variables in this way: the equation fits our particular data set exactly, but almost surely does not explain the true relationship between the independent and dependent variables.

