Econometrics lecture note 6: Multiple linear regression estimation


Beware interpretations of R squared and adjusted R squared

Adjusted R squared is useful because it summarises the extent to which the regressors explain variation in Y. Maximising adjusted R squared is rarely the goal in practice; if it is too close to 1, that is a sign of a logical problem with the regression model.
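
As a reference, the usual adjusted R squared formula (notation assumed here: n observations, k regressors, SSR the sum of squared residuals, TSS the total sum of squares):

\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS}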

Further dummy variable trap: how to solve

You cannot include a set of dummy variables that add up to equal another regressor, such as another dummy variable or the constant.

Heteroskedasticity

The error term in a regression is homoskedastic if its variance, given the regressors, is constant across observations. Otherwise the error term is heteroskedastic (we work under the assumption of heteroskedasticity).
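
In symbols (standard definition; \sigma_u^2 denotes a constant):

homoskedasticity: \operatorname{var}(u_i \mid X_i) = \sigma_u^2 \text{ for all } i; \qquad heteroskedasticity: \operatorname{var}(u_i \mid X_i) \text{ depends on } X_i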

Standard error of regression (SER)

Estimates the standard deviation of the error term u.
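
A standard formula for the SER in a regression with k regressors (notation assumed: û_i the OLS residuals, n observations):

SER = s_{\hat{u}}, \qquad s_{\hat{u}}^2 = \frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2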

Multiple linear regression

Extends the single linear regression model to include additional variables as regressors. Allows us to estimate the effect on Y of changing one variable while holding all other regressors constant. A tool for eliminating omitted variable bias.
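
In equation form (standard notation, with k regressors):

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \qquad i = 1, \dots, n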

Fixing omitted variable bias

Focusing on schools with similar levels of income

R squared

Fraction of sample variance in Y explained or predicted by the regressors
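
As a formula (notation assumed: ESS the explained sum of squares, SSR the sum of squared residuals, TSS the total sum of squares):

R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}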

Omitted variable bias

If the regressor is correlated with a variable that has been omitted from the analysis AND that variable determines, in part, the dependent variable, then the OLS estimator of the effect of interest suffers from omitted variable bias (OVB), so that E[β̂₁] ≠ β₁. The error term in the regression model contains these confounding variables.

Signing omitted variable bias

In our example, there was a positive relationship between income and test score (Y), so a positive sign (+). There was a negative relationship between income and class size (X), so a negative sign (-). Take the sign of the correlation between the omitted variable and Y, multiply it by the sign of the correlation between the omitted variable and X, and you get the sign of the omitted variable bias.
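
A worked sign calculation for the class-size example (using only the signs stated above):

\operatorname{sign}(\text{bias}) = \operatorname{sign}\bigl(\operatorname{corr}(\text{income}, Y)\bigr) \times \operatorname{sign}\bigl(\operatorname{corr}(\text{income}, X)\bigr) = (+) \times (-) = (-)

so β̂₁ is biased downward (it is more negative than β₁).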

The constant

The intercept is the expected value of Y when all regressors are equal to zero. We can equivalently write the regression to include a third regressor, X₀, a dummy variable equal to one for all observations; X₀ is the constant regressor.
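
Written out with the constant regressor made explicit (two substantive regressors assumed for concreteness):

Y_i = \beta_0 X_{0i} + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \qquad X_{0i} = 1 \text{ for all } i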

Summary

Just look at what's been written on the slides!

When does omitted variable bias exist

Occurs when the omitted variable satisfies two conditions: (1) it is correlated with an included regressor, and (2) it helps determine the dependent variable. If omitted variable bias exists, then E[β̂₁] ≠ β₁: the OLS estimate of β₁ is biased, and all the machinery for estimating and testing the regression fails.

Omitted variable bias and OLS assumption 1

Omitted variable bias means our first least squares assumption, E[u|X] = 0, fails. u contains all factors other than X that are determinants of Y; if one of these factors is correlated with X, then u is correlated with X. For example, if income is a determinant of test score (Y) and we omit it, then it is in u, and if income is correlated with class size, then u will be correlated with X. Because u and X are correlated in the presence of an omitted variable, the conditional mean of u given X is not zero: if corr(uᵢ, Xᵢ) ≠ 0, then E[uᵢ|Xᵢ] ≠ 0.

Avoiding dummy variable trap

Only include G-1 of the G dummy variables in the regression. The dummy we do not include is the base category (or base group, or omitted category). We interpret each of the other dummies as the change in the outcome variable when that dummy variable equals 1, relative to the base group, holding all other regressors constant. Alternatively, we can include all G dummies and drop the constant in the regression (very uncommon). In sum, when your software indicates you have perfect multicollinearity, eliminate it by: (1) determining the source of the perfect multicollinearity; (2) creating a base group (drop one dummy); (3) ensuring you properly interpret the regression coefficients for the dummies relative to the omitted base group.
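
A minimal sketch of the drop-one-category approach in Python with pandas and statsmodels; the column names (score, class_size, region) and the data values are hypothetical, purely for illustration:

```python
# Sketch: avoiding the dummy variable trap with made-up data.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "score":      [650, 660, 640, 655, 670, 645],
    "class_size": [22, 20, 25, 23, 18, 24],
    "region":     ["north", "south", "west", "north", "south", "west"],
})

# drop_first=True keeps G-1 of the G region dummies; "north" becomes the base group.
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True, dtype=float)

X = sm.add_constant(pd.concat([df[["class_size"]], dummies], axis=1))
ols = sm.OLS(df["score"], X).fit()

# Each region coefficient is the expected score difference relative to the
# omitted base group ("north"), holding class size constant.
print(ols.params)
```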

Econometrically modelling test scores

Other variables, like income, affect both test scores and class size, so lower income goes with higher class size and lower scores. In words, if some variable like income varies across classes, this creates a negative relationship between class size and test score. This has serious implications for the OLS estimate, because β₁ is designed to capture the direct link alone, and the OLS coefficient will fail to isolate it. The OLS estimate is driven by two forces: (1) the direct relationship we want to determine empirically, and (2) a separate indirect correlation due to differences in another variable.

Dummy variable trap

A possible source of perfect multicollinearity arises when multiple dummy variables are used as regressors. If you add them up and they always equal one, they equal the constant regressor (they are perfectly linearly related).

R squared in multiple regression

R squared always rises when a regressor is added to the regression, unless the estimated coefficient on the added regressor is exactly 0 (rare). Because of this, we often work with the adjusted R squared, a modified version that does not necessarily increase when a new regressor is added.

Imperfect multicollinearity

A related issue, which arises when one regressor is highly, but not perfectly, correlated with other regressors. It does not prevent statistics programs from providing OLS estimates, but it results in regression coefficients being estimated imprecisely, having large standard errors, and therefore being statistically insignificant. Intuitively, if two regressors are highly correlated and almost always co-move, it is hard to disentangle their individual impacts on the dependent variable in the regression. Whereas perfect multicollinearity arises because of a logical mistake in the regression set-up, imperfect multicollinearity is not necessarily an error but a function of the data, OLS, and the question you are trying to address.
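
A small simulation sketch (entirely made-up numbers) illustrating how a high, but not perfect, correlation between two regressors inflates their standard errors:

```python
# Sketch: imperfect multicollinearity with simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # x2 highly (not perfectly) correlated with x1
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors

# The individual coefficients on x1 and x2 are estimated imprecisely
# (large standard errors), even though the regression as a whole fits well.
print(fit.summary())
```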

Student test score example

Richer regressions allow us to control for many other variables that predict test scores, helping to avoid omitted variable bias and to isolate the relationship between the variables that is the main effect of interest.

How to deal with omitted variable bias

Should we expect β̂₁ to be bigger or smaller than β₁? Conceptually, when interpreting our OLS estimate β̂₁, we can think of it as containing two parts: β̂₁ = β₁ + γ, where β₁ is the direct relationship and γ is the indirect relationship. Given we expect γ < 0, we expect the regression to yield β̂₁ < β₁, a biased estimate of the relationship (or vice versa), i.e. downward bias. The magnitude of the estimate may be too large relative to the true value, as it is confounded by other variables.

Formula for omitted variable bias

Suppose least squares assumptions 2 and 3 hold but assumption 1 does not. Let ρ_Xu = corr(uᵢ, Xᵢ) be the correlation between Xᵢ and uᵢ in the single linear regression.
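
The standard large-sample statement of the omitted variable bias formula (as in Stock and Watson; σ_u and σ_X are the standard deviations of u and X):

\hat{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}

so the bias does not vanish as the sample grows, and its sign is the sign of ρ_Xu.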

What does all of this mean

Suppose you get a statistically significant β̂₁; it could be driven by another variable, and the direct link may not exist at all. You may have a statistically significant β̂₁ that is driven purely by the fact that higher-income schools tend to have smaller class sizes and higher-income kids do better on the test. There could be no direct relationship, with the estimate driven entirely by the indirect relationship.

Control variables

Taking conditional expectations at a point where X₁ = x₁ and X₂ = x₂, the population regression function is E[Y | X₁ = x₁, X₂ = x₂] = β₀ + β₁x₁ + β₂x₂. We often refer to some of the regressors in a multiple linear regression as control variables. Interpreting β₁, we say it is the relationship between X₁ and Y holding X₂ fixed (or controlling for X₂).

Perfect multicollinearity example: huge class

Two important aspects: 1. Perfect multicollinearity can arise because of the constant. 2. Perfect multicollinearity is specific to the dataset you have at hand; we could imagine classes with more than 35 students (if you had them, you would not have perfect multicollinearity).

Perfect multicollinearity

Two regressors exhibit perfect multicollinearity if one of the regressors is a perfect linear combination of the other regressors. You cannot hold something fixed and change it at the same time: for groups of regressors that are perfectly collinear, it is impossible to hold one regressor fixed while estimating the effect of one of the other collinear regressors on the dependent variable. In practice, your software will give an error message; drop one variable or modify the set of regressors to eliminate the problem.

OLS estimate with multiple linear regression

We use the ordinary least squares (OLS) estimator to estimate the regression coefficients of the multiple linear regression model. The OLS estimator aims to find the regression coefficients that minimise the mistakes the model makes in predicting the dependent variable given the regressors. The OLS estimators together minimise the sum of squared prediction mistakes.
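
In symbols, the OLS estimators solve (standard notation with k regressors):

(\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k) = \arg\min_{b_0, b_1, \dots, b_k} \sum_{i=1}^{n} \bigl(Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki}\bigr)^2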

Coefficient interpretation

When we interpret coefficients, we imagine changing only one regressor at a time, leaving the others fixed. The coefficient is often called the partial effect of X₁ on Y, which emphasises our focus on changing just one regressor while holding the other regressors fixed.
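
A simulated sketch of the partial-effect interpretation in Python (all variable names and numbers are hypothetical); the long regression controls for income, the short one omits it:

```python
# Sketch: partial effect of class size on score, controlling for income.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500

income = rng.normal(50, 10, size=n)                      # made-up income measure
class_size = 30 - 0.2 * income + rng.normal(0, 2, n)     # income and class size negatively correlated
score = 700 - 1.0 * class_size + 0.5 * income + rng.normal(0, 5, n)

df = pd.DataFrame({"score": score, "class_size": class_size, "income": income})

# Controlling for income: the class_size coefficient is the partial effect of
# class size on score, holding income fixed.
multi = smf.ols("score ~ class_size + income", data=df).fit(cov_type="HC1")

# Omitting income: the class_size coefficient absorbs the indirect income channel
# and is biased downward (more negative), as in the omitted variable bias discussion.
short = smf.ols("score ~ class_size", data=df).fit(cov_type="HC1")

print(multi.params["class_size"], short.params["class_size"])
```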

Income and class size: error term impact on the regression

Yields bias in β̂₁

