Econometrics - test 1: Multiple Linear Regression Assumptions and Theorems

Normality of error terms - assumption

It is assumed that the unobserved factors are normally distributed around the population regression function.

Gauss-Markov assumptions

MLR.1 - MLR.5

Classical linear model (CLM) assumptions

MLR.1 - MLR.6

Assumption of Homoscedasticity

MLR.5: The values of the explanatory variables must contain no information about the variance of the unobserved factors (not to be confused with exogeneity, which says they contain no information about the MEAN of the unobserved factors). Var(ui | xi1, xi2, ..., xik) = σ²; shorthand notation: Var(ui | xi) = σ². This assumption may also be hard to justify in many cases.

In some cases, normality can be achieved through

transformations of the dependent variable (e.g. use log(wage) instead of wage)
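
A minimal simulated sketch of why such a transformation can help (the data and variable names are illustrative, not from the source): a right-skewed wage-like variable becomes far more symmetric after taking logs.

import numpy as np

rng = np.random.default_rng(0)
# right-skewed, roughly lognormal "wages" (purely simulated)
wage = np.exp(rng.normal(loc=2.5, scale=0.7, size=2000))
log_wage = np.log(wage)  # approximately normal by construction

def skewness(z):
    # simple moment-based skewness measure
    return np.mean((z - z.mean()) ** 3) / np.std(z) ** 3

print(skewness(wage), skewness(log_wage))  # large positive skew vs. close to zero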

Theorem 3.1

(Unbiasedness of OLS) Under assumptions MLR.1 - MLR.4: E(Bjhat) = Bj, for j = 0, 1, ..., k

Gauss-Markov Theorem

- Assumptions (conditions): MLR.1 - MLR.5

No perfect collinearity

- In the sample (and therefore in the population), none of the independent variables is constant (similar to SLR.3), and there are no exact linear relationships among the independent variables. This only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed. Constant variables are also ruled out (they are collinear with the intercept). (See the sketch after this card.)
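
A tiny simulated sketch of perfect collinearity (illustrative names, not from the source): two shares that sum to one are an exact linear function of the intercept, so the design matrix loses full column rank and OLS cannot separate their coefficients.

import numpy as np

rng = np.random.default_rng(1)
n = 100
share_a = rng.uniform(size=n)
share_b = 1.0 - share_a  # exact linear relationship: share_a + share_b = 1

X = np.column_stack([np.ones(n), share_a, share_b])
print(np.linalg.matrix_rank(X))  # 2 instead of 3: perfect collinearity with the intercept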

Components of OLS Variances:

1) The error variance
2) The total sample variation in the explanatory variable
3) Linear relationships among the independent variables
(See the variance formula sketched after this card.)
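
These three components come together in the sampling-variance formula of Theorem 3.2 (stated later in this set). A sketch in standard notation, where \hat{\beta}_j corresponds to Bjhat:

Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \,(1 - R_j^2)}, \quad j = 1, \dots, k,

where \sigma^2 is the error variance, SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 is the total sample variation in x_j, and R_j^2 is the R-squared from regressing x_j on all other explanatory variables.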

Components of OLS Variances: 1) The error variance:

A high error variance increases the sampling variance because there is more "noise" in the equation. A large error variance necessarily makes estimates imprecise. The error variance does not decrease with the sample size.

- Omitting relevant variables

All estimated coefficients will be biased. Instead of estimating B1, we end up estimating B1 + B2*delta1, where delta1 is the slope from a regression of the omitted variable on the included one. There is no bias if B2 = 0 or if delta1 = 0. (See the sketch below.)
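
A sketch of the bias in the simplest case, assuming a true model with two regressors x1 and x2 where x2 is omitted (the notation is standard, not from the source):

\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1, \qquad E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1,

where \tilde{\beta}_1 is the slope from the short regression of y on x1 only, and \tilde{\delta}_1 is the slope from regressing the omitted x2 on x1. The bias term \beta_2 \tilde{\delta}_1 vanishes when \beta_2 = 0 or \tilde{\delta}_1 = 0.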

superfluous variable

An explanatory variable that is a perfect linear combination of other explanatory variables

Unbiasedness

Average property in repeated samples; in a given sample, the estimates may still be far away from the true values.

Multicollinearity problem

Dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias). Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of the other effects may be very precise. Multicollinearity is not a violation of MLR.3 in the strict sense.

endogenous variables

Explanatory variables that are correlated with the error term

exogenous variables

Explanatory variables that are uncorrelated with the error term

MLR.5

Homoskedasticity

In which model is the zero conditional mean assumption more likely to hold: simple or multiple regression?

In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error

Assumption: Linear in parameters

In the population, the relationship between y and the explanatory variables is linear y = B0 + B1X1 + B2X2 + ... + BkXk + u

Exogeneity

Key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators. The value of the explanatory variables must contain no information about the mean of the unobserved factors.

MLR.1

Linear in parameters

Gauss-Markov Theorem definition

Mathematical result stating that, under certain conditions, the OLS estimator is the best linear unbiased estimator (BLUE) of the regression coefficients conditional on the values of the regressors.

Components of OLS Variances: 2) The total sample variation in the explanatory variable:

More sample variation leads to more precise estimates. Total sample variation automatically increases with the sample size. Increasing the sample size is thus a way to get more precise estimates.

Will irrelevant variables cause bias in a model?

No

MLR.3

No perfect collinearity

MLR.6

Normality of error terms

When is OLS BLUE?

OLS is only the best estimator if MLR.1 - MLR.5 hold; if there is heteroskedasticity for example, there are better estimators

Partialling Out

One can show that the estimated coefficient of an explanatory variable in a multiple regression can be obtained in two steps: 1) Regress the explanatory variable on all other explanatory variables. 2) Regress y on the residuals from this regression. (See the numerical sketch after this card.)
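
A minimal numerical sketch of the two-step procedure (simulated data and variable names are illustrative, not from the source); both routes give the same coefficient on x1:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)               # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on a constant, x1, and x2
X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x1 on a constant and x2, keep the residuals
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Step 2: regress y on those residuals; the slope equals the multiple-regression coefficient on x1
b_partial = (r1 @ y) / (r1 @ r1)

print(b_full[1], b_partial)                      # the two numbers coincide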

MLR.2

Random sampling

Components of OLS Variances: 3) Linear relationships among the independent variables

Regress xj on all other independent variables (including a constant). The R-squared of this regression will be higher the better xj can be linearly explained by the other independent variables, and the sampling variance of Bjhat will be higher the higher this R-squared is. The problem of almost linearly dependent explanatory variables is called multicollinearity. (See the sketch after this card.)
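
A small sketch of how one might compute Rj^2 and the implied inflation factor 1/(1 - Rj^2) for one regressor (simulated data and names are illustrative; "variance inflation factor" is standard terminology, not used in the source):

import numpy as np

rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.9 * x3 + 0.1 * rng.normal(size=n)  # x1 is almost a linear function of x2 and x3

# Regress x1 on a constant and the other regressors to obtain R_1^2
Z = np.column_stack([np.ones(n), x2, x3])
coef = np.linalg.lstsq(Z, x1, rcond=None)[0]
fitted = Z @ coef
r2 = 1.0 - np.sum((x1 - fitted) ** 2) / np.sum((x1 - x1.mean()) ** 2)

vif = 1.0 / (1.0 - r2)
print(r2, vif)  # R_1^2 close to 1 -> large inflation factor -> inflated sampling variance for B1hat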

Theorem 3.2

Sampling variances of the OLS slope estimators. Assumptions: MLR.1 - MLR.5.

Random sampling

The data is a random sample drawn from the population: {(xi1, xi2, ..., xik, yi): i = 1, ..., n}. Each data point therefore follows the population equation: yi = B0 + B1xi1 + ... + Bkxik + ui

Why partialling out works?

The residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables. The slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dependent variable.

Zero conditional mean

The value of the explanatory variables must contain no information about the mean of the unobserved factors: E(ui | xi1, xi2, ..., xik) = 0

The Gauss-Markov Theorem

Under assumptions MLR.1 - MLR.4, OLS is unbiased. However, under these assumptions there may be many other estimators that are also unbiased. Which one is the unbiased estimator with the smallest variance? OLS, once MLR.5 (homoskedasticity) is added.

Theorem 3.4 (Gauss-Markov Theorem)

Under assumptions MLR.1 - MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e. they have the smallest sampling variance among all linear unbiased estimators.

normality assumption - discussion

Under normality, OLS is the best (even nonlinear) unbiased estimator

Endogeneity

Violation of assumption MLR.4 (zero conditional mean): explanatory variables are correlated with the error term (are endogenous); the value of the explanatory variables contains information about the mean of the unobserved factors.

When does MLR.4 hold?

When all explanatory variables are exogenous (exogeneity).

MLR.4

Zero conditional mean (of the error term, conditional on the explanatory variables)

sampling variances of the OLS estimators

The formulas are only valid under assumptions MLR.1 - MLR.5 (in particular, there has to be homoskedasticity).

What problem do irrelevant variables generate?

They may increase the sampling variance of the OLS estimators.

Larger sample sizes:

The t-distribution is close to the Normal(0,1) distribution (MLR.6 not needed). t-tests are valid in large samples without MLR.6. MLR.1 - MLR.5 are still necessary, especially homoskedasticity.

