ECON 2206 - ECONOMETRICS


Classical linear model (CLM) assumptions

MLR.1 - MLR.6 (the Gauss-Markov assumptions plus normality of the error)

Gauss-Markov assumptions for multiple regression

MLR.1 linear in parameters 2. random sampling 3. no perfect collinearity (no exact linear relationships among the independent variables; when two are perfectly correlated we can omit one of them) 4. zero conditional mean (ZCM) 5. homoskedasticity

zero conditional mean assumption

the explanatory variables must not be influenced by (and must contain no information about) the unobservables: E(u | x1, ..., xk) = 0

multicollinearity

- A situation in which several independent variables are highly correlated with each other, which makes it difficult to estimate separate (ceteris paribus) regression coefficients for the correlated variables. - it is a feature of the sample rather than a violation of any assumption about the population parameters, so there is no formal test for it

Frisch Waugh theorem

- "partialling out" interpretation of multiple regression - residuals from holding other things constant, uncorrelated with the other explanatory variables

Unit Root Process

- A highly persistent time series process where the current value equals last period's value plus a weakly dependent disturbance - e_t is allowed to follow an arbitrary weakly dependent process - taking successive (first) differences removes the unit root and yields a weakly dependent series

Random Walks with Drift

- A random walk that has a constant (the drift) added in each period - the drift creates a linear trend around which the random walk wanders - there is no clear direction in which the series develops, and it may wander far from the trend - not covariance stationary and not weakly dependent

Adjusted R^2

- Adjusted R^2 compares the explanatory power of regression models that contain different numbers of predictors - increases when the new term improves the model more than would be expected by chance - decreases when the new term improves the model less than would be expected by chance - adjusted R^2 can be negative and never exceeds the usual R^2 (see the formula below)
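
For reference, the standard formula with n observations and k slope parameters:

```latex
\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}
```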

strict exogeneity

- An assumption for time series or panel data models: the error term in each period is uncorrelated with (mean-independent of) the explanatory variables in every time period, not just the current one - rules out feedback from the dependent variable onto future values of the explanatory variables - precludes the use of lagged dependent variables as regressors

normality

- Assumption MLR.6: the population error u is independent of the explanatory variables and is normally distributed with zero mean and variance sigma^2. - cases where normality is impossible: wages (continuous but non-negative), number of arrests (discrete, taking on integer values), unemployment status (discrete indicator variable, yes or no). - approximate normality can be achieved through transformations or by using a large enough sample - because u is the sum of many unobserved factors, the central limit theorem can be used to argue that u is approximately normally distributed -> normality of the error terms

Omitted variable bias

- The bias that arises in the OLS estimators when a relevant variable is omitted from the regression. - misspecification also occurs when too many (irrelevant) variables are included; this does not cause bias but can inflate the variances of the estimators - when deciding whether a variable should be included, use a t-test rather than R^2

F-test

- Used to test a group of restrictions jointly (e.g. several exclusion restrictions at once) by comparing the restricted and unrestricted models, i.e. comparing their residual variances (see the formula below)
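
A standard form of the statistic for q exclusion restrictions, where SSR_r and SSR_ur are the restricted and unrestricted sums of squared residuals:

```latex
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q,\,n-k-1} \quad \text{under } H_0
```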

dummy variables

- a way to include qualitative information (e.g. gender) in a regression by coding it as 0/1 variables

random walks

- the value today is the accumulation of all past shocks plus the initial value - random walks are highly persistent (the effect of a shock lasts forever) - not covariance stationary (the variance and autocorrelations depend on time) - not weakly dependent, as the correlation between observations vanishes only slowly as the gap between them grows (see below)
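
The textbook representation, with e_t an i.i.d. zero-mean shock and the process started at y_0:

```latex
y_t = y_{t-1} + e_t = y_0 + \sum_{s=1}^{t} e_s, \qquad
\operatorname{Var}(y_t) = \sigma_e^2\, t, \qquad
\operatorname{Corr}(y_t, y_{t+h}) = \sqrt{t/(t+h)}
```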

limitations of FD estimator

- assumes the omitted variables are (approximately) constant over time - omitted variables that vary over both i and t still cause bias - needs variation over time in the included explanatory variables

White test for heteroskedasticity

- based on an auxiliary regression with the squared OLS residuals as the dependent variable - the squared residuals are explained by the x's, their squares and their cross-products - can detect a broader class of heteroskedasticity than the Breusch-Pagan (BP) test (see the sketch below)
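
A minimal sketch using statsmodels' het_white, which runs the auxiliary regression described above; the data and variable names are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(1)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n) * (1 + 0.8 * np.abs(x1))   # heteroskedastic error
y = 1 + 0.5 * x1 - 0.3 * x2 + u

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"LM = {lm_stat:.2f}, p-value = {lm_pval:.4f}")  # small p-value -> reject homoskedasticity
```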

homoskedasticity

- constant error variance: Var(u|x) = sigma^2, which cannot depend on x - the values of the explanatory variables must contain no information about the variability of the unobservables

spurious regression

- correlation != causation - an omitted variable problem can occur even when the variables are I(0) the problem: - regressing one I(1) series on another, completely independent I(1) series tends to produce extremely high t-statistics - regressions involving unit-root processes may therefore lead to misleading inference - regressions with I(1) variables are not always spurious (the series may be cointegrated)

static models

- the current value of the dependent variable is modelled as a function of the current (contemporaneous) values of the explanatory variables - example: the static Phillips curve - assumes the impact is felt instantaneously, although firms/people may be slow to react or to change their behaviour (costs can impede instantaneous reactions)

Average partial effect APE

- describes the relationship between the dependent variable and each explanatory variable, averaged over the population - a reparameterization is often useful for finding the APE (it enables easy interpretation of the parameters)

Components of OLS estimators

- the error variance (a larger sigma^2 means more noise, so estimates become imprecise) - the total sample variation in the explanatory variable (more sample variation gives more precise estimates) - linear relationships among the explanatory variables (make sure severe multicollinearity does not occur); see the formula below
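
The sampling variance of a slope estimator under MLR.1-5, which collects the three components above (SST_j is the total sample variation in x_j, and R_j^2 comes from regressing x_j on the other explanatory variables):

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}
```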

outliers

- extreme outliers can cause problems for OLS - a less sensitive approach such as least absolute deviations (LAD) can be used - cleaning the data is a tedious but necessary first step in any analysis - check whether the outliers actually matter for the estimates

weighted least squares

- for heteroskedasticity of known form, the transformation involves weighting the data (WLS) - this exploits information about the error variance to generate an even more precise estimator; WLS is a special case of generalised least squares (GLS) - the transformed model has no intercept, and its regressors should not be interpreted in a structural sense - if the original model satisfies MLR.1-4, then so does the transformed model - when the variance function has the wrong form: WLS is still consistent under MLR.1-4; if there is strong heteroskedasticity it is often better to use a wrong form than OLS in order to increase efficiency; if OLS and WLS produce very different estimates, this indicates that other assumptions (e.g. MLR.4) are wrong (see the sketch below)
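
A minimal WLS sketch, assuming the error variance is proportional to a known function h(x) of one regressor, here h(x) = x1; the names and data are illustrative only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x1 = rng.uniform(1, 10, size=n)
y = 2 + 0.7 * x1 + rng.normal(size=n) * np.sqrt(x1)   # Var(u|x1) proportional to x1

X = sm.add_constant(x1)
ols = sm.OLS(y, X).fit()
wls = sm.WLS(y, X, weights=1.0 / x1).fit()            # weight each observation by 1/h(x)

print(ols.params, wls.params)   # both consistent; WLS is more efficient if h(x) is right
```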

pooled-cross sections

- independent cross sections drawn in different time periods are pooled together to create one data set useful because: - increased sample size - allows investigating whether relationships are constant over time - allows evaluating the impact of a policy - requires the use of dummy variables for the time periods

Efficient Market Hypothesis (EMH)

- information observable in the market prior to week t should not help predict the return during week t (a lagged dependent variable should have no predictive power) - TS.3' holds and returns are weakly dependent - under the EMH, B1 = 0, i.e. past returns do not predict current returns

Fixed effects / LSDV regression

- the fixed effects estimator can be interpreted as a least squares regression on a full set of unit dummies (LSDV) - if T = 2, fixed effects and first differencing are identical (for T > 2, fixed effects is more efficient if the classical assumptions hold; under reasonable assumptions both are consistent) - first differencing may be better in the case of severe serial correlation in the original errors

R^2

- the ratio of the explained variation to the total variation - lies between 0 and 1; the closer to 1, the closer the data are to the fitted regression line

Functional Form Misspecification

- leads to biased and inconsistent estimates - detect it using: -> exclusion restrictions for nested models and comparisons of non-nested models (F-tests) -> a test of general functional form misspecification (the RESET test, a useful test that can detect various forms of misspecification; see the sketch below)
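
A hedged sketch of Ramsey's RESET via statsmodels (linear_reset is available in recent statsmodels versions); the data are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = 1 + 0.5 * x + 0.4 * x**2 + rng.normal(size=n)   # true model is quadratic

res = sm.OLS(y, sm.add_constant(x)).fit()            # (mis)specified linear model
reset = linear_reset(res, power=3, use_f=True)       # add powers of the fitted values
print(reset.pvalue)                                  # small p-value -> functional form misspecification
```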

semi-log model

- linear in parameters - non-linear relationship between the dependent and explanatory variables

log-log model

- a linear regression model (linear in parameters) with a non-linear functional form; the slope coefficients are interpreted as elasticities

using highly persistent time series in regression analysis

- many time series violate weak dependence because they are highly persistent (= strongly dependent) - the usual OLS inference methods are then invalid - a transformation to weak dependence (e.g. differencing) is sometimes possible

motivation for multiple regression analysis

- to measure the ceteris paribus effects of the independent variables - to explicitly control for factors that would otherwise be left in the disturbance term, allowing for better OLS estimates

seasonality in time series

- modelling -> create a set of seasonal dummy variables - an R^2 based on first deseasonalising the dependent variable may better reflect the explanatory power of the other variables - even if the series appears to be already deseasonalised, still include the seasonal dummies

nested models

- models are nested if one is a special case of the other - non-nested models: one cannot be obtained from the other by imposing restrictions, or the models have different forms of the dependent variable -> this can occur when one model uses the log of the dependent variable

dummy variable trap

- a perfect multicollinearity problem that occurs if all dummy categories are included along with the intercept, e.g. male + female = 1, so the dummies are perfectly collinear with the constant - solution: drop one of the dummies - we can then keep an intercept, and the excluded (base) category is absorbed into it (see the sketch below)
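
A small sketch of avoiding the trap with pandas: drop_first=True omits one category per variable, which becomes the base group absorbed by the intercept (the toy data frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"gender": ["male", "female", "female", "male"],
                   "region": ["north", "south", "east", "north"]})
X = pd.get_dummies(df, drop_first=True)   # one dummy dropped per category -> base group in the intercept
print(X.columns.tolist())
```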

asymptotic bias/ normality

- the normality assumption MLR.6 is often questionable - the OLS estimators are approximately normal in large samples even without MLR.6

consistency of OLS

- an estimator is consistent if the probability of the estimate being close to the true population value becomes high as the sample size increases - holds under MLR.1 - MLR.4 - the sampling distribution collapses onto the true value as the sample grows

confidence intervals

- provide a range of likely values for the population parameter, constructed as the estimate plus or minus a critical value times its standard error

Hypothesis testing - t-test

- used to test a hypothesis about a single population parameter - reject the null at a given significance level when the t statistic exceeds the critical value in absolute value (see below)
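
The statistic for testing H0: beta_j = a_j against a two-sided alternative, where c is the critical value at the chosen significance level:

```latex
t = \frac{\hat{\beta}_j - a_j}{\operatorname{se}(\hat{\beta}_j)}, \qquad \text{reject } H_0 \text{ if } |t| > c
```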

functional form: interactions

- similar to quadratics, an interaction term complicates the interpretation of the parameters - e.g. the effect of the number of bedrooms depends on the level of square footage

ordinary least squares

- the goal of linear regression is to fit the function as closely as possible to the data - this is done by minimizing the sum of squared residuals - the variances of the OLS estimators are derived under SLR.1 - SLR.5 (see the sketch below)
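
A minimal sketch of OLS as minimizing the sum of squared residuals, solved via the normal equations on simulated data (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
y = 3 + 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves (X'X) b = X'y
residuals = y - X @ beta_hat
print(beta_hat, (residuals ** 2).sum())       # estimates and the minimized SSR
```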

first differenced FD panel estimator

- a solution using first differencing: the unobserved heterogeneity is differenced away - OLS applied to the differenced equation is the first-differenced estimator - further explanatory variables may be included in the original equation - consistently estimates causal effects in the presence of time-invariant endogeneity (strict exogeneity is needed in the original equation) - estimates will be imprecise if the explanatory variables vary little over time (see the sketch below)
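
A minimal first-differencing sketch with pandas/statsmodels; the panel data frame (columns id, year, y, x) is hypothetical and simulated so that OLS in levels would be biased:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
ids = np.repeat(np.arange(50), 3)                  # 50 units, 3 periods
years = np.tile([2001, 2002, 2003], 50)
a = np.repeat(rng.normal(size=50), 3)              # unobserved, time-invariant effect
x = rng.normal(size=150) + a                       # x correlated with a -> levels OLS is biased
y = 1 + 2 * x + a + rng.normal(size=150)
df = pd.DataFrame({"id": ids, "year": years, "y": y, "x": x})

d = df.sort_values(["id", "year"]).groupby("id")[["y", "x"]].diff().dropna()
fd = sm.OLS(d["y"], sm.add_constant(d["x"])).fit() # the first-differenced estimator
print(fd.params)
```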

stationary

- the stochastic properties and temporal dependence structure do not change over time - covariance stationary: constant mean, constant variance, and covariances between observations that depend only on the distance between them - non-stationarity is sometimes easy to detect (e.g. a time series with a linear trend) - formally: a time series process where the marginal and all joint distributions are invariant across time

weak dependence

- the dependence between the variables at two points in time diminishes as the interval between the two points increases (i.e. as we move further apart in time) - a series may be non-stationary but weakly dependent - for the LLN and CLT to apply, individual observations must not be too strongly related to each other (the relationship must become weaker the further apart they are)

p-values

- the smallest significance level at which H0 can be rejected - a small p-value indicates that the variable is statistically significant; a p-value below 0.05 indicates strong evidence against the null

using trend-stationary series in regression analysis

- time series with deterministic time trends are non-stationary - if they are stationary around the trend and weakly dependent, they are called trend-stationary processes - these satisfy assumption TS.1'

Time series I(1)

- to find whether a time series is I(1), use unit root tests - or use the sample first-order autocorrelation (it measures how strongly adjacent observations are related; a value close to 1 suggests the series is highly persistent and contains a unit root) - a unit root and a trend may be eliminated by differencing (see the sketch below)
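
A small sketch of checking for a unit root with the augmented Dickey-Fuller test from statsmodels; the random-walk series is simulated purely for illustration:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(size=500))          # random walk -> I(1)

stat, pval = adfuller(y)[:2]                 # large p-value: fail to reject the unit root
stat_d, pval_d = adfuller(np.diff(y))[:2]    # first difference is I(0): unit root rejected
print(pval, pval_d)
```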

robust inference/standard errors

- under heteroskedasticity the OLS estimators remain unbiased and consistent - robust inference was developed so that inference is valid in the presence of heteroskedasticity of unknown form (see the sketch below)
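
A minimal sketch of heteroskedasticity-robust (White-corrected) standard errors in statsmodels via cov_type="HC1"; the data are simulated for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=n)
y = 1 + 0.5 * x + rng.normal(size=n) * (1 + np.abs(x))   # heteroskedastic error

X = sm.add_constant(x)
usual = sm.OLS(y, X).fit()                    # usual (non-robust) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")     # robust standard errors, valid under hetero
print(usual.bse, robust.bse)
```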

difference in differences estimator DiD

- used to evaluate a policy change - compares the change in outcomes (e.g. prices) before and after the policy for the treatment group with the same change for the control group - controls for systematic differences between the treatment and control groups (see below)
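
The standard two-group, two-period setup, where dT indicates the treatment group and d2 the post-policy period; delta_1 is the DiD estimate:

```latex
y = \beta_0 + \beta_1\, dT + \beta_2\, d2 + \delta_1\,(d2 \cdot dT) + u, \qquad
\hat{\delta}_1 = (\bar{y}_{T,2} - \bar{y}_{T,1}) - (\bar{y}_{C,2} - \bar{y}_{C,1})
```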

spurious regression

-> occurs if trending variables are regressed on each other but are in fact driven by a common trend -> the trending produces high (but misleading) values of R^2 -> a trend should be included when the dependent variable shows trending behaviour, and when both the dependent and independent variables have trends

Gauss-markov assumptions

1. linear in parameters 2. random sampling 3. sample variation in the explanatory variable 4. zero conditional mean (ZCM) 5. homoskedasticity (constant variance)

Linear Probability Model (LPM)

A binary response model where the response probability is linear in its parameters. disadvantages: - predicted probabilities may be larger than 1 or smaller than 0 - the constant marginal effect is sometimes logically impossible - the error term is heteroskedastic by construction, so robust standard errors are needed advantages of the LPM: - easy interpretation and estimation

finite distributed lag model

A dynamic model where one or more explanatory variables are allowed to have lagged effects on the dependent variable. - the long-run propensity is the cumulative effect of all the individual lag coefficients (see below)
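
A finite distributed lag model of order two, where delta_0 is the impact propensity and the long-run propensity (LRP) is the sum of the lag coefficients:

```latex
y_t = \alpha_0 + \delta_0 z_t + \delta_1 z_{t-1} + \delta_2 z_{t-2} + u_t, \qquad
LRP = \delta_0 + \delta_1 + \delta_2
```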

Exogenous Sample Selection

A sample selection that either depends on exogenous explanatory variables or is independent of the error term in the equation of interest. -> sample selection is not a problem if it is uncorrelated with the error term of the regression

Chow test

A test for a break in a time series regression at a known break date. - also used as a test of poolability across groups - a joint F-test of linear restrictions - relies on the assumption of homoskedasticity (constant variance across groups); see the formula below
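
The usual Chow statistic, where SSR_P comes from the pooled regression, SSR_1 and SSR_2 from the two subsamples, and k is the number of slope parameters:

```latex
F = \frac{\left[SSR_P - (SSR_1 + SSR_2)\right]/(k+1)}{(SSR_1 + SSR_2)/\left[n - 2(k+1)\right]}
```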

feasible GLS

A version of the generalized least squares (GLS) estimator that uses an estimator of the conditional variance of the regression errors and of the covariance between the regression errors at different observations.

BLUE

Best Linear Unbiased Estimator - under assumptions MLR.1-4, OLS is unbiased - under assumptions MLR.1-5, OLS is BLUE, i.e. it has the smallest variance among all linear unbiased estimators

contemporaneously exogenous

Describes a regressor in a time series or panel data application that is uncorrelated with the error term in the same time period, although it may be correlated with the errors in other time periods. -> exogeneity in the same time period only

heteroskedasticity

Non-constant error variance Detect: Breusch-Pagan test Correct: White-corrected (robust) standard errors - when heteroskedasticity is present, OLS is no longer the most efficient linear estimator (not BLUE)

Endogenous Sample Selection

Nonrandom sample selection where the selection is related to the dependent variable, either directly or through the error term in the equation -> sample selection is a problem when it is based on the dependent variable or on the error term, e.g. modelling wages using a sample consisting only of employed workers

omitted variables

OV bias occurs if the excluded variable is correlated with the included explanatory variables how to avoid it: - include the relevant variables - introduce proxies - exploit panel data - find instrumental variables

OLS under classical assumptions for TS

TS.1 linear in parameters 2. no perfect collinearity 3. zero conditional mean (strict exogeneity in all periods is a very strong assumption; contemporaneous exogeneity applies just to the current period) 4. homoskedasticity (time series often violate constant error variance) 5. no serial correlation or autocorrelation (crucial in TS modelling: conditional on the explanatory variables, unobserved factors must not be correlated over time)

asymptotic properties of OLS

TS.1' linear in parameters (same as usual, but now the dependent and independent variables are assumed to be stationary and weakly dependent) 2. no perfect collinearity 3. ZCM (the explanatory variables are assumed to be only contemporaneously exogenous rather than strictly exogenous) 4. homoskedasticity (contemporaneous) 5. no serial correlation

proxy variables

an alternative measure used when it is difficult to determine and measure the variable of interest directly - stands in for an unobserved explanatory variable - e.g. IQ as a proxy for ability in a wage equation - a lagged dependent variable may be a good proxy for general unobserved factors

time series data

data collected over several time periods - the ordering of observations matters - typical features: serial correlation (the past affecting the future), trends over time, seasonality (a tendency for patterns to recur) - randomness of time series: sequences of random variables (stochastic processes) - the temporal nature of time series data must be respected

functional form: quadratics

allow for increasing or decreasing marginal effects: the marginal change in y from a change in x depends on the level of x (see below)
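
For a quadratic specification, the (approximate) marginal effect depends on the level of x, with a turning point at x*:

```latex
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2, \qquad
\frac{\Delta \hat{y}}{\Delta x} \approx \hat{\beta}_1 + 2\hat{\beta}_2 x, \qquad
x^* = \left|\frac{\hat{\beta}_1}{2\hat{\beta}_2}\right|
```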

Asymptotic theory

large-sample theory: assesses the properties of estimators and test statistics as the sample size n tends to infinity

differencing with T >2

issues: - the differenced model does not contain an intercept, but this can be "fixed" if necessary - take care when calculating the differences - one might worry that the new error term is serially correlated across the time periods

transformations on highly persistent time series

order of integration: - weakly dependent time series are integrated of order zero, I(0) - a time series that has to be differenced once to obtain a weakly dependent series is integrated of order one, I(1)

two time periods

the simplest panel data set (T = 2) - we observe all n entities at t = 1 and t = 2 - these are not pooled cross sections - note that a_i is fixed over time: think of it as an omitted variable, the fixed effect or unobserved heterogeneity - differencing the two periods allows B1 to be estimated consistently (see below)
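
The two-period unobserved effects model and its first difference, which removes a_i:

```latex
y_{it} = \beta_0 + \delta_0\, d2_t + \beta_1 x_{it} + a_i + u_{it}, \qquad
\Delta y_i = \delta_0 + \beta_1\, \Delta x_i + \Delta u_i
```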

Asymptotic normality of OLS

under assumptions TS.1'-TS.5', the OLS estimators are asymptotically normally distributed - the OLS standard errors, t-statistics and F-statistics are asymptotically valid

functional form: logs

useful to: - reduce problems with outliers (taking logs compresses the variance) - help secure normality and homoskedasticity variables that should not be logged: variables measured in percentages, and variables taking zero or negative values

