SAL 213 Unit 2
detection of autocorrelation
-graphical method -Durbin-Watson Test -Breusch-Godfrey (BG) test
Breusch-Pagan (BP) Test
-Estimate the OLS regression and obtain the squared OLS residuals from this regression.
-Regress the squared residuals on the k regressors included in the model.
-The null hypothesis is that the error variance is homoscedastic, that is, that all the slope coefficients in this auxiliary regression are simultaneously equal to zero.
-Use the F statistic from this regression, with (k-1) numerator and (n-k) denominator df, to test this hypothesis.
-If the computed F statistic is statistically significant, we reject the hypothesis of homoscedasticity; if it is not, we do not reject the null hypothesis.
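A minimal sketch of the BP test in Python with statsmodels; the simulated data and variable names below are placeholders, not part of the original notes.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # hypothetical regressors
y = 1.0 + X @ [0.5, -0.3] + rng.normal(scale=1 + np.abs(X[:, 0]), size=200)

Xc = sm.add_constant(X)
res = sm.OLS(y, Xc).fit()                           # step 1: OLS regression

# het_breuschpagan regresses the squared residuals on the supplied regressors
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, Xc)
print(f"BP F statistic = {f_stat:.3f}, p-value = {f_pvalue:.4f}")
# A small p-value rejects the null hypothesis of homoscedasticity.
```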
remedial measures
-First-Difference Transformation -Generalized Transformation -Newey-West Method -Model Evaluation
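For the Newey-West method, a minimal sketch using statsmodels HAC (heteroscedasticity- and autocorrelation-consistent) standard errors; the simulated AR(1) errors and the lag choice are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                       # AR(1) errors to mimic autocorrelation
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 2.0 + 0.5 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                          # ordinary SEs
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})   # Newey-West SEs
print(ols.bse, hac.bse)                     # compare the two sets of standard errors
```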
detection of heteroscedasticity
-Graph histogram of squared residuals -Graph squared residuals against predicted Y -Breusch-Pagan (BP) Test -White's Test of Heteroscedasticity
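A minimal sketch of the graphical check, plotting squared OLS residuals against predicted Y; the simulated data are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 3 + 2 * x + rng.normal(scale=x, size=200)   # error spread grows with x

res = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(res.fittedvalues, res.resid ** 2, s=10)
plt.xlabel("Predicted Y")
plt.ylabel("Squared residuals")
plt.title("A fan-shaped pattern suggests heteroscedasticity")
plt.show()
```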
Detection of Multicollinearity
-High R^2 but few significant t ratios
-High pair-wise correlations among explanatory variables or regressors
-High partial correlation coefficients
-Significant F test for auxiliary regressions
-High Variance Inflation Factor (VIF), particularly exceeding 10 in value, and low Tolerance Factor (TOL, the inverse of VIF)
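A minimal sketch of computing VIF and TOL with statsmodels; the regressors are simulated placeholders, with x2 built to be nearly collinear with x1.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)       # nearly collinear with x1
x3 = rng.normal(size=100)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.1f}, TOL = {1 / vif:.3f}")   # VIF > 10 signals trouble
```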
What should we do if we detect multicollinearity?
-Nothing, for we often have no control over the data.
-Redefining the model by excluding variables may attenuate the problem, provided we do not omit relevant variables.
-Principal components analysis: construct artificial variables from the regressors such that they are orthogonal to one another.
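A minimal sketch of the principal-components idea, using scikit-learn to build orthogonal components from standardized regressors and then running OLS on those components; the data and the package choice are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)       # highly collinear pair
y = 1 + x1 + x2 + rng.normal(size=100)

X = np.column_stack([x1, x2])
Z = PCA().fit_transform(StandardScaler().fit_transform(X))  # orthogonal components
res = sm.OLS(y, sm.add_constant(Z)).fit()
print(res.summary())
```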
detection of omission of variables
-Ramsey's Regression Specification Error (RESET) Test -Lagrange Multiplier (LM) test
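A minimal sketch of Ramsey's RESET test; it assumes a recent statsmodels version that provides linear_reset, and the misspecified model is simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(5)
x = rng.uniform(1, 5, 150)
y = 2 + 3 * x ** 2 + rng.normal(size=150)        # true relationship is nonlinear in x

res = sm.OLS(y, sm.add_constant(x)).fit()        # misspecified linear model
reset = linear_reset(res, power=2, use_f=True)   # adds squared fitted values
print(reset)                                     # significant => specification error
```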
White's Test
-Regress the squared residuals on the regressors, the squared terms of these regressors, and the pair-wise cross-product terms of the regressors.
-Obtain the R^2 value from this regression and multiply it by the number of observations.
-Under the null hypothesis of homoscedasticity, this product follows the chi-square distribution with df equal to the number of regressors (excluding the intercept) in the auxiliary regression.
-The White test is more general and more flexible than the BP test.
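A minimal sketch of White's test via statsmodels het_white, which builds the squares and cross-products of the regressors internally; the data are simulated placeholders.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
y = 1 + X @ [1.0, 0.5] + rng.normal(scale=1 + X[:, 1] ** 2, size=200)

Xc = sm.add_constant(X)
res = sm.OLS(y, Xc).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, Xc)
print(f"n*R^2 = {lm_stat:.2f}, chi-square p-value = {lm_pvalue:.4f}")
# A small p-value rejects the null hypothesis of homoscedasticity.
```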
consequences of high collinearity
-The OLS estimators are still BLUE, but one or more regression coefficients have large standard errors relative to the values of the coefficients, thereby making the t ratios small.
-Even though some regression coefficients are statistically insignificant, the R^2 value may be very high.
-Therefore, one may conclude (misleadingly) that the true values of these coefficients are not different from zero.
-Also, the regression coefficients may be very sensitive to small changes in the data, especially if the sample is relatively small.
Consequences of Heteroscedasticity
-The OLS estimators are still unbiased and consistent, yet the estimators are less efficient, making statistical inference less reliable (i.e., the estimated t values may not be reliable).
-Thus, the estimators are not best linear unbiased estimators (BLUE); they are simply linear unbiased estimators (LUE).
-In the presence of heteroscedasticity, the BLUE estimators are provided by the method of weighted least squares (WLS).
consequences of autocorrelation
-The OLS estimators are still unbiased and consistent.
-They are still normally distributed in large samples.
-They are no longer efficient, meaning that they are no longer BLUE.
-In most cases the standard errors are underestimated.
-Thus, the hypothesis-testing procedure becomes suspect, since the estimated standard errors may not be reliable, even asymptotically (i.e., in large samples).
Durbin-Watson test
-Two critical values of the d statistic, dL and dU, called the lower and upper limits, are established.
Decision rules:
1. If d < dL, there probably is evidence of positive autocorrelation.
2. If d > dU, there probably is no evidence of positive autocorrelation.
3. If dL < d < dU, no definite conclusion about positive autocorrelation can be drawn.
4. If dU < d < 4 - dU, there probably is no evidence of positive or negative autocorrelation.
5. If 4 - dU < d < 4 - dL, no definite conclusion about negative autocorrelation can be drawn.
6. If 4 - dL < d < 4, there probably is evidence of negative autocorrelation.
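A minimal sketch computing the d statistic with statsmodels; the data are simulated, and the resulting d would then be compared against the dL and dU bounds from a Durbin-Watson table for the given n and k.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
n = 100
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()         # positively autocorrelated errors
y = 1 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
d = durbin_watson(res.resid)
print(f"d = {d:.3f}")        # values well below 2 suggest positive autocorrelation
```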
the simultaneity problem
-There are many situations where such a unidirectional relationship between Y and the Xs cannot be maintained, since some Xs affect Y but Y in turn also affects one or more of the Xs.
-Simultaneous equation regression models are models that take into account such feedback relationships among variables.
What should we do if we detect heteroscedasticity?
-Use the method of Weighted Least Squares (WLS). -Take the natural log of the dependent variable. -Use White's heteroscedasticity-consistent standard errors, also known as robust standard errors.
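A minimal sketch of two of these remedies in statsmodels: robust (heteroscedasticity-consistent) standard errors and WLS. The weights assume the error variance is proportional to x^2, which is an illustrative assumption, not a general rule.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(scale=x, size=200)    # error spread grows with x
X = sm.add_constant(x)

robust = sm.OLS(y, X).fit(cov_type="HC3")        # White's robust standard errors
wls = sm.WLS(y, X, weights=1.0 / x ** 2).fit()   # weights = 1 / assumed error variance
print(robust.bse, wls.bse)
```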
Durbin-Watson d value
-The d value always lies between 0 and 4.
-The closer it is to zero, the greater the evidence of positive autocorrelation; the closer it is to 4, the greater the evidence of negative autocorrelation.
-If d is about 2, there is no evidence of positive or negative first-order autocorrelation.
Durbin alternative test
-Takes into account lagged values of the dependent variable included as regressors.
-Provides a formal test of the null hypothesis of serially uncorrelated disturbances against the alternative of autocorrelation of order p.
consequences of omission
-If the omitted variables are correlated with the variables included in the model, the coefficients of the estimated model are biased.
-Even if the incorrectly excluded variables are not correlated with the variables included in the model, the intercept of the estimated model is biased.
-The disturbance variance is incorrectly estimated.
-The variances of the estimated coefficients of the misspecified model are biased.
-In consequence, the usual confidence intervals and hypothesis-testing procedures become suspect, leading to misleading conclusions about the statistical significance of the estimated parameters.
-Furthermore, forecasts based on the incorrect model, and the forecast confidence intervals based on it, will be unreliable.
Consequences for Errors of Measurement in the Regressor
1. The OLS estimators are biased as well as inconsistent.
2. Errors in a single regressor can lead to biased and inconsistent estimates of the coefficients of the other regressors in the model.
-It is often suggested that we use instrumental or proxy variables for variables suspected of having measurement errors. The proxy variables must satisfy two requirements: they must be highly correlated with the variables for which they are a proxy, and they must be uncorrelated with the usual equation error as well as the measurement error. But such proxies are not easy to find.
Consequences for Errors of Measurement in the Regressand
1. The OLS estimators are still unbiased.
2. The variances and standard errors of the OLS estimators are still unbiased.
3. But the estimated variances, and ipso facto the standard errors, are larger than in the absence of such errors.
In short, errors of measurement in the regressand do not pose a very serious threat to OLS estimation.
consequences of Inclusion of irrelevant variables
1. The OLS estimators of the "incorrect" or overfitted model are all unbiased and consistent.
2. The error variance is correctly estimated.
3. The usual confidence interval and hypothesis-testing procedures remain valid.
4. However, the estimated coefficients of such a model are generally inefficient (their variances will be larger than those of the true model).
Durbin-Watson test assumptions
1. The regression model includes an intercept term.
2. The regressors are fixed in repeated sampling.
3. The error term follows the first-order autoregressive, AR(1), scheme: u_t = ρ·u_(t-1) + v_t, where ρ (rho) is the coefficient of autocorrelation, a value between -1 and 1.
4. The error term is normally distributed.
5. The regressors do not include the lagged value(s) of the dependent variable, Y_t.
Perfect collinearity
A perfect linear relationship exists between two (or more) of the explanatory variables.
multicollinearity
A situation in which several independent variables are highly correlated with each other. This characteristic can result in difficulty in estimating separate or independent regression coefficients for the correlated variables.
errors of measurement
Although not explicitly spelled out, this presumes that the values of the regressand as well as the regressors are accurate; that is, they are not guess estimates, extrapolated, interpolated, or rounded off in any systematic manner, or recorded with errors.
model specification error
By correct specification we mean one or more of the following:
1. The model does not exclude any "core" variables.
2. The model does not include superfluous variables.
3. The functional form of the model is suitably chosen.
4. There are no errors of measurement in the regressand and regressors.
5. Outliers in the data, if any, are taken into account.
6. The probability distribution of the error term is well specified.
7. The regressors are nonstochastic.
omission of relevant variables
If we omit a relevant variable, whether because we do not have the data, because we have not studied the underlying economic theory carefully, because we have not studied prior research in the area thoroughly, or simply through carelessness, we are underfitting the model.
Inclusion of irrelevant variables
Sometimes researchers add variables in the hope that the R^2 value of their model will increase, in the mistaken belief that the higher the R^2, the better the model. This is called overfitting a model.
Imperfect collinearity
The regressors are highly (but not perfectly) collinear.
Breusch-Godfrey (BG) test
This test allows for:
(1) Lagged values of the dependent variable to be included as regressors.
(2) Higher-order autoregressive schemes, such as AR(2), AR(3), etc.
(3) Moving average terms of the error term, such as u_(t-1), u_(t-2), etc.
The error term in the main equation follows the AR(p) autoregressive structure:
u_t = ρ_1·u_(t-1) + ρ_2·u_(t-2) + ... + ρ_p·u_(t-p) + v_t
The null hypothesis of no serial correlation is: ρ_1 = ρ_2 = ... = ρ_p = 0.
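A minimal sketch of the BG test using statsmodels acorr_breusch_godfrey; the simulated AR(2) errors and the lag order p = 2 are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(2, n):
    u[t] = 0.5 * u[t - 1] + 0.3 * u[t - 2] + rng.normal()   # AR(2) errors
y = 1 + 0.8 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(f"BG LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
# A small p-value rejects the null of no serial correlation up to order p.
```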
variance-inflating factor (VIF)
a measure of the degree to which the variance of the OLS estimator is inflated because of collinearity
Heteroscedasticity
Occurs when the spread in Y is not equal throughout the relationship.
Reasons include:
-The presence of outliers in the data
-Incorrect functional form of the regression model
-Incorrect transformation of data
-Mixing observations with different measures of scale (such as mixing high-income households with low-income households)
autocorrelation
The next data point is likely to be similar to the last data point; this is more common in time series data.
Reasons include:
-The possible strong correlation between the shock in time t and the shock in time t+1
Endogenous variables
variables whose values are determined in the model
Exogenous variables
variables whose values are not determined in the model
-Sometimes, exogenous variables are called predetermined variables, for their values are determined independently or fixed in advance, such as the tax rates fixed by the government.
-In simultaneous equation models, the parameters are estimated using the Method of Indirect Least Squares (ILS) or the Method of Two-Stage Least Squares (2SLS).
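A minimal sketch of 2SLS estimation, assuming the linearmodels package is installed; the data-generating process, the instrument z, and all variable names are illustrative placeholders.

```python
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(10)
n = 500
z = rng.normal(size=n)                   # exogenous instrument
e = rng.normal(size=n)                   # structural error
x = 0.8 * z + e + rng.normal(size=n)     # endogenous regressor (correlated with e)
y = 1.0 + 0.5 * x + e                    # structural equation

df = pd.DataFrame({"y": y, "x": x, "z": z, "const": 1.0})
res = IV2SLS(df["y"], df[["const"]], df[["x"]], df[["z"]]).fit()
print(res.params)                        # 2SLS estimate of the coefficient on x
```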