CFA Level II Multiple Regression and Issues in Regression Analysis
Testing for Heteroskedasticity
The Breusch‐Pagan (BP) Test
Correcting Serial Correlation
1) Adjust the coefficient standard errors to account for serial correlation using *Hansen's method*. 2) Modify the regression equation to eliminate the serial correlation.
Correcting Heteroskedasticity
1) Use robust *White‐corrected* standard errors. 2) Use generalized least squares, where the original regression equation is modified to eliminate heteroskedasticity.
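The first correction can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the basic (HC0) form of White's estimator with no small-sample adjustment; the function name `ols_with_white_se` and the data are hypothetical and chosen only for the example.

```python
import numpy as np

def ols_with_white_se(X, y):
    """Return OLS coefficients, conventional SEs, and White (HC0) robust SEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])       # add an intercept column
    xtx_inv = np.linalg.inv(Z.T @ Z)
    beta = xtx_inv @ Z.T @ y                   # OLS estimates
    e = y - Z @ beta                           # residuals
    # Conventional standard errors assume a constant error variance
    sigma2 = (e @ e) / (n - Z.shape[1])
    se_ols = np.sqrt(np.diag(sigma2 * xtx_inv))
    # White (HC0): (Z'Z)^-1 Z' diag(e^2) Z (Z'Z)^-1
    meat = Z.T @ (Z * (e ** 2)[:, None])
    se_white = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_ols, se_white

# Hypothetical data in which the disturbance grows with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
noise = np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8])
y = 1.0 + 2.0 * x + noise

beta, se_ols, se_white = ols_with_white_se(x, y)
print("coefficients:", beta)
print("conventional SEs:", se_ols)
print("White SEs:", se_white)
```

The coefficient estimates are unchanged by the correction; only the standard errors (and therefore the t‐stats) differ.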
Detecting multicollinearity
A *high R^2* and a *significant F‐stat* (both of which indicate that the regression model overall does a good job of explaining the dependent variable) coupled with *insignificant t‐stats* of slope coefficients (which indicate that the independent variables individually do not significantly explain the variation in the dependent variable) provide the classic case of multicollinearity. The low t‐stats on the slope coefficients increase the chances of Type II errors: failure to reject the null hypothesis when it is false.
Serial Correlation
Positive (negative) serial correlation causes the F‐stat (which is used to test the overall significance of the regression) to be inflated (deflated) because MSE will tend to underestimate (overestimate) the population error variance. It also causes the standard errors for the regression coefficients to be underestimated (overestimated), which results in larger (smaller) t‐values. Analysts may reject (fail to reject) null hypotheses incorrectly, make Type I errors (Type II errors), and attach (fail to attach) significance to relationships that are in fact not significant (significant).
DW
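DW ≈ 2(1 − r) for large samples, where r is the sample correlation between regression residuals from one period and those from the previous period. A DW value near 2 indicates no serial correlation (r ≈ 0); values well below 2 indicate positive serial correlation, and values well above 2 indicate negative serial correlation.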
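A short sketch of the calculation, using the usual definition of the DW statistic as the sum of squared changes in the residuals divided by the sum of squared residuals. The function names and residual series are hypothetical, and the lag‐1 correlation assumes residuals with a mean of zero.

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum over t of (e_t - e_{t-1})^2, divided by sum of e_t^2."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def lag1_correlation(residuals):
    """Simple lag-1 autocorrelation estimate (assumes zero-mean residuals)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

# Hypothetical residuals that tend to keep their sign (positive serial correlation)
e_pos = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Hypothetical residuals that flip sign every period (negative serial correlation)
e_neg = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

for name, e in [("positive", e_pos), ("negative", e_neg)]:
    dw = durbin_watson(e)
    r = lag1_correlation(e)
    print(name, "DW =", round(dw, 3), "2(1 - r) =", round(2 * (1 - r), 3))
```

With only six observations the approximation 2(1 − r) is loose, but both measures fall on the same side of 2; the gap narrows as the sample grows.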
Effects of multicollinearity
Multicollinearity does not affect the consistency of OLS estimates and regression coefficients, but makes them imprecise and unreliable. It becomes difficult to isolate the impact of each independent variable on the dependent variable. The standard errors for the regression coefficients are inflated, which results in t‐stats becoming too small and less powerful (in terms of their ability to reject null hypotheses).
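The inflation of standard errors can be illustrated for the simplest case of two standardized independent variables with correlation rho. Under that assumption, the sampling variance of each slope is proportional to the diagonal of the inverse correlation matrix, which equals 1 / (1 − rho^2). The function name below is hypothetical.

```python
import numpy as np

def slope_variance_multiplier(rho):
    """Factor by which a slope's variance grows relative to uncorrelated regressors."""
    corr = np.array([[1.0, rho], [rho, 1.0]])   # correlation matrix of the two regressors
    return float(np.linalg.inv(corr)[0, 0])     # equals 1 / (1 - rho**2)

for rho in (0.0, 0.5, 0.9, 0.99):
    m = slope_variance_multiplier(rho)
    # The standard error scales with the square root of the variance multiplier
    print("rho =", rho, "variance x", round(m, 2), "std. error x", round(m ** 0.5, 2))
```

The multiplier grows without bound as rho approaches 1, which is why t‐stats shrink even when the model as a whole fits well.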
Qualitative Dependent Variables
The *probit model* is based on the normal distribution. It estimates the probability that a qualitative condition is fulfilled (Y = 1) given the value of the independent variable (X). The *logit model* is similar except that it is based on the logistic distribution. Both models use maximum likelihood methodologies.
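The difference between the two models lies in the function that maps the linear index b0 + b1·X to a probability. The sketch below evaluates both mappings for given coefficients; it does not perform the maximum likelihood estimation itself, and the coefficient values are hypothetical.

```python
import math

def probit_prob(b0, b1, x):
    """P(Y = 1 | X = x) using the standard normal CDF."""
    z = b0 + b1 * x
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_prob(b0, b1, x):
    """P(Y = 1 | X = x) using the logistic CDF."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only
b0, b1 = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    print("x =", x,
          "probit:", round(probit_prob(b0, b1, x), 4),
          "logit:", round(logit_prob(b0, b1, x), 4))
```

Both functions return values strictly between 0 and 1 and give 0.5 when the linear index is zero.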
Testing for Serial Correlation
The Durbin‐Watson (DW) Test
Principles of Model Specification
The model should be backed by solid economic reasoning. Data mining (where the model is based on the characteristics of the data) should be avoided. The functional form for the variables in the regression should be in line with the nature of the variables. Each variable in the model should be relevant, making the model "parsimonious." The model should be tested for violations of regression assumptions before being accepted. The model should be found useful out of sample.
Null and alternative hypotheses
The null hypothesis is the position the researcher is looking to reject. The alternative hypothesis is the condition whose existence the researcher is trying to validate.
The BP test
The test statistic for the BP test is a Chi‐squared random variable with k degrees of freedom, calculated as χ^2 = nR^2, where n = number of observations; R^2 = coefficient of determination of the second regression (in which the squared residuals of the original regression are regressed on the independent variables); and k = number of independent variables. H0: No conditional heteroskedasticity; the original regression's squared error term is uncorrelated with the independent variables. Ha: Conditional heteroskedasticity; the original regression's squared error term is correlated with the independent variables. Note: The BP test is a one‐tailed Chi‐squared test because conditional heteroskedasticity is only a problem if it is too large. See Example 3-1.
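The steps of the BP test can be sketched as follows: regress the squared residuals on the independent variables, take the R^2 of that second regression, and multiply by n. The function name and data are hypothetical; the 5% critical value of 3.841 for one degree of freedom is taken from a standard Chi‐squared table.

```python
import numpy as np

def breusch_pagan_stat(X, residuals):
    """Return (n * R^2, k) from regressing squared residuals on X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    e2 = np.asarray(residuals, dtype=float) ** 2
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])            # second regression includes an intercept
    coef, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    ss_res = np.sum((e2 - Z @ coef) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    if ss_tot == 0.0:                               # squared residuals are constant
        return 0.0, k
    r2 = 1.0 - ss_res / ss_tot
    return float(n * r2), k

# Hypothetical residuals whose squared size grows with x
x = np.arange(1.0, 11.0)
signs = np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1])
resid = signs * np.sqrt(x)

stat, df = breusch_pagan_stat(x, resid)
critical_5pct = 3.841   # Chi-squared critical value, 1 degree of freedom, 5% level
print("BP statistic:", round(stat, 3), "df:", df)
print("Reject H0 of no conditional heteroskedasticity:", stat > critical_5pct)
```

In this constructed example the squared residuals are an exact linear function of x, so the second regression's R^2 is 1 and the statistic equals n.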
Correcting for Multicollinearity
Exclude one or more of the independent variables from the regression model.
Conditional Heteroskedasticity
occurs when the heteroskedasticity in the error variance is correlated with the independent variables in the regression. While conditional heteroskedasticity does create problems for statistical inference, it can be easily identified and corrected.
Heteroskedasticity
occurs when the variance of the error term in the regression is not constant across observations.
Multicollinearity (between independent variables)
occurs when two or more independent variables (or combinations of independent variables) in a regression model are highly (but not perfectly) correlated with each other.
If the F-value is greater than the critical F-value...
we can reject the null that the slope coefficients on the independent variables equal zero. We conclude that at least one of the slope coefficients is significantly different from 0.
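A minimal sketch of this decision rule, assuming the F‐stat is computed as the mean regression sum of squares divided by the mean squared error, with k and n − k − 1 degrees of freedom. The sums of squares below are hypothetical, and the 5% critical value of 3.23 for (2, 40) degrees of freedom is taken from a standard F‐table.

```python
def f_statistic(explained_ss, residual_ss, n, k):
    """F = (explained SS / k) / (residual SS / (n - k - 1))."""
    msr = explained_ss / k                 # mean regression sum of squares
    mse = residual_ss / (n - k - 1)        # mean squared error
    return msr / mse

def reject_joint_null(f_stat, f_critical):
    """One-tailed test: reject H0 (all slopes equal zero) if F exceeds the critical value."""
    return f_stat > f_critical

# Hypothetical regression output: n = 43 observations, k = 2 independent variables
f_stat = f_statistic(explained_ss=60.0, residual_ss=40.0, n=43, k=2)
f_critical = 3.23   # 5% level, (2, 40) degrees of freedom

print("F-stat:", f_stat)
print("Reject H0:", reject_joint_null(f_stat, f_critical))
```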