CFA Level 2 2016 - Quant: Multiple Regression & Analysis issues
Name of the test for Heteroskedasticity?
The Breusch‐Pagan (BP) Test
How to correct Serial Correlation?
1) Adjust the coefficient standard errors to account for serial correlation using *Hansen's method* 2) Modify the regression equation to eliminate the serial correlation.
How to correct Heteroskedasticity?
1) Use robust *White‐corrected* standard errors 2) Use generalized least squares, where the original regression equation is modified to eliminate heteroskedasticity
5 Principles of multiple regression Model Specification?
1. The model should be backed by solid economic reasoning. Data mining (where the model is based on the characteristics of the data) should be avoided. 2. The functional form for the variables in the regression should be in line with the nature of the variables. 3. Each variable in the model should be relevant, making the model "parsimonious." 4. The model should be tested for violations of regression assumptions before being accepted. 5. The model should be found useful out of sample.
What are the 3 problems with linear regression, what are their effects, and their solutions?
1. Heteroskedasticity --> incorrect standard errors, so too many Type I errors --> use robust standard errors (White-corrected, if the heteroskedasticity is CONDITIONAL). 2. Serial correlation --> incorrect standard errors, so too many Type I errors (if the correlation is positive); coefficient estimates also become inconsistent if a lagged dependent variable is used as an independent variable --> use robust standard errors (corrected via HANSEN's method). 3. Multicollinearity --> high R^2 but low t-stats --> remove one or more independent variables (often there is no theoretical solution).
How to detect multicollinearity?
A *high R^2* and a *significant F‐stat* (both of which indicate that the regression model overall does a good job of explaining the dependent variable) coupled with *insignificant t‐stats* of slope coefficients (which indicate that the independent variables individually do not significantly explain the variation in the dependent variable) provide the classic case of multicollinearity. The low t‐stats on the slope coefficients increase the chances of Type II errors: failure to reject the null hypothesis when it is false.
What is Serial Correlation?
Serial correlation (autocorrelation) means the regression's error terms are correlated across observations; it is most common in time-series data. With positive serial correlation: the F‐stat (used to test the overall significance of the regression) is inflated because MSE tends to underestimate the population error variance; the standard errors of the regression coefficients are underestimated, which inflates the t‐values; analysts may therefore reject null hypotheses incorrectly, make Type I errors, and attach significance to relationships that are in fact not significant. Negative serial correlation has the opposite effects: a deflated F‐stat, overestimated standard errors, smaller t‐values, and Type II errors.
The effect of omitting an important variable in a regression...?
Coefficients are biased and/or inconsistent
The Durbin‐Watson (DW) Test formula? How to interpret its value?
DW = 2(1 − r), where r is the sample correlation between the residuals from one period and the residuals from the previous period. Interpretation: DW ranges from 0 to 4. DW ≈ 2 indicates no serial correlation; DW < 2 indicates positive serial correlation (r > 0); DW > 2 indicates negative serial correlation (r < 0).
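A minimal sketch (my own, not from the curriculum) of computing the DW statistic directly from an ordered list of residuals, using the equivalent form DW = Σ(e_t − e_{t−1})² / Σe_t², which is approximately 2(1 − r):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered sequence of regression residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating residuals show strong negative serial correlation, so DW is well above 2;
# identical residuals (perfect positive correlation) drive DW toward 0.
print(durbin_watson([1, -1, 1, -1, 1, -1]))
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```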
How to assess a multiple regression model (flowchart)? (5 steps)
HSM 1. Is the model correctly specified? --> N: correct the model. 2a. Are the individual coefficients statistically significant? (t-test: t = estimated regression parameter / standard error of the parameter, with n − k − 1 degrees of freedom) AND 2b. Is the model as a whole statistically significant? (F-test) --> N: start over with another model. 3. Heteroskedasticity? --> Y: test whether it is conditional via the BP test; if conditional, use White-corrected standard errors; otherwise continue. 4. Serial correlation? --> Y: correct via Hansen's method. 5. Multicollinearity? --> Y: drop one or more independent variables.
What is Conditional Heteroskedasticity?
Heteroskedasticity in which the variance of the error term is correlated with the independent variables in the regression. While conditional heteroskedasticity does create problems for statistical inference, it can be easily identified and corrected.
Significance of F [SKIP]
In this case, significance is extremely high
What is the The Breusch‐Pagan (BP) Test? What is the formula?
It tests for conditional heteroskedasticity. The test statistic for the BP test is a Chi‐squared random variable calculated as: χ^2 = nR^2, with k degrees of freedom, where n = number of observations; R^2 = coefficient of determination of the second regression (in which the squared residuals of the original regression are regressed on the independent variables); k = number of independent variables. H0: no heteroskedasticity (the original regression's squared error term is uncorrelated with the independent variables). Ha: heteroskedasticity (the original regression's squared error term is correlated with the independent variables). Note: the BP test is a ONE-TAILED Chi‐squared test because conditional heteroskedasticity is a problem only if it is too large.
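A hedged sketch of the BP statistic for the simple case of one independent variable (the actual test regresses the squared residuals on ALL independent variables of the original regression):

```python
def r_squared(x, y):
    """R^2 of a simple OLS regression of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def bp_statistic(x, residuals):
    """BP test statistic: chi^2 = n * R^2 of squared residuals regressed on x."""
    squared = [e ** 2 for e in residuals]
    return len(residuals) * r_squared(x, squared)
```

Here the squared residuals grow linearly with x by construction, so the auxiliary R^2 is 1 and the statistic equals n; compare the result against the one-tailed Chi-squared critical value with k degrees of freedom.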
Adjusted R^2 formula? And why use adjusted r^2?
Adjusted R^2 = 1 − [(n − 1)/(n − k − 1)] × (1 − R^2), where n = number of observations and k = number of independent variables. R^2 never decreases when another independent variable is added, even one with no real explanatory power, so in multiple regression it overstates how well the model fits. Adjusted R^2 penalizes additional variables: it rises only when a new variable improves the model by more than chance alone would suggest. Note that neither R^2 nor adjusted R^2 indicates whether the coefficient estimates are biased or whether the model is correctly specified; a misspecified model can still show a high R^2.
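A small helper (my own sketch) implementing the adjusted R^2 formula, with illustrative inputs rather than curriculum values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# Adding an independent variable that leaves R^2 unchanged lowers adjusted R^2,
# which is exactly the penalty for a non-parsimonious model.
print(adjusted_r2(0.80, 62, 5))
print(adjusted_r2(0.80, 62, 6))
```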
Describe Qualitative Dependent Variables models (2)?
The *probit model* is based on the normal distribution. It estimates the probability that a qualitative condition is fulfilled (Y = 1) given the value of the independent variable (X). The *logit model* is similar except that it is based on the logistic distribution. Both models use maximum likelihood methodologies.
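The two models' fitted probabilities can be sketched in closed form for one independent variable; the coefficients b0 and b1 and the observation x below are hypothetical, not curriculum values:

```python
import math

def logit_probability(b0, b1, x):
    """Logit model: P(Y = 1 | X = x) = 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def probit_probability(b0, b1, x):
    """Probit model: P(Y = 1 | X = x) = N(b0 + b1*x), N = standard normal CDF."""
    z = b0 + b1 * x
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both distributions are symmetric, so at b0 + b1*x = 0 each model gives P = 0.5.
print(logit_probability(0.0, 1.0, 0.0))
print(probit_probability(0.0, 1.0, 0.0))
```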
What is the name to Test for Serial Correlation?
The Durbin‐Watson (DW) Test
What are the null and alternative hypotheses?
The null hypothesis is the position the researcher is looking to reject. The alternative hypothesis is the condition whose existence the researcher is trying to validate.
How to correct for Multicollinearity?
exclude one or more of the independent variables from the regression model.
What is Multicollinearity and what are its effects?
Occurs when two or more independent variables (or combinations of independent variables) in a regression model are highly (but not perfectly) correlated with each other. Multicollinearity does not affect the consistency of the OLS estimates of the regression coefficients, but it makes them inaccurate and unreliable: it becomes difficult to isolate the impact of each independent variable on the dependent variable. The standard errors of the regression coefficients are inflated, so the t‐stats become too small and less powerful (in terms of their ability to reject null hypotheses). !! R^2 does NOT necessarily increase when we drop one independent variable.
What is Heteroskedasticity?
the variance (σ^2) of the error term in the regression is not constant across observations.
If the F-value is greater than the critical F-value, what conclusions can you draw?
we can reject the null hypothesis that the slope coefficients on all the independent variables equal zero. We conclude that at least one of the slope coefficients is significantly different from 0.
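A sketch of the F-stat behind this conclusion, assuming the ANOVA quantities are given (here RSS denotes the regression sum of squares and SSE the sum of squared errors, per the usual curriculum convention; the numbers below are made up):

```python
def f_statistic(rss, sse, n, k):
    """F-stat for overall regression significance: F = MSR / MSE."""
    msr = rss / k            # mean square regression (k slope coefficients)
    mse = sse / (n - k - 1)  # mean square error (n - k - 1 degrees of freedom)
    return msr / mse

# Compare the result against the critical F with k and n - k - 1 degrees of freedom;
# a larger regression sum of squares relative to SSE produces a larger F.
print(f_statistic(20.0, 80.0, 25, 2))
```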