Chapter 9: Multiple Regression - FIN 360
incorrect functional form, violates regression assumptions, has time series specification problems
3 important model misspecifications
omit important variables, incorrectly represent variables, pool data that shouldn't be pooled
3 things that can cause incorrect functional form misspecifications
lagged values of the dependent variable used as independent variables, independent variables that are measured with error, independent variables that are a function of the dependent variable
3 time series misspecifications
tested out of sample, compliant with regression assumptions, model is parsimonious, variables are of the appropriate functional form, model is grounded in sound economic reasoning
5 principles of good model specification
qualitative
Dependent variables that take on ordinal or nominal values are better estimated using models developed for ____________ analysis
multiple linear regression
Y = b0 + b1X1 + b2X2 + b3X3 + ... + bKXK is the equation for
degrees of freedom
adjusted r^2 doesn't automatically increase when another variable is added to the regression; it's adjusted for this
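The adjusted R^2 card above can be illustrated numerically. This is a minimal pure-Python sketch (the R^2 values and sample size are made up for illustration):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n = number of observations and k = number of independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a variable that raises R^2 only slightly can LOWER adjusted R^2:
base = adjusted_r2(0.50, 62, 1)      # one regressor, R^2 = 0.50
bloated = adjusted_r2(0.505, 62, 2)  # a second regressor that adds little
```

Here `bloated` comes out below `base`, matching the card: adjusted R^2 does not automatically rise when a variable is added.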
multiple linear regression
allows us to determine the effect of more than one independent variable on a particular dependent variable
residuals
another word for error term (according to her)
relationship between the dependent variable and independent variables is linear, the independent variables are not random and no exact linear relation exists between two or more of the independent variables, the expected value of the error term is 0, the variance of the error term is the same for all observations, the error term is uncorrelated across observations, the error term is normally distributed, residuals are independent
assumptions of the multiple linear regression model
estimated values of the regression coefficients
b0 hat, b1 hat, b2 hat, etc. are the
inflated standard errors > smaller t-stats > fail to reject more often than you should; high R^2
consequences of multicollinearity
data mining
developing a model that exploits characteristics of a specific data set
qualitative dependent variables
dummy variables used as dependent variables instead of as independent variables; ex: the dependent variable is bankrupt or not bankrupt and the company's financial data are the independent variables
Altman's Model
example of discriminant analysis
conditional heteroskedasticity
heteroskedasticity of the error variance that is correlated with the values of the independent variables in the regression
unconditional heteroskedasticity
heteroskedasticity of the error variance that is not correlated with the independent variables in the multiple regression
test the null hypothesis that all the slope coefficients in the regression are equal to 0
how do we answer the question of whether all the independent variables in the model help explain the dependent variable?
2
if a regression has no serial correlation, the durbin watson statistic should equal
0
if none of the independent variables in a regression model help explain the dependent variable, the slope coefficients should all equal what
positive serial correlation
if the DW statistic is less than 2, then it suggests what
negative serial correlation
if the DW statistic is more than 2, then it suggests what
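The Durbin-Watson cards above can be computed directly from the residuals. A minimal pure-Python sketch (the residual series are made up to show the extremes):

```python
def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.
    Near 0 -> positive serial correlation, near 2 -> none, near 4 -> negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

persistent = [1.0, 1.0, 1.0, 1.0]                # sign persists -> DW near 0
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every period -> DW near 4
```

`persistent` gives DW = 0 (positive serial correlation) and `alternating` gives DW above 3 (negative serial correlation), matching the cards.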
0
if the independent variables don't explain the variation in the dependent variable at all, then the f stat should be
large
if the regression model does a good job of explaining the variation in the dependent variable, the f-stat should be
n-1
if we want to distinguish among n categories, we need how many dummy variables ex-if we want to distinguish between 4 quarters of the year, we need ________ dummy variables
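The n-1 dummy-variable rule in the card above can be sketched with the 4-quarters example (a minimal pure-Python illustration; the dummy names and base category choice are my own):

```python
def quarter_dummies(quarter):
    """Encode 4 quarters with n - 1 = 3 dummy variables.
    Q4 is the omitted base category (all three dummies = 0)."""
    return {"q1": int(quarter == 1), "q2": int(quarter == 2), "q3": int(quarter == 3)}
```

Using all 4 dummies plus an intercept would create an exact linear relation among the regressors, which is why one category is omitted.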
value of the dependent variable if all the independent variables have a value of 0
interpret the intercept
slope coefficient
measures how much the dependent variable changes when an independent variable changes by one unit
f statistic
measures how well the regression equation explains variation in the dependent variable
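The F-statistic cards can be tied together with the usual formula in terms of R^2 (a minimal sketch; the inputs are made up):

```python
def f_statistic(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)): explained vs. unexplained
    variation per degree of freedom. Equals 0 when the regressors explain nothing,
    and grows large as the model explains more of the variation."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))
```

This matches the cards above: F = 0 when R^2 = 0, and F is large when the regression fits well.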
nonstationarity
most frequent source of misspecification in linear regressions that use two or more time series
financial
multicollinearity is prevalent in _________ models
y = b0 + b1x1 + b2x2 + ... + bKxK
multiple linear regression model equation
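The multiple regression model can be estimated by ordinary least squares. This is a pure-Python sketch (not the SAS `proc reg` used elsewhere in these notes) that solves the normal equations (X'X)b = X'y; the tiny data set is made up so that y = 1 + 2*x1 + 3*x2 exactly:

```python
def ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations (X'X)b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column of 1s
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented system
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise), so OLS recovers it exactly
coeffs = ols([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)], [1, 3, 4, 6, 8, 9])
```

In practice the course's SAS call `proc reg; model y = x1 x2;` performs this same estimation.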
multiple linear regression
proc reg; model y = x1 x2 x3; is an example of what
simple linear regression
proc reg; model y = x; is an example of what
weighted linear regression
proc reg; model y = x; weight w; is an example of what
logit
qualitative dependent variable model that, based on the logistic distribution, estimates the probability of the dependent variable outcome
probit
qualitative dependent variable model that, based on the normal distribution, estimates the probability of the dependent variable outcome
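The logit card above can be sketched as a probability function (a minimal pure-Python illustration with made-up coefficients; probit would use the normal CDF instead of the logistic function):

```python
import math

def logit_probability(b0, b1, x):
    """Logit model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))).
    Output is always a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
```

When b0 + b1*x = 0 the estimated probability is exactly 0.5, and it rises toward 1 (or falls toward 0) as the linear index grows.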
discriminant analysis
qualitative dependent variable model that estimates a linear function, which can then be used to assign the observation to the underlying categories.
dummy variable
quantitative independent variables that take on a value of 1 if a condition is true and 0 if a condition is false; ex: an independent variable that = 1 if the month is January and = 0 for every other month
log-log regression model
regression model that expresses the independent and dependent variables as natural logs
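The log-log card can be verified with a quick numeric check (a minimal sketch; the constant, exponent, and x values are made up). If y = c * x^b, then ln(y) = ln(c) + b * ln(x), so the slope in log-log space is the constant elasticity b:

```python
import math

c, b = 2.0, 1.5        # made-up parameters of y = c * x^b
x1, x2 = 4.0, 9.0
y1, y2 = c * x1 ** b, c * x2 ** b
# Slope between the two points in (ln x, ln y) space recovers b
slope = (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))
```

This is why the log-log form fits when proportional changes in y bear a constant relationship to proportional changes in x.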
proc anova
SAS command to get an analysis of variance
proc nlin
SAS command to use for nonlinear regression
proc glm
SAS command to use for general linear models that have imbalance among two or more factors
first order serial correlation
serial correlation between adjacent observations
positive serial correlation
serial correlation in which a positive error for one observation increases the chance of a positive error for another observation
p-value
smallest level of significance at which we can reject a null hypothesis that the population value of the coefficient is 0
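The p-value card can be sketched for a two-sided test (a minimal pure-Python illustration; it assumes a large sample so the t distribution is well approximated by the standard normal):

```python
import math

def p_value_two_sided(z):
    """Two-sided p-value for a z-statistic: 2 * (1 - Phi(|z|)),
    where Phi is the standard normal CDF computed via math.erf."""
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)
```

A statistic of about 1.96 gives a p-value of about 0.05: the smallest significance level at which the null of a zero coefficient can be rejected.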
durbin watson statistic
statistic used to detect serial correlation
decrease
when a new independent variable is added, adjusted r^2 can __________ if adding the variable results in only a small increase in R^2
when the f-stat is greater than the upper critical value of the f distribution
when do we reject the null hypothesis in the f test
when the model is only used for predictions
when is multicollinearity not a problem
when proportional changes in the dependent variable bear constant relationships to proportional changes in the independent variable
when is using the log-log regression model appropriate
nonstationarity
when the variable's properties, such as mean or variance, are not constant through time
time series regressions
where do you typically see issues with serial correlation
financial analysts
who often need to use qualitative variables as independent variables in a regression
want to avoid data mining
why is it important for our model to be grounded in sound economic reasoning
f-test
what test do we use to test the null hypothesis that all of the slope coefficients are jointly equal to 0 against the alternative hypothesis that at least one slope coefficient is not equal to 0
unconditional
what type of heteroskedasticity doesn't create problems for statistical inference
heteroskedasticity, serial correlation, multicollinearity
the 3 main violations of regression assumptions
lower
the ____________ the p-value, the stronger the evidence against the null hypothesis
true
true/false a high R^2 doesn't necessarily mean that a regression is well specified in the sense of including the correct set of independent variables
false
true/false adjusted r^2 cannot be negative
true
true/false financial analysts often need to be able to explain the outcomes of a qualitative dependent variable
true (overestimate t stat)
true/false having heteroskedasticity in your model can lead to finding more significant relationships when none actually exist
false
true/false high pairwise correlations among independent variables are a necessary condition for multicollinearity
false
true/false interpreting the slope coefficients in multiple regression is the same as doing so in one independent variable regressions
true
true/false low pairwise correlations do not mean that multicollinearity isn't a problem
false (unreliable)
true/false predictions based on values outside the range of the data on which the multiple regression model was estimated tend to be reliable
false (worse)
true/false r^2 is just as good as a measure of goodness of fit in multiple regression as it is in simple regression
true
true/false we can increase r^2 by including many additional independent variables that explain even a slight amount of the previously unexplained variation, even if the amount they explain isn't significant
true
true/false we make the assumption that positive serial correlation takes the form of first order serial correlation, which means that the sign of the error term tends to persist from one period to the next
error term, parameter estimates
two sources of uncertainty in linear regression
increase the sample size, use different variables
two ways to correct for multicollinearity
incorporate the source of the correlation to capture the effect, use alternative time series methods that capture serial correlation
two ways to correct for serial correlation
visual inspection of residuals, durbin watson test
two ways to detect serial correlation
indicator (dummy) variables
used to capture qualitative aspects of the hypothesized relationship
serial correlation
violation of regression assumptions that happens when regression errors are correlated across observations
heteroskedasticity
violation of regression assumptions that happens when the variance of the error term differs across observations
multicollinearity
violation of regression assumptions that happens when two or more independent variables are highly correlated with each other
underestimate standard error, overestimate t-stat, f-test will be unreliable
what are a few consequences of heteroskedasticity
visually inspect residuals, associate residuals with another factor, use the Goldfeld-Quandt test
what are a few methods that you can use to test for heteroskedasticity
partial regression coefficients
what are slope coefficients in multiple regression models called
underestimate standard error > overestimate t-stat > reject the null more often than you should
what are the main consequences of serial correlation
use robust standard errors, use generalized least squares or other more sophisticated estimation methods
what are two ways to correct for heteroskedasticity
standard error of the estimate
what estimates the uncertainty of the error term
high r^2 even though the t-stats of the slope coefficients are not significant
what is a classic symptom of multicollinearity
adjusted r^2
what is a good measure of goodness of fit in multiple regression models
hold the other independent variables constant (ex-if b1=0.60, then we would interpret 0.60 as the expected increase in y for a 1 unit increase in x1 holding the other independent variables constant)
what must you do when interpreting a slope coefficient in a multiple regression model?