Chapter 9: Multiple Regression-FIN 360

has an incorrect functional form, violates regression assumptions, has time series specification problems

3 important model misspecifications

omit important variables, incorrectly represent variables, pool data that shouldn't be pooled

3 things that can cause incorrect functional form misspecifications

lagged values of the dependent variable used as independent variables, independent variables that are measured with error, independent variables that are functions of the dependent variable

3 time series misspecifications

tested out of sample, compliant with regression assumptions, model is parsimonious, variables are in the appropriate functional form, model is grounded in sound economic reasoning

5 important model specifications

qualitative

Dependent variables that take on ordinal or nominal values are better estimated using models developed for ____________ analysis

multiple linear regression

Y = b0 + b1X1 + b2X2 + b3X3 + ... + bKXK is the equation for

degrees of freedom

adjusted r^2 doesn't automatically increase when another variable is added to the regression; it's adjusted for this
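
For reference, the standard degrees-of-freedom adjustment (where n = number of observations and k = number of independent variables):

adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)

The n - k - 1 divisor penalizes each added variable, which is why adjusted R^2 can fall when a weak variable is added.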

multiple linear regression

allows us to determine the effect of more than one independent variable on a particular dependent variable

residuals

another word for error term (according to her; strictly, residuals are the estimates of the error terms)

the relationship between the dependent variable and the independent variables is linear; the independent variables are not random, and no exact linear relation exists between two or more of the independent variables; the expected value of the error term is 0; the variance of the error term is the same for all observations; the error term is uncorrelated (independent) across observations; the error term is normally distributed

assumptions of the multiple linear regression model

estimated values of the regression coefficients

b0 hat, b1 hat, b2 hat, etc. are the

inflated standard errors > smaller t-stats > fail to reject more often than you should; high R^2

consequences of multicollinearity

data mining

developing a model that exploits characteristics of a specific data set

qualitative dependent variables

dummy variables used as dependent variables instead of as independent variables (ex-the dependent variable is bankrupt or not bankrupt and the company's financial data are the independent variables)

Altman's Model

example of discriminant analysis

conditional heteroskedasticity

heteroskedasticity of the error variance that is correlated with the values of the independent variables in the regression

unconditional heteroskedasticity

heteroskedasticity of the error variance that is not correlated with the independent variables in the multiple regression

test the null hypothesis that all the slope coefficients in the regression are equal to 0

how do we answer the question of whether all the independent variables in the model help explain the dependent variable?

2

if a regression has no serial correlation, the durbin watson statistic should equal

0

if none of the independent variables in a regression model help explain the dependent variable, the slope coefficients should all equal what

positive serial correlation

if the DW statistic is less than 2, then it suggests what

negative serial correlation

if the DW statistic is more than 2, then it suggests what
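
A useful way to remember all three cases: DW ≈ 2(1 - r), where r is the correlation between consecutive residuals. So r = 0 gives DW ≈ 2, positive serial correlation (r > 0) pushes DW below 2, and negative serial correlation (r < 0) pushes it above 2.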

0

if the independent variables don't explain the variation in the dependent variable at all, then the f stat should be

large

if the regression model does a good job of explaining the variation in the dependent variable, the f-stat should be

n-1

if we want to distinguish among n categories, we need how many dummy variables? (ex-if we want to distinguish among the 4 quarters of the year, we need ________ dummy variables)
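
A minimal SAS sketch of the quarters example (data set names mydata and quarters and variables qtr, y, q1-q3 are hypothetical):

data quarters;
  set mydata;                      /* hypothetical input data set with a qtr variable (1-4) */
  q1 = (qtr = 1);                  /* logical comparisons in SAS return 1 (true) or 0 (false) */
  q2 = (qtr = 2);
  q3 = (qtr = 3);                  /* quarter 4 is the omitted base category, so no q4 */
run;
proc reg data=quarters;
  model y = q1 q2 q3;              /* n - 1 = 3 dummies for 4 categories */
run;
quit;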

value of the dependent variable if all the independent variables have a value of 0

interpret the intercept

slope coefficient

measures how much the dependent variable changes when an independent variable changes by one unit, holding the other independent variables constant

f statistic

measures how well the regression equation explains variation in the dependent variable
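
One standard form of the test statistic, with k independent variables and n observations:

F = (SSR / k) / (SSE / (n - k - 1)) = MSR / MSE

where SSR is the regression (explained) sum of squares and SSE is the sum of squared errors; a large F means the model explains a lot of variation relative to what it leaves unexplained.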

nonstationarity

most frequent source of misspecification in linear regressions involving two or more different time series

financial

multicollinearity is prevalent in _________ models

y = b0 + b1x1 + b2x2 + ... + bkxk

multiple linear regression model equation

multiple linear regression

proc reg; model y = x1 x2 x3; is an example of what

simple linear regression

proc reg; model y = x; is an example of what

weighted linear regression

proc reg; model y = x; weight w; is an example of what
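
Filling in the weighted example above as a runnable sketch (the data set returns and variables y, x, w are hypothetical; a common choice of weight is the inverse of the error variance):

proc reg data=returns;
  model y = x;                     /* simple linear regression of y on x */
  weight w;                        /* per-observation weights, e.g. 1/variance */
run;
quit;                              /* proc reg is interactive; quit ends it */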

logit

qualitative dependent variable model that, based on the logistic distribution, estimates the probability of the dependent variable outcome

probit

qualitative dependent variable model that, based on the normal distribution, estimates the probability of the dependent variable outcome
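
In SAS, a logit model is typically fit with PROC LOGISTIC; a minimal sketch (the data set firms and variables bankrupt, leverage, roa are hypothetical). Adding / link=probit on the model statement switches to a probit fit:

proc logistic data=firms;
  model bankrupt(event='1') = leverage roa;  /* estimates P(bankrupt = 1) via the logistic distribution */
run;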

discriminant analysis

qualitative dependent variable model that estimates a linear function, which can then be used to assign the observation to the underlying categories.

dummy variable

qualitative independent variables represented numerically, taking a value of 1 if a condition is true and 0 if it is false (ex-an independent variable that = 1 if the month is January and = 0 for every other month)

log-log regression model

regression model that expresses the independent and dependent variables as natural logs
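
A minimal log-log sketch in SAS (prices, y, x are hypothetical names); the slope of the logged regression is the elasticity of y with respect to x:

data logged;
  set prices;                      /* hypothetical input data set */
  ly = log(y);                     /* log() in SAS is the natural log */
  lx = log(x);
run;
proc reg data=logged;
  model ly = lx;                   /* slope b1 = % change in y for a 1% change in x */
run;
quit;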

proc anova

sas command to get analysis of variance

proc nlin

sas command to use for nonlinear regression

proc glm

sas command to use for general linear models that have imbalance among two or more factors
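
A minimal PROC GLM sketch for an unbalanced design (trial, group, y, x are hypothetical names):

proc glm data=trial;
  class group;                     /* declares group as a categorical factor */
  model y = group x;               /* glm handles unbalanced factors; proc anova assumes balance */
run;
quit;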

first order serial correlation

serial correlation between adjacent observations

positive serial correlation

serial correlation in which a positive error for one observation increases the chance of a positive error for another observation

p-value

smallest level of significance at which we can reject a null hypothesis that the population value of the coefficient is 0

durbin watson statistic

statistic used to detect serial correlation

decrease

when a new independent variable is added, adjusted R^2 can __________ if adding the variable results in only a small increase in R^2

when the f-stat is greater than the upper critical value of the f distribution

when do we reject the null hypothesis in the f test

when the model is only used for predictions

when is multicollinearity not a problem

when proportional changes in the dependent variable bear a constant relationship to proportional changes in the independent variable

when is using the log-log regression model appropriate

nonstationarity

when the variable's properties, such as mean or variance, are not constant through time

time series regressions

where do you typically see issues with serial correlation

financial analysts

who often needs to use qualitative variables as independent variables in a regression

want to avoid data mining

why is it important for our model to be grounded in sound economic reasoning

f-test

what test do we use to test the null hypothesis that all of the slope coefficients are jointly equal to 0 against the alternative hypothesis that at least one slope coefficient is not equal to 0

unconditional

what type of heteroskedasticity doesn't create problems for statistical inference

heteroskedasticity, serial correlation, multicollinearity

the 3 main violations of regression assumptions

lower

the ____________ the p-value, the stronger the evidence against the null hypothesis

true

true/false a high R^2 doesn't necessarily mean that a regression is well specified in the sense of including the correct set of independent variables

false

true/false adjusted r^2 cannot be negative

true

true/false financial analysts often need to be able to explain the outcomes of a qualitative dependent variable

true (overestimate t stat)

true/false having heteroskedasticity in your model can lead to finding more significant relationships when none actually exist

false

true/false high pairwise correlations among independent variables are a necessary condition for multicollinearity

false

true/false interpreting the slope coefficients in multiple regression is the same as doing so in regressions with one independent variable

true

true/false low pairwise correlations do not mean that multicollinearity isn't a problem

false (unreliable)

true/false predictions based on values outside the range of the data on which the multiple regression model was estimated tend to be reliable

false (worse)

true/false r^2 is just as good a measure of goodness of fit in multiple regression as it is in simple regression

true

true/false we can increase r^2 by including many additional independent variables that explain even a slight amount of the previously unexplained variation, even if the amount they explain isn't significant

true

true/false we make the assumption that positive serial correlation takes the form of first order serial correlation, which means that the sign of the error term tends to persist from one period to the next

error term, parameter estimates

two sources of uncertainty in linear regression

increase the sample size, use different variables

two ways to correct for multicollinearity

incorporate the source of the correlation to capture the effect, use alternative time series methods that capture serial correlation

two ways to correct for serial correlation

visual inspection of residuals, durbin watson test

two ways to detect serial correlation
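
Both detection methods in one minimal SAS sketch (returns, y, x1, x2 are hypothetical names):

proc reg data=returns;
  model y = x1 x2 / dw;            /* dw option prints the Durbin-Watson statistic */
  output out=resids r=ehat;        /* save residuals for visual inspection over time */
run;
quit;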

indicator (dummy) variables

used to capture qualitative aspects of the hypothesized relationship

serial correlation

violation of regression assumptions that happens when regression errors are correlated across observations

heteroskedasticity

violation of regression assumptions that happens when the variance of the error term differs across observations

multicollinearity

violation of regression assumptions that happens when two or more independent variables are highly correlated with each other

underestimate standard error, overestimate t-stat, f-test will be unreliable

what are a few consequences of heteroskedasticity

visually inspect residuals, check whether residuals are associated with another factor, use the Goldfeld-Quandt test

what are a few methods that you can use to test for heteroskedasticity
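
A minimal sketch in SAS (names hypothetical): PROC REG's spec option requests White's test for heteroskedasticity, a common alternative to Goldfeld-Quandt:

proc reg data=returns;
  model y = x1 x2 / spec;          /* spec requests White's test for heteroskedasticity */
  output out=resids r=ehat p=yhat; /* plot ehat against yhat or another factor to inspect visually */
run;
quit;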

partial regression coefficients

what are slope coefficients in multiple regression models called

underestimate standard error > overestimate t-stat > reject the null more often than you should

what are the main consequences of serial correlation

use more sophisticated estimation methods, incorporate an independent variable that includes the missing factor

what are two ways to correct for heteroskedasticity

standard error of the estimate

what estimates the uncertainty of the error term

high r^2 even though the t-stats of the slope coefficients are not significant

what is a classic symptom of multicollinearity
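
Beyond this symptom, variance inflation factors are a common check; a minimal SAS sketch (names hypothetical):

proc reg data=returns;
  model y = x1 x2 x3 / vif;        /* vif above roughly 10 is a common rule-of-thumb flag for multicollinearity */
run;
quit;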

adjusted r^2

what is a good measure of goodness of fit in multiple regression models

hold the other independent variables constant (ex-if b1=0.60, then we would interpret 0.60 as the expected increase in y for a 1 unit increase in x1 holding the other independent variables constant)

what must you do when interpreting a slope coefficient in a multiple regression model?

