Level 2 SS 3
ANOVA Table
◦the output of the ANOVA procedure is an ANOVA table, which is a summary of the variation in the dependent variable. ANOVA tables are included in the regression output of many statistical software packages.
How do you detect serial correlation?
◦residual plots and the Durbin-Watson statistic
To test whether the two time series have unit roots, the analyst first runs separate DF tests with five possible results:
1. Both time series are covariance stationary (linear regression can be used).
2. Only the dependent variable time series is covariance stationary (regression is not reliable).
3. Only the independent variable time series is covariance stationary (regression is not reliable).
4. Neither time series is covariance stationary and the two series are not cointegrated (regression is not reliable).
5. Neither time series is covariance stationary and the two series are cointegrated (regression is reliable because the error term is covariance stationary).
A time series is covariance stationary if it satisfies the following three conditions:
1. Constant and finite expected value: the expected value of the time series is constant over time (we will refer to this value as the mean-reverting level).
2. Constant and finite variance: the time series' volatility around its mean (i.e., the distribution of the individual observations around the mean) does not change over time.
3. Constant and finite covariance between values at any given lag: the covariance of the time series with leading or lagged values of itself is constant.
What are the two guidelines to follow to determine what type of model is best suited to meet your needs?
1. Determine your goal.
   A. Are you attempting to model the relationship of a variable to other variables (e.g., cointegrated time series, cross-sectional multiple regression)?
   B. Are you trying to model the variable over time (e.g., trend model)?
2. If you have decided on using a time series analysis for an individual variable, plot the values of the variable over time and look for characteristics that would indicate nonstationarity, such as non-constant variance (heteroskedasticity), non-constant mean, seasonality, or structural change.
What is the procedure to test whether an AR time series model is correctly specified? (Three Steps)
1. Estimate the AR model being evaluated using linear regression, starting with a first-order AR model.
2. Calculate the autocorrelations of the model's residuals (i.e., the level of correlation between the forecast errors from one period to the next).
3. Test whether the autocorrelations are significantly different from zero.
   A. To test for significance, a t-test is used to test the hypothesis that the correlations of the residuals are zero.
   B. The t-statistic is the estimated autocorrelation divided by its standard error (see the sketch below).
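Not from the curriculum reading — a minimal Python sketch of this residual-autocorrelation check, assuming statsmodels is available and using a made-up series x; the standard error of each residual autocorrelation is approximated as 1/sqrt(T).

```python
# Hypothetical illustration: test whether AR(1) residual autocorrelations differ from zero.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.stattools import acf

np.random.seed(0)
x = np.cumsum(np.random.normal(size=200)) * 0.1 + np.random.normal(size=200)  # made-up series

fit = AutoReg(x, lags=1).fit()              # step 1: estimate the AR(1) model
resid = fit.resid
T = len(resid)

rho = acf(resid, nlags=12, fft=False)[1:]   # step 2: residual autocorrelations at lags 1..12
t_stats = rho / (1.0 / np.sqrt(T))          # step 3: t = autocorrelation / standard error (~1/sqrt(T))

for k, (r, t) in enumerate(zip(rho, t_stats), start=1):
    flag = "significant" if abs(t) > 2.0 else "not significant"
    print(f"lag {k:2d}: autocorr={r:+.3f}  t={t:+.2f}  ({flag})")
```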
What are steps 3-8 to determine what type of model is best suited?
3. If there is no seasonality or structural shift, use a trend model.
   - If the data plot on a straight line with an upward or downward slope, use a linear trend model.
   - If the data plot in a curve, use a log-linear trend model.
4. Run the trend analysis, compute the residuals, and test for serial correlation using the Durbin-Watson test.
   - If you detect no serial correlation, you can use the model.
   - If you detect serial correlation, you must use another model (e.g., AR).
5. If the data has serial correlation, reexamine the data for stationarity before running an AR model. If it is not stationary, treat the data for use in an AR model as follows:
   - If the data has a linear trend, first-difference the data.
   - If the data has an exponential trend, first-difference the natural log of the data.
   - If there is a structural shift in the data, run two separate models as discussed above.
   - If the data has a seasonal component, incorporate the seasonality in the AR model as discussed below.
6. After first-differencing in step 5, if the series is covariance stationary, run an AR(1) model and test for serial correlation and seasonality.
   - If there is no remaining serial correlation, you can use the model.
   - If you still detect serial correlation, incorporate lagged values of the variable (possibly including one for seasonality, e.g., for monthly data, add the 12th lag of the time series) into the AR model until you have removed (i.e., modeled) any serial correlation.
7. Test for ARCH. Regress the squared residuals on squares of lagged values of the residuals and test whether the resulting coefficient is significantly different from zero.
   - If the coefficient is not significantly different from zero, you can use the model.
   - If the coefficient is significantly different from zero, ARCH is present; correct using generalized least squares.
8. If you have developed two statistically reliable models and want to determine which is better at forecasting, calculate their out-of-sample RMSE.
The t-statistic is the estimated autocorrelation divided by its standard error; its numerator is the correlation of the error term in period t with the kth lagged error term.
What are in sample forecasts?
in-sample forecasts are made within the range of data (i.e., the time period) used to estimate the model, which for a time series is known as the sample or test period
Forecasting with an autoregressive model
these are calculated in the same manner as forecasts from other regression models, but because the independent variables are lagged values of the dependent variable, a one-step-ahead forecast must be calculated before a two-step-ahead forecast can be calculated (see the sketch below)
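A worked example, not from the text: assumed AR(1) coefficients b0 = 1.2 and b1 = 0.4 with a current value x_t = 5.0; the two-step-ahead forecast reuses the one-step-ahead forecast.

```python
# Hypothetical AR(1) chain-rule forecast with assumed coefficients.
b0, b1 = 1.2, 0.4   # assumed estimated intercept and slope
x_t = 5.0           # assumed current value of the series

x_t1 = b0 + b1 * x_t    # one-step-ahead forecast: 1.2 + 0.4 * 5.0 = 3.2
x_t2 = b0 + b1 * x_t1   # two-step-ahead forecast: 1.2 + 0.4 * 3.2 = 2.48

print(f"one-step-ahead: {x_t1:.2f}")
print(f"two-step-ahead: {x_t2:.2f}")
```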
How do you determine if a time series is covariance stationary?
• run an AR model and examine autocorrelations ◦an AR model is estimated and the statistical significance of the autocorrelations at various lags is examined • perform the Dickey-Fuller (DF) test ◦transform the AR(1) model and run a simple regression (see the sketch below)
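Not part of the curriculum reading — a minimal sketch of a unit-root check using the augmented Dickey-Fuller test in statsmodels, run on a made-up random-walk series.

```python
# Hypothetical unit-root check: ADF test on a simulated random walk.
import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(1)
x = np.cumsum(np.random.normal(size=300))   # random walk -> has a unit root

adf_stat, p_value, *_ = adfuller(x)
print(f"ADF statistic = {adf_stat:.2f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null of a unit root: series looks covariance stationary.")
else:
    print("Cannot reject the null: series has a unit root (not covariance stationary).")
```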
What is autoregressive conditional heteroskedasticity (ARCH)?
◦ARCH exists if the variance of the residuals in one period is dependent on the variance of the residuals in a previous period. ‣ when this exists, the standard errors of the regression coefficients in AR models and the hypothesis tests of these coefficients are invalid.
What is a linear trend model?
◦a linear trend model is a time series pattern that can be graphed using a straight line ◦a downward sloping line indicates a negative trend, while an upward sloping line indicates a positive trend ◦ordinary least squares (OLS) regression is used to estimate the coefficients in the trend line, which provides the following prediction equation: ŷt = b0 + b1(t)
What is a structural change?
◦a structural change is indicated by a significant shift in the plotted data at a point in time that seems to divide the data into two or more distinct patterns ◦in this case, you have to run two different models, one incorporating the data before and one after that date, and test whether the time series has actually shifted ◦if it has shifted, a single time series model estimated over the whole period will produce misleading results
What is mean reversion?
◦a time series exhibits mean reversion if it has a tendency to move toward its mean ‣ it tends to decline when the current value is above the mean and increase when the current value is below the mean ◦for an AR(1) model, the mean-reverting level is b0 / (1 − b1) ◦if the series is at its mean-reverting level, the model predicts that the next value of the time series will be the same as its current value
What is an ARCH time series?
◦an arch time series is one for which the variance of the residuals in one period is dependent on (i.e. a function of) the variance of the residuals in the preceding period.
Using ARCH Models
◦ARCH models are used to test for autoregressive conditional heteroskedasticity
Limitations of Trend Models
◦trend models assume the residuals are uncorrelated with each other; violation of this assumption is referred to as autocorrelation. when the residuals are persistently positive or negative for periods of time, the data are said to exhibit serial correlation, and a trend model is not an appropriate specification for the time series ◦even if the data appear to call for a log-linear trend, serial correlation means a trend model should not be used; use an autoregressive model instead ◦for a time series model without serial correlation, DW should be approximately equal to 2.0. a DW significantly different from 2.0 suggests that the residual terms are correlated
What is cointegration?
◦cointegration means that two time series are economically linked (related to the same macro variables) or follow the same trend and that relationship is not expected to change ◦if two time series are cointegrated, the error term from regressing one on the other is covariance stationary and the t-tests are reliable (i.e., scenario 5 can be used, while scenario 4 cannot)
What is a random walk?
◦if a time series follows a random walk process, the predicted value of the series (i.e., the value of the dependent variable) in one period is equal to the value of the series in the previous period plus a random error term: xt = x(t−1) + εt
What is a random walk with a drift?
◦if a time series follows a random walk with a drift, the intercept term is not equal to zero ◦in addition to a random error term, the time series is expected to increase or decrease by a constant amount each period: xt = b0 + x(t−1) + εt
Predicting the variance of a time series
◦if a time series has ARCH errors, an ARCH model can be used to predict the variance of the residuals in future periods ◦for an ARCH(1) model, the predicted variance of the residuals in period t + 1 is σ̂²(t+1) = a0 + a1·ε²t (see the sketch below)
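A worked illustration, not from the text: assumed ARCH(1) estimates a0 = 0.01 and a1 = 0.30 with a current squared residual of 0.04.

```python
# Hypothetical ARCH(1) variance forecast with assumed parameter estimates.
a0, a1 = 0.01, 0.30      # assumed estimated ARCH(1) coefficients
eps_sq_t = 0.04          # assumed squared residual in the current period

var_t1 = a0 + a1 * eps_sq_t   # predicted residual variance for period t+1
print(f"predicted variance for t+1: {var_t1:.4f}")   # 0.01 + 0.30*0.04 = 0.0220
```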
First Differencing
◦if a time series is a random walk (i.e., has a unit root), we can transform the data into a covariance stationary time series using first differencing ◦the process involves subtracting the value of the time series (i.e., the dependent variable) in the immediately preceding period from the current value of the time series to define a new dependent variable, y: yt = xt − x(t−1) (i.e., we model the change in the dependent variable; see the sketch below)
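A minimal sketch, assuming numpy and the adfuller test from statsmodels: first-difference a simulated random walk and confirm that the differenced series no longer has a unit root.

```python
# Hypothetical first-differencing example on a simulated random walk.
import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(2)
x = np.cumsum(np.random.normal(size=300))   # random walk: x_t = x_{t-1} + error
y = np.diff(x)                               # first difference: y_t = x_t - x_{t-1}

print(f"ADF p-value, levels:      {adfuller(x)[1]:.3f}")   # typically large -> unit root
print(f"ADF p-value, differences: {adfuller(y)[1]:.3f}")   # typically ~0 -> stationary
```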
What is the chain rule of forecasting?
◦it is the calculation of successive forecasts in this manner (a one-step-ahead forecast must be calculated before a two-step-ahead forecast can be calculated)
Bottom line on DF-EG test
◦just like regular DF test, if the null is rejected, we say the series (of error terms in this case) is covariance stationary and the two time series are cointegrated
What is covariance stationarity?
◦neither a random walk nor a random walk with a drift exhibits covariance stationarity ◦looking at the equation from the last card, the mean-reverting level is b0 / (1 − b1); for a random walk b1 = 1, so 1 − b1 = 0 and b0/0 is undefined, meaning the series is not covariance stationary ◦a series with b1 = 1 has a unit root, and we cannot use the least squares regression procedure we used to estimate an AR(1) model without first transforming the data
What are out of sample forecasts?
◦out-of-sample forecasts are made outside of the sample period ◦we compare how accurate a model is in forecasting the y variable value for a time period outside the period used to develop the model. ◦help see if the model adequately describes the time series and whether it has relevance (i.e. predictive power) in the real world.
LOS 13.d: Describe the structure of an autoregressive (AR) model of order p, and calculate one and two-period-ahead forecasts given the estimated coefficients.
◦p indicates the number of lagged values that the autoregressive model will include as independent variables. AR(2) means a second-order autoregressive model
How do you test if two time series are cointegrated?
◦regress one variable on the other using the following model: yt = b0 + b1·xt + εt ◦the residuals are tested for a unit root using the Dickey-Fuller test with critical t-values calculated by Engle and Granger (i.e., the DF-EG test) ◦if the test rejects the null hypothesis of a unit root, we say the error terms generated by the two time series are covariance stationary and the two series are cointegrated ◦if the two series are cointegrated, we can use the regression to model their relationship (see the sketch below)
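Not from the reading — a sketch of an Engle-Granger style cointegration check using statsmodels' coint function on two simulated series that share a common random-walk trend.

```python
# Hypothetical cointegration test on two simulated series sharing a common trend.
import numpy as np
from statsmodels.tsa.stattools import coint

np.random.seed(3)
trend = np.cumsum(np.random.normal(size=400))          # shared random-walk component
y = 2.0 + 1.5 * trend + np.random.normal(size=400)     # series 1
x = 1.0 + 1.0 * trend + np.random.normal(size=400)     # series 2

t_stat, p_value, crit = coint(y, x)    # Engle-Granger test on the regression residuals
print(f"EG t-stat = {t_stat:.2f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject unit root in the residuals: the two series appear cointegrated.")
else:
    print("Cannot reject: no evidence of cointegration.")
```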
what do you do if a time series model has been determined to contain arch errors?
◦regression procedures that correct for heteroskedasticity, such as generalized least squares, must be used in order to develop a predictive model ◦otherwise, the standard errors of the model's coefficients will be incorrect, leading to invalid conclusions
Autocorrelation & Model Fit
◦serial correlation (or autocorrelation) means the error terms are positively or negatively correlated ◦when the error terms are correlated, standard errors are unreliable and t tests of individual coefficients can incorrectly show statistical significance or insignificance.
What is the root mean squared error?
◦the root mean squared error criterion (RMSE) is used to compare the accuracy of autoregressive models in forecasting out of sample values. ◦the model with the lower RMSE for the out of sample data will have lower forecast error and will be expected to have better predictive power in the future.
How do you test whether a time series is arch(1)?
◦the squared residuals from an estimated time-series model are regressed on the first lag of the squared residuals; if the coefficient on the lagged squared residual is significantly different from zero, the series exhibits ARCH(1) (see the sketch below)
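A minimal sketch, not from the text, assuming statsmodels: fit an AR(1) model, then regress the squared residuals on their first lag and check the significance of the slope coefficient.

```python
# Hypothetical ARCH(1) test: regress squared residuals on lagged squared residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg

np.random.seed(4)
x = np.random.normal(size=500)                 # made-up series (no ARCH by construction)

resid = AutoReg(x, lags=1).fit().resid
eps_sq = resid ** 2

y = eps_sq[1:]                                  # squared residual in period t
X = sm.add_constant(eps_sq[:-1])                # squared residual in period t-1 (plus intercept)
arch_fit = sm.OLS(y, X).fit()

print(f"slope a1 = {arch_fit.params[1]:+.3f}, t-stat = {arch_fit.tvalues[1]:+.2f}")
print("ARCH present" if abs(arch_fit.tvalues[1]) > 2.0 else "no evidence of ARCH")
```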
Log-Linear Trend Models
◦time series data often displays exponential growth (growth with continuous compounding). ◦positive exponential growth means that the random variable (i.e., the time series) tends to increase at some constant rate of growth. ◦the observations will form a convex curve. ◦negative exponential growth means that the data tends to decrease at some constant rate of decay, and the plotted time series will be a concave curve.
How do you correct for seasonality?
◦to adjust for seasonality in an AR model, an additional lag of the dependent variable (corresponding to the same period in the previous year) is added to the original model as another independent variable. if the data are quarterly, the seasonal lag is 4; if monthly, it is 12
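Not from the text — a sketch of adding a seasonal lag for quarterly data using statsmodels' AutoReg, which accepts an explicit list of lags (here lag 1 plus the seasonal lag 4) on a made-up quarterly series.

```python
# Hypothetical seasonal AR model for quarterly data: include lag 1 and the seasonal lag 4.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

np.random.seed(5)
n = 200
season = np.tile([1.0, -0.5, 0.3, -0.8], n // 4)   # made-up quarterly seasonal pattern
x = season + np.random.normal(scale=0.2, size=n)

fit = AutoReg(x, lags=[1, 4]).fit()    # x_t regressed on x_{t-1} and x_{t-4}
print(fit.params)                       # intercept, coefficient on lag 1, coefficient on lag 4
```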
Bottom line on linear versus log-linear
◦when a variable grows at a constant rate, a log-linear model is most appropriate ◦when the variable increases over time by a constant amount, a linear trend model is most appropriate
What is an autoregressive model?
◦when the dependent variable is regressed against one or more lagged values of itself, the model is called an autoregressive model ◦in an autoregressive time series, past values of a variable are used to predict the current (and hence future) value of the variable ◦statistical inferences based on ordinary least squares (OLS) estimates for an AR time series model may be invalid unless the time series being modeled is covariance stationary
LOS 13.h: Explain the instability of coefficients of time-series models
◦financial and economic time series inherently exhibit some form of instability or nonstationarity ◦this is because financial and economic conditions are dynamic, and the estimated regression coefficients in one period may be quite different from those estimated during another period ◦models estimated over longer time periods are less stable because there is more time for the underlying conditions to change
What are some factors that determine which model is best?
◦first, you must plot the data ◦use a linear trend model if the data points appear to be equally distributed above and below the regression line; inflation rate data are often modeled this way ◦if the data plot with a non-linear (curved) shape, the residuals from a linear trend model will be persistently positive or negative for a period of time ‣ in that case, use a log-linear trend model; financial data are often modeled this way
How do you transform the AR1 model for DF?
first, start with the basic form of the AR(1) model and subtract x(t−1) from both sides, giving xt − x(t−1) = b0 + (b1 − 1)x(t−1) + εt • then test whether the new, transformed coefficient g = b1 − 1 is different from zero using a modified t-test ◦if b1 − 1 is not significantly different from zero, then b1 must be equal to 1.0 and, therefore, the series must have a unit root ◦if the null (g = 0) cannot be rejected, the conclusion is that the time series has a unit root ‣ if the null is rejected, the series does not have a unit root
3. Other time-series misspecifications that result in nonstationarity (covered later)
...
Analysis of variance (ANOVA)
Analysis of variance (ANOVA) is a statistical procedure for analyzing the total variability of the dependent variable
Confidence interval for the regression coefficient, b1, is calculated as:
b̂1 ± (tc × sb1) • tc is the critical two-tailed t-value for the selected confidence level with the appropriate number of degrees of freedom, which is equal to the number of sample observations minus 2 (i.e., n − 2) • sb1 is the standard error of the regression coefficient. it is a function of the SEE: as SEE rises, sb1 also increases, and the confidence interval widens ◦SEE measures the variability of the data about the regression line, and the more variable the data, the less confidence there is in the regression model to estimate a coefficient (see the sketch below)
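A worked sketch, not from the text: assumed slope estimate 0.64, standard error 0.26, and n = 36 observations; the critical t uses n − 2 degrees of freedom.

```python
# Hypothetical 95% confidence interval for a slope coefficient.
from scipy.stats import t

b1_hat = 0.64       # assumed estimated slope coefficient
s_b1 = 0.26         # assumed standard error of the slope
n = 36              # assumed number of observations

t_crit = t.ppf(0.975, df=n - 2)              # two-tailed 5% critical value, df = n - 2
lower = b1_hat - t_crit * s_b1
upper = b1_hat + t_crit * s_b1
print(f"95% CI for b1: [{lower:.3f}, {upper:.3f}]")   # roughly [0.11, 1.17]
```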
LOS 11.g: Formulate a null and alternative hypothesis about a population value of a regression coefficient, and determine the appropriate test statistic and whether the null hypothesis is rejected at a given level of significance.
a t-test may also be used to test the hypothesis that the true slope coefficient, b1, is equal to some hypothesized value. letting b̂1 be the point estimate for b1, the appropriate test statistic with n − 2 degrees of freedom is: t = (b̂1 − hypothesized b1) / sb1
What is the effect of heteroskedasticity on regression analysis
there are four effects of heteroskedasticity you need to be aware of: ‣ the standard errors are usually unreliable estimates ‣ the coefficient estimates (the Bj) aren't affected ‣ if the standard errors are too small, but the coefficient estimates themselves are not affected, the t-statistics will be too large and the null hypothesis of no statistical significance is rejected too often. the opposite is true if the standard errors are too large ‣ the f test is also unreliable.
Hypothesis Testing of Regression coefficients
use the t-test on each of the individual coefficients... remember df = n − k − 1, where k is the number of slope (regression) coefficients and the 1 accounts for the intercept; in simple linear regression k = 1, so df = n − 2
How do you detect heteroskedasticity?
• examine scatter plots of the residuals • use the Breusch-Pagan chi-square test
Regression Coefficient Confidence Interval
• hypothesis testing for a regression coefficient may use the confidence interval for the coefficient being tested.
Intercept term
• is the line's intersection with the y-axis at x = 0. it can be positive, negative, or zero
LOS 11.j: Explain limitations of regression analysis
• linear relationships can change over time. this means that an estimation equation based on data from a specific time period may not be relevant for forecasts or predictions in another time period. this is known as parameter instability • even if the regression model accurately reflects the historical relationship between the two variables, its usefulness in investment analysis will be limited if other market participants are also aware of, and act on, the relationship • if the assumptions underlying regression do not hold, the results may not be valid. if the data are heteroskedastic (non-constant variance of the error terms) or exhibit autocorrelation (error terms are not independent), regression results may be invalid
What is multiple regression?
◦ multiple regression is regression analysis with more than one independent variable ◦simple linear regression explains the variation in stock returns in terms of the variation in systematic risk as measured by beta ◦with multiple regression, stock returns can be regressed against beta and against additional variables. such as ‣ firm size ‣ equity ‣ industry classification
F-Statistic
◦the F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable ◦in multiple regression, the F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable
LOS 11.c: Formulate a test of the hypothesis that the population correlation coefficient equals zero, and determine whether the hypothesis is rejected at a given level of significance
◦H0: ρ = 0 versus Ha: ρ ≠ 0 ◦assuming the variables are normally distributed, use a t-test; the test statistic is t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom ‣ to make a decision, compare the computed t to the critical t-value for the appropriate degrees of freedom and level of significance ‣ reject H0 if t > +tcritical or t < −tcritical
The DW test procedure for positive serial correlation as follows:
◦H0: the regression has no positive serial correlation ◦if DW < dl, the error terms are positively serially correlated (i.e., reject the null hypothesis of no positive serial correlation) ◦if dl < DW < du, the test is inconclusive ◦if DW > du, there is no evidence that the error terms are positively correlated (i.e., fail to reject the null of no positive serial correlation) (see the sketch below)
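Not from the reading — a minimal sketch computing the DW statistic from OLS residuals with statsmodels; the critical values dl and du would still come from a DW table.

```python
# Hypothetical Durbin-Watson check on OLS residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

np.random.seed(6)
x = np.random.normal(size=100)
y = 1.0 + 2.0 * x + np.random.normal(size=100)   # made-up data with independent errors

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
dw = durbin_watson(resid)
print(f"DW = {dw:.2f}")   # ~2.0 when residuals are not serially correlated
```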
MSR and MSE
◦MSR is the mean regression sum of squares and MSE is the mean squared error. both are simply calculated as the appropriate sum of squares divided by its degrees of freedom
Adjusted R2
◦R2 almost always increases as variables are added to the model, even if the marginal contribution of the new variables is not statistically significant ◦thus, a high R2 may reflect the impact of a large set of independent variables rather than how well the set explains the dependent variable, overstating the explanatory power of the regression
Coefficient of Determination, R2
◦R2 can be used to see the effectiveness of all independent variables ◦R2 = (total variation − unexplained variation) / total variation = (SST − SSE) / SST = explained variation / total variation = RSS / SST
Regression sum of squares (RSS)
◦RSS measures the variation in the dependent variable that is explained by the independent variable. RSS is the sum of the squared distances between the predicted Y-values and the mean of Y.
Sum of squared errors (SSE)
◦SSE measures the unexplained variation in the dependent variable. ◦also known as the sum of squared residuals or the residual sum of squares. ◦SSE is the sum of the squared vertical distances between the actual Y-values and the predicted Y-values on the regression line
What are the three broad categories of model misspecification, or ways in which the regression model can be specified incorrectly, each with several subcategories:
◦the functional form can be misspecified ◦explanatory variables are correlated with the error term in time series models ◦other time-series misspecifications that result in nonstationarity
Total Variation
◦Total variation = explained variation + unexplained variation ◦SST = RSS + SSE
2. Explanatory variables are correlated with the error term in time series models
◦a lagged dependent variable is used as an independent variable ◦a function of the dependent variable is used as an independent variable ("forecasting the past") ◦independent variables are measured with error.
Interpreting a scatter plot
◦a scatter plot is a collection of points on a graph where each point represents the values of two variables (an x/y pair) ◦upward sweeping scatter plot indicates a positive correlation between the two variables, while a downward sweeping plot implies a negative correlation
What is a time series?
◦a time series is a set of observations for a variable over successive periods of time (e.g. monthly stock market returns for past ten yrs) ◦series has a trend if a consistent pattern can be seen by plotting the data (individual observations) on a graph
How do you correct serial correlation?
◦adjust the coefficient standard errors (Hansen method) ‣ also corrects for conditional heteroskedasticity ‣ these adjusted standard errors, which are sometimes called serial-correlation-consistent standard errors or Hansen-White standard errors, are then used in hypothesis testing of the regression coefficients ‣ only use the Hansen method if serial correlation is a problem; the White-corrected standard errors are preferred if only heteroskedasticity is a problem. if both conditions are present, use the Hansen method ◦improve the specification of the model: explicitly incorporate the time-series nature of the data (e.g., include a seasonal term). this can be tricky
LOS 12.e: Calculate and interpret the F-statistic, and describe how it is used in regression analysis
◦an F-test assesses how well the set of independent variables, as a group, explains the variation in the dependent variable. ◦F-statistic is used to test whether at least one of the independent variables explains a significant portion of the variation of the dependent variable. ◦to determine whether at least one of the coefficients is statistically significant, the calculated F-statistic is compared with the one-tailed critical F-value, Fc, at the appropriate level of significance. The degrees of freedom for the numerator and denominator are: ‣ dfnum = k ‣ df denom = n-k-1 ‣ n= number of observations ‣ k=number of independent variables
LOS 12.d: Explain the assumptions of a multiple regression model
◦as with simple linear regression, most of the assumptions made with the multiple regression pertain to the error term: ‣ a linear relationship exists between the dependent and independent variables. ‣ the independent variables are not random, and there is no exact linear relation between any two or more independent variables ‣ the expected value of the error term, conditional on the independent variable, is zero (i.e, sum of errors =0) ‣ the variance of the error terms is constant for all observations ‣ the error term for one observation is not correlated with that of another observation ‣ the error term is normally distributed.
LOS 11.e: Explain the assumptions underlying linear regression, and interpret the regression coefficients
◦assumptions: ‣ a linear relationship exists between the dependent and the independent variable ‣ the independent variable is uncorrelated with the residuals ‣ the expected value of the residual term is zero ‣ the variance of the residual term is constant for all observations ‣ the residual term is independently distributed; that is, the residual for one observation is not correlated with that of another observation ‣ the residual term is normally distributed.
What are the effects of model misspecification on the regression results?
◦the effects are basically the same for all of the misspecifications we will discuss ‣ regression coefficients are often biased and/or inconsistent, which means we can't have any confidence in our hypothesis tests of the coefficients or in the predictions of the model
What is the effect of serial correlation on regression analysis?
◦because the data cluster together from observation to observation, positive serial correlation typically results in coefficient standard errors that are too small ◦these small standard errors will cause the computed t-statistics to be larger than they should be, which causes too many Type I errors (rejection of the null hypothesis when it is actually true) ◦the F-test will also be unreliable because the MSE will be underestimated, leading again to too many Type I errors
How do you detect multicollinearity?
◦the best way to detect it is when t-tests indicate that none of the individual coefficients is significantly different from zero, while the F-test is significant and the R2 is high ◦so together the variables explain the dependent variable, but individually they don't; this means the independent variables are highly correlated with each other ◦if the absolute value of the sample correlation between any two independent variables in the regression is greater than 0.7, multicollinearity is a potential problem. this only works if there are exactly two independent variables ‣ if there are more than two independent variables, while individual variables may not be highly correlated, linear combinations might be, leading to multicollinearity ‣ high correlation among the independent variables suggests the possibility of multicollinearity, but low correlation among the independent variables does not necessarily indicate multicollinearity is not present
Calculating R squared and SEE
◦both of these can also be calculated directly from the anova table ◦R2 is the percentage of the total variation in the dependent variable explained by the independent variable ◦SEE is the standard deviation of the regression error terms and is equal to the square root of the mean squared error (MSE)
How do you correct heteroskedasticity?
◦calculate robust standard errors (also called White-corrected standard errors or heteroskedasticity-consistent standard errors) ◦these robust standard errors are then used to recalculate the t-statistics using the original regression coefficients ◦can also use generalized least squares, which modifies the original regression equation (see the sketch below)
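Not from the reading — a sketch of White-corrected (heteroskedasticity-consistent) standard errors in statsmodels via the cov_type argument; the coefficient estimates are unchanged, only the standard errors (and hence t-statistics) are recomputed.

```python
# Hypothetical comparison of ordinary vs. White-corrected standard errors.
import numpy as np
import statsmodels.api as sm

np.random.seed(7)
x = np.random.uniform(1, 10, size=200)
y = 3.0 + 0.5 * x + np.random.normal(scale=x, size=200)   # error variance grows with x

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()                   # ordinary standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC0")  # White-corrected standard errors

print("coefficients (identical):", ols_fit.params.round(3), robust_fit.params.round(3))
print("ordinary SEs:", ols_fit.bse.round(3))
print("robust SEs:  ", robust_fit.bse.round(3))
```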
Nonlinear relationships
◦correlation measures the linear relationship between two variables ◦two variables could have a nonlinear relationship such as y = (3x − 6)² and the correlation would be close to zero ◦therefore, another limitation of correlation analysis is that it does not capture strong nonlinear relationships between variables
Outliers
◦outliers represent a few extreme values for sample observations ◦relative to the rest of the sample data, the value of an outlier may be extraordinarily large or small ◦outliers can result in apparent statistical evidence that a significant relationship exists when, in fact, there is none, or that there is no relationship when, in fact, there is a relationship
Covariance
◦covariance between two random variables is a statistical measure of the degree to which the two variables move together ◦covariance captures the linear relationship between two variables ◦positive covariance indicates that the variables tend to move together ◦negative covariance indicates that the variables tend to move in opposite directions ◦covariance ranges from negative to positive infinity and is presented in squared units, which makes it hard to interpret, so we use correlation instead
what is the decision rule for the F-test?
◦decision rule: reject Ho if F (test statistic) > Fc (critical value)
Dependent variable
◦the dependent variable is the variable whose variation is explained by the independent variable; it is also called the endogenous variable ◦independent variable: used to explain the variation of the dependent variable. the independent variable is also referred to as the explanatory variable, the exogenous variable, or the predicting variable
Slope Coefficient (b1)
◦describes the change in Y for one unit change in X. ◦it can be positive, negative or zero, depending on the relationship between the regression variables.
What are discriminant models?
◦discriminant models are similar to probit and logit models but make different assumptions regarding the independent variables. ◦discriminant analysis results in a linear function similar to an ordinary regression, which generates an overall score, or ranking, for an observation. ◦The scores can then be used to rank or classify observations
What is the effect of multicollinearity on regression analysis?
◦even though multicollinearity does not affect the consistency of slope coefficients, such coefficients themselves tend to be unreliable ◦additionally, the standard errors of the slope coefficients are artificially inflated ‣ hence, there is a greater probability that we will incorrectly conclude that a variable is not statistically significant (i.e. Type II error) ◦likely to be present to some extent in economic models, but the issue is whether the multicollinearity has a significant effect on the regression results
What are the three most common violations for regression?
◦heteroskedasticity ◦serial correlation ◦multicollinearity
What is heteroskedasticity?
◦heteroskedasticity occurs when the variance of the residuals is not the same across all observations in the sample. This happens when there are sub samples that are more spread out than the rest of the sample. ◦unconditional heteroskedasticity occurs when the heteroskedasticity is not related to the level of the independent variables, which means that it doesn't systematically increase or decrease with changes in the value of the independent variable(s). while this is a problem with the equal variance assumption, it usually causes no major problem with the regression
DW Explained further
◦if the error terms are homoskedastic and not serially correlated, DW is approximately 2 ◦DW < 2 if the error terms are positively serially correlated (r > 0) ◦DW > 2 if the error terms are negatively serially correlated (r < 0)
LOS 12.h: Formulate a multiple regression equation by using dummy variables to represent qualitative factors, and interpret the coefficients and regression results.
◦if independent variable is binary in nature = it is either "on" or "off" ◦ these are called dummy variables and are used to quantify the impact of qualitative events. ◦they are assigned value of "0" or "1" ◦whenever we want to distinguish between n classes, we must use n-1 dummy variables. otherwise, the regression assumption of no exact linear relationship between independent variables would be violated.
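Not from the reading — a sketch of the n − 1 dummy-variable rule using pandas; for four quarters, drop_first=True keeps three dummies so there is no exact linear relationship among the regressors.

```python
# Hypothetical dummy-variable encoding: 4 quarters -> 3 dummy variables.
import pandas as pd

quarters = pd.Series(["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"], name="quarter")
dummies = pd.get_dummies(quarters, drop_first=True)   # keeps Q2, Q3, Q4; Q1 is the base case
print(dummies)
```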
1. The functional form can be misspecified
◦important variables are omitted ◦variables should be transformed ◦data is improperly pooled
What is conditional heteroskedasticity?
◦it is heteroskedasticity that is related to the level of the independent variables. ◦for example, heteroskedasticity exists if the variance of the residual term increases as the value of the independent variable increases, as shown in figure 4. ◦this does create significant problems for statistical inference
What is negative serial correlation?
◦it occurs when a positive error in one period increases the probability of observing a negative error in the next period.
LOS 12.m: Interpret the economic meaning of the results of multiple regression analysis, and evaluate a regression model and its results
◦look at the slope coefficients
What's a popular application of discriminant models?
◦makes use of financial ratios as the independent variables to predict the qualitative dependent variable bankruptcy. ◦a linear relationship among the independent variables produces a value for the dependent variable that places a company in a bankrupt or not bankrupt class.
Standard error of estimates (SEE)
◦measures the degree of variability of the actual Y-Values relative to the estimated Y-Values from a regression equation. The SEE gauges the "fit" of the regression line. ◦The smaller the standard error, the better the fit. ◦the SEE is the standard deviation of the error terms in the regression. As such, SEE is also referred to as the standard error of the residual, or standard error of the regression.
Total sum of squares (SST)
◦measures the total variation in the dependent variable. SST is equal to the sum of the squared differences between the actual Y-values and the mean of Y
Determining Statistical Significance
◦most common hypothesis test is to test statistical significance- which means testing the null hypothesis that the coefficient is zero versus the alternative that it is not: ‣ "testing statistical significance" => Ho: Bj = 0 versus Ha: Bj DNE 0
How do you correct multicollinearity?
◦the most common method to correct for multicollinearity is to omit one or more of the correlated independent variables; however, it is often difficult to identify the specific variable that is the cause of the problem
LOS 12.j: Describe multicollinearity, and explain its causes and effects in regression analysis.
◦multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. ◦this condition distorts the standard error of estimate and the coefficient standard errors, leading to problems when conducting t-tests for statistical significance of parameters.
Interpreting p-Values
◦p-value is the smallest level of significance for which the null hypothesis can be rejected. ◦an alternative method of doing hypothesis testing of the coefficients is to compare the p-value to the significance level: ‣ if the p-value is less than significance level, the null hypothesis can be rejected ‣ if the p-value is greater than the significance level, the null hypothesis cannot be rejected.
What is positive serial correlation?
◦positive serial correlation exists when a positive regression error in one time period increases the probability of observing a positive regression error for the next time period.
Predicted Values
◦predicted values are values of the dependent variable based on the estimated regression coefficients and a prediction about the value of the independent variable. ◦they are the values that area predicted by the regression equation, given an estimate of the independent variable.
What is a probit and logit model?
◦a probit model is based on the normal distribution, while a logit model is based on the logistic distribution ◦application of these models results in estimates of the probability that the event occurs (e.g., the probability of default) ◦the maximum likelihood methodology is used to estimate coefficients for probit and logit models ◦these coefficients relate the independent variables to the likelihood of an event occurring, such as a merger, bankruptcy, or default
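Not from the reading — a sketch of estimating a logit model with statsmodels on made-up default data; sm.Probit has the same interface and simply swaps the distributional assumption.

```python
# Hypothetical logit model: probability of default as a function of a leverage ratio.
import numpy as np
import statsmodels.api as sm

np.random.seed(8)
leverage = np.random.uniform(0.1, 0.9, size=300)                   # made-up ratio
p_default = 1 / (1 + np.exp(-(-4 + 6 * leverage)))                  # true probabilities
default = (np.random.uniform(size=300) < p_default).astype(int)     # 1 = default, 0 = no default

X = sm.add_constant(leverage)
logit_fit = sm.Logit(default, X).fit(disp=False)     # maximum likelihood estimation
print(logit_fit.params)                               # intercept and slope estimates
print(logit_fit.predict(X[:5]))                       # fitted default probabilities
```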
LOS 12.l: Describe models with qualitative dependent variables
◦a qualitative dependent variable is a dummy variable that takes on a value of either zero or one ◦for example, a model that attempts to predict whether a bond issuer will default: the dependent variable is one if the issuer defaults and zero if it does not
Correlation coefficient
◦r, a measure of the strength of the linear relationship between two variables ◦no unit of measure, is a pure measure of tendency of two variables to move together.
Regression line explained
◦regression line is the line for which the estimates of Bo and B1 are such that the sum of squared differences (vertical differences) between the Y values predicted by the regression equation and actual y values is minimized ◦the sum of the squared vertical distances between the estimated and actual Y-values is referred to as the sum of squared errors (SSE)
What is regression model specification?
◦regression model specification is the selection of the explanatory (independent) variables to be included in the regression and the transformations, if any, of those explanatory variables.
Breusch-Pagan test
◦a regression of the squared residuals on the independent variables ◦if conditional heteroskedasticity is present, the independent variables will significantly contribute to the explanation of the squared residuals ◦the BP test statistic is n × R² (from this second regression), which follows a chi-square distribution with k degrees of freedom ◦it is a one-tailed test because heteroskedasticity is only a problem if the R² and the BP test statistic are too large (see the sketch below)
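Not from the reading — a sketch using statsmodels' het_breuschpagan, which regresses the squared OLS residuals on the regressors and returns the LM (n·R²) statistic and its p-value.

```python
# Hypothetical Breusch-Pagan test for conditional heteroskedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

np.random.seed(9)
x = np.random.uniform(1, 10, size=200)
y = 2.0 + 0.8 * x + np.random.normal(scale=0.5 * x, size=200)   # error variance rises with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"BP (LM) statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
```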
F-Test explained
◦rejection of the null indicates that the slope coefficient is significantly different from zero, which is interpreted to mean that the independent variable makes a significant contribution to the explanation of the dependent variable ◦in simple linear regression, it tells us the same thing as the t-test of the slope coefficient (tb1) ◦the F-test is not particularly important for simple linear regression, but it is important for multiple regression
What is serial correlation?
◦serial correlation, aka autocorrelation, refers to situation in which the residual terms are correlated with one another. ◦serial correlation is a relatively common problem with time series data
The F-Statistic with one independent variable
◦since there is only one independent variable, the F-test tests the same hypothesis as the t-test for statistical significance of the slope coefficient ◦H0: b1 = 0 versus Ha: b1 ≠ 0 ◦compare the calculated F-statistic with the critical F-value, Fc, at the appropriate level of significance ◦the degrees of freedom for the numerator and denominator with one independent variable are: ◦df numerator = k = 1 ◦df denominator = n − k − 1 = n − 2 ◦n = number of observations ◦decision rule: reject H0 if F > Fc
Spurious Correlation
◦spurious correlation refers to the appearance of a causal linear relationship when, in fact, there is no relation
Confidence intervals for a regression coefficient
◦the confidence interval for a regression coefficient in multiple regression is calculated and interpreted the same way as it is in simple linear regression. ‣ estimated regression coefficient +/- (critical t-value)(coefficient standard error) ◦constructing a confidence interval and conducting a t test with a null hypothesis of equal to zero will always result in the same conclusion regarding the statistical significance of the regression coefficient.
LOS 12.g: Evaluate how well a regression model explains the dependent variable by analyzing the output of the regression equation and an anova table
◦the info in an anova table is used to attribute the total variation of the dependent variable to one of two sources: the regression model or the residuals. This is indicated in the first column in the table, where the "source" of the variation is listed. ◦the info in an anova table can be used to calculate R2, the f statistic, and the standard error of estimate (SEE) ◦R2 = RSS/SST ◦F=MSR/MSE with k and n-k-1 Degrees of freedom ◦SEE = square root of MSE
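A worked sketch, not from the text: assumed ANOVA quantities (RSS = 80, SSE = 20, n = 25, k = 2) used to back out R², the F-statistic, and SEE as described above.

```python
# Hypothetical calculation of R-squared, F, and SEE from assumed ANOVA quantities.
import math

RSS, SSE = 80.0, 20.0     # assumed regression and error sums of squares
n, k = 25, 2              # assumed observations and independent variables

SST = RSS + SSE                       # total sum of squares = 100
r_squared = RSS / SST                 # 0.80
MSR = RSS / k                         # 40.0
MSE = SSE / (n - k - 1)               # 20 / 22 ~= 0.909
F = MSR / MSE                         # ~= 44.0
SEE = math.sqrt(MSE)                  # ~= 0.953

print(f"R^2 = {r_squared:.2f}, F = {F:.1f}, SEE = {SEE:.3f}")
```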
Interpreting the multiple regression results
◦the intercept term is the value of the dependent variable when the independent variables are all equal to zero ◦each slope coefficient is the estimated change in the dependent variable for a one-unit change in that independent variable, holding the other independent variables constant ‣ slope coefficients in multiple regression are sometimes called partial slope coefficients
Coefficient of determination (R sqr)
◦the percentage of the total variation in the dependent variable explained by the independent variable
LOS 11.d: Distinguish between the dependent and independent variables in a linear regression
◦the purpose of simple linear regression is to explain the variation in a dependent variable in terms of the variation in a single independent variable ◦the term variation is interpreted as the degree to which a variable differs from its mean value ◦ don't confuse variation with variance - they are related but are not the same.
How much below the magic number 2 is statistically significant enough to reject the null hypothesis of no positive serial correlation?
◦there are tables of DW statistics that give upper and lower critical DW values for various sample sizes, levels of significance, and numbers of degrees of freedom
Durbin-Watson statistic (DW)
◦used to detect the presence of serial correlation
◦so you adjust R2 for the number of independent variables: Ra² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)
◦where: ‣ n = number of observations ‣ k = number of independent variables ‣ Ra² = adjusted R² ◦adjusted R² is always less than or equal to R², so adding a new variable may either increase or decrease Ra²; if the new variable barely impacts R², Ra² may decrease. it can also be less than zero if R² is low enough