Further Development and Analysis of the Classical Linear Regression Model


The square of a t-distributed random variable with T - k degrees of freedom also follows an F-distribution with

1 and T - k degrees of freedom
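A quick numerical check of this relationship, using scipy; the values of T and k below are purely illustrative. The squared two-sided t critical value matches the corresponding F critical value.

```python
# Minimal sketch: the square of a t critical value with T - k degrees of
# freedom equals the F(1, T - k) critical value; T and k are illustrative.
from scipy import stats

T, k = 63, 3
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=T - k)      # two-sided t critical value
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=T - k)  # F(1, T - k) critical value

print(t_crit**2, f_crit)  # the two numbers coincide (up to rounding)
```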

Problems with R^2 as a goodness of fit measure

1. Defined in terms of variation about the mean of y, so that if a model is reparameterized (rearranged) and the dependent variable changes, R^2 will change, even if the second model is a simple rearrangement of the first with identical RSS
2. Never falls if more regressors are added to the regression
3. Can take values of 0.9 or higher for time series regressions, hence not good at discriminating between models, since a wide array of models will frequently have broadly similar (and high) values of R^2

Degrees of freedom parameters of F-distribution

1. m (number of restrictions imposed on the model)
2. T - k (number of observations less the number of regressors for the unrestricted regression)
**Appropriate critical value is in column m, row (T - k)
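Rather than reading a printed F-table, the same critical value can be retrieved programmatically; a minimal sketch with scipy, where m = 2 and T - k = 60 are assumed values.

```python
# Equivalent of looking up column m, row (T - k) in a 5% F-table.
from scipy.stats import f

m, T_minus_k = 2, 60                        # assumed degrees of freedom
crit = f.ppf(0.95, dfn=m, dfd=T_minus_k)    # 5% upper-tail critical value
print(crit)
```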

Financial example of multiple linear regression model is

APT (arbitrage pricing theory)

R^2

Square of the correlation coefficient between y and y hat (i.e., square of the correlation between the values of the dependent variable and the corresponding fitted values from the model) = ESS / TSS **Lies between 0 and 1

Null hypothesis of F test

All of the regression slope parameters are simultaneously zero **Under this null, the test statistic follows an F-distribution

Asymptotic theory

The results hold exactly only if there is an infinite number of observations, although a merely large (rather than infinite) sample is usually sufficient to invoke the theory as an approximation

Total sum of squares (TSS)

Total variation across all observations of the dependent variable about its mean value = sum over t of (y(t) - y bar)^2
**Split into:
1. Explained sum of squares (ESS): the part explained by the model
2. Residual sum of squares (RSS): the part not explained by the model
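A small simulated sketch (the variable names and data-generating process are illustrative) confirming that TSS = ESS + RSS for an OLS fit that includes a constant, and that ESS / TSS reproduces R^2:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)

X = np.column_stack([np.ones(T), x])             # include a constant term
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat

TSS = np.sum((y - y.mean()) ** 2)
ESS = np.sum((y_hat - y.mean()) ** 2)
RSS = np.sum((y - y_hat) ** 2)

print(np.isclose(TSS, ESS + RSS))                # True: the split is exact
print(ESS / TSS)                                 # equals R^2
```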

There is indeed an explanatory variable x1 next to beta1 in the multiple linear regression model but it is

a column of ones of length T, and is thus a constant term that is usually not written out explicitly **Beta1 is thus equivalent to alpha in the simple model and can be interpreted as the intercept (the average value which y would take if all of the explanatory variables took a value of zero)

The essential idea behind 'out-of-sample' is

a proportion of the data is not used in model estimation, but is retained for model testing **If data mining has been carried out, the model will tend to give very inaccurate forecasts for the out-of-sample period
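A minimal sketch of the idea, with simulated data and an assumed 80/20 split: the model is estimated on the first portion of the sample only, and the held-back observations are used purely to evaluate its forecasts.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
y = 0.5 + 1.5 * x + rng.normal(size=T)

split = int(0.8 * T)                              # in-sample / out-of-sample cut
X_in = np.column_stack([np.ones(split), x[:split]])
beta_hat = np.linalg.lstsq(X_in, y[:split], rcond=None)[0]

X_out = np.column_stack([np.ones(T - split), x[split:]])
forecasts = X_out @ beta_hat                      # forecasts for held-back data
print(np.mean((y[split:] - forecasts) ** 2))      # out-of-sample MSE
```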

In multiple regression, RSS is minimized with respect to

all of the elements of beta **Vector of estimated parameters: beta hat = (X'X)^-1 X'y **Beta hat has dimension k x 1 (given that there are k parameters to be estimated by the formula for beta hat)
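The formula written out directly in numpy; a sketch with simulated data, where T = 50, k = 3, and the true parameter values are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y   # (X'X)^-1 X'y
print(beta_hat.shape)                         # (3,) -- i.e., k x 1
```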

In simple linear regression, the residual sum of squares (RSS) is minimized with respect to

alpha and beta

Hypotheses which are not linear/are multiplicative cannot be tested with

an F-test or a t-test

Quantile regressions represent a comprehensive way to

analyze the relationships between a set of variables **Far more robust to outliers and non-normality than OLS regressions (just as the median is a better measure of average behavior than the mean when the distribution is considerably skewed by a few large outliers) **Non-parametric technique, since no distributional assumptions are required to optimally estimate the parameters

A simple bivariate regression model implies that

changes in the dependent variable are explained by reference to changes in one single explanatory variable x (e.g. CAPM)

Trying many variables in a regression without basing the selection of the candidate variables on a financial or economic theory is known as

data mining (or data snooping) --> the true significance level will be considerably greater than the nominal significance level assumed
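A sketch of why the true significance level ends up above the nominal one: regress a pure-noise y on each of 20 irrelevant candidate variables in turn, and count how often at least one slope looks "significant" at the nominal 5% level (all sample sizes and counts below are assumptions).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T, n_candidates, n_trials = 50, 20, 1000
false_hits = 0
for _ in range(n_trials):
    y = rng.normal(size=T)                       # y unrelated to any candidate
    found = False
    for _ in range(n_candidates):
        x = rng.normal(size=T)
        r = np.corrcoef(x, y)[0, 1]
        t_stat = r * np.sqrt((T - 2) / (1 - r**2))   # slope t-ratio
        if abs(t_stat) > stats.t.ppf(0.975, df=T - 2):
            found = True
    false_hits += found
print(false_hits / n_trials)   # far above the nominal 0.05
```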

Quantile regressions effectively model the entire conditional distribution of y

given the explanatory variables (rather than only the mean, as is done in OLS) **Examines the impact of the explanatory variables not only on the location and scale of the distribution of y, but also on its shape **For the median, estimation is carried out by minimizing the sum of the absolute values of the residuals
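A minimal median-regression sketch using statsmodels' QuantReg (simulated, heavy-tailed data; all names and values are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=T)   # heavy-tailed errors

X = sm.add_constant(x)
median_fit = sm.QuantReg(y, X).fit(q=0.5)   # minimizes sum of absolute residuals
print(median_fit.params)
```

Refitting with other values of q (e.g., 0.1 or 0.9) traces out the rest of the conditional distribution.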

Goodness of fit statistics explain

how well the sample regression function (SRF) fits the data (i.e., how close the fitted regression line is to all of the data points taken together)

OLS selects coefficient estimates that

minimize the RSS (i.e., the lower the minimized value of the RSS, the better the model fits the data)

We generalize the model with k regressors (aka independent x variables) by

multiplying each of them by a coefficient beta
-The x's are the explanatory variables thought to influence y
-The betas are the parameters which quantify the effect of each of these explanatory variables on y

Any hypothesis that could be tested with a t-test could also have been tested with an F-test but

not the other way around **Single hypotheses involving one coefficient can be tested with either t- or F-test, but multiple hypotheses can be tested only with F-test

Nested models implies that

restrictions are imposed on the original model to arrive at a restricted formulation that would be a sub-set of ('nested' within) the original specification

Estimating standard errors of the coefficient estimates in simple regression

s^2 (estimator) = sum of u hat(t)^2 / (T - 2) **T - 2 = number of degrees of freedom for the bivariate regression model (i.e., number of observations minus two) **Two is subtracted because two degrees of freedom are effectively "lost" in estimating alpha and beta (the model parameters)

Estimating standard errors of the coefficient estimates in multiple regression

s^2 = u hat' u hat / (T - k), where k = number of regressors including a constant **k degrees of freedom are lost as k parameters are estimated, leaving T - k degrees of freedom

The t-test is ideal for testing

single hypotheses (i.e., those involving one coefficient)

Adjusted R^2

takes into account the loss of degrees of freedom associated with adding extra variables = 1 - [(T-1)/(T-k)] * (1-R^2) **Include a variable if adjusted R^2 rises when it is added, and exclude it if adjusted R^2 falls **Still not ideal, though, as it can lead to bulky models
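A quick arithmetic sketch of the formula, with T = 60, k = 4, and R^2 = 0.45 as assumed inputs:

```python
T, k, r2 = 60, 4, 0.45                       # assumed illustrative values
adj_r2 = 1 - ((T - 1) / (T - k)) * (1 - r2)
print(adj_r2)                                # about 0.42, below the raw R^2
```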

The number of restrictions in an F-test can be informally seen as

the number of equality signs under the null hypothesis

In multiple regression context, each coefficient beta is known as a partial regression coefficient, interpreted as representing

the partial effect of the given explanatory variable on the explained variable, after holding constant or eliminating the effect of all other explanatory variables **"Each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values"

Quantiles refer to

the position where an observation falls within an ordered series for y (e.g., the median is the observation in the very middle, the (lower) tenth percentile is the value that places 10% of observations below it (and 90% of observations above), etc.)

The probability of rejecting a correct null hypothesis is equal to

the size of the test (denoted by alpha)

The F-distribution has only positive values and is not symmetrical, therefore the null is rejected only if

the test statistic exceeds the critical F-value

Conformable matrices implies that

there is a valid matrix multiplication and addition on the right hand side (RHS) of the equation

Dummy variables are also known as qualitative variables because

they are often used to numerically represent a qualitative variable **Usually specified to take on one of a narrow range of integer values (esp. 0 or 1)

Under an F-test framework

two regressions are required (unrestricted and restricted)
-Unrestricted regression: coefficients are freely determined by the data, as has been constructed previously
-Restricted regression: one in which the coefficients are restricted (i.e., restrictions are imposed on some betas) **Also known as restricted least squares
**The RSS is determined for each regression and the two are compared in the test statistic:
F = [(RRSS - URSS) / URSS] * [(T - k) / m]
where T = number of observations, m = number of restrictions, k = number of regressors in the unrestricted regression including a constant
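Computing the test statistic and its p-value from the two regressions' residual sums of squares; the RSS values and dimensions below are assumptions for the sketch.

```python
from scipy import stats

RRSS, URSS = 120.0, 100.0        # restricted and unrestricted RSS (assumed)
T, k, m = 100, 5, 2              # observations, regressors, restrictions

F = ((RRSS - URSS) / URSS) * ((T - k) / m)
p_value = stats.f.sf(F, dfn=m, dfd=T - k)   # upper tail only, per the F-test
print(F, p_value)
```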

Hedonic models are used to

value real assets (such as housing) **View the asset as representing a bundle of characteristics, each of which gives either utility or disutility to the consumer **In these models, the coefficient estimates represent prices of the characteristics

Parameter variance-covariance matrix

var(beta hat) = s^2 (X'X)^-1
**Leading diagonal terms give the coefficient variances; off-diagonal terms give the covariances between the parameter estimates
***Variance of beta hat(1) = first diagonal element, variance of beta hat(2) = second diagonal element, ..., variance of beta hat(k) = kth diagonal element
--> coefficient standard errors = square roots of each of the terms on the leading diagonal
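A sketch computing the variance-covariance matrix and the coefficient standard errors from simulated data (all names and true parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T, k = 80, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=T)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
residuals = y - X @ beta_hat
s2 = residuals @ residuals / (T - k)     # s^2 = u hat' u hat / (T - k)

var_cov = s2 * np.linalg.inv(X.T @ X)    # k x k matrix
std_errs = np.sqrt(np.diag(var_cov))     # roots of the leading diagonal
print(std_errs)
```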

Writing a multiple linear regression model in matrix form

y = X * beta + u, where y is T x 1, X is T x k, beta is k x 1, and u is T x 1
**Here all of the time observations have been stacked up in a vector, and all of the explanatory variables have been squashed together so that there is one column for each in the X matrix

