Further Development and Analysis of the Classical Linear Regression Model
The square of a t-distributed random variable with T - k degrees of freedom also follows an F-distribution with
1 and T - k degrees of freedom
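A minimal Python sketch of this relationship, using assumed values for T and k: the squared two-sided t critical value coincides with the one-sided F critical value with (1, T - k) degrees of freedom.

```python
# Sketch only: T = 60 and k = 3 are assumed values for illustration.
from scipy import stats

T, k = 60, 3
t_crit = stats.t.ppf(0.975, df=T - k)         # 5% two-sided t critical value
f_crit = stats.f.ppf(0.95, dfn=1, dfd=T - k)  # 5% F critical value, (1, T - k) df

print(t_crit ** 2, f_crit)  # the two numbers coincide
```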
Problems with R^2 as a goodness of fit measure
1. Defined in terms of variation about the mean of y so that if a model is reparameterized (rearranged) and the dependent variable changes, R^2 will change, even if the second model was a simple rearrangement of the first with identical RSS 2. Never falls if more regressors are added to the regression 3. Can take values of 0.9 or higher for time series regressions, hence not good at discriminating between models, since a wide array of models will frequently have broadly similar (and high) values of R^2
Degrees of freedom parameters of F-distribution
1. m (number of restrictions imposed on the model) 2. T - k (number of observations less the number of regressors for the unrestricted regression) **Appropriate critical value is in column m, row (T - k)
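A minimal sketch of the critical value lookup, with m, T and k as assumed values for illustration.

```python
# Sketch only: m, T and k below are assumed values.
from scipy import stats

m, T, k = 2, 120, 4          # restrictions, observations, regressors (incl. constant)
alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, dfn=m, dfd=T - k)
print(f_crit)                # reject the null if the test statistic exceeds this value
```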
Financial example of multiple linear regression model is
APT
R^2
Square of the correlation coefficient between y and y hat (i.e., square of the correlation between the values of the dependent variable and the corresponding fitted values from the model) **R^2 = ESS / TSS **Lies between 0 and 1
Null hypothesis of F test
All of the regression slope parameters are simultaneously zero **Under this null, the test statistic follows an F-distribution
Asymptotic theory
Results that hold exactly only as the number of observations tends to infinity (though in practice an infinite sample is not required to invoke the theory; a sufficiently large sample is usually deemed adequate)
Total sum of squares (TSS)
Total variation across all observations of the dependent variable about its mean value = Summation (y(t) - y bar)^2 **Split into: 1. Explained sum of squares (ESS): part explained by model 2. Residual sum of squares (RSS): part not explained by model
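A minimal Python sketch on simulated data (all numbers assumed for illustration): TSS splits into ESS + RSS, and R^2 = ESS / TSS equals the squared correlation between y and the fitted values.

```python
# Sketch only: simulated data with a constant plus one regressor.
import numpy as np

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)

X = np.column_stack([np.ones(T), x])          # constant plus one regressor
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimates
y_hat = X @ beta_hat
resid = y - y_hat

tss = np.sum((y - y.mean()) ** 2)             # total variation about the mean
ess = np.sum((y_hat - y.mean()) ** 2)         # part explained by the model
rss = np.sum(resid ** 2)                      # part not explained by the model

print(np.isclose(tss, ess + rss))                   # True
print(ess / tss, np.corrcoef(y, y_hat)[0, 1] ** 2)  # both equal R^2
```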
There is indeed an explanatory variable x1 next to beta1 in the multiple linear regression model but it is
a column of ones of length T, so the term is a constant and x1 is usually not written explicitly **Beta1 is thus equivalent to alpha in the simple model and can be interpreted as the intercept (the average value y would take if all of the explanatory variables took a value of zero)
The essential idea behind 'out-of-sample' is
a proportion of the data is not used in model estimation, but is retained for model testing **If data mining has been carried out, the model will tend to give very inaccurate forecasts for the out-of-sample period
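A minimal sketch of the idea on simulated data (the 80/20 split and the use of mean squared forecast error are assumptions for illustration): estimate on the retained portion, then compare forecasts with the held-back observations.

```python
# Sketch only: simulated data; split proportion and accuracy measure are illustrative.
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
y = 0.5 + 1.5 * x + rng.normal(size=T)

split = int(0.8 * T)                         # estimation / hold-out split
X = np.column_stack([np.ones(T), x])
X_in, y_in = X[:split], y[:split]            # in-sample (estimation) data
X_out, y_out = X[split:], y[split:]          # out-of-sample (testing) data

beta_hat = np.linalg.solve(X_in.T @ X_in, X_in.T @ y_in)
forecasts = X_out @ beta_hat
msfe = np.mean((y_out - forecasts) ** 2)     # out-of-sample forecast accuracy
print(msfe)
```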
In multiple regression, RSS is minimized with respect to
all of the elements of beta **Vector of estimated parameters: beta hat = (X'X)^-1 X'y **Beta hat has dimension k x 1 (given that there are k parameters to be estimated by the formula for beta hat)
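A minimal Python sketch of the formula beta hat = (X'X)^-1 X'y on simulated data, with a column of ones as the first column of X so that beta1 is the intercept (all coefficient values below are assumed).

```python
# Sketch only: simulated data; true coefficients chosen for illustration.
import numpy as np

rng = np.random.default_rng(2)
T = 150
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 1.0 + 0.5 * x2 - 0.8 * x3 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2, x3])    # T x k design matrix, k = 3 (constant included)
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y  # k x 1 vector of OLS estimates
print(beta_hat)                              # roughly [1.0, 0.5, -0.8]
```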
In simple linear regression, the residual sum of squares (RSS) is minimized with respect to
alpha and beta
Hypotheses which are not linear/are multiplicative cannot be tested with
an F-test or a t-test
Quantile regressions represent a comprehensive way to
analyze the relationships between a set of variables **Far more robust to outliers and non-normality than OLS regressions (such as how median is better measure of average behavior than mean when distribution is considerably skewed by a few large outliers) **Non-parametric technique since no distributional assumptions are required to optimally estimate the parameters
A simple bivariate regression model implies that
changes in the dependent variable are explained by reference to changes in one single explanatory variable x (e.g. CAPM)
Trying many variables in a regression without basing the selection of the candidate variables on a financial or economic theory is known as
data mining (or data snooping) --> the true significance level will be considerably greater than the nominal significance level assumed
Quantile regressions effectively model the entire conditional distribution of y
given the explanatory variables (rather than only the mean, as is done in OLS) **Examines the impact of the explanatory variables not only on the location and scale of the distribution of y, but also on its shape **For the median (0.5 quantile), estimation is carried out by minimizing the sum of the absolute values of the residuals
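A minimal sketch of median (0.5-quantile) regression, estimating alpha and beta by minimizing the sum of absolute residuals with a generic optimiser rather than a dedicated quantile-regression routine (simulated heavy-tailed data; all numbers assumed).

```python
# Sketch only: median regression via direct minimisation of the sum of |residuals|.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=T)   # heavy-tailed errors

def sum_abs_resid(params):
    alpha, beta = params
    return np.sum(np.abs(y - alpha - beta * x))

result = minimize(sum_abs_resid, x0=[0.0, 0.0], method="Nelder-Mead")
print(result.x)   # median-regression estimates of alpha and beta
```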
Goodness of fit statistics explain
how well the sample regression function (SRF) fits the data (i.e., how close the fitted regression line is to all of the data points taken together)
OLS selects coefficient estimates that
minimize the residual sum of squares (RSS); the lower the minimized value of RSS, the better the model fits the data
We generalize the model with k regressors (aka independent x variables) by
multiplying each regressor by its own coefficient beta -the x's are the explanatory variables thought to influence y -the betas are the parameters quantifying the effect of each of these explanatory variables on y
Any hypothesis that could be tested with a t-test could also have been tested with an F-test but
not the other way around **Single hypotheses involving one coefficient can be tested with either t- or F-test, but multiple hypotheses can be tested only with F-test
Nested models implies that
restrictions are imposed on the original model to arrive at a restricted formulation that would be a sub-set of ('nested' within) the original specification
Estimating standard errors of the coefficient estimates in simple regression
s^2 (estimator) = Sigma u(t) hat ^2 / (T - 2) **T - 2 = number of degrees of freedom for the bivariate regression model (i.e., number of observations minus two) **Two chosen because two observations are effectively "lost" in estimating alpha and beta (the model parameters)
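A minimal sketch of the residual variance estimator s^2 = Sigma u hat^2 / (T - 2) for a bivariate regression on simulated data (all numbers assumed).

```python
# Sketch only: OLS slope/intercept computed directly, then s^2 with T - 2 df.
import numpy as np

rng = np.random.default_rng(4)
T = 80
x = rng.normal(size=T)
y = 0.3 + 1.2 * x + rng.normal(size=T)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # OLS slope
alpha_hat = y.mean() - beta_hat * x.mean()                 # OLS intercept
u_hat = y - alpha_hat - beta_hat * x                       # residuals

s_squared = np.sum(u_hat ** 2) / (T - 2)                   # T - 2 degrees of freedom
print(s_squared)
```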
Estimating standard errors the coefficient estimates in multiple regression
s^2 = u hat' u hat / (T - k), where k = number of regressors including a constant **k observations are lost as k parameters are estimated, leaving T - k degrees of freedom
The t-test is ideal for testing
single hypotheses (i.e., those involving one coefficient)
Adjusted R^2
takes into account the loss of degrees of freedom associated with adding extra variables = 1 - [(T-1)/(T-k) * (1-R^2)] **Include a variable if adjusted R^2 rises when it is added, and exclude it if adjusted R^2 falls **Still not ideal, as it can lead to bulky (unparsimonious) models
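A minimal sketch of the adjusted R^2 formula, using assumed values for T (observations), k (regressors including the constant) and R^2.

```python
# Sketch only: T, k and r_squared are assumed values for illustration.
T, k = 120, 5
r_squared = 0.48

adj_r_squared = 1 - (T - 1) / (T - k) * (1 - r_squared)
print(adj_r_squared)   # penalised for the degrees of freedom used up by the regressors
```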
The number of restrictions in an F-test can be informally seen as
the number of equality signs under the null hypothesis
In multiple regression context, each coefficient beta is known as a partial regression coefficient, interpreted as representing
the partial effect of the given explanatory variable on the explained variable, after holding constant or eliminating the effect of all other explanatory variables **"Each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values"
Quantiles refer to
the position where an observation falls within an ordered series for y (e.g., the median is the observation in the very middle, the (lower) tenth percentile is the value that places 10% of observations below it (and 90% of observations above), etc.)
The probability of rejecting a correct null hypothesis is equal to
the size of the test (denoted by alpha)
The F-distribution has only positive values and is not symmetrical, therefore the null is rejected only if
the test statistic exceeds the critical F-value
Conformable matrices implies that
there is a valid matrix multiplication and addition on the right hand side (RHS) of the equation
Dummy variables are also known as qualitative variables because
they are often used to numerically represent a qualitative variable **Usually specified to take on one of a narrow range of integer values (esp. 0 or 1)
Under an F-test framework
two regressions are required (unrestricted and restricted) -Unrestricted regression: coefficients are freely determined by the data, as has been constructed previously -Restricted regression: one in which coefficients are restricted (i.e., restrictions are imposed on some betas) **Also known as restricted least squares **RSS is determined for each regression and the two are compared in the test statistic F = [(RRSS - URSS) / URSS] * [(T - k) / m], where T = number of observations, m = number of restrictions, k = number of regressors in the unrestricted regression including the constant
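A minimal sketch of the F-test statistic built from the restricted and unrestricted residual sums of squares (RRSS, URSS, T, k and m below are all assumed values).

```python
# Sketch only: RSS values and dimensions are illustrative, not from a real regression.
from scipy import stats

T, k, m = 100, 4, 2          # observations, regressors (incl. constant), restrictions
urss, rrss = 42.0, 50.0      # assumed unrestricted and restricted RSS

f_stat = (rrss - urss) / urss * (T - k) / m
f_crit = stats.f.ppf(0.95, dfn=m, dfd=T - k)
print(f_stat, f_crit)        # reject the restrictions if f_stat > f_crit
```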
Hedonic models are used to
value real assets (such as housing) **View the asset as representing a bundle of characteristics, each of which gives either utility or disutility to the consumer **In these models, the coefficient estimates represent prices of the characteristics
Parameter variance-covariance matrix
var(Beta hat) = s^2(X'X)^-1 **Leading diagonal terms give the coefficient variances **Off-diagonal terms give the covariances between the parameter estimates ***Variance of Beta hat (1) = first element on the leading diagonal ***Variance of Beta hat (2) = second element on the leading diagonal ***Variance of Beta hat (k) = kth element on the leading diagonal --> coefficient standard errors = square roots of each of the terms on the leading diagonal
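A minimal Python sketch of the parameter variance-covariance matrix s^2(X'X)^-1 and the coefficient standard errors taken from its leading diagonal (simulated data; all numbers assumed).

```python
# Sketch only: s^2 = u hat' u hat / (T - k), then var(beta hat) = s^2 (X'X)^-1.
import numpy as np

rng = np.random.default_rng(5)
T = 150
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 1.0 + 0.5 * x2 - 0.8 * x3 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2, x3])     # T x k design matrix, k = 3
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

k = X.shape[1]
s_squared = (u_hat @ u_hat) / (T - k)         # residual variance estimator
var_cov = s_squared * np.linalg.inv(X.T @ X)  # k x k variance-covariance matrix
std_errors = np.sqrt(np.diag(var_cov))        # coefficient standard errors
print(std_errors)
```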
Writing a multiple linear regression model in matrix form
y = X*Beta + u, where y is T x 1, X is T x k, Beta is k x 1, and u is T x 1 **Here all of the time observations have been stacked up in a vector and all of the explanatory variables have been squashed together so that there is a column for each in the X matrix