Eco 441k Midterm 2
Interaction Effect
In multiple regression, the partial effect of one explanatory variable depends on the value of a different explanatory variable.
classical linear model (CLM) assumptions for cross-sectional regression.
MLR.1 (Linear in Parameters) MLR.2 (Random Sampling) MLR.3 (No Perfect Collinearity) MLR.4 (Zero Conditional Mean) MLR.5 (Homoskedasticity) MLR.6 (Normality)
if we make Assumption MLR.6, then we are necessarily assuming
MLR.4 and MLR.5.
T test uses
The t test is used for (1) the two-sided alternative, (2) other single restrictions (hypothesized values other than 0), and (3) single linear combinations of the parameters.
Asymptotic Variance
The square of the value by which we must divide an estimator in order to obtain an asymptotic standard normal distribution.
For the null hypothesis H0 : βj = a the t-statistic is given by
t = (estimate - hypothesized value) / standard error = (Bhatj - a) / se(Bhatj)
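As a minimal sketch of this computation (Python with scipy assumed; the estimate, standard error, hypothesized value, and degrees of freedom below are hypothetical numbers, not taken from these notes):

    from scipy import stats

    b_hat, a, se, df = 0.092, 0.0, 0.035, 522   # hypothetical Bhatj, hypothesized value, se(Bhatj), n-k-1
    t_stat = (b_hat - a) / se                   # t = (estimate - hypothesized value) / standard error
    c = stats.t.ppf(1 - 0.05 / 2, df)           # two-sided 5% critical value
    print(t_stat, c, abs(t_stat) > c)           # reject H0 if |t| > c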
Under the CLM assumptions, the t statistics have
t distributions under the null hypothesis.
Because βjhat is a linear combination of
the sample errors ui, which are assumed to be independent and identically normally distributed, βhatj itself has a normal sampling distribution.
Residual analysis can be used to
determine whether particular members of the sample have predicted values that are well above or well below the actual outcomes.
The normality assumption, MLR.6, assumes that the population error u is
independent of the explanatory variables x1, x2,..., xk and the error is normally distributed with a mean of zero and a variance of 𝛔^2 That is, u ∼ Normal(0, 𝛔^2) .
Heteroskedasticity does not cause bias or inconsistency in the OLS estimators, but the usual standard errors and test statistics are
no longer valid.
the F statistic is always
nonnegative (and almost always strictly positive).
Normality of the error term translates into
normal sampling distributions of the OLS estimators
independent normal random variables: a linear combination of such random variables is ______distributed
normally
For reporting the outcomes of F tests, ______________are especially useful.
p-values
P-values for F Test
p-values are given by: p-value = Pr(F(q, n-k-1) > F). Reject the null if p-value < α, which is equivalent to F > F_α(q, n-k-1).
Explanatory variables that affect y and are uncorrelated with all the other explanatory variables can be used to
reduce the error variance without inducing multicollinearity.
to determine a rule for rejecting Hnot, we need to decide on the
relevant alternative hypothesis
When do we reject Hnot: Bj= 0
values of t(Bhatj) sufficiently far from zero will result in a rejection of Hnot. The precise rejection rule depends on the alternative hypothesis and the chosen significance level of the test.
Single hypothesis tests concerning more than one Bj can always be tested by
rewriting the model to contain the parameter of interest. Then, a standard t statistic can be used.
As the degrees of freedom in the t distribution get large, the t distribution approaches the
standard normal distribution.
Homoskedastic
(also spelled "homoscedastic") refers to a condition in which the variance of the residual, or error term, in a regression model is constant. That is, the error term does not vary much as the value of the predictor variable changes. Homoskedasticity is one assumption of linear regression modeling. If the variance of the errors around the regression line varies much, the regression model may be poorly defined. The lack of homoskedasticity may suggest that the regression model may need to add predictor variables to explain the performance of the dependent variable.
SSTj: The Total Sample Variation in xj
(i) more variation in xj => more precision in estimating jth slope (ii) SSTj grows with n => more precision for all slopes
For Normal Sampling Distribution, standardizing transformation implies
(βhatj-βj)/sd(Bhatj) ~ Normal(0, 1). Linear combinations of the βhatj are also normally distributed. Bhatj is approximately normally distributed in large samples even without Assumption MLR.6.
t Distribution for the Standardized Estimators
(βhatj-βj)/se(Bhatj) ~ t(n-k-1). Note that we have replaced σ^2 with σhat^2, so we use the standard error se(Bhatj) instead of the standard deviation sd(Bhatj).
Adjusted R-squared
, an alternative to the usual R-squared for measuring goodness-of-fit. Whereas the usual R-squared can never fall when another variable is added to a regression, Adjusted R-squared penalizes the number of regressors and can drop when an independent variable is added. This makes Adjusted R-squared preferable for choosing between nonnested models with different numbers of explanatory variables. Neither R-squared nor Adjusted R-squared can be used to compare models with different dependent variables. Nevertheless, it is fairly easy to obtain goodness-of-fit measures for choosing between y and log(y) as the dependent variable.
significance level
The acceptable level of Type I error; the probability of rejecting Hnot when it is true.
how to test hypotheses using a classical approach:
1- state the alternative hypothesis 2- choose a significance level, which then determines a critical value. 3- Once the critical value has been identified, the value of the t statistic is compared with the critical value, and the null is either rejected or not rejected at the given significance level.
How to do the Breusch-Pagan test for heteroskedasticity?
1. Estimate the restricted model using OLS to obtain the squared residuals u^2hat . 2. Regress the squared residuals on the explanatory variables on which the heteroskedasticity depends. 3. Use the information gained from step 2 to form either the F statistic or the LM statistic, and compute the p-value to decide whether to reject the null. the Breusch-Pagan test can use either an F statistic or an LM statistic to test the null. In this case, you can use the following formula to form an LM statistic to test the null hypothesis LM=n× (Rsquared of the u2hat)
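A rough Python sketch of these three steps (statsmodels and scipy assumed; the data here are simulated purely for illustration, and the null is that the error variance does not depend on the regressors):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(0)
    n, k = 200, 2
    X = sm.add_constant(rng.normal(size=(n, k)))                  # simulated explanatory variables
    y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 1]))

    u2 = sm.OLS(y, X).fit().resid ** 2        # step 1: OLS residuals, squared
    aux = sm.OLS(u2, X).fit()                 # step 2: regress squared residuals on the x's
    LM = n * aux.rsquared                     # step 3: LM = n * R-squared of the auxiliary regression
    p_value = stats.chi2.sf(LM, k)            # compare to chi-squared with k degrees of freedom
    print(LM, p_value)                        # reject the null of homoskedasticity if p_value < alpha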
Linear Probability Model (LPM)
A binary response model where the response probability is linear in its parameters.
Classical Errors-in-Variables (CEV)
A measurement error model where the observed measure equals the actual variable plus an independent, or at least an uncorrelated, measurement error.
How should we choose a rejection rule?
First, decide on a significance level; combined with the alternative hypothesis, this determines the rejection rule.
the LM statistic can be used instead of the F statistic for testing.
exclusion restrictions
Least Absolute Deviations (LAD)
A method for estimating the parameters of a multiple regression model based on minimizing the sum of the absolute values of the residuals.
Feasible GLS (FGLS) Estimator
A GLS procedure where variance or correlation parameters are unknown and therefore must first be estimated. (See also generalized least squares estimator.)
Difference in Slopes
A description of a model where some slope parameters may differ by group or time period.
Prediction Interval
A confidence interval for an unknown outcome on a dependent variable in a multiple regression model.
Asymptotic Confidence Interval
A confidence interval that is approximately valid in large sample sizes.
Regression Specification Error Test (RESET)
A general test for functional form in a multiple regression model; it is an F test of joint significance of the squares, cubes, and perhaps higher powers of the fitted values from the initial OLS estimation.
Adjusted R-Squared .
A goodness-of-fit measure in multiple regression analysis that penalizes additional explanatory variables by using a degrees of freedom adjustment in estimating the error variance
MLR4. Zero Conditional Mean Assumption
A key assumption used in multiple regression analysis that states that, given any values of the explanatory variables, the expected value of the error equals zero.
Random Coefficient (Slope) Model
A multiple regression model where the slope parameters are allowed to depend on unobserved unit-specific variables.
Stratified Sampling
A nonrandom sampling scheme whereby the population is first divided into several nonoverlapping, exhaustive strata, and then random samples are taken from within each stratum.
standard normal random variable
A normal random variable with mean 0 and SD equal to 1
Functional Form Misspecification
A problem that occurs when a model has omitted functions of the explanatory variables (such as quadratics) or uses the wrong functions of either the dependent variable or some explanatory variables.
Standardized Random Variable
A random variable transformed by subtracting off its expected value and dividing the result by its standard deviation; the new random variable has mean zero and standard deviation one [βjhat−a]/[se(βjhat)] is essentially an estimate for how many standard errors βjhat is away from a hypothetical population value of βj . When the model does not suffer from heteroskedasticity, this standardized random variable is distributed in accordance with the t distribution with n−k−1 degrees of freedom.
Auxiliary Regression
A regression used to compute a test statistic—such as the test statistics for heteroskedasticity and serial correlation—or any other regression that does not estimate the model of primary interest.
Bootstrap
A resampling method that draws random samples, with replacement, from the original data set.
Smearing Estimate
A retransformation method particularly useful for predicting the level of a response variable when a linear model has been estimated for the natural log of the response variable.
Exogenous Sample Selection
A sample selection that either depends on exogenous explanatory variables or is independent of the error term in the equation of interest.
asymptotic properties
Large sample properties. These properties are helpful when MLR.6 is not true.
Bootstrap Standard Error
A standard error obtained as the sample standard deviation of an estimate across all bootstrap samples.
Heteroskedasticity-Robust Standard Error
A standard error that is (asymptotically) robust to heteroskedasticity of unknown form.
Asymptotic Standard Error
A standard error that is valid in large samples.
Asymptotic t Statistics
A t statistic that has an approximate standard normal distribution in large samples.
Heteroskedasticity-Robust t Statistic
A t statistic that is (asymptotically) robust to heteroskedasticity of unknown form.
Resampling Method
A technique for approximating standard errors (and distributions of test statistics) whereby a series of samples are obtained from the original data set and estimates are computed for each subsample.
White Test for Heteroskedasticity
A test for heteroskedasticity that involves regressing the squared OLS residuals on the OLS fitted values and on the squares of the fitted values; in its most general form, the squared OLS residuals are regressed on the explanatory variables, the squares of the explanatory variables, and all the nonredundant interactions of the explanatory variables.
Breusch-Pagan Test for Heteroskedasticity (BP Test)
A test for heteroskedasticity where the squared OLS residuals are regressed on the explanatory variables in the model.
multiple hypotheses test
A test of a null hypothesis involving more than one restriction on the parameters.
Overall Significance of a Regression
A test of the joint significance of all explanatory variables appearing in a multiple regression equation. The null is that all slopes are zero, i.e., none of the regressors matter: H0: β1 = 0, β2 = 0, ..., βk = 0. The alternative is that at least one βj is different from 0. The restricted model is y = β0 + u, so the R-squared for the restricted model is 0. The F statistic for testing this null simplifies to F = (Rsquared/k) / [(1-Rsquared)/(n-k-1)] ~ F(k, n-k-1), where Rsquared is the usual R-squared from the regression of y on all the x's.
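A small numerical sketch of the overall-significance F statistic (scipy assumed; the R-squared, k, and n below are hypothetical values):

    from scipy import stats

    r2, k, n = 0.36, 4, 120                       # hypothetical R-squared, number of regressors, sample size
    F = (r2 / k) / ((1 - r2) / (n - k - 1))       # overall-significance F statistic
    p_value = stats.f.sf(F, k, n - k - 1)         # upper-tail p-value
    print(F, p_value)                             # reject H0 (all slopes zero) if p_value < alpha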
Lagrange Multiplier (LM) Statistic
A test statistic with large-sample justification that can be used to test for omitted variables, heteroskedasticity, and serial correlation, among other model specification problems.
Davidson-MacKinnon Test
A test that is used for testing a model against a nonnested alternative; it can be implemented as a t test on the fitted values from the competing model.
Residual Analysis
A type of analysis that studies the sign and size of residuals for particular observations after a multiple regression model has been estimated.
Dummy Variables
A variable that takes on the value zero or one.
Ordinal Variable
A variable where the ordering of the values conveys information but the magnitude of the values does not.
Reject null or not, using t statistic
According to the rejection rule, in order to determine whether to reject the null hypothesis, the t statistic for the particular coefficient you are testing must be compared to the corresponding critical value. Critical values for the t distribution are based on the degrees of freedom and the desired significance level. In general, for a two-sided alternative hypothesis H1: βj ≠ 0, with a general t statistic t(βjhat) and a general critical value c: if |t(βjhat)| > c, reject the null; if |t(βjhat)| < c, fail to reject the null.
Asymptotic Efficiency
We already know from the Gauss-Markov Theorem that OLS is BLUE. Large sample result: under the Gauss-Markov assumptions MLR.1-MLR.5, the OLS estimators have the smallest asymptotic variance among the set of consistent estimators. This is a large-sample justification for using OLS.
Chow Statistic
An F statistic for testing the equality of regression parameters across different groups (say, men and women) or time periods (say, before and after a policy change).
Heteroskedasticity-Robust F Statistic
An F-type statistic that is (asymptotically) robust to heteroskedasticity of unknown form.
Heteroskedasticity-Robust LM Statistic
An LM statistic that is robust to heteroskedasticity of unknown form.
Program Evaluation
An analysis of a particular private or public program using econometric methods to obtain the causal effect of the program.
Policy Analysis
An empirical analysis that uses econometric methods to evaluate the effects of a certain policy.
Consistency
An estimator converges in probability to the correct population value as the sample size grows.
Generalized Least Squares (GLS) Estimators
An estimator that accounts for a known structure of the error variance (heteroskedasticity), serial correlation pattern in the errors, or both, via a transformation of the original model.
Weighted Least Squares (WLS) Estimators
An estimator used to adjust for a known form of heteroskedasticity, where each squared residual is weighted by the inverse of the (estimated) variance of the error.
Endogenous Explanatory Variable
An explanatory variable in a multiple regression model that is correlated with the error term, either because of an omitted variable, measurement error, or simultaneity.
Lagged Dependent Variable
An explanatory variable that is equal to the dependent variable from an earlier time period.
Interaction Term
An independent variable in a regression model that is the product of two explanatory variables.
Proxy Variable
An observed variable that is related but not identical to an unobserved explanatory variable in multiple regression analysis.
Zero Conditional Mean Assumption
Assumption MLR.4 (ZCM): E(u|x1, ..., xk) = E(u) = 0. The average value of u does not depend on the values of the explanatory variables; in particular, u is uncorrelated with (not linearly related to) each xj. Under ZCM, Var(u|x1, ..., xk) = E(u^2|x1, ..., xk), so homoskedasticity can be written as E(u^2|x1, ..., xk) = σ^2.
classical linear model (CLM) assumptions.
Assumptions MLR.1 (Linear in Parameters), MLR.2 (Random Sampling), MLR.3 (No Perfect Collinearity), MLR.4 (Zero Conditional Mean), MLR.5 (Homoskedasticity), and MLR.6 (Normality). It is best to think of the CLM assumptions as containing all of the Gauss-Markov assumptions plus the assumption of a normally distributed error term. Under the CLM assumptions, the OLS estimators are minimum variance unbiased estimators. This is slightly stronger than the Gauss-Markov Theorem: the competing estimators do not have to be linear in the yi.
Unbiasedness of OLS: Assumptions #
Under Assumptions MLR.1, MLR.2, MLR.3 and MLR.4, the OLS estimators are unbiased estimators of the population parameters: E(βhatj) = βj. (Adding MLR.5 and MLR.6 gives Bhatj ~ Normal(βj, Var(βhatj)), which includes unbiasedness.)
Gauss-Markov assumptions
Assumptions MLR.1, MLR.2, MLR.3, MLR.4 and MLR.5 (for cross-sectional regression)
The group represented by the overall intercept in a multiple regression model that includes dummy explanatory variables.
Base Group
The argument justifying the normal distribution for the errors
Because u is the sum of many different unobserved factors affecting y, we can invoke the central limit theorem (CLT) to conclude that u has an approximate normal distribution.
A 95% confidence interval for the unknown Bj is given by
Bhatj ± c * se(Bhatj). Lower bound: Blowerj = Bhatj - c * se(Bhatj). Upper bound: Bupperj = Bhatj + c * se(Bhatj).
Three quantities are needed for constructing a confidence interval
Bhatj, se(Bhatj), and c
Attenuation Bias
Bias in an estimator that is always toward zero; thus, the expected value of an estimator with attenuation bias is less in magnitude than the absolute value of the parameter.
General Linear Restrictions: F test
Can do F test for any number of linear restrictions
Binary Qualitative Data
Create a binary variable (indicator variable), commonly known as a dummy variable. The zero-one coding is natural and makes interpretation easy. The coding can be done in two ways, e.g., for gender the dummy can indicate male or indicate female.
Self-Selection
Deciding on an action based on the likely benefits, or costs, of taking that action.
dummy variable
Dummy variables are also useful for incorporating ordinal information, such as a credit or a beauty rating, in regression models. We simply define a set of dummy variables representing different outcomes of the ordinal variable, allowing one of the categories to be the base group. Dummy variables can be interacted with quantitative variables to allow slope differences across different groups. In the extreme case, we can allow each group to have its own slope on every variable, as well as its own intercept.
MLRM6. if u is assumed to be normally distributed with a mean of 0, then
E(u)=0 .
MLRM6. if u is assumed to be independent of the explanatory variables, then
E(u|x1,..., xk)=E(u) by definition. Further, if u is assumed to be normally distributed with a mean of 0, then E(u)=0 . It follows then that E(u|x1,..., xk)=E(u)=0 . Therefore MLR.4, the zero conditional mean assumption, must also hold.
Robust Inference
Heteroskedasticity is an easy problem to fix: use the robust option in STATA. These methods are known as heteroskedasticity-robust procedures. They work for any form of heteroskedasticity and also work (asymptotically) under MLR.5. For t-tests you proceed as usual but use the robust standard errors. Once we have the heteroskedasticity-robust standard errors, it is simple to construct the heteroskedasticity-robust t statistics: t = (estimate - hypothesised value) / standard error, where we plug in the robust standard error. For the F statistic, the usual form in terms of R-squared or SSR is invalid; STATA will do it via postestimation hypothesis tests.
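These notes use STATA's robust option; as an illustrative sketch of the same idea in Python (statsmodels assumed; data simulated; "HC1" is one of several robust covariance choices):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = sm.add_constant(rng.normal(size=(500, 2)))
    y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=500) * np.exp(X[:, 1])  # heteroskedastic errors

    usual = sm.OLS(y, X).fit()                   # usual (nonrobust) standard errors
    robust = sm.OLS(y, X).fit(cov_type="HC1")    # heteroskedasticity-robust standard errors
    print(usual.bse)                             # can be misleading under heteroskedasticity
    print(robust.bse)                            # valid whether or not heteroskedasticity is present
    print(robust.tvalues)                        # t statistics built from the robust standard errors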
F-statistic (or F ratio) is given
F = [(SSR(r) - SSR(ur))/q] / [SSR(ur)/(n-k-1)] ~ F(q, n-k-1). Under the null, with the CLM assumptions, F ~ F(q, n-k-1).
The F Test Reject null if
F > F_α(q, n-k-1), equivalently p-value = Ftail(q, n-k-1, F) < α. If H0 is rejected we say that x_{k-q+1}, ..., x_k are jointly statistically significant. If H0 is not rejected we say that x_{k-q+1}, ..., x_k are jointly statistically insignificant, which gives statistical justification for dropping them from the model.
F statistic
F = [(SSR(r) - SSR(ur))/2] / [SSR(ur)/(n-k-1)] = [(Rsquared(ur) - Rsquared(r))/2] / [(1-Rsquared(ur))/(n-k-1)]. Reject the null if F > F_α(2, n-k-1).
P values
For a given value of the test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected. One-sided (positive): p-value = P(t(n-k-1) > t(Bhatj)); for a given α, reject if p-value < α. Two-sided: p-value = 2P(t(n-k-1) > |t(Bhatj)|); reject if p-value < α. The p-value allows you to carry out the test at your favorite α without needing to look up critical values.
Testing Hypotheses about a Single Parameter null hypothesis
H0 : βj = 0 xj has zero (partial) effect on the expected value of y
Asymptotically Efficient
For consistent estimators with asymptotically normal distributions, the estimator with the smallest asymptotic variance
Average Partial Effect (APE)
For nonconstant partial effects, the partial effect averaged across the specified population.
Quadratic Functions
Functions that contain squares of one or more explanatory variables; they capture diminishing or increasing effects on the dependent variable.
When the form of heteroskedasticity is known,
GLS estimation can be used. This leads to weighted least squares as a means of obtaining the BLUE estimator. The test statistics from the WLS estimation are either exactly valid when the error term is normally distributed or asymptotically valid under nonnormality. This assumes, of course, that we have the proper model of heteroskedasticity. More commonly, we must estimate a model for the heteroskedasticity before applying WLS. The resulting feasible GLS estimator is no longer unbiased, but it is consistent and asymptotically efficient. The usual statistics from the WLS regression are asymptotically valid. We discussed a method to ensure that the estimated variances are strictly positive for all observations, something needed to apply WLS.
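A minimal WLS sketch under an assumed (not estimated) variance function Var(u|x) = sigma^2 * h(x), using simulated data and statsmodels (all names and numbers hypothetical):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(1, 5, size=300)
    X = sm.add_constant(x)
    h = x                                         # assumed variance function: Var(u|x) = sigma^2 * h(x)
    y = 2 + 3 * x + rng.normal(size=300) * np.sqrt(h)

    wls = sm.WLS(y, X, weights=1.0 / h).fit()     # weight each observation by 1/h
    print(wls.params, wls.bse)

With an estimated h (feasible GLS), the same call would simply use the fitted variance function in place of the known h.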
variances of the OLS estimators under the Gauss-Markov assumptions
Gauss-Markov assumptions imply nothing about whether OLS gives the smallest variance of all unbiased estimators.
This level is known as the pvalue
Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected
Two-Sided Alternative to test the null hypothesis
H1: Bj≠ 0 Under this alternative, xj has a ceteris paribus effect on y without specifying whether the effect is positive or negative
Heteroskedasticity of Unknown Form
Heteroskedasticity that may depend on the explanatory variables in an unknown, arbitrary fashion.
Heteroskedasticity
Heteroskedasticity does NOT cause bias or inconsistency! It is a violation of MLR.5, so the standard errors are off (Var(βhatj) no longer has the usual form) and confidence intervals, t statistics and F statistics will be invalid: when heteroskedasticity is present it messes up inference. Heteroskedasticity is easily dealt with using the robust option in STATA; then use the test options in STATA and everything will be fine. Check your data for outliers - they could indicate a problem in the data.
Testing Hypotheses about a Single Parameter
Hypothesis testing: -Null (H0) and Alternative (H1) Hypotheses -Type I and Type II errors Hypotheses concerning the mean: -Known variance - applet -Unknown variance - t stat and distribution
Under the Gauss-Markov assumptions, the distribution of Bjhat has which shape?
It can have virtually any shape.
One-sided alternative hypothesis
The alternative is one-sided if it states that the parameter is larger than the null hypothesis value, or if it states that the parameter is smaller than the null value. Ex: Hnot: Bj = 0, H1: Bj > 0. Thus, we are looking for a "sufficiently large" positive value of t(Bhatj) in order to reject Hnot in favor of H1. Negative values of t(Bhatj) provide no evidence in favor of H1.
Percent Correctly Predicted
In a binary response model, the percentage of times the prediction of zero or one coincides with the actual outcome.
Response Probability
In a binary response model, the probability that the dependent variable takes on the value one, conditional on explanatory variables.
Over Controlling
In a multiple regression model, including explanatory variables that should not be held fixed when studying the ceteris paribus effect of one or more other explanatory variables; this can occur when variables that are themselves outcomes of an intervention or a policy are included among the regressors.
Missing at Random
In multiple regression analysis, a missing data mechanism where the reason data are missing may be correlated with the explanatory variables but is independent of the error term.
Restricted Model
In hypothesis testing, the model obtained after imposing all of the restrictions required under the null. The restricted model always has fewer parameters than the unrestricted model.
unrestricted model
In hypothesis testing, the model that has no restrictions placed on its parameters.
critical value
In hypothesis testing, the value against which a test statistic is compared to determine whether or not the null hypothesis is rejected.
Control Group
In program evaluation, the group that does not participate in the program.
Treatment Group .
In program evaluation, the group that participates in the program
Consistency
In statistics, a consistent estimator or asymptotically consistent estimator is an estimator (a rule for computing estimates of a parameter θ0) having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ0. Consistency requires the variance to decrease with n. Correlation between u and any of x1, x2, ..., xk results in all the OLS coefficients being inconsistent.
Population R-Squared
In the population, the fraction of the variation in the dependent variable that is explained by the explanatory variables.
Assumption MLR.3 No Perfect Collinearity
In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables Assumption MLR.3 does allow the independent variables to be correlated; they just cannot be perfectly correlated.
Single Dummy Independent Variable
Include a dummy variable as an explanatory variable in the regression model: lwage = β0 + δ0*female + β1*educ + u. Controlling for education, δ0 is the effect on the mean log wage of being female; if δ0 < 0 then, for the same level of other factors, women earn less than men on average.
Interaction Effects - Different Slopes
Interacting a dummy variable with a continuous explanatory variable allows for differences in slopes.
Interaction Effects - Interacting Dummies
Interacting a dummy variable with a continuous explanatory variable allows for differences in slopes. Dummies can also be interacted with each other. Application: interaction between female and nonwhite. For example, I could interact nonwhite with female as follows: generate float femnon = female*nonwhite, and run the regression.
Considerations When Using Interactions
Interaction terms allow the partial effect of an explanatory variable, say x1, to depend on the level of another variable, say x2, and vice versa. Interpreting models with interactions can be tricky. The coefficient on x1, say B1, measures the partial effect of x1 on y when x2 = 0, which may be impossible or uninteresting. Centering x1 and x2 around interesting values before constructing the interaction term typically leads to an equation that is visually more appealing. When the variables are centered about their sample averages, the coefficients on the levels become estimated average partial effects. A standard t test can be used to determine if an interaction term is statistically significant. Computing the partial effects at different values of the explanatory variables can be used to determine the practical importance of interactions.
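A small simulated sketch of centering before building the interaction term (Python with statsmodels assumed; the data and coefficient values are made up for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x1, x2 = rng.normal(size=300), rng.normal(size=300)
    y = 1 + 0.5 * x1 + 0.8 * x2 + 0.3 * x1 * x2 + rng.normal(size=300)

    x1c, x2c = x1 - x1.mean(), x2 - x2.mean()     # center about the sample averages
    X = sm.add_constant(np.column_stack([x1c, x2c, x1c * x2c]))
    res = sm.OLS(y, X).fit()
    print(res.params)       # coefficients on the centered levels are estimated average partial effects
    print(res.tvalues[3])   # standard t statistic on the interaction term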
What is a confidence interval?
Interval estimator
The F Test - Joint and Individual Significance
It is possible that variables could be individually insignificant but jointly significant, or individually significant but jointly insignificant.
Why? Tests for Heteroskedasticity
It is useful to test whether heteroskedasticity is present in the population model since: (i) the usual t stats have exact t distributions under the classical assumptions (the robust t stats above are approximations valid in large samples); (ii) if heteroskedasticity is present, OLS is no longer BLUE (there are better estimators when the form of the heteroskedasticity is known). I will focus on recent tests which detect forms of heteroskedasticity that invalidate the usual OLS statistics.
How to form an LM statistic to test the null hypothesis
LM=n× (Rsquared of the u^2hat)
Confidence Intervals, lower and upper bounds are given by
Lower bound: Bjhat - t(n-k-1, 0.025)*se(Bjhat). Upper bound: Bjhat + t(n-k-1, 0.025)*se(Bjhat). The bounds are random since they depend on the sample.
Multiple Restrictions
More than one restriction on the parameters in an econometric model.
What value of t statistic leads us to reject null?
Must pick two things: 1 An alternative hypothesis - what values βj do we want to test against? 2 Significance level of the test - what probability of rejecting a true null can we live with (Type I error)
Under Assumptions MLR.1-MLR.6, collectively referred to as the classical linear model (CLM) assumptions,
OLS estimators are the minimum variance unbiased estimators. This means that OLS has the smallest variance among all unbiased estimators, including those that may not be linear in the explained variable y
Including Irrelevant Variables
Including irrelevant variables does not cause bias, but it can reduce precision because R^2j increases. By contrast, omitting a relevant variable leads to bias and affects variance too, since the omitted variable becomes part of the residual.
How to construct a confidence interval
P[ -t(n-k-1, 0.025) < (βhatj - βj)/se(Bhatj) < t(n-k-1, 0.025) ] = 0.95. Rearranging the expression inside P(.) gives Bjhat - t(n-k-1, 0.025)*se(Bjhat) < βj < Bjhat + t(n-k-1, 0.025)*se(Bjhat).
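The same rearrangement as a tiny Python sketch (scipy assumed; the estimate, standard error, and degrees of freedom are hypothetical):

    from scipy import stats

    b_hat, se, df = 0.54, 0.12, 86                  # hypothetical Bjhat, se(Bjhat), n-k-1
    c = stats.t.ppf(0.975, df)                      # t(n-k-1, 0.025)
    lower, upper = b_hat - c * se, b_hat + c * se   # 95% confidence interval for Bj
    print(lower, upper)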
SST/n is a consistent estimator of
SST/n is a consistent estimator of σ^2y regardless of whether the model suffers from heteroskedasticity or not
Asymptotic Properties
Properties of estimators and test statistics that apply when the sample size grows without bound.
Dummy Variables for Multiple Categories
For a qualitative variable with g categories we need g-1 dummy variables plus an overall intercept term. The intercept for the base group is the overall intercept in the model; each dummy coefficient is the difference between the intercept for that group and the base group. Incorporating ordinal information: used where a qualitative variable can take on multiple values, but the scale is not meaningful.
Type 1 error
Rejecting null hypothesis when it is true ex. at the 5% significance level, we are willing to mistakenly reject Hnot when it is true 5% of the time.
The t-test - One Sided Negative H0 : βj = 0 H1 : βj < 0
Reject H0 in favor of H1 if t(Bhatj) < -t(n-k-1, α), i.e., if the t statistic is sufficiently large and negative.
For testing exclusion restrictions, it is often more convenient to have a form of the F statistic that can be computed using the
R-squareds from the restricted and unrestricted models.
R squared j
R^2j is the R-squared from regressing xj on all the other independent variables (and an intercept). It measures the linear relationship among the independent variables: how much xj is (linearly) related to the other regressors. Less related => more precision for the jth slope. R^2j < 1 by MLR.3 (no perfect collinearity).
Multicollinearity problem
Refers to when there is a strong but not perfect linear relationship among regressors. This does not violate any of assumptions MLR.1-MLR.5, but it means less precision for some coefficients. It is analogous to having a small sample (less information).
Calculating P value for an f statistic
Reject Hnot at significance levels ≥ p.
The t-test - Two-Sided Alternatives H0 : βj = 0 H1 : βj≠ 0
Reject for large positive or large negative values of t(Bhatj) That is reject H0 when |t(Bhatj)| > t(n-k-1, α/2) For a significance level of α =5%, the critical value is t(n-k-1, .025)
Standardized Coefficients
Regression coefficients that measure the standard deviation change in the dependent variable given a one standard deviation increase in an independent variable.
exclusion restrictions
Restrictions which state that certain variables are excluded from the model (or have zero population coefficients).
what is Rsquared of the u^2hat?
Rsquared of the u^2hat is the R-squared from a regression of the squared residuals u^2hat on the explanatory variables on which the heteroskedasticity depends, and n is the number of observations. This statistic is distributed as a chi-squared random variable with k degrees of freedom, where k is the number of explanatory variables in the regression with the squared residuals as the dependent variable.
Consistency vs Unbiasedness
Separate concepts - one does not imply the other
The t-test - Rejection Region
Set the rejection region R such that P(t(Bhatj) in R | H0) = P(Type I error) = α. There are lots of ways of setting R based on this. We also want a small chance of a Type II error (failing to reject the null when H1 is correct); conversely, we would like a high chance of rejecting H0 when it is false (power).
why is the t statistic good to test null hypothesis Hnot: Bj= 0
Since se(Bhatj) is always positive, t(Bhatj) has the same sign as Bhatj. For a given value of se(Bhatj), a larger value of Bhatj leads to larger values of t(Bhatj). If Bhatj becomes more negative, so does t(Bhatj).
Multiple Linear Restrictions: The F Test is used for
Testing exclusion restrictions, e.g., H0: β3 = 0, β4 = 0; H1: H0 is false. H1 is correct if either β3 ≠ 0 or β4 ≠ 0 or both. A t test can handle one restriction or the other, but not both jointly. The test is based on looking at how much the SSR increases when the restrictions are imposed.
the most important application of F statistics.
Testing exclusion restrictions
Testing Other Hypotheses about Bj
Tests other than Hnot: Bj=0 against H1: Bj≠ 0
R-Squared Form of the F Statistic
The F statistic for testing exclusion restrictions expressed in terms of the R-squareds from the restricted and unrestricted models. Rsquared = 1 - (SSR/SST) and hence SSR = SST*(1 - Rsquared), so the F statistic can be rewritten as: F = [(Rsquared(ur) - Rsquared(r))/q] / [(1 - Rsquared(ur))/(n-k-1)] ~ F(q, n-k-1)
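A quick numerical sketch of the R-squared form (scipy assumed; the R-squareds, q, n, and k are hypothetical values):

    from scipy import stats

    r2_ur, r2_r = 0.42, 0.39     # hypothetical unrestricted and restricted R-squareds
    q, n, k = 3, 150, 6          # number of restrictions, sample size, regressors in the unrestricted model
    F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
    p_value = stats.f.sf(F, q, n - k - 1)
    print(F, p_value)            # reject the exclusion restrictions if p_value < alpha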
R-squared form of the F statistic
The F statistic for testing exclusion restrictions expressed in terms of the R-squareds from the restricted and unrestricted models.
F and t Test
The F test could be used to test whether a single independent variable should be excluded from the model -F stat equals square of t stat -equivalent tests if alternative is 2 sided
Uncentered R-Squared
The R-squared computed without subtracting the sample average of the dependent variable when obtaining the total sum of squares (SST).
Two-sided alternative hypothesis
The alternative hypothesis is two-sided if it states that the parameter is different from the null value (it could be either smaller or larger). For the two-sided alternative H1: βj ≠ 0: if |t(βjhat)| > c, reject the null; if |t(βjhat)| < c, fail to reject the null.
normality assumption MLR.6
The classical linear model assumption which states that the error (or dependent variable) has a normal distribution, conditional on the explanatory variables. To make the sampling distributions of Bjhat tractable, we assume that the unobserved error is normally distributed in the population: the population error u is independent of the explanatory variables and is normally distributed with zero mean and variance σ^2. If we make Assumption MLR.6, then we are necessarily assuming MLR.4 and MLR.5. Under Assumptions MLR.1-MLR.6, collectively referred to as the classical linear model (CLM) assumptions, the OLS estimators are the minimum variance unbiased estimators: OLS has the smallest variance among all unbiased estimators, including those that may not be linear in the explained variable y.
Considerations When Using Logarithms
The coefficients have percentage change interpretations. We can be ignorant of the units of measurement of any variable that appears in logarithmic form, and changing units from, say, dollars to thousands of dollars has no effect on a variable's coefficient when that variable appears in logarithmic form. Logs are often used for dollar amounts that are always positive, as well as for variables such as population, especially when there is a lot of variation. They are used less often for variables measured in years, such as schooling, age, and experience. Logs are used infrequently for variables that are already percents or proportions, such as an unemployment rate or a pass rate on a test. Models with log(y) as the dependent variable often more closely satisfy the classical linear model assumptions. For example, the model has a better chance of being linear, homoskedasticity is more likely to hold, and normality is often more plausible. In many cases, taking the log greatly reduces the variation of a variable, making OLS estimates less prone to outlier influence. However, in cases where y is a fraction and close to zero for many observations, log(yi) can have much more variability than yi. For values yi very close to zero, log(yi) is a negative number very large in magnitude. If y ≥ 0 but y = 0 is possible, we cannot use log(y). Sometimes log(1+y) is used, but interpretation of the coefficients is difficult. For large changes in an explanatory variable, we can compute a more accurate estimate of the percentage change effect. It is harder (but possible) to predict y when we have estimated a model for log(y).
Measurement Error
The difference between an observed variable and the variable that belongs in a multiple regression equation.
Prediction Error .
The difference between the actual outcome and a prediction of that outcome
Inconsistency
The difference between the probability limit of an estimator and the parameter value.
Assumption MLR.5 Homoskedasticity
The error u has the same variance given any value of the explanatory variables.
Predictions
The estimate of an outcome obtained by plugging specific values of the explanatory variables into an estimated model, usually a multiple regression model.
Intercept Shift
The intercept in a regression model differs by group or time period.
Conditional Median
The median of a response variable conditional on some explanatory variables.
Dummy Variable Trap
The mistake of including too many dummy variables among the independent variables; it occurs when an overall intercept is in the model and a dummy variable is included for each group.
Central Limit Theorem (CLT)
The name of the theorem stating that the sampling distribution of a statistic (e.g., the sample average) is approximately normal whenever the sample is large and random. When it is used to justify normality of the error term, it assumes that all unobserved factors affect y in a separate, additive fashion.
Jointly Statistically Significant
The null hypothesis that two or more explanatory variables have zero population coefficients is rejected at the chosen significance level.
degrees of freedom
The number of degrees of freedom is calculated as the number of observations less the number of parameters (all slope parameters as well as the intercept parameter). In this case, this is degrees of freedom=n−k−1
Asymptotic Normality
The sampling distribution of a properly normalized estimator converges to the standard normal distribution.
Studentized Residuals
The residuals computed by excluding each observation, in turn, from the estimation, divided by the estimated standard deviation of the error.
t-test - Single Linear Combination of Parameters
The test is: H0: β1 = β2, H1: β1 ≠ β2. Rewrite the hypotheses as: H0: β1 - β2 = 0, H1: β1 - β2 ≠ 0. Standardise by dividing by the standard error of the difference: t = (Bhat1 - Bhat2) / se(Bhat1 - Bhat2). The usual OLS output does not have enough information to calculate se(Bhat1 - Bhat2); note that se(Bhat1 - Bhat2) ≠ se(Bhat1) - se(Bhat2). Instead, Var(Bhat1 - Bhat2) = Var(Bhat1) + Var(Bhat2) - 2Cov(Bhat1, Bhat2), and se(Bhat1 - Bhat2) = sqrt(Var(Bhat1 - Bhat2)).
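One way to obtain se(Bhat1 - Bhat2) in practice is from the estimated covariance matrix of the coefficients; a sketch with simulated data and statsmodels (names and numbers hypothetical):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = sm.add_constant(rng.normal(size=(400, 2)))
    y = X @ np.array([0.5, 1.0, 1.0]) + rng.normal(size=400)

    res = sm.OLS(y, X).fit()
    V = res.cov_params()                                  # estimated variance-covariance matrix of the coefficients
    diff = res.params[1] - res.params[2]                  # Bhat1 - Bhat2
    se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])    # Var(Bhat1) + Var(Bhat2) - 2Cov(Bhat1, Bhat2)
    print(diff / se_diff)                                 # t statistic for H0: beta1 = beta2

An equivalent route is to rewrite the model in terms of theta = beta1 - beta2 and read the t statistic on theta directly from the OLS output.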
Variance of the Prediction Error
The variance in the error that arises when predicting a future value of the dependent variable based on an estimated multiple regression equation.
Asymptotic Normality
Theorem (Asymptotic Normality of OLS). Under the Gauss-Markov assumptions MLR.1-MLR.5: (i) Bhatj is asymptotically normally distributed; (ii) σhat^2 is a consistent estimator of σ^2 = Var(u); (iii) for each j, (Bhatj - Bj)/se(Bhatj) ~a Normal(0,1), where se(Bhatj) is the usual OLS standard error. We still need MLR.5, otherwise se(Bhatj) will be invalid and the usual t and F tests, and CIs, are not valid. The estimated variance of Bjhat, Varhat(Bjhat), shrinks to 0 at the rate 1/n, which is why larger sample sizes are better. We can apply OLS and use the usual inference procedures in applications where the dependent variable is not normally distributed, provided we have a large sample.
heteroskedasticity-robust standard errors
These standard errors (reported in brackets) are valid whether heteroskedasticity is present or not.
Testing Hypotheses about a Single Parameter The t-test
To test H0 : βj = 0 against any alternative use the t-statistic (or the t-ratio): t(Bhatj)= Bhatj / se(Bhatj) Always reported by STATA
Nonnested Models
Two (or more) models where no model can be written as a special case of the other by imposing restrictions on the parameters.
Normal Sampling Distributions
Under the CLM Assumptions MLR.1, MLR.2, MLR.3, MLR.4, MLR.5 and MLR.6, conditional on the sample values of the independent variables, Bjhat ~ Normal[Bj, Var(Bjhat)] and (Bjhat - Bj)/sd(Bjhat) ~ Normal(0,1).
t Distribution for the Standardized Estimators
Under the CLM Assumptions MLR.1-6, (Bjhat-Bj)/se(Bjhat)~ t(n-k-1)= t(df) where k + 1 is the number of unknown parameters in the population model where j corresponds to any of the k independent variables.
Var of estimators Var(Bhatj)
Using the estimate σˆ2 we can obtain an unbiased estimate of Var(βhatj) by using Varhat(βhatj) = σˆ2/ [SSTj * (1-R^2j)] for j=1,...,k
Homoskedasticity
Var(u|x1, ..., xk ) = σ^2 variance of the error term is unrelated to the explanatory variables Violation: Heteroskedasticity
Under Assumptions 1 to 5, conditional on sample values of regressors, for slopes Var(βhatj)
Var(βhatj) = σ^2 / [SSTj (1 - R^2j)] for j = 1, ..., k, where SSTj = Σ_{i=1}^{n} (xij - xbarj)^2 is the total sample variation (SST) for xj.
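A sketch verifying this formula on simulated data (statsmodels assumed; names and numbers are made up) by computing sigma-hat^2, SSTj and R^2j directly and comparing with the reported standard error:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)                        # correlated regressors
    y = 1 + 2 * x1 - 1 * x2 + rng.normal(size=n)

    X = sm.add_constant(np.column_stack([x1, x2]))
    res = sm.OLS(y, X).fit()

    sigma2_hat = res.ssr / (n - 2 - 1)                        # SSR / (n - k - 1)
    sst_1 = np.sum((x1 - x1.mean()) ** 2)                     # total sample variation in x1
    r2_1 = sm.OLS(x1, sm.add_constant(x2)).fit().rsquared     # R^2 from regressing x1 on the other regressor
    var_b1 = sigma2_hat / (sst_1 * (1 - r2_1))
    print(np.sqrt(var_b1), res.bse[1])                        # the two numbers should agree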
The t-test - Significance Level
What probability of a Type I error can we live with? Type I error - reject a true null Usually a small number like 0.1, 0.05, 0.01 etc Generically α Set decision rule, reject H0 when t(Bhatj) is in a certain R where R is rejection region
The linear probability model, which is simply estimated by OLS, allows us to explain
a binary response using regression analysis. The OLS estimates are now interpreted as changes in the probability of "success" (y=1), given a one-unit increase in the corresponding explanatory variable.
When the alternative is two-sided, we are interested in the
absolute value of the t statistic. The rejection rule for Hnot: Bj=0 against H1: Bj≠ 0 is | t(bhatj) | > c
The F statistic for the overall significance of a regression tests the null hypothesis that
all slope parameters are zero, with the intercept unrestricted. Under Hnot , the explanatory variables have no effect on the expected value of y.
t distribution allows us to
test hypotheses involving the Bj. In most applications, our primary interest lies in testing the null hypothesis Hnot: Bj = 0, where j corresponds to any of the k independent variables.
In the presence of heteroskedasticity, t statistics calculated using the usual standard errors (reported in parentheses) are
biased and no longer valid for use in hypothesis testing. Instead, heteroskedasticity-robust standard errors need to be used. These standard errors (reported in brackets) are valid whether heteroskedasticity is present or not.
Under the CLM assumptions, confidence intervals
can be constructed for each Bj. These CIs can be used to test any null hypothesis concerning Bj against a two-sided alternative.
binary variable
categorical variable with only two outcomes
The linear probability model for a binary dependent variable necessarily has a heteroskedastic error term. A simple way to deal with this problem is to
compute heteroskedasticity-robust statistics. Alternatively, if all the fitted values (that is, the estimated probabilities) are strictly between zero and one, weighted least squares can be used to obtain asymptotically efficient estimators.
confidence interval (CI) for the population parameter Bj
confidence intervals are also called interval estimates because they provide a range of likely values for the population parameter, and not just a point estimate. If random samples were obtained over and over again, with Blowerj and Bupperj computed each time, then the (unknown) population value Bj would lie in the interval (Blowerj , Bupperj ) for 95% of the samples. remember that a confidence interval is only as good as the underlying assumptions used to construct it. If we have omitted important factors that are correlated with the explanatory variables, then the coefficient estimates are not reliable: OLS is biased.
the t statistic always has the same sign as the
corresponding OLS coefficient estimate
According to the rejection rule, in order to determine whether to reject the null hypothesis, the t statistic for the particular coefficient you are testing must be compared to the
corresponding critical value.
a dummy variable is
defined to distinguish between two groups, and the coefficient estimate on the dummy variable estimates the ceteris paribus difference between the two groups.
Dummy variables: The Chow test can be used to
detect whether there are any differences across groups. In many cases, it is more interesting to test whether, after allowing for an intercept difference, the slopes for two different groups are the same. A standard F test can be used for this purpose in an unrestricted model that includes interactions between the group dummy and all variables.
For a two-tailed test, c is
the 100(1 - α/2)th percentile of the t distribution; that is, the significance level α is divided in two, with α/2 in each tail.
The R-squared and adjusted R-squared are both
estimates of the population R-squared, which can be written as 1 - σ^2u/σ^2y, where σ^2u is the population error variance and σ^2y is the population variance of the dependent variable. Both of these population variances are unconditional variances and are thus not affected by heteroskedasticity of the error term. Also, because SSR/n is a consistent estimator of σ^2u and SST/n is a consistent estimator of σ^2y regardless of whether the model suffers from heteroskedasticity or not, the R-squared estimate and adjusted R-squared estimate are both consistent estimators of the population R-squared.
There are some applications where MLR.6 is clearly false
ex. Whenever y takes on just a few values it cannot have anything close to a normal distribution
The F statistic is often useful for testing
exclusion of a group of variables when the variables in the group are highly correlated
Type II error
failing to reject a false null hypothesis
In order to perform statistical inference, we need to know more than just the first two moments of Bjhat ; we need to know the
full sampling distribution of the Bjhat . Even under the Gauss-Markov assumptions, the distribution of Bjhat can have virtually any shape.
When data are missing on one or more explanatory variables, one must be careful when computing F statistics "by
hand," that is, using either the sum of squared residuals or R-squareds from the two regressions. Whenever possible it is best to leave the calculations to statistical packages that have built-in commands, which work with or without missing data.
Multiple Linear Restrictions Terminology The restricted model
has SSR(r) and Rsquared(r). Note SSR(r) ≥ SSR(ur) and Rsquared(r) ≤ Rsquared(ur).
Multiple Linear Restrictions Terminology The unrestricted model
has SSR(ur) and Rsquared(ur)
The definition of "sufficiently large," with a 5% significance level, is
the 95th percentile in a t distribution with n - k - 1 degrees of freedom; denote this by c. The rejection rule is that Hnot is rejected in favor of H1 at the 5% significance level if t(Bhatj) > c.
how to interpret regression equations when the dependent variable is discrete
The key is to remember that the coefficients can be interpreted as the effects on the expected value of the dependent variable.
t statistic measures
how many estimated standard deviations Bhatj is away from the hypothesized value of Bj
We use t statistics to test
hypotheses about a single parameter against one- or two-sided alternatives, using one- or two-tailed tests, respectively. The most common null hypothesis is Hnot: Bj= 0 , but we sometimes want to test other values of Bj under Hnot.
a change in the units of measurement of an independent variable changes the OLS coefficient in the expected manner:
if xj is multiplied by c, its coefficient is divided by c. If the dependent variable is multiplied by c, all OLS coefficients are multiplied by c. Neither t nor F statistics are affected by changing the units of measurement of any variables.
Allowing for more than two groups is accomplished by defining a set of dummy variables:
if there are g groups, then g − 1 dummy variables are included in the model. All estimates on the dummy variables are interpreted relative to the base or benchmark group (the group for which no dummy variable is included in the model).
t statistic
indicates the distance of a sample mean from a population mean in terms of the estimated standard error: t(Bhatj) = Bhatj / se(Bhatj)
The LPM does have some drawbacks:
it can produce predicted probabilities that are less than zero or greater than one, it implies a constant marginal effect of each explanatory variable that appears in its original form, and it contains heteroskedasticity. The first two problems are often not serious when we are obtaining estimates of the partial effects of the explanatory variables for the middle ranges of the data. Heteroskedasticity does invalidate the usual OLS standard errors and test statistics, but, as we will see in the next chapter, this is easily fixed in large enough samples.
To use the F statistic, we must know
its sampling distribution under the null in order to choose critical values and rejection rules.
variance is smallest among ________ ____________ estimators.
linear unbiased
Single Dummy Independent Variable: base group
lwage = β0 + δ0*female + β1*educ + u. In this model we have treated males as the base group (against whom the comparisons are made). β0 is the intercept for males; β0 + δ0 is the intercept for females; δ0 is the difference in intercepts.
The F statistic is used to test
multiple exclusion restrictions, and there are two equivalent forms of the test. One is based on the SSRs from the restricted and unrestricted models. A more convenient form is based on the R-squareds from the two models.
Never reject H0 in favour of H1 if t(Bhatj) is
negative (when the alternative is H1: Bj > 0).
Bj measures the
partial effect of xj on (the expected value of) y, after controlling for all other independent variables
classical statistical inference presumes that we state the null and alternative about the _______ before looking at the data.
population
it is important to remember that we are testing hypotheses about the
population parameters. We are not testing hypotheses about the estimates from a particular sample. We are testing whether the unknown population value, B1 , is zero.
The variance of βjhat, Var(βjˆ) , is
simply Var(βjhat) = σ^2 / [SSTj (1 - R^2j)]
The alternative for F testing is two-sided. In the classical approach, we
specify a significance level which, along with the numerator df and the denominator df, determines the critical value. The null hypothesis is rejected when the statistic, F, exceeds the critical value, c. Alternatively, we can compute a p-value to summarize the evidence against Hnot .
standard errors of the estimates are
the square root of Varhat(Bhatj), i.e., se(βhatj) = sqrt( σhat^2 / [SSTj * (1 - R^2j)] )
If Hnot is not rejected, we say that "xj is"
statistically insignificant at the 5% level.
If Hnot is rejected in favor of H1 at the 5% level, we usually say that "xj is
statistically significant, or statistically different from zero, at the 5% level."
you can standardize a normally distributed random variable by
subtracting off its mean, in this case βj, and dividing by its standard deviation. The result is a standard normal random variable, (βjhat - βj)/sd(βjhat), distributed normally with a mean of 0 and a variance of 1.
SSRj
sum of squared residuals from the regression of xj on all the other independent variables.
For the one-sided alternative H1: βj > 0, reject H0 (in favor of H1) at the α level of significance if
t(Bhatj) > critical value at α, where the critical value is chosen so that P(t(Bhatj) > critical value at α | H0) = α.
if the null is stated as Hnot: Bj =aj Where aj is the hypothesized value of BJ, then the appropiate t statistic is
t= (Bhatj-aj) / se (Bhatj)
The general t statistic is usefully written as
t= (estimate- hypothesized value) / standard error We can use the general t statistic to test against one-sided or two-sided alternatives.
hypotheses involving more than one of the population parameters
test a single hypothesis involving more than one of the Bj
two common ways to test for heteroskedasticity:
the Breusch-Pagan test and a special case of the White test. Both of these statistics involve regressing the squared OLS residuals on either the independent variables (BP) or the fitted and squared fitted values (White). A simple F test is asymptotically valid; there are also Lagrange multiplier versions of the tests.
the Breusch-Pagan test can use either an _______ or an _______ to test the null.
the Breusch-Pagan test can use either an F statistic or an LM statistic to test the null. In this case, you can use the following formula to form an LM statistic to test the null hypothesis: LM = n × (R-squared of the u^2hat regression).
Assumptions MLR.1-MLR.6, collectively referred to as
the classical linear model (CLM) assumptions
According to the rejection rule, to determine whether to reject the null hypothesis, the t statistic for the particular coefficient you are testing must be compared to
the corresponding critical value. One-sided alternative H1: βj > 0: if t(βjhat) > c, reject the null; if t(βjhat) < c, fail to reject the null. One-sided alternative H1: βj < 0: if t(βjhat) < -c, reject the null; if t(βjhat) > -c, fail to reject the null.
rhatij
the ith residual from the regression of xj on all the other independent variables.
The Error Variance: σ^2
the larger is σ^2 the more noise => less precision for all slopes
When computing an F statistic, the numerator df is
the number of restrictions being tested, while the denominator df is the degrees of freedom in the unrestricted model.
What is p-value?
The p-value is the significance level of the test when we use the value of the test statistic as the critical value for the test. Because a p-value is a probability, its value is always between zero and one. The p-value nicely summarizes the strength or weakness of the empirical evidence against the null hypothesis: it is the probability of observing a t statistic as extreme as we did if the null hypothesis is true.
The R-squared and adjusted R-squared are both estimates of
the population R-squared, which can be written as 1 - σ^2u/σ^2y
To obtain c, we only need
the significance level and the degrees of freedom.
It can be shown that the F statistic for testing exclusion of a single variable is equal to
the square of the corresponding t statistic.
General multiple linear restrictions can be tested using
the sum of squared residuals form of the F statistic.
T distribution: difference from the normal distribution
The t distribution comes from the fact that the constant σ in sd(Bhatj) has been replaced with the random variable σhat, where j corresponds to any of the k independent variables.
You should remember that a confidence interval is only as good as
the underlying assumptions used to construct it. If we have omitted important factors that are correlated with the explanatory variables, then the coefficient estimates are not reliable: OLS is biased.
the sampling distributions of the OLS estimators depend on
the underlying distribution of the errors.
Hypothesis testing: When a specific alternative is not stated, it is usually considered to be
two-sided
Under MLR.1-MLR.4, OLS parameter estimates βjhat are
unbiased estimators of βj for j = 1, ..., k. This means that E(βjhat) = βj. These assumptions imply nothing about the variance of βjhat relative to other linear estimators.
The Bj are
unknown features of the population, and we will never know them with certainty. Nevertheless, we can hypothesize about the value of Bj and then use statistical inference to test our hypothesis. where j corresponds to any of the k independent variables.
the SSR in the denominator of F is the SSR from the
unrestricted model.
how to test whether a particular variable has no partial effect on the dependent variable:
use the t statistic.
Classical hypothesis testing
we first choose a significance level, which, along with the df and alternative hypothesis, determines the critical value against which we compare the t statistic. It is more informative to compute the p-value for a t test—the smallest significance level for which the null hypothesis is rejected—so that the hypothesis can be tested at any significance level.
F statistic can be used to test
whether a group of variables should be included in a model. The F statistic is intended to detect whether a set of coefficients is different from zero, but it is never the best test for determining whether a single coefficient is different from zero.
The t statistic associated with any OLS coefficient can be used to test
whether the corresponding unknown parameter in the population is equal to any given constant (which is usually, but not always, zero)
beta coefficients
which measure the effects of the independent variables on the dependent variable in standard deviation units. The beta coefficients are obtained from a standard OLS regression after the dependent and independent variables have been transformed into z-scores.
population assumptions of the CLM
y|x ~ Normal(β0 + β1x1 + β2x2 + ... + βkxk, σ^2); so conditional on x, y has a normal distribution with mean linear in x1, ..., xk and a constant variance.
the R-squared is always between
zero and one, whereas the SSRs can be very large depending on the unit of measurement of y, making the calculation based on the SSRs tedious.
SSR/n is a consistent estimator of
σ^2u Population error variance
Variance of OLS Estimators
σhat^2 = [1/(n-k-1)] * Σ_{i=1}^{n} uhat_i^2 = SSR/(n-k-1), where n is the number of observations and k is the number of regressors (apart from the constant).
MLRM6. Var (u|x1,..., xk) is equal to
𝛔^2