Econometrics (Hagrannsóknir)
a binary variable is called a
dummy variable
True
Adjusted R Squared can be negative
b1 meaning
the expected change in the dependent variable for a one-unit increase in the independent variable, holding other factors fixed
upward bias.
if E(B̃1) > B1, then we say that B̃1 has an upward bias
statistically significant
if a regression coefficient is different from zero in a two-sided test, the corresponding variable is said to be...
reject null in F-test
if the unconstrained (unrestricted) equation fits significantly better than the constrained (restricted) one, i.e. the F statistic exceeds the critical value
In the simple linear regression model y = β0+β1x+u E (u|x) = 0 the regression slope
indicates by how many units the conditional mean of y increases, given a one unit increase in x.
a binary/dummy variable
is a variable that only ever takes the values of 0 or 1
linear probability model
is heteroskedastic by definition
The population parameter in the null hypothesis
is not always equal to zero.
rejection rule
is that H0 is rejected in favor of H1 at the 5% significance level if t>c
data frequency
daily, weekly, monthly, quarterly, and annually.
sum of squared residuals (SSR)
the sum of the squared differences between the actual data points (y) and the fitted values on our regression line (y hat)
normal distribution in stata
display normal(z)
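A quick illustrative use (not from the original cards) of Stata's normal() and invnormal() functions; nothing beyond base Stata is assumed:
display normal(1.96)           // P(Z <= 1.96), roughly .975
display 2*(1 - normal(1.96))   // two-sided p-value for |z| = 1.96, roughly .05
display invnormal(0.975)       // two-sided 5% critical value, roughly 1.96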
Severe Imperfect Multicollinearity
a linear relationship between two or more independent variables that is strong enough to significantly affect the estimates of the coefficients.
Linear Regression
means a regression that is linear in the parameters. It may or may not be linear in the variables.
R (bar) ^2 (adjusted R^2)
measures the percentage of the variation of Y around its mean that is explained by the regression equation, adjusted for degrees of freedom.
Skewness
measures the asymmetry of a distribution (taking logs can reduce skewness)
The natural log of X
monotonically increasing transformation of X
a general way to run a higher-order-terms specification test; includes squares and possibly higher-order fitted values in the regression; rejection implies that some combination of higher-order terms and interactions of your X variables would produce a better model
regression specification error test (RESET)
RSS
regression sum of squares: how much variation does your model explain?
standard error of B
se(Bj) = σ̂ / [SSTj(1 − Rj²)]^(1/2)
partial effect of a change in X on Y
the coefficient in a regression represents the...
Probit coefficients are typically estimated using:
the method of maximum likelihood
unbiasedness of the error variance
theorem 2.3
unbiasedness of OLS
theorem 3.1
sampling variances of OLS slope estimators
theorem 3.2
unbiased estimator of the error variance
theorem 3.3
omitted variable bias
when the omitted variable is correlated with the included explanatory variables and is itself a determinant of the dependent variable
Consider a regression model of a variable on itself: y = β0 + β1y + u. What are the OLS estimators of β0 and β1, β̂0 and β̂1?
β̂0 = 0, β̂1 = 1
Explained Sum of Squares
the total variation of the fitted Y values around their average (i.e. the variation that is explained by the regression): ESS = Σi(Ŷi − Ȳ)²
Residual Sum of Squares (RSS)
the unexplained variation of the Y values around the regression line: RSS = Σi(Yi − Ŷi)² = Σi ûi²
RESET test
regression specification error test; tests the null hypothesis that your model is correctly specified; a high test statistic means the model is probably not correct
standard error of ^B1
σ/ √SSTx
variance inflation factor (VIF)
Var(Bj) = (σ² / SSTj) · VIFj
standard error of the regression (SER)
σ̂ = √(σ̂²), the square root of the estimated error variance
Var(~Y)
σ²Y/n
SE(~Y)
σ̂~Y = s/√n, the standard error of ~Y
regression R^2=
ess/tss
The probit model
forces the predicted values to lie between 0 and 1.
β₁
notation for the population slope
alternative hypothesis
H1: β ≠ 0
How to find the OLS estimators ^β₀ and ^β₁
• Solve: ^β₁ = sXY/s²X = Σ(i=1,n)(Xi − ~X)(Yi − ~Y) / Σ(i=1,n)(Xi − ~X)²; ^β₀ = ~Y − ^β₁~X
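As a hedged sketch (not part of the original cards), the formulas above can be checked in Stata on simulated data; the variable names x and y, the seed, and the true parameters are made up for this example:
* simulate data, then compute b1 and b0 by the formulas above
clear
set obs 200
set seed 123
gen x = rnormal(0, 1)
gen y = 2 + 3*x + rnormal(0, 1)
quietly summarize x
scalar xbar = r(mean)
quietly summarize y
scalar ybar = r(mean)
gen dx = x - xbar
gen dy = y - ybar
gen dxdy = dx*dy
gen dx2 = dx^2
quietly summarize dxdy
scalar sxy = r(sum)
quietly summarize dx2
scalar sxx = r(sum)
scalar b1 = sxy/sxx
scalar b0 = ybar - b1*xbar
display "b1 = " b1 "    b0 = " b0
regress y x    // should reproduce b0 and b1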
Steps in regression analysis step 4
4) Collect the data. Inspect and clean the data
semi-elasticity
=100*b1
This is not actually correct
AIC is defined in terms of the log-likelihood. The likelihood is loosely interpreted as the probability of occurrence of an event
in the model ln(y)=B0+B1X1 +u the elasticity of E(Y|X) with respect to X is
B1X
econometric model.
Crime=B0+B1edu+B2exper+B3training+u
conditional mean of Y
E (Y|X) = β₀ + β₁X
errors have zero expected value (not systematically biasing your estimate by a non-zero amount)
Gauss-Markov assumption 4
null hypothesis
Ho: B=0
Infinity
If p= n-1
Multiple Restrictions
More than one restriction of the parameters in an econometric model.
What does B1 represent
The Slope
What does B0 represent
The intercept
t Statistic
The statistic used to test a single hypothesis about the parameters in an econometric model.
Impact of variance
Weights equalize
regress squared residuals on all explanatory variables, their squares, and interactions; detects more general deviations from homoskedasticity than the BP test
White test for heteroscedasticity
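A minimal sketch (variable names y, x1, x2 are assumed) of how the White test might be run in Stata using the built-in version after a regression:
regress y x1 x2
estat imtest, white    // White's general test for heteroskedasticity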
Residual
^ui=Yi-^Yi=Yi-b₀-b₁Xi
classical assumption 3
all explanatory variables are uncorrelated with the error term
OLS estimates
best fit
Y-hat =
b₀ + b₁X
binary variables
can take on only two values
f distribution in stata
does my model matter? display F(df1, df2, f) gives the cumulative F distribution (display Ftail(df1, df2, f) gives the upper-tail p-value); a high F value means it's unlikely that your model doesn't matter (ergo... it matters)
r̂ij are the residuals from a regression of xj on all other explanatory variables; their squares appear in the robust variance formula
heteroskedasticity-robust OLS standard errors
rejection rule
if p-value < alpha, we reject null
MSE
mean square error
ols estimator is derived by
minimizing the sum of squared residuals
correlation coefficient
quantifies correlation/dependence; a measure of the strength and direction of the linear relationship between two variables
What is the difference between u and uhat?
"u represents the deviation of observations from the population regression line, while uhat is the residual: the difference between Wage and its predicted value, Wage-hat."
If we can reject the null hypothesis that β1=0 then we say that
"β1 is statistically significant", or "β1 is positive/negative and statistically significant."
residual sum of squares=
Σ(actual − fitted)²
first moment of standardized variable
0
second moment of standardized variable
1
identification solutions
1) generate a variable that we know can't be correlated with other factors because it is random 2) figure out a way of identifying exogenous variation (variation that is random)
Incorporating nonlinearities in simple regression
1. logarithmic transformations of dependent variable 2. logarithmic transformations of both dependent and independent variable
Steps in regression analysis step 2
2) Specify the model: select independent variables and the functional form
Steps in regression analysis step 3
3) Hypothesize the expected signs of the coefficients
Let X be a normally distributed random variable with mean 100 and standard deviation 20. Find two values, a and b, symmetric about the mean, such that the probability of the random variable being between them is 0.99.
48.5, 151.5
Steps in regression analysis step 6
6) Document Results
A multiple regression includes two regressors: Yi = B0 + B1 X1i + B2 X2i + ui What is the expected change in Y if X1 increases by 8 units and X2 decreases by 9 units?
8B1 - 9B2
one-tailed test
>2.1%
Causal Effect
A ceteris paribus change in one variable that has an effect on another variable
Cross-sectional Data Set
A data set collected by sampling a population at a given point in time
Which of the following statements is true of hypothesis testing?
A restricted model will always have fewer parameters than its unrestricted counterpart
How to pick the best model in portfolio
Adjusted R squared, Akaike's Information Criterion (AIC), Schwarz's Information Criterion, Hannan-Quinn's Information Criterion
less than or equal to R squared
Adjusted R squared is
Rbar^2
Adjusted R² penalizes adding another independent variable, because R² will never decrease when another independent variable is added
Of the following assumptions, which one(s) is (are) necessary to guarantee unbi- asedness of the OLS estimator in a multiple linear regression context? a) Linearity of the model in the parameters. b) Zero conditional mean of the error term. c) Absence of perfect multicollinearity. d) Homoskedasticity of the error term. e) Random sampling.
All except d
Assumption 3
All explanatory variables are uncorrelated with the error term
Two sided test
Alternative hypothesis has values on both sides of the null hypothesis
the slope estimate
B1=Δy/Δx
simplest way to test for heteroskedasticity; regress the squared residuals on our Xs
Breusch-Pagan test
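A hedged sketch of the Breusch-Pagan idea in Stata with made-up variable names; the manual version regresses the squared residuals on the X's, and estat hettest is the built-in counterpart:
regress y x1 x2
predict uhat, residuals
gen uhat2 = uhat^2
regress uhat2 x1 x2          // BP auxiliary regression: squared residuals on the X's
regress y x1 x2
estat hettest, rhs           // built-in Breusch-Pagan/Cook-Weisberg test using the regressors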
Perfect Multicollinearity
Cannot invert (X'X) matrix
Dummy Variables
Capture changes due to qualitative events: seasons, strikes, war/flood/storm
different units (households, firms, industries, states, countries); differences in characteristics across those units result in variation in the disturbance terms
Cross-sectional data is concerned with
Retrospective Data
Data collected based on past, rather than current, information.
Residual (e)
Difference between the actual value of the dependent variable and its estimated value: e = Y − Ŷ
Type 2 error
Do not reject a false null hypothesis
An unbiased estimator with the smallest variance is
EFFICIENT
Property two of variance
For any constants a and b, Var(aX + b) = a^2Var(x)
one-sided alternative hypothesis
H1: Bj>0
Stationary
If both the mean and variance are finite and constant, then the time series is said to be
Rejection Rule
In hypothesis testing,the rule that determines when the null hypothesis is rejected in favor of the alternative hypothesis.
Sum of Squared Residuals (SSR)
In simple and multiple regression analysis, the sum of the squared OLS residuals across all observations.
Goodness of fit 3
Is the data set reasonably large and accurate?
Goodness of fit 1
Is the equation supported by theory?
T-test
Is the appropriate test to use when the stochastic error term is normally distributed and when the variance of that distribution must be estimated
Sampling Distribution
Just as the error term follows a probability distribution, so too do the estimates of β. In fact, each different sample of data typically produces a different estimate of β. The probability distribution of these β̂ values across different samples is called the sampling distribution of β̂.
Simple Regression
Linear regression model with one regressor
Method one: Method of Moments
MOM estimators are constructed by replacing the population moment, such as µ with its unbiased sample counterpart, such as ¯x. For the method of moments estimators, we use the sample counterparts to choose our estimates of β0 and β1.
Variance
Measures how dispersed the data is. How far away x is from the population mean
Correlation
Measures how linear the data is (how related the variables x and y are)
The detection of Multicollinearity
Multicollinearity exists in every equation. • Important question is how much exists. • The severity can change from sample to sample. • There are no generally accepted, true statistical tests for multicollinearity.
Panel/Longitudial data
Multiple individuals observed over multiple time periods. Example: vouchers for low-income families to move to higher-income areas, with those people tracked over time
Does omitting a variable in our regression always cause OMITTED VARIABLE BIAS?
No
Assumption 6
No explanatory variable is a perfect linear function of any other explanatory variable (no perfect multicollinearity)
Exogenous variables
Not correlated with the error term
adding independent variables
R squared can be increased, never decreased, by
overall significance of the regression
F = (R²/k) / [(1 − R²)/(n − k − 1)]
Assumption 1
Regression is linear, correctly specified, and has an additive error term
Type 1 error
Reject a true null hypothesis
Type 1 error
Reject the null hypothesis that is actually true
Chow test
Run separate regressions for each group; the unrestricted SSR is the sum of the SSRs from these two separate regressions; then run a regression for the restricted (pooled) model and compare with an F test
Coefficient of Determination (R²)
R² = ESS/TSS = 1- RSS/TSS
Response variable
See dependent variable
Binary variables
Success or failure x=1 or x=0
Which of the following tests helps in the detection of heteroskedasticity?
The Breusch-Pagan test
True Model
The actual population model relating the dependent variable to the relevant independent variables, plus a disturbance, where the zero conditional mean assumption holds.
Variance inflation factor
The amount by which the variance is inflated due to multicollinearity
Consistency
The first asymptotic property of estimators concerns how far the estimator is likely to be from the parameter it is supposed to be estimating as we let the sample size increase indefinitely
Data Frequency
The frequency at which time series data are collected. Yearly, quarterly, and monthly are the most common data frequencies
OLS Intercept Estimate
The intercept in an OLS estimation.
The residual
The residual for observation i is the difference between the actual yi and its fitted value: ûi = yi − ŷi = yi − β̂0 − β̂1xi. These are not the same as the errors in our population regression function.
Sample Variation in the explanatory variable
The sample outcomes on x are not all the same value
Dependent Variable
The variable to be explained in a multiple regression model
Weighted Least Squares
A method of estimation in which the data, both the Xs and Y, are weighted by a factor depending on the VGP
T/F Although the expected value of a random variable can be positive or negative its variance is always non-negative.
True, follows from the definition of the variance of a random variable.
What does the measure of goodness of fit tell us?
Useful for evaluating the quality of the regression and comparing models that have different data sets or combinations of independent variables
Method two: Least Squares Estimator
Using calculus, we set the derivatives of the objective function (the sum of squared errors) with respect to the model parameters equal to zero
It is binomially distributed
Which of the following assumptions about the error term is not part of the so called Classical Assumptions?
linear in the variables
X vs Y gives a straight line
population simple linear regression model
Y = β₀ + β₁X
The OLS residuals, ûi, are defined as follows:
Yi − Ŷi
Confidence Interval for β₁
[^β₁+Z(α/2) x SE(^β₁), ^β₁+ Z(1-α/2) x SE(^β₁)]
An estimate is:
a nonrandom number.
AIC (Akaike's information criterion)
adjusts RSS for the sample size and K; the lower it is, the better the equation; penalizes irrelevant variables more than Ramsey does
We would like to predict sales from the amount of money insurance companies spent on advertising. Which would be the independent variable?
advertising.
As the sample size increases the variance for B1 ______
decreases
ols residuals, ui, are sample counterparts of the population
errors
the fixed effects regression model
has n different intercepts
univariate model
has u as the error term (aka the unknown that isn't accounted for in the y = b0 +b1x1 model)
Xi
independent variable, or regressor, or explanatory variable, or covariate
β₀
intercept. • It measures the average value of Y when X equals zero (this is often not economically meaningful by itself)
The question of reliability/unreliability of a multiple regression depends on:
internal and external validity.
"Changing the units of measurement that is, measuring test scores in 100s, will do all of the following except for changing the:"
interpretation of the effect that a change in X has on the change in Y
The correlation coefficient
is a measure of linear association.
The population correlation coefficient between two variables X and Y
is a measure of linear association.
The size of the test
is the probability of committing a type I error.
omitted variable bias
if the regressor (the student–teacher ratio) is correlated with a variable that has been omitted from the analysis (the percent of English learners) and that variable partly determines the dependent variable (test scores), then the OLS estimator will have omitted variable bias
one of the primary advantages of using econometrics over typical results from economic theory is
it potentially provides you with quantitative answers for a policy problem rather than simply suggesting the direction of the response
constant elasticity model.
log(yi) = β0 + β1log(xi) + ui; rescaling y by a constant c1 gives log(c1yi) = [log(c1) + β0] + β1log(xi) + ui
semi-elastic
log-level 100 x beta j
elastic
log-log
The Adjusted R^2 is always ___________ than the R^2
lower
maximum likelihood estimation yields the values of the coefficients that
maximize the likelihood function
Σi(aXi + b) equals
n · a · X̄ + n · b
denominator degrees of freedom
n-k-1=df
proxy variable approach
needs a proxy that is correlated with the omitted x and uncorrelated with the error u; the best proxy is often a lagged version of the same variable
M (in the F-test)=
number of constraints
Experimental data
often collected in laboratory environments in the natural sciences, but they are much more difficult to obtain in the social sciences.
Sources of variation in Y other than Xs:
omitted variation, measurement errors, functional form problems, random variation/unpredictability
respondent bias
people misrepresent themselves and give answers they think you want to hear
one way of dealing with omitted variables (but imperfect); think of running them as something more like a specification test, or a test for the influence of possible omitted variable bias (worried about nonrandom sampling)
proxy variables
correlate in stata
pwcorr
increasing the sample size
reduces the variance
regress in stata
regress y x
if the absolute value of your calculated t statistic exceeds the critical value from the standard normal distribution, you can
reject the null hypothesis
p value
smallest significance level at which the null hypothesis would still be rejected; a small value is evidence against the null hypothesis
efficient estimator
smallest variance and unbiased
cross-sectional
snapshot at a point in time
σ/ √SSTx
standard error of ^B1
central limit theorem (asymptotic law)
states that, given a sufficiently large sample size from a population with finite variance, the sampling distribution of the sample mean is approximately normal and centered at the population mean
measures how many estimated standard deviations the estimated coefficient is away from zero
t statistic
average treatment effect
tells us what on average our unit treatment effect is (the average difference between all the treated (Z=1) and control (Z=0) groups)
F-Test
test the naive model against the sophisticated model
structural estimation
testing the model and its assumptions deliberately and explicitly
regression
tests the relationship between variables
A large p value implies
that the observed value Y bar is consistent with the null hypothesis
first order conditions
the OLS first order conditions can be obtained by the method of moments: under the assumptions E(u) = 0 and E(xj·u) = 0 for j = 1, 2, ..., k. The equations are the sample counterparts of these population moments, although we have omitted the division by the sample size n.
We are able to estimate the true β₀ and β₁ more accurately if we use
the OLS method rather than any other method that also gives unbiased linear estimators of the true parameters. This is because the OLS estimators are (BLUE)
In the case of errors-in-variables bias, the precise size and direction of the bias depend on
the correlation between the measured variable and the measurement error.
Heteroskedasticity is a problem with the
error term
R^2 is a measure of
the goodness of fit of a line
Comparing the California test scores to test scores in Massachusetts is appropriate for external validity if
the institutional settings in California and Massachusetts, such as organization in classroom instruction and curriculum, were similar in the two states.
OLS sampling errors
the variance in our estimator lets us infer the accuracy of a given estimate
heteroskedasticity
the variance of the error term is not constant (for example, it increases as X increases), violating Assumption V
1. If the dependent variable is multiplied by a constant,
then the OLS intercept and slope estimates are also multiplied by that constant
normal sampling distributions
theorem 4.1
the slope estimator B1 has a smaller standard error, other things equal, if
there is more variation in the explanatory variable X
measurement error inconsistency
this factor (which involves the error variance of a regression of the true value of x1 on the other explanatory variables) will always be between zero and one; implies we are consistently biased towards zero
rejecting a true hypothesis
type 1 error
serial correlation
violates Classical Assumption IV because observations of the error term ARE correlated (based on past data)
causal effect
when one variable has an effect on another variable
well identified model
when we can argue that an estimator is likely to yield an unbiased estimate of our parameter of interest (one can correctly "identify" what plays a causal role and what does not)
ceteris paribus
which means "other (relevant) factors being equal"
participation bias
the sample is biased because those who choose to participate may differ systematically from those who do not
Non-Linear function of the independent variables examples
• Yi = β₀+β₁X²i+ui • Yi = β₀+β₁1/Xi+ui • Yi = β₀+β₁log(Xi)+ui
Distribution of OLS estimators
• ^β₁~ N(β₁, Var(^β₁)) • ^β₀~ N(β₀, Var(^β₀))
Median
"central tendency" of data Half of the data is above and the other half of the data is below
F=
((RSSm-RSS)/M)/(RSS/(N-K-1))
For n = 25, sample mean =645, and sample standard deviation s = 55, construct a 99% confidence interval for the population mean.
(614.23, 675.77)
adjusted R^2=
1 − (RSS/(n − k − 1)) / (TSS/(n − 1))
t-statistic
(^β₁-β₁)/SE(^β₁)
State whether the following statements about heteroskedasticity are true or false. If false - give reasons why. (a) Heteroskedasticity occurs when the disturbance term in a regression model is correlated with one of the explanatory variables.
(a) False - Heteroskedasticity occurs when the variance of the disturbance term is not the same for all observations.
total sum of squares=
Σ(actual − average)²
(t/f) When R2 = 1 F=0 and when R2=0 F=infinite.
False; it is the other way around: when R² = 1, F is infinite, and when R² = 0, F = 0 (this follows from the F formula).
Adjusted R²
1-((n-1)/(n-k-1))((RSS)/(TSS))
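An equivalent form in terms of R² is R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1); a quick numerical check in Stata (the numbers here are made up for illustration):
scalar R2 = 0.40
scalar n  = 100
scalar k  = 4
display 1 - (1 - R2)*(n - 1)/(n - k - 1)   // adjusted R^2, roughly 0.3747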
Properties of R²
1. 0 ≤ R² ≤ 1 2. For the simple regression, R² = ρ²YX
Non linear functions that model economic phenomena
1. Diminishing marginal returns 2. Estimating the price elasticity of demand
FOUR Assumptions required for unbiasedness of OLS
1. Linear in parameters 2. Random Sampling 3. Sample variation in independent variable 4. Zero Conditional mean
Randomized Controlled Experiment (SRTE)
1. Sampling 2. Randomization, random assignment 3. treatment 4. Estimation
Steps for finding covariance
1. find x_bar and y_bar 2. find dx and dy: (x - x_bar), (y - y_bar) 3. find (dx*dy) 4. sum 5. divide by n-1
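A small sketch of the same steps in Stata (variable names x and y are placeholders), with correlate, covariance as a built-in check:
quietly summarize x
gen dx = x - r(mean)
quietly summarize y
gen dy = y - r(mean)
gen dxdy = dx*dy
quietly summarize dxdy
display "sample cov(x,y) = " r(sum)/(r(N) - 1)
correlate x y, covariance     // built-in covariance matrix for comparison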
OLS will have omitted variable bias when two conditions hold
1. when the omitted variable is correlated with the included regressor 2. when the omitted variable is a determinant of the dependent variable another way to say it 1. X is correlated with the omitted variable 2. The omitted variable is a determinant of the dependent variable Y
sample covariance
[1/(n−1)] · Σ(xi − x̄)(yi − ȳ)
New assumption
The p independent variables are linearly independent: you can't write one independent variable as a linear combination of the other p − 1 variables
Heteroskedasticity
Patterned error terms
Steps for hypothesis test
1) State the null and alternative hypotheses. 2) Calculate the value of the test statistic (z or t, depending). 3) Draw a picture of what Ha looks like and find the p-value. 4) State the conclusion in a sentence.
Null hypothesis (H0)
Statement of the values that the researcher does not expect
Alternative hypothesis (HA)
Statement of the values that the researcher expects
linear in parameters
assumption SLR.1
they are not independent and each will have a different distribution conditional on the other
if knowing something about random variable B tells you something about variable A then...
reject
if | t-statistic |>critical value, we ___________ the null hypothesis
log just the independent variable
impact of independent variable increases at a decreasing rate
how do you control for the person specific and time specific effects in panel regression data
include entity-specific (person) and time-specific dummy variables in the regression
may increase sampling variance
including irrelevant variables...
irrelevant variable added
increases variances, decreases t-scores, and usually decreases adjusted R² but not regular R²
log just the dependent variable
an increase in the independent variable causes the dependent variable to increase at an increasing rate
whether two random variables are related (or not)
independence and conditionality are how we discuss...
Because of random sampling, the observations (Xi, Yi) are
independent of each other and identically distributed (they are i.i.d.)
In the simple linear regression model, the regression slope
indicates by how many units Y increases, given a one unit increase in X.
a small p value
indicates evidence against the null hypothesis
estat ovtest
is my model omitting variables?
if the absolute value of the calulated t statistic exceeds the critical value from the standard normal distribution you can
reject the null hypothesis
To obtain the slope estimator using the least squares principle, you divide the
sample covariance of X and Y by the sample variance of X.
In the simple regression model y = β0 + β1x + u, to obtain the slope estimator using the least squares principle, you divide the
sample covariance of x and y by the sample variance of x.
standard error of Bˆ1
se(β̂1) = σ/√SSTx = σ/[Σ(x − x̄)²]^(1/2)
consistency of OLS (estimator is consistent for a population parameter)
theorem 5.1
asymptotic normality of OLS
theorem 5.2
The interpretation of the slope coefficient in the model Yi = β0 + β1 ln(Xi) + ui is as follows:
a 1% change in X is associated with a change in Y of 0.01·β1.
Stochastic error term
A term that is added to a regression equation to introduce all of the variation in Y that cannot be explained by the included Xs
F statistic
is used to test joint hypotheses about regression coefficients, e.g. the two restrictions β1 = 0 and β2 = 0; if the t statistics are uncorrelated, the F statistic is the average of the squared t statistics: F = 0.5(t1² + t2²)
the assumption of constant variance of the error term
is violated by the linear probability model
the r squared of the restricted model
is zero by definition when testing the overall significance of the regression (the restricted model contains only an intercept)
VALID INSTRUMENT: Z
use the instrumental variable Z to isolate the part of the variation in X that is uncorrelated with u
Fixed effect
it captures anything that is specific to an individual but does not change over time
Testing the hypothesis that β1 = 0 is somewhat special, because
it is essentially a test of whether or not Xi has any effect on Yi.
if people are not honest in a survey
it may lead to a non-random measurement error
regression of Y on an indicator variable for treatment X which takes on the value 1 when treatment occurred and 0 otherwise; 1 if person is a woman, 0 if person is a man
dummy variable
investigator bias
easy to influence the outcome via subjective lens
under assumptions MLR.1 - MLR.5, OLS is best and unbiased; we want an unbiased estimator with the smallest variance (it doesn't have to be OLS)
efficiency of OLS
When testing joint hypotheses, you can use
either the F-statistic or the chi-squared statistic.
In a random sample:
All the individuals or units from the population have the same probability of being chosen.
Hannan Quinn Information Criterion
An alternative to AIC; used infrequently; look for the model with the smallest H-Q value
Degrees of freedom
Excess of the number of observations (N) over the number of coefficients estimated (K+1)
F-statistic
F = ((RSSC-RSSU)/m)/(RSSU/(n-k-1)) ~ F(m,n-k-1) where m = # of linear restrictions and k = number of regressors in the unconstrained model
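In practice the restricted and unrestricted fits rarely have to be compared by hand; a hedged Stata sketch (variable names assumed) of testing m = 2 linear restrictions:
regress y x1 x2 x3           // unrestricted model
test x2 x3                   // H0: the coefficients on x2 and x3 are both zero
* equivalently, fit the restricted model (regress y x1) and plug the two SSRs into the formula above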
Denote SSRr the sum of squared residuals of a restricted model and SSRur the sum of squared residuals of an unrestricted model, n the sample size and k the number of regressors of the unrestricted model. All the following are correct formulae for the F-statistic for testing q restrictions with the exception of
F = ((SSRur−SSRr)/q)/(SSRr/(n-k-1))
Confidence interval
Set of values at which we fail to reject the null
Upward Bias
The expected value of an estimator is greater than the population parameter value.
Consider a regression model where the R2 is equal to 0.2359 with n = 46 and k = 5. You are testing the overall significance of the regression using the F -Test. What is the p-value of the test?
The p-value is approximately 5% (but a little bit smaller than that).
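A quick check of that answer in Stata using the R-squared form of the overall F statistic (numbers taken from the card above):
scalar R2 = 0.2359
scalar n  = 46
scalar k  = 5
scalar Fstat = (R2/k) / ((1 - R2)/(n - k - 1))
display "F = " Fstat "   p-value = " Ftail(k, n - k - 1, Fstat)   // roughly 2.47 and just under .05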
Intercept Parameter (Constant)
The parameter in a simple and multiple linear regression model that gives the expected value of the dependent variable when all the independent variables equal zero.
The effect of a one unit increase in X on Y is measured by
The slope of a line relating Y to X. It is β₁in the equation Y=β₀+β₁X • β₁= ΔY/ΔX
p-Value
The smallest significance level at which the null hypothesis can be rejected. Equivalently, the largest significance level at which the null hypothesis cannot be rejected.
σ^2
population variance
Park test
take the residuals from the estimated regression, form the new dependent variable from the log of the squared residuals, and run a regression against the log of the suspected proportionality factor Z; if the t-test on that coefficient is significant, it is evidence of heteroskedasticity
In the case of errors-in-variables bias
the OLS estimator is consistent if the variance in the unobservable variable is relatively large compared to the variance in the measurement error.
BLUE
the OLS estimator with the smallest variance in the class of linear unbiased estimators of the parameters
In a randomized controlled experiment
there is a control group and a treatment group.
beta 1 hat has a smaller standard error, ceteris paribus, if
there is more variation in the explanatory variable, x
if the significance level is made smaller and smaller
there will be a point where H0 cannot be rejected
"If omitted variable bias exists, what is the effect, if any, on our regression estimates (the numbers Stata gives us for the Betas/coefficients)?"
they are biased and inconsistent
discrete variable
variable with countable outcomes within an interval, e.g. rolling a die
Endogenous Variable
variables correlated with the error term
1) often mitigates the influence of outlier observations 2) can help to secure normality and homoscedasticity
why take logs?
1) one-sided tests imply a certain amount of certainty about one's model 2) one-sided tests set a lower bar for significance
why use two sided tests?
In the multiple regression model, the adjusted R², R̄²,
will never be greater than the regression R2
Let ŷi be the fitted values. The OLS residuals, ûi, are defined as follows:
yi − ŷi
Properties of the Adjusted R²
• Adjusted R² ≤ R² • Adding a regressor to the model has two opposite effects on the adjusted R²: 1. the SSR falls, which increases the adjusted R² 2. the factor (n-1)/(n-k-1) increases, which decreases the adjusted R² (Note: Whether the adjusted R² increases or decreases depends on which of the two effects is stronger) • The R² is always positive, but the adjusted R² can sometimes be negative • The adjusted R² can be used to compare two regressions with the same dependent variable but different number of regressors
The bias of ^β₁ is
• Cov(Xi,ui)/Var(Xi) • does not depend on the sample size
Goodness of fit 4
Is OLS best estimator
Ordinary Least squares
Is a regression estimation technique that calculates the B^ so as to minimize the sum of the squared residuals
Slope Dummy
A quantitative variable and dummy variable interaction is typically called a
Type I Error
A rejection of the null hypothesis when it is true.
Continuous random variable
A variable that takes on any particular real value with zero probability. Example: the speed of a car; there is no way to tell exactly what speed the car will be going at a particular moment
Dummy Variable, or binary variable,
A variable that takes on only the values 0 or 1
Which of the following is true of the standard error the OLS slope estimator, s.e. β ?
It is an estimate of the standard deviation of the OLS slope estimator.
Properties of variance
It is desirable for sampling distribution to be as narrow (as precise) as possible
Perfect Multicollinearity
It is the case where the variation in one explanatory variable can be completely explained by movements in another explanatory variable.
total sample variation in explanatory variable xj (converges to n * var(xj)); more sample variation leads to more precise estimates so just increase sample size (good variation)
SST (variance)
How to get a log out of an equation
Exponentiate both sides: if log(y) = 10.6, then y = e^10.6
Slope Parameter
The coefficient on an independent variable in a simple and multiple regression model
Elasticity
The percentage change in one variable given a 1% ceteris paribus increase in another variable.
no multicollinearity (model is estimating what it should be)
assumption MLR.3
Fitted value
We can define a fitted value for y when x=xi as ^yi=^B0+^B1xi This is the value we predict for y when x=xi for the estimated intercept and slope
Consider a regression model where the R2 is equal to 0.257 with n = 33 and k = 3. You are testing the overall significance of the regression using the F-Test. Which of the statements below is correct?
We can reject the null hypothesis at the 5% level of significance but not at the 1% level of significance.
Take an observed (that is, estimated) 95% confidence interval for a parameter in a multiple linear regression model. Then:
We cannot assign a probability to the event that the true parameter value lies inside that interval.
Y =
average Y for given X + Error
constant elasticity model
both variables are in logarithmic form
Irrelevant Variable
a variable in an equation that doesn't belong there; it decreases R(bar)^2
correlation between X and Y
can be calculated by dividing the covariance between X and Y by the product of the two standard deviations
internal validity of the model
concerns about identification within an econometric model
causes of heteroskedasticity
1) misspecification (omitted variables) 2) outliers 3) skewness 4) incorrect data transformation 5) incorrect functional form for the model 6) improved data collection procedures
IV procedure
1) regress X on Z 2) take predicted X values 3) regress y on predicted Xs (if Z only influences Y through X, then we have identified variation in X that is NOT jointly determined)
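A minimal Stata sketch of those three steps with made-up variable names (y, x, z); in applied work ivregress is preferred because it also computes the correct standard errors:
regress x z                  // step 1: first stage
predict xhat, xb             // step 2: predicted X values
regress y xhat               // step 3: regress y on the predicted X's
ivregress 2sls y (x = z)     // built-in 2SLS for comparison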
What is captured by the error term?
1. Omitted or left out variables 2. measurement error in data 3. different functional form than the chosen regression 4. unpredictable events or purely random variation
What conditions are necessary for an omitted variable to cause OVB?
1. The omitted variable has an effect on Y (i.e. β₂is different from O). 2. The omitted variable is correlated with one of the regressors (in our example, Ai is correlated with Xi)
Omitted Variable Bias (OVB) occurs when
Assumption 1 of OLS is violated i.e. when E (ui|Xi=xi) depends on the value of xi
False
Autocorrelation occurs primarily with cross-sectional data
Law of large numbers
We cannot survey the entire population (too expensive, takes too long); the larger the sample size, the closer the sample average gets to the actual population value
Type II Error
Failing to reject the null hypothesis when it is, in fact, false
(t/f) The adjusted and unadjusted R2s are identical only when the unadjusted R2 is equal to 1.
True (for a regression with at least one regressor): adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), which equals R² only when R² = 1.
zero conditional mean assumption
When we combine mean independence, E(u|x) = E(u), with the assumption E(u) = 0, we get E(u|x) = 0
Var(aX + bY ) is
a²σ²X + 2abσXY + b²σ²Y
Nonexperimental data
not accumulated through controlled experiments on individuals, firms, or segments of the economy (also called observational data, or retrospective data, to emphasize the fact that the researcher is a passive collector of the data)
Let σ²X be the population variance and X̄ be the sample mean. Var(X̄) is:
equal to σ²X divided by the sample size.
example of time series data
studying inflation in the US from 1970 to 2006
SSE
sum of squared errors: how much of reality does your model not explain?
SSxy
sum of the cross-products
SSE
sum of the squared errors
significance level
the probability of rejecting H0 when it is in fact true.
causality
the relationship between cause and effect
In testing multiple exclusion restrictions in the multiple regression model under the Classical Linear Model assumptions, we are more likely to reject the null that some coefficients are zero if:
the residuals sum of squares of the restricted model is large relative to that of the unrestricted model.
ρ²YX
the squared sample correlation coefficient between Y and X
Sample mean
unbiased estimator for population mean
U(Residual error)
y − ŷ; the difference between the actual value and its estimated value
Non-linear in the parameters
• Yi = β₀ + β₁²Xi • Yi = β₀ + β₀β₁Xi
Linear function of the independent variables:
• Yi = β₀+β₁Xi+ui
what kind of p-value to reject the null?
SMALL
SST formula
SSE + SSR = SST
the explained sum of squares (SSE)
SSE = Σ(Ŷi − Ȳ)²
the residual sum of squares (SSR)
SSR=∑u^2
Predictor Variable
See explanatory variable
Control Variable
See explanatory variables
Economic Significance
See practical significance
Residual Sum of Squares (SSR)
See sum of squared residuals
t Ratio
See t statistic
homoskedasticity
Variance of disturbance term is constant
Heteroskedasticity
Variance of disturbance term is not constant
3. The goodness of fit model, as measured by the R-squared
does not depend on the units of measurement of our variables
Low R2
doesn't mean OLS is useless; it is still possible that our OLS is a good estimate
if MLR3 is violated with a perfect linear function
drop one of the independent variables
A binary variable is often called a:
dummy variable
How do we interpret B2 in words Yi = B0 + B1X1i + B2X2i + ui
"A 1 unit increase in X2 is associated with a B2 increase in Y on average, holding all else constant"
"Suppose that a researcher, using wage data on 235 randomly selected male workers and 263 female workers, estimates the OLS regression Wage = 11.769 + 1.993 × Male, Rsq= 0.04, SER= 3.9, with the respective standard errors for the constant and the Male coefficient of (0.2162) and (0.3384). Wage is measured in dollars per hour and Male is a binary variable that is equal to 1 if the person is a male and 0 if the person is a female. What does an Rsq of 0.04 mean ?"
"It means using our explanatory variable, we are only able to explain 4% of the variation in the dependant variable"
SSE/SST
% explained variation (equation)
R-squared form of the F statistic.
F = [(Rur² − Rr²)/q] / [(1 − Rur²)/dfur]
F statistic
F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
Consider the following regression line: testscore = 698.9 - 2.28 × STR. You are told that the t-statistic on the slope coefficient is 4.38. What is the standard error of the slope coefficient?
.52
A box has 20 screws, three of which are known to be defective. What is the probability that the first two screws taken out of the box are both defective?
0.0158
Find the probability that a standard normal random variable has a value greater than -1.56.
0.9406
Steps in regression analysis step 1
1) Review literature and develop theoretical model
partialling out
1) regress the explanatory variable X1 on all other explanatory variables and save the residuals 2) regress Y on the other explanatory variables and save the residuals 3) regress the residuals of Y on the residuals of X1; the slope equals the multiple-regression coefficient on X1
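A hedged Stata sketch of the partialling-out steps (variable names assumed); the final slope matches the coefficient on x1 from the full regression:
regress x1 x2 x3
predict r_x1, residuals
regress y x2 x3
predict r_y, residuals
regress r_y r_x1             // slope equals the coefficient on x1 in the regression below
regress y x1 x2 x3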
Steps in regression analysis step 5
5) Estimate and evaluate the equation
Confidence Interval for a Single Coefficient in Multiple Regression
95% CI: (Bj − 1.96·SE(Bj), Bj + 1.96·SE(Bj))
-5
A fitted regression equation is given by Ŷ = 20 + 0.75X. What is the value of the residual at the point X=100, Y=90?
One-Tailed Test
A hypothesis test against a one-sided alternative.
Zero Conditional Mean Assumption
A key assumption used in regression analysis that states that, given any values of the explanatory variables, the expected value of the error equals zero.
Multiple Linear Regression Model
A model linear in its parameters, where the dependent variable is a function of independent variables plus an error term.
Empirical Analysis
A study that uses data in a formal econometric analysis to test a theory, estimate a relationship, or determine the effectiveness of a policy.
Best Linear Unbiased Estimator (BLUE)
Among all linear unbiased estimators, the one with the smallest variance. OLS is BLUE, conditional on the sample values of the explanatory variables, under the Gauss-Markov assumptions.
Classical Error Term
An error term satisfying Assumptions I through V (of the classical assumptions) (Called classical normal error term if assumption VII is added)
(a) What is meant by an unbiased estimator?
An estimator is unbiased if the mean of its sampling distribution equals the true parameter. The mean of the sampling distribution is the expected value of the estimator. Thus lack of bias means that E(θ̂) = θ, where θ̂ is the estimator of the true parameter θ. This means that in repeated random sampling we get, on average, the correct estimate.
Experiment
Any procedure that can, in theory, be infinitely repeated and has a well-defined set of outcomes
Goodness of fit 6
Are obviously important variables included?
micronumerosity
Arthur Goldberger defines as the "problem of small sample size."
Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. You first regress Y on X1 only and find no relationship. However, when regressing Y on X1 and X2, the slope coefficient β1 changes by a large amount. This suggests that your first regression suffers from:
omitted variable bias.
Specific form
B0+B1R1+B2M1+E
95% confidence interval for B0 is interval
B0-1.96SE (B0), B0+1.96SE(B0)
OLS regression line / sample regression function (SRF): ŷ = β̂0 + β̂1x1 + ... + β̂kxk
β̂0 = OLS intercept estimate; β̂1, ..., β̂k = OLS slope estimates
"A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin. Let Yi denote the number of points scored on the exam by the ith student (0 ≤ Yi ≤ 100), let Xi denote the amount of time that the student has to complete the exam (Xi = 90 or 120), and consider the regression model Yi = β0 + β1Xi + ui, E(ui) = 0. Reminder of the Least Squares Assumptions: 1. The error term ui has conditional mean zero given Xi: E(ui|Xi) = 0; 2. (Xi, Yi), i = 1, ..., n, are independent and identically distributed (i.i.d.) draws from their joint distribution; 3. Large outliers are unlikely: Xi and Yi have nonzero finite fourth moments. Assuming this year's class is a typical representation of the same class in other years, are OLS assumptions (2) and (3) satisfied?"
Both OLS assumption #2 and OLS assumption #3 are satisfied
Formula for correlation
Cov(x,y) / (Sx·Sy), where S denotes the standard deviation
Consider a regression with two variables, in which X1i is the variable of interest and X2i is the control variable. Conditional mean independence requires
E(ui |X1i, X2i) = E(ui |X2i)
Zero Conditional Mean assumption
E(u|x)=0
population regression function (PRF): E(y|x) = β0 + β1x
E(y|x) is a linear function of x. The linearity means that a one-unit increase in x changes the expected value of y by the amount β1. For any given value of x, the distribution of y is centered about E(y|x).
Goodness of fit 8
Does the regression appear to be free of major econometric problems?
sampling is random (not biasing your estimate through selection)
Gauss-Markov assumption 2
Instrumental variables
IV regression uses these additional variables to isolate the movements of X that are uncorrelated with u, which in turn permits consistent estimation of the regression coefficients
statistically insignificant
If H0 is not rejected, we say that "xj is statistically insignificant at the 5% level."
multicollinearity
High (but not perfect) correlation between two or more independent variables
Two lines of hypothesis test
Ho=null hypothesis (statement being tested) Ha=the statement we hope or suspect is true instead
Kurtosis
How "thick" are tails of the data How many observations are in the tail of the data
R^2
How much variation in y is explained by a variation in x
Quadratic Functions
How to capture diminishing returns: y = β0 + β1x + β2x². To approximately determine the marginal effect of x on y we use Δy ≈ (β1 + 2β2x)Δx
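A short sketch of that calculation in Stata (variable names assumed): fit the quadratic with an explicit squared term, evaluate the marginal effect at a chosen x, and find the turning point -b1/(2*b2):
gen xsq = x^2
regress y x xsq
display "marginal effect at x = 10: " _b[x] + 2*_b[xsq]*10
display "turning point x* = " -_b[x]/(2*_b[xsq])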
Goodness of fit 5
How well do the coefficients correspond to the expectations developed by the researcher before the data were collected?
Goodness of fit 2
How well does the estimated equation fit the data?
The estimated effect of gender gap is statistically significant at the: I. 5% level II. 1% level III. 0.01% level
I,II,III
In the estimated model log(q) = 502.57 − 0.9log(p) + 0.6log(ps) + 0.3log(y), where p is the price and q is the demanded quantity of a certain good, ps is the price of a substitute good and y is disposable income, what is the interpretation of the coefficient on p? (Assume that the Gauss-Markov assumptions hold in the population model.)
If the price increases by 1%, the demanded quantity is estimated to be 0.9% lower, ceteris paribus
In the estimated model log(q)=502.57−0.9log(p)+0.6log(ps)+0.3log(y),where p is the price and q is the demanded quantity of a certain good, ps is the price of a substitute good and y is disposable income, what is the meaning of the coefficient on p? (Assume that the Gauss-Markov assumptions hold in the theoretical model)
If the price increases by 1%, the demanded quantity will be 0.9% lower on average, ceteris paribus.
heteroskedasticity
If the variance of the error term depends on x, the error term exhibits heteroskedasticity (non-constant error variance)
Standard Error of the Regression (SER)
In a simple and multiple regression analysis, the estimate of the standard deviation of the population error, obtained as the square root of the sum of squared residuals over the degrees of freedom
R-squared
In a simple or multiple regression model, the proportion of the total sample variation in the dependent variable that is explained by the independent variable.
Denominator Degrees of Freedom
In an F test, the degrees of freedom in the unrestricted model.
Numerator Degrees of Freedom
In an F test,the number of restrictions being tested.
Null Hypothesis
In classical hypothesis testing, we take this hypothesis as true and require the data to provide substantial evidence against it.
are typically random
In distributed lag models, both X and Y
Degrees of Freedom (df)
In multiple regression analysis, the number of observations minus the number of estimated parameters.
Perfect Collinearity
In multiple regression, one independent variable is an exact linear function of one or more other independent variables.
Explanatory Variable (Independent Variable)
In regression analysis, a variable that is used to explain variation in the dependent variable
Intercept
In the equation of a line, the value of the y variable when the x variable is zero.
is measured at different points in time and it is the relationship between the measurements that is the issue
In time series data, the same unit
For an instrument Z to be valid, it must satisfy two conditions:
Instrument relevance: Corr(Zi, Xi) ≠ 0; Instrument exogeneity: Corr(Zi, ui) = 0
decision rule
Is a method of deciding whether to reject a null hypothesis
Critical values
Is a value that divides the acceptance region from the rejection region when testing a null hypothesis
Why does the dependent variable appear in logarithmic form?
It allows us to approximate a constant percentage change in the dependent variable, y, due to a change in the independent variable,x
In the estimated model log(q)=502.57−0.9log(p)+0.6log(ps)+0.3log(y),where p is the price and q is the demanded quantity of a certain good, ps is the price of a substitute good and y is disposable income, what is the meaning of the coefficient on ps?
It is the cross-price elasticity of demand in relation to the substitute good and it bears the expected sign.
In the estimated model log(q) = 502.57 − 0.9log(p) + 0.6log(ps) + 0.3log(y), where p is the price and q is the demanded quantity of a certain good, ps is the price of a substitute good and y is disposable income, what is the interpretation of the coefficient on ps?
It is the estimate of the cross-price elasticity of demand in relation to the substitute good and it bears the expected sign.
VIF = 1
No correlation, no variance inflation No multicollinearity
No Multicollinearity
No linear relationship at all; the regressors are completely orthogonal; not really typical of economic data
not an issue, but it will rarely occur with economic data
No multicollinearity is
Consider the model grade = β0 + β1study + β2leisure + β3sleep + β4work + u, where each regressor is the number of hours per week a student spends in each one of the named activities. The dependent variable is the student's final grade for BEE1023 Introduction to Econometrics. What assumption is necessarily violated if the weekly endowment of time (168 hours) is entirely spent either studying, or sleeping, or working, or in leisure activities?
No perfect multicollinearity
null vs alternative hypothesis
Null Hypothesis is the statement which we are testing and the alternative hypothesis is the statement which must be true if the null is false. null is result *NOT* expected and alternative is result expected.
Expected Values of OLS estimators
Our objective is to show the OLS estimators provide unbiased estimates for true population parameters We need to show that E(^B0) = B0 E(^B1) = B1
p-value
P-value for a t-value is the probability of observing a t-value of that size or larger in absolute value if the null is true
P value meanings
P ≤ α: reject H0; P > α: fail to reject H0
False (multicollinearity can involve three or more variables jointly even when no pairwise correlation is high)
Pairwise correlations will always successfully reveal whether multicollinearity is present in your estimating samples
pooled cross section
Pooling cross sections from different years is often an effective way of analyzing the effects of a new government policy. The idea is to collect data from the years before and after a key policy change
µY
Population mean of Y
Econometrics
Quantitative measurement and analysis of actual economic and business phenomena
The regression sum of squares (SSR) as a proportion of the total sum of squares (SST)
R squared measures
increase as more variables are added
R squared will only
from a regression of explanatory variable on all other independent variables (including a constant) (converges to a fixed number)
R-squared (variance)
Type 1 error
REJECT when you were supposed to accept (false positive)
Regression through the Origin
Regression analysis where the intercept is set to zero; the slopes are obtained by minimizing the sum of squared residuals, as usual (i.e. regression with no intercept)
Ordinary least squares
Regression estimate technique that calculates the beta that minimizes sum of squared residuals
Statistically Significant
Rejecting the null hypothesis that a parameter is equal to zero against the specified alternative, at the chosen significance level.
Type I Error
Rejecting the null hypothesis when it is, in fact, true
Exclusion Restrictions
Restrictions which state that certain variables are excluded from the model (or have zero population coefficients).
One of your friends is using data on individuals to study the determinants of smoking at your university. She is particularly concerned with estimating marginal effects on the probability of smoking at the extremes. She asks you whether she should use a probit, logit, or linear probability model. What advice do you give her?
She should use the logit or probit, but not the linear probability model.
Unbiasedness of OLS
Show that OLS estimators are unbiased
Unbiasedness of ^B0
Show that the expected value of ^B0 is B0 by plugging in mean
true
T or F: t-statistic is a number, not a random variable
(T/F) If X and Y are independent, the conditional probability density function fX|Y(X|Y) is equal to the marginal probability density function fX(X)
TRUE
(T/F) Consider the model, Consumption = β₀ + β₁Wage + ε. The sample regression function estimated with OLS gives you the average (or expected) value of Consumption for each value of Wage.
TRUE.
Units of Measurement
The OLS estimates change in expected ways when the units of measurement of the dependent and independent variables change
Sum of Squared residuals
The OLS estimator chooses ^B0 and ^B1 to make the SSR as small as possible
If repeated samples of the same size are taken, on average their value will be equal to B1
The OLS estimator for Beta 1 is unbiased means
Under the assumption of the Gauss-Markov Theorem, in the simple linear regres- sion model, the OLS estimator is BLUE. This means what?
The OLS estimator is the estimator that has the smallest variance in the class of linear unbiased estimators of the parameters.
False
The Variance Inflation Factor (VIF) is one (1) if there is a large degree of multicollinearity in a multiple regression model.
Which of the following Gauss-Markov assumptions is violated by the linear probability model?
The assumption of constant variance of the error term.
still hold for the general case of multiple regression: linearity, unbiasedness, minimum variance
The desirable small sample properties
Partial Effect
The effect of an explanatory variable on the dependent variable, holding other factors in the regression model fixed.
Linear in Parameters
The equation is linear in parameters B0 and B1. There are no restrictions on how y and x relate to the original explained and explanatory variables of interest
OLS Regression Line
The equation relating the predicted value of the dependent variable to the independent variables, where the parameter estimates have been obtained by OLS.
Assumption 5
The error term has a constant variance (no Heteroskedasticity)
Assumption 2
The error term has a zero population mean
Ordinary Least Squares
The goal of OLS is to closely "fit" a function with the data. It does so by minimizing the sum of squared errors from the data.
Which of the following is true of the OLS t statistics?
The heteroskedasticity-robust t statistics are justified only if the sample size is large.
Alternative Hypothesis
The hypothesis against which the null hypothesis is tested.
Which of the following is not correct in a regression model containing an interaction term between two independent variables, x1 and x2:
The interaction term coefficient is the effect of a unit increase in √x1x2
Omitted variable bias means the first least squares assumption, E(u|X) = 0, does not hold. The OLS estimator is then biased, the bias does not vanish even in a very large sample, and the OLS estimator is inconsistent.
The larger ρXu is, the larger the bias (p. 182).
Ordinary Least Squares (OLS)
The most common way of estimating the parameters β₀ and β₁
Classical Linear Model (CLM)
The multiple linear regression model under the full set of classical linear model assumptions.
The natural logarithm
The nonlinear function that plays the most important role in econometric analysis: y = log(x). The relationship between y and x displays diminishing marginal returns. 100·Δlog(x) ≈ %Δx
standard error of the regression (SER)
The positive square root of σ̂², denoted σ̂; it is an estimator of the standard deviation of the error term.
Practical Significance
The practical or economic importance of an estimate, which is measured by its sign and magnitude, as opposed to its statistical significance.
Suppose that the linear probability model yields a predicted value of Y that is equal to 1.3. Explain why this is nonsensical.
The predicted value of Y must be between 0 and 1.
Significance Level
The probability of a Type I error in hypothesis testing.
Joint probability
The probability of two variables occurring (Probability of someone buying a ticket and being a businessman)
Misspecification Analysis
The process of determining likely biases that can arise from omitted variable, measurement error, simultaneity, and other kinds of model misspecification.
Regression Analysis
The study of the relationship between one variable (dependent variable) and one or more other variables (independent, or explanatory, variables).
Which of the following statements is true?
The upper bound of the confidence interval for a population parameter, say β, is given by β̂ + critical value × standard error(β̂).
Error Term (Disturbance)
The variable in a simple or multiple regression equation that contains unobserved factors which affect the dependent variables. The error term may also include measurement errors in the observed dependent or independent variables.
"If there is multicollinearity among independent variables, then a variable that appears significant may not indeed be so" Is this statement valid?
This statement is erroneous, just the opposite is true. Multicollinearity increases the standard errors and lowers t-statistics. A lower t-statistic is likely to make a variable insignificant rather than significant.
True
The three information criteria give basically the same answers for most problems
normality assumption
To make the sampling distributions of the Bˆj tractable, we now assume that the unobserved error is normally distributed in the population. The population error u is independent of the explanatory variables x1, x2, ..., xk and is normally distributed with zero mean and variance σ^2: u~Normal(0,σ^2).
True/False: Multiple linear regression is used to model annual income (y) using number of years of education (x1) and number of years employed in current job (x2). It is possible that the regression coefficient of x2 is positive in a simple linear regression but negative in a multiple regression.
True
(T/F) The sample average of the OLS residuals is zero.
True. (Σ(i=1,n)^ui)/n = ~Y − ~^Y = ~Y − (^β₀ + ^β₁~X) = 0, because ^β₀ = ~Y − ^β₁~X.
T/F If two variables are independent their correlation coefficient will always be zero
True. If two variables X and Y are independent E(XY)=E(X)E(Y).
T/F The coefficient of correlation will have the same sign as that of the covariance between the two variables.
True. The covariance can be positive or negative but the standard deviation always takes a positive value. Then from the formula to compute the correlation coefficient it must have the same sign as that of the covariance between two variables.
5. For the estimated regression model ^Y = b1 + b2X + b3X², we will find the minimum value of ^Y over all values of X occurring at X* = −b2/(2b3) if b3 turns out to be a positive number.
True. When b3 is positive the quadratic is U-shaped, resulting in a minimum.
Multicollinearity
Two or more predictor variables in the multiple regression are highly correlated. One can be linearly predicted from the other
are still BLUE and consistent; the Gauss-Markov Theorem still holds
Under the expanded classical assumptions, the OLS estimators
Interpreting R^2 and adjusted R^2 WHAT THEY DO TELL YOU
What they tell you: whether the regressors are good at predicting or explaining the values of the dependent variable in the sample of data on hand.
OLS regression line
^Y = ^B0 + ^B1X
example of a quadratic regression model is
Y=B0+B1X+B2X^2+u
Theoretical regression equation
Y=B0+B1X1+E
exogenous explanatory variables
Zero Conditional Mean- The error u has an expected value of zero given any values of the independent variables. In other words, E(u|x1, x2, ..., xk)=0.
95% Confidence Interval for β₁
[^β₁- 1.96 x SE(^β₁), ^β₁+1.96 x SE(^β₁)] means we are 95% confident that this interval covers the true β₁
interaction term
a term in a regression where two independent variables are multiplied together and run as their own variable (if we believe that one variable affects our model in different ways for different values of the other variable)
zero
a variable is irrelevant to a model if its coefficient's true value is __________
Random variable
a variable that takes on numerical value and has an outcome that is determined by an experiment
confidence interval=
beta +- T*SE
Lagged Dependent Variable
better at capturing dynamics "Today Influenced by yesterday"
Omitted variable bias
bias on the coefficients; violate Assumption III because omitted variable is correlated with explanatory variables (which determine the error term)
attenuation bias
bias towards zero that results from classical measurement error; increases risk of type 2 errors (false negatives not false positives)
omitted variable
bias, violate classical assumption 3
Dummy variable
binary metric
specifying an equation
choosing the correct independent variables, choosing the correct functional form, choosing the correct form of the stochastic term
law of large numbers (asymptotic law)
as a sample size grows, its mean gets closer to the average of the whole population
Attrition bias
assignment was random but people drop out of the study. If the characteristics of those who drop out are systematically different from the characteristics of the remainder of the treatment group, then we have attrition bias
random sampling
assumption MLR.2
let W be the included exogenous variables in a regression function that also has endogenous regressor X the W variables can
be a control variable, make an instrument uncorrelated with u, have the property E(u|W)=0
they themselves are random variables
because estimators are the product of random variables...
the following problems could be analyzed using probit and logit estimation with the exception of whether or not
being a female has an effect on earnings
the regression residuals are our estimated errors
best fit
heteroskedasticity
non-constant variance of the errors. Tests/fixes: White test, Breusch-Pagan test — both ask whether the squared residuals can be predicted (the White test uses the fitted values, B-P uses the independent variables); a low p-value suggests heteroskedasticity. Stata: estat hettest; use robust or clustered standard errors.
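A minimal sketch of a Breusch-Pagan check in Python (simulated data; numpy and statsmodels assumed available) — a small p-value is evidence against homoskedasticity:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(0)
    x = rng.uniform(1, 10, 500)
    u = rng.normal(0, x)                          # error spread grows with x: heteroskedastic by construction
    y = 2 + 0.5 * x + u
    X = sm.add_constant(x)
    res = sm.OLS(y, X).fit()
    lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
    print(lm_pval)                                # small p-value -> reject homoskedasticity
    print(sm.OLS(y, X).fit(cov_type="HC1").bse)   # heteroskedasticity-robust standard errors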
log functions, when to use
consumer side, indifference curve, constant elasticity
2.57, 1.96, 1.64
critical values for when there is a large sample size (>120) for 1% sig level, 5% sig level, 10% sig level
The most frequently used experimental or observational data in econometrics are of the following type:
cross-sectional data
cdf
cumulative distribution function: the integral of the pdf; how likely are you to be at or below a given value (and, by differencing, in a range)? always non-decreasing
first order serial correlation
current value of the error terms is a function of the last value of the error term
SSE/SST or 1- SSR/SST
equation for R^2
sum(y hat-y bar)^2
equation for SSE
sum(yi-y hat)^2
equation for SSR, another equation is: sum(ui hat)^2
To infer the political tendencies of the students at your college/university, you sample 150 of them. Only one of the following is a simple random sample: You
have your statistical package generate 150 random numbers in the range from 1 to the total number of students in your academic institution, and then choose the corresponding names in the student telephone directory.
Level of Significance
indicates the probability of observing an estimated t-value greater than the critical t-value if the null hypothesis were correct.
OLS standard errors will be biased; will alter your standard errors in ways that will make you think you've found signals that are stronger or weaker than they actually are
heteroskedasticity implies...
standard errors will be biased; arises when one or more of our X variables is correlated with the variance of our errors (will alter your standard errors in ways that will make you think you've found signals that are stronger or weaker than they actually are)
heteroskedasticity implies...
rij are the squared residuals from a regression of xj on all other explanatory variables
heteroskedasticity-robust OLS standard errors
filing cabinet bias
hiding when you're wrong so that the anomaly study looks like the accurate one. non-results are just as important as results
multicollinearity
high correlation among some of the independent variables
the sampling variance because there is more noise (bad variation)
high error variance increases...
OVB is problematic, because
if it occurs, it means that our OLS estimator ^β₁ will be biased and inconsistent, so ^β₁ will not correctly measure the effect of changing Xi on Yi.
polynomial form
if the slope of relationship depends on the level of the variable
Weights
inverses of the variances
the population parameter in the null hypothesis
is not always equal to zero
A type II error
is the error you make when not rejecting the null hypothesis when it is false.
Gauss-Markov assumptions
justifies the use of the OLS method rather than using a variety of competing estimators. 1. (linear in parameters) y=B0+B1x+u 2. (random Sampling) -We have a random sample of size n, {(xi,yi): i= 1, 2, ..., n}, following the population model 3. (Sample Variation in the explanatory Variable) -The sample outcomes on x, namely, {xi, i=1, ..., n}, are not all the same value 4. (zero Conditional Mean) E(u|x)=0 5.(homoskedasticity) Var(u|x)=σ^2
fourth moment of standardized variable
kurtosis
Orthogonal
lack of linear relationship between data
The OLS estimator is derived by
minimizing the sum of squared residuals.
pooled data
multiple cross-sections at different periods of time
Cross sectional data set
multiple individuals/entities at same time
The Classical Assumptions
must be met in order for OLS estimators to be the best available. I. The regression model is linear, is correctly specified, and has an additive error term. II. The error term has a zero population mean. III. All explanatory variables are uncorrelated with the error term. IV. Observations of the error term are uncorrelated with each other (no serial correlation). V. The error term has a constant variance (no heteroskedasticity). VI. No explanatory variable is a perfect linear function of any other explanatory variable(s) (no perfect multicollinearity). VII. The error term is normally distributed (this assumption is optional but usually is invoked).
time series data
one person multiple periods
probability density function (pdf): the derivative of the cdf; for a continuous random variable, integrating it over an interval gives the probability that the value of the variable lies within that interval
discrete distribution
probability of different outcomes for a variable that can take one of a finite number of outcomes along a discrete scale
semi-log functions, when to use
producer side, diminishing returns, utility curve, constant semi-elasticity
Correlation Coefficient of Perfect Multicollinearity
r= 1.0
to decide whether the slope coefficient indicates a large effect of X on Y, you look at the
size of the slope coefficient
The cumulative probability distribution shows the probability:
that a random variable is less than or equal to a particular value.
cumulative probability distribution shows probability
that random variable is less than or equal to a particular value
A large p-value implies
that the observed value act is consistent with the null hypothesis.
a large p value implies
that the observed value y with bar act is consistent with null hypothesis
You have to worry about perfect multicollinearity in the multiple regression model because
the OLS estimator cannot be computed in this situation.
When there are omitted variables in the regression, which are determinants of the dependent variable, then
the OLS estimator is biased if the omitted variable is correlated with the included variable.
Variance
the expected squared distance of X from its expected value
Multiple Regression model
this model permits estimating the effect on Y of changing one variable (X1) while holding the other regressors (X2, X3, ...) constant. Test score example: isolate the effect on test scores (Y) of the student-teacher ratio (X1) while holding constant the percentage of students in the district who are English learners (X2)
unemployment level
time specific
in the time fixed effects regression model you should exclude one of the binary variables for the time periods when an intercept is present in the equation
to avoid perfect multicollinearity
represents total variation in dependent variable
total sum of squares
TSS
total sum of squares how much story is there to explain?
goodness-of-fit
total sum of squares (SST): SST = ∑(y − ȳ)²; explained sum of squares (SSE): SSE = ∑(ŷ − ȳ)²; residual sum of squares or sum of squared residuals (SSR): SSR = ∑û². SSE/SST + SSR/SST = 1, so R² = SSE/SST = 1 − SSR/SST. Equivalently, R² = [∑(yᵢ − ȳ)(ŷᵢ − ȳ)]² / [∑(yᵢ − ȳ)²·∑(ŷᵢ − ȳ)²]
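A minimal Python sketch (simulated data, numpy assumed) verifying the decomposition SST = SSE + SSR and the two expressions for R²:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    y = 1 + 2 * x + rng.normal(size=200)
    b1, b0 = np.polyfit(x, y, 1)             # OLS slope and intercept
    yhat = b0 + b1 * x
    sst = ((y - y.mean()) ** 2).sum()
    sse = ((yhat - y.mean()) ** 2).sum()
    ssr = ((y - yhat) ** 2).sum()
    print(np.isclose(sst, sse + ssr))        # SST = SSE + SSR
    print(sse / sst, 1 - ssr / sst)          # the two expressions for R^2 agree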
ideal randomized controlled experiments in economics are
useful because they give a definition of a causal effect
why use logarithms
useful when measuring relative or percentage change
discrete heteroskedasticity
two distributions, one with larger variance even though both are centered around 0 (heights of mice and basketball players)
one is an exact linear combination of the other
two variables are multicollinear when...
error term or disturbance
u
measures how far each number is from the mean
variance
variance of the error term
variance
Mean Squared Error
variance plus bias^2; lower is better
pure serial correlation
violates Assumption IV; correctly specified equation
What do we use regression analysis for?
• To estimate the mean or average value of the dependent variable, given the values of the independent variables. • To test a hypothesis implied by economic theory • To predict, or forecast, the mean value of the dependent variable given the independent variables.
Total Sum of Squares (TSS)
• the total variation of the actual Y values around their sample average: • TSS = Σ(i)(Yi-~Y)²
A multiple regression includes two regressors: Yi = B0 + B1 X1i + B2 X2i + ui What is the expected change in Y if X2 decreases by 10 units and X1 is unchanged?
−10·B2
One sided test
Have values on only one side of the null hypothesis
bias towards zero that results from classical measurement error; increases risk of type 2 errors (false negatives not false positives)
attenuation bias
the intercept in the multiple regression model
determines the height of the regression line
degrees of freedom
df=n-(k+1) = (number of observations)- (number of estimated parameters).
When there are ∞ degrees of freedom, the t∞ distribution
equals the standard normal distribution.
ui
error. • It captures that there are likely other variables (that we are not explicitly considering) that affect Y in addition to X.
non-normality of error term
errors aren't normally distributed fixes: change model get more data transform variables report it and walk away
time series
following one entity or variable over a period of time
log(wage)=2.48+.094log(education) Interpret
since both variables are in logs, the coefficient is an elasticity: a 1% increase in education is associated with approximately a 0.094% increase in wage
non-independence of chronological errors
predictable errors. Durbin-Watson test: compares errors to the previous errors to test for independence. Two-tailed test: a statistic close to 2 suggests no first-order serial correlation; values near the edges (0 or 4) suggest the errors are serially correlated. Stata: regress y x, then estat dwatson
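A minimal Python sketch of the Durbin-Watson check (simulated AR(1) errors; numpy and statsmodels assumed) — values near 2 suggest no first-order serial correlation, values near 0 or 4 suggest positive or negative serial correlation:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(2)
    n = 300
    x = rng.normal(size=n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = 0.8 * u[t - 1] + rng.normal()   # positively autocorrelated errors
    y = 1 + 0.5 * x + u
    res = sm.OLS(y, sm.add_constant(x)).fit()
    print(durbin_watson(res.resid))            # well below 2 for this series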
in a quasi experiment
randomness is introduced by variations in individual circumstances that make it appear as if the treatment is randomly assigned
B. True/False: Multiple linear regression is used to model annual income (y) using number of years of education (x1) and number of years employed in current job (x2). It is possible that R2 is equal to 0.30 in a simple linear regression model using only x1 and equal to 0.26 in a multiple regression model using both x1 and x2.
False
Population Regression Function (PRF)
See conditional expectation
Explained Variable
See dependent variable
Predicted Variable
See dependent variable
Underspecifying the Model
See excluding a relevant variable.
F-test for overall significance=
(ESS/K)/(RSS/(N-K-1))
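A minimal sketch of the arithmetic in Python; the ESS, RSS, N and K values below are made up for illustration:

    # F-test for overall significance: F = (ESS/K) / (RSS/(N-K-1))
    ESS, RSS = 240.0, 160.0     # hypothetical explained and residual sums of squares
    N, K = 100, 3               # observations and number of slope coefficients
    F = (ESS / K) / (RSS / (N - K - 1))
    print(F)                    # compare with the critical value of F(K, N-K-1)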
Consequences of Multicollinearity
1. Estimates will remain unbiased. 2. The variances and standard errors of the estimates will increase. 3. The computed t-scores will fall. 4. Estimates will become sensitive to changes in specification. 5. The overall fit of the equation and estimation of the coefficients of nonmulticollinear variables will be largely unaffected.
In a multiple linear regression where the Gauss-Markov assumptions hold, why can you interpret each coefficient as a ceteris paribus effect?
Because the Ordinary Least Squares (OLS) estimator of the coefficient on variable xj is based on the covariance between the dependent variable and the variable xj after the effects of other regressors has been removed.
Confidence interval formula
CI = X_bar +- z (st dev/sqrt(n))
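A minimal sketch of this formula in Python (the numbers match the worked confidence-interval example later in these notes: n = 121, mean 96, σ = 14):

    import math

    xbar, sigma, n, z = 96, 14, 121, 1.96
    half_width = z * sigma / math.sqrt(n)          # 1.96 * 14 / 11 ≈ 2.49
    print(xbar - half_width, xbar + half_width)    # approximately (93.51, 98.49)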
T-Test
Econometricians generally use the t-test to test hypotheses about individual regression slope coefficients.
we would calculate an estimate that is not efficient in the class of linear, unbiased estimators since BLU has a smaller variance while also being linear and unbiased
If we persist in using the conventional OLS formula and ignore heteroskedasticity,
What does it mean when we say that OLS is unbiased?
In expectation, the difference between the OLS estimator and the true population parameter is equal to zero.
Restricted Model
In hypothesis testing, the model obtained after imposing all of the restrictions required under the null.
Unrestricted Model
In hypothesis testing, the model that has no restrictions placed on its parameters.
Critical Value
In hypothesis testing, the value against which a test statistic is compared to determine whether or not the null hypothesis is rejected.
Properties of the Mean
Its expected value equals the true mean of the variable being estimated, i.e. the sample mean is an unbiased estimator
Classical Assumption
Must be met in order for OLS estimators to be the best available
probability of an event A or B(Pr(A or B)) to occur equals
Pr(A) + Pr(B), if A and B are mutually exclusive
Specialized Variables used to represent
Present/Absent Yes/No Male/Female
Estimator
Rule that assigns each possible outcome of the sample a value
total sum of squares (SST)
SST=∑(Y-Y ̄)^2
Standard deviations formula within correlation
Sqrt(sum(dx^2)/n-1)
log(1+x) Interpretation
log(1 + x) ≈ x for small x; the quality of the approximation decreases as the value of x gets larger
There may be approximation errors in the calculation of the least squares estimates
The regression model includes a random disturbance term for a variety of reasons. Which of the following is NOT one of them?
Error Variance
The variance of the error term in a multiple regression model.
Heteroskedasticity
The variance of the error term, given the explanatory variables, is not constant.
When the sample size n is large, the 90% confidence interval for is
~Y ± 1.64·SE(~Y).
r²
coefficient of determination
elasticity
in the log-log model= b1
Standard Deviation of Y
sY=√(Σ(i)(Yi-~Y)²/(n-1))
semi elasticity model
semi log form suggests...
β₁
slope. • It measures the change in Y resulting from a unit change in X
SSy
sum of squares Y
E(u|x) = E(u)
u is mean independent of x.
Multiple linear regression model
y=B0+B1x1+B2x2+B3x3+...+Bkxk+u -B0 is the intercept. -B1 is the parameter associated with x1. -B2 is the parameter associated with x2, and so on.
to provide quantitative answers to policy questions
you should examine empirical evidence
In the multiple regression model the standard error of regression is given by
√( (1/(n−k−1)) ∑(i=1,n) û²ᵢ )
Difference between non-linear and linear
Change in y depends on the starting point of x
Assumption 4
Observations of the error term are uncorrelated with each other
Reduced importance
Observations with large variance
if F> Fc
then x1 and x2 are jointly significant and reject the null
unbiasedness of OLS
theorem 2.1
we would be using a biased estimator
If we use the homoskedasticity-only variance estimator when there is heteroskedasticity
multicollinearity
Occurs when the independent variables (or error observations) are highly correlated among themselves.
Heteroskedasticity
Occurs when the variance of the residuals is not the same across all observations in the sample
Explained Sum of Squares (SSE)
The total sample variation of the fitted values in a simple or multiple regression model.
The overall regression F-statistic tests the null hypothesis that
all slope coefficients are zero
(c) How can multicollinearity be detected?
(c) The classic case of multicollinearity occurs when none of the explanatory variables is statistically significant even though R2 may be high. High simple correlation coefficients among explanatory variables are sometimes used as a measure of multicollinearity. Another indication is when estimates are very sensitive to changes in specification.
Desirable properties of ~Y
1. unbiased 2. consistent
multicollinearity
2 or more x's are highly correlated. Symptoms: high R², high F, low t-statistics. Check: correl x1 x2 x3 — any pairwise correlation above |0.3| is worth looking at. Solutions: more data; acknowledge it; include only one of the collinear variables (but have a good reason)
b) How is bias defined?
Bias is defined as the difference between the expected value of the estimator and the true parameter. That is, bias = E(theta^hat) - (theta)
p-value
2ф(-|t-stat|) • the probability of drawing a value of ~Y that differs from µ₀ as much as its value from the sample. The smaller the p-value is, the more statistical evidence there is against H₀
If you choose a higher level of significance, a regression coefficient is more likely to be significant.
3. True. With an increase in the level of significance it becomes more likely that the t-statistics will be significant.
Assume that you assign the following subjective probabilities for your final grade in your econometrics course (the standard GPA scale of 4 = A to 0 = F applies): Grade Probability A 0.50 B 0.50 C 0 D 0 F 0
3.5 (= 0.50 × 4 + 0.50 × 3)
A bag has five pearls in it, out of which one is artificial. If three pearls are taken out at random, what is the probability that the artificial pearl is one of them?
3/5 (the artificial pearl appears in C(4,2) = 6 of the C(5,3) = 10 equally likely selections)
s²
= (1/(n−1)) ∑(i=1,n) (Yi − ~Y)² • sample variance used to estimate the population variance. It is unbiased and consistent
Residual
Difference between actual value and estimated value
Variances of the OLS estimators
It is important to know how far away we can expect ^B1 to be away from B1 on average -This allows us to choose the best estimator -The measure of spread in the distribution of ^B1 that is easiest to work with is the variance
residuals will follow a sine-wave type pattern
Positive autocorrelation
Assumption 7
The error term is normally distributed
R^2
is the fraction of the sample variance of Y explained by the regressors: R² = ESS/TSS = 1 − SSR/TSS. In multiple regression, R² increases whenever a regressor is added, unless the estimated coefficient on the added regressor is exactly 0. An increase in R² does not mean that adding a variable actually improves the fit of the model; it is an inflated estimate
t distribution
a large t value (roughly ≥ 2 in absolute value) means we reject the null that the coefficient is zero — i.e. the variable likely matters
error term=
true Y - expected Y
Suppose we have the linear regression model y = β0 + β1x1 + β2x2 + u, and we would like to test the hypothesis H0 : β1 = β2. Denote β1 and β2 the OLS estimators of β1 and β2. Which of the following statistics can be used to test H0?
(^β₁ − ^β₂) / √( Var(^β₁) − 2Cov(^β₁, ^β₂) + Var(^β₂) )
Parameters of the regression model
β₀ and β₁
Variance of ^B1
σ²/∑(xi − x̄)²; invalid in the presence of heteroskedasticity
OLS 3 assumptions
• E(ui | Xi) = 0. In words, the error term ui has a conditional mean of zero given Xi. • (Xi, Yi), i = 1, 2, ..., n are i.i.d. That is, we have a random sample. • E(X⁴i) < ∞ and E(Y⁴i) < ∞, i.e. Xi and Yi have finite fourth moments. In practice, this means that large outliers (values of Xi and Yi that are far outside the range of the data) are unlikely.
Under the OLS assumptions, we have the following results for ^β₁(same results hold for ^β₀):
• E(^β₁) = β₁. In words, ^β₁ is an unbiased estimator of β₁. • As the sample size n increases, ^β₁ gets closer and closer to β₁, i.e. ^β₁ is a consistent estimator of β₁. • If n is large, the distribution of ^β₁ is well approximated by a normal distribution.
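A minimal Monte Carlo sketch in Python (simulated data, numpy assumed) illustrating these properties: the sampling distribution of ^β₁ is centered on β₁ and tightens as n grows:

    import numpy as np

    def ols_slope(n, beta1=2.0, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        x = rng.normal(size=n)
        y = 1.0 + beta1 * x + rng.normal(size=n)
        return np.polyfit(x, y, 1)[0]          # OLS slope estimate

    rng = np.random.default_rng(3)
    for n in (30, 300, 3000):
        draws = np.array([ols_slope(n, rng=rng) for _ in range(1000)])
        print(n, draws.mean().round(3), draws.std().round(3))   # mean near 2; spread shrinks with n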
Briefly explain what is meant by Multicollinearity
• A problem that arises when some explanatory variables are highly correlated. • One way to spot and deal with it: use economic theory to tell you what is most important to put in the regression.
The 1st OLS assumption E(ui | Xi)=0 means
• That ui and Xi are unrelated and that the expected value of omitted variables is 0 for any Xi. • Also under assumption 1, the predicted value ^Yi, is an estimate of the expected value of Yi given Xi. • Is not a strong assumption at all, it is just a normalization
SSR/SST (= 1 − R²)
% unexplained variation (equation)
Two ways in which a regression function can be linear
1. Linear in the variables 2. Linear in the parameters
multiple hypotheses test
A test of multiple restrictions
if added independent variables have little or no explanatory power
Adjusted R squared can decline
Time Series Data
Data collected over time on one or more variables.
The conditional expectation of Y given X, E[Y |X = x], is calculated as follows:
∑(i=1,k) yi·Pr(Y = yi | X = x).
D. True/False: To perform simple linear regression the explanatory variable must follow a normal distribution.
False
Does regression prove causality?
No, it only tests the strength and direction of the quantitative relationship involved
Standard deviation
Square root of the variance
General form
T=f(R,M)+E
R-Squared Form of the F Statistic
The F statistic for testing exclusion restrictions expressed in terms of the R-squareds from the restricted and unrestricted models.
Omitted Variable Bias
The bias that arises in the OLS estimators when a relevant variable is omitted from the regression.
Homoscedastic errors
The variance of each ui is constant for all i
CLT
Theorem that states the average from a random sample for any population, when standardized, has an asymptotic standard normal distribution
autocorrelation
Time series data usually exhibit
False
Under autocorrelation, the conventionally calculated regression variance estimator, s², is unbiased since this has nothing to do with the disturbance term.
we believe that our identified coefficient can be generalized to outside our sample
a model is externally valid if...
an estimator is consistent for a population parameter if
consistency
t statistic
(estimator − hypothesized value) / standard error of the estimator
Effects of a disturbance or shock linger in time, but then die out
musical instrument
SSx
sums of squares X
The linear probability model is:
the application of the linear multiple regression model to a binary dependent variable.
GDP growth
time specific
"Suppose you are interested in studying the relationship between education and wage. More specifically, suppose that you believe the relationship to be captured by the following linear regression model, Wage = Beta0 + Beta1 Education + u Suppose further that you estimate the unknown population linear regression model by OLS. What is the difference between Beta1 and Beta1hat ?"
"Beta1 is a true population parameter, the slope of the population regression line, while Beta1hat is the OLS estimator of Beta1
semilog form
"increasing at a decreasing rate" form...; perfect for percentage terms (ln X...change in Y related to 1 percent increase in X...ln Y...percent change in Y related to a one-unit increase in X)
correlation coefficient
"r" measures strength and direction of linear relationship between two variables r=1 perfect positively correlated r= 0 variables uncorrelated
algebraic properties of OLS statistics
(1) The sum, and hence the sample average, of the OLS residuals is zero: ∑ûᵢ = 0. (2) The sample covariance between the regressors and the OLS residuals is zero: ∑xᵢûᵢ = 0. (3) The point (x̄, ȳ) is always on the OLS regression line, where yᵢ = ŷᵢ + ûᵢ.
Let R² be the R-squared of a regression (that includes a constant), SST be the total sum of squares of the dependent variable, SSR be the residual sum of squares and df be the degrees of freedom. The estimator of the error variance, σ̂² = SSR/df, can be re-written as:
(1−R^2)SST /df
Var(^β₀)
(∑(i) X²i · ^σ²u) / ( n·∑(i)(Xi − ~X)² )
15. Consider a regression model of a variable on itself: y = β0 + β1y + u If you were to estimate this regression model by OLS, what would be the value of R-squared?
1
components of OLS variation
1) error variance (σ2); bad variation 2) total sample variation in Xs (Var X); good variation 3) linear relationships among the independent variables
Standard Normal Distribution
A normal distribution with a mean of 0 and a standard deviation of 1.
Define a time series data set...
A time series data set consists of observations on a variable or several variables over time. Examples of time series data include stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, and automobile sales figures
The dummy variable trap is an example of:
A. perfect multicollinearity.
Exogenous Explanatory Variable
An explanatory variable that is uncorrelated with the error term.
best linear unbiased estimator (BLUE)
B ̃=∑wy
(T/F) The Adjusted R-squared (~R^2) is always greater than or equal to the R-squared (R^2)
FALSE. Because [(n-1)/(n-k-1)] is always greater or equal to 1, ~R^2 <= R^2
Type 2 error
Failing to reject the null hypothesis when it is false
Dependent variable
Function of movements in a set of other variables
Gauss-Markov Theorem
Given assumptions 1-6, the OLS estimator of Bk is the minimum variance estimator among all linear unbiased estimators, for k = 0, 1, 2, ..., K
p-value
Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected? P(|T|>|t|)
Estimated regression coefficients
Guesses of the true regression coefficients and are obtained from data from a sample of the Ys and Xs
Level of significance
Indicates the probability of observing an estimated t value greater than the critical value if the null hypothesis were correct; it measures the probability of a Type I error
R^2
Is the ratio of the explained sum of squares to the total sum of squares
Type 2 error
NOT REJECT when you were supposed to REJECT (false negative)
the much larger family GLS
OLS is a special case of
from a regression of explanatory variable xj on all other independent variables (including a constant); converges to a fixed number
R-squared (variance)
Goodness of fit
R-squared is the coefficient of determination. It is the ratio of the explained variation compared to the total variation, and is the fraction of the sample variation in y that is explained by x. It is equal to the square of the sample correlation coefficient between yi and ^yi R^2=SSE/SST = 1 - SSR/SST
Coefficient of Determination, R^2
R2 is the ratio of the explained sum of squares to the total sum of squares: It is used to measure the fit of the regression r^2 of 1 is a perfect fit r^2 of 0 shows a failure of the estimated regression Explained sum of squares/ Total sum of squares A major problem with R2 is that adding another independent variable to a particular equation can never decrease R2, because it cannot change TSS.
Adjusted R^2
R^2 adjusted for degrees of freedom
linear in parameters
SLR.1
Overspecifing the Model
See inclusion of an irrelevant variable.
Properties of standard error
Square root of variance
Residual Sum of Squares
The Residual Sum of Squares is the amount of variation that is left unexplained by the regression line, that is, the sum of the squared differences between the predicted and observed values.
Ordinary Least Squares
The estimates given by ^B0 and ^B1 are the OLS estimates of B0 and B1. Unbiasedness and consistency can be derived.
Gauss-Markov Assumptions
The set of assumptions under which OLS is BLUE
The error term is homoskedastic if
Var(u|x) is constant.
Estimated regression equation
Y^=1034+6.38Xi
ols residuals Ui are defined as follows
Yi- Y^i
Alternate Hypothesis
a statement of the values that the researcher expects
the model is properly specified and linear
assumption MLR.1
As variability of xi increases, the variance of ^b1 ____
decreases
Yi
dependent variable
any difference in the predicted dependent variable and the actual dependent variable is due to
factors subsumed in the model error term
exponential functions, when to use
growth
a large p value
is in favour of the null
log-log
log(y) = b0 + b1·log(x); for every 1% increase in x, y increases/decreases by b1 percent
instrument relevance
means that some of the variance in the regressor is related to variation in the instrument
coefficient of variation
measure of spread that describes the amount of variability relative to the mean unitless, so you can use it to compare the spread of different data sets instead of the standard deviation
β₀
notation for population intercept
central limit theorem
states conditions under which a variable involving the sum of Y1, ..., Yn i.i.d. variables becomes the standard normal distribution
The central limit theorem
states conditions under which a variable involving the sum of Y1,..., Yn i.i.d. variables becomes the standard normal distribution.
simple linear regression
statistical procedure used to determine a model (equation of a straight line) which "best" fits all points on the scatter plot for bivariate data
to standardize a variable
subtract its mean and divide by standard deviation
In the log-log model, the slope coefficient indicates
the elasticity of Y with respect to X.
The proof that OLS is BLUE (Gauss-Markov Theorem) requires all of the following assumptions with the exception of:
the errors are normally distributed.
The proof that OLS is BLUE (Gauss-Markov Theorem) requires all of the following assumptions with the exception of: a) the errors are homoskedastic. b) the errors are normally distributed. c) E(u|x) = 0. d) there is variation in the regressors.
the errors are normally distributed.
the t statistic is calculated by dividing
the estimator minus its hypothesized value by the standard error of the estimator
The t-statistic is calculated by dividing
the estimator minus its hypothesized value by the standard error of the estimator.
conditional mean independence assumption
the explanatory variable must not contain information about the mean of ANY unobserved factors (i.e., about the error term u)
regression through the origin
the line passes through the point x = 0, ỹ = 0. To obtain the slope estimate, we still rely on the method of ordinary least squares, which in this case minimizes the sum of squared residuals from ỹ = B̃1x
unbiasedness
the mean of the sampling distribution is equal to the true population parameter
if there is heteroskedasticity
the ols is not the most efficient estimator and the standard errors are not valid for inference
slope parameters
the parameters other than the intercept even though this is not always literally what they are where neither b1 nor b2 is itself a slope, but together they determine the slope of the relationship between consumption and income
the fitted value of Y is also called
the predicted value, ^Y
the significance level of a test
the probability of rejecting the null hypothesis when it is true
consistency
the probability that the estimate is close to the true population value can be made high by increasing the sample size
probability of an outcome
the proportion of times that the outcome occurs in the long run
demand equation
the quantity demanded of each commodity depends on the price of the goods, the price of substitute and complementary goods, the consumer's income, and the individual's characteristics that affect taste.
A statistical analysis is internally valid if
the statistical inferences about causal effects are valid for the population studied
regression
the statistical technique of modeling the relationship between variables
2. If the independent variable is multiplied or divided by some nonzero constant,
then the OLS slope coefficient is multiplied or divided by that constant
variances of OLS estimators
theorem 2.2
states that OLS estimator is the Best Linear Unbiased Estimator (under assumptions MLR.1-MLR.5)
theorem 3.4 (Gauss Markov theorem)
t distribution for standardized estimators (allows us to form a general null hypothesis)
theorem 4.2
how to choose variables
theory, t-test, adjusted R^2, and bias
us unemployment rate % from 65-95
time series
spurious regression
a regression between trending or nonstationary time series that produces a high but misleading (inaccurate) R²
rejecting a true null hypothesis
type 1 error
Non-Orthogonal
typical of economic data which are not collected by experiments
With a biased estimator, confidence intervals and hypothesis tests are
typically invalid
"A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin. Let Yi denote the number of points scored on the exam by the ith student ( 0 less than or equl to Yi less than or equal to 100), let Xi denote the amount of time that the student has to complete the exam (Xi = 90 or 120), and consider the regression model Yi = Beta0 + Beta1 Xi + ui , E(ui) = 0 Which of the following are true about the unobservable ui ? "
ui represents factors other than time that influence the student's performance on the exam.
"Under the least squares assumptions for the multiple regression problem (zero conditional mean for the error term, all Xi and Yi being i.i.d., all Xi and ?i having finite fourth moments, no perfect multicollinearity), the OLS estimators for the slopes and intercept:"
unbiased and inconsistent
If Cov(Xi,ui) > 0, then ^β₁is biased
upwards (positive bias)
standardized beta coefficients
use for regression when variables are in different units; one st. dev increase in X leads to an increase of beta coefficient amount in Y
When testing joint hypotheses, you should
use the F-statistics and conclude that at least one of the restrictions does not hold if the statistic exceeds the critical value.
When testing joint hypothesis, you should
use the F-statistics and reject at least one of the hypothesis if the statistic exceeds the critical value.
To measure the fit of the probit model, you should:
use the "fraction correctly predicted" or the "pseudo R squared."
Expected bias analysis
use to determine whether or not there is an omitted variable: sign of correlation between omitted var and explanatory times sign of omitted variable coefficient
mean square error
used to decide if a small variance offsets bias (with a sampling distribution with narrow distribution but offset from the true value a little)
multicollinearity
violates assumption VI; estimates remain unbiased and overall fit will be generally unaffected; variances and standard errors will be large
fundamental problem of causal inference
we can never observe both potential outcomes Yi(0) and Yi(1) (counterfactuals are unobservable); we cannot know the unit effects (the individual average difference between counterfactual state)
estimating the error variance
we can use data to estimate σ^2, which allows us to estimate var(^b1)
When estimating σ²u, we divide by n - 2, the degrees of freedom, because
we lose 2 for estimating β₀ and β₁
If we want to test whether a binary variable equaling 0 or 1 generates equal results,
we should test the null hypothesis that the slope coefficient β₁=0
constrained equation in the F-test
what the null would look like if it were correct
level-level
^y = ^B0 + ^B1x + ...; ^B1: for every 1-unit increase in x, ^y increases/decreases by ^B1 units
Model with two Independent Variables
y=B0+B1x1+B2x2+u -B0 is the intercept. -B1 measures the change in y with respect to x1, holding other factors fixed. -B2 measures the change in y with respect to x2, holding other factors fixe
simple linear regression model
y = B0 + B1x + u; y is the dependent variable, x the independent variable, u the error term or disturbance; B1 = slope parameter in the relationship between y and x, B0 = intercept parameter. If the other factors in u are held fixed, so that the change in u is zero (Δu = 0), then x has a linear effect on y: Δy = B1·Δx if Δu = 0.
Let y^i be the fitted values. The OLS residuals, u^i , are defined as follows
yi-y^i
var(^B1)
σ^2/ sum(xi- ¯x)^2
Variance of ^B0
Var(^B0) = σ² · (n⁻¹ ∑ x²ᵢ) / ∑(xᵢ − x̄)²
Linearity in the parameters
• Yi = β₀+β₁Xi+ui • Yi = β₀+β₁X²i+ui • Yi = β₀+β₁1/Xi+ui • Yi = β₀+β₁log(Xi)+ui
~Y
• estimator of µY • often called a point estimate
t/f An efficient estimator means an estimator with minimum variance.
(c) False. To be efficient an estimator must be both unbiased and have minimum variance.
consider the following least squares specification between test scores and student teacher ratio. test score = 557.8 + 36.42·ln(income). According to this equation, a 1% increase in income is associated with an increase in test scores of
0.36 points (≈ 36.42 × 0.01)
Assume that Y is normally distributed N(μ, σ2). Moving from the mean (μ) 1.96 standard deviations to the left and 1.96 standard deviations to the right, then the area under the normal p.d.f. is
0.95
Threats to external validity
1.Non rep sample 2. Non rep treatment 3. General equilibruim effects and externalities 4. Treatment vs eligibility 5.Treatment vs choice
OLS Slope Estimate
A slope in an OLS regression line.
Minimum Variance Unbiased Estimators
An estimator with the smallest variance in the class of all unbiased estimators.
Unbiased Estimator
An estimator βN is an unbiased estimator if its sampling distribution has as its expected value the true value of β. E(βhat) = β
hourly data, which has a major problem
Annual data probably have less autocorrelation problem then
Formula for slope (B1)
B1 = Cov(x,y)/Var(x)
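A minimal Python check of this formula on simulated data (numpy assumed), compared against np.polyfit:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(size=500)
    y = 3 - 1.5 * x + rng.normal(size=500)
    b1_formula = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # Cov(x, y) / Var(x), both with n-1 denominators
    b1_polyfit = np.polyfit(x, y, 1)[0]                   # slope from a library fit
    print(b1_formula, b1_polyfit)                         # the two agree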
According to Guass-Markov Theorem what is OLS
BLUE = best, linear, unbiased, estimator
BLUE
Best Linear Unbiased Estimator
BLUE
Best - minimum variance Linear Unbiased Estimator
omitted variable bias.
Bias(B1)=E(B1)-B1=B2δ1
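A minimal Python simulation (numpy assumed; parameter values made up) illustrating the formula: the slope from the short regression that omits x2 is approximately B1 + B2·δ1, where δ1 is the slope from regressing the omitted x2 on x1:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100_000
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + rng.normal(size=n)                   # omitted variable, correlated with x1 (delta1 ≈ 0.7)
    y = 1 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)     # true B1 = 2, B2 = 3
    short_slope = np.polyfit(x1, y, 1)[0]                # regression of y on x1 only (x2 omitted)
    delta1 = np.polyfit(x1, x2, 1)[0]
    print(short_slope, 2.0 + 3.0 * delta1)               # both approximately 2 + 3*0.7 = 4.1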
Experimental Data
Data that have been obtained by running a controlled experiment (with treatment and control groups)
The regression R2 is defined as follows:
ESS/TSS
(T/F) In the expression Pr(Y = 1) = Φ (β₀ + β₁X) from a probit model, β₁cannot be negative, since probabilities have to lie between 0 and 1.
FALSE. Even if β₁ is negative, Φ(β₀ + β₁X) will never be negative, as the function Φ(.) is the cumulative distribution function of the N(0,1), and all cumulative distribution functions lie, by definition, between 0 and 1.
(T/F) When we drop a variable from a model, the total sum of squares (TSS) increases.
FALSE. The TSS is the variation of Y around its sample average ~Y . It is unrelated to the number of X's used to explain Y .
(T/F) Among all unbiased estimators that are weighted averages of Y1; :::; Yn, ^β₁ is the most unbiased estimator of β₁.
False. ^β₁ is unbiased, so it is not more or less unbiased than any other unbiased estimator. The Gauss-Markov theorem says that it is the best (most efficient) among linear unbiased estimators.
False
For most economic time series, the autocorrelation coefficient is negative
the model is properly specified and linear
Gauss-Markov assumption 1
endogenous explanatory variable
If xj is correlated with u for any reason, then xj is said to be an endogenous explanatory variable
False
If you wish to use a set of dummy variables to capture six different categories of an explanatory variable, you should use six different dummy variables, each equal to one when the observation is a member of a specific category and zero otherwise.
Variance Inflation Factor (VIF)
In multiple regression analysis under the Gauss-Markov assumption, the term in the sampling variance affected by correlation among the explanatory variables.
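A minimal Python sketch of the calculation VIFj = 1/(1 − R²j), where R²j comes from regressing xj on the other regressors (simulated data; numpy and statsmodels assumed):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    x1 = rng.normal(size=500)
    x2 = 0.9 * x1 + 0.3 * rng.normal(size=500)                 # x2 highly correlated with x1
    r2_j = sm.OLS(x2, sm.add_constant(x1)).fit().rsquared      # R^2 from regressing x2 on the other regressor
    print(1 / (1 - r2_j))                                      # VIF for x2; well above 1 here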
Excluding a Relevant Variable
In multiple regression analysis, leaving out a variable that has a non-zero partial effect on the dependent variable.
Critical t-value
Is the value that distinguishes the acceptance region from the rejection region
random white noise
Jaggedness in residuals due to
Specialized Variables
Lagged Variables Time Trend Dummy Variables
What can one do about OVB other than trying to determine the direction of the bias?
Main strategies are: 1. Multiple Regression 2. Instrumental Variables 3. Randomized Experiments
random movement from one state to another
Markov Process Represents the
Z distribution
Mean of 0 and standard deviation of 1
Dummy
Measure structural change; measure impact of strike; account for seasonality; measure gender differences
Trend
Measure technological change; exogenous factors for population growth
"In a regression, if the t-statistic for a coefficient is 1.83, do you reject the null hypothesis of the coefficient being equal to zero at the 5% level?"
No (|1.83| < 1.96, the 5% two-sided critical value)
Assumption 6
No explanatory variable is perfectly linear function of any other variable
In the model Grade = β0 + β1study + β2leisure + β3sleep + β4work + u, where each regressor is the amount of time (hours), per week, a student spends in each one of the named activities and where the time allocation for each activity is explaining Grades (where Grade if the final grade of Introduction to Econo- metrics), what assumption is necessarily violated if the weekly endowment of time (168 hours) is entirely spent either studying, or sleeping, or working, or in leisure activities?
No perfect multicollinearity.
what assumption is necessarily violated if the weekly endowment of time (168 hours) is entirely spent either studying, or sleeping, or working, or in leisure activities?
No perfect multicollinearity.
Minimum Variance: Heteroskedasticity
Non-Constant Variance makes OLS estimators differ from BLU Estimators
classical linear model (CLM) assumptions
Normality Assumption_The population error u is independent of the explanatory variables x1, x2, ..., xk and is normally distributed with zero mean and variance σ^2: u~Normal(0,σ^2).
Best Linear Unbiased Estimators (BLUE)
OLS estimators have minimum variance among all unbiased estimators of the β's that are linear functions of the Y's.
gauss-markov theorem
OLS is BLUE (best, minimum variance, linear unbiased estimator); assumptions 1-6
errors in variable assumption (measurement error)
OLS is biased and inconsistent because the mismeasured variable is endogenous
The difference between the standard deviation of Y and the SER is
SER measures the deviation of the Y values around the regression line and the standard deviation of Y measures the deviation of the Y values around the sample mean
random sampling
SLR.2
sample variation in the explanatory variable
SLR.3
zero conditional mean
SLR.4
In the simple regression model an unbiased estimator for V ar(u) = σ2, the variance of the population regression errors, is:
SSR/(n−2)
Increasing return to education
Since the change in wage for an extra year of education increases as education increases, it is increasing
Difference between z and t distributions
t distributions are lower and wider than the standard normal (fatter tails, more kurtosis); the spread is larger, especially when the degrees of freedom are small, meaning the statistic is more dispersed
(T/F) A random variable X can only take on the following values: 0 with probability b; 10 with probability 4b; 20 with probability 4b; and 100 with probability b. Therefore b must be equal to 0.1
TRUE. Since probabilities must sum to 1, if X can only take on the specified values it must be the case that b + 4b + 4b + b = 1. Hence b = 0.1.
Residual
The difference between the actual value and the fitted (or predicted) value; there is a residual for each observation in the sample used to obtain an OLS regression line.
Fitted Value
The estimated values of the dependent variable when the values of the independent variables for each observation are plugged into the OLS regression line
Downward Bias
The expected value of an estimator is below the population value of the parameter.
Mean Independent
The key requirement in a simple and multiple regression analysis, which says the unobserved error has a mean (E(u)) that does not change across subsets of the population defined by different values (x) of the explanatory variables.
Assumption 1
The regression model is linear, is correctly specified, and has an additive error term
Gauss-Markov Theorem
The theorem that states that, under the five Gauss-Markov assumptions (for cross-sectional or time series models), the OLS estimator is BLUE (conditional on the sample values of the explanatory variables).
J. True/False: In trying to obtain a model to estimate grades on a statistics test, a professor wanted to include, among other factors, whether the person had taken the course previously. To do this, the professor included a dummy variable in her regression that was equal to 1 if the person had previously taken the course, and 0 otherwise. The interpretation of the coefficient associated with this dummy variable would be the average amount the repeat students tended to be above or below non-repeaters, with all other factors the same.
True
Take an observed (that is, estimated) 95% confidence interval for a parameter of a multiple linear regression. Then:
We cannot assign a probability to the event that the true parameter value lies inside that interval.
homoskedasticity
assumption SLR.5
yi =
b₀ + b₁xi + ei
cross sectional data
data constructed of individual observations taken at a single point in time (individuals, households) (must be more or less independent to draw meaningful inferences)
Omitted Variable
defined as an important explanatory variable that has been left out of a regression equation; leaving it out causes omitted variable bias.
residual sum of squares
defines sample variation in ^u sum(yi-^y)^2
Explained sum of squares
defines sample variation in ^y sum(^y-_y)^2
Total sum of squares
defines sample variation in y
Consider the regression model Wage = β0 + β1Female + u Where Female (=1 if female) is an indicator variable and u the error term. Identify the dependent and independent variables in the regression model above. Wage is the __________ variable
dependent
chi squared distribution in stata
display chi2(df, z)
type 2 error
fail to reject a false null
how you plan to identify your parameter of interest (one that will satisfy the conditional mean independence assumption)
identification strategy
Interaction Term
is an independent variable that is a multiple of two or more other independent variables.
The expected value of a discrete random variable
is computed as a weighted average of the possible outcome of that random variable, where the weights are the probabilities of that outcome.
The larger the error variance, the ___ is var(^B1)
larger
panel data
one cross-section over a period of time
The power of the test is
one minus the probability of committing a type II error
unobserved ability
person-specific
A manufacturer claims that his tires last at least 40,000 miles. A test on 25 tires reveals that the mean life of a tire is 39,750 miles, with a standard deviation of 387 miles. Compute the actual value of the t statistic.
t = −3.23.
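A minimal sketch of the arithmetic in Python (numbers taken from the question):

    import math

    xbar, mu0, s, n = 39_750, 40_000, 387, 25
    t = (xbar - mu0) / (s / math.sqrt(n))     # (39750 - 40000) / (387 / 5)
    print(round(t, 2))                        # -3.23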
empirical analysis
uses data to test a theory or to estimate a relationship.
Proportionate Change
x1-x0/x0 x0=beginning term x1=ending term
Standard Error of the OLS estimators:
• se(^β₀) = √Var(^β₀) • se(^β₁) = √Var(^β₁)
Assumptions for OLS
1. Parameters are linear. 2. Random sampling — sample estimates are unbiased for the population parameters. 3. Var(x) > 0 — x cannot take just one value; it has to have variation and be dispersed. 4. E(u) = 0 — the expected value of the error is 0. 5. Homoskedasticity — equality of error variances across x values
Which of the following is true of confidence intervals?
Confidence intervals are also called interval estimates.
Nonexperimental Data
Data that have not been obtained through a controlled experiment.
Assumption 2
Error term has a zero population mean
E. True/False: It is impossible for the adjusted R² (R̄²) to decrease when you add additional explanatory variables to a linear regression model.
False
F. True/False: In a particular model, the sum of the squared residuals was 847. If the model had 5 independent variables, and the data set contained 40 observations, the value of the standard error of the estimate is 24.2.
False (SER = √(847/(40 − 5 − 1)) = √24.9 ≈ 5.0, not 24.2)
(T/F) The standard error of the regression is equal to 1-R².
False. SER = √((1/(n−2))Σ(i=1,n)û²i) while 1 − R² = (Σ(i=1,n)û²i)/(Σ(i=1,n)(Yi − ~Y)²), and the two are not equal.
Assumption 4
Observations of error term are uncorrelated with each other
Perfect Multicollinearity
Perfect linear relationship among variables; one or more variables are redundant. Holds for all observations. Not typical of economic data; usually introduced into a problem by mistake (e.g., the dummy variable trap)
lie on top of each other and are indistinguishable. Move along one by a certain amount and you move along the other by the exact same amount
Perfect multicollinearity, the two lines
Regression analysis
Statistical technique that attempts to explain movements in one variable
true
T or F: R^2 never decreases when you add variables to the model
critical value
c, rejection of H0 will occur for 5% of all random samples when H0 is true.
unit elastic
level-log
Constant elasticity of demand function
log(y)=B0+B1log(x) B1 is the elasticity
Log-level
log(y) = B0 + B1x + ...; for every 1-unit increase in x, y increases/decreases by approximately B1 × 100 percent
Cross sectional data
one time period multiple entities
economic significance
the economic significance of a variable is determined by the size and sign of its estimated coefficient ^B
1) variables measured in natural units such as years 2) variables measured in percentage points 3) if variables take on 0 or negative values
when to not take logs
For n = 121, sample mean=96, and a known population standard deviation σX = 14, construct a 95% confidence interval for the population mean.
(93.51, 98.49), i.e. 96 ± 1.96 × 14/√121 = 96 ± 2.49.
to derive the least squares estimator of µY, you find the estimator m which minimizes
∑(i=1,n)(Yi − m)²
^β₁~ N(β₁, Var(^β₁)) and ^β₀~ N(β₀, Var(^β₀)) implies that
(^β₀ − β₀)/SE(^β₀) ~ N(0, 1) and (^β₁ − β₁)/SE(^β₁) ~ N(0, 1)
(a) OLS is an estimating procedure that minimizes the sum of errors squared, (sum) ei^2
(a) False, OLS minimizes the sum of squared residuals.
(a) State the null and alternative hypotheses in testing the overall significance of the regression. (b) How is the overall significance of the regression tested? What is its rationale?
(a) Testing the overall significance of the regression refers to testing the hypothesis that none of the independent variables helps to explain the variation of the dependent variable about its mean. Formally, the null hypothesis is H0: B2 = B3 = ... = BK = 0 against the alternative hypothesis H1: not all Bi's are 0. (b) The overall significance of the regression is tested by calculating the F ratio of the explained to the unexplained variance. A "high" value for the F statistic suggests a significant relationship between the dependent and independent variables, leading to the rejection of the null hypothesis that the coefficients of all explanatory variables are jointly zero.
t/f An estimator of a parameter is a random variable but the parameter is non-random.
(a) True. We normally assume the parameter to be estimated is some fixed number, although unknown.
(a) What is meant by perfect multicollinearity? What is its effect?
(a) Two or more independent variables are perfectly collinear if one or more of the variables can be expressed as a linear combination of the other variable(s). For example, there is perfect multicollinearity between X1 and X2 if X1=2X2 or X1=5-X2. If two or more explanatory variables are perfectly linearly correlated it will be impossible to calculate OLS estimates of the parameters.
t/f An unbiased estimator of a parameter (theta) means that it will always be equal to (theta)
(b) False. An estimator is unbiased if on average it is equal to the true unknown parameter.
(b) For a given significance level and degrees of freedom, if the computed |t| exceeds the critical t value we should accept the null hypothesis.
(b) False. The null hypothesis should be rejected.
(b) What is meant by high but not perfect multicollinearity? What problems may result?
(b) High but not perfect multicollinearity refers to the case in which two or more independent variables in the regression model are highly correlated. This may make it difficult to isolate the effect that each of the highly collinear explanatory variables has on the dependent variable.
State whether the following statements about heteroskedasticity are true or false. If false - give reasons why. (b) In the presence of heteroskedasticity OLS is an inefficient estimation technique and the t- and F-tests are invalid.
(b) True.
State whether the following statements about heteroskedasticity are true or false. If false - give reasons why. (c) Heteroskedasticity can be detected with a Chow test.
(c) False - The Chow test is used to test for structural change.
(c) The coefficient of correlation r has the same sign as the estimated slope coefficient.
(c) True. The numerator of both involves the covariance between Y and X which can be positive or negative.
t/f An estimator can be BLUE only if its sampling distribution is normal.
(d) False. Normality of the sampling distribution is not required for an estimator to be BLUE; only the Gauss-Markov assumptions are needed.
(d) What can be done to overcome or reduce the problems resulting from multicollinearity?
(d) Serious multicollinearity may sometimes be corrected by (1) extending the size of the sample data, (2) using some prior information about one of the estimates, (3) transforming the functional form, or (4) dropping one of the highly collinear variables (however, this may lead to specification bias, so care must be taken)
State whether the following statements about heteroskedasticity are true or false. If false - give reasons why. (d) Sometimes apparent heteroskedasticity can be caused by a mathematical misspecification of the regression model. This can happen for example, if the dependent variable ought to be logarithmic, but a linear regression is run.
(d) True
explained sum of squares=
Σ(^yᵢ - ȳ)², the sum of squared deviations of the fitted values from the sample mean of y
In one sentence discuss what is meant by the following statements: (i) When we say that a time series variable is I(0) we mean ? (ii) When we say that a time series variable is I(1) we mean ?
(i) When we say that a time series variable is I(0) we mean the time series is stationary and doesn't need to be differenced. (ii) When we say that a time series variable is I(1) we mean the time series is non-stationary and needs to be differenced once to make it stationary.
panel data
(or longitudinal data) set consists of a time series for each cross-sectional member in the data set.
sum of squared deviations from the mean
Σ(xᵢ - x̄)² = (sum of the x values individually squared) - n·(mean)², i.e., Σxᵢ² - n·x̄²
standardizing a variable
(value - mean) / standard deviation. Standardize so you can compare variables measured in different units. It is not a summary statistic: it gives a value for each observation, so you can compare observations across the board and make different values relative to each other.
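A small numeric sketch in Python (the data values here are made up for illustration):
import numpy as np
x = np.array([2.0, 4.0, 6.0, 8.0])
z = (x - x.mean()) / x.std(ddof=1)   # one standardized value per observation: (value - mean) / std dev
print(z)                             # values are now unit-free and comparable across variables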
Consider the following estimated model (by OLS), where return is the total return of holding a firm's stock during one year, dkr is the firm's debt to capital ratio, eps denotes earnings per share, netinc denotes net income and salary denotes total compensation, in millions of dollars, for the CEO (estimated standard errors of the parameters in parentheses below the estimates). The model was estimated using data on n = 142 firms. return = −12.3 + 0.32 dkr + 0.043eps − 0.005 netinc + 0.0035salary, (6.89) (0.150) (0.078) (0.0047) (0.0022) n = 142, R2 = 0.0395 Which of the following is the 99% confidence interval for the coefficient on dkr?
(−0.0664,0.7064)
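A sketch of the arithmetic behind that interval (the coefficient and standard error are taken from the question; the normal critical value is used):
from scipy.stats import norm
b_dkr, se_dkr = 0.32, 0.150
z = norm.ppf(0.995)   # ≈ 2.576 for a 99% interval
print(round(b_dkr - z * se_dkr, 4), round(b_dkr + z * se_dkr, 4))   # -0.0664 0.7064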
A spark plug manufacturer believes that his plug lasts an average of 30,000 miles, with a standard deviation of 2,500 miles. What is the probability that a given spark plug of this type will last 37,500 miles before replacement? Assume a normal distribution.
0.0013
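The calculation behind this answer, as a Python sketch (numbers from the question):
from scipy.stats import norm
mu, sd = 30_000, 2_500
z = (37_500 - mu) / sd        # z = 3
print(round(norm.sf(z), 4))   # P(Z > 3) ≈ 0.0013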
Consider the following estimated model (by OLS), where return is the total return of holding a firm's stock during one year, dkr is the firm's debt to capital ratio, eps denotes earnings per share, netinc denotes net income and salary denotes total compensation, in millions of dollars, for the CEO (estimated standard errors of the parameters in parentheses below the estimates). The model was estimated using data on n = 142 firms. return = −12.3 + 0.32 dkr + 0.043eps − 0.005 netinc + 0.0035salary, (6.89) (0.150) (0.078) (0.0047) (0.0022) n = 142, R2 = 0.0395 What is the correlation between the fitted values and the dependent variable?
0.1987
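The reasoning step: in a regression with an intercept, R² equals the squared correlation between the actual and fitted values, so the answer is just the square root of R². A one-line check in Python:
print(round(0.0395 ** 0.5, 4))   # ≈ 0.1987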
Let Z be a standard normal random variable. Find Pr(-0.5 < Z < 0.5).
0.3830
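A quick check in Python (table values give 2·0.6915 - 1 = 0.3830):
from scipy.stats import norm
print(round(norm.cdf(0.5) - norm.cdf(-0.5), 4))   # ≈ 0.3829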
The probability of stock A rising is 0.3; and of stock B rising is 0.4. What is the probability that neither of the stocks rise, assuming that these two stocks are independent?
0.42
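Since the stocks are independent, the events "A does not rise" and "B does not rise" are also independent, so their probabilities multiply. A Python sketch:
p_a_rise, p_b_rise = 0.3, 0.4
print(round((1 - p_a_rise) * (1 - p_b_rise), 2))   # 0.7 * 0.6 = 0.42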
7 Assumptions for OLS to be BLUE
1) Regression model is linear, correctly specified, and has an additive error term 2) Error term has a zero population mean 3) All explanatory variables are uncorrelated with the error term 4) Observations of the error term are uncorrelated with one another (no serial correlation) 5) No explanatory variable is a perfect linear function of any other explanatory variable(s) (no perfect multicollinearity) 6) Error term has constant variance (no heteroskedasticity) 7) Error term is normally distributed
The Least Squares Assumption in Multiple Regression
1. Conditional distribution of u given X1i, X2i, ... has a mean of zero 2. (X1i, X2i, ..., Yi), i = 1, ..., n are IID 3. Large outliers are unlikely 4. No perfect multicollinearity - under perfect multicollinearity one of the regressors is a perfect linear function of the other regressors, which makes it impossible to compute OLS (produces division by zero)
OLS assumptions for multiple regression
1. E(ui | X1i = x1i, X2i = x2i, ..., Xki = xki) = 0. In words, the expectation of ui is zero regardless of the values of the k regressors. 2. (X1i, X2i, ..., Xki, Yi) are independently and identically distributed (i.i.d.). This is true with random sampling. 3. (X1i, X2i, ..., Xki, Yi) have finite fourth moments. That is, large outliers are unlikely (this is generally true in economic data). New assumption: no perfect multicollinearity between regressors. 4. The regressors (X1i, X2i, ..., Xki) are not perfectly multicollinear. This means that none of the regressors can be written as a perfect linear function of only the other regressors.
Suppose you believe there is heteroskedasticity proportional to the square of an explanatory variable x and so you divide all the data by x prior to applying OLS. If in fact there is no heteroskedasticity this is undesirable because it causes your OLS estimates to be biased.
1. False. If there is no heteroskedasticity and you divide all the data through by x you will introduce heteroskedasticity into the equation. In the presence of heteroskedasticity, OLS estimates remain unbiased.
Steps of hypothesis testing for a parameter of the linear regression model
1. Formulate the null hypothesis (e.g. H0: βj = 0) 2. Formulate the alternative hypothesis (e.g. H1: βj ≠ 0) 3. Specify the significance level α (e.g. α = 5%) 4. Calculate the actual value of the decision variable, called the t-statistic. 5. Compute the critical values z(α/2) and z(1-α/2). 6. Decide whether you can or cannot reject the null hypothesis.
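A minimal sketch of these steps in Python; the estimate, standard error, and hypothesized value are made-up numbers for illustration:
from scipy.stats import norm
beta_hat, se, beta_null, alpha = 0.52, 0.21, 0.0, 0.05   # hypothetical values
t_stat = (beta_hat - beta_null) / se                     # step 4: t ≈ 2.48
z_crit = norm.ppf(1 - alpha / 2)                         # step 5: ≈ 1.96
print(abs(t_stat) > z_crit)                              # step 6: True -> reject H0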
Two conditions for a valid instrument
1. Instrument relevance: corr(Z,X) ≠ 0 2. Instrument exogeneity: corr(Z,u) = 0. An instrument that is relevant and exogenous can capture movements in X that are exogenous. This exogenous variation can in turn be used to estimate the population coefficient β1.
unbiasedness of OLS
1. LINEAR IN PARAMETERS - In the population model, the dependent variable, y, is related to the independent variable, x, and the error (or disturbance), u, as y = β0 + β1x + u, where β0 and β1 are the population intercept and slope parameters, respectively. 2. RANDOM SAMPLING - We have a random sample of size n, {(xi, yi): i = 1, 2, ..., n}, following the population model in the equation. 3. SAMPLE VARIATION IN THE EXPLANATORY VARIABLE - The sample outcomes on x, namely {xi, i = 1, ..., n}, are not all the same value. 4. ZERO CONDITIONAL MEAN - The error u has an expected value of zero given any value of the explanatory variable. In other words, E(u|x) = 0. 5. HOMOSKEDASTICITY (constant variance) - The error u has the same variance given any value of the explanatory variable. In other words, Var(u|x) = σ².
The Expected Value of the OLS Estimators
1. Linear in Parameters- The model in the population can be written as y=B0+B1x1+B2x2+...+Bkxk+u, where B0, B1, ..., Bk are the unknown parameters (constants) of interest and u is an unobserved random error or disturbance term 2. Random Sampling- We have a random sample of n observations, {(xi1, xi2, ..., xik, yi ): i=1, 2, ..., n} 3. No Perfect Collinearity- In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables. 4. Zero Conditional Mean- The error u has an expected value of zero given any values of the independent variables. In other words, E(u|x1, x2, ..., xk)=0.
Six Steps in the Applied Regression Analysis
1. Review the literature and develop the theoretical model 2. Specify the model : select the independent variables and functional form 3. Hypothesize the expected signs of the coefficients 4. Collect, inspect and clean the data 5. Estimate and evaluate the equation 6. Document the results.
There is a simple relationship between ~β1 and ^β1, which allows for interesting comparisons between simple and multiple regression: ~β1 = ^β1 + ^β2·~δ1, where ~δ1 is the slope from regressing x2 on x1. The two estimates are equal when:
1. The partial effect of x2 on ^y is zero in the sample, that is, ^β2 = 0; or 2. x1 and x2 are uncorrelated in the sample, that is, ~δ1 = 0.
What does the sign of Cov(Xi,ui) depend on?
1. The sign of Cov(Xi, Ai), i.e. whether the omitted variable Ai is positively or negatively correlated with Xi 2. The sign of β₂, i.e. whether the omitted variable Ai positively or negatively affects Yi
The four specification criteria
1. Theory: Is the variable's place in the equation unambiguous and theoretically sound? 2. t-Test: Is the variable's estimated coefficient significant in the expected direction? 3. R2: Does the overall fit of the equation (adjusted for degrees of freedom) improve when the variable is added to the equation? 4. Bias: Do other variables' coefficients change significantly when the variable is added to the equation?
Assuming that x1 and x2 are not uncorrelated, we can draw the following conclusions:
1. When β2 ≠ 0, ~β1 is biased, ^β1 is unbiased, and Var(~β1) < Var(^β1). 2. When β2 = 0, ~β1 and ^β1 are both unbiased, and Var(~β1) < Var(^β1).
5 necessary conditions for CLM to hold
1. errors must have constant variance (CV) 2. errors must be normally distributed (N) 3. errors must be sequentially independent (IE) 4. explanatory variables must be independent of each other (IV) 5. all relevant independent variables must be counted (C) cv ie iv n c
6 main CLM (classical linear model) assumptions
1. linear in parameters y = b0 + b1x1 + b2x2...+ bkxk + u 2. random sampling of population 3. no exact linear relationship in x's 4. conditional expected value of error is 0 e(u | x) = 0 5. variance of error is constant 6. error is independent of x's and normally distributed cov(x, u) = 0 and u is normally distributed
assumptions about variance in error for ols estimator
1. not serially correlated 2. homoskedasticity 3. normally distributed
Functional Form 1. Linear 2. Double Log 3. Semilog 4. polynomial
1. the slope of the relationship between the independent variable and the dependent variable is constant: slopes are constant, elasticities are not. 2. the natural log of Y is the dependent variable and the natural log of X is the independent variable: lnY = β0 + β1 lnX1 + β2 lnX2 + e; the elasticities of the model are constant and the slopes are not. 3. a variant of the double-log equation in which some but not all of the variables (dependent and independent) are expressed in terms of their natural logs: Yi = β0 + β1 lnX1i + β2X2i + ei 4. expresses Y as a function of independent variables, some of which are raised to powers other than 1. For example, in a second-degree polynomial (also called a quadratic) equation, at least one independent variable is squared: Yi = β0 + β1X1i + β2(X1i)² + β3X2i + ei
Algebraic properties of OLS statistics
1. the sum of the OLS residuals is zero 2. The sample covariance between Xi and OLS residuals is zero 3. The point (¯x,¯y) is always on the regression line
Two sided t-tests
1. two sided tests of whether an estimated coefficient is significantly different from zero 2. Two-sided tests of whether an estimated coefficient is significantly different from a specific nonzero value
Given the following probability distribution: X P(X) 1 0.2 2 0.3 3 0.3 4 0.2 What is the variance of the random variable X?
1.05
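A sketch of the calculation (the distribution is the one given in the question):
xs = [1, 2, 3, 4]
ps = [0.2, 0.3, 0.3, 0.2]
mean = sum(x * p for x, p in zip(xs, ps))                # E(X) = 2.5
var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))   # E[(X - E(X))^2]
print(mean, round(var, 2))                               # 2.5 1.05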
Threats to internal validity
1. Non-random assignment 2. Failure to follow the randomization protocol 3. Attrition bias 4. Hawthorne effect 5. Failure to follow the intended treatment protocol
what is the estimator of σ^2
(1/n) Σ ^u²ᵢ = SSR/n, but it is biased because it does not account for two restrictions: Σ ^uᵢ = 0 and Σ xᵢ^uᵢ = 0. If we know n-2 of the residuals, we can get the other two by using these restrictions, so the unbiased estimator is (1/(n-2)) Σ ^u²ᵢ = SSR/(n-2)
If we wish to test the null hypothesis that B4 =B5=B6 in a model with 30 observations and five explanatory variables other than the intercept term, we will need to compare the F-test statistic for this hypothesis with the critical values of an F-distribution with 2 and 24 degrees of freedom.
10. True. While three parameters are involved, there are only two restrictions. The number of restrictions provides the so-called numerator degrees of freedom; the degrees of freedom in the unrestricted model, N-K, is the denominator and will be 30 - 6 = 24.
2. When Y is a binary variable, using the linear probability model ensures that the predicted probability that Y=1 is between 0 and 1 for all values of X.
2. False. OLS predictions will not necessarily fall between 0 and 1.
4. Multicollinearity causes the values of estimated coefficients to be insensitive to the presence or absence of other variables in the model.
4. False. Multicollinearity can make it difficult to distinguish the individual effects on the dependent variable of one regressor from that of another regressor and can cause the values of estimated coefficients to be sensitive to the presence or absence of other variables in the model.
to test for the significance of entity fixed effects you should calculate the F statistic and compare it to the critical value from your F distribution, where the number of restrictions q equals
47
sum of deviations from the mean
By the definition of the mean, deviations above and below the mean exactly offset each other, so added together the total deviations from the mean are 0: Σ(xᵢ - x̄) = 0
Consider the following estimated model (standard errors in parentheses) wage = 235.3923 + 60.87774educ − 2.216635hours, (104.1423) (5.716796) (1.738286) n = 935, R2 = 0.108555, where wage is the wage in euros, educ is the level of education measured in years and hours is the average weekly hours of work. What is the F-statistic for the overall significance of the model.
56.747
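The F statistic for overall significance can be computed directly from R², the number of regressors k, and n. A Python sketch using the numbers above:
r2, n, k = 0.108555, 935, 2
f = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f, 3))   # ≈ 56.747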
7. Consider a model where the return to education depends upon the amount of work experience: log(wage) = β0 + β1educ + β2exper + β3(exper·educ) + u, where educ stands for the number of years of education and exper is the number of years of work experience. The appropriate test of the null hypothesis that the return to education does not depend on the level of work experience is H0: β1 = β3
7. False. The appropriate test is H0 : B3=0.
8. Suppose a null hypothesis CAN NOT be rejected at the 13% significance level. Then, based upon this information, we can conclude that the p-value of the test may be equal to 7%.
8. False. The p value would have to be greater than 0.13.
Denote the R² of the unrestricted model by R²_UR and the R² of the restricted model by R²_R. Let R²_UR and R²_R be 0.4366 and 0.4149, respectively. The difference between the unrestricted and the restricted model is that you have imposed two restrictions. The unrestricted model has one intercept and 3 regressors. There are 420 observations. The F-statistic in this case is
8.01
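Using the R²-form of the F statistic, F = [(R²_UR - R²_R)/q] / [(1 - R²_UR)/(n - k - 1)]. A Python sketch with the numbers above:
r2_ur, r2_r, q, n, k = 0.4366, 0.4149, 2, 420, 3
f = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(round(f, 2))   # ≈ 8.01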
9. Assuming that the number of regressors is greater than 1 then the adjusted and unadjusted R2s are identical only when the unadjusted R2 is equal to 1.
9. True. Follows from adjusted R² = 1 - [(N-1)/(N-K)]·(1-R²)
Pooled Cross Section
A data configuration where independent cross sections, usually collected at different points in time, are combined to produce a single data set.
Define cross sectional data set...
A data set collected by sampling a population at a given point in time.
Panel Data
A data set constructed from repeated cross sections over time. With a balanced panel, the same units appear in each time period. With an unbalanced panel, some units do not appear in each time period, often due to attrition.
Biased Toward Zero
A description of an estimator whose expectation in absolute value is less than the absolute value of the population parameter.
Ordinary Least Squares (OLS)
A method for estimating the parameters of a simple or multiple linear regression model. The OLS estimates are obtained by minimizing the sum of squared residuals (the loss function).
Logarithmic transformations of the dependent variable
A model that gives approximately a constant percentage effect is log(wage) = β0 + β1educ + u; the effect of education on wage then is %Δwage ≈ 100·β1·Δeduc
Simple Linear Regression Model
A model where the dependent variable is a linear function of a single independent variable, plus an error term.
Constant Elasticity model
A model where the elasticity of the dependent variable, with respect to an explanatory variable, is constant; in multiple regression, both variables appear in logarithmic form.
Population Model
A model, especially a multiple linear regression model, that describes a population.
Define panel data set...
A panel data (or longitudinal data) set consists of a time series for each cross-sectional member in the data set. As an example, suppose we have wage, education, and employment history for a set of individuals followed over a ten-year period.
Confidence Interval (CI)
A rule used to construct a random interval so that a certain percentage of all data sets, determined by the confidence level, yields an interval that contains the population value.
Random Sampling
A sampling scheme whereby each observation is drawn at random from the population. In particular, no unit is more likely to be selected than any other unit, and each draw is independent of all other draws.
F Statistic
A statistic used to test multiple hypotheses about the parameters in a multiple regression model.
Micronumerosity
A term introduced by Arthur Goldberger to describe properties of econometric estimators with small sample sizes.
Multicollinearity
A term that refers to correlation among the independent variables in a multiple regression model; it is usually invoked when some correlations are "large," but an actual magnitude is not well defined.
Two-Tailed Test
A test against a two-sided alternative.
Joint Hypotheses Test
A test involving more than one restriction on the parameters in a model.
Multiple Hypotheses Test
A test of a null hypothesis involving more than one restriction on the parameters.
Overall Significance of the Regression
A test of the joint significance of all explanatory variables appearing in a multiple regression equation.
Multiple Regression Analysis
A type of analysis that is used to describe estimation of, and inference in, the multiple linear regression model.
Lagged Dependent or Independent
Account for dynamics in time series; account for habits or learning
Hypothesis test
A hypothesis test always says something about the population: it tests whether something is true about the population or not. We can never prove H0 true, only show that it is not true.
One-Sided Alternative
An alternative hypothesis that states that the parameter is greater than (or less than) the value hypothesized under the null.
Two-Sided Alternative
An alternative where the population parameter can be either less than or greater than the value stated under the null hypothesis.
Econometric Model
An equation relating the dependent variable to a set of explanatory variables and unobserved disturbances, where unknown population parameters determine the ceteris paribus effect of each explanatory variable
Endogenous Explanatory Variable
An explanatory variable in multiple regression that is correlated with the error term, either because of an omitted variable, measurement error, or simultaneity.
What is the trade-off when including an extra variable in a regression?
An extra variable could control for omitted variable bias, but it also increases the variance of other estimated coefficients.
Suppose you are interested in investigating the wage gender gap using data on earnings of men and women. Which of the following models best serves this purpose? A. Female = β0 + β1Wage + u where Female (=1 if female) is an indicator variable and u the error term. B. Wage = β0 + β1 Female + u where Female (=1 if female) is an indicator variable and u the error term. C. Wage = β0 + u where u is the error term. D. Male = β0 + β1Female + u where Male (=1 if male) is an indicator variable and u the error term
B
In a multiple linear regression where the Gauss-Markov assumptions MLR.1 through MLR.4 hold, why can you interpret each coefficient as a ceteris paribus effect?
Because the Ordinary Least Squares (OLS) estimator of the coefficient on variable xj is based on the covariance between the dependent variable and the variable xj after the effects of the other regressors have been removed
simplest way to test for heteroskedasticity; regress the squared residuals on our x's
Breusch-Pagan test
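A minimal sketch of the idea in Python with simulated data (the variable names and data-generating process are invented for illustration): regress the squared OLS residuals on the x's and use n·R² from that auxiliary regression as the LM statistic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
u = rng.normal(size=200) * (1 + 0.8 * np.abs(x))   # errors whose spread grows with |x|
y = 1 + 2 * x + u

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
aux = sm.OLS(resid ** 2, X).fit()                  # squared residuals regressed on the x's
lm_stat = len(y) * aux.rsquared                    # Breusch-Pagan LM statistic, ~ chi2(1) under H0
print(round(lm_stat, 2))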
The 95% confidence interval for β1 is the interval: A. β1 - 1.645SE β1 , β1 + 1.645SE β1 . B. (β1 - 1.96SE(β1), β1 + 1.96SE(β1)) C. β1 - 1.96SE β1 , β1 + 1.96SE β1 D. β1 - 1.96, β1 + 1.96
C
T distribution
Centered at zero but has a larger standard deviation (fatter tails) than the standard normal distribution
sample of 100 households and their consumption and income patterns using these observations, you estimate the following regression Ci= B1Yi+ui where C is consumption and Y is disposable income. The estimate of B1 will tell you
Change in consumption/ change in income
Method Three: Maximum Likelihood
Chooses model parameters to maximize the likelihood function: the parameter values that make the likelihood of the observed data the largest should be chosen. Often it is more convenient to work with the log likelihood function. Using calculus, we take the derivative of the objective function (the log likelihood) with respect to the model parameters and set it equal to zero to find the maximum.
Specification
Choosing from the following components: 1) the independent variables and how they should be measured 2) the functional form of the variables 3) the properties of the stochastic error term
Run separate regressions, the new unrestricted SSR is given by the sum of the SSR of these two separate regressions, then just run a regression for the restricted model
Chow test
A researcher estimates the effect on crime rates of spending on police by using city-level data. Which of the following represents simultaneous causality?
Cities with high crime rates may need a larger police force, and thus more spending. More police spending, in turn, reduces crime.
Gauss-Markov and normality assumptions are collectively referred to as the...
Classical Linear Model (CLM) assumptions
Define Cointegration
Cointegration consists of matching the degree of nonstationarity of the variables in an equation in a way that makes the errors of the equation stationary. Even though individual variables might be nonstationary it's possible for linear combinations of nonstationary variables to be stationary or cointegrated.
Conditional distributions
Continuous "the probability that y=y given that x=x" Probability of the next thing happening is dependent on the first thing happening Free throw shooting (making the first what is the probability of making the second) (Not making the first what is probability of the second)
Using the textbook example of 420 California school districts and the regression of test scores on the student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the heteroskedasticity-robust formula, while it is 0.48 when employing the homoskedasticity-only formula. When calculating the t statistic, the recommended procedure is to: A. use the homoskedasticity-only formula because the t statistic becomes larger. B. first test for homoskedasticity of the errors and then make a decision. C. make a decision depending on how much different the estimate of the slope is under the two procedures. D. use the heteroskedasticity-robust formula.
D
Covariance
Describes the relationship between two variables in the entire population (whether they move apart or move together)
Fan Shapes
A pattern in a residual plot that indicates heteroskedasticity; the desirable pattern is a constant, even spread around a line with intercept zero and slope zero
Coefficients (B)
Determine the coordinates of the straight line at any point
Drawbacks of correlation
Does not perform well with non-linear data; has no units; does not do well with skewness (outliers)
Fix Multicollinearity
Drop a redundant variable; increase the sample size; transform the variables (use logs / change the time series frequency)
X and Y are two random variables. Which of the following statements holds true regardless of whether X and Y are independently distributed?
E(Y ) = E[E(Y |X)]
with i.i.d. sampling each of the following is true except
E(Ȳ) < E(Y)
The condition for ^β₁to be an unbiased estimator of β₁is
E(^β₁)=β₁
an estimator μ̄ of the population value μ is unbiased if
E(μ̄) = μ
Zero conditional mean assumption
E(u|x) = E(u) and E(u) = 0. If u and x are uncorrelated, they are not linearly related. The average value of u does not depend on the value of x.
Expressing zero condition mean and homoskedacity assumption compactly:
E(y|x) = β0 + β1x Var(y|x) = σ^2
An estimator θ^ of the population value θ is unbiased if
E(θ^) = θ
R^2
ESS/TSS; the coefficient of determination; measures goodness of fit
Variance formulas
Var(X) = E[(X - µ)²] = E(X²) - µ²
Let Y be a random variable with mean μY. Then Var(Y) equals:
E[(Y - μY)²]
Consider the model: log(price) = β0 + β1score + β2breeder + u, where price is the price of an adult horse, score is the grade given by a jury (higher score means higher quality of the horse) and breeder is the reputation of the horse breeder. The estimated model is: log(price) = 5.84 + 0.21score + 0.13breeder What is the interpretation of the estimated coefficient on score?
Each additional grade point increases the horse's price by 21%, on average, ceteris paribus.
Parameter estimators
Each of the three approaches generates the same OLS estimators, ^β1 and ^β0
Economic vs Econometric model
An economic model consists of mathematical equations that describe relationships. An econometric model takes an economic model and additionally accounts for variables that are not directly observed, including them in the analysis; the choice of these variables is based on economic theory.
Correlation vs Causality
Economists want causality instead of correlation Shark attacks in summer as well as ice cream consumption. Positive correlation but do not cause one another. Just because things are correlated does not mean they cause each other.
Assumption 7
Error is normally distributed
Assumption 5
Error term has constant variance
Step one of TSLS
Estimate by OLS the first-stage auxiliary regression X = π0 + π1Z + u and obtain the predicted values ^X
In the regression model y=β0 +β1x+β2d+β3(x×d)+u, where x is a continuous variable and d is a dummy variable, to test that the intercept and slope parameters for d = 0 and d = 1 are identical, you must use the
F-statistic for the joint hypothesis that β2 = 0, β3 = 0.
weak instrument
F<10 Cov(z,x) is small will elad to b1 can be volatile amd the distribution may be very spread out nad non standard
(T/F) If E(Xi) = µ, then W =[((ΣXi)+4)/N] is an unbiased estimator of µ
FALSE. E(W) = E[(ΣXi+4)/N] = 1/N E(ΣXi + 4) = 1/N [E(X₁) + E(X₂) + ... + 4] = 1/N [µ + µ + ... + 4] = (Nµ + 4)/N = µ + 4/N ≠ µ
(T/F) If the true model is Y = β₀ + β₁X₁ + β₂X₂ + ε but you omit X₂ and estimate Y = β₀ + β₁X₁ + ε, your estimate of β₁ will always be biased
FALSE. By omitting a variable that is part of the model we risk obtaining an estimate of β₁ that is biased (omitted variable bias). However, for an omitted variable to cause a bias two conditions must hold: • The first one is that the omitted variable (in this case X₂) causes Y. Since the question says that, according to the model considered, X₂ is one of the variables that explains Y, this condition is likely to hold in this case. • The second condition is that the omitted variable is correlated with the variable whose coefficient may be biased. As long as X₂ is not correlated with X₁, the estimate of β₁ will still be unbiased even if we omit X₂.
(T/F) Assume that H0 : μY = μY,0 and H1 : μY > μY,0, and Y is normally distributed. To compute the critical value for this 1-sided test, we divide by two the positive critical value of the 2-sided test.
FALSE. If we want to conduct this test at the significance level α (where usually α = 0.01, 0.05, or 0.10) we have to look for the critical value z(1-α) in the corresponding table. If the test is 2-sided we have to look for z(1-α/2).
(T/F) The higher the standard error of an estimator ^β₁, the more likely it is that you will reject the null hypothesis H0 : β₁ = 0.
FALSE. We divide by the standard error to find the actual value of the t statistic, therefore a higher SE reduces the absolute value of the statistic, thus it becomes less likely that we reject the null.
(T/F) In the following model, Wage = α₀+α₁Educ+α₂Female+α₃Black+α₄Female X Educ+u, to check whether the returns to education are the same for males and females you would have to test a joint hypothesis with an F test.
FALSE. You only have to test one hypothesis on one coefficient: H₀: α₄= 0 (Note: You would have to test a joint hypothesis with an F test if you wanted to check whether average wages are equal for men and women -other things equal. In this case, H₀ : α₂= 0 and α₄= 0.)
Statistically Insignificant
Failure to reject the null hypothesis that a population parameter is equal to zero, at the chosen significance level.
Type II Error
Failure to reject the null hypothesis when it is false.
Jointly Insignificant
Failure to reject, using an F test at a specified significance level, that all coefficients for a group of explanatory variables are zero.
C. True/False: Multiple linear regression is used to model annual income (y) using number of years of education (x1) and number of years employed in current job (x2). If the F-statistic for testing H0 : B1 = B2 = 0 has a p-value equal to 0.001, then we can conclude that both explanatory variables have an effect on annual income.
False
H. True/False: A regression had the following results: SST = 82.55, SSE = 29.85. It can be said that 73.4% of the variation in the dependent variable is explained by the independent variables in the regression.
False
(T/F) To obtain the slope estimator using the least squares principle, we divide the sample covariance of X and Y by the sample variance of Y .
False. Instead, to get the slope estimator we divide the sample covariance of X and Y by the sample variance of X.
T/F If the correlation coefficient between two variables is zero, it means that the two variables are independent.
False generally. Covariance is a measure of linear dependence between two random variables. However, variables can be nonlinearly related.
(t/f) When we say that an estimated regression coefficient is statistically significant we mean that it is statistically different from 1.
False. It means the estimate is statistically different from 0, not from 1.
t/f The way to determine whether a group of explanatory variables exerts significant influence on the dependent variable is to see if any of the explanatory variables has a significant t statistic; if not, they are statistically insignificant as a group.
False. Use the F test not individual t tests.
Consider the following estimated model (by OLS), where return is the total return of holding a firm's stock during one year, dkr is the firm's debt to capital ratio, eps denotes earnings per share, netinc denotes net income and salary denotes total compensation, in millions of dollars, for the CEO (estimated standard errors of the parameters in parentheses below the estimates). The model was estimated using data on n = 142 firms. return = −12.3 + 0.32 dkr + 0.043eps − 0.005 netinc + 0.0035salary, (6.89) (0.150) (0.078) (0.0047) (0.0022) n = 142, R2 = 0.0395 What can you say about the estimated coefficient of the variable salary? (consider a two-sided alternative for testing significance of the parameters)
For each additional million dollars in the wage of the CEO, return is predicted to increase by 0.0035, on average, ceteris paribus. But it is not statistically significant at the 5% level of significance.
Consider the following estimated model (standard errors in parentheses) wage = 235.3923 + 60.87774educ − 2.216635hours, (104.1423) (5.716796) (1.738286) n = 935, R2 = 0.108555, where wage is the wage in euros, educ is the level of education measured in years and hours is the average weekly hours of work. What can you say about the estimated coefficient of the variable educ? (consider a two-sided alternative for testing significance of the parameters)
For each additional year of education, wage is predicted to increase by 60.88 euros, on average, ceteris paribus. It is statistically significant at the 5% level of significance.
housing = 164 + .27(income) Interpret
For every additional (marginal) dollar of income earned, $0.27 goes toward housing.
no multicollinearity (model is estimating what it should be)
Gauss-Markov assumption 3
errors are homoscedastic
Gauss-Markov assumption 5
How to fix serial correlation?
Generalized least squares: either the Cochrane-Orcutt method, Prais-Winsten, or Newey-West standard errors
Gauss-Markov theorm
Given the classical assumptions, the OLS estimator of β is the minimum-variance estimator from among the set of linear unbiased estimators: BLUE
Test of Joint Hypotheses : Test using F stat
H0: β1 = 0 and β2 = 0 vs. H1: at least one of β1, β2 is not 0
Null hypothesis
H0: βj = 0. Since βj measures the partial effect of xj on (the expected value of) y, after controlling for all other independent variables, this means that, once x1, x2, ..., x(j-1), x(j+1), ..., xk have been accounted for, xj has no effect on the expected value of y
two-sided alternative
H1: B≠0 xj has a ceteris paribus effect on y without specifying whether the effect is positive or negative. This is the relevant alternative when the sign of bj is not well determined by theory (or common sense). Even when we know whether bj is positive or negative under the alternative, a two-sided test is often prudent.
if the null hypothesis states H0: E(Y) = μ(Y,0), then a two-sided alternative hypothesis is
H1: E(Y) ≠ μ(Y,0)
False
Heteroskedasticity means constant variance
True
Heteroskedasticity occurs primarily with cross-sectional data.
cross-sectional data
Heteroskedasticity typical of
Two stage Least Square Estimator
If the instrument Z satisfies the conditions of instrument relevance and exogeneity, the coefficient β1 can be estimated using the IV estimator. TWO STAGES: 1. Decompose X into two components: a problematic component that may be correlated with the regression error and another problem-free component uncorrelated with the error. 2. Use the problem-free component to estimate β1.
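A minimal two-stage sketch in Python with simulated data (the data-generating process is invented; in practice a dedicated IV/2SLS routine should be used so the standard errors are computed correctly):
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)                 # instrument
e = rng.normal(size=n)                 # unobserved factor
x = 0.9 * z + e + rng.normal(size=n)   # x is endogenous: it contains e
y = 1 + 2 * x + 3 * e                  # structural error 3e is correlated with x; true beta1 = 2

# Stage 1: regress x on z and keep the fitted values (the problem-free component)
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
# Stage 2: regress y on the fitted values to estimate beta1
print(sm.OLS(y, sm.add_constant(x_hat)).fit().params)   # slope should be close to 2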
statistically significant
If H0 is rejected in favor of the model at the 5% level, we usually say that "xj is statistically significant, or statistically different from zero, at the 5% level."
jointly statistically significant
If H0 is rejected, then we say that xk-q+1, ..., xk are jointly statistically significant
Gauss Markov
If OLS assumptions 1-4 hold, then E(^β0) = β0 and E(^β1) = β1: the OLS estimates are centered on the population parameters
VIF Rule of Thumb
A VIF greater than 10 indicates the presence of a high degree of multicollinearity
perfect collinearity
If an independent variable is an exact linear combination of the other independent variables, then we say the model suffers from perfect collinearity
Covariance Stationary
If both the mean and variance are finite and constant, and the covariance of the time series with leading or lagged values of itself is constant, then the time series is said to be covariance stationary
we have autocorrelation
If the covariances between error terms are not zero
jointly insignificant
If the null is not rejected, then the variables are jointly insignificant, which often justifies dropping them from the model.
The residual ^uᵢ = yᵢ - ^yᵢ
If ^uᵢ > 0, then ^yᵢ is below yᵢ, which means that, for this observation, yᵢ is underpredicted. If ^uᵢ < 0, then yᵢ < ^yᵢ, and yᵢ is overpredicted. 1. The sample average of the residuals is zero, and so ȳ equals the average of the fitted values ^y. 2. The sample covariance between each independent variable and the OLS residuals is zero. Consequently, the sample covariance between the OLS fitted values and the OLS residuals is zero. 3. The point (x̄1, x̄2, ..., x̄k, ȳ) is always on the OLS regression line: ȳ = ^β0 + ^β1x̄1 + ^β2x̄2 + ... + ^βkx̄k
In the estimated model log(q)=502.57−0.9log(p)+0.6log(ps)+0.3log(y), where p is the price and q is the demanded quantity of a certain good, ps is the price of a substitute good and y is disposable income, what is the meaning of the coefficient on ps?
It is the cross-price elasticity of demand in relation to the substitute good and it bears the expected sign.
"Suppose that a researcher, using wage data on 235 randomly selected male workers and 263 female workers, estimates the OLS regression: Wage = 11.769 + 1.993 × Male, Rsq= 0.04, SER= 3.9. What does the SER of 3.9 tell us ?"
It measures the average size of the OLS residual (the average mistake made by the OLS regression line) and tells us our average error is 3.9 dollars.
Multiple Regression
Linear regression model with more than one regressor
Heteroskedasticity: Impact on OLS Properties
Linearity: Still Met Unbiasedness: Still Met Minimum Variance: Not Met
Most important non-linear econometric data tool
Logs
Akaike Information Criterion
Look for the model with the smallest AIC
Schwarz's Information Criterion
Look for the model with the smallest Schwarz value
P value
The smallest significance level at which the null hypothesis can be rejected
Formula for margin of error
ME=z (stdev/sqrt(n))
Classical Linear Model Assumptions
MLR assumptions 1-6. Including linearity in the parameters, no perfect colinearity, the zero conditional mean assumption, homoskedasticity, no serial correlation, and normality of the errors.
Goodness of fit 7
Most theoretically logical functional form
Markov Example
Move from being unemployed to employed, back to being unemployed
it is a matter of degree, check for severity
Multicollinearity is not a present/absent problem
Some or all of your t-ratios are individually small (cannot reject individual slopes being zero), but the F-test value is large (rejects all slopes simultaneously being zero)
Multicollinearity may be present in your model if
degrees of freedom
N-K-1
Why are the coefficients of probit and logit models estimated by maximum likelihood instead of OLS?
OLS cannot be used because the regression function is not a linear function of the regression coefficients.
unbiased
OLS estimated coefficients centered around the true/population values
gauss-markov theorem
OLS estimator is the best linear unbiased estimator (BLUE) of linear model with average zero error + constant error variance
the variance in our estimator lets us infer the likely accuracy of a given estimate
OLS sampling errors
Increased Importance
Observations with small variances
Conditional probability
One thing depends on the other. (Conditional on walsh being a student, what are the chances he drinks on tuesday)
Rejection region
The region of the test statistic's distribution in which H0 is rejected; if the statistic lands there when the null is actually true, we reject a true null and make a Type I error
method of least squares
Process of fitting a mathematical function to a set of measured points by minimizing the sum of the squares of the distances from the points to the curve.
when the estimated slope coefficient in the simple regression model b1 is zero, then
R^2=0
R-squared/ coefficient of determination
R^2=SSE/SST=1-SSR/SST When interpreting R^2, we usually multiply it by 100 to change it into a percent: 100xR^2 is the percentage of the sample variation in y that is explained by x.
Homoskedasticity
Random error terms have the same (constant) variance across observations
serial correlation
Refers to the situation in which the residual terms are correlated with one another; error terms follow each other; estimated standard errors are thus too small
Limitations of t-test
Researchers confuse statistical significance with theoretical validity or empirical importance. Does not say anything about which variables determine the major portion of the variation in the dependent variable.
In the simple regression model an unbiased estimator for V ar(u) = σ2, the variance of the population regression errors, is:
SSR/(n−2).
total sample variation in explanatory variable xj; converges to n * var(xj)
SST (variance)
Which of the following statements is correct?
SST = SSE + SSR
Which of the following statements is correct? a) SST = SSE + SSR b) SSE = SSR + SST c) SSE > SST d) R2 =1−SSE/SST
SST = SSE + SSR
decomposition of total variation
SST = SSE + SSR
Pooled cross sectional
Same as cross-sectional data, EXCEPT that you pool data from several different points in time, with different units sampled each time
Panel data
Same entity multiple periods
Labor economists studying the determinants of women's earnings discovered a puzzling empirical result. Using randomly selected employed women, they regressed earnings on the women's number of children and a set of control variables (age, education, occupation, and so forth). They found that women with more children had higher wages, controlling for these other factors. What is most likely causing this result?
Sample selection bias
Suppose that a state offered voluntary standardized tests to all its third graders and that these data were used in a study of class size on student performance. Which of the following would generate selection bias?
Schools with higher-achieving students could be more likely to volunteer to take the test.
Observational Data
See Nonexperimental Data
Sample Regression Function (SRF)
See OLS regression line
Coefficient of Determination
See R-squared
Covariate
See explanatory variable
Independent Variable
See explanatory variable
Regressor
See explanatory variable
Level of significance
Shows the probability of observing a t-value greater than the critical t-value when the null is true, i.e., the probability of making a Type I error
just plot residuals against time
Simplest method of detecting autocorrelation
Time Series data
A single individual/entity/variable, with data over time (multiple points in time) on the same unit; e.g., statistical tracking of unemployment each year
Some Multicollinearity
Some linear relationship Typical of economic data
Imagine that you were told that the t-statistic for the slope coefficient of the regression line Test Score = 698.9 - 2.28 × STR was 4.38. What are the units of measurement for the t-statistic?
Standard deviations
OLS characteristics
Sum of residuals = 0; OLS is the best linear unbiased estimator; OLS itself is an estimator, and a given ^β produced by OLS is an estimate
(T/F) Suppose you run a test of the hypothesis H₀ : β₁= 0 against the two-sided alternative H₁: β₁≠ 0. Your t-statistic takes the value |-2.001 | > 1.96. You therefore reject the null at the 10% significance level.
TRUE. According to your t-statistic you'd reject the hypothesis at the 5% significance level, as |-2.001| > 1.96. If you reject at 5%, you necessarily reject at 10% (the lower the significance level, the more difficult it is to reject the null).
(T/F) In the model Y = β₀ + β₁X + u, if Cov(Y,X) > 0 then the estimate ^β₁will be greater than zero.
TRUE. Since ^β₁= sXY/s²X and s²X is always positive, the sign of ^β₁is determined by the sign of sXY
(T/F) Everything else equal, the length of the confidence interval decreases with the sample size n
TRUE. Since the length is proportional to the standard error of the estimator and the standard error decreases with n, the length of the confidence interval decreases with n if everything else stays the same.
Inclusion of an Irrelevant Variable
The inclusion in a regression model of an explanatory variable that has a zero population parameter when estimating an equation by OLS.
False
The Classical Assumption regarding the variance of the disturbance term is that the variance varies from observation to observation.
False
The Cochrane-Orcutt procedure can be used to correct for heteroskedasticity.
Which of the following statements is true?
The F statistic is always nonnegative as SSRr is never smaller than SSRur.
True
The autocorrelation coefficient can be any number between -1 and +1.
What is meant by the best unbiased or efficient estimator? Why is this important?
The best unbiased or efficient estimator refers to the one with the smallest variance among unbiased estimators. It is the unbiased estimator with the most compact or least spread out distribution. This is very important because the researcher would be more certain that the estimator is closer to the true population parameter being estimated.
Normality Assumption
The classical linear model assumption which states that the error (or the dependent variable) has a normal distribution, conditional on the explanatory variables.
Interpretation of the constant
The constant includes the fixed portion of Y that cannot be explained by independent variables
Why does a regression have an error?
The error u arises because of factors, or variables, that influence Y but are not included in the regression function
Assumption 5: Homoskedasticity
The error u, has the same variance given any value of the explanatory variables. Var(u|x) = σ^2 The variance of u, conditional on x, is constant Implies efficiency properties
Homoskedasticity
The errors in a regression model have constant variance conditional on the explanatory variables.
Consider the model: log(price) = β0 + β1score + β2breeder + u, where price is the price of an adult horse, score is the grade given by a jury (higher score means higher quality of the horse) and breeder is the reputation of the horse breeder. Because reputation of the breeder is difficult to measure we decided to estimate the model omitting the variable breeder. What bias can you expect in the score coefficient, assuming breeder reputation is positively correlated with score and β2 > 0?
The estimated coefficient of score will be biased upwards.
Jointly Statistically Significant
The null hypothesis that two or more explanatory variables have zero population coefficients is rejected at the chosen significance level.
Homoskedasticity
The pattern of the covariation is constant (the same) around the regression line, whether the values are small, medium, or large.
Which of the following correctly identifies an advantage of using adjusted R2 over R2?
The penalty of adding new independent variables is better understood through adjusted R² than through R².
Semi-elasticity
The percentage change in the dependent variable given a one-unit increase in an independent variable (only dependent variable appears in logarithmic form).
Sampling distribution of ^β
The probability distribution of the ^β values across different samples
First Order Conditions
The set of linear equations used to solve for the OLS estimates.
Total Sum of Squares (SST)
The total sample variation in a dependent variable about its sample average.
A researcher estimates a regression using two different software packages. The first uses the homoskedasticity-only formula for standard errors. The second uses the heteroskedasticity-robust formula. The standard errors are very different. Which should the researcher use?
The heteroskedasticity-robust standard errors should be used
A researcher investigating the determinants of the demand for public transport in a certain city has the following data for 100 residents for the previous calendar year: expenditure on public transport, E, measured in dollars; number of days worked, W; and number of days not worked, NW. By definition NW is equal to 365 - W. He attempts to fit the following model E= B1 + B2W + B3NW + e .Explain why he is unable to fit this equation. How might he resolve the problem?
There is exact multicollinearity since there is an exact linear relationship between W, NW and the constant term. As a consequence it is not possible to tell whether variations in E are attributable to variations in W or variations in NW, or both. One way of dealing with the problem would be to drop NW from the regression. The interpretation of b2 now is that it is an estimate of the extra expenditure on transport per day worked, compared with expenditure per day not worked.
inclusion of an irrelevant variable or overspecifying the model
This means that one (or more) of the independent variables is included in the model even though it has no partial effect on y in the population.
Comment on whether the following statement is true or false. If a variable in a model is significant at the 10% level, it is also significant at the 5% level.
This statement is false. It works the other way around. If a variable is significant at the 5% level, it is also significant at the 10% level. This is most easily explained on the basis of the p-value. If the p-value is smaller than 0.05 (5%) we say that a variable is significant at the 5% level. Clearly, if p is smaller than 0.05 it is certainly smaller than 0.10 (10%).
Comment on whether this statement is true or false. "The assumption of homoskedasticity states that the variance of the OLS residuals is constant
This statement is false. The homoskedasticity assumption states that the error terms have a constant variance (independent of the regressors). While some people use the terms `disturbance' ('error term') and `residual' interchangeably, this is incorrect. Error terms are unobservables in our model and depend upon the unknown population parameters. Residuals are observable and depend upon the estimates for these parameters. Assumptions are always stated in terms of the error terms, never in terms of the residuals (which result after we have estimated the model).
"High multicollinearity affects standard errors of estimated coefficients and therefore estimates are not efficient" Is this statement valid? If yes, cite what assumptions and properties enable you to agree with this statement. If not, explain why not.
This statement is not valid because high multicollinearity does not affect the assumptions made on the model and hence the properties of unbiasedness and efficiency are unaffected by multicollinearity.
G. True/False: Consider a regression in which b2 = - 1.5 and the standard error of this coefficient equals 0.3. To determine whether X2 is a significant explanatory variable, you would compute an observed t-value of - 5.0.
True
I. True/False: A regression had the following results: SST = 82.55, SSE = 29.85. It can be said that 63.84% of the variation in the dependent variable is explained by the independent variables in the regression.
True
(T/F) The output from the Stata command "regress y x" reports the p-value associated with the test of the null hypothesis that β₁= 0.
True. The p-value associated with the test of the null hypothesis that β₁= 0, is reported in a Stata regression under the column "P > | t |."
(T/F) The t-statistic is calculated by dividing the estimator minus its hypothesized value by the standard error of the estimator.
True. The t-statistic is constructed by taking the estimator and subtracting off the hypothesized value and then dividing that quantity by the standard error of the estimator.
(T/F) When the estimated slope coefficient in the simple regression model, ^β₁is zero, then R² = 0.
True. When ^β₁= 0 then Xi explains none of the variation of Yi, and so the ESS (Explained Sum of Squares) = 0. Thus we have R²= ESS/TSS = 0
(T/F) In the presence of heteroskedasticity, and assuming that the usual least squares assumptions hold, the OLS estimator is unbiased and consistent, but not BLUE.
True. With both homoskedasticity and heteroskedasticity, the OLS estimator is unbiased and consistent, but it requires homoskedasticity to be BLUE.
False
Under heteroskedasticity, the conventionally calculated regression variance estimator, s², is unbiased since it has nothing to do with the disturbance term
Total sum of squares
Uses the squared variation of Y around its mean as a measure of the amount of variation to be explained by the regression; TSS = ESS + RSS
The error term is homoskedastic if
Var(u|x) is constant
σ², error variance
Var(u|x)=E(u^2|x)-[E(u|x)]^2
Property one of variance
Var(X) = 0 iff there is a constant c such that P(X = c) = 1, in which case E(X) = c
Omitted Variables
Variables that are in ui i.e. variables that affect Y other than X
Independent or explanatory variables
Variables within the function
Random sampling
We have a random sample of size n, following the population model y = β0 + β1x + u
What does it mean when you calculate a 95% confidence interval?
Were the procedure you used to construct the confidence interval to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true parameter 95% of the time.
downward bias.
When E(~β1) < β1, ~β1 has a downward bias.
Exclusion Restriction
Z cannot be part of the true model. This means that Z is correlated with X, but has no direct effect on Y. In other words, in the presence of X, Z has no additional explanatory power for Y in the model
z stat or t stat
Use a z-statistic when the standard deviation of the population is known; use a t-statistic when you know the standard deviation of the sample only
Predicted Value
^Y = b₀+b₁Xi
estimation results are summarized as
^Yi = ^β₀ + ^β₁ Xi (SE(^β₀)) (SE(^β₁))
The Central Limit Theorem (CLT) implies that
^βj ~ N(βj, Var(^βj)) and (^βj-βj)/SE(^βj) ~ N(0,1)
(T/f) The OLS intercept coefficient ^β₀ is equal to the average of the Yi in the sample.
^β₀ = Ȳ - ^β₁X̄, so this is false in general (it holds only when ^β₁X̄ = 0)
Var(^β₁)
^σ²(u)/Σ(i)(Xi-~X)²
interpretation of the slope coefficient in the model ln(Yi) = β0 + β1 ln(Xi)+ ui is as follows:
a 1% change in X is associated with a β1 % change in Y.
smaller coefficients (downward bias)
a bias towards zero means...
The interpretation of the slope coefficient in the model ln(Yi) = β0 + β1Xi + ui is as follows:
a change in X by one unit is associated with a 100 β1 % change in Y.
causality vs correlation
a correlation between two variables does not imply causality
coefficient of determination
a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data
F Test
a formal hypothesis test that is designed to deal with a null hypothesis that contains multiple hypotheses or a single hypothesis about a group of coefficients.
regression specification error test (RESET)
a general way to run a higher-order terms specification test; includes squares and possibly higher order fitted values in the regression; failure to reject implies that some combination of higher order terms and interactions of your X variables would produce a better model
scatter plot
a graph with points plotted to show a possible relationship between two sets of data.
Simple correlation coefficient (r)
a measure of the strength and direction of the linear relationship between two variables (ranges from -1 to +1). If |r| between explanatory variables is high in absolute value (roughly 0.8 and above), multicollinearity is a potential problem
error variance
a measure of the variability of the distance between our actual and predicted Y observations (the higher the error variance, the noisier our beta estimates will be)
we believe coefficients are appropriately specified and identified for the sample
a model is internally valid if...
An estimate is
a nonrandom number
assume that for the T=2 time period case you have estimated a simple regression in changes model and found a statistically significant positive intercept
a positive mean change in the LHS variable in the absence of a change in the RHS variable
An estimator is
a random variable.
An estimator is:
a random variable.
confidence interval
a range of values so defined that there is a specified probability that the value of a parameter lies within it.
probability distribution
a set of all random variables with the probabilities of all possible outcomes
specification
a specific version of a more general econometric model
Null Hypothesis
a statement of the values that the researcher does not expect.
robust standard errors
a technique to obtain unbiased standard errors of OLS coefficients under heteroscedasticity
Discrete random variable
a variable that takes on only a finite or countably infinite number of values. flipping a coin A number will either be something or another thing
residual (e)=
actual value - estimated value
Ramsey RESET test
add powers of the fitted values (^Y², ^Y³, ^Y⁴) as regressors and perform an F-test; if the overall fits differ, the model is likely misspecified; checks for omitted variables and other specification errors
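In Stata, the RESET test can be run after a regression with estat ovtest (a sketch, assuming hypothetical variables y, x1, x2):
regress y x1 x2
* Ramsey RESET test using powers of the fitted values
estat ovtest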
11. We would like to predict sales from the amount of money insurance companies spent on advertising. Which would be the independent variable? a) sales. b) advertising. c) insufficient information to decide.
advertising
changing the unit of measurement of any independent variable, where log of the independent variable appears in the regression
affects only the intercept coefficient
Imperfect multicollinearity
affects the standard errors
slope dummy variables
aka interaction term; allows slope of the relationship between dependent variable and independent variable to be different whether or not the dummy is met
null hypothesis of an F-test for overall significance=
all coefficients equal zero simultaneously; null hypothesis that the fit of the equation isn't significantly better than that provided by using the mean alone
Ceteris Paribus
all other relevant factors are held fixed
The overall regression F-statistic tests the null hypothesis that
all slope coefficients are zero.
In the Chow test the null hypothesis is:
all the coefficients in a regression model are the same in two separate populations.
panel data
also called longitudinal data
biased estimator
an estimator that comes from a sampling distribution that is not centered around the true value
example of cross sectional data
analyzing the behavior of unemployment rates across US states in march 2006
the adjusted r squared takes into account the number of variables in a model
and may decrease
degrees of freedom
the number of observations beyond the minimum needed to fit the line y = b0 + b1x1; with regressors x1, x2, ..., xk the model estimates k + 1 coefficients, so df = n − (k + 1)
The F statistic is always nonnegative
because SSRr can never be smaller than SSRur
errors have zero conditional mean; all variables must be exogenous (more likely to hold in multivariate OLS because fewer things end up in the error term)
assumption MLR.4
homoskedasticity
assumption MLR.5
normality of error terms
assumption MLR.6
random sampling
assumption SLR.2
sample variation in explanatory variable
assumption SLR.3
zero conditional mean
assumption SLR.4
classical linear model (CLM)
assumptions MLR.1-MLR.6
A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county's crime rate on the (per capita) size of the county's police force. Which of the following variable(s) is/are likely NOT to be useful to add to the regression to control for important omitted variables? a. The average level of education in the county. b. The number of bowling alleys in the county. c. The fraction of young males in the county population. d. The average income per capita of the county.
b
minimize the sum of squared regression residuals
best fit
Joint probability distribution
both discrete variables
An estimator θ^1 of the population value θ is more efficient when compared to another estimator θ^2, if
both estimators are unbiased and Var(θ^₁) < Var(θ^₂).
the regression slope indicates
by how many units the conditional mean of y increases, given a one unit increase in x
Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. When omitting X2 from the regression, there will be omitted variable bias for β1: A. only if X2 is a dummy variable. B. if X2 is measured in percentages. C. if X1 and X2 are correlated. D. always.
c
The correlation between X and Y
can be calculated by dividing the covariances between X and Y by the product of the two standard deviations.
The correlation between X and Y:
can be calculated by dividing the covariances between X and Yby the product of the two standard deviations.
logged coefficients
can be interpreted as percent changes for small deviations in the dependent variable; gives "midpoint" percentage changes; logarithmic changes are elasticities
the confidence interval for the sample regression function slope
can be used to conduct a test about a hypothesized population regression function slope
F statistic computed using maximum likelihood estimators
can be used to test joint hypotheses
Binary variables
can take on only two values.
Economists do not use experimental data more frequently for all of the following reasons except that real-world experiments
cannot be executed in economics
1) misspecification (omitted variables) 2) outliers 3) skewness 4) incorrect data transformation 5) incorrect functional form for the model 6) improved data collection procedures
causes of heteroskedasticity
ordinary least squares
chooses the estimates to minimize the sum of squared residuals. ∑(y-B0-B1x1-B2x2)^2
expected value of a discrete random variable
computed as a weighted average of the possible outcomes of that random variable, where the weights are the probabilities of those outcomes
least squares assumptions
1) the conditional distribution of ui given Xi has a mean of zero; 2) (Xi, Yi), i = 1, ..., n, are independently and identically distributed; 3) large outliers are unlikely
the explanatory variable must not contain information about the mean of ANY unobserved factors (referring to the population error term u)
conditional mean independence assumption
1) errors start to look normal 2) in large samples, the t-distribution is close to the N(0,1) distribution (same with confidence intervals and F tests)
consequences of asymptotic normality
estimates will be less precise because the error variance is higher (otherwise, OLS will be unbiased and consistent)
consequences of measurement error in the dependent variable
the probability that the estimate is close to the true population value can be made high by increasing the sample size
consistency
cross-sectional data set
consists of a sample of individuals, households, firms, cities,states, countries, or a variety of other units, taken at a given point in time.
economic model
consists of mathematical equations that describe various relationships. basic premise underlying these models is utility maximization.
time series data
consists of observations on a variable or several variables over time. Examples of time series data include stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, and automobile sales figures. Because past events can influence future events and lags in behavior are prevalent in the social sciences, time is an important dimension in a time series data set.
Log transformations of the dependent and independent variables
constant elasticity model: dependent variable log(salary), independent variable log(sales). The coefficient on log(sales) is the estimated elasticity of salary with respect to sales; a coefficient of 0.257 implies that a 1% increase in firm sales increases CEO salary by about 0.257%.
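A minimal Stata sketch of this constant elasticity regression, assuming the dataset contains variables named salary and sales:
gen lnsalary = ln(salary)
gen lnsales = ln(sales)
* the coefficient on lnsales is the estimated elasticity of salary with respect to sales
regress lnsalary lnsales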
Linear Regression Model
describes the relationship between the two random variables • Yi=β₀+β₁Xi+ui
uses of econometrics
description, hypothesis, forecast
In the study of the effectiveness of cardiac catheterization that uses the difference in distance to cardiac catheterization and regular hospitals as an instrument
to determine whether the instrument is weak, compute the value of the first-stage F statistic
total sum of squares (SST)
difference between data point (y) and sample average (y bar)
explained sum of squares (SSE)
difference between regression line (y hat) and sample average (y bar)
Suppose that n = 100, and that we want to test at 1% level whether the population mean is equal to 20 versus the alternative that it is not equal to 20. The sample mean is found to be 18 and the sample standard deviation is 10. Your conclusion is:
do not reject the null hypothesis.
Interpreting R^2 and adjusted R^2: what they do NOT tell you
They do not tell you whether: 1. an included variable is statistically significant; 2. the regressors are the true cause of the movements in the dependent variable; 3. there is omitted variable bias; 4. you have chosen the most appropriate set of regressors.
The availability of computer-related leisure activities in the district. If this variable is omitted, it will likely produce a(an) _________ bias of the estimated effect on test scores of increasing the number of computers per student.
downward
If Cov(Xi,ui) < 0, then ^β₁is biased
downwards (negative bias)
ols unbiased
E(^β₁) = β₁
log-log form
elasticities are constant, slopes are not; a one percent increase in X leads to a percent increase in Y equal to the coefficient on X
sum(yi-y bar)^2
equation for SST
Error vs residual
The error is the deviation of the observed value from the true (population) value, while the residual is the difference between the observed and estimated (fitted) value.
classical assumption 5
error term has constant variance
classical assumption 2
error term has zero population mean
OLS is biased and inconsistent because the mismeasured variable is endogenous
errors in variable assumption (measurement error)
A survey of earnings contains an unusually high fraction of individuals who state their weekly earnings in 100s, such as 300, 400, 500, etc. This is an example of:
errors-in-variables bias.
Standard Error of the regression SER
estimates the standard deviation of the error term u. Thus the SER is a measure of the spread of the distribution of Y around the regression line: SER = sqrt[SSR/(n − k − 1)], where k is the number of slope coefficients.
b₀
estimator of β₀
b₁
estimator of β₁
In general, the t-statistic has the following form:
(estimator − hypothesized value) / standard error of the estimator
causal inference
evaluating whether a change in x will lead to a change in y assuming nothing else changes (ceteris paribus)
to provide quantitative answer to policy question
examine empirical evidence
Omitted variable bias
exists if the omitted variable is correlated with the included regressor and is a determinant of the dependent variable.
variance
expected squared deviation from the mean
proportionality model of heteroskedasticity
e.g., expenditure in Rhode Island has a smaller absolute variability than in California because the error is proportional to the scale of the variable (the relative, percentage variability is similar)
represents variation explained by regression
explained sum of squares
R^2=
explained sum of squares/total sum of squares
stochastic error term
explains all changes in Y not explained by changes in Xs
total sum of squares (SST)
explains total variation of the model, what's explained by the model and what is not
type 2 error
failing to reject an incorrect null hypothesis
Threats to internal validity lead to:
failures of one or more of the least squares assumptions.
in practice the most difficult aspect of IV estimation is
finding instruments that are both relevant and exogenous
GLS (generalized least squares)
a fix for heteroskedasticity. WLS (weighted least squares): use when you have enough information to describe why the error variance differs across observations. Feasible GLS: use when Var(u|x) is clearly nonconstant but depends on an unknown function f(x). In Stata: regress y x, then estat hettest to detect the problem, predict weight, residuals to obtain residuals, and regress y x [iweight = variable] to run the weighted regression.
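A hedged Stata sketch of WLS when the error variance is believed to be proportional to a known variable (here a hypothetical variable pop):
* check for heteroskedasticity after OLS
regress y x
estat hettest
* weighted least squares: aweights are taken as inversely proportional to the error variance
regress y x [aweight = 1/pop]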
Durbin-Watson d test
for FIRST-ORDER serial correlation; no lagged dependent variable, includes an intercept; has upper and lower critical values: if d is below the lower critical value, reject the null; if d is above the upper critical value, fail to reject the null
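In Stata the d statistic is reported by estat dwatson after a time-series regression (a sketch, assuming the data are indexed by a variable year):
tsset year
regress y x
estat dwatson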
how well the explanatory variable explains the dependent variable (if we only know x, how much can we say about y)
goodness of fit
our model explains a lot of the variation in Y but doesn't tell us anything causal
high r-squared only tells us that...
b0 meaning
how much of the dependent variable is fixed? The y-intercept of the best-fit line that minimizes the sum of squared residuals.
covariance
how much two random variables change together
R^2
how well the regression line fits the data
Omitted variable without Bias
if the correlation between the omitted variable and the included explanatory variables is 0, or if the coefficient on the omitted variable is 0
Multicollinearity
imperfect multicollinearity means one regressor is highly (but not perfectly) correlated with another; it can lead to one or more coefficients being estimated imprecisely
identification (GM assumption 4)
if the zero conditional mean assumption holds, then we may interpret our coefficients on our X variables as causal (a change in X does not systematically cause a change in Y other than through the impact of the coefficient)
fail to reject
if | t-statistic |<critical value, we ____________ the null hypothesis
why don't we use slope-intercept form?
implies causation
finding a small value of the p value
indicates evidence against the null hypothesis
Multivariate Regression Coefficient
indicates the change in the dependent variable associated with a one-unit increase in the independent variable in question
In the regression model y=β0 +β1x+β2d+β3(x×d)+u, where x is a continuous variable and d is a dummy variable, β2
indicates the difference in the intercept when d = 1 compared to the base group.
16. In the regression model y=β0 +β1x+β2d+β3(x×d)+u, where x is a continuous variable and d is a dummy variable, β3
indicates the difference in the slope parameter when d = 1 compared to the base group.
Variance inflation factor
is a method of detecting the severity of multicollinearity by looking at the extent to which a given explanatory variable can be explained by all other explanatory variables in an equation. The higher the VIF, the more severe the effects of multicollinearity; if VIF > 5, multicollinearity is severe.
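A minimal Stata sketch for inspecting VIFs after a multiple regression (hypothetical variables y, x1, x2, x3):
regress y x1 x2 x3
* reports a VIF for each regressor; values above roughly 5 signal severe multicollinearity
estat vif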
Adjusted R^2
is a modified version of R² that does not always increase when you add another regressor: adjusted R² = 1 − [(n − 1)/(n − k − 1)](SSR/TSS), so adjusted R² is always less than R². It quantifies the extent to which the regressors account for, or explain, the variation in the dependent variable; use it to decide whether to add a regressor.
Confidence Interval
is a range of values that will contain the true value of β a certain percentage of the time
marginal effect on x and y
is constant and equal to β₁: Δy = β₁Δx
reject null if p-value
is less than the level of significance (while beta has same sign as HA)
Multiple regression analysis
is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable. This is important both for testing economic theories and for evaluating policy effects when we must rely on nonexperimental data. Because multiple regression models can accommodate many explanatory variables that may be correlated, we can hope to infer causality in cases where simple regression analysis would be misleading.
A Control Variable in Multiple Regression
is not the object of interest in the study; rather, it is a regressor included to hold constant factors that, if neglected, could lead the estimated causal effect of interest to suffer from omitted variable bias
endogenous variable
is one that is correlated with u
exogenous variable
is one that is uncorrelated with u
The population regression line
is the relationship that holds between Y and X on average in the population
The standard error of the estimated coefficient, SE(Beta)
is the square root of the estimated variance of the ^β's; it is similarly affected by the size of the sample and the other factors we've mentioned. For example, an increase in sample size will cause SE(^β) to fall; the larger the sample, the more precise our coefficient estimates will be.
An estimator is unbiased if
its expected value is equal to the true population parameter it is supposed to be estimating
detect serial correlation
regress the residuals on lagged residuals (and the regressors) → Breusch-Godfrey test with a test statistic of NR²
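In Stata the Breusch-Godfrey test can be run after a time-series regression (a sketch, assuming a time variable year):
tsset year
regress y x
* test for serial correlation up to, say, two lags
estat bgodfrey, lags(2)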
Newey west standard errors
larger SE because serial correlation is accounted for;
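A minimal Stata sketch of Newey-West (HAC) standard errors, assuming time-series data indexed by a variable year:
tsset year
* OLS point estimates with standard errors robust to heteroskedasticity and serial correlation up to 2 lags
newey y x, lag(2)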
Meaning of Linear Regression
linear in parameters of B0 and B1. there are no restrictions on how y and x relate to the original explained and explanatory variables of interest
constant elasticity model
log log form suggests...
can be interpreted as percent changes for small deviations in the dependent variable; gives "midpoint" percentage changes; logarithmic changes are elasticities
logged coefficients
making comparisons across different scales (data that varies in magnitude)
logs are a way of...
When looking at the relationship between 2 variables, it is always good to start by
looking at a scatterplot
impure serial correlation
looks like serial correlation but because of some other specification error
in the context of a controlled experiment consider the simple linear regression formulation, let Yi be the outcome Xi the treatment level when the treatment is binary, and u contain all the additional determinants of the outcome. Then calling B^1 a difference estimator:
makes sense since it is the difference between the sample average outcome of the treatment group and the sample average outcome of the control group
The effect that x has on y is a ________ effect that is constant and equal to β₁
marginal
p-value
marginal significance value; probability of observing a t-score that size or larger if the null were true; lowest level of significance at which we can reject the null
MSR
mean square regression
minimum variance unbiased estimators
means that OLS has the smallest variance among unbiased estimators; we no longer have to restrict our comparison to estimators that are linear in the yi.
Imperfect multicollinearity:
means that two or more of the regressors are highly correlated
imperfect multicollinearity
means that two or more of the regressors are highly correlated
Imperfect Multicollearity
means two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor. It does not pose any problems to the theory of OLS, but at least one individual regressor will be imprecisely estimated (larger sampling variance).
level of significance
measure of probability of type 1 error
this factor (which involves the error variance of a regression of the true value of x1 on the other explanatory variables) will always be between zero and one; implies we are consistently biased towards zero
measurement error inconsistency
covariance
measures how much two random variables vary together; when one is big the other also tends to be big (height and weight of animals)
To obtain OLS estimates of the unknown coefficients β₀, β₁, ... , βk , we
minimize the sum of squared residuals (RSS) with respect to ^β₀, ^β₁, ... , ^βk :
the OLS estimator is derived by
minimizing the sum of squared residuals
pooled cross sections
multiple unit of observations multiple times, but different observations each time (change in property taxes on house prices)
panel/longitudinal data
multiple units of observation with multiple time observations for each (have both cross-sectional and time series dimensions) (city crime statistics)
linear in the coefficients
must be true to perform linear regression
Bernoulli random variable
a mutually exclusive, exhaustive binary variable (an extreme case of a discrete random variable); if the probabilities don't add up to 1, there must be other outcomes
instrumental variable
needed when some regressors are endogenous (correlated with the error term); involves finding instruments that are correlated with the endogenous regressors but uncorrelated with the error term
"In a regression, if the p-value for a coefficient is 0.0834, do you reject the null hypothesis of it being equal to zero at the 5% level? "
no
Irrelevant variable effect
no bias but increased variance and decreased adjusted-R^2
serial correlation effects
no bias in coefficients, biased SEs, OLS is no longer the minimum variance estimator
classical assumption 6
no explanatory variable is a perfect linear function of any other explanatory variable
davidson-mackinnon j test
non-nested model specification test; nnest in Stata
What to do about multicollinearity?
nothing, drop a redundant variable (based on theory), or try to center the variables
M
number of constraints, numerator's degrees of freedom for an F-test
time series data
observations of a variable or several variables over time; typical features include trends and seasonality and serially correlated (stock prices, GDP)
classical assumption 4
observations of the error term are uncorrelated
serially correlated
observations that occur before and after each other tend to be similar
T-distribution
obtained from a standard normal distribution
The true causal effect might not be the same in the population studied and the population of interest because
of differences in characteristics of the populations, geographical differences, or because the study is out of date
high correlation
OLS estimators are still unbiased, but the parameters are estimated with lower precision when regressors are correlated
lagged dependent variables as proxies
omitted unobserved factors may be proxied by the value of the dependent variable from an earlier time period
Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. You regress Y on X1 only and find no relationship, However when regressing Y on X1 and X2 the slope coefficient changes by a large amount, the first regression suffers from
omitted variable bias
The power of the test is
one minus the probability of committing a type II error.
The significance level of a test is:
the probability of rejecting the null hypothesis when it is true.
proxy variables
one way of dealing with omitted variables (but imperfect); think of running them as something more like a specification test, or a test for the influence of possible omitted variable bias (worried about nonrandom sampling)
Linear in the Coefficients
only if the coefficients (the βs) appear in their simplest form—they are not raised to any powers (other than one), are not multiplied or divided by other coefficients, and do not themselves include some sort of function (like logs or exponents).
Degrees of Freedom
the excess of the number of observations (N) over the number of coefficients estimated (K + 1, including the intercept). Higher DF = more reliable estimates.
OLS
ordinary least squares; the best-fit line that minimizes the sum of squared residuals, i.e., the squared differences between observed and fitted values
exogenous
originating outside the system, ie determined by processes unrelated to the question at hand (randomly assigned treatments and their outcomes)
endogenous
originating within the system, ie co-influential or jointly determined (education and earnings)
dummy variable trap is an example of
perfect multicollinearity
unobserved household environment
person specific
unobserved motivation
person-specific
the best way to interpret polynomial regressions is to:
plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for one or more values of X
Linear in Variables
plotting the function in terms of X and Y generates a straight line.
counterfactual (randomized experiments)
potential outcome of each individual as opposed to the other (if it were not the case that the individual received treatment, they would have been in control)
significance level
probability of rejecting the null when it is in fact true
quasi experiments
provide a bridge between the econometric analysis of observational data sets and the statistical ideal of a true randomized controlled experiment
confidence interval (CI)
provide a range of likely values for the population parameter, and not just a point estimate B=B^(+/-)C*SE(B^)
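A hedged Stata sketch of computing B^ ± c·SE(B^) by hand after a regression (hypothetical variables y and x):
regress y x
* c is the 97.5th percentile of the t distribution with the residual degrees of freedom
display _b[x] - invttail(e(df_r), 0.025)*_se[x]
display _b[x] + invttail(e(df_r), 0.025)*_se[x]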
multiple restrictions
placing more than one restriction on the parameters in the model
numerator degrees of freedom
q=dfr-dfur
econometrics
quantitative measurement of actual economic and business phenomena
stochastic variables
random variables
estimate of the treatment effect; yielded by the regression of Y on indicator variable Z
randomized experiments
they create groups that on average are virtually identical to each other (able to attribute difference in groups to treatment)
randomized experiments are the gold standard for answering causal questions because...
excluding a relevant variable or underspecifying the model
rather than including an irrelevant variable, we omit a variable that actually belongs in the true (or population) model.
biased toward zero
refers to cases where E(β̃₁) is closer to zero than β₁ is. Therefore, if β₁ is positive, then β̃₁ is biased toward zero if it has a downward bias. On the other hand, if β₁ < 0, then β̃₁ is biased toward zero if it has an upward bias.
White test for heteroskedasticity
regress the squared residuals on all explanatory variables, their squares, and interactions; detects more general deviations from homoskedasticity than the BP test
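In Stata the White test is available after a regression (a sketch, hypothetical variables y, x1, x2):
regress y x1 x2
* White test based on regressors, their squares, and interactions
estat imtest, white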
OLS
regression estimation technique that calculates the beta hats so as to minimize the sum of the squared residuals
classical assumption 1
regression is linear, specified correctly, and has an additive error term
dummy/indicator variable
regression of Y on an indicator variable for treatment X which takes on the value 1 when treatment occurred and 0 otherwise; 1 if person is a woman, 0 if person is a man
Type 1 error
reject a true null
type 1 error
rejecting a true null hypothesis
A predicted value of a dependent variable:
represents the expected value of the dependent variable given particular values for the explanatory variables.
represents variation not explained by regression
residual sum of squares
deviations from regression line
residuals
why partialling out works
residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables (slope coefficient from 2nd regression represents isolated effect of explanatory variable on dep. variable)
restricted vs. unrestricted model
restricted: log(salary) = β0 + β1 years + β2 gamesyr + u; unrestricted: log(salary) = β0 + β1 years + β2 gamesyr + β3 bavg + β4 hrunsyr + β5 rbisyr + u. The restricted model always has fewer parameters than the unrestricted model.
omitted variable bias
results in a misestimated coefficient on the included variable, which is trying to capture the roles of both variables "underspecifying the model"
Generalized least squares
rids an equation of first-order serial correlation problems and makes it a minimum-variance equation again; the standard errors and confidence intervals become accurate (typically wider than the understated OLS ones)
the larger the variability of the unobserved factors (bad variation)
sampling variability of the estimated regression coefficients will be higher...
the higher the variation in the explanatory variable (good variation)
sampling variability of the estimated regression coefficients will be lower...
standard deviation of B
sd(B)=σ/[SST(1-R^2)]^(1/2)
When testing joint hypotheses, you should
use the F-statistic and conclude that at least one of the restrictions does not hold if the statistic exceeds the critical value.
elastic demand
sensitive to changes in price and income
multivariate regression coefficient
serve to isolate the impact on Y of a change in one variable from the impact on Y of changes in the other variables
One of your friends is using data on individuals to study the determinants of smoking at your university. She is concerned with estimating marginal effects on the probability of smoking at the extremes. What should she use?
She should use the logit or probit, but not the linear probability model.
variance inflation factors
show how much the variance of one variable is inflated due to the addition of another variable (above 5 is a concern)
How could you determine whether this instrument (the difference in distance to cardiac catheterization and regular hospitals) is exogenous?
Since there is one endogenous regressor and one instrument, the J test cannot be used to test the exogeneity of the instrument; expert judgment is required to assess exogeneity.
we must make a case for using proxies and arguing that they do not threaten any inferences we make (especially causal ones)
since we can never observe our unobservables...
third moment of standardized variable
skewness (+ right, - left)
summary of functional forms involving logs
slide 18 notes 2-5
standard error of estimated model
small error compared to the mean indicates that unexplained results/errors are small compared to reality
Nonlinear least squares
solves the minimization of the sum of squared predictive mistakes through sophisticated mathematical routines, essentially by trial-and-error methods.
example of randomized controlled experiment
some 5th graders in a specific elementary school are allowed to use comps at school, while others are not, and their end of year performance is compared holding constant other factors
variance vs std deviation
squaring the std deviation emphasizes dispersion and draws attention to things that are unusual
install function in stata
ssc install ____
sample variance of y
SST/(n − 1) = Σ(yi − ȳ)²/(n − 1)
estimated standard deviations of the regression coefficients; measure how precisely the regression coefficients are estimated
standard errors
when you add state fixed effects to a simple regression model for U.S states over a certain time period and the regression R^2 increases significantly then it is safe to assume that
state fixed effects account for a large amount of the variation in the data
Ordinary least squares
sum of the squared vertical distances between the observed data points and the estimated regression line (the squared residuals); also called RSS
summarize in stata
summarize
Reject the null hypothesis if
t-stat falls outside of the critical values or if p-value ≤ α or if H₀: β₁falls within the confidence interval
The rejection rule for the one-sided alternative H₁: βj < 0
t < −c
t statistic
t=B^/se(B^)
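A minimal Stata sketch showing that the reported t statistic is just the coefficient over its standard error (hypothetical regressor x):
regress y x
display _b[x]/_se[x]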
Dummy Variable
takes on the value of one or zero (and only those values) depending on whether a specified condition is met. e.g Male=1 Female=0
reduced form estimation
testing something close to the assumptions of the model. more potential sources for error
The cumulative probability distribution shows the probability
that a random variable is less than or equal to a particular value.
in the case of the simple regression model Y=Bo+B1Xi+ui, i=1 when X and u are correlated then
the OLS estimator is inconsistent
In testing multiple exclusion restrictions in the multiple regression model under the classical assumptions, we are more likely to reject the null that some coefficients are zero if:
the R-squared of the unrestricted model is large relative to the R-squared of the restricted model.
To decide whether Y = β0 + β1X + u or ln(Y) = β0 + β1X + u fits the data better, you cannot consult the regression R² because
the TSS is not measured in the same units in the two models
In the probit regression, the coefficient beta 1 indicates:
the change in the the z-value associated with a unit change in X.
Econometrics
the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy. The most common application of econometrics is the forecasting of such important macroeconomic variables as interest rates, inflation rates, and gross domestic product.
The Student t distribution is:
the distribution of the ratio of a standard normal random variable, divided by the square root of an independently distributed chi−squared random variable with m degrees of freedom divided by m.
Each slope coefficient βj, in the multiple regression, measures
the effect of a one unit change in the corresponding regressor Xji , holding all else (e.g. the other regressors) constant.
why is the variance of the error term the same as the variance of our dependent variable y?
the error shows up in the equation containing population parameters, which are never observable the residual shows up in equations containing sample parameters, which are computed from the data
MLR.6
the error term is independent of the explanatory variables x1, x2, x3,....xk and is normally distributed with mean zero and variance o^2
classical assumption 7
the error term is normally distributed
A type I error is
the error you make when rejecting the null hypothesis when it is true.
Internal validity is that:
the estimator of the causal effect should be unbiased and consistent.
The regression R2 is a measure of
the goodness of fit of your regression line.
15. Which of the following is not correct in a regression model containing an interaction term between two independent variables, x1 and x2:
the interaction term coefficient is the effect of a unit increase in √x₁x₂.
When Xi is a binary variable
the interpretation of the estimated coefficients ^β₀ and ^β₁is different. ^β₁measures the difference in ^Yi between Xi=0 and Xi=1
Take an observed (that is, estimated) 95% confidence interval for a parameter of a multiple linear regression. If you increase the confidence level to 99% then, necessarily:
the length of the confidence interval increases.
Total Sum of Squares
the sum, over all observations, of the squared differences of each observation from the overall mean.
Explained Sum of Squares (ESS)
the total variation of the fitted Y values around their average (i.e. the variation that is explained by the regression): • ESS = Σ(i)(^Yi-~Y)²
reason why estimators have a sampling distribution
the values of explanatory variable and the error term differ across samples
MLR.5 homoskedasticity
the variance of ui is the same for all xi and all i
the larger F is, the larger SSR restricted relative to SSR unrestricted
the worse the explanatory power of the restricted model, implying H0 is false
2. The availability of computerized adaptive learning tools in the district. If this variable is omitted, it will likely produce a(an) ___________ bias of the estimated effect on tests scores of increasing the number of computers per student.
upward
MSE=
variance + bias^2 (the lower the better!)
homoskedasticity
variance does not change for different observations of the error term
1) total sample variation 2) linear relationships between x variables 3) error variance
what determines the size of our standard errors?
coefficient of determination (R^2)
what share of all variation there is to explain, does your model explain?
r squared can never decrease
when another independent variable is added to a regression
Perfect multicollinearity is
when one of the regressors is an exact linear function of the other regressors.
omitted variable bias (OVB)
when our model has omitted an important variable that is correlated with one or more of our included (x) variables, causing biased estimators
inverse functional form
a functional form in which the impact of the independent variable on Y approaches zero as that variable approaches infinity (e.g., Y = β0 + β1(1/X) + u)
Level-log
y = b0 + b1·log(x): for every 1% increase in x, y increases/decreases by b1/100 UNITS.
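A minimal Stata sketch of a level-log specification, assuming the dataset has variables y and x:
gen lnx = ln(x)
* a 1% increase in x is associated with a change in y of about _b[lnx]/100 units
regress y lnx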
"In a regression, if the confidence interval for a coefficient is (1.83, 2.76), do you reject the null hypothesis of the coefficient being equal to zero at the 5% level?"
yes
In the simple regression model y = β0 + β1x + u, the simple average of the OLS residuals is
zero.
two-tailed test
|t|>c c is chosen to make the area in each tail of the t distribution equal 2.5%. In other words, c is the 97.5th percentile in the t distribution with n-k-1 degrees of freedom. When n-k-1=25, the 5% critical value for a two-sided test is c=2.060.
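The 5% two-sided critical value quoted above can be reproduced in Stata:
* 97.5th percentile of the t distribution with 25 degrees of freedom
display invttail(25, 0.025)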
The estimates ^β₁ and ^β₂ have partial-effect, or ceteris paribus, interpretations
Δ^y = ^β₁Δx₁ + ^β₂Δx₂; holding x₂ fixed (Δx₂ = 0), Δ^y = ^β₁Δx₁; holding x₁ fixed, Δ^y = ^β₂Δx₂. In general, Δ^y = ^β₁Δx₁ + ^β₂Δx₂ + ... + ^βkΔxk, so holding all other regressors fixed, Δ^y = ^βjΔxj.
Assume that Y is normally distributed N(μ, σ²). To find Pr(c1 ≤ Y ≤ c2), where c1 < c2 and di = (ci − μ)/σ, you need to calculate Pr(d1 ≤ Z ≤ d2) =
Φ(d2) − Φ(d1).
Properties of OLS
• The sample regression function obtained through OLS always passes through the sample mean values of X and Y . • ~û = (Σ(i) ûi)/n = 0 (mean value of residuals is zero) • Σ(i) ûiXi = 0 (^ui and Xi are uncorrelated) • Given the OLS assumptions and homoscedastic errors, the OLS estimators have minimum variance among all unbiased estimators of the β's that are linear functions of the Y 's. They are Best Linear Unbiased Estimators (BLUE).
We can decompose each Yi value into the fitted (or predicted) part given Xi and the residual part, which we called ^ui :
• Yi = ^Yi + ^ui • Yi − ~Y = (^Yi − ~Y) + ^ui • (Yi − ~Y) = (^Yi − ~Y) + (Yi − ^Yi) • Σ(i)(Yi − ~Y)² = Σ(i)(^Yi − ~Y)² + Σ(i)(Yi − ^Yi)² • TSS = ESS + RSS
Root Mean Squared Error (RMSE)
• ^σu = √^σ²u = √Σ(i)û²i/(n-2) = √Σ(i)(Yi-^Yi)²/(n-2) • Also called the standard error of the regression (SER) • is a measure of the deviation of the Y values around the regression line
OLS estimators of β₀ and β₁
• denoted ^β₀ and ^β₁ • The estimators that minimize the sum of squared residuals ∑(i=1,n) (Yi-b₀-b₁Xi)²
There a 5 basic steps to a standard hypothesis test. What are these?
• null and alternative hypotheses • test statistic (with distribution under the null) • significance level and critical value • decision rule • inference and conclusion