BUSN 5000 Final Review
The test-sample MSPEs are _____ for lasso and _____ for OLS. (Round to 3 decimal places and in X.XXX format.)
.543, .544
Consider two estimators of the population mean μ, μ̂ and μ̃. Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. MSE(μ̃) = _____.
.5 (variance .25 plus squared bias .25)
Consider two estimators of the population mean μ, μ̂ and μ̃. Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. E(μ̂) − μ = _____, which implies μ̂ is _____.
0, unbiased
Consider two estimators of the population mean μ, μ̂ and μ̃. Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. MSE(μ̂) = _____.
1
The benefit difference in differences is _____ .
88
Suppose yi=β0+β1xi1+β2xi2+β3xi3+β4xi4+ui. To test the null that β3=β4=0, you use an ______ test, which compares the fit of a short regression that ______ x3 and x4 with the fit of a long regression that ______ them.
F, omits, includes
True or false. If corr(x,y)=0, y does not depend on x.
False
yi = β0 + β1xi1 + β2xi2 + ui, i = 1, ..., N, (2) where E(ui|xi1,xi2) = 0. True or false: R^2 is centrally important for doing causal inference.
False
Cross-validation involves computing the _____ for each fold and _____ them over all folds.
MSPE, averaging
The CV(M) plot shows how _____ varies with _____ strength.
MSPE, penalty
E(yi|Di=1) − E(yi|Di=0) = [E(y1i|Di=1) − E(y0i|Di=1)] (TERM 1) + [E(y0i|Di=1) − E(y0i|Di=0)] (TERM 2). (1)
The Switching Equation
True or false. If x causes y, the conditional distribution of y given x must depend on x.
True
Consider two estimators of the population mean μ, μ̂ and μ̃. Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. Although μ̃ is _____, it has a lower _____.
biased, mean squared error
Choosing the best-performing ML model involves empirically tuning model complexity through _____.
cross-validation
R¯^2 is an adjustment of R^2 to account for the _____ in SSR and SST.
degrees of freedom
The training sample is divided into _____ , one of which is held out for _____ while the others are used to _____ the model.
folds, validation, estimate
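The fold procedure can be sketched in a few lines. This is a minimal illustration, not the course's code: the simulated data, the single-regressor model, and the helper names fit_line and cv_mspe are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single regressor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cv_mspe(xs, ys, k=5):
    """Average the held-out MSPE over k folds of the training sample."""
    fold_mspes = []
    for fold in range(k):
        # Every k-th observation (offset by `fold`) is held out for validation.
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        # The remaining folds are used to estimate the model.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in valid]
        fold_mspes.append(sum(errors) / len(errors))
    return sum(fold_mspes) / k

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(100)]      # simulated regressor
ys = [1.0 + 2.0 * x + random.gauss(0, 1) for x in xs]  # simulated outcome
print(cv_mspe(xs, ys))
```

In a regularized model the same loop would be rerun for each candidate value of the tuning parameter, and the value with the lowest cross-validated MSPE would be kept.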
Take θ to be some unknown parameter and θ̂ to be an estimator of θ. E[(θ̂ − θ)^2] is the _____ of the estimator.
mean squared error
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) If β0 and β1 solve the population least-squares problem, their values ______ the expected value of the _____ difference between the dependent variable and the CEF.
minimize, squared
The high-earner group is (more/less) _____ male and (more/less) _____ married, but the male and married shares (do/do not) _____ change over time for either group.
more, more, do not
The distribution of the running variable should show
no evidence of manipulation because it is smooth through the cutoff.
Unlike in standard regression analysis, however, there is no _____ in a sharp RD design, because individuals with different values of Di have different values of the covariate by construction.
overlap
Based on the information in Table 1, Project STAR had good _____ and covariate _____ across class types.
overlap, balance
If η varied by group, the _____ assumption would not hold.
parallel trends
The key identifying assumption in a DD analysis is that the treated and untreated outcomes would follow _____ trends in the _____ of the treatment.
parallel, absence
Regularization ______ model complexity by restricting the regression _____ .
penalizes, coefficients
The coefficient plot shows how the estimated coefficients vary with the _____ strength and which variables are _____ as the penalty is strengthened.
penalty, omitted
The key identifying assumption of an RD design is that the average _____ outcomes are _____ through the cutoff.
potential, continuous
LASSO is a _____ estimator that also performs variable _____ by forcing the coefficients of the least relevant variables to be equal to _____.
shrinkage, selection, 0
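Shrinkage and selection can be seen in one special case. The sketch below assumes orthonormal regressors, where the lasso estimate is the OLS coefficient pulled toward zero by the penalty and set to exactly zero once the penalty exceeds its absolute value (soft-thresholding). The coefficient values and the name soft_threshold are hypothetical.

```python
def soft_threshold(b_ols, penalty):
    """Lasso coefficient under orthonormal regressors (an assumption of this sketch)."""
    shrunk = max(abs(b_ols) - penalty, 0.0)
    return shrunk if b_ols >= 0 else -shrunk

# Hypothetical OLS estimates for three regressors.
ols = {"x1": 2.0, "x2": -0.8, "x3": 0.1}

for penalty in (0.0, 0.5, 1.0):
    lasso = {name: soft_threshold(b, penalty) for name, b in ols.items()}
    print(penalty, lasso)
```

As the penalty rises, x3 drops out first, then x2, while x1 is only shrunk. With correlated regressors the solution has no such closed form and is computed numerically.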
Machine learning that involves predicting an outcome with a set of explanatory variables is called _____ learning.
supervised
R¯^2 penalizes the inclusion of an additional explanatory variable if its associated _____ is less than _____.
t statistic, 1
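A small numerical sketch of the degrees-of-freedom adjustment, using made-up sums of squares. The function name adjusted_r2 and all numbers are assumptions for illustration; k counts slope coefficients, with an intercept also in the model.

```python
def adjusted_r2(ssr, sst, n, k):
    """R-bar-squared: 1 - [SSR / (n - k - 1)] / [SST / (n - 1)]."""
    return 1.0 - (ssr / (n - k - 1)) / (sst / (n - 1))

n, sst = 50, 100.0   # hypothetical sample size and total sum of squares
ssr_short = 40.0     # hypothetical SSR with k = 2 regressors
ssr_long = 39.9      # hypothetical SSR after adding a weak third regressor

r2_short, r2_long = 1 - ssr_short / sst, 1 - ssr_long / sst
adj_short = adjusted_r2(ssr_short, sst, n, 2)
adj_long = adjusted_r2(ssr_long, sst, n, 3)
print(r2_short, r2_long)    # R^2 cannot fall when a regressor is added
print(adj_short, adj_long)  # adjusted R^2 falls here because the added fit is tiny
```

In this example the implied squared t statistic of the added regressor is (40 − 39.9)/(39.9/46), about 0.12, which is below 1 and so consistent with the rule on the card.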
Cross-validation begins by dividing the data into _____ and _____ samples.
training, test
Cross-validation is repeated for different values of the _____ parameter, which determines the strength of the _____ imposed by the regularizer.
tuning, penalty
Take θ to be some unknown parameter and θ̂ to be an estimator of θ. If E(θ̂) = θ, then E[(θ̂ − θ)^2] equals the _____ of θ̂.
variance
Take θ to be some unknown parameter and θ̂ to be an estimator of θ. E[(θ̂ − θ)^2] can be decomposed into the sum of the _____ and square of the _____ of θ̂.
variance, bias
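The decomposition can be checked by simulation. The sketch below uses a hypothetical estimator with a normal sampling distribution, bias 0.3, and standard deviation 0.4 around a true value of 2; these numbers and the name mse_from_parts are assumptions for the example.

```python
import random

def mse_from_parts(variance, bias):
    """Mean squared error as variance plus squared bias."""
    return variance + bias ** 2

random.seed(1)
theta = 2.0  # hypothetical true parameter value

# Draws from the assumed sampling distribution of the estimator.
draws = [random.gauss(theta + 0.3, 0.4) for _ in range(200_000)]

# Average squared distance between the estimator and the truth.
simulated_mse = sum((d - theta) ** 2 for d in draws) / len(draws)

print(simulated_mse, mse_from_parts(0.4 ** 2, 0.3))
```

The simulated value should sit close to 0.16 + 0.09 = 0.25, up to simulation noise.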
yi = β0 + β1xi1 + β2xi2 + ui, i = 1, ..., N, (2) where E(ui|xi1,xi2) = 0. In (2), the test statistic for the null hypothesis that β2 = 1 is ________.
(β̂2 − 1)/se(β̂2)
If treatment assignment is randomized, then TERM 2 equals ______ and TERM 1 equals the ______.
0, ATE
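A toy table of potential outcomes makes TERM 1 and TERM 2 concrete. The six units below are invented for illustration; in real data only one potential outcome per unit is observed, so this calculation is possible only in a thought experiment.

```python
# Hypothetical units: (y0, y1, d) = untreated outcome, treated outcome, treatment status.
units = [
    (10, 14, 1),
    (12, 15, 1),
    (11, 15, 1),
    (6, 9, 0),
    (7, 9, 0),
    (8, 12, 0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

treated = [u for u in units if u[2] == 1]
untreated = [u for u in units if u[2] == 0]

# Observed difference in means: E(y|D=1) - E(y|D=0).
naive = mean(y1 for _, y1, _ in treated) - mean(y0 for y0, _, _ in untreated)

# TERM 1: average treatment effect on the treated.
att = mean(y1 - y0 for y0, y1, _ in treated)

# TERM 2: difference in untreated outcomes between the two groups.
selection_bias = mean(y0 for y0, _, _ in treated) - mean(y0 for y0, _, _ in untreated)

print(naive, att, selection_bias)
```

Here the treated units would have had higher outcomes even without treatment, so TERM 2 is 4 and the observed difference overstates TERM 1. Under random assignment the two groups would have the same average y0, and TERM 2 would be 0 in expectation.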
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) If E(ui|xi1) = 0 in (1), the sampling error of β̂1 converges to _____ and β̂1 is ______.
0, consistent
yi = β0 + β1xi1 + β2xi2 + ui, i = 1, ..., N, (2) where E(ui|xi1,xi2) = 0. If you omit xi2 from (2), β̂1 will be unbiased only if β2 = ______ or xi1 and xi2 are ______.
0, uncorrelated
The ldurat difference in differences is _____ .
0.20
The result in Column (1) is statistically significant at the _____-percent level.
1
On average, men with complete IQ test score data earn approximately $______ more per hour (round to the nearest dime) and have _____ more years of schooling (round to the nearest year).
1.2, 2
The estimated coefficient of teacher experience in Column (2) suggests that an additional year of experience is associated with about a _____ point increase (round to one decimal place) in test scores.
1.5
The results in Column (1) indicate that the effect of adding an aide to a regular class is just under _____ of a point and statistically (significant/insignificant) _____.
1/3, insignificant
This difference amounts to about _____ of a standard deviation in the baseline regular-class sample. (Give a fraction or percent approximation).
1/5
This percentage-point effect translates into about a/an _____ percent increase on the likelihood of having 12 or more drinks in a year. (Answer with an integer value).
11
Their preferred specification indicates a _____ percentage-point increase in the proportion of days drinking. (Answer with an integer value).
2
On average, men with missing IQ test scores are ______ points _____ likely to be from the south and _____ points _____ likely to live in a city.
20, more, 9, less
This percentage-point effect translates into about a/an _____ percent increase in the proportion of days drinking. (Answer with an integer value.)
21
Based on the results in Column (4), the effect of being assigned to a small class is roughly the same as having a teacher with _____ years more experience (round to the nearest whole number).
24
Their preferred specification indicates a statistically significant _____ percentage-point effect on the likelihood of having 12 or more drinks in a year. (Answer with an integer value).
6
Based on the results in Column [2], the estimated return to the first year of experience is about ______ % (report one decimal place).
8.1
The t statistic for the teacher-experience coefficient estimate in Column (2) is _____ . (Round to one decimal).
8.8
What is the impact of teacher experience on the estimated class-size effect?
Adding teacher experience to the model has essentially no impact on the simple differences-in-means estimate given in Column (1).
In a sharp RD design, the _____ holds automatically because treatment assignment is determined solely by the _____ value of the _____ variable.
CIA, cutoff, running
The parameter δ represents the _____.
DD
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) The ______ theorem says you can control for other explanatory variables in estimating the effect of an x on y by either including the other variables directly or regressing y on the ______ from a regression of x on the other variables.
Frisch-Waugh-Lovell, residuals
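The equivalence can be verified numerically with two regressors. This sketch uses simulated data and closed-form two-regressor OLS; the data-generating process and the helper names are assumptions for the example.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def long_regression_b1(y, x1, x2):
    """Coefficient on x1 from OLS of y on an intercept, x1 and x2."""
    yd, x1d, x2d = demean(y), demean(x1), demean(x2)
    s11, s22, s12 = dot(x1d, x1d), dot(x2d, x2d), dot(x1d, x2d)
    return (s22 * dot(x1d, yd) - s12 * dot(x2d, yd)) / (s11 * s22 - s12 ** 2)

def fwl_b1(y, x1, x2):
    """Same coefficient from regressing y on the residuals of x1 on x2."""
    x1d, x2d = demean(x1), demean(x2)
    gamma = dot(x1d, x2d) / dot(x2d, x2d)  # slope of x1 on x2
    resid = [a - gamma * b for a, b in zip(x1d, x2d)]
    return dot(resid, y) / dot(resid, resid)

random.seed(3)
x2 = [random.gauss(0, 1) for _ in range(500)]
x1 = [0.5 * b + random.gauss(0, 1) for b in x2]  # x1 is correlated with x2
y = [1.0 + 2.0 * a - 3.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

print(long_regression_b1(y, x1, x2), fwl_b1(y, x1, x2))
```

The two printed numbers agree up to floating-point error, and both are close to the simulated coefficient of 2.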
CD lump all causes potentially related to _____ into the ______ (internal/external) category and all causes unrelated to ______ into the ______ (internal/external) category.
alcohol, external, alcohol, internal
Under the assumptions of a sharp RD design, you identify an
average treatment effect measured at the cutoff.
yi = β0 + β1xi1 + β2xi2 + ui, i = 1, ..., N, (2) where E(ui|xi1,xi2) = 0. Let's say you don't omit xi2, but it is measured with error. Then β̂2 will be ______. (unbiased/biased down/biased up)
biased down
The basis for an RD analysis should be apparent in a _____ scatter plot of the outcome and _____ variable.
binned, running
CD show that drinking increases after the 21st birthday on _____
both the extensive and intensive margins
Basic OLS inference is grounded in the application of the _____, which says that the ______ of the OLS estimator can be regarded as approximately _____ for large samples.
central limit theorem (CLT), sampling distribution, normal
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) The coefficient β1 measures the _____ in y _____ with a _____ in x1, holding all of the unobservables constant.
change, associated, unit change
Computing the correct standard errors for TWFE estimates usually requires _____ at the group level to account for _____ and _____ correlation.
clustering, heteroscedasticity, serial
The test statistic for whether an explanatory variable has a statistically significant association with the dependent variable is the ratio of the explanatory variable's ______ to its _____.
coefficient estimate, standard error
The ______ captures how the population average of one random variable varies with the values of another random variable.
conditional expectation function (CEF)
In the above DAG, Z is a _____.
confounder
You can't observe the effect of a treatment on an individual because you can't observe the ______ outcome. In this sense, causal inference is a ______ data problem.
counterfactual, missing
The parameter η also reflects the _____ average difference in outcomes between periods 0 and 1 for the _____.
counterfactual, treated
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) The value of β1 that solves the population least-squares problem is: _____
cov(xi1, yi)/var(xi1)
An RD analysis of baseline _____ should show no evidence of _____ among them.
covariates, discontinuities
Including the baseline _____ in the regression model (should/should not) _____ affect the estimated treatment effect.
covariates, should not
In a fuzzy RD design, the _____ value of the _____ variable determines the _____ of treatment.
cutoff, running, probability
Including the IQ test score as an ability proxy ______ the estimated return to schooling by about ______ percentage points (report one decimal place).
decreases, 0.6
yi = β0 + β1xi1 + β2xi2 + ui, i = 1, ..., N, (2) where E(ui|xi1,xi2) = 0. R^2 measures how much of the variance of the ______ variable is accounted for by the ______ variables.
dependent, explanatory
Column (1) indicates that time out of work (did/did not) _____ rise for low earners.
did not
yi = β0 + β1xi1 + β2xi2 + ui, i = 1, ..., N, (2) where E(ui|xi1,xi2) = 0. If xi1 is education and xi2 is labor market experience, and you omit xi2 from (2), then β̂1 will be biased ______ because β2 is _____ and xi1 and xi2 are _____ correlated.
downward, positive, negatively
A model's MSPE is the _____ value of the _____ prediction error arising from an out-of-sample prediction.
expected, squared
A regression formulation of a DD design is appealing because it
facilitates standard error estimation, generalizes for multiple time periods and treatment groups, accommodates covariates
If we say E(y|x) = β0 + β1x where β0 and β1 solve the population least-squares problem, then the CEF is the population regression _____ and β0 and β1 are population regression _____.
function, coefficients
The standard 2×2 DD analysis can be carried out by regressing the outcome on a _____ dummy, a period _____, and their _____.
group, dummy, interaction
We described a TWFE model as a regression model for data with both a _____ and _____ dimension.
group, time
The R function lm gives the wrong standard errors, test statistics and confidence intervals because it ignores ______.
heteroscedasticity
The modern approach to regression inference is to allow for _____ in the regression errors, which implies the variance of the errors depends on the ______.
heteroscedasticity, explanatory variables
Estimating a TWFE model with data on multiple groups and variation in treatment timing can identify the ATT if the treatment effect is _____ .
homogeneous
The conditional independence assumption (CIA) is a claim that there is a set of covariates that once you control for them, you can consider the potential outcomes to be ______ of treatment assignment. The CIA is a claim of un______ and is un______.
independent, confoundedness, testable
However, if the potential outcomes are ______ of treatment assignment, the assignment mechanism is ______ and the difference in sample average outcomes for treated and untreated individuals will identify the ATE.
independent, ignorable
Potential outcomes will be ______ of treatment assignment if individuals are ______ assigned to treated and untreated groups.
independent, randomly
Using the difference in sample average outcomes for treated and untreated individuals generally won't work for estimating the ATE because potential outcomes are not ______ of treatment assignment, which results in ______ bias.
independent, selection
The distinction between causes of death is important because estimating the treatment effect on _____ causes should function as a falsification exercise.
internal
Overall, the simple differences-in-means result _____ (is/is not) highly robust, holding up even when you _____ for a range of student and school characteristics.
is, adjust
The high-earner group is (more/less) _____ likely to work in manufacturing and (more/less) _____ likely to work in construction, and the share of high earners in construction (rises/falls) _____ by _____ points after the WBA increase.
less, more, falls, 4
The population regression function provides the best ______ to the CEF.
linear approximation
The population regression function provides the best _____ of the dependent variable, given the explanatory variables
linear predictor
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) If E(ui|xi1) = 0 in (1), xi1 is _____ of ui, and the sampling error of β̂1 equals ______ on average, which implies that β̂1 is ______.
mean independent, 0, unbiased
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) When the PRF includes more than one x, we say that β1 measures the _____ effect of x1 (without necessarily giving a causal interpretation).
partial
In general, the RD specification should include a low-order _____ in the running variable and _____ of the running variable with the treatment indicator.
polynomial, interactions
While individual treatment effects are not observable, you may be able to identify the average treatment effect (ATE), which is the difference in average ______ outcomes.
potential
Under _____ assignment of students to class type, this can be regarded as the _____ of small class size on test scores.
random, ATE
Figure 1. Stylized sharp RD design. The black lines are _____ approximations to potential outcome CEFs.
regression
If you have a set of control variables for which a CIA holds, you can identify the average effect of the treatment on the outcome using a ______ of the outcome on a ______ dummy and ______ .
regression, treatment, control variables
As an alternative to the procedure sketched out in the previous question, by the Frisch-Waugh-Lovell theorem, you could simply run a regression of the outcome on the ______ from a regression of the treatment indicator on the controls.
residuals
The modern approach means we should always report _____ standard errors and test statistics.
robust
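To see why the choice of standard error matters, the sketch below compares the classical formula with a heteroscedasticity-robust one for a single-regressor model. It assumes the HC0 (White) variance estimator and simulated errors whose spread depends on x; the course may use a different robust variant, and the name slope_with_ses is hypothetical.

```python
import math
import random

def slope_with_ses(xs, ys):
    """OLS slope with classical and heteroscedasticity-robust (HC0) standard errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    xd = [x - x_bar for x in xs]
    sxx = sum(d * d for d in xd)
    b1 = sum(d * (y - y_bar) for d, y in zip(xd, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    # Classical formula: one common error variance for every observation.
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0 formula: each squared residual is weighted by its own squared deviation of x.
    robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(xd, resid))) / sxx
    return b1, classical, robust

random.seed(2)
xs = [random.uniform(0, 10) for _ in range(2000)]
# Simulated errors whose spread grows with the distance of x from 5 (an assumption).
ys = [1.0 + 2.0 * x + random.gauss(0, (x - 5) ** 2) for x in xs]

b1, classical_se, robust_se = slope_with_ses(xs, ys)
print(b1, classical_se, robust_se)
```

In this design the noisiest observations are also the ones with the most leverage on the slope, so the classical standard error is too small and the robust one is larger.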
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) If there were more than one x in (1), then the formula for β1 would be the _____, except xi1 would be replaced with the _____ from a regression of xi1 on the other 'x's.
same, residuals
yi = β0 + β1xi1 + ui, i = 1, ..., N. (1) The OLS estimator for β1 can be obtained by plugging in the _____ of xi and yi for their ______ and plugging in another _____ for each outer expectation.
sample averages, population averages, sample average
A simple comparison of treated vs control observations after treatment misses factors that cause non-random _____ into treatment.
selection
E(yi|Di=1) − E(yi|Di=0) = [E(y1i|Di=1) − E(y0i|Di=1)] (TERM 1) + [E(y0i|Di=1) − E(y0i|Di=0)] (TERM 2). (1) TERM 2 in (1) is
selection bias
The CEF that captures the simple class-size treatment effects on test scores equivalent to the differences in means obtained from Table 1 is E(yi|___,___)=β0+β1___+β2___.
small, regular+aide
Larger ______ statistics and smaller ______ values indicate ______ evidence against the null hypothesis.
test, p, more
E(yi|Di=1) − E(yi|Di=0) = [E(y1i|Di=1) − E(y0i|Di=1)] (TERM 1) + [E(y0i|Di=1) − E(y0i|Di=0)] (TERM 2). (1) TERM 1 in (1) is
the average treatment effect on the treated
A DD analysis targets the average treatment effect on the _____ or E(y1i−y0i|Di=_____)
treated, 1
The standard 2×2 DD analysis compares the difference in average outcomes for the _____ observations _____ and _____ treatment with the difference in mean outcomes for the controls _____ and _____ treatment.
treated, before, after, before, after
The parameter γ reflects the average difference between _____ and _____ outcomes.
treated, untreated
To estimate the ATE under a CIA, you also need overlap, which is the ability to observe ______ and ______ units for any set of covariate values.
treated, untreated
A simple before vs after comparison of treated observations misses _____ that are shared by the control group.
trends
The target estimand cannot be estimated directly because E(y0i|Di=1) is _____.
unobservable
yi = β0 + β1xi1 + β2xi2 + ui, i = 1, ..., N, (2) where E(ui|xi1,xi2) = 0. If you omit xi2 from (2), β̂1 will be biased ______ if β2 and cov(xi1,xi2) have the same ______.
upward, sign
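The sign rule follows from the omitted-variable-bias formula. A short derivation, where β̃1 (notation introduced here, not from the course) is the population coefficient from the short regression of yi on xi1 alone, and cov(xi1, ui) = 0 follows from E(ui|xi1,xi2) = 0:

```latex
\begin{aligned}
\tilde{\beta}_1
  &= \frac{\operatorname{cov}(x_{i1},\, y_i)}{\operatorname{var}(x_{i1})} \\
  &= \frac{\operatorname{cov}(x_{i1},\, \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i)}{\operatorname{var}(x_{i1})} \\
  &= \beta_1 + \beta_2\,\frac{\operatorname{cov}(x_{i1},\, x_{i2})}{\operatorname{var}(x_{i1})}
\end{aligned}
```

The bias term is positive when β2 and cov(xi1, xi2) share a sign, negative when their signs differ, and zero when either is zero.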
A formal expression of the DD regression consistent with Table 2 is:
y = μ + γ·treat + η·after + δ·treat·after + u
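Because this regression has one parameter per group-period cell, its coefficients can be read directly off the four cell means. The sketch below uses invented cell means, not the values from Table 2.

```python
# Hypothetical average outcomes, keyed by (treat, after).
cell_means = {
    (0, 0): 10.0,  # control group, before
    (0, 1): 12.0,  # control group, after
    (1, 0): 11.0,  # treated group, before
    (1, 1): 16.0,  # treated group, after
}

mu = cell_means[(0, 0)]          # control-group mean before treatment
gamma = cell_means[(1, 0)] - mu  # baseline gap between treated and control
eta = cell_means[(0, 1)] - mu    # common change between periods
# Difference in differences: treated change minus control change.
delta = (cell_means[(1, 1)] - cell_means[(1, 0)]) - (cell_means[(0, 1)] - cell_means[(0, 0)])

def fitted(treat, after):
    """Fitted value from the 2x2 DD regression."""
    return mu + gamma * treat + eta * after + delta * treat * after

print(mu, gamma, eta, delta)
```

With these numbers the treated group rises by 5 and the controls by 2, so δ is 3, and the fitted values reproduce all four cell means exactly.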
Select the regression specification that is consistent with the black lines.
yi=β0+β1xi+τDi+ui
Under the key identifying assumption of a sharp RD design, the model in Question 7 identifies
τ=E(y1i−y0i|xi=c)