BUSN 5000 Final Review

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

The test-sample MSPEs are _____ for lasso and _____ for OLS. (Round to 3 decimal places and in X.XXX format.)

.543, .544

consider two estimators of the population mean μ, μ̂ and μ̃ . Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. MSE(μ̃ )= _____ .

.75

consider two estimators of the population mean μ, μ̂ and μ̃ . Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. E(μ̂ )−μ= _____, which implies μ̂ is _____.

0, unbiased

consider two estimators of the population mean μ, μ̂ and μ̃ . Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. MSE(μ̂ )= _____ .

1

The benefit difference in differences is _____ .

88

Suppose yi=β0+β1xi1+β2xi2+β3xi3+β4xi4+ui. To test the null that β3=β4=0, you use an ______ test, which compares the fit of a short regression that ______ x3 and x4 with the fit of a long regression that ______ them.

F, omits, includes

True or false. If corr(x,y)=0, y does not depend on x.

False

yi=β0+β1xi1+β2xi2+ui,,i=1,...,N,(2); where E(ui|xi1,xi2)=0 True or false: R2 is centrally important for doing causal inference.

False

Cross-validation involves computing the _____ for each fold and _____ them over all folds.

MSPE, averaging

The CV(M) plot shows how _____ varies with _____ strength.

MSPE, penalty

E(yi|Di=1)−E(yi|Di=0)=E(y1i|Di=1)−E(y0i|Di=1) : TERM 1 + E(y0i|Di=1)−E(y0i|Di=0)(1) : TERM 2

The Switching Equation

True or false. If x causes y, the conditional distribution of y given x must depend on x.

True

consider two estimators of the population mean μ, μ̂ and μ̃ . Assume that μ̂ has a N(0,1) sampling distribution, μ̃ has a N(.5,(.5)^2) sampling distribution, and the true mean of the population is 0. Although μ̃ is _____, it has a lower _____.

biased, mean squared error

Choosing the best-performing ML model involves empirically tuning model complexity through _____.

cross-validation

R¯^2 is an adjustment of R^2 to account for the _____ in SSR and SST.

degrees of freedom

The training sample is divided into _____ , one of which is held out for _____ while the others are used to _____ the model.

folds, validation, estimate

take θ to be some unknown parameter and θ̂ to be an estimator of θ. E[(θ̂ −θ)^2] is the _____ of the estimator.

mean squared error

yi=β0+β1xi1+ui,i=1,...,N.(1); f β0 and β1 solve the population least-squares problem their values ______ the expected value of the _____ difference between the dependent variable and the CEF.

minimize, squared

The high-earner group is (more/less) _____ male and (more/less) _____ married, but the male and married shares (do/do not) _____ change over time for either group.

more, more, do not

The distribution of the running variable should show

no evidence of manipulation because it is smooth through the cutoff.

Unlike in standard regression analysis, however, there is no _____ in a sharp RD design, because individuals with different values of D� have different values of the covariate by construction.

overlap

Based on the information in Table 1, Project STAR had good _____ and covariate _____ across class types.

overlap, balance

If η varied by group, the _____ assumption would not hold.

parallel trends

The key identifying assumption in a DD analysis is that the treated and untreated outcomes would follow _____ trends in the _____ of the treatment.

parallel, absence

Regularization ______ model complexity by restricting the regression _____ .

penalizes, coefficients

The coefficient plot shows how the estimated coefficients vary with the _____ strength and which variables are _____ as the penalty is strengthened

penalty, omitted

The key identifying assumption of an RD design is that the average _____ outcomes are _____ through the cutoff.

potential, continuous

LASSO is a _____ estimator that also performs variable _____ by forcing the coefficients of the least relevant variables to be equal to _____.

shrinkage, selection, 0

Machine learning that involves predicting an outcome with a set of explanatory variables is called _____ learning.

supervised

R¯^2 penalizes the inclusion of an additional explanatory variable if its associated _____ is less than _____.

t statistic, 1

Cross-validation begins by dividing the data into _____ and _____ samples.

training, test

Cross-validation is repeated for different values of the _____ parameter, which determines the strength of the _____ imposed by the regularizer.

tuning, penalty

take θ to be some unknown parameter and θ̂ to be an estimator of θ. If E(θ̂ )=θ, then E[(θ̂ −θ)^2] equals the _____ of θ̂ .

variance

take θ to be some unknown parameter and θ̂ to be an estimator of θ. E[(θ̂ −θ)^2] can be decomposed into the sum of the _____ and square of the _____ of θ̂ .

variance, bias

yi=β0+β1xi1+β2xi2+ui,,i=1,...,N,(2); where E(ui|xi1,xi2)=0 In (2), the test statistic for the null hypothesis that β2=1 is ________..

(β̂ 2−1)/se(β2)

If treatment assignment is randomized, then TERM 2 equals ______ and TERM 1 equals the ______.

0, ATE

yi=β0+β1xi1+ui,i=1,...,N.(1); If E(ui|xi1)=0 in (1), the sampling error of β̂ 1 converges to _____ and β̂ 1 is ______.

0, consistent

yi=β0+β1xi1+β2xi2+ui,,i=1,...,N,(2); where E(ui|xi1,xi2)=0 If you omit xi2 from (2), β̂ 1 will be unbiased only if β2= ______ or xi1 and xi2 are ______.

0, uncorrelated

The ldurat difference in differences is _____ .

0.20

The result in Column (1) is statistically significant at the _____-percent level.

1

On average, men with complete IQ test score data earn approximate $______ more per hour (round to the nearest dime) and have _____ more years of schooling (round to the nearest year).

1.2, 2

The estimated coefficient of teacher experience in Column (2) suggests that an additional year of experience is associated with about a _____ point increase (round to one decimal place) in test scores.

1.5

The results in Column (1) indicate that the effect of adding an aide to a regular class is just under _____ of a point and statistically (significant/insignificant) _____.

1/3, insignificant

This difference amounts to about _____ of a standard deviation in the baseline regular-class sample. (Give a fraction or percent approximation).

1/5

This percentage-point effect translates into about a/an _____ percent increase on the likelihood of having 12 or more drinks in a year. (Answer with an integer value).

11

Their preferred specification indicates a _____ percentage-point increase in the proportion of days drinking. (Answer with an integer value).

2

On average, men with missing IQ test scores are ______ points _____ likely to be from the south and _____ points _____ likely to live in a city.

20, more, 9, less

This percentage-point effect translates into about a/an _____ percent increase in the proportion of days drinking. (Answer with an integer value.)

21

Based on the results in Column (4), the effect of being assigned to a small class is roughly the same as having teacher with _____ years more experience (round to the nearest whole number).

24

Their preferred specification indicates a statistically significant _____ percentage-point effect on the likelihood of having 12 or more drinks in a year. (Answer with an integer value).

6

Based on the results in Column [2], the estimated return to the first year of experience is about ______ % (report one decimal place).

8.1

The t statistic for the teacher-experience coefficient estimate in Column (2) is _____ . (Round to one decimal).

8.8

What is the impact of teacher experience on the estimated class-size effect?

Adding teacher experience to the model has essentially no impact on the simple differences-in-means estimate given in Column (1).

In a sharp RD design, the _____ holds automatically because treatment assignment is determined solely by the _____ value of the _____ variable.

CIA, cutoff, running

The parameter δ represents the _____.

DD

yi=β0+β1xi1+ui,i=1,...,N.(1); The ______ theorem says you can control for other explanatory variables in estimating the effect of an x on y by either including the other variables directly or regressing y on the ______ from a regression of x on the other variables.

Frisch-Waugh-Lovell, residuals

CD lump all causes potentially related to _____ into the ______ (internal/external) category and all causes unrelated to ______ (internal/external) into the ______ category.

alcohol, external, alcohol, internal

Under the assumptions of a sharp RD design, you identify an

average treatment effect measured at the cutoff.

yi=β0+β1xi1+β2xi2+ui,,i=1,...,N,(2); where E(ui|xi1,xi2)=0 Let's say you don't omit xi2, but it is measured with error. Then β̂ 2 will be ______ . (unbiased/ biased down/ biased up)

biased down

The basis for an RD analysis should be apparent in a _____ scatter plot of the outcome and _____ variable.

binned, running

CD show that drinking increases after the 21st birthday on _____

both the extensive and intensive margins

Basic OLS inference is grounded in the application of the _____, which says that the ______ of the OLS estimator can be regarded as approximately _____ for large samples.

central limit theorem (CLT), sampling distribution, normal

yi=β0+β1xi1+ui,i=1,...,N.(1); The coefficient β1 measures the _____ in y _____ with a _____ in x1, holding all of the unobservables constant.

change, associated, unit change

Computing the correct standard errors for TWFE estimates usually requires _____ at the group level to account for _____ and _____ correlation.

clustering, heteroscedasticity, serial

The test statistic for whether a explanatory variable has a statistically significant association with the dependent variable is the ratio of explanatory variable's ______ to its _____.

coefficient estimate, standard error

The ______ captures how the population average of one random variable varies with the values of another random variable.

conditional expectation function (CEF)

In the above DAG, Z is a _____.

confounder

You can't observe the effect of a treatment on an individual because you can't observe the ______ outcome. In this sense, causal inference is a ______ data problem.

counterfactual, missing

The parameter η also reflects the _____ average difference in outcomes between periods 0 and 1 for the _____.

counterfactual, treated

yi=β0+β1xi1+ui,i=1,...,N.(1); The value of β1 that solves the population least-squares problem is: _____

cov(xi1yi)/E(x^2i1)

An RD analysis of baseline _____ should show no evidence of _____ among them.

covariates, discontinuities

Including the baseline _____ in the regression model (should/should not) _____ affect the estimated treatment effect.

covariates, should not

In a fuzzy RD design, the _____ value of the _____ variable determines the _____ of treatment.

cutoff, running, probability

Including the IQ test score as an ability proxy ______ the estimated return to schooling by about ______ percentage points (report one decimal place).

decreases, 0.6

yi=β0+β1xi1+β2xi2+ui,,i=1,...,N,(2); where E(ui|xi1,xi2)=0 R^2 measures how much of the variance of the ______ variable is accounted for by the ______ variables.

dependent, explanatory

Column (1) indicates that time out of work (did/did not) _____ rise for low earners.

did not

yi=β0+β1xi1+β2xi2+ui,,i=1,...,N,(2); where E(ui|xi1,xi2)=0 If xi1 is education and xi2 is labor market experience, and you omit xi2 from (2), then β̂ 1 will be biased ______ because β2 is _____ and cov(xi1,xi2) are _____ correlated.

downward, positive, negatively

A model's MSPE is the _____ value of the _____ prediction error arising from an out-of-sample prediction.

expected, squared

A regression formulation of a DD design is appealing because it

facilitates standard error estimation, generalizes for multiple time periods and treatment groups, accommodates covariates

If we say E(y|x)=β1+β1x where β0 and β1 solve the population least-squares problem, then the CEF is the population regression _____ and β0 and β1 are population regression _____.

function, coefficients

The standard 2×2 DD analysis can be carried out by regressing the outcome on a _____ dummy, a period _____, and their _____.

group, dummy, interaction

We described a TWFE model as a regression model for data with both a _____ and _____ dimension.

group, time

The R function lm gives the wrong standard errors, test statistics and confidence intervals because it ignores ______.

heteroscedasticity

The modern approach to regression inference is to allow for the variance of the regression errors to allow for _____, which implies the variance of the errors depends on the ______.

heteroscedasticity, explanatory variables

Estimating a TWFE model with data on multiple groups and variation in treatment timing can identify the ATT if the treatment effect is _____ .

homogenous

The conditional independence assumption (CIA) is a claim that there is a set of covariates that once you control for them, you can consider the potential outcomes to be ______ of treatment assignment. The CIA is a claim of un______ and is un______.

independent, confoundedness, testable

However, if the potential outcomes are ______ of treatment assignment, the assignment mechanism is ______ and the difference in sample average outcomes for treated and untreated individuals will identify the ATE.

independent, ignorable

Potential outcomes will be ______ of treatment assignment if individuals are ______ assigned to treated and untreated groups.

independent, randomly

Using the difference in sample average outcomes for treated and untreated individuals generally won't work for estimating the ATE because potential outcomes are not ______ of treatment assignment, which results in ______ bias.

independent, selection

The distinction between causes of death is important because estimating the treatment effect on _____ causes should function as a falsification exercise.

internal

Overall, the simple differences-in-means result _____ (is/is not) highly robust, holding up even when you _____ for a range of student and school characteristics.

is, adjust

The high-earner group is (more/less) _____ likely to work in maufacturing and (more/less) _____ likely to work in construction, and the share of high earners in contruction (rises/falls) _____ by _____ points after the WBA increase.

less, more, falls, 4

The population regression function provides the best ______ to the CEF.

linear approximation

The population regression function provides the best _____ of the dependent variable, given the explanatory variables

linear predictor

yi=β0+β1xi1+ui,i=1,...,N.(1); If E(ui|xi1)=0 in (1), xi1 is _____ of ui, and the sampling error of β̂ 1 equals ______ on average, which implies that β̂ 1 is ______.

mean independent, 0, unbiased

yi=β0+β1xi1+ui,i=1,...,N.(1); When the PRF includes more than one x, we say that β1 measures the _____ effect of x1 (without necessarily giving a causal interpretation).

partial

In general, the RD specification should include a low-order _____ in the running variable and _____ of the running variable with the treatment indicator.

polynomial, interactions

While individual treatment effects are not observable, you may be able to identify the average treatment effect (ATE), which is the difference in average ______ outcomes.

potential

Under _____ assignment of students to class type, this can be regarded as the _____ of small class size on test scores.

random, ATE

Figure 1. Stylized sharp RD design The black lines are _____ approximations to potential outcome CEFs.

regression

If you have a set of control variables for which a CIA holds, you can identify the average effect of the treatment on the outcome using a ______ of the outcome on a ______ dummy and ______ .

regression, treatment, control variables

As an alternative to the procedure sketched out in the previous question, by the Frisch-Waugh-Lovell theorem, you could simply run a regression of the outcome on the ______ from a regression of the treatment indicator on the controls.

residuals

The modern approach means we should always report _____ standard errors and test statistics.

robust

yi=β0+β1xi1+ui,i=1,...,N.(1); If there were more than one x in (1), then the formula for β1 would be the _____, except xi1 would be replaced with the _____ from a regression of xi1 on the other 'x's.

same, residuals

yi=β0+β1xi1+ui,i=1,...,N.(1); The OLS estimator for β1 can be obtained by plugging in the _____ of xi and yi for their ______ and plugging in another _____ for each outer expectation.

sample averages, population averages, sample average

A simple comparison of treated vs control observations after treatment misses factors that cause non-random _____ into treatment.

selection

E(yi|Di=1)−E(yi|Di=0)=E(y1i|Di=1)−E(y0i|Di=1) : TERM 1 TERM 2 in (1) is

selection bias

The CEF that captures the simple class-size treatment effects on test scores equivalent to the differences in means obtained from Table 1 is E(yi|___,___)=β0+β1___+β2___.

small, regular+aide

Larger ______ statistics and smaller ______ values indicate ______ evidence against the null hypothesis.

test, p, more

E(yi|Di=1)−E(yi|Di=0)=E(y1i|Di=1)−E(y0i|Di=1) : TERM 1 TERM 1 in (1) is

the average treatment effect on the treated

A DD analysis targets the average treatment effect on the _____ or E(y1i−y0i|Di=_____)

treated, 1

The standard 2×2 DD analysis compares the difference in average outcomes for the _____ observations _____ and _____ treatment with the difference in mean outcomes for the controls _____ and _____ treatment.

treated, before, after, before, after

The parameter γ reflects the average difference between _____ and _____ outcomes.

treated, untreated

To estimate the ATE under a CIA, you also need overlap, which is the ability to observe ______ and ______ units for any set of covariate values.

treated, untreated

A simple before vs after comparison of treated observations misses _____ that are shared by the control group.

trends

The target estimand cannot be estimated directly because E(y0i|Di=1) is _____.

unobservable

yi=β0+β1xi1+β2xi2+ui,,i=1,...,N,(2); where E(ui|xi1,xi2)=0 If you omit xi2 from (2), β̂ 1 will be biased ______ if β2 and cov(xi1,xi2) have the same ______.

upward, sign

A formal expression of the DD regression consistent with Table 2 is:

y= μ+γtreat+ηafter+δtreat⋅after+u

Select the regression specification that is consistent with the black lines.

yi=β0+β1xi+τDi+ui

Under the key identifying assumption of a sharp RD design, the model in Question 7 identifies

τ=E(y1i−y0i|xi=c)


Kaugnay na mga set ng pag-aaral

NURS250: Chapter 3 - Techniques, Safety, and Infection Control PrepU

View Set

PEDs Chapt 19 Nursing Care of the Child with a Cardiovascular Disorder

View Set