Econometrics (Econ 20) Exam 2
Stata command for constructing means and standard deviations
"summarize (variable name)"
What happens to the sampling distribution as n rises?
the sampling distribution of B̂1 becomes more concentrated around the true B1 (its variance shrinks as n rises)
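A small simulation can make this concrete. The sketch below is in Python rather than Stata, and every number in it (a true slope of 2, standard normal x and u, 500 replications) is an assumption chosen for illustration, not something from the course notes. It re-estimates the OLS slope many times and compares the spread of the estimates at two sample sizes.

```python
import random
import statistics


def slope_estimates(n, reps=500, beta1=2.0, seed=0):
    """Simulate y = 1 + beta1*x + u and return `reps` OLS slope estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
        xbar, ybar = statistics.fmean(x), statistics.fmean(y)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        sxx = sum((xi - xbar) ** 2 for xi in x)
        estimates.append(sxy / sxx)  # simple-regression OLS slope
    return estimates


for n in (20, 200):
    est = slope_estimates(n)
    # mean and spread of the simulated sampling distribution
    print(n, round(statistics.fmean(est), 3), round(statistics.stdev(est), 3))
```

The standard deviation of the estimates should be noticeably smaller at n = 200 than at n = 20, while the average stays near the assumed slope.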
Examples of selecting certain parts of a sample
• drop if phstat>5 • drop if racerpi==5 • keep if educ1 > 96 (each command goes on its own line)
What is a falsification test?
falsification tests examine outcomes that should *not* be affected by the events in question
What is a probability density function (pdf)?
for a continuous random variable Z, g(z) is called the probability density function (pdf) • it does not give probabilities directly, because there are infinitely many outcomes and the probability of any single outcome is zero
What is a cumulative distribution function (CDF)?
for a continuous random variable, probabilities are instead measured with a CDF: G(z) = Pr(Z≤z) = ∫(-∞ to z) g(t)dt • the derivative of the CDF is the probability density function • it does not matter whether we write Pr(Z≤z) or Pr(Z<z), because any single point has probability zero • logits use the "logistic function" for G(z), which is the cdf of a "logistic" random variable
What is a probability distribution function?
for a discrete random variable Z, the probability distribution function gives the probability associated with each outcome: g(z) = Pr(Z = z)
loop command for stata
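foreach var in (y variable(s)) { reg `var' (treatment variable) if (variable that is going to be held constant)==(some constant/value) } (in a do-file, the opening brace ends the foreach line, the reg command goes on its own line, and the closing brace goes on its own line)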
What is the normality assumption needed for?
not needed for OLS to be unbiased or BLUE; only needed for exact hypothesis testing in small samples
running a regression in stata
reg (y variable) (x variables)
What is a difference in differences test?
"Difference-in-differences (diff-in-diff) is one way to estimate the effects of new policies. To use diff-in-diff, we need observed outcomes of people who were exposed to the intervention (treated) and people not exposed to the intervention (control), both before and after the intervention." (from internet)
What is the stata command to find f-test p-value?
"dis Ftail(q,n-k-1,f-stat)" where "f-stat" is the F statistic, "q" is the number of x's you're testing and "k" is the number of x's in the unrestricted regression (Stata function names are case-sensitive: it is Ftail, with a capital F)
What is the stata command to find f-test critical values?
"dis invFtail(q,n-k-1, α)" where "α" is the significance level, "q" is the number of x's you're testing and "k" is the number of x's in the unrestricted regression (again with a capital F)
What is the stata command to estimate a logit?
"logit y x1 x2 x3 .... " where y is your y variable and x1, x2, x3, etc. are your x variables
merge datasets
"merge 1:1/m:1/1:m merge_variables using filename.dta" • where you choose 1:1 or m:1 or 1:m, merge_variables is the list of variables that you are matching observations on across datasets, and filename.dta is the name of the data you are merging onto the dataset in memory • if you are merging multiple data sets, be sure to run "drop _merge" after each merge, because the next merge cannot create _merge if it already exists
What is the stata command to estimate the "marginal effect" of x on y (note: comes after estimating the logit)?
"mfx" or "mfx, at(____, _____)" where constant values for x1 and x2 are inputted into the respective _____s (at() is an option, so it comes after a comma)
Problems with LPM model include:
1. it's linear, so predicted values can lie outside of [0,1] 2. this model will violate the assumption of homoskedasticity, so standard errors and hypothesis tests may be wrong 3. despite these drawbacks, it's usually a good place to start when y is binary, since it's still unbiased
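Drawback 1 is easy to see with a toy dataset. The Python sketch below fits a one-regressor linear probability model by hand; the eight data points are hypothetical and were made up only to show fitted values escaping the unit interval.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data: y is a dummy dependent variable
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]
# flag fitted "probabilities" that fall outside [0, 1]
outside = [round(p, 3) for p in fitted if p < 0 or p > 1]
print(outside)
```

Here the fitted values at the smallest and largest x fall below 0 and above 1, even though they are meant to be probabilities.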
what does (e^xB) equal?
= (Pr(Y = 1|X))/(1 - Pr(Y = 1|X)), i.e. the odds, meaning that ln[(Pr(Y = 1|X))/(Pr(Y = 0|X))] = B0 + B1X1 + .... = xB
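The identity can be checked numerically. This Python sketch assumes a logit probability Pr(Y = 1|X) = e^xB/(1 + e^xB) and a hypothetical index value xB = 0.4; it confirms that the odds equal e^xB and the log odds equal xB.

```python
import math


def logit_prob(xb):
    """Pr(Y = 1 | X) under a logit with index xb = B0 + B1*X1 + ..."""
    return math.exp(xb) / (1.0 + math.exp(xb))


xb = 0.4  # hypothetical value of the index, for illustration only
p = logit_prob(xb)
odds = p / (1.0 - p)

print(odds, math.exp(xb))  # the two numbers should agree
print(math.log(odds), xb)  # and so should these
```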
What is a placebo test?
A placebo test involves demonstrating that your effect does not exist when it "should not" exist (from internet, not a great definition, try and find a better one in the notes)
What is the motivation/goal of asymptotics?
Asymptotics asks how certain properties change as the sample size increases (e.g. efficiency, sampling distribution, OLS estimators and OLS assumptions, etc.) • value: can do inference under much weaker assumptions, such as • hypothesis testing that is not reliant on the assumption of normally distributed errors • standard errors not relying on the homoskedasticity assumption (", robust") • some estimators are biased but the bias shrinks in large samples ("consistent") • in practice, we usually rely on asymptotics
in the following equation, interpret B5: Yi = B0 + B1*Di^(northeast) + B2*Di^(south) + B3*Di^(midwest) + B4*edi + B5*Di^(northeast)*edi + B6*Di^(south)*edi + B7*Di^(midwest)*edi + ui
B5 = how much the slope of the income-education relationship in the northeast differs from the slope in the west (the excluded region)
What is the interpretation of Bj's for dummy dependent variable regressions?
Bj's represent the change in the probability of something (e.g. being employed) as x's change (hence the term linear probability model) • the units are percentage points (after you multiply Bj by 100) • say: "holding constant _________ (controls), each additional _____ (unit) of _______ (x variable) lowers/raises the chance of _______ (dummy dependent variable) by _______ (Bj * 100) percentage points"
What is the Central Limit Theorem (CLT)?
Central Limit Theorem says that OLS estimators are "asymptotically normal" under MLR 1-5 => (B̂j - Bj)/se(B̂j) ~a N(0,1), where "~a" means "asymptotically distributed as": the statistic is approximately N(0,1) in large samples, NOT exactly normally distributed in a finite sample • also, under the same assumptions, B̂j and σ̂ are, respectively, consistent estimates of Bj and σ (consistency is a "plim" statement)
Basic characteristics of all CDFs
for the logit, G(z) = (e^z)/(1 + (e^z)) = Λ(z), evaluated at z = xB where xB = B0 + B1X1 + B2X2 + ..... • in short, a logit fits Pr(y = 1|x) = (e^xB)/(1+(e^xB)) = Λ(xB), sometimes written σ(xB) • notice, like all cdfs: • lim(z --> ∞) G(z) = 1 • lim(z --> -∞) G(z) = 0 • G(z) is always increasing (G'(z) > 0)
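These properties can be checked on a grid of points. The Python sketch below evaluates the logistic cdf Λ(z) = e^z/(1 + e^z); the grid values are arbitrary choices for illustration.

```python
import math


def logistic_cdf(z):
    """Logistic cdf, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


grid = [-30, -5, -1, 0, 1, 5, 30]  # arbitrary evaluation points
values = [logistic_cdf(z) for z in grid]

for z, g in zip(grid, values):
    print(z, g)

# each value is larger than the one before it, so G is increasing on the grid
increasing = all(a < b for a, b in zip(values, values[1:]))
print(increasing)
```

The values run from nearly 0 at z = -30 to nearly 1 at z = 30, pass through 0.5 at z = 0, and rise at every step.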
What is the Lagrange Multiplier Statistic?
LM stat is an alternative to an F-test for testing multiple exclusion restrictions that relies only on large samples • uses an auxiliary regression and is sometimes called an "nR^2" stat • uses chi-squared distribution in place of F • rarely used in place of regular F-test but used in tests of heteroskedasticity and other things
what does the latent variable model (LVM) model?
Pr(y = 1|x) = Pr(y* > 0|x) = Pr(u > -xB) = Pr(u < xB) (by symmetry of the distribution of u) = G(xB) = Λ(xB), where Λ is the logistic cdf • notice that the non-linear model does not get rid of the potential for bias (still need cov(x, u) = 0 for the estimates to be consistent)
What is a balance test?
The purpose of a balance test is to show that observable third factors that potentially affect the outcome are uncorrelated with the treatment, or in symbols, that the δ1 from the omitted variables bias formula is close to zero for as many relevant potential controls as possible. This either helps make the case for the causal interpretation of the estimates (if few of the correlations are substantial or significant) or points to the controls that are necessary.
Dummy dependent variables
a dependent variable that takes on one of two values (usually one or zero) • a model with a dummy dependent variable can be estimated with OLS
Linear probability model
a linear regression with a dummy dependent variable
Dummy variable
a variable that takes on one of two values (usually one or zero) • aka 'binary variables' • there is no such thing as a marginal change in a dummy variable • the marginal effect of a dummy variable is calculated by comparing the predicted probabilities when it changes from 0 to 1
What do the coefficients on dummy variables represent (w/o interactions)?
absent other controls, the coefficients on dummy variables represent the difference in average y between included and excluded groups
What does the intercept estimate (w/o interactions)?
absent other controls, the intercept captures the mean for y for the excluded group
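Both statements (intercept = mean of the excluded group, dummy coefficient = difference in group means) can be verified by hand. The Python sketch below regresses y on a single dummy; the six observations are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data: d = 1 for the included group, 0 for the excluded group
d = [0, 0, 0, 1, 1, 1]
y = [10, 12, 14, 15, 17, 19]

intercept, coef = ols_simple(d, y)

mean_excluded = sum(yi for di, yi in zip(d, y) if di == 0) / d.count(0)
mean_included = sum(yi for di, yi in zip(d, y) if di == 1) / d.count(1)

print(intercept, mean_excluded)             # intercept vs. excluded-group mean
print(coef, mean_included - mean_excluded)  # coefficient vs. difference in means
```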
What are some examples of data with non-normal errors?
any clearly skewed variables (e.g. wages, savings, arrests, etc.) • discrete data or positive-only data
How do we defend the parallel trends assumption?
the assumption cannot be tested directly; we have to rely upon indirect evidence from pretests or falsification tests
What is the interpretation of the coefficient of a dummy variable?
the coefficient on each dummy variable is interpreted as an intercept shift relative to the excluded group
Command for creating a new variable in stata
gen (variable name) = 6-phstat (or whatever other value you want the variable to take on)
Interpret the estimated slope coefficient of "white": yrsed = B0 + 0.5*white + 0.08*birthyear
holding constant birthyear, whites average 0.5 years of education more than nonwhites
Latent variable model
idea is that there is a latent (unobserved) continuous variable y* that measures the "propensity" to be in the 1 category: y* = B0 + B1X1 + B2X2 + ..... + u = xB + u • Threshold model: we observe not y* but: --> y = 1, if y* > 0 (i.e. xB > -u) --> y = 0, if y* ≤ 0 • also, we assume a distribution for u (logistic = logit, normal = probit)
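The threshold model can be simulated directly. The Python sketch below assumes a standard logistic error u (the logit case) and a hypothetical index xB = 0.5; it draws u, applies the rule y = 1 if xB + u > 0, and compares the share of ones with the logistic cdf Λ(xB).

```python
import math
import random


def simulate_share(xb, n=100_000, seed=1):
    """Share of draws with y* = xb + u > 0 when u is standard logistic."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n):
        # keep the uniform draw strictly inside (0, 1) so the log is defined
        v = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        u = math.log(v / (1.0 - v))  # inverse-cdf draw from the logistic
        if xb + u > 0:
            ones += 1
    return ones / n


xb = 0.5  # hypothetical index value, for illustration only
share = simulate_share(xb)
implied = 1.0 / (1.0 + math.exp(-xb))  # logistic cdf evaluated at xb
print(round(share, 3), round(implied, 3))
```

The simulated share of ones should land close to Λ(xB), which is the link between the latent variable story and the logit probability.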
How do you do a t-test in a large sample?
if you have a large sample, you need not rely on errors being normally distributed to do a t-test • do a t-test, but take the critical values from a N(0,1) (1.645, 1.96, etc.) • what constitutes a large sample depends on the data, but in most cases you don't need that big of a sample
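The N(0,1) critical values quoted above can be reproduced with Python's standard library. This is only a sketch; the helper name normal_critical_value is made up for illustration.

```python
from statistics import NormalDist


def normal_critical_value(alpha, two_sided=True):
    """Critical value from N(0,1) for significance level alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


print(round(normal_critical_value(0.05), 3))                   # two-sided 5% test
print(round(normal_critical_value(0.05, two_sided=False), 3))  # one-sided 5% test
```

A t statistic larger in absolute value than the two-sided critical value rejects the null at that significance level.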
What do interaction terms allow for?
interaction terms between dummy variables and another variable (e.g. education) allow for a difference in slopes across the groups (interactions are entered for all but one group)
How do we interpret logit slopes in terms of probability?
marginal effects: ∂Pr(Y = 1)/∂Xj = Λ'(B̂0 + B̂1X1 + B̂2X2 + ....)B̂j, where Λ'(z) = (e^z)/((1 + (e^z))^2) = Λ(z)*[1 - Λ(z)] • so, evaluated at the means x̄: ∂Pr(Y = 1)/∂Xj |x̄ = p̂(1-p̂)B̂j, where p̂ = Λ(B̂0 + B̂1X̄1 + B̂2X̄2 + ....)
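The formula p̂(1-p̂)B̂j can be checked against a numerical derivative. The Python sketch below uses a one-regressor logit with hypothetical coefficients (B̂0 = -1.0, B̂1 = 0.8) and a hypothetical evaluation point x = 1.5; none of these values come from an actual regression.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob(x, b0, b1):
    """Fitted Pr(Y = 1 | x) from a one-regressor logit."""
    return logistic_cdf(b0 + b1 * x)


def marginal_effect(x, b0, b1):
    """Analytic marginal effect: p * (1 - p) * b1."""
    p = prob(x, b0, b1)
    return p * (1.0 - p) * b1


b0, b1, x = -1.0, 0.8, 1.5  # hypothetical estimates and evaluation point
h = 1e-6
# central-difference approximation to the derivative of Pr(Y = 1 | x)
numeric = (prob(x + h, b0, b1) - prob(x - h, b0, b1)) / (2 * h)
analytic = marginal_effect(x, b0, b1)
print(numeric, analytic)
```

Because p̂(1-p̂) is at most 0.25, the marginal effect is always smaller in magnitude than the logit coefficient itself.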
What does the OLS model for dependent dummy variables (aka a linear probability model) give?
model gives E[y|x] = B0 + B1*X1 + .... + Bk*Xk • E[y] for the dummy variable is the probability of being in the 1 category
What do you need to determine if a relationship is causal?
need to be able to observe the counterfactual • from internet: "To establish causality you need to show three things-that X came before Y, that the observed relationship between X and Y didn't happen by chance alone, and that there is nothing else that accounts for the X -> Y relationship"
What is a pretest?
pretests examine trends across comparison groups in outcomes prior to the event
How do we interpret the logit slope?
take the coefficient and multiply it by 100; this is (approximately, for small coefficients) the percent impact on the odds that the dependent variable is 1, holding constant the other variables in the regression
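Multiplying by 100 is an approximation. Since the odds equal e^xB, a one-unit change in x multiplies the odds by e^B, so the exact percent change is 100*(e^B - 1). The Python sketch below compares the two for a few hypothetical coefficient values.

```python
import math


def pct_change_in_odds(b):
    """Exact percent change in the odds for a one-unit change in x."""
    return 100.0 * (math.exp(b) - 1.0)


for b in (0.02, 0.10, 0.50):  # hypothetical logit coefficients
    approx = 100.0 * b
    exact = pct_change_in_odds(b)
    print(b, approx, round(exact, 2))
```

The two numbers are close for small coefficients and drift apart as the coefficient grows.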
what does the coefficient of AFTERi * NOT nyi * FOREIGN represent?
the (1987-99) change in the share of babies with a healthy birth weight born to foreign-born mothers not in New York (CA, FL, TX) compared to the change in the share with a healthy birth weight born to foreign-born mothers in New York was 0.833 percentage points greater than the change in the share of babies with a healthy birth weight born to native-born mothers not in New York (CA, FL, TX) compared to the change in the share with a healthy birth weight born to native-born mothers in New York.
What does the coefficient on the interacted variable represent?
the coefficient on the interacted variable represents the difference in slopes between the included and excluded groups
What does the coefficient on the (uninteracted) variable represent?
the coefficient on the uninteracted variable represents the slope for the excluded group
Excluded group
the group for which there is no dummy variable (categorical variables are entered as controls in regressions by including dummies for all but one category) • Ex. if there are 3 regions (East, Central and West) and the equation is y = B0 + B1*east + B2*central, then west is the excluded group
What is the key assumption of the Rubin Causal model?
the key assumption is that, absent treatment, outcomes in the treatment and control groups would on average have changed by the same amount: E[ΔYio | Ti = 1] = E[ΔYio | Ti = 0] • aka the parallel trends assumption
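Under parallel trends, the control group's change stands in for what would have happened to the treated group without treatment, so the difference-in-differences estimate is just arithmetic on four group means. The Python sketch below uses hypothetical means made up for illustration.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)


# Hypothetical group means (not from any real data)
effect = diff_in_diff(treat_before=10.0, treat_after=14.0,
                      control_before=9.0, control_after=11.0)
print(effect)  # (14 - 10) - (11 - 9) = 2.0
```

If the two groups would have trended differently even without the policy, this number mixes the treatment effect with the difference in trends.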
What is the opposite of a consistent estimator?
the opposite of a consistent estimator is one with "asymptotic bias" (bias that remains as the sample gets arbitrarily large) • inconsistency is a large sample problem: it doesn't go away as we add more data • in contexts where OLS is biased it is also inconsistent
What does the mean of a dummy variable signify?
the share of the sample in the "1" (treatment) category
How do we establish the consistency of an OLS?
under MLR 1-4, OLS slope estimates are consistent • in place of the expected value (which establishes unbiasedness), we take the probability limit (or "plim") to establish consistency • generally speaking, "plim" gives the value that an estimator converges to (in probability) as N --> ∞
Interpret the estimated slope coefficient of "female": ln(wage) = B0 - 0.26*female + 0.12 * yrsed
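we can say either: the male-female wage gap is about 26% holding constant education, or the male-female wage gap is about 26% at the same yrsed value (women earn less, since the coefficient is negative; 26% is the usual log approximation)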
What does "δ" stand for?
δ is interpreted as an intercept shift
An estimator is consistent if......?
• it approaches the true population parameter as the sample gets large [e.g. lim as n--> ∞ of X̄ = μ or lim as n--> ∞ of B̂ = B (aka probability limit)] • with rare exception, unbiased estimators are also consistent: if lim as n--> ∞ of se = 0 --> the estimator's distribution "collapses" to the population value • some estimators are biased but consistent --> e.g. X̄ + 1/N --> with a reasonable sample size we are not worried about the bias
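The X̄ + 1/N example can be simulated. The Python sketch below assumes draws from a N(5, 1) population (an arbitrary choice for illustration) and averages the estimator's error over 400 replications at two sample sizes; the bias is exactly 1/N, so it should be visible at N = 10 and negligible at N = 1000.

```python
import random
import statistics


def biased_mean(sample):
    """The estimator X-bar + 1/N: biased in any finite sample, but consistent."""
    return statistics.fmean(sample) + 1.0 / len(sample)


def average_error(n, mu=5.0, reps=400, seed=0):
    """Average of (estimate - mu) across simulated samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        errors.append(biased_mean(sample) - mu)
    return statistics.fmean(errors)


for n in (10, 1000):
    print(n, round(average_error(n), 4))
```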
What is a logit model?
• nonlinear probability model • a common transformation of the model is: ln{Pr(Y=1|X)/Pr(Y=0|X)} = B0 + B1X1 +..... where {Pr(Y=1|X)/Pr(Y=0|X)} is the "odds", i.e. the relative probability of 'success' (Y = 1) vs 'failure' (Y = 0) • B's represent the proportional effect of a one-unit change in x on the 'odds' that Y = 1 • B's are estimated to maximize the "log-likelihood" (the log of the joint probability of observing the data) • standard errors and confidence intervals work as usual • coefficients are different from the LPM (ln(odds) vs probability)