Quan 755
In attempting to estimate the causal effect of employee hours worked on total units produced, you include the age of the production equipment as a control variable. Which of the following conditions would not be a good reason to include this control variable?
The age distribution of the plants is highly non-normal.
Suppose you have the following results from a regression of home prices on house attributes for a random sample of house transactions (coefficient, with standard error in parentheses): Intercept 16310.0 (4114.5); Number of Bedrooms 7295.3 (1399.9); Number of Bathrooms 23473.0 (4032.0). Given these results, which additional condition would be sufficient to ensure that number of bathrooms satisfies the "primary criteria" for a good control variable in attempting to identify the causal effect of number of bedrooms on house prices?
The number of bathrooms is correlated with the number of bedrooms.
You estimate a regression of your firm's store sales on the number of local competitors as follows: Sales = 321,752 + 70.35 Number of Competitors. You are willing to believe you have a random sample of store locations and a sufficiently large sample size. What is wrong with the following logic: "We need more competitors to enter the markets we're in, so that our sales will rise"?
The positive coefficient on number of competitors is not a causal estimate.
In estimating the linear probability model for whether an individual clicked through on an advertisement based on how much time the individual spent on the website, the regression results are as follows: ClickThrough_i = 0.4(0.08) + 0.07(0.01)TimeSpent_i, where the standard error of each coefficient is reported in parentheses. Why would the fact that someone in the sample was on the website for 10 hours be problematic?
The prediction for that individual will fall above 1 (the maximum of ClickThrough).
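The arithmetic behind this answer can be checked with a minimal sketch. It assumes TimeSpent is measured in hours (implied by the question, not stated in the regression output); the function name is illustrative only.

```python
# Fitted linear probability model from the question (coefficients as reported).
INTERCEPT = 0.4
SLOPE_PER_HOUR = 0.07  # assumption: TimeSpent is measured in hours


def lpm_prediction(time_spent_hours: float) -> float:
    """Predicted click-through probability from the linear probability model."""
    return INTERCEPT + SLOPE_PER_HOUR * time_spent_hours


prediction_10h = lpm_prediction(10)
# A probability cannot exceed 1, so this fitted value is not a valid probability.
print(round(prediction_10h, 2), prediction_10h > 1)
```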
After estimating a probit model for whether an individual purchases a particular product as a function of income, which of the following conditions must hold?
The predictions will fall between zero and 1.
Which of the following conditions is necessary for the sample estimates of the population coefficients that best describe the co-movement amongst the variables to be consistent?
The sample is a random sample.
Suppose one runs the regression of Y on X1 and X2 and the coefficient on X2 is positive. Which of the following correlation conditions must hold in the sample?
The semi-partial correlation between Y and X2 holding X1 constant must be positive.
For a parameter that is identified, what should happen to the 95% confidence interval as the sample size gets larger?
The width of the interval should get smaller.
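A rough numerical illustration, assuming the textbook large-sample interval for a sample mean (estimate ± 1.96 × s/√n) and a hypothetical standard deviation of 10; neither the formula choice nor the numbers come from the question itself.

```python
import math


def ci_width_95(sample_sd: float, n: int) -> float:
    """Width of the large-sample 95% confidence interval for a mean."""
    return 2 * 1.96 * sample_sd / math.sqrt(n)


# Hypothetical standard deviation; only the sample size changes.
widths = {n: ci_width_95(10.0, n) for n in (100, 400, 1600)}
for n, width in widths.items():
    print(n, round(width, 3))
```

Under these assumptions, quadrupling the sample size halves the width of the interval.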
What are the two primary criteria for identifying "good" controls?
They are likely correlated with treatment and they influence the outcome.
Why is the use of polynomial functional forms typical in trying to estimate non-linear functional forms?
They offer a high degree of flexibility.
Consider the following proposed determining function for the winning percentage of baseball teams: WinPct_i = 0.500 - 0.1 TeamERA_i + 0.2 TeamBA_i + U_i, where TeamERA is the team earned run average and TeamBA is the team batting average. What is the effect (change) on winning percentage from an increase in TeamERA from 2.3 to 3.3?
A decrease of 0.1 in winning percentage
If one ran the regression on whether a mortgage applicant was approved for a loan or not (Approved_i) on their income to debt ratio, this would be an example of what sort of model?
A linear probability model
When making an interpolation it is possible to use which of the following functional forms?
All of these choices are correct.
In estimating a fixed effects model using panel data, which of the following variables will not be effective controls if you use a full set of (cross-sectional) fixed effects for individuals?
Birthplace of the individual
How does collecting more/alternative data put less burden on the modeling choices of the analyst?
By collecting more data, an analyst can avoid having to make functional form choices to fill in data gaps/extent of the data.
To calculate the partial correlation of Coconut Milk household purchases CMi and Salami Deli Meat household purchases Si controlling for household income Yi, (i.e., pCorr(CMi,Si;Yi)), which of the following regressions will be run:
CMi = b0 + b1Yi + ei
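One common way to finish the calculation is to run the matching regression of S on Y as well and then correlate the two sets of residuals. The sketch below does this on simulated data; the data-generating numbers and helper names are hypothetical and chosen only for illustration.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def ols_residuals(y, x):
    """Residuals from the simple regression y = b0 + b1*x + e."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]


def correlation(a, b):
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


random.seed(0)
n = 5000
# Hypothetical data: both purchases depend on income but not on each other.
income = [random.gauss(0, 1) for _ in range(n)]
coconut_milk = [0.5 * y + random.gauss(0, 1) for y in income]
salami = [0.8 * y + random.gauss(0, 1) for y in income]

raw_corr = correlation(coconut_milk, salami)
# Partial correlation: correlate what is left of each series after removing income.
partial_corr = correlation(
    ols_residuals(coconut_milk, income), ols_residuals(salami, income)
)
print(round(raw_corr, 3), round(partial_corr, 3))
```

In this simulated setup the raw correlation is clearly positive while the partial correlation is close to zero, because income is the only link between the two purchase series.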
Which object from the first or second stage reports whether an instrument is relevant?
Check if the coefficient on the instrumental variable in the first stage is statistically distinct from zero.
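A minimal sketch of that relevance check on simulated data. The variable names (wholesale cost as the instrument, price as the endogenous regressor) and all numbers are hypothetical, and the 1.96 cutoff assumes a large-sample two-sided test at the 5% level.

```python
import math
import random

random.seed(0)
n = 500
# Hypothetical data: wholesale cost is the instrument, price the endogenous regressor.
cost = [random.gauss(10, 2) for _ in range(n)]
price = [2.0 + 0.8 * c + random.gauss(0, 1) for c in cost]


def first_stage(x, z):
    """Regress x on z (with intercept); return the slope on z and its t-statistic."""
    z_bar = sum(z) / len(z)
    x_bar = sum(x) / len(x)
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / s_zz
    intercept = x_bar - slope * z_bar
    ssr = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    std_err = math.sqrt(ssr / (len(z) - 2) / s_zz)
    return slope, slope / std_err


slope, t_stat = first_stage(price, cost)
is_relevant = abs(t_stat) > 1.96  # reject a zero first-stage coefficient at the 5% level
print(round(slope, 3), is_relevant)
```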
In the event that there is not an acceptable model of the data-generating process within which the treatment effect is identified using samples from the population, the analyst should do what?
Consider alternative data populations (e.g., additional variables) before attempting to estimate the effect.
In determining the causal effect of Price on Sales, suppose advertising spend is a good control variable. How will the correlations between price and the error terms of the regressions Salesi = β0 + β1Pricei + Ui and Salesi = α0 + α1Pricei + α2Advertisingi + Vi be related?
Cor(Ui, Pricei) ≠ 0, Cor(Pricei, Vi) = 0
In the determining function for Sales given by Salesi = α0 + α1Pricei + Ui, what is the role of Ui?
Factors other than price that impact sales
Consider the following proposed determining function for a student's grade on the econometrics final: FinalGrade_i = 68 + 4 HoursStudied_i - 0.5 NumOtherFinals_i^2 + U_i, where HoursStudied is the number of hours studied during finals week by student i, and NumOtherFinals squared is the square of the number of other finals student i has during finals week. Derive the formula for the change in a student's final grade with respect to a unit change in the number of other finals the student has in finals week.
Final grade decreases by NumOtherFinals_i, since the derivative of -0.5 NumOtherFinals_i^2 with respect to NumOtherFinals_i is -NumOtherFinals_i.
All of the following will cause an identification challenge except for what?
Imperfect multicollinearity
Suppose you estimate the following regression of a firm's Sales on the number of employees at each location across the country: Sales = 95,342 + 0.76 Number of Employees. You are willing to believe you have a random sample of store locations and a sufficiently large sample size. Which statements are not yet justified by the regression results?
Increasing the number of employees at a store by 2 will raise sales by 0.76 × 2 = 1.52
Suppose the regression results for a linear probability model of mortgage application approval are given by: Approved_i = 0.6(0.12) - 0.05(0.001)Debt2IncomeRatio_i, with standard errors reported in parentheses. How should we interpret the coefficient on the debt-to-income ratio variable?
Increasing your debt to income ratio by 1 decreases your probability of being approved by 0.05.
The determining function that drives the share of accepted job offers for a company is given by the following equation: AcceptedOffers_t = α0 + α1StartingSalary_t + α2EconomicClimate_t + U_t, where the unit of observation is a particular month (t). Suppose one wanted to use the national unemployment rate (Unemployment_t) as a proxy for EconomicClimate. Which of the following describes a condition required to hold for this to be an adequate proxy variable?
StartingSalary, EconomicClimate, and the unemployment rate must be uncorrelated with "other factors" (U_t).
Which of the following variables is most likely to be a limited dependent variable, assuming that each variable will be featured as a dependent variable?
Weekly number of complaints
When might imperfect multicollinearity not require the collecting of more data to remedy the likely imprecise estimates of the affected variables?
When the imperfect multicollinearity is confined to control variables only
In estimating the effect of price on sales (Salesi = α0 + α1Pricei + Ui), you are attempting to find an instrumental variable that will solve the endogeneity problem caused by the confounding factor of number of competitors being within Ui, which is correlated with price. Which of the following statements would suggest that wholesale costs would satisfy the exogeneity condition to be a potential instrumental variable?
Wholesale costs are uncorrelated with number of competitors.
Suppose you are trying to estimate the following regression: Yi = β0 + β1X1i + β2X2i + β3X3i + Ui, with an instrument Zi for X2i. All of the following variables will be included (on the right- or left-hand side of the regression) in the first stage of two-stage least squares except for which one?
Yi
Suppose you are estimating the following model: Yi = β0 + β1Xi + Ui. Suppose also that you only observe values of Y that are above 50. What is the consequence of this selection on the values of Y?
Your estimate for β1 will be biased.
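A small simulation can illustrate the bias. This is only a sketch under made-up assumptions: the true intercept (40), true slope (1), noise level, and range of X are all hypothetical.

```python
import random

random.seed(2)
n = 20000
# Hypothetical population: the true slope on X is 1.0.
x = [random.uniform(0, 20) for _ in range(n)]
y = [40.0 + 1.0 * xi + random.gauss(0, 10) for xi in x]


def ols_slope(y_vals, x_vals):
    """Slope from the simple regression of y on x (with intercept)."""
    x_bar = sum(x_vals) / len(x_vals)
    y_bar = sum(y_vals) / len(y_vals)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_vals, y_vals))
    s_xx = sum((xi - x_bar) ** 2 for xi in x_vals)
    return s_xy / s_xx


full_slope = ols_slope(y, x)

# Keep only observations with Y above 50, mimicking the selection in the question.
kept = [(xi, yi) for xi, yi in zip(x, y) if yi > 50]
trunc_slope = ols_slope([yi for _, yi in kept], [xi for xi, _ in kept])

print(round(full_slope, 3), round(trunc_slope, 3))
```

In this setup the slope estimated on the selected sample falls well below the slope estimated on the full sample, because low-X observations survive the cutoff only when their unobserved factors are unusually large.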
The two critical elements required to sign the bias of an omitted variable include the sign of the:
effect of the omitted variable on the outcome and the sign of the correlation between the omitted variable and the treatment.
A parameter is identified in the event that it can be:
estimated with any level of precision given a large enough sample from the population.
In the context of regression analysis, a variable that allows us to isolate the causal effect of a treatment on an outcome due to its exogenous correlation with the treatment is known as a(n):
instrumental variable.
Estimating a difference-in-differences requires that one has a:
panel data set.
A potential upside of using within estimation besides the reduction in the number of parameters to be estimated is:
r-squared is more meaningful.
A marginal effect summarizes the:
rate of change in the probability of a dichotomous dependent variable equaling one with a one-unit increase in an independent variable.
The R-squared of a regression is 1 - X, where X is the:
sum of squared residuals divided by the total sum of squares.
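The definition R-squared = 1 - SSR/SST can be computed directly. The sketch below fits a simple regression to a tiny made-up data set; the numbers are purely illustrative.

```python
# Hypothetical data, chosen only to illustrate the formula.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# OLS slope and intercept for y = b0 + b1*x + e.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Sum of squared residuals and total sum of squares.
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - ssr / sst
print(round(r_squared, 4))
```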
The successful use of a proxy variable to control for a confounding factor will allow you to accomplish all of the following except:
uncover the size of the semi-partial correlation of the confounding factor and the outcome.
When making active predictions, it is important to be able to conclude that:
your estimate of the coefficients is a causal estimate.
After estimating a probit model for the likelihood of a website visitor clicking through conditional on the average income of the county from which the visitor's IP address came, you get the following results: ClickThrough_i = -1.8(0.75) + 0.06(0.005)Income_i, where standard errors are reported in parentheses. What would be the calculation that yields the marginal effect of income moving from $40,000 to $41,000 on the click-through rate?
Φ(-1.8 + 0.06 × 41) - Φ(-1.8 + 0.06 × 40)
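The expression above can be evaluated with the standard normal CDF. This sketch assumes Income is measured in thousands of dollars, which the answer's use of 40 and 41 implies but the question does not state; the function names are illustrative.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def click_probability(income_thousands: float) -> float:
    """Probit prediction using the coefficients reported in the question."""
    # Assumption: income enters the model in thousands of dollars.
    return std_normal_cdf(-1.8 + 0.06 * income_thousands)


marginal_effect = click_probability(41) - click_probability(40)
print(round(marginal_effect, 4))
```

Under that assumption the change in the predicted click-through probability is a little under 0.02.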
Suppose you've regressed profits across stores (i) in Indiana and Michigan over two years (t) on an Indiana dummy variable as well as on an interaction between the Indiana dummy variable and a Year 2 dummy variable. Thus, your regression equation is: Profits_it = β0 + β1Indiana_it + β2(Year2_it × Indiana_it) + U_it. What is the marginal effect of a store being in Indiana based on this regression equation?
β1 + β2 × Year2_it
In order for a variable to be a valid instrumental variable, it needs to satisfy which two conditions?
Relevant and exogenous
Which step is not involved with constructing a representative sample?
Run a t-test to check if the outcomes are different across strata.
Estimating a probit or logit model via maximum likelihood involves all of the following except for what?
Setting up the moment conditions
Suppose you are estimating the following model: Yi = β0 + β1Xi + Ui. You believe the variance of the unobserved factors (U) varies with X. If this is true, what is the consequence?
None of the answers is correct.
Which object from the first or second stage reports whether an instrument is exogenous?
None of the answers is correct.
Suppose that you observed several key characteristics of a random sample of firms in your industry. You know that the semi-partial correlation of firm Productivity (Y) with R&D (Z) investment holding amount of Labor (X) fixed is positive. Furthermore, suppose you know that the covariance of R&D investment and the amount of labor is positive. How will the coefficient on Labor when you run the regression of Productivity on Labor and R&D investment relate to the coefficient on Labor when you run a regression of Productivity on just Labor?
It'll be less than
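The direction of this difference can be illustrated with a simulation. The sketch below uses hypothetical coefficients (a true Labor effect of 1.0 and an R&D effect of 2.0) and obtains the Labor coefficient from the two-regressor model via the Frisch-Waugh result rather than a matrix solver.

```python
import random

random.seed(1)
n = 5000
# Hypothetical data: R&D is positively related to Labor and raises Productivity.
labor = [random.gauss(0, 1) for _ in range(n)]
rd = [0.5 * l + random.gauss(0, 1) for l in labor]
productivity = [1.0 * l + 2.0 * r + random.gauss(0, 1) for l, r in zip(labor, rd)]


def slope(y, x):
    """Slope from the simple regression of y on x (with intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / sum((xi - x_bar) ** 2 for xi in x)


def residuals(y, x):
    """Residuals from the simple regression of y on x (with intercept)."""
    b1 = slope(y, x)
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]


# Regression of Productivity on Labor alone.
short_coef = slope(productivity, labor)

# Frisch-Waugh: the Labor coefficient in the regression on Labor and R&D equals
# the slope of Productivity on the part of Labor not explained by R&D.
long_coef = slope(productivity, residuals(labor, rd))

print(round(short_coef, 3), round(long_coef, 3))
```

With these made-up numbers, the Labor coefficient from the regression that includes R&D is smaller than the one from the regression on Labor alone, since the latter also picks up part of the R&D effect.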
Which of the following is not a valid reason to be concerned that your model might suffer from the endogeneity problem?
Measurement error in your outcome variable
If you are planning on running the regression model given by Tenure_i = β0 + β1Salary_i + β2Years of Education_i + U_i, which of the following situations would cause this model to have a limited dependent variable?
No one can have negative tenure at the company.
If you are modeling shopping decisions at the grocery store, and construct a control variable that is coded as 0 for not a store loyalty program member, 1 for an individual store loyalty program member, and 2 for a family store loyalty program member, it would be appropriate to model this as a limited dependent variable because:
None of the answers is correct.
