ACE 264 Exam
Which of the following assumptions is NOT necessary to justify use of the least squares estimator for the linear regression model? (a) No omitted variables (b) No sample selection (c) Large outliers are unlikely (d) Homoskedasticity (e) No perfect collinearity among regressors
(d) Homoskedasticity
Which regression method seeks to isolate variation in an endogenous regressor that is "as-if" randomly assigned? (a) Multiple regression (b) Polynomial regression (c) Fixed-effects regression (d) Instrumental variables regression (e) Time series regression
(d) Instrumental variables regression
In the case of a linear regression model with one regressor, Bhat1 refers to: (a) The slope coefficient estimate (b) The formula for the slope coefficient estimator (c) The population slope coefficient (d) a and b (e) All of the above
(d) a and b
In the case of a linear regression model with one regressor, βhat1 can refer to: (a) The estimated slope coefficient value (b) The formula for the slope coefficient estimator (c) The slope coefficient for the population regression function (d) a and b (e) All of the above
(d) a and b
Heteroskedasticity is a threat to the internal validity of linear regression analysis because: (a) It biases coefficient estimates (b) It results in incorrect standard errors (c) It may lead to incorrect conclusions from hypothesis tests (d) b and c (e) All of the above
(d) b and c
The R^2 statistic measures: (a) The statistical significance of the estimated effect of X on Y (b) How well the regression model fits the observed data (c) The size of the sum of squared residuals (SSR) relative to the variation in y (d) b and c
(d) b and c
Which types of variables (defined above) are included as regressors in the first stage regression? (a) x, y, and z (b) x and z (c) w, y, and z (d) w and z (e) z only
(d) w and z
Omitted variables bias occurs: (a) Anytime an observed variable is left out of a linear regression model (b) When omitted variables are correlated with the regressor of interest, x (c) When omitted variables are determinants of the outcome variable y (d) When either b or c is true (e) When both b and c are true
(e) When both b and c are true
A regression coefficient is estimated to be Bhat = 6200. You want to test whether this is significantly different from zero (i.e., H0 : B = 0 vs. H1 : B ≠ 0). To complete this test, we could: (a) Estimate the standard error for Bhat and calculate the test statistic and p-value (b) Determine whether the 95% confidence interval contains zero (c) Reject the null hypothesis since 6200 is a large number (d) Choose a significance level we are confident will result in rejecting the null hypothesis (e) a and b
(e) a and b
Which type of data is likely easiest to find? A. Cross-sectional data (units only) B. Unbalanced panel data (some units and times) C. Balanced panel data (all units and times)
A. Cross-sectional data (units only)
How do we test the joint significance of the coefficients on multiple regressors? A. t-test B. F-test C. Compare R^2 D. Use confidence interval
B. F-test
Suppose T = 2. What happens if we try to include binary regressors for both time periods? (D1t = 1 if t = 1 and D2t = 1 if t = 2) A. Higher R^2 B. Perfect multicollinearity C. Heteroskedastic standard errors D. Estimated effect of x on y is not constant
B. Perfect multicollinearity
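A minimal sketch of why both period dummies cannot be included alongside an intercept. The six-row panel below is hypothetical (three units observed in two periods); the point is only that D1 + D2 reproduces the intercept column exactly.

```python
# Hypothetical panel: three units, each observed in periods 1 and 2
periods = [1, 2, 1, 2, 1, 2]

intercept = [1 for _ in periods]               # column of ones
d1 = [1 if t == 1 else 0 for t in periods]     # D1t = 1 if t = 1
d2 = [1 if t == 2 else 0 for t in periods]     # D2t = 1 if t = 2

# The two dummies always sum to one, so they duplicate the intercept column
dummy_sum = [a + b for a, b in zip(d1, d2)]
is_perfectly_collinear = dummy_sum == intercept

print(is_perfectly_collinear)  # True
```

Dropping one dummy (or the intercept) removes the exact linear dependence.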
Test your understanding: For rainfall to be a valid instrument for prices, rainfall and the error term must be A. Correlated B. Uncorrelated
B. Uncorrelated
Which challenge to model validity involves differences between the variable we want to assess the effect of (x) and some imperfect proxy for that variable? A. Omitted variables B. Sample selection C. Measurement error D. Simultaneous causality
C. Measurement error
Which challenge to model validity did we try to address using instrumental variables in the butter demand elasticity example below? ln(qi) = B0 + B1 ln(pi) + ui A. Omitted variables B. Sample selection C. Simultaneous causality D. Measurement error
C. Simultaneous causality
In economics (and econometrics), what is an elasticity? A. The estimated variability in an effect B. The slope of the demand or supply curve C. The percentage change in one variable given a percentage change in another variable D. The change in a variable given multiple changes in other economic factors
C. The percentage change in one variable given a percentage change in another variable
Which challenge to model validity involves chicken-and-egg-type situations where x causes y and y causes x? A. Omitted variables B. Sample selection C. Measurement error D. Simultaneous causality
D. Simultaneous causality
Two regression models have the same outcome, y. Model A has a smaller sum of squared residuals, Σ ui^2, than Model B, so: (a) A has a higher R^2 than B (b) B has a higher R^2 than A (c) A and B have the same R^2 (d) We cannot say for sure which model has the higher R^2
(a) A has a higher R^2 than B
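A short illustration of the reasoning, assuming both models are fit to the same sample so that the total sum of squares (TSS) of y is identical. The SSR and TSS values are made-up numbers chosen only for the example.

```python
def r_squared(ssr, tss):
    """R^2 = 1 - SSR / TSS."""
    return 1.0 - ssr / tss

tss = 200.0      # hypothetical total variation in y (same for both models)
ssr_a = 50.0     # hypothetical SSR for Model A (smaller)
ssr_b = 80.0     # hypothetical SSR for Model B (larger)

r2_a = r_squared(ssr_a, tss)
r2_b = r_squared(ssr_b, tss)

print(r2_a > r2_b)  # True
```

With TSS fixed, a smaller SSR always means a larger R^2.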
If the correlation between a regressor x1 and omitted variable is positive, and the effect of the omitted variable on the outcome is positive, then: (a) B1hat > B1 (b) B1hat < B1 (c) B1hat > B2 (d) B1hat < B2
(a) B1hat > B1
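The direction of the bias can be checked with a small simulation. This is only a sketch under assumed values: the true slope on x1 is 2, the omitted variable x2 has a positive effect (3) on y, and x1 is built to be positively correlated with x2.

```python
import random

def slope(x, y):
    """OLS slope from a regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)
n = 5000
true_b1 = 2.0

x2 = [random.gauss(0, 1) for _ in range(n)]          # omitted variable
x1 = [0.5 * v + random.gauss(0, 1) for v in x2]      # positively correlated with x2
y = [1.0 + true_b1 * a + 3.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Regress y on x1 only, leaving x2 in the error term
b1_short = slope(x1, y)

print(b1_short > true_b1)  # True: the estimate is biased upward
```

Flipping the sign of either the correlation or the effect of x2 would flip the direction of the bias.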
Fill in the blank: Adding binary regressor D to the simple linear regression model yi = B0 + B1Xi + ui allows the ________ to vary depending on whether D = 0 or D = 1. (a) Intercept (b) Slope (c) F-statistic (d) R^2
(a) Intercept
What statistic is used to estimate the expected value of a variable? (a) Mean (b) Standard deviation (c) Variance (d) Z-statistic
(a) Mean
Holding SE constant, we are ____________ to reject the null hypothesis when the difference between the estimated and hypothesized values is larger. (a) More likely (b) Less likely (c) Neither more nor less likely (d) Hypothesis tests do not compare estimated and hypothesized values
(a) More likely
What variable created in the first-stage regression does the 2SLS estimator use in the second stage? (a) The predicted values for the first-stage outcome (b) The residuals (c) The standard error for B1hat (d) The F-statistic testing the null hypothesis that the coefficients on the instrumental variables are equal to zero
(a) The predicted values for the first-stage outcome
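A rough sketch of the two stages on simulated data. All numbers are assumptions: the true effect of x on y is 2, z is a valid instrument by construction, and x is endogenous because it shares the error u with y.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

random.seed(1)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]                     # instrument
u = [random.gauss(0, 1) for _ in range(n)]                     # error term
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]     # correlated with u
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# Naive OLS of y on x is biased because x is correlated with u
_, b_ols = ols(x, y)

# First stage: regress x on z and keep the predicted values
a0, a1 = ols(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Second stage: regress y on the first-stage predicted values
_, b_2sls = ols(x_hat, y)

print(round(b_ols, 2), round(b_2sls, 2))
```

Running the stages by hand gives the 2SLS coefficient, but the second-stage standard errors from this manual approach are not correct; dedicated IV routines adjust them.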
Unit fixed effects models control for: (a) Time-invariant omitted variables (b) Unit-invariant omitted variables (c) Both of the above (d) None of the above
(a) Time-invariant omitted variables
In the US, counties are administrative areas smaller than a state. In panel data where counties and years are the unit and time period, respectively, the geographic size of a county measured in square miles is an example of a: (a) Time-invariant variable (b) Unit-invariant variable (c) Both of the above (d) None of the above
(a) Time-invariant variable
Evaluate this statement. If the estimated effect of x on y changes when we implement a method like instrumental variables to address bias in our estimated effect of x on y, it suggests (but does not prove) the original estimate was biased. (a) True (b) False
(a) True
Evaluate this statement: The linear regression model estimated using the ordinary least squares estimator can be used to study both linear and non-linear relationships between two variables x and y. (a) True (b) False
(a) True
You survey 1,000 US farmers and ask if they use organic practices. You code this data as one if they say yes, and zero if they say no. This is what kind of variable? (a) Random (b) Binary (c) Continuous (d) Normally-distributed
(b) Binary
You want to test the null hypothesis that two regression coefficients are both equal to zero: B1 = 0 and B2 = 0. Which statistic is necessary to evaluate this hypothesis? (a) t (b) F (c) s (d) We do not have a statistic to evaluate joint hypotheses like this one
(b) F
Evaluate this statement: "Both the variance and covariance must always be positive" (a) True (b) False
(b) False
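A tiny counterexample, using made-up data: the variance is never negative, but the covariance is negative whenever one variable tends to fall as the other rises.

```python
def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor; sample_cov(a, a) is the variance."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]   # y falls as x rises

var_x = sample_cov(x, x)
cov_xy = sample_cov(x, y)

print(var_x, cov_xy)  # 1.0 -1.0
```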
Evaluate this statement: Our procedure for testing the statistical significance of the coefficient estimate for B1 is different when the regression function contains logarithmic transformations of the variables. (a) True (b) False
(b) False
Evaluate this statement: Panel data can address all types of omitted variables bias. (a) True (b) False
(b) False
We can calculate the p-value for the test statistic t using values from the normal distribution because: (a) All variables are normally distributed (b) In large samples, the estimated regression coefficient follows a normal distribution with a given mean and variance (c) The expected value of estimated regression coefficient is the true value, i.e. it is unbiased (d) None of the above
(b) In large samples, the estimated regression coefficient follows a normal distribution with a given mean and variance
We can estimate models that include unit fixed effects by doing each of the following EXCEPT: (a) Including n - 1 binary regressors for each unit but one (b) Including T - 1 binary regressors for each time period but one (c) Taking the difference of each variable over time if our data have only T = 2 time periods (d) Subtracting the unit-specific mean from each variable
(b) Including T - 1 binary regressors for each time period but one
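As an illustration of options (c) and (d), the sketch below applies unit demeaning and first differencing to a hypothetical two-period panel. With T = 2 and no time effects, the two approaches give the same slope.

```python
# Hypothetical panel: unit -> [(x, y) in period 1, (x, y) in period 2]
panel = {
    "A": [(1.0, 2.1), (2.0, 3.9)],
    "B": [(2.0, 14.2), (4.0, 17.8)],
    "C": [(3.0, 25.9), (5.0, 30.1)],
}

# Within estimator: subtract each unit's own mean from x and y, then run OLS
num = den = 0.0
for obs in panel.values():
    x_mean = sum(x for x, _ in obs) / len(obs)
    y_mean = sum(y for _, y in obs) / len(obs)
    for x, y in obs:
        num += (x - x_mean) * (y - y_mean)
        den += (x - x_mean) ** 2
b_within = num / den

# First-difference estimator: regress the change in y on the change in x (no intercept)
dx = [obs[1][0] - obs[0][0] for obs in panel.values()]
dy = [obs[1][1] - obs[0][1] for obs in panel.values()]
b_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

print(round(b_within, 4), round(b_fd, 4))
```

Both transformations remove anything that is constant within a unit, which is why the large differences in the level of y across units A, B, and C do not affect the slope.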
You want to estimate the effect of food consumption habits on obesity. If your survey procedure systematically excludes overweight people because you only survey college students who regularly go to the gym, your regression analysis of these data may have problems with: (a) Omitted variables (b) Sample selection (c) Measurement error (d) Wrong functional form (e) Simultaneous causality
(b) Sample selection
Suppose we change the scale of a regressor: we divide it by 1000 to express it in thousands. What happens to the statistical significance of our coefficient estimate? (a) It becomes more likely that the estimate is statistically significant (b) Scaling the regressor does not affect statistical significance (c) It becomes less likely that the estimate is statistically significant (d) We cannot say what happens to statistical significance
(b) Scaling the regressor does not affect statistical significance
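A quick numerical check on made-up data. The sketch uses the homoskedasticity-only standard error for simplicity (an assumption of this example, not something the question specifies). Dividing x by 1000 multiplies the slope and its standard error by 1000, so the t-statistic is unchanged.

```python
import math

def slope_and_t(x, y):
    """Return the OLS slope and its t-statistic (homoskedasticity-only SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return b1, b1 / se

x = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.7]

b1_raw, t_raw = slope_and_t(x, y)
b1_scaled, t_scaled = slope_and_t([v / 1000 for v in x], y)   # x in thousands

print(math.isclose(t_raw, t_scaled, rel_tol=1e-6))  # True
```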
Perfect multicollinearity occurs when: (a) The variance of ui varies with x (b) The correlation between two regressors equals one (c) There is a large outlier in one regressor (d) All of the above
(b) The correlation between two regressors equals one
In panel data where counties and years are the unit and time period, respectively, state-level tax rates are an example of a: (a) Time-invariant variable (b) Unit-invariant variable (c) Both of the above (d) None of the above
(b) Unit-invariant variable
The two-stage least squares estimator involves a first stage regression. What is the outcome variable in the first stage regression? (a) w, an exogenous regressor from the second stage regression equation (b) x, the endogenous regressor in the second stage regression equation (c) y, the outcome in the second stage regression equation (d) z, an instrumental variable that is correlated with x
(b) x, the endogenous regressor in the second stage regression equation
Fill in the blanks: The least squares estimator finds the values of __________ that minimize the sum of _____________. (a) B , Ui (b) Bhat , X^2i (c) Bhat , U^2i (d) Yi , Bhat (e) Yi , U^2i
(c) Bhat , U^2i
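The idea can be sketched for the one-regressor case, where the minimizing values have a closed form. The data are made up; the check at the end simply confirms that moving either coefficient away from the least squares values raises the sum of squared residuals.

```python
def ols_simple(x, y):
    """Closed-form least squares intercept and slope for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ssr(x, y, b0, b1):
    """Sum of squared residuals for candidate coefficients b0 and b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_hat, b1_hat = ols_simple(x, y)
ssr_min = ssr(x, y, b0_hat, b1_hat)

# Any other choice of coefficients gives a larger SSR
ssr_shift_slope = ssr(x, y, b0_hat, b1_hat + 0.1)
ssr_shift_intercept = ssr(x, y, b0_hat + 0.1, b1_hat)

print(ssr_min < ssr_shift_slope, ssr_min < ssr_shift_intercept)  # True True
```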
You estimate the effect of education on earnings using data on individuals in Indiana. Your results will have external validity to Illinois if: (a) Individuals in each state behave similarly so the two populations are comparable (b) Relevant policies, economic systems, and other settings are comparable (c) Both are necessary for external validity
(c) Both are necessary for external validity
How do we typically address heteroskedasticity in the linear regression model? (a) Include as many regressors as possible (b) Re-scale our variables to make them easier to interpret (c) Calculate SE(Bhat) using the heteroskedasticity-robust formula (d) Assume the problem goes away if we have a large sample size
(c) Calculate SE(Bhat) using the heteroskedasticity-robust formula
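A minimal sketch of the difference between the two standard errors for a one-regressor model, on made-up data whose spread grows with x. The robust version shown is the basic "HC0" form with no small-sample correction; statistical software often applies a degrees-of-freedom adjustment, so its numbers may differ slightly.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.8, 3.4, 3.5, 6.0, 4.9, 8.6, 6.3]   # hypothetical data

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Homoskedasticity-only SE: assumes one common error variance
se_homo = math.sqrt(sum(e ** 2 for e in resid) / (n - 2) / sxx)

# Robust (HC0) SE: each squared residual is weighted by its own (xi - x_bar)^2
se_robust = math.sqrt(
    sum(((xi - x_bar) ** 2) * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
)

print(round(b1, 3), round(se_homo, 3), round(se_robust, 3))
```

The coefficient estimate is the same either way; only the standard error, and therefore the t-statistic and confidence interval, changes.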
You survey 1,000 US consumers and ask how much they spent on organic food products in the last week. This is what kind of variable? (a) Binary (b) Discrete (c) Continuous (d) None of the above
(c) Continuous
Which of the following statistics does NOT measure variability in a single variable? (a) Standard deviation (b) Variation (c) Correlation (d) All of these measure variability in a single variable.
(c) Correlation
The output includes a hypothesis test conducted using the linearHypothesis() command. The objective of this test is to assess the relevance of the instrumental variable. What is the criterion for concluding the instrumental variable is relevant? Is the instrumental variable relevant in this case? (a) Having p < 0.05; yes, it is. (b) Having p < 0.05; no, it is not. (c) Having F > 10; yes, it is. (d) Having F > 10; no, it is not.
(c) Having F > 10; yes, it is.
The variable created when one regressor, x1, is multiplied by another, x2, is called a(n): (a) Binary variable (b) Coefficient (c) Interaction term (d) Quadratic term (e) Intercept
(c) Interaction term
You want to estimate the effect of food consumption habits on obesity. You survey 1,000 people about their food consumption patterns over the past year and measure their current weight, height, and body fat. If your survey respondents have difficulty remembering their past food consumption, your regression analysis of these data may have problems with: (a) Omitted variables (b) Sample selection (c) Measurement error (d) Wrong functional form (e) Simultaneous causality
(c) Measurement error
For linear regression, which of the following is NOT a reason why it is generally better to have a larger sample size? (a) Coefficient estimates get closer to their (theoretical) true value (b) The variance of estimated coefficients gets smaller (c) Omitted variable bias in an estimated coefficient is reduced (d) The distribution of an estimated coefficient becomes approximately normal (e) All of the above are reasons why larger sample size is better.
(c) Omitted variable bias in an estimated coefficient is reduced
Testing the joint null hypothesis H0 : B1 = 0 and B2 = 0 can involve the estimation of a(n) ____________ regression model where x1 and x2 are excluded as regressors. (a) Simple (b) Scaled (c) Restricted (d) Unrestricted
(c) Restricted
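One common way to use the restricted model is the homoskedasticity-only F-statistic, which compares the SSR of the restricted and unrestricted regressions. The sketch below uses made-up SSR values and sample size; it is not the heteroskedasticity-robust F that software typically reports.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """Homoskedasticity-only F-statistic.

    q: number of restrictions being tested
    n: number of observations
    k: number of regressors in the unrestricted model (excluding the intercept)
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical values: dropping x1 and x2 raises SSR from 100 to 120
f_stat = f_statistic(ssr_restricted=120.0, ssr_unrestricted=100.0, q=2, n=103, k=2)

print(f_stat)  # 10.0
```

A large F means the excluded regressors explain a lot of variation in y, which is evidence against the joint null hypothesis.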
In the case of a linear regression model with only one regressor, the correct interpretation of Bhat1 is: (a) The correlation between x and y (b) The change in y given an average change in x (c) The change in y given a one unit change in x (d) The change in y given a one unit change in x, holding all omitted variables constant
(c) The change in y given a one unit change in x
The two conditions for a valid instrumental variable are relevance and exogeneity. Instruments are considered exogenous when: (a) The instrumental variable, z, is correlated with the endogenous regressor, x (b) The instrumental variable, z, is correlated with the outcome, y (c) The instrumental variable, z, is uncorrelated with the error term, u (d) The instrumental variable, z, is uncorrelated with the outcome, y
(c) The instrumental variable, z, is uncorrelated with the error term, u
In the context of statistical hypothesis testing about a regression coefficient B, the p-value describes: (a) The probability that B takes on the hypothesized value (b) The probability that B is greater than the hypothesized value (c) The probability of observing Bhat if B equals the hypothesized value (d) The probability that Bhat equals B
(c) The probability of observing Bhat if B equals the hypothesized value
Data analysts often prefer log-log specifications because: (a) They always provide a better fit to the data (b) They are easier to estimate (c) Their slopes can be interpreted as an elasticity (d) You cannot take a logarithm of a negative number
(c) Their slopes can be interpreted as an elasticity
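A small check of the elasticity interpretation, using hypothetical price and quantity data generated from a constant-elasticity relationship q = 100 * p^(-0.5). Regressing ln(q) on ln(p) recovers the exponent, which is the elasticity.

```python
import math

prices = [1.0, 2.0, 4.0, 8.0]
quantities = [100.0 * p ** -0.5 for p in prices]   # assumed elasticity of -0.5

ln_p = [math.log(p) for p in prices]
ln_q = [math.log(q) for q in quantities]

# OLS slope of ln(q) on ln(p)
n = len(ln_p)
p_bar = sum(ln_p) / n
q_bar = sum(ln_q) / n
elasticity = (
    sum((a - p_bar) * (b - q_bar) for a, b in zip(ln_p, ln_q))
    / sum((a - p_bar) ** 2 for a in ln_p)
)

print(round(elasticity, 6))  # -0.5
```

Here a 1% increase in price is associated with a 0.5% decrease in quantity, regardless of the units in which price and quantity are measured.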
Which of the following is NOT a method for addressing threats to internal validity: (a) Including additional regressors x that might otherwise be omitted variables (b) Using instrumental variables for endogenous regressors (c) Using R^2 to measure goodness-of-fit (d) Including unit and time fixed effects in panel data (e) All of the above are methods to address issues of internal validity
(c) Using R^2 to measure goodness-of-fit
In the simple linear regression model, yi = B0 + B1xi + ui, xi is considered an endogenous regressor if: (a) xi is correlated with the outcome, yi (b) yi has its own simultaneous causal relationship with xi (c) xi is correlated with the error term, ui (d) b and c (e) None of the above
(c) xi is correlated with the error term, ui
Calculate the 95% confidence interval for B given the estimate Bhat = 6200 from the earlier hypothesis-testing question, assuming SE(Bhat) = 1000. (a) (-1000, 6200) (b) (6198, 6202) (c) (5200, 7200) (d) (4240, 8160)
(d) (4240, 8160)
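The arithmetic behind this answer, plus the matching t-statistic and p-value for H0 : B = 0, can be sketched as follows. The critical value 1.96 and the p-value both rely on the large-sample normal approximation.

```python
import math

b_hat = 6200.0    # coefficient estimate
se = 1000.0       # standard error of the estimate
z_crit = 1.96     # two-sided 5% critical value from the standard normal

# 95% confidence interval: estimate plus or minus 1.96 standard errors
ci_low = b_hat - z_crit * se
ci_high = b_hat + z_crit * se

# t-statistic for H0: B = 0 and its two-sided p-value, 2 * (1 - Phi(|t|))
t_stat = (b_hat - 0.0) / se
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(round(ci_low), round(ci_high))  # 4240 8160
print(t_stat)                         # 6.2
```

The interval excludes zero and the p-value is far below 0.05, so both approaches lead to rejecting the null hypothesis.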
A regression model that includes both unit and time fixed effects is called: (a) A two-stage least squares model (b) A two-period model (c) A two-way fixed effects model (d) A T = 2 model
(c) A two-way fixed effects model
Which of the following methods may be used to select a nonlinear regression specification? (a) Plot the data and observe how different specifications fit the data (b) Compare R^2 and/or adj - R^2 for different models (c) Test the statistical significance of coefficient estimates for variables added to the simple linear regression equation (d) All of the above
(d) All of the above
In the in-class example of the estimated effect of ballot order on voting outcomes in Texas elections, omitted variable bias was less of a concern because: (a) Ballot order is the only factor that affects voting outcomes, so there were no omitted variables (b) The estimated effect was so large that it probably exceeded any possible bias (c) In the election considered, both candidates had the same last name so voters couldn't tell the difference between the two candidates. (d) Ballot order was randomized, so the correlation between ballot order and other factors is likely to be zero
(d) Ballot order was randomized, so the correlation between ballot order and other factors is likely to be zero