306 Final Exam


Suppose that the linear probability model yields a predicted value of Y that is equal to 1.3. Explain why this is nonsensical.

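In the linear probability model the predicted value of Y is a probability, so it must lie between 0 and 1. A value of 1.3 is not a valid probability.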

Suppose that you have just read a careful statistical study of the effect of advertising on the demand for cigarettes. Using data from New York during the​ 1970s, the study concluded that advertising on buses and subways was more effective than print advertising. Use the concept of external validity to determine if these results are likely to apply to Boston in the​ 1970s; Los Angeles in the​ 1970s; New York in 2010.

The results are likely to apply to Boston in the​ 1970s, but not to Los Angeles in the 1970s or New York in 2010.

A researcher estimates a regression using two different software packages. The first uses the​ homoskedasticity-only formula for standard errors. The second uses the​ heteroskedasticity-robust formula. The standard errors are very different. Which should the researcher​ use?

The ​heteroskedasticity-robust standard errors should be used.

An ordinary least squares regression of Y onto X will be internally inconsistent if X is correlated with the error term. Each of the five primary threats to internal validity implies that X is correlated with the error term.

True. True.

By imposing restrictions on the true​ coefficients, the researcher wishes to test the null hypothesis that the coefficients on I and E are jointly​ 0, against the alternative that at least one of them is not equal to​ 0, while controlling for the other variables. The values of the sum of squared residuals ​(SSR​) from the unrestricted and restricted regressions are 36.50 and 37.50​, respectively.

F = ((37.50 - 36.50)/2) / (36.50/(100 - 3 - 1)) ≈ 1.32
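
The arithmetic in this answer can be checked with a short sketch of the homoskedasticity-only F-statistic. The function name and argument names are illustrative, and n = 100 and k = 3 are read off the denominator of the answer rather than stated in the question.

```python
def homoskedastic_f(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic built from restricted and unrestricted SSRs."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator

# q = 2 restrictions (the coefficients on I and E); n and k are taken from the answer.
f_stat = homoskedastic_f(37.50, 36.50, q=2, n=100, k_unrestricted=3)
print(round(f_stat, 2))  # about 1.32
```

Since 1.32 is below the 5% critical value of F(2, ∞), which is 3.00, the null hypothesis would not be rejected under these numbers.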

Consider the following least squares specification between test scores and income: Test Score = 557.8 + 36.42 ln(Income). According to this equation, a 1% increase in income is associated with an increase in test scores of:

0.36 points
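
A minimal sketch of why the answer is about 0.36, comparing the exact change with the usual rule of thumb (0.01 × the coefficient). The function name is illustrative.

```python
import math

def linear_log_effect(beta, pct_increase):
    """Change in Y when X rises by pct_increase percent in Y = b0 + beta * ln(X)."""
    return beta * math.log(1 + pct_increase / 100)

exact = linear_log_effect(36.42, 1)  # exact change for a 1% rise in income
approx = 0.01 * 36.42                # rule-of-thumb approximation
print(round(exact, 2), round(approx, 2))  # both round to 0.36
```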

Consider the following regression output where the dependent variable is test scores and the two explanatory variables are the​ student-teacher ratio and the percent of English​ learners: Test Score=698.9−1.10×STR−0.650×PctEL. You are told that the t​-statistic on the​ student-teacher ratio coefficient is 2.56. The standard error therefore is​ approximately:

0.43
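
The standard error follows from rearranging t = coefficient / SE. A small sketch, with an illustrative helper name:

```python
def se_from_t(coefficient, t_stat):
    """Recover the standard error from a coefficient and its t-statistic."""
    return abs(coefficient) / abs(t_stat)

se_str = se_from_t(-1.10, 2.56)
print(round(se_str, 2))  # 0.43
```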

The adjusted R^2​, or Rbar^2​, is given​ by:

1 - ((n-1)/(n-k-1)) × (SSR/TSS)
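
A sketch of the adjusted R^2 formula, 1 - ((n-1)/(n-k-1)) × (SSR/TSS), next to the ordinary R^2. The input numbers are hypothetical and only illustrate that the adjusted value is the smaller of the two.

```python
def r_squared(ssr, tss):
    """Ordinary R^2 = 1 - SSR/TSS."""
    return 1 - ssr / tss

def adjusted_r_squared(ssr, tss, n, k):
    """Adjusted R^2 with n observations and k regressors (intercept not counted)."""
    return 1 - (n - 1) / (n - k - 1) * ssr / tss

# Hypothetical values chosen for illustration only.
r2 = r_squared(36.5, 50.0)
r2_bar = adjusted_r_squared(36.5, 50.0, n=100, k=3)
print(round(r2, 4), round(r2_bar, 4))
```

Because (n-1)/(n-k-1) exceeds 1 whenever k ≥ 1, the adjusted value never exceeds the ordinary R^2.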

The critical value of F_{4,∞} at the 5% significance level is:

2.37
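
The tabulated value can be reproduced without a statistics library, using the fact that F(q, ∞) is a chi-squared variable with q degrees of freedom divided by q. The sketch below relies on the closed-form chi-squared CDF for 4 degrees of freedom and a simple bisection; the function names are illustrative.

```python
import math

def chi2_cdf_4df(x):
    """CDF of a chi-squared variable with 4 degrees of freedom (closed form)."""
    return 1 - math.exp(-x / 2) * (1 + x / 2)

def f_critical_4_inf(alpha=0.05):
    """Critical value of F(4, infinity), i.e. the chi-squared(4) quantile divided by 4."""
    lo, hi = 0.0, 100.0
    for _ in range(200):  # bisection on the chi-squared quantile
        mid = (lo + hi) / 2
        if chi2_cdf_4df(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2 / 4

print(round(f_critical_4_inf(), 2))  # 2.37
```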

Assume that you had estimated the following quadratic regression model: Test Score = 607.3 + 3.85 Income − 0.0423 Income^2. If income increased from 10 to 11 ($10,000 to $11,000), then the predicted effect on test scores would be:

2.96
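
In a quadratic model the effect of a change in X depends on the starting point, so it is computed as the difference between two predicted values. A minimal sketch using the coefficients from the card:

```python
def predicted_score(income):
    """Predicted test score from the estimated quadratic regression."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

effect = predicted_score(11) - predicted_score(10)
print(round(effect, 2))  # 2.96
```

The same one-unit increase starting from a higher income gives a smaller effect, because the coefficient on the squared term is negative.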

You have estimated the relationship between test scores and the student-teacher ratio under the assumption of homoskedasticity of the error terms. The regression output is as follows: Test Score = 698.9 − 2.28 × STR, and the standard error on the slope is 0.48. The homoskedasticity-only "overall" regression F-statistic for the hypothesis that the regression R^2 is zero is approximately:

22.56
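
With a single regressor, the homoskedasticity-only overall F-statistic equals the square of the t-statistic on the slope. A short check of the arithmetic:

```python
slope = -2.28
standard_error = 0.48

t_stat = slope / standard_error  # t-statistic on the slope
f_overall = t_stat ** 2          # F = t^2 when there is one restriction
print(round(f_overall, 2))       # 22.56
```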

What is the difference between internal validity and external validity​? What is the difference between the population studied and the population of interest​?

A statistical analysis is said to have internal validity if the statistical inferences about causal effects are valid for the population being studied. The analysis is said to have external validity if conclusions can be generalized to other populations and settings. The population studied is the population from which the sample was​ drawn, while the population of interest is the population to which causal inferences from this study are to be applied.

What is the​ trade-off when including an extra variable in a​ regression?

An extra variable could control for omitted variable​ bias, but it also increases the variance of other estimated coefficients.

A researcher estimates the effect on crime rates of spending on police by using​ city-level data. Which of the following represents simultaneous​ causality?

Cities with high crime rates may need a larger police​ force, and thus more spending. More police​ spending, in​ turn, reduces crime.

Which of the following is an example of panel​ data? A panel in which the variables are studied for all 'n​' entities for all 'T​' time periods is (_________) panel. Such omitted variable biases in panel regressions can be removed by using an OLS regression with (__________)

Data on the performance of Golden State Warriors, Cleveland Cavaliers, Chicago Bulls, New York Knicks, and Dallas Mavericks in the NBA playoffs for the years 2000 to 2015. A balanced panel. Fixed effects.

A recent study found that the death rate for people who sleep 6 to 7 hours per night is lower than the death rate for people who sleep 8 or more hours. The 1.1 million observations used for this study came from a random survey of Americans aged 30 to 102. Each survey respondent was tracked for 4 years. The death rate for people sleeping 7 hours was calculated as the ratio of the number of deaths over the span of the study among people sleeping 7 hours to the total number of survey respondents who slept 7 hours. This calculation was then repeated for people sleeping 6​ hours, and so on. Based on this​ summary, would you recommend that Americans who sleep 9 hours per night consider reducing their sleep to 6 or 7 hours if they want to prolong their​ lives? Why or why​ not? Explain.

Drug or alcohol use. Type of employment. Indicator for chronic illness

The following OLS assumption is most likely violated by omitted variables​ bias:

E(u_i|X_i)=0

Consider a regression with two​ variables, in which X_1i is the variable of interest and X_2i is the control variable. Conditional mean independence​ requires:

E(u_i|X_1i,X_2i)=E(u_i|X_2i)

Consider the polynomial regression model of degree r, Y_i = β_0 + β_1 X_i + β_2 X_i^2 + ... + β_r X_i^r + u_i. The null hypothesis that the regression is linear, against the alternative that it is a polynomial of degree r, corresponds to:

H_0: β_2 = 0, β_3 = 0, ..., β_r = 0 vs. H_1: at least one β_j ≠ 0, j = 2, ..., r

You have estimated a linear regression model relating Y to X. Your professor​ says, "I think that the relationship between Y and X is​ nonlinear." How would you test the adequacy of your linear​ regression? ​(Check all that apply​)

If adding a quadratic term, you could test the hypothesis that the estimated coefficient of the quadratic term is significantly different from zero. Compare the fit of the linear regression to that of the nonlinear regression model.

When does missing data pose a threat to internal​ validity? Which of the following statements is not an implication of the regression error being correlated across​ observations?

Internal validity is threatened when the data are missing because of a selection process that is related to Y_i beyond depending on X_i. The statement that is not an implication: the OLS estimators become biased and inconsistent.

What do subscripts i and t refer​ to?

Subscripts i and t identify the entity and time period respectively.

A researcher is interested in the effect on test scores of computer usage. Using school district data, she regresses district average test scores on the number of computers per student. What are possible sources of bias for β1, the estimated effect on test scores of increasing the number of computers per student? For each source of bias below, determine whether β1 will be biased up or down. Average income per capita in the district. If this variable is omitted, it will likely produce a(an) (________) bias of the estimated effect on test scores of increasing the number of computers per student. The availability of computerized adaptive learning tools in the district. If this variable is omitted, it will likely produce a(an) (________) bias of the estimated effect on test scores of increasing the number of computers per student. The availability of computer-related leisure activities in the district. If this variable is omitted, it will likely produce a(an) (________) bias of the estimated effect on test scores of increasing the number of computers per student.

Upward Upward Downward

Graph Question

Y_i = B_0 + B_1 X_i + B_2 X_i^2 + u_i. The relationship between wage earnings and years of experience. The relationship between time spent studying for an exam and grade for such exam. The relationship between income and fertility.

A polynomial regression model is specified​ as:

Y_i = B_0 + B_1 X_i + B_2 X_i^2 + ... + B_r X_i^r + u_i

Consider the following regression​ equation: Yi=β0+β1Xi+β2Xi ×Di+ui​, where β0​, β1​, β2​, and ui are the​ intercept, the slope coefficient on Xi​, the coefficient on the interaction term which is the product of ​(Xi ×Di​), where Di is the binary variable respectively. This regression equation has (_____) slope and (______)intercept for the two values of the binary variable. The coefficient on ​(X1×X2​) is the effect of a​ one-unit increase in the product of X1 and X2​, above and beyond the sum of the individual effects of a unit increase in X1 alone and a unit increase in X2 alone. Which of the following statements describes a way of determining the degree of the polynomial in X which best models a nonlinear​ regression? Let r denote the highest power of X that is included in the regression.

A different slope; the same intercept. This holds true whether X_1 and/or X_2 are continuous or binary. One way is to check whether the coefficients in the regression equation associated with the largest values of r are equal to zero.

All of the following are true with the exception of one​ condition:

a high R^2 or Rbar^2 always means that an added variable is statistically significant.

A survey of earnings contains an unusually high fraction of individuals who state their weekly earnings in​ 100s, such as​ 300, 400,​ 500, etc. This is an example of:

​errors-in-variables bias.

Consider the population regression of log earnings ​[Yi​,where Yi= ​ln(Earningsi​)] against two binary​ variables: whether a worker is married ​(D1i​,where D1i ​= 1 if the ith person is​ married) and the​ worker's gender (D2i​,where D2i = 1 if the ith person is​ female), and the product of the two binary variables Yi=β0+β1D1i+β2D2i+β3D1i×D2i+μi. The interaction​ term:

allows the population effect on log earnings of being married to depend on gender.

Under the least squares assumptions for the multiple regression problem​ (zero conditional mean for the error​ term, all X_i and Y_i being​ i.i.d., all X_i and μ_i having finite fourth​ moments, no perfect​ multicollinearity), the OLS estimators for the slopes and​ intercept:

are unbiased and consistent.

The interpretation of the slope coefficient in the model ln(Yi)=β_0+β_1ln(Xi)+μi is as​ follows:

a​ 1% change in X is associated with a β_1​% change in Y.

In the multiple regression​ model, the t​-statistic for testing that the slope is significantly different from zero is​ calculated:

by dividing the estimate by its standard error.

If you had a​ two-regressor regression​ model, then omitting one variable that is​ relevant:

can result in a negative value for the coefficient of the included variable, even though that variable would have a significant positive effect on Y if the omitted variable were included.

The​ homoskedasticity-only F​-statistic and the​ heteroskedasticity-robust F​-statistic typically​ are:

different.

A binary variable is often called​ a:

dummy variable.

Threats to internal validity lead​ to:

failures of one or more of the least squares assumptions.

The probit​ model:

forces the predicted values to lie between 0 and 1.

Consider the multiple regression model with two regressors X1 and X2​, where both variables are determinants of the dependent variable. When omitting X2 from the​ regression, there will be omitted variable bias for B_1

if X_1 and X_2 are correlated.
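
A small deterministic sketch of this condition. The data are made up, and the true model Y = 1 + 2 X1 + 3 X2 has no error term so that the arithmetic is exact. Regressing Y on X1 alone recovers the true slope of 2 only when X1 and X2 are uncorrelated in the sample.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

# Hypothetical data for illustration.
x1 = [1, 2, 3, 4]
x2_correlated = [1, 2, 2, 4]      # moves together with X1
x2_uncorrelated = [1, -1, -1, 1]  # sample covariance with X1 is exactly zero

y_correlated = [1 + 2 * a + 3 * b for a, b in zip(x1, x2_correlated)]
y_uncorrelated = [1 + 2 * a + 3 * b for a, b in zip(x1, x2_uncorrelated)]

slope_with_bias = ols_slope(x1, y_correlated)       # 2 + 3 * 0.9 = 4.7
slope_without_bias = ols_slope(x1, y_uncorrelated)  # 2.0, the true coefficient
print(round(slope_with_bias, 2), round(slope_without_bias, 2))
```

The size of the bias in the first case is the coefficient on the omitted variable (3) times the slope from regressing X2 on X1 (0.9).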

Imperfect​ multicollinearity:

implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand.

The question of​ reliability/unreliability of a multiple regression depends​ on:

internal and external validity.

A nonlinear​ function:

is a function with a slope that is not constant.

This problem is inspired by a study of the "gender gap" in earnings in top corporate jobs [Bertrand and Hallock (2001)]. The study compares total compensation among top executives in a large set of U.S. public corporations in the 1990s. (Each year these publicly traded corporations must report total compensation levels for their top five executives.) Let Female be an indicator variable that is equal to 1 for females and 0 for males. A regression of the logarithm of earnings onto Female yields ln(Earnings) = 6.48 (0.01) − 0.44 (0.05) Female, SER = 2.65, with standard errors in parentheses. The estimated coefficient on Female is -0.44. Explain what this value means. The SER is 2.65. Explain what this value means. Does this regression suggest that female top executives earn less than top male executives? Does this regression suggest that there is gender discrimination?

ln(Earnings) for females is, on average, 0.44 lower than men's. Earnings for females are, on average, about 44% lower than men's (using the log approximation). The error term has a standard deviation of 2.65 (measured in log points). Yes. No.
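
The 44% figure relies on the approximation that a difference in logs is roughly a percentage difference, which becomes loose for a coefficient as large as -0.44. A short sketch of the exact conversion, with an illustrative helper name:

```python
import math

def exact_pct_change(log_coefficient):
    """Exact proportional change in Y implied by a coefficient in a log-linear model."""
    return math.exp(log_coefficient) - 1

approx_change = -0.44                   # log-point approximation: about -44%
exact_change = exact_pct_change(-0.44)  # about -0.356, i.e. roughly 36% lower
print(round(exact_change, 3))
```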

A "Cobb-Douglas" production function relates production (Q) to factors of production, capital (K), labor (L), raw materials (M), and an error term u using the equation Q = λ K^β1 L^β2 M^β3 e^u, where λ, β1, β2, and β3 are production parameters. Taking logarithms of both sides of the equation yields ln(Q) = β0 + β1 ln(K) + β2 ln(L) + β3 ln(M) + u. Suppose that you thought that the value of β2 was not constant, but rather increased when K increased. Which of the following regression functions captures this dynamic relationship?

ln(Q) = β0 + β1 ln(K) + β2 ln(L) + β3 ln(M) + β4 [ln(L) × ln(K)] + u

Imperfect​ multicollinearity:

means that two or more of the regressors are highly correlated.

Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. You first regress Y on X1 only and find no relationship. However, when regressing Y on X1 and X2, the slope coefficient β1 changes by a large amount. This suggests that the first regression suffers from:

omitted variable bias.

The dummy variable trap is an example​ of:

perfect multicollinearity.

The best way to interpret polynomial regressions is​ to:

plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for one or more values of X.

Your textbook plots the estimated regression function produced by the probit regression of deny on P/I ratio. The estimated probit regression function has a stretched "S" shape given that the coefficient on the P/I ratio is positive. Consider a probit regression function with a negative coefficient. The shape would:

resemble an inverted​ "S" shape​ (for low values of X​, the predicted probability of Y would approach​ 1).

In the case of​ errors-in-variables bias:

the OLS estimator is consistent if the variance in the unobservable variable is relatively large compared to the variance in the measurement error.
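
A sketch of the attenuation factor behind this answer, assuming classical measurement error (the error w is uncorrelated with the true regressor and with u; that assumption is not stated on the card). Under it, the OLS slope converges to β_1 × σ_X^2 / (σ_X^2 + σ_w^2), so the inconsistency shrinks as the variance of the true variable grows relative to the variance of the measurement error. The numbers are hypothetical.

```python
def attenuation_factor(var_true_x, var_measurement_error):
    """Ratio sigma_X^2 / (sigma_X^2 + sigma_w^2) under classical measurement error."""
    return var_true_x / (var_true_x + var_measurement_error)

def ols_probability_limit(beta_1, var_true_x, var_measurement_error):
    """Value the OLS slope converges to when X is measured with classical error."""
    return beta_1 * attenuation_factor(var_true_x, var_measurement_error)

# Hypothetical variances for illustration.
small_error = ols_probability_limit(1.0, var_true_x=9.0, var_measurement_error=1.0)  # 0.9
large_error = ols_probability_limit(1.0, var_true_x=9.0, var_measurement_error=9.0)  # 0.5
print(small_error, large_error)
```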

The linear probability model​ is:

the application of the linear multiple regression model to a binary dependent variable.

In the probit​ regression, the coefficient β1 ​indicates:

the change in the z​-value associated with a unit change in X.

In the case of​ errors-in-variables bias, the precise size and direction of the bias depend​ on:

the correlation between the measured variable and the measurement error.

In the​ log-log model, the slope coefficient​ indicates:

the elasticity of Y with respect to X.

Internal validity is​ that:

the estimator of the causal effect should be unbiased and consistent.

Comparing the California test scores to test scores in Massachusetts is appropriate for external validity​ if:

the institutional settings in California and​ Massachusetts, such as organization in classroom instruction and​ curriculum, were similar in the two states.

A statistical analysis is internally valid​ if:

the statistical inferences about causal effects are valid for the population studied.

The true causal effect might not be the same in the population studied and the population of interest​ because:

All of the above: the study is out of date, there are geographical differences, and there are differences in the characteristics of the population.

When testing a joint​ hypothesis, you​ should:

use the F​-statistics and reject at least one of the hypotheses if the statistic exceeds the critical value.

Using the textbook example of 420 California school districts and the regression of test scores on the​ student-teacher ratio, you find that the standard error on the slope coefficient is 0.51 when using the​ heteroskedasticity-robust formula, while it is 0.48 when employing the​ homoskedasticity-only formula. When calculating the t​-statistic, the recommended procedure is​ to:

use the​ heteroskedasticity-robust formula.

In the multiple regression model, the adjusted R^2, Rbar^2:

will never be greater than the regression R^2

In which of the following scenarios does perfect multicollinearity​ occur? Why is it impossible to compute OLS estimators in the presence of perfect​ multicollinearity? Perfect multicollinearity can be rectified by modifying the ()

Perfect multicollinearity occurs when one of the regressors is a perfect linear function of the other regressors. It is impossible to compute OLS estimators in the presence of perfect multicollinearity because it produces division by 0. (independent variables)

Labor economists studying the determinants of​ women's earnings discovered a puzzling empirical result. Using randomly selected employed​ women, they regressed earnings on the​ women's number of children and a set of control variables​ (age, education,​ occupation, and so​ forth). They found that women with more children had higher​ wages, controlling for these other factors. What is most likely causing this​ result?

Sample selection bias.

Suppose that a state offered voluntary standardized tests to all its third graders and that these data were used in a study of class size on student performance. Which of the following would generate selection​ bias?

Schools with​ higher-achieving students could be more likely to volunteer to take the test.

One of your friends is using data on individuals to study the determinants of smoking at your university. She is particularly concerned with estimating marginal effects on the probability of smoking at the extremes. She asks you whether she should use a​ probit, logit, or linear probability model. What advice do you give​ her?

She should use the logit or​ probit, but not the linear probability model.

(Y_i, X_1i, X_2i) satisfy the following assumptions. You are interested in β_1, the causal effect of X_1 on Y. Suppose that X_1 and X_2 are uncorrelated. You estimate β_1 by regressing Y onto X_1 (so that X_2 is not included in the regression). Does this estimator suffer from omitted variable bias?

No.

Consider the following regression model: Y_i = B_0 + B_1 X_i + u_i. Suppose that Y is measured with random error. Does this mean that regression analysis is unreliable? Now, suppose that X is measured with random error. Does this mean that regression analysis is unreliable?

No. Yes.

