Econometrics
The adjusted R^2, written $\bar{R}^2$, is given by:
$\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS}$
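A minimal Python sketch of the adjusted R^2 formula on this card. The function name `adjusted_r2` and the sample numbers are illustrative assumptions, not values from the textbook.

```python
def adjusted_r2(ssr: float, tss: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - ((n - 1) / (n - k - 1)) * (SSR / TSS)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (n - 1) / (n - k - 1) * (ssr / tss)

# Hypothetical numbers: 100 observations, 2 regressors.
r2 = 1.0 - 50.0 / 100.0                                # unadjusted R^2
r2_adj = adjusted_r2(ssr=50.0, tss=100.0, n=100, k=2)
print(round(r2, 4), round(r2_adj, 4))
```

Because (n - 1)/(n - k - 1) exceeds 1 whenever k >= 1, the adjusted value is always below the unadjusted R^2.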
To measure the fit of the probit model, you should:
use the "fraction correctly predicted" or the "pseudo R2."
In the case of a simple regression where the independent variable is measured with i.i.d. error:
$\hat{\beta}_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1$.
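The attenuation result on this card can be checked with a small simulation. This is a sketch under assumed values (true slope 2, unit variances, 20,000 draws); none of the numbers come from the source.

```python
import random

random.seed(0)
n, beta0, beta1 = 20_000, 1.0, 2.0
sigma_x, sigma_w = 1.0, 1.0   # std. dev. of the true X and of the measurement error w

x_true = [random.gauss(0.0, sigma_x) for _ in range(n)]
y = [beta0 + beta1 * x + random.gauss(0.0, 1.0) for x in x_true]
x_obs = [x + random.gauss(0.0, sigma_w) for x in x_true]   # X measured with error

def ols_slope(x, y):
    """Slope of a simple regression of y on x: sample cov(x, y) / sample var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

slope_obs = ols_slope(x_obs, y)
limit = sigma_x**2 / (sigma_x**2 + sigma_w**2) * beta1   # probability limit from the card
print(slope_obs, limit)
```

With equal variances the limit is half the true slope, so the estimate from the mismeasured regressor should land near 1 rather than 2.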
The TSLS estimator is:
consistent and has a normal distribution in large samples.
Consider the regression model Wage = β0 + β1Female + u, where Female (=1 if female) is an indicator variable and u is the error term. Identify the dependent and independent variables in the regression model above. Wage is the __________ variable, while Female is the __________ variable.
(1) dependent (2) independent
Workers in the South earn $_____ more per hour than workers in the west, on average, controlling for other variables in the regression.
0.28 (Column 3 South)
Consider the following regression output where the dependent variable is test scores and the two explanatory variables are the student-teacher ratio and the percent of English learners: Test Score (hat)= 698.9-1.10 *STR -0.650 * PctEL
0.43 (1.10/2.56)
The critical value of $F_{4,\infty}$ at the 5% significance level is:
2.37
Based on the multiple regression results in part (2), we could say that
27.9% of the variation in school is explained by the variables
Consider the estimated equation from your textbook: TestScore = 698.9 − 2.28 × STR, R² = 0.051, SER = 18.6, with standard errors of 10.4 (intercept) and 0.52 (slope). The t-statistic for the slope is approximately:
4.38
In the model $\ln Y_i = \beta_0 + \beta_1 X_i + \mu_i$, the elasticity of $E(Y \mid X)$ with respect to X is: A. cannot be calculated because the function is nonlinear. B. $\beta_1 X$. C. $\frac{\beta_1 X}{\beta_0 + \beta_1 X}$. D. $\beta_1$.
B. β1X.
Let W be the included exogenous variables in a regression function that also has endogenous regressors (X). The W variables can: A. make an instrument uncorrelated with μ. B. have the property $E(\mu_i \mid W_i) = 0$. C. be control variables. D. all of the above.
D) All of the above
In an instrumental variable regression model with one regressor, Xi, and two instruments, Z1i and Z2i, the value of the J-statistic is J = 18.5. Which of the following statements is correct?
D. $E(u_i \mid Z_{1i}, Z_{2i}) \neq 0$, but there is insufficient information to infer whether $E(u_i \mid Z_{1i}) \neq 0$.
Suppose that a state offered voluntary standardized tests to all its third graders and that these data were used in a study of class size on student performance. Which of the following would generate selection bias?
Schools with higher-achieving students could be more likely to volunteer to take the test.
Consider a panel data set and the following regression model. Yit=β0+β1Xit+uit What does subscript i refer to?
Subscript i identifies the entity
Consider a panel data set and the following regression model. Yit=β0+β1Xit+uit What do subscripts i and t refer to?
Subscripts i and t identify the entity and time period respectively.
Consider a panel data set and the following regression model. Yit=β0+β1Xit+uit What does subscript t refer to?
Subscript t identifies the time period.
The Boston HMDA data set was collected by researchers at the Federal Reserve Bank of Boston. The data set combines information from mortgage applications and a follow-up-survey of the banks and other lending institutions that received these mortgage applications. The data pertain to mortgage applications made in 1990 in the greater Boston metropolitan area. The full data set has 2925 observations, consisting of all mortgage applications by blacks and Hispanics plus a random sample of mortgage applications by whites. GRAPH
The marginal effect in column (1) is the estimated coefficient, whereas the marginal effects in columns (2) and (3) are not the estimated coefficients directly.
Why is the regressor West omitted from the regression? What would happen if it were included?
The regressor West is omitted to avoid perfect multicollinearity. If West were included, the OLS estimator could not be computed.
HAC standard errors and clustered standard errors are related as follows:
clustered standard errors are one type of HAC standard error.
A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county's crime rate on the (per capita) size of the county's police force. Why is this regression likely to suffer from omitted variable bias?
There are other important determinants of a county's crime rate, including demographic characteristics of the population, that if left out of the regression would bias the estimated partial effect of the (per capita) size of the county's police force.
The coefficient on DadColl from the regression in part (2) indicates that
This person would be expected to have 0.696 more years of schooling than the same person whose father did not have a college degree
Suppose you are interested in investigating the wage gender gap using data on earnings of men and women. Which of the following models best serves this purpose?
Wage=β0+β1Female+u, where Female (=1 if female) is an indicator variable and u the error term.
The difference between an unbalanced and a balanced panel is that
an unbalanced panel contains missing observations for at least one time period or one entity.
Sally is a 26-year-old female college graduate. Betsy is a 42-year-old female college graduate. Construct a 95% confidence interval for the expected difference between their earnings. The 95% confidence interval for the expected difference between their earnings is (_______, _______).
Age difference = 42 − 26 = 16 years. 95% confidence interval = 16 × (Age coefficient ± 1.96 × its standard error, both from column 4) = 16 × (0.28 ± 1.96 × 0.04) = (3.23, 5.73).
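The arithmetic on this card can be reproduced in a few lines of Python. The helper name `scaled_ci` is an assumption made for illustration; the coefficient 0.28 and standard error 0.04 are the column 4 values quoted in the answer.

```python
def scaled_ci(coef, se, scale, z=1.96):
    """Confidence interval for scale * coef, given the coefficient's standard error."""
    lo = scale * (coef - z * se)
    hi = scale * (coef + z * se)
    return (lo, hi) if lo <= hi else (hi, lo)

age_gap = 42 - 26                                   # Betsy's age minus Sally's age
lo, hi = scaled_ci(coef=0.28, se=0.04, scale=age_gap)
print(round(lo, 2), round(hi, 2))
```

Scaling both endpoints by the 16-year gap works here because the two women differ only in age, so the expected difference is 16 times a single coefficient.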
The homoskedasticity-only F-statistic and the heteroskedasticity-robust F-statistic typically are:
different
When testing joint hypotheses, you can use:
either the F-statistic or the chi-squared statistic.
Threats to internal validity lead to
failures of one or more of the least squares assumptions.
Panel data is also called
longitudinal data.
The logic of control variables in IV regression:
parallels the logic of control variables in OLS.
Nonlinear least squares:
solves the minimization of the sum of squared predictive mistakes through sophisticated mathematical routines, essentially by trial-and-error methods.
Probit coefficients are typically estimated using:
the method of maximum likelihood.
A statistical analysis is internally valid if:
the statistical inferences about causal effects are valid for the population studied.
If the estimates of the coefficients of interest change substantially across specifications
then this often provides evidence that the original specification had omitted variable bias
The notation for panel data is (Xit, Yit), i = 1, ..., n and t = 1, ..., T because
there are n entities and T time periods.
In the multiple regression model, the SER is given by
$SER = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2}$
What is the difference in the expected hourly earnings of a 25-year old male with a college degree as compared to a 30-year old female with a college degree?
0.74
Open the Excel data set CPS08, described in Empirical Exercise 4.1. The variables are described in the Word file CPS_Description.docx. Regress average hourly earnings (AHE) on age, female, and bachelor.
1) If age increases from 35 to 36, by how much is AHE expected to change? A. $0.740 B. $0.370 C. $0.585 D. $0.058
2) Re-run the regression in part (1) but use the natural logarithm of AHE as the dependent variable. The effect of an increase in age from 35 to 36 is: A. 0.027 dollars B. 2.73 percent C. 0.027 years D. 0.273 percent
3) Re-run the regression in part (2) but use the natural logarithm of age (ln age) instead of age. Remember that the dependent variable is ln AHE. The expected change in AHE given an increase in age from 35 to 36 is: A. 2.26 percent B. 0.026 percent C. 2.62 dollars D. 0.262 percent
4) Re-run the regression in part (1) but add the square of age to the model (include both age and age²). What is the expected AHE of a 40-year-old woman with a college degree? A. $26.23 B. $29.24 C. $25.51 D. $22.70
5) Based on your results in part (4), you would conclude that the quadratic in age is: A. an unnecessary addition because the natural logarithm of age used in part (2) is superior B. an unnecessary addition because the coefficient on age² is not statistically significant at the 5% level C. an appropriate functional form because the sum of the coefficients age² and age² is larger than the coefficient on age when entered alone as in part (1) D. an appropriate functional form because the coefficient on age² is statistically significant at the 5% level
6) Age is a rough proxy for experience. It has been proposed that women of the same age, and maybe the same experience, as men earn less every year they age. Regress ln AHE on bachelor, female, age, and the interaction of age and female. Based on this regression you conclude that: A. Women's wages increase 1.68% less than men's wages for every increment in age B. Women's wages increase 30.98% less than men's wages for every increment in age C. Women's wages increase 3.46% less than men's wages for every increment in age D. Women's wages increase 3.46% less than men's wages for every increment in age
7) Based on the results in part (7), there is no statistical support for the proposition that women's wages increase less than men's wages for each year that they age. A. False, the coefficient on age is positive and statistically significant B. True, the coefficient on the interaction term is statistically insignificant C. True, the coefficient on the interaction term is irrelevant to the question D. False, the coefficient on the interaction term is negative and statistically significant
8) The effect of age on ln AHE is different for high school graduates than for college graduates? A. True, the t-statistic on the interaction term is greater than 1.96 B. All the above C. True, the p-value on the interaction term is less than 0.05, so it is statistically significant D. True, college graduates earn 1.7% more per year of age than do high school graduates
9) Based on the results in part (9), the expected wage (in logs) of a 40-year-old woman with a bachelor's degree is: A. 4.40 B. 3.30 C. 2.63 D. 3.02
1) 0.585 2) 2.73 percent 3) 2.26 percent 4) 26.23 5) an unnecessary addition because the coefficient on age2 is not statistically significant at the 5% level 6) Women's wages increase 1.68% less than men's wages for every increment in age 7) False, the coefficient on the interaction term is negative and statistically significant 8) All the above 9) 3.30
How many years of schooling would a black female be expected to have if she has a base-year test score of 50; has a father who went to college; is from a family with income greater than $25,000 that owns its home; is from a county where the unemployment rate is 6.0 and the state hourly wage in manufacturing is $8.00; and lives 100 miles from the nearest 4-year college?
14.68
Imagine you regressed earnings of individuals on a constant, a binary variable ("Male") which takes on the value 1 for males and is 0 otherwise, and another binary variable ("Female") which takes on the value 1 for females and is 0 otherwise. Because females typically earn less than males, you would expect A) the coefficient for Male to have a positive sign, and for Female a negative sign. B) both coefficients to be the same distance from the constant, one above and the other below. C) none of the OLS estimators exist because there is perfect multicollinearity. D) this to yield a difference in means statistic
C) none of the OLS estimators exist because there is perfect multicollinearity.
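A small sketch of why the regression on this card fails. The five rows of data are hypothetical; the point is only that a constant, Male, and Female are linearly dependent, so X'X has determinant zero and cannot be inverted.

```python
# Each row is (constant, Male, Female) for one hypothetical person.
rows = [(1, 1, 0), (1, 0, 1), (1, 1, 0), (1, 0, 1), (1, 0, 1)]

# Male + Female equals the constant in every row: perfect multicollinearity.
collinear = all(male + female == const for const, male, female in rows)

# Build X'X; a zero determinant means the OLS normal equations have no unique solution.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(collinear, det3(xtx))
```

Dropping either dummy (or the constant) removes the dependence, which is the usual way out of the dummy variable trap.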
The t-statistic for the male-female earnings difference estimated from this regression is _______ Is the male-female earnings difference estimated from this regression statistically significant at the 5% level? Since the t-statistic is ________ than the critical value for 95% confidence, the male-female earnings difference estimated from this regression _____ statistically significant at the 5% level.
(1) −12.95 (Female, column 1: coefficient divided by its standard error, e.g. −2.72/0.21) (2) greater (in absolute value) (3) is
Now run a regression of average hourly earnings (AHE) on bachelor, female, and age. Comparing the coefficient on age in part (1) with the coefficient on age when bachelor and female are included, you could conclude that
There is no evidence of omitted variable bias in the simple regression of AHE on age
An example of a quadratic regression model is
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$.
A person with a college degree earns on average $8.03 more than someone with less than a college degree, holding age and gender constant. If the t-statistic and standard error were removed from the output, could you still test the following null hypothesis: $H_0: \beta_{bachelor} = 4.0$?
You could use the 95% confidence interval from which you would reject the null.
The interpretation of the slope coefficient in the model $\ln Y_i = \beta_0 + \beta_1 \ln X_i + \mu_i$ is as follows:
a 1% change in X is associated with a β1% change in Y.
In the multiple regression model, the t-statistic for testing that the slope is significantly different from zero is calculated
by dividing the estimate by its standard error
The OLS residuals
can be calculated by subtracting the fitted values from the actual values
Imagine that you were told that the t−statistic for the slope coefficient of the regression line TestScore= 698.9 − 2.28 × STR was 4.38. What are the units of measurement for the t−statistic?
standard deviations
Instrumental variables regression uses instruments to:
isolate the movements in X that are uncorrelated with μ.
Rerun the regression in part (2), but drop the variable bachelor. What happens to the coefficient on female?
It falls by $1.13 in absolute value, which suggests that there is omitted variable bias when bachelor is excluded.
The t-statistic for the college-high school earnings difference estimated from this regression is _______. Is the college-high school earnings difference estimated from this regression statistically significant at the 5% level? Since the absolute value of the t-statistic is _________ than the critical value for 95% confidence, the college-high school earnings difference estimated from this regression _______ statistically significant at the 5% level.
(1) 25.55 (College, column 1: coefficient divided by its standard error, e.g. 5.62/0.22) (2) greater (3) is
What is the difference between internal validity and external validity?
A statistical analysis is said to have internal validity if the statistical inferences about causal effects are valid for the population being studied. The analysis is said to have external validity if conclusions can be generalized to other populations and settings.
The t-statistic for the coefficient on Age is _____. The p-value for the preceding t-statistic is _______. Does this imply that age is an important determinant of earnings?
A) 4.67 (Age, column 2: coefficient divided by its standard error, e.g. 0.28/0.06) B) 0.0000 C) Yes, age is an important determinant of earnings because the low p-value implies that the coefficient on age is statistically significant at the 1% level.
Consider the population regression of log earnings [Yi, where Yi = ln(Earningsi)] against two binary variables: whether a worker is married (D1i, where D1i = 1 if the ith person is married) and the worker's gender (D2i, where D2i = 1 if the ith person is female), and the product of the two binary variables Yi=β0+β1D1i+β2D2i+β3D1i×D2i+μi. The interaction term: A. does not make sense since it could be zero for married males. B. allows the population effect on log earnings of being married to depend on gender. C. indicates the effect of being married on log earnings. D. cannot be estimated without the presence of a continuous variable.
B. allows the population effect on log earnings of being married to depend on gender.
The Least Squares Assumptions: $Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1, \ldots, n$, where 1. The error term $u_i$ has conditional mean zero given $X_i$: $E(u_i \mid X_i) = 0$; 2. $(X_i, Y_i)$, $i = 1, \ldots, n$, are independent and identically distributed (i.i.d.) draws from their joint distribution; and 3. Large outliers are unlikely: $X_i$ and $Y_i$ have nonzero finite fourth moments. Assuming this year's class is a typical representation of the same class in other years, are OLS assumptions (2) and (3) satisfied?
Both OLS assumption #2 and OLS assumption #3 are satisfied.
The probit model: A. is the same as the logit model. B. always gives the same fit for the predicted values as the linear probability model for values between 0.1 and 0.9. C. forces the predicted values to lie between 0 and 1. D. should not be used since it is too complicated.
C) forces the predicted values to lie between 0 and 1.
Consider the polynomial regression model of degree r, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + \mu_i$. The null hypothesis that the regression is linear, against the alternative that it is a polynomial of degree r, corresponds to: A. H0 : β2=0, β3=0,...,βr=0 vs. H1 : all βj≠0, j=2,...,r. B. H0 : βr=0 vs. H1 : βr≠0. C. H0 : β2=0, β3=0,...,βr=0 vs. H1 : at least one βj≠0, j=2,...,r. D. H0 : β1=0 vs. H1 : β1≠0.
C. H0 : β2=0, β3=0,...,βr=0 vs. H1 : at least one βj≠0, j=2,...,r.
For W to be an effective control variable in IV estimation, the following condition must hold:
E(μi|Zi, Wi)=E(μi|Wi).
The OLS residuals, $\hat{u}_i$, are sample counterparts of the population
Errors.
What is the difference between the population studied and the population of interest?
The population studied is the population from which the sample was drawn, while the population of interest is the population to which causal inferences from this study are to be applied.
Do there appear to be important regional differences?
Yes, because wages are not consistent across the region.
A researcher is using a panel data set on n = 1000 workers over T = 10 years (from 2001 through 2010) that contains the workers' earnings, gender, education, and age. The researcher is interested in the effect of education on earnings. Determine whether each of the following is an example of unobserved person-specific or time-specific variables that are correlated with both education and earnings. 1. Unobserved ability _______(a)_____ 2. Unemployment level ______(b)_______ 3. Unobserved motivation ______(c)_______ 4. Unobserved household environment ________(d)________ 5. GDP growth ________(e)_________ (f) How would you control for these person-specific and time-specific effects in a panel data regression?
a) Person-specific b) Time-specific c) Person-specific d) Person-specific e) Time-specific f) Include person-specific and time-specific variables in the regression.
In their study of the effectiveness of cardiac catheterization, McClellan, McNeil, and Newhouse (1994) used as an instrument the difference in distance to cardiac catheterization and regular hospitals. a) How could you determine whether this instrument is relevant? b) How could you determine whether this instrument—the difference in distance to cardiac catheterization and regular hospitals—is exogenous?
a) Use both (a) and (c). b) Since there is one endogenous regressor and one instrument, the J-test cannot be used to test the exogeneity of the instruments. Expert judgment is required to assess the exogeneity.
Consider the following regression function to answer the questions below. (GRAPH WITH CURVE PICTURE) (a) Which of the following specifies a nonlinear regression that models this shape? (b) Which of the following economic relationships may exhibit a shape like this? (Check all that apply)
a) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$. b) - The relationship between wage earnings and years of experience. - The relationship between income and fertility. - The relationship between time spent studying for an exam and the grade on that exam.
Construct a confidence interval of 95% for the male-female earnings difference. The 95% confidence interval for the male-female earnings difference (_______,______)
Female, column 1: e.g. −2.72 ± 1.96 × 0.21 = (−3.13, −2.31)
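The t-statistic and the confidence interval for the male-female earnings difference use the same two numbers. A minimal sketch, taking the coefficient −2.72 and standard error 0.21 quoted on these cards as given:

```python
coef, se = -2.72, 0.21             # Female coefficient and its standard error (column 1)

t_stat = coef / se
ci = (coef - 1.96 * se, coef + 1.96 * se)
significant = abs(t_stat) > 1.96   # two-sided test at the 5% level

print(round(t_stat, 2), tuple(round(b, 2) for b in ci), significant)
```

The two views agree: the interval excludes zero exactly when the absolute t-statistic exceeds 1.96.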
In the binary dependent variable model, a predicted value of 0.6 means that
given the values for the explanatory variables, there is a 60 percent probability that the dependent variable will equal one.
The fixed effects regression model:
has n different intercepts.
The question of reliability/unreliability of a multiple regression depends on:
internal and external validity.
The binary dependent variable model is an example of a
limited dependent variable model.
A "Cobb-Douglas" production function relates production (Q) to factors of production, capital (K), labor (L), raw materials (M), and an error term u using the equation $Q = \lambda K^{\beta_1} L^{\beta_2} M^{\beta_3} e^{u}$, where λ, β1, β2, and β3 are production parameters. Suppose that you have data on production and the factors of production from a random sample of firms with the same Cobb-Douglas production function. Which of the following regression functions provides the most useful transformation to estimate the model?
logarithmic regression function.
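Why logs help: taking the natural log of the Cobb-Douglas equation gives ln Q = ln λ + β1 ln K + β2 ln L + β3 ln M + u, which is linear in the parameters. The sketch below checks that identity numerically; all parameter and input values are hypothetical.

```python
import math

# Hypothetical parameter values and inputs, chosen only for illustration.
lam, b1, b2, b3 = 2.0, 0.3, 0.5, 0.2
K, L, M, u = 10.0, 20.0, 5.0, 0.1

Q = lam * K**b1 * L**b2 * M**b3 * math.exp(u)

# Taking logs turns the product into a sum that is linear in the parameters:
# ln Q = ln(lam) + b1*ln K + b2*ln L + b3*ln M + u
ln_q_linear = (math.log(lam) + b1 * math.log(K) + b2 * math.log(L)
               + b3 * math.log(M) + u)

print(math.log(Q), ln_q_linear)
```

In the transformed model the intercept estimates ln λ, and each slope is the elasticity of output with respect to that input.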
The OLS estimator is derived by
minimizing the sum of squared residuals.
A researcher is using a panel data set on n = 1000 workers over T = 10 years (from 2001 through 2010) that contains the workers' earnings, gender, education, and age. The researcher is interested in the effect of education on earnings. Suppose you run a regression of earnings on person-specific and time-specific control variables. Why might the regression error for a given individual be serially correlated?
(2 answers) - An unexpected earnings increase that is persistent through some part of the sample period. - An unexpected natural disaster occurs in a particular individual's city.
Suppose that n = 331 i.i.d. observations for (Yi, Xi) yield the following regression results: Ŷ = 32.18 + 69.03X, SER = 15.67, R² = 0.81, with standard errors of 16.4 (intercept) and 13.4 (slope). Another researcher is interested in the same regression, but he makes an error when he enters the data into the regression: He enters each observation twice, so he has 662 observations (with observation 1 entered twice, observation 2 entered twice, and so forth). (a) Which of the following estimated parameters change as a result? (Check all that apply) (b) Using the 662 observations, what results will be produced by his regression program? Ŷ = 32.18 + 69.03X, SER = ___(Bi)___, R² = 0.81, with standard errors (___(Bii)___) and (___(Biii)___). (c) Which (if any) of the internal validity conditions are violated?
(a) - The standard error of the regression (SER) - The standard errors of the estimated coefficients (b) Bi) 15.65 (= sqrt(2(331−2)/(662−2)) × 15.67: the sum of squared residuals doubles while the degrees of freedom rise from 329 to 660) Bii) 11.58 (= sqrt((331−2)/(662−2)) × 16.4) Biii) 9.46 (= sqrt((331−2)/(662−2)) × 13.4; the program treats the duplicates as additional data, so the reported standard errors shrink) (c) Measurement error.
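The scaling factors in part (b) can be checked by duplicating a simulated data set. This is a sketch under assumptions: the data are simulated (not the original sample), and the standard error shown is the homoskedasticity-only formula.

```python
import math
import random

def simple_ols(x, y):
    """Return (intercept, slope, SER, homoskedasticity-only SE of the slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ser = math.sqrt(ssr / (n - 2))
    return intercept, slope, ser, ser / math.sqrt(sxx)

random.seed(1)
n = 331
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [32.18 + 69.03 * a + random.gauss(0.0, 15.0) for a in x]   # simulated data

b0, b1, ser, se1 = simple_ols(x, y)
b0_d, b1_d, ser_d, se1_d = simple_ols(x + x, y + y)            # every row entered twice

ser_factor = math.sqrt(2 * (n - 2) / (2 * n - 2))   # applies to the SER
se_factor = math.sqrt((n - 2) / (2 * n - 2))        # applies to the coefficient SEs
print(round(15.67 * ser_factor, 2), round(16.4 * se_factor, 2), round(13.4 * se_factor, 2))
```

The coefficients are unchanged by duplication, while the SER shrinks only slightly and the coefficient standard errors shrink by roughly 1/sqrt(2).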
Four hundred driver's license applicants were randomly selected and asked whether they passed their driving test (Passi = 1) or failed their test (Passi = 0); data were also collected on their gender (Malei = 1 if male and = 0 if female) and their years of driving experience (Experiencei, in years). The following table summarizes the results from several probit, logit, and linear probability models. GRAPH Use the results in column (2) to answer the following questions. a) Is the coefficient on Experience significant at any reasonable level? b) John has 15 years of driving experience. What is the predicted probability that he will pass the test? The predicted probability that John will pass the test is ______(b)______. (Round your response to three decimal places.) c) Katherine is a new driver (zero years of experience). What is the predicted probability that she will pass the test? The predicted probability that Katherine will pass the test is ______(c)______. (Round your response to three decimal places.) d) Which of the figures below is more likely to show predicted probabilities from the logit model? Figure (a) is curved; Figure (b) is a straight diagonal line.
a) The coefficient on Experience is significant at the 1% significance level. b) 0.844 (= 1/(1 + e^−(1.058 + 0.042 × 15))) c) 0.742 (= 1/(1 + e^−1.058)) d) Figure (a), which is curved.
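The logit predictions on this card follow from the logistic function applied to the estimated index. A minimal sketch, assuming the column (2) coefficients quoted in the answer (constant 1.058, Experience 0.042); the function name `logit_prob` is illustrative.

```python
import math

def logit_prob(index: float) -> float:
    """Logistic CDF: P(Y = 1) = 1 / (1 + exp(-index))."""
    return 1.0 / (1.0 + math.exp(-index))

const, b_exper = 1.058, 0.042                   # logit coefficients quoted in the answer
p_john = logit_prob(const + b_exper * 15)       # 15 years of experience
p_katherine = logit_prob(const + b_exper * 0)   # new driver
print(round(p_john, 3), round(p_katherine, 3))
```

Note the negative sign inside the exponential: without it the function returns the probability of failing rather than passing.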
Four hundred driver's license applicants were randomly selected and asked whether they passed their driving test (Passi = 1) or failed their test (Passi = 0); data were also collected on their gender (Malei = 1 if male and = 0 if female) and their years of driving experience (Experiencei, in years). The following table summarizes the results from several probit, logit, and linear probability models. GRAPH (a) Is the coefficient on Experience significant at any reasonable level? (b) Matthew has 11 years of driving experience. What is the predicted probability that he will pass the test? The predicted probability that Matthew will pass the test is ______(b)______. (c) Christopher is a new driver (zero years of experience). What is the predicted probability that he will pass the test? The predicted probability that Christopher will pass the test is ______(c)______. (d) The sample included values of Experience between 0 and 40 years, and only four people in the sample had more than 30 years of driving experience. Jed is 95 years old and has been driving since he was 17. What is the model's prediction for the probability that Jed will pass the test? The predicted probability that Jed will pass the test is ______(d)______. (e) Do you think the previous prediction is reliable?
a) The coefficient on Experience is significant at the 1% significance level. b) 0.864 (Use the standard normal table. Worked example for 20 years of experience: z = 0.712 (constant, column 1) + 0.036 (Experience, column 1) × 20 = 1.432; in the z table, row 1.4 and column 0.03 give 0.924.) c) 0.762 d) 1.000 e) No
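Instead of a z table, the probit predictions can be computed from the standard normal CDF. This sketch assumes the column (1) coefficients quoted in the worked example (constant 0.712, Experience 0.036) and evaluates the cases the answer describes: a new driver, the 20-year worked example, and Jed's 78 years.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

const, b_exper = 0.712, 0.036      # probit coefficients quoted in the worked example

p_new = std_normal_cdf(const)                         # zero years of experience
p_20 = std_normal_cdf(const + b_exper * 20)           # z = 1.432, the worked example
p_jed = std_normal_cdf(const + b_exper * (95 - 17))   # far outside the sample range
print(round(p_new, 3), round(p_20, 3), round(p_jed, 3))
```

Jed's prediction rounds to 1.000 only because the index is extrapolated far beyond the 0 to 40 year range of the sample, which is why the card treats it as unreliable.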
Suppose that the linear probability model yields a predicted value of Y that is equal to 1.3. Explain why this is nonsensical.
The predicted value of Y is a predicted probability, so it must be between 0 and 1.
The following questions refer to the panel data regressions summarized in Table 12.1. (GRAPH) Suppose that the federal government is considering a new tax on cigarettes that is estimated to increase the retail price by $0.60 per pack. If the current price per pack is $7.50, use the regression in column (1) to predict the change in demand. The expected percentage change in cigarette demand is _______a)________%. (Round your response to two decimal places.) Construct a 95% confidence interval for the change in demand. The confidence interval is (_________b)______%, _________c)_________%). (Round your responses to two decimal places.) d) Suppose that the United States enters a recession, and income falls by 3%. Use the regression in column (1) to predict the change in demand. The expected percentage change in demand is ______d_______ (Round your response to two decimal places.) e) Suppose that the recession lasts less than 1 year. Do you think that the regression in column (1) will be able to reliably predict the effect of income change on cigarette demand? Why or why not? f)Suppose that the F-statistic in column (1) were 3.6. Would the regression provide a reliable measure of the effect of a price change on cigarette demand? Why or why not?
a) −7.24% (= −0.94 × [ln(8.10) − ln(7.50)] × 100% = −0.94 × 0.0770 × 100%) b) −10.41% (= (−0.94 − 1.96 × 0.21) × 0.0770 × 100%) c) −4.07% (= (−0.94 + 1.96 × 0.21) × 0.0770 × 100%). The −0.94 is the coefficient on ln(Pcig 1995) − ln(Pcig 1985) in column (1), and 0.21 is the standard error reported below it. d) −1.59% (= 0.53 × (−0.03) × 100%, where 0.53 is the ln(Inc1995) coefficient in column (1) and −0.03 is the 3% fall in income) e) Both (a) and (c) are correct. f) No, the instrumental variable would be too weak (irrelevant) if the F-statistic in column (1) were less than 10.
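The cigarette demand calculations can be reproduced as follows. This is a sketch that takes the column (1) numbers quoted in the answer as given (price coefficient −0.94 with standard error 0.21, income coefficient 0.53); the variable names are illustrative.

```python
import math

coef_price, se_price = -0.94, 0.21   # coefficient on the log price change, column (1)
coef_income = 0.53                   # coefficient on the log income change, column (1)

# Log price change for $7.50 -> $8.10, rounded to four decimals as in the answer.
dln_price = round(math.log(8.10 / 7.50), 4)

point = coef_price * dln_price * 100
lower = (coef_price - 1.96 * se_price) * dln_price * 100
upper = (coef_price + 1.96 * se_price) * dln_price * 100
income_effect = coef_income * (-0.03) * 100           # 3% fall in income

print(round(point, 2), round(lower, 2), round(upper, 2), round(income_effect, 2))
```

Rounding the log price change to 0.0770 matters in the second decimal: with the unrounded value the point estimate would print as −7.23 rather than −7.24.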
New Jersey has a population of 6.5 million people. Suppose that New Jersey increased the tax on a case of beer by $1 (in 1988 dollars). Use the results in column (4) to predict the number of lives that would be saved over the next year. The predicted number of lives that would be saved over the next year is _____(a)_____ Construct a 95% confidence interval for your answer. The 95% confidence interval for the number of lives that would be saved over the next year is (______(b)_____, _____(c)____) The drinking age in New Jersey is 21. Suppose that New Jersey lowered its drinking age to 18. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. The predicted ______(d)_____in the number of traffic fatalities in the next year is _______(e)_______ Construct a 90% confidence interval for your answer. The 90% confidence interval for the predicted increase in the number of traffic fatalities in the next year is (________(f)______, _______(g)______) Suppose that real income per capita in New Jersey increases by 1% in the next year. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. The predicted ______(h)_____ in the number of traffic fatalities in the next year is ________(i)_______ Construct a 90% confidence interval for your answer. The 90% confidence interval for the predicted increase in the number of traffic fatalities in the next year is [_____(j)______, ______(k)_____) l) Refer to the reported F-Statistics and p-values associated with testing for exclusion of group of variables. Should time effects be included in the regression? m)A researcher conjectures that the unemployment rate has a different effect on traffic fatalities in the western states than in the other states. How would you test this hypothesis?
a) 279.50 (beer tax coefficient, column 4: −0.43; 6.5 million = 650 × 10,000; 650 × 0.43 = 279.5) b) −140.92 (standard error of the beer tax coefficient, column 4: 0.33; −0.43 + 1.96 × 0.33 = 0.2168; 0.2168 × (−650) = −140.92) c) 699.92 (−0.43 − 1.96 × 0.33 = −1.0768; −1.0768 × (−650) = 699.92) d) increase e) 20.15 (Drinking age 18 coefficient, column 4: 0.031; 0.031 × 650 = 20.15) f) −61.11 (0.031 − 1.645 × 0.076 = −0.09402, where 0.076 is the standard error in column 4; −0.09402 × 650 = −61.11) g) 101.41 (0.031 + 1.645 × 0.076 = 0.15602; 0.15602 × 650 = 101.41) h) increase i) 11.31 (real income coefficient, column 4: 1.74; a 1% increase gives 0.01 × 1.74 × 650 = 11.31) j) 4.04 k) 18.58 l) Yes m) I would include a binary variable West (=1 if the state is in the West and 0 otherwise) and an interaction term West × Unemployment rate. Then I would test whether the estimated coefficient on the interaction term is significant at a reasonable level.
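A short sketch of the scaling used in parts (a) through (g). It assumes, as the answer's arithmetic does, that the column (4) coefficients are per 10,000 people, so New Jersey's 6.5 million residents correspond to a factor of 650. The helper name `scaled_interval` is illustrative.

```python
POP_UNITS = 650          # 6.5 million people = 650 x 10,000

def scaled_interval(coef, se, z, scale):
    """Return (point, lower, upper) for scale * coef with critical value z."""
    ends = sorted((scale * (coef - z * se), scale * (coef + z * se)))
    return scale * coef, ends[0], ends[1]

# Lives saved by a $1 beer-tax increase: minus the predicted change in fatalities.
saved = scaled_interval(coef=-0.43, se=0.33, z=1.96, scale=-POP_UNITS)
# Extra fatalities from lowering the drinking age to 18 (90% interval).
extra = scaled_interval(coef=0.031, se=0.076, z=1.645, scale=POP_UNITS)

print([round(v, 2) for v in saved], [round(v, 2) for v in extra])
```

Both intervals include zero, so neither effect is statistically distinguishable from zero at the stated confidence level.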
Consider the following binary variable version of the fixed effects model. Each regressor Dj is a binary variable that equals 1 when i = j and 0 otherwise. Note that the binary variable D1i for the first group is arbitrarily omitted. Yit = β0 + β1Xit + γ2D2i + γ3D3i + ... + γnDni + uit. Use the regression in the equation above to answer the following questions. What are the slope and intercept for entity 1 in time period 1? The slope of entity 1 in time period 1 is _____(a)_____. The intercept of entity 1 in time period 1 is _____(b)_____. What are the slope and intercept for entity 3 in time period 3? The slope of entity 3 in time period 3 is _____(c)_____. The intercept of entity 3 in time period 3 is _____(d)_____. What are the slope and intercept for entity 2 in time period 1? The slope of entity 2 in time period 1 is _____(e)_____. The intercept of entity 2 in time period 1 is _____(f)_____.
a) β1 b) β0 c) β1 d) β0 + γ3 e) β1 f) β0 + γ2
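The pattern in these answers is that the slope is common to all entities and time periods, while the intercept shifts by the entity's dummy coefficient. A minimal sketch with hypothetical parameter values; `entity_line` and the numbers are assumptions for illustration.

```python
def entity_line(entity, beta0, beta1, gamma):
    """Slope and intercept for one entity; gamma maps entity j >= 2 to gamma_j."""
    intercept = beta0 + gamma.get(entity, 0.0)   # entity 1 has no dummy, so it gets beta0
    return beta1, intercept

# Hypothetical parameter values, for illustration only.
beta0, beta1 = 10.0, 0.5
gamma = {2: 1.5, 3: -2.0}

for entity in (1, 2, 3):
    print(entity, entity_line(entity, beta0, beta1, gamma))
```

The time period never enters the function, which mirrors the model: without time effects, an entity's line is the same in every period.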
A set of instruments must satisfy the following two conditions to be valid: (i) Instrument Relevance and (ii) Instrument Exogeneity. Consider the instrumental variable regression model Yi=β0+β1Xi+β2Wi+ui, where Xi is correlated with ui, Wi (the exogenous regressor) is uncorrelated with ui, and Zi is an instrument. Suppose that the following three assumptions are satisfied. 1. E(ui | W1i,...,Wri) = 0; 2. (X1i,...,Xki, W1i,...,Wri, Z1i,...,Zmi, Yi) are i.i.d. draws from their joint distribution; 3. Large outliers are unlikely: The X's, W's, Z's, and Y have nonzero finite fourth moments. a) Which of the two conditions, (i) and (ii), for a valid instrument is not satisfied when Zi is independent of (Yi, Xi, Wi)? b) Which of the two conditions, (i) and (ii), for a valid instrument is not satisfied when Zi = Wi? c) Which of the two conditions, (i) and (ii), for a valid instrument is not satisfied when Wi = 1 for all i? d) Which of the two conditions, (i) and (ii), for a valid instrument is not satisfied when Zi = Xi?
a) Only (i) is not satisfied b) Only (i) is not satisfied. c) Only (i) is not satisfied. d) Only (ii) is not satisfied.
Consider the problem of estimating the elasticity of demand for butter. The demand equation is given by ln(Qi^butter) = β0 + β1 ln(Pi^butter) + ui, where Qi^butter is the ith observation on the quantity of butter consumed, Pi^butter is its price, and ui represents other factors that affect demand, such as income and consumer tastes. a) In the above demand curve regression model, is ln(Pi^butter) positively or negatively correlated with the error, ui? b) If β1 is estimated by OLS, would you expect the estimated value to be larger or smaller than the true value of β1?
a) ln(Pi^butter) is positively correlated with the regression error, ui. b) The OLS estimator of β1 is likely to be larger than the true value of β1, because ln(Pi^butter) is positively correlated with the regression error, ui.
The rule of thumb for checking for weak instruments is as follows: for the case of a single endogenous regressor:
a first-stage F-statistic < 10 indicates that the instruments are weak.
F-statistics computed using maximum likelihood estimators:
can be used to test joint hypotheses.
Consider a model with one endogenous regressor and two instruments. Then the J-statistic will be large:
if the coefficients are very different when estimating the coefficients using one instrument at a time.
In the expression Pr(deny = 1 | P/I ratio, black) = Φ(−2.26 + 2.74 P/I ratio + 0.71 black), the effect of increasing the P/I ratio from 0.3 to 0.4 for a black person (assume a probit model):
is 9.4 percentage points.
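A quick way to check a probit effect like this is to evaluate Φ at both values of the P/I ratio and difference the probabilities. The sketch below uses the coefficients from the expression above; the function names and the table_lookup flag are hypothetical. Rounding the z-value to two decimals, as a printed normal table requires, gives roughly the 9.4 points on the card, while full precision gives roughly 9.2.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF

def z_index(pi_ratio, black):
    """Probit index from the card: -2.26 + 2.74 * P/I ratio + 0.71 * black."""
    return -2.26 + 2.74 * pi_ratio + 0.71 * black

def deny_prob(pi_ratio, black, table_lookup=False):
    z = z_index(pi_ratio, black)
    if table_lookup:
        z = round(z, 2)  # mimic a two-decimal normal table
    return phi(z)

# Effect of raising the P/I ratio from 0.3 to 0.4 for a black applicant.
exact_pp = 100 * (deny_prob(0.4, 1) - deny_prob(0.3, 1))
table_pp = 100 * (deny_prob(0.4, 1, True) - deny_prob(0.3, 1, True))

print(round(exact_pp, 1), round(table_pp, 1))
```

The effect depends on the starting values of the regressors, so the same 0.1 change in the P/I ratio gives a different answer when black = 0.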
In panel data, the regression error:
is likely to be correlated over time within an entity.
In panel data, the standard errors are clustered because the regression error:
is likely to be correlated over time within an entity.
In the case of errors-in-variables bias:
the OLS estimator is consistent if the variance in the unobservable variable is relatively large compared to the variance in the measurement error.
Weak instruments are a problem because:
the TSLS estimator may not be normally distributed, even in large samples.
The linear probability model is:
the application of the linear multiple regression model to a binary dependent variable.
In the probit regression, the coefficient β1 indicates:
the change in the z-value associated with a unit change in X.
In the time fixed effects regression model, you should exclude one of the binary variables for the time periods when an intercept is present in the equation:
to avoid perfect multicollinearity.
The distinction between endogenous and exogenous variables is:
whether or not the variables are correlated with the error term.
In the multiple regression model, the adjusted R^2, R^-2
will never be greater than the regression R^2
A survey of earnings contains an unusually high fraction of individuals who state their weekly earnings in 100s, such as 300, 400, 500, etc. This is an example of:
errors-in-variables bias.
Suppose you are interested in studying the relationship between education and wage. More specifically, suppose that you believe the relationship to be captured by the following linear regression model, Wage=β0+β1Education+u Suppose further that the only unobservable that can possibly affect both wage and education is intelligence of the individual. OLS assumption (1): The conditional distribution of ui given Xi has a mean of zero. Mathematically, E(ui∣Xi)=0. (a) Which of the following provides evidence in favor of OLS assumption #1? (Check all that apply) (b) Which of the following provides evidence against OLS assumption #1? (Check all that apply) (c) OLS assumption (2): (Xi, Yi), i=1,..., n are independently and identically distributed. Suppose you would like to draw a sample to study the effect of education on wage. Which of the following provides evidence in favor of OLS assumption #2? (Check all that apply) (d) Suppose you would like to draw a sample to study the effect of education on wage. Which of the following provides evidence against OLS assumption #2? (Check all that apply) (e) OLS assumption (3): Large outliers are unlikely. Mathematically, X and Y have nonzero finite fourth moments: 0<E(Xi^4)<∞ and 0<E(Yi^4)<∞. Suppose you would like to draw a sample to study the effect of education on wage. Which of the following provides evidence in favor of OLS assumption #3? (Check all that apply) (f) Suppose you would like to draw a sample to study the effect of education on wage. Which of the following provides evidence against OLS assumption #3? (Check all that apply)
(a) 1 answer - E(Intelligence∣Education=x)=E(Intelligence∣Education=y) for all x≠y. (b) 2 answers - corr(Intelligence, Education)≠0. - covariance(Intelligence, Education)≠0. (c) 1 answer - A random sample is drawn from a population of college graduates. (d) 2 answers - A sample consisting of all honor students is drawn from a population of college graduates. - Observations consisting of the same group of college students are drawn repeatedly each year over the course of their college careers. (e) 2 answers - The maximum wage an individual can get is a finite number. - The years of education an individual can get is bounded above. (f) 2 answers - Half of the wages in the sample were incorrectly multiplied by 1 million when recorded. - For some individuals in the sample, years of education were recorded in days rather than years.
(a) Consider a man with 17 years of education and 5 years of experience who is from a western state. Use the results from column (4) of the table and the method in Key Concept 8.1 to estimate the expected change in the logarithm of average hourly earnings (AHE) associated with an additional year of experience. The expected change in the logarithm of average hourly earnings (AHE) associated with an additional year of experience is _____(a)_____%. (Round your response to two decimal places.) (b) Consider a man with 17 years of education and 11 years of experience who is from a western state. Use the results from column (4) of the table and the method in Key Concept 8.1 to estimate the expected change in the logarithm of average hourly earnings (AHE) associated with an additional year of experience. The expected change in the logarithm of average hourly earnings (AHE) associated with an additional year of experience is _____(b)_____%. (Round your response to two decimal places.) (c) Why are the answers to Scenario A and Scenario B different? (d) The t-statistic for the difference between the effects in Scenario A and Scenario B is _____(d)_____. (Round your response to two decimal places.) (e) Is the difference between the effects in Scenario A and Scenario B statistically significant at the 5% level? (Y/N) (f) How would you change the regression if you suspected that the effect of experience on earnings was different for men than for women?
(a) 1.22% (b) 0.98% (c) The regression is nonlinear in experience. (d) 7.80 (the (Potential experience)^2 coefficient in column 4 divided by its standard error, ignoring the sign) (e) Yes (f) Include interaction terms Female×Potential experience and Female×(Potential experience)^2.
In this exercise, you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.) The table contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. Use a statistical package of your choice to answer the following questions. Suppose you are interested in estimating the following model ED = β0+β1Dist+u Run a regression of years of completed education (ED) on distance to the nearest college (Dist), where Dist is measured in tens of miles. (For example, Dist = 2 means that the distance is 20 miles.) (a) What is the estimated intercept β0? (b) What is the estimated slope β1? (c) Is the estimated intercept β0 meaningful in this case? (Y/N) (d) How does the average value of years of completed schooling change when colleges are built close to where students go to high school? (e) Bob's high school was 39 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. (f) John's high school was 15 miles from the nearest college. Predict John's years of completed education using the estimated regression. (g) Compute the R2 for the regression above. (h) Does distance to college explain a large fraction of the variance in educational attainment across individuals? (Y/N) (i) Compute the value of the standard error of the regression and specify its units: The standard error of the regression (SER) is _________ __________.
(a) 13.824 (b) 0.012 (c) Yes (d) The regression predicts that if colleges are built 10 miles closer to where students go to high school, average years of college will decrease by 0.012 years. (e) 13.87 (f) 13.84 (g) 0.0002 (h) No (i) 3.7825, years
Suppose that a researcher, using data on class size (CS) and average test scores from 94 third-grade classes, estimates the OLS regression TestScore=567.236+(−6.3438)×CS, R2=0.09, SER=12.5. (a) A classroom has 20 students. The regression's prediction for that classroom's average test score is ___________ (Round your response to two decimal places.) (b)Last year a classroom had 17 students, and this year it has 21 students. The regression's prediction for the change in the classroom average test score is ______________ (Round your response to two decimal places.) (c) The sample average class size across the 94 classrooms is 23.33. The sample average of the test scores across the 94 classrooms is ______________(Hint: Review the formulas for the OLS estimators.) (Round your response to two decimal places.) (d) The sample standard deviation of test scores across the 94 classrooms is ____________
(a) 440.36 (b) -25.38 (c) 419.24 (d) 13.1
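Parts (a) to (c) are direct evaluations of the fitted line. The sketch below plugs in the coefficients from the question; the function name predict is hypothetical. Part (c) uses the fact that the OLS line passes through the point of sample means, so the average test score is the prediction at the average class size.

```python
b0, b1 = 567.236, -6.3438  # intercept and slope from the estimated regression

def predict(class_size):
    """Predicted average test score for a given class size."""
    return b0 + b1 * class_size

pred_20 = predict(20)        # part (a): classroom with 20 students
change = b1 * (21 - 17)      # part (b): class size rises from 17 to 21
mean_score = predict(23.33)  # part (c): prediction at the sample mean of CS

print(round(pred_20, 2), round(change, 2), round(mean_score, 2))
```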
The estimated regression is Yi=41+0.56Xi Compute the estimated regression's prediction for the average score of students given 95, 127, or 152 minutes to complete the exam. (a) Given 95 minutes, the estimated regression's prediction for the average score of students is ______________ (b) Given 127 minutes, the estimated regression's prediction for the average score of students is _________________ (c) Given 152 minutes, the estimated regression's prediction for the average score of students is _______________ (d) Compute the estimated gain in score for a student who is given an additional 46 minutes on the exam. The estimated gain in score for a student who is given an additional 46 minutes on the exam is ___________
(a) 94.2 (b) 112.12 (c) 126.12 (d) 25.76 (0.56 × 46)
In this exercise, you will investigate the relationship between earnings and height. These data are taken from the US National Health Interview Survey for 1994. Use a statistical package of your choice to answer the following questions. Run a regression of Earnings on Height. (a) Is the estimated slope statistically significant? (b) Construct a 95% confidence interval for the slope coefficient using heteroskedasticity-robust standard errors. The 95% confidence interval for the slope coefficient is [__________, __________] (Round your responses to three decimal places) (c) Run a regression of Earnings on Height using data for female workers only. Is the estimated slope statistically significant? (d) Construct a 95% confidence interval for the slope coefficient using heteroskedasticity-robust standard errors. The 95% confidence interval for the slope coefficient is [__________, __________] (Round your responses to three decimal places) (e) Run a regression of Earnings on Height using data for male workers only. Is the estimated slope statistically significant? (f) Construct a 95% confidence interval for the slope coefficient using heteroskedasticity-robust standard errors. The 95% confidence interval for the slope coefficient is [__________, __________] (Round your responses to three decimal places) (g) Can you reject the null hypothesis that the effect of height on earnings is the same for men and women?
(a) No (b) -926.980, 2150.005 (c) No (d) -10923.158, 5058.995 (e) No (f) -8550.075, 22665.640 (g) No
Consider the following regression model Yi=β0+β1Xi+ui (a) Suppose that Y is measured with random error. Does this mean that regression analysis is unreliable? (b) Now, suppose that X is measured with random error. Does this mean that regression analysis is unreliable?
(a) No (b) Yes
A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin. Let Yi denote the number of points scored on the exam by the ith student (0 ≤ Yi ≤ 100), let Xi denote the amount of time that the student has to complete the exam (Xi = 90 or 120), and consider the regression model Yi=β0+β1Xi+ui, E(ui)=0 (a) Which of the following are true about the unobservable ui? (Check all that apply)
(a) two correct: - ui represents factors other than time that influence the student's performance on the exam. - Different students will have different values of ui because they have unobserved individual-specific traits that affect exam performance.
Suppose you are interested in studying the relationship between education and wage. More specifically, suppose that you believe the relationship to be captured by the following linear regression model, Wage=β0+β1Education+u Suppose further that you estimate the unknown population linear regression model by OLS. (a) What is the difference between β1 and β1 (hat)? (b) What is the difference between u and u (hat)? (c) What is the difference between the OLS predicted value Wage (hat) and E(Wage∣Education)?
(a) β1 is a true population parameter, the slope of the population regression line, while β1 (hat) is the OLS estimator of β1. (b) u represents the deviation of observations from the population regression line, while u (hat) is the difference between Wage and its predicted value Wage (hat). (c) E(Wage∣Education) is the expected value of Wage for given values of Education, while Wage (hat) is the OLS predicted value of Wage for given values of Education.
All of the following are true
- A high R^2 or adjusted R^2 does not mean that the regressors are a true cause of the dependent variable - A high R^2 or adjusted R^2 does not mean that there is no omitted variable bias - A high R^2 or adjusted R^2 does not necessarily mean that you have the most appropriate set of regressors - A high R^2 or adjusted R^2 does not always mean that an added variable is statistically significant
Which of the following variables are likely useful to add to the regression to control for important omitted variables?
- the fraction of young males in the county population - the average level of education in the county - the average income per capita of the county
Workers in the Northeast earn $_____ more per hour than workers in the west, on average, controlling for other variables in the regression.
0.72 (column 3 Northeast)
How many years of schooling would a person be expected to have if all you knew was that they lived 100 miles from the nearest 4-year college
13.22
How many years of schooling would a black female be expected to have if she had the same characteristics as in part (7) but her family had less than $25,000 in income and they did not own their own family home?
14.14
Assume that you had estimated the following quadratic regression model: Test Score (hat) = 607.3 + 3.85 Income − 0.0423 Income^2 If income increased from 10 to 11 ($10,000 to $11,000), then the predicted effect on test scores would be:
2.96
Suppose that a researcher, using data on the class size (CS) and average test scores from 103 third-grade classes, estimates the OLS regression Test score (hat)= 515.196 + (-5.7618) * CS, R^2=0.06, SER=11.4. A classroom has 21 students. The regression's prediction for that classroom's average test score is ________
515.196 + (-5.7618) * 21 = 394.20
The multiple regression includes two regressors: Yi = B0 + B1X1i + B2X2i + ui What is the expected change in Y if X1 increases by 4 units and X2 is unchanged? - The expected change in Y if X1 increases by 4 units and X2 is unchanged is ________. What is the expected change in Y if X2 decreases by 7 units and X1 is unchanged? - The expected change in Y if X2 decreases by 7 units and X1 is unchanged is ______. What is the expected change in Y if X1 increases by 2 units and X2 decreases by 5 units? - The expected change in Y if X1 increases by 2 units and X2 decreases by 5 units is __________.
A) 4B1 B) -7B2 C) 2B1-5B2
Changing the units of measurement—that is, measuring test scores in 100s, will do all of the following except for changing the: A. interpretation of the effect that a change in X has on the change in Y. B. numerical value of the intercept. C. residuals. D. numerical value of the slope estimate.
A. interpretation of the effect that a change in X has on the change in Y.
A nonlinear function: A. is a function with a slope that is not constant. B. can be adequately described by a straight line between the dependent variable and one of the explanatory variables. C. makes little sense, because variables in the real world are related linearly. D. is a concept that only applies to the case of a single or two explanatory variables since you cannot draw a line in four dimensions.
A. is a function with a slope that is not constant.
The coefficient on age shows that
AHE increases by 0.605 for every one-year increase in age
A researcher estimates the effect on crime rates of spending on police by using city-level data. Which of the following represents simultaneous causality?
Cities with high crime rates may need a larger police force, and thus more spending. More police spending, in turn, reduces crime.
Construct a confidence interval of 95% for the college-high school earnings difference. The 95% confidence interval for the college-high school earnings difference is (______,______)
Column 1, college coefficient: 5.62 ± 1.96 × 0.22 = (5.19, 6.05)
Consider a regression with two variables, in which X1i, is the variable of the interest and X2i is the control variable. Conditional mean independence requires:
E(ui | X1i, X2i) = E(ui | X2i)
The coefficient on females in part (2) indicates that
Females obtain 0.145 more years of schooling than do males adjusted for other factors
Consider the regression in part (2). Based on a joint test of the hypothesis H0: Bfemale = Bbachelor = 0, we would
Reject H0 because the F^act is 822, which is much larger than the critical F2,∞ value of 3.0
Given the following hypothesis: H0:B(females)=0.0 adjusted for age and education we would
Reject H0 because the 95% confidence interval does not include zero
Data were collected from a random sample of 340 home sales from a community in 2003. Let Price denote the selling price (in $1,000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as "poor". An estimated regression yields Price (hat)= 122.8 + 0.500BDR + 24.1Bath +0.161Hsize + 0.004Lsize + 0.093Age - 50.3Poor, R^-2=0.74, SER=42.7 Suppose that a homeowner converts part of an existing family room in her house into a new bathroom. What is the expected increase in value of the house? The expected increase in value of the house is $_______.
The expected increase in value of the house is $24,100 (24.1 × 1,000 = 24,100)
Suppose that you have just read a careful statistical study of the effect of advertising on the demand for cigarettes. Using data from New York during the 1970s, the study concluded that advertising on buses and subways was more effective than print advertising. Use the concept of external validity to determine if these results are likely to apply to Boston in the 1970s; Los Angeles in the 1970s; New York in 2010.
The results are likely to apply to Boston in the 1970s, but not to Los Angeles in the 1970s or New York in 2010.
Using the Excel data set, run a regression of years of completed schooling (ed) on distance (in 10s of miles) from a 4-year college (dist)
Years of completed schooling decreased by 0.073 years for every 10-mile increase in distance from the nearest 4-year college
Regress completed schooling (ed) on the variables dist, female, black, hispanic, byset, dadcoll, incomehi, ownhome, cue80, and stwfg80. The coefficient on distance (dist) now indicates that, adjusted for other factors,
Years of completed schooling increase by 0.032 years for every 10 miles closer one lives to the nearest 4-year college.
Sales in a company are $191 million in 2009 and increase to $200 million in 2010. Compute the percentage increase in sales using the usual formula 100×(Sales(2010)−Sales(2009))/Sales(2009). Compare this value to the approximation 100×[ln(Sales(2010))−ln(Sales(2009))]. 100×(Sales(2010)−Sales(2009))/Sales(2009) = ____(a)_____% 100×[ln(Sales(2010))−ln(Sales(2009))] = ____(b)_____% Now, assume that sales in a company are $191 million in 2009 and increase to $263 million in 2010. 100×(Sales(2010)−Sales(2009))/Sales(2009) = ____(c)_____% 100×[ln(Sales(2010))−ln(Sales(2009))] = ____(d)_____% The approximation performs _______(e)________ when the change is small. The quality of the approximation ______(f)_____ as the percentage change increases.
a) 4.712 b) 4.604 c) 37.696 d) 31.988 e) better f) deteriorates
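The comparison between the exact percentage change and the log-difference approximation is easy to reproduce. The sketch below uses the sales figures from the question (191, 200 and 263, in millions of dollars); the function names are hypothetical.

```python
import math

def pct_change(old, new):
    """Exact percentage change: 100 * (new - old) / old."""
    return 100 * (new - old) / old

def log_change(old, new):
    """Log approximation: 100 * [ln(new) - ln(old)]."""
    return 100 * (math.log(new) - math.log(old))

for new_sales in (200, 263):
    exact = pct_change(191, new_sales)
    approx = log_change(191, new_sales)
    # The gap between the two measures widens as the change gets larger.
    print(new_sales, round(exact, 3), round(approx, 3), round(exact - approx, 3))
```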
This problem is inspired by a study of the "gender gap" in earnings in top corporate jobs [Bertrand and Hallock (2001)]. The study compares total compensation among top executives in a large set of U.S. public corporations in the 1990s. (Each year these publicly traded corporations must report total compensation levels for their top five executives.) Let Female be an indicator variable that is equal to 1 for females and 0 for males. A regression of the logarithm of earnings onto Female yields ln(Earnings)=6.44−0.43Female, SER=2.48. (0.01) (0.05) Calculate the average hourly earnings for top male and female executives. The hourly earnings for top male executives is $____(a)____ per hour. (Round your response to two decimal places.) The hourly earnings for top female executives is $______(b)_____ per hour. (Round your response to two decimal places.) What is the estimated average difference between earnings of top male executives and top female executives? The estimated average difference between earnings of top male executives and top female executives is $_____(c)_____ per hour. (Round your response to two decimal places.) What is the estimator of the standard deviation of the regression error? The estimator of the standard deviation of the regression error is ____(d)_____ (Round your response to two decimal places.) Calculate the t-statistic for Female. The t-statistic for Female is ______(e)_______. (Round your response to two decimal places.) f) Looking at the t-statistic, does this regression suggest that female top executives earn less than top male executives? (y/n) g) Does this imply that there is gender discrimination? (y/n) Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression: ln(Earnings)=3.86−0.28Female+0.37ln(MarketValue)+0.004Return, (0.03) (0.04) (0.004) (0.003) n = 46,670, R^-2 = 0.345. 
If MarketValue increases by 1.77%, what is the increase in earnings? If MarketValue increases by 1.77%, earnings increase by _____(h)_____ _______(i)_______. (Round your response to two decimal places.) (j) The coefficient on Female is now −0.28. Why has it changed from the first regression? A. Female is correlated with the two new included variables. B. MarketValue is important for explaining ln(Earnings). C. The first regression suffered from omitted variable bias. D. All of the above. (k) Are large firms more likely to have female top executives than small firms? A. Yes. B. There is no relationship between the genders. C. No.
a) 626.41 b) 407.48 c) 218.93 d) 2.48 e) -8.60 f) Yes g) No h) 0.65 i) % j) all of the above k) no
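The numerical parts of this answer follow from exponentiating the fitted log earnings and from simple ratios. The sketch below reproduces them from the coefficients quoted in the question; the variable names are hypothetical. The gap in part (c) is taken between the two rounded levels, which is how the card's 218.93 arises.

```python
import math

b0, b_female, se_female = 6.44, -0.43, 0.05  # first regression

male = round(math.exp(b0), 2)               # part (a): Female = 0
female = round(math.exp(b0 + b_female), 2)  # part (b): Female = 1
gap = round(male - female, 2)               # part (c): difference of rounded levels
t_female = b_female / se_female             # part (e): coefficient / standard error

# Part (h): in a log-log specification the coefficient is an elasticity,
# so a 1.77% rise in MarketValue raises earnings by about 0.37 * 1.77 percent.
earnings_pct = 0.37 * 1.77

print(male, female, gap, round(t_female, 2), round(earnings_pct, 2))
```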
The true causal effect might not be the same in the population studied and the population of interest because
all of the above.
The interpretation of the slope coefficient in the model ln(Yi) = β0 + β1 ln(Xi)+ ui is as follows:
a 1% change in X is associated with a β1 % change in Y.
The adjusted and unadjusted R^2 from the regression in part (2) are very similar because
(n-1)/(n-k-1) is close to 1.0
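The role of the factor (n-1)/(n-k-1) can be seen by computing the adjusted R^2 for a large and a small sample. This is a minimal sketch using the formula 1 - ((n-1)/(n-k-1))(SSR/TSS) with SSR/TSS = 1 - R^2; the function name and the values of R^2, n and k below are hypothetical, chosen only for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

r2 = 0.279  # hypothetical regression R^2

large = adjusted_r2(r2, n=4000, k=3)  # factor 3999/3996 is almost 1
small = adjusted_r2(r2, n=20, k=3)    # factor 19/16 is noticeably above 1

print(round(large, 4), round(small, 4))
```

With many observations relative to regressors the two measures nearly coincide; with few observations the adjustment is visible, and in both cases the adjusted value stays below R^2.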
The OLS estimators of the coefficients in multiple regression will have omitted variable bias
if an omitted determinant of Yi is correlated with at least one of the regressors.
One of the least squares assumptions in the multiple regression model is that you have random variables which are "i.i.d." This stands for:
independently and identically distributed
In the simple linear regression model, the regression slope:
indicates by how many units Y increases, given a one-unit increase in X.
Changing the units of measurement, e.g. measuring testscores in 100s, will do all of the following EXCEPT for changing the
interpretation of the effect that a change in X has on the change in Y
A nonlinear function
is a function with a slope that is not constant.
Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. You first regress Y on X1 only and find no relationship. However, when regressing Y on X1 and X2, the slope coefficient B1 (hat) changes by a large amount. This suggests that your first regression suffers from:
omitted variable bias.
The dummy variable trap is an example of
perfect multicollinearity
The best way to interpret polynomial regressions is to:
plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for one or more values of X.
To obtain the slope estimator using the least squares principle, you divide the
sample covariance of X and Y by the sample variance of X
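The covariance-over-variance rule can be written out directly. The sketch below is a minimal illustration on a small made-up data set (the x and y values are hypothetical, not taken from any table in these cards); the intercept follows from the fact that the fitted line passes through the sample means.

```python
from statistics import mean

def ols_slope_intercept(x, y):
    """Slope = sample cov(X, Y) / sample var(X); intercept = ybar - slope * xbar."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    slope = s_xy / s_xx
    return slope, ybar - slope * xbar

# Hypothetical class sizes and average test scores.
x = [15, 18, 20, 22, 25]
y = [680, 665, 660, 650, 640]

slope, intercept = ols_slope_intercept(x, y)
print(round(slope, 4), round(intercept, 2))
```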
If you wanted to test, using a 5% significance level, whether or not a specific slope coefficient is equal to one, then you should:
subtract 1 from the estimated coefficient, divide the difference by the standard error, and then check if the resulting ratio is larger than 1.96 in absolute value
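The procedure on this card is a t-test against a hypothesized value other than zero. The sketch below shows the arithmetic; the estimate 1.35 and standard error 0.15 are hypothetical numbers for illustration, and 1.96 is the two-sided 5% critical value of the standard normal distribution.

```python
def t_stat(estimate, hypothesized, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / se

# Hypothetical slope estimate and standard error; H0: slope = 1.
t = t_stat(1.35, 1.0, 0.15)

# Two-sided test at the 5% level: compare |t| with 1.96.
reject = abs(t) > 1.96

print(round(t, 2), reject)
```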
Based on the regression in part (8) (AHE on age and female), test the following H0:Bage=Bfemale=0
the F^act statistic is 178.4 so you would reject the null at the 1% level
Internal validity is that
the estimator of the causal effect should be unbiased and consistent
The regression R2 is a measure of:
the goodness of fit of your regression line.
Comparing the California test scores to test scores in Massachusetts is appropriate for external validity if
the institutional settings in California and Massachusetts, such as organization in classroom instruction and curriculum, were similar in the two states.
Suppose that crime rate is positively affected by the fraction of young males in the population, and that countries with high crime rates tend to hire more police. Use the following expression for omitted variable bias to determine whether the regression will likely over- or underestimate the effect of police on the crime rate. B1 (hat) →p B1 + ρXu (σu/σX)
the regression will likely overestimate B1. That is B1 (hat) is likely to be larger than B1.
Based on a comparison of the coefficients on dist in the regression in part (1) and part (2) we would conclude that
there is an omitted variable problem because the coefficient on distance was reduced by 57% suggesting that other factors correlated with distance but also correlated with completed schooling were not included in the simple regression.
The error term is homoskedastic if
var(ui | Xi = x) is constant for i = 1, ..., n.