Econ 104 Midterm


(c) Model 3 should be selected because it has the lowest AIC, indicating the best model fit among the three when accounting for the number of parameters. The Akaike Information Criterion (AIC) balances two things: goodness of fit (how well the model explains the data) and parsimony (how many parameters it uses, which penalizes overfitting). Lower AIC is better. Even though Model 3 is the most complex (more variables), its AIC is the lowest (508), which means that after penalizing for complexity it still provides the best overall fit. So we always pick the model with the lowest AIC, not the one with the fewest variables or the smallest improvement.

A data analyst is working on constructing a predictive model for electricity consumption based on several variables. Three models are considered: Model 1: temperature, humidity (AIC = 520) Model 2: temperature, humidity, time of day (AIC = 510) Model 3: temperature, humidity, time of day, day of the week (AIC = 508) Which statement is correct, and why? (a) Model 1 should be selected because it includes fewer variables, suggesting simplicity without much loss of information. (b) Model 2 should be selected because it offers a substantial improvement over Model 1 in terms of lower AIC, indicating a better trade-off between model complexity and fit. (c) Model 3 should be selected because it has the lowest AIC, indicating the best model fit among the three when accounting for the number of parameters. (d) Model 2 should be disregarded because adding the time of day did not reduce the AIC significantly compared to the increase in complexity from Model 1. (e) None of the above.
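The selection rule above can be sketched in a few lines of Python (the AIC values come from the question; the variable names are illustrative):

```python
# AIC values from the question: lower AIC = better trade-off of fit vs. complexity
aics = {"Model 1": 520, "Model 2": 510, "Model 3": 508}

# The rule is simply: pick the model with the minimum AIC
best_model = min(aics, key=aics.get)
print(best_model)  # Model 3
```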

(b) Cross-sectional data; boxplot. This dataset is a snapshot of many students at one point in time (last Fall), not repeated over time, so it is cross-sectional rather than time series or panel data. A boxplot displays each variable's distribution (median, quartiles, range) and flags potential outliers, which is exactly what the question asks for; a correlation plot shows relationships between variables, not their distributions.

A researcher is analyzing the impact of social media usage on academic performance using data collected from a survey of UCLA final year students last Fall. The dataset includes individual-level information on GPA, hours spent on social media per week, hours spent studying per week, major, and demographic information such as age, gender, and year in school. The researcher is interested in understanding the relationship between social media usage and GPA, controlling for other variables, and testing for potential econometric issues. 1. What type of dataset is the researcher using in this scenario and what can be used to illustrate the distribution of the variables while identifying potential outliers? (a) Time series data; histogram (b) Cross-sectional data; boxplot (c) Time series data; scatterplot (d) Cross-sectional data; correlation plot (e) None of the above answers are correct

(e) Running a simple linear regression with GPA as the dependent variable and hours spent on social media per week as the only predictor. Which options work? (a) Adding a squared term captures non-linear curvature. (b) A log transformation also reshapes the relationship. (c) The RESET test checks for general functional form misspecification, including non-linear patterns. (d) A scatterplot allows visual inspection for non-linearity. Why not (e)? A simple linear regression assumes a straight-line relationship; it doesn't test for or capture non-linear effects, so it is the least suitable here.

A researcher is analyzing the impact of social media usage on academic performance using data collected from a survey of UCLA final year students last Fall. The dataset includes individual-level information on GPA, hours spent on social media per week, hours spent studying per week, major, and demographic information such as age, gender, and year in school. The researcher is interested in understanding the relationship between social media usage and GPA, controlling for other variables, and testing for potential econometric issues. The researcher suspects that hours spent on social media per week may have a non-linear relationship with GPA. Which of the following approaches would NOT be suitable for testing this hypothesis? (a) Adding a squared term for hours spent on social media per week in the regression model (b) Using a logarithmic transformation of hours spent on social media per week (c) Performing a RESET test to check for functional form misspecification in the regression model (d) Observing the pairwise scatterplot of hours spent on social media and GPA (e) Running a simple linear regression with GPA as the dependent variable and hours spent on social media per week as the only predictor

(c) Use the Akaike Information Criterion (AIC) to find the model that balances fit and complexity by penalizing models with excessive predictors. What's the goal? A model that balances good fit (explains the data well) with simplicity (avoids overfitting with too many predictors). Why AIC? The Akaike Information Criterion explicitly balances model fit and complexity: it penalizes excessive predictors, helping avoid overfitting. Why not adjusted R²? Adjusted R² adjusts for the number of predictors, but it doesn't penalize overfitting as systematically as AIC. Why are the others wrong? (b) VIF is for checking multicollinearity, not model selection. (d) Adding interaction terms between every pair of predictors overcomplicates the model. (e) Low p-values alone don't guarantee the best or simplest model.

A researcher is analyzing the impact of social media usage on academic performance using data collected from a survey of UCLA final year students last Fall. The dataset includes individual-level information on GPA, hours spent on social media per week, hours spent studying per week, major, and demographic information such as age, gender, and year in school. The researcher is interested in understanding the relationship between social media usage and GPA, controlling for other variables, and testing for potential econometric issues. Which of the following criteria would be most appropriate for selecting a model with both simplicity and predictive accuracy in explaining the relationship between social media usage and GPA? (a) Choose the model with the lowest adjusted R-squared, as it accounts for the number of predictors and improves interpretability. (b) Include only predictors with a Variance Inflation Factor (VIF) above 5 to ensure the model captures all possible relationships. (c) Use the Akaike Information Criterion (AIC) to find the model that balances fit and complexity by penalizing models with excessive predictors. (d) Add interaction terms between every pair of predictors to capture any potential combined effects on GPA. (e) Select the model with the lowest p-values for all predictors to ensure statistical significance.

(b) Advertising expenditure Granger causes sales revenue, but sales revenue does not Granger cause advertising expenditure. Granger causality test interpretation: low p-value (< 0.05) → reject the null → there is Granger causality; high p-value (> 0.05) → fail to reject the null → no Granger causality. Here: p = 0.002 → reject the null → AdExp Granger causes Sales; p = 0.23 → fail to reject the null → Sales does not Granger cause AdExp. So only AdExp Granger causes Sales → pick (b).

A researcher is studying the relationship between advertising expenditure (AdExp) and sales revenue (Sales) for a sample of 10 companies over 10 years. The researcher conducts Granger causality tests in both directions. Granger test 1 Null: AdExp does not Granger cause Sales → p-value = 0.002 Granger test 2 Null: Sales does not Granger cause AdExp → p-value = 0.23 Based on the results, which conclusion is correct? (a) Both advertising expenditure and sales revenue Granger cause each other, so we need a VAR model. (b) Advertising expenditure Granger causes sales revenue, but sales revenue does not Granger cause advertising expenditure. (c) Sales revenue Granger causes advertising expenditure, but advertising expenditure does not Granger cause sales revenue. (d) Neither advertising expenditure nor sales revenue Granger cause each other. (e) There is no significant relationship between advertising expenditure and sales revenue.
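The decision rule above can be sketched directly from the two p-values (values taken from the question):

```python
# p-values from the two Granger tests in the question
p_adexp_to_sales = 0.002   # H0: AdExp does not Granger cause Sales
p_sales_to_adexp = 0.23    # H0: Sales does not Granger cause AdExp
alpha = 0.05

# Reject H0 (i.e., conclude Granger causality) when p < alpha
adexp_causes_sales = p_adexp_to_sales < alpha   # True  -> AdExp -> Sales
sales_causes_adexp = p_sales_to_adexp < alpha   # False -> Sales does not cause AdExp
print(adexp_causes_sales, sales_causes_adexp)
```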

(d) The functional form of the regression model is misspecified. Heteroskedasticity means the variance of the error term changes across observations. To detect it, we often use the Breusch-Pagan test (null: homoskedasticity, so a large p-value means we fail to reject, i.e., no evidence of heteroskedasticity) or the White test, which also tests for heteroskedasticity. The functional form matters because if it is misspecified, variance patterns can emerge in the residuals, pointing to heteroskedasticity. The other options, such as too many variables or a large sample size, are unrelated to heteroskedasticity.

A test for heteroskedasticity can conclude that heteroskedasticity exists if: (a) The Breusch-Pagan test results in a large p-value. (b) The White test results in a very small test statistic. (c) The regression model includes too many independent variables. (d) The functional form of the regression model is misspecified. (e) The sample size is too large.

(c) Probit, LPM, Logit. Accuracies: LPM (300 + 50)/500 = 70%; Probit = 67%; Logit = 71%. Ordered from least to most accurate: Probit (67%) < LPM (70%) < Logit (71%).

A test set of 500 observations is used to verify the strength of each model's prediction. Use the confusion matrices to order the models from least to most accurate: (a) Logit, Probit, LPM (b) Logit, LPM, Probit (c) Probit, LPM, Logit (d) LPM, Probit, Logit (e) None of the given answers are correct.
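The accuracy comparison can be checked with a small Python sketch. Only the LPM's correct counts (300 and 50 out of 500) are quoted in the answer key; the Probit and Logit accuracies are taken from the key as given:

```python
def accuracy(true_pos, true_neg, n):
    """Share of correct predictions (diagonal of the confusion matrix) in the test set."""
    return (true_pos + true_neg) / n

# LPM: 300 + 50 correct out of 500 (from the answer key); the other two
# accuracies are quoted directly in the key
acc = {"LPM": accuracy(300, 50, 500), "Probit": 0.67, "Logit": 0.71}

# Order the models from least to most accurate
ordering = sorted(acc, key=acc.get)
print(ordering)  # ['Probit', 'LPM', 'Logit']
```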

(d) Current sales will increase by 0.1865 units. The total impact is the sum of the individual marginal effects: 0.618 − 0.4315 = 0.1865.

Based on the given AR(2) model for weekly paper towel sales, what is the impact on current sales if there is a permanent increase in paper towel sales from two weeks ago by one unit? (a) Current sales will increase by 0.618 units. (b) Current sales will decrease by 0.4315 units. (c) Current sales will increase by 35.0501 units. (d) Current sales will increase by 0.1865 units. (e) We need more information to answer this question
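A minimal sketch of the calculation, assuming the two AR(2) lag coefficients implied by the answer key (0.618 and −0.4315):

```python
# AR(2) lag coefficients implied by the answer key (assumed values)
phi1, phi2 = 0.618, -0.4315

# A permanent one-unit increase starting two weeks ago raises both lagged
# sales terms by one unit, so the current-period impact is the sum
impact = phi1 + phi2
print(round(impact, 4))  # 0.1865
```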

(c) 1.68. F = 25.7/15.3 ≈ 1.68 to two decimal places. Remember, the larger variance goes on top. Since both groups are the same size, their degrees-of-freedom denominators cancel.

Given a dataset with 30 observations, you are conducting a Goldfeld-Quandt test for heteroskedasticity by splitting the ordered data into two groups. The sum of squared residuals for the first group (SSR1) is 15.3 and for the second group (SSR2) is 25.7. Each group contains 12 observations after removing the central observations and this model has only one independent variable. Calculate the F-statistic (rounded to two decimal places). (a) 2.57 (b) 2.41 (c) 1.68 (d) 1.67 (e) None of the given answer options are correct
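The F-statistic (and the decision against the 3.18 critical value used in the follow-up question) can be sketched as:

```python
# Sums of squared residuals for the two groups (from the question)
ssr1, ssr2 = 15.3, 25.7

# Goldfeld-Quandt F-statistic: the larger SSR goes on top; with equal group
# sizes and the same regressors, the degrees-of-freedom terms cancel
f_stat = max(ssr1, ssr2) / min(ssr1, ssr2)
print(round(f_stat, 2))  # 1.68

# Compare with the 5% critical value quoted in the follow-up question
reject_homoskedasticity = f_stat > 3.18
print(reject_homoskedasticity)  # False -> fail to reject the null
```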

(a) The functional form of the regression model is misspecified. (a) True: misspecifying the model (wrong variables, wrong functional form) can lead to heteroskedasticity. (b) A large p-value means we fail to reject homoskedasticity. (c) Same: a large p-value gives no evidence of heteroskedasticity. (d) Too many variables may cause multicollinearity, not necessarily heteroskedasticity. (e) A test statistic smaller than the critical value means we don't reject the null, i.e., no evidence of heteroskedasticity.

Heteroskedasticity can occur in a regression model if ____. (a) the functional form of the regression model is misspecified (b) the Breusch-Pagan test results in a large p-value (c) the White test results in a large p-value (d) the regression model includes too many independent variables (e) the Breusch-Pagan test statistic is smaller than the critical value

(b) Fail to reject the null hypothesis, indicating no evidence of heteroskedasticity. Since 1.68 < 3.18, we fail to reject the null, indicating no evidence of heteroskedasticity with respect to the variable the sample was split on.

If F-Statistic is 1.68, Based on a critical value of 3.18 at a 5% significance level, what is the conclusion of the Goldfeld-Quandt test for heteroskedasticity? (a) Reject the null hypothesis, indicating heteroskedasticity is present. (b) Fail to reject the null hypothesis, indicating no evidence of heteroskedasticity. (c) Reject the alternative hypothesis, indicating no evidence of heteroskedasticity. (d) Fail to reject the alternative hypothesis, indicating heteroskedasticity is present. (e) The test is inconclusive.

(d) [0.02, 0.18]; it is not precise because the interval is wide due to heteroscedasticity. 0.1 ± 1.96 × 0.04 = [0.02, 0.18]. For the linear probability model, the errors are heteroskedastic by design.

If the marginal effect of ads on the probability of subscribing has a standard error of 0.04 for the LPM, calculate the 95% confidence interval for the marginal effect, rounded to two decimal places. (The critical value is 1.96). Is this confidence estimate precise, and why or why not? (a) [0.02, 0.18]; it is precise because the interval is narrow. (b) [0.1, 0.5]; it is not precise because the interval is wide due to small sample size. (c) [0.1, 0.5]; it is not precise because the interval is wide due to multicollinearity. (d) [0.02, 0.18]; it is not precise because the interval is wide due to heteroscedasticity. (e) None of the above answers are correct and we need more information to speak about the precision
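The interval arithmetic can be verified in a couple of lines (estimate, standard error, and critical value all from the question):

```python
# Marginal effect estimate, standard error, and z critical value from the question
effect, se, z = 0.1, 0.04, 1.96

# 95% confidence interval, rounded to two decimal places
lower = round(effect - z * se, 2)
upper = round(effect + z * se, 2)
print([lower, upper])  # [0.02, 0.18]
```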

(b) Yes, because it ensures that the properties of the series do not change over time, allowing for meaningful statistical inference Stationarity means the series has constant mean, variance, and autocovariance over time. Without it, statistical models can't reliably predict or infer because patterns shift unpredictably. So: stationarity is critical for valid inference and forecasting in time series models.

Is stationarity an important assumption in time series analysis? If so, why? (a) Yes, because it allows for the use of simple linear regression models. (b) Yes, because it ensures that the properties of the series do not change over time, allowing for meaningful statistical inference. (c) Yes, because it guarantees that the series has no outliers. (d) Yes, because it ensures that the mean of the series is zero. (e) No, stationarity is not an important assumption in time series analysis.

(e) None of the given combinations are correct. (i) Dependent variable is a function of the residuals → (C) LM test auxiliary regression (e.g., Breusch-Pagan, Breusch-Godfrey, where residuals enter the auxiliary regression). (ii) Dependent variable is the feedback scores given to products based on customer satisfaction → (B) Ranked order model (used for ordinal outcomes like satisfaction rankings). (iii) Independent variables are lags of the dependent variable → (D) Autoregressive (AR) model (a single time series regressed on its own lags). (iv) Lags of the dependent variable can be used to predict the independent variable, and vice versa → (A) Vector autoregressive (VAR) model (a multi-variable system where variables predict each other). The correct mapping, i-C, ii-B, iii-D, iv-A, does not appear among options (a)-(d), so (e).

Match the descriptions on Side 1 with the appropriate model or test on Side 2. SIDE 1: (i) Dependent variable is a function of the residuals. (ii) Dependent variable is the feedback scores given to products based on customer satisfaction. (iii) The independent variables are lags of the dependent variable. (iv) Lags of the dependent variable can be used to predict the independent variable, and vice versa. SIDE 2: (A) Vector autoregressive (VAR) model (B) Ranked order model (C) Lagrange multiplier (LM) test auxiliary regression (e.g., Breusch-Pagan or Breusch-Godfrey tests) (D) Autoregressive (AR) model (a) i - D, ii - A, iii - C, iv - B (b) i - A, ii - C, iii - B, iv - D (c) i - B, ii - C, iii - A, iv - D (d) i - C, ii - B, iii - A, iv - D (e) None of the given combinations are correct

(c) We fail to reject the null of the ADF test and conclude that our time series is non-stationary. We are told that the alternative hypothesis is stationarity, implying that the null is non-stationarity.

The following diagram illustrates weekly paper towels sales (measured in 10,000s rolls) for the most recent 124 weeks. Paired with the results of the ADF test: what can be concluded about this time series? (a) We fail to reject the null of the ADF test and conclude that our time series is stationary. (b) We reject the null of the ADF test and conclude that our time series is stationary. (c) We fail to reject the null of the ADF test and conclude that our time series is non-stationary. (d) We reject the null of the ADF test and conclude that our time series is non-stationary. (e) None of the given answers are correct.

(a) 0.03 or (b) 0.04. Either (a) or (b); both were awarded the marks. You can compute either the difference in the respective probabilities, Φ(1.8) − Φ(1.5) = 0.0309 ≈ 0.03, or use the formula from class, φ(1.5) × 0.3 = 0.03885 ≈ 0.04.

Using a probit model, what is the effect of one more ad on the probability of subscribing to the YouTube channel for a user who sees 5 ads and where the video has 20 views? (Note that Φ(1.5) = 0.9332, φ(1.5) = 0.1295, Φ(1.8) = 0.9641, φ(1.8) = 0.0790) rounded to two decimal places. (Select ONE answer option only) (a) 0.03 (b) 0.04 (c) 0.05 (d) 0.06 (e) None of the above are correct
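Both calculations can be reproduced with the standard normal CDF and PDF. This sketch assumes the index rises from 1.5 to 1.8 when one more ad is shown and that 0.3 is the probit coefficient on ads, as implied by the answer key:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Discrete change: the probit index moves from 1.5 (5 ads) to 1.8 (6 ads)
discrete = Phi(1.8) - Phi(1.5)   # ~0.0309 -> rounds to 0.03

# Calculus approximation: PDF at the index times the ad coefficient (0.3, assumed)
approx = phi(1.5) * 0.3          # ~0.0389 -> rounds to 0.04
print(round(discrete, 2), round(approx, 2))  # 0.03 0.04
```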

(b) Since our test statistic exceeds the critical value, we reject the null and conclude that the errors are heteroskedastic and a function of FACTORY. We can use either the Breusch-Pagan test, which uses the χ² critical value, or a t-test on the coefficient, which uses the t critical value. In both cases the test statistic exceeds the critical value, so we reject the null of homoskedasticity. Conclusion: heteroskedasticity exists, and it is related to FACTORY.

We observe that the least squares residuals ê_i increase exponentially in magnitude when plotted against FACTORY. We regress ê_i² on a function of FACTORY and obtain R² = 0.6066 for this auxiliary regression. Furthermore, the estimated coefficient of the function of FACTORY is 2.024 with a standard error of 0.612. (Note: χ²(0.99, 1) = 6.635 and t(0.99, 14) = 2.62.) What can we conclude about heteroskedasticity based on these results? (a) Since our test statistic exceeds the critical value, we fail to reject the null and conclude that heteroskedasticity exists as a function of FACTORY. (b) Since our test statistic exceeds the critical value, we reject the null and conclude that the errors are heteroskedastic and a function of FACTORY. (c) Since our test statistic is less than the critical value, we fail to reject the null and conclude that heteroskedasticity does not exist. (d) Since our test statistic is less than the critical value, we reject the null and conclude that the errors are a heteroskedastic function of FACTORY. (e) We cannot come to a conclusion from the information given.
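Both test statistics can be computed from the quoted numbers. The sample size n = 16 is an assumption here, inferred from the t critical value's 14 degrees of freedom (n − 2 with one regressor and an intercept):

```python
# Auxiliary-regression R² and coefficient/standard error from the question
r2_aux = 0.6066
coef, se = 2.024, 0.612

# Sample size assumed to be 16, inferred from the 14 df of the t critical value
n = 16

# Breusch-Pagan LM statistic: n * R² of the auxiliary regression
lm_stat = n * r2_aux   # ~9.71 > 6.635 -> reject homoskedasticity
t_stat = coef / se     # ~3.31 > 2.62  -> reject homoskedasticity
print(round(lm_stat, 2), round(t_stat, 2))
```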

(e) The estimated coefficients will remain unbiased, but the standard errors will be inefficient, leading to invalid inference. When heteroskedasticity is present: OLS coefficients → still unbiased. Standard errors → wrong (inefficient), so: Confidence intervals and hypothesis tests are no longer valid. You may make false inferences (Type I or Type II errors). That's why we use robust standard errors to correct inference when heteroskedasticity exists.

What are the consequences of using least squares estimation when heteroskedasticity is present in the data, given that all other regression assumptions hold? (a) The estimated coefficients will be biased. (b) The standard errors will be correctly estimated, leading to valid inference. (c) The residuals will be normally distributed. (d) The regression model will overfit the data. (e) The estimated coefficients will remain unbiased, but the standard errors will be inefficient, leading to invalid inference.

(d) 1.0. Predicted probability = 0.2 + (0.1 × 5) + (0.02 × 20) = 0.2 + 0.5 + 0.4 = 1.1, which we censor down to 1.

What is the estimated predicted probability of subscribing to the YouTube channel for a user who sees 5 ads and where the video has 20 views, according to the LPM (censor your results where necessary if it falls outside the bounds of probability assumptions.)? (a) 0.0 (b) 0.5 (c) 0.6 (d) 1.0 (e) None of the above
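A minimal sketch of the LPM prediction with censoring, assuming the coefficients implied by the answer key (intercept 0.2, ads 0.1, views 0.02):

```python
# LPM coefficients implied by the answer key (assumed values)
intercept, b_ads, b_views = 0.2, 0.1, 0.02
ads, views = 5, 20

# Raw linear fitted value can fall outside [0, 1] in an LPM
raw = intercept + b_ads * ads + b_views * views   # 1.1

# Censor the fitted value into the unit interval
prob = min(max(raw, 0.0), 1.0)
print(prob)  # 1.0
```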

(c) Var(e_i|X) = σ²·FACTORY⁴. To rescale the model, we divide throughout by the square root of the skedastic function, i.e., by FACTORY².

What is the implicit assumption about the heteroskedasticity pattern? (a) Var(e_i|X) = σ²·FACTORY (b) Var(e_i|X) = σ²·FACTORY² (c) Var(e_i|X) = σ²·FACTORY⁴ (d) Var(e_i|X) = σ²·√FACTORY (e) We cannot tell from the information given

(b) Heteroscedasticity. What is a residual vs. fitted plot? It plots the residuals (errors) on the y-axis against the fitted (predicted) values on the x-axis. What does it reveal? If the spread (variance) of residuals increases or decreases across fitted values, that's heteroscedasticity (non-constant variance); for example, residuals that "fan out" or form a cone as fitted values increase. Why not the others? Multicollinearity is checked with VIF, not residual plots. Autocorrelation shows up in time series and is checked with Durbin-Watson. Normality of residuals is checked with Q-Q plots or histograms.

What is the primary issue that a residual vs. fitted values plot can help identify in a linear regression model? (a) Multicollinearity (b) Heteroscedasticity (c) Autocorrelation (d) Normality of residuals (e) None of the above answers

(d) Lags 1, 2, and 4 only, suggesting a moving average (MA), autoregressive (AR), or ARMA model. The test statistic is r_k × √T; a lag is significant at the 5% level when the statistic's absolute value exceeds 1.96. Given T = 1125, √1125 ≈ 33.54. Lag 1: 0.58 × 33.54 ≈ 19.45 → significant. Lag 2: −0.12 × 33.54 ≈ −4.02, and |−4.02| > 1.96 → significant. Lag 3: 0.04 × 33.54 ≈ 1.34, which is within ±1.96 → not significant. Lag 4: −0.36 × 33.54 ≈ −12.07, and |−12.07| > 1.96 → significant. So lags 1, 2, and 4 meet the requirement.

Which of the first 4 lags are statistically significant at the 5% level, if Z0.975=1.96Z0.975​=1.96 and there are 1125 effective periods in the study?If we are told that all lags thereafter are statistically insignificant, what conclusion can we draw about an appropriate model? (a) Lag 1 only, suggesting an autoregressive (AR) model with a small coefficient (b) Lags 1 and 2 only, suggesting a moving average (MA) model (c) Lags 1 and 3 only, suggesting a moving average (MA), autoregressive (AR), or autoregressive moving average (ARMA) model (d) Lags 1, 2, and 4 only, suggesting a moving average (MA), autoregressive (AR), or ARMA model (e) Lags 1, 2, 3, and 4, suggesting an autoregressive (AR) model
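The significance screen can be sketched as follows (the sample autocorrelations at lags 1-4 are the values used in the answer key):

```python
import math

# Sample autocorrelations at lags 1-4 (as used in the answer key) and sample size
acf = {1: 0.58, 2: -0.12, 3: 0.04, 4: -0.36}
T = 1125

# A lag is significant at the 5% level when |r_k * sqrt(T)| exceeds 1.96
root_T = math.sqrt(T)   # ~33.54
significant = [k for k, r in acf.items() if abs(r * root_T) > 1.96]
print(significant)  # [1, 2, 4]
```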

(e) All the answer options from (a)-(d) are false (a) WLS works only when we know the exact form of error variance to apply correct weights. (b) The Breusch-Pagan test checks if error variance changes systematically with predictors (heteroskedasticity). (c) We can calculate F-tests robust to heteroskedasticity using White's robust standard errors. (d) The White test regresses squared residuals on predictors, their squares, and interactions to detect heteroskedasticity or misspecification.

Which of the following is (are) false? (a) Weighted least squares estimation is used only when the functional form of the error variances is known (b) A Breusch-Pagan test is commonly used to test for heteroskedasticity in a regression model by examining whether the variance of the error terms is constant across levels of the independent variables. (c) It is possible to obtain F statistics that are robust to heteroskedasticity of an unknown form (d) The White test assumes that the square of the error term in a regression model is uncorrelated with all the independent variables, their squares and cross products (interactions) and can be used for functional form analysis (e) All the answer options from (a)-(d) are false

(e) BG test. The Breusch-Godfrey (BG) test is used to detect autocorrelation, not stationarity. A time series plot, the ACF, the ADF test, and ndiffs/auto-selection tools all help detect whether a series is stationary.

Which of the following is not used to detect whether a series is stationary or not? (a) time series (b) ACF (c) ADF test (d) ndiffs or autoselection tool (e) bg test

(c) (i) and (iv) only. Residuals (the estimated errors from the regression) always sum to zero by design when the model includes an intercept; this is a direct result of the least squares minimization. The true errors (the unobservable actual errors) have no reason to sum to zero; they depend on the true data-generating process. So (i) is true (residuals sum to zero) and (iv) is true (the actual errors don't necessarily sum to zero), which is why the correct combination is (i) and (iv).

Which of the following is true? (i) Residuals from the model always sum to zero. (ii) Residuals from the model don't always sum to zero. (iii) The actual errors always sum to zero. (iv) The actual errors don't always sum to zero. Options: (a) (i) and (iii) only (b) (ii) and (iv) only (c) (i) and (iv) only (d) (ii) and (iii) only (e) None of the above combinations are correct
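The residual property can be demonstrated on made-up data with a hand-rolled simple regression (all numbers here are illustrative):

```python
# Tiny illustration with made-up data: OLS with an intercept forces the
# residuals (but not the unobservable true errors) to sum to zero
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Closed-form simple-regression slope and intercept
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
intercept = ybar - slope * xbar

# Residuals sum to (numerically) zero because the intercept absorbs the mean
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(abs(sum(residuals)) < 1e-9)  # True
```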

(e) Endogeneity. Endogeneity occurs when an independent variable is correlated with the error term, which directly violates the assumption that the regressors are exogenous and leads to biased and inconsistent coefficient estimates. The other terms: homoskedasticity → constant error variance (its violation leads to inefficient estimates, but no bias); autocorrelation → errors correlated over time (biases standard errors, not coefficients); multicollinearity → predictors correlated with each other (inflates variance, but no bias); random sampling → if done correctly, ensures unbiasedness.

Which of the following violations introduces bias to the regression model? (a) Homoskedasticity (b) Autocorrelation (c) Multicollinearity (d) Random sampling (e) Endogeneity

