Hagrannsóknir 1 (Econometrics 1) - questions for the final exam
If a residual plot shows the residuals spreading out (heteroskedasticity), how might you model Y?
Model log(Y) instead of Y (a variance-stabilising transformation), or estimate with heteroskedasticity-robust (White) standard errors or weighted least squares.
If the Durbin-Watson statistic d is between 0 and dL this indicates:
0 < d < dL: reject H0; there is positive first-order autocorrelation. (dL is the lower critical value from the Durbin-Watson table; it is never higher than about 1.65, so for example d = 1 < 1.65 falls in the rejection region.)
If the coefficient of determination is equal to 1 in a regression problem the
R^2 = ESS/TSS. Since ESS + RSS = TSS, R^2 = 1 requires that the residual sum of squares (RSS, also called the error sum of squares) is 0, so that ESS = TSS. Conversely, if the explained (regression) sum of squares ESS is 0, then TSS = RSS and R^2 = 0.
Arna is analysing a multiple regression model with three independent variables. She needs to check whether the errors from her model are homoskedastic, so she has decided to set up a Breusch-Pagan test. How many degrees of freedom will the N*R_squared statistic have?
3 degrees of freedom.
What is the correct interpretation of the following model? log(wage) = 0.584 + 0.083*education_i + 0.02*female_i + 0.004*(education_i*female_i)
An extra year of education increases female wages by approximately 8.7%. Restricted equation for females (female = 1): log(wage) = (0.584 + 0.02) + (0.083 + 0.004)*education = 0.604 + 0.087*education. Thus an extra year of education increases female wages by approximately 0.087*100 = 8.7%.
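As a check, the arithmetic can be coded up; note that the 8.7% figure uses the usual small-coefficient approximation for a semi-elasticity, while the exact percentage effect is exp(b) - 1:

```python
import math

# Coefficients from the estimated model:
# log(wage) = 0.584 + 0.083*education + 0.02*female + 0.004*(education*female)
b_educ, b_inter = 0.083, 0.004

# Slope of education for females (female = 1)
slope_female = b_educ + b_inter                  # 0.087

approx_pct = slope_female * 100                  # approximation: 8.7%
exact_pct = (math.exp(slope_female) - 1) * 100   # exact semi-elasticity effect: ~9.09%

print(round(approx_pct, 1), round(exact_pct, 2))
```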
Katrín needs to verify whether there is a problem of heteroskedasticity in her model (OLS regression with 3 independent variables). She has set up a Breusch-Pagan test; the value of the statistic is 39. What is the 5% critical value and the conclusion from the test?
Breusch-Pagan: we use the chi-square distribution. Degrees of freedom: 3. Chi-square critical value: 7.81. Since 39 > 7.81, we reject the null hypothesis; there is evidence of heteroskedasticity.
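As a sketch, the decision rule can be coded directly (7.81 is the chi-square table entry for 3 degrees of freedom at the 5% level):

```python
# Breusch-Pagan decision: LM = N*R^2 ~ chi-square(k) under H0 (homoskedasticity)
def breusch_pagan_decision(lm_stat, critical):
    """Reject H0 of homoskedasticity when the LM statistic exceeds the critical value."""
    return lm_stat > critical

CHI2_CRIT_3DF_5PCT = 7.81  # chi-square table, 3 df, alpha = 0.05
print(breusch_pagan_decision(39, CHI2_CRIT_3DF_5PCT))  # True -> evidence of heteroskedasticity
```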
A way to check whether multicollinearity problem is present in the model is to:
Calculate variance inflation factors (VIF), or compute a correlation matrix of the independent variables. We do not make a scatter plot of the errors of the model; that would be appropriate when testing for heteroskedasticity.
What is the current effect of sales on Y in period t, and what is the long-term effect of a 1-unit increase in sales in period t? Y_t = -43.8 + 6*sales_t + 0.25*Y_{t-1}
Current effect: 6*1 = 6. Long-term effect: 6/(1 - 0.25) = 8.
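The long-term effect is the sum of the geometric chain of lagged effects, 6*(1 + 0.25 + 0.25^2 + ...); a quick check:

```python
# Dynamic model: Y_t = -43.8 + 6*sales_t + 0.25*Y_{t-1}
impact = 6.0   # current (impact) effect of a 1-unit sales increase
rho = 0.25     # coefficient on the lagged dependent variable

# Long-run effect = impact / (1 - rho)
long_run = impact / (1 - rho)

# Same number built up period by period from the geometric chain
cumulative = sum(impact * rho**h for h in range(50))

print(long_run)  # 8.0
```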
Anna is analysing a multiple regression model with two independent variables. There are 69 observations in her dataset. She needs to check whether the errors from her model are homoskedastic. She has decided to set up the White test. How many degrees of freedom will the N*R_squared statistic have?
With two independent variables, the White auxiliary regression contains the two variables themselves, their two squares, and the one cross-product X1*X2. Thus: 5 degrees of freedom.
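The count generalises: with k regressors the White auxiliary regression has k levels, k squares, and k*(k-1)/2 cross-products. A small helper (the function name is illustrative):

```python
def white_test_df(k):
    """Degrees of freedom of the N*R^2 statistic in the White test:
    k levels + k squares + k*(k-1)//2 cross-products (constant not counted)."""
    return k + k + k * (k - 1) // 2

print(white_test_df(2))  # 5: x1, x2, x1^2, x2^2, x1*x2
```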
What is the correct information about the test below: Dickey-Fuller = -8.342 p-value = 0.02
The Dickey-Fuller test tests for a unit root (non-stationarity). Since the p-value 0.02 < 0.05, we reject the null hypothesis of a unit root; the series is stationary.
If we have a model and add dummies for each hour of the day, how can we verify that these dummies are jointly significant?
Do an F-test for a subset of coefficients on the 23 hourly dummies.
What is the difference in the F test between multiple regression, time series, and ANOVA?
An F test always has two degrees-of-freedom parameters. Multiple regression: K and (N - K - 1), where K is the number of independent variables. Time series (Granger causality): the number of restrictions implied by the null hypothesis, and (N - K - 1). ANOVA: K - 1 and N - K, where K is the number of groups and N is the total number of data points across all groups.
Arnar is estimating a multiple regression with day-of-the-week dummies. He wants to verify whether these dummies are jointly important for the model. For this he needs to set up:
F test for a subset of the coefficients.
In a one-way ANOVA with 3 groups and N = 15, the value of the F-statistic is equal to 3.9. What is the critical value (with α = 0.05)? Do you reject or fail to reject H0?
Numerator df: K - 1 = 3 - 1 = 2. Denominator df: N - K = 15 - 3 = 12. The critical value is F(2, 12) = 3.885. Since 3.9 > 3.885, we reject H0.
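The degrees of freedom and the decision can be verified in a few lines (3.885 is the F-table value for (2, 12) degrees of freedom at the 5% level):

```python
# One-way ANOVA: K groups, N observations in total
K, N = 3, 15
df1 = K - 1   # numerator df: between-groups
df2 = N - K   # denominator df: within-groups

F_CRIT_5PCT = 3.885  # F-table value for (2, 12) df at alpha = 0.05
f_stat = 3.9

print(df1, df2, f_stat > F_CRIT_5PCT)  # 2 12 True -> reject H0
```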
In order to verify whether exchange rate Granger causes balance of payments Arnar needs to set up:
An F-test for a subset of coefficients (the lags of the exchange rate).
A real estate broker is interested in identifying the factors that determine the price of a house. She wants to run the following regression: y = B0 + B1x1 + B2x2 + B3x3 + u, where y = price of the house in $1,000s, x1 = number of bedrooms, x2 = square footage of living space, and x3 = number of miles from the beach. Taking a sample of 30 houses, the broker runs a multiple regression and gets the following results: Y = 123.2 + 4.59x1 + 0.125x2 - 6.04x3, with standard errors Sb1 = 1.2, Sb2 = 2.13, Sb3 = 4.17. What is the 95% confidence interval for B1?
t critical value for 0.05/2 with (30 - 3 - 1) = 26 degrees of freedom: 2.056 (remember this is a two-sided interval). Margin: 2.056*1.2 = 2.47. Lower limit: 4.59 - 2.47 = 2.12. Upper limit: 4.59 + 2.47 = 7.06. Since zero is not in the confidence interval, we can reject H0: B1 = 0; the coefficient is significant.
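A minimal check of the interval arithmetic (2.056 is the two-sided t-table value for 26 degrees of freedom at the 5% level):

```python
b1, se1 = 4.59, 1.2
T_CRIT_26DF = 2.056  # t-table, 26 df, two-sided 5%

margin = T_CRIT_26DF * se1
lower, upper = b1 - margin, b1 + margin
significant = not (lower <= 0 <= upper)  # zero outside the interval -> reject H0: B1 = 0

print(round(lower, 2), round(upper, 2), significant)  # 2.12 7.06 True
```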
Arnar is estimating a model of electricity prices where the dependent variable is daily electricity price, as independent variables he uses demand for electricity, daily temperature and daily price of gas. He believes that daily electricity prices follow a weekly pattern and decides to include dummies for days of the week. How many dummies does he need to include?
He wants a dummy for each day of the week. We always include one dummy fewer than the number of categories, so 6 dummies. (With monthly data he would need 11 dummies.)
What is heteroskedasticity? What are the consequences of heteroskedasticity? How does the Breusch-Pagan test, test for heteroskedasticity? State the null hypothesis of the test and write down the auxiliary regression that is performed. What conclusions can you draw from these results?
Heteroskedasticity means that the variance of the error term (the residuals) is not constant; it depends on the observation, which violates classical assumption 5. It does not cause bias in the estimation, but t-tests will be unreliable because the standard errors are unreliable; OLS is no longer BLUE (still unbiased, but no longer the minimum-variance estimator). The Breusch-Pagan test works by taking the residuals and estimating an auxiliary regression with e^2 as the dependent variable, because we are looking at the variance (classical assumption 2 tells us the error term has a zero population mean, so e^2 proxies the variance). Auxiliary equation: e^2 = a0 + a1*rooms + a2*area + a3*cbd + u. H0: a1 = a2 = a3 = 0; H1: H0 is false. Here the p-value 0.5477 > 0.05, so we fail to reject the null hypothesis; there is no evidence of heteroskedasticity in the model.
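The mechanics of the auxiliary regression can be sketched with simulated data (the variable names follow the example above, but the data-generating process and sample size are illustrative, not from the exam):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
rooms = rng.uniform(1, 6, n)
area = rng.uniform(30, 200, n)
cbd = rng.uniform(0, 20, n)

# Simulated heteroskedastic errors: the standard deviation grows with area
e = rng.standard_normal(n) * area / 100

# Auxiliary regression: e^2 on a constant and the regressors
X = np.column_stack([np.ones(n), rooms, area, cbd])
e2 = e ** 2
beta, *_ = np.linalg.lstsq(X, e2, rcond=None)
resid = e2 - X @ beta
r2 = 1 - resid @ resid / ((e2 - e2.mean()) @ (e2 - e2.mean()))

lm = n * r2  # Breusch-Pagan statistic, chi-square with 3 df under H0
print(lm > 7.81)  # with the variance rising in area, this typically rejects H0
```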
Value of variance inflation factor below 5 is an indication that:
A VIF below 5 indicates no evidence of a multicollinearity problem in the model; a VIF above 5 indicates severe multicollinearity.
A correlation coefficient of -0.9 between two independent variables in a regression model indicates:
If r is close to -0.9 or 0.9 (|r| near 1), then multicollinearity is a problem.
Which of the following is expected to occur in multiple regression analysis if an important variable is omitted from the list of independent variables?
If we omit an important variable, the included independent variables will be correlated with the error term. This leads to biased least-squares estimators (omitted-variable bias).
Why do we use Breusch-Pagan-type tests for heteroskedasticity and autocorrelation in time series models?
In time series we check for autocorrelation; if the autocorrelation is severe, we most likely have a non-stationary series (in a stationary series the autocorrelation declines rapidly). If the series is non-stationary, the error term and the residuals can also be non-stationary, which indicates a heteroskedasticity problem: the variance of the error term changes over time, so a heteroskedastic error term is also non-stationary. In both heteroskedasticity and serial-correlation tests we are looking at the error term, i.e. the residuals.
In regression models, multicollinearity arises when the:
Independent variables are highly correlated with one another.
Which of the following is true of the error term in a linear regression?
It represents the joint influence of factors, other than the dependent and independent variables, on the regression model.
In testing the validity of a multiple regression model, a large value of the F-test statistic indicates that:
A large value of the F-test statistic indicates that the model has significant explanatory power: at least one slope coefficient (B) is not zero. H0: B2 = B3 = 0. If the F statistic is high, we likely reject the null hypothesis and conclude that not all of B2, B3 are zero. At least one of the tested variables is a good predictor of the dependent variable; it does not mean that all of them are (we use t-tests for that).
Katrin has set up a model evaluating hourly electricity prices using observations from the last year (price of electricity is the dependent variable). She has set up a test to verify whether data are stationary. The t-statistic for electricity prices is equal to -1,2. What can she conclude?
Looking at the Dickey-Fuller table at 5%: the least negative critical value is -2.86. Since -1.2 > -2.86, for any sample size we would not reject the null hypothesis; the series is non-stationary.
Sam wants to analyse electricity prices. He has set up a multiple regression model with 3 independent variables, and his dataset includes 24 monthly observations. What is the lower critical value of the one-sided Durbin-Watson test at the 5% significance level?
A more appropriate test would be the Breusch-Godfrey (Lagrange multiplier) test, which can handle the higher-order autocorrelation of monthly data. But for the Durbin-Watson table: K = 3 and N = 24 gives a lower critical value dL = 1.10.
What is multicollinearity? What are the consequences of multicollinearity? Do you think that multicollinearity is a problem in this model? What information from the tables below can you use in that assessment?
Multicollinearity is when two (or more) independent variables are highly correlated with each other, so regressing one on the others gives a high R^2. The consequences: the estimates are still unbiased, but their variances and standard errors increase, so the coefficients are imprecisely estimated. Perfect multicollinearity is a violation of classical assumption 6. There might be some problem with multicollinearity here, since the correlation between two of the variables is around 0.80; there is not a problem according to the VIF, which is lower than 5.
In order to test the validity of a multiple regression model involving 5 independent variables, an intercept and 50 observations, the statistics for assessing the overall significance of the model follows:
Overall significance of a multiple regression is tested with an F test. K = 5, N = 50; degrees of freedom: 5 and (50 - 5 - 1) = 44.
You have a model with 4 independent variables and 293 observations. Your ANOVA table is presented below. Regression sum of squares: 7.781. Residual sum of squares: 9.060. What is the model error variance?
RSS/(N - K - 1) = 9.060/(293 - 4 - 1) = 9.060/288 = 0.031.
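Note the divisor: with K independent variables the error variance is estimated as RSS/(N - K - 1), not RSS/(N - 2). A quick check:

```python
rss = 9.060
n, k = 293, 4

# Error variance of a multiple regression: RSS / (N - K - 1)
error_variance = rss / (n - k - 1)  # 9.060 / 288

print(round(error_variance, 3))  # 0.031
```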
You have a model with 2 independent variables and 13 observations. The ANOVA table is presented below. Regression sum of squares: 300. Residual sum of squares: 200. What is the value of the overall F-test statistic?
RSS = 200, ESS = 300, TSS = 500. K = 2, N = 13. F = (300/2) / (200/(13 - 2 - 1)) = 150/20 = 7.5.
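The same computation as a short sketch:

```python
ess, rss = 300.0, 200.0
k, n = 2, 13

msr = ess / k            # mean square regression: 150
mse = rss / (n - k - 1)  # mean square error: 20
f_stat = msr / mse       # overall F statistic

print(f_stat)  # 7.5
```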
You have a model with 3 independent variables and 30 observations. Your ANOVA table is presented below. Regression sum of squares: 200 Residuals Sum of squares: 300 What is the coefficient of determination?
RSS = 300, ESS = 200, TSS = 500. R^2 = 200/500 = 0.4, or 1 - 300/500 = 0.4.
What is the value of the coefficient of determination for the following case: sum(Yi - Ybar)^2 = 25, sum(Yi - Yhat_i)^2 = 5?
TSS = 25, RSS = 5. R^2 = 1 - 5/25 = 0.8.
What is ESS? What is RSS?
RSS = residual (error) sum of squares; take it from the Residual row of the ANOVA table. ESS = explained (regression) sum of squares; take it from the Regression row. In one-way ANOVA, RSS corresponds to SSW (within groups) and ESS to SSG (between groups). MSG = SSG/(K - 1), MSW = SSW/(N - K), F = MSG/MSW.
If the coefficient of determination is 0 then:
R^2 = ESS/TSS, or 1 - RSS/TSS. If the explained sum of squares ESS (also called the regression sum of squares) is 0, the model explains none of the variability of the response data around its mean.
Maria has set up a regression using monthly data. Among the independent variables there is a lagged dependent variable on the right hand side. Maria wants to test the model for autocorrelation, which test should we use?
She used monthly data and she has a lagged dependent variable, so the Durbin-Watson test is invalid. The appropriate test is the Lagrange multiplier (Breusch-Godfrey) test.
First thing to be checked when we work with time series is:
Stationarity. We can plot the series and see whether it is trending; if it is growing, it is non-stationary. We confirm this with a Dickey-Fuller test.
OLS assumption 6 says that there should be no multicollinearity problem in the model. This means:
That the independent variables should not be (highly) correlated with each other, i.e. they should not measure essentially the same thing, such as having both sales and sales tax in the same model.
If we get a p value of 0,89 in a t-test what can we conclude?
The coefficient is not significant at any conventional significance level.
Suppose you were to run a regression of leisure travel expenditures by households on household income. We would expect that households with low incomes do not travel much. High-income households may or may not travel much, depending on a household's preferences for travel. The results from this regression will be subject to:
The observations of the error term are not drawn from identical distributions: high-income households are likely to have greater variance in travel expenditures than low-income households. This is evidence of heteroskedasticity.
If in a model, all the points on a scatter diagram lie on a straight line, what is the value of the RSS?
The RSS is 0: every residual is zero, because each point lies exactly on the fitted line.
What are the consequences of multicollinearity?
The variances and standard errors of the estimates increase. The OLS estimators are still BLUE; the estimates are not biased, and the coefficients are not over-estimated, just imprecisely estimated.
You are estimating a model with price of a bicycle as the dependent variable. There are four independent variables in the model: age of the bicycle, number of gears, number of previous owners, a dummy equal to 1 when a bicycle has a basket and 0 otherwise. Model is of the following type: Y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + u What additional model would you need to estimate in order to verify the following hypothesis. H0: B1=B2=0 H1: otherwise
Unconstrained equation: Y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + u. Constrained equation (impose B1 = B2 = 0, i.e. drop x1 and x2): Y = B0 + B3x3 + B4x4 + u. Number of constraints: 2. Degrees of freedom: (N - K - 1) = N - 4 - 1.
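Once both models are estimated, the test statistic compares the two residual sums of squares; a sketch of the formula with made-up RSS values (the exam question gives no numbers, so these are for illustration only):

```python
def restricted_f(rss_r, rss_u, q, n, k):
    """F-test for q restrictions: ((RSS_r - RSS_u)/q) / (RSS_u/(n - k - 1)),
    where k is the number of regressors in the unrestricted model."""
    return ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))

# Hypothetical values: restricted RSS = 120, unrestricted RSS = 100,
# q = 2 restrictions, n = 50 observations, k = 4 regressors
print(restricted_f(rss_r=120.0, rss_u=100.0, q=2, n=50, k=4))  # about 4.5
```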
Arna has been analysing yearly GDP for the last 38 years. She has set up a model in which she explains GDP with 2 explanatory variables. While testing for positive autocorrelation she finds that d is equal to 1.68 at the 5% significance level. What is the conclusion of this test?
We use the Durbin-Watson test: this is yearly data with no lagged dependent variable. K = 2 and N = 38. Lower value dL = 1.38, upper value dU = 1.59. Since 1.68 > 1.59 = dU, we do not reject H0; there is no evidence of positive serial correlation.
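The decision zones of the one-sided Durbin-Watson test for positive autocorrelation can be encoded as (the helper name is illustrative):

```python
def dw_positive_autocorr(d, d_lower, d_upper):
    """One-sided Durbin-Watson test for positive first-order autocorrelation."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d > d_upper:
        return "fail to reject H0: no evidence of positive autocorrelation"
    return "inconclusive"

# K = 2, N = 38 table values at 5%: dL = 1.38, dU = 1.59
print(dw_positive_autocorr(1.68, 1.38, 1.59))
```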
If an independent variable, such as seasons, contains exactly four categories, then how many dummy variables will be needed to uniquely represent these categories?
We need one fewer dummy than the number of categories, so 3.
Breusch Pagan test allows to check (heteroskedasticity):
Whether the errors all have the same variance (homoskedasticity). H0: a1 = a2 = a3 = 0, i.e. the errors are homoskedastic.