Quant2 Final
Stochastic Error Term
A term added to a regression equation to account for all the variation in Y that cannot be explained by the included X's.
The F test statistic in a one-way ANOVA is
A) MSB/MSW.
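The ratio above is easy to compute by hand; a minimal sketch with made-up data for three groups (the numbers are illustrative, not from the course):

```python
# One-way ANOVA F statistic, F = MSB / MSW, computed by hand.
# The three groups below are made-up illustrative data.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 7.0]]
n = sum(len(g) for g in groups)                      # total observations
c = len(groups)                                      # number of groups
grand_mean = sum(x for g in groups for x in g) / n

# Between-group sum of squares: group size times the squared deviation
# of each group mean from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: squared deviations inside each group
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

msb = ssb / (c - 1)        # numerator df:   c - 1
msw = ssw / (n - c)        # denominator df: n - c
F = msb / msw
print(F)
```

Note that the degrees of freedom, (c - 1) and (n - c), match the one-way ANOVA card later in this deck.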
Specification of the Model
Choose independent variables, functional form, and stochastic error term
Which of the following is true for a two-factor ANOVA model:
D) A and B above are true
The Y-intercept (b0) in the expression
D) predicted value of Y when X = 0.
The p-value measures
D) the lowest possible level of significance for which the null hypothesis would still be rejected.
Critical Value
The value that divides the acceptance region from the rejection region
Gauss-Markov Theorem
Given the Classical Assumptions, the OLS estimator is the minimum variance estimator from all linear unbiased estimators
Consequence of Omitting a Relevant Variable
Leads to biased estimates of other variables
Four sources of "specification error"
Omitted variables, Measurement Error, Different Functional Form, Nature of Random Component
Meaning of Regression Coefficient
The impact of a one-unit increase in X1 on the dependent variable Y, holding all other independent variables constant.
T/F A completely randomized ANOVA design with 4 groups would have 12 possible pairwise mean comparisons.
f
T/F Data that exhibit an autocorrelation effect violate the regression assumption of homoscedasticity, or constant variance.
f
T/F The coefficient of determination is computed as the ratio of SSE to SST.
f
T/F If the null hypothesis is true in one-way ANOVA, then both MSB and MSW provide unbiased estimates of the population variance of the variable under investigation.
t
T/F The confidence interval for the mean of Y given X in regression analysis is always narrower than the prediction interval for an individual response Y given the same data set, X value, and confidence level.
t
The least squares method minimizes which of the following?
C) SSE
Interaction between main effects in an experimental design can be tested in
C) a two-factor model.
A regression diagnostic tool used to study the possible effects of collinearity is
C) the VIF.
The coefficient of determination (r^2) tells us
C) the proportion of total variation (SST) that is explained (SSR).
In a multiple regression model, which of the following is correct regarding the value of the adjusted r^2 ?
E) All of the above.
In a multiple regression problem involving two independent variables, if b1 is computed to be +2.0, it means that
D) the estimated mean of Y increases by 2 units for each increase of 1 unit of X1, holding X2 constant.
What do we need to predict direction of change for individual variables?
Knowledge of economic theory and general characteristics of how explanatory variables relate to the dependent variable under consideration
Consequence of Including an Irrelevant Variable
Leads to higher variances of estimated coefficients
Level of Significance
Level of Type 1 Error
R Squared Formula
The ratio of SSR/SST, explained sums of squares divided by total sums of squares
T/F Collinearity among explanatory variables in multiple regression analysis increases the standard errors of the beta coefficient estimates, thereby increasing the likelihood of committing a Type II error.
t
T/F Even though multicollinearity and autocorrelation are violations of major assumptions of multiple regression analysis, our simulations reveal that the expected values of the coefficient estimates are still equal to the true population values.
t
T/F If there are four dummy variables for some phenomenon such as occupation group, and all four are included in the regression, then the constant term must be excluded.
t
T/F Regression analysis is used for prediction and estimation of the impact and explanatory power of an independent variable on a dependent variable, while correlation analysis is used to measure the strength of the linear relationship between two numerical variables and implies no causal relationship.
t
Sum of Residuals in OLS
zero
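A quick numerical check of this fact, using the textbook closed-form slope and intercept on made-up data:

```python
# For OLS with an intercept, the residuals always sum to (numerically) zero.
# x and y are made-up data; b0 and b1 come from the usual closed-form formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar                    # the intercept forces mean residual = 0
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(sum(residuals))                    # ~0 up to floating-point error
```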
Degrees of Freedom, Multiple Regression
n - k - 1, where n is the number of observations and k is the number of independent variables
Standard Error of Beta Coefficient
A measure of sampling variation (standard deviation) of the slope-term estimates of the population parameters. Dividing this value into the beta coefficient estimate yields a t-ratio for comparison with the null that the true Beta equals zero
Durbin-Watson Statistic
A measure of the extent of serial correlation. A value of 2 indicates no evidence of serial correlation.
Null Hypothesis
A numeric contention or claim that the researcher tests to determine whether the statistical evidence supports or contradicts it
Confidence Interval
A range containing the true value of an item a specific percentage of the time.
The degrees of freedom for the F test in a one-way ANOVA for the numerator and denominator, respectively, are
A) (c - 1) and (n - c).
High levels of intercorrelation among independent variables in a multiple regression model
A) can result in a problem called multicollinearity
In a one-way ANOVA
A) there is no interaction term.
Why would you use the Levene procedure?
A) to test for homogeneity of variance
Type II Error
Failing to reject (accepting) a false null hypothesis. Also known as a beta error; it can be calculated only in reference to specific values of the alternative hypothesis
Two Sided Test
Alternative hypothesis is given for two sides of the null (null=0)
One Sided Test
Alternative hypothesis is only given for one side of the null, a more "powerful" test than a two sided test, meaning lower probability of Type II error
Residual Sum of Squares, a.k.a. SSE for Sum of Squared Errors
Amount of squared deviation that is unexplained by the regression line, the squared difference of the actual value of Y from the predicted value of Y from the estimated regression equation, that is Sum(Y - Yhat)^2
Explained Sum of Squares, a.k.a. SSR for Sum of Squares Regression
Amount of the squared deviation of the predicted value of Y as determined by the estimated regression equation from the mean of Y, that is Sum(Yhat - Ybar)^2
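SSE and SSR together with SST obey the decomposition SST = SSR + SSE, and R^2 = SSR/SST; a minimal check on made-up data:

```python
# Sums-of-squares decomposition for simple OLS: SST = SSR + SSE,
# and R^2 = SSR / SST. Data are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
r2 = ssr / sst
print(sst, ssr + sse, r2)
```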
In a two-way ANOVA the degrees of freedom for the interaction term is
B) (r - 1)(c - 1).
If the Durbin-Watson statistic has a value close to 0, which assumption is violated?
B) independence of errors
An interaction term in a multiple regression model may be used when
B) the relationship between X1 and Y changes for differing values of X2.
The standard error of the estimate is a measure of
B) the variation of the dependent variable around the sample regression line.
Logarithm transformations can be used in regression analysis
B) to change a nonlinear model into a linear model.
Based on the residual plot to the right, you will conclude that there might be a violation of which of the following assumptions?
C) homoscedasticity
OLS is BLUE
Best Linear Unbiased Estimator, meaning that no other linear unbiased estimator has a lower variance than the least-squares estimates
Estimated Regression Coefficients
Beta hats, are empirical best guesses, obtained from a sample
Regression Coefficient Elasticity Measure
The percentage change in the dependent variable associated with a one-percent change in the independent variable; the slope coefficient multiplied by the ratio of X to Y, typically evaluated at the sample means
Serial Correlation a.k.a. Autocorrelation
Correlation of the error terms, typically first order correlation where the error for time period t is correlated with the error from the prior time period, t-1. Tends to reduce standard errors, meaning that we are more likely to say that a variable matters when it doesn't (Type I Error)
Which of the following will generally lead to a model exhibiting a "better fit?"
D) All of the above tend to improve the "goodness of fit".
Clearly among all of the statistical techniques that we have studied, the most powerful and useful, because it is capable of incorporating other statistical models, is
D) Multiple regression analysis
In a one-way ANOVA, if the computed F statistic exceeds the critical F value we may
D) reject H0 since there is evidence that some of the means differ, and thus evidence of a treatment effect.
Why would you use the Tukey-Kramer procedure?
D) to test for differences in pairwise means
If the F statistic yields a p-value for a multiple regression model that is statistically significant, which of the following is true?
D) The regression coefficient for at least one independent variable is statistically significantly different from zero (β1 ≠ 0; or β2 ≠ 0; or ... or βk ≠ 0)
Cross-Sectional
Data set that includes entries from the same time period but different economic entities (countries, for instance)
Time Series Data
Data set that is ordered by time, typically generating a serial correlation (autocorrelation) violation of the randomness of the error term
Three purposes of econometrics
Describe reality, Test hypotheses, Predict the future
Critical T Value
Determined by degrees of freedom and the level of significance
Residual, e
Difference between dependent variable's actual value and the estimated value of the dependent variable from the regression results
Sampling Distribution
Distribution of different values of B Hat across different samples
If the correlation coefficient (r) = 1.00 for two variables, then
E) A and D above.
In a simple two-variable linear regression model with
E) All of the above are true about the simple two-variable model.
The width of the prediction interval for the predicted value of Y is dependent on
E) All of the above.
Signs that multicollinearity among explanatory variables may be a problem are indicated by
E) All the above are indicators of multicollinearity problems.
Correlation of the error terms in a multiple regression model
E) All the above are true
Multiple Regression Analysis
E) All the above are true of multiple regression analysis.
If a categorical independent variable contains 5 distinct types, such as a Student Home Residence factor (OKC region, Tulsa Region, Other OK, Texas, Other), then when the model contains a constant term, ________ dummy variable(s) will be needed to uniquely represent these categories.
C) 4
If one categorical independent variable contains 4 types and a second categorical independent variable contains two types, then when the model contains a constant term, ________ dummy variable(s) will be needed to uniquely represent these categories.
C) 4
If the residuals in a regression analysis of time ordered data are not correlated, the value of the Durbin-Watson D statistic should be near ________.
B) 2.0
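The statistic itself is simple to compute; a sketch with made-up residual series (the helper name durbin_watson is ours, not a library call):

```python
# Durbin-Watson statistic: DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_t e_t^2.
# Near 2: no serial correlation; near 0: positive serial correlation;
# near 4: negative serial correlation. Residual series below are made up.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

positively_correlated = [1.0, 0.8, 0.9, 0.7, 0.6, 0.5, 0.6, 0.4]  # slow drift
alternating = [0.5, -0.5, 0.4, -0.4, 0.6, -0.6, 0.5, -0.5]        # sign flips
print(durbin_watson(positively_correlated))  # well below 2
print(durbin_watson(alternating))            # well above 2
```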
Omitted Variable
Important explanatory variable has been left out
Multicollinearity
Intercorrelation of Explanatory variables that can lead to expansion of the standard errors of the coefficient estimates, thereby leading one to say that a variable does not matter when it actually does (Type II error)
Classical Assumptions
Linear, Zero Population Mean, Explanatory Variables Uncorrelated with Error, Error Term is Uncorrelated with Itself, Error has Constant Variance, No Perfect Multicollinearity
Multivariate Regression Coefficient
Measures change in the dependent variable associated with a one unit increase in the independent variable, holding all other independent variables constant.
R Squared Meaning
Measures the percentage of the variation of Y around the mean of Y that is explained by the regression equation.
Simple Correlation Coefficient, r
Measures the strength and direction of a linear relationship between two variables
Dummy variable
Only takes on values 0 and 1
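A sketch of how a 5-category factor reduces to 4 dummy columns when the model has a constant term (the labels echo the deck's Student Home Residence example but are purely illustrative):

```python
# Dummy-coding a 5-category factor into 4 indicator columns. One category
# (the baseline) is omitted and absorbed by the constant term; labels are
# illustrative, following the deck's residence example.
categories = ["OKC", "Tulsa", "OtherOK", "Texas", "Other"]
baseline = categories[0]                     # omitted baseline category

def dummies(value):
    """0/1 indicators for every category except the baseline."""
    return [1 if value == c else 0 for c in categories[1:]]

print(dummies("Texas"))   # [0, 0, 1, 0]
print(dummies("OKC"))     # [0, 0, 0, 0]  -> all zeros = baseline
```

Including all 5 dummies alongside a constant would create perfect multicollinearity (the dummy-variable trap), which is why the deck's T/F card says the constant must be excluded if every category's dummy is kept.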
Econometrics standard tool
Ordinary Least Squares or OLS: Single-equation linear regression analysis
OLS or "Ordinary Least-Squares"
Regression technique which minimizes the sum of squared residuals
Type I Error
Rejecting a true null hypothesis. Also known as the alpha error, as determined by the level of significance
Unbiased Estimator
Sampling distribution has its expected value equal to the true value of B.
Normalized Beta Coefficient
Slope-term multiplied by the ratio of the standard deviation of the independent variable to the standard deviation of the dependent variable. Transformed slope-term then reads as the standard deviation change in Y per one standard deviation change in X.
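A check worth remembering: in the simple two-variable case this transformation turns the slope into the correlation coefficient r exactly. A sketch on made-up data:

```python
import math

# Standardized ("normalized") slope: b1 * (sd_X / sd_Y). In simple
# regression it equals the correlation coefficient r. Data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 5.0, 7.5, 9.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                           # OLS slope
beta_std = b1 * math.sqrt(sxx / syy)     # sd ratio; the 1/(n-1) factors cancel
r = sxy / math.sqrt(sxx * syy)           # simple correlation coefficient
print(beta_std, r)                       # identical in the two-variable case
```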
Standard Error of the Estimate
Square-root of the Mean Squared Error (MSE), a measure of the average deviation of error terms about the regression line.
Alternative Hypothesis
Statement in opposition to the null; the researcher seeks to determine whether the statistical evidence is sufficient to call the null hypothesis into question. The "research question" that the statistical evidence seeks to confirm
Econometrics
Statistical measurement of economic phenomena to determine independent influence of explanatory variables on a specified dependent variable
Regression Analysis
Statistical technique to "explain" movements in one variable as a function of movements in another
Total Sum of Squares, a.k.a. SST for Sum of Squares Total
Sum of squared variations of Y around its mean
Testing for the existence of statistically significant correlation between two variables is equivalent to
B) testing that the slope (β1) differs from zero in a two variable regression.
Adjusted R Squared Formula
1 - [(1 - R^2)(n - 1)/(n - k - 1)], equivalently 1 - MSE/MST; the R Squared adjusted for the degrees of freedom lost, since adding an independent variable will likely increase the unadjusted R Squared even if the fit isn't genuinely better
Adjusted R Squared Meaning
The R Squared that has been adjusted for Degrees of Freedom lost, since adding an independent variable to the original R Squared will likely increase it, even if the fit isn't necessarily better.
The strength of the linear relationship between two numerical variables may be measured by the
B) coefficient of correlation.
Mean Squared Error
Variance of the Regression, or SSE/(n-k-1) where SSE is "sum of squared errors," n is # of observations, k is # of independent variables.
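Putting this card together with the Standard Error of the Estimate card above, on made-up simple-regression data (so k = 1):

```python
import math

# MSE = SSE / (n - k - 1); the standard error of the estimate is sqrt(MSE).
# Simple regression (k = 1) on made-up data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.2, 5.9, 8.3, 9.7, 12.1]
n, k = len(x), 1
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

mse = sse / (n - k - 1)                 # variance of the regression
se_estimate = math.sqrt(mse)            # average deviation about the line
print(mse, se_estimate)
```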
What do we mean when we say that a simple linear regression model is "statistically" useful?
B) The model is a better predictor of Y than the sample mean.
Correlation of the resulting error terms with any of the explanatory variables
Zero; the regression results yield estimated errors (residuals) that are uncorrelated with each explanatory variable and with the predicted values of the dependent variable