ch 12 isds 265
What values can the coefficient of determination, R2, assume?
0 ≤ R2 ≤ 1.
In simple linear regression, a downward sloping trend line suggests which of the following?
A negative linear relationship between x and y.
In a simple linear regression, an upward sloping trend line suggests which of the following?
A positive linear relationship between x and y.
If the residuals are plotted across values of an explanatory variable, what is true with regard to identifying nonlinear patterns?
A trend in the plot of residuals indicates a nonlinear pattern.
Which of the following would indicate a violation of the assumption of constant variability?
An increase in the variability of the residuals over values of x.
In a plot of residuals versus the each explanatory value or the predicted value, what would indicate changing variability?
An increase or decrease in the spread of residuals.
The coefficient of determination can assume which of the following values?
Between zero and one.
Which of the following is a goodness-of-fit measure of linear regression models?
Coefficient of determination
Which of the following is another name for R2?
Coefficient of determination
Which of the following defines multicollinearity?
Correlation between the explanatory variables
Which one of the following is true about the sample multiple regression equation?
Each explanatory variable has its own coefficient.
The standard error of the estimate is calculated as
ErrorSumofSquares √ n−k−1
True or false: If the assumption of constant variability is violated, then the least squares estimators are biased.
False
True or false: Multiple linear regression is an extension of simple linear regression in that more than one response variable is used.
False
True or false: The adjusted R2 always goes up as more explanatory variables are added to the regression model.
False
Consider the multiple regression equation: SP = β0+β1(SF)+β2(AGE)+ε, where SP = selling price of the house, SF= square footage of the house and AGE = age of the house. In a hypothesis test to determine whether AGE has a significant linear influence on the selling price of the house, what is the alternative hypothesis?
HA: β2 ≠ 0
In an attempt to predict y, three models are estimated. The coefficient of determination for Model 1 equals 0.10, for Model 2 equals 0.64, and for Model 3 equals 0.87. According to the coefficient of determination, which model provides the best fit?
Model 3
Which of the following are true about the ANOVA table? Select all that apply.
THERE ARE TWO OF THESE!!! The ANOVA table summarizes the error explained and unexplained by the regression model. The F statistic is found by MSR / MSE. The ANOVA table facilitates the calculation of the test statistic.
Which of the following are true about the ANOVA table? Select all that apply.
THERE ARE TWO OF THESE!!!Total df = Regression df + Residual df. Total df is n - 1.
In regression analysis, ANOVA is used for which of the following?
Test of joint significance.
The residual e represents
The difference between an observed and predicted value of the response variable at a given value of the explanatory variable
What is the unfortunate consequence of heteroskedasticity?
The t and F tests are unreliable.
When is the multiple regression model used?
When the researcher believes that two or more explanatory variables influence the response variable.
In a regression model, the Multiple R is the
correlation between the response variable and its predicted value.
In regression analysis, the response variable is also called the
dependent variable.
When the response variable is uniquely determined by the explanatory variable, the relationship is _____
deterministic.
A possible solution for multicollinearity is
dropping a redundant explanatory variable from the model.
Regression analysis is used to
examine the relationship between two or more variables.
Which of the following is degrees of freedom for the test statistic for the test of individual significance?
n-k-1
The R2 of a multiple regression of y as a function of x measures the
percentage variability of y that is explained by the variability of x.
The standard error of the estimate is the standard deviation of the
residuals.
The method of least squares minimizes the
sum of squared errors.
Which of the following is the test statistic for the test of individual significance?
tdf = bj−βj / sbj
All of the following are goodness-of-fit measures for linear regression models EXCEPT
the coefficient of variation.
In regression analysis, a large F statistic indicates that
the explanatory variables are explaining a large portion of variation in the response variable.
Consider the simple linear regression model: y = β0 + β1x + ε. What is the notation for the random error term?
ε
In a simple linear regression model, if all of the data points fall on the sample regression line, then the standard error of the estimate is
0
In a regression, if the Multiple R equals 0.80, then R2 equals
0.64.
Which of the following correlation coefficients between two explanatory variables would be a concern that multicollinearity exists?
0.92 -0.83
In a simple linear regression model, if all of the data points fall on the sample regression line, then the coefficient of determination is
1
Adjusted R2 is calculated as
1 - (1 - R^2)(n−1 / n−k−1)
If the sample regression equation is found to be yˆŷ = 10 + 2x1 - 3x2, which of the following is true?
2) THERE ARE TWO OF THESE!!! Mathematically, the predicted value of the response variable equals 10, when both x1 and x2 equal 0..
Which of the following is NOT true of the standard error of the estimate?
It can take on negative values.
When a plot of the residuals against time shows a wavelike pattern, which of the assumptions is likely violated?
No serial correlation
How many explanatory variables does a simple linear regression model have?
One
Which of the following are consequences of serial correlation? Select all that apply.
Regression coefficients are biased. Regression model looks better than it is. R2 is overestimated.
Implementing a correction for the standard errors is a remedy to which common violation in regression?
THERE ARE TWO OF THESE!!! Changing variability
Implementing a correction for the standard errors is a remedy to which common violation in regression?
THERE ARE TWO OF THESE!!! Correlated observations
What is a residual in regression analysis?
The difference between observed and predicted y values.
For which of the following situations is a simple linear regression model appropriate?
The response variable, y, is influenced by one explanatory variable.
Heteroskedasticity is a violation of which of the assumptions underlying regression analysis?
The variance of the error term ε is the same for all observations.
Suppose you estimate the model y = β0+β1x+ε. If it is determined that β1≠ 0, then it can be concluded that
There is a linear relationship between x and y.
A sign that serial correlation might be present would be
a wavelike pattern of the residuals.
Consider the multiple regression equation: y = β0+β1x1+β2x2+ε, If a joint test of significance leads to rejection of the null hypothesis, then
at least one explanatory variable is significant.
Which of the following is true regarding the calculation of the sample regression equation?
b1 must be calculated before b0.
Heteroskedasticity can be corrected by
correcting standard errors of the estimated coefficients.
The presence of serial correlation can be remedied by
correcting the standard errors of regression coefficient estimates.
One way to detect multicollinearity is to
examine the correlations between the explanatory variables.
Choose the option that best completes the following: Since the standard error of the estimate
has no predefined upper limit, it is hard to interpret in isolation.
In regression analysis, the explanatory variable is also called the
independent variable.
In simple linear regression, the p-value of the F test identical to that of the t test because there
is only one slope coefficient being tested.
When two regression models applied on the same data set have the same response variable but a different number of explanatory variables, the model that would provide the better fit is the one with the
lower se and higher adjusted R2.
If the correlation between the response variable and the explanatory variables is sufficiently low, then adjusted R2
may be negative.
Serial correlation is typically observed in ______.
time series data
The purpose of the test of joint statistical significance is to
to determine if the group of explanatory variables have a statistical influence on the response variable.
SST represents
total variation in Y
Consider the simple linear regression model: y = β0 + β1x + ε. What is the notation for the response variable?
y
The test statistic for a test of joint significance is assumed to follow the Fdf1,df2 distribution and its value is calculated as
MSR / MSE
The coefficient of determination is calculated as
RegressionSumofSquaresTotalSumofSquaresRegressionSumofSquaresTotalSumofSquares.
All else being equal, if three competing models have adjusted R2 values equal to 0.45, 0.72 and 0.86, respectively, which model should be selected?
The model with adjusted R2 = 0.86.
Why is the stochastic model used in regression analysis in place of the deterministic model?
The relationship between the response and the explanatory variables is inexact.
A company found that a strong positive linear relationship exists between Y, its sales, and X, advertising expense. Predicting Y for values of X outside the range of the sample data is risky for which of the following reasons?
The relationship may not be linear for values of X outside the range of the sample data.
Predicting y for values of x outside the range of the sample data is risky for which of the following reasons?
The relationship may not be linear for values of x outside the range of the sample data.
Which of the following is true about residual plots? Select all that apply.
They help detect outliers. They often plot residuals on the y-axis and the explanatory variables or predicted values along the x-axis. They provide an informal way to examine assumptions of regression analysis.
True or false: Regression analysis can be used to build a model that uses information on an explanatory variable to predict changes in a response variable.
True
True or false: With simple linear regression, the test of joint significance is the same as the test of individual significance.
True
In a multiple regression model, which of the following do we employ in a test of joint significance of the slope coefficients?
Upper-tailed F test
In regression, multicollinearity is considered problematic when two or more explanatory variables are ______.
highly correlated
In the presence of multicollinearity, the ordinary least squares estimators of the intercept and slopes are ______.
imprecise and difficult to interpret.
In a regression model, the residual e is calculated as
y - yˆŷ
Which of the following is the correct form of the sample multiple regression equation?
ŷ = b0 + b1x1 + b2x2 + ... + bkxk
Which of the following can be used to test for a statistically significant relationship between the explanatory variables and the response variable? Select all that apply.
A test of regression coefficients jointly. Tests of individual regression coefficients.
If the sample regression equation is found to be yˆŷ = 10 + 2x1 - 3x2, which of the following is true?
1) THERE ARE TWO OF THESE!!! For every unit change in x1, the predicted value of the response variable goes up by 2, assuming x2 is held constant.
If the residuals are plotted across values of an explanatory variable resembles a 'U,' what might this indicate?
A violation of the linearity assumption.
Which of the following is a possible remedy for multicollinearity?
Collect more data
True or false: The only necessary test with multiple regression models is for joint significance of the regression coefficients.
False
Consider the multiple regression equation: SP = β0+β1(SF)+β2(AGE)+ε, where SP = selling price of the house, SF= square footage of the house and AGE = age of the house. To test whether SF and AGE have a joint influence on the selling price of the house, which null hypothesis is correct?
H0: β1= β2 = 0
Consider the multiple regression equation: SP = β0+β1(SF)+β2(AGE)+ε, where SP = selling price of the house, SF= square footage of the house and AGE = age of the house. To test whether SF and AGE have a joint influence on the selling price of the house, which alternative hypothesis is correct?
HA: At least 1 βj ≠ 0
Which one of these is a disadvantage of R2 as a goodness-of-fit measure?
It can be inflated by adding explanatory variables with no predictive value.
In the sample regression equation, yˆŷ = b0 + b1x, how is b1 interpreted?
It is the change in the response variable for every unit change in the explanatory variable.
In the sample regression equation, yˆŷ = b0 + b1x, how is b0 interpreted?
It is the predicted value of the response variable when the explanatory variable is 0.
Which of the following are accurate regarding the standard error of the estimate? Select all that apply.
It is the standard deviation of the residuals. It measures the spread of the points around the sample regression line.
Suppose that the slope parameter in a simple linear regression model is β1 = -5.12. What does this indicate about the nature of the relationship between x and y?
Negative linear relationship
Which of the following approaches is used to calculate the estimated intercept and slope? Select all that apply.
Ordinary Least Squares Method of Least Squares
Which of the following is NOT one of the assumptions of regression analysis?
Perfect multicollinearity exists among the explanatory variables.
Suppose that the slope parameter in a simple linear regression model is β1 = 3.52. What does this incidate about the nature of the relationship between x and y?
Positive linear relationship
Which of the following are assumptions of regression analysis? Select all that apply.
The error term ε is normally distributed. The variance of the error term ε changes across observations. The expected value of the error term ε is zero.
In the sample regression equation, yˆŷ = b0 + b1x, what does b0 represent?
The estimated intercept.
In the sample regression equation, yˆŷ = b0 + b1x, what does b1 represent?
The estimated slope.
If three competing models have standard errors of the estimate, se, equal to 0.45, 1.72 and 2.86, respectively, which model should be selected?
The model with se = 0.45.
In evaluating a regression model, why is a scatterplot a useful tool?
The scatterplot can be used to assess the linearity of the relationship.
Consider the multiple regression equation: With multiple regression analysis, multiple individual tests at a given α are not equivalent to a joint test for which of the following reasons?
The α for the joint test will be smaller than the α for each of the individual tests.
Unlike R2, adjusted R2 explicitly accounts for
the sample size and the number of explanatory variables.
SSR represents
the variation in the response variable explained by the model.
Consider the simple linear regression model y = β0+β1x+ε. What is the implication of β1 = 0?
x does not have a significant linear influence on y.
Consider the simple linear regression model: y = β0 + β1x + ε. What is the notation for the population intercept?
β0
Consider the following simple linear regression model: y = β0+ β1x + ε. When determining whether there is a negative linear relationship between x and y, the alternative hypothesis takes the form
β1 < 0.
Consider the following simple linear regression model: y = β0+β1x + ε. When determining whether there is a positive linear relationship between x and y, the alternative hypothesis takes the form
β1 > 0.