Statistics Ch. 12, 13
In a regression study, b2=3.67 and sb2=0.92. Calculate the value of the test statistic for testing H0: B2=0 versus HA: B2 does not =0.
+3.99, since (3.67-0)/0.92 = 3.99
If the sample regression equation is found to be y=10-2x1-3x2, what is the predicted value of y when x1=4 and x2=1?
-1
In a regression study, b4=-8.67 and sb4=2.92. Calculate the value of the test statistic for testing H0: B4=0 versus HA: B4 does not =0.
-2.97, since (-8.67-0)/2.92 = -2.97
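The test-statistic cards in this set (for b2 and b4) use the same calculation: the estimate minus the hypothesized value, divided by the standard error. A minimal Python sketch follows; the function name `t_statistic` is illustrative and does not come from the textbook.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """Return t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

t_b2 = t_statistic(3.67, 0.92)    # card with b2 and sb2
t_b4 = t_statistic(-8.67, 2.92)   # card with b4 and sb4
print(round(t_b2, 2), round(t_b4, 2))  # 3.99 -2.97
```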
In a simple linear regression model, if all of the data points fall on the sample regression line, then the standard error of the estimate is
0
What values can the coefficient of determination, R^2, assume?
0 ≤ R^2 ≤ 1
At the 5% significance level, which of the following p-values from a regression analysis will lead to the rejection of H0: B1=0?
0.0341
The summary information in regression output gives the value of the t_df test statistic and its associated p-value for a two-tailed test as -0.762 and 0.452, respectively. Which of the following will be the p-value if a one-tailed test is specified?
0.226, since 0.452/2 = 0.226
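Halving the two-tailed p-value is valid only when the sign of the test statistic agrees with the direction of the alternative hypothesis; otherwise the one-tailed p-value is 1 minus that half. The sketch below makes this explicit. The function name and the `alternative` labels are illustrative, and the card above implicitly assumes a left-tailed alternative, since the statistic is negative.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a two-tailed p-value into a one-tailed p-value.

    alternative is "greater" (HA: B > 0) or "less" (HA: B < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # The halved p-value applies when the statistic points in HA's direction.
    sign_matches = (t_stat > 0) == (alternative == "greater")
    half = two_tailed_p / 2
    return half if sign_matches else 1 - half

print(round(one_tailed_p(0.452, -0.762, "less"), 3))     # 0.226
print(round(one_tailed_p(0.452, -0.762, "greater"), 3))  # 0.774
```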
If the sample regression equation is found to be y*= 10+2x1-3x2, what is the predicted value of y when x1=4 and x2=1?
15
If the sample regression equation is found to be Y*=30+5x, what is the estimate for the slope coefficient B1?
5 The estimated slope is the coefficient of x, which is equal to 5.
If the sample regression equation is found to be y=20+10x, what is the predicted value of y when x=3?
50
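The prediction cards in this set all plug the given x values into the sample regression equation. A small sketch of that arithmetic, with an illustrative helper named `predict`:

```python
def predict(intercept, slopes, xs):
    """Predicted y = b0 + b1*x1 + ... + bk*xk."""
    return intercept + sum(b * x for b, x in zip(slopes, xs))

print(predict(10, [-2, -3], [4, 1]))  # -1
print(predict(10, [2, -3], [4, 1]))   # 15
print(predict(20, [10], [3]))         # 50
```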
Which of the following is a possible effect of multicollinearity in regression analysis?
A large value of R^2 and individually insignificant explanatory variables.
In simple linear regression, a downward sloping trend line suggests which of the following?
A negative linear relationship between x and y.
Which of the following is a good way to summarize the results of a regression analysis?
A user-friendly table showing relevant statistics
What is the multiple regression model for the response variable of annual salary (AS) and explanatory variables of years of education (ED) and years of experience (EXP)?
AS=B0+B1ED+B2EXP+E
The coefficient of determination can assume which of the following values?
Between zero and one
Which of the following is a goodness-of-fit measure?
Coefficient of determination
Which of the following is a possible remedy for multicollinearity?
Collect more data
From a study concerning the linear relationship between x1 and y, the test H0: B1=0 versus HA: B1 does not =0, had a p-value =0.1588. What is the decision at 5% significance level?
Do not reject H0; there is insufficient evidence of a linear relationship between x1 and y.
In regression analysis, how are statistical tests of significance conducted on dummy variables?
Dummy variables are treated just like other explanatory variables.
Consider the simple linear regression model: y=B0+B1x+E. What is the notation for the random error term?
E
Implementing the instrumental variable technique is a remedy to which common violation in regression?
Endogeneity
True or false: When there are two dummy variables in a model, tests of significance cannot be performed.
False
Consider the following sample regression equation: y=17+5x1+3x2. Interpret the value 5.
For a unit increase in x1, the average value of y will increase by 5 units, holding x2 constant.
Consider the multiple regression equation: SP=B0+B1(SF)+B2(AGE)+E, where SP = selling price of the house, SF = square footage of the house and AGE = age of the house. To test whether SF and AGE have a joint influence on the selling price of the house, which null hypothesis is correct?
H0: B1=B2=0
When a plot of the residual against time shows a wavelike pattern, which of the assumptions is likely violated?
Independence of the error term
Which of the following is NOT true of the standard error of the estimate?
It can take on negative values
In a simple linear regression study, the R^2 for Model 1 is 0.9425 and the R^2 for Model 2 is 0.7248. Which model should be used in predicting y?
Model 1
Dropping a redundant explanatory variable from the model is a remedy to which common violation in regression?
Multicollinearity
Which of the following explanatory variables will most likely display some multicollinearity in regression analysis?
Number of absences from a class and the midterm average in estimating final course average.
Quantitative variables assume what type of values?
Numerical
How many explanatory variables does a simple linear regression model have?
One
Suppose that the slope parameter in a simple linear regression model is B1=3.52. What does this possibly suggest about the nature of the relationship between x and y?
Positive linear relationship
Which of the following is represented by the intercept b0 in the estimated regression equation for a multiple-category qualitative variable?
Reference category
From a study concerning the linear relationship between x1 and y, the test H0: B1=0 versus HA: B1 does not =0, had a p-value =0.0002. What is the decision at a 1% significance level?
Reject H0; there is a linear relationship between x1 and y, since the p-value = 0.0002 < 0.01.
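The p-value cards in this set all apply one rule: reject H0 when the p-value is less than the significance level. A minimal sketch, with an illustrative function name:

```python
def decision(p_value, alpha):
    """Apply the p-value rule: reject H0 when p_value < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

print(decision(0.0341, 0.05))  # reject H0
print(decision(0.1588, 0.05))  # do not reject H0
print(decision(0.0002, 0.01))  # reject H0
```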
Multiple individual tests at a given α are not equivalent to a joint test for which of the following reasons?
The α for the joint test will be larger than the α for each of the individual tests.
Which of the following is an assumption of regression analysis?
The error term E is normally distributed.
If the sample regression equation is y*=15+5x, which of the following is the correct interpretation of 15?
The line crosses the y axis at y=15
If three competing models have adjusted R^2 values equal to 0.45, 0.72, and 0.86, respectively, which model should be selected?
The model with adjusted R^2 = 0.86
Why is the stochastic model used in regression analysis in place of the deterministic model?
The relationship between the response and the explanatory variables is inexact.
A medical journal concluded that a positive linear relationship exists between Y=IQ and X=brain size. Predicting Y for values of X outside the range of the sample data is risky for which of the following reasons?
The relationship may not be linear for values of X outside the range of the sample data.
For which of the following situations is a simple linear regression model appropriate?
The response variable y is influenced by one explanatory variable.
If in a study the 95% confidence interval for the average selling price of a house (in $1000s) is found to be [200, 230], then which of the following is the correct interpretation?
We are 95% confident that the average selling price of a house lies between $200,000 and $230,000.
When is the multiple regression model used?
When the researcher believes that two or more explanatory variables influence the response variable.
The estimated linear regression equation is y=-15+3x. Interpret the intercept.
When x=0, the predicted value of y is -15
When the need is to estimate the expected value of y for a given value of x, the interval estimate to be constructed is called
a confidence interval
Suppose you estimate the model y=B0+B1x1+B2x2+E. Which of the following is being tested if the null hypothesis states that B2 is less than or equal to zero and the alternative hypothesis states that B2 is greater than 0?
a positive linear relationship between x2 and y.
When the need is to estimate an individual value of y for a given value of x, the interval estimate to be constructed is called
a prediction interval
Consider the multiple regression equation: y=B0+B1x1+B2x2+E, If a joint test of significance leads to a rejection of the null hypothesis, then
at least one explanatory variable is significant.
In a regression analysis, if we reject the null hypothesis in a test of joint significance, then we conclude that
at least one of the explanatory variables influences y.
Consider the model y=B0+B1x+B2d+E, where x is a quantitative variable and d is a dummy variable. If d increases from 0 to 1, the change in the intercept is given by:
b2
If one or more of the explanatory variables are endogenous, then the ordinary least squares estimators are
biased
Changing variance of the error term in a regression model is typically observed in
cross-sectional data
Unlike R^2, adjusted R^2 can be used to compare regression models with
different numbers of explanatory variables
One way to detect multicollinearity is to
examine the correlation between the explanatory variables.
In regression analysis, the explanatory variable is also called the
independent variable
A qualitative variable with two categories is often referred to as a(n)
indicator variable
In simple linear regression, the p-value of the F test is identical to that of the t test because there
is only one slope coefficient being tested.
When two regression models applied on the same data set have the same response variable but a different number of explanatory variables, the model that would provide the better fit is the one with the
lower se and higher adjusted R^2.
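Adjusted R^2 penalizes extra explanatory variables through the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the sample size and k is the number of explanatory variables. The sketch below uses made-up numbers (n = 30 and two hypothetical models) to show a model with the higher R^2 ending up with the lower adjusted R^2.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical models fitted to the same 30 observations.
model_a = adjusted_r2(0.80, n=30, k=2)  # fewer explanatory variables
model_b = adjusted_r2(0.81, n=30, k=6)  # slightly higher R^2, more variables
print(round(model_a, 4), round(model_b, 4))
print(model_a > model_b)
```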
In regression analysis, MSE represents the
mean square error (the error sum of squares divided by its degrees of freedom, n-k-1)
The common approach to fitting a line to sample data in a scatterplot is to
minimize the value of the sum of the squared residuals.
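For simple linear regression, the line that minimizes the sum of squared residuals has a closed form: b1 = Sxy/Sxx and b0 = ybar - b1*xbar. The sketch below applies it to a small made-up data set and checks that nudging the slope only increases the sum of squared residuals. The helper names and the data are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared differences between observed and fitted y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # made-up sample
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
worse = sum_squared_residuals(xs, ys, b0, b1 + 0.1)
print(round(b0, 2), round(b1, 2), best < worse)
```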
The number of dummy variables representing a qualitative variable should be:
one less than the number of categories of the variable.
The R^2 of a multiple regression of y as a function of x measures the
percentage variability of y that is explained by the variability of x.
In regression analysis, the _____ for y will be wider than the _____ for y because it also incorporates the error term E0.
prediction interval, confidence interval
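The difference between the two intervals can be written out using the standard textbook forms. Here $\hat{y}^0$ is the predicted value at the given x values, $se(\hat{y}^0)$ is its standard error, $s_e$ is the standard error of the estimate, and $df = n-k-1$; this notation is assumed for illustration and is not taken from these cards.

```latex
\text{Confidence interval for } E(y^0): \quad
  \hat{y}^0 \pm t_{\alpha/2,\,df}\; se(\hat{y}^0)

\text{Prediction interval for } y^0: \quad
  \hat{y}^0 \pm t_{\alpha/2,\,df}\; \sqrt{\big(se(\hat{y}^0)\big)^2 + s_e^2}
```

The extra $s_e^2$ term under the square root accounts for the error term, which is why the prediction interval is always the wider of the two.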
In regression analysis, a(n) _____ is used to detect common violations of the assumptions.
residual plot
In regression analysis, quantitative variables can be used as
response and explanatory variables
Which of the following measures are used to judge the goodness-of-fit of a regression model?
se, R^2, and adjusted R^2
Which of the following best completes the following statement? In hypothesis testing, if sample data lead to the rejection of the null hypothesis, then we say that the test is
statistically significant
A dummy variable d is defined as a variable that
takes on values of 0 or 1
The 90% confidence interval for the expected value of the response variable y is [4.67, 7.24]. This means
that with 90% confidence, E(y) lies within the interval.
The sample variance of the residual, se^2, is defined as
the average of the squared differences between yi and ŷi.
How can we avoid the dummy variable trap when using a multiple-category qualitative variable?
the number of dummy variables should be one less than the number of categories.
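One way to picture this rule is to build the dummies by hand: choose a reference category and create one 0/1 variable for each remaining category. The sketch below is illustrative only; the function name, the `d_` prefix, and the commuting-mode data are made up.

```python
def make_dummies(values, reference):
    """Encode a qualitative variable with one dummy per non-reference category."""
    categories = sorted(set(values) - {reference})
    return [{f"d_{c}": int(v == c) for c in categories} for v in values]

modes = ["bus", "car", "train", "car"]           # three categories
rows = make_dummies(modes, reference="bus")      # two dummies, "bus" is the reference
for mode, row in zip(modes, rows):
    print(mode, row)
```

An observation in the reference category has every dummy equal to 0, so its effect is absorbed by the intercept.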
Which of the following is the notation used for the predicted value of the response variable in the simple linear regression model y=B0+B1x1+E?
ŷ (y-hat)
Consider the model y=B0+B1x+B2d+E, where x is a quantitative variable and d is a dummy variable. If a t test shows that the model coefficient of d is significant, then
y depends on the two categories of d.
Consider the model y=B0+B1x+B2d+E, where x is a quantitative variable and d is a dummy variable. For d=0, the predicted value of y is computed as:
ŷ=b0+b1x
If a plot of the residuals against one of the explanatory variables reveals a 'fanning out' across the horizontal axis, then it is likely that _____ is a problem in this application.
heteroskedasticity
All of the following are goodness-of-fit measures EXCEPT
the coefficient of variation. This is a relative measure of dispersion, not a goodness-of-fit measure.
The residual e represents
the difference between an observed and predicted value of the response variable at a given value of the explanatory variable.
Endogeneity occurs when
the error term is correlated with explanatory variables.
SSR represents
the explained variation in the response variable.
In the presence of correlated observations,
the value of R^2 is likely to overstate the model's usefulness.
Heteroskedasticity means that
the variance of the error term is not the same for all observations.
Correlated observations are typically observed in
time series data
Consider the following simple linear regression model: y=B0+B1x+E. When determining whether there is a positive linear relationship between x and y, the alternative hypothesis takes the form
B1>0
The standard error of the estimate can assume which of the following values?
Between zero and infinity
Implementing a correction for the standard errors is a remedy to which common violation in regression?
Correlated observations and changing variability
In a regression, if the Multiple R equals 0.80, then R^2 equals
0.64
If X=1 for age>18 years and 0 otherwise, what type of variable is X?
Dummy variable
The standard error of the estimate is calculated as
se = sqrt(SSE/(n-k-1)), the square root of the error sum of squares divided by n-k-1
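A short sketch of this calculation, where n is the number of observations and k is the number of explanatory variables. The function name and the data are made up for illustration. The second call shows that se is 0 when every observation lies exactly on the fitted line.

```python
import math

def standard_error_of_estimate(observed, predicted, k):
    """se = sqrt(SSE / (n - k - 1)), with SSE the error sum of squares."""
    n = len(observed)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (n - k - 1))

observed = [2, 4, 5, 4, 5]               # made-up sample
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]    # fitted values from y-hat = 2.2 + 0.6x
print(round(standard_error_of_estimate(observed, predicted, k=1), 4))
print(standard_error_of_estimate(observed, observed, k=1))  # 0.0 for a perfect fit
```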
In regression analysis, a qualitative variable can be used as
a response variable and/or explanatory variable.
In the presence of multicollinearity, the ordinary least squares estimators are
unbiased and efficient
In the presence of changing variability over a cross-section of data, the ordinary least squares estimators are
unbiased, but inefficient
Consider the simple linear regression model y=B0+B1x+E. If at the 5% significance level we reject H0: B1=0, then we conclude that
x has a significant linear influence on y.