exam 3 QMB
The following regression model has been proposed to predict sales at a gas station: , where x1= competitor's previous day's sales (in $1,000s), x2= population within five miles (in 1,000s), x3= 1 if any form of advertising was used, 0 if otherwise, and = sales (in $1,000s). Predict sales (in dollars) for a store with competitor's previous day's sale of $3,000, a population of 10,000 within five miles, and six radio advertisements.
$86,000
In multiple regression analysis, if the estimated regression equation is , find the estimated value, given the first and second independent variables as 10 and 20, respectively.
-12.6486
The range of the Durbin-Watson statistic is:
0 to 4.
The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term ɛ is a random variable with a mean or expected value of:
0.
When dealing with the problem of nonconstant variance, we can apply the reciprocal transformation, which means that we use:
1/y as the dependent variable instead of y.
Suppose, after calculating an estimated multiple regression equation, we find that the value of R2 is .9201. Interpret this value.
92.01% the variability in y can be explained by the estimated regression equation.
Which of the following variables is categorical?
Gender
When working with regression analysis, an outlier is:
any observation that does not fit the trend shown by the remaining data.
A multiple regression model has the form yhat=5+6x1-7x2. As x1 increases by 1 unit (holding x2 constant), the dependent variable is expected to:
increase by 6 units.
In general, R2 always _____ as independent variables are added to the regression model.
increases
In a regression analysis, if SSE = 200 and SSR = 300, then the coefficient of determination is:
.6
Multicollinearity can cause problems if the absolute value of the sample correlation coefficient exceeds:
.7 for any two of the independent variables.
The estimated regression equation, Yhat=-10.42+.79x , can be used to predict a company's sales volume (y), in millions, based upon its advertising expenditure (x), in $10,000s. What is the company's predicted sales volume if they spend $500,000 on advertising?
Approximately $395,000 Feedback: Incorrect. The tricky thing about this problem is to know what to substitute into the regression equation for x. $500,000 is (50)(10,000), so x = 50. Substituting x = 50 into the regression equation gives ŷ = 29.08, which is in millions of dollars. See Section 14.2, Least
Which of the following is not an iterative variable selection procedure?
Best-subsets regression
Which of the following options guarantees that the best model for a given number of variables will be found?
Best-subsets regression
What test can be used to determine whether first-order autocorrelation is present?
Durbin-Watson test
Using the two-factorial design,E=(y)=B0+B1X1+B2X2+B3X3+B4X4+B5X1X2+B6X1X3+B7X1X4 , write the form of the multiple regression equation that would give the expected value for level 1 of factor A and level 3 of factor B.
E(Y)=B0+B3
Write a multiple regression equation that can be used to analyze the data for a two-factorial design with two levels for factor A and four levels for factor B.
E(y)=B0+B1X1+B2X2+B3X3+B4X4+B5X1X2+B6X1X3+B7X1X4
A researcher is trying to decide whether or not to add another variable to his model. He currently has a first-order model with two predictor variables based upon a sample of 28 observations. For this model, SSE = 1425. Then, he estimated the data with a first-order model with an additional predictor variable x3. The SSE for the new model is 1300. We would like to know if the addition of the third predictor results in a significant reduction in the error sum of squares. Calculate the test statistic.
F = 2.31
A regression model involving 8 independent variables for a sample of 69 periods resulted in the following sum of squares: SSE = 306, SST = 1800. At α = .05, test to determine whether or not the model is significant. State the F value and your conclusion.
F = 36.62; p-value < .05. The model is significant.
A researcher is trying to decide whether or not to add another variable to his model. He currently has a first-order model with two predictor variables based upon a sample of 28 observations. For this model, SSE = 1425. Then, he estimated the data with a first-order model with an additional predictor variable x 3. The SSE for the new model is 1300. State the null and alternative hypotheses.
H0:B3=0, HA:B3 NOT= 0
Which of the following statements about the backward elimination procedure is false?
It begins with zero independent variables.
Suppose a multiple coefficient of determination coming from a regression analysis with 50 observations and 3 independent variables is .8455. Calculate the adjusted multiple coefficient of determination.
R-Sq(adj) = 83.54%
In best-subsets regression, Minitab can be used to provide output that identifies the two best one-variable estimated regression equations, the two best two-variable estimated regression equations, the two best three-variable estimated regression equations, and so on. What criterion is used in determining which estimated regression equations are best?
R2
Which of the following statements is false?
Regression analysis can be interpreted as a procedure for establishing a cause-and-effect relationship between variables.
If PV is < 5
The data provide evidence of a significant relationship between the number of people who use the public pool and the outside temperature.
When carrying out an F test to determine if the addition of extra predictor variables results in a significant reduction in the error sum of squares, what are the degrees of freedom of the numerator and denominator of the F statistic?
The numerator degrees of freedom equals the number of predictors added to the model. The denominator degrees of freedom is n—p—1.
A researcher is trying to decide whether or not to add another variable to his model. He currently has a first-order model with two predictor variables based upon a sample of 28 observations. For this model, SSE = 1425. Then, he estimated the data with a first-order model with an additional predictor variable x3. The SSE for the new model is 1300. We would like to know if the addition of the third predictor results in a significant reduction in the error sum of squares. What are the degrees of freedom of the numerator and denominator?
The numerator has 1 degree of freedom, and the denominator has 24 degrees of freedom.
A researcher is trying to decide whether or not to add another variable to his model. He currently has a first-order model with two predictor variables based upon a sample of 28 observations. For this model, SSE = 1425. Then, he estimated the data with a first-order model with an additional predictor variable x3. The SSE for the new model is 1300. We would like to know if the addition of the third predictor results in a significant reduction in the error sum of squares. What is the p-value? State your conclusion. Use = .05.
The p-value is greater than .10. We do not have enough evidence to conclude that the inclusion of x 3 results in a significant reduction in the error sum of squares.
When autocorrelation is present, which of the following assumptions is violated?
The variance is constant.
In order to use the output from a multiple regression model to perform the ANOVA test on the difference among the means of four populations, how many dummy variables do we need to use to indicate treatments?
Three
What value of Durbin-Watson statistic indicates that no autocorrelation is present?
Two
The mathematical equation that explains how the dependent variable y is related to several independent variables and has the form Y=B0+B1X1+BpXp+e is called:
a multiple regression model.
With negative autocorrelation, we expect a positive residual in one period to be followed by:
a negative residual in the next period, then a positive residual, and so on.
When studying the relationship between two quantitative variables, an interval estimate of the mean value of y for a given value of x is called a(n):
a. confidence interval.
The multiple regression equation based on the sample data, which has the form of yhat=b0+b1x1+b2x2...+bpxp , is called:
an estimated multiple regression equation.
Suppose a residual plot of x verses the residuals, y - ŷ, shows a nonconstant variance. In particular, as the values of x increase, suppose that the values of the residuals also increase. This means that:
as the values of x get larger, the ability to predict y becomes less accurate.
Correlation in the errors that arises when the error terms at successive points in time are related is called:
autocorrelation.
The variable selection procedure that identifies the best regression equation, given a specified number of independent variables, is:
best-subsets regression.
If the coefficient of determination is a positive value, then the coefficient of correlation:
can be either negative or positive.
The value of the coefficient of correlation (r):
can be equal to the value of the coefficient of determination (r^2).
In multiple regression analysis, the general linear model:
can be used to accommodate curvilinear relationships between the independent variables and the dependent variable.
The coefficient of determination:
cannot be negative.
When we use the estimated regression equation to develop an interval that can be used to predict the mean for ALL units that meet a particular set of given criteria, that interval is called a(n):
confidence interval.
Influential observations always:
decrease/increase the value of the slope. decrease/increase the value of the correlation. decrease/decrease the value of the y-intercept.
In regression analysis, the variable that is being predicted is the:
dependent variable.
A variable used to model the effect of categorical independent variables is called a(n):
dummy variable.
The term in the multiple regression model that accounts for the variability in y that cannot be explained by the linear effect of the p independent variables is the:
error term, e.
The model developed from sample data that has the form yhat=b0+b1x is known as the:
estimated regression equation.
If a significant relationship exists between x and y and the coefficient of determination shows that the fit is good, the estimated regression equation should be useful for:
estimation and prediction.
If the value of y in time period t is related to its value in time period t - 1, we say that:
first-order autocorrelation is present.
A model in the form of y=b0+b1z1+b2z2...+e , where each independent variable zj (for j = 1, 2,..., p) is a function of x1, x2,..., xk, is known as the:
general linear model.
Looking at the sample correlation coefficients between the response variable and each of the independent variables can give us a quick indication of which independent variables are, by themselves,
good predictors.
A researcher would like to know whether or not the addition of three variables to a model will result in a significant reduction in the error sum of squares. She currently has a first-order model with two predictor variables based upon a sample of 25 observations. For this model, SSE = 725. Then, she estimated the relationship with a first-order model with three additional predictor variables x3, x4, and x5. The SSE for the new model is 320.
h0: B3=B4=B5=0, HA: One or more of the parameters is not = to 0
In a multiple regression model, the variance of the error term, ε, is assumed to be:
he same for all values of x1, x2,..., xp.
Observations with extreme values for the independent variables are called:
high leverage points.
A regression model between sales ( in $1,000) and unit price (x1 in dollars) and television advertisement (x2 in dollars) resulted in the following function: . The coefficient of the unit price indicates that if the unit price is:
increased by $1 (holding advertisement constant), the sales are expected to decrease by $3,000.
The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the values of ɛ are:
independent.
When we conduct significance tests for a multiple regression relationship, the t test can be conducted for each of the independent variables in the model. Each of those tests are called tests for:
individual significance.
An observation that has a strong influence or effect on the regression results is called a(n):
influential observation.
The effect of two independent variables acting together is called:
interaction.
If a categorical variable has k levels, then:
k - 1 dummy variables are needed.
Larger values of r2 imply that the observations are more closely grouped about the:
least squares line.
The method used to develop the estimated regression equation that minimizes the sum of squared residuals is called the:
least squares method.
The tests of significance in regression analysis are based on several assumptions about the error term ɛ. Additionally, we make an assumption about the form of the relationship between x and y. We assume that the relationship between x and y is:
linear.
The term used to describe the case when the independent variables in a multiple regression model are correlated is:
multicollinearity.
The proportion of the variability in the dependent variable that can be explained by the estimated multiple regression equation is called the:
multiple coefficient of determination.
The study of how a dependent variable y is related to two or more independent variables is called:
multiple regression analysis.
When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, the appropriate degrees of freedom are:
n - 2.
A t test for slope is based on a t distribution with _____ DF
n-2
The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term follows ɛ a(n) _____ distribution for all values of x.
normal
A graph of the standardized residuals plotted against values of the normal scores that helps to determine whether the assumption that the error term has a normal probability distribution appears to be valid is called a:
normal probability plot.
In a multiple regression model, the values of the error term, ε, are assumed to be:
normally distributed.
The parameters of nonlinear models have exponents:
other than one.
All things held constant, which interval will be wider: a confidence interval or a prediction interval?
prediction interval
When studying the relationship between two quantitative variables, whenever we want to predict an individual value of y for a new observation corresponding to a given value of x, we should use a(n):
prediction interval.
When we use the estimated regression equation to develop an interval that can be used to predict the mean for a specific unit that meets a particular set of given criteria, that interval is called a(n):
prediction interval.
The mathematical equation relating the independent variable to the expected value of the dependent variable, E(y)=B0+B1x , is known as the:
regression equation.
In regression analysis, the equation in the form y = 𝛽0 + 𝛽1x + ε is called the:
regression model.
The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is called a(n):
residual.
The regression model Y=B0+B1X1+B2X1^2+E is a:
second-order model with one predictor variable.
The regression model Y=B0+B1X1+B2X2+B3X1^2+B3X1^2+B5X1X2+E is a:
second-order model with two predictor variables.
An F test, based on the F probability distribution, can be used to test for:
significance in regression.
The Durbin-Watson test is generally inconclusive for:
smaller sample sizes.
When determining the best estimated regression equation to model a set of data, the procedure that allows an independent variable to enter the model at one step, be removed at a subsequent step, and then enter the model at a later step is:
stepwise regression.
When determining the best estimated regression equation to model a set of data, the procedure that begins each step by determining whether any of the variables already in the model should be removed is called:
stepwise regression.
When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, what distribution do confidence and prediction intervals follow?
t distribution
If a residual plot of x versus the residuals, y - ŷ, shows a non-linear pattern, then we should conclude that:
the regression model is not an adequate representation of the relationship between the variables.
The tests of significance in regression analysis are based on assumptions about the error term ɛ . One such assumption is that the variance of ɛ, denoted by 𝝈2, is:
the same for all values of x.
In a regression analysis, an outlier will always increase:
the value of the correlation.
In multiple regression analysis:
there can be several independent variables, but only one dependent variable.
Suppose a high correlation existed between variables x 1 and x 2 . If variable x 1 was used as an independent variable, then variable x 2 :
would not add much more explanatory power to the current model.
In a regression analysis, the error term ε is a random variable with a mean or expected value of
zero.