ECO MW Final
The following regression model has been proposed to predict sales at a gas station: [equation not shown], where x1 = competitor's previous day's sales (in $1,000s), x2 = population within 5 miles (in 1,000s), x3 = 1 if any form of advertising was used, 0 otherwise, and ŷ = sales (in $1,000s). Predict sales (in dollars) for a store with competitor's previous day's sales of $5,000, a population of 15,000 within 5 miles, and five radio advertisements. $113,000 $104,000 $98,000 $158,000
$113,000 (x1 = 5, x2 = 15, and x3 = 1, because x3 only records whether any advertising was used, not how many ads ran)
The following estimated regression model was developed to predict yearly income (in $1,000s) of 30 individuals from their age (x1) and their gender (x2) (0 if male and 1 if female) for a sample of 50 engineers: ŷ = -10 + 4x1 + 7x2. What is the estimated income of a 30-year-old female? $110,000 $127,000 $117,000 $137,000
$117,000
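A minimal Python sketch of this prediction, using the estimated equation from the question. The function name `predict_income` is illustrative, not from the source; the point is that the gender dummy simply adds 7 (thousand dollars) when x2 = 1.

```python
def predict_income(age: float, female: bool) -> float:
    """y-hat = -10 + 4*x1 + 7*x2, in $1,000s (x1 = age, x2 = 1 if female, 0 if male)."""
    x2 = 1 if female else 0  # dummy variable for gender
    return -10 + 4 * age + 7 * x2

income_thousands = predict_income(30, female=True)
print(f"${income_thousands * 1000:,.0f}")  # $117,000
```

A 30-year-old male would get -10 + 4(30) = 110, so the $110,000 option is the male prediction, not the female one.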
The following regression model has been proposed to predict sales at a gas station: [equation not shown], where x1 = competitor's previous day's sales (in $1,000s), x2 = population within five miles (in 1,000s), x3 = 1 if any form of advertising was used, 0 otherwise, and ŷ = sales (in $1,000s). Predict sales (in dollars) for a store with competitor's previous day's sales of $3,000, a population of 10,000 within five miles, and six radio advertisements. $78,000 $86,000 $82,000 $176,000
$86,000 (x1 = 3, x2 = 10, and x3 = 1, since advertising was used)
In multiple regression analysis, if the estimated regression equation is ŷ = .7904 + .2345x1 - .7892x2, find the estimated value, given the first and second independent variables as 10 and 20, respectively. -12.6486 14.2294 -2.4116 3.9924
-12.6486 (ŷ = .7904 + .2345(10) - .7892(20) = .7904 + 2.345 - 15.784 = -12.6486)
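The same substitution as a short Python check; the variable names are illustrative only.

```python
# Coefficients of the estimated equation y-hat = b0 + b1*x1 + b2*x2
b0, b1, b2 = 0.7904, 0.2345, -0.7892
x1, x2 = 10, 20

y_hat = b0 + b1 * x1 + b2 * x2
print(round(y_hat, 4))  # -12.6486
```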
In multiple regression analysis, any observation with a standardized residual of less than _____ or greater than _____ is known as an outlier. -3; 3 -4; 4 -2; 2 -1; 1
-2; 2
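A minimal sketch of the ±2 rule for flagging outliers from standardized residuals. The residual values below are hypothetical, chosen only to illustrate the cutoff; `flag_outliers` is an illustrative name.

```python
def flag_outliers(std_residuals, cutoff=2.0):
    """Return indices whose standardized residual is below -cutoff or above +cutoff."""
    return [i for i, r in enumerate(std_residuals) if r < -cutoff or r > cutoff]

# Hypothetical standardized residuals, for illustration only
residuals = [0.4, -1.1, 2.6, -0.3, -2.2, 1.9]
print(flag_outliers(residuals))  # [2, 4]
```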
For a multiple regression model, SSR = 600 and SSE = 200. The multiple coefficient of determination is: .333. .30. .275. .75.
.75. R2 = SSR/SST = SSR/(SSR + SSE) = 600/(600 + 200) = 600/800 = .75.
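A small Python helper for the coefficient of determination, using SST = SSR + SSE. The name `r_squared` is illustrative; the same formula also answers the later question with SSE = 200 and SSR = 300.

```python
def r_squared(ssr: float, sse: float) -> float:
    """Coefficient of determination: R^2 = SSR / SST, where SST = SSR + SSE."""
    return ssr / (ssr + sse)

print(r_squared(600, 200))  # 0.75
```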
In a multiple regression model, the error term ε is assumed to have a mean of: -1. 1. 0. μ
0
The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term ɛ is a random variable with a mean or expected value of: 0. ŷ x̄ 1.
0
The estimated regression equation [equation not shown] can be used to predict a company's sales volume (y), in millions, based upon its advertising expenditure (x), in $10,000s. What is the company's predicted sales volume if they spend $500,000 on advertising? Approximately -$6.5 million Approximately $29 million Approximately $50 million Approximately $395,000
Approximately $29 million (advertising is measured in $10,000s, so substitute x = 500,000/10,000 = 50, not 500,000)
Suppose a multiple coefficient of determination coming from a regression analysis with 50 observations and 3 independent variables is .8455. Calculate the adjusted multiple coefficient of determination. R-Sq(adj) = 74.46% R-Sq(adj) = 87.10% R-Sq(adj) = 83.54% R-Sq(adj) = 90.06%
R-Sq(adj) = 83.54%. R-Sq(adj) = 1 - (1 - R2)(n - 1)/(n - p - 1). In this situation, n = 50 and p = 3. Substituting these values gives R-Sq(adj) = 1 - (1 - .8455)(49/46) = .8354, or 83.54%.
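The adjusted multiple coefficient of determination as a Python function, a minimal sketch using the formula above with n observations and p independent variables (`adjusted_r_squared` is an illustrative name).

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"{adjusted_r_squared(0.8455, n=50, p=3):.2%}")  # 83.54%
```

The adjustment penalizes each added independent variable, which is why the adjusted value is slightly below R2 = .8455.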
Which of the following statements is false? +In the estimated simple linear regression equation, b0 is the y-intercept and b1 is the slope. +In practice, parameter values are not known and must be estimated using sample data. +Regression analysis can be interpreted as a procedure for establishing a cause-and-effect relationship between variables. +ŷ is the point estimator of E(y) , the mean value of y for a given value of x.
Regression analysis can be interpreted as a procedure for establishing a cause-and-effect relationship between variables.
Below is a portion of the computer output for a regression analysis relating y = number of people who use the public pool to x = the outside temperature.
The p-value < .05. The data provide evidence of a significant relationship between the number of people who use the public pool and the outside temperature.
The mathematical equation relating the expected value of the dependent variable to the value of the independent variables, which has the form E(y) = β0 + β1x1 + β2x2 + ... + βpxp, is called: a simple linear regression model. an estimated multiple regression equation. a multiple regression model. a multiple regression equation.
a multiple regression equation.
The mathematical equation that explains how the dependent variable y is related to several independent variables and has the form y = β0 + β1x1 + β2x2 + ... + βpxp + ε is called: a simple linear regression model. an estimated multiple regression equation. a multiple regression model. a multiple regression equation.
a multiple regression model.
If you suspect that you have an influential observation, the first thing you should do is: increase the value of the slope. re-record the data to see if the observation shows up again. remove the influential observation from the data set. check to make sure no error has been made in collecting or recording data.
check to make sure no error has been made in collecting or recording data.
When studying the relationship between two quantitative variables, an interval estimate of the mean value of y for a given value of x is called a(n): a. confidence interval. b. estimation interval. c. determination interval. d. prediction interval.
confidence interval.
When we use the estimated regression equation to develop an interval that can be used to predict the mean for ALL units that meet a particular set of given criteria, that interval is called a(n): confidence interval. estimation interval. prediction interval. population interval.
confidence interval.
In regression analysis, the variable that is being predicted is the: dependent variable. confounding variable. random variable. independent variable.
dependent variable.
A variable used to model the effect of categorical independent variables is called a(n): explanatory variable. dummy variable. categorical variable. quantitative variable.
dummy variable.
The term in the multiple regression model that accounts for the variability in y that cannot be explained by the linear effect of the p independent variables is the: error term, ε. correlation coefficient, r. leading coefficient, β0. response variable, ŷ.
error term, ε.
The model developed from sample data that has the form ŷ = b0 + b1x is known as the: simple linear regression equation. estimated simple linear regression equation. simple linear regression model. correlation equation.
estimated simple linear regression equation.
If a significant relationship exists between x and y and the coefficient of determination shows that the fit is good, the estimated regression equation should be useful for: determining nonresponse error. estimation and prediction. determining cause and effect. extrapolation.
estimation and prediction.
Which of the following variables is categorical? Age Weight Height Gender
gender
Observations with extreme values for the independent variables are called: outliers. influential observations. high leverage points. mistakes.
high leverage points.
A multiple regression model has the form [equation not shown]. As x1 increases by 1 unit (holding x2 constant), the dependent variable is expected to: increase by 11 units. increase by 6 units. decrease by 11 units. decrease by 6 units.
increase by 6 units.
A regression model between sales (ŷ in $1,000s), unit price (x1 in dollars), and television advertisement (x2 in dollars) resulted in the following function: [equation not shown]. The coefficient of the unit price indicates that if the unit price is: increased by $1 (holding advertisement constant), the sales are expected to increase by $3. increased by $1 (holding advertisement constant), the sales are expected to increase by $6,000. decreased by $1 (holding advertisement constant), the sales are expected to decrease by $3. increased by $1 (holding advertisement constant), the sales are expected to decrease by $3,000.
increased by $1 (holding advertisement constant), the sales are expected to decrease by $3,000.
In general, R2 always _____ as independent variables are added to the regression model. decreases stays the same increases increases or decreases depending on how the variables relate to the response variable.
increases
In a multiple regression model, the values of the error term, ε, are assumed to be: zero. independent of each other. dependent on each other. always negative.
independent of each other.
The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the values of ɛ are: independent. uniformly distributed. limited. categorical.
independent.
When we conduct significance tests for a multiple regression relationship, the t test can be conducted for each of the independent variables in the model. Each of those tests is called a test for: complete significance. overall significance. individual significance. pairwise significance.
individual significance.
An observation that has a strong influence or effect on the regression results is called a(n): outlier. influential observation. residual. mistake.
influential observation.
If a categorical variable has k levels, then: k dummy variables are needed. k + 1 dummy variables are needed. k - 1 dummy variables are needed. n dummy variables are needed.
k - 1 dummy variables are needed.
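A minimal sketch of k - 1 dummy coding in plain Python. The three region labels are hypothetical; the first level acts as the baseline and is coded as all zeros, so a 3-level variable needs only 2 dummy columns. Using k dummies would make the columns sum to 1 for every row, duplicating the intercept.

```python
def dummy_code(values, levels):
    """Encode a categorical variable with k levels as k - 1 dummy (0/1) columns.

    The first entry of `levels` is the baseline and is coded as all zeros.
    """
    non_baseline = levels[1:]
    return [[1 if v == level else 0 for level in non_baseline] for v in values]

# Hypothetical 3-level variable -> 2 dummy columns
levels = ["North", "South", "West"]
print(dummy_code(["South", "North", "West"], levels))
# [[1, 0], [0, 0], [0, 1]]
```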
Larger values of r2 imply that the observations are more closely grouped about the: average value of the independent variables. least squares line. origin. average value of the dependent variable.
least squares line.
The method used to develop the estimated regression equation that minimizes the sum of squared residuals is called the: least significant difference method. linear regression model. least squares method. multiple regression technique.
least squares method.
The tests of significance in regression analysis are based on several assumptions about the error term ɛ. Additionally, we make an assumption about the form of the relationship between x and y. We assume that the relationship between x and y is: constant. linear. quadratic. exponential.
linear.
The term used to describe the case when the independent variables in a multiple regression model are correlated is: regression. multicollinearity. explanatory correlation. causation.
multicollinearity.
The proportion of the variability in the dependent variable that can be explained by the estimated multiple regression equation is called the: correlation. multiple coefficient of determination. error term. slope of the least squares regression line.
multiple coefficient of determination.
The study of how a dependent variable y is related to two or more independent variables is called: least significant difference analysis. multiple regression analysis. linear regression analysis. factorial design analysis.
multiple regression analysis.
When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, the appropriate degrees of freedom are: n - 1. (r - 1)(c - 1). k - 1. n - 2.
n - 2.
The tests of significance in regression analysis are based on assumptions about the error term ɛ. One such assumption is that the error term ɛ follows a(n) _____ distribution for all values of x. binomial normal uniform exponential
normal
A ________ is a graph of the standardized residuals plotted against values of the normal scores. This helps to determine whether the assumption that the error term has a normal probability distribution appears to be valid. scatter diagram. regression plot. normal probability plot. residual plot.
normal probability plot.
In a multiple regression model, the values of the error term, ε, are assumed to be: uniformly distributed. skewed to the right. normally distributed. skewed to the left.
normally distributed.
When we conduct significance tests for a multiple regression relationship, the F test will be used as the test for: complete significance. overall significance. individual significance. pairwise significance.
overall significance.
All things held constant, which interval will be wider: a confidence interval or a prediction interval? confidence interval The confidence interval and the prediction interval will have the same width. prediction interval It cannot be determined from the information given.
prediction interval
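A sketch of why the prediction interval is wider, for simple linear regression: both intervals use the same t value, but the standard error for an individual y adds the extra "1 +" term for the variability of a single observation. The x values and s = 1.0 below are hypothetical; `interval_std_errors` is an illustrative name.

```python
import math

def interval_std_errors(xs, s, x_star):
    """Return (s_conf, s_pred) for simple linear regression at x = x_star.

    s_conf: standard error for estimating the MEAN of y (confidence interval)
    s_pred: standard error for predicting an INDIVIDUAL y (prediction interval)
    """
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    common = 1 / n + (x_star - x_bar) ** 2 / sxx
    s_conf = s * math.sqrt(common)
    s_pred = s * math.sqrt(1 + common)  # extra 1 for an individual observation
    return s_conf, s_pred

# Hypothetical x values and standard error of the estimate s
s_conf, s_pred = interval_std_errors([1, 2, 3, 4, 5], s=1.0, x_star=4)
print(round(s_conf, 3), round(s_pred, 3))  # 0.548 1.14
```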
When studying the relationship between two quantitative variables, whenever we want to predict an individual value of y for a new observation corresponding to a given value of x, we should use a(n): confidence interval. estimation interval. determination interval. prediction interval.
prediction interval.
When we use the estimated regression equation to develop an interval that can be used to predict the mean for a specific unit that meets a particular set of given criteria, that interval is called a(n): confidence interval. estimation interval. prediction interval. population interval.
prediction interval.
Graphical representation of the residuals that can be used to determine whether the assumptions made about the regression model appear to be valid is called a: scatter diagram. regression plot. normal probability plot. residual plot.
residual plot.
The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is called a(n): residual. prediction. point estimate. outlier.
residual.
Since the multiple regression equation generates a plane or surface, its graph is called a: dependent variable plane. dependent variable graph. response plane. response surface.
response surface
An F test, based on the F probability distribution, can be used to test for: equality of the means of two populations. significance in the relationship between two categorical variables. significance in regression. equality of two population proportions.
significance in regression.
The mathematical equation relating the independent variable to the expected value of the dependent variable, E(y) = β0 + β1x, is known as the: simple linear regression equation. estimated regression equation. regression model. correlation equation.
simple linear regression equation.
In regression analysis, the equation in the form y = 𝛽0 + 𝛽1x + ε is called the: regression equation. estimated regression equation. simple linear regression model. correlation equation.
simple linear regression model.
Below is a portion of the computer output for a regression analysis relating y = number of people who use the public pool to x = the outside temperature.
Predictor   Coef
Constant    57.912
Temp        0.81138
S = 1.198   R-Sq = 94.2%
Predict approximately how many people will use the public pool in a day when the temperature is 90 degrees. 58 90 131 81
131
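Reading the coefficients off the output (Constant = 57.912, Temp = 0.81138), the prediction is a single substitution, sketched here in Python:

```python
b0, b1 = 57.912, 0.81138  # Constant and Temp coefficients from the output
temperature = 90

predicted_users = b0 + b1 * temperature
print(round(predicted_users, 2))  # 130.94, i.e. about 131 people
```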
In a regression analysis, if SSE = 200 and SSR = 300, then the coefficient of determination is: .67. .4. 1.5. .6.
.6. If SSE = 200 and SSR = 300, then the coefficient of determination is .6. The value of r2 = SSR/(SSE + SSR) = 300/(200 + 300) = .6. See Section 14.3, Coefficient of Determination.
The following data show the results of an aptitude test and the grade point average of 10 students. The t test for a significant relationship between GPA and Aptitude Test Score is based on a t distribution with _____ degrees of freedom. 7 9 10 8
8 A t test for slope is based on a t distribution with n - 2 degrees of freedom. The sample size for this problem is n = 10, so there are 8 degrees of freedom. See Section 14.5, Testing for Significance.
Suppose, after calculating an estimated multiple regression equation, we find that the value of R2 is .9201. Interpret this value. 92.01% of the variability in ŷ can be explained by the estimated regression equation. 92.01% of the variability in y can be explained by the estimated regression equation. 95.9% of the variability in ŷ can be explained by the estimated regression equation. 95.9% of the variability in y can be explained by the estimated regression equation.
92.01% of the variability in y can be explained by the estimated regression equation.
A multiple regression model has the form ŷ = 5 + 6x1 - 7x2. Predict ŷ when x1 = 2 and x2 = -2. ŷ = 3 ŷ = 10 ŷ = 4 ŷ = 31
ŷ = 31. ŷ = 5 + 6(2) - 7(-2) = 5 + 12 + 14 = 31. See Section 15.2, Least Squares Method.
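A Python check of this prediction, assuming the model ŷ = 5 + 6x1 - 7x2 with x1 = 2 and x2 = -2, as implied by the substitution in the worked solution. Note that subtracting a negative x2 term adds to ŷ.

```python
def predict(x1, x2):
    # Assumed model, taken from the worked solution: y-hat = 5 + 6*x1 - 7*x2
    return 5 + 6 * x1 - 7 * x2

print(predict(2, -2))  # 31
```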
A data set has 10 observations. Observation 5 seems much larger than the other observations. When looking back at the data, you notice a mistake has been made in recording observation 5. Including this wrongly recorded observation in the data set has a substantial effect on the goodness of fit. What should you do with the wrongly recorded observation? Keep the wrongly recorded observation as is. Completely remove the observation, leaving the data set with 9 total observations. Re-record every observation. Change it to the correctly recorded observation.
Change it to the correctly recorded observation.
A regression model involving 8 independent variables for a sample of 69 periods resulted in the following sum of squares: SSE = 306, SST = 1800. At α = .05, test to determine whether or not the model is significant. State the F value and your conclusion. F = 36.62; p-value < .05. The model is significant. F = 25.33; p-value < .05. The model is significant. F = 36.62; p-value > .05. The model is not significant. F = 25.33; p-value > .05. The model is not significant.
F = 36.62; p-value < .05. The model is significant. Complete an ANOVA table to calculate the F statistic. See Section 15.5, Testing for Significance.
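The ANOVA arithmetic behind F = 36.62, as a minimal Python sketch (`regression_f_test` is an illustrative name). SSR = SST - SSE, MSR uses p degrees of freedom and MSE uses n - p - 1 = 60.

```python
def regression_f_test(sse, sst, n, p):
    """Return (SSR, MSR, MSE, F) for the overall F test in multiple regression."""
    ssr = sst - sse
    msr = ssr / p            # regression mean square, p degrees of freedom
    mse = sse / (n - p - 1)  # error mean square, n - p - 1 degrees of freedom
    return ssr, msr, mse, msr / mse

ssr, msr, mse, f = regression_f_test(sse=306, sst=1800, n=69, p=8)
print(ssr, msr, round(mse, 2), round(f, 2))  # 1494 186.75 5.1 36.62
```

An F of 36.62 with 8 and 60 degrees of freedom is far above the tabled F.05 critical value (about 2.10), consistent with a p-value below .05.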
The following data show the results of an aptitude test and the grade point average of 10 students. At 95% confidence, test to determine if the model is significant (Perform an F test). What is the test statistic and p-value ? F = 29.07 and p-value = .2 F = 39.07 and p-value = .0002 F = 49.07 and p-value = .2 F = 29.07 and p-value = .0002
F = 39.07 and p-value = .0002. The test statistic is F = MSR/MSE = 39.07, and the p-value = .0002. The p-value is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator. See Section 14.5, Testing for Significance.
A regression was performed on a sample of 16 observations. The estimated equation is ŷ = 3.5 - 15.54x1 + 1.4x2 + 12.6x3. The standard errors for the coefficients are sb1 = 4.2, sb2 = 4.6, and sb3 = 2.8. Test for the significance of β1, β2, and β3 at the 5% level of significance. Factor x1 and factor x3 are statistically significant. Factor x2 and factor x3 are statistically significant. Factor x1 and factor x3 are statistically significant. Factor x2 alone is statistically significant.
Factor x1 and factor x3 are statistically significant. To find this, divide each coefficient by its standard error: t = b1/sb1 = -15.54/4.2 = -3.70, t = b2/sb2 = 1.4/4.6 = .30, and t = b3/sb3 = 12.6/2.8 = 4.50. With n - p - 1 = 16 - 3 - 1 = 12 degrees of freedom, t.025 = 2.179, and only x1 and x3 have |t| > 2.179.
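The individual t tests as a short Python loop. The critical value 2.179 is t.025 with 12 degrees of freedom, taken from a standard t table and hard-coded rather than computed.

```python
coefs = {"x1": -15.54, "x2": 1.4, "x3": 12.6}
std_errs = {"x1": 4.2, "x2": 4.6, "x3": 2.8}
n, p = 16, 3
df = n - p - 1   # 12 degrees of freedom
t_crit = 2.179   # t_.025 for 12 df, from a t table

t_stats = {name: coefs[name] / std_errs[name] for name in coefs}
significant = [name for name, t in t_stats.items() if abs(t) > t_crit]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print("significant at the 5% level:", significant)  # ['x1', 'x3']
```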
In a simple linear regression model, the error term ε accounts for the variability in ______ that cannot be explained by the linear relationship between x and y. y the mean x the standard deviation
y
The following data represent a company's yearly sales volume and its advertising expenditure over a period of 8 years. Identify the independent and dependent variables. The independent variable is the advertising expenses, and the dependent variable is sales. The independent variable is the advertising expenses, and the dependent variable is time. The independent variable is time, and the dependent variable is sales. The independent variable is sales, and the dependent variable is the advertising expenses.
The independent variable is the advertising expenses, and the dependent variable is sales.
The following data represent a company's yearly sales volume and its advertising expenditure over a period of 8 years. Use the least squares method to develop the estimated regression equation.
ŷ = -10.42 + .79x
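A minimal sketch of the least squares method for simple linear regression: b1 = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and b0 = ȳ - b1x̄. The 8-year data set is not reproduced here, so the data below are hypothetical and will not give -10.42 and .79; `least_squares` is an illustrative name.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data (NOT the 8-year data set from the question)
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
b0, b1 = least_squares(advertising, sales)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 2.20 + 0.60x
```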
The value of the coefficient of correlation (r): can be equal to the value of the coefficient of determination (r2). is always smaller than the value of the coefficient of determination. is always larger than the value of the coefficient of determination. can never be equal to the value of the coefficient of determination (r2).
can be equal to the value of the coefficient of determination (r2).
The multiple regression equation based on the sample data, which has the form of , is called: a simple linear regression model. an estimated multiple regression equation. a multiple regression model. a multiple regression equation.
an estimated multiple regression equation.
When working with regression analysis, an outlier is: any observation that does not fit the trend shown by the remaining data. any observation that is extreme in the x direction. any value that has a small residual. any value that falls more than 1.5(IQR) above Q3 or below Q1
any observation that does not fit the trend shown by the remaining data.
Suppose a residual plot of x versus the residuals, y - ŷ, shows a nonconstant variance. In particular, as the values of x increase, suppose that the values of the residuals also increase. This means that: as the values of x get larger, the ability to predict y becomes less accurate. as the values of x get larger, the values of y become larger. as the values of x get larger, the standard deviation of the residuals becomes smaller. as the values of x get larger, the error term, ε, becomes smaller.
as the values of x get larger, the ability to predict y becomes less accurate.
The following data show the results of an aptitude test and the grade point average of 10 students. If GPA and Aptitude Test Scores are linearly related, which of the following must be true?
b1 ≠ 0
If the coefficient of determination is a positive value, then the coefficient of correlation: must also be positive. must be zero. can be either negative or positive. must be larger than 1.
can be either negative or positive.
The coefficient of determination: cannot be negative. is the same as the coefficient of correlation. can be negative or positive. is the square root of the coefficient of correlation.
cannot be negative.
The following data show the results of an aptitude test and the grade point average of 10 students. Does the t test indicate a significant relationship between GPA and Aptitude Test Score? State the test statistic, and then state your conclusion using ⍺ = .05. t = 2.25. The p-value is greater than .05, so the evidence is not sufficient to conclude that a significant relationship exists between GPA and Aptitude Test Scores. t = 6.25. The p-value is less than .05, so the evidence is sufficient to conclude that a significant relationship exists between GPA and Aptitude Test Scores. t = 8.25. The p-value is less than .05, so the evidence is sufficient to conclude that a significant relationship exists between GPA and Aptitude Test Scores. t = 4.25. The p-value is greater than .05, so the evidence is not sufficient to conclude that a significant relationship exists between GPA and Aptitude Test Scores.
t = 6.25. The p-value is less than .05, so the evidence is sufficient to conclude that a significant relationship exists between GPA and Aptitude Test Scores.
When constructing a confidence or a prediction interval to quantify the relationship between two quantitative variables, what distribution do confidence and prediction intervals follow? Chi-Square distribution t distribution Uniform distribution Normal distribution
t distribution
If a residual plot of x versus the residuals, y - ŷ, shows a non-linear pattern, then we should conclude that: the regression model was not based upon a large enough sample size. the regression model is useful for making predictions. the regression model is not an adequate representation of the relationship between the variables. the regression model describes the relationship between x and y very well.
the regression model is not an adequate representation of the relationship between the variables.
The tests of significance in regression analysis are based on assumptions about the error term ɛ . One such assumption is that the variance of ɛ, denoted by 𝝈2, is: greater as x increases. the same for all values of x. unrelated to the value of x. less as x increases.
the same for all values of x.
In a multiple regression model, the variance of the error term, ε, is assumed to be: 0. the same for all values of x1, x2,..., xp. 1. larger as the values of x increase.
the same for all values of x1, x2,..., xp.
In multiple regression analysis: there can be any number of dependent variables, but only one independent variable. the coefficient of determination must be larger than 1. there must be only one independent variable. there can be several independent variables, but only one dependent variable.
there can be several independent variables, but only one dependent variable.
Dummy variables must always have: a value of 0. values of either 0 or 1. a value of 1. positive values.
values of either 0 or 1.