Chapter 7: Linear Regression
The following regression model has been proposed to predict sales at a gas station where x1 = their previous day's sales (in $1,000's), x2 = population within 5 miles (in 1,000's), x3 = 1 if any form of advertising was used, 0 if otherwise, and sales (in $1,000's). Predict sales (in dollars) for a store with competitor's previous day's sale of $3,000, a population of 10,000 within 5 miles, and six radio advertisements.
$86,000
The following regression model has been proposed to predict sales at a gas station where x1 = their previous day's sales (in $1,000's), x2 = population within 5 miles (in 1,000's), x3 = 1 if any form of advertising was used, 0 if otherwise, and sales (in $1,000's). Predict sales (in dollars) for a store with competitor's previous day's sale of $3,000, a population of 10,000 within 5 miles, and twenty radio advertisements.
$86,000
What would be the value of the sum of squares due to regression (SSR) if the total sum of squares (SST) is 22.21 and the sum of squares due to error (SSE) is 6.89?
15.32 SST = SSR + SSE
Regression analysis involving one independent variable and one dependent variable is referred to as
simple linear regression
In the simple linear regression equation, the parameter B1 is the _______ of the true regression line.
slope
When we use the estimated regression equation to develop an interval that can be used to predict the mean for ALL units that meet a particular set of given criteria, that interval is called a
confidence interval
The process of making conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture is known as
hypothesis testing
A variable used to model the effect of categorical independent variables is called a
dummy variable
In the simple linear regression model, the ________ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between x and y.
error term
The term in the multiple regression model that accounts for the variability in y that cannot be explained by the linear effect of the q independent variables is the
error term ε
The _____ is the range of values of the independent variables in the data used to estimate the regression model.
experimental region
Prediction of the value of the dependent variable outside the experimental region is called
extrapolation
The graph of the simple linear regression equation is a(n) _____.
line
The study of how a dependent variable y is related to two or more independent variables is called
multiple linear regression
The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the error term ε follows a ________ distribution for all values of x.
normal
When there are many independent variables to consider, special procedures are sometimes employed to select the independent variables to include in the regression model. All of the following are examples of variable selection procedures except for
overfitting -forward selection, backward elimination, and best subsets are example of variable selection procedures
The process of making estimates and drawing conclusions about one or more characteristics of a population through analysis of sample data drawn from the population is known as
statistical interference
When determining the best estimated regression equation to model a set of data, the procedure that uses an iterative variable selection procedure that considers adding an independent variable and removing an independent variable at each step is called
stepwise selection
Which of the following options guarantees that the best model for a given number of variables will be found?
best subsets regression
Which of the following options is NOT an iterative variable selection procedure?
best subsets regression
What would be the coefficient of determination if the total sum of squares (SST) is 30 and the sum of squares due to regression (SSR) is 27?
.9 -27/30
The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the error term ε is a random variable with a mean or expected value of
0
In a regression analysis if SSE = 200 and SSR = 300, then the coefficient of determination is
0.6 -r^2 = SSR/(SSE+SSR)
The _____ is a measure of the error that results from using the estimated regression equation to predict the values of the dependent variable in a sample.
SSE - sum of squares due to error
The coefficient of determination is defined as
SSR/SST -SST = (SSE+SSR)
Suppose an estimated regression equation has a coefficient of determination (r2) of 0.866. Interpret this value.
The estimated regression equation explains approximately 86.6% of the variation in the dependent variable.
The multiple regression equation based on the sample data, which has the form "y mean = b0 + b1x1 + b2x2 etc." is called
an estimated multiple regression equation
Suppose a residual plot of x verses the residuals, y - y mean , shows a nonconstant variance. In particular, as the values of x increase, suppose that the value of the residuals also increase. This means that
as the values of x get larger, the ability to predict y becomes less accurate.
Iterative variable selection procedure
at each step of the procedure a single independent variable is added or deleted and the new model is evaluated
Two variables have a positive linear correlation. As the dependent variable increases, the independent variable will
increase
The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the values of ε are
independent
_____ refers to the use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.
interval estimation
A(n) ________ refers to a measurable factor that defines a characteristic of a population, process, or system.
parameter
When we use the estimated regression equation to develop an interval that can be used to predict the mean for a specific unit that meets a particular set of given criteria, that interval is called a
prediction interval
What type of regression model should be used when there is a nonlinear relationship between the independent and dependent variables which is fit by including the independent variable and the square of the independent variable?
quadratic regression model
__________ is a statistical procedure used to develop an equation showing how two variables are related.
regression analysis
The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is known as the
residual
A _____ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables.
scatter chart
The procedure of using sample data to find the estimated regression equation is better known as
the least squares method
The tests of significance in regression analysis are based on assumptions about the error term ε. One such assumption is that the variance of ε, denoted by σ 2 , is
the same for all values of x
In the simple linear regression equation, the parameter Bo represents the _____ of the true regression line.
y-intercept
When the mean value of the response variable is independent of variation in the predictor variable, the slope of the regression line is
zero