Data Analysis: Chapter 13
three important assumptions that are tested for random error E
1. the errors are normally distributed 2. the errors have constant variance 3. the errors are independent
A measure of relative fit for a simple regression line is called the R2 or ___________ __________ ___________.
coefficient of determination
Which of the following is a "goodness of fit" measure?
coefficient of determination
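Worked sketch: R2 can be computed as 1 - SSE/SST, the share of total variation in Y that the regression explains. The function name r_squared and the sums of squares below are hypothetical illustration values, not from the chapter.

```python
def r_squared(sse, sst):
    """Coefficient of determination: 1 - (unexplained variation / total variation)."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    return 1.0 - sse / sst

# Hypothetical sums of squares: SST = 1000, SSE = 250
print(r_squared(250, 1000))  # 0.75
```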
A __________ interval for Y, the response variable, predicts the mean of Y whereas a _________ interval for Y predicts the individual value for Y.
confidence; prediction
A response variable, Price, is defined as the selling price of a used car. The three predictor variables are Age, the age of the car in years; Mileage, the mileage of the car in thousands of miles; and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. Predict the average price of a 2-year-old car with 50,000 miles and 4 cylinders.
$8700 *note: use mileage = 50, not 50,000*
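Worked check of the arithmetic, as a small sketch. The function name predict_price is hypothetical; the coefficients are the ones given in the card, with mileage entered in thousands of miles.

```python
def predict_price(age_years, mileage_thousands, cylinders):
    """Fitted equation from the card:
    Price = 10,000 - 1800*Age - 50*Mileage + 1200*Cylinders."""
    return 10_000 - 1800 * age_years - 50 * mileage_thousands + 1200 * cylinders

# 2-year-old car, 50,000 miles (entered as 50), 4 cylinders:
# 10,000 - 3,600 - 2,500 + 4,800
print(predict_price(2, 50, 4))  # 8700
```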
A multivariate data set will have ________
1. a single column of Y values 2. n rows of observations 3. k columns of X values
Variance inflation caused by multicollinearity can result ________.
1. in untrustworthy t statistics for the coefficient estimates 2. in difficulty identifying the contribution of each predictor 3. in wider confidence intervals for the coefficients of parameters than warranted
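Sketch of how variance inflation is usually quantified: the variance inflation factor for predictor j is VIF = 1 / (1 - Rj2), where Rj2 comes from regressing Xj on the other predictors. The function name vif and the input values are hypothetical illustration choices.

```python
def vif(r2_j):
    """Variance inflation factor for predictor j.

    r2_j is the R-squared from regressing Xj on all the other predictors.
    """
    if not 0 <= r2_j < 1:
        raise ValueError("r2_j must be in [0, 1)")
    return 1.0 / (1.0 - r2_j)

print(vif(0.0))  # 1.0: Xj is unrelated to the other predictors, no inflation
print(vif(0.5))  # 2.0: the coefficient's variance is doubled
```

As Rj2 approaches 1 the VIF grows without bound, which is why the confidence intervals widen and the t statistics become untrustworthy.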
4 criteria for regression assessment
1. logic 2. fit 3. parsimony 4. stability
Limitations of simple regression
1. multiple relationships usually exist 2. biased estimates if relevant predictors are omitted 3. lack of fit does not show that X is unrelated to Y if the true model is multivariate
A multiple regression model is preferred over a simple regression model because ________.
1. rarely does one predictor explain the variation in Y as well as several predictors 2. it is possible that a predictor can appear unrelated to Y in a simple regression but it can show significance when combined with another predictor
If two predictor variables, X1 and X2, are suspected of having an interaction effect on the response variable Y, we can test for this by adding the term _______ to the model.
B3x1x2
If a predictor variable, Xj, is suspected of having a quadratic relationship with the response variable Y, we can test for this by adding the term __________ to the model.
Bjxj^2 (the coefficient Bj times the square of xj)
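Sketch of how interaction and quadratic terms are built as extra predictor columns before fitting. The helper name expand_row and the data values are hypothetical; after fitting, the new coefficient is tested for significance like any other.

```python
def expand_row(x1, x2):
    """Return the predictor columns x1, x2, the interaction x1*x2,
    and the quadratic term x1**2 for one observation."""
    return [x1, x2, x1 * x2, x1 ** 2]

# Hypothetical observations (x1, x2)
data = [(1, 4), (2, 5), (3, 6)]
rows = [expand_row(x1, x2) for x1, x2 in data]
for row in rows:
    print(row)
```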
The ______ statistic is defined as the ratio MSR/MSE.
F
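Sketch of the F calculation, assuming the usual ANOVA definitions MSR = SSR/k and MSE = SSE/(n-k-1). The function name f_statistic and the numbers are hypothetical illustration values.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE for a regression with n observations and k predictors."""
    msr = ssr / k             # mean square due to regression
    mse = sse / (n - k - 1)   # mean square error
    return msr / mse

# Hypothetical ANOVA table: SSR = 750, SSE = 250, n = 30, k = 4
print(f_statistic(750, 250, 30, 4))  # 18.75
```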
The response variable (Y) is assumed to be related to the ______ predictors by a linear equation called the ______ _________ ___________.
k; population regression model
Match the regression criteria with the reasoning
Logic: Is there an expectation that the predictor will help explain variation in the response? Fit: Does the overall regression show significant predictive ability? Parsimony: Does each predictor contribute significantly to the model? Stability: Are the predictors independent enough from each other so that the model is stable?
A response variable is defined as the selling price of a used car. Three predictor variables include the age of the car, the mileage of the car, and the number of cylinders. The proper estimated regression equation would be:
Price = b0 + b1Age + b2Mileage + b3Cylinders
The standard error of the regression, se, is calculated by taking the square root of ________ divided by ______.
SSE; n-k-1
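Sketch of the formula se = sqrt(SSE / (n-k-1)). The function name standard_error and the numbers are hypothetical illustration values.

```python
import math

def standard_error(sse, n, k):
    """Standard error of the regression: sqrt(SSE / (n - k - 1))."""
    df = n - k - 1  # error degrees of freedom
    if df <= 0:
        raise ValueError("need n > k + 1 observations")
    return math.sqrt(sse / df)

# Hypothetical values: SSE = 250, n = 30 observations, k = 4 predictors
se = standard_error(250, 30, 4)
print(round(se, 3))  # 3.162
```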
What does SSE represent in regression analysis?
The amount of variation in Y that is left unexplained
If c binary variables are created for a categorical predictor with c categories, the regression calculations will fail because we will have ________.
a redundant predictor that causes perfect collinearity
Suppose we define a qualitative variable called payment method and the categories are credit card, personal check, or cash. The binary variables defined are CC =1 if pay by credit card (0 otherwise) and PC = 1 if pay by personal check (0 otherwise). If both CC = 0 and PC = 0 it means the payment method was by ______
cash
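Sketch of the coding scheme in the two cards above: c = 3 categories become c - 1 = 2 binaries, with cash as the omitted baseline. The function name encode_payment is hypothetical. The loop also shows why a third binary would be redundant: CC + PC + Cash equals 1 on every row, which duplicates the intercept column.

```python
CATEGORIES = ("credit card", "personal check", "cash")

def encode_payment(method):
    """Return (CC, PC); cash is the omitted baseline category."""
    if method not in CATEGORIES:
        raise ValueError(f"unknown payment method: {method!r}")
    cc = 1 if method == "credit card" else 0
    pc = 1 if method == "personal check" else 0
    return cc, pc

for m in CATEGORIES:
    cc, pc = encode_payment(m)
    cash = 1 if m == "cash" else 0  # the redundant third binary
    # cc + pc + cash is 1 on every row: perfect collinearity with the intercept
    print(m, (cc, pc), cc + pc + cash)
```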
A correlation matrix can be used to identify possible ______ between two predictor variables.
collinearity (strong correlation between the two predictors)
A response variable, Price, is defined as the selling price of a used car. The three predictor variables are Age, the age of the car in years; Mileage, the mileage of the car in thousands of miles; and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. If the estimated price of a used car is $8700 and the actual selling price is $9000, the residual is _____.
ei = $300
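Quick check: a residual is the actual value minus the fitted value. The function name residual is hypothetical.

```python
def residual(actual, fitted):
    """Residual e_i = y_i - y_hat_i (actual minus estimated)."""
    return actual - fitted

# Actual selling price $9000, estimated price $8700
print(residual(9000, 8700))  # 300
```

A positive residual means the regression underestimated the actual price.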
multiple regression
extends simple regression to include several independent variables. It is required when a single-predictor model is inadequate to describe the true relationship between the dependent variable Y and its potential predictors.
True or false: A binary predictor variable is tested for significance using a different test statistic than the one used for a quantitative predictor variable.
false
True or false: Software packages such as Excel or MINITAB routinely report left-tail p-values when testing multiple regression coefficients.
false
When the predictor variables are related to each other rather than being independent we have a condition called _______.
multicollinearity
Coefficient instability would be when X1 and X2 both show strong correlation with Y and ________.
one or both of their coefficients are not significant
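Klein's Rule
suggests that we should worry about the stability of the regression coefficient estimates only when a pairwise predictor correlation exceeds the multiple correlation coefficient R (i.e., the square root of R2).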
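Sketch of Klein's Rule as a check, assuming the usual statement: flag a pair of predictors when the absolute value of their correlation exceeds the multiple correlation coefficient R = sqrt(R2). The function name kleins_rule_flag and the input values are hypothetical.

```python
import math

def kleins_rule_flag(pairwise_r, r_squared):
    """True if the predictor-pair correlation exceeds R = sqrt(R^2)."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r_squared must be in [0, 1]")
    multiple_r = math.sqrt(r_squared)
    return abs(pairwise_r) > multiple_r

# Hypothetical model with R2 = 0.81, so R is about 0.9
print(kleins_rule_flag(0.95, 0.81))   # True: worry about coefficient stability
print(kleins_rule_flag(-0.50, 0.81))  # False
```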
A response variable, Price, is defined as the selling price of a used car. The three predictor variables are Age, the age of the car in years; Mileage, the mileage of the car in thousands of miles; and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. The coefficient -1800 means ________.
that for each one-year increase in a car's age, the price decreases by $1800 on average, holding mileage and cylinders constant
If we fail to reject the null hypothesis that the coefficient Bk = 0 then we conclude ______.
that the predictor variable Xk is not significantly associated with the response variable Y.
Identify the estimated multiple regression equation.
y(hat) = b0 + b1x1 + b2x2 + ... + bkxk
Identify the population multiple regression model.
y = B0 + B1x1 + B2x2 + ... + Bkxk + E
The objective when conducting a regression analysis is _______.
to find a linear equation that minimizes the sum of the squared differences between the observed response and the estimated response
True or False: R2adj is always less than R2.
true (more precisely, R2adj <= R2, with equality only in the degenerate case R2 = 1)
stepwise regression
uses the power of the computer to fit the best model using 1, 2, 3, ..., k predictors.
A response variable, Price, is defined as the selling price of a used car. The three predictor variables are Age, the age of the car in years; Mileage, the mileage of the car in thousands of miles; and Cylinders, the number of engine cylinders. The estimated regression equation is: Price = 10,000 - 1800Age - 50Mileage + 1200Cylinders. By how much will the price be reduced for each additional 10,000 miles on the car?
$500
"When two explanations are otherwise equivalent, we prefer the simplest explanation." This is known as the principle of ______
Occam's Razor
The R2 value falls in the range __________ to ________.
0 to 1
Each category of a qualitative variable can be converted to a binary variable by assigning the value ______ or ______ to indicate the presence or absence of the condition.
0;1
The quick rule for finding 95% confidence or prediction intervals for Y substitutes the number _________ for the __________ statistic
2; t
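Sketch of the quick rule, assuming the common textbook approximations: y(hat) +/- 2*se for an individual value (prediction interval) and y(hat) +/- 2*se/sqrt(n) for the mean of Y (confidence interval). These shortcuts are rough and work best near the means of the predictors. The function names and numbers are hypothetical.

```python
import math

def quick_prediction_interval(y_hat, se):
    """Approximate 95% interval for an individual Y: y_hat +/- 2*se."""
    return (y_hat - 2 * se, y_hat + 2 * se)

def quick_confidence_interval(y_hat, se, n):
    """Approximate 95% interval for the mean of Y: y_hat +/- 2*se/sqrt(n)."""
    half_width = 2 * se / math.sqrt(n)
    return (y_hat - half_width, y_hat + half_width)

# Hypothetical values: y_hat = 8700, se = 500, n = 25
print(quick_prediction_interval(8700, 500))      # (7700, 9700)
print(quick_confidence_interval(8700, 500, 25))  # (8500.0, 8900.0)
```

The prediction interval is wider because an individual value varies more than a mean.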
Doane's Rule states that there should be at least ________ observations for each predictor variable
5
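Sketch of the rule as a sample-size check, n/k >= 5. The function name meets_doanes_rule and the sample sizes are hypothetical.

```python
def meets_doanes_rule(n, k):
    """True if there are at least 5 observations per predictor (n/k >= 5)."""
    if k <= 0:
        raise ValueError("k must be a positive number of predictors")
    return n / k >= 5

print(meets_doanes_rule(30, 4))  # True: 7.5 observations per predictor
print(meets_doanes_rule(12, 3))  # False: only 4 observations per predictor
```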
Identify the four criteria for regression assessment
Logic, Fit, Parsimony, Stability
Principle of Occam's Razor
When two explanations are otherwise equivalent, we prefer the simpler, more parsimonious one.