Regression Review
In multiple regression with p predictor variables, when constructing a confidence interval for any βi, the degrees of freedom for the tabulated value of t should be: a) n-1 b) n-2 c) n-p-1 d) p-1
c
The worst kind of outlier; it can completely reverse the direction of association between x and y.
influential points
Used when the effect of a predictor on the response depends on other predictors.
interaction
A point that lies far away from the rest.
outlier
The observed value of y minus the predicted value of y at the observed x.
residual
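A minimal sketch of this definition follows. The fitted line (y = 2 + 3x) and the observed point are hypothetical values chosen only for illustration.

```python
def predicted_y(x):
    # Hypothetical fitted line, not taken from any data set: y-hat = 2 + 3x
    return 2 + 3 * x

def residual(observed_y, x):
    # Residual = observed y minus predicted y at the same x
    return observed_y - predicted_y(x)

# Hypothetical observation: x = 4, y = 15
res = residual(15, 4)
print(res)  # 1, the point lies one unit above the fitted line
```

A positive residual means the point lies above the fitted line; a negative one means it lies below.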
Used to check the assumptions of the regression model.
residual plots
In simple linear regression, when β is not significantly different from zero we conclude that: a) X is a good predictor of Y b) there is no linear relationship between X and Y c) the relationship between X and Y is quadratic d) there is no relationship between X and Y
b
Both the prediction interval for a new response and the confidence interval for the mean response are narrower when made for values of x that are: a) closer to the mean of the x's b) further from the mean of the x's c) closer to the mean of the y's d) further from the mean of the y's
a
Used when a numerical predictor has a curvilinear relationship with the response
Quadratic regression
Proportion of the variability in y explained by the regression model.
R²
Used when trying to decide between two models with different numbers of predictors.
adjusted R²
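As a rough sketch, the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), can be coded as below. The sample size and the two R² values are hypothetical; they only illustrate that a small gain in R² from an extra predictor can still lower the adjusted value.

```python
def adjusted_r2(r2, n, p):
    # n = number of observations, p = number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison with n = 52 observations:
# model A has 1 predictor, model B adds a second one and gains little R2
model_a = adjusted_r2(0.597, n=52, p=1)
model_b = adjusted_r2(0.600, n=52, p=2)

print(model_a > model_b)  # True: the penalty outweighs the small gain
```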
Most supermarkets use scanners at the checkout counters. The data collected this way can be used to evaluate the effect of price and a store's promotional activities on the sales of any product. The promotions at a store change weekly, and are mainly of two types: flyers distributed outside the store and through newspapers (which may or may not include that particular product), and in-store displays at the end of an aisle that call the customers' attention to the product. Weekly data was collected on a particular beverage brand, including sales (in number of units), price (in dollars), flyer (1 if product appeared that week, 0 if it didn't) and display (1 if a special display of the product was used that week, 0 if it wasn't). As a preliminary analysis, a simple linear regression model was fitted. The fitted regression equation was: sales = 2259 - 1418 price. The ANOVA F test p-value was 0.000, and R² = 59.7%.
Use this scenario to answer the questions about the beverage sales model.
At the same confidence level, a prediction interval for a new response is always: a) somewhat wider than the corresponding confidence interval for the mean response b) somewhat narrower than the corresponding confidence interval for the mean response c) one unit wider than the corresponding confidence interval for the mean response d) one unit narrower than the corresponding confidence interval for the mean response
a
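The two interval questions can be illustrated with the standard-error formulas from simple linear regression. This is only a sketch: s, n, the mean of the x's and Sxx below are hypothetical numbers, not values from the beverage data.

```python
import math

def se_mean_response(s, n, x0, x_bar, sxx):
    # Standard error of the estimated mean response at x0
    return s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

def se_new_response(s, n, x0, x_bar, sxx):
    # Standard error for predicting one new response at x0 (extra "1 +" term)
    return s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

# Hypothetical summary statistics
s, n, x_bar, sxx = 100.0, 52, 1.2, 2.0

near = (se_mean_response(s, n, 1.2, x_bar, sxx), se_new_response(s, n, 1.2, x_bar, sxx))
far = (se_mean_response(s, n, 1.8, x_bar, sxx), se_new_response(s, n, 1.8, x_bar, sxx))

print(near[1] > near[0], far[1] > far[0])  # prediction interval is wider
print(near[0] < far[0], near[1] < far[1])  # both are narrower near the mean of the x's
```

Both intervals use the same t multiplier at a given confidence level, so comparing standard errors is enough to compare widths.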
In a multiple regression model, where the x's are predictors and y is the response, multicollinearity occurs when: a) the x's provide redundant information about y b) the x's provide complementary information about y c) the x's are used to construct multiple lines, all of which are good predictors of y d) the x's are used to construct multiple lines, all of which are bad predictors of y
a
Is price a good predictor of sales? a) Yes, the p-value is very small. b) Yes, the intercept is very large. c) No, R-square is not too good. d) No, the slope is negative.
a
The following appeared in the magazine Financial Times, March 23, 1995: "When Elvis Presley died in 1977, there were 48 professional Elvis impersonators. Today there are an estimated 7328. If that growth is projected, by the year 2012 one person in four on the face of the globe will be an Elvis impersonator." This is an example of: a) extrapolation b) dummy variables c) misuse of causality d) multicollinearity
a
The parameters to be estimated in the simple linear regression model Y = α + βx + ε, ε ~ N(0, σ), are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
a
According to the null hypothesis of the ANOVA F test, which predictor variables are providing significant information about the response? a) most of them b) none of them c) all of them d) some of them
b
According to this model, how many units will be sold, on average, when the price of the beverage is $1.10? a) 3818.8 b) 699.2 c) 1066.9 d) 3902.9
b
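The arithmetic behind this answer, using the fitted equation sales = 2259 - 1418 price from the scenario (the function name is just an illustrative choice):

```python
def predicted_sales(price):
    # Fitted equation from the scenario: sales = 2259 - 1418 * price
    return 2259 - 1418 * price

estimate = predicted_sales(1.10)
print(round(estimate, 1))  # 699.2
```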
In a regression study, a 95% confidence interval for β1 was given as: (-5.65, 2.61). What would a test for H0: β1 = 0 vs Ha: β1 ≠ 0 conclude? a) reject the null hypothesis at α=0.05 and all smaller α b) fail to reject the null hypothesis at α=0.05 and all smaller α c) reject the null hypothesis at α=0.05 and all larger α d) fail to reject the null hypothesis at α=0.05 and all larger α
b
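The reasoning rests on the link between a two-sided test and a confidence interval: the test at level α rejects H0 exactly when the (1 - α) interval excludes the null value. A small sketch, with a helper name chosen here for illustration:

```python
def rejects_null(ci_low, ci_high, null_value=0.0):
    # Reject H0 only if the null value falls outside the confidence interval
    return not (ci_low <= null_value <= ci_high)

decision = rejects_null(-5.65, 2.61)
print(decision)  # False: zero is inside the 95% interval, so fail to reject at 0.05
```

Intervals at higher confidence levels (smaller α) are wider and still contain zero, which is why the conclusion also holds for all smaller α.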
In the regression model Y = α + βx + ε, the change in Y for a one unit increase in x: a) will always be the same amount, α b) will always be the same amount, β c) will depend on the error term d) will depend on the level of x
b
We can measure the proportion of the variation explained by the regression model by: a) r b) R² c) σ² d) F
b
Which of the following is the best interpretation of the slope of the line? a) As the price increases by 1 dollar, sales will increase, on average, by 2259 units. b) As the price increases by 1 dollar, sales will decrease, on average, by 1418 units. c) As the sales increase by 1 unit, the price will increase, on average, by 2259 dollars. d) As the sales increase by 1 unit, the price will decrease, on average, by 1418 dollars.
b
Below is a sketch of the residual plot for this analysis. What can you conclude from it? (graph is in the shape of a U) a) All the assumptions seem to be satisfied. b) There seems to be an outlier in the data. c) Simple linear regression might not be the best model. d) The assumption of constant variance might be violated.
c
In a regression model with a dummy variable without interaction there can be: a) more than one slope and more than one intercept b) more than one slope, but only one intercept c) only one slope, but more than one intercept d) only one slope and one intercept
c
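To see why, consider a sketch of a model with price and a display dummy but no interaction term. The coefficients below are hypothetical, not fitted from the beverage data.

```python
def fitted_sales(price, display):
    # Hypothetical coefficients: intercept, price slope, display dummy effect
    a, b_price, b_display = 2000, -1200, 300
    return a + b_price * price + b_display * display

lines = {}
for display in (0, 1):
    intercept = fitted_sales(0, display)
    slope = fitted_sales(1, display) - fitted_sales(0, display)
    lines[display] = (intercept, slope)

print(lines)  # {0: (2000, -1200), 1: (2300, -1200)}
```

The dummy shifts the line up or down (two intercepts) while the price slope stays the same; separate slopes would require an interaction term.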
Studies have shown a high positive correlation between the number of firefighters dispatched to combat a fire and the financial damages resulting from it. A politician commented that the fire chief should stop sending so many firefighters since they are clearly destroying the place. This is an example of: a) extrapolation b) dummy variables c) misuse of causality d) multicollinearity
c
The MSE is an estimator of: a) ε b) 0 c) σ² d) Y
c
The proportion of the variability in sales accounted for by the price of the product is: a) 14.18% b) 22.59% c) 59.70% d) 100%
c
In general, the Least Squares Regression approach finds the equation: a) that includes the best set of predictor variables b) of the best fitting straight line through a set of points c) with the highest R², after comparing all possible models d) that has the smallest sum of squared errors
d
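A minimal sketch of this idea for a single predictor, using the closed-form slope and intercept. The data points are made up for illustration; the check at the end nudges the fitted coefficients and confirms the sum of squared errors only gets larger.

```python
def least_squares_line(xs, ys):
    # Closed-form simple linear regression: b = Sxy / Sxx, a = y_bar - b * x_bar
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sse(a, b, xs, ys):
    # Sum of squared errors for the line y = a + b * x
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sse(a, b, xs, ys)
nudged = [sse(a + da, b + db, xs, ys) for da in (-0.1, 0.1) for db in (-0.1, 0.1)]

print(all(best < other for other in nudged))  # True
```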
According to the alternative hypothesis of the ANOVA F test, which predictor variables are providing significant information about the response? a) most of them b) none of them c) all of them d) some of them
d
If a predictor variable x is found to be highly significant we would conclude that: a) a change in y causes a change in x b) a change in x causes a change in y c) changes in x are not related to changes in y d) changes in x are associated with changes in y
d
In a study of the relationship between X = mean daily temperature for the month and Y = monthly charges on electrical bill, the following data was gathered. Which of the following seems the most likely model? X: 20 30 50 60 80 90; Y: 125 110 95 90 110 130 a) Y = α + βx + ε, β < 0 b) Y = α + βx + ε, β > 0 c) Y = α + β1x + β2x² + ε, β2 < 0 d) Y = α + β1x + β2x² + ε, β2 > 0
d
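One way to check this answer is to fit a quadratic to the six data points and look at the sign of the squared term. The sketch below solves the normal equations by hand so it needs no external libraries; the function name is an illustrative choice, and in practice a statistics package would do this step.

```python
def fit_quadratic(xs, ys):
    # Normal equations (X'X) c = X'y with columns 1, x, x^2
    rows = [[1.0, x, x * x] for x in xs]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, 3))) / a[i][i]
    return coef  # [alpha, beta1, beta2]

temps = [20, 30, 50, 60, 80, 90]
bills = [125, 110, 95, 90, 110, 130]

alpha, beta1, beta2 = fit_quadratic(temps, bills)
turning_point = -beta1 / (2 * beta2)

print(beta2 > 0)  # True: the curve opens upward, matching the U-shaped data
```

Charges fall as temperature rises toward the middle of the range and climb again at high temperatures, which a straight line cannot capture.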
Should the intercept of the line be interpreted in this case? a) Yes, as the average price when no units are sold. b) Yes, as the average sales when the price is zero dollars. c) No, since sales of zero units are probably out of the range observed. d) No, since a price of zero dollars is probably out of the range observed.
d
The coefficient of linear correlation, r, for this analysis is: a) 7.73 b) -7.73 c) .773 d) -.773
d
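In simple linear regression, r is the square root of R² with the sign of the slope. A short check using the scenario's values (R² = 59.7%, slope = -1418):

```python
import math

r_squared = 0.597   # from the scenario
slope = -1418       # from the fitted equation

# r takes the sign of the slope and the magnitude sqrt(R^2)
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))  # -0.773
```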
The response variable is: a) quantitative b) y c) sales d) all of the above
d
Used in a regression model to represent categorical variables.
dummy variables
Can give bad predictions if the conditions do not hold outside the observed range of x's.
extrapolation
ŷ = a + b1x1 + b2x2 + ... + bpxp
fitted equation
Problem that can occur when the information provided by several predictors overlaps
multicollinearity
y = α + β1x1 + β2x2 + ... + βpxp + ε, ε ~ N(0, σ²)
multiple regression model