Ch 7
Six blueberry shrubs were planted in a garden and then fertilized and watered for two years. Monthly fertilizer amounts (in milliliters) and shrub heights at two years (in centimeters) were recorded as follows: Fertilizer 0, 5, 10, 15, 20, 25 Height 39, 49, 57, 57, 65, 66 Predicting the height from the fertilizer amount via the least-squares line would be unreliable for which fertilizer amount?
35
The correlation coefficient should be used for only which one of the data sets described below?
A scatterplot of weight of cars versus gas mileage shows an approximately linear relationship.
Select all that apply Which of the following operations leave the correlation coefficient r between x and y uncharged?
Adding a constant to each x. Interchanging the values of x and y. Dividing each x by a positive constant. Multiplying each by a positive constant.
Select all that apply Suppose two variables x and y are positively correlated. Which is the following statements are true?
An increase in x is associated with an increase in y. There may be a third confounding variable z that is the cause of both x and y. A decrease in x is associated with a decrease in y.
True or false: When two variables are highly correlated, a change in the value of one will cause a change in the value of another.
False
Select all that apply The correlation coefficient, r, is calculated between x, the winning men's shot put distance (in meters), and y, the air temperature (in degrees Celsius), over many track and field competitions. Which of the following would change r?
Replace each temperature y with the wind speed. Replace each distance x with In(x), its natural logarithm
Which properties of a residual plot are required to assure ourselves that the assumptions for errors in a linear model hold?
The plot is homoscedastic. The plot shows no substantial trend or curve.
Select all that apply What are the indications from this residual plot that the assumptions of a linear model do not hold?
The plot shows a prominent curve. The plot is heteroscedastic.
The figure shows a scatterplot with least-squares line, a residual plot, and a plot of residuals versus the order in which observations were made. What action is best?
The trend in the plot of residuals versus the order in which observations were made indicates that a time variable should be added.
True or false: In a simple linear model, it is assumed that the errors ε1ε1, ..., εnεn are random and independent, each satisfying εi ~ N(0, σ2)εi ~ N(0, σ2).
True
In the linear model yi = β0 + β1xi + εi, the number β0 is called ______.
a regression coefficient
In the linear model yi = β0 + β1xi + εi, yi is called the ______ variable.
dependent
In the linear model yi = β0 + β1xi + εi, εi is called the ______.
error
The coefficients β̂β̂0 and β̂β̂1 in the least squares line y = β̂β̂0 + β̂β̂1 x are ______.
estimated from the data
To _______ is to use a least-squares line for prediction outside the range of the data. It is unreliable because the linear relationship may not hold outside the range of the data.
extrapolate
Six blueberry shrubs were planted in a garden and then fertilized and watered for two years. Monthly fertilizer amounts (in milliliters) and shrub heights at two years (in centimeters) were recorded as follows: Fertilizer 0, 5, 10, 15, 20, 25 Height 39, 49, 57, 57, 65,66 Here is the output from software used to find the least-squares line for predicting height from fertilizer: Predictor Coef: 1.046 SE Coef: 0.153 T: 6.825 P: 0.002 Constant Coef: 42.429 SE Coef: 2.320 1T: 8.292 P: 0.000 Which equation gives the least-squares line?
height = 42.429 + 1.046(fertilizer)
A scatterplot is _________ if the vertical spread of its points varies across the plot.
heteroscedastic
In the linear model yi = β0 + β1xi + εi, xi is called the ______ variable.
independent
Select all that apply In the context of least-squares regression, r2 ______.
is the proportion of variance in y explained by regression =regression sum of squares/ total sum of squares is the coefficient of determination is the square of the correlation coefficient
The distributions of β̂β̂0 and β̂β̂1 are ______.
normal with means β0 and β1
To check the assumption that errors are normally distributed in a linear model, a normal _______ plot can be made.
probability
Select all that apply Which of the following are properties of the correlation coefficient, r, calculated from a bivariate data set (x1,y1),...,(xn,yn)?
r > 0 for a least-squares line with positive slope. -1 ≤ r ≤ 1
Select all that apply Which of the following are properties of the correlation coefficient, r, calculated from a bivariate data set (x1, y1),.....(xn,yn)?
r does not depend on units or scale. r=0 implies that x and y have no linear correlation.
Match each expression on the left with its description on the right.
r2 matches Coefficient of determination ∑i=1n(yi-y)2∑ni=1(yi-y)2 matches Total sum of squares ∑i=1n(yi-ŷi)2∑ni=1(yi-ŷi)2 matches Error sum of squares
The vertical difference ei = yi − yiˆei = yi - yi^ between observed yiyi and predicted yiˆyi^ associated with the point (xi, yi)(xi, yi) in a least-squares line is called the _____.
residual
The best diagnostic for least-squares regression is a plot of residuals ei versus fitted values ŷŷi, sometimes called a _______ plot.
residual or residuals
The least-squares line is chosen to minimize the sum of the _____ from the data points to the line.
squared vertical distances
The least-squares coefficients β̂β̂0 and β̂β̂1 are ______.
statistics random variables
Match each standard deviation expression on the left to its value on the right.
sβ̂o goes with the one that starts s in front sβ̂1sβ̂1 goes with the one that has s on top of the fraction s has n-2 on the bottom of he fraction
Select all that apply Even supposing that ice cream sales per week are correlated with the number of drowning swimmers per week, it is unreasonable to say that ice cream causes drowning. Potential confounding variables include ______.
the number of swimmers per week the average temperature during the week
When a plot of residuals against the order in which observations were made indicates a trend, an independent variable representing _______ should be included in the model.
time
A heteroscedastic scatterplot shows ______ of its points.
varying vertical spread
A residual plot for least-squares regression graphs points with ______.
x-coordinate ŷŷi and y-coordinate ei
Match the notation from the least-squares line on the left with its description on the right.
yiˆ = βˆo + β1ˆxyi^ = β^o + β1^x matches fitted value ei = yi − yiˆei = yi - yi^ matches residual βoˆβo^ matches the estimated intercept β1ˆβ1^ matches the estimated slope
The formula for the intercept of the least-squares line is β̂β̂0 = ______
yy - β̂β̂1x
Select all that apply Regarding the least squares line y = β̂β̂0 + β̂β̂1 x, which statements are true?
β̂β̂0 is an estimate of β0, the true intercept. β̂β̂1 is an estimate of β1, the true slope.
Select all that apply Which of these quantities typically change from sample to sample?
β̂β̂0, ei, εi, β̂β̂1
Match each expression related to the least-squares line on the left with its equivalent on the right.
∑i=1n(xi-x)2∑ni=1(xi-x)2 matches ∑i=1nxi2∑ni=1xi2 - nxx2 ∑i=1n(yi-y)2∑ni=1(yi-y)2 matches ∑ni=1yi2-ny2 ∑i=1nyi2∑ni=1yi2 - nyy2 ∑i=1n(xi-x)(yi-y)∑ni=1(xi-x)(yi-y) matches ∑ni=1xiyi-nxy ∑i=1nxiyi∑ni=1xiyi - nxxy
The formula for the slope of the least-squares line is β̂β̂1 = ______.
∑j=1n(xi-x)(yi-y)/∑j=1n(xi-x)2
The coefficients β̂β̂0 and β̂β̂1 in the least squares line y = β̂β̂0 + β̂β̂1 x are chosen to minimize ______.
∑ni=1ei2∑i=1nei2 = ∑ni=1(yi-βˆ0-βˆ1xi)2