AP STATS- Unit 4 Linear Regression
strong
r between -1 and -0.8, or between 0.8 and 1
weak
r between -0.5 and 0.5
moderate
r between -0.8 and -0.5, or between 0.5 and 0.8
Equation of least squares regression in context
ŷ = 0.118(body) + 13.195
The mean of the residuals is always
0
Suppose a certain scale is not calibrated correctly, and as a result, the mass of any object is displayed as 0.75 kilogram less than its actual mass. What is the correlation between the actual masses of a set of objects and the respective masses of the same set of objects displayed by the scale?
1
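A minimal sketch of why the answer is 1: subtracting a constant shifts every value by the same amount, so the points (actual, displayed) fall exactly on a line with positive slope. The mass values below are made up for illustration, and `correlation` is a hypothetical helper written from the definition of r.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r computed from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical actual masses in kilograms (not from the original problem).
actual = [1.0, 2.5, 4.0, 7.25, 10.0]
# The miscalibrated scale displays 0.75 kg less than the actual mass.
displayed = [m - 0.75 for m in actual]

r = correlation(actual, displayed)
print(round(r, 6))
```

Any other set of masses, and any other constant offset, should give the same correlation.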
There is a linear relationship between the number of chirps made by the striped ground cricket and the air temperature. A least squares fit of some data collected by a biologist gives the model ŷ = 25.2 + 3.3x, for 9 < x < 25, where x is the number of chirps per minute and ŷ is the estimated temperature in degrees Fahrenheit. What is the estimated increase in temperature that corresponds to an increase of 5 chirps per minute?
16.5
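The arithmetic behind this answer can be sketched as follows: the intercept cancels when two predictions are subtracted, so only slope times change in x remains. The function name and the two chirp rates (10 and 15, both inside 9 < x < 25) are assumptions chosen for illustration.

```python
def predict_temperature(chirps_per_minute):
    """Cricket model from the card: estimated temperature in degrees Fahrenheit."""
    return 25.2 + 3.3 * chirps_per_minute

# Two chirp rates 5 apart, both inside the model's stated range (illustrative choice).
low_rate = 10
high_rate = low_rate + 5

# The intercept 25.2 cancels in the subtraction, leaving 3.3 * 5.
predicted_increase = predict_temperature(high_rate) - predict_temperature(low_rate)
print(round(predicted_increase, 1))
```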
correlation
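If 64% of the variation is explained (r² = 0.64), then r = ±√0.64 = ±0.80; the sign of r matches the sign of the slope, so a negative association gives r = -0.80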
Linear Regression: Using the Formulas
REVIEW PROBLEM NUMBER FOUR
Intercept in context: ŷ = 537.8 + (-6.76x)
In a state where 0% of the people are vaccinated, the predicted death rate is 537.8 people per 100,000
Meaning of residual
Negative is over-predicted, positive is under-predicted
-1
Perfect Negative Correlation
1
Perfect positive correlation
Least Squares
Regression line is the line that makes the sum of the squared differences between the observed value of y and the predicted value of y as small as possible
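A minimal sketch of the least squares idea, assuming the standard formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄. The five data points are made up for illustration, and the helper names are hypothetical. The sketch also checks two facts from these cards: nudging the line makes the sum of squared residuals larger, and the mean of the residuals is 0.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (observed - predicted)^2 for the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not from the original cards.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_fit(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
# Any other line, such as one with a slightly steeper slope, does worse.
nudged = sum_squared_residuals(xs, ys, intercept, slope + 0.1)

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

print(round(intercept, 2), round(slope, 2))
print(round(best, 2), round(nudged, 2))
print(round(mean_residual, 6))
```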
The height and age of each child in a random sample of children was recorded. The value of the correlation coefficient between height and age for the children in the sample was 0.8. Based on the least-squares regression line created from the data to predict the height of a child based on age, which of the following is a correct statement?
The proportion of the variation in height that is explained by a regression on age is 0.64.
strong, nonlinear, correlation
arch
A scatterplot of student height, in inches, versus corresponding arm span length, in inches, is shown below. One of the points in the graph is labeled A.
The slope of the least squares regression line increases and the correlation coefficient increases.
Interpret the slope in context: ŷ = 537.8 + (-6.76x)
As the percent vaccinated increases by 1 percentage point, the death rate is predicted to decrease by 6.76 people per 100,000
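Both interpretations of this model (intercept and slope) can be read off by plugging in values. This is a sketch only: the function name is hypothetical, and the 40% and 41% vaccination rates are arbitrary illustrative inputs.

```python
def predicted_death_rate(percent_vaccinated):
    """Model from the cards: predicted deaths per 100,000 people."""
    return 537.8 + (-6.76) * percent_vaccinated

# Intercept: the prediction when 0% of people are vaccinated.
at_zero = predicted_death_rate(0)

# Slope: the change in the prediction for a 1-point increase in percent vaccinated.
# 40 and 41 are arbitrary; any pair of values 1 apart gives the same change.
change_per_point = predicted_death_rate(41) - predicted_death_rate(40)

print(round(at_zero, 1))
print(round(change_per_point, 2))
```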
curved around center
curved pattern, not good fit
x axis
explanatory variable
Estimating the value of a variable outside of the original observation range is called
extrapolation
An engineer believes that there is a linear relationship between the thickness of an air filter and the amount of particulate matter that gets through the filter; that is, less pollution should get through thicker filters. The engineer tests many filters of different thickness and fits a linear model. If a linear model is appropriate, what should be apparent in the residual plot?
There should be no pattern in the residual plot.
A roadrunner is a desert bird that tends to run instead of fly. While running, the roadrunner uses its tail as a balance. A sample of 10 roadrunners was taken, and the birds' total length, in centimeters (cm), and tail length, in cm, were recorded. The output shown in the table is from a least-squares regression to predict tail length given total length. Suppose a roadrunner has a total length of 59.0 cm and tail length of 31.1 cm. Based on the residual, does the regression model overestimate or underestimate the tail length of the roadrunner?
Underestimate, because the residual is positive.
Explanatory variable
a variable that we think explains or causes changes in the response variable
varied distance around center on residual plot
increasing variability; predictions are more accurate for smaller x-values than for larger ones
residual
is the difference between an observed value of the response variable and the value predicted by the regression line (residual=observed-predicted)
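A short sketch of residual = observed - predicted, using the cricket model ŷ = 25.2 + 3.3x from these cards. The observation (20 chirps per minute, 88.0 degrees) is hypothetical, and the helper names are assumptions for illustration.

```python
def predict_temperature(chirps_per_minute):
    """Cricket model from the cards: estimated temperature in degrees Fahrenheit."""
    return 25.2 + 3.3 * chirps_per_minute

def residual(observed, predicted):
    """Residual = observed - predicted."""
    return observed - predicted

def describe(res):
    """Negative residual: model over-predicted. Positive: model under-predicted."""
    if res > 0:
        return "under-predicted"
    if res < 0:
        return "over-predicted"
    return "exact"

# Hypothetical observation, not from the original data set.
observed_chirps = 20
observed_temp = 88.0

predicted_temp = predict_temperature(observed_chirps)
res = residual(observed_temp, predicted_temp)

print(round(predicted_temp, 1), round(res, 1), describe(res))
```

Here the prediction is higher than the observed temperature, so the residual is negative and the model over-predicted.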
correlation r
measures the direction and strength of linear relationship between two quantitative variables
0
no correlation
y hat
predicted value of y
Calculate residual
actual (observed) value of y minus predicted value of y (residual = observed - predicted)
The standard deviation of the residuals tells us how much the ____ y values differ from the ____ y values on average
predicted, actual
coefficient of determination
proportion of variation in response that can be explained by the linear model using explanatory variable as a predictor (r^2)
Explain coefficient of determination (r²) in context
r = 0.87, so r² = 0.87² = 0.7569; 75.69% of the variation in exam score can be explained by the regression model using hours studied as a predictor
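The conversion between r and r² can be sketched in both directions. Going from r² back to r only recovers the magnitude; the sign has to come from the direction of the association. The variable names below are illustrative.

```python
from math import sqrt

# Forward: correlation to coefficient of determination.
r = 0.87
r_squared = r ** 2
print(f"{r_squared:.2%} of the variation is explained")

# Backward: r^2 = 0.64 gives |r| = 0.80; the sign is taken from the slope.
explained = 0.64
r_magnitude = sqrt(explained)
r_if_negative_slope = -r_magnitude
print(round(r_magnitude, 2), round(r_if_negative_slope, 2))
```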
y axis
response variable
Scatterplot
shows the relationship between two quantitative variables measured on the same individuals
even around line on residual plot
uniform scatter, good model
Consider n pairs of numbers (x1,y1), (x2,y2), ..., and (xn, yn). The mean and standard deviation of the x-values are x̄ =5 and sx = 4, respectively. The mean and standard deviation of the y-values are ȳ = 10 and sy = 10 respectively. Of the following, which could be the least squares regression line?
ŷ = 8.5 + 0.3x
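One way to check a candidate line, sketched under two standard facts: the least squares line passes through (x̄, ȳ), and its slope is r · sy / sx, so the slope's magnitude cannot exceed sy / sx because |r| ≤ 1. The function name is hypothetical, and the two failing candidates are made-up distractors, not the original answer choices.

```python
from math import isclose

# Summary statistics given in the problem.
MEAN_X, SD_X = 5, 4
MEAN_Y, SD_Y = 10, 10

def could_be_lsrl(intercept, slope):
    """Check the two necessary conditions for a least squares regression line."""
    passes_through_means = isclose(intercept + slope * MEAN_X, MEAN_Y)
    slope_is_feasible = abs(slope) <= SD_Y / SD_X  # |r| <= 1 means |slope| <= sy/sx
    return passes_through_means and slope_is_feasible

# The answer on the card: 8.5 + 0.3(5) = 10, and |0.3| <= 2.5.
answer_ok = could_be_lsrl(8.5, 0.3)

# Hypothetical distractors (not from the original problem):
misses_means = could_be_lsrl(10.0, 0.3)   # predicts 11.5 at x = 5, not 10
slope_too_steep = could_be_lsrl(-5.0, 3.0)  # passes through (5, 10) but would need r = 1.2

print(answer_ok, misses_means, slope_too_steep)
```

For the card's answer, the implied correlation is r = 0.3 · (4 / 10) = 0.12, which is a valid value.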