Ch.8: Linear Regression
ŷ = b0 + b1x
- b0 = y-intercept
- b1 = slope
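A minimal sketch of fitting ŷ = b0 + b1x by hand, using only the Python standard library; the data values are made up for illustration:

```python
from statistics import mean, stdev

# toy data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
n = len(x)

# correlation r from the average product of z-scores
r = sum((xi - x_bar) / sx * (yi - y_bar) / sy for xi, yi in zip(x, y)) / (n - 1)

b1 = r * sy / sx          # slope
b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)

y_hat = [b0 + b1 * xi for xi in x]   # predicted values
```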
relating the equation to the standardized graph
- moving one standard deviation away from the mean in x moves our estimate r standard deviations away from the mean in y
- moving any number of standard deviations in x moves r times that number of standard deviations in y
residual mean is always
0
"Best Fit" means
Least Squares - the line for which the sum of the squared residuals is smallest
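A quick check of the least-squares idea, with made-up numbers: nudging the fitted slope or intercept in any direction can only increase the sum of squared residuals.

```python
from statistics import mean

# toy data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def ss_resid(b0, b1):
    """Sum of squared residuals for the line y_hat = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# least-squares coefficients (slope = covariance of x,y / variance of x)
x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

best = ss_resid(b0, b1)
# any nudge to intercept or slope does no better than the fitted line
assert all(ss_resid(b0 + d0, b1 + d1) >= best
           for d0 in (-0.1, 0, 0.1) for d1 in (-0.1, 0, 0.1))
```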
Linear Model (Line of Best Fit)
an equation of a straight line through the data
R^2
between 0 and 100%
changing units
doesn't change the correlation but changes the standard deviations
regression to the mean (regression line)
each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was
predicted value
estimate made from a model (ŷ)
1 - r^2
fraction of original variation left in residuals
squared correlation (r^2)
fraction of the data's variation accounted for by the model (the rest, 1 - r^2, is left in the residuals)
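The split between r² (variation accounted for) and 1 - r² (variation left in the residuals) can be verified numerically; again the data are made up for illustration:

```python
from statistics import mean, stdev

# toy data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
n = len(x)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    ((n - 1) * stdev(x) * stdev(y))

b1 = r * stdev(y) / stdev(x)
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

ss_total = sum((yi - y_bar) ** 2 for yi in y)    # original variation in y
ss_left = sum(e ** 2 for e in resid)             # variation left in residuals

# the fraction left over in the residuals is exactly 1 - r^2
assert abs(ss_left / ss_total - (1 - r ** 2)) < 1e-9
```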
equation of the line of standardized points
ẑy = r(zx)
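Predicting through z-scores (ẑy = r·zx, then converting back) gives the same answer as ŷ = b0 + b1x, since b1 = r(sy/sx). A sketch with made-up numbers:

```python
from statistics import mean, stdev

# toy data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
n = len(x)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

def predict(x_new):
    """Predict in standardized units, then convert back to y's units."""
    zx = (x_new - x_bar) / sx
    zy_hat = r * zx              # the standardized regression line
    return y_bar + zy_hat * sy

# same answer as y_hat = b0 + b1*x with b1 = r*sy/sx
b1 = r * sy / sx
b0 = y_bar - b1 * x_bar
assert abs(predict(4) - (b0 + b1 * 4)) < 1e-9
```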
regression assumptions and conditions
- linearity assumption (straight enough condition)
- equal variance assumption (does the plot thicken? condition)
- outlier condition
standard deviation of the residuals (se)
measure of how much the points spread around regression line
data =
model + residual
line of standardized scores
must go through the origin, which corresponds to the point of the means (x̄, ȳ) in the original units
slope of the regression line
over one standard deviation in x, up r standard deviations in ŷ - b1 = r(sy/sx)
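The slope interpretation above ("over one sd in x, up r sds in ŷ") can be checked directly: stepping the prediction from x̄ to x̄ + sx should raise ŷ by exactly r·sy. Numbers are made up for illustration:

```python
from statistics import mean, stdev

# toy data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
n = len(x)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

b1 = r * sy / sx             # slope of the regression line
b0 = y_bar - b1 * x_bar

def y_hat(xv):
    return b0 + b1 * xv

# moving one standard deviation in x moves the prediction r sds in y
rise = y_hat(x_bar + sx) - y_hat(x_bar)
assert abs(rise - r * sy) < 1e-9
```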
percentage of variability in y explained by x
r squared
what can go wrong
see page 189
what have we learned
see page 190
terms
see page 190-191
skills
see page 191
y-intercept
serves only as a starting value for predictions, not to be interpreted as a meaningful predicted value
scatterplot of residuals versus x-values
should be the most boring scatterplot you've ever seen - no distinctive features (direction, shape)
standard deviation of the residuals
se = √(Σe² / (n − 2))
finding residuals
subtract the predicted value from the observed one - a negative residual means the actual value is below the line, a positive one means it is above
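The two residual facts above (residual = observed − predicted, and se = √(Σe² / (n − 2))) in one short sketch, with made-up numbers:

```python
from math import sqrt
from statistics import mean

# toy data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
n = len(x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# residual = observed - predicted; negative puts the point below the line
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
assert abs(sum(resid)) < 1e-9     # the residuals always average to zero

# standard deviation of the residuals, with n - 2 in the denominator
se = sqrt(sum(e ** 2 for e in resid) / (n - 2))
```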
slope of a regression line
for standardized scores, the slope is the correlation coefficient (r); in original units it is b1 = r(sy/sx)
residual
the difference between an observed value of the response variable and the value predicted by the regression line
units of slope
units of y per unit of x
model
with a set of data points, modeling is done by drawing a line through the data and giving its equation