Statistics: Chapter 7
Interpret R^2
#% of the variability in (y-context) can be explained by its linear relationship with (x-context)
Problems with residual plots?
-Curved pattern -Outliers -Fanning
What is a negative residual?
A negative residual tells us the OBSERVED value is less than the PREDICTED value (the model OVERESTIMATED)
What is a positive residual?
A positive residual tells us the OBSERVED value is more than the PREDICTED value (the model UNDERESTIMATED)
What should a scatterplot of the residuals look like?
A scatterplot of the residuals should be the most boring scatterplot you've ever seen; it shouldn't have any interesting features like direction or shape; it should have the same amount of scatter throughout without bends or outliers
A line that fits WELL has...
Very small residuals
After we fit a regression model, we usually plot the...
After we fit a regression model, we usually plot the residuals in the hope of finding NOTHING
What is an R^2 of 100%?
An R^2 of 100% is a perfect fit with no scatter around the line; the Se would be zero
What is the y-intercept?
B0=(mean of the y)-(B1)(mean of x)
What is the slope?
B1= rSy/Sx
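The slope and y-intercept formulas on the two cards above can be sketched in a few lines of Python. This is only an illustration: the data values and the helper name `correlation` are made up for the example and do not come from the chapter.

```python
from statistics import mean, stdev

# Made-up illustrative data (an assumption, not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def correlation(xs, ys):
    """Pearson r: sum of the products of z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys)) / (n - 1)

r = correlation(x, y)
b1 = r * stdev(y) / stdev(x)   # slope: B1 = r * Sy / Sx
b0 = mean(y) - b1 * mean(x)    # y-intercept: B0 = (mean of y) - B1 * (mean of x)

print(round(b1, 3), round(b0, 3))  # prints 0.6 2.2
```

For this data the fitted line would be y-hat = 2.2 + 0.6(x).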
Interpret the slope
Based on the model, for each additional (x-context), we expect an (increase or decrease) in (y-context) of # on average
Interpret the y-intercept
Based on the model, when (x-context) is zero, the expected (y-context) is # on average
Will changing the units of the variables affect the correlation?
Changing the units of the variables doesn't change the correlation
What does each predicted y tend to do?
Each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was
Residual plot
Ideally: NO PATTERN
r= 0.8-0.5
MODERATE
What happens when you move any number of standard deviations in x?
Moving any number of standard deviations in x moves the predicted y by r times that number of standard deviations in y
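One way to check the card above is to convert a prediction to a z-score and compare it with r times the z-score of x. The sketch below uses made-up data, and `predicted_z` is a hypothetical helper name chosen for this example.

```python
from statistics import mean, stdev

# Made-up illustrative data (an assumption, not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
n = len(x)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

b1 = r * sy / sx      # slope
b0 = my - b1 * mx     # y-intercept

def predicted_z(z_x):
    """z-score of y-hat for an x-value that lies z_x standard deviations from the mean of x."""
    x_value = mx + z_x * sx
    y_hat = b0 + b1 * x_value
    return (y_hat - my) / sy

# Each row shows z_x, the z-score of the prediction, and r * z_x; the last two should match.
for z in (1, 2, -1.5):
    print(z, round(predicted_z(z), 4), round(r * z, 4))
```

Because r is between -1 and 1, the prediction is never farther from its mean (in standard deviations) than x was from its own mean.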
What is r=0?
NO LINEAR RELATIONSHIP
What is r=1 or r=-1?
PERFECT linear association
What is R^2?
R^2 is the percentage of variability in y that is explained by x
r= 1.00-0.8
STRONG
Interpret the Standard Deviation of Residuals
The AVERAGE distance between the actual (observed) y-context values and the predicted y-context values is 's' units
What does the size of the residuals tell us?
The SIZE of the residuals tells us how well the line fits the data
The smaller the sum of the squared residuals...
The better the fit
The closer the R^2 is to 100%, then...
The less scatter there is around the line
What is the line of best fit?
The line of best fit is the line for which the sum of the squared residuals is smallest (the least squares line)
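The least squares idea can be illustrated by comparing the sum of squared residuals for the fitted line against a few nearby lines. This is a sketch on made-up data; it uses the equivalent formula slope = Sxy / Sxx (which gives the same value as rSy/Sx), and the nudge sizes are arbitrary choices for the demonstration.

```python
from statistics import mean

# Made-up illustrative data (an assumption, not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sse(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((obs - (b0 + b1 * a)) ** 2 for a, obs in zip(x, y))

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
b1 = sxy / sxx        # least squares slope
b0 = my - b1 * mx     # least squares y-intercept

best = sse(b0, b1)

# Nudge the intercept and slope a little in each direction and recompute.
others = [
    sse(b0 + d0, b1 + d1)
    for d0 in (-0.5, 0, 0.5)
    for d1 in (-0.2, 0, 0.2)
    if (d0, d1) != (0, 0)
]

print(round(best, 3))                  # sum of squared residuals for the fitted line
print(all(best < o for o in others))   # True: every nudged line does worse
```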
What is a residual?
The residual is the difference between the observed value and its associated predicted value: residual = observed - predicted
What does the residual value tell us?
The residual value tells us how far off the model's prediction is at that point
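A short sketch of computing residuals and reading their signs. The data are made up, and the line y-hat = 2.2 + 0.6(x) is assumed to be the fitted model for that data; neither comes from the chapter.

```python
# Made-up illustrative data and an assumed fitted line (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6   # assumed model: y-hat = 2.2 + 0.6 * x

predicted = [b0 + b1 * a for a in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]   # observed - predicted

for a, obs, pred, e in zip(x, y, predicted, residuals):
    if e < 0:
        note = "model overestimated"
    elif e > 0:
        note = "model underestimated"
    else:
        note = "exact"
    print(a, obs, round(pred, 2), round(e, 2), note)
```

At x = 1 the model predicts 2.8 but the observed value is 2, so the residual is -0.8 and the model overestimated by 0.8.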
Should the residuals all share the same underlying spread?
The residuals should all share the same underlying spread, so we make sure that the residual plot has about the same amount of scatter throughout
What does the standard deviation of the residuals (Se) give us?
The standard deviation of the residuals gives us a measure of how far the points spread around the regression line
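As a rough sketch, Se can be computed from the residuals. The card does not spell out a formula, so the n - 2 divisor below is an assumption (it is the usual convention for a line with one predictor), and the data and fitted line are made up for the example.

```python
from math import sqrt

# Made-up illustrative data and an assumed fitted line (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6   # assumed model: y-hat = 2.2 + 0.6 * x

residuals = [obs - (b0 + b1 * a) for a, obs in zip(x, y)]
n = len(residuals)

# Assumption: divide the sum of squared residuals by n - 2.
se = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(se, 3))   # prints 0.894
```

Read in context, the points in this example typically fall about 0.89 y-units from the line.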
What does the variation in the residuals give us?
The variation in residuals is the key to assessing how well the model fits
How do you find the residual?
To find the residuals, you always subtract the predicted value from the observed value: residual = observed - predicted
Do units matter for the slope?
Units do matter for the slope
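A quick sketch of why units matter for the slope but not for the correlation. The data are made up, and treating x as feet converted to inches is a hypothetical choice for this example; `fit` is a helper name introduced only here.

```python
from statistics import mean, stdev

def fit(xs, ys):
    """Return (r, slope) for the least squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / ((n - 1) * sx * sy)
    return r, r * sy / sx

# Made-up illustrative data; pretend x is measured in feet (an assumption).
x_feet = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_inches = [12 * a for a in x_feet]   # same measurements, different units

r_feet, slope_feet = fit(x_feet, y)
r_inches, slope_inches = fit(x_inches, y)

print(round(r_feet, 4), round(r_inches, 4))          # correlation is unchanged
print(round(slope_feet, 4), round(slope_inches, 4))  # slope is 12 times smaller per inch
```

The slope is always in units of y per unit of x, so rescaling x rescales the slope, while r has no units at all.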
r= 0.5-0
WEAK
What do we call the estimates made from a model?
We call an estimate made from a model the predicted value and write it as y-hat
Is it a good idea to make a histogram of the residuals?
Yes, because if we see a UNIMODAL, SYMMETRIC histogram, then we can apply the 68-95-99.7 rule to see how well the regression model describes the data
What variations do you compare?
You compare the variation of the response variable with the variation of the residuals
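The comparison on the card above is one way to arrive at R^2: the fraction of the variation in y that is left over in the residuals, subtracted from 1. The sketch below uses made-up data and checks that the result agrees with squaring r.

```python
from statistics import mean, stdev, variance

# Made-up illustrative data (an assumption, not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
n = len(x)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

b1 = r * sy / sx
b0 = my - b1 * mx
residuals = [obs - (b0 + b1 * a) for a, obs in zip(x, y)]

# Fraction of the variation in y that the line does NOT account for.
leftover = variance(residuals) / variance(y)
r_squared = 1 - leftover

print(round(r_squared, 3), round(r ** 2, 3))   # prints 0.6 0.6
```

Here about 60% of the variability in y is explained by its linear relationship with x, and the remaining 40% stays in the residuals.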
What is the regression line?
y-hat=B0+B1(x)