unit 8 prob and stat
What is a residual and how is it calculated?
A residual is how far a piece of data (dot) is from the best fit line. Whether it is above or below the line. It is calculated by subtracting the predicted value from the observed value
How many residuals does a set of data have?
A set of data will have many residuals. Some will be positive (if the actual value is above the best fit line) and some will be negative (if the actual value is below the best fit line). The number of residuals a set of data has is equal to the number of points a set of data has.
The R2 value shows how much of the variation in the response variable can be accounted for by the linear regression model. If R2 = 0.95, what can be concluded about the relationship between x and y? _____ % of the variability in _____ is accounted for by the linear relationship with _____.
change*- 95% of the variability in the Y is accounted for by the linear relationship with the X. (this is just something we all have to memorize, % in the Y with the X)
The line of best fit always passes through which point?
the origin (0,0)
Explain how to construct a residual plot.
In a calculator go to STAT EDIT and put the plots under RESID. Set up a STATPLOT Plot 2 as a scatterplot with Xlist:YR and YLIST:RESID. Go to Y= screen and hit ENTER and turn off the regression line and Plot1 and turn on Plot2. ZoomStat.
What does a negative residual indicate? A positive residual? A residual of zero?
Observed - Predicted = Residual) A negative residual= The model's perdiction was too high. (overestimate) A positive residual= The model's prediction was too low. (underestimate) Residual of 0= The models prediction matched the observed value exactly. (model=reality) <Highly unlikely in real life situations.
What conditions are necessary before using a linear model for a set of data?
See that the data satisfies the straight enough condition by checking to see if the scatterplot looks reasonably straight. (you should also check linearity when examining the residuals). You must also check for outliers because they can greatly influence a regression model.
What are the parameters of the Normal model?
The Normal model's parameters are mean and standard deviation
What is meant by a line of best fit?
The line of best fit is the line for which the sum of the squared residuals is the smallest. Which means that when you square all of the residuals of a set of data, the line best fit runs through the sum which is the smallest
If a least-squares regression line fits the data well, what characteristics should the residual plot exhibit? Sketch a well-labeled example.
The residual plot should be in a horizontal direction, have a shapeless form, and it should have roughly equal scatter for all predicted values.
Explain the quote (by George Box, a famous statistician), "All models are wrong, but some are useful."
We can use the model to help us find an appropriate equation. The equation can be used to give us a prediction. The prediction is not 100% accurate because it cannot relate to every real life situation. However, it can still be useful by serving as a visual aid to help us plan/predict real life situations. Stats doesn't account for real life. The line a model gives you will never be exactly correct. Models can be a useful visual aid and good for predictions, however models are only for that use. They are not to be relied on to be completely accurate because nothing in real life is exact or perfect in relation to a model. Also, models are too specific, when referring to real life situations everything needs to be vague because nothing comes out exactly as it was predicted
Describe the difference in notation between y and yˆ
Y-hat is the predicted value, which would be on the line of best fit; Y is the actual real life value.