Chapter 7: Linear Regression
What does a negative residual indicate? A positive residual? A residual of zero? (Observed - Predicted = Residual)
A negative residual= The model's perdiction was too high. (overestimate) A positive residual= The model's prediction was too low. (underestimate) Residual of 0= The models prediction matched the observed value exactly. (model=reality)
What is a residual and how is it calculated?
A residual is how far a piece of data (dot) is from the best fit line. Whether it is above or below the line. It is calculated by subtracting the predicted value from the observed value.
How many residuals does a set of data have?
A set of data will have many residuals. Some will be positive (if the actual value is above the best fit line) and some will be negative (if the actual value is below the best fit line). *The number of residuals a set of data has is equal to the number of points a set of data has.
residual
difference between observed value and its associated value, can be positive or negative. If it is below the line of best fit = negative...if it is above line of best fit= positive
predicted value
estimate made form a model
slope
gives the "y-units per x-unit"; the value of y where the line crosses the y-axis and b1
coefficient of determination
measures the success of the regression model in terms of the fraction of the variation of y accounted for by the regression
parameter
numbers in the model that have to chosen to explicitly determine the value of the model
r-squared
the fraction of the data's variance accounted for by the model (often given as a percent.)
line of best fit
the line for which the sum of the squared residuals is smallest; line that fits the data points the best with equal amounts of data on each side of it
What is meant by a line of best fit?
the line for which the sum of the squared residuals is the smallest. Which means that when you square all of the residuals of a set of data, the line best fit runs through the sum which is the smallest
The line of best fit always passes through which point?
the origin (0,0)
regression line
the particular linear equation that satisfies the least squares criterion
The R2 value shows how much of the variation in the response variable can be accounted for by the linear regression model. If R2 = 0.95, what can be concluded about the relationship between x and y?
change*- 95% of the variability in the Y is accounted for by the linear relationship with the X. (this is just something we all have to memorize, % in the Y with the X)
linear model
an equation of a straight line through the data
Explain how to construct a residual plot.
In a calculator go to STAT EDIT and put the plots under RESID. Set up a STATPLOT Plot 2 as a scatterplot with Xlist:YR and YLIST:RESID. Go to Y= screen and hit ENTER and turn off the regression line and Plot1 and turn on Plot2. ZoomStat.
What conditions are necessary before using a linear model for a set of data?
See that the data satisfies the straight enough condition by checking to see if the scatterplot looks reasonably straight. (you should also check linearity when examining the residuals). You must also check for outliers because they can greatly influence a regression model
What are the parameters of the Normal model?
The Normal model's parameters are mean and standard deviation.
If a least-squares regression line fits the data well, what characteristics should the residual plot exhibit?
The residual plot should be in a horizontal direction, have a shapeless form, and it should have roughly equal scatter for all predicted values.
Describe the difference in notation between y and y hat.
Y-hat is the predicted value, which would be on the line of best fit; Y is the actual real life value.