Stats 1430 Midterm (Correlation and Regression)
coefficient of determination
% of variability in y that is explained by x. R^2 works for all relationships, NOT JUST LINEAR ONES. R^2 has to be between 0 and 1.
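A minimal sketch of how R^2 could be computed for a best fit line, assuming the usual definition R^2 = 1 - SSE/SST (SST = total variability of y around its mean). The five data points are made up for illustration, and the variable names are not from the course.

```python
# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope (b1) and y-intercept (b0).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# SSE: variability in y left over after using the line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
# SST: total variability in y around its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.6
```

For this made-up data, about 60% of the variability in y is explained by x.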
What does r (correlation) have to be between?
-1 and 1
Things Residuals Should Not Have
1. pattern; 2. systematic changes as x increases; 3. unusually large values of a residual; 4. influential points (outliers in the x direction)
What should be true of the residual plot if the best line fits well?
Looks flat. No pattern (should have random scatter about the horizontal line at 0). No systematic change as X increases (ex: the points fan out as X increases). Few unusually large values of a residual (outliers in the y direction). Few influential points (outliers in the x direction).
How do scatter plots relate to residual plots
Points above the best fit line on the scatter plot have positive residuals. Points below the best fit line on the scatter plot have negative residuals. Points on the line y = 0 on the residual plot have a residual of 0 (they sit exactly on the best fit line). Value on the residual plot = vertical distance from the best fit line on the scatter plot.
What type of data do correlation and regression deal with?
Quantitative
What percentage of the variability in Y is explained by X in this case? Or vice versa?
R^2 (coefficient of determination)
How do residuals relate to SSE?
SSE is the sum of the squared residuals
How does switching X and Y affect the regression line?
The slope and y-intercept change (r does not change)
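A small sketch of this, using made-up data and a hypothetical helper named least_squares (not from the course). It fits the line twice, once predicting y from x and once predicting x from y.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0_yx, b1_yx = least_squares(xs, ys)  # predict y from x
b0_xy, b1_xy = least_squares(ys, xs)  # predict x from y

print(round(b0_yx, 2), round(b1_yx, 2))  # 2.2 0.6
print(round(b0_xy, 2), round(b1_xy, 2))  # -1.0 1.0
```

Note that the second line is not just the first one solved for x (that would give a slope of 1/0.6), so the choice of which variable is x matters.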
What in the correlation formula is affected by outliers?
Xbar, Ybar, Sx, and Sy
In the best fit line equation, what parts are chosen to find the best fit line?
b0 (y-intercept) and b1 (slope), chosen so that SSE is as small as possible
What do you have to find before solving for b0?
b1 (the slope), because b0 = ybar - b1 * xbar
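A sketch of the order of the calculation, assuming the standard formulas b1 = r * (sy / sx) and b0 = ybar - b1 * xbar. The data is made up for illustration; statistics.stdev is the sample standard deviation from Python's standard library.

```python
import statistics

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviation of x
s_y = statistics.stdev(ys)  # sample standard deviation of y

# Correlation r.
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

# Step 1: the slope has to come first.
b1 = r * s_y / s_x
# Step 2: the y-intercept uses the slope.
b0 = y_bar - b1 * x_bar

print(round(b1, 2), round(b0, 2))  # 0.6 2.2
```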
slope
change in y over a 1 unit change in x. ex: slope = -553.7 / 1: as temp (x) increases by 1 F, coffee sales decrease by 553.7 cups on average
What does skewness mean?
the distribution is not symmetric: one tail is longer than the other (often because of several outliers on the same side)
What does the summation of (x - xbar)(y - ybar) show in the correlation formula?
how x and y move together
What does (sx)(sy) show in the correlation formula?
how x and y move separately
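The two pieces of the correlation formula can be sketched like this, assuming the usual form r = sum((x - xbar)(y - ybar)) / ((n - 1) * sx * sy). The function name and the data are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Sample correlation r between two lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: how x and y move together.
    together = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sample standard deviations: how x and y each move on their own.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return together / ((n - 1) * s_x * s_y)

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 3))  # 0.775
```

Calling correlation(ys, xs) gives the same value, since switching x and y doesn't change r.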
Residual
looking at individual points (how far is the point from the best fit line?). residual = observed - predicted
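A short sketch of residual = observed - predicted. The data is made up, and the line y = 2.2 + 0.6x is the least squares line for that made-up data, not a value from the course.

```python
# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares line for the data above (assumed here, not derived).
b0, b1 = 2.2, 0.6

residuals = []
positions = []
for x, y in zip(xs, ys):
    predicted = b0 + b1 * x   # y value calculated from the line
    residual = y - predicted  # observed - predicted
    residuals.append(residual)
    # Positive residual: point is above the line. Negative: below.
    if residual > 0:
        positions.append("above")
    elif residual < 0:
        positions.append("below")
    else:
        positions.append("on")

for x, res, pos in zip(xs, residuals, positions):
    print(f"x={x}: residual={res:+.1f} ({pos} the line)")
```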
Extrapolation
making predictions outside the range of x. DON'T DO THIS
How do you find the best fit line?
minimize the sum of squared differences between the points and the line (smallest SSE)
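To illustrate "smallest SSE", this sketch computes SSE for the least squares line and for two other candidate lines through the same made-up data. The helper name sse and the alternative lines are chosen only for illustration.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares slope and y-intercept.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

best = sse(b0, b1, xs, ys)            # about 2.4
shifted = sse(b0 - 0.2, b1, xs, ys)   # same slope, lower intercept
steeper = sse(b0, b1 + 0.1, xs, ys)   # same intercept, steeper slope

print(best < shifted, best < steeper)  # True True
```

Any line other than the least squares line gives a larger SSE for the same data.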
What does a correlation of 0 mean?
no LINEAR relationship. There may still be a relationship, it just isn't linear
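A quick sketch of a case where r is 0 even though y is completely determined by x. The data (y = x^2 on x values symmetric around 0) is made up for illustration, and the function name is hypothetical.

```python
import math

def correlation(xs, ys):
    """Sample correlation r between two lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    together = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return together / ((n - 1) * s_x * s_y)

# Made-up data: a perfect curved (quadratic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # [4, 1, 0, 1, 4]

r = correlation(xs, ys)
print(abs(r) < 1e-12)  # True: no linear relationship
```

This is why a scatter plot should be checked before reading r as "no relationship".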
What are the properties of correlation?
only 2 quantitative variables. LINEAR RELATIONSHIP only. Has no units. Switching x and y doesn't change r.
Predicted line
y value that is calculated from the best fit line
Observed line
y value that is given in the problem
What does SSE stand for?
sum of squares for error of ANY given line going through the data
Regarding slope, what happens if the change in y increases by a factor (ex: 5)?
the change in x changes by that same factor (ex: x * 5), because the slope stays fixed
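A tiny sketch of why the two changes scale together, using the coffee sales slope of -553.7 from the slope card. The function name is made up; only the slope comes from these notes.

```python
# Slope from the coffee example: cups of coffee per 1 F of temperature.
slope = -553.7

def predicted_change_in_y(change_in_x):
    """Predicted change in y for a given change in x (slope is fixed)."""
    return slope * change_in_x

one_degree = predicted_change_in_y(1)    # change in sales for +1 F
five_degrees = predicted_change_in_y(5)  # change in sales for +5 F

# The change in y is 5 times larger exactly when the change in x is.
print(round(five_degrees / one_degree, 6))  # 5.0
```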
What is correlation (r) affected by?
outliers and skewness
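A sketch of how much a single outlier can move r. Both data sets are made up for illustration: the first is perfectly linear, the second adds one point far from the pattern.

```python
import math

def correlation(xs, ys):
    """Sample correlation r between two lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    together = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return together / ((n - 1) * s_x * s_y)

# Made-up data: perfectly linear (y = 2x).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = correlation(xs, ys)

# Same data plus one outlier at (6, 0).
r_with_outlier = correlation(xs + [6], ys + [0])

print(round(r_clean, 3), round(r_with_outlier, 3))  # 1.0 0.143
```

The outlier shifts Xbar, Ybar, Sx, and Sy all at once, which is why r drops so sharply.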
Influential Point
outliers in the X direction
Large residual values
outliers in the y direction
How to interpret a scatterplot?
pattern, direction, strength. ex: there is a strong, negative linear relationship between temperature and coffee sales
What does a correlation of -1 or 1 mean?
perfect linear relationship (-1: perfect negative, 1: perfect positive)
What does a positive residual indicate?
point is above the best fit line
What does a negative residual indicate?
point is below the best fit line