AP Stats: Unit 3 Outcomes
Find an appropriate non-linear model when the simple linear model is not appropriate, and confirm the choice with evidence from the residual plot.
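1. Determine the type of function from the patterns on the ordinary scatterplot and the residual plot
   - quadratic and exponential
   - logarithmic and square root
   - power
2. Force the data to be linear by undoing the function
   - either apply the inverse of the function to "y" (e.g., √y for a quadratic, log(y) for an exponential) or apply the function itself to "x" (e.g., x² for a quadratic, log(x) for a logarithmic model); for a power model, take logs of both
   - the variable that is not transformed keeps its original values
   - create a third list holding the transformed values
   - keep track of which values each list holds
3. Run linear regression with the new lists
   - the goal is a strong, linear relationship
4. Plot the residual plot with the new lists after the linear regression
   - look for the transformation whose residual plot is the most randomly scattered (or, among equally good ones, the most convenient)
5. When writing the LSRL, include every function applied to "x" or "y" (e.g., log(ŷ) = a + bx, not ŷ = a + bx)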
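Steps 2 through 5 can be sketched in Python for an exponential pattern. This sketch assumes natural logs (a calculator's base-10 log works the same way) and uses made-up data. `linear_regression` is a hypothetical helper that computes the same least-squares line a calculator's linear regression reports.

```python
import math

def linear_regression(xs, ys):
    """Hypothetical helper: least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data that follow an exponential pattern exactly: y = 2 * 3^x.
xs = [0, 1, 2, 3, 4]
ys = [2 * 3 ** x for x in xs]

# Step 2: undo the exponential by taking ln(y); x keeps its original values.
# ln_ys plays the role of the "third list" of transformed values.
ln_ys = [math.log(y) for y in ys]

# Step 3: run linear regression on (x, ln y).
a, b = linear_regression(xs, ln_ys)

# Step 4: residuals of the transformed fit (essentially zero here,
# because the made-up data are exactly exponential).
residuals = [ly - (a + b * x) for x, ly in zip(xs, ln_ys)]

# Step 5: the LSRL must show the transformation applied to y.
print(f"ln(y-hat) = {a:.4f} + {b:.4f}x")
# Undoing the log recovers the exponential model y-hat = e^a * (e^b)^x.
print(f"y-hat = {math.exp(a):.4f} * {math.exp(b):.4f}^x")
```

With real data the residuals in step 4 would not be zero. The residual plot of (x, residual) is what confirms the transformation.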
Offer possible confounding factors to explain why two variables may be correlated even though one does not necessarily cause the other.
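Correlation does not imply causation: a correlation between "x" and "y" does not prove that "x" causes "y," since a confounding (lurking) variable may be affecting both.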
Interpret the slope, y-intercept (if possible), and the correlation coefficient.
Slope: "For each 1 (x unit) increase in (x variable), the predicted (y variable) (increases / decreases) by approximately (slope) (y unit)."
Y-Intercept: "When (x variable) is 0 (x unit), the predicted (y variable) is (y-intercept) (y unit)."
Correlation Coefficient: "There is a (direction) (strength) linear association between (x variable) and (y variable)."
Find and interpret the coefficient of determination.
coefficient of determination: How much of the variation in "y" is attributed to "x"?
r^2: proportion of the variation in "y" that is attributed to the approximately linear relationship with "x"
- stays the same no matter which variable is "x" and which is "y"
- non-resistant, affected by outliers
"Approximately (r^2 as a percent)% of the variation in (y variable) can be explained by the LSRL of (y variable) on (x variable)."
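As a check on the definition, r^2 can be computed as 1 - SSE/SST, where SSE is the variation left unexplained by the LSRL and SST is the total variation in "y." The sketch below assumes made-up data, and `lsrl` and `r_squared` are hypothetical helper names.

```python
def lsrl(xs, ys):
    """Hypothetical helper: least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(xs, ys):
    """Proportion of the variation in y explained by the LSRL: 1 - SSE/SST."""
    b0, b1 = lsrl(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation in y
    return 1 - sse / sst

# Made-up bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(f"r^2 = {r_squared(xs, ys):.2f}")             # r^2 = 0.60
# Switching which variable is "x" and which is "y" gives the same r^2.
print(f"r^2 (x on y) = {r_squared(ys, xs):.2f}")    # r^2 (x on y) = 0.60
```

Using the template, this reads: approximately 60% of the variation in "y" can be explained by the LSRL of "y" on "x."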
Find the Pearson Correlation Coefficient given bivariate data as a measure of the extent to which the variables have a linear relationship.
correlation coefficient: Is it a good line?
r: numerical measure of the direction and strength of a linear relationship between two quantitative variables
Properties:
- possible values lie on the interval [-1, 1]
- the size of "r" (|r|) determines strength: the closer |r| is to 1, the closer the points lie to the LSRL
- the sign of "r" determines positive / negative correlation
- has no units, so it does not depend on the units of measurement
- unchanged if "x" and "y" are switched
- non-resistant, affected by outliers
- only useful for linear relationships
"There is a (direction) (strength) linear association between (x variable) and (y variable)."
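Pearson's r can be computed as r = (1 / (n - 1)) Σ z_x·z_y, the average product of z-scores. The sketch below uses that formula on made-up data and checks two of the properties above: switching "x" and "y," and changing units (inches to centimeters, assumed purely for illustration). `sample_sd` and `pearson_r` are hypothetical helper names.

```python
import math

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Made-up data; treat xs as measured in inches.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_swapped = pearson_r(ys, xs)                       # switch "x" and "y"
r_rescaled = pearson_r([2.54 * x for x in xs], ys)  # inches -> centimeters
print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))  # 0.7746 0.7746 0.7746
```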
Identify numerical data as having a strong or weak and a positive, negative, or no linear relationship using the Pearson coefficient.
little to no correlation: r = 0
weak correlation: |r| in (0, 0.5)
moderate correlation: |r| in [0.5, 0.8)
strong correlation: |r| in [0.8, 1]
positive correlation: as "x" increases, "y" tends to increase (direct relationship between "x" & "y")
negative correlation: as "x" increases, "y" tends to decrease (inverse relationship between "x" & "y")
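These cutoffs translate directly into a small labeling function. The cutoffs are the rules of thumb from these notes, not a universal standard, and `describe_r` is a hypothetical name.

```python
def describe_r(r):
    """Label r using the cutoffs in these notes (rules of thumb, not fixed standards)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie in [-1, 1]")
    if r == 0:
        return "little to no correlation"
    size = abs(r)  # strength depends on |r|, not on the sign
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"  # the sign gives the direction
    return f"{strength} {direction} correlation"

for value in (0.92, -0.65, 0.3, 0.0):
    print(value, describe_r(value))
```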
Find outliers and influential points.
outlier: data point with a large residual (in a regression setting with two variables)
- has the potential to change the LSRL substantially if removed
influential point: point whose removal would substantially change the LSRL
- strongly influences where the LSRL is located
- often a point with an extreme "x" value
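One way to test whether a point is influential is to fit the LSRL with and without it and compare. The sketch below uses made-up data plus one hypothetical point (15, 0) that lies far from the other "x" values. `lsrl` is a hypothetical helper.

```python
def lsrl(xs, ys):
    """Hypothetical helper: least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up data, then the same data plus one point far out in x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = lsrl(xs, ys)
b0_with, b1_with = lsrl(xs + [15], ys + [0])

print(f"without (15, 0): y-hat = {b0:.2f} + {b1:.2f}x")            # y-hat = 2.20 + 0.60x
print(f"with (15, 0):    y-hat = {b0_with:.2f} + ({b1_with:.2f})x")  # y-hat = 4.64 + (-0.26)x
```

The slope changes sign, so (15, 0) is influential. In the fit that includes it, its own residual is small (about -0.72) because it pulls the line toward itself. An influential point therefore does not always show up as a large residual.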
Create scatterplots of the residuals against "x" (or against the predicted values ŷ).
residual plot: Is a line appropriate? Used to judge whether the association between "x" & "y" is linear
- turns the scatterplot of (x, y) into a scatterplot of (x, residual)
- the LSRL becomes the horizontal axis (residual = 0)
- must run linear regression before graphing residuals
- can be graphed against quantities other than "x," such as the predicted values ŷ
Calculate and interpret residuals.
residual: vertical deviation of a data point from the LSRL
- found by the equation residual = y - ŷ (observed minus predicted)
- the residuals always sum to zero: the total vertical distance of points above the line equals that of points below it
overestimate: point falls below the LSRL (negative residual)
- actual value is less than the predicted value
underestimate: point falls above the LSRL (positive residual)
- actual value is more than the predicted value
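Residuals follow from the same definitions: compute ŷ for each point, subtract it from the observed "y," and read the sign. The sketch below uses made-up data, and `lsrl` is a hypothetical helper.

```python
def lsrl(xs, ys):
    """Hypothetical helper: least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = lsrl(xs, ys)  # y-hat = 2.2 + 0.6x

residuals = []
for x, y in zip(xs, ys):
    y_hat = b0 + b1 * x
    residual = y - y_hat  # observed minus predicted
    residuals.append(residual)
    if residual > 0:
        note = "underestimate (point above the LSRL)"
    elif residual < 0:
        note = "overestimate (point below the LSRL)"
    else:
        note = "on the LSRL"
    print(f"x={x}: residual = {residual:+.1f}, {note}")

# Up to floating-point rounding, the residuals sum to zero.
print("residuals sum to zero:", abs(sum(residuals)) < 1e-9)
```

Plotting the pairs (x, residual) from this loop gives the residual plot.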
Find the LSRL given either a data set or the summary statistics.
Given Set of Data:
- use linear regression on the calculator with the original data
- slope is "b" in the calculator's linear regression
- y-intercept is "a" in the calculator's linear regression
Given Set of Summary Statistics:
ŷ = b0 + b1x
- where ŷ: predicted value of "y"
- where b0: y-intercept
- where b1: slope
b0 = ȳ - b1x̄
- where b0: y-intercept
- where ȳ: mean of "y"
- where b1: slope
- where x̄: mean of "x"
b1 = r (Sy / Sx)
- where b1: slope
- where r: correlation coefficient
- where Sy: standard deviation of "y"
- where Sx: standard deviation of "x"
Given Computer Output:
- read the slope and y-intercept from the coefficients in the output
- slope: coefficient in the row labeled with the "x" variable
- y-intercept: coefficient in the row labeled "Constant"
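The summary-statistics route is pure arithmetic. The sketch below applies b1 = r (Sy / Sx) and b0 = ȳ - b1x̄ to hypothetical summary values, then predicts with ŷ = b0 + b1x. `lsrl_from_summary` and `predict` are hypothetical names.

```python
def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Slope b1 = r * (Sy / Sx); intercept b0 = y-bar - b1 * x-bar."""
    b1 = r * (s_y / s_x)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def predict(b0, b1, x):
    """Predicted value y-hat = b0 + b1 * x."""
    return b0 + b1 * x

# Hypothetical summary statistics.
b0, b1 = lsrl_from_summary(x_bar=10, y_bar=50, s_x=2, s_y=8, r=0.75)
print(f"y-hat = {b0:g} + {b1:g}x")  # y-hat = 20 + 3x
print(predict(b0, b1, 12))          # 56.0
```

Because b0 = ȳ - b1x̄, the LSRL always passes through (x̄, ȳ). Here, predicting at x = 10 returns 50.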
Predict values of the response variable using the LSRL.
LSRL: Least Squares Regression Line
- uses "x" data to predict "y" data
- gives the best fit to the data set
y-intercept: predicted value of "y" when "x" is equal to zero
slope: predicted change in "y" when "x" increases by one unit
Describe the characteristics of LSRL.
LSRL: Least Squares Regression Line
- uses "x" data to predict "y" data
- gives the best fit to the data set
- minimizes the sum of the squares of the vertical deviations (residuals) from the line
- non-resistant, affected by outliers
y-intercept: predicted value of "y" when "x" is equal to zero
- does not always have a real-life meaning
- "a" in calculator linear regression
- "b0" in summary statistics
- "Constant" in computer-generated output
slope: predicted change in "y" when "x" increases by one unit
- "b" in calculator linear regression
- "b1" in summary statistics
- coefficient of the "x" variable in computer-generated output
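The "least squares" property can be checked numerically: nudging the intercept or slope away from the LSRL always increases the sum of squared residuals (SSE). The sketch below uses made-up data and arbitrary nudges chosen for illustration. `lsrl` and `sse` are hypothetical helpers.

```python
def lsrl(xs, ys):
    """Hypothetical helper: least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = lsrl(xs, ys)
best = sse(xs, ys, b0, b1)

# Nudge the intercept and/or slope; every nudged line has a larger SSE.
nudges = [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.3, -0.1)]
others = [sse(xs, ys, b0 + d0, b1 + d1) for d0, d1 in nudges]
print(f"LSRL SSE = {best:.2f}")                # LSRL SSE = 2.40
print(all(other > best for other in others))   # True
```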
Use the residual plot to determine if two variables have a linear association.
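Linear:
- residual plot shows random scatter with no pattern
- residuals are spread evenly above and below zero across the range of "x"
Not Linear:
- a pattern (such as a curve) forms on the residual plot
- residuals are spread unevenly above and below zero (e.g., positive at both ends and negative in the middle)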
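A curved pattern makes the "Not Linear" case concrete. The sketch below fits an LSRL to made-up data that follow y = x² exactly and prints the residuals. `lsrl` is a hypothetical helper.

```python
def lsrl(xs, ys):
    """Hypothetical helper: least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up curved data: y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
b0, b1 = lsrl(xs, ys)

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. That U shape means a line is not appropriate here, and a quadratic model (step 1 of the non-linear procedure) would be a better candidate.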
Determine whether the correlation coefficient, the coefficient of determination, and the LSRL are resistant.
Non-Resistant: affected by outliers
- correlation coefficient
- coefficient of determination
- LSRL
outlier: data point with a large residual (in a regression setting with two variables)
- has the potential to change the LSRL substantially if removed
influential point: point whose removal would substantially change the LSRL
- strongly influences where the LSRL is located
- often a point with an extreme "x" value
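Non-resistance of r and r^2 shows up even with a single point. The sketch below computes r for made-up data, then again after adding one hypothetical point (3, 15) with a very large residual. `pearson_r` is a hypothetical helper that uses the equivalent formula r = Sxy / √(Sxx·Syy).

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, then the same data plus one outlier with a large residual.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
r_out = pearson_r(xs + [3], ys + [15])

print(f"without: r = {r:.3f}, r^2 = {r ** 2:.3f}")  # without: r = 0.775, r^2 = 0.600
print(f"with:    r = {r_out:.3f}, r^2 = {r_out ** 2:.3f}")
```

One added point drops r from about 0.77 to about 0.18, which is why neither r nor r^2 is resistant.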