Correlation and regression analysis
spurious correlation
An apparent correlation between two variables that have no causal link. For example, the price of petrol shows a positive correlation with the divorce rate over time.
regression line
A straight-line equation used to model the relationship between the dependent and independent variables.
coefficient of determination (R²)
Not all points will lie on the line, but a straight line can summarise the pattern of the data. The proportion of the total variability of the dependent variable Y that is explained by the regression on X is called R². It is often quoted as a measure of the goodness of fit of the regression line to the data, and is equal to the square of the correlation coefficient r.
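As a worked form of the definition above, R² can be written as one minus the unexplained share of the variability. The symbols are assumptions not used in these notes: y_i are the observed values, \hat{y}_i the values predicted by the line, and \bar{y} the mean of Y. The equality with r² holds for a simple linear regression fitted by least squares.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = r^2
```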
correlation analysis
Correlation measures the strength of the linear relationship between two quantitative variables.
simple linear regression
Describes quantitatively the linear relationship between a dependent variable Y and an independent variable X. Regression is used when one variable is thought to affect or predict the other: the value of X is used to predict Y. The relationship between the two quantitative variables is quantified by a regression line.
Spearman's rank correlation
Use this when the two variables are not normally distributed, or for ordinal scales and ranked data. Because it works on ranks rather than raw values, outliers have little effect on it.
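A minimal sketch of how a rank correlation can be computed, assuming there are no tied values so that the shortcut formula rho = 1 - 6Σd²/(n(n² - 1)) applies. The function names and the data are hypothetical and chosen only for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from the squared differences between paired ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: the last y value is an extreme outlier,
# but its rank is still just "largest", so the ranks agree perfectly.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 7, 100]
rho = spearman_rho(x, y)
```

Because only the ordering of the values matters, replacing 100 with 8 would give the same result.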
line of best fit
The line that comes closer to all the points than any other line. It is fitted by the method of least squares, which chooses the line that minimises the sum of the squared vertical distances between the line and the points.
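The least squares idea can be sketched in a few lines, assuming a single predictor. The function names and the data are hypothetical; the gradient and intercept formulas are the standard closed-form least squares solution.

```python
def least_squares_line(x, y):
    """Return (gradient, intercept) of the least squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def sum_squared_residuals(x, y, gradient, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (gradient * xi + intercept)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
gradient, intercept = least_squares_line(x, y)
best = sum_squared_residuals(x, y, gradient, intercept)
```

Any other gradient or intercept passed to `sum_squared_residuals` for the same data gives a larger total than `best`, which is what "line of best fit" means here.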
multiple linear regression
Simple linear regression can be extended to include more than one predictor variable in the regression equation. For example, you can investigate the effect of both height and age on shoe size.
Pearson correlation coefficient
r is a measure of the linear association between two continuous numerical variables that are normally distributed.
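Written out, the shoe size example might take the following form. This extends the y = mx + c notation used in these notes; the subscripted gradients m_1 and m_2 are assumed symbols, one for each predictor.

```latex
\text{shoe size} = m_1 \times \text{height} + m_2 \times \text{age} + c
```

Each gradient gives the change in shoe size for a unit change in that predictor while the other predictor is held fixed.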
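A minimal sketch of how r can be calculated from paired data, using the usual definition of the covariance of X and Y divided by the product of their spreads. The function name and the data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
```

The result always lies between -1 and +1; points lying exactly on a rising straight line give +1 and points on a falling line give -1.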
regression assumptions
The relationship between X and Y is approximately linear; this can be checked with a scatter plot. Do not fit a straight line to a non-linear relationship. The dependent variable is normally distributed; this can be checked with a histogram or boxplot.
when not to use correlation coefficient
When there is a non-linear relationship between the variables (correlation can miss a strong non-linear relationship), when outliers are present, or when there are distinct subgroups.
equation of a straight line
y = mx + c, where y is the dependent variable, x is the independent variable, c is the intercept (the value of y when x = 0) and m is the gradient. The gradient shows the change in y for a unit change in x: it is positive when y increases as x increases, and negative when y decreases as x increases. The equation of the regression line gives the value of y for different values of x.
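A small sketch of using the equation to predict y from x. The gradient and intercept values are hypothetical, not taken from any fitted data.

```python
def predict(x, m, c):
    """Return the value of y on the line y = m*x + c."""
    return m * x + c


m, c = 0.5, 2.0  # hypothetical gradient and intercept

y_at_zero = predict(0, m, c)                        # equals the intercept c
unit_change = predict(4, m, c) - predict(3, m, c)   # equals the gradient m
```

Increasing x by one unit changes the prediction by exactly m, and the prediction at x = 0 is c.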