Statistics: Linear Relationships
Properties of r
1) The correlation does not change when the units of measurement of either variable change. In other words, changing the units of measurement of the explanatory variable, the response variable, or both has no effect on the correlation (r).
2) The correlation measures only the strength of a linear relationship between two variables. It ignores any other type of relationship, no matter how strong that relationship is.
3) A value of r close to 1 or -1 is not enough by itself (without looking at the scatterplot) to conclude that the relationship is linear.
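A quick numerical check of property 1, as a minimal sketch using made-up height and weight values (hypothetical data, not from the source). Converting inches to centimeters and pounds to kilograms multiplies every value by a positive constant, and r comes out the same. The helper name `pearson_r` is hypothetical; it implements the usual sample correlation formula.

```python
import math
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: heights in inches, weights in pounds.
height_in = [60, 62, 65, 68, 70, 73]
weight_lb = [115, 120, 140, 155, 165, 190]

# The same measurements expressed in different units.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.4536 for w in weight_lb]

r_original = pearson_r(height_in, weight_lb)
r_converted = pearson_r(height_cm, weight_kg)
print(math.isclose(r_original, r_converted))  # True
```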
Equation of a straight line
A line is described by a set of points (X,Y) that obey a particular relationship between X and Y. That relationship is called the equation of the line, which we will express in the following form:
Y = a + bX
In this equation, a and b are constants that can be positive, negative, or zero. We write the line in this form because the constants a and b tell us what the line looks like:
The intercept a is the value of Y when X = 0.
The slope b is the change in Y for every increase of 1 unit in X.
Note that if either the intercept (a) or the slope (b) is negative, the corresponding blue arrow on the diagram above would point downward.
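A small sketch of what a and b mean, using a hypothetical line Y = 3 + 2X (the numbers are illustrative only). Evaluating the line at X = 0 returns the intercept, and raising X by one unit changes Y by exactly the slope, no matter where we start.

```python
def line(x, a, b):
    """Return Y = a + bX for the line with intercept a and slope b."""
    return a + b * x

a, b = 3.0, 2.0  # hypothetical intercept and slope

print(line(0, a, b))  # 3.0: the value of Y at X = 0 is the intercept

# The change in Y for a 1-unit increase in X equals the slope, wherever we start.
for x in [-5, 0, 1.5, 10]:
    print(line(x + 1, a, b) - line(x, a, b))  # 2.0 each time
```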
Least-Squares Criterion
Among all possible lines through the data, choose the one that has the smallest sum of squared vertical deviations (the vertical distances between the observed Y values and the line). This line is called the least-squares regression line.
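To make the criterion concrete, the sketch below (hypothetical data and helper names) computes the sum of squared vertical deviations for several candidate lines. The least-squares line, obtained here from the standard closed-form slope and intercept, has a smaller sum than every nudged alternative we try. The nudges are arbitrary examples, not an exhaustive search.

```python
from statistics import mean

def sse(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Closed-form least-squares intercept and slope."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a_ls, b_ls = least_squares_line(xs, ys)
best = sse(xs, ys, a_ls, b_ls)

# Nudge the intercept and/or slope: every alternative line fits worse.
nudges = [(0.5, 0), (-0.5, 0), (0, 0.2), (0, -0.2), (0.3, -0.1)]
for da, db in nudges:
    print(sse(xs, ys, a_ls + da, b_ls + db) > best)  # True
```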
Correlation Note
If the scatterplot doesn't indicate at least a somewhat linear relationship, the correlation doesn't mean much. Why measure the amount of linear relationship if there isn't enough of one to speak of? However, "no linear relationship" can mean two different things: 1) If no relationship at all exists, calculating the correlation doesn't make sense, because correlation only measures linear relationships; and 2) If a strong relationship exists but it isn't linear, the correlation may be misleading, because r can be close to 0 even when a strong curved relationship exists. That's why it's critical to examine the scatterplot first.
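The second case is easy to reproduce. In the sketch below (hypothetical data and helper name), Y is completely determined by X through Y = X², a perfect curved relationship, yet because the X values are symmetric around 0 the correlation is 0 (up to floating-point rounding). Only a scatterplot would reveal the strong pattern.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # a perfect, but curved, relationship

r = pearson_r(xs, ys)
print(abs(r) < 1e-12)  # True: r reports no linear relationship
```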
The Correlation Coefficient—r Interpretation
In a positive linear relationship (r between 0 and +1), as the value of one variable increases, the value of the other variable also tends to increase. In a negative linear relationship (r between -1 and 0), as the value of one variable increases, the value of the other variable tends to decrease. The closer r is to 1 or -1, the stronger the linear relationship between the two variables.
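The slope and intercept of the least-squares regression line are found using the following formulas:
b = r(SY / SX)
a = Ȳ - bX̄
where r is the correlation coefficient, SX and SY are the standard deviations of X and Y, and X̄ and Ȳ are the means of X and Y.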
Like any other line, the equation of the least-squares regression line for summarizing the linear relationship between the response variable (Y) and the explanatory variable (X) has the form Y = a + bX. All we need to do is calculate the intercept a and the slope b, which is easily done if we know:
X̄: the mean of the explanatory variable's values
SX: the standard deviation of the explanatory variable's values
Ȳ: the mean of the response variable's values
SY: the standard deviation of the response variable's values
r: the correlation coefficient
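The sketch below applies the standard formulas b = r(SY / SX) and a = Ȳ - bX̄ to hypothetical data, building a and b from only the five summary statistics. As a cross-check, it compares the result with a direct least-squares computation that never uses r; the two agree. The function name `regression_from_summaries` is hypothetical.

```python
from math import isclose
from statistics import mean, stdev

def regression_from_summaries(x_bar, s_x, y_bar, s_y, r):
    """Slope b = r * (s_y / s_x); intercept a = y_bar - b * x_bar."""
    b = r * (s_y / s_x)
    a = y_bar - b * x_bar
    return a, b

xs = [2, 4, 5, 7, 9]
ys = [3.0, 4.5, 6.1, 7.4, 9.8]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)

a, b = regression_from_summaries(x_bar, s_x, y_bar, s_y, r)

# Cross-check: slope and intercept computed directly by least squares.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b_direct = sxy / sxx
a_direct = y_bar - b_direct * x_bar

print(isclose(a, a_direct), isclose(b, b_direct))  # True True
print(f"Y = {a:.3f} + {b:.3f}X")
```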
Linear Regression: Summarizing the Pattern of the Data with a Line
Linear regression is the technique of finding the line that best fits the pattern of the linear relationship (in other words, the line that best describes how the response variable linearly depends on the explanatory variable). We need to agree on what we mean by "best fits the data"; in other words, we need to agree on a criterion by which to select this line.
The Correlation Coefficient—r
The correlation coefficient (r) is a numerical measure of the strength and direction of the linear relationship between two quantitative variables.
Use of the regression line
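The regression line has two main uses: 1) Interpretation: the slope of the regression line can be interpreted as the average change in the response variable (Y) when the explanatory variable (X) increases by one unit. 2) Prediction: substituting a value of X into the equation Y = a + bX gives the predicted value of Y.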
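A sketch of both uses with a hypothetical fitted line, Y = 20 + 5X (say, X = hours studied and Y = exam score; the numbers are invented for illustration). The slope is read as the average change in Y per extra unit of X, and prediction simply plugs a new X value into the line.

```python
a, b = 20.0, 5.0  # hypothetical intercept and slope of a fitted regression line

def predict(x):
    """Predicted response for explanatory value x on the line Y = a + bX."""
    return a + b * x

# Interpretation: each extra hour goes with b = 5 more points, on average.
print(predict(7) - predict(6))  # 5.0

# Prediction: expected score for a student who studies 8 hours.
print(predict(8))  # 60.0
```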
The Correlation Coefficient—r Calculation
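r is calculated using the following formula:
r = (1 / (n - 1)) Σ [(X - X̄) / SX] [(Y - Ȳ) / SY]
where n is the number of (X,Y) pairs, X̄ and Ȳ are the means of X and Y, SX and SY are their standard deviations, and the sum runs over all n pairs.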
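The standardized-product formula for r translates directly into code. The sketch below (hypothetical helper name `correlation`, made-up data) standardizes each X and Y value, multiplies the pairs, and divides the sum by n - 1. Points on a perfect uphill or downhill line give r = 1 or r = -1, as expected.

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]
print(isclose(correlation(xs, [3 + 2 * x for x in xs]), 1.0))    # True: perfect uphill line
print(isclose(correlation(xs, [10 - 4 * x for x in xs]), -1.0))  # True: perfect downhill line
print(round(correlation(xs, [2, 1, 4, 3, 5]), 2))                # 0.8
```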
Rules for Interpreting the Correlation Coefficient r
Exactly -1: A perfect downhill (negative) linear relationship
-0.70: A strong downhill (negative) linear relationship
-0.50: A moderate downhill (negative) linear relationship
-0.30: A weak downhill (negative) linear relationship
0: No linear relationship
+0.30: A weak uphill (positive) linear relationship
+0.50: A moderate uphill (positive) linear relationship
+0.70: A strong uphill (positive) linear relationship
Exactly +1: A perfect uphill (positive) linear relationship
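One way to apply these rules of thumb in code is to report "perfect" only for r exactly equal to -1 or +1, and otherwise report the closest remaining benchmark. The function name `describe_r` and the nearest-benchmark rule are assumptions made for illustration; the guidelines are informal, and values between benchmarks still call for judgment.

```python
# Benchmarks from the rules of thumb (r value -> description), excluding the exact +/-1 cases.
BENCHMARKS = [
    (-0.70, "a strong downhill (negative) linear relationship"),
    (-0.50, "a moderate downhill (negative) linear relationship"),
    (-0.30, "a weak downhill (negative) linear relationship"),
    (0.00, "no linear relationship"),
    (0.30, "a weak uphill (positive) linear relationship"),
    (0.50, "a moderate uphill (positive) linear relationship"),
    (0.70, "a strong uphill (positive) linear relationship"),
]

def describe_r(r):
    """Describe r using the rules of thumb; 'perfect' is reserved for exactly -1 or +1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 1.0:
        return "a perfect uphill (positive) linear relationship"
    if r == -1.0:
        return "a perfect downhill (negative) linear relationship"
    # Otherwise pick the benchmark whose value is closest to r.
    return min(BENCHMARKS, key=lambda pair: abs(pair[0] - r))[1]

print(describe_r(0.68))   # a strong uphill (positive) linear relationship
print(describe_r(-0.05))  # no linear relationship
print(describe_r(0.95))   # a strong uphill (positive) linear relationship
```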
