Module 4 stats
Correlation of a sample (r) will always be a number between
-1 and +1
Each of the following statements about correlation has a statistical problem, in terms of the properties of correlation (see lecture notes). In each case identify the problem. The correlation between weight and height is .4 inches per pound.
Correlation is unit free. It has no units at all.
Each of the following statements about correlation has a statistical problem, in terms of the properties of correlation (see lecture notes). In each case identify the problem. The correlation between number of rushing yards and number of passing yards for Ohio State is 0.6. In feet, the correlation is 0.6 x 3 = 1.8.
Correlation is unit free; if you change the units the correlation doesn't change.
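A minimal sketch of this property in Python. The yardage numbers are made up for illustration, and pearson_r is a helper written here, not something from the lecture notes. Converting both variables from yards to feet rescales every value by 3, yet r comes out the same.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-game totals in yards (illustrative numbers only).
rushing_yards = [120, 95, 180, 210, 150, 60]
passing_yards = [250, 230, 310, 280, 300, 190]

# Same data expressed in feet (1 yard = 3 feet).
rushing_feet = [3 * x for x in rushing_yards]
passing_feet = [3 * y for y in passing_yards]

r_yards = pearson_r(rushing_yards, passing_yards)
r_feet = pearson_r(rushing_feet, passing_feet)
print(r_yards, r_feet)  # the two values agree up to rounding
```

The factor of 3 appears in both the numerator and the denominator of r, so it cancels.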
Each of the following statements about correlation has a statistical problem, in terms of the properties of correlation (see lecture notes). In each case identify the problem. The correlation between the shelf number at the store (1 = lowest) and the cost of the product (per ounce) is 1.2.
Correlation must be between -1 and +1.
An influential point always has a large residual. (True or False)
False; it usually does not, because it pulls the line toward itself.
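A small sketch of this effect, using invented data and a hand-written least_squares helper (both are assumptions for illustration). Five points with no trend sit at x = 1 to 5, and one point far to the right at x = 20 drags the fitted line toward itself, so its own residual ends up smaller than some of the others.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data: the last point is the influential one.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 1, 3, 1, 2, 20]

slope_with, intercept_with = least_squares(xs, ys)
slope_without, intercept_without = least_squares(xs[:-1], ys[:-1])

# Residual = observed y minus predicted y, using the line fit to all six points.
residuals = [y - (intercept_with + slope_with * x) for x, y in zip(xs, ys)]

print("slope with the point:   ", slope_with)
print("slope without the point:", slope_without)
print("residuals:", residuals)
```

Removing the point changes the slope drastically, which is what makes it influential, even though its residual is not the largest.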
The residual plot should show you a linear pattern if a regression line fits well. (True or False)
False; the scatterplot should show a linear pattern, but the residual plot should show no pattern at all, just random scatter around zero.
Each of the following statements about correlation has a statistical problem, in terms of the properties of correlation (see lecture notes). In each case identify the problem. There is a strong correlation between region of the country and sales.
Region of country is not a quantitative variable.
If you know the value of x-bar (the mean of the x values) and you know the equation of the regression line, can you find the value of y-bar (the mean of the y-values)? If so, explain how, and if not, explain why not.
Yes, you can find the mean of y (y-bar), by taking the regression line and plugging in x-bar. We learned in class that the point (x-bar, y-bar) is on EVERY regression line.
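One way to see why, assuming the line is the least-squares line written as y-hat = b0 + b1 x (the symbols b0 and b1 are notation chosen here, not taken from the notes): the least-squares intercept is defined in terms of the two means, so plugging in x-bar returns y-bar.

```latex
\hat{y} = b_0 + b_1 x, \qquad b_0 = \bar{y} - b_1 \bar{x}
\quad\Longrightarrow\quad
\hat{y}(\bar{x}) = (\bar{y} - b_1 \bar{x}) + b_1 \bar{x} = \bar{y}
```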
Suppose you know the value of R-squared. Can you simply take its square root to get the correct correlation? Explain why or why not. If not, what other info do you need?
You only get the magnitude of r by taking the square root of r-squared. To get the sign on r, you need to look at the sign on the slope or the direction of the scatterplot.
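As a rough sketch in Python (the function name r_from_r_squared is made up for this example), the square root supplies the magnitude and the slope supplies the sign:

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover r from r-squared using the sign of the regression slope."""
    magnitude = math.sqrt(r_squared)
    if slope == 0:
        return 0.0  # a flat line means no linear association
    # copysign attaches the sign of the slope to the magnitude.
    return math.copysign(magnitude, slope)

print(r_from_r_squared(0.64, slope=-2.5))  # negative slope, so r is negative
print(r_from_r_squared(0.64, slope=1.2))   # positive slope, so r is positive
```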
Which of the following are affected by outliers?
a. The correlation b. The slope of the regression line c. The Y-intercept of the regression line
Correlation has ____. a. The same units as the data b. No units
b. No units
Scatterplots examine relationships between what type(s) of variables? a. Categorical b. Quantitative c. Both categorical and quantitative variables
b. Quantitative
When finding the correlation between two quantitative variables, you will get the same answer if you switch X and Y. Explain briefly.
If you look at the formula for r (formula sheet is in the Course Info section of Carmen), you see that r measures how X and Y move together (numerator) compared to how they move separately (denominator). If you switch the X's and Y's around in the entire formula, you get the same answer, by the commutative property of multiplication.
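For reference, here is the usual standardized-product form of the sample correlation. This is an assumption about what the formula sheet shows, since the sheet is not reproduced in these notes. Each term is a product of an x part and a y part, so swapping the roles of x and y only reorders the factors.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```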
Each of the following statements about correlation has a statistical problem, in terms of the properties of correlation (see lecture notes). In each case identify the problem. The correlation between sales and customer returns is .35. This means the correlation between customer returns and sales is 1 - .35 = .65.
If you switch the X and Y variables, the correlation stays the same.
Correlation is a measure of the strength and direction of what type of relationship between two quantitative variables?
LINEAR
Correlation is affected by outliers. Explain why, briefly.
Looking at the formula for r, correlation is based on the mean of X, the mean of Y, the SD of X, and the SD of Y. All four of these items are affected by outliers, as we learned in Chapter 1.
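A quick illustration with invented numbers (pearson_r is a helper written for this sketch): six points that lie almost exactly on a line give r close to 1, and adding a single outlier pulls r down sharply.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: nearly perfect straight-line relationship.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r_without = pearson_r(xs, ys)

# Add one outlier that falls far below the trend.
r_with = pearson_r(xs + [7], ys + [1.0])

print("r without outlier:", r_without)
print("r with outlier:   ", r_with)
```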
The value of r-squared is always greater than or equal to the value of r. (True or False)
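No. What is always true is that r-squared is less than or equal to the absolute value of r, because squaring a number between -1 and 1 cannot increase its magnitude. When r is between 0 and 1, r-squared is less than r (for example, r = 0.5 gives r-squared = 0.25). When r is negative, r-squared is actually greater than r, since r-squared is never negative.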
To fit a linear regression line all you need is a strong value of r. (True or False)
False; you also need to look at the scatterplot, because a high correlation can come from data that actually follow a curve, in which case a line is the wrong model. The scatterplot also shows patterns that r misses: a circular pattern can have r = 0.
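Both cases can be checked numerically. This is a sketch with made-up data: points on the curve y = x^2 for x = 1 to 10, and eight points evenly spaced around a circle. The helper pearson_r is written for this example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Curved relationship: y = x^2, clearly not a straight line.
curve_x = list(range(1, 11))
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Circular pattern: eight points evenly spaced on the unit circle.
angles = [2 * math.pi * k / 8 for k in range(8)]
circle_x = [math.cos(a) for a in angles]
circle_y = [math.sin(a) for a in angles]
r_circle = pearson_r(circle_x, circle_y)

print("r for the curve: ", r_curve)   # high, even though the shape is a curve
print("r for the circle:", r_circle)  # essentially zero, despite a clear pattern
```

In both cases the number alone is misleading, which is why the scatterplot comes first.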