Statistics Chapter 4
Statisticians often write the word _______ in front of the y-variable in the equation of the regression line.
"predicted"
The correlation coefficient is always a number between _______.
-1 and 1
When computing the correlation coefficient, what is the effect of changing the order of the variables on r?
It has no effect on r.
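A minimal sketch of why the order does not matter: the formula for r treats the two variables symmetrically. The helper `pearson_r` and the height and arm-span values below are made up for illustration and are not from the chapter.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (cm)
heights = [150, 160, 165, 172, 180]
arm_spans = [148, 158, 167, 170, 183]

r_xy = pearson_r(heights, arm_spans)
r_yx = pearson_r(arm_spans, heights)  # variables swapped
print(r_xy, r_yx)  # the two values agree
```

Swapping the arguments only swaps the roles of `sxx` and `syy`, and their product is unchanged.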
The _______ is a number that measures the strength of the linear association between two numerical variables.
correlation coefficient
Another name for the regression line is the _______ line.
least squares, because the line is chosen so that the sum of the squared differences between each observed y-value and the value predicted by the line is as small as possible.
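The "as small as possible" property can be checked numerically. This is a sketch under assumed data: the function names and the five (x, y) points are hypothetical. It fits the line with the standard slope and intercept formulas and then compares its sum of squared residuals with that of two slightly different lines.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted y-values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
nudged_slope = sum_squared_residuals(xs, ys, a, b + 0.1)
nudged_intercept = sum_squared_residuals(xs, ys, a + 0.5, b)
print(best, nudged_slope, nudged_intercept)  # best is the smallest of the three
```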
When testing the IQ of a group of adults (aged 25 to 50), an investigator noticed that the correlation between IQ and age was negative. Does this show that IQ goes down as we get older? Why or why not? Explain.
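No. Correlation does not imply causation. The investigator compared different people of different ages rather than following the same people as they aged, so other differences between the age groups could account for the negative correlation.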
Some investors use a technique called the "Dogs of the Dow" to invest. They pick several stocks that are performing poorly from the Dow Jones group (which is a composite of 30 well-known stocks) and invest in these. Explain why these stocks will probably do better than they have done before.
Part of the poor historical performance could be due to chance, and if so, regression toward the mean predicts that stocks turning in a lower-than-average performance should tend to perform closer to the mean in the future. In other words, they should increase.
The correlation between height and arm span in a sample of adult women was found to be r=0.941. The correlation between arm span and height in a sample of adult men was found to be r=0.864. Which association—the association between height and arm span for women, or the association between height and arm span for men—is stronger? Explain.
The association between height and arm span for women is stronger because the value of r is farther from 0.
If the correlation between height and weight of a large group of people is 0.67, find the coefficient of determination (as a percent) and explain what it means. Assume that height is the predictor and weight is the response, and assume that the association between height and weight is linear.
The coefficient of determination is 44.89%. Therefore, 44.89% of the variation in weight can be explained by the regression line.
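The arithmetic behind this answer, written out as a worked equation:

```latex
r^2 = (0.67)^2 = 0.4489 = 44.89\%
```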
How is the coefficient of determination related to the correlation, and what does the coefficient of determination show?
The coefficient of determination is the square of the correlation, and it shows the proportion of the variation in the response variable that is explained by the explanatory variable.
The correlation between house price (in dollars) and area of the house (in square feet) for some houses is 0.91. If you found the correlation between house price in thousands of dollars and area in square feet for the same houses, what would the correlation be?
The new correlation would be 0.91. Changing units or multiplying the numbers for a variable by a positive constant does not change the correlation.
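A quick numeric check of this unit-change rule, as a sketch only: the house prices and areas below are invented, and `pearson_r` is a hypothetical helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical houses
area_sqft = [1100, 1500, 1800, 2400, 3000]
price_dollars = [210_000, 255_000, 340_000, 390_000, 520_000]

# Same prices expressed in thousands of dollars
price_thousands = [p / 1000 for p in price_dollars]

r_dollars = pearson_r(area_sqft, price_dollars)
r_thousands = pearson_r(area_sqft, price_thousands)
print(r_dollars, r_thousands)  # the same value, up to floating-point rounding
```

Dividing every price by 1000 scales both the cross-product sum and the square root of the price sum of squares by the same factor, so the ratio that defines r is unchanged.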
Since, in general, the longer a car is owned, the more miles it travels, one can say there is a _______ between age of a car and mileage.
a positive association
Attempting to use the regression equation to make predictions beyond the range of the data is called _______.
extrapolation
Since outliers can greatly affect the regression line, they are also called _______ points.
influential
For what types of associations are regression models useful?
linear
Under what conditions can extrapolation be used to make predictions beyond the range of the data?
never
When can a correlation coefficient based on an observational study be used to support a claim of cause and effect?
never
The _______ is a tool for making predictions about future observed values and is a useful way of summarizing a linear relationship.
regression equation
When describing two-variable associations, a written description should always include trend, shape, strength, and which of the following?
The context of the data
What is an influential point?
An influential point is a point that changes the regression equation by a large amount.
The value that measures how much variation in the response variable is explained by the explanatory variable is called the _______.
coefficient of determination
What is extrapolation and why is it a bad idea in regression analysis?
Extrapolation is prediction far outside the range of the data. These predictions may be incorrect if the linear trend does not continue, and so extrapolation generally should not be trusted.
It has been noted that people who go to church frequently tend to have lower blood pressure than people who don't go to church. Does this mean you can lower your blood pressure by going to church? Why or why not? Explain.
Going to church may not cause lower blood pressure. Just because two variables are related does not show that one caused the other.
Suppose that the growth rate of children looks like a straight line if the height of a child is observed at the ages of 24 months, 28 months, 32 months, and 36 months. If you use the regression obtained from these ages and predict the height of the child at 21 years, you might find that the predicted height is 20 feet. What is wrong with the prediction and the process used?
Growth rates slow as people get older. One should not extrapolate. That is, one should not predict outside the range of the data.
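The failure can be reproduced with a small sketch. The four heights below are assumed values chosen to look roughly linear. They are not the chapter's data, so the predicted adult height differs from the 20 feet mentioned in the question, but it is just as implausible.

```python
# Hypothetical heights (inches) at the four observed ages (months)
ages = [24, 28, 32, 36]
heights = [34.0, 35.2, 36.5, 37.6]

n = len(ages)
mean_age = sum(ages) / n
mean_height = sum(heights) / n

# Least squares slope and intercept
slope = sum((a - mean_age) * (h - mean_height) for a, h in zip(ages, heights)) \
    / sum((a - mean_age) ** 2 for a in ages)
intercept = mean_height - slope * mean_age

def predict(age_months):
    """Predicted height in inches from the fitted line."""
    return intercept + slope * age_months

inside_range = predict(30)        # within 24 to 36 months: reasonable
at_21_years = predict(21 * 12)    # far outside the data: not trustworthy
print(inside_range, at_21_years, at_21_years / 12)  # last value is in feet
```

The line describes growth only between 24 and 36 months. Applying the same monthly gain for another 18 years ignores the fact that growth slows and eventually stops.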
If there is a positive correlation between number of years studying math and thumb length (for children), does that prove that longer thumbs cause more studying of math, or vice versa? Can you think of a hidden variable that might be influencing both of the other variables?
It does not prove causation, because older children have longer thumbs and have studied math longer. Longer thumbs do not cause an increase in years of studying. The hidden variable is age.
The correlation coefficient makes sense only if the trend is linear and the _______.
variables are numerical
A large amount of scatter in a scatterplot is an indication that the association between the two variables is _______.
weak
Suppose a doctor telephones those patients who are in the highest 10% with regard to their recently recorded blood pressure and asks them to return for a clinical review. When she retakes their blood pressures, will those new blood pressures, as a group (that is, on average), tend to be higher than, lower than, or the same as the earlier blood pressures, and why?
The new blood pressures will tend to be lower. Part of the high reading might be due to chance, and regression toward the mean predicts that a repeated measurement will be closer to the typical value.
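Regression toward the mean can be seen in a small simulation. This is a sketch under an assumed model, not a description of real patients: each reading is a stable underlying pressure plus an independent chance fluctuation, and all the numbers (mean 120, standard deviations of 10) are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

n_patients = 10_000
# Assumed model: reading = stable underlying pressure + chance fluctuation
true_bp = [rng.gauss(120, 10) for _ in range(n_patients)]
first = [t + rng.gauss(0, 10) for t in true_bp]
second = [t + rng.gauss(0, 10) for t in true_bp]

# Recall the patients whose first reading is in the highest 10%
cutoff = sorted(first)[int(0.9 * n_patients)]
recalled = [i for i in range(n_patients) if first[i] >= cutoff]

mean_first = sum(first[i] for i in recalled) / len(recalled)
mean_second = sum(second[i] for i in recalled) / len(recalled)
overall_mean = sum(first) / n_patients

print(overall_mean, mean_first, mean_second)
```

In this model the recalled group's second readings average lower than their first readings but still above the overall mean, because only the chance part of a high first reading fails to repeat.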
One important use of the regression line is to _______.
make predictions about the values of y for a given x-value
The intercept of a regression line tells a person the predicted mean y-value when the x-value is _______.
0
What type of effect can outliers have on a regression line?
A large effect: a single outlier can substantially change the slope and intercept.
Does a correlation of −0.4 or +0.5 give a larger coefficient of determination? We say that the linear relationship that has the larger coefficient of determination is more strongly correlated. Which of the values shows a stronger correlation?
A correlation of +0.5 gives a larger coefficient of determination and shows a stronger correlation.
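The comparison comes down to squaring each correlation, which removes the sign. A short check, with variable names chosen only for this example:

```python
r_a = -0.4
r_b = 0.5

# Coefficient of determination is the square of the correlation
r2_a = r_a ** 2  # about 0.16
r2_b = r_b ** 2  # 0.25

# Strength depends on distance from 0, not on the sign
stronger = max((r_a, r_b), key=abs)
print(r2_a, r2_b, stronger)
```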
When writing a regression equation, what is not a name for the x-variable?
Dependent variable
When the data contain influential points, how should regression and correlation be done?
Do regression and correlation with and without these points, and comment on the differences.
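A sketch of that with-and-without comparison on made-up data. The first five points lie exactly on a line, and the sixth is a hypothetical point placed far from the rest. The helper `fit_line` is written for this example only.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: the last point (20, 5) is far from the others
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 5]

intercept_with, slope_with = fit_line(xs, ys)
intercept_without, slope_without = fit_line(xs[:-1], ys[:-1])

print("with point:   ", intercept_with, slope_with)
print("without point:", intercept_without, slope_without)
```

Here the slope is 2 without the extra point and close to 0 with it, which is the kind of difference worth reporting alongside both fits.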