Math 221 - Chapter 4
Since outliers can greatly affect the regression line they are also called _______ points.
Since outliers can greatly affect the regression line, these types of observations are called influential points because their presence or absence has a big effect on conclusions.
When describing two-variable associations, a written description should always include trend, shape, strength, and which of the following? The number of pairs in the data set The context of the data The name of the person who gathered the data All of the above
The context of the data
What happens to the correlation coefficient when a constant is added to each number?
The correlation coefficient remains the same when a constant is added to each number.
What happens to the correlation coefficient when numbers are multiplied by a positive constant?
The correlation coefficient remains the same when the numbers are multiplied by a positive constant.
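A quick numerical check of the two facts above. This is a minimal sketch: the data are made up for illustration, and `pearson_r` is a hypothetical helper written here, not something from the course materials.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (an assumption, not course data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_original = pearson_r(x, y)
r_shifted = pearson_r([v + 10 for v in x], y)  # add a constant to every x
r_scaled = pearson_r([3 * v for v in x], y)    # multiply every x by a positive constant

print(r_original, r_shifted, r_scaled)
```

All three printed values agree up to floating-point rounding, because shifting or positively rescaling a variable does not change how far each point sits from the mean in standard units.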
Which of the following is not something that one looks for when studying scatterplots? Trend Shape Variation Strength
Variation note: Variation is not something one looks for when studying scatterplots; one looks for trend, shape, and strength.
When writing a regression equation, which of the following is not a name for the x-variable? Predictor variable Independent variable Dependent variable Explanatory variable
When writing a regression equation, the dependent variable is not another name for the x-variable. The dependent variable is the y-variable.
Statisticians often write the word _______ in front of the y-variable in the equation of the regression line.
Statisticians often write the word "predicted" in front of the y-variable in the equation of the regression line to emphasize that the line consists of predictions for the y-variable, not actual values.
The correlation coefficient is always a number between _______.
−1 and +1.
The intercept of a regression line tells a person the predicted mean y-value when the x-value is _______.
0.
A large amount of scatter in a scatterplot is an indication that the association between the two variables is _______.
A large amount of scatter in a scatterplot is an indication that the association between the two variables is weak.
What type of effect can outliers have on a regression line? Choose the correct answer below. A. A big effect B. Outliers are never included in a regression line. C. A small and insignificant effect D. No effect
A. A big effect note: A regression line is a line of means, and outliers have a big effect on the regression line.
The scatterplot shows the actual weight and desired weight change of some students. Thus, if they weighed 220 and wanted to weigh 190, the desired weight change would be negative 30. Explain what you see. In particular, what does it mean that the trend is negative? A. The more people weigh, the more weight they tend to want to lose. B. The more people weigh, the less weight they tend to want to lose. C. The less people weigh, the more weight they tend to want to lose.
A. The more people weigh, the more weight they tend to want to lose. note: Since there is a negative trend, it appears that the more people weigh, the more weight they tend to want to lose.
One important use of the regression line is to do which of the following? A. To determine the strength of a linear association between two variables B. To determine if a distribution is unimodal or multimodal C. To make predictions about the values of y for a given x-value D. Both A and B are correct
An important use of the regression line is to make predictions about the values of y for a given x-value.
When one has influential points in their data, how should regression and correlation be done? Choose the correct answer below. A. Always include the influential points in your data set when doing regression and correlation B. Do regression and correlation with and without these points and comment on the differences C. Remove the influential points from your data set before doing regression and correlation D. Don't use regression or correlation on data sets containing influential points
B. Do regression and correlation with and without these points and comment on the differences note: When one has influential points in their data, they should do the regression and correlation with and without these points and comment on the differences.
A doctor is studying cholesterol readings in his patients. After reviewing the cholesterol readings, he calls the patients with the highest cholesterol readings (the top 5% of readings in his office) and asks them to come back to discuss cholesterol-lowering methods. When he tests these patients a second time, the average cholesterol readings tended to have gone down somewhat. Explain what statistical phenomenon might have been partly responsible for this lowering of the readings. A. The cholesterol going down might be partly caused by extrapolation, since the second measurement is closer to the mean. B. The cholesterol going down might be partly caused by regression toward the mean, since the second measurement is farther from the mean. C. The cholesterol going down might be partly caused by extrapolation, since the second measurement is farther from the mean. D. The cholesterol going down might be partly caused by regression toward the mean, since the second measurement is closer to the mean.
D. The cholesterol going down might be partly caused by regression toward the mean, since the second measurement is closer to the mean. note: Regression toward the mean is the tendency for a variable that is extreme on its first measurement to be closer to the average on a second measurement. Likewise, a variable that is extreme on its second measurement will tend to have been closer to the average on the first. The cholesterol going down might be partly caused by this effect.
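Regression toward the mean can be seen in a small simulation. The sketch below is built on assumptions: each hypothetical patient has a stable underlying cholesterol level, each reading adds independent random noise, and no treatment is applied between readings. All numbers (means, spreads, sample size) are invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 10_000

# Assumed model: reading = stable underlying level + independent measurement noise.
true_level = [random.gauss(200, 30) for _ in range(n)]
first = [t + random.gauss(0, 20) for t in true_level]
second = [t + random.gauss(0, 20) for t in true_level]

# Select the patients whose FIRST reading is in the top 5%.
cutoff = sorted(first)[int(0.95 * n)]
top = [i for i in range(n) if first[i] >= cutoff]

overall_mean = sum(first) / n
mean_first_top = sum(first[i] for i in top) / len(top)
mean_second_top = sum(second[i] for i in top) / len(top)

print(overall_mean, mean_first_top, mean_second_top)
```

Even though nothing was done to the simulated patients, the selected group's second average falls between its first average and the overall mean, which is the pattern the doctor observed.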
Which do you think has a stronger relationship with the value of the land: the number of acres of land or the number of rooms in the homes? Why? A. The number of acres of land has a stronger relationship with the value of the land, as shown by the fact that the points are more scattered in a vertical direction. B. The number of rooms in the homes has a stronger relationship with the value of the land, as shown by the fact that the points are less scattered in a vertical direction. C. The number of acres of land has a stronger relationship with the value of the land, as shown by the fact that the points are less scattered in a vertical direction. D. The number of rooms in the homes has a stronger relationship with the value of the land, as shown by the fact that the points are more scattered in a vertical direction.
answer: C. note: Weak associations result in a large amount of scatter in the scatterplot. A large amount of scatter means that points have a great deal of spread in the vertical direction. The number of acres of land has a stronger relationship with the value of the land, as shown by the fact that the points are less scattered in a vertical direction.
The value that measures how much variation in the response variable is explained by the explanatory variable is called the _______.
answer: coefficient of determination note: The coefficient of determination is the square of the correlation coefficient, r². In fact, this statistic is often called r-squared. This value measures how much variation in the response variable is explained by the explanatory variable.
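The "explained variation" reading of r-squared can be checked directly: for a least squares line, r² equals 1 minus the ratio of leftover (residual) variation to total variation in y. A minimal sketch with made-up data; the variable names are illustrative only.

```python
import math

# Made-up illustrative data (an assumption, not course data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)  # total variation in y

r = sxy / math.sqrt(sxx * syy)

# Least squares line: predicted y = intercept + slope * x
slope = sxy / sxx
intercept = my - slope * mx

# Variation left over after using the line to predict y.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
explained_fraction = 1 - sse / syy

print(r ** 2, explained_fraction)
```

For this data set both quantities come out to about 0.6, so roughly 60% of the variation in y is explained by x.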
Attempting to use the regression equation to make predictions beyond the range of the data is called _______.
answer: extrapolation Extrapolation means that one uses the regression line to make predictions beyond the range of the data. This practice can be dangerous, because although the association may have a linear shape for the range one is observing, that might not be true over a larger range.
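To see why extrapolation is risky, the sketch below fits a line to data that look almost perfectly linear over a narrow range but actually follow a curve. The curved relationship (y = x²) and the chosen x-values are assumptions made purely for illustration.

```python
# Hypothetical "true" relationship, unknown to the person fitting the line.
def true_y(x):
    return x ** 2

# Observed data cover only x = 10..14, where the curve looks nearly straight.
x = [10.0, 11.0, 12.0, 13.0, 14.0]
y = [true_y(v) for v in x]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx

def predict(x_new):
    """Predicted y from the fitted regression line."""
    return intercept + slope * x_new

error_inside = abs(predict(12.0) - true_y(12.0))   # within the data range
error_outside = abs(predict(30.0) - true_y(30.0))  # far beyond the data range

print(error_inside, error_outside)
```

Inside the observed range the line is off by about 2 units; at x = 30 it is off by more than 300, because the linear shape did not hold over the larger range.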
The _______ is a number that measures the strength of the linear association between two numerical variables.
correlation coefficient
What is an influential point? A. An influential point is a point that changes the regression equation by a large amount. B. An influential point is used in the regression line to make predictions beyond the range of the data. C. An influential point is a point that measures the strength of the linear association between two numerical variables.
An influential point is a point that changes the regression equation by a large amount. When there are influential points in the data, it is good practice to try the regression and correlation with and without these points and to comment on the difference.
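The "with and without" comparison might look like the sketch below. The data set is made up, the point (15, 2.0) is deliberately planted as the influential point, and `fit` is a hypothetical helper defined here for illustration.

```python
import math

def fit(xs, ys):
    """Return (slope, intercept, r) for the least squares line through the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# Made-up data: five points near the line y = x, plus one planted outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [1.1, 1.9, 3.2, 3.9, 5.0, 2.0]

slope_with, intercept_with, r_with = fit(x, y)
slope_without, intercept_without, r_without = fit(x[:-1], y[:-1])

print("with point:   ", slope_with, r_with)
print("without point:", slope_without, r_without)
```

Without the planted point the slope is close to 1 and r is close to +1; with it, both drop to nearly 0. Reporting both fits makes the effect of the single point visible.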
Another name for the regression line is the _______ line.
Another name for the regression line is the least squares line because it is chosen so that the sum of the squares of the differences between the observed y-value and the value predicted by the line is as small as possible.
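The least squares property can be checked numerically: compute the line from the usual formulas, then confirm that nudging the slope or intercept only increases the sum of squared differences. A minimal sketch with made-up data; `sum_sq_errors` is a hypothetical helper name.

```python
# Made-up illustrative data (an assumption, not course data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

def sum_sq_errors(slope, intercept):
    """Sum of squared differences between observed y and the line's predictions."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

best = sum_sq_errors(slope, intercept)

# Try a few nearby lines; each should fit worse than the least squares line.
nearby = [
    sum_sq_errors(slope + 0.1, intercept),
    sum_sq_errors(slope - 0.1, intercept),
    sum_sq_errors(slope, intercept + 0.5),
    sum_sq_errors(slope, intercept - 0.5),
]

print(best, nearby)
```

Here the fitted line is predicted y = 2.2 + 0.6x, and every perturbed line has a larger sum of squares.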
If there is a positive correlation between number of years studying grammar and thumb length (for children), does that prove that longer thumbs cause more studying of grammar, or vice versa? Can you think of a hidden variable that might be influencing both of the other variables? A. It proves causation, because the two variables are related implying causation. Longer thumbs cause an increase in years of studying. The hidden variable is age. B. It does not prove causation, because older children have longer thumbs and have studied grammar longer. Longer thumbs do not cause an increase in years of studying. The hidden variable is age. C. It does not prove causation, because older children have longer thumbs and have studied grammar longer. Longer thumbs do not cause an increase in years of studying. There is no hidden variable. D. It proves causation, because the two variables are related implying causation. Longer thumbs cause an increase in years of studying. There is no hidden variable.
B. It does not prove causation, because older children have longer thumbs and have studied grammar longer. Longer thumbs do not cause an increase in years of studying. The hidden variable is age. note: Remember that correlation does not imply causation. Older children have longer thumbs and have studied grammar longer. However, longer thumbs do not cause an increase in years of studying. Both are affected by age.
When computing the correlation coefficient, what is the effect of changing the order of the variables on r? Choose the correct answer below. A. It has no effect on r. B. It changes both the sign and magnitude of r. C. It changes the magnitude of r. D. It changes the sign of r.
A. It has no effect on r. note: Changing the order of the variables does not change r. Note that in the equation for r, it does not matter which variable is called x and which is called y.
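Swapping the variables leaves r unchanged, but the same is not true of the regression slope, which is one reason the choice of x and y still matters for prediction. A minimal sketch with made-up data; `r_and_slope` is a hypothetical helper.

```python
import math

def r_and_slope(xs, ys):
    """Return (r, slope) where slope is for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

# Made-up illustrative data (an assumption, not course data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy, slope_y_on_x = r_and_slope(x, y)
r_yx, slope_x_on_y = r_and_slope(y, x)  # same data, roles swapped

print(r_xy, r_yx)                   # identical
print(slope_y_on_x, slope_x_on_y)   # different
```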
It has been noted that people who go to church frequently tend to have lower blood pressure than people who don't go to church. Does this mean you can lower your blood pressure by going to church? Why or why not? Explain. A. Going to church may not cause lower blood pressure. Just because two variables are related does not show that one caused the other. B. Since the two variables are not related, going to church may not cause lower blood pressure. C. Since the two variables are related, going to church may not cause lower blood pressure.
Correlation does not imply causation. Going to church may not cause lower blood pressure. Just because two variables are related does not show that one caused the other. It could be that healthy people are more likely to go to church, or there could be other confounding factors.
Under what conditions can extrapolation be used to make predictions beyond the range of the data? Choose the correct answer below. A. When there is a strong positive linear association in the data. B. When the correlation coefficient is close to −1 or +1. C. When the data set contains a large number of pairs of data. D. Never
Extrapolation can never be used to make predictions beyond the range of the data.
If you were trying to predict the value of a parcel of land in this area (on which there is a home), would you be able to make a better prediction by knowing the acreage or the number of rooms in the house? Explain. A. The number of rooms because the association is stronger between the value of land and the number of rooms than with the acreage because the vertical spread is less. B. The acreage because the association is stronger between the value of land and acreage than with the number of rooms because the vertical spread is less. C. Neither because the association is the same between the value of land and the acreage and the value of land and the number of rooms.
answer: B. note: The stronger the association, the better the model is for prediction. The scatterplots show that the association is stronger between the value of land and acreage than between the value of land and the number of rooms because the vertical spread is less. Therefore, knowing the acreage is a better way to predict the value of the land than knowing the number of rooms in the house.
When can a correlation coefficient based on an observational study be used to support a claim of cause and effect? A. When the correlation coefficient is close to −1 or +1. B. When the scatterplot of the data has little vertical variation. C. When the correlation coefficient is equal to −1 or +1. D. Never
never note: A correlation coefficient based on an observational study can never be used to support a claim of cause and effect.
For what types of associations are regression models useful? Non-linear Linear Both linear and non-linear For all types of associations
Linear note: Regression models are useful only for linear associations. If the association is not linear, a regression model can be misleading.
Since, in general, the longer a car is owned the more miles it travels one can say there is a _______ between age of a car and mileage.
positive association note: Since the longer a car is owned the more miles it travels, there is an increasing trend, which indicates a positive association.
Fill in the blank. The _______ is a tool for making predictions about future observed values and is a useful way of summarizing a linear relationship.
regression equation
The correlation coefficient makes sense only if the trend is linear and the _______.
variables are numerical.