Module 5: Examining Relationships Checkpoint 2
Using data from 100 hospitals, a correlation of r = 0.78 is found between the size of a hospital (measured in number of beds) and the average length of patient stays (measured in days). Which of the following is true?
"Severity of illness" is a possible lurking variable that is behind this relationship. Correct. Severity of illness is a possible lurking variable, since the seriousness of a patient's condition is will determine the length of stay. Also, since larger hospitals are more likely to treat severe conditions, this would explain why the patient stays are longer for large hospitals.
If there is no relationship (linear or otherwise) between two quantitative variables as observed on a scatterplot, the value of the correlation coefficient, r, is likely to be which of the following? Closer to 1; closer to −1; closer to 0; either closer to −1 or 1.
Closer to 0 Correct. A correlation value close to 0 indicates no linear relationship between two quantitative variables, so when the scatterplot shows no relationship of any kind, r is likely to be close to 0.
High blood pressure is unhealthy. Here are the results of one of the studies that link high blood pressure to death from cardiovascular disease. The researchers classified a group of white males aged 35 to 64 as having low blood pressure or high blood pressure, then followed the subjects for 5 years. The following two-way table gives the results of the study: In this example, which of the following would be appropriate to calculate?
Conditional column percentages Correct. To explore a categorical relationship, the most appropriate summary is the outcome percentages within each explanatory group. In this case, the column variable (blood pressure) would be the explanatory variable, so the appropriate percentages are the death percentages from each blood pressure group.
A 2009 study analyzed data from the National Longitudinal Study of Adolescent Health. Participants were followed into adulthood. Each study participant was categorized as to whether they were obese (BMI > 30) or not and whether they were dating, cohabiting, or married. The researchers were trying to determine the effect of relationship status on obesity. The table below summarizes the results.

            Dating  Cohabiting  Married  Total
Obese           81         103      147    331
Not obese      359         326      277    962
Total          440         429      424  1,293

In this example, which of the following would it be appropriate to calculate?
Conditional column percentages To explore a categorical relationship, the most appropriate summary is the outcome percentages within each explanatory group. For the table shown, think about whether the row variable (obesity) or the column variable (relationship status) would be the explanatory variable.
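As a quick check, the conditional column percentages can be computed directly from the counts in the table. The sketch below is only illustrative: the dictionary layout and the function name `column_percentages` are assumptions made for this example, not part of the course material.

```python
# Counts copied from the two-way table in the question above.
# Each column (relationship status) is an explanatory group.
counts = {
    "Dating":     {"Obese": 81,  "Not obese": 359},
    "Cohabiting": {"Obese": 103, "Not obese": 326},
    "Married":    {"Obese": 147, "Not obese": 277},
}

def column_percentages(table):
    """For each column, return the percentage of that column in each row."""
    result = {}
    for group, outcomes in table.items():
        column_total = sum(outcomes.values())
        result[group] = {
            outcome: 100 * count / column_total
            for outcome, count in outcomes.items()
        }
    return result

percentages = column_percentages(counts)
for group, pct in percentages.items():
    print(f"{group}: {pct['Obese']:.1f}% obese")
```

The obesity percentage rises from about 18.4% (dating) to 24.0% (cohabiting) to 34.7% (married), which is the comparison the column percentages are meant to expose.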
A local ice cream shop kept track of the number of cans of cold soda it sold each day, and the temperature that day, for two months during the summer. The data are displayed in the scatterplot below: The one outlier corresponds to a day on which the refrigerator for the soda was broken. Which of the following is true?
If the outlier were removed, r would increase. Correct. The outlier makes the data less "tightly clustered" around the linear trend. Removing the outlier will increase the overall tightness of the points around their straight-line trend, so the correlation coefficient would be closer to 1.
The weights (in pounds) and cholesterol levels (in mg/dL) of several individuals were observed. The data are shown in the scatterplot below: The outlier on the graph is likely due to an error in recording the data. Which of the following statements is true?
If the outlier were removed, the correlation coefficient (r) would increase. Correct. The outlier lies outside the linear pattern formed by most of the data. Since correlation measures how tightly the points in a graph cluster about a straight line, removing the outlier, which has the largest deviation from the line, will increase the correlation.
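The effect of such an outlier on r can be seen with a small numerical sketch. The data below are hypothetical (they are not the values from either scatterplot), and `pearson_r` is a helper written for this example that computes r from its definition.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: six points on a rising trend, plus one outlier
# at (7, 1) that falls far below the trend.
x_all = [1, 2, 3, 4, 5, 6, 7]
y_all = [2, 3, 5, 6, 8, 9, 1]

r_with = pearson_r(x_all, y_all)
r_without = pearson_r(x_all[:-1], y_all[:-1])  # drop the last point (the outlier)

print(f"r with outlier:    {r_with:.2f}")
print(f"r without outlier: {r_without:.2f}")
```

In this made-up data set the outlier breaks the linear pattern, so removing it moves r much closer to 1. An outlier that happens to fall along the trend can have the opposite effect, so this applies only to outliers that deviate from the line.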
A local city council's study found that the correlation between number of liquor stores in a neighborhood and neighborhood crime rates across all city neighborhoods was r = 0.88. Which one of the following statements is true?
Population is a possible lurking variable in this scenario, since areas that are more densely populated are more likely to have higher crime rates. Correct. It is possible that a lurking variable, such as population, is behind the observed relationship.
The data in the scatterplot below are an individual's age (in years) and the expected life span (in years). The circles correspond to females and the x's to males. Which of the following conclusions is most accurate?
There is a negative correlation between age and life expectancy for both males and females. Correct. For each separate gender, we see that the points on the scatterplot tend to fall from left to right. This indicates a negative relationship. So the correlation coefficient would be negative for males, and would also be negative for females.
The data in the scatterplot below are an individual's weight and the time it takes (in seconds) on a treadmill to raise his or her pulse rate to 140 beats per minute. The o's correspond to females and the +'s to males. Which of the following conclusions is most accurate?
There is a negative correlation between time and weight for males and for females. Correct. For each separate gender, we see that the points on the scatterplot tend to fall from left to right. This indicates a negative relationship. So the correlation coefficient would be negative for males, and would also be negative for females.
Suppose that the correlation r between two quantitative variables was found to be r = 0. Which of the following is the best interpretation of this correlation value?
There is no linear relationship between the two variables. Correct. The correlation coefficient indicates the strength of the linearity of a scatterplot relationship, and a correlation coefficient that is zero indicates no linear relationship. However, there might still be some other (non-straight-line) kind of relationship.
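A small example shows how r can be 0 even when the variables are perfectly related in a non-linear way. The data are hypothetical and chosen for illustration: y is exactly x squared, with x values symmetric about 0. The helper `pearson_r` is written for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect curved (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

Here y is completely determined by x, yet r comes out as 0 because the falling left half and the rising right half of the curve cancel out.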
A study of admission rates of men and women into graduate programs for several majors at a certain university was conducted. The study found that overall a higher percentage of men were admitted than women. However, when looking at each major program separately it was found that the rate of admission for women was higher than that for men. Which of the following statements is true? Check all that apply.
This is an example of Simpson's paradox. "Type of major" may be a lurking variable in this study. Correct. This is an example of Simpson's paradox, in which an observed association reverses when the additional variable "type of major" is considered. Thus "type of major" is a lurking variable that may explain the difference in admission rates between men and women at this university.
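The reversal is easier to see with numbers. The counts below are hypothetical, invented only to reproduce the pattern described in the question (they are not from any actual study), and the names `admissions`, `rate`, and `overall_rate` are made up for this sketch.

```python
# Hypothetical counts: (admitted, applicants) for each major and gender.
# Major A admits most applicants; Major B admits few.
admissions = {
    "Major A": {"men": (80, 100), "women": (18, 20)},
    "Major B": {"men": (4, 20),   "women": (30, 100)},
}

def rate(admitted, applicants):
    """Admission rate as a percentage."""
    return 100 * admitted / applicants

def overall_rate(gender):
    """Admission rate for one gender with all majors combined."""
    admitted = sum(groups[gender][0] for groups in admissions.values())
    applicants = sum(groups[gender][1] for groups in admissions.values())
    return rate(admitted, applicants)

for major, groups in admissions.items():
    men = rate(*groups["men"])
    women = rate(*groups["women"])
    print(f"{major}: men {men:.0f}%, women {women:.0f}%")

print(f"Overall: men {overall_rate('men'):.0f}%, women {overall_rate('women'):.0f}%")
```

Within each major the women's rate is higher (90% vs. 80%, and 30% vs. 20%), but overall the men's rate is higher (70% vs. 40%). In this made-up data that happens because most women applied to the major that admits few applicants.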
A study was done of all homicide convictions in the State of Florida between 1976 and 1980 in order to examine if the application of the death sentence was racially biased. The data showed that a larger percentage of white suspects (11.2%) were sentenced to death than black suspects (8.5%). However, if the race of the victim was included in the analysis, the study found that for white victims, a larger percentage of black suspects (19.3%) were sentenced to death than white suspects (12.3%). Which of the following is correct? Check all that apply. "Race of the suspect" is a lurking variable in this situation. "Race of the victim" is a lurking variable in this situation. This is an example of Simpson's paradox. This is an example of a negative association.
This is an example of Simpson's paradox. "Race of the victim" is a lurking variable in this situation. You are correct that this is an example of Simpson's paradox, because there is an association that is seen a particular way for each of several groups separately but reverses when the groups are combined. Such reversal typically indicates a lurking variable, which in this case is "race of victim." However, it is only appropriate to call an association positive or negative when the variables are quantitative. But in this case, both variables ("race of suspect" and "use of death penalty") are categorical.
What can we say about the relationship between the correlation r and the slope b of the least-squares line for the same set of data?
r and b have the same sign (+ or −). Correct. Although the correlation r isn't the same as the slope b, the thing they always have in common is their sign.
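The shared sign follows from the standard formula for the least-squares slope, b = r × (s_y / s_x), since the standard deviations s_x and s_y are both positive. The sketch below checks this on two small hypothetical data sets, one rising and one falling. The function `r_and_slope` is a helper written for this example.

```python
import math

def r_and_slope(xs, ys):
    """Return (r, b): the correlation and the least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx
    return r, b

# Hypothetical data sets sharing the same x values.
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 5, 4, 6]
ys_falling = [9, 7, 6, 4, 1]

r_up, b_up = r_and_slope(xs, ys_rising)
r_down, b_down = r_and_slope(xs, ys_falling)

print(f"rising:  r = {r_up:.2f}, b = {b_up:.2f}")
print(f"falling: r = {r_down:.2f}, b = {b_down:.2f}")
```

In both cases r and b have the same sign but different magnitudes, since b also depends on the units and spread of the two variables.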