Chapter 10
There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. What is the correlation coefficient? (Try to figure out the correct answer without calculating the correlation coefficient.)
+0.88
The value of r, the linear correlation coefficient, that represents the strongest negative correlation between two variables is
-1
The value of r, the linear correlation coefficient, that represents no correlation between two variables is
0
The value of r, the linear correlation coefficient, that represents the strongest positive correlation between two variables is
1
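The three boundary values of r above can be checked with a small sketch. The data sets below are made up for illustration, and `pearson_r` is a hypothetical helper written from the standard formula for the sample correlation coefficient.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points lie exactly on a falling line
r_none = pearson_r(x, [2, 1, 4, 1, 2])       # no linear trend in either direction
r_positive = pearson_r(x, [3, 5, 7, 9, 11])  # points lie exactly on a rising line

print(r_negative, r_none, r_positive)
```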
In regression, what is the difference between an observed value of the response variable and its predicted value called?
A residual
What is a residual?
A residual is a value of y − ŷ, which is the difference between an observed value of y and a predicted value of y.
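A minimal sketch of the residual calculation, assuming a made-up regression line (intercept 1, slope 2) and three made-up observations. Note that a residual is negative whenever the observed value falls below the line.

```python
def predict(x, b0=1.0, b1=2.0):
    """Predicted y from an assumed line with intercept b0 and slope b1."""
    return b0 + b1 * x

# (x, observed y) pairs, invented for illustration
observed = [(1, 3.5), (2, 4.0), (3, 7.0)]

# residual = observed y minus predicted y
residuals = [y - predict(x) for x, y in observed]

print(residuals)
```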
What is a scatterplot and how does it help us?
A scatterplot is a graph of paired (x, y) quantitative data. It provides a visual image of the data plotted as points, which helps show any patterns in the data.
Which of the following is NOT one of the three common errors involving correlation?
Correlation does not imply causality
Which of the following is not equivalent to the other three?
Dependent variable
Which of the following is NOT a requirement in determining whether there is a linear correlation between two variables?
If r>1, then there is a positive linear correlation.
Which of the following is NOT true for a hypothesis test for correlation?
If |r|>critical value, we should fail to reject the null hypothesis and conclude that there is not sufficient evidence to support the claim of a linear correlation.
Suppose the equation of a least-squares regression line is y=−3.17−2.4x. What can be said about the y-intercept?
It is −3.17.
Suppose the equation of a least-squares regression line is y=−3.17−2.4x. What can be said about the correlation coefficient?
It is negative, but its exact value cannot be determined from the given information.
In regression, what can be said about the sum of the residuals of all the observations?
It will always be 0.
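This property applies to the least-squares line with an intercept term. The sketch below fits such a line to a small made-up data set with the usual formulas for slope and intercept, then sums the residuals. The function name `least_squares` is a hypothetical helper.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line fitted to paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

# Zero up to floating-point rounding
print(b0, b1, residual_sum)
```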
When analyzing two quantitative variables, what is the first thing that should be done?
Make a scatterplot.
Twenty different statistics students are randomly selected. For each of them, their body temperature (°C) is measured and their head circumference (cm) is measured. If it is found that r=0, does that indicate that there is no association between these two variables?
No, because while there is no linear correlation, there may be a relationship that is not linear.
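As an illustration of r = 0 alongside a clear nonlinear relationship, the sketch below uses made-up data in which y is exactly x squared. The helper `pearson_r` is hypothetical and written from the standard formula.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # y is completely determined by x, but not linearly

r = pearson_r(xs, ys)
print(r)  # no linear correlation despite a perfect parabolic pattern
```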
If we find that there is a linear correlation between the concentration of carbon dioxide in our atmosphere and the global temperature, does that indicate that changes in the concentration of carbon dioxide cause changes in the global temperature?
No. The presence of a linear correlation between two variables does not imply that one of the variables is the cause of the other variable.
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. A hungry customer wanted to estimate the number of calories in a sandwich based on its serving size. With this in mind, which variable would go on the y-axis in the scatterplot?
Number of calories goes on the y-axis, since it is the response variable.
There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. The least-squares regression equation is y=33.967+11.358x, where y is the interval from the end of the current eruption to the beginning of the next eruption and x is the duration of current eruption. For a duration of 4 minutes, y=75.4 minutes. What does this mean?
The average wait-time until the next eruption for all eruptions that last 4 minutes is 75.4 minutes. For eruptions that last 4 minutes, it is estimated that a visitor will have to wait 75.4 minutes after the current eruption ends before the next eruption begins.
What is the definition of the correlation coefficient?
The correlation coefficient is a measure that describes the direction and strength of the linear relationship between two quantitative variables.
What is the difference between the following two regression equations? ŷ = b0 + b1x y=
The first equation is for sample data; the second equation is for a population.
How is the best-fitting line between the points in a scatterplot defined?
The line that gives the smallest sum of the squared vertical distances between each point and the line
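A rough sketch of this definition, using made-up data: the least-squares line is compared with one arbitrarily chosen alternative line, and the sum of squared vertical distances is computed for each. The helper names `least_squares` and `sse` are hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line fitted to paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def sse(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
sse_fit = sse(xs, ys, b0, b1)      # least-squares line
sse_other = sse(xs, ys, 2.0, 0.7)  # an arbitrary nearby line for comparison

print(sse_fit, sse_other)
```

Trying other intercept and slope pairs in place of 2.0 and 0.7 should never give a sum below `sse_fit`.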
Which of the following is NOT a property of the linear correlation coefficient r?
The linear correlation coefficient r is robust. That is, a single outlier will not affect the value of r.
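To see why that statement is false, the sketch below computes r for a made-up data set that lies exactly on a line, then again after adding a single outlier. The helper `pearson_r` is hypothetical and written from the standard formula.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

r_without = pearson_r(xs, ys)           # perfectly linear data
r_with = pearson_r(xs + [6], ys + [0])  # same data plus one outlier at (6, 0)

print(r_without, r_with)
```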
Which of the following is not a requirement for regression analysis?
The method for regression analysis line is not robust. It is seriously affected by a small departure from a normal distribution.
In what sense is the regression line the straight line that "best" fits the points in a scatterplot?
The regression line has the property that the sum of squares of the residuals is the lowest possible sum.
There is a certain geyser that erupts on a regular basis. Researchers are interested in the relationship between the duration of a current eruption of the geyser (duration) and the time between when that eruption ends and the next eruption begins (interval). Review the accompanying scatterplot of 222 eruptions of the geyser. The least-squares regression equation is y=33.967+11.358x, where y is the interval from the end of the current eruption to the beginning of the next eruption and x is the duration of current eruption. In this equation, what is 11.358?
The slope of the least-squares regression line
In this section we use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being linear?
The term linear refers to a straight line, and r measures how well a scatterplot fits a straight-line pattern.
What is the relationship between the linear correlation coefficient r and the slope b1 of a regression line?
The value of r will always have the same sign as the value of b1.
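The reason can be written as one identity, where s_x and s_y denote the sample standard deviations of the x- and y-values (symbols introduced here for illustration):

```latex
b_1 = r \, \frac{s_y}{s_x}, \qquad s_x > 0, \; s_y > 0
```

Because s_y / s_x is positive, b1 and r must share the same sign. The slope alone does not determine the size of r, since that also depends on the two standard deviations.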
Which of the following statements best describes this scatterplot? 12
There are two clusters of points. The relationship between X and Y for each cluster is strong and positive.
Gina calculated a correlation coefficient between hours studied and grade point average as +0.75. Which of the following is a correct statement based on this correlation coefficient?
There is a fairly strong positive relationship between hours studied and grade point average, indicating that grade point averages tend to be higher for students who study more.
Data were collected on many different variables of a fast food chain's sandwiches several years ago. Two variables were the serving size (in ounces) of a sandwich and the number of calories in the sandwich. Review the accompanying scatterplot of serving size versus number of calories. Which of the following best describes the relationship between these two variables?
There is a fairly strong positive relationship with no extreme outliers.
Which of the following statements best describes this scatterplot? 10
There is a negative, moderately strong relationship between X and Y with one outlier.
Which of the following statements best describes this scatterplot? 11
There is a non-linear relationship between X and Y with two outliers.
What does a correlation coefficient of 0 indicate?
There is no linear relationship between the two quantitative variables.
In regression, a residual can be negative. Is this statement true or false?
True
When making predictions based on regression lines, which of the following is not listed as a consideration?
Use the regression line for predictions only if the data go far beyond the scope of the available sample data.
Which of the following statements about correlation is true?
We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase.
What is a variable other than x and y that simultaneously affects both variables called?
a lurking variable
The point circled in red corresponds to an eruption that lasted _______ and had a time until the next eruption began of _______ after the eruption ended.
about 3 minutes; about 72 minutes
A __________ exists between two variables when the values of one variable are somehow associated with the values of the other variable.
correlation
A high correlation coefficient indicates that the relationship between the two quantitative variables must be linear.
false
Determine if the following statement is true or false. A correlation coefficient close to 1 is evidence of a cause-and-effect relationship between the two variables.
false
Paired sample data may include one or more ___________, which are points that strongly affect the graph of the regression line.
influential points
A straight line satisfies the __________________ if the sum of the squares of the residuals is the smallest sum possible.
least-squares property
The ______________ measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample.
linear correlation coefficient r
When performing a linear regression analysis, it is important that the relationship between the two quantitative variables be _______.
linear.
In working with two variables related by a regression equation, the _________________ in a variable is the amount that it changes when the other variable changes by exactly one unit.
marginal change
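A small sketch of marginal change, reusing the line with intercept −3.17 and slope −2.4 from an earlier card. The x-values 5 and 6 are arbitrary choices; any two values one unit apart give the same result.

```python
def predict(x):
    """Predicted y from the line with intercept -3.17 and slope -2.4."""
    return -3.17 - 2.4 * x

# Change in predicted y when x increases by exactly one unit
marginal_change = predict(6) - predict(5)

print(marginal_change)  # equals the slope, up to floating-point rounding
```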
In a scatterplot, a(n) ______________ is a point lying far away from the other data points.
outlier
Given a collection of paired sample data, the ____________________ ŷ = b0 + b1x algebraically describes the relationship between the two variables, x and y.
regression equation
For a pair of sample x- and y-values, the ______________ is the difference between the observed sample value of y and the y-value that is predicted by using the regression equation.
residual
A ______________ is a scatterplot of the (x,y) values after each of the y-coordinate values has been replaced by the residual value y − ŷ.
residual plot
A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
When determining whether there is a correlation between two variables, one should use a ____________ to explore the data visually.
scatterplot
The line that fits best between the points in a scatterplot is the line that gives the _______ sum of the squared _______ distances between each point and the line.
smallest; vertical
A correlation coefficient can be 0.
true