Statistics test 2
For a linear regression model with square of the correlation r2 and standard error se, which of the following best describes the size of r2 and se that you would most want for your model? A) r2 is small and se is small B) r2 is big and se is big C) r2 is small and se is big D) r2 is big and se is small
D) r2 is big and se is small A big r2 and a small se tell you that a lot of the variation in the response variable is explained by the model and that the typical amount of prediction error for the model is small.
Suppose that we want to examine the relationship between high school GPA and college GPA. We collect data from students at a local college. The linear regression predicted college GPA = 1.07 + 0.62 * high school GPA. One student has a high school GPA of 3.00 and a college GPA of 3.15. What is the residual for this student? A) 0.22 B) −0.22 C) 0.15 D) −0.15
A) 0.22 The predicted college GPA is 1.07 + 0.62(3.00) = 2.93. Residual = observed y − predicted y = 3.15 − 2.93 = 0.22.
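The arithmetic above can be checked with a short Python sketch. The function names are illustrative choices, not part of the original question; only the coefficients and the student's GPAs come from the card.

```python
def predicted_college_gpa(high_school_gpa: float) -> float:
    """Regression line quoted in the question: 1.07 + 0.62 * high school GPA."""
    return 1.07 + 0.62 * high_school_gpa


def residual(observed: float, predicted: float) -> float:
    """Residual = observed y - predicted y."""
    return observed - predicted


predicted = predicted_college_gpa(3.00)
student_residual = residual(3.15, predicted)

# Round for display; floating-point arithmetic may leave tiny trailing digits.
print(round(predicted, 2), round(student_residual, 2))
```

A positive residual means the student did better than the line predicted.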
Again, the scatterplot below shows Olympic gold medal performances in the long jump from 1900 to 1988. The long jump is measured in meters. For the regression line predicted long jump = 7.24 + 0.014 (year since 1900), what does the 7.24 tell us? A) 7.24 meters is the predicted value for the long jump in 1900. B) 7.24 meters is the actual value for the long jump in 1900. C) 7.24 is the minimum value for the long jump from 1900 to 1988. D) 7.24 meters is the predicted increase in the winning long jump distance for each additional year after 1900.
A) 7.24 meters is the predicted value for the long jump in 1900. 7.24 is the y-intercept and thus gives the predicted value in 1900.
Pre-Statistics and Statistics Course Grades: We recorded the pre-statistics course grade (in percent) and introductory statistics course grade (in percent) for 60 community college students. The linear regression equation is: Predicted introductory statistics course grade = - 0.147 + 0.981 (Pre-statistics course grade) What does the slope of the regression line tell us? A) For each 1% increase in the pre-statistics course grade, the predicted introductory statistics course grade will increase 0.981%. B) For each 1% increase in the pre-statistics course grade, the predicted introductory statistics course grade will decrease 0.147%. C) For each 1% increase in the pre-statistics course grade, the introductory statistics course grade will increase 0.981%. D) For each 0.981% increase in the pre-statistics course grade, the predicted introductory statistics course grade will increase 1%.
A) For each 1% increase in the pre-statistics course grade, the predicted introductory statistics course grade will increase 0.981%. (The slope indicates a predicted increase in the introductory statistics course grade for each additional one percent increase in the pre-statistics course grade.)
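One way to see the slope interpretation is to compare two predictions that differ by one percentage point in the pre-statistics grade. This is a minimal sketch; the function name and the grades 80 and 81 are arbitrary illustrative choices.

```python
def predicted_intro_grade(pre_stats_grade: float) -> float:
    """Regression line quoted in the question, grades in percent."""
    return -0.147 + 0.981 * pre_stats_grade


# Raise the explanatory variable by exactly 1 and look at the change in prediction.
increase = predicted_intro_grade(81) - predicted_intro_grade(80)
print(round(increase, 3))
```

The difference equals the slope no matter which starting grade is used, because the intercept cancels out.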
Wedding expenses and marriage length: With the headline, "Want a happy marriage? Have a big, cheap wedding" CNN reported on a study that examined the correlation between wedding expenses and the length of marriages. The news article states, "A new study found that couples who spend less on their wedding tend to have longer-lasting marriages than those who splurge." http://www.cnn.com/2014/10/13/living/wedding-expenses-study/ What would be a reasonable explanation for the observed correlation? A) Having an inexpensive wedding helps young couples avoid financial burdens that may strain their marriage. B) Having an inexpensive wedding guarantees a couple will have a long-term marriage because the study shows a strong correlation between the two variables. C) Having an inexpensive wedding has no impact on the length of marriage because the cost of a wedding is a confounding variable that explains the correlation.
A) Having an inexpensive wedding helps young couples avoid financial burdens that may strain their marriage. A lurking variable that could explain the association between wedding expenses and the length of a marriage is the amount of financial stress the couple has. Financial issues are known to be among the most problematic contributors to marital strife.
Movie data: We collected data from IMDb.com on 70 movies listed in the top 100 US box office sales of all time. These are the variable descriptions: Metascore: Score out of 100, based on major critic reviews as provided by Metacritic.com Total US box office sales: Total box office sales in millions of dollars Rotten Tomatoes: Score out of 100, based on authors from writing guilds or film critic associations We used Metascore ratings as an explanatory variable and Rotten Tomatoes ratings as the response variable in a linear regression. The r2 value is 76%. With US box office sales as the explanatory variable and Rotten Tomatoes ratings as the response variable in a linear regression, the r2 value is 2%. Using the r2 value, which is a better predictor of a movie's Rotten Tomatoes score: Metascore or total US box office sales? A) Metascore B) Total US box office sales
A) Metascore Larger r2 values are preferable because they indicate that the explanatory variable explains a larger portion of the total variation in the response variable.
Movie data: We collected data from IMDb.com on 70 movies listed in the top 100 US box office sales of all time. These are the variable descriptions: Metascore: Score out of 100, based on major critic reviews as provided by Metacritic.com Total US box office sales: Total box office sales in millions of dollars Rotten Tomatoes: Score out of 100, based on authors from writing guilds or film critic associations We used Metascore ratings as an explanatory variable and Rotten Tomato ratings as the response variable in a linear regression. The se value is 11. With US box office sales as the explanatory variable and Rotten Tomato ratings as the response variable in a linear regression, the se value is 22. Using the se value, which is a better predictor of a movie's Rotten Tomatoes score: Metascore or total US box office sales? A) Metascore B) Total US box office sales
A) Metascore The se value is the typical or average prediction error, so a smaller value is preferable because it indicates that the model makes more accurate predictions.
Airfare prices: Suppose that we want to examine the relationship between distance (in miles) and the cost of round-trip airfare from LAX (Los Angeles International Airport). We collect data from a travel search engine. The linear regression equation is Predicted round-trip airfare = 146.2 + 0.1442 (distance) The standard error of the regression se is $117. Which of the following is an appropriate interpretation of se? A) On average, we estimate that our predictions using this linear regression model will be off by approximately ±$117. B) The error for any prediction using this linear regression model will be approximately $117. C) The typical error between the predicted round-trip airfare and the actual round-trip airfare using this linear regression model will be approximately $117 D) On average, we estimate that our predictions using this linear regression model will be off by approximately $117.
A) On average, we estimate that our predictions using this linear regression model will be off by approximately ±$117. The se value is a measurement of typical error in our predictions using the linear regression model.
Pre-Statistics and Statistics Course Grades: We recorded the pre-statistics course grade (in percent) and introductory statistics course grade (in percent) for 60 community college students. Then we generated the following scatterplot of the data. For this linear regression model, r2 = 0.70. What does this mean? A) Our linear regression model explains 70% of the total variation in the introductory statistics course grade. B) There will be about 70% of the data along the regression line. C) The pre-statistics course grade explains 70% of the introductory statistics course grade. D) Our linear regression model explains 70% of the total variation in the pre-statistics course grade.
A) Our linear regression model explains 70% of the total variation in the introductory statistics course grade. The number r2 represents the proportion of the total variation in the introductory statistics course grade explained by the pre-statistics course grade.
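To make "proportion of variation explained" concrete, here is a small sketch that computes r2 two ways: as 1 − (sum of squared residuals)/(total sum of squares), and as the square of the correlation coefficient. The grade values are made up for illustration and are not the 60-student data set from the question.

```python
import math

# Hypothetical grades in percent (NOT the data from the question).
pre_stats = [72.0, 75.0, 80.0, 84.0, 88.0, 91.0, 95.0]
intro_stats = [70.0, 78.0, 77.0, 85.0, 83.0, 92.0, 94.0]

n = len(pre_stats)
mean_x = sum(pre_stats) / n
mean_y = sum(intro_stats) / n
sxx = sum((x - mean_x) ** 2 for x in pre_stats)
syy = sum((y - mean_y) ** 2 for y in intro_stats)  # total variation in the response
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pre_stats, intro_stats))

# Least-squares slope and intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left unexplained by the line.
ss_residual = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(pre_stats, intro_stats)
)

r_squared = 1 - ss_residual / syy
r = sxy / math.sqrt(sxx * syy)

print(f"r2 from sums of squares: {r_squared:.3f}")
print(f"r2 from correlation:     {r * r:.3f}")
```

Both lines print the same value, which is why r2 can be read as the share of the response variable's variation that the line accounts for.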
The scatterplot below shows Olympic gold medal performances in the long jump from 1900 to 1988. The long jump is measured in meters. The Olympics were not held in 1940 because of World War II. If the Olympics had happened in 1940, how could you estimate the gold medal winning distance in the long jump for that year? A) Plug 40 into the regression equation: predicted long jump = 7.24 + 0.014 (40). B) Plug 1940 into the regression equation: predicted long jump = 7.24 + 0.014 (1940).
A) Plug 40 into the regression equation: predicted long jump = 7.24 + 0.014 (40). The independent variable is years since 1900.
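The sketch below shows why the input must be years since 1900 rather than the calendar year. The function name and the conversion step are illustrative assumptions; the coefficients are the ones quoted in the question.

```python
def predicted_long_jump(calendar_year: int) -> float:
    """Predicted winning distance in meters.

    The regression line uses 'years since 1900' as its explanatory
    variable, so the calendar year is converted first.
    """
    years_since_1900 = calendar_year - 1900
    return 7.24 + 0.014 * years_since_1900


correct = predicted_long_jump(1940)      # uses x = 40
wrong_input = 7.24 + 0.014 * 1940        # uses x = 1940 by mistake

print(round(correct, 2))
print(round(wrong_input, 2))
```

Plugging in 1940 directly gives a distance of more than 30 meters, which is a quick sanity check that the wrong units were used.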
Pre-Statistics and Statistics Course Grades: We recorded the pre-statistics course grade (in percent) and introductory statistics course grade (in percent) for 60 community college students. In this data set, no one earned a 90% for the pre-statistics course grade. How could you estimate a student's introductory statistics course grade if she earned 90% for the pre-statistics course grade? A) Substitute 90 into the regression equation: Predicted introductory statistics course grade = -0.147 + 0.981(90). B) Substitute 0.90 into the regression equation: Predicted introductory statistics course grade = -0.147 + 0.981(0.90)
A) Substitute 90 into the regression equation: Predicted introductory statistics course grade = -0.147 + 0.981(90). The explanatory variable is the pre-statistics course grade (in percent). Thus, there is no need to convert 90% into decimal form.
Influential outlier: The correlation coefficient for the scatterplot above is r = −0.72. The point (17, 21) appears to be an outlier since it doesn't follow the general pattern of the data (it's much lower on the left side of a negative regression line), so we want to determine how much it influences the value of r. To do this, we remove the outlier from the data set and recalculate r. How does the value of r change? A) The correlation coefficient would be stronger and therefore closer to −1.00. B) The correlation coefficient would remain the same (r = −0.72). C) The correlation coefficient would be weaker and therefore closer to 0.
A) The correlation coefficient would be stronger and therefore closer to −1.00. (The correlation coefficient is sensitive to influential points, so the value of r would be stronger with the removal of the outlier from the data set.)
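A small experiment can illustrate the effect. The data below are invented: seven points with a clear negative trend plus the point (17, 21) from the question. They are not the scatterplot's actual data, so the resulting r values will differ from −0.72.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points that follow a negative linear pattern.
pattern_x = [20, 22, 25, 27, 30, 33, 35]
pattern_y = [42, 39, 37, 33, 31, 27, 26]

# The same points plus the outlier, which sits low on the left side.
with_outlier_x = [17] + pattern_x
with_outlier_y = [21] + pattern_y

r_with = correlation(with_outlier_x, with_outlier_y)
r_without = correlation(pattern_x, pattern_y)

print(f"r with outlier:    {r_with:.2f}")
print(f"r without outlier: {r_without:.2f}")
```

With the outlier removed, r moves closer to −1, matching answer A.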
The correlation coefficient for the scatterplot above is r = 0.41. The point (18, 78) appears to be an outlier since it doesn't follow the general pattern of the data (it's on the bottom right side of a positive regression line), so we want to determine how much it influences the value of r. To do this, we remove the outlier from the data set and recalculate r. How does the value of r change? A) The correlation coefficient would increase to a value higher than r = 0.41. B) The correlation coefficient would remain the same (r = 0.41). C) The correlation coefficient would decrease to a value lower than r = 0.41.
A) The correlation coefficient would increase to a value higher than r = 0.41. The correlation coefficient is sensitive to influential points, so the value of r would increase with the removal of the outlier from the data set.
Which of the following statements is true of a least-squares regression line? Check all that apply. A) The least-squares regression line is chosen so that the sum of the squares of the residuals is as small as possible. B) The least-squares regression line is the only line with the smallest sum of the squares of the errors. C) If all the points are on a line, then the sum of the squares of errors is zero. D) The sum of the squares of the residuals is always equal to r2.
A) The least-squares regression line is chosen so that the sum of the squares of the residuals is as small as possible. B) The least-squares regression line is the only line with the smallest sum of the squares of the errors. C) If all the points are on a line, then the sum of the squares of errors is zero. The errors are the same as the residuals, and it is the regression line that makes the sum of the squares of the errors as small as possible. Additionally, there is only one line that makes the sum of the squares of the errors as small as possible, and that is the regression line. Lastly, if all the points are on a line, then there is zero error, so the sum of the squares of the errors is also zero.
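These three properties can be checked numerically. The sketch below fits a line with the standard least-squares formulas, nudges the fitted intercept and slope to show that every nudge increases the sum of squared residuals, and then fits points that lie exactly on a line. All data values and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (observed y - predicted y)^2 for the given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical scattered data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

best_intercept, best_slope = fit_line(xs, ys)
best_ssr = sum_squared_residuals(xs, ys, best_intercept, best_slope)

# Any other line, here small nudges of the fitted one, has a larger sum.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
nudged_ssrs = [
    sum_squared_residuals(xs, ys, best_intercept + di, best_slope + ds)
    for di, ds in nudges
]

# Points exactly on y = 1 + 2x: the fitted line passes through all of them.
line_xs = [1.0, 2.0, 3.0, 4.0]
line_ys = [3.0, 5.0, 7.0, 9.0]
line_intercept, line_slope = fit_line(line_xs, line_ys)
perfect_ssr = sum_squared_residuals(line_xs, line_ys, line_intercept, line_slope)

print(best_ssr, nudged_ssrs, perfect_ssr)
```

Note that a sum of squares can never be negative, which is why option D in the related question below is false.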
Suppose that we want to examine the relationship between high school GPA and college GPA. We collect data from students at a local college. The linear regression predicted college GPA = 1.07 + 0.62 * high school GPA. The standard error of the regression, se, was 0.374. What does this value of the standard error of the regression mean? A) The typical error between a predicted college GPA using this model and an actual college GPA for a given student will be about 0.374 grade points in size (absolute value). B) 37.4% of the variation in college GPA is explained by this regression model. C) 37.4% of the variation in college GPA is not explained by this regression model. D) The typical difference between a student's college GPA and high school GPA will be about 0.374 grade points in size (absolute value).
A) The typical error between a predicted college GPA using this model and an actual college GPA for a given student will be about 0.374 grade points in size (absolute value). The standard error is roughly a measure of the average or typical distance of the points about the regression line.
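As a rough illustration of where se comes from, the sketch below uses the common textbook definition se = sqrt(sum of squared residuals / (n − 2)) and compares it with the plain average size of the residuals. The GPA pairs are invented, so the result will not be 0.374, and the course may define se slightly differently.

```python
import math

# Hypothetical (high school GPA, college GPA) pairs, not the question's data.
high_school = [2.4, 2.8, 3.0, 3.2, 3.5, 3.7, 3.9]
college = [2.3, 3.1, 2.7, 3.3, 3.0, 3.6, 3.4]

n = len(high_school)
mean_x = sum(high_school) / n
mean_y = sum(college) / n
sxx = sum((x - mean_x) ** 2 for x in high_school)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(high_school, college))

# Least-squares line fitted to the hypothetical data.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(high_school, college)]

# Standard error of the regression (assumed definition: divide by n - 2).
se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Average absolute residual, for comparison with the "typical error" reading.
mean_abs_residual = sum(abs(e) for e in residuals) / n

print(f"se = {se:.3f}, mean |residual| = {mean_abs_residual:.3f}")
```

The two numbers are on the same scale as the response variable (grade points), which is why se is read as a typical prediction error rather than as a percentage.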
The standard error se for this linear model for Olympic gold medal performances in the long jump from 1900 to 1988 (predicted long jump = 7.24 + 0.014(year since 1900)) is about 0.2. In terms of estimating long jump distances, what does this mean? A) On average, estimates will be up to 0.2 meters too short. B) On average, estimates will be either 0.2 meters too short or too long. C) 20% of the error in the estimates is explained by the regression line. D) 4% of the error in the estimates is explained by the regression line.
B) On average, estimates will be either 0.2 meters too short or too long. (The standard error gives the average error in either the positive or negative direction.)
If the correlation coefficient for a given scatterplot is r = 0.73, which of the following must be true about the relationship between the explanatory and response variable? A) The association has a linear form. B) The association must be positive. C) The association must be negative. D) The association is weakened by outliers.
B) The association must be positive. The correlation coefficient gives us information about the direction and strength of the relationship. A positive r-value tells us the association is positive (upward).
Which one of the following statements is true of a least-squares regression line? A) The least-squares regression line is chosen so that the sum of the squares of the residuals is as large as possible. B) The least-squares regression line is the only line with the smallest sum of the squares of the errors. C) If all the points are on a line, then the sum of the squares of errors is 1. D) The sum of the squares of the residuals can be positive or negative.
B) The least-squares regression line is the only line with the smallest sum of the squares of the errors. There is only one line that makes the sum of the squares of the errors as small as possible, and that is the regression line.
Videogame data: We collected data on the top 50 best-selling video games. These are the variable descriptions: Metascore: Score out of 100, based on major critic reviews as provided by Metacritic.com Total Worldwide Sales: Total number of copies sold worldwide since December 31, 2015. GameRankings: Score out of 100%, based on average scores from websites and magazine reviews. We used Metascore ratings as an explanatory variable and GameRankings ratings as the response variable in a linear regression. The r2 value is 48%. With Total Worldwide Sales as the explanatory variable and GameRankings ratings as the response variable in a linear regression, the r2 value is 62%. Using the r2 value, which is a better predictor of a video game's GameRankings score: Metascore or Total Worldwide Sales? A) Metascore B) Total Worldwide Sales
B) Total Worldwide Sales (Larger r2 values are preferable because they indicate that the explanatory variable explains a larger portion of the total variation in the response variable.)
Videogame data: We collected data on the top 50 best-selling video games. These are the variable descriptions: Metascore: Score out of 100, based on major critic reviews as provided by Metacritic.com Total Worldwide Sales: Total number of copies sold worldwide since December 31, 2015. GameRankings: Score out of 100%, based on average scores from websites and magazine reviews. We used Metascore ratings as an explanatory variable and GameRankings ratings as the response variable in a linear regression. The se value is 24. With Total Worldwide Sales as the explanatory variable and GameRankings ratings as the response variable in a linear regression, the se value is 10. Using the se value, which is a better predictor of a video game's GameRankings score: Metascore or Total Worldwide Sales? A) Metascore B) Total Worldwide Sales
B) Total Worldwide Sales (The se value is the typical or average prediction error, so a smaller value is preferable because it indicates that the model makes more accurate predictions.)
Airfare prices: Suppose that we want to examine the relationship between distance (in miles) and the cost of round-trip airfare from LAX (Los Angeles International Airport). We collect data from a travel search engine. The linear regression equation is Predicted round-trip airfare = 146.2 + 0.1442 (distance) The round-trip airfare to fly between LAX and San Francisco is $174, and the distance traveled is 382 miles. What is the residual related to this flight itinerary? A) $27 B) −$27 C) $3 D) −$3
B) −$27 The predicted round-trip airfare is 146.2 + 0.1442(382) ≈ $201. Residual = observed y − predicted y = 174 − 201 = −27, or −$27.
Movie ratings: This scatterplot below consists of 72 movies listed in the top 100 USA box office sales of all time. The Rotten Tomatoes score (percent) is based on authors from writing guilds or film critic associations. Metascores range from 1 to 100 based on major critic reviews provided by Metacritic.com. Higher scores for both Rotten Tomatoes scores and Metascores indicate better overall reviews. Which equation is a reasonable description of the least-squares regression line for the Predicted Rotten Tomatoes Score? A) -7.476 - 1.201 * Metascore B) 1.201 - 7.476 * Metascore C) -7.476 + 1.201 * Metascore D) 29.476 + 1.201 * Metascore
C) -7.476 + 1.201 * Metascore The slope is positive because the line rises as we go from left to right, and if we extend the line back to a Metascore of 0, the y-intercept will be negative.
The scatterplot below shows Olympic gold medal performances in the long jump from 1900 to 1988. The long jump is measured in meters. For this linear regression model, r2 = 0.90. What does this mean? A) The maximum long jump was around 90 inches. B) Each year the winning long jump distance increased by 90%. C) 90% of the variation in long jump distances is explained by the regression line. D) The data ends around 1990.
C) 90% of the variation in long jump distances is explained by the regression line. The number r2 gives the proportion of the variation in the long jump explained by the change in years.
The scatterplot below shows Olympic gold medal performances in the long jump and the high jump from 1900 to 1988. Each point shows the long jump distance that won the gold medal and also the height that won the high jump for the same year. The correlation between the two kinds of jumps is 0.89. What would be a reasonable explanation for this high correlation? A) The athletes' age is a lurking variable explaining the association between long jump and high jump performance. B) Long jumpers have improved their performance and this has inspired the high jumpers to improve their performance. C) Athletes have gotten better over time and performance improvements have occurred in both events over time.
C) Athletes have gotten better over time and performance improvements have occurred in both events over time. This is 88 years of data. Time is the lurking variable. Athletes have gotten better over time.
Each month a bank adjusts the initial interest rate it offers to customers who wish to open a new high-yield savings account. The bank wants to determine if there is a relationship between the initial interest rate and the average daily number of new savings accounts. The bank plans to use the interest rate to predict the average number of new savings accounts opened in a month. Which one of the following statements is correct? A) Both the interest rate and the average daily number of new accounts are explanatory variables. B) Both the interest rate and the average daily number of new accounts are response variables. C) Average daily number of new accounts is the response variable. Interest rate is the explanatory variable. D) Interest rate is the response variable. Average daily number of new accounts is the explanatory variable.
C) Average daily number of new accounts is the response variable. Interest rate is the explanatory variable. The variable whose value we are predicting is the response variable; the variable we are using to make that prediction is the explanatory variable. We are predicting the average number of new savings accounts that would open in a month based on the initial interest rate offered for that month.
For a linear regression model with standard error se, why would you want a small se in your model? A) It means more of the variation in the response variable is explained by the model. B) It means less of the variation in the response variable is explained by the model. C) It means the typical amount of prediction error for the model will be small. D) It means the typical amount of prediction error for the model will be large.
C) It means the typical amount of prediction error for the model will be small. A small se tells you that the typical amount of prediction error for the model is small.
Each month a bank adjusts the initial interest rate it offers to customers who wish to open a new high-yield savings account. The bank might offer an initial interest rate of 8%. The bank wants to predict the average daily number of new accounts to expect if the interest rate is 8%. Should the regression line be used for this prediction? A) Yes, as the model probably has a very high r2 value. B) Yes, as the model appears to have a strong linear association, and the residual values appear small in size. C) No, because the model was developed using only initial interest rate values from 0.5% to 3.8%, so it is risky to assume that the linear trend will continue far beyond that span of values.
C) No, because the model was developed using only initial interest rate values from 0.5% to 3.8%, so it is risky to assume that the linear trend will continue far beyond that span of values. No matter how strong the association is, how high the r2 value is, or how small in size the residuals are, since the model was developed using only initial interest rate values from 0.5% to 3.8%, 8% would be too far beyond that span of values to assume that the linear trend would continue.
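A simple guard against extrapolation is to compare the requested value with the range of explanatory values used to build the model. The helper below is a hypothetical sketch; only the 0.5% to 3.8% range and the 8% rate come from the question.

```python
def is_extrapolation(x: float, observed_min: float, observed_max: float) -> bool:
    """True when x lies outside the range of data used to fit the line."""
    return x < observed_min or x > observed_max


# Interest rates (in percent) used to develop the bank's model.
RATE_MIN, RATE_MAX = 0.5, 3.8

print(is_extrapolation(8.0, RATE_MIN, RATE_MAX))  # 8% is far above the range
print(is_extrapolation(2.0, RATE_MIN, RATE_MAX))  # 2% is inside the range
```

The same check applies to the pre-statistics grade question: a grade of 60% falls outside a model built on grades between 70% and 95%.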
Pre-Statistics and Statistics Course Grades: We recorded the pre-statistics course grade (in percentage) and introductory statistics course grade (in percentage) for 60 community college students. Suppose a struggling student who is currently taking pre-statistics and not passing (60%) wants to predict his introductory statistics course grade. Should the regression line be used for this prediction? A) Yes, as the model probably has a very high r2 value. B) Yes, as the model appears to have a strong linear association, and the residual values appear small in size. C) No, because the model was developed using only pre-statistics course grades between 70% and 95%, so it is risky to assume that the linear trend will continue far beyond that span of values.
C) No, because the model was developed using only pre-statistics course grades between 70% and 95%, so it is risky to assume that the linear trend will continue far beyond that span of values. No matter how strong the association is, how high the r2 value is, or how small in size the residuals are, since the model was developed using only pre-statistics course grades between 70% and 95%, 60% would be too far beyond that span of values to assume that the linear trend would continue.
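The extrapolation reasoning in the bank and course-grade cards above can be sketched as a simple range check. This is a minimal illustration, not part of the original cards: the function name within_observed_range is hypothetical, and the ranges are the ones stated in the answers (0.5% to 3.8% for interest rates, 70% to 95% for pre-statistics grades).

```python
def within_observed_range(x, x_min, x_max):
    """Return True when x lies inside the span of explanatory values
    that were used to build the regression model."""
    return x_min <= x <= x_max

# Bank card: model built on initial interest rates from 0.5% to 3.8%.
bank_ok = within_observed_range(8.0, 0.5, 3.8)

# Grades card: model built on pre-statistics grades from 70% to 95%.
grade_ok = within_observed_range(60.0, 70.0, 95.0)

# Both values fall outside the observed span, so predicting at them
# would be extrapolation.
print(bank_ok, grade_ok)
```

A value that fails this check is not automatically unpredictable, but the farther it sits from the observed span, the riskier it is to assume the linear trend still holds.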
Correlation coefficient: If the correlation coefficient for a given scatterplot is r = -0.81, which of the following must be true about the relationship between the explanatory and response variable? A) The association has a linear form. B) The association must be positive. C) The association must be negative. D) The association is weakened by outliers.
C) The association must be negative. The correlation coefficient gives us information about the direction and strength of the relationship. A negative r-value tells us the association is negative (downward).
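To see how the sign of r follows the direction of the association, here is a minimal sketch that computes the correlation coefficient by hand. The data values are made up for illustration (they are not from any card), and pearson_r is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Compute the correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y tends to fall as x rises.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 9, 5, 4]

r = pearson_r(xs, ys)
print(r < 0)  # a downward trend gives a negative r
```

Note that r alone does not confirm a linear form; that has to be judged from the scatterplot.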
The scatterplot shows Olympic gold medal performances in the long jump from 1900 to 1988. The long jump is measured in meters. The regression equation is predicted long jump = 7.24 + 0.014 (year since 1900). What does the slope of the regression line tell us? A) Each year the winning Olympic long jump performance is expected to increase 7.24 meters. B) Each year the winning Olympic long jump performance is expected to increase 0.014 meters. C) Each time the Olympics are held (generally every four years), the long jump gold medal winner will definitely achieve a 0.014 meter increase over the previous gold medal winner. D) Each time the Olympics are held (generally every four years), there is a predicted 1.4% increase in long jump performance.
B) Each year the winning Olympic long jump performance is expected to increase 0.014 meters. The slope gives the predicted change in the winning long jump for each additional year. Choice D is incorrect because the slope is not a percentage. Also, the slope indicates a predicted (not definite) increase in the winning long jump for each additional year, not the increase from one Olympics to the next (generally every four years).
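The meaning of the slope in predicted long jump = 7.24 + 0.014 (year since 1900) can be checked numerically. The sketch below uses only the equation from the card; the function and constant names are hypothetical.

```python
INTERCEPT = 7.24  # predicted winning distance in meters for 1900
SLOPE = 0.014     # predicted change in meters per additional year

def predicted_long_jump(years_since_1900):
    """Predicted winning long jump (meters) from the card's regression line."""
    return INTERCEPT + SLOPE * years_since_1900

# Change in the prediction over one additional year: equals the slope.
one_year_change = predicted_long_jump(1) - predicted_long_jump(0)

# Change over a typical four-year gap between Olympics: four times the slope.
four_year_change = predicted_long_jump(4) - predicted_long_jump(0)

print(round(one_year_change, 3), round(four_year_change, 3))
```

The one-year change is in meters, not percent, and the four-year change is a predicted difference, not a guaranteed one.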
Math workshops and final exams: The college tutoring center staff are considering whether the center should increase the number of math workshops they offer to help students improve their performance in math classes. Faculty would like to know if requiring student attendance at these math workshops will improve overall passing rates for their students in their math classes. They plan to use the number of workshops attended to predict the final exam score and regression analysis to determine the effectiveness of the mandatory workshop attendance policy. Which is the response variable? A) Whether the student attended a workshop (yes, no) B) Number of workshops attended C) Whether the student passes the course (yes, no) D) Final exam score
D) Final exam score The variable "Final exam score" is what we are trying to predict, based on the number of workshops a student attends. This is the response variable for this scenario.
Pre-Statistics and Statistics Course Grades: We recorded the pre-statistics course grade (in percent) and introductory statistics course grade (in percent) for 60 community college students. The standard error se for the linear regression model that predicts introductory statistics course grades using pre-statistics course grades (predicted introductory statistics course grade = -0.147 + 0.981 (pre-statistics grade)) is about 4.5%. In terms of estimating introductory statistics course grades, what does this mean? A) On average, our estimates will be up to 4.5% too low. B) On average, our estimates will be up to 4.5% too high. C) 4.5% of the error in our estimates is explained by the regression line. D) On average, our estimates will be up to 4.5% too low or too high.
D) On average, our estimates will be up to 4.5% too low or too high. The standard error gives the typical size of a prediction error, which can fall in either the positive or the negative direction.
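As a rough illustration of where a value like se comes from, the sketch below fits a least-squares line and computes the standard error, assuming the common definition se = sqrt(SSE / (n - 2)). The six data points are invented for the example and are not the 60 students from the card; fit_line and regression_standard_error are hypothetical helper names.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def regression_standard_error(xs, ys):
    """Standard error of the regression: sqrt(SSE / (n - 2))."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical grades (percent), for illustration only.
pre_stats = [70, 75, 80, 85, 90, 95]
intro_stats = [68, 77, 76, 86, 87, 96]

se = regression_standard_error(pre_stats, intro_stats)
print(se)  # typical size of a prediction error, in percentage points
```

Because the residuals are squared before they are summed, se measures the size of the errors without regard to sign, which is why the interpretation is "too low or too high".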