Stat 042 chapter 7 homework
Question 22 Researchers hoping to find ways to make a good estimate of a person's body fat percentage immersed 20 male subjects in water, then measured their waists and recorded their weights. The results are shown in the accompanying table. The linear model used to predict % Body Fat from Weight has R² = 47.1% and s_e = 6.9%. Would a model that uses the person's Waist size be able to predict the % Body Fat more accurately than one that uses Weight? Create and analyze that model. 1. Write the equation of the regression line.
Predicted % Body Fat = -62 + 2.21 × Waist
Question 3 A least squares regression line was calculated to relate the length (cm) of newborn boys to their weight in kg. (Look at Pearson to see the equation.) A newborn was 48 cm long and weighed 3 kg. According to the regression model, what was his residual? What does that say about him? 1. What was his residual? 2. What does that say about him? Select the correct choice and fill in any answer boxes to complete your answer. A. The newborn weighs ____ kg more than the weight predicted by the regression equation. B. The newborn weighs ____ kg less than the weight predicted by the regression equation. C. The newborn weighs the same as the weight predicted by the regression equation.
1. -0.092 kg 2. B
A least squares regression line was calculated to relate the length (cm) of newborn boys to their weight in kg. The line is predicted weight = -5.28 + 0.1694 × length. Explain in words what this model means. Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation? 1. What does the given model mean? A. The weight of a newborn boy will always equal -5.28 kg plus 0.1694 kg per cm of length. B. The minimum length of a newborn baby boy can be no less than 31.1688 cm. C. The weight of a newborn boy can be predicted as -5.28 kg plus 0.1694 kg per cm of length. D. The length of a newborn boy can be predicted as -5.28 cm plus 0.1694 cm per kg of weight. 2. Should new parents (who tend to worry) be concerned if their newborn's length and weight don't fit this equation? A. No, because this is a model fit to divide the data. All newborn weights above the line are normal and all newborn weights below the line are a matter for concern. B. Yes, because any newborn whose length and weight do not fit the model is far outside the normal weight-to-length proportion. C. Yes, because 97% of newborn babies fit into this linear model perfectly. D. No, because this is a model fit to data. No particular baby should be expected to fit this model exactly.
1. C 2. D
Question 18 Suppose the entering freshmen at a certain college have a mean combined SAT score of 1218, with a standard deviation of 126. In the first semester, these students attained a mean GPA of 2.62, with a standard deviation of 0.52. A scatterplot showed the association to be reasonably linear, and the correlation between SAT score and GPA was 0.48. What SAT score would you predict a freshman who attained a first-semester GPA of 2.9 would have gotten? Note that in this case, the explanatory variable is the student's GPA and the response variable is their SAT score.
1251
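A minimal sketch of how this answer can be checked, assuming the usual least squares facts that the slope is r × s_y / s_x and that the line passes through the point of means. The function name `predict_from_summary` is hypothetical; the numbers are the ones given in the question.

```python
def predict_from_summary(r, mean_x, sd_x, mean_y, sd_y, x):
    """Predict y at x using only summary statistics of a least squares fit."""
    slope = r * sd_y / sd_x                 # b1 = r * s_y / s_x
    intercept = mean_y - slope * mean_x     # the line passes through (mean_x, mean_y)
    return intercept + slope * x

# Explanatory variable: GPA. Response variable: SAT score.
sat_hat = predict_from_summary(r=0.48, mean_x=2.62, sd_x=0.52,
                               mean_y=1218, sd_y=126, x=2.9)
print(round(sat_hat))
```

Rounded to a whole SAT point, this agrees with the recorded answer of 1251.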
Question 10 Analysis of the relationship between the fuel economy (mpg) and engine size (liters) for 35 models of cars produces the regression model (look at question). If a car has a 5-liter engine, what does this model suggest the gas mileage would be?
17 mpg
The correlation between a car's engine size and its fuel economy (in mpg) is r = -0.735. What fraction of the variability in fuel economy is accounted for by the engine size?
54%
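As a quick check, the fraction of variability accounted for by a one-predictor linear model is the square of the correlation. The variable names below are only illustrative.

```python
r = -0.735                 # correlation given in the question
r_squared = r ** 2         # the sign of r drops out when squaring
print(f"{r_squared:.1%}")  # fraction of variability, shown as a percentage
```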
Question 8 What is the value of R²? What is the interpretation of R²? The value of R² = 98.89% indicates the percentage of the variability in the price of these disk drives that can be accounted for by a linear model on the capacity of the drives. (Round to two decimal places as needed.)
98.89
Question 12 (Look at problem on Pearson.) A. The true potassium contents of cereals vary from the predicted amounts with a standard deviation of 30.03 milligrams. B. The true potassium contents of cereals vary from the predicted amounts with a variance of 30.03 milligrams. C. The true fiber contents of cereals vary from the predicted amounts with a standard deviation of 30.03 grams. D. The true fiber contents of cereals vary from the predicted amounts with a variance of 30.03 grams.
A
Question 5 A CEO complains that the winners of his "rookie junior executive of the year" award often turn out to have less impressive performance the following year. He wonders whether the award actually encourages them to slack off. Can you offer a better explanation? Which of the following is a better explanation for why the winners of the "rookie junior executive of the year" award often turn out to have less impressive performance the following year? A. Performance is often random. If a junior executive had a good year their first year, then odds say it is unlikely that they will have a repeat performance the next year. B. Perhaps they weren't really better than other rookie executives, but just happened to have a lucky year. C. The winners were considered the best of the year, so naturally they reached the maximum level of performance that year and it is impossible to improve upon that. D. No, the CEO stated it perfectly; the award actually encourages them to slack off.
B
Question 14 Players in any sport who are having great seasons, turning in performances that are much better than anyone might have anticipated, often are pictured on the cover of Sports Illustrated. Frequently, their performances then falter somewhat, leading some athletes to believe in a "Sports Illustrated jinx." Similarly, it is common for phenomenal rookies to have less stellar second seasons, the so-called "sophomore slump." While fans, athletes, and analysts have proposed many theories about what leads to such declines, a statistician might offer a simpler (statistical) explanation. Explain. A. The slope of the linear regression, predicting performance from years in the sport, must be negative because an athlete's performance always decreases over time. No matter how well an athlete performed one year, they must perform worse the next year. B. People on the cover are usually there for outstanding performances. Because they are so far from the mean, the performance in the next year is likely to be closer to the mean. C. People on the cover are usually considered the best of the year, so naturally they reached the maximum level of athletic performance that year and it is impossible to improve upon that. D. Once an athlete has made the cover of Sports Illustrated, they have reached their ultimate goal as an athlete and lack motivation to try the following year.
B
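The regression-to-the-mean explanation behind Questions 5 and 14 can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the homework: each season's performance is modeled as a fixed skill plus independent normally distributed luck, and the "award winners" are taken to be the top 1% of first-year performers.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 10_000
skill = [random.gauss(0, 1) for _ in range(n)]    # stable ability (assumed model)
year1 = [s + random.gauss(0, 1) for s in skill]   # ability plus first-year luck
year2 = [s + random.gauss(0, 1) for s in skill]   # same ability, fresh luck

# "Award winners": indices of the top 1% of year-1 performers.
top = sorted(range(n), key=lambda i: year1[i], reverse=True)[: n // 100]

mean_y1 = sum(year1[i] for i in top) / len(top)
mean_y2 = sum(year2[i] for i in top) / len(top)
print(f"winners, year 1: {mean_y1:.2f}   winners, year 2: {mean_y2:.2f}")
```

In this model the winners still average above the overall mean in year 2, but noticeably below their own year-1 average, with no slacking off or jinx involved.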
Question 17 A group of high school seniors took a scholastic aptitude test. The resulting math scores had a mean of 473.4 with a standard deviation of 179.3, verbal scores had a mean of 508.4 with a standard deviation of 171.5, and the correlation between verbal and math scores was r = 0.707. a) What is the correlation? The correlation is = ? b) Write the equation of the line of regression predicting verbal scores from math scores. c) In general, what would a positive residual mean in this context? A. A positive residual means the student has the exact verbal score that the linear model would predict. B. A positive residual means the student has a higher verbal score than the linear model would predict. C. A positive residual means the student has a lower verbal score than the linear model would predict. d) A person tells you her math score was 394. Predict her verbal score. The student is expected to have a verbal score of ______ e) Using the predicted verbal score from part (d) and the regression equation, predict the student's math score.
a) 0.707 b) y = 188.382 + 0.676x c) B d) 454.726 e) 433.735
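A sketch of the arithmetic behind parts b), d), and e), using the summary statistics from the question. Rounding each slope to three decimals before computing the intercept is an assumption made here because it reproduces the recorded answers. Part e) needs a second regression line (math predicted from verbal), not the algebraic inverse of the first one.

```python
r = 0.707
mean_math, sd_math = 473.4, 179.3
mean_verbal, sd_verbal = 508.4, 171.5

# b) Verbal predicted from math: slope = r * s_verbal / s_math.
b1 = round(r * sd_verbal / sd_math, 3)
b0 = mean_verbal - b1 * mean_math      # line passes through the point of means

# d) Predicted verbal score for a math score of 394.
verbal_hat = b0 + b1 * 394

# e) Math predicted from verbal uses its own slope, r * s_math / s_verbal.
c1 = round(r * sd_math / sd_verbal, 3)
c0 = mean_math - c1 * mean_verbal
math_hat = c0 + c1 * verbal_hat

print(round(b0, 3), b1, round(verbal_hat, 3), round(math_hat, 3))
```

The predicted math score in part e) does not return to 394; both predictions are pulled toward their respective means.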
Question 4 a) Find the slope estimate, b1. b) What does this b1 value mean, in this context? A. It means that, on average, the total sales will equal b1 × $1000 multiplied by the number of sales people working. B. It means that, on average, an additional increase of b1 × 1000 sales people working is associated with each additional dollar in sales. C. It means that, on average, an additional increase of b1 × $1000 in sales is associated with each additional sales person working. D. It means that, on average, the number of sales people working is approximately b1 × 1000 multiplied by the total sales. c) Find the intercept, b0. b0 = ? d) What does this b0 value mean, in this context? Does this value of b0 make sense? A. It means that, on average, an additional increase of b0 × $1000 in sales is associated with each additional sales person working. It does not make sense, because the value of b0 is much larger than b1. B. It would mean that, on average, the minimum amount of sales made from 2 sales people working is b0 × $1000. It makes sense, because b0 is greater than zero. C. It means that, on average, the expected sales is b0 × $1000 with 0 sales people working. It does not make sense, because it is unlikely that any sales would be made with zero sales people working. D. It means that, on average, the total sales will equal b0 × $1000 multiplied by the number of sales people working. It makes sense, because b0 is greater than zero. e) Write down the equation that predicts Sales from Number of Sales People Working, using the variable x to represent Number of Sales People Working. sales = ____ + ____x f) If 19 people are working, what sales (in dollars) do you predict? The predicted sales are _____ dollars. g) If sales are actually $24,000, what is the value of the residual? residual = ? dollar(s) h) Has the original estimate from part f overestimated or underestimated the sales? A. overestimated B. underestimated C. neither overestimated nor underestimated
a) 0.782 b) C c) 8.776 d) C e) 8.776 + 0.782x f) 23634 g) 366 h) B (underestimated)
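Parts f) through h) follow directly from the fitted equation. A small sketch, assuming (as the answer choices state) that sales are recorded in thousands of dollars; the function name is hypothetical.

```python
b0, b1 = 8.776, 0.782   # intercept and slope from parts c) and a), in $1000s

def predicted_sales_dollars(people):
    """Predicted sales in dollars for a given number of sales people working."""
    return (b0 + b1 * people) * 1000

predicted = predicted_sales_dollars(19)
residual = 24_000 - predicted   # residual = observed - predicted
print(round(predicted), round(residual))
```

A positive residual means the observed sales were above the prediction, so the model underestimated.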
Question 15 A regression analysis of 117 homes for sale produced the following model, where price is in thousands of dollars and size is in square feet. (Look at the model on Pearson.) a) Explain what the slope of the line says about housing prices and house size. A. For every additional square foot of area of a house, the price is predicted to increase by $0.067. B. For every additional square foot of area of a house, the price is predicted to increase by $67. C. For every $1000 increase in price of a house, the size is predicted to increase by 0.067 square feet. D. For every $1 increase in price of a house, the size is predicted to increase by 67 square feet. b) What price would you predict for a 2500-square-foot house in this market? c) A real estate agent shows a potential buyer a 1200-square-foot house, saying that the asking price is $6000 less than what one would expect to pay for a house of this size. What is the asking price? d) What is the $6000 called? A. Intercept B. Residual C. Slope D. Predicted value
a) B b) 215350 c) 122250 d) B (residual)
Question 9 a) Select all the assumptions or conditions that are violated. A. The Quantitative Variables Condition is violated. B. The Outlier Condition is violated. C. The Does the Plot Thicken Condition is violated. D. The Straight Enough Condition is violated. E. There are no assumptions or conditions that are violated. b) Choose the correct answer below. A. The capacity should be expressed in megabytes instead of terabytes and the regression performed again. B. The prices should be expressed in cents instead of dollars and the regression performed again. C. The high influence point should be removed and the regression performed again. D. There are no issues with the regression.
a) B, D b) C
Question 16 a) Is the linear model appropriate here? Explain. A. The linear model could be appropriate because the Tar coefficient is close to zero, meaning the Nicotine value does not vary much. B. The linear model is not appropriate because the constant coefficient is not equal to zero. C. The linear model could be appropriate. There is some curvature to the residuals but not enough to completely disregard the linear model. Some more data points may be required. D. The linear model is not appropriate because the residuals are constantly decreasing. b) Explain the meaning of R² in this context. A. The predicted nicotine content is equal to some constant plus 92.4% of the tar content. B. Around 92.4% of the data points fit the linear model. C. Around 92.4% of the data points have a residual with magnitude less than the constant coefficient. D. The linear model on tar content accounts for 92.4% of the variability in nicotine content.
a) C b) D
Question 19 The accompanying scatterplot shows the relationship between the percentage of teenagers who had used marijuana and the percentage of teenagers who had used other drugs in 11 countries. Summary statistics showed that the mean percent that had used marijuana was 23.7%, with a standard deviation of 15.8%. An average of 11.9% of teens had used other drugs, with a standard deviation of 10.0%. a) Do you think a linear model is appropriate? Explain. A. No. There are outliers in the plot. B. Yes. While the relationship is weak, there is no reason to think that the linear model is not appropriate. C. Yes. The plot shows a positive, linear, fairly strong relationship. D. No. The plot shows a nonlinear pattern. b) For this regression, R² is 79.6%. Interpret this statistic in this context. A linear model on the percentage of ____ use accounts for ____% of the variation in the percent use of _______ c) Write the equation you would use to estimate the percentage of teens who use Other Drugs from the percentage who have used Marijuana. d) Explain in context what the slope of this line means. The slope indicates that _____ increases, on average, by _____ for each percent increase in ______. e) Do these results confirm that marijuana is a "gateway drug," that is, that marijuana use leads to the use of other drugs? A. Since the value of R² is small, the results do not indicate that marijuana leads to other drug use. B. The results indicate an association between marijuana and other drug use; however, association does not imply causation. C. Since the value of R² is large, the results confirm that marijuana leads to other drug use. D. The results do not show a strong association between marijuana and other drug use.
a) C b) marijuana, 79.6, other drugs c) -1.573 + 0.565 × marijuana % d) other drug use, 0.565, marijuana use e) B
Question 1 Determine if the following statements are True or False. If False, explain briefly. a) Choose the linear model that passes through the most data points on the scatterplot. A. True. Choose the linear model that passes through the most data points on the scatterplot. B. False. The linear model line usually passes through exactly half of the data points. C. False. All of the data points either touch the line or fall below the line. D. False. The line usually touches none of the points. Minimize the sum of the squared errors. b) The residuals are the observed y-values minus the y-values predicted by the linear model. A. True. The residuals are the observed y-values minus the y-values predicted by the linear model. B. False. The residuals are the observed x-values minus the x-values predicted by the linear model. C. False. The residuals are the predicted y-values minus the y-values observed by the linear model. D. False. The residuals are the observed y-values minus the mean y-value. c) Least squares means that the square of the largest residual is as small as it could possibly be. A. True. Least squares means that the square of the largest residual is as small as it could possibly be. B. False. Least squares means that the product of the squares of all the residuals is minimized. C. False. Least squares means that the sum of the squares of all the residuals is minimized. D. False. Least squares means that the square of the median residual is minimized.
a) D b) A c) C
Question 20 a) Using this information, describe the association between the costs of a cappuccino and a third of a liter of water. The association is (strong/moderate/weak), (positive/negative) and shaped ________. b) The correlation is 0.655. Find and interpret the value of R². Select the correct choice below and fill in the answer box within your choice. (Round to one decimal place as needed.) A. For every $1 increase in the price of a regular cappuccino, the price of a third of a liter of water increases, on average, by $____. B. About ____% of the variation in cappuccino prices can be explained by using a linear model on water prices. C. About ____% of the variation in water prices can be explained by using a linear model on cappuccino prices. D. For every $1 increase in the price of a third of a liter of water, the price of a regular cappuccino increases, on average, by $____. c) The regression equation predicting the cost of a cappuccino from the cost of a third of a liter of water is (look at Pearson). In a certain city, a third of a liter of water costs $0.53 and a cappuccino is $1.19. Calculate and interpret the residual for this city. The residual = ? Interpret the residual: the price of a ______ in this city is _____ cents ___ than predicted.
a) moderate, positive, mostly linear but curving at the highest water prices b) 0.429, 42.9 c) -0.90, cappuccino, 90, less
Question 21 a) What is the correlation between CO2 and Temperature? r = ? b) Explain the meaning of R-squared in this context. A. A linear model on mean temperature accounts for 33.1% of the variation in CO2 levels. B. A linear model on mean temperature accounts for 66.9% of the variation in CO2 levels. C. A linear model on CO2 levels accounts for 66.9% of the variation in mean temperature. D. A linear model on CO2 levels accounts for 33.1% of the variation in mean temperature. d) What is the meaning of the slope of this equation? A. For every 0.003 ppm increase in CO2 levels, the mean temperature increases by 1 °C. B. For every degree that the mean temperature increases, CO2 levels increase by 0.003 ppm. C. For every 1 ppm increase in CO2 levels, the mean temperature increases by 0.003 °C. D. The slope does not have a meaningful interpretation in the context of this problem. e) What is the meaning of the y-intercept of this equation? A. For every 1 ppm increase in CO2 levels, the mean temperature increases by 0.003 °C. B. When the global mean temperature is 0 °C, the CO2 level is 15.606 ppm. C. When the CO2 level is 0 ppm, the global mean temperature will be 15.606 °C. D. The y-intercept does not have a meaningful interpretation in the context of this problem. f) View the accompanying scatterplot of the residuals vs. CO2. Does the scatterplot of the residuals vs. CO2 show evidence of the violation of any assumptions behind the regression? A. Yes, the outlier condition is violated. B. Yes, all the assumptions are violated. C. Yes, the linearity and equal variance assumptions are violated. D. Yes, the equal variance assumption is violated. E. Yes, the linearity assumption is violated. F. No, all assumptions are okay. g) Suppose CO2 levels reach 362 ppm this year. What mean temperature does the regression predict from this information? 16.692 °C h) Does the answer in part g mean that when CO2 levels hit 362 ppm, the temperature will reach the predicted level? Explain briefly. A. No. The actual temperature will be 15.606 °C. B. No. The actual temperature will be significantly higher than the predicted level. C. No. The actual temperature is likely to be different than the predicted level. D. Yes. The temperature will reach the predicted level when CO2 levels hit 362 ppm.
a) r = 0.575 b) D c) 15.606, 0.003 d) C e) D f) F g) 16.692 h) C
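A short check of parts a) and g), assuming R² = 33.1% (answer choice D) and the intercept and slope recorded in part c). The correlation takes the positive square root because the slope is positive.

```python
import math

r_squared = 0.331                # from answer choice D in part b)
r = math.sqrt(r_squared)         # positive root, since the slope 0.003 is positive

intercept, slope = 15.606, 0.003     # values recorded for part c)
temp_hat = intercept + slope * 362   # predicted mean temperature at 362 ppm

print(round(r, 3), round(temp_hat, 3))
```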
Question 6 (Look at Pearson.) a) What are the units of the residuals? The residuals are in terms of: b) Which residual contributes the most to the sum that was minimized according to the least squares criterion to find this regression? The residual that contributes the most to the sum is ____. c) Which residual contributes the least to the sum that was minimized according to the least squares criterion to find this regression? The residual that contributes the least to the sum is ____.
a) thousands of dollars b) 2.81 c) 0.07
Question 7 (Look at Pearson.) a) Which drive capacity contributes the most to the sum that is minimized by the least squares criterion? b) Two of the residuals are negative. What does that mean about those drives? Be specific and use the correct units. a) The drive with a capacity of _____ TB contributes the most to the sum of squared residuals. b) A negative residual means that the drive costs less than what might be expected from this model and its capacity. A residual of -15.58 indicates a drive that costs $15.58 less than expected.
a) 3