stats exam 1


Chapter 7 #27 Residuals Tell what each of the residual plots below indicates about the appropriateness of the linear model that was fit to the data. graph (pg. 203)

a) Model is appropriate. b) Model is not appropriate. Relationship is nonlinear. c) Model may not be appropriate. Spread is changing.

Chapter 7 #25 Regression equations Fill in the missing information in the following table. graph (pg.203)

GRAPH (pg. A-11)

Chapter 8 #25 Unusual points Each of these four scatterplots shows a cluster of points and one "stray" point. For each, answer these questions: 1) In what way is the point unusual? Does it have high leverage, a large residual, or both? 2) Do you think that point is an influential point? 3) If that point were removed, would the correlation become stronger or weaker? Explain. 4) If that point were removed, would the slope of the regression line increase or decrease? Explain. graph (pg. 236)

a) 1) High leverage, small residual. 2) No, not influential for the slope. 3) Correlation would decrease because outlier has large zx and zy, increasing correlation. 4) Slope wouldn't change much because outlier is in line with other points. b) 1) High leverage, probably small residual. 2) Yes, influential. 3) Correlation would weaken, increasing toward zero. 4) Slope would increase toward 0, since outlier makes it negative. c) 1) Some leverage, large residual. 2) Yes, somewhat influential. 3) Correlation would increase, since scatter would decrease. 4) Slope would increase slightly. d) 1) Little leverage, large residual. 2) No, not influential. 3) Correlation would become stronger and become more negative because scatter would decrease. 4) Slope would change very little.

Chapter 6 #19 Matching Here are several scatterplots. The calculated correlations are -0.923, -0.487, 0.006, and 0.777. Which is which? graph (pg. 171)

a) 0.006 b) 0.777 c) -0.923 d) -0.487

Chapter 5 #43 More cattle Based on the model N(1152, 84) describing Angus steer weights from Exercise 25, what are the cutoff values for a) the highest 10% of the weights? b) the lowest 20% of the weights? c) the middle 40% of the weights?

a) 1259.7 lb b) 1081.3 lb c) 1108 lb to 1196 lb
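All three cutoffs are percentiles of the Normal model. As a sketch, Python's standard-library `statistics.NormalDist` (Python 3.8+) can compute them with `inv_cdf`. The answers above were presumably read from a z-table or calculator, so tiny rounding differences are possible.

```python
from statistics import NormalDist

# Normal model for Angus steer weights from the exercise: mean 1152 lb, SD 84 lb
steers = NormalDist(mu=1152, sigma=84)

# a) highest 10%: the cutoff is the 90th percentile
top_10 = steers.inv_cdf(0.90)
# b) lowest 20%: the cutoff is the 20th percentile
bottom_20 = steers.inv_cdf(0.20)
# c) middle 40%: from the 30th percentile to the 70th percentile
middle_40 = (steers.inv_cdf(0.30), steers.inv_cdf(0.70))

print(round(top_10, 1), round(bottom_20, 1), [round(x) for x in middle_40])
```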

Chapter 5 #25 Cattle Using N(1152, 84), the Normal model for weights of Angus steers in Exercise 9, a) How many standard deviations from the mean would a steer weighing 1000 pounds be? b) Which would be more unusual, a steer weighing 1000 pounds or one weighing 1250 pounds?

a) About 1.81 standard deviations below the mean. b) 1000 (z = -1.81) is more unusual than 1250 (z = 1.17).
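The z-score arithmetic behind both parts fits in a tiny helper. The name `z_score` is hypothetical and used only for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (negative = below)."""
    return (x - mean) / sd

z_1000 = z_score(1000, 1152, 84)
z_1250 = z_score(1250, 1152, 84)

# The weight with the larger |z| is farther from the mean, so it is more unusual
more_unusual = 1000 if abs(z_1000) > abs(z_1250) else 1250
print(round(z_1000, 2), round(z_1250, 2), more_unusual)
```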

Chapter 3 #19 Heart attack stays- The histogram shows the lengths of hospital stays (in days) for all the female patients admitted to hospitals in New York during one year with a primary diagnosis of acute myocardial infarction (heart attack). graph (pg.74) a) From the histogram, would you expect the mean or median to be larger? Explain. b) Write a few sentences describing this distribution (shape, center, spread, unusual features). c) Which summary statistics would you choose to summarize the center and spread in these data? Why?

a) Because the distribution is skewed to the right, we expect the mean to be larger. b) The distribution is bimodal and skewed to the right. The main mode, near 8 days, marks the center; a second mode at 1 day may represent patients who didn't survive. Most patients stay between 1 and 15 days, and there are some extremely long stays above 25 days. c) The median and IQR, because the distribution is strongly skewed.

Chapter 4 #25 Women's basketball Here are boxplots of the points scored during the first 10 games of the season for both Scyrine and Alexandra: graph (pg. 104) a) Summarize the similarities and differences in their performance so far. b) The coach can take only one player to the state championship. Which one should she take? Why?

a) Both girls have a median score of about 17 points per game, but Scyrine is much more consistent. Her IQR is about 2 points, while Alexandra's is over 10. b) If the coach wants a consistent performer, she should take Scyrine. She'll almost certainly deliver somewhere between 15 and 20 points. But if she wants to take a chance and needs a "big game," she should take Alexandra. Alex scores over 24 points about a quarter of the time. (On the other hand, she scores under 11 points as often.)

Chapter 4 #39 Reading scores A class of fourth graders takes a diagnostic reading test, and the scores are reported by reading grade level. The 5-number summaries for the 14 boys and 11 girls are shown: graph (pg. 106) a) Which group had the highest score? b) Which group had the greater range? c) Which group had the greater interquartile range? d) Which group's scores appear to be more skewed? Explain. e) Which group generally did better on the test? Explain. f) If the mean reading level for boys was 4.2 and for girls was 4.6, what is the overall mean for the class?

a) Boys. b) Boys. c) Girls. d) The boys appeared to have more skew, as their scores were less symmetric between quartiles. The girls' quartiles are the same distance from the median, although the left tail stretches a bit farther to the left. e) Girls. Their median and upper quartiles are larger. The lower quartile is slightly lower, but close. f) [14(4.2) + 11(4.6)]/25 = 109.4/25 = 4.38
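Part f) is a weighted mean: each group's mean counts in proportion to the number of students in that group. A minimal Python sketch using the group sizes from the question:

```python
# Group sizes and mean reading levels from part f)
groups = {"boys": (14, 4.2), "girls": (11, 4.6)}

total_students = sum(n for n, _ in groups.values())
# Weight each group mean by its size, then divide by the class size
overall_mean = sum(n * m for n, m in groups.values()) / total_students
print(total_students, round(overall_mean, 2))
```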

Chapter 8 #33 Heating After keeping track of his heating expenses for several winters, a homeowner believes he can estimate the monthly cost from the average daily Fahrenheit temperature by using the model Cost = 133 - 2.13 Temp. Here is the residuals plot for his data: graph (pg. 237) A) Interpret the slope of the line in this context. B) Interpret the y-intercept of the line in this context. C) During months when the temperature stays around freezing, would you expect cost predictions based on this model to be accurate, too low, or too high? Explain. D) What heating cost does the model predict for a month that averages 10°F? E) During one of the months on which the model was based, the temperature did average 10°F. What were the actual heating costs for that month? F) Should the homeowner use this model? Explain. G) Would this model be more successful if the temperature were expressed in degrees Celsius? Explain.

a) Predicted cost decreases by $2.13 for each 1°F increase in average daily temperature, so warmer months are associated with lower costs. b) For an average daily temperature of 0°F, the model predicts a cost of $133. c) Too high; the residuals (observed - predicted) around 32°F are negative, showing that the model overestimates the costs. d) $111.70 e) About $105 f) No, the residuals show a definite curved pattern. The data are probably not linear. g) No, there would be no difference. The relationship does not depend on the units.
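Parts b), d), and g) can be checked directly from the model. The sketch below (function names are illustrative) also re-expresses the same line in Celsius by substituting F = 9/5 * C + 32, which shows that converting units changes the coefficients but not the predictions, so the fit is no better or worse.

```python
def predicted_cost(temp_f):
    """Homeowner's model: Cost = 133 - 2.13 * Temp (Temp in deg F, cost in $)."""
    return 133 - 2.13 * temp_f

def predicted_cost_celsius(temp_c):
    """Same model with the temperature supplied in deg C (F = 9/5 * C + 32)."""
    return predicted_cost(9 / 5 * temp_c + 32)

print(round(predicted_cost(0), 2))    # b) intercept: $133
print(round(predicted_cost(10), 2))   # d) month averaging 10 deg F
# g) 10 deg F is about -12.2 deg C; the prediction is the same
print(round(predicted_cost_celsius((10 - 32) * 5 / 9), 2))
```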

Chapter 4 #17 Rock concert accidents- Crowd Management Strategies (www.crowdsafe.com) monitors accidents at rock concerts. In their database, they list the names and other variables of victims whose deaths were attributed to "crowd crush" at rock concerts. Here are the histogram and boxplot of the victims' ages for data from a recent one-year period. graph (pg. 101) a) What features of the distribution can you see in both the histogram and the boxplot? b) What features of the distribution can you see in the histogram that you could not see in the boxplot? c) What summary statistic would you choose to summarize the center of this distribution? Why? d) What summary statistic would you choose to summarize the spread of this distribution? Why?

a) Essentially symmetric, very slightly skewed to the right with two high outliers at 36 and 48. Most victims are between the ages of 16 and 24. b) The slight increase between ages 22 and 24 is apparent in the histogram but not in the boxplot. It may be a second mode. c) The median would be the most appropriate measure of center because of the slight skew and the extreme outliers. d) The IQR would be the most appropriate measure of spread because of the slight skew and the extreme outliers.

Chapter 6 #41 Fuel economy 2010 Here are advertised horsepower ratings and expected gas mileage for several 2010 vehicles. (www.kbb.com) Graph (pg. 174) a) Make a scatterplot for these data. b) Describe the direction, form, and strength of the plot. c) Find the correlation between horsepower and miles per gallon. d) Write a few sentences telling what the plot says about fuel economy.

a) GRAPH b) Negative, linear, strong. c) -0.909 d) There is a fairly strong linear relation in a negative direction between horsepower and highway gas mileage. Lower fuel efficiency is generally associated with higher horsepower.

Chapter 7 #65 A second helping of burgers- In Exercise 63, you created a model that can estimate the number of Calories in a burger when the Fat content is known. a) Explain why you cannot use that model to estimate the fat content of a burger with 600 calories. b) Using an appropriate model, estimate the fat content of a burger with 600 calories.

a) The regression was for predicting calories from fat, not the other way around. b) Fat = -15.0 + 0.083 Calories. Predict 34.8 grams of fat.
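The key point in a) is that regressing Fat on Calories is a different fit from regressing Calories on Fat. This sketch compares the answer's Fat-on-Calories line with naively solving the Exercise 63 line (Calories = 211.0 + 11.06 Fat) for Fat. The two predictions differ because two regression lines coincide only when the correlation is exactly ±1.

```python
def predicted_fat(calories):
    """Regression of Fat (g) on Calories from part b): Fat = -15.0 + 0.083 * Calories."""
    return -15.0 + 0.083 * calories

# Naively inverting the Calories-on-Fat line (Calories = 211.0 + 11.06 Fat)
naive_fat = (600 - 211.0) / 11.06

print(round(predicted_fat(600), 1))  # the appropriate model
print(round(naive_fat, 1))           # a different (inappropriate) answer
```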

Chapter 3 #29 Pizza prices The histogram shows the distribution of the prices of plain pizza slices (in $) for 156 weeks in Dallas, TX. graph (pg. 75) Which summary statistics would you choose to summarize the center and spread in these data? Why?

The mean and standard deviation because the distribution is unimodal and symmetric.

Chapter 1 #15 Bicycle safety- Ian Walker, a psychologist at the University of Bath, wondered whether drivers treat bicycle riders differently when they wear helmets. He rigged his bicycle with an ultrasonic sensor that could measure how close each car was that passed him. He then rode on alternating days with and without a helmet. Out of 2500 cars passing him, he found that when he wore his helmet, motorists passed 3.35 inches closer to him, on average, than when his head was bare. (Source: NY Times, Dec. 10, 2006)

Who—2500 cars; What—Distance from car to bicycle; Population—All cars passing bicyclists.

Chapter 5 #7 Guzzlers?- Environmental Protection Agency (EPA) fuel economy estimates for automobile models tested recently predicted a mean of 24.8 mpg and a standard deviation of 6.2 mpg for highway driving. Assume that a Normal model can be applied. a) Draw the model for auto fuel economy. Clearly label it, showing what the 68-95-99.7 Rule predicts. b) In what interval would you expect the central 68% of autos to be found? c) About what percent of autos should get more than 31 mpg? d) About what percent of cars should get between 31 and 37.2 mpg? e) Describe the gas mileage of the worst 2.5% of all cars.

a) GRAPH b) 18.6 to 31.0 mpg c) 16% d) 13.5% e) Less than 12.4 mpg
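All five parts follow from the 68-95-99.7 Rule applied to N(24.8, 6.2). A short Python sketch of the arithmetic:

```python
mean, sd = 24.8, 6.2  # EPA highway mpg model from the exercise

# 68-95-99.7 Rule: intervals within 1, 2, and 3 SDs of the mean
for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    print(f"{pct}% between {mean - k * sd:.1f} and {mean + k * sd:.1f} mpg")

# c) above mean + 1 SD (31 mpg): half of the 32% outside the central 68%
above_31 = (100 - 68) / 2
# d) between +1 SD (31) and +2 SD (37.2): half of the 95% - 68% difference
between_31_and_37 = (95 - 68) / 2
# e) the worst 2.5% lies below mean - 2 SD
worst_cutoff = mean - 2 * sd
```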

Chapter 3 #3 Outliers- The 5-number summary for the run times in minutes of the 150 highest grossing movies of 2010 looks like this: Min = 43, Q1 = 98, Med = 104.5, Q3 = 116, Max = 160. Are there any outliers in these data? How can you tell?

IQR = 116 - 98 = 18. Q3 + 1.5 * IQR = 116 + 1.5 * 18 = 143. Q1 - 1.5 * IQR = 71. So any point above 143 or below 71 is an outlier. Because the Min and Max are both outside these fences, there is at least one outlier on both the low end and the high end.
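The fence check above works for any 5-number summary. A minimal sketch of the 1.5 * IQR rule (the helper name `outlier_fences` is illustrative):

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) fences using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# 5-number summary of movie run times (minutes) from the exercise
minimum, q1, median, q3, maximum = 43, 98, 104.5, 116, 160
low, high = outlier_fences(q1, q3)

print(low, high)                       # 71.0 143.0
print(minimum < low, maximum > high)   # True True
```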

Chapter 6 #23 Roller coasters- Most roller coasters get their speed by dropping down a steep initial incline, so it makes sense that the height of that drop might be related to the speed of the coaster. Here's a scatterplot of top Speed and largest Drop for 75 roller coasters around the world. Graph (pg.172) a) Does the scatterplot indicate that it is appropriate to calculate the correlation? Explain. b) In fact, the correlation of Speed and Drop is 0.91. Describe the association.

a) Yes. It shows a linear form and no outliers. b) There is a strong, positive, linear association between drop and speed; the greater the coaster's initial drop, the higher the top speed.

Chapter 2 #27 Seniors- Prior to graduation, a high school class was surveyed about its plans. The following table displays the results for white and minority students (the "Minority" group included African-American, Asian, Hispanic, and Native American students): graph (pg. 38) a) What percent of the seniors are white? b) What percent of the seniors are planning to attend a 2-year college? c) What percent of the seniors are white and planning to attend a 2-year college? d) What percent of the white seniors are planning to attend a 2-year college? e) What percent of the seniors planning to attend a 2-year college are white?

a) 82.5% b) 12.9% c) 11.1% d) 13.4% e) 85.7%

Chapter 5 #45 Cattle, finis Consider the Angus weights model N(1152, 84) one last time. a) What weight represents the 40th percentile? b) What weight represents the 99th percentile? c) What's the IQR of the weights of these Angus steers?

a) 1130.7 lb b) 1347.4 lb c) 113.3 lb
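These are again percentiles of the Normal model; the IQR is the 75th percentile minus the 25th, which for any Normal model is about 1.35 standard deviations. A sketch with the standard-library `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

steers = NormalDist(mu=1152, sigma=84)  # Angus weights model N(1152, 84)

p40 = steers.inv_cdf(0.40)  # a) 40th percentile
p99 = steers.inv_cdf(0.99)  # b) 99th percentile
# c) IQR = Q3 - Q1 = 75th percentile - 25th percentile
iqr = steers.inv_cdf(0.75) - steers.inv_cdf(0.25)

print(round(p40, 1), round(p99, 1), round(iqr, 1))
```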

Chapter 3 #5 Adoptions II Here is a histogram showing the total number of adoptions in each of the 50 states and the District of Columbia. graph (pg.73) Would you expect the mean number of adoptions or the median number of adoptions to be higher? Why?

The mean will be higher. The distribution is unimodal and skewed to the right, so the mean will be pulled by the tail toward the higher values.

Chapter 1 #25 Babies Medical researchers at a large city hospital investigating the impact of prenatal care on newborn health collected data from 882 births during 1998-2000. They kept track of the mother's age, the number of weeks the pregnancy lasted, the type of birth (cesarean, induced, natural), the level of prenatal care the mother had (none, minimal, adequate), the birth weight and sex of the baby, and whether the baby exhibited health problems (none, minor, major).

Who—882 births; Cases—Each of the 882 births is a case; What—Mother's age, length of pregnancy, type of birth, level of prenatal care, birth weight of baby, sex of baby, and baby's health problems; When—1998-2000; Where—Large city hospital; Why—Researchers were investigating the impact of prenatal care on newborn health; How—Not specified exactly, but probably from hospital records; Variable—Mother's age; Type—Quantitative; Units—Not specified, probably years; Variable—Length of pregnancy; Type—Quantitative; Units—Weeks; Variable—Birth weight of baby; Type—Quantitative; Units—Not specified, probably pounds and ounces; Variable—Type of birth; Type—Categorical; Variable—Level of prenatal care; Type—Categorical; Variable—Sex; Type—Categorical; Variable—Baby's health problems; Type—Categorical.

Chapter 5 #11 Music library Corey has 4929 songs in his computer's music library. The lengths of the songs have a mean of 242.4 seconds and standard deviation of 114.51 seconds. A Normal probability plot of the song lengths looks like this: graph (pg.133) a) Do you think the distribution is Normal? Explain. b) If it isn't Normal, how does it differ from a Normal model?

a) No. The plot is not straight. b) It is skewed to the right.

Chapter 8 #39 Gestation For humans, pregnancy lasts about 280 days. In other species of animals, the length of time from conception to birth varies. Is there any evidence that the gestation period is related to the animal's life span? The first scatterplot shows Gestation Period (in days) vs. Life Expectancy (in years) for 18 species of mammals. The highlighted point at the far right represents humans. graph (pg. 239) A) For these data, r = 0.54, not a very strong relationship. Do you think the association would be stronger or weaker if humans were removed? Explain. B) Is there reasonable justification for removing humans from the data set? Explain. C) Here are the scatterplot and regression analysis for the 17 nonhuman species. Comment on the strength of the association. graph (pg. 239) D) Interpret the slope of the line. E) Some species of monkeys have a life expectancy of about 20 years. Estimate the expected gestation period of one of these monkeys.

a) Stronger. Both slope and correlation would increase. b) Restricting the study to nonhuman animals would justify it. c) Moderately strong. d) For every year increase in life expectancy, the gestation period increases by about 15.5 days, on average. e) About 270.5 days

Chapter 3 #57 Math scores 2009- The National Center for Education Statistics (nces.ed.gov/nationsreportcard/) reported 2005 average mathematics achievement scores for eighth graders in all 50 states: graph (pg.79) a) Find the median, the IQR, the mean, and the standard deviation of these state averages. b) Which summary statistics would you report for these data? Why? c) Write a brief summary of the performance of eighth graders nationwide.

a) Median 284, IQR 10, Mean 282.98, SD 7.60 b) Because it's skewed to the left, probably better to report Median and IQR. c) Skewed to the left. The center is around 284. The middle 50% of states scored between 278 and 288. Mississippi's was much lower than other states' scores.

Chapter 5 #9 Normal cattle- The Virginia Cooperative Extension reports that the mean weight of yearling Angus steers is 1152 pounds. Suppose that weights of all such animals can be described by a Normal model with a standard deviation of 84 pounds. What percent of steers weigh a) over 1250 pounds? b) under 1200 pounds? c) between 1000 and 1100 pounds?

a) 12.2% b) 71.6% c) 23.3%
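Each answer is an area under the Normal curve. A minimal sketch using `NormalDist.cdf` from Python's standard library (3.8+):

```python
from statistics import NormalDist

steers = NormalDist(mu=1152, sigma=84)  # N(1152, 84) model for steer weights

over_1250 = 1 - steers.cdf(1250)                         # a) upper tail
under_1200 = steers.cdf(1200)                            # b) lower tail
between_1000_1100 = steers.cdf(1100) - steers.cdf(1000)  # c) area between

print(f"{over_1250:.1%} {under_1200:.1%} {between_1000_1100:.1%}")
```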

Chapter 5 #5 Shipments- A company selling clothing on the Internet reports that the packages it ships have a median weight of 68 ounces and an IQR of 40 ounces. a) The company plans to include a sales flyer weighing 4 ounces in each package. What will the new median and IQR be? b) If the company recorded the shipping weights of these new packages in pounds instead of ounces, what would the median and IQR be? (1 lb. = 16 oz.)

a) 72 oz., 40 oz. b) 4.5 lb, 2.5 lb
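Adding a constant shifts measures of center but not measures of spread, while changing units rescales both. A small Python sketch of the arithmetic:

```python
median_oz, iqr_oz = 68, 40  # original package weights (ounces)

# a) Adding a 4 oz flyer shifts the median but leaves the IQR alone
new_median_oz = median_oz + 4
new_iqr_oz = iqr_oz

# b) Converting ounces to pounds divides both center and spread by 16
new_median_lb = new_median_oz / 16
new_iqr_lb = new_iqr_oz / 16

print(new_median_oz, new_iqr_oz, new_median_lb, new_iqr_lb)  # 72 40 4.5 2.5
```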

Chapter 3 #39 Payroll- A small warehouse employs a supervisor at $1200 a week, an inventory manager at $700 a week, six stock boys at $400 a week, and four drivers at $500 a week. a) Find the mean and median wage. b) How many employees earn more than the mean wage? c) Which measure of center best describes a typical wage at this company: the mean or the median? d) Which measure of spread would best describe the payroll: the range, the IQR, or the standard deviation? Why?

a) Mean $525, median $450 b) Two employees earn more than the mean. c) The median because of the outlier. d) The IQR will be least sensitive to the outlier of $1200, so it would be the best to report.
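A quick check of parts a) and b), listing each employee's weekly wage as given in the question and using the standard-library `statistics` module:

```python
from statistics import mean, median

# Weekly wages: supervisor, inventory manager, 6 stock boys, 4 drivers
wages = [1200] + [700] + [400] * 6 + [500] * 4

avg = mean(wages)
mid = median(wages)
above_mean = sum(w > avg for w in wages)  # count wages above the mean

print(f"mean ${avg:.0f}, median ${mid:.0f}, {above_mean} above the mean")
```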

Chapter 6 #13 Scatterplots Which of these scatterplots show a) little or no association? b) a negative association? c) a linear association? d) a moderately strong association? e) a very strong association? graph (pg.170)

a) None b) 3 and 4 c) 2, 3, and 4 d) 2 e) 3 and 1

Chapter 2 #15 Magnet schools An article in the Winter 2003 issue of Chance magazine reported on the Houston Independent School District's magnet schools programs. Of the 1755 qualified applicants, 931 were accepted, 298 were wait-listed, and 526 were turned away for lack of space. Find the relative frequency distribution of the decisions made, and write a sentence describing it.

1755 students applied for admission to the magnet schools program. 53% were accepted, 17% were wait-listed, and the other 30% were turned away.
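The relative frequency distribution is each count divided by the 1755 applicants. A minimal Python sketch:

```python
decisions = {"accepted": 931, "wait-listed": 298, "turned away": 526}
total = sum(decisions.values())  # 1755 qualified applicants

# Relative frequency = count / total
relative = {k: v / total for k, v in decisions.items()}
for decision, share in relative.items():
    print(f"{decision}: {share:.0%}")
```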

Chapter 7 #63 Burgers In the last chapter, you examined the association between the amounts of Fat and Calories in fast-food hamburgers. Here are the data: graph (pg. 209) a) Create a scatterplot of Calories vs. Fat. b) Interpret the value of R² in this context. c) Write the equation of the line of regression. d) Use the residuals plot to explain whether your linear model is appropriate. e) Explain the meaning of the y-intercept of the line. f) Explain the meaning of the slope of the line. g) A new burger containing 28 grams of fat is introduced. According to this model, its residual for calories is +33. How many calories does the burger have?

a) GRAPH b) 92.3% of the variation in calories can be accounted for by the fat content. c) Calories = 211.0 + 11.06 Fat d) GRAPH (pg. A-12) Residuals show no clear pattern, so the model seems appropriate. e) Could say a fat-free burger still has 211.0 calories, but this is extrapolation (no data close to 0). f) Each additional gram of fat is associated with 11.06 more calories, on average. g) 553.5 calories
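Part g) uses residual = observed - predicted, so observed = predicted + residual. This sketch uses the rounded equation from part c), which gives about 553.7 calories; the answer's 553.5 presumably comes from unrounded regression coefficients.

```python
def predicted_calories(fat_g):
    """Rounded regression from part c): Calories = 211.0 + 11.06 * Fat."""
    return 211.0 + 11.06 * fat_g

# residual = observed - predicted, so observed = predicted + residual
observed = predicted_calories(28) + 33
print(round(observed, 1))
```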

Chapter 7 #53 SAT scores- The SAT is a test often used as part of an application to college. SAT scores are between 200 and 800, but have no units. Tests are given in both Math and Verbal areas. SAT-Math problems require the ability to read and understand the questions, but can a person's verbal score be used to predict the math score? Verbal and math SAT scores of a high school graduating class are displayed in the scatterplot, with the regression line added. graph (pg. 207) A) Describe the relationship. B) Are there any students whose scores do not seem to fit the overall pattern? C) For these data, r = 0.685. Interpret this statistic. D) These verbal scores averaged 596.3, with a standard deviation of 99.5, and the math scores averaged 612.2, with a standard deviation of 96.1. Write the equation of the regression line. E) Interpret the slope of this line. F) Predict the math score of a student with a verbal score of 500. G) Every year, some students score a perfect 1600. Based on this model, what would such a student's residual be for her math score?

a) Moderately strong, fairly straight, and positive. Possibly some outliers (higher-than-expected math scores). b) The student with 500 verbal and 800 math. c) Positive, fairly strong linear relationship. 46.9% of variation in math scores is explained by verbal scores. d) Math = 217.7 + 0.662 * Verbal e) Each additional point of verbal score is associated with an increase of about 0.662 points in the predicted math score. f) 548.5 points g) 53.0 points
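Part d) builds the line from summary statistics: slope b1 = r * s_math / s_verbal and intercept b0 = mean_math - b1 * mean_verbal. A Python sketch that also reproduces parts c), f), and g):

```python
r = 0.685
mean_verbal, sd_verbal = 596.3, 99.5
mean_math, sd_math = 612.2, 96.1

# Least-squares slope and intercept from summary statistics
slope = r * sd_math / sd_verbal
intercept = mean_math - slope * mean_verbal

def predict_math(verbal):
    return intercept + slope * verbal

r_squared = r ** 2                       # c) fraction of variation explained
residual_800 = 800 - predict_math(800)   # g) a perfect 1600 is 800 + 800

print(f"Math = {intercept:.1f} + {slope:.3f} * Verbal")
print(f"R^2 = {r_squared:.1%}, predicted(500) = {predict_math(500):.1f}, "
      f"residual = {residual_800:.1f}")
```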

Chapter 4 #33 Graduation?- A survey of major universities asked what percentage of incoming freshmen usually graduate "on time" in 4 years. Use the summary statistics given to answer the questions that follow. graph (pg. 105) a) Would you describe this distribution as symmetric or skewed? Explain. b) Are there any outliers? Explain. c) Create a boxplot of these data. d) Write a few sentences about the graduation rates.

a) Probably slightly left skewed. The mean is slightly below the median, and the 25th percentile is farther from the median than the 75th percentile. b) No, all data are within the fences. c) GRAPH d) The graduation rates appear slightly skewed to the left, with no outliers.

Chapter 4 #1 Load factors, domestic and international- The Research and Innovative Technology Administration of the Bureau of Transportation Statistics (www.TranStats.bts.gov/Data_Elements.aspx?Data=2) reports load factors (passenger-miles as a percentage of available seat-miles) for commercial airlines for every month from 2000 through 2011. Here are histograms comparing the domestic and international load factors for this time period: graph (pg. 99)

Both distributions are unimodal and skewed to the left. The lowest international value may be an outlier. Because the distributions are skewed, we choose to compare medians and IQRs. The medians are very similar. The IQRs show that the domestic load factors vary a bit more.

Chapter 5 #13 Payroll- Here are the summary statistics for the weekly payroll of a small company: lowest salary = $300, mean salary = $700, median = $500, range = $1200, IQR = $600, first quartile = $350, standard deviation = $400. a) Do you think the distribution of salaries is symmetric, skewed to the left, or skewed to the right? Explain why. b) Between what two values are the middle 50% of the salaries found? c) Suppose business has been good and the company gives every employee a $50 raise. Tell the new value of each of the summary statistics. d) Instead, suppose the company gives each employee a 10% raise. Tell the new value of each of the summary statistics.

a) Skewed to the right; mean is higher than median. b) $350 and $950 c) Minimum $350. Mean $750. Median $550. Range $1200. IQR $600. Q1 $400. SD $400. d) Minimum $330. Mean $770. Median $550. Range $1320. IQR $660. Q1 $385. SD $440.
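Parts b) through d) follow two rules: adding a constant shifts the location statistics (minimum, mean, median, quartiles) but leaves the spread statistics (range, IQR, SD) unchanged, while multiplying by a constant scales every statistic. A minimal Python sketch using the summary statistics from the question:

```python
# Weekly payroll summary statistics from the question (dollars)
summary = {"min": 300, "mean": 700, "median": 500, "range": 1200,
           "IQR": 600, "Q1": 350, "SD": 400}
SPREAD = {"range", "IQR", "SD"}  # measures of spread

# b) the middle 50% runs from Q1 to Q3 = Q1 + IQR
middle_50 = (summary["Q1"], summary["Q1"] + summary["IQR"])

# c) $50 raise: location statistics shift by 50, spread statistics are unchanged
plus_50 = {k: v if k in SPREAD else v + 50 for k, v in summary.items()}

# d) 10% raise: every statistic is multiplied by 1.10 (rounded to whole dollars)
raise_10 = {k: round(v * 1.10) for k, v in summary.items()}

print(middle_50)
print(plus_50)
print(raise_10)
```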

Chapter 3 #31 Pizza prices again Look again at the histogram of the pizza prices in Exercise 29. a) Is the mean closer to $2.40, $2.60, or $2.80? Why? b) Is the standard deviation closer to $0.15, $0.50, or $1.00? Explain.

a) The mean is closest to $2.60 because that's the balancing point of the histogram. b) The standard deviation is closest to $0.15 since that's a typical distance from the mean. There are no prices as far as $0.50 or $1.00 from the mean.

Chapter 2 #17 Causes of death 2007 The Centers for Disease Control and Prevention lists causes of death in the United States during 2007: graph (pg. 36) a) Is it reasonable to conclude that heart or respiratory diseases were the cause of approximately 30.7% of U.S. deaths in 2007? b) What percent of deaths were from causes not listed here? c) Create an appropriate display for these data.

a) Yes. We can add because these categories do not overlap. (Each person is assigned only one cause of death.) b) 100 - (25.4 + 23.2 + 5.6 + 5.3 + 5.1) = 35.4% c) Either a bar chart or pie chart with "other" added would be appropriate. A bar chart is shown. graph (pg. A-2)
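Part b) is simple arithmetic, valid because each death is assigned exactly one cause, so the listed percentages can be added. A short check in Python:

```python
# Percentages for the five listed causes, from the answer to part b)
listed = [25.4, 23.2, 5.6, 5.3, 5.1]
other = 100 - sum(listed)  # share of deaths from causes not listed
print(f"{other:.1f}%")     # 35.4%
```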

