Statistics Final Exam
Categorize these measurements associated with student life according to level: nominal, ordinal, interval, or ratio. (a) Length of time to complete an exam (b) Time of first class (c) Major field of study (d) Course evaluation scale: poor, acceptable, good (e) Score on last exam (based on 100 possible points) (f) Age of student
1.1.11
A national survey asked 1261 U.S. adult fast-food customers which meal (breakfast, lunch, dinner, snack) they ordered. (a) Identify the variable. (b) Is the variable quantitative or qualitative? (c) What is the implied population?
1.1.7
Suppose you are assigned the number 1, and the other students in your statistics class call out consecutive numbers until each person in the class has his or her own number. Explain how you could get a random sample of four students from your statistics class. (a) Explain why the first four students walking into the classroom would not necessarily form a random sample. (b) Explain why four students coming in late would not necessarily form a random sample. (c) Explain why four students sitting in the back row would not necessarily form a random sample. (d) Explain why the four tallest students would not necessarily form a random sample.
1.2.9
Zane is examining two studies involving how different generations classify specified items as either luxuries or necessities. In the first study, the Echo generation is defined to be people ages 18-29. The second study defined the Echo generation to be people ages 20-31. Zane notices that the first study was conducted in 2006 while the second one was conducted in 2008. (a) Are the two studies inconsistent in their description of the Echo generation? (b) What are the birth years of the Echo generation?
1.3.5
The following table shows site type and type of pottery for a random sample of 628 sherds at a location in Sand Canyon Archaeological Project, Colorado.
Site Type       Mesa Verde   McElmo   Mancos   Row Total
Mesa Top            75         61       53        189
Cliff-Talus         81         70       62        213
Canyon Bench        92         68       66        226
Column Total       248        199      181        628
Use a chi-square test to determine if site type and pottery type are independent at the 0.01 level of significance. (a) What is the level of significance? State the null and alternate hypotheses. (b) Find the value of the chi-square statistic for the sample. Are all the expected frequencies greater than 5? What sampling distribution will you use? What are the degrees of freedom? (c) Find or estimate the P-value of the sample test statistic. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis of independence? (e) Interpret your conclusion in the context of the application. Use the expected values E to the hundredths place.
10.1.11
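The arithmetic behind the Sand Canyon independence test can be sketched in a few lines. This is only an illustration using the Python standard library; the helper chi2_sf_even_df is a hypothetical name, and it relies on the closed-form right-tail area that holds only when the degrees of freedom are even (here df = 4). A table lookup, as the problem intends, works just as well.

```python
import math

# Observed counts (rows: site type, columns: pottery type).
observed = [
    [75, 61, 53],   # Mesa Top
    [81, 70, 62],   # Cliff-Talus
    [92, 68, 66],   # Canyon Bench
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total)(column total) / n,
# rounded to hundredths as the problem requests.
expected = [[round(r * c / n, 2) for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_even_df(x, k):
    """Right-tail area of a chi-square distribution; valid only for even k."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k // 2))


p_value = chi2_sf_even_df(chi_square, df)
print(expected)
print(round(chi_square, 3), df, round(p_value, 3))
```

If the printed P-value is larger than the 0.01 level of significance, the null hypothesis of independence is not rejected.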
A data set has values ranging from a low of 10 to a high of 50. What's wrong with using the class limits 10-20, 20-30, 30-40, 40-50 for a frequency table?
2.1.3
A data set with whole numbers has a low value of 20 and a high value of 82. Find the class width and class limits for a frequency table with 7 classes.
2.1.5
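One way to check the class width and class limits for the data set running from 20 to 82 is sketched below. It assumes the common textbook convention of dividing the range by the number of classes and then moving up to the next whole number; other conventions exist, so treat the rule as an assumption.

```python
import math

low, high, num_classes = 20, 82, 7

# Assumed convention: (high - low) / classes, increased to the next whole number
# (even when the quotient is already whole).
class_width = math.floor((high - low) / num_classes) + 1

# Lower limits step by the class width; each upper limit is one less than
# the next lower limit because the data are whole numbers.
class_limits = [
    (low + i * class_width, low + (i + 1) * class_width - 1)
    for i in range(num_classes)
]
print(class_width)
print(class_limits)
```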
A survey of 1000 adults (reported in USA Today) uncovered some interesting housekeeping secrets. When unexpected company comes, where do we hide the mess? The survey showed that 68% of the respondents toss their mess into the closet, 23% shove things under the bed, 6% put things into the bathtub, and 3% put the mess into the freezer. Make a circle graph to display this information.
2.2.9
The Boston Marathon is the oldest and best-known U.S. marathon. It covers a route from Hopkinton, Massachusetts, to downtown Boston. The distance is approximately 26 miles. Search the marathon site to find a wealth of information about the history of the race. In particular, the site gives the winning times for the Boston Marathon. They are all over 2 hours. The following data are the minutes over 2 hours for the winning male runners:
1961-1980: 23 23 18 19 16 17 15 22 13 10 18 15 16 13 9 20 14 10 9 12
1981-2000: 9 8 9 10 14 7 11 8 9 8 11 8 9 7 9 9 10 7 9 9
(a) Make a stem-and-leaf display for the minutes over 2 hours of the winning times for the years 1961 to 1980. Use two lines per stem. (b) Make a stem-and-leaf display for the minutes over 2 hours of the winning times for the years 1981 to 2000. Use two lines per stem. (c) Interpretation Compare the two distributions. How many times under 15 minutes are in each distribution?
2.3.5
Consider the mode, median, and mean. Which average represents the middle value of a data distribution? Which average represents the most frequent value of a distribution? Which average takes all the specific values into account?
3.1.1
When computing the standard deviation, does it matter whether the data are sample data or data comprising the entire population? Explain.
3.2.3
Consider the following ordered data: 2 5 5 6 7 7 8 9 10 (a) Find the low, Q1, median, Q3, high. (b) Find the interquartile range. (c) Make a box-and-whisker plot.
3.3.5
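The five-number summary for the ordered data 2 5 5 6 7 7 8 9 10 can be checked with a short sketch. It assumes the convention in which the median is excluded from both halves when the number of data values is odd; software packages sometimes use other quartile rules and may give slightly different answers.

```python
from statistics import median

data = sorted([2, 5, 5, 6, 7, 7, 8, 9, 10])
n = len(data)
half = n // 2

# Assumed convention: with an odd count, the median itself belongs to neither half.
lower = data[:half]
upper = data[half + 1:] if n % 2 else data[half:]

low, high = data[0], data[-1]
q1, med, q3 = median(lower), median(data), median(upper)
iqr = q3 - q1
print(low, q1, med, q3, high, iqr)
```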
A recent Harris Poll survey of 1010 U.S. adults selected at random showed that 627 consider the occupation of firefighter to have very great prestige. Estimate the probability (to the nearest hundredth) that a U.S. adult selected at random thinks the occupation of firefighter has very great prestige.
4.1.7
Assume A and B are events such that 0 < P(A) < 1 and 0 < P(B) < 1. Answer true or false and give a brief explanation for each answer. (i) P(A and complement of A) = 0 (ii) P(A | complement of A) = 1 (iii) P(A and B) ≤ P(A)
4.2.37, 4.2.39, 4.2.43
Given P(A)=0.2 and P(B)=0.4: (a) If A and B are independent events, compute P(A and B). (b) If P(A | B)= 0.1, compute P(A and B).
4.2.5
Which of the following are continuous variables, and which are discrete? (a) Number of traffic fatalities per year in the state of Florida (b) Distance a golf ball travels after being hit with a driver (c) Time required to drive from home to college on any given day (d) Number of ships in Pearl Harbor on any given day (e) Your weight before breakfast each morning
5.1.1
Consider each distribution. Determine if it is a valid probability distribution or not, and explain your answer. (a) x: 0, 1, 2; P(x): 0.25, 0.60, 0.15 (b) x: 0, 1, 2; P(x): 0.25, 0.60, 0.20
5.1.3
Consider the probability distribution shown below: x: 0, 1, 2; P(x): 0.25, 0.60, 0.15. Compute the expected value and the standard deviation of the distribution.
5.1.7
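The expected value and standard deviation of the distribution with x = 0, 1, 2 and probabilities 0.25, 0.60, 0.15 follow from μ = Σ x·P(x) and σ = sqrt(Σ (x - μ)²·P(x)). The sketch below also checks that the probabilities are valid; the variable names are illustrative only.

```python
import math

xs = [0, 1, 2]
ps = [0.25, 0.60, 0.15]

# A valid discrete distribution has probabilities in [0, 1] that sum to 1.
is_valid = all(0 <= p <= 1 for p in ps) and math.isclose(sum(ps), 1.0)

mu = sum(x * p for x, p in zip(xs, ps))                 # expected value
variance = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
sigma = math.sqrt(variance)                             # standard deviation
print(is_valid, round(mu, 2), round(sigma, 3))
```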
Suppose x has a mound-shaped distribution with σ=3. (a) Find the minimal sample size required so that for a 95% confidence interval, the maximal margin of error is E=0.4. (b) Check Requirements Based on this sample size, can we assume that the x distribution is approximately normal? Explain.
7.1.13
Total plasma volume is important in determining the required plasma component in blood replacement therapy for a person undergoing surgery. Plasma volume is influenced by the overall health and physical activity of an individual. Suppose that a random sample of 45 male firefighters are tested and that they have a plasma volume sample mean of x̅ = 37.5 ml/kg (milliliters plasma per kilogram body weight). Assume that σ = 7.50 ml/kg for the distribution of blood plasma. (a) Find a 99% confidence interval for the population mean blood plasma volume in male firefighters. What is the margin of error? (b) What conditions are necessary for your calculations? (c) Interpret your results in the context of this problem. (d) Find the sample size necessary for a 99% confidence level with maximal margin of error E = 2.50 for the mean plasma volume in male firefighters.
7.1.17
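For the firefighter plasma-volume problem, the margin of error is E = z_c·σ/√n and the required sample size is n = (z_c·σ/E)², rounded up. The sketch below uses the exact critical value from Python's statistics.NormalDist rather than a table value such as 2.58, so intermediate figures may differ slightly from hand calculations.

```python
import math
from statistics import NormalDist

n, x_bar, sigma, confidence = 45, 37.5, 7.50, 0.99

# Critical value z_c with area `confidence` between -z_c and z_c.
z_c = NormalDist().inv_cdf((1 + confidence) / 2)

margin = z_c * sigma / math.sqrt(n)
interval = (x_bar - margin, x_bar + margin)

# Part (d): smallest n giving a margin of error of at most 2.50 ml/kg.
target_margin = 2.50
n_required = math.ceil((z_c * sigma / target_margin) ** 2)
print(round(margin, 2), tuple(round(v, 2) for v in interval), n_required)
```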
A random sample is drawn from a population with σ = 12. The sample mean is 30. (a) Compute a 95% confidence interval for μ based on a sample of size 49. What is the value of the margin of error? (b) Compute a 95% confidence interval for μ based on a sample of size 100. What is the value of the margin of error? (c) Compute a 95% confidence interval for μ based on a sample of size 225. What is the value of the margin of error? (d) Compare the margins of error for parts (a) through (c). As the sample size increases, does the margin of error decrease? (e) Critical Thinking Compare the lengths of the confidence intervals for parts (a) through (c). As the sample size increases, does the length of a 95% confidence interval decrease?
7.1.21
True or False: A larger sample size produces a longer confidence interval for μ.
7.1.5
Suppose x has a mound-shaped distribution. A random sample of size 16 has sample mean 10 and sample standard deviation 2. (a) Check Requirements Is it appropriate to use a Student's t distribution to compute a confidence interval for the population mean μ? Explain. (b) Find a 90% confidence interval for μ. (c) Explain the meaning of the confidence interval you computed.
7.2.11
At Burnt Mesa Pueblo, the method of tree-ring dating gave the following years A.D. for an archaeological excavation site: 1189 1271 1267 1272 1268 1316 1275 1317 1275 (a) Use a calculator with mean and standard deviation keys to verify that the sample mean year is x̅ = 1272, with sample standard deviation s = 37 years. (b) Find a 90% confidence interval for the mean of all tree-ring dates from this archaeological site. (c) Interpretation What does the confidence interval mean in the context of this problem?
7.2.13
Over the past several months, an adult patient has been treated for tetany (severe muscle spasms). This condition is associated with an average total calcium level below 6 mg/dl. Recently, the patient's total calcium tests gave the following readings (in mg/dl). 9.3 8.8 10.1 8.9 9.4 9.8 10.0 9.9 11.2 12.1 (a) Use a calculator to verify that x̅ = 9.95 and s= 1.02. (b) Find a 99.9% confidence interval for the population mean of total calcium in this patient's blood. (c) Based on your results in part (b), does it seem that this patient still has a calcium deficiency? Explain.
7.2.17
Find tc for a 0.90 confidence level when the sample size is 22.
7.2.3
Consider a 90% confidence interval for μ. Assume σ is not known. For which sample size, n =10 or n=20, is the critical value tc larger?
7.2.7
Lorraine computed a confidence interval for μ based on a sample of size 41. Since she did not know σ, she used s in her calculations. Lorraine used the normal distribution for the confidence interval instead of a Student's t distribution. Was her interval longer or shorter than one obtained by using an appropriate Student's t distribution? Explain.
7.2.9
For a binomial experiment with r successes out of n trials, what value do we use as a point estimate for the probability of success p on a single trial?
7.3.1
What percentage of your campus student body is female? Let p be the proportion of women students on your campus. (a) If no preliminary study is made to estimate p, how large a sample is needed to be 99% sure that a point estimate p̂ will be within a distance of 0.05 from p? (b) The Statistical Abstract of the United States, 112th Edition, indicates that approximately 54% of college students are female. Answer part (a) using this estimate for p.
7.3.25
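The sample size for estimating a proportion is n = p(1 - p)(z_c/E)², with p = 0.5 used as the most conservative choice when no preliminary estimate exists. The sketch below is an illustration; sample_size_for_proportion is a hypothetical helper, and it uses the exact critical value, so a table value such as z = 2.58 gives answers a few units larger.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(confidence, margin, p_estimate=0.5):
    """Smallest n so that p-hat is within `margin` of p at the given confidence.

    p_estimate = 0.5 maximizes p(1 - p), the case with no preliminary study.
    """
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil(p_estimate * (1 - p_estimate) * (z_c / margin) ** 2)


n_no_estimate = sample_size_for_proportion(0.99, 0.05)          # part (a)
n_with_estimate = sample_size_for_proportion(0.99, 0.05, 0.54)  # part (b)
print(n_no_estimate, n_with_estimate)
```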
If we fail to reject (i.e., "accept") the null hypothesis, does this mean that we have proved it to be true beyond all doubt? Explain your answer.
8.1.3
A random sample of 30 binomial trials resulted in 12 successes. Test the claim that the population proportion of successes does not equal 0.50. Use a level of significance of 0.05. (a) Check Requirements Can a normal distribution be used for the p̂ distribution? Explain. (b) State the hypotheses. (c) Compute p̂ and the corresponding standardized sample test statistic. (d) Find the P-value of the test statistic. (e) Do you reject or fail to reject H0? Explain. (f) Interpretation What do the results tell you?
8.3.5
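For the test with 12 successes in 30 binomial trials, the standardized statistic is z = (p̂ - p)/√(pq/n) with p taken from H0. The sketch below walks through the requirement check, the statistic, and the two-tailed P-value; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, r, p0, alpha = 30, 12, 0.50, 0.05
q0 = 1 - p0

# Requirement check for the normal approximation: np > 5 and nq > 5 under H0.
normal_ok = n * p0 > 5 and n * q0 > 5

p_hat = r / n
z = (p_hat - p0) / math.sqrt(p0 * q0 / n)

# H1: p != 0.50, so the P-value counts both tails.
p_value = 2 * NormalDist().cdf(-abs(z))
reject_h0 = p_value <= alpha
print(normal_ok, round(p_hat, 2), round(z, 2), round(p_value, 4), reject_h0)
```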
Are data that can be paired independent or dependent?
8.4.1
When we use a least-squares line to predict y values for x values beyond the range of x values found in the data, are we extrapolating or interpolating? Are there any concerns about such predictions?
9.2.3
Given P(A)=0.2, P(B)=0.5, P(A | B)= 0.3: (a) Compute P(A and B). (b) Compute P(A or B).
4.2.7
In general, are chi-square distributions symmetric or skewed? If skewed, are they skewed right or left?
10.1.1
True or false: The value zc is a value from the standard normal distribution such that P(-zc < z < zc) = c.
7.1.1
Suppose you want to test the claim that a population mean equals 40. (a) State the null hypothesis. (b) State the alternate hypothesis if you have no information regarding how the population mean might differ from 40. (c) State the alternate hypothesis if you believe (based on experience or past studies) that the population mean may exceed 40. (d) State the alternate hypothesis if you believe (based on experience or past studies) that the population mean may be less than 40.
8.1.11
You are interested in the weights of backpacks students carry to class and decide to conduct a study using the backpacks carried by 30 students. (a) Give some instructions for weighing the backpacks. Include unit of measure, accuracy of measure, and type of scale. (b) Do you think each student asked will allow you to weigh his or her backpack? (c) Do you think telling students ahead of time that you are going to weigh their backpacks will make a difference in the weights?
1.1.15
What is the difference between a parameter and a statistic?
1.1.3
Explain the difference between a stratified sample and a cluster sample.
1.2.1
A study involves three variables: income level, hours spent watching TV per week, and hours spent at home on the Internet per week. List some ways the variables might be confounded.
1.3.1
The Focus Problem at the beginning of the chapter refers to excavations at Burnt Mesa Pueblo in Bandelier National Monument. One question the archaeologists asked was: Is raw material used by prehistoric Indians for stone tool manufacture independent of the archaeological excavation site? Two different excavation sites at Burnt Mesa Pueblo gave the information in the following table. Use a chi-square test with 5% level of significance to test the claim that raw material used for construction of stone tools and excavation site are independent.
Material          Site A   Site B   Row Total
Basalt              731      584      1315
Obsidian            102       93       195
Pedernal chert      510      525      1035
Other                85       94       179
Column Total       1428     1296      2724
(a) What is the level of significance? State the null and alternate hypotheses. (b) Find the value of the chi-square statistic for the sample. Are all the expected frequencies greater than 5? What sampling distribution will you use? What are the degrees of freedom? (c) Find or estimate the P-value of the sample test statistic. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis of independence? (e) Interpret your conclusion in the context of the application. Use the expected values E to the hundredths place.
10.1.17
Random samples of people ages 15-24 and 25-34 were asked about their preferred method of (remote) communication with friends. The respondents were asked to select one of the methods from the following list: cell phone, instant message, e-mail, other.
Age            Cell Phone   Instant Message   E-mail   Other   Row Total
15-24              48             40             5        7       100
25-34              41             30            15       14       100
Column Total       89             70            20       21       200
(i) Make a cluster bar graph showing the percentages in each age group who selected each method. (ii) Test whether the two populations share the same proportions of preferences for each type of communication method. Use α = 0.05. (a) What is the level of significance? State the null and alternate hypotheses. (b) Find the value of the chi-square statistic for the sample. Are all the expected frequencies greater than 5? What sampling distribution will you use? What are the degrees of freedom? (c) Find or estimate the P-value of the sample test statistic. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis of independence? (e) Interpret your conclusion in the context of the application. Use the expected values E to the hundredths place.
10.1.19
Zane is interested in the proportion of people who recycle each of three distinct products: paper, plastic, electronics. He wants to test the hypothesis that the proportion of people recycling each type of product differs by age group: 12-18 years old, 19-30 years old, 31-40 years old, over 40 years old. I. Describe the sampling method appropriate for a test of homogeneity regarding recycled products and age. II. Consider Zane's study regarding products recycled and age group. Suppose he found a sample χ² = 16.83. (a) How many degrees of freedom are used? Recall that there were 4 age groups and 3 products specified. Approximate the P-value and conclude the test at the 1% level of significance. Does it appear that the proportion of people who recycle each of the specified products differs by age group? Explain. (b) From this study, can Zane identify how the different age groups differ regarding the proportion of those recycling the specified product?
10.1.5, 10.1.7
The following table shows the Myers-Briggs personality preferences for a random sample of 406 people in the listed professions. E refers to extroverted and I refers to introverted.
Occupation     Type E   Type I   Row Total
Clergy           62       45       107
M.D.             68       94       162
Lawyer           56       81       137
Column Total    186      220       406
Use the chi-square test to determine if the listed occupations and personality preferences are independent at the 0.05 level of significance. (a) What is the level of significance? State the null and alternate hypotheses. (b) Find the value of the chi-square statistic for the sample. Are all the expected frequencies greater than 5? What sampling distribution will you use? What are the degrees of freedom? (c) Find or estimate the P-value of the sample test statistic. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis of independence? (e) Interpret your conclusion in the context of the application. Use the expected values E to the hundredths place.
10.1.9
You are manager of a specialty coffee shop and collect data throughout a full day regarding waiting time for customers from the time they enter the shop until the time they pick up their order. (a) What type of distribution do you think would be most desirable for the waiting times: skewed right, skewed left, mound-shaped symmetrical? Explain. (b) What if the distribution for waiting times were bimodal? What might be some explanations?
2.1.7
It is costly in both time and money to go to college. Does it pay off? According to the Bureau of the Census, the answer is yes. The average annual income (in thousands of dollars) of a household headed by a person with the stated education level is as follows: 21.6 if ninth grade is the highest level achieved, 39.6 for high school graduates, 56.8 for those holding associate degrees, 75.6 for those with bachelor's degrees, 91.7 for those with master's degrees, and 120.9 for those with doctoral degrees. Make a bar graph showing household income for each education level.
2.2.5
When a distribution is mound-shaped symmetrical, what is the general relationship among the values of the mean, median, and mode?
3.1.11
Find the mean, median, and mode of the data set 8 2 7 2 6
3.1.5
Which average—mean, median, or mode—is associated with the standard deviation?
3.2.1
Consider the data set 2 3 4 5 6 (a) Find the range. (b) Use the defining formula to compute the sample standard deviation s. (c) Use the defining formula to compute the population standard deviation.
3.2.5
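The defining formulas for the data set 2 3 4 5 6 differ only in the divisor: n - 1 for the sample standard deviation s and N for the population standard deviation σ. The sketch below applies both directly; the variable names are illustrative.

```python
import math

data = [2, 3, 4, 5, 6]
n = len(data)

data_range = max(data) - min(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

s = math.sqrt(sum_sq_dev / (n - 1))      # sample standard deviation
sigma = math.sqrt(sum_sq_dev / n)        # population standard deviation
print(data_range, round(s, 3), round(sigma, 3))
```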
The town of Butler, Nebraska, decided to give a teacher-competency exam and defined the passing scores to be those in the 70th percentile or higher. The raw test scores ranged from 0 to 100. Was a raw score of 82 necessarily a passing score? Explain.
3.3.3
Isabel Briggs Myers was a pioneer in the study of personality types. The personality types are broadly defined according to four main preferences. Do married couples choose similar or different personality types in their mates? The following data give an indication: Similarities and Differences in a Random Sample of 375 Married Couples
Number of Similar Preferences   Number of Couples
All four                               34
Three                                 131
Two                                   124
One                                    71
None                                   15
Suppose that a married couple is selected at random. (a) Use the data to estimate the probability that they will have 0, 1, 2, 3, or 4 personality preferences in common. (b) Do the probabilities add up to 1? Why should they? What is the sample space in this problem?
4.1.17
When do creative people get their best ideas? USA Today did a survey of 966 inventors (who hold U.S. patents) and obtained the following information: Time of Day When Best Ideas Occur
Time                     Number of Inventors
6 a.m.-12 noon                  290
12 noon-6 p.m.                  135
6 p.m.-12 midnight              319
12 midnight-6 a.m.              222
(a) Assuming that the time interval includes the left limit and all the times up to but not including the right limit, estimate the probability that an inventor has a best idea during each time interval: from 6 a.m. to 12 noon, from 12 noon to 6 p.m., from 6 p.m. to 12 midnight, from 12 midnight to 6 a.m. (b) Do the probabilities of part (a) add up to 1? Why should they? What is the sample space in this problem?
4.1.19
What is the probability of (a) an event A that is certain to occur? (b) an event B that is impossible?
4.1.3
If two events are mutually exclusive, can they occur concurrently? Explain.
4.2.1
M&M plain candies come in various colors. According to the M&M/Mars Department of Consumer Affairs, the distribution of colors for plain M&M candies is: Purple 20%, Yellow 20%, Red 20%, Orange 10%, Green 10%, Blue 10%, Brown 10%. Suppose you have a large bag of plain M&M candies and you choose one candy at random. Find: (a) P(green candy or blue candy). Are these outcomes mutually exclusive? Why? (b) P(yellow candy or red candy). Are these outcomes mutually exclusive? Why? (c) P(not purple candy)
4.2.15
Given P(A)= 0.3 and P(B)=0.4: (a) If A and B are mutually exclusive events, compute P(A or B). (b) If P(A and B) = 0.1, compute P(A or B)
4.2.3
Consider a binomial experiment with n = 6 trials where the probability of success on a single trial is p = 0.85. (a) Find P(r ≤ 1). (b) Interpretation If you conducted the experiment and got fewer than 2 successes, would you be surprised? Why?
5.2.13
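The probability of at most one success in six trials comes from the binomial formula P(r) = C(n, r)·p^r·q^(n - r), summed over r = 0 and r = 1. The sketch below takes p = 0.85 as the single-trial probability of success; binomial_pmf is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(exactly r successes in n independent trials with success probability p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 6, 0.85
p_at_most_1 = sum(binomial_pmf(r, n, p) for r in range(2))  # r = 0 and r = 1
print(round(p_at_most_1, 6))
```

A probability this small suggests that fewer than 2 successes would be a surprising outcome.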
A fair quarter is flipped three times. For each of the following probabilities, use the formula for the binomial distribution and a calculator to compute the requested probability. Next, look up the probability in Table 3 of Appendix II and compare the table result with the computed result. (a) Find the probability of getting exactly three heads. (b) Find the probability of getting exactly two heads. (c) Find the probability of getting two or more heads. (d) Find the probability of getting exactly three tails.
5.2.15
Sociologists say that 90% of married women claim that their husband's mother is the biggest bone of contention in their marriages (sex and money are lower-rated areas of contention). Suppose that six married women are having coffee together one morning. What is the probability that (a) all of them dislike their mother-in-law? (b) none of them dislike their mother-in-law? (c) at least four of them dislike their mother-in-law? (d) no more than three of them dislike their mother-in-law?
5.2.19
Approximately 75% of all marketing personnel are extroverts, whereas about 60% of all computer programmers are introverts. (a) At a meeting of 15 marketing personnel, what is the probability that 10 or more are extroverts? What is the probability that 5 or more are extroverts? What is the probability that all are extroverts? (b) In a group of 5 computer programmers, what is the probability that none are introverts? What is the probability that 3 or more are introverts? What is the probability that all are introverts?
5.2.23
Old Friends Information Service is a California company that is in the business of finding addresses of long-lost friends. Old Friends claims to have a 70% success rate. Suppose that you have the names of six friends for whom you have no addresses and decide to use Old Friends to track them. (a) Make a histogram showing the probability of r = 0 to 6 friends for whom an address will be found. (b) Find the mean and standard deviation of this probability distribution. What is the expected number of friends for whom addresses will be found? (c) Quota Problem How many names would you have to submit to be 97% sure that at least two addresses will be found?
5.3.13
Consider a binomial experiment with n=8 trials and p=0.20. (a) Find the expected value and the standard deviation of the distribution. (b) Interpretation Would it be unusual to obtain 5 or more successes? Explain. Confirm your answer by looking at the binomial probability distribution table.
5.3.3
What percentage of the area under the normal curve lies (a) to the left of μ? (b) between μ - σ and μ + σ? (c) between μ - 3σ and μ + 3σ?
6.1.5
A person's blood glucose level and diabetes are closely related. Let x be a random variable measured in milligrams of glucose per deciliter (1/10 of a liter) of blood. After a 12-hour fast, the random variable x will have a distribution that is approximately normal with mean μ=85 and standard deviation σ=25. Note: After 50 years of age, both the mean and standard deviation tend to increase. What is the probability that, for an adult (under 50 years old) after a 12-hour fast, (a) x is more than 60? (b) x is less than 110? (c) x is between 60 and 110? (d) x is greater than 140 (borderline diabetes starts at 140)?
6.3.25
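The blood glucose probabilities can be checked by evaluating the normal cumulative distribution with μ = 85 and σ = 25, which is equivalent to converting each x to z = (x - μ)/σ and reading a standard normal table. This sketch uses statistics.NormalDist from the Python standard library.

```python
from statistics import NormalDist

glucose = NormalDist(mu=85, sigma=25)

p_more_than_60 = 1 - glucose.cdf(60)                  # (a) z = -1.00
p_less_than_110 = glucose.cdf(110)                    # (b) z = 1.00
p_between = glucose.cdf(110) - glucose.cdf(60)        # (c)
p_more_than_140 = 1 - glucose.cdf(140)                # (d) z = 2.20

for value in (p_more_than_60, p_less_than_110, p_between, p_more_than_140):
    print(round(value, 4))
```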
Assume that x has a normal distribution with the specified mean and standard deviation. Find the indicated probabilities. (i) P(3 ≤ x ≤ 6); μ = 4, σ = 2 (ii) P(8 ≤ x ≤ 12); μ = 15, σ = 3.2 (iii) P(x ≥ 30); μ = 20, σ = 3.4
6.3.5, 6.3.9, 6.3.11
List two unbiased estimators and their corresponding parameters.
6.5.3
Based on long experience, an airline has found that about 6% of the people making reservations on a flight from Miami to Denver do not show up for the flight. Suppose the airline overbooks this flight by selling 267 ticket reservations for an airplane with only 255 seats. (a) What is the probability that a person holding a reservation will show up for the flight? (b) Let n = 267 represent the number of ticket reservations. Let r represent the number of people with reservations who show up for the flight. Which expression represents the probability that a seat will be available for everyone who shows up holding a reservation? P(255 ≤ r); P(r ≤ 255); P(r ≤ 267); P(r = 255) (c) Use the normal approximation to the binomial distribution and part (b) to answer the following question: What is the probability that a seat will be available for every person who shows up holding a reservation?
6.6.15
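For the overbooked flight, the normal approximation uses μ = np and σ = √(npq) with a continuity correction, so P(r ≤ 255) is approximated by P(x ≤ 255.5). The sketch below assumes the show-up probability is 1 - 0.06 = 0.94; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, seats = 267, 255
p_show = 1 - 0.06          # probability that a reservation holder shows up
q_show = 1 - p_show

# The approximation is usually considered safe when np > 5 and nq > 5.
approximation_ok = n * p_show > 5 and n * q_show > 5

mu = n * p_show
sigma = math.sqrt(n * p_show * q_show)

# Continuity correction: r <= 255 for the binomial becomes x <= 255.5.
p_everyone_seated = NormalDist(mu, sigma).cdf(seats + 0.5)
print(approximation_ok, round(mu, 2), round(sigma, 2), round(p_everyone_seated, 4))
```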
Suppose we have a binomial experiment in which success is defined to be a particular quality or attribute that interests us. (a) Suppose n = 100 and p = 0.23. Can we safely approximate the p̂ distribution by a normal distribution? Why? Compute μ of p̂ and σ of p̂. (b) Suppose n = 20 and p = 0.23. Can we safely approximate the p̂ distribution by a normal distribution? Why or why not?
6.6.21
Check that it is appropriate to use the normal approximation to the binomial. Then use the normal distribution to estimate the requested probabilities. More than a decade ago, high levels of lead in the blood put 88% of children at risk. A concerted effort was made to remove lead from the environment. Now, according to the Third National Health and Nutrition Examination Survey (NHANES III) conducted by the Centers for Disease Control and Prevention, only 9% of children in the United States are at risk of high blood-lead levels. (a) In a random sample of 200 children taken more than a decade ago, what is the probability that 50 or more had high blood-lead levels? (b) In a random sample of 200 children taken now, what is the probability that 50 or more have high blood-lead levels?
6.6.7
Check that it is appropriate to use the normal approximation to the binomial. Then use the normal distribution to estimate the requested probabilities. It is estimated that 3.5% of the general population will live past their 90th birthday (Statistical Abstract of the United States, 112th Edition). In a graduating class of 753 high school seniors, what is the probability that (a) 15 or more will live beyond their 90th birthday? (b) 30 or more will live beyond their 90th birthday? (c) between 25 and 35 will live beyond their 90th birthday? (d) more than 40 will live beyond their 90th birthday?
6.6.9
Suppose x has a normal distribution with σ=6. A random sample of size 16 has sample mean 50. (a) Check Requirements Is it appropriate to use a normal distribution to compute a confidence interval for the population mean μ? Explain. (b) Find a 90% confidence interval for μ. (c) Explain the meaning of the confidence interval you computed.
7.1.11
Sam computed a 95% confidence interval for μ from a specific random sample. His confidence interval was 10.1 < μ < 12.2. He claims that the probability that μ is in this interval is 0.95. What is wrong with his claim?
7.1.9
Use Table 6 of Appendix II to find tc for a 0.95 confidence level when the sample size is 18.
7.2.1
Isabel Myers was a pioneer in the study of personality types. The following information is taken from A Guide to the Development and Use of the Myers-Briggs Type Indicator by Myers and McCaulley (Consulting Psychologists Press). In a random sample of 62 professional actors, it was found that 39 were extroverts. (a) Let p represent the proportion of all actors who are extroverts. Find a point estimate for p. (b) Find a 95% confidence interval for p. Give a brief interpretation of the meaning of the confidence interval you have found. (c) Check Requirements Do you think the conditions np > 5 and nq > 5 are satisfied in this problem? Explain why this would be an important consideration.
7.3.11
A random sample of 5792 physicians in Colorado showed that 3139 provide at least some charity care (i.e., treat poor people at no cost). These data are based on information from State Health Care Data: Utilization, Spending, and Characteristics (American Medical Association). (a) Let p represent the proportion of all Colorado physicians who provide some charity care. Find a point estimate for p. (b) Find a 99% confidence interval for p. Give a brief explanation of the meaning of your answer in the context of this problem. (c) Is the normal approximation to the binomial justified in this problem? Explain.
7.3.15
In a marketing survey, a random sample of 730 women shoppers revealed that 628 remained loyal to their favorite supermarket during the past year (i.e., did not switch stores). (a) Let p represent the proportion of all women shoppers who remain loyal to their favorite supermarket. Find a point estimate for p. (b) Find a 95% confidence interval for p. Give a brief explanation of the meaning of the interval. (c) As a news writer, how would you report the survey results regarding the percentage of women supermarket shoppers who remained loyal to their favorite supermarket during the past year? What is the margin of error based on a 95% confidence interval?
7.3.19
Results of a poll of a random sample of 3003 American adults showed that 20% did not know that caffeine contributes to dehydration. The poll was conducted for the Nutrition Information Center and had a margin of error of +/-1.4%. (a) Does the margin of error take into account any problems with the wording of the survey question, interviewer errors, bias from sequence of questions, and so forth? (b) What does the margin of error reflect?
7.3.3
Consider n = 100 binomial trials with r = 30 successes. (a) Is it appropriate to use a normal distribution to approximate the p̂ distribution? (b) Find a 90% confidence interval for the population proportion of successes p. (c) Explain the meaning of the confidence interval you computed.
7.3.7
Inorganic phosphorus is a naturally occurring element in all plants and animals, with concentrations increasing progressively up the food chain (fruit < vegetables < cereals < nuts < corpse). Geochemical surveys take soil samples to determine phosphorus content (in ppm, parts per million). A high phosphorus content may or may not indicate an ancient burial site, food storage site, or even a garbage dump. The Hill of Tara is a very important archaeological site in Ireland. It is by legend the seat of Ireland's ancient high kings. Independent random samples from two regions in Tara gave the following phosphorus measurements (in ppm). Assume the population distributions of phosphorus are mound-shaped and symmetric for these two regions.
Region I: x1; n1 = 12: 540 810 790 790 340 800 970 720 890 860 820 640
Region II: x2; n2 = 16: 750 870 700 810 635 955 710 890 895 850 280 993 965 350 520 650
(a) Use a calculator to verify that x̅1 = 747.5, s1 = 170.4, x̅2 = 738.9, and s2 = 212.1. (b) Let μ1 be the population mean for x1 and let μ2 be the population mean for x2. Find a 90% confidence interval for μ1 - μ2. (c) Interpretation Explain what the confidence interval means in the context of this problem. Does the interval consist of numbers that are all positive? all negative? of different signs? At the 90% level of confidence, is one region more interesting than the other from a geochemical perspective? (d) Check Requirements Which distribution (standard normal or Student's t) did you use? Why?
7.4.11
Isabel Myers was a pioneer in the study of personality types. She identified four basic personality preferences, which are described at length in the book A Guide to the Development and Use of the Myers-Briggs Type Indicator by Myers and McCaulley (Consulting Psychologists Press). Marriage counselors know that couples who have none of the four preferences in common may have a stormy marriage. Myers took a random sample of 375 married couples and found that 289 had two or more personality preferences in common. In another random sample of 571 married couples, it was found that only 23 had no preferences in common. Let p1 be the population proportion of all married couples who have two or more personality preferences in common. Let p2 be the population proportion of all married couples who have no personality preferences in common. (a) Check Requirements Can a normal distribution be used to approximate the p̂1 - p̂2 distribution? Explain. (b) Find a 99% confidence interval for p1 - p2. (c) Interpretation Explain the meaning of the confidence interval in part (b) in the context of this problem. Does the confidence interval contain all positive, all negative, or both positive and negative numbers? What does this tell you (at the 99% confidence level) about the proportion of married couples with two or more personality preferences in common compared with the proportion of married couples sharing no personality preferences in common?
7.4.17
A random sample of size 20 from a normal distribution with σ=4 produced a sample mean of 8. (a) Is the x̅ distribution normal? Explain. (b) Compute the sample test statistic z under the null hypothesis H0: μ = 7. (c) For H1: μ does not equal 7, estimate the P-value of the test statistic. (d) For a level of significance of 0.05 and the hypotheses of parts (b) and (c), do you reject or fail to reject the null hypothesis? Explain.
8.1.13
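As a rough check on the z test above, the following sketch computes the test statistic and a two-tailed P-value. It assumes σ is known, so the statistic is z = (x̅ - μ0)/(σ/sqrt(n)); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, x_bar, mu_0 = 20, 4.0, 8.0, 7.0   # values given in the problem

# Sample test statistic under H0: mu = 7
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-tailed P-value for H1: mu != 7 (area in both tails beyond |z|)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
reject_h0 = p_two_tailed <= alpha
print(f"z = {z:.3f}, P-value = {p_two_tailed:.4f}, reject H0: {reject_h0}")
```

Because the P-value is well above 0.05, this sketch fails to reject H0.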
The body weight of a healthy 3-month-old colt should be about μ= 60 kg. (a) If you want to set up a statistical test to challenge the claim that μ = 60 kg, what would you use for the null hypothesis H0? (b) In Nevada, there are many herds of wild horses. Suppose you want to test the claim that the average weight of a wild Nevada colt (3 months old) is less than 60 kg. What would you use for the alternate hypothesis H1? (c) Suppose you want to test the claim that the average weight of such a wild colt is greater than 60 kg. What would you use for the alternate hypothesis? (d) Suppose you want to test the claim that the average weight of such a wild colt is different from 60 kg. What would you use for the alternate hypothesis? (e) For each of the tests in parts (b), (c), and (d), would the area corresponding to the P-value be on the left, on the right, or on both sides of the mean? Explain your answer in each case.
8.1.15
Weatherwise magazine is published in association with the American Meteorological Society. Volume 46, Number 6 has a rating system to classify Nor'easter storms that frequently hit New England states and can cause much damage near the ocean coast. A severe storm has an average peak wave height of 16.4 feet for waves hitting the shore. Suppose that a Nor'easter is in progress at the severe storm class rating. (a) Let us say that we want to set up a statistical test to see if the wave action (i.e., height) is dying down or getting worse. What would be the null hypothesis regarding average wave height? (b) If you wanted to test the hypothesis that the storm is getting worse, what would you use for the alternate hypothesis? (c) If you wanted to test the hypothesis that the waves are dying down, what would you use for the alternate hypothesis? (d) Suppose you do not know whether the storm is getting worse or dying out. You just want to test the hypothesis that the average wave height is different (either higher or lower) from the severe storm class rating. What would you use for the alternate hypothesis? (e) For each of the tests in parts (b), (c), and (d), would the area corresponding to the P-value be on the left, on the right, or on both sides of the mean? Explain your answer in each case.
8.1.17
Bill Alther is a zoologist who studies Anna's hummingbird. Suppose that in a remote part of the Grand Canyon, a random sample of six of these birds was caught, weighed, and released. The weights (in grams) were 3.7 2.9 3.8 4.2 4.8 3.1 The sample mean is x̅ = 3.75 grams. Let x be a random variable representing weights of Anna's hummingbirds in this part of the Grand Canyon. We assume that x has a normal distribution and σ = 0.70 gram. It is known that for the population of all Anna's hummingbirds, the mean weight is μ = 4.55 grams. Do the data indicate that the mean weight of these birds in this part of the Grand Canyon is less than 4.55 grams? Use α = 0.01.
8.1.21
If the P-value in a statistical test is greater than the level of significance for the test, do we reject or fail to reject H0?
8.1.7
Suppose the P-value in a right-tailed test is 0.0092. Based on the same population, sample, and null hypothesis, what is the P-value for a corresponding two-tailed test?
8.1.9
For the same sample data and null hypothesis, how does the P-value for a two-tailed test of μ compare to that for a one-tailed test?
8.2.1
Weatherwise is a magazine published by the American Meteorological Society. One issue gives a rating system used to classify Nor'easter storms that frequently hit New England and can cause much damage near the ocean. A severe storm has an average peak wave height of μ = 16.4 feet for waves hitting the shore. Suppose that a Nor'easter is in progress at the severe storm class rating. Peak wave heights are usually measured from land (using binoculars) off fixed cement piers. Suppose that a reading of 36 waves showed an average wave height of x̅ = 17.3 feet. Previous studies of severe storms indicate that σ = 3.5 feet. Does this information suggest that the storm is (perhaps temporarily) increasing above the severe rating? Use α = 0.01. (a) What is the level of significance? State the null and alternate hypotheses. (b) Check Requirements What sampling distribution will you use? Explain the rationale for your choice of sampling distribution. Compute the value of the sample test statistic. (c) Estimate the P-value. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.2.11
A random sample of 46 adult coyotes in a region of northern Minnesota showed the average age to be x̅ = 2.05 years, with sample standard deviation s = 0.82 years (based on information from the book Coyotes: Biology, Behavior and Management by M. Bekoff, Academic Press). However, it is thought that the overall population mean age of coyotes is μ = 1.75. Do the sample data indicate that coyotes in this region of northern Minnesota tend to live longer than the average of 1.75 years? Use α = 0.01. (a) What is the level of significance? State the null and alternate hypotheses. (b) Check Requirements What sampling distribution will you use? Explain the rationale for your choice of sampling distribution. Compute the value of the sample test statistic. (c) Estimate the P-value. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.2.13
Let x be a random variable that represents red blood cell (RBC) count in millions of cells per cubic millimeter of whole blood. Then x has a distribution that is approximately normal. For the population of healthy female adults, the mean of the x distribution is about 4.8. Suppose that a female patient has taken six laboratory blood tests over the past several months and that the RBC count data sent to the patient's doctor are: 4.9 4.2 4.5 4.1 4.4 4.3 i. Use a calculator to verify that x̅ = 4.40 and s = 0.28. ii. Do the given data indicate that the population mean RBC count for this patient is lower than 4.8? Use α = 0.05. (a) What is the level of significance? State the null and alternate hypotheses. (b) Check Requirements What sampling distribution will you use? Explain the rationale for your choice of sampling distribution. Compute the value of the sample test statistic. (c) Estimate the P-value. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.2.17
When using the Student's t distribution to test μ, what value do you use for the degrees of freedom?
8.2.3
Consider a test for μ. If the P-value is such that you can reject H0 for α = 0.01, can you always reject H0 for α = 0.05? Explain.
8.2.5
A random sample of 25 values is drawn from a mound-shaped and symmetrical distribution. The sample mean is 10 and the sample standard deviation is 2. Use a level of significance of 0.05 to conduct a two-tailed test of the claim that the population mean is 9.5. (a) Check Requirements Is it appropriate to use a Student's t distribution? Explain. How many degrees of freedom do we use? (b) What are the hypotheses? (c) Compute the sample test statistic t. (d) Estimate the P-value for the test. (e) Do we reject or fail to reject H0? (f) Interpret the results.
8.2.9
To use the normal distribution to test a proportion p, the conditions np > 5 and nq > 5 must be satisfied. Does the value of p come from H0, or is it estimated by using p̂ from the sample?
8.3.1
The U.S. Department of Transportation, National Highway Traffic Safety Administration, reported that 77% of all fatally injured automobile drivers were intoxicated. A random sample of 27 records of automobile driver fatalities in Kit Carson County, Colorado, showed that 15 involved an intoxicated driver. Do these data indicate that the population proportion of driver fatalities related to alcohol is less than 77% in Kit Carson County? Use α = 0.01. (a) What is the level of significance? State the null and alternate hypotheses. (b) Check Requirements What sampling distribution will you use? Do you think the sample size is sufficiently large? Explain. Compute the value of the sample test statistic. (c) Find the P-value of the test statistic. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.3.11
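The left-tailed proportion test in the Kit Carson County problem above can be sketched as follows. This assumes the standard normal approximation with p taken from H0, so z = (p̂ - p0)/sqrt(p0·q0/n); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, r = 27, 15          # sample size and number of intoxicated-driver records
p_0 = 0.77             # proportion claimed in H0
q_0 = 1 - p_0

# Requirement check uses p from H0: n*p_0 > 5 and n*q_0 > 5
large_enough = n * p_0 > 5 and n * q_0 > 5

p_hat = r / n
z = (p_hat - p_0) / sqrt(p_0 * q_0 / n)

# Left-tailed test (H1: p < 0.77), so the P-value is the area to the left of z
p_value = NormalDist().cdf(z)

alpha = 0.01
reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The same pattern applies to the other proportion tests in this set: change n, r, and p_0, and use the right tail or both tails as the alternate hypothesis requires.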
The following is based on information from The Wolf in the Southwest: The Making of an Endangered Species by David E. Brown (University of Arizona Press). Before 1918, the proportion of female wolves in the general population of all southwestern wolves was about 50%. However, after 1918, southwestern cattle ranchers began a widespread effort to destroy wolves. In a recent sample of 34 wolves, there were only 10 females. One theory is that male wolves tend to return sooner than females to their old territories where their predecessors were exterminated. Do these data indicate that the population proportion of female wolves is now less than 50% in the region? Use α = 0.01. (a) What is the level of significance? State the null and alternate hypotheses. (b) Check Requirements What sampling distribution will you use? Do you think the sample size is sufficiently large? Explain. Compute the value of the sample test statistic. (c) Find the P-value of the test statistic. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.3.13
Are most student government leaders extroverts? According to Myers-Briggs estimates, about 82% of college student government leaders are extroverts. Suppose that a Myers-Briggs personality preference test was given to a random sample of 73 student government leaders attending a large national leadership conference and that 56 were found to be extroverts. Does this indicate that the population proportion of extroverts among college student government leaders is different (either way) from 82%? Use α = 0.01. (a) What is the level of significance? State the null and alternate hypotheses. (b) Check Requirements What sampling distribution will you use? Do you think the sample size is sufficiently large? Explain. Compute the value of the sample test statistic. (c) Find the P-value of the test statistic. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.3.21
Benford's Law claims that numbers chosen from very large data files tend to have "1" as the first nonzero digit disproportionately often. In fact, research has shown that if you randomly draw a number from a very large data file, the probability of getting a number with "1" as the leading digit is about 0.301. Now suppose you are an auditor for a very large corporation. The revenue report involves millions of numbers in a large computer file. Let us say you took a random sample of n = 215 numerical entries from the file and r = 46 of the entries had a first nonzero digit of 1. Let p represent the population proportion of all numbers in the corporate file that have a first nonzero digit of 1. i. Test the claim that p is less than 0.301. Use α = 0.01. (a) What is the level of significance? State the null and alternate hypotheses. (b) Check Requirements What sampling distribution will you use? Do you think the sample size is sufficiently large? Explain. Compute the value of the sample test statistic. (c) Find the P-value of the test statistic. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application. ii. If p is in fact less than 0.301, would it make you suspect that there are not enough numbers in the data file with leading 1's? Could this indicate that the books have been "cooked" by "pumping up" or inflating the numbers? Comment from the viewpoint of a stockholder. Comment from the perspective of the Federal Bureau of Investigation as it looks for money laundering in the form of false profits. iii. Comment on the following statement: "If we reject the null hypothesis at level of significance α, we have not proved H0 to be false. We can say that the probability is α that we made a mistake in rejecting H0." Based on the outcome of the test, would you recommend further investigation before accusing the company of fraud?
8.3.7
In environmental studies, sex ratios are of great importance. Wolf society, packs, and ecology have been studied extensively at different locations in the United States and foreign countries. Sex ratios for eight study sites in northern Europe are shown in the following table.
Location   % Males Winter   % Males Summer
Finland    72               53
Finland    47               51
Finland    89               72
Lapland    55               48
Lapland    64               55
Russia     50               50
Russia     41               50
Russia     55               45
It is hypothesized that in winter, "loner" males (not present in summer packs) join the pack to increase survival rate. Use a 5% level of significance to test the claim that the average percentage of males in a wolf pack is higher in winter. (a) What is the level of significance? State the null and alternate hypotheses. Will you use a left-tailed, right-tailed, or two-tailed test? (b) Check Requirements What sampling distribution will you use? What assumptions are you making? Compute the value of the sample test statistic. (c) Find (or estimate) the P-value. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.4.13
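For the paired wolf-pack data above, the test statistic can be sketched as below. This assumes the usual paired-difference t statistic t = d̅/(s_d/sqrt(n)) with d = winter - summer and d.f. = n - 1. The critical value 1.895 is the standard Student's t table entry for a one-tail area of 0.05 with 7 degrees of freedom, entered by hand because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

winter = [72, 47, 89, 55, 64, 50, 41, 55]   # % males, winter
summer = [53, 51, 72, 48, 55, 50, 50, 45]   # % males, summer

# Paired differences d = winter - summer for each study site
d = [w - s for w, s in zip(winter, summer)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                  # sample standard deviation of the differences
t = d_bar / (s_d / sqrt(n))     # test statistic under H0: mu_d = 0
df = n - 1

t_critical = 1.895              # table value: one-tail area 0.05, d.f. = 7
reject_h0 = t > t_critical      # right-tailed test (H1: mu_d > 0)
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.3f}, d.f. = {df}")
```

Comparing t with the table value is equivalent to comparing the P-value with 0.05; the other paired-difference problems in this set follow the same steps with their own data.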
The following data are based on information taken from the book Navajo Architecture: Forms, History, Distributions by S. C. Jett and V. E. Spencer (University of Arizona Press). A survey of houses and traditional hogans was made in a number of different regions of the modern Navajo Indian Reservation. The following table is the result of a random sample of eight regions on the Navajo Reservation.
Area               # Inhabited Houses   # Inhabited Hogans
Bitter Springs     18                   13
Rainbow Lodge      16                   14
Kayenta            68                   46
Red Mesa           9                    32
Black Mesa         11                   15
Canyon de Chelly   28                   47
Cedar Point        50                   17
Burnt Water        50                   18
Does this information indicate that the population mean number of inhabited houses is greater than that of hogans on the Navajo Reservation? Use a 5% level of significance. (a) What is the level of significance? State the null and alternate hypotheses. Will you use a left-tailed, right-tailed, or two-tailed test? (b) Check Requirements What sampling distribution will you use? What assumptions are you making? Compute the value of the sample test statistic. (c) Find (or estimate) the P-value. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.4.15
Do professional golfers play better in their first round? Let row B represent the score in the fourth (and final) round, and let row A represent the score in the first round of a professional golf tournament. A random sample of finalists in the British Open gave the following data for their first and last rounds in the tournament. B: Last 73 68 73 71 71 72 68 68 74 A: First 66 70 64 71 65 71 71 71 71 Do the data indicate that the population mean score on the last round is higher than that on the first? Use a 5% level of significance. (a) What is the level of significance? State the null and alternate hypotheses. Will you use a left-tailed, right-tailed, or two-tailed test? (b) Check Requirements What sampling distribution will you use? What assumptions are you making? Compute the value of the sample test statistic. (c) Find (or estimate) the P-value. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.4.19
The same experimental design discussed in Problem 20 was used to test rats trained to climb a sequence of short ladders. Times in seconds for eight rats to perform this task are shown in the following table. Rat: A B C D E F G H Time 1 Pellet: 12.5 13.7 11.4 12.1 11.0 10.4 14.6 12.3 Time 5 Pellets: 11.1 12.0 12.2 10.6 11.5 10.5 12.9 11.0 (a) What is the level of significance? State the null and alternate hypotheses. Will you use a left-tailed, right-tailed, or two-tailed test? (b) Check Requirements What sampling distribution will you use? What assumptions are you making? Compute the value of the sample test statistic. (c) Find (or estimate) the P-value. Sketch the sampling distribution and show the area corresponding to the P-value. (d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α? (e) Interpret your conclusion in the context of the application.
8.4.21
When testing the difference of means for paired data, what is the null hypothesis?
8.4.3
For a random sample of 36 data pairs, the sample mean of the differences was 0.8. The sample standard deviation of the differences was 2. At the 5% level of significance, test the claim that the population mean of the differences is different from 0. (a) Check Requirements Is it appropriate to use a Student's t distribution for the sample test statistic? Explain. What degrees of freedom are used? (b) State the hypotheses. (c) Compute the sample test statistic. (d) Estimate the P-value of the sample test statistic. (e) Do we reject or fail to reject the null hypothesis? Explain. (f) Interpretation What do your results tell you?
8.4.7
When drawing a scatter diagram, along which axis is the explanatory variable placed? Along which axis is the response variable placed?
9.1.1
Trevor conducted a study and found that the correlation between the price of a gallon of gasoline and gasoline consumption has a linear correlation coefficient of -0.7. What does this result say about the relationship between price of gasoline and consumption? The study included gasoline prices ranging from $2.70 to $5.30 per gallon. Is it reliable to apply the results of this study to prices of gasoline higher than $5.30 per gallon? Explain.
9.1.11
Can a low barometer reading be used to predict maximum wind speed of an approaching tropical cyclone? Data for this problem are based on information taken from Weatherwise (Vol. 46, No. 1), a publication of the American Meteorological Society. For a random sample of tropical cyclones, let x be the lowest pressure (in millibars) as a cyclone approaches, and let y be the maximum wind speed (in miles per hour) of the cyclone. x: 1004 975 992 935 985 932 y: 40 100 65 145 80 150 (a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or strong? positive or negative? (c) Use a calculator to verify that Σx = 5823, Σx² = 5,655,779, Σy = 580, Σy² = 65,750, and Σxy = 556,315. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.
9.1.15
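The correlation coefficient for the cyclone problem above can be computed directly from the given sums. This sketch assumes the standard computation formula r = (nΣxy - ΣxΣy) / (sqrt(nΣx² - (Σx)²) · sqrt(nΣy² - (Σy)²)); the variable names are illustrative.

```python
from math import sqrt

n = 6                                   # number of cyclones in the sample
sum_x, sum_x2 = 5823, 5_655_779         # Σx and Σx² as given in the problem
sum_y, sum_y2 = 580, 65_750             # Σy and Σy²
sum_xy = 556_315                        # Σxy

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator
print(f"r = {r:.3f}")
```

A value of r close to -1 indicates a strong negative linear correlation, so y tends to decrease as x increases.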
In baseball, is there a linear correlation between batting average and home run percentage? Let x represent the batting average of a professional baseball player, and let y represent the player's home run percentage (number of home runs per 100 times at bat). A random sample of n = 7 professional baseball players gave the following information. x: .243 .259 .286 .263 .268 .339 .299 y: 1.4 3.6 5.5 3.8 3.5 7.3 5.0 (a) Make a scatter diagram and draw the line you think best fits the data. (b) Would you say the correlation is low, moderate, or high? positive or negative? (c) Use a calculator to verify that Σx = 1.957, Σx² = 0.553, Σy = 30.1, Σy² = 150.15, and Σxy = 8.753. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.
9.1.17
Suppose two variables are negatively correlated. Does the response variable increase or decrease as the explanatory variable increases?
9.1.3
Over the past 50 years, there has been a strong negative correlation between average annual income and the record time to run 1 mile. In other words, average annual incomes have been rising while the record time to run 1 mile has been decreasing. (a) Do you think increasing incomes cause decreasing times to run the mile? Explain. (b) What lurking variables might be causing the change in one or both of the variables? Explain.
9.1.9
In the least-squares line ŷ = 5 - 2x, what is the value of the slope? When x changes by 1 unit, by how much does ŷ change?
9.2.1
Data for this problem are based on information taken from The Wall Street Journal. Let x be the age in years of a licensed automobile driver. Let y be the percentage of all fatal accidents (for a given age) due to speeding. For example, the first data pair indicates that 36% of all fatal accidents involving 17-year-olds are due to speeding. x: 17 27 37 47 57 67 77 y: 36 25 20 12 10 7 5 Complete parts (a) through (e), given Σx = 329, Σy = 115, Σx² = 18,263, Σy² = 2639, Σxy = 4015, and r = -0.959. (a) Draw a scatter diagram displaying the data. (b) Verify the given sums Σx, Σy, Σx², Σy², and Σxy and the value of the sample correlation coefficient r. (c) Find x̅, ȳ, a, and b. Then find the equation of the least-squares line ŷ = a + bx. (d) Graph the least-squares line on your scatter diagram. Be sure to use the point (x̅, ȳ) as one of the points on the line. (e) Find the value of the coefficient of determination r². What percentage of the variation in y can be explained by the corresponding variation in x and the least-squares line? What percentage is unexplained? (f) Predict the percentage of all fatal accidents due to speeding for 25-year-olds.
9.2.11
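The least-squares steps for the driver-age problem above can be sketched from the given sums. This assumes the usual formulas b = (nΣxy - ΣxΣy)/(nΣx² - (Σx)²) and a = ȳ - b·x̅; the variable names are illustrative.

```python
n = 7
sum_x, sum_y = 329, 115          # Σx and Σy as given in the problem
sum_x2, sum_xy = 18_263, 4_015   # Σx² and Σxy
r = -0.959                       # sample correlation coefficient as given

x_bar = sum_x / n
y_bar = sum_y / n

# Slope and intercept of the least-squares line y_hat = a + b*x
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = y_bar - b * x_bar

r_squared = r ** 2               # coefficient of determination
prediction_25 = a + b * 25       # predicted % of fatal accidents at age 25
print(f"y_hat = {a:.3f} + ({b:.3f})x, r^2 = {r_squared:.3f}, y_hat(25) = {prediction_25:.2f}")
```

The other least-squares problems in this set use the same steps with their own sums.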
The following data are based on information from the book Life in America's Small Cities. Let x be the percentage of 16- to 19-year-olds not in school and not high school graduates. Let y be the reported violent crimes per 1000 residents. Six small cities in Arkansas (Blytheville, El Dorado, Hot Springs, Jonesboro, Rogers, and Russellville) reported the following information about x and y: x: 24.2 19.0 18.2 14.9 19.0 17.5 y: 13.0 4.4 9.3 1.3 0.8 3.6 Complete parts (a) through (e), given Σx = 112.8, Σy = 32.4, Σx² = 2167.14, Σy² = 290.14, Σxy = 665.03, and r = 0.764. (a) Draw a scatter diagram displaying the data. (b) Verify the given sums Σx, Σy, Σx², Σy², and Σxy and the value of the sample correlation coefficient r. (c) Find x̅, ȳ, a, and b. Then find the equation of the least-squares line ŷ = a + bx. (d) Graph the least-squares line on your scatter diagram. Be sure to use the point (x̅, ȳ) as one of the points on the line. (e) Find the value of the coefficient of determination r². What percentage of the variation in y can be explained by the corresponding variation in x and the least-squares line? What percentage is unexplained? (f) If the percentage of 16- to 19-year-olds not in school and not graduates reaches 24% in a similar city, what is the predicted rate of violent crimes per 1000 residents?
9.2.15
We use the form ŷ = a + bx for the least-squares line. In some computer printouts, the least-squares equation is not given directly. Instead, the value of the constant a is given, and the coefficient b of the explanatory or predictor variable is displayed. Sometimes a is referred to as the constant, and sometimes as the intercept. Data from Climatology Report No. 77-3 of the Department of Atmospheric Science, Colorado State University, showed the following relationship between elevation (in thousands of feet) and average number of frost-free days per year in Colorado locations. A Minitab printout provides the following.
Predictor   Coef      SE Coef   T       P
Constant    318.16    28.31     11.24   0.002
Elevation   -30.878   3.511     -8.79   0.003
S = 11.8603   R-Sq = 96.3%
Notice that "Elevation" is listed under "Predictor." This means that elevation is the explanatory variable x. Its coefficient is the slope b. "Constant" refers to a in the equation ŷ = a + bx. (a) Use the printout to write the least-squares equation. (b) For each 1000-foot increase in elevation, how many fewer frost-free days are predicted? (c) The printout gives the value of the coefficient of determination r². What is the value of r? Be sure to give the correct sign for r based on the sign of b. (d) Interpretation What percentage of the variation in y can be explained by the corresponding variation in x and the least-squares line? What percentage is unexplained?
9.2.5
Do heavier cars really use more gasoline? Suppose a car is chosen at random. Let x be the weight of the car (in hundreds of pounds), and let y be the miles per gallon (mpg). The following information is based on data taken from Consumer Reports (Vol. 62, No. 4). x: 27 44 32 47 23 40 34 52 y: 30 19 24 13 29 17 21 14 Complete parts (a) through (e), given Σx = 299, Σy = 167, Σx² = 11,887, Σy² = 3773, Σxy = 5814, and r = -0.946. (a) Draw a scatter diagram displaying the data. (b) Verify the given sums Σx, Σy, Σx², Σy², and Σxy and the value of the sample correlation coefficient r. (c) Find x̅, ȳ, a, and b. Then find the equation of the least-squares line ŷ = a + bx. (d) Graph the least-squares line on your scatter diagram. Be sure to use the point (x̅, ȳ) as one of the points on the line. (e) Find the value of the coefficient of determination r². What percentage of the variation in y can be explained by the corresponding variation in x and the least-squares line? What percentage is unexplained? (f) Suppose a car weighs x = 38 (hundred pounds). What does the least-squares line forecast for y = miles per gallon?
9.2.9
Consider a data set with at least three data values. Suppose the highest value is increased by 10 and the lowest is decreased by 10. (a) Does the mean change? Explain (b) Does the median change? Explain (c) Is it possible for the mode to change? Explain.
3.1.9
For mallard ducks and Canada geese, what percentage of nests are successful (at least one offspring survives)? Studies in Montana, Illinois, Wyoming, Utah, and California gave the following percentages of successful nests. x: Percentage success for mallard duck nests: 56 85 52 13 39 y: Percentage success for Canada goose nests: 24 53 60 69 18 (a) Use a calculator to verify that Σx = 245, Σx² = 14,755, Σy = 224, and Σy² = 12,070. (b) Use the results of part (a) to compute the sample mean, variance, and standard deviation for x, the percent of successful mallard nests. (c) Use the results of part (a) to compute the sample mean, variance, and standard deviation for y, the percent of successful Canada goose nests.
3.2.19 A, B, C ONLY
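Parts (b) and (c) of the nest problem above can be checked with the computation formula for the sample variance, s² = (Σx² - (Σx)²/n)/(n - 1). The helper below is a hypothetical illustration, not part of the exam.

```python
from math import sqrt

def summary_from_sums(n, total, total_sq):
    """Return (mean, sample variance, sample standard deviation)
    from n, the sum of the values, and the sum of their squares."""
    mean = total / n
    variance = (total_sq - total ** 2 / n) / (n - 1)
    return mean, variance, sqrt(variance)

n = 5  # five study regions
x_mean, x_var, x_sd = summary_from_sums(n, 245, 14_755)   # mallard duck nests
y_mean, y_var, y_sd = summary_from_sums(n, 224, 12_070)   # Canada goose nests
print(f"x: mean = {x_mean:.1f}, s^2 = {x_var:.1f}, s = {x_sd:.2f}")
print(f"y: mean = {y_mean:.1f}, s^2 = {y_var:.1f}, s = {y_sd:.2f}")
```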
At Center Hospital there is some concern about the high turnover of nurses. A survey was done to determine how long (in months) nurses had been in their current positions. The responses (in months) of 20 nurses were 23 2 5 14 25 36 27 42 12 8 7 23 29 26 28 11 20 31 8 36 Make a box-and-whisker plot of the data. Find the interquartile range.
3.3.7
What percentage of the general U.S. population have bachelor's degrees? The Statistical Abstract of the United States, 120th Edition, gives the percentage of bachelor's degrees by state. For convenience, the data are sorted in increasing order. 17 18 18 18 19 20 20 20 21 21 21 21 22 22 22 22 22 22 23 23 24 24 24 24 24 24 24 24 25 26 26 26 26 26 26 27 27 27 27 27 28 28 29 31 31 32 32 34 35 38 (a) Make a box-and-whisker plot and find the interquartile range. (b) Illinois has a bachelor's degree percentage rate of about 26%. Into what quartile does this rate fall?
3.3.9
Consider a family with 3 children. Assume the probability that one child is a boy is 0.5 and the probability that one child is a girl is also 0.5, and that the events "boy" and "girl" are independent. (a) List the equally likely events for the gender of the 3 children, from oldest to youngest. (b) What is the probability that all 3 children are male? Notice that the complement of the event "all three children are male" is "at least one of the children is female." Use this information to compute the probability that at least one child is female.
4.1.11
Consider a binomial experiment with n = 7 trials where the probability of success on a single trial is p = 0.30. (a) Find P(r=0). (b) Find P (r≥1) by using the complement rule.
5.2.11
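The binomial probabilities in the n = 7, p = 0.30 problem above can be sketched with the binomial formula P(r) = C(n, r)·p^r·q^(n - r) and the complement rule. The function name is illustrative.

```python
from math import comb

def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 7, 0.30
p_zero = binomial_pmf(n, 0, p)      # P(r = 0)
p_at_least_one = 1 - p_zero         # complement rule: P(r >= 1) = 1 - P(r = 0)
print(f"P(r = 0) = {p_zero:.4f}, P(r >= 1) = {p_at_least_one:.4f}")
```

The same function applies to any other binomial setting by changing n, r, and p.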
Suppose you are a hospital manager and have been told that there is no need to worry that respirator monitoring equipment might fail because the probability any one monitor will fail is only 0.01. The hospital has 20 such monitors and they work independently. Should you be more concerned about the probability that exactly one of the 20 monitors fails, or that at least one fails? Explain.
5.2.5
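To compare the two events in the respirator-monitor question, the sketch below evaluates both probabilities under the stated assumptions (20 independent monitors, each failing with probability 0.01). The variable names are illustrative.

```python
from math import comb

n, p = 20, 0.01

# Exactly one failure: choose which monitor fails, the other 19 work
p_exactly_one = comb(n, 1) * p * (1 - p)**(n - 1)

# At least one failure: complement of "no monitor fails"
p_at_least_one = 1 - (1 - p)**n

print(round(p_exactly_one, 4), round(p_at_least_one, 4))
```

"At least one" includes "exactly one" as a special case, so its probability can never be the smaller of the two.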
According to the college registrar's office, 40% of students enrolled in an introductory statistics class this semester are freshmen, 25% are sophomores, 15% are juniors, and 20% are seniors. You want to determine the probability that in a random sample of five students enrolled in introductory statistics this semester, exactly two are freshmen. (a) Describe a trial. Can we model a trial as having only two outcomes? If so, what is success? What is failure? What is the probability of success? (b) We are sampling without replacement. If only 30 students are enrolled in introductory statistics this semester, is it appropriate to model 5 trials as independent, with the same probability of success on each trial? Explain. What other probability distribution would be more appropriate in this setting?
5.2.9
Assuming that the heights of college women are normally distributed with mean 65 inches and standard deviation 2.5 inches, answer the following questions. (a) What percentage of women are taller than 65 inches? (b) What percentage of women are shorter than 65 inches? (c) What percentage of women are between 62.5 inches and 67.5 inches? (d) What percentage of women are between 60 inches and 70 inches?
6.1.7
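The height percentages above can be read off the empirical rule (about 68% within one standard deviation and 95% within two). As a cross-check, this sketch computes the same areas from the normal curve with Python's `statistics.NormalDist`; the variable names are assumptions for illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=2.5)   # heights of college women, in inches

taller_than_mean = 1 - heights.cdf(65)                 # (a)
shorter_than_mean = heights.cdf(65)                    # (b)
within_one_sd = heights.cdf(67.5) - heights.cdf(62.5)  # (c) 62.5 to 67.5
within_two_sd = heights.cdf(70) - heights.cdf(60)      # (d) 60 to 70

for value in (taller_than_mean, shorter_than_mean, within_one_sd, within_two_sd):
    print(f"{100 * value:.1f}%")
```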
Sketch the areas under the standard normal curve over the indicated intervals and find the specified areas. (a) To the left of z = 0.45 (b) To the right of z = -1.22 (c) Between z = -2.18 and z = 1.3
6.2.17, 6.2.21, 6.2.25
Let z be a random variable with a standard normal distribution. Find the indicated probability, and shade the corresponding area under the standard normal curve. A) P(z ≤ -0.13) B) P(-1.20 ≤ z ≤ 2.64)
6.2.33, 6.2.41
A normal distribution has μ = 30 and σ = 5. (a) Find the z score corresponding to x = 25. (b) Find the z score corresponding to x = 42. (c) Find the raw score corresponding to z = -2. (d) Find the raw score corresponding to z = 1.3.
6.2.5
Find the z value described and sketch the area described: Find z such that 55% of the standard normal curve lies to the left of z.
6.3.17
What is the standard error of a sampling distribution?
6.5.1
Suppose x has a distribution with μ=15 and σ=14. (a) If a random sample of size n=49 is drawn, find μ of x̅, σ of x̅, and P(15 ≤ x̅ ≤ 17). (b) If a random sample of size n=64 is drawn, find μ of x̅, σ of x̅, and P(15 ≤ x̅ ≤ 17). (c) Why should you expect the probability of part (b) to be higher than that of part (a)? Hint: Consider the standard deviations in parts (a) and (b).
6.5.11
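A sketch of the x̅ calculation above, assuming the sample sizes are large enough for the central limit theorem to make x̅ approximately normal with mean μ and standard error σ/√n. The helper name `xbar_interval_prob` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def xbar_interval_prob(mu, sigma, n, lo, hi):
    """Return (standard error, P(lo <= xbar <= hi)) for samples of size n."""
    se = sigma / sqrt(n)              # standard deviation of xbar
    dist = NormalDist(mu, se)         # approximate distribution of xbar
    return se, dist.cdf(hi) - dist.cdf(lo)

se_49, p_49 = xbar_interval_prob(15, 14, 49, 15, 17)   # part (a)
se_64, p_64 = xbar_interval_prob(15, 14, 64, 15, 17)   # part (b)
print(se_49, round(p_49, 4))
print(se_64, round(p_64, 4))
```

The larger sample has the smaller standard error, so more of the x̅ distribution falls inside the same interval around the mean.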
Let x be a random variable that represents the level of glucose in the blood (milligrams per deciliter of blood) after a 12-hour fast. Assume that for people under 50 years old, x has a distribution that is approximately normal, with mean μ=85 and estimated standard deviation σ=25. A test result x < 40 is an indication of severe excess insulin, and medication is usually prescribed. (a) What is the probability that, on a single test, x < 40? (b) Suppose a doctor uses the average x̅ for two tests taken about a week apart. What can we say about the probability distribution of x̅? Hint: See Theorem 6.1. What is the probability that x̅ < 40? (c) Repeat part (b) for n=3 tests taken a week apart. (d) Repeat part (b) for n=5 tests taken a week apart. (e) Interpretation: Compare your answers to parts (a), (b), (c), and (d). Did the probabilities decrease as n increased? Explain what this might imply if you were a doctor or a nurse. If a patient had a test result of x̅ < 40 based on five tests, explain why either you are looking at an extremely rare event or (more likely) the person has a case of excess insulin.
6.5.15
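The comparison asked for in the glucose problem can be sketched as follows. Because x itself is assumed approximately normal, the average of n tests is treated as normal with standard deviation σ/√n even for small n; the dictionary name `probs` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, cutoff = 85, 25, 40   # glucose mean, std. dev., and alert level

probs = {}
for n in (1, 2, 3, 5):           # single test, then averages of 2, 3, 5 tests
    se = sigma / sqrt(n)         # standard deviation of the average
    probs[n] = NormalDist(mu, se).cdf(cutoff)   # P(average < 40)

for n, prob in probs.items():
    print(n, f"{prob:.5f}")
```

Averaging more tests shrinks the spread of the average, so a healthy person's average falling below 40 becomes progressively less likely.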
Numbers are often assigned to data that are categorical in nature. (a) Consider these number assignments for category items describing electronic ways of expressing personal opinions: 1 Twitter; 2 e-mail; 3 text message; 4 Facebook; 5 blog Are these numerical assignments at the ordinal data level or higher? Explain. (b) Consider these number assignments for category items describing usefulness of customer service: 1 not helpful; 2 somewhat helpful; 3 very helpful; 4 extremely helpful. Are these numerical assignments at the ordinal data level? Explain. What about at the interval level or higher? Explain.
1.1.5
Professor Gill is designing a multiple-choice test. There are to be 10 questions. Each question is to have five choices for answers. The choices are to be designated by the letters a, b, c, d, and e. Professor Gill wishes to use a random-number table to determine which letter choice should correspond to the correct answer for a question. Using the number correspondence 1 for a, 2 for b, 3 for c, 4 for d, and 5 for e, use a random-number table to determine the letter choice for the correct answer for each of the 10 questions.
1.2.17
In a random sample of 50 students from a large university, all the students were between 18 and 20 years old. Can we conclude that the entire population of students at the university is between 18 and 20 years old? Explain.
1.2.5
Greg took a random sample of size 100 from the population of current season ticket holders to State College men's basketball games. Then he took a random sample of size 100 from the population of current season ticket holders to State College women's basketball games. (a) What sampling technique (stratified, systematic, cluster, multistage, convenience, random) did Greg use to sample from the population of current season ticket holders to all State College basketball games played by either men or women? (b) Is it appropriate to pool the samples and claim to have a random sample of size 200 from the population of current season ticket holders to all State College home basketball games played by either men or women? Explain.
1.2.7
Which technique for gathering data (observational study or experiment) do you think was used in the following studies? (a) The Colorado Division of Wildlife netted and released 774 fish at Quincy Reservoir. There were 219 perch, 315 blue gill, 83 pike, and 157 rainbow trout. (b) The Colorado Division of Wildlife caught 41 bighorn sheep on Mt. Evans and gave each one an injection to prevent heartworm. A year later, 38 of these sheep did not have heartworm, while the other three did. (c) The Colorado Division of Wildlife imposed special fishing regulations on the Deckers section of the South Platte River. All trout under 15 inches had to be released. A study of trout before and after the regulation went into effect showed that the average length of a trout increased by 4.2 inches after the new regulation. (d) An ecology class used binoculars to watch 23 turtles at Lowell Ponds. It was found that 18 were box turtles and 5 were snapping turtles.
1.3.7
How long does it take to finish the 1161-mile Iditarod Dog Sled Race from Anchorage to Nome, Alaska (see Viewpoint)? Finish times (to the nearest hour) for 57 dogsled teams are shown below. 261 271 236 244 279 296 284 299 288 338 360 341 333 261 266 287 296 313 299 303 277 283 304 305 288 290 288 332 330 309 328 307 328 285 291 295 310 318 318 320 333 321 323 324 327 Use five classes. (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Categorize the basic distribution shape as uniform, mound-shaped symmetrical, bimodal, skewed left, or skewed right. (f) Draw an ogive.
2.1.15
Certain kinds of tumors tend to recur. The following data represent the lengths of time, in months, for a tumor to recur after chemotherapy. Use five classes. 19 18 50 1 14 45 38 40 27 20 17 1 21 22 54 46 25 49 59 39 43 39 5 9 38 18 54 59 46 50 29 12 19 36 43 41 10 50 41 25 19 39 (a) Find the class width. (b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, relative frequencies, and cumulative frequencies. (c) Draw a histogram. (d) Draw a relative-frequency histogram. (e) Categorize the basic distribution shape as uniform, mound-shaped symmetrical, bimodal, skewed left, or skewed right. (f) Draw an ogive.
2.1.17
Consider the numbers 2 3 4 5 5 (a) Compute the mode, median, and mean. (b) If the numbers represent codes for the colors of T-shirts ordered from a catalog, which average(s) would make sense? (c) If the numbers represent one-way mileages for trails to different lakes, which average(s) would make sense? (d) Suppose the numbers represent survey responses from 1 to 5, with 1 disagree strongly, 2 disagree, 3 agree, 4 agree strongly, and 5 agree very strongly. Which averages make sense?
3.1.13
In this problem, we explore the effect on the mean, median, and mode of multiplying each data value by the same number. Consider the data set 2, 2, 3, 6, 10. (a) Compute the mode, median, and mean. (b) Multiply each data value by 5. Compute the mode, median, and mean. (c) Compare the results of parts (a) and (b). In general, how do you think the mode, median, and mean are affected when each data value in a set is multiplied by the same constant? (d) Suppose you have information about average heights of a random sample of airplane passengers. The mode is 70 inches, the median is 68 inches, and the mean is 71 inches. To convert the data into centimeters, multiply each data value by 2.54. What are the values of the mode, median, and mean in centimeters?
3.1.17
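A short sketch of the scaling experiment above, using Python's `statistics` module. The helper name `averages` is hypothetical; the last lines apply the same idea to the inches-to-centimeters conversion in part (d).

```python
from statistics import mean, median, mode

def averages(values):
    """Return (mode, median, mean) of a data set."""
    return mode(values), median(values), mean(values)

data = [2, 2, 3, 6, 10]
scaled = [5 * x for x in data]        # every value multiplied by 5

print(averages(data))
print(averages(scaled))

# Part (d): each average scales by the same constant as the data
inches = {"mode": 70, "median": 68, "mean": 71}
centimeters = {name: round(value * 2.54, 2) for name, value in inches.items()}
print(centimeters)
```

Multiplying every data value by a constant multiplies the mode, median, and mean by that same constant.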
The Grand Canyon and the Colorado River are beautiful, rugged, and sometimes dangerous. Thomas Myers is a physician at the park clinic in Grand Canyon Village. Dr. Myers has recorded (for a 5-year period) the number of visitor injuries at different landing points for commercial boat trips down the Colorado River in both the Upper and Lower Grand Canyon: Upper Canyon: Number of Injuries per Landing Point Between North Canyon and Phantom Ranch 2 3 1 1 3 4 6 9 3 1 3 Lower Canyon: Number of Injuries per Landing Point Between Bright Angel and Lava Falls 8 1 1 0 6 7 2 14 3 0 1 13 2 1 (a) Compute the mean, median, and mode for injuries per landing point in the Upper Canyon. (b) Compute the mean, median, and mode for injuries per landing point in the Lower Canyon. (c) Compare the results of parts (a) and (b). (d) The Lower Canyon stretch had some extreme data values. Compute a 5% trimmed mean for this region, and compare this result to the mean for the Upper Canyon computed in part (a).
3.1.21
Each of the following data sets has a mean of x̅ = 10. (i) 8 9 10 11 12 (ii) 7 9 10 11 13 (iii) 7 8 10 12 13 (a) Without doing any computations, order the data sets according to increasing value of standard deviations. (b) Why do you expect the difference in standard deviations between data sets (i) and (ii) to be greater than the difference in standard deviations between data sets (ii) and (iii)? Hint: Consider how much the data in the respective sets differ from the mean.
3.2.9
Consider the following events for a driver selected at random from the general population: A = driver is under 25 years old; B = driver has received a speeding ticket. Translate each of the following phrases into symbols. (a) The probability the driver has received a speeding ticket and is under 25 years old (b) The probability a driver who is under 25 years old has received a speeding ticket (c) The probability a driver who has received a speeding ticket is 25 years old or older (d) The probability the driver is under 25 years old or has received a speeding ticket (e) The probability the driver has not received a speeding ticket or is under 25 years old
4.2.13
You roll two fair dice, a green one and a red one. (a) Are the outcomes on the dice independent? (b) Find P(5 on green die and 3 on red die). (c) Find P(3 on green die and 5 on red die). (d) Find P((5 on green die and 3 on red die) or (3 on green die and 5 on red die)).
4.2.17
This problem involves a deck of 52 playing cards. There are four suits of 13 cards each. The four suits are: hearts, diamonds, clubs, spades. The 26 cards included in hearts and diamonds are red in color. The 26 cards included in clubs and spades are black in color. The 13 cards in each suit are: 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King, and Ace. This means there are four Aces, four Kings, four Queens, four 10's, etc., down to four 2's in each deck. You draw two cards from a standard deck of 52 cards without replacing the first one before drawing the second. (a) Are the outcomes on the two cards independent? Why? (b) Find P(Ace on 1st card and King on 2nd). (c) Find P(King on 1st card and Ace on 2nd). (d) Find the probability of drawing an Ace and a King in either order.
4.2.21
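For the two-card problem above, the general multiplication rule P(A and B) = P(A) · P(B | A) can be sketched with exact fractions. The variable names are illustrative; the conditional probability 4/51 reflects that one card has already been removed from the deck.

```python
from fractions import Fraction

# Ace first (4 of 52), then King from the remaining 51 cards (4 Kings left)
p_ace_then_king = Fraction(4, 52) * Fraction(4, 51)

# King first, then Ace: same arithmetic with the roles swapped
p_king_then_ace = Fraction(4, 52) * Fraction(4, 51)

# The two orders are mutually exclusive, so their probabilities add
p_either_order = p_ace_then_king + p_king_then_ace

print(p_ace_then_king, p_king_then_ace, p_either_order)
```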
Wing Foot is a shoe franchise commonly found in shopping centers across the United States. Wing Foot knows that its stores will not show a profit unless they gross over $940,000 per year. Let A be the event that a new Wing Foot store grosses over $940,000 its first year. Let B be the event that a store grosses over $940,000 its second year. Wing Foot has an administrative policy of closing a new store if it does not show a profit in either of the first 2 years. The accounting office at Wing Foot provided the following information: 65% of all Wing Foot stores show a profit the first year; 71% of all Wing Foot stores show a profit the second year (this includes stores that did not show a profit the first year); however, 87% of Wing Foot stores that showed a profit the first year also showed a profit the second year. Compute the following: (a) P(A) (b) P(B) (c) P(B|A) (d) P(A and B) (e) P(A or B) (f) What is the probability that a new Wing Foot store will not be closed after 2 years? What is the probability that a new Wing Foot store will be closed after 2 years?
4.2.33
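The Wing Foot probabilities above follow from the multiplication and addition rules, sketched below. The last two lines assume one reading of part (f), namely that a store stays open when it shows a profit in at least one of its first two years; under a stricter reading of the closing policy the answer would differ.

```python
p_a = 0.65            # P(A): profit in year 1
p_b = 0.71            # P(B): profit in year 2
p_b_given_a = 0.87    # P(B | A): profit in year 2 given profit in year 1

p_a_and_b = p_a * p_b_given_a       # multiplication rule
p_a_or_b = p_a + p_b - p_a_and_b    # addition rule

# Assumed reading of (f): open if profitable in at least one of the two years
p_not_closed = p_a_or_b
p_closed = 1 - p_a_or_b

print(round(p_a_and_b, 4), round(p_a_or_b, 4), round(p_closed, 4))
```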
What is the income distribution of super shoppers? In the following table, income units are in thousands of dollars, and each interval goes up to but does not include the given high value. The midpoints are given to the nearest thousand dollars. Income range: 5-15, 15-25, 25-35, 35-45, 45-55, 55+. Midpoint x: 10, 20, 30, 40, 50, 60. Percent of super shoppers: 21%, 14%, 22%, 15%, 20%, 8%. (a) Using the income midpoints x and the percent of super shoppers, do we have a valid probability distribution? Explain. (b) Use a histogram to graph the probability distribution of part (a). (c) Compute the expected income μ of a super shopper. (d) Compute the standard deviation σ for the income of super shoppers.
5.1.11
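A sketch of the expected value and standard deviation for the super-shopper distribution above, using μ = Σ x·P(x) and σ = √(Σ (x − μ)²·P(x)). The list names are illustrative, and the 55+ class is represented by its given midpoint of 60.

```python
from math import sqrt

midpoints = [10, 20, 30, 40, 50, 60]              # income in thousands of dollars
probs = [0.21, 0.14, 0.22, 0.15, 0.20, 0.08]      # percentages as probabilities

total = sum(probs)                                # should be 1 for a valid distribution
mu = sum(x * p for x, p in zip(midpoints, probs))
variance = sum((x - mu) ** 2 * p for x, p in zip(midpoints, probs))
sigma = sqrt(variance)

print(round(total, 2), round(mu, 2), round(sigma, 2))
```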
The following data are based on information taken from Daily Creel Summary, published by the Paiute Indian Nation, Pyramid Lake, Nevada. Movie stars and U.S. presidents have fished Pyramid Lake. It is one of the best places in the lower 48 states to catch trophy cutthroat trout. In this table, x = number of fish caught in a 6-hour period. The percentage data are the percentages of fishermen who catch x fish in a 6-hour period while fishing from shore. x: 0, 1, 2, 3, 4 or more. Percentage: 44%, 36%, 15%, 4%, 1%. (a) Convert the percentages to probabilities and make a histogram of the probability distribution. (b) Find the probability that a fisherman selected at random fishing from shore catches one or more fish in a 6-hour period. (c) Find the probability that a fisherman selected at random fishing from shore catches two or more fish in a 6-hour period. (d) Compute μ, the expected value of the number of fish caught per fisherman in a 6-hour period (round 4 or more to 4). (e) Compute σ, the standard deviation of the number of fish caught per fisherman in a 6-hour period (round 4 or more to 4).
5.1.13
Let x be a random variable that represents the weights in kilograms (kg) of healthy adult female deer (does) in December in Mesa Verde National Park. Then x has a distribution that is approximately normal, with mean μ=63.0 kg and standard deviation σ=7.1 kg. Suppose a doe that weighs less than 54 kg is considered undernourished. (a) What is the probability that a single doe captured (weighed and released) at random in December is undernourished? (b) If the park has about 2200 does, what number do you expect to be undernourished in December? (c) To estimate the health of the December doe population, park rangers use the rule that the average weight of n=50 does should be more than 60 kg. If the average weight is less than 60 kg, it is thought that the entire population of does might be undernourished. What is the probability that the average weight x̅ for a random sample of 50 does is less than 60 kg (assume a healthy population)? (d) Interpretation: Compute the probability that x̅ < 64.2 kg for 50 does (assume a healthy population). Suppose park rangers captured, weighed, and released 50 does in December, and the average weight was x̅ = 64.2 kg. Do you think the doe population is undernourished or not? Explain.
6.5.17
Suppose x has a distribution with a mean of 8 and a standard deviation of 16. Random samples of size n=64 are drawn. (a) Describe the x̅ distribution and compute the mean and standard deviation of the distribution. (b) Find the z value corresponding to x̅=9. (c) Find P(x̅ >9). (d) Interpretation: Would it be unusual for a random sample of size 64 from the x distribution to have a sample mean greater than 9? Explain.
6.5.5
Check that it is appropriate to use the normal approximation to the binomial. Then use the normal distribution to estimate the requested probabilities. The Denver Post stated that 80% of all new products introduced in grocery stores fail (are taken off the market) within 2 years. If a grocery store chain introduces 66 new products, what is the probability that within 2 years (a) 47 or more fail? (b) 58 or fewer fail? (c) 15 or more succeed? (d) fewer than 10 succeed?
6.6.11
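The grocery-product problem above can be sketched with the normal approximation and a continuity correction of 0.5. The sketch checks the usual rule of thumb (np > 5 and nq > 5) and rewrites the "succeed" questions in terms of failures, since successes = 66 − failures. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 66, 0.80                # 66 products, each fails with probability 0.80
q = 1 - p

appropriate = n * p > 5 and n * q > 5          # rule-of-thumb check
failures = NormalDist(n * p, sqrt(n * p * q))  # approximates the failure count

p_a = 1 - failures.cdf(46.5)   # (a) 47 or more fail
p_b = failures.cdf(58.5)       # (b) 58 or fewer fail
p_c = failures.cdf(51.5)       # (c) 15 or more succeed = 51 or fewer fail
p_d = 1 - failures.cdf(56.5)   # (d) fewer than 10 succeed = 57 or more fail

print(appropriate, round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

Table-based answers may differ slightly in the last digits because z values are rounded before lookup.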
You need to compute the probability of 5 or fewer successes for a binomial experiment with 10 trials. The probability of success on a single trial is 0.43. Since this probability of success is not in the table, you decide to use the normal approximation to the binomial. Is this an appropriate strategy? Explain.
6.6.5
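For the last question above, the rule-of-thumb check and an exact alternative can be sketched as follows. This assumes the criterion np > 5 and nq > 5 for the normal approximation; when it fails, the binomial formula can be summed directly instead of relying on a table.

```python
from math import comb

n, p = 10, 0.43
q = 1 - p

# Rule-of-thumb check: both np and nq should exceed 5
np_value, nq_value = n * p, n * q
normal_ok = np_value > 5 and nq_value > 5

# Exact binomial probability of 5 or fewer successes
p_five_or_fewer = sum(comb(n, r) * p**r * q**(n - r) for r in range(6))

print(np_value, nq_value, normal_ok)
print(p_five_or_fewer)
```

Here np is below 5, so the check fails and the exact sum is the safer route.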