Probability & Statistics #5
The housing market has recovered slowly from the economic crisis of 2008. Recently, in one large community, realtors randomly sampled 47 bids from potential buyers to estimate the average loss in home value. The sample showed the average loss was $9177 with a standard deviation of $1408. a) What assumptions and conditions must be checked before finding a confidence interval? How would one check them? b) Find a 99% confidence interval for the mean loss in value per home.
A. The data are assumed to be independent and from a Normal population. Check the independence assumption with the Randomization Condition. Check the Normal population assumption with the Nearly Normal Condition using a histogram. B. ($8625, $9729)
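The interval in part B can be reproduced with a short sketch. Python's standard library has no Student's t quantile, so the helpers below (`t_pdf`, `t_cdf`, `t_critical`, names introduced here for illustration only) approximate it by numerically integrating the density and bisecting. In practice a t-table or statistical software would supply t* directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on the density between 0 and |x|."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, mean, sd = 47, 9177, 1408          # sample values from the question
se = sd / math.sqrt(n)                # standard error of the mean
t_star = t_critical(0.99, n - 1)      # 99% confidence, df = 46
low, high = mean - t_star * se, mean + t_star * se
print(f"t* = {t_star:.3f}, interval = (${low:.0f}, ${high:.0f})")
```

The printed endpoints should agree with answer B after rounding to whole dollars.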
As the sample size increases, the mean of the sampling distribution
Stays the same
As the sample size increases, the standard deviation of the sampling distribution
Gets smaller
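A small simulation can illustrate both of the previous two answers. This is only a sketch: the population (an exponential distribution with mean 1 and SD 1), the seed, and the sample sizes are assumptions chosen for illustration, not values from the cards.

```python
import random
import statistics

random.seed(5)  # fixed seed so the run is repeatable

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n from a skewed population (mean 1, SD 1)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

results = {}
for n in (5, 20, 80):
    means = sample_means(n)
    # Center and spread of the simulated sampling distribution
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:3d}  mean of means={results[n][0]:.3f}  "
          f"SD of means={results[n][1]:.3f}  theory={1 / n ** 0.5:.3f}")
```

The mean of the sample means should stay near 1 for every n, while their SD should shrink roughly like 1/√n.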
Pat ran an experiment to test optimum power and time settings for microwave popcorn. His goal was to deliver popcorn with fewer than 13% of the kernels left unpopped, on average. He determined that power 9 at 4 minutes was the best combination. To be sure that the method was successful, he popped 8 more bags of popcorn (selected at random) at this setting. All were of high quality, with the percentages of unpopped kernels shown below. 8.9, 8.1, 8.6, 8.3, 8.4, 7.7, 11.9, 12.5 a) Choose the correct null and alternative hypotheses. b) Calculate the test statistic. c) Calculate the P-value. d) Does this provide evidence that Pat met his goal?
A. H0: μ = 13 vs. HA: μ < 13 B. t = -5.716 C. P-value = 0.0004 D. Yes, there is enough evidence to suggest that less than 13% of the kernels are left unpopped when the specified power and time settings are used.
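Parts B and C can be checked with a sketch like the following. The standard library has no t distribution, so `t_pdf` and `t_cdf` are illustrative helpers (not from the source) that integrate the density numerically. The lower-tail probability is used because HA is μ < 13.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on the density between 0 and |x|."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

unpopped = [8.9, 8.1, 8.6, 8.3, 8.4, 7.7, 11.9, 12.5]  # percentages from the question
mu0 = 13                                   # hypothesized mean under H0
n = len(unpopped)
xbar = statistics.fmean(unpopped)          # sample mean
s = statistics.stdev(unpopped)             # sample SD (n - 1 denominator)
t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_value = t_cdf(t_stat, n - 1)             # lower tail, since HA: mu < 13
print(f"mean = {xbar:.2f}, s = {s:.4f}, t = {t_stat:.3f}, P-value = {p_value:.4f}")
```

The results should match answers B and C up to rounding in the last digit.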
For each of the situations in parts a) through d), define the parameter (proportion or mean) and write the null and alternative hypotheses in terms of parameter values. Example: We want to know if the proportion of up days in the stock market is 50%. Answer: Let p=the proportion of up days. H0: p=0.5 vs. HA: p≠0.5. a) A casino wants to know if their slot machine really delivers the 5 in 100 win rate that it claims. b) Last year, customers spent an average of $32.98 per visit to the company's website. Based on a random sample of purchases this year, the company wants to know if the mean this year has decreased. c) A pharmaceutical company wonders if their new drug has a cure rate different from the 40% reported by the placebo. d) A bank wants to know if the percentage of customers using their website has increased from the 20% that used it before their system crashed last week.
A. Let p=the proportion of wins in the casino's slot machine. H0:p=0.05 vs. HA: p ≠ 0.05 B. Let μ=the mean of customer spending per visit to the website. H0: μ = 32.98 vs. HA: μ < 32.98 C. Let p = proportion of patients cured by the new drug. H0: p = 0.40 vs. HA: p ≠ 0.40 . D. Let p = the proportion of customers using the bank website. H0: p = 0.20 vs. HA: p > 0.20
Livestock are given a special feed supplement to see if it will promote weight gain. Researchers report that the 77 cows studied gained an average of 56 pounds, and that a 95% confidence interval for the mean weight gain this supplement produces has a margin of error of ±11 pounds. Some students wrote the following conclusions. Did anyone interpret the interval correctly? Explain any misinterpretations. a) 95% of the cows studied gained between 45 and 67 pounds. Is this a correct interpretation of the confidence interval? Choose the correct answer below. b) One is 95% sure that a cow fed this supplement will gain between 45 and 67 pounds. Is this a correct interpretation of the confidence interval? Choose the correct answer below. c) One is 95% sure that the average weight gain among the cows in this study was between 45 and 67 pounds. Is this a correct interpretation of the confidence interval? Choose the correct answer below. d) The average weight gain of cows fed this supplement will be between 45 and 67 pounds 95% of the time. Is this a correct interpretation of the confidence interval? Choose the correct answer below. e) If this supplement is tested on another sample of cows, there is a 95% chance that their average weight gain will be between 45 and 67 pounds. Is this a correct interpretation of the confidence interval? Choose the correct answer below.
A. The interpretation is incorrect. The confidence interval is for the population mean, not the individual cows in the study. B. The interpretation is incorrect. The confidence interval is not for individual cows. C. The interpretation is incorrect. One knows the average weight gain in this study was 56 pounds. D. The interpretation is incorrect. The average weight gain of all cows does not vary. It is a constant value which the confidence interval estimates. E. The interpretation is incorrect. There is a 95% chance that another sample will have its average weight gain within two standard deviations of the true mean.
Before lending someone money, banks must decide whether they believe the applicant will repay the loan. One strategy used is a point system. Loan officers assess information about the applicant, totaling points they award for the person's income level, credit history, current debt burden, and so on. The higher the point total, the more convinced the bank is that it's safe to make the loan. Any applicant with a lower point total than a certain cutoff score is denied a loan. We can think of this decision as a hypothesis test. Since the bank makes its profit from the interest collected on repaid loans, their null hypothesis is that the applicant will repay the loan and therefore should get the money. Only if the person's score falls below the minimum cutoff will the bank reject the null and deny the loan. a) When a person defaults on a loan, which type of error did the bank make? b) Which kind of error is it when the bank misses an opportunity to make a loan to someone who would have repaid it? c) Suppose the bank decides to lower the cutoff score from 250 points to 200. Is that analogous to choosing a higher or lower value of α for a hypothesis test? d) What impact does this change in the cutoff value have on the chance of each type of error?
A. Type II error B. Type I error C. lower alpha level D. Decreased Type I, increased Type II.
Which of the following statements is true about the family of t distributions?
As the degrees of freedom increase, the t distributions approach the Normal distribution; t distributions have fatter tails and narrower centers than Normal models; t distributions are symmetric and unimodal.
Which of the following are mistakes that can be made in a hypothesis test? I. H0 is true, and we reject it. II. H0 is true, and we fail to reject it. III. H0 is false, and we fail to reject it.
I and III
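The three cases can be laid out as a tiny lookup. This is a sketch only, and the function name `classify` is made up for illustration.

```python
def classify(h0_is_true, rejected_h0):
    """Name the outcome of a test given the truth and the decision."""
    if h0_is_true and rejected_h0:
        return "Type I error"
    if not h0_is_true and not rejected_h0:
        return "Type II error"
    return "correct decision"

cases = {
    "I":   classify(h0_is_true=True,  rejected_h0=True),
    "II":  classify(h0_is_true=True,  rejected_h0=False),
    "III": classify(h0_is_true=False, rejected_h0=False),
}
for label, outcome in cases.items():
    print(label, "->", outcome)
```

Only cases I and III come back as errors, which is why II is excluded from the answer.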
A medical researcher tested a new treatment for poison ivy against the traditional ointment. He concluded that the new treatment is more effective. Explain what the P-value of 0.065 means in this context.
If there is no difference in effectiveness, the chance of seeing an observed difference this large or larger is 6.5% by natural sampling variation.
If the P-value is smaller than the level of significance, what conclusion should we reach?
Reject the null hypothesis
Describe how the shape, center, and spread of t-models change as the number of degrees of freedom increases.
Shape becomes closer to Normal, center does not change, spread becomes narrower.
On a final project in an introductory statistics class, a student reports a 95% confidence interval for the average cost of a haircut to be ($5.50,$65.00). What is the correct interpretation of this confidence interval?
There is 95% confidence that the population mean is between these two numbers.
Using the t-tables, software, or a calculator, estimate the critical value of t for the given confidence interval and degrees of freedom. 80% confidence interval with df = 11
1.363
Suppose we are making a 95% confidence interval for the population mean from a sample of size 15. What number of degrees of freedom should we use?
14
Using t-tables, software, or a calculator, estimate the critical value of t for a 99% confidence interval with df = 24. Round to three decimal places as needed.
2.797
A college's data about the incoming freshmen indicates that the mean of their high school GPAs was 3.5, with a standard deviation of 0.20; the distribution was roughly mound-shaped and only slightly skewed. The students are randomly assigned to freshman writing seminars in groups of 25. What might the mean GPA of one of these seminar groups be? Describe the appropriate sampling distribution model, including shape, center, and spread, with attention to assumptions and conditions. Make a sketch using the 68-95-99.7 Rule. a) Describe the appropriate sampling distribution model, including shape, center, and spread. b) What assumptions and conditions must be satisfied for the sampling distribution model to be appropriate? Select all that apply. c) Make a sketch using the 68-95-99.7 Rule. Choose the correct graph below.
A. N(3.5, 0.04) B. Individuals' GPAs are independent; the students represent less than 10% of all possible students; the distribution of GPAs is roughly unimodal and symmetric, so the sample is large enough. C. Choose the graph whose Mean GPA axis runs from 3.38 to 3.62. Do not pick the graphs labeled "-3 to 3" or "2.9 to 4.1".
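The model in part A and the axis range in part C follow from SD(ȳ) = σ/√n. A minimal sketch of that arithmetic, using only the values given in the question:

```python
import math

mu, sigma, n = 3.5, 0.20, 25       # population mean, SD, and seminar size
se = sigma / math.sqrt(n)          # SD of the sampling distribution of the mean

bands = {}
for k, pct in ((1, 68), (2, 95), (3, 99.7)):
    # Interval within k standard deviations of the center
    bands[k] = (round(mu - k * se, 2), round(mu + k * se, 2))
    print(f"about {pct}% of group means between {bands[k][0]} and {bands[k][1]}")
```

The widest band, three standard deviations either side of 3.5, gives the 3.38 to 3.62 range used to pick the graph.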
The distribution of scores on a test for a particular class is skewed to the right. The professor wants to predict the maximum score and understand the distribution of the sample maximum. She simulates the distribution of the maximum of the test for 36 different tests (with n = 5). The histogram to the right shows a simulated sampling distribution of the sample maximum from these tests. a) Would a Normal model be a useful model for this sampling distribution? Explain. b) The mean of this distribution is 46.9 and the SD is 3.5. Would you expect about 68% of the samples to have their maximums within 3.5 of 46.9? Why or why not?
A. No. The sampling distribution of the maximum is skewed to the right, so a Normal model would not be useful for this sampling distribution. B. No. The 68-95-99.7 Rule is based on the Normal distribution.
Data on the fuel economy of several 2010 model vehicles are given in the accompanying table. a) Find and interpret a 95% confidence interval for the gas mileage of 2010 vehicles. Select the correct choice below and fill in the answer boxes within your choice. (Round to two decimal places as needed. Use ascending order.) b) Does this confidence interval capture the mean gas mileage for all 2010 vehicles? Choose the correct answer below.
A. One is 95% confident that the true mean gas mileage for cars like the ones in the sample is between 28.15 mpg and 34.25 mpg. B. Without knowing how the data were selected, one must be cautious about generalizing to all 2010 cars.
For each of the situations in parts a through d below, state whether a Type I, a Type II, or neither error has been made. Explain briefly. a) A bank wants to know if the enrollment on their website is above 30% based on a small sample of customers. They test H0: p=0.3 vs. HA: p>0.3 and reject the null hypothesis. Later they find out that actually 28% of all customers enrolled. Choose the correct answer below. b) A student tests 100 students to determine whether other students on her campus prefer soda brand A or soda brand B and finds no evidence that preference for brand A is not 0.5. Later, a marketing company tests all students on campus and finds no difference. Choose the correct answer below. c) A human resource analyst wants to know if the applicants this year score, on average, higher on their placement exam than the 52.5 points the candidates averaged last year. She samples 50 recent tests and finds the average to be 54.1 points. She fails to reject the null hypothesis that the mean is 52.5 points. At the end of the year, they find that the candidates this year had a mean of 55.3 points. Choose the correct answer below. d) A pharmaceutical company tests whether a drug lifts the headache relief rate from the 25% achieved by the placebo. They fail to reject the null hypothesis because the P-value is 0.465. Further testing shows that the drug actually relieves headaches in 38% of people. Choose the correct answer below.
A. The bank made a Type I error. The actual value is not greater than 0.3 but they rejected the null hypothesis. B. The student did not make an error. The actual value is 0.50, which was not rejected. C. The analyst made a Type II error. The actual value was 55.3 points, which is greater than 52.5. D. The company made a Type II error. The null hypothesis was not rejected, but it was false. The true relief rate was greater than 0.25.
A study measured the waist size of 975 men, finding a mean of 36.24 inches and a standard deviation of 4.09 inches. A histogram of these measurements is shown to the right. a) Describe the histogram of the waist sizes. b) To explore variation of the mean from sample to sample, they simulated by drawing many samples of size 2, 5, 10, and 20 with replacement, from the 975 measurements. The histograms for each simulation are shown in the accompanying table. Explain how these histograms demonstrate what the Central Limit Theorem says about the sampling distribution model for sample means.
A. The histogram is skewed to the right. B. These simulations appear to demonstrate what the Central Limit Theorem says about the sampling distribution model for sample means. All of the histograms are centered near 36 inches. As n gets larger, the histograms approach the Normal shape, and the variability in the sample means decreases. The histograms are fairly Normal by the time the sample reaches size 5.
A clean air standard requires that vehicle exhaust emissions not exceed specified limits for various pollutants. Many states require that cars be tested annually to be sure they meet these standards. Suppose state regulators double-check a random sample of cars that a suspect repair shop has certified as okay. They will revoke the shop's license if they find significant evidence that the shop is certifying vehicles that do not meet standards. Complete parts a through d below. a) In this context, what is a Type I error? b) In this context, what is a Type II error? c) Which type of error would the shop's owner consider more serious? d) Which type of error might environmentalists consider more serious?
A. The regulators revoke the shop's license when it is only certifying vehicles that meet standards. B. The regulators do not revoke the shop's license when it is certifying vehicles that do not meet standards. C. The shop's owner would consider a Type I error more serious because the owner does not want the shop's license revoked. D. Environmentalists would consider a Type II error more serious because they do not want faulty cars on the road polluting the environment.
A waiter believes the distribution of his tips has a model that is slightly skewed to the right, with a mean of $9.60 and a standard deviation of $5.40. a) Explain why you cannot determine the probability that a given party will tip him at least $20. Choose the correct answer below. b) Can you estimate the probability that the next 4 parties will tip an average of at least $15? Explain. c) Is it likely that his 10 parties today will tip an average of at least $15? Explain.
A. This distribution is skewed, meaning it is non-symmetric and does not meet the conditions for the Normal model, so the probabilities of values within this distribution cannot be determined. B. No. A sample of 4 parties is probably not large enough for the CLT to allow the use of a Normal model to estimate the distribution of averages. C. While a sample of 10 parties may not be large enough to use a Normal model, the sampling distribution of the average is likely starting to approach a Normal distribution. The standard deviation of this sampling distribution is about $1.71, meaning that an average tip of $15 is more than 3 standard deviations above the mean. Even if the distribution is slightly skewed, this is still unlikely.
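The figures in part C can be checked with a few lines. The Normal tail probability at the end is only a rough guide, since the answer itself notes that n = 10 may be too small for a Normal model.

```python
import math
from statistics import NormalDist

mu, sigma = 9.60, 5.40             # waiter's model for individual tips
n = 10                             # parties served today
se = sigma / math.sqrt(n)          # SD of the average tip across n parties
z = (15 - mu) / se                 # how many SDs a $15 average sits above the mean
tail = 1 - NormalDist().cdf(z)     # rough Normal-model estimate of P(average >= 15)
print(f"SD of average = ${se:.2f}, z = {z:.2f}, approx. tail probability = {tail:.4f}")
```

A z-score above 3 and a tail probability well under 1% are consistent with calling the outcome unlikely.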
Which of the following are true? If false, explain briefly. a) A very high P-value is strong evidence that the null hypothesis is false. b) A very low P-value proves that the null hypothesis is false. c) A high P-value shows that the null hypothesis is true. d) A P-value below 0.05 is always considered sufficient evidence to reject a null hypothesis.
A. This statement is false because it is a low P-value that provides evidence that the null hypothesis is false. B. This statement is false because a very low P-value only shows strong evidence that the null hypothesis is false. C. This statement is false because a high P-value shows that the data is consistent with the null hypothesis, but can never prove that the null hypothesis is true. D. This statement is false because the null hypothesis is rejected whenever the P-value is below the value of α, which may not necessarily be 0.05.
Using the t tables, software, or a calculator, estimate the values asked for. a) Find the critical value of t for a 95% confidence interval with df=15. b) Find the critical value of t for a 90% confidence interval with df=58.
A. t = 2.13. First find the right-tail area: (1 − 0.95)/2 = 0.025, so the cumulative probability is 1 − 0.025 = 0.975. Use www.ttable.org/student-t-value-calculator.html with Df: 15 and significance level: 0.975, then calculate t. B. t = 1.67. The right-tail area is (1 − 0.90)/2 = 0.050, so the cumulative probability is 1 − 0.050 = 0.950. Use www.ttable.org/student-t-value-calculator.html with Df: 58 and significance level: 0.950, then calculate t.
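The tail-area arithmetic can be sketched as follows. The standard library offers a Normal quantile but no t quantile, so this sketch does not compute t* itself. It compares the quoted answers with the Normal critical value z*, which t* must exceed because t distributions have fatter tails. The function name `upper_cumulative` is introduced here for illustration.

```python
from statistics import NormalDist

def upper_cumulative(confidence):
    """Cumulative probability to look up for a two-sided interval."""
    tail = (1 - confidence) / 2      # area in one tail
    return 1 - tail

t_from_source = {(0.95, 15): 2.13, (0.90, 58): 1.67}   # answers quoted above
checks = {}
for (conf, df), t_star in t_from_source.items():
    p = upper_cumulative(conf)
    z_star = NormalDist().inv_cdf(p)   # Normal critical value, a lower bound for t*
    checks[(conf, df)] = (round(p, 3), round(z_star, 3), t_star > z_star)
    print(f"{conf:.0%} CI, df={df}: look up p={p:.3f}, z*={z_star:.3f}, t*={t_star}")
```

With df = 58 the t critical value sits only slightly above z*, reflecting how t-models approach the Normal as the degrees of freedom grow.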
Sam ran an experiment to test optimum power and time settings for microwave popcorn. His goal was to deliver popcorn with fewer than 8% of the kernels left unpopped, on average. He determined that power 9 at 4 minutes was the best combination. To be sure that the method was successful, he popped 8 more bags of popcorn (selected at random) at this setting. All were of high quality, with the percentages of unpopped kernels shown below. 3.7, 10.7, 4.6, 5.8, 6.1, 9.9, 12.3, 4.
H0: μ = 8 vs. HA: μ < 8. Hypothesized mean, μ0: 8. Sample mean, x̄: 7.25. Sample standard deviation, s: 3.2293. Sample size, n: 8. t = -0.657. P-value = 0.2661. No, there is not enough evidence to suggest that less than 8% of the kernels are left unpopped when the specified power and time settings are used.
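The test statistic can be recomputed from the summary values on this card alone. In the sketch below the P-value is taken from the card rather than computed, and the 0.05 significance level is an assumption, since the question does not state one.

```python
import math

mu0, xbar, s, n = 8, 7.25, 3.2293, 8   # summary values quoted on this card
se = s / math.sqrt(n)                  # standard error of the mean
t_stat = (xbar - mu0) / se             # one-sample t statistic, df = n - 1

p_value_from_source = 0.2661           # quoted above, not recomputed here
alpha = 0.05                           # assumed significance level
reject = p_value_from_source < alpha
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

Because the P-value is far above the assumed α, the test fails to reject H0, so the data do not show that the goal of fewer than 8% unpopped kernels was met.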
A sample of 20 CEOs shows total annual compensations ranging from a minimum of $0.1 to $62.72 million. The average for these 20 CEOs is $23.916 million. The histogram is shown to the right. Based on these data, a computer program found that a 95% confidence interval for the mean annual compensation of all CEOs is (−14.62,62.45) $M. Why should you be hesitant to trust this confidence interval?
The assumptions and conditions for a t-interval are not met. The distribution is too skewed for a sample size of only 20.