ERAU STAT 222
2.3.9 A researcher decides to set the significance level to 0.10. If the null hypothesis is true, what is the probability of a Type I error?
0.10
A researcher decides to set the significance level to 0.10. If the researcher chooses a smaller significance level, what will be the impact on the probability of a Type II error, assuming the alternative hypothesis is true? The Type II error rate will increase.
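A small simulation can make these two answers concrete. The sketch below uses a hypothetical setup that is not part of the exercise: normal data with a known standard deviation of 1, samples of n = 25, a one-sided z-test of Ha: μ > 0, and, when Ha is true, a true mean of 0.3. When H0 is true, the share of simulated studies that reject H0 should land near α; when Ha is true, lowering α from 0.10 to 0.05 should raise the Type II error rate.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mu, alpha, mu0=0.0, sigma=1.0, n=25, reps=10_000, seed=1):
    """Share of simulated studies whose one-sided z-test (Ha: mu > mu0) rejects H0."""
    rng = random.Random(seed)                   # fixed seed so results are reproducible
    se = sigma / math.sqrt(n)                   # standard error of the sample mean (sigma known)
    critical = NormalDist().inv_cdf(1 - alpha)  # z > critical is the same as p-value < alpha
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if z > critical:
            rejections += 1
    return rejections / reps


# H0 true: the rejection rate estimates the Type I error rate.
type1_at_10 = rejection_rate(true_mu=0.0, alpha=0.10)
# Ha true: 1 - rejection rate estimates the Type II error rate.
type2_at_10 = 1 - rejection_rate(true_mu=0.3, alpha=0.10)
type2_at_05 = 1 - rejection_rate(true_mu=0.3, alpha=0.05)

print(f"Type I error rate, alpha = 0.10: {type1_at_10:.3f}")
print(f"Type II error rate, alpha = 0.10: {type2_at_10:.3f}; alpha = 0.05: {type2_at_05:.3f}")
```

Both Type II runs reuse the same seed, so they see identical simulated samples and the comparison isolates the effect of changing α.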
2.1.6 When stating null and alternative hypotheses, the hypotheses are:
Always about the parameter only.
2.1.38 A zoologist at a large metropolitan zoo is concerned about a potential new disease present among the 243 sharks living in the large aquarium at the zoo. The zoologist takes a random sample of 15 sharks from the aquarium, temporarily removes the sharks from the tank, and tests them for the disease. He finds that 3 of the sharks have the disease. The zoologist wishes to test whether there is evidence that less than one-fourth of the sharks in the aquarium are diseased. Evaluate the strength of evidence for this hypothesis. Find the p-value for the hypothesis using a simulation-based approach.
Based on the p-value, evaluate the strength of evidence and state a conclusion about diseased sharks. There is little to no evidence that the proportion of diseased sharks in the aquarium is less than 0.25.
In which population, if any, are you comfortable drawing your conclusion? The sharks at the zoo, since the 15 sharks were a random sample from the 243 sharks in the aquarium.
Explain why a theory-based approach is or is not reasonable for these data. If a theory-based approach is reasonable, find the p-value. A theory-based approach is not reasonable since only 3 sharks had the disease; the validity conditions require at least 10 successes and at least 10 failures.
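No simulation-based p-value is recorded above. As a cross-check, the sketch below swaps in the exact binomial calculation that a simulation approximates, assuming a Binomial(15, 0.25) null model (sampling 15 of 243 sharks without replacement makes this an approximation). The lower-tail probability P(X ≤ 3) comes out at roughly 0.46, consistent with little to no evidence.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, diseased, pi0 = 15, 3, 0.25
# Ha: pi < 0.25, so the p-value is the lower tail P(X <= 3) under H0.
p_value = sum(binom_pmf(k, n, pi0) for k in range(diseased + 1))
print(f"sample proportion = {diseased / n:.2f}, exact p-value = {p_value:.4f}")
```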
2.3.22 Later in the book you will encounter hypotheses of the following type:
H0: Men and women do not differ on average with regard to the variable of interest.
Ha: Men and women do differ on average with regard to the variable of interest.
Describe what Type I error means in this context. We conclude that men and women differ on average when they actually do not differ.
Describe what Type II error means in this context. We conclude that it is plausible that men and women do not differ on average when they actually do differ.
2.3.12 Dogs have been domesticated for about 14,000 years. In that time, have they been able to develop an understanding of human gestures such as pointing or glancing? How about similar nonhuman cues? Researchers Udell, Giglio, and Wynne tested a small number of dogs in order to answer these questions. The researchers positioned the dogs about 2.5 meters from the experimenter. On each side of the experimenter were two cups. The experimenter would perform some sort of gesture (pointing, bowing, looking) toward one of the cups or there would be some other nonhuman gesture (a mechanical arm pointing, a doll pointing, or a stuffed animal looking) toward one of the cups. The researchers would then see whether the dog would go to the cup that was indicated. There were six dogs tested. We will look at one of the dogs in two of his sets of trials. This dog, a four-year-old mixed breed, was named Harley. Each trial involved one gesture and one pair of cups, with a total of 10 trials in a set. You want to investigate whether Harley the dog could select the correct cup more than 50% of the time in the long run.
Describe what a Type I error would be in this study. Harley is guessing, but we conclude that he is not guessing.
Describe what a Type II error would be in this study. Harley is not guessing, but we fail to conclude that he is not guessing.
1.5.13 Psychic abilities Statistician Jessica Utts has conducted extensive analysis of studies that have investigated psychic functioning. One type of study involves having one person (called the "sender") concentrate on an image while a person in another room (the "receiver") tries to determine which image is being "sent." The receiver is given four images to choose from, one of which is the actual image that the sender is concentrating on. This is a technique called Ganzfeld.
Describe what the symbol π stands for in this context. The long-run proportion of times that a person identifies the correct image.
What are the null and alternative hypotheses? Null: The long-run proportion of times that a person identifies the correct image is 0.25 (H0: π = 0.25). Alt: The long-run proportion of times that a person identifies the correct image is greater than 0.25 (Ha: π > 0.25).
If the subjects in these studies have no psychic ability, approximately what proportion will identify the correct image? Is this the null hypothesis or the alternative hypothesis? Approximately 0.25 will identify the correct image if they have no psychic ability, since there are four images to choose from. This is the null hypothesis.
Utts reported that Bem and Honorton found a total of 106 "hits" in the 329 sessions. Does this result provide very strong evidence against the null hypothesis? Use an applet to simulate 1,000 repetitions of this study, assuming the null hypothesis to be true to find and report a p-value. Based on this p-value, summarize a conclusion in the context of this study and explain your reasoning for having arrived at this conclusion. Be sure that you have used the 3S strategy for assessing the strength of evidence for the research conjecture. The observed proportion of hits is 106/329 ≈ 0.322. The p-value is approximately 0.002, since a simulated proportion of 0.322 or larger occurred in only 2 of the 1,000 repetitions under the null hypothesis. Thus, there is very strong evidence that the long-run proportion of correct guesses is larger than 0.25.
Now carry out the same analysis, only instead of simulating the null, use the theory-based (normal approximation) approach for that null distribution. How does the p-value you found using the theory-based approach compare to the one you found using simulation? Is this surprising? Why or why not? Using the theory-based inference applet (which uses the normal approximation for the null distribution) yields a standardized statistic of 3.02 and a p-value of 0.0012, again showing very strong evidence that the long-run proportion of correct guesses is larger than 0.25.
It is not surprising that the two approaches give similar results, since the sample size is large: there are 106 successful guesses and 223 unsuccessful guesses, both well above 10, so the validity conditions for the theory-based approach are met.
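The theory-based numbers can be reproduced by hand with the one-proportion z-test formula, z = (p̂ - π0) / sqrt(π0(1 - π0)/n). The sketch below is a plain-Python version of that calculation, not the applet itself; `NormalDist` from the standard library supplies the normal tail area.

```python
from math import sqrt
from statistics import NormalDist

hits, n, pi0 = 106, 329, 0.25

# Validity conditions: at least 10 successes and at least 10 failures.
assert hits >= 10 and n - hits >= 10

p_hat = hits / n
se = sqrt(pi0 * (1 - pi0) / n)     # SD of the null distribution of p-hat
z = (p_hat - pi0) / se             # standardized statistic
p_value = 1 - NormalDist().cdf(z)  # one-sided, Ha: pi > 0.25
print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```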
1.5.10 According to researchers, a coin flip may not have a 50% chance of landing heads and a 50% chance of landing tails. In fact, they believe that a coin is more likely to land the same way it started. So if it starts out heads up, it is more likely to land heads up. Suppose someone tests this hypothesis with 1,000 flips of a coin where it starts out heads up each time.
Describe what the symbol π stands for in this context. The symbol π represents the long-run proportion of times the coin lands heads up when it starts heads up.
What are the null and alternative hypotheses? Null: π = 0.5. Alternative: π > 0.5.
Suppose 52% of the sample of 1,000 flips landed heads facing up. Verify the validity conditions that allow us to use a theory-based test. If 52% of the 1,000 flips were heads, then 520 were heads and 480 were tails. Both counts are at least 10, so the validity conditions are met.
A theory-based test reports a standardized statistic of 1.26. What does this mean? The observed proportion of 0.52 is 1.26 standard deviations above 0.50, the mean of the null distribution.
A theory-based test reports a p-value of 0.1030. What would be your conclusion in terms of strength of evidence and what that means in the context of the study? We have little to no evidence that a coin is more likely to land with the same side up as it started.
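The standardized statistic and p-value quoted above follow from the same one-proportion z-test formula. A minimal standard-library sketch (not the applet's own code):

```python
from math import sqrt
from statistics import NormalDist

n, heads, pi0 = 1000, 520, 0.5
p_hat = heads / n
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)  # SDs above the null mean of 0.50
p_value = 1 - NormalDist().cdf(z)              # one-sided, Ha: pi > 0.5
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```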
2.3.2 Indicate whether or not you would reject the null hypothesis, at the α = 0.05 significance level, for the p-value = 0.078.
Fail to reject
Indicate whether or not you would reject the null hypothesis, at the α = 0.05 significance level, for the p-value = 0.045. Reject
Indicate whether or not you would reject the null hypothesis, at the α = 0.05 significance level, for the p-value = 0.001. Reject
Indicate whether or not you would reject the null hypothesis, at the α = 0.05 significance level, for the p-value = 0.051. Fail to reject
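All four decisions apply one rule: reject H0 when the p-value is less than or equal to α, otherwise fail to reject. A tiny sketch of that rule (the function name `decide` is just for illustration):

```python
def decide(p_value, alpha=0.05):
    """Reject H0 if the p-value is less than or equal to alpha; otherwise fail to reject."""
    return "Reject" if p_value <= alpha else "Fail to reject"


for p in (0.078, 0.045, 0.001, 0.051):
    print(f"p-value = {p}: {decide(p)}")
```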
2.1.30 Television news survey In order to understand more about how people in the U.S. feel about the outcome of a recent criminal trial in which the defendant was found not guilty, a television news program invites viewers to go to the news program's website and indicate their opinion about the event. At the end of the show 82% of the 562 people who voted in the poll indicated they were unhappy with the verdict. Evaluate the strength of evidence for the hypothesis that the proportion of U.S. adults opposed to the verdict is greater than 0.75.
Find the p-value for the hypothesis using a simulation-based approach. Based on the p-value, evaluate the strength of evidence and state a conclusion about the opinions of U.S. adults about the verdict. Taken at face value, the p-value gives very strong evidence that the proportion of U.S. adults opposed to the verdict is greater than 0.75; however, the next answer explains why this conclusion does not extend to all U.S. adults.
To which population, if any, are you comfortable drawing your conclusion? At most, to people who watch this program and are motivated to participate in its online poll. Because this is a voluntary-response sample, the results should not be generalized to all U.S. adults.
Explain why a theory-based approach is or is not reasonable for these data. If a theory-based approach is reasonable, find the p-value. A theory-based approach is reasonable because there are at least 10 successes (about 461 unhappy voters) and at least 10 failures (about 101) in the data; the p-value is approximately 0.0001.
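The validity check and the theory-based p-value can be sketched as below, assuming the success count is 82% of 562 rounded to the nearest whole voter (the exercise gives only the percentage). The calculation does not fix the sampling problem: a tiny p-value from a voluntary-response poll still does not generalize to all U.S. adults.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat, pi0 = 562, 0.82, 0.75
successes = round(p_hat * n)                # assumed count of unhappy voters (about 461)
failures = n - successes
valid = successes >= 10 and failures >= 10  # theory-based validity conditions

z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
p_value = 1 - NormalDist().cdf(z)           # one-sided, Ha: pi > 0.75
print(f"successes = {successes}, failures = {failures}, valid = {valid}")
print(f"z = {z:.2f}, p-value = {p_value:.5f}")
```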
2.1.31 Television news survey A television news program has been running a story on a recent criminal trial. The news program invites viewers to go to their website and take a survey. One of the questions asks participants to report the amount of time the respondent spent reading or watching the news coverage about the trial during the last three days. The poll found that, on average, respondents had spent 92 minutes reading or watching news coverage about the trial during the last three days.
Identify the variable measured on each respondent. The time spent reading or watching news coverage about the trial during the last three days.
Is the variable categorical or quantitative? Quantitative.
Identify two statistics that the news program could use to summarize the variable. Mean and median.
Identify one graph that the news program could use to summarize the variable. Dot plot.
2.1.13 In order to estimate the typical amount of TV watched per day by students at her school of 1,000 students, a student has all of the students in her statistics class (30 students) take a short survey. In the survey the student asked students whether or not they watched at least 10 minutes of TV yesterday. The student found that 21 of 30 students reported watching at least 10 minutes of TV yesterday.
Identify the variable measured on each student. Whether the student watched at least 10 minutes of TV yesterday.
Is the variable categorical or quantitative? Categorical.
Identify one statistic that the student could use to summarize the variable. Proportion.
Identify one graph that the student could use to summarize the variable. Bar graph.
2.2.16 Most dermatologists recommend using sunscreens that have a sun protection factor (SPF) of at least 30. One of the authors wanted to find out whether the SPF of sunscreens used by students at her school (which is in a very sunny part of the U.S.) exceeds this value, on average.
Identify the variable of interest and whether the variable is categorical or quantitative. SPF value; quantitative.
Describe the author's parameter of interest and assign an appropriate symbol to denote it. μ = the mean SPF of sunscreens used by students at her school.
Write the appropriate hypotheses using symbols. H0: μ = 30 versus Ha: μ > 30
2.2.19 Needles! Consider a manufacturing process that is producing hypodermic needles that will be used for blood donations. These needles need to have a diameter of 1.65 mm: too big and they would hurt the donor (even more than usual); too small and they would rupture the red blood cells, rendering the donated blood useless. Thus, the manufacturing process would have to be closely monitored to detect any significant departures from the desired diameter. During every shift, quality control personnel take a random sample of several needles and measure their diameters. If they discover a problem, they will stop the manufacturing process until it is corrected. For now, suppose that a "problem" is when the sample average diameter turns out to be statistically significantly different from the target of 1.65 mm.
Identify the variable of interest and whether the variable is categorical or quantitative. The diameter of a needle; quantitative.
Write the appropriate hypotheses using appropriate symbols to test whether the average diameter of needles from the manufacturing process is different from the desired value. H0: μ = 1.65 mm versus Ha: μ ≠ 1.65 mm, where μ is the mean diameter of needles produced by the process.
Suppose that the most recent random sample of 35 needles has an average diameter of 1.64 mm and a standard deviation of 0.07 mm. Assign appropriate symbols to these numbers. n = 35, x̄ = 1.64 mm, s = 0.07 mm
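The exercise stops at assigning symbols. As a possible next step that is not part of the original answer, those symbols plug into the one-sample t standardized statistic, t = (x̄ - μ0) / (s / sqrt(n)). The sketch below computes it in plain Python; turning t into a p-value would need a t distribution with n - 1 = 34 degrees of freedom (for example from SciPy), which is left out here.

```python
from math import sqrt

n, x_bar, s = 35, 1.64, 0.07  # sample size, sample mean (mm), sample SD (mm)
mu0 = 1.65                    # target diameter under H0 (mm)

se = s / sqrt(n)              # estimated standard error of x-bar
t = (x_bar - mu0) / se        # standardized statistic for a one-sample t-test
print(f"SE = {se:.4f} mm, t = {t:.2f} with {n - 1} degrees of freedom")
```

Here the sample mean sits only about 0.85 standard errors below the target of 1.65 mm.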
1.4.26 Healthy lungs Researchers wanted to test the hypothesis that living in the country is better for your lungs than living in a city. To eliminate the possible variation due to genetic differences, they located seven pairs of identical twins with one member of each twin living in the country, the other in a city. For each person, they measured the percentage of inhaled tracer particles remaining in the lungs after one hour: the higher the percentage, the less healthy the lungs. They found that for six of the seven twin pairs the one living in the country had healthier lungs.
Is the alternative hypothesis one-sided or two-sided? One-sided.
Based on the sample size and distance between the null value and the observed proportion, estimate the strength of evidence. Moderately strong.
Here are probabilities for the number of heads in seven tosses of a fair coin:
Heads: 0, 1, 2, 3, 4, 5, 6, 7
Probability: 0.0078, 0.0547, 0.1641, 0.2734, 0.2734, 0.1641, 0.0547, 0.0078
Compute the p-value and state your conclusion. p-value = P(6) + P(7) = 0.0547 + 0.0078 = 0.0625. We have moderate evidence that individuals living in the country have healthier lungs than individuals living in cities.
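The probabilities for seven fair-coin tosses are binomial probabilities C(7, k) / 2^7, so the p-value can be computed exactly. A short sketch using exact fractions:

```python
from fractions import Fraction
from math import comb

n = 7  # twin pairs; under H0 each pair is like one fair coin toss
probs = {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}

# One-sided Ha: count outcomes at least as extreme as 6 of 7 pairs.
p_value = probs[6] + probs[7]
print(p_value, float(p_value))  # prints: 1/16 0.0625
```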
1.4.28 Presidential stature In a race for U.S. president, is the taller candidate more likely to win? In the first election of the 20th century, Theodore Roosevelt (178 cm) defeated Alton B. Parker (175 cm). There have been 27 additional elections since then, for a total of 28. Of these, 25 elections had only two major-party candidates with one taller than the other. In 19 of the 25 elections, the taller candidate won.
Let π = P(taller wins). State the research hypothesis in words and in symbols. In U.S. presidential races, the taller candidate is more likely to win than the shorter candidate: π > 0.5.
State the null and alternative hypotheses in words and symbols. Null: The long-run proportion of U.S. presidential races won by the taller candidate is 0.5. Alt: The long-run proportion of U.S. presidential races won by the taller candidate is larger than 0.5. Using symbols: H0: π = 0.5, Ha: π > 0.5, where π is the long-run proportion of races won by the taller candidate.
What is the p-value of this test? Using a one-proportion applet with probability of heads 0.5 and 25 tosses, the approximate p-value is 0.0071.
If you take the p-value at face value, what do you conclude? We have very strong evidence against the null and in support of the taller candidate winning more often than would be predicted by random chance.
Are there reasons not to take the p-value at face value? If yes, list them. Yes; for example, it is somewhat arbitrary to look only at elections from the 20th century onward.
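The applet's p-value of 0.0071 is a simulation estimate. Swapping in the exact binomial tail probability, P(X ≥ 19) for 25 fair-coin tosses, gives about 0.0073, so the two agree closely. A sketch:

```python
from math import comb

n, taller_wins = 25, 19
# Exact version of the coin-tossing null model: P(X >= 19) when pi = 0.5.
p_value = sum(comb(n, k) for k in range(taller_wins, n + 1)) / 2**n
print(f"exact p-value = {p_value:.4f}")
```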
2.3.6 Suppose that you perform a significance test using the α = 0.05 significance level. For what p-values would you reject the null hypothesis?
p-values less than or equal to 0.05
Suppose that you perform a significance test using the α = 0.05 significance level. For what p-values would you fail to reject the null hypothesis? p-values greater than 0.05
2.1.2 In most statistical studies the
Parameter, Statistic
2.1.5
Random samples only generate unbiased estimates of long-run proportions, NOT long-run means. False
Nonrandom samples are always biased. False
There is no way that a sample of 100 people can be representative of all adults living in the United States. False
1.4.4 Chess-boxing You have heard that in sports like boxing there might be some competitive advantage to those wearing red uniforms. You want to test this with your new favorite sport of chess-boxing. You randomly assign blue and red uniforms to contestants in 20 matches and find that those wearing red won 14 times (or 70%). You conduct a test of significance using simulation and get the following null distribution. (Note this null distribution uses only 100 simulated samples and not the usual 1000 or more.)
(Null distribution not shown. Applet settings: probability of success (π) = 0.5, sample size (n) = 20, number of samples = 100.)
Suppose you want to see if competitors wearing red win more than 50% of the matches in the long run, so you test H0: π = 0.50 versus Ha: π > 0.50. What is your p-value based on the above null distribution? 0.05
Suppose you now want to see if competitors wearing either red or blue have an advantage, so you test H0: π = 0.50 versus Ha: π ≠ 0.50. What is your p-value now based on the above null distribution? 0.11
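With only 100 simulated samples, the p-values of 0.05 and 0.11 read off the null distribution are rough. As a cross-check that swaps in an exact calculation for the simulation, the Binomial(20, 0.5) tail probabilities can be computed directly:

```python
from math import comb

n, red_wins = 20, 14


def prob(k):
    """P(X = k) for X ~ Binomial(20, 0.5)."""
    return comb(n, k) / 2**n


one_sided = sum(prob(k) for k in range(red_wins, n + 1))  # P(X >= 14)
# Two-sided: also count results as far below 10 as 14 is above it, i.e. X <= 6.
two_sided = one_sided + sum(prob(k) for k in range(0, n - red_wins + 1))
print(f"one-sided = {one_sided:.4f}, two-sided = {two_sided:.4f}")
```

Because the null distribution is symmetric about 10, the two-sided p-value is exactly twice the one-sided one (about 0.058 and 0.115), in line with the simulated 0.05 and 0.11.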
2.2.2 The monthly salaries of the three people working in a small firm are $3,500, $4,000, and $4,500. Suppose the firm makes a profit and everyone gets a $100 raise. How, if at all, would the average of the three salaries change?
The average would increase by $100, from $4,000 to $4,100.
2.2.4 The monthly salaries of the three people working in a small firm are $3,500, $4,000, and $4,500. Suppose the firm makes a profit and everyone gets a 10% raise. How, if at all, would the average of the three salaries change?
The average would increase by 10%, from $4,000 to $4,400.
2.2.1 On January 28, 1986, the Space Shuttle Challenger broke apart 73 seconds into its flight, killing all seven astronauts on board. All investigations into reasons for the disaster pointed towards the failure of an O-ring in the rocket's engine. Given below is a dot plot and some descriptive statistics on O-ring temperature (°F) for each test firing or actual launch of the shuttle rocket engine.
The numeric values of two possible measures of center are calculated to be 65.86°F and 67.50°F. Which one of these is the mean and which the median? How are you deciding? Since the distribution is skewed to the left, the mean will be to the left of the median; hence, 65.86°F is the mean and 67.50°F is the median.
If we removed the observation 31°F from the data set, how would the following numerical statistics change, if at all? larger, larger, smaller
2.2.5 The monthly salaries of the three people working in a small firm are $3,500, $4,000, and $4,500. Suppose the firm makes a profit and everyone gets a 10% raise. How, if at all, would the standard deviation of the three salaries change?
The standard deviation would increase by 10%, because every salary's distance from the mean also grows by 10%.
2.2.3 The monthly salaries of the three people working in a small firm are $3,500, $4,000, and $4,500. Suppose the firm makes a profit and everyone gets a $100 raise. How, if at all, would the standard deviation of the three salaries change?
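The standard deviation would stay the same, because adding $100 to every salary shifts all the values equally and does not change their spread.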
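The four salary answers can be checked numerically with Python's standard `statistics` module. Note that `stdev` is the sample standard deviation; the population version gives different numbers but the same pattern.

```python
from statistics import mean, stdev

salaries = [3500, 4000, 4500]
plus_100 = [s + 100 for s in salaries]            # everyone gets a $100 raise
plus_10_pct = [s * 110 // 100 for s in salaries]  # everyone gets a 10% raise (exact in integers here)

for label, data in [("original", salaries), ("+$100", plus_100), ("+10%", plus_10_pct)]:
    print(f"{label:>8}: mean = {mean(data)}, SD = {stdev(data)}")
```

Adding a constant shifts the mean by that constant and leaves the SD alone; multiplying by 1.1 scales both the mean and the SD by 1.1.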
2.2.7 An instructor collected data on the number of U.S. states her students had visited. A dot plot for the collected data is shown below. Identify the observational units.
The students in the class.
Identify the variable recorded and whether it is categorical or quantitative. The number of states visited; quantitative.
Her class has 50 students. Use the dot plot to find and report the median value for the number of states visited by the students in this study. 7.5
Would the mean value for these data be smaller than, larger than, or the same as the median reported above? The mean will be larger, since the distribution is skewed to the right.
Suppose that the observation recorded as 43 states is a typo and was meant to be 34. If we corrected this entry in the data set, how would the following numerical statistics change, if at all? smaller, same, smaller
2.3.25 Do trick-or-treaters have an overall preference between Halloween toys and candy? Researchers had 283 trick-or-treaters select their treat from a plate of small toys or candy. They found that 148 of them chose candy and 135 chose a toy.
Use the one-proportion applet to find the p-value for this study. Based on this p-value, would you reject the null hypothesis at the 0.10 significance level? Do not reject the null hypothesis.
Based on this decision, could you possibly be making a Type II error? Yes. Because we did not reject the null hypothesis, we would be making a Type II error if the null hypothesis were actually false.
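The p-value itself is not recorded above. A sketch using the theory-based one-proportion z-test (swapped in for the applet, which may simulate instead), assuming H0: π = 0.5 with π the long-run proportion choosing candy, and a two-sided alternative since the question asks about a preference in either direction:

```python
from math import sqrt
from statistics import NormalDist

candy, n, pi0 = 148, 283, 0.5  # pi = long-run proportion choosing candy
p_hat = candy / n
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided: a preference in either direction
alpha = 0.10
print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```

A p-value of roughly 0.44 is far above 0.10, which matches the decision not to reject.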
2.1.8 Argue whether or not you believe using a sample of students from your school's cafeteria (you recruit the next 100 people to visit the cafeteria to participate) may or may not yield biased estimates based on the variable being measured/research question being investigated in each of the following situations.
Use the proportion of students with Type O blood to learn about the proportion of U.S. adults with Type O blood. Representative, since blood types are probably not different among students at the cafeteria compared to the U.S. population.
Use the proportion of students who eat fast food regularly to learn about the proportion of U.S. college students who eat fast food regularly. Not representative, since students in the cafeteria may eat most of their meals at the cafeteria instead of eating fast food.
Use the proportion of students who have brown hair to learn about the proportion of all students at your school who have brown hair. Representative, since hair color is probably not different among students who eat at the cafeteria compared to the other students.
Use the proportion of students who have brown hair to learn about the proportion of all U.S. adults who have brown hair. Not representative, since students in the cafeteria could differ racially from the U.S. population and thus could have a different proportion of brown hair.
1.4.7 Minesweeper One of the authors sometimes likes to play Minesweeper, and of the last 20 times she played Minesweeper, she won 12 times. That is, she won 60% of the games.
What if she had played 20 games and won 18? Would that provide stronger, weaker, or evidence of similar strength compared to 12 wins out of 20, to conclude that her long-run proportion of winning at Minesweeper is higher than 50%? Explain how you are deciding. Stronger, because the statistic (18/20 = 90%) is much farther away from the null hypothesized value (50%) than before (12/20 = 60%).
What if she had played 100 games and won 60? Would that provide stronger, weaker, or evidence of similar strength compared to 12 wins out of 20, to conclude that her long-run proportion of winning at Minesweeper is higher than 50%? Explain how you are deciding. Stronger, because the statistic is the same (60%) but the sample size is much larger (100 vs. 20).
What if she had played 30 games and won 12? Would that provide evidence that her long-run proportion of winning at Minesweeper is higher than 50%? Explain how you are deciding. No. Her sample proportion, 12/30 = 40%, is less than the null hypothesized value of 50%, so this is not evidence that her long-run proportion of wins is more than 50%.
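The reasoning above compares how far the statistic sits from 0.50 and how large the sample is. Exact binomial upper-tail p-values (an addition here, not part of the original answers) show the same ordering:

```python
from math import comb


def upper_tail(wins, games, pi0=0.5):
    """Exact P(X >= wins) for X ~ Binomial(games, pi0); the p-value for Ha: pi > pi0."""
    return sum(comb(games, k) * pi0**k * (1 - pi0) ** (games - k) for k in range(wins, games + 1))


for wins, games in [(12, 20), (18, 20), (60, 100), (12, 30)]:
    print(f"{wins}/{games}: p-value = {upper_tail(wins, games):.4f}")
```

Both 18/20 and 60/100 give much smaller p-values than 12/20, while 12/30 gives a p-value far above 0.5.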
2.1.26 Television news survey In order to understand more about how people in the U.S. feel about the outcome of a recent criminal trial in which the defendant was found not guilty, a television news program invites viewers to go to the news program's website and indicate their opinion about the event. At the end of the show 82% of the people who voted in the poll indicated they were unhappy with the verdict.
What is the population of interest? All adults in the United States.
Do you believe that the proportion of people unhappy with the verdict in the sample is likely less than, similar to, or greater than the proportion of individuals unhappy with the verdict in the population? It is probably greater than the population proportion, because the people who voluntarily went to the website and responded to the poll would likely have a strong opinion about the verdict.
1.5.23 One of the authors used to have an electronic Yahtzee game that he played frequently. The game would "roll" five virtual six-sided dice. But were the dice fair? It seemed to the author that sixes showed up more than what they should if the dice were fair and he wanted to test this. He had the machine roll 500 dice and obtained 92 sixes.
Which of the following describes what the parameter is in the context of this problem? The long-run proportion of times a six is rolled.
Which of the following states the appropriate null and alternative hypotheses in the context of this study? Null: The long-run proportion of times a six is rolled is 16.7%. Alt: The long-run proportion of times a six is rolled is more than 16.7%.
Using an appropriate applet, find the p-value using a theory-based test (one-proportion z-test; normal approximation). (Round your answer to 4 decimal places; e.g. 5.2751.) 0.1541
Summarize the conclusion from the p-value. We have little to no evidence that the long-run proportion of times a six is rolled is more than 16.7%.
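A plain-Python version of the one-proportion z-test (not the applet's code) suggests the reported 0.1541 comes from entering the null value as 0.167; using exactly 1/6 gives about 0.149. Either way the p-value is well above 0.10, so the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def upper_z_test(successes, n, pi0):
    """One-sided (Ha: pi > pi0) one-proportion z-test; returns (z, p-value)."""
    p_hat = successes / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return z, 1 - NormalDist().cdf(z)


z_rounded, p_rounded = upper_z_test(92, 500, 0.167)  # null value entered as 16.7%
z_exact, p_exact = upper_z_test(92, 500, 1 / 6)      # null value exactly 1/6
print(f"pi0 = 0.167: z = {z_rounded:.2f}, p-value = {p_rounded:.4f}")
print(f"pi0 = 1/6:   z = {z_exact:.2f}, p-value = {p_exact:.4f}")
```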
1.5.17 Have you ever played rock-paper-scissors (or Rochambeau)? It's considered a "fair game" in that the two players are equally likely to win (like a coin toss). Both players simultaneously display one of three hand gestures (rock, paper, or scissors), and the objective is to display a gesture that defeats that of your opponent. The main gist is that rocks break scissors, scissors cut paper, and paper covers rock. We investigated some results of the game rock-paper-scissors, where the researchers had 119 people play rock-paper-scissors against a computer. They found 66 players (55.5%) started with rock, 39 (32.8%) started with paper, and 14 (11.8%) started with scissors. We want to see if players start with scissors with a long-term probability that is different from 1/3.
Which of the following states the appropriate null and alternative hypotheses in the context of this study, first in words and then in symbols? Null: The long-run proportion of times that a player starts with scissors is 1/3 (about 33.3%), π = 1/3. Alt: The long-run proportion of times that a player starts with scissors is different from 1/3, π ≠ 1/3.
Using an appropriate applet, find the p-value using a theory-based test (one-proportion z-test; normal approximation). (Round your answer to 2 decimal places, e.g. 0.58.) 0.00
Summarize the conclusion from the p-value. We have very strong evidence that the long-run proportion of times that a player starts with scissors is different from 1/3.
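Using the exact null value 1/3, a plain-Python version of the theory-based test (not the applet's code) shows why the rounded p-value is 0.00: the sample proportion 14/119 ≈ 0.118 is about five standard deviations below 1/3.

```python
from math import sqrt
from statistics import NormalDist

scissors, n, pi0 = 14, 119, 1 / 3
assert scissors >= 10 and n - scissors >= 10  # validity conditions for the z-test

p_hat = scissors / n
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))       # two-sided, Ha: pi != 1/3
print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.2f}")
```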
1.5.8 Suppose you ride to school with a friend and often arrive at a certain stop light when it is red. One day she states, "It seems like this light is green only 10% of the time when we get here." You think it is more often than 10% and want to test this. You keep track of the color (green/not green) the next 20 times you go to school and find that 4 times (4/20 = 20%) the light is green when you arrive. You wish to see if your sample provides strong evidence that the true proportion of times the light is green is greater than 10%. In other words, you are testing the hypotheses H0: π = 0.10 versus Ha: π > 0.10, where π = the long-run proportion of times the light is green. Two different approaches were taken in order to yield a p-value, and both are shown in the applet output.
• Option 1. A simulation-based test was done and found a p-value of 0.148, showing weak evidence against the null.
• Option 2. A one-proportion z-test was conducted and found a p-value of 0.068, yielding moderate evidence against the null.
Which test gives a more valid p-value? Option 1
Which of the following represents the BEST reason why the p-value from option 1 is more valid? The validity conditions for the z-test are not met, since the light was green only 4 times (fewer than 10). We can also see this is a problem in the applet, since the normal overlay does not match up nicely with the skewed null distribution.
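The gap between the two options can be reproduced with an exact binomial calculation (used here in place of the simulation, which only approximates it) and the normal-approximation formula. The exact tail probability is about 0.13, close to the simulated 0.148, while the z-test gives roughly half that:

```python
from math import comb, sqrt
from statistics import NormalDist

n, greens, pi0 = 20, 4, 0.10

# Validity conditions for the z-test: at least 10 successes and 10 failures.
valid = greens >= 10 and n - greens >= 10

# Exact binomial p-value for Ha: pi > 0.10, i.e. P(X >= 4).
exact = sum(comb(n, k) * pi0**k * (1 - pi0) ** (n - k) for k in range(greens, n + 1))

# Normal-approximation (one-proportion z-test) p-value.
z = (greens / n - pi0) / sqrt(pi0 * (1 - pi0) / n)
normal_approx = 1 - NormalDist().cdf(z)
print(f"valid = {valid}, exact = {exact:.3f}, normal approximation = {normal_approx:.3f}")
```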
1.4.5 Suppose you are testing the hypotheses H0: π = 0.50 versus Ha: π > 0.50. You get a sample proportion of 0.54 and find that your p-value is 0.08. Now suppose you redid your study with each of the following changes.
You increase the sample size and still find a sample proportion of 0.54. How will the new p-value compare to the p-value of 0.08 you first obtained? It will be smaller.
Now keeping the sample size the same, you take a new sample and find a sample proportion of 0.55. How will the new p-value compare to the p-value of 0.08 you first obtained? It will be smaller.
With your original sample, you decide to test a two-sided alternative instead of Ha: π > 0.50. How will the new p-value compare to the p-value of 0.08 you first obtained? It will be larger; it will roughly double, to about 0.16.
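A quick numerical check of these three answers, under stated assumptions: the null value is 0.50, the test uses the normal approximation, and the original sample size is a hypothetical n = 300 (chosen because it gives a baseline p-value near 0.08; the exercise does not give n).

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p(p_hat, n, pi0=0.50):
    """Normal-approximation p-value for Ha: pi > pi0."""
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return 1 - NormalDist().cdf(z)


n = 300                              # hypothetical original sample size
baseline = one_sided_p(0.54, n)
bigger_n = one_sided_p(0.54, 2 * n)  # same statistic, larger sample
farther = one_sided_p(0.55, n)       # statistic farther from 0.50
two_sided = 2 * baseline             # symmetric null distribution: count both tails
print(f"{baseline:.3f} {bigger_n:.3f} {farther:.3f} {two_sided:.3f}")
```

Doubling n and moving p̂ to 0.55 both shrink the p-value, while the two-sided version is exactly twice the one-sided value because the normal null distribution is symmetric.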