MATH-11-SP18: OpenIntro Statistics (3rd Ed)
5.4 Find the p-value, Part II. An independent random sample is selected from an approximately normal population with an unknown standard deviation. Find the p-value for the given set of hypotheses and T test statistic. Also determine if the null hypothesis would be rejected at α = 0.01. a) HA: μ > 0.5, n = 26, T = 2.485 b) HA: μ < 3, n = 18, T = 0.5
a) b)
2.5 Coin Flips: If you flip a fair coin 7 times, what is the probability of each of the following? (please round all answers to 4 decimal places) a) getting all tails? b) getting all heads? c) getting at least one tails?
a) b) c)
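A minimal sketch of the coin-flip arithmetic, assuming independent flips of a fair coin. The variable names are illustrative only. "At least one tails" is computed as the complement of "all heads".

```python
from fractions import Fraction

n_flips = 7
p_tails = Fraction(1, 2)  # fair coin, flips assumed independent

p_all_tails = p_tails ** n_flips
p_all_heads = (1 - p_tails) ** n_flips
p_at_least_one_tails = 1 - p_all_heads  # complement of "no tails at all"

for label, p in [
    ("all tails", p_all_tails),
    ("all heads", p_all_heads),
    ("at least one tails", p_at_least_one_tails),
]:
    print(f"{label}: {float(p):.4f}")
```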
4.17 Identify hypotheses, Part I. Write the null and alternative hypotheses in words and then symbols for each of the following situations. (a) New York is known as "the city that never sleeps". A random sample of 25 New Yorkers were asked how much sleep they get per night. Do these data provide convincing evidence that New Yorkers on average sleep less than 8 hours a night? (b) Employers at a firm are worried about the effect of March Madness, a basketball championship held each spring in the US, on employee productivity. They estimate that on a regular business day employees spend on average 15 minutes of company time checking personal email, making personal phone calls, etc. They also collect data on how much company time employees spend on such non-business activities during March Madness. They want to determine if these data provide convincing evidence that employee productivity decreases during March Madness.
(a) H0: μ = 8 (On average, New Yorkers sleep 8 hours a night.) HA: μ < 8 (On average, New Yorkers sleep less than 8 hours a night.) (b) H0: μ = 15 (The average amount of company time each employee spends not working is 15 minutes for March Madness.) HA: μ > 15 (The average amount of company time each employee spends not working is greater than 15 minutes for March Madness.)
2.23 HIV in Swaziland: Swaziland has the highest HIV prevalence in the world: 25.9% of this country's population is infected with HIV. The ELISA test is one of the first and most accurate tests for HIV. For those who carry HIV, the ELISA test is 99.7% accurate. For those who do not carry HIV, the test is 92.6% accurate. If an individual from Swaziland has tested positive, what is the probability that he carries HIV? (please round to 4 decimal places)
0.8248
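The calculation behind this answer is Bayes' theorem. The sketch below assumes "99.7% accurate for carriers" means the test's sensitivity and "92.6% accurate for non-carriers" means its specificity; the variable names are illustrative. The fourth decimal place is sensitive to rounding, so rounding the intermediate products first can shift it by one.

```python
prevalence = 0.259    # P(HIV)
sensitivity = 0.997   # P(positive | HIV)
specificity = 0.926   # P(negative | no HIV)

# Two ways to test positive: true positive or false positive.
p_pos_and_hiv = prevalence * sensitivity
p_pos_and_no_hiv = (1 - prevalence) * (1 - specificity)

p_hiv_given_pos = p_pos_and_hiv / (p_pos_and_hiv + p_pos_and_no_hiv)
print(f"P(HIV | positive) = {p_hiv_given_pos:.4f}")
```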
2.37 Portfolio return: A portfolio's value increases by 17% during a financial boom and by 8% during normal times. It decreases by 13% during a recession. What is the expected return on this portfolio if each scenario is equally likely?
4
1.48 Stats scores. Below are the final exam scores of twenty introductory statistics students. 57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94 Create a box plot of the distribution of these scores.
Min: 57, Q1: 72.75, Median: 78.50, Q3: 82.25, Max: 94.
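The five numbers a box plot is drawn from can be checked with the standard library. This sketch assumes the linear-interpolation quartile convention (`method="inclusive"` in Python's `statistics.quantiles`), which appears to be the one behind the quartiles listed here; other conventions give slightly different Q1 and Q3.

```python
import statistics

scores = [57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
          79, 79, 81, 81, 82, 83, 83, 88, 89, 94]

# n=4 splits the data into quartiles and returns the three cut points.
q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box would be drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(min(scores), q1, med, q3, max(scores))
print("outliers:", outliers)
```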
1.34 Music and learning: You would like to conduct an experiment in class to see if students learn better if they study without any music, with music that has no lyrics (instrumental), or with music that has lyrics. Briefly outline a design for this study.
Select a group of participants and randomly assign them to three groups: no music, instrumental music, and music with lyrics. Give every participant the same information sheet to study under their assigned condition, then assess them. Compare the average share of correct answers across the three groups.
4.19 Online communication. A study suggests that the average college student spends 10 hours per week communicating with others online. You believe that this is an underestimate and decide to collect your own sample for a hypothesis test. You randomly sample 60 students from your dorm and find that on average they spent 13.5 hours a week communicating with others online. A friend of yours, who offers to help you with the hypothesis test, comes up with the following set of hypotheses. Indicate any errors you see. H0: x̄ < 10 hours HA: x̄ > 13.5 hours
The hypotheses should be about the population mean (μ), not the sample mean. The null hypothesis should have an equal sign and the alternative hypothesis should be about the null hypothesized value, not the observed sample mean. Correction: H0: μ = 10 hours HA: μ > 10 hours The one-sided test indicates that we are only interested in showing that 10 is an underestimate. Here the interest is in only one direction, so a one-sided test seems most appropriate. If we would also be interested if the data showed strong evidence that 10 was an overestimate, then the test should be two-sided.
4.27 Working backwards, one-sided. You are given the following hypotheses: H0 :μ=30 HA :μ>30 We know that the sample standard deviation is 10 and the sample size is 70. For what sample mean would the p-value be equal to 0.05? Assume that all conditions necessary for inference are satisfied.
Z = 1.65 = (x̄ − 30) / (10/√70) → x̄ = 31.97.
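The same working-backwards step can be sketched in code, assuming a normal model for the sample mean. The critical value is taken from `statistics.NormalDist` rather than a table, so it is 1.645 rather than the rounded 1.65, but the sample mean agrees to two decimals. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 30      # null value
s = 10        # sample standard deviation
n = 70        # sample size
alpha = 0.05  # target one-sided p-value

z_star = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
standard_error = s / sqrt(n)

# Solve z_star = (x_bar - mu0) / standard_error for x_bar.
x_bar = mu0 + z_star * standard_error
print(f"z* = {z_star:.3f}, x_bar = {x_bar:.2f}")
```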
1.44 Make-up exam: In a class of 21 students, 20 of them took an exam in class and 1 student took a make-up exam the following day. The professor graded the first batch of 20 exams and found an average score of 80 points with a standard deviation of 6.5 points. The student who took the make-up the following day scored 61 points on the exam. a) Does the new student's score increase or decrease the average? b) The new average is: (round to two decimal places) c) Does the new student's score increase or decrease the standard deviation of the scores?
a) b)
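A sketch of how the new mean and standard deviation can be worked out from the summary statistics alone, without the 20 individual scores. It assumes the reported 6.5 is the sample standard deviation (divisor n − 1); the variable names are illustrative.

```python
from math import sqrt

n, mean, sd = 20, 80.0, 6.5   # first batch of exams
new_score = 61.0              # make-up exam

new_n = n + 1
new_mean = (n * mean + new_score) / new_n

# Sum of squared deviations about the old mean, then the standard
# one-observation update to the sum of squares about the new mean.
ss_old = (n - 1) * sd ** 2
ss_new = ss_old + (n / new_n) * (new_score - mean) ** 2
new_sd = sqrt(ss_new / (new_n - 1))

print(f"new mean = {new_mean:.2f}, new SD = {new_sd:.2f}")
```

A score far below the old mean pulls the average down and, because it lies well outside the typical spread, pushes the standard deviation up.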
2.33 College smokers: At a university, 15% of students smoke. a) Calculate the expected number of smokers in a random sample of 140 students from this university (please do not round your answer): b) The university gym opens at 9 am on Saturday mornings. One Saturday morning at 8:55 am there are 19 students outside the gym waiting for it to open. Should you use the same approach from part (a) to calculate the expected number of smokers among these 19 students?
a) b) No, it is unlikely that smoking habits and waking up early to go to the gym on Saturday are independent
4.18 Identify hypotheses, Part II. Write the null and alternative hypotheses in words and using symbols for each of the following situations. (a) Since 2008, chain restaurants in California have been required to display calorie counts of each menu item. Prior to menus displaying calorie counts, the average calorie intake of diners at a restaurant was 1100 calories. After calorie counts started to be displayed on menus, a nutritionist collected data on the number of calories consumed at this restaurant from a random sample of diners. Do these data provide convincing evidence of a difference in the average calorie intake of diners at this restaurant? (b) Based on the performance of those who took the GRE exam between July 1, 2004 and June 30, 2007, the average Verbal Reasoning score was calculated to be 462. In 2011 the average verbal score was slightly higher. Do these data provide convincing evidence that the average GRE Verbal Reasoning score has changed since 2004?
a) b)
6.14 The Civil War. A national survey conducted in 2011 among a simple random sample of 1,507 adults shows that 56% of Americans think the Civil War is still relevant to American politics and political life. a) Conduct a hypothesis test to determine if these data provide strong evidence that the majority of the Americans think the Civil War is still relevant. b) Interpret the p-value in this context.
a) b)
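A minimal sketch of the one-proportion z-test this exercise calls for, assuming H0: p = 0.5 against HA: p > 0.5 and a normal approximation for the sample proportion. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 1507
p_hat = 0.56
p0 = 0.50  # null value: exactly half think the Civil War is still relevant

# Success-failure condition, checked with the null proportion.
condition_ok = n * p0 >= 10 and n * (1 - p0) >= 10

standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error
p_value = 1 - NormalDist().cdf(z)  # one-sided, upper tail

print(f"conditions met: {condition_ok}, Z = {z:.2f}, p-value = {p_value:.2g}")
```

The p-value is the probability of seeing a sample proportion of 56% or higher in a random sample of 1,507 adults if in fact only 50% of Americans held this view.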
1.5 Cheaters, study components: Researchers studying the relationship between honesty, age, and self-control conducted an experiment on 160 children between the ages of 5 and 15. Participants reported their age, sex, and whether they were an only child or not. The researchers asked each child to toss a fair coin in private and to record the outcome (white or black) on a paper sheet, and said they would only reward children who report white. Half the students were explicitly told not to cheat and the others were not given any explicit instructions. In the "no instruction" group the probability of cheating was found to be uniform across groups based on a child's characteristics. In the group that was explicitly told to not cheat, girls were less likely to cheat, and while rate of cheating didn't vary by age for boys, it decreased with age for girls (Ritz et al. 2000). Identify the following a) What are the cases? b) What are the variables and their types? c) What is the main research question?
a) b) c)
1.6 Stealers, study components: In a study of the relationship between socio-economic class and unethical behavior, 129 University of California undergraduates at Berkeley were asked to identify themselves as having low or high social-class by comparing themselves to others with the most (least) money, most (least) education, and most (least) respected jobs. They were also presented with a jar of individually wrapped candies and informed that the candies were for children in a nearby laboratory, but that they could take some if they wanted. After completing some unrelated tasks, participants reported the number of candies they had taken. It was found that those who were identified as upper-class took more candy than others (Piff, 2012). Identify the following about this study. a) What are the cases? b) What are the variables and their types? c) What is the main research question?
a) b) c)
2.10 Guessing on an exam: In a multiple choice exam, there are 5 questions and 4 choices for each question (a, b, c, d). Nancy has not studied for the exam at all and decides to randomly guess the answers. What is the probability that: (please round all answers to four decimal places) a) the first question she gets right is question number 5? b) she gets all of the questions right? c) she gets at least one question right?
a) b) c)
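A sketch of the three probabilities, assuming each guess is independent with a 1-in-4 chance of being right. Part (a) is a geometric-style calculation: four wrong answers followed by one right answer.

```python
p_right = 1 / 4
p_wrong = 1 - p_right
n_questions = 5

# a) first correct answer on question 5: wrong, wrong, wrong, wrong, right
p_first_right_on_5 = p_wrong ** 4 * p_right

# b) all five correct
p_all_right = p_right ** n_questions

# c) at least one correct = complement of all five wrong
p_at_least_one_right = 1 - p_wrong ** n_questions

print(f"a) {p_first_right_on_5:.4f}  b) {p_all_right:.4f}  c) {p_at_least_one_right:.4f}")
```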
2.6 Dice Rolls: If you roll a pair of fair dice, what is the probability of each of the following? (round all answers to 4 decimal places) a) getting a sum of 1? b) getting a sum of 5? c) getting a sum of 12?
a) b) c)
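One way to check these answers is to enumerate all 36 equally likely outcomes for two fair dice. The helper `p_sum` is a name introduced here for illustration.

```python
from collections import Counter
from itertools import product

# Count how many of the 36 (die1, die2) outcomes give each sum.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

def p_sum(total):
    """Probability that two fair dice add up to `total`."""
    return sum_counts[total] / 36  # Counter returns 0 for impossible sums

for total in (1, 5, 12):
    print(f"P(sum = {total}) = {p_sum(total):.4f}")
```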
1.14 Cats on YouTube: Suppose you want to estimate the percentage of videos on YouTube that are cat videos. It is impossible for you to watch all videos on YouTube so you use a random video picker to select 1000 videos for you. You find that 2% of these videos are cat videos. Determine which of the following is an observation, a variable, a sample statistic, or a population parameter. a) The percentage of all videos on YouTube that are cat videos is a/an: b) 2% c) A video in your sample d) Whether or not a video is a cat video
a) b) c) d)
1.16 Income and education in US counties: The scatterplot below shows the relationship between per capita income (in thousands of dollars) and percent of population with a bachelor's degree in 3,143 counties in the US in 2010. a) What is the explanatory variable? b) What is the response variable? c) Describe the relationship between the variables. d) Can we conclude that having a bachelor's degree increases one's income?
a) b) c) d)
1.32 Vitamin supplements: In order to assess the effectiveness of taking large doses of vitamin C in reducing the duration of the common cold, researchers recruited 400 healthy volunteers from staff and students at a university. A quarter of the patients were assigned a placebo, and the rest were evenly divided between 1g Vitamin C, 3g Vitamin C, or 3g Vitamin C plus additives to be taken at onset of a cold for the following two days. All tablets had identical appearance and packaging. The nurses who handed the prescribed pills to the patients knew which patient received which treatment, but the researchers assessing the patients when they were sick did not. No significant differences were observed in any measure of cold duration or severity between the four medication groups, and the placebo group had the shortest duration of symptoms. a) Was this an experiment or observational study? b) What is the explanatory variable? c) What is the response variable? d) Were the patients blind to their treatment? e) Was the study double-blind?
a) b) c) d) e)
2.36 Is it worth it?: Andy is always looking for ways to make money fast. Lately, he has been trying to make money by gambling. Here is the game he is considering playing: The game costs $2 to play. He draws a card from a deck. If he gets a number card (2-10), he wins nothing. For any face card ( jack, queen or king), he wins $3. For any ace, he wins $5, and he wins an extra $20 if he draws the ace of clubs. a) Andy's expected profit per game is: $ b) Would you recommend this game to Andy as a good way to make money? Explain.
a) -0.54 b) No, on average we expect Andy to lose about $0.54 per game.
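A sketch of the expected-value calculation, assuming a standard 52-card deck and reading "an extra $20" as $5 + $20 = $25 in winnings for the ace of clubs. The `outcomes` list is an illustrative layout, not part of the exercise.

```python
from fractions import Fraction

cost_to_play = 2

# (number of cards, winnings in dollars)
outcomes = [
    (36, 0),   # number cards 2-10, four suits
    (12, 3),   # jacks, queens, kings
    (3, 5),    # aces other than clubs
    (1, 25),   # ace of clubs: $5 plus the extra $20
]
deck_size = sum(count for count, _ in outcomes)

expected_winnings = sum(Fraction(count, deck_size) * win for count, win in outcomes)
expected_profit = expected_winnings - cost_to_play

print(f"expected profit per game: ${float(expected_profit):.2f}")
```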
2.20 Assortative mating: Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 214 Scandinavian men and their female partners. The table below summarizes the results (rows represent male eye color while columns represent female eye color). For simplicity, we only include heterosexual relationships in this exercise. (please round any numerical answers to 4 decimal places)
        Blue  Brown  Green  Total
Blue      61     32     26    119
Brown     16     26      7     49
Green     13      9     24     46
Total     90     67     57    214
a) What is the probability that a randomly chosen male respondent or his partner has blue eyes? b) What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes? c) What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes? d) What is the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes? e) Does it appear that the eye colors of male respondents and their partners are independent? Explain.
a) b) c) d) e)
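A sketch of how parts (a) through (e) can be computed from the table, using the general addition rule for (a) and row-conditional proportions for (b) through (d). The nested dictionary is an illustrative encoding with male eye color as the outer key.

```python
# table[male_color][female_color] = count
table = {
    "Blue":  {"Blue": 61, "Brown": 32, "Green": 26},
    "Brown": {"Blue": 16, "Brown": 26, "Green": 7},
    "Green": {"Blue": 13, "Brown": 9,  "Green": 24},
}
total = sum(sum(row.values()) for row in table.values())

male_blue = sum(table["Blue"].values())
female_blue = sum(row["Blue"] for row in table.values())
both_blue = table["Blue"]["Blue"]

# a) P(male blue or partner blue) = P(A) + P(B) - P(A and B)
p_either_blue = (male_blue + female_blue - both_blue) / total

# b)-d) P(partner blue | male color)
p_partner_blue_given = {
    male: row["Blue"] / sum(row.values()) for male, row in table.items()
}

# e) Under independence each conditional would equal P(partner blue).
p_partner_blue = female_blue / total

print(f"a) {p_either_blue:.4f}")
for male, p in p_partner_blue_given.items():
    print(f"P(partner blue | male {male.lower()}) = {p:.4f}")
print(f"P(partner blue) = {p_partner_blue:.4f}")
```

The conditional probabilities differ noticeably from one another and from the overall proportion of blue-eyed partners, which suggests the eye colors are not independent.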
1.1 Migraine and Acupuncture: A migraine is a particularly painful type of headache, which patients sometimes wish to treat with acupuncture. To determine whether acupuncture relieves migraine pain, researchers conducted a randomized controlled study where 96 females diagnosed with migraine headaches were randomly assigned to one of two groups: treatment or control. 40 patients in the treatment group received acupuncture that is specifically designed to treat migraines. 56 patients in the control group received placebo acupuncture (needle insertion at non-acupoint locations). 24 hours after patients received acupuncture, they were asked if they were pain free. Results are summarized in the contingency table below. (please round answers to within one hundredth of a percent)
            Pain Free: Yes  Pain Free: No  Total
Treatment               11             29     40
Control                  4             52     56
Total                   15             81     96
a) What percent of patients in the treatment group were pain free 24 hours after receiving acupuncture? b) What percent of patients in the control group were pain free after 24 hours? c) At first glance, does acupuncture appear to be an effective treatment for migraines? Explain your reasoning. d) Do the data provide convincing evidence that there is a real pain reduction for those patients in the treatment group? Or do you think that the observed difference might just be due to chance?
a) b) c) Yes, because a higher percentage of individuals in the treatment group were pain-free after 24 hours. d) It is impossible to tell merely by comparing the sample proportions, because the difference could be the result of random error in our sample.
1.11 Buteyko method, scope of inference: Exercise 1.4 introduces a study on using the Buteyko shallow breathing technique to reduce asthma symptoms and improve quality of life. As part of this study 600 asthma patients aged 18-69 who relied on medication for asthma treatment were recruited and randomly assigned to two groups: one practiced the Buteyko method and the other did not. Those in the Buteyko group experienced, on average, a significant reduction in asthma symptoms and an improvement in quality of life. a) Identify the population of interest and the sample in the study. Clearly label each (ie. The population of interest is..., the sample is...). b) Comment on whether or not the results of the study can be generalized to the population, and if the findings of the study can be used to establish causal relationships.
a) b)
1.25 Flawed reasoning: Identify the flaw(s) in reasoning in the following scenarios. Explain what the individuals in the study should have done differently if they wanted to make such strong conclusions. a) Students at an elementary school are given a questionnaire that they are asked to return after their parents have completed it. One of the questions asked is, "Do you find that your work schedule makes it difficult for you to spend time with your kids after school?" Of the parents who replied, 85% said "no". Based on these results, the school officials conclude that a great majority of the parents have no difficulty spending time with their kids after school. b) A survey is conducted on a simple random sample of 1,000 women who recently gave birth, asking them about whether or not they smoked during pregnancy. A follow-up survey asking if the children have respiratory problems is conducted 3 years later, however, only 567 of these women are reached at the same address. The researcher reports that these 567 women are representative of all mothers. c) An orthopedist administers a questionnaire to 30 of his patients who do not have any joint problems and finds that 20 of them regularly go running. He concludes that running decreases the risk of joint problems.
a) b) c)
5.12 Auto exhaust and lead exposure. Researchers interested in lead exposure due to car exhaust sampled the blood of 52 police officers subjected to constant inhalation of automobile exhaust fumes while working traffic enforcement in a primarily urban environment. The blood samples of these officers had an average lead concentration of 124.32 μg/l and a SD of 37.74 μg/l; a previous study of individuals from a nearby suburb, with no history of exposure, found an average blood level concentration of 35 μg/l. a) Write down the hypotheses that would be appropriate for testing if the police officers appear to have been exposed to a higher concentration of lead. b) Explicitly state and check all conditions necessary for inference on these data. c) Test the hypothesis that the downtown police officers have a higher lead exposure than the group in the previous study. Interpret your results in context.
a) b) c)
6.15 Browsing on the mobile device. A 2012 survey of 2,254 American adults indicates that 17% of cell phone owners do their browsing on their phone rather than a computer or other device. a) According to an online article, a report from a mobile research company indicates that 38 percent of Chinese mobile web users only access the internet through their cell phones. Conduct a hypothesis test to determine if these data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%. b) Interpret the p-value in this context.
a) b) If in fact 38% of Americans used their cell phones as a primary access point to the internet, the probability of obtaining a random sample of 2,254 Americans where 17% or less or 59% or more use only their cell phones to access the internet would be approximately 0. c) (0.1545, 0.1855). We are 95% confident that approximately 15.5% to 18.6% of all Americans primarily use their cell phones to browse the internet.
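A sketch of the two-sided test and the 95% confidence interval quoted in this answer, assuming a normal approximation. Note the two different standard errors: the test uses the null value 0.38, while the interval uses the sample proportion 0.17. The critical value 1.96 is the usual table value.

```python
from math import sqrt
from statistics import NormalDist

n = 2254
p_hat = 0.17
p0 = 0.38  # Chinese proportion used as the null value

# Hypothesis test: standard error computed under H0.
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided

# 95% confidence interval: standard error from the sample proportion.
se_sample = sqrt(p_hat * (1 - p_hat) / n)
z_star = 1.96
ci_lower = p_hat - z_star * se_sample
ci_upper = p_hat + z_star * se_sample

print(f"Z = {z:.2f}, p-value = {p_value:.3g}")
print(f"95% CI: ({ci_lower:.4f}, {ci_upper:.4f})")
```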
2.9 Disjoint vs. independent: In parts (a) and (b), identify whether the events are disjoint, independent, or neither (events cannot be both disjoint and independent). a) You and a randomly selected student from your class both earn A's in this course. b) You and your class partner both earn A's in this course. c) If two events can occur at the same time, they must be independent.
a) independent b) neither c) False
2.2 Roulette wheel: The game of roulette involves spinning a wheel with 38 slots: 18 red, 18 black, and 2 green. A ball is spun onto the wheel and will eventually land in a slot, where each slot has an equal chance of capturing the ball. a) You watch a roulette wheel spin 10 consecutive times and the ball lands on a red slot each time. What is the probability that the ball will land on a red slot on the next spin? b) You watch a roulette wheel spin 280 consecutive times and the ball lands on a red slot each time. What is the probability that the ball will land on a red slot on the next spin?
a) 0.4737 b) 0.4737
2.38 Baggage fees: An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 53% of passengers have no checked luggage, 32% have only one piece of checked luggage and 15% have two pieces. We suppose a negligible portion of people check more than two bags. a) The average baggage-related revenue per passenger is: $ (please round to the nearest cent) b) The standard deviation of baggage-related revenue is: $ (please round to the nearest cent) c) About how much revenue should the airline expect for a flight of 120 passengers? $
a) 17 b) 21.24 c) 2040
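These three answers follow from the mean and variance of a discrete random variable. In this sketch a passenger with two bags pays $25 + $35 = $60; the dictionary layout is illustrative.

```python
from math import sqrt

# revenue per passenger in dollars -> probability
fee_distribution = {0: 0.53, 25: 0.32, 60: 0.15}

mean_revenue = sum(x * p for x, p in fee_distribution.items())
variance = sum((x - mean_revenue) ** 2 * p for x, p in fee_distribution.items())
sd_revenue = sqrt(variance)

passengers = 120
expected_flight_revenue = passengers * mean_revenue

print(f"a) ${mean_revenue:.2f}  b) ${sd_revenue:.2f}  c) ${expected_flight_revenue:.0f}")
```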
1.46/47 Means, Medians, Standard Deviations, and IQRs: Answer the following about each dataset. (Round to two decimal places where appropriate) Dataset I: 1 2 3 4 5 a) The median of Dataset I is: b) The IQR of Dataset I is from to c) The mean of Dataset I is: d) The standard deviation of Dataset I is: Dataset II: 6 7 8 9 10 e) The median of Dataset II is: f) The IQR of Dataset II is from to g) The mean of Dataset II is: h) The standard deviation of Dataset II is:
a) 3 b) 1.5, 4.5 c) 3 d) 1.58 e) 8 f) 6.5, 9.5 g) 8 h) 1.58
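A sketch for checking answers of this kind. It assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (excluding the middle value when the count is odd), which reproduces the Q1 and Q3 values given here, and uses the sample standard deviation. The function names are introduced for illustration.

```python
from statistics import mean, median, stdev

def quartiles_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (needs >= 2 values)."""
    data = sorted(values)
    half = len(data) // 2          # middle value is dropped when len is odd
    return median(data[:half]), median(data[-half:])

def summarize(values):
    q1, q3 = quartiles_by_halves(values)
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "mean": mean(values),
        "sd": round(stdev(values), 2),  # sample SD, divisor n - 1
    }

print(summarize([1, 2, 3, 4, 5]))
print(summarize([6, 7, 8, 9, 10]))
```

Adding a constant to every value shifts the median, quartiles, and mean by that constant but leaves the standard deviation unchanged.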
1.66 Views on immigration: 880 randomly sampled registered voters from Tampa, FL were asked if they thought workers who have illegally entered the US should be (i) allowed to keep their jobs and apply for US citizenship, (ii) allowed to keep their jobs as temporary guest workers but not allowed to apply for US citizenship, or (iii) lose their jobs and have to leave the country. The results of the survey by political ideology are shown below. (Round all answers to the nearest hundredth of a percent)
                            Conservative  Moderate  Liberal  Total
(i) Apply for citizenship             34        88      200    322
(ii) Guest worker                     81        87       44    212
(iii) Leave the country              114        76       18    208
(iv) Not sure                         71        13       54    138
Total                                300       264      316    880
a) What percent of these Tampa, FL voters identify themselves as conservatives? b) What percent of these Tampa, FL voters are in favor of the citizenship option? c) What percent of these Tampa, FL voters identify themselves as conservative and are in favor of the citizenship option? d) What percent of these Tampa, FL voters who identify themselves as conservatives are also in favor of the citizenship option? e) What percent of moderates share this view? f) What percent of liberals share the view? g) Political ideology and views on immigration appear to be:
a) 34.1 b) 36.6 c) d) 11.3 e) 33.3 f) 63.3 g) Dependent
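A sketch distinguishing the marginal, joint, and conditional percentages asked for in parts (a) through (f), computed from the contingency table to the hundredth of a percent. The dictionary encoding and variable names are illustrative.

```python
# counts[response][ideology]
counts = {
    "citizenship":  {"Conservative": 34,  "Moderate": 88, "Liberal": 200},
    "guest worker": {"Conservative": 81,  "Moderate": 87, "Liberal": 44},
    "leave":        {"Conservative": 114, "Moderate": 76, "Liberal": 18},
    "not sure":     {"Conservative": 71,  "Moderate": 13, "Liberal": 54},
}
ideologies = ["Conservative", "Moderate", "Liberal"]

total = sum(sum(row.values()) for row in counts.values())
ideology_totals = {i: sum(row[i] for row in counts.values()) for i in ideologies}
citizenship_total = sum(counts["citizenship"].values())

def pct(part, whole):
    return round(100 * part / whole, 2)

marginal_conservative = pct(ideology_totals["Conservative"], total)      # a)
marginal_citizenship = pct(citizenship_total, total)                     # b)
joint_cons_citizenship = pct(counts["citizenship"]["Conservative"], total)  # c)
# d)-f): condition on ideology, so divide by the column total instead.
conditional = {i: pct(counts["citizenship"][i], ideology_totals[i]) for i in ideologies}

print(marginal_conservative, marginal_citizenship, joint_cons_citizenship)
print(conditional)
```

The share favoring citizenship varies widely across ideologies, which is why the two variables appear dependent.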
2.44 Cat weights: The histogram shown below represents the weights (in kg) of 47 female and 97 male cats. a) Approximately ___% of these cats weigh less than 2.5kg. b) Approximately ___% of these cats weigh between 2.5 and 2.75kg. c) Approximately ___% of these cats weigh between 2.75 and 3.5kg.
a) 42 b) 15 c) 37
1.42 Sleeping in college: A recent article in a college newspaper stated that college students get an average of 6.4 hrs of sleep each night. A student who was skeptical about this value decided to conduct a survey by randomly sampling 24 students. On average, the sampled students slept 5.4 hours per night. Identify which value represents the sample mean and which value represents the claimed population mean. a) What is the sample mean? b) What is the claimed population mean?
a) 5.4 b) 6.4
1.46/47 Means, Medians, Standard Deviations, and IQRs: Answer the following about each dataset. (Round to two decimal places where appropriate) Dataset I: 0 10 50 60 100 a) The median of Dataset I is: b) The IQR of Dataset I is from to c) The mean of Dataset I is: d) The standard deviation of Dataset I is: Dataset II: 0 100 500 600 1000 e) The median of Dataset II is: f) The IQR of Dataset II is from to g) The mean of Dataset II is: h) The standard deviation of Dataset II is:
a) 50 b) 5, 80 c) 44 d) 40.37 e) 500 f) 50, 800 g) 440 h) 403.73
1.43 Parameters and statistics: Identify which value represents the sample mean and which value represents the claimed population mean. a) American households spent an average of about $72 in 2007 on Halloween merchandise such as costumes, decorations and candy. To see if this number had changed, researchers conducted a new survey in 2008 before industry numbers were reported. The survey included 1550 households and found that average Halloween spending was $56 per household. The sample mean is _____ dollars, while the claimed population mean is _____ dollars. b) The average GPA of students in 2001 at a private university was 3.57. A survey on a sample of 340 students from this university yielded an average GPA of 3.55 in Spring semester of 2012. The sample mean is _____ and the claimed population mean is _____
a) 56, 72 b) 3.55, 3.57
1.46/47 Means, Medians, Standard Deviations, and IQRs: Answer the following about each dataset. (Round to two decimal places where appropriate) Dataset I: 3 5 6 7 9 a) The median of Dataset I is: b) The IQR of Dataset I is from to c) The mean of Dataset I is: d) The standard deviation of Dataset I is: Dataset II: 3 5 6 7 20 e) The median of Dataset II is: f) The IQR of Dataset II is from to g) The mean of Dataset II is: h) The standard deviation of Dataset II is:
a) 6 b) 4, 8 c) 6 d) e) 6 f) 4, 13.5 g) 8.2 h)
1.46/47 Means, Medians, Standard Deviations, and IQRs: Answer the following about each dataset. (Round to two decimal places where appropriate) Dataset I: 3 5 6 7 9 a) The median of Dataset I is: b) The IQR of Dataset I is from to c) The mean of Dataset I is: d) The standard deviation of Dataset I is: Dataset II: 3 5 8 7 9 e) The median of Dataset II is: f) The IQR of Dataset II is from to g) The mean of Dataset II is: h) The standard deviation of Dataset II is:
a) 6 b) 4, 8 c) 6 d) 2.24 e) 7 f) 4, 8.5 g) 6.4 h) 2.41
1.10 Cheaters, scope of inference: Exercise 1.5 introduces a study where researchers studying the relationship between honesty, age, and self-control conducted an experiment on 160 children between the ages of 5 and 15. The researchers asked each child to toss a fair coin in private and to record the outcome (white or black) on a paper sheet, and said they would only reward children who report white. Half the students were explicitly told not to cheat and the others were not given any explicit instructions. Differences were observed in the cheating rates in the instruction and no instruction groups, as well as some differences across children's characteristics within each group. a) Identify the population of interest in the study. b) Identify the sample for this study. c) Can the results of the study be generalized to the population? Can the findings of the study be used to establish causal relationships?
a) All children between the ages of 5 and 15 b) c)
1.31 Light and exam performance: A study is designed to test the effect of light level on exam performance of students. The researcher believes that light levels might have different effects on males and females, so wants to make sure both are equally represented in each treatment. The treatments are fluorescent overhead lighting, yellow overhead lighting, no overhead lighting (only desk lamps). a) What is the response variable? b) What is the explanatory variable? c) What are the levels of the explanatory variable? d) What is the blocking variable? e) What are the levels of the blocking variable?
a) Exam performance b) Light level c) Fluorescent overhead lighting, yellow overhead lighting, no overhead lighting (only desk lamps). d) Sex e) Male and female
1.32 Vitamin supplements: In order to assess the effectiveness of taking large doses of vitamin C in reducing the duration of the common cold, researchers recruited 400 healthy volunteers from staff and students at a university. A quarter of the patients were assigned a placebo, and the rest were evenly divided between 1g Vitamin C, 3g Vitamin C, or 3g Vitamin C plus additives to be taken at onset of a cold for the following two days. All tablets had identical appearance and packaging. The nurses who handed the prescribed pills to the patients knew which patient received which treatment, but the researchers assessing the patients when they were sick did not. No significant differences were observed in any measure of cold duration or severity between the four medication groups, and the placebo group had the shortest duration of symptoms a) Was this an experiment or observational study? b) What is the explanatory variable? c) What is the response variable? d) Were the patients blind to their treatment? e) Was the study double-blind?
a) Experiment b) level of vitamin C supplement c) duration of cold d) Yes e) No
5.7 Sleep habits of New Yorkers. New York is known as "the city that never sleeps". A random sample of 25 New Yorkers were asked how much sleep they get per night. Statistical summaries of these data are shown below. Do these data provide strong evidence that New Yorkers sleep less than 8 hours a night on average?
n    x̄     s     min   max
25   7.73  0.77  6.17  9.78
a) Write the hypotheses in symbols and in words. b) Check conditions, then calculate the test statistic, T, and the associated degrees of freedom. c) Find and interpret the p-value in this context. Drawing a picture may be helpful. d) What is the conclusion of the hypothesis test?
a) H0: μ = 8 (New Yorkers sleep 8 hrs per night on average.) HA: μ < 8 (New Yorkers sleep less than 8 hrs per night on average.) b) Independence: The sample is random and from less than 10% of New Yorkers. The sample is small, so we will use a t-distribution. For this size sample, slight skew is acceptable, and the min/max suggest there is not much skew in the data. T = −1.75. df = 25 − 1 = 24. c) 0.025 < p-value < 0.05. If in fact the true population mean of the amount New Yorkers sleep per night was 8 hours, the probability of getting a random sample of 25 New Yorkers where the average amount of sleep is 7.73 hrs per night or less is between 0.025 and 0.05. d) Since p-value < 0.05, reject H0. The data provide strong evidence that New Yorkers sleep less than 8 hours per night on average.
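A sketch of the test-statistic calculation from the summary table. It stops at T and the degrees of freedom because Python's standard library has no t-distribution; the p-value bounds in part (c) come from a t-table with df = 24. Variable names are illustrative.

```python
from math import sqrt

n = 25
x_bar = 7.73   # sample mean, hours
s = 0.77       # sample standard deviation, hours
mu0 = 8        # null value, hours

standard_error = s / sqrt(n)
t_stat = (x_bar - mu0) / standard_error
df = n - 1

print(f"SE = {standard_error:.3f}, T = {t_stat:.2f}, df = {df}")
```

The statistic is negative because the sample mean falls below the null value, which matches the direction of the one-sided alternative μ < 8.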
1.70 Heart transplants. The Stanford University Heart Transplant Study was conducted to determine whether an experimental heart transplant program increased lifespan. Each patient entering the program was designated an official heart transplant candidate, meaning that he was gravely ill and would most likely benefit from a new heart. Some patients got a transplant and some did not. The variable transplant indicates which group the patients were in; patients in the treatment group got a transplant and those in the control group did not. Another variable called survived was used to indicate whether or not the patient was alive at the end of the study. a) Based on the mosaic plot, is survival independent of whether or not the patient got a transplant? Explain your reasoning. b) What do the box plots below suggest about the efficacy (effectiveness) of the heart transplant treatment. c) What proportion of patients in the treatment group and what proportion of patients in the control group died? d) One approach for investigating whether or not the treatment is effective is to use a randomization technique. 1. What are the claims being tested? 2. The paragraph below describes the set up for such approach, if we were to do it without using statistical software. Fill in the blanks with a number or phrase, whichever is appropriate. 3. What do the simulation results shown below suggest about the effectiveness of the transplant program?
a) Based on the mosaic plot, survival is not independent of treatment: a larger proportion of the patients who got a heart transplant were alive at the end of the study. b) The box plots suggest that patients who got a transplant tended to survive longer than those who did not. c) Control group: alive = 4, dead = 30, total = 34. Treatment group: alive = 24, dead = 45, total = 69. Proportion who died in the control group: 30/34 ≈ 0.8824. Proportion who died in the treatment group: 45/69 ≈ 0.6522. d) 1. H0: Survival is independent of whether or not the patient got a transplant (the status quo). HA: Survival depends on the transplant; the transplant program increases lifespan (our research question). 2. We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated difference in proportions is at or below the observed difference, 45/69 − 30/34 = −0.230179. If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative. 3. The 100 simulated differences are centered near zero, and a difference as extreme as the observed −0.2302 is rare among them. The study results therefore provide strong evidence against the null hypothesis: we reject H0 and conclude that the heart transplant improved survival.
When we conduct formal studies, usually we reject the notion that we just happened to observe a rare event. So in this case, we reject the independence model in favor of the alternative. That is, we conclude that the data provide strong evidence that the heart transplant improved survival.
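The card-shuffling procedure in part d) can also be run in software. The following is a minimal sketch, not the simulation that produced the plot in the exercise: it uses the counts from part c), a fixed seed for reproducibility, and the illustrative function names `simulate_differences` and `fraction_at_or_below`.

```python
import random

ALIVE, DEAD = 28, 75                 # card counts from the answer
N_TREATMENT, N_CONTROL = 69, 34      # group sizes
OBSERVED_DIFF = 45 / 69 - 30 / 34    # treatment minus control, about -0.2302

def simulate_differences(reps=100, seed=0):
    """Shuffle the cards reps times and record the difference in the
    proportion of 'dead' cards (treatment - control) each time."""
    rng = random.Random(seed)
    cards = ["alive"] * ALIVE + ["dead"] * DEAD
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)
        treatment = cards[:N_TREATMENT]
        control = cards[N_TREATMENT:]
        p_treatment = treatment.count("dead") / N_TREATMENT
        p_control = control.count("dead") / N_CONTROL
        diffs.append(p_treatment - p_control)
    return diffs

def fraction_at_or_below(diffs, observed=OBSERVED_DIFF):
    """Fraction of simulated differences at least as extreme as observed
    (lower tail); the small tolerance guards against rounding error."""
    return sum(d <= observed + 1e-12 for d in diffs) / len(diffs)

diffs = simulate_differences(reps=100)
print(sum(diffs) / len(diffs))        # mean of the simulated differences
print(fraction_at_or_below(diffs))    # simulated lower-tail fraction
```

With more repetitions the mean of the simulated differences settles close to 0, which is what the independence model predicts, and the lower-tail fraction estimates how often chance alone would produce a difference as extreme as the observed one.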
2.11 Educational attainment of couples: The table below shows the distribution of education level attained by US residents by gender based on data collected during the 2010 American Community Survey. (please round all answers to four decimal places)
       A      B      C      D      F
(a)    0.3    0.3    0.3    0.2    0.1
(b)    0      0      1      0      0
(c)    0.3    0.3    0.3    0      0
(d)    0.3    0.5    0.2    0.1    -0.1
(e)    0.2    0.4    0.2    0.1    0.1
(f)    0      -0.1   1.1    0      0
a) Distribution (a) is a/an: b) Distribution (b) is a/an: c) Distribution (c) is a/an: d) Distribution (d) is a/an: e) Distribution (e) is a/an: f) Distribution (f) is a/an:
a) Invalid probability distribution b) Valid probability distribution c) Invalid probability distribution d) Invalid probability distribution e) Valid probability distribution f) Invalid probability distribution
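These answers follow from two rules: every probability must lie between 0 and 1, and the probabilities must sum to 1. A short sketch that applies both rules to the six rows of the table (the helper name `is_valid_distribution` is illustrative):

```python
import math

def is_valid_distribution(probs):
    """A distribution is valid if every value is in [0, 1] and the total is 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one

distributions = {
    "a": [0.3, 0.3, 0.3, 0.2, 0.1],    # sums to 1.2
    "b": [0, 0, 1, 0, 0],
    "c": [0.3, 0.3, 0.3, 0, 0],        # sums to 0.9
    "d": [0.3, 0.5, 0.2, 0.1, -0.1],   # contains a negative value
    "e": [0.2, 0.4, 0.2, 0.1, 0.1],
    "f": [0, -0.1, 1.1, 0, 0],         # values outside [0, 1]
}

results = {label: is_valid_distribution(p) for label, p in distributions.items()}
for label, valid in results.items():
    print(label, "valid" if valid else "invalid")
```

Note that (d) and (f) sum to 1 but are still invalid, because a probability can be neither negative nor greater than 1.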
1.28 Reading the paper: Below are excerpts from two articles published in the NY Times a) An article titled: Risks: Smokers Found More Prone to Dementia states the following: "Researchers analyzed data from 23,123 health plan members who participated in a voluntary exam and health behavior survey from 1978 to 1985, when they were 50-60 years old. 23 years later, about 25% of the group had dementia, including 1,136 with Alzheimer's disease and 416 with vascular dementia. After adjusting for other factors, the researchers concluded that pack-a-day smokers were 37% more likely than nonsmokers to develop dementia, and the risks went up with increased smoking; 44% for one to two packs a day; and twice the risk for more than two packs." Based on this study, can we conclude that smoking causes dementia later in life? Explain your reasoning. b) Another article titled The School Bully Is Sleepy states the following: "The University of Michigan study, collected survey data from parents on each child's sleep habits and asked both parents and teachers to assess behavioral concerns. About a third of the students studied were identified by parents or teachers as having problems with disruptive behavior or bullying. The researchers found that children who had behavioral issues and those who were identified as bullies were twice as likely to have shown symptoms of sleep disorders." A friend of yours who read the article says, "The study shows that sleep disorders lead to bullying in school children." Is this statement justified? If not, how best can you describe the conclusion that can be drawn from this study?
a) No. This is an observational study, so it can show an association between smoking and dementia but cannot establish that smoking causes dementia. b) No, the statement is not justified; this is also an observational study. The most we can conclude is that sleep disorders and bullying are associated in school children.
1.39 Associations: Describe the relationship between the predictor and response variables in each of the four scatterplots below. a) Describe plot (1) above: b) Describe plot (2) above: c) Describe plot (3) above: d) Describe plot (4) above:
a) Positive, linear b) No association c) Positive, non-linear d) Negative, linear
1.36 Exercise and mental health. A researcher is interested in the effects of exercise on mental health and he proposes the following study: Use stratified random sampling to ensure representative proportions of 18-30, 31-40 and 41- 55 year old from the population. Next, randomly assign half the subjects from each age group to exercise twice a week, and instruct the rest not to exercise. Conduct a mental health exam at the beginning and at the end of the study, and compare the results. a) What type of study is this? b) What are the treatment and control groups in this study? c) Does this study make use of blocking? If so, what is the blocking variable? d) Does this study make use of blinding? e) Comment on whether or not the results of the study can be used to establish a causal relationship between exercise and mental health, and indicate whether or not the conclusions can be generalized to the population at large. f) Suppose you are given the task of determining if this proposed study should get funding. Would you have any reservations about the study proposal?
a) Experiment (the researcher randomly assigns subjects to the exercise and no-exercise groups). b) Treatment group: subjects who exercise twice a week. Control group: subjects instructed not to exercise. c) Yes, the blocking variable is age. d) No. Subjects know whether or not they are exercising, so they cannot be blinded to their group. e) Yes, the results can be used to establish a causal relationship between exercise and mental health, since subjects were randomly assigned to the groups. Because the sample was drawn by stratified random sampling, the results can also be generalized to the population at large. f) The proposal follows the principles of experimental design (control, randomize, replicate, block), so it is statistically well designed. One reservation is that subjects cannot be blinded and that an instruction not to exercise may be hard to enforce.
1.26 City council survey: A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments, and others a diverse mixture of housing structures. Identify the sampling methods described below, and comment on whether or not you think they would be effective in this setting. a) Randomly sample 50 households from the city b) Divide the city into neighborhoods, then sample 20 households from each neighborhood c) Divide the city into neighborhoods, randomly sample 10 neighborhoods, and sample all households from those neighborhoods d) Divide the city into neighborhoods, randomly sample 10 neighborhoods, and then randomly sample 20 households from those neighborhoods e) Sample the 200 households closest to the city council offices
a) Simple Random, Effective b) Stratified, Effective c) Cluster, Ineffective d) Multistage, Ineffective e) Convenience, Ineffective
1.8 Smoking habits of UK residents: A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. Note that "£" stands for British Pounds Sterling, "cig" stands for cigarettes, and "N/A" refers to a missing component of the data.
      sex      age   marital   grossIncome          smoke   amtWeekends   amtWeekdays
1     Female   42    Single    Under £2600          Yes     12 cig/day    12 cig/day
2     Male     44    Single    £10,400 to £15,600   No      N/A           N/A
3     Male     53    Married   Above £36,400        Yes     6 cig/day     6 cig/day
⋮     ⋮        ⋮     ⋮         ⋮                    ⋮       ⋮             ⋮
1130  Male     40    Single    £2600 to £5,200      Yes     8 cig/day     8 cig/day
a) What does each row of the data matrix represent? b) How many participants were included in the survey? c) Identify each variable, determine whether each variable is numerical or categorical. If the variable is numerical, specify continuous or discrete. If the variable is categorical, specify whether the variable is ordinal or not.
a) an observation b) 1130 c) The variables are sex (regular categorical), age (discrete), marital status (regular categorical), earnings (ordinal), whether or not the individual smokes (regular categorical), amount the individual smokes per day on a weekday (discrete), amount the individual smokes per day on a weekend (discrete)
1.67 Views on the DREAM Act: A random sample of registered voters from Tampa, FL were asked if they support the DREAM Act, a proposed law which would provide a path to citizenship for people brought illegally to the US as children. The survey also collected information on the political ideology of the respondents. Based on the mosaic plot shown below, do views on the DREAM Act and political ideology appear to be independent? Explain your reasoning. a) Views on the DREAM Act and political affiliation appear to be: b) Explain your reasoning:
a) dependent b) From the mosaic plot, it looks as though a higher proportion of liberals support the DREAM Act
1.68 Raise Taxes: A random sample of registered voters nationally were asked whether they think it's better to raise taxes on the rich or raise taxes on the poor. The survey also collected information on the political party affiliation of the respondents. Based on the mosaic plot shown below, do views on raising taxes and political affiliation appear to be independent? Explain your reasoning. a) Views on raising taxes and political affiliation appear to be: b) Explain your reasoning:
a) dependent b) From the mosaic plot, it looks as though a larger proportion of Democrats think it is better to raise taxes on the rich
2.17 Global warming: A research poll asked 1550 Americans "From what you've read and heard, is there solid evidence that the average temperature on earth has been getting warmer over the past few decades, or not?". The table below shows the distribution of responses by party and ideology, where the counts have been replaced with relative frequencies.
                          Earth is warming   Not warming   Don't know (or refuse)   Total
Conservative Republican   0.07               0.08          0.07                     0.22
Mod/Lib Republican        0.09               0.02          0.13                     0.24
Mod/Cons Democrat         0.12               0.02          0.08                     0.22
Liberal Democrat          0.22               0.01          0.09                     0.32
Total                     0.50               0.13          0.37                     1.00
a) Are believing that the earth is warming and being a liberal Democrat mutually exclusive? b) What is the probability that a randomly chosen respondent believes the earth is warming or is a liberal Democrat? c) What is the probability that a randomly chosen respondent believes the earth is warming given that they are a liberal Democrat? d) What is the probability that a randomly chosen respondent believes the earth is warming given that they are a conservative Republican? e) Does it appear that whether or not a respondent believes the earth is warming is independent of their party ideology? f) What is the probability that a randomly chosen respondent is a moderate/liberal Republican given that they do not believe that the earth is warming?
a) not mutually exclusive b) 0.6000 c) 0.6875 d) 0.3182 e) belief in global warming and party ideology are dependent f) 0.1538
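Parts b), c), d) and f) all come directly from the table: b) uses the general addition rule, P(A or B) = P(A) + P(B) − P(A and B), and the others use the conditional probability formula, P(A | B) = P(A and B) / P(B). A sketch of the arithmetic, with the table entered as a nested dictionary (the variable and key names are illustrative):

```python
# Joint relative frequencies from the table (rows: party/ideology, columns: belief)
table = {
    "Conservative Republican": {"warming": 0.07, "not warming": 0.08, "don't know": 0.07},
    "Mod/Lib Republican":      {"warming": 0.09, "not warming": 0.02, "don't know": 0.13},
    "Mod/Cons Democrat":       {"warming": 0.12, "not warming": 0.02, "don't know": 0.08},
    "Liberal Democrat":        {"warming": 0.22, "not warming": 0.01, "don't know": 0.09},
}

def row_total(group):
    """Marginal probability of a party/ideology group."""
    return sum(table[group].values())

def col_total(belief):
    """Marginal probability of a belief."""
    return sum(row[belief] for row in table.values())

# b) P(warming or liberal Democrat), general addition rule
p_b = col_total("warming") + row_total("Liberal Democrat") - table["Liberal Democrat"]["warming"]

# c) P(warming | liberal Democrat)
p_c = table["Liberal Democrat"]["warming"] / row_total("Liberal Democrat")

# d) P(warming | conservative Republican)
p_d = table["Conservative Republican"]["warming"] / row_total("Conservative Republican")

# f) P(moderate/liberal Republican | not warming)
p_f = table["Mod/Lib Republican"]["not warming"] / col_total("not warming")

for name, value in [("b", p_b), ("c", p_c), ("d", p_d), ("f", p_f)]:
    print(f"{name}) {value:.4f}")
```

Since the conditional probabilities in c) and d) differ from each other and from the marginal P(warming) = 0.5, belief and party ideology are dependent, as stated in e).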
1.56 Distributions and appropriate statistics (Part II): For each of the following, state whether you expect the distribution to be symmetric, right skewed, or left skewed. Also specify whether the mean or median would best represent a typical observation in the data, and whether the variability of observations would be best represented using the standard deviation or IQR. Explain your reasoning. (a) Housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000. The distribution is expected to be: A typical observation is best represented by the: The variability in the observations is best measured by the: (b) Housing prices in a country where 25% of the houses cost below $300,000, 50% of the houses cost below $600,000, 75% of the houses cost below $900,000 and very few houses that cost more than $1,200,000. The distribution is expected to be: A typical observation is best represented by the: The variability in the observations is best measured by the: (c) Number of alcoholic drinks consumed by college students in a given week. Assume that most of these students don't drink since they are under 21 years old, and only a few drink excessively. The distribution is expected to be: A typical observation is best represented by the: The variability in the observations is best measured by the: (d) Annual salaries of the employees at a Fortune 500 company where only a few high level executives earn much higher salaries than all the other employees. The distribution is expected to be: A typical observation is best represented by the: The variability in the observations is best measured by the:
a) right skewed, median, IQR b) symmetric, mean, standard deviation c) right skewed, median, IQR d) right skewed, median, IQR
1.50 Mix-and-match. Describe the distribution in the histograms below and match them to the box plots. a) Symmetrical distribution; b) Multimodal distribution; c) Right Skew distribution;
a) the match will be box plot #2. b) the match will be the box plot #3. c) the match will be the box plot #1.
5.9 Find the mean. You are given the following hypotheses: H0: μ = 60, HA: μ < 60. We know that the sample standard deviation is 8 and the sample size is 20. For what sample mean would the p-value be equal to 0.05? Assume that all conditions necessary for inference are satisfied.
For df = 20 − 1 = 19, the one-tail critical value is t*19 = 1.73. We want the lower tail, so set the T-score equal to −1.73, T = (x̄ − 60)/(8/√20) = −1.73, then solve for x̄: x̄ = 60 − 1.73 × 8/√20 ≈ 56.91.
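The arithmetic behind this answer can be checked with a few lines of code. The sketch below takes the critical value 1.73 from the t-table as given rather than computing it, and the variable names are illustrative.

```python
import math

mu_0 = 60        # null value from H0
s = 8            # sample standard deviation
n = 20           # sample size
t_star = 1.73    # one-tail critical value for df = 19 at 0.05 (from the t-table)

standard_error = s / math.sqrt(n)

# Lower-tail test: T = (x_bar - mu_0) / SE = -t_star, solved for x_bar
x_bar = mu_0 - t_star * standard_error

print(round(x_bar, 2))  # 56.91
```

Any sample mean below this value would give a p-value smaller than 0.05 for the lower-tail test.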