Introduction to Statistics: Chapter 11 Homework (Multiple Comparisons and Analysis of Variance)
A survey was given to StatCrunch users on the length of time for commuting and the method of commuting. Assume this is a random sample. A technology output for one-way ANOVA is given, along with the means and standard deviations. Divide the largest standard deviation (StDev) by the smallest, and explain why you should not use ANOVA on this data set. Assume Normality.
Divide the largest standard deviation by the smallest. Ans: The quotient is 2.35. Why should ANOVA not be used? Ans: The quotient of the largest standard deviation divided by the smallest standard deviation is too large, so the equal variance condition is not satisfied. NOTE: View the technology output.
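The equal-variance check above can be sketched in a few lines. This is only an illustration: the three standard deviations are hypothetical (the technology output is not reproduced here) and were chosen so that the quotient comes out to 2.35, and the cutoff of 2 is an assumed rule of thumb, not something stated in the problem.

```python
def sd_ratio(std_devs):
    """Return the largest standard deviation divided by the smallest."""
    if not std_devs or min(std_devs) <= 0:
        raise ValueError("need at least one positive standard deviation")
    return max(std_devs) / min(std_devs)


def equal_variance_ok(std_devs, max_ratio=2.0):
    """Assumed rule of thumb: the ratio should not exceed max_ratio."""
    return sd_ratio(std_devs) <= max_ratio


# Hypothetical standard deviations, one per commuting method.
example_sds = [10.0, 16.0, 23.5]

ratio = sd_ratio(example_sds)
print(round(ratio, 2))                 # 2.35
print(equal_variance_ok(example_sds))  # False, so ANOVA is not appropriate
```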
Three independent random samples of community college students were obtained to find out how many hours the students spent each week doing math homework outside of the classroom. The samples were made up of students enrolled in pre-algebra (PreAlg), elementary algebra (ElemAlg), and intermediate algebra (InterAlg). Use three two-sample t-tests, applying the appropriate Bonferroni Correction to achieve an overall significance level of 0.05, to compare all possible pairs of means. Assume the conditions for two-sample t-tests are met.
Fill out the table, reporting all the sample means and standard deviations.
PreAlg: 1.32 (Mean), 0.96 (SD)
ElemAlg: 3.18 (Mean), 0.75 (SD)
InterAlg: 4.42 (Mean), 2.78 (SD)
Finish the list of all three possible comparisons.
1. PreAlg compared to ElemAlg
2. PreAlg compared to InterAlg
3. ElemAlg compared to InterAlg
Find the corrected value for the significance level by dividing 0.05 by the number of comparisons. Ans: The corrected significance level is 0.05/3 = 0.0167.
We have assumed that the conditions for two-sample t-tests are met. For all tests, the null hypothesis is that the two population means are the same, and the alternative hypothesis is that the two population means are different. Complete the table below. For a significant difference, the p-value must be less than the Bonferroni-corrected value for the significance level.
PreAlg and ElemAlg: 5.08 (t-value), 0.000 (p-value), different (conclusion)
PreAlg and InterAlg: 3.64 (t-value), 0.003 (p-value), different (conclusion)
ElemAlg and InterAlg: 1.49 (t-value), 0.163 (p-value), not different (conclusion)
Write a clear conclusion based on what you found. Which groups have sample means that are significantly different, and how do they differ? Ans: PreAlg students spend significantly less time doing homework than students in the other two groups. The ElemAlg and InterAlg means are not significantly different.
NOTE: View the table of homework hours.
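The decision rule in this answer (compare each p-value with the Bonferroni-corrected level) can be written out as a short sketch. The p-values are the ones reported above; the function and variable names are illustrative only.

```python
def bonferroni_alpha(overall_alpha, n_comparisons):
    """Per-test significance level under the Bonferroni Correction."""
    return overall_alpha / n_comparisons


# p-values reported in the answer above, keyed by the pair compared.
p_values = {
    ("PreAlg", "ElemAlg"): 0.000,
    ("PreAlg", "InterAlg"): 0.003,
    ("ElemAlg", "InterAlg"): 0.163,
}

alpha_each = bonferroni_alpha(0.05, len(p_values))

# A pair is "different" only if its p-value is below the corrected level.
conclusions = {
    pair: "different" if p < alpha_each else "not different"
    for pair, p in p_values.items()
}

print(round(alpha_each, 4))  # 0.0167
for pair, verdict in conclusions.items():
    print(pair, verdict)
```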
11.3 Some software (such as SPSS) requires that ANOVA data be stacked and coded. Some software works with both stacked and unstacked data, and some (such as the TI-84) requires unstacked data. Random samples of gasoline prices were obtained in three cities and are shown in the table. Stack and code the data. For codes, use 1 for Denver, 2 for Houston, and 3 for Cleveland.
First Row: Price Second Row: Code NOTE: View the table of gasoline prices.
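As a rough sketch of what "stack and code" means, the snippet below turns one list per city (unstacked) into a single price list with a parallel list of codes (stacked). The prices are hypothetical placeholders, since the gasoline table is not reproduced here; only the codes 1, 2, and 3 come from the problem.

```python
def stack_and_code(unstacked, codes):
    """Flatten {group: [values]} into parallel lists of values and codes."""
    prices, labels = [], []
    for city, values in unstacked.items():
        for value in values:
            prices.append(value)
            labels.append(codes[city])
    return prices, labels


# Hypothetical prices; the real table is not shown in this document.
unstacked = {
    "Denver": [2.95, 3.01],
    "Houston": [2.89, 2.93],
    "Cleveland": [3.10, 3.05],
}
codes = {"Denver": 1, "Houston": 2, "Cleveland": 3}

prices, labels = stack_and_code(unstacked, codes)
print(prices)  # [2.95, 3.01, 2.89, 2.93, 3.1, 3.05]
print(labels)  # [1, 1, 2, 2, 3, 3]
```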
A random survey was done at a small college, and the students were asked how many hours a week they spent studying outside of class time. They were also asked what class they were in (1 = Freshman, 2 = Sophomore, 3 = Junior, and 4 = Senior). Assuming that the conditions for ANOVA are met, test the hypothesis that the mean number of hours of school work varies by class, reporting the p-value and conclusion. Use the 0.05 level of significance. State your conclusion in the context of the data.
H0: All of the means are the same, or there is no association between class and the number of hours spent studying. Ha: At least one of the means is different, or there is an association between class and the number of hours spent studying. The test statistic is 5.3. The p-value is 0.002. Write your conclusion. Ans: The p-value is less than the significance level, so reject the null hypothesis. The study indicates that at least one class has a different mean number of study hours.
Two students collected data on random baseball player run-times. Players were required to run 50 yards and were timed in seconds (pitchers did not have to run). Assume that the distribution of each population (outfielders, infielders, catchers) is close enough to Normal to satisfy the Normal condition for using ANOVA. Test the hypothesis that different positions have different mean run-times, using a significance level of 0.05. Do not do post-hoc tests.
H0: All of the means are the same, or there is no association between field position and speed. Ha: At least one of the means is different, or there is an association between field position and speed. Which of the conditions for ANOVA can be assumed to be satisfied based on the given information or can be shown to be satisfied? Select all that apply. Ans: equal variance, Normality, independent groups, and random and independent samples. The test statistic is 8.01. The p-value is 0.002. Write your conclusion. Ans: The p-value is less than the significance level, so reject the null hypothesis. The study indicates that the mean run-time is different for at least one position.
Information was gathered on the starting median salary for students who attended four different types of colleges. Assume the samples are random and Normal. Test the hypothesis that the population means are equal for all the types of colleges. Show all four steps for ANOVA. Do not do post-hoc comparisons. Use a significance level of 0.05.
H0: All of the means are the same, or there is no association between school type and mean starting salary. Ha: At least one of the means is different, or there is an association between school type and mean starting salary. Which of the conditions for ANOVA can be assumed to be satisfied based on the given information or can be shown to be satisfied? Select all that apply. Ans: equal variance, Normality, independent groups, and random and independent samples. The test statistic is 39.45. The p-value is 0.000. Write your conclusion. Ans: The p-value is less than the significance level, so reject the null hypothesis. The study indicates that the mean starting salary is different for at least one type of school. NOTE: View the ANOVA results.
Pulse rates were taken for five people each, in three different situations: sitting, after meditation, and after exercise. Explain why it would not be appropriate to use one-way ANOVA to test whether the population mean pulse rates were associated with activity.
The pulse rates are not in three independent groups, so the condition of independent groups fails.
11.2 Refer to the figure. Assume that all distributions are symmetric (therefore the sample mean and median are approximately equal) and that all the samples are the same size. Imagine carrying out two ANOVAs. The first compares the means based on samples A, B, and C (above the horizontal line), and the second is based on samples L, M, and N (below the horizontal line). One of the calculated values of the F statistic is 9.38, and the other is 150.00. Which value is which? Explain.
The F-value of 9.38 goes with A, B, and C. The F-value of 150.00 goes with L, M, and N. The reason for the difference is that the variation between groups (the separation between means) is larger for L, M, and N, relative to the variation within groups (which is the same in all groups).
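To see why more separation between means gives a larger F when the within-group spread is unchanged, here is a minimal sketch that computes the one-way ANOVA F statistic by hand. The two data sets are hypothetical and are not the samples from the figure; they simply share the same within-group variation while differing in how far apart the group means are.

```python
from statistics import mean


def one_way_f(groups):
    """F = (variation between groups) / (variation within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: how far each value is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical samples with identical spread inside each group.
close_means = [[9, 10, 11], [10, 11, 12], [11, 12, 13]]
far_means = [[9, 10, 11], [19, 20, 21], [29, 30, 31]]

print(one_way_f(close_means))  # 3.0
print(one_way_f(far_means))    # 300.0
```

Both data sets have the same mean square within groups, so the only thing driving F from 3 to 300 is the larger separation between the group means.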
Random samples of gasoline prices were obtained in three cities and are shown in the popup. Find Bonferroni-corrected intervals for all three comparisons assuming an overall confidence level of 95%, that is, an individual confidence level of 98.33%. Then state whether the means are significantly different based on whether the intervals capture 0 or not.
Write the confidence interval for the difference of means, City A−City B. Ans: (−0.06, 0.06)
Write the confidence interval for the difference of means, City A−City C. Ans: (−0.10, 0.01)
Write the confidence interval for the difference of means, City B−City C. Ans: (−0.10, 0.00)
Complete the table below with the results. Each interval captures 0, so no pair of means is significantly different.
City A and City B: not different
City A and City C: not different
City B and City C: not different
NOTE: View the gasoline prices table.
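The "captures 0" rule can be sketched as below, using the intervals reported in this answer. One assumption is made explicit: an endpoint that rounds to exactly 0.00 is treated as capturing 0, which matches the "not different" conclusion given for City B−City C.

```python
def captures_zero(interval):
    """True if 0 lies inside the interval (endpoints included, by assumption)."""
    low, high = interval
    return low <= 0 <= high


# Bonferroni-corrected intervals reported in the answer above.
intervals = {
    "City A - City B": (-0.06, 0.06),
    "City A - City C": (-0.10, 0.01),
    "City B - City C": (-0.10, 0.00),
}

# If the interval for a difference captures 0, the means are not
# significantly different.
results = {
    pair: "not different" if captures_zero(ci) else "different"
    for pair, ci in intervals.items()
}

for pair, verdict in results.items():
    print(pair, verdict)
```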
Random samples of gasoline prices were obtained in three cities and are shown in the popup. Complete parts (a) through (c) below.
a. Assuming the overall level of significance is 0.05, what is the Bonferroni-corrected level of significance for three groups? Ans: α = 0.05/3 = 0.0167
b. Report all three sample means. Also, state which two means are closest to each other.
City A: 2.998
City B: 2.998
City C: 3.042
Which means are the closest? Ans: The means for City A and City B are the closest.
c. Carry out two-sample t-tests for all three pairs. Summarize your findings in a table that reports the t-values, p-values, and conclusions. Base your conclusion on the Bonferroni-corrected level of significance. Do not assume equal variances. What are the hypotheses for each comparison? H0: The mean for the first city is the same as the mean for the second city. Ha: The mean for the first city is different from the mean for the second city. Complete the table below with the results.
City A−City B: 0.00 (t-value), 1.000 (p-value), not different (conclusion)
City A−City C: −2.51 (t-value), 0.038 (p-value), not different (conclusion)
City B−City C: −1.40 (t-value), 0.220 (p-value), not different (conclusion)
Note that the p-value of 0.038 for City A−City C is below 0.05 but not below the corrected level of 0.0167, so the difference is not significant.
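"Do not assume equal variances" corresponds to the unpooled (Welch) form of the two-sample t-test. The sketch below computes only the t statistic and the approximate degrees of freedom. The price lists are hypothetical, since the actual table is not shown, and the p-value would still come from statistical software using these degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # sample variances
    se_sq = v1 / n1 + v2 / n2                      # squared standard error
    t = (mean(sample1) - mean(sample2)) / sqrt(se_sq)
    df = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical gasoline prices for two cities (not the real data).
city_x = [3.00, 2.98, 3.02, 2.99, 3.01]
city_y = [3.05, 3.03, 3.07, 3.02, 3.04]

t_stat, df = welch_t(city_x, city_y)
print(round(t_stat, 2), round(df, 1))
```

When the two sample means are equal, as with City A and City B above, the numerator is 0 and so the t-value is 0.00.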
A random survey was done at a small college, and the students were asked how many hours a week they spent studying outside of class time. They were also asked what class they were in (1 = Freshman, 2 = Sophomore, 3 = Junior, and 4 = Senior). Complete parts (a) through (d) below.
a. Figure out the missing SS (sum of squares). Ans: 6,942.9
b. Find MS Error by dividing SS Error (from part (a)) by DF Error, and compare it with 64.9. Ans: MS Error is 64.9, which is approximately equal to the reported value of 64.9.
c. Divide MS class by MS Error (calculated in part (b)), and compare the result with the F-value. Ans: The result is 5.185, which is approximately equal to the reported F-value.
d. When MS factor (in this case MS class) is more than MS Error, what does that show about the F-value? Will it be more or less than 1? Ans: When MS class is more than MS Error, the F-value will be more than 1 because the dividend of the operation is greater than the divisor.
NOTE: View the partial ANOVA results from the survey.
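The arithmetic behind an ANOVA table is just two divisions, sketched below. SS Error (6,942.9) comes from the answer above. DF Error is not stated in this document, so the value 107 is an assumption, back-calculated from 6,942.9 / 64.9. The mean squares used to illustrate part (d) are hypothetical.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """MS = SS / DF."""
    if degrees_of_freedom <= 0:
        raise ValueError("degrees of freedom must be positive")
    return sum_of_squares / degrees_of_freedom


def f_ratio(ms_factor, ms_error):
    """F = MS factor / MS Error."""
    return ms_factor / ms_error


SS_ERROR = 6942.9        # from part (a)
DF_ERROR_ASSUMED = 107   # assumption: inferred from 6942.9 / 64.9, not given

ms_error = mean_square(SS_ERROR, DF_ERROR_ASSUMED)
print(round(ms_error, 1))  # 64.9

# Part (d) with hypothetical mean squares: F exceeds 1 exactly when
# MS factor exceeds MS Error.
print(f_ratio(130.0, 65.0))  # 2.0
print(f_ratio(32.5, 65.0))   # 0.5
```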
Suppose you have four groups of observations, and you do hypothesis tests (t-tests) to compare all possible pairs of means. Complete parts a and b below.
a. How many pairwise comparisons can be done with four groups? List all comparisons with four groups labeled H, I, J, and K starting with HI, HJ, etc. Ans: There are 6 comparisons: HI, HJ, HK, IJ, IK, JK
b. Using the Bonferroni Correction, what significance level should you use for each hypothesis test if you want an overall significance level of 0.025? Ans: The Bonferroni-corrected significance level is 0.025/6 = 0.0042.
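The list of pairs and the corrected level can be generated rather than written out by hand. A minimal sketch using the standard library's `itertools.combinations`:

```python
from itertools import combinations

groups = ["H", "I", "J", "K"]

# Every unordered pair of groups, in the same order as the answer above.
pairs = ["".join(pair) for pair in combinations(groups, 2)]

overall_alpha = 0.025
alpha_each = overall_alpha / len(pairs)

print(pairs)                 # ['HI', 'HJ', 'HK', 'IJ', 'IK', 'JK']
print(round(alpha_each, 4))  # 0.0042
```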
A random sample of people were asked whether they were athletic, moderately athletic (Mod), or not athletic (NotAth). Then they were tested for reaction speed. Reaction speed was measured indirectly, through reaction distance, as follows: A vertical meter stick was dropped and they caught it. The distance (in centimeters) that the stick fell is the reaction distance, and shorter distances correspond to faster reaction times. Complete parts a and b below.
a. Interpret the boxplots given. Compare the medians, interquartile ranges, and shapes, and mention any potential outliers. Choose the correct answer below. Ans: The medians and interquartile ranges are all similar. The not athletic boxplot is strongly skewed to the right. There are no potential outliers.
b. Test the hypothesis that people with different levels of athletic ability (self-described) have different mean reaction distances, reporting the F-statistic, p-value and conclusion. Assume that the distribution of each population is close enough to Normal to satisfy the Normal condition of an ANOVA and that the sample is randomly selected. Ans: H0: μ_ath = μ_mod = μ_not and Ha: At least one of the means is different from another.
What is the F-statistic? F = 0.09
What is the p-value? p-value = 0.917
What is the correct conclusion? Ans: Do not reject the null hypothesis. There is not enough evidence that the population means are different.
NOTE: View the reaction distances and use StatCrunch.
Assume you have four groups to compare through hypothesis tests and confidence intervals, and you want the overall level of significance to be 0.025 for the hypothesis tests (which is the same as a 97.5% confidence level for the confidence intervals). a. How many possible comparisons are there? b. What is the Bonferroni-corrected value of the significance level for each hypothesis test? c. What is the Bonferroni-corrected confidence level for each interval? Show your calculations.
a. There are 6 possible comparisons. b. The Bonferroni-corrected significance level is 0.025/6 = 0.0042. c. Choose the correct calculation below and enter the Bonferroni-corrected confidence level in your choice. Ans: (1−0.025/6)*100% = 99.58%
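The confidence-level calculation in part (c) generalizes to any number of groups. The sketch below uses `math.comb` to count the pairs; the function name is illustrative only.

```python
from math import comb


def bonferroni_confidence_level(overall_alpha, n_groups):
    """Per-interval confidence level (in percent) for all pairwise comparisons."""
    n_comparisons = comb(n_groups, 2)  # number of unordered pairs
    return (1 - overall_alpha / n_comparisons) * 100


# Four groups, overall significance level 0.025 (this problem).
print(round(bonferroni_confidence_level(0.025, 4), 2))  # 99.58

# Three groups, overall significance level 0.05 (the gasoline problem).
print(round(bonferroni_confidence_level(0.05, 3), 2))   # 98.33
```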
A random survey was done at a small college, and the students were asked how many hours a week they spent studying outside of class time. They were also asked what class they were in (1 = Freshman, 2 = Sophomore, 3 = Junior, and 4 = Senior). Complete parts (a) through (d) below.
a. Which class had the highest sample mean, and which class had the lowest sample mean? Ans: The Freshman class had the highest sample mean and the Senior class had the lowest sample mean.
b. Write out the null and alternative hypotheses for the effect of class on schoolwork. H0: All of the means are the same, or there is no association between class and the number of hours spent studying. Ha: At least one of the means is different, or there is an association between class and the number of hours spent studying.
c. Identify the F-value from the output. Ans: The F-value is 4.6.
d. Assuming that you found an association between class and schoolwork, would that show that class caused the different levels of schoolwork? Explain. Ans: No. The study was observational, so cause-and-effect conclusions are not valid.
NOTE: View the partial ANOVA results from the survey.
Refer to the StatCrunch output from a health and nutrition survey, which shows the effect of marital status on cholesterol. Assume the population distributions are close enough to Normal to justify using ANOVA. Use this information to complete parts a. through d. below.
a. Write the null and alternative hypotheses for the effect of marital status on cholesterol. H0: The population means are all equal, or marital status and cholesterol levels are not associated. Ha: At least one mean is different from another, or marital status and cholesterol levels are associated.
b. Identify the F-statistic from the StatCrunch output. Ans: The F-statistic is equal to 8.774316.
c. Which marital status had the largest sample mean and which had the smallest sample mean? Ans: The status with the largest sample mean is "Widowed." The status with the smallest sample mean is "Never married."
d. Assuming that you did find an association between marital status and cholesterol levels, would this association mean that marital status caused a different cholesterol level? Ans: No. This was an observational study, from which you cannot conclude causality. Can you think of a confounding factor? Ans: One possible confounder is age. For example, people in the "Never married" group may tend to be young, and youth may be the reason for their low cholesterol.
NOTE: View the StatCrunch ANOVA table.
11.1 For each situation, choose the appropriate test. Complete parts a and b below.
a. You wish to test whether an association exists between a categorical variable (such as rank of a professor at a university, assuming four ranks) and a numerical variable (such as yearly salary). Ans: ANOVA b. You wish to test whether the means of a numerical variable are different for two possible values of a categorical variable (such as yearly salary for gender). Ans: two-sample t-test