CH18


Exercise 18.33: Could preventing iron deficiency in infants by delaying the clamping of the umbilical cord help promote child development? Healthy Swedish newborns were randomly assigned to have their umbilical cord clamped either right after birth or after a 3-minute delay - Four years later, parents were instructed to administer the Ages and Stages Questionnaire (ASQ) to assess psychomotor development in their child - The ASQ is scored on a scale of 0 to 300, with higher scores indicating more developmental milestones reached Here are the summary findings: Delayed clamping - n = 130 - M = 278.9 - s = 21.6 Early clamping - n = 115 - M = 275.5 - s = 27.6 Obtain a 95% confidence interval for the difference in mean ASQ score for the populations of 4-year-old Swedish children with delayed and early cord clamping at birth

(−2.9, 9.7) - The evidence is inconclusive on the effectiveness of delayed cord clamping
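The interval above can be reproduced by hand from the summary statistics. This is a sketch, not the textbook's own working: the critical value t* ≈ 1.971 is an assumption (the 95% value for the roughly 215 degrees of freedom the two-sample df formula gives here), since the card does not state it.

```latex
\begin{aligned}
M_1 - M_2 &= 278.9 - 275.5 = 3.4 \\
SE &= \sqrt{\frac{21.6^2}{130} + \frac{27.6^2}{115}}
    = \sqrt{3.589 + 6.624} \approx 3.196 \\
m &= t^{*} \cdot SE \approx 1.971 \times 3.196 \approx 6.30 \\
(M_1 - M_2) \pm m &= 3.4 \pm 6.30 \;\Rightarrow\; (-2.9,\ 9.7)
\end{aligned}
```

Because the interval contains 0, the data are compatible with no difference in mean ASQ score, which is why the evidence is called inconclusive.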

Conditions for Inference when Comparing Two Means:

1. We have TWO SIMPLE RANDOM SAMPLES, representing two distinct populations - The samples are INDEPENDENT • That is, the INDIVIDUALS IN ONE SAMPLE are UNRELATED TO the INDIVIDUALS IN the OTHER SAMPLE (e.g., matching violates independence) - We MEASURE THE SAME QUANTITATIVE VARIABLE for both samples 2. Both populations are NORMALLY DISTRIBUTED - The MEANS AND STANDARD DEVIATIONS of the populations are UNKNOWN - In practice, it is enough that the DISTRIBUTIONS HAVE SIMILAR SHAPES and that the data have NO STRONG OUTLIERS

Check Your Skills 18.17: A study of the effects of exercise used rats bred to have high or low capacity for exercise - The 8 high-capacity rats had mean blood pressure 89 millimeters of mercury (mm Hg) and standard deviation 9 mm Hg - The 8 low-capacity rats had mean blood pressure 105 mm Hg with standard deviation 13 mm Hg The two-sample t test has degrees of freedom df = 12.46 The P-value corresponding to the hypotheses specified in Exercise 18.16 satisfies... a. 0.005 < P < 0.01 b. 0.01 < P < 0.02 c. 0.02 < P < 0.05

A

Two-Sample Problems in Essence

A two-sample problem can arise from a RANDOMIZED COMPARATIVE EXPERIMENT that RANDOMLY DIVIDES SUBJECTS INTO TWO GROUPS and EXPOSES EACH GROUP TO A DIFFERENT TREATMENT - COMPARING RANDOM SAMPLES SELECTED SEPARATELY FROM TWO POPULATIONS is also a two-sample problem - So is COMPARING TWO SUBSETS OF INDIVIDUALS WITHIN A GENERAL POPULATION by SPLITTING ONE RANDOM SAMPLE INTO CORRESPONDING SUBSETS (e.g., males vs. females) Unlike in the matched pairs designs covered in CH17: - There is NO MATCHING OF THE INDIVIDUALS in two-sample problems, and THE TWO SAMPLES CAN BE OF DIFFERENT SIZES - The computations required for inference procedures for two-sample data differ from those for matched pairs When comparing two groups or two conditions, these groups or conditions form, in essence, an explanatory variable (factor) - You can think of two-sample problems comparing two population means as BIVARIATE PROBLEMS with a QUANTITATIVE RESPONSE VARIABLE and a TWO-LEVEL FACTOR Comparing two populations or the population responses to two treatments starts with data analysis: - Make BOXplots, DOTplots (for SMALL samples), or HISTOGRAMS (for LARGER samples) and then COMPARE the SHAPES, CENTERS, and SPREADS OF THE TWO SAMPLES The most common goal of inference is to COMPARE THE AVERAGE/TYPICAL RESPONSES in the two populations - When data analysis suggests that both population distributions are SYMMETRIC, and especially when they are at least APPROXIMATELY NORMAL, we want to COMPARE THE POPULATION MEANS

Check Your Skills 18.12: Do chimpanzees trust their friend? Using a game setting, researchers were able to compute a trust index based on how often the main player would enlist the help of a partner, hoping that the partner would share the reward - Each chimpanzee performed the game, in random order, once when interacting with a close friend chimpanzee and once when interacting with a non-friend chimpanzee - The study report states that "chimpanzees trust their friends significantly more frequently than their non-friends" Which procedure is used to form this conclusion? a. The one-sample t test b. The matched pairs t test c. The two-sample t test

B

Check Your Skills 18.19: One major reason that the two-sample t procedures are widely used is that they are quite *robust* This means that... a. t procedures do not require that we know the standard deviations of the populations b. Confidence levels and P-values from the t procedures are quite accurate even if the population distribution is not exactly Normal c. t procedures compare population means—a comparison that answers many practical questions

B

Check Your Skills 18.18: A study of the effects of exercise used rats bred to have high or low capacity for exercise - The 8 high-capacity rats had mean blood pressure 89 millimeters of mercury (mm Hg) and standard deviation 9 mm Hg - The 8 low-capacity rats had mean blood pressure 105 mm Hg with standard deviation 13 mm Hg We are 95% confident that the difference μ(LC) - μ(HC) is... a. 16 mm Hg b. A value between 3.9 and 28.1 mm Hg c. A value between 5.1 and 26.9 mm Hg

B
M₁ - M₂ = 105 - 89 = 16
t* = 2.170, from invT(0.975, 12.46) (invT((1 - 0.95)/2, 12.46) returns the negative value, -2.170)
SE = √((s₁²/n₁) + (s₂²/n₂)) = √((13²/8) + (9²/8)) = √(21.125 + 10.125) = √31.25 = 5.5902
m = t*(SE) = (2.170)(5.5902) = 12.13
(M₁ - M₂) ± m: 16 + 12.13 = 28.13 and 16 - 12.13 = 3.87

Check Your Skills 18.15: A study of the effects of exercise used rats bred to have high or low capacity for exercise - The 8 high-capacity rats had mean blood pressure 89 millimeters of mercury (mm Hg) and standard deviation 9 mm Hg - The 8 low-capacity rats had mean blood pressure 105 mm Hg with standard deviation 13 mm Hg What is the value of the two-sample t statistic for comparing the two population means? a. 0.5 b. 2.86 c. 9.65

B
t = (M₁ - M₂)/√((s₁²/n₁) + (s₂²/n₂)) = (105 - 89)/√((13²/8) + (9²/8)) = 16/√(21.125 + 10.125) = 16/√31.25 = 2.862

Check Your Skills 18.13: The study in Exercise 18.5 compared individuals who have Highly Superior Autobiographical Memory (HSAM) with control individuals without this ability - Participants were asked to take a verbal association test
Here is the CrunchIt! output for the two-sample t test asking whether individuals with HSAM score significantly higher than control individuals on such a measure of verbal association:
Null hypothesis: difference of means = 0
Alternative hypothesis: difference of means > 0
        n    M       SD
HSAM    9    3.4444  2.506
CON     18   3       2.301
df: 14.92
Difference of means: 0.4444
t-statistic: 0.4463
P-value: 0.3309
What is the value of the test statistic for this two-sample t test? a. 0 b. 0.4463 c. 12.92

B
t = (M₁ - M₂)/√((s₁²/n₁) + (s₂²/n₂)) = (3.4444 - 3)/√((2.506²/9) + (2.301²/18)) = 0.4444/√(0.697782 + 0.294145) = 0.4444/√0.99193 = 0.4444/0.995955 = 0.4462
(The software value 0.4463 differs only because it uses unrounded summary statistics)

Check Your Skills 18.11: Topiramate can reduce drinking in alcohol-dependent individuals but would it work for heavy drinkers who are not alcohol-dependent? A study recruited 138 heavy drinkers who wanted to reduce their alcohol consumption and randomly assigned them to take either topiramate or a placebo for three months - The study report states that "topiramate treatment significantly reduced heavy drinking days relative to placebo" Which procedure is used to form this conclusion? a. The one-sample t test b. The matched pairs t test c. The two-sample t test

C

Check Your Skills 18.14: The study in Exercise 18.5 compared individuals who have Highly Superior Autobiographical Memory (HSAM) with control individuals without this ability - Participants were asked to take a verbal association test
Here is the CrunchIt! output for the two-sample t test asking whether individuals with HSAM score significantly higher than control individuals on such a measure of verbal association:
Null hypothesis: difference of means = 0
Alternative hypothesis: difference of means > 0
        n    M       SD
HSAM    9    3.4444  2.506
CON     18   3       2.301
df: 14.92
Difference of means: 0.4444
t-statistic: 0.4463
P-value: 0.3309
From the two-sample t test we conclude that... a. Individuals with HSAM have significantly greater abilities on verbal association tests than non-HSAM individuals, on average b. Individuals with or without HSAM have the same abilities on verbal association tests, on average c. The study failed to find significant evidence that, on average, individuals with HSAM have greater abilities on verbal association tests than non-HSAM individuals

C

Check Your Skills 18.16: A study of the effects of exercise used rats bred to have high or low capacity for exercise - The 8 high-capacity rats had mean blood pressure 89 millimeters of mercury (mm Hg) and standard deviation 9 mm Hg - The 8 low-capacity rats had mean blood pressure 105 mm Hg with standard deviation 13 mm Hg We suspect that rats with a low capacity for exercise tend to have a higher blood pressure than rats with a high capacity for exercise To see if this is true, test these hypotheses for the mean blood pressure of all rats with high capacity (HC) and all rats with low capacity (LC) for exercise: a. H₀: μ(HC) = 89; μ(LC) = 105 versus Hₐ: μ(HC) ≠ 89; μ(LC) ≠ 105 b. H₀: μ(HC) = μ(LC) versus Hₐ: μ(HC) ≠ μ(LC) c. H₀: μ(HC) = μ(LC) versus Hₐ: μ(HC) < μ(LC)

C

Check Your Skills 18.20: A study of road rage asked samples of 596 men and 523 women about their behavior while driving - Based on their answers, each subject was assigned a road rage score on a scale of 0 to 20 - The subjects were chosen by random digit dialing of telephone numbers Are the conditions for two-sample t inference satisfied? a. Maybe: The SRS condition is acceptable but we need to look at the data to check Normality b. No: Scores in a range between 0 and 20 can't be Normal c. Yes: The SRS condition is acceptable and large sample sizes make the Normality condition unnecessary

C

Comparing Two Means, in Context

Call the variable we measure x₁ in the first population and x₂ in the second, because the variable may have different distributions in the two populations
Here is the notation we will use to describe the two populations:
Pop.   Variable   Pop. Mean   Pop. Standard Deviation
1      x₁         μ₁          σ₁
2      x₂         μ₂          σ₂
There are four unknown parameters: the two means and the two standard deviations - The subscripts remind us which population a parameter describes
We want to compare the two population means, either by: - Giving a confidence interval for their difference μ₁ - μ₂ - Testing the hypothesis H₀: μ₁ - μ₂ = 0, which is the same as H₀: μ₁ = μ₂, a statement of no difference
We use the sample means and standard deviations to estimate the unknown parameters - Again, subscripts remind us which sample a statistic comes from
Here is the notation that describes the samples:
Population   Sample Size   Sample M   Sample SD
1            n₁            M₁         SD₁
2            n₂            M₂         SD₂
(The formulas in this set write the sample standard deviations SD₁ and SD₂ as s₁ and s₂)
To perform inference about the difference μ₁ - μ₂ between the means of the two populations, we start from the difference M₁ - M₂ between the means of the two samples

Two-Sample t Statistic

Draw an SRS of size n₁ from a large Normal population with unknown mean μ₁ and standard deviation σ₁ - Draw an independent SRS of size n₂ from another large Normal population with unknown mean μ₂ and standard deviation σ₂
The distribution of the two-sample t statistic:
- t = ((M₁ - M₂) - (μ₁ - μ₂))/√((s₁²/n₁) + (s₂²/n₂))
- Is very close to the t distribution with degrees of freedom df given by:
• df = [(s₁²/n₁) + (s₂²/n₂)]²/{[1/(n₁ - 1)](s₁²/n₁)² + [1/(n₂ - 1)](s₂²/n₂)²}
This approximation is accurate when both sample sizes n₁ and n₂ are 5 or larger
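The statistic and its degrees of freedom can be sketched in a few lines of Python. The function name `welch_t_and_df` is a hypothetical helper, not something from the textbook, and it computes t for the null hypothesis μ₁ - μ₂ = 0. The two calls reuse summary statistics from the rat and rosiglitazone cards in this set.

```python
import math


def welch_t_and_df(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic (for H0: mu1 - mu2 = 0) and its approximate df."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Rat blood pressure data (low capacity minus high capacity).
t_rats, df_rats = welch_t_and_df(105, 13, 8, 89, 9, 8)
print(round(t_rats, 3), round(df_rats, 2))  # 2.862 12.46

# Rosiglitazone data from Exercise 18.29 (1 mg minus 2 mg).
t_drug, df_drug = welch_t_and_df(76, 13, 32, 156, 42, 32)
print(round(t_drug, 2), round(df_drug, 2))  # -10.29 36.89
```

Both results agree with the values quoted on the corresponding cards (t = 2.862 with df = 12.46, and t = −10.29 with df = 36.89).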

Two-Sample t Procedures

Draw an SRS of size n₁ from a large Normal population with unknown mean μ₁ - Draw an independent SRS of size n₂ from another large Normal population with unknown mean μ₂ - For the purpose of inference about the unknown quantity μ₁ - μ₂ A level C CONFIDENCE INTERVAL FOR μ₁ - μ₂ is given by: - (M₁ - M₂) ± t*[√((s₁²/n₁) + (s₂²/n₂))] - Where t* is the critical value with area C between -t* and t* under the t density curve with the appropriate degrees of freedom To TEST THE HYPOTHESIS H₀: μ₁ - μ₂ = 0 (equivalent to μ₁ = μ₂), obtain the TWO-SAMPLE T STATISTIC: - t = (M₁ - M₂)/[√((s₁²/n₁) + (s₂²/n₂))] - Where the test P-VALUE is the probability, when H₀ is true, of getting a test statistic t at least as extreme in the direction of Hₐ as that obtained, and is computed as the corresponding area under the t distribution with the appropriate degrees of freedom Any technology with basic statistical inference capabilities will perform all necessary computations and produce the lower and upper boundaries for a confidence interval or the test statistic and the P-value for a hypothesis test These procedures are accurate when both populations are Normal and both sample sizes n₁ and n₂ are 5 or larger - They are approximately correct for large enough sample sizes otherwise
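A minimal sketch of the confidence interval formula follows. The helper `two_sample_ci` is hypothetical, and it takes the critical value t* as an input rather than computing it, because the Python standard library has no t quantile function. The example uses the rat data with t* = 2.170, the value quoted for df = 12.46 in the worked answer to Check Your Skills 18.18.

```python
import math


def two_sample_ci(m1, s1, n1, m2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2; t_star must match the level and df."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of M1 - M2
    margin = t_star * se
    diff = m1 - m2
    return diff - margin, diff + margin


# Rat data (low capacity minus high capacity), 95% confidence.
low, high = two_sample_ci(105, 13, 8, 89, 9, 8, t_star=2.170)
print(round(low, 2), round(high, 2))  # 3.87 28.13
```

In practice t* would come from software or Table C for the appropriate degrees of freedom.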

Exercise 18.29: Avandia (rosiglitazone maleate) is an oral antidiabetic drug produced by the pharmaceutical company GlaxoSmithKline - Before a drug can be prescribed, we must know how the body absorbs and excretes it Patients were given a single dose of either 1 milligram (mg) or 2 mg of rosiglitazone maleate, and the maximum plasma concentration of the drug (in nanograms per milliliter, ng/ml) was assessed Treatment 1 mg - n = 32 - M = 76 - s = 13 2 mg - n = 32 - M = 156 - s = 42 Is there significant evidence that maximum plasma concentration is dose-dependent (that is, higher doses result in higher concentrations)?

H₀: μ₁ = μ₂ - Hₐ: μ₁ < μ₂ - t = −10.29 - P (df = 36.89) = 1.07 × 10^(−12)

Exercise 18.37: The initial pool of volunteers for the study described in Exercise 18.35 was larger - A few individuals dropped out for various reasons after the study started Here are the retention rates for all three groups: Acupuncture 91% Sham 94% Waitlist 84% Comparatively more patients dropped out of the study in the waitlist group - This could be just chance, or maybe individuals with fewer migraines lost patience with being left on the wait list for several weeks, or maybe those who least believed in the effectiveness of acupuncture decided to drop out How could this fact affect the conclusions of the study?

If those who had fewer migraines dropped out the most, then those remaining would have a higher migraine rate - Since the wait list had the highest dropout rates, this would imply that the wait list group would have a disproportional increase in migraines - If those who believed least in the acupuncture dropped out, the remaining subjects might feel that only acupuncture can help them

Avoid Inference about Standard Deviations

In a two-sample setting with quantitative data, we may wish to compare either the centers or the spreads of the two groups - In a Normal population, we measure the center by the mean and the spread by the standard deviation - We use the t procedures for inference about population means for Normal populations, and we know that t procedures are widely useful for non-Normal populations as well - It is natural to turn next to inference about the standard deviations of Normal populations Our advice here is short and clear: - Don't do it without expert advice There are methods for inference about the standard deviations of Normal populations - The most common such method is the F test for comparing the spread of two Normal populations - Unlike the t procedures for means, the F test is extremely sensitive to non-Normal distributions - This lack of robustness does not improve in large samples - It is difficult in practice to tell whether a significant test result is evidence of unequal population spreads or simply a sign that the populations are not Normal The deeper difficulty underlying the very poor robustness of Normal population procedures for inference about spread has already been seen in our work on describing data - The standard deviation is a natural measure of spread for Normal distributions, but not for distributions in general - In fact, because skewed distributions have unequally spread tails, no single numerical measure does a good job of describing the spread of a skewed distribution In summary, the standard deviation is not always a useful parameter, and even when it is (for symmetric distributions), the results of inference are not always trustworthy - Consequently, we do not recommend trying to perform inference about population standard deviations in basic statistical practice

Avoid the Pooled Two-Sample t Procedures

Most software packages, including those illustrated in Figure 18.5, offer a choice of two-sample t statistics - One is often labeled for "unequal" variances, and the other for "equal" variances
The "unequal" variance procedure is our two-sample t - This test is valid whether or not the population variances are equal
The other choice is a special version of the two-sample t statistic that assumes the two populations have the same variance - It averages (the statistical term is "pools") the two sample variances to estimate the common population variance - The resulting statistic is called the pooled two-sample t statistic - It is calculated as:
• t(pooled) = (M₁ - M₂)/[s(pooled)√((1/n₁) + (1/n₂))]
• s(pooled) = √{[(n₁ - 1)s₁² + (n₂ - 1)s₂²]/(n₁ + n₂ - 2)}
- Where s(pooled)² is the pooled sample variance - The pooled t statistic (which is directly related to the ANOVA procedure we will discuss in CH24) is equal to our t statistic if the two sample sizes are the same, but not otherwise
We could choose to use the pooled t for tests and confidence intervals - The pooled t statistic has exactly the t distribution with n₁ + n₂ - 2 degrees of freedom if the two population variances really are equal and the population distributions are exactly Normal - Of course, in the real world, distributions are not exactly Normal, and population variances are not exactly equal
The requirement for equal population variances is one more test assumption to check, and simulations show that the "unequal" variances t procedures are almost always more accurate than the pooled procedures
Our advice: - Always use the t procedures for "unequal" variances if you can
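The claim that the pooled and "unequal" variance statistics coincide only for equal sample sizes can be checked numerically. The sketch below uses two hypothetical helper functions and summary statistics from elsewhere in this set: the rat data (n₁ = n₂ = 8) and the chicken hatching weights (n₁ = 45, n₂ = 54).

```python
import math


def unpooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic that does not assume equal variances."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled two-sample t statistic, which assumes equal variances."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


rats = (105, 13, 8, 89, 9, 8)                  # equal sample sizes
chickens = (45.14, 3.32, 45, 44.99, 4.57, 54)  # unequal sample sizes

equal_n_match = math.isclose(unpooled_t(*rats), pooled_t(*rats))
unequal_n_match = math.isclose(unpooled_t(*chickens), pooled_t(*chickens))
print(equal_n_match, unequal_n_match)  # True False
```

Even when the two statistics agree, the procedures still differ in their degrees of freedom: n₁ + n₂ - 2 for the pooled version, the approximate df formula for the other.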

Ex 18.2: STATE: People gain weight when they take in more energy from food than they expend - James Levine and his collaborators at the Mayo Clinic investigated the link between obesity and energy spent on daily activity
Choose 20 healthy volunteers who don't exercise - Deliberately choose 10 who are lean and 10 who are mildly obese but still healthy - Attach sensors that monitor the subjects' every move for 10 days
Table 18.1 presents data on the time (in minutes per day) that the subjects spent standing or walking, sitting, and lying down
Grp     Subj   Stand/walk   Sit       Lie
Lean    1      511.100      370.300   555.500
Lean    2      607.925      374.512   450.650
Lean    3      319.212      582.138   537.362
Lean    4      584.644      357.144   489.269
Lean    5      578.869      348.994   514.081
Lean    6      543.388      585.312   506.500
Lean    7      677.188      268.188   467.700
Lean    8      555.656      322.219   567.006
Lean    9      374.831      537.031   531.431
Lean    10     504.700      528.838   396.962
Obese   11     260.244      646.281   521.044
Obese   12     464.756      456.644   514.931
Obese   13     367.138      578.662   563.300
Obese   14     413.667      463.333   532.208
Obese   15     347.375      567.556   504.931
Obese   16     416.531      567.556   448.856
Obese   17     358.650      621.262   460.550
Obese   18     267.344      646.181   509.981
Obese   19     410.631      572.769   448.706
Obese   20     426.356      591.369   412.919
Do lean and obese people differ in the average time they spend standing and walking?

PLAN: Examine the data, check the conditions for inference, and carry out a test of hypotheses - We suspect in advance that lean subjects (Group 1) are more active than obese subjects (Group 2), so we test the hypotheses • H₀: μ₁ = μ₂ (μ₁ - μ₂ = 0) • Hₐ: μ₁ > μ₂ (μ₁ - μ₂ > 0) SOLVE: CONDITIONS FOR INFERENCE - Are the conditions for inference met? - The subjects are volunteers, so the groups are not SRSs from all lean and mildly obese adults - The study tried to recruit comparable groups: • All worked in sedentary jobs, none smoked or took medication, and so on - Setting clear standards like these helps make up for the fact that we can't reasonably get SRSs for so invasive a study - The subjects were not told that they were chosen from a larger group of volunteers because they did not exercise and were either lean or mildly obese - Because their willingness to volunteer wasn't related to the purpose of the experiment, we will treat them as two independent SRSs - A dotplot of the two groups (Figure 18.1) displays the data in detail - The distributions are a bit irregular, as we expect with just 10 observations - There are no clear departures from Normality such as extreme outliers or skewness - This is confirmed in a Normal quantile plot (not shown)

Ex 18.5: STATE: Infection of chickens with the avian flu is a threat to both poultry production and human health - A research team created transgenic chickens resistant to avian flu infection - Could the modification affect chickens in other ways? - The researchers compared the hatching weights (in grams, g) of 45 transgenic chickens and 54 independently selected commercial chickens of the same breed The data are displayed in Table 18.2: Transgenic 38.8 39.0 39.7 40.0 40.8 40.9 41.0 41.0 41.0 42.5 42.6 43.0 43.0 43.4 43.5 43.5 43.8 44.4 44.7 44.7 44.7 45.3 45.7 45.8 46.4 46.5 46.6 46.7 46.7 46.8 46.9 47.1 47.1 47.1 47.3 47.6 47.7 48.1 48.3 49.3 49.3 49.8 50.3 50.9 52.1 Commercial 36.7 37.1 38.9 39.5 39.5 39.8 40.0 40.2 40.3 40.5 40.5 40.7 41.1 41.2 41.5 41.5 41.6 41.6 41.7 42.4 43.1 43.3 43.3 43.4 43.7 44.1 44.2 45.2 45.3 45.4 46.0 46.1 46.4 46.6 46.6 46.9 47.3 47.5 48.1 48.2 48.4 48.6 49.0 49.1 49.3 49.6 50.1 50.2 50.4 50.6 52.2 53.0 55.5 56.4

PLAN: We will examine the data and check the conditions for inference - We had no specific direction for the difference in weight before looking at the data, so the alternative should be two-sided - We will test the hypotheses: • H₀: μ₁ = μ₂ (μ₁ - μ₂ = 0) • Hₐ: μ₁ ≠ μ₂ (μ₁ - μ₂ ≠ 0)
SOLVE: The two samples can be regarded as SRSs from two populations, although the population of transgenic chickens is a potential population not yet created - Figure 18.4 shows side-by-side boxplots of the two data sets - We find no outliers (which would appear as asterisks on the graph) or skew, and we can reasonably assume a Normal distribution model for both groups - Thus, the t procedures should be valid
Figure 18.5 shows the Minitab output for a two-sample t test with a two-sided alternative - The mean hatching weights are 45.14 g for the sample of transgenic chickens and 44.99 g for the sample of commercial chickens - The test statistic is t = 0.19 and the corresponding two-sided P-value is 0.847, which is clearly not statistically significant
CONCLUDE: The data give no significant evidence (P = 0.85) that the population of transgenic chickens differs in mean hatching weight from the population of commercial chickens of the same breed
Here are the calculations involved in the two-sample t test:
Summary statistics (also shown in the Minitab output):
Grp   Condition    n    M       s
1     Transgenic   45   45.14   3.32
2     Commercial   54   44.99   4.57
Degrees of freedom of the t distribution:
df = [(s₁²/n₁) + (s₂²/n₂)]²/{[1/(n₁ - 1)](s₁²/n₁)² + [1/(n₂ - 1)](s₂²/n₂)²} = [(3.32²/45) + (4.57²/54)]²/{(1/44)(3.32²/45)² + (1/53)(4.57²/54)²} = 95.3
Two-sample t statistic:
t = (M₁ - M₂)/√((s₁²/n₁) + (s₂²/n₂)) = (45.14 - 44.99)/√((3.32²/45) + (4.57²/54)) = 0.15/0.795 = 0.19
P-value: - Probability, if the null hypothesis were true, of obtaining a test statistic at least as extreme in the direction of the alternative as the value computed - This is the probability under the t(95.3) distribution of getting a t
value either greater than 0.19 or less than -0.19 (because our alternative is two-sided) - As Figure 18.6 illustrates, the two corresponding areas have the same size so we can compute, for example, just the upper-tail area and then double it - In Excel (using a t value rounded to two decimal places), the command =2*T.DIST.RT(0.19,95.3) returns the value 0.84971 - Minitab, which does not round t but rounds df to 95 gives P = 0.847 - Again, these differences are very minor If no statistical software is available, Table C can be used to assess the order of magnitude of the test P-value - As there is no df = 95 in Table C, we select the nearest lower value available, df = 80 (it is more conservative and typically preferable to select a smaller value) - Compare t = 0.19 with the values listed in the row for df = 80 and find the corresponding two-sided P - This tells us that the P-value for our test is greater than 0.50 A common mistake we discussed in CH15 (pg. 386) is to believe that a large P-value offers strong support for the null hypothesis - A large P-value says only that the data would not be surprising if H₀ was true, but it doesn't imply that H₀ is necessarily true - In fact, the Minitab output in Figure 18.5 reports the range -1.424 g to 1.731 g as a 95% confidence interval for μ₁ - μ₂ Therefore μ₁ - μ₂ could very well be a negative value, a positive value, or even zero: - Perhaps the commercial chickens are heavier at birth on average, or perhaps the transgenic chickens are heavier on average - Perhaps the genetic modification has no impact on hatching weight and the two populations have exactly the same mean hatching weight A P-value that isn't small is simply inconclusive, no matter how large it is
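When no statistical software is at hand, the tail areas can also be approximated numerically. The sketch below integrates the t density with Simpson's rule using only the Python standard library. The function names and the step count are illustrative choices, and this is not how Excel, Minitab, or the TI calculators compute their values. Unlike Table C, it accepts non-integer degrees of freedom.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution; df may be any positive real number."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, steps=2000):
    """Area to the right of t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies to the right of 0


# Two-sided P-value for the chicken example: double the upper-tail area.
p_two_sided = 2 * upper_tail(0.19, 95.3)
# One-sided P-value for the lean vs. obese example (Ex 18.3).
p_one_sided = upper_tail(3.808, 15.17)
print(round(p_two_sided, 2), round(p_one_sided, 4))
```

The results should land close to the software values quoted in this set (about 0.85 and about 0.0008). This approach loses precision for extremely small tail areas, where a dedicated statistics library is the better tool.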

Apply Your Knowledge 18.3: Identify the response variable and whether the setting involves (1) a single sample, (2) matched pairs, or (3) two independent samples - The procedures of Chapter 17 apply to designs (1) and (2) To check a new analytical method, a chemist obtains a reference specimen of known concentration from the National Institute of Standards and Technology - She then makes 20 measurements of the concentration of this specimen with the new method and checks for bias by comparing the mean result with the known concentration

Response Variable - Concentration Setting - Single Sample

Apply Your Knowledge 18.1: Identify the response variable and whether the setting involves (1) a single sample, (2) matched pairs, or (3) two independent samples - The procedures of Chapter 17 apply to designs (1) and (2) The physiological blind spot refers to a very small zone of functional blindness in the eye where the optic nerve passes through the retina - We do not notice it because our nervous system compensates for it - Can eye training reduce the size of a person's physiological blind spot? Researchers recruited a representative sample of 10 adults with normal vision - Each participant performed training exercises with one eye for three weeks - The size of the physiological blind spot was measured (in degrees of visual angle squared) with a motion detection task both prior to training and again after the training was completed

Response Variable - Blind spot size Setting - Matched Pairs

Ex 18.3: We can now complete Ex 18.2 SOLVE: INFERENCE Software will find the summary statistics for both samples, the test statistic t, and the P-value, as shown in Figure 18.2 - The sample means, in particular, are M₁ = 525.75 and M₂ = 373.27 minutes, for a difference M₁ - M₂ = 152.48 minutes - The test statistic is t = 3.81, rounded to two decimal places - The one-sided P-value is reported as 0.0008 (Minitab rounds all P-values to three decimal places, so it gives P = 0.001)

The computations performed by software in this example involve the following steps:
Summary statistics:
Grp             n    Mean M    Std. dev. s
Grp 1 (lean)    10   525.751   107.121
Grp 2 (obese)   10   373.269   67.498
Degrees of freedom of the t distribution:
df = [(s₁²/n₁) + (s₂²/n₂)]²/{[1/(n₁ - 1)](s₁²/n₁)² + [1/(n₂ - 1)](s₂²/n₂)²} = [(107.121²/10) + (67.498²/10)]²/{[1/(10 - 1)](107.121²/10)² + [1/(10 - 1)](67.498²/10)²} = (1147.491 + 455.598)²/[(1147.491²/9) + (455.598²/9)] = 2,569,894/169,367.2 = 15.1735 ≈ 15
Two-sample t statistic comparing the average minutes spent standing and walking in Group 1 (lean) and Group 2 (obese):
t = (M₁ - M₂)/√((s₁²/n₁) + (s₂²/n₂)) = (525.751 - 373.269)/√((107.121²/10) + (67.498²/10)) = 152.482/√(1147.491 + 455.598) = 152.482/40.039 = 3.808
P-value: - As shown in Figure 18.3, this is the area to the right of t = 3.808 under the t(15) distribution, because the alternative hypothesis is one-sided in the greater-than direction - The TI-83 calculator command tcdf(3.808, 1E99, 15.17) gives a P-value of 0.00084, whereas tcdf(3.808, 1E99, 15) gives a P-value of 0.00086
CONCLUDE: - There is very strong evidence (P = 0.0008) that, on average, lean people spend more time walking and standing than do moderately obese people - In the study, the average daily walking and standing times were roughly 526 minutes for the lean group and 373 minutes for the moderately obese group
Note that the two-sample t procedures we have described work whether or not the two populations have the same variance
The findings come from an observational study, which means that potential confounding variables prevent us from reaching a causal conclusion - For instance, perhaps people who gain weight are so depressed by the experience that they become less active - Or perhaps other factors, such as people's metabolism or their environment, influence both their activity level and their weight - It is impossible from this study alone to determine what causes the observed association between
body type and daily activity The P-value 0.0008 is highly statistically significant - This doesn't imply that μ(lean) is much larger than μ(obese), only that the evidence that μ(lean) is larger than μ(obese) is very strong - Obtaining a confidence interval for μ(lean) - μ(obese) is a better way to assess how much more lean people stand and walk, on average, compared with mildly obese people
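The summary statistics, degrees of freedom, and t statistic in this example can be recomputed from the raw stand/walk times in Table 18.1. The sketch below is one way to do it with the Python standard library. The variable names are illustrative, and the P-value step is left out because it needs a t distribution function.

```python
import math
from statistics import mean, stdev

# Minutes per day standing or walking, copied from Table 18.1.
lean = [511.100, 607.925, 319.212, 584.644, 578.869,
        543.388, 677.188, 555.656, 374.831, 504.700]
obese = [260.244, 464.756, 367.138, 413.667, 347.375,
         416.531, 358.650, 267.344, 410.631, 426.356]

m1, s1, n1 = mean(lean), stdev(lean), len(lean)
m2, s2, n2 = mean(obese), stdev(obese), len(obese)

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # s^2/n for each group
se = math.sqrt(v1 + v2)              # standard error of M1 - M2
t = (m1 - m2) / se
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(round(m1, 3), round(s1, 3))  # lean: mean and SD
print(round(m2, 3), round(s2, 3))  # obese: mean and SD
print(round(t, 3), round(df, 2))   # t statistic and df
```

Here `stdev` is the sample standard deviation (divisor n - 1), which is the s used throughout these formulas.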

Two-Sample Problems

The goal of inference is to compare the population responses to two treatments or to compare the characteristics of two populations - Using TWO DISTINCT SAMPLES OF UNRELATED INDIVIDUALS to COMPARE TWO POPULATIONS OR TWO TREATMENTS We have a SEPARATE SAMPLE FOR EACH treatment/population

Robustness of Two-Sample Procedures

The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric When the sizes of the two samples are equal and the two populations being compared have distributions with similar shapes, probability values from the t density curve are quite accurate for a broad range of population distributions when the sample sizes are as small as n₁ = n₂ = 5 When the two populations have different shapes, larger samples are needed As a guide to practice, we can adapt the guidelines given on pg. 435 for the use of one-sample t procedures to two-sample procedures by replacing "sample size" with "sum of the sample sizes," n₁ + n₂ - These guidelines err on the side of safety, especially when the two samples are of equal size - In planning a two-sample study, choose equal sample sizes whenever possible - The two-sample t procedures are most robust against non-Normality in this case

Two-Sample t-Procedures in Depth

The values M₁ and M₂ are sample statistics - Therefore, the difference M₁ - M₂ we actually compute is specific to the two independent random samples we selected - A different set of independent random samples would most likely yield a different value for M₁ - M₂ - How much the difference M₁ - M₂ can vary from sample to sample is given by its sampling distribution It can be shown mathematically that the distribution of the difference between two independent random variables has: - A mean equal to the difference of their respective means - A variance equal to the sum of their respective variances In CH14 we saw that the sampling distributions of M₁ and M₂ have means μ₁ and μ₂ and standard deviations σ₁/√n₁ and σ₂/√n₂, respectively Therefore, the center of the sampling distribution of M₁ - M₂ is: - μ₁ - μ₂ - (Making M₁ - M₂ an unbiased estimate of μ₁ - μ₂) The variance of the sampling distribution of M₁ - M₂ is: - (σ₁²/n₁) + (σ₂²/n₂) Therefore its standard deviation is: - √((σ₁²/n₁) + (σ₂²/n₂)) - This standard deviation is SMALLER when the SAMPLE SIZES n₁ and n₂ are INCREASED When both populations are Normally distributed, the sampling distributions of M₁ and M₂ will also both be Normally distributed, and so will the sampling distribution of their difference M₁ - M₂ Because we typically don't know σ₁ and σ₂, we estimate them by the sample standard deviations SD₁ and SD₂ The result is the STANDARD ERROR, or ESTIMATED STANDARD DEVIATION, of the difference in sample means: - SE = √((s₁²/n₁) + (s₂²/n₂)) When we standardize the estimate M₁ - M₂, we get: - t = ((M₁ - M₂) - (μ₁ - μ₂))/SE - This TWO-SAMPLE T STATISTIC has approximately a t distribution - It does not have exactly a t distribution even if the populations are both exactly Normal - In practice, however, the approximation is very accurate
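The two facts about the sampling distribution of M₁ - M₂ (mean μ₁ - μ₂ and standard deviation √((σ₁²/n₁) + (σ₂²/n₂))) can be illustrated with a small simulation. The population means, standard deviations, sample sizes, seed, and number of repetitions below are all hypothetical values chosen for illustration only.

```python
import math
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical Normal populations and sample sizes.
mu1, sigma1, n1 = 100.0, 15.0, 10
mu2, sigma2, n2 = 90.0, 10.0, 20


def one_difference():
    """Draw two independent samples and return M1 - M2."""
    sample1 = [random.gauss(mu1, sigma1) for _ in range(n1)]
    sample2 = [random.gauss(mu2, sigma2) for _ in range(n2)]
    return mean(sample1) - mean(sample2)


diffs = [one_difference() for _ in range(5000)]
theory_sd = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

print("mean of M1 - M2:", round(mean(diffs), 2), "theory:", mu1 - mu2)
print("SD of M1 - M2:", round(stdev(diffs), 2), "theory:", round(theory_sd, 2))
```

The simulated mean and standard deviation should sit close to the theoretical values, with small discrepancies due to simulation noise.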

Two-Sample df

Unlike in the one-sample situation, the degrees of freedom of the two-sample t distribution depend on the variability of the sample data as well as the sample sizes. The degrees of freedom, df, is generally not a whole number. The exact value is always at least as large as the smaller of n₁ - 1 and n₂ - 1 and at most equal to (n₁ + n₂) - 2. There is a t distribution for any positive degrees of freedom, even though some statistical software and Table C at the back of the book provide degrees of freedom only as whole numbers - The impact of rounding is typically minor enough to be of no concern. As with the one-sample situation in CH17, we can use the properties of the distribution of the two-sample t statistic to: - Obtain a confidence interval about the unknown parameter value μ₁ - μ₂ - Test a set of null and alternative hypotheses about the unknown parameter value μ₁ - μ₂ - The logic and interpretation are similar
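The notes do not state the df formula itself. The sketch below assumes the Welch-Satterthwaite approximation, which is what statistical software commonly uses for the two-sample t procedures; the function name is hypothetical. It is applied to the rat blood pressure data from Check Your Skills 18.17 (n = 8 per group, s = 9 and 13), where the stated value is df = 12.46.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (assumed formula, not given in the notes)."""
    v1 = s1**2 / n1  # estimated variance of M1
    v2 = s2**2 / n2  # estimated variance of M2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Check Your Skills 18.17: high- and low-capacity rats
n1, n2 = 8, 8
df = welch_df(9, n1, 13, n2)

# Bounds stated in the notes: min(n1 - 1, n2 - 1) <= df <= n1 + n2 - 2
lower = min(n1 - 1, n2 - 1)
upper = n1 + n2 - 2

print(f"df = {df:.2f} (between {lower} and {upper})")
```

Under this assumption the result agrees with the df reported for the rat study and falls inside the bounds described above.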

Ex 18.4: We want to estimate with 95% confidence the difference in mean daily time spent walking or standing between the two populations of lean people and mildly obese people - We already checked the conditions for inference in Ex 18.2 As shown in Figure 18.2, JMP provides a 95% confidence interval for μ₁ - μ₂ based on this study: - "Confidence 0.95," "Lower CL Dif 67.227," and "Upper CL Dif 237.737" - That is, the 95% confidence interval for μ₁ - μ₂ is (67.227, 237.737)

We are 95% confident that, on average, lean people stand and walk between 67 and 238 minutes more daily than mildly obese people do. Most of the computations required for this confidence interval were already performed for Ex 18.3: - Unbiased estimate: M₁ - M₂ = 152.482 - Standard error: SE = 40.039 - Degrees of freedom: df = 15.17 ≈ 15 The following computations are now needed: - Critical value for a 95% confidence level: under t(15): t* = 2.131 (rounded to three decimal places, obtained from the Excel output "t Critical two-tail" or Table C) - Margin of error: m = t*(SE) = 2.131(40.039) = 85.323 - Confidence interval, computed as estimate ± m: 152.482 ± 85.323, or 67.159 to 237.805 minutes. Note that for df = 15.17, t* ≈ 2.129, as given by the TI-84 command invT(0.975,15.17) - This explains why JMP provides the slightly narrower interval (67.227, 237.737). The confidence interval (67, 238) is quite wide because the samples are small and the variation among individuals, as measured by the two sample standard deviations, is large - The variability of individuals is just a natural property of these populations and is beyond our control - Collecting data from substantially larger samples, however, would have provided a narrower, more informative confidence interval - Thinking about sample size early on is an important part of designing any study. The difference in mean daily standing/walking times between the two populations is on the order of one to four hours - Any value in this range would be fairly substantial considering that the variable is a measure of daily activity - The practical relevance of such a difference in population means is a separate issue, which would need to be assessed by experts in terms of its impact on human physiology and health
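The margin-of-error arithmetic in Ex 18.4 can be reproduced with a short sketch. The function name is made up for illustration, and the critical values are taken directly from the text (2.131 for df = 15, 2.129 for df = 15.17) rather than computed.

```python
def confidence_interval(estimate, se, t_star):
    """Return (lower, upper, margin) for estimate ± t* × SE."""
    margin = t_star * se
    return estimate - margin, estimate + margin, margin

# Values from Ex 18.3 and Ex 18.4
estimate, se = 152.482, 40.039

# t* for df rounded down to 15 (Table C)
low, high, margin = confidence_interval(estimate, se, 2.131)

# t* for the unrounded df = 15.17 gives a slightly narrower interval
low_exact, high_exact, margin_exact = confidence_interval(estimate, se, 2.129)

print(f"df = 15:    {low:.3f} to {high:.3f} (m = {margin:.3f})")
print(f"df = 15.17: {low_exact:.3f} to {high_exact:.3f} (m = {margin_exact:.3f})")
```

Rounding df down always gives a t* that is at least as large, so the hand-computed interval errs on the wide (conservative) side compared with software output.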

Apply Your Knowledge 18.9: The Zika virus, which is known to be transmitted by mosquitoes, can also be transmitted sexually, as the virus can persist for weeks in the semen of symptom-free infected men Using a mouse model, researchers examined the impact of Zika on male reproductive organs - Young adult male lab mice of a breed susceptible to Zika were either inoculated with Zika or not Here are the weights of their testes two weeks later (in grams, g): Uninfected 85 85 90 91 96 100 Infected 17 93 15 12 10 9 10 12 a. What two populations do the researchers want to compare and what is the response variable? - State the null and alternative hypotheses b. Plot the data and check the conditions for inference - Explain why a two-sample t test is not appropriate here

a) A standard t distribution is always wider than the standard Normal distribution - So, we can tell right away that the value t = −0.771 is so small that it would give a large, nonsignificant P-value b) Echinacea: - n = 52 - M = 13.21 - s = 1.91 × √52 = 13.77 Placebo: - n = 103 - M = 15.05 - s = 1.43 × √103 = 14.51 Software gives: - t = −0.771 - df = 107.3 - P = 0.22 (one-sided)

Apply Your Knowledge 18.5: Some individuals have the ability to recall accurately vast amounts of autobiographical information without mnemonic tricks or extra practice - This ability is called Highly Superior Autobiographical Memory (HSAM) A study recruited adults with confirmed HSAM and control individuals of similar age without HSAM - All study participants were given a battery of cognitive and behavioral tests with the goal of finding out how this extraordinary ability works Here are the participants' results for a visual memory test: HSAM 4 4 5 9 6 6 7 6 8 5 Control 5 1 2 4 2 2 3 3 3 9 4 2 4 5 5 1 6 4 a. Following is the Minitab output for the two-sample t test N M SD SE Mean HSAM 10 6.00 1.63 0.52 CON 18 3.61 1.97 0.47 Difference = μ(HSAM) - μ(CON) Estimate for difference = 2.389 T-test of difference = 0 (vs >): T-value = 3.44 P-value = 0.001 DF = 21 - What are M and s for each of the two samples? - Starting from these values, obtain the two standard errors of the means and the t test statistic - Your work should agree with the output b. Do individuals with HSAM have significantly greater abilities on visual memory tests than typical individuals? - Summarize the findings in a sentence or two, including t, df, P, and a conclusion, as if you were preparing a report for publication

a) For the HSAM group: - M = 6.00 - s = 1.63 - SE = 1.63/√10 = 0.52 For the control group: - M = 3.61 - s = 1.97 - SE = 1.97/√18 = 0.47 t = (6.00 − 3.61)/√(0.52² + 0.47²) ≈ 3.4 (3.44 when the unrounded standard errors are used, as in the Minitab output) b) There is significant evidence (t = 3.44, P = 0.001, df = 21) that individuals with HSAM have higher visual memory test scores than typical individuals, on average
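The same quantities can be recomputed from the raw scores listed in Apply Your Knowledge 18.5. This is a sketch using only Python's standard library; the helper name is hypothetical, and the P-value is not computed here because it would require a t distribution function.

```python
import math
import statistics

# Visual memory test scores from Apply Your Knowledge 18.5
hsam = [4, 4, 5, 9, 6, 6, 7, 6, 8, 5]
control = [5, 1, 2, 4, 2, 2, 3, 3, 3, 9, 4, 2, 4, 5, 5, 1, 6, 4]

def mean_sd_se(sample):
    """Sample mean, sample standard deviation, and standard error of the mean."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # divides by n - 1
    return m, s, s / math.sqrt(len(sample))

m1, s1, se1 = mean_sd_se(hsam)
m2, s2, se2 = mean_sd_se(control)

# Two-sample t statistic for H0: mu(HSAM) = mu(CON)
t = (m1 - m2) / math.sqrt(se1**2 + se2**2)

print(f"HSAM: M = {m1:.2f}, s = {s1:.2f}, SE = {se1:.2f}")
print(f"CON:  M = {m2:.2f}, s = {s2:.2f}, SE = {se2:.2f}")
print(f"t = {t:.2f}")
```

Working from unrounded standard errors is what lets the result match the software's T-value; plugging in the two-decimal standard errors gives a slightly smaller t.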

Exercise 18.31: A study compared tooth health and periodontal damage in a group of 46 young adult males with a tongue piercing and a control group of 46 young adult males without tongue piercing - One question of interest was whether individuals with tongue piercing had more enamel cracks, on average Here are the summary statistics: Tongue piercing - n = 46 - M = 4.0 - s = 3.5 No tongue piercing - n = 46 - M = 1.2 - s = 1.3 a. Is there evidence at the 1% level that young adult males with a tongue piercing have significantly more enamel cracks? - State hypotheses in terms of the two population means, obtain the two-sample t statistic and P-value, and conclude in context b. Can you conclude that tongue piercing causes more enamel cracks? - Explain your reasoning

a) H₀: μ(TP) = μ(no) - Hₐ: μ(TP) > μ(no) - t = 5.09 - df = 57 - P = 0.000002 < 0.01, and highly statistically significant - There is very strong evidence that young adult males with tongue piercing have more enamel cracks, on average, than similar individuals with no tongue piercing b) No; this is an observational study, so causation cannot be established

Exercise 18.35: Acupuncture is widely used to prevent migraine attacks, but does it work? A study investigated the effectiveness of acupuncture compared with sham acupuncture and with no acupuncture (patients left on the wait list) in individuals with migraines - Patients were randomly assigned to one of the three "treatments" for a period of 8 weeks, after which they monitored their migraine attacks over the next 3 weeks Here are summary data for the number of days in which each patient used pain medication for migraines or headaches during the monitoring period: Acupuncture - n = 132 - M = 3.2 - s = 3.0 Sham - n = 76 - M = 3.4 - s = 2.9 Waitlist - n = 64 - M = 4.4 - s = 3.6 a. Did patients who received the acupuncture treatment have significantly fewer pain medication days than patients who stayed on the wait list? - Perform the appropriate hypothesis test and conclude in context b. Give a 95% confidence interval for the mean difference - Describe the effect size - Would you consider recommending acupuncture for the preventive treatment of headaches?

a) H₀: μ(a) = μ(w) - Hₐ: μ(a) < μ(w) - t = −2.31 - P (df = 106.75) = 0.0115 - On average, patients receiving acupuncture had significantly fewer pain medication days than those on the wait list b) 1.2 ± 1.03 days - On the low end the reduction is almost nothing, but on the high end it is about 50% of the wait-list mean

Exercise 18.41: A randomized, double-blind experiment studied whether magnetic fields applied over a painful area can reduce pain intensity - The subjects were 50 volunteers with postpolio syndrome who reported muscular or arthritic pain - The pain level when pressing a painful area was graded subjectively on a scale from 0 to 10 (where 0 is no pain and 10 is maximum pain) - Patients were randomly assigned to wear either a magnetic device or a placebo device over the painful area for 45 minutes Here is a summary of the pain scores for this experiment, expressed as means ± standard deviations:

                  Magnetic device (n = 29)    Placebo device (n = 21)
Pretreatment      9.6 ± 0.7                   9.5 ± 0.8
Post-treatment    4.4 ± 3.1                   8.4 ± 1.8
Change            5.2 ± 3.2                   1.1 ± 1.6

Conclusions from any study should be more comprehensive than a simple statement of significance or a confidence interval a. The random assignment of subjects to treatments can sometimes lead, by chance, to an unbalanced split of subjects - Is there evidence of a significant difference in mean pain scores between the two groups at the beginning of the experiment? b. You have used summary data from this experiment for your inference - What would you like to know about the raw data to support the legitimacy of your statistical results?

a) H₀: μ(m) = μ(p) - Hₐ: μ(m) ≠ μ(p) - t = 0.4594 - P (df = 39.6) = 0.6484 b) Were the data skewed or bimodal? Were there any outliers?

Exercise 18.25: Bird songs have been hypothesized to be a secondary sexual character signaling an individual's health status Researchers designed an experiment in which they randomly assigned male collared flycatchers (Ficedula albicollis) to two groups: - One group received an immune challenge in the form of an injection of sheep red blood cells - The other group received a placebo injection Here are the changes in song rate (in strophes per minute) after the injection for 15 male collared flycatchers in the immune-challenge group and 12 males in the placebo group: Immune challenge − 1.6 − 3.1 − 2.7 − 3.7 − 3.1 − 3.6 − 1.9 − 1.5 − 0.1 0.8 − 0.1 − 0.2 − 1.2 − 1.9 0.2 Placebo − 1.5 1.7 0.4 − 1.8 0.0 0.4 0.8 2.0 0.0 − 2.4 − 1.5 − 0.1 a. Make dotplots to investigate the shape of the distributions - Is the use of a two-sample t procedure appropriate? b. Do the data provide significant evidence that an immune challenge reduces the male song rate, on average, more than a placebo injection does? - Obtain the test statistic, degrees of freedom, and P-value, and state your conclusion - Does the study support the hypothesis that male bird songs advertise the male's health status?

a) Plotting the data reveals no major causes for concern b) t = 2.57 - df = 24 - P = 0.008 (one-sided), statistically significant

Exercise 18.27: The body's natural electrical field helps wounds heal - If diabetes changes this field, that might explain why people with diabetes heal slowly A study of this idea compared normal mice and mice bred to spontaneously develop diabetes - The investigators attached sensors to the right hip and front feet of the mice and measured the difference in electrical potential (millivolts) between these locations Here are the data: Diabetic mice 13.60 7.40 1.05 10.55 16.40 22.60 15.20 19.60 17.25 18.40 11.70 14.85 14.45 18.25 10.15 10.30 10.45 8.55 8.85 19.20 10.00 9.80 10.85 14.70 Normal mice 13.80 9.10 4.95 7.70 7.20 10.00 14.55 13.30 9.50 10.40 7.75 8.70 8.40 8.55 12.60 9.40 6.65 8.85 a. Make a dotplot of each sample of potentials - There is a low outlier in the diabetic group - Does it appear that potentials in the two groups differ in a systematic way? b. Is there significant evidence of a difference in mean potentials between the two groups? c. Repeat your inference without the outlier - Does the outlier affect your conclusion?

a) The diabetic potentials appear to be larger than the normal potentials b) H₀: μ(D) = μ(N) - Hₐ: μ(D) ≠ μ(N) - t = 3.077 - P = 0.0039 (df = 36.60) c) Without the outlier: t = 3.841 - P = 0.0005 (df = 37.15) - The conclusion is unchanged

Exercise 18.21: The central tail feathers of the long-tailed finch (Poephila acuticauda) are a sexually dimorphic trait hypothesized to play a role in sexual selection - Longer tail feathers in males cost energy to produce, and this is thought to signal the male's excellent health condition Here are the lengths of the central tail feathers (average of the two central feathers, in millimeters) of 20 male and 21 female long-tailed finches: Males 87 77 95 73 74 85 56 86 95 108 75 87 73.5 82 89 64 74.5 87 85 86 Females 60 59 72 54 65 58 59 60 65 60 68 70.5 80 87 65 59 65 67 62 66 70 a. Treat these data as SRSs from the population of adult long-tailed finches in the wild - Make dotplots of both data sets and determine whether the use of a two-sample t procedure is appropriate b. How much longer, on average, are the central tail feathers of male long-tailed finches than those of the females? - Give a 95% confidence interval for the difference in population mean length between the male and the female adult long-tailed finches

a) The female data are a bit skewed, but the combined sample size is large enough b) 10.4 to 22.9 mm (df = 32)

Ex 18.1: a) Researchers used an animal model to examine the long-term impact of exposure to triclosan, a broad-spectrum antimicrobial agent commonly added to soaps - They randomly assigned mice to diets containing either 0.08% triclosan or no triclosan for 8 months, then compared the liver weights in each group b) A field biologist observes gender-based behavior in wild chimpanzees - 12 randomly chosen young chimpanzees are tagged remotely with a dart and their behavior is monitored - The amount of time each young chimpanzee spends in contact with its mother is recorded - The biologist then compares the amount of time spent in contact with the mother by young male chimpanzees and by young female chimpanzees c) What was the role of vaccination history in the California pertussis (whooping cough) epidemic of 2010? - Researchers selected a random sample of 682 medical records of California children who had been diagnosed with pertussis and a control random sample of 2016 medical records of California children in the same age group who had received care from the same clinicians on the same day but were not diagnosed with pertussis - The researchers then compared the length of time since vaccination in both groups

a) The response variable is the liver weight (quantitative) and the two-level factor is triclosan exposure (0.08% or none) b) The response variable is time spent with the mother (quantitative) and the two-level factor is the sex of the young chimpanzee (male or female) c) The response variable is the length of time since vaccination (quantitative) and the two-level factor is pertussis diagnosis in 2010 (yes or no)

Exercise 18.39: The National Toxicology Program evaluates the toxicity of chemicals found in manufacturing, in consumer products, or in the environment after disposal - Toxicity is assessed through a battery of tests Here are some results from a study of the toxicity of black newsprint ink in 7-week-old female rats - The rats' fur was locally clipped twice a week for 13 weeks - One group of rats received a dermal application of ink right after each clipping, and a control group of rats was left untreated Table 18.5 shows the body weights (in grams) of female rats at the beginning of the study and at the end of the 13 weeks

Control Group          Treatment Group
Week 0    Week 13      Week 0    Week 13
111.2     191.6        107.3     187.0
105.4     191.2        116.7     189.5
110.8     210.7        112.2     179.2
105.6     185.2        103.4     172.2
106.1     195.0        113.2     178.7
104.4     188.3        110.6     180.9
114.0     188.4        110.6     188.3
115.1     195.6        100.5     188.9
109.2     204.6        106.3     183.1
111.3     195.7        112.5     184.5

a. Verify that the two experimental groups are not significantly different at the beginning of the study b. Is there good evidence that ink application impairs growth in female rats between 7 and 20 weeks of age? - Estimate the difference between the mean weight gains of the two populations - (Use a 95% confidence level)

a) Two-sample t test on the week 0 weights of the two groups - P = 0.992, so there is no significant difference at the start b) Compute the weight gain (week 13 − week 0) for each rat, then find a two-sample confidence interval for the difference in mean gains: - 11.42 ± 6.88 g with df rounded to 17 - The control group gained significantly more weight than the group receiving ink applications - The ink appears to be toxic
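Part b can be traced step by step from the raw weights in Table 18.5. The sketch below uses only the standard library; variable names are chosen for illustration, and the critical value t* = 2.110 for df = 17 is taken from a t table rather than computed.

```python
import math
import statistics

# Body weights in grams from Table 18.5
control_w0  = [111.2, 105.4, 110.8, 105.6, 106.1, 104.4, 114.0, 115.1, 109.2, 111.3]
control_w13 = [191.6, 191.2, 210.7, 185.2, 195.0, 188.3, 188.4, 195.6, 204.6, 195.7]
treated_w0  = [107.3, 116.7, 112.2, 103.4, 113.2, 110.6, 110.6, 100.5, 106.3, 112.5]
treated_w13 = [187.0, 189.5, 179.2, 172.2, 178.7, 180.9, 188.3, 188.9, 183.1, 184.5]

# Step 1: one weight gain per rat (week 13 minus week 0)
control_gain = [end - start for start, end in zip(control_w0, control_w13)]
treated_gain = [end - start for start, end in zip(treated_w0, treated_w13)]

# Step 2: two-sample estimate and standard error for the difference in mean gains
diff = statistics.mean(control_gain) - statistics.mean(treated_gain)
se = math.sqrt(statistics.variance(control_gain) / len(control_gain)
               + statistics.variance(treated_gain) / len(treated_gain))

# Step 3: margin of error with t* for df = 17 at 95% confidence (table value)
T_STAR_DF17 = 2.110
margin = T_STAR_DF17 * se

print(f"difference in mean gains = {diff:.2f} g, margin of error = {margin:.2f} g")
```

Reducing each rat to a single gain first is what turns the four columns into two independent samples, so the ordinary two-sample formulas apply.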

Exercise 18.23: Here are the IQ test scores of 31 seventh-grade girls in a midwestern school district: 114 100 104 89 102 91 114 114 103 105 108 130 120 132 111 128 118 119 86 72 111 103 74 112 107 103 98 96 112 112 93 The IQ test scores of 47 seventh-grade boys in the same district are as follows: 111 107 100 107 115 111 97 112 104 106 113 109 113 128 128 118 113 124 127 136 106 123 124 126 116 127 119 97 102 110 120 103 115 93 123 79 119 110 110 107 105 105 110 77 90 114 106 a. Make dotplots or histograms of both sets of data - Because the distributions are reasonably symmetric with no extreme outliers, the t procedures will work well b. Treat these data as SRSs from all seventh-grade students in the district - Is there good evidence that girls and boys differ in their mean IQ scores?

b) H₀: μ(G) = μ(B) - Hₐ: μ(G) ≠ μ(B) - t = 1.64 - P = 0.1057, not statistically significant

