CH2


Steps for Finding the Median of a Distribution

1. Arrange all observations in order of size, from smallest to largest
2. If the number of observations (n) is ODD, the Median is the CENTER observation in the ordered list
   - Find the location of the Median by counting (n + 1)/2 observations up from the smallest observation in the list
3. If the number of observations (n) is EVEN, the Median is the MEAN OF THE TWO CENTER OBSERVATIONS in the ordered list
   - The location of the Median is again (n + 1)/2, counting from the smallest observation in the list; for even n this location ends in .5, halfway between the two center observations
Note that the formula (n + 1)/2 does not give the Median, but rather the location of the Median in the ordered list
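The steps above can be sketched as a small Python helper. The function name `median_by_location` is hypothetical (it does not come from the text). It follows the (n + 1)/2 location rule and averages the two center observations when n is even:

```python
def median_by_location(data):
    """Median via the (n + 1)/2 location rule (hypothetical helper)."""
    xs = sorted(data)                 # step 1: order from smallest to largest
    n = len(xs)
    location = (n + 1) / 2            # 1-based position of the Median
    if n % 2 == 1:                    # odd n: the center observation
        return xs[int(location) - 1]
    lo = int(location)                # even n: e.g. location 9.5 -> 9th & 10th values
    return (xs[lo - 1] + xs[lo]) / 2  # mean of the two center observations

aleppo = [7.2, 7.6, 8.5, 8.5, 8.7, 9.0, 9.0, 9.3, 9.4, 9.4, 10.2, 10.9, 11.3, 12.1, 12.8]
torrey = [21.2, 21.6, 21.7, 23.1, 23.7, 24.2, 24.2, 25.5, 26.6,
          26.8, 28.9, 29.0, 29.7, 29.7, 30.2, 32.5, 33.7, 33.7]
print(median_by_location(aleppo))            # 9.3  (n = 15, 8th value)
print(round(median_by_location(torrey), 2))  # 26.7 (n = 18, between 9th & 10th)
```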

Calculating Quartiles

1. Arrange the observations in increasing order & locate the Median in the ordered list of observations
2. The First Quartile (Q1) is the Median of the observations whose position in the ordered list is to the LEFT of the location of the overall Median
3. The Third Quartile (Q3) is the Median of the observations whose position in the ordered list is to the RIGHT of the location of the overall Median
4. The Interquartile Range IQR is the distance between the First & Third Quartiles:
   - IQR = Q3 - Q1
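Here is a minimal Python sketch of the quartile rule above. It assumes the convention described here: when n is odd, the overall Median is left out of both halves. The helper name `quartiles` is hypothetical. Other software, including Python's own `statistics.quantiles`, may use a different convention and give slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Median, Q3) using the 'median of each half' rule (hypothetical helper)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # observations LEFT of the Median's location
    upper = xs[half + n % 2:]   # observations RIGHT of it (skips the Median itself when n is odd)
    return median(lower), median(xs), median(upper)

aleppo = [7.2, 7.6, 8.5, 8.5, 8.7, 9.0, 9.0, 9.3, 9.4, 9.4, 10.2, 10.9, 11.3, 12.1, 12.8]
q1, med, q3 = quartiles(aleppo)
print(q1, med, q3)         # 8.5 9.3 10.9
print(round(q3 - q1, 2))   # IQR = 2.4
```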

Measures of Center (Central Tendency)

1. Median 2. Mean 3. Mode

Measures of Spread

1. Range 2. Quartiles - Q1 - Q3 - IQR 3. Standard Deviation - Variance

The Properties that Determine the Usefulness of the Sample Standard Deviation:

1. SD measures SPREAD ABOUT THE MEAN & should be used only when the Mean is chosen as the measure of Center
   - Although this is a gross simplification, you might find it helpful to think of the Standard Deviation as representing roughly the average dispersion, in either direction, of the n data points relative to their Mean
2. SD is always ZERO OR GREATER THAN ZERO
   - SD = 0 only when there is no spread (the values in the sample are all identical)
   - Otherwise, SD > 0
   - As the observations become more spread out about their Mean, SD gets larger
3. SD has the SAME UNITS OF MEASUREMENT AS the ORIGINAL OBSERVATIONS
   - Ex: if you measure metabolic rates in kilocalories (Cal), both the Mean & Standard Deviation are also in kilocalories
   - This is one reason to prefer SD to the Variance, which in this case is measured in squared kilocalories (Cal²)
4. SD is NOT RESISTANT
   - Outliers & skew increase the spread of a Distribution &, therefore, also increase SD
   - The use of squared deviations renders SD even more sensitive than the Mean to a few extreme observations

The 4-Step Process of Organizing a Statistical Problem:

1. STATE - What is the PRACTICAL QUESTION, in the context of the real-world setting? 2. PLAN - What specific STATISTICAL OPERATIONS does this problem call for? 3. SOLVE - ANALYZE the data with GRAPHS & COMPUTATIONS suitable for this problem 4. CONCLUDE - Give your PRACTICAL CONCLUSION in the setting of the real-world problem

Ex 2.11: Some individuals have the ability to recall accurately vast amounts of autobiographical information without mnemonic tricks or extra practice - This ability is called Highly Superior Autobiographical Memory (HSAM). A study recruited 11 adults with confirmed HSAM & 15 control individuals of similar age without HSAM - All study participants were given a battery of cognitive & behavioral tests with the goal of finding out what might explain this extraordinary ability. First, autobiographical memory was assessed by asking each participant to recall in detail 5 important personal events - These events were selected by the researchers, & the participants did not know ahead of time which events they would have to recall - Answer accuracy was then verified from documents & interviews, & each correct detail was scored as one point - Individual total scores on this verifiable autobiographical memory task are displayed below:
HSAM: 22 23 26 26 33 34 38 39 39 46 47
Control: 5 5 5 6 7 7 10 10 11 12 13 16 18 22 23
Do individuals with HSAM & without HSAM display distinct Distributions of verifiable autobiographical memory scores? - How do the Mean scores compare? Use the 4-Step Process for Organizing Statistical Problems to answer these questions

1. State - Do individuals with HSAM & without HSAM display distinct Distributions of verifiable autobiographical memory scores? - How do the Mean scores compare?
2. Plan - Use graphs & numerical descriptions to describe & compare these two Distributions of verifiable autobiographical memory scores
3. Solve - Dotplots work best for data sets of these sizes - Figure 2.8 displays stacked Dotplots to facilitate comparison - The control group has a somewhat right-skewed Distribution, so we might choose to compare the Five-Number Summaries - But because the researchers plan to use M & SD for further analysis, we instead calculate these measures:
   - HSAM: Mean score = 33.9, Standard Deviation = 8.8
   - Control: Mean score = 11.3, Standard Deviation = 6.0
4. Conclude - The two groups differ so much in verifiable autobiographical memory scores that there is little overlap between them - Overall, individuals with HSAM have higher scores than control individuals without HSAM - The Mean scores are 33.9 for the HSAM group compared with only 11.3 for the control group, despite similar variations in individual scores (Standard Deviations 8.8 & 6.0, respectively) - This first test confirms that individuals with HSAM have better recall of autobiographical events than ordinary individuals do - The researchers can now proceed to compare cognitive & behavioral traits in both groups
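You can check the Means & Standard Deviations in the Solve step with Python's standard `statistics` module. Its `stdev` uses the n − 1 divisor of the Sample Standard Deviation. This is only a calculator-style check, not the researchers' own software:

```python
from statistics import mean, stdev

hsam = [22, 23, 26, 26, 33, 34, 38, 39, 39, 46, 47]
control = [5, 5, 5, 6, 7, 7, 10, 10, 11, 12, 13, 16, 18, 22, 23]

for name, scores in [("HSAM", hsam), ("Control", control)]:
    # stdev divides the sum of squared deviations by n - 1
    print(f"{name}: M = {mean(scores):.1f}, SD = {stdev(scores):.1f}")
# HSAM: M = 33.9, SD = 8.8
# Control: M = 11.3, SD = 6.0
```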

Apply Your Knowledge 2.13: The study in Example 2.11 assessed common obsessional symptoms in the HSAM and the control individuals, as previous findings had suggested a possible obsessional component to HSAM - The study participants completed the Leyton Obsessional Inventory test, which has a maximum score of 30 points - The results for the 11 HSAM individuals & for 14 of the controls are displayed below (one individual in the control group left before completing this test) HSAM: 2 4 5 8 8 9 9 10 11 11 12 Control: 1 2 2 2 2 4 4 4 5 7 8 8 12 12 How do individuals with & without HSAM compare in terms of obsessional score? - Follow the 4-Step Process in reporting your work

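1. State - How do individuals with & without HSAM compare in terms of obsessional score?
2. Plan - Use graphs & numerical descriptions to compare the two Distributions of obsessional scores
3. Solve - Compare the two Distributions graphically (e.g., with stacked Dotplots) & calculate the Mean score for each group
4. Conclude - The two Distributions overlap completely but scores tend to be higher, on average, for individuals with HSAM (M = 8.1) than for controls (M = 5.2)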

Check Your Skills 2.20: What are all the values that a Standard Deviation (SD) can possibly take? a. 0 ≤ s b. 0 ≤ s ≤ 1 c. -1 ≤ s ≤ 1

A

Check Your Skills 2.23: Which of the following is least affected if an extreme high Outlier is added to your data? a. Median b. Mean c. Standard Deviation

A

Boxplots

A graph of the FIVE-NUMBER SUMMARY - A central box spans the Quartiles Q1 & Q3 - A line in the box marks the Median - Lines extend from the box out to the smallest & largest observations They are best used for SIDE-BY-SIDE COMPARISON OF MORE THAN ONE DISTRIBUTION Be sure to include a numerical scale in the graph When you look at this graph: 1. Locate the Median, which marks the Center of the Distribution 2. Then look at the Spread - The box shows the Spread of the middle half of the data - The extremes (the smallest & largest observations) show the Spread of the entire data set In a symmetric distribution, the first and third quartiles are equally distant from the median. In contrast, in most distributions that are skewed to the right, the third quartile will be farther above the median than the first quartile is below it. In most distributions that are skewed to the left, the first quartile will be farther below the median than the third quartile is above it. The extremes behave the same way, but remember that they are just single observations and may say little about the distribution as a whole.

Error Bars

A graphical representation of the VARIABILITY of data - They are used on graphs to indicate the ERROR or UNCERTAINTY IN A REPORTED MEASUREMENT They often show the STANDARD DEVIATION - But they might also show some other measure of variability (e.g., the Standard Error of the Mean; the Margin of Error)

Sample VARIANCE (s², S², SD²)

Roughly an AVERAGE OF the SQUARES OF the DEVIATIONS OF the OBSERVATIONS FROM their MEAN (the sum of squares is divided by n - 1 rather than n) - Measures Spread by looking at how far the observations are from their Mean - s² = [∑(X - M)²]/(n - 1) = SS/(n - 1)

Check Your Skills 2.15: 4.8 7 6.2 7.3 6.0 7.3 3.7 9.4 8.4 6.0 5.0 5.1 6.0 7.3 4.7 What is the Median of the data? a. 5.70 b. 6.00 c. 6.28

B

Check Your Skills 2.18: What percent of the observations in a Distribution lie between the First Quartile & the Third Quartile? a. 25% b. 50% c. 75%

B

Check Your Skills 2.21: 4.8 7 6.2 7.3 6.0 7.3 3.7 9.4 8.4 6.0 5.0 5.1 6.0 7.3 4.7 What is the approximate value of the Standard Deviation of the 15 cesium-137 values? - Use your calculator a. 1.47 b. 1.52 c. 2.32

B

Check Your Skills 2.22: What are the correct units for the Standard Deviation of the 15 cesium-137 values in Exercise 2.14? a. No units—it's just a number b. Becquerels per kilogram of dry tissue c. Becquerels squared per square kilogram of dry tissue

B

Exercise 2.35: Every 17 years, swarms of cicadas emerge from the ground in the eastern United States, live for about 6 weeks, then die - There are several "broods," so we see cicada eruptions more often than every 17 years - So many cicadas die that their bodies may serve as fertilizer & increase plant growth In an experiment, a researcher added 10 cicadas under some plants in a natural plot of American bellflowers in a forest, leaving other plants undisturbed - One of the Variables studied was the size of seeds produced by the plants - Here are data (seed mass in milligrams) for 39 plants fertilized with cicadas & 33 undisturbed plants: Cicada plants 0.237 0.277 0.241 0.142 0.109 0.209 0.238 0.277 0.261 0.227 0.171 0.235 0.276 0.234 0.255 0.296 0.239 0.266 0.296 0.217 0.238 0.210 0.295 0.193 0.218 0.263 0.305 0.257 0.351 0.245 0.226 0.276 0.317 0.310 0.223 0.229 0.192 0.201 0.211 Undisturbed plants 0.212 0.188 0.263 0.253 0.261 0.265 0.135 0.170 0.203 0.241 0.257 0.155 0.215 0.285 0.198 0.266 0.178 0.244 0.190 0.212 0.290 0.253 0.249 0.253 0.268 0.190 0.196 0.220 0.246 0.145 0.247 0.140 0.241 Do these data support the idea that dead cicadas can serve as fertilizer? - Follow the 4-Step Process in your work

Both Distributions overlap & have very similar measures of Center (whether assessed by their Mean or by their Median), so there is little reason to believe that the addition of cicadas affects seed mass, on average

Check Your Skills 2.14: Cesium-137 is a waste product of nuclear reactors - A study examined the cesium-137 tissue concentration of a random sample of 15 Pacific bluefin tuna (Thunnus orientalis) captured off the coast of California 4 months after the Fukushima (Japan) nuclear reactor meltdown of 2011 - Here are the findings, in becquerels per kilogram of dry tissue: 4.8 7 6.2 7.3 6.0 7.3 3.7 9.4 8.4 6.0 5.0 5.1 6.0 7.3 4.7 What is the mean of these data? a. 5.70 b. 6.00 c. 6.28

C

Check Your Skills 2.16: 4.8 7 6.2 7.3 6.0 7.3 3.7 9.4 8.4 6.0 5.0 5.1 6.0 7.3 4.7 What is the Interquartile Range (IQR)? a. 1.3 b. 2.0 c. 2.3

C
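The sketch below checks the cesium-137 answers (Check Your Skills 2.14, 2.15, 2.16 & 2.21) with Python's `statistics` module. The quartile halves follow this chapter's rule: with n = 15, the Median (the 8th value) is left out of both halves. The last line shows that distractor c in 2.21 (2.32) is the Variance, not the Standard Deviation:

```python
from statistics import mean, median, stdev, variance

# cesium-137, becquerels per kilogram of dry tissue
cesium = [4.8, 7, 6.2, 7.3, 6.0, 7.3, 3.7, 9.4, 8.4, 6.0, 5.0, 5.1, 6.0, 7.3, 4.7]
xs = sorted(cesium)
lower, upper = xs[:7], xs[8:]     # n = 15 is odd: drop the 8th value (the Median)

print(round(mean(cesium), 2))                    # 6.28 (2.14)
print(median(cesium))                            # 6.0  (2.15)
print(round(median(upper) - median(lower), 1))   # 2.3  (2.16: IQR = 7.3 - 5.0)
print(round(stdev(cesium), 2))                   # 1.52 (2.21)
print(round(variance(cesium), 2))                # 2.32, the Variance (distractor c)
```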

Check Your Skills 2.17: If a Distribution is clearly skewed to the right... a. The Mean is less than the Median b. The Mean & the Median are equal c. The Mean is greater than the Median

C

Check Your Skills 2.19: To make a Boxplot of a Distribution, you must know... a. All the individual observations b. The Mean & the Standard Deviation c. The Five-Number Summary

C

Ex 2.7: A person's metabolic rate is the rate at which the body consumes energy - Metabolic rate is important in studies of weight gain, dieting, & exercise Here are the metabolic rates of 7 men who took part in a study of dieting - The units are kilocalories (Cal) for a 24-hour period, where kilocalories are the same calories used to describe the energy content of foods 1792 1666 1362 1614 1460 1867 1439 Compute the Mean (M) & Standard Deviation (SD)

Calculate the Mean: M = ∑X/n = (1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439)/7 = 11,200/7 = 1600 Cal
Calculate the Variance: SD² = ∑(X − M)²/(n − 1) = SS/(n − 1)
X = 1792: X − M = 1792 − 1600 = 192; (X − M)² = 192² = 36,864
X = 1666: X − M = 1666 − 1600 = 66; (X − M)² = 66² = 4,356
X = 1362: X − M = 1362 − 1600 = −238; (X − M)² = (−238)² = 56,644
X = 1614: X − M = 1614 − 1600 = 14; (X − M)² = 14² = 196
X = 1460: X − M = 1460 − 1600 = −140; (X − M)² = (−140)² = 19,600
X = 1867: X − M = 1867 − 1600 = 267; (X − M)² = 267² = 71,289
X = 1439: X − M = 1439 − 1600 = −161; (X − M)² = (−161)² = 25,921
Sum of the deviations (X − M) = 0; SS = sum of the squared deviations = 214,870
SD² = 214,870/(7 − 1) = 214,870/6 = 35,811.67 Cal²
Calculate the Standard Deviation: SD = √SD² = √35,811.67 = 189.24 Cal
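The same hand calculation can be reproduced step by step in Python: the deviations column, the SS, then division by n − 1. This minimal sketch mirrors the table above rather than calling a library routine:

```python
from math import sqrt

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]   # Cal per 24 hours
n = len(rates)
m = sum(rates) / n                          # Mean: 11,200 / 7 = 1600.0
deviations = [x - m for x in rates]         # X - M column (sums to 0)
ss = sum(d ** 2 for d in deviations)        # SS = sum of squared deviations
var = ss / (n - 1)                          # Sample Variance uses n - 1
sd = sqrt(var)                              # Standard Deviation, back in Cal

for x, d in zip(rates, deviations):
    print(f"{x:>5} {d:>6.0f} {d ** 2:>8,.0f}")
print(f"M = {m:.0f} Cal, SS = {ss:,.0f}, SD² = {var:,.2f} Cal², SD = {sd:.2f} Cal")
# last line: M = 1600 Cal, SS = 214,870, SD² = 35,811.67 Cal², SD = 189.24 Cal
```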

The 1.5×IQR Rule for Suspected Outliers

Call an observation a suspected Outlier if it falls more than 1.5×IQR ABOVE the THIRD Quartile or BELOW the FIRST Quartile, that is, above Q3 + 1.5×IQR or below Q1 - 1.5×IQR
This is not a replacement for looking at the data - It is most useful when large volumes of data are scanned automatically
Some software programs create Modified Boxplots, which display any value beyond these limits as an individual data point (rather than including it within the long whisker going to the minimum or maximum)
Examine the data carefully & decide for yourself if a particular data point fits the definition of an Outlier - "An individual value that falls outside the overall pattern"
Sometimes Outliers are really obvious, as in Example 2.9, & sometimes they are more ambiguous, as in Example 2.10 - It can be helpful to use words such as "mild" or "moderate" versus "extreme" to describe them - The four largest numbers of flood events may be described, for instance, as possibly mild Outliers within a pronounced right skew - Simply be honest & describe your findings as they are
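Here is a minimal Python sketch of the 1.5×IQR rule. The helper names `iqr_fences` and `suspected_outliers` are hypothetical. The quartiles fed in are the ones quoted in this chapter's Five-Number Summaries (acorn sizes, flood events, spider silk). As the text stresses, the rule only flags values; you still have to decide whether each one is truly an Outlier:

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (low, high) cutoffs Q1 - k*IQR and Q3 + k*IQR (hypothetical helper)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def suspected_outliers(data, q1, q3, k=1.5):
    """Values falling below the low cutoff or above the high cutoff."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]

print([round(v, 2) for v in iqr_fences(1.6, 6.0)])   # acorn sizes: [-5.0, 12.6]
print([round(v, 2) for v in iqr_fences(5, 73.5)])    # flood events: [-97.75, 176.25]

silk = [164.0, 478.7, 251.3, 351.7, 173.0, 448.9, 300.6, 362.0, 272.4, 740.2, 329.0,
        327.2, 270.5, 332.1, 288.8, 176.1, 282.2, 236.1, 358.2, 270.5, 290.7]  # MPa
print(suspected_outliers(silk, 260.9, 354.95))       # [740.2]
```

Without rounding 1.5×IQR first, the silk cutoffs come out as 119.825 & 496.025. The worked example rounds 1.5×IQR to 141.08, so it reports 119.82 & 496.03. The flagged value is the same either way.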

Five-Number Summary

Consists of the SMALLEST observation, the FIRST QUARTILE, the MEDIAN, the THIRD QUARTILE, & the LARGEST observation, written in order from smallest to largest - They offer a reasonably complete description of a data set's CENTER & SPREAD - Minimum Q1 Med Q3 Maximum

Exercise 2.33: This is a Standard Deviation contest - You must choose 4 numbers from the whole numbers 0 to 10, with repeats allowed a. Choose 4 numbers that have the smallest possible Standard Deviation b. Choose 4 numbers that have the largest possible Standard Deviation c. Is more than one choice possible in either a or b? - Explain

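a. Any set of 4 identical numbers (e.g., 5, 5, 5, 5), which gives SD = 0 b. 0, 0, 10, 10 (SD = √(100/3) ≈ 5.77) c. Yes for a: any of the 11 sets of 4 identical numbers works. No for b: 0, 0, 10, 10 is the only possible answer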
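A brute-force check of the contest: list every choice of 4 whole numbers from 0 to 10 (order ignored, repeats allowed) and compare their spreads. To avoid floating-point ties, this sketch compares the exact sum of squared deviations using `fractions.Fraction`. The helper name `sum_sq_dev` is hypothetical:

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import sqrt

def sum_sq_dev(nums):
    """Exact sum of squared deviations from the Mean (hypothetical helper)."""
    m = Fraction(sum(nums), len(nums))
    return sum((x - m) ** 2 for x in nums)

choices = list(combinations_with_replacement(range(11), 4))  # 1001 multisets
spreads = {c: sum_sq_dev(c) for c in choices}
lo, hi = min(spreads.values()), max(spreads.values())
smallest = [c for c, s in spreads.items() if s == lo]
largest = [c for c, s in spreads.items() if s == hi]

print(len(smallest), smallest[:2])        # 11 [(0, 0, 0, 0), (1, 1, 1, 1)]
print(largest, f"{sqrt(float(hi) / 3):.2f}")  # [(0, 0, 10, 10)] 5.77  (SD = sqrt(SS / (n - 1)))
```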

Degrees of Freedom (df)

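Determines the NUMBER OF SCORES in the Sample that are INDEPENDENT & FREE TO VARY - df = n - 1 for the Sample Variance & Standard Deviation: because the deviations X - M always sum to zero, once n - 1 of them are known, the last one is fixed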

The Formula (n + 1)/2...

Does not give the Median itself, but rather the LOCATION OF THE MEDIAN in the ordered list

Exercise 2.37: "Conservationists have despaired over destruction of tropical rainforest by logging, clearing, & burning" - These words begin a report on a statistical study of the effects of logging in Borneo Researchers compared forest plots that had never been logged (Group 1) with similar plots nearby that had been logged 1 year earlier (Group 2) & 8 years earlier (Group 3) - All plots were 0.1 hectare (ha) in area - Here are the counts of trees for plots in each group: Group 1 27 22 29 21 19 33 16 20 24 27 28 19 Group 2 12 12 15 9 20 18 17 14 14 2 17 19 Group 3 18 4 22 15 18 19 22 12 12 To what extent has logging affected the count of trees? - Follow the 4-Step Process in reporting your work

Here, M & SD are reasonable summaries Group 1: - M = 23.75 - SD = 5.07 Group 2: - M = 14.08 - SD = 4.98 Group 3: - M = 15.78 - SD = 5.76 Logging appears to reduce the number of trees per plot
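The group summaries above can be checked with Python's `statistics` module (`stdev` is the n − 1 Sample Standard Deviation). The group labels in the dictionary are just descriptive strings taken from the exercise:

```python
from statistics import mean, stdev

plots = {  # tree counts per 0.1 ha plot
    "Group 1 (never logged)": [27, 22, 29, 21, 19, 33, 16, 20, 24, 27, 28, 19],
    "Group 2 (logged 1 year earlier)": [12, 12, 15, 9, 20, 18, 17, 14, 14, 2, 17, 19],
    "Group 3 (logged 8 years earlier)": [18, 4, 22, 15, 18, 19, 22, 12, 12],
}
for group, trees in plots.items():
    print(f"{group}: n = {len(trees)}, M = {mean(trees):.2f}, SD = {stdev(trees):.2f}")
# Group 1 (never logged): n = 12, M = 23.75, SD = 5.07
# Group 2 (logged 1 year earlier): n = 12, M = 14.08, SD = 4.98
# Group 3 (logged 8 years earlier): n = 9, M = 15.78, SD = 5.76
```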

Apply Your Knowledge 2.9: In Exercise 2.1 you plotted the silk yield stress for 21 female golden orb weaver spiders The Five-Number Summary for that data is: - Min = 164.0 - Q1 = 260.9 - Med = 290.7 - Q3 = 354.95 - Max = 740.2 Use the 1.5×IQR rule to identify suspected Outliers

IQR = Q3 − Q1 = 354.95 − 260.9 = 94.05
1.5×IQR = (1.5)(94.05) = 141.075 ≈ 141.08
Q1 − 1.5×IQR = 260.9 − 141.08 = 119.82
Q3 + 1.5×IQR = 354.95 + 141.08 = 496.03
No observation falls below 119.82 (the minimum is 164.0), & only 740.2 falls above 496.03, so 740.2 is the only suspected Outlier

Ex 2.10: In Example 2.4 we looked at the number of flood events recorded between 2009 & 2013 in each of 52 Atlantic coastal communities The Five-Number Summary for these data is: - 0, 5, 37.5, 73.5, 250 Use the 1.5×IQR Rule & your judgement to identify any Outliers

IQR = Q3 − Q1 = 73.5 − 5 = 68.5
1.5×IQR = (1.5)(68.5) = 102.75
Q1 − 1.5×IQR = 5 − 102.75 = −97.75
Q3 + 1.5×IQR = 73.5 + 102.75 = 176.25
Any values falling outside the interval from −97.75 to 176.25 are suspected Outliers
We can see from the Dotplot in Figure 2.2 that the four largest values are flagged by this rule - Now we must decide whether they actually are Outliers
Figure 2.7 shows a combination Histogram-modified Boxplot created by the statistical software JMP - The 4 data points flagged by the 1.5×IQR Rule are displayed as individual dots rather than as part of the Boxplot's high whisker - A gap appears between these 4 points & the rest of the high whisker, but it is not very large - In fact, the Histogram underneath has a very pronounced right skew without any gap
So are the suspected Outliers flagged by the 1.5×IQR Rule actual Outliers? - They are certainly somewhat more extreme than the rest of the Distribution, yet they fit well within the overall right-skew pattern
The 4 largest numbers of flood events may be described as possibly mild Outliers within a pronounced right skew

Ex 2.9: For the acorn sizes data, the Five-Number Summary is: - 0.4, 1.6, 4.1, 6.0, 17.1 Use the 1.5×IQR Rule to find any suspected Outliers

IQR = Q3 − Q1 = 6.0 − 1.6 = 4.4
1.5×IQR = (1.5)(4.4) = 6.6
Q1 − 1.5×IQR = 1.6 − 6.6 = −5.0
Q3 + 1.5×IQR = 6.0 + 6.6 = 12.6
Any values falling outside the interval from −5.0 to 12.6 are flagged as suspected Outliers - In this case, the 1.5×IQR Rule flags only one value in the data set, the largest value of 17.1, & suggests that it may be an Outlier - Looking at the Dotplot of acorn sizes in Figure 2.6, we can confirm that the largest acorn (17.1 cm³) is indeed an Outlier

Ex 2.8: In Examples 2.1 & 2.2 we examined sample sets of needles from two distinct species of pine trees, Aleppo & Torrey Compare their Boxplots

In Figure 2.4 we can see that the needles in the Torrey pine set are all longer than the needles in the Aleppo pine set: - The minimum length of the Torrey pine set is larger than the maximum length for the Aleppo pine set Torrey pine needle lengths also have greater variability, as shown by the spread of the box & the spread between the extremes Finally, the data for the Torrey pine are symmetrical, whereas the data for the Aleppo pine are mildly right-skewed

Ex 2.3: Find the Mean length of the 15 Aleppo pine needles 7.2 7.6 8.5 8.5 8.7 9.0 9.0 9.3 9.4 9.4 10.2 10.9 11.3 12.1 12.8

M = (∑X)/n = (7.2 + 7.6 + 8.5 + 8.5 + 8.7 + 9.0 + 9.0 + 9.3 + 9.4 + 9.4 + 10.2 + 10.9 + 11.3 + 12.1 + 12.8)/15 = 143.9/15 = 9.59 cm

First & Third Quartiles (Q1 & Q3)

Mark out the CENTRAL HALF of the Distribution

Exercise 2.27: Figure 1.9 (page 20) is a Histogram of the Distribution of age at the onset of anorexia for 691 Canadian girls diagnosed with the disorder - If you round the age to whole numbers of years, the first bar of the Histogram (the first class) would include all girls diagnosed during their 11th year With a little care, you can find the Median & the quartiles from the Histogram - What are these numbers? - How did you find them?

Med = 14 Q1 = 13 Q3 = 15

Find the Five-Number Summary of the 18 Torrey pine needles (Example 2.3): 21.2 21.6 21.7 23.1 23.7 24.2 24.2 25.5 26.6 * 26.8 28.9 29.0 29.7 29.7 30.2 32.5 33.7 33.7

Min = 21.2, Q1 = 23.7, Med = 26.7, Q3 = 29.7, Max = 33.7

Find the Five-Number Summary of the 15 Aleppo pine needles (Example 2.1): 7.2 7.6 8.5 8.5 8.7 9.0 9.0 9.3 9.4 9.4 10.2 10.9 11.3 12.1 12.8

Min = 7.2, Q1 = 8.5, Med = 9.3, Q3 = 10.9, Max = 12.8

Statistics

Numerical Summaries that describe a SAMPLE

Parameters

Numerical Summaries that describe an ENTIRE POPULATION

Numerical Summaries for a QUANTITATIVE Variable

Should provide measures of SPREAD & CENTER

Interquartile Range (IQR)

The DISTANCE BETWEEN the FIRST & THIRD QUARTILES - It is mainly used as the basis for a rule of thumb for IDENTIFYING SUSPECTED OUTLIERS - IQR = Q3 - Q1

Range

The DISTANCE BETWEEN the MINIMUM & the MAXIMUM VALUES in a Distribution - The difference between the smallest & largest observations - The minimum & the maximum show the full Spread of the data, but they may actually be Outliers

LEFT-Skewed Boxplots

The FIRST Quartile will be farther BELOW the Median than the Third Quartile is above it - Longer section on the LEFT/BOTTOM side

Symmetric Boxplots

The First & Third Quartiles are equally distant from the Median

SECOND Quartile (Q2) =

The MEDIAN - The 50th Percentile - It is larger than 50% of the observations

Median

The MIDPOINT of a Distribution - The NUMBER SUCH THAT HALF THE OBSERVATIONS ARE SMALLER & the other HALF ARE LARGER - Gives an idea of the Center of the "TYPICAL" part of the Distribution Also known as the 50th Percentile It is a RESISTANT MEASURE OF CENTER - It is more ROBUST than the Mean

Mean

The ORDINARY ARITHMETIC AVERAGE - M = (∑X)/n It is very sensitive to extreme values (i.e., Outliers)

Sample STANDARD DEVIATION (s, S, SD)

The SQUARE ROOT OF the SAMPLE VARIANCE - Measures Spread by looking at how far the observations are from their Mean - Easier to interpret than the Sample Variance because it is in the SAME UNITS OF MEASUREMENT AS THE DATA - s = √s² (also written S = √S² or SD = √SD²)

RIGHT-Skewed Boxplots

The THIRD Quartile will be farther ABOVE the Median than the First Quartile is below it - Longer section on the RIGHT/TOP side

Quartiles

The VALUES THAT DIVIDE the DATA INTO FOUR EQUAL PARTS - A MORE RESISTANT way to describe the Spread of a Quantitative Variable is to LOOK AT the SPREAD OF the MIDDLE HALF of the data - They are resistant to Outliers Include: - Q1 - Q3 - IQR

Exercise 2.25: Figure 1.8 (page 19) shows the age in years of 241,931 patients diagnosed with Lyme disease in the United States between 1992 & 2006 Give a brief description of the important features of the Distribution - Explain why no Numerical Summary would appropriately describe this Distribution

The data are Bimodal - There is no unique Center &, therefore, no simple Numerical Summary

We also have the lengths (in cm) of 18 needles from trees of the noticeably different Torrey pine species - What is the Median length for these 18 pine needles? - The ordered data are: 21.2 21.6 21.7 23.1 23.7 24.2 24.2 25.5 26.6 26.8 28.9 29.0 29.7 29.7 30.2 32.5 33.7 33.7

There is no unique Center observation, but there is a Center pair: the values 26.6 & 26.8, which have 8 observations before them in the ordered list & 8 observations after them. The Median is midway between these two observations: - (n + 1)/2 = (18 + 1)/2 = 19/2 = 9.5 - The Median is "halfway between the 9th & 10th observations in the ordered list" - Median = (26.6 + 26.8)/2 = 26.7 cm

Mean vs. Median

Unlike the Median, the Mean uses all the values in the data set - Therefore, very small or very large numerical values, such as those found in skews & Outliers, will influence the value of the Mean - Because the Mean cannot resist the influence of extreme observations, we say that it is NOT a resistant measure of center The MEDIAN, in contrast, is a RESISTANT MEASURE OF CENTER because it is influenced only by the total number of data points & the numerical value of the point(s) located at the Center of the Distribution The Mean & the Median of a symmetric Distribution are close together - If the Distribution is exactly symmetric, the Mean & the Median are exactly the same - In a skewed Distribution, the Mean is usually farther out in the long tail than is the Median
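A quick way to see resistance at work: take the 15 Aleppo pine needle lengths and replace the largest one (12.8 cm) with an extreme 50 cm, the same hypothetical change used in Example 2.5. Then compare the two measures of Center:

```python
from statistics import mean, median

aleppo = [7.2, 7.6, 8.5, 8.5, 8.7, 9.0, 9.0, 9.3, 9.4, 9.4, 10.2, 10.9, 11.3, 12.1, 12.8]
modified = aleppo[:-1] + [50.0]   # swap the largest needle for a hypothetical 50 cm Outlier

print(round(mean(aleppo), 2), median(aleppo))       # 9.59 9.3
print(round(mean(modified), 2), median(modified))   # 12.07 9.3  (Mean jumps, Median unchanged)
```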

Ex 2.6: Here are the lengths of the 18 Torrey pine needles (Example 2.3), arranged in increasing order: 21.2 21.6 21.7 23.1 23.7 24.2 24.2 25.5 26.6 * 26.8 28.9 29.0 29.7 29.7 30.2 32.5 33.7 33.7 Find Q1, Q2, & Q3

We have an even number of observations, so the Median lies midway between the middle pair, the 9th & 10th values in the list - Med = 26.7 cm - Marked by a * The First Quartile is the Median of the first 9 observations because these are the observations to the left of the location of the Median Confirm that: - Q1 = 23.7 cm - Q3 = 29.7 cm

Ex 2.5: Our Sample of 15 Aleppo pine needles (Example 2.1), arranged in increasing order, is: 7.2 7.6 8.5 8.5 8.7 9.0 9.0 9.3 9.4 9.4 10.2 10.9 11.3 12.1 12.8 Find Q1, Q2, & Q3

We have an odd number of observations, so the Median is the middle one, the 9.3 in the list. The First Quartile is the Median of the 7 observations to the left of the Median - This is the 4th of these 7 observations, so - Q1 = 8.5 cm. If you want, you can use the formula for locating the Median with n = 7: - Q1 Location = (n + 1)/2 = (7 + 1)/2 = 8/2 = 4. The Third Quartile is the Median of the 7 observations to the right of the Median - Q3 = 10.9 cm. The Quartiles are resistant to Outliers - Ex: Q3 would still be 10.9 if the largest needle length were 50 cm rather than 12.8 cm

Describing the Center of Skewed Distributions

When dealing with strongly skewed Distributions, it is somewhat customary to report the MEDIAN ("midpoint") rather than the Mean ("arithmetic average") - Because the Median gives an idea of the Center of the "TYPICAL" part of the Distribution However, a health organization or a government agency may need to account for all possible survival times - Therefore, they will calculate the Mean to estimate the cost of medical care for a given disease & to plan medical staffing appropriately - Relying only on the Median would result in underestimating the medical & financial needs The Mean & Median measure Center in different ways, and both are useful

FIRST Quartile (Q1)

When the list of observations is sorted in increasing order, this Quartile lies ONE QUARTER of the way UP THE LIST - It is larger than 25% of the observations

THIRD Quartile (Q3)

When the list of observations is sorted in increasing order, this Quartile lies THREE QUARTERS of the way UP THE LIST - It is larger than 75% of the observations

Apply Your Knowledge 2.5: In Exercise 2.1 you plotted the silk yield stress for 21 female golden orb weaver spiders (measured in MPa) 164.0 478.7 251.3 351.7 173.0 448.9 300.6 362.0 272.4 740.2 329.0 327.2 270.5 332.1 288.8 176.1 282.2 236.1 358.2 270.5 290.7 a. Obtain the Five-Number Summary of the Distribution of yield stresses b. Obtain the Mean & the Standard Deviation of the Sample of yield stresses c. Which summary gives more information about the Distribution of silk yield stresses? - How do they reflect what you see in your Dotplot? - Remember that a summary for a Quantitative Variable, no matter how detailed, will never be as informative as a graph of the raw data

a. 5-Number Summary - Min = 164.0 - Q1 = 260.9 - Med = 290.7 - Q3 = 354.95 - Max = 740.2 b. Mean & Standard Deviation - M = 319.25 - SD = 124.92 c. The Five-Number Summary is more informative - In particular, it reflects the fact that the maximum (an Outlier) is much farther away from the Median than the minimum is
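Parts a & b can be reproduced with the sketch below. It uses this chapter's quartile rule (with n = 21, the Median is left out of both halves) and the n − 1 Sample Standard Deviation from `statistics.stdev`. The helper name `five_number_summary` is hypothetical:

```python
from statistics import mean, median, stdev

silk = [164.0, 478.7, 251.3, 351.7, 173.0, 448.9, 300.6, 362.0, 272.4, 740.2, 329.0,
        327.2, 270.5, 332.1, 288.8, 176.1, 282.2, 236.1, 358.2, 270.5, 290.7]  # MPa

def five_number_summary(data):
    """(Min, Q1, Median, Q3, Max) with the 'median of each half' rule (hypothetical helper)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[half + len(xs) % 2:]   # halves exclude the Median when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print([round(v, 2) for v in five_number_summary(silk)])  # [164.0, 260.9, 290.7, 354.95, 740.2]
print(round(mean(silk), 2), round(stdev(silk), 2))       # 319.25 124.92
```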

Apply Your Knowledge 2.7: A 2014 study examined the prices (in dollars) billed by hospitals all over California for common blood tests such as a blood lipid panel (to check cholesterol level) & a blood metabolic panel (including fasting plasma glucose level). Here is how the findings were reported:
Lipid panel: N = 178, M = 299, SD = 759, Min = 10, 5th Perc = 76, 25th Perc = 134, Med = 220, 75th Perc = 303, 95th Perc = 602, Max = 10,169
Metabolic panel: N = 189, M = 371, SD = 814, Min = 35, 5th Perc = 62, 25th Perc = 111, Med = 214, 75th Perc = 389, 95th Perc = 716, Max = 7,303
a. Make side-by-side Boxplots comparing the Distributions of prices for the two procedures (as in Figure 2.4) - Describe the Distribution of prices for each blood test - Are there substantial differences between the two? b. The report also provided the Mean & Standard Deviation for the two Distributions - Explain why in this case the Mean & Standard Deviation would be poor choices of summary statistics to cite in a news report

a. Both Distributions are extremely skewed to the right, with fairly similar Centers & Spreads b. The Mean & Standard Deviation are not resistant to skews & outliers

Exercise 2.31: In Exercise 1.41 (page 36) you graphed the Distribution of ovarian tumor increases under two experimental conditions: - A new nanoparticle-based delivery system for a suicide gene therapy - An inactive buffer solution a. Make a Boxplot comparing tumor increase under the two conditions & compute the Mean & Standard Deviation for each condition b. Write a short description of the experimental results based on your work in a

a. Buffer: M = 6.08; SD = 1.98 - Nanoparticles: M = 2.020; SD = 1.02 b. The nanoparticle treatment was much more effective at limiting tumor growth

Apply Your Knowledge 2.11: A manufacturer of blood pressure monitoring devices lists 10 factors that can affect blood pressure readings: 1. Blood pressure cuff is too small 2. Blood pressure cuff is used over clothing 3. Not resting for a few minutes 4. Arms, back, or feet unsupported 5. Emotional state 6. Talking 7. Smoking recently 8. Consuming alcohol or caffeine recently 9. Room temperature 10. Bladder is full or empty a. Which factor(s) would lead to data points you would definitely want to discard in a study of blood pressure in healthy adults? - Explain your reasoning - Which part of the Discussion on page 55 addresses this issue? b. Which factor(s) may lead to unusual blood pressure values that should not be ignored in a study of blood pressure in healthy adults? - Explain your reasoning c. If the manufacturer of a blood pressure cuff wanted to examine the variability of readings obtained at the doctor's office, should suspicious data points resulting from any of the 10 factors listed here be discarded? - If your answer is different than the one you gave in part a, explain your reasoning - Which part of the Discussion on page 55 addresses this issue?

a. Factors 1 & 2 (incorrect use) & factors 3 and 4 (experimental consistency) b. Factors 5 through 10 (natural variation in healthy humans) c. Factors 1 through 4, which represent errors in experimentation or data collection - Additional factors may be argued, depending on purpose

Apply Your Knowledge 2.3: A study of a new type of vision screening test recruited a sample of 175 children age 3 to 7 years - The publication provides the following summary of the children's ages: "Twelve patients (7%) were 3 years old; 33 (19%), 4 years old; 29 (17%), 5 years old; 69 (39%), 6 years old; and 32 (18%), 7 years old." a. What is the Median age in the study? - Notice that you can easily add up the percents provided in parentheses (in increasing order of age) until the total just exceeds 50% b. What is the Mean age in the study? - You will need to either organize the data in a way that your technology will accept or do the computations by hand - Either way, be sure to multiply each age by the number of children with that age in the numerator of the formula for the mean c. Display the reported ages in a Histogram - Compare the values of the Mean & the Median in relation to the age Distribution in the study - What general fact does your comparison illustrate?

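a. Med = 6 b. M = 5.43 c. The age Distribution is moderately left-skewed - The Mean (5.43) is less than the Median (6), illustrating that in a skewed Distribution the Mean lies farther out in the long tail than the Median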
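When data arrive as a frequency table, the Mean weights each age by its count. The Median can be found by walking up the cumulative counts to the (n + 1)/2 location. This minimal sketch assumes n is odd (here n = 175, so the location, 88, falls on a single child). For even n you would need to check both center observations:

```python
ages = {3: 12, 4: 33, 5: 29, 6: 69, 7: 32}   # age in years: number of children
n = sum(ages.values())                        # 175
mean_age = sum(age * count for age, count in ages.items()) / n   # 951 / 175

location = (n + 1) / 2                        # 88th child when ordered by age
running = 0
for age, count in sorted(ages.items()):
    running += count                          # cumulative counts: 12, 45, 74, 143, ...
    if running >= location:
        median_age = age
        break

print(n, round(mean_age, 2), median_age)      # 175 5.43 6
```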

Exercise 2.29: In Example 2.7 you examined the metabolic rates of 7 men - Here are the metabolic rates for 12 women from the same study: 995 1425 1396 1418 1502 1256 1189 913 1124 1052 1347 1204 a. The most common methods for formal comparison of two groups use M & SD to summarize the data - What kinds of distributions are best summarized by M & SD? b. Make a summary graph comparing the metabolic rates of the 7 men & 12 women, as in Figure 2.4 - What can you conclude about these two groups from your graph?

a. Symmetric Distributions with no Outliers b. On average, women have lower metabolic rates than men
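A side-by-side Boxplot is drawn from each group's Five-Number Summary, so one way to prepare the graph in part b is to compute those summaries first. This sketch uses the chapter's quartile rule. The helper name `five_number_summary` is hypothetical:

```python
from statistics import median

def five_number_summary(data):
    """(Min, Q1, Median, Q3, Max) with the 'median of each half' rule (hypothetical helper)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[half + len(xs) % 2:]   # exclude the Median when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

men = [1792, 1666, 1362, 1614, 1460, 1867, 1439]        # Cal per 24 hours
women = [995, 1425, 1396, 1418, 1502, 1256, 1189, 913, 1124, 1052, 1347, 1204]
print("Men:  ", five_number_summary(men))     # (1362, 1439, 1614, 1792, 1867)
print("Women:", five_number_summary(women))   # (913, 1088.0, 1230.0, 1407.0, 1502)
```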

Apply Your Knowledge 2.1: Spider silk is the strongest known material, natural or man-made, on a weight basis - A study examined the mechanical properties of spider silk using 21 female golden orb weavers (Nephila clavipes) - Here are data on silk yield stress, which represents the amount of force per unit area needed to reach permanent deformation of the silk strand - The data are expressed in megapascals (MPa) 164.0 478.7 251.3 351.7 173.0 448.9 300.6 362.0 272.4 740.2 329.0 327.2 270.5 332.1 288.8 176.1 282.2 236.1 358.2 270.5 290.7 a. Make a Dotplot of these data - Describe the Shape, Center, & Spread of the Distribution b. Find the Mean & Median yield stress - Compare these two values - Referring to your plot, what general fact does your comparison illustrate?

a. The Distribution is irregular but somewhat symmetric except for a high Outlier - The Spread is from 164.0 to 740.2 b. M = 319.25; Med = 290.70 - The Median is notably less than the Mean, which suggests a right-skew or high Outlier(s)

Ex 2.1: What is the Median length for our 15 Aleppo pine needles? - Here are the data, arranged in increasing order: 7.2 7.6 8.5 8.5 8.7 9.0 9.0 9.3 9.4 9.4 10.2 10.9 11.3 12.1 12.8

n = 15 - The number of observations is odd. The value 9.3 is the center observation in the ordered list, with 7 observations to its left & 7 observations to its right - This value is the Median - Median = 9.3. Because n = 15, our rule for the location of the Median gives - Median Location = (n + 1)/2 = (15 + 1)/2 = 16/2 = 8 - That is, the Median is the 8th observation in the ordered list. Use this rule to locate the center in an ordered list or even in a Dotplot

