CH3

Check Your Skills 3.11: You have data for many individuals on their walking speed & their heart rate after a 10-minute walk - When you make a Scatterplot, the Explanatory Variable on the X-axis... a. Is walking speed b. Is heart rate c. Doesn't matter

A

Z-Score

A STANDARDIZED score that tells HOW MANY STANDARD DEVIATION UNITS A given SCORE IS ABOVE/BELOW THE MEAN for that group - How many SD's an observation is from its Mean
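The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `z_scores` and the sample heights are hypothetical, and the sample standard deviation (dividing by n - 1) is assumed because that is the version used in the formula for r.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return how many standard deviations each observation lies from the mean."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / sd for v in values]

# Hypothetical sample, chosen only for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = z_scores(heights_cm)

for h, score in zip(heights_cm, z):
    print(f"{h:6.1f} cm -> z = {score:+.2f}")
```

An observation equal to the mean gets z = 0. Observations above the mean get positive z-scores and those below get negative ones.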

Check Your Skills 3.16: What are all the values that a Correlation r can possibly take? a. r ≥ 0 b. 0 ≤ r ≤ 1 c. -1 ≤ r ≤ 1

C

Check Your Skills 3.20: In Exercise 3.18 you calculated a Correlation for knee height (X in cm) & height (Y in cm) - If, instead, you used height (in cm) as the X Variable & knee height (in cm) as the Y Variable, the new Correlation would... a. Have the inverse value (1 over) b. Have the opposite value (a different sign) c. Remain the same

C

RESPONSE Variable

DEPENDENT Variables - This Variable DEPENDS ON the EXPLANATORY VARIABLE - The Variable being INFLUENCED/AFFECTED - "Effect" - The VERTICAL/Y axis The Variable that MEASURES AN OUTCOME of a study - What is being MEASURED

Bivariate Analysis

Examining the RELATIONSHIP BETWEEN TWO VARIABLES

Distribution of 2 Quantitative Variables

Scatterplot

When we DON'T SET THE VALUES OF EITHER VARIABLE but simply observe both of them...

There MAY OR MAY NOT BE EXPLANATORY & RESPONSE VARIABLES, depending on how we plan to use the data

Exercise 3.33: Some types of cancer are much more common than others - Although genetic & environmental factors contributing to cancer often make the news, cancer may also arise from random mutations during routine stem cell divisions over the course of a lifetime - Researchers examined the relationship between the total number of stem cell divisions in the lifetime of a given tissue & the lifetime risk of cancer in that tissue - Table 3.6 provides the data for 31 types of cancers for which this information is known in the U.S. population - Each risk value reflects the proportion of individuals in the population who get this particular cancer over their lifetime
Cancer type | Lifetime risk | Stem cell divisions
AM leukemia | 0.0041000 | 129,900,000,000
Basal cell | 0.3000000 | 3,550,000,000,000
CL leukemia | 0.0052000 | 129,900,000,000
Colorectal | 0.0480000 | 1,168,000,000,000
FAP colorectal | 1.0000000 | 1,168,000,000,000
Lynch colorectal | 0.5000000 | 1,168,000,000,000
Duodenum | 0.0003000 | 7,796,000,000
FAP duodenum | 0.0350000 | 7,796,000,000
Esophageal | 0.0019380 | 1,203,000,000
Gallbladder | 0.0028000 | 78,400,000
Glioblastoma | 0.0021900 | 270,000,000
Head & neck | 0.0138000 | 31,860,000,000
HPV head & neck | 0.0793500 | 31,860,000,000
Hepatocellular | 0.0071000 | 270,900,000,000
HCV hepatocellular | 0.0710000 | 270,900,000,000
Lung (nonsmokers) | 0.0045000 | 9,272,000,000
Lung (smokers) | 0.0810000 | 9,272,000,000
Medulloblastoma | 0.0001100 | 272,000,000
Melanoma | 0.0203000 | 763,800,000,000
Osteosarcoma | 0.0003500 | 29,260,000
Arms osteosarcoma | 0.0000400 | 4,550,000
Head osteosarcoma | 0.0000302 | 6,020,000
Legs osteosarcoma | 0.0002200 | 11,130,000
Pelvis osteosarcoma | 0.0000300 | 3,150,000
Ovarian germ cell | 0.0004110 | 22,000,000
Pancreatic ductal | 0.0135890 | 342,800,000,000
Pancreatic islet | 0.0001940 | 6,068,000,000
Small intestine | 0.0007000 | 292,200,000,000
Testicular | 0.0037000 | 3,348,000,000
Thyroid follicular | 0.0102600 | 585,000,000
Thyroid medullary | 0.0003240 | 58,500,000
a. Make a Scatterplot showing lifetime cancer risk as a function of total number of stem cell divisions - Describe the form, direction, & strength of the relationship - Would it be appropriate to compute the correlation r between these 2 Variables? - Explain your reasoning b. When a Variable spreads over several orders of magnitude, transforming the data using a base-10 logarithm function helps focus on these differences of magnitude - What are the minimum & maximum for each Variable? - Now obtain the base-10 log of each value in the table - What are the minimums & the maximums now? c. Make a Scatterplot showing the log of lifetime cancer risk as a function of the log of total number of stem cell divisions - Describe the form, direction, & strength of this log-log relationship - Most software programs provide the option of using logarithm scales for the axes of Scatterplots, allowing you to skip the conversion to logarithms of the original data - How strong is the relationship between the log of cancer risk & the log of stem cell divisions? - Obtain the value of r using the log-transformed data

a. Most of the data are lumped in the lower left corner, so r could be misleading b. The data range from 0.00003 to 1.0 for lifetime risk & from 3,150,000 to 3,550,000,000,000 for stem cell divisions - In log scale, however, the ranges are -4.52 to 0 & 6.5 to 12.55, respectively c. The Scatterplot now shows a strong, positive linear relationship - r = 0.804, up to roundoff error
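The log-scale ranges quoted in part b can be checked with a short sketch. The minimum & maximum values are the ones stated in the answer. The variable names are only illustrative.

```python
import math

# Extremes of each Variable, as stated in the answer to part b.
risk_min, risk_max = 0.00003, 1.0
div_min, div_max = 3_150_000, 3_550_000_000_000

log_ranges = {
    "lifetime risk": (math.log10(risk_min), math.log10(risk_max)),
    "stem cell divisions": (math.log10(div_min), math.log10(div_max)),
}

for name, (low, high) in log_ranges.items():
    print(f"log10 of {name}: {low:.2f} to {high:.2f}")
```

The raw stem cell counts span about six orders of magnitude. After the transformation both Variables fall in ranges only a few units wide, which is why the log-log Scatterplot is so much easier to read.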

Ex 3.2: a. The National Center for Health Statistics (NCHS) surveys the American population & collects, for each individual in the survey, information about body height, age, sex, & a long list of other attributes - The purpose of the NCHS survey is to document the characteristics of the American population b. A pediatrician looks at the same data with an eye toward using age & sex, along with other Variables such as ethnicity, to discuss a child's growth - Are there Explanatory & Response Variables in these examples? - If there are, define them

a. There is NO Explanatory or Response Variable in this context b. Now AGE & SEX are EXPLANATORY Variables - HEIGHT is the RESPONSE Variable

Interpreting Correlation:

1. Correlation makes NO DISTINCTION BETWEEN EXPLANATORY & RESPONSE VARIABLE - It makes no difference which Variable you call X & which Variable you call Y when calculating the Correlation 2. Because r uses the standardized values of the observations, R DOES NOT CHANGE WHEN WE CHANGE THE UNITS OF MEASUREMENT OF X, Y, OR BOTH - Measuring height in inches rather than centimeters & weight in pounds rather than kilograms does not change the Correlation between height & weight - The Correlation itself has no unit of measurement; it is just a number 3. In a linear association, a POSITIVE r indicates a POSITIVE ASSOCIATION between the Variables - A NEGATIVE r indicates a NEGATIVE ASSOCIATION 4. The Correlation R IS ALWAYS A NUMBER BETWEEN NEGATIVE ONE & POSITIVE ONE - −1 ≤ r ≤ 1 - Values of r NEAR ZERO indicate a VERY WEAK linear relationship - The strength of the linear relationship increases as r moves away from 0 toward either -1 or 1 - Values of r CLOSE TO NEGATIVE ONE or POSITIVE ONE indicate that the points in a Scatterplot lie CLOSE TO a straight LINE - The extreme values r = -1 & r = 1 occur only in the case of a PERFECT LINEAR RELATIONSHIP when the POINTS LIE EXACTLY ALONG a straight LINE
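Points 1, 2, & 4 above can be checked numerically. The sketch below uses the knee height & overall height data from Check Your Skills 3.18. The helper name `correlation` is an assumption made for illustration, not something the source defines.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

knee_cm = [57.7, 47.4, 43.5, 44.8, 55.2]
height_cm = [192.1, 153.3, 146.4, 162.7, 169.1]

r = correlation(knee_cm, height_cm)

# Point 1: swapping X & Y leaves r unchanged.
r_swapped = correlation(height_cm, knee_cm)

# Point 2: changing units (cm -> mm for knee height, cm -> m for height) leaves r unchanged.
knee_mm = [x * 10 for x in knee_cm]
height_m = [y / 100 for y in height_cm]
r_units = correlation(knee_mm, height_m)

print(round(r, 2), round(r_swapped, 2), round(r_units, 2))
```

All three printed values agree, which matches the answers to Check Your Skills 3.19 & 3.20.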

Cautions to Keep in Mind when Using Correlation (r)

1. Correlation requires that BOTH VARIABLES be QUANTITATIVE so that it makes sense to perform the arithmetic indicated by the formula for r - Ex: we cannot calculate a Correlation between the amount of fat in the diets of a group of people & their ethnicity, because ethnicity is a Categorical Variable 2. Correlation measures the strength of only the linear relationship between 2 Variables - Correlation DOES NOT DESCRIBE CURVED RELATIONSHIPS between Variables, no matter how strong they are - Exercise 3.9 illustrates this important fact 3. Like the Mean & the Standard Deviation, the CORRELATION IS NOT RESISTANT: R IS STRONGLY AFFECTED BY A FEW OUTLYING OBSERVATIONS - In Example 3.6 we saw that researchers had identified 2 Outliers (shown as open symbols in Figure 3.3) representing years with exceptional flood events - The Correlation between starvation & fledging is r = -0.68 when the 2 Outliers are excluded, but it is a much weaker −0.47 when they are included - Use r with caution when the Scatterplot includes any Outliers 4. CORRELATION CALCULATED FROM AVERAGED DATA is typically MUCH STRONGER than Correlation calculated from the raw individual data points - Because averaged values mask some of the individual-to-individual variations that would otherwise appear in the Scatterplot - Exercise 3.8 addresses this fact 5. Correlation is NOT A COMPLETE SUMMARY OF TWO-VARIABLE DATA, even after you have established that the relationship between the Variables is linear - You should give the Means & Standard Deviations of both X & Y, along with the Correlation & Sample Size - Of course, these Numerical Summaries do not point out Outliers or Clusters in the Scatterplot - Numerical Summaries complement plots of data, but they don't replace them
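Cautions 2 & 3 are easy to demonstrate with made-up numbers. In the sketch below every data value is hypothetical & chosen only to make the effect visible. The helper `correlation` uses the algebraically equivalent sums-of-squares form of r.

```python
import math

def correlation(xs, ys):
    """Pearson r computed from sums of squared & cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Caution 2: a perfectly predictable but curved relationship (y = x squared).
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curved = correlation(curve_x, curve_y)

# Caution 3: a perfect straight line, then the same line plus one outlier.
line_x = [1, 2, 3, 4, 5, 6, 7, 8]
line_y = [2 * x + 1 for x in line_x]
r_line = correlation(line_x, line_y)
r_outlier = correlation(line_x + [9], line_y + [0])

print(f"curved: {r_curved:.2f}  line: {r_line:.2f}  line + outlier: {r_outlier:.2f}")
```

The curved data give r = 0 even though Y is completely determined by X. A single outlying point is enough to pull r from 1 down to a weak positive value.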

Examining A Scatterplot

1. In any graph of data, look for the OVERALL PATTERN & for striking deviations from that pattern 2. You can describe the overall pattern of a Scatterplot by the DIRECTION, FORM, & STRENGTH of the relationship 3. An important kind of deviation is an OUTLIER, an individual value that falls outside the overall pattern of the relationship - Look for points on the Scatterplot that do not fit the overall pattern, regardless of whether they would be identified as Outliers in a Dotplot or Histogram of each Distribution separately

Examining Several Variables

1. Plot your data - Look for overall patterns & deviations from those patterns 2. Based on what your plot shows, choose Numerical Summaries for some aspects of the data

Ex 3.3: Manatees are large, herbivorous, aquatic mammals found primarily in the rivers & estuaries of Florida - This endangered species suffers from cohabitation with human populations, & many manatees die each year from collisions with powerboats - Following our four-step process (page 58), let's look at the influence of the number of powerboats registered on manatee deaths from collisions with powerboats - We examine the relationship between the number of manatee deaths from powerboat collisions & the number of powerboats registered in any given year between 1977 & 2016, based on the data reported in Table 3.1
Year | Powerboats (thousands) | Deaths
1977 | 447 | 13
1978 | 460 | 21
1979 | 481 | 24
1980 | 498 | 16
1981 | 513 | 24
1982 | 512 | 20
1983 | 526 | 15
1984 | 559 | 34
1985 | 585 | 33
1986 | 614 | 33
1987 | 645 | 39
1988 | 675 | 43
1989 | 711 | 50
1990 | 719 | 47
1991 | 681 | 55
1992 | 679 | 38
1993 | 678 | 35
1994 | 696 | 49
1995 | 713 | 42
1996 | 732 | 60
1997 | 755 | 54
1998 | 809 | 66
1999 | 830 | 82
2000 | 880 | 78
2001 | 944 | 81
2002 | 962 | 95
2003 | 978 | 73
2004 | 983 | 69
2005 | 1010 | 79
2006 | 1024 | 92
2007 | 1027 | 73
2008 | 1010 | 90
2009 | 982 | 97
2010 | 942 | 83
2011 | 922 | 87
2012 | 902 | 81
2013 | 897 | 73
2014 | 900 | 69
2015 | 916 | 86
2016 | 931 | 104

1. State - The number of powerboats registered in Florida varies from year to year - Does this Variable help explain the differences from year to year in the number of manatee deaths from collisions with powerboats? 2. Plan - Examine the relationship between powerboats registered & manatee deaths from collision - Choose the Explanatory & Response Variables (if any) - Make a Scatterplot to display the relationship between the Variables - Interpret the plot to understand the relationship 3. Solve First Steps - We suspect that "powerboats registered" will help explain "manatee deaths from collisions" - So "powerboats registered" is the Explanatory Variable, & "manatee deaths from collisions" is the Response Variable - Time (the year in which the data were gathered) is not a Variable of interest here - Instead, we want to see how the Variable manatee deaths changes when the Variable powerboat registrations changes, so we put powerboat registrations (the Explanatory Variable, expressed in thousands) on the horizontal axis - Figure 3.1 is the Scatterplot of these data - Each point represents a single year - In 1997, for example, there were 755,000 powerboats registered & 54 manatee deaths due to powerboat collisions - Find 755 on the X (horizontal) axis & 54 on the Y (vertical) axis - In Figure 3.1 the year 1997 appears as the point (755, 54) above 755 & to the right of 54 Interpret the Plot - Figure 3.1 shows a clear direction: The overall pattern moves up, from lower left to upper right - That is, years in which powerboat registrations were higher tend to have higher counts of manatee deaths from collisions - We call this sort of pattern a positive association between the two variables - The form of the relationship is linear; that is, the overall pattern follows a straight line from lower left to upper right - The strength of a relationship in a Scatterplot is determined by how closely the points follow a clear form - The overall relationship in Figure 3.1 is strong because the points are fairly close to forming a straight line 4. Conclude - The number of powerboat registrations explains much of the variation in the number of manatee deaths from collisions - Years that had fewer powerboats registered also tended to have fewer accidental manatee deaths - To preserve this endangered species, restricting the number of powerboats registered might be helpful - However, the Scatterplot in Figure 3.1 does not take into consideration other factors, such as speed limits, fines, or driver education, that might also influence the number of manatee deaths from collisions - Our conclusions are limited to the available data & what they say about the relationship between powerboat registrations & manatee accidents

Check Your Skills 3.12: You have data for many individuals on their walking speed & their heart rate after a 10-minute walk - You expect to see... a. A positive association b. Very little association c. A negative association

A

Check Your Skills 3.13: Many species exhibit some degree of Sexual Dimorphism - Researchers examining the relationship between length & weight in a species of scorpion (Centruroides vittatus) should create a Scatterplot of length & weight... a. With different symbols for male & female scorpions b. For all scorpions regardless of sex, because sex is a Categorical Variable c. For only 1 sex of scorpion, because analyzing both sexes would introduce too much variability

A

Check Your Skills 3.17: If mothers were always 2 years younger than the fathers of their children, the Correlation between the ages of mothers & fathers would be... a. 1 b. 0.5 c. Can't tell without seeing the data

A

Check Your Skills 3.18: Because elderly people may have difficulty standing straight to have their height measured, a study looked at the relationship between overall height & height to the knee - Here are data (in centimeters) for 5 elderly men:
Knee height x (cm) | 57.7 | 47.4 | 43.5 | 44.8 | 55.2
Overall height y (cm) | 192.1 | 153.3 | 146.4 | 162.7 | 169.1
Using your calculator, what is the Correlation between knee height & overall height? a. r = 0.88 b. r = 0.09 c. r = 0.77

A

SCATTERplot

A graph that SHOWS the RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES measured on the same individuals - The values of one Variable appear on the horizontal axis, & the values of the other Variable appear on the vertical axis - Each individual in the data appears as the point in the plot fixed by the values of both Variables for that individual Always plot the explanatory variable, if there is one, on the horizontal axis (the X-axis) - As a reminder, we usually call the Explanatory Variable X & the Response Variable Y If there is no Explanatory-Response distinction, either Variable can go on the horizontal axis It is meant to examine a Bivariate Relationship - To achieve that purpose, the graph should zoom in on the pattern of the relationship, avoiding extra blank space on the edges, & give equal emphasis to both axes, resizing the plot to be roughly square rather than rectangular in shape - Taking the time to properly scale this graph is important because our eyes can be fooled by changing the plotting scales or the amount of empty space around the cloud of points

Check Your Skills 3.19: In Exercise 3.18, both heights are measured in centimeters - Just for the fun of it, someone decides to measure knee height in millimeters & height in meters - The data in these units are:
Knee height x (mm) | 577 | 474 | 435 | 448 | 552
Overall height y (m) | 1.921 | 1.533 | 1.464 | 1.627 | 1.691
The Correlation for the data using these units would be... a. Very close to the value calculated in Exercise 3.18 b. Exactly the same as in Exercise 3.18 c. Exactly 10 times smaller than in Exercise 3.18

B

Ex 3.1: How does drinking beer affect the level of alcohol in a person's blood? - The legal blood alcohol content limit for driving is now 0.08% in all states Student volunteers at Ohio State University drank different numbers of cans of beer - 30 minutes later, a police officer measured their blood alcohol content Which is the Explanatory & which is the Response Variable?

EXPLANATORY Variable - Number of beers consumed RESPONSE Variable - Percent of alcohol in the blood (BAC)

Ex 3.7: Longevity in male fruit flies is positively associated with adult size - But do other factors, such as sexual activity, also matter? - The cost of reproduction is well-documented for the females of the species A study looks at the association between longevity & adult size in male fruit flies kept under 1 of 2 conditions - 1 group is kept with sexually active females over their lifespan - The other group is cared for in the same way but kept with females that are not sexually active

Figure 3.4 shows the Scatterplot of longevity (Response Variable) versus adult thorax length (indicative of body size, Explanatory Variable) for both conditions The conditions are coded so that: - Individual fruit flies from the sexually active group are represented by triangles - Sexually inactive fruit flies from the second group are represented by circles This coding introduces a third Variable into the Scatterplot - "Condition" is a Categorical Variable that has 2 values, identified by the 2 different plotting symbols The Scatterplot is quite clear - Both groups show a positive linear relationship between thorax length & longevity, as expected - However, the sexually active fruit flies have, on the whole, a lower longevity - That is, fruit flies of a given size tend to die sooner in the sexually active group than in the inactive group - In fact, only 1 male in the sexually active group outlived a male in the inactive group with similar thorax length

EXPLANATORY Variable

INDEPENDENT Variable - "Cause" - The HORIZONTAL/X axis The Variable that may EXPLAIN/INFLUENCE CHANGE IN A RESPONSE VARIABLE - The Variable that influences another Variable - What is being MANIPULATED

DIRECTION of a Correlation

If the relationship has a clear one, we speak of either: - POSITIVE Association - NEGATIVE Association

How to Handle Data that has been Collected Over Time

In Table 3.1, manatee deaths & powerboats registered are listed for each year between 1977 & 2016 - Because we wanted to understand how powerboats are associated with manatee deaths, we created a Scatterplot of the relationship between the number of powerboats registered & the number of manatee deaths from powerboat collisions in any given year - We then analyzed this pattern regardless of time The data in Table 3.1 could also be used to create a Timeplot & study the trends over time for each Variable separately - This would give us a historical perspective not available with the Scatterplot It is possible to use time as the Explanatory Variable in a Correlation analysis, though time in itself does not explain or cause the changes (other factors do) - Exercise 3.8 provides such an example All approaches have some merit - What matters most is to carefully examine the question being investigated & to match the proper analytic method to the question

STRENGTH of a Correlation

It is determined by HOW CLOSE the POINTS in the Scatterplot LIE to a simple form such as a line

FORM of a Correlation

LINEAR Relationships are an important form of relationship between 2 Variables - The points show a straight-line pattern - Curved Relationships & Clusters are other forms to watch for

A Scatterplot displays the Direction, Form, & Strength of the Relationship Between 2 Quantitative Variables

Linear (straight-line) relations are particularly important because a straight line is a simple pattern that is quite common A linear relation is: - Strong if the points lie close to a straight line - Weak if they are widely scattered about a line Like all qualitative judgments, however, statements such as "weak" & "strong" are open to individual interpretation - Therefore, we follow our strategy for data analysis by using a numerical measure to supplement the graph - Correlation is the measure we use

In many studies, the goal is to show that changes in one/more Explanatory Variables actually cause changes in a Response Variable

Many Explanatory-Response Relationships, however, do not involve direct causation - The age & sex of a child can help predict future height, but they certainly do not cause a particular height

Correlation

Measures the DIRECTION & STRENGTH of the linear relationship between 2 Quantitative Variables (X & Y) - Only used for STRAIGHT-LINED, LINEAR Relationships - It is usually written as r - Suppose that we have data on Variables X & Y for n individuals - The values for the first individual are X1 & Y1 - The values for the second individual are X2 & Y2 - & so on - The Means & Standard Deviations of the 2 Variables are: Mx & SDx for the X-values, My & SDy for the Y-values - r = Σ(Zx × Zy)/(n − 1), where Zx = (X − Mx)/SDx & Zy = (Y − My)/SDy
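The formula can be followed step by step in code. This sketch uses the NEA change & fat gain data from Ex 3.5, for which the notes quote r = -0.779. The variable names are illustrative.

```python
from statistics import mean, stdev

# Data from Ex 3.5: change in non-exercise activity (Cal) & fat gain (kg).
nea_change_cal = [-94, -57, -29, 135, 143, 151, 245, 355,
                  392, 473, 486, 535, 571, 580, 620, 690]
fat_gain_kg = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
               3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

n = len(nea_change_cal)

# Step 1: Means & (sample) Standard Deviations of each Variable.
mx, sdx = mean(nea_change_cal), stdev(nea_change_cal)
my, sdy = mean(fat_gain_kg), stdev(fat_gain_kg)

# Step 2: standardize every observation.
zx = [(x - mx) / sdx for x in nea_change_cal]
zy = [(y - my) / sdy for y in fat_gain_kg]

# Step 3: add up the products of paired z-scores & divide by n - 1.
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

print(f"n = {n}, r = {r:.3f}")
```

Most products Zx × Zy are negative here, because above-average NEA change tends to go with below-average fat gain. That is what makes r negative.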

Exercise 3.35: Bushmeat = the meat of wild animals - It is widely traded in Africa, but its consumption threatens the survival of some animals in the wild - Bushmeat is often not the first choice of consumers: They eat bushmeat only when other sources of protein are in short supply - Researchers looked at declines in 41 species of mammals in nature reserves in Ghana & at catches of fish (the primary source of animal protein) in the same region - The data appear in Table 3.7 - Fish supply is measured in kilograms per person - The other Variable is the percent change in the total "biomass" (weight in tons) for the 41 animal species in six nature reserves - Most of the yearly percent changes in wildlife mass are negative because most years saw fewer wild animals in West Africa
Year | Fish supply (kg/person) | Biomass change (%)
1971 | 34.7 | 2.9
1972 | 39.3 | 3.1
1973 | 32.4 | −1.2
1974 | 31.8 | −1.1
1975 | 32.8 | −3.3
1976 | 38.4 | 3.7
1977 | 33.2 | 1.9
1978 | 29.7 | −0.3
1979 | 25.0 | −5.9
1980 | 21.8 | −7.9
1981 | 20.8 | −5.5
1982 | 19.7 | −7.2
1983 | 20.8 | −4.1
1984 | 21.1 | −8.6
1985 | 21.3 | −5.5
1986 | 24.3 | −0.7
1987 | 27.4 | −5.1
1988 | 24.5 | −7.1
1989 | 25.2 | −4.2
1990 | 25.9 | 0.9
1991 | 23.0 | −6.1
1992 | 27.1 | −4.1
1993 | 23.4 | −4.8
1994 | 18.9 | −11.3
1995 | 19.6 | −9.3
1996 | 25.3 | −10.7
1997 | 22.0 | −1.8
1998 | 21.0 | −7.4
Follow the 4-step outline (page 58) to examine whether the data support the idea that more animals are killed for bushmeat when the fish supply is low

Moderately strong, positive linear association - r = 0.8042 - This supports the idea that animal populations decline when the fish supply is low

Exercise 3.37: The toco toucan (Ramphastos toco) is the largest member of the toucan family & possesses the largest beak relative to body size of all birds - This exaggerated feature has been interpreted in various ways, such as being a refined adaptation for feeding - However, the large surface area may also be an important mechanism for radiating heat (& hence cooling the bird) as outdoor temperature increases - Here are data for beak heat loss, as a percent of total body heat loss, at various temperatures in degrees Celsius:
Temperature (°C) | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
Percent heat loss from beak | 32 | 34 | 35 | 33 | 37 | 46 | 55 | 51 | 43 | 52 | 45 | 53 | 58 | 60 | 62 | 62
Investigate the relationship between outdoor temperature & beak heat loss as a percent of total body heat loss

Strong, positive linear association - r = 0.9143 - This supports the idea that beak size may play a role in cooling
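The quoted value of r can be reproduced from the 16 data points in the exercise. This is a sketch only: the helper name `correlation` is an assumption, & it uses the sums-of-squares form of r, which is algebraically the same as the z-score formula.

```python
import math

# Data from Exercise 3.37.
temperature_c = list(range(15, 31))  # 15, 16, ..., 30
beak_heat_loss_pct = [32, 34, 35, 33, 37, 46, 55, 51,
                      43, 52, 45, 53, 58, 60, 62, 62]

def correlation(xs, ys):
    """Pearson r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(temperature_c, beak_heat_loss_pct)
print(f"r = {r:.4f}")
```

A Scatterplot should still be checked before relying on this number, because r only summarizes the linear part of the pattern.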

Ex 3.6: Punta Tombo, Argentina, is home to the world's largest breeding colony of Magellanic penguins (Spheniscus magellanicus) Researchers have studied this colony since 1983, recording each year how many eggs were laid, how many chicks hatched, how many died of various causes, & how many successfully left the nest (fledged) - Figure 3.3 is reproduced from the published findings - It shows the relationship between starvation & fledging in Magellanic penguin chicks for the 28 years between 1983 & 2010

The Scatterplot shows a clear negative, linear pattern: - Not surprisingly, years in which a higher percent of chicks died of starvation had a lower percent of chicks successfully fledging However, two points (displayed with open symbols) stand noticeably apart from the overall pattern - These Outliers of the relationship correspond to the years 1991 & 1999, when torrential rain killed more than 40% of the chicks each year The climate in Punta Tombo is arid, with low annual precipitation, so the years 1991 & 1999 were very unusual in this respect - For this reason, the researchers decided not to include these two data points when computing Numerical Summaries of the relationship between starvation & fledging

Ex 3.8: The Scatterplots in Figure 3.5 illustrate how values of r closer to 1 or -1 correspond to stronger linear relationships To make the meaning of r clearer, the Standard Deviations of both Variables in these plots are equal - The horizontal & vertical scales are the same In general, it is not so easy to guess the value of r from the appearance of a Scatterplot - Remember that changing the plotting scales in a Scatterplot may mislead our eyes, but it does not change the Correlation

The real data we have examined also illustrate how Correlation measures the strength & direction of linear relationships Figure 3.1 shows a very strong positive linear relationship between manatee deaths & powerboat registrations in Florida - The Correlation, in this case, is r = 0.945 Figure 3.2 shows a weaker but still fairly strong negative linear relationship between fat gain and non-exercise activity when overeating - Here the Correlation is r = -0.779 Figure 3.4 shows a reasonably strong positive association between thorax length & longevity in the group of sexually active fruit flies (displayed as triangles) - The Correlation in this study is r = 0.806 Notice how the strength of the association depends on the absolute value of r (its numerical value irrespective of its sign)

Apply Your Knowledge 3.9: In Exercise 3.4 you made a Scatterplot of acid phosphatase activity rate at different temperatures & described the strength of the relationship - If you calculate the Correlation based on the data provided in that exercise, you find r = 0.81 (check for yourself) Explain why the numerical value of the Correlation is only moderately strong, even though the Scatterplot indicates that the effect of temperature on enzymatic activity is clear & very reliable

The relationship is clearly not linear; therefore, r is meaningless

Statistical Relationships Between 2 Variables

They are OVERALL TENDENCIES, not iron-clad rules - They ALLOW INDIVIDUAL EXCEPTIONS - Ex: Although smokers, on average, die younger than nonsmokers, some people live to age 90 while smoking 3 packs a day - To understand them we measure both Variables on the same individuals - Often, we must examine other variables as well - Ex: To conclude that smoking reduces lung capacity, the researchers had to eliminate the effect of other Variables such as each person's size & exercise habits - The relationship between two variables can be STRONGLY INFLUENCED BY OTHER VARIABLES that are LURKING IN THE BACKGROUND - In each relationship there is one: - Explanatory Variable - Response Variable

Categorical Variables in Scatterplots

To add a Categorical Variable to a Scatterplot, use a DIFFERENT PLOT COLOR or SYMBOL FOR EACH CATEGORY - Doing so allows us to examine visually the effect of a third Variable on the relationship between X & Y - This is a starting point to a MULTIVARIATE ANALYSIS - Describe the relationship between X & Y separately for each category (that is, for each color or symbol) - Then compare & contrast these relationships

Ex 3.5: Obesity is a growing problem around the world - Surprisingly, some people don't gain weight even when they overeat - Perhaps fidgeting & other "non-exercise activity" (NEA) explains why - Some people may spontaneously increase their non-exercise activity when fed more - Researchers deliberately overfed 16 healthy young adult volunteers for 8 weeks - They measured fat gain (in kilograms, kg) &, as an Explanatory Variable, change in energy use (in kilocalories, or Calories, Cal) from activity other than deliberate exercise (i.e., fidgeting, daily living, & the like) - Here are the data:
NEA change (Cal) | −94 | −57 | −29 | 135 | 143 | 151 | 245 | 355 | 392 | 473 | 486 | 535 | 571 | 580 | 620 | 690
Fat gain (kg) | 4.2 | 3.0 | 3.7 | 2.7 | 3.2 | 3.6 | 2.4 | 1.3 | 3.8 | 1.7 | 1.6 | 2.2 | 1.0 | 0.4 | 2.3 | 1.1
Do people with larger increases in NEA tend to gain less fat?

To see the pattern in the data, we make a Scatterplot with NEA on the horizontal axis (the Explanatory Variable), as displayed in Figure 3.2 The plot shows a negative association: - People with larger increases in NEA tend to gain less fat The form of the association between NEA & fat gain is linear The association is moderately strong because the points make up a linear pattern but deviate quite a lot from the line

NEGATIVE Association

When above-average values of 1 Variable tend to accompany below-average values of the other, & vice versa - High values of 1 Variable tend to occur with low values of the other Variable - As 1 Variable INCREASES, the other Variable DECREASES r < 0 - r is NEGATIVE

POSITIVE Association

When above-average values of one Variable tend to accompany above-average values of the other - Below-average values also tend to occur together - High values of the 2 Variables tend to occur together - As 1 Variable INCREASES, the other also INCREASES r > 0 - r is POSITIVE

Apply Your Knowledge 3.1: In each of the following situations, is it more reasonable to simply explore the relationship between the 2 Variables or to view 1 of the Variables as an Explanatory Variable & the other as a Response Variable? - In the latter case, which is the Explanatory Variable & which is the Response Variable? a. The typical amount of calories a person consumes per day and that person's percent of body fat b. The weight in kilograms and height in centimeters of a person c. Inches of rain in the growing season and the yield of corn in bushels per acre d. A person's leg length and arm length, in centimeters

a. Explanatory: number of calories - Response: percent body fat b. Explore the relationship c. Explanatory: inches of rain - Response: yield of corn d. Explore the relationship

Exercise 3.27: Exercise 3.26 describes an automated grading system for oysters - Engineers were given the task of improving on the 2D reconstruction program - They designed a new program that estimates oyster volume using three-dimensional (3D) digital image processing - The results are displayed in Table 3.3
Actual (cm³) | 2D (thousand pixels) | 3D (million voxels)
13.04 | 47.907 | 5.136699
11.71 | 41.458 | 4.795151
17.42 | 60.891 | 6.453115
7.23 | 29.949 | 2.895239
10.03 | 41.616 | 3.672746
15.59 | 48.070 | 5.728880
9.94 | 34.717 | 3.987582
7.53 | 27.230 | 2.678423
12.73 | 52.712 | 5.481545
12.66 | 41.500 | 5.016762
10.53 | 31.216 | 3.942783
10.84 | 41.852 | 4.052638
13.12 | 44.608 | 5.334558
8.48 | 35.343 | 3.527926
14.24 | 47.481 | 5.679636
11.11 | 40.976 | 4.013992
15.35 | 65.361 | 5.565995
15.44 | 50.910 | 6.303198
5.67 | 22.895 | 1.928109
8.26 | 34.804 | 3.450164
10.95 | 37.156 | 4.707532
7.97 | 29.070 | 3.019077
7.34 | 24.590 | 2.768160
13.21 | 48.082 | 4.945743
7.83 | 32.118 | 3.138463
11.38 | 45.112 | 4.410797
11.22 | 37.020 | 4.558251
9.25 | 39.333 | 3.449867
13.75 | 51.351 | 5.609681
14.37 | 53.281 | 5.292105
a. Make a Scatterplot of 3D volume reconstruction (in millions of volume pixels) & actual volume (in cm³), using 3D reconstruction as the Explanatory Variable - Describe the overall pattern of the relationship &, if appropriate, give the Correlation Coefficient b. Compare your analysis for this 3D system with that for the 2D system from Exercise 3.26 - Is the 3D reconstruction program an improvement over the 2D version? Explain your reasoning

a. Linear, positive, & very strong - r = 0.9766 b. The 3D reconstruction is an improvement - Its Correlation with actual volume is stronger than the 2D program's
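The comparison in part b can be checked numerically. Here is a minimal Python sketch, assuming the rows of Table 3.3 as transcribed in the question; the names `OYSTERS` and `pearson_r` are illustrative, not from the textbook. It computes r against actual volume for both the 2D and the 3D measurements so the two can be compared side by side.

```python
from math import sqrt

# Table 3.3 as transcribed in the question:
# (actual volume in cm^3, 2D in thousand pixels, 3D in million voxels)
OYSTERS = [
    (13.04, 47.907, 5.136699), (11.71, 41.458, 4.795151),
    (17.42, 60.891, 6.453115), (7.23, 29.949, 2.895239),
    (10.03, 41.616, 3.672746), (15.59, 48.070, 5.728880),
    (9.94, 34.717, 3.987582), (7.53, 27.230, 2.678423),
    (12.73, 52.712, 5.481545), (12.66, 41.500, 5.016762),
    (10.53, 31.216, 3.942783), (10.84, 41.852, 4.052638),
    (13.12, 44.608, 5.334558), (8.48, 35.343, 3.527926),
    (14.24, 47.481, 5.679636), (11.11, 40.976, 4.013992),
    (15.35, 65.361, 5.565995), (15.44, 50.910, 6.303198),
    (5.67, 22.895, 1.928109), (8.26, 34.804, 3.450164),
    (10.95, 37.156, 4.707532), (7.97, 29.070, 3.019077),
    (7.34, 24.590, 2.768160), (13.21, 48.082, 4.945743),
    (7.83, 32.118, 3.138463), (11.38, 45.112, 4.410797),
    (11.22, 37.020, 4.558251), (9.25, 39.333, 3.449867),
    (13.75, 51.351, 5.609681), (14.37, 53.281, 5.292105),
]


def pearson_r(xs, ys):
    """Correlation r from sums of deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


actual = [row[0] for row in OYSTERS]
r_2d = pearson_r([row[1] for row in OYSTERS], actual)
r_3d = pearson_r([row[2] for row in OYSTERS], actual)
print(f"2D vs actual volume: r = {r_2d:.4f}")
print(f"3D vs actual volume: r = {r_3d:.4f}")
```

The system whose r is closer to 1 tracks actual volume more tightly along a straight line, which is the basis for the answer to part b.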

Apply Your Knowledge 3.7: Metabolic rate = the rate at which the body consumes energy - It is important in studies of weight gain, dieting, & exercise - Here are data on the lean body mass & resting metabolic rate for 12 women & 7 men who are subjects in a study of dieting - Lean body mass, given in kilograms, is a person's weight after taking out all fat - Metabolic rate is measured in kilocalories (Cal) burned per 24 hours (the same calories used to describe the energy content of foods) - Researchers believe that lean body mass has an important influence on metabolic rate:
Subject   Sex   Mass (kg)   Rate (Cal)
1         M     62.0        1792
2         M     62.9        1666
3         F     36.1        995
4         F     54.6        1425
5         F     48.5        1396
6         F     42.0        1418
7         M     47.4        1362
8         F     50.6        1502
9         F     42.0        1256
10        M     48.7        1614
11        F     40.3        1189
12        F     33.1        913
13        M     51.9        1460
14        F     42.4        1124
15        F     34.5        1052
16        F     51.1        1347
17        F     41.2        1204
18        M     51.9        1867
19        M     46.9        1439
a. Make a Scatterplot of the data for the female subjects - Which is the Explanatory Variable? - Explain why the subject number is not part of the Scatterplot b. Is the association between these Variables positive or negative? - What is the form of the relationship? - How strong is the relationship? c. Now add the data for the male subjects to your graph, using a different color or a different plotting symbol - Does the pattern of relationship that you observed for women hold for men as well? - How do the male subjects as a group differ from the female subjects as a group?

a. Lean body mass is the Explanatory Variable - Subject number is only an identifying label (an Index Variable), not a measurement, so it carries no information to plot b. Moderately strong, positive linear relationship c. The same relationship seems to hold - Men tend to have higher values & are overall more variable than women
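Part c asks how the two groups differ. One simple way to see it, sketched below in Python, is to split the subjects by sex and compare group means. The data are the 19 subjects listed in the question; `SUBJECTS` and `group_means` are illustrative names, not from the textbook.

```python
from statistics import mean

# (subject, sex, lean body mass in kg, metabolic rate in Cal per 24 h)
SUBJECTS = [
    (1, "M", 62.0, 1792), (2, "M", 62.9, 1666), (3, "F", 36.1, 995),
    (4, "F", 54.6, 1425), (5, "F", 48.5, 1396), (6, "F", 42.0, 1418),
    (7, "M", 47.4, 1362), (8, "F", 50.6, 1502), (9, "F", 42.0, 1256),
    (10, "M", 48.7, 1614), (11, "F", 40.3, 1189), (12, "F", 33.1, 913),
    (13, "M", 51.9, 1460), (14, "F", 42.4, 1124), (15, "F", 34.5, 1052),
    (16, "F", 51.1, 1347), (17, "F", 41.2, 1204), (18, "M", 51.9, 1867),
    (19, "M", 46.9, 1439),
]


def group_means(sex):
    """Return (count, mean mass, mean rate) for one sex."""
    rows = [s for s in SUBJECTS if s[1] == sex]
    masses = [row[2] for row in rows]
    rates = [row[3] for row in rows]
    return len(rows), mean(masses), mean(rates)


for sex in ("F", "M"):
    n, mass, rate = group_means(sex)
    print(f"{sex}: n = {n}, mean mass = {mass:.1f} kg, mean rate = {rate:.0f} Cal")
```

With these data the women average about 43.0 kg and 1235 Cal, the men about 53.1 kg and 1600 Cal, consistent with the men sitting higher and further right on the Scatterplot. The subject number is used only as a label and never enters the calculation.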

Apply Your Knowledge 3.3: In 2015, the United Nations signed a new global climate change agreement in Paris - Around that time, the Pew Research Center conducted an international survey to explore the relationship between the level of concern over climate change & CO₂ emissions per capita in each country. a. Does this study have a clear Response Variable? - Would it be possible for a country's per-capita CO₂ emissions to influence its citizens' level of concern over global climate change? - Would it be possible for people's level of concern over global climate change to influence a country's per-capita CO₂ emissions? - Explain your reasoning b. The stated objective of the survey was to examine factors that may help explain opinions about global climate change worldwide - In this context, which of the 2 Variables cited would be the Explanatory Variable?

a. Neither is clearly the Explanatory Variable, & each one may influence the other to some extent b. The stated objective is to study opinions as the Response Variable, making emissions the Explanatory Variable in this context

Exercise 3.29: Huntington's disease (HD) is a genetic neurodegenerative disorder - Patients with HD experience progressive brain degeneration, especially in the caudate nucleus - Researchers examined the association between the molecule N-acetylaspartate (NAA, a brain-specific metabolite) in the caudate nucleus & caudate atrophy in a sample of 10 patients with very early-stage HD - Here are the patients' caudate NAA concentrations (in international units, IU) & caudate volumes (in milliliters, ml):
Subject   Volume (ml)   NAA (IU)
1         2.6           9.4
2         2.5           7.8
3         2.9           7.6
4         3.5           10.0
5         3.4           8.8
6         3.1           8.4
7         4.0           10.9
8         3.5           9.5
9         4.2           11.2
10        3.7           11.6
a. It is currently unknown whether caudate atrophy might precede or follow any change in NAA level - Therefore, the Variables here are neither Explanatory nor Response Variables - Display & describe the relationship between caudate volume & caudate NAA level b. How strong is the association between caudate volume & caudate NAA level? - Give the value of the Correlation

a. Strong, positive linear relationship b. r = 0.8117
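Because neither Variable is Explanatory here, it helps that r does not depend on which one is called X. The Python sketch below uses the ten patients from the question (`pearson_r` is an illustrative helper name) and computes the Correlation both ways round.

```python
from math import sqrt

volume = [2.6, 2.5, 2.9, 3.5, 3.4, 3.1, 4.0, 3.5, 4.2, 3.7]    # caudate volume (ml)
naa = [9.4, 7.8, 7.6, 10.0, 8.8, 8.4, 10.9, 9.5, 11.2, 11.6]   # caudate NAA (IU)


def pearson_r(xs, ys):
    """Correlation r from sums of deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_volume_first = pearson_r(volume, naa)
r_naa_first = pearson_r(naa, volume)
print(f"r(volume, NAA) = {r_volume_first:.4f}")
print(f"r(NAA, volume) = {r_naa_first:.4f}")
```

Both calls give the same value, matching the r = 0.8117 in the answer, so the choice of axes changes the picture but not the strength of the association.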

Exercise 3.23: The study described in Exercise 3.21 also examined the grazing effect of another species, Daphnia pulex, on the abundance of the nuisance alga G. semen in the lab - Here are the findings:
Number of grazers   0      1      2      3      4      5      6
Net growth rate     −0.7   −0.4   −0.6   −0.3   −0.5   −1.0   −0.2
a. Make a Scatterplot of number of grazers & net growth rate - Do you think that D. pulex is an effective grazer of the G. semen alga? b. Find the Correlation r - How does it support your interpretation? c. The Correlation between number of D. magna & net growth rate of G. semen in Exercise 3.21 was r = -0.93 - Contrast the findings for the 2 grazing species - State your conclusions about the potential effectiveness of D. magna & D. pulex as ecological methods to control G. semen algal blooms

a. No - The Scatterplot shows no clear association between number of grazers & net growth rate, which suggests D. pulex is not an effective grazer b. r = 0.115 - This is very weak c. D. magna may be an effective method but not D. pulex
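The value in part b can be reproduced from raw sums alone. The Python sketch below is one way to do it, using the computational form of r; `pearson_r` is an illustrative helper name. The D. magna figure is taken from the question rather than recomputed.

```python
from math import sqrt

grazers = [0, 1, 2, 3, 4, 5, 6]                      # number of D. pulex
growth = [-0.7, -0.4, -0.6, -0.3, -0.5, -1.0, -0.2]  # net growth rate of G. semen


def pearson_r(xs, ys):
    """Computational form of r, built from the raw sums of x, y, xy, x^2, y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_pulex = pearson_r(grazers, growth)
r_magna_reported = -0.93  # stated in part c of the question
print(f"D. pulex: r = {r_pulex:.3f}")
print(f"D. magna (reported): r = {r_magna_reported}")
```

An r this close to 0 means adding more D. pulex goes with essentially no systematic change in growth rate, in contrast to the strongly negative value reported for D. magna.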

Exercise 3.21: Algal blooms can have negative effects on an ecosystem by dominating its phytoplankton communities - Gonyostomum semen is a nuisance alga infesting many parts of northern Europe - Could the overall biomass of G. semen be controlled by grazing zooplankton species? A research team examined the relationship between the net growth rate of G. semen & the number of Daphnia magna grazers introduced in test tubes - A net growth rate was computed by comparing the initial & final abundances of G. semen in the experiment, with a negative value indicating a decrease in abundance - Here are the findings:
Number of grazers   1      2      3      4      5      6
Net growth rate     −1.9   −2.5   −2.2   −3.9   −4.1   −4.3
a. Make a Scatterplot of number of grazers & net growth rate - Do you think that D. magna is an effective grazer of the G. semen alga? b. Find the Correlation r - How does it support your interpretation?

a. Yes - The negative linear relationship (more grazers, lower net growth rate) suggests D. magna is an effective grazer b. r = -0.928 - This is very strong
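The Correlation in part b can also be built directly from Z-Scores: standardize each Variable, multiply the paired Z-Scores, and average the products using n - 1. The Python sketch below does this for the D. magna data; `z_scores` and `r_from_z` are illustrative names.

```python
from statistics import mean, stdev

grazers = [1, 2, 3, 4, 5, 6]                    # number of D. magna
growth = [-1.9, -2.5, -2.2, -3.9, -4.1, -4.3]   # net growth rate of G. semen


def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def r_from_z(xs, ys):
    """r = (sum of z_x * z_y) / (n - 1)."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


r_magna = r_from_z(grazers, growth)
print(f"D. magna: r = {r_magna:.3f}")
```

Most products of paired Z-Scores are negative here (above-average grazer counts go with below-average growth rates), which is why r comes out close to -1.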

Apply Your Knowledge 3.5: The black-capped chickadee (Poecile atricapilla) is a small songbird commonly found in the northern United States & Canada - Chickadees often live in cooperative flocks, using a complex language to communicate about food sources & predator threats - In an experiment, researchers recorded chickadee vocalizations in an aviary when the birds were presented with predators of various sizes - The following data represent the average number of D notes per chickadee warning call for each type of predator, along with the predator wingspan (in centimeters):
Predator            Predator wingspan (cm)   Number of D notes per call
Pygmy owl           31.2                     3.96
Saw-whet owl        38.8                     4.09
Kestrel             57.6                     2.76
Merlin              60.6                     3.04
Cooper's hawk       80.6                     3.18
Short-eared owl     89.2                     2.28
Prairie falcon      109.9                    2.20
Gyrfalcon           115.1                    2.25
Peregrine falcon    120.0                    2.80
Red-tailed hawk     120.0                    2.56
Great horned owl    120.4                    2.46
Great gray owl      132.2                    2.06
Rough-legged hawk   138.0                    1.36
a. Plot the number of D notes per call (Response Variable) against the predator wingspan (Explanatory Variable) b. Describe the form, direction, & strength of the relationship between the number of D notes per call & the predator wingspan - What does the relationship suggest about chickadee warning calls?

b. Linear, negative, & reasonably strong - The number of D notes per call appears to encode predator size: the smaller the predator's wingspan, the more D notes per call
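The direction and strength described in part b can be checked with a short Python sketch. It uses the 13 predators from the question; `CALLS` and `pearson_r` are illustrative names. As a side check, it also converts wingspan to inches (an assumption added here, not part of the exercise) to show that r has no units and does not change with the measurement scale.

```python
from math import sqrt

# (predator, wingspan in cm, average D notes per call)
CALLS = [
    ("Pygmy owl", 31.2, 3.96), ("Saw-whet owl", 38.8, 4.09),
    ("Kestrel", 57.6, 2.76), ("Merlin", 60.6, 3.04),
    ("Cooper's hawk", 80.6, 3.18), ("Short-eared owl", 89.2, 2.28),
    ("Prairie falcon", 109.9, 2.20), ("Gyrfalcon", 115.1, 2.25),
    ("Peregrine falcon", 120.0, 2.80), ("Red-tailed hawk", 120.0, 2.56),
    ("Great horned owl", 120.4, 2.46), ("Great gray owl", 132.2, 2.06),
    ("Rough-legged hawk", 138.0, 1.36),
]


def pearson_r(xs, ys):
    """Correlation r from sums of deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


wingspan_cm = [row[1] for row in CALLS]
d_notes = [row[2] for row in CALLS]
wingspan_in = [cm / 2.54 for cm in wingspan_cm]  # same wingspans, in inches

r_cm = pearson_r(wingspan_cm, d_notes)
r_in = pearson_r(wingspan_in, d_notes)
print(f"r with wingspan in cm:     {r_cm:.3f}")
print(f"r with wingspan in inches: {r_in:.3f}")
```

A negative r confirms the direction given in the answer, and the two printed values agree because rescaling a Variable leaves its Z-Scores unchanged.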

Exercise 3.25: Exercise 3.3 described a 2015 report by the Pew Research Center relating a country's per-capita CO₂ emissions to its citizens' level of concern over global climate change - The data from the report are displayed in Table 3.2:
Country        CO₂     Concern
Argentina      4.67    10.70
Australia      16.52   8.75
Brazil         2.23    11.42
Burkina Faso   0.11    11.22
Canada         14.14   9.45
Chile          4.59    10.86
China          6.71    9.11
Ethiopia       0.08    9.49
France         5.19    9.94
Germany        8.92    9.49
Ghana          0.41    10.73
India          1.70    10.77
Indonesia      2.31    9.21
Israel         8.95    8.66
Italy          6.70    10.12
Japan          9.29    10.11
Jordan         3.60    9.26
Kenya          0.32    10.41
Lebanon        4.67    9.92
Malaysia       7.85    9.86
Mexico         3.91    10.70
Nigeria        0.54    10.52
Pakistan       0.93    9.09
Palestine      0.57    9.04
Peru           1.79    11.09
Philippines    0.86    10.92
Poland         8.34    8.73
Russia         12.65   9.10
Senegal        0.59    10.01
South Africa   9.26    9.44
South Korea    11.84   10.03
Spain          5.79    9.88
Tanzania       0.16    10.27
Turkey         4.39    9.28
U.S.           17.02   8.78
Uganda         0.12    11.23
U.K.           7.09    8.78
Ukraine        6.26    9.31
Venezuela      6.40    11.05
Vietnam        1.97    10.62
a. Create a Scatterplot of the data & describe the relationship - If your country was part of this survey, locate it on the Scatterplot b. How strong is this relationship? - Obtain the Correlation c. Refer to your answer to Exercise 3.3, part a - Would it be appropriate to conclude that citizens' level of concern over global climate change is directly affected by their country's CO₂ emissions? - Explain your reasoning

b. Moderate, negative linear association - r = -0.5436 c. No, this is a complex problem with no obvious Explanatory Variable & likely multiple influential factors
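For parts a and b, a small Python sketch can look up one country's point and compute the Correlation. It assumes the 40 rows of Table 3.2 as transcribed in the question; `SURVEY` and `pearson_r` are illustrative names.

```python
from math import sqrt

# Table 3.2 as transcribed in the question: country -> (CO2 per capita, concern)
SURVEY = {
    "Argentina": (4.67, 10.70), "Australia": (16.52, 8.75),
    "Brazil": (2.23, 11.42), "Burkina Faso": (0.11, 11.22),
    "Canada": (14.14, 9.45), "Chile": (4.59, 10.86),
    "China": (6.71, 9.11), "Ethiopia": (0.08, 9.49),
    "France": (5.19, 9.94), "Germany": (8.92, 9.49),
    "Ghana": (0.41, 10.73), "India": (1.70, 10.77),
    "Indonesia": (2.31, 9.21), "Israel": (8.95, 8.66),
    "Italy": (6.70, 10.12), "Japan": (9.29, 10.11),
    "Jordan": (3.60, 9.26), "Kenya": (0.32, 10.41),
    "Lebanon": (4.67, 9.92), "Malaysia": (7.85, 9.86),
    "Mexico": (3.91, 10.70), "Nigeria": (0.54, 10.52),
    "Pakistan": (0.93, 9.09), "Palestine": (0.57, 9.04),
    "Peru": (1.79, 11.09), "Philippines": (0.86, 10.92),
    "Poland": (8.34, 8.73), "Russia": (12.65, 9.10),
    "Senegal": (0.59, 10.01), "South Africa": (9.26, 9.44),
    "South Korea": (11.84, 10.03), "Spain": (5.79, 9.88),
    "Tanzania": (0.16, 10.27), "Turkey": (4.39, 9.28),
    "U.S.": (17.02, 8.78), "Uganda": (0.12, 11.23),
    "U.K.": (7.09, 8.78), "Ukraine": (6.26, 9.31),
    "Venezuela": (6.40, 11.05), "Vietnam": (1.97, 10.62),
}


def pearson_r(xs, ys):
    """Correlation r from sums of deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


co2 = [pair[0] for pair in SURVEY.values()]
concern = [pair[1] for pair in SURVEY.values()]
r = pearson_r(co2, concern)

# Coordinates of one country's point on the Scatterplot (part a)
x, y = SURVEY["Canada"]
print(f"Canada sits at CO2 = {x}, concern = {y}")
print(f"r = {r:.4f}")
```

A negative r only describes the linear pattern across countries; as the answer to part c notes, it does not by itself show that emissions directly affect concern.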

