CH6


Check Your Skills 6.18: The researchers in Exercise 6.17 had initially selected an additional 113 patients, but these individuals refused to participate in the study - What is the rate of nonresponse for this study? a. 22% b. 27% c. 73%

A
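
The arithmetic behind answer (a) can be sketched as follows. It assumes the 411 patients reported in Exercise 6.17 are everyone who took part and the 113 refusals are the only nonrespondents; the function name is illustrative.

```python
def nonresponse_rate(respondents, nonrespondents):
    """Share of the originally selected sample that did not take part."""
    selected = respondents + nonrespondents
    return nonrespondents / selected

# Figures from Exercises 6.17 and 6.18: 411 participants, 113 refusals.
rate = nonresponse_rate(411, 113)
print(f"Nonresponse rate: {rate:.1%}")  # 113 / 524, which rounds to 22%
```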

A Trustworthy Survey must go beyond a Good Sampling Design

Don't trust the results of a Sample Survey until you have read the exact questions asked - The amount of nonresponse & the date of the Survey are also important

Exercise 6.27: Stratified samples are widely used to study large areas of forest - Based on satellite images, a forest area in the Amazon basin is divided into 14 types - Foresters studied the four most commercially valuable types: alluvial climax forests of quality levels 1, 2, and 3, and mature secondary forest - They divided the area of each type into large parcels, chose parcels of each type at random, and counted tree species in a 20- by 25-meter rectangle randomly placed within each parcel selected - Here is some detail:
Forest type | Total parcels | Sample size
Climax 1 | 36 | 4
Climax 2 | 72 | 7
Climax 3 | 31 | 3
Secondary | 42 | 4
The researchers chose the stratified sample of 18 parcels described in the table - Explain how a stratified random sample differs from a simple random sample - Why do you think the researchers preferred to use a stratified design?

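In a stratified random sample, the entire population is broken up into subgroups; then an SRS is taken from each of the individual subgroups; and the SRSs are finally recombined to create the sample - A simple random sample instead draws from the whole population at once, so a forest type could be over- or underrepresented by chance - Stratifying ensures that all four forest types are represented in the sample roughly in proportion to their representation in the population (about 10% of the parcels of each type)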
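
A minimal sketch of the stratified design in Exercise 6.27, using the parcel counts and sample sizes from its table. The parcel labels (1 through the stratum total) and the function name are assumptions made for illustration; the researchers' actual procedure is not described in that detail.

```python
import random

# (total parcels, sample size) per forest type, from the Exercise 6.27 table.
STRATA = {
    "Climax 1": (36, 4),
    "Climax 2": (72, 7),
    "Climax 3": (31, 3),
    "Secondary": (42, 4),
}

def stratified_sample(strata, seed=None):
    """Draw a separate SRS of parcel labels from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, (total, n) in strata.items():
        # Assumption: parcels within a stratum are labeled 1..total.
        sample[name] = sorted(rng.sample(range(1, total + 1), n))
    return sample

sample = stratified_sample(STRATA, seed=1)
for name, (total, n) in STRATA.items():
    print(f"{name}: parcels {sample[name]} (sampling fraction {n / total:.1%})")
```

The sampling fractions all fall between roughly 9.5% and 11%, which is what "in proportion" means here.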

COHORT Studies

In which SUBJECTS SHARING a COMMON DEMOGRAPHIC CHARACTERISTIC are ENROLLED & OBSERVED AT REGULAR INTERVALS OVER an EXTENDED PERIOD OF TIME - Researchers enlist a homogeneous group of fairly similar individuals & keep track of them over a long period of time - They aim to EXAMINE the EMERGENCE OF a SPECIFIC CONDITION OVER TIME & HOW IT RELATES TO a number of VARIABLES - For the most part, they are PROSPECTIVE Studies - After some time, INDIVIDUALS who have DEVELOPED a CONDITION are then COMPARED WITH the REMAINING, UNAFFECTED INDIVIDUALS - Unlike Case-Control Studies, they are very costly & better suited to investigating common outcomes (e.g., type 2 diabetes; heart disease)

Experiment

In which a scientist ACTIVELY IMPOSES some TREATMENT/CONDITION in order TO OBSERVE the RESPONSE - A scientist deliberately imposes some treatment on individuals in order to record their responses Can answer Q's such as: - Does aspirin reduce the chance of a heart attack? - Does ginkgo biloba enhance memory? The purpose is to STUDY WHETHER the TREATMENT CAUSES a CHANGE IN the RESPONSE - Conducted when the goal is to UNDERSTAND CAUSE & EFFECT They take steps to defeat Confounding Variables - They are still susceptible to Lurking Variables, though

MULTISTAGE Random Sampling

These samples typically involve CHOOSING SRS'S WITHIN SRS'S - Often used for nationwide or statewide governmental surveys Ex: - The National Youth Tobacco Survey selects 1st an SRS of counties nationwide - Then for each chosen county, an SRS of schools is chosen - Lastly, within each chosen school, an SRS of classrooms is drawn - This is a MORE PRACTICAL & COST-EFFECTIVE method than directly selecting an SRS of students from all over the United States
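
The counties, then schools, then classrooms example can be sketched as nested SRSs. The sampling frame below is invented (10 counties, 5 schools each, 6 classrooms each), and the stage sizes are arbitrary; nothing here reflects the actual survey's frame or sizes.

```python
import random

def multistage_sample(frame, n_counties, n_schools, n_classrooms, seed=None):
    """Take an SRS of counties, then of schools, then of classrooms."""
    rng = random.Random(seed)
    selected = []
    for county in rng.sample(sorted(frame), n_counties):
        schools = frame[county]
        for school in rng.sample(sorted(schools), n_schools):
            for room in rng.sample(schools[school], n_classrooms):
                selected.append((county, school, room))
    return selected

# Hypothetical frame: county -> school -> list of classrooms.
frame = {
    f"county_{c}": {
        f"school_{c}_{s}": [f"room_{c}_{s}_{r}" for r in range(6)]
        for s in range(5)
    }
    for c in range(10)
}

picked = multistage_sample(frame, n_counties=3, n_schools=2, n_classrooms=2, seed=0)
print(len(picked), "classrooms from", len({c for c, _, _ in picked}), "counties")
```

Only the chosen counties and schools need a full list of their units, which is where the practical savings come from.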

BIASED/NON-Probability Sampling Designs

1. Convenience Sample - The interviewer makes the choice out of convenience 2. Voluntary Response Sample - people choose whether to respond to an open call for participation They are CHEAP, but... - They often PRODUCE (MISLEADING) SENSATIONAL RESULTS In both cases, personal choice introduces bias - The statistician's remedy is to allow impersonal chance to select the sample

Using Table A

1. Label - Give each member of the Population a numerical label with the same number of digits - Use as few digits as possible 2. Select - Read consecutive groups of digits of the appropriate length from left to right across a line in the Table - Ignore any group of digits that wasn't used as a label or that duplicates a label already in the Sample - Stop when you have chosen n different labels - Your Sample contains the Individuals whose labels you find
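
The Label and Select steps can be sketched in code. The row of digits below is made up for illustration and is not a line from Table A; the population of 30 individuals labeled 01 to 30 is also an assumption.

```python
def select_from_digits(digits, labels, n):
    """Read fixed-width groups left to right; keep valid, unused labels."""
    digits = digits.replace(" ", "")
    width = len(next(iter(labels)))  # all labels share the same length
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        group = digits[start:start + width]
        # Ignore groups that are not labels or that repeat an earlier pick.
        if group in labels and group not in chosen:
            chosen.append(group)
            if len(chosen) == n:
                break
    return chosen

# Step 1 (Label): 30 individuals get two-digit labels 01..30.
labels = {f"{i:02d}" for i in range(1, 31)}

# Step 2 (Select): read pairs of digits from a hypothetical row.
row = "73082 11964 05451 93002 88"
sample = select_from_digits(row, labels, 5)
print(sample)  # ['08', '21', '19', '05', '30']
```

Reading the pairs 73, 08, 21, 19, 64, 05, 45, 19, 30 shows both skip rules at work: 73, 64, and 45 are not labels, and the second 19 is a duplicate.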

Types of OBSERVATIONAL Studies

1. SAMPLE SURVEYS 2. CROSS-SECTIONAL Studies 3. CASE-CONTROL Studies - Retrospective 4. COHORT Studies - Retrospective - Prospective - Longitudinal Studies

Limitations of Case-Control Studies

1. The main challenge is identifying a random sample of Control individuals as similar to the Case subjects as possible 2. Another important restriction is that they cannot be used to estimate the proportion of individuals in the population with the rare condition - This limitation occurs because the investigators choose how many Cases & how many Controls to study - In the pertussis example, the researchers chose to examine vaccination records for 682 Cases & roughly 3X as many Controls (2016 records) - This most definitely does not mean that pertussis affects 25% of children
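
A short calculation shows why the share of Cases in a case-control study says nothing about how common the condition is. The 682 and 2016 counts come from the pertussis example; the helper function and the alternative ratios are illustrative.

```python
cases, controls = 682, 2016  # counts chosen by the investigators

share = cases / (cases + controls)
print(f"Cases make up {share:.1%} of the records studied")

def case_share(cases, controls_per_case):
    """Share of Cases in the study for a given design ratio."""
    return cases / (cases + cases * controls_per_case)

# The share moves with the design choice, not with the disease.
for ratio in (1, 3, 9):
    print(f"{ratio} control(s) per case -> {case_share(682, ratio):.0%} cases")
```

With 1, 3, or 9 Controls per Case, the same 682 Cases would make up 50%, 25%, or 10% of the study.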

Check Your Skills 6.20: People who experience online harassment either ignore it or they respond via postings or by involving authorities - Is one method more effective than the other? A 2014 Pew Research Center report, based on a random sample of American adults, found that 412 of the 549 respondents who had chosen to ignore online harassment thought that it was an effective way to deal with the issue - Of the 368 respondents who had chosen to respond to online harassment, 305 thought that this approach had been effective This study is... a. An observational study with a probability sample b. A case-control observational study c. A randomized experiment

A

Check Your Skills 6.23: From 1986 to 2000, a study followed a large number of male health professionals age 40 to 75 - Each respondent filled out a lifestyle questionnaire every two years - The 22,086 subjects who had reported having good erectile function in 1986 were included in an analysis of risk factors for the development of erectile dysfunction - The analysis revealed that "obesity and smoking were positively associated, and physical activity was inversely associated, with the risk of erectile dysfunction developing" This is... a. An observational study with a cohort design b. An observational study with a case-control design c. An experiment

A

SIMPLE Random Sampling (SRS)

A Sample of size n consists of n individuals from the population chosen in such a way that EVERY SET OF N INDIVIDUALS has an EQUAL CHANCE to be the sample actually selected This Design not only gives each individual an equal chance to be selected, but also gives every possible sample of size n an equal chance to be selected When you think of this Sampling Design, picture drawing names from a hat to remind yourself that this Design doesn't favor any part of the population - That's why it is a better method of selecting samples than convenience or voluntary response sampling Avoids selection bias, because it uses impersonal chance to select the individuals in the sample

VOLUNTARY RESPONSE (Volunteer/Self-Selected) Sample

A Sampling Design in which INDIVIDUALS CHOOSE WHETHER TO PARTICIPATE in the study Ex: Opt-in polls - Write-in, call-in, or online quick votes They are not scientific polls - Instead, these types of samples are biased because people with strong opinions are most likely to respond - The problem is that people who take the trouble to respond to an open invitation are usually not representative of any clearly defined population This type of Sampling Design may also allow individuals to submit any number of entries, further exacerbating the potential for Bias

SYSTEMATIC Random Sampling

A form of Probability Sampling in which EVERY KTH ELEMENT is SELECTED IN SEQUENCE after RANDOMLY CHOOSING a STARTING POINT - A method of Sampling in which sample elements are selected from a list/sequential file, with every kth element being selected after the 1st element is selected randomly within the 1st interval
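
A minimal sketch of the procedure, assuming the population is available as an ordered list; the function name and the list of 100 numbered records are hypothetical.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start within the first k items, then every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position within the 1st interval
    return items[start::k]

records = list(range(1, 101))  # hypothetical list of 100 numbered records
chosen = systematic_sample(records, k=10, seed=1)
print(chosen)
```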

Undercoverage (Coverage Bias)

A form of Selection Bias that occurs when SOME GROUPS IN the TARGET POPULATION are LEFT OUT of the process of selecting the Sample Ex: - If we aim to study the American population, then using a Sample Survey of households will miss not only homeless people but also prison inmates, individuals in long-term care facilities, & students in dormitories - Conducting the survey by calling landline telephone numbers will miss households that have only cell phones as well as households without a phone - The results of National Sample Surveys will, in turn, have some bias if the people not covered differ from the rest of the Population

Apply Your Knowledge 6.9: The CDC conducts the yearly National Health Interview Survey (NHIS) Here is a description of its methodology: - "To achieve sampling efficiency and to keep survey operations manageable, cost-effective, and timely, the NHIS survey planners used multistage sampling techniques to select the sample of persons and households for the NHIS" Explain how a Multistage Sample is different from a Simple Random Sample & what both types of Samples have in common

A multistage design divides the population into groups, takes a random sample of these groups, & then takes a random sample of individuals within the groups sampled An SRS just samples from the entire population, without dividing it into subgroups Both techniques use randomness for selecting individuals

CASE-CONTROL Studies

A random sample of individuals with a condition (the cases) is compared with a random sample of individuals without the condition (the controls) - An Observational Study in which CASE-SUBJECTS are selected based on a DEFINED OUTCOME, & a CONTROL GROUP of subjects is selected separately to serve as a BASELINE WITH WHICH the CASE GROUP IS COMPARED They start with 2 Samples of individuals selected for their different outcomes & look for differences between the 2 groups - Most commonly, researchers look for exposure factors in the subjects' pasts that differ They are an efficient way to examine rare outcomes & often give fast results, within the usual limitations of observational studies

Check Your Skills 6.22: A study enrolled 517 children age 2 to 5 years with a diagnosis of autism spectrum disorder and 315 control children in the same age group but without such a diagnosis - Information was collected about metabolic conditions of the mothers at the time of their pregnancy, such as type 2 diabetes, hypertension, and obesity - Metabolic conditions during pregnancy tended to be more frequent among the mothers of children with autism spectrum disorder than among mothers of control children This is... a. An observational study with a cohort design b. An observational study with a case-control design c. An experiment

B

Ex 6.2: In Example 4.7 (page 108) we saw that the observed association between moderate use of alcohol and better health exists in part because some non-drinkers choose to avoid alcohol due to preexisting health conditions

Among people who do drink alcohol in moderation, observational studies found that drinking wine rather than beer or spirits is associated with better health - But people who prefer wine are different from those who drink mainly beer or stronger spirits - Moderate wine drinkers as a group are richer and better educated - They eat healthier food and are less likely to smoke - The explanatory variable (Which type of alcoholic beverage do you drink most often?) is confounded with many lurking variables (education, wealth, diet, and so on) - A large study therefore concluded: "The apparent health benefits of wine compared with other alcoholic beverages, as described by others, may be a result of confounding by dietary habits and other lifestyle factors." - Figure 6.1 shows the confounding in visual form - Wine vs. beer/spirits (Explanatory Variable) → (Cause?) → Health (Response Variable) - Lifestyle, diet, socioeconomic (Lurking Variables) → Health (Response Variable)

Sample Survey

An OBSERVATIONAL study that relies on a RANDOM SAMPLE DRAWN FROM the ENTIRE POPULATION of interest at ONE POINT IN TIME They most often ASSESS the CHARACTERISTICS/OPINIONS OF PEOPLE, & have a wide array of applications - The Current Population Survey, described in Ex 6.3, is a comprehensive survey of households from all parts of the United States (approximately 100 million U.S. households in all) - Opinion polls are ones that cover all sorts of topics & typically use voter registries or telephone numbers to select their samples They typically collect lots of data for each individual sampled - In addition to the main questions of interest, personal information about the respondents is often recorded, such as gender, age, and sociodemographics - This enables the comparison of results between subgroups of individuals at the data analysis stage Susceptible to UNDERCOVERAGE, NONRESPONSE, & RESPONSE BIAS

Check Your Skills 6.19: The study described in Exercise 6.17 is... a. An observational study with a voluntary response sample b. An observational study with a probability sample c. A randomized experiment

B

Check Your Skills 6.21: A Gallup poll of Americans' smoking behavior and attitudes was conducted in July 2012 - A total of 1014 American adults were randomly selected and interviewed by phone - Of the 166 current smokers in the sample, only 1% said that they smoke more than one pack of cigarettes per day, a historical low - In its report Gallup states, "It is possible that the decline in reports of smoking is the result of respondents' awareness that smoking is socially undesirable - Therefore, respondents may aim to present themselves in the best possible light to the interviewer and underestimate the amount they truly smoke" To which type of bias is Gallup referring in this statement? a. Nonresponse bias b. Response bias c. Interviewer bias

B

Statistical Inference

The process of DRAWING CONCLUSIONS ABOUT a POPULATION ON the BASIS OF SAMPLE DATA - We INFER information about the POPULATION from what we KNOW about the SAMPLE

RETROSPECTIVE Approach in Cohort Studies

By USING the EXISTING MEDICAL RECORDS of members of a large healthcare organization to form their Sample, some COMPARE MEMBERS WITH DIFFERENT CONDITIONS to IDENTIFY PREVIOUS HEALTH EVENTS that might have INFLUENCED LATER OUTCOMES

Check Your Skills 6.15: The New England Journal of Medicine (NEJM) posted an opt-in poll on its website, next to an editorial about regulation of sugar-sweetened beverages - The poll asked, "Do you support government regulation of sugar-sweetened beverages?" - You just needed to click on a response (yes or no) to become part of the sample - The poll stayed open for several weeks in October 2012. Of the 1290 votes cast, 864 were "yes" responses You can conclude that... a. Approximately two-thirds of Americans support government regulation of sugar-sweetened beverages b. Approximately two-thirds of NEJM readers support government regulation of sugar-sweetened beverages c. The poll uses voluntary response, so the results tell us little about any particular population

C

Check Your Skills 6.16: What population is represented by the opt-in poll in Exercise 6.15? a. All Americans with access to the internet b. All internet users who visit the NEJM website c. Only those internet users who chose to participate in this particular opt-in poll

C

Check Your Skills 6.17: Researchers investigated material-need insecurities among adults diagnosed with diabetes - They found that, in a random sample of 411 adult diabetic patients treated in medical centers all over Massachusetts, 28% reported cost-related medication underuse The population represented by this study is... a. All adults living in the United States b. All adults diagnosed with diabetes living in the United States c. All adults diagnosed with diabetes treated in Massachusetts

C

Check Your Skills 6.24: How strong is the evidence cited in Exercise 6.23 that physical activity may lower the risk of erectile dysfunction? a. Quite strong because it comes from an experiment b. Quite strong because it comes from a large random sample c. Weak because physical activity is confounded with many other variables

C

Sampling Design

Describes exactly HOW a SAMPLE IS CHOSEN FROM the POPULATION - Refers to the method used to choose the Sample from the Population - We often draw conclusions about a whole on the basis of a Sample - Everyone has sipped a spoonful of soup & judged the entire bowl on the basis of that taste - When your doctor is concerned about your white blood cell count, you give a "blood sample" to see how low your white cell count really is - But the bowl of soup & your blood are quite homogeneous, so that the taste of a single spoonful represents the whole soup & a small vial of your blood represents your whole blood composition at the time of extraction - These are relatively easy examples of sampling - Conclusions reached from analyzing sample data extend only to a Population of individuals similar to those sampled, so it is important that the Sample truly represent the intended Population - Choosing a Representative Sample from a large & varied Population is not so simple: 1. Say exactly what Population we want to describe 2. Say exactly what we want to record - Give exact definitions of our Variables

Ex 6.12: The Nurses' Health Study is one of the largest prospective observational studies designed to examine factors that may affect major chronic diseases in women - Since 1976, the study has followed a cohort of more than 100,000 registered nurses, based on the idea that nurses are able to respond accurately to technically worded medical questionnaires In the 2007 newsletter, study investigators reported their findings on age-related memory loss - Approximately 20,000 women age 70 and older had completed telephone interviews every 2 years to assess their memory with a set of cognitive tests

Every 2 years, enrolled nurses receive a follow-up questionnaire about diseases and health-related topics such as diet and lifestyle - The response rates to the questionnaires are approximately 90% for each 2-year cycle One of the study findings was that the more women walked during their late 50s & 60s, the better their memory was at age 70 & older - The study was observational because the investigators did not randomly assign different walking regimens - Instead, they observed that women who had walked more during their late 50s & 60s ended up with better memory scores at age 70 & older

Not all statistical studies use Samples that are drawn directly from the entire Population of interest

Ex: - Studies of animal behavior cannot realistically capture animals from their entire wildlife habitat - Whether the results may also apply to a larger population (the entire tree shrew breeding colony in Hanover or even the entire population of tree shrews) is a biological argument, not a statistical one

Choosing an SRS

For large Samples (Ex: all American adults), use technology - Alternatively, use a Table of Random Digits (Table A at the back of the book), which consists of a long random sequence of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 - Selecting an SRS is a 2-step process: 1. Label - Give each member of the Population a unique numerical label of fixed length 2. Select - Use software or a table of random digits to pick n of the individuals in the labeled Population
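
When software does the selecting, the two steps might look like the sketch below. It uses Python's standard `random.sample`, which draws without replacement so that every set of n labels is equally likely; the population of 500 and the function name are assumptions.

```python
import random

def choose_srs(labels, n, seed=None):
    """Select n distinct labels, each set of n being equally likely."""
    rng = random.Random(seed)
    return rng.sample(labels, n)

# Step 1 (Label): a hypothetical population of 500, labeled 001..500.
labels = [f"{i:03d}" for i in range(1, 501)]

# Step 2 (Select): let software pick n = 10 of the labels.
srs = choose_srs(labels, 10, seed=42)
print(sorted(srs))
```

Fixing the seed only makes the draw reproducible for checking; leave it as `None` for an actual selection.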

Apply Your Knowledge 6.1: A study aims to determine the percent of adults whose right cerebral hemisphere is larger than their left hemisphere - A group of individuals age 20 and older representative of the American adult population is asked to come to a brain-imaging facility to have their brains scanned - Each individual's scan is then analyzed to identify whether the right or left cerebral hemisphere is larger Is this an observational study or an experiment? - Explain your answer

Observational - The study did not attempt to change the participants in any way

LONGITUDINAL Studies

Observational Studies that MONITOR a SAMPLE OF INDIVIDUALS REPEATEDLY OVER TIME - Research in which the same people are restudied and retested over a long period of time Ex: - Cohort Studies

CROSS-SECTIONAL Studies

Observational studies that collect data about a Population at ONE POINT IN TIME - A research method that compares participants in different groups at the same time In epidemiology, they are used to estimate the proportion of individuals in the Population with various medical conditions & diseases

Observational Study

In which the subject of our scientific inquiry is disturbed as little as possible by the act of gathering information - A scientist OBSERVES INDIVIDUALS & MEASURES VARIABLES OF INTEREST but DOES NOT ATTEMPT TO INFLUENCE the RESPONSES Can answer Q's such as: - What is the range of gestation times in the prairie dog? - What is the proportion of Americans who are overweight? The purpose is to DESCRIBE & COMPARE EXISTING GROUPS/SITUATIONS They are essential sources of data about topics ranging from the heights of adult Americans to the behavior of animals in the wild - They are a poor way to gauge the effect of an intervention - Those studying the effect of 1 Variable on another often fail to demonstrate causality because the EXPLANATORY Variable is CONFOUNDED WITH LURKING Variables

NONRESPONSE Bias

Occurs when a selected individual CANNOT BE CONTACTED/REFUSES TO PARTICIPATE In Sample Surveys, this is often very large, even with careful planning & several callbacks - If the people contacted differ from those who are rarely available or who refuse to answer questions, some bias will be introduced

RETROSPECTIVE Approach in Case-Control Studies

LOOKING BACK INTO THE PAST - Ex: In the pertussis study, the researchers examined the children's vaccination history

Ex 6.5: Imagine an archery range where people come to practice - At the end of a training session, the iconic multicolor archery target is removed but perhaps you can still see the holes left by the arrows - Think of these holes as actual data points in a Sample - Figure 6.2 shows 4 scenarios

Our goal is to guess, with a given level of certainty, the location of the target's center where the archer was aiming - In our archery analogy, this would be the center of the Population Draw a circle in each scenario where you are fairly certain that the center of the target should be, based on the pattern of points - Then look at Figure 6.3 to see whether your guess was correct Comparing these scenarios highlights important attributes of samples: 1. (a) versus (b): - More arrows were shot in scenario (b), so there are a lot more data points - More information makes it easier to pinpoint the likely center of the target 2. (a) versus (c): - The same number of arrows were shot in both scenarios, but scenario (c) shows less variability (perhaps a more practiced archer) - Less variability makes it easier to pinpoint the likely center of the target 3. (a) versus (d): - Both patterns of points are identical - Yet, the systematic downward misses (perhaps from an improperly adjusted bow) in scenario (d) make it practically impossible to correctly guess the center of the target
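
The four scenarios can be imitated with a small simulation. Everything below is an illustrative assumption rather than the data behind Figure 6.2: the target center is placed at (0, 0), arrow positions are drawn from a normal distribution, and the numbers of arrows, spreads, and the 5-unit downward offset are invented.

```python
import random
from statistics import mean

def shoot(n, spread, offset_y=0.0, seed=0):
    """Simulate n arrow holes around a target centered at (0, 0)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, spread), rng.gauss(offset_y, spread)) for _ in range(n)]

def estimated_center(points):
    """Guess the center as the average position of the holes."""
    return (mean(x for x, _ in points), mean(y for _, y in points))

scenarios = {
    "a": shoot(10, spread=3.0),                  # baseline
    "b": shoot(100, spread=3.0),                 # more arrows
    "c": shoot(10, spread=1.0),                  # less variability
    "d": shoot(10, spread=3.0, offset_y=-5.0),   # same pattern, shifted down
}
centers = {name: estimated_center(pts) for name, pts in scenarios.items()}
for name, (x, y) in centers.items():
    print(f"({name}) estimated center: ({x:+.2f}, {y:+.2f})")
```

Because (a) and (d) share a seed, the pattern of holes is identical and only shifted: the guess in (d) lands 5 units below the guess in (a), and nothing in the holes themselves reveals that shift. More arrows or a tighter spread, as in (b) and (c), tend to pull the guess toward the true center, but they would not remove a systematic offset.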

STRATIFIED Random Sampling

Sampling distinct groups within the Population separately, and then combining these Samples - In which a POPULATION is DIVIDED INTO SUBGROUPS (STRATA) & RANDOM SAMPLES are then TAKEN FROM EACH STRATUM - Useful when the Population contains groups that are more rare or harder to reach than others Ex: - A study of exercise habits in American adults may use this technique, selecting individuals of different ethnic groups separately, to ensure inclusion of majority & minority groups according to their proportions in the population - Inversely, a study of healthcare use in different ethnic groups may use this technique to ensure the selection of a large enough number of individuals from minority groups so that meaningful comparisons can be made between the groups - In either case, this Sampling type provides 1 additional level of control over the selection of individuals in the Sample, compared with a straightforward SRS

Apply Your Knowledge 6.5: Manatees are large aquatic mammals found primarily in the rivers and estuaries of Florida - Because they are an endangered species, the Florida Fish and Wildlife Conservation Commission tracks all manatee deaths in the state and establishes a probable cause of death for each carcass - In 2016, the commission recorded 520 manatee deaths Does this number represent a Sample of manatee deaths or does it represent the actual manatee Population in Florida?

Population

Scientists Gather Information about only Part of the Group (Sample) in order to Draw Conclusions about the Whole (Population of Interest)

Population - The entire group of individuals (not necessarily people) about which we want information Sample - The part of the Population from which we actually collect information - Used to draw conclusions about the entire Population - Procured via a Sampling Design

CONVENIENCE Sample

The easiest, but not the best, Sampling Design - Individuals are chosen based on proximity - INDIVIDUALS CLOSE AT HAND are chosen as participants Ex: - If we are interested in finding out what percent of adults take vitamin and mineral supplements, we might go to a shopping mall and ask people we meet there - However, a shopping mall sample will almost surely overrepresent middle-class and retired people and underrepresent the poor - This pattern will recur almost every time we take such a sample - That is, the source of bias is a systematic error caused by a bad sampling design, not just bad luck on one sample - The outcomes of shopping mall surveys will repeatedly miss the truth about the population in the same ways

PROSPECTIVE Approach in Cohort Studies

Scientists RECORD, AT REGULAR INTERVALS, all sorts of NEW RELEVANT INFORMATION ABOUT the study PARTICIPANTS

Do not confuse Nonresponse in a Probability Sample with Opt-in Polls that use a Voluntary Response Sample

Surveys using a probability sample may suffer bias from nonresponse, but this bias can be alleviated to some extent by weighting various groups to make the sample more closely representative of the target population In contrast, with opt-in polls that rely on voluntary response, we never know who the respondents are & what kind of population they might represent - Some individuals may even participate more than once (there are documented examples of thousands of online submissions coming from the same computer) - Nothing can be done with voluntary response samples to make the respondents truly representative of the intended target population
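
One simple form of weighting can be sketched as follows. All figures are invented, and the approach assumes that each group's share of the target population is known and that respondents within a group resemble the nonrespondents in that group. Real surveys use more elaborate adjustments.

```python
# Hypothetical survey: each group's known population share, number of
# respondents, and number answering "yes". All figures are invented.
groups = {
    "under 50": {"pop_share": 0.6, "respondents": 200, "yes": 80},
    "50 and over": {"pop_share": 0.4, "respondents": 300, "yes": 210},
}

def unweighted_proportion(groups):
    """Pool all respondents, ignoring who is over- or underrepresented."""
    total = sum(g["respondents"] for g in groups.values())
    return sum(g["yes"] for g in groups.values()) / total

def weighted_proportion(groups):
    """Weight each group's result by its share of the target population."""
    return sum(g["pop_share"] * g["yes"] / g["respondents"] for g in groups.values())

print(f"Unweighted: {unweighted_proportion(groups):.2f}")  # 290 / 500
print(f"Weighted:   {weighted_proportion(groups):.2f}")    # 0.6*0.40 + 0.4*0.70
```

Here the older group answered more often than its population share warrants, so the pooled 0.58 overstates the weighted estimate of 0.52. No comparable correction exists for an opt-in poll, because the population shares of its respondents are unknown.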

Ex 6.11: Pertussis, also known as whooping cough, is a highly contagious respiratory disease that is characterized by uncontrollable, violent coughing & affects mostly children - California has recently seen a resurgence of cases of pertussis, despite the existence of a vaccine Researchers compared vaccination records among case children diagnosed with pertussis during the 2010 California epidemic & control children who had no such diagnosis at that time

The Cases were a random sample of 682 medical records of California children ages 4 to 10 who had been diagnosed with pertussis - The Controls were a random sample of 2016 medical records of California children in the same age group who had received care from the same clinicians but were not diagnosed with pertussis For every Case, about 3 Controls were selected from the clinician's appointment log for that day The pertussis study found that the Cases were much more likely than the Controls to have not received any pertussis-containing vaccines, to have not received all recommended 5 doses of the vaccine, or to have had a longer interval of time since their last vaccination - Of course, differences in vaccination rates could potentially be explained by differences in access to health care - However, because the Controls had received care from the same medical providers as the Cases, we can make a strong case that access to health care was not a Confounding Variable in this study

Ex 6.9: The Census Bureau's American Community Survey (ACS) is a monthly survey mailed each time to a new random sample of nearly 300,000 U.S. addresses (households and group housing) - Response to the survey is mandatory - Failure to reply triggers telephone and in-person follow-ups - The ACS has the lowest nonresponse rate of any poll we know, consistently well below 5% historically Why so much effort?

The Census Bureau states that the information collected helps to determine the allocation of more than $400 billion of federal funding each year and offers reliable data for purposes ranging from business decisions to community emergency planning - The CDC's National Health Interview Survey, described in Ex 6.8, interviews participants in person - Its nonresponse rate was 30% in 2015 - The University of Chicago's General Social Survey (GSS) is the nation's most important social science survey; it is conducted in person every 2 years - The GSS contacts its sample in person, & it is run by a university - The 2014 survey had a rate of nonresponse of 31% - What about opinion polls by news media & opinion-polling firms? - Few disclose their rates of nonresponse openly - A task force for the American Association for Public Opinion Research obtained response rate data from eight large & prestigious commercial & nonprofit research firms - Combined, their 2015 rate of nonresponse was 91% for individuals contacted via landline & 93% for those contacted via cell phone

Bias

The SYSTEMATIC FAVORITISM OF CERTAIN OUTCOMES in a statistical study

Ex 6.10: Ask a Sample of college students these 2 questions: 1. "How happy are you with your life in general?" (Answers on a scale of 1 to 5) 2. "How many dates did you have last month?"

The correlation between answers is r = -0.012 when asked in this order - It appears that dating has little to do with happiness Reverse the order of the questions, however, & r = 0.66 - Asking a question that brings dating to mind makes dating success a big factor in happiness

Ex 6.1: Should women take hormones such as estrogen after menopause, when natural production of these hormones ends? - In 1992, several major medical organizations said "yes" - In particular, women who took hormones seemed to reduce their risk of a heart attack by 35% to 50% - The risks of taking hormones appeared small compared with the benefits

The evidence in favor of hormone replacement came from a number of observational studies that compared women who were taking hormones of their own accord with others who were not - But women who elect to take hormones are most likely quite different from women who do not: - They may be better informed and see their doctors more often, and they may do many other things to maintain their health - So, is it surprising that they have fewer heart attacks? - Experiments don't let participants decide what to do; instead they assign them to specific conditions - The Women's Health Initiative (WHI) trial sponsored by the National Institutes of Health assigned women either to hormone replacement or to dummy pills that look and taste the same as the hormone pills - The assignment was done randomly, so that all kinds of women were equally likely to get either treatment - In 2002, the WHI trial published its first results, which indicated that women who took hormones had a higher incidence of cardiovascular disease and breast cancer - Taking hormones after menopause quickly fell out of favor - This first WHI study, however, had focused on older women, with an average age of 63 years - When a follow-up WHI trial of women in their 50s was published in 2007, it showed that younger women taking hormone therapy had lower levels of calcium deposits in their arteries, which may lower their risk of heart disease

Ex 6.3: The Current Population Survey (CPS) is a highly important sample survey conducted by the U.S. government - The CPS contacts approximately 60,000 households each month - The results are used to determine the monthly unemployment rate and various demographic characteristics - Supplemental questionnaires on a variety of topics are added at regular intervals - They are designed to update other government surveys such as the National Health Interview Survey, which is conducted on a yearly basis One question of interest to the government is the proportion of current U.S. smokers who attempted to quit smoking within the previous 12 months

The first step in producing data for this question is to specify the Population we want to describe - Which age groups will we include? - Will we include undocumented immigrants or people in prisons? - The CPS defines its population as all U.S. residents (whether citizens or not) who are at least 15 years of age, who are civilians, and who are not in an institution such as a prison or a nursing home - Persons from all 50 states are considered, representing both genders and all ethnic groups The second step involves choosing what precisely we want to measure - What does it mean to be a "current smoker"? - Should we include individuals who smoke only occasionally? - What constitutes an "attempt to quit smoking"? - The CPS defines current smokers as individuals who currently smoke either every day or some days - If you are a current smoker, the interviewer then goes on to ask about quitting attempts - A quitting attempt is defined as an attempt to quit smoking during the past year that lasted for 24 hours or more

Ex 6.4: Tree shrews (Tupaia belangeri) are small omnivorous mammals that are phylogenetically related to primates - Do tree shrews exhibit a paw preference when grasping food?

The first step in producing data to answer this question is to specify the population we want to describe - This would be the population of all tree shrews; a population naturally found in Asian tropical forests - What are researchers' options for obtaining a sample of live animals? - They can capture tree shrews in the wild or obtain tree shrews from a dedicated breeding facility - The researchers in this study chose to obtain 36 tree shrews from a breeding facility run by the University of Veterinary Medicine, Hanover, Germany - The advantage is that their population of interest is similar to that studied by other researchers in their field - The disadvantage is that these animals are not representative of the population of wild tree shrews The second step involves choosing what we want to measure: - How do we assess "paw preference"? - The researchers observed the tree shrews during food-grasping tasks and recorded which paw was used for grasping - They used this information to compute a quantitative "pawedness index" reflecting the relative use of the right and left paws during numerous grasping attempts; they then used this index value to label each animal as having a right, left, or ambidextrous paw preference Another study examined the biomechanics of jumping in the hedgehog flea (Archaeopsyllus erinacei) - The target population is all adult hedgehog fleas (only the adults jump), but how do you sample from this population? - The researchers obtained 10 adult specimens from hedgehogs at St. Tiggywinkles Wildlife Hospital Trust in the United Kingdom - These fleas are representative of wild fleas in the area around the wildlife hospital, but there is no guarantee that they are truly representative of all hedgehog fleas everywhere - One aspect of jump examined in this study was takeoff angle - The researchers filmed the fleas in slow motion during spontaneous jumps and recorded the takeoff angle of each jump

Wording of Questions

The most important influence on the answers given to a Sample Survey - CONFUSING/LEADING QUESTIONS can introduce strong bias - Even MINOR CHANGES IN WORDING can change a Survey's outcome - The ORDER IN WHICH QUESTIONS ARE ASKED can make a difference, too - Ex: Only 13% of Americans surveyed think we are spending too much on "assistance to the poor", but 44% think we are spending too much on "welfare"

Ex 6.6: A study analyzed the answers of 60,058 heterosexual participants to an open poll titled "Sex and Love" posted for 10 days on the official website of NBC News - Because anyone could participate, we have no way of knowing what types of individuals chose to participate or even if they participated only once - There is therefore no larger, clearly defined population here about which to infer

The researchers acknowledged that their sample "was not nationally representative" and described 2 main reasons why - They cited research showing that voluntary response samples recruited online "tend to include participants who are relatively more educated and have higher income than the national population" - They also pointed out the possibility "that the survey title Sex and Love appealed to people with more liberal attitudes toward sex"

Advantages & Disadvantages of COHORT Studies

They accumulate enormous amounts of detailed information & can examine the Compounded Effect of various factors over time However, they also take a long time to complete & lose subjects over time, creating a potential Confounding Effect - This is especially true when studying older individuals, who may die before the end of the study - Ex: it is possible that the women with the greatest memory loss also had poor overall health & died younger &, therefore, were not included in the analysis Overall, though, they are less prone to confounding than Case-Control Studies because cohorts start with one homogeneous group - This approach has other important advantages - Cohorts can provide information about the relative health risks of different subgroups - They also support incidence calculations, unlike Case-Control Studies, which select subjects based on an existing disease status However, like all Observational Studies, they cannot establish causation - The observed differences between groups can be attributed either to the groups' differentiating feature or to Confounding Variables - Ex: perhaps the women with the better health were both more capable of walking and less prone to memory loss • Therefore, we cannot unambiguously conclude that walking has a protective effect against memory loss

Apply Your Knowledge 6.13: Early life corresponds to the first bacterial colonization of the digestive tract, creating a microbiome that appears to play an important role in human development - What might be the effect of early exposure to antibiotics on children's development? One study followed 11,532 full-term babies all born in the county of Avon (United Kingdom) during 1991 and 1992 - Exposures to antibiotics during infancy were recorded, along with recurring measures of body mass over several years - The study found that antibiotic exposure during the first 6 months of life was associated with increased body mass later on; a pattern not seen with antibiotic exposure occurring after the first six months What type of observational study is this? - What population or populations are represented in the study? - What are the explanatory and response variables?

This is a Cohort Study - The population is full-term babies born in a similar time & place - The explanatory variable is antibiotic exposure - The response variable is body mass recorded over time

PROBABILITY Sampling

When a SAMPLE is CHOSEN BY CHANCE - This allows for neither favoritism by the sampler nor self-selection by respondents - Choosing a sample by chance mitigates bias by giving all individuals an equal chance to be chosen - Rich and poor, young and old, male and female; they all have the same chance to be in the sample - 4 types: 1. SIMPLE Random Sampling 2. SYSTEMATIC Random Sampling 3. STRATIFIED Random Sampling 4. MULTISTAGE Random Sampling - Large-scale sample surveys typically use Stratified, Multistage Samples that combine SRSs from each stage of the sampling process
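As an illustration of stratified random sampling, here is a minimal Python sketch that draws a separate SRS within each stratum, using the parcel counts and sample sizes from the forest table in Exercise 6.27. The function name `stratified_sample`, the parcel numbering from 1, and the seed are assumptions made for the example, not part of the original study.

```python
import random

def stratified_sample(strata, seed=None):
    """Draw an SRS inside each stratum.

    strata maps a stratum name to (total_units, sample_size).
    Units in each stratum are assumed to be labeled 1..total_units.
    """
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(range(1, total + 1), n))
        for name, (total, n) in strata.items()
    }

# Parcel counts and sample sizes from the Exercise 6.27 table
forest = {
    "Climax 1": (36, 4),
    "Climax 2": (72, 7),
    "Climax 3": (31, 3),
    "Secondary": (42, 4),
}

chosen = stratified_sample(forest, seed=0)  # seed is arbitrary
total_sampled = sum(len(parcels) for parcels in chosen.values())
print(total_sampled)  # 18 parcels in all
```

Because each stratum is sampled separately, every forest type is guaranteed to appear in the sample, which a single SRS of 18 parcels from all 181 would not guarantee.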

Exercise 6.43: Researchers enrolled a group of 10,892 middle-aged adults and studied them over a period of nine years - They found that smokers who quit had a higher risk for diabetes within three years of quitting than either nonsmokers or continuing smokers a. What type of observational study is this? b. Does this show that stopping smoking causes the short-term risk for diabetes to increase? - Should a doctor cite this study to tell a middle-aged adult patient who smokes that stopping smoking can cause diabetes and advise him or her to continue smoking? c. What confounding variables might explain these findings?

a. A cohort study b. No, this is an observational study c. For example, existing health concerns may force the person to quit smoking

Ex 6.8: Random digit dialing (RDD) is a common & inexpensive computer-based method of contacting a random sample of people by telephone - However, federal regulations forbid RDD technology for dialing cell phone numbers, which must instead be dialed by hand at greater expense But can we conduct a reliable telephone survey of the American population using landline numbers only?

This was the issue the CDC considered in 2003 when it planned its first survey of telephone access - The ideal tool for this inquiry was the CDC's own National Health Interview Survey (NHIS), because it interviews participants in person, regardless of their telephone access or lack thereof - The data showed that, in early 2003, 3% of households in the United States had a cell phone but no landline phone ("wireless only" households) - This estimate climbed to 25% by early 2010, & to 51% by the end of 2016 - Therefore, telephone interviews conducted exclusively via landlines after 2003 would have missed an ever-growing segment of the population What would be the impact of excluding wireless-only households from telephone surveys? - The NHIS shows that wireless-only individuals differ in a number of ways from the general population - They tend to be younger (in 2016, 73% of adults age 25 to 29 were wireless-only, compared with only 24% of those age 65 and older) & to have a lower socioeconomic status - Their ethnic distribution is different (non-Hispanic whites are less likely to live in wireless-only households) - They also differ on several health indicators: Wireless-only individuals are more likely to smoke, to drink excessively, & to be physically active, & they are less likely to have diabetes - Therefore, telephone surveys relying exclusively on landline phones would sample from a population substantially different from the entire U.S. population So what is the solution? 
- At first, the CDC simply adjusted its landline survey results to compensate for the bias introduced by Undercoverage - Ex: if a sample contained too few young adults, the responses of the young adults who did respond were given extra weight - When the share of wireless-only households became too substantial to justify this kind of adjustment, the CDC switched to using both wireless & landline numbers in its telephone surveys, in proportions reflecting their respective adoption in the target population (a "dual frame" approach) - This approach requires some adjustment to compensate for individuals with both a landline & a cell phone, who would otherwise be twice as likely to be selected (Overcoverage) - Many other polling organizations have since followed the CDC's lead, regularly increasing the proportion of cell phone numbers in their telephone surveys
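The weighting adjustment described above can be sketched in Python. This is only an illustration of the general idea (weight = population share / sample share); the age groups and all the shares below are hypothetical numbers, not CDC figures, and the function name is an assumption.

```python
def poststratification_weights(population_share, sample_share):
    """Weight for each group = its share of the population / its share of the sample."""
    return {
        group: population_share[group] / sample_share[group]
        for group in population_share
    }

# Hypothetical shares: young adults make up 20% of the population
# but only 10% of the landline sample
population_share = {"18-29": 0.20, "30-64": 0.60, "65+": 0.20}
sample_share = {"18-29": 0.10, "30-64": 0.60, "65+": 0.30}

weights = poststratification_weights(population_share, sample_share)
print(weights["18-29"])  # 2.0: each young respondent counts double
```

Groups that are under-represented in the sample get weights above 1, and over-represented groups get weights below 1.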

Ex 6.7: A healthcare organization has comprehensive records for all 4000 of its patients, including their billing addresses - Researchers in Guam examined 504 brown tree snakes, an invasive species causing substantial environmental damage to the island - How were these 504 snakes "selected"? - Researchers studied the contents of 800 vaccine-related posts on the social media site Pinterest - How were these posts selected?

To find out how satisfied its patients are with the care provided, a manager assigns the patients in the company database a number from 1 to 4000 and uses software to select 100 patients at random who will receive a customer satisfaction survey in the mail - The result is an SRS of 100 patients The researchers placed, at regular intervals in a large forested area, standard brown-tree snake traps with a mouse bait - The result is a reasonable approximation of an SRS because every brown tree snake in the forested area should have the same chance of ending in the sample On 3 days in March 2014, the researchers chose every 5th post containing 1 of these keywords: vaccination, vaccine, vaccines, & vaccinate - This is a Systematic Random Sample - The Population here is all Pinterest posts on these days with one of these four keywords - As long as there is no consistent pattern in the order of the posts, this approach offers a random sample without built-in bias in the same way that an SRS would
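The "every 5th post" selection can be sketched in Python as a systematic sample. The list of 4000 matching posts and the random starting point are assumptions made for illustration; the source only states that every 5th post was chosen and that 800 posts were studied.

```python
import random

def systematic_sample(items, step, seed=None):
    """Pick a random start among the first `step` items, then take every `step`-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # assumed random start: 0, 1, ..., step - 1
    return items[start::step]

# Hypothetical ordered list of posts matching the keywords
posts = [f"post_{i}" for i in range(1, 4001)]

sample = systematic_sample(posts, step=5, seed=1)  # seed is arbitrary
print(len(sample))  # 800
```

If the order of the posts followed a repeating pattern with a period of 5, this method would pick up that pattern, which is why the absence of a consistent ordering matters.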

Collecting Data

To understand phenomena or answer scientific questions 2 approaches: 1. Observational Study 2. Experiment

The Impact of important Sample Attributes on Inference

We can accurately perform Inference for any reasonable Sample Size & data variability, but... - LARGER SAMPLES & LOWER VARIABILITY lead to MORE PRECISE INFERENCE In contrast, SYSTEMATIC ERRORS in the data collection process (Bias) are likely to lead to a WRONG CONCLUSION - A large part of statistics consists of identifying possible sources of Bias & designing data collection to avoid them

RESPONSE Bias

When the BEHAVIOR OF the RESPONDENT OR of the INTERVIEWER can cause BIAS IN SAMPLE RESULTS - Tendency of subjects to systematically respond to a stimulus in a particular way due to non-sensory factors Ex: - Drinking a lot of alcohol & having many sexual partners typically have negative connotations, & many interviewees understate their responses to questions on these topics - A study contrasting self-reports of height & weight with actual measurements showed a tendency to report values that would be considered slightly more desirable - Responses may also be influenced by whether the interviewer is male/female or from one ethnic group/another - Answers to questions that require recalling past events are often inaccurate because of faulty memory • Many people "telescope" events in the past, bringing them forward in memory to more recent time periods • "Have you visited a dentist in the last six months?" will often draw a "yes" from someone who last visited a dentist 8 months ago Careful training of interviewers & careful supervision to avoid variation among the interviewers can reduce this - Good interviewing technique is another aspect of a well-done sample survey

CONFOUNDING Variable

When the EFFECTS OF TWO VARIABLES (either Explanatory or Lurking Variables) on the Response Variable CANNOT BE DISTINGUISHED FROM EACH OTHER - Lurking Variables greatly complicate our ability to interpret findings because their effect & the effect of the Explanatory Variable are mixed up

Exercise 6.29: A January 2010 SurveyUSA poll asked a random sample of adults living in the state of Washington the following question: "State lawmakers are considering making marijuana possession legal - Do you think legalizing marijuana is a good idea or a bad idea?" - Of the 500 adults interviewed, 280 said "good idea" A separate January 2010 SurveyUSA poll asked a random sample of adults living in the San Diego, California, area, "Do you think marijuana should? or should not? be legal when used for recreational purposes?" - Of the 500 adults interviewed, 219 answered "should" a. What percent of respondents seemed to support the idea of legalizing marijuana in each poll? b. In what way do these two surveys differ? - Explain how these differences might explain the apparent difference of opinion

a. 56% and 43.8%, respectively b. The polls represent two different geographic areas with a different mix of individuals - The question wording is very different in the two polls

Exercise 6.41: The California Men's Health Study enrolled 82,695 middle-aged men with no prior history of heart failure - The time each one spent being sedentary was recorded using regularly administered questionnaires over a 10-year period - The study found that the proportion experiencing a heart failure over the course of the study was higher among the men spending more time being sedentary a. What type of observational study is this? - Explain your answer b. During the course of the study, 3473 men experienced heart failure - How is this information relevant when interpreting the study results? c. Can we conclude from this study that spending a lot of time being sedentary causes heart failure in middle-aged men? - Explain your answer

a. A cohort study b. There are enough cases for statistical evaluation c. No, this is an observational study

Exercise 6.39: A study examined coffee consumption between 1980 and 2004 for 50,739 American nurses from the Nurses' Health Study who were free of depressive symptoms at baseline in 1996 - Ten years later, 2607 nurses had reported a diagnosis of depression and antidepressant use at some point - The study found that depression risk decreased with greater caffeinated coffee consumption (but not with decaffeinated coffee consumption) a. What type of observational study is this? - Explain your answer b. According to the investigators, "this study cannot prove that caffeine or caffeinated coffee reduces the risk of depression but only suggests the possibility of such a protective effect" - Explain why this is the case. c. What would be a possible confounding variable for this study?

a. A cohort study - One homogeneous group was followed for many years to track the effect of caffeine on depression b. This is an observational study, so a causal link cannot be established c. Answers will vary - Nurses with health problems could avoid caffeine, and health problems could also lead to depression and antidepressant use

Exercise 6.25: This is the title of a report published by Jawbone, a manufacturer of wearable devices used to monitor sleep, movement, and food intake - The report states: "Leveraging the sleep tracking capabilities of UP by Jawbone, we can take an unprecedented look at how tens of thousands of college students sleep across the country at over 100 universities, totaling 1.4 million nights of sleep" - The company's data showed an average sleep time of 7.0 hours for week nights and 7.4 hours for weekend nights a. What type of sample do these data come from? - What is the population represented by this sample? b. What can we conclude from the stated findings?

a. A convenience sample - The population is Jawbone UP users b. Nothing beyond the stated summary statistics representing this particular convenience sample

Exercise 6.37: The polling firm Gallup produces extensive monthly and annual reports on the well-being of the American population, including what fraction of the population exercises regularly, has access to health care, or had the flu in any given month - Here are some quotes taken from the 2016 web page describing the survey methodology: a. How are interviews conducted for the Gallup-Healthways Well-Being Index? Gallup interviews U.S. adults aged 18 and older living in all 50 states and the District of Columbia using a dual-frame design, which includes both landline and cellphone numbers. Gallup samples landline and cellphone numbers using random-digit-dial methods. Gallup purchases samples for this study from Survey Sampling International (SSI). Gallup chooses landline respondents at random within each household based on which member had the most recent birthday. Each sample of national adults includes a minimum quota of 60% cellphone respondents and 40% landline respondents, with additional minimum quotas by time zone within region. Gallup conducts interviews in Spanish for respondents who are primarily Spanish-speaking. - What is the population that this survey aims to describe? - Explain how including both landline telephones and cell phones helps reach the target population - Why do you think that Gallup selects the adult with the most recent birthday in a household, rather than the first one to answer the phone? b. Are Gallup-Healthways Well-Being Index samples weighted? Yes, Gallup weights samples to correct for unequal selection probability, nonresponse, and double coverage of landline and cellphone users in the two sampling frames. Gallup also weights its final samples to match the U.S. population according to gender, age, race, Hispanic ethnicity, education, region, population density, and phone status (cellphone only, landline only, both, and cellphone mostly). 
- Explain why Gallup weights (adjusts) the answers of its survey respondents - Name one type of bias that cannot be helped with weighting answers

a. All adults living in the United States - Some people have only a landline phone; others have only a cell phone - Choosing based on birthday is random and unbiased - Selecting the one who answers might introduce a bias if some types of people tend to answer the phone more often b. If different types of people are easier to reach, they would be over-represented in a random sample - Weights can compensate for such disparities but not for response bias (inaccurate or untruthful answers)

Apply Your Knowledge 6.7: To gather data on a 1200-acre pine forest in Louisiana, the U.S. Forest Service laid a grid of 1410 equally spaced circular plots over a map of the forest - A ground survey visited a sample of 10% of these plots Selecting an SRS is a 2-step process 1. Label - Give each member of the Population a unique numerical label of fixed length 2. Select - Use software or a table of random digits to pick n of the individuals in the labeled Population a. Using a table of random digits 1. Label the plots using numerical labels starting with the number 1 - Make sure that all plot labels have the same length (think about how many digits are needed for the largest numerical label, in this case 1410) 2. Go to line 105 in Table A at the back of the book - Ignore the spacing between digits (their only purpose is readability), & read the digits in groups of the same length as your numerical labels - The first 3 groups that match a plot label make up the beginning of your sample - Ignore any repeated labels, because you don't want to choose the same plot twice b. Use the software of your choice to select the first 3 plots in an SRS of 141 plots - Ex: the function "= RANDBETWEEN(bottom, top)" in Excel returns an integer between a bottom & a top number you select

a. Label each plot from 0001 to 1410 - Plots labeled 0769, 1315, & 0094 are selected
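For part b, any software with a random number generator will do. A minimal Python sketch, assuming plots are labeled 1 to 1410 as in part a (the function name `select_srs` and the seed value are arbitrary choices for the example):

```python
import random

def select_srs(population_size, sample_size, seed=None):
    """Return an SRS of labels 1..population_size, drawn without replacement."""
    rng = random.Random(seed)
    # random.sample never repeats a label, so no plot is chosen twice
    return rng.sample(range(1, population_size + 1), sample_size)

plots = select_srs(1410, 141, seed=105)  # 10% of the 1410 plots

# Zero-pad to four digits to match the fixed-length labels 0001..1410
print([f"{p:04d}" for p in plots[:3]])
```

The plots selected this way will generally differ from those obtained with Table A, since each method uses its own sequence of random digits.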

Apply Your Knowledge 6.3: A study examined a large nationally representative sample of American adults age 18 to 64 years - The primary outcome of interest was body mass index (BMI), which was also used to categorize participants as: - Obese (BMI ≥ 30) - Not obese (BMI < 30) - In the study, inadequate hydration (as evidenced by a urine osmolality ≥ 800 mOsm/kg) was associated with higher BMI and obesity - The published findings include this statement: "Although inadequate intake of water among obese adults may explain the observed findings, differential consumption of food with high water content may also contribute to the relationship between inadequate hydration and elevated BMI" a. Is this study observational or experimental? b. What variables recorded for each participant are described here? c. What lurking variable cited here is confounded with the explanatory variable? d. Would it be appropriate, based on this study alone, to advise obese patients that drinking more water will help them lower their BMI? - Explain your answer

a. Observational b. BMI, obesity, hydration status c. Water content of food consumed d. No, this is just an observational study

Apply Your Knowledge 6.11: In 2010, the Physicians Foundation conducted a survey of physicians' attitudes about health care reform, calling the report "a survey of 100,000 physicians" - The survey was sent to 100,000 randomly selected physicians practicing in the United States, 40,000 via post office mail and 60,000 via email - A total of 2379 completed surveys were received a. State carefully what population is sampled in this survey and what is the sample size - Could you draw conclusions from this study about all physicians practicing in the United States? b. What is the rate of nonresponse for this survey? - How might this affect the credibility of the survey results? c. Why is it misleading to call the report "a survey of 100,000 physicians"?

a. The population is all physicians practicing in the United States - The sample size is 2379 (physicians for whom we do have data) - In principle, conclusions apply to all U.S. physicians, but only if the 2379 respondents are representative of them b. The rate of nonresponse is 97.6% - There is a very real potential for bias c. The Physicians Foundation hoped to survey 100,000 physicians but obtained data from only 2379 physicians - The report should be called "a survey of 2379 physicians"
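The nonresponse rate in part b is the fraction of the intended sample from whom no data were obtained. A short Python sketch of the arithmetic (the function name is an assumption for the example):

```python
def nonresponse_rate(contacted, responded):
    """Fraction of the intended sample from whom no data were obtained."""
    return (contacted - responded) / contacted

# 100,000 surveys sent, 2379 completed surveys received
rate = nonresponse_rate(100_000, 2379)
print(f"{rate:.1%}")  # 97.6%
```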

Exercise 6.33: Comment on each of the following as a potential sample survey question - Is the question clear? - Is it slanted toward a desired response? a. "It is estimated that disposable diapers account for less than 2% of the trash in today's landfills. In contrast, beverage containers, third-class mail, and yard wastes are estimated to account for about 21% of the trash in landfills. Given this, in your opinion, would it be fair to ban disposable diapers?" b. "Given the current trend of more home runs and more injuries in baseball today, do you think that steroid use should continue to be banned even though it is not enforced?" c. "In view of the negative externalities in parent labor force participation and pediatric evidence associating increased group size with morbidity of children in daycare, do you support government subsidies for daycare programs?"

a. The question is clear, but it is slanted toward a "no" answer: the statistics quoted first make a ban on disposable diapers seem unfair b. The question is slanted toward an affirmative answer c. This question is so unclear & charged with negative words that it is likely to prevent many "yes" answers

Exercise 6.31: Jane Goodall dedicated years of her life to the study of chimpanzee behavior in their natural habitat of East Africa - Initially, the chimps would flee at the sight of her, so Goodall had to observe them from a distance with binoculars - But she persisted until the animals eventually became accustomed enough to her presence to ignore her a. What kind of bias was she trying to minimize? b. Explain why this lengthy step is particularly important when observing animal behavior

a. To avoid response bias due to the disturbance of a new presence b. If animals are scared or offended by a new, unusual presence, then we cannot observe their natural behavior

