AP Stats Chapter 4
How to choose a simple random sample
1. Manually: the hat method, with very specific instructions 2. Using a random number generator: number the population, then choose distinct numbers 3. Using a random number table: label each individual using as few digits as possible, then randomize by reading labels from the table
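The random number generator method above can be sketched in Python. This is only an illustration, not a required procedure: the function name choose_srs and the seed argument are assumptions made for this example, and the population size of 1137 is borrowed from the roster question later in this set.

```python
import random

def choose_srs(population_size, sample_size, seed=None):
    """Return sorted labels for an SRS drawn without replacement.

    Individuals are assumed to be numbered 1..population_size.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so every label is distinct
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical usage: 30 distinct labels from a roster numbered 1 to 1137
print(choose_srs(1137, 30, seed=1))
```

Because the draw is without replacement, there are no repeated numbers to handle, which matches the "choose distinct numbers" requirement.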
Principles of Experimental Design:
Comparison, random assignment, control and replication
Identify the sampling method used and explain how the sampling method could lead to bias: A farmer brings a juice company several crates of oranges each week. A company inspector looks at 10 oranges from the top of each crate before deciding whether to buy all the oranges
Convenience sampling
Types of bad sampling
Convenience sampling and voluntary response sampling
Convenience sampling
Easy to reach; often produces unrepresentative data
Most experiments use a design as follows:
Experimental units -------> Treatment ---------> Measure response
Data ethics
While this is a good principle to follow in experimental design, it can often be difficult to weigh the need for experimentation against perceived levels of "harm"
To collect information using surveys, the ideal situation would be to ask ___________________________
everyone in the population (a census)
Block
a group of EUs that are known before the experiment to be similar in some way that affects the response to treatment
Response bias
a systematic pattern of inaccurate answers
Researchers who conduct statistical studies often want to draw conclusions (________________________) that go beyond the data they produce
make inferences
Confounding
occurs when two variables are associated in such a way that their effects on the response variable cannot be distinguished from each other
Undercoverage
occurs when some members of the population can't be chosen
Placebo effect
a response to an inactive treatment; it is often favorable
Since the ideal isn't always practical, we survey a ____________ of the __________________________
sample, population
When the units are human beings, they are usually
subjects
Double-blind
neither the subjects nor the experimenters who interact with them know which treatment each subject is getting
Inference
the process of drawing conclusions about the population based on sample data
A specific condition applied to the individuals in an experiment
treatment
Comparison
use a design that compares two or more treatments
Nonresponse
when an individual can't or won't participate
A well-designed experiment tells us that changes in the _________________________________________ cause changes in the _________________________________________
x (explanatory variable), y (response variable)
In the experiment of the previous exercise, the subjects were randomly assigned to the different treatments. What is the most important reason for this random assignment? (a) Random assignment eliminates the effects of other variables such as stress and body weight (b) Random assignment is a good way to create groups of subjects that are roughly equivalent at the beginning of the experiment (c) Random assignment makes it possible to make a conclusion about all men (d) Random assignment reduces the amount of variation in blood pressure (e) Random assignment prevents the placebo effect from ruining the results of the study
(b) Random assignment is a good way to create groups of subjects that are roughly equivalent at the beginning of the experiment
Consider an experiment to investigate the effectiveness of different insecticides in controlling pests and their impact on the productivity of tomato plants. What is the best reason for randomly assigning treatment levels (spraying or not spraying) to the experimental units (farms)? (a) Random assignment allows researchers to generalize conclusions about the effectiveness of the insecticides to all farms (b) Random assignment will tend to average out all other uncontrolled factors such as soil fertility so that they are not confounded with the treatment effects (c) Random assignment eliminates the effects of other variables, like soil fertility (d) Random assignment eliminates chance variation in the responses (e) Random assignment helps avoid bias due to the placebo effect
(b) Random assignment will tend to average out all other uncontrolled factors such as soil fertility so that they are not confounded with the treatment effects
The most important advantage of experiments over observational studies is that (a) experiments are usually easier to carry out (b) experiments can give better evidence of causation (c) confounding cannot happen in experiments (d) an observational study cannot have a response variable (e) observational studies cannot use random samples
(b) experiments can give better evidence of causation
A researcher wishes to compare the effects of two fertilizers on the yield of soybeans. She has 20 plots of land available for the experiment, and she decides to use a matched pairs design with 10 pairs of plots. To carry out the random assignment for this design, the researcher should (a) use a table of random numbers to divide the 20 plots into 10 pairs and then, for each pair, flip a coin to assign the fertilizers to the 2 plots (b) subjectively divide the 20 plots into 10 pairs (making the plots within a pair as similar as possible) and then, for each pair, flip a coin to assign the fertilizers to the 2 plots (c) use a table of random numbers to divide the 20 plots into 10 pairs and then use the table of random numbers a second time to decide on the fertilizer to be applied to each member of the pair (d) flip a coin to divide the 20 plots into 10 pairs and then, for each pair, use a table of random numbers to assign the fertilizers to the 2 plots (e) use a table of random numbers to assign the two fertilizers to the 20 plots and then use the table of random numbers a second time to place the plots into 10 pairs
(b) subjectively divide the 20 plots into 10 pairs (making the plots within a pair as similar as possible) and then, for each pair, flip a coin to assign the fertilizers to the 2 plots
To gather information about the validity of a new standardized test for tenth-grade students in a particular state, a random sample of 15 high schools was selected from the state. The new test was administered to every 10th grade student in the selected high schools. What kind of sample is this? (a) A simple random sample (b) A stratified random sample (c) A cluster sample (d) A systematic random sample (e) A voluntary response sample
(c) A cluster sample
When we take a census, we attempt to collect data from (a) a stratified random sample (b) every individual chosen in a simple random sample (c) every individual in the population (d) a voluntary response sample (e) a convenience sample
(c) every individual in the population
Tonya wanted to estimate the average amount of time that students at her school spend on Facebook each day. She gets an alphabetical roster of students in the school from the registrar's office and numbers the students from 1 to 1137. Then Tonya uses a random number generator to pick 30 distinct labels from 1 to 1137. She surveys those 30 students about their Facebook use. Tonya's sample is a simple random sample because (a) it was selected using a chance process (b) it gave every individual the same chance to be selected (c) it gave every possible sample of the same size an equal chance to be selected (d) it doesn't involve strata or clusters (e) it is guaranteed to be representative of the population
(c) it gave every possible sample of the same size an equal chance to be selected
A simple random sample of 1200 adult Americans is selected, and each person is asked the following question: "In light of the huge national deficit, should the government at this time spend additional money to establish a national system of health insurance?" Only 39% of those responding answered "Yes." This survey (a) is reasonably accurate since it used a large simple random sample (b) needs to be larger since only about 24 people were drawn from each state (c) probably understates the percent of people who favor a system of national health insurance (d) is very inaccurate but neither understates nor overstates the percent of people who favor a system of national health insurance. Because simple random sampling is used, it is unbiased (e) probably overstates the percent of people who favor a system of national health insurance
(c) probably understates the percent of people who favor a system of national health insurance
Your statistics class has 30 students. You want to call an SRS of 5 students from your class to ask where they use a computer for the online quizzes. You label the students 01, 02, ..., 30. You enter the table of random digits at this line: 14459 26056 31424 80371 65103 62253 22490 61181. Your SRS contains the students labeled (a) 14, 45, 92, 60, 56 (b) 14, 31, 03, 10, 22 (c) 14, 03, 10, 22, 22 (d) 14, 03, 10, 22, 06 (e) 14, 03, 10, 22, 11
(d) 14, 03, 10, 22, 06
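The table-reading procedure behind this answer can be checked with a short Python sketch. The function name read_digit_table and its parameters are hypothetical, chosen for this example: it reads fixed-width labels left to right, skips labels outside the valid range, and ignores repeats.

```python
def read_digit_table(digits, label_width, max_label, sample_size):
    """Read fixed-width labels left to right from a line of random digits."""
    # Drop the spaces that only group the table into blocks of five digits
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for i in range(0, len(stream) - label_width + 1, label_width):
        label = int(stream[i:i + label_width])
        # Keep a label only if it is in range and has not been chosen already
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

line = "14459 26056 31424 80371 65103 62253 22490 61181"
print(read_digit_table(line, 2, 30, 5))  # [14, 3, 10, 22, 6]
```

The second 22 is skipped as a repeat, which is why choice (c) is wrong. The same sketch handles three-digit labels by passing label_width=3 and the largest valid label as max_label.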
A gardener wants to try different combinations of fertilizer (none, 1 cup, 2 cups) and mulch (none, wood chips, pine needles, plastic) to determine which combination produces the highest yield for a variety of green beans. He has 60 green-bean plants to use in the experiment. If he wants an equal number of plants to be assigned to each treatment, how many plants will be assigned to each treatment? (a) 1 (b) 3 (c) 4 (d) 5 (e) 12
(d) 5
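The arithmetic behind this answer is that each treatment is one combination of a fertilizer level and a mulch level, so there are 3 × 4 = 12 treatments and 60 / 12 = 5 plants per treatment. A minimal Python sketch, with variable names chosen only for illustration:

```python
from itertools import product

fertilizer = ["none", "1 cup", "2 cups"]
mulch = ["none", "wood chips", "pine needles", "plastic"]

# Every treatment is one (fertilizer, mulch) combination
treatments = list(product(fertilizer, mulch))

plants = 60
per_treatment = plants // len(treatments)
print(len(treatments), per_treatment)  # 12 5
```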
A TV station wishes to obtain information on the TV viewing habits in its market area. The market area contains one city of population 170,000, another city of 70,000, and four towns of about 5000 inhabitants each. The station suspects that the viewing habits may be different in larger and smaller cities and in the rural areas. Which of the following sampling designs would give the type of information that the station requires? (a) A cluster sample using the cities and towns as clusters (b) A convenience sample from the market area (c) A simple random sample from the market area (d) A stratified sample from the cities and towns in the market area (e) An online poll that invites all people from the cities and towns in the market area to participate
(d) A stratified sample from the cities and towns in the market area
You want to know the opinions of American high school teachers on the issue of establishing a national proficiency test as a prerequisite for graduation from high school. You obtain a list of all high school teachers belonging to the National Education Association (the country's largest teachers' union) and mail a survey to a random sample of 2500 teachers. In all, 1347 of the teachers return the survey. Of those who responded, 32% say that they favor some kind of national proficiency test. Which of the following statements about this situation is true? (a) Because random sampling was used, we can feel confident that the percent of all American high school teachers who would say they favor a national proficiency test is close to 32% (b) We cannot trust these results, because the survey was mailed. Only survey results from face-to-face interviews are considered valid (c) Because over half of those who were mailed the survey actually responded, we can feel fairly confident that the actual percent of all American high school teachers who would say they favor a national proficiency test is close to 32% (d) The results of this survey may be affected by nonresponse bias (e) The results of this survey cannot be trusted due to voluntary response bias
(d) The results of this survey may be affected by nonresponse bias
You wonder if TV ads are more effective when they are longer or repeated more often or both. So you design an experiment. You prepare 30-second and 60-second ads for a camera. Your subjects all watch the same TV program, but you assign them at random to four groups. One group sees the 30-second ad once during the program; another sees it three times; the third group sees the 60-second ad once; and the last group sees the 60-second ad three times. You ask all subjects how likely they are to buy the camera. (a) This is a randomized block design, but not a matched pairs design (b) This is a matched pairs design (c) This is a completely randomized design with one explanatory variable (factor) (d) This is a completely randomized design with two explanatory variables (factors) (e) This is a completely randomized design with four explanatory variables (factors)
(d) This is a completely randomized design with two explanatory variables (factors)
A study of treatments for angina (pain due to low blood supply to the heart) compared bypass surgery, angioplasty, and use of drugs. The study looked at the medical records of thousands of angina patients whose doctors had chosen one of these treatments. It found that the average survival time of patients given drugs was the highest. What do you conclude? (a) This study proves that drugs prolong life and should be the treatment of choice (b) We can conclude that drugs prolong life because the study was a comparative experiment (c) We can't conclude that drugs prolong life because the patients were volunteers (d) We can't conclude that drugs prolong life because this was an observational study (e) We can't conclude that drugs prolong life because no placebo was used
(d) We can't conclude that drugs prolong life because this was an observational study
Bias in a sampling method is (a) any difference between the sample result and the truth about the population (b) the difference between the sample result and the truth about the population due to using chance to select a sample (c) any difference between the sample result and the truth about the population due to practical difficulties such as contacting the subjects selected (d) any difference between the sample result and the truth about the population that tends to occur in the same direction whenever you use this sampling method (e) racism or sexism on the part of those who take the sample
(d) any difference between the sample result and the truth about the population that tends to occur in the same direction whenever you use this sampling method
You want to take a simple random sample (SRS) of 50 of the 816 students who live in a dormitory on campus. You label the students 001 to 816 in alphabetical order. In the table of random digits, you read the entries 95592 94007 69769 33547 72450 16632 81194 14873 The first three students in your sample have labels (a) 955, 929, 400 (b) 400, 769, 769 (c) 559, 294, 007 (d) 929, 400, 769 (e) 400, 769, 335
(e) 400, 769, 335
Explanatory variable
(x) helps explain or predict changes in the response variable
Response variable
(y) measures the outcome
AP Exam Tip: Be sure to get as much credit as possible on the exam when describing how you assign treatments to experimental units in an experiment:
- Method used must be random - Method must be described in sufficient detail - Address how you will deal with repeated numbers that come up when using a random number generator or random number table
What are the criteria for establishing causation if we can't do an experiment?
- The association is strong - The association is consistent - Larger values of the explanatory variable (x) are associated with stronger responses - The alleged cause precedes the effect in time - The alleged cause is plausible
AP Exam Tip: If you are asked to describe how the design of a study leads to bias, you're expected to do two things:
1. Identify a problem with the design 2. Explain how this would lead to an underestimate or an overestimate
Steps to plan a sample survey
1. State exactly what population we want to describe 2. State exactly what we want to measure; exact definition of variable 3. Decide how to choose a sample from the population
2 reasons we can trust random samples:
1. The results obey the laws of probability 2. Laws of probability allow trustworthy inference about the population
AP Exam Tip: If you're asked to identify a possible confounding variable in a given setting, you are expected to explain how the variable you chose:
1. is associated with the explanatory variable 2. affects the response variable
_______________________________________ may limit our ability to apply the conclusions of an experiment to the settings of greatest interest
Lack of realism
This is a possible source of bias in a sample survey. Name the type of bias that could result: Some people cannot be contacted in five calls
Nonresponse
Voluntary response sampling
People choose themselves for the sample by responding to a general invitation
______________________________________ of individuals to groups permits inference about ________________________________________
Random assignment, cause and effect
_____________________________________ of individuals allows inference about the ____________________________
Random selection, population
This is a possible source of bias in a sample survey. Name the type of bias that could result: The sample is chosen at random from a telephone directory
Undercoverage
This is a possible source of bias in a sample survey. Name the type of bias that could result: Interviewers choose people walking by on the sidewalk to interview
Undercoverage
What can go wrong in sample surveys
Undercoverage, nonresponse, response bias and the wording of the question
Bias
Using a method that favors some outcomes over others
Identify the sampling method used and explain how the sampling method could lead to bias: The ABC program Nightline once asked whether the United Nations should continue to have its headquarters in the United States. Viewers were invited to call one telephone number to respond "Yes" and another for "No." There was a charge for calling either number. More than 186,000 callers responded, and 67% said "No"
Voluntary response
Cluster sample
all individuals in the chosen clusters are sampled
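A cluster sample can be sketched in Python as follows. The function name cluster_sample and the school rosters are hypothetical, made up for this example. The key point is that chance chooses whole clusters, and then everyone in a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every individual in them."""
    rng = random.Random(seed)
    # Chance selects the clusters, not the individuals
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical rosters: each school is one cluster
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1", "d2"],
}
print(cluster_sample(schools, 2, seed=0))
```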
Placebo
an inactive treatment
Statistically significant
an observed effect so large it would rarely occur by chance
Be sure to choose the strata _____________ the sample is taken!!
before
A note to make early: Larger random samples give ________________________________________ than smaller random samples
better information about the population
If treatments are given to groups that differ greatly when the experiment begins, ____________ will result
bias
When our goal is to understand ______________________ and _____________________, experiments are the only sources of fully convincing data
cause, effect
A ________________________ collects data from every individual in the population
census
Random sampling
chance chooses the sample
In experiments, the solution is random assignment, which means that experimental units are assigned to treatments using a _______________________________________
chance process
Stratified random sample
combine an SRS from each stratum
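One way to picture a stratified random sample is the Python sketch below. The function name stratified_sample, the strata names, and the sample size per stratum are assumptions for illustration only. The strata are fixed before sampling, and a separate SRS is taken inside each one.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # A separate SRS (without replacement) inside each stratum
        sample.extend(rng.sample(list(members), n_per_stratum))
    return sample

# Hypothetical strata, defined before the sample is taken
strata = {
    "large city": list(range(1, 101)),
    "small city": list(range(101, 151)),
    "towns": list(range(151, 171)),
}
print(stratified_sample(strata, 5, seed=2))
```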
All individual data must be kept _____________________________ (not the same thing as anonymous). Only statistical summaries for groups of subjects may be made public
confidential
Outside the lab, badly designed experiments often yield worthless results because of ______________________
confounding
Matched Pairs Design
creates blocks by matching pairs of similar experimental units; chance decides which member of a pair gets the first treatment, and the other member of that pair receives the other treatment
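The random assignment step of a matched pairs design can be sketched in Python. The function name assign_matched_pairs, the plot numbers, and the treatment labels "A" and "B" are hypothetical. The pairs are assumed to have been formed already by matching similar units; only the within-pair assignment is random.

```python
import random

def assign_matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, let chance decide which unit gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # plays the role of the coin flip for this pair
        assignment[first] = order[0]
        assignment[second] = order[1]
    return assignment

# Hypothetical pairs of similar plots, matched before the experiment
pairs = [(1, 2), (3, 4), (5, 6)]
print(assign_matched_pairs(pairs, seed=0))
```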
Experiment
deliberately imposes a treatment
The ________________________________________ determines the type of inference that can be made from a particular study
design of the study
The population in a statistical study is the ________________________ of individuals we want information about
entire group
Simple random sample (SRS)
every group of n individuals has an equal chance to be selected
The _________________________________ are the smallest collection of individuals to which treatments are applied
experimental units
Strata
groups of individuals that are similar in some way that might affect the response
Cluster
groups of individuals located near each other
Observational study
has no treatment imposed
Replication
have enough EUs in each group so that any differences in the effects of the treatment can be distinguished from chance differences
All individuals who are subjects in the study must give their ________________________________ before data is collected
informed consent
All planned studies must be reviewed in advance by an ________________________________ charged with protecting the safety and well-being of the subjects
institutional review board
Control
keep other variables that might affect the response the same for all
The purpose of a sample is to give us information about a __________________________________
larger population
Cluster sampling works best when the clusters ________________________________________________________________
look just like the population
Single-blind
either the subjects or the experimenters who interact with them (but not both) don't know which treatment each subject is getting
Randomized Block Design
random assignment of EUs to treatments is carried out separately within each block
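A randomized block design can be sketched in Python as below. The function name randomized_block_assignment, the block names, and the treatment labels are assumptions for illustration. The shuffle is done inside each block, never across blocks.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # chance orders the units inside this block only
        for i, unit in enumerate(shuffled):
            # Cycle through the treatments so each gets a near-equal share
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks formed before the experiment
blocks = {
    "men": ["m1", "m2", "m3", "m4"],
    "women": ["w1", "w2", "w3", "w4"],
}
print(randomized_block_assignment(blocks, ["drug", "placebo"], seed=4))
```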
The solution to the problem of bias in sampling is ___________________________________________
random selection
AP Exam Tip: When describing how to select a sample using a random integer generator, be sure to:
select 4 distinct (different) numbers; ignore repeats
An experiment is a statistical study in which we actually do something (_______________________) to people, animals, or objects (________________________________) to observe the response
treatment, experimental units
Random Assignment
use chance to assign EUs to treatments creating roughly equivalent groups
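For comparison, a completely randomized assignment with no blocks can be sketched in Python. The function name completely_randomized, the unit labels, and the treatment names are hypothetical, chosen for this example.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units together, then deal them out to the treatments."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical usage: 20 farms split between two treatments
print(completely_randomized(range(1, 21), ["spray", "no spray"], seed=3))
```

Dealing the shuffled units out in turn keeps the group sizes equal, or within one of each other when the units do not divide evenly.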
Sample results will ____________________________________
vary just by chance
The _______________________________________________ is the most important influence on the answers given to a sample survey
wording of the question