AP Stats Chapter 4
Planning a Sample Survey
1. Decide what population we want to describe. 2. Decide what we want to measure. 3. Decide how to choose a sample from the population.
block
A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.
census
A census collects data from every individual in the population.
cluster
A cluster is a group of individuals in the population that are located near each other.
completely randomized design
In a completely randomized design, the experimental units are assigned to the treatments completely by chance.
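A minimal Python sketch of a completely randomized design, assuming ten hypothetical experimental units labeled 1 to 10 and two treatments called "A" and "B" (the function name and labels are illustrative, not from the course materials). The units are shuffled by chance and then dealt out to the treatments in turn.

```python
import random

def completely_randomized(units, treatments, rng=None):
    """Assign units to treatments completely by chance, in groups as equal as possible."""
    rng = rng or random.Random()
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    # Deal the shuffled units out to the treatments in turn.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical example: 10 units, two treatments, fixed seed for repeatability.
groups = completely_randomized(range(1, 11), ["A", "B"], random.Random(1))
print(groups)
```

Every unit lands in exactly one group, and no characteristic of the units plays any role in the assignment.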
double-blind
In a double-blind experiment, neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received.
randomized block design
In a randomized block design, the random assignment of experimental units to treatments is carried out separately within each block. Using a randomized block design allows us to account for the variation in the response that is due to the blocking variable. This makes it easier to determine if one treatment is really more effective than the other. When blocks are formed wisely, it is easier to find convincing evidence that one treatment is more effective than another.
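The idea of randomizing separately within each block can be sketched in Python as follows. The block names ("younger", "older"), the unit labels, and the function name are assumptions made up for illustration.

```python
import random

def randomized_block(blocks, treatments, rng=None):
    """Randomly assign units to treatments separately within each block."""
    rng = rng or random.Random()
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # a fresh shuffle for every block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks formed before the experiment by age group.
blocks = {"younger": ["a", "b", "c", "d"], "older": ["e", "f", "g", "h"]}
assignment = randomized_block(blocks, ["A", "B"], random.Random(2))
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```

Because the shuffle happens inside each block, every block contributes equally to both treatments, so the blocking variable cannot be confounded with the treatment.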
single-blind
In a single-blind experiment, either the subjects don't know which treatment they are receiving or the people who interact with them and measure the response variable don't know which subjects are receiving which treatment.
Principles of Experimental Design
The basic principles for designing experiments are as follows: 1. Comparison. Use a design that compares two or more treatments. 2. Random assignment. Use chance to assign experimental units to treatments. Doing so helps create roughly equivalent groups of experimental units by balancing the effects of other variables among the treatment groups. 3. Control. Keep other variables the same for all groups, especially variables that are likely to affect the response variable. Control helps avoid confounding and reduces variability in the response variable. 4. Replication. Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.
placebo effect
The placebo effect describes the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive treatment.
population
The population in a statistical study is the entire group of individuals we want information about.
Sample Surveys: What Can Go Wrong?
The use of bad sampling methods often leads to bias. Researchers can avoid these methods by using random sampling to choose their samples. Other problems in conducting sample surveys are more difficult to avoid, such as undercoverage, nonresponse, and response bias. CAUTION: Some students misuse the term voluntary response to explain why certain individuals don't respond in a sample survey. Their belief is that participation in the survey is optional (voluntary), so anyone can refuse to take part. What the students are describing is nonresponse.
Designing Experiments: Random Assignment
To create roughly equivalent groups at the beginning of an experiment, we use random assignment to determine which experimental units get which treatment.
There are several criteria for establishing causation when we can't do an experiment:
• The association is strong. • The association is consistent. • Larger values of the explanatory variable are associated with stronger responses. • The alleged cause precedes the effect in time. • The alleged cause is plausible.
Designing Experiments: Comparison
Good designs are essential for effective experiments, just as they are for sampling. To see why, let's start with an example of a bad experimental design: Experimental Units -> Treatment -> Measure Response. In this design, there are many other variables (besides the treatment) that are potentially confounded with the treatment (taking caffeine, in this example). The remedy for potential confounding is to do a comparative experiment with two or more groups.
matched pairs design
A matched pairs design is a common experimental design for comparing two treatments that uses blocks of size 2. In some matched pairs designs, two very similar experimental units are paired and the two treatments are randomly assigned within each pair. In others, each experimental unit receives both treatments in a random order. The idea is to create blocks by matching pairs of similar experimental units. Just as with other forms of blocking, matching helps account for the variation due to the variable(s) used to form the pairs.
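A small Python sketch of the first kind of matched pairs design described above, where the two treatments are randomly assigned within each pair. The pair members and treatment labels are hypothetical, and shuffling the two treatments stands in for a coin flip.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), rng=None):
    """Within each pair (a block of size 2), randomly decide who gets which treatment."""
    rng = rng or random.Random()
    result = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # equivalent to a coin flip within the pair
        result.append({first: order[0], second: order[1]})
    return result

# Hypothetical pairs of very similar experimental units.
pairs = [("twin1a", "twin1b"), ("twin2a", "twin2b"), ("twin3a", "twin3b")]
pair_assignments = matched_pairs(pairs, rng=random.Random(3))
print(pair_assignments)
```

For the second kind of design, where each unit receives both treatments, the same shuffle could instead decide the order in which a single unit receives the two treatments.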
placebo
A placebo is a treatment that has no active ingredient, but is otherwise like other treatments.
response variable
A response variable measures an outcome of a study.
How to Sample Well: Simple Random Sampling
A sample chosen by chance rules out both favoritism by the sampler and self-selection by respondents. Related terms: random sampling, simple random sample (SRS).
sample
A sample is a subset of individuals in the population from which we actually collect data.
sample survey
A sample survey is a study that collects data from a sample that is chosen to represent a specific population.
simple random sample (SRS)
A simple random sample (SRS) of size n is chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample.
treatment
A specific condition applied to the individuals in an experiment is called a treatment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.
experiment
An experiment deliberately imposes some treatment on individuals to measure their responses. An experiment is a statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units or subjects) to observe the response.
experimental unit & subjects
An experimental unit is the object to which a treatment is randomly assigned. When the experimental units are human beings, they are often called subjects.
explanatory variable
An explanatory variable may help explain or predict changes in a response variable.
Systematic random sampling
Another way to choose a sample by random chance is systematic random sampling. Systematic random sampling selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter. Systematic random sampling is particularly useful in certain contexts, such as exit polling at a polling place on Election Day, because an unknown number of voters will come to the polling place that day. CAUTION: If there are patterns in the way the population is ordered that coincide with the pattern in a systematic sample, the sample may not be representative of the population.
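A minimal Python sketch of systematic random sampling, assuming the ordered population is already available as a list (in a real exit poll the list would grow as voters arrive). The function name and the list of 100 hypothetical voters are illustrative.

```python
import random

def systematic_sample(ordered_population, k, rng=None):
    """Pick one of the first k individuals at random, then every kth individual after that."""
    rng = rng or random.Random()
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k individuals
    return ordered_population[start::k]

# Hypothetical example: voters numbered 1 to 100 in order of arrival, k = 10.
voters = list(range(1, 101))
sample = systematic_sample(voters, 10, random.Random(3))
print(sample)
```

Only the starting point is random. Once it is chosen, the rest of the sample is fixed, which is why an ordering with a period that matches k can make the sample unrepresentative.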
How to Sample Badly
Choosing individuals from the population who are easy to reach results in a convenience sample. CAUTION: Convenience sampling often produces unrepresentative data. The design of a statistical study shows bias if it is very likely to underestimate or very likely to overestimate the value you want to know. CAUTION: Bias is not just bad luck in one sample.
convenience sample
Choosing individuals from the population who are easy to reach results in a convenience sample. Convenience sampling will almost always result in bias, but so will some other sampling methods.
Confounding
Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
control group
In an experiment, a control group is used to provide a baseline for comparing the effects of other treatments. Depending on the purpose of the experiment, a control group may be given an inactive treatment (placebo), an active treatment, or no treatment at all. In all other respects, the groups should be treated exactly the same so that the only difference between them is the treatments. This way, if there is convincing evidence of a difference in the average response, we can safely conclude it was caused by the treatments.
Factor & Levels
In an experiment, a factor is a variable that is manipulated and may cause a change in the response variable. The different values of a factor are called levels.
control
In an experiment, control means keeping other variables constant for all experimental units. After randomly assigning treatments and controlling other variables, treatment groups should be about the same, except for the treatments. Then a difference in the average response must be due either to the treatments themselves or to the random assignment. We can't say that any difference in average response between treatment groups must be caused by the treatments because there would likely be some difference just because the random assignment is unlikely to produce two groups that are exactly equivalent.
random assignment
In an experiment, random assignment means that experimental units are assigned to treatments using a chance process.
replication
In an experiment, replication means using enough experimental units to distinguish a difference in the effects of the treatments from chance variation due to the random assignment. CAUTION: In statistics, replication means "use enough subjects." In other fields, the term replication has a different meaning.
Well-designed experiments allow for inferences about cause and effect.
In an experiment, there are two ways to explain why the average response for one group is different from the average response for another group: 1. The treatment does not have a different effect for the groups, and the difference in the response happened because of chance variation in the random assignment. 2. The treatment causes a difference in the average response of the groups. When the observed results of a study are too unusual to be explained by chance alone, the results are called statistically significant.
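One common way to judge whether a difference is "too unusual to be explained by chance alone" is to simulate the random assignment many times. The Python sketch below is an illustration under assumptions, not necessarily the procedure used in the course: the function name, the made-up response values, and the choice of 2000 re-randomizations are all hypothetical.

```python
import random
import statistics

def randomization_p_value(group_a, group_b, trials=2000, rng=None):
    """Estimate how often chance alone gives a difference in means at least as large as observed."""
    rng = rng or random.Random()
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    combined = list(group_a) + list(group_b)
    count = 0
    for _ in range(trials):
        rng.shuffle(combined)  # redo the random assignment
        new_a = combined[:len(group_a)]
        new_b = combined[len(group_a):]
        diff = statistics.mean(new_a) - statistics.mean(new_b)
        if abs(diff) >= abs(observed):
            count += 1
    return count / trials

# Made-up responses for two treatment groups of six units each.
group_a = [10, 11, 12, 13, 14, 15]
group_b = [1, 2, 3, 4, 5, 6]
p_value = randomization_p_value(group_a, group_b, rng=random.Random(4))
print(p_value)
```

A very small proportion supports explanation 2 (the treatment caused the difference), because explanation 1 (chance variation in the random assignment) would rarely produce a difference that large.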
Sampling Variability and Sample Size
Larger random samples tend to produce estimates that are closer to the true population value than smaller random samples. In other words, estimates from larger samples are more precise.
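A short simulation can illustrate this claim. The Python sketch below assumes a made-up population of 10,000 values and compares the spread of sample means for two hypothetical sample sizes (10 and 400); all names and numbers are illustrative.

```python
import random
import statistics

def spread_of_estimates(population, n, repetitions, rng):
    """Take many SRSs of size n and return the standard deviation of their sample means."""
    estimates = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(estimates)

rng = random.Random(0)
# Made-up population: 10,000 values spread between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]

small = spread_of_estimates(population, 10, 500, rng)
large = spread_of_estimates(population, 400, 500, rng)
print("spread with n = 10: ", round(small, 2))
print("spread with n = 400:", round(large, 2))
```

The estimates from the larger samples cluster more tightly around the true population mean, which is what "more precise" means here.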
Collect data from a representative sample...
Make an inference about the population.
Nonresponse
Nonresponse occurs when an individual chosen for the sample can't be contacted or refuses to participate.
retrospective
Observational studies that examine existing data for a sample of individuals are called retrospective. They look back at data from the past.
prospective
Observational studies that track individuals into the future are called prospective. They follow people forward in time.
Stratified random sampling
One of the most common alternatives to simple random sampling is called stratified random sampling. Stratified random sampling selects a sample by choosing an SRS from each stratum and combining the SRSs into one overall sample. Stratified random sampling works best when the individuals within each stratum are similar (homogeneous) with respect to what is being measured and when there are large differences between strata. When we can choose strata that have similar responses within strata but different responses between strata, stratified random samples give more precise estimates than simple random samples of the same size.
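A minimal Python sketch of stratified random sampling, assuming two hypothetical strata ("freshmen" and "seniors") and made-up sample sizes for each. The function name and data are illustrative.

```python
import random

def stratified_sample(strata, n_per_stratum, rng=None):
    """Choose an SRS from each stratum and combine the SRSs into one overall sample."""
    rng = rng or random.Random()
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))  # SRS within this stratum
    return sample

# Hypothetical strata: ID numbers for 50 freshmen and 30 seniors.
strata = {"freshmen": list(range(0, 50)), "seniors": list(range(100, 130))}
n_per_stratum = {"freshmen": 5, "seniors": 3}
stratified = stratified_sample(strata, n_per_stratum, random.Random(6))
print(stratified)
```

Every stratum is guaranteed to appear in the sample, which is what removes the between-strata variation from the estimate.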
Random sampling
Random sampling involves using a chance process to determine which members of a population are included in the sample.
Response bias
Response bias occurs when there is a systematic pattern of inaccurate answers to a survey question.
observational study
An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. Sample surveys are one kind of observational study.
Sampling variability
Sampling variability refers to the fact that different random samples of the same size from the same population produce different estimates.
Strata
Strata are groups of individuals in a population who share characteristics thought to be associated with the variables being measured in a study.
Undercoverage
Undercoverage occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample.
Voluntary response sampling
Voluntary response sampling allows people to choose to be in the sample by responding to a general invitation. Most Internet polls, along with call-in, text-in, and write-in polls, rely on voluntary response sampling. People who self-select to participate in such surveys are usually not representative of some larger population of interest.
The Idea of a Sample Survey
We often draw conclusions about a whole population on the basis of a sample. Choosing a sample from a large, varied population is not that easy.
Cluster sampling
When populations are large and spread out over a wide area, we'd prefer a method that selects groups (clusters) of individuals that are "near" one another. That's the idea of cluster sampling. Cluster sampling selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample. Cluster sampling works best when the individuals in each cluster are heterogeneous (mirroring the population). Cluster sampling is often used for practical reasons, like saving time and money.
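A minimal Python sketch of cluster sampling, assuming four hypothetical homerooms as the clusters. The function name and data are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng=None):
    """Randomly choose whole clusters and include every member of each chosen cluster."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters: students grouped by homeroom.
clusters = {
    "homeroom 1": ["s01", "s02", "s03"],
    "homeroom 2": ["s04", "s05", "s06"],
    "homeroom 3": ["s07", "s08", "s09"],
    "homeroom 4": ["s10", "s11", "s12"],
}
chosen, members = cluster_sample(clusters, 2, random.Random(7))
print(chosen, members)
```

Note the contrast with stratified sampling: here only some groups are selected, and everyone in a selected group is included.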
Inference for Sampling
When the members of a sample are selected at random from a population, we can use the sample results to make inferences about the population. Even when making an inference from a random sample, it would be surprising if the estimate from the sample was exactly equal to the truth about the population.
statistically significant
When the observed results of a study are too unusual to be explained by chance alone, the results are called statistically significant.
How to Choose an SRS with Technology
• Label. Give each individual in the population a distinct numerical label from 1 to N, where N is the number of individuals in the population. • Randomize. Use a random number generator to obtain n different integers from 1 to N, where n is the sample size. • Select. Choose the individuals that correspond to the randomly selected integers.
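The Label, Randomize, Select steps above can be sketched in Python, with the standard library's random number generator standing in for whatever technology the class uses. The roster of 30 hypothetical students and the function name are assumptions for illustration.

```python
import random

def choose_srs(population, n, rng=None):
    """Label individuals 1..N, draw n different labels at random, and select those individuals."""
    rng = rng or random.Random()
    N = len(population)                       # Label: position i in the list gets label i + 1
    labels = rng.sample(range(1, N + 1), n)   # Randomize: n different integers from 1 to N
    return [population[label - 1] for label in sorted(labels)]  # Select

# Hypothetical roster of 30 students.
roster = [f"student_{i}" for i in range(1, 31)]
srs = choose_srs(roster, 5, random.Random(8))
print(srs)
```

Because `sample` draws without replacement, the n labels are automatically different, matching the "n different integers" requirement.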
How to Choose an SRS with Table D
• Label. Give each member of the population a distinct numerical label with the same number of digits. Use as few digits as possible. • Randomize. Read consecutive groups of digits of the appropriate length from left to right across a line in Table D. Ignore any group of digits that wasn't used as a label or that duplicates a label already in the sample. Stop when you have chosen n different labels. • Select. Choose the individuals that correspond to the randomly selected labels.
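The table-reading procedure above can be sketched in Python. The line of digits below is made up for illustration and is not copied from Table D; the function name and the population size of 30 are also assumptions.

```python
def srs_from_digit_table(digits, population_size, n):
    """Read fixed-width labels from a line of random digits, skipping unused and repeated labels."""
    width = len(str(population_size))  # as few digits as possible
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spacing between groups
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Ignore groups that are not labels or that repeat an earlier label.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            return chosen
    raise ValueError("ran out of digits before choosing n labels")

# Made-up line of random digits (NOT an actual line from Table D).
line = "07511 28812 30450 32919"
labels = srs_from_digit_table(line, population_size=30, n=5)
print(labels)  # [7, 12, 30, 3, 29]
```

With labels 01 to 30, the two-digit groups read 07, 51, 12, 88, 12, 30, 45, 03, 29: the groups 51, 88, and 45 are skipped because they are not labels, and the second 12 is skipped as a repeat.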
The Scope of Inference
• Random selection of individuals allows inference about the population from which the individuals were chosen. • Random assignment of individuals to groups allows inference about cause and effect.