Stats Ch 8
Sampling bias
-caused by a bad sampling frame or a sampling method that tends to obtain non-representative samples -examples of sampling bias: undercoverage: the sampling frame doesn't represent all parts of the population, so some portion of the population is not sampled or has a smaller representation in the sample than it has in the population; overcoverage: members that are not in the population of interest are included in the sample -good sampling design can prevent sampling bias
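A small sketch of undercoverage, using made-up numbers. The population, the group names, and the landline-based `frame` are all hypothetical. It shows that when the sampling frame leaves out a group, even a census of the frame misses the population value.

```python
# Hypothetical population: (group, supports_proposal) for each person.
population = (
    [("has_landline", True)] * 300 + [("has_landline", False)] * 200
    + [("cell_only", True)] * 100 + [("cell_only", False)] * 400
)

def share_supporting(people):
    """Proportion of people whose answer is True."""
    return sum(answer for _, answer in people) / len(people)

# Sampling frame built from a landline directory: cell-only people are never listed.
frame = [person for person in population if person[0] == "has_landline"]

population_share = share_supporting(population)   # the value we actually want
frame_share = share_supporting(frame)             # the best this frame can give

print(f"population: {population_share:.2f}, frame: {frame_share:.2f}")
# prints "population: 0.40, frame: 0.60"
```

No amount of randomization inside `frame` fixes the gap, because the cell-only group has no chance of being selected.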
ways to evaluate survey questions
-have experts review the questions -conduct a focus group discussion -conduct cognitive interviews (how respondents understand the questions and formulate their answers) -conduct field pretests or pilot studies of the draft survey
Voluntary response bias (?)
-when anyone can respond, usually only those with strong opinions do -results are skewed by the large share of responses from people with strong opinions -to prevent voluntary response bias, only accept responses from those sampled
Response bias
-wording of questions is confusing/misleading -questions are asked in a leading way -subjects lie because they think their true response is socially unacceptable -prevent response bias by writing questions that are clear and understandable; avoid confusing, long, and leading questions -good sampling design CANNOT prevent response bias
three ideas of sampling
1. -we sample to learn about a population -it is unrealistic to conduct a census due to time/money constraints -sample should be representative of the population from which it is obtained -representative sample: reflects the relevant characteristics of the population -biased sample: misses or misrepresents one or more characteristics of the population 2. -in order to get a representative sample, must employ randomization -randomization: a random process generates the members of the sample -ensures everyone in the sampling frame has a chance of being selected, so that on average all relevant characteristics of the population are represented 3. -in sampling, it is the size of the sample that matters, not the size of the population -sample just has to be representative (a spoonful of soup includes all the ingredients) -a bigger population does not call for a bigger sample once the sample is representative (no bigger spoon needed to taste a bigger pot of soup once the ingredients are mixed) -sample size should not be greater than 10% of the population
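A rough numeric check of idea 3, assuming we are estimating a proportion. The function name `standard_error` and the example sizes are made up for illustration. The formula is the usual standard error of a sample proportion, with the finite population correction for sampling without replacement.

```python
import math

def standard_error(p, n, population_size=None):
    """Standard error of a sample proportion.

    If population_size is given, apply the finite population correction
    for sampling without replacement.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

# Same sample size (n = 1000), very different population sizes.
for N in (100_000, 10_000_000, 300_000_000):
    print(N, round(standard_error(0.5, 1000, N), 5))

# Same population, different sample sizes.
for n in (100, 1000, 4000):
    print(n, round(standard_error(0.5, n, 10_000_000), 5))
```

The first loop barely changes as the population grows; the second shrinks as n grows. The correction only matters when the sample is a large share of the population, which is what the 10% guideline guards against.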
4 questions for a valid survey
1. what do I want to know? -short survey, bare minimum 2. who are the right respondents? -define population of interest, have correct sampling frame and sampling techniques to generate a representative sample 3. what are the right questions? -ask specific, carefully worded, clear questions, beware of measurement errors via inaccurate responses, pilot test the survey 4. what will be done with the results? -be sure you need to conduct a survey to obtain the data you need
sample
a subset of individuals selected from the population
sampling frame
list of individuals from which the sample is actually drawn; ideally includes all members of the population of interest
3 ways to obtain information
census: info from the entire population of interest -sampling: choose a representative sample from the population of interest -experimentation: researchers impose treatments and controls on subjects (experimental units) from the population of interest (subjects are not randomly selected, but treatments/controls are randomly assigned)
statistical sampling designs use
chance rather than choice to select the sample
3 standards for evaluating survey questions
content, cognitive, usability -content: whether the questions will generate the needed data -cognitive: whether the respondents can understand the questions -usability: whether the respondents can answer the questions easily
poor ways to sample
1. convenience sample -includes those who are easy to sample and therefore may not represent the population -ex: internet polls, polls in shopping malls -however, convenience samples are often essential in observational studies and experiments (ex: medical research) 2. volunteer (response) sample -most common type of convenience sample -difficult to define the sampling (?) -those with stronger opinions respond, so not representative of the population -ex: call-in shows, text-in polls, internet polls 3. large, non-representative sample -sample size does not matter if it's not representative of the population of interest
good sampling design
ensures each subject in a population has an opportunity to be selected and incorporates randomness in the selection process
census
entire population is sampled -difficult to complete, expensive, population may change while taking the census -a survey over a shorter time frame is better
is simple random sampling random sampling??
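yes: simple random sampling is one type of random sampling; stratified and cluster designs are also random sampling but are not simple random samples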
why not match sample to population (why employ randomization?)
impossible to think of all relevant factors/match sample to pop for all characteristics
actual sample
individuals from whom you actually receive responses -sample we obtain
target sample
individuals from whom you intend to measure responses -those we sample
systematic sampling
less expensive than random sampling 1. randomly select a starting place 2. employ a systematic method to continue choosing the sample (ex: every 20th name on a list of names) 3. order of the list cannot be associated in any way with the responses sought 4. beware of confounding variables (ex: an alphabetized list may skip ethnic groups with similar last names)
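Steps 1 and 2 above can be sketched like this. The function name `systematic_sample` and the list of 100 names are hypothetical, and the sketch assumes the list order is unrelated to the responses (step 3).

```python
import random

def systematic_sample(ordered_list, step, rng=None):
    """Pick a random starting place, then take every `step`-th item."""
    rng = rng or random.Random()
    start = rng.randrange(step)        # step 1: random start among the first `step` items
    return ordered_list[start::step]   # step 2: every `step`-th item after that

names = [f"name{i:03d}" for i in range(1, 101)]   # hypothetical list of 100 names
sample = systematic_sample(names, step=20)
print(sample)
```

Only one random number is needed, which is part of why this is cheaper than drawing every member at random.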
sampling design
method used to obtain a sample
parameter
numerical characteristic of a population; a fixed quantity; the numbers in a model
statistic
numerical characteristic of a sample, a variable quantity (changes depending on people in sample)
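A quick way to see parameter vs statistic, assuming a made-up population of the numbers 1 to 100. The seed is there only so a rerun gives the same samples.

```python
import random
from statistics import mean

rng = random.Random(8)            # seeded only so the run is repeatable

# Hypothetical population of 100 values; its mean is the parameter.
population = list(range(1, 101))
parameter = mean(population)      # fixed: 50.5

# Each new sample of 10 gives a new value of the statistic.
statistics_seen = [mean(rng.sample(population, 10)) for _ in range(5)]

print("parameter:", parameter)
print("statistics:", statistics_seen)
```

The parameter never moves; the five sample means depend on who happened to land in each sample.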
representative sample
sample is considered representative of population when each subject in the population has the same chance of being included in that sample -statistics computed from sample estimate the corresponding population parameters accurately
Potential sources of bias in surveys
sampling bias, nonresponse bias, response bias
multistage
sampling schemes that combine several methods -ex: 2-stage cluster sampling -ex with three stages: 1. choose 1 chapter from each of the 5 parts of the textbook (stratified sampling) 2. select a few pages from each chosen chapter (cluster sampling) 3. select a few sentences from each chosen page (SRS)
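The textbook example can be sketched in code, assuming the book is stored as nested dictionaries. That layout, the function name `multistage_sample`, and the toy book below are all made up for illustration.

```python
import random

def multistage_sample(book, pages_per_chapter, sentences_per_page, rng=None):
    """book: {part: {chapter: {page: [sentences]}}} (hypothetical layout)."""
    rng = rng or random.Random()
    picked = []
    for part, chapters in book.items():
        # Stage 1: one random chapter from every part (parts act as strata).
        chapter = rng.choice(sorted(chapters))
        pages = chapters[chapter]
        # Stage 2: a few random pages from that chapter.
        for page in rng.sample(sorted(pages), pages_per_chapter):
            # Stage 3: simple random sample of sentences on each chosen page.
            for sentence in rng.sample(pages[page], sentences_per_page):
                picked.append((part, chapter, page, sentence))
    return picked

# Toy book: 5 parts, 3 chapters per part, 10 pages per chapter, 20 sentences per page.
book = {
    f"part{p}": {
        f"ch{p}.{c}": {pg: [f"s{p}.{c}.{pg}.{s}" for s in range(20)] for pg in range(10)}
        for c in range(3)
    }
    for p in range(5)
}
picked = multistage_sample(book, pages_per_chapter=2, sentences_per_page=3)
print(len(picked))   # 5 parts x 2 pages x 3 sentences = 30
```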
stratified random sampling
strata: homogeneous groups sliced from the population -divide population into strata -use simple random sampling within each stratum to choose members -combine the results from each stratum to form the sample -used to reduce sampling variability -helps us see differences among groups -may be difficult
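A minimal sketch of these steps, assuming we sample the same fraction from every stratum. The function name `stratified_sample`, the `stratum_of` argument, and the class-year population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, rng=None):
    """Simple random sample of the same fraction from every stratum."""
    rng = rng or random.Random()
    # Divide the population into strata.
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    # SRS within each stratum, then combine the results.
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 freshmen and 40 seniors.
population = [("freshman", i) for i in range(60)] + [("senior", i) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda person: person[0], fraction=0.1)
print(sample)
```

Every run gives exactly 6 freshmen and 4 seniors, so the split between strata no longer varies from sample to sample. That is the reduction in sampling variability.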
population
total set of ALL individuals of interest
how to assess surveys (when taking them)
-how was the sample selected -what was the sample size -method, location, date of data collection -response rate -wording of Q's -how many Q's -are the questions confusing, leading -are the questions controversial -who sponsored the study? -who conducted the study? -population of interest -sampling frame -should have a description of any weighting of the data -the less you know about these details, the less you should trust the results
cluster random sampling
-preferred when a reliable sampling frame is not available or when the cost of selecting a simple random sample is excessive -clusters: heterogeneous groups that resemble the overall population; can give an unbiased sample 1. split population into clusters 2. use random sampling to select several clusters 3. perform a *census* of each selected cluster -uses a census of a few clusters vs a random sample from every group (stratified)
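The three steps can be sketched as follows. The function name `cluster_sample` and the dorm data are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, rng=None):
    """clusters: dict mapping a cluster name to the list of its members."""
    rng = rng or random.Random()
    # Step 2: randomly select several clusters.
    chosen = rng.sample(sorted(clusters), num_clusters)
    # Step 3: take a census of each selected cluster (keep every member).
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Step 1: population already split into hypothetical clusters (8 dorms of 10 students).
dorms = {f"dorm{d}": [f"dorm{d}-student{s}" for s in range(10)] for d in range(8)}
chosen, sample = cluster_sample(dorms, num_clusters=3)
print(chosen, len(sample))
```

Note that only a list of clusters is needed up front, not a list of every individual, which is why this helps when no reliable sampling frame exists.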
randomization
-protects against factors you are unaware of -helps ensure that the sample represents all features of a population
guidelines for writing good questions
-questions should be as specific as possible -use words that are easy to understand -avoid double-barreled questions -start with general questions and move on to more specific ones -when asking questions about multiple items, start with the least popular one -include all reasonable possibilities as response options -use standardized wording so answers can be summarized and tabulated -add memory cues (ex: life calendar) to improve recall if forgetting is likely -use household records for accurate reporting -for a longitudinal study (change over time), ask the same questions each time
Nonresponse bias
-some sampled subjects cannot be reached or refuse to participate -subjects who are willing to participate may differ from the overall sample -even those who do participate may not respond to some questions, which causes nonresponse bias via missing data -nonresponse bias can be reduced by following up with sampled subjects who have not responded -good sampling design CANNOT prevent nonresponse bias
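A sketch of why follow-up helps, with made-up numbers. The two groups, their response rates, and the function name `respondent_mean` are all hypothetical.

```python
def respondent_mean(groups):
    """Mean among the people who actually respond.

    groups: list of (number_sampled, response_rate, group_mean) tuples.
    """
    responders = sum(n * rate for n, rate, _ in groups)
    total = sum(n * rate * mean for n, rate, mean in groups)
    return total / responders

# Hypothetical target sample: hours worked per week.
# 500 people average 35 hours; 500 busier people average 50 and are harder to reach.
sample_mean = (500 * 35 + 500 * 50) / 1000          # 42.5 if everyone answered

first_attempt = respondent_mean([(500, 0.8, 35), (500, 0.2, 50)])
after_follow_up = respondent_mean([(500, 0.8, 35), (500, 0.6, 50)])

print(sample_mean, round(first_attempt, 1), round(after_follow_up, 1))
```

The first attempt lands well below the target sample's mean because the busy group is mostly missing; raising that group's response rate through follow-up pulls the estimate back toward it.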
simple random sampling
-use random device to select sample ex: toss coin, names out of hat, technology -each possible sample (combo of individuals) of a given size is equally likely to be the one obtained -eliminates unintentional selection bias -can be done with or without replacement (selected more than once/cannot be selected more than once) -sampling variability/error: sample to sample differences based on random selection (no actual error has taken place)
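A small sketch using Python's standard `random` module as the "technology". The four-name frame is made up. It lists every possible sample of size 2, then draws one sample without replacement and one with replacement.

```python
import random
from itertools import combinations
from math import comb

frame = ["Ana", "Ben", "Cy", "Di"]          # hypothetical sampling frame

# Every possible sample of size 2; under SRS each is equally likely.
possible = list(combinations(frame, 2))
print(len(possible), "possible samples, each with chance 1 in", comb(4, 2))

rng = random.Random()
without_replacement = rng.sample(frame, 2)   # nobody can be selected twice
with_replacement = rng.choices(frame, k=2)   # the same person may be selected twice

print(without_replacement, with_replacement)
```

Rerunning the last lines gives different samples each time; those differences are the sampling variability, not a mistake.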