PSYC 3980 Exam 2 (Chapters 6-9)
Population of interest
Before researchers can decide whether a sample is biased or unbiased, they have to specify a population to which they want to generalize - Instead of the population as a whole, a research study's population is more limited Ex: If we are considering results of a national election poll, we might primarily care about the population of people who will vote in the next election in the country
convenience sampling
Choosing a sample based on those who are easiest to access and readily available; a biased sampling technique. - sampling only areas that are within convenient reach - sampling people whom you feel more comfortable approaching - being unable to sample some people Graph: fewer people pick up the phone when pollsters call; polling organizations are developing and using internet-based polling methods as a solution
probability sampling (random sampling) - all types of this sampling involve an element of random selection
When external validity is vital and researchers need an unbiased, representative sample from a population, which type of sampling is the best option?
CIs that do not contain zero (Statistical Validity Question 2)
When the 95% CI does not include zero, it is common to say the association is statistically significant
Nonprobability samples in the real world
When you know a sample is not representative, you should think carefully about how much it matters. Are the characteristics that make the sample biased actually relevant to what you are measuring? In certain cases, it's reasonable to trust the reports of unrepresentative samples.
convenience sampling?
Which nonprobability sampling technique is most common?
- They can be surprisingly difficult and time consuming - variants are just as externally valid as simple random sampling because they all contain an element of random selection
Why are variants of the basic technique (simple random sampling & systematic sampling) used instead sometimes?
Population, sample, and census comparison
You don't need to eat the whole bag (the whole population) to know whether you like the chips; you only need to test a small sample. If you did taste every chip in the population, you would be conducting a census.
Purposive sampling
a biased sampling technique in which only certain kinds of people are included in a sample (in a nonrandom way) Ex: researchers studying the effectiveness of a specific intervention to quit smoking would seek only smokers for their sample. Limiting this sample to one type of participant does not make it this type of sample. If researchers recruit smokers at random in a community, it is not this type of sample because it is a random sample. BUT, if researchers recruit the sample of smokers by posting flyers at a local tobacco store, that action makes it this type of sample because only smokers will participate and because the smokers are not randomly selected
nonprobability sampling
a category name for nonrandom sampling techniques, such as convenience, purposive, and quota sampling, that result in a biased sample
effect size
the magnitude, or strength, of a relationship between two or more variables
simple random sampling
the most basic form of probability sampling, in which the sample is chosen completely at random from the population of interest - each member of population's name on a ball that is rolled around in a bowl that spits out a number of balls equal to size of the desired sample; people's names on selected balls will make up the sample - assign a # to each person in population and select certain ones using a table of random numbers - pollsters program computers to randomly select telephone numbers or home addresses from a database of eligible people
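The "table of random numbers" and "computer selects from a database" procedures above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical population of 10,000 numbered people; the function name and the seed are illustrative only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n members drawn without replacement, each equally likely."""
    rng = random.Random(seed)  # seed is only here to make the illustration repeatable
    return rng.sample(population, n)

# Hypothetical population of interest: 10,000 people identified by number.
population = list(range(1, 10001))
sample = simple_random_sample(population, 100, seed=1)
print(len(sample), "people selected; first five IDs:", sample[:5])
```

Because selection is without replacement, nobody can appear in the sample twice.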
replication
the process of conducting a study again to test whether the result is consistent
Random assignment
the use of a random method to assign participants into different experimental groups Ex: In an experiment testing how exercise affects well-being, random assignment would make it likely that the people in the treatment and comparison groups are equally happy at the start
self-reporting "more than they can know"
we don't always know why we behave, think, or feel a certain way
unbiased sample (representative sample)
A sample in which all members of the population of interest are equally likely to be included (usually through some random method), and therefore the results can generalize to the population of interest.
association claim
An association claim is not supported by a particular kind of statistic or a particular kind of graph; it is supported by a study design--correlational research--in which all variables are measured
Pros of Self-report measures
- People are able to report their own gender identity, happiness, income, ethnicity, and so on; there is no need to use expensive or difficult measures to collect such information. - People can accurately report on things they did or that happened to them. - Diener and his colleagues subjective well-being self-report measure -- Who else but you knows how happy you feel? - Self-reports may be only option -- can monitor brain activity to identify when someone is dreaming, but need self-report to find out the content of the person's dream - finding how anxious someone feels or if someone has been a victim of violence is not very observable
Combining Techniques
- As long as clusters and individuals are selected at random, the sample will represent the population of interest and will have good external validity - to control for bias, researchers might supplement random selection with a statistical technique called weighting: If the final sample is determined to contain fewer members of a subgroup than it should, they adjust the data so responses from members of underrepresented categories count more and overrepresented members count less
Biased and Unbiased Sample of Different Populations of Interest
- a researcher's sample might contain too many of the most unusual people Ex: Students who rate a professor might tend to be the ones who are angry or disgruntled and may not represent the rest of the professor's students very well - researcher's sample might include only one kind of people, when the population of interest is more like a variety pack Ex: A study sampled only men when the population of interest contains both men and women
When External Validity is a low priority:
- random assignment is prioritized over random sampling when conducting an experiment
Associations between two quantitative variables review
- scatterplots and the correlation coefficient r describe the relationship b/w 2 measured variables - positive r = high scores on one variable go with high scores on the other variable - r has two qualities: direction (+, -, 0) and strength (how closely related the 2 variables are--> strong correlation will be closer to 1.0 or -1.0)
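As a rough sketch of how r captures both direction and strength, the function below computes Pearson's r from scratch. The variable names and the five data points are made up for illustration and do not come from any study in these notes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the sign gives direction, the size gives strength."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five people on two measured variables.
deep_talk = [1, 2, 3, 4, 5]
well_being = [2, 4, 5, 4, 5]
r = pearson_r(deep_talk, well_being)
print(round(r, 2))  # 0.77
```

Reversing one variable (high scores paired with low scores) would flip the sign of r without changing its strength.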
All else being equal, larger effect sizes are more important (Statistical Validity Question 1)
- the association between deep talk and happiness (r=.26) may be more important than the weaker one b/w meeting online and having a happier marriage - But, a tiny effect size can be important in situations like a small change in average global temperature, which can lead to major sea-level rise
Construct validity of association claims
1. How well was each variable measured? - once you know what kind of measure was used for each variable, you can ask questions to assess each one's construct validity: 2. Does the measure have good reliability? 3. Is it measuring what it's intended to measure? 4. What is the evidence for its face validity, its concurrent validity, its discriminant and convergent validity?
Sample size is not an external validity issue; it is a statistical validity issue
1. The larger the sample size, the smaller the margin of error - that is, the more accurately the sample's results estimate the views of the population 2. After a random sample size of 1,000, it takes many more people to gain just a little more accuracy in the margin of error. - Reason why 1,000 is seen as an optimal balance between statistical accuracy and polling effort. - Sample of 1,000 people, as long as it's random, allows them to generalize to the population (even a population of 330 million) quite accurately
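The diminishing returns described above can be seen with the usual textbook approximation for a poll's 95% margin of error, z * sqrt(p(1 - p) / n). This is a sketch under assumptions not stated in the notes: a simple random sample, the worst case p = 0.5, and z = 1.96.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

# Print the margin of error, in percentage points, for several sample sizes.
for n in (100, 500, 1000, 2000, 10000):
    print(n, round(100 * margin_of_error(n), 1))
```

Under these assumptions, 1,000 respondents give roughly plus or minus 3.1 points, and doubling to 2,000 only improves that to about 2.2 points. Population size does not appear in the formula, which is why 1,000 people can describe a population of 330 million.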
How can researchers stop the tendency of Fence Sitting?
1. When a scale contains an even number of response options, the person has to choose one side or the other because there is no neutral choice - drawback is that sometimes people really do not have an opinion or an answer, so having to choose a side is invalid 2. Using forced-choice questions: people must pick one of two answers - still can be invalid for people who feel their opinion is in the middle of the two options
Observer Bias
A bias that occurs when observer expectations influence the interpretation of participant behaviors or the outcome of the study Ex: Psychoanalytic therapists were shown a videotape of a 26 year old man talking to a professor about his feelings and work experiences - half were told he was a patient; other half told he was a job applicant - their descriptions afterwards were very different
Quota sampling
A biased sampling technique in which a researcher identifies subsets of the population of interest, sets a target number for each category in the sample, and nonrandomly selects individuals within each category until the quotas are filled. - similar to stratified sampling because both specify subcategories and attempt to fill targeted percentages or numbers for each subcategory - But, in this sampling technique, the participants are selected nonrandomly (maybe through convenience or purposive sampling), and in stratified random sampling they are selected using a random selection technique
probability sampling (random sampling)
A category name for random sampling techniques, such as simple random sampling, stratified random sampling, and cluster sampling, in which a sample is drawn from a population of interest so each member has an equal and known chance of being included in the sample
Reactivity
A change in behavior of study participants (such as acting less spontaneously) because they are aware they are being watched.
Observer effects (expectancy effects)
A change in behavior of study participants in the direction of observer expectations Ex: each student received a randomly selected group of rats and half were told their rats were "maze-bright" and the other half were told their rats were "maze-dull". "maze-bright rats" completed mazes faster than "maze-dull rats" even though they were genetically similar *Observers not only see what they expect to see; sometimes they even cause the behavior of those they are observing to conform to their expectations* Ex: A horse was thought to be able to do math, but it was found he could detect nonverbal gestures from his owner and others *Shows how an observer's subtle behavior changed a subject's behavior*
Stratified random sampling
A form of probability sampling; a random sampling technique in which the researcher identifies particular demographic categories, or strata, and then randomly selects individuals within each category. Ex: Researchers may want to be sure their sample of 1,000 Canadians includes people of South Asian descent in the same proportion as in the Canadian population (which is 4%). Thus, they may have two categories (strata) in their population: South Asian Canadians and other Canadians. In a sample of 1,000 they would make sure to include at least 40 members of the category of interest (South Asian Canadians). But, all 1,000 members of both categories are selected at random
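The Canadian example can be sketched with proportional allocation: each stratum contributes in proportion to its share of the population, and members are drawn at random within it. The population size of 100,000 and the ID labels are hypothetical; only the 4% share and the sample of 1,000 come from the example.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Randomly select within each stratum, in proportion to its population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can leave the total off by one when shares are not exact.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 100,000 in which 4% are South Asian Canadians.
strata = {
    "south_asian": [f"sa_{i}" for i in range(4000)],
    "other": [f"other_{i}" for i in range(96000)],
}
sample = stratified_sample(strata, total_n=1000, seed=3)
print({name: len(members) for name, members in sample.items()})
```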
Oversampling
A form of probability sampling; a variation of stratified random sampling in which the researcher intentionally overrepresents one or more groups. Ex: Instead of 40 people, researcher decides that of the 1,000 people they sample, a full 100 will be sampled at random from the Canadian South Asian community. The ethnicities of the participants are still the categories, but the researcher oversampled the South Asian population: this group will constitute 10% of the sample, even though it represents only 4% of the population - A survey that includes an oversample adjusts the final results so members in the oversampled group are weighted to their actual proportion in the population - still a probability sample because the 100 South Asians in final sample were sampled randomly from the population of South Asians
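One simple way to weight an oversample back to population proportions is to give each group the weight (population share) / (sample share). The sketch below uses the 100-of-1,000 and 4% figures from the example; the group means on the survey item are invented for illustration, and real surveys may use more elaborate weighting.

```python
def oversample_weights(sample_counts, population_shares):
    """Weight for each group = its population share divided by its sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (sample_counts[group] / total)
        for group in sample_counts
    }

def weighted_mean(group_means, sample_counts, weights):
    """Overall mean after each respondent is counted according to the group weight."""
    numerator = sum(group_means[g] * sample_counts[g] * weights[g] for g in group_means)
    denominator = sum(sample_counts[g] * weights[g] for g in group_means)
    return numerator / denominator

sample_counts = {"south_asian": 100, "other": 900}      # oversampled design
population_shares = {"south_asian": 0.04, "other": 0.96}
weights = oversample_weights(sample_counts, population_shares)

# Hypothetical mean answers on some survey item, by group.
group_means = {"south_asian": 4.0, "other": 3.0}
print(round(weights["south_asian"], 2), round(weights["other"], 2))
print(round(weighted_mean(group_means, sample_counts, weights), 2))
```

With these weights the 100 oversampled respondents count as 40, matching their 4% share, so the overall estimate is not distorted by the oversample.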
self-selection
A form of sampling bias that occurs when a sample contains only people who volunteer to participate (problems for external validity) - When members of an Internet survey panel are invited via random selection, then self-selection bias can be ruled out
population
A larger group from which a sample is drawn; the group to which a study's conclusions are intended to be applied
Cluster sampling
A probability sampling technique in which clusters of participants within the population of interest are selected at random, followed by data collection from all individuals in each cluster. Ex: If a researcher wanted to randomly sample high school students in the state of Pennsylvania, he could: 1. start with a list of the 952 public high schools (clusters) in that state 2. randomly select 100 of those high schools (clusters) 3. and then include every student from each of those 100 schools in the sample
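The three steps in the Pennsylvania example map directly onto code. In this sketch the 952 schools and the 100 selected clusters come from the example, while the school names and the 20 students per school are placeholders.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every individual inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    individuals = [person for name in chosen for person in clusters[name]]
    return chosen, individuals

# Step 1: a list of 952 schools (hypothetical rosters of 20 students each).
schools = {
    f"school_{i}": [f"school_{i}_student_{j}" for j in range(20)]
    for i in range(952)
}
# Steps 2 and 3: select 100 schools at random and include all their students.
chosen_schools, students = cluster_sample(schools, 100, seed=5)
print(len(chosen_schools), "schools,", len(students), "students")
```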
systematic sampling
A probability sampling technique in which the researcher uses a randomly chosen number N, and counts off every Nth member of a population to achieve a sample. Ex: 1. Researcher starts by selecting 2 random numbers, say 4 and 7. 2. If population of interest was a room of students, researcher would start with the 4th person in the room 3. and then count off, choosing every 7th person until sample is the desired size. Ex: Mehl and colleagues used EAR device to sample conversations every 12.5 minutes. Although number was not chosen at random, the effect is essentially the same as being a random sample of participant's conversations
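The room-of-students example (start with the 4th person, then take every 7th) can be sketched as a list slice. The room of 100 students and the helper names are hypothetical; in practice the start and step would themselves be drawn at random, as the second helper suggests.

```python
import random

def systematic_sample(population, start, step, n):
    """Take the start-th member (counting from 1), then every step-th member after."""
    return population[start - 1::step][:n]

def random_start_and_step(max_value, seed=None):
    """Draw the two random numbers used for the starting point and the interval."""
    rng = random.Random(seed)
    return rng.randint(1, max_value), rng.randint(1, max_value)

room = [f"student_{i}" for i in range(1, 101)]
picked = systematic_sample(room, start=4, step=7, n=5)
print(picked)  # students 4, 11, 18, 25, 32
```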
Negative Worded Questions
A question in a survey or poll that contains negatively phrased statements, making its wording complicated or confusing and potentially weakening its construct validity Ex: Survey on Holocaust Denial -- "Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened?" - 20% denied that the Nazi Holocaust ever happened - Second Survey with reworded question -- "Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?" - only 1% responded that the Holocaust may not have happened, 8% did not know, 91% said they were certain it happened *Survey probably did not measure people's true beliefs* Ex: Sometimes even one negative word can make a question difficult to answer (picture) - after asking the question both ways, researchers can study the items' internal consistency (using Cronbach's alpha) to see whether people respond similarly to both questions (agreement with the first item should correlate with disagreement with the second item)
Open-ended questions
A survey question format that allows respondents to answer any way they like - responses provide researchers with spontaneous, rich information - drawback is that responses must be coded and categorized, a process that is difficult and time-consuming Ex: commenting on experience at hotel
Likert scale
A survey question format using a rating scale containing multiple response options anchored by the specific terms strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree. Ex: Rosenberg Self-Esteem Scale "I am able to do things as well as most other people" Scale: 1 to 5 (1 = strongly disagree, 5 = strongly agree)
Double-barreled Question
A type of question in a survey or poll that is problematic because it asks two questions in one, thereby weakening its construct validity - people might be responding to the first half of the question, the second half, or both (therefore measuring only the first construct, the second construct, or both) Ex: Online survey from the National Rifle Association asked this question: (picture) Careful researchers would have asked each question separately: 1. Do you agree that the Second Amendment guarantees your individual right to own a gun? ---Support ---Oppose ---No Opinion 2. Do you agree that the Second Amendment is just as important as your other Constitutional rights? ---Support ---Oppose ---No Opinion
Graphing associations when one variable is categorical
Although you can make a scatterplot of data where one variable is categorical, you can also use a bar graph. - each person is not represented by one data point; instead, the graph shows the mean marital satisfaction rating (the arithmetic average) for all the people who met their spouses online and the mean marital satisfaction rating for those who met their spouses in person - in a bar graph, you usually examine the diff. b/w the group avgs. to see whether there is an association (the magnitude of the diff. b/w means, i.e., group avgs.) Ex: in the graph, the average satisfaction score is slightly higher in the online group than offline group --> diff. in means indicates an association b/w where people met their spouse and marital satisfaction
Statistical Validity Question 4:
Could outliers be affecting the association? Outliers can have a large impact on the direction or strength of the correlation. - In a bivariate correlation, outliers are an issue when they involve extreme scores on both variables Ex: evaluating the positive correlation b/w height and weight; extremely tall and heavy would make r appear stronger and extremely short and heavy would make r appear weaker - Best way to find outliers is through a scatter plot
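The height and weight example can be checked numerically. This sketch uses five made-up people with a weak positive correlation, then adds one outlier that is extreme on both variables; all values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical heights (cm) and weights (kg) for five people.
heights = [160, 165, 170, 175, 180]
weights = [62, 70, 66, 62, 70]
r_without = pearson_r(heights, weights)

# One extremely tall and heavy person strengthens the correlation.
r_tall_heavy = pearson_r(heights + [230], weights + [110])

# One extremely short and heavy person pulls the correlation the other way.
r_short_heavy = pearson_r(heights + [130], weights + [110])

print(round(r_without, 2), round(r_tall_heavy, 2), round(r_short_heavy, 2))
```

A single case changes r a great deal here because the sample is tiny; the same outlier would matter much less in a large sample.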
Question Order Importance
Example 1: Political Opinion researcher David Wilson and his colleagues asked people whether they supported affirmative action for different groups. Half the participants were asked two forced-choice questions in this order: 1. Do you generally favor or oppose affirmative action programs for women? 2. Do you generally favor or oppose affirmative action for racial minorities? Other half were asked the same questions, but in opposite order - Whites reported more support for affirmative action for minorities when they had first been asked about affirmative action for women - Presumably, most Whites support affirmative action for women more than they do for minorities. - To appear consistent, they might feel obligated to express support for affirmative action for racial minorities if they have just indicated their support for affirmative action for women Example 2: People are up to 5% more likely to vote for the first person they see on a ballot (Miller & Krosnick, 1998). Because of this research, a Florida court ruled in 2019 that the party in power does not have the right to list its own candidates first. (picture)
The idea that larger samples are more externally valid than smaller samples is perhaps one of the most persistent misconceptions in a research methods course. - when a phenomenon is rare, we do need a large random sample to locate enough instances of the phenomenon for valid statistical analysis - For external validity it is how, not how many - For statistical accuracy, most polls use a sample of at most 2,000 - A researcher chooses a sample size for the poll in order to optimize the margin of error of the estimate (a statistic that sets up the confidence interval for a study's estimate)
For external validity, is a bigger sample always a better sample?
Faking bad
Giving answers on a survey (or other self-report measure) that make one look worse than one really is
Socially desirable responding
Giving answers on a survey (or other self-report measures) that make one look better than one really is - bc respondents are embarrassed, shy, or worried about giving people an unpopular opinion, they will not tell the truth on a survey or other self-report measure
Statistical Validity Question 3:
Has it been replicated? Solid evidence that engaging in substantive conversations was moderately associated with life satisfaction
Researchers can assess the construct validity of these coded measures by using multiple observers - this allows them to assess the interrater reliability of the ratings - In one study of emotional tone, they reported the way they coded emotional tone in the Method section of their article - Used ICC (a correlation that quantifies degree of agreement) where the closer to 1.0 it is, the more observers agreed with each other showing interrater reliability - even when operationalizations have good interrater reliability, they may still not be valid
How can researchers assess the construct validity of a coded measure?
Most common way is by including reverse-worded items: - Diener might have changed the wording of some items to mean their opposite: "If I had my life to live over, I'd change almost everything." - reverse-worded items might slow people down so they answer more carefully (Before computing a scale average for each person, the researchers rescore only the reverse-worded items such that, for example, "strongly disagree" becomes a 5 and "strongly agree" becomes a 1.) - More construct validity bc high or low averages would be measuring true happiness or unhappiness instead of acquiescence - Drawback is that sometimes it results in negatively worded items, which are more difficult to answer
How can researchers tell the difference between a respondent who is yea-saying and one who really does agree with all the items?
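The rescoring step can be sketched as follows, assuming a hypothetical four-item scale scored 1 to 5 in which the last two items are reverse-worded. The item positions and the two respondents are invented for illustration.

```python
def reverse_score(score, low=1, high=5):
    """Flip a rating so that 5 becomes 1, 4 becomes 2, and so on."""
    return low + high - score

def scale_average(responses, reverse_items, low=1, high=5):
    """Average the items after rescoring only the reverse-worded ones."""
    rescored = [
        reverse_score(score, low, high) if index in reverse_items else score
        for index, score in enumerate(responses)
    ]
    return sum(rescored) / len(rescored)

REVERSE_ITEMS = {2, 3}  # positions (from 0) of the reverse-worded items

yea_sayer = [5, 5, 5, 5]        # agrees with everything, even opposite statements
truly_satisfied = [5, 5, 1, 1]  # agrees with positive items, disagrees with reversed ones

print(scale_average(yea_sayer, REVERSE_ITEMS))        # 3.0
print(scale_average(truly_satisfied, REVERSE_ITEMS))  # 5.0
```

After rescoring, the yea-sayer lands at the midpoint rather than at the top of the scale, so a high average reflects genuine agreement rather than acquiescence.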
1. Researcher might ensure that the participants know their responses are anonymous - by conducting the survey online, or in the case of an in-person interview, reminding people of their anonymity right before asking sensitive questions. However, anonymous respondents may treat surveys less seriously: - in one study, anonymous respondents were more likely to start using response sets in long surveys - anonymous people were less likely to accurately report a simple behavior, such as how many candies they had just eaten, which suggests they were paying less attention to details 2. Include special survey items that identify socially desirable responders with target items like these: >My table manners at home are as good as when I eat out in a restaurant >I don't find it particularly difficult to get along with loud-mouthed, obnoxious people - If people agree with many such items, researchers may discard that individual's data from the final set, under suspicion that they are exaggerating on the other survey items or not paying close attention 3. Researchers can ask people's friends to rate them - friends rate you on traits that are observable but not desirable 4. Use computerized measures to evaluate people's implicit opinions about sensitive topics - Implicit Association Test asks people to respond quickly to positive and negative words on the right and left of a computer screen, which are intermixed with different social groups, to reveal negative attitudes on an implicit, or unconscious, level
How can we avoid socially desirable responding?
Statistical Validity Question 2:
How precise is the estimate? To communicate the precision of their estimate of r, researchers report a 95% confidence interval (CI) --> CI calculations ensure that 95% of CIs will contain the true population correlation Ex: Correlation between time spent sitting and MTL thickness; r = -.37, and its CI is [-.07, -.64]; we cannot know the true population relationship b/w sitting and MTL thickness, but we do know CIs are designed to capture the true relationship in 95% of studies like this, so we can't rule out that the true r is as weak as -.07 or as strong as -.64 - sample size and precision - CIs that do not contain zero - CIs that do contain zero
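One common way to compute a 95% CI for r is the Fisher z transformation, sketched below. The notes do not say which method or sample size produced the interval in the example, so the sample sizes of 35 and 350 here are assumptions and the output need not match that interval exactly.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation using the Fisher z transformation."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

def excludes_zero(ci):
    """True when the whole interval is on one side of zero."""
    lower, upper = ci
    return lower > 0 or upper < 0

small_study = r_confidence_interval(-0.37, n=35)    # assumed sample size
large_study = r_confidence_interval(-0.37, n=350)   # assumed sample size
print([round(v, 2) for v in small_study], excludes_zero(small_study))
print([round(v, 2) for v in large_study], excludes_zero(large_study))
```

The same r gives a much narrower interval with the larger sample, and an interval that excludes zero corresponds to what is commonly called a statistically significant association.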
Statistical validity question 1:
How strong is the relationship? - All else being equal, larger effect sizes are more important - "Small" Effect Sizes Can Compound Over Many Observations - Benchmarks: Compared to What?
Likert-type scale
A rating scale that does not follow the Likert-scale format exactly.
settings, in this case, to a population of conversations
In Mehl's study using the EAR device to sample conversations every 12.5 minutes: although external validity often involves generalizing to populations of people, researchers may also generalize to:
statistically significant
In NHST, the conclusion assigned when p < .05: that is, when it is unlikely the result came from the null-hypothesis population
restriction of range
In a bivariate correlation, the absence of a full range of possible scores on one of the variables, so the relationship from the sample underestimates the true correlation
1. Convenience sampling 2. Purposive sampling 3. Snowball sampling 4. Quota sampling
In the case where external validity is not vital to a study's goal, researchers might be content with a nonprobability sampling technique. These include:
In a Frequency Claim, External Validity is a priority
In such claims, external validity, which relies on probability sampling techniques, is crucial - "8 out of 10 drivers say they experience road rage". If study used sampling techniques that contained mostly urban residents, the road rage estimate might be too high because urban driving may be more stressful External validity of surveys based on random samples can actually be confirmed in some cases: - In political races, the accuracy of pre-election opinion polling can be compared with the final voting results. In most cases however, researchers are not able to check the accuracy of their samples' estimates because they hardly ever complete a full census of a population on the variable of interest: - we could never evaluate the well-being of all the people in Afghanistan to find out the true percentage of those who are struggling or suffering. Similarly, a researcher can't locate all the owners of a particular style of shoe to ask them whether their shoes "fit true to size." Because you cannot directly check accuracy when interrogating a frequency claim, the best you can do is examine the method the researcher used (probability sampling technique makes for confident external validity)
1. strata are meaningful categories (such as ethnic or religious groups), whereas clusters are more arbitrary (any random set of high schools would do) 2. the final sample sizes of the strata reflect their proportion in the population, whereas clusters are not selected with such proportions in mind
In what two ways does stratified sampling differ from cluster sampling?
Most secretive methods, such as one-way mirrors and covert video recording, require permission from participants - If hidden video recording is used, the researcher must explain the procedure at the conclusion of the study - If people object to having been recorded, the researcher must erase the file without watching it
Is it ethical to observe the behaviors of others?
Statistical Validity Question 5:
Is there restriction of range? A restriction of range can make the correlation seem smaller than it really is. Primarily asked about when correlation appears weaker than expected. Ex: restricting the range of SAT scores to only the ones above 1,200 to be shown on the graph, when the true range of SAT scores is 400 to 1,600 - To solve this issue, you can use the correction for restriction of range; it estimates the full set of scores based on what we know about an existing, restricted set, and then recomputes the correlation
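The SAT example can be simulated to show both the problem and the correction. This sketch assumes a true correlation of .60, normally distributed simulated "SAT" scores (not clipped to 400 to 1,600), and a standardized outcome labeled gpa; all of these are illustrative assumptions. The correction shown is the commonly taught formula often attributed to Thorndike (Case 2), which rescales r by the ratio of unrestricted to restricted standard deviations.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def sd(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correct_for_restriction(r_restricted, sd_ratio):
    """sd_ratio = SD of the full range divided by SD of the restricted range."""
    return (r_restricted * sd_ratio) / math.sqrt(1 + r_restricted ** 2 * (sd_ratio ** 2 - 1))

rng = random.Random(7)
TRUE_R = 0.6  # assumed population correlation
sat, gpa = [], []
for _ in range(5000):
    z = rng.gauss(0, 1)
    sat.append(1000 + 200 * z)
    gpa.append(TRUE_R * z + math.sqrt(1 - TRUE_R ** 2) * rng.gauss(0, 1))

r_full = pearson_r(sat, gpa)

# Keep only scores above 1,200, as in the restricted graph.
kept = [(s, g) for s, g in zip(sat, gpa) if s > 1200]
sat_kept = [s for s, _ in kept]
gpa_kept = [g for _, g in kept]
r_restricted = pearson_r(sat_kept, gpa_kept)
r_corrected = correct_for_restriction(r_restricted, sd(sat) / sd(sat_kept))

print(round(r_full, 2), round(r_restricted, 2), round(r_corrected, 2))
```

The restricted sample should show a clearly weaker r than the full sample, and the corrected value should move back toward the full-range estimate.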
Sample Size and Precision (Statistical Validity Question 2)
Large samples give estimates with much narrower, more precise confidence intervals Ex: Both endpoints depict weak relationships, but the precise estimate gives us a strong guide to what to expect for a future study. Assuming all else is equal, we can predict the future estimates will mostly be in this narrow range
Self-reporting memories of events
People's accounts of adverse events--and many other events--should be trusted - In some cases, however, people's certainty about their memories might not match their accuracy - Cognitive psychologists have checked the accuracy of "flashbulb memories": researchers give a short questionnaire to people a day after a dramatic event, asking them to recall where they were, with whom, and so forth; a few weeks or years later, the same people answer the same questions; people's flashbulb memories remain vivid over time even as they decline in accuracy - People's feelings of confidence (and vividness) in their memories do not, by themselves, inform us about their accuracy
Fence sitting
Playing it safe by answering in the middle of the scale for every question in a survey or interview - may happen when survey items are controversial - people also may do this (or say "I don't know") when question is confusing or unclear - weakens construct validity when middle of the road scores suggest that some responders don't have an opinion, though they actually do
Unobtrusive observations (Blend In)
Solution 1 to Reactivity: An observation in a study made indirectly, through physical traces of behavior, or made by someone who is hidden or is posing as a bystander Ex: One-way mirror or researcher acting like a casual onlooker allows researchers to record behaviors of children without being a known observer
Similarities: Quota sampling similar to stratified sampling because both specify subcategories and attempt to fill targeted percentages or numbers for each subcategory Differences: In Quota sampling, the participants are selected nonrandomly (maybe through convenience or purposive sampling), and in stratified random sampling they are selected using a random selection technique
Quota Sampling vs. Stratified Random Sampling
Random sampling: researchers create a sample using some random method, such as drawing names from a hat or using a random-digit phone dialer, so that each member of the population has an equal chance of being in the sample. - Random sampling enhances external validity Random assignment: is used only in experimental designs. When researchers want to place participants into two different groups (such as a treatment group and a comparison group), they usually assign them at random. - Random assignment enhances internal validity by helping ensure that the comparison group and the treatment group have the same kinds of people in them, thereby controlling for alternative explanations
Random sampling vs. Random assignment
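The two procedures happen at different stages, which a short sketch can make concrete: sampling decides who gets into the study, assignment decides which group each participant ends up in. The population of 5,000, the sample of 60, and the group names are hypothetical.

```python
import random

rng = random.Random(42)  # seed only makes the illustration repeatable

# Random sampling: who is in the study (supports external validity).
population = [f"person_{i}" for i in range(5000)]
sample = rng.sample(population, 60)

# Random assignment: which group each participant joins (supports internal validity).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:30]
comparison_group = shuffled[30:]

print(len(sample), len(treatment_group), len(comparison_group))
```

A study can use either step without the other: an experiment on a convenience sample can still randomly assign, and a poll can randomly sample without assigning anyone to groups.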
Codebooks
Rating instructions so the observers can make reliable judgements with minimal bias - they are precise statements of how the variables are operationalized, and the more precise and clear the statements are, the more valid the operationalizations will be - researchers can assess the construct validity of these coded measures by using multiple observers (allows them to assess the interrater reliability of the ratings)
Wait it out
Solution 2 to Reactivity: Wait out the situation. Ex: researcher observing children at school may let the children get used to their presence until they forget they are being watched
Measure the behavior's results
Solution 3 to Reactivity: Use unobtrusive data. Instead of observing behavior directly, researchers measure the traces a particular behavior leaves behind. Using these indirect methods, researchers can measure behavior without doing any direct observation Ex: In a museum, wear-and-tear on the flooring can signal which areas of the museum are the most popular, and height of smudges on the windows can indicate age of visitors
Rating Consumer Products
Studies suggest that people may not always be able to accurately report the quality of products they buy - Amazon 5-star ratings were correlated with the cost of the product and the prestige of its brand rather than with its quality - Online ratings are examples of frequency claims
1. Observer bias 2. Observer effects 3. Reactivity
The construct validity of observations can be threatened by three problems:
Construct Validity of Surveys and Polls
The format of a question (open-ended, forced-choice, or Likert scale) does not make or break its construct validity. The way the questions are worded and the order in which they appear are much more important.
sample
The group of people, animals, or cases used in a study; a subset of the population of interest.
observational research
The process of watching people or animals and systematically recording how they behave or what they are doing - can be the basis for frequency claims - can also be used to operationalize variables in association claims and causal claims
CIs that do contain zero (Statistical Validity Question 2)
The true association could be negative or positive, and we cannot rule out zero when the 95% CI contains zero. This type of association is commonly called not statistically significant.
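To see how a 95% CI for a correlation might be checked for zero, here is a rough sketch. It assumes the Fisher z approximation, which is one common way to build a CI for r and is not necessarily the method the textbook uses. The function names and the example values of r and n are hypothetical.

```python
import math

def correlation_ci_95(r, n):
    """Approximate 95% CI for a correlation r from a sample of size n.

    Assumption: uses the Fisher z transformation, with standard error
    1 / sqrt(n - 3) and a critical value of 1.96.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)              # transform r to the z scale
    se = 1 / math.sqrt(n - 3)      # standard error on the z scale
    lower = math.tanh(z - 1.96 * se)
    upper = math.tanh(z + 1.96 * se)
    return lower, upper

def contains_zero(ci):
    """True if the interval includes zero (commonly called not statistically significant)."""
    lower, upper = ci
    return lower <= 0 <= upper

# Hypothetical example: the same r with a small and a large sample.
small_sample_ci = correlation_ci_95(0.20, 35)
large_sample_ci = correlation_ci_95(0.20, 1000)
```

With these made-up inputs, the small-sample interval includes zero and the large-sample interval does not, which shows that the same r can be significant or not depending on sample size.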
Prepare different versions of a survey, with the questions in different sequences. - If the results for the first order differ from the results for the second order, researchers can report each set of results separately.
What is the most direct way to control the effect of question order?
occurring without any order or pattern
What precise meaning does random have in research?
External validity of a study
This validity of a study concerns whether the sample used in the study is adequate to represent the unstudied population - when this validity is good, we can say the sample is representative of a population of interest - if the sample is biased in some way, then this validity is unknown
"Small" Effect Sizes Can Compound Over Many Observations (Statistical Validity Question 1)
Tiny effect sizes can become important when aggregated over many situations, and they can also become important when aggregated over many people Ex: - The correlation between the personality trait of agreeableness and the success of a single social interaction may be only .07, but when this effect is aggregated over the course of a couple hundred social interactions, an agreeable person will actually be more popular than a less agreeable person - Teens randomly assigned to a growth mindset group had better grades than those who were not, with an effect size of r = .05, which was enough to prevent 79,000 teens from scoring in the D or F range (when the outcome is not as extreme as success or failure, a very small effect size can be negligible)
construct validity & statistical validity
What are the two most important validities to interrogate in an association claim?
What is the variable of interest, and did the observation accurately measure that variable?
What do you ask when interrogating the construct validity of any observational measure?
Survey & poll
a method of posing questions to people on the phone, in personal interviews, on written questionnaires, or online
multistage sampling
a probability sampling technique involving at least two stages: a random sample of clusters followed by a random sample of people within the selected clusters Ex: In the high school example, the researcher would 1. start with a list of high schools (clusters) in the state and select a random 100 of those schools. 2. Then, instead of including all students at each school, the researcher would select a random sample of students from each of the selected schools
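The two stages can be sketched in Python as follows. The school names, the student labels, the numbers of schools and students, and the function name are all invented for illustration. Only the two-stage logic comes from the definition above.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick people within each."""
    rng = random.Random(seed)
    # Stage 1: a random sample of clusters (e.g., schools).
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # Stage 2: a random sample of people inside the selected cluster.
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical state with 300 high schools of 50 students each.
schools = {
    f"school_{s}": [f"school_{s}_student_{i}" for i in range(50)]
    for s in range(300)
}

# Select 100 schools at random, then 10 students at random from each.
selected = multistage_sample(schools, n_clusters=100, n_per_cluster=10, seed=42)
```

If stage 2 instead kept every student in each selected school, the procedure would be cluster sampling rather than multistage sampling.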
biased sample (unrepresentative sample)
a sample in which some members of the population of interest are systematically left out, and therefore the results cannot generalize to the population of interest
Outlier
a score that stands out as either much higher or much lower than most of the other scores in a sample
census
a set of observations that contains all members of the population of interest
Response Sets (or nondifferentiation)
a shortcut respondents may use to answer items in a long survey, rather than responding to the content of each item - people might adopt a consistent way of answering all the questions (especially toward the end of a long questionnaire) - rather than thinking carefully about each question, people might answer all of them positively, negatively, or neutrally - weaken construct validity because these survey respondents are not saying what they really think
Masked design (blind design)
a study design in which the observers are unaware of the experimental conditions to which participants have been assigned
Forced-choice questions
a survey question format in which respondents give their opinion by picking the best of two or more options - often used in political polls: Ex: If the Ohio congressional election were held today, would you vote for the Republican Steve Chabot? Or the Democrat Aftab Pureval? - Can also measure personality: Ex: Narcissistic Personality Inventory asks people to choose one statement from each of 40 pairs of items; researcher adds up # of times people choose the "narcissistic" response over the "non-narcissistic" one (narcissistic response is 1st one in picture) - simple yes/no questions can also be considered this format
Semantic Differential Format
a survey question format using a response scale whose numbers are anchored with contrasting adjectives Ex: On the Internet site RateMyProfessors.com, students assign ratings to a professor using the following adjective phrases (in picture) Ex: Five-star rating format that Internet rating sites (like Yelp) use: 1 star = poor, 5 stars = the best
Leading Questions
a type of question in a survey or poll that is problematic because its wording encourages one response more than others, thereby weakening its construct validity Question wording matters - different versions of the question led to different results - Second and Third questions in picture
Snowball sampling
a variation on purposive sampling, a biased sampling technique in which participants are asked to recommend acquaintances for the study Ex: Study on coping behaviors in people with a disease; researcher may start with 1 or 2 people with disease, then ask them to recruit people from their support groups who then recruit 1 or 2 acquaintances until sample is large enough - it is unrepresentative because people are recruited via social networks, which are not random
mean
an arithmetic average; a measure of central tendency computed from the sum of all the scores in a set of data, divided by the total number of scores
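As a quick worked example of the definition, the mean is the sum of the scores divided by the number of scores. The scores below are made up for illustration.

```python
def mean(scores):
    """Arithmetic average: sum of all scores divided by the number of scores."""
    if not scores:
        raise ValueError("cannot compute the mean of an empty set of scores")
    return sum(scores) / len(scores)

# Hypothetical scores: (3 + 5 + 7 + 9) / 4 = 24 / 4 = 6.0
scores = [3, 5, 7, 9]
average = mean(scores)
```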
bivariate correlation
an association that involves exactly two variables - To investigate associations, researchers need to measure the first variable and the second variable in the same group of people; then use graphs and simple statistics to describe the type of relationship the variables have with each other (positive, negative, zero) Ex: Correlational study investigating the association between sitting during the workweek and the thickness of certain brain regions. Researchers tested a sample of 35 adults, asking each person how many hours they typically spend sitting on weekdays and then using an MRI to measure the thickness of their medial-temporal lobes (focus on the MTL, which is smaller in people who have Alzheimer's or memory decline). - Notice that even though the study measured more than two variables, an analysis of this type of correlation looks only at two variables at a time.
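A bivariate correlation is usually summarized with Pearson's r, which can be computed by hand as in the sketch below. The sitting hours and thickness values are invented to show a negative association. They are not data from the study described above, and the function and variable names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two variables measured in the same people."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)

# Made-up values for five people: hours sitting per weekday and MTL thickness (mm).
hours_sitting = [4, 6, 8, 10, 12]
mtl_thickness = [2.9, 2.8, 2.6, 2.5, 2.3]

r = pearson_r(hours_sitting, mtl_thickness)  # negative: more sitting, thinner MTL
```

The sign of r gives the direction of the relationship (positive, negative, or near zero), and its absolute value gives the strength.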
acquiescence (or yea-saying)
answering "yes" or "strongly agree" to every item in a survey or interview Ex: might answer "5" to every item on Diener's scale of subjective well-being not because the respondent is happy, but because that person is using this shortcut - people apparently have a bias to agree with (say "yes" to) any item, no matter what it states - threatens construct validity because instead of measuring the construct of true feelings of well-being, the survey could be measuring the tendency to agree or the lack of motivation to think carefully
Benchmarks: Compared to What? (Statistical Validity Question 1)
average effect size in psych studies is around r = .20 and may only rarely be as high as r = .40