Psy 1901
HARKing?
"Hypothesizing after the results are known": expanding or altering the set of hypotheses after seeing the data • A Type I error can turn into a hard-to-eradicate theory • Hides what didn't work (even though you had good reason to think it would) • Slippery slope into fudging other aspects of the research • Produces narrow new theories and suppresses alternatives
What are the disadvantages of a Semantic Differential Scale?
- The scale is not necessarily valid across all objects and respondents (e.g., rating athletes versus rating odors)
Internal Consistency
- Split-half reliability: • the correlation between one half of the items composing a scale with the other half of the items in that scale - Cronbach's coefficient alpha: • derived from the correlation of each item in the scale with each other item in the scale • equal to the mean of all possible split-half reliability estimates
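Both internal-consistency estimates can be computed directly from item scores. The following is a minimal Python sketch, assuming a complete item-by-respondent matrix with no missing data. The item scores are made up for illustration. The split-half version uses an odd/even split of items stepped up with the Spearman-Brown formula, which is one common choice rather than the only one.

```python
from statistics import mean, pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    item_variance_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(items):
    """Correlate odd-item and even-item half totals, then apply Spearman-Brown."""
    half1 = [sum(s) for s in zip(*items[0::2])]
    half2 = [sum(s) for s in zip(*items[1::2])]
    r = pearson_r(half1, half2)
    return 2 * r / (1 + r)  # estimated reliability of the full-length scale

# Hypothetical 4-item scale answered by 5 respondents (1-5 ratings)
items = [
    [4, 5, 3, 2, 5],
    [4, 4, 3, 1, 5],
    [5, 5, 2, 2, 4],
    [3, 5, 3, 1, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.95
print(round(split_half(items), 2))      # 0.93
```

A single split gives a slightly different number than alpha. This is why alpha, which pools over many possible splits, is usually the preferred summary.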
Confounding Variable and Relationship to Error/Threats
- an extraneous variable that varies systematically with condition Error/Threat: -Increase systematic error -Threat to internal validity ( e.g., SES in texting study)
Hypotheses are not testable if...
- the concepts to which they refer are not adequately defined - they are circular: the event itself is used as the explanation - they appeal to ideas or forces that are not recognized by science
Inter-rater reliability
- the correlation between scores on the same measure administered by two different raters • If two different judges/observers apply the measure to the construct, they should get equivalent results • Example: Grading
Divergent (Discriminant) Validity
- the lack of correlation between measures of different constructs - these measures are intended to tap different constructs (e.g., a measure of intelligence should not correlate strongly with a measure of shyness/anxiety)
Features of high quality debrief?
-"Do you have any questions?" -Debrief should be short (script should be 1-2 paragraphs) -Depending on the study, you can hand out a debriefing sheet or the experimenter can debrief orally (oral debrief is the preferred method) -Write the debrief script in a very straightforward way that can be easily understood -Avoid using words that might make them feel bad or diminish the importance of their involvement (Ex. Don't use the words "control group", instead say "some people were asked to...") -Clearly explain why you are interested in the phenomenon, and communicate the excitement -Clearly explain any deception used and why it was necessary -Enlist participant cooperation ("don't tell other participants about study") -End with "we are very grateful for your participation..."
Confidence Interval
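-95% confident our interval contains μ (the population mean): if we repeated the sampling procedure many times, about 95% of the intervals constructed this way would contain μ -If we replicate the study, the replication's mean will fall within the original interval only about 83% of the time (on average)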
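Both numbers can be checked by simulation. The following is a minimal sketch, assuming a normal population with made-up parameters (μ = 100, σ = 15, n = 50). It uses the z approximation (1.96) rather than an exact t critical value, so coverage comes out a hair under 95%.

```python
import random
from statistics import mean, stdev

def ci95(sample):
    """95% CI for the mean using the z approximation (1.96); fine for moderately large n."""
    m = mean(sample)
    margin = 1.96 * stdev(sample) / len(sample) ** 0.5  # margin of error
    return m - margin, m + margin

def simulate(mu=100.0, sigma=15.0, n=50, reps=4000, seed=1):
    rng = random.Random(seed)
    covers_mu = captures_replication = 0
    for _ in range(reps):
        lo, hi = ci95([rng.gauss(mu, sigma) for _ in range(n)])
        covers_mu += lo <= mu <= hi
        # An exact replication: a new sample drawn from the same population
        replication_mean = mean([rng.gauss(mu, sigma) for _ in range(n)])
        captures_replication += lo <= replication_mean <= hi
    return covers_mu / reps, captures_replication / reps

coverage, capture = simulate()
print(coverage, capture)  # roughly 0.95 and 0.83
```

Capture is lower than coverage because both the original mean and the replication mean carry sampling error.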
What is factorial design and why do we use it?
-At least 2 IVs with at least 2 levels each -The design includes every possible combo of IVs -Could be within- or between-participants Why? 1) Interested in 2 or more IVs and don't want to do separate experiments 2) To include an extraneous variable as an IV in itself- could be a variable that you'd control for- e.g., order of conditions in within subjects study 3) To look at interactions between IVs - Follow up with post-hoc tests
Construct validity? How is it different from internal/external validity?
-Construct validity is "the degree to which a test measures what it purports to be measuring," so it describes the validity of a measure. -Internal and external validity describe a study: internal validity is whether the experiment supports a causal conclusion, and external validity is whether the results generalize beyond the study (to other people, settings, and measures).
What are the disadvantages of a Likert Scale?
-Do not yield info about respondent's latitude of acceptance -Do not carry information about the exact pattern of responses (Not clear that this is a major disadvantage)
Face validity
-Does the measure seem, on its face, to measure what we claim it does? • e.g., a set of math problems is high in face validity for math ability -High face validity is not always good • e.g., explicit measures of racist attitudes invite socially desirable answers, so researchers now use implicit measures
How do we measure reliability?
-Over time (test-retest reliability) -Across people (inter-rater reliability) -Within a scale/measurement (internal consistency)
A priori vs. Post hoc
1) A priori- Should be done before an experiment (if probable effect size is known): • How many people do I need to test to have 80% power? 2) Post-hoc- Can be done after experiment: • Given my data, what power did I achieve?
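Both kinds of power analysis can be approximated with the normal distribution. The following is a minimal sketch for a two-group comparison with effect size d, assuming equal group sizes and a two-tailed test. The normal approximation slightly underestimates the required n. An exact t-based calculation gives 64 per group for d = 0.5, versus 63 here. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal

def n_per_group(d, alpha=0.05, power=0.80):
    """A priori: participants needed per group to reach the target power."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

def achieved_power(d, n, alpha=0.05):
    """Post hoc: approximate power given d and n per group (ignores the negligible opposite tail)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return 1 - Z.cdf(z_alpha - d * (n / 2) ** 0.5)

print(n_per_group(0.5))                    # 63 (medium effect)
print(n_per_group(0.8))                    # 25 (large effect)
print(round(achieved_power(0.5, 63), 2))   # 0.8
```

Halving the expected effect size roughly quadruples the required sample, which is why the a priori estimate of d matters so much.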
Steps to setting the experimental stage?
1) A protocol and script are needed to standardize your experimental session 2) Consent form; debriefing form 3) Design/assemble study measures 4) Prepare randomization procedure 5) Practice and pilot test your study on each other/friends
Best practices for ordering questions
1) Begin with short introduction 2) Organize thematically (provide clear transitions; Introduce each section) 3) First items: Should be easy to answer (NOT make participants reactive/defensive); Be mindful of contaminating questions 4) Last items: sensitive questions; potentially contaminating questions; manipulation check; end with demographics 5) Order effects: if in doubt, counterbalance; general questions are more likely to be influenced by specific questions than vice versa; surveys should follow the funnel principle 6) Manipulation + Attention Checks: See if your manipulation actually put people in the psychological state you intended; See if respondents are actually paying attention; Careful how you throw out data (can undermine random assignment)
What are the two main classes of variables?
1) Categorical: no meaningful order - nominal (or binary, if only 2 categories) - qualitatively different categories, with no rank order; values are usually names (sometimes #s) 2) Continuous: meaningful order
What are two ways we describe continuous variables?
1) Centrality (mean, median) 2) Variability (spread) - range -variance (standard deviation, standard error, confidence interval)
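The following is a minimal Python sketch that computes the centrality and variability summaries listed above for a made-up set of scores. It uses sample statistics, with an n - 1 denominator for the variance.

```python
from statistics import mean, median, stdev, variance

def describe(scores):
    """Centrality and variability for a continuous variable."""
    sd = stdev(scores)
    return {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),
        "variance": variance(scores),  # sample variance (n - 1)
        "sd": sd,
        "se": sd / len(scores) ** 0.5,  # standard error of the mean
    }

print(describe([2, 4, 4, 5, 7, 8]))  # hypothetical scores
```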
t(critical) depends on...
1) Choosing an α level (usually .05) - the smaller the α, the larger the t(c), the more conservative the test 2) Knowing the df (n - 1 for a one-sample or paired t-test; n1 + n2 - 2 for an independent-samples t-test) - the larger the df, the smaller the t(c), the more powerful the test
What are some experimenter-related biases?
1) Confirmation bias: overweigh results that support, under-weigh those that don't 2) Experimenter expectancy: can influence how experimenter interacts with the participant *Double-blind, Partial blindness designs, Blind to incoming data, Eliminate experimenter
What has the field done to mitigate the negative effects of p-hacking?
1) Create new requirements for authors 2) New recommendations to reviewers 3) Encourage replication (website to register and post results for replicated studies)
3 Types of Deception
1) Deception by omission • not told a lie, but not told the whole truth either • many studies do this to reduce DEMAND CHARACTERISTICS 2) Active deception • told a lie (e.g., Milgram's obedience experiments) • Most problematic: false feedback 3) Double deception (aka second-order deception) • told the experiment is over, but data collection continues • not entirely uncommon
Common ethical issues today and how to deal with them?
1) Deception: obtain consent to be deceived, obtain a waiver of full disclosure beforehand, or, after debriefing, give the opportunity to withdraw 2) Debrief: Tell participants the full details, especially in case of deception and false feedback (explain the belief perseverance effect so that impressions formed from false feedback do not persist) 3) Confidentiality & Anonymity: -Confidentiality: researcher can figure out who contributed which data, but promises never to share the identity of the participants (except as required by law) -Anonymity: no identifying info is recorded; impossible for researcher to figure out who contributed which data 4) Ethics of not doing research -Should be considered in cost-benefit analysis (Rosenthal & Rosnow, 1984) - e.g. long-term social costs of not discovering an intervention for acute social problem (poverty, prejudice) - applicable to biomedical research (stem cells, animal research) *IRB created to protect the rights and welfare of human subjects recruited to participate in research (required for all federally funded research)
Types of construct validity?
1) Face validity 2) Convergent validity 3) Discriminant (Divergent) validity 4) Criterion-related validity: a) Concurrent validity b) Predictive validity
Best practices for formulating questions
1) For each question, item you create: - "What can I learn from the answer to this question?" - "What can't I learn from the answer to this question?" - Consider specific hypotheses 2) KISS Principle ("Keep it simple and short") - Use simple informal language -Use previously validated measures if they are available -Avoid negation 3) Make it easy for respondents, decompose questions- do not make participants do math if you can do it later 4) Be specific: Participants should be clear on what you are interested in learning; Avoid subjective terms for behaviors (may be interpreted differently, give specific ranges) 5) Avoid double-barreled questions: ask only one question at a time to avoid forced choice, don't make unwarranted assumptions 6) Filter questions: Ensure questions are relevant to the respondent (skip logic on web survey) 7) Avoid biased questions: Express all alternatives in a balanced way; Attempt to use neutral terms and avoid "loaded" terms 8) Reverse coded items: Increases diversity of questions; Checks and reduces response bias
Different strategies for minimizing confounds within and between subjects?
1) Hold it constant across conditions 2) Let it vary randomly across conditions 3) Counterbalance it across conditions *If you can't do any of the above, at least measure the CV and statistically account for its effect
A good IV is...
1) Impactful 2) Well-controlled: manipulates the variable of interest and nothing else 3) Has homogeneity of impact: each level of the IV should affect participants in the same way, with the same magnitude 4) Includes some type of manipulation check 5) Minimizes demand characteristics
What are some participant-related threats?
1) Individual differences (threat to internal validity) • Participants may respond to IVs differently • Related to random error *Random assignment 2) Selection bias and nonresponse bias (threat to external validity) • What kind of person volunteers, skips parts, or drops out? 3) History (threat to internal validity) • outside events (e.g., Hurricane Katrina) *Control group 4) Maturation (threat to internal validity) • Participants change over the course of the experiment *Control group 5) Regression to the mean (threat to internal validity) • If a variable is extreme on its first measurement, it will tend to be closer to the average on the second measurement *Random assignment and a proper control group 6) Testing effects (threat to internal validity) • e.g., practice on IQ tests *Random assignment; don't include a pre-test; or use a different test at each testing session and counterbalance 7) Experimental mortality/attrition (both types of validity) • inferences are based only on participants who finished • particularly concerning with longitudinal data 8) Participant reaction bias (both types of validity) • participants know they are being observed; stems from demand characteristics
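Regression to the mean can be seen in a simulation with no treatment at all. The following is a minimal sketch, assuming observed scores are a stable true score plus independent normal measurement error. The distribution parameters are made up. The top 10% on the first test are selected and then simply retested.

```python
import random
from statistics import mean

def regression_to_mean(n=10000, seed=2):
    """Select the top 10% on a noisy first test, then retest them with no intervention."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [t + rng.gauss(0, 10) for t in true_scores]  # true score + measurement error
    test2 = [t + rng.gauss(0, 10) for t in true_scores]  # same people, fresh error
    cutoff = sorted(test1)[int(0.9 * n)]
    extreme = [i for i in range(n) if test1[i] >= cutoff]
    return mean([test1[i] for i in extreme]), mean([test2[i] for i in extreme])

first, second = regression_to_mean()
print(round(first, 1), round(second, 1))  # the second mean is noticeably closer to 100
```

Without a control group, the drop from the first to the second mean could be mistaken for a treatment effect.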
How to reduce social desirability bias?
1) Keep it anonymous 2) Start with easy questions/impersonal
What are some different types of self-report scales?
1) Likert Scale 2) Semantic Differential Scale 3) Guttman (cumulative) 4) Bogardus social distance (cumulative)
Elements involved in "beneficence"
1) Minimize risk (do no harm): physical, psychological, social, legal, or economic • federal regulations define "minimal risk" as: "Where probability and magnitude of harm or discomfort anticipated in the proposed research are not greater, in and of themselves, than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests." • Deception 2) Maximize benefit • contribution to society • benefits for participant (participant payment doesn't count)
What are three ways we describe categorical variables?
1) Mode 2) Frequency 3) Contingency Table (ex: for two binary variables)
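The following is a minimal Python sketch of the three categorical summaries: mode, frequencies, and a contingency table for two variables measured on the same participants. The condition and error data are hypothetical.

```python
from collections import Counter

def describe_categorical(values):
    """Return the mode and the frequency of each category."""
    freq = Counter(values)
    mode_value = freq.most_common(1)[0][0]
    return mode_value, dict(freq)

def contingency_table(a, b):
    """Cross-tabulate two categorical variables (rows from a, columns from b)."""
    counts = Counter(zip(a, b))
    return {row: {col: counts[(row, col)] for col in sorted(set(b))} for row in sorted(set(a))}

# Hypothetical data: condition and whether the participant made an error
condition = ["texting"] * 4 + ["control"] * 3
error = ["yes", "yes", "no", "yes", "no", "no", "yes"]

print(describe_categorical(condition))  # ('texting', {'texting': 4, 'control': 3})
print(contingency_table(condition, error))
# {'control': {'no': 2, 'yes': 1}, 'texting': {'no': 1, 'yes': 3}}
```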
P-values are not...
1) Not the probability of falsely rejecting the null 2) Not the probability that replicating the experiment would yield the same conclusion 3) DEFINITELY not a measure of effect size
What are the different types of experimental designs?
1) One factor within-participants design 2) One factor between-participants design 3) Pretest-Posttest Two Group design 4) Factorial Design ***REVIEW BRANCHES***
Elements involved in "respect for persons"
1) Protect autonomy via INFORMED CONSENT 2) Thorough, honest information 3) Clear comprehension (elderly, illiterate, child) 4) Voluntary (no coercion or undue influence)
Elements involved in "justice"
1) Representative sample • selection must be equitable, not based on ease of access to a group 2) Fair to all experimental groups • appropriate distribution of risks/burdens • appropriate distribution of benefits -e.g., control group in medical experiments • tell participants of necessity of control group • tell them of exact chances of being in control group • assure them that if intervention successful, they'll get it
3 key principles of the Belmont Report
1) Respect for persons • treat individuals as autonomous agents • provide extra protections to those with diminished autonomy (children, elderly) 2) Beneficence (& non-maleficence) • maximize possible benefits, minimize risks 3) Justice (& fidelity) • fair selection procedures and outcomes
What are best practices for designing your study?
1) Specify variables, procedures, cover story 2) Optimize design 3) Measuring confounds, mediators, moderators 4) Manipulation check
Types of reliability?
1) Test-retest: how highly correlated are the measures taken from the same person at two points in time 2) Inter-rater reliability: how correlated are the ratings of two or more raters 3) Internal consistency: how highly correlated are different items with one another
How to reduce floor or ceiling effects?
1) Use common sense 2) Use validated tests 3) Run a pilot
Best practices for formulating response options
1) Vagueness vs. Over-precision: Where possible, ask for numerical ranges or responses instead of verbal estimates; Avoid asking for unobtainable precision 2) Facilitate responding: May clarify the level of detail the researcher wants; May help respondents' memory (list vs. open-ended) 3) High/Low frequency alternatives: middle of scale seen as "average"; can influence perception of the "norm" 4) Matrix Form: Useful when you have a scale or multiple questions with the same response categories 5) Don't know/don't care: Where relevant, provide an option to express indifference, lack of knowledge, or lack of opinion ("floaters," who would say "I don't know" if offered that option but otherwise pick a substantive answer, can substantially affect results); Express all response alternatives in a balanced way
What are the advantages of multiple item measures?
1) improves reliability and validity 2) reduces complexity of analysis 3) permits test of dimensionality of construct
What are some examples of random error? How can we reduce these errors?
Examples: 1) rater fatigue 2) inattention 3) misunderstanding. To reduce: 1) Train raters well; keep instructions clear 2) Increase motivation (make it less boring) 3) Pseudo-randomize items
For a t-test, to reject the null, we want...
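1) t(critical) to be small • choose a bigger α? • increase sample size? 2) t(observed) to be big • maximize the mean difference between conditions (an impactful IV) • minimize variability (standardize the procedure, recruit a homogeneous sample, use a within-participants design, use reliable multiple-item measures)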
How to reduce carry-over effects?
1. COUNTERBALANCE ORDER 2. Consider a between subjects design
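One common way to counterbalance order without running every possible permutation is a Latin square. The following is a minimal cyclic version in Python. It ensures each condition appears once in every serial position. It does not balance which condition immediately precedes which; a "balanced" Latin square is needed for that. The function names are hypothetical. In practice, participants should also be randomly assigned to rows.

```python
def latin_square(conditions):
    """Cyclic Latin square: row r is the condition list rotated by r positions."""
    k = len(conditions)
    return [[conditions[(row + pos) % k] for pos in range(k)] for row in range(k)]

def assign_orders(participants, conditions):
    """Cycle participants through the rows so each order is used equally often."""
    square = latin_square(conditions)
    return {p: square[i % len(square)] for i, p in enumerate(participants)}

for order in latin_square(["A", "B", "C"]):
    print(order)
# ['A', 'B', 'C']
# ['B', 'C', 'A']
# ['C', 'A', 'B']
```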
Types of methodological bias
1. Demand Characteristics: Subtle aspects of study lead participants to guess hypothesis 2. Social desirability bias: when subjects feel pressure to respond in a socially desirable way 3. Carry-over effects: condition order affects results 4. Floor or ceiling effects: when the task is too hard or too easy
What are the characteristics of a good theory?
1. Explains data (observations) 2. Predicts new data 3. Is testable 4. Is parsimonious and specific *somewhat surprising
What are the 3 cardinal sins?
1. Fabrication 2. Falsification 3. Plagiarism
How do we identify/measure confounds, mediators, and moderators?
1. Read literature: have alternative explanations been raised in related studies? (include measures for these variables) 2. Think critically: what variables do you want to rule out as alternative explanations for your findings? (measure all of them) - Ex. Good Samaritan Study: rule out that hurry manipulation was really a stress manipulation; include questions to assess self-reported stress levels
What are key steps to "specifying variables, procedures, and cover story?"
1. Write out operational definitions of variables (IVs, DVs, control) 2. Clearly specify your experimental design (conditions, assignment, between/within, factorial, etc.) 3. Write out your study procedures and cover story
2x2x2- how many factors? how many levels? conditions?
3 factors; 2 levels in each; 8 conditions
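The number of conditions in a full factorial design is the product of the number of levels of each factor. The following Python sketch enumerates every cell of a 2x2x2 design. The factor names and levels are made up for illustration.

```python
from itertools import product
from math import prod

# Hypothetical 2x2x2 design
factors = {
    "feedback": ["positive", "negative"],
    "audience": ["alone", "observed"],
    "time_pressure": ["low", "high"],
}

conditions = list(product(*factors.values()))  # every combination of levels
print(len(factors), "factors;", len(conditions), "conditions")
for condition in conditions:
    print(dict(zip(factors, condition)))

def n_conditions(levels_per_factor):
    """Number of cells in a full factorial design, e.g. [2, 2, 2] -> 8."""
    return prod(levels_per_factor)
```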
Mediator
A go-between variable linking some X to Y
Concurrent Validity
A measure's ability to distinguish between groups of people known to differ on the construct • e.g., a depression scale's ability to distinguish between clinically diagnosed depressed patients and paranoid patients.
Predictive Validity
A measure's ability to identify future differences between people • e.g., a test of math ability should predict how well people will do in math classes or how likely they will be to enter a quantitative field
Control Variables
A potential IV that is held constant across all conditions -Does not vary because it is controlled by the experimenter -Alternatively, measure the variable (e.g., age) and then statistically control for it
Moderator
A variable that promotes (makes stronger) or inhibits (makes weaker) the relationship between X and Y
Power
Ability to detect result, if present -If we run the experiment many times, how often will we expect to correctly reject the null hypothesis (standard = 80%)
Advantage/Disadvantage of holding CV constant across conditions?
Adv: -Same treatment for all conditions (easy to administer) Disadvantage: -Works with a between-subjects design (not within-subjects) - Could limit external validity
Advantage/Disadvantage of letting CV vary randomly across conditions?
Advantage: Disadvantage: -still a chance there will be a confound - less likely with lots of subjects
Advantage/Disadvantage of counterbalancing CV across conditions?
Advantage: -Makes sure each paragraph is assigned to each condition equal number of times across subjects! Disadvantage: - Will also need to counterbalance order if within-subjects (cannot read all "dog" first)
Advantages of within-participant design?
Advantages: - Reduce random error associated with individual differences - More statistical power (Easier to detect real differences) -Requires fewer Ps (Each person is his/her own "control" group; Less costly)
Advantages/Disadvantages of self-report?
Advantages: 1) Easy to use 2) Face-valid 3) Ask multiple q's related to issue Disadvantages: 1) Social Desirability 2) Easy to recognize hypothesis 3) Telling more than we know (people make up stuff because you're asking them to)
Advantages/Disadvantages of physiological?
Advantages: 1) Fixes reactivity issues 2) Fixes social desirability issues Disadvantages: 1) Expensive 2) Movement restrictions 3) General level of arousal vs. specific type of arousal
Advantages/Disadvantages of implicit measures?
Advantages: 1) Fixes reactivity issues 2) Fixes social desirability issues 3) Easy to administer Disadvantages: 1) sensitive to context effects 2) does it actually measure attitudes?
Advantages/Disadvantages of behavioral?
Advantages: 1) Fixes reactivity issues 2) More engrossing and absorbing Disadvantages: 1) Difficult to develop 2) Difficult to generalize
Best practices for eliciting responses in sensitive domains
Begin with easy to answer questions that are not sensitive and then move progressively more sensitive for later items
What are the characteristics/advantages of a Semantic Differential Scale?
Bipolar evaluative scales to judge any object • Anchored by two opposite endpoints - positivity vs. negativity • e.g., "Americans," "mothers," "terrorists" Pick an attitude object - your bff, cats, chocolate • Assess evaluation, potency, and activity Advantages: Osgood et al. claimed that the scale was valid across objects and respondents
Systematic error is caused by...
CONFOUNDS that... - influence ALL scores in one condition in the same direction - have no effect, or a different effect, on the scores in other conditions - can bias (e.g., inflate) the size of the differences between conditions
Beta
Chance of Type II error - Risk of false negative (missing it!)
Difference between means
Cohen's d -How different are the two samples? -Takes into account both the difference between means and the standard deviation Standard: - 0.2 = small - 0.5 = medium - 0.8 = large
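The following is a minimal Python sketch of Cohen's d for two independent groups. It divides the mean difference by the pooled standard deviation and labels the result with the benchmarks above. The scores are hypothetical, and "below small" is this sketch's own label for d < 0.2.

```python
from statistics import mean, variance

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5

def label(d):
    """Benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

d = cohens_d([6, 7, 8, 9, 10], [4, 5, 6, 7, 8])  # hypothetical scores
print(round(d, 2), label(d))  # 1.26 large
```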
Potential biases and fallacies that affect validity?
Correlational Fallacy- correlation does not imply causation 1) Experimenter Bias 2) Participant Bias: Demand Characteristic 3) Volunteer Bias
Steps involved in post-experimental protocol?
Debriefing: • "Manipulation checks" • Psychological well-being of Ps • Educational experience of Ps • Think of Ps as consultants Before Debrief: probe for participant suspicion and hypotheses about the study *Participant should feel good about themselves and their research study experience!
Definition of construct validity?
Degree to which conceptualization of what is being measured is actually being measured (i.e., what is actually being tested)
Effect Size
Descriptive statistic that tells us the magnitude of relationship between two variables; strength of the phenomenon/manipulation you are measuring
Statistical power is determined by _____ of _______ and _______ _______
Determined by the SIZE OF EFFECT and the SAMPLE SIZE (n)
Disadvantages of within-participant design? What is a potential solution?
Disadvantages: - Carry-over effects (aka contamination effects): Learning, Fatigue, Habituation, Pretest sensitization, Demand effects - Not always feasible Solution: Counterbalance the conditions
Criterion-Related Validity
Does the measure allow us to predict a person's score on some independent, concrete, behavioral outcome? - e.g., predicting successful researchers • locate a sample of "successful" and "unsuccessful" researchers • grade point average during grad school
Margin of Error
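The range around a sample estimate within which the population value is likely to fall; for a 95% confidence interval around a mean, it is the critical value × the standard error (about 1.96 × SE for large samples), i.e., half the width of the interval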
What is noise and how do we minimize it?
Extraneous variables that can influence your DV unevenly, obscuring your ability to see if your IV causes your DV To Minimize: - Lab studies allow for greater control of extraneous variables, but are still at risk for confounds - Standardize your procedure - Recruit a homogeneous sample of participants or use random assignment with a sufficiently large sample
Random error is caused by...
Extraneous variables - Variables whose average influence on the outcomes is the same across all conditions - Makes it difficult to detect a significant effect
What are some sources of random error?
Gender, SES, personality -Becomes systematic if all girls/boys assigned to same group or condition
Researcher degrees of freedom?
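If you combine many variables, or try enough analytic choices, it becomes very likely that you will find p < .05 even when there is no true effect (combining four common researcher degrees of freedom produced a 60.7% false-positive rate in Simmons, Nelson, & Simonsohn, 2011)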
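The 60.7% figure comes from combining several flexible choices. The following sketch isolates just one of them: measuring several DVs and reporting whichever one "works." It assumes there is no true effect, that the DVs are independent and normal with a known SD, and that a z-test is used in place of a t-test for simplicity. Real DVs are usually correlated, which inflates the rate somewhat less.

```python
import random
from statistics import NormalDist, mean

Z = NormalDist()

def two_sided_p(x, y, sigma=1.0):
    """z-test for a difference in means with known SD (a simplification of the t-test)."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - Z.cdf(abs(z)))

def false_positive_rate(n_dvs, n=20, reps=2000, seed=3):
    """Share of null studies where at least one of n_dvs DVs reaches p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p_values = []
        for _ in range(n_dvs):
            a = [rng.gauss(0, 1) for _ in range(n)]  # both groups from the same population
            b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(two_sided_p(a, b))
        hits += min(p_values) < 0.05
    return hits / reps

print(false_positive_rate(1))  # close to .05
print(false_positive_rate(5))  # close to 1 - .95**5, about .23
```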
When to do post-hoc (i.e., simple effects) tests?
If the interaction is significant, you must analyze simple effects -Do pairwise comparisons (t-tests) of all rows and columns
What is the experimenter's dilemma?
In experiments, often external validity is sacrificed for the sake of internal validity
Manipulation Check?
Include this measure sometime after your manipulation (in most cases it should be one of the last measures you administer)
Why is p-hacking problematic?
Increases Type I error and reduces the legitimacy of the field
Difference between manipulated variable and individual differences variable? Is an individual differences variable still an IV?
Individual differences variables are not manipulated by the researcher, but are characteristics or traits that vary across individuals (e.g., age, depression, gender, intelligence) -Technically not a true IV because the researcher cannot randomly assign participants to its levels; participants can only be selected based on it
What are the two types of multiple-item measures?
Index: simply accumulate scores of individual items (dichotomous) Scale: assign scores to patterns of responses across items
What type of design uses an Individual Differences Variable as the IV?
Individual difference variables (subject variables) are often studied as independent variables in the NATURAL GROUPS DESIGN.
Reliability?
Instrument is consistent and stable • Do the measures yield consistent results? • When repeated multiple times, the standard deviation of the observations is small
Validity?
Instrument measures what it is supposed to measure
Difference between internal and external validity?
Internal: Extent to which our research allows us to make statements about a causal relationship between the IV and DV, rather than an effect caused by an unrelated extraneous variable. External: Extent to which results tell us something about the real world (other people, other settings, other ways of measuring IV and DV)
Provide an example of poor construct validity
Last year's income as a measure of SES (lose job, inconsistent salary, maternity leave, etc.)
What are the characteristics/advantages of a Likert Scale?
The Likert scale is the most widely used scale in the social sciences today -simpler to construct -easier to answer on a range than as a dichotomous variable -Uses only monotonic items -Scale score is derived by summing or averaging the item ratings -Can be used in cases where the construct is multi-dimensional
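Scoring a Likert scale means flipping the reverse-keyed items and then summing or averaging. The following is a minimal sketch, assuming a 1-5 response format. The item names are hypothetical.

```python
def score_likert(responses, reverse_keyed, scale_min=1, scale_max=5):
    """responses: dict of item -> rating. Reverse-keyed items are flipped, then averaged."""
    scored = []
    for item, rating in responses.items():
        if item in reverse_keyed:
            rating = scale_max + scale_min - rating  # on 1-5: 1 <-> 5, 2 <-> 4
        scored.append(rating)
    return sum(scored) / len(scored)

# Hypothetical respondent; q2 is worded in the opposite direction
print(score_likert({"q1": 5, "q2": 1, "q3": 4}, reverse_keyed={"q2"}))  # 4.666...
```

Averaging rather than summing keeps the score on the original 1-5 metric, so scales with different numbers of items stay comparable.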
Maier (1931) and Nisbett & Wilson (1977)
Maier: Problem solving Nisbett & Wilson: Remote Associates *ADD MORE*
What do we mean by main effects and interactions?
Main Effect: • effect of one IV on DV (independent of other IV) • compares marginal means Interaction: • when the effect of an IV on DV changes with levels of another IV • compares cell means
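For a 2x2 design, the main effects and the interaction can be read directly off the cell means. The following is a minimal sketch with hypothetical cell means. These are descriptive differences only; testing whether they are significant requires an ANOVA.

```python
def two_by_two(cells):
    """cells[(a, b)] = cell mean, where a and b are the levels (0 or 1) of factors A and B."""
    a_marginal = [(cells[(a, 0)] + cells[(a, 1)]) / 2 for a in (0, 1)]
    b_marginal = [(cells[(0, b)] + cells[(1, b)]) / 2 for b in (0, 1)]
    main_a = a_marginal[1] - a_marginal[0]
    main_b = b_marginal[1] - b_marginal[0]
    # Interaction: does the effect of A differ across the levels of B?
    interaction = (cells[(1, 1)] - cells[(0, 1)]) - (cells[(1, 0)] - cells[(0, 0)])
    return main_a, main_b, interaction

cells = {(0, 0): 10, (1, 0): 12, (0, 1): 10, (1, 1): 20}  # hypothetical means
print(two_by_two(cells))  # (6.0, 4.0, 8)
```

Here A raises scores by 2 at the first level of B but by 10 at the second. That nonzero difference of differences is the interaction.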
Dependent Variable
Measure of behavior used by the researcher to assess the effect (if any) of the independent variables.
Mediator versus Moderator
Mediator: -The why question -Reflecting a process -Believed to be associated with both the IV & DV Moderators: -The when question -Qualifying a condition -Need not be associated with either the other IV or DV
What is the difference between mundane and experimental realism?
Mundane Realism: degree to which events and situations created in experiment are similar to real world Experimental Realism: experiment is psychologically realistic and meaningful (do they believe the cover story?) *Lack of mundane realism is less troubling than lack of experimental realism
Regression
One or more predictors used to predict a single outcome; goal is to compute the equation for the line that best fits the data. With one predictor: Y = mX + b, where m is the slope and b is the Y-intercept
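With a single predictor, the best-fitting line in the least-squares sense has a closed form. The following is a minimal sketch with hypothetical data. Multiple predictors require matrix methods or a statistics package.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns slope m and intercept b in Y = mX + b."""
    mx, my = mean(x), mean(y)
    m = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - m * mx  # the line passes through the point of means
    return m, b

# Hypothetical data: hours studied (X) and quiz score (Y)
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0
```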
Correlation
Pearson's r correlation Correlation Coefficient: A statistic that indicates the degree to which two variables are related to one another in a linear fashion (ranges from -1.00 to 1.00)
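The following is a minimal Python sketch of Pearson's r, applied to hypothetical test-retest data in which the same five people are measured twice. A high r here is what test-retest reliability looks for.

```python
from statistics import mean

def pearson_r(x, y):
    """Sum of cross-products of deviations, scaled by the deviations' overall magnitude."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical test-retest data
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]
print(round(pearson_r(time1, time2), 2))  # 0.98
```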
What are 3 measures for effect size?
Pearson's r correlation, Cohen's d, Phi
Association between categorical variables
Phi
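For two binary variables arranged in a 2x2 table [[a, b], [c, d]], phi can be computed from the four cell counts and ranges from -1 to 1 like r. The following is a minimal sketch. The counts are hypothetical: rows are texting/control and columns are error yes/no.

```python
def phi(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5

# Hypothetical counts: texting (3 errors, 1 no error), control (1 error, 2 no error)
print(round(phi(3, 1, 1, 2), 3))  # 0.417
```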
What is p-hacking?
Playing with the data to create "significant" results
Alpha
Preset limit for Type I error - Risk of false positive
How to maximize internal validity?
Random Assignment -IV is manipulated and isolated -Extraneous variables are controlled -Subjects are randomly assigned *Rules out confounding variables and reduces error
How to maximize external validity?
Random Sampling -Use hypothesis to limit desired generalization -Gather data from entire population (not practical) OR -*Gather data from representative sample that shares characteristics of population by drawing a random sample.
How do you minimize systematic error?
Random assignment
Difference between random sampling and random assignment?
Sampling: - method of selecting participants so that they are representative of the population of interest - affects external validity Assignment: -procedure used after sample of participants is obtained, but before they are exposed to the experimental manipulation - affects internal validity
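The two procedures happen at different stages, and a short sketch makes the contrast concrete. Sampling draws participants from a population, while assignment shuffles the obtained sample into conditions. The population of 1,000 IDs and the condition names are hypothetical.

```python
import random

def random_sample(population, n, rng):
    """Random sampling: who gets into the study (bears on external validity)."""
    return rng.sample(population, n)

def random_assignment(participants, conditions, rng):
    """Random assignment: which condition each participant gets (bears on internal validity)."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal shuffled participants into conditions in turn, so group sizes stay equal
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}

rng = random.Random(0)
sample = random_sample(list(range(1000)), 40, rng)
groups = random_assignment(sample, ["treatment", "control"], rng)
print(list(groups.values()).count("treatment"))  # 20
```

A lab study with a convenience sample can still use random assignment. It then gains internal validity without the external validity that random sampling would add.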
What are the 11 steps in survey research?
Step 1: Decide on your survey method Step 2: Decide on sampling method Step 3: Decide on specific content areas Step 4: Look for existing questionnaires or scales Step 5: Draft the questionnaire *** FOCUS **** Step 6: Assemble into a draft and circulate to experts - Identify superfluous, and necessary but missing, questions - Identify problems with wording or flow Step 7: Pretest; Do reliability and validity tests; Come up with responses for open-ended questions; Determine length (administration time) Step 8: Analyze Pretest Step 9: Assemble final version and train interviewers Step 10: "Field" it Step 11: Code and analyze data
An experiment must have at least _________ conditions/levels in order to demonstrate that the IV has an effect on the DV. They are the _________ group and the _________ group.
TWO conditions/levels; experimental, control
Convergent Validity
The correlation between alternative measures of the same construct - these measures are intended to tap the same construct but have different sources of systematic error (i.e. shyness and anxiety measures)
Why run experiments? (Strengths and limitations)
To infer that X caused changes in O... - X must covary with O - X must precede O in time - Must rule out alt. explanations for changes in O through 1) random assignment or 2) by controlling confounds Strength: tells us about causation Limitations: 1) Limited external validity - often can't get random sample - laboratory setting is unnatural 2) Not always possible - Can't always manipulate variable of interest (e.g., IQ, SES, etc.)
Cardinal
Type of continuous variable where the distance between two consecutive values is always the same -interval: has no meaningful zero -ratio: has meaningful zero
Ordinal
Type of continuous variable where the distance between two consecutive values is not always the same
What is a manipulation check and why use them?
Used to measure the psychological state in order to determine if the measure or manipulation induced the construct of interest.
How do we optimize design?
Using a within-subjects design: participant serves in more than one condition of a study
Confound difference from extraneous variable?
Variable that varies systematically with condition, while extraneous is just a variable of disinterest
Statistic
a quantity computed from a sample
Compared to IVs, DVs are ____________ and ___________ to administer
cheaper and easier; - In a between-subjects factorial design, each two-level IV you add doubles the number of conditions (and so roughly doubles your sample size) - Use multiple DVs - look for convergence
Inferential statistics
help us draw conclusions about population *Is the observed association between our variables true of population or due to chance?
Between-participants design is also known as...
independent samples design -subjects only go through one condition that they are randomly assigned to prior to experiment
T-Test looks at _________ and ________ to determine if difference between conditions is due to chance.
mean; variability t= mean difference between conditions/variability -If t(observed) > t(critical), then reject the null hypothesis
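The following is a minimal sketch of the pooled-variance independent-samples t. It computes the mean difference divided by its standard error and compares the result to a critical value. The scores are hypothetical. The critical value 2.101 (two-tailed, α = .05, df = 18) is the standard table value. It is hard-coded here because the Python standard library has no t distribution.

```python
from statistics import mean, variance

def independent_t(x, y):
    """Pooled-variance independent-samples t: mean difference / its standard error."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = (pooled_var * (1 / nx + 1 / ny)) ** 0.5  # the "variability" term
    return (mean(x) - mean(y)) / se, nx + ny - 2

# Hypothetical scores, 10 participants per condition
treatment = [5, 6, 7, 8, 9, 6, 7, 8, 9, 10]
control = [3, 4, 5, 6, 7, 4, 5, 6, 7, 8]

t_obs, df = independent_t(treatment, control)
T_CRITICAL = 2.101  # two-tailed alpha = .05, df = 18, from a standard t table
print(round(t_obs, 2), df, "reject null" if abs(t_obs) > T_CRITICAL else "fail to reject")
# 2.83 18 reject null
```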
Within-participant design is also known as...
repeated measures design -subjects go through both conditions (order randomly assigned)
Different ways to operationalize a variable?
self report, implicit measures, physiological, behavioral
No __________ measurement or variable can capture a ________ of interest. What is the solution?
single; construct -Use multiple operational definitions and measurements
Implicit association measures (e.g., the IAT) detect the ________ of a person's __________ ___________ between representations of concepts in memory.
strength; automatic associations
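To show how "strength" can be quantified from response latencies, here is a heavily simplified, IAT-style D score: the difference in mean latency between the incompatible and compatible pairing blocks, divided by the standard deviation of all trials from both blocks. This is an assumption-laden sketch; the published scoring algorithm also trims extreme latencies, handles errors, and combines practice and test blocks. The latencies are invented.

```python
from statistics import mean, stdev


def simplified_d(compatible_ms, incompatible_ms):
    """Simplified IAT-style D: latency difference scaled by the SD of all trials."""
    inclusive_sd = stdev(compatible_ms + incompatible_ms)  # SD across both blocks combined
    return (mean(incompatible_ms) - mean(compatible_ms)) / inclusive_sd


compatible = [650, 700, 620, 680, 710]    # made-up latencies (ms), compatible pairing
incompatible = [820, 790, 860, 800, 845]  # made-up latencies (ms), incompatible pairing
print(round(simplified_d(compatible, incompatible), 2))
```

A larger positive D means slower responding when the pairing conflicts with the person's automatic associations, i.e., a stronger association for the "compatible" pairing.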
Descriptive statistics
summarize results in the sample
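A minimal sketch of common descriptive statistics using Python's standard `statistics` module. Each value is a statistic in the sense of the earlier card (a quantity computed from a sample); the scores are made up.

```python
from statistics import mean, median, stdev

sample = [4, 8, 6, 5, 7, 6, 9]  # made-up scores from one sample

summary = {
    "n": len(sample),
    "mean": mean(sample),
    "median": median(sample),
    "sd": stdev(sample),  # sample SD (n - 1 denominator)
    "range": max(sample) - min(sample),
}
print(summary)
```

These numbers describe only this sample; deciding whether a pattern holds in the population is the job of inferential statistics.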
Test-retest reliability
the correlation between scores on the same measure administered on two separate occasions • If you applied the same operation (measure) to the same construct repeatedly, you should get the same result if your measure is reliable • Example: Blood pressure cuff
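A minimal sketch of test-retest reliability as a Pearson correlation between the same people's scores at two sessions. The function name `pearson_r` and the blood-pressure readings are made up for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between paired scores x[i], y[i] (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


time1 = [120, 135, 118, 142, 128, 150]  # made-up systolic readings, session 1
time2 = [122, 133, 120, 145, 126, 148]  # same people, session 2
print(f"test-retest r = {pearson_r(time1, time2):.2f}")
```

The same function applies to inter-rater reliability: replace the two sessions with two raters scoring the same targets.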
Individual differences variable
• Characteristics of the person (aka subject variable) • Can be a predictor variable of interest • Examples: 1) Personality 2) Demographic 3) Cognitive Reflection Test (CRT) score
What are some new requirements for authors that have been suggested to mitigate p-hacking?
• Decide the rule for terminating data collection before data collection begins, and report it • At least n = 20 per condition • List all variables and conditions (including failed manipulations) • List all covariates (How do the results change without them?) • Report any eliminated observations (How do the results change if they are included?)
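Why fix the stopping rule in advance? The simulation below is a minimal sketch, assuming the null hypothesis is true and, for simplicity, a z test with a known SD of 1 in place of a t test. It compares testing once at a pre-set n = 50 per group against "peeking" at n = 20, 30, 40, and 50 and stopping as soon as p < .05. All names and the look schedule are illustrative assumptions.

```python
import math
import random


def two_sided_p(a, b):
    """Two-sided p from a z test, assuming both populations have SD = 1 (known)."""
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(n_sims=2000, looks=(20, 30, 40, 50), alpha=0.05, seed=0):
    rng = random.Random(seed)
    fixed_hits = peek_hits = 0
    for _ in range(n_sims):
        # H0 is true: both groups come from the same N(0, 1) population.
        a = [rng.gauss(0, 1) for _ in range(max(looks))]
        b = [rng.gauss(0, 1) for _ in range(max(looks))]
        # Pre-registered analysis: one test at the final n.
        fixed_hits += two_sided_p(a, b) < alpha
        # p-hacked analysis: test at every look, stop as soon as p < alpha.
        peek_hits += any(two_sided_p(a[:n], b[:n]) < alpha for n in looks)
    return fixed_hits / n_sims, peek_hits / n_sims


fixed, peeking = false_positive_rates()
print(f"false-positive rate, fixed n: {fixed:.3f}; with peeking: {peeking:.3f}")
```

The fixed-n rate should hover near the nominal .05, while optional stopping inflates the Type I error rate, because every extra look is another chance for noise to cross the threshold.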
Threats to Validity (Internal and External)
• Different kinds of threats to internal and external validity (e.g., methodological, experimenter-related, participant-related) - know the definition - know which bias threatens which kind of validity - know how to minimize each **NEED TO ADD**
What are some examples of systematic error?
• Halo bias: tendency for an overall positive or negative impression to influence ratings on specific dimensions • Generosity error: tendency of raters to overestimate desirable qualities of people/objects they like • Contrast error: tendency for raters to judge targets relative to one another rather than against a fixed standard
How to reduce demand characteristics?
• Keep participant engaged (PILOT) • Use a persuasive cover story • Separate your DV from the rest of your study (the "unrelated 2nd Experiment") • Use a behavioral/behavioroid DV • Enlist participant cooperation • Make SURE participant is blind to her condition
What are some new recommendations to reviewers?
• Make sure authors follow guidelines • Tolerate result imperfections • Require demonstration that authors' findings do not hinge on arbitrary analytic decisions • Require replication when data collection or analysis is not sufficiently compelling
Measures can be _________ but not at all ________. However, the reverse is NOT TRUE!
• Measures can be RELIABLE but not at all VALID - measure the same construct over and over and get the same results (reliability), but you have not measured what you really think your construct is (validity)! • But .....the reverse is NOT TRUE! - a measure that is not reliable CANNOT provide a valid measure of anything! - you are measuring something else each time!
A good DV....
• Must reflect its operational definition and be high in construct validity (e.g., is chili pepper amount a good measure of aggression?) • Should not elicit demand characteristics (e.g., "how aggressive did you feel in the last five minutes?" might be high in demand characteristics and might produce social desirability response)
What is a p-value?
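• NOT the same as α: α is the significance threshold chosen in advance (e.g., .05); the p-value is computed from the data and compared against α • NOT P(H0), the probability that the null H is true • NOT P(D), the probability of observing the data • NOT P(H0|D), the probability of the null H given the data • It IS P(D|H0): the probability of observing data at least as extreme as ours, given that the null H is true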
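One way to see that a p-value is P(D|H0) is to build the null distribution directly. The sketch below uses a permutation test, a technique swapped in here for illustration (not necessarily the method taught in the course): under H0 the condition labels are arbitrary, so we reshuffle them many times and count how often the shuffled difference is at least as extreme as the observed one. Function name and data are hypothetical.

```python
import random
from statistics import mean


def permutation_p(group1, group2, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # under H0, which condition a score came from is arbitrary
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        extreme += diff >= observed  # "at least as extreme" as the observed data
    return extreme / n_perm


print(permutation_p([5, 7, 6, 8, 9], [3, 4, 5, 4, 4]))
```

The result is a proportion of null-generated datasets, i.e., a probability of data given H0; it says nothing directly about the probability that H0 itself is true.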
Types of DVs
• Self-report measures (e.g., MTurk surveys, smartphone apps) • Behavioral measures • Behavioroid measures • Physiological measures • Indirect measures • Interviews • Judgments or ratings by others
How do we increase internal consistency/reliability?
• Start with a large number of items or observations • Eliminate unclear items • Standardize instructions • Maintain consistent scoring procedures
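A minimal sketch of Cronbach's alpha, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), plus the standardized-alpha formula k·r̄ / (1 + (k - 1)·r̄), which shows why starting with more items helps when the average inter-item correlation r̄ stays the same. Function names and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def standardized_alpha(k, mean_r):
    """Alpha implied by k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


for k in (5, 10, 20):
    print(k, round(standardized_alpha(k, 0.3), 3))
```

With r̄ = .3, alpha rises from about .68 (5 items) to about .81 (10 items) and .90 (20 items). Eliminating unclear items works in the same direction by raising r̄.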
Hypotheses are....
• specific predictions generated by theories • posit an association between 2 or more variables • are testable (i.e., falsifiable) • are directional • cannot be proven; only supported • can be causal or correlational
