Research Methods I final
How to improve internal validity: (6)
- Control for confounding variables -> use random assignment
- Presence of a control group
- Standardized measures/instructions/setting -> the only difference between groups should be the IV
- Double-blind design -> eliminates experimenter expectancy
- Counterbalancing -> controls for order effects
- Stricter inclusionary criteria (but this decreases external validity)
Non-probability sampling: Convenience
- Most common in research studies
- Also called accidental or opportunity sampling
- Sample drawn based on availability and convenience; not a lot of effort to get a particular sample
- Includes participants who are readily available e.g. stop people we encounter on a downtown street, recruit people from the local community, study patients at a local hospital or clinic, test children at a nearby school, or use a sample of students at our own college or university
- Although the sample is not representative of any particular population, we can nonetheless test hypotheses about relationships among variables
- We can test the generalizability of our findings by replicating the experiment on other convenience samples; the more those samples differ from one another, the better we can see whether our findings generalize across different groups of people
Informed consent paperwork involves: (7)
- Purpose of research, expected duration, procedures
- Right to decline/withdraw
- Potential risks/adverse effects
- Prospective research benefits
- Confidentiality/limits to confidentiality
- Incentives
- Contact information for questions
Make sure participants know what they're getting into -> so they're not surprised by the procedures
If under 18, parents need to sign the informed consent
Don't want to describe the research too well (could bias responses)
For treatment research: need to tell participants they might get the placebo/control
Include any information about getting further treatment after the study if they want to continue
What affects the statistical significance of a correlation: (3)
- Sample size (larger sample size = lower p value) - Magnitude of r (larger magnitude = lower p value) - Researcher's choice of significance level α (alpha) (usually set at 0.05 or 0.01)
Subtypes of Probability sampling (5)
- Simple random sampling - Stratified sampling: proportional vs. disproportional - Systematic sampling - Cluster sampling - Multi-stage sampling
Why use ANOVA over t-test? (2)
- Running multiple t-tests increases the chance of a Type I error
- ANOVA keeps the overall probability of a Type I error at the set alpha level
To infer that one variable causes another, we need to meet 3 criteria:
- The two variables must correlate
- The presumed cause must precede the presumed effect
- All extraneous factors that might influence the relationship between the two variables are controlled or eliminated -> do this in a lab/controlled setting
If the F-ratio is larger than 1: (3)
- The variability between the conditions' means is larger than would be expected due to chance
- We may be able to reject the null hypothesis
- How large the F-ratio needs to be to reject the null hypothesis depends on the degrees of freedom (df)
Sources of measurement error: (7)
- Transient states (e.g. mood, fatigue, hunger)
- Situational factors e.g. how the interviewer asks the questions; lighting/volume/temperature
- Reactivity - any change in how you would answer caused simply by being observed/studied
- Social desirability bias
- Stable attributes e.g. dispositional factors such as a tendency to be hostile
- The measure itself e.g. too long, too vague, difficult to administer, etc.; over time the participant may just start guessing to get it done quickly
- Scoring/data-entry errors
Non-probability sampling: Quota
- Type of convenience sampling - Obtain data until quota is reached for certain criteria; after that, no more - Can be proportional or non-proportional - A little more effort (have to count then stop); can control it a bit more
Limitations of having a control group (3)
- Difficult to keep participants from knowing whether they're in the treatment group or not
- Possible placebo effect
- Unethical to withhold treatment if it's likely to be useful
Potential considerations for informed consent (3)
- May be long and boring to read; need to clearly specify important risks etc.
- Might affect the sample/external validity if some people refuse to sign
- Might describe the research too well and influence the results
Interview Disadvantages (4)
1. By helping respondents elaborate on their answers, you may change each respondent's answers; not standardized
2. Difficulty with verbal communication
3. More pressure to answer quickly; anxiety
4. Time consuming
Steps to conduct a t-test
1. Calculate the means of the two conditions
2. Calculate the standard error of the difference between the two means (the denominator) -> the calculation uses the variances of the two groups and their sample sizes
3. Calculate the observed value of t - the t-value computed from your actual study
4. Find the critical value of t
- This is the value of t that could arise due to chance
- The critical value of t varies with the number of participants in the study (degrees of freedom) and the alpha level (Type I error rate)
5. Compare the critical t (from a statistical table) and the observed t (from step 3)
- If the absolute value of the observed t is larger than the critical t, the result is significant and you can reject the null hypothesis
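The steps above can be sketched in pure Python (stdlib only). The data values are made up, and the critical t of 2.306 (df = 8, alpha = .05, two-tailed) comes from a standard t table:

```python
import math
from statistics import mean, variance

def independent_t(group1, group2):
    m1, m2 = mean(group1), mean(group2)   # step 1: condition means
    n1, n2 = len(group1), len(group2)
    # step 2: standard error of the difference, via the pooled variance
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se                 # step 3: observed t

treatment = [8, 9, 7, 10, 9]              # made-up scores
control = [5, 6, 4, 6, 5]
t_obs = independent_t(treatment, control)
# steps 4-5: with df = 8 and alpha = .05 (two-tailed), critical t ≈ 2.306;
# |t_obs| ≈ 5.38 > 2.306 -> reject the null hypothesis
print(round(t_obs, 2))  # 5.38
```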
Interview advantages (5)
1. Can be used with people who are illiterate (approx. 10% of the US population), young children, the cognitively impaired, severely disturbed individuals, etc.
2. Interviewer can be sure respondents understand each item before answering
3. Can ask follow-up questions
4. Can read body language - behavioral observation data
5. Harder for respondents to avoid or abandon (unlike a questionnaire)
Threats to external validity (6)
1. Demographics - age, background, SES, education
2. How the sample was obtained - probability vs. nonprobability
3. Other times - seasons, times of the day, days of the week
4. Other places - countries, regions, geographic locations
5. Other settings - laboratory settings = reactivity = being studied will affect your behavior
6. Situational factors - the study itself; anything about the experiment that is controlled and differs from how people behave outside the study can affect generalizability
Different ways you can administer the independent variables: (3)
1. Environmental manipulations - modifications of the participant's physical or social environment
2. Instructional manipulations - vary the independent variable through the verbal instructions that participants receive
3. Invasive manipulations - create physical changes in the participant's body, e.g. through surgery or the administration of drugs
Threats to internal validity (8)
1. Maturation: naturally occurring processes within individuals that could cause a change in behavior e.g. fatigue, intellectual development
2. History: a historical event (within or outside the study) that could affect the dependent variable e.g. Sesame Street presenting math concepts; a fire alarm going off during the study
3. Pretest sensitization: taking the test primed participants, so they did better the second time around
4. Regression to the mean: the tendency for extreme scores to move closer to the mean with repeated testing (e.g. the lowest 10% on a test improve on the next test regardless of intervention) -> you caught them at a bad/good time, so they move back toward the mean the second time
5. Instrumentation: the measure itself changes between pre and post e.g. different standardized tests used; observers becoming more careless over time
6. Mortality/attrition: people who drop out are qualitatively different from those who stayed
7. Testing: changes in test scores occur due to the effects of testing itself (e.g. practice effects, sensitization to the study's purpose)
8. Expectations:
- Experimenter expectancy effects - the researcher wants the study to work, so there is conscious/unconscious bias; if expecting improvement, they may treat participants differently or score them differently
- Placebo effect - knowing that you're getting a treatment makes you better
Disadvantage of Internet for questionnaires (4):
1. Requires some computer savvy - older generations might have less
2. Requires access to a computer - restricts SES range
3. Some online forms won't proceed to the next question without finishing the previous one → respondents might quit or rush through it
4. Seeing all the questions might reveal what the study is asking about, which can bias responses
4 Types of scales:
1. Nominal - used for categorical (discrete) variables; you can attach numbers to the categories, but they don't mean anything (they're labels) e.g. colors (blue = 1)
2. Ordinal - ranking; doesn't tell you the real difference between the ranks e.g. the order in which runners finish a race doesn't indicate how much faster one person was than another; the distance between points is not necessarily the same e.g. liking song 4 more than song 2 doesn't mean you like song 4 two times more than song 2
3. Interval - the distance between numbers is equivalent e.g. the difference between scores of 90 and 100 (10 points) is the same as the difference between scores of 130 and 140 (10 points)
→ No true zero e.g. 0 Fahrenheit doesn't mean no temperature (it kind of just acts as a middle point) e.g. an agree-disagree scale
→ The numbers DO mean something (so not nominal): 4 means more agreement than 3, and the points are spaced evenly (so it carries more meaning than ordinal)
4. Ratio - same as an interval scale but with a true zero (an absence of something), and there can't be a negative value e.g. weight, age, number of correct answers, distance
Quasi-experimental designs: 7
1. One-group pretest-posttest design
2. Nonequivalent groups posttest-only design
3. Nonequivalent groups pretest-posttest design
4. Simple interrupted time series design
5. Interrupted Time Series with a Reversal
6. Interrupted Time Series Design with Multiple Replications
7. Control Group Interrupted Time Series Design
ANOVA variations (5)
1. Repeated-measures ANOVA
2. One-way ANOVA: one independent variable (with at least 3 conditions) -> at least 3 because with only 2 you'd use a t-test
3. Two-way ANOVA: two independent variables (at least 2 conditions each)
4. MANOVA
5. ANCOVA
Questionnaire advantages: (5)
1. Require less extensive training of researchers
2. Can be administered to groups of people simultaneously; larger sample size
3. Usually less expensive and time consuming
4. For more sensitive topics, respondents can remain anonymous → can be more honest
5. Don't have to fill it out right away, so respondents can skip questions and come back after thinking them through → but is this more accurate?
3 things the researcher must do to make it an experiment
1. The researcher must vary (manipulate) at least one variable - rather than looking at naturally-occurring variation, the researcher creates variation by manipulating a variable (creating experimental conditions) and then sees what happens to other variables
2. The researcher must have the power to assign participants to experimental conditions in a way that ensures their initial equivalence - people in each condition must be equivalent at the start of the study
3. The researcher must control for all other systematic confounds
When can you use the t-test?
- IV is nominal with two categories only - DV is interval or ratio (i.e. continuous) - DV is normally distributed
Disadvantages of phone interviews (vs. in-person) (3)
1. Can't read body cues for phone 2. Might be some difficulty with technology 3. They can just hang up; easier to do than leaving an interview
3 types of non-probability sampling
1. Convenience sampling 2. Quota sampling 3. Purposive sampling
Advantages of internet for questionnaires (3)
1. Internet quicker to respond and send 2. Internet may feel more anonymous than paper 3. Internet can reach specific niche populations you might be interested in
Types of one-way experimental design (3)
1. Randomized groups design - Ps are assigned randomly to conditions 2. Matched subjects design - Ps are matched into blocks on the basis of a relevant variable (trait), then randomly assigned from blocks to conditions 3. Repeated measures design - each P serves in all experimental conditions
Coefficient of determination
= r^2
This number equals the proportion of variance in one variable that is accounted for by another variable; it explains the variability of the response data around the mean.
Example: The correlation between children's and parents' neuroticism scores is .25. Squaring this correlation (.0625), the coefficient of determination tells us that 6.25% of the variance in children's neuroticism scores can be explained by their parents' scores.
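The worked example above, as a quick sanity check in Python:

```python
# r = .25 between parents' and children's neuroticism scores (from the example)
r = 0.25
r_squared = r ** 2                    # coefficient of determination
percent_explained = r_squared * 100   # as a percentage of variance
print(r_squared, percent_explained)   # 0.0625 6.25
```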
Independent samples t-test
Compare means from two independent samples -> Between-subjects design
Factors that increase the observed t-value (3) = what also increases the likelihood that an ANOVA is statistically significant
- Larger difference between means = larger t-value
- Larger n (sample sizes) = larger t-value
- Smaller variance = larger t-value
One-group pretest-posttest design
Least useful design: O1 X O2
- X is the intervention
- O is the outcome measure: O1 = pretest; O2 = posttest
Debriefing
Opportunity for participants to obtain information about the study after it ends (e.g. its purpose and any deception used)
Non-probability: pros and cons (1:3)
Pros:
- Easier to do
Cons:
- Not random
- Not generalizable
- Researchers have no way of knowing the probability that a particular case will be chosen for the sample. As a result, they can't calculate the error of estimation to determine precisely how representative the sample is of the population
Adv (1) and Dis (3) of Physiological monitoring
Pros:
• More accurate
Cons:
• Really expensive
• Need to control a lot in the environment e.g. temperature, light
• Still might be some reactivity e.g. attaching electrodes to someone might raise their heart rate a little
Failing to reject the null hypothesis
Researcher concludes that there is insufficient evidence that the independent variable had an effect. We cannot say that we "accept" the null hypothesis, because the null hypothesis can never be proven.
Factors that affect correlation: restricted range
Restricted range - data in which participants' scores are confined to a narrow range of the possible scores on a measure.
- Having a restricted range artificially lowers correlations below what they would be if the full range of scores was present; the correlation is much stronger when unrestricted
Example: To see the correlation between age and hours spent on social media, you would not want data from only a sample of college students. You'd want a population sample with a wide array of ages and a wide array of hours of social media use.
Probability of making a Type II error determined by 3 things:
Type I error rate -> the larger you set your alpha (to .10 or .15), the smaller your risk of Type II error
Size of the true difference between means -> the bigger the actual effect (e.g. of coffee on cognitive performance), the smaller the risk of Type II error
Amount of error variance and sample size -> the less noisy your data and the bigger your sample size, the smaller your risk of Type II error
One-way experimental design
An experiment in which only one independent variable is manipulated (with any number of levels).
The simplest one-way design is the two-group experimental design; an experiment requires at least two levels of the independent variable to compare the effects of one level to another:
- Control vs. treatment
- Medication vs. placebo
- Condition A vs. condition B
Negative skew
More high scores than low scores. Skew is named for where the tail lies, e.g. a tail on the left (toward low scores) = negatively skewed.
T-score
t = (difference between the 2 means) / (how much the means would differ due to chance)
- Higher t-score = lower p-value = more likely the difference you found reflects a real effect rather than chance -> can reject the null hypothesis
- t-value of 0 = no difference between the means in your study
- t-value of 1 = the difference between the means is the same as what you'd find by chance
Power of a study
The probability that a study will reject the null hypothesis when it is false and, thus, detect effects that actually occur.
- Power increases with the number of participants in the study
- Inverse of the Type II error rate: power = 1 - β
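A rough Monte Carlo sketch of power: the proportion of simulated studies that reject the null when a real effect exists. The effect size and all numbers are illustrative, and the critical value 1.96 is the normal approximation rather than the exact t cutoff:

```python
import math
import random
from statistics import mean, variance

def t_stat(a, b):
    # independent-samples t (pooled variance)
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    se = math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se

def estimate_power(n_per_group, effect=0.5, sims=2000, crit=1.96):
    random.seed(0)  # deterministic for the demo
    rejections = 0
    for _ in range(sims):
        a = [random.gauss(effect, 1) for _ in range(n_per_group)]  # real effect
        b = [random.gauss(0, 1) for _ in range(n_per_group)]       # no effect
        if abs(t_stat(a, b)) > crit:
            rejections += 1
    return rejections / sims  # proportion of studies that reject the null

# Power increases with the number of participants, as the notes say
print(estimate_power(20) < estimate_power(80))
```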
Dependent variable
the variables we measure to see if they change
How to increase variability within research design? (2)
• Sample considerations → want it to be diverse in terms of what you're looking at e.g. if looking at age, get different age groups; different regions; a large sample
• Measurement considerations → allow a variety of scores e.g. play the song longer so participants have more time to form an opinion
3 types of single subject research designs
ABA design:
- Behavior is measured (baseline period; A)
- Independent variable is introduced (B)
- Behavior is measured again (A)
ABC (DEFG...) design - keep adding different levels of the IV:
- A = baseline; B = one level of the independent variable; C = another level of the independent variable
- Con: never know which level is having the biggest impact
ABACA design - inserts a baseline period between each introduction of a level of the independent variable -> helpful for knowing which intervention is more useful:
- Cons: unethical to remove a treatment if it's useful; can't generalize
- Pro: unique insight into a single individual
ANOVA Con 1
ANOVA is used to analyze data from designs that involve more than two conditions or groups (more than two means) -> e.g. 1 IV with 3 means/groups.
- An ANOVA analyzes the differences between all condition means simultaneously
- ANOVA holds the alpha level at .05 regardless of the number of means being tested
- ANOVA null hypothesis: mean 1 = mean 2 = mean 3 (all groups are equal) -> alternative hypothesis: at least one of the means is different from the others
- Uses the F-test
Con: increases the chance of a Type II error
Hypothesis
An if-then statement of the general form, "If a, then b." Based on the theory, the researcher hypothesizes that if certain conditions occur, then certain consequences should follow.
- Must be developed a priori rather than post hoc
- Must be falsifiable
- Confirming the hypothesis only means we can support the theory, not that the theory is 100% true, since we can't test it on every single person
- Failing to find support for the hypothesis does not disprove the theory → just because there's not a strong correlation doesn't mean there's no correlation; the study could be flawed
Directional hypothesis - predicts the direction of the correlation (i.e., positive or negative) -> use a one-tailed test (in the predicted direction)
Nondirectional hypothesis - predicts that two variables will be correlated but does not specify whether the correlation will be positive or negative -> use a two-tailed test
Interactions
An interaction occurs when the effect of one independent variable differs across the levels of another independent variable. For example, if the effect of variable A is different under one level of variable B than it is under another level of variable B, an interaction is present.
Number of possible interactions -> need at least 2 IVs:
- 1 IV = 0 interactions -> only one IV being tested
- 2 IVs = 1 interaction (A×B; note that AB = BA)
- 3 IVs = 4 interactions (A×B, A×C, B×C, plus the three-way A×B×C)
Look at class 17 to practice.
Control Group Interrupted Time Series Design Pro 1
An interrupted time series design that includes a nonequivalent control group that does not receive the quasi-independent variable.
Pro: helps rule out certain history effects.
O1 O2 O3 O4 X O5 O6 O7 O8
O1 O2 O3 O4 -- O5 O6 O7 O8
Post hoc tests or multiple comparisons (t-tests)
Are used to determine which means differ significantly If the F-test is not significant, follow-up tests are not conducted because the independent variable has no significant effect
5 general ethics principles
Beneficence and Nonmaleficence - doing good for the world and, above all else, doing no harm
Fidelity and Responsibility - establishing relationships of trust with the people you work with; consulting when something is outside your training; accurately portraying your results
Integrity - accurate, honest, truthful; avoiding any fraud or misrepresentation
Justice - everyone having equal access to the benefits of research (results and treatment)
Respect for People's Rights and Dignity - privacy, confidentiality
How to account for experimenter and participant expectancy? (2)
Blind and double-blind studies can help control for this.
- Blind = either the experimenter or the participant does not know which condition the participant is in
- Double-blind = neither knows; used mostly with medication, where it's easy for researchers not to know which medication is given
Probability sampling: Multistage sampling
Can combine different sampling strategies E.g. stratify based on criteria, then use systematic sampling (2 stages) E.g. Cluster sample, then cluster sample within that cluster, then random sample (3 stages) NY counties->School districts->teachers
Paired-samples t-test
Compare means from two related samples.
Includes matched random assignment and within-subjects designs -> takes into account that participants in the two conditions are similar (matched) or the same (within-subjects).
Matched random assignment -> matching participants on some characteristic.
This reduces error variance and increases power -> more similar groups leave less reason to attribute a difference to group differences, so true differences due to the IV are more likely to be found.
Contrived observation
Contrived observation - involves the observation of behavior in settings that are arranged specifically for observing and recording behavior - often conducted in laboratory settings in which participants know they're being observed (observers usually concealed behind one way mirror or behavior recorded and analyzed later) → problem is that people don't often respond naturally → reactivity - Can also be conducted in the real world, setting up situations outside of the lab to observe people's reactions e.g. helping behavior experiments
Correlational research
Correlational research is used to describe the relationship between two or more naturally occurring variables. Either direction could work; just linked
To guard against order effects, researchers use
Counterbalancing - systematically varying the order in which participants go through the conditions
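One common form, full counterbalancing, runs every possible ordering of the conditions across participants. A minimal sketch (the condition names are made up):

```python
from itertools import permutations

conditions = ["silence", "classical", "pop"]
orders = list(permutations(conditions))   # 3! = 6 possible orders
print(len(orders))  # 6

# Rotate participants through the orders so each order is used equally often
participants = range(12)
assignments = {p: orders[p % len(orders)] for p in participants}
```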
Factors that affect correlation: outliers
Definition: a data point is considered an outlier if it's more than 3 standard deviations away from the mean -> a point so far from the rest of your data that you question whether something went wrong.
- Can lie on or off the regression line
- Data are often reported with and without the outlier(s)
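The 3-standard-deviation rule as a small helper (made-up scores; note that with a sample standard deviation you need a reasonably large n, since a lone extreme point also inflates the SD):

```python
from statistics import mean, stdev

def outliers(data, cutoff=3):
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) > cutoff * s]

scores = [10, 12, 11, 9, 10, 11, 12, 10, 11, 10,
          9, 11, 10, 12, 11, 10, 9, 11, 10, 95]
print(outliers(scores))  # [95]
```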
Standard error of the difference between two means
Denominator of formula for t score -> how much the means would differ due to chance Estimated using a measure of the amount of variability within each group, averaged across groups
Ways to obtain data (6)
- Direct questioning - most commonly used
- Observation - natural vs. contrived; overt vs. covert
- Collateral reports
- Physiological monitoring - biological data
- Psychological assessment - IQ and personality tests
- Archival data
Probability sampling: Cluster sampling + 2 adv
Divide population into clusters based on location, randomly select within those clusters, obtain data from all individuals within the final clusters Adv: 1. A sampling frame of the population is not needed to begin sampling—only a list of the clusters 2. If each cluster represents a grouping of participants that are close together geographically (such as students in a certain county or school), less time and effort are required to contact the participants.
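A toy sketch of cluster sampling in Python (cluster and student names are made up): randomly pick clusters, then take everyone in the chosen clusters. Note that only a list of the clusters is needed, not a frame of the whole population:

```python
import random

clusters = {
    "school_A": ["Ann", "Ben", "Cam"],
    "school_B": ["Dee", "Eli"],
    "school_C": ["Fay", "Gus", "Hal", "Ivy"],
    "school_D": ["Jo", "Kay"],
}

random.seed(42)
chosen = random.sample(sorted(clusters), k=2)              # randomly pick 2 clusters
sample = [person for c in chosen for person in clusters[c]]  # everyone in them
print(chosen, sample)
```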
Error bars
Error bars show the standard error -> what we'd expect to see by chance.
- Small error bars = more confidence that results are due to the IV
- Big error bars = more likely the results are due to error variance
How do we know whether the means are really different from each other because of our IV, or different just due to chance?
Estimate how much the means should differ due to error variance even if the independent variable has no effect. If the observed difference exceeds this amount, then the independent variable may be having an effect.
We cannot be certain that the difference was caused by the independent variable, but we can estimate the probability that it was. When the probability of the result arising by chance is less than 5% (p < .05), we say the findings are statistically significant.
If the null hypothesis is true, then on average, across many repetitions of the study, the difference between the means should be 0.
Experimental research
Experiments are the best way to see whether one variable causes another variable
F-test (for ANOVA)
F = the ratio of the variance among conditions (between groups) to the variance within conditions (within groups).
The larger this ratio, the larger the calculated value of F, and the less likely that the differences among the means are due to error variance -> the higher the chance the result is statistically significant.
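A hand-rolled sketch of the F-ratio for three made-up groups with equal n, computed as mean-square between over mean-square within:

```python
from statistics import mean, variance

groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]  # three conditions (made up)
grand_mean = mean(x for g in groups for x in g)

k = len(groups)        # number of conditions
n = len(groups[0])     # per-group sample size (equal here)

ms_between = n * sum((mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
ms_within = sum(variance(g) for g in groups) / k  # pooled within-group variance

F = ms_between / ms_within
print(F)  # 27.0 -> between-group variability far exceeds within-group
```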
If the F-ratio is small (0 or 1): (2)
- F ≈ 0: the condition means barely vary at all
- F ≈ 1: the between-group variance equals the within-group variance
- The variability between the conditions' means is no different than would be expected due to chance
- We fail to reject the null hypothesis
Levels of the IV
For some IVs, there is a level that represents the absence of the IV -> participants assigned to this level are referred to as the control group.
- Control groups are useful when you want to know the baseline level of a behavior in the absence of the IV -> improves internal validity
Participants assigned to a nonzero level of the independent variable are referred to as the experimental group(s).
The difference between the experimental group and the placebo effect in the control group tells you how effective the treatment really is.
Confidence interval
Gives you a range where you think the actual population mean lies → useful because of sampling error; our sample doesn't look exactly like the population, but we can say the true value falls around this range.
- A smaller confidence interval (e.g. 45-47%) = more confidence; a larger range = less confidence
- As sample size (N) increases, the confidence interval should decrease → the ability to draw accurate inferences goes up, and statistical power goes up
- As the standard deviation goes up, confidence in the interval goes down
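A rough 95% confidence interval for a mean, using the normal approximation (z ≈ 1.96) and made-up scores; a larger n would shrink the standard error and narrow the interval:

```python
import math
from statistics import mean, stdev

scores = [46, 44, 47, 45, 46, 43, 48, 45, 46, 44]  # made-up data
m = mean(scores)
se = stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
ci = (m - 1.96 * se, m + 1.96 * se)          # rough 95% CI around the sample mean
print(ci)
```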
Relationship between accuracy of (null) hypothesis and the means of the conditions
If the experimental hypothesis is correct, there would be a difference between the means of the conditions If the null hypothesis is correct, there would be no difference between the means of the conditions
When might we want to set a different alpha? (3)
- If you're quite sure the study results will be statistically significant, you might lower alpha -> this reduces the chance of a Type I error but increases the risk of a Type II error -> there might be a real difference that you fail to catch because you're being extra conservative
- When you want to run many tests within one study, lower alpha so that the overall chance of a Type I error stays low
- The larger you set your alpha (to .10 or .15), the smaller your risk of a Type II error -> set it larger for an exploratory design
The IRB process
Institutional Review Board (IRB): the IRB reviews all research proposals involving human subjects to ensure that the principles of voluntary participation, harmlessness, anonymity, confidentiality, and so forth are preserved, and that the risks posed to human subjects are minimal.
Archival data collection, adv/useful for (5); dis (1)
Looking at old records - coding with content analysis.
Could be useful for:
• Studying social and psychological phenomena that occurred in the historical past → a glimpse of how people thought, felt, and behaved, by analyzing records from earlier times
• Studying social and behavioral changes over time
• Topics that inherently involve existing documents such as newspaper articles, magazine advertisements, or campaign speeches
• Studies that researchers can't plan in advance because the event must be studied after it has occurred e.g. riots, suicides, mass murders → we wouldn't know in advance whom to study as "participants"
• Phenomena that require a large amount of data about events occurring in the real world
Limits:
• Must make do with whatever measures are already available → concerns about the reliability and validity of the data
MANOVA
MANOVA tests differences between the means of two or more conditions on two or more dependent variables (can be one-way or two-way).
MANOVA is used:
- when the dependent variables are conceptually related to one another
- to avoid inflating Type I error by conducting multiple ANOVAs on several dependent variables
Interrupted Time Series Design with Multiple Replications Pro 1 Con 1
Multiple replications: Introduce and withdraw intervention/quasi-independent variable multiple times -> Pro: more confident and evidence that intervention works; more you see this pattern of reversal, the higher confidence you have with internal validity Cons: - possible carryover effect -> fatigue, order effect
Collateral information: third party reports (1 adv 2 dis)
Adv: no self-report bias.
Dis:
- Possible discrepancy between what one informant says compared to another
- Tends to be more expensive - need more respondents
Probability sampling considerations (2)
Nonresponse problem - failure to obtain responses from individuals that researchers select for the sample Ways to prevent: -Rewards/incentive -Keep it short/concise -Be persistent → negative reinforcement -Giving them a heads-up; let them know you'll be contacting them soon Misgeneralization - occurs when a researcher generalizes the results to a population that differs from the one from which the sample was drawn
Simple interrupted time series design Con of more than 1 post test
O1 O2 O3 O4 X O5 O6 O7 O8
Measure the dependent variable on several occasions before and after the quasi-independent variable occurs: more than one pretest (to establish a baseline) and more than one posttest.
Con of more than one posttest:
- The effect of the intervention might fade over time -> a time-limited IV effect
Interrupted Time Series with a Reversal Pro 1
O1 O2 O3 O4 X O5 O6 O7 O8 -X O9 O10 O11 O12
Shows the effects of the quasi-independent variable on the target behavior (at X) AND what happens to the target behavior when the quasi-independent variable is removed (at -X).
Pro: adds internal validity.
Reversal: take the intervention away; if behavior reverts back to baseline, the intervention likely caused the change.
Correlation coefficient (6)
- Often represented by the Pearson correlation coefficient (r)
- r is used when you have 2 continuous variables (interval or ratio)
- r ranges from -1 to +1, with the sign indicating direction; if r is negative, it's an inverse relationship
- 0 means absolutely no relationship; closer to 0 indicates a weaker relationship, closer to -1 or +1 indicates a stronger relationship
- If the data fall closer to the regression line, the magnitude of r is higher - a strong relationship
- The higher the correlation in the sample, the more likely it is statistically significant = the more likely you can say there's a correlation
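Pearson's r computed from its definition (covariance scaled by each variable's variability), with made-up paired scores:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-variation
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))           # spread of x
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))           # spread of y
    return cov / (sx * sy)

hours_studied = [1, 2, 3, 4, 5]
exam_score = [55, 60, 65, 70, 75]  # perfectly linear relationship (made up)
print(pearson_r(hours_studied, exam_score))  # ≈ 1.0
```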
Statistical significance: p-value
The p-value is the probability of getting results (a test statistic) at least as extreme as the ones observed, if the null hypothesis is true.
With correlation: if there is no correlation between the variables in the population, the p-value is the likelihood of finding a correlation as large as the one found in the study sample.
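The p-value idea can be illustrated with a permutation test (a resampling alternative to the table-lookup approach, shown here as a sketch with made-up data): how often would a mean difference at least as extreme arise if the group labels were assigned by chance?

```python
import random
from statistics import mean

def perm_p_value(a, b, sims=5000):
    random.seed(1)  # deterministic for the demo
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    extreme = 0
    for _ in range(sims):
        random.shuffle(pooled)  # pretend group labels are arbitrary
        fake_a, fake_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(fake_a) - mean(fake_b)) >= observed:
            extreme += 1
    return extreme / sims  # proportion of chance results at least as extreme

treatment = [8, 9, 7, 10, 9]
control = [5, 6, 4, 6, 5]
print(perm_p_value(treatment, control))  # well below .05 for these data
```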
How to lower reactivity (3)
Partial concealment strategy:
- researchers compromise by letting participants know they're being observed while withholding information about precisely which aspects of their behavior are being recorded
→ lowers, but does not eliminate, the problem of reactivity while avoiding ethical questions involving invasion of privacy and informed consent
Knowledgeable informants:
- researchers occasionally recruit people who know the participants well to observe and rate their behavior
Unobtrusive measures:
- measures that can be taken without participants knowing they are being studied; rather than asking participants questions or observing them directly, researchers assess behaviors and attitudes indirectly, without intruding in any way e.g. the contents of people's trash cans
Null hypothesis
Prediction you are making if your theory is NOT TRUE → need this to do formal statistical tests Depends on the original hypothesis: - If hypothesis is non-directional, then not the opposite, not negatively; just the lack of ... e.g. there is not a relationship - If hypothesis is directional, then say there is no positive/negative relationship...(you do NOT say "there is a negative correlation..." for positive hypothesis) -> leave open the possibility that there's a negative correlation e.g. Group A will not have a higher mean score than Group B e.g. There is no positive relationship between variable x and variable y
2 main types of sampling
Probability: - Every unit in population has known chance (non-zero probability) of being selected. - Therefore you can estimate how much your results are affected by sampling error - Doesn't mean everyone has the same chance of being selected; just that everyone has a chance Non-probability: - Some units of population have zero chance of being selected or probability for some units can't be determined - Therefore you can't estimate how sampling error affects your results - BUT this is not necessarily a problem when researchers are not trying to describe precisely what a population thinks, feels, or does
Probability sampling: pros and cons (2:2)
Pros: - Generally more representative of the population we're looking at - So can extrapolate and generalize our findings better Cons: - More time consuming and expensive - Need a sampling frame - a list of every potential person that we need for the study e.g. need a list of every person who's going to walk by the park where we're looking for participants; not possible; need to be there and actively count (nonprobability)
Naturalistic observation: adv (4) and dis (3)
Pros: • Can be a more objective form of data rather than just taking the participant's word for it, as with self-report • What we're really interested in is people's behavior; a good way to get that is through observation • High external validity - people are how they normally would be • Can observe people who can't articulate well e.g. babies Cons: • Observer bias - the observer knows what they're looking for, so they can skew the data unconsciously; focus on certain things • Reactivity - a change to how you'd normally be when you're being observed • Low internal validity - can't control for stuff; can't control how everyone experiences the environment in the same way
Random assignment
Random assignment is when we randomly assign participants to be in each condition in a between-subjects design Random assignment happens after you get your sample (NOT TO BE CONFUSED WITH RANDOM SAMPLING)! Unless we use RANDOM ASSIGNMENT, participants might vary on important characteristics at the start of the study Random assignment ensures that pre-existing characteristics are distributed randomly and evenly across conditions or groups -> no preexisting differences -> INITIAL EQUIVALENCE Therefore, the only thing that is different is the IV -> thus, we can attribute any differences in the DV across groups to the IV alone
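A minimal sketch of the idea (participant IDs and condition names are hypothetical): shuffle the already-collected sample, then deal people into conditions so chance alone decides placement:

```python
# Sketch of simple random assignment, done AFTER sampling.
import random

random.seed(42)
sample = [f"P{i:02d}" for i in range(1, 13)]    # 12 participants already sampled
conditions = ["treatment", "control"]

random.shuffle(sample)                           # chance alone decides placement
groups = {c: [] for c in conditions}
for i, person in enumerate(sample):
    groups[conditions[i % len(conditions)]].append(person)

for c, members in groups.items():
    print(c, len(members))                       # 6 per group; pre-existing
                                                 # differences spread by chance
```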
Probability sampling: Simple Random Sampling
Randomly selecting respondents from sampling frame; everyone has an equal probability of being selected and included in the sample
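A sketch with a hypothetical sampling frame: drawing without replacement via `random.sample` gives every entry on the frame an equal chance of selection:

```python
# Sketch of simple random sampling from a (hypothetical) sampling frame.
import random

random.seed(7)
sampling_frame = [f"student_{i}" for i in range(1, 501)]  # the whole population, listed
sample = random.sample(sampling_frame, k=50)              # 50 drawn without replacement
print(len(sample))  # 50
```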
Reliability coefficients: what do they tell us?
Range from 0 (no reliability) to 1 (perfect reliability - hypothetical) Closer to 1 = more reliable > .90 is very good, > .80 is good, > .70 is acceptable 0 = all error variance; no consistency in the measure The reliability coefficient represents the proportion of true-score variability e.g. A test with a reliability coefficient of .87 indicates that 87% of the variability is due to true variability vs. 13% due to measurement error
Factors that affect correlation: reliability of measures
Reliability of a Measure - the less reliable a measure is, the lower its correlations with other measures will be. If the true correlation between neuroticism in children and in their parents is .45, but you use an unreliable scale, the obtained correlation will not be .45 but something lower, pulled toward .00.
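The classical account of this is Spearman's attenuation formula: observed r = true r × √(reliability of x × reliability of y). A sketch with illustrative reliability values plugged into the .45 example:

```python
# Spearman's attenuation formula; the reliabilities below are made up.
from math import sqrt

def attenuated_r(true_r, rel_x, rel_y):
    # unreliable measures shrink the observed correlation toward zero
    return true_r * sqrt(rel_x * rel_y)

print(round(attenuated_r(0.45, 0.90, 0.90), 3))  # reliable scales: ~0.405
print(round(attenuated_r(0.45, 0.40, 0.40), 3))  # unreliable scales: ~0.18
```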
Non-probability sampling: Purposive
Researcher uses judgment to obtain sample based on what is thought to be appropriate "this person seems representative of the class"
Factors that affect correlation (3)
Restricted range Outliers Reliability of measures
Even if the study meets the vital properties of an experiment, we still have the issue of:
Sampling error! Even if you did no manipulation at all, the means of any two groups are likely to be different just due to chance Remember, we want to answer questions about the larger population, not just our sample
Probability sampling: Stratified sampling (proportional vs. disproportional)
Sampling frame divided into sub-groups based on some characteristic/demographic, then randomly sampled within each group Used more often → if you want your sample to be more representative of the overall population e.g. break the population down by race, and then within each race, choose a sample Pros: You ensure that you'll get appropriate proportions of the demographics you want, since simple random sampling could potentially just choose one race by chance and neglect the other races Proportional: your sample has the same proportions as the population; more representative of the overall population e.g. in a population that is 60% Caucasian, 20% AA, 20% HA, take 60% C, 20% AA, 20% HA Disproportional: sampling the subgroups out of proportion to the population, e.g. taking equal numbers from each subgroup regardless of its population share
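A sketch of the proportional version (the frame and strata below are hypothetical): random-sample within each stratum, with each stratum's sample sized to its share of the population:

```python
# Sketch of proportional stratified sampling.
import random

random.seed(3)
frame = ([("C", i) for i in range(600)]      # 60% of population
         + [("AA", i) for i in range(200)]   # 20%
         + [("HA", i) for i in range(200)])  # 20%

def stratified_sample(frame, total_n):
    strata = {}
    for person in frame:
        strata.setdefault(person[0], []).append(person)
    sample = []
    for group, members in strata.items():
        share = len(members) / len(frame)              # stratum's population share
        sample += random.sample(members, round(total_n * share))
    return sample  # note: rounding can drift the total by 1 for awkward shares

s = stratified_sample(frame, 100)
print(len(s))  # 100 total: 60 C, 20 AA, 20 HA
```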
Probability sampling: Systematic Sampling
Sampling frame is ordered by some criterion, and every nth item is selected (from a random starting point) Instead of randomly choosing, you choose every nth person from a list e.g. you'll choose every 6th person
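A sketch with a hypothetical 120-person frame: pick a random start, then take every 6th entry:

```python
# Sketch of systematic sampling: random start, then every nth entry.
import random

random.seed(5)
frame = [f"person_{i}" for i in range(1, 121)]  # ordered by some criterion
n = 6                                           # take every 6th person
start = random.randrange(n)                     # random starting point
sample = frame[start::n]
print(len(sample))  # 120 / 6 = 20
```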
T-tests
T-tests are used to test the difference between two means
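As a sketch of the underlying arithmetic (group scores are made up), the equal-variance independent-samples t statistic is the difference between the two means divided by its standard error:

```python
# Sketch of the independent-samples t statistic (equal-variance form).
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    na, nb = len(a), len(b)
    # pooled variance combines the two sample variances
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

group_a = [5, 7, 6, 8, 7]   # hypothetical scores, condition A
group_b = [3, 4, 5, 4, 3]   # hypothetical scores, condition B
print(round(t_statistic(group_a, group_b), 2))  # 4.43: a large mean difference
```

The larger the t (relative to the critical value for the degrees of freedom), the less likely the mean difference is due to sampling error alone.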
Psychological assessment (2 dis)
Takes a lot of training to be able to administer them - need to be a psychologist Time consuming - e.g. an IQ test can take 2 hours
Relationship between r value and P value
The higher the r value, the more confident you can be in your hypothesis (and the less confident in the null hypothesis), so the lower the p-value -> A low p-value means a low likelihood of obtaining these results if there is no correlation in the population A p-value of 0.04 means a 4% chance of results this extreme if the null is true; therefore, we logically infer that there is a correlation in the actual population - The probability of finding what you did if your null hypothesis is true -> low p-value = null hypothesis is rejected
Main effect
The main effect of an independent variable is the effect of that independent variable while ignoring the effects of all other independent variables in the design. A factorial design will potentially have as many main effects as there are independent variables e.g. 2 IV, up to 2 main effects are possible Look at class 17 to practice
How to interpret factorial designs
The number of digits means number of IV The number itself refers to the number of levels A "2 x 2 factorial" (read "2-by-2") is a design with two independent variables, each with two levels. = 4 groups in total A "3 x 3 factorial" has two independent variables, each with three levels. = 9 groups A "2 x 2 x 4 factorial" has three independent variables, two with two levels, and one with four levels. = 16 groups
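The group counts fall out of the Cartesian product of the factor levels; a sketch with hypothetical factor names for the 2 x 2 x 4 case:

```python
# Sketch: groups in a factorial design = Cartesian product of factor levels.
from itertools import product

factors_2x2x4 = [["drug", "placebo"],                      # IV 1: 2 levels
                 ["morning", "evening"],                   # IV 2: 2 levels
                 ["dose1", "dose2", "dose3", "dose4"]]     # IV 3: 4 levels
groups = list(product(*factors_2x2x4))
print(len(groups))  # 2 * 2 * 4 = 16 groups
```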
Independent variable
The variable that the researcher manipulates Each independent variable must have 2 or more levels The levels reflect differences in the type or amount of the independent variable: - Qualitative differences - differences in type e.g. organic or nonorganic meat - Quantitative differences - differences in amount (e.g. 50 grams, 100 grams and 150 grams of a drug) Levels of the IV are also referred to as conditions or groups
Covariate (3)
Things that you think might be related to your outcome, but they're actually not what you're studying Usually demographic factors that YOU CAN CONTROL FOR (therefore not a moderator) e.g. race, age, gender Not a confound → covariate you know/assume, so you control for it; confound you're not aware of it, not controlling for
Single-case experimental design/Single subject research pros 5 cons 3
Use case studies -> focus on a few individuals; a detailed study of a single individual, group, or event May use information from numerous sources: observation, interviews, questionnaires, news reports, and archival records All information is compiled into a narrative description Pros: - Can study a very unique case/person; can get rare info - Can get more info on how each individual reacts, which might be different than how individuals on average react - Can go much more in-depth - Not much control involved, so easier and cheaper - Easy way to do a pilot study on one person Cons: - Can't generalize it to the population -> only one person; low external validity - Hard to know if your treatment is helpful - low internal validity - High researcher bias
Deception
Use must be "justified" and the research not feasible without it Cannot be used to conceal prospective distress/pain (physical or emotional) Must be explained as soon as possible (debriefing)
Repeated-measures ANOVA
Used with repeated measures (within subjects) designs or matched random assignment Same logic as paired t-test design
Variability vs. Variance
Variability is the general term to describe how psychological phenomena differ across people, situations, contexts, and time Variance refers to the average squared deviation of each score from the mean in a given set of responses
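The "average squared deviation" definition can be checked directly against the standard library (scores below are made up):

```python
# Sketch: variance = average squared deviation from the mean.
from statistics import mean, pvariance

scores = [4, 8, 6, 2]
m = mean(scores)                                          # 5
manual = sum((s - m) ** 2 for s in scores) / len(scores)  # (1+9+1+9)/4
print(manual, pvariance(scores))  # 5.0 5
```

(`pvariance` is the population variance; dividing by n - 1 instead gives the sample variance used for inference.)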
If a F ratio is large and reflects a low p value, what does that mean?
We reject the null hypothesis that all the means are the same, and conclude that at least one group's mean is different from the other means
Type I and Type II error relationship
You always risk making one of these errors, every study involves a tradeoff between them
Type II error
a researcher fails to reject the null hypothesis when it is false beta - the probability of making a Type II error (and erroneously failing to find an effect that was actually present) you fail to reject a null when there is a difference AKA false negative Likelihood of making this error: Beta not something you get to decide YOU THINK YOU FOUND NO DIFFERENCE BETWEEN THE MEANS WHEN THERE ACTUALLY IS A DIFFERENCE.
Type I error
a researcher rejects the null hypothesis when it is true alpha - the probability of making a Type I error (and erroneously believing that an effect was obtained when it was actually due to error variance) find a difference in the study that causes you to think there's a difference in the world, when really there isn't AKA false positive Likelihood of making this error: Whatever you choose as your alpha - Called the significance criterion, or alpha (α) level - Rule of thumb, set at .05 -> if the null is true, there is only a 5% chance of finding a difference this large in the study YOU THINK YOU FOUND A DIFFERENCE BETWEEN THE MEANS WHEN THERE'S ACTUALLY NO DIFFERENCE.
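One way to see alpha as a long-run Type I error rate: when the null is true, p-values are uniformly distributed, so rejecting at .05 produces false positives in roughly 5% of studies. A simulation sketch:

```python
# Sketch: alpha = long-run rate of Type I errors when the null is true.
import random

random.seed(0)
alpha = 0.05
trials = 100_000
p_values = (random.random() for _ in range(trials))   # null-true p-values are uniform
false_positives = sum(p < alpha for p in p_values)    # studies that wrongly reject
print(false_positives / trials)                        # close to 0.05
```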
Within-subjects design/Repeated measure pros(2) cons(4) how to account for these effects (1)
an experimental design in which each participant serves in all conditions of the experiment Pros: Each person serves as their own control in a way You can have fewer participants; don't need as many Cons: Order/carryover effect - e.g. some residual caffeine from the last cup of coffee may still affect the next condition Fatigue - performance is going to worsen over time Practice effect - will probably do better as you learn the task Sensitization - over time, the participant becomes more aware of what the study is about How to account for these effects: - Counterbalancing - change the order that conditions are put in = randomly assign people to a specific order of the conditions; should cancel out the order effects
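A sketch of full counterbalancing (condition names are made up): enumerate every ordering of the conditions and cycle through them so each order is used equally often:

```python
# Sketch of counterbalancing condition orders in a within-subjects design.
from itertools import permutations

conditions = ["caffeine", "decaf", "water"]
orders = list(permutations(conditions))            # 3! = 6 possible orders
participants = [f"P{i}" for i in range(1, 13)]

assignments = {p: orders[i % len(orders)]          # cycle so every order is
               for i, p in enumerate(participants)}  # used equally often
print(assignments["P1"])  # ('caffeine', 'decaf', 'water')
```

With equal use of every order, order effects should cancel out across the groups rather than piling up in one condition.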
Factorial design pros (1)and cons(3)
an experimental design in which two or more independent variables are manipulated. Independent variables are referred to as factors. Pros: Get info about how one variable might affect the effect of another -> know how variables interact Cons: Harder to control and conduct the study More time consuming and expensive Need more participants
ANCOVA
an extension of ANOVA that provides a way of statistically controlling the effect of variables one does not want to examine in a study. These extraneous variables are called covariates, or control variables. Often used to account for pretest scores -> control for pretest scores to see if only IV has effect on results
Expericorr factorial/mixed factorial designs
are designs that include both independent variables (that are manipulated) and participant variables (that are measured). Not technically a true experiment, cuz one IV is quasi -> half experimental, half correlational Unless you're assigning the variable, you can't say it's causal; you can say moderated or associated e.g. Gender can never be said to cause the results
Nonequivalent groups pretest-posttest design Pros 2
both groups are measured before and after the quasi-independent variable O1 X O2 O1 -- O2 Pros: - Can compare changes in individuals - Can see if the groups differed initially
Nonequivalent groups posttest only design Cons 2 Pros 2
measure both groups after one receives the quasi-independent variable X O -- O Cons: - No idea how different the groups were before the intervention - Lower internal validity -> can't tell if differences are cuz of the quasi-independent variable or pre-existing group differences Pros: - Not sensitizing groups through a pretest - Good for one-time events
Positive skew
more low scores than high scores The skew is named for where the tail lies e.g. a tail on the right (toward high scores) means positively skewed
With multiple groups, primary threat to internal validity is:
non-equivalent groups Hope is that both control and treatment group will be affected by the threats to internal validity equally But realistically, because the groups are different, internal threats apply differently Non-equivalent group: non-random assignment or failure of random assignment.
Matched random assignment / Matched pairs design pros 2 cons 2
participants are matched into homogeneous blocks (on characteristics; similar to stratified sampling), and then participants within each block are assigned randomly to conditions Matched random assignment helps to ensure that the conditions will be similar along some specific dimension, such as age or intelligence. Typically matched on variables likely to affect the dependent variable Number of people in a block = number of conditions; one goes into each condition Pros: - can ensure equivalence on an important attribute - same number of a certain type of characteristic in each group -> doesn't mean that it makes the sample more representative; just makes the groups more equivalent Cons: - still differences between the groups (can't control everything; can't match people up perfectly) - hard/time consuming
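A sketch of the procedure (names and matching scores are hypothetical): sort on the matching variable, cut into blocks the size of the number of conditions, then randomize within each block:

```python
# Sketch of matched random assignment with two conditions, matching on IQ.
import random

random.seed(9)
# (participant, IQ score used for matching)
people = [("P1", 88), ("P2", 131), ("P3", 95), ("P4", 104),
          ("P5", 120), ("P6", 99), ("P7", 112), ("P8", 127)]
n_conditions = 2

people.sort(key=lambda p: p[1])                  # order by the matching variable
treatment, control = [], []
for i in range(0, len(people), n_conditions):
    block = people[i:i + n_conditions]           # homogeneous pair
    random.shuffle(block)                        # chance decides within the block
    treatment.append(block[0])
    control.append(block[1])

print(len(treatment), len(control))  # 4 4: similar IQ spread in both groups
```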
Simple random assignment
participants are placed in experimental conditions in such a way that every participant has an equal probability of being in any condition When it's not explicitly said how they assigned the participants, it's simple random assignment
Rejecting the null hypothesis
researcher concludes that the null hypothesis is wrong and that the independent variable had an effect on the dependent variable
Three-way designs examine:
the main effects of three independent variables three two-way interactions - the A X B interaction (ignoring C), the A X C interaction (ignoring B), the B X C interaction (ignoring A). The three-way interaction of A X B X C Up to 3 main effects Up to 3 interactions
Quasi-experimental
the researcher lacks control over the assignment of participants to conditions and/or does not manipulate the variable of interest. Pre-existing differences that you can't assign or can't ethically assign e.g. Gender, race, ethnicity, SES; things you can't ethically assign e.g. mental illness, eating habits Main concern is internal validity: since you can't assign participants to groups, they already are in one e.g. Does smoking cause cancer? e.g. Did 9/11 cause an increase in prejudice against people of middle-eastern descent?
Item Formats styles (7)
• Closed form - given specific answers; choose answers from what you're given → easier to answer and score • Open form - you can come up with your own answers → harder to score and answer; but better data Pilot studies use open-ended questions more to find out what answers people may give • Dichotomous format - two choices; true/false; yes/no Used in educational tests and some personality tests e.g. MMPI-2 - personality test • Polytomous format More than 2 answer choices e.g. Multiple choice exam Good cuz you can still guess correctly, but it's very unlikely • Scaled formats Series of gradations, levels, or values that describe various degrees of something e.g. Likert scale - indicate degree of agreement with a particular attitudinal question e.g. strongly disagree, disagree, neutral, agree, strongly agree • Rankings Get a sense of what you like more or less based on ranking • Completion/fill in the blank More open-ended, but still some structure to it in completing a response