Exam 4 - Research Methods
Reasons for poor replicability
- Reluctance to publish null results - small samples and low power - unknown moderators - questionable research practices
Calculating effect size for MA
A common effect size metric (e.g., d or r) is computed for every study, each study is weighted by its sample size, and effect sizes can also be calculated for subgroups
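A minimal Python sketch of the weighting step, assuming each study reports a Cohen's d and a total sample size. The study values are hypothetical, and the function name weighted_mean_effect is illustrative, not taken from any meta-analysis package.

```python
# Hypothetical helper: N-weighted mean effect size across studies.

def weighted_mean_effect(effects, sample_sizes):
    """Average effect size, weighting each study by its sample size."""
    if not effects or len(effects) != len(sample_sizes):
        raise ValueError("need one sample size per effect size")
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(effects, sample_sizes)) / total_n

# Hypothetical studies as (Cohen's d, total N)
studies = [(0.20, 40), (0.50, 120), (0.35, 80)]
ds = [d for d, _ in studies]
ns = [n for _, n in studies]
print(round(weighted_mean_effect(ds, ns), 3))  # 0.4
```

Larger studies pull the pooled estimate toward their own effect, so the 120-person study dominates here.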
Respect for persons
Confidentiality, right to withdraw, debriefing, voluntary participation, informed consent
Advantages of MA/VG
- Better estimates of population parameters - Resolution of theoretical debates fueled by "mixed findings" - Science progresses faster because evidence for or against theoretical propositions becomes clearer
Problems with MA
- "File drawer problem": not all studies are published or accessible (nonsignificant results often go unpublished), which biases the pooled effect sizes - power may still be inadequate if there are too few studies and they have small samples - can't overcome methodological flaws in the original studies
Distribution of effect sizes
- If all samples were drawn from the same population of effects, variability among effect sizes should be due only to sampling error - if homogeneity is supported (the test of homogeneity is nonsignificant), variability can be attributed to sampling error, and the weighted effect sizes are averaged to estimate the overall population effect size - if heterogeneity exists, studies are grouped by potential moderators and re-tested - heterogeneity may also be due to outliers, which can be trimmed
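A sketch of one common homogeneity test, Cochran's Q, assuming each study supplies an effect size and its sampling variance. Unlike the sample-size weighting described for calculating effect sizes, Q here uses inverse-variance weights, which is what makes it approximately chi-square distributed with k - 1 df. The data and the small table of .05 critical values are for illustration only.

```python
# Sketch of Cochran's Q homogeneity test (hypothetical data).

# Chi-square critical values at alpha = .05, keyed by df (extend for more studies)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def cochran_q(effects, variances):
    """Return (Q, df): weighted squared deviations from the weighted mean effect."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effects))
    return q, len(effects) - 1

def is_homogeneous(effects, variances):
    """True if Q is nonsignificant at .05, i.e., spread is consistent with sampling error."""
    q, df = cochran_q(effects, variances)
    return q < CHI2_CRIT_05[df]

similar = [0.30, 0.35, 0.40, 0.32]
spread = [0.00, 0.90, 0.10, 0.80]
print(is_homogeneous(similar, [0.04] * 4))   # True: pool into one estimate
print(is_homogeneous(spread, [0.01] * 4))    # False: look for moderators/outliers
```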
Coding studies for key characteristics
- Key characteristics are anything that could account for variance in effect sizes, i.e., potential moderators
Low power problem
- Many studies are greatly underpowered - failures to reject the null are often the result of low power and produce "mixed findings", which can discourage further exploration
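A small simulation of this point under simplifying assumptions: normal data with SD known to be 1 (so a two-tailed z test stands in for the usual t test), a true effect of d = 0.3, and hypothetical group sizes. Every simulated study samples from a population where the effect is real, yet with small groups only a minority reach significance, which is how a literature ends up looking like "mixed findings".

```python
import random

def simulate_studies(true_d=0.3, n_per_group=20, n_studies=200, seed=1):
    """Fraction of simulated studies reaching p < .05 (two-tailed z test, SD known to be 1)."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5   # SE of a mean difference when both SDs are 1
    significant = 0
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (sum(treatment) / n_per_group - sum(control) / n_per_group) / se
        if abs(z) > 1.96:
            significant += 1
    return significant / n_studies

print(simulate_studies())                  # small samples: most studies "fail"
print(simulate_studies(n_per_group=400))   # large samples: nearly all succeed
```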
P values (cont.)
- p is not a sliding scale: choose alpha before the analyses and stick with it - there is no such thing as a "trend towards significance" - significance is a rule-in vs. rule-out decision: accept the level of Type I error you are willing to commit and stick with it
Conducting the NHST
- Calculate the t statistic - compare it to its sampling distribution to see whether it exceeds the critical value - the critical value depends on df, alpha, and whether the test is one- or two-tailed - if the statistic does not exceed the critical value, retain the null
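The steps above as a Python sketch, assuming a pooled-variance independent-samples t test and hypothetical scores. The critical values are typed in from a standard t table (df = 8, alpha = .05) rather than computed, since Python's standard library has no t distribution; the function names are illustrative.

```python
import statistics

def independent_t(group1, group2):
    """Pooled-variance independent-samples t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

def nhst_decision(t, critical_value, two_tailed=True):
    """Reject H0 only if the statistic exceeds the critical value fixed in advance."""
    statistic = abs(t) if two_tailed else t
    return "reject H0" if statistic > critical_value else "retain H0"

treatment = [5, 6, 7, 8, 9]   # hypothetical scores
control = [3, 4, 5, 6, 7]
t, df = independent_t(treatment, control)
print(t, df)                                         # 2.0 8
print(nhst_decision(t, 2.306))                       # two-tailed, alpha = .05, df = 8
print(nhst_decision(t, 1.860, two_tailed=False))     # one-tailed, alpha = .05, df = 8
```

With these made-up scores the same t = 2.0 is retained under the two-tailed test but rejected under the one-tailed test.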
types of nonprobability sampling methods
- convenience, accidental or haphazard - purposive - snowball
Types of probability sampling
- simple random sampling, systematic, stratified, cluster
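A sketch of the four probability sampling methods using Python's random module, assuming a hypothetical sampling frame of 100 member IDs. The strata and the clusters (imagined here as classrooms of 10) are invented for illustration.

```python
import random

# Hypothetical sampling frame: 100 member IDs
population = list(range(1, 101))

def simple_random(frame, n, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(frame, n)

def systematic(frame, n, rng):
    """Random start, then every k-th member of the frame."""
    k = len(frame) // n            # sampling interval
    start = rng.randrange(k)       # random start within the first interval
    return frame[start::k][:n]

def stratified(strata, n_per_stratum, rng):
    """Random selection within each predefined stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)
strata = {"first_year": population[:60], "upper_year": population[60:]}
classrooms = {f"room_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}
print(simple_random(population, 10, rng))
print(systematic(population, 10, rng))
print(stratified(strata, 5, rng))
print(cluster(classrooms, 2, rng))
```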
Problems with NHST
- The assumption that there is exactly no difference or zero correlation (the null) is probably almost never literally true - given a large enough sample size, almost any difference will be significant - NHST does not address Type II errors (saying there is no effect when there is one), and we are more likely to commit them with NHST
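A sketch of the large-sample point, assuming a trivially small observed effect (d = 0.05) held constant while group size grows, and using the normal approximation z = d * sqrt(n / 2) for two equal groups rather than an exact t test.

```python
from statistics import NormalDist

def two_tailed_p(d, n_per_group):
    """Approximate two-tailed p for an observed standardized difference d between
    two equal groups, using the normal approximation z = d * sqrt(n / 2)."""
    z = d * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (20, 200, 2000, 20000):
    print(n, round(two_tailed_p(0.05, n), 4))
```

The effect never changes, yet p drops below .05 once the groups are large enough (here somewhere between 2,000 and 20,000 per group).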
Meta Analysis
- Examines effect sizes across a number of different studies looking at the same IV and DV - estimates population effect sizes by averaging effect sizes - attempts to overcome the low power problem by making use of many samples - Validity generalization (VG) studies examine the validity of a particular measure across a number of studies
Two tailed tests
- Require larger samples because we are testing in both directions: alpha is split between the two tails, which makes the critical value larger - usually choose two-tailed because we want to interpret an effect regardless of its direction; choose one-tailed for intervention research, where only one direction is of interest
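A quick check of the critical values with Python's statistics.NormalDist, assuming a z test at alpha = .05: putting all of alpha in one tail gives a smaller cutoff than splitting it across two tails.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()                               # standard normal distribution
one_tailed_crit = z.inv_cdf(1 - alpha)         # all of alpha in one tail
two_tailed_crit = z.inv_cdf(1 - alpha / 2)     # alpha split across both tails
print(round(one_tailed_crit, 3), round(two_tailed_crit, 3))  # 1.645 1.96
```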
Three questions about relationship between IV and DV
1. What is the nature of the relationship between the independent and dependent variables? 2. Is the relationship real, or is it due to chance? 3. How big an effect did the independent variable have?
stratified random sampling
A form of probability sampling in which the researcher identifies particular demographic categories (strata) of interest and then randomly selects individuals within each category - can be used to oversample underrepresented populations (e.g., by race) - the strata are combined to form a representative sample
probability sampling
A type of sampling in which every element in the population being studied has a known chance of being selected for study
Goal of sampling
Obtain a sample that represents the population in the right proportions so you can generalize study findings. Different sampling techniques vary in how likely they are to achieve this goal.
Sources of Type II errors
Bad research design (poor construct validity, weak manipulation/unreliable variables, failure to control extraneous variables, failure to test for curvilinear relationships, failure to test for moderators) - Low power
Confidence Intervals as an alternative to NHST
CI focuses on estimating the actual effect of interest - and the degree of uncertainty about what it really is - Researchers who see results presented as CIs are more likely to make correct inferences if they think in terms of estimating the effect size rather than NHST
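A sketch of a 95% CI for the difference between two independent means, assuming large samples so that the normal critical value (about 1.96) stands in for the t critical value. The summary statistics are hypothetical. The interval reports both the estimated effect and the uncertainty around it.

```python
from statistics import NormalDist

def mean_diff_ci(mean1, mean2, sd1, sd2, n1, n2, level=0.95):
    """Large-sample CI for the difference between two independent means."""
    diff = mean1 - mean2
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical summary statistics: treatment vs. control
low, high = mean_diff_ci(105, 100, 15, 15, 100, 100)
print(f"difference = 5, 95% CI [{low:.2f}, {high:.2f}]")
```

Here the effect is plausibly anywhere from under 1 point to about 9 points, which says more than "p < .05" alone.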
Types of effect size
Cohen's d: difference between the means divided by the (pooled) SD - Percentage of variance explained, e.g., R², eta squared, etc.
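A sketch of both kinds of effect size, assuming Cohen's d uses the pooled SD of two independent groups and eta squared is SS_between / SS_total for a one-way design. The scores are hypothetical.

```python
import statistics

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = (((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

def eta_squared(*groups):
    """Proportion of total DV variance explained by group membership."""
    scores = [x for g in groups for x in g]
    grand = statistics.mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

treatment = [5, 6, 7, 8, 9]   # hypothetical scores
control = [3, 4, 5, 6, 7]
print(round(cohens_d(treatment, control), 3))     # 1.265
print(round(eta_squared(treatment, control), 3))  # 0.333
```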
proportionate stratified sampling
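The sample size for each stratum is proportional to that stratum's share of the population (e.g., a group that is 20% of the population makes up 20% of the sample)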
disproportionate stratified sampling
Strata are sampled at rates that differ from their population proportions - necessary when oversampling certain populations, e.g., minorities, to get enough cases for analysis
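A sketch contrasting the two allocation schemes, assuming hypothetical population counts for three groups and a total sample of 300. Equal allocation is used here as one example of a disproportionate design; the sampling fractions show that the smallest group is oversampled.

```python
def proportionate_allocation(pop_sizes, total_n):
    """Each stratum's share of the sample matches its share of the population."""
    pop_total = sum(pop_sizes.values())
    # in general, rounding can make the parts miss total_n by a case or two
    return {s: round(total_n * size / pop_total) for s, size in pop_sizes.items()}

def equal_allocation(pop_sizes, total_n):
    """One disproportionate design: the same n in every stratum."""
    per_stratum = total_n // len(pop_sizes)
    return {s: per_stratum for s in pop_sizes}

def sampling_fractions(allocation, pop_sizes):
    """Share of each stratum's population that ends up in the sample."""
    return {s: allocation[s] / pop_sizes[s] for s in pop_sizes}

# Hypothetical population sizes for three groups
pop = {"group_A": 8000, "group_B": 1500, "group_C": 500}
prop = proportionate_allocation(pop, 300)
equal = equal_allocation(pop, 300)
print(prop)    # {'group_A': 240, 'group_B': 45, 'group_C': 15}
print(equal)   # {'group_A': 100, 'group_B': 100, 'group_C': 100}
print(sampling_fractions(equal, pop))
```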
Exploring moderators in MA
If effect sizes differ significantly between subgroups, the variable that defines the subgroups is probably moderating the effect
Pitfalls of sampling
If sampling procedure is flawed, could end up with a sample that is not representative of general population
data torturing
If you analyze data enough, it will tell you what you want to hear - multiple testing inflates the Type I error rate - are subset differences real or chance findings?
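A sketch of why multiple testing is a problem, assuming k independent tests where every null hypothesis is true: the chance of at least one Type I error is 1 - (1 - alpha)^k.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive across k independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

With alpha = .05 this is about .23 for 5 tests, .40 for 10, and .64 for 20.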
Real vs. Chance association
Inferential statistics tell us how likely our observed results would be if they were due to chance (sampling error) alone rather than to the effect of the IV.
effect size (magnitude)
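The more of the variability in the DV that the IV accounts for, the larger the effect: little explained variability = small effect size, a lot of explained variability = large effect size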
Times when accepting the null is the goal
- Mediator analysis: no effect of the IV after controlling for the mediator (M) - ruling out confounds - discriminant validity - Be careful: without enough power, accepting the null doesn't mean anything
Effect size uses
- Metric of practical significance: how much the IV influences the DV - effect sizes are standardized, so they can be compared across multiple studies - can be used to determine power for a study
How big is the effect?
NHST tells us nothing about how big/important the size of the effect is. Effect size is the magnitude or size of the association, how much impact the IV has on the DV
Determining power
Power is 1 - beta, where beta is the probability of a Type II error (concluding there is no effect when there is one) - higher power requires larger samples; researchers typically strive for 80%
Ethical principles
Respect for persons, beneficence, justice
Beneficence
Risk-Benefit Analysis • Monitoring for Harms • Alleviating adverse effects • Debriefing • Confidentiality
Nonprobability sampling
Selection is systematic or haphazard, but not random.
Practical significance and small effects
- Small effects add up over time - weak manipulations sometimes produce small effects, so a stronger manipulation might produce a larger effect
Null hypothesis testing
Statistical test to determine whether the results could be due to sampling error - the goal is to reject the null, which is assumed to be true by default: there is no difference between groups (t/F test) or the correlation coefficient is 0 (regression) - "accepting" (retaining) the null only means that you did not find evidence that it is false
Guidelines of strong research
- Strong theoretical foundation - devise and stick to a priori data collection and analyses - decide on reasons for data trimming in advance - avoid HARKing (hypothesizing after the results are known) - report all the results, not just the significant ones - double-check results for accuracy
Consequences of low power
Low-powered studies that happen to be significant and make it into the literature yield distorted (typically inflated) effect sizes
Interrelationships of power
- The Type II error rate (beta) depends on power: beta = 1 - power - power depends on alpha (the accepted Type I error rate): as alpha becomes more conservative (smaller), power decreases - as sample size increases, power increases - as effect size increases, power increases
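A sketch of these relationships, assuming a two-group comparison with equal n and a normal approximation (exact t-based power is slightly lower for small samples). Each printed line changes one input relative to a hypothetical baseline of d = 0.5, n = 64 per group, alpha = .05; beta is simply 1 minus each value.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05, two_tailed=True):
    """Normal-approximation power for comparing two equal-sized groups.
    Ignores the tiny chance of rejecting in the wrong tail."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    expected_z = d * (n_per_group / 2) ** 0.5   # mean of the test statistic if the effect is real
    return 1 - z.cdf(crit - expected_z)

print("baseline ", round(approx_power(0.5, 64), 3))
print("alpha .01", round(approx_power(0.5, 64, alpha=0.01), 3))
print("n = 128  ", round(approx_power(0.5, 128), 3))
print("d = 0.8  ", round(approx_power(0.8, 64), 3))
```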
data snooping
Running multiple analyses until something is significant and then publishing those results ("torturing the data until they confess") - also includes looking at the data before the experiment is complete
Practical Significance, effect size
A value judgment about how useful a finding is for theory or clinical practice - the criterion for practical significance is the minimum impact considered important in that area of research
Justice
Voluntary participation, compensating control groups, informed consent, equitable sharing of risks and benefits
Determining sample size and power
You need: - the effect size you expect to find - the Type I error rate (alpha) you will set - whether the statistical test will be one-tailed or two-tailed - the amount of power you want for detecting the effect (i.e., 1 - Type II error rate)
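A sketch that turns these four ingredients into a sample size, assuming two equal groups and the normal approximation n = 2((z_alpha + z_beta) / d)^2 per group. Dedicated tools such as G*Power use the t distribution and give slightly larger answers (about 64 per group for d = 0.5, two-tailed, alpha = .05, power = .80).

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate n per group to detect effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)          # power = 1 - Type II error rate
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))                      # 63
print(n_per_group(0.5, two_tailed=False))    # 50
print(n_per_group(0.2))                      # small effects need far more people
```

The one-tailed result is smaller, consistent with two-tailed tests requiring larger samples.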
purposive sampling
a biased (nonprobability) sampling technique in which only certain kinds of people are included in a sample - identify typical cases through the literature and experts - Problem: the proportion of these cases in the population is unknown
nonprobability sampling
a sampling technique in which there is no way to calculate the likelihood that a specific element of the population being studied will be chosen - likely to misrepresent population, no way to tell if it does or not
data trimming
consists of changing data values so that they better fit the predictions made by the research hypothesis
Simple random sampling
every member of population has an equal chance of being selected
Snowball
people forward the survey to people they know - good for hard-to-reach populations
Convenience & accidental sampling
people who are easily accessible, e.g., college students - Problem: no evidence of representativeness
P values
probability of obtaining results at least as extreme as yours through sampling error alone, assuming the null hypothesis is TRUE - the size of p is not the size of the effect (e.g., p = .001 is not a bigger effect than p = .05) - it is a measure of rarity and says nothing about how big or important the effect is
sampling distribution
the distribution of values taken by the statistic in all possible samples of the same size from the same population - If you repeatedly took two samples of size n from the same population and computed the difference between the two means divided by the SE, those differences would form the sampling distribution of the t-statistic - sampling distributions will depend on the size of the sample - A sampling distribution tells you what percentage of samples (or differences between two samples) will exceed any particular value
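A simulation of this definition, assuming two normal populations with identical means (so the null is true), n = 10 per group, and 5,000 replications; all settings are hypothetical. About 5% of the simulated t statistics should exceed the two-tailed .05 critical value for df = 18 (2.101, typed in from a t table).

```python
import random

def t_stat(a, b):
    """Pooled-variance independent-samples t statistic."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

def null_t_distribution(n=10, reps=5000, seed=42):
    """Draw two samples from the same population many times and record t."""
    rng = random.Random(seed)
    ts = []
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        ts.append(t_stat(a, b))
    return ts

ts = null_t_distribution()
crit = 2.101  # two-tailed .05 critical t for df = 18
share = sum(abs(t) > crit for t in ts) / len(ts)
print(round(share, 3))  # close to 0.05
```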
Data dredging
the inappropriate (sometimes deliberately so) use of data mining to uncover relationships in data that may be misleading.