Statistical Power & Sample Size
Power can be given as
a percentage, a probability, or a proportion
Effect Size
A basic, commonly used between-groups effect size is Cohen's d
- An effect size corresponding to the difference between 2 group means
- Can also be used in within-groups designs
- d = (M1 - M2) / SD: the difference between the 2 means divided by the pooled SD for the two groups
- Percentile standing = the control-group percentile corresponding to the mean of the treatment group
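The formula above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name and the group means, SDs, and sizes are made-up example values:

```python
import math

def cohens_d(m1, m2, sd1, sd2, n1, n2):
    """Difference between two group means divided by the pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical groups: equal SDs of 10, means 5 points apart
d = cohens_d(105, 100, 10, 10, 30, 30)
print(round(d, 2))  # 0.5 -- a "medium" effect by Cohen's conventions
```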
Calculating power
The correlation coefficient (i.e. Pearson's r) is a simple measure of effect size
- Varies between -1 and +1
- Not dependent on the unit of the measurement scale
- Important point: r is just one measure of effect size; there are many different measures depending on the research design and statistical test being used
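As a minimal illustration of why r is unit-free, it can be computed by hand as the covariance scaled by the two SDs. The function name and the data below are invented for the example:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the two SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 2))  # close to +1: strong linear relationship
```

Because both the numerator and denominator carry the measurement units, they cancel, which is why r does not depend on the scale of measurement.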
Power in contemporary psychology
Psychology and related disciplines are in the midst of a statistical crisis
- Many researchers are using poor statistical practices; many results have not been replicated and/or are not replicable
- This severely limits the knowledge base of our field
violations of assumptions underlying tests will
reduce power. Non-parametric tests (not covered in this module), which do not have as many assumptions, may be more powerful in such cases.
As sample size increases, power increases
with effect size (Cohen's d) = 0.5, α = .05
As effect size increases, total sample size required to achieve power goes down
with power = 0.8, α = .05
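That trade-off can be sketched with the standard normal approximation for a two-tailed, two-sample comparison; this is a rough sketch (the exact t-based answers are slightly larger), and the function name is mine:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group: 2 * ((z_alpha/2 + z_power) / d)^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))  # required n per group shrinks as d grows
```

For d = 0.5 this gives about 63 per group, close to the conventional t-based answer of roughly 64.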
Type I error
- False positives: stating an effect is present when in reality there is no effect
- Occur with probability α, the conventional value used for significance testing (e.g. α = .05, .01)
Type II error
- Misses: stating there is no effect (failing to reject the null hypothesis) when there is actually an effect
- Occur with a probability of β
We can estimate effect size from:
- Previous research: use sample means and SDs obtained from previous studies. Meta-analyses (analyses that incorporate a large number of studies to examine the evidence for, and size of, an effect) can be especially useful.
- Researcher's estimate of an important effect: the researcher decides on the minimum important difference between means. Still need to estimate the SD.
- Conventional labels of effect-size magnitude for d originally provided by Cohen (small = 0.2, medium = 0.5, large = 0.8)
Power is a function of the following factors:
- The probability of a Type I error (α)
- The magnitude of the effect assuming H1 (i.e. the mean difference)
- Sample size
- Type of statistical test used
- Whether a one- or two-tailed test is used
- How well the data satisfy the test assumptions
A conventional value of power = _______ is often desired
0.80. Note that increases in power beyond this value are often bought only at the cost of very large sample sizes.
An example using correlation:
Calculate or estimate r. We then decide on our value of α, and we can use statistical tables (e.g. see Cohen, 1992) or online calculators etc to calculate power.
If power equals 0.60, then we can say the researcher has a 60% chance of correctly rejecting H0 if it is false (i.e., if there is a real linear relationship between the two variables).
- The typical H0 in psychology is that r = 0
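One common way to approximate this calculation is via Fisher's z transform of r. This is a sketch under that approximation, not an exact method; the r and n values are illustrative:

```python
import math
from statistics import NormalDist

def power_for_r(r, n, alpha=0.05):
    """Approximate power for a two-tailed test of H0: rho = 0 (Fisher z transform)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    delta = math.atanh(r) * math.sqrt(n - 3)  # noncentrality under H1
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

# A "medium" correlation of .30 with n = 85 gives power close to .80,
# in line with the sample sizes tabled in Cohen (1992)
print(round(power_for_r(0.30, 85), 2))
```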
Factors affecting power [1]
Decreasing the threshold for significance (ie α) will decrease power. Increasing the threshold will increase power, but also the probability of a Type I error.
Practical note: when estimating your effect size for a study you plan to run, it is always best to err on the side of caution and make a conservative estimate (i.e. underestimate it). Why?
If you overestimate your effect size, then you may be underpowered
Factors affecting power [2]
Power will increase with larger effect sizes (i.e. larger magnitudes of effect between distributions)
Factors affecting power [3]
Increasing sample size will decrease the variance of the sampling distributions, thereby increasing power
Why should we worry about power?
We are often interested in knowing the sample size needed to achieve an adequate level of power before we begin a study or research program. • This often has important implications for planning time, resources etc when designing a study.
How do we calculate power?
We need to know (or estimate) the following ingredients:
- Effect size (e.g. Cohen's d, Pearson's r)
- Sample size
- Significance level (α)
- Whether the statistical test is 1-tailed or 2-tailed
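Given those ingredients, power for a two-sample comparison can be sketched with the standard normal approximation. The function and the d, n, and α values below are assumptions for illustration, not an exact t-based calculation:

```python
import math
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05, tails=2):
    """Approximate power from effect size d, n per group, alpha, and tails."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / tails)
    delta = d * math.sqrt(n_per_group / 2)  # noncentrality under H1
    power = nd.cdf(delta - z_crit)
    if tails == 2:
        power += nd.cdf(-delta - z_crit)  # tiny contribution from the other tail
    return power

# d = 0.5 with 64 participants per group, two-tailed alpha = .05
print(round(power_two_sample(0.5, 64), 2))  # about 0.81 by this approximation
```

Changing any one ingredient (larger d, larger n, more lenient α, or a one-tailed test) moves the result, which is exactly the dependence the notes describe.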
Power tests can be used either ________________ or _________________
a priori or post hoc. Determining the sample size needed before a study takes place is an example of controlling statistical power a priori.
When assumptions hold, non-parametric tests usually have
less power
Power has a prominent position in current discussions
- Many studies are underpowered
- Increasing statistical power is regarded as a critical step toward moving past, and learning from, the current crisis
- A priori power analyses are not required by many journals or funding agencies
Statistical power example
Null hypothesis: the 2 means don't differ
Alternative hypothesis: they differ
[see pt. 2]
statistical power
Power = 1 - β: the probability of correctly rejecting a false H0 (null hypothesis, e.g. there is no difference between 2 groups). In other words, the probability of detecting an effect that is really there.
β =
probability of making a Type II error (i.e., incorrectly failing to reject H0 [stating 2 groups don't differ when they actually do])
The best way to approximate the value of a meaningful effect size is to look at
the immediate research area. What is considered to be a meaningful effect size will vary considerably across subfields of psychology and germane disciplines (psychiatry, neuroscience).
conventional labels (and obviously measures of effect size) will vary as a function of
the type of test, the study design, the measures used, etc.
Power is also a function of
the underlying assumptions of parametric tests
We can plot the relationships between factors like effect size, power and sample size to visualise ___________________________________________
their dependence on each other
Analysis outcomes can be parsed into a 2x2 table based on ______ _______ and __________________
true state; decision
Post hoc power analyses can be used to examine
whether a statistical test had a fair chance of rejecting an incorrect H0. Importantly, in this case the measure of effect size should be based on the population effect size.
Sometimes post hoc power analyses are reported using sample estimates of the effect size (sometimes called 'observed power').
- Easily done in SPSS
- This use is generally frowned upon, as sample effect sizes are biased estimates of population effect sizes.