Statistical Power

Limitations

Common statistical tests have easy-to-use power equations, but their use can be limited:
•They are based on statistical assumptions and expect certain data characteristics.
•Complications in the statistical analysis can render simple power equations unhelpful.
One way to address this is to conduct power analysis via simulation.

Power Analysis

For every statistical test, a power analysis involves four inter-related quantities:
•Power (1 - Type II error rate).
•Significance level.
•Effect size.
•Sample size.
For many standard statistical tests, if you know three of these quantities, you can calculate the fourth. There are a few reasons to conduct a power analysis (each is illustrated in the sketch below):
•If you have a target power and an expected effect size, you can obtain the sample size needed to confidently detect the anticipated effect.
•If you have a sample size and a target power, you can determine the smallest effect size that can be reliably detected.
•If you have a sample size and an effect size, you can estimate the observed power of the test.
Using power analysis to determine the appropriate sample size:
•If you have an estimate of the effect size you wish to detect, you can calculate the smallest sample size that could reliably detect that effect.
•Conducted before data collection, when developing the research design and analysis plan.
•Required for research/grant proposals.
•Demonstrates an understanding of research design.
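A minimal sketch of these three uses, via the analytic power solver in the statsmodels library for an independent-samples t-test. The effect size (Cohen's d = 0.5) and sample sizes are illustrative assumptions, not values from the notes:

```python
# Given any three of effect size, alpha, power, and sample size,
# solve_power() returns the fourth (leave the unknown one unset).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 1. Sample size needed to detect d = 0.5 with 80% power at alpha = .05
n_per_group = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"Required n per group: {n_per_group:.1f}")  # ~64 per group

# 2. Smallest effect reliably detectable with n = 50 per group at 80% power
min_d = analysis.solve_power(nobs1=50, power=0.80, alpha=0.05)
print(f"Smallest detectable effect: d = {min_d:.2f}")

# 3. Observed power for n = 30 per group and d = 0.5
power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"Power: {power:.2f}")
```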

Low Statistical Power

Many published research articles in psychology report studies with low statistical power, usually due to small samples and/or small effect sizes. Low statistical power is problematic because it:
•Produces more uncertainty in estimated effects.
•Increases the false negative rate (true effects are missed).
•Increases the proportion of significant findings that are false positives.
•Leads significant results to overestimate true effect sizes, since only inflated estimates clear the significance threshold (the simulation below illustrates this).
Overall, low statistical power undermines the confidence you can place in results.
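A small illustrative simulation of the overestimation problem, with assumed values (true d = 0.3, n = 20 per group, i.e., a deliberately underpowered design):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n, sims = 0.3, 20, 10_000
sig_effects = []

for _ in range(sims):
    a = rng.normal(0.0, 1.0, n)      # control group
    b = rng.normal(true_d, 1.0, n)   # treatment group, true effect d = 0.3
    t, p = stats.ttest_ind(b, a)
    if p < .05:
        # Pooled-SD estimate of Cohen's d for this significant sample
        d_hat = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        sig_effects.append(d_hat)

print(f"Power: {len(sig_effects) / sims:.2f}")                 # well below .80
print(f"Mean significant effect: {np.mean(sig_effects):.2f}")  # well above 0.3
```

The significant studies report, on average, an effect considerably larger than the true d = 0.3.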

Recap: Type I and Type II Errors

Recall that significance testing can lead to the wrong conclusion:
Type I Error (False Positive)
•Rejecting the null hypothesis when it is true.
•I.e., concluding there is an effect when there is not.
Type II Error (False Negative)
•Failing to reject the null hypothesis when we should have rejected it.
•I.e., concluding there is no effect when there is one.
Inherent in the choice of alpha level (p = .05) is a trade-off between Type I and Type II errors:
•A stricter alpha level (e.g., p = .001) leads to fewer Type I errors, but more Type II errors.
•A more lenient alpha level (e.g., p = .10) leads to fewer Type II errors, but more Type I errors (see the sketch below).
•The aim is to minimise both types of errors, but it is impossible to completely eliminate either.
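A hedged sketch of the trade-off, using assumed values (n = 30 per group, d = 0.5 when an effect exists): with the null true, a stricter alpha produces fewer Type I errors; with a real effect present, the same stricter alpha misses it more often.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sims = 30, 5_000

def rejection_rate(true_d, alpha):
    hits = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_d, 1.0, n)
        if stats.ttest_ind(b, a).pvalue < alpha:
            hits += 1
    return hits / sims

for alpha in (0.10, 0.05, 0.001):
    type1 = rejection_rate(true_d=0.0, alpha=alpha)      # null true: Type I rate
    type2 = 1 - rejection_rate(true_d=0.5, alpha=alpha)  # effect true: Type II rate
    print(f"alpha={alpha:<5}  Type I ~ {type1:.3f}  Type II ~ {type2:.3f}")
```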

Statistical Power cont.

Power depends on the significance level:
•With a stricter significance level (e.g., p < .001), it becomes harder to reject the null hypothesis.
•Therefore, the chance of failing to reject the null hypothesis goes up, and statistical power goes down.
•By convention, the significance level is set to .05.
Power depends on the effect size:
•Larger effects are easier to detect; smaller effects are harder to detect.
•Therefore, as effect size increases, power increases.
•Usually, the true effect size is unknown.
Power depends on the sample size:
•As sample size increases, the standard error decreases (i.e., the sample estimate gets closer to the population value).
•As the standard error decreases, it becomes easier to detect an effect.
•Therefore, as sample size increases, power increases (see the sketch below).
•Of the four factors that influence power, sample size is the one researchers have most control over.
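An illustrative sketch of the sample-size relationship, holding the other factors fixed (the assumed effect d = 0.5 and the sample sizes are not from the notes):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Power rises with n for a fixed effect (d = 0.5) at alpha = .05
for n in (10, 20, 40, 64, 100):
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"n per group = {n:>3}: power = {power:.2f}")  # crosses ~.80 near n = 64
```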

Simulations

Simulated data refers to a dataset made by generating variables with "random" values according to set characteristics (parameters). Normally, you would simulate a dataset to match an existing dataset, or a dataset you plan to collect.
•E.g., if you plan to conduct a t-test, simulate a dataset with a categorical IV with two levels and a continuous DV (a sketch follows below).
By generating a large number of these simulated datasets, performing the analysis, and recording the results each time, you can calculate a mean estimate and a sampling distribution.
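A minimal sketch of simulating the kind of dataset described above: a two-level categorical IV and a continuous DV. The means, SD, and n are illustrative parameter choices (the group means are set 7.5 apart with SD 15, i.e., d = 0.5):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_per_group = 50

data = pd.DataFrame({
    "condition": ["control"] * n_per_group + ["treatment"] * n_per_group,
    "score": np.concatenate([
        rng.normal(loc=100.0, scale=15, size=n_per_group),  # control
        rng.normal(loc=107.5, scale=15, size=n_per_group),  # treatment (d = 0.5)
    ]),
})
print(data.groupby("condition")["score"].agg(["mean", "std"]))
```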

Power and Interactions

Statistical power for interactions is not intuitive, typically due to confusion about the effect size of an interaction. The effect size of an interaction is the difference between the effects at different levels of a moderator. To sufficiently power an interaction, you typically need more participants than you would to find a main effect.
Example, given a 2x2 design:
•As the difference between the effects at the two levels of the moderator decreases, so does the effect size of the interaction.
•The effect size of a "reversal" (the effect flips direction across moderator levels) is the same size as the simple main effect.
•The effect size of a "knock out" or "attenuated" interaction (the effect disappears or shrinks) is half the simple main effect, or much smaller.
•As such, more participants are required to detect these types of interaction.
So, for a 2x2 design:
•If you expect the moderator to show a reversal: each cell requires the same sample size as the main effect (i.e., 2x the sample size).
•If you expect the moderator to knock out the effect: each cell requires twice the sample size as the main effect (4x the sample size).
•If you expect a 50% attenuation due to the moderator: each cell requires 7 times the sample size as the main effect (14x the sample size).
A simulation sketch of these three patterns follows below.
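A hedged simulation of the three patterns, testing the interaction as a difference-of-differences contrast. The cell means encode a simple effect of d = 0.5 at one moderator level and a reversed, absent, or halved effect at the other; all parameter values are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_cell, sims, alpha = 100, 2_000, 0.05

def interaction_power(effect_at_mod0, effect_at_mod1):
    hits = 0
    for _ in range(sims):
        # One sample of the simple effect at each moderator level
        diff0 = (rng.normal(effect_at_mod0, 1.0, n_cell)
                 - rng.normal(0.0, 1.0, n_cell))
        diff1 = (rng.normal(effect_at_mod1, 1.0, n_cell)
                 - rng.normal(0.0, 1.0, n_cell))
        # Interaction test: do the two simple effects differ?
        if stats.ttest_ind(diff0, diff1).pvalue < alpha:
            hits += 1
    return hits / sims

print("reversal :", interaction_power(0.5, -0.5))   # easiest to detect
print("knock out:", interaction_power(0.5, 0.0))    # needs more participants
print("50% atten:", interaction_power(0.5, 0.25))   # hardest of the three
```

With the same n per cell, power drops sharply from the reversal to the attenuated pattern, matching the sample-size multipliers above.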

Statistical Power

Statistical power is the probability that a statistical test will detect an effect when it truly exists. In NHST terms, it is the probability of correctly rejecting the null hypothesis, given a research design and statistical analysis.
An example:
•Power of .80 means that if the effect truly exists and we performed the study 1000 times, we would detect a statistically significant difference about 800 times (80%).
•Sufficiently powered studies lead to greater confidence in results.
•Underpowered studies lead to results that may be unreliable.
•When conducting research, aim to conduct studies that have appropriate power.
For every statistical test, power depends on:
•Type II error rate.
•Significance level.
•Effect size.
•Sample size.
•Note: these are not the only things that influence statistical power; almost every research design decision affects a study's ability to detect a true effect.
Power is inversely related to the Type II error rate:
•As the chance of incorrectly failing to reject the null hypothesis goes up, statistical power goes down.
•Power = 1 - Type II error rate.
•By convention, the Type II error rate is set to .20; therefore, power is set to .80 (80%).

Power Analysis via Simulations

•Simulate a dataset many times (e.g., 1000) with a given sample size and expected effect size, so that it is a realistic representation of the data you expect to obtain.
•Perform your planned analysis on each simulated dataset and record whether a significant effect was detected.
•The power is the proportion of iterations that detected a significant effect.
•By repeating the above steps with different sample sizes, you can gain insight into the sample size required to reach a certain power threshold (see the sketch below).
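A direct sketch of this recipe for a two-group t-test; the assumed effect (d = 0.5) and the sample sizes swept are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
sims, alpha, true_d = 1_000, 0.05, 0.5

def simulated_power(n_per_group):
    sig = 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        if stats.ttest_ind(treatment, control).pvalue < alpha:
            sig += 1
    return sig / sims  # power = proportion of significant iterations

# Sweep sample sizes to see roughly where power crosses the 80% threshold
for n in (20, 40, 60, 64, 80):
    print(f"n per group = {n:>2}: power ~ {simulated_power(n):.2f}")
```

For complicated designs (nested data, unequal groups, planned covariates), the same loop works: you only change how the dataset is generated and which analysis is run.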

Conducting a Power Analysis

An example:
•You wish to conduct a study on whether stress in students leads to unhealthy eating.
•You plan to measure both variables on a continuous scale and run a correlation.
•You expect a medium-sized effect (r = .30).
•With a power threshold of 80% and a significance level of .05, you can calculate the sample size required to reliably detect this effect: N ≈ 85 participants.
•If you expect a different effect size, this changes the number of participants needed:
•Small (r = .10): N ≈ 782 participants.
•Medium (r = .30): N ≈ 85 participants.
•Large (r = .50): N ≈ 28 participants.
(A worked check of these numbers follows below.)
Where to get effect size estimates for power analysis:
•Previous research.
•Meta-analyses.
•Pilot data.
•What you would expect based on convention.
•What you think is important (e.g., clinical relevance).
Determining your sample size by copying what previous research has used can be problematic, as much of published research is underpowered.
Some things to be aware of:
•A power analysis provides the minimum sample size required under a 'best case scenario'.
•There is no statistical disadvantage to collecting a larger sample (other than the time/effort of collecting additional data).
•The quality of the power analysis is entirely dependent on the quality of the estimates; a poor estimate of the anticipated effect size compromises the analysis.
If you have a sample size and a target power, you can also work backwards to the smallest effect size that can be reliably detected. This is useful for evaluating the quality of a statistical analysis and/or results in published research.
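A worked check of the correlation example using the standard Fisher-z approximation, n ≈ ((z_(1-α/2) + z_power) / atanh(r))² + 3. This is a sketch: the approximation lands within a participant or two of the figures quoted above, and exact methods (e.g., G*Power) can differ slightly as well.

```python
import numpy as np
from scipy import stats

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Fisher-z approximation to the N needed to detect correlation r."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # two-tailed critical value
    z_power = stats.norm.ppf(power)
    return int(np.ceil(((z_alpha + z_power) / np.arctanh(r)) ** 2 + 3))

for r in (0.10, 0.30, 0.50):
    print(f"r = {r}: N = {n_for_correlation(r)}")
# Approximation gives N = 783, 85, and 30: close to the quoted values.
```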

