Null Hypothesis Significance Testing and Power

Sources of Type II Errors

Low Power

Conducting an NHST using the t-statistic

The t-statistic is typically used when we want to test whether the means of two groups differ, statistically speaking (see the sketch below).
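
A minimal sketch in Python using scipy's independent-samples t-test; the two groups and their scores are invented for illustration.

```python
from scipy import stats

# Hypothetical scores for two independent groups
group_a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]
group_b = [4.2, 4.6, 4.1, 4.9, 4.4, 4.3]

# t tests whether the two group means differ
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```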

Confidence Intervals vs NHST

• Both are based on the same information: the size of the effect found in the research, the standard error of the effect size, and the critical test value associated with the chosen alpha (the Type I error rate); the sketch below builds both from these same ingredients
• NHST focuses on making an all-or-nothing decision about whether there is an effect; it's all about the p-value, but p-values don't tell you much
• CIs focus on estimating the actual effect of interest and the degree of uncertainty about what it really is
• Researchers who see results presented as CIs are more likely to make correct inferences if they think in terms of estimating the effect size rather than NHST
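
A minimal sketch, assuming a hypothetical mean difference, standard error, and df; the same three ingredients (effect, SE, critical value) yield both the NHST decision and the confidence interval.

```python
from scipy import stats

# Hypothetical summary statistics: mean difference, standard error, df
diff, se, df = 1.2, 0.5, 28
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)   # critical value for a two-tailed test
t_obs = diff / se                         # observed t

# NHST: an all-or-nothing decision
reject = abs(t_obs) > t_crit

# CI: an estimate of the effect and the uncertainty around it
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"reject null: {reject}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```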

Sampling Distributions - the Basis of NHST

• Every statistic (e.g. M, SD, t, or F) has a known sampling distribution for samples of size n1, n2, n3, etc.
• If you take repeated samples of size n from the same population, you will not get the same sample statistic each time. Instead you will get a distribution of the statistic, i.e. a sampling distribution
• If you repeatedly took two samples of size n from the same population and computed the difference between the two means divided by the SE, those differences would form the sampling distribution of the t-statistic (simulated in the sketch below)
• A sampling distribution tells you what percentage of samples (or differences between two samples) will exceed any particular value
• Note that sampling distributions vary depending on the size of the samples you are taking
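
A minimal simulation sketch of the third bullet: repeatedly draw two samples of size n from the same population and compute (M1 - M2)/SE. The population parameters (mean 100, SD 15) and n = 20 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 20, 10_000
t_values = []

# Two samples of size n from the SAME population, repeated many times;
# the (M1 - M2)/SE values form the sampling distribution of t under the null
for _ in range(reps):
    a = rng.normal(100, 15, n)
    b = rng.normal(100, 15, n)
    se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    t_values.append((a.mean() - b.mean()) / se)

t_values = np.array(t_values)
# Percentage of differences exceeding a particular value (2.02 ~ the
# two-tailed .05 critical value for df = 38): should come out near .05
print((np.abs(t_values) > 2.02).mean())
```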

Accepting the Null - i.e., Concluding that the Null is True

• Generally, you should not design a study to try to prove the null
• The statistical test you would use assumes the null is true
• The power you would need to have a Type II error rate of .05 is 95%, which would require huge sample sizes (see the sketch below)
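
A sketch of the sample-size point using statsmodels' power solver; the small effect size (d = 0.2) is an assumption chosen to show how large n must get when power is set to .95 (Type II error rate .05).

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the per-group n needed for 95% power to detect a small
# effect (d = 0.2) at alpha = .05, two-tailed
analysis = TTestIndPower()
n_needed = analysis.solve_power(effect_size=0.2, power=0.95, alpha=0.05)
print(round(n_needed))   # about 650 per group
```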

Null Results

• In NHST, our typical goal is to reject the null hypothesis and find an effect
• The null hypothesis in most research is that there is no effect (note you can have other null hypotheses):
  • No difference between the means of two (t-test) or more (F-test) groups
  • The correlation coefficient is not different from zero
• Failure to reject the null hypothesis (i.e. accepting the null) means you have not found evidence that the null hypothesis is false

The Low Power Problem

• Many studies in the literature are greatly underpowered
• Failures to reject the null hypothesis are often the result of low power and contribute to the wealth of "mixed findings" in the literature (simulated in the sketch below)
• Mixed findings may lead to a needless search for moderators when none actually exist
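
A minimal simulation sketch of the low-power problem: a real but modest effect (d = 0.3, an assumption) studied with small samples (n = 20 per group) reaches significance only a minority of the time, which is exactly how "mixed findings" accumulate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A real effect (d = 0.3) studied repeatedly with small samples
d, n, reps = 0.3, 20, 5_000
sig = 0
for _ in range(reps):
    a = rng.normal(0, 1, n)
    b = rng.normal(d, 1, n)
    if stats.ttest_ind(a, b).pvalue < .05:
        sig += 1

# Roughly .15: most of these underpowered studies miss a real effect
print(sig / reps)
```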

Times When Accepting the Null is the Goal in Research

• Mediator analysis: no effect of the IV after controlling for the mediator
• Ruling out confounds: no relationship between a potentially confounding variable and the DV
• Discriminant validity: no relationship between measures of two different constructs
• Generalizability: absence of moderator effects indicates that the IV's effect is generalizable across the levels of the moderator tested
• BUT BE CAREFUL: IF YOU DON'T HAVE ADEQUATE POWER, ACCEPTING THE NULL DOESN'T TELL YOU MUCH

Sources of Type II Errors: Bad research design

• Poor construct validity for the independent and/or dependent measures
• Weak manipulation, unreliable implementation, etc.
• Failure to control extraneous variables
• Failure to test for curvilinear relationships (illustrated in the sketch below)
• Failure to test for moderator variables
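
A sketch of the curvilinear point, using an invented U-shaped relationship: a test that looks only for a linear association misses a strong effect (a Type II error) that a quadratic term recovers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical U-shaped relationship: the DV depends strongly on the IV,
# but the linear correlation is near zero
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 1, 200)

r, p = stats.pearsonr(x, y)
print(f"linear r = {r:.2f}, p = {p:.3f}")       # near-zero r, typically nonsignificant

# Testing the squared term recovers the curvilinear effect
r2, p2 = stats.pearsonr(x**2, y)
print(f"quadratic r = {r2:.2f}, p = {p2:.3g}")  # large r, tiny p
```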

Null Hypothesis Significance Testing (NHST)

• Tests of statistical significance were designed to serve as an aid in assessing the likelihood that data are due to sampling errors, assuming that the hypothesis being tested is true
• The hypothesis being tested is the null hypothesis, which is short for the hypothesis to be nullified (rejected, challenged, etc.)

Other Problems with Null Hypothesis Testing

• The null hypothesis of no differences or zero effect is probably never really true
• Given a large enough sample size, any difference will be statistically significant (demonstrated in the sketch below)
• Is it really likely that we would study an IV that has absolutely no impact?
• So we may be more likely to be committing Type II errors: failing to reject the null when we should
• NHST does not address Type II errors
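
A sketch of the large-sample point, assuming a practically negligible difference (d = 0.02) between two hypothetical populations: with enough observations it still tests as "significant".

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A tiny, practically meaningless effect (d = 0.02)...
n = 200_000
a = rng.normal(0.00, 1, n)
b = rng.normal(0.02, 1, n)

# ...still comes out statistically significant with a huge sample
print(stats.ttest_ind(a, b).pvalue)   # typically far below .05
```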

Bias against publishing null results may lead to

• Type I errors getting published
• Wasted time as different researchers continue to investigate the same problem over and over

Times when accepting the null may seem reasonable

• Was the sample size adequate for detecting the effect size deemed important?
• Were the research design and its execution adequate?
• Is the absence of the effect predicted by theory?
• Has the absence of the effect been replicated?

Conducting the NHST

• We calculate the t-statistic for our study and then compare it to the sampling distribution values to determine whether it exceeds the critical value (see the sketch below)
• The critical value depends on the sample size (df), the Type I error rate we set (alpha, usually .05), and whether we are conducting a one-tailed or two-tailed test
• If the value from our study exceeds the critical value, we reject the null hypothesis
• If the value from our study does not exceed the critical value, we accept the null hypothesis
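
A minimal sketch of the decision rule, assuming a hypothetical observed t of 2.31 with df = 28; scipy's t quantile function gives the one- and two-tailed critical values.

```python
from scipy import stats

# Hypothetical observed t, its df, and the preset Type I error rate
t_obs, df, alpha = 2.31, 28, .05

t_crit_two = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value (~2.05)
t_crit_one = stats.t.ppf(1 - alpha, df)       # one-tailed critical value (~1.70)

# Exceeds the critical value -> reject the null (two-tailed here)
print(abs(t_obs) > t_crit_two)   # True
```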

Prejudice Against Accepting the Null

• When we fail to reject the null, why don't we just accept the null and be done with it?
• Bias against publishing null results

Reluctance to Live by the Test

• When we fail to reject the null, we often find other reasons (excuses) for why we didn't observe statistically significant effects
• "It was a bad study," for example
• Remember the problem with the falsificationist approach and the protective belt
• So we often don't use NHST as it was originally intended

p is Not a Sliding Scale

• You choose the p-value before the analyses; that is, you decide how confident you want to be that you will not commit a Type I error when deciding whether an effect exists
• Technically, you shouldn't report significance at smaller levels of p than what you set before you conducted the study - for example, saying your results were significant at p < .00001 when you set out to test them at p < .05
• There is no such thing as "marginally" significant or a "trend" toward significance - terms commonly seen in many research articles and frowned upon by some
• Significance testing is used as a decision rule: is it likely that the results are due to chance sampling errors or not? To use it correctly, you must set the Type I error rate you will tolerate before you conduct the analysis and then live by it (see the sketch below)
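
A minimal sketch of significance testing as a preset decision rule; the observed t and df are invented, and the result is reported against the alpha fixed in advance rather than the much smaller p actually obtained.

```python
from scipy import stats

# Alpha is chosen BEFORE looking at the data
alpha = .05

# Hypothetical study result
t_obs, df = 5.4, 40
p = 2 * stats.t.sf(abs(t_obs), df)   # two-tailed p

# Even though p is tiny, the decision and the report are made at the
# preset level, not on a sliding scale
print("significant at p < .05" if p < alpha else "not significant")
```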

Interpretation and Misinterpretation of p-values (cont.)

• p (the Type I error rate) is not the index by which to judge the strength of your results; that is, p < .001 does not mean a bigger effect than p < .05 (see the sketch below)
• The size of p is only an indication of rarity: these results are rare (p < .05) assuming the null hypothesis is true, and these results (p < .001) are even rarer
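
A sketch of why p is not an effect-size index: two hypothetical studies with the same effect size (d = 0.5) yield very different p-values simply because their sample sizes differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Same effect size (d = 0.5) in both studies; only n differs
for n in (20, 500):
    a = rng.normal(0.0, 1, n)
    b = rng.normal(0.5, 1, n)
    res = stats.ttest_ind(a, b)
    # The larger study gives a far smaller p, but not a bigger effect
    print(f"n = {n:>3}: p = {res.pvalue:.2g}")
```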

Interpretation and Misinterpretation of p-values

• p (the Type I error rate, alpha) is the likelihood that you got the results you did based on sampling errors under the assumption that the null hypothesis is true!
• It is not the probability that the null hypothesis is correct. Although we might like to know that, we cannot say we are 95% confident that there are no differences (or no relationship); the sketch below shows why
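
A simulation sketch of the misinterpretation, with all parameters invented: in a world where the null is true half the time, far more than 5% of "significant" results come from a true null, so p < .05 is not a 5% chance that the null is true.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Experiments where the null is true half the time; when it is false,
# there is a small real effect (d = 0.3) studied with modest samples
n, reps = 30, 4_000
null_true_hits, hits = 0, 0
for _ in range(reps):
    null_true = rng.random() < 0.5
    d = 0.0 if null_true else 0.3
    a = rng.normal(0, 1, n)
    b = rng.normal(d, 1, n)
    if stats.ttest_ind(a, b).pvalue < .05:
        hits += 1
        null_true_hits += null_true

# Fraction of significant results that came from a true null:
# roughly .2 here, well above the .05 many people expect
print(null_true_hits / hits)
```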

