Null Hypothesis & Significance Testing
Confidence Interval
Goal: capture the true effect (e.g., the true mean) most of the time - a 95% confidence interval should include the true effect in about 95% of repeated samples
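As a quick illustration, here is a minimal simulation sketch (assuming a normal population with a known SD; all numbers here are made up) showing that a 95% interval covers the true mean in roughly 95% of repeated samples:

# A minimal sketch: repeatedly sample, build a 95% CI each time, and count
# how often the interval captures the true mean. Values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_mean, sd, n, trials = 100.0, 15.0, 50, 10_000

hits = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sd, n)
    se = sd / np.sqrt(n)              # standard error of the mean
    lo = sample.mean() - 1.96 * se    # 95% CI lower bound
    hi = sample.mean() + 1.96 * se    # 95% CI upper bound
    hits += lo <= true_mean <= hi

print(f"Coverage: {hits / trials:.3f}")  # comes out near 0.95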
Measures for effect size
Three common measures in ANOVA are: omega squared, epsilon squared, and eta squared. Other popular measures include: Cohen's d, Glass's delta, Hedges' g, and Somers' D.
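For instance, eta squared is the proportion of total variance attributable to the group factor (SS_between / SS_total). A minimal sketch, with made-up group data:

# A minimal sketch of eta squared for a one-way ANOVA, computed as
# SS_between / SS_total; the three group samples are invented for illustration.
import numpy as np

groups = [np.array([4, 5, 6, 5]), np.array([7, 8, 6, 7]), np.array([9, 8, 10, 9])]
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = sum(((g - grand_mean) ** 2).sum() for g in groups)

print(f"eta squared = {ss_between / ss_total:.2f}")  # share of variance explained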
Finding, estimating and interpreting Cohen's d in group comparison studies
d = the difference between the means of the two groups, divided by the (pooled) standard deviation - it expresses the size of a group difference in SD units - when the mean difference between treatment and control groups is 0.8 to 1 SD, practical significance has conventionally been labeled "high"
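A minimal sketch of computing Cohen's d for two independent groups using the pooled standard deviation (the treatment and control scores below are made up for illustration):

# Cohen's d = (mean1 - mean2) / pooled SD; interpret the result in SD units.
import numpy as np

def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    v1, v2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / pooled_sd

tx = [34, 38, 41, 40, 37, 39]       # hypothetical treatment scores
control = [30, 33, 31, 35, 32, 34]  # hypothetical control scores
print(f"d = {cohens_d(tx, control):.2f}")  # 0.8-1 is conventionally "high"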
Confidence Intervals and Alpha Levels
90% CI = .10 alpha
95% CI = .05 alpha
99% CI = .01 alpha
How to compute confidence intervals
In 95% of samples, the true population proportion (or mean) will fall within about 2 standard errors of the sample estimate: for a bell curve, 95% of the data will be between +/- 1.96 standard deviations. Standard error: this is not the standard deviation of the sample; it is the standard deviation of the sampling distribution of the sample proportion (or mean). Commonly used "Z" levels of confidence: 90%, 95%, and 99%.
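A minimal sketch of computing a confidence interval for a sample mean with the common z multipliers (the data values are made up, and the standard error is estimated from the sample for simplicity):

# Build 90%, 95%, and 99% CIs around a sample mean using z critical values.
import numpy as np

z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}  # common "Z" confidence levels

sample = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6])
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean

for level, crit in z.items():
    print(f"{level:.0%} CI: ({mean - crit * se:.2f}, {mean + crit * se:.2f})")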
Concepts Underlying the Null Hypothesis
- A statement that the obtained differences being investigated are not significant (the observed sample mean differences are just a chance occurrence). - In other words, the findings are not indicative of what is really going on in the population (the differences are due to sampling error). - When the null hypothesis is rejected, the conclusion is: "The differences among sample means are big enough to suggest they are likely real and not chance occurrences."
Rejecting the Null Hypothesis
We should reject the null hypothesis only 5% of the time when the null is actually correct (the risk of a Type I error). But this is based on the following assumptions:
1. We set the alpha level and choose a directional or nondirectional test before we look at our data
2. Our data meet the assumptions required for the use of parametric statistics
3. Our sample is drawn randomly from the population and is representative of that population
4. We do only one statistical significance test
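A minimal simulation sketch of this 5% rejection rate (assuming SciPy is installed; both groups are drawn from the same population, so the null is true by construction):

# When the null is true and the assumptions above hold, a t-test at
# alpha = .05 should reject in about 5% of repeated experiments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, trials, rejections = 0.05, 10_000, 0

for _ in range(trials):
    a = rng.normal(0, 1, 30)  # both groups from the same population
    b = rng.normal(0, 1, 30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

print(f"Type I error rate: {rejections / trials:.3f}")  # comes out near 0.05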
What conclusions can we draw when we obtain statistically significant outcomes?
- sampling error
- experimenter effects (expectancy)
- confounds
- sufficiently strong intervention
- moderate to strong effect size
Inferential Statistics and Hypothesis Testing - the Null Hypothesis
A hypothesis states a researcher's prediction about results. A null hypothesis states that there will be no difference, no effect, or no relationship between the variables. A research hypothesis states that there will be a difference, an effect, or a relationship.
One vs. Two-Tailed Tests of Significance
Tests of significance can be either one- or two-tailed (two-tailed is most common). If it is determined that the difference or relationship will only occur in one direction (you have a specific directional hypothesis), use a one-tailed test. Use a two-tailed test if it is possible for the difference or relationship to go either way.
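A minimal sketch contrasting the two with SciPy's independent-samples t-test (assuming SciPy 1.6+, where the 'alternative' argument selects the directional hypothesis; the data are made up):

# Two-tailed: H1 is "the means differ"; one-tailed: H1 is "tx > control".
from scipy import stats

tx = [34, 38, 41, 40, 37, 39]       # hypothetical treatment scores
control = [30, 33, 31, 35, 32, 34]  # hypothetical control scores

two_tailed = stats.ttest_ind(tx, control)
one_tailed = stats.ttest_ind(tx, control, alternative='greater')

print(f"two-tailed p = {two_tailed.pvalue:.4f}")
print(f"one-tailed p = {one_tailed.pvalue:.4f}")  # half the two-tailed p here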
Null Hypothesis and Confidence Interval
- When you use CIs to test the null hypothesis, the size of the CI determines the alpha level for the significance test
- The alpha level (significance level) refers to the size of the region of rejection in a sampling distribution
- The confidence level is equivalent to 1 minus the alpha level; so, if your significance level is 0.05, the corresponding confidence level is 95%
- If the p-value is less than your significance (alpha) level, the hypothesis test is statistically significant
- If the confidence interval does not contain the null hypothesis value, the results are statistically significant
- If the p-value is less than alpha, the confidence interval will not contain the null hypothesis value
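A minimal sketch of this correspondence for a one-sample z-test of H0: mu = 0 at alpha = .05 (the data are made up, and the standard error is treated as known for simplicity; assumes SciPy is installed):

# The p-value falls below alpha exactly when the 95% CI excludes the null value.
import numpy as np
from scipy import stats

sample = np.array([0.8, 1.2, -0.1, 0.9, 1.5, 0.4, 1.1, 0.7, 1.3, 0.2])
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))

z = mean / se                                  # test statistic under H0: mu = 0
p = 2 * stats.norm.sf(abs(z))                  # two-tailed p-value
lo, hi = mean - 1.96 * se, mean + 1.96 * se    # 95% CI

print(f"p = {p:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
print("Both criteria agree:", (p < 0.05) == (not (lo <= 0 <= hi)))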
What conclusions can we draw when we obtain non-significant outcomes?
- small effect size
- insufficient power
- unreliable measures/error
- manipulation not strong enough/inconsistent
- nonlinear relationship
- sampling error
Type 1 error
A Type I error is also known as a false positive and occurs when a researcher incorrectly rejects a true null hypothesis. This means that you report your findings as significant when in fact they occurred by chance. The probability of making a Type I error is represented by your alpha level (α), the p-value below which you reject the null hypothesis. An alpha of 0.05 indicates that you are willing to accept a 5% chance of being wrong when you reject the null hypothesis. You can reduce your risk of committing a Type I error by using a lower alpha level. For example, an alpha of 0.01 would mean there is a 1% chance of committing a Type I error. However, using a lower alpha means that you will be less likely to detect a true difference if one really exists (thus risking a Type II error).
Type 2 Error
A Type II error is also known as a false negative and occurs when a researcher fails to reject a null hypothesis that is actually false. Here the researcher concludes there is no significant effect when in fact there is one. The probability of making a Type II error is called beta (β), and it is related to the power of the statistical test (power = 1 - β). You can decrease your risk of committing a Type II error by ensuring your test has enough power - for example, by making sure your sample size is large enough to detect a practical difference when one truly exists.
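A minimal power-analysis sketch using statsmodels (assuming it is installed; the effect size, alpha, and power targets below are illustrative choices, not values from the text):

# Solve for the sample size per group needed for 80% power (beta = .20)
# to detect a medium effect (d = 0.5) at alpha = .05 in a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group: {n_per_group:.0f}")  # roughly 64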
Confidence Intervals
A confidence interval expresses how much uncertainty there is around a particular statistic, and it is often reported with a margin of error. It tells you how confident you can be that the results from a poll or survey reflect what you would expect to find if it were possible to survey the entire population. Confidence intervals are intrinsically connected to confidence levels. With any survey or experiment, you're never 100% sure that your results could be repeated. If you're 95% sure, or 98% sure, that's usually considered "good enough" in statistics. That percentage of sureness is the confidence level.
Levels of Significance
Level of significance reflects the chance the researcher is willing to take of making an incorrect decision about the obtained result - a value of .05 says the researcher is willing to accept a 5% chance of making a Type I error. Tests of significance: t-test, F-test, chi-square. As a rule, the larger the score on a given test statistic, the greater the likelihood that the result is significant.
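As an illustration of one of these tests, here is a minimal chi-square sketch with SciPy (assuming it is installed; the contingency counts are made up):

# Chi-square test of independence on a 2x2 table; a larger test statistic
# corresponds to a smaller p-value for the same degrees of freedom.
from scipy import stats

observed = [[30, 10],   # e.g., group A: pass / fail
            [18, 22]]   # e.g., group B: pass / fail
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # compare p to alpha = .05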
Probability
Statistical theory allows us to find that probability and, from it, decide whether to reject the null hypothesis. Probability is expressed in terms of a p-value: 0 = no probability of an event happening; 1 = 100% certainty an event will happen. If a p-value is small enough, the decision is made to reject the null hypothesis - the phenomenon observed in the sample is taken to also occur in the population. .05 is most typically used (5%, or 5 out of 100) - if p falls below it, reject the null.
Effect Size
The effect size is how large an effect of something is. For example, medication A is better than medication B at treating depression. But how much better is it? A traditional hypothesis test will not give you that answer. Medication A could be ten times better, or it could be only slightly better. This magnitude (twice as much? ten times as much?) is what is called an effect size. Most statistical research includes a p-value; it can tell you which treatment, process, or other investigation is statistically more sound than the alternative. But while a p-value can be a strong indicator of which choice is more effective, it tells you practically nothing else. Effect size can tell you: how large the difference is between groups; the absolute effect (the difference between the average outcomes of two groups); and what the standardized effect size is for an outcome. An example of an absolute effect: patients taking drug B for depression might see a mean improvement on a depression test (like the Beck Depression Inventory) of 25 points. Standardized effect sizes are similar to the way some scores are standardized using z-scores; they give a perceived effect a numerical value that is easily understood. For example, the categories on a Likert scale (agree, strongly agree, disagree, etc.) have more meaning when they are standardized.
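A minimal sketch contrasting an absolute effect (raw difference in mean improvement) with a standardized effect (difference in SD units); the improvement scores below are invented for illustration:

# Absolute effect is in the scale's own units; the standardized effect
# divides by the pooled SD so results are comparable across measures.
import numpy as np

drug_b = np.array([25, 28, 22, 30, 26, 27])   # improvement on a depression test
placebo = np.array([10, 14, 9, 12, 11, 13])

absolute_effect = drug_b.mean() - placebo.mean()    # in test points
pooled_sd = np.sqrt((drug_b.var(ddof=1) + placebo.var(ddof=1)) / 2)  # equal n
standardized = absolute_effect / pooled_sd          # Cohen's d-style

print(f"absolute: {absolute_effect:.1f} points, standardized: {standardized:.2f} SDs")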
Tests of Significance
The inferential statistic that allows the researcher to conclude whether the null hypothesis should or should not be rejected. A test of significance is usually carried out using a preselected significance level (or alpha value) reflecting the chance the researcher is willing to accept when making a decision about the null hypothesis - typically no greater than 5 out of 100 (p < .05).
The Null Hypothesis
The null hypothesis, H0, is the commonly accepted fact; it is the opposite of the alternate hypothesis. Researchers work to reject, nullify, or disprove the null hypothesis. Researchers come up with an alternate hypothesis, one that they think explains a phenomenon, and then work to reject the null hypothesis. The null hypothesis can be thought of as a nullifiable hypothesis. That means you can nullify it, or reject it. What happens if you reject the null hypothesis? It gets replaced with the alternate hypothesis, which is what you think might actually be true about a situation. For example, let's say you think that a certain drug might be responsible for a spate of recent heart attacks. The drug company thinks the drug is safe. The null hypothesis is always the accepted hypothesis; in this example, the drug is on the market, people are using it, and it's generally accepted to be safe. Therefore, the null hypothesis is that the drug is safe. The alternate hypothesis, the one you want to replace the null hypothesis with, is that the drug isn't safe. Rejecting the null hypothesis in this case means that you will have to show that the drug is not safe.