PSY3290 CH. 6 p Values, Null Hypothesis Significance Testing, and CI's
Red Flag 3: Beware Accepting the Null Hypothesis
A large p value provides no evidence that the null hypothesis is true. Beware accepting the null hypothesis. Beware the slippery slope of nonsignificance.
If you state α and H0, but not H1, could you make a Type I error? Do you know the Type I error rate? If so, what is it? Could you make a Type II error? Do you know the Type II error rate? If so, what is it?
Yes, yes, α, yes, no.
In NHST, the test statistic is used to obtain a p value by determining the probability of obtaining a. the observed result, or more extreme, if the null is true. b. the observed result, or less extreme, if the null is true. c. the observed result exactly, if the null is true. d. the observed result approximately, if the null is true.
a
The significance level is the...
criterion for deciding whether or not to reject the null hypothesis. If the p value is less than the significance level, reject; if not, don't reject.
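On average, the 95% CI constructed from sample X will include the mean of sample X + 1 how often? (a) 95% of the time (b) 5% of the time (c) 83% of the time (d) 100% of the time
c: 83% of the time. A 95% CI captures the mean of the next sample less often than it captures μ, because that next mean has sampling variability of its own.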
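The capture rate asked about here can be worked out directly under a simplifying assumption. The sketch below assumes σ is known, both samples have the same N, and the CI is the z-based interval M ± 1.96 × SE; none of these details are stated in the card, so treat it as an illustration only.

```python
from statistics import NormalDist

nd = NormalDist()

# Assumption: sigma known, equal N, CI = M1 +/- z_crit * SE.
z_crit = nd.inv_cdf(0.975)  # about 1.96

# The difference M2 - M1 has SD sqrt(2) * SE, so the CI around M1 captures M2
# when |M2 - M1| / (sqrt(2) * SE) is below z_crit / sqrt(2).
capture_rate = 2 * nd.cdf(z_crit / 2 ** 0.5) - 1

print(round(capture_rate, 2))
```

The result is noticeably below 95% because two sample means vary, not one.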
Ryan collects 20 samples of X observations each. The SD of the underlying population distribution is unknown. What should he expect when constructing 95% confidence intervals for each of his samples? (a) If X is small, the width of each 95% CI will vary widely from sample to sample. (b) If X is large, the width of each 95% CI will be exactly the same. (c) On average, 19 out of 20 of his 95% CIs will include the true population mean, regardless of X. (d) Both (a) and (c).
d
The null hypothesis a. is a statement we wish to test using NHST. b. specifies a single value for the parameter we are interested in. c. is rejected when the p value obtained is very low. d. All of the above
d
The p value is defined as the...
probability of obtaining the observed result, or more extreme, IF the null hypothesis is true.
The researcher uses the smallest value that permits rejection, because...
rejecting at a lower level (.01 rather than .05) provides a more convincing outcome.
Strict NHST
requires the significance level to be stated in advance. In practice, researchers usually don't do that; instead they report against a small number of conventional significance levels.
For a null hypothesis value at a point where the cat's eye is very thin, the p value is large / medium / small and the evidence against the null hypothesis is strong / medium / weak / very weak.
small, strong
Consider using one-tailed p only when there are...
strong reasons for specifying a directional H1 in advance.
A smaller p tells us...
that results like ours are less likely, IF the null hypothesis is true.
Where the cat's eye is fat...
p is large and gives no evidence against that null hypothesis value. As we consider values further from 53% and the cat's eye becomes thinner, the p value becomes smaller and gives progressively stronger evidence against those values as the null hypothesis.
Proper NHST conclusions
(1) If p > .05: "The null hypothesis was not rejected, p > .05." (2) If p < .05 (but p > .01): "The null hypothesis was rejected, p < .05." Or "...rejected at the .05 level." (3) If p < .01 (but p > .001): "The null hypothesis was rejected, p < .01." Or "...rejected at the .01 level." (4) If p < .001: "The null hypothesis was rejected, p < .001." Or "...rejected at the .001 level."
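The four reporting rules above can be sketched as a small lookup. The function name `nhst_conclusion` is hypothetical, and the sketch treats a p value exactly equal to .05 as "not rejected", a boundary case the rules above do not address.

```python
def nhst_conclusion(p: float) -> str:
    """Return the reporting sentence for p, using the conventional levels .05, .01, .001."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # Check the smallest level first, so the strictest level that permits
    # rejection is the one reported.
    for level, label in ((0.001, ".001"), (0.01, ".01"), (0.05, ".05")):
        if p < level:
            return f"The null hypothesis was rejected, p < {label}."
    return "The null hypothesis was not rejected, p > .05."


print(nhst_conclusion(0.03))
print(nhst_conclusion(0.003))
print(nhst_conclusion(0.20))
```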
CI's and NHST
(1) If the null hypothesis value lies outside the 95% CI, the p value is less than .05, so p is less than the significance level and we reject the null hypothesis. (2) Conversely, if the null hypothesis lies inside the 95% CI, the p value is greater than .05, so we don't reject.
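The link between the 95% CI and p < .05 can be checked numerically. This is a minimal sketch for a z-based CI with a known standard error; the helper name `ci_and_p`, the standard error of 3.65, and the null values tried are assumptions chosen for illustration.

```python
from statistics import NormalDist

nd = NormalDist()


def ci_and_p(m, se, null_value):
    """z-based 95% CI and two-tailed p value for a sample mean with known SE."""
    z_crit = nd.inv_cdf(0.975)
    ci = (m - z_crit * se, m + z_crit * se)
    z = (m - null_value) / se
    p = 2 * (1 - nd.cdf(abs(z)))
    return ci, p


# Hypothetical inputs: M = 57.8, assumed SE = 3.65, two candidate null values.
for null_value in (50, 52):
    (low, high), p = ci_and_p(57.8, 3.65, null_value)
    outside = not (low <= null_value <= high)
    print(null_value, round(low, 1), round(high, 1), round(p, 3), outside, p < 0.05)
```

In each row the last two columns agree: the null value falls outside the CI exactly when p < .05.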
three-step process of NHST and p values
(1) State a null hypothesis. (2) Calculate the p value. (3) Decide whether to reject the null.
α, β, trade off
(1) The two errors trade off: smaller α means larger β, and larger α means, you guessed it, smaller β. (2) For a stated α, larger N gives smaller β.
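Both parts of the trade-off can be seen in a small calculation. The sketch assumes a two-tailed z test with σ known and a specific true effect size (here 0.5σ, a made-up value), because β cannot be computed without choosing a particular H1. The function name `beta` is hypothetical.

```python
from statistics import NormalDist

nd = NormalDist()


def beta(alpha, n, effect_size):
    """Type II error rate of a two-tailed z test of H0: mu = mu0.

    effect_size is the assumed true (mu - mu0) / sigma under H1.
    """
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # mean of z when H1 is true
    # H0 is not rejected when z lands between -z_crit and +z_crit.
    return nd.cdf(z_crit - shift) - nd.cdf(-z_crit - shift)


print(round(beta(0.05, 20, 0.5), 2))  # baseline
print(round(beta(0.01, 20, 0.5), 2))  # smaller alpha
print(round(beta(0.05, 40, 0.5), 2))  # larger N
```

With these assumed inputs, lowering α raises β, and doubling N lowers it.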
A one-tailed p value
(1) includes values more extreme than the obtained result in one direction, that direction having been stated in advance. (2) p = Probability (M ≥ 57.8, IF H0 true)
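As a rough illustration, the one-tailed probability in (2) can be computed from the z value these notes report for M = 57.8 against H0: μ = 50 (z = 2.136). The sketch assumes z has the standard normal distribution when H0 is true.

```python
from statistics import NormalDist

z = 2.136  # z for M = 57.8 against H0: mu = 50, as reported in these notes

# One-tailed: only the upper tail, the direction stated in advance.
p_one_tailed = 1 - NormalDist().cdf(z)

# Two-tailed: both tails, so twice the one-tailed area for a symmetric distribution.
p_two_tailed = 2 * p_one_tailed

print(round(p_one_tailed, 3), round(p_two_tailed, 3))
```

The one-tailed value is half the two-tailed value, which is why the direction has to be fixed before seeing the data.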
Type II Error
(1) is failing to reject H0 when it's false, as in the top right cell in Table 6.1. (2) Type II Error = false negative, there is an effect but we missed it
The Type II error rate, β:
(1) is the probability of failing to reject H0 when it's false. It's also called the false negative rate, or miss rate. (2) β = Probability (Don't reject H0, WHEN H1 is true) (3) Because it is conditional on H0 being false (i.e., H1 being true), β is not the probability that H1 is true.
The Type I error rate, α:
(1) is the probability of rejecting H0 when it's true. It's also called the false positive rate. (2) α = Probability (Reject H0, WHEN H0 is true)
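The meaning of α as a long-run rate can be illustrated by simulation. The sketch below assumes H0 is true in every simulated study (normal population with μ = 0 and known σ = 1) and uses a two-tailed z test; the function name, sample size, number of studies, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def type_i_rate(alpha=0.05, n=25, studies=20000, seed=1):
    """Proportion of simulated studies that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    nd = NormalDist()
    rejections = 0
    for _ in range(studies):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # H0 true: mu = 0, sigma = 1
        z = (sum(sample) / n) / (1 / n ** 0.5)
        p = 2 * (1 - nd.cdf(abs(z)))
        rejections += p < alpha  # every rejection here is a false positive
    return rejections / studies


rate = type_i_rate()
print(rate)
```

The printed proportion should land close to the chosen α, since every rejection in this setup is a Type I error.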
Type I Error
(1) is the rejection of H0 when it's true, as in the bottom left cell in Table 6.1. (2) Type I Error = false positive, we say there is an effect when there isn't.
(2) Calculate the p value.
-Calculate from the data the p value, which we can think of informally as measuring the extent to which results like ours are unlikely, IF the null hypothesis is true. (A little later I'll say how we define "results like ours".) -The "IF" is vitally important: To calculate a p value, we assume the null hypothesis is true. -A p value therefore reflects both the data and the chosen null hypothesis. -For our poll result, and a null hypothesis of 50% support in the population, the p value turns out to be .003, which indicates that results like those found by the poll are highly unlikely, IF there's really 50% support in the population. It's therefore reasonable to doubt that null hypothesis of 50% support. More generally, we can say that a small p value throws doubt on the null hypothesis.
p values and the normal distribution Steps...
1) We identified our sample result (M = 57.8) and our null hypothesis (H0: μ = 50). 2) We focused on the difference, or discrepancy, between that result and the null hypothesis value. Our difference was (57.8 − 50). 3) We used a formula (Equation 6.1) to calculate from that difference the value of a test statistic, IF H0 were true. We calculated that z = 2.136. 4) We consulted the distribution of the test statistic, z, to find the two tail areas corresponding to that value of the test statistic. Figure 6.5 tells us that the total area is .03 (after rounding); this is our p value. 5) We interpreted the p value, using the NHST or strength of evidence approach.
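Steps 1 to 4 can be sketched in a few lines. Equation 6.1 is not reproduced in these notes, so the sketch assumes it is the usual z formula for a mean with known σ, z = (M − μ0) / (σ / √N). The values σ = 20 and N = 30 are also assumptions that are not given here; they were picked because they reproduce the reported z = 2.136.

```python
from statistics import NormalDist

m, mu0 = 57.8, 50  # sample result and null hypothesis value (from the notes)
sigma, n = 20, 30  # ASSUMED values, not stated in the notes

# Steps 2 and 3: turn the discrepancy into a z test statistic.
z = (m - mu0) / (sigma / n ** 0.5)

# Step 4: two-tailed p value = area in both tails beyond |z|.
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p, 2))
```

Step 5, the interpretation, is then a matter of comparing p with the significance level or reading it as strength of evidence.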
the slippery slope of significance.
An effect is found to be statistically significant, is described, ambiguously, as "significant", then later is discussed as if it had been shown to be important or large.
the slippery slope of nonsignificance.
An effect is found to be statistically nonsignificant, then later discussed as if that showed it to be non-existent.
Red Flag 4: Beware the p Value: What a p Value Is, and What It's Not
Beware any suggestion that the p value is the probability that H0 is true. In other words, the p value is not the probability that our results are due to chance.
Red Flag 1: Dichotomous Thinking
Beware dichotomous conclusions (effect, no effect), which may give a false sense of certainty. Prefer estimation thinking. Express research aims as "how much" or "to what extent" questions.
Red Flag 2: the "S" word
Beware the dangerously ambiguous S word. Say "statistically significant", or use a different word. Beware the slippery slope of significance.
The key requirement is that you must choose H1 in advance of conducting the study, and only choose a one-tailed alternative if you have a very good reason...
It's totally unacceptable to calculate, for example, p = .06, two-tailed, then claim that you really meant to use a one-tailed alternative, so one-tailed p = .03 and you can declare statistical significance. No! Any report that includes p values should state whether they are two-tailed or one-tailed.
(3) Decide whether to reject the null.
NHST compares the p value with a criterion called the significance level, often chosen to be .05. If p is less than that level, we doubt the null hypothesis. -In fact we doubt it sufficiently to reject it and say we have a statistically significant effect. -If p is greater than the significance level, we don't reject the null hypothesis and can say we have a statistically nonsignificant effect, or that we didn't find statistical significance. -Note carefully that we say the null hypothesis is "not rejected", but we don't say the null hypothesis is "accepted". Sorry about the multiple "nots"; I'm afraid they come with the NHST territory.
Linda runs a study to compare donations to a charity prompted by the door-in-the-face technique with those prompted by a standard donation request. She will use NHST, using α = .05 and the null hypothesis that donations are the same regardless of type of request. (Have you heard of the door-in-the-face technique? You can find out more at tiny.cc/doorinface.)
Quiz 6.3
(1) State a null hypothesis.
The null hypothesis is a statement about the population that we wish to test. -It specifies a single value of the population parameter that serves as a reference or baseline value, to be evaluated. -Here we would probably choose 50% as our null hypothesis value, so our null hypothesis is the statement that "there is 50% support in the population"—the level to be exceeded for the proposition to pass. -Often a null hypothesis states that there has been no change, or that an effect is zero.
Linda finds p = .001 for the comparison of donation amounts. She thus decides to reject the null hypothesis. In this case, what types of errors does Linda need to worry about? (a) Linda might be making a Type I error, but she doesn't need to worry about a Type II error. (b) Linda might be making a Type II error, but she doesn't need to worry about a Type I error. (c) Linda could be making either a Type I or a Type II error. (d) No need to worry about any kind of error: The p value is very small, so Linda is almost certainly correct. (Hint: This is not the right answer!)
a
The absolute value of the t-scores between which 95% of all observations fall: (a) will almost always be larger than the corresponding z-score (b) will almost always be smaller than the corresponding z-score (c) will almost always be equal to the corresponding z-score (d) is usually 1.96 times smaller than the corresponding z-score.
a
test statistic
a statistic with a known distribution, when H0 is true, that allows calculation of a p value. For example, z is a widely used test statistic that has the standard normal distribution.
What would be a Type I error for this study? (a) The null hypothesis is true but Linda rejects it. That is, the door-in-the-face technique is not better, but Linda comes to think that it is. (b) The null hypothesis is false, but Linda fails to reject it. That is, the door-in-the-face technique is better, but Linda remains skeptical. (c) The null hypothesis is true, but Linda fails to reject it. That is, the door-in-the-face technique is not better, and Linda remains skeptical about it. (d) The null hypothesis is false but Linda rejects it.
a. Options c and d describe correct decisions, not errors.
In the NHST approach, p is the probability of obtaining results like ours IF the null hypothesis is true. What values can p take? a. Minimum of 0, maximum of 1. b. Minimum of −1, maximum of 1. c. Minimum of 0, no maximum at all. d. Trick question—there is no minimum or maximum for p values.
a: p is a probability and therefore can only range between 0 (no chance) and 1 (certain)
Explaining p: Here's another way to think about the p value. Suppose you run an experiment to investigate whether your friend can use the power of her mind to influence whether a coin comes up heads or tails. You take great care to avoid trickery (consult a skilled conjurer to discover how difficult that is). Your friend concentrates deeply then predicts correctly the outcome of all 10 tosses in your trial. I can tell you that the p value is .001, the probability she would get all 10 correct, IF the null hypothesis of a fair coin and random guessing were true.
Are you going to reject H0, and buy her the drink she bet you? Or will you conclude that most likely she's just had a very lucky day? Sure, .001 is very small, but you find her claimed power of the mind very hard to accept. That's our dilemma: Either H0 is true and a very unlikely event has occurred as the tiny p value indicated, or H0 is not true.
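The .001 in this example is easy to verify. Under the null hypothesis of a fair coin and random guessing, each prediction is correct with probability .5 and the tosses are independent, so the chance of 10 correct out of 10 is .5 multiplied by itself 10 times. There is no result more extreme than 10 out of 10 in that direction, so this single probability is the whole one-tailed p value.

```python
# Probability of predicting all 10 tosses correctly IF guessing is random.
p = 0.5 ** 10

print(p)            # 0.0009765625
print(round(p, 3))  # 0.001
```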
In the NHST approach, once a p value is obtained it is compared to the significance level, which is usually set at .05. If p < significance level, a. the null hypothesis is not rejected. b. the null hypothesis is rejected. c. the result is not statistically significant. d. something has gone horribly wrong in your research.
b
What would be a Type II error for this study? Choose again from a, b, c, and d.
b
When interpreting NHST results, it is important to remember that (a) statistically significant (p < α) does not necessarily mean the finding is important, large, or meaningful. (b) just because the null is not rejected does not mean you should accept the null as true. (c) p is not the probability that H0 is true, it is the probability of obtaining your results or more extreme if H0 is true. (d) All of the above.
d
Use z to calculate the p value when we are willing to assume σ is known.
equation 6.1
t when H0 is true
equation 6.2
Choose smaller α for...
fewer Type I errors, at the cost of more Type II. Fewer false positives, but more misses.
If the 95% CI contains the null hypothesis value, the corresponding p value will be greater than / less than .05 and the NHST decision will be to reject / not reject the null hypothesis.
greater than, not reject
A one-sided (or directional) alternative hypothesis
includes only values that differ in one direction from the null hypothesis value. For example, H1: μ > 50.
A two-tailed p value
includes values more extreme in both positive and negative directions.
If I choose smaller α, then β will be __________, and there will be more __________ errors, but fewer __________ errors.
increased, Type II, Type I
The alternative hypothesis (H1)
is a statement about the population effect that's distinct from the null hypothesis. There is an effect.
The inverse probability fallacy
is the incorrect belief that the p value is the probability that H0 is true.
Table 6.1 The Four Possibilities for NHST Decision Making
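Reject H0 when it's true: Type I error (false positive). Don't reject H0 when it's false: Type II error (false negative). Don't reject H0 when it's true, or reject H0 when it's false: correct decisions.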
For the .05 significance level, reject the null hypothesis if its value...
lies outside the 95% CI; if inside, don't reject.
NHST
null hypothesis significance testing
When p is very small (close to 0) it means we have obtained results that are likely / unlikely if the null hypothesis is true.
unlikely
When it's reasonable to assume σ is known, do so, and use z to calculate the CI and/or p value. If not...
use s to estimate σ and calculate t.
The strict NHST significance level chosen in advance is
α, and the decision rule is: Reject H0 when p < α; otherwise don't reject.
Use t and s, the sample SD, to calculate the p value when...
σ is not known.
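When σ is not known, the test statistic is built from s instead. Equation 6.2 is not reproduced in these notes, so the sketch below assumes it is the standard one-sample formula t = (M − μ0) / (s / √N) with N − 1 degrees of freedom. The scores are made-up data and the function name `t_statistic` is hypothetical.

```python
from statistics import mean, stdev


def t_statistic(sample, null_value):
    """One-sample t and its degrees of freedom, using s to estimate sigma."""
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)  # sample SD, with n - 1 in the denominator
    t = (m - null_value) / (s / n ** 0.5)
    return t, n - 1


scores = [52, 61, 48, 70, 58, 63, 55, 66]  # made-up data for illustration
t, df = t_statistic(scores, 50)
print(round(t, 2), df)
```

Converting t to a p value needs the t distribution with df degrees of freedom, which the Python standard library does not provide, so that step is left out here.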