PSY 202 - Power, p, & CI
given effect size
*In the estimation approach, you're calculating a point estimate and interval for a __ __ _.
Effect Size
-- In general, an effect size is "how much of anything of interest."
-- EX: In a t-test, how much do the means differ?
-- EX: In a correlation, how strongly do the variables relate?
Point Estimate
-- The point estimate is any sort of value you can estimate (slope, mean, Cohen's d, etc.).
Interval
-- Range around our point estimate.
-- Usually 95% confidence intervals.
-- EX: If our point estimate is 4, then our confidence interval is a range around this.
Formula for 95% Confidence Interval
95% CI = ES ± zcrit * SE
Upper bound
-- The point estimate PLUS the (z or t) critical value times the standard error.
Lower bound
-- The point estimate MINUS the (z or t) critical value times the standard error.
-- ES = the point estimate (mean, mean difference, slope, etc.)
-- zcrit = ±1.96 for a 95% CI
-- SE = sigma divided by the square root of N (σ/√N)
-- zcrit * SE = the margin of error
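A minimal sketch of this formula in Python, assuming a known population SD; the numbers are made up for illustration:

```python
import math

# 95% CI = ES ± zcrit * SE, assuming a known population SD (sigma)
mean = 4.0      # point estimate (ES)
sigma = 2.0     # population standard deviation (known)
n = 25          # sample size
z_crit = 1.96   # critical value for a 95% CI

se = sigma / math.sqrt(n)    # SE = sigma / sqrt(N)
margin = z_crit * se         # margin of error = zcrit * SE
print(f"95% CI = [{mean - margin:.2f}, {mean + margin:.2f}]")  # [3.22, 4.78]
```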
CI Critical Value
95% CI = ES ± zcrit * SE
Upper bound
-- The point estimate PLUS the (z or t) critical value times the standard error.
Lower bound
-- The point estimate MINUS the (z or t) critical value times the standard error.
Which Critical Value to Use
-- Know the Population SD: if you know the population SD, you use zcrit (±1.96).
-- Don't Know the Population SD: if you don't know the population SD, then substitute tcrit. Use the t distribution to determine how many standard errors away the upper/lower bounds are.
-- At a big enough sample size (N), the z (normal) and t distributions will show similar values.
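A sketch of the z-vs-t choice using scipy (an assumed dependency; any critical-value table gives the same numbers):

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)             # known population SD: ±1.96
t_crit_small = stats.t.ppf(0.975, df=9)    # unknown SD, N = 10: ≈ 2.26
t_crit_large = stats.t.ppf(0.975, df=999)  # unknown SD, N = 1000: ≈ 1.96
print(round(z_crit, 2), round(t_crit_small, 2), round(t_crit_large, 2))
# 1.96 2.26 1.96 -- with a big enough N, t converges to z
```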
G*Power
A well-known, free program that can be used to calculate the power associated with a study is known as ___; this program helps you run a power analysis. With this program, you can calculate the necessary sample size to detect an effect of interest with a given power and alpha level.
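G*Power itself is a standalone program, but as a rough sketch of the same calculation, Python's statsmodels can also solve for a required sample size (an analogue, not the program the card names):

```python
from statsmodels.stats.power import TTestIndPower

# Solve for N per group given effect size, alpha, and desired power
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # Cohen's d (medium)
                                   alpha=0.05,
                                   power=0.80)
print(round(n_per_group))  # ≈ 64 participants per group
```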
Sample size (N) is large
All else being equal, POWER is GREATER when __ __ _ __. As sample size increases, so does your power. You're more likely to detect an effect with lots of participants.
Beta (β) is small
All else being equal, POWER is GREATER when __ __ _ __. If Beta (β) is smaller, then power will be larger.
Power = 1 - β
-- Power equals 100% minus the probability of missed effects. If the probability of missed effects is smaller, power will be larger.
-- EX: Larger Beta: β = .5 → power = .5
-- EX: Smaller Beta: β = .1 → power = .9
Alpha is large
All else being equal, POWER is GREATER when __ __ _ __. If the alpha level (α) is larger, then power will be greater.
-- Alpha (α) is the probability of making a Type I error, or rejecting the null when you shouldn't.
-- If you increase the allowed probability of false positives over time (the alpha level), then you'll reject the null more often.
-- You'll be making more Type I errors (false positives) but fewer Type II errors (missed effects): rejecting more often means that you won't miss as many effects; you'll also be correctly rejecting more often.
-- EX: Change the alpha level from .05 (5%) to .10 (10%) and you will reject the null more often, at the cost of a greater percentage of false positives.
Effect size is large
All else being equal, POWER is GREATER when __ __ _ __. You will have more power if the effect is larger. It's easier to detect a large effect than a small effect. When you have a bigger effect, it's easier to find group differences with a smaller sample; this means there's a smaller required sample size when you have a bigger effect!
1. Effect Size is large 2. Sample size (N) is large 3. Alpha (α) is large 4. Beta (β) is small
All else being equal, POWER is GREATER when: 1. 2. 3. 4.
Power is greater when
All else being equal, __ _ __ __: 1. Effect Size is large 2. Sample size (N) is large 3. Alpha (α) is large 4. Beta (β) is small
CI Calculation Example
CI Calculation Example
EX: Are Furman students smarter than average?
-- µ = 100, σ = 15
-- Sample 10 Furman students and find the mean IQ is 120.
We want to calculate our confidence interval:
95% CI = ES ± zcrit * SE
95% CI = 120 ± 1.96 (15/√10)
120 ± 9.3
95% CI = [110.7, 129.3]
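The same arithmetic, re-run as a quick Python check:

```python
import math

mu, sigma = 100, 15   # population IQ parameters
m, n = 120, 10        # sample mean and sample size

se = sigma / math.sqrt(n)   # 15 / sqrt(10) ≈ 4.74
margin = 1.96 * se          # ≈ 9.3
print(f"95% CI = [{m - margin:.1f}, {m + margin:.1f}]")  # [110.7, 129.3]
```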
CI Calculation Example
CI Calculation Example
EX: The Psychic and Blood Pressure
-- A psychic claims to be able to "read" blood pressure by looking at people. You ask the psychic to identify people with abnormal blood pressure.
-- The psychic selects 81 people, and you find their average blood pressure (M = 127).
-- You know from previous research that the average blood pressure in the population (µ) is 124, with a standard deviation (σ) of 18.
-- Do the data support the psychic's claim that he can "read" blood pressure?
We want to calculate a confidence interval:
95% CI = ES ± zcrit * SE
ES = 127; SE = 18/√81 = 2; zcrit = ±1.96
95% CI = 127 ± 1.96 (2)
127 ± 3.92
95% CI = [123.08, 130.96]
Does the confidence interval contain the parameter?
-- Yes, the confidence interval contains 124, so the sample mean is NOT significantly different from the population mean.
-- So, the data do NOT support the psychic's claim that he can "read" blood pressure.
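A quick Python check of the psychic example, including the does-the-interval-contain-the-parameter step:

```python
import math

mu, sigma = 124, 18   # population blood pressure parameters
m, n = 127, 81        # sample mean and sample size

se = sigma / math.sqrt(n)   # 18 / 9 = 2
lower = m - 1.96 * se       # 123.08
upper = m + 1.96 * se       # 130.96
print(f"95% CI = [{lower:.2f}, {upper:.2f}]")
# The parameter (124) is inside the interval -> not significant
print("Interval contains mu:", lower <= mu <= upper)  # True
```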
Confidence interval for Independent Samples t test*
CONSTRUCTING CONFIDENCE INTERVALS Independent Samples t-Test Example Independent Samples t → Compare two group means EX: Do we find people more attractive if we've interacted with them? -- One IV, two levels: interaction, no interaction -- Participants randomly assigned to one condition -- Rated attraction from 1 to 9 -- 10 participants in the interaction condition -- 9 participants in the no interaction condition Construct confidence intervals around the mean difference.
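A sketch of constructing that interval around the mean difference; the ratings below are invented (the card gives only the design and group sizes), so treat this as illustrative, not the study's data:

```python
import numpy as np
from scipy import stats

interaction = np.array([7, 6, 8, 5, 7, 6, 9, 7, 6, 8])  # n = 10 (made up)
no_interaction = np.array([5, 4, 6, 5, 3, 5, 4, 6, 5])  # n = 9 (made up)

diff = interaction.mean() - no_interaction.mean()
n1, n2 = len(interaction), len(no_interaction)

# pooled variance and the standard error of the mean difference
sp2 = ((n1 - 1) * interaction.var(ddof=1) +
       (n2 - 1) * no_interaction.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

print(f"95% CI for the mean difference: "
      f"[{diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}]")
```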
Confidence Interval for One Sample t Test
CONSTRUCTING CONFIDENCE INTERVALS
One Sample t Test Example
One sample t → compare a group mean to a population mean.
EX: Students and Testing
-- Suppose 10 students take a test.
-- The test has 100 questions & each question has 5 options.
-- The mean of the sample is 25 with a SD of 3.68.
-- Did the students perform better than chance?
x̄ = 25, s = 3.68, µ = 20
SE = 3.68 / √10 = 1.16
t = (x̄ - µ) / SE = (25 - 20) / 1.16 ≈ 4.3
-- Looking at the t distribution (df = 9), tcrit = 2.26
-- Then, we construct our confidence interval with the confidence interval equation:
95% CI = ES ± tcrit * SE
-- 95% CI = 25 ± 2.26 * 1.16
-- 95% CI = [22.4, 27.6]
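The same example computed from the summary statistics in Python:

```python
import math
from scipy import stats

xbar, s, n, mu = 25, 3.68, 10, 20   # sample mean, SD, N, chance performance

se = s / math.sqrt(n)                  # ≈ 1.16
t_obt = (xbar - mu) / se               # ≈ 4.3
t_crit = stats.t.ppf(0.975, df=n - 1)  # ≈ 2.26
print(f"t = {t_obt:.2f}, 95% CI = [{xbar - t_crit * se:.1f}, "
      f"{xbar + t_crit * se:.1f}]")    # t = 4.30, 95% CI = [22.4, 27.6]
```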
see if the null is included
Confidence Intervals and Significance Testing
Null hypothesis significance testing (NHST)
-- There are different ways to do this.
-- We've been constructing the interval around the null (using p-values).
-- But, we could also construct the interval & __ _ __ _ __.
NHST with Confidence intervals
-- See if the null is included: does your 95% CI contain the null value?
-- If your CI doesn't contain 0 (null is outside the interval) → statistic is significant.
-- If your CI DOES contain 0 (null is inside the interval) → statistic is NOT significant.
This is the same kind of significance testing, just done a different way. Some people think that this gives you more information about the range of your estimate.
center values are more likely!!
Confidence intervals are often presented as a straight line, and this makes all values in the interval seem equally likely. However, NOT all the values in a confidence interval are equally likely estimates. Instead, some suggest that we should use a cat-eye diagram. This diagram shows that __ __ _ __ __!
1 - β = power
How do you find power using Beta (β)? Remember, Beta (β) is the probability that you get a Type II error (missed effects) over time!
Long run
In a single study, p is NOT the probability of making a Type I error. In a single study, either you made an error or you did not; there is no long-run probability about that one result. Instead, the probability of errors refers to the __ __.
-- The probability of Type I & Type II errors over time.
-- If my alpha level is .05 and I conduct 100 studies in which the null is true, then I will probably see a Type I error in about 5 studies.
-- The null is true or it isn't (but we never know for sure).
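A simulation sketch of the long-run idea (the population values are made up; what matters is that the null is true by construction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rejections = 0
for _ in range(1000):                      # 1000 studies where the null is TRUE
    sample = rng.normal(100, 15, size=30)  # sampled from mu = 100
    result = stats.ttest_1samp(sample, popmean=100)
    if result.pvalue < .05:                # "significant" despite a true null
        rejections += 1
print(rejections / 1000)  # ≈ 0.05 -- about 5% Type I errors over time
```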
alpha = .05, beta = .2
In psychology, we are generally comfortable with an alpha of __ and beta of __. We're okay with a greater probability of missing an effect (20%) than of getting a false positive (5%). We're more afraid of making a claim that's untrue than of saying there's no effect when there is one.
95% of the time
Interpreting Confidence Intervals: Interpretation 1: In the long run (many repeated studies/samples) a 95% CI will contain the population parameter __ _ _ __. You get an estimate & interval based on your sample.
-- The true population value falls within the range in 95% of the confidence intervals you make.
-- The confidence is in the formula, NOT one specific interval.
This interpretation says that...
-- Higher confidence (e.g., 99%) = wider interval.
-- More confident that a wider interval will contain the true parameter.
-- Bigger sample = narrower interval; more precise.
95% confident
Interpreting Confidence Intervals: Interpretation 2: We're __ __ that a reasonable, possible value for the parameter lies somewhere between the lower and upper bound of the interval. 95% confident that the interval contains the parameter -- YES the interval contains the parameter -- or NO the interval doesn't contain the parameter. -- The parameter is in the interval or it isn't—you're calculating the interval after the fact! Note: NOT a probability estimate (95% confident ≠ 95% chance)
dichotomous (NHST), estimation (CIs)
NHST (Null hypothesis significance testing)
-- NHST is a __ approach.
-- ASKS: Is there an effect? Yes or no?
-- Encourages dichotomous thinking.
-- Rejecting the null is seen as good; failing to reject it is seen as bad.
-- May incentivize p-hacking (people want so badly to reject the null that they do everything in their power to do so).
Confidence Intervals
-- Confidence intervals use the __ approach.
-- ASKS: How much? To what extent?
NHST vs. Estimation approach
-- NHST: Is there an effect of this treatment?
-- Estimation: How much of an effect did this treatment have?
less likely
NOT all values in the confidence interval (CI) are equally likely! The farther a value in a CI is from the point estimate, the __ __ it is to be the true population value.
-- The scores in the interval are all within a reasonable range.
-- BUT, they still get less likely the farther they are from the point estimate.
-- They just don't cross the cutoff, so they remain plausible enough.
detect an effect
POWER ANALYSIS refers to the process of determining the sample size that you need based on the alpha, effect size, and beta that you need. In other words, this is the process of determining the sample size that will produce the needed probability of false positive (alpha), probability of missed effects (Beta), and effect size. Power analysis is most useful to calculate a required sample size BEFORE conducting a study. With this, you can determine how big of a sample you'll need to _____.
increase sample size
POWER is GREATER when we __ __ _. As N increases, the sampling distributions become narrower, so the null and alternative distributions overlap less. So, there's less room for error and more room for rejecting the null.
most likely value
Point Estimates are the BEST estimate of the true population parameter. Confidence intervals are based on distributions; your best estimate is still your point estimate (effect size). -- Best bet for the population parameter is still the point estimate; the __ _ __ for the population value.
don't make sense
Power analysis is most useful for calculating a required sample size BEFORE conducting a study. With this, you can determine how big a sample you'll need to detect an effect. "Observed power" and "post hoc power" __ _ __.
-- Determining the ability to detect an effect after running the test doesn't make sense.
-- You already know if you've detected the effect or not!
1. Effect size 2. Sample size 3. Alpha 4. Beta
Power, the ability to detect an effect when there is one, is determined by: 1. Effect size 2. Sample size 3. Alpha 4. Beta If you know any three of the above, you can calculate the fourth!
Power is determined by
Power, the ability to detect an effect when there is one, is determined by: 1. Effect size 2. Sample size 3. Alpha 4. Beta If you know any three of the above, you can calculate the fourth!
inferences about populations
Remember that we hardly ever (basically never) know population values. We use inferential statistics; by this, I mean we use samples to make __ _ __. For example, we use a t-test when we don't know the population standard deviation; the sample standard deviation stands in.
the probability that we'll reject the null correctly
Remember that we hardly ever (basically never) know population values. We use inferential statistics; by this, I mean we use samples to make inferences about populations. We'll never truly know when we make a mistake. We'll never actually know when the null is true or false, but we can determine the _______. -- The probability that we'll screw it up or get it right.
large enough N, sampling distribution is normal
Sampling distribution of the mean is a distribution of many, many means.
-- Repeated samples of the same (equal) size.
-- N = the size of each sample.
With a large enough sample size (N > 30), the sampling distribution of the mean is approximately normal. With a __ _ __ N (> 30), the ____.
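A simulation sketch of this idea; the skewed population below is invented just to show the normality emerging:

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=10, size=100_000)  # a skewed population

# Draw many samples of size 40 and keep each sample's mean
sample_means = [rng.choice(population, size=40).mean() for _ in range(5_000)]

print(np.mean(sample_means))  # ≈ 10, the population mean
print(np.std(sample_means))   # ≈ sigma / sqrt(40), the standard error
```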
Power
Statistical ____ refers to the ability to detect an effect when an effect exists. The ability to reject the null when the null is false. Power is all about correctly rejecting the null when you should.
p-value
The __-__ is the probability that you would see results at least as extreme as yours ASSUMING THE NULL IS TRUE. Always start by assuming that the null is true. If the null is true, then the p-value is the probability that you'd see results at least as extreme as yours. You want this value to be small and therefore rare. The more unlikely it is that your value came from the null distribution, the greater the evidence that the null distribution is not the true state of the world!
Tradeoff between errors
There's a __ ___ Type I and Type II __. If you have a larger alpha level, you'll be more likely to make a Type I error (false positive). If you have a smaller alpha level, you'll be more likely to make a Type II error (missed effect).
EX: Smaller Alpha
-- Alpha of .01
-- More likely to fail to reject the null when you should reject it (Type II error).
-- Very conservative!
EX: Larger Alpha
-- Alpha of .05
-- More likely to reject the null correctly.
-- More likely to reject the null incorrectly too (Type I error).
-- Not as conservative.
Problems with p-values?
There's a discussion about the value of p-values in psychological research.
Are p-values bad?
-- Some say yes; there are many recommendations to get rid of them and use confidence intervals instead.
Incorrect Interpretation
-- The bigger issue, in Dr. Bent's opinion, is interpretation.
-- People don't know what p-values mean.
Dichotomous (Good/Bad) Thinking
-- p-values do imply/support dichotomous thinking.
-- Rejecting the null is seen as good; failing to reject it is seen as bad. This is not how we should be thinking!
-- "Significant" is a poor word choice; significant sounds like meaningful.
-- Really, EFFECT SIZES determine if the effects are meaningful, NOT p-values! Remember this!
Confidence interval*
This is a range around our point estimate. Usually a 95% confidence interval.
alpha (α)
This is the probability that we'll get a Type I error over time. The probability that we'll say something's happening when it's not. If you set your alpha at .05, you're acknowledging that you'll make a Type I error 5% of the time, over time.
Beta (β)
This is the probability that we'll get a Type II error over time. The probability that you'll miss an effect when it's actually there. 1 - β = power 100% - the probability of missed effects = power.
can compare confidence intervals
We __ __ two __ __ to see if groups are significantly different. Do the intervals overlap?
-- No overlap → if two confidence intervals do not overlap, then the groups are significantly different.
-- Overlap → if two confidence intervals overlap, no conclusions can be drawn without further testing.
For example, we can construct a CI around each mean in an independent-samples t-test to see if the intervals around the means overlap.
-- If the intervals don't overlap, then the means are significantly different; if they do, then we have to do further testing.
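A sketch of that overlap check with made-up data for two groups:

```python
import numpy as np
from scipy import stats

def ci_95(x):
    """95% CI around a group mean, using the t critical value."""
    se = x.std(ddof=1) / np.sqrt(len(x))
    t_crit = stats.t.ppf(0.975, df=len(x) - 1)
    return x.mean() - t_crit * se, x.mean() + t_crit * se

group_a = np.array([7, 6, 8, 5, 7, 6, 9, 7, 6, 8])  # made-up ratings
group_b = np.array([5, 4, 6, 5, 3, 5, 4, 6, 5])

lo_a, hi_a = ci_95(group_a)
lo_b, hi_b = ci_95(group_b)
overlap = lo_a <= hi_b and lo_b <= hi_a
print("Intervals overlap:", overlap)  # overlap -> further testing needed
```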
significantly different, we have to do further testing (can't make conclusions yet).
We can compare two confidence intervals to see if groups are significantly different. If two confidence intervals do NOT overlap, then they are ____. If two confidence intervals DO overlap, then ____.
contains zero, doesn't contain zero.
We can conduct Null hypothesis significance testing (NHST) using confidence intervals. We want to see if the null (0) is included in the interval. -- If our 95% CI __ __, then our statistic is NOT significant. -- If our 95% CI __ _ __, then our statistic is significant. Therefore, we want to see if our 95% CI contains the null value; we ask, is the null inside or outside the interval?
Confidence intervals and the Null Distribution*
We can use confidence intervals for hypothesis testing!
Null Distribution
-- According to the null distribution, the true population relationship is 0 (nothing is happening).
-- We construct a 95% confidence interval around 0.
Upper and Lower Bound
-- When you calculate a confidence interval, you're trying to find the upper and lower bound cutoffs for the 95% interval.
-- EX: In a sampling distribution (using a z statistic), the upper and lower bound cutoffs will be 1.96 standard errors away from the point estimate.
-- If your test statistic lies outside of that range, then your value likely doesn't belong to the null distribution.
Rejecting & Failing to Reject
-- When you reject the null, you get a value that's outside of the 95% confidence interval; it's super rare.
-- When you fail to reject the null, you get a value that's inside of the 95% confidence interval.
95% of area under the curve
When calculating confidence intervals, we're looking for the range of values that would delineate __ _ __ __ __ __. For normal distributions, the upper and lower bounds of the interval are ± 1.96 standard errors from the point estimate.
Power on a graph
When conducting an experiment, we're trying to see the real state of the world. To see if our findings are significant, we compare a test statistic to the null distribution; we assume that the null distribution is the true state of the world (in which nothing is happening).
On a graph, the null distribution is represented by a normal curve (in red), and the test statistic is somewhere on the x-axis. The critical value (green line) is the value that we have to exceed to determine significance and reject the null hypothesis in favor of the alternative (distribution in blue).
Around the critical value, the two distributions overlap: the area of the null distribution beyond the critical value is the probability of a Type I error (alpha), and the area of the alternative distribution below the critical value is the probability of a Type II error (beta). Power is represented by the area of the alternative distribution beyond the critical value, where we have the ability to detect significance.
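A sketch that draws the graph described above, using invented distribution parameters (null at 0, alternative at 3, one-tailed alpha = .05):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.linspace(-4, 8, 500)
null = stats.norm.pdf(x, loc=0, scale=1)  # null distribution (red)
alt = stats.norm.pdf(x, loc=3, scale=1)   # alternative distribution (blue)
crit = stats.norm.ppf(0.95)               # one-tailed .05 cutoff ≈ 1.64

plt.plot(x, null, color="red", label="null")
plt.plot(x, alt, color="blue", label="alternative")
plt.axvline(crit, color="green", label="critical value")
plt.fill_between(x, alt, where=x >= crit, alpha=0.3)  # shaded area = power
plt.legend()
plt.show()

print(1 - stats.norm.cdf(crit, loc=3))  # power ≈ 0.91 in this sketch
```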
95% of the time OR 95% confident
When interpreting confidence intervals, there are two main interpretations used: 1. In the long run, after many repeated samples/studies, a 95% confidence interval will contain the parameter __ __ __ __. 2. You are __ __ that the interval contains the parameter.
Increasing & Decreasing Alpha
When we increase alpha level, there's a greater probability of Type I errors (false positives). When we decrease alpha level, there's a greater probability of Type II errors (missed effects).
Type I error
When you have a larger alpha level, you're more likely to make a __ _ __. In other words, you're more likely to have a false positive by incorrectly rejecting the null.
Type II error
When you have a smaller alpha level, you're more likely to make a __ _ __. In other words, you're more likely to have a missed effect by not rejecting the null.
Point estimate
You calculate a confidence interval around a __ __. Any sort of value that you estimate (slope, mean, Cohen's d) can have a confidence interval constructed around it. 95% CI = ES ± zcrit * SE
-- ES is the point estimate (aka the effect size).
Power analysis
__ __ refers to the process of determining the sample size that you need based on the alpha, effect size, and beta that you need. In other words, this is the process of determining the sample size that will produce the needed probability of false positive (alpha), probability of missed effects (Beta), and effect size. Power analysis is most useful to calculate a required sample size BEFORE conducting a study. With this, you can determine how big of a sample you'll need to detect an effect.
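As a complement to a program like G*Power, here is a simulation sketch that estimates power directly at a few candidate sample sizes (all numbers invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def simulated_power(n, d=0.5, alpha=0.05, reps=2000):
    """Estimate power for a two-group t-test with true effect size d."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(0, 1, n)  # control group
        b = rng.normal(d, 1, n)  # treatment group shifted by d
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / reps

for n in (20, 40, 64):
    print(n, simulated_power(n))  # power climbs toward ≈ .80 near n = 64
```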
Some other model, extreme scores sometimes
p-values NEVER show the null to be true or false! p-values are statements about data, not about theory!
Fail to Reject the Null
-- If p > .05, then we fail to reject the null.
-- DON'T say the null is true, because unless p = 1, some __ __ _ could be more likely.
Reject the Null
-- If p < .05, then we reject the null in favor of the alternative.
-- DON'T say that the null is false, because we'll get ___ __ __ even when the null is true.
Additionally, all p-values assume that the null is true to begin with; so, it wouldn't make sense to say that the null is true or false given this prior assumption.