Chapter 15 Stats - Power
power is a function of:
1. α (alpha), the probability of a Type I error
2. the true alternative hypothesis (H1)
3. the sample size
4. the particular test to be employed
what does it mean when power is .6
60% of the time the study as designed will obtain a significant result, and we will reject H0 and accept H1; 40% of the time the study will fail to obtain a significant result, a Type II error.
hyp testing essentials
Recall that a null hypothesis (H0) states that the findings of the experiment are no different from those that would have been expected to occur by chance.
• Statistical hypothesis testing involves calculating the probability of achieving the observed results if the null hypothesis were true.
• If this probability is low (conventionally p < .05), the null hypothesis is rejected and the findings are said to be "statistically significant" (i.e., unlikely under H0) at that accepted level.
why does power depend upon sample size
as sample size increases with a fixed standard deviation, the standard error decreases, and power therefore increases
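A minimal sketch of this relationship (the means, standard deviation, and alpha below are illustrative assumptions, not values from the text):

```python
# Power of a two-tailed one-sample z test as n grows, via the normal
# approximation: power ~ P(Z > z_crit - delta), with delta in SE units.
from scipy.stats import norm

mu0, mu1, sigma, alpha = 100.0, 105.0, 15.0, 0.05   # assumed values
z_crit = norm.ppf(1 - alpha / 2)                    # two-tailed cutoff

for n in (10, 25, 50, 100):
    se = sigma / n ** 0.5                  # standard error shrinks with n
    delta = (mu1 - mu0) / se               # distance between H0 and H1 in SEs
    power = 1 - norm.cdf(z_crit - delta)   # upper-tail rejection probability
    print(f"n={n:4d}  SE={se:5.2f}  power={power:.3f}")
```

With these assumed numbers, power climbs from about .18 at n = 10 to about .92 at n = 100.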
t-values are significant if
their absolute value is greater than or equal to the critical value of t for the chosen alpha level
why do we have a special formula for calculating power in a two group experiment with quite unequal sample sizes
we want to give more weight to estimates based on larger samples
power equals
1 - β. When we say that the power of a particular experimental design is .65, we mean that if the null hypothesis is false to the degree we expect, the probability is .65 that the results of the experiment will lead us to reject H0.
When Do You Need Statistical Power Calculations, And Why?
A prospective power analysis is used before collecting data, to consider design sensitivity.
• It allows a trial-and-error approach to finding the right combination of sample size, effect size, and power.
A retrospective power analysis is used to judge whether the studies you are interpreting were well enough designed.
• Did they have a large enough sample to detect the expected effect size?
How Do We Measure Effect Size?
Cohen's d
• Defined as the difference between the means for the two groups, divided by an estimate of the standard deviation in the population.
• Often we use the average of the standard deviations of the samples as a rough guide for the latter.
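A small illustration of that definition (the data below are made-up numbers, and averaging the two sample SDs is just the rough guide mentioned above):

```python
# Cohen's d: difference between group means divided by an SD estimate.
import statistics

group1 = [5.1, 6.2, 5.8, 6.0, 5.5]   # hypothetical sample data
group2 = [4.2, 4.9, 5.0, 4.4, 4.6]

mean_diff = statistics.mean(group1) - statistics.mean(group2)
sd_est = (statistics.stdev(group1) + statistics.stdev(group2)) / 2
d = mean_diff / sd_est               # a very large d for these made-up data
print(round(d, 2))
```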
Failing to reject the null hypothesis
However, "no statistical difference" does not prove the null hypothesis. • We simply do not have evidence to reject it. • A failure to find a significant effect does not necessarily mean there is no effect. • So it is difficult to have confidence in the null hypothesis: • Perhaps an effect exists, but our data is too noisy to demonstrate it.
When Do You Need Statistical Power Calculations, And Why?
In Cohen's (1962) seminal power analysis of the Journal of Abnormal and Social Psychology, he concluded that over half of the published studies were insufficiently powered to reach statistical significance for the main hypothesis.
Power and Sample Size
One of the most useful aspects of power analysis is the estimation of the sample size required for a particular study.
• Too small a sample and an effect may be missed.
• Too large a sample and the study becomes needlessly expensive.
• Different formulae/tables for calculating sample size are required according to experimental design.
What does power depend on?
Power is your ability to find a difference when a real difference exists. The power of a study is determined by four factors:
• Type of test (one- vs. two-tailed)
• Alpha level
• Sample size
• Effect size: the association between DV and IV, i.e., the separation of the means relative to error variance
Statistical Power
Sometimes we will incorrectly fail to reject the null hypothesis - a Type II error.
• There really is an effect, but we did not find it.
• Statistical power is the probability of detecting a real effect.
• More formally, power is given by 1 - β, where β is the probability of making a Type II error.
• In other words, it is the probability of not making a Type II error.
estimating effect size
There are three ways to decide what effect size is being aimed for:
• On the basis of previous research: meta-analysis, reviewing the previous literature and calculating the previously observed effect size (in the same and/or similar situations).
• On the basis of personal assessment.
• On the basis of theoretical importance: deciding whether a small, medium, or large effect is required.
The first strategy is preferable, but the other strategies may be the only ones available.
How to Increase Power
Use a more lenient alpha (not generally recommended):
• p < .05 is driven by force of habit, not necessarily by substantive concerns.
Increase n:
• Must balance the cost against the benefit.
Increase the effect size:
• Choose a different research question.
• Use stronger treatments or interventions.
• Use better measures.
Making a decision
When the test statistic exceeds the critical value, we reject the null hypothesis:
• p < .05 (or .01 or .001, etc.)
• Statistically significant effect: the independent variable influences the dependent variable.
When a finding does not exceed the critical value, we fail to reject the null hypothesis:
• p > .05 (or .01 or .001, etc.)
• Not statistically significant.
• H0 = no difference (implies no evidence of an effect of the treatment, no difference in means, etc.)
Cohen's approximation technique
an approach based on the normal distribution; the differences between the level of power computed with this method and with the more exact approach are generally negligible
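A sketch of that normal-based shortcut (my own rendering of the idea, not code from the text): treat the test statistic as Normal(δ, 1) and find the probability that it lands beyond the two-tailed critical values.

```python
from scipy.stats import norm

def approx_power(delta, alpha=0.05):
    """Normal-approximation power for a two-tailed test at the given alpha."""
    z_crit = norm.ppf(1 - alpha / 2)
    # The second (opposite-tail) term is usually negligible.
    return (1 - norm.cdf(z_crit - delta)) + norm.cdf(-z_crit - delta)

print(round(approx_power(2.0), 2))   # delta = 2.0 -> power ~ 0.52
```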
POST HOC POWER / retrospective power
Calculating the power of an experiment based on the actual results of that experiment. It invites circular arguments such as: "I had a great chance to reject a false null given the power, and I didn't reject it, so it is probably true," or "My study didn't really have much power to start with, so don't hold it against me that I didn't reject the null; I'm sure it is really false even though I ran a poorly designed study that couldn't detect it."
Power and Effect Size, d
d should be set as the "minimum clinically important difference" (MCID):
• This is the smallest difference between groups that you would care about.
• Smaller differences require larger sample sizes to detect.
• As the separation between two means increases, the power also increases.
• As the variability about a mean decreases, power also increases.
power depends on
the degree of overlap between the sampling distributions under H0 and H1 (this is a function of both the distance between μ0 and μ1 and the standard error)
one-group case for δ
δ = d̂ · √n (the estimated effect size multiplied by the square root of the sample size)
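For example (illustrative numbers, not from the text): with d̂ = .50 and n = 25, δ = .50 × √25 = 2.50; entering the power tables at two-tailed α = .05, this corresponds to power of about .71.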
what do we modify to get a certain power?
Fix δ at the value the tables require for the desired power, then solve for n to see what sample size is needed.
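For example, to reach power = .80 at two-tailed α = .05 the tables give δ = 2.80; if we expect d = .50, then n = (δ/d)² = (2.80/.50)² ≈ 31.4, so we would need about 32 subjects in the one-group case.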
two independent groups
δ = d · √(n/2), where n is the number of observations in each group
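For example, with d = .50 and n = 25 per group, δ = .50 × √(25/2) ≈ 1.77; the same d gives a smaller δ than in the one-group case because both means are now estimated with error.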
how to calc cohens d?
the difference in means divided by an estimate of the standard deviation (e.g., the average of the two sample standard deviations)
one measure of the degree to which H0 is false would be
the distance from μ1 to μ0 expressed in terms of the number of standard errors. (But the standard error includes the sample size; since we want to solve either for the power associated with a given n, or for the value of n required for a given level of power, we instead take as our distance measure the effect size d.)
what name do we give to our adjusted sample size with unequal ns?
effective sample size
variance of the sampling distribution decreases when
either n increases or σ² decreases. As σ decreases, overlap between the sampling distributions is reduced, with an increase in power. Of these, n is the factor most easily manipulated when increasing power.
for independent means, what is the difference in the formula?
expressed in terms of population means rather than sample means
type II
failing to reject a false null (finding no cancer when there actually is)
Type 1
falsely rejecting a true null (saying there is cancer when there isn't)
a more powerful experiment is one that
has a greater probability of rejecting a false Ho
increase the distance between μ0 and the mean under H1
increase in power
3 things that affect power
the level of alpha, the size of the sample, and the size of the difference between the means
the larger we set alpha, what happens to power?
more power, but a greater chance of a Type I error
is it possible to fail to reject a null hypothesis and yet have high retrospective power
no; if the result was not significant, the observed effect was by definition small relative to its standard error, so power calculated from it must come out low
type II
not finding difference that is there
how do you calculate power for diff in population mean
we only need to estimate the difference in the population means and the standard deviation of one or more of the populations
increase alpha
our cutoff point moves to the left, thus decreasing β and increasing power, although with a corresponding rise in the probability of a Type I error
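For example, using the normal approximation sketched earlier with δ = 2.0: power ≈ .52 at two-tailed α = .05 (critical z = 1.96) but ≈ .64 at α = .10 (critical z = 1.645), at the cost of doubling the Type I error rate.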
Independent Samples t-test, Unequal n
With unequal ns, the two-group formula for δ still applies, but n is replaced by the effective sample size, the harmonic mean of n1 and n2 (see the unequal-ns card below).
The Noncentrality parameter, δ
δ is the value used to enter the power tables; it combines the effect size d with the sample size (e.g., δ = d̂ · √n in the one-group case).
3 ways to estimate d (effect size)
• Prior research: look at sample means and variances from other studies, then make an informed guess at the values we might expect for μ1 - μ0 and for the standard deviation.
• Personal assessment of how large a difference is important.
• Use of special conventions: three levels of d (sample-size implications are sketched below):
- small: d = .20 (distributions show 92% overlap)
- medium: d = .50 (80% overlap)
- large: d = .80 (69% overlap)
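A quick sketch of what those conventions imply for planning a two-group study (assuming power ≈ .80 at two-tailed α = .05, for which the tables give δ ≈ 2.80):

```python
# Per-group n from delta = d * sqrt(n/2)  =>  n = 2 * (delta/d)^2
import math

DELTA_80 = 2.80   # tabled delta for power ~.80 at two-tailed alpha = .05
for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    n_per_group = 2 * (DELTA_80 / d) ** 2
    print(f"{label:6s} d = {d}: about {math.ceil(n_per_group)} per group")
```

Detecting a "small" effect takes roughly 392 subjects per group, versus about 25 per group for a "large" one.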
Type 1 error
prob of finding a difference that is not there
effect size
standardized difference between two means
d is a measure of
the degree to which μ1 and μ0 differ in terms of the standard deviation of the parent population
power
the probability of correctly rejecting a false Ho
power
the probability of finding a significant difference if the effect that you are looking for is real
what do we mean by power?
the probability of rejecting a false null hypothesis
what is the most important factor controlling power
the true difference between the population means: big differences are easier to find than small ones. The size of the population standard deviation and the size of the sample also play important roles.
delta
the value used to enter the power tables; it combines d and the sample size
for unequal sample sizes
with large and nearly equal sample sizes, you can take the smaller of the two as n; but this is not satisfactory for small sample sizes or when the two ns are quite different, in which case find the harmonic mean of n1 and n2 and use it as the effective sample size
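A minimal sketch of that adjustment (the group sizes and d below are assumptions for illustration):

```python
# Effective sample size for unequal groups: the harmonic mean of n1 and n2,
# plugged into the two-group formula delta = d * sqrt(n_h / 2).
n1, n2 = 12, 36
n_h = 2 * n1 * n2 / (n1 + n2)     # harmonic mean = 18.0, nearer the smaller n
d = 0.50                          # assumed effect size
delta = d * (n_h / 2) ** 0.5
print(n_h, round(delta, 2))       # 18.0, 1.5
```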
Power and alpha
• By making alpha less strict, we can increase power (e.g., p < .05 instead of .01).
• However, we increase the chance of a Type I error.
• Low Ns have very little power; power saturates with many subjects.