Exam 2
2 tails in a row
(1/2 * 1/2) 25%
research hypothesis
(usually) that the means of your populations will differ
null hypothesis
(usually) that there is no (null) difference between the means of your populations
.05 vs .01
.05 is more lenient and therefore carries a lower risk of a type II error
small
.20
medium
.50
large
.80
7 tails in a row
0.8%
what affects effect size?
(a) the difference between the population means: bigger numerator = bigger effect size = more power (b) the comparison population's SD: smaller denominator = bigger effect size = more power
3 characteristics of a distribution of means
1.) The DoM has the same mean as the distribution of individual scores. 2.) (a) The DoM has less spread around the mean (smaller variance and standard deviation) than the distribution of individual scores; (b) the standard deviation of the distribution of means is the (approximate) average amount by which each sample's mean differs from the overall population mean. 3.) The distribution of means is approximately normal if the parent population is normal or if each sample has 30 or more people.
Steps in Hypothesis Testing
1.) Restate the question as a research hypothesis and a null hypothesis about the populations 2.) Determine the characteristics of the comparison distribution 3.) Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected 4.) Determine your sample's score on the comparison distribution 5.) Decide whether to reject the null hypothesis
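A minimal sketch of the five steps for a two-tailed Z test with a known population SD. The function name `z_test_steps` and the numbers (population mean 100, SD 15, a sample of 25 with mean 106) are made up for illustration and are not from these notes.

```python
from math import sqrt
from statistics import NormalDist


def z_test_steps(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed Z test for one sample mean with a known population SD."""
    # Step 1: H0 says the sample's population has mean pop_mean;
    # H1 says it differs (nondirectional).
    # Step 2: the comparison distribution is the distribution of means under H0.
    standard_error = pop_sd / sqrt(n)
    # Step 3: cutoff Z score; alpha is split between the two tails.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: the sample's score on the comparison distribution.
    z = (sample_mean - pop_mean) / standard_error
    # Step 5: reject H0 if the sample Z is at least as extreme as the cutoff.
    reject_null = abs(z) >= cutoff
    return z, cutoff, reject_null


# Hypothetical study: N = 25, sample mean 106, population mean 100, SD 15.
z, cutoff, reject_null = z_test_steps(sample_mean=106, pop_mean=100, pop_sd=15, n=25)
print(f"Z = {z:.2f}, cutoff = +/-{cutoff:.2f}, reject H0: {reject_null}")
```

With these made-up numbers the standard error is 3, so Z = 2.00, which is just past the two-tailed .05 cutoff of about 1.96.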
distribution of means can be narrow
1.) population of individuals may have a small standard deviation 2.) sample size is large
Power Influenced by
1.) Predicted effect size (d): (a) the difference between the population means: bigger numerator = bigger effect size = more power (b) the comparison population's SD: smaller denominator = bigger effect size = more power 2.) Sample size (N) 3.) Alpha level (α) 4.) Whether it will be a one- or two-tailed test
influences power
1.) Predicted effect size (d) 2.) Sample size (N) 3.) Alpha level (α) 4.) Whether it will be a one- or two-tailed test
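A rough sketch of how these four factors move power, using the normal-curve formula for a one-sample Z test. The function `z_test_power` and the example values are assumptions for illustration; for the one-tailed case it assumes the predicted direction matches the true effect.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(d, n, alpha=0.05, tails=1):
    """Power of a one-sample Z test when the true effect size is d (d > 0)."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - alpha / tails)
    # If the research hypothesis is true, the sample Z is centered on d * sqrt(n).
    shift = d * sqrt(n)
    power = 1 - std_normal.cdf(cutoff - shift)    # upper rejection region
    if tails == 2:
        power += std_normal.cdf(-cutoff - shift)  # lower rejection region
    return power


baseline = z_test_power(d=0.5, n=25)
smaller_effect = z_test_power(d=0.2, n=25)
bigger_sample = z_test_power(d=0.5, n=50)
stricter_alpha = z_test_power(d=0.5, n=25, alpha=0.01)
two_tailed = z_test_power(d=0.5, n=25, tails=2)

print(f"baseline (d=.5, N=25, alpha=.05, one-tailed): {baseline:.2f}")
print(f"smaller effect size (d=.2): {smaller_effect:.2f}")
print(f"bigger sample (N=50): {bigger_sample:.2f}")
print(f"stricter alpha (.01): {stricter_alpha:.2f}")
print(f"two-tailed test: {two_tailed:.2f}")
```

Compared with the baseline, power drops with a smaller effect size, a stricter alpha, or a two-tailed test, and rises with a bigger sample.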
true for all distribution of means
1.) the mean of the distribution of means is about the same as the mean of the original population of individuals 2.) the spread of the distribution of means is less than the spread of the distribution of the population of individuals 3.) for most cases: the shape of the distribution of means is approximately normal
5 tails in a row
3%
Probability of tails on 1 flip
50%
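The coin-flip cards all use the same rule: independent flips multiply. A small sketch, assuming a fair coin (the helper name `prob_all_tails` is made up):

```python
def prob_all_tails(n_flips, p_tails=0.5):
    """Probability that n independent flips all come up tails."""
    return p_tails ** n_flips


for n in (1, 2, 5, 7):
    print(f"{n} tails in a row: {prob_all_tails(n):.4f}")
```

The values work out to 1/2, 1/4, 1/32 and 1/128, which match the 50%, 25%, about 3% and about 0.8% on the cards.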
True or false: If you find significant results with alpha set to .00000001, you can say that your results prove the research hypothesis
False: The term "prove" is too strong because the results of the research hypothesis are based on probabilities
Are 5% of all studies type I errors (false positives)?
No. A type I error is only possible when the null hypothesis is true: you cannot falsely conclude a drug works when it actually works. Because the null is not always true, it is not true to say that 5% of all studies are type I errors.
if a result is statistically significant at the .05 level, will it always be significant at the .01 level?
No
If the result is significant with a one-tailed testing procedure, will it always be significant with a two-tailed procedure?
No; the criterion for significance is more stringent with a two-tailed procedure
If a result is significant with a two-tailed testing procedure will it always be significant with a one-tailed testing procedure?
No, if you pre-specify the low tail and the result is in the high tail, the result will no longer be significant
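A sketch of why neither procedure guarantees the other, using Z cutoffs at alpha = .05. The function `is_significant`, its `direction` argument, and the Z scores 1.8 and 2.1 are hypothetical choices for illustration.

```python
from statistics import NormalDist


def is_significant(z, alpha=0.05, tails=2, direction="high"):
    """Compare a sample Z score with the cutoff(s) on the comparison distribution."""
    std_normal = NormalDist()
    if tails == 2:
        # Alpha is split in half, so each tail's cutoff sits further out.
        cutoff = std_normal.inv_cdf(1 - alpha / 2)
        return abs(z) >= cutoff
    # One-tailed: all of alpha sits in the pre-specified tail.
    cutoff = std_normal.inv_cdf(1 - alpha)
    return z >= cutoff if direction == "high" else z <= -cutoff


# Z = 1.8 passes the one-tailed cutoff (about 1.64) but not the two-tailed one (about 1.96).
print(is_significant(1.8, tails=1, direction="high"))
print(is_significant(1.8, tails=2))

# Z = 2.1 passes the two-tailed cutoff, but not a one-tailed test
# that pre-specified the low tail.
print(is_significant(2.1, tails=2))
print(is_significant(2.1, tails=1, direction="low"))
```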
standard error of the mean (SEM)
Same as standard deviation of a distribution of means; also called standard error (SE)
N (number in sample) goes up
Standard error goes down
.10
Which of the below alpha levels makes it easiest to find significant results? .05 .01 .10 .03
.01
Which of the following alpha levels makes it toughest to find significant results? .05 .01 .10 .03
if a result is statistically significant at the .01 level, will it always be significant at the .05 level?
Yes
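The alpha-level cards can be checked by comparing one result against several alphas. This sketch assumes a two-tailed Z test and a made-up sample Z of 2.1; `two_tailed_p` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_tailed_p(z):
    """Chance of a Z at least this extreme (in either tail) if the null is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_tailed_p(2.1)
results = {alpha: p < alpha for alpha in (0.10, 0.05, 0.03, 0.01)}
for alpha, significant in results.items():
    print(f"alpha = {alpha:.2f}: significant = {significant}")
```

The probability here is a little over .035, so the result clears .10 and .05 but not .03 or .01. Anything that clears a stricter alpha automatically clears every more lenient one, but not the other way around.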
Distribution of means (DOM)
a hypothetical distribution of the means of lots and lots of samples of the same size, randomly taken from the same population of individuals; a collection of sample means for all possible random samples of a particular size (N); each score in this distribution is a sample mean, not an individual participant's score
bar too high (alpha too high)
a lot of people are able to crawl under the bar = a lot of people are counted as athletes who aren't; a lot of false alarms (type I errors)
effect size
a measure of the difference between population means; how much our sample mean differs from the null hypothesis population mean after our treatment
effect is there
a more powerful study will more likely find it
one-tailed test
a result in only one tail (not both) could reject the null
0.05 vs 0.001
a study with alpha of 0.05 is much more powerful than a study with alpha set to 0.001; big alphas set a high limbo bar, making it easier for your study to crawl under that bar and find a statistically significant result
hypothesis testing
based on probability: we figure out how likely (probable) it is that a result came about simply by chance (a fluke)
alpha is .001
chance of making a type I error is 0.1%
alpha is .01
chance of making a type I error is 1%
alpha is .10
chance of making a type I error is 10%
alpha is .05
chance of making a type I error is 5%
statistically significant
conclusion that the results of a study would be unlikely if in fact the sample studied represents a population that is no different from the population in general; an outcome of hypothesis testing in which the null hypothesis is rejected
95% confidence interval
confidence interval in which there is a 95% chance that the population mean falls within this interval
99% confidence interval
confidence interval in which there is a 99% chance that the population mean falls within this interval
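A minimal sketch of both intervals for a sample mean when the population SD is known. The function `confidence_interval` and the numbers (sample mean 106, SD 15, N = 25) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, pop_sd, n, level=0.95):
    """Lower and upper confidence limits around a sample mean (known population SD)."""
    standard_error = pop_sd / sqrt(n)
    # Z score that leaves (1 - level) / 2 in each tail: about 1.96 for 95%, 2.58 for 99%.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


low95, high95 = confidence_interval(106, pop_sd=15, n=25, level=0.95)
low99, high99 = confidence_interval(106, pop_sd=15, n=25, level=0.99)
print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```

The 99% interval is wider than the 95% interval: being more confident costs precision.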
statistics
descriptive stats on a sample; usually Roman letters
parameter
descriptive stats on a population; usually unknown; usually Greek letters
distribution of means
distribution of means of samples of a given size from a population; comparison distribution when testing hypotheses involving a single sample of more than one individual
comparison distribution
distribution used in hypothesis testing. It represents the population situation if the null hypothesis is true. It is the distribution to which you compare the score based on your sample's results
standardized effect size
divide the raw score effect size for each study by its respective population standard deviation
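A quick sketch of the division, with two hypothetical studies measured on different scales (the function name `cohens_d` and all numbers are made up):

```python
def cohens_d(treated_mean, null_mean, pop_sd):
    """Standardized effect size: raw mean difference divided by the population SD."""
    return (treated_mean - null_mean) / pop_sd


# Study A: raw effect of 6 points on a scale with SD 15.
study_a = cohens_d(treated_mean=106, null_mean=100, pop_sd=15)
# Study B: raw effect of 3 points on a scale with SD 4.
study_b = cohens_d(treated_mean=23, null_mean=20, pop_sd=4)

print(f"Study A: d = {study_a:.2f}")
print(f"Study B: d = {study_b:.2f}")
```

Study A has the bigger raw score effect, but Study B has the bigger standardized effect (0.75 vs 0.40), which is why standardizing matters when studies use different measures.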
two-tailed tests
divide up the alpha between the two tails, requiring a more extreme score in each tail to find a significant result; set the limbo bar lower (further out in the tail), thereby reducing power
knowing result statistically significant
doesn't tell us how big the effect of our treatment is; need to calculate the effect size
raw score effect size
effect size is given in terms of the raw score on the measure
type II error
failing to reject the null hypothesis when in fact it is false; failing to get a statistically significant result when in fact the research hypothesis is true
type II error
failing to reject the null hypothesis when it is in fact false; deciding our study does not support the research hypothesis when it does
make alpha larger
greater area for rejection
larger predicted effect size
greater power of your study
more stringent (.01)
greater risk for type II error
high power study
has a high probability of finding a significant result
low power study
has a low probability of finding a significant result
maximize your study's power
have a large sample; use a strong treatment; use a one-tailed test; set the highest alpha level you can (usually .05); if possible, decrease the comparison population's SD by measuring your variables more precisely
studies w treatments w large effect sizes
have higher power than studies of treatments with smaller effect sizes
one-tailed tests
have more power than two-tailed tests
upper confidence limit
highest possible population mean that would have 95% (or 99%) probability of including our sample mean
Z test
how to determine the likelihood that the mean of our sample could have occurred simply due to chance if the null hypothesis is true
one-tailed test
hypothesis testing procedure for a directional hypothesis; situation in which the region of the comparison distribution in which the null hypothesis would be rejected is all on the one side (tail) of the distribution
two-tailed test
hypothesis-testing procedure for a nondirectional hypothesis; the situation in which the region of the comparison distribution in which the null hypothesis would be rejected is divided between the two sides (tails) of the distribution
z test
hypothesis-testing procedure in which there is a single sample and the population variance is known
decision errors
incorrect conclusions in hypothesis testing in relation to the real situation, such as deciding the null hypothesis is false when it's really true
absence of evidence
is not evidence of absence
mean of distribution of means
is the same as the mean of the population of individuals
SE goes down
it becomes easier to get a significant Z score
make alpha smaller
less area for rejection
fewer people
less power
smaller your alpha
less power your study has
introduction to hypothesis testing
as the likelihood of a result occurring due to chance goes down, the result becomes more improbable and hence is significant
real effect, important
likely if there's a large effect size
real effect, unimportant
likely if there's a small effect size
lower confidence limit
lowest possible population mean that would have 95% (or 99%) probability of including our sample mean
large difference between means (power)
more likely a significant result
more stringent alpha
more likely you are to commit a type II error
more people
more power
bigger your alpha
more power your study has
bar too low (alpha too stringent)
no one can crawl under the bar = no one counts as an athlete; a lot of misses (type II errors, because alpha is too stringent)
H0
null hypothesis
fail to reject our null hypothesis
our results are inconclusive
reject null hypothesis
our results support the research hypothesis
chance diffs vs sig diffs
chance difference: our sample score may differ from the population simply due to chance (it's a fluke); real, true difference (significant difference): the difference between our sample score and our population mean must be relatively unlikely to be due to chance; only an inferential test can tell you this
cutoff sample score
point in hypothesis testing, on the comparison distribution, at which, if reached or exceeded by the sample score, you reject the null hypothesis; also called "critical value"
μ1
mean of population 1
μ2
mean of population 2
nondirectional hypothesis
predicted effect is in no specific direction; a difference between sample and population is predicted, but the difference can be above or below the population mean; use a two-tailed test
hypothesis
prediction, often based on informal observation, previous research, or theory, that is tested in a research study
sig level
probability level set at the beginning of the study; only results that are more improbable than this level will be considered non-chance results
beta
probability of making a Type II error
statistical power
probability that the study will give a significant result if the research hypothesis is true
hypothesis testing
procedure for deciding whether the outcome of a study (results of a sample) supports a particular theory or practical innovation (which is thought to apply to a population)
confidence interval
range of scores that is likely to include the true population mean; the range of possible population means from which it is not highly unlikely that you could have obtained your sample mean
type I error
rejecting the null hypothesis when in fact it is true; getting a statistically significant result when in fact the research hypothesis is not true
H1
research hypothesis
directional hypothesis
research hypothesis predicting a particular direction of difference between populations
nondirectional hypothesis
research hypothesis that does not predict a particular direction of difference between the population like the sample studied and the population in general
standard error (SE)
same as standard deviation of a distribution of means; also called standard error of the mean (SEM)
two-tailed procedure
a sample result in either tail could reject the null
cutoff scores
score so extreme that, if you found it (or a score even more extreme) in your sample, you'd be pretty convinced that something is going on; the score is sufficiently unlikely to occur if the null hypothesis were true
theory
set of principles that attempt to explain one or more facts, relationships, or events; psychologists often derive specific predictions from theories that are then tested in research studies
treatment predicted to have large effect
should result in a large difference between your sample mean and the null hypothesis population mean
alpha
significance level; probability of making a type I error
decision errors
sometimes our decision to "reject the null" or "fail to reject the null" is wrong, because this decision is based on probability
standard deviation of distribution of means
square root of the variance of a distribution of means also called standard error (SE) or standard error of the mean (SEM)
effect size conventions
standard rules about what to consider a small, medium, and large effect size, based on what is typical in psychology research; also known as Cohen's conventions
Cohen's d
standardized effect size; a significant treatment is not necessarily a meaningful treatment; effect size tells you how meaningful your results may be; becomes very useful when comparing the effectiveness of treatments from studies that use different measurement scales, since it tells you which treatment produces a bigger effect
effect size
standardized measure of difference (lack of overlap) between populations. Effect size increases w/ greater differences between means.
null hypothesis
statement about a relation between populations that is the opposite of the research hypothesis; statement that in the population there is no difference (or a difference opposite to that predicted) between populations; contrived statement set up to examine whether it can be rejected as part of hypothesis testing
research hypothesis
statement in hypothesis testing about the predicted relation between populations (often a prediction of a difference between population means) also called alternative hypothesis
meta-analysis
statistical method for combining effect sizes from different studies
treatment predicted to produce weak effect
study will be less powerful less able to detect that effect
treatment predicted to produce strong effect
study will be more powerful Better able to detect that effect
power tables
table for a hypothesis-testing procedure showing the statistical power of studies with various effect sizes & sample sizes
only calculate if
the effect is significant
more lenient (higher alpha is)
the less likely you are to commit a type II error
mean of a distribution of means
the mean of a distribution of means of samples of a given size from a population; it comes out to be the same as the mean of the population of individuals
complete opposites
the null and research hypothesis
directional hypothesis
the predicted effect is in one (pre-specified) direction; use a one-tailed test procedure: a result in only one tail (not both) could reject the null
power
the probability that your study will find a significant result (when the research hypothesis is true); the likelihood that you will find a significant result when it's really there
lower the alpha
the smaller the chance of making a type I error
result not statistically significant
treatment may have no effect, or treatment may have an effect but you missed it (type II error)
don't know decision errors made
until a study is replicated (which does not happen a lot)
confidence limit
upper or lower value of a confidence interval
variance of distribution of means
variance of population of individuals divided by the number of people in the sample
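A sketch of this rule with a quick simulation check. The population (normal, mean 100, SD 15), the sample size of 25, and the helper `standard_error` are assumptions chosen for illustration; the simulation draws 5,000 samples and looks at the spread of their means.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def standard_error(pop_sd, n):
    """SD of the distribution of means: sqrt(population variance / N)."""
    variance_of_means = pop_sd ** 2 / n
    return sqrt(variance_of_means)


# Simulate a distribution of means: many samples of size 25, one mean per sample.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_means = [
    fmean([rng.gauss(100, 15) for _ in range(25)])
    for _ in range(5000)
]
simulated_se = pstdev(sample_means)

print(f"formula SE (N=25): {standard_error(15, 25):.2f}")
print(f"simulated SE (N=25): {simulated_se:.2f}")
print(f"formula SE (N=100): {standard_error(15, 100):.2f}")
```

The formula gives 3.0 for N = 25 and 1.5 for N = 100 (as N goes up, standard error goes down), and the simulated spread of the sample means should land close to the formula value.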
probability high
we assume it was likely just due to chance
probability low (less than 5% of the time)
we can conclude that the outcome was not due to chance outcome is significant
correct decision 2
we do not reject the null hypothesis when the null hypothesis is true
correct decision 1
we reject the null hypothesis (research hypothesis is supported) when the research hypothesis is true
Type I error (alpha)
we rejected the null hypothesis when it is in fact true
never say prove
we set up the null hypothesis to see if we can or cannot reject it
can never have type I error
when the drug actually works: it is impossible to make the error of saying a drug works if the drug actually does work
type I error
worse for the literature; pollutes the scientific database; findings get published that something works when it doesn't
type II error
worse for the researcher's career; delays the discovery of important findings; results inconclusive -> don't get grants or scholarships -> hampers their ability to increase their status
statistical significance & effect size
you need to know the likelihood that your result came about by chance (whether it's statistically significant) and the magnitude of your result (the effect size)
type II error
your treatment may have an effect but you missed it; possible reasons: your treatment was too small (leading to a small effect size), your sample was too small, or your alpha was too small
type I error
your treatment really didn't work but you think it did; your sample just behaved really weirdly