Week 3: One-way ANOVA and Post Hoc tests
df error
(N-k)-(n-1) = (k-1)(n-1), where n is the number of participants (df within treatments minus df between subjects)
family wise error rate
1-(1-α)^n
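This formula is easy to check numerically; a minimal Python sketch (the function name and test counts are illustrative, not from the notes):

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one Type 1 error across n_tests independent comparisons."""
    return 1 - (1 - alpha) ** n_tests

print(round(familywise_error(1), 3))  # 0.05 for a single test
print(round(familywise_error(4), 3))  # 0.185 - nearly a 1-in-5 chance of a false positive
```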
homogeneity of variance
ANOVA is fairly robust to heterogeneity of variance if the sample sizes are equal, but not if the sample sizes are unequal; if there is heterogeneity of variance, you can use the Brown-Forsythe or Welch F ratio instead of the regular F ratio - these take into account that the variances are not equal
what is the alternative to using multiple t-tests?
ANOVA- can compare multiple groups in one test or FWE (family wise error) control
f ratio
F = MS between/ MS within
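The ratio can be computed by hand from group data; a minimal sketch with made-up scores (three groups of five, not from the notes):

```python
# F = MS between / MS within for a one-way independent ANOVA
groups = [
    [2, 3, 3, 4, 3],   # group 1
    [4, 5, 5, 6, 5],   # group 2
    [6, 7, 7, 8, 7],   # group 3
]

k = len(groups)                          # number of groups
N = sum(len(g) for g in groups)          # total number of scores
grand_mean = sum(sum(g) for g in groups) / N

# SS between: squared distance of each group mean from the grand mean, weighted by n
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SS within: squared distance of each score from its own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)        # df between = k - 1
ms_within = ss_within / (N - k)          # df within = N - k
F = ms_between / ms_within
print(F)                                 # 40.0 for this toy data
```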
post hoc for unequal sample sizes
Gabriel's (small discrepancy in n) or Hochberg's GT2 (large discrepancy in n)
post hoc for unequal variances
Games-Howell
f ratio for repeated measures
MS between treatments/ MS error
df for total
N-1
df for within treatments
N-k
SS between subjects
SS between subjects = Σ(P²/k) − G²/N, where P = total of the scores for each participant, k = number of treatment levels, G = grand total of all scores, N = total number of scores
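A sketch of this computation with made-up repeated-measures data (five participants, three treatment levels); the standard formula is SS between subjects = Σ(P²/k) − G²/N, using the symbols defined above:

```python
# scores[i][j] = score of participant i in condition j (illustrative data)
scores = [
    [2, 4, 6],
    [3, 5, 7],
    [3, 5, 7],
    [4, 6, 8],
    [3, 5, 7],
]

k = len(scores[0])                    # number of treatment levels
N = sum(len(row) for row in scores)   # total number of scores
G = sum(sum(row) for row in scores)   # grand total of all scores

# P = sum(row): each participant's total across conditions
ss_between_subjects = sum(sum(row) ** 2 / k for row in scores) - G ** 2 / N
print(ss_between_subjects)            # 6.0 for this toy data
```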
T-tests
Parametric, tests the difference between two sample means.
MS between
SSbetween/df between
MS error
SSerror/ df error
SS error
SSwithin - SSbetween subjects
MS within
SSwithin/dfwithin
type 2 error
failing to reject the null hypothesis when we should - missing a genuine effect; this is related to power
omega-squared
an unbiased version of eta-squared - an alternative that corrects its bias; 0.01 small, 0.06 medium, 0.14 large
how do we run a one way independent ANOVA in SPSS
analyse- compare means- one way ANOVA
how do we run a One-way repeated measures ANOVA in SPSS
analyse- general linear model- repeated measures
Post hoc recommendations
assumptions met and equal sample sizes: REGWQ or Tukey HSD
safe option: Bonferroni - although quite conservative
unequal sample sizes: Gabriel's (small discrepancy in n) or Hochberg's GT2 (large discrepancy in n)
unequal variances: Games-Howell
sphericity
the variances of the differences between all possible pairs of treatment levels should be equal
between-subjects
compares different subjects to each other; independent measures - different participants take part in each condition
the ANOVA model
compare the amount of variability explained by the model to the error in the model (in ANOVA, our model is our groups); if the model explains a lot more variability than it cannot explain (the model variance is much greater than the residual variance), then the experimental manipulation had a significant effect on the outcome (DV); the larger F is, the less likely it is due to sampling error
when group with larger sample sizes have larger variances than the groups with smaller sample sizes, the resulting F-ratio tends to be ________
conservative - more likely to produce a non-significant result when a genuine difference does exist
the f-ratio ______ tell us whether the F-ratio is large enough to not be a chance result
does not; to discover this we compare the obtained value of F against the maximum value we would expect to get by chance if the group means were equal, in an F-distribution with the same degrees of freedom; if the value we obtain exceeds this critical value, we can be confident that this reflects an effect of our independent variable
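Assuming SciPy is available, the critical value can be looked up rather than read from a table; the obtained F here is hypothetical, and df = 2, 12 corresponds to a 3-group design with N = 15:

```python
from scipy.stats import f

df_between, df_within = 2, 12
f_obtained = 5.1                                   # hypothetical result
f_critical = f.ppf(0.95, df_between, df_within)    # 95th percentile under H0

print(round(f_critical, 2))                        # ~3.89
print(f_obtained > f_critical)                     # True: significant at alpha = .05
```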
post hoc tests
essentially t-tests that compare each mean against all others; in general terms they use a stricter criterion to accept an effect as significant, which controls the family-wise error rate (α_FW); the simplest example is the Bonferroni method
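A sketch of the Bonferroni logic: with m pairwise comparisons, each test is run at α/m (the group labels are illustrative):

```python
from itertools import combinations

alpha = 0.05
group_names = ["A", "B", "C", "D"]            # hypothetical 4-group design
pairs = list(combinations(group_names, 2))    # every pairwise comparison
m = len(pairs)                                # 4 groups -> 6 comparisons

alpha_per_test = alpha / m                    # Bonferroni-corrected criterion
print(m)                                      # 6
print(round(alpha_per_test, 4))               # 0.0083
```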
pooled standard deviation
a combined estimate of the standard deviation across samples
The REGWQ has _____ and tight control of the ____ error rate
good power; Type 1
sphericity violations
if Mauchly's test has p < 0.05 we use an alternative, corrected ANOVA; epsilon values: Greenhouse-Geisser, Huynh-Feldt, lower-bound; each one corrects for the violation of sphericity by correcting the degrees of freedom
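The corrections all work the same way: the estimated epsilon scales both degrees of freedom. A sketch (the epsilon value here is hypothetical):

```python
def corrected_dfs(k, n, epsilon):
    """df for a one-way repeated-measures ANOVA, scaled by an epsilon estimate.

    k: number of conditions, n: number of participants.
    """
    df_effect = epsilon * (k - 1)
    df_error = epsilon * (k - 1) * (n - 1)
    return df_effect, df_error

# e.g. 4 conditions, 20 participants, a hypothetical epsilon of .60
df_effect, df_error = corrected_dfs(4, 20, 0.60)
print(round(df_effect, 2), round(df_error, 2))   # 1.8 34.2
```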
when do we use the Greenhouse-Geisser correction?
if the Greenhouse-Geisser epsilon estimate is less than 0.75 we use this correction - it is a conservative method
what is the problem of doing multiple t tests?
each test assumes it is the only test being done - the decision-wise error rate (α) applies to each comparison in isolation; when we conduct multiple tests we have this 5% error rate each time, so the more tests we do, the more likely it is that one of them will be wrong; the family-wise error rate (across all of the tests we are conducting) becomes inflated, so we need to control it; family-wise error = 1-(1-0.05)^n, so for instance with 4 tests the family-wise error = 1-(0.95)^4 ≈ 0.19; this increases Type 1 error - a false positive, rejecting the null hypothesis when it is true
underlying theory of repeated measures ANOVA
in a repeated-measures research study, individual differences are not random - because the same individuals are measured in every treatment condition, individual differences will have a systematic effect on the results; these individual differences can be measured and separated out from other sources of error; we partition the variance into: total variance, between subjects, and within subjects (previously the residual or error)
assumptions
independent observations - someone's score in one group can't affect someone's score in another group; interval/ratio level of measurement - increments must be equal; normality of the sampling distribution - remember the central limit theorem; homogeneity of variance - the group variances should be equal
eta-squared
it is a biased estimator - really good at describing our sample but doesn't give a very accurate effect size for our population; 0.01 small, 0.09 medium, 0.25 large
what does the f-ratio measure?
it measures the ratio of the variation explained by the model to the variation explained by unsystematic factors
is ANOVA robust?
it is a robust test, meaning that it doesn't matter too much if we break the assumptions of the test - the F will still be accurate
df for between treatments
k-1
when normality is violated, which non-parametric test can be performed? (independent measures)
kruskal wallis
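Assuming SciPy is available, the Kruskal-Wallis test is one call (the data are made up):

```python
from scipy.stats import kruskal

g1 = [2, 3, 3, 4, 3]   # illustrative scores, one list per group
g2 = [4, 5, 5, 6, 5]
g3 = [6, 7, 7, 8, 7]

h, p = kruskal(g1, g2, g3)   # H statistic and p-value
print(h, p)
```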
when the groups with larger sample sizes have smaller variances than the groups with smaller sample sizes, the resulting F-ratio tends to be_____
liberal- more likely to produce a significant result when there is no difference between groups in the population- type 1 error rate is not controlled
skewed distributions seem to have _____effect on the error rate and power for two-tailed tests
little
what test is used to assess sphericity
mauchly's test
what are the methods of follow up for ANOVA?
multiple t-tests? not a good idea - increases our Type 1 error; orthogonal contrasts/comparisons - this is what we do when we have a planned hypothesis; post hoc tests - use when you DO NOT have a planned hypothesis; they compare all pairs of means, and to use post hoc tests properly we must have a SIGNIFICANT ANOVA; trend analysis - if we believe the means follow a particular shape
sum of squares
need to calculate: SST (total), SSM = SSB (between-groups factor), and SSR = SSW (within-groups factor); work out the overall mean and the group means
if the f-ratio is less than 1 it must represent a _________effect
a non-significant effect; this is because there must be more unsystematic than systematic variance - our experimental manipulation has been unsuccessful
non-orthogonal comparisons
non-orthogonal comparisons are comparisons that are in some way related; using a cake analogy, non-orthogonal comparisons are where you slice up your cake and then try to stick slices back together again; the comparisons are related, so the resulting test statistics and p-values will be correlated to some extent; for this reason you should use a more conservative probability level to accept that a given contrast is statistically meaningful
labeling ANOVAs
one-way - one IV or factor; two-way - two IVs or factors; three-way - three IVs or factors
post hoc tests consist of ______comparisons that are designed to compare all___________
pairwise; different combinations of the treatment groups
the three examples of effects sizes
proportion of variance accounted for (r²), eta-squared (η²), omega-squared (ω²)
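Both the biased and unbiased estimates follow directly from the ANOVA sums of squares; a sketch using hypothetical values (SS between = 40, SS within = 6, for 3 groups and N = 15):

```python
ss_between, ss_within = 40.0, 6.0   # illustrative sums of squares
k, N = 3, 15

ss_total = ss_between + ss_within
ms_within = ss_within / (N - k)

# eta-squared: proportion of total variance explained (biased upward)
eta_sq = ss_between / ss_total
# omega-squared: the unbiased correction of eta-squared
omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)

print(round(eta_sq, 3))     # 0.87
print(round(omega_sq, 3))   # 0.839 - slightly smaller, as expected
```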
when is anova robust to normality
providing that your sample sizes are equal and large (df error > 20), ANOVA is robust to violation of the normality assumption; if the sample sizes are unequal/small, ANOVA is not so robust to the violation - you can transform the data or use the non-parametric alternative to a one-way independent-measures ANOVA (the Kruskal-Wallis test)
type 1 error
rejecting the null hypothesis when we shouldn't - saying there's a difference when there isn't
________design is more powerful than_________design because they remove individual differences
repeated measures; independent measures
MSr
represents the average amount of variation explained by extraneous variables- the unsystematic variation
mean squares MSm
represents the average amount of variation explained by the model- systematic variation
In SPSS what does the row and column represent
row - data from one entity; column - a level of a variable
within-subjects design
same participants take part in each condition; one sample per level of the IV; also known as repeated measures
the effects of kurtosis seem unaffected by whether _______
sample sizes are equal or not
f controls the type 1 error rate well under what conditions?
skew, kurtosis and non-normality
effect size
small effect sizes will be significant with large n - the larger the sample size, the more likely you are to find an effect; effect sizes indicate how big that effect is and allow comparisons across contexts
Bonferroni has more power when the number of comparisons is_____, whereas Tukey is more powerful when testing______
small; large number of means
one-way ANOVA
a statistical technique for comparing several means at once; tests the null hypothesis that the means of all populations of interest are equal; we call it the omnibus test - it tells us that somewhere there is an inequality, but not where the difference actually is; this is why we need a follow-up to work out where the difference is - there are two approaches: post hoc tests or planned comparisons/contrasts
omnibus test
tells us means aren't equal somewhere, but not where
cohen's d
tells us the degree of separation between two distributions- how far apart, in standard deviation units, are the means 0.2 small 0.5 medium 0.8 large
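A sketch of the pooled-SD version of Cohen's d (the helper name and data are illustrative):

```python
import math

def cohens_d(sample1, sample2):
    """Difference between two means in pooled standard deviation units."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    var1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    # pooled standard deviation: weighted combination of the two sample variances
    sd_pooled = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

print(round(cohens_d([6, 7, 7, 8, 7], [4, 5, 5, 6, 5]), 3))  # 2.828 - a very large effect
```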
mauchly's test
a test of sphericity; tests the null hypothesis that the variances of the differences between levels are equal; the assumption is violated when p < 0.05 - we then must look at the epsilon values; the assumption is not violated when p > 0.05
if the f-ratio is greater than 1 it is indicates______
that the experimental manipulation had some effect above the effect of individual differences in performance
interpretation of the F-ratio
the f-ratio tells us only that the direct or indirect manipulation was successful it does not tell us specifically which group means differ from which we need to do additional tests to find out where the group differences lie
power
the probability of rejecting the null hypothesis when it is false - i.e., the probability of detecting an effect if it is there; we want more highly powered studies
SS within (repeated measures)
the variability within an individual - the sum of squared differences between each score and the mean for that participant; broken down into two parts: how much variability there is between treatment conditions (model sum of squares, SSm), and how much variability cannot be explained by the model (residual sum of squares, SSr)
variance ratio method
there is overlap between the individual groups - this is where we get our error variance; how much variability is there in the total sample (the group/treatment model in ANOVA)? we take this overall variance and divide it by the error variance; if the ratio is greater than 1, we have evidence that the means are different
multivariate tests
these tests conduct a MANOVA rather than an ANOVA; the results can be used in place of the regular ANOVA results if the sphericity or normality assumptions are violated; whilst these tests are more robust than ANOVA to the assumption violations, they are also less powerful
how does pairwise comparisons control the FWE?
they control FWE by correcting the significance level of each test such that the overall Type 1 error rate across comparisons remains at 0.05
polynomial contrasts: trend analysis
this contrast tests for trends in the data; in its most basic form it looks for a linear trend, but there are other trends - quadratic, cubic, and quartic; linear trend - proportionate change in the value of the dependent variable across ordered categories; quadratic trend - where there is one curve in the line (to find a quadratic trend you need at least three groups); cubic trend - where there are two changes in the direction of the trend (must have at least four categories of the IV); quartic trend - has three changes of direction (needs at least five categories of the IV); each of these trends has a set of codes for the dummy variables in the regression model - the codes for a given trend sum to zero, and the sum of products of the codes for any two trends also equals zero - these contrasts are orthogonal
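The orthogonality properties (each set of codes sums to zero; the products of any two sets sum to zero) can be checked directly using the standard polynomial contrast codes for four groups:

```python
linear    = [-3, -1, 1, 3]   # standard polynomial codes for k = 4 levels
quadratic = [1, -1, -1, 1]
cubic     = [-1, 3, -3, 1]

# each set of codes sums to zero
for codes in (linear, quadratic, cubic):
    print(sum(codes))   # 0 for each trend

# the sum of products of any pair is zero -> the contrasts are orthogonal
print(sum(a * b for a, b in zip(linear, quadratic)))   # 0
print(sum(a * b for a, b in zip(linear, cubic)))       # 0
print(sum(a * b for a, b in zip(quadratic, cubic)))    # 0
```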
when do we use lower-bound correction?
this correction is too conservative - it assumes that sphericity is completely violated; avoid using it
platykurtic distributions make the type 1 error rate ______ and consequently the power is______
too high; too low
leptokurtic distributions make the type 1 error rate ______ and consequently the power is______
too low; too high
SStotal
total variance in the data sum of squared differences between each score and the grand mean
if a test is conservative (the probability of a _____ error is small) then it is likely to lack statistical power (the probability of a _____error will be high)
type 1; type 2
why opt for repeated-measures?
uses a single sample, with the same set of individuals measured in all of the different treatment conditions; one of the characteristics of a repeated-measures design is that it eliminates variance caused by individual differences; individual differences are those participant characteristics that vary from one person to another and may influence the measurement you obtain for each person - age, gender, etc.
problems with repeated measures
usually we assume independence, but scores will be correlated between conditions, hence the assumption of independence is violated; accordingly, an additional assumption is made: sphericity - that the variances (and covariances) of the differences between treatment levels are the same; related to the idea that the correlations between treatment levels should be the same
SSmodel or SS between
variance explained by the model (our groups); the sum of squared differences between each group mean and the grand mean, weighted by group n; the more the group means spread out, the greater SS between will be; SS between = SS total − SS within treatments; between groups = our model = SS model = SS between
SSresidual or SSwithin
variance that cannot be explained by the model (our groups); the sum of squared differences between each score and its group mean; SS within treatments = SS1 + SS2 + SS3, etc.; within groups = our error = SS residual = SS within
underlying theory of between-subjects ANOVA
we calculate how much total variability there is between scores (total sum of squares); we then calculate how much of this variability can be explained by the model we fit to the data (model sum of squares), and how much cannot be explained - how much variability is due to individual differences or error (residual sum of squares)
when do we use huynh-feld correction?
when the Greenhouse-Geisser epsilon estimate is greater than 0.75; it is a less conservative method than Greenhouse-Geisser
when shouldn't we use REGWQ test?
when group sizes are different
when is ANOVA fairly robust to violations of the assumption of homogeneity of variance?
when sample sizes are equal
when should we use REGWQ test?
when we want to test all pairs of means
what else can you use when assumption of normality or sphericity are violated?
you can also use the multivariate tests (MANOVA) if the assumptions of normality or sphericity are violated, as these provide a more robust test of the null hypothesis that all treatment means are equal; you could also conduct a Friedman's test as the non-parametric alternative to a one-way repeated-measures ANOVA
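Assuming SciPy is available, the Friedman test is also one call (the data are made up; each list is one treatment condition, with the same five participants in each):

```python
from scipy.stats import friedmanchisquare

cond1 = [2, 3, 3, 4, 3]   # illustrative repeated-measures scores
cond2 = [4, 5, 5, 6, 5]
cond3 = [6, 7, 7, 8, 7]

stat, p = friedmanchisquare(cond1, cond2, cond3)   # chi-square statistic and p-value
print(stat, p)
```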