One-way ANOVA
significant assumption checks
(p < .05) indicate that the assumption is violated
non-significant assumption checks
(p > .05) indicate that the assumption is met; for assumption checks, a bigger p-value is good
one-way independent measures ANOVA
-A statistical procedure which compares three or more means simultaneously -Tells you whether or not all group means are approximately the same
within-groups ANOVA
-AKA repeated measures -the same participants take part in all conditions -tests for differences across conditions within the same participants
Post hoc tests for unequal sample sizes
-Gabriel's (for small discrepancy in n) -Hochberg's GT2 (for large discrepancy in n)
Post hoc tests for unequal variances
-Games-Howell (can also be used with unequal n)
Post hoc tests for assumptions met and equal sample sizes
-REGWQ -Tukey's HSD
reporting the Brown-Forsythe or Welch's-F ratio
-State that the homogeneity of variance assumption is violated -report the results as usual, except that you report the second df value to two decimal places. This is because these F-ratios are calculated with adjusted df, and after the adjustment the degrees of freedom may have decimals, unlike 'normal' df, which are integers
between-groups ANOVA
-different subjects in each condition -tests for differences between groups
Dealing with assumption violations
-if sample sizes are equal and large (df-error > 100), ANOVA is largely robust to the normality assumption being violated (CLT) -if sample sizes are small or unequal, consider transforming the data or running a Kruskal-Wallis test (nonparametric ANOVA)
Assumptions of ANOVA
-independence of observations -interval/ratio data -normality -homogeneity of variance
partial eta^2
-measure of effect size -values range from 0-1
Effect size
-null hypothesis significance testing is all or none (significant or non-significant); does not tell us how big the difference really is -effect sizes tell us the proportion of variance accounted for; can be r-squared, eta-squared or omega-squared
independent observations
-one observation tells us nothing about the other
power
-probability of rejecting the null hypothesis when it is false -the ability to correctly identify a difference between groups when there truly is a difference -if power is low, you can be less confident that you are detecting a real difference between groups
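As a hedged illustration of power in practice, statsmodels can solve for the sample size needed to reach a given power in a one-way ANOVA; the effect size (Cohen's f = 0.25) and number of groups (3) below are assumed values, not from these notes:

    # Sketch: total sample size needed for 80% power in a one-way ANOVA.
    # Assumes statsmodels is installed; f = 0.25 and k = 3 are illustrative.
    from statsmodels.stats.power import FTestAnovaPower

    n_total = FTestAnovaPower().solve_power(effect_size=0.25, k_groups=3,
                                            alpha=0.05, power=0.80)
    print(round(n_total))  # roughly 158 participants in total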
population normality
-scores for each condition are normally distributed around mean
one-way ANOVA vs. multiple t-tests
-t-tests can only compare two means at a time -Conducting multiple t-tests inflates the risk of committing a type I error (the experimentwise or familywise alpha level): α_EW = 1 − (1 − α_TW)^c, where c is the number of comparisons -ANOVA compares three or more means simultaneously, thereby keeping the type I error rate under control: α = .05
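To see how quickly the familywise alpha grows, this minimal Python sketch evaluates the formula above for a few values of c:

    # Familywise (experimentwise) alpha at a per-test alpha of .05.
    alpha_tw = 0.05
    for c in (1, 3, 6, 10):
        alpha_ew = 1 - (1 - alpha_tw) ** c
        print(c, round(alpha_ew, 3))
    # with just 3 comparisons the familywise alpha is already about .143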
Logic of ANOVA
-the simplest model we can fit to a set of data is the grand mean (represents no effect/relationship) -we can fit a different model to the data that represents our hypothesis -the intercept and one or more parameters (b) describe the model -the parameters determine the shape of the model we have fitted -if the differences between group means are large enough, the resulting model will be a better fit to the data than the grand mean -if this is the case, we can infer that our model is better than using no model
handling significant Levene's tests
-transform the data -delete the entire group if everyone in it scored the same value -ignore it (ANOVA is robust if group sizes are equal)
omega-squared effect sizes
.01= small .06= medium .14= large
eta-squared effect sizes
.01= small .09= medium .25= large
Cohen's d effect sizes
.2= small .5= medium .8= large
Rules for contrast weights
1. choose sensible comparisons: because you can only compare two chunks of variation, make sure what you are comparing makes sense 2. groups coded with positive weights will be compared against groups coded with negative weights 3. the sum of the weights for a comparison should equal 0 4. if a group is not involved in a comparison, assigning it a weight of zero will remove it from all calculations 5. for a given contrast, the weight assigned to the group(s) in one chunk of variation should be equal to the no. of groups in the opposite chunk of variation
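A minimal numpy sketch of these rules for three groups (the group labels and means are invented for illustration): a control group is compared against two experimental groups, so the control gets a weight of -2 (the number of groups in the opposite chunk) and each experimental group gets +1:

    # Contrast weights for control vs. two experimental groups (invented means).
    import numpy as np

    means = np.array([2.2, 3.2, 5.0])  # control, low dose, high dose
    weights = np.array([-2, 1, 1])     # rule 5: -2 balances the two +1 weights
    assert weights.sum() == 0          # rule 3: weights sum to zero
    print(weights @ means)             # contrast estimate; positive here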
Rules for planned comparisons
1. if we have a control group this is usually because we want to compare it against other groups 2. each contrast must compare only two chunks of variation 3. once a group has been singled out for a contrast, it can't be used in another contrast 4. there should be k-1 contrasts (where k= the no. of conditions you are comparing)
Safe option for post hoc tests
Bonferroni; it can be overly conservative, but it is the most commonly used
Tukey's honestly significant difference (HSD)
if the pairwise comparisons show that any two groups are different, you can be sure the difference is significant even though you ran multiple tests
total sum of squares
SSt = SSm + SSr; the total variability: the sum of the squared deviations of each score from the grand mean
Polynomial contrast
a contrast that tests for trends in the data; in its most basic form it looks for linear trend (ie. that the group means increase proportionately)
Model sum of squares (SSm)
a measure of the total amount of variability for which a model can account; it is the difference between the total sum of squares and the residual sum of squares
Total sum of squares (SSt)
a measure of the total variability within a set of observations; it is the total squared deviance between each observation and the overall mean of all observations in a data set
Residual sum of squares (SSr)
a measure of the variability that cannot be explained by the model fitted to the data; it is the squared deviance between the observations and the value of those observations predicted by whatever model is fitted to the data; tells us how much of the variation is caused by extraneous factors such as individual differences in weight, testosterone, etc.
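The three sums-of-squares cards above can be checked by hand; this Python sketch partitions the variance for three invented groups of five scores and derives the F-ratio and eta-squared from the same quantities:

    # By-hand variance partition for a one-way ANOVA (invented scores).
    import numpy as np

    groups = [np.array([3., 2., 1., 1., 4.]),
              np.array([5., 2., 4., 2., 3.]),
              np.array([7., 4., 5., 3., 6.])]
    scores = np.concatenate(groups)
    grand_mean = scores.mean()

    sst = ((scores - grand_mean) ** 2).sum()                          # SSt
    ssm = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # SSm
    ssr = sum(((g - g.mean()) ** 2).sum() for g in groups)            # SSr
    assert np.isclose(sst, ssm + ssr)                                 # SSt = SSm + SSr

    k, n = len(groups), len(scores)
    msm, msr = ssm / (k - 1), ssr / (n - k)  # mean squares = SS / df
    print(f"F({k - 1}, {n - k}) = {msm / msr:.2f}, eta^2 = {ssm / sst:.2f}")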
repeated contrast
a non-orthogonal planned comparison that compares the mean in each condition (except the first) to the mean of the preceding condition
Deviation contrast
a non-orthogonal planned comparison that compares the mean of each group (except the first or last, depending on how the contrast is specified) to the overall mean
Difference contrast
a non-orthogonal planned comparison that compares the means of each condition to the overall mean of all previous conditions combined
Planned contrasts
a set of comparisons between group means that are constructed before any data are collected. Theory-led comparisons based on the idea of partitioning the variance created by the overall effect of group differences into gradually smaller portions of variance. They have more power than post hoc tests
Post hoc tests
a set of comparisons between group means that were not thought of before data were collected. typically involve comparing the means of all combinations of pairs of groups. To compensate for the no. of tests conducted, each test uses a strict criterion for significance. Tend to have less power than planned contrasts. Usually used for exploratory work for which no firm hypotheses were available on which to base planned contrasts
Analysis of variance (ANOVA)
a statistical procedure that uses the F-ratio to test the overall fit of a linear model. In experimental research, this linear model tends to be defined in terms of group means, and the resulting ANOVA is therefore an overall test of whether group means differ
harmonic mean
a weighted version of the mean that takes into account the relationship between variance and sample size. it is calculated by dividing the number of observations by the sum of the reciprocals of all the observations
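A one-line check of that definition in Python (the group sizes are invented):

    # Harmonic mean = n divided by the sum of reciprocals (invented sizes).
    sizes = [10, 20, 30]
    print(len(sizes) / sum(1 / n for n in sizes))  # 16.36..., below the arithmetic mean of 20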
Eta-squared
an effect size measure that is the ratio of the SSm to the SSt; it is biased, and it measures the overall effect of an ANOVA, whereas effect sizes are more easily interpreted when they reflect specific comparisons (eg. the difference between two means)
In Iceland Real Nomads Hug Vultures
assumptions for ANOVA: independent observations, interval/ratio data, normality, homogeneity of variance
why do you divide the SS by df (to get the mean squares)?
because we are trying to extrapolate to a population, so some of the parameters within that population need to be held constant
Calculating SSr
by looking at the difference between each person's score and the mean of the group to which that person belongs; the distances between each data point and its group mean are then squared and added together to give the SSr
Types of groups
categorical (the independent variable in ANOVA consists of discrete groups)
mixed ANOVA
a combination of between-groups and within-groups factors
Post Hoc tests
compare each mean against all the others; they adopt a more stringent decision-wise error rate, hence controlling the familywise error
pairwise comparisons
comparisons of pairs of means; designed to test all different combinations of the treatment groups
calculating the F-ratio
dividing the model mean squares by the residual mean squares
orthogonal
equated to independence in statistics; literally, it means perpendicular to something
What does "ANOVA is a robust test" mean?
even if the assumptions of the test are not met, the F-ratio will still be reasonably accurate
confidence intervals
the range within which results would be expected to fall if we ran the analyses multiple times on different samples from the population
Residual mean squares (MSr)
gauge of the average amount of variation explained by extraneous variables (eg. unsystematic variation)
between groups variance
how much each group differs from the overall mean
within groups variance
how much each person differs from the mean of their group
total variance
how much each person differs from the overall mean
trade-offs for post hoc tests
if a test is conservative (the probability of a type I error is small), then it is likely to lack statistical power (the probability of a type II error will be high)
Quadratic trend
if the means in ordered conditions are connected with a line then a quadratic trend is shown by one change in the direction of this line (eg. the line is curved in one place); the line is U-shaped. There must be at least 3 ordered conditions
Quartic trend
if the means in ordered conditions are connected with a line then a quartic trend is shown by three changes in the direction of this line. There must be at least 5 ordered conditions
Cubic trend
if you connected the means in ordered conditions with a line then a cubic trend is shown by two changes in the direction of this line. There must be at least 4 ordered conditions
What does the F-ratio tell us?
large F = the variability attributable to the model is larger than the variability due to chance. If the F-ratio is larger than the critical value for our desired significance level, it indicates that one or more of the treatment means are statistically significantly different from each other
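To make "larger than the critical value" concrete, scipy can return the critical F for a given alpha and df; the df values below are arbitrary:

    # Critical F at alpha = .05 for df = (2, 12), chosen arbitrarily.
    from scipy import stats

    print(stats.f.ppf(0.95, dfn=2, dfd=12))  # about 3.89; an observed F above this is significant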
grand mean
mean of an entire set of observations
Mann-Whitney U test
nonparametric alternative to the t-test; analyses ranks between groups; ranks always have a known distribution, so the test is robust to assumption violations
Kruskal-Wallis test
nonparametric alternative to one-way independent ANOVA; follow up with Mann Whitney-U as a substitute for post-hoc testing
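A minimal scipy sketch of this fallback, with a Mann-Whitney U follow-up on one pair of groups (the data are invented, and alpha should be corrected across multiple follow-up pairs):

    # Kruskal-Wallis across three groups, then a Mann-Whitney U follow-up.
    from scipy import stats

    g1, g2, g3 = [3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [7, 4, 5, 3, 6]
    h, p = stats.kruskal(g1, g2, g3)
    print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.3f}")
    u, p_pair = stats.mannwhitneyu(g1, g3)  # one follow-up pair
    print(f"g1 vs g3: U = {u:.1f}, p = {p_pair:.3f}")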
between groups df
number of groups -1
Type I error
occurs when we believe that there is a genuine effect in our population, when in fact there isn't
Type II error
occurs when we believe that there is no effect in the population, when in fact there is
degrees of freedom
total df is one fewer than the number of participants (N − 1); results are reported in the format F(df between, df within) = __
Model mean squares (MSm)
represents the average amount of variation explained by the model (eg. the systematic variation)
residual variability
scores not accounted for by the model (SSr)
What doesn't the F-ratio tell us?
specifically which group means differ from which
Cohen's d
tells us the degree of separation between two distributions; mean difference/standard deviation
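A small sketch of the mean difference/standard deviation formula, standardizing by a pooled SD (one common choice of standardizer; the data are invented):

    # Cohen's d with a pooled standard deviation (invented data).
    import numpy as np

    def cohens_d(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        nx, ny = len(x), len(y)
        pooled_var = ((nx - 1) * x.var(ddof=1) +
                      (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
        return (x.mean() - y.mean()) / np.sqrt(pooled_var)

    print(cohens_d([7, 4, 5, 3, 6], [3, 2, 1, 1, 4]))  # about 1.93, a large effect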
mean squares (average SS/ MS)
the sum of squares divided by the degrees of freedom
degrees of freedom for SSr
the total degrees of freedom minus the degrees of freedom for the model. N-k where N is the total sample size and k is no. of groups
What does it mean if Levene's test is significant?
the variances are significantly different; we need to use a corrected F-ratio (Brown-Forsythe F or Welch's F)
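A hedged sketch of that workflow in Python: scipy provides Levene's test, and statsmodels' anova_oneway can, assuming a reasonably recent statsmodels version, compute a Welch-corrected F via its use_var argument (the data are invented):

    # Levene's test, then Welch's F if variances differ (invented data).
    from scipy import stats
    from statsmodels.stats.oneway import anova_oneway

    g1, g2, g3 = [3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [20, 1, 2, 15, 18]
    lev_stat, lev_p = stats.levene(g1, g2, g3)
    if lev_p < .05:  # variances significantly different
        res = anova_oneway([g1, g2, g3], use_var="unequal")  # Welch's F
        print(f"Welch F = {res.statistic:.2f}, p = {res.pvalue:.3f}")  # res also carries the adjusted df
    else:
        print(f"Levene's p = {lev_p:.3f}; report the usual F-ratio")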
familywise alpha
total alpha for all comparisons. ANOVA keeps the familywise alpha at .05
within groups df
total df - between groups df
heterogeneity of variance ANOVA
use the Brown-Forsythe or Welch F-ratio (and their associated df and p values) instead of the regular F-ratio
model variability
variability between groups (SSm)
grand variance
variance within an entire set of observations
F-ratio
a way to assess how well a regression model can predict an outcome compared to the error of that model; it assesses how good the model is compared to how bad it is
degrees of freedom for SSm
will always be one less than the no. of "things" used to calculate the SS