Biostat Lec 11: ANOVA test family (factorial, post-hoc, Bonferroni)
Gosset's version calculates a t statistic known as tobserved. How is tobs calculated?
tobs = the difference between the 2 means, divided by the pooled standard error of the mean.
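A minimal sketch of this calculation in Python (the two groups are made-up data, not from the lecture), cross-checked against scipy's equal-variance t-test:

```python
import numpy as np
from scipy import stats

# Hypothetical data for two independent groups
g1 = np.array([5.1, 4.8, 6.0, 5.5, 4.9])
g2 = np.array([6.2, 6.8, 5.9, 7.1, 6.5])

n1, n2 = len(g1), len(g2)

# Pooled variance: df-weighted average of the two sample variances
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)

# Pooled standard error of the difference between the means
se_pooled = np.sqrt(sp2 * (1 / n1 + 1 / n2))

# tobs = difference between the 2 means / pooled standard error
t_obs = (g1.mean() - g2.mean()) / se_pooled

# Matches scipy's equal-variance (Student/Gosset) t-test
t_scipy, p = stats.ttest_ind(g1, g2, equal_var=True)
print(t_obs, t_scipy, p)
```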
MSb (Mean square between) is
treatment effect + error
Mean square Between (MSB) is sensitive to
treatment effect + error
As n increases, the sampling distribution approaches normality
true. It is the sampling distribution (not the population itself) that approaches normality as n increases.
T or F: Performing a matched t-test also increases stringency of ANOVA
true. T-tests are limited to only 2 groups, and fewer groups compared means higher stringency; a matched design also fractionates out individual variability, increasing stringency further.
In health sciences, we usually do post-hoc comparisons of the groups
true. Unless we have a hunch (an a priori hypothesis), in which case we can do pre-hoc comparisons, but this is not usually done in medicine.
F (a ratio of variances) = MSb/MSₑ
F = mean square between/mean square of the error
If we do have an a priori hypothesis (pre-hoc)
we can make pre-hoc comparisons. But this is not usually done in health sciences.
If we do NOT have an a priori hypothesis (which is most often the case)
we can only make POST-HOC comparisons
Why do we subtract 2 from the df for a t test?
we used 2 statistics (the 2 sample means) to calculate it, so we must subtract 2 from the sum of the 2 sample sizes: df = n₁ + n₂ - 2.
MSe (mean square of the error) is
overall error
The Scheffé method of post-hoc comparison
-Compares "everything to everything" and therefore is nondirectional
-Has the weakest stringency of all the comparison methods
-Used in comparisons of all pairs of means
Special cases of F
-Nesting: compares different groups within each nest; sees if we have significant differences between the nests, which allows us higher power
-Repeated measures: obtain a baseline for several groups, then administer a treatment and calculate Δs (how values change over time); similar to a matched t-test, so it helps us fractionate error and increase stringency
-ANCOVA: analysis of covariance; correcting for overall residual error/covariates when we know each individual value
Advantages/strengths of a repeated-measure within-subjects design?
-Reduces the variance of estimates of treatment effects (fractionates out variability due to individual differences)
-Increases the power of the test (so fewer subjects are needed)
Post-Hoc comparison types/methods
-Scheffé method
-Dunnett method
-Fisher method
-Newman-Keuls method
Issues with ANOVA
-The more groups (k) you want to compare, the more complex the analysis and interpretation, and therefore the lower the stringency
-Power decreases if you dilute the sample
Disadvantages/weaknesses of a repeated-measures within-subjects design?
-Threat of drop-out, mortality, missing/incomplete data, measurement errors
-Weak follow-up of subject-specific trends over time
Types of ANOVA tests we can do
-a standard ANOVA using time t as an independent variable
-a repeated-measures within-subjects design with ANOVA/ANCOVA analysis
-a t test or ANOVA on the Δs
The Dunnett method of post-hoc comparison
-compares "everything to a control group" and therefore is directional
-therefore is slightly more stringent than the Scheffé method
-for stepwise correction of several treatment means against one reference/control mean, but does not compare treatments to each other
The Newman-Keuls method of post-hoc comparison
-groups are inserted for control purposes
-we take into account a hierarchy and paired measurements
-used for pairwise comparisons of RANKED means
-equally as stringent as Fisher
Special cases of F
-repeated measures
-ANCOVA
-nesting
The Fisher method of post-hoc comparison
-sets up a structure of pairs/hierarchy
-the least significant difference (LSD) for all possible pairwise comparisons, with n* paired means
-more stringent than the Scheffé and Dunnett methods
-on par with the Newman-Keuls method; both are very stringent
Rank the methods of Post-hoc comparisons from lowest to highest stringency
1. Scheffé - least stringent; nondirectional; compares everything to everything
2. Dunnett - somewhat directional; compares everything to one control
3. Fisher - pairs, hierarchy
4. Newman-Keuls - pairwise comparisons of ranked means
2 rules/ steps of ANOVA
1. establish that there is a statistically significant difference somewhere within the design
2. explore where exactly the statistically significant difference lies in the design, and correct the significance level for it
What is nesting?
A special case in which the level of one factor does not cross with the other factor but rather is nested within it. In other words, the levels of factor B are different for different levels of factor A
What is ANCOVA?
An ANOVA test, but corrected for the effect of a given underlying variable (a covariate) - for example, a variable with a correlation or predictive strength for the outcome, or a baseline value.
What is the Bonferroni correction?
Alpha level is adjusted by dividing it by the number of comparisons - prevents fishing or statistical cheating from repeated t-tests. Bonferroni correction = α/# of comparisons made
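A one-line sketch of the correction in Python (the α and comparison count are made-up):

```python
# Bonferroni correction: divide alpha by the number of comparisons made
alpha = 0.05
n_comparisons = 3          # e.g., 3 pairwise post-hoc comparisons
alpha_corrected = alpha / n_comparisons
print(alpha_corrected)     # ~0.0167: each comparison must beat this level
```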
Ex of nesting
Biostatistics students are nested within classes at CSUN vs UCLA
Fisher's view on the t test says to compare the _____
Fisher says compare the VARIANCES
Once we've made post or pre-hoc comparisons, we then must
Do the Bonferroni correction, dividing the α level by the # of comparisons we made
Within-Subject Design
Each participant acts as his or her own control; tested repeatedly and Δs are calculated with baseline values serving as a covariate
If we wish to use ANOVA with time as an independent variable, how can we analyze?
Either as:
-A repeated-measures ANOVA/ANCOVA analysis
-A t test or ANOVA analysis of deltas (Δ)
What is the F ratio?
F = MSb/MSₑ = mean square between/mean square error
Formula for F
F = MSb/MSₑ = (SSb/dfb)/(SSₑ/dfₑ) = [Σnᵢ(x̅ᵢ - x̅g)²/(k-1)] / [Σ(x - x̅ᵢ)²/(N-k)], where x̅ᵢ is each group's mean, nᵢ its size, and x̅g the grand mean
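A sketch of the full F calculation in Python on made-up groups, cross-checked against scipy's one-way ANOVA; it also verifies that SSb + SSₑ = SStotal:

```python
import numpy as np
from scipy import stats

# Hypothetical data for k = 3 groups
groups = [np.array([4.1, 5.0, 4.6, 5.2]),
          np.array([6.3, 5.8, 6.6, 6.1]),
          np.array([5.0, 5.4, 4.8, 5.6])]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# SSb: each group mean's squared deviation from the grand mean,
# weighted by that group's size n_i
ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# SSe: each value's squared deviation from its own group's mean
ss_e = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_b = ss_b / (k - 1)   # MSb = SSb/df_b
ms_e = ss_e / (N - k)   # MSe = SSe/df_e
F = ms_b / ms_e

# Check: the sums of squares partition SStotal
ss_total = ((np.concatenate(groups) - grand_mean) ** 2).sum()
assert np.isclose(ss_b + ss_e, ss_total)

# Matches scipy's one-way ANOVA
F_scipy, p = stats.f_oneway(*groups)
print(F, F_scipy, p)
```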
T or F: The mean square between is sensitive to the residual error.
FALSE! MSₑ (mean square residual error) is sensitive to residual error, while MSB (mean square between) is sensitive to both the treatment effect & error.
T or F: ANOVA is dependent on the # of groups.
False
T or F: ANOVA is dependent on the complexity of the design itself
False
Gosset's view on the t test says to compare the _____
Gosset says compare the MEANS
If your calculated F statistic is less than the Fcrit from the F table, what can you conclude?
If Fobs < Fcrit → statistically INSIGNIFICANT results. We cannot move on with the process of ANOVA.
If your calculated F statistic is greater than the Fcrit from the F table, what can you conclude?
If Fobs > Fcrit → statistically SIGNIFICANT results. We can move on with the process of ANOVA.
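A sketch of the Fobs-vs-Fcrit decision using scipy's F distribution (the design sizes and observed F are made-up):

```python
from scipy import stats

# Hypothetical design: k = 3 groups, N = 30 subjects, alpha = 0.05
k, N, alpha = 3, 30, 0.05
df_b, df_e = k - 1, N - k

f_crit = stats.f.ppf(1 - alpha, df_b, df_e)   # upper-tail critical value

f_obs = 4.2   # made-up observed F for illustration
if f_obs > f_crit:
    print("significant: proceed to post-hoc comparisons")
else:
    print("not significant: stop here")
```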
Rule 2 of ANOVA: Find out where exactly the statistically significant difference lies within the design, then correct the significance level for it.
If you have an expectation of the outcome before obtaining the data, you can state a pre-hoc (a priori) hypothesis; otherwise the hypothesis is post-hoc.
-Define what groups will be compared
-Then perform either pre-hoc or post-hoc comparisons of the groups
Special cases of F: repeated measures
In a repeated-measures design (a within-subjects design in which each subject also serves as their own control, and which is therefore very stringent), the same subjects are used repeatedly in every research condition, including placebos in a crossover design.
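A minimal sketch of an analysis of Δs with a matched (paired) t-test on made-up within-subject data:

```python
import numpy as np
from scipy import stats

# Hypothetical within-subject data: each subject measured twice
baseline  = np.array([120, 115, 130, 125, 118, 122])
treatment = np.array([112, 110, 121, 119, 111, 115])

# Analysis of deltas: each subject serves as their own control
deltas = treatment - baseline

# Paired (matched) t-test, equivalent to a one-sample t-test on the deltas
t_obs, p = stats.ttest_rel(treatment, baseline)
print(deltas.mean(), t_obs, p)
```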
What is the mean square within (MSₑ)?
The mean square within is SSₑ divided by (the total sample size N minus the number of groups k): MSₑ = SSₑ/(N-k)
What is the "Mean Square Between" and where does it go in the calculation of F?
MSB is an estimate of the difference between groups, and it goes in the numerator in the calculation of F.
Write the formula for mean square between (MSb)
MSb = SSb/dfb
What is the "Mean Square Residual Error" and where does it go in the calculation of F?
MSₑ (mean square residual error) is an estimate of the difference within groups, and it goes in the denominator in the calculation of F.
Write the formula for mean square residual error (MSₑ)
MSₑ = SSₑ/dfₑ
Mean square residual error (MSₑ) is sensitive to
Mean square residual error (in the denominator) is only sensitive to the residual error.
What happens if we want to compare more than 2 groups? Can we simply perform repeated t tests?
NO!!! That's statistically CHEATING because we continuously increase the α level, making it easier to obtain statistically significant results. If you want to compare more than 2 groups, you should perform ANOVA.
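A small simulation sketch (made-up normal data) showing how uncorrected repeated t-tests inflate the Type I error rate above the nominal α:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, k, alpha = 2000, 20, 4, 0.05

false_positives = 0
for _ in range(n_sims):
    # k groups drawn from the SAME population: any "significance" is a Type I error
    groups = [rng.normal(0, 1, n_per_group) for _ in range(k)]
    # Uncorrected pairwise t-tests between all pairs of groups
    p_values = [stats.ttest_ind(groups[i], groups[j]).pvalue
                for i in range(k) for j in range(i + 1, k)]
    if min(p_values) < alpha:
        false_positives += 1

# With k = 4 groups (6 pairwise tests), this lands well above the nominal 5%
print(false_positives / n_sims)
```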
F = MSb/MSₑ
formula for calculating F; alternatively, F = MSb/MSwithin
SStotal of the design is composed of
SSbetween and SSerror (also called SSwithin): SStotal = SSb + SSₑ
T or F: the fewer the groups compared, the higher the power of the design
TRUE! fewer groups compared = more stringent
T or F: the more groups (k) you are comparing, the more complex your analysis and interpretation of results.
TRUE! more groups (larger k) -> complex analysis -> lower power + stringency
T or F: the more groups compared, the lower the stringency + power of the design
TRUE! more groups/higher k = less stringent
T or F: variance is the mean of squared deviations about the mean.
TRUE. s² can also be written as MS (mean square).
T or F: variance is the sum of squared deviations about the mean (SS), divided by the degrees of freedom.
TRUE. s² can also be written as SS/df.
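A tiny sketch checking that SS/df matches the usual sample variance (made-up data):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])  # made-up sample

ss = ((x - x.mean()) ** 2).sum()   # sum of squared deviations about the mean
df = len(x) - 1
s2 = ss / df                       # variance = SS/df

# Same as numpy's sample variance with ddof=1
print(s2, x.var(ddof=1))
```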
How do you establish whether there is a statistically significant difference in a design of more than 2 groups?
Test whether the variability between groups that constitute the design is greater than the OVERALL variability (residual error) within the design.
What is the sum of squares between (SSb)?
The "sum of squares between" is the sum of the differences of each group's mean, from the grand mean SSb = Σ(x̅ - x̅g)²
What is the sum of squares error (SSₑ)?
The "sum of squares error" is the sum of the differences of each individual value from the grand mean. SSₑ= Σ(x - x̅g)²
T or F: If you're attempting to compare only 2 groups, Gosset's t-test is equivalent to Fisher's method of comparing the variances
True
T or F: It is acceptable to perform multiple t-tests after performing ANOVA.
True! It is no longer statistically cheating; we can in fact do repeated t-tests once we finish ANOVA, because we've already established that there's a statistically significant difference (a significant F). However, if we don't establish a significant F, it's still cheating.
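A sketch of this two-step process on made-up data: only after a significant F do we run pairwise t-tests, judged against a Bonferroni-corrected α:

```python
import numpy as np
from scipy import stats
from itertools import combinations

# Hypothetical groups
groups = {"A": np.array([4.1, 5.0, 4.6, 5.2]),
          "B": np.array([6.3, 5.8, 6.6, 6.1]),
          "C": np.array([5.0, 5.4, 4.8, 5.6])}

# Step 1: establish a significant F before any pairwise testing
F, p = stats.f_oneway(*groups.values())
if p < 0.05:
    # Step 2: pairwise t-tests, each judged against the corrected alpha
    pairs = list(combinations(groups, 2))
    alpha_corrected = 0.05 / len(pairs)   # Bonferroni correction
    for a, b in pairs:
        t, p_pair = stats.ttest_ind(groups[a], groups[b])
        print(a, b, p_pair, p_pair < alpha_corrected)
```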
best method of comparing more than 2 groups to one another
Use ANOVA to establish whether there is a statistically significant difference in a design that has >2 groups.
What does the term ANOVA refer to when comparing >2 groups?
We analyze the variances between the groups, with respect to the variance WITHIN THE DESIGN
Why is it bad to do repeated comparisons/t tests?
We increase the α level and therefore increase the probability of making a Type I error.
Why do we weight the sum by the group size when computing the variance between groups?
We multiply by the group size because the variability is calculated only once per group (from its mean), yet each group mean stands in for all the observations in that group.
Inherent weakness of repeated-measures within-subject design?
Weak at following up subject-specific trends over time - hard to follow patients
why are t tests limited
they can only analyze at most 2 groups; ANOVA can analyze more than 2
ANCOVA works by
adjusting or correcting the outcome variable y for the contribution of the covariate, done using regression
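A sketch of ANCOVA-via-regression on simulated data, assuming the statsmodels formula interface; the variable names (group, baseline, y) are made up:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: outcome y, treatment group, and a baseline covariate
rng = np.random.default_rng(1)
n = 30
df = pd.DataFrame({
    "group": np.repeat(["ctrl", "treat"], n // 2),
    "baseline": rng.normal(50, 10, n),
})
effect = np.where(df["group"] == "treat", 5.0, 0.0)
df["y"] = 0.8 * df["baseline"] + effect + rng.normal(0, 3, n)

# ANCOVA: ANOVA on y, adjusted for the baseline covariate via regression
model = smf.ols("y ~ C(group) + baseline", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```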
purpose of the t test
allows us to compare 2 distributions, as long as the 3 assumptions of parametric tests are met
ANCOVA
analysis of covariance
What is ANCOVA?
analysis of covariance. It's ANOVA of multiple groups across which there is a COMMON UNDERLYING COVARIATE that alters the outcome variable
What does ANCOVA do?
analyzes covariance across groups for which there is a common underlying covariate (a variable that affects the outcome variable y)
F is a family of distributions, which varies as a function of a pair of df and tends towards ______ as N increases
as N increases, F tends towards NORMALITY; it is skewed when n is small.
F ratio shows that
as n increases, the distribution shifts towards normality
Repeated measures are collected for each subject in a longitudinal design, and analysis is done in terms of
change OVER TIME (Δ). The Δ is the difference from baseline - and the baseline value serves as a covariate.
Once you calculate tobs, what do you do?
compare tobs to tcrit from the t table, using the correct α level & df.
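A sketch of the tobs-vs-tcrit comparison using scipy's t distribution (sample sizes and observed t are made-up):

```python
from scipy import stats

# Hypothetical two-sample design
n1, n2, alpha = 12, 12, 0.05
df = n1 + n2 - 2

# Two-tailed critical value from the t distribution
t_crit = stats.t.ppf(1 - alpha / 2, df)

t_obs = 2.5   # made-up observed t for illustration
print(abs(t_obs) > t_crit)   # True -> reject the null hypothesis
```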
Fisher's method of comparing 2 groups involves
comparing the variances, by obtaining a ratio of the variance between groups divided by the overall variance (residual error) of the design. The square root of this ratio is then compared to the tcrit value from the t-table.
What is the df used for a t-test?
df for a t-test = n₁+n₂ -2
What is the dfb?
dfb = # of groups - 1 dfb = k - 1
What is the dfₑ?
dfₑ = total sample size - # of groups dfₑ = N - k
What does the t test measure?
effect size ratio between 2 groups
MSₑ (MSwithin) refers to the error of the
entire design. This error of the design can be fractionated.
If we increase the number of groups we wish to compare, what happens to the α level?
an increased # of groups compared → increased α level → increased chance of obtaining statistical significance (which is CHEATING)!! Do ANOVA instead
Directional hypotheses give us
increased stringency. This is why 1-tailed tests are more stringent: they are based on an a priori hypothesis of direction.
Nondirectional hypotheses are
less stringent; this is why two-tailed tests are less stringent
Rule 1 of ANOVA: establish there is a statistically significant difference in the design
The difference may be in the main effects or in the interactions.
-If there are 2+ main effects, you must compute the interactions.
-If the interaction among the main effects is statistically significant, any conclusion about an individual main effect depends on the other.
What is a t test?
measures the effect size: the ratio of the distance between 2 means, divided by the standard error of the mean. It's important because it allows us to make comparisons between groups as long as the 3 assumptions of parametric tests are satisfied.
We do what type of comparisons most often in the health sciences?
post-hoc comparisons
t test formula
ratio of the distance between 2 means, divided by the standard error of the mean
a matched t test analyzes
repeated-measures differences (Δs) between 2 groups, which is the most stringent way to analyze 2 groups. Since n is low, df is low, but the fractionation of the random error compensates.
How else can you write the formula for variance (s²)?
s² = mean of the squared deviations about the mean = MS, OR s² = sum of squared deviations about the mean divided by the df = SS/df
What is the mean square between (MSb)?
the mean square between is SSb divided by the degrees of freedom between (the number of groups minus 1): MSb = SSb/(k-1)
What's left over after fractionation of the error of the design (MSwithin/ MSₑ) is
the residual error; this fractionation increases the stringency of the test