Chapter 27


Conditions for using ANOVA (Condition 3)

All the populations have the same standard deviation, whose value is unknown. This is the hardest condition to satisfy and check. If it is not satisfied, ANOVA is often still okay when the sample sizes are large enough and similar across the groups. You can use group_by() and summarize() to calculate the sample SDs; if they are similar, that suggests the population SDs are too. General rule: the largest sample standard deviation should be less than twice the smallest one.
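The twice-the-smallest rule can be checked in a few lines. This is a minimal sketch on made-up data: the data frame `dat` and its columns `group` and `outcome` are hypothetical, and it uses base R `tapply()` in place of group_by() and summarize() so that it runs without extra packages.

```r
# Hypothetical data: one grouping variable and one continuous response
dat <- data.frame(
  group   = factor(rep(c("A", "B", "C"), each = 5)),
  outcome = c(4.1, 5.0, 4.6, 5.3, 4.8,
              6.2, 5.9, 6.8, 6.1, 7.0,
              5.1, 4.4, 5.8, 5.5, 4.9)
)

# Sample standard deviation within each group
group_sds <- tapply(dat$outcome, dat$group, sd)

# Ratio of the largest to the smallest sample SD
sd_ratio <- max(group_sds) / min(group_sds)

# TRUE when the rule of thumb (largest SD < 2 * smallest SD) holds
rule_ok <- sd_ratio < 2

print(group_sds)
print(rule_ok)
```

With dplyr loaded, the same per-group SDs would come from `dat |> group_by(group) |> summarize(sd = sd(outcome))`.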

What would the data look like in a data frame?

One "grouping" variable (categorical) and one continuous response variable. ANOVA asks if there is an association between the grouping variable and the response variable.

Tukey's HSD

Tukey's test maintains a 5% experimentwise or "family" error rate: even if you make many pairwise comparisons, the overall error rate is at most 5%. Recall: if you conducted 100 tests with a 5% error rate (i.e., α = 0.05) and H0 was always true, how many p-values would you expect to be < 0.05? (About 5.) Tukey's error rate is 5% overall, no matter how many comparisons you make, so it overcomes the problem of multiple testing.

Summary of plots/F statistic

Informally, what we did with the plots was compare the variation between group means to the variation within the groups. This focus on variation is why the test is called ANOVA: an ANalysis Of VAriance. When the ratio of between-group to within-group variation is large enough, we detect a difference between the groups. When the ratio isn't large enough, we don't detect a difference. This ratio is our test statistic, denoted by F.

Bonferroni correction

We could compare the groups in pairs. To compensate for making multiple comparisons and keep the overall probability of making a type I error at 0.05 (at most), we adjust the significance level to α* by dividing α by the number of comparisons we are making. We then use α* as the significance level for each individual comparison. So for 3 groups, which means 3 pairwise comparisons, we would use α* = 0.05/3 ≈ 0.0167 as the significance level for each comparison. This modification is known as the Bonferroni correction.
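The arithmetic behind the correction is short enough to sketch in R. The variable names here are illustrative, and the unadjusted family error rate shown assumes the comparisons are independent, which is a simplification.

```r
alpha <- 0.05          # desired overall (family) type I error rate
k <- 3                 # number of groups
m <- choose(k, 2)      # number of pairwise comparisons: 3 for 3 groups

# Bonferroni-adjusted significance level for each individual comparison
alpha_star <- alpha / m
print(round(alpha_star, 4))   # 0.0167

# For contrast: chance of at least one false positive across m tests
# at the unadjusted level, ASSUMING the tests are independent
fwer_unadjusted <- 1 - (1 - alpha)^m
print(fwer_unadjusted)
```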

Conditions for using ANOVA (Condition 2)

Each of the k populations has a Normal distribution with an unknown mean. This assumption is less critical: the ANOVA test is robust to non-Normality. Remember that ANOVA is based on comparing sample means, which are approximately Normal when the samples are reasonably large.

F statistic formula

F = (variation among the sample means) / (variation among observations within the same sample). Numerator: the variance of the sample means. Denominator: an average of the group variances.
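A minimal sketch of this ratio computed by hand, on made-up data (the data frame `dat` and its columns are hypothetical). It uses the usual one-way ANOVA mean squares, weighting each group by its size, and checks the result against what aov() reports.

```r
# Hypothetical data: three groups of five observations each
dat <- data.frame(
  group   = factor(rep(c("A", "B", "C"), each = 5)),
  outcome = c(4.1, 5.0, 4.6, 5.3, 4.8,
              6.2, 5.9, 6.8, 6.1, 7.0,
              5.1, 4.4, 5.8, 5.5, 4.9)
)

k <- nlevels(dat$group)        # number of groups
n_total <- nrow(dat)           # total number of observations
grand_mean <- mean(dat$outcome)

group_means <- tapply(dat$outcome, dat$group, mean)
group_ns    <- tapply(dat$outcome, dat$group, length)
group_vars  <- tapply(dat$outcome, dat$group, var)

# Numerator: variation of the group means around the overall mean
ms_between <- sum(group_ns * (group_means - grand_mean)^2) / (k - 1)

# Denominator: a weighted average of the within-group variances
ms_within <- sum((group_ns - 1) * group_vars) / (n_total - k)

f_stat <- ms_between / ms_within

# Cross-check against the F value from aov()
f_aov <- summary(aov(outcome ~ group, data = dat))[[1]][["F value"]][1]
print(all.equal(f_stat, f_aov))
```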

ANOVA null hypothesis/Alternative hypothesis

Here we are testing a null hypothesis that all the means are the same; for 3 samples this would be H0: μ1 = μ2 = μ3. Our alternative hypothesis is that at least one of the means is not equal to the others. Even though the hypotheses involve means, the test compares the variability between groups to the variability within groups.

ANOVA alternative hypothesis

Not all means are the same. Or, at least one of the means differs from the others.

p-hacking

Remember, one of the issues with multiple comparisons is that when you repeatedly question the same dataset, you can end up finding "significant" results by chance alone.

F Distribution

Skewed right; takes only positive values; its shape depends on the number of means being compared and the total sample size across all groups. The F statistic follows an F distribution with k - 1 degrees of freedom in the numerator and Ntotal - k degrees of freedom in the denominator. The p-value of the ANOVA F statistic is always the area to the right of the F statistic.

Tukey in R

You can think of TukeyHSD() as a wrapper around the ANOVA. You can either nest the calls, like this: TukeyHSD(aov(outcome ~ group)), or save the ANOVA as an object and pass that in: modelresult <- aov(outcome ~ group) and then TukeyHSD(modelresult, conf.level = 0.95), where conf.level is 1 minus the overall (family) alpha. "Adjusted" means the p-value is adjusted for conducting multiple tests, so the unadjusted p-value would be smaller. If an unadjusted 95% CI for a difference doesn't include 0, the unadjusted p-value is < 0.05, but the adjusted p-value can still be larger than 0.05. Thus, with an adjusted test you can't use an unadjusted CI to infer the p-value!
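A short sketch of the save-then-pass pattern on made-up data (the data frame `dat` and the object names are hypothetical). The grouping variable is stored as a factor, which TukeyHSD() expects.

```r
# Hypothetical data: three groups of five observations each
dat <- data.frame(
  group   = factor(rep(c("A", "B", "C"), each = 5)),
  outcome = c(4.1, 5.0, 4.6, 5.3, 4.8,
              6.2, 5.9, 6.8, 6.1, 7.0,
              5.1, 4.4, 5.8, 5.5, 4.9)
)

# Fit the one-way ANOVA and save it as an object
model_result <- aov(outcome ~ group, data = dat)

# Pairwise comparisons with a 95% family-wise confidence level
tukey <- TukeyHSD(model_result, conf.level = 0.95)

# One row per pair of groups; columns are diff, lwr, upr and "p adj"
pairwise <- tukey$group
print(pairwise)
```

The `lwr` and `upr` columns are family-wise intervals, and `p adj` is the adjusted p-value for each pair.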

ANOVA

Analysis of variance. ANOVA is based on two kinds of variability:
- The variability among sample means: how much the individual group means vary around the overall mean
- The variability within groups: how much individual observation values vary around their group mean
If the variability within the k different populations is small relative to the variability among their respective means, this suggests that the population means are in fact different.

ANOVA in R

aov(outcomevariable ~ groupvariable, data = dataset). Save the output as an object, then use tidy() (from the broom package) to get the output we want. What to focus on in the output:
- statistic is the F statistic, the ratio of the variation between means to the variation within groups
- p.value is the p-value for the test
Example: an F statistic of 6.70 says that the variation between the means is nearly 7 times as large as the variation within the groups.
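A minimal sketch of fitting the model and pulling out the two values to focus on, using made-up data (the data frame `dat` and the object names are hypothetical). The card uses tidy(); this sketch reads the same numbers from base R's summary() so it runs without extra packages.

```r
# Hypothetical data: three groups of five observations each
dat <- data.frame(
  group   = factor(rep(c("A", "B", "C"), each = 5)),
  outcome = c(4.1, 5.0, 4.6, 5.3, 4.8,
              6.2, 5.9, 6.8, 6.1, 7.0,
              5.1, 4.4, 5.8, 5.5, 4.9)
)

# Fit the one-way ANOVA and save it as an object
model_result <- aov(outcome ~ group, data = dat)

# The ANOVA table: first row is the grouping variable, second is residuals
anova_table <- summary(model_result)[[1]]

f_stat  <- anova_table[["F value"]][1]   # ratio of between to within variation
p_value <- anova_table[["Pr(>F)"]][1]    # area to the right of f_stat

print(anova_table)
```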

Conditions for using ANOVA (Condition 1)

k independent SRSs, one from each of the k populations. This is the most important assumption.

p of an F statistic in R

pf(value, df1 = numerator degrees of freedom, df2 = denominator degrees of freedom, lower.tail = FALSE). Example: a p-value of 0.00313 means there is about a 0.3% chance of observing an F statistic at least as large as the one we observed if the null hypothesis that all the means are the same is true.
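A small sketch of the call with made-up inputs: the F value and group counts below are hypothetical, not the ones behind the 0.00313 result. The degrees of freedom follow the k - 1 and Ntotal - k rule.

```r
f_value <- 4.0   # hypothetical observed F statistic
k <- 3           # hypothetical number of groups
n_total <- 15    # hypothetical total sample size

df1 <- k - 1         # numerator degrees of freedom
df2 <- n_total - k   # denominator degrees of freedom

# Area to the right of the observed F statistic
p_value <- pf(f_value, df1 = df1, df2 = df2, lower.tail = FALSE)

# Same area computed as 1 minus the area to the left
p_check <- 1 - pf(f_value, df1 = df1, df2 = df2)

print(p_value)
```

Writing `lower.tail = FALSE` in full is safer than `F`, because `F` is an ordinary variable that can be reassigned.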

