Chapter 13 Anova
ANOVA
Analysis of variance: a hypothesis-testing procedure used to evaluate mean differences between two or more treatments (or populations).
SS and M are also important to the formulas for ANOVA
...
Another way of computing the between-treatments sum of squares. It can be used only when all treatments have the same number of scores, but it produces the same result as equation
...
ANOVA advantage over t-test
ANOVA is preferred over multiple t tests because it reduces the risk of a Type I error: it combines all the levels into one test and can compare more than two samples at once.
The advantage of ANOVA is that it performs all of the comparisons simultaneously in the same hypothesis test. Thus,
ANOVA uses one test with one alpha level to evaluate the mean differences and thereby avoids the problem of an inflated experimentwise alpha level.
Step 3: To compute the F-ratio you need a series of calculations:
Analyze the SS to obtain SSbetween and SSwithin. Use the SS values and the df values from Step 2 to calculate the two variances, MSbetween and MSwithin. Finally, use the MS values to compute the F-ratio.
SStotal is simply the SS for the entire set of N scores: $SS_{\text{total}} = \sum X^2 - \frac{G^2}{N}$
SSwithin combines the SS values from inside each treatment condition: $SS_{\text{within}} = \sum SS_{\text{inside each treatment}}$
SSbetween measures the differences between the four treatment means: $SS_{\text{between}} = \sum \frac{T^2}{n} - \frac{G^2}{N}$
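The Step 3 sums of squares can be checked with a short Python sketch. The three-treatment data set below is an illustrative assumption (not from these notes), and `ss` and `partition_ss` are hypothetical helper names. The sketch applies the SStotal, SSwithin, and SSbetween formulas directly and confirms that SStotal = SSbetween + SSwithin.

```python
def ss(scores):
    """SS for one sample: sum of squared deviations from the sample mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def partition_ss(groups):
    """Return (SS_total, SS_between, SS_within) for a list of treatment samples."""
    all_scores = [x for g in groups for x in g]
    N = len(all_scores)
    G = sum(all_scores)  # grand total, G = sum of the treatment totals T
    ss_total = sum(x ** 2 for x in all_scores) - G ** 2 / N             # ΣX² - G²/N
    ss_within = sum(ss(g) for g in groups)                               # Σ SS inside each treatment
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N  # Σ(T²/n) - G²/N
    return ss_total, ss_between, ss_within

# Illustrative data: three treatments with n = 5 scores each
groups = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]
total, between, within = partition_ss(groups)
print(total, between, within)  # 46.0 30.0 16.0
```

Using the T-total form of SSbetween keeps the arithmetic close to the hand-calculation formulas; the final check (46 = 30 + 16) is the partition that ANOVA relies on.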
Two characteristics
Because F-ratios are computed from two variances (the numerator and denominator of the ratio), F values are never negative: a variance is a mean of squared deviations, so it cannot be negative. When H0 is true, the numerator and denominator of the F-ratio are measuring the same variance, so the two sample variances should be about the same size and the ratio should be near 1. The distribution of F-ratios therefore piles up around 1.0.
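To see both characteristics, here is a small simulation sketch. It assumes four treatments with n = 5 scores each, all drawn from the same normal population (so H0 is true by construction); the population mean of 50, standard deviation of 10, number of repetitions, seed, and function names are illustrative choices, not from these notes.

```python
import random
import statistics

def f_ratio(groups):
    """One-way independent-measures F = MS_between / MS_within."""
    all_scores = [x for g in groups for x in g]
    N, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / N
    means = [sum(g) / len(g) for g in groups]
    # Definitional forms: SS_between = Σ n(M - grand mean)², SS_within = Σ(X - M)²
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

def simulate_null_f(k=4, n=5, reps=5000, seed=1):
    """Draw every sample from the SAME population, so H0 is true, and collect F."""
    rng = random.Random(seed)
    return [
        f_ratio([[rng.gauss(50, 10) for _ in range(n)] for _ in range(k)])
        for _ in range(reps)
    ]

fs = simulate_null_f()
print(min(fs) >= 0)                   # True: F is never negative
print(round(statistics.mean(fs), 2))  # theory: mean of F(3, 16) is 16/14, about 1.14
```

The average lands slightly above 1 because the mean of an F distribution is dfwithin / (dfwithin - 2) when dfwithin > 2; most of the simulated values still cluster near 1.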
We need two separate analyses
Compute SS for the total study and analyze it into two components (between and within). Then compute df for the total study and analyze it into two components (between and within).
First, we find df for the total set of N scores, and then partition this value into two components. There are two considerations to keep in mind:
1. Each df value is associated with a specific SS value.
2. Normally, the value of df is obtained by counting the number of items that were used to calculate SS and then subtracting 1. For example, if you compute SS for a set of n scores, then df = n - 1.
This is the total probability of a Type I error accumulated from all of the separate tests in the experiment.
Experimentwise alpha level
The numerator and denominator of the F-ratio measure variances, or mean squared differences. Therefore, we can express the F-ratio as follows:
$F = \frac{(\text{differences between samples})^2}{(\text{differences expected by chance})^2}$
The test statistic for ANOVA is called the
F-ratio with F = variance (differences) between sample means divided by variance (differences) expected by chance (error)
The sum of all the scores in the research study (the grand total) is identified by
G, where G = ∑T (the sum of the treatment totals)
Forming the Hypotheses for ANOVA
H0: µ1 = µ2 = µ3. H1: At least one population mean is different, for example µ1 ≠ µ2 ≠ µ3 (all three are different), or µ1 = µ3 but µ2 is different.
Hints for predicting data outcomes:
Hint 1: Remember that SS and MS provide a measure of how much difference there is between treatment conditions. Hint 2: Find the mean or total (T) for each treatment, and determine how much difference there is between the treatments. The F-ratio always measures how much difference exists between treatments, but you should be able to look at the data and see whether the difference is large or small.
There are two primary sources for chance differences
Individual differences and Experimental error
STEP 2:
Locate the critical region for the F-ratio. We must determine the degrees of freedom for MSbetween treatments and MSwithin treatments (the numerator and denominator of F), starting from dftotal = N - 1.
In analysis of variance, we combine two or more samples.
$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{\sum SS}{\sum df} = \frac{SS_1 + SS_2 + SS_3 + \cdots}{df_1 + df_2 + df_3 + \cdots}$
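A minimal Python sketch of the pooled MSwithin formula, assuming two illustrative samples of unequal size (the data and the `ss` and `ms_within` names are hypothetical). It adds the SS values and divides by the sum of the df values, and it shows that this is not the same as simply averaging the two sample variances when the n values differ.

```python
def ss(scores):
    """SS for one sample: sum of squared deviations from the sample mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def ms_within(groups):
    """Pooled within-treatments variance: (SS1 + SS2 + ...) / (df1 + df2 + ...)."""
    total_ss = sum(ss(g) for g in groups)
    total_df = sum(len(g) - 1 for g in groups)
    return total_ss / total_df

# Illustrative samples of unequal size: SS = 8 (df = 2) and SS = 40 (df = 4)
groups = [[2, 4, 6], [1, 3, 5, 7, 9]]
print(ms_within(groups))                          # 8.0 = 48 / 6
print((ss(groups[0]) / 2 + ss(groups[1]) / 4) / 2)  # 7.0, a simple average of variances
```

Pooling weights each sample by its df, so larger samples contribute more to the error term.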
Between Treatments Variance
Measures how much difference exists between the treatment conditions. In computing it, we are really measuring the differences between the sample means, which may be caused by a treatment effect or may simply be due to chance.
Total Sum of Squares SStotal, for the entire set of N scores.
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
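As a quick check, the computational formula gives the same SS as the definitional formula Σ(X - M)². This is a small Python sketch with a made-up set of four scores; the function names are illustrative.

```python
def ss_definitional(scores):
    """SS = Σ(X - M)², the sum of squared deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def ss_computational(scores):
    """SS = ΣX² - (ΣX)²/N, the computational formula."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

scores = [1, 0, 6, 1]  # illustrative data: ΣX = 8, ΣX² = 38, M = 2
print(ss_definitional(scores), ss_computational(scores))  # 22.0 22.0
```

The computational form avoids subtracting the mean from every score, which is why it is convenient for hand calculation.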
We will need to compute an ______ for the variance both between treatments (numerator of F) and another ____ value for the variance within treatments (denominator of F).
SS and a df; SS and df
The ANOVA summary table shows the sources of variability (between treatments, within treatments, and total variability),
SS, df, MS, and F.
The value of SSbetween treatments can be found by subtraction.
SSbetween = SStotal - SSwithin
SSbetween treatments
SSbetween = SStotal - SSwithin
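To make the formula consistent with ANOVA notation, replace ∑X (the total of all N scores) with the grand total G:
$SS_{\text{total}} = \sum X^2 - \frac{G^2}{N}$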
Anova testing Step 1
Step 1: State the hypotheses and select an alpha level. H0: µ1 = µ2 = µ3 = µ4 (no effect). H1: At least one of the treatment means is different. We use α = .05.
The total (∑X) for each treatment condition is identified by
T
The alpha level you select for each individual hypothesis test.
Testwise alpha level
The number of treatments = k. dfwithin =
∑(n - 1), or equivalently dfwithin = N - k. Adding up all the n values gives N, and subtracting 1 for each of the k treatments subtracts k.
Post hoc tests are done when you reject H0 and
There are three or more treatments (k ≥ 3).
Two explanations for differences between treatments
The differences are caused by a treatment effect, or the differences are simply due to chance.
How is the alternative hypothesis determined in ANOVA?
Usually by theory or by the results of other studies
Next, calculate mean squares
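We must compute the variance, or MS, for each of the two components:
$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}}$ and $MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}}$
Calculation of F: $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$
Decision: Reject H0.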
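A minimal Python sketch of the mean-square step, using the SS and df values from the summary table under "Present the findings" in these notes. The function name `ms_and_f` is a hypothetical helper; it just divides each SS by its df and then divides the two MS values.

```python
def ms_and_f(ss_between, df_between, ss_within, df_within):
    """Return (MS_between, MS_within, F), where each MS = SS / df and F = MS_between / MS_within."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# SS and df values from the summary table in these notes
ms_b, ms_w, f = ms_and_f(50, 3, 32, 16)
print(round(ms_b, 2), round(ms_w, 2), round(f, 2))  # 16.67 2.0 8.33
```

The resulting F would then be compared with the critical value for df = 3, 16 from an F table.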
There are two possibilities for the F-ratio:
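When the treatment has no effect (the treatment effect is zero), the differences between treatments (numerator) are entirely due to chance. With the numerator and denominator roughly equal, the F-ratio should have a value around 1.00. When the treatment does have an effect, the numerator also contains the systematic treatment differences, so it should be noticeably larger than the denominator and the F-ratio should be noticeably larger than 1.00.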
Researchers use the variance within treatments, the error term, as
a benchmark or standard for evaluating the differences between treatments.
The total number of scores in the entire study is specified by ____. When all the samples are the same size,
a capital N; n is a constant, N = kn
When a researcher uses a nonmanipulated variable to designate groups for ANOVA, the variable is called
a quasi-independent variable
If H0 is true, we expect If H0 is not true,
a small value for F; we expect a large value for F.
An F-ratio near 1.00 indicates that the differences between treatments (numerator) are
about the same as the differences that are expected by chance (denominator).
The concept of pooled variance is the same whether you have exactly two samples or more than two samples. You simply
add the SS values and divide by the sum of the df values
If each of the t values is squared, then
all of the negative values become positive
If you have two independent-measures samples, you can use either a t test or ANOVA. These techniques will
always result in the same statistical decision, because F = t².
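A small Python sketch of the F = t² relationship for two independent samples. The two samples are illustrative, and `pooled_t` and `anova_f` are hypothetical helper names: one computes the independent-measures t with pooled variance, the other the one-way ANOVA F-ratio on the same data.

```python
import math

def ss(scores):
    """SS for one sample: sum of squared deviations from the sample mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def pooled_t(a, b):
    """Independent-measures t statistic using the pooled variance."""
    pooled_var = (ss(a) + ss(b)) / ((len(a) - 1) + (len(b) - 1))
    std_error = math.sqrt(pooled_var / len(a) + pooled_var / len(b))
    return (sum(a) / len(a) - sum(b) / len(b)) / std_error

def anova_f(groups):
    """One-way independent-measures F = MS_between / MS_within."""
    all_scores = [x for g in groups for x in g]
    N, k = len(all_scores), len(groups)
    G = sum(all_scores)
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
    ss_within = sum(ss(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

a = [0, 1, 3, 1, 0]
b = [4, 3, 6, 3, 4]
t = pooled_t(a, b)                                  # negative, because mean(a) < mean(b)
print(round(t ** 2, 6), round(anova_f([a, b]), 6))  # 15.0 15.0
```

Squaring removes the sign of t, which is why a two-tailed t test and the F-ratio reach the same decision.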
Because we are going to analyze variability, the process is called
analysis of variance
Rejecting H0 indicates that
at least one difference exists among treatments.
The structure of the F-ratio also compares differences
between sample means vs. differences due to chance (error).
The term ____ refers to differences from one treatment to another. With three treatments we compare three different means, df = 3 - 1 = 2.
between treatments
The ANOVA requires that we first compute a total sum of squares and then partition the value into two components,
between treatments and within treatments
Because these differences are unexplained and unpredictable, they are considered to be
chance occurrences
First, determine the total variability for the entire set of data by
combining all the scores from all the separate samples to obtain one general measure of variability for the complete experiment
How do we measure effect size?
Compute r² (also called η², eta squared), which measures how much of the differences between scores is accounted for by the treatments: $r^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}}$
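A one-line Python sketch of this effect-size measure, applied to the SS values from the summary table in these notes; the function name is illustrative.

```python
def r_squared(ss_between, ss_total):
    """Proportion of total variability accounted for by treatments: SS_between / SS_total."""
    return ss_between / ss_total

# SS values from the summary table in these notes
print(round(r_squared(50, 82), 2))  # 0.61
```

In words, about 61% of the variability in that study's scores is accounted for by the differences between treatments.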
The shape of the F distribution depends on the ____ of the two variances of the F-ratio.
degrees of freedom
the variance in the _____ of the F-ratio and the standard error in the denominator of the t statistic both measure the differences that would be expected just by chance or sampling error
denominator
df notation:
df = 2, 12 (dfbetween = 2 for the numerator, dfwithin = 12 for the denominator)
The SS formula measures the variability for the set of treatment totals. Count the number of T values and subtract 1. Because the number of treatments is k, the formula is
df between = k - 1
Analyze this total into two components.
df between = k -1 And, df within = ∑df inside each treatment
Degrees of freedom, df total. Remember, SStotal measures variability for the entire set of N scores.
df total = N - 1
To find the df associated with SSwithin, look at how the SSwithin value is computed. We first find the SS inside each of the treatments and then add these values together. Each of the treatment values measures variability for the n scores in the treatment, so each SS has df = n - 1. When all of these values are added together, we obtain
df within = ∑(n - 1) = ∑df in each treatment
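The df partition can be sketched in a few lines of Python, assuming a one-way independent-measures design described only by its sample sizes (the function name is illustrative). The built-in checks confirm that dfwithin = N - k and that dfbetween + dfwithin = dftotal.

```python
def degrees_of_freedom(sample_sizes):
    """Return (df_total, df_between, df_within) for a one-way independent-measures ANOVA."""
    N = sum(sample_sizes)                         # total number of scores
    k = len(sample_sizes)                         # number of treatments
    df_total = N - 1
    df_between = k - 1
    df_within = sum(n - 1 for n in sample_sizes)  # Σ(n - 1)
    # df_within equals N - k, and the two components add up to df_total
    assert df_within == N - k
    assert df_between + df_within == df_total
    return df_total, df_between, df_within

print(degrees_of_freedom([5, 5, 5, 5]))  # (19, 3, 16), as in the summary table
print(degrees_of_freedom([4, 6, 5]))     # (14, 2, 12), i.e. df = 2, 12
```

Unequal sample sizes work the same way, since only N and k enter the formulas.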
The word analysis means
dividing into smaller parts
In ANOVA, the MS value in the denominator of the F-ratio is called the
error term
This MS value is intended to measure the amount of ____, that is, variability in the data for which there is no systematic or predictable explanation
error variability
Post hoc tests are additional hypothesis tests that are done after an ANOVA (when H0 is rejected) to determine
exactly which mean differences are significant and which are not
In analysis of variance, the variable (independent or quasi-independent) that designates the groups being compared is called a
factor
Any value that is in the critical region for t will end up in the critical region
for F-ratios after it is squared.
ANOVA uses sample data as the basis
for drawing general conclusions about populations
A large F-ratio indicates that the differences between treatments are
greater than expected by chance, indicating a significant treatment effect.
In ANOVA we use variance to measure
how big the differences should be if there is no treatment effect
In the t statistic we computed an estimated standard error to measure
how much difference is expected by chance
two samples are not expected to be identical even
if there is no treatment effect whatsoever
A variable in ANOVA that the researcher manipulates to create treatment conditions is called an
independent variable
Like t tests, ANOVA can be used with either an
independent-measures or repeated-measures design
With very ____df values, nearly all the F-ratios will be clustered very near to 1.0. With the ____ df values, the F distribution is more spread out.
large; smaller
individual conditions or values that make up a factor are called the
levels of the factor
In ANOVA, it is customary to use the term _____ instead of variance, which is defined as the mean of the squared deviations. MS = s² = SS/df
mean square or MS
The goal of ANOVA is to
measure the amount of variability (the size of the difference) and to explain where it comes from
When the treatment effect is zero (H0 is true), the error term
measures the same sources of variance as the numerator of the F-ratio, so the value of the F-ratio is expected to be nearly equal to 1.00
The number of scores in each treatment is identified by
n
entire process of analysis of variance will require ___ calculations;
nine; three values for SS, three values for df, two variances (between and within), and a final F-ratio
If the treatment had an effect, the numerator of the F-ratio should be _____, and we should obtain an F-ratio noticeably larger than 1.00.
noticeably larger than the denominator
ANOVA corresponds to two hypotheses:
null and alternative as part of the general hypothesis testing procedure
The precision of the sample variance depends on the
number of scores or the degrees of freedom.
the variance in the ____ of the F - ratio provides a single number that describes the differences between all the sample means
numerator
With both the t statistic and ANOVA, the ____ of the ratio measures the actual difference obtained from the sample data, and the _____ measures the difference that would be expected if there is no treatment effect
numerator; denominator
The denominator of the F-ratio measures
only uncontrolled and unexplained variability and is called the error term
A post hoc test enables you to go back through the data and compare the individual treatments two at a time. This procedure is called
pairwise comparisons
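The pairs that a post hoc test works through can be listed with Python's standard `itertools.combinations`. This sketch only enumerates the k(k - 1)/2 pairwise comparisons; the treatment labels are hypothetical, and an actual post hoc test (for example Tukey's HSD or the Scheffé test) would also decide which pairs differ significantly while controlling the experimentwise alpha level, which is not shown here.

```python
from itertools import combinations

def pairwise_comparisons(treatments):
    """List every pair of treatments; there are k(k - 1)/2 of them."""
    return list(combinations(treatments, 2))

pairs = pairwise_comparisons(["I", "II", "III"])
print(pairs)                                            # [('I', 'II'), ('I', 'III'), ('II', 'III')]
print(len(pairwise_comparisons(["A", "B", "C", "D"])))  # 6
```

The count grows quickly with k, which is one reason post hoc procedures must guard against an inflated experimentwise alpha level.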
Within-Treatment Variance
provides a measure of the variability inside each treatment condition
The numerator of the F-ratio always includes the ______ as in the error term, but it also includes any systematic differences caused by the treatment effect
same unsystematic variability
A ____ mean difference indicates that the differences observed in the sample data are very unlikely to have occurred just by chance.
significant
The ability to combine different factors and to mix different designs within one study provides researchers with the flexibility to develop studies that address scientific questions that could not be answered by a single design using a single factor. These are called
factorial (multifactor) designs.
As the number of separate tests increases,
so does the experiment-wise alpha level
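A short Python sketch of how quickly the experimentwise alpha level grows. It assumes the separate tests are independent, each run at the same testwise alpha, so the chance of at least one Type I error is 1 - (1 - α)^c for c tests. Pairwise tests on the same data are not fully independent, so treat this as an approximation; the function name is illustrative.

```python
def experimentwise_alpha(testwise_alpha, num_tests):
    """P(at least one Type I error) across num_tests INDEPENDENT tests, each at testwise_alpha."""
    return 1 - (1 - testwise_alpha) ** num_tests

for c in (1, 3, 6):
    print(c, round(experimentwise_alpha(0.05, c), 3))
# 1 0.05
# 3 0.143
# 6 0.265
```

With three treatments (three pairwise t tests) the risk is already close to three times the testwise level, which is the problem a single ANOVA avoids.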
Present the findings:
Source                SS    df    MS       F
Between treatments    50     3    16.67    8.33
Within treatments     32    16     2.00
Total                 82    19
The fact that the t statistic is based on differences and the F-ratio is based on
squared differences leads to the basic relationship F = t². In addition:
1. You will be testing the same hypotheses, H0 and H1.
2. The df for the t statistic and the df for the denominator of the F-ratio (dfwithin) are identical.
3. The distribution of t and the distribution of F-ratios match perfectly if you consider the relationship F = t².
ANOVA has a limitation in hypothesis testing: the F-ratio tells you that a significant difference exists; it does not
tell exactly which means are significantly different and which are not.
differences between treatments are significantly greater than can be explained by chance alone;
that is, the differences have been caused by the treatment effects
The major advantage of ANOVA is
that it can be used to compare two or more treatments
With either the F-ratio or the t statistic, a large value provides evidence
that the sample mean difference is more than would be expected by chance alone
With an F-ratio near 1.00, we will conclude
that there is no evidence that the treatment had any effect.
The final calculation for ANOVA is
the F-ratio, which is composed of two variances, F = variance between treatments divided by the variance within treatments Variance for sample data: sample variance = s2 = SS/df
The numerator of the F-ratio (MSbetween) simply measures how much difference exists between the treatment means. The bigger the mean differences,
the bigger is the F-ratio
k is used to identify_____. For an independent-measures study, k also specifies the number of separate samples
the number of treatment conditions.
If the between-treatments differences (MSbetween) are substantially greater than the error term, then
the researcher can confidently conclude that the differences between treatments are due to more than chance
The denominator of the F-ratio (MSwithin) measures the variance of the scores inside each treatment; that is, the variance for each of the separate samples. In general, the larger the sample variances,
the smaller is the F-ratio
The structure of the t statistic compares the actual differences between samples (numerator) with
the differences one would expect by chance (denominator).
With ANOVA we must decide between two interpretations:
either there really are no differences between the populations (or treatments), or the populations (or treatments) really do have different means
The goal of an ANOVA comparing three samples is
to determine whether the mean differences observed among the samples provide enough evidence to conclude that there are mean differences among the three populations
The term ___ refers to the entire set of scores. We compute SS for the whole set of N scores, and the df is N-1.
total
Analyzing the total ____ into these two components is the heart of ANOVA
variability
F-ratio is based on ____ instead of sample mean difference
variance
We must compute the _____ between and within treatments in order to calculate the F-ratio
variance
Once we have analyzed the total variability into two basic components, between treatments and within treatments,
we compare them using the F-ratio
The error term is used as a standard for determining
whether or not the differences between treatments (measured by MSbetween) are greater than would be expected just by chance.
The term _____ refers to differences that exist inside the individual treatment conditions. We compute SS and df inside each of the separate treatments
within treatments
To measure chance differences, we compute the variance
within treatments
The _____ variance provides a measure of how much difference is reasonable to expect by chance, that is, how big the differences are when H0 is true.
within-treatments
Within-Treatments Sum of Squares
SSwithin treatments = ∑SS inside each treatment