Stats Final Exam
within-treatment variance measures differences caused by:
1. random, unsystematic factors
SP =
ΣXY − (ΣX)(ΣY) / n
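A quick check with made-up scores: for X = (1, 2, 3) and Y = (2, 4, 9), ΣXY = 37, ΣX = 6, ΣY = 15, and n = 3, so SP = 37 − (6)(15)/3 = 7. The definitional form Σ(X − MX)(Y − MY) gives the same 7.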
How to formulate hypotheses?
H0: "All population (treatment) means are the same" H0: µ1 = µ2 = µ3 H1: "At least one population mean is different" e.g.: H1: µ1 ≠ µ2 ≠ µ3 H1: µ1 = µ2 , but µ3 differs. etc.
F =
MSbetween / MSwithin
Correlation
describes the situation in which both X and Y are random variables. In this case, the values for X and Y vary from one replication to another and thus sampling error is involved in both variables.
In a repeated-measures design, the same people are tested in each treatment, so
differences between treatment groups cannot be due to individual differences
interaction effect
the effects of one IV differ at different levels of another IV
factor
independent (or quasi-independent) variable
Most commonly used correlation:
Pearson correlation
fe = (formula)
pn, where p is the proportion specified by H0 and n is the sample size
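For example, if H0 says each of 4 categories is equally likely (p = .25) and n = 60 people respond, then fe = (.25)(60) = 15 for every category.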
Direction:
positive (+) or negative (-).
Tukey's HSD:
q √(MSwithin / n)
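A worked example with hypothetical values: if q = 3.77, MSwithin = 4.00, and n = 10, then HSD = 3.77 √(4.00/10) ≈ 2.38, so any two treatment means that differ by more than 2.38 are significantly different.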
independent measures F =
(treatment effect + individual differences + chance) / (individual differences + chance)
A repeated-measures design removes
variability due to individual differences, and gives us a more powerful test
Unplanned comparisons (post hoc tests)
• Exploring data after H0 has been rejected • Specific tests to control experiment-wise error
What is ANOVA?
A hypothesis-testing procedure used to evaluate mean differences between two or more treatments (or populations).
What does ANOVA stand for?
Analysis of Variance
df for chi-square =
C − 1
Outliers:
Correlation is particularly susceptible to a few extreme scores; always look at the plot
Another way of expressing ANOVA formula
F = (treatment effect + chance) / chance
Dependent measures design
Groups are samples of dependent measurements (usually same people at different times; also matched samples) "Repeated measures"
Independent measures design
Groups are samples of independent measurements (different people)
Main effect
Mean differences along the levels of one factor (one-way F-ratio)
dftotal =
N - 1
Omnibus ANOVA
Overall significance test
Testwise Error
Probability of a type I error on any one statistical test
Experiment-wise Error
Probability of a type I error over all statistical tests in an experiment
measure of covariability
SP ("sum of squared products")
our previous measure of variability is
SS ("sum of squared deviation")
MSbetween =
SSbetween / dfbetween
SStotal =
SSwithin + SSbetween (likewise, dftotal = dfwithin + dfbetween)
MSwithin =
SSwithin / dfwithin
Hypothesis Testing with ANOVA
Step 1: Hypotheses • H0: all equal; H1: at least one is different. Step 2: Critical value • Need dfB and dfW. Step 3: Calculations • SSB and SSW • MSB and MSW • F. Step 4: Decision and conclusions • And maybe a source table
Taraban & McClelland
Tested reading times varying attachment and expectation
Nonlinearity:
The data may be consistently related, but not in a linear fashion
Random Variable
The values of the variable are beyond the experimenter's control. We don't know what the values will be until we collect the data
Fixed Variable
The values of the variable are determined by the experimenter. A replication of the experiment would produce the same values
studentized range statistic
a table of critical values of the SRS is provided in appendix G
More than one factor
factorial design
the critical value for Chi-square actually ___ with larger df rather than decreasing
increases
Remember: variance (="noise") in the samples
increases the estimated standard error and makes it harder for a treatment-related effect to be detected
Non-parallel lines =
interaction
regression
involves predicting a random variable (Y) using a fixed variable (X). In this situation, no sampling error is involved in X, and repeated replications will involve the same values for X (This allows for prediction)
When rejecting H0 in an ANOVA test, we
just know there is a difference somewhere...we need to do some detective work to find it
You look up the value for q from your text using
k (# of treatment groups) and dfwithin
dfbetween =
k - 1
The differences among the levels of one factor are referred to as the ____ of that factor.
main effect
A factorial design can produce what effects?
main effects of either factor, and an interaction effect between the factors
In ANOVA, variance =
mean square (MS)
Pearson correlation
measures the degree and direction of a linear relationship between variables.
HSD is the
minimal mean difference for significance
Parametric tests
must make assumptions about the distribution of the population, and estimate parameters of a theoretical probability distribution from statistics
Parallel lines =
no interaction
C =
number of columns
k =
number of levels of the factor (i.e. number of treatments)
n =
number of scores in each treatment
Levels
number of values used for the independent variable
fo =
observed frequency
interaction between two factors
occurs whenever mean differences between individual treatment conditions (combinations of two factors) are different from the overall mean effects of the factors
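A made-up 2 × 2 illustration (cell means):
              B1    B2
        A1    10    20
        A2    20    10
Here both main effects are zero (every row mean and column mean is 15), yet the effect of B reverses across the levels of A: a pure interaction. Plotted, the two lines would cross (nonparallel).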
Advantages of Repeated-Measures
Reduces or limits the variance by eliminating the individual differences between samples.
One factor
single-factor design
q is the
studentized range statistic
Smith & Ellsworth (1987)
studied the effect of asking a misleading question on the accuracy of eyewitness testimony. Subjects viewed a video of a robbery and were then asked about what they saw. Factor 1: Type of question (unbiased/misleading). Factor 2: Questioner's knowledge of crime (naïve/knowledgeable)
Remember the t statistic:
t = actual difference between sample means / difference expected by chance
in a graph, lines that are nonparallel indicate
the presence of an interaction between two factors.
Two-factor ANOVA consists of
three hypothesis tests
N =
total number of scores in the entire study
repeated measures F =
(treatment effect + chance) / chance (individual differences are removed from the error term)
Pearson Correlation- Points to keep in mind:
• Correlation does NOT imply causation (direction and third-variable problems). • Correlation values can be greatly affected by the range of scores in the data. • Outliers can have a dramatic effect on a correlation value.
in independent measures, differences within groups could be due to
• Individual differences • Error or chance
In independent measures, differences between groups could be due to
• Treatment effect • Individual differences • Error or chance (tired, hungry, etc.)
SSbetween =
Σ(T²/n) − G²/N
SSwithin =
ΣSS inside each treatment; shortcut: SSwithin = SStotal − SSbetween
T =
∑X for each treatment condition
SStotal =
ΣX² − G²/N
dfwithin =
∑df in each treatment = N - k
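A minimal Python sketch tying these formulas together, on made-up data (k = 3 treatments with n = 5 scores each; all numbers hypothetical):

    treatments = [
        [4, 3, 6, 3, 4],   # treatment 1 (T = 20)
        [2, 2, 3, 1, 2],   # treatment 2 (T = 10)
        [6, 7, 5, 8, 4],   # treatment 3 (T = 30)
    ]
    k = len(treatments)                  # number of treatments
    n = len(treatments[0])               # scores per treatment
    N = k * n                            # total number of scores
    G = sum(sum(t) for t in treatments)  # grand total of all scores

    ss_total = sum(x**2 for t in treatments for x in t) - G**2 / N  # ΣX² − G²/N
    ss_between = sum(sum(t)**2 / n for t in treatments) - G**2 / N  # Σ(T²/n) − G²/N
    ss_within = ss_total - ss_between                               # shortcut form

    df_between, df_within = k - 1, N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    F = ms_between / ms_within
    print(F)                             # ≈ 13.33 here, well above 1.00

The final F would then be compared against the critical value for (dfbetween, dfwithin).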
We have to make sure we do not exceed a
.05 chance of a Type I error while we "investigate"
Steps in Hypothesis Testing for chi
1) State hypotheses (no preference/preference) 2) Determine critical values (chance model) 3) Calculate chi-square statistic 4) Decision and conclusions
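A small Python sketch of step 3, using hypothetical counts (4 response categories, H0 = no preference):

    observed = [20, 10, 15, 15]                     # hypothetical fo values
    n = sum(observed)                               # 60 total responses
    expected = [n / len(observed)] * len(observed)  # fe = pn = (.25)(60) = 15 each

    # chi-square = Σ (fo − fe)² / fe
    chi_square = sum((fo - fe)**2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1                          # df = C − 1 = 3
    print(chi_square, df)                           # ≈ 3.33 with df = 3

With df = 3 and α = .05, the critical value is 7.81, so H0 would not be rejected for these counts.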
Advantages to ANOVA
1) Can work with more than two samples. 2) Can work with more than one independent variable
Two interpretations for ANOVA
1) Differences are due to chance. 2) Differences are real
Note about F values
1) F-ratios must be positive. 2) If H0 is true, F is around 1.00. 3) Exact shape of the F distribution will depend on the values for df.
steps of ANOVA calculation
1. SSwithin 2. dfwithin 3. variance within treatments (SSwithin / dfwithin) 4. F = variance between treatments / variance within treatments
ANOVA Hypotheses
1. There really are no differences between the populations (or treatments). The observed differences between the sample means are caused by random, unsystematic factors (sampling error). 2. The populations (or treatments) really do have different means, and these population mean differences are responsible for causing systematic differences between the sample means
between-treatment variance measures differences caused by:
1. systematic treatment effects 2. unsystematic, random factors
If there is no effect due to treatment: F ≈
1.00
G =
"grand total" of all the scores
Nonparametric tests
(distribution free) make no parameter assumptions; parameters are built from data, not the model
Two-factor ANOVA will do three things:
- Examine differences in sample means for humidity (factor A) - Examine differences in sample means for temperature (factor B) - Examine differences in sample means for combinations of humidity and temperature (factor A and B).
Looking at the data, there are two kinds of variability (variance):
-Between treatments -Within treatments
Variance between treatments can have two interpretations
-Variance is due to differences between treatments. -Variance is due to chance alone. This may be due to individual differences or experimental error.
We might want a nonparametric test if:
-We don't meet the assumptions for a parametric test -We want to do tests on medians -They aren't just tools to break out if assumptions fail.
error term
-the denominator of the F-ratio -it measures only unsystematic variance
Why is the ANOVA test statistic (F-ratio) similar to the t statistic?
actual variance between sample means / variance expected by chance
Bonferroni correction
alpha is divided across the number of comparisons: per-test α = α / k
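For example, with α = .05 and k = 5 planned comparisons, each individual test is evaluated at .05 / 5 = .01.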
Planned comparisons (a priori tests)
are based on theory and are planned before the data are collected • More powerful tests • Can be more liberal with our error rate
Chi-Square statistic
comparing observed frequencies to those expected by chance
r =
degree to which X and Y vary together / degree to which X and Y vary separately
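A minimal Python sketch using the standard computational form r = SP / √(SSX · SSY), on made-up scores:

    X = [1, 2, 3, 4, 5]     # hypothetical X scores
    Y = [2, 3, 5, 4, 6]     # hypothetical Y scores
    n = len(X)

    sp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n  # SP = ΣXY − ΣXΣY/n
    ss_x = sum(x**2 for x in X) - sum(X)**2 / n                  # SSX = ΣX² − (ΣX)²/n
    ss_y = sum(y**2 for y in Y) - sum(Y)**2 / n                  # SSY = ΣY² − (ΣY)²/n

    r = sp / (ss_x * ss_y) ** 0.5    # covariability / separate variability
    print(r)                         # 0.90 here: a strong positive correlation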
Correlation measures the
direction and degree of the relationship between X and Y
f e =
expected frequency
If there is a significant effect due to treatment
F > 1.00