Stats chapter 7
T test for two means from two independent samples
Both must be random samples, and each sample must come from a normal underlying distribution. ONLY USE the "Equal variances not assumed" row on SPSS output. t = t-test statistic; Sig. (2-tailed) = P-value; Mean Difference = estimate1 - estimate2; Std. Error Difference = se(estimate1 - estimate2).
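A minimal sketch of the non-pooled (Welch) two-sample t statistic computed by hand, mirroring the SPSS quantities named above. The data values are made up for illustration.

```python
# Welch (non-pooled) two-sample t statistic, computed from first principles.
# Sample data are made up for illustration.
import math
import statistics

group1 = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
group2 = [4.2, 4.5, 4.1, 4.7, 4.4]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)

mean_difference = m1 - m2                            # SPSS "Mean Difference"
se_difference = math.sqrt(s1**2 / n1 + s2**2 / n2)   # SPSS "Std. Error Difference"
t = mean_difference / se_difference                  # the t-test statistic

print(round(t, 3))
```

SPSS reports the same three quantities in the "Equal variances not assumed" row, along with the two-tailed P-value.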
4. The standard deviations of the underlying distributions or populations are equal
Check: largest standard deviation / smallest standard deviation < 2
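The rule-of-thumb check above can be sketched as follows; the three groups of sample values are made up.

```python
# Rough check of the equal-standard-deviations assumption using the
# largest/smallest ratio rule of thumb. Sample data are made up.
import statistics

groups = [[3.1, 2.9, 3.4, 3.0], [2.7, 3.2, 2.8, 3.1], [3.5, 3.0, 3.3, 3.2]]
sds = [statistics.stdev(g) for g in groups]
ratio = max(sds) / min(sds)
assumption_ok = ratio < 2   # rule of thumb: largest sd / smallest sd < 2
print(round(ratio, 2), assumption_ok)
```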
Sign test
The non-parametric equivalent of a one-sample t-test; it uses the median (or median difference) rather than the mean.
df1 and df2
are the parameters of the F distribution, and we write F(df1, df2)
TUKEY
If all intervals include 0, then there might not actually be a difference in means. However, if there are CIs where 0 is not included, make sure you take note of them and compare them. Use the P-value and the CIs together to interpret.
When S2B is large relative to S2W
Large f0, Small P Value.
T test can be used on the following:
A single mean, paired data, or two independent means.
When S2B is small relative to S2W
Small f0, Large P Value
Parametric tests are
Generally more powerful than non-parametric tests (when their distributional assumptions hold), but they carry the same independence assumptions.
n tot
is the total number of observations
Two-tailed tests are even more robust with respect to
normality assumption than one tailed tests.
Two sample t procedures are even more robust against non-normality than
one sample t procedures, especially for samples of similar size and distribution shape.
T Test Assumptions
1. Observations from within the same sample are independent of each other. 2. The two samples/groups are independent, i.e. observations between samples/groups are independent of each other. 3. The normality assumption: the populations or underlying distributions are normal, i.e. the data in each sample/group have come from a distribution which is bell-shaped (unimodal and symmetric).
T test for paired Data
Approached in a similar way to the single-mean t-test. For paired data you usually analyse the differences within each unit's measurements. ALWAYS REMEMBER: the output will be in terms of differences between factor 1 and factor 2, and the output will give the P-value for a two-tailed test; halve it if you are doing a one-tailed test.
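A minimal sketch of the paired t statistic: compute the within-unit differences, then treat them exactly like a single sample. The before/after measurements are made up.

```python
# Paired t statistic: analyse within-unit differences like a single sample.
# Sample data are made up for illustration.
import math
import statistics

before = [12.0, 11.5, 13.2, 12.8, 11.9, 12.4]
after  = [11.1, 11.0, 12.5, 12.0, 11.2, 11.8]

diffs = [b - a for b, a in zip(before, after)]   # factor 1 minus factor 2
n = len(diffs)
d_bar = statistics.mean(diffs)                   # mean difference
s_d = statistics.stdev(diffs)                    # sd of the differences
t = d_bar / (s_d / math.sqrt(n))                 # same form as the one-sample t
print(round(t, 3))
```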
Using a sign test.
Assign +, - or = to each value with respect to the hypothesised value, then make your interpretation from the +/- balance. Exact Sig. (2-tailed) is your P-value; you will need to halve it if you are doing a one-sided non-parametric test. µ̃ (mu with a tilde on top) is used for the median difference.
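The steps above can be sketched as an exact sign test: count + and - signs relative to the hypothesised value, drop ties, and get the two-tailed P-value from a Binomial(n, 0.5) tail. The data and hypothesised median are made up.

```python
# Exact sign test against a hypothesised median. Ties (=) are dropped, and
# the P-value comes from a Binomial(n, 0.5) count of signs.
# Sample data and hypothesised median are made up.
import math

data = [7.2, 6.8, 7.5, 7.9, 7.1, 6.5, 7.4, 7.6]
hypothesised_median = 7.0

plus = sum(1 for x in data if x > hypothesised_median)
minus = sum(1 for x in data if x < hypothesised_median)
n = plus + minus   # ties excluded

def binom_tail(k, n):
    # Pr(X >= k) for X ~ Binomial(n, 0.5)
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n

extreme = max(plus, minus)
p_two_tailed = min(1.0, 2 * binom_tail(extreme, n))
print(plus, minus, round(p_two_tailed, 3))
```

Halve `p_two_tailed` for a one-sided test, provided the sample imbalance is in the direction of the alternative.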
To calculate the f-test statistic
F0 = S2B/S2W (these two values can be found in the Mean Square column of the ANOVA SPSS output).
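A minimal sketch of the one-way ANOVA F statistic from first principles, producing S2B, S2W, F0 and both degrees of freedom. The three groups of data are made up.

```python
# One-way ANOVA F statistic from first principles:
# S2B = between-groups mean square, S2W = within-groups mean square.
# The three groups of data are made up for illustration.
import statistics

groups = [[4.1, 4.5, 4.3, 4.7], [5.0, 5.4, 5.1, 5.3], [4.6, 4.9, 4.4, 4.8]]
k = len(groups)
n_tot = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)

ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df1 = k - 1        # between-groups degrees of freedom
df2 = n_tot - k    # within-groups degrees of freedom
s2b = ss_between / df1   # "Mean Square, Between Groups" on SPSS output
s2w = ss_within / df2    # "Mean Square, Within Groups" on SPSS output
f0 = s2b / s2w
print(df1, df2, round(f0, 2))
```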
S2B
Is a measure of the variability BETWEEN the sample/group means. (SPSS output, Mean square, between groups)
S2W
Is a measure of the variability WITHIN the samples/groups (SPSS output, Mean square, Within groups)
Non parametric Paired data testing
Non-parametric tests don't have an underlying distribution assumption (whereas t-tests have the normality assumption).
P values for F test
P-value = Pr(F ≥ f0), where F ~ F(df1, df2) (one-tailed)
T test for single mean.
Remember to halve the P-value if you are doing a one-tailed test. In the one-sample case, add the test value onto the CI values to get the confidence interval estimate for the mean.
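A minimal sketch of the one-sample t statistic against a hypothesised mean, plus the CI shift described above. The data, test value and CI bounds are made up (the CI bounds stand in for the "CI of the Difference" that SPSS would report).

```python
# One-sample t statistic against a hypothesised mean mu0.
# Sample data and the hypothesised mean are made up for illustration.
import math
import statistics

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3]
mu0 = 10.0   # hypothesised mean (SPSS "Test Value")

n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)
t = (x_bar - mu0) / (s / math.sqrt(n))
print(round(t, 3))

# Hypothetical SPSS "CI of the Difference"; add the test value back on to
# get the CI for the mean itself.
ci_diff = (-0.05, 0.45)
ci_mean = (mu0 + ci_diff[0], mu0 + ci_diff[1])
print(ci_mean)
```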
F Test for One-way ANOVA: Hypotheses
We test H0: all of the underlying/population means are the same, versus H1: not all of the underlying/population means are the same, e.g. a difference exists between some of the means; at least two of the means are different; at least one of the means is different from the others; the grouping factor and the response variable are related.
A small P value for F test indicates that
- The null hypothesis is NOT true, i.e. a difference exists between some of the k means. BUT - it gives no indication of which means are different - it gives no indication of the size of any differences ---> use confidence intervals (Tukey).
A large P value for F test indicates that
- The null hypothesis is plausible. - The difference we see between the sample means could be explained simply in terms of random fluctuations (could be due to chance alone)
F test for One-way ANOVA: assumptions
1. Observations from within the same sample are independent, e.g. the samples are random. - CRITICAL 2. The samples are independent, i.e. observations from different samples are independent. - CRITICAL 3. The underlying distributions or populations are normal. - The F test is robust against departures from the normality assumption, as in the two-sample t-test. 4. The standard deviations of the underlying distributions or populations are equal. - The F test is reasonably robust with respect to the standard-deviations assumption, but the Tukey pairwise confidence intervals are not.
Necessary conditions when using T-procedures (when comparing 2 means from 2 independent samples/ groups)
1. Observations are independent - CRITICAL 2. The two samples/groups are independent - CRITICAL 3. The data do not suggest an underlying separation into clusters or a multimodal nature. 4. Further properties of the sample data: For a small sample size (n1 + n2 ≤ 15 or so) - no outliers - at most, slight skewness. For a medium sample size (15 < n1 + n2 < 40) - no outliers - not strongly skewed. For a large sample size (n1 + n2 ≥ 40 or so) - no gross outliers - data may be strongly skewed.
Two forms of the two independent sample t-test
1. The pooled form of the two-sample t-test carries the assumption of equal standard deviations (we don't use this form in this course). 2. The non-pooled form (called the Welch two-sample t-test), which does not require the assumption that the underlying standard deviations are equal, i.e. use the output row that says "Equal variances not assumed".
When the null hypothesis is true and the assumptions for F procedures hold, the sampling distribution of the random variable corresponding to f0 is
the F distribution, with: df1 = k - 1 (in the df column, Between Groups row of the ANOVA output) df2 = ntot - k (in the df column, Within Groups row of the ANOVA output) df1 + df2 = ntot - 1 (in the Total row of the ANOVA table)
F test: Evidence against the null hypothesis
The data give evidence against the null hypothesis when the variability between the sample/group means is large relative to the variability within the samples or groups.
k =
the number of groups
df
Using df = min(n1 - 1, n2 - 1) gives a conservative result; it is smaller than the true value. Software packages give the true value for the degrees of freedom, but it involves a complicated formula. As a result, the confidence interval produced by hand is WIDER than is necessary to ensure the method's success rate is 95%, i.e. the true coverage rate of such intervals (using df = min(n1 - 1, n2 - 1) for the degrees of freedom) is at least 95%.
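The contrast above can be sketched by computing both the conservative hand-calculation df and the Welch-Satterthwaite df ("the complicated formula" that software uses). The two groups of data are made up.

```python
# Conservative df min(n1-1, n2-1) versus the Welch-Satterthwaite df that
# software reports. Sample data are made up for illustration.
import statistics

group1 = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2]
group2 = [4.2, 4.5, 4.1, 4.7, 4.4]

n1, n2 = len(group1), len(group2)
v1 = statistics.variance(group1) / n1   # s1^2 / n1
v2 = statistics.variance(group2) / n2   # s2^2 / n2

df_conservative = min(n1 - 1, n2 - 1)
# Welch-Satterthwaite approximation to the true degrees of freedom:
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(df_conservative, round(df_welch, 1))
```

The Welch df always lies between min(n1 - 1, n2 - 1) and n1 + n2 - 2, which is why the hand method is conservative.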
F test is used
when there are more than two samples.