PSYC 305: Statistics for Experimental Design
Steps for one-way ANOVA
1. Ensure independence of observations. 2. Check the normality and equal variance assumptions. 3. Run the ANOVA. 4. Run post hoc tests if F is significant.
Assumptions of the t test
1. Both populations are normally distributed. 2. Each observation is independent (random sampling and random assignment). 3. The standard deviations of the populations are the same (homogeneity of variance).
Normal Quantile Plot (Q-Q)
1. Compute the percentile rank for each score (the percent of scores below that score). 2. Based on the percentile rank, compute the theoretical z score; do this for every score in the sample. 3. Calculate the actual z scores. 4. Plot them against each other; a straight line indicates normality.
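The four steps above can be sketched with the standard library alone (hypothetical scores; real Q-Q plots usually use a plotting library):

```python
# Sketch of the Q-Q plot computation described above (hypothetical data).
from statistics import NormalDist, mean, stdev

scores = [12, 15, 11, 18, 14, 16, 13, 17]
n = len(scores)
ordered = sorted(scores)

# Steps 1-2: percentile rank of each ordered score, then the theoretical
# z score at that percentile (using (i - 0.5)/n to avoid 0% and 100%).
theoretical_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Step 3: actual z scores of the ordered sample.
m, s = mean(ordered), stdev(ordered)
actual_z = [(x - m) / s for x in ordered]

# Step 4: plot theoretical_z against actual_z; points near a straight line
# suggest the sample is roughly normal.
for t, a in zip(theoretical_z, actual_z):
    print(f"{t:6.2f}  {a:6.2f}")
```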
Assumptions of One-way repeated measures ANOVA
1. The distribution of observations on the DV is normal in the population within each level of the treatment factor. 2. Sphericity: the variance of difference scores is equal for all pairwise comparisons. 3. Dependent observations: we know which observations are dependent, e.g., which 3 time points are connected because the same person provides all three scores.
What needs to be included in APA style report for ANOVA?
1. An overview that describes the IV and DV, stated conceptually. 2. Describe the F test (with effect size). 3. A description of the pattern of mean differences (include each group's M (mean) and SD) and whether a significant difference was found. 4. A conceptual conclusion.
One-Way ANOVA Assumptions
1. population distribution is normal within each group (normality assumption) 2. the variance of population distributions is equal for each group (homogeneity of variance) 3. independence of observations
factorial design ANOVA
2 or more IV with 2 or more levels
Advantages of One-way repeated measures ANOVA
Advantages: more power (greater ability to reject the null hypothesis when it is false) and fewer participants needed. Disadvantages: vulnerable to fatigue and practice effects; may require counterbalancing if possible.
How is total variation divided up for 2 way ANOVA
Again SST = SSR + SSM; however, SSM can be divided into SSA, SSB, and SS A×B. Find an individual F value for each factor and for the interaction.
correlated samples t-test
Also known as repeated measures: each participant goes through both conditions in the experiment, ex. a preference score before and after seeing an ad.
Kolmogorov-Smirnov Test (K-S Test)
Assesses normality but can also be used for other distributions (ex. F and chi squared). Less power than S-W (K-S may fail to reject the null when S-W does). Conceptually, it takes the sample mean and SD, generates a normal distribution, and then compares the sample scores to it.
The Fmax Test of Hartley
Calculate the variance for each group, find the largest and smallest variance values, and divide the larger by the smaller to get Fmax, which is then compared to the critical value. Assumes each group has an equal number of observations.
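A minimal sketch of the Fmax computation on hypothetical groups (the critical value would still come from an Fmax table):

```python
# Hartley's Fmax as described on the card: ratio of the largest to the
# smallest group variance (hypothetical groups, equal n assumed).
from statistics import variance

groups = {
    "control":   [5, 7, 6, 9, 8],
    "low dose":  [4, 8, 6, 10, 7],
    "high dose": [3, 9, 5, 11, 7],
}

variances = {name: variance(g) for name, g in groups.items()}
fmax = max(variances.values()) / min(variances.values())
print(variances, fmax)  # compare fmax to the critical value from a table
```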
Computing z score
Can only be used when the standard deviation in the population is known. Also assumes the variable X is normally distributed in the population and random sampling from the population. Converting a score (X) to a z score shifts the distribution so the SD is 1 and the mean is zero: z = (score − mean)/SD. Your z score equals the number of SDs you are above or below the mean; ex. a z score of 0.62 means you are 0.62 SD above the mean (since the SD is 1 and the mean is zero). Note the shape of the distribution does not change. This allows for easier interpretation: we can use a table to see the percent of scores above or below that z score.
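A quick sketch of the computation (the population mean and SD below are hypothetical; the normal CDF replaces the lookup table):

```python
# z score computation from the card, with made-up population values.
from statistics import NormalDist

population_mean, population_sd = 100, 15   # assumed known (required for z)
score = 109.3
z = (score - population_mean) / population_sd
print(z)  # 0.62: the score is 0.62 SD above the mean
pct_below = NormalDist().cdf(z)  # proportion of scores below this z
print(round(pct_below * 100, 1))
```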
Nominal
Classifies objects; numbers can be used to index categories, but there is no order. Ex. gender, country of origin, native language. Dichotomous means there are two categories (ex. control or treatment).
Sphericity Assumption
Conceptually, take two levels, ex. baseline and week 3, and compute the differences between values (lined up per subject). Then compute the variance of these difference scores. Do the same for week 3 vs week 6 and baseline vs week 6. Sphericity assumes that at the population level, the variances of the differences are equal for all pairwise comparisons. Sphericity is required for the F test in ANOVA to be accurate; if sphericity doesn't hold, Type I error rates are inflated.
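The variance-of-differences idea can be sketched as (hypothetical repeated-measures data, one entry per subject; a formal check would use Mauchly's test):

```python
# Sphericity check sketch: variance of the per-subject difference scores
# for each pair of levels (made-up data).
from statistics import variance

baseline = [10, 12, 14, 11, 13]
week3    = [ 8, 11, 12,  9, 12]
week6    = [ 7,  9, 11,  8, 10]

def diff_variance(a, b):
    # variance of the per-subject difference scores between two levels
    return variance([x - y for x, y in zip(a, b)])

pairs = {
    "baseline-week3": diff_variance(baseline, week3),
    "baseline-week6": diff_variance(baseline, week6),
    "week3-week6":    diff_variance(week3, week6),
}
print(pairs)  # sphericity assumes these are equal in the population
```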
Methods of assessing normality
Descriptive and inferential statistics: skewness, K-S and Shapiro-Wilk tests. Visual methods: histogram, normal quantile (Q-Q) plot. We use visual methods because with a large sample size, small departures from normality may be flagged by the tests yet look fine visually.
Shapiro-Wilk Test (S-W)
Designed specifically to test normality; usually more powerful than K-S (more likely to reject the null when the null is false). As the sample size gets bigger, it gets more sensitive.
Does rejecting the null hypothesis infer causality
Even if, say, headache severity decreases over the treatment period, the IV is time and not the treatment. This is not an experimental design and therefore cannot support causal inference; other factors could be causing the decrease in headaches over time.
True or false: As we increase the sample size, the type one error rate decreases?
False: type one error rate stays the same, it depends on alpha and not the sample size. Alpha by definition is probability of committing type I error.
HSD
HSD is the minimum absolute difference between 2 means for them to be statistically significant. Calculate HSD once and then compare all the differences to that value.
Tukey's HSD (equal sample sizes)
Used when sample sizes are equal and all comparisons are simple differences between means. The test uses Q, the studentized range statistic. We then compare the observed Q to the critical Q based on k, N−k, and alpha. Reject the null if Q is greater than Q crit, which means the difference between that pair of means is statistically significant.
Confidence Interval
If we repeated the experiment many, many times, 95% of the confidence intervals would include the true mean. If alpha is 0.05, the confidence interval is 95%; if alpha is 0.01, it is 99%. As sample size increases, the confidence interval narrows. As alpha decreases, the confidence interval becomes wider.
Normal (Gaussian) Distribution
In many statistical techniques for experimental designs, the dependent variable is assumed to be continuous and normally distributed. Mean = median = mode. Given the SD and mean, the normal distribution can be drawn. Going one SD from the mean in both directions (whatever value was calculated for the SD), 68% of values will be in that region; 95% of values fall within 2 SD and 99.7% within 3 SD. To find out how extreme a value is, we can compute how far into the tail it lies and the probability of getting that score.
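The 68/95/99.7 rule can be checked directly against the normal CDF with the standard library:

```python
# Coverage within 1, 2, and 3 SD of the mean under a normal distribution.
from statistics import NormalDist

nd = NormalDist()  # mean 0, SD 1
for k in (1, 2, 3):
    coverage = nd.cdf(k) - nd.cdf(-k)
    print(f"within {k} SD: {coverage:.4f}")  # 0.6827, 0.9545, 0.9973
```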
Degrees freedom 2 way anova
a−1 and b−1 for the main effects; multiply the two main effect degrees of freedom for the interaction, (a−1)(b−1). For MSR, df = ab(Ngj−1), where Ngj is the number of observations in each cell.
Degrees of freedom
For one-way repeated measures ANOVA: df for MSB is N−1 (subjects), df for MSM is K−1, and df for MSR is these two multiplied together, (N−1)(K−1).
Levene's test
The most common test for homogeneity of variance; tests the null hypothesis that the population variances are equal. If p < 0.05, reject the null. Sensitive to large sample sizes, so it may reject the null even for trivial differences. Measures how much scores deviate from their group mean.
Left Skew
Negative skew; the tail goes out to the left. The mean is smaller than the median.
Effect sizes for 2 way ANOVA
Ng is the number of people per cell (assumed equal in each). For each effect, divide its σ^2 by σ^2 total to find omega squared. σ^2 for an effect is its df multiplied by (MS − MSR), divided by Ng(A)(B). σ^2 total is the sum of all these values plus MSR.
One way independent group designs ANOVA
One IV with multiple levels (2 or more), each subject in one category.
Standard Deviation
The problem with variance is that the number is hard to interpret because its units are not the same as the data set's units (ex. a variance of 200, but what does that mean?). People prefer to speak in standard deviations, the square root of variance, which has the same units as the original data and shows the variation about the mean. It gives a sense of how far, on average, scores are from the mean: if one data set has 10 times the SD, its scores are on average 10 times as far from the mean. The symbol is s for a sample and σ for a population.
Degrees of Freedom
Provides an unbiased estimate for computing sample variance. For population variance, you would divide by N (the total population size). The population has some true mean, μ, and when we randomly sample from the population we can find the sample mean, X̄. X̄ will always be within the sample values; however, we do not know that X̄ matches μ. Depending on how we sampled, μ could be outside our sample values or anywhere on either end. When we calculate the variance of a sample we will find some spread, but it is likely the real variance is greater. By dividing SS by N−1 instead of N, we make s^2 bigger and a better estimate of the true population variance, which makes the sample variance unbiased. As N (the sample size) gets bigger, dividing by N−1 becomes almost the same as dividing by N: with a bigger sample we are more likely to have captured the true variance and need less correction. If the sample size is very small, the population mean is more likely to fall outside the sample values. Degrees of freedom is also the number of independent pieces of information used to compute or estimate a statistic.
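The N−1 correction described above is built into Python's `statistics` module, so the two divisors can be compared directly on made-up numbers:

```python
# statistics.variance divides SS by n-1 (unbiased sample estimate);
# statistics.pvariance divides SS by n (population formula).
from statistics import variance, pvariance

sample = [4, 8, 6, 5, 7]
s2 = variance(sample)       # SS / (n - 1)
sigma2 = pvariance(sample)  # SS / n
print(s2, sigma2)  # the n-1 version is larger for the same data
```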
Hypothesis testing about two means - T test
Purpose is to test whether two unknown population means (μ1 and μ2) are different from each other based on their samples. H0: μ1 = μ2; H1: μ1 ≠ μ2. Samples can be independent or correlated. D is the difference between means.
Tukeys HSD for 2 way anova
Q crit is determined by the degrees of freedom: k, the number of levels for that factor, and the degrees of freedom for the residual. For the number of people in each group, use the number of people in that level (ignoring the other factor).
Interval
Quantitative variable. rating data with equal distances between numbers. Assigned numbers have meaningful units and unit size remains constant. ex. poor (1), average (2), good (3) and excellent (4).
Ratio
Quantitative. Interval but with absolute zero or meaningful origin. Half as much or two times as much makes sense. ex. height, weight, pulse rate, time etc.
SSR in 2 way ANOVA
SSR is the sum of squared differences between each observation and its cell mean, across all cells. Think of it as computing the variance for each cell: find the cell mean, subtract it from each individual value in that cell, square, and do this for all cells.
Dividing of variation in One-way repeated measures ANOVA
SST can be divided into SSW (within participants) and SSB (between subjects). SSW can further be broken down into SSM and SSR. This allows us to partition off SSB, the variation caused by differences between subjects. Basically, we don't care what's happening differently among the subjects; we just want to see if there is a statistically significant change within each subject. MSM is the variation between treatment means. By partitioning off the between-subjects variation, MSR is smaller, which leads to more power.
What happens when you have violations to homogeneity of variance?
Serious violations tend to inflate type I error rates as it increases F statistic.
Calculating SSR
Since SST=SSB+SSM+SSR we can calculate SSR with algebra.
The effect of degrees of freedom on t distribution
The t distribution varies in shape according to the degrees of freedom (N−1). As df gets larger due to increased sample size, the t distribution approaches the normal distribution (t → z as N increases). After calculating the t statistic, compare it to t critical with the desired alpha and degrees of freedom. Again this is a two tailed alpha: 2.5% on one side and 2.5% on the other. If t is greater than the critical value, we reject the null hypothesis (alternatively, use a p value calculator and compare to 0.05).
Why not test all possible difference between means with t tests
This would lead to inflated Type I error rates. The more statistical tests we perform, the greater the chance of a Type I error, since each test has its own chance of one. With a 5% Type I error rate per test, out of 20 t tests we would expect about one Type I error; instead, perform one independent samples ANOVA.
T distribution
Unlike z distribution (standard normal), t statistic follows t distribution which looks the same as z but with fatter tails. Normal distribution would drop down to zero quicker, t distribution has greater uncertainty due to sx instead of σx
Calculating confidence intervals
Use the mean plus and minus the critical value multiplied by the standard error. For an independent samples t test, use D instead of the mean and sD as the standard error.
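A minimal sketch of mean ± critical value × standard error, using a z critical value (which assumes the population SD is known; with a sample SD you would use a t critical value and df instead). The numbers are hypothetical:

```python
# 95% confidence interval around a sample mean, z-based (sigma known).
from statistics import NormalDist

sample_mean, sigma, n = 52.0, 10.0, 25  # made-up values
se = sigma / n ** 0.5                   # standard error of the mean
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
print(ci)
```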
Histogram
Want the histogram to appear relatively symmetric, but this can be difficult to assess visually. Make a separate histogram for each group in the ANOVA.
Z test
We can apply the z test to our sample mean under the assumption that the null hypothesis is true. z = (sample mean − μ0)/σx̄, where σx̄ is the standard error and μ0 is the test value, the mean under the null hypothesis. The z statistic tells us how extreme our sample mean is compared to the mean predicted under the null hypothesis. Again, this converts our sampling distribution to the standard normal distribution (SD = 1 and mean = 0). If alpha is 0.05, z crit is 1.96, and we can reject the null hypothesis if the z score is beyond that (this is two tailed: 2.5% in one tail beyond z = 1.96 and 2.5% in the other beyond z = −1.96). Therefore the z test is used on a single sample mean to see if it equals a hypothesized value; use it only if we know the standard deviation of the population.
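A sketch of the z test with hypothetical numbers, computing the two-tailed p value from the normal CDF:

```python
# One-sample z test as described on the card (made-up values).
from statistics import NormalDist

mu0, sigma, n = 100, 15, 36   # null mean and known population SD
sample_mean = 106
se = sigma / n ** 0.5         # standard error of the mean
z = (sample_mean - mu0) / se
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_two_tailed)  # reject H0 at alpha = .05 if |z| > 1.96
```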
sampling distribution
We imagine a population with some parameter, ex. a mean. By taking samples of the population and measuring a statistic (the mean), we want to be able to estimate the parameter of the population. The sampling distribution is what we would get by taking a sample from the population many times, computing the mean for each sample, and plotting the distribution of those means. The middle of the sampling distribution (the most common mean) should be around the population parameter (the population mean).
Skewness
When testing skewness, the null hypothesis is that skewness equals zero (above zero means positive skew, below zero means negative skew). Find Z skew = (skewness statistic − 0)/SE skew, the same general format as a z test. Reject the null if |Z skew| is greater than 1.96 (compared to the standard normal distribution).
Tukey-Kramer Test
Used when the group sizes are unequal. A different HSD must be found for each difference, since each combination has different sample sizes.
When is F the same as T
When there are only two means, F=t^2
Mean
average, vulnerable to extreme values
independent samples t test
Between subjects design: each participant goes into one of the two conditions. Could either be comparing 2 separate populations, or taking one population and randomly assigning to two conditions (treatment and control).
Effect size
Calculated using omega squared; the equation looks complicated because it has been adjusted for bias.
Scheffé's Test
Can be used if groups have different sample sizes and is more robust to certain assumption violations; use it if you are worried you violated assumptions (such as normality or equal variance). It is the most conservative test, very unlikely to reject the null hypothesis (good for avoiding Type I errors). Good to use when the results are very important, ex. involving money or giving drugs to people, where you do not want to falsely reject the null. Scheffé's test is conceptually the same as doing an F test for all mean comparisons, but the critical F statistic is higher: the regular F crit multiplied by (k−1). With a bigger F crit, a bigger F is needed to reject the null, so more of the variation must stand out against MSR.
SS AxB for 2 way ANOVA
can find by subtracting everything since SST=SSA+SSB+SSAB+SSR
For 2 way ANOVA what does it mean if factors are completely crossed?
Contains all possible combinations of the levels of the factors. We assume a between subjects design with different subjects in each condition.
Calculating effect size
Convert the t value to an r value; higher r means a stronger effect. r^2 is the proportion of variance in the DV explained by the IV (variance explained by IV / total variance).
Calculating SST
The sum of the squared differences between each score and the grand mean.
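Pulling together the SST, SSM, and SSR cards, a small Python sketch of the one-way ANOVA partition on made-up data:

```python
# One-way ANOVA sums of squares and F, on hypothetical groups.
from statistics import mean

groups = [[3, 5, 4], [6, 8, 7], [9, 11, 10]]  # k = 3 groups, n = 3 each
all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)

# SST: squared deviation of every score from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_scores)
# SSM: n times the squared deviation of each group mean from the grand mean
ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# SSR: squared deviation of each score from its own group mean
ssr = sum((x - mean(g)) ** 2 for g in groups for x in g)

k, n_total = len(groups), len(all_scores)
f = (ssm / (k - 1)) / (ssr / (n_total - k))
print(sst, ssm + ssr, f)  # SST = SSM + SSR
```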
Limitations of normality tests
easy to find significant results (reject null hypothesis about normality) when sample size is large.
One way design with repeated measures ANOVA
Ex. the IV is the therapy and the levels are time points: each subject undergoes therapy and is measured at 3 time points.
One-way repeated measures ANOVA
experimental design where measurements of a single DV repeated a number of times within the same subject (longitudinal study). Also called within subjects. Example is study that measures DV before and after treatment.
Interaction effect
The extent to which the effect of one factor depends on the level of another factor: the effects of one factor on the DV change at different levels of the other factor. Sometimes called a crossover effect, but the lines do not need to cross; non-parallel lines indicate an interaction. Ex. there is an effect of gender in one of the conditions but not the other.
T-test (one mean)
Gives the same information as the z test but does not require knowing the SD of the population. Instead we use the standard deviation of the sample (s) and sx̄ as the standard error (seeing sx̄ as the standard error tells us the population SD was unknown). t = (sample mean − μ0)/sx̄, where sx̄ = s/√N.
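A sketch of the one-sample t statistic from this card (hypothetical scores; the resulting t would still be compared to a t table with df = N − 1):

```python
# One-sample t statistic with the sample SD as the error estimate.
from statistics import mean, stdev

sample = [101, 105, 98, 107, 104]  # made-up scores
mu0 = 100                          # test value under H0
n = len(sample)
sx = stdev(sample) / n ** 0.5      # estimated standard error, s / sqrt(N)
t = (mean(sample) - mu0) / sx
print(t)  # compare to t critical with df = n - 1
```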
Density
height of the bell curve at different values for X.
When can we use post hoc for 2 way ANOVA
if the number of levels is greater than 2 for a factor.
Simple effect analysis
If we reject the null for the interaction, we can further test the simple effects of each factor to clarify the nature of the significant interaction. A simple effect is the effect of one factor at each level of the other factor. We can usually compare in multiple ways, but one will make the most sense for the data. Prefer the comparison involving only 2 values, since a simple t test (a one-way ANOVA with 2 means) can then be used instead of a whole ANOVA.
independence of observations
Knowing the value of one observation gives us no information about other observations. A very important assumption, as it cannot be fixed if violated; the only way to assess it is to look at how the data were collected. The solution is random sampling and random assignment.
SSB
The marginal mean for each participant minus the grand mean, squared and summed, with the entire sum multiplied by k.
Symmetric distribution
The mean and median are equal.
central tendency
mean, median, mode
Balanced design
means equal sample size in each group.
Null hypothesis for 2 way ANOVA
We must state 3 null hypotheses. The null hypotheses for the main effect of A and the main effect of B concern the marginal means. The null for the interaction is that the interaction between factors A and B is equal to zero.
F distribution
No longer a normal distribution; it varies in shape according to dfM and dfR, written F(dfM, dfR). The F distribution is right skewed, with a variety of shapes depending on the two degrees of freedom. If the F value is far out in the right tail, we reject the null.
Power
Our ability to reject the null hypothesis, represented by 1−β. β is the probability of committing a Type II error, so 1−β is the probability of correctly rejecting the null hypothesis. Unlike alpha, power is affected by sample size. Increasing alpha will increase power but also increases the probability of a Type I error. A larger sample size results in increased precision and a narrower sampling distribution, which increases power.
Right Skew
Positive skew; the tail goes out to the right. The mean is larger than the median; the tail pulls up the mean.
Effect size values for r (Cohen)
r=0.1 -> small, r=0.3 -> medium, r=0.5 -> large
Effect size for one way anova
r=√(SSM/SST), η^2=r^2=SSM/SST, ω^2=(SSM−dfM(MSR))/(SST+MSR). A value closer to one tells us most of the total variation in the DV comes from the IV. Small=0.01, medium=0.06, large=0.14. η^2 is biased and normally higher than it should be; omega squared is unbiased.
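The η² and ω² formulas on this card, applied to hypothetical SS values:

```python
# Effect sizes for one-way ANOVA from SS values (made-up numbers).
sst, ssm = 60.0, 54.0
ssr = sst - ssm
k, n_total = 3, 9
df_m = k - 1
msr = ssr / (n_total - k)

eta_sq = ssm / sst
omega_sq = (ssm - df_m * msr) / (sst + msr)
print(eta_sq, omega_sq)  # omega squared is smaller: bias-corrected
```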
Ordinal
ranking data in order but tells us nothing about relative values ex. birth order or rank restaurants from worst to best. Not quantitative.
Type 1 error
reject Ho when it is true (this is determined by alpha, smaller alpha and less likely to commit type 1 error where we falsely reject the null hypothesis)
Type II error
retain null hypothesis when it is false
How to find SSB and SSA for 2 way ANOVA
Same as SSM for a one-way ANOVA, but with marginal means: find the difference between each marginal mean and the grand mean and square it. The trick is we still need to weight by Ng, where Ng is the number of observations used to calculate that marginal mean. Ex. if the marginal mean is for a row, use the number of observations in that row.
Range
simplest measure of dispersion. Largest value - smallest value
Variance
Standard deviation squared. A measure of dispersion, or how much data points vary from the mean. Variance is the SS (sum of squared deviations) over the degrees of freedom: take all the deviations (individual score minus the mean), square them, sum them, then divide by the degrees of freedom to get the approximate average squared deviation. σ^2 for population variance and s^2 for sample variance.
Effect size
A standardized measure of the magnitude of a treatment effect. Examples include Pearson's correlation coefficient, omega squared (ω^2), and η^2.
Calculating SSM
Take the mean for each treatment level, subtract the grand mean, square it, multiply by the number of participants in that level, and sum across levels.
Finding SST 2 way ANOVA
Take every score, subtract the grand mean, square it, and add them all.
Tukeys HSD
Tells us which of the three treatment levels differ significantly. Find Q critical with df = (k−1)(N−1) and k.
Standard error
The standard deviation of the sampling distribution is the standard error. It gives us an idea of how variable estimates of the parameter are. A larger sample size gives a smaller standard error: the larger the sample, the closer to the actual mean. The goal is as little dispersion as possible.
Standard error of the difference
The variability between means based on chance alone. Find the variance for each group (s1^2 and s2^2), multiply each variance by its df, then divide by df1+df2 to find the overall s^2. You can then calculate the standard error of the difference from these values. Note: s^2(N−1) equals SS because s^2 = SS/(N−1). Conceptually, imagine finding the sampling distribution of D: take two samples over and over, compute the difference each time, plot the distribution, and find the SD of that distribution. Under the homogeneity of variance assumption, we assume the variance is the same in both groups, so s^2 is a pooled variance estimate. Note: df1+df2 is the total degrees of freedom and is used to find the critical t.
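The pooled variance and standard error of the difference described above, sketched with made-up groups:

```python
# Pooled variance and SE of the difference for an independent samples t test.
from statistics import variance

group1 = [10, 12, 9, 13, 11]  # hypothetical independent samples
group2 = [14, 16, 13, 17, 15]
n1, n2 = len(group1), len(group2)
df1, df2 = n1 - 1, n2 - 1

# pooled variance: each group's variance weighted by its df
s2_pooled = (variance(group1) * df1 + variance(group2) * df2) / (df1 + df2)
# standard error of the difference between the two means
se_diff = (s2_pooled / n1 + s2_pooled / n2) ** 0.5
print(s2_pooled, se_diff)  # total df for t crit is df1 + df2
```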
Between subjects effects
this looks at the variability between subjects and treats each participant as a different level. We are typically not interested in this. If significant tells us the participants are different but may not have anything to do with IV. What we are interested in is if IV had effect on participants regardless of natural differences between participants.
Median
The value in the middle. If there is an even number of values, average the two middle values. Less vulnerable to extreme values than the mean.
Mode
value that appears most frequently. Often used for nominal data.
Covariance
Very similar to correlation: it measures how much two variables tend to vary in the same or opposite direction. Basically covariance is the unstandardized version of correlation (its units are uninterpretable). The magnitude of covariance is very difficult to interpret, but its sign is the same as the correlation's. The formula for covariance is the same as variance, except the deviation of Y replaces the second X deviation. Correlation just rescales covariance between −1 and +1, since covariance has no bounds.
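A sketch of the covariance formula and its rescaling into correlation (hypothetical paired data):

```python
# Covariance: like variance, but Y's deviation replaces the second X deviation.
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]  # made-up paired measurements
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = mean(x), mean(y)

cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
# correlation rescales covariance into the range -1 to +1
r = cov / (stdev(x) * stdev(y))
print(cov, r)
```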
Compound symmetry
when homogeneity of variance (variance of observations equal at each level of treatment factor) and homogeneity of covariance are met. This is a more restrictive assumption than sphericity. However if compound symmetry holds, so does sphericity.
Are assumptions for two way ANOVA the same as one way ANOVA?
yes
Standard error equation
σ/√N