211 Test 3
T-tests reported in research articles
-Help us find which differences were actually significant.
-t (___): the blank is the total degrees of freedom.
-Results without asterisks were not significant; significance levels are denoted with asterisks or crosses.
Effect size for t-test of dependent means
-Similar effect size conventions. d = (Population 1 mean - Population 2 mean) / standard deviation
Planned contrast review continued b. COMPARING THE CRIMINAL RECORD GROUP (M=8) TO CLEAN RECORD GROUP (M=4)
-Within-groups variance (F denominator) stays the same: S^2Within = 5.33
FINDING THE BETWEEN-GROUPS VARIANCE
1: Variance of the DOM: S^2M = Σ(M - GM)^2 / df between = (4 + 4) / 1 = 8
2: Variance of the pop of individual scores (F numerator): S^2Between = S^2M (n) = 8 (5) = 40
3: F-ratio: F = S^2Between / S^2Within = 40 / 5.33 = 7.50
4: F cutoff: sig. level = 0.05, df b-t = 1, df w-in = 12
5: Reject/retain? F cutoff = 4.75, F score = 7.50. REJECT: Being in the Criminal Record condition makes a person rate guilt differently from being in the Clean Record condition.
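A minimal sketch of the planned-contrast arithmetic above (Criminal Record M = 8 vs. Clean Record M = 4, n = 5 per group, S^2Within = 5.33 carried over from the full ANOVA); the numbers come from the notes and the variable names are mine, so treat this as an illustration rather than a general routine.

```python
# Planned contrast: reuse the overall within-groups variance as the denominator
# and build the between-groups estimate from just the two contrasted means.
means = {"criminal_record": 8.0, "clean_record": 4.0}
n_per_group = 5
s2_within = 5.33                          # denominator reused from the overall ANOVA
df_between = len(means) - 1               # 2 means in the contrast -> 1

grand_mean = sum(means.values()) / len(means)                            # 6.0
s2_m = sum((m - grand_mean) ** 2 for m in means.values()) / df_between   # (4 + 4) / 1 = 8
s2_between = s2_m * n_per_group                                          # 8 * 5 = 40
f_ratio = s2_between / s2_within                                         # 40 / 5.33

print(round(f_ratio, 2))   # about 7.5 -> exceeds the 0.05 cutoff of 4.75, so reject the null
```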
ANOVA practice •How does an analysis of variance (ANOVA) differ from a t test for independent means? •What does it mean if a null hypothesis is true in an ANOVA? •What is the denominator in an F ratio for a planned comparison? •What do the values in "F (4, 95) = 9.32, p < .05" represent? •(4, 95) = •How many groups were there in this study? •How many participants were in this study? •9.32 = p < .05 =
1. Both look at differences between groups. ANOVA can be used to compare 3 or more groups; the t-test for independent means can only compare 2 groups.
2. There is less variance/less difference among the means of the samples than if the null were not true.
3. The overall within-groups pop variance, regardless of the 2 means being compared. B/t groups pop variance: the groups being compared. W/in groups pop variance: overall within-groups population variance.
4. What you might see in a research article...
-> 4 = 5 different groups studied, df between [NGroups - 1]
-> 95 = df within [df1 + df2 ... dflast]; with 5 groups that means 100 total participants studied
-> 9.32 = calculated F-score
-> p < 0.05 = alpha at which the F-score is significant
-> F = the F-ratio calculated
T-tests for independent means review •What is the comparison distribution for a t test of independent means? •What does it mean if you conducted a t test for independent means and found the t score equaled 0? •What is the most common way for a t test of independent means to be reported in a research article?
1. Distribution of differences between means: We have two samples with two separate means, and from those two samples we can create a distribution of differences between means; that is our comparison distribution when conducting these t-tests.
2. Specifically, what would that mean about your two samples and their means? The two sample means are equal. There is no difference between the two sample means, so the t-score comes out to be 0. Distribution of diffs. b/t means: a standardized score that tells us about the diff. b/t these 2 samples.
3. t (60) = 4.7, p < 0.05
60 = df total (df1 + df2)
t = we report that a t-test was conducted
4.7 = the calculated t-score. A t with a total df of 60 gives us a t-score of 4.7.
p < 0.05 = that t-score is significant. We set our significance level at 0.05 and it came out significant. The t-score fell in the critical region, so we rejected the null hypothesis. The probability of getting that result if the null were true would be less than 5%, so that is really unlikely.
ANOVA practice •In an analysis of variance with a within-groups variance estimate of 5.3 and a between-groups variance estimate of 8.5, what is the F ratio? •If the estimated population variances for 3 groups are 27, 49, and 36, what is the within-groups estimate of the population variance?
1. F = S^2B / S^2W = 8.5 / 5.3 = 1.60
2. S^2W = (S^2 1 + S^2 2 + ... + S^2 Last) / NGroups = (27 + 49 + 36) / 3 = 37.33. Just an average of these 3 groups' variance estimates (27, 49, 36).
Study Guide Video:
T-TEST FOR DEPENDENT MEANS: THE POP. MEAN IS ALWAYS 0 BECAUSE IN EVERY T-TEST YOU COMPARE YOUR SAMPLE MEAN TO THE COMPARISON DISTRIBUTION (THE SITUATION WHERE THE NULL IS TRUE). THE NULL HERE IS THAT THERE ARE NO DIFFERENCES BETWEEN THE TWO SETS OF SCORES, THAT IS, THE MEAN OF THE DISTRIBUTION OF DIFFERENCE SCORES IS 0. THAT IS THE SITUATION WHERE THE NULL IS TRUE!
1. In a t-test for independent means, what does it mean when we reject the null?
2. What does it mean if you conducted a t-test for independent means and found the t-score equaled 0?
3. What do we compare using a t-test for dependent means?
4. What is the mean of a distribution of differences?
5. What happens if we run several t-tests using the same samples?
6. What does it mean if the null is true in an ANOVA?
7. What does the numerator in an F ratio for a planned comparison depend on?
8. If we were going to reject the null based on a statistically significant F ratio, what would that F ratio generally be?
9. What procedure would we use to keep the alpha at .05 for planned contrasts?
10. If we know an ANOVA (F) with three groups is significant, what is the most accurate conclusion?
11. F (3, 56) = 8.23, p < .05
12. What is an interaction effect?
13. What does it mean to dichotomize a variable?
1. The mean of one sample is so far from the mean of the other that the samples must come from populations with different means.
2. The two sample means must be equal.
3. A sample in which participants each have two scores, e.g. pre- and post-test scores.
4. Always 0.
5. The chance of one of the t-tests being significant will be greater than 5% (assuming we set the alpha at 0.05).
6. There is less variance among the means of the samples than if the null was not true.
7. The numerator would depend on which pair of means is being compared.
8. Much larger than 1.
9. Bonferroni
10. The three groups do not come from populations with the same mean. However, it is not clear which means are different from one another; we would have to perform planned or post-hoc comparisons.
11. # groups = 4 | # participants = df within + # of groups = 60. Calculated F-score = 8.23. Alpha set at .05.
12. When the influence of one variable that divides the groups changes according to the level of the other variable that divides the groups.
13. Dividing the variable into two groups, one that is high on the variable and one that is low on the variable.
Figuring a two-way ANOVA (degrees of freedom, effect size)
DFRows = NRows - 1
DFColumns = NColumns - 1
DFInteraction = NCells - DFRows - DFColumns - 1
EXAMPLE: 3 rows by 3 columns (3x3): 9 cells - 2 for DFRows - 2 for DFColumns - 1 = 4
DFWithin = df1 + df2 + ... + dflast (all of your individual groups/cells!)
DFTotal = N - 1 (total N = number of participants)
EFFECT SIZE: see the formula on the slide image.
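A quick sketch of the degrees-of-freedom bookkeeping above for the 3x3 example; the cell size of 10 is an assumption of mine purely for illustration, and the formulas are the ones listed in the notes.

```python
# Two-way ANOVA degrees of freedom for a 3x3 design.
n_rows, n_cols = 3, 3
n_per_cell = 10          # assumed cell size, just to make the example concrete

n_cells = n_rows * n_cols                              # 9
df_rows = n_rows - 1                                   # 2
df_cols = n_cols - 1                                   # 2
df_interaction = n_cells - df_rows - df_cols - 1       # 9 - 2 - 2 - 1 = 4
df_within = n_cells * (n_per_cell - 1)                 # sum of (n - 1) over every cell
df_total = n_cells * n_per_cell - 1                    # N - 1

print(df_rows, df_cols, df_interaction, df_within, df_total)   # 2 2 4 81 89
```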
Calculating an ANOVA: F-ratio and the f-table
F = S^2Between / S^2Within
F-table: significance value (0.01, 0.05, 0.10), between-groups degrees of freedom (df b/t = NGroups - 1), and within-groups df (df within = df1 + df2 + ... + dflast)
EXAMPLE: 3 samples of 5 each: (N-1) + (N-1) + (N-1) = 4 + 4 + 4 = 12
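If software is available, the F cutoff can be looked up instead of read from a printed F-table. A hedged sketch, assuming SciPy is installed; it reproduces the cutoff used later in the notes for df between = 2 and df within = 12.

```python
# Look up the F cutoff for a chosen significance level and the two df values.
from scipy import stats

alpha = 0.05
df_between = 3 - 1                 # 3 groups
df_within = (5 - 1) * 3            # 3 samples of 5 each -> 4 + 4 + 4 = 12

f_cutoff = stats.f.ppf(1 - alpha, df_between, df_within)
print(round(f_cutoff, 2))          # about 3.89, matching the table value used later
```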
Variance of the DOM and the SD of the DOM
Important distinction: -When estimating pop variance, divide the sum of squared deviations by df (N-1) -When figuring variance of distribution of means, divide estimated pop variance by full sample size (N)
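A small sketch of that distinction; the scores below are made up for illustration only.

```python
# Estimated population variance uses df (N - 1); variance of the distribution
# of means divides that estimate by the full sample size (N).
scores = [3, 5, 4, 6, 2, 4, 5, 3]
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations

s2 = ss / (n - 1)        # estimated population variance (divide by df)
s2_m = s2 / n            # variance of the distribution of means (divide by full N)
print(round(s2, 2), round(s2_m, 2))
```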
A school psychologist believes the students in 6th grade experience higher rates of depressive symptoms than students in other grade levels. The school psychologist uses a depression inventory—on which a score of 0 indicates no depression and a score of 10 indicates severe depression—to see if the level of depression in sixth graders in a class of 20 students differs from that of students in other grade levels at the school. The results of previous testing indicate that students in general at the school usually have a score of 5 on the scale, but the variation is unknown. The current sample of 20 sixth graders has a mean depression score of 4.4. -What kind of test will be used? -If the unbiased estimate of population variance is 15, what is the variance of the DOM? -If the null hypothesis is that the level of depression of 6th graders does not differ from other students in general, and the psychologist calculates a t score of -0.20, what would be the decision related to the null hypothesis? -If the psychologist figures S = .85, what is the effect size (d)?
M = 4.4; compare this mean score to the comparison distribution mean score of 5.
1. T-test for a single sample.
2. S^2M = estimated population variance / sample size = 15 / 20 = 0.75
3. Set the cutoff score based on our df (N - 1, df = 19). From the t-table at df = 19, no matter whether it is a 1- or 2-tail test or what significance level we use, t = -0.20 is very clearly less extreme than the cutoff t-score. FAIL TO REJECT/RETAIN THE NULL. Results inconclusive.
4. d = (Sample mean - Population mean) / S = (4.4 - 5) / 0.85 = -0.71, a medium/large effect size in magnitude.
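A hedged sketch of answers 2-4, plugging in only the values given in the problem (S^2 = 15, N = 20, the stated t of -0.20, mu = 5, M = 4.4, S = 0.85); the cutoff mentioned in the comment is the standard two-tailed .05 value at df = 19.

```python
mu, m, n = 5.0, 4.4, 20

s2 = 15.0                          # unbiased estimate of the population variance
s2_m = s2 / n                      # variance of the distribution of means = 0.75

t_given = -0.20                    # t score stated in the problem
df = n - 1                         # 19, used to look up the cutoff in the t table
# Against any conventional cutoff (e.g. +/-2.093 for a two-tailed .05 test at df = 19),
# -0.20 is far less extreme, so the null is retained.

s = 0.85
d = (m - mu) / s                   # (4.4 - 5) / 0.85 = -0.71, medium/large in magnitude
print(s2_m, df, round(d, 2))
```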
F-tests in research articles continued
Might also see it reported like this:
-Shows all the given scales to the side.
-Your three groups (in this case three attachment styles).
-Then your F with the degrees of freedom between and degrees of freedom within. Based on this table, you know there were 3 groups and about 574 participants.
-Asterisks indicate significant results in the F column.
F-tests in research articles
Might see it reported given...
The source of variation | SS | df | MS | F
Between groups -> SS divided by df to equal MS
Within groups -> SS divided by df to equal MS
Between MS / Within MS = F
Calculating an ANOVA: Between groups population variance
RECALL FINAL FORMULA: F = S^2Between / S^2Within OR MSBetween / MSWithin
Estimating the pop variance from the differences between group means.
STEP 1: Estimate the variance of the distribution of means: S^2M = Σ(M - GM)^2 / df between
Sum of squares with a twist: the mean of each sample is subtracted from the GRAND MEAN (mean of all samples together), those difference scores are squared and then added together to get your S^2M numerator.
df b/t = NGroups - 1. Example: 3 groups, df b/t = 2
STEP 2: Estimate the pop variance of individual scores (numerator of the F-ratio): S^2Between = S^2M (n)
n = number of scores in each sample
S^2M = variance of the dist. of means
Calculating an ANOVA: Within groups population variance
RECALL FINAL FORMULA: F = S^2Between / S^2Within OR MSBetween / MSWithin
Estimating the pop variance from the variation of scores within each group (denominator):
S^2Within = (S^2 1 + S^2 2 + S^2 3 + ... + S^2 Last) / NGroups
Effect size for ANOVA
R^2 (ETA SQUARED)
SMALL = 0.01 | MEDIUM = 0.06 | LARGE = 0.14
Proportion of variance accounted for, R^2:
R^2 = (S^2Between x df between) / [(S^2Between x df between) + (S^2Within x df within)]
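A short sketch of the R^2 formula above, reusing the between/within estimates from the earlier planned-contrast example (40 and 5.33, df between = 1, df within = 12) purely for illustration.

```python
# Eta squared from the variance estimates and their degrees of freedom.
s2_between, df_between = 40.0, 1
s2_within, df_within = 5.33, 12

numerator = s2_between * df_between
r_squared = numerator / (numerator + s2_within * df_within)
print(round(r_squared, 2))   # about 0.38 -> a large effect by the conventions above
```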
Estimating the population variance for the distribution of differences between means
STEP 2: Figuring the variance of each of the two distributions of means
STEP 3: Figuring the variance of the distribution of differences between means
STEP 4: Figuring the SD of the distribution of differences between means
SHAPE
STEP 5: T-score for the difference between the two actual means
STEP 6: Reject/retain null?
STEP 2:
Experimental group: S^2Pooled / N1 = 75 / 11 = 6.82
Control group: S^2Pooled / N2 = 75 / 31 = 2.42
STEP 3: S^2Difference = S^2M1 + S^2M2 = 6.82 + 2.42 = 9.24
STEP 4: SDifference = square root of S^2Difference = square root of 9.24 = 3.04
SHAPE: t-distribution based on the df total; df total = df1 + df2 = 10 + 30 = 40
One-tail test [INCREASE], 0.05 significance. CUTOFF: +1.684
STEP 5: t = (M1 - M2) / SDifference = (mean of experimental group - mean of control group) / SD of the distribution of differences between means = (198 - 190) / 3.04 = 2.63
STEP 6: REJECT OR RETAIN NULL? Reject the null; 2.63 is more extreme than our cutoff score. Support for the research hypothesis.
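A minimal sketch of Steps 2-6 above, starting from the pooled variance of 75 figured in Step 1 (experimental group N = 11, M = 198; control group N = 31, M = 190); all numbers are the ones in the notes.

```python
import math

s2_pooled = 75.0
n1, m1 = 11, 198.0       # experimental group
n2, m2 = 31, 190.0       # control group

s2_m1 = s2_pooled / n1                    # about 6.82
s2_m2 = s2_pooled / n2                    # about 2.42
s2_difference = s2_m1 + s2_m2             # about 9.24
s_difference = math.sqrt(s2_difference)   # about 3.04

t = (m1 - m2) / s_difference              # 8 / 3.04 = about 2.63
df_total = (n1 - 1) + (n2 - 1)            # 40 -> one-tailed .05 cutoff of about 1.684
print(round(t, 2), df_total)              # 2.63 is more extreme than 1.684, so reject the null
```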
Bonferroni Procedure
•Multiple-comparison procedure in which the total alpha percentage is divided among the set of comparisons so that each is tested at a more stringent significance level
•With multiple contrasts, you will have more than a .05 chance of getting a significant result if the null hypothesis is true
Steps...
1. Divide the overall significance level by the number of planned contrasts
2. Use the resulting significance level for each comparison of a pair of means
•5 planned contrasts = .05/5 = .01 -> each contrast is actually evaluated at the 0.01 significance level.
•3 planned contrasts = .05/3 = .017
•4 planned contrasts = .01/4 = .0025
EXAMPLE: Significance level 0.05. The Bonferroni procedure spreads that 5% out among the set of comparisons, so each is tested at a more stringent significance level. This controls for the "too many t-tests" problem we discussed for independent-samples t-tests: with multiple comparisons, the chance that at least one test (here, one contrast) comes out significant by chance alone rises above 5%. If you are going to do multiple comparisons, you have to control for that ahead of time.
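A tiny sketch of the Bonferroni step above; the function name is mine, and the printed values match the worked examples in the notes.

```python
def bonferroni_alpha(overall_alpha, n_contrasts):
    """Per-comparison significance level under the Bonferroni procedure."""
    return overall_alpha / n_contrasts

print(round(bonferroni_alpha(0.05, 5), 4))   # 0.01
print(round(bonferroni_alpha(0.05, 3), 3))   # 0.017
print(round(bonferroni_alpha(0.01, 4), 4))   # 0.0025
```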
Accuracy in estimating S^2 and summary of formulas
-Accuracy is lost when estimating the pop variance
-That loss is adjusted for by making the cutoff sample score for significance more extreme (the t cutoff at a given significance level, e.g. 0.05, is more extreme than the corresponding z cutoff)
-An exact distribution (the t-distribution) takes this loss of accuracy into account.
Photo:
-The variance of any of these (SD -> S) can be found by simply squaring the value.
-Subscript of M = DOM
Estimating the population variance for the distribution of differences between means STEP 1: Pooled variance
-Assume each population has the same variance as its sample (estimate the pop variance from each sample)
-Pooled estimate of the population variance (working with the distribution of diffs. b/t means)
-Experimental group (1): N = 11, M = 198, estimated pop variance = 60
-Control group (2): N = 31, M = 190, estimated pop variance = 80
df total = 10 + 30 = 40
S^2Pooled = (10/40)(60) + (30/40)(80) = 75
T-test for single sample review -In what sense is a sample's variance a biased estimate of the population the sample is taken from? Why can't we just say the sample's variance is the same as the pop variance? -What is the difference b/t the typical variance formula and the formula for estimating a pop's variance from scores in a sample (AKA unbiased estimate of the pop variance)?
-The sample's variance will in general be slightly smaller than the variance of the population.
-The typical variance formula divides by the number of participants in the sample (N); the unbiased estimate of the pop variance divides by N minus 1.
Sources of variation
Sources of variation in within-groups and between-groups variance estimates
WITHIN-GROUPS
1. Variation among scores in each of the populations = variation due to chance factors
Example: One stats class compared to another. Everyone in Class A studied more, or something like that - things we do not know when looking at the two sample means. Classes may have been at different times of the day.
BETWEEN-GROUPS
1. Variation among scores in each of the pops = variation due to chance factors [noise, but the thing we are actually looking for is variation due to a treatment effect]
AND
2. Variation among the means of the pops = variation due to a treatment effect
The difference between groups reflects variation due to chance factors, BUT it also includes variation due to a treatment effect.
EXAMPLE: Treatment effect = teaching method, what we are actually trying to compare the groups on.
PHOTO:
Section 1 has no within-group variance.
Section 2 has quite a bit of within-group variance.
Section 3 has some within-group variance, less than Section 2 but not 0.
Within-group variance will ALWAYS occur whether or not there is any between-groups population variance!
The between-groups variance (across the Ms of all 3 groups) is actually 0 here, but we do see variation among exam scores within Sections 2 and 3. NO TREATMENT EFFECT; the means of the samples are all the same.
Estimating the population variance
-A sample's variance cannot be used directly as an estimate of the pop variance.
-It can be shown mathematically that a sample's variance will, on average, be smaller than its pop's variance.
-Ordinarily, variance is figured as the sum of the squared deviations from the mean divided by the number of participants in the sample: SD^2 = SS/N. This gives a biased estimate of the pop variance because the sample variance is, on average, a little smaller than the pop variance.
-An unbiased estimate of the pop variance (S^2) is obtained by modifying the formula.
-The numerator stays the same: each score's deviation from the mean is squared, and the squared deviations are added together.
-We just modify the denominator. No longer N, but N - 1. That makes the denominator slightly smaller, which makes our variance estimate slightly bigger.
ANOVA assumptions
-All populations normally distributed
-Robust to moderate violations, especially if N is not small (get a big N so the assumption of normality can be violated somewhat and still give you accurate results)
-A problem if populations are skewed in different directions
-All populations have the same variance
-Robust to moderate violation
-A problem when the largest variance estimate of any group is 4 or 5x that of the smallest
EXAMPLE: Group A: small variance. Group B: very, very large variance, 4 or 5x the smallest group's (A's) variance.
Cutoff t-scores and t-test for a single sample
-Cutoff scores for t-distributions work like cutoff z-scores: a similar process of finding the cutoff score and then finding our obtained score.
-Most t-tables list the significance level (usually 0.05)
-Find the degrees of freedom. Example: If you had 16 participants, your df would be 15 (N - 1). If the significance level was 0.05 and you were doing a 1-tail test, then your cutoff score would be 1.753.
SINGLE SAMPLE T-TEST: Used to compare a sample mean to a known population mean when the variance of the pop is unknown.
-Similar to z-tests. t = (sample mean - population mean) divided by the standard error of the distribution of means
Controversy: Dichotomizing numeric values and its solution
-Dividing the scores for a variable into 2 groups, AKA a median split
Example: Age and level of education. Age is a continuous variable, but we divided it into 2 groups of younger and older.
Example: Aggression scores in children that range from 1-30. 1-15 = low aggression, 16-30 = high aggression.
-Divide kids into groups: any kid who scores 15 or below is in the low-aggression group, whereas those who scored 16-30 are placed in the high-aggression group.
-Advantages: you can do a factorial ANOVA, efficiency, examining interaction effects.
-Disadvantages: lose information when reducing the range to a few categories, less accurate, misleading grouping of scores.
Example: A range of 1-30 reduced to two categories can be a little less accurate and a misleading grouping of scores because, in this example, what is the difference between someone who scores a 15 and a 16? Are they really that different? Yet we're putting them in low aggression and high aggression; that may not be the best way to do it.
Solution? Multiple regression. No need to dichotomize.
-T-test for independent means assumptions -Effect size for the t-test for independent means
-Each of the population distributions follows a normal curve
-The two populations have the same variance
-Estimated effect size after a completed study: Estimated d = (M1 - M2) / SPooled = (Mean of Group 1 - Mean of Group 2) / pooled standard deviation
Within-groups estimate of population variance
-Estimating the pop variance from the variation within each sample
-Based on the variation among the scores in each of the samples.
-Not affected by whether the null is true. Will occur whether the null is true or the research hyp. is supported. The typical random variation that happens regardless of what we are looking at between groups.
-Null: Has to do with whether the means of the populations differ - separate populations and wanting to know how they differ, not the differences happening within any one population.
-The within-groups estimate is not affected by the null being true because the variation within each population (which is the basis for the sample variation) is not affected by whether the population means differ.
-Whether or not the population means differ from one another, there is still going to be some naturally occurring variation within those populations.
EXAMPLE: Two different stats classes compared according to teaching method, looking at exam scores in those classes. There may be some difference b/t Class A and B in terms of their mean exam scores, and that might be related to things we are interested in, but also within Class A, not everyone is going to score exactly the same. That mean is determined with all of the variation that happens in that class. Same with Class B: B will have some variation within its own group too. That will happen regardless of whether Classes A and B differ from one another.
Between-groups estimate of population variance
-Estimating the population variance from the variation between the means of the samples; actually comparing the samples/pops to one another.
-Based on the variation among the means of the samples. The scores within each population will have naturally occurring variance, but the difference among the means of the samples is what we are typically interested in.
-The b/t groups population variance estimate IS affected by whether the null is true. If the null is true, the b/t groups pop variance due to a treatment effect is 0; the means are close/similar enough to be considered the same. If the research hyp. is supported, there is enough variation b/t the samples that our RH has some support.
NULL IS TRUE: Population means do not differ; smaller between-groups variance estimate.
NULL IS FALSE: Means of the pops differ. The b/t groups estimate is bigger b/c variation among the means of the pops is greater when the pop means differ.
Post-Hoc comparisons
-Exploratory approach; NOT PLANNED IN ADVANCE. You notice after some exploration that there is some significance happening. You should try to do planned contrasts if possible; planning helps control alpha in advance.
-AKA pairwise comparisons, because you are comparing all possible pairings of means (rather than planning out which two you wish to compare ahead of time).
-If using Bonferroni for many comparisons, the power for any one comparison becomes very low due to the ever-decreasing significance level (rejection region). There could be an effect happening, but because you are making the rejection region smaller and smaller, the likelihood of detecting that effect decreases.
PROCEDURES TO KEEP THE OVERALL TYPE 1 ERROR AROUND 0.05 WHILE NOT TOO DRASTICALLY REDUCING STATISTICAL POWER (depending on which type of data you have):
-Scheffé
-Tukey, Tukey-B
-Newman-Keuls
-Duncan, Duncan's D
Basic Logic of Factorial designs and interaction effects (Factorial ANOVAs): Factorial research design and interaction effects. NEW INTERACTION EFFECT DEFINITION. -In regular ANOVAs we looked at the effect of 1 variable across multiple groups.
-Factorial research design:
-Effect of two or more variables examined at the same time
-Efficient research design when trying to understand potentially a lot of interrelated issues. We look for main effects (like we do in a regular ANOVA), but we also look for interaction effects.
INTERACTION EFFECT: WHEN THE INFLUENCE OF ONE VARIABLE THAT DIVIDES THE GROUPS CHANGES ACCORDING TO THE LEVEL OF THE OTHER VARIABLE THAT DIVIDES THE GROUPS.
-Interaction effects:
-Occur when a combination of variables has a special effect (an impact on our dependent variable coming from a combination of variables) that we would not see from the main effects alone.
-The effect could not be predicted from the effects of the two variables individually.
Difference scores
-For each person, subtract one score from the other (assuming we have repeated measures) -Carry out hyp. testing with the difference scores. -Population of difference scores with a mean of 0 (The null is saying that all of these difference scores, the mean of them would be 0 because there would not be any difference.) -Population 2 has mean of 0
Viewing main and interaction effects in graphs continued
-Graph A: Interaction (pattern very different) -Graph B: Interaction -Graph C: No interaction, though the numbers are different the pattern stays the same (jumping up by $40k). -Graph D: Clear main effect between young and old group, but the pattern of staying the same across education levels is the same. So no interaction effect. -Graph E: Interaction not as pronounced. -Graph F: Interaction not as pronounced.
Number of participants per group needed for the amount of power you wish to have, given your expected effect size: Finding power for ANOVAs table
-Helpful table for planning a study out in advance. You do not want to spend a lot of time/effort/money conducting a study if it is not going to have very good power (is not going to be able to detect what you are looking for).
-Example: Expecting a large effect size (R^2 = 0.14) with 3 groups of 40 participants each, you can expect really high power (0.98).
The distribution of the differences between means
-If the null hypothesis is true, the two pops have equal means
-If the null hypothesis is true, the two dists. of means have equal means
-If the null hypothesis is true, the mean of the dist. of diffs. b/t means equals 0. No difference between the 2 sample means if the null is true.
OUR END GAME: t = (Mean 1 - Mean 2) / SDifference = (mean of one sample - mean of the other sample) / standard deviation of the distribution of differences between means (what we work backwards to find)
PROCESS OF FINDING SDIFFERENCE:
1. Pooled variance of the 2 distributions: S^2Pooled = (df1 / df total)(S^2 1) + (df2 / df total)(S^2 2)
2. Variance of the dists. of means: S^2M1 = S^2Pooled / N1, S^2M2 = S^2Pooled / N2 [N = sample size of each group]
3. Variance of the dist. of the diffs. between means: S^2Difference = S^2M1 + S^2M2
4. SD of the dist. of the diffs. between means: SDifference = square root of S^2Difference
5. T-value: t = (M1 - M2) / SDifference
Estimated population variance (S^2)
-In order to compare a sample mean to a population with a known mean but an unknown variance, the variance of the pop must be estimated.
-Usually, the only info available about a pop is a sample from the pop.
-Therefore, the assumption that Pops 1 and 2 have the same variance is necessary.
-The variance of the sample should provide info about the pop (think about how representative the sample is of that population).
-If the sample variance is small, the pop variance is probably small. Likewise, if a sample variance is large, the pop variance is probably large.
Image:
-Variation in the population from which the sample is taken.
-Variation in sample scores.
-Example A: Normal curves that match onto each other. B: If the variance in the sample is small, then the population variance is assumed to be small too. C: A sample with a much larger spread/variance; assume that the pop the sample came from has a larger variance too.
T-tests for independent means
-No known population variance. Scores are considered independent because they are obtained from different participants (no pre-test/post-test situations).
Example: Testing a new medication - the control group might receive a placebo whereas our experimental group receives the medication. Compare what happens between those 2 groups.
WAYS TO USE: Comparing two samples, like an experimental group and a control group. Scores from the groups are independent because they are obtained from different participants.
LOGIC (PHOTO):
-Different curves are considered when doing a t-test for independent means.
-Populations: Can be summarized by distributions of means. If you have a lot of samples of the same size, you can create a distribution of those samples' means.
-Distributions of means: What we add here is the dist. of the diffs. b/t means - the dist. that describes the differences we would see between sample means drawn from the distributions of means that come from the populations. All representative of the samples we have.
-Samples: Any sample we ever draw will always come from an overall population.
Assumptions of the t-test
-Normal population distribution
-T-tests are robust to moderate violations of this assumption
Example: The pop distribution might not be completely normal, but t-tests can still give reasonably accurate results as long as the departure from normality is not extreme.
Basic logic of ANOVA
-Null: Several populations have the same mean (no diff.) -Research hyp.: Do the means of the samples differ more than expected if the null were true? There is more diff. here than we would expect if they're actually all the same. -Analyze variances - ANOVA When you want to know how several means differ, you are actually asking about the variation among those means. We do this two different ways. -Two diff. ways of estimating population variance: -Within groups (Just within people who are pet owners for example: What is the diff. within that group related to other factors not necessarily being looked at right now? Typical variation that occurs naturally, people are different from one another naturally.) -Between groups (Looking at the difference between pet owners and non-pet owners in terms of these things. Looking at the difference that occurs b/t these groups based on a shared/diff. commonality.)
Degrees of freedom
-Number of scores that are "free to vary."
-There are N - 1 df because when figuring the deviations, each score is subtracted from the mean.
-Thus, if all the deviation scores but one are known, the last one can have only one value because it is dictated by the mean. Therefore, df = N - 1.
THEREFORE, the formula for S^2 using df can be written S^2 = SS / df.
Approximate power for studies using t-test for dependent means table
-Oftentimes we have power tables or internet calculators that help us understand power.
-Greater sample size, greater power. The same pattern follows for 2-tail tests, but 2-tail tests are not as powerful as 1-tail tests because the rejection region has to be split in half.
-Also, as the effect size gets larger, given the same number of participants, the power also gets larger.
-The table shows what we could expect the power to be given each of the different effect sizes and the number of participants in that test.
-Example: 10 participants, small effect size, 1-tail test: power is only 0.15, but as you increase your number of participants/sample size to 100, power jumps to 0.63 even with the same predicted effect size.
Viewing main and interaction effects in graphs
-Another way we can see interaction and main effects, aside from tables, is through histograms and bar charts.
-INTERACTION: The pattern of bars on one section of the graph is different from the pattern on the other section of the graph.
-Example image: In both situations, the easy-test bar is a little lower than the hard-test bar, but we know there is an interaction because the difference is so much greater for the highly sensitive group (right side of the graphs). Definitely an interaction happening between sensitivity level and the easy or hard test conditions.
F-statistic, f-distribution curve
-Our F-distribution is positively skewed, skewed to the right.
-The F-table often labels its columns with "df numerator" and "df denominator."
YOU NEED THREE VALUES FOR YOUR CUTOFF F VALUE:
1. Significance level (usually 0.05)
2. DF b/t
3. DF w/in
DF b/t can be thought of as your df numerator (it corresponds to the numerator of the F-ratio formula, S^2 b/t / S^2 w/in). Example: 4 groups | df b/t = NGroups - 1 = 3
DF w/in can be thought of as your df denominator.
T-distributions
-T-distributions kind of follow some of the same shape we would expect in a normal distribution. -The more participants, the closer it is to that normal distribution (N of 21 versus N of 3). -There is one t-distribution for each number of df -The greater the number of df, the closer the t-dist. is to the normal curve. -When there is an infinite number of df, the t-dist. is the same as the normal curve.
Two-way ANOVA assumptions -Extensions and special cases of the factorial ANOVA
Repeated-measures ANOVA is to the factorial/ordinary ANOVA as the t-test for dependent means is to the t-test for independent means *DUE TO THE PRE- AND POST-TEST NATURE OF THE DATA!*
-Populations follow a normal curve
-Populations have equal variance
-Assumptions apply to the populations represented in each cell
-Three-way and higher ANOVA designs: increasing to 3+ grouping variables. Same as two-way, just additional main and interaction effects.
-EXAMPLE: Variation in exam scores if students take an exam in the morning, afternoon, or evening and whether they had 1, 2, or 3 cups of coffee (a 3x3 example)
-You would actually have 9 total groups/cells: Morning-1, Afternoon-1, Evening-1, Morning-2, Afternoon-2, Evening-2, Morning-3, Afternoon-3, Evening-3
-Repeated-measures ANOVA: analysis of variance for a repeated-measures design in which each person is tested more than once, so the levels of the grouping variable(s) are different times or types of testing for the same persons
-Repeated-measures ANOVA is to a factorial ANOVA as a t-test for dependent means is to a t-test for independent means.
-SAME PARTICIPANTS with multiple scores, whereas in a factorial ANOVA and a t-test for independent means we do not have that.
Example: Comparing three different intelligence tests across the same participants. Each participant takes each type of test, giving three results per participant, just on different tests.
Controversies and limitations for the t-test for dependent means
-Repeated measures design: We like this design, but you should always have a control group - another group that, for example, takes the questionnaire without later having the mindfulness procedure, and takes the post-test too.
-For comparison. Sometimes this can be difficult and you can run into ethical considerations, especially if you are looking at a treatment trial for a new drug: how are people put in the control group versus those who get the treatment? This is where good research design comes into play and can hopefully address those things.
-These designs have high power because the standard deviation of difference scores is usually low.
-But different events happening b/t the two tests could influence the numbers. Example: Looking at anxiety scores both before and after mindfulness training. Someone may have come in the first day and taken their questionnaire when a lot of stressful things were happening, like a student with 3 exams coming up. By the time they went through the program and took their post-questionnaire, maybe they did not have 3 exams coming up. So different things could be happening in the background.
-Participants could also drop out if they do not feel they are seeing benefits. The folks left for the post-questionnaire might be the people benefiting a lot, so you could be losing the data that would show the manipulation is not as strong as it looks.
-The initial test could also prime participants and create changes on its own. Example: Someone fills out an anxiety inventory, realizes their anxiety is way higher than they thought it was, and does things on their own to change it.
-Weak research design without a control group: you would like to attribute the differences to whatever manipulation or treatment you used, but other things could be happening, as mentioned above, creating background noise or changes. It is also possible that participants could have changed or improved regardless of the manipulation.
T-test for dependent means in research articles
-SE in parentheses in the Mean Score category
-Difference score under the Difference category
-All t-values appear significant at the 0.05 or 0.01 level, as shown in the t column. We see a difference both in the whole sample and when female and male participants rate female/male targets on their own on gender status beliefs.
Approximate number of participants needed for 80% power table
-Similar to our power table is a table of the approximate number of participants needed to get 80% power for a t-test at a sig. level of 0.05. We like to know this info up front.
Example: We pay participants $5 apiece for participating in the study. Say we are writing a grant to try to get that money; then we need to know that our study is going to be powerful enough, so that hopefully they will give us the money to conduct it.
-If we expect a large effect size, we will not be spending as much money because we do not necessarily need as many participants.
-If we are expecting a small effect size (common in psych research), or a medium effect size, we are going to need more participants.
Good info to have about how many participants you would need depending on your expected effect size, whether you are doing a 1- or 2-tail test, a 0.05 significance level, and aiming for about 80% power.
Basic Logic of Factorial designs and interaction effects (Factorial ANOVAs): Relationship between one-way analysis of variance (ANOVA) and two-way/factorial analysis of variance (ANOVA)
-Still analyzing variance - it is a two-way ANOVA because it considers the effects of two variables that divide the groups, i.e., GROUPING VARIABLES.
-A one-way ANOVA uses only one grouping variable.
-Example: The kind of info about the defendant's criminal record. Whether it was Criminal Record, Clean Record, or No Information (NI) - that was our grouping variable. We had only 1 grouping variable and 3 different groups within that 1 grouping variable.
Cell and marginal means example
-Study in which participants rated the originality of paintings under various conditions
-In a factorial ANOVA we are able to look at main effects by comparing marginal means and then further compare the cell means to see the interaction effects.
Cell means:
6.5 - Landscape, contemporary
5.5 - Landscape, Renaissance
3.5 - Portrait, contemporary
2.5 - Portrait, Renaissance
Marginal means:
6 - Landscape group (regardless of contemporary or Renaissance)
3 - Portrait group (regardless of contemporary or Renaissance)
5 - Contemporary group (regardless of portrait or landscape)
4 - Renaissance group (regardless of portrait or landscape)
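A small sketch showing how the marginal means above follow from the cell means (equal cell sizes assumed, as in the example); the dictionary layout and function name are mine.

```python
cell_means = {
    ("landscape", "contemporary"): 6.5,
    ("landscape", "renaissance"): 5.5,
    ("portrait", "contemporary"): 3.5,
    ("portrait", "renaissance"): 2.5,
}

def marginal_mean(level, position):
    """Average the cell means that share one level of a grouping variable."""
    values = [m for cell, m in cell_means.items() if cell[position] == level]
    return sum(values) / len(values)

print(marginal_mean("landscape", 0))     # 6.0
print(marginal_mean("portrait", 0))      # 3.0
print(marginal_mean("contemporary", 1))  # 5.0
print(marginal_mean("renaissance", 1))   # 4.0
```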
Factorial ANOVA table example
-Study looking at how difficult an exam is according to sensitivity levels (high sens. versus low sens.) and what happens in terms of negative mood ratings based on those effects.
-We are looking at ratings of negative mood, and the two variables that we see having an effect on negative mood are test difficulty and sensitivity, but there is an INTERACTION happening between test difficulty and sensitivity that has an effect that could not be explained by these two variables by themselves. What we see is that the effect of test difficulty on negative mood is different according to the level of sensitivity.
-For the students who are not highly sensitive, their level of negative mood is slightly higher with a hard test than with an easy test (Row A). However, for students who are highly sensitive, their level of negative mood is much higher with a hard test compared to an easy test.
-So we see that for students who are highly sensitive there is a bigger difference in mood depending on whether they got an easy or a hard test, compared to folks who were not highly sensitive. Maybe those folks are not as reactive to test difficulty, but something is happening for folks who are more highly sensitive.
Analysis of Variance (ANOVA)
-Testing variation among the means of several groups (more than 2 groups). So far up until t-tests for ind. means, we just had two groups. You could use ANOVA for just 2 groups too, but the t-test is a much simpler process and it gives you the same result. -ANOVA: Hypothesis-testing procedure for studies with three or more groups. -One-way analysis of variance: Examining ways in which the groups vary across one dimension EXAMPLE: Look at someone's score on a measure of jealousy according to 3 diff. attachment styles. Could compare the 3 attachment styles and see if there is a difference in jealousy ratings ALONE. -More than one dimension: Factorial ANOVA EXAMPLE: Looking at jealousy ratings and how that varies across BOTH attachment styles and binary gender or does jealousy vary according to attachment style and whether you have a pet or not?
F-ratios of the two-way ANOVA
-The THREE F-ratios needed in a two-way ANOVA (looking at the impact of TWO variables)...
1. Column main effect: F-ratio for the grouping variable spread across columns. Example: whether someone is high school or college educated -> main effect for education level
2. Row main effect: F-ratio for the grouping variable spread across rows. Example: whether they were older or younger -> main effect for age grouping
3. Interaction effect
-Layout of the ANOVA table for a two-way ANOVA:
Source of variance: BETWEEN-GROUPS (Columns, Rows, Interaction), WITHIN-GROUPS, TOTAL
SS (sums of squares): SSColumns, SSRows, SSInteraction, SSWithin, SSTotal
DF (degrees of freedom): DFColumns, DFRows, DFInteraction, DFWithin, DFTotal
MS (estimated population variances): MS = SS/DF -> MSColumns, MSRows, MSInteraction, MSWithin
F (F-ratios): F = MSb-t / MSw-in -> FColumns, FRows, FInteraction
Recognizing and interpreting interaction effects
-The variable we are looking at the relationship to, or trying to predict, is income.
-We could just look at income based on age or just based on education, but we look at the relationship or interaction b/t age and education and how that influences someone's income.
RESULT A: For the younger group, both HS and college resulted in the same avg. income. For the older group, the difference b/t older people with HS and college education is about 40-60K. INTERACTION EFFECT - across younger people income stays the same regardless of education, but the pattern changes for older people.
RESULT B: INTERACTION EFFECT - the inverse of that pattern occurs: the effect of a college education is different for someone who is young than for someone who is older. A DIFFERENT PATTERN!
RESULT C: Young increase from HS to college by about $40k. Old increase by the same amount. NO INTERACTION EFFECT: no difference in the pattern. We do see two main effects (the older generation tends to make more, and the college educated tend to make more, with older and college educated making the most of the 4 cells).
RESULT D: NO INTERACTION EFFECT. Definitely a main effect for age (older folks make more across the board).
RESULT E: INTERACTION EFFECT - notice that younger folks make $20k more from HS to college, but older folks make $40k more than their HS-educated peers if they went to college, and much, much more than their same-educated but younger peers. A DIFFERENCE IN PATTERNS!
RESULT F: INTERACTION EFFECT - younger folks make $15k more but older folks make $25k more between the HS and college education levels. A DIFFERENCE in patterns, even if small.
T-test for dependent means
-Unknown pop mean and variance
-Two scores for each person (repeated measures design)
Example: You give someone an anxiety scale questionnaire before a treatment and after a treatment, to see if there is any change or difference based on them going through this treatment/mindfulness training.
Same procedure as the t-test for a single sample, except...
-Use difference scores
-Assume the population mean is 0
How we might have dependent means or related samples:
REPEATED MEASURES DESIGN - pre-post design and within-subjects design
MATCHED PAIRS DESIGN - experimental manipulation (1 group gets a treatment while the other gets a placebo) and natural occurrence (if looking for gender differences, match participants of different genders with one another to see if there is a difference between them)
Approximate power for studies using t-test for independent means table -Controversies and limitations of t-tests generally
-We want to be sure we are pretty likely to detect an effect if there is one. We need to adjust our number of participants accordingly to get that power.
-With a greater effect size, we might not need as many participants. In social science and psych research you are often dealing with a small or medium effect size, which is why we try to get as many participants as we can in a study - we need to make sure we can detect an effect if one is there.
-THE PROBLEM OF TOO MANY T-TESTS: Multiple t-tests in the same study skew the odds. The possibility that any one of them turns out significant at the 0.05 level by chance is greater than 0.05 when there are too many t-tests in the same study. Running too many t-tests is like giving someone bonus rolls of the dice: it is more likely they will land on what they are looking for if they have multiple rolls of the die.
-We are looking at whether or not something is likely to happen. If it is less than 5% likely to happen if the null hypothesis is true, we are saying we think there is an effect.
Reviewing the three kinds of t-tests
-What ties all 3 t-tests together is that none has a known population variance. What makes our means dependent in the dependent-means t-test is having 2 scores per participant; because there are 2 scores per person, the t-test is carried out on the difference scores. The formula for df is df = N - 1 across the board, but for a t-test for independent means you must add the df for both groups, where df1 and df2 each equal that group's N - 1.
-FORMULA FOR T: For a single sample, we know the population mean. For dependent means we assume this population mean is 0.
INDEPENDENT MEANS: Means of both compared groups -> t = (M1 - M2) / SDifference, where SDifference is the SD of the distribution of differences between means, rather than the SD of the DOM as in the dependent-means and single-sample t-tests.
Review of z-tests
-Z-tests compare back to known populations, more specifically a known population variance. In real-world research, however, it is rare to know the entire population variance, or we will often compare groups of scores to one another without info on the overall population. This is why we use t-tests.
-T-test examples:
-Two scores for each person in a group of people (pre- and post-test scores, like anxiety ratings before and after some sort of mindfulness treatment).
-Comparing experimental and control groups: 1 group goes through a specialized program or manipulation and the other does not, and we want to see if that program has an impact.
-Matched pairs, e.g. comparing 2 different sections of a class.
Basic Logic of Factorial designs and interaction effects (Factorial ANOVAs): -Main effect -Cell -Cell means -Marginal means
1. Main effect: difference between groups on one grouping variable; the result for a grouping variable, averaging across the levels of the other grouping variable(s).
2. Cell: a particular combination of levels of the variables that divide the groups - each of those 4 quadrants we saw in the table earlier. Example: GV1: sensitivity, GV2: test difficulty -> determining levels of negative mood. Cells: high sens/hard test, low sens/hard test, high sens/easy test, low sens/easy test.
3. Cell mean: the mean of a particular combination of levels of the variables that divide the groups in a factorial ANOVA (the numbers within those 4 quadrants seen in the table).
4. Marginal mean: the mean score for all of the participants at a particular level of one of the grouping variables. Example: marginal mean for the easy test condition | hard test condition | high sensitivity condition | low sensitivity condition.
A counseling psychologist developing a technique to reduce anxiety symptoms has clients complete an anxiety inventory and uses this as a pretest measure of anxiety. One week later, clients attend a mindfulness workshop. Clients complete the anxiety inventory again one week after the workshop, which is recorded as the posttest measure. •What kind of test will be used for this research question? •If the psychologist wants to see if there is a change (increase or decrease) in anxiety by clients who attended this workshop, the appropriate description of "Population 2" (the comparison distribution) difference scores will be...? •If the psychologist wants to see if there is a change (increase or decrease) in anxiety by 15 of the clients who attended this workshop using the .05 significance level, what would be the cutoff t score? •If the psychologist finds the sum of squared deviations from the mean of the difference scores of the sample is 280, what would the estimated population variance be?
1. T-test for dependent means - the difference b/t the pre- and post-test matters, so our results are dependent on what happens between the pre- and post-tests. The post-test does not really mean much if we do not know what the pre-test scores were. In that way they are dependent on one another.
2. ZERO; MuM = 0. Compare our sample t results to the situation in which there was no change (the situation in which the null hypothesis is true).
3. Three pieces of info needed to determine the cutoff score: df, significance level, and 1- or 2-tail test. N = 15, df = 14, 2-tail, 0.05 sig. level: +/-2.145, set on either side of the mean, with cutoff scores in both tails.
4. S^2 = SS / (N - 1) = 280 / 14 = 20
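A minimal sketch of the arithmetic in answers 3 and 4 (N = 15 clients, two-tailed .05 test, SS of the difference scores = 280); the cutoff noted in the comment is the standard table value at df = 14.

```python
n = 15
df = n - 1                         # 14

# From a t table (or scipy.stats.t.ppf(0.975, 14)), the two-tailed .05 cutoff
# at df = 14 is about +/-2.145.

ss_difference_scores = 280
s2 = ss_difference_scores / df     # estimated population variance = 20
print(df, s2)
```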
T-test for dependent means practice: 1. What is the main difference b/t a z-score and a t-score? 2. In what type of test do we use diff. scores? 3. In a t-test for dep. means, the pop mean always equals...? 4. Why do studies using diff. scores tend to have larger effect sizes?
1. We use t-scores when the population variance is unknown. If we know our pop variance, we can use z-scores. For t-tests we have to estimate our pop variance from our sample.
2. T-test for dependent means - "dependent" means the results are somehow dependent on one another. Example: Someone takes an anxiety inventory before a mindfulness program and then takes that same inventory after it. What we're looking at, that second mean, is really just relative to what their first score was. The mean on that second (post) test is relative to the mean score on the first test. In that way they are dependent.
3. 0. Same thing as saying there is no difference b/t the post- and pre-test scores.
4. The standard deviation of difference scores tends to be low, and because the effect size divides the mean difference by that small standard deviation, the pre- and post-test scores do not have to be very different to produce a large effect size.
T-tests for independent means review A statistics professor interested in effective teaching methods compared two classes of the same statistics course using different teaching methodology. Higher scores indicated high effectiveness of teaching strategy. The results were as follows: Class A: N = 39; M = 82; S2 = 13 Class B: N = 43; M = 78; S2 = 10 S^2 = population variance So, Class B has more students but their teaching method was less effective. •If a t test for independent means is conducted, how many degrees of freedom will be used to locate the cutoff score in the t table? •What is the pooled estimate of the population variance? •If the standard deviation of the distribution of the difference between means is 0.75, what is the t score?
1. df = N - 1
df1 = 39 - 1 = 38
df2 = 43 - 1 = 42
df total = 38 + 42 = 80
Information needed to locate the cutoff score: df total, 1- or 2-tail test, significance level.
-Simply asking whether there is a DIFFERENCE b/t the teaching methods -> 2-tail test.
-Significance level = 0.05
CUTOFF: +/-1.990
2. S^2Pooled = (df1 / df total)(S^2 1) + (df2 / df total)(S^2 2) = (38/80)(13) + (42/80)(10) = 11.425
3. t = (M1 - M2) / SDifference, i.e. (MA for Class A - MB for Class B) / SDiff.
From S^2A and S^2B to S^2Pooled, to S^2M1 and S^2M2, to the given SDiff.
t = (82 - 78) / 0.75 = +5.33
Cutoff score: +/-1.990. Sample t-score: +5.33. REJECT THE NULL. There is a significant difference in teaching effectiveness b/t these two methods; 5.33 is more extreme than +/-1.990.
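A minimal sketch of the Class A vs. Class B arithmetic above; all values (Ns, means, variances, the given SDifference of 0.75) come straight from the problem.

```python
n_a, m_a, s2_a = 39, 82.0, 13.0
n_b, m_b, s2_b = 43, 78.0, 10.0

df_a, df_b = n_a - 1, n_b - 1
df_total = df_a + df_b                                            # 80

# Pooled estimate: each class's variance weighted by its share of the total df.
s2_pooled = (df_a / df_total) * s2_a + (df_b / df_total) * s2_b   # 11.425

s_difference = 0.75                 # given in the problem
t = (m_a - m_b) / s_difference      # 4 / 0.75 = 5.33
print(round(s2_pooled, 3), round(t, 2))   # 5.33 beats the +/-1.990 cutoff -> reject the null
```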
Dependent means t-test example
NULL: IS NOT GREATER THAN NO DIFFERENCE (POP. MEAN = 0)
RESEARCH: IS GREATER THAN NO DIFFERENCE
-Examine brain systems of humans in love
-They look at avg. brain activations in the ventral tegmental area (VTA) from fMRI scans of 10 individuals when they look at 2 kinds of photos: one of their partner and one of a familiar but neutral person (no love or hate for this person)
-Compare these diff. scores back to no difference, Population 2 mean = 0 [Differences between VTA activations depending on which picture they look at. Compare these differences in VTA activation (difference scores) back to the notion that there would be no difference b/t those photos.]
NULL: If someone has a certain VTA activation when looking at someone they love, the VTA activation will stay the same when looking at the neutral person. Is there MORE VTA activity when viewing the beloved's photo?
Each individual has 2 fMRI scans: a scan of the VTA when looking at a photo of the beloved and one when looking at the neutral person.
Known Information:
Pop 1: N = 10 - individuals in the study
Pop 2: Mean unknown - individuals whose VTA activity is the same when looking at either photo
1. Null and research hypotheses
Null: The mean difference score (brain activation when viewing the beloved minus activation when viewing the neutral person) is not greater than no difference. Brain activation is not greater when viewing the beloved's photo compared to viewing the neutral person's photo.
Research: The mean difference score is greater than no difference. Brain activation is greater when viewing the beloved's photo compared to viewing the neutral person's photo.
2. Comparison distribution (difference scores)
-Make each person's two scores into a difference score
-Figure the mean of the diff. scores
-Assume the mean of the DOM of diff. scores = 0 (Pop. 2 mean)
-Shape is a t-distribution with df = N - 1
-Figure the SD of the DOM (SM) -> estimated pop variance of diff. scores (S^2 = SS/df) -> variance of the DOM of diff. scores (S^2M = S^2/N) -> SE of the DOM of diff. scores (SM = square root of S^2M)
Difference scores = beloved photo score - control score. Then subtract the mean from each diff. score, square those deviations, add the squared deviation scores together, and divide by df to get S^2. Go from there to S^2M and then SM.
3. Cutoff score
-9 df (10 - 1)
-5% significance
-1-tail test (looking for MORE, a change in a specific direction)
Cutoff t-value = 1.833
4. T-score formula
t = (calculated sample mean - comparison distribution/population mean) / SE of the DOM of difference scores
t = (1.4 - 0) / 0.251 = 5.58
5. Reject null? Is our sample score extreme enough? YES! The research hyp. is supported; there appears to be more VTA activity when viewing a beloved's photo.
One Sample t-Test example -CW reports students study 17 hrs/week. -You think students study more than that, poll 16 students from your dorm. -These students report studying an avg. of 21 hrs/week. -Do students in your dorm study more than college avg.? -Or could you conclude that your results are close enough to the college avg. that the small difference of 4 hours is due to your having picked, purely by chance, 16 of the more studious residents in your dormitory?
Known information:
Population 1: M = 21 | N = 16. Pop. 1 = your dorm students
Population 2: Mu = 17, variance unknown. Pop. 2 = students in general
1. Determine your null and research hypotheses
Null hypothesis: Students in your dorm (Pop. 1) do not study more than students in general (Pop. 2).
Research hypothesis: Students in your dorm (Pop. 1) study more than students in general (Pop. 2).
2. Estimate the population variance from the sample scores
-UNBIASED estimate of the pop variance: S^2 = Σ(X - M)^2 / (N - 1) = SS / (N - 1) = SS / df
Degrees of freedom: number of scores free to vary, df = N - 1
S^2 = 694 / 15 = 46.27
3. Find the variance and standard error of the distribution of means
-We needed the estimated population variance in order to get the variance and SD of the distribution of means.
Variance of the DOM: S^2M = S^2 / N = 46.27 / 16 = 2.89
SE of the DOM: SM = square root of S^2M = square root of 2.89 = 1.7
Comparison distribution: M = 17 [assume the mean of the DOM is equal to our population's mean], SM = 1.7
4. Find the cutoff score
-One-tailed (directional - testing whether dorm students study MORE)
-5% significance level
-df = 15
Cutoff = 1.753. If we calculate our t-value and get a value more extreme than this critical value, we reject our null hypothesis and say our research hyp. that students in your dorm study more is supported.
5. Find the sample mean's score on the comparison distribution by calculating a t-score
t = (sample mean - pop mean) / standard error of the DOM
The reason we had to calculate our pop variance and then get the variance and SE of the DOM is so we would have that number in the denominator of the t-value formula.
t = (21 - 17) / 1.7 = 2.35
6. Decide whether to reject the null hypothesis
Is our sample extreme enough to reject the null? YES. The research hypothesis is supported; students in your dorm study more than students in college overall.
PHOTO: Population: no known information on variance and SD, just a known mean. Comparison distribution: what we create as our distribution of means, with the cutoff t-score set at 1.753. Sample: definitely beyond/within the rejection region of the comparison distribution. REJECT THE NULL. The sample falls quite a ways out because its t-score ended up being +2.35.
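A minimal sketch of the dorm-study arithmetic above (M = 21, mu = 17, N = 16, SS = 694, one-tailed .05 test with df = 15 and a cutoff of 1.753); the values are the ones given in the example.

```python
import math

m, mu, n, ss = 21.0, 17.0, 16, 694.0

s2 = ss / (n - 1)              # estimated population variance, about 46.27
s2_m = s2 / n                  # variance of the distribution of means, about 2.89
s_m = math.sqrt(s2_m)          # standard error, about 1.70

t = (m - mu) / s_m             # 4 / 1.7 = about 2.35
print(round(t, 2))             # more extreme than the 1.753 cutoff -> reject the null
```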
More on the sources of variation in within and between-groups variance estimates
NULL IS TRUE: Chance factors alone account for both the between-groups and within-groups pop variance estimates. RESEARCH IS TRUE: Chance factors are still present, AND our between-groups estimate also reflects variation between those populations due to our treatment effect or manipulation.
ANOVA practice problem A social psychologist is studying the influence of knowledge of a previous criminal record on juries' perceptions of the guilt or innocence of defendants. The researcher recruits 15 volunteers who have been selected for jury duty. The researcher shows them a DVD of a four-hour trial in which a woman is accused of passing bad checks. Before viewing the video, however, all of the research participants are given a "background sheet" with age, marital status, education, and other such information about the accused woman. For five of the participants, the last section of the sheet says that the woman has been convicted several times before of passing bad checks (Criminal Record group). For five other participants, the last section of the sheet says the woman has a completely clean criminal record (Clean Record group). For the remaining five participants, the sheet does not mention anything about criminal record one way or the other (No Information group).
df b-t = 2 (3 groups - 1); df w-in = 12 (N-1 + N-1 + N-1) = 4 + 4 + 4
STEP 1: FIGURING MEANS FOR EACH GROUP
STEP 2: FIGURING THE POPULATION VARIANCE ESTIMATE FROM EACH SAMPLE
-Figure the mean of each sample (score sum / number of scores)
-Subtract the sample mean from each score to get deviation scores
-Square those
-Add them together FOR EACH GROUP
-Divide that sum of squared deviations by the number of participants MINUS 1 to get the pop variance estimate for that sample. Repeat for all 3 groups.
STEP 3: Estimate the pop. variance from the variation of scores WITHIN EACH GROUP (denominator)
S^2Within = (S^2Group1 + S^2Group2 + S^2Group3) / NGroups
S^2Group1 = 4.5, S^2Group2 = 5.0, S^2Group3 = 6.5
(4.5 + 5.0 + 6.5) / 3 = 5.33
STEP 4: Estimate the pop. variance from the variation BETWEEN THE GROUPS (numerator)
Step 4.1: Calculate the variance of the distribution of means
S^2M = E (M-GM)^2 / df between; df between = NGroups - 1; GM = sum of means / number of means
Sample means: 4 - GM (5.67) = -1.67, squared = 2.79 | 8 - GM (5.67) = 2.33, squared = 5.43 | 5 - GM (5.67) = -0.67, squared = 0.45
Add those squared deviation scores = 8.67; divide by df between (NGroups - 1): 8.67 / 2 = 4.34
Step 4.2: Estimate the pop. variance of individual scores BETWEEN the groups (numerator)
S^2Between = S^2M (n); n = number of participants per sample (5)
4.34 x 5 = 21.70
STEP 5: F-RATIO
F = S^2Between / S^2Within = 21.70 / 5.33 = 4.07
Greater than 1... likely something happening, but we need to know our cutoff score.
STEP 6: F CUTOFF SCORE, F-TABLE
Significance level: 0.05
B/t groups df = NGroups - 1 = 2
W/in groups df = df1 + df2 + ... + dflast = (5-1) + (5-1) + (5-1) = 12
F cutoff score: 3.89 | Sample F score: 4.07
STEP 7: REJECT/RETAIN NULL?
REJECT NULL: The 3 groups come from populations with different means; people exposed to different kinds of information, or no information, about the criminal record of the defendant in a situation of this kind will differ in their ratings of the defendant's guilt. BUT WE ARE NOT SAYING WHICH GROUPS RATE HIGHER OR LOWER.
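A Python sketch of the same one-way ANOVA, using only the group summary values from the notes (means 4, 8, 5; estimated variances 4.5, 5.0, 6.5; n = 5 per group), since the raw guilt ratings are not reproduced here. Small rounding differences from the hand calculation (4.07) are expected.

# One-way ANOVA from group summaries: Clean Record, Criminal Record, No Information.
import numpy as np
from scipy import stats

means = np.array([4.0, 8.0, 5.0])        # Clean, Criminal, No Information group means
variances = np.array([4.5, 5.0, 6.5])    # estimated pop variance within each group
n = 5                                    # participants per group
n_groups = len(means)

df_between = n_groups - 1                # 2
df_within = n_groups * (n - 1)           # 12

s2_within = variances.mean()                               # ~5.33 (denominator)
grand_mean = means.mean()                                   # ~5.67
s2_m = ((means - grand_mean) ** 2).sum() / df_between       # ~4.33
s2_between = s2_m * n                                       # ~21.7 (numerator)

f = s2_between / s2_within                                  # ~4.07
cutoff = stats.f.ppf(0.95, df_between, df_within)           # ~3.89

print(f"F({df_between}, {df_within}) = {f:.2f}, cutoff = {cutoff:.2f}")
print("Reject null" if f > cutoff else "Retain null")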
F-statistic (F-ratio)
•Ratio of the between-groups population variance estimate to the within-groups population variance estimate: B/T GROUP VARIANCE / W/IN GROUP VARIANCE. This is what we calculate when we use an ANOVA.
•The between-groups and within-groups estimates are both based on the same source of variation - variation among the scores in each of the populations (i.e., chance factors)
•True Null: F ratio is usually about 1
-Our b/t groups variance is made up of chance factors plus whatever the treatment effect is. If our between-groups variance is 3 and our within-groups variance is also 3, then everything happening between the groups seems to be accounted for by the chance factors that make up the within-groups variance.
•False Null: F ratio is larger than 1 -> We set our cutoff ratio; if our F is more extreme than that, we are confident something is happening with our treatment effect.
EXAMPLE: 6/3. 3 = chance factors alone. 6 = chance factors + other factors (potential treatment effect). F ratio = 2.
•The between-groups estimate is also influenced by variation among the means of the populations (i.e., treatment effect)
•False Null = greater between-groups estimate in the numerator = greater F
-Is our b/t groups variance big enough that we think something else is going on between the groups besides chance factors alone [besides what is happening in the denominator with the within-groups variance]?
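A small simulation sketch of the "true null -> F is usually about 1" point: three groups of n = 5 are repeatedly drawn from the same hypothetical normal population (so the null really is true), and the resulting F values average out near 1.

# True-null simulation: same population for all three groups (parameters are arbitrary).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
f_values = []
for _ in range(10_000):
    groups = [rng.normal(loc=5.0, scale=2.0, size=5) for _ in range(3)]
    f, p = stats.f_oneway(*groups)
    f_values.append(f)

print(f"Average F over 10,000 true-null experiments: {np.mean(f_values):.2f}")
# Close to 1 (about 1.2 for df = 2, 12); only about 5% of these F values
# exceed the 0.05 cutoff of ~3.89, which is exactly what the cutoff is for.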
Figuring a two-way ANOVA (sum of squares, population variance estimates, f-ratios)
•Structural model for the two-way ANOVA
-Each score's deviation from the grand mean can be divided into:
-> The score's deviation from the mean of its cell
-> The score's row's mean's deviation from the grand mean
-> The score's column's mean's deviation from the grand mean
-> The remainder after the other three deviations are subtracted from the overall deviation from the grand mean (the interaction; this is what allows us to figure the two-way ANOVA)
STEP 1: Sums of squares (we use computer software for this!):
SSRows = E (MRow - GM)^2
SSColumns = E (MColumn - GM)^2
SSInteraction = E [(X - GM) - (X - M) - (MRow - GM) - (MColumn - GM)]^2
SSWithin = E (X - M)^2
SSTotal = E (X - GM)^2
SSTotal = SSRows + SSColumns + SSInteraction + SSWithin
STEP 2: Population variance estimates
S^2Rows or MSRows = SSRows / dfRows
MSColumns = SSColumns / dfColumns
MSInteraction = SSInteraction / dfInteraction
MSWithin = SSWithin / dfWithin
STEP 3: F-ratios. You will have an F-ratio for each effect (rows, columns, interaction): take that effect's variance estimate (its between-groups numerator) and divide by the within-groups variance estimate (the same denominator for all three).
FRows = MSRows / MSWithin
FColumns = MSColumns / MSWithin
FInteraction = MSInteraction / MSWithin
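A Python sketch of this sum-of-squares decomposition on a tiny, made-up balanced 2x2 design (the data values are purely illustrative). It checks that SSTotal = SSRows + SSColumns + SSInteraction + SSWithin and then forms the three F ratios.

# Two-way ANOVA sums of squares on a hypothetical balanced design.
import numpy as np
from scipy import stats

# data[row, column, participant] -- 2 x 2 cells, 2 hypothetical scores per cell
data = np.array([
    [[6.0, 8.0], [4.0, 6.0]],   # row 1: cell means 7 and 5
    [[3.0, 5.0], [0.0, 2.0]],   # row 2: cell means 4 and 1
])
n_rows, n_cols, n_per_cell = data.shape

gm = data.mean()                                  # grand mean
cell_m = data.mean(axis=2, keepdims=True)         # mean of each cell
row_m = data.mean(axis=(1, 2), keepdims=True)     # mean of each row
col_m = data.mean(axis=(0, 2), keepdims=True)     # mean of each column

ss_rows = ((row_m - gm) ** 2 * np.ones_like(data)).sum()      # summed over every score
ss_cols = ((col_m - gm) ** 2 * np.ones_like(data)).sum()
ss_within = ((data - cell_m) ** 2).sum()
ss_inter = (((data - gm) - (data - cell_m)
             - (row_m - gm) - (col_m - gm)) ** 2).sum()
ss_total = ((data - gm) ** 2).sum()
assert np.isclose(ss_total, ss_rows + ss_cols + ss_inter + ss_within)

df_rows, df_cols = n_rows - 1, n_cols - 1
df_inter = df_rows * df_cols
df_within = n_rows * n_cols * (n_per_cell - 1)

ms_within = ss_within / df_within
for name, ss, df in [("Rows", ss_rows, df_rows),
                     ("Columns", ss_cols, df_cols),
                     ("Interaction", ss_inter, df_inter)]:
    f = (ss / df) / ms_within
    cutoff = stats.f.ppf(0.95, df, df_within)
    print(f"F {name}({df}, {df_within}) = {f:.2f}, cutoff = {cutoff:.2f}")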
Planned contrasts review
-Recall from our previous example we had an F cutoff score of 3.89 and an estimated F value of 4.07. Thus, we rejected the null.
-Also recall our F ratio does not tell us whether one group was definitely higher or lower than another. There is no basis for comparing individual groups by the F ratio alone.
-Three groups were given different types of info about a person's previous criminal record. Potential jurors were divided into the Criminal Record, Clean Record, and No Information groups.
a. COMPARING THE CRIMINAL RECORD GROUP (M=8) TO THE NI GROUP (M=5)
-We do planned contrasts when we reject the overall null. The population means ARE NOT all the same.
-For planned contrasts we must conduct TWO VARIANCE ESTIMATES:
1. The within-groups pop variance estimate (that NOISE happens regardless of whether there is a difference between groups). It stays the same as in our overall ANOVA.
2. The between-groups pop variance estimate USING JUST THE TWO means of interest - the full ANOVA used all 3 means; here we use only 2 group means.
-Then figure your F the same way, only with the 2 comparison groups.
•The overall ANOVA DOES NOT test which population means are different
•E.g., In the criminal record study, we concluded the true means of the three populations these groups represent are not all the same. However, we do not know which populations' means are significantly different from each other. The means do represent different groups, but we do not know which are different from which.
-The ANOVA says: here are these three groups, and there is a difference. We do not know if that difference is between groups 1/2, 2/3, 1/3, etc.
•Planned comparisons are the subset of all possible comparisons that the researcher specifies in advance of the study
•E.g., In the criminal record study, the researchers' prediction in advance would probably have been that the Criminal Record group would rate the defendant's guilt higher than both the No Information group and the Clean Record group.
-This proposes a planned comparison between the Criminal Record group and the NI group, and between the Criminal Record group and the Clean Record group.
•These would be examples of planned contrasts.
a. Within-groups variance = the same: S^2Within = 5.33. The only thing we change is the b-t groups variance.
•Estimating the population variance from the difference between the two group means (see the sketch after this list):
STEP 1: Estimate the variance of the distribution of means
-Calculate the GM of the two means of interest
-Subtract the GM from each group mean
-Square those deviations
-Add them together
-Divide by df b-t
S^2M = E (M-GM)^2 / df b-t | df b-t = NGroups being compared - 1
[(8-6.5)^2 + (5-6.5)^2] / 1 = 4.5
STEP 2: Estimate the variance of the population of individual scores (numerator)
S^2Between = S^2M (n); n = number of participants in each sample
4.5 (5) = 22.5 - our new F numerator
STEP 3: F-RATIO
F = S^2B / S^2W = 22.5 / 5.33 = 4.22
STEP 4: CUTOFF SCORE
-Sig. level, df b-t, df w-in
-Alpha = 0.05; df b-t = 1 (2 groups being compared - 1); df w-in = the same = 12 [(5-1) + (5-1) + (5-1)]
F cutoff = 4.75; our F = 4.22
FAIL TO REJECT/RETAIN NULL: The 3 means differ overall, but you cannot conclude the Criminal Record condition makes a person rate guilt differently than being in the NI group.
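A Python sketch of planned contrast (a), using only the summary values above: the two means of interest, n = 5 per group, the unchanged within-groups estimate of 5.33, and df within = 12.

# Planned contrast: Criminal Record (M = 8) vs. No Information (M = 5).
from scipy import stats

m1, m2 = 8.0, 5.0            # the two means of interest
n = 5                        # participants per group
s2_within = 5.33             # unchanged from the overall ANOVA
df_between = 1               # 2 means being compared - 1
df_within = 12               # (5-1) + (5-1) + (5-1)

gm = (m1 + m2) / 2
s2_m = ((m1 - gm) ** 2 + (m2 - gm) ** 2) / df_between   # 4.5
s2_between = s2_m * n                                    # 22.5

f = s2_between / s2_within                               # ~4.22
cutoff = stats.f.ppf(0.95, df_between, df_within)        # ~4.75

print(f"F({df_between}, {df_within}) = {f:.2f}, cutoff = {cutoff:.2f}")
print("Reject null" if f > cutoff else "Retain null")    # retains the null here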
Planned Contrasts
•The overall ANOVA does not test which population means are different from one another
•E.g., In the criminal record study, we concluded the true means of the three populations these groups represent are not all the same. However, we do not know which populations' means are significantly different from each other.
-Your ANOVA just tells you, "There is a difference among the groups."
•Planned comparisons/planned contrasts are the subset of all possible comparisons that the researcher specifies in advance of the study
•E.g., In the criminal record study, the researchers' prediction in advance would probably have been that the Criminal Record group would rate the defendant's guilt higher than both the No Information group and the Clean Record group.
-Compare the Criminal Record group mean to the NI and Clean Record group means.
•These would be examples of planned contrasts.
-For planned contrasts you must have rejected the overall null hypothesis.
-> Research hypothesis: the means are not all the same. If you retain the null, there is no point in conducting a planned contrast because your ANOVA tells you, "These population means are close enough to being the same."
TO CALCULATE PLANNED CONTRASTS... The within-groups pop variance estimate is THE SAME as in the overall ANOVA, BUT the between-groups pop variance estimate (the F numerator) DIFFERS: you use only the two means of interest rather than all the group means.
-But you then figure F the same way you would with only those 2 comparison groups.