ANOVA


58. Construct a line graph from a hypothetical experiment that illustrates a main effect, but not an interaction.

...

59. Construct a line graph from a hypothetical experiment that illustrates an interaction.

...

61. Construct a table of means from a hypothetical experiment that illustrates the presence of an interaction.

Means and Standard Deviations by Gender and Condition

Variable      M    SD     n
Males
  Control     2    1.22   5
  Violent     8    1.58   5
Females
  Control     3    2.00   5
  Violent     4    1.00   5
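One way to see the interaction in the table above is to compare the condition effect (Violent minus Control) within each gender. The sketch below only describes the pattern in the cell means. It is not a significance test, and the function and variable names are illustrative.

```python
# Cell means copied from the table above (gender x video-game condition).
means = {
    ("Males", "Control"): 2, ("Males", "Violent"): 8,
    ("Females", "Control"): 3, ("Females", "Violent"): 4,
}

def condition_effect(gender):
    """Violent minus Control mean within one gender."""
    return means[(gender, "Violent")] - means[(gender, "Control")]

male_effect = condition_effect("Males")      # 8 - 2
female_effect = condition_effect("Females")  # 4 - 3

# A non-zero "difference of differences" is the pattern an interaction test looks for.
interaction_contrast = male_effect - female_effect
print(male_effect, female_effect, interaction_contrast)
```

If the two condition effects were equal, the lines in a line graph would be parallel and the contrast would be zero.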

52. Still referring to the scenario above, which of the influences on power most likely caused this set of outcomes

An extremely large sample size probably caused this set of outcomes to occur.

62. Describe (in words), the results from a hypothetical study where an interaction effect is present.

An interaction effect was present in the current analysis. Specifically, the effect of video-game condition on aggression was moderated by gender, such that the violent video game condition compared to controls resulted in significantly more aggressive behavior in boys, but not in girls. For girls, there was no significant difference between the video-game and control condition on aggressive behavior.

48. Changing your alpha level from .05 to .10 will have what effect on power

Changing your alpha level from .05 to .10 will increase your power.

15. If the null hypothesis is false, what value (or range of values) would we expect the F to take on

If the null is false (i.e. there is an effect), the F statistic should be greater than 1.

5. If the mean difference between two groups would be relatively common from a population where the null is true, what would the p-value look like

Mean differences that would be relatively common if the null were true would have large p-values, much closer to 1 (e.g., p-values greater than .1).

10. Conceptually, what does between-group sums of squares quantify

Sum of Squares Between is the variability in the dependent variable that is due to the independent variable... this represents our treatment effect.

11. Conceptually, what does within-group sums of squares quantify

Sum of Squares Within is the leftover variability in the dependent variable that is NOT due to the independent variable... this represents error/noise.

2. Suppose that a large p-value is obtained for a particular sample (e.g., p = .60). In this case, are the data consistent or inconsistent with the null hypothesis

A large p-value (p = .60) means that the data are consistent with the null hypothesis: there is a 60% chance of getting results at least this extreme by random chance from a population where the null is true.

34. How does Fisher's LSD compare to Tukey's test in terms of Type I error control

Fisher's LSD provides no protection against Type I errors as compared to Tukey's HSD.

13. If the null hypothesis is false, what does MSbetween quantify

If the null hypothesis is false, the groups will have different means in the population, indicating that there is a treatment effect. MSbetween then quantifies not only sampling error/noise but also the treatment effect, since both contribute to the differences among the group means.

41. How would you characterize the magnitude of each of the above effect sizes

The eta squared would be a medium to large effect, and the Hedges' g would be a very small effect (it misses the conventional cutoff for small).

30. What is a Bonferroni adjustment, and when would you use it

When the familywise Type I error rate is high, we can use post hoc tests that inflate the p-value for each comparison to protect against Type I errors; the Bonferroni procedure is one such procedure. Specifically, the Bonferroni adjustment multiplies each unadjusted p-value by the number of comparisons (e.g., if we are making the 3 pairwise comparisons among 3 different treatment groups, we would multiply each p-value by 3) to produce Bonferroni p-values. Note that there's also an "alternate Bonferroni procedure" that reduces the alpha level for each test by dividing the desired alpha by the number of comparisons (i.e., instead of multiplying the unadjusted p-values by 3, you would divide the desired alpha by 3). Finally, note that the Bonferroni adjustment is popular because it can be applied to any statistical test (e.g., a table of correlations), but it is not ideal for post hoc tests because it tends to overcorrect the p-value and make it really difficult to detect group differences.
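The two versions of the Bonferroni procedure described above can be sketched as follows. The p-values are hypothetical, and capping adjusted p-values at 1 is a common convention assumed here rather than something stated in the notes.

```python
def bonferroni_adjust(p_values):
    """Multiply each unadjusted p-value by the number of comparisons.

    Capping at 1.0 is an assumption (a probability cannot exceed 1).
    """
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

def bonferroni_alpha(alpha, n_comparisons):
    """Alternate procedure: divide the desired alpha by the number of comparisons."""
    return alpha / n_comparisons

raw_p = [0.010, 0.030, 0.200]  # hypothetical p-values for 3 pairwise comparisons

adjusted_p = bonferroni_adjust(raw_p)
per_test_alpha = bonferroni_alpha(0.05, len(raw_p))

# Both routes lead to the same reject / fail-to-reject decisions.
decisions_from_adjusted_p = [p < 0.05 for p in adjusted_p]
decisions_from_adjusted_alpha = [p < per_test_alpha for p in raw_p]
print(adjusted_p, per_test_alpha)
print(decisions_from_adjusted_p, decisions_from_adjusted_alpha)
```

In this example only the first comparison stays significant, which illustrates how the correction makes group differences harder to detect.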

69. What assumptions are required by a two-factor ANOVA

1) Independence of observations (pg. 2). Extra info: The independence assumption requires that one participant's score is not related to or influenced by other participants' scores.
2) Homogeneity of variance (pg. 4). Extra info: Homogeneity of variance means that the variability is the same within each group. This means that the standard deviations of the groups must be the same in the population.
3) Normality (pg. 10). Extra info: ANOVA assumes that the population of scores is normally distributed within each group. For example, if we were to look at an individual therapy group, anger outbursts must follow a normal curve in the population. It is not useful to look at the distribution of the entire sample, because that distribution may not be normal even if the individual groups are normally distributed.

45. Under what situations can a researcher make a Type II error

A researcher can make a Type II error when there is a mean difference in the population but random chance yields a sample with a small mean difference and a p-value > .05. Type II errors are easier for researchers to make when power is low. So factors that negatively influence power can create situations when a researcher makes a Type II error (such as small effect size or low sample size).

8. What type of research questions does ANOVA address

ANOVA is used to address comparative research questions involving 2+ groups (and when the IV is categorical and the DV is continuous). Ex: Do 3 different therapy types differentially impact client depression scores.

21. Give an example of a study or data collection scenario where independence is violated.

An example of a study with an independence violation would be a treatment study where certain clients share the same therapist.

68. What is an unbalanced design, and what negative outcome can result from such a design

An unbalanced design is one in which the group sizes are unequal. The benefit of adding a second factor (IV) decreases when group sizes are unequal. The negative outcome that results is a reduction in the SS for each IV, because unequal group sizes make the IVs correlated. In contrast, when IVs are uncorrelated in a balanced design, each IV accounts for unique variation in the DV. So although adding a factor (IV) often increases power by reducing error, we have to be concerned with unbalanced designs.

46. As effect size increases (i.e., the groups become more different), what happens to power

As effect size increases, power increases.

47. As individual differences among subjects increase, what happens to power

As individual differences among subjects increase, power decreases.

7. Describe what a confidence interval is and how to interpret one.

Confidence intervals provide a range of values intended to estimate a population parameter based on our sample. Say we have a 95% confidence interval with a lower bound of 2 and an upper bound of 3.5. This means that for 95% of the samples we could collect, the interval computed from that sample will contain the true population value. So we can be 95% confident that the true population value falls between 2 and 3.5.
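The "95% of samples" idea can be checked with a small simulation. This is a sketch under simplifying assumptions: the population is normal, its standard deviation is treated as known (so a z-based interval is used), and the true mean, sample size, and number of replications are all made-up values.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and design values (assumptions for illustration).
TRUE_MEAN, SIGMA, N, REPS = 2.75, 1.0, 25, 2000
Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

def ci_for_sample(sample):
    """95% z-interval for the mean, treating SIGMA as known."""
    m = statistics.fmean(sample)
    half_width = Z_95 * SIGMA / len(sample) ** 0.5
    return m - half_width, m + half_width

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    lower, upper = ci_for_sample(sample)
    hits += lower <= TRUE_MEAN <= upper  # True counts as 1

coverage = hits / REPS
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

The proportion should land close to .95, which is what the confidence level refers to: the long-run behavior of the procedure across samples.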

64. What will happen to power when a second factor is added to an ANOVA? Could power increase, or decrease? Under what situation could an increase or decrease occur?

Adding a second factor to your ANOVA can either cause power to increase or decrease, depending on the context. Adding a second factor that is actually related to the DV can help explain some of the systematic variation in SSw, reduce error, and therefore increase power. However, adding a second factor that is unrelated to your DV can actually reduce power. When you calculate the within-group mean square (MSw) in order to calculate the F statistic, you use the equation MSw = SSw/dfw. Adding a second factor "diverts" degrees of freedom to the other IV. In other words, having two factors will necessarily cause your dfw term to be smaller. Dividing SSw by a smaller number creates a larger MSw (error) term and, according to the formula F = MSb/MSw, will then cause F to be smaller and therefore less likely to reach significance. The key point here is that an unrelated second factor will not account for much of the error in the data, so your SSw will stay about the same rather than shrink. This, coupled with the necessarily smaller dfw of a two-factor ANOVA, could result in reduced power.
Note: also unsure of this. Craig said that power can't really be talked about in the "present" (about the current study), but is rather a projective number. If that is the case, then I'm not sure I got at exactly what he wanted.
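The arithmetic behind this argument can be sketched with made-up sums of squares. The numbers below are hypothetical: a 2-group design with N = 38 (dfw = 36) that becomes a 2x2 design (dfw = 34) when a second factor is added.

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """F = MSb / MSw, where each mean square is SS divided by its df."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within

# One-factor ANOVA (hypothetical values): MSw = 360 / 36 = 10.
f_one_factor = f_statistic(ss_between=40.0, df_between=1, ss_within=360.0, df_within=36)

# Second factor unrelated to the DV: SSw barely changes, but dfw drops to 34.
f_unrelated = f_statistic(ss_between=40.0, df_between=1, ss_within=358.0, df_within=34)

# Second factor related to the DV: it absorbs a large share of SSw.
f_related = f_statistic(ss_between=40.0, df_between=1, ss_within=200.0, df_within=34)

print(f_one_factor, round(f_unrelated, 3), round(f_related, 3))
```

With these values the unrelated factor lowers F for the first IV slightly, while the related factor raises it, matching the two scenarios described above.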

65. What are simple effects? Describe a hypothetical study, and explain the concept of simple effects within the context of that study.

In general, a simple effect is an a priori simple comparison involving two specific groups (e.g., contrast codes for 3 groups = 0 -3 3). This allows the researcher to find the difference between the means of the two groups. Example: There are three groups, individual therapy (I), group therapy (G), and a control group (C), in a study examining whether there is a difference between each therapy and the control group for the treatment of anxiety symptoms. In order to test the difference in means between each therapy group and the control group, you would use simple effects with the following coding system to yield the desired comparisons.

                     Individual   Group   Control
Indiv. vs Control         1         0       -1
Group vs Control          0         1       -1

38. Under which situations can you use the various effect size measures? For example, when might you choose eta squared over Hedges' g?

An eta squared is the appropriate effect size measure for an ANOVA with more than two groups. Both eta squared and omega squared (w squared) give you the proportion of variance in the DV explained by the IVs, but the two have different computations. Eta squared overestimates the proportion of variance accounted for in the population, but is accurate for the sample. Omega squared is an "adjusted" estimate of the proportion of variance accounted for, and therefore is a better estimate of variance accounted for in the population. These differences between eta squared and omega squared exist for small sample sizes, but the two effect sizes become comparable with large samples. The cutoffs are small >.01, medium >.06, large >.14. A Hedges' g is an appropriate effect size measure for planned comparisons. Hedges' g is very similar to Cohen's d in that it uses a pooled (average) standard deviation to standardize the mean difference. The cutoffs are small >.20, medium >.50, large >.80 (the same as Cohen's d).
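A minimal sketch of both effect sizes, using made-up scores for three groups. Eta squared is computed as SSbetween / SStotal. For Hedges' g, the small-sample correction factor 1 - 3/(4(n1 + n2) - 9) is the commonly used approximation and is an assumption here, since the notes do not give the formula.

```python
import math
import statistics

def eta_squared(groups):
    """Proportion of total variability explained by group membership."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

def hedges_g(a, b):
    """Standardized mean difference using the pooled SD, with a small-sample correction."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    d = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # assumed approximation
    return d * correction

# Hypothetical scores for three groups.
control = [1, 2, 3, 4, 5]
treat_a = [2, 3, 4, 5, 6]
treat_b = [4, 5, 6, 7, 8]

eta2 = eta_squared([control, treat_a, treat_b])
g = hedges_g(treat_b, control)
print(round(eta2, 4), round(g, 3))
```

Eta squared summarizes the omnibus ANOVA across all three groups, while g describes one specific two-group comparison.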

28. F-max yields a value of 9.78. What is your conclusion regarding homogeneity of variance

Going by Craig's rule of thumb that 3 is the cutoff for F-max values that we should be concerned about (i.e., if F-max is less than 3, the accuracy of the t-test may not be substantially compromised), homogeneity most likely HAS been violated, since 9.78 is well above 3. (Note that different sources suggest different rules of thumb, so we should check p-values for these F statistics to be safe.)
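For reference, F-max is the ratio of the largest group variance to the smallest. The sketch below uses hypothetical scores and applies the rule-of-thumb cutoff of 3 mentioned above.

```python
import statistics

def f_max(groups):
    """Largest sample variance divided by smallest sample variance."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical data: the second group is far more spread out than the first.
group_1 = [1, 2, 3, 4, 5]       # variance 2.5
group_2 = [2, 6, 10, 14, 18]    # variance 40

ratio = f_max([group_1, group_2])
RULE_OF_THUMB_CUTOFF = 3  # cutoff from the notes; other sources use other values
concern = ratio > RULE_OF_THUMB_CUTOFF
print(ratio, concern)
```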

66. In a 3x2 ANOVA, how many different simple effects tests are possible

            Group A   Group B   Group C
Level 1       1A        1B        1C
Level 2       2A        2B        2C

There are 5 simple effect tests possible: 2 from the levels and 3 from the groups.

17. Suppose that the standard deviations within each group went from 5 to 8. Holding all other values constant, what impact would this have on the F statistic

If the standard deviations increase, then the MSw will also increase because it's based on the standard deviations and represents error. So if the standard deviations increase, then the F statistic will decrease. This is because the amount of error in the study increased affecting the ratio of treatment effect to error (F).
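A quick numeric check of this answer, assuming equal group sizes (so that MSw is simply the average of the group variances) and a hypothetical MSbetween that is held constant.

```python
def ms_within_equal_n(sds):
    """With equal group sizes, MSw equals the mean of the group variances."""
    return sum(sd ** 2 for sd in sds) / len(sds)

MS_BETWEEN = 100.0  # hypothetical value, held constant

f_before = MS_BETWEEN / ms_within_equal_n([5, 5, 5])  # MSw = 25
f_after = MS_BETWEEN / ms_within_equal_n([8, 8, 8])   # MSw = 64
print(f_before, f_after, f_after / f_before)
```

Because MSw depends on the squared standard deviations, going from 5 to 8 shrinks F by a factor of 25/64, not 5/8.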

53. What information do you need in order to conduct a power analysis

In order to conduct a power analysis you need at least 3 of the 4 pieces of information below: 1) Alpha level 2) Effect size 3) Sample Size 4) Power
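The "3 of the 4" idea can be sketched for the simplest case of two groups. The functions below use a normal approximation to a two-sided test, which is an assumption made to keep the example free of extra libraries; dedicated power software uses the t or F distribution and gives slightly different numbers. The effect size d = 0.5 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected size of the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_per_group_for_power(d, target_power=0.80, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power_two_groups(d, n, alpha) < target_power:
        n += 1
    return n

# Given alpha, effect size, and sample size, solve for power.
power_05 = approx_power_two_groups(d=0.5, n_per_group=64, alpha=0.05)
power_10 = approx_power_two_groups(d=0.5, n_per_group=64, alpha=0.10)

# Given alpha, effect size, and desired power, solve for sample size.
required_n = n_per_group_for_power(d=0.5, target_power=0.80, alpha=0.05)
print(round(power_05, 3), round(power_10, 3), required_n)
```

The same function also shows the relationships covered in the other questions: power rises with a larger alpha, a larger effect size, or a larger sample.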

22. What impact will independence violations have on your results, and how serious is this violation

Independence violations in your study will bias your results so that random error is underestimated and the rate of false positive significance tests increases. This is a VERY serious violation that would INVALIDATE the ANOVA analysis. Hierarchical linear models (AKA multilevel models or mixed linear models) are more appropriate in these scenarios because they can explicitly estimate the extent of the clustering effect.

4. If the mean difference between two groups would occur very rarely from a population where the null is true, what would the p-value look like?

Mean differences that would occur rarely if the null were true will have very small probability values, i.e. p-values less than .05.

60. Construct a table of means from a hypothetical experiment that illustrates a main effect for one (or both) factor(s), but not an interaction.

Means and Standard Deviations by Condition

Variable       M     SD    n
Control        2.5   1.5   10
Non-Violent    3.5   1.4   10
Violent        6     1.2   10

57. What is meant by moderation, or interaction

Moderation/interactions occur when the relationship of an IV to a DV is dependent on, or changes across, levels of a second IV. A test of interactions examines whether the effects of one IV are uniform for all groups of the second IV.

31. What is the difference between post-hoc and planned comparisons? When would you use one versus the other

Planned comparisons: a researcher specifies hypotheses about specific groups that she wants to compare, PRIOR to the study. Post hoc (unplanned) comparisons: a researcher performs exploratory analyses to determine which groups differ; this usually involves every possible pairwise comparison of groups. You'd use a planned comparison when you have a specific hypothesis about specific groups (hypothesis testing) and post hoc comparisons when you don't (exploratory research). (This last part wasn't out of notes, does that sound right to everyone?)

27. Levene's test yields a p-value of .47. What is your conclusion regarding homogeneity of variance

Recall that Levene's test tests the null hypothesis that the group variances are equal in the population; a Levene's test yielding a p-value of .47 (non-significant results!) indicates that the group variances are not significantly different from one another and thus that there is no evidence the homogeneity of variance assumption has been violated.

16. Suppose that the between-group variability doubled. Holding all other values constant, what impact would this have on the F statistic

The F statistic would increase because an increase in the between-group variability indicates a larger difference between the groups (i.e. a larger treatment effect) and this would be reflected by a larger F statistic.

32. Suppose that a researcher conducted an ANOVA with three groups, and wanted to do pairwise comparisons among all the groups. Which follow-up procedure would be most powerful: Tukey or Scheffe

The Tukey test would be most powerful. Tukey is appropriate when you want to compare all possible pairwise comparisons. The Scheffe procedure compares all possible pairwise comparisons and all possible complex comparisons. The Scheffe procedure is usually undesirable because it adjusts the p-value for too many comparisons making it difficult to detect group differences (i.e., the test lacks power).

67. How is "error" (e.g., SSW or MSW) defined in a two-factor ANOVA

The definition is the same as the one-factor ANOVA. Error is defined as the left-over variation not due to the independent variable(s). This may be from random noise (e.g., human error) or from systematic variability (e.g., gender differences). With systematic variability, if you had an additional independent variable, you can explain some of the within-group variability (error) and thus reduce error (and increase power).

20. What is the independence assumption

The independence assumption is the requirement in ANOVA that one participant's score not be related to or influenced by another participant's score. Statistically speaking, in ANOVA the standard error of the mean is calculated as σ/√N. Because it divides by √N, the formula assumes that each individual contributes one "unit" of information. Violations of independence cause redundancies in the data, such that each score contains less than one unit of unique information.

33. What is the difference between Tukey and other post hoc tests (e.g., Fisher, Scheffe)

The main difference is that they all differ in their level of protection against false positives. Fisher is at the no-protection end, Scheffe is at the maximum-protection end, and Tukey and Dunnett are in the middle.

Fisher's LSD
- Type I error control: Does not protect against Type I errors.
- Other comments: Logic: if F is significant, the null is false, so Type I errors are not possible.

Scheffe test
- Appropriate when: The goal is to compare all possible pairwise comparisons and complex comparisons.
- Type I error control: "Over-corrects"; adjusts the p-value for too many comparisons, giving maximum protection against Type I errors.
- Other comments: Difficult to detect any group differences.

Tukey's HSD
- Appropriate when: The goal is to compare all possible pairwise comparisons.
- Type I error control: Keeps the familywise Type I error rate at .05.
- Other comments: Most common procedure used in psych.

Dunnett's test
- Appropriate when: The goal is to compare a reference group to every other group.
- Type I error control: Similar to Tukey but uses a smaller correction factor because it assumes fewer comparisons.
- Other comments: If you're comparing intervention vs control, this is the best to use.

1. What role does the null hypothesis play in the significance testing process

The null hypothesis defines the situation when the treatment makes no difference. When the null is true, the groups of interest are equivalent (and have equivalent means) in the population.

35. Suppose that a researcher did an ANOVA with four groups. Further, she wanted to compare the first two groups against the second two groups using a planned comparison. What are the contrast coefficients that would be used for this analysis

The way you go about assigning weights (i.e., contrast coefficients) is:
● Each mean gets a weight.
● The means that are not involved in a contrast get a weight of zero.
● The means that are being compared in a contrast have weights with opposite signs (positive or negative).
● All weights must sum to zero.
Therefore, in this question, where you have four groups and you want to compare the first two against the second two, you can use the following contrast coefficients:

                      Group 1   Group 2   Group 3   Group 4
Complex comparison      -1        -1        +1        +1
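The weighting rules above can be checked in a few lines. The group means are hypothetical, and the helper names are illustrative.

```python
def is_valid_contrast(weights):
    """Contrast coefficients must sum to zero."""
    return sum(weights) == 0

def contrast_value(weights, means):
    """Weighted sum of the group means."""
    return sum(w * m for w, m in zip(weights, means))

weights = [-1, -1, 1, 1]             # first two groups vs second two groups
group_means = [2.0, 3.0, 5.0, 6.0]   # hypothetical means for groups 1-4

valid = is_valid_contrast(weights)
value = contrast_value(weights, group_means)  # (5 + 6) - (2 + 3)
print(valid, value)
```

A contrast value of zero would mean the two pooled sets of groups have the same total, so a value far from zero (relative to error) is what the planned comparison tests.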

43. Define a Type II error.

A Type II error is failing to reject the null when there is in fact an effect in the population. A Type II error can only occur when there is a mean difference in the population and random chance yields a sample with a small mean difference. The Type II error rate is inversely related to power: 1 - power = Type II error rate. Since we usually want a power of .80, we are accepting a Type II error rate of .20 (failing to detect an effect, even though there is one in the population, 20% of the time).

42. Define a Type I error.

Type one error is specified by alpha in advance of the study. It can be defined as the probability of rejecting the null hypothesis (saying there is an effect) when there is in fact no effect in the population. Type one error can only occur when there is no mean difference in the population, and when random chance yields a sample with an extreme mean difference (p <.05).

25. Under what condition will violating homogeneity of variance lead to an increase in Type I errors

Violating the homogeneity of variance assumption can lead to an increase in Type I errors (false positives) when:
- The group sample sizes are equal, OR
- We have unequal group sizes and the group with the largest n (largest group) has the smallest standard deviation.
These conditions result in a within-group variability estimate that is too small, giving us an F statistic that is too big and biasing the results toward a Type I error.

70. What are the analytic steps following a significant interaction

When the interaction is significant, ignore the main effects and perform additional analyses that help you understand the interaction. These additional analyses test simple effects to examine the influence of one IV within each group of the other IV. Simple effects are similar to one-factor ANOVAs performed on a subset of the participants. Also know that since you have two factors, you can test the simple effects by splitting the design by whichever IV you prefer. (There is no need to perform both sets of analyses; choose the split that will best address your research question.)

71. What are the analytic steps following a non-significant interaction

When there is a non-significant interaction, examine the F statistics for the main effects only. In addition, when the interaction is not significant, each main effect is similar to a one-factor ANOVA. If a main effect is significant, perform pairwise comparisons.

26. Under what condition will violating homogeneity of variance lead to an increase in Type II errors

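When the within-group variability estimate is too large, we can expect an increase in Type II errors (false negatives). This is the reverse of the Type I case: it happens with unequal group sizes when the largest group also has the largest standard deviation.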

37. Can p-values be used as measures of effect size across studies? Why or why not?

P-values cannot be used as measures of effect size across studies because they are dependent on sample size. On the other hand, effect size measures quantify the magnitude of the association in a way that is independent of sample size.

19. Suppose that the ANOVA yielded a significant F statistic (e.g., p < .05). What conclusion can you draw from this

You could conclude that the results are statistically significant, indicating that the mean of at least one group is significantly different from the others.

29. What is a familywise Type I error rate

A familywise Type I error rate is a Type I error rate (the probability of arriving at a false positive, or of finding evidence in favor of the alternative hypothesis when in reality the null hypothesis is true) computed across a whole set of tests rather than for a single test. I (Susan) looked this up in another stats book, and I think this is helpful: Familywise Type I error is the Type I error rate for the experiment as a whole, which includes all of the comparisons tested in the analysis. Separate, per-comparison probabilities combine to produce a much larger value, which we call the familywise Type I error rate. This refers to the probability that at least one Type I error has been committed somewhere among the various tests conducted in the analysis. Familywise Type I error corrections are only needed for post hoc comparisons, not planned comparisons. Similarly, from page 6 of the post-hoc test notes: The probability of making one or more Type I errors across a set of tests is called the familywise Type I error rate.
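How the per-comparison probabilities "combine" can be illustrated with the standard formula 1 - (1 - alpha)^k. This formula assumes the k tests are independent, which is a simplification: pairwise comparisons on the same groups are not fully independent, so treat the numbers as a rough guide.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

# 3 groups -> 3 pairwise comparisons; 4 groups -> 6 pairwise comparisons.
rate_3_tests = familywise_error_rate(0.05, 3)
rate_6_tests = familywise_error_rate(0.05, 6)
print(round(rate_3_tests, 4), round(rate_6_tests, 4))
```

Even with only three comparisons, the chance of at least one false positive is roughly 14% rather than 5%, which is why post hoc corrections exist.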

44. Under what situations can a researcher make a Type I error

A Type I error can only occur when there is no mean difference in the population and random chance yields a sample with an extreme mean difference. Type I errors become more likely with independence violations (because independence violations shrink within-group variability, in that each individual is not contributing one unique unit of information). Type I errors can also become more likely if the homogeneity of variance assumption is violated. If group sizes are equal, the Type I error rate increases to roughly 8% instead of 5%. If the groups are unequal, then the ANOVA is driven by the group with the largest n. If that largest group also has the smallest standard deviation, the within-group variability is reduced, increasing the Type I error rate.
Note: Isn't it also possible to just have a massive n that would jack up our power to the point where we detect statistical significance and reject the null, even if it is true in the population?

50. How do you interpret a power value of, say .70

A power value of .70 means you have a 70% chance of detecting a treatment effect of a particular magnitude, if the effect truly exists.

36. What is the difference between a simple comparison and a complex comparison

A simple comparison involves only two conditions. A complex comparison pools two (or more?) conditions and compares the pooled group to another condition.

6. What is the alpha level, and what purpose does it serve in the significance testing process

Alpha is the researcher-designated significance level of the test. It is the probability level associated with the decision rule: a p-value greater than alpha means the data are consistent with the null, and a p-value less than alpha results in rejection of the null.

63. How will the addition of a second factor change the SSB and SSW values from a one-factor ANOVA

Compared to the SSw of a one-factor ANOVA, the addition of a second factor would decrease the value of SSw in a two-factor ANOVA. SSw represents the left-over variation that is not due to the independent variable, and the addition of a second factor can help explain some of the systematic variation in SSw, and thereby decrease its value, or error. The SSb term will stay the same. Since SSb represents the comparison of each group's average (e.g. IV-video game condition, or gender) to the grand mean, this will be the same regardless of how many factors you add to the analysis.

24. Why is homogeneity of variance important in ANOVA

Homogeneity of variance is important in ANOVA because the F statistic is influenced by within-group variability (error), which is calculated by adding up the sums of squares within each group. If the variances in the groups are different from each other, then adding them together is not appropriate and will not yield an estimate of the common within-group variance (since no common variance exists).

55. What is a marginal mean? How are marginal means computed?

In 2-factor ANOVA, the marginal means for one factor are the means for that factor averaged across all levels of the other factor. For example, when looking at the phone sex data, the marginal mean for the Android group would be the mean number of sexual partners in the Android group across gender. The marginal mean is computed by averaging the mean scores on the DV within one group of a factor across the levels of the other factor. So if, in the phone sex example, Android Females had a mean of 6 and Android Males had a mean of 8, the Android marginal mean = 7.
NOTE: The marginal means for each group within one factor are compared when considering the main effect of that factor. So, the 3 marginal means for the phone groups would be compared to see the main effect of phone group.
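The Android example works out as follows. This sketch takes a simple (unweighted) average of the cell means, which matches the answer of 7 and assumes the two cells have equal sample sizes.

```python
import statistics

# Cell means from the example above (Android group only).
android_cell_means = {"Female": 6, "Male": 8}

def marginal_mean(cell_means):
    """Unweighted average of cell means across the other factor (equal n assumed)."""
    return statistics.fmean(cell_means.values())

android_marginal = marginal_mean(android_cell_means)
print(android_marginal)
```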

3. Suppose that a researcher was comparing males and females on some dependent variable using an ANOVA or t-test. The p-value was .08. What is the interpretation of the p-value (Note: the answer has nothing to do with whether the p-value is significant or not).

A p-value of .08 means that, if the null hypothesis is true, a test statistic at least as extreme as the one calculated for this sample has an 8% probability of occurring due to chance. Since we usually use .05 as our cutoff, we would consider the data consistent with the null hypothesis, meaning that we have no evidence that males and females differ in the population on the dependent variable being measured.

14. If the null hypothesis is true, what is the expected value of the F statistic

If the null is true (i.e. there is no effect), the F statistic should be 1 (or close to 1). The ratio of the between-group variability to the within-group variability is nearly equal, which is why the F value is close to 1.

49. Why is changing your alpha level a bad way to manipulate power

It's a bad way to manipulate power because it increases your Type I error rate (the chances of having a false positive).

23. What null hypothesis is being tested by Levene's test

Levene's test computes the absolute value of each score's distance from the group mean (i.e., a participant's contribution to the within-group variability) and then uses those scores as the DV in an ANOVA analysis. The null hypothesis is that the standard deviations of the groups are equal (s=s, or that there is homogeneity of variance).
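The procedure described above (absolute deviations from the group mean, then an ANOVA on those deviations) can be sketched directly. The scores are hypothetical. Only the F statistic is computed here; obtaining the p-value would require the F distribution, which is left out to keep the example dependency-free.

```python
import statistics

def one_way_f(groups):
    """F statistic for a one-way ANOVA: MSbetween / MSwithin."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / df_between) / (ss_within / df_within)

def levene_f(groups):
    """Levene-style F: run the ANOVA on absolute deviations from each group mean."""
    abs_deviations = [[abs(x - statistics.fmean(g)) for x in g] for g in groups]
    return one_way_f(abs_deviations)

# Hypothetical scores: same mean (5), very different spread.
tight_group = [4, 5, 6, 5, 5]
wide_group = [1, 9, 5, 2, 8]

f_value = levene_f([tight_group, wide_group])
print(round(f_value, 2))
```

A large F here says the groups differ in their typical distance from their own mean, which is evidence against homogeneity of variance.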

12. If the null hypothesis is true, what does MSbetween quantify

If the null is true, there isn't a treatment effect, so MSbetween quantifies only noise/sampling error. Any mean differences we find in this case are due to chance, and concluding that they reflect a treatment effect would be a Type I error.

18. What null hypothesis is being tested by ANOVA

The groups will have identical means in the population.

56. How is a main effect defined in a two-factor ANOVA

The main effect, in two-factor ANOVA, compares the means of one factor while completely ignoring the second factor. The main effect of a factor is examined by looking at the differences in marginal means within the factor of interest.

40. How do you interpret a Hedges' g of, say .10

This is a tiny effect size (the standard for small is >.20). This tells us there is a .10 standard deviation difference in the means.

39. How do you interpret an eta squared of, say .10

This would mean that the proportion of variance explained by the IVs is 10%. This is a medium to large effect according to conventional standards.

9. What type of data are required for the IV and DV (categorical, continuous, etc.)

To use ANOVA, we need continuous DVs and categorical/nominal IVs (note that covariates- control variables- can be categorical or continuous in ANOVA).

54. What factors can and cannot (or should not) be manipulated in conducting a power analysis

Usually, in a power analysis, the desired level of significance (alpha) and power (1 - beta) are fixed. The estimated effect size should not be changed, but in reality these estimates may fluctuate depending on how comfortable PIs are with the initial results of the power analysis (also, sometimes the effect size can be upped, like by giving the rats more hormone). Sample size is a factor that may be manipulated in a power analysis depending on study resources. You can also decrease the heterogeneity of your sample and thus reduce noise.

51. Suppose that a study is comparing 2 groups and finds an effect size close to zero (e.g., d = 0.1). However, the results were statistically significant (e.g., p < 0.05). Does the study have too much, too little, or just the right amount of power

When we detect a statistically significant result even though the effect size is "trivial," or close to zero, the study may have had a sample size that was too large. This means that the study probably had too much power, driven by an extremely large sample size.
Just as an aside, other such scenarios may include:
1) We probably have just the right amount of power when we have a "decent" effect size and p < 0.05, OR when we have a trivial effect size and p > 0.05, because n is just the right size.
2) We do not have enough power when we have a "decent" effect size and p > 0.05, probably because n is too small.

