Stats, ANOVA Focus

Assumptions of the two-way randomized ANOVA

"All conditions contain independent samples of subjects. Interval or ratio data were collected. The populations represented by the data are roughly normally distributed. The variances among the populations being compared are homogenous."

Assumptions of the One-way repeated measures ANOVA

"The data are in an interval or ratio scale. The underlying population distribution is normally distributed. The variances among the populations being compared are homogenous. The groups are correlated."

Error df

(k-1)(n-1)

Subject df

(n-1)

What assumptions are required by a two-factor ANOVA?

1) Independence of observations 2) Homogeneity of variance 3) Normality

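What does the Bonferroni adjustment look like?

The familywise Type I error rate across c comparisons is 1-(1-α)^c. The Bonferroni adjustment controls it by testing each comparison at α/c (equivalently, by multiplying each p-value by c).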

Levene's test yields a p-value of .47. What is your conclusion regarding homogeneity of variance?

A Levene's test yielding a p-value of .47 (non-significant results!) indicates that the group variances are not significantly different from one another and thus that the homogeneity of variance assumption has not been violated.

Suppose that a p-value is obtained for a particular sample (e.g., p = .60). In this case, are the data consistent or inconsistent with the null hypothesis?

A large p-value (p = .60) means that the data are consistent with the null hypothesis: if the null is true, there is a 60% chance of getting results at least this extreme by random chance.

Tukey's honestly significant difference (HSD)

A post hoc test used with ANOVAs for making all pairwise comparisons when conditions have equal n.

What is the difference between a simple comparison and a complex comparison?

A simple comparison involves only two conditions. A complex comparison pools two or more conditions and compares the pooled group to another condition.

Complete factorial design

All levels of each independent variable are paired with all levels of every other independent variable.

Incomplete factorial design

Not all levels of one independent variable are paired with all levels of another independent variable.

k

Number of groups.

What are simple effects?

An a priori simple comparison involving two specific groups (e.g., contrast codes for 3 groups = 0 -3 3). This allows the researcher to test the difference in means between the two groups.

Main effect

An effect of a single independent variable.

Mean square

An estimate of either variance between groups or variance within groups.

Between-groups variance

An estimate of the variance of the group means about the grand mean; includes both systematic variance and error variance.

Within-groups variance

An estimate of the variance within each condition in the experiment; also known as error variance, for variance due to chance factors.

Error variance

An estimate of the variance within each condition in the experiment after variance due to individual differences has been removed.

Eta-squared

An inferential statistic for measuring effect size with an ANOVA.

One-way repeated measures ANOVA

An inferential statistical test for comparing the means of three or more groups using a correlated-groups design.

What is an unbalanced design?

An unbalanced design is one in which the group (cell) sizes are unequal. The benefit of adding a second factor (IV) decreases when group sizes are unequal.

Factorial design

Any design with more than one independent variable.

What is a marginal mean?

The mean for one level of a factor, averaged across all levels of the other factor.

As effect size increases (i.e., the groups become more different), what happens to power?

As it increases, power increases.

Fisher's LSD

Does not protect against Type I error. Logic: if F is significant, the null is false, so Type I errors are not possible.

What are the analytic steps following a non-significant interaction?

Examine the F statistics for the main effects only. In addition, when the interaction is not significant, each main effect is similar to a one-factor ANOVA. If the main effect is significant, perform pairwise comparisons.

Hartley's Test is also called

F-Max test

How does Fisher's LSD compare to Tukey's Test in terms of Type I error control?

Fisher's LSD provides no protection against Type I errors as compared to Tukey's HSD.

F-max yields a value of 9.78. What is your conclusion regarding homogeneity of variance?

An F-max of 9.78 is well above the common rule-of-thumb cutoff of 3, so homogeneity has most likely been violated. (If F-max is less than 3, the accuracy of the test may not be substantially compromised. Note that different sources suggest different rules of thumb, so we should check p-values for these F statistics to be safe.)
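
As a rough illustration of how an F-max value is obtained, the sketch below divides the largest group variance by the smallest. The scores and the function name `f_max` are made up for the example; the cutoff of 3 is the rule of thumb quoted on this card, not a universal standard.

```python
import statistics

def f_max(groups):
    """Hartley's F-max: largest sample variance divided by the smallest."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical scores for two groups (illustrative only)
group_a = [1.0, 2.0, 3.0]   # sample variance = 1
group_b = [2.0, 4.0, 6.0]   # sample variance = 4

ratio = f_max([group_a, group_b])
exceeds_rule_of_thumb = ratio > 3  # rule-of-thumb cutoff from the card above
print(ratio, exceeds_rule_of_thumb)
```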

Suppose that a researcher was comparing males and females on some dependent variable using an ANOVA or t-test. The p-value was .08. What is the interpretation of the p-value?

If the null hypothesis is true, a p-value of .08 means that a test statistic at least as extreme as the one calculated for this sample has an 8% probability of occurring due to chance. Because we usually use .05 as our cutoff, we consider the data consistent with the null hypothesis, meaning that we cannot conclude that males and females differ in the population on the dependent variable being measured.

If the null hypothesis is false, what value (or range of values) would we expect the F to take on?

If the null is false (i.e. there is an effect), the F statistic should be greater than 1.

If the null hypothesis is true, what is the expected value of the F statistic?

If the null is true (i.e. there is no effect), the F statistic should be 1 (or close to 1). The between-group variability and the within-group variability are nearly equal, which is why their ratio (the F value) is close to 1.

Suppose that the standard deviations within each group went from 5 to 8. Holding all other values constant, what impact would this have on the F statistic?

If the standard deviations increase, then the MSw will also increase because it's based on the standard deviations and represents error. So if the standard deviations increase, then the F statistic will decrease. This is because the amount of error in the study increased affecting the ratio of treatment effect to error (F).
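
A small worked sketch of this effect, assuming equal group sizes and a common within-group standard deviation so that MSw is simply the SD squared. The group means, n, and the helper name `f_from_summary` are hypothetical.

```python
def f_from_summary(means, sd, n):
    """One-way ANOVA F for k equal-sized groups that share one within-group SD."""
    k = len(means)
    grand_mean = sum(means) / k
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ms_between = ss_between / (k - 1)
    ms_within = sd ** 2  # pooled variance when every group has the same SD
    return ms_between / ms_within

means = [10.0, 12.0, 14.0]  # hypothetical group means, held constant

f_sd5 = f_from_summary(means, sd=5.0, n=20)
f_sd8 = f_from_summary(means, sd=8.0, n=20)
print(f_sd5, f_sd8)  # F shrinks as the within-group SD grows
```

With these numbers MSbetween stays at 80, while MSw rises from 25 to 64, so F drops from 3.2 to 1.25.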

What are the analytic steps following a significant interaction

Ignore the main effects and perform additional analyses that help you understand the interaction. These additional analyses are testing simple effects to examine the influence of one IV within each group of the other IV.

What role does the null hypothesis play in the significance testing process?

It defines the situation when the treatment makes no difference. When the null is true, the groups of interest are equivalent (and have equivalent means) in the population.

Conceptually, what does between-group sums of squares quantify?

It is the variability in the dependent variable that is due to the independent variable... this represents our treatment effect.

What impact will independence violations have on your results, and how serious is this violation?

It will bias your results so that random error is underestimated and the rate of false positive significance tests increases. This is a VERY serious violation that would INVALIDATE the ANOVA analysis. Hierarchical linear models (AKA multilevel models or mixed linear models) are more appropriate in these scenarios because they can explicitly estimate the extent of the clustering effect.

Changing your alpha level from .05 to .10 will have what effect on power?

It will increase your power.

Suppose that the between-group variability doubled. Holding all other values constant, what impact would this have on the F statistic?

It would increase because an increase in the between-group variability indicates a larger difference between the groups (i.e. a larger treatment effect) and this would be reflected by a larger F statistic.

Why is changing your alpha level a bad way to manipulate power?

It's a bad way to manipulate power because it increases your Type I error rate (the chances of having a false positive).

If the null hypothesis is true, what does MSbetween quantify?

When the null is true there is no treatment effect, so MSbetween quantifies only noise/sampling error. Any mean differences we find in this case are due to chance, and declaring them significant would be a Type I error.

If the mean difference between two groups would be relatively common from a population where the null is true, what would the p-value look like?

Mean differences that would be relatively common if the null were true will have large p-values, much closer to 1 (e.g., p-values greater than .10).

If the mean difference between two groups would occur very rarely from a population where the null is true, what would the p-value look like?

Mean differences that would occur rarely if the null were true will have very small probability values, i.e. p-values less than .05.

How do you interpret a power value of, say .70

Means you have a 70% chance of detecting a treatment effect of a particular magnitude, if the effect truly exists.

Total df

N-1

Within-groups df

N-k

Can p-values be used as measures of effect size across studies?

No, they cannot be used as measures of effect size across studies because they are dependent on sample size. On the other hand, effect size measures quantify the magnitude of the association in a way that is independent of sample size.

What is meant by moderation, or interaction

Occurs when the relationship of an IV to a DV is dependent on, or changes across levels of, a second IV. A test of interactions examines whether the effects of one IV are uniform for all groups of the second IV.

What is the difference between post-hoc and planned comparisons?

Planned comparisons: a researcher specifies hypotheses about specific groups that she wants to compare, PRIOR to the study. Post hoc (unplanned) comparisons: a researcher performs exploratory analyses to determine which groups differ; this usually involves every possible pairwise comparison of groups.

n

Sample size in each group.

N

Total number of scores (across all groups).

Suppose that a researcher conducted an ANOVA with three groups, and wanted to do pairwise comparisons among all the groups. Which follow-up procedure would be most powerful: Tukey or Scheffe?

The Tukey test would be most powerful. Tukey is appropriate when you want to compare all possible pairwise comparisons. The Scheffe procedure is usually undesirable because it adjusts the p-value for too many comparisons making it difficult to detect group differences (i.e., the test lacks power).

Assumptions of the One-Way Randomized ANOVA

The data are on an interval or ratio scale. The underlying population distribution is normally distributed. The variances among the populations being compared are homogeneous.

Interaction effect

The effect of each independent variable across the levels of the other independent variable.

What null hypothesis is being tested by ANOVA?

The groups will have identical means in the population.

appropriate effect size measure for planned comparisons.

Hedges' g. It is very similar to Cohen's d in that it uses the average (pooled) standard deviation to standardize the mean difference. The cutoffs are small >.20, medium >.50, large >.80 (the same as Cohen's d).
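
A minimal sketch of the calculation, assuming the usual pooled-SD form of Cohen's d and the common small-sample correction factor 1 - 3/(4(n1 + n2) - 9) for Hedges' g. The card itself does not give a formula, so treat both the correction and the example numbers as assumptions.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Mean difference standardized by the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d multiplied by a small-sample bias correction (assumed form)."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d(m1, sd1, n1, m2, sd2, n2) * correction

# Hypothetical summary statistics for two groups of 20
d = cohens_d(105.0, 10.0, 20, 100.0, 10.0, 20)
g = hedges_g(105.0, 10.0, 20, 100.0, 10.0, 20)
print(round(d, 3), round(g, 3))
```

The correction only matters in small samples; with large groups g and d are nearly identical.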

Alternative Hypothesis

The independent variable had an effect - at least one of the samples represents a different population than the others.

Null Hypothesis

The independent variable had no effect - the samples all represent the same population.

What is the difference between Tukey and other post hoc tests (e.g., Fisher, Scheffe)?

The main difference is their level of protection against false positives. Fisher is at the no-protection end, Scheffe is at the maximum-protection end, and Tukey and Dunnett are in the middle.

Grand mean

The mean of the means of several sub samples, as long as the sub samples have the same number of data points

Factorial notation

The notation that indicates how many independent variables were used in a study and how many levels were used for each variable.

Two-Way Randomized ANOVA

The phrase "two-way" informs us that there are two independent variables in the study.

F-ratio

The ratio formed when the between-groups variance is divided by the within-groups variance.

What is the independence assumption?

The requirement in ANOVA that one participant's score not be related to or influenced by another participant's score. Statistically speaking, in ANOVA the standard error of the mean is calculated as σ/√N. Because it divides by N, the formula assumes that each individual contributes one "unit" of information. Independence violations cause redundancies in the data, such that each score contains less than one unit of unique information.

Sum of Squares Factor A

The sum of the squared deviation scores of each group mean for Factor A minus the grand mean, times the number of scores in each Factor A condition.

Sum of Squares Factor B

The sum of the squared deviation scores of each group mean for Factor B minus the grand mean, times the number of scores in each Factor B condition.

Between-groups sum of squares

The sum of the squared deviations of each group's mean from the grand mean, multiplied by the number of subjects in each group.

Sum of squares error

The sum of the squared deviations of each score from its condition mean.

Within-groups sum of squares

The sum of the squared deviations of each score from its group mean.

Total sum of squares

The sum of the squared deviations of each score from the grand mean.
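
The one-way sums of squares defined on these cards can be checked numerically. The sketch below uses three small made-up groups and verifies that the total sum of squares splits into the between-groups and within-groups parts, then forms the mean squares and F using df of k-1 and N-k.

```python
# Hypothetical scores for k = 3 groups of n = 3
groups = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]

scores = [x for g in groups for x in g]
N = len(scores)
k = len(groups)
grand_mean = sum(scores) / N
group_means = [sum(g) / len(g) for g in groups]

# Between: squared deviation of each group mean from the grand mean, times group size
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Within: squared deviation of each score from its own group mean
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# Total: squared deviation of each score from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in scores)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_ratio = ms_between / ms_within
print(ss_between, ss_within, ss_total, f_ratio)
```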

Sum of squares interaction

The sum of the squared deviations of each condition mean from the grand mean, times the number of scores in each condition. The SSa and SSb are then subtracted from this.
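
The two-way sums of squares can be sketched the same way. This example assumes a balanced 2x2 design with two made-up scores per cell; the labels a1, a2, b1, b2 and the helper `level_scores` are hypothetical names for illustration.

```python
# Hypothetical balanced 2x2 design, two scores per cell
data = {
    ("a1", "b1"): [1.0, 3.0],
    ("a1", "b2"): [5.0, 7.0],
    ("a2", "b1"): [4.0, 6.0],
    ("a2", "b2"): [12.0, 14.0],
}

def mean(xs):
    return sum(xs) / len(xs)

def level_scores(index, level):
    """All scores belonging to one level of Factor A (index 0) or Factor B (index 1)."""
    return [x for key, cell in data.items() if key[index] == level for x in cell]

all_scores = [x for cell in data.values() for x in cell]
grand = mean(all_scores)
a_levels = sorted({key[0] for key in data})
b_levels = sorted({key[1] for key in data})

ss_a = sum(len(level_scores(0, a)) * (mean(level_scores(0, a)) - grand) ** 2 for a in a_levels)
ss_b = sum(len(level_scores(1, b)) * (mean(level_scores(1, b)) - grand) ** 2 for b in b_levels)
# Condition (cell) means about the grand mean, then remove the two main effects
ss_cells = sum(len(cell) * (mean(cell) - grand) ** 2 for cell in data.values())
ss_axb = ss_cells - ss_a - ss_b
ss_error = sum((x - mean(cell)) ** 2 for cell in data.values() for x in cell)
ss_total = sum((x - grand) ** 2 for x in all_scores)
print(ss_a, ss_b, ss_axb, ss_error, ss_total)
```

In a balanced design like this one, the four components add up to the total sum of squares.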

Subject variance

The variance due to individual differences; removed from the error variance.

In a 3x2 ANOVA, how many different simple effects tests are possible?

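There are 5 possible simple effects tests: 3 tests of the two-level factor (one at each level of the three-level factor) plus 2 tests of the three-level factor (one at each level of the two-level factor).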

Describe what a confidence interval is and how to interpret one.

A confidence interval provides a range of values intended to estimate a population parameter based on our sample. Say we have a 95% confidence interval with a lower bound of 2 and an upper bound of 3.5. This means that for 95% of the samples we could collect, the interval computed from the sample will contain the true population value. Informally, we are 95% confident that the true population value falls between 2 and 3.5.
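
The "95% of samples" reading can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a normal population with a known standard deviation (so a z-based interval is used), and made-up values for the population mean, sample size, and number of repetitions.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA, N, REPS = 2.75, 1.0, 25, 2000  # hypothetical population and design
z_crit = NormalDist().inv_cdf(0.975)      # about 1.96 for a 95% interval
half_width = z_crit * SIGMA / N ** 0.5

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    # Does this sample's interval contain the true population mean?
    if sample_mean - half_width <= MU <= sample_mean + half_width:
        hits += 1

coverage = hits / REPS
print(coverage)  # should land close to 0.95
```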

Under what situations can a researcher make a Type I error?

This error can only exist when there is no mean difference in the population, and when random chance yields a sample with an extreme mean difference. Can occur due to independence violations and homogeneity of variance violations

How do you interpret a Hedges' g of, say, .10?

This is a tiny effect size (the standard for small is >.20). It tells us there is a .10 standard deviation difference in the means.

How do you interpret an eta squared of, say, .10?

This would mean that the proportion of variance explained by the IVs is 10%. This is a medium to large effect according to conventional standards.
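
As a quick sketch, eta squared is the effect sum of squares divided by the total sum of squares. The numbers below are invented to reproduce the .10 in this card.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variability in the DV attributed to the effect."""
    return ss_effect / ss_total

# Hypothetical sums of squares
value = eta_squared(ss_effect=12.0, ss_total=120.0)
print(value)  # 10% of the variance is explained
```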

What type of data are required for the IV and DV (categorical, continuous, etc.) in ANOVA?

To use ANOVA, we need continuous DVs and categorical/nominal IVs (note that covariates- control variables- can be categorical or continuous in ANOVA).

Calculations for a two-way randomized ANOVA

Total, Factor A, Factor B, AxB, Error.

Calculations for a one-way repeated measures ANOVA

Total, between treatments, within treatments, subject, error.

What factors can and cannot (or should not) be manipulated in conducting a power analysis

Usually, the desired level of significance (alpha) and power (1 - beta) are fixed. The estimated effect size should not be changed, but in reality these estimates may fluctuate depending on how comfortable PIs are with the initial results of the power analysis. Sample size is a factor that may be manipulated in power analysis depending on study resources. You can also decrease the heterogeneity of your sample and thus reduce noise.

What is a Bonferroni adjustment, and when would you use it?

When the familywise type I error rate is high, we can use post hoc tests that inflate the probability values for each comparison to protect against type I errors; the Bonferroni procedure is one such procedure. A Bonferroni adjustment multiplies each unadjusted p-value by the number of comparisons (e.g., if we are comparing 3 different treatment groups, we would multiply the p-value by 3) to produce Bonferroni p-values.
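
A minimal sketch of the adjustment described above: each p-value is multiplied by the number of comparisons. Capping the result at 1 is a common convention added here as an assumption, and the p-values are made up.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    c = len(p_values)
    return [min(1.0, p * c) for p in p_values]

# Hypothetical unadjusted p-values from 3 pairwise comparisons
unadjusted = [0.01, 0.04, 0.60]
adjusted = bonferroni_adjust(unadjusted)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

Here the second comparison is significant before the adjustment (.04) but not after it (.12), which illustrates why the procedure is considered conservative.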

Under what situations can a researcher make a Type II error?

When there is a mean difference in the population but random chance yields a sample with a small mean difference and a p-value > .05.

Post hoc test

When used with an ANOVA, a means of comparing all possible pairs of groups to determine which ones differ significantly from each other.

Suppose that a study is comparing 2 groups and finds an effect size close to zero (e.g. d=0.1). However, the results were statistically significant (e.g., p<0.05). Does the study have too much, too little, or just the right amount of power?

When we detect a statistically significant result, even when there is a "trivial" effect size, or an effect size close to zero, this means that the study may have had a sample size that was too large. This means that the study probably had too much power, driven by an extremely large sample size.

Under what condition will violating homogeneity of variance lead to an increase in Type II errors?

When within group variability is too large, we can expect an increase in Type II errors (false negatives)

Suppose that the ANOVA yielded a significant F statistic (e.g, p < .05). What conclusion can you draw from this?

You could conclude that the results are statistically significant, indicating that the mean of at least one group is significantly different from the others.

What information do you need in order to conduct a power analysis?

You need at least 3 of the 4 pieces of information below: 1) Alpha level 2) Effect size 3) Sample Size 4) Power
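
To see how three of these pieces determine the fourth, the sketch below computes approximate power from alpha, effect size, and sample size. It assumes a two-sided comparison of two equal-sized groups and uses a normal approximation rather than the exact F or t distribution; the function name `two_group_power` is hypothetical.

```python
from statistics import NormalDist

def two_group_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-group test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected test statistic under the effect
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

baseline = two_group_power(d=0.5, n_per_group=64)                 # roughly .80
looser_alpha = two_group_power(d=0.5, n_per_group=64, alpha=0.10)
bigger_effect = two_group_power(d=0.8, n_per_group=64)
bigger_sample = two_group_power(d=0.5, n_per_group=100)
print(round(baseline, 2), looser_alpha > baseline, bigger_effect > baseline)
```

The comparisons match other cards in this set: raising alpha, a larger effect size, or a larger sample each increases power.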

When would you use post hoc or planned comparisons?

You'd use a planned comparison when you have a specific hypothesis about specific groups (hypothesis testing) and post hoc comparisons when you don't (exploratory research)

What negative outcomes can result from an unbalanced design?

a reduction in the SS for each IV because the IVs are correlated with unequal group sizes.

parameter

a value derived from the data collected from a population, or the value inferred to the population from a sample statistic

X

an individual score in the distribution

mean

arithmetic average of a distribution of scores

Why is the Bonferroni adjustment popular?

because it can be applied to any statistical test (e.g., a table of correlations), but is not ideal for post hoc tests because it tends to overcorrect the p-value and make it really difficult to detect group differences.

Why is homogeneity of variance important in ANOVA?

because the F statistic is influenced by within group variability (error) which is calculated by adding up the sums of squares within each group. If the variances in the two groups are different from each other, then adding the two together is not appropriate, and will not yield an estimate of the common within-group variance (since no common variance exists).

How is the marginal mean computed?

by averaging the cell means on the DV for one level of a factor across all levels of the other factor.
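
A short sketch of the computation, assuming a 2x3 table of cell means with equal cell sizes (so a simple average of cell means is appropriate). The values are invented.

```python
# Hypothetical cell means: rows are levels of Factor A, columns are levels of Factor B
cell_means = [
    [10.0, 14.0, 18.0],  # A1 at B1, B2, B3
    [12.0, 16.0, 26.0],  # A2 at B1, B2, B3
]

# Marginal means for Factor A: average each row across the levels of B
a_marginals = [sum(row) / len(row) for row in cell_means]
# Marginal means for Factor B: average each column across the levels of A
b_marginals = [sum(col) / len(col) for col in zip(*cell_means)]

print(a_marginals)  # compared to assess the main effect of A
print(b_marginals)  # compared to assess the main effect of B
```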

distribution

collection, or group, of scores from a sample on a single variable; these scores are often arranged from smallest to largest

What type of research questions does ANOVA address?

comparative research questions involving 2+ groups (and when the IV is categorical and the DV is continuous). Ex: Do 3 different therapy types differentially impact client depression scores.

How is a main effect defined in a two-factor ANOVA?

compares the means of one factor while completely ignoring the second factor. This is examined by looking at the differences in marginal means within the factor of interest.

What null hypothesis is being tested by Levene's test?

computes the absolute value of each score's distance from the group mean (i.e., a participant's contribution to the within-group variability) and then uses those scores as the DV in an ANOVA analysis. The null hypothesis is that the standard deviations of the groups are equal (s=s, or that there is homogeneity of variance).
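
The procedure described above can be sketched in two steps: convert each score to its absolute distance from its group mean, then run a one-way ANOVA on those distances. This sketch stops at the F statistic (no p-value is computed), and the data and function names are made up.

```python
def one_way_f(groups):
    """F ratio from a one-way ANOVA on the given groups."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand = sum(scores) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def levene_f(groups):
    """Levene-style F: one-way ANOVA on absolute deviations from each group mean."""
    abs_devs = []
    for g in groups:
        m = sum(g) / len(g)
        abs_devs.append([abs(x - m) for x in g])
    return one_way_f(abs_devs)

same_spread = levene_f([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
different_spread = levene_f([[1.0, 2.0, 3.0], [0.0, 10.0, 20.0]])
print(same_spread, different_spread)
```

Groups with the same spread give an F of 0 even though their means differ, because only the deviations enter the analysis.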

bimodal

distribution that has two values that have the highest frequency of scores

median split

dividing a distribution of scores into two equal groups by using the median score as the divider; scores above the median are the "high" group and scores below the median are the "low" group

When IVs are uncorrelated in a balanced design

each IV accounts for unique variation in the DV.

appropriate effect size measure for an ANOVA with more than two groups.

eta squared

outliers

extreme scores that are more than two standard deviations above or below the mean

Pairwise comparison

generally is any process of comparing entities in pairs to judge which of each entity is preferred, or has a greater amount of some quantitative property, or whether or not the two entities are identical.

population

group from which data are collected or a sample is selected; the population encompasses the entire group for which the data are alleged to apply

Type one error increases roughly to 8% instead of 5% when

group sizes are equal

How is Hartley's Test used?

in the analysis of variance to verify that different groups have a similar variance, an assumption needed for other statistical tests.

Adding a factor (IV) will often

increase power by reducing error, but we have to be concerned with unbalanced designs.

sample

individual or group, selected from a population, from whom or which data are collected

Define a Type I error.

is specified by alpha in advance of the study. It can be defined as the probability of rejecting the null hypothesis (saying there is an effect) when there is in fact no effect in the population.

What is a familywise Type I error rate?

is the type I error rate for the experiment as a whole, which includes all of the comparisons tested in the analysis. Separate, per-comparison probabilities actually combine to produce a much larger value, which we call familywise type I error. This refers to the probability that at least one type I error has been committed somewhere among the various tests conducted in the analysis.
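
How quickly the per-comparison probabilities combine can be sketched with the expression 1-(1-α)^c, which assumes the c comparisons are independent. The function name is hypothetical.

```python
def familywise_error_rate(alpha, c):
    """Probability of at least one Type I error across c independent comparisons."""
    return 1 - (1 - alpha) ** c

three_tests = familywise_error_rate(0.05, 3)   # e.g., all pairs among 3 groups
ten_tests = familywise_error_rate(0.05, 10)    # e.g., all pairs among 5 groups
print(round(three_tests, 4), round(ten_tests, 4))
```

With alpha at .05, three comparisons already push the familywise rate to about .14.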

Between df

k-1

Between groups df

k-1

μ

mean of a population

X (with a line over it)

mean of a sample

n or η (nu)

number of cases or scores in a sample

Ν

number of scores in a population

Simple effects are similar to

one-factor ANOVAs performed on a subset of the participants. Also know that since you have two factors, you can test the simple effects by splitting the design by whichever IV you prefer. (No need to perform both sets of analyses; choose the split that will best address your research question.)

Type 2 error is inversely related to

power: 1 - power = Type II error rate (beta)

As individual differences among subjects increase, what happens to power?

power decreases.

Type II errors are easier for researchers to make when

power is low

median

score in the distribution that marks the 50th percentile; 50% of the distribution falls below and 50% falls above

mode

score in the distribution that occurs most frequently

A treatment study where certain clients share the same therapist is an example of what?

study or data collection scenario where independence is violated.

∑X

sum of X; adding up all of the scores in a distribution

Levene's test

tests the null hypothesis that the group variances are equal (i.e., that there is homogeneity of variance)

The Method of Pairwise Comparisons was explicitly designed to satisfy

the fairness criterion called the Condorcet Criterion.

Condorcet Criterion addresses

the fairness of declaring a candidate the winner even though some other candidate won all possible head-to-head matchups.

If the null hypothesis is false, what does MSbetween quantify?

the groups will have different means in the population indicating that there is a treatment effect. The thing causing the means to be different is due to not only sampling error/noise, but also treatment effect

If group sizes are unequal, then the ANOVA is driven by the group with

the largest n. If that largest group also has the smallest standard deviation, the within group variability is reduced, increasing the type one error rate.

Conceptually, what does within-group sums of squares quantify?

the leftover variability in the dependent variable that is NOT due to the independent variable... this represents error/noise.

Define a Type II error.

the probability of failing to reject the null when there is in fact an effect in the population.

What is the alpha level, and what purpose does it serve in the significance testing process?

the researcher-designated significance level of the test. It is the probability level associated with the decision rule such that a found p-value greater than alpha means the data are consistent with the null, and a found p-value less than alpha results in rejection of the null.

∑

the sum of; to sum

Homogeneity of variance

the variability is the same within each group. This means that the standard deviations of each group must be the same.

Type 2 error can only occur when

there is a mean difference in the population, and when random chance yields a sample with small mean differences.

Goal of Dunnett's test

to compare a reference group to every other group. Similar to Tukey, but uses a smaller correction factor because it assumes fewer comparisons. If you're comparing intervention vs control this is the best to use.

Goal of Scheffe Test

to compare all possible pairwise comparisons and complex comparisons. It "over-corrects" by adjusting the p-value for too many comparisons: it gives maximum protection against Type I errors but makes it difficult to detect any group differences.

Goal of Tukey's HSD

to compare all possible pairwise comparisons. Keeps familywise type I error rate at .05. Most common procedure used in psych.

statistic

value derived from the data collected from a sample

skew

when a distribution of scores has a high number of scores clustered at one end of the distribution with relatively few scores spread out toward the other end of the distribution forming a tail

multimodal

when a distribution of scores has two or more values that have the highest frequency of scores

negative skewed

when most of the scores are clustered at the higher end of the distribution with a few scores creating a tail at the lower end of the distribution

positively skewed

when most of the scores are clustered at the lower end of the distribution with a few scores creating a tail at the higher end of the distribution

Type one error can only occur

when there is no mean difference in the population, and when random chance yields a sample with an extreme mean difference (p <.05)

Under what condition will violating homogeneity of variance lead to an increase in Type I errors?

when either (1) the group sample sizes are equal, or (2) we have unequal group sizes and the group with the largest n (largest group) has the smallest standard deviation. These conditions result in a within-group variability that is too small, giving us an F statistic that is too big and biasing the results towards a Type I error.

independence violations shrink

within group variability, in that each individual is not contributing one unit uniquely.

