PSY3062 Lecture Revision
In order to set up a planned contrast, which word replaces 'Polynomial' in the syntax?
'Special'
Which post-hoc test(s) is/are used when there are unequal sample sizes?
- Gabriel's (small discrepancy in n) - Hochberg's GT2 (large discrepancy in n)
What are the assumptions for a parametric test?
- Normality - Homogeneity of variance - Homoscedasticity - Independence
In a mixed model ANOVA, what does Error A measure? (i.e. SSw/in a)
- between-subjects error variability - measures differences between subjects in the same group - which includes individual differences - how much variance in Factor A it is reasonable to expect based on chance
Which two forms of output from SPSS can be used to assess normality VISUALLY?
- histogram - P-P plot
What are the assumptions for an INDEPENDENT SAMPLES t-test?
- independent observations - normality - homogeneity of variance - interval/ratio level measurement
What are some advantages of using a Two-Way repeated measures ANOVA, as opposed to a One-Way ANOVA?
- interaction effects - ecological validity - fewer studies
What are the assumptions for a regression model?
- no multicollinearity - independent errors (for any pair of observations, the error terms should be uncorrelated) - normally-distributed errors (the residuals should be random and normally distributed with a mean of zero) - homoscedasticity (for each value of the predictors the variance of the error term should be constant)
What are the 3 types of multiple regression studied in the semester?
- standard - hierarchical - stepwise
Which two forms of output from SPSS can be used to assess normality STATISTICALLY?
- tests for normality (Shapiro-Wilk, Kolmogorov-Smirnov) - test for skew/kurtosis
What are 3 solutions for dealing with normality violations?
- transformation - remove outliers - use a non-parametric method
For a partial eta squared, what is a small, medium, and large effect?
.01 = small .09 = medium .25 = large
If tolerance is less than ___ and/or VIF is greater than ___, there is an issue of multicollinearity
.1; 10
For Cohen's d, what is a small, medium, and large effect?
.2 = small .5 = medium .8 = large
A power of ___ or greater is considered acceptable
.80 (80% or more)
In order to perform a planned contrast using syntax, what do you need to add, following clicking 'Paste'?
/CONTRAST(IV) SPECIAL (insert contrast weights)
The sum of contrast weights in a planned comparison must be what?
0
A Cook's distance greater than ___ indicates that a case is having an overly influential effect on the regression model
1
In terms of outliers in a regression model, what percentage of your cases are expected to be outliers?
1%
According to Field (2013), what Mahalanobis distance value, would be problematic for the following sample sizes? 1. N=100 2. N=30 3. N=500
1. >15 2. >11 3. >25
What are the 9 steps in the research process?
1. Find a research idea: topic, and hypothesis 2. Define and measure variables 3. Identify participants 4. Research strategy 5. Research Design 6. Conduct the study 7. Evaluate data 8. Report the results 9. Refine and reformulate research idea
What are the 3 options in SPSS for entering data using a STEPWISE multiple regression?
1. Forward - searches for the best predictor and if it significantly improves regression model it gets added - process continues until adding a predictor does not significantly improve the regression model 2. Backward - searches for the worst predictor and if removing it does not significantly worsen the regression model, it gets removed - process continues until removing a predictor significantly worsens the regression model 3. Stepwise - same as forward, but each time a predictor is added, it also tests the removal of the worst predictor
What are the assumptions for a Pearson Correlation?
1. both variables measured on an interval or ratio scale 2. both variables should be normally distributed 3. relationship between the variables must be linear
What are some advantages of a within-subjects design?
1. control for individual differences 2. smaller sample size needed 3. eliminating variability = greater power
What are the 4 advantages of a factorial design?
1. enhances external validity 2. more efficient 3. can test for interactions between factors 4. can use to control systematic and non-systematic variance
If the assumption of linearity in a Pearson correlation is violated, what are two solutions listed by the slides?
1. if relationship is monotonic, use a Spearman correlation 2. try transforming the data to achieve linearity
What are some disadvantages of a between-subjects design?
1. large sample size required 2. individual differences = confounding variables 3. individual differences increase variability = decrease power
What are 3 reasons listed on the slides as to why you may achieve non-significant findings in a Mixed Design ANOVA?
1. may really be no difference between conditions 2. your study may have been poorly designed/controlled 3. your study may not have been powerful enough to detect a difference
What are the two reasons listed on the slides as to why correlation does not imply causation?
1. third variable problem 2. the directionality problem
What are some disadvantages of a within-subjects design?
1. time-related factors 2. testing/order effects
If the assumption of normality is violated in a Pearson correlation, what are two solutions listed by the slides?
1. use CLT if N>30 2. use a Spearman correlation
What are some advantages of a between-subjects design?
1. used for a wide range of research questions 2. each score is independent - not affected by practice, fatigue, or contrast effects
What are the two general situations in which a Spearman correlation is used?
1. when two ordinal variables exist (i.e. variables are ranked) 2. measures the consistency of direction of the relationship between two interval/ratio variables
95% of cases should have residuals within +/- ___ SD's
1.96
How many 'chunks' should be compared at any one time for a planned contrast?
2 chunks
99% of cases should have residuals within +/- ___ SD's
2.58
Durbin-Watson values greater than ____ indicate a _____ correlation, and values less than ____ indicate a ____ correlation. The rule of thumb is that values less than ___ or greater than ___ are a cause for concern
2; negative; 2; positive; 1; 3
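As a rough illustration of where the Durbin-Watson value comes from, here is a minimal sketch. It assumes the standard definition of the statistic (sum of squared differences between successive residuals, divided by the sum of squared residuals), which the notes do not spell out. The residual lists are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def is_cause_for_concern(d):
    """Rule of thumb from the notes: values below 1 or above 3 are a concern."""
    return d < 1 or d > 3


# Hypothetical residuals: neighbours that keep flipping sign push d above 2,
# while long runs of the same sign pull d below 2.
alternating = [1, -1, 1, -1, 1, -1]
runs = [1, 1, 1, -1, -1, -1]

d_alternating = durbin_watson(alternating)
d_runs = durbin_watson(runs)
```

A value close to 2 means adjacent residuals look uncorrelated.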
How many F-ratios will a Mixed Design ANOVA produce?
3 One for each main effect, and one for the interaction
How many F-ratios are there in a Two-Way ANOVA?
3 F-ratios
99.9% of cases should have residuals within +/- ___ SD's
3.29
How many F-ratios are there in a Three-Way ANOVA?
7
___ of cases should have residuals within +/- 1.96 SD's
95%
___ of cases should have residuals within +/- 2.58 SD's
99%
___ of cases should have residuals within +/- 3.29 SD's
99.9%
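The three cut-offs above can be checked against the standard normal distribution. This is a small sketch that assumes the residuals are standardised and normally distributed; `proportion_within` is a hypothetical helper name.

```python
from statistics import NormalDist


def proportion_within(z):
    """Proportion of a standard normal distribution lying within +/- z SDs."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


# Print the expected proportion of cases inside each cut-off.
for cutoff in (1.96, 2.58, 3.29):
    print(cutoff, round(proportion_within(cutoff), 3))
```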
In a Mixed Design ANOVA, if one wanted to run planned comparisons for the independent measures variable, what would need to be added/changed in the syntax screen?
A new line is added under WSFactor, and takes the form of /CONTRAST(IV) Special (contrast weights)
What is a correlation?
A statistical technique for measuring the extent to which two variables are RELATED. The standardised covariance between 2 variables
What is the difference between 'r^2' and 'adjusted r^2'?
Adjusted r^2 gives you an estimate of r^2 in the population - takes into account the fact that the regression model may not work as well with other sample data. It gives us an estimate of how much variability would be explained if the model was derived from the population rather than a sample
What is the easiest strategy when heterogeneity of variance is found?
Adopt a more stringent alpha level
The overall significance of the regression equation can be evaluated by computing what?
An F-ratio
When running a repeated-measures ANOVA, how is the data entered and what steps are taken?
Analyze --> General Linear Model --> Repeated Measures 1. Type in the name for the repeated measures factor 2. Specify the number of levels the factor has 3. Press Add 4. Type in the name of the dependent variable (spaces not allowed) 5. Press Add 6. Click Define and move on
In order to avoid a group from being assigned to a planned comparison, what process is carried out?
Assign it a value of 0
Why should we care about orthogonality in the sum of cross products?
Because if they are orthogonal, we don't need to adjust the alpha level for multiple comparisons, and thus reduce the risk of an inflation of Type I error rate
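A quick way to check a set of contrast weights by hand is sketched below. The weights are hypothetical (one control group and two treatment groups). The checks follow the two rules in these notes: weights within a contrast sum to 0, and the cross products of two orthogonal contrasts also sum to 0.

```python
# Hypothetical weights for three groups: control, treatment 1, treatment 2.
contrast_1 = [-2, 1, 1]   # control vs. both treatments
contrast_2 = [0, -1, 1]   # treatment 1 vs. treatment 2 (control excluded with 0)


def weights_sum_to_zero(weights):
    """A valid planned contrast has weights that sum to zero."""
    return sum(weights) == 0


def are_orthogonal(a, b):
    """Two contrasts are orthogonal when the sum of their cross products is zero."""
    return sum(x * y for x, y in zip(a, b)) == 0


print(weights_sum_to_zero(contrast_1), weights_sum_to_zero(contrast_2))
print(are_orthogonal(contrast_1, contrast_2))
```

With three groups this gives k - 1 = 2 orthogonal contrasts, matching the rule stated elsewhere in the notes.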
Why would Mauchly's test place an empty value (i.e. '.') under the Sig. column in a Two-Way ANOVA?
Because it requires there to be MORE than 2 levels of the variable
Why aren't the df all the same for the main effects and interactions in factorial ANOVA?
Because the data being compared is different for each main effect and interaction. So the df change according to what is being compared by that particular F-ratio
Which post-hoc test(s) is/are considered the 'safe option'?
Bonferroni
Which two tests can be used if homogeneity of variance is violated in an ANOVA? i.e. there is heterogeneity of variance
Brown-Forsythe, or the Welch F-ratio
If you wanted to test whether two Pearson correlation coefficients differ significantly, how could this be achieved?
Convert both correlations to Z-scores using Fisher's R-to-Z transformation. Then calculate a Z-score for the difference between the coefficients
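The two steps can be sketched as follows. This assumes the two correlations come from independent samples and uses the usual standard error for the difference, sqrt(1/(n1 - 3) + 1/(n2 - 3)), which is not given in the notes. `compare_correlations` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Return (z, two-tailed p) for the difference between two independent r values."""
    # Step 1: Fisher's r-to-z transformation (the inverse hyperbolic tangent).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Step 2: z-score for the difference between the transformed coefficients.
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: r = .60 in one sample and r = .20 in another, N = 103 each.
z, p = compare_correlations(0.60, 103, 0.20, 103)
```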
Which of the tests studied were measuring 'association'?
Correlations -Pearson -Spearman -Partial
What is the difference between eta squared and partial eta squared?
Eta --> is the sum of squares of the model/effect, over the sum of squares total Partial --> is the sum of squares of the model/effect, over the sum of squares model/effect, plus the sum of squares residual
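The difference is easiest to see with numbers. The sketch below uses made-up sums of squares for a two-factor design. Only the denominators differ between the two functions.

```python
def eta_squared(ss_effect, ss_total):
    """SS for the effect over SS total."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_residual):
    """SS for the effect over (SS effect + SS residual)."""
    return ss_effect / (ss_effect + ss_residual)


# Hypothetical sums of squares: the effect of interest, all other effects, and error.
ss_effect, ss_other_effects, ss_residual = 20.0, 30.0, 50.0
ss_total = ss_effect + ss_other_effects + ss_residual

eta = eta_squared(ss_effect, ss_total)
partial_eta = partial_eta_squared(ss_effect, ss_residual)
```

Because partial eta squared leaves the other effects out of the denominator, it is never smaller than eta squared for the same effect.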
What does a Factorial ANOVA do?
Examines the effects of each factor (IV) on its own, as well as the combined effects of the factors.
True or False; you can ask SPSS for the Welch F-ratio for a two-way independent measures ANOVA?
False It is asked for, when using a one-way independent measures ANOVA
True or false; eta squared is not a biased estimator?
False - overestimates true effect size in population
True or False; Simple effects analysis is still conducted when there is a non-significant interaction in a Mixed Design ANOVA
False - you only need to conduct further analyses if the interaction is significant
True or false; omega squared will give a larger value than eta squared?
False; it generally gives a smaller value - corrects for bias
True or false; in a regression model, most values should not be clustered around the regression line if we want the assumption of normally distributed errors to be satisfied
False; we want MOST of them clustered around the line
True or False: When using the K-S for normality, we want to reject the null hypothesis?
False; we want p to be greater than .05 - this means that there is no strong evidence that data vary from normal
What is the rationale behind choosing stepwise (forward) vs. stepwise (backward)?
Forward selection allows you to examine how much new variance is added by each predictor. Backward selection doesn't allow you to examine this
Which post-hoc test(s) is/are used when there are unequal variances?
Games-Howell (can be used with unequal n)
In a factorial design, are there more I.V's, or D.V's?
I.V's --> at least 2 or more
The predictor variable is the ___, and the outcome variable is the ___
I.V; D.V Remember IPOD (IV. Predictor. Outcome. DV.)
What does a significant F-ratio in a multiple regression mean?
It indicates that the regression model results in significantly better prediction of your DV than simply using the mean. It tells you that the model explains more variance than it leaves as error
In a mixed-model ANOVA, where does the interaction term for the repeated measures factor come from?
It is under SSwithin (SSaXb)
In a regression model, what does SSt relate to?
It uses the differences between the observed data and the mean value of Y
In terms of kurtosis, which form is tall and skinny?
Leptokurtic
Which statistical test is used to determine homogeneity of variance? When is the assumption NOT violated?
Levene's test of equality of variances When p>.05
_____ output can be used and interpreted in place of ANOVA output if the ____ or ____ assumptions are violated
MANOVA; sphericity; normality
What is the formula for the F-ratio?
MSm / MSr (or MSb / MSw)
MSr = ? MSm = ?
MSr = MSw MSm = MSb
How do you determine the threshold for Mahalanobis distance which suggests that a case is a multivariate outlier?
Mahalanobis distance is distributed as chi-square with degrees of freedom equal to the number of predictors (k)
Testing the null hypothesis that variances of between-level differences are equal, is performed by which test?
Mauchly's test; which we want to see p>.05
How is Cohen's d calculated?
Mean difference / standard deviation
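A minimal sketch for two independent groups is given below. It assumes the pooled standard deviation is used as the denominator, weighting each group's variance by its degrees of freedom. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical scores: both groups have a variance of 4, and the means differ by 1.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

Here the groups differ by half a pooled standard deviation, which the notes would call a medium effect.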
What is considered large for 'N', according to CLT?
N > 30
If you analysed the same data with both stepwise (forward) and stepwise (backward), wouldn't you end up with the same results?
No. Backward selection can assess suppressor effects which forward can't. So forward selection is more likely to miss predictors with suppressor effects and therefore runs a higher Type II error risk
For an ANOVA, what does the Sum of Squares total contain?
Only SSm (Sum of Squares model), and SSr (Sum of Squares residual)
What has the biggest impact on S.D?
Outliers
What is the name of the simple effects analysis used for a Factorial ANOVA?
Pillai's Trace
Which is more powerful; planned comparisons, or post-hoc tests? Why?
Planned comparisons; because they encourage you to have strong hypotheses before the analysis
In terms of kurtosis, which form is short and wide?
Platykurtic
In a Mixed Design ANOVA, if one wanted to run planned comparisons for the repeated measures variable, what would need to be added/changed in the syntax screen?
Polynomial --> Special (for the WSFactor)
Which post-hoc test(s) is/are used when assumptions are met and there are equal sample sizes
REGWQ or Tukey HSD
Which of the tests studied, were measuring 'prediction'?
Regression - multiple - hierarchical
How do you calculate variance?
SS / (N - 1)
What two components is SSw in a one-way repeated measures ANOVA, partitioned down into?
SSm and SSr
Which two SS's are listed under SSw for a repeated-measures ANOVA?
SSm and SSr
How is eta squared calculated?
SSm/SSt (SSb/SSt)
If, in a One-way ANOVA, one group is both larger and has greater variance than the other, what would happen to the SSr, and subsequently, the F-ratio? Which type of error is this?
SSr will become greater, and therefore the F-ratio will become smaller. Type II error
If, in a One-way ANOVA, one group is larger, but has smaller variance than the other, what would happen to the SSr, and subsequently, the F-ratio? Which type of error is this?
SSr will become smaller, and therefore the F-ratio will become greater. Type I error
What is used to calculate r^2?
SSregression / SStotal
In a One-way repeated measures ANOVA, what is used in place of SSr (seen in an independent measures)?
SSw (sums of squares within-groups)
What is the difference between the sample distribution, and the sampling distribution?
Sample --> the frequency distribution of the data in our sample (data collected) Sampling --> the frequency distribution of the means of random samples taken from the population (the distribution of means)
Which test of normality is considered more appropriate for smaller sample sizes?
Shapiro-Wilk
What does Mauchly's test, test?
Sphericity Tests the null hypothesis that variances of between level differences are equal
What is the SSm, or SSb?
Sum of squared differences between each group mean and the grand mean
What is the SSr?
Sum of squared differences between each score and the GROUP mean
What is the SSt (SS total)?
Sum of squared differences between each score and the grand mean
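The three sums of squares can be computed directly from their definitions. The sketch below uses a tiny hypothetical data set with three groups of three scores. It weights each group's squared deviation by the group size when computing SSm (the usual convention), and then forms the F-ratio as MSm / MSr with df of k - 1 and N - k.

```python
from statistics import mean

# Hypothetical scores for three independent groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
scores = [x for group in groups for x in group]
grand_mean = mean(scores)

# SSt: each score against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in scores)
# SSm (SSb): each group mean against the grand mean, weighted by group size.
ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# SSr (SSw): each score against its own group mean.
ss_residual = sum((x - mean(g)) ** 2 for g in groups for x in g)

k, n_total = len(groups), len(scores)
df_model = k - 1
df_residual = n_total - k
f_ratio = (ss_model / df_model) / (ss_residual / df_residual)
```

For these numbers SSt = SSm + SSr holds exactly, which is a useful sanity check on hand calculations.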
In a multiple regression, what does the notation b0 signify?
The Y-intercept
What is multicollinearity?
When two or more predictors are highly correlated with each other. In a regression model, we don't want this to occur (the assumption is that there is NO multicollinearity)
When covariance has been standardised, what is the name of the resulting statistic?
The correlation coefficient - or Pearson's r
What does Cohen's d tell us?
The degree of separation between two distributions
How does SSwithin for a Factorial ANOVA differ to that in a One-Way ANOVA?
There are factors for each variable, and an error term for each variable
What are the assumptions of a Two-Way ANOVA?
They are the same as for a One-Way ANOVA - interval/ratio data - homogeneity of variance - normality - independence
What do post-hoc tests do?
They compare each mean against all others
True or False: If the sample distribution is normal, then the sampling distribution will also probably be normal?
True
True or False; the same formula is used on both Pearson and Spearman correlations?
True - the only difference is that they are performed on ranked data instead of interval for the Spearman
Performing multiple T-tests increases the likelihood of what?
Type I error
How are tolerance and VIF related?
VIF = 1/tolerance tolerance = 1/VIF
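The relationship can be sketched in a few lines. This assumes tolerance is computed as 1 - R^2, where R^2 comes from regressing one predictor on the remaining predictors (consistent with the definition of tolerance in these notes). The cut-offs of .1 and 10 are the ones quoted above, and the R^2 values are hypothetical.

```python
def tolerance(r_squared_with_other_predictors):
    """Share of a predictor's variance NOT explained by the other predictors."""
    return 1 - r_squared_with_other_predictors


def vif(tolerance_value):
    """Variance Inflation Factor is the reciprocal of tolerance."""
    return 1 / tolerance_value


def has_multicollinearity_issue(tolerance_value):
    """Flag a predictor when tolerance < .1 and/or VIF > 10."""
    return tolerance_value < 0.1 or vif(tolerance_value) > 10


# Hypothetical predictors: one largely redundant, one fairly independent.
redundant = tolerance(0.95)
independent = tolerance(0.50)
```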
What does VIF stand for?
Variance Inflation Factor
Which is considered more powerful; Brown-Forsythe or Welch F-ratio?
Welch
When is a distributional test most important, but also most insensitive?
When 'N' is small
What is a monotonic relationship?
When one variable consistently increases (or consistently decreases) as the other increases, but the relationship is not necessarily linear
When do you conduct a power analysis?
When you have a negative/non-significant effect in a mixed design
When is a planned comparison utilised in place of a post-hoc test?
When you have generated specific/directional hypotheses
What does an interaction effect test?
Whether the effect of one factor is the same across ALL levels of another factor
When writing up an ANOVA, what do we start out with?
Writing up the omnibus test. Includes what test was used, F(B,W) = F-ratio, p<?, η² (eta squared) = ?
What is a zero-order correlation?
a correlation between two variables when you do NOT control for anything
Shrinkage is best evaluated using what kind of study?
a cross-validation study
What does a Leptokurtic tail refer to?
a heavy tail - positive kurtosis
What does a Platykurtic tail refer to?
a light tail - negative kurtosis
What is the difference between a partial and semi-partial correlation?
a partial controls for the impact of a third variable on both other variables, while a semi-partial controls for the impact of another variable on only 1 other variable
What is an Omnibus test?
a test in ANOVA that tells us if there is a difference somewhere in the means, but doesn't tell us exactly where
What is heteroscedasticity?
a violation of homoscedasticity; this means that at each level of the predictor the spread of the residuals is not the same
Post-hoc tests are ___ or ___
all; none
What is a multivariate outlier?
an unusual combination of scores across variables e.g. scoring high on depression and anxiety, but also on self-esteem
How do you report Sig=1.00, and why?
as p>.99; as saying p=1.00 implies that there is 100% certainty in results, which we never have
What is sphericity?
assumes the variances and covariances of differences between treatment levels are the same
If the overall ANOVA is significant, how many of the planned contrasts must also be significant?
at least 1
Outliers ___ the mean, and ___ standard error
bias; inflate
Sampling error will _____ as the sample size _____, and as the number of predictors _____
can be either: 1. increase; decrease; increase 2. decrease; increase; decrease
What does Winsorizing refer to?
changing an outlier's score
When performing a simple effects analysis using syntax, what are the two options that must be added, and to what row of the syntax?
compare (variable) adj (Bonferroni), to the line labelled, EMMEANS=TABLES (IV*IV)
How is Pearson's r calculated?
covariance(x, y) / (Sx * Sy)
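As a worked sketch, Pearson's r is the covariance of x and y divided by the product of their standard deviations. The helper names and data are hypothetical. The covariance uses N - 1 in the denominator so that it matches the sample standard deviation.

```python
from statistics import mean, stdev


def covariance(x, y):
    """Average cross product of deviations from each variable's mean (N - 1)."""
    mean_x, mean_y = mean(x), mean(y)
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross_products / (len(x) - 1)


def pearson_r(x, y):
    """Standardised covariance: cov(x, y) / (Sx * Sy)."""
    return covariance(x, y) / (stdev(x) * stdev(y))


# Hypothetical data with a perfect positive linear relationship.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = pearson_r(x, y)
```

Dividing by both standard deviations removes the units of measurement, so r always falls between -1 and +1.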
In a multiple regression model, how is the df for 'regression' calculated?
df(regression) = k Where k = the number of predictors
In a multiple regression model, how is the df for the 'residual' calculated?
df(residual) = N - k - 1 Where N = number of participants
In a multiple regression model, how is the df for 'total' calculated?
df(total) = N - 1 Or by adding df(regression) and df(residual) together
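The three df formulas fit together as shown in this short sketch. `regression_df` is a hypothetical helper and the sample size is made up.

```python
def regression_df(n_participants, n_predictors):
    """Return (df regression, df residual, df total) for a multiple regression."""
    df_regression = n_predictors
    df_residual = n_participants - n_predictors - 1
    df_total = n_participants - 1
    # The regression and residual df always add up to the total df.
    assert df_regression + df_residual == df_total
    return df_regression, df_residual, df_total


# Hypothetical study: 100 participants and 3 predictors.
dfs = regression_df(100, 3)
```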
What are the assumptions for a hierarchical multiple regression analysis?
exactly the same as a standard multiple regression analysis - outcome/predictor variables must be interval or ratio - non-zero variance - linearity - independence - no multicollinearity - homoscedasticity - independent errors - normally-distributed errors
True or false; In a Factorial ANOVA, parallel lines indicate an interaction between variables
false
Post-hoc tests control for inflated ____ error
family wise
Mahalanobis distance measures what?
how far each case is from the means of the predictor variables
Orthogonal contrasts are ____
independent
Simple effects analysis looks at the effects of one ____ at one level of the other ____, then repeats the process for all other levels
independent variable; independent variable
The ____ must be significant to use simple effects analysis
interaction
What is covariance?
it tells us how much scores on two variables vary together, i.e. whether their deviations from their respective means tend to go in the same direction
How is df(b) calculated for an ANOVA?
k-1
If group sizes are equal, then SSmodel can be divided into ___ components exactly
k-1
There will always be ____ orthogonal contrasts for k groups
k-1
Small effect sizes will be significant with ____ n
large
In terms of heterogeneity of variance, ____ group variances, associated with ____ sample sizes, tend to produce a CONSERVATIVE ____
large; large; F-statistic
In terms of heterogeneity of variance, ____ group variances associated with ____ sample sizes tend to produce a LIBERAL ____
large; small; F-statistic
A planned comparison is a ____ ____ of ____
linear; combination; means
How is df(w) calculated for an ANOVA?
N - k (total sample size minus the number of groups)
Which assumption is ANOVA relatively robust to violations of?
normality - only when sample sizes are reasonably large, and equal
If contrasts ARE orthogonal, what must be done to the alpha level
nothing; orthogonal contrasts do not inflate the Type I error rate
As the ____ approaches sample size (N), ____ approaches 1
number of predictors (k); r^2
What does the F-ratio tell us?
only that the direct or indirect manipulation was successful
Orthogonal contrasts are also known as what?
planned comparisons
What type of variance is used in Cohen's d? Why?
pooled variance because the SD will be different between groups
In terms of skew, a positive skew has scores bunched at which end of the distribution? What about a negative skew?
positive = bunched at low values negative = bunched at high values
The probability of detecting an effect if it is there, refers to what?
power
How is shrinkage calculated?
r^2 - adjusted r^2
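A small sketch of the calculation follows. The notes do not give a formula for adjusted r^2, so this assumes the commonly used adjustment, 1 - (1 - r^2)(N - 1)/(N - k - 1). The numbers are hypothetical.

```python
def adjusted_r_squared(r_squared, n_participants, n_predictors):
    """Assumed adjustment: 1 - (1 - R^2)(N - 1) / (N - k - 1)."""
    return 1 - (1 - r_squared) * (n_participants - 1) / (
        n_participants - n_predictors - 1
    )


def shrinkage(r_squared, n_participants, n_predictors):
    """Shrinkage is r^2 minus adjusted r^2."""
    return r_squared - adjusted_r_squared(r_squared, n_participants, n_predictors)


# Hypothetical model: R^2 = .50 with 2 predictors, in a small and a large sample.
small_sample = shrinkage(0.50, 21, 2)
large_sample = shrinkage(0.50, 201, 2)
```

Under this formula, shrinkage gets smaller as N grows relative to the number of predictors.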
When performing a data transformation, what would a reciprocal transformation (1/x) be used for?
reducing influence of large scores and stabilize variance
When performing a data transformation, what would a square root transformation (sqrt[x]) be used for?
reducing positive skew and stabilizing variance
When performing a data transformation, what would a log transformation (log[x]) be used for?
reducing positive skew, and stabilizing variance
MANOVA tests are more ____ than ANOVA to assumption violations, but they are less ____
robust; powerful
Failure to replicate is called what?
shrinkage
When you find a significant interaction between two or more variables, ____ can be used to examine that interaction
simple effects analysis
To convert skewness and/or kurtosis into a z-score, what calculation is performed?
skewness/standard error kurtosis/standard error
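The conversion is a single division, sketched below. Comparing the result to +/- 1.96 is an assumption borrowed from the usual z cut-off for p < .05, and the skewness and standard error values are hypothetical.

```python
def to_z_score(statistic, standard_error):
    """Convert a skewness or kurtosis value to a z-score."""
    return statistic / standard_error


def exceeds_cutoff(z, cutoff=1.96):
    """True when |z| is beyond the chosen cut-off (1.96 is assumed here)."""
    return abs(z) > cutoff


# Hypothetical SPSS output: skewness = 0.90 with a standard error of 0.30.
z_skew = to_z_score(0.90, 0.30)
flagged = exceeds_cutoff(z_skew)
```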
What is a small, medium, and large effect size for omega squared?
small = .01 medium = .06 large = .14
How does a hierarchical multiple regression differ to a standard multiple regression?
standard - all predictors are entered simultaneously hierarchical - researcher decides the order in which the predictors are entered into the model
How does a stepwise multiple regression differ to a standard multiple regression?
standard - all predictors are entered simultaneously stepwise - predictors are selected (by the computer) on the basis of their semi-partial correlation with the outcome variable
What is the solution used due to covariance being dependent upon the units of measurement of each variable?
standardise them - divide by the standard deviations of both variables
When performing a hierarchical regression, what aspect of the output is examined to identify whether there is a change following the addition of secondary variables?
the 'R square' column
What does kurtosis refer to?
the 'heaviness' of the tail
Which of the tests studied were measuring 'mean differences'?
the ANOVA's - one-way - two-way - repeated measures - independent - mixed model Post-hoc tests Planned comparisons Simple Effects Analyses
Which Epsilon value is recommended to be used if Mauchly's test is violated?
the Greenhouse-Geisser value (as long as it is <.75) --> if this is not the case, use Huynh-Feldt
What is tolerance?
the amount of variability in a predictor which is not explained by the other predictor(s)
What does the assumption of normality refer to?
the assumption that the SAMPLING DISTRIBUTION is normal, not necessarily the SAMPLE DISTRIBUTION
What is homoscedasticity?
the assumption that the variances are equal across all levels of a factor
Which table is analysed in SPSS to determine if there are outliers in a regression model?
the casewise diagnostics table
Which 'chunk' should be your reference when conducting a planned comparison?
the control group
In order to determine the direction of the effect in a Factorial ANOVA, which aspect of the output would one analyse?
the estimated marginal means graph
What does the Durbin-Watson test examine?
the independence of scores
What is the 'technical term' for overall means?
the marginal means
What is the trade-off as a result of using planned comparisons in ANOVA, due to their high level of conservatism?
the need for a larger effect size to reach significance
If your maximum Mahalanobis distance is greater than the critical value, what does this suggest?
the presence of one or more multivariate outliers
The first subscript of a particular value refers to ___, while the second subscript refers to ___
the row that the value is in; the column the value is in
When performing a hierarchical multiple regression, which variables are normally entered first?
the theoretically important, or known predictors
What do tolerance and VIF measure?
they examine the issue of multicollinearity
True or False; you cannot carry out a post-hoc test if your main effect is not significant
true
True or false; post-hoc tests use a stricter criterion to accept an effect as significant?
true
What does 'factorial' mean?
two or more factors
The F-ratio compares the ____ predicted by the ____ with the ____ not predicted by the _____
variance; regression model (MSm); variance; model (MSr)
What are suppressor effects in a multiple regression?
when a predictor only has a significant effect when another variable is held constant
What does Cook's distance measure?
whether a single case in the regression model is having an influence
How is the new df calculated if you are using the Greenhouse-Geisser value?
you multiply your original df by the epsilon values
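For example, the correction can be sketched as below. `corrected_df` is a hypothetical helper, and the df and epsilon values are made up.

```python
def corrected_df(df_effect, df_error, epsilon):
    """Multiply both the effect df and the error df by the epsilon estimate."""
    return df_effect * epsilon, df_error * epsilon


# Hypothetical repeated-measures effect: 3 levels and 10 participants gives
# df = (2, 18); suppose the Greenhouse-Geisser epsilon is 0.60.
df_effect_gg, df_error_gg = corrected_df(2, 18, 0.60)
```

The F-ratio itself is unchanged. Only the df used to evaluate its significance are reduced.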
How is the Bonferroni statistic calculated?
your alpha level / number of tests
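A short sketch of the correction is given below, alongside the familywise error rate it is meant to control. The familywise formula, 1 - (1 - alpha)^m, assumes the tests are independent and is not stated in the notes. The function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha: the overall alpha level divided by the number of tests."""
    return alpha / n_tests


def familywise_error_rate(alpha_per_test, n_tests):
    """Chance of at least one Type I error across n independent tests (assumed)."""
    return 1 - (1 - alpha_per_test) ** n_tests


# Five comparisons at an overall alpha of .05.
uncorrected = familywise_error_rate(0.05, 5)
corrected = familywise_error_rate(bonferroni_alpha(0.05, 5), 5)
```

Running each of the five tests at the uncorrected alpha inflates the familywise rate well above .05, while the corrected per-test alpha keeps it just under .05.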