All flash cards for PSY3062
what are the steps for conducting a planned comparison - independent measures?
Assign contrast weights to each group - signs of the weights and magnitudes (numbers). The weights must sum to zero. When continuing on to the next contrast, any group that is not involved (e.g. the control group) gets a weight of 0. In SPSS: Analyse > Compare Means > One-Way ANOVA. Options > place a tick next to Descriptives, Homogeneity of variance test, Brown-Forsythe, Welch and Means plot. Contrasts > assign coefficients - they must be in the same order as the groups are coded in the data, and the coefficient total should equal 0. Next > add the coefficients for the second contrast. Press OK
Post hoc tests
No specific hypothesis at the outset. Uses a stringent (conservative) α per pairwise comparison for each test to control the family-wise type 1 error rate. To use a post hoc test properly you must have a significant ANOVA.
SS between subjects
SS between subjects = Σ(P²/k) − G²/N, where P = total of the scores for each participant, k = number of treatment levels, G = grand total of all scores, N = total number of scores
T-tests
Parametric, tests the difference between two sample means.
what are planned comparisons?
Planned comparisons (a priori contrasts): specific hypotheses at the outset. They break down the treatment variance (model variance) into component parts. Planned comparisons are conducted when we have specific hypotheses about differences among the group means before running the ANOVA. You set up your comparisons at the same time as the ANOVA. We break the treatment variance down into component variance according to the hypotheses made before the experiment. Provided the hypotheses are independent of one another, the family-wise type 1 error rate will be controlled
keep contrasts independent
a. Contrasts should be orthogonal to each other b. Each unique chunk of variance can only be compared once - otherwise the type 1 error rate will be inflated c. There are k-1 possible orthogonal contrasts (k = number of groups): if you follow this rule and always compare only two pieces of variance, you will always end up with one less contrast than the number of groups (k-1)
sphericity
assumes that the variances (and covariances) of the differences between the levels are approximately the same: if we calculated difference scores for all possible pairs of levels, their variances would be approximately equal. It is a more general, less restrictive condition than compound symmetry. Because sphericity is an assumption, we want the test of it (Mauchly's test) to be non-significant
Post hoc recommendations
assumptions met and equal sample sizes: use REGWQ or Tukey HSD. Safe option: Bonferroni (although quite conservative). Unequal sample sizes: Gabriel's (small discrepancy in n) or Hochberg's GT2 (large discrepancy in n). Unequal variances: Games-Howell
model sum of squares SSm
broken into three components: SSA, SSB and SSAxB. We can calculate SSAxB by subtraction: SSAxB = SSM - SSA - SSB
using z-scores to find outliers
by converting our data to z-scores we can use benchmarks to search for outliers. Go to Analyse > Descriptive Statistics > Descriptives, select the variable to convert in the dialog box and tick 'Save standardized values'. SPSS will then create a new variable in the data editor. To look for outliers we can count how many z-scores fall within certain important limits: in a normal distribution we would expect about 5% to have absolute values greater than 1.96, 1% to have absolute values greater than 2.58, and none to be greater than about 3.29
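A rough Python sketch of the same z-score check, outside SPSS (all numbers are made up for illustration):
import numpy as np

rng = np.random.default_rng(0)
scores = np.append(rng.normal(50, 10, size=200), 130.0)  # hypothetical data with one extreme score
z = (scores - scores.mean()) / scores.std(ddof=1)        # the standardised values SPSS would save
print(np.mean(np.abs(z) > 1.96))   # about 5% expected in a normal distribution
print(np.mean(np.abs(z) > 2.58))   # about 1% expected
print(scores[np.abs(z) > 3.29])    # scores this extreme are treated as outliers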
the residual sum of squares, SSr
calculated the same way as in one-way ANOVA: SS1 + SS2 + SS3 etc. It represents individual differences in performance, or the variance that can't be explained by factors that were systematically manipulated
reciprocal transform: 1/x
can reduce the influence of large scores and stabilise variance; it reverses the order of the scores, so you may need to reverse the scores before transforming
square root transform: sqrt (x)
can reduce positive skew and stabilise variance
transform the data
can sometimes resolve distributional problems, outliers and unequal variances. It involves applying a mathematical function to all data points. You must exercise caution when using data transformations, as they change the hypothesis tested - e.g. after a log transformation, comparing means changes from comparing arithmetic means to comparing geometric means
how many levels?
we can use a numerical code to designate the number of levels of each factor. 2 x 3 means that there are 2 levels of factor A and 3 levels of factor B - because there are two numbers, and therefore two factors, it is a two-way design. 2 x 3 x 4 means there are 2 levels of factor A, 3 levels of factor B, and 4 levels of factor C - because there are three numbers, and therefore three factors, it is a three-way design
what do we use to assess the sampling distribution?
central limit theorem
output of two-way repeated anova
descriptive statistics - look at the std. deviations and note whether they are similar across the different levels (this can be a way of identifying extreme outliers). We use the multivariate tests when sphericity is violated and for some reason we can't use the adjusted tests such as Huynh-Feldt (perhaps they are too conservative and you want to use more powerful tests). If the assumption of sphericity is not violated we read the 'sphericity assumed' row
obtaining good data
designing research - do not alter or embellish standard scales without clearly indicating the changes. collecting the data - participate actively in the data collection; check to ensure that research instruments have been completed without omissions. coding and analysing the data - check the scoring key against the survey instrument; check for reverse-scored questions; recode so that all items purporting to measure the same characteristic are scored in the same direction; transform scale and item values to be consistent with the scoring key; know your measurement instruments (what populations have they been used on); weight scales according to standard scoring procedures; assign unique numbers to missing values to distinguish them from other responses; check the order of data input; make sure data sets are comparable before combining them into a larger sample. describing the data - label compulsively (data should have clear and permanent labels); do not discard raw data
why factorial design?
enhances external validity: by not limiting the study to examining the influence of only one IV on the DV, you can improve the generalisability of the study's findings. more efficient: we can use fewer participants than if we did separate studies on gender/depression scores and diagnosis/depression scores. Whilst factorial designs require more participants and resources than a single-factor study, compared to conducting several single-factor studies to thoroughly examine a particular DV, conducting one factorial design will be more efficient and provide more information. tests for interactions between factors: factorial designs can not only examine the effects of each IV on its own, but also the combined effects of the IVs - the interaction between factors. used to control systematic and non-systematic variance: including participant characteristics such as age or gender as IVs can reduce the error variance of the analysis, which makes the statistical analysis more powerful. Including additional variables also enables you to test whether these variables are acting as confounding variables
parameter estimation with confidence intervals
if the null hypothesis value is outside the 95% confidence interval, we can reject the null; it also gives you an idea of how precise your study is
Central limit theorem
if our N is large enough, then the sampling distribution will be approximately normal even if our sample distribution or population is non-normal (a large N is >30-40 people). If our N is small, we rely on our sample: if the sample distribution is normal, then the sampling distribution will also probably be normal. If we can't assume normality, use a non-parametric test
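An illustrative Python simulation of the central limit theorem (all values hypothetical): the population below is strongly skewed, yet the distribution of means from samples of N = 40 is roughly symmetric.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=100_000)                          # heavily skewed population
sample_means = [rng.choice(population, size=40).mean() for _ in range(5_000)]  # means of many samples of N = 40
print(stats.skew(population), stats.skew(sample_means))                        # large skew vs. near-zero skew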
spotting linearity
if the graph funnels out, then the chances are that there is heteroscedasticity in the data if there is any sort of curve in this graph then the chances are that the data has broken the assumption of linearity
when do we use the Greenhouse-Geisser correction?
if the Greenhouse-Geisser estimate of sphericity (epsilon) is less than 0.75 we use this correction - it is a conservative method
how can box plots tell us whether the distribution is symmetrical or skewed?
if the whiskers are the same length then the distribution is symmetrical if the top or bottom whisker is much longer than the opposite whisker then the distribution is asymmetrical
including references for assumption violations
if you have assumption violations or outliers that need to be dealt with, you need to state how you have dealt with them, and you should cite a reference to support what you have done
orthogonal means...
independent; therefore orthogonal contrasts are contrasts which all tap into different things - no redundancy, they are unique chunks of variance
Cronbach's alpha
a measure of internal consistency. It is used to assess the extent to which a set of questionnaire items tapping a single underlying construct covary. Values above 0.7 are considered acceptable.
what are the methods of follow up for ANOVA?
multiple t-tests? Not a good idea - it increases our type 1 error. We could do orthogonal contrasts/comparisons - this is what we do when we have a hypothesis. Post hoc tests - use when you DO NOT have a planned hypothesis; they compare all pairs of means, and to use post hoc tests properly we have to have a SIGNIFICANT ANOVA. Trend analysis - if we believe the means follow a particular shape
non-orthogonal comparisons
non-orthogonal comparisons are comparisons that are in some way related. Using a cake analogy, non-orthogonal comparisons are where you slice up your cake and then try to stick slices of cake back together again. The comparisons are related, so the resulting test statistics and p-values will be correlated to some extent; for this reason you should use a more conservative probability level to accept that a given contrast is statistically meaningful
non-orthogonal comparisons
non-orthogonal contrasts are comparisons that are in some way related. Using a cake analogy, they are where you slice up your cake and then try to stick slices of cake back together again - e.g. using the placebo group in the first contrast and then re-using it. If you multiply and add the codings from the two contrasts, the sum of products is not zero - the contrasts are not orthogonal. With non-orthogonal contrasts the comparisons you do are related, and so the resulting test statistics and p-values will be correlated to some extent; for this reason you should use a more conservative probability level to accept that a given contrast is statistically meaningful
parametric assumptions
normally distributed: bell curve, no marked skewness or kurtosis, no outliers; applies to the sampling distribution/residuals. homogeneity of variance and homoscedasticity. independence
general equation for a linear model
outcome = model + error, i.e. outcome = (bX) + error. The model is described by one or more predictor variables (the X in the equation) and parameters (the b) that tell us something about the relationship between the predictor and the outcome variable. The model will not predict the outcome perfectly, so for each observation there will be some error
________design is more powerful than_________design because they remove individual differences
repeated measures; independent measures
SPSS column (data view)
represents each variable
MSr
represents the average amount of variation explained by extraneous variables- the unsystematic variation
mean squares MSm
represents the average amount of variation explained by the model- systematic variation
SPSS column (variable view)
represents the different options that you can select for each variable
within-subjects design
the same participants take part in each condition - one sample takes part in every level of the IV; also known as repeated measures
the effects of kurtosis seem unaffected by whether _______
sample seizes are equal or not
effect size
even small effects will be significant with a large n - the larger the sample size, the more likely you are to find an effect. Effect sizes indicate how big that effect is, and allow comparisons across contexts
In big samples__________ deviations from sphericity can be significant, and in small samples ________violations can be non-significant
small; large
Bonferroni has more power when the number of comparisons is_____, whereas Tukey is more powerful when testing______
small; large number of means
one-way ANOVA
a statistical technique for comparing several means at once. It tests the null hypothesis that the means for all populations of interest are equal. We call it the omnibus test - it tells us that somewhere there is an inequality, but it doesn't tell us where this difference actually is. This is why we need a follow-up to work out where the difference is - there are two approaches: post hoc tests or planned comparisons/contrasts
what does 'reward > (punishment + indifference)/2' mean?
that the reward mean will be greater than the average of the punishment and indifference means
if the f-ratio is greater than 1 it indicates______
that the experimental manipulation had some effect above the effect of individual differences in performance
kolmogorov-Smirnov and shapiro-wilk test
they compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation. Analyse > Descriptive Statistics > Explore
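Outside SPSS, equivalent tests are available in scipy; a small sketch with made-up data:
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=50, scale=10, size=60)                       # hypothetical sample
print(stats.shapiro(x))                                         # Shapiro-Wilk statistic and p-value
print(stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))))  # K-S against a normal with the sample mean/SD (approximate when parameters are estimated)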
additivity and linearity
this assumption means that the outcome variable is linearly related to any predictors, and if you have several predictors then their combined effect is best described by adding their effects together. This assumption is the most important because if it is not true then, even if all other assumptions are met, your model is invalid because you have described it incorrectly
trim the data
this means deleting some scores from the extremes; it could mean deleting the data from the person who contributed the outlier. Trimming involves removing extreme scores using one of two rules: a percentage-based rule or a standard-deviation-based rule.
leptokurtic distributions make the type 1 error rate ______ and consequently the power is______
too low; too high
SStotal
total variance in the data sum of squared differences between each score and the grand mean
transforming data
transforming the data can sometimes resolve distributional problems involves applying a mathematical function to all data-points transformations can hinder rather than help if the wrong transformation is applied and the interpretation also becomes more difficult
how do we reduce bias?
trim the data winsorizing analyse with robust methods transform the data
when sphericity is definitely not violated ___________can be used
tukey's test
factorial
two or more IVs
reject the null hypothesis when it is true
type 1 error (a)
accept the null hypothesis when it is false
type 2 error (b)
_______ variances create a bias and inconsistency in the estimate of the standard error associated with the parameter estimates
unequal
Scale (interval/ratio data)
values represent ordered categories with a meaningful metric. this means that distance comparisons between values are appropriate e.g. height, weight, years, dollars etc.
SSmodel or SS between
variance explained by the model (the groups): the sum of squared differences between each group mean and the grand mean, weighted by group n. The more spread out the group means, the greater SS between will be. SS between = SS total - SS within treatments. Between groups = our model = SS model = SS between
SSresidual or SSwithin
variance that cannot be explained by the model (the groups): the sum of squared differences between each score and its group mean. SS within treatments = SS1 + SS2 + SS3 etc. Within groups = our error = SS residual = SS within
how do we assess normality
visually: using histograms and p-p plots statistically- tests for normality, test for skew/kurtosis
underlying theory of between-subjects ANOVA
we calculate how much total variability there is between scores (the total sum of squares); we then calculate how much of this variability can be explained by the model we fit to the data (the model sum of squares) and how much cannot be explained - how much variability is due to individual differences or error (the residual sum of squares)
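A small worked Python sketch (made-up scores, three groups of three) showing that the total sum of squares splits exactly into model and residual parts:
import numpy as np

groups = [np.array([3., 4., 5.]), np.array([6., 7., 8.]), np.array([9., 10., 11.])]  # hypothetical groups
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between groups
ss_residual = sum(((g - g.mean()) ** 2).sum() for g in groups)          # within groups
print(ss_total, ss_model + ss_residual)                                 # 60.0 and 60.0: the partition adds up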
underlying theory of a two way ANOVA
we still find the SSt and break this variance down into variance that can be explained by the experiment (SSM) and variance that cannot be explained (SSR) in two way ANOVA, the variance explained by the experiment is made up of not one experimental manipulation but two therefore we break the model sum of squares down into variance explained by the first IV (SSA), second IV (SSB) and variance explained by the interaction of these two variables (SSA X SSB)
ANOVA notation
we use subscript notation to represent and locate individual scores in an experiment single subject notation Xi the ith score in a set
confidence intervals
we use values of the standard normal distribution to compute the confidence interval around a parameter estimate. Using the values of the standard normal distribution makes sense only if the parameter estimate's sampling distribution is actually normal
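A minimal Python sketch of a 95% confidence interval built from the standard normal value 1.96 (hypothetical data):
import numpy as np

scores = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.3])   # hypothetical sample
mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(len(scores))            # standard error of the mean
print(mean - 1.96 * se, mean + 1.96 * se)                 # only sensible if the sampling distribution is normal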
Are planned comparisons more powerful than post-hoc tests?
yes
planned contrasts
you can do planned contrasts for a two-way design, but only for the main effects; you can run contrasts involving one IV at a time if you have predictions to test.
non-significance
you might obtain non-significant findings for the following reasons: there may really be no difference between conditions; your study may have been poorly designed/controlled, which resulted in the lack of a difference or a large error term; or your study may not have been powerful enough to detect a difference - e.g. because your sample size is too small. It is important to note that non-significant results are not necessarily indicative of a failed experiment; a non-significant result can be a very interesting and important finding
dealing with outliers
you should check your data for outliers: convert all scores into z-scores and if any have an absolute value of z > 3.29 (p < 0.001) they are outliers. You could also change the score (winsorizing) - change the score to the next highest score + 1, convert the score to the value expected for a z-score of 3.29, or convert the score to the mean plus 2 or 3 standard deviations
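A rough Python sketch of the 'change the score' options above (hypothetical data; not the SPSS procedure itself):
import numpy as np

rng = np.random.default_rng(3)
scores = np.append(rng.normal(50, 5, size=60), 120.0)                          # 120 is the hypothetical outlier
z = (scores - scores.mean()) / scores.std(ddof=1)
is_outlier = np.abs(z) > 3.29
next_highest = scores[~is_outlier].max()
winsorized = np.where(is_outlier, next_highest + 1, scores)                    # next highest score + 1
capped = np.where(is_outlier, scores.mean() + 3 * scores.std(ddof=1), scores)  # or mean + 3 SD
print(scores.max(), winsorized.max(), capped.max())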
contrasts in SPSS
• For one-way ANOVA, SPSS has a procedure for entering codes that define the contrasts we want to do. However, for two-way ANOVA no such facility exists and instead we are restricted to doing one of several standard contrasts. • We can use standard contrasts • In reality, most of the time we want contrasts for our interaction term, and they can be obtained only through syntax. • To get contrasts for the main effect of factor 1, click on contrast in the main dialog box.
why can we use planned comparisons instead of post hoc tests?
• Post-hoc tests control for inflated family-wise error o They make the critical α smaller to account for type 1 error inflation due to multiple comparisons o This is good → it keeps our experiment-wise (family-wise) error at α o It is conservative - it reduces our power and makes it harder to find our effect of interest o It means we require larger effect sizes to reach significance • This may be too conservative, especially if our sample size is small and/or we have many groups or levels to compare
using syntax to do simple effects analysis
• Set up the analysis as normal, but when you're ready to run it, instead of pressing OK, press Paste - which will paste the analysis into a new syntax file • Type an extra command in your syntax file e.g. /EMMEANS=TABLES (GENDER*ALCOHOL) COMPARE (ALCOHOL) ADJ (BONFERRONI) • COMPARE(ALCOHOL) tells SPSS to examine the interaction by comparing the different alcohol levels one gender at a time • ADJ(BONFERRONI) applies a Bonferroni adjustment to these post-hoc comparisons between alcohol levels within each gender.
custom contrasts
• Syntax is like the underlying code to SPSS • You can generate syntax for any analysis using the paste button o Number of benefits you can see how syntax works or save as a syntax file • Looks something like this→
heterogeneity and sample size
• Unequal sample sizes and heterogeneity do not mix • Gamst et al. (2008) provide further information about this situation: • Heterogeneity can become a serious issue with unequal sample sizes. For example, large group variances associated with small sample sizes tend to produce a liberal F statistic whose nominal alpha level such as .05 is actually operating at a less stringent level such as .10. • Conversely, when the situation arises where large sample sizes and large group variances are associated, we produce a conservative F statistic, where the nominal alpha level such as .05 is actually more stringent, for example, .01 (Stevens, 2002). (p. 58)
leptokurtic
+ heavy tails
power analysis in SPSS
- Ask SPSS to run (post hoc) power analyses using the sample size and an effect size estimate based on your sample data - Click on the Options button and place a tick next to Observed Power - a power of 0.80 or greater is considered acceptable
why p >0.99
- The logic is that reporting a p-value as p = 1.00 implies 100% certainty, which we never actually have. - Since SPSS only reports to 3 decimal places, it fails to capture the fact that the p-value may have been just below 1. - On the other hand, reporting a p-value as p > 0.99 better communicates the fact that the p-value was extremely large, without also implying 100% certainty in the outcome - Just as there is no p = 0.000, there is no p = 1.000
tips for choosing contrasts
1. use your control group as a reference 2. only compare two chunks at a time 3. keep contrasts independent
interaction effect
An interaction effect examines two or more factors at the same time - their combined effect, which may not be predictable based on the effects of either factor on its own.
when is a liberal F statistic produced?
Large group variances associated with small sample sizes tend to produce liberal F statistic - more likely to find a significant difference when there isn't one
when is a conservative F statistic produced?
Large groups variances associated with large sample tend to produce conservative F statistic - more likely not to find a significant difference when one does exist.
main effect
Main effect is the effect of one factor (IV) on its own
do we perform both planned comparisons and a post hoc?
No, not both. Use planned comparisons if you have a priori hypotheses; use post hoc tests if you do not
MS between
SSbetween/df between
box plots
a boxplot is a graphical representation of a five-number summary; 50% of the data is contained within the interquartile range (IQR); at the top and bottom of the box plot are two whiskers, which show the top and bottom 25% of scores
the scatterplot
a graph that plots each person's score on one variable against their score on another it tells us whether there seems to be a relationship between the variables, what kind of relationship it is and whether any cases are markedly different from others.
how do we run a One-way repeated measures ANOVA in SPSS
analyse- general linear model- repeated measures
outliers and histograms
check the tails of the distribution to see if data points fall away at the extremes
sphericity is sometimes referred to as ________, is a more general condition of ___________
circularity; compound symmetry
between-subjects
subjects are compared between each other; independent measures - different participants take part in each condition
the ANOVA model
we compare the amount of variability explained by the model to the error in the model (our model is our groups in ANOVA). If the model explains much more variability than it leaves unexplained (the model variance is much greater than the residual variance), then the experimental manipulation had a significant effect on the outcome (DV). The larger F is, the less likely it is to be due to sampling error
if the standard error is biased then the _______will be too because it is based on the standard error
confidence interval
null hypothesis significance testing
develop an alternative hypothesis H1 (the hypothesis we are interested in) identify the null hypothesis H0 (no change/no significant difference) test the null hypothesis if the data are unlikely under the null (p<0.05), reject H0
the f-ratio ______ tell us whether the F-ratio is large enough to not be a chance result
does not. To discover this we can compare the obtained value of F against the maximum value we would expect to get by chance if the group means were equal, in an F-distribution with the same degrees of freedom. If the value we obtain exceeds this critical value, we can be confident that it reflects an effect of our independent variable
analyse with robust methods
e.g. trimmed means, bootstrapping; methods that don't rely on normality assumptions
the f-ratios
each effect in a two way ANOVA has its own f-ratio to calculate these we have to first calculate the MS for each effect by taking the SS and dividing by the respective degrees of freedom
SPSS row (data view)
each row represents one case or participant
SPSS row (variable view)
each row represents one variable
compound symmetry holds true when both the variances across conditions and the covariances between pairs of conditions are
equal
what does a factorial ANOVA do?
examines the effect of each factor (IV) on its own, as well as the combined effect of the factors. A main effect is the effect of one factor (IV) on its own. An interaction effect examines two or more factors at the same time - the combined effect, which may not be predictable based on the effects of either factor on their own. Thus, factorial ANOVAs produce multiple F-ratios: one for each main effect and interaction term. Knowing about one F-ratio tells you nothing about the others - each term stands by itself
outliers
extreme scores outliers bias the mean and inflate the standard error
summary of factorial designs
factorial design- examining the impact of two or more factors on a particular dependent variable
stacked histogram
good way to compare the relative frequency of scores across groups
what are the ways to detect outliers?
histograms and box plots descriptive statistics analysis - z-scores in excess of 3.29 (p<0.001)
why two way repeated measures?
more power - the chance of detecting an effect is higher because individual differences are eliminated; fewer participants are needed
histogram
plots a single variable (x-axis) against the frequency of scores (y-axis)
frequency polygon
same data as the simple histogram, except that it uses a line instead of bars to show the frequency
cohen's d
tells us the degree of separation between two distributions- how far apart, in standard deviation units, are the means 0.2 small 0.5 medium 0.8 large
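A quick Python sketch of Cohen's d using a pooled standard deviation (all summary statistics are made up):
import numpy as np

m1, m2 = 24.0, 20.0          # hypothetical group means
sd1, sd2 = 5.0, 6.0          # hypothetical group SDs
n1, n2 = 30, 30
pooled_sd = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
print((m1 - m2) / pooled_sd)  # about 0.72: a medium-to-large effect by the benchmarks above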
platykurtic distributions make the type 1 error rate ______ and consequently the power is______
too high; too low
if a test is conservative (the probability of a _____ error is small) then it is likely to lack statistical power (the probability of a _____error will be high)
type 1; type 2
the assumption of normality
we assume that the sampling distribution is normal, not necessarily the sample distribution
when should we use REGWQ test?
when we want to test all pairs of means
SPSS contrasts
• SPSS has a standard set of built-in contrasts which it can perform - these can be useful but more complex • Note: not all of them are orthogonal
planned comparisons
/WSFACTOR = time 3 SPECIAL (1 1 1, 2 -1 -1, 0 0 0): we replace POLYNOMIAL with SPECIAL () and insert the contrast weights in brackets. For group we add a new line after this which takes the form /CONTRAST (group) = SPECIAL () and insert the contrast weights in brackets
skewness and kurtosis - SPSS
0 indicates a normal distribution. We convert the value to a z-score (±1.96 corresponds to p < .05). To compute the z-score: statistic / standard error, e.g. skewness value / its standard error
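The same calculation as a two-line Python sketch (values are hypothetical figures read off an SPSS descriptives table):
skewness, se_skewness = 0.76, 0.34   # hypothetical SPSS output
print(skewness / se_skewness)        # 2.24 > 1.96, so the skew is significant at p < .05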
MS error
SSerror/ df error
SS error
SSwithin - SSbetween subjects
MS within
SSwithin/dfwithin
nominal data
a variable can be treated as nominal when its values represent categories with no intrinsic ranking e.g. postcodes, gender, employment status, religious affiliation categorical data
ordinal data
a variable can be treated as ordinal when the values represent categories with some intrinsic ranking e.g. likert scales, position in a race, rankings; ranked data
use your control group as a reference
a. If you have a control group as a reference, it is usually because you want to compare it to other groups b. Then compare other groups c. E.g. no diagnosis, depression, schizophrenia
type 2 error
accepting (failing to reject) the null hypothesis when we shouldn't; this is related to power
line charts
are bar charts but with lines instead of bars; simple line or multiple line charts
To do planned comparisons, the hypotheses must be derived ____________the data has been collected
before
One non-parametric method used to estimate a statistic's sampling distribution is known as...
bootstrapping
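A bare-bones Python sketch of bootstrapping the sampling distribution of a mean (hypothetical data):
import numpy as np

rng = np.random.default_rng(4)
sample = rng.normal(100, 15, size=40)                                    # hypothetical sample
boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(2_000)]                                     # resample with replacement
print(np.percentile(boot_means, [2.5, 97.5]))                            # percentile bootstrap 95% CI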
mixed-design ANOVA
compares several means when there are two or more IV and at least one of them has been measured using the same entities and at least one other has been measured using different entities each effect is independent of one another- therefore knowing that one particular effect is significant tells you nothing about whether any other effects are likely to be significant. significant interactions indicate that the main effects do not tell you the whole story of what is going on with your data
_________ can be extremely inaccurate when homogeneity of variance cannot be assumed
confidence intervals
when group with larger sample sizes have larger variances than the groups with smaller sample sizes, the resulting F-ratio tends to be ________
conservative - more likely to produce a non-significant result when a genuine difference does exist
reject the null hypothesis when it is false
correct decision (1-b)
error A
error A - used in the F-ratio denominator when calculating the main effect of factor A. Subjects within A, the between-subjects error variability, measures differences between subjects in the same group, which includes individual differences: how much variance in factor A it is reasonable to expect based on chance
what kinds of factorial designs are there?
independent measures repeated measures or mixed design- a combination of independent and repeated measure factors (mixed models)
between subjects factorial designs
independent measures. There are a number of questions we can ask: is there an effect of being in condition A1-3? is there an effect of being in condition B1-2? Then there is the extra question of: is there an interaction between A and B? e.g. o IVs: psychiatric diagnosis AND gender o DV: depression scores o Is there an effect of diagnosis (factor 1) on depression score, ignoring gender? o Is there an effect of gender (factor 2) on depression score, ignoring diagnosis? o Then we can ask the more complicated question: does the effect of diagnosis depend on your gender (interaction)?
skewed distributions seem to have _____effect on the error rate and power for two-tailed tests
little
factorial designs
more often than not, behaviour, affect and cognition are influenced by more than one factor; therefore, we often design studies which involve examining the impact of two or more IVs on a particular dependent variable. Factorial designs allow us to investigate these more complicated relationships
sum of squares
need to calculate: SST SSM= SSB (between subject factor) SSR= SSW (within subject factor) work out the overall mean and the group means
what are the two sources of bias?
outliers and violations of assumptions
post hoc tests consist of ______comparisons that are designed to compare all___________
pairwise; different combinations of the treatment groups
type 1 error
reject the null hypothesis when we shouldn't - saying there's a difference when there isn't
what are some potential disadvantages of two-way repeated ANOVA?
sphericity temporal confounds- something else different about those times order effects- they just don't care by week 4- we can control this through counterbalancing
mauchly's test
the test of sphericity; it tests the null hypothesis that the variances of the differences between levels are equal. The assumption is violated when p is less than 0.05 - we then must look at the epsilon values. The assumption is not violated when p is greater than 0.05
kolmogorov-smirnov test
tests whether the data differ from a normal distribution. The null hypothesis is that the data ARE normal. p < 0.05: the data vary significantly from normal; p > 0.05: no strong evidence that the data vary from normal. Denoted by D
homoscedasticity
the assumption that the variances are equal across all levels of a factor
homogeneity of variance
the assumption that the variance of one variable is stable at all levels of another variable - is the variance similar between the groups? Levene's test does this for us. H0: the variances are the same. p < 0.05: the variances are statistically significantly different, so the homogeneity of variance assumption is violated; p > 0.05: the variances are not statistically significantly different, so the assumption is not violated
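Outside SPSS, Levene's test can be run with scipy; a short sketch with hypothetical groups:
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
group1 = rng.normal(10, 2, size=30)
group2 = rng.normal(10, 5, size=30)                     # deliberately larger variance
stat, p = stats.levene(group1, group2, center='mean')   # H0: the group variances are equal
print(stat, p)                                          # p < .05 would mean the assumption is violated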
power
the probability of correctly rejecting the null hypothesis when it is false - the probability of detecting an effect if it is there. We want highly powered studies
SS within (repeated measures)
the variability within an individual - the sum of squared differences between each score and the mean for that respective participant. It is broken down into two parts: how much variability there is between treatment conditions (the model sum of squares, SSM) and how much variability cannot be explained by the model (the residual sum of squares, SSR)
research question and variables
the variables you measure must match your research question
heterogeneity of variance
the variance of one variable varies across levels of another
partitioning the variance
there are two error terms: error A and error B
multivariate tests
these tests conduct a MANOVA rather than an ANOVA. These results can be used in place of the regular ANOVA results if the sphericity or normality assumptions are violated. Whilst these tests are more robust than ANOVA to the assumption violations, they are also less powerful
when do we use lower-bound correction?
this is too conservative - it assumes that sphericity is completely violated; avoid using this correction
dealing with normality violations
transformation remove outliers use a nonparametric method- Friedman's test
what happens when we fit a model to the data?
we estimate the parameters, usually using the method of least squares. When we estimate a parameter we also compute an estimate of how well it represents the population, such as a standard error or confidence interval
rules for assigning contrast weights
1. Choose sensible contrasts • You should only compare two chunks of variance • If a group is singled out in a previous contrast, it should be excluded from subsequent chunks 2. Groups coded with positive weights will be compared against groups with negative weights • Assign one chunk of variance a positive weight and the other chunk a negative weight 3. The sum of the contrast weights must be 0 • If you add up all the contrast weights for a particular comparison they must sum to 0 in order to be valid, otherwise you are testing a different null hypothesis 4. If a group is not involved in a comparison, assign it a weight of 0 • This will exclude the group from the calculations 5. For a given contrast, the weight assigned to the groups in one chunk of variation should be equal to the number of groups in the opposite chunk of variation. When assigning magnitudes we look at how many groups there are in the opposite chunk: the experimental chunk contains two groups (E1 and E2), so the control group gets a magnitude of 2; the control chunk contains one group, so E1 and E2 each get a magnitude of 1.
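A tiny Python check of the worked example above (control vs. two experimental groups), verifying the weight rules:
weights = {'control': -2, 'E1': 1, 'E2': 1}   # magnitudes taken from the opposite chunk
assert sum(weights.values()) == 0             # rule 3: the weights must sum to zero
# the positively weighted groups (E1, E2) are compared against the negatively weighted group (control)
print(weights)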
when would we use a two-way mixed design ANOVA?
Intervention or drug trials - assessing the main outcome of a new treatment, e.g. participants have their anxiety levels measured before therapy, at the end of therapy and again three months later, with a new therapy group versus a waiting-list control. Factor A - group; Factor B - time; DV - anxiety levels
what are the steps to conducting a repeated measures ANOVA with planned comparisons?
As above, we must choose our contrasts and assign our weights. In SPSS, Analyse > General Linear Model > Repeated Measures. We tell SPSS what our variables are and how many levels there are in each. We then move our variables across into the blank Within-Subjects Variables box > then we click Contrasts. Note that by default, trend analysis is performed for repeated-measures ANOVA, so you do not need to click on the Contrasts button to set this up. We press Continue. Next we click Plots and move the variable onto the horizontal axis; press Add then Continue. Then we click Options and place a tick next to descriptive statistics and estimates of effect size > Continue. Press OK to run the ANOVA
SPSS
analyze > general linear model > repeated measures. Enter a within-subjects factor: time; between-subjects factor: group. Enter the DV and specify the levels of time and group. We click Paste to get to the syntax - we need to modify it if we want planned comparisons and simple effects
factorial ANOVA using SPSS
analyze > general linear model > univariate IV placed into fixed factors and DV placed in dependent variable to graph interactions we click on plots and place factor 1 on horizontal axis and factor 2 on separate lines (doesn't matter which way round the variables are plotted)
p-p plot
another graph for checking normality - it plots the cumulative probability of a variable against the cumulative probability of a particular distribution. The actual z-score is plotted against the expected z-score. If the data are normally distributed the actual z-scores will be the same as the expected z-scores and you will get a straight diagonal line. When values sag consistently above or below this line, kurtosis differs from a normal distribution; when values form an S-shape, the problem is skewness
variance
we assume the variance is roughly equal between groups/levels of a factor. If this is violated we can use a stricter p-value, typically p < 0.025, which helps guard against type 1 error; transforming the data may also reduce heterogeneity
error B
error B - used in the F-ratio denominator when calculating the main effect of factor B and the A x B interaction; also referred to as B by subjects within A. Within each group, it measures differences between subjects across time, which excludes individual differences: how much variance in B it is reasonable to expect based on chance after individual differences have been removed.
post hoc tests
essentially t-tests that compare each mean against all others - in general terms they use a stricter criterion to accept an effect as significant, so they control the family-wise error rate (αFW). The simplest example is the Bonferroni method
Greenhouse-Geisser estimate
for example, in a situation where there are five conditions the lower limit will be 1/(5-1) or 0.25 (known as the lower-bound estimate of sphericity)
how to run a two-way repeated measures anova in spss
go to Analyse > General Linear Model > Repeated Measures. Type in a name for the first repeated-measures factor and specify how many levels it has; type in a label for the DV; type in the name for the second repeated-measures factor and enter its levels. Drag the variables from the list into the Within-Subjects Variables box. Click on Plots and move one IV onto one axis and the other onto the other axis. Place ticks next to descriptive statistics and estimates of effect size and press Continue (unless you want to run post hoc tests or simple effects analysis - if so, move the factors and interaction into the 'Display Means for' box, place a tick next to Compare Main Effects, select Bonferroni and then press Continue). Press Paste to set up simple effects analysis or custom contrasts, or press Run/OK to perform the analysis
what makes a good graph?
graphs should do the following: show the data; induce the reader to think about the data being presented; avoid distorting the data; present many numbers with minimum ink; make large data sets coherent; encourage the reader to compare different pieces of data; reveal the underlying message of the data. A 2D plot makes it easier to compare the values; clearly labelled axes; minimal or no distractions; minimum ink (e.g. getting rid of unnecessary axis lines)
sphericity violations
if Mauchly's test is significant (p < 0.05) we use an alternative, corrected ANOVA. Epsilon values: Greenhouse-Geisser, Huynh-Feldt, lower-bound; each one corrects for the violation of sphericity by correcting the degrees of freedom
what is the problem of doing multiple t tests?
each t-test assumes that it is the only test being done: the decision-wise error rate (α) for each comparison assumes it is the only test. When we conduct multiple tests we have this 5% error rate each time, so the more tests we do the more likely it is that one of them will be wrong. The family-wise error (across all of the tests we are conducting) becomes inflated, and we need to control this family-wise error rate. Family-wise error = 1 - (1 - 0.05)^n; for instance, with 4 tests the family-wise error = 1 - (0.95)^4 ≈ 0.19. This increases type 1 error - a false positive, rejecting the null hypothesis when it is true
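The family-wise error calculation as a short Python sketch:
alpha = 0.05
for n_tests in (1, 3, 4, 10):
    familywise = 1 - (1 - alpha) ** n_tests   # 1 - (1 - a)^n
    print(n_tests, round(familywise, 3))      # e.g. 4 tests -> about 0.185, well above 0.05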
simple effects analysis
if you want to follow up a significant interaction with simple effects analysis, add a command to the line which produces the estimated marginal means for the interaction: to compare the two groups at each time point, COMPARE (GROUP) ADJ (BONFERRONI); to compare the three time points for each group separately, COMPARE (TIME) ADJ (BONFERRONI)
underlying theory of repeated measures ANOVA
in a repeated-measures research study, individual differences are not random - because the same individuals are measured in every treatment condition, individual differences will have a systematic effect on the results; these individual differences can be measured and separated out from other sources of error. We partition the variance into: total variance, between subjects, and within subjects (previously the residual or error)
Multi-subscript notation Xijk
in two-way ANOVA, three subscripts are needed. The first, i, identifies the level of factor A (row); the second subscript, j, identifies the level of factor B (column); together (i, j) identify the cell in the data table; subscript k identifies the individual score within a cell. n = number of scores in each cell (in a balanced design); nij = number of scores in a specific cell (i, j); a = number of levels of factor A; b = number of levels of factor B; N = total number of scores
assumptions
independent observations- someone's score in one group can't affect someone's score in another group interval/ratio level measurement- increments must be equal normality of sampling distribution- remember the central limit theorem homogeneity of variance the group variances should be equal
when the groups with larger sample sizes have smaller variances than the groups with smaller sample sizes, the resulting F-ratio tends to be_____
liberal- more likely to produce a significant result when there is no difference between groups in the population- type 1 error rate is not controlled
main effect with interaction
main effect with interaction: nonparallel lines
main effect with no interaction
main effect with no interaction: parallel lines, there is no crossing over
what test is used to assess sphericity
mauchly's test
double notation, Xij
needed when we have multiple values for the same participant. The first subscript, i, refers to the row that the particular value is in; the second subscript, j, refers to the column. What is X2,3? The value for the second person (second row) and the third variable (third column)
Post hoc tests are done when you have ________________
no specific hypotheses
is there a non-parametric version of mixed ANOVA?
no, but there are robust methods that can be used based on bootstrapping
if the f-ratio is less than 1 it must represent a _________effect
a non-significant effect - this is because there must be more unsystematic than systematic variance, so our experimental manipulation has been unsuccessful
effect size for two-way ANOVA
omega squared - we need to first compute a variance component for each of the effects and the error, and then use these to calculate the effect sizes for each. In these equations, a is the number of levels of the first independent variable, b is the number of levels of the second independent variable and n is the number of people per condition. We also need to estimate the total variability: this involves summing the variance components for each of the effects and the error plus the residual MS. The effect size is then calculated as the effect variance / total variance
outliers and boxplots
outliers are displayed as little circles with an ID number attached
if the test statistic is biased then so too will its _____
p-value
what are the 3 contexts that can be biased?
parameter estimates standard errors and confidence intervals test statistics and p-values
positive/negative skew
positive skew - scores bunched at low values with the tail pointing to high values; negative skew - scores bunched at high values with the tail pointing to low values
disadvantages with normality tests
relatively insensitive when N is small, which is when normality matters the most; too sensitive when N is large, when it doesn't matter due to the central limit theorem. Therefore it is best to also do a visual examination
reliability analysis
reliability refers to the consistency or dependability of a measure over time, over questionnaire items, or over observers/raters
managing outliers
remove the participant from your analyses change the participants score- next highest value plus 1, value that has a z-score of 3.29
winsorizing
replacing outliers with the next highest score that is not an outlier
planned contrasts- weights
set up contrasts - assign a sign (positive/negative) and magnitude (weight) to each chunk. Set up the two-way independent ANOVA as usual but press Paste instead of OK to copy the syntax into the syntax editor window. Add a line like: /CONTRAST (ALCOHOL) SPECIAL (2 -1 -1). Add at the top: attractiveness BY Gender Alcohol
first contrast
should compare all experimental groups with the control group
what are the three types of boxplots
simple, clustered, 1-D boxplot
skewness and kurtosis
skew is about the symmetry of the distribution; kurtosis is about the heaviness of the tails or the pointiness of the distribution
f controls the type 1 error rate well under what conditions?
skew, kurtosis and non-normality
reverse scoring
sometimes questionnaires contain both positively and negatively worded items; summing these responses to obtain a total scale score would make very little sense. Therefore, reversing the responses to the negatively worded questions allows us to calculate a meaningful total, which can be used in subsequent analysis
planned comparisons are done when you have _____________ you want to test
specific hypotheses
levene's test
tests the null hypothesis that the variances in different groups are equal; it does this by doing a one-way ANOVA on the deviation scores. When the sample size is large, small differences in group variance can produce a significant Levene's test. Analyse > Descriptive Statistics > Explore
where does error come from?
the t statistic gets bigger if we increase the difference (the signal) or if we decrease the error (the noise). In the t statistic, the error comes from the standard error, which is made up of the variability in the sample and the number of people. We can reduce the error by increasing the number of people or decreasing the variability, e.g. by using better measurement instruments. The bigger the statistic, the smaller the p-value
variance ratio method
there is overlap between the individual groups - this is where we get our error variance. We also look at how much variability there is between the group (treatment) means - the model variance in ANOVA. We divide this overall model variance by the error variance; if the ratio is greater than 1, we have evidence that the means are different
polynomial contrasts: trend analysis
this contrast tests for trends in the data, and in its most basic form it looks for a linear trend; there are other trends - quadratic, cubic and quartic. Linear trend: a proportionate change in the value of the dependent variable across ordered categories. Quadratic trend: where there is one curve in the line (to find a quadratic trend you need at least three groups). Cubic trend: where there are two changes in the direction of the trend (you must have at least four categories of the IV). Quartic trend: has three changes of direction (you need at least 5 categories of the IV). Each of these trends has a set of codes for the dummy variables in the regression model - if you add the codes for a given trend the sum will equal zero, and if you multiply the codes of two trends the sum of products will also equal zero: these contrasts are orthogonal
heteroscedasticity
this occurs when the residuals at each level of the predictor variable have unequal variances
what do we do if we violate sphericity?
use the epsilon values
simple histogram
use this when you want to see the frequencies of scores for a single variable
why opt for repeated-measures?
uses a single sample, with the same set of individuals measured in all of the different treatment conditions- one of the characteristics of repeated measures design is that it eliminates variance caused by individual differences individual differences are those participant characteristics that vary from one person to another and may influence the measurement that you obtain for each person- age, gender etc.
bar charts
usual way to display means
problems with repeated measures
usually we assume independence, but scores will be correlated between conditions, hence repeated measures violates the assumption of independence. Accordingly, an additional assumption is made: sphericity - which assumes that the variances (and covariances) of the differences between treatment levels are the same; related to the idea that the correlations between the treatment levels should be the same
Q-Q plot
very similar to the P-P except it plots the quantiles of the data instead of every individual score in the data
histogram 'bins'
we can determine the properties of the bins used to make the histogram. These bins can be thought of as rubbish bins: on each bin you write a score or range of scores, then you go through the scores in your data set and throw each score into the bin with the appropriate label on it. When you have finished throwing your data into these bins, you count how many scores are in each bin
effect size for factorial repeated-measures ANOVA
we can use the F-ratios and convert them to an effect size r
compute variable
we can use the Compute Variable function in SPSS to sum or average responses to questionnaire items; this will create a new variable, and thus a new column of data
assumptions of two-way ANOVA
we have the same overall assumptions we had with one-way ANOVA o Interval/ratio level data for DV o Independence o Normality o Homogeneity of variance • Overall, ANOVA is a relatively robust procedure to violations of assumptions: o This is especially true for violations of normality, especially when sample sizes are large o Normality is not a crucial assumption and can be violated without great consequence
interpreting interactions graphs
whilst you can use graphs to give you an indication of whether there is a main effect or interaction, you cannot use graphs to tell you if main effects or interactions are statistically significant interaction- non parallel lines no interaction- parallel lines/no crossing over
the method of least squares
will produce unbiased estimates of parameters even when homogeneity of variance can't be assumed, but better estimates can be achieved using different methods such as weighted least squares
recoding data
with the recode function in SPSS, we can change specific values or ranges of values on one or more variables. this can be used to: collapse continuous variables into categories or reverse negatively scaled questionnaire items
what else can you use when assumption of normality or sphericity are violated?
you can also use the multivariate tests (MANOVA) if the assumptions of normality or sphericity are violated, as this provides a more robust test of the null hypothesis that all treatment means are equal. You could also conduct a Friedman's test as the non-parametric alternative to a one-way repeated measures ANOVA
when planning comparisons...
your first contrast should be one that compares all of the experimental groups with the control group. There is only one control condition, so this portion of variance is only used in the first contrast - because it cannot be broken down any further
data types and statistical tests
your statistical tests must be appropriate to the type of data you are using. we need to consider: how many outcome variables? what type of outcome? how many predictor variables? what type of predictor? if categorical predictor, how many categories? if a categorical predictor, are the same or different entities in each category? are assumptions of linear model met/not met?
normality
• ANOVA is considered to be quite resilient or robust to departures from normality. As Winer et al. (1991) suggest "A reasonable statement is that the analysis of variance F statistic is robust with regard to moderate departures from normality when sample sizes are reasonably large and equal..." (p. 101) • Confidence in the "robustness" of the normality assumption come in part from an important statistical principle known as the central limit theorem, which indicates that as we increase the sample size (n), the sample mean will increasingly approximate a normal distribution. • Hence, Keppel and Wickens (2004) argue persuasively that "once the samples become as large as a dozen or so, we need not worry much about the assumption of normality" (p. 145).
running custom contrasts
• In order to run the analysis, set up the ANOVA as usual o Instead of pressing OK press Paste o This will create the syntax and copy it to a new syntax window. /WSFACTOR = Time 4 Polynomial tells us that you have a within-subjects factor called Time, which has 4 levels, and that, by default, polynomial contrasts will be performed. But we now want to perform Special (custom) contrasts rather than polynomial contrasts, so we delete Polynomial, type 'Special' and place the contrast weights in brackets
contrasted weights
• Need to assign weights to each group to tell SPSS how to perform the contrast • A planned comparison is a linear combination of means, L = a1*M1 + ... + ak*Mk, where a1 to ak are the contrast weights for the k groups • This simply states that a planned contrast is a weighted sum of treatment means • Assign weights such that if there is no difference, L = 0 • Conduct a t-test on L to see if it is different from 0
orthogonality
• Orthogonal contrasts are independent o They tap into different things o Unique chunks of variance o There is no redundancy o E.g. hypothesis 1 tests reward in relation to punish and indifferent o It would therefore be redundant to compare reward to punish in hypothesis 2, because you have already compared reward in relation to punish and indifferent o You have already used reward as a unique chunk • You can work out if contrasts are orthogonal to one another by calculating the sum of the cross-products of their weights.
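A short Python sketch of the cross-products check (the contrast weights are illustrative):
contrast1 = [-2, 1, 1]    # control vs. the two experimental groups
contrast2 = [0, 1, -1]    # experimental group 1 vs. experimental group 2
print(sum(a * b for a, b in zip(contrast1, contrast2)))   # 0, so the contrasts are orthogonal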
underlying theory of planned comparisons
• The variability explained by the Model (experimental manipulation SSM) is due to participants being assigned to different groups • This variability can be further broken down to test specific hypotheses about which groups might differ from one another • We break down the variance according to hypotheses made a priori (before the experiment). • Providing that the hypotheses are independent of one another, the family wise type 1 error will be controlled • When breaking down the variance→ cutting up a cake o We cut slices into smaller bits but we cannot stick these bits back together
why should we care if contrasts are orthogonal?
• Type 1 error o If contrasts are orthogonal, we don't need to adjust the a level for multiple comparisons o No inflation of type 1 error rate • Planning o There will always be K-1 orthogonal contrasts for K groups o If the overall ANOVA is significant, at least one of these contrasts must be significant also • Sum of squares o If group sizes are equal then SSmodel can be divided into k-1 components exactly.
simple effects analysis
• When you find a significant interaction between two or more variables, simple effects analysis (i.e. post hoc tests) can be used to examine that interaction. • If there is no significant interaction you cannot do simple effects analysis. • Simple effects analysis looks at the effect of one independent variable at one level of the other independent variable, then repeats the process for all other levels • For example, we could use simple effects analysis to look at the effect of alcohol on attractiveness for males, then at the effect of alcohol on attractiveness for females. • For males we would do a normal one-way ANOVA across the different levels of alcohol, and then separately we would do a one-way ANOVA for females. • We have to use syntax to do this
In SPSS what does the row and column represent
row - data from one entity; column - a level of a variable
how do pairwise comparisons control the FWE?
they control the FWE by correcting the significance level of each test so that the overall type 1 error rate across comparisons remains at 0.05
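For example, a Bonferroni-style correction simply divides the family-wise alpha by the number of comparisons (a sketch):

family_alpha = 0.05
n_comparisons = 3
alpha_per_test = family_alpha / n_comparisons   # each test is evaluated at about .0167
print(alpha_per_test)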
What are the two options to determine where the difference is in ANOVA?
Post hoc tests and planned comparisons
Polynomial contrasts in SPSS
Set up an ANOVA as usual (analyse > compare means > one-way ANOVA for independent measures; analyse > general linear model > repeated measures for repeated measures). Click on the Contrasts button. We get an F statistic instead of a t statistic.
when sphericity is violated the ___________ seems to be generally the most robust of the univariate techniques
bonferroni
df error
for repeated measures: (N - k) - (n - 1), which equals (k - 1)(n - 1)
platykurtic
- light tails
family wise error rate
1-(1-a)^n
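For example, with a = .05 per test and 3 independent comparisons:

alpha = 0.05
n_tests = 3
print(1 - (1 - alpha) ** n_tests)   # about 0.14, well above the intended .05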
two-way ANOVAs address what three questions?
1. What happens to the DV as factor A (IV1) changes in levels? a. Is there a statistically significant main effect of factor A on the DV? 2. What happens to the DV as factor B (IV2) changes in levels? a. Is there a statistically significant main effect of factor B on the DV? 3. How do specific combinations of factor A and factor B affect the DV? a. Is there a statistically significant interaction between factor A and factor B upon the DV?
you need at least ________ conditions for sphericity to be an issue
3
homogeneity of variance
ANOVA is fairly robust if the sample sizes are equal, but not if the sample sizes are unequal. If there is heterogeneity of variance, you can use the Brown-Forsythe or Welch F ratio instead of the regular F ratio - these take into account that the variances are not equal
what is the alternative to using multiple t-tests?
ANOVA - it can compare multiple groups in a single test while controlling the family-wise error (FWE) rate
how do we test the interaction?
An interaction tests whether the effect of one factor is the same across all levels of another factor. If there is no interaction, the differences between means should be the same. Therefore, if the differences are not the same, there is an interaction and we can reject the null hypothesis. H0: (A1B1 - A1B2) - (A2B1 - A2B2) = 0
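A toy numeric illustration of this "difference of differences" idea (the cell means are made up):

A1B1, A1B2 = 10, 14   # effect of B at A1 is -4
A2B1, A2B2 = 11, 15   # effect of B at A2 is also -4
print((A1B1 - A1B2) - (A2B1 - A2B2))   # 0, so no interaction in these sample means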
how is sphericity measured?
By hand: start by calculating the differences between pairs of scores for all combinations of treatment levels, computed separately for each participant. Then calculate the variance of each set of difference scores. The assumption is met when these variances are roughly equal.
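A small Python sketch of this "by hand" check (the scores are made up; columns are three within-subject conditions A, B, C):

import numpy as np
from itertools import combinations

scores = np.array([[8, 10, 13],
                   [6,  9, 12],
                   [9, 12, 14],
                   [7,  8, 11]])   # rows = participants, columns = conditions

for i, j in combinations(range(scores.shape[1]), 2):
    diffs = scores[:, i] - scores[:, j]    # difference scores for this pair of levels
    print(i, j, np.var(diffs, ddof=1))     # these variances should be roughly equal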
post hoc in SPSS
Click on Post Hoc. If one variable has only two levels we do not need to select post hoc tests for that variable. If the variable has more than two levels we can conduct post hoc tests. Select the variable and transfer it to the box labelled Post Hoc Tests for.
what do we do when there are two control groups? how do we plan our contrasts?
Contrast 1: control group (C1 and C2) vs. experimental groups (E1 and E2) Contrast 2: E1 vs. E2 Contrast 3: C1 vs. C2
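One possible orthogonal set of weights for these three contrasts, in the order C1, C2, E1, E2 (an illustration, not the only valid choice):
Contrast 1: 1, 1, -1, -1
Contrast 2: 0, 0, 1, -1
Contrast 3: 1, -1, 0, 0
Each row sums to zero, and the sum of cross products for every pair of rows is zero, so the contrasts are orthogonal.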
f ratio
F = MS between/ MS within
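For example, with made-up mean squares MS between = 45 and MS within = 9, F = 45 / 9 = 5.0.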
f ratio for repeated measures
MS between treatments/ MS error
post hoc for unequal sample sizes
Gabriel's (small discrepancy in n) or Hochberg's GT2 (large discrepancy in n)
post hoc for unequal variances
Games-Howell
heterogeneity of variance
If homogeneity of variance is violated we can use the Welch F in a one-way ANOVA, but this is not available for two-way ANOVAs. The easiest strategy is to adopt a more stringent alpha level - only consider results to be significant if p < .025 or .01
df for total
N-1
df for within treatments
N-k
why can't we use graph interactions as an indication of significance?
We can use graphs to give an indication of whether there is a main effect or interaction but you CANNOT use graphs to tell you if the main effects or interactions are statistically significant. This is because differences between means can be due to sampling error. Therefore, you must ALWAYS use statistical tests to determine significance.
only compare two chunks at a time
a. For simple comparisons b. Compare one chunk against another chunk c. Doesn't apply for polynomial comparisons
omega-squared
an alternative to eta-squared; an unbiased version of eta-squared that gives a better estimate of the effect size in the population 0.01 small 0.06 medium 0.14 large
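One common formula for a one-way independent ANOVA, sketched in Python (the sums of squares are made-up illustrative values):

ss_between, ss_total = 90.0, 450.0     # made-up values from an ANOVA table
k, N = 3, 30                           # number of groups and total sample size
ms_within = (ss_total - ss_between) / (N - k)
omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
print(round(omega_sq, 3))              # about 0.14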
how do we run a one way independent ANOVA in SPSS
analyse- compare means- one way ANOVA
total sums of squares, SSt
calculate in the same way as one-way
sphericity
calculate the difference scores between all possible pairs of treatment levels for each participant
log transform: log (x)
can reduce positive skew and stabilise variance
Accept the null hypothesis when it is true
correct decision (1-a)
local sphericity
data where only some of the variances of the differences are similar, e.g. B and C (10.7) and C and A (10.3) are similar, but A and B (15.7) is not
pooled standard deviation
an estimate of the standard deviation obtained by combining (pooling) the variance across groups
parameter estimates
estimates of parameters are affected by non-normal distributions. parameter estimates differ in how much they are biased in a non-normal distribution; the median is less biased by skewed distributions than the mean. if the sampling distribution of our parameter estimate is normally distributed then test statistics and p-values will be accurate. if residuals are normally distributed in the population then using the method of least squares to estimate the parameters will produce better estimates than other methods
The REGWQ has _____ and tight control of the ____ error rate
good power; type 1
why is it preferable to use a factorial ANOVA instead of two single factor ANOVAs?
interactions; ecological validity; fewer studies needed
eta-squared
it is a biased estimator - really good at telling us about our sample, but it doesn't give a very accurate estimate of the effect size in our population 0.01 small 0.09 medium 0.25 large
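For a one-way design, eta-squared is simply the proportion of the total variability attributed to the model; a one-line sketch with made-up values:

ss_between, ss_total = 90.0, 450.0
print(ss_between / ss_total)   # 0.20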
what does the f-ratio measure?
it is the ratio of the variation explained by the model to the variation explained by unsystematic factors
is ANOVA robust?
it is a robust test, meaning that moderate violations of its assumptions do not greatly affect the accuracy of the F statistic (particularly when sample sizes are equal and reasonably large)
df for between treatments
k-1
what are the tests for normality
kolmogorov-smirnov test or the shapiro-wilk test (better test)
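Both are available in scipy; a minimal sketch on made-up data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=40)   # made-up sample

w, p_sw = stats.shapiro(x)                                          # Shapiro-Wilk
d, p_ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))   # Kolmogorov-Smirnov
print(p_sw, p_ks)   # p > .05 suggests the normality assumption is tenable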
what statistical test is used to assess variance?
levene's test
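A minimal scipy sketch (the groups are made up):

from scipy import stats

group1 = [12, 14, 11, 13, 15]
group2 = [10, 18, 9, 16, 12]
group3 = [11, 13, 12, 14, 10]
stat, p = stats.levene(group1, group2, group3, center="mean")
print(stat, p)   # p < .05 would indicate that the variances differ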
population pyramid
like a stacked histogram, this shows the relative frequency of scores in two populations
labeling ANOVAs
one way- one IV or factor two-way- two IV or factors three-way- three IV or factors
the three examples of effects sizes
proportion of variance accounted for (r^2), eta-squared (η^2), omega-squared (ω^2)
when is anova robust to normality
providing that your sample sizes are equal and large (df error > 20), ANOVA is robust to the normality assumption being violated. if the sample sizes are unequal or small, ANOVA is not so robust to violations of normality - you can transform the data or use a non-parametric alternative to a one-way independent measures ANOVA (Kruskal-Wallis test)
hartley's fmax
ratio of the variances between the group with the biggest variance and the group with the smallest variance
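A one-line sketch with made-up group variances:

variances = [4.2, 6.5, 9.8]
print(max(variances) / min(variances))   # about 2.33, compared against critical Fmax tables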
what statistic are we given for planned comparisons
t
omnibus test
tells us means aren't equal somewhere, but not where
what does /WSFACTOR = Time 4 Polynomial tell us
tells us that you have a within-subjects factor called Time, which has 4 levels, and that, by default, polynomial contrasts will be performed.
if the standard error is biased, ______ will be too, because they are usually based on the standard error
test statistics
interpretation of the F-ratio
the f-ratio tells us only that the experimental manipulation was, in general, successful (the means are not all equal); it does not tell us specifically which group means differ from which. we need to do additional tests to find out where the group differences lie
the sample distribution
the frequency distribution of the data in our sample
the sampling distribution
the frequency distribution of the means of random samples taken from the population
why study RDA?
to acquire accurate knowledge about why people think, feel and behave the way they do
when do we use the huynh-feldt correction?
when the greenhouse-geisser estimate of sphericity is greater than 0.75; it is a less conservative correction than the greenhouse-geisser correction
when shouldn't we use REGWQ test?
when group sizes are different
when is ANOVA fairly robust to violations of the assumption of homogeneity of variance?
when sample sizes are equal
when are assumptions of normality statistically violated?
when the significance value is below 0.05; for the assumption to be met, the significance value must be greater than 0.05
Polynomial contrasts
• Polynomial trends→ trend analysis - used when there is an ordinal pattern in our data • Test specific patterns in the data • Use specific weights • Make sure the levels in your data file are in the correct order
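For example, the standard orthogonal polynomial weights for four equally spaced levels are: linear -3, -1, 1, 3; quadratic 1, -1, -1, 1; cubic -1, 3, -3, 1. Each set sums to zero and is orthogonal to the others.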