Discovering Statistics Using IBM SPSS Statistics, 5e
Fixed variable
A fixed variable is one that is not supposed to change over time (e.g., for most people their gender is a fixed variable - it never changes).
Multilevel linear model (MLM)
A linear model (just like regression, ANCOVA, ANOVA, etc.) in which the hierarchical structure of the data is explicitly considered. In this analysis regression parameters can be fixed (as in regression and ANOVA) but also random (i.e., free to vary across different contexts at a higher level of the hierarchy). This means that for each regression parameter there is a fixed component but also an estimate of how much the parameter varies across contexts (see fixed coefficient, random coefficient).
Test of excess success (TES)
A procedure designed for identifying sets of results within academic articles that are 'too good to be true'. For an article reporting multiple scientific studies examining the same effect, the test computes (based on the size of effect being measured and sample size of the studies) the probability that you would get significant results for all of the studies. If this probability is low it is highly unlikely that the researcher would get these results and the results appear 'too good to be true', implying p-hacking (Francis, 2013). It is noteworthy that the TES is not universally accepted as testing what it sets out to test (e.g., Morey, 2013).
Fixed intercept
A term used in multilevel linear modelling to denote when the intercept in the model is fixed. That is, it is not free to vary across different groups or contexts (cf. random intercept).
Random intercept
A term used in multilevel linear modelling to denote when the intercept in the model is free to vary across different groups or contexts (cf. fixed intercept).
Fixed slope
A term used in multilevel linear modelling to denote when the slope of the model is fixed. That is, it is not free to vary across different groups or contexts (cf. random slope).
Random slope
A term used in multilevel linear modelling to denote when the slope of the model is free to vary across different groups or contexts (cf. fixed slope).
Fixed effect
An effect in an experiment is said to be a fixed effect if all possible treatment conditions that a researcher is interested in are present in the experiment. Fixed effects can be generalized only to the situations in the experiment. For example, the effect is fixed if we say that we are interested only in the conditions that we had in our experiment (e.g., placebo, low dose and high dose) and we can generalize our findings only to the situation of a placebo, low dose and high dose.
Cohen's d
An effect size that expresses the difference between two means in standard deviation units. In general it can be estimated using: d = (X̄1 − X̄2)/s, in which X̄1 and X̄2 are the two group means and s is a standard deviation (commonly the pooled standard deviation, or the standard deviation of the control group).
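As a minimal illustrative sketch (the function name and plain-list inputs are illustrative, not SPSS output), Cohen's d for two independent groups, assuming the pooled standard deviation as the standardizer:

```python
import math

def cohens_d(group1, group2):
    # Group means
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # Sample variances (n - 1 in the denominator)
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    # Pooled standard deviation as the standardizer (one common choice)
    s_p = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / s_p
```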
Fisher's exact test
Fisher's exact test (Fisher, 1922) is not so much a test as a way of computing the exact probability of a statistic. It was designed originally to overcome the problem that with small samples the sampling distribution of the chi-square statistic deviates substantially from a chi-square distribution. It should be used with small samples.
Journal
In the context of academia a journal is a collection of articles on a broadly related theme, written by scientists, that report new data, new theoretical ideas or reviews/critiques of existing theories and data. Their main function is to induce learned helplessness in scientists through a complex process of self-esteem regulation using excessively harsh or complimentary peer feedback that has seemingly no obvious correlation with the actual quality of the work submitted.
Moderation
Moderation occurs when the relationship between two variables changes as a function of a third variable. For example, the relationship between watching horror films (predictor) and feeling scared at bedtime (outcome) might increase as a function of how vivid an imagination a person has (moderator).
Pearson's correlation coefficient
Pearson's product-moment correlation coefficient, to give it its full name, is a standardized measure of the strength of relationship between two variables. It can take any value from −1 (as one variable changes, the other changes in the opposite direction by the same amount), through 0 (as one variable changes the other doesn't change at all), to +1 (as one variable changes, the other changes in the same direction by the same amount).
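A hand-rolled sketch of the computation (purely illustrative; in practice a statistics package does this for you):

```python
import math

def pearson_r(x, y):
    # Standardized measure of linear association between two variables
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Sum of cross-product deviations
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Scale by the variability of each variable
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```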
P-P plot
Short for 'probability-probability plot'. A graph plotting the cumulative probability of a variable against the cumulative probability of a particular distribution (often a normal distribution). Like a Q-Q plot, if values fall on the diagonal of the plot then the variable shares the same distribution as the one specified. Deviations from the diagonal show deviations from the distribution of interest.
Q-Q plot
Short for 'quantile-quantile plot'. A graph plotting the quantiles of a variable against the quantiles of a particular distribution (often a normal distribution). Like a P-P plot, if values fall on the diagonal of the plot then the variable shares the same distribution as the one specified. Deviations from the diagonal show deviations from the distribution of interest.
Residual
The difference between the value a model predicts and the value observed in the data on which the model is based. Basically, an error. When the residual is calculated for each observation in a data set the resulting collection is referred to as the residuals.
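In code the idea is trivially simple (an illustrative sketch; names are not from any package):

```python
def residuals(observed, predicted):
    # Residual = observed value minus the model's predicted value
    return [o - p for o, p in zip(observed, predicted)]
```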
Empirical probability
The empirical probability is the probability of an event based on the observation of many trials. For example, if you define the collective as all men, then the empirical probability of infidelity in men will be the proportion of men who have been unfaithful while in a relationship. The probability applies to the collective and not to the individual events. You can talk about there being a 0.1 probability of men being unfaithful, but the individual men were either faithful or not, so their individual probability of infidelity was either 0 (they were faithful) or 1 (they were unfaithful).
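A sketch of the calculation (illustrative function name):

```python
def empirical_probability(outcomes, event):
    # Proportion of observed trials on which the event occurred
    return sum(1 for o in outcomes if o == event) / len(outcomes)
```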
Cochran's Q
This test is an extension of McNemar's test and is basically a Friedman's ANOVA for dichotomous data. So imagine you asked 10 people whether they'd like to shoot Justin Timberlake, David Beckham and Simon Cowell and they could answer only 'yes' or 'no'. If we coded responses as 0 (no) and 1 (yes) we could do Cochran's test on these data.
McNemar's test
This tests differences between two related groups (see Wilcoxon signed-rank test and sign test), when nominal data have been used. It's typically used when we're looking for changes in people's scores and it compares the proportion of people who changed their response in one direction (i.e., scores increased) to those who changed in the opposite direction (scores decreased). So, this test needs to be used when we've got two related dichotomous variables.
Marginal likelihood (evidence)
When using Bayes' theorem to test a hypothesis, the marginal likelihood (sometimes called evidence) is the probability of the observed data, p(data). See also likelihood.
Posterior probability
When using Bayes' theorem to test a hypothesis, the posterior probability is our belief in a hypothesis or model after we have considered the data, p(model|data). This is the value that we are usually interested in knowing. It is the inverse conditional probability of the likelihood.
Prior probability
When using Bayes' theorem to test a hypothesis, the prior probability is our belief in a hypothesis or model before, or prior to, considering the data, p(model). See also posterior probability, likelihood, marginal likelihood.
Bayesian statistics
a branch of statistics in which hypotheses are tested or model parameters are estimated using methods based on Bayes' theorem.
Binary variable
a categorical variable that has only two mutually exclusive categories (e.g., being dead or alive).
Fixed coefficient
a coefficient or model parameter that is fixed; that is, it cannot vary over situations or contexts (cf. random coefficient).
Random coefficient
a coefficient or model parameter that is free to vary over situations or contexts (cf. fixed coefficient).
Matrix
a collection of numbers arranged in columns and rows. The values within a matrix are typically referred to as components or elements.
Compound symmetry
a condition that holds true when both the variances across conditions are equal (this is the same as the homogeneity of variance assumption) and the covariances between pairs of conditions are also equal.
Polynomial contrast
a contrast that tests for trends in the data. In its most basic form it looks for a linear trend (i.e., that the group means increase proportionately).
Bonferroni correction
a correction applied to the α-level to control the overall Type I error rate when multiple significance tests are carried out. Each test conducted should use a criterion of significance of the α-level (normally 0.05) divided by the number of tests conducted. This is a simple but effective correction, but tends to be too strict when lots of tests are performed.
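The procedure above amounts to one line of arithmetic; an illustrative sketch:

```python
def bonferroni(p_values, alpha=0.05):
    # Judge each p-value against alpha divided by the number of tests
    criterion = alpha / len(p_values)
    return [p < criterion for p in p_values]
```

For three tests at α = 0.05, each test is judged against 0.05/3 ≈ 0.0167.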
Bivariate correlation
a correlation between two variables.
Intraclass correlation (ICC)
a correlation coefficient that assesses the consistency between measures of the same class, that is, measures of the same thing (cf. Pearson's correlation coefficient, which measures the relationship between variables of a different class). Two common uses are in comparing paired data (such as twins) on the same measure, and assessing the consistency between judges' ratings of a set of objects. The calculation of these correlations depends on whether a measure of consistency (in which the order of scores from a source is considered but not the actual value around which the scores are anchored) or absolute agreement (in which both the order of scores and the relative values are considered) is required, and whether the scores represent averages of many measures or just a single measure. This measure is also used in multilevel linear models to measure the dependency in data within the same context.
Unstructured
a covariance structure used in multilevel linear modelling. This covariance structure is completely general. Covariances are assumed to be completely unpredictable: they do not conform to a systematic pattern.
Variance components
a covariance structure used in multilevel linear modelling. This covariance structure is very simple: it assumes that all random effects are independent, and that the variances of the random effects are the same and sum to the variance of the outcome variable.
Diagonal
a covariance structure used in multilevel linear models. In this variance structure variances are assumed to be heterogeneous and all of the covariances are 0.
Probability distribution
a curve describing an idealized frequency distribution of a particular variable from which it is possible to ascertain the probability with which specific values of that variable will occur. For categorical variables it is simply a formula yielding the probability with which each category occurs.
p-curve
a curve summarizing the frequency distribution of p-values you'd expect to see in published research. On a graph that shows the value of the p-value on the horizontal axis against the frequency (or proportion) on the vertical axis, the p-curve is the line reflecting how frequently (or proportionately) each value of p should occur for a given effect size.
Growth curve
a curve that summarizes the change in some outcome over time. See polynomial.
Bimodal
a description of a distribution of observations that has two modes.
Posterior distribution
a distribution of posterior probabilities. This distribution should contain our subjective beliefs about a parameter or hypothesis after considering the data. The posterior distribution can be used to ascertain a value of the posterior probability (usually by examining some measure of where the peak of the distribution lies or a credible interval).
Prior distribution
a distribution of prior probabilities. This distribution should contain our subjective beliefs about a parameter or hypothesis before, or prior to, considering the data. The prior distribution can be an informative prior or an uninformative prior.
Common factor
a factor that affects all measured variables and, therefore, explains the correlations between those variables.
Unique factor
a factor that affects only one of many measured variables and, therefore, cannot explain the correlations between those variables.
Non-parametric tests
a family of statistical procedures that do not rely on the restrictive assumptions of parametric tests. In particular, they do not assume that the sampling distribution is normally distributed.
Multivariate analysis of variance
a family of tests that extend the basic analysis of variance to situations in which more than one outcome variable has been measured.
Concurrent validity
a form of criterion validity where there is evidence that scores from an instrument correspond to concurrently recorded external measures conceptually related to the measured construct.
Predictive validity
a form of criterion validity where there is evidence that scores from an instrument predict external measures (recorded at a different point in time) conceptually related to the measured construct.
Experimental research
a form of research in which one or more variables are systematically manipulated to see their effect (alone or in combination) on an outcome variable. This term implies that data will be able to be used to make statements about cause and effect. Compare with cross-sectional research and correlational research.
Cross-sectional research
a form of research in which you observe what naturally goes on in the world without directly interfering with it by measuring several variables at a single time point. In psychology, this term usually implies that data come from people at different age points, with different people representing each age point. See also correlational research, longitudinal research.
Longitudinal research
a form of research in which you observe what naturally goes on in the world without directly interfering with it, by measuring several variables at multiple time points. See also correlational research, cross-sectional research.
Correlational research
a form of research in which you observe what naturally goes on in the world without directly interfering with it. This term implies that data will be analysed so as to look at relationships between naturally occurring variables rather than making statements about cause and effect. Compare with cross-sectional research, longitudinal research and experimental research.
Smartreader
a free piece of software that can be downloaded from the IBM SPSS website and enables people who do not have SPSS Statistics installed to open and view SPSS output files.
Histogram
a frequency distribution.
Central tendency
a generic term describing the centre of a frequency distribution of observations as measured by the mean, mode and median.
Quartiles
a generic term for the three values that cut an ordered data set into four equal parts. The three quartiles are known as the first or lower quartile, the second quartile (or median) and the third or upper quartile.
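A sketch of one way to compute them (illustrative; note that software packages use several slightly different interpolation rules, so results can differ at the margins):

```python
def quartiles(data):
    s = sorted(data)

    def percentile(p):
        # Linear interpolation between closest ranks (one common rule)
        k = (len(s) - 1) * p
        f = int(k)
        c = min(f + 1, len(s) - 1)
        return s[f] + (k - f) * (s[c] - s[f])

    # Lower quartile, median, upper quartile
    return percentile(0.25), percentile(0.5), percentile(0.75)
```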
CAIC (Bozdogan's criterion)
a goodness-of-fit measure similar to the AIC, but correcting for model complexity and sample size. It is not intrinsically interpretable, but can be compared in different models to see how changing the model affects the fit. A small value represents a better fit to the data.
AIC (Akaike's information criterion)
a goodness-of-fit measure that is corrected for model complexity. That just means that it takes account of how many parameters have been estimated. It is not intrinsically interpretable, but can be compared in different models to see how changing the model affects the fit. A small value represents a better fit to the data.
AICC (Hurvich and Tsai's criterion)
a goodness-of-fit measure that is similar to AIC but is designed for small samples. It is not intrinsically interpretable, but can be compared in different models to see how changing the model affects the fit. A small value represents a better fit to the data.
BIC (Schwarz's Bayesian information criterion)
a goodness-of-fit statistic comparable to the AIC, although it is slightly more conservative (it corrects more harshly for the number of parameters being estimated). It should be used when sample sizes are large and the number of parameters is small. It is not intrinsically interpretable, but can be compared in different models to see how changing the model affects the fit. A small value represents a better fit to the data.
Bar chart
a graph in which a summary statistic (usually the mean) is plotted on the y-axis against a categorical variable on the x-axis (this categorical variable could represent, for example, groups of people, different times or different experimental conditions). The value of the mean for each category is shown by a bar. Different-coloured bars may be used to represent levels of a second categorical variable.
Line chart
a graph in which a summary statistic (usually the mean) is plotted on the y-axis against a categorical variable on the x-axis (this categorical variable could represent, for example, groups of people, different times or different experimental conditions). The value of the mean for each category is shown by a symbol, and means across categories are connected by a line. Different-coloured lines may be used to represent levels of a second categorical variable.
Scree plot
a graph plotting each factor in a factor analysis (X-axis) against its associated eigenvalue (Y-axis). It shows the relative importance of each factor. This graph has a very characteristic shape (there is a sharp descent in the curve followed by a tailing off), and the point of inflexion of this curve is often used as a means of extraction. With a sample of more than 200 participants, this provides a fairly reliable criterion for extraction (Stevens, 2002).
Frequency distribution
a graph plotting values of observations on the horizontal axis, and the frequency with which each value occurs in the data set on the vertical axis (a.k.a. histogram).
Interaction graph
a graph showing the means of two or more independent variables in which means of one variable are shown at different levels of the other variable. Usually the means are connected with lines, or are displayed as bars. These graphs are used to help understand interaction effects.
Scatterplot
a graph that plots values of one variable against the corresponding values of another variable (and the corresponding values of a third variable can also be included on a 3-D scatterplot).
Boxplot (a.k.a. box-whisker diagram)
a graphical representation of some important characteristics of a set of observations. At the centre of the plot is the median, which is surrounded by a box the top and bottom of which are the limits within which the middle 50% of observations fall (the interquartile range). Sticking out of the top and bottom of the box are two whiskers which extend to the highest and lowest extreme scores, respectively.
Error bar chart
a graphical representation of the mean of a set of observations that includes the 95% confidence interval of the mean. The mean is usually represented as a circle, square or rectangle at the value of the mean (or a bar extending to the value of the mean). The confidence interval is represented by a line protruding from the mean (upwards, downwards or both) to a short horizontal line representing the limits of the confidence interval. Error bars can be drawn using the standard error or standard deviation instead of the 95% confidence interval.
Sphericity
a less restrictive form of compound symmetry which assumes that the variances of the differences between data taken from the same participant (or other entity being tested) are equal. This assumption is most commonly found in repeated-measures ANOVA but applies only where there are more than two points of data from the same participant. See also Greenhouse-Geisser correction, Huynh-Feldt correction.
Regression line
a line on a scatterplot representing the regression model of the relationship between the two variables plotted.
Discriminant function variate
a linear combination of variables created such that the differences between group means on the transformed variable are maximized. It takes the general form: V1 = b1X1 + b2X2 + … + bnXn, in which the Xs are the measured variables and the bs are weights chosen so that the group differences on V1 are as large as possible.
Simple regression
a linear model in which an outcome variable is predicted from a single predictor variable. The model takes the form of the equation Yi = b0 + b1Xi + εi, in which Y is the outcome variable, X is the predictor, b1 is the regression coefficient associated with the predictor, b0 is the value of the outcome when the predictor is zero, and εi is the error for case i.
Bayes' theorem
a mathematical description of the relationship between the conditional probability of events A and B, p(A|B), their reverse conditional probability, p(B|A), and individual probabilities of the events, p(A) and p(B). The theorem states that p(A|B) = p(B|A) × p(A)/p(B).
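The theorem is a single rearrangement of conditional probability; an illustrative sketch (the numbers in the usage example are made up):

```python
def bayes(p_b_given_a, p_a, p_b):
    # p(A|B) = p(B|A) * p(A) / p(B)
    return p_b_given_a * p_a / p_b
```

For example, with p(B|A) = 0.9, p(A) = 0.01 and p(B) = 0.05, the posterior p(A|B) is 0.18.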
Structure matrix
a matrix in factor analysis containing the correlation coefficients for each variable on each factor in the data. When orthogonal rotation is used this is the same as the pattern matrix, but when oblique rotation is used these matrices are different.
Pattern matrix
a matrix in factor analysis containing the regression coefficients for each variable on each factor in the data. See also structure matrix.
Square matrix
a matrix that has an equal number of columns and rows.
Factor transformation matrix, Λ
a matrix used in factor analysis. It can be thought of as containing the angles through which factors are rotated in factor rotation.
Mean squares
a measure of average variability. For every sum of squares (which measure the total variability) it is possible to create mean squares by dividing by the number of things used to calculate the sum of squares (or some function of it).
Log-likelihood
a measure of error, or unexplained variation, in categorical models. It is based on summing the probabilities associated with the predicted and actual outcomes and is analogous to the residual sum of squares in multiple regression in that it is an indicator of how much unexplained information there is after the model has been fitted. Large values of the log-likelihood statistic indicate poorly fitting statistical models, because the larger the value of the log-likelihood, the more unexplained observations there are. The log-likelihood is the logarithm of the likelihood.
Variance inflation factor (VIF)
a measure of multicollinearity. The VIF indicates whether a predictor has a strong linear relationship with the other predictor(s). Myers (1990) suggests that a value of 10 is a good value at which to worry. Bowerman and O'Connell (1990) suggest that if the average VIF is greater than 1, then multicollinearity may be biasing the regression model.
Split-half reliability
a measure of reliability obtained by splitting items on a measure into two halves (in some random fashion) and obtaining a score from each half of the scale. The correlation between the two scores, corrected to take account of the fact that the correlations are based on only half of the items, is used as a measure of reliability. There are two popular ways to do this. Spearman (1910) and Brown (1910) developed a formula that takes no account of the standard deviation of items: rSH = 2r12/(1 + r12), in which r12 is the correlation between the two halves of the scale. Flanagan (1937) and Rulon (1939), however, proposed a measure that does account for item variance: rSH = 4r12s1s2/sT2, in which s1 and s2 are the standard deviations of each half of the scale, and sT2 is the variance of the whole test. See Cortina (1993) for more details.
Covariance
a measure of the 'average' relationship between two variables. It is the average cross-product deviation (i.e., the cross-product divided by one less than the number of observations).
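A sketch of the calculation (illustrative; the (n − 1) denominator is the usual sample estimate):

```python
def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Average cross-product deviation (dividing by n - 1)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
```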
Cross-product deviations
a measure of the 'total' relationship between two variables. It is the deviation of one variable from its mean multiplied by the other variable's deviation from its mean.
DFBeta
a measure of the influence of a case on the values of bi in a regression model. If we estimated a regression parameter bi and then deleted a particular case and re-estimated the same regression parameter bi, then the difference between these two estimates would be the DFBeta for the case that was deleted. By looking at the values of the DFBetas, it is possible to identify cases that have a large influence on the parameters of the regression model; however, the size of DFBeta will depend on the units of measurement of the regression parameter.
DFFit
a measure of the influence of a case. It is the difference between the adjusted predicted value and the original predicted value of a particular case. If a case is not influential then its DFFit should be zero - hence, we expect non-influential cases to have small DFFit values. However, we have the problem that this statistic depends on the units of measurement of the outcome, and so a DFFit of 0.5 will be very small if the outcome ranges from 1 to 100, but very large if the outcome varies from 0 to 1.
Deleted residual
a measure of the influence of a particular case of data. It is the difference between the adjusted predicted value for a case and the original observed value for that case.
Adjusted predicted value
a measure of the influence of a particular case of data. It is the predicted value of a case from a model estimated without that case included in the data. The value is calculated by re-estimating the model without the case in question, then using this new model to predict the value of the excluded case. If a case does not exert a large influence over the model then its predicted value should be similar regardless of whether the model was estimated including or excluding that case. The difference between the predicted value of a case from the model when that case was included and the predicted value from the model when it was excluded is the DFFit.
Studentized deleted residual
a measure of the influence of a particular case of data. This is a standardized version of the deleted residual.
Adjusted R2
a measure of the loss of predictive power or shrinkage in regression. The adjusted R2 tells us how much variance in the outcome would be accounted for if the model had been derived from the population from which the sample was taken.
Cook's distance
a measure of the overall influence of a case on a model. Cook and Weisberg (1982) have suggested that values greater than 1 may be cause for concern.
Partial correlation
a measure of the relationship between two variables while 'controlling' for the effect that one or more additional variables have on both.
Semi-partial correlation
a measure of the relationship between two variables while adjusting for the effect that one or more additional variables have on one of those variables. If we call our variables x and y, it gives us a measure of the variance in y that x alone shares.
Cronbach's α
a measure of the reliability of a scale. It is defined as the number of items (N) squared multiplied by the average covariance between items (the average of the off-diagonal elements in the variance-covariance matrix), divided by the sum of all the elements in the variance-covariance matrix.
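As an illustrative sketch, here is α computed from the algebraically equivalent form based on item variances and the variance of total scores (function name and list-of-lists input are assumptions, not SPSS syntax):

```python
def cronbach_alpha(items):
    # items: one inner list per item, containing each person's score in order
    n_items = len(items)
    n_people = len(items[0])
    # Total score per person across all items
    totals = [sum(item[p] for item in items) for p in range(n_people)]

    def var(xs):
        # Sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(var(item) for item in items)
    # Equivalent to N^2 * mean covariance / sum of the variance-covariance matrix
    return (n_items / (n_items - 1)) * (1 - sum_item_vars / var(totals))
```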
Cramér's V
a measure of the strength of association between two categorical variables used when one of these variables has more than two categories. It is a variant of phi used because when one or both of the categorical variables contain more than two categories, phi fails to reach its minimum value of 0 (indicating no association).
Phi
a measure of the strength of association between two categorical variables. Phi is used with 2 × 2 contingency tables (tables which have two categorical variables and each variable has only two categories). Phi is a variant of the chi-square statistic, χ²: φ = √(χ²/N), in which N is the total number of observations.
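For a 2 × 2 table, phi can also be computed directly from the four cell counts, which (up to sign) equals √(χ²/N); a sketch (cell-naming convention is an assumption):

```python
import math

def phi_from_table(a, b, c, d):
    # Cells of the 2 x 2 table:  [a b; c d]
    # phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d))
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den
```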
Correlation coefficient
a measure of the strength of association or relationship between two variables. See Pearson's correlation coefficient, Spearman's correlation coefficient, Kendall's tau.
Skew
a measure of the symmetry of a frequency distribution. Symmetrical distributions have a skew of 0. When the frequent scores are clustered at the lower end of the distribution and the tail points towards the higher or more positive scores, the value of skew is positive. Conversely, when the frequent scores are clustered at the higher end of the distribution and the tail points towards the lower or more negative scores, the value of skew is negative.
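A sketch of a simple moment-based version (SPSS reports a slightly different, sample-adjusted estimator, so values will differ a little):

```python
def skew(xs):
    n = len(xs)
    m = sum(xs) / n
    # Population standard deviation (n in the denominator)
    s = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    # Average cubed standardized deviation
    return sum(((x - m) / s) ** 3 for x in xs) / n
```

Symmetrical data give 0; a long tail towards high scores gives a positive value.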
Model sum of squares
a measure of the total amount of variability for which a model can account. It is the difference between the total sum of squares and the residual sum of squares.
Total sum of squares
a measure of the total variability within a set of observations. It is the total squared deviance between each observation and the overall mean of all observations.
Residual sum of squares
a measure of the variability that cannot be explained by the model fitted to the data. It is the total squared deviance between the observations and the values of those observations predicted by whatever model is fitted to the data.
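The three sums of squares above fit together as SST = SSM + SSR; an illustrative sketch of the partition:

```python
def sums_of_squares(observed, predicted):
    mean = sum(observed) / len(observed)
    # Total: squared deviations of observations from the grand mean
    ss_t = sum((o - mean) ** 2 for o in observed)
    # Residual: squared deviations of observations from model predictions
    ss_r = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Model: what the model accounts for
    ss_m = ss_t - ss_r
    return ss_t, ss_m, ss_r
```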
Covariance ratio (CVR)
a measure of whether a case influences the variance of the parameters in a regression model. When this ratio is close to 1 the case has very little influence on the variances of the model parameters. Belsey et al. (1980) recommend the following: if the CVR of a case is greater than 1 + [3(k + 1)/n] then deleting that case will damage the precision of some of the model's parameters, but if it is less than 1 − [3(k + 1)/n] then deleting the case will improve the precision of some of the model's parameters (k is the number of predictors and n is the sample size).
Method of least squares
a method of estimating parameters (such as the mean, or a regression coefficient) that is based on minimizing the sum of squared errors. The parameter estimate will be the value, out of all of those possible, which has the smallest sum of squared errors.
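For a simple regression the least-squares estimates have a closed form; a minimal sketch (illustrative names, assuming one predictor):

```python
def least_squares(x, y):
    # Closed-form least-squares estimates for y = b0 + b1*x:
    # b1 = sum of cross-product deviations / sum of squared deviations of x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    # The line passes through the means, so b0 follows directly
    b0 = my - b1 * mx
    return b0, b1
```

Of all possible lines, this one has the smallest sum of squared errors.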
Kaiser's criterion
a method of extraction in factor analysis based on the idea of retaining factors with associated eigenvalues greater than 1. This method appears to be accurate when the number of variables in the analysis is less than 30 and the resulting communalities (after extraction) are all greater than 0.7, or when the sample size exceeds 250 and the average communality is greater than or equal to 0.6.
Alpha factoring
a method of factor analysis.
Hierarchical regression
a method of multiple regression in which the order in which predictors are entered into the regression model is determined by the researcher based on previous research: variables already known to be predictors are entered first, new variables are entered subsequently.
Stepwise regression
a method of multiple regression in which variables are entered into the model based on a statistical criterion (the semi-partial correlation with the outcome variable). Once a new variable is entered into the model, all variables in the model are assessed to see whether they should be removed.
Promax
a method of oblique rotation that is computationally faster than direct oblimin and so useful for large data sets.
Direct oblimin
a method of oblique rotation.
Equamax
a method of orthogonal rotation that is a hybrid of quartimax and varimax. It is reported to behave fairly erratically (see Tabachnick & Fidell, 2012) and so is probably best avoided.
Varimax
a method of orthogonal rotation. It attempts to maximize the dispersion of factor loadings within factors. Therefore, it tries to load a smaller number of variables highly onto each factor, resulting in more interpretable clusters of factors.
Quartimax
a method of orthogonal rotation. It attempts to maximize the spread of factor loadings for a variable across all factors. This often results in lots of variables loading highly on a single factor.
Weighted least squares
a method of regression in which the parameters of the model are estimated using the method of least squares but observations are weighted by some other variable. Often they are weighted by the inverse of their variance to combat heteroscedasticity.
Ordinary least squares (OLS)
a method of regression in which the parameters of the model are estimated using the method of least squares.
Oblique rotation
a method of rotation in factor analysis that allows the underlying factors to be correlated.
Orthogonal rotation
a method of rotation in factor analysis that keeps the underlying factors independent (i.e., not correlated).
Saturated model
a model that perfectly fits the data and, therefore, has no error. It contains all possible main effects and interactions between variables.
Open science
a movement to make the process, data and outcomes of scientific research freely available to everyone.
Principal component analysis (PCA)
a multivariate technique for identifying the linear components of a set of variables.
Factor analysis
a multivariate technique for identifying whether the correlations between a set of observed variables stem from their relationship to one or more latent variables in the data, each of which takes the form of a linear model.
Repeated contrast
a non-orthogonal planned contrast that compares the mean in each condition (except the first) to the mean of the preceding condition.
Simple contrast
a non-orthogonal planned contrast that compares the mean in each condition to the mean of either the first or last condition, depending on how the contrast is specified.
Difference contrast
a non-orthogonal planned contrast that compares the mean of each condition (except the first) to the overall mean of all previous conditions combined.
Helmert contrast
a non-orthogonal planned contrast that compares the mean of each condition (except the last) to the overall mean all subsequent conditions combined.
Deviation contrast
a non-orthogonal planned contrast that compares the mean of each group (except for the first or last, depending on how the contrast is specified) to the overall mean.
Kendall's tau
a non-parametric correlation coefficient similar to Spearman's correlation coefficient, but should be used in preference for a small data set with a large number of tied ranks.
Kruskal-Wallis test
a non-parametric test of whether more than two independent groups differ. It is the non-parametric version of one-way independent ANOVA.
Friedman's ANOVA
a non-parametric test of whether more than two related groups differ. It is the non-parametric version of one-way repeated-measures ANOVA.
Median test
a non-parametric test of whether samples are drawn from a population with the same median. So, in effect, it does the same thing as the Kruskal-Wallis test. It works on the basis of producing a contingency table that is split for each group into the number of scores that fall above and below the observed median of the entire data set. If the groups are from the same population then these frequencies would be expected to be the same in all conditions (about 50% above and about 50% below).
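The frequency table at the heart of the test is easy to construct by hand; a minimal Python sketch with two small made-up groups:

```python
import statistics

group_a = [3, 5, 7, 8, 9]
group_b = [1, 2, 4, 6, 10]

# Median of the entire data set, pooling both groups.
grand_median = statistics.median(group_a + group_b)  # 5.5

def split(scores, median):
    """(above, at-or-below) counts relative to the grand median."""
    above = sum(1 for s in scores if s > median)
    return above, len(scores) - above

table = {'A': split(group_a, grand_median), 'B': split(group_b, grand_median)}
# table == {'A': (3, 2), 'B': (2, 3)}
```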
Moses extreme reactions
a non-parametric test that compares the variability of scores in two groups, so it's a bit like a non-parametric Levene's test.
Mann-Whitney test
a non-parametric test that looks for differences between two independent samples. That is, it tests whether the populations from which two samples are drawn have the same location. It is functionally the same as Wilcoxon's rank-sum test, and both tests are non-parametric equivalents of the independent t-test.
Wilcoxon's rank-sum test
a non-parametric test that looks for differences between two independent samples. That is, it tests whether the populations from which two samples are drawn have the same location. It is functionally the same as the Mann-Whitney test, and both tests are non-parametric equivalents of the independent t-test.
Wilcoxon signed-rank test
a non-parametric test that looks for differences between two related samples. It is the non-parametric equivalent of the related t-test.
Mixed normal distribution
a normal-looking distribution that is contaminated by a small proportion of scores from a different distribution. These distributions are not normal and have too many scores in the tails (i.e., at the extremes). The effect of these heavy tails is to inflate the estimate of the population variance. This, in turn, makes significance tests lack power.
Weight
a number by which something (usually a variable in statistics) is multiplied. The weight assigned to a variable determines the influence that variable has within a mathematical equation: large weights give the variable a lot of influence.
Polynomial
a posh name for a growth curve or trend over time. If time is our predictor variable, then any polynomial is tested by including a variable that is the predictor raised to the power of the order of polynomial that we want to test: a linear trend is tested by time alone, a quadratic or second-order polynomial is tested by including a predictor that is time², for a fifth-order polynomial we need a predictor of time⁵ and for an nth-order polynomial we would have to include timeⁿ as a predictor.
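In code, each polynomial term is just the time variable raised to the order of the trend; a quick Python sketch with made-up time points:

```python
time = [1, 2, 3, 4, 5]

linear = time                       # first-order: time itself
quadratic = [t ** 2 for t in time]  # second-order: time squared
cubic = [t ** 3 for t in time]      # third-order: time cubed
print(quadratic)  # [1, 4, 9, 16, 25]
```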
Normal distribution
a probability distribution of a random variable that is known to have certain properties. It is perfectly symmetrical (has a skew of 0), and has a kurtosis of 0.
Chi-square distribution
a probability distribution of the sum of squares of several normally distributed variables. It tends to be used to test hypotheses about categorical data, and to test the fit of models to the observed data.
Loglinear analysis
a procedure used as an extension of the chi-square test to analyse situations in which we have more than two categorical variables and we want to test for relationships between these variables. Essentially, a linear model is fitted to the data that predicts expected frequencies (i.e., the number of cases expected in a given category). In this respect it is much the same as analysis of variance but for entirely categorical data.
Rotation
a process in factor analysis for improving the interpretability of factors. In essence, an attempt is made to transform the factors that emerge from the analysis in such a way as to maximize factor loadings that are already large, and minimize factor loadings that are already small. There are two general approaches: orthogonal rotation and oblique rotation.
Counterbalancing
a process of systematically varying the order in which experimental conditions are conducted. In the simplest case of there being two conditions (A and B), counterbalancing simply implies that half of the participants complete condition A followed by condition B, whereas the remainder do condition B followed by condition A. The aim is to remove systematic bias caused by practice effects or boredom effects.
Hypothesis
a proposed explanation for a fairly narrow phenomenon or set of observations. It is not a guess, but an informed, theory-driven attempt to explain what has been observed. A hypothesis cannot be tested directly but must first be operationalized as predictions about variables that can be measured (see experimental hypothesis and null hypothesis).
Random variable
a random variable is one that varies over time (e.g., your weight is likely to fluctuate over time).
M-estimator
a robust measure of location. One example is the median. In some cases it is a measure of location computed after outliers have been removed; unlike a trimmed mean, the amount of trimming used to remove outliers is determined empirically.
Discriminant score
a score for an individual case on a particular discriminant function variate obtained by substituting that case's scores on the measured variables into the equation that defines the variate in question.
Planned contrasts
a set of comparisons between group means that are constructed before any data are collected. These are theory-led comparisons and are based on the idea of partitioning the variance created by the overall effect of group differences into gradually smaller portions of variance. These tests have more power than post hoc tests.
Post hoc tests
a set of comparisons between group means that were not thought of before data were collected. Typically these tests involve comparing the means of all combinations of pairs of groups. To compensate for the number of tests conducted, each test uses a strict criterion for significance. As such, they tend to have less power than planned contrasts. They are usually used for exploratory work for which no firm hypotheses were available on which to base planned contrasts.
Sobel test
a significance test of mediation. It tests whether the relationship between a predictor variable and an outcome variable is significantly reduced when a mediator is included in the model. It tests the indirect effect of the predictor on the outcome.
Mean
a simple statistical model of the centre of a distribution of scores. A hypothetical estimate of the 'typical' score.
Factor score
a single score from an individual entity representing their performance on some latent variable. The score can be crudely conceptualized as follows: take an entity's score on each of the variables that make up the factor and multiply it by the corresponding factor loading for the variable, then add these values up (or average them).
Complete separation
a situation in logistic regression when the outcome variable can be perfectly predicted by one predictor or a combination of predictors! Suffice it to say this situation makes your computer have the equivalent of a nervous breakdown: it'll start gibbering, weeping and saying it doesn't know what to do.
Multicollinearity
a situation in which two or more variables are very closely linearly related.
Suppressor effect
a situation where a predictor has a significant effect but only when another variable is held constant.
Šidák correction
a slightly less conservative variant of a Bonferroni correction.
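The two corrected alpha levels are easy to compare directly; a sketch for five tests at an overall .05 level:

```python
alpha, m = 0.05, 5

bonferroni = alpha / m              # 0.01
sidak = 1 - (1 - alpha) ** (1 / m)  # ~0.0102: slightly less conservative
print(sidak > bonferroni)  # True
```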
Sample
a smaller (but hopefully representative) collection of units from a population used to determine truths about that population (e.g., how a given population behaves in certain conditions).
Identity matrix
a square matrix (i.e., having the same number of rows and columns) in which the diagonal elements are equal to 1, and the off-diagonal elements are equal to 0.
Variance-covariance matrix
a square matrix (i.e., same number of columns and rows) representing the variables measured. The diagonals represent the variances within each variable, whereas the off-diagonals represent the covariances between pairs of variables.
Sum of squares and cross-products matrix (SSCP matrix)
a square matrix in which the diagonal elements represent the sum of squares for a particular variable, and the off-diagonal elements represent the cross-products between pairs of variables. The SSCP matrix is basically the same as the variance-covariance matrix, except that the SSCP matrix expresses variability and between-variable relationships as total values, whereas the variance-covariance matrix expresses them as average values.
Index of mediation
a standardized measure of an indirect effect. In a mediation model, it is the indirect effect multiplied by the ratio of the standard deviation of the predictor variable to the standard deviation of the outcome variable.
Spearman's correlation coefficient
a standardized measure of the strength of relationship between two variables that does not rely on the assumptions of a parametric test. It is Pearson's correlation coefficient performed on data that have been converted into ranked scores.
Biserial correlation
a standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous. The biserial correlation coefficient is used when one variable is a continuous dichotomy (e.g., has an underlying continuum between the categories).
Point-biserial correlation
a standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous. The point-biserial correlation coefficient is used when the dichotomy is a discrete, or true, dichotomy (i.e., one for which there is no underlying continuum between the categories). An example of this is pregnancy: you can be either pregnant or not, there is no in between.
Standardized DFBeta
a standardized version of DFBeta. These standardized values are easier to use than DFBeta because universal cut-off points can be applied. Stevens (2002) suggests looking at cases with absolute values greater than 2.
Standardized DFFit
a standardized version of DFFit.
Test statistic
a statistic for which we know how frequently different values occur. The observed value of such a statistic is typically used to test hypotheses.
Trimmed mean
a statistic used in many robust tests. It is a mean calculated using trimmed data. For example, a 20% trimmed mean is a mean calculated after the top and bottom 20% of ordered scores have been removed. Imagine we had 20 scores representing the annual income of students (in thousands), rounded to the nearest thousand: 0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 40. The mean income is 5 (£5000), which is biased by an outlier. A 10% trimmed mean will remove 10% of scores from the top and bottom of ordered scores before the mean is calculated. With 20 scores, removing 10% of scores involves removing the top and bottom two scores. This gives us: 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, the mean of which is 3.44. The mean depends on a symmetrical distribution to be accurate, but a trimmed mean produces accurate results even when the distribution is not symmetrical. There are more complex examples of robust methods such as the bootstrap.
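The income example above can be computed directly in Python:

```python
# The 20 income scores from the definition (in thousands).
incomes = [0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 40]

def trimmed_mean(scores, proportion):
    """Mean after removing `proportion` of scores from each tail."""
    scores = sorted(scores)
    k = int(len(scores) * proportion)  # number of scores to drop from each end
    trimmed = scores[k:len(scores) - k]
    return sum(trimmed) / len(trimmed)

print(sum(incomes) / len(incomes))            # 5.0: ordinary mean, dragged up by 40
print(round(trimmed_mean(incomes, 0.10), 2))  # 3.44: 10% trimmed mean
```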
Linear model
a statistical model that is based upon an equation of the form Y = BX + E, in which Y is a vector containing scores from an outcome variable, B represents the b-values, X the predictor variables and E the error terms associated with each predictor. The equation can represent a solitary predictor variable (B, X and E are vectors) as in simple regression or multiple predictors (B, X and E are matrices) as in multiple regression. The key is the form of the model, which is linear (e.g., with a single predictor the equation is that of a straight line).
Analysis of covariance
a statistical procedure that uses the F-statistic to test the overall fit of a linear model, adjusting for the effect that one or more covariates have on the outcome variable. In experimental research this linear model tends to be defined in terms of group means and the resulting ANOVA is therefore an overall test of whether group means differ after the variance in the outcome variable explained by any covariates has been removed.
Analysis of variance
a statistical procedure that uses the F-statistic to test the overall fit of a linear model. In experimental research this linear model tends to be defined in terms of group means, and the resulting ANOVA is therefore an overall test of whether group means differ.
Contingency table
a table representing the cross-classification of two or more categorical variables. The levels of each variable are arranged in a grid, and the number of observations falling into each category is noted in the cells of the table. For example, if we took the categorical variables of glossary (with two categories: whether an author was made to write a glossary or not), and mental state (with three categories: normal, sobbing uncontrollably and utterly psychotic), we could construct a contingency table of the six resulting frequencies. Such a table might instantly tell us, say, that 127 authors who were made to write a glossary ended up as utterly psychotic, compared to only 2 who did not write a glossary.
Bootstrap
a technique from which the sampling distribution of a statistic is estimated by taking repeated samples (with replacement) from the data set (in effect, treating the data as a population from which smaller samples are taken). The statistic of interest (e.g., the mean, or b coefficient) is calculated for each sample, from which the sampling distribution of the statistic is estimated. The standard error of the statistic is estimated as the standard deviation of the sampling distribution created from the bootstrap samples. From this, confidence intervals and significance tests can be computed.
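A bare-bones Python sketch of the procedure, bootstrapping the mean of a small made-up sample:

```python
import random
import statistics

random.seed(1)  # reproducible resamples
data = [2, 4, 4, 5, 7, 9, 3, 6, 8, 5]

# Resample with replacement and collect the statistic of interest (the mean).
boot_means = sorted(
    statistics.mean(random.choices(data, k=len(data))) for _ in range(2000)
)

se = statistics.stdev(boot_means)           # bootstrap standard error of the mean
ci_95 = (boot_means[50], boot_means[1949])  # rough 95% percentile interval
```

With this sample the bootstrap standard error should land close to the analytic standard error of the mean (about 0.7 here), and the 95% percentile interval should comfortably contain the sample mean.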
Robust test
a term applied to a family of procedures to estimate statistics that are reliable even when the normal assumptions of the statistic are not met.
Monte Carlo method
a term applied to the process of using data simulations to solve statistical problems. Its name comes from the use of Monte Carlo roulette tables to generate 'random' numbers in the pre-computer age. Karl Pearson, for example, purchased copies of Le Monaco, a weekly Paris periodical that published data from the Monte Carlo casinos' roulette wheels. He used these data as pseudo-random numbers in his statistical research.
Pre-registration
a term referring to the practice of making all aspects of your research process (rationale, hypotheses, design, data processing strategy, data analysis strategy) publically available before data collection begins. This can be done in a registered report in an academic journal, or more informally (e.g., on a public website such as the Open Science Framework). The aim is to encourage adherence to an agreed research protocol, thus discouraging threats to the validity of scientific results such as researcher degrees of freedom.
General linear model
a term to represent the fact that the linear model can encompass a range of different research designs such as multiple outcome variables (a.k.a. MANOVA), comparing means of categorical predictors (a.k.a. t-test, ANOVA), and including both categorical and continuous predictors (a.k.a. ANCOVA).
Extraction
a term used for the process of deciding whether a factor in factor analysis is statistically important enough to 'extract' from the data and interpret. The decision is based on the magnitude of the eigenvalue associated with the factor. See Kaiser's criterion, scree plot.
Singularity
a term used to describe variables that are perfectly correlated (i.e., the correlation coefficient is 1 or −1).
Durbin-Watson test
a test for serial correlations between errors in regression models. Specifically, it tests whether adjacent residuals are correlated, which is useful in assessing the assumption of independent errors. The test statistic can vary between 0 and 4, with a value of 2 meaning that the residuals are uncorrelated. A value greater than 2 indicates a negative correlation between adjacent residuals, whereas a value below 2 indicates a positive correlation. The size of the Durbin-Watson statistic depends upon the number of predictors in the model and the number of observations. For accuracy, look up the exact acceptable values in Durbin and Watson's (1951) original paper. As a very conservative rule of thumb, values less than 1 or greater than 3 are definitely cause for concern; however, values closer to 2 may still be problematic, depending on the sample and model.
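The statistic itself is simple to compute from a list of residuals: the sum of squared successive differences divided by the sum of squared residuals. A sketch with two made-up residual patterns:

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    return num / sum(e ** 2 for e in residuals)

alternating = [1, -1, 1, -1, 1, -1]  # adjacent residuals negatively correlated
runs = [1, 1, 1, -1, -1, -1]         # adjacent residuals positively correlated

print(durbin_watson(alternating))  # 20/6 ≈ 3.33, well above 2
print(durbin_watson(runs))         # 4/6 ≈ 0.67, well below 2
```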
One-tailed test
a test of a directional hypothesis. For example, the hypothesis 'the longer I write this glossary, the more I want to place my editor's genitals in a starved crocodile's mouth' requires a one-tailed test because I've stated the direction of the relationship. I would generally advise against using them because of the temptation to interpret interesting effects in the opposite direction to that predicted. See also two-tailed test.
Two-tailed test
a test of a non-directional hypothesis. For example, the hypothesis 'writing this glossary has some effect on what I want to do with my editor's genitals' requires a two-tailed test because it doesn't suggest the direction of the relationship. See also one-tailed test.
Box's test
a test of the assumption of homogeneity of covariance matrices. This test should be non-significant if the matrices are roughly the same. Box's test is very susceptible to deviations from multivariate normality and so may be non-significant not because the variance-covariance matrices are similar across groups, but because the assumption of multivariate normality is not tenable. Hence, it is vital to have some idea of whether the data meet the multivariate normality assumption (which is extremely difficult) before interpreting the result of Box's test.
Mauchly's test
a test of the assumption of sphericity. If this test is significant then the assumption of sphericity has not been met and an appropriate correction must be applied to the degrees of freedom of the F-statistic in repeated-measures ANOVA. The test works by comparing the variance-covariance matrix of the data to an identity matrix; if the variance-covariance matrix is a scalar multiple of an identity matrix then sphericity is met.
Kolmogorov-Smirnov test
a test of whether a distribution of scores is significantly different from a normal distribution. A significant value indicates a deviation from normality, but this test is notoriously affected by large samples in which small deviations from normality yield significant results.
Shapiro-Wilk test
a test of whether a distribution of scores is significantly different from a normal distribution. A significant value indicates a deviation from normality, but this test is notoriously affected by large samples in which small deviations from normality yield significant results.
Roy's largest root
a test statistic in MANOVA. It is the eigenvalue for the first discriminant function variate of a set of observations. So, it is the same as the Hotelling-Lawley trace, but for the first variate only. It represents the proportion of explained variance to unexplained variance (SSM/SSR) for the first discriminant function.
Wilks's lambda (Λ)
a test statistic in MANOVA. It is the product of the unexplained variance on each of the discriminant function variates, so it represents the ratio of error variance to total variance (SSR/SST) for each variate.
Hotelling-Lawley trace (T²)
a test statistic in MANOVA. It is the sum of the eigenvalues for each discriminant function variate of the data and so is conceptually the same as the F-statistic in ANOVA: it is the sum of the ratio of systematic and unsystematic variance (SSM/SSR) for each of the variates.
Pillai-Bartlett trace (V)
a test statistic in MANOVA. It is the sum of the proportion of explained variance on the discriminant function variates of the data. As such, it is similar to the ratio of SSM/SST.
Wald statistic
a test statistic with a known probability distribution (a normal distribution, or a chi-square distribution when squared) that is used to test whether the b coefficient for a predictor in a logistic regression model is significantly different from zero. It is analogous to the t-statistic in a regression model in that it is simply the b coefficient divided by its standard error. The Wald statistic is inaccurate when the regression coefficient (b) is large, because the standard error tends to become inflated, resulting in the Wald statistic being underestimated.
F-statistic
a test statistic with a known probability distribution (the F-distribution). It is the ratio of the average variability in the data that a given model can explain to the average variability unexplained by that same model. It is used to test the overall fit of the model in simple regression and multiple regression, and to test for overall differences between group means in experiments.
t-statistic
a test statistic with a known probability distribution (the t-distribution). In the context of the linear model it is used to test whether a b-value is significantly different from zero; in the context of experimental work this b-value represents the difference between two means and so t is a test of whether the difference between those means is significantly different from zero. See also paired-samples t-test and independent t-test.
Parametric test
a test that requires data from one of the large catalogue of distributions that statisticians have described. Normally this term is used for parametric tests based on the normal distribution, which require four basic assumptions that must be met for the test to be accurate: a normally distributed sampling distribution (see normal distribution), homogeneity of variance, interval or ratio data, and independence.
Independent t-test
a test using the t-statistic that establishes whether two means collected from independent samples differ significantly.
Paired-samples t-test
a test using the t-statistic that establishes whether two means collected from the same sample (or related observations) differ significantly.
Percentiles
a type of quantile; they are values that split the data into 100 equal parts.
Noniles
a type of quantile; they are values that split the data into nine equal parts. They are commonly used in educational research.
Confounding variable
a variable (that we may or may not have measured) other than the predictor variables in which we're interested that potentially affects an outcome variable.
Currency variable
a variable containing values of money.
Date variable
a variable made up of dates. The data can take forms such as dd-mmm-yyyy (e.g., 21-Jun-1973), dd-mmm-yy (e.g., 21-Jun-73), mm/dd/yy (e.g., 06/21/73), dd.mm.yyyy (e.g., 21.06.1973).
Continuous variable
a variable that can be measured to any level of precision. (Time is a continuous variable, because there is in principle no limit on how finely it could be measured.)
Discrete variable
a variable that can only take on certain values (usually whole numbers) on the scale.
Latent variable
a variable that cannot be directly measured, but is assumed to be related to several variables that can be measured.
Moderator
a variable that changes the size and/or direction of the relationship between two other variables.
Covariate
a variable that has a relationship with (in terms of covariance), or has the potential to be related to, the outcome variable we've measured.
Predictor variable
a variable that is used to try to predict values of another variable known as an outcome variable.
Mediator
a variable that reduces the size and/or direction of the relationship between a predictor variable and an outcome variable (ideally to zero) and is associated statistically with both.
Outcome variable
a variable whose values we are trying to predict from one or more predictor variables.
Studentized residuals
a variation on standardized residuals. Studentized residuals are the unstandardized residual divided by an estimate of its standard deviation that varies point by point. These residuals have the same properties as the standardized residuals but usually provide a more precise estimate of the error variance of a specific case.
Partial eta squared (partial η²)
a version of eta squared that is the proportion of variance that a variable explains when excluding other variables in the analysis. Eta squared is the proportion of total variance explained by a variable, whereas partial eta squared is the proportion of variance that a variable explains that is not explained by other variables.
Confirmatory factor analysis (CFA)
a version of factor analysis in which specific hypotheses about structure and relations between the latent variables that underlie the data are tested.
Logistic regression
a version of multiple regression in which the outcome is a categorical variable. If the categorical variable has exactly two categories the analysis is called binary logistic regression, and when the outcome has more than two categories it is called multinomial logistic regression.
Brown-Forsythe F
a version of the F-statistic designed to be accurate when the assumption of homogeneity of variance has been violated.
Welch's F
a version of the F-statistic designed to be accurate when the assumption of homogeneity of variance has been violated. Not to be confused with the squelch test which is where you shake your head around after writing statistics books to see if you still have a brain.
Cox and Snell's R²
a version of the coefficient of determination for logistic regression. It is based on the log-likelihood of a model, the log-likelihood of the original model and the sample size, n. However, it is notorious for not reaching its maximum value of 1 (see Nagelkerke's R²).
Parameter
a very difficult thing to describe. When you fit a statistical model to your data, that model will consist of variables and parameters: variables are measured constructs that vary across entities in the sample, whereas parameters describe the relations between those variables in the population. In other words, they are constants believed to represent some fundamental truth about the measured variables. We use sample data to estimate the likely value of parameters because we don't have direct access to the population. Of course, it's not quite as simple as that.
Anderson-Rubin method
a way of calculating factor scores which produces scores that are uncorrelated and standardized with a mean of 0 and a standard deviation of 1.
Maximum-likelihood estimation
a way of estimating statistical parameters by choosing the parameters that make the data most likely to have happened. Imagine for a set of parameters that we calculated the probability (or likelihood) of getting the observed data; if this probability was high then these particular parameters yield a good fit of the data, but conversely if the probability was low, these parameters are a bad fit to our data. Maximum-likelihood estimation chooses the parameters that maximize the probability.
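A minimal Python sketch of the idea: estimating a coin's probability of heads by searching a grid of candidate parameter values for the one that makes the observed flips most likely (using the log-likelihood for numerical stability; the data are made up).

```python
import math

flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 7 heads out of 10 (made-up data)

def log_likelihood(p, data):
    """Log-probability of the observed flips given heads-probability p."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Grid search: the candidate that maximizes the likelihood is the ML estimate.
candidates = [i / 100 for i in range(1, 100)]
mle = max(candidates, key=lambda p: log_likelihood(p, flips))
print(mle)  # 0.7, the sample proportion of heads
```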
Dummy variables
a way of recoding a categorical variable with more than two categories into a series of variables all of which are dichotomous and can take on values of only 0 or 1. There are seven basic steps to create such variables: (1) count the number of groups you want to recode and subtract 1; (2) create as many new variables as the value you calculated in step 1 (these are your dummy variables); (3) choose one of your groups as a baseline (i.e., a group against which all other groups should be compared, such as a control group); (4) assign that baseline group values of 0 for all of your dummy variables; (5) for your first dummy variable, assign the value 1 to the first group that you want to compare against the baseline group (assign all other groups 0 for this variable); (6) for the second dummy variable assign the value 1 to the second group that you want to compare against the baseline group (assign all other groups 0 for this variable); (7) repeat this process until you run out of dummy variables.
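The steps above can be sketched in Python for a three-group variable (the group labels are made up), using the first group as the baseline:

```python
groups = ['control', 'drug_a', 'drug_b', 'drug_a', 'control']
baseline = 'control'

# Step 1-2: k groups need k - 1 dummy variables (skip the baseline).
categories = [g for g in dict.fromkeys(groups) if g != baseline]

# Steps 3-7: each dummy is 1 for its own group and 0 everywhere else,
# so the baseline group scores 0 on every dummy.
dummies = {cat: [1 if g == cat else 0 for g in groups] for cat in categories}
# dummies['drug_a'] == [0, 1, 0, 1, 0]
# dummies['drug_b'] == [0, 0, 1, 0, 0]
```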
Harmonic mean
a weighted version of the mean that takes account of the relationship between variance and sample size. It is calculated by summing the reciprocals of all observations and dividing by the number of observations; the reciprocal of the end product is the harmonic mean: H = n / (1/x₁ + 1/x₂ + … + 1/xₙ).
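A quick Python check of the definition (with made-up scores), against the standard library's own implementation:

```python
import statistics

scores = [1, 2, 4]

# Mean of the reciprocals, then the reciprocal of the result.
by_hand = 1 / (sum(1 / x for x in scores) / len(scores))  # 1 / (1.75 / 3)
print(by_hand)                           # ≈ 1.714 (exactly 12/7)
print(statistics.harmonic_mean(scores))  # same value from the stdlib
```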
ANCOVA
acronym for analysis of covariance.
ANOVA
acronym for analysis of variance.
MANOVA
acronym for multivariate analysis of variance.
Hartley's Fmax
also known as the variance ratio, is the ratio of the variances between the group with the biggest variance and the group with the smallest variance. This ratio is compared to critical values in a table published by Hartley as a test of homogeneity of variance. Some general rules are that with sample sizes (n) of 10 per group, an Fmax less than 10 is more or less always going to be non-significant, with 15-20 per group the ratio needs to be less than about 5, and with samples of 30-60 the ratio should be below about 2 or 3.
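The variance ratio itself is just the largest group variance divided by the smallest; a Python sketch with three made-up groups:

```python
import statistics

groups = {
    'low':  [5, 5, 6, 6, 5],    # sample variance 0.3
    'mid':  [2, 4, 6, 8, 10],   # sample variance 10.0
    'high': [1, 4, 7, 9, 12],   # sample variance 18.3
}

variances = {g: statistics.variance(s) for g, s in groups.items()}
f_max = max(variances.values()) / min(variances.values())  # 18.3 / 0.3 = 61
```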
Theory
although it can be defined more formally, a theory is a hypothesized general principle or set of principles that explain known findings about a topic and from which new hypotheses can be generated. Theories have typically been well substantiated by repeated testing.
Chi-square test
although this term can apply to any test statistic having a chi-square distribution, it generally refers to Pearson's chi-square test of the independence of two categorical variables. Essentially it tests whether two categorical variables forming a contingency table are associated.
Yates's continuity correction
an adjustment made to the chi-square test when the contingency table is 2 rows by 2 columns (i.e., there are two categorical variables both of which consist of only two categories). In large samples the adjustment makes little difference and is slightly dubious anyway (see Howell, 2012).
Repeated-measures ANOVA
an analysis of variance conducted on any design in which the independent variable (predictor) or variables (predictors) have all been measured using the same participants in all conditions.
Factorial ANOVA
an analysis of variance involving two or more independent variables or predictors.
Simple slopes analysis
an analysis that looks at the relationship (i.e., the simple regression) between a predictor variable and an outcome variable at low, mean and high levels of a third (moderator) variable.
Registered report
an article in a journal usually outlining an intended research process (rationale, hypotheses, design, data processing strategy, data analysis strategy). The report is reviewed by relevant expert scientists, ensuring that authors get useful feedback before data collection. If the protocol is accepted by the journal editor it typically comes with a guarantee to publish the findings no matter what they are, thus reducing publication bias and discouraging researcher degrees of freedom aimed at achieving significant results.
Homoscedasticity
an assumption in regression analysis that the residuals at each level of the predictor variable(s) have similar variances. Put another way, at each point along any predictor variable, the spread of residuals should be fairly constant.
Homogeneity of regression slopes
an assumption of analysis of covariance. This is the assumption that the relationship between the covariate and outcome variable is constant across different treatment levels. So, if we had three treatment conditions, if there's a positive relationship between the covariate and the outcome in one group, we assume that there is a similar-sized positive relationship between the covariate and outcome in the other two groups too.
Homogeneity of covariance matrices
an assumption of some multivariate tests such as MANOVA. It is an extension of the homogeneity of variance assumption in univariate analyses. However, as well as assuming that variances for each dependent variable are the same across groups, it also assumes that relationships (covariances) between these dependent variables are roughly equal. It is tested by comparing the population variance-covariance matrices of the different groups in the analysis.
Random effect
an effect is said to be random if the experiment contains only a sample of possible treatment conditions. Random effects can be generalized beyond the treatment conditions in the experiment. For example, the effect is random if we say that the conditions in our experiment (e.g., placebo, low dose and high dose) are only a sample of possible conditions (perhaps we could have tried a very high dose). We can generalize this random effect beyond just placebos, low doses and high doses.
Omega squared
an effect size measure associated with ANOVA that is less biased than eta squared. It is a (sometimes hideous) function of the model sum of squares and the residual sum of squares and isn't actually much use because it measures the overall effect of the ANOVA and so can't be interpreted in a meaningful way. In all other respects it's great, though.
Eta squared (η2)
an effect size measure that is the ratio of the model sum of squares to the total sum of squares. So, in essence, the coefficient of determination by another name. It doesn't have an awful lot going for it: not only is it biased, but it typically measures the overall effect of an ANOVA and effect sizes are more easily interpreted when they reflect specific comparisons (e.g., the difference between two means).
Variance
an estimate of average variability (spread) of a set of data. It is the sum of squares divided by the number of values on which the sum of squares is based minus 1.
Standard deviation
an estimate of the average variability (spread) of a set of data measured in the same units of measurement as the original data. It is the square root of the variance.
Greenhouse-Geisser estimate
an estimate of the departure from sphericity. The maximum value is 1 (the data completely meet the assumption of sphericity) and the minimum is the lower bound. Values below 1 indicate departures from sphericity and are used to correct the degrees of freedom associated with the corresponding F-statistics by multiplying them by the value of the estimate. Some say the Greenhouse-Geisser correction is too conservative (strict) and recommend the Huynh-Feldt correction instead.
Huynh-Feldt estimate
an estimate of the departure from sphericity. The maximum value is 1 (the data completely meet the assumption of sphericity). Values below this indicate departures from sphericity and are used to correct the degrees of freedom associated with the corresponding F-statistics by multiplying them by the value of the estimate. It is less conservative than the Greenhouse-Geisser estimate, but some say it is too liberal.
Sum of squares (SS)
an estimate of total variability (spread) of a set of observations around a parameter (such as the mean). First the deviance for each score is calculated, and then this value is squared. The SS is the sum of these squared deviances.
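The sum of squares, variance and standard deviation entries fit together in one short sketch (plain Python, made-up scores):

```python
import math

scores = [1, 3, 4, 3, 2, 5]                # hypothetical data
mean = sum(scores) / len(scores)

ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviances
variance = ss / (len(scores) - 1)          # SS divided by n - 1
sd = math.sqrt(variance)                   # square root of the variance

print(ss, variance, sd)  # 10.0 2.0 1.414...
```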
Independent design
an experimental design in which different treatment conditions utilize different organisms (e.g., in psychology, this would mean using different people in different treatment conditions) and so the resulting data are independent (a.k.a. between-groups or between-subjects designs).
Repeated-measures design
an experimental design in which different treatment conditions utilize the same organisms (i.e., in psychology, this would mean the same people take part in all experimental conditions) and so the resulting data are related (a.k.a. related design or within-subject design).
Independent factorial design
an experimental design incorporating two or more predictors (or independent variables) all of which have been manipulated using different participants (or whatever entities are being tested).
Related factorial design
an experimental design incorporating two or more predictors (or independent variables) all of which have been manipulated using the same participants (or whatever entities are being tested).
Mixed design
an experimental design incorporating two or more predictors (or independent variables) at least one of which has been manipulated using different participants (or whatever entities are being tested) and at least one of which has been manipulated using the same participants (or entities). Also known as a split-plot design because Fisher developed ANOVA for analysing agricultural data involving 'plots' of land containing crops.
Multiple regression
an extension of simple regression in which an outcome is predicted by a linear combination of two or more predictor variables. The model takes the form Y = b0 + b1X1 + b2X2 + ... + bnXn + error, in which the outcome is denoted by Y and each predictor is denoted by X. Each predictor has a regression coefficient b associated with it, and b0 is the value of the outcome when all predictors are zero.
Degrees of freedom
an impossible thing to define in a few pages, let alone a few lines. Essentially it is the number of 'entities' that are free to vary when estimating some kind of statistical parameter. In a more practical sense, it has a bearing on significance tests for many commonly used test statistics (such as the F-statistic, t-test, chi-square statistic) and determines the exact form of the probability distribution for these test statistics. The explanation involving soccer players in Chapter 2 is far more interesting...
Goodness of fit
an index of how well a model fits the data from which it was generated. It's usually based on how well the data predicted by the model correspond to the data that were actually collected.
Peer Reviewers' Openness Initiative
an initiative to get scientists to commit to the principles of open science when they act as expert reviewers for journals. Signing up is a pledge to review submissions only if the data, stimuli, materials, analysis scripts and so on are made publicly available (unless there is a good reason, such as a legal requirement, not to).
Ratio variable
an interval variable but with the additional property that ratios are meaningful. For example, people's ratings of this book on Amazon.com can range from 1 to 5; for these data to be ratio not only must they have the properties of interval variables, but in addition a rating of 4 should genuinely represent someone who enjoyed this book twice as much as someone who rated it as 2. Likewise, someone who rated it as 1 should be half as impressed as someone who rated it as 2.
Effect size
an objective and (usually) standardized measure of the magnitude of an observed effect. Measures include Cohen's d, Glass's g and Pearson's correlation coefficient, r.
Outlier
an observation or observations very different from most others. Outliers bias statistics (e.g., the mean) and their standard errors and confidence intervals.
Independent ANOVA
analysis of variance conducted on any design in which all independent variables or predictors have been manipulated using different participants (i.e., all data come from different entities).
Mixed ANOVA
analysis of variance used for a mixed design.
Reverse Helmert contrast
another name for a difference contrast.
Independent variable
another name for a predictor variable. This name is usually associated with experimental methodology (which is the only time it makes sense) and is used because it is the variable that is manipulated by the experimenter and so its value does not depend on any other variables (just on the experimenter). I just use the term predictor variable all the time because the meaning of the term is not constrained to a particular methodology.
Related design
another name for a repeated-measures design.
Within-subject design
another name for a repeated-measures design.
Part correlation
another name for a semi-partial correlation.
Factor
another name for an independent variable or predictor that's typically used when describing experimental designs. However, to add to the confusion, it is also used synonymously with latent variable in factor analysis.
Blockwise regression
another name for hierarchical regression.
Between-groups design
another name for independent design.
Between-subjects design
another name for independent design.
Hat values
another name for leverage.
Polychotomous logistic regression
another name for multinomial logistic regression.
Dependent variable
another name for outcome variable. This name is usually associated with experimental methodology (which is the only time it really makes sense) and is used because it is the variable that is not manipulated by the experimenter and so its value depends on the variables that have been manipulated. To be honest, I just use the term outcome variable all the time - it makes more sense (to me) and is less confusing.
Planned comparisons
another name for planned contrasts.
Second quartile
another name for the median.
Sum of squared errors
another name for the sum of squares.
Wald-Wolfowitz runs
another variant on the Mann-Whitney test. Scores are rank-ordered as in the Mann-Whitney test, but rather than analysing the ranks, this test looks for 'runs' of scores from the same group within the ranked order. Now, if there's no difference between groups then obviously ranks from the two groups should be randomly interspersed. However, if the groups are different then one should see more ranks from one group at the lower end, and more ranks from the other group at the higher end. By looking for clusters of scores in this way the test can determine if the groups differ.
Categorical variable
any variable made up of categories of objects/entities. The university you attend is a good example of a categorical variable: students who attend the University of Sussex are not also enrolled at Harvard or VU Amsterdam; therefore, students fall into distinct categories.
Variables
anything that can be measured and can differ across entities or across time.
Cross-validation
assessing the accuracy of a model across different samples. This is an important step in generalization. In a regression model there are two main methods of cross-validation: adjusted R2 or data splitting, in which the data are split randomly into two halves, and a regression model is estimated for each half and then compared.
Pairwise comparisons
comparisons of pairs of means.
Interval variable
data measured on a scale along the whole of which intervals are equal. For example, people's ratings of this book on Amazon.com can range from 1 to 5; for these data to be interval it should be true that the increase in appreciation for this book represented by a change from 3 to 4 along the scale should be the same as the change in appreciation represented by a change from 1 to 2, or 4 to 5.
Wide format data
data that are arranged such that scores from a single entity appear in a single row and levels of independent or predictor variables are arranged over different columns. As such, in designs with multiple measurements of an outcome variable within a case the outcome variable scores will be contained in multiple columns each representing a level of an independent variable, or a time point at which the score was observed. Columns can also represent attributes of the score or entity that are fixed over the duration of data collection, such as participant sex, employment status etc. (cf. long format data).
Long format data
data that are arranged such that scores on an outcome variable appear in a single column and rows represent a combination of the attributes of those scores - the entity from which the scores came, when the score was recorded, etc. In long format data, scores from a single entity can appear over multiple rows where each row represents a combination of the attributes of the score - for example, levels of an independent variable or time point at which the score was recorded (cf. wide format data).
Ordinal variable
data that tell us not only that things have occurred, but also the order in which they occurred. These data tell us nothing about the differences between values. For example, gold, silver and bronze medals are ordinal: they tell us that the gold medallist was better than the silver medallist, but they don't tell us how much better (was gold a lot better than silver, or were gold and silver very closely competed?).
Multimodal
description of a distribution of observations that has more than two modes.
Dichotomous
description of a variable that consists of only two categories (e.g., biological sex is a dichotomous variable because it consists of only two categories: male and female).
Validity
evidence that a study allows correct inferences about the question it was aimed to answer or that a test measures what it set out to measure conceptually. See also content validity, criterion validity.
Criterion validity
evidence that scores from an instrument correspond with (concurrent validity) or predict (predictive validity) external measures conceptually related to the measured construct.
Content validity
evidence that the content of a test corresponds to the content of the construct it was designed to cover.
Ecological validity
evidence that the results of a study, experiment or test can be applied, and allow inferences, to real-world conditions.
Perfect collinearity
exists when at least one predictor in a regression model is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated - they have a correlation coefficient of 1).
Qualitative methods
extrapolating evidence for a theory from what people say or write (contrast with quantitative methods).
Confidence interval
for a given statistic calculated for a sample of observations (e.g., the mean), the confidence interval is a range of values around that statistic that are believed to contain, in a certain proportion of samples (e.g., 95%), the true value of that statistic (i.e., the population parameter). What that also means is that for the other proportion of samples (e.g., 5%), the confidence interval won't contain that true value. The trouble is, you don't know which category your particular sample falls into.
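The 'proportion of samples' idea can be checked by simulation — a sketch (plain Python; the population mean of 50, SD of 10 and use of 1.96 standard errors are assumptions of the example):

```python
import random
import statistics

random.seed(42)
covered = 0
trials = 2000
for _ in range(trials):
    sample = [random.gauss(50, 10) for _ in range(40)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    if m - 1.96 * se <= 50 <= m + 1.96 * se:  # does the 95% CI contain mu?
        covered += 1

print(covered / trials)  # close to 0.95 (not exact: z vs t, sampling noise)
```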
Independent errors
for any two observations in regression the residuals should be uncorrelated (or independent).
Factor matrix
general term for the structure matrix in factor analysis.
Component matrix
general term for the structure matrix in principal components analysis.
Grand mean centring
grand mean centring means the transformation of a variable by taking each score and subtracting the mean of all scores (for that variable) from it (cf. group mean centring).
Group mean centring
group mean centring means the transformation of a variable by taking each score and subtracting from it the mean of the scores (for that variable) for the group to which that score belongs (cf. grand mean centring).
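Both kinds of centring in one sketch (plain Python, made-up scores and group labels):

```python
scores = [4, 6, 8, 10, 14, 18]          # hypothetical data
group = ["a", "a", "a", "b", "b", "b"]  # which group each score belongs to

# Grand mean centring: subtract the mean of all scores from each score
grand_mean = sum(scores) / len(scores)
grand_centred = [x - grand_mean for x in scores]

# Group mean centring: subtract the mean of the score's own group
group_means = {g: sum(x for x, gg in zip(scores, group) if gg == g)
                  / group.count(g) for g in set(group)}
group_centred = [x - group_means[g] for x, g in zip(scores, group)]

print(grand_centred)  # [-6.0, -4.0, -2.0, 0.0, 4.0, 8.0]
print(group_centred)  # [-2.0, 0.0, 2.0, -4.0, 0.0, 4.0]
```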
Fit
how sexually attractive you find a statistical test. Alternatively, it's the degree to which a statistical model is an accurate representation of some observed data. (Incidentally, it's just plain wrong to find statistical tests sexually attractive.)
Discriminant function analysis
identifies and describes the discriminant function variates of a set of variables and is useful as a follow-up test to MANOVA as a means of seeing how these variates allow groups of cases to be discriminated.
Quadratic trend
if the means in ordered conditions are connected with a line then a quadratic trend is shown by one change in the direction of this line (e.g., the line is curved in one place); the line is, therefore, U-shaped. There must be at least three ordered conditions.
Quartic trend
if the means in ordered conditions are connected with a line then a quartic trend is shown by three changes in the direction of this line. There must be at least five ordered conditions.
Standard error of differences
if we were to take several pairs of samples from a population and calculate their means, then we could also calculate the difference between their means. If we plotted these differences between sample means as a frequency distribution, we would have the sampling distribution of differences. The standard deviation of this sampling distribution is the standard error of differences. As such it is a measure of the variability of differences between sample means.
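That sampling distribution can be simulated directly — a sketch (plain Python; the population mean of 100, SD of 15 and samples of 25 are made up):

```python
import random
import statistics

random.seed(1)
diffs = []
for _ in range(5000):
    sample_1 = [random.gauss(100, 15) for _ in range(25)]
    sample_2 = [random.gauss(100, 15) for _ in range(25)]
    diffs.append(statistics.mean(sample_1) - statistics.mean(sample_2))

# The SD of the simulated differences approximates the standard error of
# differences; in theory it is sqrt(15**2/25 + 15**2/25), about 4.24
print(statistics.stdev(diffs))
```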
Cubic trend
if you connected the means in ordered conditions with a line then a cubic trend is shown by two changes in the direction of this line. You must have at least four ordered conditions.
Informative prior distribution
in Bayesian statistics an informative prior distribution is a distribution representing your beliefs in a model parameter where the distribution narrows those beliefs to some degree. For example, a prior distribution that is normal with a peak at 5 and range from 2 to 8 would narrow your beliefs in a parameter such that you most strongly believe that its value will be 5, and you think it is impossible for the parameter to be less than 2 or greater than 8. As such, this distribution constrains your prior beliefs. Informative priors can vary from weakly informative (you are prepared to believe a wide range of values) to strongly informative (your beliefs are very constrained) (cf. uninformative prior).
Uninformative prior distribution
in Bayesian statistics an uninformative prior distribution is a distribution representing your beliefs in a model parameter where the distribution assigns equal probability to all values of the model/parameter. For example, a prior distribution that is uniform across all potential values of a parameter suggests that you are prepared to believe that the parameter can take on any value with equal probability. As such, this distribution does not constrain your prior beliefs (cf. informative prior).
Credible interval
in Bayesian statistics, a credible interval is an interval within which a certain percentage of the posterior distribution falls (usually 95%). It can be used to express the limits within which a parameter falls with a fixed probability. For example, if we estimated the average length of a romantic relationship to be 6 years with a 95% credible interval of 1 to 11 years, then this would mean that 95% of the posterior distribution for the length of romantic relationships falls between 1 and 11 years. A plausible estimate of the length of romantic relationships would, therefore, be 1 to 11 years.
Parsimony
in a scientific context, parsimony refers to the idea that simpler explanations of a phenomenon are preferable to complex ones. This idea relates to Ockham's (or Occam's if you prefer) razor, which is a phrase referring to the principle of 'shaving' away unnecessary assumptions or explanations to produce less complex theories. In statistical terms, parsimony tends to refer to a general heuristic that models be kept as simple as possible - in other words, not including variables that don't have real explanatory benefit.
Population
in statistical terms this usually refers to the collection of units (be they people, plankton, plants, cities, suicidal authors, etc.) to which we want to generalize a set of findings or a statistical model.
Adjusted mean
in the context of analysis of covariance this is the value of the group mean adjusted for the effect of the covariate.
Quantitative methods
inferring evidence for a theory through measurement of variables that produce numeric outcomes (cf. qualitative methods).
Leverage
leverage statistics (or hat values) gauge the influence of the observed value of the outcome variable over the predicted values. The average leverage value is (k+1)/n, in which k is the number of predictors in the model and n is the number of participants. Leverage values can lie between 0 (the case has no influence whatsoever) and 1 (the case has complete influence over prediction). If no cases exert undue influence over the model then we would expect all of the leverage values to be close to the average value. Hoaglin and Welsch (1978) recommend investigating cases with values greater than twice the average (2(k + 1)/n) and Stevens (2002) recommends using three times the average (3(k + 1)/n) as a cut-off point for identifying cases having undue influence.
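A sketch of the cut-offs described above (plain Python; the number of predictors, sample size and hat values are made up):

```python
k, n = 2, 30                    # 2 predictors, 30 cases
average_leverage = (k + 1) / n  # (k + 1)/n = 0.1

hat_values = [0.08, 0.25, 0.35]  # hypothetical leverages for three cases

flags = []
for h in hat_values:
    if h > 3 * average_leverage:    # Stevens (2002): 3(k + 1)/n
        flags.append("investigate (3x average)")
    elif h > 2 * average_leverage:  # Hoaglin & Welsch (1978): 2(k + 1)/n
        flags.append("investigate (2x average)")
    else:
        flags.append("ok")

print(flags)
```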
Binary logistic regression
logistic regression in which the outcome variable has exactly two categories.
Multinomial logistic regression
logistic regression in which the outcome variable has more than two categories.
Multivariate
means 'many variables' and is usually used when referring to analyses in which there is more than one outcome variable (MANOVA, principal component analysis, etc.).
Univariate
means 'one variable' and is usually used to refer to situations in which only one outcome variable has been measured (ANOVA, t-tests, Mann-Whitney tests, etc.).
Orthogonal
means perpendicular (at right angles) to something. It tends to be equated to independence in statistics because of the connotation that perpendicular linear models in geometric space are completely independent (one is not influenced by the other).
Goodman and Kruskal's λ
measures the proportional reduction in error that is achieved when membership of a category of one variable is used to predict category membership of the other variable. A value of 1 means that one variable perfectly predicts the other, whereas a value of 0 indicates that one variable in no way predicts the other.
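A sketch of the 'proportional reduction in error' idea (plain Python; the 2 x 2 contingency table is invented):

```python
# Rows: categories of the predictor variable; columns: categories to predict
table = [[30, 10],
         [5, 25]]

n = sum(sum(row) for row in table)
col_totals = [sum(row[j] for row in table) for j in range(len(table[0]))]

errors_without = n - max(col_totals)  # always guess the modal column
errors_with = sum(sum(row) - max(row) for row in table)  # guess within each row

lam = (errors_without - errors_with) / errors_without
print(lam)  # about 0.57: knowing the row cuts prediction errors by 57%
```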
Kolmogorov-Smirnov Z
not to be confused with the Kolmogorov-Smirnov test that tests whether a sample comes from a normally distributed population. This tests whether two groups have been drawn from the same population (regardless of what that population may be). It does much the same as the Mann-Whitney test and Wilcoxon rank-sum test! This test tends to have better power than the Mann-Whitney test when sample sizes are less than about 25 per group.
Type I error
occurs when we believe that there is a genuine effect in our population, when in fact there isn't.
Type II error
occurs when we believe that there is no effect in the population, when in fact there is.
Mediation
perfect mediation occurs when the relationship between a predictor variable and an outcome variable can be completely explained by their relationships with a third variable. For example, taking a dog to work reduces work stress. This relationship is mediated by positive mood if (1) having a dog at work increases positive mood; (2) positive mood reduces work stress; and (3) the relationship between having a dog at work and work stress is reduced to zero (or at least weakened) when positive mood is included in the model.
Syntax
predefined written commands that instruct SPSS Statistics what you would like it to do (writing 'bugger off and leave me alone' doesn't seem to work ...).
Practice effect
refers to the possibility that participants' performance in a task may be influenced (positively or negatively) if they repeat the task because of familiarity with the experimental situation and/or the measures being used.
Boredom effect
refers to the possibility that performance in tasks may be influenced (the assumption is a negative influence) by boredom or lack of concentration if there are many tasks, or the task goes on for a long period of time.
p-hacking
research practices that lead to selective reporting of significant p-values. Some examples of p-hacking are: (1) trying multiple analyses and reporting only the one that yields significant results; (2) stopping collecting data at a point other than when the predetermined sample size is reached; (3) deciding whether to include data based on the effect they have on the p-value; (4) including (or excluding) variables in an analysis based on how they affect the p-value; (5) measuring multiple outcome or predictor variables but reporting only those for which the effects are significant; (6) merging groups of variables or scores to yield significant results; and (7) transforming, or otherwise manipulating, scores to yield significant p-values.
Variance ratio
see Hartley's Fmax.
Residuals
see Residual.
Box-whisker plot
see boxplot.
Discriminant analysis
see discriminant function analysis.
Leptokurtic
see kurtosis.
Platykurtic
see kurtosis.
Contaminated normal distribution
see mixed normal distribution.
Regression model
see multiple regression and simple regression.
Dependent t-test
see paired-samples t-test.
Negative skew
see skew.
Positive skew
see skew.
Standardized
see standardization.
VIF
see variance inflation factor.
Density plot
similar to a histogram except that rather than having summary bars representing the frequency of scores, it shows a smooth curve estimating the probability density of the scores. Density plots are useful for looking at the shape of a distribution of scores.
βi
standardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in a standardized form. It is the change in the outcome (in standard deviations) associated with a one standard deviation change in the predictor.
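In simple regression the idea reduces to a short sketch (plain Python, made-up scores; with one predictor the standardized coefficient equals Pearson's r):

```python
import statistics

x = [1, 2, 3, 4, 5]  # hypothetical predictor
y = [2, 4, 5, 4, 5]  # hypothetical outcome

mx, my = statistics.mean(x), statistics.mean(y)
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))  # unstandardized slope

# Change in the outcome (in SDs) per one-SD change in the predictor
beta = b * statistics.stdev(x) / statistics.stdev(y)
print(beta)
```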
Variance sum law
states that the variance of a difference between two independent variables is equal to the sum of their variances.
Chartjunk
superfluous material that distracts from the data being displayed on a graph.
Experimental hypothesis
synonym for alternative hypothesis.
Levene's test
tests the hypothesis that the variances in different groups are equal (i.e., the difference between the variances is zero). It basically does a one-way ANOVA on the deviations (i.e., the absolute value of the difference between each score and the mean of its group). A significant result indicates that the variances are significantly different - therefore, the assumption of homogeneity of variances has been violated. When sample sizes are large, small differences in group variances can produce a significant Levene's test. I do not recommend using this test - instead interpret statistics that have been adjusted for the degree of heterogeneity in variances.
Sign test
tests whether two related samples are different. It does the same thing as the Wilcoxon signed-rank test. Differences between the conditions are calculated and the sign of this difference (positive or negative) is analysed because it indicates the direction of differences. The magnitude of change is completely ignored (unlike in Wilcoxon's test, where the rank tells us something about the relative magnitude of change), and for this reason it lacks power. However, its computational simplicity makes it a nice party trick if ever anyone drunkenly accosts you needing some data quickly analysed without the aid of a computer - doing a sign test in your head really impresses people. Actually it doesn't, they just think you're a sad gimboid.
Kaiser-Meyer-Olkin measure of sampling adequacy (KMO)
the KMO can be calculated for individual and multiple variables and represents the ratio of the squared correlation between variables to the squared partial correlation between variables. It varies between 0 and 1: a value of 0 means that the sum of partial correlations is large relative to the sum of correlations, indicating diffusion in the pattern of correlations (hence, factor analysis is likely to be inappropriate); a value close to 1 indicates that patterns of correlations are relatively compact and so factor analysis should yield distinct and reliable factors. Values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great and values above 0.9 are superb (see Kaiser & Rice, 1974).
Reliability
the ability of a measure to produce consistent results when the same entities are measured under different conditions.
Test-retest reliability
the ability of a measure to produce consistent results when the same entities are tested at two different points in time.
Generalization
the ability of a statistical model to say something beyond the set of observations that spawned it. If a model generalizes it is assumed that predictions from that model can be applied not just to the sample on which it is based, but to a wider population from which the sample came.
Power
the ability of a test to detect an effect of a particular size (a value of 0.8 is a good level to aim for).
Falsification
the act of disproving a hypothesis or theory.
Researcher degrees of freedom
the analytic decisions a researcher makes that potentially influence the results of the analysis. Some examples are: when to stop data collection, which control variables to include in the statistical model, and whether to exclude cases from the analysis.
Independence
the assumption that one data point does not influence another. When data come from people, it basically means that the behaviour of one person does not influence the behaviour of another.
Homogeneity of variance
the assumption that the variance of one variable is stable (i.e., relatively similar) at all levels of another variable.
Interaction effect
the combined effect of two or more predictor variables on an outcome variable. It can be used to gauge moderation.
Deviance
the difference between the observed value of a variable and the value of that variable predicted by a statistical model.
Measurement error
the discrepancy between the numbers used to represent the thing that we're measuring and the actual value of the thing we're measuring (i.e., the value we would get if we could measure it directly).
Indirect effect
the effect of a predictor variable on an outcome variable through a mediator (cf. direct effect).
Direct effect
the effect of a predictor variable on an outcome variable when a mediator is present in the model (cf. indirect effect).
Error SSCP (E)
the error sum of squares and cross-products matrix. This is a sum of squares and cross-products matrix for the error in a predictive linear model fitted to multivariate data. It represents the unsystematic variance and is the multivariate equivalent of the residual sum of squares.
Sampling variation
the extent to which a statistic (the mean, median, t, F, etc.) varies in samples taken from the same population.
Publication bias
the fact that articles published in scientific journals tend to over-represent positive findings. This can be because (1) non-significant findings are less likely to be published; (2) scientists don't submit their non-significant results to journals; (3) scientists selectively report their results to focus on significant findings and exclude non-significant ones; and (4) scientists capitalize on researcher degrees of freedom to present their results in the most favourable light possible.
Probability density function (PDF)
the function that describes the probability of a random variable taking a certain value. It is the mathematical function that describes the probability distribution.
Hypothesis SSCP (H)
the hypothesis sum of squares and cross-products matrix. This is a sum of squares and cross-products matrix for a predictive linear model fitted to multivariate data. It represents the systematic variance and is the multivariate equivalent of the model sum of squares.
Exp(B)
the label that SPSS applies to the odds ratio. It is an indicator of the change in odds resulting from a unit change in the predictor in logistic regression. If the value is greater than 1 then it indicates that as the predictor increases, the odds of the outcome occurring increase. Conversely, a value less than 1 indicates that as the predictor increases, the odds of the outcome occurring decrease.
Interquartile range
the limits within which the middle 50% of an ordered set of observations fall. It is the difference between the value of the upper quartile and lower quartile.
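As a quick sketch (not from the book; the scores are invented), the interquartile range can be computed with Python's standard library:

```python
# statistics.quantiles with n=4 returns the three quartile cut points:
# lower quartile, median, upper quartile.
import statistics

scores = [2, 3, 5, 6, 7, 9, 12, 15, 18, 21]
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # the spread of the middle 50% of the ordered scores
```

Different packages use slightly different interpolation rules, so the exact quartile values can differ a little between SPSS, Python and R.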
−2LL
the log-likelihood multiplied by minus 2. This version of the likelihood is used in logistic regression.
Shrinkage
the loss of predictive power of a regression model if the model had been derived from the population from which the sample was taken, rather than from the sample itself. A model fitted to a sample capitalizes on chance variation in that sample, so its apparent predictive power 'shrinks' when the model is generalized beyond it.
Grand mean
the mean of an entire set of observations.
Median
the middle score of a set of ordered observations. When there is an even number of observations the median is the average of the two scores that fall either side of what would be the middle value.
Mode
the most frequently occurring score in a set of data.
Multiple R
the multiple correlation coefficient. It is the correlation between the observed values of an outcome and the values of the outcome predicted by a multiple regression model.
Lower-bound estimate
the name given to the lowest possible value of the Greenhouse-Geisser estimate of sphericity. Its value is 1/(k − 1), in which k is the number of treatment conditions.
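As a quick arithmetic sketch of the formula (values of k chosen for illustration):

```python
# Lower-bound estimate 1/(k - 1) for several numbers of repeated-measures
# conditions, k. With more conditions the lower bound gets smaller.
lower_bounds = {k: 1 / (k - 1) for k in (3, 4, 5)}
# k = 3 gives 0.5; k = 5 gives 0.25
```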
Heterogeneity of variance
the opposite of homogeneity of variance. This term means that the variance of one variable varies (i.e., is different) across levels of another variable.
Heteroscedasticity
the opposite of homoscedasticity. This occurs when the residuals at each level of the predictor variable(s) have unequal variances. Put another way, at each point along any predictor variable, the spread of residuals is different.
Tertium quid
the possibility that an apparent relationship between two variables is actually caused by the effect of a third variable on them both (often called the third-variable problem).
HARKing
the practice in research articles of presenting a hypothesis that was made after data were collected as though it were made before data collection.
Alternative hypothesis
the prediction that there will be an effect (i.e., that your experimental manipulation will have some effect or that certain variables will relate to each other).
Sampling distribution
the probability distribution of a statistic. We can think of this as follows: if we take a sample from a population and calculate some statistic (e.g., the mean), the value of this statistic will depend somewhat on the sample we took. As such the statistic will vary slightly from sample to sample. If, hypothetically, we took lots and lots of samples from the population and calculated the statistic of interest we could create a frequency distribution of the values we get. The resulting distribution is what the sampling distribution represents: the distribution of possible values of a given statistic that we could expect to get from a given population.
Odds
the probability of an event occurring divided by the probability of that event not occurring.
α-level
the probability of making a Type I error (usually this value is 0.05).
Experimentwise error rate
the probability of making a Type I error in an experiment involving one or more statistical comparisons when the null hypothesis is true in each case.
Familywise error rate
the probability of making a Type I error in any family of tests when the null hypothesis is true in each case. The 'family of tests' can be loosely defined as a set of tests conducted on the same data set and addressing the same empirical question.
β-level
the probability of making a Type II error (Cohen, 1992, suggests a maximum value of 0.2).
Likelihood
the probability of obtaining a set of observations given the parameters of a model fitted to those observations. When using Bayes' theorem to test a hypothesis, the likelihood is the probability that the observed data could be produced given the hypothesis or model being considered, p(data|model). It is the inverse conditional probability of the posterior probability. See also marginal likelihood.
Transformation
the process of applying a mathematical function to all observations in a data set, usually to correct some distributional abnormality such as skew or kurtosis.
Standardization
the process of converting a variable into a standard unit of measurement. The unit of measurement typically used is standard deviation units (see also z-scores). Standardization allows us to compare data when different units of measurement have been used (we could compare weight measured in kilograms to height measured in inches).
Randomization
the process of doing things in an unsystematic or random way. In the context of experimental research the word usually applies to the random assignment of participants to different treatment conditions.
Centring
the process of transforming a variable into deviations around a fixed point. This fixed point can be any value that is chosen, but typically a mean is used. To centre a variable the mean is subtracted from each score. See grand mean centring, group mean centring.
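A minimal sketch of grand mean centring with made-up scores:

```python
# Subtract the mean from each score so the centred scores become
# deviations around zero.
scores = [4.0, 6.0, 8.0, 10.0]
mean = sum(scores) / len(scores)      # 7.0
centred = [x - mean for x in scores]  # [-3.0, -1.0, 1.0, 3.0]
```

A handy property: the centred scores always sum to zero.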
Ranking
the process of transforming raw scores into numbers that represent their position in an ordered list of those scores. The raw scores are ordered from lowest to highest and the lowest score is assigned a rank of 1, the next highest score is assigned a rank of 2, and so on.
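A sketch of ranking some invented raw scores; this simple version assumes no tied scores (statistical packages assign tied scores their average rank):

```python
# The lowest raw score gets rank 1, the next lowest rank 2, and so on.
raw = [12, 7, 19, 3]
ordered = sorted(raw)
ranks = [ordered.index(score) + 1 for score in raw]  # [3, 2, 4, 1]
```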
Communality
the proportion of a variable's variance that is common variance. This term is used primarily in factor analysis. A variable that has no unique variance (or random variance) would have a communality of 1, whereas a variable that shares none of its variance with any other variable would have a communality of 0.
Coefficient of determination
the proportion of variance in one variable explained by a second variable. It is Pearson's correlation coefficient squared.
Range
the range of scores is the value of the smallest score subtracted from the highest score. It is a measure of the dispersion of a set of scores. See also variance, standard deviation and interquartile range.
Posterior odds
the ratio of posterior probability for one hypothesis to another. In Bayesian hypothesis testing the posterior odds are the ratio of the probability of the alternative hypothesis given the data, p(alternative|data), to the probability of the null hypothesis given the data, p(null|data).
Odds ratio
the ratio of the odds of an event occurring in one group compared to another. So, for example, if the odds of dying after writing a glossary are 4, and the odds of dying after not writing a glossary are 0.25, then the odds ratio is 4/0.25 = 16. This means that the odds of dying if you write a glossary are 16 times higher than if you don't. An odds ratio of 1 would indicate that the odds of a particular outcome are equal in both groups.
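The entry's own example written out as arithmetic:

```python
# Odds of dying after writing a glossary (4) versus after not writing one (0.25).
odds_writers = 4.0
odds_non_writers = 0.25
odds_ratio = odds_writers / odds_non_writers  # 16.0: writers' odds are 16 times higher
```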
Prior odds
the ratio of the probability of one hypothesis/model to a second. In Bayesian hypothesis testing, the prior odds are the probability of the alternative hypothesis, p(alternative), divided by the probability of the null hypothesis, p(null). The prior odds should reflect your belief in the alternative hypothesis relative to the null before you look at the data.
Bayes factor
the ratio of the probability of the observed data given the alternative hypothesis to the probability of the observed data given the null hypothesis. Put another way, it is the likelihood of the alternative hypothesis relative to the null. A Bayes factor of 3, for example, means that the observed data are 3 times more likely under the alternative hypothesis than under the null hypothesis. A Bayes factor less than 1 supports the null hypothesis by suggesting that the probability of the data given the null is higher than the probability of the data given the alternative hypothesis. Conversely, a Bayes factor greater than 1 suggests that the observed data are more likely given the alternative hypothesis than the null. Values between 1 and 3 are considered evidence for the alternative hypothesis that is 'barely worth mentioning', values between 3 and 10 are considered 'substantial evidence' ('having substance' rather than 'very strong') for the alternative hypothesis, and values greater than 10 are strong evidence for the alternative hypothesis.
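A sketch (numbers invented) of how the Bayes factor links the prior odds and posterior odds defined elsewhere in this glossary:

```python
# posterior odds = Bayes factor x prior odds
prior_odds = 0.5    # alternative judged half as likely as the null beforehand
bayes_factor = 6.0  # data are 6 times more likely under the alternative
posterior_odds = bayes_factor * prior_odds  # 3.0: alternative now favoured 3 to 1
```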
Factor loading
the regression coefficient of a variable for the linear model that describes a latent variable or factor in factor analysis.
Levels of measurement
the relationship between what is being measured and the numbers obtained on a scale.
Standardized residuals
the residuals of a model expressed in standard deviation units. Standardized residuals with an absolute value greater than 3.29 (actually, we usually just use 3) are cause for concern because in an average sample a value this high is unlikely to happen by chance; if more than 1% of our observations have standardized residuals with an absolute value greater than 2.58 (we usually just say 2.5) there is evidence that the level of error within our model is unacceptable (the model is a fairly poor fit to the sample data); and if more than 5% of observations have standardized residuals with an absolute value greater than 1.96 (or 2 for convenience) then there is also evidence that the model is a poor representation of the actual data.
Unstandardized residuals
the residuals of a model expressed in the units in which the original outcome variable was measured.
Null hypothesis
the reverse of the experimental hypothesis, it states that your prediction is wrong and the predicted effect doesn't exist.
Standard error
the standard deviation of the sampling distribution of a statistic. For a given statistic (e.g., the mean) it tells us how much variability there is in this statistic across samples from the same population. Large values, therefore, indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came.
Standard error of the mean (SE)
the standard error associated with the mean. Did you really need a glossary entry to work that out?
Total SSCP (T)
the total sum of squares and cross-products matrix. This is a sum of squares and cross-products matrix for an entire set of observations. It is the multivariate equivalent of the total sum of squares.
Main effect
the unique effect of a predictor variable (or independent variable) on an outcome variable. The term is usually used in the context of ANOVA.
z-score
the value of an observation expressed in standard deviation units. It is calculated by taking the observation, subtracting from it the mean of all observations, and dividing the result by the standard deviation of all observations. By converting a distribution of observations into z-scores a new distribution is created that has a mean of 0 and a standard deviation of 1.
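A sketch of z-scoring some invented observations, following the recipe in the entry (subtract the mean, divide by the standard deviation):

```python
import statistics

obs = [2.0, 4.0, 6.0, 8.0]
mean = statistics.fmean(obs)
sd = statistics.pstdev(obs)  # population standard deviation
z = [(x - mean) / sd for x in obs]
# The z-scores now have mean 0 and standard deviation 1.
```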
Predicted value
the value of an outcome variable based on specific values of the predictor variable or variables being placed into a statistical model.
Upper quartile
the value that cuts off the highest 25% of ordered scores. If the scores are ordered and then divided into two halves at the median, then the upper quartile is the median of the top half of the scores.
Lower quartile
the value that cuts off the lowest 25% of the data. If the data are ordered and then divided into two halves at the median, then the lower quartile is the median of the lower half of the scores.
Grand variance
the variance within an entire set of observations.
Data view
there are two ways to view the contents of the data editor window. The data view shows you a spreadsheet and can be used for entering raw data. See also variable view.
Variable view
there are two ways to view the contents of the data editor window. The variable view allows you to define properties of the variables for which you wish to enter data. See also data view.
Mahalanobis distances
these measure the influence of a case by examining the distance of cases from the mean(s) of the predictor variable(s). One needs to look for the cases with the highest values. It is not easy to establish a cut-off point at which to worry, although Barnett and Lewis (1978) have produced a table of critical values dependent on the number of predictors and the sample size. From their work it is clear that even with large samples (N = 500) and five predictors, values above 25 are cause for concern. In smaller samples (N = 100) and with fewer predictors (namely three) values greater than 15 are problematic, and in very small samples (N = 30) with only two predictors values greater than 11 should be examined. However, for more specific advice, refer to Barnett and Lewis's (1978) table.
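To make the computation concrete, here is a small pure-Python sketch (invented data, two predictors) of the squared Mahalanobis distance of each case from the centroid, inverting the 2×2 sample covariance matrix by hand:

```python
import statistics

x1 = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical predictor 1
x2 = [1.0, 3.0, 2.0, 5.0, 4.0]   # hypothetical predictor 2
n = len(x1)
m1, m2 = statistics.fmean(x1), statistics.fmean(x2)

# Sample covariance matrix elements (n - 1 in the denominator).
s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
det = s11 * s22 - s12 ** 2

# Squared distance of each case from (m1, m2), weighted by the inverse
# covariance matrix: d' S^-1 d written out for the 2x2 case.
d2 = [((a - m1) ** 2 * s22
       - 2 * (a - m1) * (b - m2) * s12
       + (b - m2) ** 2 * s11) / det
      for a, b in zip(x1, x2)]
```

The larger a case's value, the further it sits from the centre of the predictor space. (A handy check: the squared distances sum to (n − 1) times the number of predictors.)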
Simple effects analysis
this analysis looks at the effect of one independent variable (categorical predictor variable) at individual levels of another independent variable.
HE⁻¹
this is a matrix that is functionally equivalent to the hypothesis SSCP divided by the error SSCP in MANOVA. Conceptually it represents the ratio of systematic to unsystematic variance, so is a multivariate analogue of the F-statistic.
Meta-analysis
this is a statistical procedure for assimilating research findings. It is based on the simple idea that we can take effect sizes from individual studies that research the same question, quantify the observed effect in a standard way (using effect sizes) and then combine these effects to get a more accurate idea of the true effect in the population.
Kendall's W
this is much the same as Friedman's ANOVA but is used specifically for looking at the agreement between raters. So, if, for example, we asked 10 different women to rate the attractiveness of Justin Timberlake, David Beckham and Brad Pitt we could use this test to look at the extent to which they agree. Kendall's W ranges from 0 (no agreement between judges) to 1 (complete agreement between judges).
Unsystematic variation
this is variation that isn't due to the effect in which we're interested (so could be due to natural differences between people in different samples such as differences in intelligence or motivation). We can think of this as variation that can't be explained by whatever model we've fitted to the data.
Kurtosis
this measures the degree to which scores cluster in the tails of a frequency distribution. Kurtosis is calculated such that no kurtosis yields a value of 3. To make the measure more intuitive, SPSS Statistics (and some other packages) subtract 3 from the value so that no kurtosis is expressed as 0 and positive and negative kurtosis take on positive and negative values, respectively. A distribution with positive kurtosis (leptokurtic, kurtosis > 0) has too many scores in the tails and is too peaked, whereas a distribution with negative kurtosis (platykurtic, kurtosis < 0) has too few scores in the tails and is quite flat.
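A sketch of the convention described here, with invented scores: compute the raw fourth standardized moment and subtract 3, so no kurtosis comes out as 0. (SPSS itself applies a small-sample correction, so its reported value differs slightly.)

```python
import statistics

scores = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0]
m = statistics.fmean(scores)
sd = statistics.pstdev(scores)
raw_kurtosis = sum(((x - m) / sd) ** 4 for x in scores) / len(scores)
excess_kurtosis = raw_kurtosis - 3  # negative here: a platykurtic shape
```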
AR(1)
this stands for first-order autoregressive structure. It is a covariance structure used in multilevel linear models in which the relationship between scores changes in a systematic way. The correlation between scores is assumed to get smaller over time, and variances are assumed to be homogeneous. This structure is often used for repeated-measures data (especially when measurements are taken over time, as in growth models).
Jonckheere-Terpstra test
this statistic tests for an ordered pattern of medians across independent groups. Essentially it does the same thing as the Kruskal-Wallis test (i.e., test for a difference between the medians of the groups) but it incorporates information about whether the order of the groups is meaningful. As such, you should use this test when you expect the groups you're comparing to produce a meaningful order of medians.
Central limit theorem
this theorem states that when samples are large (above about 30) the sampling distribution will take the shape of a normal distribution regardless of the shape of the population from which the sample was drawn. For small samples the t-distribution better approximates the shape of the sampling distribution. We also know from this theorem that the standard deviation of the sampling distribution (i.e., the standard error of the sample mean) will be equal to the standard deviation of the sample (s) divided by the square root of the sample size (N).
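A small simulation sketch (setup invented) of both claims: means of samples from a heavily skewed population pile up with a spread close to s divided by the square root of N:

```python
import random
import statistics

random.seed(1)
population = [random.expovariate(1.0) for _ in range(10_000)]  # heavily skewed

N = 50
sample_means = [statistics.fmean(random.sample(population, N))
                for _ in range(2_000)]

observed_se = statistics.stdev(sample_means)                 # spread of the means
predicted_se = statistics.stdev(population) / N ** 0.5       # s / sqrt(N)
```

The two standard errors agree closely, and a histogram of sample_means would look roughly normal despite the skew of the population.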
Partial out
to partial out the effect of a variable is to remove the variance that the variable shares with other variables in the analysis before looking at their relationships (see partial correlation).
Tolerance
tolerance statistics measure multicollinearity and are simply the reciprocal of the variance inflation factor (1/VIF). Values below 0.1 indicate serious problems, although Menard (1995) suggests that values below 0.2 are worthy of concern.
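The reciprocal relationship in the entry, sketched with a made-up VIF value:

```python
vif = 8.0
tolerance = 1.0 / vif          # 0.125
concerning = tolerance < 0.2   # True: below Menard's 0.2 threshold
serious = tolerance < 0.1      # False: not below the 0.1 threshold
```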
bi
unstandardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in the units of measurement of the predictor. It is the change in the outcome associated with a unit change in the predictor.
Bartlett's test of sphericity
unsurprisingly, this is a test of the assumption of sphericity. This test examines whether a variance-covariance matrix is proportional to an identity matrix. Therefore, it effectively tests whether the diagonal elements of the variance-covariance matrix are equal (i.e., group variances are the same), and whether the off-diagonal elements are approximately zero (i.e., the dependent variables are not correlated). Jeremy Miles, who does a lot of multivariate stuff, claims he's never ever seen a matrix that reached non-significance using this test and, come to think of it, I've never seen one either (although I do less multivariate stuff), so you've got to wonder about its practical utility.
Quantiles
values that split a data set into equal portions. Quartiles, for example, are a special case of quantiles that split the data into four equal parts. Similarly, percentiles are points that split the data into 100 equal parts and noniles are points that split the data into nine equal parts (you get the general idea).
Numeric variables
variables involving numbers.
String variables
variables involving words (i.e., letter strings). Such variables could include responses to open-ended questions such as 'How much do you like writing glossary entries?'; the response might be 'About as much as I like placing my ballbag on hot coals'.
Common variance
variance shared by two or more variables.
Unique variance
variance that is specific to a particular variable (i.e., is not shared with other variables). We tend to use the term 'unique variance' to refer to variance that can be reliably attributed to only one measure, otherwise it is called random variance.
Random variance
variance that is unique to a particular variable but not reliably so.
Systematic variation
variation due to some genuine effect (be that the effect of an experimenter doing something to all of the participants in one sample but not in other samples, or natural variation between sets of variables). We can think of this as variation that can be explained by the model that we've fitted to the data.
Overdispersion
when the observed variance is bigger than expected from the logistic regression model. Like leprosy, you don't want it.
Autocorrelation
when the residuals of two observations in a regression model are correlated.
Nominal variable
where numbers merely represent names. For example, the numbers on sports players' shirts: a player with the number 1 on her back is not necessarily worse than a player with a 2 on her back. The numbers have no meaning other than denoting the type of player (full back, centre forward, etc.).