Discovering Statistics Using IBM SPSS Statistics, 5e


P-P plot / 'probability-probability plot'

A graph plotting the cumulative probability of a variable against the cumulative probability of a particular distribution (often a normal distribution). If values fall on the diagonal of the plot then the variable shares the same distribution as the one specified. Deviations from the diagonal show deviations from the distribution of interest.

Q-Q plot / 'quantile-quantile plot'

A graph plotting the quantiles of a variable against the quantiles of a particular distribution (often a normal distribution). If values fall on the diagonal of the plot then the variable shares the same distribution as the one specified. Deviations from the diagonal show deviations from the distribution of interest.

Theory

A hypothesized general principle or set of principles that explains known findings about a topic and from which new hypotheses can be generated. A theory has typically been well-substantiated by repeated testing.

Multilevel linear model (MLM)

A linear model (just like regression, ANCOVA, ANOVA, etc.) in which the hierarchical structure of the data is explicitly considered. In this analysis regression parameters can be fixed (as in regression and ANOVA) but also random (i.e., free to vary across different contexts at a higher level of the hierarchy). This means that for each regression parameter there is a fixed component but also an estimate of how much the parameter varies across contexts (see fixed coefficient, random coefficient).

Test of excess success (TES)

A procedure designed for identifying sets of results within academic articles that are 'too good to be true'.

Random intercept

A term used in multilevel linear modelling to denote when the intercept in the model is free to vary across different groups or contexts (cf. fixed intercept).

Fixed intercept

A term used in multilevel linear modelling to denote when the intercept in the model is not free to vary across different groups or contexts (cf. random intercept).

Random slope

A term used in multilevel linear modelling to denote when the slope of the model is free to vary across different groups or contexts (cf. fixed slope).

Fixed slope

A term used in multilevel linear modelling to denote when the slope of the model is not free to vary across different groups or contexts (cf. random slope).

Bartlett's test of sphericity

A test of the assumption of sphericity that examines whether a variance-covariance matrix is proportional to an identity matrix. It therefore effectively tests whether the diagonal elements of the variance-covariance matrix are equal (i.e., group variances are the same), and whether the off-diagonal elements are approximately zero (i.e., the dependent variables are not correlated). It has questionable practical utility because it is often significant.

Fixed variable

A variable that is not supposed to change over time (e.g., for most people their gender is a fixed variable).

Fisher's exact test

A way of computing the exact probability of a statistic. Designed originally to overcome the problem that with small samples the sampling distribution of the chi-square statistic deviates substantially from a chi-square distribution. It should be used with small samples.

Cohen's d

An effect size that expresses the difference between two means in standard deviation units. In general it can be estimated by dividing the difference between the two group means by a standard deviation (often the pooled standard deviation, or that of a control group): d = (X̄1 − X̄2)/s.
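
A minimal sketch of this calculation in Python, using the pooled standard deviation as the standardizer (other standardizers, such as the control group's standard deviation, are also common):

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Difference between two means in pooled standard deviation units."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    # Pool the two variances, weighting each by its degrees of freedom
    s_pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / s_pooled

d = cohens_d([5, 6, 7, 8], [3, 4, 5, 6])  # means differ by 2, pooled SD is about 1.29
```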

Biserial correlation coefficient

Coefficient used when one variable is a continuous dichotomy (e.g., has an underlying continuum between the categories).

Chi-square test

Generally refers to Pearson's chi-square test of the independence of two categorical variables. Essentially it tests whether two categorical variables forming a contingency table are associated.

How to interpret adjusted predicted value

If a case does not exert a large influence over the model then its predicted value should be similar regardless of whether the model was estimated including or excluding that case.

Journal

In the context of academia, a collection of articles on a broadly related theme, written by scientists, that report new data, new theoretical ideas or reviews/critiques of existing theories and data. Their main function is to induce learned helplessness in scientists through a complex process of self-esteem regulation using excessively harsh or complimentary peer feedback that has seemingly no obvious correlation with the actual quality of the work submitted.

Leptokurtic

Kurtosis greater than 0; the distribution has too many scores in the tails and is quite peaked.

Moderation

Moderation occurs when the relationship between two variables changes as a function of a third variable. For example, the relationship between watching horror films (predictor) and feeling scared at bedtime (outcome) might increase as a function of how vivid an imagination a person has (moderator).

Platykurtic

Negative kurtosis value; the distribution has too few scores in the tails and is quite flat.

Pearson's correlation coefficient

Pearson's product-moment correlation coefficient, to give it its full name, is a standardized measure of the strength of relationship between two variables. It can take any value from −1 (as one variable changes, the other changes in the opposite direction by the same amount), through 0 (as one variable changes the other doesn't change at all), to +1 (as one variable changes, the other changes in the same direction by the same amount).

Fixed effect

The case where all possible treatment conditions that a researcher is interested in are present in the experiment. The findings can be generalized only to the situations in the experiment. For example, if we say that we are interested only in the conditions that we had in our experiment (e.g., placebo, low dose and high dose), then we can generalize our findings only to situations involving a placebo, low dose and high dose.

Residuals

The collection of differences between the value a model predicts and the value observed in the data on which the model is based, calculated for each observation in a data set.

Fit

The degree to which a statistical model is an accurate representation of some observed data.

Residual

The difference between the value a model predicts and the value observed in the data on which the model is based. Basically, an error.

Adjusted predicted value

The predicted value of a case from a model estimated without that case included in the data, calculated by re-estimating the model without the case in question, then using this new model to predict the value of the excluded case. *A measure of the influence of a particular case of data*

Empirical probability

The probability of an event based on the observation of many trials. For example, if you define the collective as all men, then the empirical probability of infidelity in men will be the proportion of men who have been unfaithful while in a relationship. The probability applies to the collective and not to the individual events. You can talk about there being a 0.1 probability of men being unfaithful, but the individual men were either faithful or not, so their individual probability of infidelity was either 0 (they were faithful) or 1 (they were unfaithful).

F-statistic

The ratio of the average variability in the data that a given model can explain to the average variability unexplained by that same model. It is used to test the overall fit of the model in simple regression and multiple regression, and to test for overall differences between group means in experiments. It is a test statistic with a known probability distribution.

Hartley's Fmax AKA variance ratio

The ratio of the variances between the group with the biggest variance and the group with the smallest variance. This ratio is compared to critical values in a table published by Hartley as a test of homogeneity of variance. Some general rules are that with sample sizes (n) of 10 per group, an Fmax less than 10 is more or less always going to be non-significant, with 15-20 per group the ratio needs to be less than about 5, and with samples of 30-60 the ratio should be below about 2 or 3.
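
A sketch of the variance ratio itself (the significance test still requires Hartley's published table):

```python
from statistics import variance

def f_max(groups):
    """Hartley's Fmax: largest group variance divided by smallest group variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

ratio = f_max([[1, 2, 3, 4], [2, 4, 6, 8]])  # second group's variance is 4x the first's
```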

Standard error of differences

The standard deviation of a sampling distribution of differences, a measure of the variability of differences between sample means.

Grand mean centring

The transformation of a variable by taking each score and subtracting the mean of all scores (for that variable) from it
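
A minimal sketch of the transformation:

```python
from statistics import mean

def grand_mean_centre(scores):
    """Subtract the mean of all scores from each score."""
    grand_mean = mean(scores)
    return [score - grand_mean for score in scores]

centred = grand_mean_centre([2, 4, 6, 8])  # mean is 5, so centred scores sum to 0
```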

Range

The value of the smallest score subtracted from the highest score. It is a measure of the dispersion of a set of scores. See also variance, standard deviation and interquartile range.

Tolerance

This statistic measures multicollinearity and is simply the reciprocal of the variance inflation factor (1/VIF). Values below 0.1 indicate serious problems, although Menard (1995) suggests that values below 0.2 are worthy of concern.

Error SSCP (E) AKA The error sum of squares and cross-products matrix

This is a sum of squares and cross-products matrix for the error in a predictive linear model fitted to multivariate data. It represents the unsystematic variance and is the multivariate equivalent of the residual sum of squares.

Random effect

This occurs if the experiment contains only a sample of possible treatment conditions. The findings can be generalized beyond the treatment conditions in the experiment. For example, if we say that the conditions in our experiment (e.g., placebo, low dose and high dose) are only a sample of possible conditions (perhaps we could have tried a very high dose), we can generalize the findings beyond just placebos, low doses and high doses.

Mediation

This occurs when the relationship between a predictor variable and an outcome variable can be completely explained by their relationships with a third variable.

Heteroscedasticity

This occurs when the residuals at each level of the predictor variables(s) have unequal variances. Put another way, at each point along any predictor variable, the spread of residuals is different.

Adjusted R²

This tells us how much variance in the outcome would be accounted for if the model had been derived from the population from which the sample was taken. *measure of the loss of predictive power/shrinkage in regression*
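
Several shrinkage formulas exist; a sketch of one common choice (Wherry's formula), assuming n cases and k predictors:

```python
def adjusted_r_squared(r2, n_cases, n_predictors):
    """Wherry's formula: shrinks R-squared to estimate population-level fit."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)

adj = adjusted_r_squared(0.5, n_cases=11, n_predictors=1)  # shrinks 0.5 to about 0.44
```

The smaller the sample relative to the number of predictors, the bigger the shrinkage.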

Heterogeneity of variance

This term means that the variance of one variable varies (i.e., is different) across levels of another variable.

Cochran's Q

This test is an extension of McNemar's test and is basically a Friedman's ANOVA for dichotomous data. For example, suppose you asked 10 people whether they'd like to shoot Justin Timberlake, David Beckham and Simon Cowell and they could answer only 'yes' or 'no'. If we coded responses as 0 (no) and 1 (yes) we could use this test.

McNemar's test

This tests differences between two related groups (see Wilcoxon signed-rank test and sign test), when nominal data have been used. It's typically used when we're looking for changes in people's scores and it compares the proportion of people who changed their response in one direction (i.e., scores increased) to those who changed in the opposite direction (scores decreased). So, this test needs to be used when we've got two related dichotomous variables.
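
The test statistic depends only on the two discordant cells of the 2 × 2 table (the counts of people who changed in each direction). A sketch of the uncorrected version (the continuity-corrected variant is also common):

```python
def mcnemar_statistic(increased, decreased):
    """Chi-square statistic from the counts of people who changed in each direction."""
    return (increased - decreased) ** 2 / (increased + decreased)

stat = mcnemar_statistic(increased=10, decreased=4)  # compared to chi-square with df = 1
```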

Negative skew

When the frequent scores are clustered at the higher end of the distribution and the tail points towards the lower (or more negative) scores

Positive skew

When the frequent scores are clustered at the lower end of the distribution and the tail points towards the higher (or more positive) scores.

Binary variable

a categorical variable that has only two mutually exclusive categories (e.g., being dead or alive).

Fixed coefficient

a coefficient or model parameter that is fixed; that is, it cannot vary over situations or contexts (cf. random coefficient).

Random coefficient

a coefficient or model parameter that is free to vary over situations or contexts (cf. fixed coefficient).

Matrix

a collection of numbers arranged in columns and rows. The values within a matrix are typically referred to as components or elements.

Compound symmetry

a condition that holds true when both the variances across conditions are equal (this is the same as the homogeneity of variance assumption) and the covariances between pairs of conditions are also equal.

Polynomial contrast

a contrast that tests for trends in the data. In its most basic form it looks for a linear trend (i.e., that the group means increase proportionately).

Bonferroni correction

a correction applied to the α-level to control the overall Type I error rate when multiple significance tests are carried out. Each test conducted should use a criterion of significance of the α-level (normally 0.05) divided by the number of tests conducted. *Tends to be too strict when lots of tests are performed*
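
A minimal sketch of applying the correction to a set of p-values:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Test each p-value against alpha divided by the number of tests."""
    criterion = alpha / len(p_values)
    return [p < criterion for p in p_values]

flags = bonferroni_significant([0.001, 0.02, 0.04])  # criterion is 0.05 / 3
```

With three tests the criterion drops to roughly 0.0167, so 0.02 and 0.04 no longer count as significant, which illustrates why the correction can be too strict when many tests are performed.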

Bivariate correlation

a correlation between two variables.

Intraclass correlation (ICC)

a correlation coefficient that assesses the consistency between measures of the same class, that is, measures of the same thing (cf. Pearson's correlation coefficient, which measures the relationship between variables of a different class). Two common uses are in comparing paired data (such as twins) on the same measure, and assessing the consistency between judges' ratings of a set of objects. The calculation of these correlations depends on whether there is a measure of consistency (in which the order of scores from a source is considered but not the actual value around which the scores are anchored) or absolute agreement (in which both the order of scores and the relative values are considered), and whether the scores represent averages of many measures or just a single measure is required. This measure is also used in multilevel linear models to measure the dependency in data within the same context.

Unstructured

a covariance structure used in multilevel linear modelling. This covariance structure is completely general. Covariances are assumed to be completely unpredictable: they do not conform to a systematic pattern.

Variance components

a covariance structure used in multilevel linear modelling. This covariance structure is very simple and assumes that all random effects are independent and that the variances of random effects are assumed to be the same and sum to the variance of the outcome variable.

Diagonal

a covariance structure used in multilevel linear models. In this variance structure variances are assumed to be heterogeneous and all of the covariances are 0.

Probability distribution

a curve describing an idealized frequency distribution of a particular variable from which it is possible to ascertain the probability with which specific values of that variable will occur. For categorical variables it is simply a formula yielding the probability with which each category occurs.

p-curve

a curve summarizing the frequency distribution of p-values you'd expect to see in published research. On a graph that shows the value of the p-value on the horizontal axis against the frequency (or proportion) on the vertical axis, the p-curve is the line reflecting how frequently (or proportionately) each value of p should occur for a given effect size.

Growth curve

a curve that summarizes the change in some outcome over time. See polynomial.

Bimodal

a description of a distribution of observations that has two values that appear most often.

Posterior distribution

a distribution of posterior probabilities. This distribution should contain our subjective beliefs about a parameter or hypothesis after considering the data. The posterior distribution can be used to ascertain a value of the posterior probability (usually by examining some measure of where the peak of the distribution lies or a credible interval).

Prior distribution

a distribution of prior probabilities. This distribution should contain our subjective beliefs about a parameter or hypothesis before, or prior to, considering the data. The prior distribution can be an informative prior or an uninformative prior.

Common factor

a factor that affects all measured variables and, therefore, explains the correlations between those variables.

Unique factor

a factor that affects only one of many measured variables and, therefore, cannot explain the correlations between those variables.

Non-parametric tests

a family of statistical procedures that do not rely on the restrictive assumptions of parametric tests. In particular, they do not assume that the sampling distribution is normally distributed.

Multivariate analysis of variance

a family of tests that extend the basic analysis of variance to situations in which more than one outcome variable has been measured.

Concurrent validity

a form of criterion validity where there is evidence that scores from an instrument correspond to concurrently recorded external measures conceptually related to the measured construct.

Predictive validity

a form of criterion validity where there is evidence that scores from an instrument predict external measures (recorded at a different point in time) conceptually related to the measured construct.

Experimental research

a form of research in which one or more variables are systematically manipulated to see their effect (alone or in combination) on an outcome variable. This term implies that data will be able to be used to make statements about cause and effect.

Cross-sectional research

a form of research in which you observe what naturally goes on in the world without directly interfering with it by measuring several variables at a single time point. In psychology, this term usually implies that data come from people at different age points, with different people representing each age point.

Longitudinal research

a form of research in which you observe what naturally goes on in the world without directly interfering with it, by measuring several variables at multiple time points.

Correlational research

a form of research in which you observe what naturally goes on in the world without directly interfering with it. This term implies that data will be analysed so as to look at relationships between naturally occurring variables rather than making statements about cause and effect.

Smartreader

a free piece of software that can be downloaded from the IBM SPSS website and enables people who do not have SPSS Statistics installed to open and view SPSS output files.

Central tendency

a generic term describing the centre of a frequency distribution of observations as measured by the mean, mode and median.

Quartiles

a generic term for the three values that cut an ordered data set into four equal parts.

CAIC (Bozdogan's criterion)

a goodness-of-fit measure similar to the AIC, but correcting for model complexity and sample size. It is not intrinsically interpretable, but can be compared in different models to see how changing the model affects the fit. A small value represents a better fit to the data.

AIC (Akaike's information criterion)

a goodness-of-fit measure that is corrected for model complexity (takes account of how many parameters have been estimated). It is not intrinsically interpretable, but can be compared in different models to see how changing the model affects the fit. A small value represents a better fit to the data.

AICC (Hurvich and Tsai's criterion)

a goodness-of-fit measure that is similar to AIC but is designed for small samples. It is not intrinsically interpretable, but can be compared in different models to see how changing the model affects the fit. A small value represents a better fit to the data.

Bar chart

a graph in which a summary statistic (usually the mean) is plotted on the y-axis against a categorical variable on the x-axis (this categorical variable could represent, for example, groups of people, different times or different experimental conditions). The value of the mean for each category is shown by a bar. Different-coloured bars may be used to represent levels of a second categorical variable.

Line chart

a graph in which a summary statistic (usually the mean) is plotted on the y-axis against a categorical variable on the x-axis (this categorical variable could represent, for example, groups of people, different times or different experimental conditions). The value of the mean for each category is shown by a symbol, and means across categories are connected by a line. Different-coloured lines may be used to represent levels of a second categorical variable.

Scree plot

a graph plotting each factor in a factor analysis (X-axis) against its associated eigenvalue (Y-axis). It shows the relative importance of each factor. This graph has a very characteristic shape (there is a sharp descent in the curve followed by a tailing off), and the point of inflexion of this curve is often used as a means of extraction. With a sample of more than 200 participants, this provides a fairly reliable criterion for extraction (Stevens, 2002)

Frequency distribution AKA histogram

a graph plotting values of observations on the horizontal axis, and how many times each value occurs in the data set on the vertical axis.

Interaction graph

a graph showing the means of two or more independent variables in which means of one variable are shown at different levels of the other variable. Usually the means are connected with lines, or are displayed as bars. These graphs are used to help understand interaction effects.

Scatterplot

a graph that plots values of one variable against the corresponding values of another variable (and the corresponding values of a third variable can also be included on a 3-D scatterplot).

Boxplot AKA box-whisker diagram

a graphical representation of some important characteristics of a set of observations. At the centre of the plot is the median, which is surrounded by a box the top and bottom of which are the limits within which the middle 50% of observations fall (the interquartile range). Sticking out of the top and bottom of the box are two whiskers which extend to the highest and lowest extreme scores, respectively.

Error bar chart

a graphical representation of the mean of a set of observations that includes the 95% confidence interval of the mean. The mean is usually represented as a circle, square or rectangle at the value of the mean (or a bar extending to the value of the mean). The confidence interval is represented by a line protruding from the mean (upwards, downwards or both) to a short horizontal line representing the limits of the confidence interval. Error bars can be drawn using the standard error or standard deviation instead of the 95% confidence interval.

Sphericity

a less restrictive form of compound symmetry which assumes that the variances of the differences between data taken from the same participant (or other entity being tested) are equal. This assumption is most commonly found in repeated-measures ANOVA but applies only where there are more than two points of data from the same participant. See also Greenhouse-Geisser correction, Huynh-Feldt correction.

Regression line

a line on a scatterplot representing the regression model of the relationship between the two variables plotted.

Discriminant function variate

a linear combination of variables created such that the differences between group means on the transformed variable are maximized. It takes the general form: V = b0 + b1X1 + b2X2 + ... + bnXn.

Simple regression

a linear model in which an outcome is predicted from a single predictor variable. The model takes the form Y = b0 + b1X + error, in which Y is the outcome variable, X is the predictor, b1 is the regression coefficient associated with the predictor and b0 is the value of the outcome when the predictor is zero.
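
A sketch of estimating b0 and b1 by the method of least squares, using the closed-form solution for a single predictor:

```python
from statistics import mean

def simple_regression(x, y):
    """Least-squares estimates of b0 and b1 for the model Y = b0 + b1*X + error."""
    mean_x, mean_y = mean(x), mean(y)
    # Slope: cross-product deviations divided by the sum of squares of X
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x  # line passes through the means of X and Y
    return b0, b1

b0, b1 = simple_regression([1, 2, 3, 4], [3, 5, 7, 9])  # perfectly linear: Y = 1 + 2X
```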

Structure matrix

a matrix in factor analysis containing the correlation coefficients for each variable on each factor in the data. When orthogonal rotation is used this is the same as the pattern matrix, but when oblique rotation is used these matrices are different.

Pattern matrix

a matrix in factor analysis containing the regression coefficients for each variable on each factor in the data. See also structure matrix.

Square matrix

a matrix that has an equal number of columns and rows.

Factor transformation matrix, Λ

a matrix used in factor analysis. It can be thought of as containing the angles through which factors are rotated in factor rotation.

Mean squares

a measure of average variability. For every sum of squares (which measure the total variability) it is possible to create mean squares by dividing by the number of things used to calculate the sum of squares (or some function of it).

Log-likelihood

a measure of error, or unexplained variation, in categorical models. Based on summing the probabilities associated with the predicted and actual outcomes and is analogous to the residual sum of squares in multiple regression in that it is an indicator of how much unexplained information there is after the model has been fitted. Large values indicate poorly fitting statistical models, because the larger the value of the log-likelihood, the more unexplained observations there are. The logarithm of the likelihood.

Variance inflation factor (VIF)

a measure of multicollinearity. The VIF indicates whether a predictor has a strong linear relationship with the other predictor(s). Myers (1990) suggests that a value of 10 is a good value at which to worry. Bowerman and O'Connell (1990) suggest that if the average VIF is greater than 1, then multicollinearity may be biasing the regression model.
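
The VIF for a given predictor can be computed from the R² obtained when that predictor is regressed on all the other predictors; tolerance is its reciprocal. A sketch:

```python
def vif(r_squared):
    """VIF from the R-squared of regressing one predictor on the others."""
    return 1 / (1 - r_squared)

def tolerance(r_squared):
    """Tolerance is simply the reciprocal of the VIF."""
    return 1 - r_squared

vif(0.9)        # 10: the conventional cause for concern
tolerance(0.9)  # 0.1: the matching tolerance threshold
```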

Split-half reliability

a measure of reliability obtained by splitting items on a measure into two halves (in some random fashion) and obtaining a score from each half of the scale. The correlation between the two scores, corrected to take account of the fact that the correlations are based on only half of the items, is used as a measure of reliability. There are two popular ways to do this. Spearman (1910) and Brown (1910) developed a formula that takes no account of the standard deviations of the items: r = 2r12 / (1 + r12), in which r12 is the correlation between the two halves of the scale. Flanagan (1937) and Rulon (1939), however, proposed a measure that does account for item variance: r = 2(1 − (s1² + s2²)/sT²), in which s1 and s2 are the standard deviations of each half of the scale, and sT² is the variance of the whole test. See Cortina (1993) for more details.

Covariance

a measure of the 'average' relationship between two variables. It is the average cross-product deviation (i.e., the cross-product divided by one less than the number of observations).

Cross-product deviations

a measure of the 'total' relationship between two variables. It is the deviation of one variable from its mean multiplied by the other variable's deviation from its mean.
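
The two definitions above differ only in the final division; a minimal sketch:

```python
from statistics import mean

def cross_product_deviations(x, y):
    """'Total' relationship: sum of paired deviations from each variable's mean."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

def covariance(x, y):
    """'Average' relationship: cross-product deviations divided by n - 1."""
    return cross_product_deviations(x, y) / (len(x) - 1)

cov = covariance([1, 2, 3], [2, 4, 6])  # cross-products total 4, so covariance is 2
```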

DFBeta

a measure of the influence of a case on the values of bi in a regression model. If we estimated a regression parameter bi and then deleted a particular case and re-estimated the same regression parameter bi, then the difference between these two estimates would be the DFBeta for the case that was deleted. By looking at the values of DFBeta, it is possible to identify cases that have a large influence on the parameters of the regression model; however, the size of DFBeta will depend on the units of measurement of the regression parameter.

DFFit

a measure of the influence of a case. It is the difference between the adjusted predicted value and the original predicted value of a particular case. If a case is not influential then its DFFit should be zero - hence, we expect non-influential cases to have small DFFit values. However, we have the problem that this statistic depends on the units of measurement of the outcome, and so a DFFit of 0.5 will be very small if the outcome ranges from 1 to 100, but very large if the outcome varies from 0 to 1.

Deleted residual

a measure of the influence of a particular case of data. It is the difference between the adjusted predicted value for a case and the original observed value for that case.

Studentized deleted residual

a measure of the influence of a particular case of data. This is a standardized version of the deleted residual.

Cook's distance

a measure of the overall influence of a case on a model. It has been suggested that values greater than 1 may be cause for concern.

Partial correlation

a measure of the relationship between two variables while 'controlling' for the effect that one or more additional variables have on both.

Semi-partial correlation

a measure of the relationship between two variables while adjusting for the effect that one or more additional variables have on one of those variables. If we call our variables x and y, it gives us a measure of the variance in y that x alone shares.

Cronbach's α

a measure of the reliability of a scale. It is defined as the number of items (N) squared multiplied by the average covariance between items (the average of the off-diagonal elements in the variance-covariance matrix), divided by the sum of all the elements in the variance-covariance matrix.

Cramér's V

a measure of the strength of association between two categorical variables used when one of these variables has more than two categories. It is a variant of phi used because when one or both of the categorical variables contain more than two categories, phi fails to reach its minimum value of 0 (indicating no association).

Phi

a measure of the strength of association between two categorical variables. Phi is used with 2 × 2 contingency tables (tables which have two categorical variables and each variable has only two categories). Phi is a variant of the chi-square statistic, χ²: φ = √(χ²/N), in which N is the total number of observations.
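
Both phi and Cramér's V can be computed from the chi-square value; a sketch (Cramér's V scales by the smaller of rows − 1 and columns − 1, so it reduces to phi for a 2 × 2 table):

```python
def phi(chi_square, n):
    """Phi for a 2 x 2 contingency table: sqrt(chi-square / N)."""
    return (chi_square / n) ** 0.5

def cramers_v(chi_square, n, n_rows, n_cols):
    """Generalizes phi to tables where a variable has more than two categories."""
    k = min(n_rows - 1, n_cols - 1)
    return (chi_square / (n * k)) ** 0.5

phi(25, 100)              # 0.5
cramers_v(25, 100, 2, 2)  # equals phi for a 2 x 2 table
```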

Correlation coefficient

a measure of the strength of association or relationship between two variables. See Pearson's correlation coefficient, Spearman's correlation coefficient, Kendall's tau.

Skew

a measure of the symmetry of a frequency distribution. Symmetrical distributions have a skew of 0.
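
Several skewness formulas exist, differing in small-sample corrections; a sketch of the simple moment-based estimate (the average cubed standardized score):

```python
from statistics import mean, stdev

def skew(scores):
    """Average cubed z-score: 0 for symmetrical data, positive when the tail points right."""
    m, s, n = mean(scores), stdev(scores), len(scores)
    return sum(((x - m) / s) ** 3 for x in scores) / n

skew([1, 2, 3])        # symmetrical, so skew is 0
skew([1, 1, 1, 2, 9])  # long right tail, so skew is positive
```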

Model sum of squares

a measure of the total amount of variability for which a model can account. It is the difference between the total sum of squares and the residual sum of squares.

Total sum of squares

a measure of the total variability within a set of observations. It is the total squared deviance between each observation and the overall mean of all observations.

Residual sum of squares

a measure of the variability that cannot be explained by the model fitted to the data. It is the total squared deviance between the observations, and the value of those observations predicted by whatever model is fitted to the data.
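
The three sums of squares above fit together as SS_T = SS_M + SS_R; a sketch that computes all three from a set of observations and a model's predicted values:

```python
from statistics import mean

def sums_of_squares(observed, predicted):
    """Return (total, model, residual) sums of squares for a fitted model."""
    grand_mean = mean(observed)
    ss_total = sum((y - grand_mean) ** 2 for y in observed)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_model = ss_total - ss_residual  # the variability the model accounts for
    return ss_total, ss_model, ss_residual

ss_t, ss_m, ss_r = sums_of_squares([1, 2, 3, 4], [1.5, 2.0, 2.5, 4.0])
```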

Covariance ratio (CVR)

a measure of whether a case influences the variance of the parameters in a regression model. When this ratio is close to 1 the case has very little influence on the variances of the model parameters. Belsey et al. (1980) recommend the following: if the CVR of a case is greater than 1 + [3(k + 1)/n] then deleting that case will damage the precision of some of the model's parameters, but if it is less than 1 − [3(k + 1)/n] then deleting the case will improve the precision of some of the model's parameters (k is the number of predictors and n is the sample size).

Method of least squares

a method of estimating parameters (such as the mean, or a regression coefficient) that is based on minimizing the sum of squared errors. The parameter estimate will be the value, out of all of those possible, which has the smallest sum of squared errors.

Kaiser's criterion

a method of extraction in factor analysis based on the idea of retaining factors with associated eigenvalues greater than 1. This method appears to be accurate when the number of variables in the analysis is less than 30 and the resulting communalities (after extraction) are all greater than 0.7, or when the sample size exceeds 250 and the average communality is greater than or equal to 0.6.

Alpha factoring

a method of factor analysis.

Hierarchical regression

a method of multiple regression in which the order in which predictors are entered into the regression model is determined by the researcher based on previous research: variables already known to be predictors are entered first, new variables are entered subsequently.

Stepwise regression

a method of multiple regression in which variables are entered into the model based on a statistical criterion (the semi-partial correlation with the outcome variable). Once a new variable is entered into the model, all variables in the model are assessed to see whether they should be removed.

Promax

a method of oblique rotation that is computationally faster than direct oblimin and so useful for large data sets.

Direct oblimin

a method of oblique rotation.

Equamax

a method of orthogonal rotation that is a hybrid of quartimax and varimax. *Fairly erratic and so probably best avoided.*

Varimax

a method of orthogonal rotation. It attempts to maximize the dispersion of factor loadings within factors. Therefore, it tries to load a smaller number of variables highly onto each factor, resulting in more interpretable clusters of factors.

Quartimax

a method of orthogonal rotation. It attempts to maximize the spread of factor loadings for a variable across all factors. This often results in lots of variables loading highly on a single factor.

Weighted least squares

a method of regression in which the parameters of the model are estimated using the method of least squares but observations are weighted by some other variable. Often they are weighted by the inverse of their variance to combat heteroscedasticity.

Ordinary least squares (OLS)

a method of regression in which the parameters of the model are estimated using the method of least squares.

Oblique rotation

a method of rotation in factor analysis that allows the underlying factors to be correlated.

Orthogonal rotation

a method of rotation in factor analysis that keeps the underlying factors independent (i.e., not correlated).

Saturated model

a model that perfectly fits the data and, therefore, has no error. It contains all possible main effects and interactions between variables.

Open science

a movement to make the process, data and outcomes of scientific research freely available to everyone.

Principal component analysis (PCA)

a multivariate technique for identifying the linear components of a set of variables.

Factor analysis

a multivariate technique for identifying whether the correlations between a set of observed variables stem from their relationship to one or more latent variables in the data, each of which takes the form of a linear model.

Repeated contrast

a non-orthogonal planned contrast that compares the mean in each condition (except the first) to the mean of the preceding condition.

Simple contrast

a non-orthogonal planned contrast that compares the mean in each condition to the mean of either the first or last condition, depending on how the contrast is specified.

Difference contrast

a non-orthogonal planned contrast that compares the mean of each condition (except the first) to the overall mean of all previous conditions combined.

Helmert contrast

a non-orthogonal planned contrast that compares the mean of each condition (except the last) to the overall mean all subsequent conditions combined.

Deviation contrast

a non-orthogonal planned contrast that compares the mean of each group (except for the first or last, depending on how the contrast is specified) to the overall mean.

Kendall's tau

a non-parametric correlation coefficient similar to Spearman's correlation coefficient, but it should be used in preference for small data sets with a large number of tied ranks.

Kruskal-Wallis test

a non-parametric test of whether more than two independent groups differ. It is the non-parametric version of one-way independent ANOVA.

Friedman's ANOVA

a non-parametric test of whether more than two related groups differ. The non-parametric version of one-way repeated-measures ANOVA.

Median test

a non-parametric test of whether samples are drawn from a population with the same median. So, in effect, it does the same thing as the Kruskal-Wallis test. It works on the basis of producing a contingency table that is split for each group into the number of scores that fall above and below the observed median of the entire data set. If the groups are from the same population then these frequencies would be expected to be the same in all conditions (about 50% above and about 50% below).

Moses extreme reactions

a non-parametric test that compares the variability of scores in two groups, so it's a bit like a non-parametric Levene's test.

Wilcoxon signed-rank test

a non-parametric test that looks for differences between two RELATED samples. It is the non-parametric equivalent of the related t-test.

Mann-Whitney test

a non-parametric test that looks for differences between two independent samples. That is, it tests whether the populations from which two samples are drawn have the same location. It is functionally the same as Wilcoxon's rank-sum test, and both tests are non-parametric equivalents of the independent t-test.

Wilcoxon's rank-sum test

a non-parametric test that looks for differences between two independent samples. That is, it tests whether the populations from which two samples are drawn have the same location. It is functionally the same as the Mann-Whitney test, and both tests are non-parametric equivalents of the independent t-test.

Mixed normal distribution AKA Contaminated normal distribution

a normal-looking distribution that is contaminated by a small proportion of scores from a different distribution. These distributions are not normal and have too many scores in the tails (i.e., at the extremes). The effect of these heavy tails is to inflate the estimate of the population variance. This, in turn, makes significance tests lack power.

Weight

a number by which something (usually a variable in statistics) is multiplied. The weight assigned to a variable determines the influence that variable has within a mathematical equation: large weights give the variable a lot of influence.

Polynomial

a posh name for a growth curve or trend over time. If time is our predictor variable, then any polynomial is tested by including a variable that is the predictor raised to the power of the order of polynomial that we want to test: a linear trend is tested by time alone, a quadratic or second-order polynomial is tested by including a predictor that is time², for a fifth-order polynomial we need a predictor of time⁵, and for an nth-order polynomial we would have to include timeⁿ as a predictor.
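Constructing these polynomial predictors is mechanical; a hypothetical sketch with five time points:

```python
time = [1, 2, 3, 4, 5]                 # the predictor: measurement occasions

linear = time                          # first-order: time itself
quadratic = [t ** 2 for t in time]     # second-order: time squared
cubic = [t ** 3 for t in time]         # third-order: time cubed
# ...and an nth-order polynomial would add [t ** n for t in time].
```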

Normal distribution

a probability distribution of a random variable that is known to have certain properties. It is perfectly symmetrical (has a skew of 0), and has a kurtosis of 0.

Chi-square distribution

a probability distribution of the sum of squares of several normally distributed variables. It tends to be used to test hypotheses about categorical data, and to test the fit of models to the observed data.

Loglinear analysis

a procedure used as an extension of the chi-square test to analyse situations in which we have more than two categorical variables and we want to test for relationships between these variables. Essentially, a linear model is fitted to the data that predicts expected frequencies (i.e., the number of cases expected in a given category). In this respect it is much the same as analysis of variance but for entirely categorical data.

Rotation

a process in factor analysis for improving the interpretability of factors. In essence, an attempt is made to transform the factors that emerge from the analysis in such a way as to maximize factor loadings that are already large, and minimize factor loadings that are already small. There are two general approaches: orthogonal rotation and oblique rotation.

Counterbalancing

a process of systematically varying the order in which experimental conditions are conducted. In the simplest case of there being two conditions (A and B), this implies that half of the participants complete condition A followed by condition B, whereas the remainder do condition B followed by condition A. The aim is to remove systematic bias caused by practice effects or boredom effects.

Hypothesis

a proposed explanation for a fairly narrow phenomenon or set of observations. It is not a guess, but an informed, theory-driven attempt to explain what has been observed. It cannot be tested directly but must first be operationalized as predictions about variables that can be measured.

M-estimator

a robust measure of location. One example is the median. In some cases it is a measure of location computed after outliers have been removed; unlike a trimmed mean, the amount of trimming used to remove outliers is determined empirically.

Discriminant score

a score for an individual case on a particular discriminant function variate obtained by substituting that case's scores on the measured variables into the equation that defines the variate in question.

Planned contrasts

a set of comparisons between group means that are constructed before any data are collected. These are theory-led comparisons and are based on the idea of partitioning the variance created by the overall effect of group differences into gradually smaller portions of variance. These tests have more power than post hoc tests.

Post hoc tests

a set of comparisons between group means that were not thought of before data were collected. Typically these tests involve comparing the means of all combinations of pairs of groups. To compensate for the number of tests conducted, each test uses a strict criterion for significance. As such, they tend to have less power than planned contrasts. They are usually used for exploratory work for which no firm hypotheses were available on which to base planned contrasts.

Sobel test

a significance test of mediation. It tests whether the relationship between a predictor variable and an outcome variable is significantly reduced when a mediator is included in the model. It tests the indirect effect of the predictor on the outcome.

Mean

a simple statistical model of the centre of a distribution of scores. A hypothetical estimate of the 'typical' score.

Factor score

a single score from an individual entity representing their performance on some latent variable. The score can be crudely conceptualized as follows: take an entity's score on each of the variables that make up the factor and multiply it by the corresponding factor loading for the variable, then add these values up (or average them).
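The crude weighted-sum conceptualization can be written out directly (the numbers and function name here are illustrative, not from the text):

```python
def crude_factor_score(scores, loadings):
    """Sum of each variable's score multiplied by its factor loading."""
    return sum(s * l for s, l in zip(scores, loadings))

# An entity's scores on three variables, and those variables' loadings on one factor:
score = crude_factor_score([4, 2, 5], [0.8, 0.3, 0.6])  # 3.2 + 0.6 + 3.0
```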

Complete separation

a situation in logistic regression when the outcome variable can be perfectly predicted by one predictor or a combination of predictors.

Multicollinearity

a situation in which two or more variables are very closely linearly related.

Suppressor effect

a situation where a predictor has a significant effect but only when another variable is held constant.

Šidák correction

a slightly less conservative variant of a Bonferroni correction.

Sample

a smaller (but hopefully representative) collection of units from a population used to determine truths about that population (e.g., how a given population behaves in certain conditions).

Identity matrix

a square matrix (i.e., having the same number of rows and columns) in which the diagonal elements are equal to 1, and the off-diagonal elements are equal to 0.

Variance-covariance matrix

a square matrix (i.e., having the same number of rows and columns) in which the diagonal elements represent the variances within each variable and the off-diagonal elements represent the covariances between pairs of variables.

Sum of squares and cross-products matrix (SSCP matrix)

a square matrix in which the diagonal elements represent the sum of squares for a particular variable, and the off-diagonal elements represent the cross-products between pairs of variables. It is basically the same as the variance-covariance matrix, except that it expresses variability and between-variable relationships as total values, whereas the variance-covariance matrix expresses them as average values.

Index of mediation

a standardized measure of an indirect effect. In a mediation model, it is the indirect effect multiplied by the ratio of the standard deviation of the predictor variable to the standard deviation of the outcome variable.

Spearman's correlation coefficient

a standardized measure of the strength of relationship between two variables that does not rely on the assumptions of a parametric test. It is Pearson's correlation coefficient performed on data that have been converted into ranked scores.
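'Pearson's correlation performed on ranks' can be verified directly. A self-contained sketch (tied scores get the average of the ranks they span, the usual midrank convention):

```python
def rankdata(xs):
    """Ranks starting at 1; ties receive the average of the ranks they cover."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranked scores."""
    return pearson(rankdata(x), rankdata(y))
```

A monotone but non-linear relationship (e.g., y = x³) gives a Spearman correlation of 1 even though Pearson's r on the raw scores would be below 1.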

Biserial correlation

a standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous.

Point-biserial correlation

a standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous. The coefficient is used when the dichotomy is a discrete, or true, dichotomy (i.e., one for which there is no underlying continuum between the categories). An example of this is pregnancy: you can be either pregnant or not, there is no in between.

Standardized DFBeta

a standardized version of DFBeta. These standardized values are easier to use than DFBeta because universal cut-off points can be applied. Stevens (2002) suggests looking at cases with absolute values greater than 2.

Standardized DFFit

a standardized version of DFFit.

Test statistic

a statistic for which we know how frequently different values occur. The observed value of such a statistic is typically used to test hypotheses.

Trimmed mean

a statistic used in many robust tests. It is a mean calculated using trimmed data. For example, a 20% BLANK is a mean calculated after the top and bottom 20% of ordered scores have been removed. Imagine we had 20 scores representing the annual income of students (in thousands), rounded to the nearest thousand: 0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 40. The mean income is 5 (£5000), which is biased by an outlier. A 10% BLANK will remove 10% of scores from the top and bottom of ordered scores before the mean is calculated. With 20 scores, removing 10% of scores involves removing the top and bottom two scores. This gives us: 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, the mean of which is 3.44. The mean depends on a symmetrical distribution to be accurate, but a trimmed mean produces accurate results even when the distribution is not symmetrical. There are more complex examples of robust methods such as the bootstrap.
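The worked example above translates directly into code (a sketch; the function name is mine):

```python
def trimmed_mean(scores, proportion):
    """Mean computed after removing `proportion` of the ordered scores
    from each tail of the distribution."""
    ordered = sorted(scores)
    k = round(len(ordered) * proportion)  # number of scores trimmed per tail
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return sum(trimmed) / len(trimmed)

incomes = [0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 40]
plain_mean = sum(incomes) / len(incomes)   # 5.0, inflated by the outlier (40)
robust_mean = trimmed_mean(incomes, 0.10)  # 3.4375, matching the example
```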

Linear model

a statistical model that is based upon an equation of the form Y = BX + E, in which Y is a vector containing scores from an outcome variable, B represents the b-values, X the predictor variables and E the error terms associated with each predictor. The equation can represent a solitary predictor variable (B, X and E are vectors) as in simple regression or multiple predictors (B, X and E are matrices) as in multiple regression. The key is the form of the model, which is linear (e.g., with a single predictor the equation is that of a straight line).

Analysis of covariance (ANCOVA)

a statistical procedure that uses the F-statistic to test the overall fit of a linear model, adjusting for the effect that one or more covariates have on the outcome variable. In experimental research this linear model tends to be defined in terms of group means and the resulting ANOVA is therefore an overall test of whether group means differ after the variance in the outcome variable explained by any covariates has been removed.

Analysis of variance (ANOVA)

a statistical procedure that uses the F-statistic to test the overall fit of a linear model. In experimental research this linear model tends to be defined in terms of group means, and the result is therefore an overall test of whether group means differ.

Contingency table

a table representing the cross-classification of two or more categorical variables. The levels of each variable are arranged in a grid, and the number of observations falling into each category is noted in the cells of the table. For example, if we took the categorical variables of glossary (with two categories: whether an author was made to write a glossary or not), and mental state (with three categories: normal, sobbing uncontrollably and utterly psychotic), we could construct a table. This instantly tells us that 127 authors who were made to write a glossary ended up as utterly psychotic, compared to only 2 who did not write a glossary.

Bootstrap

a technique from which the sampling distribution of a statistic is estimated by taking repeated samples (with replacement) from the data set (in effect, treating the data as a population from which smaller samples are taken). The statistic of interest (e.g., the mean, or b coefficient) is calculated for each sample, from which the sampling distribution of the statistic is estimated. The standard error of the statistic is estimated as the standard deviation of the sampling distribution created from the <___> samples. From this, confidence intervals and significance tests can be computed.
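A minimal resampling sketch of the idea (function names are mine; a real analysis would use far more resamples and a dedicated library):

```python
import random

def bootstrap_se(data, stat, n_boot=2000, seed=1):
    """Estimate the standard error of `stat` as the standard deviation of
    the statistic across samples drawn with replacement from the data."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        estimates.append(stat(resample))
    centre = sum(estimates) / n_boot
    variance = sum((e - centre) ** 2 for e in estimates) / (n_boot - 1)
    return variance ** 0.5

def mean(xs):
    return sum(xs) / len(xs)

se = bootstrap_se([2, 4, 4, 4, 5, 5, 7, 9], mean)
```

The resulting `se` is close to the analytic standard error of the mean for these data (about 0.7).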

Robust test

a term applied to a family of procedures to estimate statistics that are reliable even when the normal assumptions of the statistic are not met.

Monte Carlo method

a term applied to the process of using data simulations to solve statistical problems. Its name comes from the use of Monte Carlo roulette tables to generate 'random' numbers in the pre-computer age. Karl Pearson, for example, purchased copies of Le Monaco, a weekly Paris periodical that published data from the Monte Carlo casinos' roulette wheels. He used these data as pseudo-random numbers in his statistical research.

Pre-registration

a term referring to the practice of making all aspects of your research process (rationale, hypotheses, design, data processing strategy, data analysis strategy) publicly available before data collection begins. This can be done in a registered report in an academic journal, or more informally (e.g., on a public website such as the Open Science Framework). The aim is to encourage adherence to an agreed research protocol, thus discouraging threats to the validity of scientific results such as researcher degrees of freedom.

General linear model

a term to represent the fact that the linear model can encompass a range of different research designs such as multiple outcome variables (a.k.a. MANOVA), comparing means of categorical predictors (a.k.a. t-test, ANOVA), and including both categorical and continuous predictors (a.k.a. ANCOVA).

Extraction

a term used for the process of deciding whether a factor in factor analysis is statistically important enough to 'extract' from the data and interpret. The decision is based on the magnitude of the eigenvalue associated with the factor. See Kaiser's criterion, scree plot.

Singularity

a term used to describe variables that are perfectly correlated (i.e., the correlation coefficient is 1 or −1).

Durbin-Watson test

a test for serial correlations between errors in regression models. It tests whether adjacent residuals are correlated, which is useful in assessing the assumption of independent errors. The statistic varies between 0 and 4: a value of 2 means the residuals are uncorrelated, values greater than 2 indicate a negative correlation between adjacent residuals, and values below 2 indicate a positive correlation. As a very conservative rule of thumb, values less than 1 or greater than 3 are cause for concern, but values closer to 2 may still be problematic depending on the sample and model (the size of the statistic depends on the number of predictors in the model and the number of observations).
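The statistic itself is simple to compute from the residuals (a sketch; the function name is mine):

```python
def durbin_watson(residuals):
    """d = sum of squared differences between adjacent residuals,
    divided by the sum of squared residuals. Ranges from 0 to 4;
    values near 2 suggest uncorrelated adjacent residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    return numerator / sum(e ** 2 for e in residuals)

# Alternating residuals (strong negative serial correlation) push d well above 2:
d = durbin_watson([1, -1, 1, -1, 1, -1])
```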

One-tailed test

a test of a directional hypothesis. For example, the hypothesis 'the longer I write this glossary, the more I want to place my editor's genitals in a starved crocodile's mouth' requires a one-tailed test because I've stated the direction of the relationship. I would generally advise against using them because of the temptation to interpret interesting effects in the opposite direction to that predicted. See also two-tailed test.

Two-tailed test

a test of a non-directional hypothesis. For example, the hypothesis 'writing this glossary has some effect on what I want to do with my editor's genitals' requires this because it doesn't suggest the direction of the relationship.

Box's test

a test of the assumption of homogeneity of covariance matrices. This test should be non-significant if the matrices are roughly the same. Problems: it is very susceptible to deviations from multivariate normality, so it may be non-significant not because the variance-covariance matrices are similar across groups but because the assumption of multivariate normality is not tenable. Workaround: have some idea of whether the data meet the multivariate normality assumption before interpreting the result (extremely difficult).

Mauchly's test

a test of the assumption of sphericity. If this test is significant then the assumption of sphericity has not been met and an appropriate correction must be applied to the degrees of freedom of the F-statistic in repeated-measures ANOVA. The test works by comparing the variance-covariance matrix of the data to an identity matrix; if the variance-covariance matrix is a scalar multiple of an identity matrix then sphericity is met.

Kolmogorov-Smirnov test

a test of whether a distribution of scores is significantly different from a normal distribution. A significant value indicates a deviation from normality, but this test is notoriously affected by large samples in which small deviations from normality yield significant results.

Shapiro-Wilk test

a test of whether a distribution of scores is significantly different from a normal distribution. A significant value indicates a deviation from normality, but this test is notoriously affected by large samples in which small deviations from normality yield significant results.

Roy's largest root

a test statistic in MANOVA. It is the eigenvalue for the first discriminant function variate of a set of observations. So, it is the same as the Hotelling-Lawley trace, but for the first variate only. It represents the proportion of explained variance to unexplained variance (SSM/SSR) for the first discriminant function.

Wilks's lambda (Λ)

a test statistic in MANOVA. It is the product of the unexplained variance on each of the discriminant function variates, so it represents the ratio of error variance to total variance (SSR/SST) for each variate.

Hotelling-Lawley trace (T²)

a test statistic in MANOVA. It is the sum of the eigenvalues for each discriminant function variate of the data and so is conceptually the same as the F-statistic in ANOVA: it is the sum of the ratio of systematic and unsystematic variance (SSM/SSR) for each of the variates.

Pillai-Bartlett trace (V)

a test statistic in MANOVA. It is the sum of the proportion of explained variance on the discriminant function variates of the data. As such, it is similar to the ratio of SSM/SST.

Wald statistic

a test statistic with a known probability distribution (a normal distribution, or a chi-square distribution when squared) that is used to test whether the b coefficient for a predictor in a logistic regression model is significantly different from zero. It is analogous to the t-statistic in a regression model in that it is simply the b coefficient divided by its standard error. Inaccurate when the regression coefficient (b) is large, because the standard error tends to become inflated, resulting in the statistic being underestimated.

t-statistic

a test statistic with a known probability distribution. In the context of the linear model it is used to test whether a b-value is significantly different from zero; in the context of experimental work this b-value represents the difference between two means and so ___ is a test of whether the difference between those means is significantly different from zero.

Parametric test

a test that requires data from one of the large catalogue of distributions that statisticians have described. Normally this term is used for those tests based on the normal distribution, which require four basic assumptions that must be met for the test to be accurate: a normally distributed sampling distribution (see normal distribution), homogeneity of variance, interval or ratio data, and independence.

Independent t-test

a test using the t-statistic that establishes whether two means collected from independent samples differ significantly.

Dependent t-test AKA Paired-samples t-test

a test using the t-statistic that establishes whether two means collected from the same sample (or related observations) differ significantly.

Percentiles

a type of quantile; they are values that split the data into 100 equal parts.
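One common convention for computing a percentile uses linear interpolation between ordered scores (a sketch; note that statistics packages offer several slightly different definitions):

```python
def percentile(scores, p):
    """The p-th percentile (0-100), interpolating linearly between
    adjacent ordered scores when p does not land exactly on a score."""
    ordered = sorted(scores)
    rank = (p / 100) * (len(ordered) - 1)  # fractional position in the ordering
    lo = int(rank)
    frac = rank - lo
    if lo + 1 < len(ordered):
        return ordered[lo] + frac * (ordered[lo + 1] - ordered[lo])
    return ordered[lo]

median = percentile(list(range(1, 101)), 50)  # 50.5 for the scores 1..100
```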

Noniles

a type of quantile; they are values that split the data into nine equal parts. They are commonly used in educational research.

Confounding variable

a variable (that we may or may not have measured) other than the predictor variables in which we're interested that potentially affects an outcome variable.

Currency variable

a variable containing values of money.

Date variable

a variable made up of dates. The data can take forms such as dd-mmm-yyyy (e.g., 21-Jun-1973), dd-mmm-yy (e.g., 21-Jun-73), mm/dd/yy (e.g., 06/21/73), dd.mm.yyyy (e.g., 21.06.1973).

Continuous variable

a variable that can be measured to any level of precision (e.g., time, because there is in principle no limit on how finely it could be measured).

Discrete variable

a variable that can only take on certain values (usually whole numbers) on the scale.

Latent variable

a variable that cannot be directly measured, but is assumed to be related to several variables that can be measured.

Moderator

a variable that changes the size and/or direction of the relationship between two other variables.

Covariate

a variable that has a relationship with (in terms of covariance), or has the potential to be related to, the outcome variable we've measured.

Predictor variable

a variable that is used to try to predict values of another variable known as an outcome variable.

Mediator

a variable that reduces the size and/or direction of the relationship between a predictor variable and an outcome variable (ideally to zero) and is associated statistically with both.

Random variable

a variable that varies over time (e.g., your weight is likely to fluctuate over time).

Outcome variable

a variable whose values we are trying to predict from one or more predictor variables.

Studentized residuals

a variation on standardized residuals; the unstandardized residual divided by an estimate of its standard deviation that varies point by point. These residuals have the same properties as the standardized residuals but usually provide a more precise estimate of the error variance of a specific case.

Partial eta squared (partial η2)

a version of eta squared that is the proportion of variance that a variable explains when excluding other variables in the analysis. Eta squared is the proportion of total variance explained by a variable, whereas partial eta squared is the proportion of variance that a variable explains that is not explained by other variables.

Confirmatory factor analysis (CFA)

a version of factor analysis in which specific hypotheses about structure and relations between the latent variables that underlie the data are tested.

Binary logistic regression

a version of multiple regression in which the outcome is a categorical variable that has exactly two categories.

Multinomial logistic regression

a version of multiple regression in which the outcome is a categorical variable with more than two categories, in which the analysis breaks down the outcome variable into a series of comparisons between two categories.

Logistic regression

a version of multiple regression in which the outcome is a categorical variable.

Brown-Forsythe F

a version of the F-statistic designed to be accurate when the assumption of homogeneity of variance has been violated.

Welch's F

a version of the F-statistic designed to be accurate when the assumption of homogeneity of variance has been violated.

Cox and Snell's R²

a version of the coefficient of determination for logistic regression. It is based on the log-likelihood of a model, the log-likelihood of the original model and the sample size, n. However, it is notorious for not reaching its maximum value of 1 (see Nagelkerke's R²).

Parameter

a very difficult thing to describe. When you fit a statistical model to your data, that model will consist of variables and parameters: variables are measured constructs that vary across entities in the sample, whereas parameters describe the relations between those variables in the population. In other words, they are constants believed to represent some fundamental truth about the measured variables. We use sample data to estimate the likely value of parameters because we don't have direct access to the population. Of course, it's not quite as simple as that.

Anderson-Rubin method

a way of calculating factor scores which produces scores that are uncorrelated and standardized with a mean of 0 and a standard deviation of 1.

Maximum-likelihood estimation

a way of estimating statistical parameters by choosing the parameters that make the data most likely to have happened. Imagine for a set of parameters that we calculated the probability (or likelihood) of getting the observed data; if this probability was high then these particular parameters yield a good fit of the data, but conversely if the probability was low, these parameters are a bad fit to our data. This chooses the parameters that maximize the probability.

Dummy variables

a way of recoding a categorical variable with more than two categories into a series of variables all of which are dichotomous and can take on values of only 0 or 1.
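A minimal sketch of the recoding (the function name and baseline-category scheme here are illustrative; with c categories you get c − 1 dummy variables, each coded 1 for its own category and 0 otherwise):

```python
def dummy_code(values, baseline):
    """One 0/1 dummy variable per non-baseline category."""
    categories = [c for c in sorted(set(values)) if c != baseline]
    return {c: [1 if v == c else 0 for v in values] for c in categories}

groups = ["placebo", "low", "high", "placebo", "high"]
dummies = dummy_code(groups, baseline="placebo")
# dummies["low"]  -> [0, 1, 0, 0, 0]
# dummies["high"] -> [0, 0, 1, 0, 1]
```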

Harmonic mean

a weighted version of the mean that takes account of the relationship between variance and sample size. It is calculated by summing the reciprocal of all observations, then dividing by the number of observations. The reciprocal of the end product is the harmonic mean: H = n / (1/x₁ + 1/x₂ + … + 1/xₙ).
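Following the definition step by step (a sketch):

```python
def harmonic_mean(xs):
    """Reciprocal of the mean of the reciprocals of the observations."""
    mean_of_reciprocals = sum(1 / x for x in xs) / len(xs)  # sum reciprocals, divide by n
    return 1 / mean_of_reciprocals                          # take the reciprocal

h = harmonic_mean([2, 4])  # 1 / ((0.5 + 0.25) / 2) = 8/3, below the arithmetic mean of 3
```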

MANOVA

acronym for multivariate analysis of variance.

Yates's continuity correction

an adjustment made to the chi-square test when the contingency table is 2 rows by 2 columns (i.e., there are two categorical variables both of which consist of only two categories). In large samples the adjustment makes little difference and is slightly dubious anyway (see Howell, 2012).

Repeated-measures ANOVA

an analysis of variance conducted on any design in which the independent variable (predictor) or variables (predictors) have all been measured using the same participants in all conditions.

Factorial ANOVA

an analysis of variance involving two or more independent variables or predictors.

Simple slopes analysis

an analysis that looks at the relationship (i.e., the simple regression) between a predictor variable and an outcome variable at low, mean and high levels of a third (moderator) variable.

Registered report

an article in a journal usually outlining an intended research process (rationale, hypotheses, design, data processing strategy, data analysis strategy). The report is reviewed by relevant expert scientists, ensuring that authors get useful feedback before data collection. If the protocol is accepted by the journal editor it typically comes with a guarantee to publish the findings no matter what they are, thus reducing publication bias and discouraging researcher degrees of freedom aimed at achieving significant results.

Homoscedasticity

an assumption in regression analysis that the residuals at each level of the predictor variable(s) have similar variances. Put another way, at each point along any predictor variable, the spread of residuals should be fairly constant.

Homogeneity of regression slopes

an assumption of analysis of covariance. This is the assumption that the relationship between the covariate and outcome variable is constant across different treatment levels. So, if we had three treatment conditions, if there's a positive relationship between the covariate and the outcome in one group, we assume that there is a similar-sized positive relationship between the covariate and outcome in the other two groups too.

Homogeneity of covariance matrices

an assumption of some multivariate tests such as MANOVA. It is an extension of the homogeneity of variance assumption in univariate analyses. However, as well as assuming that variances for each dependent variable are the same across groups, it also assumes that relationships (covariances) between these dependent variables are roughly equal. It is tested by comparing the population variance-covariance matrices of the different groups in the analysis.

Omega squared

an effect size measure associated with ANOVA that is less biased than eta squared. It is a (sometimes hideous) function of the model sum of squares and the residual sum of squares and isn't actually much use because it measures the overall effect of the ANOVA and so can't be interpreted in a meaningful way. In all other respects it's great, though.

Eta squared (η2)

an effect size measure that is the ratio of the model sum of squares to the total sum of squares. So, in essence, the coefficient of determination by another name. PROBLEMS: biased, typically measures the overall effect of an ANOVA and effect sizes are more easily interpreted when they reflect specific comparisons (e.g., the difference between two means).

Variance

an estimate of average variability (spread) of a set of data. It is the sum of squares divided by the number of values on which the sum of squares is based minus 1.

Standard deviation

an estimate of the average variability (spread) of a set of data measured in the same units of measurement as the original data. It is the square root of the variance.

Greenhouse-Geisser estimate

an estimate of the departure from sphericity. The maximum value is 1 (the data completely meet the assumption of sphericity) and the minimum is the lower bound. Values below 1 indicate departures from sphericity and are used to correct the degrees of freedom associated with the corresponding F-statistics by multiplying them by the value of the estimate. Some say the Greenhouse-Geisser correction is too conservative (strict) and recommend the Huynh-Feldt correction instead.

Huynh-Feldt estimate

an estimate of the departure from sphericity. The maximum value is 1 (the data completely meet the assumption of sphericity). Values below this indicate departures from sphericity and are used to correct the degrees of freedom associated with the corresponding F-statistics by multiplying them by the value of the estimate. It is less conservative than the Greenhouse-Geisser estimate, but some say it is too liberal.

Sum of squared errors AKA Sum of squares (SS)

an estimate of total variability (spread) of a set of observations around a parameter (such as the mean). First the deviance for each score is calculated, and then this value is squared. The sum of squared errors is the sum of these squared deviances.
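
The variance, standard deviation and sum of squares entries chain together; a short Python sketch with made-up scores:

```python
import math

scores = [2, 4, 6, 8]  # made-up example data

mean = sum(scores) / len(scores)         # 5.0
deviances = [x - mean for x in scores]   # deviation of each score from the mean
ss = sum(d ** 2 for d in deviances)      # sum of squared errors: 9 + 1 + 1 + 9 = 20.0
variance = ss / (len(scores) - 1)        # sum of squares divided by n - 1
sd = math.sqrt(variance)                 # standard deviation, in the original units
```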

Between-groups AKA Between-subjects AKA Independent design

an experimental design in which different treatment conditions utilize different organisms (ex: different people in different treatment conditions) and so the resulting data are independent

Repeated-measures design AKA Within-subject design

an experimental design in which different treatment conditions utilize the same organisms (i.e., in psychology, this would mean the same people take part in all experimental conditions) and so the resulting data are related (a.k.a. related design or within-subject design).

Independent factorial design

an experimental design incorporating two or more predictors (or independent variables) all of which have been manipulated using different participants (or whatever entities are being tested).

Related factorial design

an experimental design incorporating two or more predictors (or independent variables) all of which have been manipulated using the same participants (or whatever entities are being tested).

Mixed design AKA Split-plot design

an experimental design incorporating two or more predictors (or independent variables) at least one of which has been manipulated using different participants (or whatever entities are being tested) and at least one of which has been manipulated using the same participants (or entities).

Multiple regression

an extension of simple regression in which an outcome is predicted by a linear combination of two or more predictor variables. The form of the model is Y = b0 + b1X1 + b2X2 + ... + bnXn + error, in which the outcome is denoted by Y, and each predictor is denoted by X. Each predictor has a regression coefficient b associated with it, and b0 is the value of the outcome when all predictors are zero.
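
A hedged Python sketch of plugging made-up coefficients into this equation to get a predicted value:

```python
# Y-hat = b0 + b1*X1 + b2*X2 + ... ; the coefficient values here are made up.
def predict(b0, bs, xs):
    return b0 + sum(b * x for b, x in zip(bs, xs))

# outcome predicted from two predictors
y_hat = predict(b0=1.5, bs=[2.0, -0.5], xs=[3.0, 4.0])  # 1.5 + 6.0 - 2.0 = 5.5
```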

Degrees of freedom

an impossible thing to define in a few pages, let alone a few lines. Essentially it is the number of 'entities' that are free to vary when estimating some kind of statistical parameter. In a more practical sense, it has a bearing on significance tests for many commonly used test statistics (such as the F-statistic, t-test, chi-square statistic) and determines the exact form of the probability distribution for these test statistics.

Goodness of fit

an index of how well a model fits the data from which it was generated. It's usually based on how well the data predicted by the model correspond to the data that were actually collected.

Peer Reviewers' Openness Initiative

an initiative to get scientists to commit to the principles of open science when they act as expert reviewers for journals. Signing up is a pledge to review submissions only if the data, stimuli, materials, analysis scripts and so on are made publicly available (unless there is a good reason, such as a legal requirement, not to).

Ratio variable

an interval variable but with the additional property that ratios are meaningful. For example, people's ratings of this book on Amazon.com can range from 1 to 5; for these data to be ratio not only must they have the properties of interval variables, but in addition a rating of 4 should genuinely represent someone who enjoyed this book twice as much as someone who rated it as 2. Likewise, someone who rated it as 1 should be half as impressed as someone who rated it as 2.

Effect size

an objective and (usually) standardized measure of the magnitude of an observed effect. Measures include Cohen's d, Glass's g and Pearson's correlation coefficient, r.
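
As an illustration, a Python sketch of Cohen's d using the pooled standard deviation (one common formulation; the group scores are made up):

```python
import math

def cohens_d(group1, group2):
    # Cohen's d: difference between group means divided by the pooled SD
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

cohens_d([2, 4, 6], [1, 3, 5])  # → 0.5
```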

Outlier

an observation or observations very different from most others. Outliers bias statistics (e.g., the mean) and their standard errors and confidence intervals.

Independent ANOVA

analysis of variance conducted on any design in which all independent variables or predictors have been manipulated using different participants (i.e., all data come from different entities).

Mixed ANOVA

analysis of variance used for a mixed design.

Reverse Helmert contrast

another name for a difference contrast.

Independent variable

another name for a predictor variable. This name is usually associated with experimental methodology (which is the only time it makes sense) and is used because it is the variable that is manipulated by the experimenter and so its value does not depend on any other variables (just on the experimenter).

Related design

another name for a repeated-measures design.

Part correlation

another name for a semi-partial correlation.

Factor

another name for an independent variable or predictor that's typically used when describing experimental designs. However, to add to the confusion, it is also used synonymously with latent variable in factor analysis.

Blockwise regression

another name for hierarchical regression.

Hat values

another name for leverage.

Polychotomous logistic regression

another name for multinomial logistic regression.

Dependent variable

another name for outcome variable. This name is usually associated with experimental methodology (which is the only time it really makes sense) and is used because it is the variable that is not manipulated by the experimenter and so its value depends on the variables that have been manipulated.

Planned comparisons

another name for planned contrasts.

Second quartile

another name for the median.

Wald-Wolfowitz runs

another variant on the Mann-Whitney test. Scores are rank-ordered as in the Mann-Whitney test, but rather than analysing the ranks, this test looks for 'runs' of scores from the same group within the ranked order. Now, if there's no difference between groups then obviously ranks from the two groups should be randomly interspersed. However, if the groups are different then one should see more ranks from one group at the lower end, and more ranks from the other group at the higher end. By looking for clusters of scores in this way the test can determine if the groups differ.

Categorical variable

any variable made up of categories of objects/entities. Ex: the university you attend: students who attend the University of Sussex are not also enrolled at Harvard or VU Amsterdam, so students fall into distinct categories.

Variables

anything that can be measured and can differ across entities or across time.

Cross-validation

assessing the accuracy of a model across different samples. This is an important step in generalization. In a regression model there are two main methods of this: adjusted R² or data splitting, in which the data are split randomly into two halves, and a regression model is estimated for each half and then compared.

Pairwise comparisons

comparisons of pairs of means.

Interval variable

data measured on a scale along the whole of which intervals are equal. Ex: people's ratings of this book on Amazon.com can range from 1 to 5; the increase in appreciation for this book represented by a change from 3 to 4 along the scale should be the same as the change in appreciation represented by a change from 1 to 2, or 4 to 5.

Wide format data

data that are arranged such that scores from a single entity appear in a single row and levels of independent or predictor variables are arranged over different columns. As such, in designs with multiple measurements of an outcome variable within a case the outcome variable scores will be contained in multiple columns each representing a level of an independent variable, or a time point at which the score was observed. Columns can also represent attributes of the score or entity that are fixed over the duration of data collection, such as participant sex, employment status etc. (cf. long format data).

Long format data

data that are arranged such that scores on an outcome variable appear in a single column and rows represent a combination of the attributes of those scores - the entity from which the scores came, when the score was recorded, etc. In long format data, scores from a single entity can appear over multiple rows where each row represents a combination of the attributes of the score - for example, levels of an independent variable or time point at which the score was recorded (cf. wide format data).

Ordinal variable

data that tell us not only that things have occurred, but also the order in which they occurred. These data tell us nothing about the differences between values. For example, gold, silver and bronze medals: they tell us that the gold medallist was better than the silver medallist, but they don't tell us how much better (was gold a lot better than silver, or were gold and silver very closely competed?).

Multimodal

description of a distribution of observations that has more than two modes (i.e., more than two values that appear most often).

Dichotomous

description of a variable that consists of only two categories

Validity

evidence that a study allows correct inferences about the question it was aimed to answer or that a test measures what it set out to measure conceptually.

Criterion validity

evidence that scores from an instrument correspond with (concurrent validity) or predict (predictive validity) external measures conceptually related to the measured construct.

Content validity

evidence that the content of a test corresponds to the content of the construct it was designed to cover.

Ecological validity

evidence that the results of a study, experiment or test can be applied, and allow inferences, to real-world conditions.

Perfect collinearity

exists when at least one predictor in a regression model is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated - they have a correlation coefficient of 1).

Qualitative methods

extrapolating evidence for a theory from what people say or write (contrast with quantitative methods).

Confidence interval

for a given statistic calculated for a sample of observations (e.g., the mean), a range of values around that statistic that are believed to contain, in a certain proportion of samples (e.g., 95%), the true value of that statistic (i.e., the population parameter). What that also means is that for the other proportion of samples (e.g., 5%), this won't contain that true value. The trouble is, you don't know which category your particular sample falls into.
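
A rough Python sketch of a 95% interval for a mean using the normal approximation (1.96 standard errors either side; exact intervals use the t-distribution in small samples):

```python
import math

def ci95(xs):
    # 95% confidence interval for the mean: mean +/- 1.96 standard errors
    # (a sketch, assuming the normal approximation is reasonable)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

lower, upper = ci95([1, 2, 3, 4, 5])  # made-up data; interval is centred on 3.0
```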

Independent errors

for any two observations in regression the residuals should be uncorrelated (or independent).

Factor matrix

general term for the structure matrix in factor analysis.

Component matrix

general term for the structure matrix in principal components analysis.

Group mean centring

the transformation of a variable by taking each score and subtracting from it the mean of the scores (for that variable) for the group to which that score belongs (cf. grand mean centring).

Discriminant function analysis

identifies and describes the discriminant function variates of a set of variables and is useful as a follow-up test to MANOVA as a means of seeing how these variates allow groups of cases to be discriminated.

Quadratic trend

if the means in ordered conditions are connected with a line then a quadratic trend is shown by one change in the direction of this line (e.g., the line is curved in one place); the line is, therefore, U-shaped. There must be at least three ordered conditions.

Quartic trend

if the means in ordered conditions are connected with a line, this trend is shown by three changes in the direction of this line. There must be at least five ordered conditions.

Cubic trend

if you connected the means in ordered conditions with a line then a cubic trend is shown by two changes in the direction of this line. You must have at least four ordered conditions.

Parsimony

in a scientific context, parsimony refers to the idea that simpler explanations of a phenomenon are preferable to complex ones. This idea relates to Ockham's (or Occam's if you prefer) razor, which is a phrase referring to the principle of 'shaving' away unnecessary assumptions or explanations to produce less complex theories. In statistical terms, parsimony tends to refer to a general heuristic that models be kept as simple as possible - in other words, not including variables that don't have real explanatory benefit.

Population

in statistical terms this usually refers to the collection of units (be they people, plankton, plants, cities, suicidal authors, etc.) to which we want to generalize a set of findings or a statistical model.

Adjusted mean

in the context of analysis of covariance this is the value of the group mean adjusted for the effect of the covariate.

Quantitative methods

inferring evidence for a theory through measurement of variables that produce numeric outcomes (cf. qualitative methods).

Binary logistic regression

logistic regression in which the outcome variable has exactly two categories.

Multinomial logistic regression

logistic regression in which the outcome variable has more than two categories.

Multivariate

means 'many variables' and is usually used when referring to analyses in which there is more than one outcome variable (MANOVA, principal component analysis, etc.).

Univariate

means 'one variable' and is usually used to refer to situations in which only one outcome variable has been measured (ANOVA, t-tests, Mann-Whitney tests, etc.).

Orthogonal

means perpendicular (at right angles) to something. It tends to be equated to independence in statistics because of the connotation that perpendicular linear models in geometric space are completely independent (one is not influenced by the other).

Goodman and Kruskal's λ

measures the proportional reduction in error that is achieved when membership of a category of one variable is used to predict category membership of the other variable. A value of 1 means that one variable perfectly predicts the other, whereas a value of 0 indicates that one variable in no way predicts the other.

Kolmogorov-Smirnov Z

not to be confused with the Kolmogorov-Smirnov test that tests whether a sample comes from a normally distributed population. This tests whether two groups have been drawn from the same population (regardless of what that population may be). It does much the same as the Mann-Whitney test and Wilcoxon rank-sum test! This test tends to have better power than the Mann-Whitney test when sample sizes are less than about 25 per group.

Type I error

occurs when we believe that there is a genuine effect in our population, when in fact there isn't.

Type II error

occurs when we believe that there is no effect in the population, when in fact there is.

Syntax

predefined written commands that instruct SPSS Statistics what you would like it to do

Practice effect

refers to the possibility that participants' performance in a task may be influenced (positively or negatively) if they repeat the task because of familiarity with the experimental situation and/or the measures being used.

Boredom effect

refers to the possibility that performance in tasks may be influenced (the assumption is a negative influence) by boredom or lack of concentration if there are many tasks, or the task goes on for a long period of time.

p-hacking

research practices that lead to selective reporting of significant p-values. Some examples: (1) trying multiple analyses and reporting only the one that yields significant results; (2) stopping collecting data at a point other than when the predetermined sample size is reached; (3) deciding whether to include data based on the effect they have on the p-value; (4) including (or excluding) variables in an analysis based on how they affect the p-value; (5) measuring multiple outcome or predictor variables but reporting only those for which the effects are significant; (6) merging groups of variables or scores to yield significant results, and (7) transforming, or otherwise manipulating, scores to yield significant p-values.

Discriminant analysis

see discriminant function analysis.

Regression model

see multiple regression and simple regression.

Density plot

similar to a histogram except that rather than having a summary bar representing the frequency of scores, it shows each individual score as a dot. They can be useful for looking at the shape of a distribution of scores.

βi

standardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in a standardized form. It is the change in the outcome (in standard deviations) associated with a one standard deviation change in the predictor.

Variance sum law

states that the variance of a difference between two independent variables is equal to the sum of their variances.

Chartjunk

superfluous material that distracts from the data being displayed on a graph.

Experimental hypothesis

synonym for alternative hypothesis.

Levene's test

tests the hypothesis that the variances in different groups are equal (i.e., the difference between the variances is zero). It basically does a one-way ANOVA on the deviations (i.e., the absolute value of the difference between each score and the mean of its group). A significant result indicates that the variances are significantly different - therefore, the assumption of homogeneity of variances has been violated. When sample sizes are large, small differences in group variances can produce a significant test. I do not recommend using this test - instead interpret statistics that have been adjusted for the degree of heterogeneity in variances.

Sign test

tests whether two related samples are different. It does the same thing as the Wilcoxon signed-rank test. Differences between the conditions are calculated and the sign of this difference (positive or negative) is analysed because it indicates the direction of differences. The magnitude of change is completely ignored (unlike in Wilcoxon's test, where the rank tells us something about the relative magnitude of change), and for this reason it lacks power. However, its computational simplicity makes it a nice party trick if ever anyone drunkenly accosts you needing some data quickly analysed without the aid of a computer - doing a sign test in your head really impresses people. Actually it doesn't, they just think you're a sad gimboid.

Kaiser-Meyer-Olkin measure of sampling adequacy (KMO)

the KMO can be calculated for individual and multiple variables and represents the ratio of the squared correlation between variables to the squared partial correlation between variables. It varies between 0 and 1: a value of 0 means that the sum of partial correlations is large relative to the sum of correlations, indicating diffusion in the pattern of correlations (hence, factor analysis is likely to be inappropriate); a value close to 1 indicates that patterns of correlations are relatively compact and so factor analysis should yield distinct and reliable factors. Values between 0.5 and 0.7 are mediocre, values between 0.7 and 0.8 are good, values between 0.8 and 0.9 are great and values above 0.9 are superb (see Kaiser & Rice, 1974).

Reliability

the ability of a measure to produce consistent results when the same entities are measured under different conditions.

Test-retest reliability

the ability of a measure to produce consistent results when the same entities are tested at two different points in time.

Generalization

the ability of a statistical model to say something beyond the set of observations that spawned it. If a model generalizes, it is assumed that predictions from that model can be applied not just to the sample on which it is based, but to a wider population from which the sample came.

Power

the ability of a test to detect an effect of a particular size (a value of 0.8 is a good level to aim for).

Falsification

the act of disproving a hypothesis or theory.

Researcher degrees of freedom

the analytic decisions a researcher makes that potentially influence the results of the analysis. Some examples are: when to stop data collection, which control variables to include in the statistical model, and whether to exclude cases from the analysis.

Independence

the assumption that one data point does not influence another. When data come from people, it basically means that the behaviour of one person does not influence the behaviour of another.

Homogeneity of variance

the assumption that the variance of one variable is stable (i.e., relatively similar) at all levels of another variable.

Interaction effect

the combined effect of two or more predictor variables on an outcome variable. It can be used to gauge moderation.

Deviance

the difference between the observed value of a variable and the value of that variable predicted by a statistical model.

Measurement error

the discrepancy between the numbers used to represent the thing that we're measuring and the actual value of the thing we're measuring (i.e., the value we would get if we could measure it directly).

Indirect effect

the effect of a predictor variable on an outcome variable through a mediator

Direct effect

the effect of a predictor variable on an outcome variable when a mediator is present in the model (cf. indirect effect).

Sampling variation

the extent to which a statistic (the mean, median, t, F, etc.) varies in samples taken from the same population.

Publication bias

the fact that articles published in scientific journals tend to over-represent positive findings. This can be because (1) non-significant findings are less likely to be published; (2) scientists don't submit their non-significant results to journals; (3) scientists selectively report their results to focus on significant findings and exclude non-significant ones; and (4) scientists capitalize on researcher degrees of freedom to present their results in the most favourable light possible.

Probability density function (PDF)

the function that describes the probability of a random variable taking a certain value. It is the mathematical function that describes the probability distribution.

Hypothesis SSCP (H)

the hypothesis sum of squares and cross-products matrix. This is a sum of squares and cross-products matrix for a predictive linear model fitted to multivariate data. It represents the systematic variance and is the multivariate equivalent of the model sum of squares.

Exp(B)

the label that SPSS applies to the odds ratio. It is an indicator of the change in odds resulting from a unit change in the predictor in logistic regression. If the value is greater than 1 then it indicates that as the predictor increases, the odds of the outcome occurring increase. Conversely, a value less than 1 indicates that as the predictor increases, the odds of the outcome occurring decrease.

Interquartile range

the limits within which the middle 50% of an ordered set of observations fall. It is the difference between the value of the upper quartile and lower quartile.
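
A Python sketch using the median-of-halves convention described in the upper and lower quartile entries (one of several quartile conventions, so other software may give slightly different values):

```python
def median(xs):
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    # with an even number of scores, average the two middle values
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def iqr(xs):
    # split the ordered scores at the median, then take the median of each half;
    # the IQR is the upper quartile minus the lower quartile
    xs = sorted(xs)
    half = len(xs) // 2
    return median(xs[-half:]) - median(xs[:half])

iqr([1, 2, 3, 4, 5, 6, 7, 8])  # upper quartile 6.5, lower quartile 2.5 → 4.0
```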

−2LL

the log-likelihood multiplied by minus 2; this version of the likelihood is used in logistic regression.

Shrinkage

the loss of predictive power of a regression model when it is applied to data other than those from which it was derived (e.g., when a model estimated from a sample is applied to the wider population from which the sample was taken).

Grand mean

the mean of an entire set of observations.

Median

the middle score of a set of ordered observations. When there is an even number of observations the median is the average of the two scores that fall either side of what would be the middle value.

Mode

the most frequently occurring score in a set of data.

Multiple R

the multiple correlation coefficient. It is the correlation between the observed values of an outcome and the values of the outcome predicted by a multiple regression model.

Lower-bound estimate

the name given to the lowest possible value of the Greenhouse-Geisser estimate of sphericity. Its value is 1/(k − 1), in which k is the number of treatment conditions.

Tertium quid

the possibility that an apparent relationship between two variables is actually caused by the effect of a third variable on them both (often called the third-variable problem).

HARKing

the practice in research articles of presenting a hypothesis that was made after data were collected as though it were made before data collection.

Alternative hypothesis

the prediction that there will be an effect (i.e., that your experimental manipulation will have some effect or that certain variables will relate to each other).

Sampling distribution

the probability distribution of a statistic. We can think of this as follows: if we take a sample from a population and calculate some statistic (e.g., the mean), the value of this statistic will depend somewhat on the sample we took. As such the statistic will vary slightly from sample to sample. If, hypothetically, we took lots and lots of samples from the population and calculated the statistic of interest we could create a frequency distribution of the values we get. The resulting distribution is what the sampling distribution represents: the distribution of possible values of a given statistic that we could expect to get from a given population.

Odds

the probability of an event occurring divided by the probability of that event not occurring.

α-level

the probability of making a Type I error (usually this value is 0.05).

Experimentwise error rate

the probability of making a Type I error in an experiment involving one or more statistical comparisons when the null hypothesis is true in each case.

Familywise error rate

the probability of making a Type I error in any set of tests conducted on the same data set and addressing the same empirical question when the null hypothesis is true in each case.

β-level

the probability of making a Type II error (Cohen, 1992, suggests a maximum value of 0.2).

Likelihood

the probability of obtaining a set of observations given the parameters of a model fitted to those observations.

Transformation

the process of applying a mathematical function to all observations in a data set, usually to correct some distributional abnormality such as skew or kurtosis.

Standardization

the process of converting a variable into a standard unit of measurement. The unit of measurement typically used is standard deviation units (see also z-scores). This allows us to compare data when different units of measurement have been used (we could compare weight measured in kilograms to height measured in inches).

Randomization

the process of doing things in an unsystematic or random way. In the context of experimental research the word usually applies to the random assignment of participants to different treatment conditions.

Centring

the process of transforming a variable into deviations around a fixed point. This fixed point can be any value that is chosen, but typically a mean is used. To centre a variable the mean is subtracted from each score. See grand mean centring, group mean centring.

Ranking

the process of transforming raw scores into numbers that represent their position in an ordered list of those scores. The raw scores are ordered from lowest to highest and the lowest score is assigned a rank of 1, the next highest score is assigned a rank of 2, and so on.
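
A Python sketch of this ranking; the entry above ignores ties, so the code uses the common convention of giving tied scores the average of the ranks they span:

```python
def ranks(xs):
    # lowest score gets rank 1; tied scores share the average of the ranks
    # they would otherwise occupy (the usual convention in rank-based tests)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied scores
        avg_rank = (i + j + 2) / 2  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

ranks([10, 30, 20, 30])  # → [1.0, 3.5, 2.0, 3.5]
```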

Communality

the proportion of a variable's variance that is common variance. This term is used primarily in factor analysis. A variable that has no unique variance (or random variance) would have a communality of 1, whereas a variable that shares none of its variance with any other variable would have a communality of 0.

Coefficient of determination

the proportion of variance in one variable explained by a second variable. It is Pearson's correlation coefficient squared.

Odds ratio

the ratio of the odds of an event occurring in one group compared to another. So, for example, if the odds of dying after writing a glossary are 4, and the odds of dying after not writing a glossary are 0.25, then the odds ratio is 4/0.25 = 16. This means that the odds of dying if you write a glossary are 16 times higher than if you don't. An odds ratio of 1 would indicate that the odds of a particular outcome are equal in both groups.
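
The glossary's own numbers, worked through in a short Python sketch:

```python
def odds(p):
    # odds = probability of the event / probability of the event not occurring
    return p / (1 - p)

def odds_ratio(odds_a, odds_b):
    # ratio of the odds in one group to the odds in another
    return odds_a / odds_b

# the glossary's example: odds of 4 vs odds of 0.25
odds_ratio(4, 0.25)  # → 16.0
```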

Factor loading

the regression coefficient of a variable for the linear model that describes a latent variable or factor in factor analysis.

Levels of measurement

the relationship between what is being measured and the numbers obtained on a scale.

Standardized residuals

the residuals of a model expressed in standard deviation units. Those with an absolute value greater than 3.29 (actually, we usually just use 3) are cause for concern because in an average sample a value this high is unlikely to happen by chance; if more than 1% of our observations have standardized residuals with an absolute value greater than 2.58 (we usually just say 2.5) there is evidence that the level of error within our model is unacceptable (the model is a fairly poor fit to the sample data); and if more than 5% of observations have standardized residuals with an absolute value greater than 1.96 (or 2 for convenience) then there is also evidence that the model is a poor representation of the actual data.

Unstandardized residuals

the residuals of a model expressed in the units in which the original outcome variable was measured.

Null hypothesis

the reverse of the experimental hypothesis, it states that your prediction is wrong and the predicted effect doesn't exist.

Standard error

the standard deviation of the sampling distribution of a statistic. For a given statistic (e.g., the mean) it tells us how much variability there is in this statistic across samples from the same population. Large values, therefore, indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came.

Standard error of the mean (SE)

the standard error associated with the mean

Total SSCP (T)

the total sum of squares and cross-products matrix. This is a sum of squares and cross-products matrix for an entire set of observations. It is the multivariate equivalent of the total sum of squares.

Main effect

the unique effect of a predictor variable (or independent variable) on an outcome variable. The term is usually used in the context of ANOVA.

z-score

the value of an observation expressed in standard deviation units. Calculated by taking the observation, subtracting from it the mean of all observations, and dividing the result by the standard deviation of all observations. This conversion creates a new distribution that has a mean of 0 and a standard deviation of 1.
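The conversion described above is easy to sketch in Python (a minimal illustration; the glossary does not specify sample versus population standard deviation, so the choice below is an assumption noted in the comment):

```python
import statistics

def z_scores(xs):
    """Express each observation in standard deviation units: subtract the
    mean of all observations, divide by their standard deviation."""
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population SD; swap in stdev() for a sample
    return [(x - m) / sd for x in xs]
```

The resulting distribution has a mean of 0 and a standard deviation of 1, as the entry states.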

Predicted value

the value of an outcome variable based on specific values of the predictor variable or variables being placed into a statistical model.

Upper quartile

the value that cuts off the highest 25% of ordered scores. If the scores are ordered and then divided into two halves at the median, then the upper quartile is the median of the top half of the scores.

Lower quartile

the value that cuts off the lowest 25% of the data. If the data are ordered and then divided into two halves at the median, then the lower quartile is the median of the lower half of the scores.
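Both quartiles can be found with the median-of-halves approach these two entries describe. A small Python sketch using one common convention (the middle score is excluded when the number of scores is odd; other conventions exist):

```python
import statistics

def quartiles(scores):
    """Lower and upper quartiles as medians of the lower and upper halves."""
    s = sorted(scores)
    n = len(s)
    lower_half = s[: n // 2]
    upper_half = s[(n + 1) // 2:]  # skips the median itself when n is odd
    return statistics.median(lower_half), statistics.median(upper_half)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # prints (2, 6)
```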

Grand variance

the variance within an entire set of observations.

Data view

there are two ways to view the contents of the data editor window. The data view shows you a spreadsheet and can be used for entering raw data. See also variable view.

Leverage / 'hat values'

these gauge the influence of the observed value of the outcome variable over the predicted values. The average is (k+1)/n, in which k is the number of predictors in the model and n is the number of participants. Can lie between 0 (the case has no influence whatsoever) and 1 (the case has complete influence over prediction). If no cases exert undue influence over the model then we would expect all of these to be close to the average value. Hoaglin and Welsch (1978) recommend investigating cases with values greater than twice the average (2(k + 1)/n) and Stevens (2002) recommends using three times the average (3(k + 1)/n) as a cut-off point for identifying cases having undue influence.
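The average leverage and the two recommended cut-offs follow directly from the formulas in this entry; a minimal Python sketch (the function name and keys are my own labels):

```python
def leverage_cutoffs(k, n):
    """Average leverage and the two recommended cut-offs for a model
    with k predictors fitted to n cases."""
    average = (k + 1) / n
    return {
        "average": average,
        "hoaglin_welsch": 2 * average,  # investigate cases above 2(k + 1)/n
        "stevens": 3 * average,         # or above 3(k + 1)/n
    }
```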

Mahalanobis distances

these measure the influence of a case by examining the distance of cases from the mean(s) of the predictor variable(s). One needs to look for the cases with the highest values. It is not easy to establish a cut-off point at which to worry, although Barnett and Lewis (1978) have produced a table of critical values dependent on the number of predictors and the sample size. From their work it is clear that even with large samples (N = 500) and five predictors, values above 25 are cause for concern. In smaller samples (N = 100) and with fewer predictors (namely three) values greater than 15 are problematic, and in very small samples (N = 30) with only two predictors values greater than 11 should be examined. However, for more specific advice, refer to Barnett and Lewis's (1978) table.

Simple effects analysis

this analysis looks at the effect of one independent variable (categorical predictor variable) at individual levels of another independent variable.

HE−1

this is a matrix that is functionally equivalent to the hypothesis SSCP divided by the error SSCP in MANOVA. Conceptually it represents the ratio of systematic to unsystematic variance, so is a multivariate analogue of the F-statistic.

Meta-analysis

this is a statistical procedure for assimilating research findings. It is based on the simple idea that we can take effect sizes from individual studies that research the same question, quantify the observed effect in a standard way (using effect sizes) and then combine these effects to get a more accurate idea of the true effect in the population.

Kendall's W

this is much the same as Friedman's ANOVA but is used specifically for looking at the agreement between raters. So, if, for example, we asked 10 different women to rate the attractiveness of Justin Timberlake, David Beckham and Brad Pitt we could use this test to look at the extent to which they agree. Kendall's W ranges from 0 (no agreement between judges) to 1 (complete agreement between judges).

Unsystematic variation

this is variation that isn't due to the effect in which we're interested (so could be due to natural differences between people in different samples such as differences in intelligence or motivation). Variation that can't be explained by whatever model we've fitted to the data.

Kurtosis

this measures the degree to which scores cluster in the tails of a frequency distribution. It is calculated such that a normal distribution yields a value of 3. To make the measure more intuitive, SPSS Statistics (and some other packages) subtracts 3 from the value so that a normal distribution is expressed as 0, and heavy-tailed (leptokurtic) and light-tailed (platykurtic) distributions take on positive and negative values, respectively.

AR(1)

this stands for first-order autoregressive structure. It is a covariance structure used in multilevel linear models in which the relationship between scores changes in a systematic way: the correlation between scores is assumed to get smaller over time, and variances are assumed to be homogeneous. This structure is often used for repeated-measures data (especially when measurements are taken over time, as in growth models).

Jonckheere-Terpstra test

this statistic tests for an ordered pattern of medians across independent groups. Essentially it does the same thing as the Kruskal-Wallis test (i.e., test for a difference between the medians of the groups) but it incorporates information about whether the order of the groups is meaningful. As such, you should use this test when you expect the groups you're comparing to produce a meaningful order of medians.

Central limit theorem

this theorem states that when samples are large (above about 30) the sampling distribution will take the shape of a normal distribution regardless of the shape of the population from which the sample was drawn. For small samples the t-distribution better approximates the shape of the sampling distribution. We also know from this theorem that the standard deviation of the sampling distribution (i.e., the standard error of the sample mean) will be equal to the standard deviation of the sample (s) divided by the square root of the sample size (N).
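The two claims in this entry (sample means become roughly normal, with a spread of s/√N) can be illustrated by simulation. A hedged Python sketch, drawing samples from a deliberately skewed population:

```python
import math
import random
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD over the square root of N."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

random.seed(1)
# A strongly skewed population (exponential, mean 1, SD 1).
population = [random.expovariate(1.0) for _ in range(100_000)]
# Means of many samples of N = 50 pile up symmetrically around 1, with a
# spread close to the theoretical standard error 1/sqrt(50), about 0.14.
means = [statistics.mean(random.sample(population, 50)) for _ in range(2_000)]
```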

Partial out

to partial out the effect of a variable is to remove the variance that the variable shares with other variables in the analysis before looking at their relationships (see partial correlation).

bi

unstandardized regression coefficient. Indicates the strength of relationship between a given predictor, i, of many and an outcome in the units of measurement of the predictor. It is the change in the outcome associated with a unit change in the predictor.

Quantiles

values that split a data set into equal portions. Quartiles, for example, are a special case of quantiles that split the data into four equal parts. Similarly, percentiles are points that split the data into 100 equal parts and noniles are points that split the data into nine equal parts (you get the general idea).

Numeric variables

variables involving numbers.

String variables

variables involving words (i.e., letter strings). Such variables could include responses to open-ended questions such as 'How much do you like writing glossary entries?'; the response might be 'About as much as I like placing my ballbag on hot coals'.

Common variance

variance shared by two or more variables.

Unique variance

variance that is specific to a particular variable (i.e., is not shared with other variables). We tend to use the term unique variance to refer to variance that can be reliably attributed to only one measure (otherwise it is called random variance).

Random variance

variance that is unique to a particular variable but not reliably so.

Systematic variation

variation due to some genuine effect (be that the effect of an experimenter doing something to all of the participants in one sample but not in other samples, or natural variation between sets of variables). We can think of this as variation that can be explained by the model that we've fitted to the data.

Overdispersion

when the observed variance is bigger than expected from the logistic regression model. Like leprosy, you don't want it.

Autocorrelation

when the residuals of two observations in a regression model are correlated.

Nominal variable

where numbers merely represent names. For example, the numbers on sports players shirts: a player with the number 1 on her back is not necessarily worse than a player with a 2 on her back. The numbers have no meaning other than denoting the type of player (full back, centre forward, etc.).

