Comprehensive Exam - Methods
What is meta-analysis? What are the basic steps? What are the main things meta-analysis can tell us?
Meta-analysis statistically combines the results of multiple studies of the same relationship. It can tell us: (1) The average relationship between two variables across studies. It focuses on a bivariate relationship. (2) The variance in the relationship across studies. This speaks to the generalizability of the relationship. (3) Variables that help explain the variance in effects: moderators.
Step 1: Specify the relationship of interest. Step 2: Search for relevant studies. Step 3: Establish inclusion criteria. Step 4: Code selected studies for relevant information. Step 5: Analyze the data.
What is a statistic? What is a parameter?
A parameter is a characteristic of a population. A statistic is a characteristic of a sample. Inferential statistics enable you to make an educated guess about a population parameter based on a statistic computed from a sample randomly drawn from that population.
What is an a priori comparison? A post hoc comparison?
A priori comparisons are analyses that are specified before seeing the data. Post hoc analysis consists of analyses that were not specified before seeing the data. This typically creates a multiple testing problem because each potential analysis is effectively a statistical test. Multiple testing procedures are sometimes used to compensate, but that is often difficult or impossible to do precisely. Post hoc analysis that is conducted and interpreted without adequate consideration of this problem is sometimes called data dredging by critics, because the more one looks the more likely something will be found.
What does structural equation modeling (SEM) involve? What is path analysis? What are the similarities and differences among SEM, path analysis, and regression analysis?
SEM is a data analysis technique that incorporates many of the other things discussed (correlation, regression, and factor analysis). CFA vs. SEM: in SEM you test relationships between constructs while accounting for measurement error, as in Confirmatory Factor Analysis. SEM assesses and controls for measurement error. SEM allows for the inclusion of multiple dependent variables. In SEM you have the ability to assess direct and indirect relationships. You can test alternative models. Path analysis is SEM with observed variables only, not latent variables. As such, it does not address measurement error.
What are the differences between descriptive statistics and inferential statistics?
Descriptive statistics summarize the data at hand (the sample) through numerical calculations, graphs, or tables. Inferential statistics make inferences and predictions about a population based on a sample of data taken from that population.
What is dummy variable coding and why is it used in analyses such as multiple regression?
Dummy variable coding is a technique in which a variable is coded as either one or zero. It is used to assess the effects of a categorical variable in multiple regression. For each category, a new variable is created that equals 1 for cases in that category and 0 otherwise. When running the regression, one category must be left out to serve as the reference (comparison) group and to make the model estimable.
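A minimal sketch of dummy coding, assuming a hypothetical three-level predictor called department and an illustrative helper named dummy_code (neither comes from the notes). It creates one 0/1 column per category and leaves out the reference category:

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Hypothetical categorical predictor with three levels.
department = ["sales", "hr", "it", "sales", "it"]

# "hr" is the omitted reference group, so only two dummy columns are created.
dummies = dummy_code(department, reference="hr")
for name, column in dummies.items():
    print(name, column)
```

In a regression, the coefficient on each dummy would then be read as the difference between that category and the omitted reference group.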
What are some methods that can be used to visually inspect one's data?
Histogram. Bar chart. Scatter plots. Open the data.
What is logistic regression?
Logistic regression is a maximum likelihood method that uses the Bernoulli probability distribution with a logistic (logit) link function to predict the probability that an event occurs. It is usually applied to a binary dependent variable. Fitting a logistic regression means estimating the parameters of a logistic model. More formally, a logistic model is one where the log-odds of the probability of an event is a linear combination of independent or predictor variables. The two possible dependent variable values are often labelled as "0" and "1", which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. The binary logistic regression model can be generalized to more than two levels of the dependent variable: categorical outputs with more than two values are modelled by multinomial logistic regression, and if the multiple categories are ordered, by ordinal logistic regression, for example the proportional odds ordinal logistic model. In practical terms, when a dependent variable is dichotomous, a scholar may be interested in the association between the level of a predictor variable and the likelihood that the dependent variable equals one.
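The link between log-odds and probability can be sketched as follows. The intercept and slope are made-up values for illustration, not estimates from any data, and the function names are hypothetical:

```python
import math


def predicted_probability(intercept, slope, x):
    """Turn the linear predictor (the log-odds) into a probability."""
    log_odds_value = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds_value))


def log_odds(p):
    """Inverse direction: probability back to log-odds (the logit)."""
    return math.log(p / (1.0 - p))


# Illustrative coefficients only: log-odds = -2.0 + 0.5 * x
p_at_4 = predicted_probability(intercept=-2.0, slope=0.5, x=4.0)  # log-odds of 0
p_at_6 = predicted_probability(intercept=-2.0, slope=0.5, x=6.0)  # log-odds of 1
print(p_at_4, p_at_6)
```

A log-odds of zero corresponds to a probability of 0.5, and each one-unit increase in x adds the slope to the log-odds rather than to the probability itself.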
What is mean centering in multiple regression and why is it used?
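Mean centering in multiple regression is a linear transformation in which the mean of a variable is subtracted from each of its values, so that the mean of the transformed variable is zero. It is used because it makes the intercept interpretable (the expected value of the dependent variable at the mean of the predictors) and because it reduces the nonessential multicollinearity between predictors and their interaction or polynomial terms.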
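A short sketch of the transformation, using a hypothetical predictor named tenure_years:

```python
from statistics import mean


def mean_center(values):
    """Subtract the variable's mean from every value."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical predictor values.
tenure_years = [2, 4, 6, 8]
centered = mean_center(tenure_years)
print(centered)
```

The centered values keep their spacing and order; only the location of zero changes.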
What are measures of central tendency?
Mean, median, mode. The mean is the arithmetic average of a data set. The mode is the most common value in a data set. The median is the middle value when the numbers are sorted.
What are measures of variance?
Range, variance, and standard deviation. These are measures of variability: statistics that describe how spread out the data are. Range: the difference between the largest and smallest values. Variance: the average squared deviation of scores from the mean. Standard deviation: the square root of the variance, which indicates the typical distance of a score from the mean in the original units.
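As a quick illustration with made-up scores, the three measures can be computed with Python's standard library. The population versions divide by N, the sample versions by N - 1:

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical scores; their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(scores) - min(scores)  # largest minus smallest value
pop_var = pvariance(scores)             # population variance (divide by N)
pop_sd = pstdev(scores)                 # population standard deviation
sample_var = variance(scores)           # sample variance (divide by N - 1)
sample_sd = stdev(scores)               # sample standard deviation

print(data_range, pop_var, pop_sd, sample_var, sample_sd)
```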
What is regression analysis? What is an intercept? What is a slope? What are unstandardized and standardized regression coefficients?
Regression analysis is a set of statistical processes for estimating the relationships among variables. The intercept is the expected value of the dependent variable when all independent variables equal zero. A slope is the expected change in the dependent variable for a one-unit change in an independent variable, holding the other independent variables constant. An unstandardized regression coefficient is expressed in the original scale of the variables, which preserves units that may be meaningful. A standardized regression coefficient is the value the coefficient would take if both the dependent and independent variables were standardized to a mean of zero and a standard deviation of 1. Standardized regression coefficients all share the same scale, which allows effect sizes to be compared across variables.
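A minimal sketch for the one-predictor case, using invented data (hours and score are hypothetical variables). It computes the intercept, the unstandardized slope, and the standardized slope, which in simple regression equals the Pearson correlation:

```python
from statistics import mean, stdev


def simple_regression(x, y):
    """Ordinary least squares with a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx                    # unstandardized coefficient
    intercept = my - slope * mx          # expected y when x is zero
    beta = slope * stdev(x) / stdev(y)   # standardized coefficient
    return intercept, slope, beta


# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 54, 59, 61, 64]
intercept, slope, beta = simple_regression(hours, score)
print(intercept, slope, beta)
```

The unstandardized slope is in points per hour, while the standardized slope is in standard deviations of score per standard deviation of hours.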
What is the relationship (both conceptual and statistical) between validity and reliability?
Reliability is a necessary but insufficient condition for validity. Sometimes it has been argued that reliability is an upper bound for validity. Every time a measure is less than perfectly reliable it will reduce relationships between that measure and other constructs, so assessing validity is harder.
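The "upper bound" argument can be made concrete with the attenuation formula from classical test theory (an assumption here; the notes do not name a measurement model). In this sketch, r_{xy} is the observed validity coefficient, \rho_{T_x T_y} is the correlation between true scores, and r_{xx} and r_{yy} are the reliabilities of the two measures:

```latex
r_{xy} = \rho_{T_x T_y}\,\sqrt{r_{xx}\, r_{yy}}
\quad\Longrightarrow\quad
|r_{xy}| \le \sqrt{r_{xx}\, r_{yy}} \le \sqrt{r_{xx}}
```

For example, under these assumptions a measure with reliability 0.64 cannot correlate more than 0.80 with any criterion.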
What is statistical power? What factors affect power?
Statistical power is the probability of correctly rejecting the null hypothesis when it is false, that is, of avoiding a Type II error (power = 1 − β). A power of 0.8 is the conventional benchmark. Factors that influence statistical power: sample size, effect size, decision criteria (e.g., the alpha level), number of variables, multicollinearity, and reliability of the measures.
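To see how sample size and effect size move power, here is a sketch for the simplest case only: a one-sided, one-sample z-test with a known standard deviation. The function name and the numbers are illustrative assumptions:

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test for a standardized effect size."""
    z = NormalDist()                   # standard normal distribution
    critical = z.inv_cdf(1 - alpha)    # rejection cutoff under the null
    return 1 - z.cdf(critical - effect_size * n ** 0.5)


power_n25 = z_test_power(effect_size=0.5, n=25)
power_n50 = z_test_power(effect_size=0.5, n=50)    # larger sample
power_small = z_test_power(effect_size=0.2, n=25)  # smaller effect
print(power_n25, power_n50, power_small)
```

With a medium effect (0.5) and 25 cases the power sits close to the 0.8 benchmark; it rises with a larger sample and falls with a smaller effect.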
What are intraclass correlation coefficients (ICCs)? How are they used to assess rater consistency or agreement? How are they used in multilevel modeling (e.g., HLM)?
The intraclass correlation, or the intraclass correlation coefficient (ICC), is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures it operates on data structured as groups, rather than data structured as paired observations. It compares variation within groups to variation between groups by expressing the variation between groups as a share of the total variation. ICCs are used to assess rater consistency and agreement by treating each rated target as a group and each rater's score as a measurement within that group: high ICCs indicate that raters give similar ratings to the same target. ICCs are used in multilevel modeling to justify its use: if the ICCs are high enough, group-level testing is warranted. In other words, if all groups have similar values on the dependent variable, then there is little reason to use multilevel modeling.
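One common version, ICC(1), can be sketched from a one-way ANOVA decomposition. This assumes equally sized groups; the function name and the team data are hypothetical:

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way ANOVA; assumes every group has the same size."""
    k = len(groups[0])                       # observations per group
    n_groups = len(groups)
    grand_mean = mean([v for g in groups for v in g])
    ms_between = (
        k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n_groups - 1)
    )
    ms_within = (
        sum((v - mean(g)) ** 2 for g in groups for v in g) / (n_groups * (k - 1))
    )
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores for three teams of three members each.
teams = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
icc_value = icc1(teams)
print(icc_value)
```

Here most of the variation lies between teams, so the ICC is high, which is the situation in which group-level analysis is usually argued to be justified.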
What is a t-test?
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t-distribution. The t-test can be used, for example, to determine whether the means of two sets of data are significantly different from each other. It is also used to test whether two paired samples differ from each other, as in pretest-posttest analyses.
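For the pretest-posttest case, the paired t statistic is the mean difference divided by its estimated standard error. A minimal sketch with invented scores:

```python
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / n ** 0.5  # estimated scaling term
    return mean(diffs) / standard_error, n - 1


# Hypothetical pretest and posttest scores for five participants.
pre = [10, 12, 9, 11, 13]
post = [12, 14, 10, 14, 14]
t_stat, df = paired_t(pre, post)
print(t_stat, df)
```

The statistic would then be compared against a t-distribution with n - 1 degrees of freedom.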
What are Type I and Type II errors?
Type I error is rejecting the null, when the null is true in the population (false positive). Type II error is failing to reject the null, when the null should be rejected in the population (false negative).
What is the general linear model (GLM)?
The general linear model, or multivariate regression model, is a statistical linear model, commonly written Y = XB + U, where Y holds the dependent variables, X the independent variables, B the parameters to be estimated, and U the errors. The errors are usually assumed to be uncorrelated across measurements and to follow a multivariate normal distribution. If the errors do not follow a multivariate normal distribution, generalized linear models may be used to relax assumptions about Y and U. The general linear model incorporates a number of different statistical models: ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, t-test and F-test. The general linear model is a generalization of the multiple linear regression model to the case of more than one dependent variable. Hypothesis tests with the general linear model can be made in two ways: multivariate or as several independent univariate tests.
What is a chi-square test and what can it tell us?
A chi-square test is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. It is used in several contexts. There is a chi-square test to assess whether there is a relationship between two categorical variables. Chi-square tests are also used to compare the fit of two models, where one model is nested in the other, in any maximum likelihood method, including structural equation modeling (SEM) and logistic regression. As such, the Wald test and the likelihood ratio test both use test statistics that are distributed as chi-square.
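For the two-categorical-variables case, the statistic compares observed cell counts with the counts expected under independence. A sketch with a made-up 2 x 2 table:

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are two groups, columns are two outcomes.
observed_counts = [[30, 10], [20, 40]]
chi2_stat, chi2_df = chi_square_independence(observed_counts)
print(chi2_stat, chi2_df)
```

The larger the statistic relative to its degrees of freedom, the less plausible it is that the two variables are unrelated.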
What is a z score?
A z score is the number of standard deviations a data point lies from the mean. More technically, it measures how many standard deviations below or above the population mean a raw score is. A z score is a standardized number that removes the original scale of the measurement. As such, a set of z scores has a mean of zero and a standard deviation of 1.
What are nominal, ordinal, interval, Likert, ratio, forced-choice, and behaviorally anchored scales?
· Nominal scales: categorize cases into discrete groups
· Ordinal scales: indicate the order of cases
· Interval/continuous scales: indicate quantity, with equal distance between scale points
  o Likert-type scales
  o Behaviorally anchored scales: a subtype of interval scale that attaches anchors to the response options, giving raters examples that illustrate what an acceptable response entails
    § Less acquiescence
    § The scale points mean the same thing across raters, which increases interrater reliability
What are the advantages and challenges of the following types of convenience samples: college students, online panel/crowdsourcing data (e.g., MTurk), and "snowball" samples?
· Advantages:
  o Efficiency (cost-effective and more responsive)
  o Homogeneity (less noise or extraneous variation in homogeneous samples)
  o Humanity (samples contain people)
  o Generalizability (field samples are no more representative of typical organizations)
  o Adequacy (any sample encompassed by the theory is appropriate)
· Disadvantages:
  o Not representative of the general population because of range restriction (the variance of a variable in the sample is smaller than its variance in the population) and covaried characteristics
· College students are available and cost-effective, and they can answer very detailed questions. However, they may lack the experience to answer certain questions of interest
· Online panels or crowdsourcing offer diverse participants. However, responses tend to be hurried because participants rush to get compensated
· Snowball samples can offer a large sample size, but the responses may come from people with similar characteristics
What is an experiment? What are some key features of experiments? What are some types of experimental designs? What is a quasi-experiment?
· An experiment is a design in which one or more variables are manipulated in order to draw inferences about the effects of the manipulation
· It tries to draw conclusions about cause-and-effect relationships
· Main features of experiments:
  o (1) Manipulation of one or more variables and comparison to a control condition
  o (2) Random assignment to treatment and control conditions/groups
  o (3) Control over extraneous variables
· Types of experiments:
  o (1) Independent measures: between-groups design
  o (2) Repeated measures design: within-groups design
    § Can incorporate multiple conditions
    § Counterbalancing to reduce order effects
  o (3) Matched pairs design: each condition uses different, but similar, participants
    § This is ideally a twin study
  o (4) Pre-post test design: it can be in one group (simple pre-post test) or two groups (difference in differences)
· Quasi-experiments: differ from a true experiment in that the researcher controls only some of the variables under investigation
  o The researcher does not control which level of the IV each case experiences
  o Cases are not randomly assigned to levels of the IV; preexisting groups are used
What are constructs? What is a latent variable? What is an observed variable?
· Constructs (latent variables): variables that are not directly observable but are inferred, mathematically, from other variables that are observed (in SEM)
· Observed variables (manifest indicators): variables that can be measured directly (and are used to assess latent constructs in SEM)
What is counterbalancing and why is it used?
· Counterbalancing is used to reduce order effects
· If an experiment exposes the treatment group to several manipulations in sequence, researchers can divide that group further and expose the subgroups to the manipulations in different orders
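Full counterbalancing can be sketched by listing every possible order of the conditions and cycling participants through them. The condition labels, participant IDs, and function names below are hypothetical:

```python
from itertools import permutations


def counterbalanced_orders(conditions):
    """Every possible presentation order of the conditions."""
    return list(permutations(conditions))


def assign_orders(participants, conditions):
    """Cycle through the orders so each order is used equally often."""
    orders = counterbalanced_orders(conditions)
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


conditions = ["A", "B", "C"]
participants = [f"P{i}" for i in range(1, 7)]  # six hypothetical participants
schedule = assign_orders(participants, conditions)
for participant, order in schedule.items():
    print(participant, order)
```

With three conditions there are six orders, so six participants (or a multiple of six) are needed for every condition to appear equally often in every position.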
What are dependent/criterion/endogenous variables?
· Dependent (criterion) variable: variable whose variation is explained by one or more IVs
· Endogenous variable: variable that is affected by other variables in the system
  o Endogenous variables have values that are determined by other variables in the system (these "other" variables are called exogenous variables)
What are independent/predictor/exogenous variables?
· Independent (predictor) variable: variable that helps explain or predict a DV
  o It is the cause in a causal relationship
· Exogenous variable: variable that is not affected by other variables in the system
  o Exogenous variables are fixed when they enter the model and are taken as a "given"; they influence endogenous variables in the model, but they are not determined or explained by the model and are not affected by any other variables in it (although they could be affected by factors outside the model being studied)
What are phi coefficients, point-biserial correlations, and Spearman rank-order correlations?
The phi coefficient is a measure of the degree of association between two binary variables. This measure is similar to the correlation coefficient in its interpretation.
The point-biserial correlation coefficient (rpb) is a correlation coefficient used when one variable (e.g., Y) is dichotomous. The point-biserial correlation is mathematically equivalent to the Pearson (product moment) correlation; that is, if we have one continuously measured variable X and a dichotomous variable Y, rXY = rpb. This can be shown by assigning two distinct numerical values to the dichotomous variable.
The Spearman rank-order correlation is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). Spearman's coefficient is appropriate for both continuous and discrete ordinal variables.
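These equivalences can be checked in a few lines: Spearman's coefficient is the Pearson correlation of the ranks, and phi is the Pearson correlation of two 0/1 variables. The data and helper names are invented for illustration:

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman correlation computed as Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Monotonic but nonlinear relationship (y is x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)

# Two hypothetical binary variables; Pearson on 0/1 codes gives phi.
passed = [1, 1, 0, 0]
trained = [1, 0, 0, 0]
phi = pearson(passed, trained)
print(r_pearson, r_spearman, phi)
```

Because the relationship is perfectly monotonic, Spearman's coefficient reaches 1 while Pearson's stays below 1.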
What are the differences among r, r², R, R², and adjusted R²?
The r statistic is the simple (bivariate) correlation. The r² is the squared correlation, which tells us the proportion of variance shared by two variables. R is the multiple correlation: the strength of the relationship between the dependent variable and the independent variables as a group. R² is the proportion of variance in the dependent variable explained by the independent variables as a group. Adjusted R² is interpreted like R², but it is adjusted downward for the number of predictors included in the model.
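The usual adjustment can be written as a one-line function. The R² value, sample size, and number of predictors below are invented for illustration:

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust R-squared for sample size n and number of predictors k."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical model: R-squared of 0.30 with 5 predictors and 50 cases.
adj = adjusted_r_squared(0.30, n=50, k=5)
print(adj)
```

The penalty grows as predictors are added relative to the sample size, so adjusted R² can fall even when R² rises.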
What are the standard error of the mean, the standard error of the estimate, and the standard error of measurement?
The standard error of a sample statistic is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the mean, it is called the standard error of the mean (SEM). The sampling distribution of the sample mean is generated by repeatedly sampling from the population and recording the mean of each sample. This forms a distribution of different means, and this distribution has its own mean and variance. Mathematically, the variance of this sampling distribution is equal to the variance of the population divided by the sample size. This is because as the sample size increases, sample means cluster more closely around the population mean. Accordingly, for a given sample size, the standard error equals the standard deviation divided by the square root of the sample size.
The standard error of the estimate is a statistic that tells you how closely your observed data fall around the regression line. A value of 0 would mean a perfect match, with every observed data point falling directly on the line.
The standard error of measurement is a statistic that indicates the variability of the errors of measurement by estimating the average number of points by which observed scores differ from true scores.
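The three standard errors can be sketched side by side. The formula used for the standard error of measurement (SD times the square root of one minus reliability) is the classical test theory version and is an assumption here, since the notes give no formula; all data and function names are illustrative:

```python
from statistics import stdev


def standard_error_of_mean(sample):
    """Sample SD divided by the square root of the sample size."""
    return stdev(sample) / len(sample) ** 0.5


def standard_error_of_estimate(residuals, n_predictors):
    """Typical size of regression residuals (observed minus predicted)."""
    df = len(residuals) - n_predictors - 1
    return (sum(e ** 2 for e in residuals) / df) ** 0.5


def standard_error_of_measurement(sd, reliability):
    """Classical test theory formula: SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5


se_mean = standard_error_of_mean([2, 4, 4, 4, 5, 5, 7, 9])
se_estimate = standard_error_of_estimate([1, -1, 1, -1, 0, 0], n_predictors=1)
se_measurement = standard_error_of_measurement(sd=10, reliability=0.84)
print(se_mean, se_estimate, se_measurement)
```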
What is a main effect? What is an interaction effect?
A main effect is the effect of an independent variable on a dependent variable, averaging across the levels of any other independent variables. An interaction effect is the simultaneous effect of two or more independent variables on at least one dependent variable in which their joint effect is significantly greater (or significantly less) than the sum of the parts. In other words, the effect of one independent variable depends on the level of another.
What are degrees of freedom?
The number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. More generally, the number of independent ways in which a dynamic system can move without violating any constraint imposed on it is called the number of degrees of freedom. Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom.
What is careless or unmotivated responding in the context of questionnaire research? What methods can researchers use to detect and deal with these response patterns?
· Careless or unmotivated responding happens when a respondent answers a questionnaire without thinking through their answers or trying to answer truthfully/honestly (DeVellis, 2012; 2016)
· This creates a problem because the measure no longer assesses the underlying latent construct under study
· Researchers can use several methods to detect careless responding
  o Direct screening methods: questionnaire design decisions used to reduce careless responding
    § Attention checks
    § Bogus items
    § Reverse-worded items
    § Differing response formats
    § A direct question asking participants about their attentiveness
  o Archival screening methods: examining patterns of responses to identify careless responders
    § Detecting respondents with very low/high within-person variance across constructs
    § Detecting when respondents give the same responses to items that are opposites
    § Checking response completion times
  o Statistical screening methods: using statistical analyses to identify carelessness
    § Examining responses to items that are highly correlated
What is the difference between a cross-sectional design and a longitudinal design? What are some considerations in choosing one design or the other?
· Cross-sectional design: involves multiple cases being measured at the same time
  o Pros:
    § Participants only need to respond once, so collection is quick and cheap
  o Cons:
    § Does not address common method variance (CMV), given that IVs and DVs are not temporally separated
· Longitudinal design (repeated measures design): involves the same cases being observed at different points in time
  o Pros:
    § Helps increase reliability
    § Addresses some concerns regarding CMV
    § Supports stronger causal inferences because causes can be measured before effects
    § Shows how relationships change over time
  o Cons:
    § Relationships tend to be smaller when measured over time than when measured concurrently, making effects harder to detect
    § Takes larger amounts of time
    § Puts higher demands on participants and researcher
    § It is hard to determine how long it will take for an IV to affect a DV
    § Participant attrition: the researcher needs to oversample at T1 and to ask whether people who leave differ from those who stay (random attrition vs. systematic attrition)
What are demand characteristics? What are experimenter expectancy effects?
· Demand characteristics: An experimental artifact where participants form an interpretation of the experiment's purpose and unconsciously change their behavior to fit that interpretation
  o The subject figures out a way to 'beat' the experiment to attain good scores in the alleged evaluation
  o A solution depends on the study, but a possible solution is to conceal IVs and DVs among random constructs
· Subject-expectancy effects: A form of reactivity that occurs in experiments when a research subject expects a given result and therefore unconsciously affects the outcome, or reports the expected result
  o Because this effect can significantly bias the results of experiments (especially on human subjects), double-blind methodology is used to eliminate the effect
· Observer (experimenter)-expectancy effects: A form of reactivity in which a researcher's cognitive bias causes them to subconsciously influence the participants of an experiment (like confirmation bias: you see what you want to see)
  o The solution is a double-blind experimental design
What are the following threats to the validity of inferences from an experiment: History, maturation, testing, instrumentation, statistical regression, differential selection, and attrition/mortality?
· History: Did some unanticipated event occur while the experiment was in progress, and did it affect the dependent variable?
  o In the one-group pre-post test design, the effect of the treatment is the difference between the pre-test and post-test scores; this difference may be due to the treatment or to history
· Maturation: Were changes in the DV due to normal development and growth, that is, to processes operating within the subject as a function of time?
  o Has there been enough time to detect changes, and can the changes be isolated to the treatment? This is a threat for one-group (pre-post) designs
· Testing: Did the pre-test affect the post-test scores? This is a threat to one-group designs
  o A pre-test may sensitize participants in unanticipated ways, and their performance on the post-test may be due to the pre-test, not to the treatment, or, more likely, to an interaction of the pre-test and treatment
· Instrumentation: Did any change occur during the study in the way the DV was measured?
· Statistical regression (regression to the mean): An effect that results from the tendency of subjects selected on the basis of extreme scores to regress towards the mean on subsequent tests. This is inversely related to test reliability
· Differential selection: Refers to how participants are selected for the various groups in the study
  o Are the groups equivalent at the beginning of the study? This is a threat to two-group quasi-experiments
  o If subjects were selected by random sampling and random assignment, all had an equal chance of being in the treatment or comparison groups, and the groups can be treated as equivalent
· Attrition/Mortality: Differential loss of participants across groups. Did some participants drop out? Did this affect the results? Did about the same number of participants make it through the entire study in both experimental and comparison groups?
What are the differences among listwise deletion, pairwise deletion, and imputation for dealing with missing data? Which approach is best?
· Listwise deletion: the entire case is removed from consideration if it has any missing data
· Pairwise deletion: a case is deleted only from the specific analyses that involve variables on which it has missing data
  o The case is retained for any other analyses in which data are present
  o Pairwise deletion maximizes the data available for each analysis
· Imputation: substituting missing values with analytically derived values
  o Multiple imputation: the researcher creates several datasets with different possible values for each missing value; then results are pooled across the different datasets
· It has been argued that pairwise deletion is better than listwise deletion because it uses more information
  o On the other hand, standard errors may be calculated wrongly, because the average sample size is used
  o Both deletion methods assume the data to be missing completely at random
· Multiple imputation only assumes the data are missing at random. Any deletion method can be a source of bias if the data are missing not at random, but there is no test available to assess this.
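The difference between the two deletion strategies can be sketched with a tiny invented dataset in which None marks a missing value. The variable names and helper functions are hypothetical:

```python
# Hypothetical cases measured on three variables; None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 1.0},
    {"x": 3.0, "y": 4.0, "z": None},
    {"x": 4.0, "y": 5.0, "z": 6.0},
]


def listwise(rows):
    """Keep only cases that are complete on every variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, a, b):
    """Keep cases that are complete on the two variables in this analysis."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]


complete_cases = listwise(rows)
xy_pairs = pairwise(rows, "x", "y")
xz_pairs = pairwise(rows, "x", "z")
yz_pairs = pairwise(rows, "y", "z")
print(len(complete_cases), len(xy_pairs), len(xz_pairs), len(yz_pairs))
```

Listwise deletion leaves two cases for every analysis, whereas pairwise deletion keeps three cases for two of the correlations and two for the third, which is why the sample size differs from one analysis to the next.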
What is measurement? What are contamination and deficiency in the context of measurement?
· Measurement: Procedures used to measure constructs of interest, or the application of a scale to subjects to assess an individual's level of a latent construct
  o The quality of the measurement depends on the measurement process and on the quality of the scale administered
· Content validity refers to the extent to which scores on the measure reflect the intended construct
  o Content deficiency: when part of the construct of interest is not captured by the measure
  o Content contamination: when the measure captures something that is not part of the construct of interest
What is an outlier? What is an influential case? How can outliers and influential cases be detected? How can they be dealt with when testing hypotheses?
· Outlier: an observation point that appears to deviate from other observations in a sample
· Influential case: any case that significantly alters the value of a regression coefficient when it is deleted from an analysis
· Detecting outliers and influential cases
  o Need to consider whether the data are univariate vs. multivariate and parametric vs. non-parametric
  o Popular methods include...
    § Visual inspection of graphical representations (like a histogram)
    § Z-score, which indicates how many SDs a data point is from the sample mean
    § Probabilistic and statistical modeling
    § Linear models (e.g., PCA)
    § Proximity-based models
    § Grubbs' test
    § Density-based clustering methods (e.g., DBSCAN)
    § Isolation forests, an algorithm of binary decision trees that flags observations whose path lengths are shorter than those of the rest of the observations
  o Influential cases can be detected via DFFITS (difference in fits) or Cook's distance
    § Both build on the concept of leverage and compare the predicted value of the dependent variable with and without the observation included
· They can be dealt with in many ways
  o Ad hoc: careful data entry, precise experimental planning and execution, etc.
  o If the dataset is large enough, some outliers are expected and therefore might be ignored
  o Statistical methods to exclude or downweight the observations
    § Trimming (or truncation): removing extreme outliers
    § Winsorizing: replacing all outliers with the largest values that are considered nonsuspect
    § Logarithmic transformations: tend to squeeze together the larger values in the data set and stretch out the smaller ones
    § Non-normal (heavy-tailed) distributions such as the Cauchy: an estimation approach that accommodates a large number of outliers
    § Hierarchical clustering: allows observations to switch cluster membership, so outliers have less of an impact
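Two of the listed techniques, z-score screening and winsorizing, can be sketched as follows. The data, the 2.5 cutoff, and the winsorizing bounds are arbitrary choices for illustration, not recommendations from the notes:

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.5):
    """Flag values whose absolute z-score exceeds the threshold."""
    m, sd = mean(values), stdev(values)
    return [abs(v - m) / sd > threshold for v in values]


def winsorize(values, lower, upper):
    """Pull values outside [lower, upper] back to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical scores with one extreme value at the end.
scores = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11, 50]
flags = flag_outliers(scores)

# Assumed bounds: 10 and 12 are treated as the most extreme nonsuspect values.
winsorized = winsorize(scores, lower=10, upper=12)
print(flags)
print(winsorized)
```

One caveat of the z-score approach is that the extreme value inflates the mean and standard deviation it is judged against, so in small samples a conventional cutoff of 3 can miss an obvious outlier.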
What does sampling involve? What are the main types of probability and non-probability samples?
· Sampling involves selecting a small group of cases from a larger group and studying the small group to learn about the large group
· You make inferences about the population
· Sample representativeness: the extent to which the sample represents the target population
· Main types of samples
  o Probability samples: samples where cases are selected by chance and have a known probability of being selected
    § Simple random samples: samples in which every case from the population has an equal chance of being selected
    § Stratified random samples: randomly sampling within subgroups (strata) of a larger population
      · Used when there is less of one subgroup than another; it can give you equal numbers in each subgroup
  o Non-probability samples: samples in which the cases are not selected by chance. Thus, we do not know the probability that a case is included
    § Purposive sample: samples that include cases that possess a characteristic of interest but are not representative of the population as a whole
    § Convenience sample: cases are self-selected and/or easily accessible
      · Crowdsourcing samples, online panels, students, snowball samples
What is statistical conclusion validity, internal validity, construct validity, and external validity in the context of experimental designs?
· Statistical conclusion validity: a judgment concerning whether theory and empirical evidence support the inferences one can draw from the scores of a measure and the relationships between measures as it pertains to theory o Importance of inferences that we make about the thing we are trying to measure NOT the measure itself o Validity is not a number or single test/analysis, validity is a judgment o Validity of measures not validity of study as a whole (internal and external validity) · Internal validity: when variation in IV scores are responsible for variation in DV scores o It is related to whether... § there is no demand characteristics (subjects conform to what they think is wanted) § no contamination from the outside (confounding variables) § it is double or triple blind § subjects complied with the experimental treatment (people assigned to a condition were actually exposed to the condition) o Internal validity most first be established before external validity can be inferred o Three criteria for assessing internal validity § (1) IV and DV are meaningfully (statistically/practically significant) related · The relationship must be greater then what might be expected to occur by chance or coincidence § (2) Variation in the IV is concurrent with, or precedes, variation in the DV § (3) There is a reasonable causal explanation for the observed relationship and there are no plausible alternative explanations for it · External validity: when findings obtained are correctly generalized beyond particulars of the experiment, to other samples, or the larger population—external validity is inferred · Construct validity: when there is high correspondence between scores on a measure and the intended construct of interest o Construct validity is never (or at best, rarely) achieved. 
It is generally inferred § We cannot observe most constructs as they are conceptual, not directly measurable o Construct validation steps § (1) Define the construct and develop conceptual meaning for it · Construct domain: what it is and isn't · Content validity: when the items adequately capture the content domain of the construct (i.e., overlap of measure content and the construct domain); determined by experts o Content Deficiency: part of the construct we have not captured o Content Contamination: part of the measure that is not part of the construct · Face validity: when a measure appears to be construct valid to the individuals who use it, including participants · Nomological network: where it fits, how it is related to other constructs § (2) Develop/choose a measure consistent with the definition § (3) Perform logical analyses and empirical tests to determine if observations obtained on the measure conform to the conceptual definition · Convergent validity: the extent to which your scales are related to scales of the same or highly similar constructs · Discriminant validity: the extent to which measures of constructs that are supposed to be independent are found to have a low correspondence
What does it mean to manipulate a variable? What is a manipulation check?
· To manipulate a variable, a researcher strategically alters the levels of a variable of interest and then randomly assigns participants to the different levels of the manipulation (e.g., treatment versus control) · A manipulation check occurs when researchers use a measured variable to show what the manipulated variable concurrently affects besides the DV of interest, i.e., that the manipulation changed what it was intended to change o A manipulation check can help an experimenter rule out reasons that a manipulation may have failed to influence a DV o When a manipulation creates significant differences between experimental conditions in both (1) the DV and (2) the measured manipulation check variable, the interpretation is that (1) the manipulation "causes" variation in the DV (the "effect") and (2) the manipulation also explains variation in some other, more theoretically obvious measured variable that it is expected to concurrently influence, which assists in interpreting the "cause"
What is the difference between within- and between-subject designs?
· Within-person (or within-subject) designs: examine variability within the same individuals across conditions o The same subjects perform at all levels of the IV o Within-subjects designs are generally more powerful than between-subjects designs because each subject serves as their own control, so there is less error variability · Between-persons (or between-subjects) designs: examine differences between individuals o Different groups of subjects receive different levels of the IV
What is a confidence interval? What is a credibility interval (e.g., in meta-analysis)?
A confidence interval (CI) is a type of interval estimate, computed from the statistics of the observed data, that might contain the true value of an unknown population parameter. The interval has an associated confidence level that, loosely speaking, quantifies the level of confidence that the parameter lies in the interval. More strictly speaking, the confidence level represents the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of the unknown population parameter. In other words, if confidence intervals are constructed using a given confidence level from an infinite number of independent samples, the proportion of those intervals that contain the true value of the parameter will be equal to the confidence level. In meta-analysis, a credibility interval describes the likely range of true effects across studies (how much the relationship varies), whereas the confidence interval describes our confidence in the estimate of the mean correlation.
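The contrast between the two intervals can be sketched in a few lines of Python. This is a minimal illustration, not a full meta-analytic procedure: the function names are hypothetical, the confidence interval uses a normal approximation (a t-distribution would be more appropriate for small samples), and the 80% level for the credibility interval is an assumed convention rather than something stated in these notes.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level=0.95):
    """Normal-approximation CI for a mean: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))  # standard error of the mean
    return m - z * se, m + z * se


def credibility_interval(mean_rho, sd_rho, level=0.80):
    """Range of true effects across studies: mean +/- z * SD of true effects.

    Uses the standard deviation of the effects themselves, not the
    standard error of the mean, so it does not shrink as studies are added.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean_rho - z * sd_rho, mean_rho + z * sd_rho


# Made-up numbers, for illustration only.
ci_low, ci_high = confidence_interval([1, 2, 3, 4, 5])
cr_low, cr_high = credibility_interval(mean_rho=0.30, sd_rho=0.10)
print(f"95% CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
print(f"80% credibility interval: [{cr_low:.2f}, {cr_high:.2f}]")
```

The confidence interval narrows as the sample grows because it is built from a standard error; the credibility interval reflects real variation in effects and stays wide if the effects truly differ across studies.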
What is a confound or confounding variable? What is a spurious relationship?
A confounding variable is an outside influence that affects both the independent and the dependent variable, and so distorts the apparent effect of one on the other. Simply, a confounding variable is an extra variable that was not accounted for but that affects the relationship between x and y. Confounding variables can undermine an experiment and make its results uninterpretable. They can suggest that there are relationships when there really are not, which may lead to a spurious relationship. A spurious relationship is a relationship in which two or more events or variables are not causally related to each other, yet it may be wrongly inferred that they are, due to either coincidence or the presence of a certain third, unseen factor (referred to as a "common response variable", "confounding factor", or "lurking variable").
What is a control variable? How are control variables included in analyses such as multiple regression? What are some considerations in including (or excluding) control variables in multivariate models?
A control variable is a variable that is included in a regression analysis so that the effect of the predictor of interest can be estimated while holding constant the effects of the control variable. In multiple regression, controls are entered as additional predictors, often in a first step before the substantive predictors. Purposes of control variables: 1. Purification: The measure of the IV or DV is thought to be contaminated in some way. Use a control variable to remove the contamination. An example would be social desirability. 2. Account for other meaningful variables. Helps identify the unique contribution of an IV in the presence of other IVs. 3. Allows us to look at incremental prediction. Above and beyond z, how much of y does x explain? Seeing whether a substantive IV explains variance in the DV beyond the control variables. Potential problems in use of controls: a) Controls are sometimes an afterthought in terms of measurement. We normally do not give much thought to how to measure controls, so we may be mindlessly controlling for things. They are treated as second-class citizens. b) They can change the substantive meaning of the IV, so the relationship between the variables of interest changes. This is because controlling changes the meaning of the variance that is left in the IV. c) Controls tend not to correlate highly with the DV or IV.
What is method variance (bias)? How can method variance be assessed? How can it be minimized?
Method variance is a form of systematic error due to the methods with which the constructs are measured. It can artificially inflate relationships or deflate validity evidence. It can be assessed with Harman's single-factor test, which assesses whether a single common factor accounts for most of the covariance among the measured variables. Moreover, if the source of method variance is known, researchers can control for it. Sources of method variance: rater factors (tendencies of the respondents; you end up measuring individual differences in "leniency" instead of individual differences in the underlying construct), item characteristic factors (scale or anchor points), item context factors, priming effects, and measurement context factors. How to deal with it? Study design: counterbalancing (randomly assigning participants to different levels of a method factor, such as the order of questions), filler tasks (to avoid one item priming another), and measuring some of the factors and controlling for them (e.g., social desirability). Statistical: marker variables (including a theoretically unrelated variable and controlling for it) and direct measures of the method factor.
What is a moderator variable? How does one test for the presence of moderation or an interaction effect?
A moderator variable is a variable that changes the effect of the independent variable on the dependent variable. Moderation occurs when a variable influences the magnitude and/or direction of the relationship between an IV and a DV. It is called an interaction in experimental fields and moderation in non-experimental fields. Moderators provide information about when or under what conditions an IV relates to the DV. One tests for the presence of moderation or an interaction effect by using moderated multiple regression. This is done by including the main effect terms of both the independent variable and the moderator variable. Then, one adds the multiplicative term of the independent and moderator variables as an additional predictor in the regression model. If the interaction term is statistically significant, one has the first piece of evidence for moderation. The second piece of evidence comes from graphing the interaction and analyzing how the moderator changes the relationship.
What is the normal distribution? What is a sampling distribution?
A sampling distribution is the probability distribution of a given random-sample-based statistic. If an arbitrarily large number of samples, each involving multiple observations (data points), were separately used in order to compute one value of a statistic (such as, for example, the sample mean or sample variance) for each sample, then the sampling distribution is the probability distribution of the values that the statistic takes on. In many contexts, only one sample is observed, but the sampling distribution can be found theoretically. The normal distribution is a very common probability distribution that is bell-shaped and symmetric (not skewed), with a kurtosis of 3.
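The "arbitrarily large number of samples" idea can be approximated by simulation. The sketch below is illustrative only: the population is made-up exponential data (chosen because it is skewed), and the function name and the sample sizes are arbitrary assumptions.

```python
import random
from statistics import mean, stdev


def sampling_distribution_of_mean(population, n, draws, seed=0):
    """Draw many random samples of size n and return each sample's mean."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, n)) for _ in range(draws)]


# Hypothetical, deliberately skewed population of 10,000 scores.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sampling_distribution_of_mean(population, n=50, draws=2_000)

print("population mean:", round(mean(population), 3))
print("mean of sample means:", round(mean(means), 3))
print("population SD:", round(stdev(population), 3))
print("SD of sample means (standard error):", round(stdev(means), 3))
```

The sample means cluster around the population mean, and their spread (the standard error) is much smaller than the spread of the raw scores, even though the population itself is not normal.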
What is hypothesis testing? What is a null hypothesis? What are some criticisms of hypothesis testing?
A statistical hypothesis test is a method of statistical inference. Commonly, two statistical data sets (i.e., variables) are compared, or a data set obtained by sampling is compared against a synthetic data set from an idealized model. A hypothesis is proposed for the statistical relationship between two variables, and this is compared as an alternative to an idealized null hypothesis that proposes no relationship between the two variables. The comparison is deemed statistically significant if the relationship between the data sets would be an unlikely realization of the null hypothesis according to a threshold probability, the significance level. Hypothesis tests are used in determining what outcomes of a study would lead to a rejection of the null hypothesis for a pre-specified level of significance. The process of distinguishing between the null hypothesis and the alternative hypothesis is aided by identifying two conceptual types of errors (type 1, rejecting a true null, and type 2, failing to reject a false null), and by specifying parametric limits on, e.g., how much type 1 error will be permitted. Criticisms of hypothesis testing stem from its idealization of threshold significance levels: it trivializes nonsignificant relationships, ignores effect sizes, and is normally silent on statistical power.
What is a suppressor variable? What are some signs that suppression may be present in a multiple regression analysis?
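A suppressor variable (in multiple regression) has a zero (or close to zero) correlation with the criterion but is correlated with one or more of the predictor variables; it therefore suppresses criterion-irrelevant variance in those predictors. Including a suppressor variable thus strengthens the relationship between the predictor variable and the criterion variable. Signs that suppression may be present include a standardized regression coefficient that is larger in magnitude than the predictor's zero-order correlation with the criterion, or that has the opposite sign.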
What is the difference between one- and two-tailed tests? When would you use one or the other?
A two-tailed test rejects the null in either direction, while a one-tailed test focuses on rejecting the null in one direction only. The p-value is conceptualized as the share of results, under the null, that are at least as extreme as the statistic found in the focal sample. The one-tailed test only counts the share of results that are more extreme than the focal statistic in the hypothesized direction, making it less conservative. The two-tailed test counts the share of results that are more extreme than the focal statistic in either direction, making it more conservative. It is appropriate to use one-tailed tests when the hypothesis is directional and the theoretical rationale behind it is well supported. A one-tailed test is also sometimes used when statistical power would be too low with a two-tailed test and collecting more data would be too costly.
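To make the "less conservative" point concrete, here is a small sketch that computes both p-values for the same z statistic. It assumes a test statistic that is standard normal under the null and a hypothesis predicting a positive effect; the function name and the value z = 1.80 are made up for illustration.

```python
from statistics import NormalDist


def one_and_two_tailed_p(z):
    """Return (one-tailed p for H1: effect > 0, two-tailed p) for a z statistic."""
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)             # area in the upper tail only
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))  # area in both tails
    return one_tailed, two_tailed


one_p, two_p = one_and_two_tailed_p(1.80)
print(f"one-tailed p = {one_p:.4f}, two-tailed p = {two_p:.4f}")
```

With z = 1.80, the one-tailed p-value falls below .05 while the two-tailed p-value does not, so the same data lead to different decisions depending on the test chosen.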
What are analysis of variance (ANOVA), multivariate analysis of variance (MANOVA), analysis of covariance (ANCOVA), and multivariate analysis of covariance (MANCOVA)?
Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among group means in a sample. ANOVA was developed by statistician and evolutionary biologist Ronald Fisher. ANOVA provides a statistical test of whether the population means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVA is useful for comparing (testing) three or more group means for statistical significance. Multivariate analysis of variance (MANOVA) is simply an ANOVA with several dependent variables. That is to say, ANOVA tests for the difference in means between two or more groups, while MANOVA tests for the difference in two or more vectors of means. Analysis of covariance (ANCOVA) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV) or nuisance variables. Multivariate analysis of covariance (MANCOVA) is an extension of ANCOVA methods to cover cases where there is more than one dependent variable and where the control of concomitant continuous independent variables (covariates) is required. The most prominent benefit of the MANCOVA design over the simple MANOVA is the 'factoring out' of noise or error that has been introduced by the covariates.
What do construct, content, predictive (or criterion-related), convergent, and discriminant validity mean in the context of measurement?
Construct validity refers to the degree to which a test measures what it claims, or purports, to be measuring. Content validity is the extent to which a measure represents all facets of a given construct. Predictive (criterion-related) validity is the extent to which a measure is related to an outcome: do the scores on the measure predict, or relate to, a criterion (such as a dependent variable)? Convergent validity is a sub-type of construct validity. It refers to the degree to which two measures of constructs that theoretically should be related are in fact related. Discriminant validity is a second sub-type of construct validity. It refers to whether concepts or measurements that are not supposed to be related are actually unrelated.
What is a correlation and what information does it provide? What are the differences among zero-order, partial, and semi-partial (part) correlations?
Correlation is any statistical relationship, whether causal or not, between two random variables or bivariate data. It refers to how close two variables are to having a linear relationship with each other. It is normally assessed through the Pearson correlation coefficient, a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. A zero-order correlation is a correlation where no variables are controlled, such as the Pearson correlation. A partial correlation is a measure of the strength and direction of a linear relationship between two continuous variables whilst controlling for the effect of one or more other continuous variables (also known as 'covariates' or 'control' variables). The squared partial correlation can be interpreted as the proportion of residual variance in the dependent variable (the variance not explained by the controls) that is uniquely explained by the independent variable. The semi-partial correlation is similar to the partial correlation. Like the partial correlation, it is a measure of the correlation between two variables that remains after controlling for (i.e., "partialling out") the effects of one or more other predictor variables. However, the controls are removed from the independent variable only, so the squared semi-partial correlation gives the proportion of the total variance in the dependent variable that is uniquely explained by the independent variable and not by other controls.
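For the case of a single control variable, both statistics can be computed directly from the three zero-order correlations. The sketch below uses the standard first-order formulas; the function names and the example correlations are hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from BOTH x and y."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of y with x after z is removed from x ONLY."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_xz**2)


# Made-up zero-order correlations: x = predictor, y = outcome, z = control.
r_xy, r_xz, r_yz = 0.50, 0.30, 0.40

pr = partial_r(r_xy, r_xz, r_yz)
sr = semipartial_r(r_xy, r_xz, r_yz)
print(f"zero-order r = {r_xy:.2f}")
print(f"partial r = {pr:.3f} (squared: {pr**2:.3f} of residual y variance)")
print(f"semi-partial r = {sr:.3f} (squared: {sr**2:.3f} of total y variance)")
```

The two formulas share a numerator and differ only in the denominator, which is why the semi-partial correlation is never larger in magnitude than the partial correlation.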
What is exploratory factor analysis (EFA)? How does one conduct an EFA and what are some of the key output/statistics this analysis provides?
EFA involves analyzing correlations among items to identify groups of items (factors) that correlate highly with one another but not with other items. Use EFA when there is no strong prior idea of how many factors underlie the data. Extraction: principal axis factoring is the most common. Principal components is technically not factor analysis because it is pure data reduction. Rotation: simplifies the structure of the data (simple structure). Choose Promax if oblique (factors can be correlated); use an orthogonal rotation such as varimax (which creates factors that are as uncorrelated as possible) only in special cases. Chad prefers pairwise. The pattern matrix is the most important output: it contains the factor loadings. Factor loadings are essentially the correlations between the items and the factor. Suppress small loadings. Compare the eigenvalues with the number of factors retained. Communalities: these represent the percentage of variance in each item that is accounted for by the factors. Variance explained is the total variance in the items explained by the factors. Eigenvalues: the amount of variance each factor accounts for, expressed in item units (each standardized item contributes 1). That is why you want them to be higher than 1. Scree plot: a plot of the eigenvalues. We are looking for a break in the trend to assess what number of factors is reasonable. You then retain the factors before that break (6 factors might be reasonable). Factor correlations: look at how strongly the factors are correlated with one another. More an art than a science.
What are eta-squared (e.g., in ANOVA), partial eta-squared, omega-squared, and epsilon-squared?
Eta squared measures the proportion of the total variance in a dependent variable that is associated with the membership of different groups defined by an independent variable. Partial eta squared is a similar measure in which the effects of other independent variables and interactions are partialled out. Omega squared (ω2) is a measure of effect size, or the degree of association, for a population. It is an estimate of how much variance in the response variable is accounted for by the explanatory variables. Omega squared is widely viewed as a less biased alternative to eta squared, especially when sample sizes are small. Epsilon squared is a measure of effect size (Kelley, 1935). It is one of the least common measures of effect size. The formula is basically the same as that for omega squared, except that there is one less term in the denominator.
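For a one-way ANOVA, the three statistics can be computed from the same sums of squares, which makes the "one less term in the denominator" remark easy to see. This is a sketch under the usual one-way, between-subjects formulas; the function name and the three small groups of scores are invented for illustration.

```python
from statistics import mean


def anova_effect_sizes(groups):
    """Eta, omega, and epsilon squared for a one-way between-subjects ANOVA."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_total = ss_between + ss_within
    df_between = len(groups) - 1
    ms_within = ss_within / (len(scores) - len(groups))
    corrected = ss_between - df_between * ms_within  # bias-corrected numerator
    return {
        "eta_sq": ss_between / ss_total,
        "omega_sq": corrected / (ss_total + ms_within),
        "epsilon_sq": corrected / ss_total,  # same numerator, no MS_within below
    }


# Hypothetical scores for three conditions.
sizes = anova_effect_sizes([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
for name, value in sizes.items():
    print(f"{name}: {value:.3f}")
```

In this small made-up sample, eta squared is the largest of the three and omega squared the smallest, which is consistent with eta squared being the more upwardly biased estimate when samples are small.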
What does hypothesizing after the results are known (i.e., "HARKing") involve? Is this a problem?
HARKing involves presenting, in a research paper, hypotheses that were actually developed after seeing the results as if they had been specified before sampling and testing and derived solely from theoretical reasoning. HARKing is an unethical research practice because it misrepresents how the research was done and it corrupts the scientific method, in which a theoretical hypothesis is tested with a research design and statistical methods. This is a problem because the relationship found can be a statistical fluke, leading researchers to incorrect beliefs after publication. It can be a statistical fluke because a p-value can be interpreted as the percentage of statistical estimates that would be at least as extreme if the null hypothesis were true. Hence, if one uses a significance level of 0.05, about 5% of tests will produce a significant result even when the null hypothesis is true. Presenting such results as confirmed a priori predictions leads to an inflated number of studies with type 1 errors (false positives).
What is hierarchical regression analysis?
Hierarchical regression analysis is the comparison of two regression models, entered in steps, by doing an F-test to assess whether a more complex model fits the data better (i.e., whether the change in R-squared is significant). The less complex model needs to be nested within the more complex model. The less complex model is said to be nested within the more complex model if all the estimated parameters in the less complex model are also estimated in the more complex model, both models are tested on the same data, and both models utilize the same statistical method.
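The F-test for the change in R-squared between nested models can be sketched as follows. The function name and the numbers are hypothetical, and the block only computes the F statistic and its degrees of freedom; the p-value would come from an F table or a statistics package, since the Python standard library has no F distribution.

```python
def f_change(r2_reduced, r2_full, k_reduced, k_full, n):
    """F statistic for the R-squared change between two nested OLS models.

    k_reduced, k_full: number of predictors (excluding the intercept).
    n: sample size, identical for both models.
    """
    df1 = k_full - k_reduced  # number of predictors added in the second step
    df2 = n - k_full - 1      # residual df of the full model
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2


# Made-up example: step 1 has two controls (R2 = .20);
# step 2 adds one substantive predictor (R2 = .30), with n = 104.
f, df1, df2 = f_change(0.20, 0.30, k_reduced=2, k_full=3, n=104)
print(f"F({df1}, {df2}) = {f:.2f}")
```

A large F relative to the critical value for (df1, df2) indicates that the added predictors explain variance beyond the predictors in the earlier step.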
What is confirmatory factor analysis (CFA)? How does one conduct a CFA and what are some of the key output/statistics this analysis provides?
Instead of exploring, we are trying to confirm a particular number of factors. Elements of the model: the latent factors; phi, the correlation between two latent factors; the indicators (items); and the error terms (residual terms), i.e., the variance unique to the item (unexplained by the latent factor). CFA: (1) Test the fit of this model to the actual data. (2) Look at the standardized loadings. (3) Look at the fit indices. In CFA you specify the model, so you are more intentional. That is an advantage because you can test it: the items only load on the specified factor. CFA provides more information on model fit. CFA provides modification indices that give more information about possible improvements. Multigroup CFA: compare the same model in two samples.
What are internal consistency, split-half, test-retest, parallel forms, and interrater reliability?
Internal consistency reliability: the most common type, assessed with Cronbach's alpha. Assesses consistency of scores across multiple items designed to measure the same construct. Level of analysis is the item level. It is a function of two things: the number of items and the average correlation between the items. Measures random response error. The split-half method also assesses the internal consistency of a test. It measures the extent to which all parts of the test contribute equally to what is being measured. This is done by comparing the results of one half of a test with the results from the other half. Parallel forms reliability assesses consistency of scores on two parallel measures of the same construct. Level of analysis is the measure level. It measures random response error and specific factor error. Test-retest reliability measures the consistency of scores over time. Level of analysis is the measure level. Measures random response error and transient error. Interrater reliability assesses the consistency of ratings of the same construct by different raters. Level of analysis is the measure level. Measures random error and rater error.
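Cronbach's alpha can be computed from item and total-score variances. The sketch below uses the usual variance-based formula; the function name and the tiny three-item data set are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, with respondents in the same order.
    """
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # each respondent's total
    sum_item_variances = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses of five people to three items.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 5, 5],
]
print(f"alpha = {cronbach_alpha(items):.3f}")
```

Alpha rises when items covary strongly (the total-score variance is large relative to the summed item variances) and when more items are added, which matches the two determinants listed above.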
What is measurement error? What are common sources of measurement error?
Measurement error is anything that causes inconsistency in scores. Random response error: caused by momentary variations in attention or mental processing. Question-specific by person. Specific factor error: caused by some element of the measurement situation. Something about the measure, not the person. Transient error: error caused by temporal variations in the respondents' mood, feelings, or motivation. If the construct is stable, changes over time are transient error. If the construct changes over time, transient error is harder to assess. Rater error: caused by the rater's unique perceptions of the construct being rated. Inconsistencies among raters.
What is a mediator variable? How does one test for the presence of a mediator variable? What are direct, indirect, and total effects in the context of mediation analyses?
Mediation occurs when the relationship between an independent variable and a dependent variable can be accounted for by a third variable. Indirect effect: the effect of x on y that runs through the mediator (x has an effect on z and z on y); there can be an indirect effect even when there is no direct effect from x to y. Direct effect: the effect of x on y that remains when the mediator is controlled. Total effect: the sum of the direct and indirect effects. Full mediation: there is an effect of x on y, but when one includes the mediator the relationship between x and y goes away. Partial mediation: x leads to y, x leads to z, and z leads to y; when the mediator is controlled, the relationship between x and y is reduced but does not go away. Testing for mediation: 1. Test x on y: if n.s., there can only be an indirect effect. 2. Test x on z. 3. Test z on y. 4. Test x and z on y. 5. If using SEM, test the full model (ask for the indirect effect from analysis properties in AMOS). 6. If using Baron and Kenny (1986), do a Sobel test or bootstrap. 7. Test the significance of the indirect effect (bootstrap).
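The Sobel test mentioned in step 6 can be sketched from the two path coefficients and their standard errors. The function name and the coefficient values are hypothetical; a is the x-to-z path and b is the z-to-y path controlling for x. The Sobel test assumes the indirect effect is normally distributed, which is one reason bootstrapping is often preferred.

```python
from math import sqrt
from statistics import NormalDist


def sobel_test(a, se_a, b, se_b):
    """Sobel z test of the indirect effect a*b (first-order standard error)."""
    indirect = a * b
    se_indirect = sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = indirect / se_indirect
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
    return indirect, z, p


# Made-up estimates: a = x -> z path, b = z -> y path (controlling for x).
indirect, z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
direct = 0.15              # hypothetical x -> y path with z controlled
total = direct + indirect  # total effect = direct + indirect

print(f"indirect = {indirect:.2f}, z = {z:.2f}, p = {p:.4f}")
print(f"direct = {direct:.2f}, total = {total:.2f}")
```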
What are missing data? When can missing data be a problem? What steps can researchers take to assess whether missing data are a problem?
Missing data, or missing values, occur when no data value is stored for a variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. Missing data can occur because of nonresponse: no information is provided for one or more items or for a whole unit ("subject"). Attrition ("dropout") is a type of missingness that can occur in longitudinal studies when participants drop out before the study ends and one or more measurements are missing. Missingness takes different forms, with different impacts on the validity of conclusions from research: missing completely at random, missing at random, and missing not at random. Values in a data set are missing completely at random (MCAR) if the events that lead to any particular data item being missing are independent both of observable variables and of unobservable parameters of interest, and occur entirely at random. When data are MCAR, the analysis performed on the data is unbiased; however, data are rarely MCAR. Missing at random (MAR) occurs when the missingness is not random, but can be fully accounted for by variables on which there is complete information. MAR is an assumption that is impossible to verify statistically; we must rely on its substantive reasonableness. An example is that males are less likely to fill in a depression survey, but this has nothing to do with their level of depression after accounting for maleness. Missing not at random (MNAR) (also known as nonignorable nonresponse) describes data that are neither MAR nor MCAR. To extend the previous example, this would occur if men failed to fill in a depression survey because of their level of depression. One can deal with missingness in two ways: deletion or imputation.
What is multicollinearity? How can it be detected in a multiple regression context and what problems can it pose?
Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. That is, two or more predictors are highly correlated and explain the same variance. This causes the coefficient estimates to change erratically from sample to sample and produces higher standard errors, which lowers statistical power. At the limit, perfect multicollinearity makes the regression model impossible to estimate. One can detect multicollinearity by looking at the correlations among predictors and by calculating Variance Inflation Factors (or their inverse, known as tolerance), which are based on the amount of variance in each predictor variable that is explained by the other predictor variables.
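The link between VIF, tolerance, and shared variance is a one-line formula. In the sketch below, r_squared is assumed to be the R-squared obtained from regressing one predictor on all the other predictors; the function names and example values are illustrative.

```python
def tolerance(r_squared):
    """Share of a predictor's variance NOT explained by the other predictors."""
    return 1 - r_squared


def vif(r_squared):
    """Variance inflation factor: the inverse of tolerance."""
    return 1 / tolerance(r_squared)


# r_squared values are hypothetical results of regressing one predictor
# on the remaining predictors.
for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R2 = {r2:.1f}  tolerance = {tolerance(r2):.2f}  VIF = {vif(r2):.1f}")
```

A predictor that is unrelated to the others has a VIF of 1, and the VIF grows without bound as the predictor becomes perfectly predictable from the others.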
What are multilevel models? What are some of the basic questions that can be answered with multilevel analyses such as HLM?
Multilevel models are used when the data have a multilevel structure, such that observations at the lower level are nested within groups (e.g., individuals within teams). Multilevel models allow you to test hypotheses at different levels simultaneously, controlling for factors at different levels. An example question that can be answered is how the number of people in a team affects individual performance, depending on the individual's personality.
What is a nomological network?
A nomological network is a representation of the concepts (constructs) of interest in a study, their observable manifestations, and the interrelationships among and between these. Validity evidence based on nomological validity is a form of construct validity. It is the degree to which a construct behaves as it should within a system of related constructs (the nomological network). The elements of a nomological network are: (1) at least two constructs; (2) theoretical propositions, specifying linkages between constructs; (3) correspondence rules, allowing a construct to be measured; (4) empirical constructs or variables that can actually be measured; (5) empirical linkages: hypotheses before data collection, empirical generalizations after data collection.
What is the difference between an omitted variable and a noisy variable? What are the consequences of excluding omitted and noisy variables from regression or structural equation models?
Omitted variables are variables that should have been included but weren't. Leaving them out may lead to omitted variable bias, which occurs when a statistical model incorrectly leaves out one or more relevant variables. The bias results in the model attributing the effect of the missing variables to the estimated effects of the included variables. Noise variables are difficult or impossible to control at the design and production level, but can be controlled at the analysis level. Noisy variables are factors that could possibly, but not plausibly, affect the dependent variable. Noise variables can be disregarded by ensuring that the selection of problems used is not biased.
What is range restriction? How can range restriction affect relations between variables?
Restriction of range is the term applied to the case in which observed sample data are not available across the entire range of interest. The most common case is that of a bivariate correlation between two normally distributed variables, one of which has a range less than that commonly observed in the population as a whole. In such cases the observed correlation in the range-restricted sample will be attenuated (lower than it would be if data from the entire possible range were analyzed).
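The attenuation can be demonstrated with a small simulation. Everything here is an illustrative assumption: the data are simulated bivariate normal scores with a population correlation of 0.6, and "restriction" is modeled as keeping only cases with an above-average x (as might happen if only high scorers on a selection test are hired and later observed).

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(42)
rho = 0.6  # assumed population correlation
pairs = []
for _ in range(5000):
    x = rng.gauss(0, 1)
    y = rho * x + sqrt(1 - rho**2) * rng.gauss(0, 1)
    pairs.append((x, y))

# Full range versus a sample restricted to the upper half of x.
r_full = pearson_r([x for x, _ in pairs], [y for _, y in pairs])
selected = [(x, y) for x, y in pairs if x > 0]
r_restricted = pearson_r([x for x, _ in selected], [y for _, y in selected])

print(f"full-range r = {r_full:.2f}")
print(f"range-restricted r = {r_restricted:.2f}")
```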
What is sampling error?
Sampling error is incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics on the sample, such as means and quantiles, generally differ from the characteristics of the entire population, which are known as parameters.
What are skewness and kurtosis?
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. A symmetrical dataset will have a skewness equal to 0, so a normal distribution has a skewness of 0. Skewness essentially measures the relative size of the two tails. The skewness value can be positive, negative, or undefined. Negative skew: the left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed, left-tailed, or skewed to the left, despite the fact that the curve itself appears to be skewed or leaning to the right; left instead refers to the left tail being drawn out and, often, the mean being skewed to the left of a typical center of the data. Positive skew: the right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed, right-tailed, or skewed to the right, despite the fact that the curve itself appears to be skewed or leaning to the left; right instead refers to the right tail being drawn out and, often, the mean being skewed to the right of a typical center of the data. Kurtosis is a measure of the combined sizes of the two tails. It measures the amount of probability in the tails. The value is often compared to the kurtosis of the normal distribution, which is equal to 3. If the kurtosis is greater than 3, then the dataset has heavier tails than a normal distribution (more in the tails).
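Both statistics are standardized moments and can be computed by hand. The sketch below uses the simple population (moment) formulas, not the bias-corrected versions that some software reports, and returns raw kurtosis (normal = 3) rather than excess kurtosis; the function name and data are made up.

```python
from statistics import mean


def skewness_and_kurtosis(data):
    """Moment-based skewness (m3 / m2^1.5) and raw kurtosis (m4 / m2^2)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m3 / m2**1.5, m4 / m2**2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # one large value draws out the right tail

print("symmetric:", skewness_and_kurtosis(symmetric))
print("right-skewed:", skewness_and_kurtosis(right_skewed))
```

The symmetric data give a skewness of zero, while the data with one large value give a positive skewness, matching the "right tail is longer" description.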
What is the difference between statistical significance and effect size or practical significance? What are some common effect size benchmarks or points of reference?
Statistical significance is the level of confidence with which the null can be rejected. Effect size is how strong the relationship is between two variables in the sample. The two are related because, all else equal, a larger effect is more likely to be significant. However, a sample may show a very strong relationship between two variables and yet, because the sample is small or variability in the relationship is high, the statistic is not significant. Conversely, in a very large sample even a trivially small effect can be statistically significant. Effect size benchmarks are usually expressed as correlations, where a small effect is a correlation of about 0.1, a medium effect is a correlation of 0.3, and a large effect is a correlation of 0.5. An alternative way to assess effect size is the d statistic, where a d of 0.2 is small, a d of 0.5 is medium, and a d of 0.8 is considered large.