Research Methods Study Guide
• The more statistical tests you run, the greater the chance of getting a spurious significant result. • You adjust for multiple comparisons when you run multiple statistical tests. • If you do not correct for multiple comparisons, you may increase the number of false positives (type 1 error). • If you do correct for multiple comparisons, you may increase the number of false negatives (type 2 error). • False negatives can be very costly. If this is the case (ex: pharmaceuticals), do not try to control for multiple comparisons.
Analytically, what is "multiple comparisons"? When would you adjust for multiple comparisons and what are some concerns if you do and do not adjust for multiple comparisons?
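The multiple comparisons card above can be illustrated with a small sketch. It assumes the tests are independent, so the chance of at least one false positive is 1 - (1 - alpha)^m, and it uses the Bonferroni correction (alpha / m) as one common adjustment; the function names are made up for this example.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test alpha under the Bonferroni correction."""
    return alpha / m


# The uncorrected error rate climbs quickly as the number of tests grows,
# while the corrected per-test threshold becomes stricter.
for m in (1, 5, 20):
    print(m, round(familywise_error_rate(0.05, m), 3), bonferroni_alpha(0.05, m))
```

The stricter per-test threshold is what raises the risk of false negatives (type 2 error).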
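a. Standard deviation measures how spread out the values in a data set are. b. Standard error measures how far the sample mean is likely to be from the true population mean; it equals the standard deviation divided by the square root of the sample size. c. The standard deviation describes variability among individual observations, whereas the standard error describes the precision of an estimate and shrinks as the sample size grows.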
Compare and contrast standard deviation versus standard error of measurement.
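A minimal sketch of the SD versus SE card above, assuming the standard error in question is the standard error of the mean (SD divided by the square root of n). The scores are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
sd = statistics.stdev(scores)        # spread of the individual scores
se = standard_error_of_mean(scores)  # precision of the sample mean
print(round(sd, 3), round(se, 3))
```

With more observations the SD stays roughly the same, but the SE gets smaller.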
A test is valid if it measures what it is supposed to measure. Reliability is another term for consistency: does the test give the same result when the measurement is repeated? A measurement may be reliable but not valid, but a test cannot be valid unless it is reliable.
Contrast reliability with validity.
-Sensitivity: proportion of people with the disease who have a positive result; how well a test identifies patients who have the disease. With a = true positives and b = false negatives, Sensitivity = a / (a+b). -Specificity: proportion of people without the disease who have a negative result; how well a test identifies patients who do not have the disease. With c = false positives and d = true negatives, Specificity = d / (c+d). -Area Under the Curve (AUC): the area under the ROC curve, which summarizes how well a test distinguishes between two diagnostic groups across all cut-off values (0.5 = chance, 1.0 = perfect). -Eval: A test that is 100% sensitive will identify all patients who have the disease. A test that is 100% specific will correctly rule out all patients who do not have the disease. AUC is used to determine which model predicts best.
Contrast sensitivity, specificity and AUC. Give an example of how these metrics can be used to evaluate the validity of a diagnostic test.
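The three metrics in the card above can be sketched as follows. The counts and test scores are hypothetical, and the AUC is computed here by its pairwise interpretation (the chance that a randomly chosen diseased patient scores higher than a randomly chosen healthy one), which is one of several equivalent ways to obtain it.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of diseased patients who test positive: a / (a + b)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of healthy patients who test negative: d / (c + d)."""
    return true_neg / (true_neg + false_pos)


def auc(diseased_scores, healthy_scores) -> float:
    """Share of diseased/healthy pairs ranked correctly (ties count half)."""
    wins = sum(
        (sick > well) + 0.5 * (sick == well)
        for sick in diseased_scores
        for well in healthy_scores
    )
    return wins / (len(diseased_scores) * len(healthy_scores))


# Hypothetical 2x2 table: a = 90, b = 10, c = 20, d = 80
print(sensitivity(90, 10))  # 0.9
print(specificity(80, 20))  # 0.8
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```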
• Test-retest: administering the same test twice over a period of time to the same group of individuals and checking that the scores agree. • Internal consistency: the degree to which different test items that are intended to measure the same general construct produce similar scores.
Describe different types of reliability (e.g., test-retest, internal consistency).
• Construct: how well a test or tool measures the construct that it was designed to measure. • Face: the degree to which a procedure, especially a psychological test or assessment, appears effective in terms of its stated aims. • Content: the extent to which a measure represents all facets of a given construct. • Criterion-related validity, which includes: Concurrent (the measure agrees with an established criterion, such as a previous test, measured at the same time) and Predictive (does X predict Y?). • Convergent: does my item correlate with something it should correlate with? • Discriminant: is my item uncorrelated with something it should not correlate with?
Describe different types of validity (e.g., construct validity, face validity, convergent validity, discriminant validity).
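-Outliers can skew the data, especially the mean and SD. -Identify them with data plots (box plots and scatter plots), distance from the mean in SD units, or Tukey's fences (1.5 × IQR). -Remove them or change them, and state what you did when reporting the data.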
Discuss the implications of having outliers in your data. Indicate how to identify whether it is an outlier or not. Then indicate what you would do if you identified an outlier in your data.
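One way to flag outliers for the card above is sketched below. It assumes "Tukey's" refers to Tukey's fences (values more than 1.5 × IQR beyond the quartiles), and the data are hypothetical. Quartile definitions differ between software packages; this sketch uses the "inclusive" method of Python's statistics module.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 12, 13, 14, 15, 15, 16, 40]  # hypothetical measurements
print(tukey_outliers(data))  # [40]
```

Whatever rule is used, the flagged values and how they were handled should be reported.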
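When the omnibus F test is statistically significant: the ANOVA only shows that at least one group mean differs, so a post-hoc test is used to find out where the differences occur (which specific groups differ).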
For a one-way ANOVA, when would you employ a post-hoc test and why would you employ this test?
Increasing the sample size (n) will not decrease the p-value (leading to "better" results) if, at the same time, the absolute difference between the sample mean (X) and the comparison constant decreases and the standard deviation increases.
How might a larger n not lead to "better" results?
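The reasoning in the card above can be read off the test statistic. Assuming a one-sample t-test, with the comparison constant written as \mu_0 and the sample standard deviation as s (symbols chosen for this sketch):

```latex
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}
```

A larger n shrinks the denominator, but if the difference in the numerator gets smaller or s gets larger at the same time, t does not grow and the p-value does not fall.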
R-squared is the proportion of the variance in the DV that is predictable from the IVs in the model. Example: if the correlation between smoking and lung cancer is r = .9, then r squared = .81, so 81% of the variance in cancer can be explained by variance in smoking.
In a multivariable linear regression, what information does r-squared provide and give an example of how you would interpret a specific r-squared value?
You would use the SEM to determine the precision of the measurement instrument. The smaller the value, the more precise the instrument is. -Ex: Aerobic vs. stretching intervention. Body fat is 13% before the intervention and 11% after it, with SEM = 3%. Was the reduction driven by the intervention? Probably not, because the 2% change is less than the SEM and could be measurement error.
Provide an example of how you would use the standard error of measurement when interpreting the results of an intervention.
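The body fat example in the card above can be checked with a short sketch. It assumes the usual formula SEM = SD × sqrt(1 - reliability); the SD of 6 and reliability of 0.75 are hypothetical values chosen so that the SEM comes out at the card's 3%.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


def change_exceeds_sem(pre: float, post: float, sem: float) -> bool:
    """True only if the observed change is larger than the SEM."""
    return abs(post - pre) > sem


sem = standard_error_of_measurement(sd=6.0, reliability=0.75)  # 3.0
print(change_exceeds_sem(pre=13.0, post=11.0, sem=sem))  # False
```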
1. Show that the causal variable is correlated with the outcome. This step establishes that there is an effect that may be mediated. 2. Show that the causal variable is correlated with the mediator. This step essentially involves treating the mediator as if it were an outcome variable. 3. Show that the mediator affects the outcome variable while controlling for the causal variable. 4. To establish that M completely mediates the X-Y relationship, the effect of X on Y controlling for M (path c') should be zero.
Provide the 4-step Baron and Kenny mediation approach.
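The four steps in the card above are usually written as three regression equations. The path labels a, b, and c follow the common convention (only c' appears in the card itself), and the intercepts i and error terms e are notation assumed for this sketch:

```latex
\begin{aligned}
\text{Step 1: } & Y = i_1 + cX + e_1 \\
\text{Step 2: } & M = i_2 + aX + e_2 \\
\text{Steps 3 and 4: } & Y = i_3 + c'X + bM + e_3
\end{aligned}
```

Step 3 tests path b and step 4 tests path c' in the same equation. In linear models the total effect splits as c = c' + ab, so complete mediation corresponds to c' = 0.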
• NEVER draw a conclusion based merely on a p-value. • In addition to p-values, report both descriptive statistics (mean and SD) of the variables being tested and the values of the test statistics themselves (t or F values). • Always include "statistically" as a prefix when using the word "significant" to describe a p-value-based finding. • Compute and report the CI. • Make a judgment on the meaning of the correlation coefficient or compute the coefficient of determination. • For other statistical tests, report the effect size. • Explain the effect size in the context of "clinical/practical" significance and link each unit of change in a DV with its real-life meaning.
What are 7 things to always do with hypothesis testing?
Reliability = the consistency of measurements. -Intrasubject variability: variability within each subject; the range of possible values for any measurable characteristic of the subject. Ex: when testing the reliability of a vertical height test, the instrument doesn't change, so differences between trials come from the subject. -Inter-rater variability: lack of agreement between raters. Inter-rater reliability refers to the degree to which different raters give consistent estimates of the same behavior. -Intra-rater variability: lack of agreement within a single rater. Intra-rater reliability refers to the degree of agreement among multiple repetitions of a diagnostic test performed by a single rater.
What are factors that influence reliability (discuss intrasubject variability, inter-rater variability, intra-rater variability)?
Nominal, ordinal, and continuous (interval and ratio)
What are several different levels of measurement?
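a. Mean: the average of all numbers within a set; it is sensitive to extreme values. b. Median: the middle number in a set once the values are sorted; it is less affected by extreme values. c. Mode: the number in a set that occurs the most.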
What are several measures of central tendency and what is the difference between them?
a. One concern is that a p-value will be treated as the sole determinant of significance, regardless of the limitations of the data set. A p-value of 0.051 set against ɑ=0.05 is said to be not statistically significant; however, that doesn't necessarily mean that the results have no meaning or weight, and vice versa. b. The p-value is highly dependent on sample size, and this can translate into type 1 and type 2 errors.
What are several potential concerns with "p-values"?
ADV: 1-Power is often increased 2-Fewer participants generally needed DIS: 1-Carryover effects 2-Practice effects
What are two advantages and two disadvantages of a one-way repeated measures ANOVA?
The direction (positive or negative) and the magnitude (strength) of the relationship between two or more characteristics. Correlation does not always equal causation, although a correlated relationship can sometimes be causal.
What are two main things we can learn from a correlation?
P-value hacking is a way of "torturing" the data: investigators run different forms of analysis on a data set until they find results that suit them. A classic example of p-value hacking is the practice of identifying subsets of data that lead to significant findings. The findings for these subsets are then reported as if they were the central question of the study. The concern is that running many unplanned analyses inflates the chance of false-positive (type 1 error) findings.
What is "p-value" hacking and what are the potential concerns with this?
• A codebook specifies how the collected information will be entered into a computer database and how anticipated data problems, such as missing data, will be handled.
What is a "codebook" and give an example of its importance and why it may be problematic to not use a codebook?
a. A 95% confidence interval, assuming a normally distributed population, is a range computed from the sample such that, if the study were repeated many times, about 95% of the intervals built this way would contain the true mean. If the population (null) value being compared falls outside of the study's 95% CI, then there is a statistically significant difference from that value given ɑ=0.05. b. An example of this would be
What is a 95% CI and how can you use this to determine whether statistical significance (at the 0.05 alpha level) has been reached? Give an example.
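A hedged illustration of the CI card above: the sketch below builds a 95% CI for a mean using the normal approximation (z of about 1.96) and checks whether a comparison value falls outside it. The data are hypothetical, and for a sample this small a t-based interval would normally be preferred; the normal version is used only to keep the example short.

```python
import math
import statistics


def mean_ci_95(values):
    """Approximate 95% CI for the mean (normal approximation)."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return mean - z * se, mean + z * se


def significant_at_05(values, null_value):
    """Significant at alpha = 0.05 if the null value lies outside the CI."""
    low, high = mean_ci_95(values)
    return not (low <= null_value <= high)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(mean_ci_95(sample))
print(significant_at_05(sample, 5))  # False: 5 is inside the interval
print(significant_at_05(sample, 8))  # True: 8 is outside the interval
```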
ANCOVA evaluates whether the means of a DV are equal across levels of a categorical IV, often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates. Example = comparing BMI across races when you think age is a confounding variable, so you control for age.
What is an ANCOVA and give an example of when you would use it?
Essentially a non-parametric alternative to a one-way ANOVA. Used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. Example = do test scores differ in different elementary school grades?
What is a Kruskal Wallis test and give an example of when you would use it?
Non-parametric test for two paired nominal (dichotomous) variables. Example = was there a change in participants' status before treatment (yes/no) and after treatment (yes/no), summarized in a 2x2 table.
What is a McNemar's test and give an example of when you would use it?
Tests observed results against a null hypothesis of no association between two categorical variables, comparing the observed counts with the counts expected by chance. Example = testing whether the category of one variable is associated with the category of another variable.
What is a Pearson chi-square test and give an example of when you would use it?
Non-parametric version of the dependent (paired) t-test. Used to compare two sets of scores from the same people, such as before and after a period of time or an intervention. Example = measure smokers' cigarette consumption before and again after a cessation intervention.
What is a Wilcoxon signed rank sum test and give an example of when you would use it?
Non-parametric test, so it does not rely on the three assumptions of a parametric test. It compares the outcomes between two independent groups, for example by comparing medians between the two groups. Example = give one asthma group a placebo and one group a drug, then see if the two groups have different numbers of asthma episodes.
What is a Wilcoxon-Mann-Whitney test and give an example of when you would use it?
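• Tests how likely it is that the observed distribution of a categorical variable differs from an expected distribution only by chance. • Example: test whether the observed numbers of students who pass and fail an exam match an expected pass rate. (Testing whether attending vs. skipping class is related to passing vs. failing uses the related chi-square test of independence.)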
What is a chi-square goodness of fit test and give an example of when you would use it?
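The goodness of fit statistic from the card above is the sum of (observed - expected)^2 / expected over the categories. A minimal sketch with hypothetical counts (30 students pass, 20 fail, against an expected 50/50 split):

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness of fit statistic for one categorical variable."""
    total = sum(observed)
    expected = [p * total for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


stat = chi_square_gof([30, 20], [0.5, 0.5])  # hypothetical pass/fail counts
print(stat)  # 2.0

# With 2 categories there is 1 degree of freedom; the critical value at
# alpha = 0.05 is 3.841, so 2.0 is not statistically significant.
print(stat > 3.841)  # False
```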
• Examines whether there is a linear relationship between one predictor variable and the outcome variable. • It is used when the outcome variable is a ratio or interval variable.
What is a linear regression and give an example of when you would use it?
• Used when the outcome variable is dichotomous. • Commonly used in case-control studies, for which the outcome variable is usually case status, coded as case = 1 and control = 0.
What is a logistic regression and give an example of when you would use it?
Used to explain the relationship between one nominal DV and one or more continuous-level (interval or ratio scale) IVs. The DV has 3+ levels or categories. Example = type of drink preferred (multiple drinks, DV) and age (IV).
What is a multinomial logistic regression and give an example of when you would use it?
Tests whether an observed proportion differs from an expected proportion. Example = a class is 60% female; is that significantly greater than the percentage of females at the entire university (52%)?
What is a one sample binomial test and give an example of when you would use it?
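The class example in the card above can be worked through with an exact binomial calculation. The class size of 50 (so 30 women) is an assumption made for this sketch, since the card gives only percentages, and the test is one-sided ("greater than").

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical class: 30 of 50 students are female (60%), compared with
# the university-wide proportion of 52%.
p_value = binomial_upper_tail(k=30, n=50, p=0.52)
print(p_value > 0.05)  # True: not statistically significant at alpha = 0.05
```

With a class this small, 60% versus 52% is not enough evidence of a difference; a much larger class with the same percentages could give a different answer.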
Tests a hypothesized median value against the observed median value in a representative sample. Example = is the median test score of a sample (ex: a high school) equal to the national median test score?
What is a one sample median test and give an example of when you would use it?
• A one sample t-test compares the mean of a continuously expressed outcome variable to a single value. Example = class BMI vs. entire university BMI: you compare all your data to 1 value.
What is a one sample t-test and give an example of when you would use it?
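A sketch of the class BMI example from the card above. The five BMI values and the university mean of 24 are hypothetical; the function returns the t statistic and degrees of freedom, which would then be compared against a t table or passed to statistical software for the p-value.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1


class_bmi = [22, 24, 26, 28, 30]  # hypothetical class data, mean = 26
t_stat, df = one_sample_t(class_bmi, mu0=24)  # 24 = hypothetical university mean
print(round(t_stat, 3), df)  # 1.414 4
```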
Used to determine if there are statistically significant differences between the means of 2+ groups based on one dimension or treatment. Example = comparing BMI across 5 different races.
What is a one-way ANOVA and give an example of when you would use it?
Used to compare three or more group means where the participants are the same in each group. Participants are measured multiple times to see changes in response to an intervention, or are subjected to more than one condition/trial so that the responses to these conditions can be compared. Example = assess BMI at four different times in a month.
What is a one-way repeated measures ANOVA and give an example of when you would use it?
Compares the means of two related measurements when the outcome variable is continuously expressed. Example = same participants, tested at 2 different times (pre-post test).
What is a paired t-test and give an example of when you would use it?
• Measures the strength of a relationship between two variables while controlling for the effect of one or more other variables, i.e., taking a third variable out of the relationship between two variables. • For example, you might want to see if there is a correlation between amount of food eaten and blood pressure, while controlling for weight or amount of exercise. • Another example: the relationship between balding and cardiovascular disease after removing the effects of age on both balding and CVD.
What is a partial correlation and give an example of when you would use it?
• Removing the influence of a third variable on only one of the two variables in a relationship. • Partial correlation holds variable X3 constant for both of the other two variables, whereas semi-partial correlation holds variable X3 constant for only one variable (either X1 or X2). • Example: removing the effects of age on only one variable (i.e., balding).
What is a semi-partial correlation and give an example of when you would use it?
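The difference between the partial and semi-partial correlation cards above shows up in the denominators of their formulas. The sketch below uses the standard first-order formulas; the three input correlations (for instance balding-CVD, balding-age, and CVD-age) are hypothetical numbers.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z removed from BOTH x and y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def semipartial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of y with x after z is removed from x ONLY."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz**2)


# Hypothetical values: x = balding, y = CVD, z = age
print(partial_correlation(0.5, 0.6, 0.8))
print(semipartial_correlation(0.5, 0.6, 0.8))
```

In this made-up case most of the raw correlation of 0.5 disappears once age is accounted for.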
Compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different. (Parametric Test)
What is a two-independent samples t-test and give an example of when you would use it?
Used when the outcome variable is expressed in rank order: the DV is ordinal and the IVs can be categorical, ordinal, or continuous. Example = does gender influence scores on a happiness scale?
What is an ordinal regression and give an example of when you would use it?
Conditional process analysis is an analysis that is used to understand the contingencies of the mechanisms that produce effects; used when mediation and moderation are present
What is conditional process analysis and give an example of when you would evaluate it?
Isotemporal substitution modeling estimates the effect of replacing one physical activity type with another physical activity type for the same amount of time.
What is iso-temporal substitution modeling? Discuss it in the context of regression as well as demonstrate it via mathematically (equation).
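In regression terms, the isotemporal substitution model described in the card above is commonly written by entering total time plus every activity type except the one being replaced. The activity categories below (sedentary, light, MVPA) are illustrative assumptions, not taken from the card:

```latex
Y = \beta_0 + \beta_1\,\text{MVPA} + \beta_2\,\text{Light} + \beta_3\,\text{Total} + \varepsilon,
\qquad \text{Total} = \text{Sedentary} + \text{Light} + \text{MVPA}
```

Because total time is held constant and sedentary time is left out, \beta_1 estimates the effect of replacing one unit of sedentary time with one unit of MVPA, and \beta_2 the effect of replacing it with light activity.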
Assesses the effect of the IV on the DV through multiple mediators. • You would evaluate it when you have 2+ mediators that aren't related to each other. • Example: a diamond-shaped model of noise sensitivity, perceived stress, depression complaints, and sleep problems.
What is parallel mediation and give an example of when you would evaluate it?
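Regression dilution bias is the weakening (bias toward zero) of an estimated regression slope that occurs when measurements of the predictor fluctuate unpredictably around their true values. The fluctuation is caused by imprecise measurement tools, true biological variability, or both. One correction is to divide the observed slope by the reliability of the measurement, estimated from repeat measurements.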
What is regression dilution bias? What is a method to help correct regression dilution bias?
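For the regression dilution card above, the size of the bias and one standard correction can be written out under the classical measurement error model (random error in a single predictor, unrelated to the true value). The symbol \lambda, often called the reliability ratio, is notation assumed for this sketch:

```latex
\hat{\beta}_{\text{observed}} \approx \lambda\,\beta_{\text{true}},
\qquad
\lambda = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}},
\qquad
\hat{\beta}_{\text{corrected}} = \frac{\hat{\beta}_{\text{observed}}}{\lambda}
```

Since \lambda is between 0 and 1, the observed slope is pulled toward zero; \lambda can be estimated from repeat measurements on a subsample.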
Multiple mediators exist and are linked in a causal chain: X influences A, A influences B, and B influences Y. Example = pain related to depression with cognitive functions and perceptions of stress as mediators.
What is serial mediation and give an example of when you would evaluate it?
Used to analyze the structural relationships between measured variables and latent constructs. It combines a measurement component with simultaneous regression equations and shows whether an association is direct or indirect. Used to assess unobservable constructs. Example = whether intelligence (a latent construct) can predict academic performance (SAT, ACT, etc.).
What is structural equation modeling and give an example of when you would use it?
Pearson correlation measures the linear relationship between two continuous variables. Spearman rho measures the rank-based (monotonic) relationship between continuous or ordinal variables.
What is the difference between a Spearman Rho and a Pearson correlation?
• Parametric: makes assumptions about the parameters. Non-parametric: does not make assumptions about the parameters. • 3 main assumptions: 1. The population from which the sample is drawn is normally distributed on the variable of interest. 2. The samples drawn from a population have the same variances on the variable of interest. 3. The observations are independent.
What is the difference between a parametric test and a non-parametric test and provide the 3 main assumptions of a parametric test.
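• Equivalence testing is used when you want to know if a treatment or service has the same effect but at less cost (or is safer or more convenient). You're looking to see if the new treatment is equivalent or superior to the current method. • In a traditional comparative analysis for an RCT, the burden of proof is on the research hypothesis of a difference between treatments/therapies. • In equivalence testing the burden of proof is reversed: the null hypothesis is that the treatments differ by at least a prespecified margin, and the research hypothesis is that they do not.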
What is the difference between a traditional comparative analysis for an RCT versus equivalence testing? Demonstrate via narration and visually.
a. Alpha is the significance threshold chosen by the researcher before the analysis, by default generally set at 0.05. b. The p-value is calculated from the study data and is compared against the alpha value to determine whether or not the result is statistically significant.
What is the difference between alpha and a p-value?
-Complete: inclusion of the mediating variable drops the relationship between the independent variable and dependent variable (path c) to zero. This rarely occurs. **X causes Y because of Z (without Z, no relationship b/n X and Y). -Partial: the mediating variable accounts for some, but not all, of the relationship between the IV and DV. This is the most likely event: path c becomes weaker, yet still significant, with the inclusion of the mediator. There is not only a significant relationship between the mediator and the dependent variable, but also some direct relationship between the independent and dependent variable. **X causes Y because of Z and something else (some relationship b/n X and Y w/o Z). • Based on the Baron and Kenny approach, if all four steps are met, then complete mediation has been achieved. If only the first three steps are met, it results in partial mediation.
What is the difference between complete mediation and partial mediation? How do you know if you have achieved complete and partial mediation based on the 4-step Baron and Kenny approach?
Moderation checks whether a third variable influences the strength or direction of the relationship between an independent and dependent variable. Mediation checks whether a third variable explains the relationship: the independent variable affects the mediator, which in turn affects the dependent variable.
What is the difference between mediation and moderation and give an example of when you would evaluate these?
a. Skewness: a measure of the asymmetry of a distribution. A normal distribution has a skewness of 0. b. Kurtosis: the degree of peakedness and tail heaviness of a distribution. A normal distribution has a kurtosis of 3 (excess kurtosis of 0).
What is the difference between skewness and kurtosis?
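Both quantities in the card above can be computed from the central moments of the data. The sketch below uses the simple population-moment versions (statistical packages often apply small-sample corrections, so their output can differ slightly), and the data sets are hypothetical.

```python
import statistics


def _central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3: 0 for a normal distribution."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


print(skewness([1, 2, 3, 4, 5]))       # 0.0 for symmetric data
print(skewness([1, 1, 1, 10]) > 0)     # True: long right tail
print(excess_kurtosis([1, 2, 3, 4, 5]))
```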