Regression Exam 3
Describe the steps involved in conducting a propensity score analysis in an observational study comparing students in one school given an active treatment (T) and students in a second school given a control treatment (C).
1. All identified covariates that might confound the treatment-outcome relationship are measured at baseline.
2. A logistic regression equation is used to estimate the probability that each participant would receive the treatment (the propensity score).
   o In brief, we estimate the log of the odds (the "logit") that person i is in the treatment group.
3. We identify cases in each group that we can closely match on the logit of their propensity score.
   o If cases are properly matched on propensity scores, they will be balanced within sampling error on all covariates involved in the prediction of the propensity score. This outcome mimics a randomized experiment, but only for measured covariates.
4. Each covariate is checked for balance.
5. The two matched groups are compared on the outcome.
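The steps above can be sketched on simulated data. This is a minimal illustration, not a definitive implementation: the data, the Newton-Raphson logistic fit, the greedy 1:1 matching rule, and the caliper of 0.2 SD of the logit are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
covs = rng.normal(0, 1, (n, 2))                       # baseline covariates (simulated)
true_logit = 0.8 * covs[:, 0] - 0.5 * covs[:, 1]
treated = rng.random(n) < 1 / (1 + np.exp(-true_logit))

# Step 2: logistic regression of treatment on the covariates (Newton-Raphson)
X = np.column_stack([np.ones(n), covs])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (treated.astype(float) - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
logit = X @ beta                                      # logit of the propensity score

# Step 3: greedy 1:1 nearest-neighbor matching on the logit, within a caliper
caliper = 0.2 * logit.std()                           # assumed caliper width
controls = list(np.flatnonzero(~treated))
pairs = []
for t in np.flatnonzero(treated):
    if not controls:
        break
    dists = np.abs(logit[controls] - logit[t])
    j = int(np.argmin(dists))
    if dists[j] <= caliper:
        pairs.append((t, controls.pop(j)))            # each control is used once

# Step 4: balance check, standardized mean difference on each covariate
t_idx = np.array([pair[0] for pair in pairs])
c_idx = np.array([pair[1] for pair in pairs])
smd_before = (covs[treated].mean(0) - covs[~treated].mean(0)) / covs.std(0)
smd_after = (covs[t_idx].mean(0) - covs[c_idx].mean(0)) / covs.std(0)
```

Step 5 would then compare the outcome across the matched pairs. Treated cases with no control inside the caliper are left unmatched, which keeps the comparison inside the region where both groups have cases.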
What are the three stages of multiple imputation?
1. Imputation phase: Create multiple copies of the data, each with different imputed values for the missing values.
2. Analysis phase: Perform standard statistical analyses (e.g., MR) separately on each data set.
3. Pooling phase: Combine the collection of estimates and standard errors into a single set of results.
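The pooling phase can be illustrated with a short sketch. The answer above does not name a pooling formula; the code assumes the commonly used Rubin's rules, and the five estimates and standard errors are made-up numbers.

```python
import numpy as np

def pool_estimates(estimates, std_errors):
    """Pool one coefficient across m imputed data sets (Rubin's rules, assumed)."""
    q = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = len(q)
    pooled = q.mean()                        # pooled point estimate
    within = np.mean(se ** 2)                # average sampling variance
    between = q.var(ddof=1)                  # variance of the estimates across data sets
    total = within + (1 + 1 / m) * between   # total variance
    return pooled, np.sqrt(total)

# Hypothetical slopes and standard errors from m = 5 imputed data sets
est, se = pool_estimates([0.50, 0.55, 0.45, 0.52, 0.48],
                         [0.10, 0.11, 0.10, 0.12, 0.10])
```

The pooled standard error is larger than the average within-imputation standard error because it also carries the uncertainty due to the missing values.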
If you have unequal sample sizes in the groups in a data set, describe the two grand means that can be computed. Then explain how the unweighted effects codes versus the weighted effects codes measure discrepancies between the group means and these two different grand means. When do unweighted and weighted effects codes yield the same results?
1. Unweighted mean (sum of the group means / number of groups)
2. Weighted mean (sum of the group means, each multiplied by its group's sample size / sum of the sample sizes)
Unweighted effects coding compares each group mean with the unweighted mean of all the groups, while weighted effects coding compares each group mean with the weighted mean of all the groups. The two yield the same results when the sample sizes are equal in all groups (equal n).
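A small numeric sketch of the two grand means, using made-up group means and sample sizes:

```python
import numpy as np

group_means = np.array([10.0, 20.0, 30.0])   # hypothetical group means
group_sizes = np.array([10, 20, 70])         # hypothetical unequal n

unweighted_mean = group_means.mean()                           # 20.0
weighted_mean = np.average(group_means, weights=group_sizes)   # 26.0

# Discrepancies measured by the two kinds of effects codes
unweighted_effects = group_means - unweighted_mean
weighted_effects = group_means - weighted_mean

# With equal n the two grand means coincide
equal_n_mean = np.average(group_means, weights=[30, 30, 30])   # 20.0
```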
What is the general strategy taken by robust statistics to the problem of outliers?
Alternative "robust" estimators minimize the effect of outliers. The goal of the robust estimators is to produce less biased estimates with smaller standard errors than OLS regression. However, none of the robust alternatives work well in all cases.
What is meant by auxiliary variables? Describe an advantage of multiple imputation in its utilization of auxiliary variables.
Auxiliary variables are variables that are not of theoretical interest but that might predict missingness (e.g., distance to site). The imputation phase can be based on all available variables, including auxiliary variables, whereas the analysis phase is based only on the variables of theoretical interest. The advantage of using auxiliary variables is that they can make the MAR assumption more plausible (data that would otherwise be MNAR become closer to MAR). Additional terms can also be added to the imputation phase to represent curvilinear or interactive relationships.
To what problems does the inclusion of step 1 of Baron and Kenny's approach to mediation lead (testing the effect of X on Y)?
Baron and Kenny's step 1 is a combined test of the direct (c') and indirect (mediated, ab) effects. Including this step does not test the specific hypothesis of interest in mediation, i.e., H0: αβ = 0, where α is the value of a in the population and β is the value of b in the population (it does not directly test ab). Including step 1 can also decrease the statistical power of the test of mediation.
Suppose I have the regression equation Yhat = b0 + b1X + b2C + b3XC. In this regression equation, C is gender, X is height, Y is weight. Based on this regression model, how do I test the difference in weight for men and women who are 68 inches tall?
Center the data at X (height) = 68 and apply dummy coding/contrast coding to the gender variable, then run the regression model. The b2 coefficient gives the difference in intercepts/value of weight between men and women at X = 68 inches.
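The effect of centering at 68 can be checked numerically. The sketch below uses simulated heights and weights and a dummy code of 1 = male, 0 = female; all of these are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
height = rng.normal(68, 3, n)                  # X, inches (simulated)
male = rng.integers(0, 2, n).astype(float)     # C, dummy code: 1 = male, 0 = female
weight = (-80 + 3.0 * height - 40 * male + 0.8 * height * male
          + rng.normal(0, 10, n))              # Y, pounds (simulated)

def fit(x, c, y):
    """OLS coefficients b0, b1, b2, b3 for Yhat = b0 + b1X + b2C + b3XC."""
    X = np.column_stack([np.ones_like(x), x, c, x * c])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_raw = fit(height, male, weight)              # b2 = gender gap at height 0
b_c68 = fit(height - 68, male, weight)         # b2 = gender gap at height 68

gap_at_68 = b_raw[2] + b_raw[3] * 68           # same quantity from the uncentered fit
```

The two routes give the same estimate, but only the centered model reports the standard error and t-test for the difference at 68 inches directly as the test of b2.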
In the equation: Yhat= b1 X + b2 C + b3 XC + b0 when using the contrast code versus Y hat= b1 X + b2 UE + b3 X*UE + b0 when using the unweighted effects codes explain how the numerical values of coefficients will change when you switch coding systems. Will the significance of the coefficients change when you change between these two coding systems?
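Change of numerical values: b0 and b1 will not change; b2 and b3 under unweighted effects codes (-1, +1) will be half the values they had under contrast codes (-.5, +.5), because the codes are two units apart rather than one. Will significance change? No. The t-test of each coefficient and the overall significance of the model stay the same.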
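A quick numerical check of this relationship on simulated data, assuming contrast codes of -.5 and +.5 and unweighted effects codes of -1 and +1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120
x = rng.normal(0, 1, n)
g = rng.integers(0, 2, n)                       # group membership 0/1 (simulated)
y = 2 + 0.5 * x + 1.0 * g + 0.8 * x * g + rng.normal(0, 1, n)

def fit(code):
    """OLS coefficients b0, b1, b2, b3 for a given coding of the two groups."""
    X = np.column_stack([np.ones(n), x, code, x * code])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_contrast = fit(np.where(g == 1, 0.5, -0.5))   # codes one unit apart
b_effects = fit(np.where(g == 1, 1.0, -1.0))    # codes two units apart
```

Because the effects code is exactly twice the contrast code, b2 and b3 halve, and their standard errors halve as well, so the t-values are unchanged.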
If the reference (base) group is changed for dummy, weighted, or unweighted effects codes, be able to explain how each of the coefficients would change
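Changing the reference (base) group would not change the value of b0 for weighted or unweighted effects coding, but would change it for dummy coding (in dummy coding b0 is the mean of the selected reference group, whereas in the other two schemes it is the grand mean). b1 and b2 would change for dummy coding, because each is the difference between a group mean and the mean of the new reference group. For weighted and unweighted effects coding, each coefficient is still the difference between a group's mean and the (weighted or unweighted) grand mean, so its value for a given group does not change; what changes is which group serves as the base group and therefore has no coefficient of its own.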
Given an ANCOVA model with 3 treatment groups and 1 (metric) covariate, how is the overall treatment effect tested in multiple regression? What is the df for the overall treatment effect? What is the df for the covariate?
Coding:
      C1  C2
T1     0   0
T2     0   1
C      1   0
Full model: Yhat = b0 + b1COV + b2D1 + b3D2 (D1 and D2 are the two code variables)
Reduced model: Yhat = b0 + b1COV
Overall treatment effect: compare the full model with the reduced model using the test of gain in prediction (df = m, n - p - 1). The df for the overall group effect is 2.
df for the covariate = 1
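The gain-in-prediction comparison can be sketched as follows. The data are simulated and the function names are hypothetical; the F formula used is the standard one based on the difference in R^2 between the full and reduced models.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    resid = y - Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def gain_in_prediction_F(X_full, X_reduced, y):
    n, p = X_full.shape                          # p predictors in the full model
    m = p - X_reduced.shape[1]                   # number of terms dropped
    r2_full, r2_red = r_squared(X_full, y), r_squared(X_reduced, y)
    F = ((r2_full - r2_red) / m) / ((1 - r2_full) / (n - p - 1))
    return F, (m, n - p - 1)

rng = np.random.default_rng(6)
n = 90
cov = rng.normal(0, 1, n)                        # metric covariate (simulated)
grp = np.repeat([0, 1, 2], n // 3)               # 0 = T1, 1 = T2, 2 = C
d1 = (grp == 2).astype(float)                    # first code variable: C = 1
d2 = (grp == 1).astype(float)                    # second code variable: T2 = 1
y = 1 + 0.5 * cov + 1.0 * d2 + rng.normal(0, 1, n)

F, df = gain_in_prediction_F(np.column_stack([cov, d1, d2]), cov[:, None], y)
```

With three groups and one covariate, the numerator df is 2 (the two code variables dropped) and the denominator df is n - 3 - 1.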
What is the solution to the problem of computing standardized residuals with data containing outliers? How is MSresidual(i) computed? I will refer to residuals divided by their deleted standard errors as "externally studentized" residuals.
Compute the externally studentized residual: (Y - Yhat)/SE, where the predicted score and the standard error both come from the analysis with the case removed. MSresidual(i) is computed by rerunning the regression with case i deleted and taking the MSresidual from that analysis.
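A brute-force sketch of this computation on simulated data. It refits the model once per deleted case to get MSresidual(i), then uses the algebraically equivalent form e_i / sqrt(MSresidual(i) * (1 - h_ii)); the function name and data are assumptions for illustration.

```python
import numpy as np

def externally_studentized(X, y):
    """X: (n, p) predictors without an intercept column; y: (n,) criterion."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    k = Xd.shape[1]                                  # number of coefficients
    H = Xd @ np.linalg.pinv(Xd.T @ Xd) @ Xd.T        # hat matrix
    h = np.diag(H)                                   # leverage values h_ii
    e = y - H @ y                                    # ordinary residuals
    t = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                     # drop case i
        b_i = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)[0]
        resid_i = y[keep] - Xd[keep] @ b_i
        ms_res_i = resid_i @ resid_i / (n - 1 - k)   # MSresidual(i)
        t[i] = e[i] / np.sqrt(ms_res_i * (1 - h[i]))
    return t

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 30)
y = 1 + 2 * x + rng.normal(0, 1, 30)
y[0] += 10                                           # plant one outlier on Y
t = externally_studentized(x[:, None], y)
```

Because the outlying case is excluded from its own MSresidual(i), it cannot inflate the denominator and hide itself.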
Be able to take a 2 x 3 ANOVA (two levels of factor A and three levels of factor B) and show the design matrix. Test the hypotheses that the mean of B1 does not differ from the mean of (B2 and B3) and that the means of B2 and B3 do not differ.
Correct coding scheme that is orthogonal and uses codes 1 unit apart for full interpretability (only the first two contrasts are shown, to match the hypotheses):
          C1    C2
A1, B1  -.67     0
A1, B2   .33   -.5
A1, B3   .33    .5
A2, B1  -.67     0
A2, B2   .33   -.5
A2, B3   .33    .5
With the regression equation Yhat = b0 + b1C1 + b2C2 + b3C3 + b4C4 + b5C5, the t-test of b1 is the test of the mean of B1 versus the mean of B2 and B3 combined. The t-test of b2 is the test of the mean of B2 versus the mean of B3. You could also set the b1 term to 0 as a reduced model and compare it with the full model using the test of gain in prediction.
What does DFBETAS measure?
DFBETAS is a measure of the standardized change in a regression coefficient when a case is deleted. 1. There is one score per regression coefficient per case. 2. It compares the regression coefficient with the case in the analysis versus removed from the analysis.
What does DFFITS measure?
DFFITS is a measure of the standardized change in the predicted score for a case when that case is deleted. There is one score per case. The comparison predicted score and the standard error come from the analysis with the case deleted.
What are the two approaches that may be taken with data imputation with multivariate missing data? What are the primary strengths and limitations of each?
Data Augmentation: makes a strong assumption that the complete data have a multivariate normal distribution. This implies all relationships are linear. The imputation step is conducted simultaneously for all variables having missing data. When the data are continuous and approximate multivariate normality, this approach can be very efficient. Full conditional specification (FCS, aka chained equations): each variable with missing data is imputed one at a time. The variable (say X1) with the least missing data is imputed first, then the variable (say X2) with the second least missing data is imputed second, and so on. Each regression equation represents the measurement level of the outcome variable. If X1 is binary, logistic regression is used; if X2 is a count variable, Poisson regression is used, ... Thus, FCS is far more flexible than data augmentation. However, FCS is far more likely to have problems achieving a solution and may not return any results.
What are possible solutions when you detect an outlier?
Deletion method: The initial analysis is run (analysis 1), the most problematic outlier is deleted, then the analysis is rerun (analysis 2) and new outliers are identified. This process continues until the set of outliers has been detected and deleted one by one.
Transformations: Change the metric of the data (e.g., take the log of Y and X and rerun the analysis). The transformation does not always make outliers disappear. Sometimes new and different outliers will appear following transformation.
Robust regression: Use robust estimators that are less influenced by extreme values (e.g., least absolute deviation (LAD) when there are outliers on Y but not X; the Theil-Sen estimator for single-predictor regression; least trimmed squares; or bootstrapping).
What are deterministic outliers (a.k.a., contaminated observations)? What are some example sources? What are probabilistic outliers (a.k.a., rare cases)? What is an example source?
Deterministic outliers (i.e., contaminated observations) are errors in data collection. Example sources include an interviewer misreading questions, errors in recording, and the state of the respondent (e.g., intoxicated). Probabilistic outliers (i.e., rare cases) occur when the extreme value is in fact a true possibility. An example source would be a misspecified model (e.g., a linear instead of a quadratic model). To remedy this extreme value, a researcher could delete the case, correctly specify the model, or transform the data.
Explain distance. Is a specific regression model required to measure distance? Does high distance necessarily mean that a point is affecting the regression outcome?
Distance is the extent to which a score is extreme given the values on the predictors. Yes, a regression equation is required to calculate the residual which is the basis of all measures of distance. No, high distance only means there is potential for that point to affect the regression outcome, not that it will.
Briefly describe two alternative approaches that address the problem in dealing with ab as an estimator of the mediation effect
Distribution of the product method: Statistical theory provides the exact (ugly) sampling distribution of the product ab, which is used to construct the confidence interval. This distribution will in general not be close to normal.
Percentile bootstrapping: Performs similarly to the distribution of the product method. A bootstrap sample is selected; equation (2) (the regression of M on X) is used to estimate a and equation (3) (the regression of Y on X and M) is used to estimate b in the same bootstrap sample. This process is repeated in a large number of bootstrap samples (say 2000), giving 2000 bootstrap estimates of ab. The estimates are then sorted from low to high. The cutpoint between the 50th and 51st ranked estimates defines the lower limit (2.5th percentile) and the cutpoint between the 1950th and 1951st ranked estimates defines the upper limit (97.5th percentile) of the percentile bootstrap CI. The estimate of ab from the parent sample is the best point estimate of the mediated effect.
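The percentile bootstrap can be sketched on simulated data. The variable names, effect sizes, and sample size below are assumptions for illustration; a is taken from the regression of M on X and b from the regression of Y on X and M.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.integers(0, 2, n).astype(float)        # treatment (simulated)
m = 0.5 * x + rng.normal(0, 1, n)              # mediator
y = 0.4 * m + 0.1 * x + rng.normal(0, 1, n)    # outcome

def ab_estimate(x, m, y):
    """Product of a (M on X) and b (Y on M, controlling for X)."""
    ones = np.ones(len(x))
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b

n_boot = 2000
boot = np.empty(n_boot)
for r in range(n_boot):
    idx = rng.integers(0, n, n)                # resample cases with replacement
    boot[r] = ab_estimate(x[idx], m[idx], y[idx])

lower, upper = np.percentile(boot, [2.5, 97.5])   # percentile bootstrap CI
point = ab_estimate(x, m, y)                      # estimate from the parent sample
```

If the interval from `lower` to `upper` excludes 0, the mediated effect is judged significant at the .05 level.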
For each coding scheme, be able to take the general regression equation and the codes and interpret what each of the coefficients in the equation is measuring. E.g., 2 contrasts with 3 groups (psychotherapy, drug, and control), Y = depression. For contrast coding, we have 2 control groups and 1 treatment.
Dummy coding interpretation:
b0 is the mean of the reference group (coded 0 0), the no-treatment control
b1 is the difference between the mean of the psychotherapy group and the mean of the control group
b2 is the difference between the mean of the drug group and the mean of the control group
Unweighted effects coding interpretation:
b0 is the unweighted mean of the three groups combined
b1 is the difference between the mean of the psychotherapy group and the unweighted mean of the three groups combined
b2 is the difference between the mean of the drug group and the unweighted mean of the three groups combined
Weighted effects coding interpretation:
b0 is the weighted mean of the three group means
b1 is the difference between the mean of group 1 and the weighted mean of the three groups
b2 is the difference between the mean of group 2 and the weighted mean of the three groups
Contrast coding interpretation:
b0 is the unweighted mean of the three groups
b1 is the difference between the mean of the treatment group and the mean of the two control groups
b2 is the difference between Mean(Con1) and Mean(Con2)
In any coding scheme for G groups, how many code variables are required to characterize the G groups?
G-1 code variables are needed
In a randomized experiment, can the test of the mediated effect be confounded? How might it be confounded? What variables are particularly important to control for to reduce this form of confounding?
Yes. Even with randomization, the test can be confounded if you do not control for a variable that is not included as a predictor or mediator but that may cause M (the mediator) or Y (the criterion). There may also be a "backdoor path" when baseline measures of the mediator and outcome are correlated with the post-treatment mediator and outcome, making it look like there is a relationship independent of the treatment effect. To reduce this form of confounding, control for baseline levels of the mediator and the outcome.
If the within class regression slopes differ as a function of the categorical variable in ANCOVA, what does this tell you about the slopes of the within class regression lines in the groups of the ANCOVA? How is this tested? What is the difficulty in coming up with an estimate of the treatment effect in ANCOVA if the within class regression slopes are not parallel?
If the within class regression slopes differ as a function of the categorical variable, the slopes differ, and we do not have an additive effect of the treatment variable on the criterion for all values of the covariate. We can see if the regression lines are parallel by fitting separate regression lines to the data in each group graphically. We can also test for a significant interaction between the treatment and covariate using: Y-hat= b0 + b1 COV + b2 T + b3 T * COV The difficulty here is that the main treatment effect is not constant on all levels of the covariate (no additive effect). We need to do more investigating to understand the actual effect because it varies.
What is the problem in regression diagnostics with clusters of errant points?
If you have a cluster of points that are working together to influence the outcome, then removing one of the cases will not alleviate the problem since the other errant points will continue to affect the outcome.
Explain influence. Is a specific regression model required to measure influence? How does influence relate to leverage and distance. Does high influence necessarily mean that the point is affecting the regression outcome?
Influence is the amount by which a data point changes the regression equation. Influence = leverage x distance. If a case has BOTH high leverage and high distance, it will affect the predicted values and the regression coefficients. High influence means a change in each regression coefficient and the intercept, and therefore a change in the predicted scores.
What weaknesses of ANCOVA do propensity score approaches address? What is meant by the region of support?
It permits proper adjustment for a large number of covariates It provides good checks on the adequacy of the adjustment model It does not rely on properly representing the functional form of the relationship between the covariate and the outcome It does not allow risky extrapolation The region of support: where there are cases from both groups. Outside the region of support, no direct comparison is possible. Propensity scores permit comparison only within the region of support.
In the regression equation Yhat = b0 + b1X + b2C + b3XC, if you change the coding of the categorical variable (C) with two levels from dummy to contrast coding, will the regression coefficient for the interaction change or remain the same?
It will stay the same, provided the contrast codes are one unit apart (e.g., -.5 and +.5): under both coding systems b3 represents the difference between the two slopes for a 1-unit change in the code.
What are the three characterizations of errant data points?
Leverage: the extent to which the case is close to or far from the rest of the cases, in terms of scores on the predictors X1...Xp only.
Distance (discrepancy): the extent to which the score on the criterion is extreme, given the values on the set of predictors.
Influence: the extent to which a single data point changes the outcome of the regression analysis.
Explain leverage. Is a specific regression model required to measure leverage? Does high leverage necessarily mean that a point is affecting the regression outcome?
Leverage is how far the point is from the centroid of the predictors. No regression equation is required to measure leverage. No, high leverage only means there is potential for that point to affect the regression outcome.
Consider a 2 x 2 factorial design. There are two levels of A (low, high) and two levels of B (low, high). Write the design matrix for the 2 x 2 design in ANOVA. Otherwise stated, what do C1, C2, and C3 (code variables) look like?
Look at study guide for answer
Depict the relationship among the variables in ANCOVA using Venn diagrams and path models in a true experiment versus in a non-experimental control group (a.k.a. observational) study. Why is there no correlation expected between the covariate and the treatment in the true experiment? With what two things can the covariate be related in this type of quasi-experiment with nonrandom assignment?
Look at study guide for venn diagrams Why is there no correlation expected between cov. and treatment in true experiment? "If subjects are randomly assigned to treatment conditions, the expectation is that there will be no correlation between any baseline covariate variable and treatment assignment (unless there is faulty randomization that assigns participants to treatment etc.)" With what two things can the covariate be related in this type of quasi-experiment with nonrandom assignment? This problem of (a) participant selection into treatment results in potential correlation between the pre-existing characteristics of the subject and assignment to treatment. (b) The covariate may also be correlated to a smaller or larger degree with the criterion
What are the two types of measures of influence? What does each measure?
Measures of global change in the whole regression equation. DFFITS was the measure emphasized (also Cook's D). Measures of specific change in each regression coefficient. There is a set of measures for each case, one measure for each regression coefficient in the equation including the intercept.
There are two new methods of treating missing data that have become the current state of the art. One was discussed in class. What is this method? If one uses one of these approaches to handling missing data, what missing data mechanism must be operating to yield unbiased parameter estimates?
Multiple Imputation & Maximum Likelihood are the two new methods for treating missing data. Multiple imputation was discussed in class. Multiple imputation creates multiple copies of the data (typically 20+) each of which has a different set of plausible replacement values. Multiple imputation (and maximum likelihood) produces unbiased parameter estimates when data are missing at random (MAR).
Are unweighted effects codes orthogonal (uncorrelated)?
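No. Each code's weights sum to zero, but the sum of the products of the weights for two different codes is not zero (e.g., with three groups, C1 = 1, 0, -1 and C2 = 0, 1, -1 give a cross-product sum of 1), so the requirements for orthogonality are not met.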
Do you get different numbers in the analysis of regression summary table if you use weighted effects codes versus unweighted effects codes to code a categorical variable with unequal group size?
No, there will be identical R^2 and ANOReg tables since these are equivalent models.
Are the pairs of dummy codes in a dummy variable coding scheme orthogonal?
No, they share the same base group and the sum of the weights does not equal zero (0 +1 + 0 = 1)
What does it mean if two codes from a coding scheme are orthogonal (uncorrelated)? What conditions need to be met for the following relationship to hold where c1, c2, and c3 are code variables? r2multiple = r2y.c1 + r2y.c2 + r2y.c3
Orthogonal means the sets of codes are uncorrelated with (independent of) each other.
Two conditions for orthogonal codes:
o Sum of the weights = 0
o Sum of the products of the weights for different code variables = 0
o For maximum interpretability, the codes should be 1 unit apart (NOT necessary for orthogonality)
The proportion of variation accounted for by one code is not related to the proportion of variation accounted for by the others. So you can add them up if all three code variables are uncorrelated, since each contributes its own unique variance to the squared multiple correlation.
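The two weight conditions can be checked mechanically. The helper below is a hypothetical illustration that takes one weight per group for each of two code variables; it checks the weights only, so the codes are uncorrelated in the data when group sizes are equal.

```python
import numpy as np

def is_orthogonal(c1, c2):
    """Check both weight conditions for a pair of code variables."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    sums_zero = np.isclose(c1.sum(), 0) and np.isclose(c2.sum(), 0)
    cross_zero = np.isclose(c1 @ c2, 0)
    return bool(sums_zero and cross_zero)

contrast = is_orthogonal([-2/3, 1/3, 1/3], [0, -0.5, 0.5])   # True
effects = is_orthogonal([1, 0, -1], [0, 1, -1])              # False: cross product is 1
dummy = is_orthogonal([1, 0, 0], [0, 1, 0])                  # False: weights do not sum to 0
```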
What is meant by the "pick a point approach" to testing group differences in a regression model for a binary treatment vs. control IV, a continuous covariate, and their interaction?
Pick a point on the moderator (covariate, continuous) variable and center the data at this point. Then run the regression equations at that point to test the differences between groups/slopes at that point, which is now at Xc = 0
Be able to identify cases with high leverage, distance, and influence, and indicate how regression coefficients are being influenced by individual points. (Look at study guide for graph.)
Point 1: high leverage but stabilizes the equation because it's on the trajectory of the regression line Point 2: high distance that wouldn't change the slope but would raise the intercept Point 3: high influence that flattens the slope
What is the primary potential advantage of ANCOVA over ANOVA in randomized experiments? What is the primary purpose of the inclusion of covariates in ANCOVA in the non-equivalent control group design (a.k.a., observational study)?
Primary advantage in randomized experiments: The covariate is included in the design to partial out variability from the outcome that is unrelated to the treatment, thereby increasing the statistical power of the test of the treatment effect. Primary purpose in observational studies: The primary role of ANCOVA is to partial out pre-existing between-group differences on covariates that are related to the criterion and that should not be attributed to treatment; these differences are due to participant selection into treatment. In addition, ANCOVA might also partial out error variation in the criterion (as in the true experiment).
Why are standardized solutions NOT typically reported with categorical IVs?
Standardized solutions assume that a very good estimate of the population variance is available. With categorical variables, this depends on having a random sample from the population to which you wish to generalize, so that the proportion in each category in the sample represents the proportion in the population. When we don't have a representative sample, the standardized effect can be wildly inaccurate.
According to the Baron and Kenny (1986) causal steps approach, what are the four steps involved in testing mediation?
Step 1. Test the regression coefficient c for statistical significance. Is there an overall effect of X on Y? Step 2. Test the regression coefficient a for statistical significance. Does X have an effect on the mediator M? Step 3. Test the partial regression coefficient b of the relationship between the mediator M and the outcome Y over and above X. Thus, the effect of X is statistically controlled in this step. Step 4. (Optional). To show that the results are consistent with full mediation, test c' in equation 3. This effect should be non-significant (i.e., consistent with 0 effect).
What is meant by the statement that "the different coding schemes represent equivalent regression models?"
The R^2 and the F-test of the ANOReg are identical across the coding schemes, and the predicted values are the same as well.
Why are the VIF statistics typically not reported with a categorical IV with G groups?
The VIF statistic is for a single variable and when you have a code variable then it would give us a separate value for each category. VIF would be appropriate if you had a binary variable but not for anything with more than two categories. Different coding schemes would produce different VIFs.
What is the problem with simply dividing a residual by its standard error to compute a standardized residual?
The case can move the regression plane toward itself and reduce its own residual and increase the residual for all other cases. This would create a larger MSresidual
What measures are on the main diagonal of the hat matrix denoted hii?
The elements on the main diagonal are the measures of leverage of each data point in the dataset
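A minimal sketch of computing the hat matrix diagonal on simulated predictors. Note that the criterion Y never enters the calculation, which is why no regression equation is needed to measure leverage.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (50, 2))               # two simulated predictors
X[0] = [6.0, 6.0]                           # one case far from the centroid
Xd = np.column_stack([np.ones(len(X)), X])  # add the intercept column

H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T    # hat matrix
leverage = np.diag(H)                       # h_ii, one value per case
```

The leverage values sum to the number of coefficients in the model (here 3), and the case placed far from the centroid has the largest h_ii.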
What is the centroid?
The point that represents the mean of every predictor in the data space
How does the ANOVA summary table relate to the analysis of regression summary table when the groups are coded with a set of unweighted effects codes?
They will be identical
What will be the relationship of the resulting ANOVA summary table to a regression analysis in which a set of dummy codes are used to code the four groups, and the criterion is the same as the dependent variable in the regression analysis?
They will be the same. When categorical variables are the independent variables, we have a special case: classic one-way ANOVA. One value is predicted for each category, and SSregression is entirely due to differences between the treatment groups.
Briefly describe the major robust approaches to multiple regression.
Theil-Sen estimator: In the one-predictor case, this estimator is highly resistant to outliers. It estimates the slope as the median of the slopes (computed as in standard OLS regression) from all possible pairs of cases. A drawback is that the standard errors may be larger than those of other estimators.
Least trimmed squares: When computing the estimate of the regression coefficient, this method sorts the squared residuals from lowest to highest and trims the largest ones using an a priori percentage.
Bootstrapping: Draw a bootstrap sample 2000 times, estimate the regression coefficients in each sample, and then trim the highest and lowest 2.5% of the values.
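The Theil-Sen slope is simple enough to sketch directly. The function name and the toy data (a perfect line with one gross outlier on Y) are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def theil_sen_slope(x, y):
    """Median of the slopes through all pairs of cases with distinct x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return float(np.median(slopes))

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                       # true slope is 2
y[9] = 100.0                            # one gross outlier on Y

robust_slope = theil_sen_slope(x, y)    # 2.0: the outlier affects only 9 of 45 pairs
ols_slope = np.polyfit(x, y, 1)[0]      # pulled well above 2 by the outlier
```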
Suppose you have two treatment groups and a control group that you have dummy coded as follows:
      C1  C2
T1     1   0
T2     0   1
C      0   0
How would you compare the means of T1 and T2 from the regression output?
To estimate the means, mean T1 = b0 + b1 and mean T2 = b0 + b2. To test the difference between the means of T1 and T2, recode as follows:
      C1  C2
T1     0   0
T2     0   1
C      1   0
Then the test of b2 is the test of the difference between the means of T1 and T2.
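A small check of both coding schemes on made-up scores (group means of 12, 23, and 6 for T1, T2, and C). The helper name and data are hypothetical.

```python
import numpy as np

# Hypothetical scores for three groups of unequal size
y_t1 = np.array([10.0, 12.0, 14.0])          # mean 12
y_t2 = np.array([20.0, 22.0, 24.0, 26.0])    # mean 23
y_c = np.array([5.0, 7.0])                   # mean 6
y = np.concatenate([y_t1, y_t2, y_c])
group = np.array(["T1"] * 3 + ["T2"] * 4 + ["C"] * 2)

def fit(code1_group, code2_group):
    """Dummy codes: C1 = 1 for code1_group, C2 = 1 for code2_group, else 0."""
    X = np.column_stack([np.ones(len(y)),
                         group == code1_group,
                         group == code2_group]).astype(float)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_ref_c = fit("T1", "T2")     # reference group C:  b0 = 6,  b1 = 6,  b2 = 17
b_ref_t1 = fit("C", "T2")     # reference group T1: b0 = 12, b1 = -6, b2 = 11
```

In the recoded scheme b2 = 11 is the T2 minus T1 mean difference, so its t-test in the regression output is the test asked for.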
Are unweighted effects codes centered for equal group size? for unequal group size?
Unweighted effect codes are only centered in the special case that the group sizes are equal in all categories
Suppose you have a categorical variable with four groups, e.g., four geographical regions of the US. You wish to use it as a predictor in a regression analysis. What is the general strategy for employing a categorical variable as a predictor in a regression analysis? What determines which coding scheme is used?
With four groups, the general strategy would be to create G - 1 (i.e., 3) code variables such that Yhat = b0 + b1C1 + b2C2 + b3C3. This would allow us to test the overall group effect. Then we would choose the coding scheme that allows us to test the specific focused hypothesis of interest. The coding schemes would be used in the following situations:
1. Dummy coding would be used if the hypothesis involved comparing each group with a reference group.
2. Weighted/unweighted effects coding would be used if the hypothesis involved comparing the mean of each group with the overall mean of all the groups combined (i.e., the grand mean). The difference is whether the grand mean is weighted by group size or not. The weighted effects coding scheme presumes that we have a truly representative sample, such that the proportion of cases in each group is representative of the population.
3. Contrast codes would be used if the hypothesis involved comparing specific differences between groups or combinations of groups.
Write a regression equation for a categorical predictor, a continuous predictor, and their interaction. Rearrange this equation into the simple regression equation for the regression of Y on X at values of the categorical variable C
Y-hat = b1 X + b2 C + b3 XC + b0
Rearranged: Y-hat = (b1 + b3 C)X + (b2 C + b0)
Consider a 2 x 3 ANOVA with 2 levels of A (low, high) and 3 levels of B (low, moderate, high). The design matrix uses five code variables as predictors of the dependent variable. Using a series of regression equations, explain how you would find SSA, SSB and SSAB. What are the numerator degrees of freedom for the A, B, and AB effects?
Full model: Yhat = b0 + b1C1 + b2C2 + b3C3 + b4C4 + b5C5
SSA: compare the full model to the reduced model without the A effect (b1C1); df = 1
SSB: compare the full model to the reduced model without the B effect (b2C2 + b3C3); df = 2
SSAB: compare the full model to the reduced model without the interaction terms (b4C4 + b5C5); df = 2
How can the indirect (mediated) effect be tested more directly than with the Baron and Kenny approach? What problem arises in statistically testing the indirect effect?
You can explicitly test the indirect effect with the product estimator ab. Problem: the indirect effect ab is NOT a pivot statistic and does not have a standard sampling distribution.
What is meant by mediation?
Mediation means that an initial variable (e.g., treatment) leads to changes in an intermediate variable (the mediator), which, in turn, causes changes in the outcome variable of interest: X affects Y through the mediating variable M.
Consider gender as a dummy coded variable, 1=male, 0=female. Suppose you have the coefficients for the overall regression equation, where X is continuous and D is a dummy code: Y hat = b1 X + b2 D + b3 XD + b0 Y hat = .4X + .3 D + .2XD + 1.5
b0 (1.5) is the intercept for the group coded zero (female) when X = 0 b1 (.4) is the regression of Y on X in the group coded zero (female) b2 (.3) is the difference in intercepts for the group coded one (male) minus for the group coded zero (female) b3 (.2) is the difference in slopes for the group coded one (male) minus for the group coded zero (female)
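Plugging D = 0 and D = 1 into the equation gives the two simple regression lines. The short sketch below does the arithmetic; the function name is hypothetical.

```python
b0, b1, b2, b3 = 1.5, 0.4, 0.3, 0.2   # coefficients from the equation above

def simple_regression(d):
    """Intercept and slope of the regression of Y on X for the group coded d."""
    return b0 + b2 * d, b1 + b3 * d

female = simple_regression(0)   # intercept 1.5, slope 0.4
male = simple_regression(1)     # intercept about 1.8, slope about 0.6
```

So the female line is Yhat = 1.5 + .4X and the male line is Yhat = 1.8 + .6X; the differences in intercept (.3) and slope (.2) are b2 and b3.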
If you use contrast codes for two groups, and the contrast code interacts with the continuous variable in the equation, be able to interpret what each of the regression coefficients in the equation measures
b0 (intercept) is the unweighted mean of the two intercepts b1 is the unweighted mean of the two slopes b2 is the difference between the intercepts of the group coded higher minus group coded lower b3 (interaction) is the difference between the slopes of the group coded higher minus group coded lower
If you use dummy coding for two groups, and the dummy code interacts with the continuous variable in the equation, be able to indicate what each of the regression coefficients in the equation measures. What two tests would you perform to see whether each within group regression line differs from 0?
b0 = the predicted value of Y in the group coded 0 when X = 0 (or when Xc = 0, i.e., at the mean of X, if X is centered)
b1 = the effect of X on Y in group 0; the estimated slope of the reference group
b2 = the difference between the intercepts of the two groups
b3 = the difference in slopes between the two groups (0 and 1, usually control and treatment); if it is significant, the groups have different slopes for the regression of Y on X, indicating an interaction
The two tests of whether each within group regression line differs from 0 come from two regression equations using the dummy codes. First, code the Control group = 0; the regression of Y on X in this equation (the b1 coefficient) is the effect for the Control group only. Then reverse the coding so that the Treatment group = 0 and the Control group = 1, and rerun the regression equation. The regression of Y on X in this second analysis tests the significance of the slope for the Treatment group only.
What is the general strategy for studying the effect of a point on the regression outcome?
Delete the point and rerun the analysis: (1) compute the regression analysis with the case included, (2) repeat the analysis with the case deleted, (3) assess how a given result of the regression (e.g., a coefficient or predicted value) has changed.
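A minimal sketch of this case-deletion strategy, using hypothetical data and a simple one-predictor regression (the helper name `simple_regression` and the data are assumptions for illustration):

```python
def simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data; the last case is the suspected influential point.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
case = 5  # index of the case to delete

full = simple_regression(x, y)                        # case included
reduced = simple_regression(x[:case] + x[case + 1:],  # case deleted
                            y[:case] + y[case + 1:])
change_in_slope = full[1] - reduced[1]

print("slope with case:   ", round(full[1], 3))
print("slope without case:", round(reduced[1], 3))
print("change in slope:   ", round(change_in_slope, 3))
```

The same compare-with-and-without logic applies to any result of interest, such as the intercept or a predicted value.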
What does the Johnson-Neyman procedure test?
It identifies "regions of significance", that is, the values of X above (or below) which the treatment effect is significant, i.e., for which the elevations of the two simple regression lines differ. It is a procedure for testing the conditional effect of treatment (T versus C) at particular values of X.
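The idea can be sketched numerically. The example assumes a dummy-coded treatment in Yhat = b0 + b1*X + b2*T + b3*X*T, so the treatment effect at a given X is b2 + b3*X. The values of b2 and b3 are borrowed from the earlier worked example; the sampling variances, the covariance, and the critical value of 1.96 are hypothetical numbers chosen only for illustration.

```python
import math

b2, b3 = 0.3, 0.2                                   # from the earlier example
var_b2, var_b3, cov_b2_b3 = 0.04, 0.0025, -0.005    # hypothetical values
t_crit = 1.96                                       # assumed large-sample critical value

def conditional_effect(x):
    """Treatment effect (difference in elevation of the two lines) at X = x."""
    return b2 + b3 * x

def t_stat(x):
    """Conditional effect divided by its standard error at X = x."""
    se = math.sqrt(var_b2 + 2 * x * cov_b2_b3 + x ** 2 * var_b3)
    return conditional_effect(x) / se

# Region boundaries: solve (b2 + b3*x)^2 = t_crit^2 * Var(b2 + b3*x),
# which is a quadratic a*x^2 + b*x + c = 0.
a = b3 ** 2 - t_crit ** 2 * var_b3
b = 2 * (b2 * b3 - t_crit ** 2 * cov_b2_b3)
c = b2 ** 2 - t_crit ** 2 * var_b2
disc = b ** 2 - 4 * a * c   # must be positive for real boundaries to exist
boundaries = sorted([(-b - math.sqrt(disc)) / (2 * a),
                     (-b + math.sqrt(disc)) / (2 * a)])

print("boundaries of the regions of significance:", boundaries)
for x in (0, 1, 2):
    print(x, round(conditional_effect(x), 2), round(t_stat(x), 2))
```

With these made-up numbers the treatment effect is not significant at X = 0 but is significant at X = 1; the boundaries mark where the test statistic crosses the critical value.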
Be able to set up contrast codes if given a specific set of a priori hypotheses. Example: we predict that both high motivation (A) and high ability (B) are needed to show a good effect, as opposed to any other combination of the two variables.
Low/low and the two low/high combinations would all get -1; high/high would get +3. If there were any groups we did not want to include in the comparison, they would get a 0.
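As a small illustration, the sketch below applies these codes to hypothetical cell means (the means are invented for the example). It confirms that the weights sum to zero and that the contrast compares the high/high cell with the average of the other three cells.

```python
# Cells are (motivation, ability); codes follow the hypothesis above.
codes = {
    ("low", "low"): -1,
    ("low", "high"): -1,
    ("high", "low"): -1,
    ("high", "high"): 3,
}
# Hypothetical cell means, for illustration only.
means = {
    ("low", "low"): 10.0,
    ("low", "high"): 11.0,
    ("high", "low"): 12.0,
    ("high", "high"): 20.0,
}

weights_sum = sum(codes.values())
contrast = sum(codes[cell] * means[cell] for cell in codes)

# Same comparison read directly: high/high minus the mean of the other cells.
others = [means[cell] for cell in codes if cell != ("high", "high")]
difference = means[("high", "high")] - sum(others) / len(others)

print(weights_sum, contrast, difference)
```

The contrast value is three times the plain mean difference because the weights are -1 and +3; rescaling them to -1/4 and +3/4 would make the two equal.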
Are dummy codes centered?
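No. A dummy code takes the values 0 and 1, so its mean is the proportion of cases coded 1 rather than 0.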
What is a within class regression line in the ANCOVA? What assumption is made about within class regression lines in analysis of covariance? What other assumptions underlie the use of ANCOVA?
· Within class regression line: consider the data separately within each of the two treatment conditions, T and C, and plot the regression of the criterion on the covariate in each condition. Each of these regression lines is called a within class regression line.
· Key assumption underlying the analysis of covariance: there is no interaction between the covariate and the treatment variable, which is why the within class regression lines are drawn parallel.
· Other assumptions: the relationship between the covariate and the outcome has been properly specified (e.g., it is linear), and homoscedasticity. If the lines are not parallel, the question becomes: for what values of the covariate is the treatment effect significant?
What are the two older methods of handling missing data?
· Listwise deletion: eliminates all cases with missing values, resulting in a complete data set
· Pairwise deletion: eliminates cases on an analysis-by-analysis basis (e.g., correlations, means, SDs based on different Ns)
· These deletion methods assume missing completely at random (MCAR)! We can get biased estimates if we have MAR or MNAR
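The difference between the two deletion methods can be seen in a small sketch on a hypothetical data set, where `None` marks a missing value (the data and helper names are assumptions for illustration):

```python
# Hypothetical data set; None marks a missing value.
data = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 1.0},
    {"x": None, "y": 4.0, "z": 5.0},
    {"x": 4.0, "y": 5.0, "z": None},
    {"x": 5.0, "y": 6.0, "z": 7.0},
]

def listwise(rows):
    """Keep only cases that are complete on every variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, a, b):
    """Keep cases that are complete on the two variables being analyzed."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]

n_listwise = len(listwise(data))
n_pairwise = {pair: len(pairwise(data, *pair))
              for pair in [("x", "y"), ("x", "z"), ("y", "z")]}

print("listwise N:", n_listwise)
print("pairwise Ns:", n_pairwise)
```

Listwise deletion yields one N for every analysis, while pairwise deletion yields a different set of cases (and potentially a different N) for each pair of variables.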
Describe the three mechanisms of missing data
· Missing Completely at Random (MCAR)
  o The probability of missing data on Y is unrelated to other measured variables and is unrelated to the values of Y itself; nothing in the data set predicts the propensity to have a missing value
· Missing at Random (MAR)
  o The probability of missing data on Y is related to other measured variables
  o After controlling for those other variables, there is no remaining association between the propensity for missing data on Y and the would-be values of Y
  o Multiple imputation requires the MAR assumption!
  o Example: Dean's list and student GPA; whether someone is on the Dean's list depends on their GPA
· Missing Not at Random (MNAR)
  o The probability of missing data on the dependent variable Y is related to the values of Y itself
  o Whether the Y score is missing is related to the value you would have observed if the person had responded; this causes substantial bias
In addition to the usual assumptions of multiple regression, what additional assumptions are needed for mediation analysis for causal inference?
· Multiple regression assumptions
  o Linearity
  o Homoscedasticity
  o Normality of residuals
  o Independence of participants (no clustering)
  o Independence of errors
· Additional assumptions for mediation
  o No measurement error in the mediator
  o No reverse causal effects
  o No confounding (no omitted variables)
What are the three rules for choosing contrast codes to maximize interpretability?
· Required. The sum of the weights for each code variable must equal 0.
· Required. The sum of the products of the weights for each pair of code variables must equal 0.
  o Given that the first two rules hold, if n is the same in each group, the contrast codes are orthogonal, a desirable outcome.
· Optional. If there are only positive weights with the same positive value and negative weights with the same negative value, the difference between the value of the positive weights and the value of the negative weights should equal 1.
  o (This facilitates interpretation of the results by creating a 1-unit difference between the combined positive weights and the negative weights)
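The three rules can be checked mechanically. The sketch below uses a hypothetical set of two contrast codes for three groups (groups 1 and 2 versus group 3, then group 1 versus group 2). The helper names are illustrative assumptions, and exact fractions are used so that the zero checks are not affected by rounding.

```python
from fractions import Fraction as F

# Hypothetical contrast codes for three groups (one tuple per code variable).
c1 = (F(1, 3), F(1, 3), F(-2, 3))   # groups 1 and 2 versus group 3
c2 = (F(1, 2), F(-1, 2), F(0))      # group 1 versus group 2
code_vars = [c1, c2]

def sums_to_zero(code):
    """Rule 1: the weights of a code variable sum to 0."""
    return sum(code) == 0

def products_sum_to_zero(a, b):
    """Rule 2: the products of the weights of two code variables sum to 0."""
    return sum(wa * wb for wa, wb in zip(a, b)) == 0

def one_unit_difference(code):
    """Rule 3 (optional): positive weight minus negative weight equals 1."""
    return max(code) - min(code) == 1

rule1 = all(sums_to_zero(c) for c in code_vars)
rule2 = all(products_sum_to_zero(code_vars[i], code_vars[j])
            for i in range(len(code_vars))
            for j in range(i + 1, len(code_vars)))
rule3 = all(one_unit_difference(c) for c in code_vars)

print(rule1, rule2, rule3)
```

Integer codes such as (1, 1, -2) and (1, -1, 0) satisfy the two required rules as well; dividing each code by the difference between its positive and negative weight gives the rule 3 version used here.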
What sort of data configuration lends itself most naturally to coding with dummy codes? What are the three criteria useful in choosing a reference group with dummy codes?
· Dummy codes fit most naturally when there is a reference group of interest, such as a control group, against which you want to compare the other groups
· Criteria:
  o Have a meaningful control/reference group
  o The reference group should not have a small sample size (small samples give unstable estimates)
  o The reference group needs to be a well-defined category
What are adjusted means? How are the adjusted (conditional) means estimated using multiple regression?
· Adjusted means: the predicted mean value for each group given that the covariate is held constant at a value equal to its mean in the full sample.
  o Estimate adjusted means by centering the covariate
  o Yhat = b0 + b1T + b2COVc, where COVc is the centered covariate
Typically, the covariate is centered at its overall mean (the centroid of the covariates if there is more than one); the predicted value of Y in each group will then be a conditional mean. Dummy coding offers the most straightforward interpretation of the adjusted means.
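A worked sketch of adjusted means on hypothetical data follows. Rather than fitting the regression directly, it uses an algebraically equivalent shortcut for the two-group, one-covariate case: b2 is the pooled within-group slope, and each adjusted mean is the group's observed mean minus b2 times the gap between the group's covariate mean and the full-sample covariate mean. With T dummy coded and the covariate centered, the adjusted mean of the group coded 0 corresponds to b0 and that of the group coded 1 to b0 + b1. The data and names are assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical covariate and outcome scores for each group.
cov = {"C": [1, 2, 3, 4], "T": [3, 4, 5, 6]}
out = {"C": [2.0, 2.5, 3.5, 4.0], "T": [4.5, 5.0, 6.5, 7.0]}

grand_cov_mean = mean(cov["C"] + cov["T"])  # mean of the covariate in the full sample

# Pooled within-group slope of the outcome on the covariate (this is b2).
sxy = sum((x - mean(cov[g])) * (y - mean(out[g]))
          for g in cov for x, y in zip(cov[g], out[g]))
sxx = sum((x - mean(cov[g])) ** 2 for g in cov for x in cov[g])
b2 = sxy / sxx

# Adjusted (conditional) mean: predicted Y for each group at the grand covariate mean.
adjusted = {g: mean(out[g]) - b2 * (mean(cov[g]) - grand_cov_mean) for g in cov}

b0 = adjusted["C"]                    # adjusted mean of the group coded 0
b1 = adjusted["T"] - adjusted["C"]    # adjusted treatment effect

print("observed means:", {g: mean(out[g]) for g in out})
print("adjusted means:", {g: round(m, 3) for g, m in adjusted.items()})
```

In this made-up example the treatment group scores higher on the covariate, so the adjusted difference between the groups is smaller than the observed difference.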