Stats exam 2
Why is the added variable plot (AVP) useful in detecting curvilinear relationships in regression equations with several predictors?
ŷ = b0 + b1X + b2Z + b3W
The AVP focuses on the residuals: e.g., the residual of Y after removing the effects of Z and W (on the y axis) against the residual of X after removing Z and W (on the x axis).
Looking at the remaining relationship provides a visualization of b1 (the effect of X over and above Z and W) and gives us an idea of what is going on in that coefficient: Is it linear? Are there outliers that may be impacting its value?
If we did a scatterplot of Y against X only, we would not be controlling for Z and W. (A sketch in R follows.)
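A minimal R sketch of building an AVP by hand, assuming a data frame dat with columns y, x, z, and w (all names hypothetical):

    # Residual of y after removing z and w, and residual of x after removing z and w
    e_y <- resid(lm(y ~ z + w, data = dat))
    e_x <- resid(lm(x ~ z + w, data = dat))
    plot(e_x, e_y, xlab = "residual of x | z, w", ylab = "residual of y | z, w")
    lines(lowess(e_x, e_y))                       # lowess reveals curvature or outliers
    # The slope of lm(e_y ~ e_x) equals b1 from lm(y ~ x + z + w, data = dat).
    # If the car package is installed, car::avPlots() draws the same plot.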
What is measured by the squared partial correlation of a predictor with the criterion?
· Squared partial correlation: of the variance that is STILL unexplained after X1, what is the amount of variance accounted for by the second predictor?
Rearrange the equation containing an interaction (above) into a simple regression equation showing the regression of Y on X at values of Z. Explain how the regression coefficient in the rearranged equation shows that the regression of Y on X depends on the value of Z.
Ŷ = (b1 + b3Z)X + (b2Z + b0), where (b1 + b3Z) is the simple slope. If the interaction is nonzero, that is, b3 does not equal zero, then the regression of Y on X is given by the regression coefficient (b1 + b3Z), which shows that the slope of the regression of Y on X depends on (changes depending on) the particular value of Z. If b3 is zero (there is no interaction between X and Z), then the regression coefficient for the regression of Y on X reduces to b1. The rearranged equation also shows that the intercept depends on the value of Z; the intercept is given by (b2Z + b0). If b2 equals zero, then the intercept reduces to b0.
Assume X and Z are centered. If the b3 coefficient is significant in the above equation, what does that tell you? Draw a sketch of an interaction in which the b3 coefficient is positive, is negative, and explain in words the condition under which the b3 coefficient will be positive or negative.
ŷ = b0 + b1Xc + b2Zc + b3XcZc
A significant b3 indicates an interaction.
Positive versus negative b3:
b3 negative: the Z-high line is on the bottom (the slope of Y on X decreases as Z increases).
b3 positive: the Z-high line is on top (the slope of Y on X increases as Z increases).
Describe the general approach of a piecewise regression model.
("broken stick" regression) The range of X is broken up into segments. A linear regression line is estimated for each piece with the regression lines joined at the boundary of the segments.
Interpret the b1 coefficient in three ways, assuming X has been centered before analysis.
(1) One interpretation of b1 for the centered predictor (X − X̄) is that it is the average of all these little regressions across the range of X, the average linear effect in the data. (2) The second interpretation applies to both centered and uncentered X: b1 is the regression of Y on the predictor at the value of zero on the predictor. (3) The centered predictor (X − X̄) equals zero at the arithmetic mean of X, so b1 is the slope of Y on centered X at the mean of X.
How would you estimate the additional proportion of variance accounted for by a cubic spline over a basic cubic regression model?
(basic cubic regression equation) (cubic spline equation) We would do a gain-in-prediction test: compare the pure cubic with the cubic spline to see whether the spline adds prediction. (See the R sketch below.)
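A hedged sketch of that comparison in R, using a truncated-power cubic spline with one knot; the data frame dat, variables x and y, and knot location k are assumptions for illustration:

    k <- 10                                        # hypothetical knot location within the range of x
    cubic  <- lm(y ~ x + I(x^2) + I(x^3), data = dat)
    spline <- lm(y ~ x + I(x^2) + I(x^3) + I(pmax(x - k, 0)^3), data = dat)
    anova(cubic, spline)                           # F test of the gain in prediction from the spline term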
Consider the following regression equation: ŷ = b0 + b1X1 + b2X2 + b3X3. Suppose there is very high multicollinearity, say VIF = 20. What effect does this have on R2 from the equation, important for overall prediction? What effect does this have on the estimates of b1, b2, and b3, important for explanation (hypothesis testing)?
As indicated by the values of R² and adjusted R², and the standard error of estimate, prediction is not adversely affected by the inclusion of variables that are multicollinear. Despite extraordinarily low levels of tolerance and high levels of its reciprocal, the VIF, prediction is not adversely affected. There is no need to drop predictors from the regression equation merely because the partial regression coefficients have the opposite of the expected sign (i.e., expected: the sign of the Pearson correlation). The bottom line: trust the overall prediction as represented by R² and adjusted R²; be skeptical of the values of the individual regression coefficients given the high multicollinearity and large standard errors.
What problem has occurred if you get an error message "determinant = 0" or "results are aliased"?
As tolerance approaches zero the result will be considered undefined. This would occur in a case of perfect multicollinearity or "exact multicollinearity". If the determinant of the covariance matrix (or covariation or correlation) is 0, then division by 0 will occur and the results of the regression analysis will not be meaningful. The variable(s) that are redundant with the others need to be removed from the model; otherwise it cannot be estimated.
As a prologue to the development of bootstrapping in one predictor regression case, we considered one set of assumptions for the derivation of statistical tests and confidence intervals for bYX. What were the assumptions? What are their implications? What three desirable properties of the estimates of bYX and its standard error resulted?
Assumptions: normal distribution (of the errors/residuals); observations are independent of each other.
Implications: if the assumptions are not met, then our findings can be very wrong.
Desirable properties: unbiasedness, consistency, minimum standard error (efficiency).
What does additivity mean in a regression equation containing predictors X and Z and the criterion Y?
No interaction between X and Z: the regression of Y on X does not depend on the value of Z (and vice versa); the effects of X and Z simply add. The intercept is still conditional.
. If I test b1 in the regression equation, = b0 + b1X1 and in the equation = b0 + b1X1 + b2X2 +b3X3, will my test of significance in general produce the same result? Why? Is there any case in which it will produce the same result?
Not likely because the second equation introduces predictors that may overlap in accounting for variance explained by b1. The only case where it would produce the same result is if there is no multicollinearity between the b1 predictor and the other predictors.
What do I learn from the CI that I do not learn from the null hypothesis significance test?
The confidence interval provides the point estimate as well as an interval estimate. The confidence interval gives the best point estimate and the degree of precision with which the parameter is estimated. CIs tell us more than null hypothesis significance tests; CIs tell us about the size, precision, and statistical significance of the estimate.
What problem(s) have potentially occurred if you get an error message "negative determinant" or "negative eigenvalue"?
The correlation matrix does not have the proper structure. Possible causes:
Data input error.
Analysis of a theoretical correlation or covariance matrix, such as: a matrix corrected for attenuation due to measurement error; or tetrachoric or biserial correlations computed to estimate what the correlations would look like if the data were continuous and normally distributed.
Pairwise deletion took place, so each correlation in the matrix is based on a different set of data.
How can you use the approach you described in question 11 to make theoretical arguments about the importance of particular predictors or sets of predictors, taking into account other predictors?
The decision of which variable to enter first is based on theory.
Estimate the unique contribution of a predictor (or set of predictors) above and beyond the others to make theoretical arguments.
Example: health protective behaviors and cancer screening. We believe that psychological factors determine whether or not patients will take the recommendation of a physician and undergo screening, so psychological factors should predict screening above and beyond medical input.
What do we mean by "higher" order terms in regression analysis? Give an example of a regression equation containing a higher order curvilinear term.
The highest order term has the highest exponent (counting the summed exponents of product terms); everything below the highest order term is conditional.
Ŷ = b1X + b2X² + b3Z + b4XZ + b5X²Z + b0 (the X²Z term is a higher order curvilinear term)
What factors determine the fit of a spline to the data?
The number of knot points The location of the knot points The degree of the polynomial used in the local regression
Moore, McCabe, and Craig (readings) describe the use of bootstrapping in testing the 25% trimmed mean. Please give an outline of the procedure.
A normal quantile plot was used to assess the distribution; given the strong skew (non-normal distribution), it was concluded that bootstrapping would be appropriate, as it does not require distributional facts (i.e., the standard error is too difficult to estimate by formula given these data).
Drew resamples (in the example, 3,000 resamples of size 150).
Calculated the 25% trimmed mean for each resample.
Formed the bootstrap distribution with those 3,000 values.
In the example, the bootstrap distribution is close to normal (Figure 16.9 in the chapter). (See the R sketch below.)
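A sketch of these steps in R, assuming the data are in a vector called times (name hypothetical) with n = 150:

    set.seed(1)
    n <- length(times)
    boot_tm <- replicate(3000, mean(sample(times, n, replace = TRUE), trim = 0.25))
    hist(boot_tm)                                  # bootstrap distribution of the 25% trimmed mean
    qqnorm(boot_tm); qqline(boot_tm)               # check how close to normal it is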
Given two predictors X1, X2, and a criterion Y. The range of the correlation between X1 and X2 will not always match the potential range from -1 to +1. What can constrain r12 to be less than 1.0 in magnitude?
A correlation matrix cannot have a negative or zero determinant (it must remain positive definite). Given the correlations of X1 and X2 with Y, this restriction limits the potential values of r12. These limits lead to a variety of predictor-outcome configurations and different results for the values of the standardized regression coefficients. So while the theoretical range of r12 is -1 to +1, the configurations described as enhancement, redundancy, and suppression may constrain it to be less than 1.0 in magnitude.
What is the general form of a polynomial equation?
A polynomial of order p has the general form Ŷ = b0 + b1X + b2X² + ... + bpX^p; it has (p − 1) bends or inflections.
What problems do high degree polynomials have? Why are lowess and splines preferred to high degree polynomials as a representation of the relationship of Y to X in the data?
A. C. Atkinson (1985) makes the strong argument that findings regarding polynomials of order > 2 are very unlikely to replicate except under two special conditions: (a) a designed experiment in which the levels of X are fixed (e.g., 0, 50, 100, 200, 400, 800 milliseconds exposure time) and (b) the dependent measure (Y) is measured with minimal error. Higher order polynomials are strongly affected by local misfit because polynomial regression provides a global fit to the data and ignores problems of poor fit in local regions of X; outliers and misfit in one region can distort the curve in other regions. Lowess and splines allow for local fit and provide a better representation of the relationship between X and Y.
What does a trellis plot show? What can it be used to check for?
AKA conditioning plots, or co-plots.
Look at the data to see if an interaction is reasonable, e.g., 200 cases and expecting a different relationship at different levels.
Focal = X, outcome = Y, moderator = Z.
Identify some segment of Z (the moderator), e.g., the 50 cases highest on Z; look at the raw data and put a lowess through it for these cases.
Next, take cases 51-100, plot them, and put a lowess through them; then cases 101-150; then the lowest cases, 151-200.
Look at the relationship of X and Y as a function of Z in each panel without forcing any model on it, and compare the different segments of the data.
What is meant by an equivalent regression model?
An equivalent regression model is one that produces exactly the same overall fit (R2) and predicted values (y-hat), but which leads to different parameter estimates and hence interpretations of the parameters. The raw and centered quadratic regression equations are examples of alternative parameterizations.
What is the difference between the bootstrap t confidence interval and the percentile bootstrap confidence interval? When is the percentile bootstrap confidence interval preferred?
Bootstrap t CI: take the SE of your bootstrapped sampling distribution and substitute that value in for the normal theory SE (use the unstandardized coefficient and the usual t-critical value).
Percentile bootstrap: take the 95% CI from your bootstrapped values themselves (i.e., the middle 95% of the 1,000 regression coefficients you calculated in your bootstrapped samples).
Preferred when you have an asymmetrical distribution, because it looks at the actual values themselves rather than taking a central value and adding/subtracting an allowance factor (margin of error); the bootstrap t method still uses the MOE approach and assumes a symmetrical distribution.
Bootstrap t: take the statistic we are interested in (e.g., regression coefficient, trimmed mean); take the parent sample (n = 150); then draw bootstrap samples with replacement from the parent sample, each matching the parent sample size (here 150). Sampling with replacement means a case can be chosen more than once. Repeat this process a recommended 1,000 times for a bootstrap t confidence interval and 2,000 times for a percentile bootstrap confidence interval.
For trimmed means (or any statistic), compute the standard deviation of the bootstrap sampling distribution; the standard deviation of the bootstrap values (e.g., of the 1,000 bootstrap bYX values) is the estimate of the standard error.
LL = M̄t − tcrit × (bootstrap SE of the trimmed mean); UL = M̄t + tcrit × (bootstrap SE of the trimmed mean).
Percentile bootstrap: get 2,000 bootstrap estimates of the statistic (it does not necessarily need to be the trimmed mean); order them from lowest to highest; choose the lowest 2.5% cutoff (lower limit; in this case between values 50 and 51) and the highest 2.5% cutoff (upper limit; in this case between values 1,949 and 1,950).
Advantage: if you have a statistic with an asymmetric distribution, the percentile bootstrap is better. (See the R sketch below.)
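A sketch of the two intervals in R, assuming theta_hat is the statistic from the parent sample, boot_stat is the vector of bootstrap estimates, and n is the parent sample size (all names hypothetical; the df used for the t critical value depends on the statistic):

    se_boot <- sd(boot_stat)                       # SE = SD of the bootstrap distribution
    t_crit  <- qt(.975, df = n - 1)
    c(theta_hat - t_crit * se_boot,
      theta_hat + t_crit * se_boot)                # bootstrap t CI (symmetric, margin-of-error form)
    quantile(boot_stat, c(.025, .975))             # percentile CI: middle 95% of the bootstrap values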
Explain this statement: "In the quadratic regression equation, Ŷ = b1(X − X̄) + b2(X − X̄)² + b0, if the b2 coefficient is nonzero, then the regression of Y on X depends on the value of X". Use a figure to illustrate your answer.
Centered quadratic regression: Ŷ = b1(X − X̄) + b2(X − X̄)² + b0
The slope is defined at specific values of X. On a frown (concave-down) curve, at values of X to the LEFT of the center the slope is positive, whereas at values of X to the RIGHT of the center the slope is negative. The slope is the tangent line.
Why is centering used with X and Z in the regression equation Ŷ = b1X + b2Z + b3XZ + b0? Why is centering not used with Y?
Centering is used with the predictors (X and Z) to make the coefficients more interpretable Centering is NEVER used with the criterion (Y) because we want to keep the predicted outcome in the same units as the dependent variable
What is meant by nested models? Are models 1 and 2 below nested? ŷ = b0 + b1X1 + b2X2 + b3X3 + b4X4 (model 1) ŷ = b0 + b1X1 + b3X3 + b4X4 + b5X5 (model 2)
Concept behind nested models: Can simplify one model into the other by putting a constraint on it. No these are not nested. Model 2 sets 1 regression coefficient = 0, but also adds a new regression coefficient (b5) not found in Model 1. (See semipartial and partial... page 13)
In the regression equation ŷ = b1 X + b2 Z + b3 X Z + b0, we say that the regressions of Y on X and Y on Z are conditional relationships. What do we mean by this?
Conditional: change in value over rescaling of other variables in the regression equation In a regression equation, the regression coefficients for all but the highest order predictor are conditional. All these coefficients for predictors that have lower order than the highest order predictor are conditional and change when predictors are rescaled (e.g., centered)
Do all regressions of Y on X (all simple slopes) have the same value of the standard error, regardless of the value of Z?
Consider three simple slopes: Zhigh, Zmean, Zlow. Assume Z has a normal distribution: we have far fewer cases near Zhigh and Zlow than near Zmean, so the standard errors for those simple slopes will be larger; Zmean's will be the lowest standard error because it has the most observations nearby. Also, Z appears in the standard error formula, so the value changes as a function of Z. Answer: no :)
What is the regression equation for a curvilinear by linear interaction? What is the difference between the treatment of complex interactions in multiple regression and ANOVA?
Curvilinear by linear regression equation: ŷ = b0 + b1X + b2X² + b3Z + b4XZ + b5X²Z (equivalently ordered: ŷ = b0 + b1X + b2Z + b3XZ + b4X² + b5X²Z).
In regression, we need to build in all of the desired (hypothesized) terms.
Reduced model for the test of the omnibus X effect: Ŷ = b0 + b3Z + b4XZ + b5X²Z.
Reduced model for the test of the omnibus X-by-Z interaction: Ŷ = b0 + b1X + b2X² + b3Z.
Compare R² from the full model with R² from the reduced model; F test for gain in prediction.
In ANOVA, with 4 levels of A and 3 levels of B, the machinery automatically generates the tests: test of A (df = A − 1 = 3; linear, quadratic, cubic), test of B (df = B − 1 = 2; linear, quadratic), and test of A×B (df = (A − 1)(B − 1) = 6).
Because ANOVA automatically generates these tests and hence uses more df, you lower your statistical power (whereas in regression you can test just one specific effect). Disadvantage of regression: if there is an effect you did not build in, you might miss it.
What defines whether an interaction is ordinal versus disordinal in multiple regression? How does this differ from the situation in analysis of variance?
Disordinal: the simple slopes lead to lines that cross within the meaningful range of the variables; ordinal: no crossing within that range.
In ANOVA the output always gives everything: all main effects and the interaction. In regression you can take a more tailored approach and keep the IVs continuous instead of making levels.
If we have A (3 levels) and B (4 levels) in ANOVA: put this in and what do we get out? A main effect (df = 2), B main effect (df = 3), interaction (df = 6). The machinery of ANOVA produces this analysis.
If A and B are continuous variables, no model just comes out in regression; we have to build the model (we think and decide what model to estimate): ŷ = b0 + b1A + b2B + b3B² + b4AB + b5AB². This follows my theory, while ANOVA is a generic procedure that produces the omnibus effects.
The other difference is that in ANOVA the levels are determined by the researcher (fixed levels of the variable, e.g., based on the literature: control, low dose, high dose). In regression, we are sampling the levels (not fixed).
When considering ordinal vs. disordinal: in regression, you need to look at the range of the data and ask whether the lines cross within the range of the data (they will cross eventually, but is it within a meaningful range?); it is murkier in regression because the range is based on the sample, not on values you fixed (so it may change by sample). In ANOVA, we fix the range, so we do not need to worry about checking it: if the lines cross within those fixed levels the interaction is disordinal; if not, it is ordinal.
Explain what is meant by a "linear by linear" interaction. How many degrees of freedom does such an interaction have in Regression Analysis.
Each linear term contributes 1 df, so 1 × 1 = 1: a linear by linear interaction has 1 degree of freedom.
The maximum/minimum value of a quadratic regression equation will be at X = −b1/(2b2). How can we establish a confidence interval for this estimate? Describe the steps.
Establish a CI through bootstrapping, because we do not have a simple standard error for this quantity. Two ways: percentile or bootstrap t. Both start the same way: take 2,000 bootstrapped values of the min/max location. The percentile method finds the middle 95% of those values; the interval can be asymmetric (it may not be centered around the mean), which is good because we are not forcing possibly non-normally distributed estimates into a symmetric, normal-theory CI. (See the R sketch below.)
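A percentile-bootstrap sketch for the max/min location in R, assuming a data frame dat with x and y (names hypothetical):

    set.seed(1)
    vertex <- replicate(2000, {
      d <- dat[sample(nrow(dat), replace = TRUE), ]        # resample cases with replacement
      b <- coef(lm(y ~ x + I(x^2), data = d))
      -b["x"] / (2 * b["I(x^2)"])                          # X at the max/min in this resample
    })
    quantile(vertex, c(.025, .975))                        # percentile 95% CI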
Know how to compute interactions 1SD above the mean and 1SD below the mean using the computer method. How is the interaction at the mean computed?
First, rearrange the regression equation to look at simple slopes.
For the simple slope 1 SD above the mean of Z: SUBTRACT the SD from each Z value (in SPSS, use the "compute" function; in R you just create a new variable: "Zlow <- [formula here]"), then refit.
For the simple slope 1 SD below the mean of Z: ADD the SD to each Z value.
The interaction (simple slope) at the mean is computed by centering Z. (See the R sketch below.)
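A sketch of the computer method in R (data frame dat and variable names are hypothetical); refitting with the shifted moderator makes the coefficient on x the simple slope at that value of Z:

    dat$zc    <- dat$z - mean(dat$z)                   # centered Z: slope at the mean
    dat$zhigh <- dat$zc - sd(dat$z)                    # subtract SD -> slope at 1 SD above the mean
    dat$zlow  <- dat$zc + sd(dat$z)                    # add SD      -> slope at 1 SD below the mean
    coef(lm(y ~ x * zc,    data = dat))["x"]
    coef(lm(y ~ x * zhigh, data = dat))["x"]
    coef(lm(y ~ x * zlow,  data = dat))["x"]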
What is meant by a focal predictor? What is meant by a moderator?
Focal predictor - the predictor whose relationship to the criterion we are examining. Moderator - the variable on which the relationship of the focal predictor to the criterion depends (the relationship of one predictor to the criterion depends on the moderator).
I will give you the formula for the F test for the significance of the squared semi-partial correlation, along with degrees of freedom, and an F table. Using numerical information such as given in question 17 above (squared multiple correlations with different combinations of sets of predictors), be able to test hypotheses that squared semi-partial correlations are zero. Be able to state the hypotheses and compute the F test. For example, test the hypothesis that Set 2 in question 17 contributes significant variance over and above set 1. Assume n=100 here.
H0: ρ²Y(2.1) = 0; H1: ρ²Y(2.1) > 0
Fgain = [(r²all − r²set1)/(1 − r²all)] × [(n − k − m − 1)/m] = [(.32 − .17)/(1 − .32)] × [(100 − 2 − 2 − 1)/2] = (.15/.68)(95/2) ≈ 10.48
Reject the null. The two variables from Set 2 add significant prediction of the criterion, over and above the predictors in Set 1, F(2, 95) = 10.48, p < .01.
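The same computation in R, using the numbers above:

    r2_all <- .32; r2_set1 <- .17; n <- 100; k <- 2; m <- 2
    F_gain <- ((r2_all - r2_set1) / (1 - r2_all)) * ((n - k - m - 1) / m)
    F_gain                                             # about 10.48
    pf(F_gain, m, n - k - m - 1, lower.tail = FALSE)   # p-value, well below .01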
What two plots are used to probe these assumptions?
Homoscedasticity - residual scatterplot with predicted values (*ZPRED) on X and residual values (*ZRESID) on Y. Normality of residuals - P-P plot of the residuals.
In the analysis of the bodyweight-brainweight data, what features led to the log transformation of brainweight and bodyweight? What was the result of examining the large positive residuals in this example?
A huge range (two orders of magnitude) and highly positively skewed data led to the log transformation of both bodyweight and brainweight. After the log transformation, examining the large positive residuals showed that they belonged to the primates, suggesting primates may have the largest relative brain capacity. Adding a primate variable led to a higher R², and the plot of fitted values versus residuals also improved.
In the regression equation ŷ = b0 + b1X1 + b2X2, suppose you wish to test whether β1 = β2 in the population. What two models would you compare? If X1 and X2 represent variables measured in different units, can X1 and X2 be converted to z-scores and the results of the standardized regression equation be compared? Why or why not?
If we want to test the equality of two regression coefficients, then X1 and X2 need to be measured in the same units. We want to see whether the two regression coefficients are equal in the population, and we must do it unstandardized.
Using the concept of nested models: the full model is ŷ = b0 + b1X1 + b2X2; setting b1 = b2 gives the constrained model ŷ = b0 + b1(X1 + X2). We constrain b1 to equal b2 and compare this to our full model. If there is no significant difference in fit, then we can conclude that the coefficients are equal.
If using standardized scores, the comparison would only work if the two predictors have equal variances. (See the R sketch below.)
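A sketch of the nested-model comparison in R (data frame dat and variable names hypothetical; x1 and x2 must be in the same units for the constraint to make sense):

    full        <- lm(y ~ x1 + x2, data = dat)
    constrained <- lm(y ~ I(x1 + x2), data = dat)      # forces b1 = b2
    anova(constrained, full)                           # F test of H0: beta1 = beta2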
In the raw regression equation, what is the interpretation of b0, b1, and b2? In the centered regression equation, in which XC = X − X̄, what is the interpretation of the centered regression coefficients? Will any of the regression coefficients be the same in the raw and centered equation? General formula: Ŷ = b0 + b1X + b2X²
In the raw equation: b0 = predicted value of Y at X = 0; b1 = slope of Y on X at X = 0; b2 = the acceleration (how much the slope changes for a 1-unit change in X), which is invariant (non-conditional) and determines the shape.
In the centered equation: b0 = predicted value of Y at the mean of X (XC = 0); b1 = slope of Y on X at XC = 0, which is the average slope; b2 = invariant (non-conditional), determines the shape; here it is a quadratic.
Reminder: when the highest order term coefficient is negative, it's the :( face; when positive it's the :) face.
A traditional strategy in data analysis with continuous variables was to dichotomize the variables and analyze them in a 2 x 2 ANOVA. Give at least two problems with this strategy?
Multiple regression: ŷ = b0 + b1X + b2Z + b3XZ. ANOVA: ŷ = b0 + b1C1 + b2C2 + b3C3 (in this equation we have only high and low for each dichotomized variable).
1. Statistical power: chopping the variables in half lowers the statistical power for all effects; the effect size is approximately .7 of the true effect size for the main effects and .49 for the interaction. Shooting yourself in the foot.
2. Bias, particularly in the lower order coefficients: when we do regression, we assume we measured X and Z reliably; if we did not, it is not a proper regression model. With perfect reliability we have perfect control for X; without it, and if X and Z are correlated, we will get a biased (larger) effect for Z (bias in the lower order coefficients).
What are the two perspectives used to develop (derive) multiple regression? What is meant by each perspective in terms of the sampling of cases? How do the results of the two perspectives differ in the linear case? How do they differ in the nonlinear case?
In regression, as it was being developed, people took one of two approaches to derive it.
1. Fixed sampling: came out of the work of Fisher. Ex.: grades 1, 2, 3, 4, and 5; grade is one of our predictors (X = grade); the mean of X is fixed at 3 because we take the same number of children at each grade; there is no sampling variation in the mean of X, the SD of X, the correlation between X and Y, or the correlation between X and Z. Want 100 children? With fixed sampling we take 20 at each grade randomly, and we do this every time we take a sample. All of the sampling variability is in Y, not in X.
2. Random sampling: more recently gaining popularity. Go into an elementary school and randomly sample kids; X has a distribution (assumed, e.g., normal).
Using the linear model ŷ = b0 + b1X + b2Z (linear in the variables), the two approaches give identical estimates of b0, b1, and b2, identical standard errors, and identical statistical tests. BUT statistical power is not equal: because there is no variability of X in the fixed sampling approach, you need somewhat more participants under random sampling (about 15% more).
In the nonlinear case, ŷ = b0 + b1X + b2Z + b3XZ (interactions, quadratics, etc.), the standard error is more complex. In random sampling we assume a bivariate normal distribution, and there is no place in it for a nonlinear effect, so the derivations start to differ between fixed and random. Parameter estimates are identical, but the standard errors differ: when you center and test interactions using the fixed-effects standard errors under random sampling, the standard errors are too low. With a moderate to large sample size, percentile bootstrapping or a conservative alpha level (alpha = .02) can help. In the nonlinear case, standard errors tend to be too low under random sampling, so bootstrap to get better standard errors and minimize the Type I error rate.
Which terms in the regression equation = b1X + b2Z + b3 XZ + b0 are conditional? Which are non-conditional?
In the interactive regression equation, the XZ predictor is of order 2, the highest order in the regression equation, and thus the b3 coefficient is non-conditional. The b1 and b2 coefficients are both conditional, since they are not the highest order terms in the regression equations. Both predictors X and Z are of order one. The intercept is conditional as well, with order of zero
What are the basic assumptions underlying multiple regression?
Linearity Linearity means that the predictor variables in the regression have a straight-line relationship with the outcome variable. If your residuals are normally distributed and homoscedastic, you do not have to worry about linearity. Homoscedasticity The homoscedasticity means that the variance around the regression line is the same for all values of the predictor/independent variables. This is also known as homogeneity of variance. Normality of residuals Independence of observations
How can a polynomial regression equation be used to test a prediction of a curvilinear relationship of a predictor to the criterion? Why did Atkinson recommend that polynomials above a quadratic not generally be used? What problems occur with high degree polynomial regression equations?
Polynomials are equations for curves. A polynomial of order m has (m − 1) bends. Polynomial regression provides a global fit to the data and ignores problems of poor fit in local regions of X; outliers and other problems can induce problems in other regions of the curve.
Problems ("Assumptions and Nonlinearity," p. 5 ex.): above a quadratic, we need highly reliable independent/dependent variables and a wide range of the independent variable (X); if we go above that, much of the time what you will see is mostly error. Higher order polynomials are strongly affected by local misfit (which affects other areas). We want some function that addresses local fit: lowess and splines (they fit within a narrow region, so they do not get distorted the way higher order polynomials do).
What are the general expressions for R 2 mult in terms of (a) SS regression and SS total and (b) the standardized regression coefficients and the validities?
R²mult = SSregression / SStotal
R²mult = b*1·rY1 + b*2·rY2 + b*3·rY3 + b*4·rY4 (each standardized regression coefficient times its validity, the correlation of that predictor with Y)
What conditions lead us to prefer transformations over quadratic or piecewise regression when there is a nonlinear relationship between X and Y?
Recall that our basic multiple regression model assumed: Linearity, Homoscedasticity, Normality of Residuals, Independence of Observations We have seen how to address non-linearity through polynomial models and through splines. These approaches imply that assumptions 2 and 3 are met. However, in many cases more than one of the assumptions 1-3 will be violated. In these cases, transformations of X, Y or both may be helpful. In addition, transformations can sometimes address outliers.
Reading: Consider the two predictor regression equation Ŷ = b0 + b1X1 + b2X2. Friedman and Wall (2004) describe four types (regions) of relationships as r12 is changed. They focus on the standardized regression coefficients and R²Y.12. What are the names of the four regions? Give a brief description and the characteristics of each.
Region I: Complementarity (termed enhancement). Both X1 and X2 correlate positively with Y, but X1 and X2 have a negative correlation. R²Y.12 > r²Y.1 + r²Y.2.
Region II: Partial redundancy of X1 and X2 (termed redundancy). The most common case: both X1 and X2 have a positive correlation with Y (+ validities). Both b*Y1 and b*Y2 will be smaller in value than the corresponding rY1 and rY2.
Region III: Suppression. R²Y.12 < r²Y.1 + r²Y.2. Suppose we have a math aptitude test (X1) given the first day of class that predicts the score on the final exam (Y). The aptitude test is highly speeded, so most students do not finish. Suppose we also have a test of reading speed (X2) that is correlated with the score on X1. Reading speed (X2) might or might not be positively correlated with the score on the final exam (Y), which reflects mastery of the mathematical concepts. If we include both X1 and X2 in a regression equation predicting Y, the influence of the second component (speed) is removed.
Region IV: Enhancement (again). R²Y.12 > r²Y.1 + r²Y.2 as in Region I, but now r12 tends to be relatively high, so there may be substantial multicollinearity.
What is meant by a linear x linear interaction?
Regression of Y on X is linear across the range of Z; regression of Y on Z is linear across the range of X; straight lines crossing.
If there were any nonlinearity in the interaction, e.g., a curvilinear relationship of X to Y at some values of Z but not others, would the linear x linear interaction detect this curvilinearity? No: the regression of Y on X (and of Y on Z) is linear; you are only looking for a linear relationship. Same as a 2x2 in ANOVA; not a test of curvilinearity.
What term in a regression equation would detect a curvilinear relationship of X to Y that varied in curvature as a function of the value of Z? In Ŷ = b1X + b2X² + b3Z + b4XZ + b5X²Z + b0, the b5X²Z term: the curvature in X is a function of Z.
Explain how to find an appropriate "standardized solution" in regression with interactions. Why is the standardized solution provided by R or SPSS that accompanies the solution with the centered predictors an inappropriate standardized solution?
Right way: standardize X and standardize Z, then create the cross-product.
Wrong way: SPSS takes the cross-product of X and Z and then standardizes it.
Proper "standardized solution" for an equation containing an interaction:
Step 1: convert the raw X and raw Z predictors to standardized z-scores.
Step 2: form the cross-product of the standardized scores, zx * zz.
Step 3: compute zy, the standardized score corresponding to criterion Y, from zy = (Y − Ȳ)/sy, where sy is the standard deviation of Y.
Step 4: run the regression analysis using the standardized scores you have calculated.
Step 5: report the "raw" regression equation from the regression analysis in Step 4. This is the appropriate "standardized" solution to report. Ignore the standardized coefficients from this analysis.
Step 6: if you want to report any simple slope analysis in standardized form, use the "raw" equation from Step 5 as the basis of this analysis.
The issue is one of order of operations. The computer programs employ the following order: the cross-product term (either XZ or (X − X̄)(Z − Z̄)) is entered as a predictor by the researcher; conceptually, the program then standardizes the cross-product term itself, forming the z-score of the product, from which the standardized regression coefficient for the interaction is computed. (See the R sketch below.)
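A sketch of the proper order of operations in R (data frame dat hypothetical):

    zx <- as.numeric(scale(dat$x))                 # Step 1: standardize the predictors
    zz <- as.numeric(scale(dat$z))
    zy <- as.numeric(scale(dat$y))                 # Step 3: standardize the criterion
    fit_std <- lm(zy ~ zx + zz + I(zx * zz))       # Steps 2 and 4: product of z-scores, then regress
    coef(fit_std)                                  # Step 5: report these "raw" coefficients as the
                                                   # appropriate standardized solution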
What is a scatterplot matrix? What does a scatterplot matrix potentially tell us that a correlation matrix does not?
Shows the relationship between each pair of variables (one on the x axis, one on the y axis) in each cell; some statistical packages give a histogram on the diagonal (x by x, y by y, etc.).
You can see whether there is a curvilinear relationship and whether there are outliers; you cannot see this in the correlation matrix, which gives only the linear relationship.
The scatterplot matrix is much more general: it shows what the relationship looks like rather than a single value, shows outliers, and shows potential curvilinear relationships. (See the R sketch below.)
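A quick R sketch, assuming dat is a numeric data frame (name hypothetical):

    pairs(dat, panel = panel.smooth)               # scatterplot matrix with a smooth in each panel
    cor(dat)                                       # the correlation matrix only summarizes the linear relationships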
What do we mean by simple (conditional) regression equations, simple (conditional) slopes? To what are they analogous in ANOVA.
Simple regression equations and simple slopes are those that look at the effect of one variable while the other is held constant (the effect of X at a specific value of Z, for example).
Linear by linear interaction in regression: Ŷ = b0 + b1X + b2Z + b3XZ, where (b1 + b3Z) is the simple slope of Y on X. Choose specific values of Z for high/medium/low (e.g., cut scores on the BDI); there are simple slopes at those different points.
In ANOVA, with only two levels you can only have a straight line (a linear by linear interaction); you test simple effects (differences between means conditional on the level of the other factor), the analog of simple slopes in regression.
When predictors overlap in their association with the criterion, what approach is used to apportion variance accounted to individual predictors?
Specify order of priority of the variables Assign first variable all of the overlap of variable with the criterion Assign second variable its overlap with the criterion that is unique (squared semi-partial correlation)
What is measured by the squared semi-partial correlation of a predictor with the criterion?
Squared semi-partial correlation: additive variance based upon adding a new predictor into the model - basically, this is asking: what proportion of the variance is uniquely added by this predictor?
How do I construct a confidence interval for bi in the p predictor case? How do I interpret the CI [5 < β1 < 10]= .95? Is this regression coefficient significantly different from 0?
Margin of error (ME) = tcrit × (standard error of the regression coefficient, sbi).
Create the confidence interval: C[bi − ME ≤ βi ≤ bi + ME] = 1 − α.
For the CI [5 < β1 < 10] = .95: we can be 95% confident that the interval contains the true value of β1. The interval does not contain zero, so we can conclude that the regression coefficient is significantly different from zero (at α = .05).
What defines synergistic (enhancing), buffering, and compensatory interactions?
Synergistic (enhancing): one variable strengthens the impact of the second variable; they work in the same direction.
Buffering: one variable weakens the impact of the second variable or even "cancels out" the effect of the second variable.
Compensatory interactions: two predictors relate to the outcome in the same direction, but the interaction between them is in the opposite direction from the relationship of the individual variables to the outcome. As one predictor increases in value, the strength of the relationship of the other predictor decreases. This kind of interaction has also been referred to as an antagonistic or interference interaction.
Explain how you would use the rearranged equation (Q. 57) to generate three simple regression equations, one at ZH (one standard deviation above the mean of Z), one at ZM (at the mean of Z), and one at ZL (one standard deviation below the mean of Z). Be able to do this if given a numerical example. Be able to take R or SPSS printout and reproduce the three simple regression equations. Be able to plot simple regression equations.
To probe the interaction by direct substitution, plug values of Z into the rearranged equation Ŷ = (b1 + b3Z)X + (b2Z + b0): ZH = mean of Z + 1 SD, ZM = mean of Z, ZL = mean of Z − 1 SD. (With the computer method you instead subtract the SD from the Z scores to get the equation at ZH and add the SD to get the equation at ZL.)
Example: b0 = 8, b1 = 2, b2 = 5, b3 = 1, mean of Z = 0, SD of Z = 10 (centered interaction).
At the mean of Z (Z = 0): Ŷ = (2 + 1·0)X + (5·0 + 8) = 2X + 8
1 SD below the mean (Z = −10): Ŷ = (2 − 10)X + (5·(−10) + 8) = −8X − 42
1 SD above the mean (Z = +10): Ŷ = (2 + 10)X + (5·10 + 8) = 12X + 58
(See the R sketch below for plotting.)
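Plotting the three simple regression lines from this numerical example in R (the X range shown is arbitrary):

    b0 <- 8; b1 <- 2; b2 <- 5; b3 <- 1
    z_vals     <- c(ZL = -10, ZM = 0, ZH = 10)
    slopes     <- b1 + b3 * z_vals                 # -8, 2, 12
    intercepts <- b2 * z_vals + b0                 # -42, 8, 58
    plot(NULL, xlim = c(-3, 3), ylim = c(-80, 100), xlab = "X", ylab = "predicted Y")
    for (i in seq_along(z_vals)) abline(a = intercepts[i], b = slopes[i])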
What is meant by suppression?
The authors use the term suppression for a situation where including a suppressor variable along with another variable (Xi) increases the amount of variance explained (R²) beyond what Xi explains on its own. The inclusion of the suppressor in the regression equation removes (suppresses) the unwanted variance in Xi and, in effect, enhances the relationship between Xi and Y. Example: one variable is a (speeded) math ability test, one variable is final course grade, and one variable is response speed. There is no correlation between response speed and final course grade, but there is a high positive correlation between response speed and the math ability test, and between the math ability test and final course grade. The suppressor is response speed. The multiple correlation with the suppressor included will be higher than the correlation between the math ability test and final course grade alone.
Why must all lower order terms be included in a regression equation when the prediction from a higher order term is being examined for significance?
The coefficient for the highest order term is an accurate reflection of the curvilinearity (or interaction) at the highest level only if all lower order terms are partialed out. When testing the model, omitting the lower order terms would bias the higher order coefficient, and it also distorts the location of the curve on the Y axis.
What does it mean if a regression equation is "linear in the coefficients"?
The predicted score is a linear combination of the coefficients.
Two cases for linear in the coefficients:
ŷ = b0 + b1·log(X) + b2·e^Z — still a linear equation (in the coefficients) but not a linear relationship.
Ŷ = e^(b1X)·e^(b2Z)·e^(b3W)·e^(residual) — this is not itself linear, but it can be linearized: taking logs gives log Ŷ = b1X + b2Z + b3W + residual. This equation can be linearized, but other equations might not be.
What does it mean if a regression is "linear in the variables"? Two meanings: (1) a constant change in X leads to a constant change in Y (the slope is constant); (2) the residuals are normally distributed. Ŷ = b0 + b1X + b2Z is linear in the variables: the Y-vs-X and Y-vs-Z plots both look linear; the functional forms relating X to Y and Z to Y are linear.
How do the squared partial and squared semi-partial correlation differ in what is being measured?
They both talk about the unique contribution effects - but in semi-partial, it's the unique contribution out of all of the variance; in partial, it is the unique contribution out of what has yet to be explained of the variance
What is meant by a third order predictor? Give two examples.
A third order predictor is one of order 3 (the exponents sum to 3): e.g., X³ for a cubic relationship between the predictor and the criterion, XZW for a three-way interaction between predictors, or X²Z for a linear x curvilinear interaction.
What do interactions between predictors signify in multiple regression analysis? Is there additivity of two predictors? Give an algebraic explanation of how the regression of Y on X is affected if there is an interaction. What does the geometry of the surface look like (sketch)? ŷ= b1 X + b2 Z + b3 XZ + b0
This is a non-additive model. The new term XZ literally represents the product of the predictors X and Z. For each case, the score on predictor X and the score on Z are multiplied together to form a score on a third predictor, XZ. This third predictor XZ carries the interaction between X and Z. The product XZ of the two predictors is a nonlinear function of each predictor X and Z, which produces the warp in the regression surface (sketch: a plane that twists, so that the slope of Y on X changes as Z changes).
When we compute the standard error of an individual regression coefficient, we have to take into account the redundancy of the predictor in question with other predictors. How is this measured? That is, be able to explain the generalization of the measure of tolerance and the VIF to the multiple predictor case. What characteristic does the VIF have for X1 and X2 in the two predictor case that it does not have in the four predictor case for X1, X2, X3, and X4?
To take into account the extent to which a predictor is redundant with the other predictors, we predict each predictor from the remaining predictors and obtain its squared multiple correlation, R²i.
Tolerance = 1 − R²i; this is the non-overlapping variance. In multiple regression, tolerance is used as an indicator of multicollinearity. All other things equal, we want high tolerance, as low tolerance adversely affects the results (inflates the standard errors) associated with a multiple regression analysis.
What characteristic does the VIF have for X1 and X2 in the two predictor case that it does not have in the four predictor case? With only two predictors, the overlap of X1 with X2 is the same as the overlap of X2 with X1, so the standard errors of the two standardized regression coefficients are equal. With 3+ predictors, the overlap might or might not be the same, so each predictor can have a different tolerance, VIF, and standard error, and the equation is slightly more complex. As VIF gets bigger, there is more and more overlap among the predictors. Multicollinearity = the extent to which the predictors are measuring the same thing. (See the R sketch below.)
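A sketch of the computation for one predictor in R (data frame dat with y and x1-x4; all names hypothetical):

    r2_1  <- summary(lm(x1 ~ x2 + x3 + x4, data = dat))$r.squared
    tol_1 <- 1 - r2_1                              # tolerance: variance of X1 not shared with the others
    vif_1 <- 1 / tol_1                             # VIF: factor by which var(b1) is inflated
    # If the car package is installed, car::vif(lm(y ~ x1 + x2 + x3 + x4, data = dat)) gives all four.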
What is the purpose of Fisher's r to z transform?
To transform a skewed distribution into a close relative of the normal distribution.
How does the VIF relate to the tolerance? How is the VIF interpreted?
The Variance Inflation Factor (VIF) is the reciprocal of tolerance: VIF = 1/Tolerance. We want large tolerances and small VIFs, which suggest low multicollinearity. "Thus the VIF provides an index of how many times the variance of regression coefficient i is inflated relative to a regression equation in which predictor i is uncorrelated with all other independent variables."
Give a brief step by step outline of how bootstrapping is conceptually conducted.
We take a sample (say a sample of 50 cases) and then we create ~1000 new samples of 50 cases, randomly selecting (*with replacement*) cases from the original sample. With each "new" sample, we calculate the statistic(s) of interest (mean, regression coefficient, etc.) This then creates a HUGE number of sample statistics, so large that the average value of each statistic is closer to the expected value of the population parameter (mean, regression coefficient, etc.). These sample statistics represent the sampling distribution of each parameter of interest. Have a parent sample, take a large number of bootstrap samples from the parent sample all with the same sample size -- sampling with replacement; then, calculate the statistic of interest (e.g., regression coefficient) in each bootstrap sample
When can bootstrapping produce an advantage over standard (normal theory) approaches?
When non-normal: high skewness and high positive kurtosis Good time to use specifically the percentile bootstrap method With non-pivot statistics
One parameterization of the piecewise regression model is as follows:
Xc = X − 5 [X is centered at the cutpoint, so Xc = 0 at the cutpoint]
If X ≤ 5, D1 = 1 and D2 = 0; if X > 5, D1 = 0 and D2 = 1. [D1 and D2 are "on-off switches"; Di = 1 indicates the segment that is being considered.]
In this parameterization (parameterization 2), b2 is the slope in the second segment (vs. parameterization 1, where X is not centered and b2 is the change in slope):
b0 is the intercept, the predicted value of Y when Xc = 0 (ŷ at the cutpoint);
b1 is the slope of the first segment;
b2 is the slope of the second segment. (See the R sketch below.)
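A sketch of this parameterization in R, consistent with the description above (data frame dat and the cutpoint of 5 are assumptions):

    dat$xc <- dat$x - 5                            # X centered at the cutpoint
    dat$d1 <- as.numeric(dat$x <= 5)               # on-off switch for segment 1
    dat$d2 <- as.numeric(dat$x >  5)               # on-off switch for segment 2
    fit <- lm(y ~ I(xc * d1) + I(xc * d2), data = dat)
    coef(fit)   # intercept = predicted Y at the cutpoint; then slope of segment 1, slope of segment 2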
How do we build a linear by quadratic interaction into a regression analysis? How many degrees of freedom does such an interaction have in regression analysis.
Ŷ = b0 + b1X + b2X² + b3Z + b4XZ + b5X²Z (b4XZ + b5X²Z together form the full interaction term)
2 degrees of freedom, because there are two interaction terms: linear by linear, b4XZ; quadratic by linear, b5X²Z.
I will give you the equation for the standard error of a simple slope for the regression of Y on X at values of Z: se(simple slope at Z) = (s11 + 2Z·s13 + Z²·s33)^(1/2). Explain what terms from Sb (the covariance matrix of the regression coefficients) go into this expression for the standard error. To what does Z refer in this expression? Be able to compute this standard error if given the matrix Sb and a particular numeric value of Z (say, e.g., ZH = 20, ZL = 10). Also be able to compute the t-test for the simple slope and know the degrees of freedom for the test.
Z refers to the value of the moderator at which you want to hold Z constant; plug in that value.
From the variance/covariance matrix of the regression coefficients: s11 = variance of b1; s13 = covariance of b1 and b3; s33 = variance of b3.
t = simple slope / standard error; degrees of freedom are the same as for the residual, n − p − 1. (See the R sketch below.)
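A sketch of the computation in R for, say, ZH = 20 (data frame dat and variable names hypothetical):

    fit <- lm(y ~ x + z + x:z, data = dat)
    S   <- vcov(fit)                               # covariance matrix of the regression coefficients
    zh  <- 20
    slope <- coef(fit)["x"] + coef(fit)["x:z"] * zh
    se    <- sqrt(S["x", "x"] + 2 * zh * S["x", "x:z"] + zh^2 * S["x:z", "x:z"])
    t_val <- slope / se
    2 * pt(abs(t_val), df = df.residual(fit), lower.tail = FALSE)   # df = n - p - 1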
Suppose you had the following information about five predictors. The predictors can be divided into three sets: (a) Set 1: predictors 1 and 2 form a demographic set; (b) Set 2: predictors 3 and 4 are two personality measures; (c) Set 3: predictor 5 is an experimental treatment (intervention vs. control). Here are a number of squared multiple correlations that result from regression analyses:
(1) Set 1, predictors 1 and 2 only: r²y.12 = .17
(2) Set 2, predictors 3 and 4 only: r²y.34 = .21
(3) Set 1 plus Set 2, predictors 1, 2, 3, and 4: r²y.1234 = .32
(4) Set 1 plus Set 2 plus Set 3, predictors 1, 2, 3, 4, and 5: r²y.12345 = .41
a. What are the values of the squared semi-partial and squared partial correlations of Set 2 with the criterion, over and above Set 1? Know formulas and show calculation.
Semi-partial: Set 1 alone, r²y.12 = .17; Sets 1 and 2, r²y.1234 = .32; Set 2 over Set 1: .32 − .17 = .15 = r²y(34.12). 15% of the TOTAL VARIANCE IN Y can be uniquely attributed to Set 2.
Partial: (.32 − .17)/(1 − .17) = .18 = r²y34.12. 18% of the variance in Y that is UNEXPLAINED BY SET 1 can be uniquely attributed to Set 2.
b. What are the values of the squared semi-partial and squared partial correlations of Set 3 with the criterion, over and above combined Sets 1 and 2? Know formulas and show calculation.
Semi-partial: Set 3 over combined Sets 1 and 2: .41 − .32 = .09 = r²y(5.1234). 9% of the TOTAL VARIANCE IN Y can be uniquely attributed to Set 3.
Partial: (.41 − .32)/(1 − .32) = .13 = r²y5.1234. 13% of the variance in Y that is UNEXPLAINED BY SETS 1 and 2 can be explained by Set 3.
c. Using the information from a. and b., show how the squared multiple correlation of predictors 1-5 with the criterion is built up from the squared semi-partial correlations: .17 + .15 + .09 = .41.
How do the tests of the significance of each regression coefficient in the multiple regression analysis Ŷ = b0 + b1X1 + b2X2 + b3X3 + b4X4 of the original data differ from that of the analysis (a) in which the predictors (but not the criterion) have been centered? (b) In which the predictors and criterion have been standardized (both X variables and Y variable are converted to z-scores prior to analysis)?
ŷ = b0 + [b1X1 + b2X2 + b3X3 + b4X4] → here the highest order terms are first order (shown in brackets).
(a) Centering the predictors: the highest order coefficients do not change; the lower order coefficient (here, only b0) changes in value, but significance does not change. Remember: when the predictors are centered, b0 = the predicted Y at the means of the predictors; when not centered, b0 = the predicted Y at all X = 0. So the tests of significance of the highest order terms do not change, and although the lower order terms change in value, their significance pattern is the issue only for genuinely conditional terms.
(b) Standardized case: with both the predictors and the criterion standardized, b0 = 0 (the regression passes through the means, which are all zero). The tests of significance may differ from the unstandardized tests, because standardizing uses the standard deviation of each predictor, and these can have a differing impact for each predictor. Steve says to trust the tests of the unstandardized coefficients.
In the regression equation, Predicted weight loss = b1pillgrp + b2exercise + b0, pillgroup = 1 if participant takes a weight loss pill and 0 = if participant takes a placebo (control), and exercise is the number of hours of exercise per week. What do b0, b1, and b2 represent in this regression equation?
b0 = predicted weight loss if given the placebo and you don't work out (aka, Jack). b0 will change as a function of X versus X-centered: b0 = predicted weight loss when pillgrp = 0 (placebo) and exercise = 0 (0 hours of exercise); b0c = predicted weight loss when pillgrp = 0 and centered exercise = 0 (mean exercise).
b1 = the increase in predicted weight loss if given the pill; the difference in level between the two regression lines (constant across exercise).
b2 = the increase in predicted weight loss for every 1 hour of exercise; the slope of weight loss on exercise for each regression line.
Do you even know what b0c + b1c is? It's the mean weight loss in the pill group.
Suppose you analyze a data set with the equation Ŷ = b1X + b2Z + b3 XZ + b0, and you use centered predictors. You then repeat the regression analysis with uncentered predictors. Which coefficients will change across analyses and why?
The b0, b1, and b2 coefficients will change from the uncentered to the centered interactive equation. Only the b3 coefficient for the interaction remains constant when the predictors are rescaled from uncentered to centered. WHY: the highest order predictor (b3 in this case) is NON-CONDITIONAL, so it does not change with centering. The lower order coefficients (here b0, b1, and b2) are CONDITIONAL, so they DO change with centering. When a coefficient is "conditional," its value depends on where zero falls on the other predictors' scales, so centering (shifting the zero point) changes those coefficients.
Assume that we are working with the equation above and that both X and Z are centered, and that the XZ term is the product of centered X times centered Z. What are two interpretations of the b1 coefficient, the b2 coefficient.
b1: Regression(slope) of Y on X at mean of Z b1: Average slope of Y on X b2: Regression(slope) of Y on Z at mean of X b2: Average slope of Y on Z
What does the b2 coefficient tell you about the relationship of X to Y?
The b2 coefficient (regardless of whether X is uncentered or centered) gives the extent (how much curvature) and the direction (concave up or concave down) of the curvature. A negative b2 coefficient indicates that the curve is concave downward.
Will the b2 coefficient in the equation in question 34 change if the variables are centered or not? Explain your answer. Will the b1 coefficient change
b2 will be identical in the centered and uncentered regression equations because centering does not change the shape of the relationship of X to Y. b1 will change in the centered and uncentered regression equations because it depends upon the scaling of the predictor X.
How do we know if the curve produced by a quadratic regression equation will reach a maximum or minimum?
max if b2 is negative, min if b2 is positive
Be able to look at pictures of quadratic relationships such as those in Display 1 of the Handout 7 Addendum and to indicate what the values of b1 and b2 would be in a second order polynomial describing the data. Otherwise stated, be able to determine the sign and very roughly the values of b0, b1, and b2 from looking at a graph.
Positive or negative coefficients; opening downward or upward; U shaped (b2 positive) or inverted U shaped (b2 negative).
What graphical checks on the residuals are normally performed in multiple regression with p predictors?
QQ plot of the residuals against the normal distribution; scatterplot of the residuals (y axis) against the fitted values (x axis), with a lowess line added. (See the R sketch below.)
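In R, for a fitted model fit (name hypothetical):

    qqnorm(resid(fit)); qqline(resid(fit))         # QQ plot of residuals against the normal
    plot(fitted(fit), resid(fit), xlab = "fitted values", ylab = "residuals")
    lines(lowess(fitted(fit), resid(fit)))         # lowess through the residuals
    abline(h = 0, lty = 2)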
Know how to compute the squared semi-partial correlation from two squared multiple correlations (I will not give formula). Say you had r2y.123 and r2y.12. Then the squared semi-partial correlation of X3 with the criterion, over and above X1 and X2 is
r2y(3.12) = r2y.123 - r2y.12
Know how to compute the squared partial correlation from two squared multiple correlations (I will not give formula). Say you had r2y.123 and r2y.12. Then the squared partial correlation of X3 with the criterion, over and above X1 and X2 is
r²y3.12 = (r²y.123 − r²y.12) / (1 − r²y.12)   (partial r² of X3 with the criterion, with X1 and X2 partialed out)