Regression Exam 2


When can bootstrapping produce an advantage over standard (normal theory) approaches?

1. When assumptions are violated.
2. When you have non-pivot statistics (e.g., correlations, whose standard error changes as a function of rho). Non-pivot statistics have standard errors that depend on the value of other quantities in the equation.
3. When the test statistic (its sampling distribution) is unknown.
4. When a standard error exists but is only asymptotic.
5. When you need standard errors for non-parametric curves.

Give a brief step by step outline of how bootstrapping is conceptually conducted

1. Take a random sample from the population (called the "parent sample").
2. Take a random sample of n cases from the parent sample, with replacement, where n is exactly the same size as the parent sample: randomly select a case, return it, then select another, until n cases have been drawn.
3. Analyze the data in the bootstrap sample.
4. Repeat the process of drawing random samples with replacement from the parent sample and conducting the regression analysis as many times as specified.
5. Create the empirical sampling distribution of the estimate (e.g., bXY) and either (a) calculate the bootstrap standard error and substitute it for the traditional (normal theory) standard error, or (b) form a percentile bootstrap confidence interval (see the R sketch below).
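
A minimal R sketch of this procedure, assuming a hypothetical data frame dat with criterion y and predictor x (the variable names and B = 2000 are illustrative, not from the handout):

```r
set.seed(123)
B <- 2000
n <- nrow(dat)
boot_b <- numeric(B)
for (i in 1:B) {
  idx <- sample(1:n, size = n, replace = TRUE)          # draw n cases with replacement
  boot_b[i] <- coef(lm(y ~ x, data = dat[idx, ]))["x"]  # re-estimate bXY in each bootstrap sample
}
se_boot <- sd(boot_b)                       # (a) bootstrap standard error
ci_pct  <- quantile(boot_b, c(.025, .975))  # (b) percentile bootstrap 95% CI
```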

(59) Simple slope standard error equation: s_bYX|Z = sqrt(s^2_b1 + 2*Z*s_b1b3 + Z^2*s^2_b3). Explain what terms from Sb (the covariance matrix of the regression coefficients) go into this expression for the standard error. To what does Z refer in this expression? Be able to compute this standard error if given the matrix Sb and a particular numeric value of Z.

1. s^2_b1 is the variance of b1.
2. s^2_b3 is the variance of b3.
3. s_b1b3 is the covariance of b1 and b3.
4. Z is the chosen numeric value of the moderator variable Z.
5. s_bYX|Z is the standard error of the simple slope of the regression of Y on X at the chosen numeric value of Z.
These values are available in the BCOV matrix in SPSS and the vcov() matrix in R, so the t-test can easily be computed by hand (see the sketch below).
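
A small R sketch, assuming a hypothetical fitted model fit <- lm(y ~ x * z, data = dat) with centered x and z (the names are illustrative):

```r
b <- coef(fit)          # b0, b1 ("x"), b2 ("z"), b3 ("x:z")
S <- vcov(fit)          # Sb: covariance matrix of the regression coefficients
Zval <- 1               # the chosen numeric value of Z

simple_slope <- b["x"] + b["x:z"] * Zval
se_slope <- sqrt(S["x", "x"] + 2 * Zval * S["x", "x:z"] + Zval^2 * S["x:z", "x:z"])
t_stat <- simple_slope / se_slope   # compare to t with df = n - p - 1 = n - 4
```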

What is meant by a linear x linear interaction? If there were any nonlinearity in the interaction, e.g. a curvilinear relationship of X to Y at some values of Z but not others, would the linear x linear interaction detect this curvilinearity? What term in a regression equation would detect a curvilinear relationship of X to Y that varied in curvature as a function of the value of Z?

A linear by linear interaction indicates that the regression of Y on X is linear at every value of Z; conversely, the regression of Y on Z is linear at every value of X. The linear x linear interaction would not detect curvilinearity. To detect an interaction that is not linear x linear, we need to add additional terms to capture it, e.g., Y = b1X + b2X^2 + b3Z + b4XZ + b5X^2Z + b0, where the b5X^2Z term detects a curvilinear relationship of X to Y that varies in curvature as a function of the value of Z.

Explain what is meant by a "linear by linear" interaction. How many degrees of freedom does such an interaction have in Regression Analysis?

A linear by linear interaction is one in which the regression of Y on X is linear at all values of Z, and the regression of Y on Z is linear at all values of X. In the regression equation Y = b1X + b2Z + b3XZ + b0, the interaction characterized by the XZ term is the "linear by linear" interaction: it measures how the linear regression of Y on X changes in slope as Z changes linearly. The interaction has only one degree of freedom.

What is meant by an equivalent regression model?

A model that produces exactly the same overall fit (R2) and predicted values (Yhat) as another, but which leads to different parameter estimates and hence different interpretations of the parameters; e.g., the raw quadratic equation vs. the centered quadratic equation.

What is the difference between the bootstrap t confidence interval and the percentile bootstrap confidence interval? When is the percentile bootstrap confidence interval preferred?

The bootstrap t confidence interval uses the standard deviation of the bootstrap sampling distribution of bXY as the standard error and forms the interval as bXY plus or minus tcrit times that standard error. The percentile bootstrap confidence interval orders the bootstrapped bXY values (or any parameter of interest) and takes the value cutting off the lowest 2.5% as the LL and the value cutting off the highest 2.5% as the UL. The percentile bootstrap confidence interval is preferred when the sampling distribution is skewed, since it allows for skewness, whereas the bootstrap t CI assumes a symmetric, normal-shaped distribution.

What problem has occurred if you get an error message "determinant = 0" or "results are aliased"?

Each matrix has a determinant. For correlation, covariance, and covariation matrices, the determinant should be a positive number. If the determinant of the covariance (or covariation or correlation) matrix is 0, then division by 0 will occur and the results of the regression analysis will not be meaningful. This occurs when one of the variables is a linear combination of the other variables being analyzed (e.g., three predictors X1, X2, X3, where X3 = X1 - X2). This is known as exact multicollinearity. The variable(s) that are redundant with the others need to be removed from the model; otherwise it cannot be estimated.

What does additivity mean in a regression equation containing predictors X and Z and the criterion Y?

Each predictor contributes a certain amount to Y, with no interaction between X and Z (the effect of X is the same across Z and vice versa)

When predictors overlap in their association with the criterion, what approach is used to apportion variance accounted for to individual predictors? How can you use this approach to make theoretical arguments about the importance of particular predictors or sets of predictors, taking into account other predictors?

Each squared semi-partial correlation is the unique variance accounted for by that predictor (or set of predictors) over and above all of the other predictors in the model. The squared semi-partials will not sum to the overall R2, because variance that is explained by more than one predictor is not attributed to any single predictor; hierarchical regression partitions the total R2 into sets. We can then test the gain in prediction for a set with F = [(R2full - R2reduced)/m] / [(1 - R2full)/(n - p - 1)], where the difference between R2full and R2reduced is the increase in the proportion of variance accounted for by the set of predictors that is added (a small R sketch follows below).
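
A hypothetical helper for this F test in R (the function name and arguments are illustrative assumptions):

```r
# Gain-in-prediction F test: m predictors added, p predictors in the full model, sample size n.
f_gain <- function(r2_full, r2_reduced, m, n, p) {
  Fval <- ((r2_full - r2_reduced) / m) / ((1 - r2_full) / (n - p - 1))
  c(F = Fval, df1 = m, df2 = n - p - 1,
    p = pf(Fval, m, n - p - 1, lower.tail = FALSE))
}
```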

(59) Be able to compute the t-test for the simple slope and know the degrees of freedom for the test

H0: βYX|Z = 0 (the simple slope of Y on X at a specified value of Z equals 0). Ha: βYX|Z > 0 or Ha': βYX|Z < 0. t = bYX|Z / s_bYX|Z, with df = n - p - 1; here there are three predictors in the interaction regression equation (X, Z, XZ), so n - p - 1 = n - 4.

What is meant by nested models? Are models 1 and 2 below nested? Yhat = b0 + b1X1 + b2X2 + b3X3 + b4X4 (Model 1); Yhat = b0 + b1X1 + b3X3 + b4X4 + b5X5 (Model 2)

Nested models arise when a restriction is imposed on a full model: you have a full equation and a second equation in which one or more coefficients are set to 0, fixed to a constant, or set equal to another coefficient. In the example above, Model 1 and Model 2 are not nested: Model 2 contains a regression coefficient (b5) not found in Model 1, and no restrictions are imposed on coefficients b0-b4.

If I test b1 in the regression equation Yhat = b0 + b1X1 and in the equation Yhat = b0 + b1X1 + b2X2 + b3X3, will my test of significance in general produce the same result? Why? Is there any case in which it will produce the same result?

No, in general they will not produce the same result, because X1 is likely to be correlated with the other predictors (X2, X3), which changes both the value of b1 and its standard error. If X1 is completely uncorrelated with the other predictors, the b1 coefficient would be the same in both equations.

Will the b2 coefficient change if the variables are centered or not? Explain your answer. Will the b1 coefficient change?

No; b2 is the highest order term in the equation, so it is unaffected by centering. The value of b1 will change, though, since b1 is the slope at X = 0 and therefore depends on where X is centered.

Which terms in the regression equation Y = b0 + b1X + b2Z + b3 XZ are conditional? Which are non-conditional?

Non-conditional: b3. Conditional: b0, b1, and b2.

(1) Set 1, predictors 1 and 2 only: r2y.12 = .17 (2) Set 2, predictors 3 and 4 only: r2y.34 = .21 (3) Set 1 plus Set 2, predictors 1, 2, 3, and 4: r2y.1234 = .32 (4) Set 1 plus Set 2 plus Set 3, predictors 1, 2, 3, 4, and 5: r2y.12345 = .41. What are the values of the squared semi-partial and squared partial correlations of set 3 with the criterion, over and above combined sets 1 and 2?

Squared semi-partial correlation of set 3 = R2y(5.1234) = R2y.12345 - R2y.1234 = .41 - .32 = .09. Squared partial correlation of set 3 = R2y5.1234 = (R2y.12345 - R2y.1234)/(1 - R2y.1234) = .09/.68 = .13.

Interpret the b1 coefficient in three ways, assuming X has been centered before analysis

· the slope of Y on X when X = the mean of X (the regression of Y on the predictor at the value of zero on the centered predictor)
· the average of the regressions across the range of X
· the average linear effect in the data

How can a polynomial regression equation be used to test a prediction of a curvilinear relationship of a predictor to the criterion? Why did Atkinson recommend that polynomials above a quadratic not generally be used? What problems occur with high degree polynomial regression equations?

Tests of polynomials above quadratic can be difficult to implement in many research areas. A. C. Atkinson (1985) makes the strong argument that findings regarding polynomials of order > 2 are very unlikely to replicate except under two special conditions: (a) a designed experiment in which the levels of X are fixed (e.g., 0, 50, 100, 200, 400, 800 milliseconds exposure time) and (b) the dependent measure (Y) is measured with minimal error. This latter condition normally only occurs when there are multiple assessments of the outcome with each subject under the fixed conditions. With the exception of some areas of experimental (cognitive) psychology, these conditions do not characterize most research in the social sciences.

What are the basic assumptions underlying multiple regression?

1. Linearity (linear relationship between the predictors and Y)
2. Homoscedasticity
3. Normality of residuals
4. Independence of observations

What conditions lead us to prefer transformations over quadratic or piecewise regression when there is a nonlinear relationship between X and Y?

If more than one of the basic multiple regression model assumptions (linearity, homoscedasticity, normality of residuals) is violated. Nonlinearity alone can be addressed by polynomials and splines, but those approaches require the other two assumptions to hold; if they do not, it may be better to use transformations.

How do we know if the curve produced by a quadratic regression equation will reach a maximum or minimum?

· If the quadratic term is negative (frowny face), the curve will reach a maximum.
· If the quadratic term is positive (smiley face), the curve will reach a minimum.

How do I construct a confidence interval for bi in the p predictor case? How do I interpret the CI [5 < β1 < 10]= .95? Is this regression coefficient significantly different from 0?

ME = tcrit * s_bi (the margin of error is the critical t value times the standard error of bi), and the confidence interval is C[bi - ME ≤ βi ≤ bi + ME] = 1 - α. We are 95% confident that the population regression coefficient βi lies between 5 and 10. This coefficient is significantly different from 0, since 0 is not included in the interval.

How can we establish a confidence interval for the estimate of the value of X at which the maximum/minimum occurs, -b1/(2*b2)?

Percentile bootstrapping: take the values cutting off the lowest and highest 2.5% of the bootstrapped distribution of -b1/(2*b2) as the limits of the confidence interval.

What problems do high degree polynomials have? Why are lowess and splines preferred to high degree polynomials as a representation of the relationship of Y to X in the data?

· Polynomial regression provides a global fit to the data, ignores problems of poor fit in local regions of X (like outliers), and can lead to overfitting that does not generalize to other samples.
· Splines and the lowess line use a local regression approach, considering subsections of the data when forming the fitted line rather than trying to fit all the data at once.

What graphical checks on the residuals are normally performed in multiple regression with p predictors?

A plot of the residuals against the fitted values, and a q-q plot of the residuals.

(1) Set 1, predictors 1 and 2 only: r2y.12 = .17 (2) Set 2, predictors 3 and 4 only: r2y.34 = .21 (3) Set 1 plus Set 2, predictors 1, 2, 3, and 4: r2y.1234 = .32 (4) Set 1 plus Set 2 plus Set 3, predictors 1, 2, 3, 4, and 5: r2y.12345 = .41. What are the values of the squared semi-partial and squared partial correlations of set 2 with the criterion, over and above set 1?

Squared semi-partial correlation of set 2 = R2y(34.12) = R2y.1234 - R2y.12 = .32 - .17 = .15. Squared partial correlation of set 2 = R2y34.12 = (R2y.1234 - R2y.12)/(1 - R2y.12) = .15/.83 = .18.

Moore, McCabe, and Craig (readings) describe the use of bootstrapping in testing the 25% trimmed mean. Please give an outline of the procedure.

The 25% trimmed mean is the mean of the middle 50% of the observations; it often does a better job of representing the average of typical observations than the median and focuses on the central part of the distribution. Example with n = 100 (see the R sketch below):
1. Draw a large number of bootstrap samples with replacement (each with n = 100).
2. Calculate the 25% trimmed mean in each bootstrap sample.
3. For the percentile bootstrap, sort the bootstrapped trimmed means from low to high.
4. Identify the trimmed mean values corresponding to the 2.5th and 97.5th percentiles; these are the limits of the CI.
For the bootstrap t, calculate the SD of the bootstrapped trimmed means, then calculate the margin of error by multiplying tcrit by this SD.
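
A sketch in R, assuming a hypothetical numeric vector y with n = 100 observations (B = 2000 and the df used for tcrit are illustrative assumptions):

```r
set.seed(1)
B <- 2000
tm_boot <- replicate(B, mean(sample(y, length(y), replace = TRUE), trim = 0.25))
ci_percentile <- quantile(tm_boot, c(.025, .975))   # percentile bootstrap CI
se_boot <- sd(tm_boot)                              # bootstrap SE of the trimmed mean
tm_hat  <- mean(y, trim = 0.25)
ci_t <- tm_hat + c(-1, 1) * qt(.975, df = length(y) - 1) * se_boot  # bootstrap t CI
```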

What do we mean by "higher" order terms in regression analysis? Give an example of a regression equation containing a higher order curvilinear term?

· A higher order term is a component of the equation raised to a power greater than one; the highest order term is the one with the largest exponent.
· Y = b1X + b2X^2 + b0, where the highest order term is the quadratic term, with an exponent of 2.

How does the VIF relate to the tolerance? How is the VIF interpreted?

They are reciprocals of each other (VIF = 1/tolerance). The VIF provides an index of how many times the variance of the regression coefficient for Xi is inflated relative to a regression equation in which Xi is uncorrelated with all the other independent variables X1, X2, ...(Xi)..., Xp. A minimal computation is sketched below.
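
A sketch of the computation for one predictor, assuming a hypothetical data frame dat with predictors x1, x2, x3:

```r
r2_x1        <- summary(lm(x1 ~ x2 + x3, data = dat))$r.squared  # R^2 of x1 on the other predictors
tolerance_x1 <- 1 - r2_x1
vif_x1       <- 1 / tolerance_x1
```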

What do interactions between predictors signify in multiple regression analysis? Is there additivity of two predictors? Give an algebraic explanation of how the regression of Y on X is affected if there is an interaction

Interactions signify that the effect of one predictor on Y depends on (changes across) the levels of the other predictor. With an interaction there is not additivity, because additivity means the relation of one predictor to Y is NOT dependent on the level of the other predictor. Algebraically, Y = b0 + b1X + b2Z + b3XZ can be rearranged as Y = (b1 + b3Z)X + (b2Z + b0), so the slope of Y on X is (b1 + b3Z), which changes as a function of Z whenever b3 is nonzero.

In the one parameterization of the piecewise regression, what do b0, b1, and b2 represent?

· b0 is the intercept (the predicted value of Y at XC = 0, i.e., at the cutpoint)
· b1 is the slope of the first segment
· b2 is the slope of the second segment
· D is the on-off switch: if D = 1, the switch is on; if D = 0, the switch is off

Be able to look at pictures of quadratic relationships such as those in Display 1 of the Handout 7 Addendum and to indicate what the values of b1 and b2 would be in a second order polynomial describing the data. Otherwise stated, be able to determine the sign and very roughly the values of b0, b1, and b2 from looking at a graph

· b1 gives the regression/slope of Y on X only at the value of X = 0 (when centered, at the mean of X)
· b2 gives the rate of change in the slope (acceleration), indicating the extent and direction of the curvature: negative if the curve opens downward (like a frown) and positive if it opens upward (like a smile)
· b0 is the predicted value of Y when X = 0 (or, with centering, at the mean of X); it is positive if that point lies above the x-axis and negative if it lies below

What defines whether an interaction is ordinal versus disordinal in multiple regression? How does this differ from the situation in analysis of variance?

An interaction is ORDINAL, or NON-CROSSOVER, when the simple regression lines do not cross within the meaningful range of the data. Interactions in which the simple regression lines cross within the meaningful range of the data are referred to as DISORDINAL or CROSSOVER interactions. This differs from ANOVA, where the IVs are categorical, so the range of values is fixed by the design.

What are the general expressions for R2mult in terms of (a) SSregression and SS total and (b) the standardized regression coefficients and the validities?

· (a) R2mult = SSregression/SStotal = SSreg/SSy
· (b) R2mult = b*1 ry1 + b*2 ry2 + b*3 ry3 + b*4 ry4, the sum of the products of the standardized regression coefficients and the validities

What two plots are used to probe the assumptions of multiple regression?

A scatterplot of the residuals vs. the fitted values, and a q-q plot of the residuals.

What is a scatterplot matrix? What does a scatterplot matrix potentially tell us that a correlation matrix does not?

A scatterplot matrix is a grid (or matrix) of scatterplots used to visualize the bivariate relationships between all pairs of variables. A scatterplot can reveal nonlinear relationships (e.g., quadratic, cubic), whereas a correlation matrix cannot.

What do we mean by simple (conditional) regression equations, simple (conditional) slopes? To what are they analogous in ANOVA?

A simple slope is defined as the regression of the outcome y on the predictor x at a specific value of the moderator z. Simple slopes parallel testing simple effects in ANOVA following a significant interaction.

What is meant by suppression?

Suppression occurs when including one variable in the equation increases the predictive ability of another variable. When this occurs and the suppressor is not itself correlated with the criterion (but is correlated with the other predictor), we have classical suppression.

How do the tests of the significance of each regression coefficient in the multiple regression analysis Yhat = b0 + b1X1 + b2X2 + b3X3 + b4X4 of the original data differ from those of the analysis: (a) in which the predictors (but not the criterion) have been centered? (b) in which the predictors and criterion have been standardized (both the X variables and the Y variable are converted to z-scores prior to analysis)?

(a) The tests of significance of b1-b4 would be the same as for the raw/original data, since with no higher order terms in the equation the slope coefficients are unaffected by centering; only the intercept, and its test of significance, would change. (b) All the slope coefficients would have the same t-values and p-values as in the raw analysis, except the intercept, which will equal 0 and so will have a p-value of 1.

Why is the added variable plot useful for detecting curvilinear relationships in equations with several predictors?

Added variable plots display the unique relationship of Y with a specific predictor. An added variable plot for the relationship between Y and X1 removes the effect of X2, X3, etc. from both Y and X1. The plot of the relationship between these two sets of residuals clearly displays the form of the relationship between Y and X1 after the effects of the other variables have been removed, which greatly helps in detecting curvilinear relationships in models with several predictors.

What were the assumptions of the statistical tests and confidence intervals for bYX? What are their implications? What three desirable properties of the estimates of bYX and its standard error resulted?

Assumptions:
· Cases are independent.
· X and Y have a bivariate normal distribution (a linear relationship, with constant residual variance around the regression line).
Three desirable properties:
· Unbiasedness: bYX (b1) is an unbiased estimate of βYX, the regression coefficient in the population.
· Consistency: as n increases, bYX approaches (gets closer in value to) the true value of βYX; bYX -> βYX.
· Minimum standard error: no other estimate of βYX has a smaller standard error than the ordinary least squares estimate.

Why is centering used with X and Z in the regression equation Y = b0 + b1X + b2Z + b3XZ? Why is centering not used with Y?

Centering is used so that b1 gives the effect of X on Y at the average value of Z (and b2 the effect of Z at the average X), which makes the conditional coefficients interpretable. Y is not centered because doing so does not aid interpretation; we are interested in predicted values of Y in its original units.

Consider the two predictor regression equation Yhat = b0 + b1X1 + b2X2. Friedman and Wall (2004) describe four types (regions) of relationships as r12 is changed. What are the names of the four regions? Give a brief description and the characteristics of each.

· Complementarity: X1 and X2 each correlate positively with Y but correlate negatively with each other.
· Redundancy: the most common case; X1 and X2 correlate positively with each other and each correlates positively with Y. Standardized slopes will be smaller than the Pearson correlations. Complete redundancy: X1 and X2 are positively correlated, but X2 provides no unique prediction of Y over X1.
· Suppression: X2 is correlated with the other independent variable but not with the outcome, yet including it increases the proportion of variance accounted for.
· Enhancement: the R2 from predicting Y by X1 and X2 together is larger than would be expected from the individual correlations of X1 and X2 with Y; there may be substantial multicollinearity.

Given two predictors X1, X2, and a criterion Y. The range of the correlation between X1 and X2 will not always match the potential range from -1 to +1. What can constrain r12 to be less than 1.0 in magnitude?

If X1 and X2 were each very highly correlated with Y but uncorrelated with each other, the implied R2 could exceed 1, which is impossible. The correlation coefficients are therefore constrained: the correlation matrix must have a determinant greater than 0, which restricts the possible values of r12 given ry1 and ry2.

What factors determine the fit of a spline to the data?

· the number of knot points
· the location of the knot points
· the degree of the polynomial used in the local regressions

In the regression equation Y = b1 X + b2 Z + b3 X Z + b0, we say that the regressions of Y on X and Y on Z are conditional relationships. What do we mean by this?

In the rearranged simple regression equation Y = (b1 + b3Z)X + (b0 + b2Z), both the simple slope (the first term) and the simple intercept (the second term) are conditional: the simple regression of Y on X is conditional on the value of Z, and vice versa.

Explain how you would use the rearranged interaction equation to generate three simple regression equations, one at ZH (one standard deviation above the mean of Z), one at ZM (at the mean of Z), and one at ZL (one standard deviation below the mean of Z). Be able to do this if given a numerical example. Be able to take R or SPSS printout and reproduce the three simple regression equations.

In the rearranged equation Y = (b1 + b3Z)X + (b0 + b2Z), we substitute a numeric value of Z to obtain the simple regression of Y on X at that specific value of Z. With centered Z, substitute ZL = -1 SD of Z, ZM = 0 (the mean of Z), and ZH = +1 SD of Z to obtain the three simple regression equations, each of the form Yhat = (simple slope)(XCENT) + (simple intercept). From R or SPSS printout, take b0-b3 and compute the simple slope (b1 + b3Z) and simple intercept (b0 + b2Z) at each chosen value of Z (see the sketch below).
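
A sketch in R, assuming a hypothetical fitted model fit <- lm(y ~ xcent * zcent, data = dat) with centered predictors (names are illustrative):

```r
b   <- coef(fit)
sdz <- sd(dat$zcent)
for (Zval in c(-sdz, 0, sdz)) {                       # ZL, ZM, ZH
  slope     <- b["xcent"] + b["xcent:zcent"] * Zval   # b1 + b3*Z
  intercept <- b["(Intercept)"] + b["zcent"] * Zval   # b0 + b2*Z
  cat("At Z =", round(Zval, 2), ": Yhat =",
      round(slope, 3), "* XCENT +", round(intercept, 3), "\n")
}
```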

In the analysis of the bodyweight-brainweight data, what features led to the log transformation of brainweight and bodyweight? What was the result of examining the large positive residuals in this example?

In this dataset, there were extreme cases that violated assumptions of normality. The log transformation brought those values in closer to the rest of the data while maintaining their order (i.e., the degree to which they were extreme relative to the rest of the data). Before the transformation, the residuals were not normally distributed. Examining the large positive residuals led to the conclusion that these cases belonged to a single category (primates).

Know how to compute interactions 1SD above the mean and 1SD below the mean using the computer method. How is the interaction at the mean computed?

Simple slope 1 SD above the mean of Z: subtract 1 SD so the rescaled value ZHIGH = ZCENT - 1SD equals 0 where the original Z was +1 SD, then run Yhat = b0 + b1XCENT + b2ZHIGH + b3XCENT*ZHIGH.
Simple slope 1 SD below the mean of Z: add 1 SD so the rescaled value ZLOW = ZCENT + 1SD equals 0 where the original Z was -1 SD, then run Yhat = b0 + b1XCENT + b2ZLOW + b3XCENT*ZLOW.
Simple slope at the mean: use the centered regression equation Yhat = b0 + b1XCENT + b2ZCENT + b3XCENT*ZCENT.
· The test of b1 is the test of the simple slope of Y on X at the mean of Z (ZCENT = 0).
· The test of b2 is the test of the simple slope of Y on Z at the mean of X (XCENT = 0).
(A short R sketch follows.)
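
A sketch of this computer method in R, assuming hypothetical centered variables xcent and zcent in a data frame dat:

```r
sdz <- sd(dat$zcent)
dat$zhigh <- dat$zcent - sdz   # ZHIGH = 0 where the original Z was +1 SD
dat$zlow  <- dat$zcent + sdz   # ZLOW  = 0 where the original Z was -1 SD
fit_high <- lm(y ~ xcent * zhigh, data = dat)  # test of b1 = simple slope at +1 SD of Z
fit_low  <- lm(y ~ xcent * zlow,  data = dat)  # test of b1 = simple slope at -1 SD of Z
fit_mean <- lm(y ~ xcent * zcent, data = dat)  # test of b1 = simple slope at the mean of Z
```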

What do I learn from the CI that I do not learn from the null hypothesis significance test?

It gives us the degree of precision with which the parameter is estimated, as well as the size and significance of the regression coefficient

What does the b2 coefficient tell you about the relationship of X to Y?

It gives us the rate of change (acceleration) of the curve and the direction of the curvature

What is meant by a third order predictor? Give two examples.

A third order predictor is a term whose components multiply to a total power of 3. Examples: b3X^3 (a cubic term) and b4X^2Z (a quadratic by linear product term).

How would you estimate the additional proportion of variance accounted for by a cubic spline over a basic cubic regression model?

Perform an F test of the gain in prediction (splines are mathematically more tractable than lowess, so this test is straightforward). Basic cubic: Y = b0 + b1X + b2X^2 + b3X^3. Cubic spline with two knots k1 and k2: Y = b0 + b1X + b2X^2 + b3X^3 + b4S1 + b5S2, where S1 = (X - k1)^3 for X above the first knot and 0 otherwise, and S2 is defined analogously at the second knot. We have added two terms, so the number of terms in the added set is m = 2. In general, the cubic spline equation has 4 + k parameters, where k is the number of knot points: 4 for b0, b1, b2, and b3 plus one for each knot, so 6 here. Test the spline against the basic cubic with the gain-in-prediction F test (a sketch follows below).
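
A sketch in R using a truncated power basis for the spline terms (the data frame dat, variables x and y, and the knot locations k1 and k2 are illustrative assumptions):

```r
k1 <- 10; k2 <- 20
dat$s1 <- pmax(dat$x - k1, 0)^3   # (X - k1)^3 above the first knot, 0 below it
dat$s2 <- pmax(dat$x - k2, 0)^3   # (X - k2)^3 above the second knot, 0 below it
fit_cubic  <- lm(y ~ x + I(x^2) + I(x^3), data = dat)
fit_spline <- lm(y ~ x + I(x^2) + I(x^3) + s1 + s2, data = dat)
anova(fit_cubic, fit_spline)      # F test of the gain in prediction with m = 2 added terms
```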

What are the two perspectives used to develop (derive) multiple regression? What is meant by each perspective in terms of the sampling of cases? How do the results of the two perspectives differ in the linear case? How do they differ in the nonlinear case?

1. Random sample from a multivariate normal distribution: we assume the variables have a multivariate normal distribution in the population and we take a random sample of cases from that population. Implications: linearity (no curvilinear or interactive effects; the relationships of X1 and X2 to Y are linear), homoscedasticity (the variance of the residuals around the regression line is constant), normality of residuals, and independence of observations.
2. Fixed sample: a fixed number of cases is selected from each point on the distribution of the predictors (e.g., selecting 20 children from each elementary grade). The same implications apply (homoscedasticity, normality of residuals, independence), and derivations are easier because the population means, SDs, and covariances of the predictors are identical across samples.
When linearity holds, the two perspectives lead to the same estimates of the coefficients and standard errors. They differ in statistical power: the random-sample (multivariate normal) perspective requires a somewhat larger (~5-10%) sample size than the fixed perspective to achieve the same level of statistical power.

In the raw regression equation Y = b0 + b1X + b2X^2, what is the interpretation of b0, b1, and b2? In the centered regression equation, in which XC = X - Xbar (and the quadratic term is XC^2), what is the interpretation of the centered regression coefficients? Will any of the regression coefficients be the same in the raw and centered equations?

Raw regression equation:
· b0 is the intercept, the predicted value of Y at X = 0.
· b1 is the unstandardized regression coefficient (slope, the linear tangent) at X = 0.
· b2 is the unstandardized regression coefficient related to acceleration (the rate of change of the slope), giving the extent and direction of the curvature.
Centered regression equation:
· b0c is the intercept, the predicted value of Y at the mean of X.
· b1c is the unstandardized regression coefficient (slope, linear tangent) at the mean of X (the mean of centered X = 0); it is the average of all the little regressions across the range of X, the average linear effect in the data, and the slope of Y on centered X at the mean of centered X.
· b2c is the unstandardized regression coefficient related to acceleration (rate of change of the slope), giving the extent and direction of the curvature.
Only the quadratic coefficient will be the same in the raw and centered equations, since it is the highest order term in the equation.

What defines synergistic (enhancing), buffering, and compensatory interactions? Where is the crossing point?

· Synergistic (enhancing): one variable strengthens the impact of the second variable; they work in the same direction.
· Buffering: one variable weakens the impact of the second variable or even "cancels out" the effect of the second variable.
· Compensatory: two predictors relate to the outcome in the same direction, but the interaction between them is in the opposite direction from the relationships of the individual variables to the outcome; as one predictor increases in value, the strength of the relationship of the other predictor decreases.
The crossing point is either outside the range of the data (ordinal interaction) or within the range of the data (disordinal interaction).

Explain how to find an appropriate "standardized solution" in regression with interactions. Why is the standardized solution provided by R or SPSS that accompanies the solution with the centered predictors an inappropriate standardized solution?

Step 1. Convert the raw X and raw Z predictors to standardized z scores, zx and zz.
Step 2. Form the cross-product of the standardized scores, zx * zz.
Step 3. Compute zy, the standardized score corresponding to the criterion Y, using sy as the standard deviation of Y.
Step 4. Run the regression analysis using the standardized scores you have calculated: zy-hat = b1 zx + b2 zz + b3 (zx * zz) + b0.
Step 5. Report the "raw" regression coefficients from the analysis in Step 4; this is the appropriate "standardized" solution. Ignore the standardized coefficients printed for this analysis.
Step 6. If you want to report any simple slope analysis in standardized form, use the "raw" equation from Step 5 as the basis of that analysis.
The standardized solution printed by SPSS and R uses an incorrect order of operations: it standardizes the cross-product term XZ itself instead of forming the product of the z scores. (A short R sketch follows.)
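
A brief R sketch of these steps, assuming a hypothetical data frame dat with raw x, z, and y:

```r
zx <- scale(dat$x)[, 1]   # Step 1: z scores of the predictors
zz <- scale(dat$z)[, 1]
zy <- scale(dat$y)[, 1]   # Step 3: z score of the criterion
fit_std <- lm(zy ~ zx + zz + I(zx * zz))   # Steps 2 and 4: product of z scores, then the regression
coef(fit_std)   # Step 5: report these "raw" coefficients as the standardized solution
```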

Set 1 alone: r2y.12 = .17 Set 2 over Set 1: r2y(34.12) = .32 - .17 = .15 Set 3 over combined Set 1 and Set 2: r2y(5.1234) = .41 - .32 = .09 Show how the squared multiple correlation of predictors 1,2,3,4, and 5 with the criterion is built up from the squared semi-partial correlations

Sum the squared semi-partial correlations of each successive set (starting from the R2 of the first set alone) to build up the squared multiple correlation for the whole model: .17 + .15 + .09 = .41.

Assume that we are working with: Y = b1 X + b2 Z + b3 X Z + b0 Both X and Z are centered, and the XZ term is the product of centered X times centered Z. What are two interpretations of (a) the b1 coefficient and (b) the b2 coefficient?

The b1 coefficient has three interpretations when the predictors are centered (only the first holds for b1' with uncentered predictors X and Z):
(a) the regression of Y on X at Z = 0;
(b) the regression of Y on X at the arithmetic mean of Z, since the mean of centered Z equals zero;
(c) the average of all the regression slopes of Y on X at every value of Z, taken across the whole range of the predictor Z.
The b2 coefficient likewise has three interpretations when the predictors are centered (only the first holds for b2' with uncentered predictors):
(a) the regression of Y on Z at X = 0;
(b) the regression of Y on Z at the arithmetic mean of X, since the mean of centered X equals zero;
(c) the average of all the regression slopes of Y on Z at every value of X, taken across the whole range of the centered predictor X.

Consider the following regression equation: Yhat = b0 + b1X1 + b2X2 + b3X3. Suppose there is very high multicollinearity, say VIF = 20. What effect does this have on R2 from the equation, important for overall prediction? What effect does this have on the estimates of b1, b2, and b3, important for explanation (hypothesis testing)?

The standard errors of the coefficients become greatly inflated and the coefficient estimates become unstable (e.g., one b may become very large and positive while the others turn negative); the standard errors can get so large that even large coefficients are non-significant. R2 is fine: prediction and the standard error of estimate are not adversely affected by the inclusion of multicollinear variables, despite extraordinarily low tolerance and high VIF. In short, prediction is not impacted, but explanation (hypothesis testing on individual coefficients) becomes very difficult.

Why must all lower order terms be included in a regression equation when the prediction from a higher order term is being examined for significance?

The coefficient for the highest order term is an accurate reflection of the curvilinearity (or interaction) at the highest level only if all lower order terms are partialed out, i.e., included in the equation.

What problem(s) have potentially occurred if you get an error message "negative determinant" or "negative eigenvalue"?

The determinant of the matrix is negative (less than 0), meaning the matrix has an improper structure. Possible causes:
(a) Mistakes in data input (errors in entering the correlation or covariance matrix).
(b) Analysis of a "theoretical" (latent) correlation or covariance matrix, e.g., a matrix corrected for attenuation due to measurement error, or tetrachoric/biserial correlations computed to estimate what the correlation would be if the variable(s) were continuous and normally distributed.
(c) Pairwise deletion, where each correlation in the matrix is based on a different set of cases; "impossible" combinations of values can occur because of large variation in the sample sizes on which the correlations are based, or because different relationships exist for subjects with complete vs. missing data.

What is meant by a focal predictor? What is meant by a moderator? Are these determined by the analysis or by theory and research in the area?

The focal predictor is the independent variable whose relationship to the outcome that the researchers in the substantive research area wish to spotlight. The moderator is the secondary variable that the researchers believe is changing that relationship. These are determined by theory/researchers in the area since statistically they are equal.

Describe the general approach of a piecewise regression model

The range of X is broken up into segments. A linear regression line is estimated for each piece with the regression lines joined at the boundary of the segments (a special case of splines)

Know how to compute the squared semi-partial correlation from two squared multiple correlations (I will not give formula) Say we have r2y.123 and r2y.12, the semi-partial correlation of X3 is...

r2y(3.12) = r2y.123 - r2y.12

What is measured by the squared partial correlation of a predictor with the criterion? How do the squared partial and squared semi-partial correlation differ in what is being measured?

The squared partial correlation is the proportion of the variance in Y accounted for by X1, controlling for X2, relative to the residual variance in Y remaining after controlling for X2. The squared semi-partial is the proportion of the total variance in Y accounted for uniquely by X1 (the gain in prediction). The two differ in how much of the variance in Y serves as the base: for the squared partial we use the part of Y not already predicted by the other predictors (the residual variance), whereas for the squared semi-partial we use the total variance in Y to assess the increase in the proportion of variance accounted for in the criterion by adding another predictor.

Explain this statement: "In the quadratic regression equation Y = b0 + b1(X - Xbar) + b2(X - Xbar)^2, if the b2 coefficient is nonzero, then the regression of Y on X depends on the value of X". Use a figure to illustrate your answer.

The slope of the tangent line equals b1 only at the center (XC = 0) of the centered graph. In general the slope of the curve is b1 + 2*b2*XC, so when b2 is nonzero the slope changes depending on where you are on the curve (on the value of X); a figure of the parabola with tangent lines drawn at several values of X illustrates this.

What does it mean if a regression is "linear in the variables"?

There is a linear relationship of each predictor to the criterion. In other words, each regression coefficient only measures the degree of linear relationship between a predictor and the criterion

Explain the generalization of the measure of tolerance and the VIF to the multiple predictor case. What characteristic does the VIF have for X1 and X2 in the two predictor case that it does not have in the four predictor case for X1, X2, X3, and X4?

Tolerance indicates the degree to which a predictor is non-redundant with the other predictors (low tolerance indicates multicollinearity). As in the two predictor case, the value (1 - r2xi.12..(i)..p), called the tolerance of predictor Xi, is critical: if Xi is highly redundant with the other predictors, r2xi.12..(i)..p approaches one, the tolerance approaches zero, and the standard error of bi becomes large. Unlike the two predictor case, where the tolerance (and its reciprocal, the VIF) is the same for both predictors, with four predictors the tolerance and VIF typically differ for each variable in the equation.

What is the purpose of Fisher's r to z transform?

It is a nonlinear transformation of r that converts the skewed sampling distribution of r into a close approximation of a normal distribution.

In the regression equation Yhat = b0 + b1X1 + b2X2, suppose you wish to test whether β1 = β2 in the population. What two models would you compare? If X1 and X2 represent variables measured in different units, can X1 and X2 be converted to z-scores and the results of the standardized regression equation be compared? Why or why not?

Two models:
· Full model: Yhat = b0 + b1X1 + b2X2
· Reduced model: Yhat = b0 + b1X1 + b1X2, i.e., the same coefficient applied to both predictors, Yhat = b0 + b1(X1 + X2)
Unless X1 and X2 are measured in the same units (or have equal variances), the test must be a comparison of unstandardized regression coefficients; converting to z-scores changes the units, so comparing the standardized coefficients would not test the same hypothesis about β1 = β2.

A traditional strategy in data analysis with continuous variables was to dichotomize the variables and analyze them in a 2 x 2 ANOVA. Give at least two problems with this strategy?

Using a median split to dichotomize: (1) if X and Z are correlated, spurious main effects may occur; (2) the statistical power of the test of the interaction is sharply reduced.

Know how to compute the squared partial correlation from two squared multiple correlations (I will not give formula). Say you had r2y.123 and r2y.12. Then the squared partial correlation of X3 with the criterion, over and above X1 and X2 is...

r2y3.12 = (r2y.123 - r2y.12)/(1 - r2y.12)

Rearrange the equation containing an interaction into a simple regression equation showing the regression of Y on X at values of Z. Explain how the regression coefficient in the rearranged equation shows that the regression of Y on X depends on the value of Z

Y = b1X + b2Z + b3XZ + b0 (general equation)
Y = (b1 + b3Z)X + (b0 + b2Z) (simple regression of Y on X)
Y = (b2 + b3X)Z + (b0 + b1X) (simple regression of Y on Z)
When you substitute a value for Z, the term b2Z becomes a constant and is absorbed into the intercept, and the term b3XZ combines with b1X, so the coefficient of X becomes (b1 + b3Z): the regression of Y on X depends on the value of Z.

What is the general form of a polynomial equation?

Y = b1X + b2X^2 + b3X^3 + . . .+ bpX^p + b0

What is the regression equation for a curvilinear by linear interaction? What is the difference between the treatment of complex interactions in multiple regression and ANOVA?

Y = b1X + b2X^2 + b3Z + b4XZ + b5X^2Z + b0. The X, X^2, and Z terms give the conditional "main effects"; the XZ and X^2Z terms carry the interaction. Difference from ANOVA: in regression the lower order terms must be included explicitly; if X^2 (or another lower order term) were omitted from the equation, the b5 coefficient would be a biased estimate of the curvilinear by linear interaction component.

Do all regressions of Y on X (all simple slopes) have the same value of the standard error, regardless of the value of Z?

No. The standard error of the simple slope depends on the value of Z at which it is computed: s_bYX|Z = sqrt(s^2_b1 + 2*Z*s_b1b3 + Z^2*s^2_b3), so different values of Z give different standard errors (and different t-tests).

Predicted weight loss = b1pillgrp + b2exercise + b0, where pillgrp = 1 if the participant takes a weight loss pill and 0 if the participant takes a placebo (control), and exercise is the number of hours of exercise per week. What do b0, b1, and b2 represent in this regression equation?

· b0 is the intercept: the predicted weight loss for a participant who takes the placebo (pillgrp = 0) and gets 0 hours of exercise.
· b1 is the effect of taking the pill: the difference in predicted weight loss between the pill group (pillgrp = 1) and the placebo group, holding exercise constant (e.g., if b1 = 3, a pill taker would be expected to lose 3 more pounds than a placebo taker with the same amount of exercise).
· b2 is the effect of exercise: the predicted change in weight loss for each additional hour of exercise per week, holding group constant.

(63) On a 3 dimensional regression surface, be able to identify b0, b1, and b2.

On the surface plotted over the (XC, ZC) plane: b0 is the height of the surface at the point XC = 0, ZC = 0 (the predicted Y at the means of the centered predictors); b1 is the slope of the line on the surface along the XC axis at ZC = 0; b2 is the slope of the line on the surface along the ZC axis at XC = 0.

Suppose you analyze a data set with the equation Y = b1X + b2Z + b3 XZ + b0, and you use centered predictors. You then repeat the regression analysis with uncentered predictors. Which coefficients will change across analyses and why?

b0, b1, and b2 will change between the analyses because they are conditional coefficients: their values depend on where the zero point of the other predictor falls. b3, the highest order term, will not change.

Y = b1X + b2Z + b3XZ + b0. Assume X and Z are centered. If the b3 coefficient is significant in the above equation, what does that tell you?

b3 represents the regression coefficient for the linear by linear interaction. A significant b3 tells you that the relationship of X to Y depends in part on the value of the other predictor Z (and vice versa): b3 is the change in the slope of Y on X for a one unit increase in Z, so the regression surface is warped rather than a flat plane.

Be able to state the hypotheses and compute the F test for gain in prediction. The Fgain formula will be provided. For example, test the hypothesis that Set 2 in question 17 contributes significant variance over and above set 1. Assume n=100. Set 1 alone: .17 Set 2 (predictors from set 1 and 3, 4): .32

H0: ρ2y(34.12) = 0; H1: ρ2y(34.12) > 0.
· k = the number of predictors in the first set (Set 1 has 2 predictors)
· m = the number of predictors in the set being added (Set 2 adds 2 predictors)
· r2all = the R2 when all predictors from Sets 1 and 2 are included in the regression equation (r2all = .32)
· r2set1 = the R2 when only the Set 1 predictors are included (r2set1 = .17)
Fgain = [(.32 - .17)/2] / [(1 - .32)/(100 - 2 - 2 - 1)] = 10.48, with df = [m, (n - k - m - 1)] = (2, 95). Our Fgain value is more extreme than the Fcrit value, so we reject the null hypothesis: Set 2 accounts for a significant proportion of variance in the criterion over and above Set 1's contribution (checked numerically below).
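
A quick numeric check of this worked example in R:

```r
F_gain <- ((.32 - .17) / 2) / ((1 - .32) / (100 - 2 - 2 - 1))
F_gain                       # about 10.48
qf(.95, df1 = 2, df2 = 95)   # Fcrit is about 3.09, so Fgain exceeds it
```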

Given a Venn diagram, be able to identify the areas corresponding to the total variance in Y, R2, the shared variance between Y and X1, Y and X2, Y and X1 holding X2 constant, and Y and X2 holding X1 constant.

(Look at the notes for the diagram.)
· The Pearson correlation of X1 with Y corresponds to all of the area shared by X1 and Y (including any portion also overlapped by X2).
· The squared semi-partial (part) correlation is the area shared ONLY by X1 and Y (excluding X2).
· R2y.12 is all of the variance in Y overlapped by X1 and X2 together (their unique areas with Y plus the area all three share).
· The squared partial correlation has the same numerator as the semi-partial, but its denominator is the variance in Y not explained by X2 (the whole Y circle, variance of 1, minus the X2 overlap: what is left to be explained).
· If X1 and X2 do not overlap, the semi-partial and partial correlations will be the same.

How do we build a linear by quadratic interaction into a regression analysis? How many degrees of freedom does such an interaction have in regression analysis?

A linear by quadratic interaction means the amount of curvature of Y on X depends on the value of Z. To build it in, use: Y = b1X + b2X^2 + b3Z + b4XZ + b5X^2Z + b0. The X^2Z term is the highest order interaction term, so we must also include every lower order term: X, X^2, Z, and XZ. Together the XZ and X^2Z terms represent the two degrees of freedom of the quadratic by linear interaction.

What does it mean if a regression equation is "linear in the coefficients"?

the predicted score is a linear combination of the predictors, where the weights are the regression coefficients

