Chapter 16: Regression


what purpose does the regression line serve?

Allows us to make predictions about one variable based on what we know about another variable. It gives us a visual representation of what we believe is the underlying relation between the variables, based on the data we have available to us.

Error based on the mean vs. error based on the regression equation

Error based on the MEAN: the sum of squares total (SStotal)
Error based on the REGRESSION EQUATION: the sum of squared errors (SSerror)

Determining the Regression Equation

FORMULA: Yhat = a + b(X)
X = raw score on the predictor variable
Yhat = the predicted raw score on the outcome variable
a = the intercept of the line
b = the slope of the line

FIND THE INTERCEPT
1. Find the z-score for an X of 0: zx = (X - Mx)/SDx --> zx = (0 - Mx)/SDx
2. Use the z-score regression equation to calculate the predicted z-score on Y: zYhat = (rxy)(zx)
3. Convert the z-score to its raw score: Yhat = zYhat(SDy) + My; this predicted value is the intercept, a

FIND THE SLOPE
1. Find the z-score for an X of 1: zx = (X - Mx)/SDx --> zx = (1 - Mx)/SDx
2. Use the z-score regression equation to calculate the predicted z-score on Y: zYhat = (rxy)(zx)
3. Convert the z-score to its raw score: Yhat = zYhat(SDy) + My
4. Determine the slope: subtract the a value from this value to get b; the sign (+/-) tells us whether the line increases or decreases

We now have three points -- if they don't fall on a straight line, we've made an error. (A sketch of these steps appears after this list.)
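A minimal sketch of these steps in Python. All of the data (hours studied vs. exam score) and variable names here are hypothetical, invented purely for illustration:

import statistics as stats

X = [1, 2, 3, 4, 5]          # predictor (e.g., hours studied)
Y = [60, 65, 72, 78, 85]     # outcome (e.g., exam score)

Mx, My = stats.mean(X), stats.mean(Y)
SDx, SDy = stats.pstdev(X), stats.pstdev(Y)   # population SDs, as in the formulas
r_xy = stats.correlation(X, Y)                # Pearson r (Python 3.10+)

def predict_from_x(x):
    # Steps 1-3: z-score for X, standardized regression equation, back to raw score
    z_x = (x - Mx) / SDx             # step 1: zx = (X - Mx)/SDx
    z_yhat = r_xy * z_x              # step 2: zYhat = (rxy)(zx)
    return z_yhat * SDy + My         # step 3: Yhat = zYhat(SDy) + My

a = predict_from_x(0)                # intercept: predicted Y when X = 0
b = predict_from_x(1) - a            # slope: change in prediction per unit of X
print(f"Yhat = {a:.2f} + {b:.2f}(X)")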

Proportionate reduction in error for a multiple regression

R^2 (rather than r^2); capitalization indicates the proportionate reduction in error is based on more than one predictor variable

Regression to the mean

Regression to the mean occurs because extreme scores tend to become less extreme -- that is, they tend to regress towards the mean. For example, very tall parents tend to have tall children, but usually not as tall as they are, whereas very short parents tend to have short children, but usually not as short as they are.

Standard error of the estimate around the line of best fit vs. the error of prediction around the mean

Standard error of the estimate: a statistic that indicates the typical distance between a regression line and the actual data points.
Error of prediction when the mean is used: when we don't have enough information to compute a regression equation, we often use the mean as the "best guess"; the error of prediction around the mean is typically greater than the standard error of the estimate.

Calculating error when we predict the mean for everyone

Build a table with columns Y, My, (Y - My), and (Y - My)^2, then sum the (Y - My)^2 column. The sum is the total variability around the mean of Y -- the total error that would result if we predicted the mean for every person in the sample (SStotal). It represents the worst-case scenario: the total error we would have if there were no regression equation. On a graph with a horizontal line drawn at the mean, the line from each person's point on the scatterplot to the mean is a visual representation of that person's error. (A sketch of this calculation appears below.)
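A minimal sketch of the SStotal calculation in Python; the Y values are hypothetical:

Y = [60, 65, 72, 78, 85]                  # hypothetical outcome scores
My = sum(Y) / len(Y)                      # mean of Y
ss_total = sum((y - My) ** 2 for y in Y)  # SStotal: squared deviations from the mean
print(f"My = {My}, SStotal = {ss_total}")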

Calculate the amount of error from using the regression equation with the sample:

To calculate the proportionate reduction in error:
1. Calculate what we would predict for each student if we used the regression equation: plug each X into the regression equation, then subtract the predicted score from the actual score.
2. Square all of the errors and sum them (SSerror); this represents the error we'd have if we predicted Y using the regression equation. Visualized on a graph that includes the regression line, the lines drawn from each point to the regression line represent the error.
3. Compare the error from predicting the mean for everyone to the error from using the regression equation.
4. Subtract the regression error from the mean error, then divide by the mean error: r^2 = (SStotal - SSerror) / SStotal. (A sketch of these steps appears after this list.)
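A minimal sketch of the comparison in Python, reusing the hypothetical data above with a hypothetical regression equation Yhat = 53.1 + 6.3(X):

X = [1, 2, 3, 4, 5]
Y = [60, 65, 72, 78, 85]
a, b = 53.1, 6.3                          # hypothetical intercept and slope

My = sum(Y) / len(Y)
ss_total = sum((y - My) ** 2 for y in Y)                      # error using the mean
ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))  # error using regression
r_squared = (ss_total - ss_error) / ss_total                  # proportionate reduction in error
print(f"r^2 = {r_squared:.3f}")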

why don't we use IV and DV in correlational studies?

We cannot assume causation in correlational studies

adjusted r^2

a less biased and more conservative estimate of effect size for the regression equation than is r^2; this value will always be lower than the original r^2.
FORMULA: adjusted r^2 = 1 - [(1 - r^2)(N - 1) / (N - p - 1)]
N = total sample size
p = number of predictor variables used in the regression equation -- in the case of simple linear regression, p always equals 1, because there is just one predictor variable. (A sketch of the formula appears below.)
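A minimal sketch of the adjusted r^2 formula in Python; the r^2, N, and p values passed in are hypothetical:

def adjusted_r_squared(r2, n, p):
    # adjusted r^2 = 1 - [(1 - r^2)(N - 1) / (N - p - 1)]
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r_squared(r2=0.64, n=30, p=1))  # simple linear regression: p = 1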

standardized regression coefficient

a standardized version of the slope in a regression equation: the predicted change in the outcome variable, in standard deviations, for an increase of 1 standard deviation in the predictor variable (β).
FORMULA: β = (b)(√SSx / √SSy)
We must calculate the sum of squares for the predictor variable (X) and the outcome variable (Y) to use in the equation; the table columns are X, (X - Mx), (X - Mx)^2, Y, (Y - My), (Y - My)^2. For simple linear regression, the standardized regression coefficient is always exactly the same as the Pearson correlation coefficient -- both indicate the change in standard deviations that we expect in the outcome when the predictor variable increases by 1 SD. (A sketch of the calculation appears below.)
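A minimal sketch of the β calculation in Python, using the same hypothetical data and slope as the earlier sketches:

import math

X = [1, 2, 3, 4, 5]
Y = [60, 65, 72, 78, 85]
b = 6.3                                   # hypothetical unstandardized slope

Mx, My = sum(X) / len(X), sum(Y) / len(Y)
ss_x = sum((x - Mx) ** 2 for x in X)      # sum of squares for the predictor
ss_y = sum((y - My) ** 2 for y in Y)      # sum of squares for the outcome
beta = b * math.sqrt(ss_x) / math.sqrt(ss_y)
print(f"beta = {beta:.3f}")               # for simple regression, beta equals Pearson r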

Regression with Z Scores

If we know a person's z-score on one variable (the predictor), we can multiply it by the correlation coefficient to calculate the predicted z-score on a second variable (the outcome). A z-score indicates how far a score falls from the mean in terms of standard deviations.
FORMULA (the standardized regression equation): zYhat = (rxy)(zx)
The first z-score is for the outcome and the second is for the predictor. The "hat" on Y means the score is predicted -- we can't predict the actual score, and the hat reminds us of this. The x and y subscripts on the Pearson correlation coefficient (r) indicate that this is the correlation between variables X and Y.

orthogonal variables

predictor variables that make separate and distinct contributions to the prediction of an outcome variable; their contributions do not overlap with those of the other predictor variables.
Example: when exploring which variables are associated with better health outcomes, such as low blood pressure and an absence of heart disease, both the amount of physical activity and diet would likely make separate contributions to the prediction of health outcomes -- one can be physically active but eat an unhealthy diet, and eating well does not require that one also exercises.

For regression, how is the strength of the correlation related to the proportionate reduction in error?

Strong correlations allow highly accurate predictions with regression, which translates into a larger proportionate reduction in error.

slope

the amount that Y is predicted to increase for an increase of one unit in X; symbolized as b

outcome variable

the variable in a correlational study that is being predicted by the predictor variable (analogous to a dependent variable)

predictor variable

the variable in a correlational study that is used to predict the score on another variable (analogous to an independent variable)

intercept

the predicted value for Y when X is equal to 0, which is the point at which the line crosses, or intercepts, the y-axis; symbolized as a

regression to the mean

the tendency of scores that are particularly high or low to drift toward the mean over time; the name reflects the regression of the outcome variable (the fact that its predicted score falls closer to the mean)

Calculating Regression with Z Scores

1. Calculate the z-score: zx = (X - Mx)/SDx
2. Multiply the z-score by the correlation coefficient: zYhat = (rxy)(zx)
3. Convert the z-score to a raw score: Yhat = zYhat(SDy) + My
(A sketch of these steps appears after this list.)
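A minimal sketch of the three steps in Python; the means, standard deviations, correlation, and raw score are all hypothetical:

Mx, SDx = 3.0, 1.41          # hypothetical mean and SD of the predictor
My, SDy = 72.0, 8.92         # hypothetical mean and SD of the outcome
r_xy = 0.80                  # hypothetical Pearson correlation between X and Y

X = 4.5                      # raw score on the predictor
z_x = (X - Mx) / SDx                 # step 1: zx = (X - Mx)/SDx
z_yhat = r_xy * z_x                  # step 2: zYhat = (rxy)(zx)
Y_hat = z_yhat * SDy + My            # step 3: Yhat = zYhat(SDy) + My
print(f"Yhat = {Y_hat:.2f}")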

Two common ways of quantifying how well the regression line captures the variability in the data with linear regression:

1. Standard error of the estimate (calculated second, after we have the proportionate reduction in error)
2. Proportionate reduction in error (calculated first)

standard error of the estimate

a statistic that indicates the average vertical distance between a regression line and the actual data points -- the average error in predicting each of our data points using the regression equation. It is essentially the standard deviation of the actual data points around the regression line: an estimate of the amount of variability of the outcome variable about the line.
Smaller: there is less error and we're doing much better in our predictions than if there had been a larger standard error -- the actual scores are closer to the regression line.
Larger: we're doing much worse in our predictions than if there had been a smaller standard error -- the actual scores are farther away from the regression line.
FORMULA: σestimate = √(∑(Y - Yhat)^2 / N)
This number tells us the average deviation of each individual score from the regression line -- on average, we will be this many units off from the actual score. (A sketch of the calculation appears below.)
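A minimal sketch of the standard error of the estimate in Python, using the same hypothetical data and regression equation as the earlier sketches:

import math

X = [1, 2, 3, 4, 5]
Y = [60, 65, 72, 78, 85]
Y_hat = [53.1 + 6.3 * x for x in X]          # predictions from Yhat = a + b(X)

ss_error = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))
se_estimate = math.sqrt(ss_error / len(Y))   # sqrt(sum((Y - Yhat)^2) / N)
print(f"standard error of the estimate = {se_estimate:.2f}")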

proportionate reduction in error (r^2)

a statistic that tells us the amount of error we have eliminated by using a particular regression equation to predict a person's score on the outcome variable, versus simply predicting the mean on the outcome variable for that person. This statistic tells us how good the regression equation is: the proportionate reduction in error is a measure of the amount of variance in the outcome variable that is explained by the predictor variable. The value is represented by r^2, so we can simply square the correlation coefficient to get the same number. It is also the same as the effect size for ANOVA (R^2) -- in all cases, the number represents the proportion of variance in one variable that is explained by the predictor.

regression

a statistical technique that provides specific quantitative information predicting the relations between variables

multiple regression

a statistical technique that includes two or more predictor variables in a prediction equation; it assesses whether multiple pieces of evidence are better than one, as well as how much better each additional piece of evidence is. In the behavioral sciences, behavior tends to be influenced by many factors, so multiple regression allows us to better predict a given outcome.
FORMULA: Yhat = constant coefficient (the intercept) + variable 1 coefficient (X1) + variable 2 coefficient (X2). (A sketch of fitting such an equation appears below.)
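A minimal sketch of fitting a two-predictor equation by ordinary least squares with NumPy; all of the data are hypothetical:

import numpy as np

X1 = np.array([2, 3, 5, 7, 9], dtype=float)   # e.g., hours studied
X2 = np.array([6, 7, 5, 8, 7], dtype=float)   # e.g., hours of sleep
Y = np.array([65, 70, 74, 85, 88], dtype=float)

# Design matrix: a column of 1s (for the intercept) plus one column per predictor
design = np.column_stack([np.ones_like(X1), X1, X2])
a, b1, b2 = np.linalg.lstsq(design, Y, rcond=None)[0]
print(f"Yhat = {a:.2f} + {b1:.2f}(X1) + {b2:.2f}(X2)")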

simple linear regression

a statistical tool that lets us predict a person's score on an outcome variable from his/her score on one predictor variable. It allows us to calculate the equation for a straight line that describes the data; once we graph the line, we can look at any point on the x-axis and find its corresponding point on the y-axis -- that corresponding point is what we predict for Y. The equation predicts a z-score on the outcome variable that is closer to the mean than is the z-score on the predictor variable -- scores regress (go backward) toward the mean, which is called regression to the mean.

prediction tool

an equation to predict a person's score on one variable from his/her score on a different variable; developed using a correlation coefficient

