Key Concepts using Linear Regression
Bivariate Linear Regression
1 scale-level IV: *Predictor Variable (X)* - the variable we are predicting from. 1 scale-level DV: *Criterion Variable (Y)* - the variable we are trying to predict. Linear prediction rule: an equation that allows us to plot a line that best describes the relationship between the predictor and criterion variables.
Hypothesis Testing for Prediction
*NULL HYPOTHESIS*: the regression coefficient b = 0. *RESEARCH HYPOTHESIS*: the regression coefficient b ≠ 0. Examine the value of the slope and the confidence interval of the slope (SPSS).
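The decision rule above can be sketched in Python. This is a minimal, hypothetical example (the helper names `slope_ci` and `reject_null` and the six data points are made up for illustration). It computes the least-squares slope, its standard error, and a confidence interval, then rejects Ho only when the interval excludes zero. It assumes you supply the critical t value for df = n - 2 from a t table; SPSS does this lookup for you in the Coefficients table.

```python
import math
from statistics import mean

def slope_ci(x, y, t_crit):
    """Return slope b, intercept a, and a confidence interval for b.

    t_crit is the two-tailed critical t value for df = n - 2, taken from a
    t table (e.g., about 2.776 for df = 4 at the 95% level).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    ss_x = sum((xi - mx) ** 2 for xi in x)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sp / ss_x
    a = my - b * mx
    # Squared error of the actual Y scores around the fitted line (df = n - 2)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / ss_x)  # standard error of the slope
    return b, a, (b - t_crit * se_b, b + t_crit * se_b)

def reject_null(ci):
    """Reject Ho: b = 0 when the interval does not contain zero."""
    low, high = ci
    return not (low <= 0 <= high)

# Hypothetical data: 6 participants
x = [1, 2, 3, 4, 5, 6]
y = [10, 8, 7, 5, 4, 1]
b, a, ci = slope_ci(x, y, t_crit=2.776)  # df = 6 - 2 = 4
print(b, a, ci, reject_null(ci))
```

Here the whole interval lies below zero, so the sketch rejects Ho, the same reasoning used with the SPSS output later in these notes.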
A note about Prediction
*Prediction is not causation*, just as correlation is not causation. Other issues that apply to correlation also apply to regression: --Curvilinear patterns --Restriction in range --Unreliable measurements --Outliers
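To see why a curvilinear pattern is a problem, here is a small Python sketch with made-up data where Y is perfectly determined by X (Y = X squared), yet the best straight line has a slope of zero. The function name `ls_slope` is hypothetical.

```python
from statistics import mean

def ls_slope(x, y):
    """Least-squares slope of the straight line predicting y from x."""
    mx, my = mean(x), mean(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    return sp / ss_x

# Y is completely determined by X, but along a U-shaped curve
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
b = ls_slope(x, y)
print(b)  # 0.0: the straight line says X predicts nothing
```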
SPSS EXERCISE
*RESEARCH QUESTION*: Can we predict level of injury (Y) from a measure of total strength (X)? What is the best prediction rule (equation of the regression line), and what is the correlation between the variables? *NULL Hypothesis*: The slope of the prediction line is equal to zero (meaning X does not predict Y at all). (Another possible Ho: the correlation between the two scores is equal to zero.) In the data file, scores appear in two or more columns, each headed by a variable name, for example, Total strength (Z) and Injury Index. Data are entered in separate columns for each variable. There is NO GROUPING VARIABLE. First, to obtain the total body strength score, standardize the 5 strength raw scores by transforming them into z scores. Go to: ANALYZE > DESCRIPTIVE STATISTICS > DESCRIPTIVES. Move all 5 strength variables into the Variables box (not the Injury Index), check Save standardized values as variables, and click OK. Pre-analysis: the Output window pops up automatically but is NOT necessary for results. In the data file, the original raw scores are kept, and a new z-score column for each strength variable appears (highlighted in PINK).
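What the Save standardized values option does can be approximated as follows. This sketch assumes z = (X - M) / SD with the sample standard deviation (n - 1 in the denominator), which is the standard deviation SPSS reports; the scores and the name `z_scores` are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw scores: z = (X - M) / SD, with the sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw scores on one strength measure for 5 participants
grip = [30, 35, 40, 45, 50]
z_grip = z_scores(grip)
print([round(z, 3) for z in z_grip])  # [-1.265, -0.632, 0.0, 0.632, 1.265]
```

Every z-score column ends up with a mean of 0 and an SD of 1, which is what makes it fair to add different strength measures together.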
Linear Regression Graphs
*Scatter plot* with line of best fit: GRAPHS > LEGACY DIALOGS > Scatter/Dot > Simple Scatter > DEFINE. To plot the line correctly, put the variables in the right boxes: the IV (X) on the X axis and the DV (Y) on the Y axis. CLICK OK. Since this is a regression, you need to add a line on top of the graph. Double-click the graph to open the Chart Editor. Click on a point on the graph, which selects and highlights all the points on the chart. NEXT, go to Elements on the menu bar > Fit Line at Total, choose LINEAR, then Close. The line will appear with a regression equation.
Result of Knowing Prediction Rule
*Y(hat)* = .75 + 2.9(X). Job Success Score = .75 + 2.9(Job training perf. score). Job success score for a person with a job training performance score (X) of 33: Job Success Score = .75 + 2.9(33) = .75 + 95.7 = 96.45.
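The same arithmetic as a tiny Python function, using the .75 intercept and 2.9 slope from this example (the function name is hypothetical):

```python
def predict_job_success(training_score, a=0.75, b=2.9):
    """Linear prediction rule Y(hat) = a + b * X for the job-success example."""
    return a + b * training_score

print(round(predict_job_success(33), 2))  # 96.45
```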
Actual Linear Regression Analysis Phase
Analysis Step 1) Actual linear regression analysis begins here: ANALYZE > REGRESSION > LINEAR. Step 2) In the dialog box that opens, move the variables into the proper boxes *(VIP: get the variables right in this step)*. This goes first: DEPENDENT = the variable you want to predict (Y). The variable you are predicting from (X) goes in the Independent(s) box. Step 3) Click Statistics in the top right-hand corner. In this box, select Estimates, Confidence intervals (level 95%), Model fit, and Descriptives. Then click CONTINUE, then OK to run the analysis.
SPSS Results Part I & II
CORRELATIONS shows the correlation between X and Y, and then its significance level. MODEL SUMMARY shows the proportion of variance in Y accounted for by the linear relationship with X. The correlation in this exercise is not ZERO and is SIGNIFICANT, so reject the Ho that the correlation = 0. COEFFICIENTS shows the slope and the Y-intercept, along with the CI of the slope. The slope is -4.891, with a Y-intercept of 145.800. REGRESSION line equation in words: Predicted value of Y = Y-intercept + (slope)(value of X). Y(hat) = 145.8 + (-4.89)X, or 145.8 - 4.89X. FINAL AREA LOOKED AT TO DECIDE whether to REJECT THE NULL: the CI of the slope, which does not contain zero between its lower and upper bounds. Because the CI of the slope does not contain zero, reject the Ho that slope = 0; X is a SIGNIFICANT predictor of Y.
*Least Squared Error Principle*
Finds the regression line that comes closest to the actual scores on the criterion variable. The best prediction rule is the one with the smallest sum of squared errors (each actual Y minus its predicted Y, squared, then summed).
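A minimal sketch of the least squared error principle, assuming made-up data. The slope is b = SP / SS_X (sum of cross-products of deviations divided by the sum of squared X deviations) and the intercept is a = M_Y - b(M_X). Nudging either value away from the least-squares solution only increases the sum of squared errors. Function names are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates: b = SP / SS_x, a = mean(Y) - b * mean(X)."""
    mx, my = mean(x), mean(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x
    return my - b * mx, b

def sse(x, y, a, b):
    """Sum of squared errors between actual Y and predicted Y(hat)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
best = sse(x, y, a, b)

# Moving the intercept or slope in either direction only adds error
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(x, y, a + da, b + db) > best

print(a, b, best)
```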
APA Results General Outline: Linear Regression Analysis
General Outline: 1) State what kind of test was used (linear regression analysis). 2) Which variables: include the independent and dependent variables (IV = total strength; DV = injury index). 3) Formal APA-style statistical statement (includes r and p values, the equation of the line, the confidence interval for the slope, and r^2). 4) Brief interpretation in plain English of the results in the context of the research problem: Is the correlation significant or not? Is the slope different from zero? How much variance in Y is explained by the relationship between X and Y (R^2)? 5) Is the null hypothesis rejected or not?
Decreasing the Error in Prediction
MORE ERROR: predicted score Y(hat) = 5, actual score Y = 10. Prediction Rule, LESS ERROR: predicted score Y(hat) = 5, actual score Y = 16.
Regression: Pre-analysis Steps
PRE-ANALYSIS: Create the Total Strength Index by going to TRANSFORM > Compute Variable. SPSS will create it by adding the 5 z-scored variables together. Type the name of the new variable in Target Variable. Step 1) Select ALL from the Function group, then under Functions and Special Variables select SUM and use the up arrow to put it into the Numeric Expression box at the top. Step 2) Put the cursor inside the parentheses, then click on each variable you want to include and move it into the Numeric Expression with the arrow button, putting a comma between variables. THEN CLICK OK. Step 3) The new variable, the z-scored Total Strength Index, will appear in the last column of the data file and will be used as the IV (X) in the regression analysis.
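The Compute Variable step amounts to standardizing each strength measure and then adding each participant's z scores. Below is a hypothetical Python sketch of that logic; the five measure names and all scores are invented, and the sample SD (n - 1) is assumed for the z scores.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize one measure using the sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw scores: 5 strength measures x 4 participants
strength = {
    "grip": [30, 40, 35, 45],
    "bench": [60, 90, 70, 100],
    "squat": [80, 120, 100, 140],
    "pullups": [5, 12, 8, 15],
    "legpress": [150, 210, 170, 230],
}

z_columns = [z_scores(scores) for scores in strength.values()]
# Like SUM(z1, z2, ...) in Compute Variable: add each participant's 5 z scores
total_strength = [sum(person) for person in zip(*z_columns)]
print([round(t, 2) for t in total_strength])
```

Because each z column has a mean of 0, the total index is also centered on 0; participants above 0 are stronger than average across the five measures.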
What is linear regression?
Predicts the score on one variable from the score on a different variable. For example, does the IV (predictor variable) predict the DV (criterion variable)? Similar to correlation: if we know the relationship between two variables, we can predict the score on one based on the score on the other. It is called "linear" regression because it fits a straight line on a graph that represents the best predictive relationship between two variables. *MORE EXAMPLES*: Predicting college GPA from high school GPA; predicting playground behavior from the amount of aggression viewed on TV; using the average amount of exercise per week to predict resting heart rate.
Communicating Results in APA STYLE: Linear Regression
Report the simple correlation (r) between the variables and its significance level. Report the linear prediction rule (the equation of the line of best fit). State whether or not the regression coefficient (slope) is significantly different from 0 by examining the confidence interval of the slope. In reporting, include a graph of the regression line on the scatterplot.
Linear Regression Analysis and Interpretation
Statistical Statement Template: -- r(98) = -.33, p < .01 -- Y(hat) = 145.8 + (-4.89)X, or simplified as Y(hat) = 145.8 - 4.89X, or predicted injury score = 145.8 - 4.89(body strength score); CI of slope = -7.74 to -2.04 -- R^2 = .11. ENGLISH STATEMENT: There is a significant negative linear relationship between bodily strength and the number and severity of injuries. The regression equation for the best prediction rule is given, and the confidence interval (CI) of the slope does not contain the value of zero. The relationship between the injury and strength variables accounts for 11% of the variance in the injury index (the DV).
Multiple Regression
Tests relationships between two or more independent variables and one dependent variable. Each IV has its own regression coefficient and can contribute to the prediction rule in different ways. Compute the *multiple correlation coefficient (R)* to find the overall relationship between the IVs and the dependent variable. Compute R^2 to find the proportion of variance accounted for by all IVs taken together.
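A bare-bones sketch of how a multiple regression prediction rule, R, and R^2 could be computed, assuming two hypothetical predictors and data constructed exactly as Y = 1 + 2(X1) + 3(X2), so the expected coefficients are known. It solves the least-squares normal equations with plain Gauss-Jordan elimination; in practice SPSS (or a numerical library) does this for you. All names are hypothetical.

```python
def solve(A, v):
    """Solve the linear system A w = v by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: swap up the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [m - f * p for m, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def multiple_regression(X, y):
    """Least-squares coefficients [a, b1, b2, ...] plus R and R^2."""
    rows = [[1.0] + list(r) for r in X]  # leading 1 carries the intercept a
    k = len(rows[0])
    # Normal equations: (X'X) w = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    y_hat = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    r_squared = 1 - ss_error / ss_total
    return coef, r_squared ** 0.5, r_squared

# Hypothetical data built exactly as Y = 1 + 2*X1 + 3*X2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [9, 8, 19, 18, 29]
coef, R, R2 = multiple_regression(X, y)
print([round(c, 3) for c in coef], round(R2, 3))  # [1.0, 2.0, 3.0] 1.0
```

With real (noisy) data, R^2 would fall below 1 and would be read as the proportion of variance in Y accounted for by both IVs together.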
The line of Best Fit
Uses the *Least Squared Error Principle* to determine the prediction rule from data that already exist (i.e., we have real scores for both X and Y). Without a prediction rule, the best predicted score would be the mean of Y.
Graphing the Correlation between two variables
Using a scatter plot, draw a line in the direction that best fits the points on the graph, using the equation *Y(hat) = a + (b)(X)*, where Y(hat) = criterion variable (predicted score); X = predictor variable (score predicted from); a = regression constant (Y-intercept: where the line meets the vertical axis); b = regression coefficient (slope: how steeply the line slopes up or down).
Graphing the Regression Line
Using the least squared error principle: Y(hat) = a + (b)(X), where Y(hat) = criterion variable (predicted score); X = predictor variable (score predicted from); a = regression constant (Y-intercept: where the line meets the vertical axis); b = regression coefficient (slope: how steeply the line slopes up or down).
APA Results Written example Template: Linear Regression Analysis
WRITTEN EXAMPLE: A linear regression analysis was conducted to determine whether scores on an index of total body strength predict scores on an index of injury severity and frequency. The results indicated a significant negative linear relationship between the variables, r(98) = -.33, p < .01. The equation for the best prediction rule is Y(hat) = 145.8 + (-4.89)X, or predicted injury score = 145.8 - 4.89(body strength score). The confidence interval (CI) of the slope, -7.74 to -2.04, did not contain the value of zero, indicating a significant relationship between the variables. This linear relationship between injury and strength accounted for 11% of the variance in the injury scores (R^2 = .11). The null hypothesis of no predictive relationship between the variables is rejected. Higher scores of bodily strength tend to predict lower injury scores.
SPSS: Linear Regression
When choosing a linear regression analysis: -- *Process of REGRESSION*: Predicts the score on one variable from the score on another variable, where both variables are at the scale level of measurement. -- Related to *correlation*, but goes one step further by determining the equation of the line that gives the best prediction rule for predicting the score on the DV from the score on the IV.
The importance of the Regression Coefficient (b) in Prediction
If b = 0, then Y(hat) = 3.5 + 0(X), so Y(hat) will always = 3.5 at every value of X. The value of X tells us nothing useful about the value of Y. That is why we test whether the slope is significantly different from zero.
Proportionate Reduction in Error
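r^2: the proportion of variance in Y that is attributed to its relationship with X. It is the proportionate reduction in error: (SS_total - SS_error) / SS_total, where SS_total is the squared error when predicting the mean of Y for everyone and SS_error is the squared error when using the prediction rule. If r^2 = .34, then 34% of the variance in Y is accounted for by its relationship with X.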
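A small Python check, with made-up data, that the proportionate reduction in error equals r^2. The baseline error comes from predicting the mean of Y for everyone; the remaining error comes from using the linear prediction rule.

```python
import math
from statistics import mean

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mx, my = mean(x), mean(y)
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)
b = sp / ss_x
a = my - b * mx

# Error when everyone is predicted to score at the mean of Y (no prediction rule)
ss_total = ss_y
# Error left over when the linear prediction rule is used
ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

pre = (ss_total - ss_error) / ss_total
r = sp / math.sqrt(ss_x * ss_y)
print(round(pre, 3), round(r ** 2, 3))  # 0.6 0.6
```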
Standardized Regression Coefficient
•• Works in a way similar to Z scores, making it meaningful to compare outcomes from two or more different studies that use different instruments or measures ("apples to oranges"). •• Shows the predicted amount of change, in standard deviation units of the criterion variable, if the value of the predictor variable increases by one standard deviation. •• In bivariate prediction, the standardized regression coefficient is equal to r, the correlation coefficient.
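A minimal sketch, with made-up data and hypothetical function names, showing that the slope computed on z scores (the standardized regression coefficient) equals r in the bivariate case, and also equals the raw slope b rescaled by SD_X / SD_Y.

```python
import math
from statistics import mean, stdev

def slope(x, y):
    """Least-squares slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    return sp / ss_x

def z_scores(values):
    """Standardize using the sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b = slope(x, y)                            # raw-score slope
beta = slope(z_scores(x), z_scores(y))     # slope with both variables in SD units
beta_from_b = b * stdev(x) / stdev(y)      # same value: rescale b by the SD ratio

mx, my = mean(x), mean(y)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / math.sqrt(
    sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)
)
print(round(b, 3), round(beta, 3), round(r, 3))  # 0.6 0.775 0.775
```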