Chapter 13 Simple Linear Regression
unexplained variation
(error sum of squares) represents variation due to factors other than the relationship between X and Y
residual
(estimated error value) is the difference between the observed and predicted values of the dependent variable for a given value of X
independent variable
(explanatory variable) the variable used to make predictions; denoted X
equal variance assumption of regression
(homoscedasticity) requires that the variance of the errors be constant for all values of X
explained variation
(regression sum of squares) represents variation that is explained by the relationship between X and Y
dependent variable
(response variable) the variable you wish to predict; denoted Y
total sum of squares
(total variation) a measure of the variation of the Yi values around their mean, Y-bar
Y intercept
mean value of Y when X equals 0
coefficient of determination
measures the proportion of variation in Y that is explained by the variation in the independent variable X in the regression model
standard error of the estimate
measures the variability of the observed Y values from the predicted Y values
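The variation measures above fit together as SST = SSR + SSE, with the coefficient of determination r² = SSR/SST and the standard error of the estimate computed from SSE. A minimal sketch, using hypothetical toy data and assumed fitted coefficients (b0 = 2.2, b1 = 0.6):

```python
import math

# Hypothetical data and assumed fitted coefficients from a least-squares fit.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6
n = len(y)
ybar = sum(y) / n

yhat = [b0 + b1 * xi for xi in x]                      # predicted values
sst = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # error sum of squares
ssr = sst - sse                                        # regression sum of squares
r2 = ssr / sst                                         # coefficient of determination
s_yx = math.sqrt(sse / (n - 2))                        # standard error of the estimate
```

Here r² = 0.6 means 60% of the variation in Y is explained by the variation in X; s_yx summarizes the typical spread of observed Y values around the prediction line.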
residual analysis
visually evaluates the regression assumptions to determine whether the regression model is appropriate
assumptions of regression
linearity, independence of errors, normality of error, and equal variance
least-squares method
determines the values of the regression coefficients that minimize the sum of squared differences around the prediction line
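The least-squares coefficients can be computed directly from the sums of squares and cross products. A minimal sketch on hypothetical toy data (the x and y values here are invented for illustration):

```python
# Least-squares fit for simple linear regression on hypothetical toy data.
# Sample slope: b1 = Sxy / Sxx; sample Y intercept: b0 = ybar - b1 * xbar.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sxy / sxx          # sample slope
b0 = ybar - b1 * xbar   # sample Y intercept
```

These values of b0 and b1 minimize the sum of squared differences between the observed Y values and the prediction line.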
regression analysis
enables you to develop a model to predict the values of a numerical dependent variable based on the values of one or more other variables
slope
the expected change in Y per unit change in X
relevant range
includes all values from the smallest to the largest X used in developing the regression model. You should not extrapolate beyond the range of X values
independence of errors assumption of regression
particularly important when data are collected over time: the errors for a specific time period are sometimes correlated with those of the previous time period
autocorrelation
a pattern that arises when data are collected over sequential time periods, because a residual at any one time period is sometimes similar to residuals at adjacent time periods
SLR prediction line
predicted value of Y equals the Y intercept plus the slope multiplied by the value of X
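The prediction line can be sketched as a one-line function, here using assumed fitted coefficients (b0 = 2.2, b1 = 0.6) for illustration:

```python
# Hypothetical fitted regression coefficients.
b0, b1 = 2.2, 0.6

def predict(x):
    # SLR prediction line: predicted Y = Y intercept + slope * X
    return b0 + b1 * x

yhat = predict(4)  # X = 4 should lie within the relevant range of the fitted data
```

Per the relevant-range note above, predict should only be applied to X values between the smallest and largest X used to fit the model.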
normality assumption of regression
requires that the errors be normally distributed at each value of X
regression coefficients
sample Y intercept and sample slope
simple linear regression
single numerical independent variable, X, is used to predict numerical dependent variable Y
Durbin-Watson statistic
measures autocorrelation in the residuals
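The Durbin-Watson statistic compares successive residual differences to the residuals' overall magnitude; values near 2 suggest little or no autocorrelation, while values toward 0 suggest positive autocorrelation. A minimal sketch on hypothetical, time-ordered residuals:

```python
# Durbin-Watson statistic on hypothetical residuals from a time-ordered fit:
# D = sum((e_t - e_{t-1})^2) / sum(e_t^2)
e = [0.5, -0.3, 0.2, -0.4, 0.1]  # invented residuals, in time order

num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
den = sum(et ** 2 for et in e)
dw = num / den
```

The statistic always falls between 0 and 4; here the invented residuals alternate in sign, so dw lands above 2, consistent with negative rather than positive autocorrelation.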