Business Analytics II Chapter 13
four assumptions of regression
linearity, independence of errors, normality of error, equal variance
standard error of estimate
measures the variability of the observed Y values around the predicted Y values; equal to √(SSE/(n-2))
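A minimal sketch of this formula in Python, assuming numpy is available; the observed and predicted values are invented for illustration:

```python
import numpy as np

# hypothetical observed and predicted Y values (illustrative only)
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])
y_hat = np.array([3.5, 4.8, 6.6, 6.9, 8.2])

n = len(y)
sse = np.sum((y - y_hat) ** 2)   # error sum of squares
s_yx = np.sqrt(sse / (n - 2))    # standard error of the estimate
print(s_yx)
```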
durbin-watson statistic
used to measure autocorrelation; measures the correlation between each residual and the residual for the previous time period; with positive autocorrelation, D approaches 0 (a value near 2 indicates no autocorrelation)
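A small sketch of the statistic, D = sum of (e(i) − e(i−1))² over sum of e(i)², applied to hypothetical residuals (the values are invented for illustration):

```python
import numpy as np

# hypothetical residuals from a time-ordered regression (illustrative only)
e = np.array([1.2, 0.9, 1.1, -0.4, -0.8, -0.5, 0.3])

# Durbin-Watson: squared successive differences over squared residuals
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)  # near 0 -> positive autocorrelation; near 2 -> no autocorrelation
```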
residual analysis
a visual evaluation of the four assumptions that helps you determine whether the regression model that has been selected is appropriate
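A minimal sketch of a residual plot, assuming numpy and matplotlib and a small invented dataset; a visible pattern in such a plot suggests the model is not appropriate:

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical data (illustrative only)
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])

b1, b0 = np.polyfit(x, y, 1)       # least-squares slope and intercept
residuals = y - (b0 + b1 * x)

# plot residuals against X; any visible pattern suggests a violated assumption
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("X")
plt.ylabel("residual")
plt.show()
```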
relevant range
when using regression models, only consider the ____________________ of the independent variable in making predictions; *includes all values from the smallest to the largest X used in developing the regression model*; you can interpolate within this range but not extrapolate beyond it
B0
Y intercept for the population; represents the mean value of Y when X = 0
H1
B1 does not equal 0
H0
B1=0; if rejected, you conclude that there is evidence of a linear relationship
regression mean square (MSR)
SSR divided by its degrees of freedom, which is 1 in simple regression, so MSR = SSR/1 = SSR
total sum of squares (SST)
a measure of variation of the Yi values around their mean; total variation is divided into explained variation and unexplained variation; *equal to the regression sum of squares plus the error sum of squares (SST = SSR + SSE)*
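A sketch verifying the decomposition numerically with invented data, using numpy's polyfit for the least-squares line (the identity holds only for least-squares fitted values):

```python
import numpy as np

# hypothetical data (illustrative only)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])

b1, b0 = np.polyfit(x, y, 1)        # least-squares fit
y_hat = b0 + b1 * x
y_bar = y.mean()

sst = np.sum((y - y_bar) ** 2)      # total variation
ssr = np.sum((y_hat - y_bar) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)      # unexplained variation
print(np.isclose(sst, ssr + sse))   # True: SST = SSR + SSE
```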
independent variable
also known as the explanatory variable; the X variable used to predict the dependent variable
dependent variable
also known as the response variable; the Y variable being predicted
prediction line
the straight line formed from the simple linear regression model, ^Yi = b0 + b1Xi; used to predict Y for a given value of X
f test
an alternative to the t test; used to determine whether the slope is statistically significant; Fstat = MSR/MSE (regression mean square divided by mean square error)
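A minimal sketch of the test in Python, assuming scipy is available; the sums of squares and sample size are invented for illustration:

```python
from scipy import stats

# hypothetical sums of squares and sample size (illustrative only)
ssr, sse, n = 52.0, 8.0, 20

msr = ssr / 1                         # regression mean square (1 df in simple regression)
mse = sse / (n - 2)                   # mean square error
f_stat = msr / mse

f_crit = stats.f.ppf(0.95, 1, n - 2)  # critical value at alpha = 0.05
print(f_stat > f_crit)                # True -> reject H0: evidence of a linear relationship
```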
autocorrelation
a basic assumption of the regression model is independence of errors, but this is violated when data are collected over sequential time periods, because a residual at any one time period is often similar to the residuals at adjacent time periods; the validity of the model is doubtful in this case
mean square error (MSE)
error variance; SSE/(n-2)
residual
the estimated error value; the difference between the observed value (Yi) and the predicted value (^Yi) of the dependent variable for a given value of Xi
regression sum of squares (SSR)
explained variation; represents the variation that is explained by the relationship between X & Y; *based on the difference between ^Yi (the predicted value of Y from the prediction line) and Ȳ (the mean value of Y)*
reject H0 in f test
if Fstat > Fα, the upper-tail critical value from the F distribution with 1 and n-2 degrees of freedom
point estimate
the predicted value of Y for a given value of X, ^Y = b0 + b1X
coefficient of determination
r²; equal to the regression sum of squares divided by the total sum of squares (SSR/SST); gives the proportion of variation in Y that is explained by the variation in the independent variable X in the regression model
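A small sketch computing r² from its definition, again with invented data and numpy:

```python
import numpy as np

# hypothetical data (illustrative only)
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.0, 4.1, 5.8, 8.3, 9.6, 12.2])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sst = np.sum((y - y.mean()) ** 2)      # total variation
r2 = ssr / sst                         # proportion of Y's variation explained by X
print(r2)
```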
Ei
represents the random error in Y for each observation i; *the vertical distance of the actual value of Yi above or below the expected value of Yi on the line*
independence of errors assumption
requires that the errors be independent of one another; particularly important when data are collected over a period of time, because in a time series a residual may be related to the residual that precedes it, producing a cyclical pattern that violates the assumption
normality assumption
requires that the errors be normally distributed at each value of X; if the residuals appear to depart substantially from the normal distribution, the assumption is violated
equal variance assumption (homoscedasticity)
requires that the variance of the errors be constant for all values of X; if the residual plot is funnel-shaped, the assumption is violated
B1
slope for the population, represents the expected change in Y per unit change in X
linearity assumption
states that the relationship between the variables is linear; if the model is appropriate, you will not see any pattern in the residual plot
least-squares
this method minimizes the sum of the squared differences between the actual values (Yi) and the predicted values (^Yi) obtained from the prediction line
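A minimal sketch of the closed-form least-squares solution for the slope and intercept, using invented data:

```python
import numpy as np

# hypothetical data (illustrative only)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.2, 3.9, 6.1, 7.8, 10.2])

# closed-form least-squares estimates of slope and intercept
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # the prediction line
print(b0, b1)
```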
t test
to determine the existence of a significant linear relationship between the X and Y variables, you test whether B1 = 0 using tstat = b1/Sb1, which follows a t distribution with n-2 degrees of freedom
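A sketch of the slope t test, assuming scipy is available; the data are invented for illustration:

```python
import numpy as np
from scipy import stats

# hypothetical data (illustrative only)
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.3, 3.8, 6.2, 7.7, 9.9, 12.4])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

s_yx = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))  # standard error of estimate
s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))  # standard error of the slope

t_stat = (b1 - 0) / s_b1                            # test H0: B1 = 0
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_val)
```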
error sum of squares (SSE)
unexplained variation, represents variation due to factors other than the relationship between X & Y; *based on the difference between Yi & ^Yi*