Regression Analyses
Steps of regression analysis
(step 1), then compare a critical F-statistic (step 2) to a ratio of systematic to unsystematic variance, which is an F-ratio (step 3) the F-ratio will be compared to the critical F-statistic and interpreted (step 4).
independence of errors
- successive observations should not be related - Important when the independent variable is time
The smallest possible value for r2 is
.00
If r = −.10, select the correct value for the coefficient of determination (in decimal form, not as a percentage).
.01
Assumptions of regression analysis
1. Linearity within the relevant range 2. Constant variance of residuals (residual terms are unaffected by the level of the cost driver) - uniform scatter 3. Independence of residuals 4. Normality of residuals
The denominator value is calculated by:
1. Measuring the distance between the best-fit regression line and each sample outcome value (i.e., the vertical lines in the right graph of the figure). 2. Squaring each distance. 3. Adding all the squared distances. 4. Dividing this sum by two minus the sample size (dfresiduals), which produces the error variance.
The numerator value is calculated by:
1. Measuring the distance between the diagonal best-fit regression line and a horizontal line with a slope of 0 (i.e., the vertical lines in the left graph of the figure). 2. Squaring each distance. 3. Adding all the squared distances. 4. Dividing this sum by one minus the number of variables (dfregression), which produces the variance of the regression model.
Appropriate Null Hypothesis
H0: β=0
If the F statistic is larger than the critical F value:
Reject the null hypothesis, H0.
Across the years 1990 to 2006, there was a correlation of r = −.80 between annual new passenger car sales and the annual average cost of red delicious apples, such that as new passenger car sales decreased, the cost of red delicious apples increased (Vigen, n.d.a). What else should be reported for this correlational analysis?
The result of hypothesis testing, including p value
x
a given value of the predictor variable
bivariate linear regression
a procedure in which the linear relationship between a single interval- or ratio-level predictor variable is used to predict the value of an interval- or ratio-level outcome variable
regression analysis
a process in which one or more predictor variables is used in an equation that yields the value of an outcome variable
Method of least squares (regression)
a statistical way to find the best-fitting line through a set of data points
Bivariate linear regression analysis example:
analyzing if total years of school is a significant predictor of annual income.
predictor variable is measured in
observational research along with the outcome variable.
ŷ
predicted value of y (the outcome variable)
Predictor and outcome variables
predictor variables are most often plotted on the x-axis of a graph, with the outcome variables on the y-axis
normality of errors
residual histogram appears slightly skewed but is not a serious departure
Which statistics should be reported for a correlational analysis?
df, F, r, p : degrees of freedom in parentheses, the r value (the correlation coefficient) and the p value
Matthews (2000) identified the bivariate regression equation ŷ = .03x + 225.03, with the predictor variable number of stork breeding pairs and the outcome variable human birth rate (1000s/year). Which of the following is the appropriate interpretation of the slope?
for every 1 stork breeding pair, the human birth rate (1000s/year) increases by 0.03.
Effect size is used to determine
how meaningful the relationship between variables or the difference between groups is
The slope indicates
how much the outcome variable, y, will change for every 1 unit increase in the predictor variable, x.
R-squared is a measure of
the "fit" of the line to the data and will have a value between 0 and 1. The larger the value of R-squared, the better the fit.
As the value in the numerator increases
the F-ratio increases and the predictive power of the regression line increases
Each deviation score for a residual sum of squares is calculated as the difference between what two values?
the actual Y value and the predicted Y value.
a
the intercept of the regression line
The intercept equals
the mean of the outcome variable minus the product of the slope and the mean of the predictor variable
b
the slope of the regression line
The intercept indicates
the value of the outcome variable, y, when the predictor variable, x, equals 0
If the slope of the regression line is small
the variance of the regression model (numerator) should be about the same as the variance of the error (denominator)
If the slope of the regression line is large
the variance of the regression model (numerator) should be larger than the variance of the error (denominator)
What is the value of the slope of the regression equation
times the ratio of the standard deviation for the outcome variable over the standard deviation of the predictor variable
independent variable is
under the control of the researcher and is manipulated in an experimental context
