Ch. 12
A standardized residual that is outside the range ______ or _______ would be considered unusual or an outlier.
(-2, +2); (-3, +3)
Which of the following sample correlation coefficients shows the strongest association between X and Y?
-0.95
Given that SSxy = 492.45, SSxx = 234.5, b1 =
2.1
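The arithmetic behind this answer is the least-squares slope formula b1 = SSxy / SSxx. A minimal sketch, assuming a hypothetical helper named `slope` (the name is illustrative, not from the text):

```python
def slope(ss_xy: float, ss_xx: float) -> float:
    """Least-squares slope: b1 = SSxy / SSxx."""
    if ss_xx == 0:
        # SSxx is zero only when every x value is identical,
        # in which case no slope can be estimated.
        raise ValueError("SSxx must be nonzero")
    return ss_xy / ss_xx


# Values taken from the question above.
b1 = slope(492.45, 234.5)
print(round(b1, 2))  # 2.1
```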
If the sample regression equation is found to be ŷ = 10 + 2x, what is the estimated value of Y if x = 5?
20
Given that b1 = 2.1, x̄ = 10 and ȳ = 65.8, b0 =
44.8
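This answer follows from b0 = ȳ − b1·x̄, since the fitted line passes through the point of means. A short sketch, assuming a hypothetical helper named `intercept`:

```python
def intercept(b1: float, x_bar: float, y_bar: float) -> float:
    """Least-squares intercept: b0 = y_bar - b1 * x_bar."""
    return y_bar - b1 * x_bar


# Values taken from the question above.
b0 = intercept(2.1, 10, 65.8)
print(round(b0, 1))  # 44.8
```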
In a study, SST = 1,000, SSE = 200. Find the coefficient of determination.
80%
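The 80% comes from R² = SSR / SST, where SSR = SST − SSE. A minimal sketch, assuming a hypothetical helper named `r_squared`:

```python
def r_squared(sst: float, sse: float) -> float:
    """Coefficient of determination: R^2 = SSR / SST, with SSR = SST - SSE."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    ssr = sst - sse  # explained variation
    return ssr / sst


# Values taken from the question above.
print(r_squared(1000, 200))  # 0.8
```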
Fill in the missing symbols between the sums of squares to express the relationship: SST_____SSR_____SSE
=; +
Given the following output from a regression analysis: ŷ = 125 + 7.4x, R² = .15, p-value = .453 for the zero slope test (α = .05), one would conclude
Approximately 15% of the variation in Y is explained by X. This is a poor fit because R2 is closer to zero than one. The slope is not significantly different from zero.
Which of the following is a use of the standard error of estimate in regression analysis?
As a goodness of fit measure.
True or false: If a relationship exists between a response variable Y and a predictor variable X it is appropriate to say that X causes variation in Y.
False
The Excel regression analysis is found in
Data > Data Analysis > Regression
If the trend line equation is ŷ = 15 + 5x, which of the following is the correct interpretation of 5?
For every unit increase in X, Y on the average will increase by 5 units.
In hypothesis tests about the population correlation coefficient, ρ, the null hypothesis for a two-tailed test is
H0: ρ = 0
In hypothesis tests about the population correlation coefficient, ρ, the alternative hypothesis for a two-tailed test is
H1: ρ ≠ 0
What are the steps to take to deal with unusual observations?
If a data point is an error, it may be discarded. Note any unusual observations when reporting results. Try to determine if the observation is influential.
How does the coefficient of determination help as a goodness of fit tool in regression analysis?
It gives the percentage of the variation in Y explained by the sample regression equation.
Which of the following choices describe the information we look for on a scatterplot?
Pattern of relationship; direction of relationship; presence of outliers
What type of relationship exists between X and Y if as X increases Y increases?
Positive
What type of linear relationship appears to exist between two variables if rxy = 0.83?
Positive; strong
For which of the following scenarios would a simple regression model be appropriate?
Predicting units sold from advertising spending. Predicting salary from years of education. Predicting height from age.
To test for overall significance of a regression, we compare which two sums of squares?
SSR to SSE
The coefficient of determination or R² is calculated as
SSR/SST
What does SSR represent in regression analysis?
The amount of variation in Y that is explained.
What does SSE represent in regression analysis?
The amount of variation in Y that is left unexplained.
The three regression assumptions are
The errors are independent. The errors are normally distributed. The errors have constant variance.
If the trend line equation is ŷ = 15 + 5x, which of the following is the correct interpretation of 15?
The line crosses the y axis at y = 15
Given the following output from a regression analysis: ŷ = -54 + 2.3x, R² = .75, p-value = .003 for the zero slope test (α = .05), one would conclude
The slope is significantly different from zero. Approximately 75% of the variation in Y is explained by X. For each unit increase in X, Y increases by 2.3 units, on average.
The use of the standard error of regression, se, as a measure of goodness of fit of a model is best expressed by which of the following statements?
The smaller the value, the better the fit.
True or false: Regression calculations are typically done on a computer because the calculations can be quite tedious and lengthy.
True
We can check for normality of errors by looking at
A histogram of residuals; a normal probability plot of residuals.
A _____________ interval for Y, the response variable, predicts the mean of Y whereas a ____________ interval for Y predicts the individual value for Y.
confidence; prediction
High leverage residuals are of interest because they indicate a point that
could have a strong influence on the regression estimates.
If the error terms in a regression analysis are not independent we say the errors are
autocorrelated
The only data points that could possibly be discarded in a regression analysis are those that are
errors
Total variation in Y = Variation in Y ___________ by X + __________ variation
explained; unexplained
When using a simple regression equation for predicting a response variable, ____________ or predicting outside the range of observed x values, should be approached with caution.
extrapolation
In order to fit a regression line on an Excel scatterplot:
From the ribbon: click on Chart Tools > Layout > Trendline. Or highlight the data points on the scatterplot, right-click, and choose Add Trendline.
If the error terms in a regression analysis do not have constant variance we say the errors are
heteroscedastic
Given the following regression equation: profit = $20,000 + $45 × advertising expenditure, we could conclude
If nothing is spent on advertising, the average profit will be $20,000. For each one-unit increase in advertising expenditure, we see a $45 increase in profit, on average.
With the population linear model: Y = β0 + β1X + ε, β0 represents the
intercept population parameter
The _________________ the value of the F statistic, the better the fit of the regression. But we still must compare F calc to a _______________ value for F in order to conclude statistical significance.
larger; critical
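For simple regression (one predictor), the F statistic is the ratio of the mean squares: F = MSR / MSE, with MSR = SSR / 1 and MSE = SSE / (n − 2). The sketch below assumes a hypothetical helper named `f_statistic` and an illustrative sample size of n = 22, which is not from the text. The critical value still has to be looked up for 1 and n − 2 degrees of freedom.

```python
def f_statistic(ssr: float, sse: float, n: int) -> float:
    """F = MSR / MSE for a simple (one-predictor) regression."""
    if n <= 2:
        raise ValueError("n must exceed 2 so that SSE has positive degrees of freedom")
    msr = ssr / 1        # regression mean square, 1 degree of freedom
    mse = sse / (n - 2)  # error mean square, n - 2 degrees of freedom
    return msr / mse


# Illustrative values only: SSR = 800, SSE = 200, n = 22 (assumed).
print(f_statistic(800, 200, 22))  # 80.0
```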
A residual that has a high leverage statistic is a point that is far from the _____________ of the x variable.
mean
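In simple regression the leverage of observation i is h_i = 1/n + (x_i − x̄)² / SSxx, so it grows as x_i moves away from the mean of x. A minimal sketch, assuming a hypothetical helper named `leverage` and made-up x values:

```python
def leverage(xs: list[float]) -> list[float]:
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / SSxx for each observation."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    if ss_xx == 0:
        raise ValueError("x values must not all be equal")
    return [1 / n + (x - x_bar) ** 2 / ss_xx for x in xs]


# Made-up data: the last x value lies far from the mean of 4.
h = leverage([1, 2, 3, 4, 10])
print([round(v, 2) for v in h])  # [0.38, 0.28, 0.22, 0.2, 0.92]
```

The point at x = 10 has by far the largest leverage because it is farthest from the mean of x.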
The confidence interval for Y will be ___ prediction interval for Y.
narrower than the
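The reason is visible in the two half-width formulas: the confidence interval for the mean of Y uses t·se·√(1/n + (x − x̄)²/SSxx), while the prediction interval for an individual Y adds 1 under the square root. A sketch under assumed inputs (the helper name `interval_half_widths` and all numeric values are illustrative, not from the text):

```python
import math


def interval_half_widths(t_crit: float, s_e: float, n: int,
                         x: float, x_bar: float, ss_xx: float) -> tuple[float, float]:
    """Return (confidence half-width, prediction half-width) at a given x."""
    core = 1 / n + (x - x_bar) ** 2 / ss_xx
    ci = t_crit * s_e * math.sqrt(core)      # interval for the mean of Y
    pi = t_crit * s_e * math.sqrt(1 + core)  # interval for an individual Y
    return ci, pi


# Assumed values for illustration only.
ci, pi = interval_half_widths(t_crit=2.101, s_e=3.0, n=20,
                              x=12, x_bar=10, ss_xx=234.5)
print(ci < pi)  # True
```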
We can check for dependent errors by
plotting the residuals in sequence and looking for a nonrandom trend.
The formula for calculating the two-tailed critical value of r, the sample correlation coefficient is:
$r_{\text{critical}} = \dfrac{t_{\alpha/2}}{\sqrt{t_{\alpha/2}^{2} + n - 2}}$
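As a quick check of the formula, the sketch below computes the critical r from a critical t that is supplied by the caller (looked up with n − 2 degrees of freedom). The helper name `r_critical` and the example of n = 20 with α = .05 are assumptions for illustration; 2.101 is the two-tailed t value for 18 degrees of freedom.

```python
import math


def r_critical(t_crit: float, n: int) -> float:
    """Critical correlation: t / sqrt(t^2 + n - 2), with t at n - 2 d.f."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


# Assumed example: n = 20, alpha = .05 two-tailed, t(.025, 18) = 2.101.
print(round(r_critical(2.101, 20), 3))  # 0.444
```

A sample correlation larger than this value in absolute terms would be judged significantly different from zero.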
We test the three regression assumptions using the
residuals.
Both the standard error of the slope estimate and the standard error of the intercept estimate are calculated using
se
In statistics, a straight-line model of the relationship between two variables, X and Y, is called a
simple regression equation
With the population linear model: Y = β0 + β1X + ε, β1 represents the
slope population parameter
Simple regression describes the relationship between two variables, X and Y, using the ___________ and _______________ form of a linear equation
slope; intercept
If the fit of the regression line is good the value of SSE will be relatively ________________ compared to SST.
smaller
We calculate a ________________ residual in order to spot unusual or outlier residual values.
standardized
Confidence intervals for the slope and intercept of a simple regression line are calculated using the
t statistic
If the error terms in a regression analysis are not normally distributed
the confidence intervals for the parameters could be untrustworthy.
Non-normality of errors is considered a mild violation unless
the data has major outliers.
Non-constant error variance is considered a serious violation because
the significance of the regression could be overstated.
The test for zero slope will give the same result as the test for zero correlation because
the tcalc value will be the same in both tests.
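The equality can be checked numerically: t for the slope is b1 / s_b1 with s_b1 = se / √SSxx, and t for the correlation is r·√(n − 2) / √(1 − r²). The sketch below uses a small made-up data set and a hypothetical helper named `slope_and_correlation_t`; neither comes from the text.

```python
import math


def slope_and_correlation_t(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (t for zero slope, t for zero correlation) from raw data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))   # standard error of the regression
    s_b1 = s_e / math.sqrt(ss_xx)    # standard error of the slope
    t_slope = b1 / s_b1

    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_slope, t_corr


# Made-up data for illustration only.
t_slope, t_corr = slope_and_correlation_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(math.isclose(t_slope, t_corr))  # True
```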
Dependent errors are often found in
time-series data.
True or false: Once your data have been plotted on an Excel scatterplot you can fit a regression line to the points using the Trendline option.
True
To calculate the critical value of the correlation coefficient one will need to find the critical t statistic using n - _____________ degrees of freedom
two
A high residual value means the observation is far from the regression line in the ____________ direction
vertical
Given that the 95% confidence interval for β1 is (-34.5, 26.8) we can say that because the interval contains _____________ it is possible that the slope is 0.
zero
The test for zero correlation is the same as the test for
zero slope