Correlation and Linear Regression
A test for the slope of the regression line uses the hypotheses H0: β = 0, H1: β ≠ 0. What are we seeking to discover with this test?
If the regression line has predictive power for the dependent variable.
How is the "confidence interval" used as a part of regression analysis?
It is used when predicting the mean value of Y for a given X.
The regression line has some power to predict the value of the dependent variable.
Student's t-distribution
The formula for the test of significance of the sample correlation is: t = r√n−2√1−r2rn-21-r2. Match the variables to their description.
t matches Choice t-distribution test statistic n - 2 matches Choice degrees of freedom r matches Choice sample correlation n matches Choice sample size
The equation for the test for the slope of a regression line is: t = b−0sbb-0sb. Match the variables and their description for this equation.
t- the test statistic n-2 -degrees of freedom for t b - r(sY/sX) Sb - standard error of slope
The general form of the regression equation is written like this: Y∧Y∧= a + bX. Match the variables to their description.
y^ = Estimated y Value x = independant variable a = y intercept b = slope of line
If X is the miles on a particular model used car and Y is its sales price and the regression equation relating them is Y∧Y∧= 18,000 - 0.083X, what is the predicted sales price of a car with 25,000 miles on it?
$15,925
If X is the amount a grocer spends on advertising and Y is his gross sales and the regression equation relating them is Y∧Y∧= 23044 + 10.4X, what is his predicted income if he spends $4000 on advertising?
$64644
How can you tell if the relationship between two variables is non-linear?
You can easily see a non-linear relationship on a scatter diagram.
Which of the following regression equations looks like it matches the scatter diagram below?
Y∧Y∧= 14.4 - 1.9X.
Which of the following regression equations looks like it matches the scatter diagram below?
Y∧Y∧= 3.0 + 1.6X.
Which of these is the equation for the y-intercept of the regression line?
a = YY- bX
The equation for the y-intercept of the regression line is: a = YY- bXX. Match the variables to their descriptions.
a matches Choice the y-intercept b matches Choicethe slope of the regression line YY matches Choice mean of the dependent variable XX matches Choice mean of the independent variable
Which of the following is true regarding the standard error of the estimate?
The smaller the standard error of the estimate, the closer the points are to a straight line.
How is the standard error of the estimate calculated from ANOVA information?
mse
What is Correlation Analysis?
A group of techniques to measure the relationship between two variables.
An experiment of study times versus test scores found a correlation coefficient of r = 0.49. How would you describe this relationship?
A moderate positive correlation.
Which of the following are characteristics of the correlation coefficient? Select all that apply.
A value near 1 indicates a positive linear relationship. Indicates the direction and strength of the linear relationship between two interval or ratio scale variables. A value near zero indicates little linear relationship between the variables. The symbol of the sample correlation coefficient is lowercase, r.
A study of finishing time versus standardized test scores found a correlation coefficient of r = 0.13. How would you describe this relationship?
A weak positive correlation.
Which of the following is a valid null hypothesis for the test of significance of the correlation coefficient?
H0: ρ = 0
Which of the following is NOT an example of correlation analysis?
Hypothesis testing for equality of means.
Which of these statements correctly describes the values that can be assumed by the correlation coefficient, r?
It can be any number from -1 to +1, inclusive.
Which two of the following describe the independent variable in a relationship between two variables?
It is used to predict the other variable. On a scatter diagram, it is the horizontal axis.
How is the "prediction interval" used as a part of regression analysis?
It is used when predicting a particular value of Y for a given X.
If two variables are correlated to each other, which two of the following are characteristics of the dependent variable?
It is usually shown on the vertical axis of a scatter diagram. In a cause and effect relationship, it is the effect.
The general form of the regression equation is written like this: Y∧Y∧= a + bX. What is the correct interpretation of b?
It represents the estimated change in Y for a unit change in X
Which of the following is usually the first step in a correlation analysis?
Making a scatter diagram.
A line is drawn through the points on a scatter diagram. Which three of the following are not likely to be a least squares fit?
Nearly all of the data points are below the line. The line passes through the largest and smallest data points. All of the data points are above the line.
Which of the following equations applies to the Coefficient of Determination?
R2 = SSRSStotalSSRSStotal=1 - SSESStotal
What is the symbol for the coefficient of determination?
R2, equal to r squared.
How can you transform a non-linear relationship to better use correlation analysis?
Replace one or both variables with its log, square root, reciprocal, etc.
In analyzing the strength of the relationship between two variables, what does the symbol r represent?
The Pearson correlation coefficient.
If the standard error of estimate for a regression line is large, what would you expect for the coefficient of determination?
The coefficient of determination should be small.
How do you calculate the coefficient of determination?
The coefficient of determination, R2, is the square of the correlation coefficient, r.
Which two of the following are statistics that regression analysis provides to evaluate the predictive ability of the regression equation?
The coefficient of determination. The standard error of the estimate.
What is the term that is used for the proportion of the total variation in Y that is explained by the variation in X?
The Coefficient of Determination
What values can the correlation coefficient assume?
-1≤ r ≤ 1
The correlation between the weight of an automobile and its gas mileage was found to be r = 0.77. What percentage of variation in mileage can be predicted from a car's weight using a regression line?
59%
Assume you have obtained a regression model to predict the sales revenue based on marketing expenditures. If the standard error of the estimate was found to be $12,200, which of the following would be true?
68% of your predictions would be within $12,200 of the actual value.
The correlation between wait time on a help line and customer satisfaction was found to be r = -0.85. What percentage of variation in customer satisfaction can be predicted by wait time using a regression line?
72%
Assume you have obtained a regression model to predict the sales price of a house based on the house's square footage. If the standard error of the estimate was found to be $8,400, which of the following would be true?
95% of your predictions would be within $16,800 of the actual value.
What is the best definition of a regression equation?
An equation that expresses the linear relation between two variables.
When testing a correlation, we use a null hypothesis about the population correlation. What question are we trying to answer with the test?
Could the sample correlation be r even though the population correlation is actually zero?
Why does it make sense that the prediction interval for Y would be wider than the confidence interval?
The confidence interval is for the mean of Y, and the prediction interval is for a single value.
What is the definition of the standard error of estimate?
The dispersion (scatter) of observed values around the line of regression for a given X.
The general form of the regression equation is written like this: Y∧Y∧= a + bX. Why is the dependent variable written as Y∧Y∧instead of just Y?
The hat is to emphasize that the equation estimates the Y-value for a given X.
What are we estimating when we use the "confidence interval" in conjunction with a regression line?
The mean of the distribution of Y for a given X.
In evaluating a regression equation, what does it mean if the standard error of estimate is small? Choose all that apply.
The predicted Y will have small error. The data is close to the regression line.
Which do we expect to be larger, the confidence interval or the prediction interval, and why?
The prediction interval, because it is more accurate to predict a mean than to predict a single value.
What is the definition of the coefficient of determination?
The proportion of the total variation in Y that is explained by the variation in X.
A test for the slope of the regression line uses the hypotheses H0: β = 0, H1: β ≠ 0. If we reject H0 what have we demonstrated about the regression line?
The regression line has some power to predict the value of the dependent variable.
When a line is drawn on a scatter diagram using the least squares principal, what is the quantity that is minimized?
The sum of the squared difference between the line and the data points.
Which of the following tests gives the same result as a test of the regression line slope?
The t-test for the correlation coefficient.
What are we estimating when we use the "prediction interval" in conjunction with a regression line?
The value of Y for a given value of X.
The distance an object falls is directly related to the time it takes to fall, yet the variables time and distance fallen have a low correlation coefficient. How can this be true?
The variables have a non-linear relationship.
In regression analysis it is assumed that for any given X the Y values are normally distributed (the "Normality" assumption). What else is assumed about these distributions? Choose all that apply.
Their means lie on the regression line. They have equal standard deviations. They are independent.
For two variables the correlation coefficient is found to be nearly equal to zero. How would you describe the relationship between the two variables?
There is very little, if any, relationship between the variables.
A study found a correlation of r = 0.7 between watching Netflix and grade point average. What can you reasonably conclude?
There may be a third variable involved in the relationship.
Compare the test for the slope of the regression line and the test for the correlation coefficient.
They are mathematically the same and give the same result.
How can we use correlation analysis to explore non-linear relationships?
Transform one or both variables.
Which of the following is the equation for the slope of the regression line?
b = r sYsX
A scatter diagram in which the points move from the bottom left to the upper right would be characterized by what type of correlation coefficient?
positive
What symbol is used for the Pearson correlation coefficient, which shows the strength of the relation between two variables?
r
Which of the following symbolized the standard error of estimate?
sY⋅X
The equation for the standard error of estimate is: sy⋅x=√Σ(y−y∧)2n−2Σ(y-y∧)2n-2Match the variables to their descriptions.
sy.x = standard error of Y for given X Y^ = estiamted Y for a given X n-2= df y= an bserved value of Y
What is the equation for the standard error of estimate?
sy⋅x=√Σ(y−y∧)2n−2
Which of the following is the test statistic for the correlation coefficient?
t = rn−2√1−r2√
If X is the size (in square feet) of a home and Y is its sales price and the regression equation relating them is Y∧Y∧= $92,000 + 86X, what is the predicted sales price of a home when X=0? Assume that all homes used to build the model were between 1,800 and 2,500 square feet.
$92,000
A study found a correlation of r = 0.68 between the size of someone's vocabulary and their income. What can you reasonably conclude?
A third variable may be related to both vocabulary and income.
In order to properly apply regression analysis, what assumption must be made about the distribution of Y values?
For each X value, the Y values are normally distributed.