BUSI 2305: CH. 13 SB, QA 11- Regression (Ch.13 and 14), Statistics Chapter 13 Practice, Chapter 13; correlation and linear regression, business stats chapter 13 smartbook
Which value of r indicates a stronger correlation than 0.40?
-0.80
What values can the correlation coefficient assume?
-1 ≤ r ≤ 1
What is the range of values for a coefficient of correlation?
-1.0 to +1.0 inclusive
If all the plots on a scatter diagram lie on a straight line, what is the standard error of estimate?
0
The coefficient of determination must be between
0 and 1 or 0 and 100%
When expressed as a percentage, what is the range of values for multiple R squared?
0% to +100%
What does the coefficient of determination equal if r= 0.89?
0.7921
Place the following steps in correlation analysis in the order that makes the most sense
1. make scatter diagram 2. calculate a correlation coefficient 3. draw a least squares fit line
If the coefficient of multiple determination is 0.81, what percent of variation is not explained?
19%
Given the least squares regression equation, Y= 1,202+ 1,133 x, when x=3, what does Y equal?
4,601
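The substitution in the card above can be checked with a short sketch. The function name `predict` is hypothetical and used only for illustration; the coefficients come from the card.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the estimated Y for a given x using Y-hat = a + b*x."""
    return a + b * x


# Coefficients from the card: Y = 1,202 + 1,133x, evaluated at x = 3.
y_hat = predict(a=1202, b=1133, x=3)
print(y_hat)  # prints 4601
```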
The correlation between the size of a house and its sale price was found to be r = 0.77. What percentage of the variation in sale price can be predicted from the size of a house using a regression line?
59% (R^2 = 0.77^2 ≈ 0.59)
The correlation between wait time on a help line and customer satisfaction was found to be r = -0.85. What percentage of the variation in customer satisfaction can be predicted by wait time using a regression line?
72% (R^2 = (-0.85)^2 ≈ 0.72)
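The "square the correlation" step used in these cards can be sketched as follows. The helper name `r_squared` is an assumption for illustration; the r values are the ones quoted in the cards.

```python
def r_squared(r: float) -> float:
    """Coefficient of determination for simple regression: the square of r."""
    return r * r


# r values taken from the cards; the sign of r does not affect R^2.
for r in (0.89, 0.77, -0.85):
    print(f"r = {r:+.2f} -> R^2 = {r_squared(r):.4f}")
```

Because squaring removes the sign, a negative correlation of -0.85 explains more variation than a positive correlation of 0.77.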
If the coefficient of determination is 0.94, what can we say about the relationship between two variables?
94% of the total variation of the dependent variable is explained by the independent variable.
Approximately ______ % of the observations lies within two standard errors of the regression line?
95%
Which of the following is the test statistic for the correlation coefficient?
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ Reason: With n - 2 degrees of freedom
CORRELATION ANALYSIS
A group of techniques to measure the relationship between two variables
What is Correlation Analysis?
A group of techniques to measure the relationship between two variables.
LEAST SQUARES PRINCIPLE
A mathematical procedure that uses the data to position a line with the objective of minimizing the sum of the squares of the vertical distances between the actual y values and the predicted values of y.
What is the chart called when the paired data (the dependent and independent variables) are plotted?
A scatter diagram
INDEPENDENT VARIABLE
A variable that provides the basis for estimation.
Which of the following are true assumptions underlying linear regression? 1. For each value of X, there is a group of Y values that is normally distributed. 2. The means of these normal distributions of Y values all lie on the regression line. 3. The standard deviations of these normal distributions are equal.
All of them are correct
What is the best definition of a regression equation?
An equation that expresses the linear relation between two variables.
What is the best definition of a regression equation?
An equation that expresses the linear relation between two variables. Reason: Y = a + bX
REGRESSION EQUATION
An equation that expresses the linear relationship between two variables.
In multiple regression analysis, how is the degree of association between a set of independent variable and a dependent variable measured?
Coefficient of multiple determination
In regression analysis, what is the predictor variable called?
Independent variable
In multiple regression analysis, testing the global null hypothesis that all regression coefficients are zero is based on ________.
F-statistic
In multiple regression analysis, a residual is the difference between the value of an independent variable and its corresponding dependent variable value.
False
T/F- An example of a dummy variable is "time to product's first repair" in years.
False
T/F- In regression analysis, error is defined as $(\hat{Y} - Y)$.
False
T/F- Multiple regression analysis is used when one independent variable is used to predict values of two or more dependent variables.
False
T/F- Step-wise regression analysis involves removing all of the independent variables at once whose p-values are not less than alpha.
False
T/F- The coefficient of determination is the square root of the coefficient of correlation.
False
T/F- The multiple coefficient of determination, R square, reports the proportion of the variation in Y that is not explained by the variation in the set of independent variables.
False
T/F- The strength of the correlation between two variables depends on the sign of the coefficient of correlation.
False
In order to properly apply regression analysis, what assumption must be made about the distribution of Y values?
For each X value, the Y values are normally distributed.
Suppose we believe that X and Y are positively correlated. Which of the following is a valid null hypothesis for this test of significance of the correlation coefficient?
H0: ρ ≤ 0. This makes the alternate H1: ρ > 0; the null hypothesis must contain the equality.
If we wanted to test whether there is a negative correlation between two variables, which one of the following would be the correct alternative hypothesis?
H1: ρ < 0
Which of the following is NOT an example of correlation analysis?
Hypothesis testing for equality of means
A test for the slope of the regression line uses the hypotheses H0: β = 0, H1: β ≠ 0. What are we seeking to discover with this test?
If the regression line has predictive power for the dependent variable.
Which of the following are characteristics of the correlation coefficient? Select all that apply.
Indicates the direction and strength of the linear relationship between two interval or ratio scale variables. A value near -1 indicates a negative linear relationship. The values range from -1 up to +1
Which of the following are characteristics of the correlation coefficient? Select all that apply.
Indicates the direction and strength of the linear relationship between two interval or ratio scale variables. A value near 1 indicates a positive linear relationship. The symbol of the sample correlation coefficient is lowercase, r. A value near zero indicates little linear relationship between the variables.
Which of these statements correctly describes the values that can be assumed by the correlation coefficient, r?
It can be any number from -1 to +1, inclusive.
Which of the following is true about the standard error of estimate?
It is a measure of the accuracy of the prediction
A line is drawn through the points on a scatter diagram. Which three of the following are not likely to be a least squares fit?
Nearly all of the data points are below the line. All of the data points are above the line. The line passes through the largest and smallest data points.
Based on the regression equation, we can...
Predict the value of the dependent variable given a value of the independent variable.
Which of the following equations applies to the Coefficient of Determination?
R2 = SSR/SStotal = 1 - SSE/SStotal
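A minimal sketch of this identity on a small made-up data set (the `x` and `y` values below are hypothetical, not from the course). It fits the least squares line, then computes R^2 both ways to show that they agree.

```python
# Hypothetical paired observations, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

# Sums of squares: error, regression, and total.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_total = sum((yi - y_bar) ** 2 for yi in y)

r2_from_ssr = ssr / ss_total
r2_from_sse = 1 - sse / ss_total
print(round(r2_from_ssr, 4), round(r2_from_sse, 4))
```

Both expressions give the same value because SS total = SSR + SSE for a least squares fit.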
If there are four independent variables in a multiple regression equation, there are also four___________.
Regression coefficients
How can you transform a non-linear relationship to better use correlation analysis?
Replace one or both variables with its log, square root, reciprocal, etc.
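As a rough illustration of why a transformation helps, the sketch below uses hypothetical data that grows roughly exponentially and compares the correlation before and after taking the log of Y. The helper `pearson_r` and the data values are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y is roughly e**x, so the relationship is curved.
x = [1, 2, 3, 4, 5, 6]
y = [2.7, 7.4, 20.1, 54.6, 148.4, 403.4]

r_raw = pearson_r(x, y)
r_log = pearson_r(x, [math.log(v) for v in y])
print(f"r with raw y:  {r_raw:.3f}")
print(f"r with log(y): {r_log:.3f}")
```

The log transform straightens the curve, so the linear correlation on the transformed scale is much closer to 1.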
To evaluate the assumption of linearity, a multiple regression analysis should include ______.
Scatter diagrams of the dependent variable plotted as a function of each independent variable.
If the correlation between two variables is close to one, the association between the variables is ______.
Strong
Which of the following statistical distributions is used for the test for the slope of the regression equation?
Student's t-distribution
STANDARD ERROR OF ESTIMATE
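$s_{y \cdot x} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$, where $s_{y \cdot x}$ is the standard error of Y for a given X (the standard error of estimate), $\hat{y}$ is the estimated Y for a given X, $n - 2$ is the degrees of freedom (sample size minus 2), and $y$ is an observed value of Y.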
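A minimal sketch of the standard error of estimate, assuming the observed values and the fitted values are already available. The function name and the sample numbers are hypothetical.

```python
import math


def standard_error_of_estimate(y, y_hat):
    """sqrt( sum((y - y_hat)^2) / (n - 2) ) for a simple regression fit."""
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))


# Hypothetical observed values and the values predicted by a fitted line.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
s_yx = standard_error_of_estimate(y, y_hat)
print(round(s_yx, 4))

# When every point lies exactly on the line, the standard error is zero.
print(standard_error_of_estimate([1, 2, 3], [1, 2, 3]))
```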
What is the term that is used for the proportion of the total variation in Y that is explained by the variation in X?
The Coefficient of Determination
In analyzing the strength of the relationship between two variables, what does the symbol r represent?
The Pearson correlation coefficient.
In the regression equation, what does the letter "a" represent?
The Y-intercept
What does a coefficient of correlation of 0.70 imply?
The coefficient of determination is 0.49
How do you calculate the coefficient of determination?
The coefficient of determination, R2, is the square of the correlation coefficient, r.
What is the definition of the standard error of estimate?
The dispersion (scatter) of observed values around the line of regression for a given X.
The general form of the regression equation is written like this: $\hat{Y} = a + bX$. Why is the dependent variable written as $\hat{Y}$ instead of just Y?
The hat is to emphasize that the equation estimates the Y-value for a given X.
In the regression equation, what does the letter "Y" represent?
The dependent variable
In the regression equation, what does the letter "x" represent?
The independent variable
A valid multiple regression analysis assumes or requires that______________________
The independent variables and the dependent variable have a linear relationship.
What are we estimating when we use the "confidence interval" in conjunction with a regression line?
The mean of the distribution of Y for a given X.
What is the definition of the coefficient of determination?
The proportion of the total variation in Y that is explained by the variation in X.
A test for the slope of the regression line uses the hypotheses H0: β = 0, H1: β ≠ 0. If we reject H0 what have we demonstrated about the regression line?
The regression line has some power to predict the value of the dependent variable.
In the regression equation, what does the letter "b" represent?
The slope of the line
When a line is drawn on a scatter diagram using the least squares principle, what is the quantity that is minimized?
The sum of the squared difference between the line and the data points.
Which of the following tests gives the same result as a test of the regression line slope?
The t-test for the correlation coefficient. they are mathematically the same
DEPENDENT VARIABLE
The variable that is being predicted or estimated.
If the correlation coefficient between two variables, X and Y, equals zero, what can be said of the variables X and Y?
The variables have no linear relationship
In multiple regression analysis, residual analysis is used to test the requirement that ________
The variation in the residuals is the same for all predicted values of Y
In regression analysis, it is assumed that for any given X the Y values are normally distributed (the "Normality" assumption). What else is assumed about these distributions? Choose all that apply.
Their means lie on the regression line. Reason: The "equal means" assumption. They have equal standard deviations. Reason: The "equal variance" assumption. They are independent. Reason: The "independence" assumption.
How can we use correlation analysis to explore non-linear relationships?
Transform one or both variables.
T/F- A correlation matrix can be used to assess multicollinearity between independent variables.
True
T/F- A correlation matrix shows individual correlation coefficients for all pairs of variables.
True
T/F- A scatter diagram is a graph that portrays the correlation between a dependent variable and an independent variable.
True
T/F- An economist is interested in predicting the unemployment rate based on gross domestic product. Since the economist is interested in predicting unemployment, the independent variable is gross domestic product.
True
T/F- Because the coefficient of determination is expressed as a percent, its value is between 0% and 100%.
True
T/F- Correlation analysis is a statistical technique used to measure the strength of the relationship between two variables.
True
T/F- In multiple regression analysis, an F-statistic is used to test the global hypothesis.
True
T/F- Interaction occurs when the relationship between an independent variable and a dependent variable is affected by another independent variable.
True
T/F- One assumption underlying linear regression is that the X values are normally distributed.
False. The assumption is that, for each value of X, the Y values are normally distributed.
T/F- Step-wise regression analysis is a method that assists in selecting the most significant variables for a multiple regression equation.
True
T/F- The coefficient of determination is the proportion of total variation in Y that is explained by X.
True
T/F- The least squares technique minimizes the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
True
T/F- The regression equation is used to estimate a value of the dependent variable Y based on a selected value of the independent variable X.
True
T/F- The values of a and b in the regression equation are called the regression coefficients.
True
When does multicollinearity occur in a multiple regression analysis?
When the independent variables are highly correlated
What is the general form of the regression equation?
Y= bX +a or Y= a + (bX)
The general form of the regression equation is written like this:
$\hat{Y} = a + bX$, where $\hat{Y}$ is the estimated Y value, X is the independent variable, a is the Y-intercept, and b is the slope of the line.
Which of the following regression equations looks like it matches the scatter diagram below? Chart w/ point: (12.5,1)
$\hat{Y} = 14.4 - 1.9X$
Which of the following regression equations looks like it matches the scatter diagram below?
$\hat{Y} = 3.0 + 1.6X$
In the general multiple regression equation, which of the following variables represents the y-intercept?
a
Which of these is the equation for the y-intercept of the regression line?
a = Ybar- bXbar
The equation for the y-intercept of the regression line is: a = Ybar- bXbar. Match the variables to their descriptions. a b Ybar Xbar
a = y-intercept b = the slope of the regression line Ybar = mean of the dependent variable Xbar = mean of the independent variable
y-intercept
$a = \bar{y} - b\bar{x}$, where a is the y-intercept, b is the slope of the regression line, $\bar{y}$ is the mean of the dependent variable, and $\bar{x}$ is the mean of the independent variable.
correlation coefficient
A measure of the strength of the linear relationship between two variables. If there is absolutely no relationship between the two sets of variables, Pearson's r is zero. A correlation coefficient r close to 0 (say, 0.08) shows that the linear relationship is quite weak. The values satisfy $-1 \le r \le 1$, and $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) s_x s_y}$.
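The formula for r can be sketched directly with Python's standard `statistics` module. The data values are hypothetical, and `correlation_coefficient` is an illustrative name, not a library function.

```python
import statistics


def correlation_coefficient(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
    s_y = statistics.stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation_coefficient(x, y)
print(round(r, 4))
```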
A study of the time spent taking a test and the final score on the test found a correlation coefficient of r = .13. how would you describe this relationship?
a weak positive correlation
An experiment of study times versus test scores found a correlation coefficient of r = 0.49. How would you describe this relationship?
a) A moderate positive correlation.
Which of the following is the equation for the slope of the regression line?
b = r sY/sX
Which of the following is the equation for the slope of the regression line?
b = r(sY/sX)
SLOPE OF THE REGRESSION LINE
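$b = r(s_y/s_x)$, where r is the correlation coefficient, $s_y$ is the standard deviation of y (the dependent variable), and $s_x$ is the standard deviation of x (the independent variable).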
The slope of the regression line is given by b = r(sY/sX). Match the variables to their description. b sy sx r
b = the slope sy = standard deviation of sample Y values sx = standard deviation of sample X values r = sample correlation coefficient
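Putting the slope and intercept formulas from these cards together, a regression line can be built from summary statistics alone. This is a sketch on hypothetical data; the variable names are illustrative.

```python
import statistics

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)

# Sample correlation coefficient.
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Slope: b = r * (s_y / s_x); intercept: a = y_bar - b * x_bar.
b = r * (s_y / s_x)
a = y_bar - b * x_bar
print(f"Y-hat = {a:.2f} + {b:.2f}X")
```

Because the intercept is computed from the means, the fitted line always passes through the point (x̄, ȳ).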
A study of finishing time versus standardized test scores found a correlation coefficient of r = 0.13. How would you describe this relationship?
b) A weak positive correlation.
Which two of the following could be regression equations?
b) Y = 509X + 4335 d) Y = 3.42 - 0.56X
A study of hours spent playing video games versus school grades found a correlation coefficient of r = -0.53. How would you describe this relationship?
c) A moderate negative correlation.
When testing a correlation, we use a null hypothesis about the population correlation. What question are we trying to answer with the test?
c) Could the sample correlation be r even though the population correlation is actually zero?
A study found a correlation of r = 0.68 between the size of someone's vocabulary and their income. What can you reasonably conclude?
c) Something else is related to both vocabulary and income.
Which of the following is the correct null hypothesis for the test of a sample correlation?
c) The population correlation is zero.
If X is the size (in square feet) of a home, Y is its sales price, and the regression equation relating them is Yhat = $92,000 + 86x, what is the predicted sales price of a home when x = 0? Assume that all homes used to build the model were between 1,800 and 2,500 square feet.
Cannot estimate. x = 0 is outside the range of x-values used to build the model; x = 0 means there is no home.
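One way to make this caution concrete is to refuse predictions outside the range of the data used to fit the model. The sketch below uses the equation and range from the card; the function name and the choice to raise an error are assumptions for illustration.

```python
# Range of home sizes (square feet) used to build the model, per the card.
MIN_SQFT = 1800
MAX_SQFT = 2500


def predict_price(sqft: float) -> float:
    """Predict sales price with Y-hat = 92,000 + 86x, only inside the fitted range."""
    if not MIN_SQFT <= sqft <= MAX_SQFT:
        raise ValueError(
            f"x = {sqft} is outside the fitted range [{MIN_SQFT}, {MAX_SQFT}]"
        )
    return 92000 + 86 * sqft


print(predict_price(2000))  # prints 264000

try:
    predict_price(0)
except ValueError as err:
    print("Cannot estimate:", err)
```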
Which one of the following demonstrates the correct identification of the independent and dependent variables?
independent var: the size of a house dependent var: the sales price of a house
If two variables are correlated to each other, which of the following are characteristics of the dependent variable? select all that apply
It is usually shown on the vertical axis of a scatter diagram. In a cause-and-effect relationship, it is the effect.
which of the following is usually the first step in a correlation analysis?
making a scatter diagram
The formula for the correlation coefficient is $r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1) s_x s_y}$. Match the variables to their description. $n$, $s_x$, $Y$, $\bar{Y}$
$n$ = number of paired observations; $s_x$ = standard deviation of x; $Y$ = one value of y; $\bar{Y}$ = mean of the y-values
A scatter diagram in which the points move from the bottom left to the upper right would be characterized by what type of correlation coefficient?
positive
What symbol is used for the Pearson correlation coefficient, which shows the strength of the relation between two variables?
r
The equation to estimate Y on the basis of X is referred to as the
regression equation
Which of the following symbolized the standard error of estimate?
sY⋅X
Which one of these tools allows one to examine the relationship between two variables of interval- or ratio-level measurement?
scatter diagram
A study found a correlation of r = 0.68 between the weekly sales of ice cream and the number of car accidents. What can you reasonably conclude?
Something else is related to both ice cream sales and car accidents. For example, ice cream sales may go up during the summer months, and more traveling takes place during the summer months.
How is the standard error of the estimate calculated from ANOVA information?
sqrt(MSE) = sqrt(SSE/(n-2))
The equation for the standard error of estimate is: $s_{y \cdot x} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$. Match the variables to their descriptions. $s_{y \cdot x}$, $\hat{y}$, $n - 2$, $y$
$s_{y \cdot x}$ = standard error of Y for given X; $\hat{y}$ = estimated Y for a given X; $n - 2$ = df: sample size minus 2; $y$ = an observed value of Y
What is the equation for the standard error of estimate?
$s_{y \cdot x} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
Which of the following is the test statistic for the correlation coefficient?
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
The formula for the test of significance of the sample correlation is: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$. Match the variables to their description. $t$, $n - 2$, $r$, $n$
$t$ = t-distribution test statistic; $n - 2$ = degrees of freedom; $r$ = sample correlation; $n$ = sample size
The equation for the test for the slope of a regression line is: $t = \dfrac{b - 0}{s_b}$. Match the variables and their description for this equation. $t$, $n - 2$, $b$, $s_b$
$t$ = the test statistic; $n - 2$ = degrees of freedom for t; $b = r(s_Y/s_X)$; $s_b$ = standard error of the slope
What is the test statistic to test the significance of the slope in a regression equation?
t-statistic
TEST FOR SLOPE OF REGRESSION LINE
$t = \dfrac{b - 0}{s_b}$, where $s_b$ is the standard error of the slope and $b = r(s_y/s_x)$ is the slope.
t TEST FOR THE CORRELATION COEFFICIENT
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $n - 2$ degrees of freedom, where t is the t-distribution test statistic, r is the sample correlation, and n is the sample size.
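A small sketch of computing this test statistic. The sample values (r = 0.5, n = 27) are hypothetical, and the function name is illustrative. Comparing t against a critical value from a t table with n - 2 degrees of freedom is left to the reader.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical sample: r = 0.5 from n = 27 paired observations (df = 25).
t_stat = correlation_t_statistic(r=0.5, n=27)
print(round(t_stat, 4))
```

For a fixed r, a larger sample gives a larger t, so the same correlation becomes easier to distinguish from zero.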
If the standard error of estimate for a regression line is large, what would you expect for the coefficient of determination?
The coefficient of determination should be small (a large error means a small predictive ability).
Which of the following is the correct null hypothesis for the test of a sample correlation?
The population correlation is zero. H0: ρ = 0, H1: ρ ≠ 0
In evaluating a regression equation, what does it mean if the standard error of estimate is small?
The predicted y will have a small error; the data are close to the regression line.
Which of the following are statistics that regression analysis provides to evaluate the predictive ability of the regression equation?
the standard error of the estimate the coefficient of determination
Which of the following illustrate the connection between the tests of β and ρ? Select all that apply.
Their t-statistics are the same. Their p-values are the same. Their degrees of freedom are the same.
Compare the test for the slope of the regression line and the test for the correlation coefficient
they are mathematically the same and give the same result
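The equivalence can be checked numerically. The sketch below uses hypothetical data and one assumption not stated in these cards: the standard error of the slope is taken as s_b = s_yx / sqrt(Σ(x − x̄)²), the usual simple-regression formula.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares fit and the standard error of estimate.
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))

# Test for the slope: t = (b - 0) / s_b, assuming s_b = s_yx / sqrt(sxx).
s_b = s_yx / math.sqrt(sxx)
t_slope = (b - 0) / s_b

# Test for the correlation coefficient: t = r * sqrt(n - 2) / sqrt(1 - r^2).
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(round(t_slope, 4), round(t_corr, 4))
```

Both statistics use n - 2 degrees of freedom, so identical t values imply identical p-values.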
GENERAL FORM OF LINEAR REGRESSION EQUATION
$\hat{y} = a + bx$, where $\hat{y}$ is the estimated value of the y variable for a selected x value; a is the y-intercept, the estimated value of Y when x = 0 (where the regression line crosses the y-axis); b is the slope; and x is any value of the independent variable that is selected.
Which of the following is the formula for the correlation coefficient, r?
$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1) s_x s_y}$
A line is drawn through the points on a scatter diagram. Which three of the following are not likely to be a least squares fit?
Nearly all of the data points are below the line. Reason: Unless the points above are very far from the line, this couldn't be a least squares fit. All of the data points are above the line. Reason: This could not be a least squares fit. The line passes through the largest and smallest data points. Reason: This could happen with a least squares fit, but is very unlikely.