BA 4 - Single Variable Linear Regression
AS X INCREASES, Y DOES NOT CHANGE. = Zero Slope; AS X INCREASES, Y DECREASES. = Negative Slope; AS X INCREASES, Y INCREASES. = Positive Slope
4.2 Practice Question 1 The slope provides information about the linear relationship between y (the dependent variable) and x (the independent variable). Match each relationship with the appropriate slope using =. ITEMS: AS X INCREASES, Y DOES NOT CHANGE; AS X INCREASES, Y DECREASES; AS X INCREASES, Y INCREASES. CATEGORIES: Positive Slope; Zero Slope; Negative Slope
e)
4.2 Practice Question 2 Given the regression equation Selling Price = 13,490.45 + 255.36*(House Size), where House Size is measured in square feet, what happens to the average selling price (in dollars) for houses whose size is increased by 500 square feet? a) Average selling price would remain the same b) Average selling price would increase by approximately $255 c) Average selling price would increase by approximately $13,490 d) Average selling price would increase by approximately $12,750 e) Average selling price would increase by approximately $127,500
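The arithmetic behind this card can be sketched as follows. This is a minimal illustration, assuming the coefficients quoted in the question; the function name `average_price_change` is made up for the example. The intercept does not enter the calculation, because only the slope scales with a change in house size.

```python
# Slope taken from the regression equation quoted in the question.
SLOPE = 255.36  # dollars per square foot

def average_price_change(delta_sqft: float) -> float:
    """Change in expected selling price for a given change in house size."""
    return SLOPE * delta_sqft

print(f"{average_price_change(500):,.2f}")    # 127,680.00
print(f"{average_price_change(-1000):,.2f}")  # -255,360.00
```

A negative change in size gives a negative change in price of the same per-square-foot magnitude.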
c)
4.2 Practice Question 3 Given the regression equation Selling Price = $13,490.45 + $255.36*(House Size) What happens to the average selling price (in dollars) for houses whose size is decreased by 1000 square feet? a) Average selling price would decrease by approximately $268,490 b) Average selling price would increase by approximately $268,490 c) Average selling price would decrease by approximately $255,000 d) Average selling price would increase by approximately $255,000
b)
4.2 Practice Question 4 Given the regression equation Selling Price = 13,490.45 + 255.36*(House Size) Which of the following values represents the value of Selling Price at which the regression line intersects the vertical axis? a) Selling Price b) $13,490.45 c) $255.36 d) House Size
a)
4.2 Practice Question 5 Given the regression equation, Selling Price = 13,490.45 + 255.36*(House Size) Which of the following values represents the value of House Size at which the regression line intersects the horizontal axis? a) - 52.83 square feet b) 13,490.45 square feet c) 255.36 square feet d) The answer cannot be determined without further information.
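The horizontal-axis intercept is the value of x at which y = a + bx equals zero, that is x = -a/b. A small sketch, assuming the coefficients from the question (the helper name `x_intercept` is hypothetical):

```python
def x_intercept(intercept: float, slope: float) -> float:
    """Value of x where the line y = intercept + slope * x crosses y = 0."""
    if slope == 0:
        raise ValueError("a horizontal line has no single x-intercept")
    return -intercept / slope

print(round(x_intercept(13_490.45, 255.36), 2))  # -52.83
```

The negative house size has no practical meaning; it only describes where the fitted line crosses the axis.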
The expected value of y; The dependent variable; The value we are trying to predict
4.2 Practice Question 6 Given the general regression equation, y = a + bx, which of the following describes y? Select all that apply. The expected value of y; The expected value of x; The independent variable; The dependent variable; The value we are trying to predict; The intercept
a)
4.2.1. Question 1 Below is a scatter plot of selling price against house size. Based on this scatter plot, what is the approximate square footage of the smallest and largest homes sold during the summer of 2013? a) 600 and 4,700 b) 0 and 5,000 c) 150,000 and 1,300,000 d) 0 and 1,400,000
b)
4.2.1. Question 2 On the basis of the scatter plot, what tends to happen to selling price as house size increases? a) Price decreases b) Price increases c) Price remains the same
d)
4.2.1. Question 3 What is your estimate of the correlation coefficient of the relationship between selling price and house size? a) -0.70 b) 0.25 c) 0.50 d) 0.85
c)
4.2.2. Question 1 Here is the correct regression line—the best fit line through the data. What is your estimate of the slope of this line, that is, the average change in selling price as house size increases by one square foot? a) -250 b) 0 c) 250 d) 250,000
c)
4.2.2. Question 2 The following graphs show a sample of data points for two variables. Which graph displays the line that most accurately describes the linear relationship between the two variables? Again, please use your judgment here rather than using any formal metrics to minimize the "total distance" between the data points and the line. a) see image attached b) see image attached c) see image attached d) see image attached
a)
4.2.2. Question 3 The following graphs show a sample of data points for two variables. Which graph displays the line that most accurately describes the linear relationship between the two variables? Again, please use your judgment here rather than using any formal metrics to identify the line that minimizes the "total distance" between the data points and the line. a) see image attached b) see image attached c) see image attached d) see image attached
d)
4.2.2. Question 4 The following graphs show a sample of data points for two variables. Which graph displays the line that most accurately describes the linear relationship between the two variables? a) see image attached b) see image attached c) see image attached d) see image attached
c)
4.2.3 Question on the first mark Given the regression equation: Selling Price = 13,490.45 + 255.36*(House Size) Which of the following values represents the average change in selling price as house size increases by one square foot? a) Selling Price b) 13,490.45 c) 255.36 d) House Size
d)
4.2.3 Question on the second mark Given the regression equation: Selling Price = 13,490.45 + 255.36*(House Size) If you increase the square footage of a house by 100 square feet, what would happen to the selling price? a) Average selling price would remain the same b) Average selling price would increase by approximately $255 c) Average selling price would increase by approximately $2,550 d) Average selling price would increase by approximately $25,500
a)
4.3 Practice Question 2 Do you feel confident of the prediction you just made for a 900 square foot house, given the data available? See chart attached. a) Yes b) No
b)
4.3 Practice Question 4 Do you feel confident of the prediction you just made for a 425 square foot house, given the data available? a) Yes b) No
a)
4.3 Practice Question 6 Do you feel confident of the prediction you just made for a 3500 square foot house, given the data available? a) Yes b) No
c)
4.3.1 Question on the second mark Given the regression equation Selling Price = 13,490.45 + 255.36*(House Size) How much do you expect a 3,000 square foot house to cost? a) Approximately $600,000 b) Approximately $700,000 c) Approximately $775,000 d) Approximately $900,000
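A point forecast substitutes the house size into the regression equation. A minimal sketch, assuming the coefficients quoted in the question; `predict_price` is an illustrative name, not something from the course:

```python
INTERCEPT = 13_490.45  # dollars
SLOPE = 255.36         # dollars per square foot

def predict_price(house_size_sqft: float) -> float:
    """Expected selling price from the single-variable regression line."""
    return INTERCEPT + SLOPE * house_size_sqft

print(f"{predict_price(3000):,.2f}")  # 779,570.45
```

The result is closest to the "approximately $775,000" option.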
a)
4.3.2 Question on the fourth mark How would the width of the actual prediction interval (at a 95% confidence level) for a 3,000 square foot home differ from the width of the actual prediction interval (at a 95% confidence level) for a 2,000 square foot home, given that the average home size is approximately 1,750 square feet? a) The width of the actual prediction interval for a 3,000 square foot home would be larger than the width of the prediction interval for a 2,000 square foot home. b) The width of the actual prediction interval for a 3,000 square foot home would be smaller than the width of the prediction interval for a 2,000 square foot home. c) The width of the actual prediction interval for a 3,000 square foot home would be the same as the width of the prediction interval for a 2,000 square foot home.
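For reference, the usual textbook form of the prediction interval in single-variable regression is shown below. It is given here as an assumption about what the card means by the "actual" interval; the symbols n (sample size), s (standard error of the regression), and t (the critical value) do not appear in the original card.

```latex
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s\,
\sqrt{\,1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,}
```

The term $(x_0 - \bar{x})^2$ grows as $x_0$ moves away from the mean, so a 3,000 square foot home (1,250 from a mean of about 1,750) gets a wider interval than a 2,000 square foot home (250 from the mean).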
d)
4.3.2 Question on the second mark The best point forecast for the selling price of a 2,500 square foot house is the expected selling price of a 2,500 square foot home, approximately 13,490 + 255.36(2,500) = $652,000. Given that the standard error of the regression is about $151,000, which of the following would give the BEST estimate for the prediction interval for a 2,500 square foot home with approximately 95% confidence? a) $2,500 ± $151,000 b) $2,500 ± 2($151,000) c) $652,000 ± $151,000 d) $652,000 ± 2($151,000)
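The rule of thumb used in this card (point forecast plus or minus two standard errors of the regression) can be sketched as below. This is the approximation the question describes, not the exact interval; the function name and the `z` parameter are illustrative.

```python
def approx_prediction_interval(point_forecast, std_error, z=2.0):
    """Rough ~95% prediction interval: forecast +/- z standard errors."""
    margin = z * std_error
    return point_forecast - margin, point_forecast + margin

low, high = approx_prediction_interval(652_000, 151_000)
print(f"${low:,.0f} to ${high:,.0f}")  # $350,000 to $954,000
```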
0.0025 0.0100
4.4 Practice Question 1 Which of the following p-values would indicate that we can be 95% confident that there is a significant linear relationship between two variables? Select all that apply. 0.0025 0.0100 0.9500 0.9750
- We can be 90% confident that there is a significant linear relationship between the two variables. - We can be 95% confident that there is a significant linear relationship between the two variables.
4.4 Practice Question 2 A p-value to test the significance of a linear relationship between two variables was calculated to be 0.0210. What can we conclude? Select all that apply. - We can be 90% confident that there is a significant linear relationship between the two variables. - We can be 95% confident that there is a significant linear relationship between the two variables. - We can be 98% confident that there is a significant linear relationship between the two variables. - We can be 99% confident that there is a significant linear relationship between the two variables.
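The comparison behind these p-value cards can be sketched as follows: a relationship is significant at a given confidence level when the p-value is below 1 minus that level. The function name and the default list of levels are assumptions made for the illustration.

```python
def confidence_levels_met(p_value, levels=(0.90, 0.95, 0.98, 0.99)):
    """Return the confidence levels at which the relationship is significant."""
    # Compare against alpha = 1 - level, rounded to avoid floating-point noise.
    return [lvl for lvl in levels if p_value < round(1 - lvl, 10)]

print(confidence_levels_met(0.0210))  # [0.9, 0.95]
```

With a p-value of 0.0210 the 98% level just fails, because 0.0210 is not below 0.02.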
(-9.85; 5.26) (-5.26; 9.85)
4.4 Practice Question 3 Which of the following 95% confidence intervals for a regression line's slope indicates that the linear relationship is NOT significant at the 5% level? Select all that apply. (-9.85; -5.26) (-9.85; 5.26) (-5.26; 9.85) (5.26; 9.85)
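The check here is whether the interval for the slope contains zero: if it does, a zero slope (no linear relationship) cannot be ruled out at that level. A minimal sketch with an illustrative helper name:

```python
def slope_is_significant(lower: float, upper: float) -> bool:
    """True when the confidence interval for the slope excludes zero."""
    return not (lower <= 0.0 <= upper)

intervals = [(-9.85, -5.26), (-9.85, 5.26), (-5.26, 9.85), (5.26, 9.85)]
not_significant = [ci for ci in intervals if not slope_is_significant(*ci)]
print(not_significant)  # [(-9.85, 5.26), (-5.26, 9.85)]
```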
a)
4.4 Practice Question 4 The linear relationship between two variables can be statistically significant but not explain a large percentage of the variation between the two variables. This would correspond to which pair of R^2 and p-value? a) Low R-squared, Low p-value b) Low R-squared, High p-value c) High R-squared, Low p-value d) High R-squared, High p-value
e)
4.4 Practice Question 5 When analyzing a residual plot, which of the following indicates that a linear model is a good fit? a) Patterns or curves in the residuals b) Increasing size of the residuals as values increase along the x-axis c) Decreasing size of the residuals as values increase along the x-axis d) Random spread of residuals around the y-axis e) Random spread of residuals around the x-axis
a)
4.4.1 Question 1 Which sum of squares is illustrated in this image? a) Residual Sum of Squares b) Regression Sum of Squares c) Total Sum of Squares
b)
4.4.1 Question 1 (second set) Earlier in this module, we found that the correlation coefficient between house size and selling price is 0.86. What is the R^2 of the best fit line that describes the relationship between selling price and house size? a) 0.43 b) 0.74 c) 0.86 d) 0.93
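In single-variable linear regression, R^2 equals the square of the correlation coefficient between x and y. A one-line check using the value from the card:

```python
correlation = 0.86  # correlation coefficient quoted in the question

r_squared = correlation ** 2
print(round(r_squared, 2))  # 0.74
```

This shortcut holds only for a regression with one independent variable.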
c)
4.4.1 Question 2 Which sum of squares is illustrated in this image? a) Residual Sum of Squares b) Regression Sum of Squares c) Total Sum of Squares
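The three sums of squares named in these cards can be computed as sketched below. The five data points are made up purely for illustration (they are not the housing data), and the function name is hypothetical. The line is fitted by ordinary least squares.

```python
def sums_of_squares(xs, ys):
    """Return (total, regression, residual) sums of squares for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)                 # points vs. mean
    regression = sum((f - y_bar) ** 2 for f in fitted)        # line vs. mean
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # points vs. line
    return total, regression, residual

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
total, regression, residual = sums_of_squares(xs, ys)
r_squared = regression / total
print(round(total, 2), round(regression, 2), round(residual, 2), round(r_squared, 2))
# 6.0 3.6 2.4 0.6
```

Total equals regression plus residual, and R^2 is the regression share of the total.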
d)
4.4.1 Question 2 (second set) Let's return to our Disney example. What do you estimate to be the R^2 of the regression line that describes the relationship between home video units and 2011 gross box office sales? See attached image. a) -0.80 b) 0.20 c) 0.80 d) 0.99
a)
4.4.2 Question on the fourth mark Based on the segment of the output table shown below for the regression analysis of the U.S. motion picture industry's 2011 home video units vs. 2011 gross box office revenues, is there evidence of a significant linear relationship between these two variables? a) Yes b) No
(-20.00; 5.00) (-0.36; 0.55)
4.4.2 Question on the second mark Which of the following 95% confidence intervals for a regression line's slope indicates that the linear relationship is not significant at the 5% level? Select all that apply. (-11.89; -2.17) (25.11; 44.37) (-20.00; 5.00) (-0.36; 0.55)
b)
4.4.3 Question 1 Do you think the R^2 value of the regression shown is close to 0 or close to 1? a) Close to 0 b) Close to 1
a)
4.4.3 Question 2 Consider the p-value corresponding to the independent variable's coefficient in the regression shown. Do you think the p-value is less than 0.05 or greater than 0.05? a) Less than 0.05 b) Greater than 0.05
a)
4.4.3 Question 3 Do you think the R^2 of the regression shown on the top is smaller or larger than the R^2 of the previous regression (shown on the bottom)? a) Smaller b) Larger
a)
4.4.3 Question 4 Consider the p-value corresponding to the independent variable's coefficient in the regression shown. Do you think that p-value is less than 0.05 or greater than 0.05? a) Less than 0.05 b) Greater than 0.05
b)
4.5 Practice Question 1 The spreadsheet below contains a partial view of data about U.S. corn acreage planted (in millions of acres) and the amount of corn (in millions of bushels) in storage from the previous year at the beginning of the year for each year from 1976 to 2013. We wish to use the data to predict the number of acres of corn that will be planted, based on the beginning corn stock in storage. Which variable is the independent variable? a) Corn Acreage Planted (in million acres) b) Stock of Corn at Start of Year (in million bushels)
c)
4.5 Practice Question 2 The spreadsheet below contains a partial view of data about U.S. corn acreage planted (in millions of acres) and the amount of corn (in millions of bushels) in storage from the previous year at the beginning of the year for each year from 1976 to 2013. If you want to include the variables' labels in the regression output, which cells should you select as the input range for the independent variable? a) Input Y Range, B1:B39 b) Input Y Range, B2:B39 c) Input X Range, B1:B39 d) Input X Range, B2:B39
DIRECTION (NORTH, SOUTH, EAST, AND WEST) = Create a dummy variable; TEMPERATURE (IN DEGREES CELSIUS) = Do not create a dummy variable; VOLUME (IN CUBIC METERS) = Do not create a dummy variable; COUNTRY TELEPHONE CODE = Create a dummy variable
4.5 Practice Question 4 For each of the following variables, determine if a dummy variable should be created for a regression analysis. Match each variable with the appropriate category using =. ITEMS: DIRECTION (NORTH, SOUTH, EAST, AND WEST); TEMPERATURE (IN DEGREES CELSIUS); VOLUME (IN CUBIC METERS); COUNTRY TELEPHONE CODE. CATEGORIES: Create a dummy variable; Do not create a dummy variable
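As an illustration of what "creating a dummy variable" means, the sketch below encodes a categorical variable as 0/1 columns, leaving one category out as the baseline. The helper name, the `is_` column prefix, and the choice of baseline are assumptions for the example, not part of the course material.

```python
def make_dummies(values, baseline):
    """Encode categories as 0/1 columns, omitting the baseline category."""
    categories = sorted(set(values) - {baseline})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

directions = ["north", "south", "east", "west", "north"]
rows = make_dummies(directions, baseline="north")
print(rows[1])  # {'is_east': 0, 'is_south': 1, 'is_west': 0}
```

A row for the baseline category has zeros in every column. Numeric measurements such as temperature or volume enter the regression directly and need no such encoding, whereas a telephone code is a label rather than a quantity.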
- FUNCTION (FINANCE, MARKETING, OPERATIONS, HUMAN RESOURCES) - HIGHEST EDUCATION LEVEL ATTAINED (HIGH SCHOOL, BACHELORS, MASTERS, OTHER) - FLAVOR (VANILLA, CHOCOLATE, STRAWBERRY) - LETTER GRADE (A+, A, A-,...F)
4.5.2 Question on the first mark (Qualitative) Determine whether each of the following variables is quantitative or qualitative. Select Qualitative Variables ITEMS - FUNCTION (FINANCE, MARKETING, OPERATIONS, HUMAN RESOURCES) - HEIGHT (INCHES) - TEST SCORE (0-100) - HIGHEST EDUCATION LEVEL ATTAINED (HIGH SCHOOL, BACHELORS, MASTERS, OTHER) - PRICE (DOLLARS) - DISTANCE (MILES) - FLAVOR (VANILLA, CHOCOLATE, STRAWBERRY) - AGE (YEARS) - LETTER GRADE (A+, A, A-,...F)
- HEIGHT (INCHES) - TEST SCORE (0-100) - PRICE (DOLLARS) - DISTANCE (MILES) - AGE (YEARS)
4.5.2 Question on the first mark (Quantitative) Determine whether each of the following variables is quantitative or qualitative. Select Quantitative Variables ITEMS - FUNCTION (FINANCE, MARKETING, OPERATIONS, HUMAN RESOURCES) - HEIGHT (INCHES) - TEST SCORE (0-100) - HIGHEST EDUCATION LEVEL ATTAINED (HIGH SCHOOL, BACHELORS, MASTERS, OTHER) - PRICE (DOLLARS) - DISTANCE (MILES) - FLAVOR (VANILLA, CHOCOLATE, STRAWBERRY) - AGE (YEARS) - LETTER GRADE (A+, A, A-,...F)
b)
4.6.1 Question on the third mark How much of the variation in home video units can be explained by gross box office sales? a) 89.64% b) 80.36% c) 80.23% d) 28.32%