BA Module 4


What is the correlation coefficient of the relationship between the average weekly hours spent studying and the score on the final exam?

0.5049 0.5049 is the Multiple R value. Remember that for single variable linear regression, Multiple R, which is the square root of R2, is equal to the absolute value of the correlation coefficient. The regression coefficient for Average Weekly Hours Studying (0.03, as shown in the bottom table of the output) is positive, so the slope of the regression line is positive. Therefore, the correlation coefficient must also be positive.
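The sign rule described above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `correlation_from_r_squared` is hypothetical, and the inputs are the Multiple R (0.5049) and slope (0.03) quoted in the card.

```python
import math


def correlation_from_r_squared(r_squared: float, slope: float) -> float:
    """Return the correlation coefficient for a single variable regression.

    The magnitude is the square root of R2 (Multiple R); the sign is taken
    from the slope of the regression line.
    """
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)


multiple_r = 0.5049            # from the regression output quoted in the card
r_squared = multiple_r ** 2    # R2 is Multiple R squared
r = correlation_from_r_squared(r_squared, slope=0.03)
print(round(r, 4))
```

With a negative slope, the same helper would return the negative square root instead.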

Earlier in this module, we found that the correlation coefficient between house size and selling price is 0.86. What is the R2 of the best fit line that describes the relationship between selling price and house size?

0.74 Remember that for a single variable linear regression, R2 is the square of the correlation coefficient. Here, the correlation coefficient is 0.86, so R2 = 0.86^2 = 0.7396, or about 0.74.
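As a quick arithmetic check, the squaring step can be reproduced in a couple of lines of Python. The variable names are illustrative only.

```python
r = 0.86                 # correlation between house size and selling price
r_squared = r ** 2       # R2 for a single variable linear regression
print(round(r_squared, 4))  # 0.7396
print(round(r_squared, 2))  # 0.74
```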

Let's return to our Disney example. What do you estimate to be the R2 of the regression line that describes the relationship between home video units and 2011 gross box office sales?

0.80 The independent variable explains a lot of the variation in the dependent variable, but not quite all of it. In total, the data points are close to the best fit line, but they do not lie on it. Thus, an R2 of 0.80 seems like a good estimate. Wrong answer: -0.80. Remember that R2 is a value between 0 and 1, so a negative value is not possible. For a single variable linear regression, the correlation coefficient equals the positive or negative square root of R2, and can range from -1 to 1.

Which of the following 95% confidence intervals for a regression line's slope indicates that the linear relationship is not significant at the 5% level? Select all that apply.

-20.00; 5.00 The range between -20.00 and 5.00 contains zero, which indicates that the linear relationship is not significant at the 5% level. Note that another option is also correct. -0.36; 0.55 The range between -0.36 and 0.55 contains zero, which indicates that the linear relationship is not significant at the 5% level. Note that another option is also correct.
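The "contains zero" rule can be expressed as a small Python check. This is a sketch, not part of the course material: the function name `slope_is_significant` is hypothetical, and the last two intervals are taken from the earthquake and home video cards elsewhere in this set.

```python
def slope_is_significant(lower: float, upper: float) -> bool:
    """Return True when the confidence interval for the slope excludes zero."""
    return not (lower <= 0.0 <= upper)


intervals = [
    (-20.00, 5.00),   # contains zero: not significant
    (-0.36, 0.55),    # contains zero: not significant
    (-0.11, -0.04),   # entirely negative: significant
    (19.58, 22.95),   # entirely positive: significant
]

for lower, upper in intervals:
    print((lower, upper), slope_is_significant(lower, upper))
```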

Given the regression equation, SellingPrice=13,490.45+255.36(HouseSize), which of the following values represents the value of HouseSize at which the regression line intersects the horizontal axis?

-52.83 square feet The regression line intersects the horizontal axis when Selling Price = $0, that is, when House Size = -52.83 square feet. 13,490.45+ 255.36*(-52.83)=$0.00 (actually, -52.82914, which rounds to -52.83).
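Setting SellingPrice to zero and solving for HouseSize gives HouseSize = -intercept/slope. A minimal Python check of that arithmetic, using the rounded coefficients from the equation (variable names are illustrative):

```python
intercept = 13_490.45   # dollars
slope = 255.36          # dollars per square foot

# Solve 0 = intercept + slope * house_size for house_size.
house_size_at_zero_price = -intercept / slope
print(round(house_size_at_zero_price, 2))  # -52.83
```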

Given the regression equation, SellingPrice=13,490.45+255.36(HouseSize), if you increase the square footage of a house by 100 square feet, what would happen to the selling price?

Average selling price would increase by approximately $25,500 The slope of our regression line, about 255 dollars/square foot, describes the expected change in price when house size increases by one square foot. If house size increases by 100 square feet, the expected price increases by 100 times the slope. Therefore, the average increase in price as square footage increases by 100 square feet is 100($255)=$25,500.
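The expected change in price is the slope times the change in house size; the intercept plays no role. A short Python sketch of that calculation, using the unrounded slope from the equation (the function name `expected_change` is hypothetical):

```python
def expected_change(slope: float, delta_x: float) -> float:
    """Expected change in the dependent variable for a change of delta_x."""
    return slope * delta_x


slope = 255.36  # dollars per square foot
change = expected_change(slope, 100)
print(round(change, 2))  # 25536.0, approximately $25,500
```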

A scientist believes that, over the years, the number of major earthquakes has been decreasing. To test his hypothesis, the scientist collects data on the number of earthquakes above magnitude 7.0 on the Richter scale that have occurred each year from 1900 to 2012. Using the data below create a scatter plot with year on the horizontal axis.

From the Insert menu, select Scatter, then select Scatter With Only Markers. The Input Y Range is B1:B114 and the Input X Range is A1:A114. You must check the Labels in first row box to ensure that the scatter plot's axes are appropriately labeled.

graph: close together

High R2 (0.99): A large portion of the variation in y is explained by the regression line. Low p-value (0.0000): There is a significant linear relationship between the dependent and independent variables.

Next, calculate the same quantity—the expected selling price of a home in a school district that has average SAT scores below 1700 (SAT=0)—but do it using only the data and standard Excel functions, without the regression model.

The average selling price of homes, given that they are located in school districts where students have low SAT scores, can be calculated as AVERAGEIF(B2:B31,0,C2:C31)=$389,376.
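For readers who want to see what AVERAGEIF does, here is a rough Python equivalent. It is a sketch only: the function `average_if` and the five-row data set are hypothetical stand-ins, not the course data in B2:C31.

```python
def average_if(criteria_values, criterion, average_values):
    """Average the entries of average_values whose paired criteria value matches."""
    selected = [v for c, v in zip(criteria_values, average_values) if c == criterion]
    if not selected:
        raise ValueError("no rows match the criterion")
    return sum(selected) / len(selected)


# Hypothetical data: SAT dummy (0=low, 1=high) and selling price in dollars.
sat = [0, 1, 0, 1, 0]
price = [300_000, 800_000, 400_000, 900_000, 500_000]

print(average_if(sat, 0, price))  # 400000.0
print(average_if(sat, 1, price))  # 850000.0
```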

Based on the scientist's regression model, forecast the number of earthquakes above magnitude 7.0 that will occur in 2019.

The expected number of earthquakes above magnitude 7.0 that will occur in 2019 is B15+B16*2019=14.4. You must link directly to the cell values in order to obtain the correct answer.

Use the 2012 model to develop a baseline forecast for Frozen's home video units. Assume that Disney Studios estimated that the gross box office sales for Frozen would be approximately $360 million.

The expected number of home video units that will be sold is B15+B16*360=7,074 thousand. You must link directly to the values in order to obtain the correct answer. B15 is the intercept coefficient and B16 is the gross box office coefficient.

Based on the regression model, forecast the expected production volume when there are 112 factory workers.

The expected production volume when there are 112 factory workers is B15+B16*112=118,846. You must link directly to values in order to obtain the correct answer.

Given the regression equation, SellingPrice=13,490.45+255.36(HouseSize), what do you expect the selling price of a 2,000 square foot home to be?

The expected selling price of a 2,000 square foot home is B2+B3*2000=$524,217.93.

Given the regression equation, SellingPrice=13,490.45+255.36(HouseSize), what do you expect the selling price of a 6,000 square foot home to be?

The expected selling price of a 6,000 square foot home is B2+B3*6000=$1,545,672.87. You must link directly to the values in order to obtain the correct answer.
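The two forecasts above can be approximated in Python as a sanity check. This is a minimal illustration, not the course spreadsheet: the function name `predict_selling_price` is hypothetical, and it uses the rounded coefficients printed in the regression equation, so the results differ slightly from the card answers, which link to spreadsheet cells that presumably hold unrounded values.

```python
INTERCEPT = 13_490.45   # rounded intercept from the regression equation
SLOPE = 255.36          # rounded slope, dollars per square foot


def predict_selling_price(house_size_sqft: float) -> float:
    """Point forecast from the single variable regression line."""
    return INTERCEPT + SLOPE * house_size_sqft


for size in (2_000, 6_000):
    print(size, round(predict_selling_price(size), 2))
```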

Use the regression model to calculate the expected selling price of a home in a school district that has average SAT scores above 1700 (SAT=1).

The expected selling price of homes in school districts where students have average SAT scores above 1700 is B15+B16*1=B15+B16=$809,100. You must link directly to the values in order to obtain the correct answer.

Use the regression model to calculate the expected selling price of a home in a school district that has average SAT scores below 1700 (SAT=0).

The expected selling price of homes in school districts where students have low SAT scores is B15+B16*0=B15=$389,376. You must link directly to the values in order to obtain the correct answer. B15 is the intercept coefficient and B16 is the coefficient for SAT (0=low, 1=high).

Given the general regression equation, ŷ=a+bx, which of the following describes ŷ? Select all that apply.

The expected value of y; the dependent variable; the value we are trying to predict

A human resources department wants to understand the relationship between the number of factory workers and production volume, which is measured in units produced per day. Perform a regression analysis, where the number of workers is the independent variable and production volume is the dependent variable. Be sure to include the residuals and residual plot in your analysis.

From the Data menu, select Data Analysis, then select Regression. The Input Y Range is A1:A21 and the Input X Range is B1:B21. You must check the Labels box to ensure that the regression output table is appropriately labeled. You must also check the Residuals and Residual Plots boxes so that you are able to analyze the residuals.

Correlation Coefficient(R)

±√(R2), that is, the positive or negative square root of R2

The regression output table

divided into three main parts: the Regression Statistics table, the ANOVA table, and the Regression Coefficients table. Although this course does not cover some of the ANOVA (Analysis of Variance) measures, we've included the definitions for completeness. The Residual Output table appears only if we select Residuals when inputting data in the regression dialog box.

R2 - R-squared

measures how closely a regression line fits a data set

Below is a partial regression output table. Which of the following values most likely belongs in the Lower 95% cell for the independent variable in the output table?

-2.45 Since the p-value, 0.3956, is greater than 0.05, the linear relationship is not significant at the 95% confidence level. Therefore, the 95% confidence interval of the slope must contain zero. The confidence interval is centered around the slope of 1.78, so the lower and upper bounds must be equally distant from the slope. The Upper 95% minus the slope is 6.01 (Upper 95%) - 1.78 (coefficient) = 4.23, so the Lower 95% is 1.78-4.23=-2.45.
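Because the interval is symmetric around the coefficient, the missing bound can be recovered as coefficient minus the half-width. A small Python sketch of that reasoning, with illustrative variable names:

```python
coefficient = 1.78   # slope estimate from the output table
upper_95 = 6.01      # Upper 95% bound from the output table

half_width = upper_95 - coefficient   # distance from the slope to the upper bound
lower_95 = coefficient - half_width   # same distance on the other side

print(round(half_width, 2))  # 4.23
print(round(lower_95, 2))    # -2.45
```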

The best point forecast for the selling price of a 2,500 square foot house is the expected selling price of a 2,500 square foot home, approximately 13,490 + 255.36(2,500) = $652,000. Given that the standard error of the regression is about $151,000, which of the following would give the BEST estimate for the prediction interval for a 2,500 square foot home with approximately 95% confidence?

$652,000 ± 2($151,000) A prediction interval is centered at a point forecast, in this case $652,000. The standard error of the regression is multiplied by 2 since we wish to estimate the prediction interval at the 95% confidence level. Note that we are using 2 to approximate the z-value for a 95% prediction interval. The actual z-value corresponding to 95% (for sufficiently large samples) is 1.96.

If the expected production volume when there are 120 workers is approximately 131,958 units, which of the following equations would provide a reasonable estimate of the 68% prediction interval for the output of those 120 workers?

131,958±14,994.93 A reasonable estimate of the prediction interval is the point forecast (131,958) plus or minus the z-value times the standard error of the regression (here, 14,994.93). As usual, the z-value is based on the desired level of confidence. Since we want a 68% prediction interval, the z-value is equal to one. Therefore 131,958±14,994.93 is the best option.
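The approximate prediction interval, point forecast plus or minus z times the standard error of the regression, can be sketched in Python. The function name `prediction_interval` is hypothetical, and this is the rough approximation used in these cards, not the exact interval (which also widens as the input moves away from the mean of the data).

```python
def prediction_interval(point_forecast: float, standard_error: float, z: float):
    """Approximate prediction interval: forecast +/- z * standard error."""
    margin = z * standard_error
    return point_forecast - margin, point_forecast + margin


# 95% interval for a 2,500 square foot home (z approximated as 2).
house_low, house_high = prediction_interval(652_000, 151_000, z=2)
print(house_low, house_high)  # 350000 954000

# 68% interval for the output of 120 workers (z = 1).
units_low, units_high = prediction_interval(131_958, 14_994.93, z=1)
print(round(units_low, 2), round(units_high, 2))  # 116963.07 146952.93
```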

Here is the correct regression line—the best fit line through the data. What is your estimate of the slope of this line, that is, the average change in selling price as house size increases by one square foot?

250 Pick two points on the x-axis—let's say 1,000 and 2,000—and see what the corresponding points are on the y-axis. According to the regression line, the expected selling price of a 1,000 square foot house is approximately $250,000, and for a 2,000 square foot house is approximately $500,000. Therefore, as house size increases by 1,000 square feet, price increases, on average, by approximately $250,000. To find the average change in price as house size increases by one square foot, we divide $250,000 by 1,000. We find that as house size increases by one square foot, price increases, on average, by approximately $250. (rise/run)
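The rise-over-run estimate can be written out in Python. The two points are the approximate readings from the regression line given in the card; the function name `slope_from_points` is illustrative.

```python
def slope_from_points(x1: float, y1: float, x2: float, y2: float) -> float:
    """Slope of the line through two points: rise divided by run."""
    return (y2 - y1) / (x2 - x1)


# Approximate readings from the line: (1,000 sq ft, $250,000) and (2,000 sq ft, $500,000).
slope = slope_from_points(1_000, 250_000, 2_000, 500_000)
print(slope)  # 250.0 dollars per square foot
```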

Given the regression equation, SellingPrice=13,490.45+255.36(HouseSize), which of the following values represents the average change in selling price as house size increases by one square foot?

255.36 255.36 dollars/square foot is the line's slope, which is equal to the average change in selling price as house size increases by one square foot.

How much variation in production volume can be explained by the number of factory workers?

57.56% The percent of variation in production volume that can be explained by the number of factory workers is represented by the R2 value. The R2 value is 57.56%.

How much of the variation in home video units can be explained by gross box office sales?

80.36% R2 is the amount of variation in home video units that is explained by this model. 80.36% of the variation in home video units can be explained by the relationship with gross box office sales

Let's forecast the selling price of a 1,500 square foot house using the regression equation, SellingPrice=13,490.45+255.36(HouseSize)

Use SUMPRODUCT, whose syntax is =SUMPRODUCT(array1, [array2], [array3], ...). Here, SUMPRODUCT(B2:B3,C2:C3)=B2*C2+B3*C3=B2*1+B3*1500, where B2 is the intercept, B3 is the house size coefficient, C2 is 1, and C3 is 1500.
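The SUMPRODUCT layout above pairs each coefficient with its input value and sums the products. A rough Python equivalent is sketched below; the function `sumproduct` is a hypothetical stand-in for the Excel function, and the coefficients are the rounded values from the regression equation rather than the spreadsheet cells.

```python
def sumproduct(array1, array2):
    """Sum of element-wise products, like Excel's SUMPRODUCT for two ranges."""
    if len(array1) != len(array2):
        raise ValueError("arrays must have the same length")
    return sum(a * b for a, b in zip(array1, array2))


coefficients = [13_490.45, 255.36]   # B2: intercept, B3: house size coefficient
inputs = [1, 1_500]                  # C2: 1 (for the intercept), C3: house size

forecast = sumproduct(coefficients, inputs)
print(round(forecast, 2))  # 396530.45
```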

The owner of a Boston sports bar believes that, on average, her restaurant is busier on days when the Red Sox play an away game (a game played at another team's stadium), but she wants to be sure before adding more staff. To test whether this is true, she takes a random sample of 50 days over the course of the baseball season and records the total daily revenue, along with whether the Red Sox were playing away that day (1 if yes, 0 if no). Using the data provided, perform a regression analysis to determine the effect of Red Sox away games on revenue. Be sure to include the residuals and residual plot in your analysis.

From the Data menu, select Data Analysis, then select Regression. The Input Y Range is A1:A51 and the Input X Range is B1:B51. You must check the Labels box to ensure that the regression output table is appropriately labeled. You must also check the Residuals and Residual Plots boxes so that you are able to analyze the residuals.

graph: far apart

Lower R2 (0.70): A smaller portion of the variation in y is explained by the regression line than in the previous graph. Low p-value (0.0000): There is a significant linear relationship between the dependent and independent variables, even though the R2 is lower than in the previous graph.

The owner of an ice cream shop wants to determine whether there is a relationship between ice cream sales and temperature. The owner collects data on temperature and sales for a random sample of 30 days and runs a regression to determine if there is a relationship between temperature (in degrees) and ice cream sales. The p-value for the two-sided hypothesis test is 0.04. How would you interpret the p-value?

If there is no relationship between temperature and sales, the chance of selecting a sample this extreme would be 4%. Correct. The null hypothesis is that there is no relationship. The p-value indicates how likely we would be to select a sample this extreme if the null hypothesis is true.

Consider the p-value corresponding to the independent variable's coefficient in the regression shown. Do you think the p-value is less than 0.05 or greater than 0.05?

Less than 0.05 A p-value less than 0.05 indicates that we can be 95% confident that the true slope is not zero, that is, that there is a significant linear relationship between the two variables. This graph provides strong evidence that there is a significant linear relationship between the two variables.

The scatter plot below displays the relationship between two variables. Which of the following options most accurately describes the R2 value and the p-value of this relationship?

Low R2; high p-value (i.e., p-value greater than 0.05) A low R2 and high p-value indicates that the independent variable explains little variation in the dependent variable and the linear relationship is not significant. Since the data points are widely dispersed and do not indicate a clear linear pattern, this relationship likely has a low R2 and high p-value.

Do you feel comfortable with the prediction you just made for a 6,000 square foot house?

No 6,000 square feet lies quite far outside the range of our historical housing data. Remember that there is greater uncertainty as we forecast outside of the historical range of the data, so we probably should not feel very comfortable with this prediction.

Based on the residual plot, do you think that this regression model is a good fit?

No The linear model does not appear to be a good fit because the residuals are not randomly distributed. The residuals form a funnel shape, which indicates that they are heteroskedastic. That is, the size of the residuals grows (in absolute value) as the average weekly hours studying decreases.

What is the expected change in production volume, on average, as the number of factory workers decreases by five?

Since the slope represents the average change in production volume as the number of factory workers increases by one, the average change in production volume as the number of factory workers decreases by five is 1,638.98(-5)= -8,194.9, where 1,638.98 is the coefficient for number of workers.

Adding the Best Fit Line to a Scatter Plot

Step 1 Create a scatter plot with "House Size (Sqft)" on the horizontal axis and "Selling Price ($)" on the vertical axis. Include the labels when inputting your ranges so that the scatter plot is appropriately labeled. Step 2 Select Chart Tools from the Insert menu. Then select Layout, then select Trendline. Check the Display Equation box to display the equation of the best fit line.

Regression Analysis with Dummy Variables

Step 1 From the Data menu, select Data Analysis, then select Regression. Step 2 Enter the appropriate Input Y Range and Input X Range: The Input Y Range is the range of the dependent variable, in this case selling price. The data with its label are in column C, C1:C31. The Input X Range is the range of the independent variable, in this case the dummy variable, "SAT (0=low, 1=high)". The data with its label are in column B, B1:B31. Since we included the cells containing the variables' labels when inputting the ranges, check the Labels box. Step 3 Scroll down and make sure to check the Residuals and Residual Plots boxes, as this ensures that the output table will include that information. As we saw with scatter plots, residual plots may be less helpful when we have a dummy variable, but we will choose to view the residual plot for the sake of completeness.

Perform a single variable linear regression analysis to analyze the relationship between gross box office sales and home video units. Make sure to include the residuals and residual plot in your analysis.

The Input Y Range is C1:C149 and the Input X Range is B1:B149. You must check the Labels box since we included C1 and B1 to ensure that the regression output table is appropriately labeled. You must also check Residuals and Residual Plots boxes so that you are able to analyze the residuals.

Based on the regression output, what proportion of the variability in revenue can be accounted for by whether the Red Sox are playing away? Enter the value of the percentage with exactly ONE digit to the right of the decimal place. See the drop bar if you need more detail on how to round your answer.

The R Square value measures how much of the total variation in the dependent variable (in this case, revenue) is explained by the independent variable (in this case, away game). As shown in the regression output, the R Square value is 0.2252, or approximately 22.5%. You must have followed the rounding instructions in the question and entered exactly 22.5 to be graded as correct.

Residual Sum of Squares

The Residual Sum of Squares is the amount of variation that is left unexplained by the regression line, that is, the sum of the squared differences between the predicted and observed values. That is exactly what this graph shows.

Total Sum of Squares

The Total Sum of Squares is the total variation in y. It equals the sum of the squared differences between the observed values of y and the mean of y. That is exactly what the graph shows. (graph = squares overlapping and in a straight line)
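To make the two sums of squares concrete, the sketch below fits a least-squares line to a tiny data set and computes both quantities, along with R2 = 1 - RSS/TSS. The data are hypothetical, chosen only so the arithmetic is easy to follow, and the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data, not taken from the course.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
mean_y = sum(ys) / len(ys)

# Residual Sum of Squares: variation left unexplained by the line.
rss = sum((y - p) ** 2 for y, p in zip(ys, predicted))
# Total Sum of Squares: total variation of y around its mean.
tss = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1 - rss / tss
print(round(rss, 4), round(tss, 4), round(r_squared, 4))  # 2.4 6.0 0.6
```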

The sports bar owner runs a regression to test whether there is a relationship between Red Sox away games and daily revenue. Which of the following statements about the regression output is true? SELECT ALL THAT APPLY.

The average daily revenue for days when the Red Sox do not play away is $1,768.32. This option is true. $1,768.32, the intercept coefficient, is the average daily revenue on days when the Red Sox do not play away. The average daily revenue for days when the Red Sox play away is $2,264.57. This option is true. The average daily revenue on days when the Red Sox play away is $1,768.32+496.25=$2,264.57, where 496.25 is the coefficient for the away-game dummy variable.
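With a dummy variable, the regression line only ever gets evaluated at 0 and 1, so the two expected values are the intercept and the intercept plus the dummy coefficient. A minimal Python sketch using the coefficients quoted in the card (the function name `expected_revenue` is hypothetical):

```python
INTERCEPT = 1_768.32    # average revenue when the Red Sox do not play away
AWAY_COEFFICIENT = 496.25


def expected_revenue(away_game: int) -> float:
    """Expected daily revenue; away_game is 1 for an away game, 0 otherwise."""
    return INTERCEPT + AWAY_COEFFICIENT * away_game


print(round(expected_revenue(0), 2))  # 1768.32
print(round(expected_revenue(1), 2))  # 2264.57
```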

Next, using only the data, calculate the average selling price for homes that are in school districts where students perform well on the SAT (SAT=1).

The average selling price of homes, given that they are located in school districts where students have high SAT scores, can be calculated as AVERAGEIF(B2:B31,1,C2:C31)=$809,100.

Based on the regression model, the expected daily production volume with 112 factory workers is 118,846 units. The human resource department noted that 123,415 units were produced on the most recent day on which there were 112 factory workers. What is the residual of this data point?

The residual is equal to the historically observed value minus the regression's predicted value (ε=y-ŷ). 112 factory workers historically produced 123,415 units, whereas the regression model predicts that 112 workers would produce 118,846 units. The residual is the difference between these two values: 123,415 units - 118,846 units = 4,569 units.
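The residual calculation is a single subtraction, shown here in Python with the values from the card. The function name `residual` is illustrative.

```python
def residual(observed: float, predicted: float) -> float:
    """Residual = observed value minus the regression's predicted value."""
    return observed - predicted


print(residual(observed=123_415, predicted=118_846))  # 4569
```

A positive residual means the observed point lies above the regression line.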

How would the width of the actual prediction interval (at a 95% confidence level) for a 3,000 square foot home differ from the width of the actual prediction interval (at a 95% confidence level) for a 2,000 square foot home, given that the average home size is approximately 1,750 square feet?

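The width of the actual prediction interval for a 3,000 square foot home would be larger than the width of the prediction interval for a 2,000 square foot home, because 3,000 square feet is farther from the average home size of about 1,750 square feet and prediction intervals widen as we move away from the mean.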

good fit

When a linear model is a good fit, the residuals are randomly scattered above and below the horizontal axis. When a linear model is not a good fit, we see patterns, such as curves or heteroskedasticity, in the residuals.

Do you feel comfortable with the prediction you just made for a 2,000 square foot house?

Yes 2,000 lies well within the range of our historical housing data, so we can feel relatively comfortable with this prediction.

The scientist performs additional analyses and observes that the number of major earthquakes does appear to be decreasing but wonders whether the relationship is statistically significant. Based on the partial regression output below and a 5% significance level, is the year statistically significant in determining the number of earthquakes above magnitude 7.0?

Yes Since the p-value is not provided, the confidence interval for the coefficient should be used. Since the 95% confidence interval, (-0.11; -0.04), does not contain zero, the coefficient for year is statistically significant.

Based on the segment of the output table shown below for the regression analysis of the U.S. motion picture industry's 2011 home video units vs. 2011 gross box office revenues, is there evidence of a significant linear relationship between these two variables?

Yes Since the p-value of the independent variable, 0.0000, is less than 0.05, we can be 95% confident that there is a significant linear relationship between gross box office and home video units. We could also note that (19.58; 22.95), the 95% confidence interval for the slope, does not contain zero.

Is the relationship between Red Sox away games and average daily revenues significant at the 95% confidence level? Choose the correct answer with the corresponding correct reasoning.

Yes, because the p-value of the independent variable is less than 0.05 Since the p-value, 0.0005, is less than 0.05, we can be confident that the relationship is significant at the 5% significance level and, equivalently, at the 95% confidence level.

