DS - CHP 24
24. 1. 41-T The data in the accompanying table describe promotional spending by a pharmaceutical company for a cholesterol-lowering drug. The data cover 39 consecutive weeks and isolate the area around a certain city. The variables in this collection are shares. Marketing research often describes the level of promotion in terms of voice: in place of the level of spending, voice is the share of advertising devoted to a specific product. The column Market Share is the sales of this product divided by the total sales for such drugs in the area. The column Detail Voice is the ratio of detailing for this drug to the amount of detailing for all cholesterol-lowering drugs in the city. Detailing counts the number of promotional visits made by representatives of a pharmaceutical company to doctors' offices. Similarly, Sample Voice is the share of samples in this market that are from this manufacturer. Complete parts (a) through (f).
(a) Do any of these variables have linear patterns over time? Use timeplots of each one (Market Share, Detail Voice, and Sample Voice) to see. Do any weeks stand out as unusual?
(b) Fit the multiple regression of Market Share on three explanatory variables: Detail Voice, Sample Voice, and Week (a simple time trend, numbering the weeks of the study from 1 to 39). Does the multiple regression, taken as a whole, explain statistically significant variation in the response? Use α = 0.05: state the null and alternative hypotheses, determine the test statistic and the p-value, and state the appropriate conclusion.
(c) Does collinearity affect the estimated effects of these explanatory variables in the estimated equation? In particular, do the partial effects create a different sense of importance from what is suggested by marginal effects?
(d) Which explanatory variable has the largest variance inflation factor (VIF)?
(e) What is your substantive interpretation of the fitted equation? Take into account collinearity and statistical significance.
(f) Should both of the explanatory variables that are not statistically significant be removed from the model at the same time? Explain why doing this would not be a good idea, in general. (Hint: Are they collinear?)
Answer: *In JMP, go to Graph Builder.*
(a) Timeplot of Market Share: *in Graph Builder, put Week on the X axis and Market Share on the Y axis, then click the Line element.* Timeplot of Detail Voice: *same steps, with Detail Voice on the Y axis.* Timeplot of Sample Voice: *same steps, with Sample Voice on the Y axis.* Market Share *does not show a linear pattern.* Detail Voice *does not show a linear pattern.* Sample Voice *shows a downward trend.* Yes, week *6* appears to be an outlier in at least one of the timeplots. *Find it by looking at the graphs for the week that stands apart from the rest.*
(b) *Open Fit Model in JMP, put the response in Y and the other three explanatory variables in Add.* *Market Share = 0.210 − 0.008 Detail Voice + 0.030 Sample Voice + 0.00013 Week* (Week coefficient rounded to five decimal places; all other values rounded to three.) *Read these from the Parameter Estimates table.* H0: *β1 = β2 = β3 = 0* Ha: *At least one βi is different from 0.* F = *4.96* (rounded to two decimal places) *This is the F Ratio in JMP.* p-value = *0.006* (rounded to three decimal places) *Found below the F Ratio, as Prob > F.* *Reject* the null hypothesis. There *is sufficient* evidence to conclude that the multiple regression, taken as a whole, explains statistically significant variation in the response.
(c) Yes. The marginal slope of detailing is positive, but the partial slope is negative, so the multiple regression gives a different sense of the importance of detailing than the marginal relationship does.
(d) *Sample Voice* has the largest VIF, with VIF = *4.24* (rounded to two decimal places). *In JMP, right-click the Parameter Estimates table, go to Columns, and then choose VIF.*
(e) *Sampling* is the only one of the three variables that contributes statistically significant variation in this fit.
(f) This is not a good idea because the two insignificant variables might be highly correlated with each other. If they are collinear, each can look insignificant only because the other is in the model, so removing both at once could discard a useful explanatory variable; it is safer to remove one at a time and refit.
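The overall F-statistic in part (b) is tied to R² by F = (R²/k) / ((1 − R²)/(n − k − 1)), where n is the number of cases and k the number of explanatory variables. The Python sketch below (function names are hypothetical, chosen for illustration) computes F from R² and inverts it to recover the R² implied by the reported F = 4.96 with n = 39 weeks and k = 3. Because F was rounded, the implied R² is only approximate.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F-statistic for a regression with k explanatory variables fit to n cases."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def r2_from_f(f: float, n: int, k: int) -> float:
    """Invert overall_f: the R-squared implied by a reported overall F-statistic."""
    return f * k / (f * k + (n - k - 1))


# Exercise 41-T: n = 39 weeks, k = 3 explanatory variables, reported F = 4.96
r2 = r2_from_f(4.96, n=39, k=3)
print(f"implied R^2 = {r2:.3f}")                          # about 0.298
print(f"F recomputed = {overall_f(r2, n=39, k=3):.2f}")   # 4.96
```

An R² near 0.30 is modest, yet with 35 residual degrees of freedom it is still large enough to reject H0 at α = 0.05.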
24. 1. 2 Match the property of a regression model with its description. Minimum value of VIF
Answer: 1 REASON= The minimum value of the VIF is 1. The VIF (variance inflation factor) quantifies the amount of unique variation in each explanatory variable and uses this to summarize the effect of collinearity. In a regression with two explanatory variables X1 and X2, the VIF for each variable is $\mathrm{VIF} = \frac{1}{1 - r^2}$, where $r = \operatorname{corr}(X_1, X_2)$. If X1 and X2 are uncorrelated, then r = 0 and VIF = 1; any correlation between them pushes the VIF above 1.
24. 1. 5 Match the property of a regression model with its description. Correlations among variables
Answer: Correlation matrix REASON= The correlation matrix compactly summarizes the pairwise linear associations among the variables.
24. 1. 11 Determine if the following statement is true or false. If you believe that the statement is false, briefly explain why you think it is false. The use of correlated explanatory variables in a multiple regression implies collinearity in the model.
Answer: This statement is true. REASON= Large correlations between explanatory variables in a multiple-regression model produce collinearity. Collinearity can produce imprecise estimates of the partial slopes of the explanatory variables.
24. 1. 6 Match the property of a regression model with its description. Scatterplots among variables
Answer: Scatterplot matrix REASON= A scatterplot matrix displays the scatterplot of every pair of variables, showing the associations among them.
24. 1. 17 Mark the statement true or false. If you believe that the statement is false, briefly explain why you think it is false. We can detect outliers by reviewing the summary of the associations in the scatterplot matrix.
Answer: True REASON= A scatterplot matrix helps identify the extent of the collinearity among the explanatory variables and identify important outliers.
24. 1. 8 Match the property of a regression model with its description. Test whether adding X1 improves the fit of the model
Answer: t-statistic for b1 REASON= The t-statistic for b1 can be used to test whether adding explanatory variable X1 improves the fit of the model.
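The t-statistic for b1 is the estimate divided by its standard error, t = b1 / se(b1); its square equals the partial F-statistic for adding X1 to a model that already contains the other explanatory variables. The sketch below uses made-up values and, as a simplifying assumption, a normal approximation to the two-sided p-value. JMP uses the t distribution with n − k − 1 degrees of freedom, so the two agree only when the residual degrees of freedom are large.

```python
from statistics import NormalDist


def t_statistic(estimate: float, std_error: float) -> float:
    """t-statistic for H0: beta_1 = 0, given the estimate b1 and its standard error."""
    return estimate / std_error


def approx_two_sided_p(t: float) -> float:
    """Normal approximation to the two-sided p-value (assumes large residual df)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical values: b1 = 0.5 with standard error 0.2
t = t_statistic(0.5, 0.2)
print(round(t, 2), round(approx_two_sided_p(t), 4))
```

A |t| of about 2 or more roughly corresponds to p < 0.05, meaning X1 improves the fit.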
24. 1. 34-T A manufacturer produces custom metal blanks that are used by its customers for computer-aided machining. The customer sends a design via computer, and the manufacturer comes up with an estimated cost per unit, which is then used to determine a price for the customer. The data for the analysis were sampled from the accounting records of 100 orders that were filled during the previous three months. Complete parts (a) through (d) below.
(a) Fit the multiple regression of Average Cost on Material Cost and Labor Hours, using Material Cost as x1 and Labor Hours as x2. Both explanatory variables are per unit produced. Find the p-values for both explanatory variables. Do both explanatory variables improve the fit of the model that uses the other?
(b) The estimated slope for labor hours per unit is much larger than the slope for material cost per unit. Does this difference mean that labor costs form a larger proportion of production costs than material costs?
(c) Find the variance inflation factors for both explanatory variables and interpret the value that you obtain.
(d) Suppose that you formulated this regression using total cost of each production run rather than average cost per unit. Would collinearity have been a problem in this model? Explain.
Answers: *In JMP, put the data into Fit Model and fill in the following using the Parameter Estimates table.*
(a) *y = 21.065 + 1.372x1 + 37.070x2* (rounded to three decimal places) The p-value for x1 is *0.501* (rounded to three decimal places). The p-value for x2 is *0.000* (rounded to three decimal places). No, because the p-value for x1 is large: Material Cost does not significantly improve a model that already contains Labor Hours, while Labor Hours clearly does.
(b) No. The slopes are in different units (dollars per labor hour versus dollars per dollar of material), so they only mean that, in general, an additional labor hour per unit increases the average cost much faster than an additional dollar of material cost per unit does. The share of costs also depends on how many hours and how much material each unit actually uses.
(c) *VIF = 1.093* (rounded to three decimal places) *In JMP, right-click the Parameter Estimates table, go to Columns, and select VIF.* The VIF is very close to 1, so there is very little collinearity between the two explanatory variables.
(d) No, because the two explanatory variables are not changing.
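With two explanatory variables, VIF = 1/(1 − r²), so the reported VIF can be turned back into the size of the correlation between Material Cost and Labor Hours, and √VIF gives the factor by which collinearity inflates each standard error. A small Python sketch with a hypothetical helper name; note that the sign of r cannot be recovered from the VIF, and the rounded VIF makes the results approximate.

```python
import math


def interpret_vif(vif: float) -> tuple[float, float]:
    """Return (|r| between the two explanatory variables, SE inflation factor sqrt(VIF))."""
    r_squared = 1.0 - 1.0 / vif  # inverts VIF = 1 / (1 - r^2)
    return math.sqrt(r_squared), math.sqrt(vif)


abs_r, se_factor = interpret_vif(1.093)  # Exercise 34-T, part (c)
print(f"|corr(Material Cost, Labor Hours)| = {abs_r:.2f}")          # about 0.29
print(f"standard errors inflated by {100 * (se_factor - 1):.1f}%")  # about 4.5%
```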
24. 1. 32-T The accompanying data describe sales over time at a franchise outlet of a major U.S. oil company. Each row summarizes sales for one day. This particular station sells gas, and it also has a convenience store and a car wash. The response Sales gives the dollar sales of the convenience store. The explanatory variable Volume gives the number of gallons of gasoline sold, and Washes gives the number of car washes sold at the station. Complete parts (a) through (d) below.
(a) Fit the multiple regression of Sales on Volume and Washes. Do both explanatory variables improve the fit of the model? Use α = 0.05.
(b) Which explanatory variable is more important to the success of sales at the convenience store: gasoline sales or car washes? Do the slopes of these variables in the multiple regression provide the full answer?
(c) Find the variance inflation factor and interpret it.
(d) One of the explanatory variables is just barely statistically significant. Assuming the same estimated value, would a complete lack of collinearity have made this explanatory variable noticeably more statistically significant?
Answers: (a) Both explanatory variables improve the fit of the model at α = 0.05, though Washes only just barely improves it.
(b) Gasoline sales, because the slope for Volume is larger than the slope for Washes and the daily volume of gallons sold is greater than the number of car washes. The slopes provide only part of the answer; the typical daily volume of gasoline and the typical number of car washes also matter.
(c) VIF(Volume) = VIF(Washes) = 1.01 (rounded to two decimal places) *In JMP, open the data in Fit Model, put Sales in Y, and add the other two explanatory variables. Then right-click the Parameter Estimates table at the bottom, go to Columns, and click VIF.* Collinearity has little effect on the standard errors.
(d) No. There is almost no collinearity (VIF = 1.01), so removing it would divide the standard error by only about √1.01 ≈ 1.005, raising the t-statistic by about 0.5%.
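For part (d), the standard error of a slope is proportional to √VIF, so with the estimate held fixed, removing collinearity entirely would multiply the t-statistic by √VIF. This sketch assumes the residual standard deviation stays the same; the function name and the t-statistic of 2.05 are hypothetical, and VIF = 1.01 comes from part (c).

```python
import math


def t_without_collinearity(t_observed: float, vif: float) -> float:
    """t-statistic the same estimate would have if its VIF were 1 (no collinearity).

    Removing collinearity divides the standard error by sqrt(VIF),
    so the t-statistic grows by that same factor.
    """
    return t_observed * math.sqrt(vif)


# Hypothetical "just barely significant" t-statistic, with VIF = 1.01 from part (c)
print(round(t_without_collinearity(2.05, 1.01), 3))
```

The change is in the third significant digit, which is why the answer to (d) is no.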
24. 1. 23 Collinearity is sometimes described as a problem with the data, not the model: rather than filling the scatterplot of x1 on x2, the data concentrate along a diagonal. For example, the accompanying plot shows monthly percentage changes in the whole stock market and in a certain stock index. The data span the period from 1999 through 2007. Complete parts (a) through (c).
(a) Data for two months (May and June of 2002) deviate from the pattern evident in other months. What makes these months unusual?
(b) If you were to use both returns on the market and those on the individual index as explanatory variables in the same regression, are these two months leveraged?
(c) Would you want to use these months in the regression or exclude them from the multiple regression?
Answers: (a) In one of the two months the whole market had greater returns than the specific index; the opposite happened in the other month.
(b) Yes. These months are unusual combinations of the two explanatory variables: they lie off the diagonal where the other months concentrate, so they are leveraged.
(c) These months should be used in the regression because they reduce the correlation between the explanatory variables, which makes the estimated partial slopes more precise.
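Leverage in a regression with an intercept and two explanatory variables grows with the Mahalanobis distance d of a case from the center of (x1, x2): h = 1/n + d²/(n − 1), with d computed from the sample covariance matrix. The Python sketch below (hypothetical function name and made-up data, not the market returns from the exercise) shows that a case lying off the diagonal gets the largest leverage, larger even than the extreme cases along the diagonal, which is what happens to May and June of 2002.

```python
def leverages(x1: list[float], x2: list[float]) -> list[float]:
    """Leverage of each case in a regression with an intercept and explanatory variables x1, x2.

    Uses h_i = 1/n + d_i^2 / (n - 1), where d_i is the Mahalanobis distance of
    (x1_i, x2_i) from the means, computed with the sample covariance matrix.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]  # centered x1
    b = [v - m2 for v in x2]  # centered x2
    s11 = sum(u * u for u in a) / (n - 1)
    s22 = sum(v * v for v in b) / (n - 1)
    s12 = sum(u * v for u, v in zip(a, b)) / (n - 1)
    det = s11 * s22 - s12 * s12  # close to 0 when x1 and x2 are nearly collinear
    out = []
    for u, v in zip(a, b):
        # Quadratic form with the inverse of the 2x2 sample covariance matrix
        d2 = (s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det
        out.append(1 / n + d2 / (n - 1))
    return out


# Hypothetical data: five cases near the diagonal and one off-diagonal case (the last)
x1 = [1, 2, 3, 4, 5, 4]
x2 = [1.2, 1.8, 3.1, 4.1, 4.8, 2.0]
for i, h in enumerate(leverages(x1, x2), start=1):
    print(f"case {i}: leverage {h:.2f}")
```

The leverages always sum to 3 here, the number of estimated coefficients (intercept plus two slopes).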