DS - CHP 26
26. 1. 17 Decide if the statement below is true or false. If you believe that it is false, briefly explain why you think it is false. The F-test in an ANOVA tests the null hypothesis that all of the groups have equal variance.
Answer: *The statement is false. The null hypothesis claims that all groups have equal means.*
26. 1. 15 Decide if the statement below is true or false. If you believe that it is false, briefly explain why you think it is false. A regression model that has at least two explanatory variables, all of which are dummy variables, produces an analysis of variance.
Answer: *The statement is true.*
26. 1. 5 Match the term from an ANOVA regression shown below to its symbol. Mean of data in omitted category
Answer: *b0*
26. 1. 8 Match the given term from an ANOVA regression to its symbol below. Null hypothesis of F-test
Answer: *μ1=μ2=⋯=μJ*
26. 1. 53-T Movie studios often release films into selected markets and use the reactions of audiences to plan further promotions. In these data, viewers rate the film on a scale that assigns a score from 0 (dislike) to 100 (great) to the movie. The viewers are located in one of three test markets: urban, rural, and suburban. The groups vary in size. Complete parts (a) through (g) below. Assume a 0.05 level of significance whenever necessary. (a) Plot the data. Choose the correct answer below. Do the data appear suited to ANOVA? (b) From your visual inspection, do differences among the average ratings appear large when compared to the within-group variation? (c) Fit a multiple regression of rating on two dummy variables that identify the urban and suburban viewers. Interpret the estimated intercept and slopes. (d) Are the standard errors of the slopes equal? Explain why or why not. Select the correct choice below and fill in the answer boxes to complete your choice. (e) Does a statistical test agree with your visual impression of the differences among the groups? Test the null hypothesis that the ratings are the same in the three markets. Determine the null and alternative hypotheses. Identify the F statistic. Identify the p-value. Does this p-value agree with your visual impression of the differences among the groups? (f) Do these data meet the conditions required for an ANOVA? Assume that the samples were suitably randomized. Select all conditions below that are clearly not satisfied. (g) What conclusions should the studio reach regarding the prospects for marketing this movie in the three types of markets?
Answers: (a): *In JMP, go to Fit Y by X with Rating as Y and Market as X. Pay attention to the actual groups and pick the correct graph.* Yes, the variances appear to be similar enough.
(b): No, the average ratings are all roughly within the same range, while the data within each group vary substantially.
(c): *Predicted rating = 47 + 14.97 D Urban + 21.33 D Suburban* (Round to two decimal places as needed.) REASON= in JMP, go to Fit Model with Rating as Y and add both dummies. The intercept, *47*, is the average *rural* rating. The slope for the dummy variable representing urban viewers, *14.97*, is the *difference* between the average ratings of *urban* and *rural* viewers. The slope for the dummy variable representing suburban viewers, *21.33*, is the difference between the average ratings of *suburban* and *rural* viewers. (Round to two decimal places as needed.)
(d): The standard error for the D Urban coefficient is *8.390*, while the standard error for the D Suburban coefficient is *12.190*. These are different due to the difference in sample sizes. REASON= these numbers are next to the estimates found by Fit Model.
(e): H0: μUrban = μSuburban = μRural. Ha: At least one mean is different. F = *2.023* (Round to three decimal places as needed.) p-value = *0.143* (Round to three decimal places as needed.) *Yes,* the *large* p-value indicates that the *within*-group variation outweighs the *between*-group variation.
(f): It is reasonable to assume that all conditions are satisfied.
(g): The data are not strong enough to reach a conclusion regarding the differences in ratings between markets. They should consider repeating the study with a larger sample size.
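As a quick check on part (c), the group averages can be recovered from the fitted coefficients: the intercept is the mean of the omitted (rural) group, and each slope is added to it. A minimal Python sketch, assuming only the estimates reported above (the helper name `group_means` is made up for illustration and is not part of the JMP workflow):

```python
def group_means(intercept, slopes, baseline):
    """Turn dummy-variable regression coefficients back into group means."""
    means = {baseline: intercept}          # omitted category: mean = b0
    for group, slope in slopes.items():
        means[group] = intercept + slope   # other groups: b0 + slope
    return means


# Estimates copied from the answer to part (c); rural is the omitted group.
means = group_means(47.0, {"urban": 14.97, "suburban": 21.33}, "rural")

for group, mean in means.items():
    print(f"{group}: {mean:.2f}")
```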
26. 1. 46 A department store sampled the purchase amounts of 50 customers. Half the customers in the sample used coupons. The data is coded as −1 for those who did not use a coupon and +1 for those who did. The plot to the right shows the data and the least squares regression line.
Term        Estimate   Std Error   t Ratio   p
Intercept   197.30     7.00        28.19     <0.0001
Coupon      46.38      7.00        6.63      <0.0001
Answer parts (a) through (d). (a) Interpret the estimated intercept. Choose the correct answer below. Interpret the slope. Choose the correct answer below. Interpret the value of se. Choose the correct answer below. (b) Should managers conclude that customers who use coupons spend statistically significantly more than those who do not? (c) Suppose the comparison had been done using a pooled, two-sample t-test. What would be the value of the t-statistic? (d) Suppose the comparison had been done using a dummy variable (coded as 1 for coupon users and 0 otherwise) rather than the variable Coupon. Give the values of b0, b1, and the t-statistic for the estimated slope.
Answers: (a): *The intercept is the overall mean of the response. The intercept implies that the average shopper, regardless of whether a coupon was used, spent $197.30.* *The slope is half the difference between the means of the two groups. The slope implies that a shopper with a coupon spent $92.76 more than a shopper without a coupon.* *The value of se is the pooled estimated standard error for the difference between the two sample means.*
(b): *Yes, because the slope of the least squares line is positive and significantly different from zero.*
(c): t = *6.63* (Round to two decimal places as needed.) REASON= this number is found as the t Ratio for Coupon above.
(d): b0 = *150.92* (Round to two decimal places as needed.) REASON= subtract the Coupon slope from the intercept given in the problem: 197.30 − 46.38 = 150.92. b1 = *92.76* (Round to two decimal places as needed.) REASON= first add the estimates, 197.30 + 46.38 = 243.68, then subtract b0: 243.68 − 150.92 = 92.76. t = *6.63* (Round to two decimal places as needed.) REASON= the same as the t Ratio for Coupon.
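The arithmetic in part (d) follows from the fitted values at the two codes: at −1 the fit is b0 − b1 (no coupon), and at +1 it is b0 + b1 (coupon). A small Python sketch of that conversion, using the estimates from the table in the problem (the function name is hypothetical):

```python
def effect_to_dummy(b0_effect, b1_effect, se_slope_effect):
    """Convert a fit on a -1/+1 coded variable to the equivalent 0/1 dummy fit."""
    mean_without = b0_effect - b1_effect   # fitted value at code -1
    mean_with = b0_effect + b1_effect      # fitted value at code +1
    b0_dummy = mean_without                # dummy intercept: baseline group mean
    b1_dummy = mean_with - mean_without    # dummy slope: difference in means
    # The dummy equals (code + 1) / 2, so slope and standard error both double
    # and the t-statistic is unchanged.
    se_slope_dummy = 2 * se_slope_effect
    return b0_dummy, b1_dummy, b1_dummy / se_slope_dummy


b0_dummy, b1_dummy, t_dummy = effect_to_dummy(197.30, 46.38, 7.00)
t_effect = 46.38 / 7.00

print(round(b0_dummy, 2), round(b1_dummy, 2), round(t_dummy, 2))
```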
26. 1. 45 Rather than use a dummy variable (D, coded as 1 for men and 0 for women) as the explanatory variable in the regression of responses of men and women, a model that includes an explanatory variable X coded as +1 for men and −1 for women was used. (This type of indicator variable is sometimes used rather than a dummy variable.) Answer parts (a) through (d). (a) How does the scatterplot of Y on X differ from the scatterplot of Y on D? (b) Does the regression of Y on D have the same R² as the regression of Y on X? (c) In the regression of Y on X, what is the fitted value for men? For women? (d) Compare b0 and b1 in the regression of Y on D to b0 and b1 in the regression of Y on X. Choose the correct answer below.
Answers: (a): *The scale on the x-axis changes. The x-axis in the plot of Y on D goes from 0 to 1 while the x-axis on the plot of Y on X goes from −1 to 1.*
(b): *Yes, because only the scale has changed. The association between the variables remains the same.*
(c): *With either explanatory variable, the fitted values are the sample means of the two groups.*
(d): *In the regression on D, b0 is the mean for women and b1 is the difference (mean of men minus mean of women). In the regression on X, b0 is the overall mean and b1 is half the distance between the group means.*
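These claims can be checked numerically by fitting the same responses under both codings. The sketch below uses plain Python and made-up responses (three men, three women; the numbers are purely illustrative and do not come from the problem):

```python
def ols(x, y):
    """Least squares fit of y on a single x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


# Hypothetical responses, chosen only to illustrate the two codings.
men = [10.0, 12.0, 14.0]     # group mean 12
women = [4.0, 6.0, 8.0]      # group mean 6
y = men + women
d = [1] * len(men) + [0] * len(women)    # dummy coding: 1 = men, 0 = women
x = [1] * len(men) + [-1] * len(women)   # +1/-1 coding

b0_d, b1_d, r2_d = ols(d, y)
b0_x, b1_x, r2_x = ols(x, y)

# Fitted values for each group under each coding.
fitted_men_d, fitted_women_d = b0_d + b1_d, b0_d
fitted_men_x, fitted_women_x = b0_x + b1_x, b0_x - b1_x

print(b0_d, b1_d, b0_x, b1_x, r2_d, r2_x)
```

The groups are the same size here, so the midpoint of the two group means coincides with the overall mean of the response.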
26. 1. 51-T People procrastinate when it comes to an unpleasant chore, but consumers often put off a good experience, too. Researchers gave coupons for either one or two free movie tickets to 120 students. Sixty coupons expired within a month, and 60 expired within 3 months. Students were randomized to have 4 samples of 30 for each combination of 1 or 2 tickets and 1 or 3 months. At the end of 3 months, theatres reported which coupons had been used in the accompanying data table. Complete parts (a) through (e). (a) Which group used the most coupons? The least? (b) Are these samples large enough to use for two-sample comparison? (c) Assuming the samples are large enough to satisfy the necessary conditions, fit and interpret the regression of use on the assigned group. Note that in this equation 1 month/1 ticket is a dummy variable that is valued 1 if the observation belongs to this group and 0 otherwise. The dummy variables 1 month/2 tickets and 3 months/1 ticket, are defined similarly. Interpret the regression results. (d) Are any of the differences among the four groups statistically significant? Construct Tukey confidence intervals for the differences between two means, using α=0.05. Which groups have statistically significant mean differences? Select all that apply. (e) Interpret the results of this analysis for retailers using coupons.
Answers: (a): The group *1 month/2 tickets* used the most coupons, and the group *3 months/1 ticket* used the least coupons.
(b): *No, because not all groups have numbers of "successes" and "failures" greater than or equal to 10.*
(c):
Term                 Estimate
Intercept            *0.433*
1 month/1 ticket     *0*
1 month/2 tickets    *0.4*
3 months/1 ticket    *−0.033*
(Round to three decimal places as needed.) REASON= *In JMP, click on the Group column, then go to Cols at the top, go to Utilities, and click Make Indicator Columns. Delete the last column (3/2). Then go to Fit Model with use? as Y and add all of the dummies.* The regression equation indicates that 43.3% of those in the baseline group, 3 months/2 tickets, used the coupon; 43.3% of those in the group 1 month/1 ticket used the coupon; 83.3% of those in the group 1 month/2 tickets used the coupon; and 40% of those in the group 3 months/1 ticket used the coupon. (Round to one decimal place as needed.) REASON= for 83.3%, add 0.433 + 0.4.
(d): *In JMP, go to the data sheet and open Fit Y by X with use? as Y and the single column Group as X. Then in the red triangle go to Compare Means and click All Pairs, Tukey HSD.*
1 month/2 tickets − 1 month/1 ticket: *0.08* to *0.72*
3 months/1 ticket − 1 month/1 ticket: *−0.35* to *0.29*
3 months/2 tickets − 1 month/1 ticket: *−0.32* to *0.32*
3 months/1 ticket − 1 month/2 tickets: *−0.75* to *−0.11*
3 months/2 tickets − 1 month/2 tickets: *−0.72* to *−0.08*
3 months/2 tickets − 3 months/1 ticket: *−0.29* to *0.35*
(Round to two decimal places as needed.) REASON= *Pay attention to which difference the problem asks for; if the groups are reversed, change the signs (+/−).*
Groups with statistically significant mean differences: 1 month/1 ticket and 1 month/2 tickets; 1 month/2 tickets and 3 months/1 ticket; 1 month/2 tickets and 3 months/2 tickets.
(e): The experiment shows that a short time frame with a more desirable product has higher redemption than a long time frame with a less desirable product.
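Parts (b) through (d) can be cross-checked from the reported numbers alone. The Python sketch below rebuilds each group's redemption rate from the coefficients, converts the rates to counts out of 30 students per group (the group size stated in the problem), and flags the Tukey intervals that exclude zero. It is only a rough check of the recorded answers, not a replacement for the JMP output; all variable names are made up.

```python
n_per_group = 30  # students per group, from the problem statement

# Coefficients from part (c); 3 months/2 tickets is the omitted baseline.
intercept = 0.433
slopes = {
    "1 month/1 ticket": 0.0,
    "1 month/2 tickets": 0.4,
    "3 months/1 ticket": -0.033,
}

rates = {"3 months/2 tickets": intercept}
for group, slope in slopes.items():
    rates[group] = intercept + slope

# (used, unused) counts implied by each rate; both must be >= 10.
counts = {}
for group, rate in rates.items():
    used = round(rate * n_per_group)
    counts[group] = (used, n_per_group - used)
large_enough = {group: min(c) >= 10 for group, c in counts.items()}

# Tukey intervals from part (d); a difference is significant if 0 is excluded.
tukey = {
    ("1 month/2 tickets", "1 month/1 ticket"): (0.08, 0.72),
    ("3 months/1 ticket", "1 month/1 ticket"): (-0.35, 0.29),
    ("3 months/2 tickets", "1 month/1 ticket"): (-0.32, 0.32),
    ("3 months/1 ticket", "1 month/2 tickets"): (-0.75, -0.11),
    ("3 months/2 tickets", "1 month/2 tickets"): (-0.72, -0.08),
    ("3 months/2 tickets", "3 months/1 ticket"): (-0.29, 0.35),
}
significant = [pair for pair, (lo, hi) in tukey.items() if lo > 0 or hi < 0]

print(counts)
print(large_enough)
print(significant)
```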
26. 2. 48-T Because an ANOVA does not presume a linear trend, it can be used to check for deviations from the straight-enough condition. The procedure requires replicated observations; we need several values of y at each value of x. Use the data in the accompanying data table. The response is the weekly sales of a beverage in 47 stores, and the explanatory variable is the number of feet of shelf space used to display the product. Complete parts (a) through (c). (a) Fit the linear regression of sales on number of feet of shelf space. Does the relationship meet the straight-enough condition? Write the regression equation below. Make a plot of the number of feet of shelf space versus sales alongside the fitted prediction line. Choose the correct graph below. Does the relationship meet the straight-enough condition? (b) Build six dummy variables to represent the values of the explanatory variable (1, 2, 3, . . ., 6 with 7 excluded). The dummy variable D1 identifies stores displaying the product on 1 foot of shelf space, D2 identifies those with 2 feet, and so forth. Fit the multiple regression of the residuals from the simple regression in part (a) versus the six variables D1, D2, . . . , D6. Summarize the fit. Write the regression equation below. Determine the overall significance of the regression (the p-value of the ANOVA F-statistic). Describe the fit. (c) Does the regression of the residuals on the dummy variables explain statistically significant amounts of variation in the residuals? Should it? Use α=0.05.
Answers: (a): Predicted Sales = *92.876* + *39.833* (Number of feet of shelf space) (Round to three decimal places as needed.) REASON= *In JMP, open Fit Y by X with sales as Y and display feet as X. Then in the top red triangle click Fit Line.* *Correct graph.* REASON= compare the JMP plot with the choices in MyLab and pick the most similar one. *Yes, because the plotted points show a rough linear association.*
(b): *In JMP, go to the red triangle next to Linear Fit and save the residuals. Then open Fit Model with the saved residuals as Y and add all of the dummies D1 to D6.* Predicted residual = *(−36.625) + (−41.847)D1 + (74.741)D2 + (71.309)D3 + (45.316)D4 + (48.203)D5 + (50.986)D6* (Round to three decimal places as needed.) p-value = *0.000* (Round to three decimal places as needed.) When the shelf space is *1 foot or 7 feet,* the residuals tend to be *negative.* Otherwise, they tend to be *positive.*
(c): Because the p-value of the ANOVA F-statistic is *0.000*, which is *less than α=0.05,* the regression of the residuals on the dummy variables does explain a statistically significant amount of variation in the residuals. If there were a true linear association between sales and number of feet of shelf space, this result *would not* occur because the residuals would be *evenly distributed about 0* for each value of shelf space.
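The description of the fit in part (b) can be read straight off the coefficients: the intercept is the average residual for the omitted 7-foot stores, and each slope is added to it to get that group's average residual. A short Python sketch using only the estimates reported above:

```python
# Coefficients from the regression of residuals on D1..D6 (7 feet omitted).
intercept = -36.625
slopes = {1: -41.847, 2: 74.741, 3: 71.309, 4: 45.316, 5: 48.203, 6: 50.986}

# Average residual at each shelf-space value.
mean_residual = {feet: intercept + slope for feet, slope in slopes.items()}
mean_residual[7] = intercept

negative = sorted(feet for feet, m in mean_residual.items() if m < 0)
positive = sorted(feet for feet, m in mean_residual.items() if m > 0)

for feet in sorted(mean_residual):
    print(feet, round(mean_residual[feet], 3))
print("negative:", negative, "positive:", positive)
```

Negative averages at both ends and positive averages in the middle are the pattern that a straight line leaves behind when the true relationship bends.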
26. 1. 30 A Web site monitors the number of unique customer visits, producing a total for each day. The accompanying table summarizes the totals by day of the week, averaged over the last 12 weeks. (For example, during this 12-week period, the site averaged 2,430 visitors on Mondays.) A regression model regressed the number of visits on six dummy variables, representing the days Monday through Saturday (omitting Sunday). Complete parts (a) through (c) below. (a) What is the estimated intercept b0 in the regression? (b) What is the slope of the dummy variable that represents Monday? (c) Are the differences among the days statistically significant?
Answers: (a): *2890* (Type an integer or a decimal. Do not round.) REASON= this is the average number of visits for Sunday, the omitted day.
(b): *−460* (Type an integer or a decimal. Do not round.) REASON= subtract the average number of visits for Sunday from the average for Monday: 2430 − 2890 = −460.
(c): *This question cannot be answered without information on the unexplained variation.*
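The same rule works in the other direction, from group averages to coefficients. A minimal Python sketch using the two daily averages quoted in these notes (the remaining days are in the problem's table and are not reproduced here):

```python
# Daily averages quoted in the notes; Sunday is the omitted category.
daily_average = {"Sunday": 2890, "Monday": 2430}
baseline = "Sunday"

b0 = daily_average[baseline]                  # intercept: mean of omitted day
slope_monday = daily_average["Monday"] - b0   # slope: Monday mean minus Sunday mean

print(b0, slope_monday)
```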