DS - CHP 26


26. 1. 17 Decide if the statement below is true or false. If you believe that it is false, briefly explain why you think it is false. The F-test in an ANOVA tests the null hypothesis that all of the groups have equal variance.

Answer: *The statement is false. The null hypothesis claims that all groups have equal means.*
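
A minimal sketch of what the F-test compares, assuming small made-up groups (the data and the name `f_statistic` are illustrative, not from the textbook). The F ratio is the between-group mean square divided by the within-group mean square, so it responds to differences among the group means, not to differences among the variances.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F ratio: between-group mean square / within-group mean square."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data, chosen only for illustration.
same_means = [[4, 5, 6], [3, 5, 7], [1, 5, 9]]      # equal means, unequal spread
diff_means = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]   # unequal means, equal spread

print(f_statistic(same_means))  # 0.0: very different variances, but no evidence against H0
print(f_statistic(diff_means))  # 27.0
```

The first data set has very unequal variances yet F is 0, which is why the statement in the question is false.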

26. 1. 15 Decide if the statement below is true or false. If you believe that it is false, briefly explain why you think it is false. A regression model that has at least two explanatory variables, all of which are dummy variables, produces an analysis of variance.

Answer: *The statement is true.*

26. 1. 5 Match the term from an ANOVA regression shown below to its symbol. Mean of data in omitted category

Answer: *b0*
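
A small sketch of why b0 is the mean of the omitted category, assuming made-up data for three groups (the group names, the values, and the helper names `solve` and `least_squares` are all hypothetical). It fits the regression on two dummies with plain normal equations and compares the coefficients with the group means.

```python
from statistics import mean

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data; group "c" is the omitted category.
groups = {"a": [10, 12, 14], "b": [20, 22], "c": [5, 7, 9, 11]}
rows, y = [], []
for name, values in groups.items():
    for v in values:
        rows.append([1, int(name == "a"), int(name == "b")])  # intercept, D_a, D_b
        y.append(v)

b0, b_a, b_b = least_squares(rows, y)
print(round(b0, 6), mean(groups["c"]))                        # intercept vs. mean of omitted group
print(round(b_a, 6), mean(groups["a"]) - mean(groups["c"]))   # slope vs. difference in means
print(round(b_b, 6), mean(groups["b"]) - mean(groups["c"]))
```

Each slope is the difference between that group's mean and the omitted group's mean, so the fitted values are simply the group means.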

26. 1. 8 Match the given term from an ANOVA regression to its symbol below. Null hypothesis of​ F-test

Answer: *μ1=μ2=⋯=μJ*

26. 1. 53-T Movie studios often release films into selected markets and use the reactions of audiences to plan further promotions. In these data, viewers rate the film on a scale that assigns a score from 0 (dislike) to 100 (great) to the movie. The viewers are located in one of three test markets: urban, rural, and suburban. The groups vary in size. Complete parts (a) through (g) below. Assume a 0.05 level of significance whenever necessary. (a) Plot the data. Choose the correct answer below. Do the data appear suited to ANOVA? (b) From your visual inspection, do differences among the average ratings appear large when compared to the within-group variation? (c) Fit a multiple regression of rating on two dummy variables that identify the urban and suburban viewers. Interpret the estimated intercept and slopes. (d) Are the standard errors of the slopes equal? Explain why or why not. Select the correct choice below and fill in the answer boxes to complete your choice. (e) Does a statistical test agree with your visual impression of the differences among the groups? Test the null hypothesis that the ratings are the same in the three markets. Determine the null and alternative hypotheses. Identify the F statistic. Identify the p-value. Does this p-value agree with your visual impression of the differences among the groups? (f) Do these data meet the conditions required for an ANOVA? Assume that the samples were suitably randomized. Select all conditions below that are clearly not satisfied. (g) What conclusions should the studio reach regarding the prospects for marketing this movie in the three types of markets?

Answers:
(a): *In JMP, go to FIT Y BY X, with Rating as Y and Market as X. PAY ATTENTION TO THE ACTUAL GROUPS AND PICK THE CORRECT GRAPH.* - Yes, the variances appear to be similar enough.
(b): No, the average ratings are all roughly within the same range, while the data within each group vary substantially.
(c): *Predicted rating = 47 + 14.97 D Urban + 21.33 D Suburban* (Round to two decimal places as needed.) REASON= in JMP, go to FIT MODEL and have Rating in Y and ADD both dummies. - The intercept is the average *rural* rating, *47*. The slope for the dummy variable representing urban viewers, *14.97*, is the *difference* between the average ratings of *urban* and *rural* viewers. The slope for the dummy variable representing suburban viewers, *21.33*, is the difference between the average ratings of *suburban* and *rural* viewers. (Round to two decimal places as needed.)
(d): The standard error for the DUrban coefficient is *8.390* while the standard error for the DSuburban coefficient is *12.190*. These are different due to the difference in sample sizes. REASON= these numbers are next to the estimates found by Fit Model.
(e): H0: μUrban = μSuburban = μRural. Ha: At least one mean is different. F = *2.023* (Round to three decimal places as needed.) p-value = *0.143* (Round to three decimal places as needed.) *Yes,* the *large* p-value indicates that the *within*-group variation outweighs the *between*-group variation.
(f): It is reasonable to assume that all conditions are satisfied.
(g): The data are not strong enough to reach a conclusion regarding the differences in ratings between markets. They should consider repeating the study with a larger sample size.
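
A rough sketch of why the two slope standard errors in part (d) differ. Each dummy slope is a difference between two group means, so its standard error is s_e * sqrt(1/n_group + 1/n_baseline), where s_e is the residual standard deviation. The group sizes and s_e below are hypothetical, since these notes do not list the exercise's actual values, and `dummy_slope_se` is an illustrative name.

```python
from math import sqrt

def dummy_slope_se(s_e, n_group, n_baseline):
    """Standard error of a dummy slope, i.e. of a difference between two
    group means, using the residual standard deviation s_e."""
    return s_e * sqrt(1 / n_group + 1 / n_baseline)

# Hypothetical values, NOT the exercise's data.
s_e = 25.0
n_rural, n_urban, n_suburban = 20, 15, 5   # rural is the omitted (baseline) group

se_urban = dummy_slope_se(s_e, n_urban, n_rural)
se_suburban = dummy_slope_se(s_e, n_suburban, n_rural)
print(round(se_urban, 2), round(se_suburban, 2))
```

The smaller group gets the larger standard error, which matches the pattern in the answer (the DSuburban slope has the larger standard error).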

26. 1. 46 A department store sampled the purchase amounts of 50 customers. Half the customers in the sample used coupons. The data are coded as −1 for those who did not use a coupon and +1 for those who did. The plot to the right shows the data and the least squares regression line.

Term       Estimate  Std Error  t Ratio  p
Intercept  197.30    7.00       28.19    <0.0001
Coupon     46.38     7.00       6.63     <0.0001

Answer parts (a) through (d). (a) Interpret the estimated intercept. Choose the correct answer below. Interpret the slope. Choose the correct answer below. Interpret the value of se. Choose the correct answer below. (b) Should managers conclude that customers who use coupons spend statistically significantly more than those who do not? (c) Suppose the comparison had been done using a pooled, two-sample t-test. What would be the value of the t-statistic? (d) Suppose the comparison had been done using a dummy variable (coded as 1 for coupon users and 0 otherwise) rather than the variable Coupon. Give the values of b0, b1, and the t-statistic for the estimated slope.

Answers:
(a): - *The intercept is the overall mean of the response. The intercept implies that the average shopper, regardless of whether a coupon was used, spent $197.30.* - *The slope is half the difference between the means of the two groups. The slope implies that a shopper with a coupon spent $92.76 more than a shopper without a coupon.* - *The value of se is the pooled estimated standard error for the difference between the two sample means.*
(b): *Yes, because the slope of the least squares line is positive and significantly different from zero.*
(c): t = *6.63* (Round to two decimal places as needed.) REASON= this number is found as the t Ratio for Coupon above.
(d): b0 = *150.92* (Round to two decimal places as needed.) REASON= subtract the slope from the intercept given in the problem: 197.30 − 46.38 = 150.92. b1 = *92.76* (Round to two decimal places as needed.) REASON= first add the estimates, 197.30 + 46.38 = 243.68, then subtract b0: 243.68 − 150.92 = 92.76. t = *6.63* (Round to two decimal places as needed.) REASON= the same as the t Ratio for Coupon.
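
The arithmetic in part (d) can be checked with a short sketch. The function name `pm1_to_dummy` is made up for illustration; the numbers are the estimates given in the problem. Switching from −1/+1 coding to 0/1 coding doubles both the slope and its standard error, so the t-statistic does not change.

```python
def pm1_to_dummy(b0_pm, b1_pm):
    """Convert coefficients from -1/+1 coding to 0/1 dummy coding."""
    mean_no = b0_pm - b1_pm     # fitted value at x = -1 (no coupon)
    mean_yes = b0_pm + b1_pm    # fitted value at x = +1 (coupon)
    return mean_no, mean_yes - mean_no

b0, b1 = pm1_to_dummy(197.30, 46.38)
se_dummy = 2 * 7.00             # the slope's standard error doubles along with the slope
t = b1 / se_dummy

print(round(b0, 2), round(b1, 2), round(t, 2))  # 150.92 92.76 6.63
```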

26. 1. 45 Rather than use a dummy variable (D, coded as 1 for men and 0 for women) as the explanatory variable in the regression of responses of men and women, a model that includes an explanatory variable X coded as +1 for men and −1 for women was used. (This type of indicator variable is sometimes used rather than a dummy variable.) Answer parts (a) through (d). (a) How does the scatterplot of Y on X differ from the scatterplot of Y on D? (b) Does the regression of Y on D have the same R2 as the regression of Y on X? (c) In the regression of Y on X, what is the fitted value for men? For women? (d) Compare b0 and b1 in the regression of Y on D to b0 and b1 in the regression of Y on X. Choose the correct answer below.

Answers:
(a): *The scale on the x-axis changes. The x-axis in the plot of Y on D goes from 0 to 1 while the x-axis on the plot of Y on X goes from −1 to 1.*
(b): *Yes, because only the scale has changed. The association between the variables remains the same.*
(c): *With either explanatory variable, the fitted values are the sample means of the two groups.*
(d): *In the regression on D, b0 is the mean for women and b1 is the difference (mean of men minus mean of women). In the regression on X, b0 is the overall mean and b1 is half the distance between the group means.*
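
A quick sketch that checks these answers on made-up data (the responses and the helper name `simple_fit` are hypothetical). The groups here are the same size, which is the case in which the intercept from the +1/−1 coding equals the overall mean; in general it is the midpoint of the two group means.

```python
from statistics import mean

def simple_fit(x, y):
    """Least-squares intercept, slope, and R^2 for one explanatory variable."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1, sxy ** 2 / (sxx * syy)

# Hypothetical responses, equal group sizes.
men = [70, 74, 78]
women = [60, 62, 64]
y = men + women
d = [1] * len(men) + [0] * len(women)    # dummy coding
x = [1] * len(men) + [-1] * len(women)   # +1/-1 coding

b0_d, b1_d, r2_d = simple_fit(d, y)
b0_x, b1_x, r2_x = simple_fit(x, y)

print(b0_d, b1_d)            # mean of women, difference between the means
print(b0_x, b1_x)            # overall mean, half the difference
print(r2_d, r2_x)            # same R^2 under either coding
print(b0_x + b1_x, b0_x - b1_x, mean(men), mean(women))  # fitted values are the group means
```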

26. 1. 51-T People procrastinate when it comes to an unpleasant​ chore, but consumers often put off a good​ experience, too. Researchers gave coupons for either one or two free movie tickets to 120 students. Sixty coupons expired within a​ month, and 60 expired within 3 months. Students were randomized to have 4 samples of 30 for each combination of 1 or 2 tickets and 1 or 3 months. At the end of 3​ months, theatres reported which coupons had been used in the accompanying data table. Complete parts​ (a) through​ (e). ​(a) Which group used the most​ coupons? The​ least? ​(b) Are these samples large enough to use for​ two-sample comparison? ​(c) Assuming the samples are large enough to satisfy the necessary​ conditions, fit and interpret the regression of use on the assigned group. Note that in this equation 1​ month/1 ticket is a dummy variable that is valued 1 if the observation belongs to this group and 0 otherwise. The dummy variables 1​ month/2 tickets and 3​ months/1 ticket, are defined similarly. Interpret the regression results. ​(d) Are any of the differences among the four groups statistically​ significant? Construct Tukey confidence intervals for the differences between two​ means, using α=0.05. Which groups have statistically significant mean​ differences? Select all that apply. ​(e) Interpret the results of this analysis for retailers using coupons.

Answers:
(a): The group *1 month/2 tickets* used the most coupons, and the group *3 months/1 ticket* used the least coupons.
(b): *No, because not all groups have numbers of "successes" and "failures" greater than or equal to 10.*
(c):
Term               Estimate
Intercept          *0.433*
1 month/1 ticket   *0*
1 month/2 tickets  *0.4*
3 months/1 ticket  *−0.033*
(Round to three decimal places as needed.) REASONS= *In JMP, click on the Group column and then go to COLS on the top, go to Utilities and click on Make Indicator Columns. Delete the last column (3/2). Then go to FIT MODEL and have use? as Y and all of the dummies in ADD.* - The regression equation indicates that 43.3% of those in the baseline group, 3 months/2 tickets, used the coupon; 43.3% of those in the group 1 month/1 ticket used the coupon; 83.3% of those in the group 1 month/2 tickets used the coupon; and 40% of those in the group 3 months/1 ticket used the coupon. (Round to one decimal place as needed.) REASON= for 83.3% add 0.433 + 0.4.
(d): *In JMP, go to the data sheet, and in FIT Y BY X have use? in Y and the one column Group in X. Then in the red triangle go to Compare Means, and then click on All Pairs, Tukey HSD.*
1 month/2 tickets − 1 month/1 ticket: *0.08* to *0.72*
3 months/1 ticket − 1 month/1 ticket: *−0.35* to *0.29*
3 months/2 tickets − 1 month/1 ticket: *−0.32* to *0.32*
3 months/1 ticket − 1 month/2 tickets: *−0.75* to *−0.11*
3 months/2 tickets − 1 month/2 tickets: *−0.72* to *−0.08*
3 months/2 tickets − 3 months/1 ticket: *−0.29* to *0.35*
(Round to two decimal places as needed.) REASON= *PAY ATTENTION to what the problem is asking for, and if the groups are reversed, change the signs (+/−).*
Groups with statistically significant mean differences: 1 month/1 ticket and 1 month/2 tickets; 1 month/2 tickets and 3 months/1 ticket; 1 month/2 tickets and 3 months/2 tickets.
(e): The experiment shows that a short time frame with a more desirable product has higher redemption than a long time frame with a less desirable product.
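
A short sketch of the arithmetic behind parts (b) through (d), using the coefficients and the group size of 30 from the problem. The helper names `reverse_difference` and `excludes_zero` are made up for illustration, and the used/unused counts are recovered by rounding the fitted proportions.

```python
n_per_group = 30
intercept = 0.433   # baseline group: 3 months/2 tickets
slopes = {
    "1 month/1 ticket": 0.0,
    "1 month/2 tickets": 0.4,
    "3 months/1 ticket": -0.033,
}

# Fitted proportion for each group = intercept + that group's slope.
rates = {"3 months/2 tickets": intercept}
for group, slope in slopes.items():
    rates[group] = intercept + slope

# Success/failure condition: at least 10 used and 10 unused coupons per group.
failing = []
for group, p in rates.items():
    used = round(p * n_per_group)
    if used < 10 or n_per_group - used < 10:
        failing.append(group)

def reverse_difference(low, high):
    """Interval for (B - A) given the interval for (A - B)."""
    return -high, -low

def excludes_zero(low, high):
    """An interval that excludes 0 marks a statistically significant difference."""
    return low > 0 or high < 0

print({g: round(p, 3) for g, p in rates.items()})
print(failing)                               # groups that break the condition
print(reverse_difference(-0.75, -0.11))      # (0.11, 0.75)
print(excludes_zero(0.08, 0.72), excludes_zero(-0.32, 0.32))  # True False
```

Only the 1 month/2 tickets group fails the condition (about 25 used and 5 unused), and only the intervals that involve that group exclude 0.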

26. 2. 48-T Because an ANOVA does not presume a linear​ trend, it can be used to check for deviations from the​ straight-enough condition. The procedure requires replicated​ observations; we need several values of y at each value of x. Use the data in the accompanying data table. The response is the weekly sales of a beverage in 47​ stores, and the explanatory variable is the number of feet of shelf space used to display the product. Complete parts​ (a) through​ (c). ​(a) Fit the linear regression of sales on number of feet of shelf space. Does the relationship meet the​ straight-enough condition? Write the regression equation below. Make a plot of the number of feet of shelf space versus sales alongside the fitted prediction line. Choose the correct graph below. Does the relationship meet the​ straight-enough condition? ​(b) Build six dummy variables to represent the values of the explanatory variable​ (1, 2,​ 3, . .​ ., 6 with 7​ excluded). The dummy variable D1 identifies stores displaying the product on 1 foot of shelf​ space, D2 identifies those with 2​ feet, and so forth. Fit the multiple regression of the residuals from the simple regression in part​ (a) versus the six variables D1​, D2​, . . .​ , D6. Summarize the fit. Write the regression equation below. Determine the overall significance of the regression​ (the p-value of the ANOVA​ F-statistic). Describe the fit. (c) Does the regression of the residuals on the dummy variables explain statistically significant amounts of variation in the​ residuals? Should​ it? Use α=0.05.

Answers:
(a): - Predicted Sales = *92.876* + *39.833* (Number of feet of shelf space) (Round to three decimal places as needed.) REASON= *In JMP, go to FIT Y BY X with Sales in Y and Display Feet in X, then in the top red triangle click on FIT LINE.* - *correct graph* REASON= compare the graph from JMP with the choices and pick the most similar one in MyLab. - *Yes, because the plotted points show a rough linear association.*
(b): *In JMP, go to the red triangle next to Linear Fit, and then save residuals.* *Then go to FIT MODEL and have the saved residuals as Y and ADD all of the dummies D1 to D6.* - Predicted residual = *(−36.625) + (−41.847)D1 + (74.741)D2 + (71.309)D3 + (45.316)D4 + (48.203)D5 + (50.986)D6* (Round to three decimal places as needed.) - p-value = *0.000* (Round to three decimal places as needed.) - When the shelf space is *1 foot or 7 feet,* the residuals tend to be *negative.* Otherwise, they tend to be *positive.*
(c): Because the p-value of the ANOVA F-statistic is *0.000*, which is *less than α = 0.05,* the regression of the residuals on the dummy variables does explain a statistically significant amount of variation in the residuals. If there were a true linear association between sales and number of feet of shelf space, this result *would not* occur because the residuals would be *evenly distributed about 0* for each value of shelf space.
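
A minimal sketch of the idea behind part (b), assuming made-up replicated data in which sales level off (the data and the name `residual_means_by_x` are hypothetical, not the 47-store data set). Regressing the residuals on one dummy per x value reproduces the average residual at each x, so averaging the residuals by x shows the same pattern as the dummy-variable fit.

```python
from collections import defaultdict
from statistics import mean

def residual_means_by_x(x, y):
    """Fit y = b0 + b1*x by least squares, then average the residuals at each distinct x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    by_x = defaultdict(list)
    for xi, yi in zip(x, y):
        by_x[xi].append(yi - (b0 + b1 * xi))
    return {xi: mean(res) for xi, res in sorted(by_x.items())}

# Hypothetical curved data: two stores at each shelf-space value.
feet = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
sales = [10, 12, 30, 32, 42, 44, 48, 50, 50, 52]

for x_value, avg_residual in residual_means_by_x(feet, sales).items():
    print(x_value, round(avg_residual, 2))
```

With these made-up numbers the average residual is negative at the smallest and largest shelf space and positive in between, the same signature of curvature described in the answer.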

26. 1. 30 A Web site monitors the number of unique customer​ visits, producing a total for each day. The accompanying table summarizes the totals by day of the​ week, averaged over the last 12 weeks.​ (For example, during this​ 12-week period, the site averaged 2,430 visitors on​ Mondays.) A regression model regressed the number of visits on six dummy​ variables, representing the days Monday through Saturday​ (omitting Sunday). Complete parts​ (a) through​ (c) below. (a) What is the estimated intercept b0 in the​ regression? ​(b) What is the slope of the dummy variable that represents​ Monday? ​(c) Are the differences among the days statistically​ significant?

Answers:
(a): *2890* (Type an integer or a decimal. Do not round.) REASON= this is the average number of visits for Sunday, the omitted day.
(b): *-460* (Type an integer or a decimal. Do not round.) REASON= subtract the average number of visits for Sunday from the average number of visits for Monday: 2430 − 2890 = −460.
(c): *This question cannot be answered without information on the unexplained variation.*
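
The two numbers follow directly from the daily averages. A tiny sketch, using only the Sunday and Monday averages quoted in these notes (the variable names are illustrative):

```python
sunday_mean = 2890   # omitted category, so it becomes the intercept
monday_mean = 2430

b0 = sunday_mean
b_monday = monday_mean - sunday_mean   # slope = Monday average minus Sunday average

print(b0, b_monday)  # 2890 -460
```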
