07.05 Linear Regression and Interval for Slope


Practice 2 A researcher at MAC cosmetics attempts to establish a linear relationship between the number of makeup tutorial videos and amount of makeup sales. The table is a partial printout of the regression analysis and is based on a random sample of nine individuals. Computer output in photo* What is the 95% confidence interval for the slope of the regression line?

0.9179 ± 2.365(0.0282)
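The interval above can be evaluated numerically. The following is a minimal sketch in Python; the function name `slope_interval` is an illustrative choice, and the critical value 2.365 (df = 9 − 2 = 7) is taken from the answer rather than computed.

```python
def slope_interval(b, t_star, se_b):
    """Return (lower, upper) for the interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Values from the answer: slope 0.9179, t* = 2.365, standard error 0.0282
lower, upper = slope_interval(0.9179, 2.365, 0.0282)
print(f"({lower:.4f}, {upper:.4f})")
```

With these inputs the endpoints come out to roughly 0.8512 and 0.9846.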

How to read a computer output

The computer output gives you a p-value for a two-sided test. If you only need the p-value for a one-sided test, divide this value by two.
23.514 = y-intercept
-2.7555 = slope
0.4668 = standard error of slope
-5.90 = t value
0.000 = p-value
76% = coefficient of determination (r²)
0.5658 = standard error of the residuals
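As a rough check on how the printed values fit together, the t value is the slope divided by its standard error (when the null hypothesis is β = 0). The sketch below also shows the halving rule for a one-sided p-value. The helper `one_sided_p` and the sample p-value 0.04 are hypothetical and only for illustration; halving applies when the sign of t agrees with the direction of the alternative hypothesis.

```python
def one_sided_p(two_sided_p, t_stat, alternative):
    """Convert a two-sided p-value to a one-sided one.

    alternative is "greater" (Ha: beta > 0) or "less" (Ha: beta < 0).
    """
    agrees = (t_stat > 0) if alternative == "greater" else (t_stat < 0)
    return two_sided_p / 2 if agrees else 1 - two_sided_p / 2

slope = -2.7555
se_slope = 0.4668
t_stat = (slope - 0) / se_slope   # test statistic for H0: beta = 0
print(f"t = {t_stat:.2f}")

# Hypothetical two-sided p-value of 0.04, used only to show the rule
print(one_sided_p(0.04, t_stat, "less"))
```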

The statistics department at a large university is trying to determine if it is possible to predict whether an applicant will successfully complete the Ph.D. program or will leave before completing the program. The department is considering whether GPA (grade point average) in undergraduate statistics and mathematics courses (a measure of performance) and mean number of credit hours per semester (a measure of workload) would be helpful measures. To gather data, a random sample of 20 entering students from the past 5 years is taken. The data are given in photo. a. Use an appropriate graphical display to compare the GPAs for the two groups. Write a few sentences commenting on your display. b. For the students who successfully completed the Ph.D. program, is there a significant relationship between GPA and mean number of credit hours per semester? Give a statistical justification to support your response. c. If a new applicant has a GPA of 3.5 and a mean number of credit hours per semester of 14.0, do you think this applicant will successfully complete the Ph.D. program? Give a statistical justification to support your response.

a. Appropriate displays for this question could also include back-to-back stem-and-leaf plots or parallel boxplots. In general, there are more successful students with high GPAs than unsuccessful students with high GPAs. In general, the GPAs for successful students are higher than the GPAs for unsuccessful students.
b. Parameter: H0: β=0 Ha: β≠0. Conditions: It was stated that the assumptions necessary for inference are reasonable. I will be performing a t-test for slope (linear regression t-test), using the t-test statistic from the computer output table. Calculations: The computer output provided t = -5.90 and p-value = 0.000. Conclusion: Because the p-value is 0.000, reject the null hypothesis. For students who successfully completed the program, there is a significant relationship between GPA and mean number of credit hours per semester.
c. Predicted number of credit hours for a GPA of 3.5: For the successful group, predicted hours = 23.514 - 2.7555(3.5) = 13.86975. For the unsuccessful group, predicted hours = 24.200 - 3.485(3.5) = 12.0025. The actual value of 14 is much closer to the prediction for the successful group than to that for the unsuccessful group. Therefore, we believe this student will be successful.
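The comparison in part c can be sketched in a few lines of Python. The coefficients are the ones quoted in the answer; the function and variable names are illustrative assumptions.

```python
def predicted_hours(intercept, slope, gpa):
    """Predicted mean credit hours from a fitted line: intercept + slope * GPA."""
    return intercept + slope * gpa

gpa, actual_hours = 3.5, 14.0

# Fitted lines quoted in the answer for each group
predictions = {
    "successful": predicted_hours(23.514, -2.7555, gpa),
    "unsuccessful": predicted_hours(24.200, -3.485, gpa),
}

# Pick the group whose prediction is closest to the applicant's actual hours
closest = min(predictions, key=lambda g: abs(actual_hours - predictions[g]))
for group, hours in predictions.items():
    print(f"{group}: predicted {hours:.4f}, off by {abs(actual_hours - hours):.4f}")
print("closest group:", closest)
```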

John believes that as he increases his walking speed, his pulse rate will increase. He wants to model this relationship. John records his pulse rate, in beats per minute (bpm), while walking at each of seven different speeds, in miles per hour (mph). A scatterplot and regression output are shown in photo. a. Using the regression output, write the equation of the fitted regression line. b. Do your estimates of the slope and intercept parameters have meaningful interpretations in the context of this question? If so, provide interpretations in this context. If not, explain why not. c. John wants to provide a 98 percent confidence interval for the slope parameter in his final report. Compute the margin of error that John should use. Assume that conditions for inference are satisfied.

a. Predicted Pulse = 63.457 + 16.2809(Speed)
b. The intercept (63.457 bpm) provides an estimate for John's mean resting pulse (walking at a speed of zero mph). The slope (16.2809 bpm/mph) provides an estimate for the mean increase in John's heart rate as his speed is increased by one mile per hour.
c. The margin of error for the confidence interval for the slope parameter is t* × s_b, where t* is the critical value with n − 2 degrees of freedom and s_b is the standard error of the slope. With n = 7 speeds, df = 7 − 2 = 5. For a 98% confidence interval, the margin of error is 3.365 × 0.8192 = 2.7566 bpm per mph.
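The arithmetic in part c can be laid out step by step. This is a sketch using the values quoted in the answer; the critical value 3.365 is taken from the answer (t table, 98% confidence, df = 5) rather than computed, and the resulting interval is a derived illustration, not part of the original answer.

```python
n = 7                 # number of walking speeds recorded
df = n - 2            # degrees of freedom for inference about the slope
t_star = 3.365        # from the answer: t table, 98% confidence, df = 5
se_slope = 0.8192     # standard error of the slope from the regression output
slope = 16.2809       # fitted slope in bpm per mph

margin = t_star * se_slope
interval = (slope - margin, slope + margin)

print(f"df = {df}, margin of error = {margin:.4f}")
print(f"98% interval for slope: ({interval[0]:.4f}, {interval[1]:.4f})")
```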

Practice 1 Which of the following is NOT a necessary condition for completing an inference test for the slope of a regression line? I. The data must be produced from a well-designed random sample or randomized experiment. II. For each given value of x, the standard deviation of y remains the same. III. There must be a relationship of any kind between x and y. IV. For each given value of x, the values of the response variable y are independent. V. For each given value of the independent variable, the distribution of the response variable is Normally distributed.

III. There must be a relationship of any kind between x and y.

TEST TIP

On your AP Statistics Exam, you may want to take a moment right at the start and jot down the mnemonics you've learned so far (LINER, SOCS, SIN, and so on) before you begin working. They may come in handy!

The data from a computer printout are shown that give the regression output for predicting blood alcohol content based on the amount of beer consumed by 20 University of Ontario students. Assume the conditions necessary for inference for linear regression are present. Do these data provide convincing evidence that there is a positive relationship between the amount of beer consumed and blood alcohol content? Carry out an appropriate test at the 0.01 significance level. Computer output in photo

Parameter: Let β equal the true slope of the regression line for predicting blood alcohol content from amount of beer consumed. H0: β=0 Ha: β>0
Conditions: The problem states the conditions necessary for inference for linear regression are present. Because the conditions check, you will be calculating a t-test for the slope of the regression line at the α = 0.01 significance level.
Calculations: t = (b−β0)/(Sb) = (0.017964−0)/(0.002402) = 7.4788. With df = 20 - 2 = 18 and t = 7.48, the computer printout gives p = 0.000 for a two-sided test. The p-value for a one-sided test is half of this: p = 0.000/2 = 0.000.
Conclusion: Because p = 0.000 is less than α = 0.01, we reject the null hypothesis. We have sufficient evidence that, on average, there is a positive linear relationship between the number of beers consumed and blood alcohol content.
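The printout rounds the p-value to 0.000, so it can help to see roughly how small it is. The sketch below recomputes the t statistic and approximates the one-sided p-value by integrating the t density with Simpson's rule, using only the Python standard library. The function names are illustrative assumptions, and the numerical integration is a stand-in for the table or software lookup, not the method used in the original answer.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3         # half the area lies above zero

b, se_b, n = 0.017964, 0.002402, 20    # values from the computer output
t_stat = (b - 0) / se_b
p_one_sided = upper_tail_p(t_stat, n - 2)
reject = p_one_sided < 0.01            # alpha = 0.01

print(f"t = {t_stat:.4f}, one-sided p is about {p_one_sided:.2e}, reject H0: {reject}")
```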

Conducting a significance test for the slope of a regression line is similar to conducting z- or t-tests of significance.

Parameter: Let β equal the true slope of the regression line for predicting y from x. Most often, you will test whether the slope of the regression line is equal to zero, making the null hypothesis H0: β=0. If the slope of the line is zero, then there is no linear relationship between the x and y variables. The alternative hypothesis is most often two-sided, with Ha: β≠0, meaning there is a linear relationship. However, it can also be written as Ha: β≠β0, Ha: β>0, or Ha: β<0. The last two options allow you to determine whether the variables are related positively or negatively.
Conditions: The conditions for a significance test for the slope of a regression line differ the most from those of the other types of significance tests. Try using the mnemonic LINER (linear, independent, Normal, equal variance, and random) to recall these new conditions.
Linear: Verify that the relationship between x and y is linear. The mean responses of y for the fixed values of x are related linearly by the equation μy=α+βx. Create a scatterplot of the data to check that the overall pattern is linear.
Independent: Check that for each given value of x, the values of the response variable, y, are independent of each other. The data must be from random sampling or random assignment to ensure independence.
Normal: For each given value of x, the values of the response variable, y, must vary according to a Normal distribution. Create a Normal probability plot, histogram, or stemplot of the residuals and check for Normality.
Equal variance: For each given value of x, the standard deviation of y must remain the same. In the residual plot, we must see equal scattering both above and below the line.
Random: Data must come from a well-designed random sample or randomized experiment. You must be told the data were produced in these ways.
When the conditions are met, use a t-test for the slope of the regression line.
Calculations: By hand: in photo.
Using GC: t-Test for Slope of a Regression Line
Step 1: Enter the explanatory variable, x, into L1, and the response variable, y, into L2.
Step 2: Select [STAT], highlight TESTS, then choose option F:LinRegTTest.
Step 3: Enter the appropriate list names for Xlist and Ylist, and enter 1 for Freq (unless you know you have a different frequency). Choose the appropriate alternative hypothesis.
Step 4: Select Calculate, then press [ENTER]. The calculator gives you the values of t, p, df, a, b, s, r², and r.
Computer output: Most linear regression inference questions will include a computer printout for you to use to obtain your statistics.
Conclusion: You will provide the same kind of conclusion as in all other hypothesis tests, based on the p-value and significance level (reject the null hypothesis or fail to reject the null hypothesis), in the context of the problem.
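For readers without a graphing calculator, the quantities the steps above report (a, b, s, df, t, r²) can be sketched from raw data in Python. The function name `lin_reg_t_test` and the five data points are hypothetical, chosen only for illustration; the sketch stops short of the p-value, which would come from a t table or software.

```python
import math

def lin_reg_t_test(x, y):
    """Least-squares fit y = a + b*x and the t statistic for H0: beta = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx                       # sample slope
    a = y_bar - b * x_bar               # sample intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    s = math.sqrt(sse / df)             # standard error of the residuals
    se_b = s / math.sqrt(sxx)           # standard error of the slope
    return {"a": a, "b": b, "s": s, "se_b": se_b,
            "t": b / se_b, "df": df, "r2": 1 - sse / syy}

# Hypothetical data, not from any of the problems in this set
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = lin_reg_t_test(x, y)
for name, value in result.items():
    print(f"{name} = {value:.4f}" if name != "df" else f"{name} = {value}")
```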

A study was conducted in which the weight of a simple random sample of narwhal whales (in pounds) was compared with the average depth of a dive. Construct and interpret a 90% confidence interval for the true slope of the regression line. Table in photo

Parameter: We want to estimate the true slope, β, of the population regression for predicting average depth of dive from the weight of a narwhal.
Conditions: Linear: The scatterplot shows a clear linear form. The residual plot also shows a random scatter of points about the residual line. Independent: Each whale is independent of the others. There are at least 10(9) = 90 narwhals. Normal: The histogram of the residuals is unimodal and symmetric. Equal variance: The residual plot shows an equal amount of scatter around the horizontal line. Random: We are told the nine whales were selected randomly. Because the conditions are met, we will be calculating a 90% t-interval for the slope, β, of the regression line.
Calculations: Using a calculator to perform a linear regression t-interval, you should obtain a 90% confidence interval of (-3.125, -2.307), with df = 9 - 2 = 7.
Conclusion: We are 90% confident that the slope of the true regression line between the weight of a narwhal and its average depth of a dive is between -3.125 and -2.307.

Constructing a confidence interval for the slope of a regression line follows almost the same four-step process as for a t-interval.

Parameter: Estimate the true slope, β, of the population regression for predicting the y variable in context from the x variable in context.
Conditions: The conditions are the same ones used in the significance test for the slope of a line. The mnemonic LINER may help you recall them.
Linear: The relationship between x and y is linear. The mean responses of y for the fixed values of x are related linearly by the equation μy=α+βx. Create a scatterplot of the data to check that the overall pattern is linear.
Independent: For each given value of x, the values of the response variable, y, are independent of each other. The data must be from a random sample or random assignment to ensure independence.
Normal: For each given value of x, the values of the response variable, y, vary according to a Normal distribution. Create a Normal probability plot, histogram, or stemplot of the residuals and check for Normality.
Equal variance: For each given value of x, the standard deviation of y remains the same. In the residual plot, you should see equal scattering both above and below the line.
Random: Data come from a well-designed random sample or randomized experiment. You must be told the data were produced in one of those ways.
When the conditions are met, use a t-interval for the slope, β, of the regression line.
Calculations: The format for a confidence interval is Confidence interval = (Statistic) ± (Critical value)(Standard deviation of statistic). You are estimating β, the slope of the true regression line. You use the slope, b, of the sample regression line as the statistic, which gives the formula for the confidence interval: b ± t*(Sb) (formula in photo). The formula uses the standard error of the slope, Sb. However, you most likely won't need to calculate it by hand. You'll construct the confidence interval using either a calculator or a computer printout, just like for a significance test.
Using GC: t-Interval for the Slope of a Regression Line
Step 1: Enter the explanatory variable, x, into L1 and the response variable, y, into L2.
Step 2: Select [STAT], highlight TESTS, then choose option G, LinRegTInt.
Step 3: Enter the appropriate lists for X and Y, a frequency of one (unless you know otherwise), and the confidence level in decimal form. For example: Xlist: L1; Ylist: L2; Freq: 1; C-Level: .95.
Step 4: Select Calculate, then press [ENTER]. The calculator gives you the confidence interval values, b (slope of the sample least-squares regression line), df, s, a, r², and r.
Computer output: The slope, b, of the least-squares regression line is given in the computer output under Coefficient and the explanatory variable. The value of the standard error of the slope is also given in the computer output under StDev and the explanatory variable. Finally, use Table B with df = n - 2 and the confidence level to determine the corresponding critical value t*.
Conclusion: We are __% confident that the slope of the true linear relationship between [the y variable] and [the x variable] is between [lower value] and [upper value].
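For reference, the pieces of the interval can be written out in symbols. The card's own formula appears only in the photo, so the following is the standard textbook form rather than a transcription; the symbols $s$ (standard error of the residuals), $\hat{y}_i$ (predicted value), and $\bar{x}$ (mean of x) are notation assumed here.

```latex
\[
b \pm t^{*}\, s_b,
\qquad
s_b = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
\]
```

Here $t^{*}$ is the critical value for the chosen confidence level with $n - 2$ degrees of freedom.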

The computer output is shown from the least-squares regression analysis on the golf scores of 12 randomly selected members of college women's golf teams in two rounds of tournament play. Assume the conditions necessary for inference for linear regression are present. Is there sufficient evidence to claim a relationship between the first-round score and the second-round score? Computer output in photo*

Parameter: Let β equal the true slope of the regression line for predicting the second-round score from the first-round score. H0: β=0 Ha: β≠0
Conditions: It is stated you have the conditions necessary for inference for linear regression. Because the conditions check, you will be calculating a t-test for the slope of the regression line at the α = 0.05 significance level. (Remember, if a significance level is not specified, α = 0.05 is an accepted level.)
Calculations: Computer output: in photo. Calculate by hand: t = (b−β0)/(Sb) = (0.9343 − 0)/(0.1345) = 6.9465. Notice that b (the slope) was found under Coefficients for the explanatory variable (round-one score), Sb is given under Standard Error for the explanatory variable, and the t statistic is given under t Stat for the explanatory variable. You are also given the p-value under p-Value for the explanatory variable. With df = 12−2 = 10 and t = 6.947676, the computer printout gives p = 0.0000396. The hand-calculated t differs slightly from the printout value because b and Sb are rounded.
Conclusion: Because p = 0.0000396 is less than α = 0.05, we reject the null hypothesis. We have sufficient evidence that there is a linear relationship between the first-round score and the second-round score.

The table gives the scores of 12 randomly selected members of college women's golf teams in two rounds of tournament play. Is there sufficient evidence to claim a relationship between the first-round score and the second-round score? Test the hypothesis at the 0.05 level of significance.

Parameter: Let β equal the true slope of the regression line for predicting the second-round score from the first-round score. H0: β=0 Ha: β≠0
Conditions: Linear: The scatterplot shows a clear linear form. Check this using a calculator. The residual plot also shows a random scatter of points about the residual line. Independent: Each golf player is independent of the others. You can assume there are at least 10(12) = 120 college women's golf players. Normal: The histogram of the residuals is unimodal and roughly symmetric. Equal variance: The residual plot shows an equal amount of scatter around the horizontal line. Random: The problem states 12 college-age female golf players were selected randomly. Because the conditions check, you will be calculating a t-test for the slope of the regression line at the α = 0.05 significance level.
Calculations: Using a calculator: First, find the least-squares regression line using your calculator, with the result ŷ = 5.8088 + 0.9343x, where x represents the round-one score and ŷ represents the predicted round-two score. Then, perform a linear regression t-test using the data, and obtain t = 6.9477 and p = 3.9583 × 10^-5 = 0.0000396.
Conclusion: Because p = 0.0000396 is less than α = 0.05, we reject the null hypothesis. We have sufficient evidence that there is a linear relationship between the first-round score and the second-round score.

The computer output from the least-squares regression analysis of the high school GPAs and college freshmen GPAs of nine randomly selected students at New Burlington University is given in the table. Assume the conditions necessary for inference for linear regression are present. Construct and interpret a 95% confidence interval for the slope of the population regression line. computer output in photo

Parameter: We want to estimate the true slope, β, of the population regression for predicting college freshman GPA from high school GPA.
Conditions: It is stated you can assume the conditions necessary for inference for linear regression are present. Because the conditions are met, you will be calculating a 95% t-interval for the slope, β, of the regression line.
Calculations: b ± t*Sb = 1.157 ± 2.365(0.1596) = (0.7795, 1.5345). With df = 9 - 2 = 7 and t* = 2.365, the computer printout gives you a 95% confidence interval of (0.7795, 1.5345).
Conclusion: We are 95% confident that the slope of the true regression line between college freshmen GPA and high school GPA is between 0.7795 and 1.5345.
Notice the answers from the raw data and from the computer output are almost exactly the same. They differ slightly because calculations using the computer output are based on rounded values, whereas the calculator method uses the exact values from the original data. It's just the process of getting to the answer that's slightly different.

An admissions officer at New Burlington University wants to see how well high school grade point average (GPA) predicts college freshmen GPAs. The table shows a random sample of nine students from New Burlington University and their GPA. table in photo Construct and interpret a 95% confidence interval for the slope of the population regression line between high school GPA and college freshmen GPA.

Parameter: You want to estimate the true slope, β, of the population regression for predicting college freshmen GPA from high school GPA.
Conditions: Linear: The scatterplot shows a clear linear form. The residual plot also shows a random scatter of points about the residual line. Independent: Each student is independent of the others. There are at least 10(9) = 90 students at New Burlington University. Normal: The histogram of the residuals is unimodal and symmetric. Equal variance: The residual plot shows an equal amount of scatter around the horizontal line. Random: You are told the nine New Burlington University students were selected randomly. Because the conditions are met, you will be calculating a 95% t-interval for the slope, β, of the regression line.
Calculations: Using technology: Performing a linear regression t-interval using a calculator, you obtain a 95% confidence interval of (0.7796, 1.5344), with df = 9 - 2 = 7.
Conclusion: We are 95% confident that the slope of the true regression line between college freshmen GPA and high school GPA is between 0.7796 and 1.5344.

Practice 3 Researchers believe there is a linear relationship between the year (since 1905) and the record for the women's 800-meter race. The partial computer output from a linear regression test is shown in photo. What is the value of the t-test statistic for H0:β = 0?

−15.1512

