3.1 Hypothesis Testing

Developing Hypotheses

- Randomly select a sample from the population.
- Use what we learn from the sample to test whether the hypothesis is true.
- Historically, the average customer satisfaction rating is 6.7.
- Construct a range around 6.7.
- If the sample mean falls within the range, we do not have sufficient evidence to support the hypothesis that customer satisfaction has changed.

If the two-sided p-value of a given sample mean is 0.0040, what is the one-sided p-value for that sample mean?

- 0.0020 (correct): The one-sided p-value is half of the two-sided p-value. Since the two-sided p-value is 0.0040, the one-sided p-value is 0.0040/2 = 0.0020.
- 0.0040: The one-sided p-value is half of the two-sided p-value.
- 0.0080: The one-sided p-value is half of the two-sided p-value.
- The answer cannot be determined without further information: Since we know the two-sided p-value, we can calculate the one-sided p-value of the sample mean.

If the one-sided p-value of a given sample mean is 0.0150, what is the two-sided p-value for that sample mean?

- 0.0075: The two-sided p-value is double the one-sided p-value.
- 0.0150: The two-sided p-value is double the one-sided p-value.
- 0.0300 (correct): The two-sided p-value is double the one-sided p-value. Since the one-sided p-value is 0.0150, the two-sided p-value is 0.0150*2 = 0.0300.
- The answer cannot be determined without further information: Since we know the one-sided p-value, we can calculate the two-sided p-value of the sample mean.
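The two conversions in the questions above can be sketched in a few lines of Python. The function names are hypothetical, and the halving rule assumes a symmetric distribution of sample means with the sample mean lying on the side predicted by the alternative hypothesis.

```python
def one_sided_from_two_sided(p_two_sided: float) -> float:
    """Halve a two-sided p-value.

    Assumption: the sampling distribution is symmetric and the sample mean
    lies in the direction stated by the alternative hypothesis.
    """
    return p_two_sided / 2


def two_sided_from_one_sided(p_one_sided: float) -> float:
    """Double a one-sided p-value, capped at 1 because it is a probability."""
    return min(1.0, 2 * p_one_sided)


print(round(one_sided_from_two_sided(0.0040), 4))
print(round(two_sided_from_one_sided(0.0150), 4))
```

The two printed values correspond to the quiz answers 0.0020 and 0.0300.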

If the significance level of a hypothesis test is 10%, for which of the following p-values would you reject the null hypothesis? Select all that apply.

We reject the null hypothesis if the mean of our sample falls within the rejection region. The area of the rejection region is equal to the significance level, so we reject the null hypothesis when the p-value is less than the significance level. Remember: the lower the p-value, the stronger the evidence against the null hypothesis.

- 0.08 (correct): Since 0.08 is less than 0.10, we would reject the null hypothesis.
- 0.89: Since 0.89 is greater than 0.10, we would fail to reject the null hypothesis.
- 0.05 (correct): Since 0.05 is less than 0.10, we would reject the null hypothesis.
- 0.11: Since 0.11 is greater than 0.10, we would fail to reject the null hypothesis.
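As a minimal sketch of the decision rule, the comparison can be written as a small Python function. The name decide is hypothetical; the rule itself (reject when the p-value is below the significance level) is the one stated in the explanation.

```python
def decide(p_value: float, alpha: float) -> str:
    """Reject the null hypothesis only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


alpha = 0.10  # 10% significance level from the question
for p in (0.08, 0.89, 0.05, 0.11):
    print(p, decide(p, alpha))
```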

Two Outcomes of a Hypothesis Test

1. Reject the null hypothesis.
2. Fail to reject the null hypothesis, due to insufficient evidence.

If you are performing a hypothesis test based on a 0.10 significance level (10%), what are your chances of making a type I error?

10% (correct): The probability of a type I error, that is, of rejecting the null hypothesis when it is actually true, is equal to the significance level (which is 1 - confidence level). A 10% significance level indicates that there is a 10% chance of making a type I error.

If we specify a 75% confidence level, what percentage of sample means do we expect to fall in the rejection region?

25% (correct): The significance level equals the area of the rejection region, and the significance level equals 1 - confidence level. In this case, 1 - 0.75 = 0.25, that is, 25%.
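The link between the confidence level and the area of the rejection region can be checked numerically. The sketch below assumes a normal distribution of sample means and a two-sided test; the function name is hypothetical.

```python
from statistics import NormalDist


def rejection_region_area(confidence_level: float) -> float:
    """Total area in both tails outside the range of likely sample means."""
    alpha = 1 - confidence_level
    # Cut-off, measured in standard errors from the null mean.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Area beyond +z plus area beyond -z under the standard normal curve.
    return 2 * (1 - NormalDist().cdf(z))


print(round(rejection_region_area(0.75), 4))
```

For a 75% confidence level the computed area comes back as 0.25, matching 1 - 0.75.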

3.2 Summary

Lesson Summary

We use hypothesis tests to substantiate a claim about a population mean (or other population parameter).

- The null hypothesis (H0) is a statement about the population on a topic of interest. It is typically based on historical information or conventional wisdom. We always start a hypothesis test by assuming that the null hypothesis is true and then test to see if we can nullify it using evidence from a sample. The null hypothesis is the opposite of the hypothesis we are trying to prove (the alternative hypothesis).
- The alternative hypothesis (Ha) is the theory or claim we are trying to substantiate.

Before conducting a hypothesis test:

- Determine whether to analyze a change in a single population or compare two populations.
  - We perform a single-population hypothesis test when we want to determine whether a population's mean is significantly different from its historical average.
  - We perform a two-population hypothesis test when we want to compare the means of two populations, for example, when we want to conduct an experiment and test for a difference between a control and treatment group.
- Determine whether to perform a one-sided or two-sided hypothesis test.
  - We perform a two-sided test when we do not have strong convictions about the direction of a change. Therefore we test for a change in either direction.
  - We perform a one-sided test when we have strong convictions about the direction of a change, that is, we know that the change is either an increase or a decrease.

To conduct a hypothesis test, we must follow these steps:

1. State the null and alternative hypotheses.
2. Choose the level of significance for the test.
3. Gather data about a sample or samples.
4. To determine whether the sample is highly unlikely under the assumption that the null hypothesis is true, construct the range of likely sample means or calculate the p-value.

The p-value is the likelihood of obtaining a sample at least as extreme as the one we've obtained, if the null hypothesis is true. The p-value of a one-sided hypothesis test is half the p-value of a two-sided hypothesis test.

- If the sample mean falls in the range of likely sample means, or if its p-value is greater than the stated significance level, we do not have sufficient evidence to reject the null hypothesis.
- If the sample mean falls in the rejection region, or if it has a p-value lower than the stated significance level, we have sufficient evidence to reject the null hypothesis.
- We can never accept the null hypothesis.

Trade-offs: The higher the confidence level (and therefore the lower the significance level), the lower the chance of rejecting the null hypothesis when it is true (type I error or false positive). But the higher the confidence level, the higher the chance of not rejecting it when it is false (type II error or false negative).

Excel Summary

- Calculating the range of likely sample means using CONFIDENCE.NORM or CONFIDENCE.T
- Calculating the p-value using =T.TEST(array1, array2, tails, type)
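The steps listed in the summary can be sketched end to end in Python. This is an illustration under stated assumptions rather than the course's method: it uses the normal approximation (reasonable for samples of 30 or more) instead of Excel's T.TEST, so its p-value can differ slightly from T.TEST values quoted in these notes. The function name z_test and its return format are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(null_mean, sample_mean, sample_sd, n, alpha=0.05, tails=2):
    """Single-population test of a mean using the normal approximation."""
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Chance of a sample mean at least this far from the null mean, in one
    # direction. For tails=1 this assumes the sample mean lies on the side
    # predicted by the alternative hypothesis.
    one_tail = 1 - NormalDist().cdf(abs(z))
    p_value = one_tail if tails == 1 else 2 * one_tail
    return {"z": z, "p_value": p_value, "reject_null": p_value < alpha}


# Movie theater example: H0: mu = 6.7, sample mean 7.3, s = 2.8, n = 196.
result = z_test(6.7, 7.3, 2.8, 196, alpha=0.05, tails=2)
print(result)
```

With these inputs the z-value is about 3 and the two-sided p-value is roughly 0.0027, so the null hypothesis is rejected at the 0.05 level.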

3.2.1 Developing Hypotheses

3.2.2 Constructing a Range of Likely Sample Means

3.2.3 Using p-values

3.2.4 Type I and Type II Errors

3.2.5 One-sided Testing

3.2.6 Comparing Two Populations

3.3.1 The Shopping Cart A/B Test

3.3.3 The Magazine A/B Test

What is the significance level for a 95% confidence level?

5% (correct): Significance level = 1 - confidence level. 1 - 0.95 = 0.05, that is, 5%.

Now suppose we take a sample of 25 students, taking the same standardized test, which has a mean score of 500 and a standard deviation of 100, and find that the average score of this sample is 530. Which function would correctly calculate the 95% range of likely sample means under the null hypothesis?

- 530 ± CONFIDENCE.NORM(0.05,100,25): The range of likely sample means is centered at the historical population mean, 500, not at the sample mean, 530. In addition, because of the small sample size, we cannot assume that the sample means are normally distributed, so we should not use the CONFIDENCE.NORM function.
- 530 ± CONFIDENCE.T(0.05,100,25): The range of likely sample means is centered at the historical population mean, 500, not at the sample mean, 530.
- 500 ± CONFIDENCE.T(0.05,100,25) (correct): The range of likely sample means is centered at the historical population mean, 500. Because our sample size is less than 30, we cannot assume that the sample means are normally distributed, and so we should use CONFIDENCE.T rather than the CONFIDENCE.NORM function.
- 500 ± CONFIDENCE.NORM(0.05,100,25): Because of the small sample size, we cannot assume that the sample means are normally distributed, so we should not use the CONFIDENCE.NORM function.

The mean score on a particular standardized test is 500, with a standard deviation of 100. To assess whether a training course has been effective in improving scores on the test, we take a random sample of 100 students from the course and find that the average score of this sample is 550. Which function would correctly calculate the 95% range of likely sample means under the null hypothesis?

- 550 ± CONFIDENCE.NORM(0.05,100,100): The range of likely sample means is centered at the historical population mean, 500, not at the sample mean, 550.
- 550 ± CONFIDENCE.T(0.05,100,100): Because our sample size is larger than 30, we can assume the distribution of sample means is roughly normal, due to the central limit theorem, and use the CONFIDENCE.NORM function. In addition, the range of likely sample means is centered at the historical population mean, 500, not at the sample mean, 550.
- 500 ± CONFIDENCE.T(0.05,100,100): Because our sample size is larger than 30, we can assume the distribution of sample means is roughly normal, due to the central limit theorem, and use the CONFIDENCE.NORM function.
- 500 ± CONFIDENCE.NORM(0.05,100,100) (correct): The range of likely sample means is centered at the historical population mean, 500. Because our sample size is larger than 30, we can assume the distribution of sample means is roughly normal, due to the central limit theorem, and use the CONFIDENCE.NORM function.
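For readers working outside Excel, the margin of error that CONFIDENCE.NORM returns can be reproduced with Python's standard library. The helper below is a hypothetical stand-in that mirrors the Excel argument order (alpha, standard_dev, size); it is a sketch of the z * sigma / sqrt(n) formula, not Excel's own code.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha: float, standard_dev: float, size: int) -> float:
    """Margin of error z * sigma / sqrt(n) for a two-sided range."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


historical_mean = 500   # center of the range under the null hypothesis
sample_mean = 550       # observed mean of the 100 students

margin = confidence_norm(0.05, 100, 100)
lower, upper = historical_mean - margin, historical_mean + margin

print(round(lower, 1), round(upper, 1))
print(lower <= sample_mean <= upper)  # False means the sample mean is in the rejection region
```

The range comes out at roughly 480.4 to 519.6, so a sample mean of 550 falls outside it.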

Now suppose we take a sample and find the average satisfaction rating to be 7.3. What should be the center of the range of likely sample means? Remember that H0:μ=6.7 and Ha:μ≠6.7.

6.7 (correct): We always start a hypothesis test by assuming that the null hypothesis is true. Thus, the center of the range of likely sample means is the historical average (the average specified by the null hypothesis), which in this case is 6.7. Remember, the null hypothesis is that showing old classics has not changed the average satisfaction rating.

Suppose we wanted to calculate a 90% range of likely sample means for the movie theater example. Select the function that would correctly calculate this range.

6.7 ± CONFIDENCE.NORM(0.10,2.8,196) (correct): The range of likely sample means is centered at the historical population mean, in this case 6.7. Since this is a 90% range of likely sample means, alpha equals 0.10.

If you are performing a hypothesis test based on a 20% significance level, what are your chances of making a type I error?

- 80%: The probability of a type I error is equal to the significance level, which is 1 - confidence level.
- 10%: The probability of a type I error is equal to the significance level, which is 1 - confidence level.
- 20% (correct): The probability of a type I error is equal to the significance level, which is 1 - confidence level.
- It is not possible to tell without more information: The significance level provides the necessary information. The probability of a type I error is equal to the significance level, which is 1 - confidence level.

If you are performing a hypothesis test based on a 90% confidence level, what are your chances of making a type II error?

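- 90%
- 10%
- 5%
- It is not possible to tell without more information (correct): The confidence level fixes only the probability of a type I error. The probability of a type II error also depends on how far the true population mean is from the hypothesized mean, on the sample size, and on the variability of the data.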
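To see why the confidence level alone does not determine the type II error rate, the sketch below computes that probability for several assumed true means. It borrows the movie theater figures from these notes (null mean 6.7, standard deviation 2.8, n = 196) and assumes normally distributed sample means; the candidate true means and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_probability(null_mean, true_mean, sd, n, confidence_level):
    """Chance the sample mean lands inside the range of likely sample means
    (so we fail to reject H0) when the population mean is really true_mean."""
    se = sd / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    lower, upper = null_mean - z * se, null_mean + z * se
    sampling = NormalDist(mu=true_mean, sigma=se)
    return sampling.cdf(upper) - sampling.cdf(lower)


# Same 90% confidence level each time, but a different assumed true mean.
for true_mean in (6.8, 7.0, 7.3):
    beta = type_ii_probability(6.7, true_mean, 2.8, 196, 0.90)
    print(true_mean, round(beta, 3))
```

The probability shrinks as the true mean moves further from 6.7, which is why it cannot be read off the confidence level.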

If you are performing a hypothesis test based on a 90% confidence level, what are your chances of making a type I error?

- 90%: The probability of a type I error is equal to the significance level, which is 1 - confidence level.
- 10% (correct): The probability of a type I error is equal to the significance level, which is 1 - confidence level. A 90% confidence level indicates that the significance level is 10%. Therefore there is a 10% chance of making a type I error.
- 5%: The probability of a type I error is equal to the significance level, which is 1 - confidence level.
- It is not possible to tell without more information: The confidence level provides the necessary information. The probability of a type I error is equal to the significance level, which is 1 - confidence level.

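95% is the most commonly used threshold level of likelihood (confidence level).

About 95% of sample means fall within two standard deviations of the historical mean, where the standard deviation is that of the distribution of sample means. More precisely, the z-value for 95% is 1.96. Use the Excel function CONFIDENCE.NORM to calculate the margin of error, then subtract it from and add it to the historical mean to obtain the lower and upper bounds of the range.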

To perform a two-sample test in Excel, we use the same T.TEST function we used earlier. The only difference is that we use the actual data from the second sample for our second column of data.

=T.TEST(array1, array2, tails, type)

- array1 is a set of numerical values or cell references. This will be one sample.
- array2 is a set of numerical values or cell references. This will be the other sample.
- tails is the number of tails for the distribution. It should be set to 1 to perform a one-sided test, or to 2 to perform a two-sided test.
- type can be 1, 2, or 3.
  - Type 1 is a paired test and is used when the same group is tested twice to provide paired "before and after" data for each member of the group.
  - Type 2 is an unpaired test in which the samples are assumed to have equal variances.
  - Type 3 is an unpaired test in which the samples are assumed to have unequal variances.

Since we have no reason to believe that the variances of our two samples are the same, we use type 3. There are ways to test whether variances are equal, but when in doubt, use type 3.
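The unequal-variance (type 3) test is commonly known as Welch's t-test. As a hedged illustration of what sits behind the p-value, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only Python's standard library. It stops short of the p-value, which requires the t-distribution and is not available in the standard library. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical control and treatment measurements, for illustration only.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
treatment = [12.6, 12.9, 12.4, 13.1, 12.8, 12.5, 13.0, 12.7]

t, df = welch_t(treatment, control)
print(round(t, 2), round(df, 1))
```

A positive t here means the treatment mean is larger than the control mean; the degrees of freedom always fall between the smaller sample size minus one and the combined sample size minus two.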

Performing a one-sided hypothesis test using Excel is very similar to performing a two-sided test. To calculate the p-value for the sample mean we use the same function we learned about earlier. The only difference in setting up a one-sided test versus a two-sided test is the number we assign to the tails argument: 1 for a one-sided test and 2 for a two-sided test.

=T.TEST(array1, array2, tails, type)

- array1 is a set of numerical values or cell references. We will place our sample data in this range.
- array2 is a set of numerical values or cell references. We have only one set of data, so we will use the historical mean, 6.7, as the second data set. To do this, we create a column with each entry equal to 6.7.
- tails is the number of tails for the distribution. It can be either 1 or 2. Now that we are performing a one-sided test, we will enter a 1 instead of a 2.
- type can be 1, 2, or 3.
  - Type 1 is a paired test and is used when the same group is tested twice to provide paired "before and after" data for each member of the group.
  - Type 2 is an unpaired test in which the samples are assumed to have equal variances.
  - Type 3 is an unpaired test in which the samples are assumed to have unequal variances.

The variances of the two columns are clearly different in our case, so we use type 3. There are ways to test whether variances are equal, but when in doubt, use type 3.
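One way to see why a column of constant 6.7 values works as the second data set: under the standard unequal-variance (Welch) formulas, a zero-variance second sample reduces the test to an ordinary one-sample t-test with n - 1 degrees of freedom. The sketch below checks this numerically. The ratings are hypothetical, and the claim concerns the textbook formulas rather than Excel's internal implementation.

```python
from math import isclose, sqrt
from statistics import mean, stdev

# Hypothetical satisfaction ratings, for illustration only.
ratings = [7.5, 6.0, 8.5, 7.0, 9.0, 6.5, 8.0, 7.5, 5.5, 8.5]
historical_mean = 6.7
n = len(ratings)

# One-sample t statistic against the historical mean.
standard_error = stdev(ratings) / sqrt(n)
t_one_sample = (mean(ratings) - historical_mean) / standard_error

# Unequal-variance formulas with a second column made entirely of 6.7s.
se2_a = standard_error ** 2   # squared standard error of the real sample
se2_b = 0.0                   # a constant column has zero variance
t_welch = (mean(ratings) - historical_mean) / sqrt(se2_a + se2_b)
df_welch = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n - 1) + se2_b ** 2 / (n - 1))

print(isclose(t_one_sample, t_welch), isclose(df_welch, n - 1))
```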

A car manufacturing executive introduces a new method to install a car's brakes that is much faster than the previous method. He needs to test whether the brakes installed with the new method are as safe and effective as those installed with the previous method. His null hypothesis is that the brakes installed using the new method are as safe as those installed using the old method. In this situation, would it be worse to make a type I error or a type II error?

- Type I: A type I error, or false positive, would be that the brakes installed using the new method are safe but the manufacturer deems them unsafe and returns to the previous, slower method of installation. This would reduce the manufacturer's efficiency, but would not compromise the cars' safety.
- Type II (correct): A type II error, or false negative, would be that the brakes are actually not safe but the manufacturer deems them safe and proceeds with the new installation method. This would be worse than returning to the slower method, because the unsafe cars could cause injuries or fatal accidents.

The specific method of hypothesis testing that Amazon uses is called

A/B testing, one of the most commonly used tools for web design optimization.

Base your decisions upon the relative cost of making each type of error.

Before we determine the significance of the results, let's look at the direction of the change. What is the effect of changing the shopping cart design on Total Units Ordered (Units) and Ordered Product Sales (OPS)?

- Units increased and OPS decreased
- Units decreased and OPS increased
- Units increased and OPS increased (correct): For each test, the mean of the treatment is larger than the mean of the control. This indicates that changing the design of the shopping cart increased both Units and OPS. The Mean Difference and % Mean Difference confirm the increases.
- Units decreased and OPS decreased

Hint given for each incorrect option: Look at the difference between the mean of the control and the mean of the treatment for each test. If the mean of the treatment is larger than the mean of the control, then the design change increased that test's metric; if the mean of the treatment is smaller than the mean of the control, then the design change decreased it.

The manager now has reason to believe that showing old classics has increased the customer satisfaction rating. Recall that the historical average satisfaction rating was 6.7 and that the random sample of 196 moviegoers has an average satisfaction rating of 7.3 and a standard deviation of 2.8. Calculate the upper bound of the 95% range of likely sample means for this one-sided hypothesis test using the CONFIDENCE.NORM function.

CONFIDENCE.NORM finds the margin of error for a two-sided hypothesis test but we are interested in the upper bound of a one-sided test. To find the upper bound for the one-sided test we must first determine what two-sided test would have a 5% rejection region on the right side. Since the distribution of sample means is symmetric, a two-sided test with a 10% significance level would have a 5% rejection region on the left side of the normal distribution and a 5% rejection region on the right side. Thus, the upper bound for a two-sided test with alpha=0.1 will be the same as the upper bound on a one-sided test with alpha=0.05. The margin of error is CONFIDENCE.NORM(0.1,C3,C4)=0.33. The upper bound of the 95% range of likely sample means for this one-sided hypothesis test is the population mean plus the margin of error, which is approximately 6.7+0.33=7.03.
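The one-sided upper bound can be reproduced directly from the normal distribution. This sketch assumes normally distributed sample means and uses the figures from the example (6.7, 2.8, 196); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

historical_mean, sample_sd, n = 6.7, 2.8, 196
alpha_one_sided = 0.05

# One-sided cut-off: the whole 5% rejection region sits in the upper tail.
z = NormalDist().inv_cdf(1 - alpha_one_sided)
margin = z * sample_sd / sqrt(n)
upper_bound = historical_mean + margin

# A two-sided range with alpha = 0.10 leaves the same 5% in the upper tail,
# which is why CONFIDENCE.NORM(0.1, ...) gives the same margin.
z_two_sided = NormalDist().inv_cdf(1 - 0.10 / 2)

print(round(margin, 2), round(upper_bound, 2), z == z_two_sided)
```

The margin rounds to 0.33 and the upper bound to 7.03, in line with the worked answer; the sample mean of 7.3 lies above that bound.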

Suppose the average satisfaction rating of the sample is 7.0 out of 10. Which of the following do you think would be the correct conclusion? Remember that H0:μ=6.7 and Ha:μ≠6.7.

Do not reject the null hypothesis (correct): Although we can't be completely sure without doing the analysis, it would probably not be that unusual to draw a sample that has a mean of 7.0 if the average customer satisfaction rating has not changed and is still 6.7. Therefore, we would probably fail to reject the null hypothesis. To be certain whether this is the case, we would have to complete the hypothesis test, that is, construct the range around the historical population mean and see whether or not 7.0 falls in that range.

Suppose the average satisfaction rating of the sample is 6.8 out of 10. Which of the following do you think would be the correct conclusion? Remember that H0:μ=6.7 and Ha:μ≠6.7.

Do not reject the null hypothesis (correct): If the average customer satisfaction rating has not changed (μ=6.7), it would not be unusual to draw a sample that has a mean of 6.8. Therefore, we would probably fail to reject the null hypothesis.

Excel's Data Analysis Tool Pack provides another tool, called the t-Test: Two-Sample Assuming Unequal Variances tool, for finding the p-value when we assume unequal variances. This tool provides information in addition to the p-value. The inputs for the tool are similar to the T.TEST function. We enter the ranges of our samples into Variable 1 Range and Variable 2 Range. The Hypothesized Mean Difference is 0, because our null hypothesis assumes no change, and alpha is the significance level. The default significance level is 0.05.

We have found that for the movie theater example, the p-value for the one-sided hypothesis test is 0.0013. Assuming a 0.05 significance level, what would you conclude?

- Fail to reject the null hypothesis: Because the p-value is less than the specified significance level of 0.05, we reject the null hypothesis. What can we conclude after rejecting it?
- Reject the null hypothesis and conclude that the average satisfaction rating is no longer 6.7: Because the p-value is less than the specified significance level of 0.05, we reject the null hypothesis. If this were a two-sided hypothesis test, the accurate conclusion would be that the average satisfaction rating is no longer equal to 6.7. For this one-sided test, what is the alternative hypothesis, the claim we wish to substantiate?
- Reject the null hypothesis and conclude that the average satisfaction rating has increased (correct): Because the p-value is less than the specified significance level of 0.05, we reject the null hypothesis. Our alternative hypothesis, the claim we wish to substantiate, is μ>6.7, so by rejecting the null hypothesis we are able to conclude that the average satisfaction rating has increased.
- Reject the null hypothesis and conclude that the average satisfaction rating has decreased: Because the p-value is less than the specified significance level of 0.05, we reject the null hypothesis. For this one-sided test, what is the alternative hypothesis, the claim we wish to substantiate?

How would you interpret the p-value of 0.0026?

If the null hypothesis is true, the likelihood of obtaining a sample with a mean at least as extreme as 7.3 is 0.26% (correct): The p-value of 0.0026 indicates that if the population mean were actually still 6.7, there would be a very small possibility, just 0.26%, of obtaining a sample with a mean at least as extreme as 7.3. Equivalently, since 7.3 - 6.7 = 0.6, this p-value tells us that if the null hypothesis is true, the probability of obtaining a sample with a mean less than 6.7 - 0.6 = 6.1 or greater than 6.7 + 0.6 = 7.3 is 0.26%.

If we use the most commonly used significance level of 0.05, we draw our conclusions on whether the sample's p-value is less than or greater than 0.05.

If the p-value is less than 0.05, we reject the null hypothesis. If the p-value is greater than or equal to 0.05, we fail to reject the null hypothesis. It is always important to use your managerial judgment when making decisions, especially when the p-value is very close to the significance level.

In this concept, we'll formalize the process of finding the lower and upper cut-off points, that is, we'll construct a range of likely sample means for the movie theater example. We will use this range to determine precisely when we should reject the null hypothesis, and when we should fail to reject it.

In the previous concept we were quite comfortable saying that we would reject the null hypothesis if the sample had a mean as low as 3.5 or as high as 9.9—both of these values are very far from the historical population mean. Likewise, if the sample had a mean of 6.8, we were pretty sure that we would fail to reject the null hypothesis because 6.8 is so close to 6.7. However, when it came to determining what to do if the sample had a mean of 7.0, we were a little less certain. Where is the cut-off point? At what point do we reject the null hypothesis? 7.1? 7.2? 7.3? 7.4? What about samples with means below 6.7? Where is the cut-off point? 6.3? 6.2? 6.1?

We create a clear hypothesis by formulating a null hypothesis (H0) and an alternative hypothesis (Ha).

Jury Trial Analogy

- The null hypothesis is that the accused is innocent.
- The alternative hypothesis is that the accused is guilty.
- The two possible outcomes are "guilty" or "not guilty". We never declare that the accused is innocent.

In a jury trial, we cannot conclude that the accused is innocent. Similarly, in the movie theater example, even if a sample's average satisfaction rating is close to 6.7, we cannot conclude that the average satisfaction still is 6.7. We never accept the null hypothesis; we simply do not reject it, that is, we "fail to reject" it. If we reject the null hypothesis, we essentially accept the alternative hypothesis.

Suppose we want to know whether students who attend a top business school have higher earnings than those who attend lower-ranked business schools. To find out, we collect the average starting salaries of recent graduates from the top 100 business schools in the U.S. We then compare the salaries of those who attended the schools ranked in the top 50 to the salaries of those who did not. Should we perform a one-sided hypothesis test or a two-sided test?

- One-sided (correct): Since we are interested only in whether the average salaries of people who attended the top 50 business schools are higher than the salaries of those who did not, we should perform a one-sided test. If we were interested in learning whether the salaries of the people who went to the top 50 business schools were different (either higher or lower) than those from the other schools, we would conduct a two-sided test.
- Two-sided (incorrect): See correct answer for explanation.

On the basis of the resulting p-value, would we reject the null hypothesis or fail to reject the null hypothesis at the 0.05 significance level?

- Reject the null hypothesis (correct): Because the p-value, 0.0000, is less than the significance level, we should reject the null hypothesis.
- Fail to reject the null hypothesis: See correct answer for explanation.

Suppose the average satisfaction rating of the sample is 3.5 out of 10. Which of the following do you think would be the correct conclusion? Remember that H0:μ=6.7 and Ha:μ≠6.7.

Reject the null hypothesis (correct): The null hypothesis is that the average satisfaction rating has not changed (μ=6.7). Drawing a sample with an average satisfaction rating of 3.5 from a population that has an average rating of 6.7 is extremely unlikely, so we would reject the null hypothesis and conclude that the average satisfaction rating is no longer 6.7. Note that 3.5 is the same distance (3.2) from 6.7 as 9.9 is from 6.7. Since the distribution of sample means is symmetric, we can conclude that 3.5 and 9.9 have the same (very low) likelihood of being drawn from a population with a mean of 6.7. We will see shortly the key roles the distribution of sample means and the central limit theorem play in hypothesis testing.

Let's return to the movie theater example and focus on the sample taken after the manager changes the theater's artistic focus. Suppose the average satisfaction rating of the sample is 9.9 out of 10. Which of the following do you think would be the correct conclusion? Remember that H0:μ=6.7 and Ha:μ≠6.7.

Reject the null hypothesis. The null hypothesis is that the average satisfaction rating has not changed, that is, that the population mean μ is still equal to 6.7. Drawing a sample with an average satisfaction rating of 9.9 from a population that has an average rating of 6.7 is extremely unlikely, so we would almost certainly reject the null hypothesis and conclude that the average satisfaction rating is no longer 6.7.

Suppose again that the movie theater manager had gathered a sample that had an average customer satisfaction rating of 7.05 but in this case had firm convictions that if the average rating had changed, it had increased. Given what you know about the relationship between the p-values of one-sided and two-sided tests, would you reject or fail to reject the null hypothesis, H0:μ≤6.7, at a 5% significance level? As noted above, for a two-sided test with H0:μ=6.7 and Ha:μ≠6.7, the p-value of 7.05 is approximately 0.07.

Reject the null hypothesis. The p-value for a one-sided hypothesis test is half the p-value of a two-sided test for the same value. The p-value for 7.05 for the two-sided hypothesis test was 0.07, so the p-value for 7.05 for the one-sided test is 0.035. Because 0.035 is less than the significance level, 0.05, we reject the null hypothesis and conclude that the average customer satisfaction rating has increased.

Note that the outcomes of one-sided and two-sided tests can be different. Just because we did not reject the null hypothesis for the two-sided test does not mean that we will have the same result for the one-sided test.
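The halving rule and the decision rule can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function names are hypothetical, and halving is valid only when the sample mean lies on the side stated by the alternative hypothesis (here 7.05 is above 6.7).

```python
def one_sided_from_two_sided(p_two_sided: float) -> float:
    """Halve a two-sided p-value.

    Assumes the sample mean lies in the direction stated by the
    alternative hypothesis.
    """
    return p_two_sided / 2


def reject_null(p_value: float, significance_level: float = 0.05) -> bool:
    """Reject the null hypothesis when the p-value is below the significance level."""
    return p_value < significance_level


p_two = 0.07  # two-sided p-value for the sample mean 7.05 (from the text)
p_one = one_sided_from_two_sided(p_two)

print(reject_null(p_two))  # decision for the two-sided test
print(reject_null(p_one))  # decision for the one-sided test
```

The two calls give different decisions, which is the point of the card: the same sample can fail to reject under a two-sided test and reject under a one-sided test.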

Suppose the movie theater manager had gathered a sample that had an average customer satisfaction rating of 7.05. For the two-sided test with H0:μ=6.7 and Ha:μ≠6.7, the p-value is approximately 0.07. Would you reject or fail to reject the null hypothesis, μ=6.7, at the 5% significance level?

Fail to reject the null hypothesis. Because the p-value, 0.07, is greater than the significance level, 0.05, we do not have enough evidence to reject the null hypothesis, so we would fail to reject it.

The significance level also defines the confidence level:

Significance Level=1-Confidence Level

=T.TEST(A2:A197,B2:B197,2,3)

Since the p-value, 0.0026, is less than the 0.05 significance level, we reject the null hypothesis and conclude that the customer satisfaction rating has changed.

T-test

Previously, based on the sample standard deviation of 2.8, the sample size of 196, and the confidence level of 95%, we found that the range of likely sample means runs from about 6.3 to about 7.1. Since our sample mean of 7.3 fell outside of that range, we concluded that we had sufficient evidence to reject the null hypothesis that the mean customer satisfaction rating had not changed.

Now let's calculate the p-value for our sample of movie theater customers and find out exactly how unlikely it would be to select a sample that has an average customer satisfaction rating at least as extreme as 7.3, if the average customer satisfaction rating is actually still 6.7. "At least as extreme" means at least as far from 6.7 as 7.3 is, that is, outside of the range 6.7±0.6. Thus, in this case, the likelihood that we would obtain a sample at least as extreme as 7.3 is the likelihood of obtaining a sample mean less than or equal to 6.1 or greater than or equal to 7.3.

Although there are multiple ways to calculate a p-value in Excel, we will use a t-test, the most common method used for hypothesis tests. The t-test uses a t-distribution, which provides a more conservative estimate of the p-value when the sample size is small. Recall that as the sample size increases, the t-distribution converges to a normal distribution, so a t-distribution can be used for large samples as well. Companies tend to use the t-distribution rather than the normal distribution because it is safe for both small and large samples.

Unfortunately, the process for conducting a one-sample hypothesis test in Excel is a bit unwieldy. To use Excel's T.TEST function for a hypothesis test with one sample, we must create a second column of data that will act as a second sample. We will walk through how to use the T.TEST function, but please understand that the most important thing to take away from this discussion is the interpretation of the p-value.

=T.TEST(array1, array2, tails, type)

array1 is a set of numerical values or cell references. We will place our sample data in this range.
array2 is a set of numerical values or cell references. We have only one set of data, so we will use the historical mean, 6.7, as the second data set. To do this, we create a column with each entry equal to 6.7.
tails is the number of tails for the distribution. It can be either 1 or 2. We will learn more about what this means later in the module. Since our alternative hypothesis is that the mean has changed and therefore can be either lower or higher than the historical mean, we will be using a two-tailed, or two-sided, hypothesis test.
type can be 1, 2, or 3. Type 1 is a paired test and is used when the same group is tested twice to provide paired "before and after" data for each member of the group. Type 2 is an unpaired test in which the samples are assumed to have equal variances. Type 3 is an unpaired test in which the samples are assumed to have unequal variances. The variances of the two columns are clearly different in our case (the constant column has zero variance), so we use type 3. There are ways to test whether variances are equal, but when in doubt, use type 3.
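As a rough cross-check on the numbers in this card, the range of likely sample means and the two-sided p-value can be approximated from the summary statistics alone. The sketch below is an assumption-laden illustration in Python: it uses the normal distribution instead of the t-distribution (reasonable for a sample of 196) and works from the rounded mean 7.3 and standard deviation 2.8 instead of the raw ratings, so it is not expected to match Excel's T.TEST output exactly.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 6.7          # historical mean under the null hypothesis
sample_mean = 7.3  # rounded sample mean from the text
sample_sd = 2.8    # sample standard deviation from the text
n = 196            # sample size from the text

standard_error = sample_sd / sqrt(n)      # 2.8 / 14 = 0.2
z = (sample_mean - mu0) / standard_error  # distance from 6.7 in standard errors

# Two-sided p-value: chance of a sample mean at least 0.6 away from 6.7
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% range of likely sample means, centered on the historical mean
z_crit = NormalDist().inv_cdf(0.975)
low = mu0 - z_crit * standard_error
high = mu0 + z_crit * standard_error

print(round(p_two_sided, 4))
print(round(low, 1), round(high, 1))
```

The approximate p-value comes out near 0.0027, in line with the 0.0026 reported for the two-sided T.TEST, and the range rounds to the 6.3 to 7.1 quoted above.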

The Arrow A/B Test (Wrong)

Alternative Hypothesis (Ha)

The alternative hypothesis (the opposite of the null hypothesis) is the theory or claim we are trying to substantiate. If our data allow us to nullify the null hypothesis, we substantiate the alternative hypothesis.

What is the alternative hypothesis (Ha) of the movie theater example? Recall that the historical average customer satisfaction rating is 6.7 out of 10.

The alternative hypothesis is that the new artistic approach of showing old classics has changed the average satisfaction rating. Therefore Ha:μ≠6.7. Note that Ha:μ≠6.7 is the opposite of H0:μ=6.7, which confirms our understanding that the alternative hypothesis is the opposite of the null hypothesis.

Null Hypothesis (H0):

The null hypothesis is a statement about a topic of interest. It is typically based on historical information or conventional wisdom. We always start a hypothesis test by assuming that the null hypothesis is true and then test to see if we can nullify it—that's why it's called the "null" hypothesis. The null hypothesis is the opposite of the hypothesis we are trying to prove (the alternative hypothesis).

What is the null hypothesis (H0) of the movie theater example? Recall that the historical average customer satisfaction rating is 6.7 out of 10.

The null hypothesis is that the new artistic approach of showing old classics has not affected the average customer satisfaction rating; that is, the new average customer satisfaction rating is the same as its historical value of 6.7 out of 10. Therefore H0:μ=6.7.

Here are the average starting salaries of recent graduates from the top 100 U.S. business schools. Use the T.TEST function to test whether those attending a top 50 business school have higher earnings than those who do not. Remember that this is a one-tailed test, so we should set the tails variable to 1. Also, make sure to use test type 3 to indicate a two-sample test with unequal variances.
H0: μtop 50 ≤ μnot top 50
Ha: μtop 50 > μnot top 50

The p-value is T.TEST(A2:A51,B2:B51,1,3)=0.0000.
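For readers who want to see what a type 3 (two-sample, unequal variances) test computes, here is a minimal Python sketch of the test statistic and degrees of freedom commonly known as Welch's t-test. The function name `welch_t` and the toy numbers are hypothetical and are not the business school salary data. Turning the statistic into a p-value requires the t-distribution's CDF, which the Python standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unequal-variance t-test."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical toy numbers, NOT the salary data from the exercise
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_stat, dof)
```

For a one-sided test such as Ha: μtop 50 > μnot top 50, a large positive t statistic (first sample minus second) is what supports the alternative hypothesis.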

Below are the movie theater ratings in the sample we used for a two-sided test. Recall that because Excel does not have a function dedicated to one-sample tests, we must convert each one-sample test into a two-sample test. Adding a column of constant values essentially provides a second "sample" for Excel to use. Each value in our second "sample" is the same constant, the historical average rating. Here this step has already been completed. Step 1 In cell E2, enter the function =T.TEST(A2:A197,B2:B197,1,3) to calculate the p-value of the sample mean.

The p-value of our sample, which has a mean of 7.3, is T.TEST(A2:A197,B2:B197,1,3)=0.0013. Notice that 0.0013, the p-value from our one-sided test for the sample mean 7.3, is half of 0.0026, the p-value from our two-sided test for the sample mean 7.3. This should make sense. In each case, the p-value is the probability of obtaining a sample mean at least as extreme as 7.3 under the assumption that the null hypothesis is true. In the two-sided test, this is the probability of obtaining a sample with a mean less than 6.1 or greater than 7.3; in the one-sided test it is the probability of obtaining a sample with a mean greater than 7.3. Thus, we can perform a two-sided hypothesis test and just divide the resulting p-value by two to obtain the p-value for the one-sided test.

Suppose we wanted to calculate a 90% range of likely sample means for the movie theater example but our sample size had been only 15. (Assume the same historical population mean, sample mean, and sample standard deviation.) Select the function that would correctly calculate this range.

The range of likely sample means is centered at the historical population mean, in this case 6.7. We must use CONFIDENCE.T since the sample size is less than 30.
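One way to make this concrete, assuming the same inputs as the question (historical mean 6.7, sample standard deviation 2.8, sample size 15, 90% confidence), is sketched below in Python. The critical value 1.7613 is an assumption taken from a standard t-table for 14 degrees of freedom, because the Python standard library has no t-distribution. In Excel terms the margin would correspond to CONFIDENCE.T(0.10, 2.8, 15).

```python
from math import sqrt

mu0 = 6.7        # historical population mean (center of the range)
sample_sd = 2.8  # sample standard deviation from the text
n = 15           # small sample, so the t-distribution is used

# Two-sided 90% critical value for n - 1 = 14 degrees of freedom,
# looked up in a t-table (assumption, not computed here)
T_CRIT_90_DF14 = 1.7613

margin = T_CRIT_90_DF14 * sample_sd / sqrt(n)
low, high = mu0 - margin, mu0 + margin

print(round(low, 2), round(high, 2))
```

With only 15 observations the range is much wider than the 6.3 to 7.1 range obtained from 196 observations, even though the confidence level is lower.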

What if we want to know more than just whether or not to reject the null hypothesis? What if we want to know how strong our evidence is?

The significance level defines the rejection region by specifying the threshold for deciding whether or not to reject the null hypothesis. When the p-value of a sample mean is less than the significance level, we reject the null hypothesis.

significance level

The significance level is the area of the rejection region, meaning the area under the distribution of sample means over the rejection region.

Hypothesis testing

To ensure that important managerial decisions are as well-informed as possible, it is critical to put our claims and theories to the test before making those decisions. Hypothesis testing allows us to rigorously test such claims and theories.

Since our hypothesis test is based on sample data, there is always a chance that we will draw the wrong conclusion about the population. Unless we poll every moviegoer, we will never know for sure if the average customer satisfaction has truly changed. However, polling everyone is impractical—often impossible.

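To summarize, we can go wrong in two ways.
A type I error is a false positive: we reject the null hypothesis when it is actually true. At a 5% significance level, this happens 5% of the time when the null hypothesis is true.
A type II error is a false negative: we fail to reject the null hypothesis when it is actually false. Since we have no information on the probabilities of different sample means if the null hypothesis is false, we cannot calculate the likelihood of a type II error.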
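The link between the significance level and the type I error rate can be checked with a small simulation. The Python sketch below makes several simplifying assumptions that are not in the course material: ratings are drawn from a normal distribution, the population standard deviation (2.8) is treated as known, and the sample size (50) and number of trials (2000) are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

MU0 = 6.7     # population mean when the null hypothesis is true (from the text)
SIGMA = 2.8   # assumed known population standard deviation
N = 50        # assumed sample size
ALPHA = 0.05  # significance level
TRIALS = 2000

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided rejection threshold

false_positives = 0
for _ in range(TRIALS):
    # Draw a sample from a population where the null hypothesis is TRUE
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / (SIGMA / sqrt(N))
    if abs(z) > z_crit:  # sample mean lands in the rejection region
        false_positives += 1

type_i_rate = false_positives / TRIALS
print(type_i_rate)
```

The printed rate should land close to 0.05. A comparable simulation of the type II error rate is not possible without first assuming a specific true mean other than 6.7, which is the point made in the card.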

The confidence level tells us how confident we can be that the range of likely sample means contains the true population mean.

We should always specify the significance level (and thus the confidence level) before performing a hypothesis test.

Are the results for Total Units Ordered (Units) significant at the 95% confidence level?

Yes. A test's results are significant if the p-value is less than the significance level. In this case, the p-value, 0.0169, is less than 0.05, so the results are significant.

Are the results for Ordered Product Sales (OPS) significant at the 95% confidence level?

Yes. A test's results are significant if the p-value is less than the significance level. In this case, the p-value, 0.0339, is less than 0.05, so the results are significant.

The REJECTION REGION is the region outside of the threshold of likelihood.

You reject the NULL HYPOTHESIS

First we must

clearly state his hypothesis

Treatment

elegant modern arrow

Control

existing site with the semicircle behind the arrow

significance level is the probability

of rejecting the null hypothesis when the null hypothesis is actually true.

survey

researchers ask questions and record self-reported responses from a random sample of a population.

experiment

researchers divide a sample into two groups. In the "treatment group," they manipulate a variable and then compare that group's responses to the responses of the "control group," which has not been manipulated.

observational study

researchers observe and collect data about a sample (e.g., people or items) as they occur naturally, without intervention, and analyze the data to investigate possible relationships.

A manager of a factory wants to know if a new quality check protocol has decreased the number of units a worker produces in a day. Before the new protocol, a worker could produce 27 units per day. What null hypothesis should the manager use to test this claim?

µ ≥ 27 units (correct): The null and alternative hypotheses are always opposites. If the manager's alternative hypothesis is that the average daily units produced has decreased (µ < 27), then the null hypothesis is that the average is the same or has increased (µ ≥ 27).
µ = 27 units (incorrect): The null hypothesis must be the opposite of "the average has decreased," so it must also cover the possibility that the average has increased.
µ > 27 units (incorrect): The null hypothesis must also cover the possibility that the average is the same.
µ < 27 units (incorrect): This is the alternative hypothesis that the manager is trying to substantiate. Remember that the null and alternative hypotheses are opposites.

A manager of a factory wants to know if a new quality check protocol has decreased the number of units a worker produces in a day. Before the new protocol, a worker could produce 27 units per day. What alternative hypothesis should the manager use to test this claim?

µ < 27 units (correct): The manager wants to know if the new quality check protocol has decreased the average number of units a worker can produce per day. For a one-sided test, the manager should use the alternative hypothesis Ha: µ < 27 units. This is the claim the manager wishes to substantiate.
µ ≠ 27 units (incorrect): The alternative hypothesis is the claim the manager would like to substantiate. The manager does not want to test whether the average number of units a worker can produce has increased or decreased.
µ ≤ 27 units (incorrect): The manager does not want to test whether the average number of units a worker can produce has remained the same or decreased.
µ > 27 units (incorrect): The manager does not want to test whether the average number of units a worker can produce has increased.

A manager of a factory wants to know if the average number of workplace accidents is different for workers who attended an equipment safety training compared to those who did not attend. What null hypothesis should the manager use to test this claim?

µattended = µdid not attend (correct): The null and alternative hypotheses are always opposites. If the manager's alternative hypothesis is that the average number of workplace accidents differs between the two groups of workers, then the null hypothesis is that the average number of accidents is the same for both groups.
µattended > µdid not attend (incorrect): This is not the opposite of "the averages differ."
µattended ≥ µdid not attend (incorrect): This is not the opposite of "the averages differ."
µattended ≤ µdid not attend (incorrect): This is not the opposite of "the averages differ."

A manager of a factory wants to know if the average number of workplace accidents is different for workers who attended an equipment safety training compared to those who did not attend. What alternative hypothesis should the manager use to test this claim?

µattended ≠ µdid not attend (correct): The manager has reason to believe that the training has changed the average number of workplace accidents between the two groups of workers. For a two-sided test, the manager should use the alternative hypothesis Ha: µattended ≠ µdid not attend. This is the claim the manager wishes to substantiate.
µattended > µdid not attend (incorrect): The alternative hypothesis is the claim the manager would like to substantiate. The manager does not want to test whether the average number of workplace accidents has increased for those who attended the training.
µattended < µdid not attend (incorrect): The manager does not want to test whether the average number of workplace accidents has decreased for those who attended the training. While we may guess that accidents would decrease after training, the manager wishes to test for a change.
µattended = µdid not attend (incorrect): The manager does not want to test whether the average number of workplace accidents has remained the same.

The manager now has reason to believe that showing old classics has increased the customer satisfaction rating. For this one-sided hypothesis test, what alternative hypothesis should he use?

μ>6.7 (correct): The manager has reason to believe that the new artistic approach has increased the average customer satisfaction, so for a one-sided test he should use the alternative hypothesis Ha:μ>6.7. This is the claim he wishes to substantiate.
μ=6.7 (incorrect): The alternative hypothesis is the claim that we would like to substantiate. The manager does not want to test whether the average rating stayed the same; he wants to test whether it increased.
μ≠6.7 (incorrect): This was the alternative hypothesis for the two-sided test, when the manager wanted to test for any change, an increase or a decrease, in the average customer satisfaction rating. Now he wants to test only whether it has increased.
μ≤6.7 (incorrect): The manager does not want to test whether the average rating decreased or stayed the same; he wants to test whether it increased.

For the one-sided hypothesis test, what should the movie theater manager use as the null hypothesis?

μ≤6.7 (correct): The null and alternative hypotheses are always opposites. If our alternative hypothesis is that the average satisfaction rating has increased, then the null hypothesis is that the rating is the same or lower. Thus, if our alternative hypothesis is that μ>6.7, our null hypothesis is that μ≤6.7.
μ=6.7 (incorrect): The null hypothesis must also cover the possibility that the rating is lower.
μ≠6.7 (incorrect): This is not the opposite of "the rating has increased."
μ>6.7 (incorrect): This is the alternative hypothesis that the manager is trying to substantiate. Remember that the null and alternative hypotheses are opposites.

Suppose we want to know whether students who attend a top business school have higher earnings. What is the alternative hypothesis?

μtop 50>μnot top 50 (correct): The alternative hypothesis is the claim we wish to substantiate. In this case, we want to establish that people who attended a school ranked in the top 50 earn more than those who did not, so μtop 50>μnot top 50.
μtop 50≠μnot top 50 (incorrect): We want to establish that earnings are higher, not merely different.
μtop 50=μnot top 50 (incorrect): This is not the claim we want to establish.
μtop 50<μnot top 50 (incorrect): This is not the claim we want to establish.

Suppose we want to know whether students who attend a top business school have higher earnings. What is the null hypothesis?

μtop 50≤μnot top 50 (correct): The null hypothesis is the claim we assume to be true. It is the opposite of the alternative hypothesis, which is the claim we wish to substantiate. In this case, our alternative hypothesis is that people who attended a school ranked in the top 50 earn more than those who did not. The opposite of this is that people who attended a school ranked in the top 50 earn less than or equal to those who did not.
μtop 50≠μnot top 50 (incorrect): This is not the opposite of the claim we would like to establish.
μtop 50=μnot top 50 (incorrect): The null hypothesis must also cover the possibility that top 50 graduates earn less.
μtop 50≥μnot top 50 (incorrect): This includes the alternative hypothesis, so it cannot be its opposite.
