Chapter 6
Hypothesis Testing Procedure*
1. State the hypothesis.
2. Select the appropriate test statistic.
3. Specify the level of significance.
4. State the decision rule regarding the hypothesis.
5. Collect the sample and calculate the sample statistics.
6. Make a decision regarding the hypothesis.
7. Make a decision based on the results of the test.
A confidence interval
A confidence interval is determined as: [sample statistic - (critical value)(standard error)] ≤ population parameter ≤ [sample statistic + (critical value)(standard error)]
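The interval above can be sketched in a few lines of Python. This is a minimal illustration that assumes the critical value comes from the standard normal distribution; the function name and the sample numbers are hypothetical, not taken from the chapter.

```python
from statistics import NormalDist


def confidence_interval(sample_stat, std_error, confidence=0.95):
    """Return (lower, upper) bounds using a z critical value.

    Assumption: the sampling distribution is normal, so a z critical
    value is appropriate. A t critical value would be needed otherwise.
    """
    alpha = 1 - confidence
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    margin = critical_value * std_error
    return sample_stat - margin, sample_stat + margin


# Hypothetical inputs: sample mean of 10.0 with a standard error of 2.0
lower, upper = confidence_interval(10.0, 2.0)
print(round(lower, 2), round(upper, 2))
```

Raising the confidence level widens the interval because the critical value grows.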
Contingency Table for Categorical Data
A contingency or two-way table shows the number of observations from a sample that have a combination of two characteristics. Contingency Table for Categorical Data is a contingency table where the characteristics are earnings growth (low, medium, or high) and dividend yield (low, medium, or high). We can use the data in the table to test the hypothesis that the two characteristics, earnings growth and dividend yield, are independent of each other.
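One common way to test independence in a contingency table is the chi-square statistic, which compares each observed count with the count expected if the two characteristics were independent. The sketch below assumes that approach; the counts are made up for illustration and are not the values from the table in the chapter.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column characteristics are independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = earnings growth (low, medium, high),
# columns = dividend yield (low, medium, high)
counts = [
    [20, 10, 10],
    [10, 20, 10],
    [10, 10, 20],
]
stat, dof = chi_square_independence(counts)
print(stat, dof)
```

The statistic is then compared with a chi-square critical value for `dof` degrees of freedom; a value above the critical value leads to rejecting independence.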
Adjusted significance
Adjusted significance illustrates an example of this procedure. We list the p-values for the tests that are less than 10% in ascending order. The adjusted significance levels are calculated with the following formula:
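The formula is not reproduced in these notes. A form consistent with the ranking described here is the Benjamini-Hochberg rule, adjusted significance = α × (rank of p-value / number of tests); treat that as an assumption. The sketch below applies it rank by rank to hypothetical p-values, which are not the ones from the chapter's table.

```python
def adjusted_significance(p_values, n_tests, alpha=0.10):
    """Rank p-values in ascending order and compare each with its adjusted level.

    p_values maps a test label to its p-value. Assumed rule:
    adjusted level = alpha * rank / n_tests.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    rows = []
    for rank, (test_id, p) in enumerate(ranked, start=1):
        level = alpha * rank / n_tests
        rows.append((test_id, p, level, p <= level))
    return rows


# Hypothetical p-values below 10% out of 20 tests in total
observed = {"A": 0.030, "B": 0.004, "C": 0.050, "D": 0.009}
rows = adjusted_significance(observed, n_tests=20)
for test_id, p, level, reject in rows:
    print(test_id, p, round(level, 4), reject)
```

With these inputs only the two smallest p-values fall below their adjusted levels, so only two tests count as rejections.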
An F-distribution
An F-distribution is presented in F-Distribution. As indicated, the F-distribution is right-skewed and is bounded by zero on the left-hand side. The shape of the F-distribution is determined by two separate degrees of freedom, the numerator degrees of freedom, df1, and the denominator degrees of freedom, df2.
Two-Tailed Hypothesis Test With p-Value
Consider a two-tailed hypothesis test about the mean value of a random variable at the 5% significance level where the test statistic is 2.3, greater than the upper critical value of 1.96. If we consult the Z-table, we find the probability of getting a value greater than 2.3 is (1 - 0.9893) = 1.07%. Since it's a two-tailed test, our p-value is 2 × 1.07% = 2.14%, as illustrated in Two-Tailed Hypothesis Test With p-Value = 2.14%. At a 3%, 4%, or 5% significance level, we would reject the null hypothesis, but at a 2% or 1% significance level, we would not. Many researchers report p-values without selecting a significance level and allow the reader to judge how strong the evidence for rejection is.
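The Z-table lookup in this example can be checked with the standard normal distribution in Python's standard library. This is a small sketch; the function name is illustrative.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Probability in both tails beyond |z| under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_tailed_p_value(2.3)
print(f"p-value: {p:.4f}")

# Compare the p-value with several candidate significance levels
for alpha in (0.01, 0.02, 0.03, 0.04, 0.05):
    decision = "reject" if p < alpha else "fail to reject"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The p-value falls between 2% and 3%, which matches the rejection pattern described above.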
one-tailed hypothesis test
For a one-tailed hypothesis test of the population mean, the null and alternative hypotheses are either: Upper tail: H0: µ ≤ µ0 versus Ha: µ > µ0, or Lower tail: H0: µ ≥ µ0 versus Ha: µ < µ0
Adjusted significance
From the results reported in Adjusted significance, we see that only two of the tests should actually be counted as rejections. Because only 2 of the 20 tests (tests 12 and 4) qualify as actual rejections based on comparison of their p-values with the adjusted significance values for their rank, our rejection rate is 10%. When the null hypothesis is true, two rejections from 20 tests is just what we would expect with a significance level of 10%. In this case, we will not reject the null hypothesis.
Nonparametric tests
Nonparametric tests either do not consider a particular population parameter or have few assumptions about the population that is sampled. Nonparametric tests are used when there is concern about quantities other than the parameters of a distribution or when the assumptions of parametric tests can't be supported. They are also used when the data are not suitable for parametric tests (e.g., ranked observations). Situations where a nonparametric test is called for are the following:
1. The assumptions about the distribution of the random variable that support a parametric test are not met. An example would be a hypothesis test of the mean value for a variable that comes from a distribution that is not normal, when the sample is small, so that neither the t-test nor the z-test is appropriate.
2. The data are ranks (an ordinal measurement scale) rather than values.
3. The hypothesis does not involve the parameters of the distribution, such as testing whether a variable is normally distributed.
We can use a nonparametric test, called a runs test, to determine whether data are random. A runs test provides an estimate of the probability that a series of changes (e.g., +, +, -, -, +, -, ...) are random.
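As a rough illustration of a runs test, the sketch below counts runs in a sequence of + and - signs and applies the usual large-sample normal approximation (the Wald-Wolfowitz form). The notes do not specify which version of the test is intended, so the approximation and the example sequence are assumptions; with a sequence this short the approximation is crude.

```python
from math import sqrt
from statistics import NormalDist


def runs_test(signs):
    """Return (number of runs, z statistic, two-tailed p-value)."""
    n_pos = signs.count("+")
    n_neg = signs.count("-")
    n = n_pos + n_neg

    # A new run starts whenever the sign changes
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    # Mean and variance of the number of runs if the sequence is random
    expected = 2 * n_pos * n_neg / n + 1
    variance = (2 * n_pos * n_neg * (2 * n_pos * n_neg - n)) / (n ** 2 * (n - 1))

    z = (runs - expected) / sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, p_value


# Hypothetical series of changes
runs, z, p_value = runs_test("++--+-++--+-")
print(runs, round(z, 4), round(p_value, 4))
```

Too few runs suggests trending, too many suggests excessive alternation; either pushes |z| up and the p-value down.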
Chi-Square Table
Note that the chi-square values in Chi-Square Table correspond to the probabilities in the right tail of the distribution. As such, the 16.791 in Decision Rule for a Two-Tailed Chi-Square Test is from the column headed 0.975 because 95% + 2.5% of the probability is to the right of it. The 46.979 is from the column headed 0.025 because only 2.5% probability is to the right of it. Similarly, at a 5% level of significance with 10 degrees of freedom, Chi-Square Table shows that the critical chi-square values for a two-tailed test are 3.247 and 20.483.
Parametric tests
Parametric tests rely on assumptions regarding the distribution of the population and are specific to population parameters. For example, the z-test relies upon a mean and a standard deviation to define the normal distribution. The z-test also requires that either the sample is large, relying on the central limit theorem to assure a normal sampling distribution, or that the population is normally distributed.
false positives
Recall that the probability of a Type I error is the probability that a true null hypothesis will be rejected (and is also the significance level of a test). Statisticians refer to these incorrect rejections of the null hypothesis as false positives. For a test of the hypothesis that the mean return to an investment strategy is equal to zero, with a significance level of 5%, we will get a false positive 5% of the time, on average. That is, our test statistic will be outside the critical values (in the tails of the distribution) and the p-value of our test statistic will be less than 0.05. If we do a single test, this conclusion is correct.
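The single-test caveat matters because false positives accumulate when many tests are run. A short sketch, assuming the tests are independent and every null hypothesis is true (both are simplifying assumptions, and the function name is illustrative):

```python
def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance of one or more Type I errors across independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


single = prob_at_least_one_false_positive(0.05, 1)
many = prob_at_least_one_false_positive(0.05, 20)
print(round(single, 4), round(many, 4))
```

With one test the probability equals the significance level; with 20 tests it is well above one half.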
critical values (or rejection points)
Since the alternative hypothesis allows for values above and below the hypothesized parameter, a two-tailed test uses two critical values (or rejection points). The general decision rule for a two-tailed test is: Reject H0 if: test statistic > upper critical value or test statistic < lower critical value
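The decision rule can be expressed directly in code. This sketch assumes a z-test, so the critical values come from the standard normal distribution; the function name is illustrative.

```python
from statistics import NormalDist


def two_tailed_decision(test_stat, alpha=0.05):
    """Return (lower critical value, upper critical value, reject H0?)."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    lower = -upper
    reject = test_stat > upper or test_stat < lower
    return lower, upper, reject


lower, upper, reject_high = two_tailed_decision(2.3)
_, _, reject_inside = two_tailed_decision(-1.5)
print(round(lower, 2), round(upper, 2), reject_high, reject_inside)
```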
Statistical significance and economic significance
Statistical significance does not necessarily imply economic significance. For example, we may have tested a null hypothesis that a strategy of going long all the stocks that satisfy some criteria and shorting all the stocks that do not satisfy the criteria resulted in returns that were less than or equal to zero over a 20-year period. Assume we have rejected the null in favor of the alternative hypothesis that the returns to the strategy are greater than zero (positive). This does not necessarily mean that investing in that strategy will result in economically meaningful positive returns. Several factors must be considered.
The Spearman rank correlation test
The Spearman rank correlation test, a non-parametric test, can be used to test whether two sets of ranks are correlated. Ranks are simply ordered values. If there is a tie (equal values), the ranks are shared: if the 2nd- and 3rd-ranked values are equal, each gets a rank of (2 + 3) / 2 = 2.5.
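The tie-sharing rule can be sketched as follows. The helper names are hypothetical, and computing the rank correlation as the ordinary correlation of the two rank series is one common approach, assumed here because the notes do not give the formula.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Correlation of the two rank series (assumed definition)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


tied = average_ranks([10, 20, 20, 40])
print(tied)  # [1.0, 2.5, 2.5, 4.0]
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```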
one-sided or two-sided
The alternative hypothesis can be one-sided or two-sided. A one-sided test is referred to as a one-tailed test, and a two-sided test is referred to as a two-tailed test. Whether the test is one- or two-sided depends on the proposition being tested. If a researcher wants to test whether the return on stock options is greater than zero, a one-tailed test should be used. However, a two-tailed test should be used if the research question is whether the return on options is simply different from zero. Two-sided tests allow for deviation on both sides of the hypothesized value (zero). In practice, most hypothesis tests are constructed as two-tailed tests.
The alternative hypothesis
The alternative hypothesis, designated Ha, is what is concluded if there is sufficient evidence to reject the null hypothesis. It is usually the alternative hypothesis that you are really trying to assess. Why? Because you can never really prove anything with statistics, when the null hypothesis is discredited, the implication is that the alternative hypothesis is valid.
The chi-square test
The chi-square test is used for hypothesis tests concerning the variance of a normally distributed population. Letting σ² represent the true population variance and σ₀² represent the hypothesized variance, the hypotheses for a two-tailed test of a single population variance are structured as: H0: σ² = σ₀² versus Ha: σ² ≠ σ₀². The hypotheses for one-tailed tests are structured as: H0: σ² ≤ σ₀² versus Ha: σ² > σ₀², or H0: σ² ≥ σ₀² versus Ha: σ² < σ₀²
the null hypothesis
The null hypothesis, designated H0, is the hypothesis that the researcher wants to reject. It is the hypothesis that is actually tested and is the basis for the selection of the test statistics. The null is generally stated as a simple statement about a population parameter. Typical statements of the null hypothesis for the population mean include H0: µ = µ0, H0: µ ≤ µ0, and H0: µ ≥ µ0, where µ is the population mean and µ0 is the hypothesized value of the population mean.
p-value
The p-value is the probability of obtaining a test statistic that would lead to a rejection of the null hypothesis, assuming the null hypothesis is true. It is the smallest level of significance for which the null hypothesis can be rejected. For one-tailed tests, the p-value is the probability that lies above the computed test statistic for upper tail tests or below the computed test statistic for lower tail tests. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
The significance level
The significance level is the probability of making a Type I error (rejecting the null when it is true) and is designated by the Greek letter alpha (α). For instance, a significance level of 5% (α = 0.05) means there is a 5% chance of rejecting a true null hypothesis. When conducting hypothesis tests, a significance level must be specified in order to identify the critical values needed to evaluate the test statistic.
Decision Rule for a Two-Tailed Chi-Square Test
To illustrate the chi-square distribution, consider a two-tailed test with a 5% level of significance and 30 degrees of freedom. As displayed in Decision Rule for a Two-Tailed Chi-Square Test, the critical chi-square values are 16.791 and 46.979 for the lower and upper bounds, respectively. These values are obtained from a chi-square table, which is used in the same manner as a t-table. A portion of a chi-square table is presented in Chi-Square Table.
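A sketch of this decision rule in code follows. It assumes the usual test statistic for a single variance, (n - 1) × sample variance / hypothesized variance with n - 1 degrees of freedom, which these notes do not state explicitly. The critical values are the ones quoted above for 30 degrees of freedom; the sample figures are hypothetical.

```python
def chi_square_variance_test(sample_variance, n, hypothesized_variance,
                             lower_critical, upper_critical):
    """Return (test statistic, reject H0?) for a two-tailed variance test."""
    stat = (n - 1) * sample_variance / hypothesized_variance
    reject = stat < lower_critical or stat > upper_critical
    return stat, reject


# 31 observations give 30 degrees of freedom; variances are made-up inputs
stat, reject = chi_square_variance_test(
    sample_variance=0.0020,
    n=31,
    hypothesized_variance=0.0016,
    lower_critical=16.791,   # column headed 0.975, 30 df
    upper_critical=46.979,   # column headed 0.025, 30 df
)
print(round(stat, 3), reject)
```

Here the statistic lies between the two critical values, so the null hypothesis is not rejected.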
Type I error and Type II error:
Type I error: the rejection of the null hypothesis when it is actually true. Type II error: the failure to reject the null hypothesis when it is actually false.
The power of a test
While the significance level of a test is the probability of rejecting the null hypothesis when it is true, the power of a test is the probability of correctly rejecting the null hypothesis when it is false. The power of a test is actually one minus the probability of making a Type II error, or 1 - P(Type II error). In other words, the probability of rejecting the null when it is false (power of the test) equals one minus the probability of not rejecting the null when it is false (Type II error). When more than one test statistic may be used, the power of the test for the competing test statistics may be useful in deciding which test statistic to use. Ordinarily, we wish to use the test statistic that provides the most powerful test among all possible tests.
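The relationship between power and Type II error can be made concrete for a simple case. The sketch below assumes an upper-tail z-test with a known standard error and a specific true mean; all names and numbers are illustrative.

```python
from statistics import NormalDist


def power_upper_tail_z(mu0, mu_true, std_error, alpha=0.05):
    """Return (power, P(Type II error)) for H0: mu <= mu0 versus Ha: mu > mu0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # How far the true mean sits from the hypothesized mean, in standard errors
    shift = (mu_true - mu0) / std_error
    power = 1 - NormalDist().cdf(z_crit - shift)
    return power, 1 - power


power, type_2 = power_upper_tail_z(mu0=0.0, mu_true=0.5, std_error=0.2)
print(round(power, 3), round(type_2, 3))
```

When the true mean equals the hypothesized mean, the rejection probability collapses to the significance level, which is the Type I error rate.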