Hypothesis Testing
Effect
The difference between a sample mean and the population mean stated in the null hypothesis. In hypothesis testing, an effect is insignificant when we retain the null hypothesis; an effect is significant when we reject the null hypothesis.
Nondirectional Tests, or Two-tailed tests
Nondirectional tests, or two-tailed tests, are hypothesis tests in which the alternative hypothesis is stated as not equal to (≠). The researcher is interested in any alternative to the null hypothesis, so the level of significance is placed in both tails of the sampling distribution. This is the most common alternative hypothesis in behavioral science.
Alternative Hypothesis
An alternative hypothesis (H1) is a statement that directly contradicts a null hypothesis by stating that the actual value of a population parameter is less than, greater than, or not equal to the value stated in the null hypothesis.
Alpha Level
Because we assume the null hypothesis is true, we control for Type 1 error by stating a level of significance. The level we set, called the alpha level (symbolized as α), is the largest probability of committing a Type 1 error that we will allow and still decide to reject the null hypothesis. This criterion is usually set at .05 (α = .05), and we compare the alpha level to the p value. When the probability of a Type 1 error is less than 5% (p < .05), we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.
Advantage of Knowing Effect Size d
Its value can be used to determine the power of detecting an effect in hypothesis testing. The likelihood of detecting an effect, called power, is critical in behavioral research because it lets the researcher know the probability that a randomly selected sample will lead to a decision to reject the null hypothesis, if the null hypothesis is false. As effect size increases, power increases.
Four Steps of Hypothesis Testing
Step 1: State the hypotheses. Step 2: Set the criteria for a decision. Step 3: Compute the test statistic. Step 4: Make a decision.
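The four steps can be sketched in code for a two-tailed one-sample z-test. This is only an illustration: the function name `two_tailed_z_test` and the sample values (a mean of 103 from n = 100, tested against a null mean of 100 with σ = 15) are hypothetical and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Run the four steps for a nondirectional one-sample z-test."""
    # Step 1: State the hypotheses. H0: mu = mu0, H1: mu != mu0.
    # Step 2: Set the criteria. Alpha is split across both tails.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: Compute the test statistic (the obtained value).
    z_obt = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: Make a decision by comparing obtained and critical values.
    decision = "reject" if abs(z_obt) >= z_crit else "retain"
    return z_crit, z_obt, decision


# Hypothetical data, chosen only for illustration.
z_crit, z_obt, decision = two_tailed_z_test(
    sample_mean=103.0, mu0=100.0, sigma=15.0, n=100
)
print(f"critical = ±{z_crit:.2f}, obtained = {z_obt:.2f}, decision: {decision} H0")
```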
Power
The correct decision is to reject a false null hypothesis. In other words, we decide that the null hypothesis is false when it is indeed false. This decision is called the power of the decision-making process because it is the decision we aim for. Remember that we are only testing the null hypothesis because we think it is wrong. Deciding to reject a false null hypothesis, then, is the power, inasmuch as we learn the most about populations when we accurately reject false notions of truth. This decision is the most published result in behavioral research.
Power
The likelihood of detecting an effect. To receive a research grant, researchers are often required to state the likelihood that they will detect the effect they are studying, assuming they are correct. In other words, researchers must disclose the power of their study. The typical standard for power is .80. Researchers try to make sure that at least 80% of the samples they select will show an effect when an effect exists in a population.
Inferential Statistics
We use inferential statistics because they allow us to measure behavior in samples to learn more about the behavior in populations that are often too large or inaccessible. We use samples because we know how they are related to populations.
Significance
When the p value is less than .05, we reach significance; the decision is to reject the null hypothesis. When the p value is greater than .05, we fail to reach significance; the decision is to retain the null hypothesis. Figure 8.3 shows the four steps of hypothesis testing.
Type III Error
A Type III error occurs with one-tailed tests, where the researcher decides to retain the null hypothesis because the rejection region was located in the wrong tail. The "wrong tail" refers to the opposite tail from where a difference was observed and would have otherwise been significant.
Critical Value
A critical value is a cutoff value that defines the boundaries beyond which less than 5% of sample means can be obtained if the null hypothesis is true. Sample means obtained beyond a critical value will result in a decision to reject the null hypothesis. In a nondirectional two-tailed test, we divide the alpha value in half so that an equal proportion of area is placed in the upper and lower tail. Table 8.4 gives the critical values for one- and two-tailed tests at a .05, .01, and .001 level of significance.
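Critical z values for these levels of significance can be computed from the standard normal distribution. The sketch below uses Python's standard library; the helper name `critical_values` is an assumption made for illustration, and printed values may differ from a textbook table in the last rounded digit.

```python
from statistics import NormalDist


def critical_values(alpha):
    """Return (one-tailed, two-tailed) critical z values for a given alpha."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    one_tailed = std.inv_cdf(1 - alpha)      # all of alpha in one tail
    two_tailed = std.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    return one_tailed, two_tailed


for alpha in (0.05, 0.01, 0.001):
    one, two = critical_values(alpha)
    # For a one-tailed test the sign follows the tail being tested.
    print(f"alpha = {alpha}: one-tailed {one:.3f}, two-tailed ±{two:.3f}")
```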
Cohen's d
A measure of effect size in terms of the number of standard deviations that mean scores shifted above or below the population mean stated by the null hypothesis. The larger the value of d, the larger the effect in the population. Cohen's effect size conventions are standard rules for identifying small, medium, and large effects based on typical findings in behavioral research.
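As a rough sketch, Cohen's d for a one-sample design can be computed as the mean difference divided by the population standard deviation. The cutoffs of 0.2 and 0.8 used below follow Cohen's commonly cited benchmarks and are an assumption here, since the notes do not list the conventions; the function names and sample values are also hypothetical.

```python
def cohens_d(sample_mean, mu0, sigma):
    """Mean shift from the null value, in population standard deviation units."""
    return (sample_mean - mu0) / sigma


def describe_effect(d):
    """Label an effect using assumed cutoffs of 0.2 and 0.8."""
    size = abs(d)  # the direction of the shift does not change its size
    if size < 0.2:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical values: a 6-point shift with sigma = 15.
d = cohens_d(sample_mean=106.0, mu0=100.0, sigma=15.0)
print(f"d = {d:.2f} ({describe_effect(d)} effect)")
```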
Effect Size
A statistical measure of the size of an effect in a population, which allows researchers to describe how far scores shifted in the population, or the percent of variance that can be explained by a given variable.
Increasing Power: Decrease Beta, Standard Deviation, and Standard Error
Decreasing three factors can increase power. Decreasing beta error increases power. Decreasing the population standard deviation and standard error will also increase power.
Directional, Lower Tail Critical Hypothesis (H1: <) - Steps 1 & 2
Step 1: State the hypotheses. The population mean is 558, and we are testing whether the alternative is less than (<) this value. Step 2: Set the criteria for a decision. The z-score associated with this probability is again z = 1.645. Because this test is a lower tail critical test, we place the critical value the same distance below the mean: The critical value for this test is z = -1.645. All of the alpha level is placed in the lower tail of the distribution beyond the critical value.
Null Hypothesis (H0)
The null hypothesis, stated as the null, is a statement about a population parameter, such as the population mean, that is assumed to be true. The null hypothesis is a starting point. We will test whether the value stated in the null hypothesis is likely to be true.
The One-independent Sample Z-test
The one-independent sample z-test is a statistical procedure used to test hypotheses concerning the mean in a single population with a known variance.
Summary of Factors that Increase Power
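Power is the probability of rejecting a false null hypothesis. Power increases when effect size, sample size, or the alpha level increases, and when beta error, the population standard deviation, or the standard error decreases.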
Z-statistic + Obtained Value
The z-statistic is an inferential statistic used to determine the number of standard deviations in a standard normal distribution that a sample mean deviates from the population mean stated in the null hypothesis. The obtained value is the value of a test statistic. This value is compared to the critical value(s) of a hypothesis test to make a decision. When the obtained value exceeds a critical value, we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.
Rejection Region
The rejection region is the region beyond a critical value in a hypothesis test. When the value of a test statistic is in the rejection region, we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.
Test Statistic
The test statistic is a mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis.
Null Result or Null Finding
When we decide to retain the null hypothesis, we can be correct or incorrect. The correct decision is to retain a true null hypothesis. This decision is called a null result or null finding. This is usually an uninteresting decision because the decision is to retain what we already assumed: that the value stated in the null hypothesis is correct. For this reason, null results alone are rarely published in behavioral research.
Hypothesis or Significance Testing
Hypothesis testing or significance testing is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample. In this method, we test some hypothesis by determining the likelihood that a sample statistic could have been selected, if the hypothesis regarding the population parameter were true.
Level of Significance
Level of significance, or significance level, refers to a criterion of judgement upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true. In behavioral science, the criterion or level of significance is typically set at 5%. When the probability of obtaining a sample mean is less than 5% if the null hypothesis were true, then we reject the value stated in the null hypothesis. Ex. This is similar to the criterion that jurors use in a criminal trial. Jurors decide whether the evidence presented shows guilt beyond a reasonable doubt (this is the criterion).
Directional, Lower Tail Critical Hypothesis (H1: <) - Steps 3 & 4
Step 3: Compute the test statistic. The probability is less than 5% that we will obtain a sample mean that is at least 1.645 standard deviations below the value of the population mean stated in the null hypothesis. Step 4: Make a decision. To make a decision, we compare the obtained value to the critical value. We reject the null hypothesis if the obtained value exceeds the critical value.
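A lower tail critical test can be sketched as follows. Only the null mean of 558 and the alpha level of .05 come from the notes; the population standard deviation, sample size, and sample mean are hypothetical values chosen for illustration. Note that in the lower tail, "exceeding" the critical value means falling at or below it.

```python
from math import sqrt
from statistics import NormalDist

MU0 = 558.0          # population mean stated in the null hypothesis (from the notes)
ALPHA = 0.05         # level of significance
SIGMA = 100.0        # hypothetical population standard deviation
N = 100              # hypothetical sample size
SAMPLE_MEAN = 540.0  # hypothetical sample mean

# Step 2: all of alpha goes in the lower tail, so the cutoff is negative.
z_crit = NormalDist().inv_cdf(ALPHA)

# Step 3: the obtained value of the z-statistic.
z_obt = (SAMPLE_MEAN - MU0) / (SIGMA / sqrt(N))

# Step 4: reject only if the obtained value lies in the lower rejection region.
decision = "reject" if z_obt <= z_crit else "retain"
print(f"critical = {z_crit:.3f}, obtained = {z_obt:.2f}, decision: {decision} H0")
```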
Type 1 Error
The incorrect decision is to reject a true null hypothesis. This decision is an example of a Type 1 error. With each test we make, there is always some probability that our decision is a Type 1 error. A researcher who makes this error decides to reject previous notions of truth that are in fact true. Making this type of error is analogous to finding an innocent person guilty. To minimize this error, we place the burden on the researcher to demonstrate evidence that the null hypothesis is indeed false. To demonstrate evidence that leads to a decision to reject the null hypothesis, the research must reach significance (p< .05).
Type II Error, or β error
The incorrect decision is to retain a false null hypothesis. This decision is an example of a Type II error, or β error. With each test we make, there is always some probability that the decision could be a Type II error. In this decision, we decide to retain previous notions of truth that are in fact false. While it's an error, we still did nothing; we retained the null hypothesis. We can always go back and conduct more studies.
P-value
The p value is a probability: It varies between 0 and 1 and can never be negative. In Step 2, we stated the criterion or probability of obtaining a sample mean at which point we will decide to reject the value stated in the null hypothesis, which is typically set at 5% in behavioral research. When the p value is less than 5% (p<.05), we reject the null hypothesis. We will refer to p< .05 as the criterion for deciding to reject the null hypothesis, although note that when p = .05, the decision is also to reject the null hypothesis. When the p value is greater than 5% (p> .05), we retain the null hypothesis.
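For a z-test, the p value can be read off the standard normal distribution once the obtained value is known. The helper below is a minimal sketch; the name `p_value` and its `tail` argument are assumptions made for illustration.

```python
from statistics import NormalDist


def p_value(z_obt, tail="two"):
    """Probability of an obtained z at least this extreme if the null is true."""
    std = NormalDist()
    if tail == "lower":
        return std.cdf(z_obt)
    if tail == "upper":
        return 1 - std.cdf(z_obt)
    if tail == "two":
        # Count both tails beyond |z|.
        return 2 * (1 - std.cdf(abs(z_obt)))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


ALPHA = 0.05
p = p_value(2.10, tail="two")  # hypothetical obtained value
# The decision is to reject when p is at or below the criterion.
decision = "reject" if p <= ALPHA else "retain"
print(f"p = {p:.4f}, decision: {decision} H0")
```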
Large Effect Size and High Power for Class 2
In this example, when alpha is .05, the critical value or cutoff for alpha is 38.61. When alpha is equal to .05, notice that practically any sample will detect this effect (the power). So if the researcher is correct, and the null is false (with a 2-point effect), nearly 100% of the samples he or she selects at random will result in a decision to reject the null hypothesis.
Small Effect Size and Low Power for Class 1
In this example, when alpha is .05, the critical value or cutoff for alpha is 40.99. When alpha equals .05, notice that only about 29% of samples will detect this effect (the power). So even if the researcher is correct, and the null is false (with a 2-point effect), only about 29% of the samples he or she selects at random will result in a decision to reject the null hypothesis.
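The two class examples can be roughly reproduced with an upper-tail power calculation. The notes give only the cutoffs (40.99 and 38.61), the 2-point effect, and the resulting power, so the inputs below (a null mean of 38, a true mean of 40, n = 30, and standard deviations of 10 and 2) are assumptions chosen to be consistent with those figures; the computed cutoffs differ slightly in the second decimal because of rounding.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_power(mu0, mu1, sigma, n, alpha=0.05):
    """Return (cutoff, power) for an upper-tail z-test when the true mean is mu1."""
    se = sigma / sqrt(n)  # standard error of the mean
    # Sample-mean cutoff that leaves alpha in the upper tail under the null.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Power: proportion of sample means beyond the cutoff under the true mean.
    power = 1 - NormalDist(mu1, se).cdf(cutoff)
    return cutoff, power


# Assumed inputs: same 2-point effect, different population variability.
cutoff_1, power_1 = upper_tail_power(mu0=38.0, mu1=40.0, sigma=10.0, n=30)
cutoff_2, power_2 = upper_tail_power(mu0=38.0, mu1=40.0, sigma=2.0, n=30)
print(f"Class 1: cutoff = {cutoff_1:.2f}, power = {power_1:.2f}")
print(f"Class 2: cutoff = {cutoff_2:.2f}, power = {power_2:.4f}")
```

Under these assumptions, the smaller standard deviation in Class 2 is what turns the same 2-point difference into a large effect with power near 1.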
Increasing Power: Increase Effect Size, Sample Size, and Alpha
Increasing effect size, sample size, and the alpha level will increase power. The alpha level is the probability of a Type I error; it is the rejection region for a hypothesis test. The larger the rejection region, the greater the likelihood of rejecting the null hypothesis, and the greater the power will be. This is why one-tailed tests are more powerful than two-tailed tests: They increase alpha in the direction that an effect is expected to occur, thereby increasing the power to detect an effect.
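These relationships can be checked numerically. The sketch below assumes a one-sample z-test with a true effect of size d (in standard deviation units) in the upper direction; the function name `power` and the baseline values (d = 0.2, n = 30) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power(d, n, alpha=0.05, tails=1):
    """Power of a z-test when the true effect is d standard deviations (upper direction)."""
    shift = d * sqrt(n)  # expected value of the z-statistic under the true effect
    z_crit = STD.inv_cdf(1 - alpha / tails)
    upper = 1 - STD.cdf(z_crit - shift)  # rejections in the upper tail
    if tails == 1:
        return upper
    lower = STD.cdf(-z_crit - shift)     # rare rejections in the wrong tail
    return upper + lower


print(f"baseline (d=0.2, n=30):  {power(0.2, 30):.3f}")
print(f"larger effect (d=0.5):   {power(0.5, 30):.3f}")
print(f"larger sample (n=100):   {power(0.2, 100):.3f}")
print(f"smaller alpha (.01):     {power(0.2, 30, alpha=0.01):.3f}")
print(f"two-tailed, alpha = .05: {power(0.2, 30, tails=2):.3f}")
```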