PSY 360 - Chapter 8: Introduction To Hypothesis Testing
In Step 3: collect data and compute sample statistics, when is data collected and what is the goal of collecting data and computing the mean?
- Data is collected AFTER hypotheses are stated. - Data is collected AFTER the criteria for a decision are set. -> This sequence of events helps to ensure that a researcher makes an honest, objective evaluation of the data. - After collecting data and computing the mean (sample statistic), we compare the mean with the null hypothesis.
What are some hints for hypothesis testing?
- It may seem awkward to phrase both possible decisions in terms of rejecting the null hypothesis. -> We either reject H0 or we fail to reject H0. - Think of a research study as an attempt to gather evidence to prove that a treatment works. -> This is similar to what takes place during a jury trial.
In Step 1: stating the hypothesis, what are the two hypotheses stated?
- Null Hypothesis (H0) states that in the general population there is no change, no difference, or no relationship. In the context of an experiment, the null hypothesis predicts that the IV (treatment) has NO EFFECT on the DV. - Alternative Hypothesis (H1) states that there is a change, a difference, or a relationship for the general population. In the context of an experiment, the alternative hypothesis predicts that the IV (treatment) DOES HAVE AN EFFECT on the DV. - These can't both be true, but one of them must be true. The data determines which one should be rejected. - Stated this way, the hypotheses are nondirectional: they simply state that the treatment has no effect (H0) or has some effect (H1).
In Figure 8.3, from the point of view of the hypothesis test, what is occurring?
- We're asking what would happen if the treatment were administered to the entire population. - A sample is selected and the treatment is administered to the sample. - The result is a treated sample that represents the (hypothetical) treated population. *From the point of view of the hypothesis test, the entire population receives the treatment and then a sample is selected from the treated population. In the actual research study, a sample is selected from the original population and the treatment is administered to the sample. -> From either perspective, the result is a treated sample that represents the treated population.
What are the four assumptions for hypothesis tests?
1) Random Sampling -> It is assumed that participants were selected randomly. 2) Independent Observation -> There should be no consistent, predictable relationship between any two observations. 3) The Value of the Standard Deviation is Unchanged by the Treatment. -> We assume that the standard deviation for the unknown population (after treatment) is the same as it was for the population before the treatment. --> The assumption is that the effect of the treatment is to add a constant amount to (or subtract a constant amount from) every score in the population. --> This is a theoretical ideal; in an actual experiment, this doesn't generally happen. 4) Normal Sampling Distribution -> The distribution of sample means is normal. *The math used for hypothesis tests is based on a set of assumptions, and if these assumptions are not satisfied, the hypothesis test might be compromised.
What is the Logic of Hypothesis Testing?
1) State a hypothesis about a population. -> We could hypothesize that Americans gain an average of 7 pounds after Thanksgiving. 2) Predict characteristics that the sample should have. -> The sample mean should be around 7 pounds. 3) Obtain a random sample from the population. -> For example, n = 200 American adults, measure average weight gain. 4) Compare the obtained sample data with the prediction that was made from the hypothesis. If the sample mean is consistent with the prediction, then we can conclude that our hypothesis is reasonable. If the sample mean is NOT consistent with the prediction, then we can conclude that the hypothesis is wrong.
In step 4: make a decision, what are the two possible outcomes?
1) The sample data is located in the critical region. -> We can conclude that the sample isn't consistent with the null hypothesis and our decision is to REJECT THE NULL HYPOTHESIS. -> Remember that the null hypothesis states that there is NO treatment effect. 2) The sample data is NOT located in the critical region. -> If this is the case, then the sample mean is reasonably close to the population mean. -> Our data DOES NOT provide strong evidence that the null hypothesis is wrong so our conclusion would be to FAIL TO REJECT THE NULL HYPOTHESIS.
What are the three factors that influence a hypothesis test?
1) The size of the difference between the sample mean and the original population mean. -> A big mean difference indicates that the treated sample is noticeably different from the untreated population. 2) The variability of the scores. -> The variability influences the size of the standard error, which influences the z-score. -> Usually, the larger the variability, the lower the likelihood of finding a significant treatment effect. 3) The number of scores in the sample. -> This also influences the size of the standard error, which influences the z-score. -> Increasing the sample size reduces the standard error and increases the size of the z-score.
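The three factors above can be seen by changing one input at a time in the z-score formula. The sketch below uses made-up values (mu = 100, sigma = 10, n = 4 as a baseline); the function name and all numbers are illustrative assumptions, not values from the textbook.

```python
import math

def z_score(M, mu, sigma, n):
    """z for a sample mean: (M - mu) divided by the standard error."""
    standard_error = sigma / math.sqrt(n)
    return (M - mu) / standard_error

# Hypothetical baseline: 5-point mean difference, sigma = 10, n = 4
baseline = z_score(M=105, mu=100, sigma=10, n=4)

# Factor 1: a bigger mean difference gives a bigger z
bigger_difference = z_score(M=110, mu=100, sigma=10, n=4)

# Factor 2: more variability gives a bigger standard error and a smaller z
more_variability = z_score(M=105, mu=100, sigma=20, n=4)

# Factor 3: a larger sample gives a smaller standard error and a bigger z
larger_sample = z_score(M=105, mu=100, sigma=10, n=16)

print(baseline, bigger_difference, more_variability, larger_sample)
```

Only one input differs from the baseline in each call, so any change in z can be traced to that single factor.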
What is an analogy for hypothesis testing?
1) The test begins with a null hypothesis stating there is no treatment effect. -> The trial begins with a null hypothesis that the defendant did not commit a crime (innocent until proven guilty). 2) The research study gathers evidence to show that the treatment does have an effect. -> The police gather evidence that the defendant really did commit a crime. 3) If there is enough evidence, the researcher rejects the null hypothesis and concludes that there really is a treatment effect. -> If there's enough evidence, the jury rejects the null hypothesis and concludes that the defendant is guilty. 4) If there's not enough evidence, the researcher fails to reject the null hypothesis (the researcher doesn't conclude that there's no treatment effect, simply that there isn't enough evidence to conclude that there is an effect). -> If there's not enough evidence, the jury fails to find the defendant guilty (but doesn't conclude that the defendant is innocent, simply that there isn't enough evidence for a guilty verdict).
What occurs in a Type I Error?
A Type I Error occurs when a researcher REJECTS a null hypothesis that is actually TRUE. -> This means that the researcher concludes that the treatment does have an effect, when in fact, it has no effect. -> This usually occurs when the researcher unknowingly obtains an extreme, unrepresentative sample. The alpha level is the probability that the test will lead to a Type I Error when the null hypothesis is true. -> An alpha level = 0.05 means that only 5% of samples have means in the critical region if H0 is true. -> The researcher controls the probability of making a Type I Error by setting the alpha level. The risk of a Type I Error is SMALL! ** Whenever sample data fall into the critical region, the correct thing to do is to reject the null hypothesis, but sometimes the sample data fall into the critical region by chance without any treatment effect. When this happens, the researcher makes a Type I Error.
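One way to see that alpha is the Type I Error rate is to simulate many studies in which H0 is true and count how often the sample mean lands in the critical region. This is a rough sketch, assuming a normal population with made-up parameters (mu = 18, sigma = 4, n = 16), a two-tailed test, and a fixed random seed; the function name is hypothetical.

```python
import random
from statistics import NormalDist

def simulate_type_i_rate(trials=10_000, n=16, mu=18.0, sigma=4.0,
                         alpha=0.05, seed=0):
    """Fraction of no-effect studies that still reject H0 (two-tailed)."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the sample comes from the untreated population
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) > critical_z:
            rejections += 1  # a Type I Error
    return rejections / trials

rate = simulate_type_i_rate()
print(rate)  # should come out close to alpha
```

The exact rate depends on the seed and the number of trials, but it should stay near 0.05.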
What does a hypothesis test evaluate from the results of a research study?
A hypothesis test evaluates the statistical significance of the results from a research study. - A significant result leads to the conclusion: "The specific sample mean is very unlikely (p < .05) if the null hypothesis is true". - This suggests that the null hypothesis is very unlikely, but we can't make a probability statement about the null hypothesis. We can't say that the probability of the null being true is less than 5% just because we rejected the null with an alpha level of .05. *A significant treatment effect does not necessarily indicate a substantial treatment effect. * We have no information about the absolute SIZE of a treatment effect.
What is a hypothesis test?
A hypothesis test is a statistical method that uses SAMPLE data to evaluate a hypothesis about a POPULATION.
What is a measure of effect size?
A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used.
What is a significant result?
A significant result means that the null hypothesis has been rejected, which means that the result is very unlikely to have occurred merely by chance. *It's possible that a sample mean is from the critical region even though there was no effect, but the probability of obtaining a sample mean in the critical region is very small.
What should accompany a hypothesis test?
Because a significant effect doesn't necessarily mean a large effect, it is recommended that the hypothesis test be accompanied by a measure of the effect size.
What is the formula to measure effect size?
Cohen's d is the standardized measure of effect size. -> This is similar to a z-score. -> Cohen's d measures the size of the mean difference in terms of the standard deviation. Cohen's d = (mean difference)/(standard deviation) = (mu treatment - mu no treatment)/(lowercase sigma). Estimated Cohen's d = (mean difference)/(standard deviation) = (M treatment - mu no treatment)/(lowercase sigma). *Cohen's d measures the distance between two means and is typically reported as a positive number even when the formula produces a negative value.
What are the critical regions for Directional Tests (One-Tailed Test)?
Critical regions for directional tests are contained in one tail of the distribution so the proportion is NOT divided into two tails. -> With an alpha level of .05, the entire 5% is located in one tail, so the z-score boundary for alpha = 0.05 is 1.65.
In Step 2: set the criteria for a decision, how is the distribution of sample means divided?
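The one-tailed boundary can also be computed instead of looked up. This is a minimal sketch using Python's standard-library normal distribution; the function name is an assumption for illustration. The computed value is about 1.645, which textbook tables commonly report as 1.65.

```python
from statistics import NormalDist

def one_tailed_critical_z(alpha):
    """z boundary with the entire alpha proportion in the upper tail."""
    return NormalDist().inv_cdf(1 - alpha)

z_boundary = one_tailed_critical_z(0.05)
print(round(z_boundary, 3))
```

For a test that predicts a decrease, the boundary would be the same value with a negative sign.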
Distribution of sample means is divided: - Sample means that are likely if H0 is true (values close to the center). - Sample means that are very unlikely if H0 is true (extreme values). Alpha level (level of significance) is a probability value used to define "very unlikely". Critical region is composed of the extreme sample values that are very unlikely if the null hypothesis is true (as defined by the alpha level). - Boundaries of critical regions are determined by alpha level. - If sample data fall in the critical region, the null hypothesis is rejected. * Figure 8.4
In Figure 8.6, what are the locations of the critical region boundaries for the three different levels of significance: alpha = 0.05, alpha = 0.01, and alpha = 0.001?
Figure 8.6 These are the three most commonly used alpha levels, which establish the "cut off" for making a decision about the null hypothesis (H0). -> alpha = 0.05, z = +/- 1.96 -> alpha = 0.01, z = +/- 2.58 -> alpha = 0.001, z = +/- 3.30
What is an example of rejecting the null hypothesis?
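As a cross-check on the table values, the two-tailed boundaries can be computed by putting alpha/2 in each tail. This is a small sketch with a hypothetical helper name. The computed boundary for alpha = 0.001 is about 3.29, so the 3.30 listed above is presumably a table-rounded value.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Positive z boundary with alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    z = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha}: z = +/- {z:.2f}")
```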
For example, if the power of a test is 70% (1 - Beta), then the probability of a Type II Error must be 30% (Beta).
For example, what occurs while collecting data and computing sample statistics?
For example, we have a population with a mean of mu = 18 and a standard deviation of lowercase sigma = 4. -> We collect a sample of n = 16 with a sample mean of M = 15. 1) Compute a z-score that describes where the sample mean is located. -> z = (M - mu)/(lowercase sigma M) -> We need to compute standard error. --> lowercase sigma M = lowercase sigma/square root(n) --> =4 /square root(16) = 1 2) Compute the z-score. -> z = (M - mu)/(lowercase sigma M) -> z = (15-18)/1 = -3.00
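The same arithmetic can be checked in a few lines of code. This is a minimal sketch using the numbers from the example (mu = 18, sigma = 4, n = 16, M = 15); the function name is an assumption for illustration.

```python
import math

def z_for_sample_mean(M, mu, sigma, n):
    """Locate a sample mean in the distribution of sample means."""
    standard_error = sigma / math.sqrt(n)  # sigma M = sigma / sqrt(n)
    return (M - mu) / standard_error

z = z_for_sample_mean(M=15, mu=18, sigma=4, n=16)
print(z)  # -3.0
```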
What does it mean that hypothesis testing is an inferential process?
Hypothesis testing is an inferential process. - Uses limited information to reach a general conclusion. - Sample data is used to draw a conclusion about the population. - Samples are usually representative of the population, but there is a chance that the sample is misleading and an incorrect conclusion could be made. So, ERRORS ARE POSSIBLE!
In Figure 8.11 and Table 8.2, evaluate the effect size with Cohen's d.
Figure 8.11 shows the appearance of a 15-point treatment effect in two different situations. In part (a), the standard deviation is lowercase sigma = 100 and the 15-point effect is relatively small. In part (b), the standard deviation is lowercase sigma = 15 and the 15-point effect is relatively large. * Cohen's d uses the standard deviation to help measure the effect. Table 8.2 gives evaluations of effect size based on Cohen's d. As the magnitude of d increases, the evaluation of the effect size increases.
What is an example of defining critical regions?
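The two situations in Figure 8.11 can be compared numerically. This is a short sketch, assuming Cohen's d is computed as the mean difference divided by the standard deviation and reported as a positive number; the function name is hypothetical.

```python
def cohens_d(mean_difference, sigma):
    """Mean difference in standard-deviation units, reported as positive."""
    return abs(mean_difference) / sigma

d_part_a = cohens_d(mean_difference=15, sigma=100)  # part (a)
d_part_b = cohens_d(mean_difference=15, sigma=15)   # part (b)
print(d_part_a, d_part_b)  # 0.15 1.0
```

The same 15-point difference gives a much larger d when the scores are less spread out.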
In Figure 8.5, the alpha level of 0.05 separates the extreme 5% from the middle 95%. -> 5% split between two tails so 2.5% in each tail. -> Use the Unit Normal Table to look up the proportion of 0.025 in the tail column. -> z = 1.96. Thus, for any normal distribution the extreme 5% is in the tails beyond z = +1.96 and z = -1.96. -> These values define the boundaries of the critical region for a hypothesis test using an alpha level of 0.05.
What occurs in a One-Tailed Hypothesis Testing?
In a Directional Hypothesis Test, or a One-Tailed Hypothesis Test, the statistical hypotheses (H0 and H1) specify either an increase or a decrease in the population mean. Thus, a statement is made about the direction of the effect. Ex: Are Nobel-Prize winners smarter than the average person? H0: Nobel-Prize winners are not smarter. H1: Nobel-Prize winners are smarter.
In hypothesis testing, what are the two different kinds of errors that can be made?
In hypothesis testing, there are two different kinds of errors that can be made: - Type I Error - Type II Error
What is an example of the results of a study in a scientific journal?
In the literature, if you were reading the results of a study in a scientific journal, you might read something like: the treatment with alcohol had a significant effect on the birth weight of newborn rats, z = 3.00, p < 0.05. -> z = 3.00 indicates that a z-score was used as the test statistic to evaluate the sample data and that its value was 3.00. -> p < 0.05 specifies the alpha level, as well as the probability of a Type I Error. The researcher is reporting that the treatment had an effect, but is admitting that this could be a false report. The probability of obtaining a sample mean in the critical region is VERY small if there's no treatment effect (p less than 0.05, must have p < 0.05 to be statistically significant).
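To see why z = 3.00 supports the statement p < 0.05, the tail probability can be computed directly. The sketch below assumes a two-tailed test, which the journal example does not state explicitly; the function name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a z at least this extreme in either tail if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tailed_p(3.00)
print(f"p = {p:.4f}")  # about 0.0027, well below 0.05
```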
What does a hypothesis test allow researchers to do?
It allows researchers to draw inferences about the population of interest. This is one of the most commonly used inferential procedures, very important throughout the remainder of the class.
In Figure 8.10, describe the critical region.
Notice that the z-score is smaller for a one-tailed test, so you can reject the null hypothesis when the difference between the sample and population is relatively small.
When is power calculated during a research study?
Power is calculated BEFORE a researcher conducts a research study, so assumptions are made about the size of the treatment effect.
What do researchers do with sample data?
Researchers often find meaningful patterns in sample data.
In Figure 8.12, calculate the power of the test.
See p. 267.
What are the four steps of a Hypothesis Test?
Step 1) State the hypothesis Step 2) Set the criteria for a decision Step 3) Collect data and compute sample statistics Step 4) Make a decision
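The four steps can be sketched as one small function. This is an illustration only: it assumes a two-tailed z-test with a known population sigma, and the sample scores are made up so that M = 15 with n = 16. The function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample, mu, sigma, alpha=0.05):
    """Two-tailed z-test following the four steps."""
    # Step 1: H0 says the treated population mean is still mu; H1 says it is not.
    # Step 2: set the criteria (critical region boundaries) from alpha.
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: collect data and compute the sample statistics.
    n = len(sample)
    M = sum(sample) / n
    z = (M - mu) / (sigma / math.sqrt(n))
    # Step 4: make a decision.
    decision = "reject H0" if abs(z) > critical_z else "fail to reject H0"
    return M, z, decision

# Hypothetical scores (n = 16, mean = 15)
scores = [11, 13, 14, 15, 15, 16, 17, 19] * 2
M, z, decision = z_test(scores, mu=18, sigma=4)
print(M, z, decision)  # 15.0 -3.0 reject H0
```

Setting alpha inside the function before the data are summarized mirrors the rule that the criteria are fixed before looking at the sample.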
What is the bottom line of Type I Error and Type II Error?
Table 8.1 shows the possible outcomes of a statistical decision. Type I Error: -> The researcher concludes that the treatment has an effect, but it really has no effect. -> FALSE POSITIVE Type II Error: -> The researcher concludes that the treatment has no effect, but it really has an effect. -> FALSE NEGATIVE
How is the alpha level selected?
The alpha level: - Sets the boundaries for the critical region. - Determines the probability of Type I Error *The largest permissible value is an alpha level of 0.05. -> When there is no treatment effect, an alpha of 0.05 means that there is still a 5% risk, or a 1-in-20 probability, of rejecting the null hypothesis and committing a Type I Error. -> Lowering the alpha level reduces the probability of Type I Error BUT it also pushes the critical region farther out, making it more difficult to reach. ** Have to maintain balance! Alpha levels of 0.05, 0.01, and 0.001 are considered good values to use!
In Figure 8.2, what is the basic experimental situation for hypothesis testing?
The goal of hypothesis testing is to determine whether the treatment has any effect on the individuals in the population, but we can't treat the entire population. We must obtain a sample. *This is the basic experimental situation for hypothesis testing. It is assumed that the parameter mu is known for the population before treatment. The purpose of the experiment is to determine whether the treatment has an effect on the population mean.
What is power of a statistical test?
The power of a statistical test is the probability that the test will correctly reject a false null hypothesis. -> That is, power is the probability that the test will identify a treatment effect if one really exists. - Power is an alternative measure to effect size.
What is the probability of rejecting the null hypothesis?
When a treatment effect really exists, there are only two possible outcomes of a hypothesis test: either fail to reject H0 (a Type II Error) or reject H0 (a correct decision). -> Because there are only two outcomes, their probabilities must add up to 1. -> The probability of committing a Type II Error is denoted by the Greek letter Beta. -> Therefore, the probability of correctly rejecting the null hypothesis is 1 - Beta (power).
How are the boundaries for the critical regions defined?
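The relationship between power and Beta can be sketched with a small calculation. All numbers here are hypothetical assumptions (an untreated mean of 80, sigma = 10, n = 25, and an assumed 8-point treatment effect), and the function name is made up for illustration. The idea is to find the critical region under H0, then ask what proportion of the treated distribution of sample means falls inside it.

```python
import math
from statistics import NormalDist

def power_two_tailed(mu_null, mu_treated, sigma, n, alpha=0.05):
    """Probability that a sample mean from the treated population
    lands in the two-tailed critical region defined under H0."""
    standard_error = sigma / math.sqrt(n)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    upper = mu_null + critical_z * standard_error
    lower = mu_null - critical_z * standard_error
    # Distribution of sample means if the assumed effect is real
    treated = NormalDist(mu_treated, standard_error)
    return (1 - treated.cdf(upper)) + treated.cdf(lower)

power = power_two_tailed(mu_null=80, mu_treated=88, sigma=10, n=25)
beta = 1 - power  # probability of a Type II Error
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Because the size of the effect has to be assumed in advance, the result is only as good as that assumption.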
To determine the exact location for the boundaries that define the critical regions, we use the alpha-level probability and the Unit Normal Table. Most distributions of sample means are normal, so we can use the unit normal table to look up precise z-score locations for the critical region boundaries.
What is a statistical technique that researchers use?
To differentiate between real, systematic patterns and random, chance occurrences, researchers use a statistical technique called hypothesis testing.
What is the most common procedure for hypothesis testing?
Two-Tailed Hypothesis Testing means that the critical region is divided between two tails of the distribution. * This is the most common procedure for hypothesis testing.
In Step 4: make a decision, what does the z-score value obtained help us to do?
We use the z-score value obtained in step 3 to make a decision about the null hypothesis according to the criteria established in step 2.
What occurs in a Type II Error?
Whenever a researcher FAILS TO REJECT THE NULL HYPOTHESIS, there is a risk for a Type II Error. -> A Type II Error occurs when a researcher fails to reject a null hypothesis that is really false. -> A Type II Error means that the hypothesis test has failed to detect a real treatment effect. -> This occurs when the sample mean isn't in the critical region, even though the treatment had an effect on the sample. -> Often happens when the effect of the treatment is relatively small!