Exam 3 Statistics Psychology
What is the null hypothesis for a related-samples test?
H0: μD = 0
Descriptive Statistics and the Hypothesis Test
Often, a close look at the sample data from a research study makes it easier to see the size of the treatment effect and to understand the outcome of the hypothesis test.
The Final Formula and Degrees of Freedom
The degrees of freedom for the independent-measures t statistic are determined by the df values for the two separate samples: df for the t statistic = (n1 - 1) + (n2 - 1) = n1 + n2 - 2
Which of the following sets of data is least likely to reject the null hypothesis in a test with the independent-measures t statistic? Assume that other factors are held constant.
n = 15 and SS = 375 for both samples
Which value is not included in the calculation of an estimated Cohen's d?
n
Uncertainty and Errors in Hypothesis Testing
Hypothesis testing is an inferential process, which means that it uses limited information as the basis for reaching a general conclusion. Specifically, a sample provides only limited or incomplete information about the whole population, and yet a hypothesis test uses a sample to draw a conclusion about the population. In this situation, there is always the possibility that an incorrect conclusion will be made.
Chapter 8 The Hypothesis Testing Step 2
The α level establishes a criterion, or "cut-off", for making a decision about the null hypothesis. The alpha level also determines the risk of a Type I error. α = .01, α = .05 (most used), α = .001 The critical region consists of outcomes that are very unlikely to occur if the null hypothesis is true. That is, the critical region is defined by sample means that are almost impossible to obtain if the treatment has no effect.
Assumptions Underlying the Independent-Measures t Formula
Assumptions underlying the independent-measures t formula: 1. The observations within each sample must be independent. 2. The two populations from which the samples are selected must be normal. 3. The two populations from which the samples are selected must have equal variances. The first two assumptions should be familiar from the single-sample t hypothesis test.
What is the primary concern when selecting an alpha value?
To minimize Type I errors
Assuming a normal distribution, which of the following would call for a one-tailed hypothesis test rather than a two-tailed test?
Determining if driving a red car increases the number of speeding tickets per year
For an independent-measures research study, the two sample means are found to be M1 = 15.5 and M2 = 17. If the pooled variance is 1.23, what would be the reported value of Cohen's d, and how would the effect size be described?
1.35, which demonstrates a large effect size
For a repeated-measures study comparing two treatments with 12 scores in each treatment, what is the df value for the t statistic?
11
The results of a hypothesis test are reported as follows: "t(35) = 1.65, p < .05." Based on this report, how many individuals were in the sample?
36
One sample from an independent-measures study has n = 16 with a variance of s2 = 65. The other sample has n = 25 and s2 = 70. What is the df value for the t statistic?
39
Type I Errors
A Type I error occurs when a researcher rejects a null hypothesis that is actually true. In a typical research situation, a Type I error means the researcher concludes that a treatment does have an effect when in fact it has no effect. A Type I error occurs when a researcher unknowingly obtains an extreme, nonrepresentative sample. Fortunately, the hypothesis test is structured to minimize the risk that this will occur. The alpha level for a hypothesis test is the probability that the test will lead to a Type I error. That is, the alpha level determines the probability of obtaining sample data in the critical region even though the null hypothesis is true.
For a repeated-measures t test, the cutoff value for α = _____ using a one-tailed test is the same as the cutoff value for α = _____ using a two-tailed test.
0.025, 0.05
An independent-measures study with n = 8 in each treatment produces M = 75 for the first treatment and M = 71 for the second treatment with a pooled variance of 9. Construct a 95% confidence interval for the population mean difference.
0.7825 < μ1 − μ2 < 7.2175
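A quick check of this interval in Python (a sketch; the critical value t = 2.145 comes from the t distribution table for df = n1 + n2 − 2 = 14, two-tailed α = .05, and the function name is mine, not from the text):

```python
import math

def independent_ci(m1, m2, pooled_var, n1, n2, t_crit):
    """CI for mu1 - mu2: (M1 - M2) +/- t * estimated standard error."""
    se = math.sqrt(pooled_var / n1 + pooled_var / n2)
    diff = m1 - m2
    return diff - t_crit * se, diff + t_crit * se

# n = 8 per group, M1 = 75, M2 = 71, pooled variance = 9, t(14) = 2.145
low, high = independent_ci(75, 71, 9, 8, 8, 2.145)  # about 0.7825 and 7.2175
```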
If exactly 5% of the t distribution is located in the tail beyond t = 2.353, how many degrees of freedom are there?
3
For a sample size of 9 with SS = 100 and sample mean M = 18.2, what is the probability that the true population mean will be between 16.56 and 19.84?
80%
Measuring Effect Size
A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used. Cohen's d measures the size of the mean difference in terms of the standard deviation. Cohen's d = mean difference/standard deviation
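As a minimal sketch, the formula can be computed directly (the 0.2 / 0.5 / 0.8 benchmarks are Cohen's conventional labels for small, medium, and large effects):

```python
def cohens_d(mean_difference, standard_deviation):
    """Cohen's d = mean difference / standard deviation."""
    return mean_difference / standard_deviation

# Single-sample example: mu = 40, M = 45, sigma = 6.25
d = cohens_d(45 - 40, 6.25)  # 0.8, conventionally a large effect
```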
Which of the following is not an advantage of a repeated-measures study over an independent-measures study?
A repeated-measures design can be used on populations with high variances.
A study is conducted to see if teenagers drive at faster average speeds than the general population of drivers. The average speed of the driving population is 35 mph. The null hypothesis is H0: μaverage driving speed of teenagers = 35 mph. What is the alternative hypothesis?
H1: μdriving speed of teenagers ≠ 35 mph
Which of the following will cause a researcher the most problems when trying to demonstrate statistical significance using a two-tailed independent-measures t test?
High Variance
How is an independent-measures design different from a study that makes inferences about the population mean from a sample mean?
In an independent-measures design, there are two independent samples that are compared to one another.
Hypothesis Tests with the t Statistic
In the hypothesis-testing situation, we begin with a population with an unknown mean and an unknown variance, often a population that has received some treatment. The null hypothesis states that the treatment has no effect; specifically, H0 states that the population mean is unchanged. Thus, the null hypothesis provides a specific value for the unknown population mean. The sample data provide a value for the sample mean. The variance and estimated standard error are computed from the sample data.
A researcher failed to reject the null hypothesis with a two-tailed test using α = .05. If the researcher had used the same data with a one-tailed test, what can we conclude?
It is impossible to tell whether or not the researcher would reject the null hypothesis using a one-tailed test.
Measuring Effect Size for the Independent-Measures t
One technique for measuring effect size is Cohen's d, which produces a standardized measure of mean difference. In the context of an independent-measures research study, the difference between the two sample means (M1 − M2) is used as the best estimate of the mean difference between the two populations, and the pooled standard deviation (the square root of the pooled variance) is used to estimate the population standard deviation. The independent-measures t hypothesis test also allows for measuring effect size by computing the percentage of variance accounted for, r2. The calculation of r2 for the independent-measures t is exactly the same as it was for the single-sample t.
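A sketch of these two effect-size measures for the independent-measures case (function names are mine, not from the text):

```python
import math

def pooled_variance(ss1, ss2, n1, n2):
    """Pooled variance = (SS1 + SS2) / (df1 + df2)."""
    return (ss1 + ss2) / (n1 + n2 - 2)

def cohens_d_independent(m1, m2, pooled_var):
    """d = (M1 - M2) / pooled standard deviation."""
    return (m1 - m2) / math.sqrt(pooled_var)

def r_squared(t, df):
    """Percentage of variance accounted for: r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

d = cohens_d_independent(17, 15.5, 1.23)  # about 1.35, a large effect
r2 = r_squared(2.5, 25)                   # 0.2
```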
Which of the following is not an assumption for hypothesis tests with z-scores?
Small standard deviations
A random sample is normally distributed. If all values in the sample and all values in the population are multiplied by 2, what is the impact on Cohen's d?
Stays the same
Comparison of One-Tailed vs. Two-Tailed Tests
The major distinction between one-tailed and two-tailed tests is in the criteria they use for rejecting H0. A one-tailed test allows you to reject the null hypothesis when the difference between the sample and the population is relatively small, provided the difference is in the specified direction. A two-tailed test requires a relatively large difference independent of direction.
Assuming all other factors stay the same, what happens to the proportion of the data in both tails as the degrees of freedom increases with a t statistic?
The proportion in the two tails combined decreases.
A researcher is conducting a directional (one-tailed) test with a sample of n = 10 to evaluate the effect of a treatment that is predicted to increase scores. If the researcher obtains t = 2.770, then what decision should be made?
The treatment has a significant effect with α = .05 but not with α = .01.
A researcher is looking at the impact that television has on children. Children are placed in a room with a variety of toys and a television playing a cartoon. The researcher predicts that the children will spend more than half of their 30 minutes looking at the television. The researcher tested 15 children and found a sample mean of M = 17 minutes spent watching the television with SS = 79. In order to test this hypothesis, what does the researcher need?
A one-tailed t statistic
Comparing Repeated- and Independent-Measures Designs
A repeated-measures design typically requires fewer subjects than an independent-measures design. The repeated-measures design is especially well suited for studying learning, development, or other changes that take place over time. The primary advantage of a repeated-measures design is that it reduces or eliminates problems caused by individual differences. Individual differences are characteristics such as age, IQ, gender, and personality that vary from one individual to another. These individual differences can influence the scores obtained in a research study, and they can affect the outcome of a hypothesis test.
The t Distribution
A t distribution is the complete set of t values computed for every possible random sample for a specific sample size (n) or a specific degrees of freedom (df). The t distribution approximates the shape of a normal distribution. The exact shape of a t distribution changes with degrees of freedom.
What is the advantage of a repeated-subject research study?
All of the above: It uses exactly the same individuals in all treatment conditions; There is no risk that the participants in one treatment are substantially different from the participants in another; A smaller number of subjects is required.
Which of the following is an advantage that an independent-measures study has over a repeated-measures study?
An independent-measures design can eliminate time-related factors.
Which of the following explains why it is easier to reject the null hypothesis with a one-tailed test than with a two-tailed test with all the same parameters?
Because the critical region is all on one side in a one-tailed test and needs to be split between the two tails in a two-tailed test
Confidence Intervals
For the independent-measures t, we use a sample mean difference, M1 − M2, to estimate the population mean difference, µ1 − µ2. The first step is to solve the t equation for the unknown parameter. For the independent-measures t statistic, we obtain µ1 − µ2 = (M1 − M2) ± t·s(M1−M2). The values for M1 − M2 and for s(M1−M2) are obtained from the sample data. Although the value for the t statistic is unknown, we can use the degrees of freedom for the t statistic and the t distribution table to estimate the t value. Using the estimated t and the known values from the sample, we can compute the value of µ1 − µ2.
The Matched-Subjects Design
In a matched-subjects study, each individual in one sample is matched with an individual in the other sample. The matching is done so that the two individuals are equivalent (or nearly equivalent) with respect to a specific variable that the researcher would like to control.
Which of the following is a problem with using the z-score statistic?
It requires knowing the population variance, which is often difficult to obtain.
Determining Proportions and Probabilities for t Distributions
Just as we used the unit normal table to locate proportions associated with z-scores, we use a t distribution table to find proportions for t statistics. A close inspection of the t distribution table will demonstrate that, as the value for df increases, the t distribution becomes more similar to a normal distribution.
A treatment is administered to a sample selected from a population with a mean of μ = 40 and a standard deviation of σ = 6.25. After treatment, the sample mean is M = 45. Based on this information, the effect size as measured by Cohen's d can be classified as which of the following?
Large effect
For a two-tailed independent-measures t test with sample sizes 7 and 5, the 99% confidence interval for the population mean difference is -1 < μ1 − μ2 < 9. If M1 = 31, find M2 and s(M1−M2).
M2 = 27 (the sample mean difference, 4, is the midpoint of the interval), s(M1−M2) = 1.5778
Hypothesis Tests with the Independent-Measures t Statistic
Make a decision. If the t statistic ratio indicates that the obtained difference between sample means (numerator) is substantially greater than the difference expected by chance (denominator), we reject H0 and conclude that there is a real mean difference between the two populations or treatments.
A researcher obtains an independent-measures t statistic of t = 2.12 for a study comparing two treatments with a sample of n1 = 14 in one treatment and n2 = 10 in the other treatment. What is the correct decision for a regular one-tailed hypothesis test?
Reject the null hypothesis with α = .05 but not with α = .01.
Difference Scores: The Data for a Repeated-Measures Study
The difference score for each individual is computed as D = X2 − X1, where X1 is the person's score in the first treatment and X2 is the score in the second treatment.
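With hypothetical scores, the difference scores and their mean can be computed as:

```python
# X1: scores in the first treatment; X2: scores in the second (hypothetical data)
x1 = [10, 12, 9, 15]
x2 = [13, 12, 14, 17]

d_scores = [b - a for a, b in zip(x1, x2)]  # D = X2 - X1 for each person
mean_d = sum(d_scores) / len(d_scores)      # MD, the sample mean difference
```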
Factors That Influence a Hypothesis Test
The final decision in a hypothesis test is determined by the value obtained for the z-score statistic. Two factors help determine whether the z-score will be large enough to reject H0. In a hypothesis test, higher variability can reduce the chances of finding a significant treatment effect. Increasing the number of scores in the sample produces a smaller standard error and a larger value for the z-score.
Assumptions of the Related-Samples t Test
The related-samples t statistic requires two basic assumptions. 1. The observations within each treatment condition must be independent. Notice that the assumption of independence refers to the scores within each treatment. 2. The population distribution of difference scores (D values) must be normal.
Homogeneity of Variance
The third assumption is referred to as homogeneity of variance and states that the two populations being compared must have the same variance. Homogeneity of variance is most important when there is a large discrepancy between the sample sizes. Hartley's F-max test provides a method for determining whether the homogeneity of variance assumption has been satisfied. The F-max test is based on the principle that a sample variance provides an unbiased estimate of the population variance. The null hypothesis for this test states that the population variances are equal; therefore, the sample variances should be very similar.
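A minimal sketch of the F-max computation (the ratio is then compared against the critical value in an F-max table, which is not reproduced here):

```python
def f_max(sample_variances):
    """Hartley's F-max: largest sample variance divided by the smallest."""
    return max(sample_variances) / min(sample_variances)

# Sample variances of 65 and 70 give a ratio near 1,
# consistent with the homogeneity of variance assumption.
ratio = f_max([65, 70])  # about 1.08
```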
What is a Type I error?
When a researcher rejects a null hypothesis that is actually true.
An independent-measures design is also known as a _______________ design.
between-subjects
When n is especially small, the t distribution is __________ and _______________.
flatter, more spread out
Which of the following is not an assumption for using the independent-measures t formula?
homogeneity of population means
A repeated-measures study with a sample of n = 10 participants produces a mean difference of MD = 4.1 points with SS = 810 for the difference scores. For these data, find the variance for the difference scores and the estimated standard error for the sample mean.
s2 = 90, sMD = 3
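The two steps in this answer can be checked directly (a sketch, using s2 = SS / (n − 1) and sMD = sqrt(s2 / n)):

```python
import math

s2 = 810 / (10 - 1)      # variance of the difference scores: SS / df
se = math.sqrt(s2 / 10)  # estimated standard error of MD
```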
A sample is selected from a population and a treatment is administered to the sample. If there is a 3-point difference between the sample mean and the original population mean, which set of sample characteristics has the greatest likelihood of rejecting the null hypothesis?
s^2 = 4 for a sample with n = 50
To evaluate the effect of a treatment, a sample is obtained from a population with a mean of μ = 31, and the treatment is administered to the individuals in the sample. After a treatment, the sample mean is found to be M = 32.7 with a sample variance of s2 = 4. If the sample size is n = 9, what is the t statistic, and is the data sufficient to conclude that the treatment increased the scores significantly? Use a one-tailed test and α = .01.
t = 2.55, which is not sufficient to reject the null hypothesis
To evaluate the effect of a treatment, a sample is obtained from a population with a mean of μ = 25, and the treatment is administered to the individuals in the sample. After treatment, the sample mean is found to be M = 27.4 with SS = 64. If the sample consists of 9 individuals, what is the t statistic, and are the data sufficient to conclude that the treatment has a significant effect using a two-tailed test with α = .05?
t = 2.55, yes
For a repeated-measures study comparing two treatments with a sample of n = 16 participants, a researcher obtains a sample mean difference of MD = 3.3 with SS = 315 for the difference scores. Calculate the repeated-measures t statistic for these data using a two-tailed test, and determine if it is enough to reject the null hypothesis.
t = 2.88; with df = 15 and a two-tailed α = .05 (critical t = ±2.131), this is sufficient to reject the null hypothesis
Using a two-tailed independent-measures t test, a researcher determines that the probability of the population mean difference falling between 1.23 and 4.22 is 95%. What alpha value did the researcher use?
α = 0.05
A repeated-measures t test is performed, and the 95% confidence interval due to the treatment is 2.2 < μD < 6.4. Which of the following is true? I. For the general population, the treatment will result in an increase of score between 2.2 and 6.4. II. We are 95% confident that the true mean difference is in this interval. III. The sample mean difference was 4.3.
I, II, and III
Chapter 8 Hypothesis Testing Steps
1. State hypothesis about the population. 2. Use hypothesis to predict the characteristics the sample should have. 3. Obtain a sample from the population. 4. Compare data with the hypothesis prediction.
Which of the following correctly describes the effect that decreasing sample size and decreasing the standard deviation have on the power of a hypothesis test?
A decrease in sample size will decrease the power, but a decrease in standard deviation will increase the power.
Repeated-Measures Designs
A repeated-measures design, or a within-subject design, is one in which the dependent variable is measured two or more times for each individual in a single sample. The same group of subjects is used in all of the treatment conditions. The main advantage of a repeated-measures study is that it uses exactly the same individuals in all treatment conditions. There is no risk that the participants in one treatment are substantially different from the participants in another.
In an independent measures test, a researcher is looking at the average height of Americans versus the average height of Australians. If μ1 and μ2 are the two population mean heights, what is the null hypothesis?
H0: μ1 − μ2 = 0
Why might a repeated-measures study require half the number of subjects compared to a similar matched-subjects study with the same number of scores?
In the repeated-measures study, each subject could be measured twice.
For a repeated-measures study comparing two treatments with n = 26 scores in each treatment, the data produce t = 2.13. If the mean difference is in the direction that is predicted by the researcher, then which of the following is the correct decision for a hypothesis test with α = .05?
Reject H0 for either a one-tailed test or a two-tailed test.
A researcher obtains an independent-measures t statistic of t = 3.01 for a study comparing two treatments with a sample of n = 16 in each treatment. What is the correct decision for a regular two-tailed hypothesis test?
Reject the null hypothesis with α = .05 or with α = .01.
Chapter 8 The Hypothesis Testing Step 1
State the hypothesis about the unknown population. The null hypothesis, H0, states that there is no change in the general population before and after an intervention. In the context of an experiment, H0 predicts that the independent variable had no effect on the dependent variable. The alternative hypothesis, H1, states that there is a change in the general population following an intervention. In the context of an experiment, H1 predicts that the independent variable did have an effect on the dependent variable.
A research study measures the effect of age on the average driving speed by selecting a sample of teenage drivers and a sample of drivers between the ages of 20 and 25. Why is this an example of an independent-measures design?
There is one measure being taken for two non-overlapping samples.
A repeated-measures study with n = 26 participants produces a mean difference of MD = 3 points, SS = 500 for the difference scores, and t = 2.50. Calculate Cohen's d and r2 to measure the effect size for this study.
d = 0.67, r2 = 0.2
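Checking this answer step by step (a sketch using the formulas d = MD / s and r2 = t2 / (t2 + df)):

```python
import math

s = math.sqrt(500 / (26 - 1))      # standard deviation of the difference scores
d = 3 / s                          # Cohen's d = MD / s, about 0.67
r2 = 2.50 ** 2 / (2.50 ** 2 + 25)  # df = n - 1 = 25, giving r^2 = 0.2
```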
Assumptions of the t Test
1. The values in the sample must consist of independent observations. In everyday terms, two observations are independent if there is no consistent, predictable relationship between the first observation and the second. 2. The population sampled must be normal. This assumption is a necessary part of the mathematics underlying the development of the t statistic and the t distribution table. However, violating this assumption has little practical effect on the results obtained for a t statistic, especially when the sample size is relatively large.
A random sample is selected from a normal population with a mean of μ = 200 and a standard deviation of σ = 12. After a treatment is administered to the individuals in the sample, the sample mean is found to be M = 196. How large a sample is necessary for this sample mean to be statistically significant using a two-tailed test with α = .05?
35
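The answer can be derived by solving |z| = (M − μ) / (σ / √n) for n (a sketch; 1.96 is the two-tailed .05 cutoff from the unit normal table):

```python
import math

z_crit, sigma, mean_diff = 1.96, 12, 4      # cutoff, sigma, |M - mu|
n = math.ceil((z_crit * sigma / mean_diff) ** 2)  # smallest n with |z| >= 1.96
```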
A random sample of n = 30 individuals is selected from a population with μ = 15, and a treatment is administered to each individual in the sample. After treatment, the sample mean is found to be M = 23.1 with SS = 400. In order to determine if the treatment had a significant effect, which of the following can we use?
A t statistic. There is not enough information to use a z-score.
In a matched-subjects design, how many variables can subjects be matched on?
All of the above: 1, 2, and 3
The Hypothesis Testing Step 3
Compare the sample means (data) with the null hypothesis. Compute the test statistic. The test statistic (z-score) forms a ratio comparing the obtained difference between the sample mean and the hypothesized population mean versus the amount of difference we would expect without any treatment effect (the standard error).
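The ratio described above, sketched as a small helper (the example numbers are hypothetical):

```python
import math

def z_statistic(m, mu, sigma, n):
    """z = (sample mean - hypothesized mean) / standard error."""
    return (m - mu) / (sigma / math.sqrt(n))

z = z_statistic(196, 200, 12, 36)  # -2.0, beyond the -1.96 cutoff for alpha = .05
```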
Which of the following is not an assumption for hypothesis testing using the t statistic?
The sample size must be greater than 30
In a normal sample distribution with n = 16, the null hypothesis is rejected. If the sample size is changed to 64 with all other factors staying the same, what happens to the z-score and the decision about the null hypothesis?
The z-score is doubled, and the null hypothesis is still rejected.
What is the purpose of matching subjects on a variety of variables in a matched-subjects design?
To reduce or eliminate the effect of these variables on the variable that is being studied
Confidence Intervals and Hypothesis Tests
In addition to describing the size of a treatment effect, estimation can be used to get an indication of the significance of the effect.
The t Statistic
The t statistic is used to test hypotheses about an unknown population mean, μ, when the value of σ is unknown. The formula for the t statistic has the same structure as the z-score formula, except that the t statistic uses the estimated standard error in the denominator. Degrees of freedom describe the number of scores in a sample that are independent and free to vary. Because the sample mean places a restriction on the value of one score, there are n - 1 degrees of freedom for a sample with n scores.
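The same structure as the z-score formula, with the estimated standard error in the denominator (a sketch):

```python
import math

def t_statistic(m, mu, s2, n):
    """t = (M - mu) / estimated standard error, where SE = sqrt(s^2 / n)."""
    return (m - mu) / math.sqrt(s2 / n)

t = t_statistic(32.7, 31, 4, 9)  # about 2.55, with df = n - 1 = 8
```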
Two samples are taken from separate populations. The sample and population mean from one group are M1 = 16 and μ1 = 14. The sample and population mean from the second group are M2 = 21 and μ2 = 13. If the estimated standard error is s(M1 − M2) = 2, what is the t statistic?
-3
A researcher expects a treatment to produce a decrease in the population mean. The treatment is evaluated using a one-tailed hypothesis test. Which z-scores would lead us to reject the null hypothesis with α = .05? I. z = -1.75 II. z = 1.75 III. z = -1.6 IV. z = 1.6
I only
A researcher is evaluating the influence of a treatment using a sample selected from a normally distributed population with a mean of μ = 30 and a standard deviation of σ = 3. The researcher expects a 1-point treatment effect and plans to use a two-tailed hypothesis test with α = 0.05. Compute the power of the test if the researcher uses n = 9 individuals.
17%
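This power value can be reproduced with the standard normal CDF (a sketch; 1.96 is the two-tailed .05 cutoff, and with σ = 3 and n = 9 the standard error is 1, so a 1-point effect shifts the sampling distribution by one standard error):

```python
import math

def phi(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

shift, z_crit = 1.0, 1.96
# Probability the sample mean lands in either critical region
power = (1 - phi(z_crit - shift)) + phi(-z_crit - shift)  # about 0.17
```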
More about Hypothesis Tests
A result is said to be significant or statistically significant if it is very unlikely to occur when the null hypothesis is true. That is, the result is sufficient to reject the null hypothesis. Thus, a treatment has a significant effect if the decision from the hypothesis test is to reject H0.
The Unknown Population
Although the t statistic can be used in the "before and after" type of research, it also permits hypothesis testing in situations for which you do not have a known population mean to serve as a standard. The t test does not require prior knowledge about the population mean or the population variance. All you need to compute a t statistic is a null hypothesis and a sample from the unknown population.
Factors that Affect Statistical Power
As the effect size increases, the probability of rejecting H0 also increases, which means that the power of the test increases. One factor that has a huge influence on power is the size of the sample. Reducing the alpha level for a hypothesis test also reduces the power of the test. If the treatment effect is in the predicted direction, changing from a two-tailed test to a one-tailed test increases power.
If all other factors are held constant, which of the following is true if the sample sizes and the sample variance increase in a one-tailed independent-measures t test?
Because the increase in sample sizes has little or no effect on effect size and the increase in sample variance causes a decrease in effect size, the combined effect will be a decrease in effect size.
Why does a change in sample sizes have little or no effect on Cohen's d in an independent-measures t statistic?
Because the sample size occurs in the numerator as part of the difference in sample means and in the denominator as part of the pooled variance, and the effect of one is virtually cancelled out by the effect of the other.
Which of the following is a common limitation of hypothesis testing?
Both a and b: Conclusions are made about the data set rather than about the hypothesis itself; Demonstrating a significant treatment effect does not necessarily indicate a substantial treatment effect.
Which of the following is an assumption for a related-samples t statistic?
Both a and b: The observations within each treatment condition must be independent; The population distribution of the difference scores must be normal.
What would be the result of setting an alpha level extremely small?
Both a and b: There would be almost no risk of a Type I error; It would be very difficult to reject the null hypothesis.
Which of the following is not a step in a hypothesis test?
If the sample data is not located in the critical region, we accept the null hypothesis.
Repeated Measures and Matched-Subjects Designs
In a repeated-measures design or a matched-subjects design comparing two treatment conditions, the data consist of two sets of scores, which are grouped into sets of two, corresponding to the two scores obtained for each individual or each pair of subjects. Because the scores in one set are directly related, one-to-one, with the scores in the second set, the two research designs are statistically equivalent and share the common name related-samples designs (or correlated-samples designs).
Hypothesis Tests for the Repeated-Measures Design
In a repeated-measures study, each individual is measured in two different treatment conditions and we are interested in whether there is a systematic difference between the scores in the first treatment condition and the scores in the second treatment condition. A difference score is computed for each person. The hypothesis test uses the difference scores from the sample to evaluate the overall mean difference, µD, for the entire population. The hypothesis test with the repeated-measures t statistic follows the same four-step process that we have used for other tests. 1. State the hypotheses, and select the alpha level. 2. Locate the critical region. 3. Calculate the t statistic. 4. Make a decision.
A sample is selected from a population with μ = 50, and a treatment is administered to the sample. If the sample variance is s2 = 121, which set of sample characteristics has the greatest likelihood of rejecting the null hypothesis?
M = 45 for a sample size of n = 75
The Null Hypothesis and the Independent-Measures t Statistic
The goal of an independent-measures research study is to evaluate the mean difference between two populations (or between two treatment conditions). Using subscripts to differentiate the two populations, the mean for the first population is µ1, and the second population mean is µ2. The difference between means is simply µ1 − µ2. As always, the null hypothesis states that there is no change, no effect, or no difference. The null hypothesis for the independent-measures test: H0: µ1 − µ2 = 0 (No difference between the population means) The alternative hypothesis states that there is a mean difference between the two populations: H1: µ1 − µ2 ≠ 0 (There is a mean difference.) The alternative hypothesis can simply state that the two population means are not equal: µ1 ≠ µ2.
Hypothesis Tests with the Independent-Measures t Statistic
The independent-measures t statistic uses the data from two separate samples to help decide whether there is a significant mean difference between two populations or between two treatment conditions. 1. State the hypotheses and select the alpha level. 2. Compute the df for an independent-measures design. 3. Obtain the data and compute the test statistic. 4. Make a decision.
Effect Size and Confidence Intervals for the Repeated-Measures t
The most commonly used measures of effect size are Cohen's d and r2, the percentage of variance accounted for. Estimated d = MD / s. r2 = t2 / (t2 + df). The size of the treatment effect can also be described with a confidence interval estimating the population mean difference, µD: µD = MD ± t·sMD.
A research report summarizes the results of the hypothesis test by stating, "z = 3.11, p < .01." Which of the following is a correct interpretation of this report?
The null hypothesis was rejected, and the probability of a Type I error is less than .01.
Statistical Power
The power of a statistical test is the probability that the test will correctly reject a false null hypothesis. Power is the probability that the test will identify a treatment effect if one really exists. Researchers typically calculate power as a means of determining whether a research study is likely to be successful, that is, before they actually conduct the research. To calculate power, however, it is first necessary to make assumptions about a variety of factors that influence the outcome of a hypothesis test. Factors such as the sample size, the size of the treatment effect, and the value chosen for the alpha level can all influence a hypothesis test.
Selecting an Alpha Level
The primary concern when selecting an alpha level is to minimize the risk of a Type I error. Thus, alpha levels tend to be very small probability values. By convention, the largest permissible value is α = .05. However, as the alpha level is lowered, the hypothesis test demands more evidence from the research results.
The results of a hypothesis test with a repeated-measures t statistic are reported as follows: t(9) = 2.28, p < .05. Which of the following is consistent with the report?
The study used a total of 10 participants, and the mean difference was not significant.
The Hypothesis Testing Step 4
If the test statistic is in the critical region, we conclude that the difference is significant or that the treatment has a significant effect. In this case, we reject the null hypothesis. If the test statistic is not in the critical region, we conclude that the evidence from the sample is not sufficient, and the decision is fail to reject the null hypothesis.
A researcher selects a sample of n = 25 individuals from a population with a mean of μ = 103 and administers a treatment to the sample. If the research predicts that the treatment will decrease scores, then what is the correct statement of the null hypothesis for a directional (one-tailed) test?
μ ≥ 103
Assumptions for Hypothesis Tests with z-Scores
It is assumed that the participants used in the study were selected randomly. The values in the sample must consist of independent observations. Two events (or observations) are independent if the occurrence of the first event has no effect on the probability of the second event. The standard deviation for the unknown population (after treatment) is assumed to be the same as it was for the population before treatment. To evaluate hypotheses with z-scores, we have used the unit normal table to identify the critical region. This table can be used only if the distribution of sample means is normal.
Hypothesis Testing
A hypothesis test is a statistical method that uses sample data to evaluate a hypothesis about a population. The general goal of a hypothesis test is to rule out chance (sampling error) as a plausible explanation for the results from a research study. If the individuals in the sample are noticeably different from the individuals in the original population, we have evidence that the treatment has an effect. However, it is also possible that the difference between the sample and the population is simply sampling error. The purpose of the hypothesis test is to decide between two explanations: The difference between the sample and the population can be explained by sampling error (there does not appear to be a treatment effect); The difference between the sample and the population is too large to be explained by sampling error (there does appear to be a treatment effect).
Time-Related Factors and Order Effects
The primary disadvantage of a repeated-measures design is that the structure of the design allows for factors other than the treatment effect to cause a participant's score to change from one treatment to the next. Specifically, in a repeated-measures design, each individual is measured in two different treatment conditions, often at two different times. One way to deal with time-related factors and order effects is to counterbalance the order of presentation of treatments. That is, the participants are randomly divided into two groups, with one group receiving treatment 1 followed by treatment 2, and the other group receiving treatment 2 followed by treatment 1. The goal of counterbalancing is to distribute any outside effects evenly over the two treatments.
Type II Errors
A Type II error occurs when a researcher fails to reject a null hypothesis that is really false. In a typical research situation, a Type II error means that the hypothesis test has failed to detect a real treatment effect. A Type II error occurs when the sample mean is not in the critical region even though the treatment has an effect on the sample. Often this happens when the effect of the treatment is relatively small. The consequences of a Type II error are usually not as serious as those of a Type I error. In general terms, a Type II error means that the research data do not show the results that the researcher had hoped to obtain. The researcher can accept this outcome and conclude that the treatment either has no effect or has only a small effect that is not worth pursuing, or the researcher can repeat the experiment (usually with some improvement) and try to demonstrate that the treatment really does work. Unlike a Type I error, it is impossible to determine a single, exact probability for a Type II error. Instead, the probability of a Type II error depends on a variety of factors and therefore is a function, rather than a specific number. Nonetheless, the probability of a Type II error is represented by the symbol β, the Greek letter beta.
Measuring Effect Size for the t Statistic
A hypothesis test simply determines whether the treatment effect is greater than chance, where "chance" is measured by the standard error. In particular, it is possible for a very small treatment effect to be "statistically significant," especially when the sample size is very large. To correct for this problem, it is recommended that the results from a hypothesis test be accompanied by a report of effect size such as Cohen's d. For the t test, it is possible to compute an estimate of Cohen's d. The only change is that we now use the sample standard deviation instead of the population standard deviation: estimated Cohen's d = mean difference / standard deviation = (M − μ)/s. As before, Cohen's d measures the size of the treatment effect in terms of the standard deviation. With a t test, it is also possible to measure effect size by computing the percentage of variance accounted for by the treatment. This measure is based on the idea that the treatment causes the scores to change, which contributes to the observed variability in the data. By measuring the amount of variability that can be attributed to the treatment, we obtain a measure of the size of the treatment effect. For the t statistic hypothesis test: percentage of variance accounted for = r² = t²/(t² + df). Cohen suggested that r² = .01 represents a small effect, r² = .09 a medium effect, and r² = .25 a large effect. The size of a treatment effect can also be described by computing an estimate of the unknown population mean after treatment. A confidence interval is an interval, or range of values, centered around a sample statistic. The logic behind a confidence interval is that a sample statistic, such as a sample mean, should be relatively near to the corresponding population parameter. The first step is to select a level of confidence and look up the corresponding t values in the t distribution table.
This value, along with M and sM obtained from the sample data, is plugged into the estimation formula: μ = M ± t(sM). Two factors affect the width of the confidence interval: to gain more confidence in your estimate, you must increase the width of the interval; conversely, to have a smaller, more precise interval, you must give up confidence. Also, the bigger the sample (n), the smaller the interval.
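A minimal sketch of the estimation formula in Python; the sample summary values and the table t value (2.131 for 95% confidence with df = 15) are assumed for illustration:

```python
import math

# Hypothetical single-sample summary: sample mean, sample variance, sample size.
M, s2, n = 13.0, 25.0, 16
sM = math.sqrt(s2 / n)             # estimated standard error

# 95% confidence: critical t from a t table with df = n - 1 = 15 (assumed value)
t_crit = 2.131
lower, upper = M - t_crit * sM, M + t_crit * sM   # µ = M ± t(sM)

# Trading confidence for precision: with 80% confidence the table value
# drops (about 1.341 for df = 15), giving a narrower interval.
```
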
Which of the following would represent independent measures?
Both a and c: The average number of traffic tickets among men and women is measured to see if there is a statistically significant difference; Measuring whether age has an effect on the amount of time an individual can hold his breath under water. A sample of 10-year-olds is compared with a sample of 18-year-olds.
Computing Hartley's F-max Test
Compute the sample variance, s² = SS/df, for each of the separate samples. Select the largest and the smallest of these sample variances and compute F-max = s²(largest) / s²(smallest). The F-max value computed for the sample data is compared with the critical value found in an F-max table. To locate the critical value in the table, you need to know: k = the number of separate samples (for the independent-measures t test, k = 2); df = n − 1 for each sample variance (the Hartley test assumes that all samples are the same size); and the alpha level (the table provides critical values for α = .05 and α = .01).
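The computation above can be sketched in a few lines of Python; the SS and n values are assumed for illustration, and the final comparison still requires an F-max table:

```python
# Hartley's F-max for two samples of equal size (assumed SS and n values).
SS1, n1 = 375, 16
SS2, n2 = 240, 16

s2_1 = SS1 / (n1 - 1)    # sample variance, s² = SS/df
s2_2 = SS2 / (n2 - 1)

f_max = max(s2_1, s2_2) / min(s2_1, s2_2)

# Compare f_max with the critical value from an F-max table using
# k = 2 samples and df = n - 1 = 15; the equal-variance assumption is
# reasonable if f_max is smaller than the table value.
```
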
Which of the following are correct ways of defining the power of a statistical test? I. The probability that the test will correctly reject a false null hypothesis II. The probability that the test will result in a Type II error III. The probability that the test will not result in a Type II error
I and III only
Independent-Measures Designs
The research designs that are used to obtain the two sets of data can be classified in two general categories: 1. The two sets of data could come from two completely separate groups of participants. 2. The two sets of data could come from the same group of participants. The first research strategy, using completely separate groups, is called an independent-measures or a between-subjects design.
Concerns About Hypothesis Testing: Measuring Effect Size
There are two serious limitations with using a hypothesis test to establish the significance of a treatment effect. When the null hypothesis is rejected, we are actually making a strong probability statement about the sample data, not about the null hypothesis. Demonstrating a significant treatment effect does not necessarily indicate a substantial treatment effect.
A population of trees has a mean leaf length of 6.2 inches. A sample of 17 of these trees in a particular neighborhood has a mean length of 3.2 inches. If SS = 144 for this sample, what is Cohen's d for this example, and what is the strength of the treatment effect, which in this case is growing in a particular neighborhood?
d = 1, which demonstrates a large effect
For a research study measuring the effect of two treatments, the sample sizes are n1 = 8 in one treatment and n2 = 10 in the other treatment, and the pooled variance is 1.23. The sample means are M1 = 21.2 and M2 = 24.1, with corresponding population means of μ1 = 21.2 and μ2 = 23. Using a two-tailed t test, determine the t value for α = .05 and make a conclusion about the null hypothesis.
t = -2.09, therefore we do not reject the null hypothesis.
If other factors are held constant, then how does the sample size affect the likelihood of rejecting the null hypothesis and the value for Cohen's d?
A larger sample size increases the likelihood of rejecting the null hypothesis but does not change the value of Cohen's d.
Steps of Hypothesis Testing with the t Statistic
1. State the hypotheses and select an alpha level. Although we have no information about the population of scores, it is possible to form a logical hypothesis about the value of μ. 2. Locate the critical region. The test statistic is a t statistic because the population variance is not known. Therefore, the value for degrees of freedom must be determined before the critical region can be located. 3. Calculate the test statistic. The t statistic typically requires more computation than is necessary for a z-score and can be divided into a three-stage process. First, calculate the sample variance. Remember that the population variance is unknown, and you must use the sample value in its place. Next, use the sample variance (s2) and the sample size (n) to compute the estimated standard error. This value is the denominator of the t statistic and measures how much difference is reasonable to expect by chance between a sample mean and the corresponding population mean. Finally, compute the t statistic for the sample data. 4. Make a decision regarding H0.
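The four steps above can be sketched in Python. The sample scores, the hypothesized µ, and the table critical value (2.306 for a two-tailed test at α = .05 with df = 8) are all assumed for illustration:

```python
import math

# Step 1: hypotheses. H0: µ = 50 (assumed); two-tailed test, alpha = .05.
scores = [52, 55, 49, 58, 61, 50, 54, 57, 56]   # hypothetical sample
mu = 50

# Step 2: locate the critical region. df must come first; critical t from a table.
n = len(scores)
df = n - 1
t_crit = 2.306                                   # assumed table value, df = 8

# Step 3: three-stage computation of the test statistic.
M = sum(scores) / n
s2 = sum((x - M) ** 2 for x in scores) / df      # sample variance (population unknown)
sM = math.sqrt(s2 / n)                           # estimated standard error
t = (M - mu) / sM

# Step 4: decision about H0.
reject_H0 = abs(t) > t_crit
```
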
Effect Size and Confidence Intervals for the Independent-Measures t
Compute the test statistic. The t statistic for the independent-measures design has the same structure as the single sample t introduced in Chapter 9. However, in the independent-measures situation, all components of the t formula are doubled: there are two sample means, two population means, and two sources of error contributing to the standard error in the denominator.
Directional Tests
In a directional hypothesis test, or a one-tailed test, the statistical hypotheses (H0 and H1) specify either an increase or a decrease in the population mean. That is, they make a statement about the direction of the effect. When a specific direction is expected for the treatment effect, it is possible for the researcher to perform a directional test. The first step (and the most critical step) is to state the statistical hypotheses. The null hypothesis states that there is no treatment effect, and the alternative hypothesis states that there is an effect. The two hypotheses are mutually exclusive and cover all of the possibilities. The critical region is defined by sample outcomes that are very unlikely to occur if the null hypothesis is true (that is, if the treatment has no effect). Because the critical region is contained in one tail of the distribution, a directional test is commonly called a one-tailed test. Also note that the proportion specified by the alpha level is not divided between two tails, but rather is contained entirely in one tail.
Directional Hypotheses and One-Tailed Tests
The nondirectional (two-tailed) test is more commonly used than the directional (one-tailed) alternative. On the other hand, a directional test may be used in some research situations, such as exploratory investigations or pilot studies or when there is a priori justification. The steps of the one-tailed test: 1. State the hypotheses, and select an alpha level. 2. Locate the critical region. 3. Calculate the test statistic. 4. Make a decision. In many repeated-measures and matched-subjects studies, the researcher has a specific prediction concerning the direction of the treatment effect. This kind of directional prediction can be incorporated into the statement of the hypotheses, resulting in a directional, or one-tailed, hypothesis test.
The Influence of Sample Size and Sample Variance
The number of scores in the sample and the magnitude of the sample variance both have a large effect on the t statistic and thereby influence the statistical decision. Because the estimated standard error, sM, appears in the denominator of the formula, a larger value for sM produces a smaller value (closer to zero) for t. Any factor that influences the standard error also affects the likelihood of rejecting H0 and finding a significant treatment effect. The estimated standard error is directly related to the sample variance so that the larger the variance, the larger the error. Thus, large variance means that you are less likely to obtain a significant treatment effect. Large variance means that the scores are widely scattered, which makes it difficult to see any consistent patterns or trends in the data. The estimated standard error is inversely related to the number of scores in the sample. The larger the sample, the smaller the error. If all other factors are held constant, large samples tend to produce bigger t statistics and therefore are more likely to produce significant results.
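The two relationships described above can be demonstrated numerically. In this sketch, all values (mean difference, variance, sample size) are assumed; quadrupling the variance halves t, while quadrupling the sample size doubles it:

```python
import math

def t_statistic(mean_diff, s2, n):
    """t for a fixed mean difference, given sample variance s2 and size n."""
    sM = math.sqrt(s2 / n)       # estimated standard error
    return mean_diff / sM

base = t_statistic(3.0, 36.0, 16)       # assumed baseline: t = 2.0
high_var = t_statistic(3.0, 144.0, 16)  # 4x the variance -> smaller t (1.0)
big_n = t_statistic(3.0, 36.0, 64)      # 4x the sample -> larger t (4.0)
```
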
In measuring the effect of hours of sleep on performance on a mathematics test using a repeated-measures study, which of the following is an example of an order effect?
The participants may have gained experience in taking the test the first time, thereby making it difficult to determine whether the change in hours of sleep or the experience causes the difference in performance.
A researcher administers a treatment to a sample of n = 100 participants and uses a hypothesis test to evaluate the effect of the treatment. The hypothesis test produces a z-score of z = 2.1. Assuming that the researcher is using a two-tailed test, what should the researcher do?
The researcher should reject the null hypothesis with α = .05, but not with α = .01.
The Hypotheses for a Related-Samples Test
The researcher's goal is to use the sample of difference scores to answer questions about the general population. The researcher would like to know whether there is any difference between the two treatment conditions for the general population. We are interested in difference scores. We would like to know what would happen if every individual in the population were measured in two treatment conditions (X1 and X2) and a difference score (D) were computed for everyone. For a repeated-measures study, the null hypothesis states that the mean difference for the general population is zero. In symbols: H0: μD = 0 The alternative hypothesis states that there is a treatment effect that causes the scores in one treatment condition to be systematically higher (or lower) than the scores in the other condition. In symbols, H1: µD ≠ 0
The t Statistic: An Alternative to z
The shortcoming of using a z-score for hypothesis testing is that the z-score formula requires more information than is usually available. The estimated standard error (sM) is used as an estimate of the real standard error σM when the value of σ is unknown. It is computed from the sample variance or sample standard deviation and provides an estimate of the standard distance between a sample mean M and the population mean μ.
The t Statistic for a Repeated-Measures Research Design
The t statistic for a repeated-measures design is structurally similar to the other t statistics we have examined. The major distinction of the related-samples t is that it is based on difference scores rather than raw scores (X values). The single sample t-statistic formula will be used to develop the repeated-measures t test. t = (M - µ)/sM The sample mean, M, is calculated from the data, and the value for the population mean, µ, is obtained from the null hypothesis. The estimated standard error, sM, is calculated from the data and provides a measure of how much difference can be expected between a sample mean and the population mean. For the repeated-measures design, the sample data are difference scores and are identified by the letter D, rather than X. Therefore, we will use Ds in the formula to emphasize that we are dealing with difference scores instead of X values. The population mean that is of interest to us is the population mean difference (the mean amount of change for the entire population), and we identify this parameter with the symbol µD. With these simple changes, the t formula for the repeated-measures design becomes t = (MD - µD)/sMD. In this formula, the estimated standard error, sMD, is computed in exactly the same way as it is computed for the single-sample t statistic. The first step is to compute the variance (or the standard deviation) for the sample of D scores. The estimated standard error is then computed using the sample variance and the sample size, n.
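The changes described above (D scores in place of X values, µD in place of µ) can be sketched in Python; the paired scores here are hypothetical:

```python
import math

# Hypothetical paired scores from a repeated-measures study.
X1 = [3, 2, 8, 7]                        # treatment 1
X2 = [7, 6, 6, 5]                        # treatment 2
D = [b - a for a, b in zip(X1, X2)]      # difference scores

n = len(D)
MD = sum(D) / n                          # sample mean difference
s2 = sum((d - MD) ** 2 for d in D) / (n - 1)   # variance of the D scores
sMD = math.sqrt(s2 / n)                  # estimated standard error, as in the
                                         # single-sample t
t = (MD - 0) / sMD                       # H0: µD = 0
```
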
A Closer Look at the z-Score Statistic
The z-score statistic that is used in the hypothesis test is the first specific example of what is called a test statistic. The term test statistic simply indicates that the sample data are converted into a single, specific statistic that is used to test the hypotheses. In a hypothesis test with z-scores, we have a formula for z-scores but we do not know the value for the population mean, μ. Therefore, we try the following steps. 1. Make a hypothesis about the value of μ. This is the null hypothesis. 2. Plug the hypothesized value into the formula along with the other values. 3. If the formula produces a z-score near zero (which is where z-scores are supposed to be), we conclude that the hypothesis was correct. 4. On the other hand, if the formula produces an extreme value (a very unlikely result), we conclude that the hypothesis was wrong. In the context of a hypothesis test, the z-score formula has the following structure: z = (M − μ)/σM = (sample mean − hypothesized population mean) / (standard error between M and μ). Thus, the z-score formula forms a ratio.
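The four steps above can be sketched in Python with assumed values for the sample and the population:

```python
import math

# Assumed values: sample mean, hypothesized µ (the null hypothesis),
# known population SD, and sample size.
M, mu, sigma, n = 54.0, 50.0, 10.0, 25

sigma_M = sigma / math.sqrt(n)   # standard error between M and µ
z = (M - mu) / sigma_M           # test statistic: a ratio of obtained
                                 # difference to expected chance difference

# A z near zero supports H0; an extreme z (e.g., beyond ±1.96 for a
# two-tailed test at alpha = .05) leads us to conclude H0 was wrong.
```
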
Calculating the Estimated Standard Error
To develop the formula for s(M1−M2), we consider three points: 1. Each of the two sample means represents its own population mean, but in each case there is some error. 2. The amount of error associated with each sample mean is measured by the estimated standard error of M. 3. For the independent-measures t statistic, we want to know the total amount of error involved in using two sample means to approximate two population means. a. To do this, if the samples are the same size, we find the error from each sample separately and then add the two errors together. b. When the samples are of different sizes, a pooled (average) estimate, which allows the bigger sample to carry more weight in determining the final value, is used.
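Point 3b can be sketched in Python. The pooled variance weights each sample by its df, so the bigger sample carries more weight; the SS and n values are assumed for illustration:

```python
import math

# Assumed sums of squares and sample sizes for two unequal samples.
SS1, n1 = 375, 8
SS2, n2 = 375, 10

df1, df2 = n1 - 1, n2 - 1
s2_pooled = (SS1 + SS2) / (df1 + df2)   # df-weighted average variance

# Estimated standard error for the difference between the two sample means.
se = math.sqrt(s2_pooled / n1 + s2_pooled / n2)
```
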
The Role of Sample Variance and Sample Size in the Independent-Measures t Test
Two factors that play important roles in the outcomes of hypothesis tests are the variability of the scores and the size of the samples. Both factors influence the magnitude of the estimated standard error in the denominator of the t statistic. The standard error is directly related to sample variance so that larger variance leads to larger error. As a result, larger variance produces a smaller value for the t statistic (closer to zero) and reduces the likelihood of finding a significant result. By contrast, the standard error is inversely related to sample size (larger size leads to smaller error). Thus, a larger sample produces a larger value for the t statistic (farther from zero) and increases the likelihood of rejecting H0.
The Formulas for an Independent-Measures Hypothesis Test
The independent-measures t uses the difference between two sample means to evaluate a hypothesis about the difference between two population means. Thus, the independent-measures t formula is: t = (sample mean difference − population mean difference) / estimated standard error = ((M1 − M2) − (µ1 − µ2)) / s(M1−M2). In each of the t-score formulas, the standard error in the denominator measures how accurately the sample statistic represents the population parameter. In the single-sample t formula, the standard error measures the amount of error expected for a sample mean and is represented by sM. For the independent-measures t formula, the standard error measures the amount of error that is expected when you use a sample mean difference (M1 − M2) to represent a population mean difference (µ1 − µ2). The standard error for the sample mean difference is represented by the symbol s(M1−M2). The estimated standard error of M1 − M2 can be interpreted in two ways. First, the standard error is defined as a measure of the standard or average distance between a sample statistic (M1 − M2) and the corresponding population parameter (µ1 − µ2). Second, when the null hypothesis is true, the standard error measures how big, on average, the sample mean difference is.
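As a minimal sketch, the formula above can be applied to the worked example earlier in these notes (pooled variance 1.23, n1 = 8, n2 = 10); the critical value is taken from a t table:

```python
import math

# Values from the worked example in these notes.
M1, M2 = 21.2, 24.1
mu1, mu2 = 21.2, 23.0
s2_pooled, n1, n2 = 1.23, 8, 10

se = math.sqrt(s2_pooled / n1 + s2_pooled / n2)   # s(M1-M2)
t = ((M1 - M2) - (mu1 - mu2)) / se
df = n1 + n2 - 2

# Two-tailed critical value for alpha = .05, df = 16 is ±2.120 (t table);
# |t| ≈ 2.09 does not reach the critical region, so we fail to reject H0.
```
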
A repeated-measures study is done to measure the change in IQ test scores taken on a Monday versus those taken on a Friday. There were n = 9 participants in the study. The mean difference was MD = -5 points, and the standard error for the mean difference was sMD = 0.51. Construct a 95% confidence interval to estimate the size of the population mean difference.
-6.18 < μD < -3.82
A scientist is studying the impact of certain vitamins on a person's ability to remember. The sample size for the experimental group was 25. When a two-tailed t test was calculated, the t statistic came out to be 1.54. What are the percent of variance (r2) and the size of the effect?
0.09, which indicates a medium effect
A sample of size n1 = 16 is taken from a group of bees in one hive, and the sample variance in weight is 1.2 grams. A sample of size n2 = 16 is taken from another hive, and the sample variance is 0.8 grams. What is the estimated standard error for the difference between the two means?
0.35
What is the repeated-measures t statistic for a two-tailed test using the following data?
Treatment 1 | Treatment 2
3 | 7
2 | 6
8 | 6
7 | 5
0.577
What is another name for a repeated-measures design?
Within-subjects design
For a sample of n = 16 scores with SS = 375, compute the sample variance and the estimated standard error for the sample mean.
s2 = 25, sM = 1.25