Psych Statistics-Chap 9
The goal of the hypothesis test is to use:
A sample from the treated population (a treated sample) as the basis for determining whether the treatment has any effect
Since the H0 states that the population is unchanged, the null hypothesis provides:
A specific value for the unknown population mean
Thus, a t test can be used in situations for which the null hypothesis is obtained from:
A theory, a logical prediction, or just wishful thinking
You can compute the t statistic for every sample and the entire set of t values will form:
A distribution
The t distribution tends to be flatter and more spread out, whereas the normal z distribution:
Has more of a central peak
Thus, large variance means that you are _____ likely to obtain a significant treatment effect
Less
The _____, on the other hand, influences hypothesis tests and measures of effect size
Sample variance (high variance reduces the likelihood of rejecting the null hypothesis and it reduces measures of effect size)
The larger the sample is, the _____ the error is
Smaller
In particular, it is possible for a very small treatment effect to be "statistically significant," especially when:
The sample size is very large (to correct for this problem, it is recommended that the results from a hypothesis test be accompanied by a report of effect size such as Cohen's d)
The estimated standard error is computed from:
The sample variance or sample standard deviation and provides an estimate of the standard distance between a sample mean M and the population mean μ
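This computation can be sketched in a few lines of Python (the sample scores here are hypothetical, chosen only to give round numbers):

```python
import math

# Hypothetical sample of n = 9 scores from the treated population
scores = [10, 12, 9, 14, 11, 13, 10, 12, 17]
n = len(scores)

M = sum(scores) / n                        # sample mean
SS = sum((x - M) ** 2 for x in scores)     # sum of squared deviations
s2 = SS / (n - 1)                          # sample variance (uses df = n - 1)
sM = math.sqrt(s2 / n)                     # estimated standard error of M

print(M, s2, sM)
```

Note that the estimated standard error divides the sample variance by n, then takes the square root; it estimates how far, on average, a sample mean M falls from μ.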
What does a "family" of t distributions mean?
There is a different sampling distribution of t (a distribution of all possible sample t values) for each possible number of degrees of freedom
2 observations/events are independent if:
There is no consistent, predictable relationship between the first observation and the second; that is, the occurrence of the first event has no effect on the probability of the second event
The sample data provide a value for:
The sample mean
What is the t statistic used for?
Used to test hypotheses about an unknown population mean, μ, when the value of σ is unknown
All you need to compute a t statistic is:
A null hypothesis and a sample from the unknown population
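Putting the pieces together, a minimal sketch of the single-sample t computation (the hypothesized μ and the sample are hypothetical):

```python
import math

mu = 10                                        # population mean stated by H0 (hypothetical)
scores = [10, 12, 9, 14, 11, 13, 10, 12, 17]   # hypothetical treated sample
n = len(scores)

M = sum(scores) / n                                   # sample mean
s2 = sum((x - M) ** 2 for x in scores) / (n - 1)      # sample variance
sM = math.sqrt(s2 / n)                                # estimated standard error

df = n - 1
t = (M - mu) / sM                                     # t statistic
print(f"t({df}) = {t:.3f}")
```

The obtained t is then compared against the critical value from the t-distribution table for df = n − 1.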
However, in most situations the population values are not known and you must substitute the corresponding sample values in their place. When this is done, many researchers prefer to identify the calculated value as an:
"Estimated d" or name the value after one of the statisticians who first substituted sample statistics into Cohen's formula (e.g., Glass's g or Hedges's g)
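As a sketch, the estimated d simply substitutes the sample mean and sample standard deviation into Cohen's formula (all numbers hypothetical):

```python
import math

mu = 10                                        # population mean from H0 (hypothetical)
scores = [10, 12, 9, 14, 11, 13, 10, 12, 17]   # hypothetical treated sample
n = len(scores)

M = sum(scores) / n
s = math.sqrt(sum((x - M) ** 2 for x in scores) / (n - 1))  # sample standard deviation

estimated_d = (M - mu) / s     # mean difference expressed in standard deviation units
print(round(estimated_d, 3))
```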
On the other hand, when the difference between the data and the hypothesis is small relative to the standard error, we obtain a t statistic near zero, and our decision is:
"Fail to reject H0"
Finally, the variance and estimated standard error are computed from:
The sample data
The basic difference between the z score and t statistic is:
The t statistic uses sample variance and the z-score uses the population variance
The formula for the t statistic has the same structure as the z-score formula, except:
The t statistic uses the estimated standard error in the denominator
The numbers in the body of the table are:
The t values that mark the boundary between the tails and the rest of the t distribution
What is the shortcoming of using a z-score for hypothesis testing?
The z-score formula requires more information than is usually available; specifically, a z-score requires that we know the value of the population standard deviation (or variance), which is needed to compute the standard error
The only difference between the t formula and the z-score formula is that:
The z-score uses the actual population variance (or the standard deviation) and the t formula uses the corresponding sample variance (or standard deviation) when the population value is not known
Cohen defined his measure of effect size in terms of:
The population mean difference and the population standard deviation
As df and sample size get very large, what happens to the t distribution?
The t distribution gets closer in shape to a normal z-score distribution, the variability in the t distribution decreases
The exact shape of a t distribution changes with:
Degrees of freedom
The first column of the table lists:
Degrees of freedom for the t statistic
For t statistics, however, this relationship (like the law of large numbers) is typically expressed in terms of the:
Degrees of freedom, or the df value (n − 1) for the sample variance, instead of the sample size (n): the greater the df value, the better a t statistic approximates a z-score. Thus, the degrees of freedom associated with s^2 also describe how well t represents z
The goal of the hypothesis test is to:
Determine whether the obtained difference between the data and the hypothesis is significantly greater than would be expected by chance
Just as we used the unit normal table to locate proportions associated with z-scores, we use a _____ table to find proportions for t statistics
t distribution
The estimated standard error is _____ related to the number of scores in the sample
Inversely
Statistical procedures that permit researchers to use a sample mean to test hypotheses about an unknown population mean are based on a few basic concepts:
1. A sample mean (M) is expected to approximate its population mean (μ). This permits us to use the sample mean to test a hypothesis about the population mean.
2. The standard error provides a measure of how much difference is reasonable to expect between a sample mean (M) and the population mean (μ).
3. To test the hypothesis, we compare the obtained sample mean (M) with the hypothesized population mean (μ) by computing a z-score test statistic.
There are two reasons for making this shift from standard deviation to variance:
1. In Chapter 4 we saw that the sample variance is an unbiased statistic; on average, the sample variance provides an accurate and unbiased estimate of the population variance. Therefore, the most accurate way to estimate the standard error is to use the sample variance to estimate the population variance.
2. In future chapters we will encounter other versions of the t statistic that require variance (instead of standard deviation) in the formulas for estimated standard error. To maximize the similarity from one version to another, we will use variance in the formula for all of the different t statistics. Thus, whenever we present a t statistic, the estimated standard error will be computed using the sample variance.
The two factors that determine the size of the standard error are:
1. Sample variance
2. Sample size
2 basic assumptions are necessary for hypothesis tests with the t statistic:
1. The values in the sample must consist of independent observations
2. The population sampled must be normal
When the obtained difference between the data and the hypothesis (numerator) is much greater than expected (denominator), we obtain:
A large value for t (either large positive or large negative)
Every sample from a population can be used to compute:
A z score or t statistic
If you select all the possible samples of a particular size (n), and compute the z-score for each sample mean, then the entire set of z-scores will form:
A z-score distribution
In the example, the treatment has a 4-point effect; to reverse this effect, we:
Add 4 points to each score
The unknown population is the one that exists _____ the treatment is administered, and the null hypothesis simply states that the value of the mean is _____ changed by the treatment
After, not
In the hypothesis-testing situation, we begin with a population with:
An unknown mean and an unknown variance, often a population that has received some treatment
The estimated standard error (sM) is used as:
As an estimate of the real standard error when the value of σ is unknown
In general, large variance is good or bad for inferential statistics?
Bad
Thus, as df increases, the proportions in a t distribution:
Become more like the proportions in a normal distribution
What is the shape and mean of t distributions?
Bell-shaped, symmetrical, have a mean of 0
If all other factors are held constant, large samples tend to produce:
Bigger t statistics and therefore are more likely to produce significant results
What is a t distribution?
Complete set of t values computed for every possible random sample for a specific sample size (n) or a specific degrees of freedom (df). The t distribution approximates the shape of a normal distribution.
An alternative technique for describing the size of a treatment effect is to:
Compute an estimate of the population mean after treatment
How well a t distribution approximates a normal distribution is determined by:
Degrees of freedom
An alternative method for measuring effect size is to:
Determine how much of the variability in the scores is explained by the treatment effect. The treatment causes the scores to increase (or decrease), which means that the treatment is causing the scores to vary; if we can measure how much of that variability is explained by the treatment, we obtain a measure of the size of the treatment effect
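One standard way to express this percentage of variance accounted for, usually written r^2, computes it directly from the t statistic and its degrees of freedom. A sketch with hypothetical values for t and df:

```python
# r^2 = t^2 / (t^2 + df): proportion of the variability in the scores
# that is explained by the treatment effect
# (t and df below are hypothetical values, not taken from a real study)
t, df = 3.0, 8

r2 = t ** 2 / (t ** 2 + df)
print(round(r2, 3))
```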
Although sample size affects the hypothesis test, this factor has little or no effect on measures of:
Effect size (in particular, estimates of Cohen's d are not influenced at all by sample size, and measures of r^2 are only slightly affected by changes in the size of the sample)
The two rows at the top of the table show proportions of the t distribution contained in:
Either one or two tails, depending on which row is used
In fact, the whole reason for conducting a hypothesis test is to:
Gain knowledge about an UNKNOWN population
As always, the null hypothesis states that the treatment has no effect; specifically:
H0 states that the population mean is unchanged
Violating the assumption that the population sampled must be normal has what effect?
Has little practical effect on the results obtained for a t statistic, especially when the sample size is relatively large
To determine how well a t statistic approximates a z-score, we must determine:
How well the sample variance approximates the population variance
What is the relationship between a t distribution and the degrees of freedom?
In general, the greater the sample size (n) is, the larger the degrees of freedom (n − 1) are, and the better the t distribution approximates the normal distribution
Why is the t distribution more flat than the z distribution?
It becomes clear if you look at the structure of the formulas for z and t. For both z and t, the numerator can take on different values because the sample mean (M) varies from one sample to another. For z-scores, however, the denominator does not vary: provided that all of the samples are the same size and are selected from the same population, every sample has the same standard error, because the population variance and the sample size are the same for every sample. For t statistics, on the other hand, the denominator varies from one sample to another; specifically, the sample variance changes from one sample to the next, so the estimated standard error also varies. Thus, only the numerator of the z-score formula varies, but both the numerator and the denominator of the t statistic vary. As a result, t statistics are more variable than z-scores, and the t distribution is flatter and more spread out
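This can be illustrated with a small simulation (purely illustrative; the population parameters are made up): draw many samples of n = 5 from a normal population, and compute both z (known σ in the denominator) and t (sample s in the denominator) for each. The t values spread out more than the z values.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n = 50, 10, 5
se = sigma / math.sqrt(n)            # true standard error (same for every sample)

zs, ts = [], []
for _ in range(5000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    M = statistics.mean(sample)
    s = statistics.stdev(sample)     # sample standard deviation (varies by sample)
    zs.append((M - mu) / se)         # denominator fixed: only the numerator varies
    ts.append((M - mu) / (s / math.sqrt(n)))   # denominator varies too

print(statistics.pstdev(zs))   # close to 1
print(statistics.pstdev(ts))   # noticeably larger than 1, since df = 4 is small
```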
One criticism of a hypothesis test is that:
It does not really evaluate the size of the treatment effect (a hypothesis test simply determines whether the treatment effect is greater than chance, where "chance" is measured by the standard error)
The basic research situation for the t statistic hypothesis test:
It is assumed that the parameter μ is known for the population before treatment. The purpose of the research study is to determine whether the treatment has an effect. Note that the population after treatment has unknown values for the mean and the variance. We will use a sample to test a hypothesis about the population mean.
The value of the mean is known or unknown for the population before treatment?
Known (the question is whether the treatment influences the scores and causes the mean to change)
Why is large variance bad for inferential statistics?
Large variance means that the scores are widely scattered, which makes it difficult to see any consistent patterns or trends in the data; high variance reduces the likelihood of rejecting the null hypothesis
Distributions of the t statistic for different values of degrees of freedom are compared to a normal z-score distribution:
Like the normal distribution, t distributions are bell-shaped and symmetrical and have a mean of zero. However, t distributions are more variable than the normal distribution as indicated by the flatter and more spread-out shape. The larger the value of df is, the more closely the t distribution approximates a normal distribution.
What does the estimated standard error in the denominator of the t statistic measure?
Measures how much difference is reasonable to expect between a sample mean and the population mean
When the sample size (and degrees of freedom) is sufficiently large, the difference between a t distribution and the normal distribution becomes:
Negligible (insignificant)
The distribution of z-scores for sample means tends to be a _____ distribution
Normal (if the sample size is large, around n = 30, or if the sample is selected from a normal population, then the distribution of sample means is a nearly perfect normal distribution)
In most situations, however, the standard deviation for the population is:
Not known
The t distribution with df = 3:
Note that 5% of the distribution is located in the tail beyond t = 2.353. Also, 5% is in the tail beyond t = -2.353. Thus, a total proportion of 10% (0.10) is in the two tails combined.
The sample variance was developed specifically to:
Provide an unbiased estimate of the corresponding population variance
As with the z-score formula, the t statistic forms a:
Ratio
The hypothesis test often concerns a population that has:
Received a treatment
Any factor that influences the standard error also affects the likelihood of:
Rejecting the null hypothesis and finding a significant treatment effect
The law of large numbers also holds true for:
Sample variance and the t statistic: the larger the sample size, the better the sample variance represents the population variance, and the better the t statistic approximates the z-score
Because the estimated standard error, sM, appears in the denominator of the formula, a larger value for sM produces a _____ value (closer to zero) for t
Smaller
What does the sample standard deviation in the denominator of the estimated d do?
Standardizes the mean difference into standard deviation units (e.g., an estimated d of 1.00 indicates that the size of the treatment effect is equivalent to one standard deviation)
What does the numerator of the t statistic measure?
The actual difference between the sample data (M) and the population hypothesis (μ)
Example of deviations from μ (no treatment effect):
The coloured lines in part (a) show the deviations for the original scores, including the treatment effect. In part (b) the coloured lines show the deviations for the adjusted scores after the treatment effect has been removed.
If your sample t statistic is greater than the larger value listed, you can be certain that:
The data are in the critical region, and you can confidently reject the null hypothesis
To recap, what is the law of large numbers?
The larger the sample size (n), the more likely it is that the sample mean is close to the population mean
The estimated standard error is directly related to the sample variance so that:
The larger the variance, the larger the error
What do we use as estimates of the unknown parameters?
The mean for the treated sample and the standard deviation for the sample after treatment
For hypothesis tests using the t statistic, the population mean with no treatment is the value specified by:
The null hypothesis
Degrees of freedom describe:
The number of scores in a sample that are independent and free to vary
What 2 things have a large effect on the t statistic?
The number of scores in the sample and the magnitude of the sample variance
A portion of the t-distribution table:
The numbers in the table are the values of t that separate the tail from the main body of the distribution. Proportions for one or two tails are listed at the top of the table, and df values for t are listed in the first column.
When the z-scores form a normal distribution, we are able to use the _____ to find the critical region for the hypothesis test
Unit normal table
However, the population mean with treatment and the standard deviation are both:
Unknown
When the variance (or standard deviation) for the population is not known, we:
Use the corresponding sample value in its place
For standard error, we concentrated on the formula using the standard deviation. At this point, however, we shift our focus to the formula based on:
Variance
What happens when we obtain a large value for t?
We conclude that the data are not consistent with the hypothesis, and our decision is to "reject H0"
When is a normal population distribution important?
When the sample size is small
When is a normal population distribution not as important?
With larger samples, this assumption can be violated without affecting the validity of the hypothesis test (if you have reason to suspect that the population distribution is not normal, use a large sample to be safe)
Although the t statistic can be used in the "before and after" type of research, it also permits hypothesis testing in situations for which:
You do not have a known population mean to serve as a standard (specifically, the t test does not require any prior knowledge about the population mean or the population variance)
This situation appears to create a paradox: You want to use a z-score to find out about an unknown population, but:
You must know about the population before you can compute a z-score
Occasionally, you will encounter a situation in which your t statistic has a df value that is not listed in the table. What should you do in these situations?
You should look up the critical t for both of the surrounding df values listed and then use the larger (more conservative) critical value (e.g., if you have df = 53, which is not in the table, look up the critical t value for both df = 40 and df = 60 and then use the larger of the two)
The t distribution approximates a normal distribution, just as a t statistic approximates a:
Z-score
The t distribution has more variability than a normal z distribution, especially when:
df values are small
Because the sample mean places a restriction on the value of one score in the sample, there are:
n − 1 degrees of freedom for a sample with n scores
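A tiny illustration of this restriction (all values hypothetical): once the sample mean is known, the last score is no longer free to vary.

```python
n = 3
M = 10.0                      # suppose the sample mean is known
free_scores = [7.0, 12.0]     # n - 1 = 2 scores can take any values

# The final score is forced: the n scores must total n * M
last = n * M - sum(free_scores)
print(last)   # → 11.0
```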
The numerator of estimated d measures:
The magnitude of the treatment effect, found by taking the difference between the mean for the treated sample and the mean for the untreated population (μ from H0)