Statistics Test 3
*Pooling
Data from two or more populations may sometimes be combined, or pooled, to estimate a statistic (typically a pooled variance) when we are willing to assume that the estimated value is the same in both populations. The resulting larger sample size may lead to an estimate with lower sample variance. However, pooled estimates are appropriate only when the required assumptions are true.
Degrees of freedom (df)
A parameter of the Student's t-distribution that depends upon the sample size. Typically, more degrees of freedom reflect more information from the sample.
Two-sample t-interval
A confidence interval for the difference in the means of two independent groups, found as (ȳ₁ - ȳ₂) ± t*df × SE(ȳ₁ - ȳ₂), where SE(ȳ₁ - ȳ₂) = √(s₁²/n₁ + s₂²/n₂) and the number of degrees of freedom is given by the approximation formula or with technology.
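A minimal Python sketch of this interval, using scipy's t quantiles and the Welch-Satterthwaite approximation for the degrees of freedom (the function name and data handling are illustrative, not from the text):

```python
import numpy as np
from scipy import stats

def two_sample_t_interval(y1, y2, conf=0.95):
    """Two-sample t-interval for mu1 - mu2 (sketch; equal variances NOT assumed)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n1, n2 = len(y1), len(y2)
    v1, v2 = y1.var(ddof=1) / n1, y2.var(ddof=1) / n2
    se = np.sqrt(v1 + v2)                     # SE(y1-bar - y2-bar)
    # Welch-Satterthwaite approximation for df (the "approximation formula")
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    tstar = stats.t.ppf(1 - (1 - conf) / 2, df)
    diff = y1.mean() - y2.mean()
    return diff - tstar * se, diff + tstar * se
```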
*Pooled t-interval
A confidence interval for the difference in the means of two independent groups, used when we are willing and able to make the additional assumption that the variances of the groups are equal. It is found as (ȳ₁ - ȳ₂) ± t*df × SEpooled(ȳ₁ - ȳ₂), where SEpooled(ȳ₁ - ȳ₂) = spooled × √(1/n₁ + 1/n₂) and the pooled variance is s²pooled = [(n₁ - 1)s₁² + (n₂ - 1)s₂²] / [(n₁ - 1) + (n₂ - 1)]. The number of degrees of freedom is (n₁ - 1) + (n₂ - 1).
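Under the equal-variance assumption, the pooled version can be sketched like this (illustrative names; a sketch, not a definitive implementation):

```python
import numpy as np
from scipy import stats

def pooled_t_interval(y1, y2, conf=0.95):
    """Pooled t-interval for mu1 - mu2, assuming equal group variances (sketch)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n1, n2 = len(y1), len(y2)
    # Pooled variance: a df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)
    tstar = stats.t.ppf(1 - (1 - conf) / 2, n1 + n2 - 2)  # df = (n1-1)+(n2-1)
    diff = y1.mean() - y2.mean()
    return diff - tstar * se, diff + tstar * se
```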
Paired t-confidence interval
A confidence interval for the mean of the pairwise differences between paired groups, found as d̄ ± t*n-1 × SE(d̄), where SE(d̄) = sd/√n and n is the number of pairs.
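A sketch of the paired interval in Python (the argument names and the direction of the difference, after minus before, are illustrative choices):

```python
import numpy as np
from scipy import stats

def paired_t_interval(before, after, conf=0.95):
    """t-interval for the mean pairwise difference (sketch)."""
    d = np.asarray(after, float) - np.asarray(before, float)
    n = len(d)                       # n is the number of pairs
    se = d.std(ddof=1) / np.sqrt(n)  # SE(d-bar) = s_d / sqrt(n)
    tstar = stats.t.ppf(1 - (1 - conf) / 2, n - 1)
    return d.mean() - tstar * se, d.mean() + tstar * se
```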
One-proportion z-interval
A confidence interval for the true value of a proportion. The confidence interval is p̂ ± z* × SE(p̂), where SE(p̂) = √(p̂q̂/n) and z* is a critical value from the Standard Normal model corresponding to the specified confidence level.
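A minimal sketch of this interval (function name is illustrative; note the SE uses p̂, the observed proportion):

```python
import numpy as np
from scipy import stats

def one_prop_z_interval(successes, n, conf=0.95):
    """One-proportion z-interval (sketch)."""
    p_hat = successes / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)       # SE based on the sample proportion
    zstar = stats.norm.ppf(1 - (1 - conf) / 2)  # critical value from the Standard Normal
    return p_hat - zstar * se, p_hat + zstar * se
```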
Student's t
A family of distributions indexed by its degrees of freedom. The t-models are unimodal, symmetric, and bell-shaped, but generally have fatter tails and a narrower center than the Normal model. As the degrees of freedom increase, t-distributions approach the Normal model.
*Pooled t-test
A hypothesis test for the difference in the means of two independent groups when we are willing and able to assume that the variances of the groups are equal. It tests the null hypothesis H₀: μ₁ - μ₂ = ∆₀, where the hypothesized difference ∆₀ is almost always 0, using the statistic tdf = ((ȳ₁ - ȳ₂) - ∆₀) / SEpooled(ȳ₁ - ȳ₂), where the pooled standard error is defined as for the pooled interval and the degrees of freedom is (n₁ - 1) + (n₂ - 1).
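scipy.stats can run this test directly; passing equal_var=True pools the variances. The data below are made-up numbers, just to show the call:

```python
from scipy import stats

group1 = [12.1, 11.8, 12.6, 12.3, 11.9]  # illustrative data
group2 = [11.2, 11.5, 11.1, 11.8, 11.4]
# Pooled two-sample t-test (equal variances assumed)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print(t_stat, p_value)
```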
Two-sample t-test
A hypothesis test for the difference in the means of two independent groups. It tests the null hypothesis H₀: μ₁ - μ₂ = ∆₀, where the hypothesized difference ∆₀ is almost always 0, using the statistic tdf = ((ȳ₁ - ȳ₂) - ∆₀) / SE(ȳ₁ - ȳ₂), with the number of degrees of freedom given by the approximation formula or with technology.
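The same scipy call with equal_var=False gives this version, using the Welch approximation for the degrees of freedom (data again made up for illustration):

```python
from scipy import stats

group1 = [12.1, 11.8, 12.6, 12.3, 11.9]  # illustrative data
group2 = [11.2, 11.5, 11.1, 11.8, 11.4]
# Two-sample t-test without the equal-variance assumption (Welch)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(t_stat, p_value)
```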
Paired t-test
A hypothesis test for the mean of the pairwise differences of two groups. It tests the null hypothesis H₀: μd = ∆₀, where the hypothesized difference is almost always 0, using the statistic t = (d̄ - ∆₀) / SE(d̄) with n - 1 degrees of freedom, where SE(d̄) = sd/√n and n is the number of pairs.
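scipy provides this test as ttest_rel, which works on the pairwise differences internally (sample data are illustrative):

```python
from scipy import stats

before = [140, 132, 128, 150, 145]  # illustrative paired measurements
after = [135, 130, 125, 146, 142]
# Paired t-test: tests whether the mean pairwise difference is 0
t_stat, p_value = stats.ttest_rel(after, before)
print(t_stat, p_value)
```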
One-sample t-interval for the mean
A one-sample t-interval for the population mean is ȳ ± t*n-1 × SE(ȳ), where SE(ȳ) = s/√n. The critical value t*n-1 depends on the particular confidence level, C, that you specify and on the number of degrees of freedom, n - 1.
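A sketch of the one-sample interval (function name illustrative):

```python
import numpy as np
from scipy import stats

def one_sample_t_interval(y, conf=0.95):
    """One-sample t-interval for the population mean (sketch)."""
    y = np.asarray(y, float)
    n = len(y)
    se = y.std(ddof=1) / np.sqrt(n)  # SE(y-bar) = s / sqrt(n)
    tstar = stats.t.ppf(1 - (1 - conf) / 2, n - 1)
    return y.mean() - tstar * se, y.mean() + tstar * se
```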
One-proportion z-test
A test of the null hypothesis that the proportion of a single sample equals a specified value (H₀: p = p₀) by comparing the statistic z = (p̂ - p₀) / SD(p̂) to a standard Normal model, where SD(p̂) = √(p₀q₀/n) is based on the hypothesized value p₀.
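A sketch of the test (function and argument names are illustrative; note the SD uses p₀, the null value, not p̂):

```python
import numpy as np
from scipy import stats

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """One-proportion z-test (sketch)."""
    p_hat = successes / n
    sd = np.sqrt(p0 * (1 - p0) / n)  # SD(p-hat) under the null hypothesis
    z = (p_hat - p0) / sd
    if alternative == "two-sided":
        p_value = 2 * stats.norm.sf(abs(z))
    elif alternative == "greater":
        p_value = stats.norm.sf(z)
    else:  # "less"
        p_value = stats.norm.cdf(z)
    return z, p_value
```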
One-sided alternative
An alternative hypothesis is one-sided (i.e., Ha: p > p₀ or Ha: p < p₀) when we are interested in deviations in ONLY ONE direction away from the hypothesized parameter value.
Two-sided alternative
An alternative hypothesis is two-sided (Ha: p ≠ p₀) when we are interested in deviations in EITHER direction away from the hypothesized parameter value.
Confidence Interval
An interval of values, usually of the form estimate ± margin of error, found from data in such a way that a specified percentage of all random samples can be expected to yield intervals that capture the true parameter value.
Significance level
Another term for the alpha level, used most often in a phrase such as "at the 5% significance level."
Paired data
Data are paired when the observations are collected in pairs or the observations in one group are naturally related to observations in the other. The simplest form of pairing is to measure each subject twice, often before and after a treatment is applied. Pairing in experiments is a form of blocking and arises in other contexts. Pairing in observational and survey data is a form of matching.
Sampling distribution model for a mean
If the independence assumption and randomization condition are met and the sample size is large enough, the sampling distribution of the sample mean is well modeled by a Normal model with a mean equal to the population mean and a standard deviation equal to σ/√n.
Sampling distribution model for a proportion
If the independence assumption and randomization condition are met and we expect at least 10 successes and 10 failures, then the sampling distribution of a proportion is well modeled by a Normal model with a mean equal to the true proportion value, p, and a standard deviation equal to √(pq/n).
Margin of error (ME)
In a confidence interval, the extent of the interval on either side of the observed statistic value. A margin of error is typically the product of a critical value from the sampling distribution and a standard error from the data. A small margin of error corresponds to a confidence interval that pins down the parameter precisely. A large margin of error corresponds to a confidence interval that gives relatively little information about the estimated parameter.
Central Limit Theorem
The Central Limit Theorem (CLT) states that the sampling distribution model of the sample proportions (and means) is approximately Normal for large n, regardless of the distribution of the population, as long as the observations are independent.
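As a quick check of the CLT for means, a simulation sketch (the population, seed, and sizes are arbitrary choices):

```python
import numpy as np

# Draw 10,000 samples of size n = 50 from a strongly skewed population
# (exponential with mean 2 and sd 2), then look at the sample means.
rng = np.random.default_rng(1)
draws = rng.exponential(scale=2.0, size=(10_000, 50))
sample_means = draws.mean(axis=1)
print(sample_means.mean())  # close to the population mean, 2.0
print(sample_means.std())   # close to sigma / sqrt(n) = 2.0 / sqrt(50) ~ 0.283
# A histogram of sample_means would look roughly Normal despite the skewed population.
```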
Null hypothesis
The claim being assessed in a hypothesis test. Usually, the null hypothesis is a statement of "no change from the traditional value," "no effect," "no difference," or "no relationship." For a claim to be a testable null hypothesis, it must specify a value for some population parameter that can form the basis for assuming a sampling distribution for a test statistic.
Effect size
The difference between the null hypothesis value and the true value of a model parameter.
Sampling distribution
The distribution of a statistic over many independent samples of the same size from the same population.
Type II error
The error of failing to reject a null hypothesis when in fact it is false (also called a "false negative"). The probability of a Type II error is commonly denoted β and depends on the effect size.
Type I error
The error of rejecting a null hypothesis when in fact it is true (also called a "false positive"). The probability of a Type I error is α.
Critical value
The number of standard errors to move away from the mean of the sampling distribution to correspond to the specified level of confidence. The critical value, denoted by an asterisk (e.g., z* for a Normal critical value), is usually found from a table or with technology.
One-sample t-test for the mean
The one-sample t-test for the mean tests the hypothesis H₀: μ = μ₀ using the statistic tn-1 = (ȳ - μ₀) / SE(ȳ), where SE(ȳ) = s/√n.
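scipy provides this test as ttest_1samp (the data and null value below are made up to show the call):

```python
from scipy import stats

y = [5.2, 4.9, 5.4, 5.1, 5.3, 4.8]  # illustrative data
# One-sample t-test of H0: mu = 5.0, with n - 1 degrees of freedom
t_stat, p_value = stats.ttest_1samp(y, popmean=5.0)
print(t_stat, p_value)
```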
P-value
The probability of observing a value for a test statistic at least as far from the hypothesized value as the statistic value actually observed if the null hypothesis is true. A small P-value indicates that the observation obtained is improbable given the null hypothesis and thus provides evidence against the null hypothesis.
Power
The probability that a hypothesis test will correctly reject a false null hypothesis. To find the power of a test, we must specify a particular alternative parameter value as the "true" value. For any specific value in the alternative, the power is 1 - β.
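A sketch of a power calculation for a one-sided, upper-tail, one-sample z-test with known σ (the function and its arguments are illustrative assumptions, not from the text):

```python
import numpy as np
from scipy import stats

def z_test_power(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of a one-sided (upper-tail) one-sample z-test (sketch).

    mu_a is the specific alternative value taken as the 'true' mean."""
    se = sigma / np.sqrt(n)
    z_alpha = stats.norm.ppf(1 - alpha)  # rejection cutoff in z units
    # Shift the cutoff by the effect size measured in SE units; power = 1 - beta
    return stats.norm.sf(z_alpha - (mu_a - mu0) / se)

print(z_test_power(mu0=100, mu_a=103, sigma=10, n=50))  # illustrative numbers
```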
Alpha level
The threshold P-value that determines when we reject a null hypothesis. Using an alpha level of α, if we observe a statistic whose P-value based on the null hypothesis is less than α, we reject that null hypothesis.
Critical value
The value in the sampling distribution model of the statistic whose P-value is equal to the alpha level. Any statistic value further from the null hypothesis value than the critical value will have a smaller P-value than α and will lead to rejecting the null hypothesis. The critical value is often denoted with an asterisk, as z*, for example.
Sampling error
The variability we expect to see from sample to sample is often called the sampling error, although sampling variability is a better term.
Alternative hypothesis
The hypothesis that proposes what we should conclude if we find the null hypothesis to be unlikely.
Standard error
When the standard deviation of the sampling distribution of a statistic is estimated from the data, the resulting statistic is called a standard error (SE).