Statistics Chapter 6
Confidence Interval for Difference in Means
A difference in sample means based on random samples of sizes n₁ and n₂ has Sample statistic = x̄₁ − x̄₂ and SE = √(s₁²/n₁ + s₂²/n₂), where x̄₁ and x̄₂ are the means and s₁ and s₂ are the standard deviations for the respective samples. If t* is an endpoint chosen from a t-distribution with df equal to the smaller of n₁ − 1 or n₂ − 1 to give the desired level of confidence, and if the distributions of the populations are approximately normal or the sample sizes are large (n₁ ≥ 30 and n₂ ≥ 30), the confidence interval for the difference in population means, μ₁ − μ₂, is Sample statistic ± t* × SE, which, in this case, corresponds to (x̄₁ − x̄₂) ± t* × √(s₁²/n₁ + s₂²/n₂).
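As a rough illustration of the interval above, the following Python sketch computes it from summary statistics. The function name `diff_means_ci` and the numbers are hypothetical, and t* is passed in by the caller (for example from a t-table) rather than computed.

```python
import math

def diff_means_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2 from summary statistics.

    t_star is supplied by the caller (e.g. from a t-table with
    df = min(n1, n2) - 1); it is not computed here.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = xbar1 - xbar2
    return diff - t_star * se, diff + t_star * se

# Hypothetical summary statistics; 2.262 is the 95% t* for df = 9.
low, high = diff_means_ci(20.0, 4.0, 16, 17.0, 3.0, 10, t_star=2.262)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

Because both samples here are smaller than 30, the interval is only trustworthy if the two populations are approximately normal.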
Confidence Interval for a Mean
A sample mean based on a random sample of size n has Sample statistic = x̄ and SE = s / √n, where x̄ and s are the sample mean and standard deviation, respectively. If t* is an endpoint chosen from a t-distribution with n − 1 df to give the desired level of confidence, and if the distribution of the population is approximately normal or the sample size is large (n ≥ 30), the confidence interval for the population mean, μ, is Sample statistic ± t* × SE, which, in this case, corresponds to x̄ ± t* × (s / √n).
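A minimal sketch of this interval computed from raw data, assuming a made-up sample and a t* value looked up from a t-table. The helper name `mean_ci` is hypothetical.

```python
import math
import statistics

def mean_ci(data, t_star):
    """Confidence interval x-bar +/- t* * s / sqrt(n) for a population mean."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)
    return xbar - t_star * se, xbar + t_star * se

# Hypothetical sample of size 5; 2.776 is the 95% t* for df = 4.
sample = [12.0, 15.0, 11.0, 14.0, 13.0]
low, high = mean_ci(sample, t_star=2.776)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```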
Central Limit Theorem for Sample Means
If the sample size n is large, the distribution of sample means from a population with mean μ and standard deviation σ is approximately normal with mean μ and standard deviation σ / √n. In practice we usually do not know the population standard deviation σ, so we often use SE = s / √n instead.
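One way to see the theorem at work is a small simulation. The sketch below assumes a uniform population on [0, 1] (so μ = 0.5 and σ = √(1/12)). The sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 40              # size of each sample
num_samples = 5000  # number of samples drawn

# Assumed population: uniform on [0, 1], mu = 0.5, sigma = sqrt(1/12).
mu = 0.5
sigma = math.sqrt(1 / 12)

sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(num_samples)
]

print("mean of sample means:", statistics.mean(sample_means), "vs mu =", mu)
print("sd of sample means:  ", statistics.stdev(sample_means),
      "vs sigma / sqrt(n) =", sigma / math.sqrt(n))
```

The two printed pairs should be close to each other, even though the population itself is not normal.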
Determination of Sample Size to Estimate Mean
If we want to estimate a population mean to within a desired margin of error, ME, with a given level of confidence, we should select a sample of size n = (z* × σ̃ / ME)², where σ̃ is an estimate for the standard deviation in the population.
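The sample-size formula can be sketched in Python as follows. The function name and the inputs (ME = 2, σ̃ = 15) are hypothetical, and the result is rounded up, which is a common convention rather than something stated in the formula itself.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(margin_of_error, sigma_estimate, confidence=0.95):
    """n = (z* * sigma~ / ME)^2, rounded up to a whole number."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star * sigma_estimate / margin_of_error) ** 2
    return math.ceil(n)  # round up so the margin of error is not exceeded

n = sample_size_for_mean(margin_of_error=2.0, sigma_estimate=15.0)
print(n)  # 217
```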
Determination of Sample Size to Estimate a Proportion
If we want to estimate a population proportion to within a desired margin of error, ME, with a given level of confidence, we should select a sample of size n = (z*/ME)² × p̃(1 − p̃), where we use p̃ = 0.5 or, if available, some other estimate of p. When we want a 95% confidence interval, z* ≈ 2 and p̃ = 0.5 give the simpler formula n ≈ 1/(ME)².
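The sketch below compares the full formula with the 95% shortcut for a hypothetical margin of error of 0.03. It assumes the estimate 0.5 for p, which is the most conservative choice because p(1 − p) is largest there. The function name is made up for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, p_estimate=0.5, confidence=0.95):
    """n = (z* / ME)^2 * p~(1 - p~), rounded up to a whole number."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star / margin_of_error) ** 2 * p_estimate * (1 - p_estimate)
    return math.ceil(n)

exact = sample_size_for_proportion(0.03)   # uses z* = 1.96 (to two decimals)
shortcut = math.ceil(1 / 0.03 ** 2)        # uses z* rounded to 2
print(exact, shortcut)  # 1068 1112
```

The shortcut gives a slightly larger n because it rounds z* up from about 1.96 to 2.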
Confidence Interval for Difference in Proportions
The difference in sample proportions based on random samples of size n₁ and n₂, respectively, has Sample statistic = p̂₁ − p̂₂ and SE = √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂). If z* is a standard normal endpoint to give the desired level of confidence, and if the sample sizes are large enough so that n₁p₁ ≥ 10 and n₁(1 − p₁) ≥ 10 and n₂p₂ ≥ 10 and n₂(1 − p₂) ≥ 10, the confidence interval for a difference in population proportions p₁ − p₂ is Sample statistic ± z* × SE, which, in this case, corresponds to (p̂₁ − p̂₂) ± z* × √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂).
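A short sketch of this interval, assuming hypothetical counts of 60 successes out of 100 in group 1 and 45 out of 100 in group 2. The function name `diff_props_ci` is illustrative only.

```python
import math
from statistics import NormalDist

def diff_props_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from success counts x1, x2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Unpooled SE: each group uses its own sample proportion.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

low, high = diff_props_ci(x1=60, n1=100, x2=45, n2=100)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```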
Confidence Interval for a Proportion
The sample proportion based on a random sample of size n has Sample statistic = p̂ and SE = √(p̂(1 − p̂)/n). If z* is a standard normal endpoint to give the desired level of confidence, and if the sample size is large enough so that np̂ ≥ 10 and n(1 − p̂) ≥ 10, the confidence interval for a population proportion p is Sample statistic ± z* × SE, which, in this case, corresponds to p̂ ± z* × √(p̂(1 − p̂)/n).
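The following sketch computes the interval and also enforces the large-sample condition, written in terms of counts (np̂ is the number of successes and n(1 − p̂) the number of failures). The function name and the data (52 successes in 200 trials) are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    # Large-sample check: n*p_hat = successes, n*(1 - p_hat) = failures.
    if successes < 10 or n - successes < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

low, high = proportion_ci(successes=52, n=200)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```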
Inference for a Difference in Means with Paired Data
To estimate the difference in means based on paired data, we first subtract to compute the difference for each data pair and compute the mean x̄_d, the standard deviation s_d, and the sample size n_d for the sample differences. Provided the differences are reasonably normally distributed (or the sample size is large), a confidence interval for the difference in means is given by Statistic ± t* × SE = x̄_d ± t* × s_d / √n_d, where t* is a percentile from a t-distribution with n_d − 1 degrees of freedom. To test H0: μ_d = 0 vs Ha: μ_d ≠ 0 (or a one-tail alternative), we use the t-test statistic t = (Statistic − Null value) / SE = (x̄_d − 0) / (s_d / √n_d).
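The paired procedure can be sketched as below: form the differences first, then treat them as a single sample. The before/after values are invented, the function name is hypothetical, and t* is again supplied from a table.

```python
import math
import statistics

def paired_inference(before, after, t_star):
    """Return (t statistic, confidence interval) for the mean difference."""
    diffs = [a - b for a, b in zip(after, before)]  # one difference per pair
    n_d = len(diffs)
    xbar_d = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    se = s_d / math.sqrt(n_d)
    t_stat = (xbar_d - 0) / se  # null value is 0
    ci = (xbar_d - t_star * se, xbar_d + t_star * se)
    return t_stat, ci

# Hypothetical paired measurements; 2.776 is the 95% t* for df = 4.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 15, 14]
t_stat, ci = paired_inference(before, after, t_star=2.776)
print(f"t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```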
Hypothesis Test for a Proportion
To test H0: p = p0 vs Ha: p ≠ p0 (or a one-tail alternative), we use the standardized test statistic z = (Statistic − Null value) / SE = (p̂ − p0) / √(p0(1 − p0)/n), where p̂ is the proportion in a random sample of size n. Provided the sample size is reasonably large (so that np0 ≥ 10 and n(1 − p0) ≥ 10), the p-value of the test is computed using the standard normal distribution.
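A minimal sketch of the two-tailed version of this test, using a hypothetical sample of 60 successes in 100 trials and null value p0 = 0.5. Note that the standard error uses p0, not p̂. The function name is illustrative.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-tailed z-test of H0: p = p0. Returns (z, p_value)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # SE is computed under the null
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value

z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

For a one-tail alternative, only the area in the relevant tail would be used instead of doubling.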
T-Test for a Mean
To test H0: μ = μ0 vs Ha: μ ≠ μ0 (or a one-tail alternative), use the t-statistic t = (Statistic − Null value) / SE = (x̄ − μ0) / (s / √n), where x̄ is the mean and s is the standard deviation in a random sample of size n. Provided the underlying population is reasonably normal (or the sample size is large), the p-value of the test is computed using the appropriate tails of a t-distribution with n − 1 degrees of freedom.
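The t-statistic and its degrees of freedom can be computed as in this sketch, which uses hypothetical summary statistics. The p-value step is left to a t-table or statistical software, since the Python standard library has no t-distribution.

```python
import math

def one_sample_t(xbar, s, n, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    se = s / math.sqrt(n)
    t_stat = (xbar - mu0) / se
    df = n - 1
    return t_stat, df

# Hypothetical sample: mean 52, sd 6, n = 36, tested against mu0 = 50.
t_stat, df = one_sample_t(xbar=52.0, s=6.0, n=36, mu0=50.0)
print(f"t = {t_stat:.2f} with df = {df}")
```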
Two Sample t-test for a Difference in Means
To test H0: μ₁ = μ₂ vs Ha: μ₁ ≠ μ₂ (or a one-tail alternative) based on samples of sizes n₁ and n₂ from the two groups, we use the two-sample t-statistic t = (Statistic − Null value) / SE = ((x̄₁ − x̄₂) − 0) / √(s₁²/n₁ + s₂²/n₂), where x̄₁ and x̄₂ are the means and s₁ and s₂ are the standard deviations for the respective samples. If the underlying populations are reasonably normal or the sample sizes are large, we use a t-distribution to find the p-value for this statistic. For degrees of freedom we can either use the smaller of n₁ − 1 or n₂ − 1, or technology to get a more precise approximation.
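The sketch below computes the two-sample t-statistic together with both choices of degrees of freedom. The "more precise approximation" is assumed here to be the Welch-Satterthwaite formula, which is what statistical software commonly uses for this unpooled statistic. The text above does not name a specific formula, so treat that part as an assumption. The numbers are hypothetical.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Return (t, conservative df, Welch-Satterthwaite df) for H0: mu1 = mu2."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # per-group variance of the sample mean
    se = math.sqrt(v1 + v2)
    t_stat = ((xbar1 - xbar2) - 0) / se
    df_conservative = min(n1 - 1, n2 - 1)
    # Assumed "technology" approximation (Welch-Satterthwaite):
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df_conservative, df_welch

t_stat, df_small, df_welch = two_sample_t(20.0, 4.0, 16, 17.0, 3.0, 10)
print(f"t = {t_stat:.3f}, df = {df_small} (conservative) or {df_welch:.1f} (Welch)")
```

The conservative df is never larger than the Welch value, so it gives a p-value that is, if anything, slightly too large.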
Test for Difference in Proportions
To test H₀: p₁ = p₂ vs Ha: p₁ ≠ p₂ (or a one-tail alternative) based on samples of size n₁ and n₂ from the two groups, the standardized test statistic is z = (Statistic − Null value) / SE = ((p̂₁ − p̂₂) − 0) / √(p̂(1 − p̂)/n₁ + p̂(1 − p̂)/n₂), where p̂₁ and p̂₂ are the proportions in the two samples and p̂ is the pooled proportion obtained by combining the two samples. If both samples are sufficiently large (at least 10 successes and 10 failures in each group), the p-value of the test statistic is computed using the standard normal distribution.
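A sketch of the pooled test for a two-tailed alternative, with hypothetical counts (60 of 100 versus 45 of 100). The pooled proportion is the total number of successes divided by the total sample size. The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z-test of H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # combine the two samples
    se = math.sqrt(p_pooled * (1 - p_pooled) / n1
                   + p_pooled * (1 - p_pooled) / n2)
    z = ((p1_hat - p2_hat) - 0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(x1=60, n1=100, x2=45, n2=100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Pooling is used for the test because the null hypothesis says the two proportions are equal. The confidence interval for p₁ − p₂ does not pool.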
The Distribution of Sample Means Using the Sample Standard Deviation
When choosing random samples of size n from a population with mean µ, the distribution of the sample means is centered at the population mean, µ, and has standard error estimated by SE = s / √n where s is the standard deviation of a sample. The standardized sample means approximately follow a t-distribution with n - 1 degrees of freedom (df). For small sample sizes (n<30), the t-distribution is only a good approximation if the underlying population has a distribution that is approximately normal.
The Distribution of Differences in Sample Means
When choosing random samples of size n₁ and n₂ from populations with means μ₁ and μ₂, respectively, the distribution of the differences in the two sample means, x̄₁ − x̄₂, is centered at the difference in population means, μ₁ − μ₂, and has standard error estimated by SE = √(s₁²/n₁ + s₂²/n₂). The standardized differences in sample means follow a t-distribution with degrees of freedom approximately equal to the smaller of n₁ − 1 and n₂ − 1. For small sample sizes (n₁ < 30 or n₂ < 30), the t-distribution is only a good approximation if the underlying populations have distributions that are approximately normal.
Distribution for Difference in Two Proportions
When choosing random samples of size n₁ and n₂ from populations with proportions p₁ and p₂, respectively, the distribution of the differences in the sample proportions, p̂₁ − p̂₂, is centered at the difference in population proportions, p₁ − p₂, has standard error given by SE = √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂), and is reasonably normally distributed if n₁p₁ ≥ 10 and n₁(1 − p₁) ≥ 10 and n₂p₂ ≥ 10 and n₂(1 − p₂) ≥ 10.
Distribution of a Sample Proportion
When selecting random samples of size n from a population with proportion p, the distribution of the sample proportions is centered at the population proportion p, has standard error given by SE = √(p(1 − p)/n), and is reasonably normally distributed if np ≥ 10 and n(1 − p) ≥ 10.
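A small simulation can make this distribution concrete. The sketch assumes a population proportion p = 0.3 and samples of size n = 100 (so np = 30 and n(1 − p) = 70, both at least 10). These values, the number of samples, and the seed are arbitrary.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

p = 0.3             # assumed population proportion
n = 100             # size of each sample
num_samples = 5000  # number of samples drawn

se_formula = math.sqrt(p * (1 - p) / n)

# Each sample proportion is the fraction of n random draws that are "successes".
sample_props = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(num_samples)
]

print("center of sample proportions:", statistics.mean(sample_props), "vs p =", p)
print("sd of sample proportions:    ", statistics.stdev(sample_props),
      "vs SE =", se_formula)
```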