Chapter 8
two ways to guess the value of p-hat when choosing n
1) Use a guess for p-hat based on a pilot study or on past experience with similar studies. You should do several calculations that cover the range of p-hat values you might get. 2) Use p-hat = 0.5 as the guess. The margin of error ME is largest when p-hat = 0.5, so this guess is conservative in the sense that if we get any other p-hat when we do our study, we will get a margin of error smaller than planned.
one-sample t interval for a population mean
Choose an SRS of size n from a population having an unknown mean μ. A level C confidence interval for μ is: x-bar +/- t* (Sx / square root of n) where t* is the critical value for the t distribution with n - 1 degrees of freedom. Use this interval only when: 1) the population distribution is Normal OR the sample size is large (n >/= 30) AND 2) the population is at least 10 times as large as the sample.
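A minimal sketch of the one-sample t interval above, using only the Python standard library. The function name, the sample data, and the choice to pass t* in as an argument are assumptions made for illustration; t* = 2.262 is the usual table value for 95% confidence with df = 9.

```python
import math
import statistics

def one_sample_t_interval(data, t_star):
    """Return (low, high) for x-bar +/- t* (Sx / square root of n).

    t_star must be looked up for df = n - 1 (table or software);
    it is passed in here rather than computed.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s_x = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_star * s_x / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10 measurements (made-up values).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
low, high = one_sample_t_interval(sample, t_star=2.262)
print(f"95% t interval: ({low:.3f}, {high:.3f})")
```

The code does not check the Normal or 10% conditions; those still have to be checked by looking at the data and the study design.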
one-sample z interval for a population proportion
Choose an SRS of size n from a large population that contains an unknown proportion p of successes. An approximate level C confidence interval for p is: p-hat +/- z* (the square root of (p-hat(1 - p-hat)) / n) where z* is the critical value for the standard Normal curve with area C between -z* and z*. Use this interval only when the numbers of successes and failures in the sample are both at least 10 and the population is at least 10 times as large as the sample.
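A short sketch of the one-sample z interval for a proportion, assuming Python's standard library. The function name and the counts (120 successes in 300 trials) are hypothetical. Only the "at least 10 successes and 10 failures" check is coded; the 10% condition depends on the population size and has to be checked separately.

```python
import math
from statistics import NormalDist

def one_sample_z_interval_proportion(successes, n, confidence=0.95):
    """Return (low, high) for p-hat +/- z* sqrt(p-hat(1 - p-hat) / n)."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n
    # z* leaves area C between -z* and z* under the standard Normal curve.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 120 successes out of n = 300.
prop_low, prop_high = one_sample_z_interval_proportion(successes=120, n=300)
print(f"95% z interval for p: ({prop_low:.4f}, {prop_high:.4f})")
```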
standard error (SE)
Describes how close the sample proportion p-hat will be, on average, to the population proportion p in repeated SRSs of size n. Results when the standard deviation of a statistic is estimated from data.
the t distributions: degrees of freedom
Draw an SRS of size n from a large population that has a Normal distribution with mean μ and standard deviation σ. The statistic t = (x-bar - μ) / (Sx / square root of n) has the t distribution with degrees of freedom df = n - 1.
one-sample z interval for a population mean
Draw an SRS of size n from a population having unknown mean μ and known standard deviation σ. As long as the Normal and Independent conditions are met, a level C confidence interval for μ is: x-bar +/- z* (σ / square root of n). The critical value z* is found from the standard Normal distribution.
standard error of the sample mean x-bar
It describes how far x-bar will be from μ, on average, in repeated SRSs of size n. It is represented by the formula: Sx / square root of n where Sx is the sample standard deviation
point estimator
a statistic that provides an estimate of a population parameter
The t procedures (are/are not) robust against outliers, because x-bar and Sx are not resistant to outliers.
are NOT
The margin of error in a confidence interval covers only _______ due to random sampling or random assignment.
chance variation
The _______ depends on both the confidence level C and the sampling distribution of the statistic.
critical value
As the _______ increase, the t density curve approaches the standard Normal curve ever more closely.
degrees of freedom; This happens because Sx estimates σ more accurately as the sample size increases. So using Sx in place of σ causes little extra variation when the sample is large.
The shape of a t distribution is (the same as/different from) the shape of the standard Normal curve.
different
The confidence level (does/does not) tell us the chance that a particular confidence interval captures the population parameter.
does NOT
The size of the population (does/does not) influence the sample size we need.
does NOT
When the actual df does not appear in the table, use the greatest df available that is (more/less) than your desired df.
less
Except in the case of small samples, the condition that the data come from a random sample or randomized experiment is (more/less) important than the condition that the population distribution is Normal.
more
The confidence interval gives us a set of plausible values for the _______.
parameter
Confidence intervals are statements about _______.
parameters
Larger samples improve the accuracy of critical values from the t distributions when the _______ is NOT Normal.
population
Standard deviation is to _______, as standard error is to _______.
population parameter, sample statistic
Inference for _______ uses z, while inference for _______ uses t.
proportions, means
t procedures are quite _______ against non-Normality of the population except when outliers or strong skewness are present.
robust
Remember that the margin of error in a confidence interval includes only _______!
sampling variability
When the sample size is small (n < 30), the Normal condition is about the _______ of the POPULATION distribution.
shape; We inspect the distribution to see if it's believable that these data came from a Normal population
margin of error
tells how close the estimate tends to be to the unknown parameter in repeated random sampling
critical value z*
the value for which the central area C under the standard Normal curve lies between -z* and z*; it is needed to find a level C confidence interval
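A small sketch of how z* can be computed rather than read from a table, assuming Python's standard-library NormalDist. The helper name z_critical is made up for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z* such that the area between -z* and z* equals `confidence`."""
    # The area to the left of z* is C plus the lower tail, (1 - C) / 2.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"C = {level:.2f}  z* = {z_critical(level):.3f}")
```

These should agree with the familiar table values of about 1.645, 1.960, and 2.576.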
For a fixed confidence level, _______ determines the margin of error.
the sample size
With a 100% confidence level, we are 100% confident that the interval from _______ to _______ captures the true population proportion.
0 to 1
using one-sample t procedures: the Normal condition
1) Sample size < 15: Use t procedures if the data appear close to Normal (roughly symmetric, single peak, no outliers). If the data are clearly skewed or if outliers are present, do NOT use t. 2) Sample size >/= 15: The t procedures can be used except in the presence of outliers or strong skewness. 3) Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n >/= 30.
The margin of error gets smaller when...
1) The confidence level decreases 2) The sample size n increases
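A quick numerical check of both statements, sketched for the z interval for a proportion. The function name and the numbers (p-hat = 0.5, n = 400) are assumptions chosen only to make the comparison easy to read.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Margin of error z* sqrt(p-hat(1 - p-hat) / n) for a proportion."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

base = margin_of_error(0.5, 400, 0.95)
lower_confidence = margin_of_error(0.5, 400, 0.90)   # same n, lower C
larger_sample = margin_of_error(0.5, 1600, 0.95)     # same C, 4 times the n

print(f"n = 400,  95%: {base:.4f}")
print(f"n = 400,  90%: {lower_confidence:.4f}")
print(f"n = 1600, 95%: {larger_sample:.4f}")
```

Because n sits under a square root, quadrupling the sample size only cuts the margin of error in half.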
two reasons why t procedures are robust against non-Normal population distributions
1) The sampling distribution of the sample mean x-bar from a large sample is close to Normal (central limit theorem). Normality of the individual observations is of little concern when the sample size is large. 2) As the sample size n grows, the sample standard deviation Sx will be an accurate estimate of σ whether or not the population has a Normal distribution.
conditions for inference about a population mean (t interval)
Random: The data come from a random sample of size n from the population of interest or a randomized experiment. Normal: The population has a Normal distribution OR the sample size is large (n >/= 30) Independent: The method for calculating a confidence interval assumes that individual observations are independent. To keep the calculations reasonably accurate when we sample without replacement from a finite population, we should check the 10% condition: verify that the sample size is no more than 1/10 of the population size
conditions for constructing a confidence interval (z interval)
Random: The data come from a well-designed random sample or randomized experiment. Normal: The sampling distribution of the statistic is approximately Normal. Independent: Individual observations are independent. When sampling without replacement, the sample size n should be no more than 10% of the population size N (the 10% condition) to use our formula for the standard deviation of the statistic.
importance of each condition when constructing a confidence interval
Random: Lets us generalize our results to a larger population or make inferences about cause and effect. Normal: Tells us the shape of the sampling distribution of the statistic, which, in turn, leads to the computation of the confidence interval. Independent: Needed for calculating the appropriate standard deviations.
Our method of calculation assumes that the data come from an _______ of size n from the population of interest.
SRS
sampling distribution of p-hat: conditions for estimating p
Shape: If the sample size is large enough that both np and n(1 - p) are at least 10 (Normal condition), the sampling distribution of p-hat is approximately Normal. Center: The mean is p. That is, the sample proportion p-hat is an unbiased estimator of the population proportion p. Spread: The standard deviation of the sampling distribution of p-hat is "the square root of (p(1 - p)) / n" provided that the population is at least 10 times as large as the sample (10% condition).
t distribution vs. Normal distribution
Similarities: t distributions are similar in shape to Normal distributions. They are symmetric with a single peak at 0 and bell-shaped. Differences: The spread of t distributions is a bit greater than that of Normal distributions. t distributions have more probability/area in the tails and less in the center than do Normal distributions. This is true because substituting the estimate Sx for the fixed parameter σ introduces more variation into the statistic.
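A sketch that compares the two curves numerically. The t density formula used here is the standard textbook one and is not given on this card; the function name and the df values (2, 10, 100) are chosen only for illustration.

```python
import math
from statistics import NormalDist

def t_density(t, df):
    """Height of the t density curve with `df` degrees of freedom at t."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coefficient * (1 + t * t / df) ** (-(df + 1) / 2)

standard_normal = NormalDist()  # mean 0, standard deviation 1

print("curve        height at 0   height at 3")
for df in (2, 10, 100):
    print(f"t, df = {df:<4} {t_density(0, df):.4f}        {t_density(3, df):.5f}")
print(f"Normal       {standard_normal.pdf(0):.4f}        {standard_normal.pdf(3):.5f}")
```

Each t curve is lower than the Normal curve at the center and higher in the tail, and the gap shrinks as df increases.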
confidence intervals: a four-step process
State: What parameter do you want to estimate, and at what confidence level? Plan: Identify the appropriate inference method. Check conditions. Do: If the conditions are met, perform calculations. Conclude: Interpret your interval in the context of the problem.
calculating a confidence interval
The confidence interval for estimating a population parameter has the form: statistic +/- (critical value)(standard deviation of statistic) where the statistic we use is the point estimator for the parameter
point estimate
The value of a statistic that provides an estimate of a population parameter. Ideally, it is our "best guess" at the value of an unknown parameter.
confidence level C
This gives the overall success rate of the method for calculating the confidence interval. That is, in C% of all possible samples, the method would yield an interval that captures the true parameter value.
degrees of freedom (df)
This is used when we perform inferences about a population mean using a t distribution. It is found by subtracting 1 from the sample size n (df = n - 1).
confidence interval
This pertains to a parameter and has two parts: 1) An interval calculated from the data, which has the form "estimate +/- margin of error" 2) A confidence level C
choosing sample size for a desired margin of error when estimating the mean
To determine the sample size n that will yield a level C confidence interval for a population mean with a specified margin of error ME: 1) Get a reasonable value for the population standard deviation from an earlier or pilot study 2) Find the critical value z* from a standard Normal curve for confidence level C 3) Set the expression for the margin of error to be less than or equal to ME and solve for n: z* (standard deviation / square root of n) </= ME
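The three steps above can be sketched as follows. Solving z* (σ / square root of n) </= ME for n gives n >/= (z* σ / ME)^2, rounded up to a whole number. The function name and the inputs (a guessed σ of 5 and ME of 1) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma_guess, margin, confidence=0.95):
    """Smallest n with z* (sigma / sqrt(n)) <= margin."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    # Always round up: rounding down would give a margin larger than ME.
    return math.ceil((z_star * sigma_guess / margin) ** 2)

# Hypothetical planning values: sigma guessed as 5 from a pilot study, ME = 1.
n_needed = sample_size_for_mean(sigma_guess=5.0, margin=1.0)
print(f"Sample size needed: {n_needed}")
```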
sample size for desired margin of error
To determine the sample size n that will yield a level C confidence interval for a population proportion p with a maximum margin of error ME, solve the following inequality for n: z* (the square root of (p-hat(1 - p-hat)) / n) </= ME where p-hat is a guessed value for the sample proportion. The margin of error will always be less than or equal to ME if you take the guess p-hat to be 0.5.
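A sketch of solving that inequality for n, which gives n >/= (z* / ME)^2 times p-hat(1 - p-hat), rounded up. The function name and the planning values (ME = 0.03, guesses of 0.5 and 0.3) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* sqrt(p_guess(1 - p_guess) / n) <= margin."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)

conservative_n = sample_size_for_proportion(margin=0.03)              # guess 0.5
pilot_n = sample_size_for_proportion(margin=0.03, p_guess=0.3)        # pilot-study guess
print(f"p-hat guess 0.5: n = {conservative_n}")
print(f"p-hat guess 0.3: n = {pilot_n}")
```

The guess of 0.5 asks for the largest sample, which is why it is the safe choice when no pilot information is available.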
interpreting confidence intervals
To interpret a C% confidence interval for an unknown parameter, say, "We are C% confident that the interval from _______ to _______ captures the actual value of the [population parameter in context]."
interpreting confidence levels
To say that we are 95% confident is shorthand for "95% of all possible samples of a given size from this population will result in an interval that captures the unknown parameter."
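A simulation sketch of this idea: draw many samples from a population with a known proportion, build a 95% interval from each, and count how often the interval captures the true value. The true proportion (0.4), sample size (200), number of trials, and seed are all made-up settings; the exact fraction printed depends on them.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(p_true, n, confidence, trials, seed=0):
    """Fraction of simulated z intervals for a proportion that capture p_true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    hits = 0
    for _ in range(trials):
        # One simulated sample: count successes in n independent draws.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(p_true=0.4, n=200, confidence=0.95, trials=2000)
print(f"Fraction of intervals capturing p: {rate:.3f}")
```

The fraction should land near 0.95. Any single interval either captures p or it does not; the 95% describes the method over many samples.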
robust
An inference procedure is robust if the probability calculations involved remain fairly accurate when a condition for using the procedure (Random, Normal, Independent) is violated. For confidence intervals, this means that the stated confidence level is still pretty accurate.