STAT 104 - Chapter 3
How Bootstrapping works:
- A random sample is representative of the population - Use the sample as a proxy for the population - Draw re-samples (bootstrap samples) from the sample!
Importance of Random Sampling
- Gives us representative samples - If we take random samples, our sampling distribution will center around the population parameter - If we do NOT take random samples, our samples are likely biased and may not be centered around the population parameter!
How is the confidence LEVEL interpreted?
95% of all samples yield intervals that contain the true parameter.
Sample Variability
CASES within a sample vary. Can be summarized using the standard deviation
Sample Distribution
Describes ONE sample
Sampling Distribution
Describes statistics for MANY samples
Margin of Error - Why 2?
In a bell-shaped sampling distribution, about 95% of statistics will be within 2*SE of the true parameter value
Statistics vs. Parameters
Use Statistics (known) to estimate Parameters (usually unknown)
Population Proportion
p
Confidence Interval Equation
statistic +/- 2*(Standard Error)
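A minimal sketch of the formula above. The function name `confidence_interval_95` and the numbers (a sample proportion of 0.52 with a standard error of 0.03) are hypothetical, chosen only to show the arithmetic.

```python
def confidence_interval_95(statistic, standard_error):
    """Return (lower, upper) for statistic +/- 2 * (standard error)."""
    margin_of_error = 2 * standard_error
    return statistic - margin_of_error, statistic + margin_of_error


# Hypothetical values: sample proportion 0.52, standard error 0.03.
lower, upper = confidence_interval_95(0.52, 0.03)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints "95% CI: (0.46, 0.58)"
```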
Common MISINTERPRETATIONS of confidence intervals include:
"A 95% confidence interval contains 95% of the data in the population." "I am 95% sure that the mean of a sample will fall within a 95% confidence interval for the mean." "95% of all sample means will fall within this 95% confidence interval." "The probability that the population parameter is in this particular 95% confidence interval is 0.95."
Sampling with Replacement
- Many times it is not possible to sample repeatedly from the actual population... - But we can sample repeatedly from the sample! - To get statistics that vary, sample with replacement (each case can be selected more than once)
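As a rough illustration of drawing one bootstrap sample, the sketch below uses Python's `random.choices`, which samples with replacement. The five data values and the seed are made up for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

sample = [3, 7, 8, 12, 15]  # hypothetical original sample

# One bootstrap sample: same size as the original, drawn WITH replacement,
# so a case may appear more than once and other cases may be left out.
bootstrap_sample = rng.choices(sample, k=len(sample))

print("original: ", sample)
print("bootstrap:", bootstrap_sample)
```

Sampling without replacement at the full sample size would just return the original cases in a different order, so the statistic would never change.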
Standard Error
- Measures how much the statistic varies from sample to sample - It is roughly the average distance from the statistic to the parameter - It is calculated in the same way as the standard deviation, which is roughly the average distance from an observation to the mean
Margin of Error
- Reflects the precision of the sample statistic as a point estimate for the parameter - One form of margin of error for a 95% confidence interval is: Margin of error = 2*(standard error)
Sampling Distribution - Summary
- A sampling distribution is a collection of statistics from many samples of the same size, n, drawn from one population - Its spread is measured by the standard error - Larger sample size = smaller standard error
Steps to create a Sampling Distribution
- Suppose we have a population - Take a random sample and compute a statistic - Take a different random sample and compute a statistic - Take another random sample and compute a statistic (and so on)... - Graph the statistics of many samples, and we get our sampling distribution
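The steps above can be sketched as a small simulation. Everything here is an assumption made for illustration: the population is synthetic (10,000 values from a normal distribution), the statistic is the mean, and the sample size and number of samples are arbitrary.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population (in practice we rarely have access to this).
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)

n = 40              # size of every sample
num_samples = 2000  # how many samples to draw

# Take a random sample, compute its statistic, and repeat.
sample_means = []
for _ in range(num_samples):
    sample = rng.sample(population, n)
    sample_means.append(statistics.mean(sample))

# The collected statistics form the simulated sampling distribution.
print("population mean:", round(parameter, 2))
print("center of sampling distribution:", round(statistics.mean(sample_means), 2))
print("standard error:", round(statistics.stdev(sample_means), 2))
```

A histogram of `sample_means` would be the graph described in the last step.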
What is Bootstrapping?
- Suppose you have a population and take one random sample, Sample A. - If a sample is randomly selected, it should be representative of your population. - Now, imagine we take Sample A and make many copies of it. We can consider this a guess of what our population looks like. - Then, we can take many re-samples from our "guess of what our population looks like."
Point Estimate
- The sample statistic of interest is a point estimate for a population parameter - It will not necessarily equal the parameter - Referred to as the "best estimate" from the sample
Center of a bootstrap distribution
- The sampling distribution is centered around the population parameter - The bootstrap distribution is centered around the sample statistic
Standard Error from a bootstrap distribution
- The variability of the bootstrap statistic is similar to the variability of the sample statistic in a sampling distribution - The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution! - Note that the standard deviation of a bootstrap distribution is called a standard error
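A minimal sketch of estimating a standard error this way, assuming the statistic is the sample mean. The data are synthetic, and `bootstrap_means` is a hypothetical helper written for this example.

```python
import random
import statistics

rng = random.Random(2)

# Hypothetical original sample of 30 measurements.
sample = [rng.gauss(100, 15) for _ in range(30)]
sample_mean = statistics.mean(sample)


def bootstrap_means(data, num_bootstrap, rng):
    """Return the mean of each of `num_bootstrap` bootstrap samples."""
    size = len(data)
    return [statistics.mean(rng.choices(data, k=size)) for _ in range(num_bootstrap)]


boot_means = bootstrap_means(sample, 2000, rng)

# Standard deviation of the bootstrap distribution = estimated standard error.
standard_error = statistics.stdev(boot_means)

print("sample mean:", round(sample_mean, 2))
print("center of bootstrap distribution:", round(statistics.mean(boot_means), 2))
print("bootstrap standard error:", round(standard_error, 2))
```

The center of `boot_means` should land close to the sample mean, not the (unknown) population mean.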
Where does the standard error come from?
- There are different methods of estimating standard errors, such as BOOTSTRAPPING
Interval Estimate
- This estimate shows how far the parameter may be from the point estimate - It gives plausible values for the parameter (Plausible doesn't always mean possible) - e.g. confidence interval
How is the confidence INTERVAL interpreted?
- We are "95% confident" that an interval contains the truth/parameter - Always include the context of the problem! That includes the variable and measurement units when appropriate.
95%
Approx. 95% of 95% confidence intervals will contain the true parameter value - The 95% is NOT the probability that one particular interval contains the parameter - The parameter is fixed, but the statistic and the interval are random (the interval depends on the statistic)
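One way to see the "95% of intervals" idea is to simulate it. The sketch below assumes a synthetic population and uses the mean as the statistic. For simplicity the standard error is taken from the simulated sampling distribution itself, which is only possible here because the whole population is known.

```python
import random
import statistics

rng = random.Random(3)

population = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical population
parameter = statistics.mean(population)                   # fixed, normally unknown

n = 40
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]

# Standard error = standard deviation of the simulated sampling distribution.
standard_error = statistics.stdev(sample_means)

# Build statistic +/- 2*SE for every sample and count how many
# of those intervals contain the parameter.
hits = sum(
    1
    for m in sample_means
    if m - 2 * standard_error <= parameter <= m + 2 * standard_error
)
coverage = hits / len(sample_means)
print(f"share of intervals containing the parameter: {coverage:.3f}")
```

Each individual interval either contains the parameter or it does not. The 95% describes how often the method succeeds across samples.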
Population Distribution
Describes the population
Percentile Method
If the bootstrap distribution is approximately symmetric, we can construct a confidence interval by finding the percentiles in the bootstrap distribution so that the proportion of bootstrap statistics between the percentiles matches the desired confidence level.
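A sketch of the percentile method for a 95% interval, assuming the statistic is the mean and the original sample is synthetic. It uses `statistics.quantiles` with `n=40`, which returns cut points at 2.5%, 5%, ..., 97.5%, so the first and last cut points leave 2.5% in each tail.

```python
import random
import statistics

rng = random.Random(4)

sample = [rng.gauss(100, 15) for _ in range(30)]  # hypothetical original sample
size = len(sample)

# Bootstrap distribution of the sample mean.
boot_means = [statistics.mean(rng.choices(sample, k=size)) for _ in range(2000)]

# 95% confidence: keep the middle 95% of the bootstrap statistics.
cut_points = statistics.quantiles(boot_means, n=40)
lower, upper = cut_points[0], cut_points[-1]  # 2.5th and 97.5th percentiles

inside = sum(1 for m in boot_means if lower <= m <= upper)
print(f"95% percentile interval: ({lower:.2f}, {upper:.2f})")
print(f"share of bootstrap statistics inside: {inside / len(boot_means):.3f}")
```

For a different confidence level, the percentiles change accordingly, for example the 5th and 95th percentiles for 90% confidence.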
Parameter
Numerical summary of the POPULATION
Statistic
Numerical summary of the SAMPLE DATA
Three Types of Distributions
Population Distribution, Sample Distribution, Sampling Distribution
Sampling Variability
STATISTICS from many samples vary. Can be summarized using the standard error, which is the standard deviation of a sampling distribution.
Sample Size vs Bootstrap samples
Sample Size: - How many cases you have in one sample - Sample size WILL affect the standard error. Bootstrap Samples: - The number of bootstrap samples is how many times you run the bootstrap simulation - Each bootstrap sample should have the same sample size as the original/real sample - The number of bootstrap samples will NOT meaningfully affect the standard error (more bootstrap samples only make the estimate more stable)
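The contrast can be checked with a quick simulation. This is a sketch under assumptions: the data are synthetic, the statistic is the mean, and `bootstrap_se` is a hypothetical helper defined only for this example.

```python
import random
import statistics

rng = random.Random(5)


def bootstrap_se(data, num_bootstrap, rng):
    """Estimate the standard error of the mean from a bootstrap distribution."""
    size = len(data)
    boot_means = [
        statistics.mean(rng.choices(data, k=size)) for _ in range(num_bootstrap)
    ]
    return statistics.stdev(boot_means)


small_sample = [rng.gauss(100, 15) for _ in range(25)]   # sample size 25
large_sample = [rng.gauss(100, 15) for _ in range(400)]  # sample size 400

# Same sample, different numbers of bootstrap samples: SE stays about the same.
se_small_500 = bootstrap_se(small_sample, 500, rng)
se_small_4000 = bootstrap_se(small_sample, 4000, rng)

# Larger original sample, same number of bootstrap samples: SE shrinks.
se_large_500 = bootstrap_se(large_sample, 500, rng)

print("n=25,  500 bootstrap samples: ", round(se_small_500, 2))
print("n=25,  4000 bootstrap samples:", round(se_small_4000, 2))
print("n=400, 500 bootstrap samples: ", round(se_large_500, 2))
```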
Bootstrap Distribution
The distribution of many bootstrap statistics - Can be used to estimate the sampling distribution
95% Confidence Interval
An interval of plausible values for the true population parameter, built by a method that captures the parameter for about 95% of samples - Can be calculated using the following formula if the sampling distribution is approx. bell shaped: statistic +/- margin of error
Bootstrap Statistic
The statistic computed on a bootstrap sample - Use the re-sampled cases in the bootstrap sample to compute the bootstrap statistic of interest
What is the purpose of a Sampling Distribution?
To show us how the sample statistic varies from sample to sample
Difference in population proportions
p1 - p2
Population Mean
μ
Difference in population means
μ1 - μ2