3.3/3.4 Bootstrap Confidence Intervals (Exam 2)
If 0 is not included in the interval...
-0 is not a plausible value for the difference (p1 - p2) -therefore, there is a significant difference between the two proportions
Sampling with Replacement
-each case can be selected more than once
standard error from a bootstrap distribution
-estimated using the standard deviation of bootstrap distribution
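A minimal Python sketch of this card, assuming a hypothetical sample of 30 successes out of 50 cases. The data, the seed, and the helper name `bootstrap_proportions` are illustrative and not from the course material.

```python
import random
import statistics

def bootstrap_proportions(sample, n_boot, rng):
    """Return n_boot bootstrap proportions from a 0/1 sample."""
    n = len(sample)
    # Each bootstrap statistic comes from one resample of size n, with replacement.
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_boot)]

rng = random.Random(1)            # fixed seed so the sketch is repeatable
sample = [1] * 30 + [0] * 20      # hypothetical data: phat = 0.6, n = 50
boot_stats = bootstrap_proportions(sample, 2000, rng)

# The standard error is estimated by the standard deviation
# of the bootstrap statistics.
se = statistics.stdev(boot_stats)
print(round(se, 3))
```

The printed value should land near the usual formula value sqrt(0.6 * 0.4 / 50), which is about 0.069, though it will vary slightly with the seed.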
How would you use the bootstrap sample to compute a bootstrap statistic?
-compute the statistic in bootstrap sample 1 (phat*1) -compute the statistic in bootstrap sample 2 (phat*2) -repeat for every bootstrap sample; each bootstrap statistic is one dot in the bootstrap distribution
bootstrap distribution
-many bootstrap samples must be drawn to create a distribution
Advantages of the percentile method
-more flexible -can compute 90, 95, or 99% intervals -still works *for skewed* bootstrap distributions
what does one dot refer to on the bootstrap distribution?
-one dot represents one bootstrap statistic based on one bootstrap sample
Disadvantages of the SE method
-only good for 95% confidence (statistic ± 2 SE) -may be less accurate than the percentile method
bootstrap sample
-sample with replacement from original sample using the same sample size
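As a rough illustration, one bootstrap sample can be drawn in Python with `random.choices`, which samples with replacement. The ten yes/no responses and the helper name `bootstrap_sample` are made up for this sketch.

```python
import random

def bootstrap_sample(sample, rng):
    """Draw len(sample) cases from the original sample, with replacement."""
    return rng.choices(sample, k=len(sample))

rng = random.Random(0)
# Hypothetical original sample of 10 cases.
original = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
resample = bootstrap_sample(original, rng)

# Same size as the original; some cases may repeat and others may be missing.
print(resample)
```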
How would you create a bootstrap statistic using cards?
-start with n cards (n = sample size), one card per case in the original sample -write each case's value for the variable of interest on its card -shuffle -randomly draw a card -record its value -replace the card -repeat until you have recorded n values (the same sample size) = one bootstrap sample -calculate the bootstrap statistic from the recorded values
How do you create a bootstrap for difference in proportions?
-start with n1 + n2 cards -split into two piles, one per group, matching each group's sample size -for group 1, write each case's outcome on a card -for group 2, write each case's outcome on a card -for group 1, randomly choose 1 card, record the result, replace it, and repeat until you reach n1 draws = bootstrap sample 1 -repeat for group 2 with n2 draws = bootstrap sample 2 -compute phat*1 - phat*2 = one bootstrap statistic
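The card procedure for a difference in proportions can be sketched in Python as follows. The two groups (24 of 40 and 18 of 60 successes) and the helper name `boot_diff_in_props` are hypothetical, chosen only to make the steps concrete.

```python
import random
import statistics

def boot_diff_in_props(group1, group2, n_boot, rng):
    """Bootstrap distribution of phat1 - phat2 for two 0/1 samples."""
    diffs = []
    for _ in range(n_boot):
        # Resample each group separately, keeping each group's own size.
        r1 = rng.choices(group1, k=len(group1))
        r2 = rng.choices(group2, k=len(group2))
        diffs.append(sum(r1) / len(r1) - sum(r2) / len(r2))
    return diffs

rng = random.Random(2)
group1 = [1] * 24 + [0] * 16   # hypothetical: phat1 = 0.60, n1 = 40
group2 = [1] * 18 + [0] * 42   # hypothetical: phat2 = 0.30, n2 = 60
observed = sum(group1) / len(group1) - sum(group2) / len(group2)

diffs = boot_diff_in_props(group1, group2, 2000, rng)
# The bootstrap distribution should be centered near the sample statistic.
print(round(observed, 2), round(statistics.mean(diffs), 2))
```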
What is bootstrapping?
-take a random sample from the population -the sample is representative of the population -treat the population as many copies of the sample -re-sample from the original sample with replacement
Grouping variable
-two groups (usually numbered 1 and 2) -order doesn't really matter as long as it's consistent -independent from each other
bootstrap statistic
-uses re-sampled cases in the bootstrap sample to compute the bootstrap statistic of interest
Variable of interest: categorical
Inference for p1-p2
where should the bootstrap distribution be centered?
around the sample statistic
how do we get a more precise bootstrap?
use more samples
the standard error is from the ______________ distribution
bootstrap (simulation)
sample size
how many cases you have in one sample
number of bootstrap samples
how many times you run the bootstrap simulation
quantitative variable: mu 1 (or 2)
mean of variable of interest in population 1 (or 2)
quantitative variable: xbar 1 (or 2)
mean of variable of interest in sample 1 (or 2)
_______________ will NOT affect the standard error
number of bootstrap samples
What parameter is the inference about for: one group + quantitative variable
population mean
the sampling distribution is centered around the ___________________.
population parameter
What parameter is the inference about for: one group + categorical variable
population proportion
p1 (or 2)
proportion of category of interest in population 1 (or 2)
phat 1 (or 2)
proportion of category of interest in sample 1 (or 2)
each bootstrap sample should have the ___________ sample size as the original/real sample
same
___________________ will affect the standard error
sample size
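A small Python sketch can check both claims: the original sample size changes the standard error, while the number of bootstrap samples only makes the estimate of it steadier. The samples (both with phat = 0.6) and the helper name `bootstrap_se` are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_se(sample, n_boot, rng):
    """Estimate the SE of a sample proportion from n_boot bootstrap samples."""
    n = len(sample)
    stats = [sum(rng.choices(sample, k=n)) / n for _ in range(n_boot)]
    return statistics.stdev(stats)

rng = random.Random(3)
small = [1] * 30 + [0] * 20     # hypothetical: n = 50, phat = 0.6
large = [1] * 120 + [0] * 80    # hypothetical: n = 200, phat = 0.6

# Larger original sample -> smaller standard error.
se_small = bootstrap_se(small, 2000, rng)
se_large = bootstrap_se(large, 2000, rng)

# More bootstrap samples from the same data -> about the same standard error.
se_few = bootstrap_se(small, 1000, rng)
se_many = bootstrap_se(small, 4000, rng)

print(round(se_small, 3), round(se_large, 3))
print(round(se_few, 3), round(se_many, 3))
```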
n1 (or 2)
sample size from sample 1 (or 2)
quantitative variable: n1 (or 2)
sample size from sample 1 (or 2)
the bootstrap distribution is centered around the ________________________.
sample statistic
where should a bootstrap distribution be centered?
sample statistic (every bootstrap sample is drawn with replacement from the same original sample)
bootstrap distributions can be used to estimate the _______________ distribution
sampling
Do the percentile (P%) and SE methods give similar or different results?
similar (when the bootstrap distribution is bell shaped)
quantitative variable: s1 (or 2)
standard deviation of variable of interest in sample 1 (or 2)
the standard deviation of a bootstrap distribution is called ______________.
standard error
phat 1 - phat 2
statistic of interest
quantitative variable: xbar 1 - xbar 2
statistic of interest
the process for creating bootstrap confidence intervals is _______________ for all parameters
the same
What parameter is the inference about for: two groups + quantitative variable
difference in population means
What parameter is the inference about for: two groups + categorical variable
difference in population proportions
As confidence increases, width of the interval __________________.
increases
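To see this numerically, the sketch below takes percentile intervals at three confidence levels from one bootstrap distribution and compares their widths. The 0/1 sample (30 successes of 50) and the helper name `percentile_interval` are hypothetical.

```python
import random

def percentile_interval(boot_stats, level):
    """Range covering the middle `level` share of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    lo = ordered[round(tail * len(ordered))]
    hi = ordered[round((1 - tail) * len(ordered)) - 1]
    return lo, hi

rng = random.Random(4)
sample = [1] * 30 + [0] * 20   # hypothetical data: phat = 0.6, n = 50
n = len(sample)
boot_stats = [sum(rng.choices(sample, k=n)) / n for _ in range(2000)]

widths = {}
for level in (0.90, 0.95, 0.99):
    lo, hi = percentile_interval(boot_stats, level)
    widths[level] = hi - lo    # higher confidence keeps more of the distribution

print(widths)
```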
Variable of interest: quantitative
inference for mu 1 - mu 2
the statistic is from the real ________________ sample
original (not simulation)
quantitative variable: mu 1 - mu 2
parameter of interest
Advantages of standard error method
*for bell shaped* -relies on the fact that about 95% of statistics are within 2 standard errors of the center for bell shapes
Interpret a difference in population proportions confidence interval
*we are 95% confident* that the population proportion for (group 1) is between (lower bound) and (upper bound) (higher/lower) than the population proportion for (group 2)
2 methods for finding bootstrap intervals
1. Estimate the standard error of the statistic by computing the standard deviation of the bootstrap distribution, then generate a 95% confidence interval as statistic ± 2 SE. 2. Generate a P% confidence interval as the range of the middle P% of bootstrap statistics.
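Both methods can be tried side by side on one bootstrap distribution. This is only a sketch under assumed data (30 successes out of 50); the variable names and the helper `percentile_interval` are illustrative.

```python
import random
import statistics

def percentile_interval(boot_stats, level):
    """Method 2: range of the middle `level` share of bootstrap statistics."""
    ordered = sorted(boot_stats)
    tail = (1 - level) / 2
    lo = ordered[round(tail * len(ordered))]
    hi = ordered[round((1 - tail) * len(ordered)) - 1]
    return lo, hi

rng = random.Random(5)
sample = [1] * 30 + [0] * 20       # hypothetical data: n = 50
n = len(sample)
observed = sum(sample) / n         # sample statistic, phat = 0.6
boot_stats = [sum(rng.choices(sample, k=n)) / n for _ in range(2000)]

# Method 1: statistic +/- 2 * SE, where SE is the SD of the bootstrap distribution.
se = statistics.stdev(boot_stats)
se_lo, se_hi = observed - 2 * se, observed + 2 * se

# Method 2: middle 95% of the bootstrap statistics.
pct_lo, pct_hi = percentile_interval(boot_stats, 0.95)

print((round(se_lo, 3), round(se_hi, 3)), (pct_lo, pct_hi))
```

For a roughly bell-shaped bootstrap distribution like this one, the two intervals should come out close to each other.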
SE method is ONLY good for ______ confidence
95%
p1 - p2
Parameter of interest
we don't care about the center, we care about ____________________
variability