Ch 7 AP Stat - Sampling Distributions
What is the Central Limit Theorem?
The Central Limit Theorem (CLT) states that given a population distribution with mean μ and variance σ^2, the sampling distribution of the sample mean approaches a normal distribution with mean μ and variance σ^2/n as n, the sample size, increases.
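The CLT above can be sketched with a quick simulation (a minimal illustration with assumed numbers, not part of the notes): draw many samples from a right-skewed exponential population and check that the sample means center at μ with spread close to σ/√n.

```python
import random
import statistics

# Assumed population: exponential with mean mu = 2, so sigma = 2 as well.
# The CLT predicts sample means cluster around mu with st. dev sigma/sqrt(n).
random.seed(42)

mu, sigma = 2.0, 2.0
n = 50                 # sample size
num_samples = 20000    # number of samples drawn

sample_means = [
    statistics.mean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

center = statistics.mean(sample_means)   # should be near mu = 2
spread = statistics.stdev(sample_means)  # should be near sigma/sqrt(50) ~ 0.283

print(round(center, 2), round(spread, 2))
```

Even though the population is skewed, the distribution of the 20,000 sample means is roughly normal for n = 50.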
*10% rule to check for independence,* esp. when sampling w/out replacement
N ≥ 10n (total ≥ 10 × sample) -sample size should be no more than 10% of the pop -ensures independence when samples are drawn w/out replacement -applies when calculating the st. dev
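The 10% condition is simple enough to write as a one-line check (a hypothetical helper for illustration; the function name is my own):

```python
# Hypothetical helper: the 10% condition N >= 10n, used when sampling
# without replacement so the st. dev formulas still apply.
def meets_ten_percent_condition(population_size: int, sample_size: int) -> bool:
    """True when the sample is at most 10% of the population."""
    return population_size >= 10 * sample_size

# e.g. a sample of 100 students from a school of 1,200
print(meets_ten_percent_condition(1200, 100))  # True: 1200 >= 1000
print(meets_ten_percent_condition(1200, 150))  # False: 1200 < 1500
```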
What are sample means? How do they differ from sample proportions?
Sample proportions arise most often when we are interested in *categorical variables.* They are percents (e.g. % males, % red M&M's, etc.). Sample means are based on *quantitative variables.* They are averages (e.g. average age, average household income, etc.)
*Sampling distribution of a statistic*
Shows the statistic's values from *all the possible samples* of a given *sample size* from the *population.* It is a distribution of the statistic (e.g. p hat). Ex: everyone in a sample must take 15 chips. In practice it is only *approximated* (e.g. by simulation), because it is never practical to collect all possible samples.
Distribution of a sample
Shows the values of the variable for *all* individuals in the *sample.* Graph has bars of uneven heights (freq. on y-axis) (statistic: p hat = __)
distribution
describes the possible values a variable takes and how often those values occur; describe with SOCS (shape, outliers, center, spread) and specify which distrib you mean
variability
how far values are from the mean (the spread of the distribution)
variability of the sampling distrib is:
lower than the pop distrib's
*rule of thumb: check for normal approx*
np ≥ 10 and n(1 − p) ≥ 10
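This rule of thumb can also be written as a small check (a hypothetical helper; the name is my own). It verifies at least 10 expected successes and 10 expected failures:

```python
# Hypothetical helper: the sampling distribution of p-hat is approximately
# normal when np >= 10 and n(1 - p) >= 10 (at least 10 expected
# successes and 10 expected failures).
def normal_approx_ok(n: int, p: float) -> bool:
    return n * p >= 10 and n * (1 - p) >= 10

print(normal_approx_ok(100, 0.5))  # True: 50 successes, 50 failures expected
print(normal_approx_ok(40, 0.1))   # False: only 4 expected successes
```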
population
parameter (a number that describes the population)
dif between the variability of a parameter and sampling variability (sample means and sample proportions)
a parameter is a constant -- it cannot change; a statistic (sample mean or sample proportion) varies from sample to sample, which is sampling variability
µ
pop/para mean
p
pop/para proportion (%)
σ (lowercase sigma)
pop/para st. dev
x bar (x̄)
sample/stat mean
s
sample/stat st. dev
p hat (p̂)
sample/stat proportion (%)
sampling variability
the value of a statistic varies from sample to sample across the whole sampling process; we want a representative sample
What is true about the standard deviation of the sampling distribution of x̄?
The st. dev of a *sampling distrib* is *much smaller* than the st. dev of the *pop*: σ(x̄) = σ/√n
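The σ/√n relationship can be checked numerically (a minimal sketch with an assumed population st. dev of 12): the spread of x̄ shrinks as the sample size grows.

```python
import math

# Sketch: for a population st. dev sigma, the st. dev of the sampling
# distribution of x-bar is sigma / sqrt(n), so it shrinks as n grows.
sigma = 12.0  # assumed population st. dev

for n in (4, 16, 64, 144):
    se = sigma / math.sqrt(n)
    print(f"n = {n:>3}: st. dev of x-bar = {se:.2f}")
# n =   4 gives 6.00, n = 16 gives 3.00, n = 64 gives 1.50, n = 144 gives 1.00
```

Note that quadrupling the sample size only halves the spread, which is why "take a larger sample" has diminishing returns.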
sample
statistic (a number that describes the sample)
"when sample size increases, the variability decreases"
variability of sampling distribution (sample size)
Accurate = unbiased
Precise = low variability
The shape of the distribution of the sample mean depends on ...
Approximately normal if you are told the *population is normal.* Approximately normal if the *sample size is sufficiently large,* based on the Central Limit Theorem; we use a rule of thumb of n ≥ 30.
*What is the variability of a statistic? Why do we care?*
Definition: The variability of a statistic is described by the *spread* of its sampling distribution. This spread is determined primarily by the size of the random sample.
Population distribution
Gives the values of the variable for *all* the individuals in the *population*; accounts for everyone, so no SRS is needed. Graph has bars of equal height (freq. on y-axis) (parameter: p = __)
*How does probability from a sampling distribution differ from the probability of selecting an individual from the population?*
The probability that 1 individual selected from the population falls close to the mean will be much smaller than the corresponding probability from a sampling distribution, because the sampling distribution has a much smaller spread.
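A worked comparison with assumed numbers (heights ~ N(64, 3), not from the notes) shows the effect, using the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# Assumed example: heights ~ N(mu = 64, sigma = 3). Compare the chance one
# individual falls within 1 inch of the mean with the chance that the mean
# of a sample of n = 9 does. The sample mean has st. dev sigma/sqrt(9) = 1,
# so it is far more likely to land near mu.
individual = NormalDist(mu=64, sigma=3)
sample_mean = NormalDist(mu=64, sigma=3 / 9 ** 0.5)

p_individual = individual.cdf(65) - individual.cdf(63)     # about 0.261
p_sample_mean = sample_mean.cdf(65) - sample_mean.cdf(63)  # about 0.683

print(round(p_individual, 3), round(p_sample_mean, 3))
```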
When is a statistic considered an unbiased estimator?
The sample proportion p hat (or the sample mean x̄) is an unbiased estimator when the mean of its sampling distribution equals the population parameter (bias: mean of the statistic ≠ parameter)
What effect does the size of the population have on the variability (spread) of a statistic?
The spread of the sampling distribution does not depend on the size of the population as long as the sample meets the 10% condition (the population is at least 10 times larger than the sample)
How is sampling variability related to MOE
We need to estimate sampling variability so we know how close our estimates are to the truth -- the MOE. Ex: you're confident on a quiz, but you might still get 1-2 Qs wrong
unbiased estimator
a stat (x̄ or p hat) used to estimate a parameter; unbiased if the mean of its sampling distrib is equal to the true value of the parameter being estimated; gives the approx value of a pop parameter
*Define the sampling distribution of a sample mean*
a theoretical distribution of the values that the mean of a sample takes on in all of the possible samples of a specific size that can be made from a given population. Said another way... Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution.
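The "all possible samples" idea can be made concrete with a tiny worked example (assumed population values, for illustration only): enumerate every sample of size 2 from a 5-value population and check that the mean of all the sample means equals μ, which is exactly what "unbiased" means.

```python
from itertools import combinations
from statistics import mean

# Tiny worked example: population of 5 values, all possible samples of
# size n = 2 drawn without replacement. The mean of every sample mean
# equals the population mean, illustrating that x-bar is unbiased.
population = [2, 4, 6, 8, 10]
mu = mean(population)  # 6

sample_means = [mean(s) for s in combinations(population, 2)]
print(sorted(sample_means))      # the sampling distribution of x-bar
print(mean(sample_means) == mu)  # True
```

With only 5 C 2 = 10 possible samples this enumeration is feasible; for real populations it is not, which is why the sampling distribution is a theoretical object.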
Explain the difference between bias and variability (use Bull's eye example)
•Bias = we consistently miss the bull's-eye in the same direction. Our sample values do not center on the population value. •High variability = repeated shots are widely scattered on the target. Repeated samples do not give very similar results.
*How can you reduce the variability of a statistic?*
•By taking a larger sample. Larger samples give smaller spread. •Also by better design, such as stratified sampling.
What is the ideal estimator?
•No or low bias •Minimal variability
dif between biased and unbiased estimator
•Unbiased doesn't mean perfect! It *means not consistently too high or too low* when taking many random samples •Biased means the *statistic is consistently higher or lower than the parameter* •Do not confuse w/ survey sampling bias (undercoverage, nonresponse), which produces biased data.
