[Stats] Chapter 7 Notes
Mean of the Sampling Distribution of X-Bar
Suppose that x-bar is the mean of an SRS of size n drawn from a population with mean µ and standard deviation σ. The mean of the sampling distribution of x-bar is µ_x̄ = µ.
Mean and proportion notation for populations and samples
We write µ (the Greek letter mu) for the population mean and x̄ (x-bar) for the sample mean. We use p to represent a population proportion. The sample proportion p̂ (p-hat) is used to estimate the unknown parameter p.
Normal/Large Sample Condition for Sample Means
1. If the population distribution is Normal, then so is the sampling distribution of x-bar. This is true no matter what the sample size n is. 2. If the population distribution is not Normal, the central limit theorem tells us that the sampling distribution of x-bar will be approximately Normal in most cases if n ≥ 30.
Big idea of the sampling distribution
1. Keep taking random samples of size n from a population with mean µ. 2. Find the sample mean x-bar for each sample. 3. Collect all the sample means and display their distribution: the sampling distribution of x-bar. 4. Sampling distributions are key to understanding statistical inference.
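The four steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the population (the integers 1 to 1000), the sample size of 25, the 2000 repetitions, and the function name `simulate_sample_means` are all hypothetical choices, not part of the chapter.

```python
import random
import statistics

def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples SRSs of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = rng.sample(population, n)     # step 1: SRS without replacement
        means.append(statistics.mean(sample))  # step 2: x-bar for this sample
    return means                               # step 3: the collected sample means

# Hypothetical population: the integers 1..1000, whose mean is 500.5
population = list(range(1, 1001))
sample_means = simulate_sample_means(population, n=25, num_samples=2000)

center = statistics.mean(sample_means)   # should land close to 500.5
spread = statistics.stdev(sample_means)  # much smaller than the population SD
print(f"mean of sample means: {center:.1f}")
print(f"SD of sample means:   {spread:.1f}")
```

A real sampling distribution uses all possible samples; 2000 random samples only approximate it.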
Standard Deviation of the Sampling Distribution of X-Bar
Suppose that x-bar is the mean of an SRS of size n drawn from a population with mean µ and standard deviation σ. The standard deviation of the sampling distribution of x-bar is σ_x̄ = σ/√n, as long as the 10% condition is satisfied: n ≤ (1/10)N.
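As a quick check on the formula, here is a minimal sketch that computes σ/√n and refuses to do so when the 10% condition fails. The function name `sd_of_sample_mean` and the numbers (σ = 20, n = 100, N = 5000) are made up for illustration.

```python
import math

def sd_of_sample_mean(sigma, n, population_size):
    """Return sigma / sqrt(n), raising an error if the 10% condition fails."""
    if n > population_size / 10:
        raise ValueError("10% condition not met: n must be at most N/10")
    return sigma / math.sqrt(n)

# Hypothetical values: sigma = 20, n = 100, N = 5000 (100 <= 500, so the condition holds)
sd = sd_of_sample_mean(sigma=20, n=100, population_size=5000)
print(sd)  # 20 / sqrt(100) = 2.0
```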
Sampling Variability
The value of a statistic varies in repeated random sampling.
Sampling Distribution of a Sample Mean from a normal population
Suppose that a population is Normally distributed with mean µ and standard deviation σ. Then the sampling distribution of x-bar has the Normal distribution with mean µ and standard deviation (provided the 10% condition is met) σ/(√n).
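Because x-bar is itself Normal in this case, Normal calculations can be done directly on the sampling distribution. The sketch below assumes a hypothetical Normal population with µ = 100 and σ = 15 and an SRS of size 9 from a population large enough to satisfy the 10% condition; none of these numbers come from the chapter.

```python
import math
from statistics import NormalDist

# Hypothetical population parameters and sample size
mu, sigma, n = 100, 15, 9

sd_x_bar = sigma / math.sqrt(n)                     # 15 / 3 = 5.0
sampling_dist = NormalDist(mu=mu, sigma=sd_x_bar)   # distribution of x-bar

# Probability that the sample mean exceeds 105 (one SD above the mean)
prob_above_105 = 1 - sampling_dist.cdf(105)
print(round(prob_above_105, 4))  # about 0.1587
```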
Sampling Distribution [specific]
We collect the values of p̂ from all possible samples of the same size and display them all in the sampling distribution.
The behavior of x-bar in repeated samples
1. The sample mean x-bar is an unbiased estimator of the population mean µ. 2. The values of x-bar are less spread out for larger samples. Their standard deviation decreases at the rate √n, so you must take a sample four times as large to cut the standard deviation of the distribution of x-bar in half. 3. You should use the formula σ/√n for the standard deviation of x-bar only when the population is at least 10 times as large as the sample (the 10% condition). 4. These facts are true no matter what shape the population distribution has.
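The "four times as large" claim in point 2 follows directly from the formula. A short worked version, using a hypothetical σ = 10 for the numeric line:

```latex
\sigma_{\bar{x}}\ \text{with sample size } 4n:\quad
\frac{\sigma}{\sqrt{4n}} = \frac{\sigma}{2\sqrt{n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}}

% Hypothetical numbers: sigma = 10
n = 25:\ \frac{10}{\sqrt{25}} = 2 \qquad
n = 100:\ \frac{10}{\sqrt{100}} = 1
```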
Parameter
A number that describes some characteristic of the population. The value of the parameter is usually not known because we cannot examine the entire population. Remember, s and p: statistics come from samples and parameters come from populations.
Unbiased Estimator
A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the value of the parameter being estimated. The mean of the sampling distribution of p̂ is the true value of the population proportion p. Taking a larger sample doesn't fix bias. Remember that even a very large voluntary response sample or convenience sample is worthless because of bias.
Bias
Bias means that our aim is off and we consistently miss the bull's eye in the same direction. Our sample values do not center on the population value.
Sampling Distribution of a Sample Proportion
Choose an SRS of size n from a population of size N with proportion p of successes. Let p̂ be the sample proportion of successes. Then: 1. The mean of the sampling distribution of p̂ is µ_p̂ = p, the population proportion. 2. The standard deviation of the sampling distribution of p̂ is σ_p̂ = √(p(1-p)/n), as long as the 10% condition is satisfied: n ≤ (1/10)N. 3. As n increases, the sampling distribution of p̂ becomes approximately Normal. Before you perform Normal calculations, check that the Large Counts condition is satisfied: np ≥ 10 and n(1-p) ≥ 10.
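The three facts and their conditions can be collected into one small helper. This is a sketch only: the function name `describe_p_hat` and the example values (p = 0.5, n = 100, N = 10,000) are hypothetical.

```python
import math

def describe_p_hat(p, n, population_size):
    """Summarize the sampling distribution of the sample proportion."""
    return {
        "mean": p,                                    # fact 1: center equals p
        "sd": math.sqrt(p * (1 - p) / n),             # fact 2: valid if 10% condition holds
        "ten_percent_ok": n <= population_size / 10,  # n <= (1/10)N
        "large_counts_ok": n * p >= 10 and n * (1 - p) >= 10,  # needed for fact 3
    }

# Hypothetical example: p = 0.5, n = 100, N = 10,000
summary = describe_p_hat(p=0.5, n=100, population_size=10_000)
print(summary)
```

If either condition comes back False, the corresponding formula or Normal calculation should not be used.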
Central Limit Theorem (CLT)
Draw an SRS of size n from any population with mean µ and standard deviation σ. The central limit theorem says that when n is large, the sampling distribution of the sample mean x-bar is approximately Normal. How large a sample size n is needed for the sampling distribution of x-bar to be close to Normal depends on the population distribution. In most cases, the sampling distribution of x-bar will be approximately Normal if n ≥ 30. More observations are required if the shape of the population distribution is far from Normal. In that case, the sampling distribution of x-bar will be very non-Normal if the sample size is small.
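One way to see the theorem at work is to sample from a strongly right-skewed population and watch the skewness of the sample means shrink as n grows. The sketch below makes several assumptions not in the chapter: the population is exponential with rate 1, draws are independent (as if the population were infinite), and skewness is measured with a simple hand-written helper.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means_from_exponential(n, num_samples, seed=0):
    """Means of num_samples samples of size n from a right-skewed population."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # assumed population
        means.append(statistics.fmean(sample))
    return means

skew_small = skewness(sample_means_from_exponential(n=2, num_samples=5000))
skew_large = skewness(sample_means_from_exponential(n=40, num_samples=5000))
print(f"skewness of x-bar, n = 2:  {skew_small:.2f}")
print(f"skewness of x-bar, n = 40: {skew_large:.2f}")
```

The n = 40 skewness should be much closer to 0, which is what "approximately Normal" looks like in this measure.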
High Variability
High Variability means that repeated shots are widely scattered on the target. Repeated samples do not give very similar results.
When the sample size increases, the sampling distribution of x-bar changes shape and becomes approximately Normal.
It is a remarkable fact that as the sample size increases, the sampling distribution of x-bar changes shape: it looks less like that of the population and more like a Normal distribution. When the sample size is large enough, the sampling distribution of x-bar is very close to Normal. This is true no matter what shape the population distribution has, as long as the population has a finite standard deviation σ. This famous fact of probability theory is called the central limit theorem (sometimes abbreviated as CLT).
Distribution of sample data
The distribution of sample data shows the values of the variable "color" for the individuals in the sample. For each sample, we record a value for the statistic p̂, the sample proportion of red chips.
Variability of a Statistic
The Variability of a Statistic is described by the spread of its sampling distribution. The spread is determined mainly by the size of the random sample. Larger samples give smaller spreads. The spread of the sampling distribution does not depend much on the size of the population, as long as the population is at least 10 times larger than the sample.
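The claim that population size barely matters can be checked with a simulation. The sketch below is an illustration under made-up assumptions: two populations with half successes each, one of size 2,000 and one of size 200,000, both sampled with n = 100 so the 10% condition holds in each case.

```python
import random
import statistics

def spread_of_p_hat(population_size, n, num_samples, seed):
    """SD of the sample proportion over repeated SRSs from a 50% success population."""
    rng = random.Random(seed)
    half = population_size // 2
    population = [1] * half + [0] * (population_size - half)  # 1 = success
    p_hats = []
    for _ in range(num_samples):
        sample = rng.sample(population, n)  # SRS without replacement
        p_hats.append(sum(sample) / n)
    return statistics.stdev(p_hats)

# Same sample size, populations that differ in size by a factor of 100
spread_small_pop = spread_of_p_hat(population_size=2_000, n=100, num_samples=1000, seed=1)
spread_large_pop = spread_of_p_hat(population_size=200_000, n=100, num_samples=1000, seed=1)
print(f"N = 2,000:   {spread_small_pop:.4f}")
print(f"N = 200,000: {spread_large_pop:.4f}")
```

Both spreads should come out near √(0.5 × 0.5 / 100) = 0.05, despite the very different population sizes.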
The averages of several observations are less variable than individual observations.
The fact that averages of several observations are less variable than individual observations is important in many settings. For example, it is common practice to repeat a measurement several times and report the average of the results. Think of the results of n repeated measurements as an SRS from the population of outcomes we would get if we repeated the measurement forever. The average of the n results (the sample mean x-bar) is less variable than a single measurement.
Population Distribution
The population distribution gives the values of the variable for all individuals in the population.
Sampling Distribution [general]
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. The population distribution and the distribution of sample data describe individuals. A sampling distribution describes how a statistic varies in many samples from the population.
Standard Deviation of a Sampling distribution
The standard deviation of p̂ gets smaller as the sample size n increases. Because of the square root, a sample four times larger is needed to cut the standard deviation in half.
Statistic
A number that describes some characteristic of the sample. The value of a statistic can be computed directly from the sample data. Remember, s and p: statistics come from samples and parameters come from populations.