Statistics Exam 3 Cedarville, Liu
Statistical estimation
-The sample means will vary around µ. -The sampling distribution of the sample mean x̄ follows a Normal distribution with mean µ and some standard deviation (e.g., 5). -The 68-95-99.7 rule: for 95% of all samples, x̄ will be within two standard deviations of µ (here, within 10 points). -If the sample mean is within 10 points of µ, then µ is within 10 points of the sample mean.
Stratified random sample selection
1. divide the population into groups of similar individuals, called strata 2. choose a separate SRS in each stratum and combine these SRSs to form the full sample
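The two steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `stratified_sample`, the `stratum_of` callback, the equal per-stratum sample size, and the student data are all assumptions made for the example.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Draw a separate SRS from each stratum and combine them."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata.
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Step 2: take an SRS within each stratum, then combine the SRSs.
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: 50 freshmen and 30 seniors.
students = [("freshman", i) for i in range(50)] + [("senior", i) for i in range(30)]
sample = stratified_sample(students, lambda s: s[0], n_per_stratum=5)
```

Here `random.Random.sample` draws without replacement, so within each stratum every set of 5 individuals is equally likely to be chosen.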
Law of large numbers
1. draw independent observations at random from any population with finite mean µ 2. decide how accurately you would like to estimate µ 3. as the number of observations drawn increases, the mean of the observed values eventually approaches the mean µ of the population as closely as you specified and then stays that close
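A small simulation can make the law of large numbers concrete. The sketch below assumes a fair six-sided die as the population (so µ = 3.5) and a fixed random seed; both choices are illustrative only.

```python
import random

rng = random.Random(1)
MU = 3.5  # mean of a fair six-sided die (the assumed population mean)

total = 0
means = {}
for n in range(1, 100_001):
    total += rng.randint(1, 6)       # one more independent observation
    if n in (10, 1_000, 100_000):
        means[n] = total / n         # mean of the first n observations

for n, m in means.items():
    print(f"n = {n:>7}: sample mean = {m:.4f}, distance from mu = {abs(m - MU):.4f}")
```

With more observations the running mean should settle close to 3.5, although any single short run can still be noticeably off.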
The probabilities must satisfy what two requirements?
1. every probability is a number between 0 and 1 2. the probabilities of all possible outcomes add up to exactly 1
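Both requirements can be checked mechanically. The helper name `is_valid_probability_model` and the blood-type numbers below are hypothetical, chosen only to illustrate the check.

```python
from math import isclose

def is_valid_probability_model(probs):
    """Return True if probs (outcome -> probability) meets both requirements."""
    each_in_range = all(0 <= p <= 1 for p in probs.values())
    # isclose tolerates tiny floating-point rounding in the sum.
    sums_to_one = isclose(sum(probs.values()), 1.0)
    return each_in_range and sums_to_one

# Hypothetical model, used only for illustration.
blood_types = {"O": 0.45, "A": 0.40, "B": 0.11, "AB": 0.04}
print(is_valid_probability_model(blood_types))
```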
Rules for means
1. if X is a random variable and a and b are fixed numbers, then µ_(a+bX) = a + bµ_X. 2. if X and Y are random variables, then µ_(X+Y) = µ_X + µ_Y.
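Rule 1 can be verified on a small example by computing the mean of a + bX directly and comparing it with a + bµ_X. The distribution of X and the constants a = 32, b = 9/5 are made up for illustration; exact fractions are used so the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical discrete distribution of X (value -> probability).
dist_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def mean(dist):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(value * prob for value, prob in dist.items())

a, b = 32, Fraction(9, 5)

# Distribution of a + bX: same probabilities, transformed values.
dist_t = {a + b * x: p for x, p in dist_x.items()}

direct = mean(dist_t)             # mean of a + bX computed from its own distribution
by_rule = a + b * mean(dist_x)    # rule 1: a + b * mu_X
print(direct, by_rule)
```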
Rules for variances and standard deviations
1. if X is a random variable and a and b are fixed numbers, then σ_(a+bX)^2=b^2 σ_X^2. 2. if X and Y are independent random variables, then σ_(X+Y)^2=σ_X^2+σ_Y^2 and σ_(X-Y)^2=σ_X^2+σ_Y^2. (addition rule) 3. if X and Y have correlation ρ, then σ_(X+Y)^2=σ_X^2+σ_Y^2+2ρσ_X σ_Y and σ_(X-Y)^2=σ_X^2+σ_Y^2-2ρσ_X σ_Y.
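The variance rules translate directly into code. The function names below are illustrative assumptions; with the default rho = 0 the general rule reduces to the formula used for independent variables.

```python
from math import sqrt

def sd_of_linear(b, sd_x):
    """SD of a + bX: the variance is b^2 * var(X), so the SD is |b| * sd_x."""
    return abs(b) * sd_x

def sd_of_sum(sd_x, sd_y, rho=0.0):
    """SD of X + Y for variables with correlation rho."""
    return sqrt(sd_x**2 + sd_y**2 + 2 * rho * sd_x * sd_y)

def sd_of_difference(sd_x, sd_y, rho=0.0):
    """SD of X - Y: the variances still add; only the correlation term changes sign."""
    return sqrt(sd_x**2 + sd_y**2 - 2 * rho * sd_x * sd_y)

print(sd_of_sum(3, 4))                 # zero correlation
print(sd_of_difference(3, 4))          # same variance as the sum when rho = 0
print(sd_of_sum(3, 4, rho=0.5))        # positive correlation inflates the sum
```

Note that standard deviations themselves do not add; the rules are stated for variances, and the square root is taken at the end.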
Two parts of a confidence interval
An interval computed from the data and a confidence level
Statistical Inference involves two prominent techniques
Confidence intervals and hypothesis tests
Confidence interval for the mean mu
For a Normal population with known standard deviation σ, a level C confidence interval for the mean µ is given by x̄ ± m, where the margin of error m = z* σ/√n. Here z* is obtained from the standard Normal distribution such that the probability is C that a standard Normal random variable takes a value between −z* and z*.
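As a sketch of the formula x̄ ± z* σ/√n, the function below uses Python's standard-library `statistics.NormalDist` to find z*. The function name and the example numbers (x̄ = 100, σ = 15, n = 25) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Level-C interval for mu when the population SD sigma is known."""
    # z* leaves area (1 - C)/2 in each tail of the standard Normal curve.
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(xbar=100, sigma=15, n=25)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

For these inputs z* is about 1.96 and the margin of error is about 5.88, giving an interval of roughly (94.12, 105.88).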
Error Examples
In a fixed level α significance test, the significance level α is the probability of a Type I error, and the power to detect a specific alternative is 1 minus the probability of a Type II error for that alternative.
How confidence intervals behave
Other things being equal, the margin of error of a confidence interval decreases as the confidence level C decreases, the sample size n increases, and the population standard deviation σ decreases. The sample size n required to obtain a confidence interval of specified margin of error m for a Normal mean is n = (z*σ/m)^2, where z* is the critical point for the desired level of confidence.
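The sample-size formula n = (z*σ/m)^2 can be sketched as follows. The function name and inputs are illustrative; rounding up to the next whole number is the usual convention, since rounding down would give a margin slightly larger than requested.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose z interval has margin of error at most `margin`."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)

n_needed = required_sample_size(sigma=15, margin=3)
print(n_needed)
```

With σ = 15 and m = 3 at 95% confidence, (z*σ/m)^2 is about 96.04, so 97 observations are needed. Halving the margin to 1.5 roughly quadruples the required n.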
Significance test concerning an unknown mean mu
Significance tests for the hypothesis H0: μ = μ_0 are based on the z statistic, z = (x̄ − μ_0)/(σ/√n). This z test assumes an SRS of size n, known population standard deviation σ, and either a Normal population or a large sample.
The test statistic and P-value
The test of significance is based on a test statistic. The P-value is the probability, computed assuming that H0 is true, that the test statistic will take a value at least as extreme as that actually observed. Small P-values indicate strong evidence against H0. Calculating P-values requires knowledge of the sampling distribution of the test statistic when H0 is true. If the P-value is as small or smaller than a specified value α, the data are statistically significant at significance level α.
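A minimal sketch of a z test for H0: μ = μ_0 with known σ, returning the test statistic and its P-value. The function name, the `alternative` labels, and the example numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, P-value) for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # Ha: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":         # Ha: mu < mu0
        p = std_normal.cdf(z)
    elif alternative == "two-sided":    # Ha: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p

z, p_value = z_test(xbar=105, mu0=100, sigma=15, n=36)
significant = p_value <= 0.05   # significant at level alpha = 0.05?
print(f"z = {z:.2f}, P = {p_value:.4f}, significant: {significant}")
```

Here z = 2.0 and the two-sided P-value is about 0.0455, so the result is statistically significant at α = 0.05 but not at α = 0.01.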
Managing bias and variability
To reduce bias, use random sampling. -When we start with a list of the entire population, simple random sampling produces unbiased estimates—the values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter. -To reduce the variability of a statistic from an SRS, use a larger sample. You can make the variability as small as you want by taking a large enough sample.
What describes the probability distribution of a continuous random variable X?
a density curve
Parameter
a number that describes a population -a fixed number, but in practice we do not know its value
Statistic
a number that describes a sample (is computed from the sample data) -values can change from sample to sample -often used to estimate an unknown parameter
Sample
a part of the population that we actually examine in order to gather information
Discrete random variable
a random variable whose possible values can be listed, so it has a finite (or countably infinite) number of possible outcomes
Probability Sample
a sample chosen by chance
Random variable
a variable whose value is a numerical outcome of a random phenomenon
How to find the probability of any event for a discrete random variable
add the probabilities of the particular values that make up the event
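For a discrete model this is a one-line sum. The example below assumes a fair six-sided die and uses exact fractions; the names `die` and `probability_of` are illustrative.

```python
from fractions import Fraction

# Probability model for one roll of a fair six-sided die (illustrative).
die = {face: Fraction(1, 6) for face in range(1, 7)}

def probability_of(event, model):
    """Add the probabilities of the values that make up the event."""
    return sum(model[value] for value in event)

p_even = probability_of({2, 4, 6}, die)
p_at_least_5 = probability_of({5, 6}, die)
print(p_even, p_at_least_5)
```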
Bias and Variability
Bias concerns the center of the sampling distribution. -A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. -The variability of a statistic is described by the spread of its sampling distribution. --This spread is determined by the sampling design and the sample size n. -Statistics from larger probability samples have smaller spreads.
Simple random sample (SRS of size n)
consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected
Form of the confidence interval
estimate ± margin of error
A test of significance
is intended to assess the evidence provided by data against a null hypothesis H0 in favor of an alternative hypothesis Ha. -The hypotheses are stated in terms of population parameters. -Usually H0 is a statement that no effect or no difference is present, and Ha says that there is an effect or difference. The difference can be in a specific direction (one-sided alternative) or in either direction (two-sided alternative).
The sample mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ has a sampling distribution with what?
mean µ_x̄ = µ and standard deviation σ_x̄ = σ/√n.
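A simulation can check both facts about the sampling distribution of the sample mean. The sketch assumes a Uniform(0, 1) population (mean 0.5, standard deviation 1/√12), samples of size 25, 5000 repetitions, and a fixed seed; all of these are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)
n = 25                  # sample size
mu = 0.5                # mean of the Uniform(0, 1) population
sigma = 1 / sqrt(12)    # SD of the Uniform(0, 1) population

# Draw many samples of size n and record each sample mean.
sample_means = [sum(rng.random() for _ in range(n)) / n for _ in range(5000)]

observed_center = mean(sample_means)
observed_spread = stdev(sample_means)
predicted_spread = sigma / sqrt(n)

print(f"center: observed {observed_center:.4f}, predicted {mu}")
print(f"spread: observed {observed_spread:.4f}, predicted {predicted_spread:.4f}")
```

The observed center and spread should land close to µ and σ/√n, though a simulation only approximates the true sampling distribution.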
Power
The power of a significance test measures its ability to detect an alternative hypothesis. The power to detect a specific alternative is calculated as the probability that the test will reject H0 when that alternative is true. This calculation requires knowledge of the sampling distribution of the test statistic under the alternative hypothesis. Increasing the size of the sample increases the power when the significance level remains fixed.
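As a sketch, the power of an upper-tailed z test (H0: μ = μ_0 against Ha: μ > μ_0, known σ) can be computed from the Normal distribution. The one-sided setting, the function name, and the numbers are assumptions for illustration. The test rejects when z ≥ z_α, and under the alternative the z statistic is Normal with mean (μ_alt − μ_0)/(σ/√n) and standard deviation 1.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Probability of rejecting H0: mu = mu0 (vs Ha: mu > mu0) when mu = mu_alt."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)          # rejection cutoff for z
    shift = (mu_alt - mu0) / (sigma / sqrt(n))       # mean of z under the alternative
    return 1 - std_normal.cdf(z_alpha - shift)

power_n25 = power_upper_tailed_z(mu0=100, mu_alt=105, sigma=15, n=25)
power_n100 = power_upper_tailed_z(mu0=100, mu_alt=105, sigma=15, n=100)
print(f"power with n = 25:  {power_n25:.3f}")
print(f"power with n = 100: {power_n100:.3f}")
```

With α fixed at 0.05, the larger sample gives the higher power, and the probability of a Type II error at this alternative is 1 minus the power.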
Margin of Error
obtained from the sampling distribution and indicates how much error can be expected because of chance variation
Type II Error
occurs if H0 is not rejected ("accepted") when in fact Ha is true
Type I Error
occurs if H0 is rejected when it is in fact true
Nonresponse
occurs when an individual chosen for the sample can't be contacted or refuses to participate
Undercoverage
occurs when some groups in the population are left out of the process of choosing the sample
Confidence Level
states the probability that the method will give a correct answer. -That is, if you use 95% confidence intervals, in the long run 95% of your intervals will contain the true parameter value. -When you apply the method once, you do not know if your interval gave a correct value (this happens 95% of the time) or not (this happens 5% of the time).
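The long-run interpretation can be illustrated by simulation: build many 95% z intervals from a population whose mean is known to the simulation and count how often the interval captures it. The population (Normal with µ = 50, σ = 10), the sample size, the number of trials, and the seed are all assumptions for the example.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(7)
MU, SIGMA, N = 50.0, 10.0, 20          # "true" values known only to the simulation

z_star = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence
margin = z_star * SIGMA / sqrt(N)

trials = 2000
hits = 0
for _ in range(trials):
    xbar = sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
    if xbar - margin <= MU <= xbar + margin:
        hits += 1                      # this interval contains the true mean

coverage = hits / trials
print(f"{coverage:.1%} of the intervals contained mu")
```

The observed proportion should be near 95%, but for any single interval there is no way to tell from the data alone whether it is one of the hits.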
Continuous random variable
takes all values in an interval of numbers
What is the probability of any event for a continuous random variable?
the area under the density curve and above the values of X that make up the event
Sampling distribution
the distribution of values taken by the statistic in all possible samples of the same size from the same population or randomized experiment
Population
the entire group of individuals that we want information about
Confidence Interval
used to estimate an unknown parameter with an indication of how accurate the estimate is and of how confident we are that the result is correct.
What must be known for a probability sample?
what samples are possible and what chance each possible sample has
Addition rule for variances of independent random variables
σ_(X+Y)^2=σ_X^2+σ_Y^2 and σ_(X-Y)^2=σ_X^2+σ_Y^2.
General addition rule for variances of random variables
σ_(X+Y)^2=σ_X^2+σ_Y^2+2ρσ_X σ_Y and σ_(X-Y)^2=σ_X^2+σ_Y^2-2ρσ_X σ_Y. To find the standard deviation, take the square root of the variance.