Estimates and Sample Sizes
If the population standard deviation is known and the requirements for using the z distribution are met, follow these steps.
Press STAT. Use the arrow keys to highlight TESTS. Press 7 for 7:ZInterval... Highlight Stats. Press ENTER. Enter the values for the standard deviation (σ), sample mean (xbar), number of sample values (n), and confidence level (C-Level). Highlight Calculate. Then press ENTER.
Margin of Error
Denoted by E, it is the maximum likely (with probability 1 − α) difference between the observed sample proportion pˆ and the true value of the population proportion p.
Sample Size Notation
p = population proportion; pˆ = sample proportion; n = number of sample values; E = desired margin of error; z sub α/2 = z score separating an area of α/2 in the right tail of the standard normal distribution
confidence interval equation
pˆ − E < p < pˆ + E (the confidence interval limits are pˆ − E and pˆ + E)
The two main activities of inferential statistics are using sample data to
(1) estimate a population parameter (such as estimating a population parameter with a confidence interval), and (2) test a hypothesis or claim about a population parameter.
Two major activities of inferential statistics are
(1) to use sample data to estimate values of population parameters (such as a population proportion or population mean), and (2) to test hypotheses or claims made about population parameters.
You can construct this interval using a graphing calculator.
Press STAT. Use the arrow keys to highlight TESTS. Use the arrow keys to highlight A:1-PropZInt... Press ENTER. Enter the values for x, n, and C-Level. x = 1088 (This is the number of successes.) n = 1600 C-Level = .95 Highlight Calculate.
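The same 1-PropZInt calculation can be cross-checked in Python. This is a minimal sketch, assuming the usual normal-approximation interval pˆ ± z sub α/2 · √(pˆqˆ/n); the function name prop_z_interval is hypothetical and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def prop_z_interval(x, n, c_level):
    """Return (p_hat, E, lower, upper) for a one-proportion z interval.

    Hypothetical helper that mirrors the calculator inputs x, n, and C-Level.
    """
    p_hat = x / n                             # sample proportion
    q_hat = 1 - p_hat                         # sample proportion of failures
    alpha = 1 - c_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z sub alpha/2
    e = z * sqrt(p_hat * q_hat / n)           # margin of error
    return p_hat, e, p_hat - e, p_hat + e


p_hat, e, lower, upper = prop_z_interval(1088, 1600, 0.95)
print(f"p_hat = {p_hat:.3f}, E = {e:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```

With x = 1088 and n = 1600 this gives roughly 0.657 < p < 0.703, which should agree with the calculator screen to three decimal places.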
Notation for Student t distribution
μ = population mean; xbar = sample mean; s = sample standard deviation; n = number of sample values; E = margin of error; t sub α/2 = critical t value separating an area of α/2 in the right tail of the t distribution
In this lesson we present individual components of a hypothesis test. We should know and understand the following:
-How to identify the null hypothesis and alternative hypothesis from a given claim, and how to express both in symbolic form -How to calculate the value of the test statistic, given a claim and sample data -How to identify the critical value(s), given a significance level -How to identify the P-value, given a value of the test statistic -How to state the conclusion about a claim in simple and nontechnical terms
Why Do We Need Confidence Intervals?
-In the previous example we saw that 0.70 was our best point estimate of the population proportion p, but we have no indication of just how good our best estimate is. -Because a point estimate has the serious flaw of not revealing anything about how good it is, statisticians have cleverly developed another type of estimate. -This estimate, called a confidence interval or interval estimate, consists of a range (or an interval) of values instead of just a single value. Recall from the video that a confidence interval is sometimes abbreviated as CI.
In this lesson we present methods for using a sample proportion to estimate a population proportion. There are three main ideas that we should know and understand.
-The sample proportion is the best point estimate of the population proportion. -We can use a sample proportion to construct a confidence interval to estimate the true value of a population proportion, and we should know how to interpret such confidence intervals. -We should know how to find the sample size necessary to estimate a population proportion.
Point Estimate
-The sample proportion pˆ is the best point estimate of the population proportion p. -We use pˆ as the point estimate of p because it is unbiased and it is the most consistent of the estimators that could be used. -It is unbiased in the sense that the distribution of sample proportions tends to center about the value of p—that is, sample proportions pˆ do not systematically tend to underestimate or overestimate p. -The sample proportion pˆ is the most consistent estimator in the sense that the standard deviation of sample proportions tends to be smaller than the standard deviation of any other unbiased estimators.
confidence level
-the probability 1 - α (often expressed as the equivalent percentage value) that is the proportion of times that the confidence interval actually does contain the population parameter, assuming that the estimation process is repeated a large number of times. -Confidence level is sometimes called the degree of confidence, or it might be referred to as the confidence coefficient. -The confidence level gives us the success rate of the procedure used to construct the confidence interval. -The confidence level is often expressed as the probability or area 1 − α (lowercase Greek alpha), where α is the complement of the confidence level. -The most common choices for confidence level are 90% (with α = 0.10), 95% (with α = 0.05), and 99% (with α = 0.01).
Using Technology to Find the Mean and Standard Deviation of Sample
Press STAT. EDIT is highlighted. Press ENTER. Clear any existing data in L1. Then enter the 23 values shown to the right. Press 2nd MODE. (This is the command for QUIT.) Press STAT. Use the arrow keys to highlight CALC. Press ENTER for 1:1-Var Stats. Press 2nd and 1 to enter L1. Notice that the list names (L1, L2, L3, etc.) appear above the number keys. Press ENTER.
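The 1-Var Stats output can be reproduced in Python for comparison. This is only a sketch: the list below holds hypothetical values standing in for the data entered in L1, and the variable names are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical sample values standing in for the list entered in L1.
l1 = [48.2, 51.0, 45.7, 49.9, 46.3, 50.4]

x_bar = mean(l1)    # sample mean (xbar on the calculator)
s_x = stdev(l1)     # sample standard deviation (Sx), which divides by n - 1
n = len(l1)         # number of sample values

print(f"xbar = {x_bar:.2f}, Sx = {s_x:.2f}, n = {n}")
```

Note that stdev corresponds to Sx (the sample standard deviation), not to σx, which the calculator also reports.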
You can also use a graphing calculator to construct the confidence interval.
Press STAT. Use the arrow keys to highlight TESTS. Press 8 for 8:TInterval.... Highlight Stats. Press ENTER. Enter the values for the sample mean (xbar), sample standard deviation (Sx), number of sample values (n), and confidence level (C-Level). Highlight Calculate. Then press ENTER. The confidence interval is 43.9 < μ < 50.1.
confidence interval
a range (or an interval) of values used to estimate the true value of a population parameter. A confidence interval is sometimes abbreviated as CI. A confidence interval is associated with a confidence level, such as 0.95 (or 95%).
Student t distribution
the distribution of the ratio of a standard normal random variable Z to the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m; that is, t = Z / √(χ²/m).
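The definition above can be illustrated with a small simulation. This is a sketch under stated assumptions: the chi-squared variable is built as a sum of m squared standard normal values, the choice m = 10 and the seed are arbitrary, and the helper name t_draw is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_draw(m, rng):
    """Draw one value of Z / sqrt(chi2 / m) with m degrees of freedom."""
    z = rng.gauss(0, 1)                                  # standard normal numerator
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(m))   # independent chi-squared, m df
    return z / sqrt(chi2 / m)


rng = random.Random(1)   # fixed seed (arbitrary) so the run is repeatable
m = 10                   # degrees of freedom (illustrative choice)
draws = [t_draw(m, rng) for _ in range(20000)]

print(f"mean = {mean(draws):.3f}")
print(f"standard deviation = {stdev(draws):.3f}")
```

The simulated mean should be close to 0 and the standard deviation somewhat greater than 1, since the variance of a t distribution with m degrees of freedom is m/(m − 2) for m > 2.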
critical value
the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. The number z sub α/2 is a critical value: a z score with the property that it separates an area of α/2 in the right tail of the standard normal distribution.
Using Technology to Find a Confidence Interval
Press STAT. EDIT is highlighted. Press ENTER. Clear any existing data in L1. Then enter the 16 values shown to the right. Press 2nd MODE. (This is the command for QUIT.) Press 2nd and then Y= to access STAT PLOT. Press ENTER to choose 1:Plot1.... Highlight On and then press ENTER. Use the arrow keys to highlight the histogram. For Xlist, press 2nd and 1 to enter L1. Press ZOOM. Press 9 for 9:ZoomStat. The histogram shows that the data are approximately normally distributed. Because the population standard deviation is unknown, we will use the Student t distribution. Press 2nd MODE to QUIT. Press STAT. Use the arrow keys to highlight TESTS. Press 8 for 8:TInterval... Highlight Data. (Choose Stats if you are not using a list of values.) Press ENTER. For List, press 2nd and 1 to enter L1. For Freq, enter 1. For C-Level, enter .99.
EXAMPLE A poll found that 70% of 1501 randomly selected adults in the United States believe in global warming. So the sample proportion is pˆ= 0.70. Find the best point estimate of the proportion of all adults in the United States who believe in global warming.
SOLUTION Because the sample proportion is the best point estimate of the population proportion, we conclude that the best point estimate of p is 0.70. When using the sample results to estimate the percentage of all adults in the United States who believe in global warming, the best estimate is 70%.
alternative hypothesis
The hypothesis (denoted H sub 1) that the parameter has a value that differs from the value stated in the null hypothesis; its symbolic form uses <, >, or ≠.
Push Polling
a polling technique in which the questions are designed to shape the respondent's opinion
student t-distribution requirements
1. The sample is a simple random sample. 2. Either the sample is from a normally distributed population or n > 30.
Confidence level 90%
1.645
Suppose that the true proportion of all adults who believe in global warming is p = 0.75. Then the confidence interval obtained from the poll does not contain the population proportion, because the true population proportion 0.75 is not between 0.677 and 0.723. This is illustrated in the diagram.
A confidence level of 95% tells us that the process we are using will, in the long run, result in confidence interval limits that contain the true population proportion 95% of the time.
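The "success rate of the process" idea can be checked with a simulation. This is a minimal sketch: it assumes a true proportion p = 0.70 (in practice p is unknown), reuses the sample size 1501 from the poll, and uses a hypothetical function name coverage with an arbitrary seed and trial count.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p, n, c_level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - c_level) / 2)   # z sub alpha/2
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(n))   # successes in one sample
        p_hat = x / n
        e = z * sqrt(p_hat * (1 - p_hat) / n)         # margin of error
        if p_hat - e < p < p_hat + e:
            hits += 1
    return hits / trials


# Assumed true proportion 0.70 (hypothetical) and 1000 repeated samples.
rate = coverage(p=0.70, n=1501, c_level=0.95, trials=1000)
print(f"Intervals containing p: {rate:.1%}")
```

The printed rate should land near 95%. Any single interval either contains p or does not; the 95% describes the long-run behavior of the procedure.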
point estimate
A single point or value used to approximate a population parameter. The sample proportion pˆ is the best point estimate of the population proportion p.
chi-square distribution
A skewed distribution whose shape depends solely on the number of degrees of freedom. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetrical.
hypothesis
A testable prediction, often implied by a theory
hypothesis test
a statistical method that uses sample data to evaluate a hypothesis about a population
Curbstoning
Data collection personnel filling out surveys for fake respondents
Margin of error formula (σ known)
E = z sub α/2 · σ/√n
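The formula can be turned into a short Python sketch that mirrors the ZInterval inputs. The function name z_interval and the numbers passed to it are hypothetical, chosen only for illustration; the critical value comes from the standard library rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sigma, x_bar, n, c_level):
    """Return (E, lower, upper) for a confidence interval for mu, sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - c_level) / 2)   # z sub alpha/2
    e = z * sigma / sqrt(n)                           # margin of error
    return e, x_bar - e, x_bar + e


# Hypothetical inputs, not taken from any data set in these notes.
e, lower, upper = z_interval(sigma=15.0, x_bar=100.0, n=25, c_level=0.95)
print(f"E = {e:.1f}")
print(f"{lower:.1f} < mu < {upper:.1f}")
```

For these inputs E is about 5.9, so the interval runs from about 94.1 to 105.9.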
Margin of error for Student t distribution (σ unknown)
E = t sub α/2 · s/√n
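A sketch of the same calculation in Python follows. Because the Python standard library has no t distribution, the critical value is passed in as a parameter that the reader looks up (for example in a t table); the function name t_interval and the summary statistics are hypothetical.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """Return (E, lower, upper) for a confidence interval for mu, sigma unknown.

    t_crit is the critical value t sub alpha/2 for n - 1 degrees of freedom,
    supplied by the reader (for example from a t table).
    """
    e = t_crit * s / sqrt(n)   # margin of error
    return e, x_bar - e, x_bar + e


# Hypothetical summary statistics; 2.947 is the table value for
# 99% confidence with 15 degrees of freedom (n = 16).
e, lower, upper = t_interval(x_bar=30.0, s=5.0, n=16, t_crit=2.947)
print(f"E = {e:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Passing the critical value explicitly keeps the sketch dependency-free; a statistics package could supply it instead.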
margin of error equation
E = z sub α/2 · √(pˆqˆ/n)
confidence level example
For a 0.95 (or 95%) confidence level, α = 0.05. For a 0.99 (or 99%) confidence level, α = 0.01.
Finding Point Estimate and E from a Confidence Interval
Point estimate of μ: xbar = [(upper confidence limit) + (lower confidence limit)]/2 Margin of error: E = [(upper confidence limit) − (lower confidence limit)]/2
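As a quick check of these two formulas, here is a small Python sketch applied to the interval 43.9 < μ < 50.1 quoted earlier in these notes. The function name from_interval is hypothetical.

```python
def from_interval(lower, upper):
    """Recover the point estimate and margin of error from interval limits."""
    point_estimate = (upper + lower) / 2    # midpoint of the interval
    margin_of_error = (upper - lower) / 2   # half the interval width
    return point_estimate, margin_of_error


# Interval quoted in these notes: 43.9 < mu < 50.1
x_bar, e = from_interval(43.9, 50.1)
print(f"xbar = {x_bar:.1f}, E = {e:.1f}")
```

This prints xbar = 47.0 and E = 3.1. The sum or difference must be taken before dividing by 2.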
You can also use a graphing calculator to construct a confidence interval for μ.
Press STAT. Use the arrow keys to highlight TESTS. Highlight 7:ZInterval.... Press ENTER. Highlight Stats. Enter the values for standard deviation σ, sample mean, sample size n, and confidence level. Highlight Calculate. Then press ENTER. The confidence interval is 72.4 < μ < 80.2.
Identifying H sub 0 and H sub 1.
Step 1: Identify the specific claim or hypothesis to be tested, and express it in symbolic form. Step 2: Give the symbolic form that must be true when the original claim is false. Step 3: Using the two symbolic expressions obtained so far, identify the null hypothesis H sub 0 and the alternative hypothesis H sub 1: H sub 1 is the symbolic expression that does not contain equality. H sub 0 is the symbolic expression that the parameter equals the fixed value being considered. (The original claim may or may not be one of the above two symbolic expressions.)
a z score with the property that it separates an area of α/2 in the right tail of the standard normal distribution; this is the critical value z sub α/2.
On the left side of a confidence interval we use the negative value, −z sub α/2, which is the vertical boundary separating an area of α/2 in the left tail.
population mean
μ (mu): calculated by adding up all the values in a population and dividing by the number of items in that population. μ = (sum of all items)/(number of items)
confidence interval limits
xbar - E, xbar + E
chi-square
χ² = (n − 1)s²/σ²
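The formula is simple enough to evaluate directly. This is a sketch with hypothetical numbers (a sample of 10 values with s = 2.0 and an assumed σ = 1.5); the function name chi_square_statistic is made up for illustration.

```python
def chi_square_statistic(n, s, sigma):
    """Compute chi-square = (n - 1) * s^2 / sigma^2."""
    return (n - 1) * s ** 2 / sigma ** 2


# Hypothetical values: sample of 10 with s = 2.0 and an assumed sigma = 1.5.
n = 10
chi2 = chi_square_statistic(n=n, s=2.0, sigma=1.5)
print(f"chi-square = {chi2:.1f} with df = {n - 1}")
```

Here the statistic is 9 · 4 / 2.25 = 16.0 with 9 degrees of freedom.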
Incorrect Interpretation
-There is a 95% chance that the true value of p will fall between 0.677 and 0.723. -It would also be incorrect to say that "95% of sample proportions fall between 0.677 and 0.723."
Correct Interpretation
-We are 95% confident that the interval from 0.677 to 0.723 actually does contain the true value of the population proportion p. -This means that if we were to select many different samples of size 1501 and construct the corresponding confidence intervals, 95% of them would actually contain the value of the population proportion p. -(Note that in this correct interpretation, the level of 95% refers to the success rate of the process being used to estimate the proportion.)
Requirements for using a normal distribution as an approximation to a binomial distribution
1. The sample is a simple random sample. 2. The conditions for a binomial distribution are satisfied: there is a fixed number of trials, the trials are independent, there are two categories of outcomes, and the probabilities remain constant for each trial. 3. The normal distribution can be used to approximate the distribution of sample proportions because np ≥ 5 and nq ≥ 5 are both satisfied. This means that there are at least 5 successes and at least 5 failures.
Requirements for using the sample mean to estimate the population mean (with σ known)
1. The sample is a simple random sample. 2. The value of the population standard deviation σ is known. 3. Either or both of these conditions are satisfied: the population is normally distributed or n > 30.
The sample mean xbar is the best point estimate of the population mean μ.
Xbar is an unbiased estimator of the population mean μ, and for many populations, sample means tend to vary less than other measures of center, so the sample mean Xbar is usually the best point estimate of the population mean μ.
margin of error
a measure of the accuracy of a public opinion poll
test statistic
a statistic whose value helps determine whether a null hypothesis should be rejected
When an estimate pˆ is unknown:
n = [z sub α/2]² · 0.25 / E². Always round up to the nearest whole number, no matter what.
When an estimate pˆ is known:
n = [z sub α/2]² · pˆqˆ / E². Always round up to the nearest whole number, no matter what.
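Both sample-size cases (estimate known and unknown) fit in one short Python sketch. The function name sample_size_proportion and the inputs (E = 0.03, 95% confidence, pˆ = 0.70) are illustrative assumptions; math.ceil performs the round-up.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(e, c_level, p_hat=None):
    """Sample size needed to estimate p within margin of error e.

    When no estimate p_hat is available, p_hat * q_hat is replaced by 0.25.
    """
    z = NormalDist().inv_cdf(1 - (1 - c_level) / 2)       # z sub alpha/2
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z ** 2 * pq / e ** 2)                     # always round up


n_unknown = sample_size_proportion(e=0.03, c_level=0.95)            # no estimate
n_known = sample_size_proportion(e=0.03, c_level=0.95, p_hat=0.70)  # with estimate
print(n_unknown, n_known)
```

For these inputs the sketch returns 1068 without an estimate and 897 with pˆ = 0.70. Using 0.25 gives the largest possible requirement because pˆqˆ is never greater than 0.25.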
Sample size for Estimating Mean μ
n = [z sub α/2 · σ / E]². Round up to the nearest whole number.
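A minimal Python sketch of this sample-size formula is shown below. The function name sample_size_mean and the inputs (σ = 15, E = 3, 95% confidence) are hypothetical, chosen only to show the round-up step.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, e, c_level):
    """Sample size needed to estimate mu within margin of error e."""
    z = NormalDist().inv_cdf(1 - (1 - c_level) / 2)   # z sub alpha/2
    return ceil((z * sigma / e) ** 2)                 # round up, never down


n_needed = sample_size_mean(sigma=15, e=3, c_level=0.95)
print(n_needed)
```

Here the unrounded value is a little over 96, so the required sample size is 97.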
degrees of freedom
number of values that are free to vary after certain restrictions have been imposed on all values
inferential statistics
numerical data that allow one to generalize- to infer from sample data the probability of something being true of a population
descriptive statistics
numerical data used to measure and describe characteristics of groups. Includes measures of central tendency and measures of variation.
Notation for a Proportion
p = population proportion; pˆ = x/n = sample proportion of x successes in a sample of size n; qˆ = 1 − pˆ = sample proportion of failures in a sample of size n.
null hypothesis
the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
Confidence level 95%
1.96
Confidence level 99%
2.576
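The three critical values on these cards (1.645, 1.96, 2.576) can be reproduced from the standard normal distribution. This is a small sketch using Python's standard library; the function name critical_z is hypothetical.

```python
from statistics import NormalDist


def critical_z(c_level):
    """Return z sub alpha/2 for a given confidence level."""
    alpha = 1 - c_level
    # z score with an area of alpha/2 in the right tail
    return NormalDist().inv_cdf(1 - alpha / 2)


for c_level in (0.90, 0.95, 0.99):
    print(f"{c_level:.0%} confidence: z = {critical_z(c_level):.3f}")
```

The loop prints 1.645, 1.960, and 2.576 for the 90%, 95%, and 99% confidence levels.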
Important Properties of the Student t Distribution
1. The Student t distribution is different for different sample sizes 2. The Student t distribution has the same general symmetric bell shape as the standard normal distribution but it reflects the greater variability (with wider distributions) that is expected with small samples. 3. The Student t distribution has a mean of t = 0 (just as the standard normal distribution has a mean of z = 0). 4. The standard deviation of the Student t distribution varies with the sample size and is greater than 1 (unlike the standard normal distribution, which has a σ = 1). 5. As the sample size n gets larger, the Student t distribution gets closer to the normal distribution.
Properties of the Chi-Square Distribution
1. The chi-square distribution is not symmetric, unlike the normal and Student t distributions. 2. The values of chi-square can be zero or positive, but they cannot be negative. 3. The chi-square distribution is different for each number of degrees of freedom, and the number of degrees of freedom is given by df = n − 1.
Procedure for Constructing a Confidence Interval for p
1. Verify that the required assumptions are satisfied. 2. Refer to Table A-2 and find the critical value z sub α/2 that corresponds to the desired confidence level. 3. Evaluate the margin of error E. 4. Using the value of the calculated margin of error E and the value of the sample proportion pˆ, find the values of pˆ − E and pˆ + E. 5. Round the resulting confidence interval limits to three significant digits.
Procedure for Constructing a Confidence Interval for μ (With σ Unknown)
1. Verify that the required assumptions are satisfied. 2. Refer to Table A-3 and find the critical value t sub α/2 that corresponds to the desired confidence level. 3. Evaluate the margin of error E. 4. Using the value of the calculated margin of error E and the value of the sample mean xbar, find the values of xbar − E and xbar + E. Substitute those values in the general format for the confidence interval. 5. Round the resulting confidence interval limits. If using the original set of data, round to one more decimal place than is used for the original set of data. If using summary statistics, round the confidence interval limits to the same number of decimal places used for the sample mean.
Procedure for Constructing a Confidence Interval for µ (with Known σ)
1. Verify that the requirements are satisfied. 2. Refer to Table A-2 and find the critical value z sub α/2 that corresponds to the desired confidence level. 3. Evaluate the margin of error E. 4. Using the value of the calculated margin of error E and the value of the sample mean xbar, find the values of xbar − E and xbar + E. Substitute those values in the general format for the confidence interval. 5. Round the resulting values by using the following round-off rule.
Round-Off Rule for Confidence Intervals Used to Estimate µ
1. When using the original set of data to construct a confidence interval, round the confidence interval limits to one more decimal place than is used for the original set of data. 2. When the original set of data is unknown and only the summary statistics are used, round the confidence interval limits to the same number of decimal places used for the sample mean.
chi-square distribution Requirements
1. The sample is a simple random sample. 2. The population must have normally distributed values. 3.