Stats 12: Chapter 8- Hypothesis testing for population proportions
Hypothesis testing is____
a procedure that enables us to use and analyze data to decide between TWO statistical hypotheses.
Ha: p ≠ p0 is a _____sided hypothesis
two sided
if the alternative hypothesis is two-sided: Ha: _________
Ha: p is not equal to p0
Hypotheses are always statements about______, they are never statements about_____
•population parameters •sample statistics.
The significance level and the p-value are both_____and both are calculated______. To compute these we need to know______
•probabilities •ASSUMING the null hypothesis is true •the probability distribution of the test statistic
We usually do not have data on the population, so we instead use a ________. ______have variability so we will always have_____
•random sample •Samples •a chance of making the wrong conclusion
When is the outcome so unusual that we should reject the null hypothesis? In other words, how small of a p-value is considered unusual ENOUGH to reject the null hypothesis?
•reject the null hypothesis if the p-value is SMALLER THAN (or equal to) the value chosen for the SIGNIFICANCE LEVEL, alpha.
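The rejection rule above can be sketched in a few lines of Python. The function name `decide` and the example p-values are illustrative assumptions, not part of the course material.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rejection rule: reject H0 when the p-value is <= alpha."""
    if not (0.0 <= p_value <= 1.0 and 0.0 <= alpha <= 1.0):
        raise ValueError("p_value and alpha must be probabilities in [0, 1]")
    return "reject H0" if p_value <= alpha else "do not reject H0"


print(decide(0.03))  # p-value below alpha = 0.05
print(decide(0.20))  # p-value above alpha = 0.05
```

The default of 0.05 is only the common convention; the significance level should be stated before looking at the data.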
The denominator of the z-statistic uses the_____
•standard error assuming the NULL HYPOTHESIS H0: p=p0, NOT the observed value p hat.
Hypothesize step of a hypothesis test
•state the hypothesis/claim you want to test against a neutral, skeptical claim; both are statements about a population parameter
Hypothesis testing is a type of____since we are_____
•statistical inference, since we are using data from a SAMPLE to draw conclusions about a POPULATION parameter
rejecting the null hypothesis means we have____
•sufficient or overwhelming evidence against the null hypothesis and in support of the alternative hypothesis.
In a coin spinning example, the theoretical probability can be thought of as____
•the POPULATION PROPORTION of coin spins that land on heads
The alternative hypothesis (two sided) is that____. In this context, "at least as extreme" means_____. This corresponds to_____(two tailed p values)
•the TRUE value of p is either larger or smaller than what the null hypothesis claims •as far or EVEN FARTHER AWAY from 0 than the value you observed, in either direction •finding the two-tailed or two-sided p-value
1-β is____
•the POWER of the test: the ability of the hypothesis test to CORRECTLY reject the null hypothesis (reject H0 when H0 is false).
If the alternative hypothesis is Ha: p>p0 then____
•the alternative hypothesis is that the true value of p is GREATER than what the null hypothesis claims •in this context, "at least as extreme" means "greater than or equal to the observed value." •this corresponds to finding the right-tailed or right-sided p-value: the probability in the RIGHT tail of the N(0,1) distribution
If the alternative hypothesis is Ha: p<p0 then_____
•the alternative hypothesis is that the true value of p is LESS than what the null hypothesis claims •in this context, "at least as extreme" means "less than or equal to the observed value" •this corresponds to finding the left-tailed or left-sided p-value: the probability in the LEFT tail of the N(0,1) distribution
What is considered "at least as extreme" or "as extreme as or more extreme" depends on______
•the alternative hypothesis.
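A minimal sketch of how the choice of alternative hypothesis changes the p-value, using the standard Normal distribution from Python's `statistics` module. The function name `p_value` and the labels "less", "greater", and "two-sided" are assumptions made for this example.

```python
from statistics import NormalDist


def p_value(z: float, alternative: str) -> float:
    """Tail probability of N(0,1) that counts as 'at least as extreme' as z."""
    std_normal = NormalDist(0, 1)
    if alternative == "less":        # Ha: p < p0, left tail
        return std_normal.cdf(z)
    if alternative == "greater":     # Ha: p > p0, right tail
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":   # Ha: p != p0, both tails added together
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


z_observed = 1.5  # hypothetical observed z-statistic
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(z_observed, alt), 4))
# less 0.9332
# greater 0.0668
# two-sided 0.1336
```

The same z-statistic gives three different p-values, which is why the alternative must be fixed before computing anything.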
Statistical Inference is____
•the art and science of drawing conclusions about a population based on observing a subset of the population.
the sampling distribution for the one-proportion test statistic is approximately the standard Normal distribution N(0,1) if____
•the three conditions (random and independent, large sample, big population) are met •AND the null hypothesis, H0: p=p0, is TRUE
if the test statistic is 0 (or close to 0) then___
•the observed value is CLOSE to what we EXPECTED if the null hypothesis is true •there is little to no evidence to doubt the null hypothesis
If the test statistic is positive then____
•the outcome was LARGER than expected
if the test statistic is NEGATIVE, then____
•the outcome was SMALLER than expected
Do not reject the null hypothesis if____
•the p-value is LARGER THAN the significance level
two-tailed or two-sided p-value
•the probability in both TAILS of the N(0,1) distribution •add the probability together
Left tailed or left-sided p-value
•the probability in the left tail of the N(0,1) distribution •left from the observed test-statistic
right-tailed or right-sided p-value is____
•the probability in the right tail of the N(0,1) distribution •right from the observed test-statistic
The significance level of a hypothesis test is_____
•the probability of mistakenly rejecting the null hypothesis when the null hypothesis is TRUE •this mistake is called a Type 1 error •the significance level is denoted by the Greek letter alpha
the p-value is____, it is how we measure_______
•the probability of observing a test statistic AT LEAST as extreme as the observed value, if the null hypothesis is true •how we measure surprise in observed values away from the null
To find the approximate p-value we can find_____
•the probability that the N(0,1) distribution has a value at least as extreme as the z-statistic
If we have a two-sided alternative hypothesis, then the p-value is____
•the probability that the N(0,1) distribution is either larger than |z| or smaller than -|z|, where z is the calculated z-statistic.
Condition 1: Random and Independent
•the sample is randomly selected from the population of interest, either with or without replacement and observations are independent of each other- each observation has NO INFLUENCE on any others
p hat is___
•the sample proportion
Condition 2: Large sample
•the sample size n is large ENOUGH that the sample EXPECTS at least 10 successes and 10 failures. •np0 greater than or equal to 10, n(1-p0) greater than or equal to 10
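The large-sample condition can be checked directly from n and p0. This is a small sketch; the function name `large_sample_ok` and the example sample sizes are hypothetical.

```python
def large_sample_ok(n: int, p0: float, minimum: float = 10) -> bool:
    """Check that the sample EXPECTS at least `minimum` successes and failures under H0."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes >= minimum and expected_failures >= minimum


print(large_sample_ok(100, 0.2))  # expects 20 successes and 80 failures
print(large_sample_ok(40, 0.2))   # expects only 8 successes
```

Note that the check uses the null value p0, not the observed p hat, because the test is carried out assuming the null hypothesis is true.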
If the three conditions of the central limit theorem hold then_____
•the sampling distribution of the sample proportion p hat is approximately normal, with mean p (the true population proportion) and standard deviation given by the standard error.
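One way to see this claim is a small simulation: draw many random samples, compute p hat for each, and compare the mean and spread of those p hat values with p and the standard error sqrt(p(1-p)/n). The values of `p`, `n`, `repetitions`, and the seed below are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(12)  # fixed seed so the run is repeatable

p, n, repetitions = 0.3, 100, 5000  # hypothetical population proportion and sample size


def sample_proportion(n: int, p: float) -> float:
    """Simulate one random sample of size n and return its sample proportion."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n


p_hats = [sample_proportion(n, p) for _ in range(repetitions)]

simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.pstdev(p_hats)
theoretical_se = math.sqrt(p * (1 - p) / n)

print("mean of p hat:", round(simulated_mean, 3), "vs p =", p)
print("SD of p hat:  ", round(simulated_sd, 4), "vs SE =", round(theoretical_se, 4))
```

The simulated mean should land close to p and the simulated standard deviation close to the standard error; the match is approximate, not exact.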
By following the rules for rejecting and not rejecting a null hypothesis we guarantee that_____
•the significance level is achieved, meaning that the probability of mistakenly rejecting the null hypothesis is (at most) alpha (the significance level)
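As a rough check of this guarantee, the sketch below adds up the exact binomial probability of every sample count that would lead to rejection in a two-sided test, assuming H0 is true. Because counts are discrete and the N(0,1) distribution is only an approximation, the result is close to alpha rather than exactly equal to it. The function name and the values n = 100, p0 = 0.5 are assumptions for this example.

```python
from math import comb, sqrt
from statistics import NormalDist


def type1_error_rate(n: int, p0: float, alpha: float = 0.05) -> float:
    """Probability of rejecting H0: p = p0 (two-sided z-test) when H0 is actually true."""
    se = sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist(0, 1)
    total = 0.0
    for x in range(n + 1):                     # every possible number of successes
        z = (x / n - p0) / se
        p_val = 2 * std_normal.cdf(-abs(z))    # two-sided p-value
        if p_val <= alpha:                     # this outcome would be rejected
            total += comb(n, x) * p0**x * (1 - p0) ** (n - x)
    return total


rate = type1_error_rate(n=100, p0=0.5, alpha=0.05)
print(round(rate, 4))  # close to, but not exactly, 0.05
```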
Describe the significance level in context: The proportion of movie-goers who watch 3D movies used to be about 20%. Since the release of Avatar, we think the proportion has increased
•the significance level is the PROBABILITY of concluding that the proportion of movie-goers who watch 3D movies is MORE than 20% when the true proportion is ACTUALLY 20%.
p is___
•the true (typically unknown) population proportion of people or objects with SOME CHARACTERISTIC
If we had data on the entire population then we would know______and_______
•the true parameter •and a hypothesis test would not be necessary
p0 is___
•the value of the population proportion that the null hypothesis CLAIMS to be true
the value p0 represents____
•the value of the population proportion that the null hypothesis claims to be true
Ideally we want the probability of both types of errors (type 1 and type 2) to be small, but____
•this is not always possible because we cannot control both at once, e.g., we could reduce the probability of a Type 1 error to 0, but then the probability of a Type 2 error would be 100%, and vice versa
There are____types of one-sided hypotheses
•two •right and left
Ha: p does not equal p0 is a ____sided hypothesis
•two sided
Once we have checked the conditions to use the standard Normal distribution to approximate the distribution of the test statistic, we want to_____
•use this to calculate the p-value
Small p-values (close to 0) mean____
•we ARE surprised! •if the null hypothesis is true, what we observed RARELY happens
the farther the test statistic is from 0 the more_____
•we DOUBT the null hypothesis •large values of the test statistic are evidence AGAINST the null hypothesis
not rejecting the null hypothesis means____
•we have INSUFFICIENT EVIDENCE that the null hypothesis is not true.
The null hypothesis tells us________. If we observe something unexpected or surprising we should______. If we observe something extremely unexpected, we should_____
•what TO EXPECT when we observe data •we should DOUBT the null hypothesis •reject the null hypothesis.
Type 1 error
•when you mistakenly reject the null hypothesis when the null hypothesis is TRUE
Hypothesis tests of proportions use the ______test statistic
•z
The one-proportion z-test statistic is_______where______
•z = (p hat - p0)/SE •where SE = sqrt(p0(1-p0)/n)
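The formula above translates directly into code. This is a minimal sketch; the function name `one_proportion_z` and the coin-spin counts are hypothetical.

```python
import math


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """z = (p hat - p0) / SE, with SE computed from the NULL value p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error assuming H0: p = p0
    return (p_hat - p0) / se


# Hypothetical data: 60 heads in 100 coin spins, testing H0: p = 0.5
z = one_proportion_z(successes=60, n=100, p0=0.5)
print(round(z, 2))  # 2.0, i.e. p hat is two standard errors above p0
```

A positive z means the outcome was larger than expected under the null hypothesis; a negative z means it was smaller.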
When conducting a hypothesis test, we will test between two hypotheses, _____and _____
•null hypothesis and the alternative hypothesis
Ha: p<p0 is a ____sided hypothesis
•one (left) sided
Ha: p >p0 is a _____sided hypothesis
•one sided (right)
the form of the alternative hypothesis can either be____or____depending on_____
•one-sided or two-sided •depending on the statement about the PARAMETER that we want to support.
We use the p-value to measure____
•our surprise
If, regardless of the data, we NEVER reject the null hypothesis, the significance level is____
•0% •alpha= 0
If we find a p-value that is less than alpha, we will reject the null hypothesis H0, there are two possibilities:
1) H0 is actually false and we made the CORRECT conclusion 2) H0 is actually true and we happened to get a random sample that produced a small p-value (Type 1 error)
If we find a p-value that is greater than alpha, we will fail to reject the null hypothesis. There are two possibilities
1) H0 is actually true and we made the CORRECT conclusion 2) H0 is actually false and we happened to get a random sample that produced a large p-value (Type 2 error)
A formal hypothesis test has four main steps:
1) hypothesize 2) prepare 3) compute to compare 4) interpret
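The four steps can be walked through in code for a right-sided test like the 3D movie question (H0: p = 0.20 against Ha: p > 0.20). The survey numbers below (52 of 200 movie-goers) are invented for illustration only, and all variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

# 1) Hypothesize: H0: p = 0.20, Ha: p > 0.20 (right-sided)
p0 = 0.20

# 2) Prepare: state alpha, choose the z-statistic, check the large-sample condition
alpha = 0.05
n, successes = 200, 52  # HYPOTHETICAL survey data, not from the course
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# 3) Compute to compare: observed z-statistic and right-tailed p-value
p_hat = successes / n
se = sqrt(p0 * (1 - p0) / n)          # standard error under H0
z = (p_hat - p0) / se
p_val = 1 - NormalDist(0, 1).cdf(z)   # probability in the RIGHT tail

# 4) Interpret: decide, then state the conclusion in context
decision = "reject H0" if p_val <= alpha else "do not reject H0"
print(f"conditions ok: {conditions_ok}, z = {z:.2f}, p-value = {p_val:.3f}")
print(decision)
```

With these made-up numbers the p-value falls below 0.05, so the conclusion in context would be that there is sufficient evidence the proportion watching 3D movies is more than 20%. Random sampling and the big-population condition still have to be argued from how the data were collected; the code cannot check them.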
there are two types of one-sided hypotheses
1) if the alternative hypothesis is Ha: p<p0 2) if the alternative hypothesis is Ha: p>p0
Conditions to check for the One-proportion test statistic
1) random and independent 2) large sample 3) big population
Condition 3: Big Population
•if the sampling is without replacement, the population needs to be MUCH LARGER than the sample size •at least 10 times as big
If we reduce the significance level (probability of Type 1 error) then we_____.
•increase the probability of a Type 2 error •and vice versa: reducing the probability of a Type 2 error means increasing the significance level.
the ONLY way to reduce the probability of both types of error is to____This will____
•increase the sample size •improve the PRECISION of the test, so we make mistakes of ANY TYPE less often.
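A rough numerical illustration of this trade-off for a right-sided test, using Normal approximations under both the null value and one assumed true value of p. The function name `type2_error_prob` and the values p0 = 0.2 and p_true = 0.3 are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist


def type2_error_prob(n: int, p0: float, p_true: float, alpha: float) -> float:
    """Approximate beta for Ha: p > p0 when the true proportion is p_true."""
    std_normal = NormalDist(0, 1)
    se_null = sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    # Smallest sample proportion that leads to rejecting H0 at level alpha
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * se_null
    # Beta = chance p hat falls below the cutoff even though p = p_true
    return std_normal.cdf((cutoff - p_true) / se_true)


for n in (100, 400):
    for alpha in (0.05, 0.01):
        beta = type2_error_prob(n, p0=0.2, p_true=0.3, alpha=alpha)
        print(f"n={n}, alpha={alpha}: beta={beta:.3f}, power={1 - beta:.3f}")
```

At a fixed n, lowering alpha raises beta; at a fixed alpha, raising n lowers beta. Only the larger sample improves both at once.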
interpret step of a formal hypothesis test
•make CONCLUSIONS based on the results 1) decide to reject or not reject the null hypothesis 2) interpret this conclusion in the context of the data.
in the final step of the hypothesis test we need to____
•make a decision between the hypotheses and explain what the decision MEANS
the null hypothesis usually represents the_________statement and is thus given_______
•neutral, noncontroversial, status quo statement and is thus given the benefit of the doubt
the null hypothesis often represents____
•no change, no effect, or no difference
sample
• a COLLECTION of objects or people taken from the population of interest
a test statistic is_____
• a VALUE (a statistic) that COMPARES our OBSERVED outcome with the outcome the NULL HYPOTHESIS says we SHOULD see.
population
• a group of objects or people we wish to study
statistic is____
• a numerical characteristic of a SAMPLE of data
parameter
• a numerical value that CHARACTERIZES some aspect of the population
A statistical hypothesis is____
• an ASSUMPTION or CLAIM about a POPULATION PARAMETER
the test statistic we use in this class will have the form: z=
•(observed value-NULL mean)/ NULL standard error
If, regardless of the data, we always REJECT the null hypothesis, the significance level is _____
•100% •alpha= 1
The null hypothesis written as______, is _______
•H0 •the NEUTRAL , STATUS QUO, skeptical statement about a POPULATION parameter
the alternative hypothesis, written as _____is____
•Ha •the research hypothesis •is a statement about the value of a parameter that we INTEND to demonstrate IS TRUE.
Large p-values (close to 1) mean we are ___
•NOT surprised •if the null hypothesis is true, what we observed happens pretty often
If we observe an outcome in the red, "rejection region," then we would want to____
•REJECT the null hypothesis.
the null hypothesis is assumed to be_____
•TRUE throughout the hypothesis testing procedure.
The probability of a type 1 error is denoted by ____and is called the ______. The probability of a type 2 error is denoted by_____
•alpha •called the significance level •beta
The significance level can depend on context, but many researchers and statisticians use a significance level of____
•alpha=0.05
In this class, the null hypothesis will always have______
•an equal (=) sign
The p-value is defined to be the probability of observing a test statistic AT LEAST as extreme as the observed value, if the null hypothesis is true. What do we mean by "at least as extreme"?
•we are asking for the probability of getting a test statistic value AS EXTREME AS OR MORE EXTREME than the one we observed
In general, we might not know the shape of the sampling distribution of the test statistic so____. However, if certain conditions are met then we are able to_______if_____
•calculating p-values is NOT always straightforward. •APPROXIMATE the sampling distribution for the one-proportion test statistic using the Central Limit Theorem
compute to compare step of a formal hypothesis test
•collect data and compare them to your expectations 1) compute the observed value of the test statistic and COMPARE it to what is expected under the null hypothesis 2) find the p-value in order to MEASURE the level of surprise
Once we collect data on a sample we want to____
•compare what we observe to what we would EXPECT if the null hypothesis is true!
Significance levels of 0% and 100% aren't very informative, so we will have some____
•criterion for HOW unusual the observed data needs to be in order to make the decision to reject the null hypothesis or not •we want a procedure with a SMALL significance level, since we do not want to make mistakes too often
prepare step of a formal hypothesis test
•determine HOW you will use data to make your decision and make sure you have enough data to minimize the chance of a wrong conclusion 1) state a significance level 2) choose a test statistic appropriate for the hypotheses 3) state and check conditions required for computations 4) state any necessary assumptions
The z-statistic measures_____
•distance from the EXPECTED value assuming the null hypothesis, in UNITS of standard errors
If we conclude that the observed outcome (data) is __________, then, and only then do we reject the null hypothesis in favor of the alternative
•extremely unusual or unlikely under this assumption (of the null hypothesis)
Type 2 error or a _______occurs when______
•false negative •when we DO NOT reject H0 when H0 is ACTUALLY false
Type 1 error also called a "______"occurs when_______
•false positive •occurs when we reject the H0 when H0 is ACTUALLY true
We are able to set the significance level to be a _____
•fixed value
one sided hypothesis
•if the ALTERNATIVE hypothesis has a "less than" (<) sign or "greater than" (>) sign
two-sided hypothesis
•if the alternative hypothesis has a "not equal to" sign •"different" doesn't specify greater than or less than the null
