Statistics Final
Study results are said to be replicable when...
... researchers using new subjects come to the same conclusion.
A significance level that is often used in hypothesis testing by researchers & statisticians is _____.
0.05
Precision
Sampling distribution spread measured by using the standard deviation of the sampling distribution, or the SE. The ____ of an estimator does not depend on population size, only sample size.
Standard Error
Standard deviation of a sampling distribution. Measures how much our estimator typically varies from sample to sample. When this is small, we say the estimator is precise.
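A small sketch of the idea above, assuming the usual formula for the standard error of a sample proportion, sqrt(p(1 − p)/n). The function name `standard_error` is illustrative, not from the course material. Note that the population size is not an input at all.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion for sample size n."""
    return math.sqrt(p * (1 - p) / n)

# Population size is not an input; only p and the sample size n matter.
se_100 = standard_error(0.5, 100)   # n = 100
se_400 = standard_error(0.5, 400)   # n = 400, four times larger
print(se_100, se_400)
```

Quadrupling the sample size halves the SE, which is why larger samples give more precise estimators.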
Sample Proportions
(P hat) # of successes / # of trials
A new drug is being tested to see whether it can increase the chance of a quick recovery in people who have come down with the flu in the past week. The quick recovery rate in the population of concern is 0.85. The null hypothesis is that p (the population proportion using the new drug that have a quick recovery) is 0.85. What is the correct alternative hypothesis?
p> 0.85
The mean age of all U.S. vice presidents when they took office is an example of a ____
parameter
A researcher carried out a hypothesis test using a 2-tailed alternative hypothesis. Which of the following z-scores is associated with the smallest p-value? Explain. i. z =−0.35, ii. z = 1.13, iii. z = −2.39, iv. z = −3.07
z = -3.07, the z-score farthest from 0 has the smallest tail area & thus has the smallest p-value.
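The comparison above can be checked numerically. This is a minimal sketch that assumes the test statistic follows a standard Normal distribution; `two_tailed_p` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value: area in both tails beyond |z| under N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z_scores = [-0.35, 1.13, -2.39, -3.07]
p_values = {z: two_tailed_p(z) for z in z_scores}

# The z-score farthest from 0 should have the smallest p-value.
smallest = min(p_values, key=p_values.get)
print(p_values, smallest)
```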
In a chi-square test of independence, the # of degrees of freedom = product of (# of rows −1) & ________.
(# of columns -1)
If we reject the null hypothesis, can we claim to have proved that the null hypothesis is false?
No. A sufficiently small p-value means the observed outcome would be unlikely if the null hypothesis were true, but unlikely is not the same as impossible.
If survey sample conditions satisfy those required by the CLT, then the probability that a sample proportion will fall within 2 standard errors of the population value is ___.
95%
Statistic
A # based on data used to estimate the value of a population's characteristic. Sometimes called an estimator.
Right-tailed Hypothesis
The p-value is the probability of getting a test statistic greater than or equal to the observed value.
Population
Group of objects or people we wish to study.
The % of residents of a certain country who support stricter gun control laws has been 53%. A recent poll of 922 people showed 527 in favor of stricter gun control laws. Assume the poll was given to a random sample of people. Test the claim that the proportion of those favoring stricter gun control has changed. Perform a hypothesis test, using a significance level of 0.05.
H0: The population proportion that supports stricter gun control is 0.53, p=0.53. Ha: p≠0.53.
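The rest of this test can be sketched in code. The sketch assumes the standard one-proportion z-test, with the standard error computed from the null value 0.53; the function name `one_proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed one-proportion z-test; returns (z, p_value)."""
    p_hat = successes / n                 # sample proportion
    se = sqrt(p0 * (1 - p0) / n)          # SE assuming the null is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(527, 922, 0.53)
reject_null = p_value < 0.05              # significance level from the problem
print(round(z, 2), round(p_value, 4), reject_null)
```

With these numbers z is roughly 2.5 and the p-value is close to 0.01, which is below 0.05, so the null hypothesis is rejected: the data suggest the proportion has changed.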
A coin is flipped 40 times and lands on heads 14 times. You want to test the hypothesis that the coin does not come up 50% heads in the long run. What is the correct null hypothesis?
H0: p=0.50
Steps in preparing to do a hypothesis test
1. Check sampling distribution conditions 2. Set the significance level 3. Choose a test statistic
Hypothesis Testing 4 Steps
1. HYPOTHESIZE- state hypotheses about the population parameter. 2. PREPARE- choose & state a significance level. Choose a test statistic appropriate for the hypothesis. State & check the conditions required for future computations, & state any assumptions that must be made. 3. COMPUTE the test statistic's observed value, & COMPARE it to what the null hypothesis said you would get. Find the p-value to measure your level of surprise. 4. INTERPRET- Reject or fail to reject the null hypothesis? What does it mean in the context of the data?
The power of a hypothesis test depends on what?
1. How wrong the null hypothesis is 2. Significance level 3. Sample Size
Conditions for a 2-sample Proportion Test
1. Large samples 2. Random samples 3. Independent samples 4. Independent within samples 5. Null hypothesis is true
To reach causal conclusions & conclude that the entire population would be affected similarly, a research study must use which of the following?
1. Random sampling: allows one to generalize the results to the entire population. 2. Random assignment: allows one to conclude that the relationship is causal.
Conditions for Calculating an Approximate p-Value
1. Random sample 2. Large enough sample size- has at least 10 expected successes & 10 expected failures 3. Without replacement- population size is at least 10x bigger than the sample size 4. Independent- Each observation or measurement must not influence any others. 5. Null hypothesis is true.
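The two numeric conditions in this list can be checked mechanically. This is only a sketch: `check_conditions` is a hypothetical helper, and randomness and independence cannot be verified from the numbers alone.

```python
def check_conditions(n, p0, population_size=None):
    """Check the sample-size and population-size conditions for a z-test."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    large_sample = expected_successes >= 10 and expected_failures >= 10
    # If the population size is unknown, this check is skipped (assumed fine).
    big_population = population_size is None or population_size >= 10 * n
    return {"large_sample": large_sample, "big_population": big_population}

print(check_conditions(922, 0.53))                      # the gun control poll
print(check_conditions(20, 0.1))                        # only 2 expected successes
print(check_conditions(100, 0.5, population_size=500))  # population too small
```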
What does the confidence interval tell us?
1. Range of plausible values for our population parameter, 2. the confidence level.
Conditions to check when doing a 2-sample z-test of proportions
1. Samples are random 2. Samples are sufficiently large 3. Samples are independent of each other & independent within samples.
A friend claims he can predict the suit of a card drawn from a special deck of 54 cards. There are 3 suits & equal # of cards in each suit. The parameter, p, is the probability of success, the null hypothesis is that the friend is just guessing. 1. What is the correct null hypothesis? 2. What hypothesis best fits the friend's claim?
1. p= 1/3 2. p> 1/3
To check if the sample size is large enough before applying the CLT for Sample Proportions, researchers can verify that the products of the sample size times the sample proportion & the sample size times (1 - sample proportion) are both greater than or equal to what #?
10
If the method has a ____ confidence level, that method always works. The interval must capture the population parameter's ____.
100%, true value
If the conditions of a survey sample satisfy those required by the CLT, then there is a 95% probability that a sample proportion will fall within how many SE's of the population proportion?
2 SE's
In the special case of categorical variables having only 2 categories, the test of homogeneity is identical to _________.
2-tailed z-test of 2 proportions
To conduct a hypothesis test for homogeneity or independence, what must the expected count in each cell be?
5 or more
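A sketch of how the expected counts and the degrees of freedom for a chi-square test could be computed. The observed table is made-up illustration data, and the function names are hypothetical. Each expected count is (row total × column total) / grand total.

```python
def expected_counts(table):
    """Expected cell counts under the null hypothesis of no association."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def degrees_of_freedom(table):
    """(# of rows - 1) * (# of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

observed = [[30, 50, 20],   # hypothetical counts, 2 rows x 3 columns
            [20, 40, 40]]
expected = expected_counts(observed)
df = degrees_of_freedom(observed)
condition_met = all(cell >= 5 for row in expected for cell in row)
print(expected, df, condition_met)
```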
What should be done to create a confidence interval for a population proportion?
Add & subtract the margin of error to/from the sample proportion.
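As a sketch, here is a 95% confidence interval built from the gun control poll used elsewhere in this set (527 in favor out of 922). It assumes the usual approach of substituting the sample proportion for p in the SE and multiplying by the Normal critical value (about 1.96) to get the margin of error. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Interval: sample proportion +/- margin of error."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)                  # p_hat stands in for p
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * se
    return p_hat - margin_of_error, p_hat + margin_of_error

low, high = proportion_confidence_interval(527, 922)
print(round(low, 3), round(high, 3))
```

The interval lies entirely above 0.53, which agrees with rejecting H0: p = 0.53 at the 0.05 level.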
Hypotheses
Always statements about population parameters, never about sample statistics.
In a hypothesis test, to determine association between 2 categorical variables, what can be said about the null hypothesis?
Always states that there is no association between the variables.
Hypothesis
An uncertain claim made in order to draw out & test its logical or empirical consequences.
Central Limit Theorem
As sample size increases, the distribution of sample means of size n, randomly selected, approaches a normal distribution.
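A rough simulation can illustrate the theorem. This sketch assumes a strongly skewed population (exponential with mean 1 and standard deviation 1) and a fixed random seed; both choices are arbitrary and only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

center = statistics.fmean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(n)
print(center, spread, 1 / n ** 0.5)
```

Even though the population is far from Normal, a histogram of `means` would look roughly bell-shaped, centered near 1.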
The value in the null hypothesis comes from what?
Assuming the status quo about the population.
What conditions regarding sample size must be met to apply the CLT for Sample Proportions?
Sample size is large enough that the sample expects at least 10 successes & 10 failures.
How is a sampling distribution's bias measured?
By computing the distance between the center of the sampling distribution & the population parameter.
p
Can represent either proportion or probability.
Sample
Collection of people or objects taken from the population of interest.
Null Hypothesis
Conservative, status quo, business-as-usual statement about a population parameter. Represents no change. Always gets the benefit of the doubt & is assumed to be true throughout the hypothesis-testing procedure.
A friend says he has ESP & can predict whether a coin flip will result in heads or tails. You test him, & he gets 10 right out of 20. Do you think he can predict the coin flip (or has a way of cheating)? Or could this just be something that occurs by chance?
He has not demonstrated ESP; 10 right out of 20 is only 50% right, which is expected from guessing.
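To put a number on "expected from guessing", the sketch below computes the exact binomial probability of getting 10 or more right out of 20 when each guess succeeds with probability 0.5. The helper name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of 10 or more correct calls in 20 flips by pure guessing.
p_value = binomial_upper_tail(10, 20, 0.5)
print(round(p_value, 3))
```

The probability is well above one half, so this result is not surprising at all under guessing.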
A small p-value does what?
Discredits the null hypothesis.
Population Proportion (%)
Does not move, it is always the same.
What is the reason the significance level of a test is not made arbitrarily small?
Doing so decreases the probability of correctly rejecting the null hypothesis.
Statistical Inference
Draws a conclusion about a population based on a sample
2-tailed Hypothesis
The p-value is the probability of getting a test statistic as far or farther from 0 than the observed value, in either direction.
What are controlled experiments?
Experimenters determine how subjects are assigned to treatment groups.
Hypothesis Testing
Formal procedure enabling us to choose between 2 hypotheses when we are uncertain about our measurements.
A study was conducted to determine if a drug was effective at helping jet lag. Subjects were randomly assigned either to 1 of 3 different doses (low, medium, high) or to a placebo, flown in a plane where they could not drink alcohol, coffee, or take sleeping pills.. then examined in a lab where their wakefulness was measured & classified into categories (low, normal, alert). If we test whether jet lag treatments are associated with wakefulness, are we doing an independence or homogeneity test?
Homogeneity because the passengers were randomly assigned to different groups
For 2 or more samples & 1 categorical response variable, to determine association between categorical variables, a _________ is used.
Homogeneity test
In a confidence interval, what info does the margin of error provide?
How far the estimate is likely to be from the population value, at most.
For 1 sample & 2 categorical response variables, to determine association between categorical variables, _______ is used.
Independence
What 2 tests can be used to determine association between categorical variables?
Independence & Homogeneity
Why is the Z statistic useful?
It compares our observed sample proportion to the null hypothesis value.
When a person stands trial for murder, the jury is instructed to assume that the defendant is innocent. Is this claim of innocence an example of a null hypothesis or of an alternative hypothesis?
It is a null hypothesis, since it is assumed to be true until evidence can prove otherwise.
A poll asked a random sample of people whether they thought the country was headed in the right direction. Each answered "Right Direction" or "Wrong Direction" & was classified as Republican, Democrat, or Independent. If we wanted to test if party affiliation was associated with the answer, would this test be homogeneity or of independence?
It would be a test of independence because there was only 1 sample.
In a hypothesis test, if the sign in the alternative hypothesis is less than (<), then the test is a _______.
Left-tailed test
1-tailed Hypothesis
The p-value is the probability of getting a test statistic less than or equal to the observed value (the left-tailed case).
Probabilities
Long-run frequencies.
N
N(mean, standard deviation). Designates a particular Normal distribution.
Bias
Measured as the distance between the mean value of the estimator (the sampling distribution's center) & the population parameter. A biased method has a tendency to produce an untrue value.
Confidence Level
Measures the capture rate for our method of finding confidence intervals. It can be changed by changing the margin of error.
Statisticians evaluate the ____ used for the survey, not the outcome of a single survey.
Method
Does the estimator's precision depend on the size of the population?
No
Parameter
Numerical value characterizing some aspect of the population, such as the proportion of successes.
No matter how many different samples we take, the value of p (population proportion) is always the same, but the value of ____ changes from sample to sample.
P hat
The ___ is a probability. Assuming the null hypothesis is true, the ___ is the probability that if the experiment were repeated, you would get a test statistic as or more extreme than the one you originally got. A small ___ suggests that a surprising outcome has occurred & discredits the null hypothesis.
P-value
When taking samples from a population & computing each sample's proportion, what value is always the same?
Population proportion
p0
Population proportion according to the null hypothesis.
Sampling Distribution
Probability distribution of a sample statistic when a sample is drawn from a population to make inferences about the population. Tells us how often we can expect to see certain values of our estimator, & reveals characteristics such as bias & precision. Reminds us that P hat is not just any random outcome; it is a statistic used to estimate a population parameter.
Significance Level
Probability of rejecting the null hypothesis when the null hypothesis is true.
Our confidence is in a ___ that produces confidence intervals, not in any particular interval. It is incorrect to say that a particular confidence interval has 95% (or any other %) chance of including the true population parameter. Instead, we say that the ___ that produces intervals captures the true population parameter with a 95% probability.
Process
Measurement bias
Questions asked do not produce a true answer
Confidence Interval
Range of plausible values for the population parameter, produced by a method that captures the "true" value at a set rate (the confidence level). Helps determine the validity of the sample statistic. It changes with every random sample collected.
If we decide at the last step that the observed outcome is extremely unusual under this assumption, then & only then do we ___ the null hypothesis.
Reject
Alternative Hypothesis
Research hypothesis. Usually a statement about a parameter's value that we hope to demonstrate is true.
Empirical Rule
Roughly 68% of observations should be within 1 standard deviation of the mean, 95% within 2, & nearly all within 3. The SE is the standard deviation for the sampling distribution.
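The percentages in the rule can be recovered from the standard Normal distribution. A minimal sketch, using a hypothetical helper `within`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Probability of landing within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
print({k: round(v, 4) for k, v in coverage.items()})
```

The value for 2 standard deviations is slightly above 95%, which is why "within 2 SE's" is treated as an approximation of 95%.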
A practically significant result has which pairs of attributes?
Statistically significant & meaningful.
What is an important difference between statistics & parameters?
Statistics (English letters) are knowable. Anytime we collect data, we can find the statistic's value. A parameter (Greek) is typically unknown.
Census
Survey that measures each population member.
Margin of Error
Tells how far from the population value our estimate can be. If it is made too small, the interval is more likely to miss the population value.
2-proportion z-test
Tests the null hypothesis H0: p1 − p2 = 0 by referring the statistic $z = \dfrac{\hat{p}_1 - \hat{p}_2}{SE_{\text{pooled}}(\hat{p}_1 - \hat{p}_2)}$ to a standard Normal model.
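A sketch of this test in code, assuming the usual pooled standard error sqrt(p̂(1 − p̂)(1/n1 + 1/n2)), where p̂ is the combined proportion of successes across both samples. The counts passed in are made-up illustration data and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed 2-proportion z-test with pooled SE; returns (z, p_value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)        # combined proportion under H0
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(round(z, 2), round(p_value, 3))
```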
To measure a survey's quality, what do statisticians evaluate?
The method used.
What does the confidence level measure?
The success rate of the method used to find confidence intervals, that is, how often its intervals capture the population parameter.
What is true in a hypothesis test, the farther the test statistic is from 0?
The more the null hypothesis is discredited.
If the sample is collected without replacement, what condition regarding the population must be met to apply the CLT for Sample Proportions?
The population size must be at least 10x bigger than the sample size.
A researcher is testing someone who claims to have ESP by having that person predict whether a coin will come up heads or tails. The null hypothesis is that the person is guessing & doesn't have ESP, & the population proportion of success is 0.50. The researcher tests the claim with a hypothesis test, using a significance level of 0.05. What is the conclusion?
The probability of concluding that the person has ESP when in fact she or he does not have ESP is 0.05.
In the expression "power of a hypothesis test," what does the term "power" refer to?
The probability of rejecting the null hypothesis when the null hypothesis is wrong.
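The three factors that power depends on (how wrong the null hypothesis is, the significance level, and the sample size) can be explored with a sketch. It assumes a right-tailed one-proportion z-test with Normal approximations under both the null and the true proportion; `power_right_tailed` and its parameters are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed(p0, p_true, n, alpha=0.05):
    """Approximate probability of rejecting H0: p = p0 when p is really p_true."""
    z = NormalDist()
    se_null = sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z.inv_cdf(1 - alpha) * se_null
    # Chance that the sample proportion exceeds the cutoff when p = p_true.
    return 1 - z.cdf((cutoff - p_true) / se_true)

print(power_right_tailed(0.5, 0.6, 100))              # baseline
print(power_right_tailed(0.5, 0.6, 400))              # larger sample
print(power_right_tailed(0.5, 0.6, 100, alpha=0.01))  # stricter significance level
print(power_right_tailed(0.5, 0.65, 100))             # null is more wrong
```

Shrinking the significance level lowers the power, which is why the significance level is not made arbitrarily small.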
In 2008, a country had a total of 175 government officials. 25 were female. For 2008, find a 95% confidence interval for the % of government officials who were female or explain why you should not find a confidence interval for the % of government officials who were female in 2008.
The proportion 25/175 is the population proportion, not a sample proportion. You should not find a confidence interval unless you have a sample & are making statements about the population from which the sample has been drawn.
In hypothesis testing, what does a negative test statistic mean?
The sample proportion was less than the assumed population proportion in the null hypothesis.
When applying the CLT for Sample Proportions, what can be substituted for p when calculating the SE if the value of p is unknown?
The sample proportion's value.
Suppose that, when taking a random sample of 6 from 151 women, you get a mean height of only 60 inches (5 feet). The procedure may have been biased. What else could have caused this small mean?
The small mean might have occurred by chance.
What is 1 drawback with chi-square tests?
The tests can reveal whether 2 variables are associated, but not HOW they are associated.
An ad claims a magnetized bracelet will reduce arthritis pain for those with arthritis. A medical researcher tests this claim with 233 arthritis sufferers randomly assigned to wear either a magnetized or a placebo bracelet. The researcher records the proportion of each group reporting relief from pain after 6 weeks. After analyzing the data, he fails to reject the null hypothesis. What are valid interpretations of his findings?
There's insufficient evidence that the magnetized bracelets are effective at reducing arthritis pain. There were no statistically significant differences between the magnetized bracelets & placebos in reducing arthritis pain.
With a 2-tailed test, if the test statistic (such as z) is far from 0, will the p-value be large (closer to 1) or small (closer to 0)?
The p-value will be small because a test statistic far from 0 is indicative of a very unlikely event.
When should the null hypothesis be rejected?
When the p-value is less than the significance level.
If the null hypothesis is true, then the Z statistic will be close to __. Therefore, the farther the Z statistic is from __, the more suspicious, or discredited, the null hypothesis becomes.
Zero
Surveys based on larger sample sizes have a _______, thus better precision. Increasing the sample size improves precision.
smaller standard error
Under the appropriate conditions, the sampling distribution of the z-statistic is approximately a _____.
standard Normal distribution