Stats Quiz 2
What is the z-score for our sample mean of 150, a population mean of 178, a sample size of 30, and a standard error of 1.46? Give your answer to 2 decimal places.
-19.17. Z = (xbar - μ)/SE, where SE (standard error) = 1.46, xbar (sample mean) = 150, and μ (population mean) = 178. Z = (150 - 178)/1.46 = -19.17 (this uses the unrounded SE, 8/sqrt(30), about 1.4606; dividing by the rounded 1.46 gives -19.18). This is a huge z-score in absolute value and is extremely unlikely. This supports our conclusion that we are sampling from a different population than US adult men.
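A minimal Python sketch of this calculation (standard library only; `z_score` is an illustrative name, not from the course materials). It also shows why the second decimal depends on whether the SE is rounded before dividing.

```python
import math

def z_score(sample_mean: float, pop_mean: float, se: float) -> float:
    """Z = (xbar - mu) / SE."""
    return (sample_mean - pop_mean) / se

se_exact = 8 / math.sqrt(30)                 # unrounded SE, about 1.4606
z_exact = z_score(150, 178, se_exact)        # about -19.17
z_from_rounded = z_score(150, 178, 1.46)     # about -19.18

print(f"{z_exact:.2f} {z_from_rounded:.2f}")
```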
In the lecture example for the assistant claim (slides 9 through 11), what is the p-value? HINT: you may need to read through the follow-up example, from slide 21 onward, to fully understand and apply the same concepts to the assistant claim example.
0.00016. The significance level is 0.05; because this is a two-sided test, it is divided between the two tails, leaving 0.025 in each. 1.96 is the critical value corresponding to 0.025 in each tail, and 2.6 is the calculated z-score (calculated z statistic). Since the calculated statistic is greater than the critical value (equivalently, the area beyond it in its tail is smaller than alpha/2), we can reject the null hypothesis in this example.
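The critical value approach from this example can be sketched in Python (standard library only; `statistics.NormalDist` supplies the standard normal quantile). The calculated statistic 2.6 is the one from the explanation; the sketch only checks the decision, not the p-value.

```python
from statistics import NormalDist

alpha = 0.05
# Two-sided test: alpha is split evenly, leaving alpha / 2 = 0.025 in each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
z_calc = 2.6                                   # calculated z statistic

reject_h0 = abs(z_calc) > z_crit
print(f"critical value = {z_crit:.2f}, |z| = {abs(z_calc)}, reject H0: {reject_h0}")
```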
What does a normal QQ plot look like?
In a normal QQ plot, normally distributed data should follow the reference (1:1) line fairly closely, forming a diagonal across the plot area. If the points show a pronounced curve, the data is likely non-normal.
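To see where the points on a QQ plot come from, here is a small Python sketch (standard library only; `qq_points` and the sample values are made up for illustration). It pairs each sorted, standardized observation with a standard normal quantile using plotting positions (i - 0.5)/n, which is one common convention; plotting software may use a slightly different one.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    n = len(data)
    ordered = sorted(data)
    # Plotting positions (i - 0.5) / n for i = 1..n
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical sample; standardizing it puts normal data near the 1:1 line y = x
sample = [4.1, 5.0, 5.3, 4.7, 6.2, 5.5, 4.9, 5.8, 5.1, 4.4]
m, s = mean(sample), stdev(sample)
points = qq_points([(x - m) / s for x in sample])
for q, x in points:
    print(f"{q:6.2f} {x:6.2f}")
```

Plotting these pairs should give a roughly straight diagonal for normal data; a systematic curve suggests skew or heavy tails.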
Using the state unemployment dataset, we want to compare unemployment rates between Oregon and Washington. We create QQ plots and find that the Washington data is non-normal, while the Oregon data is normal. Which test should we use to compare unemployment rates between the normally distributed Oregon sample and the non-normally distributed Washington sample?
Mann Whitney U Test. The correct answer is a Mann Whitney U Test (aka Wilcoxon Rank Sum test). The data in one of our samples is not normally distributed, so we cannot use a t-test (of any type); we must use a non-parametric test. The data is not paired: we are comparing counties in Oregon to counties in Washington. That eliminates the Wilcoxon Signed Rank test, which is a paired non-parametric test.
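The statistic behind the Mann Whitney U test can be sketched in a few lines of Python (standard library only; the function names and the county rates are hypothetical). It ranks both samples together, with tied values sharing their average rank, then measures how far the first sample's rank sum exceeds its minimum possible value. As far as we know, this matches the W statistic R's `wilcox.test` prints for two samples; the p-value step (exact or normal approximation) is left out.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for sample x: rank sum of x minus its smallest possible value n_x(n_x + 1)/2."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2

# Hypothetical unemployment rates (%) for a few counties in each state
wa = [4.5, 5.1, 6.3, 7.0, 5.8]
or_ = [3.9, 4.2, 4.8, 5.1]
print(mann_whitney_u(wa, or_), mann_whitney_u(or_, wa))
```

The two U values always add up to n_x * n_y, so either one carries the same information.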
T/F : A two-tailed test is more conservative than a one-tailed test. In other words, with the same significance level (alpha), it is harder to reject a null hypothesis in a two-tailed test than in a one-tailed test.
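True. In a two-tailed test, alpha is split between the two tails. Therefore, the test statistic has to be more extreme (larger in magnitude) to land in the rejection region, which on each side is smaller than in a one-tailed test because alpha is divided by two. For example, with alpha = 0.05 the critical z-value is 1.645 for a one-tailed test but +/- 1.96 for a two-tailed test.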
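A quick Python check of the one- vs two-tailed critical values at alpha = 0.05 (standard library only; the statistic 1.8 is a made-up value chosen to land between them).

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha / 2 in each tail
print(f"one-tailed: {one_tailed:.3f}, two-tailed: +/- {two_tailed:.3f}")

# A statistic of 1.8 is significant one-tailed but not two-tailed
z = 1.8
print(z > one_tailed, abs(z) > two_tailed)
```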
T/F : A fairly linear (though not perfect) QQ plot indicates that the data is normally distributed.
True. The data doesn't follow the line perfectly, but it fits well, especially in the middle of the plot, and there are no clear non-linear trends (e.g. a curved shape). For real-world data, this is close enough to normally distributed, so the correct answer is True.
T/F : The value of the calculated statistic determines the p-value.
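True. The p-value is the probability of observing a test statistic at least as extreme as the calculated one if the null hypothesis is true, so it is determined by the value of the calculated statistic (given the test distribution and whether the test is one- or two-tailed).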
T/F : The confidence level determines alpha (α).
True. For a confidence level of 95%, for example: 0.95 + alpha = 1 (total probability), so alpha = 1 - 0.95 = 0.05.
When would a confidence interval be used?
We don't know the population mean but want to estimate it using a sample. Explanation: Confidence intervals are used when we don't know the population mean but want to estimate it (within some uncertainty) using a sample. They give a lower and upper bound for the population mean at some level of confidence (e.g. 90%, 95%, 99%).
Select the true statements about two-sample t-tests:
A : The null hypothesis is that there is no difference between the sample means
B : We need to test for normality of both samples before proceeding with a two-sample t-test
C : A two-sample t-test compares the sample means of two independent samples
D : We don't care whether sample variances are equal or not
E : We need to test for variance equality before proceeding with a two-sample t-test
F : If the t-statistic is less than the critical value, we reject the null hypothesis
A, B, C, E
T/F : The smaller the p-value, the stronger our evidence for rejection is.
True
What is alpha (significance level) for a confidence level of 90%?
0.1. Alpha is 1 - 0.9 = 0.1, and it determines the rejection regions. Alpha is the probability of the test statistic falling in the rejection regions if the null hypothesis is true; in other words, it is the probability (area under the curve) beyond the critical values. This is the probability of committing a false positive (Type I error): rejecting a null hypothesis that is actually true. Remember, we make the decision based on likely and unlikely areas. Alpha is the probability of "unlikely".
68% of the time, the sample mean lies within +/- ___ standard errors of the true population mean.
1. About 68% of a normal distribution lies within +/- 1 SD of the mean. Remember: the standard error is essentially the standard deviation of sample means, so the empirical rule applies here as well.
The average US adult male is 178 cm with a standard deviation of 8. You measure the height of a group of 30 people. The sample mean height is 150. What is the standard error for the example above?
1.46. Standard error (SE) = sd/sqrt(n), with n = 30 (sample size) and sd = 8 (given), so SE = 8/sqrt(30) = 1.46.
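The same formula as a small Python function (standard library only; `standard_error` is an illustrative name). The loop also shows that quadrupling the sample size halves the standard error, because the SE shrinks with sqrt(n).

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

print(round(standard_error(8, 30), 2))   # the example: sd = 8, n = 30

# Larger samples give smaller standard errors
for n in (30, 120, 480):
    print(n, round(standard_error(8, n), 3))
```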
95% of the time, the sample mean lies within +/- ___ standard errors of the true population mean.
1.96. 95% of a normal distribution lies within +/- 1.96 (~2) standard deviations of the mean. Remember: the standard error is essentially the standard deviation of sample means, so the same rule applies to sample means around the population mean.
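These percentages come straight from the standard normal curve. A short Python sketch (standard library `statistics.NormalDist`; `coverage` is an illustrative name) computes the share of the curve within +/- k standard errors:

```python
from statistics import NormalDist

std_normal = NormalDist()

def coverage(k: float) -> float:
    """Probability that a standard normal value lies within +/- k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within +/- {k} SE: {coverage(k):.1%}")
```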
Using the t-table at the following link, what is the critical value for a two-tailed t-test with an alpha level of 0.01 and 22 degrees of freedom? https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf
2.819. In the table, look across the columns to find the one for 0.01 (two-tailed), which is third from the right, then go down the rows to df = 22. The value in that cell, 2.819, is our critical value. Using the critical value approach, if we calculate the test statistic (t-value) from a sample, we compare it to this critical value: if the absolute value (magnitude) of t is greater than 2.819, the result is significant (we can reject the null hypothesis).
Using a t-distribution table, what is the t-score (aka t-multiplier, critical value) we would use to estimate a 99% two-tailed confidence interval from a sample of size 23?
2.819 (accepted margin: +/- 0.005). For a 99% confidence interval, the alpha level is 0.01. For a two-tailed CI, that is the first column in the table (labeled 0.01 (two-tailed)). The degrees of freedom are n - 1 = 22, so we follow that column down to df = 22, where the value is 2.819.
What is the degrees of freedom for the t-distribution when we have a sample size of 23?
22
You survey a sample of 28 people. What would be the degrees of freedom (df) for a one sample t-test?
27
If we want to calculate a 95% confidence interval on a large sample (so we use the normal distribution to select a z-score) with a standard error of 5, what is the margin of error?
9.8. Margin of error = Z(alpha/2) * SE. For a 95% confidence level, Z(0.05/2) = 1.96, and the standard error is 5 (given), so the margin of error = 1.96 * 5 = 9.8. The confidence interval would then be xbar +/- 9.8, where xbar is the sample mean.
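A minimal Python version of the margin-of-error formula, assuming the large-sample normal approximation used in this question (standard library only; `margin_of_error` is an illustrative name). It computes Z(alpha/2) exactly rather than using the rounded 1.96, which makes no visible difference at one or two decimals.

```python
from statistics import NormalDist

def margin_of_error(se: float, confidence: float = 0.95) -> float:
    """Z(alpha/2) * SE, using the normal distribution (large samples)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * se

print(round(margin_of_error(5), 2))   # the example: SE = 5, 95% confidence
```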
When would we use a Mann Whitney U Test in place of a two-sample t-test?
A : When one or more of the samples is not normally distributed
B : Only when both of the samples are not normally distributed
C : When the variances are not equal
D : When we want to compare data taken at different times
A. A Mann Whitney U Test is a non-parametric test for differences between two samples. We use it much as we would a t-test, but unlike a t-test it works on non-normal data. If either (or both) of our samples is not normally distributed, we should use the Mann Whitney instead of a t-test.
We run a Shapiro-Wilk test on unemployment rates in New Jersey counties in R and get this output:
    Shapiro-Wilk normality test
    data: nj$Unemployment_rate_2018
    W = 0.81485, p-value = 0.001117
Select the true statement:
A : This is a significant p-value, indicating the data is not normally distributed
B : This is not a significant p-value, indicating the data is not normally distributed
C : This is not a significant p-value, indicating the data is normally distributed
D : This is a significant p-value, indicating the data is normally distributed
A. The null hypothesis for a Shapiro-Wilk test is that the data is normally distributed. The p-value (0.0011) is less than the alpha level (e.g. 0.05), so we reject the null hypothesis and tentatively accept the alternative hypothesis that the data is not normally distributed.
Below are the results of a Mann Whitney U Test comparing unemployment rates in Washington to unemployment rates in Oregon in 2018.
    Wilcoxon rank sum test with continuity correction
    data: washington$Unemployment_rate_2018 and oregon$Unemployment_rate_2018
    W = 985.5, p-value = 0.002673
    alternative hypothesis: true location shift is not equal to 0
Assuming an alpha level of 0.05, select the correct interpretation based on these test results:
A : Unemployment rates were significantly different in Oregon vs. Washington counties, but we do not know which had higher unemployment based on these results.
B : Unemployment rates were significantly higher in Oregon than in Washington.
C : Unemployment rates were significantly higher in Washington than in Oregon.
D : Unemployment rates were not significantly different between Oregon and Washington.
A. The p-value (0.0027) is less than the alpha level (0.05), so we can reject the null hypothesis that there is no difference in the distribution (or medians) between Oregon and Washington. Based on these results alone, we can't tell which is higher or lower; we only know that there's a significant difference. We would have to investigate further to find out which had higher unemployment in 2018.
We are interested in determining if the median household income of Vermont counties is different from the national average of $52807 at a 0.05 significance level. We collect the median household income of 14 counties in Vermont and run a One Sample T Test.
H0: The median household income of Vermont counties is not significantly different from the national average.
Ha: The median household income of Vermont counties is significantly different from the national average.
We get the following output. Select all that are correct.
    > t.test(vtCounties$Median_Household_Income, mu=52807)
    One Sample t-test
    data: vtCounties$Median_Household_Income
    t = 2.3653, df = 13, p-value = 0.03435
    alternative hypothesis: true mean is not equal to 52807
    95% confidence interval: 53272.79 - 63180.06
    sample estimates: mean of x = 58226.43
A : We can reject the null hypothesis because the p-value is smaller than 0.05.
B : DF is 13, meaning that there are 14 observations in our dataset.
C : The 95% confidence interval is [53272.79, 63180.06], which doesn't include the U.S. national average! This confirms our hypothesis test results too.
D : This is a one-tailed one-sample t test.
E : The median household income in Vermont counties is significantly different from the median household income in the U.S. (t=2.36; p = 0.034)
A, B, C, E. D is false: this is a two-tailed test! We are checking for differences in both directions (larger and smaller), and R confirms this with "alternative hypothesis: true mean is not equal to 52807".
Which are true about the CLT?
A : If we take repeated (large) samples, most of the sample means will be close to the population mean
B : If we take repeated samples, the sample means will be roughly normally distributed
C : The CLT states that a large, properly drawn sample will resemble the population from which it is drawn
D : All populations are normally distributed
A, B, and C. Explanation: Key aspects of the central limit theorem include that a large enough sample selected in an unbiased way will resemble the whole population, and that if we take repeated samples, the means of those samples will be approximately normally distributed, with most of them close to the true population mean. There is no guarantee that the population itself is normally distributed, only that if we sample it repeatedly, the sample means will be roughly normal.
We run a Wilcoxon signed rank test for unemployment rates of New Hampshire in 2016 and 2018. Below is the R output.
    Wilcoxon signed rank test with continuity correction
    data: nh$Unemployment_rate_2016 and nh$Unemployment_rate_2018
    V = 55, p-value = 0.005729
Select all that are correct:
A : The null hypothesis is that the unemployment rates of 2016 and 2018 are not different.
B : At a 0.05 significance level, we cannot reject the null hypothesis, meaning that the unemployment rates did not change between 2016 and 2018.
C : The Wilcoxon signed rank test is the non-parametric equivalent of the paired t-test, good for repeated measurements of the same subjects.
A, C. Yes, the Wilcoxon Signed Rank test is the non-parametric version of the paired t-test, for repeated observations. Note that it's different from the Wilcoxon Rank Sum test (the non-parametric equivalent of the two-sample t-test). The null hypothesis is that the distribution (or median) of unemployment rates didn't change. B is false: we can reject the null hypothesis at alpha = 0.05 because the p-value is smaller than 0.05 (0.0057 < 0.05).
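The V statistic in this output can be sketched in Python (standard library only; `signed_rank_v` and the data are hypothetical). It drops zero differences, ranks the absolute differences with ties sharing their average rank, and sums the ranks of the positive differences, which as far as we know matches how R's `wilcox.test` defines V for paired data. Because ties are detected with exact float comparison, real rates may need to be rounded first.

```python
def signed_rank_v(first, second):
    """Wilcoxon signed-rank V: sum of the ranks of |d| over pairs where d = first - second > 0."""
    diffs = [a - b for a, b in zip(first, second) if a != b]   # zero differences are dropped
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        # Tied absolute differences share the average of their 1-based positions
        positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
        return sum(positions) / len(positions)

    return sum(avg_rank(abs(d)) for d in diffs if d > 0)

# Hypothetical paired rates for the same four counties in two years
rate_2016 = [5, 6, 7, 8]
rate_2018 = [4, 4, 4, 9]
print(signed_rank_v(rate_2016, rate_2018))
```

If every nonzero difference is positive, V reaches its maximum n(n + 1)/2 for n nonzero pairs.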
Which are true about a t-distribution?
A : The t-distribution has heavier tails than the normal distribution
B : The t-distribution is not bell shaped like the normal distribution
C : We use a t-distribution instead of a normal distribution for small sample sizes
D : As sample size increases, the t-distribution begins to resemble the normal distribution more and more closely
E : As sample degrees of freedom increases, the t-distribution begins to resemble the normal distribution less and less
A, C, D The t-distribution can be thought of as a more conservative/wider version of the normal distribution. It is still bell shaped, but has heavier tails. We use it to estimate confidence intervals for small sample sizes (<30). The t-distribution changes based on the degrees of freedom, which is equal to n-1 where n is the number of observations in the sample. As the degrees of freedom (sample size) increases, the t-distribution becomes more and more like the normal distribution.
    F test to compare two variances
    data: nj$Unemployment_rate_2018 and md$Unemployment_rate_2018
    F = 1.0589, num df = 20, denom df = 23, p-value = 0.8879
    alternative hypothesis: true ratio of variances is not equal to 1
    95% confidence interval: 0.4493047 - 2.5625347
    sample estimates: ratio of variances = 1.058855
Select the correct statement:
A : The p-value was 1.0589
B : The critical value for the F-statistic with degrees of freedom 20 and 23, and confidence level 95%, is greater than 1.0589
C : We can assume that the ratio of variances is not equal to 1
B. In the R output, the p-value is 0.8879, which is not significant at an alpha level of 0.05 (i.e. a confidence level of 95%). Because the result is not significant, we cannot reject the null hypothesis that the ratio of variances is equal to 1 (the same as saying we cannot reject the null hypothesis that the variances are equal to each other). The correct choice is: "The critical value for the F-statistic with degrees of freedom 20 and 23, and confidence level 95%, is greater than 1.0589." We know our result was not significant (based on the p-value), so if we had used the critical value approach, our test statistic (1.0589) would be smaller than the (upper) critical value for the given confidence level.
Which of the following is not true about the alpha level?
A : It represents the probability of rejecting a true null hypothesis
B : It represents the probability of a false negative
C : It is also called the significance level
D : It represents the probability of a type I error
B: It represents the probability of a false negative. The alpha level represents the probability of a false positive, also called a Type I error: rejecting a null hypothesis that is true. It is the probability that we run a statistical test, the result is "significant" and the null hypothesis is rejected, when in fact the null hypothesis should not have been rejected. The alpha level is also called the significance level. It is typically set to 0.05 or 0.01 (5% or 1%), although there is ongoing debate about this in different domains (health, psychology, physics). Imagine a normal distribution curve (bell shape). We expect most results to fall within the tallest part of the curve (the middle of the bell). For a two-tailed test, an alpha level of 0.05 represents 5% of the area under this curve, split between the low and high ends (2.5% on each side). If our test result falls in this fringe area, we can reject the null hypothesis. If it falls in the meaty part of the curve, we cannot. This is confusing but important! Review the slides and readings, and come to office hours with Morteza or Kylen if you have questions!
Select which of the following statements are true about the F-test for variance between two samples:
A : A p-value smaller than alpha means that the two variances are equal
B : It functions by calculating the ratio of variances and then comparing it to an F-statistic table
C : We need to run it before performing a two-sample t-test to determine which type of t-test we should use
D : We don't need to know the sample sizes to use it effectively
B, C. The F-test for variance equality helps us determine whether the variances of two samples are significantly different. It works by calculating the ratio between the two sample variances, then comparing the result to an F-statistic table based on the degrees of freedom of the two samples (n - 1 for each sample). The size of each sample is an important component (so we can calculate the degrees of freedom), so the answer suggesting we don't need the sizes is incorrect. The null hypothesis is that the variances are equal, so if we get a p-value less than the alpha level, we reject the null hypothesis and say that the variances are significantly different. Finally, we run the F-test before a two-sample t-test to determine which type of t-test we should run (the test differs slightly depending on whether the variances are equal).
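The F statistic itself is just a ratio of sample variances, as this Python sketch shows (standard library only; `f_ratio` and the samples are hypothetical). The standard library has no F distribution, so the comparison to a critical value or p-value would still come from an F table or a function such as R's `var.test`, which produces the "F test to compare two variances" output shown in this quiz.

```python
from statistics import variance

def f_ratio(sample_1, sample_2):
    """Return (F, numerator df, denominator df) for an F-test of equal variances."""
    f = variance(sample_1) / variance(sample_2)   # ratio of sample variances
    return f, len(sample_1) - 1, len(sample_2) - 1

f, df_num, df_den = f_ratio([1, 2, 3, 4], [2, 4, 6, 8])
print(f"F = {f:.2f}, num df = {df_num}, denom df = {df_den}")
```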
Select all that are correct:
A : A small test statistic indicates that the null hypothesis is likely false
B : The p-value represents the probability that, if the null hypothesis were true, we would observe a test statistic as large as (or larger than) the one we did
C : Using a critical value approach, if our test statistic is more extreme than the critical value, we reject the null hypothesis
D : In the p-value approach, if our p-value is smaller than the alpha level, we reject the null hypothesis
E : A large p-value indicates that the null hypothesis is likely false
B, C, D. This question focuses on the difference between the p-value and critical value approaches. In the critical value approach, we calculate a test statistic (e.g. a z-score) and compare it to a pre-determined "critical value" (e.g. 1.96). If the test statistic is more extreme (greater in magnitude) than the critical value, we reject the null hypothesis. On the flip side, a small test statistic generally points towards not rejecting the null hypothesis. In the p-value approach, we use the test statistic to calculate a probability called a p-value: the probability that, if the null hypothesis were true, we would observe a test statistic at least as extreme as the one we observed. So a very small p-value means our result would be unlikely if the null hypothesis were true, which is evidence against it. We compare the p-value to our alpha level (e.g. 0.05); if it is smaller, we reject the null hypothesis.
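Both decision rules can be written side by side in Python for a two-sided z-test (standard library only; `two_sided_decisions` and the statistics 2.3 and 1.5 are illustrative). For the same statistic and alpha, the two approaches reach the same decision.

```python
from statistics import NormalDist

def two_sided_decisions(z: float, alpha: float = 0.05):
    """Return (p-value, reject by critical value, reject by p-value) for a two-sided z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # critical value approach
    p_value = 2 * (1 - std_normal.cdf(abs(z)))        # p-value approach (both tails)
    return p_value, abs(z) > z_crit, p_value < alpha

p, reject_cv, reject_p = two_sided_decisions(2.3)
print(f"p = {p:.4f}, reject (critical value): {reject_cv}, reject (p-value): {reject_p}")
```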
Why shouldn't we use a Mann Whitney U Test all the time instead of a t-test?
Because a t-test is more accurate (more powerful) for normally distributed data. If the data are normally distributed, we should use a t-test: when its assumptions (i.e. normal distribution) are met, it is better at detecting a real difference than a non-parametric test like the Mann Whitney U Test (aka Wilcoxon Rank Sum). Remember that the Wilcoxon Signed Rank test is the paired counterpart of the Mann Whitney/Wilcoxon Rank Sum, so there are paired non-parametric tests too.
Using the same unemployment data, we run an F-test for variance equality to compare variances between Maryland and New Jersey unemployment. Below is the R output.
    F test to compare two variances
    data: nj$Unemployment_rate_2018 and md$Unemployment_rate_2018
    F = 1.0589, num df = 20, denom df = 23, p-value = 0.8879
    alternative hypothesis: true ratio of variances is not equal to 1
    95% confidence interval: 0.4493047 - 2.5625347
    sample estimates: ratio of variances = 1.058855
Select the true statement:
A : The test is inconclusive
B : The variances of these two samples are significantly different
C : We can assume that the variances of these two samples are equal
C
Below is a list of possible null hypotheses. Select the one that is not a valid null hypothesis statement.
A : The average height of the population is 167cm
B : There is no relationship between the treatment and disease recovery rates
C : The average weight of the population is less than 160 lbs
D : Vegetation on south-facing slopes does not show higher growth rates than vegetation on north-facing slopes
C. The null hypothesis is the statement we evaluate to see whether the evidence supports rejecting it. It is a statement of no difference or no effect. The height statement is valid because we can investigate whether the evidence supports rejecting the hypothesis that the average height is 167cm. The treatment and disease recovery example is valid because it is a statement of no relationship, as is the north- vs south-facing vegetation example. The weight example is not a good null hypothesis because it is a statement of difference: that the weight is less than 160 lbs. This would be a good alternative hypothesis, but the null hypothesis would be: The average weight of the population is 160 lbs.
T/F : If we can reject the null hypothesis, we can fully accept the alternative hypothesis
False. It is much easier to reject the null hypothesis than to fully accept the alternative. We only need one solid piece of evidence to reject the null hypothesis, but to accept the alternative we would have to do many more tests to essentially build an alternative (unknown) distribution. Rejecting the null is not proof of the alternative.
A one-sample t-test is used to...
Determine whether a sample mean is significantly different from a known value. A one-sample t-test compares a sample mean against a known/hypothesized value for the mean. It does not tell us the true population mean, only whether our calculated sample mean is significantly different from some known value. For example, if we sample 100 people and measure their heights, we can compare the mean of our sample (say 5'7") to the known average height of the population (say 5'9"). Is our mean significantly different? If so, then the "known" value may be wrong and/or we are sampling from a different population than the one that average height was calculated for. To compare two samples, we can use a two-sample t-test. We'll learn more about that soon.
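A minimal Python sketch of the one-sample t statistic (standard library only; `one_sample_t` and the six heights in inches are made up). The standard library has no t distribution, so the result is compared to a t-table critical value; 2.571 is the two-tailed 0.05 value for df = 5.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """t = (xbar - mu) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)   # estimated standard error of the mean
    return (mean(sample) - mu) / se, n - 1

heights = [66, 68, 67, 65, 69, 67]      # hypothetical sample, in inches
t, df = one_sample_t(heights, 69)       # compare to a "known" mean of 69 in (5'9")
t_crit = 2.571                          # t-table, two-tailed alpha = 0.05, df = 5
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > t_crit}")
```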
T/F : If the obtained p-value is greater than alpha (significance level), we reject the null hypothesis.
False
T/F : The larger the sample size, the larger the standard error
False
T/F : The average US adult male is 178 cm with a standard deviation of 8. You measure the height of a group of 30 people. The sample mean height is 150. True or false: it is likely that these are US adult men.
False. Explanation: Think back to the bus example. If we are actually sampling from a population with a mean of 178, we expect the sample mean to be close to that. A sample mean of 150 indicates that the mean of the population we're sampling from is probably not 178, so we might be sampling from a group of teenagers or the like. Let's answer quantitatively: Z = (150 - 178)/(8/sqrt(30)) = -19.17. This means that if our population mean is in fact 178, our sample mean of 150 is more than 19 standard errors away from the population mean! That's extremely unlikely. Remember, 99.7% of all sample means should be within 3 standard errors of the population mean.
Say we are testing to see if the home sale prices in Boulder in 2019 were significantly different from the known average in Colorado as a whole. We want to use an alpha-level of 0.05. We run a one-sample t-test and calculate a t=2.12 and p=0.034. Select the correct summary statement:
The home prices in Boulder were significantly different from the home prices in the entire state of Colorado (t=2.12, p=0.034) We can find our answer looking at the p-value relative to the alpha level. Since p < alpha, we know that we can reject the null hypothesis, which in a t-test is that there is no significant difference between the sample mean and the known value we're comparing to (in this case home prices in all of Colorado). We reject that null hypothesis, so our summary statement is that there is a significant difference between Boulder home prices and the known value: The home prices in Boulder were significantly different from the home prices in the entire state of Colorado (t=2.12, p=0.034)
T/F : If (the absolute value) of the observed statistic is greater than the critical value, we reject the null hypothesis.
True
A national survey of 6,000 men (sample size) aged 50 to 64 found that 22% had participated in binge drinking within the last month. Determine the 95% confidence interval for the proportion of ALL American men aged 50 to 64 who binge drank within the last month. HINT: You will need the formula for calculating confidence interval for "proportions". SD (and SE) are calculated using a different formula for proportions. Refer to our lecture slides!
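[21%, 23%]. This sample clearly meets the requirements for calculating a confidence interval for the population proportion (n*p = 1320 and n*(1 - p) = 4680 are both large). As covered in the lecture, the confidence interval for proportions follows a slightly different formula. We can use the Z distribution for proportions, and since it's a 95% confidence interval, Z(alpha/2) = 1.96. Here p = 0.22 and 1 - p = 0.78, so SE = sqrt(p(1 - p)/n) = sqrt(0.22 * 0.78 / 6000), about 0.00535, and the margin of error = 1.96 * 0.00535, about 0.0105. The interval is 0.22 +/- 0.0105, about [0.2095, 0.2305], or [21%, 23%].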
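The proportion interval as a short Python sketch, assuming the normal approximation from the lecture formula (standard library only; `proportion_ci` is an illustrative name).

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, confidence: float = 0.95):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = proportion_ci(0.22, 6000)   # 22% of 6,000 men in the survey
print(f"[{low:.1%}, {high:.1%}]")
```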
We want to estimate how long our customers wait. We observe 23 customers and find that their average wait time is 5 minutes, with a standard deviation of 0.5 minutes. What is the 99% confidence interval for our wait times?
[4.71, 5.29]. Margin of error = t(alpha/2) * SE = t(alpha/2) * sd/sqrt(n). For a 99% confidence level and 22 degrees of freedom (23 - 1), the critical value (multiplier) from the t-distribution table is t(0.01/2) = 2.82 (2.819, rounded). So the margin of error = 2.82 * 0.5/sqrt(23) = 0.29, and the 99% confidence interval is X̄ +/- 0.29, where X̄ is the sample mean: [5 - 0.29, 5 + 0.29], or [4.71, 5.29]. We are 99% confident that the interval [4.71, 5.29] includes the population mean (the mean wait time of ALL customers). If we wanted the 95% confidence interval, t(0.05/2) = 2.074 (for df = n - 1 = 23 - 1 = 22), so the margin of error = 2.074 * 0.5/sqrt(23) = 0.22 and the 95% confidence interval is [4.78, 5.22].
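The interval arithmetic in this answer as a short Python sketch (standard library only; `t_interval` is an illustrative name). The critical values 2.819 (99%) and 2.074 (95%) are the df = 22 t-table values used in this quiz, passed in by hand because the standard library has no t distribution.

```python
import math

def t_interval(xbar: float, sd: float, n: int, t_crit: float):
    """xbar +/- t_crit * sd / sqrt(n); t_crit comes from a t-table with df = n - 1."""
    margin = t_crit * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

print([round(v, 2) for v in t_interval(5, 0.5, 23, 2.819)])   # 99% interval
print([round(v, 2) for v in t_interval(5, 0.5, 23, 2.074)])   # 95% interval
```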
Type I Error
A false positive: the incorrect rejection of a null hypothesis. A false positive means the test was positive (a relationship/significance was found) but should have been negative (in reality there was no significance or relationship).
Central Limit Theorem
A large, properly drawn sample will resemble the population from which it's drawn
- The larger the sample, the closer the match
- We can make inferences about populations without studying the whole population!
- With many samples, the sample means are approximately normally distributed around the true population mean
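A small simulation illustrates the last point (Python standard library only; the exponential population with mean 10, the sample size of 50, and the 2,000 repetitions are arbitrary choices). Even though the population is strongly right-skewed, the sample means cluster around the population mean with a spread close to the standard error.

```python
import random
from statistics import mean, stdev

random.seed(1)                 # fixed seed so the run is reproducible
population_mean = 10           # exponential population: right-skewed, not normal
sample_size = 50

sample_means = [
    mean([random.expovariate(1 / population_mean) for _ in range(sample_size)])
    for _ in range(2000)
]

print(f"mean of sample means: {mean(sample_means):.2f}")  # should be near 10
print(f"SD of sample means:   {stdev(sample_means):.2f}")  # should be near 10 / sqrt(50), about 1.41
```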
When do we use a one-tailed test? Select all that apply.
A : When you consider the consequences of missing an effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical.
B : When you have good reason to expect that the difference will be in a particular direction.
C : When we only want to know if something is significantly lower or higher than a reference, not both.
D : When we have prior knowledge of the direction of the difference.
All correct (A, B, C, D)
In hypothesis testing, what represents the probability of observing the test statistic in the rejection region, provided that the null hypothesis is true?
alpha
In hypothesis testing, a __________ is a point on the test distribution that delineates the rejection regions corresponding to the significance level (alpha). We can compare the (calculated) test statistic to this point to determine whether to reject the null hypothesis.
critical value
Type II Error
A false negative: failing to reject a null hypothesis that is actually false. A false negative means the test was negative (we didn't find a relationship or significance) but should have been positive (in reality, a relationship/significance exists).
What does standard error measure?
measures the dispersion of (multiple) sample means (around the population mean)
What does standard deviation measure?
measures the dispersion of observations around the sample mean, within one sample
A random sample of 50 students at one school was obtained and each selected student was given an IQ test. These data were used to construct a 95% confidence interval of [96.656, 106.422]. The correct interpretation of this confidence interval is...
that we are 95% confident that the mean IQ score in the population of all students at this school is between 96.656 and 106.422.