Midterm 2
Test statistic
- is based on the statistic that estimates the parameter in the hypotheses - is a measure of the compatibility between the null hypothesis and our sampled data.
Statistical significance
- only indicates whether the observed effect is likely to be due purely to chance because of random sampling - may not be practically important. Statistical significance does not provide any information about the magnitude of the effect, only that there is one.
A hypothesis test
- provides a conclusion regarding rejection of H0, but does not provide a range of likely values for the population parameter - Note: these statements only apply to two-sided hypothesis tests.
significance level, α is
- the probability that H0 would be rejected if H0 is true. - The significance level is the level of risk of making the wrong decision when H0 is true. - When α = 0.05 is chosen, the data need to give strong enough evidence against H0 so that H0 is rejected no more than 5% of the time when H0 is true.
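The "rejected no more than 5% of the time when H0 is true" idea can be checked by simulation. This is a minimal sketch, assuming Normal data with a known σ and a two-sided one-sample z test; the function name and its default values are illustrative choices, not from the course.

```python
import numpy as np
from scipy import stats


def type1_error_rate(n=30, mu0=0.0, sigma=1.0, alpha=0.05, reps=20_000, seed=1):
    """Estimate how often a two-sided z test rejects H0: mu = mu0 when H0 is true."""
    rng = np.random.default_rng(seed)
    # reps samples of size n, all drawn from N(mu0, sigma), so H0 really is true
    samples = rng.normal(loc=mu0, scale=sigma, size=(reps, n))
    xbar = samples.mean(axis=1)
    z = (xbar - mu0) / (sigma / np.sqrt(n))
    pvals = 2 * stats.norm.cdf(-np.abs(z))
    # fraction of samples in which H0 is (wrongly) rejected
    return np.mean(pvals < alpha)


rate = type1_error_rate()
print(rate)  # close to alpha = 0.05
```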
Statistical significance
- will be found, by chance, if searched for among multiple experiments - only indicates whether the observed effect is likely to be due purely to chance because of random sampling - may not be practically important. It does not provide any information about the magnitude of the effect, only that there is one. - A significant effect could be too small to be relevant. With a large enough sample size, significance can be shown for tiny effects.
Let m denote the margin of error; then m = z*·σ/√n:
A higher confidence level C implies a larger m, thus less precision in the range of estimates. A larger sample size n implies a smaller m, thus more precision in the range of estimates. A smaller standard deviation, σ, implies a smaller m, thus more precision in the range of estimates (usually you can't change σ).
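A short sketch of the three effects on m = z*·σ/√n. The values σ = 10 and n = 25 are arbitrary illustration numbers, and `margin_of_error` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def margin_of_error(C, sigma, n):
    """m = z* * sigma / sqrt(n), where z* leaves area C between -z* and z*."""
    z_star = stats.norm.ppf(1 - (1 - C) / 2)
    return z_star * sigma / np.sqrt(n)


base = margin_of_error(0.95, sigma=10, n=25)
print(base)                                    # about 3.92
print(margin_of_error(0.99, sigma=10, n=25))   # higher C -> larger m
print(margin_of_error(0.95, sigma=10, n=100))  # larger n -> smaller m
print(margin_of_error(0.95, sigma=5, n=25))    # smaller sigma -> smaller m
```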
Question: is this conclusion reliable?
A. Yes. n·p0 = 52(0.5) = 26 ≥ 10 and n(1 − p0) = 52(0.5) = 26 ≥ 10, so the Normal approximation holds.
C. Explain the implication of constructing a conservative confidence interval in context.
Answer: A conservative confidence interval uses the smaller of n1 − 1 and n2 − 1 as the degrees of freedom, so t* is larger than necessary and the interval is wider than it needs to be; its actual confidence level is at least the stated level. In context, the interval is more likely to contain zero, so we are less likely to conclude that there is a difference between the STEM students and STEM TAs.
Subscripts are used to denote group membership
Measures for group 1: p1, n1, p̂1. Measures for group 2: p2, n2, p̂2.
Subscripts are used to denote group membership.
Measures for group 1: μ1, σ1, n1, x̄1, s1. Measures for group 2: μ2, σ2, n2, x̄2, s2.
Robustness
The inferential procedures based on the t distribution are valid when the population follows a Normal distribution. For data that are not exactly Normal, the results of the t procedures are approximately valid if: (1) the population is roughly similar to a Normal distribution AND (2) the sample size is large enough. The t procedures are robust to small deviations from Normality, meaning that the results will not be affected much when the sample size is large enough.
Matched pairs inference
To apply inferential methods to matched pairs data: (1) Create a new variable that records the difference between the two measurements for each individual or pair. Each subject's difference value is independent of other subjects'. (2) Conduct a one-sample t test on the difference variable, OR construct a one-sample t confidence interval for the true mean difference.
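The two steps can be sketched as follows. The before/after numbers are made up for illustration only. Computing the one-sample t test on the differences by hand gives the same t statistic and p-value as scipy's paired test, `scipy.stats.ttest_rel`.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for 8 subjects (illustrative numbers only)
before = np.array([12.1, 10.4, 11.8, 13.0, 9.9, 12.6, 11.2, 10.8])
after = np.array([12.9, 10.9, 12.0, 13.8, 10.5, 12.7, 12.0, 11.1])

# Step 1: one difference per subject
diff = after - before
n = len(diff)

# Step 2: one-sample t test on the differences (H0: mu_d = 0, two-sided)
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
p_two_sided = 2 * stats.t.cdf(-abs(t_stat), df=n - 1)

# ...or a 95% one-sample t interval for the true mean difference
t_star = stats.t.ppf(0.975, df=n - 1)
half_width = t_star * diff.std(ddof=1) / np.sqrt(n)
ci = (diff.mean() - half_width, diff.mean() + half_width)

# scipy's paired test works on after - before and gives the same t and p-value
res = stats.ttest_rel(after, before)
print(t_stat, p_two_sided, ci)
print(res.statistic, res.pvalue)
```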
6. A 90% confidence interval estimate for the population mean is (125.8, 142.6). If the same sample values are used to conduct the test of the hypotheses H0: μ = 145 vs. Ha: μ ≠ 145, which of the following is true regarding the p-value? Explain.
The p-value is less than 0.10. Since 145 is not in the 90% CI, it is not a plausible value of the mean at the 10% significance level, so we would reject the null hypothesis and the p-value would be < 0.10.
two-sample t test
When two independent SRSs are drawn from two distinct Normal populations:
Hypotheses: H0: μ1 − μ2 = 0 vs Ha: μ1 − μ2 <, >, or ≠ 0.
Test statistic: t = ((x̄1 − x̄2) − (μ1 − μ2))/√(s1²/n1 + s2²/n2), which under H0 equals (x̄1 − x̄2)/√(s1²/n1 + s2²/n2).
P-value: P(T ≤ t) for Ha: μ1 − μ2 < 0; P(T ≥ t) for Ha: μ1 − μ2 > 0; 2P(T ≤ −|t|) for Ha: μ1 − μ2 ≠ 0.
Note: T ∼ t(k), where k is the smaller of (n1 − 1, n2 − 1).
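A sketch of this test with the conservative degrees of freedom k = min(n1 − 1, n2 − 1). The data are hypothetical and `two_sample_t` is an illustrative helper, not a library function. For comparison, `scipy.stats.ttest_ind(..., equal_var=False)` computes the same t statistic but uses Welch's (larger) degrees of freedom, so its p-value is a little smaller.

```python
import numpy as np
from scipy import stats


def two_sample_t(x1, x2, alternative="two-sided"):
    """Two-sample t test of H0: mu1 - mu2 = 0 with conservative df k = min(n1-1, n2-1)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    se = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
    t = (x1.mean() - x2.mean()) / se
    k = min(n1 - 1, n2 - 1)
    if alternative == "less":
        p = stats.t.cdf(t, df=k)            # P(T <= t)
    elif alternative == "greater":
        p = stats.t.sf(t, df=k)             # P(T >= t)
    else:
        p = 2 * stats.t.cdf(-abs(t), df=k)  # 2P(T <= -|t|)
    return t, k, p


# Hypothetical data for two independent groups
group1 = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5]
group2 = [4.2, 5.0, 4.8, 5.6, 4.4, 5.2, 4.9]
t, k, p = two_sample_t(group1, group2)
print(t, k, p)
```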
6. Using your proportion of tails as an educated guess, how many spins are needed to construct a 95% confidence interval with a margin of error that is no bigger than 0.01?
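import numpy as np
import scipy.stats as stats
z_star = stats.norm.ppf(0.975)  # 95% confidence
m_star = 0.01
p_star = 0.53333333333          # educated guess: sample proportion of tails
n = (z_star / m_star)**2 * p_star*(1-p_star)
n = np.ceil(n)                  # always round up
print(n)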
sample size code
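import numpy as np
import scipy.stats as stats
z_star = stats.norm.ppf(0.975)  # assumes 95% confidence
m_star = 0.05
p_star = 42/52                  # guess for p from an earlier sample
n = (z_star / m_star)**2 * p_star*(1-p_star)
n = np.ceil(n)                  # always round up
print(n)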
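Question: What sample size is needed for a margin of error no more than 0.10 with 95% confidence?
import numpy as np
import scipy.stats as stats
z_star = stats.norm.ppf(0.975)  # 95% confidence
m_star = 0.10
p_star = 0.5                    # conservative guess when no estimate of p is available
# factor of 2: presumably the size of EACH group when estimating p1 - p2 with two equal-size samples
n = (z_star / m_star)**2 * p_star*(1-p_star) * 2
n = np.ceil(n)
print(n)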
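A random sample of banks will be selected to estimate the mean LTDR. The standard deviation of LTDR for all banks is known to be 12.3. Question: If an acceptable margin of error is 3 or less, how many banks should be surveyed?
import numpy as np
import scipy.stats as stats
z_star = stats.norm.ppf(0.975)  # assumes 95% confidence, as in the other bank questions
sigma = 12.3
m_star = 3
n = (z_star * sigma / m_star)**2
n = np.ceil(n)                  # always round up
print(n)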
Do engineering UVa students study more per week than other UVa students, on average? Question: What is the appropriate procedure?
Two-sample t test for means, not matched pairs: there is no one-to-one pairing, so the two samples are independent.
Do sprinters have faster 100m times after switching to a newly assigned track shoe? Question: What is the appropriate procedure?
matched pairs t for means
Is there a difference in IQ scores between twins, on average? Question: What is the appropriate procedure?
matched pairs t test, natural matching / pairing up
Draw a simple random sample of 35 coins using Python. Take a screenshot that shows the first few selected coin ages and your code.
my_n = 35
samp_coins = coins.sample(n=my_n)  # coins: DataFrame of coin ages loaded earlier (not shown)
print(samp_coins)
In a given situation (clinical trial for new drugs, manufacturing specifications, etc.), the margin of error may need to be limited to a certain value. Let m* denote the intended margin of error.
m* = z*·σ/√n ⟺ n = (z*·σ/m*)²; always round n up.
For the approximation to be valid
n should at least be large enough that np≥10 and n(1−p)≥10
Estimating an unknown σ
s² = Σ(xi − x̄)²/(n − 1) is an unbiased estimate of σ². When σ is unknown, the sample standard deviation, s, is used to estimate σ.
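A quick simulation suggests why the divisor is n − 1: averaged over many samples, dividing by n − 1 lands near σ², while dividing by n comes out too small by a factor of (n − 1)/n. A sketch, assuming a Normal population with σ = 2 and samples of size 5 (arbitrary choices); in NumPy the divisor is controlled by `ddof`.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 5                      # population variance is sigma**2 = 4
samples = rng.normal(0.0, sigma, size=(100_000, n))

avg_s2_n_minus_1 = samples.var(axis=1, ddof=1).mean()  # divide by n - 1
avg_s2_n = samples.var(axis=1, ddof=0).mean()          # divide by n
print(avg_s2_n_minus_1)  # close to 4
print(avg_s2_n)          # close to 4 * (n - 1) / n = 3.2
```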
E. Calculate the standard deviation of all of the sample means determined in activity A.
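import numpy as np
# the ten sample means from activity A (separated by commas, not +, so each mean is its own element)
sample_t = np.array([84, 86.8, 87.5, 91.6, 86, 82.8, 87.6, 81.6, 86.6, 84.8])
sd_means = np.std(sample_t)  # np.std uses ddof=0 by default; pass ddof=1 for the sample standard deviation
sd_means = round(sd_means, 4)
print(sd_means)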
3C. What is the variable of interest to explore the average increase in time due to the Stroop effect? Explain.
For each participant, second-round time minus first-round time (a positive value is an increase in time). The parameter of interest is the average (mean) of these differences.
Whether a one-sided or a two-sided test
should be used depends on the question being asked and what is known about the problem before a test of statistical significance is performed.
Do UVa student athletes have higher GPAs than non-student athletes on average? Question: What is the appropriate procedure?
two sample t test for means: one quantitative and one categorical / binary variable
Is there a difference in opinion regarding legalizing marijuana between Greek and non-Greek UVa students? Question: What is the appropriate procedure?
two sample z test for proportions
Question: Should the hypotheses be one-sided or two-sided?
two-sided
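Suppose that a hypothesis test is carried out at a 5% significance level with hypotheses H0: μ = 115 vs. Ha: μ > 115. The p-value associated with this hypothesis test is 0.012. Question: Based on the given information, which of the following 95% confidence intervals for the population mean is possible?
p = 0.012 < 0.05, so reject H0. The matching two-sided p-value is 2(0.012) = 0.024 < 0.05, so a 95% CI must exclude 115, and because the evidence points to μ > 115 the interval must lie entirely above 115: (115.8, 117.6)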
to reduce variability
use a larger sample
A test of statistical significance (hypothesis test)
uses sample data to assess the evidence against a specific hypothesis
If a few random samples are taken from the same population, the numerical summaries that we would calculate for each sample would
- vary from sample to sample - not be equal to the corresponding numerical summary calculated from the entire population
If there is truly no association between cell phone use and brain cancer occurrence and 20 tests are conducted each with α= 0.05
we expect to see a significant result from 1 of these tests, purely by chance
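The arithmetic behind "1 of these tests": the expected number of false positives is 20 × 0.05 = 1. If we additionally assume the 20 tests are independent (an assumption, not stated above), the chance of at least one significant result is 1 − 0.95²⁰.

```python
alpha = 0.05
m = 20  # number of tests, all with H0 true

expected_false_positives = m * alpha   # 20 * 0.05 = 1
p_at_least_one = 1 - (1 - alpha) ** m  # assumes the tests are independent
print(expected_false_positives)  # 1.0
print(round(p_at_least_one, 3))  # 0.642
```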
Recall the bone density study measuring the grams of mineral density before and after exercise for 35 subjects. A test will be conducted with 5% significance and σ = 2. Question: What type of test should be conducted for this experiment?
Because σ = 2 is known, a matched pairs z test on the differences. If σ were unknown (most of the time), a matched pairs t test.
The population mean and standard deviation were found to be $405.17 and $210.59, respectively. Suppose that we draw repeated samples of size 25 from this population. Question: What do we expect the sampling distribution for the sample mean to be?
x̄ is approximately N(405.17, 210.59/√25) = N(405.17, 42.12)
Suppose that we instead draw repeated samples of size 50 from this population. Question: How would the expected sampling distribution change?
x̄ is approximately N(405.17, 210.59/√50) ≈ N(405.17, 29.78): same center, smaller spread
Question: What is a reasonable sample statistic forμ1−μ2?
x̄1 − x̄2
The level C confidence interval for the difference between two means of independent Normal populations is
(x̄1 − x̄2) ± t*·√(s1²/n1 + s2²/n2), where t* is the value from the t(k) curve with area C between −t* and t*, and k is the smaller of (n1 − 1, n2 − 1). This approximation is again a conservative approach.
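The interval can be sketched in a few lines. The two groups are hypothetical data and `two_sample_t_ci` is an illustrative name; raising C raises t* and widens the interval.

```python
import numpy as np
from scipy import stats


def two_sample_t_ci(x1, x2, C=0.95):
    """Level-C CI for mu1 - mu2 using conservative df k = min(n1-1, n2-1)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    k = min(n1 - 1, n2 - 1)
    t_star = stats.t.ppf(1 - (1 - C) / 2, df=k)  # area C between -t* and t*
    se = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
    diff = x1.mean() - x2.mean()
    return diff - t_star * se, diff + t_star * se


# Hypothetical data for two independent groups
g1 = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5]
g2 = [4.2, 5.0, 4.8, 5.6, 4.4, 5.2, 4.9]
lo, hi = two_sample_t_ci(g1, g2)
print(lo, hi)
```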
type 1
α, Type I error - H0 is true but we reject it - false rejection of a true hypothesis - probability = α, the level of significance - reduced by: decreasing the level of significance - cause: luck or chance - happens: when acceptance levels are too lenient
type 2
β, Type II error - H0 is false but we fail to reject it - false acceptance of an incorrect hypothesis - probability = β - reduced by: increasing the level of significance (or the sample size) - cause: smaller sample size / less powerful test - happens: when acceptance levels are too strict
The hypothesis test uses the standard deviation of the sample proportion under the null hypothesis:
σ_p̂ = √(p0(1 − p0)/n)
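Neither the hypothesis test nor the confidence interval uses the standard deviation of the difference in sample proportions, because p1 and p2 are unknown (sample estimates are substituted instead):
σ_(p̂1 − p̂2) = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)
Neither the hypothesis test nor the confidence interval uses the standard deviation of the sample proportion, because p is unknown (the test substitutes p0 and the interval substitutes p̂):
σ_p̂ = √(p(1 − p)/n)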
Recall the bone density study measuring the grams of mineral density before and after exercise for 35 subjects. A test will be conducted with 5% significance and σ = 2. Question: Is this a matched pairs experiment?
Yes: each subject is measured both "before" and "after" exercise, so the two measurements are paired by subject.
cholesterol code
import numpy as np
import scipy.stats as stats
# create the needed data (chol: DataFrame loaded earlier, not shown)
level_red = chol.level_day2 - chol.level_day4
# known and sample values
n = 28
xbar = np.mean(level_red)
s = np.std(level_red, ddof=1)
# determine t*
t_star = stats.t.ppf(0.975, df=n-1)
t_star = round(t_star, 3)
print(t_star)
# determine CI
LL = xbar - t_star * s/np.sqrt(n)
UL = xbar + t_star * s/np.sqrt(n)
print(round(LL, 3), round(UL, 3))
4. Construct a 95% confidence interval to estimate the true proportion of spins that ultimately land on tails.
import numpy as np
import scipy.stats as stats
# phat and n come from the spin data (not shown here)
# determine z*
z_star = stats.norm.ppf(0.975)
z_star = round(z_star, 3)
print(z_star)
# determine confidence interval
LL = phat - z_star * np.sqrt(phat*(1-phat)/n)
UL = phat + z_star * np.sqrt(phat*(1-phat)/n)
print(round(LL,3))
print(round(UL,3))
Venmo code
import numpy as np
import scipy.stats as stats
# step 1: hypotheses
# H_0: p = 0.5
# H_a: p > 0.5
# alpha = 0.05
p0 = 0.5
# sample and known values
n = 52
X = 42
phat = X/n
# conditions for the test: n*p0 and n*(1-p0) should both be at least 10
print(n*p0, n*(1-p0))
# step 2: test statistic
test_stat = (phat - p0)/np.sqrt(p0*(1-p0)/n)
print(test_stat)
pval = 1 - stats.norm.cdf(test_stat)
print(pval)
# step 3: p ≈ 4.55e-06 (really small)
# step 4: since p < alpha, reject H_0
# step 5: based on the data provided, we have sufficient evidence to suggest that people prefer to have friends pay them back in rounded amounts
# perhaps: if you want to make more friends, pay back in rounded amounts
# conditions for the CI: n*phat and n*(1-phat) should both be at least 10
print(n*phat, n*(1-phat))
# determine z*
z_star = stats.norm.ppf(0.975)
z_star = round(z_star, 3)
print(z_star)
# determine confidence interval
LL = phat - z_star * np.sqrt(phat*(1-phat)/n)
UL = phat + z_star * np.sqrt(phat*(1-phat)/n)
print(round(LL,3))
print(round(UL,3))
P-values and significance level
- There is no sharp border between significant and insignificant, only increasingly strong evidence as the p-value decreases - A p-value is more informative than a reject-or-not conclusion using a fixed significance level - Example: knowing that the test is significant at the 1% level for a two-sided test vs. knowing that the p-value is 0.0000006 - Knowing the p-value allows assessment at any significance level.
A confidence interval
- indicates whether H0 will be rejected for a two-sided hypothesis test - also provides a range of likely values for the population parameter.
The general form of confidence intervals is estimate ± margin of error. The level C confidence interval for the difference between independent population proportions is
(p̂1 − p̂2) ± z*·√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
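A sketch of this interval. The counts (45 of 100 vs. 30 of 100) are hypothetical and `two_prop_ci` is an illustrative helper name.

```python
import numpy as np
from scipy import stats


def two_prop_ci(x1, n1, x2, n2, C=0.95):
    """(p1_hat - p2_hat) +/- z* * sqrt(p1_hat(1-p1_hat)/n1 + p2_hat(1-p2_hat)/n2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = stats.norm.ppf(1 - (1 - C) / 2)
    se = np.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts: 45 of 100 successes in group 1, 30 of 100 in group 2
lo, hi = two_prop_ci(45, 100, 30, 100)
print(round(lo, 3), round(hi, 3))  # 0.017 0.283
```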
Question: What statistic is used to estimate μ?
- x̄, the sample mean
Question: What is the estimate?
- the observed value of x̄ from our sample
Question: What is the confidence level multiplier for a 95% confidence interval?
- z* = 1.96, the value with area 0.95 between −z* and z* (stats.norm.ppf(0.975)), used in x̄ ± z*·σ/√n
A random sample of 110 banks is selected and the mean LTDR was found to be 76.7. The standard deviation of LTDR for all banks is known to be 12.3. Question: What is the 95% CI for the population mean LTDR based on this sample?
- x̄ = 76.7, σ = 12.3, n = 110 - x̄ ± z*(σ/√n) = 76.7 ± 1.96(12.3/√110) = 76.7 ± 2.30, about (74.40, 79.00)
Could random variation alone account for the difference between the null hypothesis and the result from a random sample?
- A small p-value implies that random variation alone is unlikely to account for the difference between H0 and the observation from our random sample. - With a small p-value, H0 is rejected, as there is evidence that the true characteristic of the population is different from what was stated in H0. - The smaller the p-value, the stronger the evidence against H0.
Confidence intervals and hypothesis tests
- Confidence intervals are generally used to estimate a range of plausible values for an unknown parameter. - Hypothesis tests are generally used to assess whether the value of a parameter is different from, greater than, or less than a specific value.
For a hypothesis test for one population mean, the following factors play a role in the magnitude of the test statistic:
- Difference between x̄ and μ0: bigger |x̄ − μ0| → larger |z| - Sample size: bigger n → larger |z| - Standard deviation: smaller σ → larger |z|
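Each factor can be seen directly in z = (x̄ − μ0)/(σ/√n). A sketch with arbitrary illustration numbers: starting from z = 1, doubling the difference, quadrupling n, or halving σ each doubles z.

```python
import numpy as np


def z_stat(xbar, mu0, sigma, n):
    """One-sample z statistic (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / np.sqrt(n))


base = z_stat(xbar=52, mu0=50, sigma=10, n=25)           # 1.0
bigger_diff = z_stat(xbar=54, mu0=50, sigma=10, n=25)    # 2.0
bigger_n = z_stat(xbar=52, mu0=50, sigma=10, n=100)      # 2.0
smaller_sigma = z_stat(xbar=52, mu0=50, sigma=5, n=25)   # 2.0
print(base, bigger_diff, bigger_n, smaller_sigma)
```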
Cautions regarding test conclusions
- Failing to reject H0 is a failure to observe sufficiently strong evidence against it. It does not mean that H0 is true. - A conclusion about H0 may be invalid if data production was poorly designed. - Rejecting H0 does not necessarily imply a practically significant departure from H0. The context of the data needs exploration with graphical and numerical summaries.
Both the hypothesis test and the confidence interval need the Normal approximation to the Binomial to hold. The following rules of thumb are typically used:
- For the hypothesis test, both np0 and n(1 − p0) should be at least 10. - For the confidence interval, both np̂ and n(1 − p̂) should be at least 10, which is equivalent to seeing at least 10 successes and at least 10 failures in the sample. - If the rules of thumb are not met, the results of the one-sample z procedures are not valid. Alternative procedures are available.
When a SRS of size n is drawn from a population with unknown proportion p of successes
- Hypotheses: H0: p = p0 - Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) - P-value: P(Z ≤ z) for Ha: p < p0 - P(Z ≥ z) for Ha: p > p0 - 2P(Z ≤ −|z|) for Ha: p ≠ p0
The required strength of evidence depends mainly on:
- Plausibility of H0: stronger evidence is required to reject a notion more firmly established in the status quo - The consequences of rejecting H0: stronger evidence is required to favor a more radical change in thinking - Standard significance levels of α= 0.10,0.05,0.01 are often used purely by convention.
Notation
- Population proportion: p - Number of successes in the sample: X - Sample proportion: p̂ = X/n
p-value
- Tests of statistical significance quantify the chance of obtaining a particular result (or a more extreme result) from a random sample if the null hypothesis were true.
Approach to inference
- The parameter of interest for comparing two population means is their difference, μ1 − μ2. - The confidence interval will estimate μ1 − μ2. - The hypothesis test will use the null hypothesis H0: μ1 − μ2 = 0.
Either both or neither of these will be true for a given sample mean
- The test of Ha: μ ≠ μ0 will reject H0 at significance level α - μ0 is not contained in a (1 − α)100% confidence interval for μ.
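A sketch checking this equivalence with the bank LTDR numbers from these notes (x̄ = 76.7, σ = 12.3, n = 110), assuming a two-sided z test at α = 0.05 and the matching 95% z interval. The candidate μ0 values are arbitrary; for each one, "reject" and "outside the interval" agree.

```python
import numpy as np
from scipy import stats


def z_test_two_sided_p(xbar, mu0, sigma, n):
    z = (xbar - mu0) / (sigma / np.sqrt(n))
    return 2 * stats.norm.cdf(-abs(z))


def z_interval(xbar, sigma, n, C):
    m = stats.norm.ppf(1 - (1 - C) / 2) * sigma / np.sqrt(n)
    return xbar - m, xbar + m


xbar, sigma, n, alpha = 76.7, 12.3, 110, 0.05
lo, hi = z_interval(xbar, sigma, n, C=1 - alpha)   # about (74.40, 79.00)
results = []
for mu0 in [74.0, 75.0, 78.0, 79.5]:
    reject = z_test_two_sided_p(xbar, mu0, sigma, n) < alpha
    outside = not (lo <= mu0 <= hi)
    results.append((mu0, reject, outside))
    print(mu0, reject, outside)  # reject and outside always match
```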
In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign. Question: The sampling distribution for the sample mean IQ for 200 adults is
Approximately Normal with mean 112 and standard deviation 20/√200 ≈ 1.414
power of test
1 − β - complement of the Type II error probability - probability of rejecting H0 when it is false
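Power can be computed directly for a simple case. This sketch assumes a one-sided z test of H0: μ = μ0 vs Ha: μ > μ0 with known σ; `power_upper_z` and the numbers (μ0 = 50, true mean 53, σ = 10, n = 25) are illustrative. When the true mean equals μ0, the "power" is just α; a larger n raises power and lowers β.

```python
import numpy as np
from scipy import stats


def power_upper_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the z test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_a."""
    z_crit = stats.norm.ppf(1 - alpha)           # reject H0 when z >= z_crit
    shift = (mu_a - mu0) / (sigma / np.sqrt(n))  # average value of z when mu = mu_a
    return 1 - stats.norm.cdf(z_crit - shift)


power = power_upper_z(mu0=50, mu_a=53, sigma=10, n=25)
beta = 1 - power  # Type II error probability
print(power, beta)
```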
steps
1. H0: μ = 227, Ha: μ ≠ 227 2. test statistic (z = −2.0) 3. p-value (p = 0.0455) 4. decision (since p < significance level, we reject H0) 5. conclusion in context
General steps in hypothesis testing
1. State the null and alternative hypotheses 2. Calculate the test statistic 3. Obtain the p-value for our data 4. Make decision based on p-value and level of significance 5. State conclusion in context
The sampling distribution of x̄ is exactly/approximately N(μ, σ/√n). Question: In what cases is this true?
1.) exactly, when the population is Normally distributed 2.) approximately, when the sample size is large enough that the CLT holds
Question: What is the 95% confidence interval for the average reduction in cholesterol level for heart attack patients as more time passes (from two to four days) after their heart attack? Question: What is the 95% confidence interval for the distance jumped by frogs made of copy paper?
import numpy as np
import scipy.stats as stats
# 1.) hypotheses: H0: mu = 13 vs Ha: mu < 13 (jumps: DataFrame loaded earlier, not shown)
n = 105
mu0 = 13
# 2.) test statistic
xbar = np.mean(jumps.distance)
s = np.std(jumps.distance, ddof=1)
test_stat = (xbar - mu0) / (s/np.sqrt(n))
# 3.) p-value for Ha: mu < 13
pval = stats.t.cdf(test_stat, df=n-1)
# 4.) p-value = 0.9995 > alpha, so fail to reject H0
# Conclusion: the data do not provide sufficient evidence to state that the mean jump distance for copy paper frogs is less than 13 cm, on average
# 5.) t* for a 95% CI
t_star = stats.t.ppf(0.975, df=n-1)
t_star = round(t_star, 3)
# 6.) confidence interval
LL = xbar - t_star * s/np.sqrt(n)
UL = xbar + t_star * s/np.sqrt(n)
# Conclusion: we are 95% confident that the true mean triple jump distance for copy paper frogs is between 14.0 cm and 16.6 cm
Consider a two-sided hypothesis test with test statistic z Question: What is the p-value of this test?
2·P(Z > |z|) or 2·P(Z < −|z|); in Python: stats.norm.cdf(-abs(z)) * 2
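The same formula wrapped in a small helper (a hypothetical name, not a scipy function). Only |z| matters, so z = −2.0 and z = 2.0 give the same p-value.

```python
from scipy import stats


def two_sided_p(z):
    """Two-sided p-value 2 * P(Z <= -|z|) for a standard Normal test statistic."""
    return 2 * stats.norm.cdf(-abs(z))


print(round(two_sided_p(-2.0), 4))  # 0.0455
print(round(two_sided_p(2.0), 4))   # 0.0455: the sign of z does not matter
```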
A random sample of 110 banks is selected and the mean LTDR was found to be 76.7. The standard deviation of LTDR for all banks is known to be 12.3. Question: What is the 95% CI for the population mean LTDR if the sample size had been 25 instead? Assume that the two samples yield the same sample mean.
76.7 ± 1.96(12.3/√25) = 76.7 ± 4.82, about (71.88, 81.52)
import numpy as np
xbar, sigma, z_star = 76.7, 12.3, 1.96
n = 25
LL = xbar - z_star * sigma/np.sqrt(n)
UL = xbar + z_star * sigma/np.sqrt(n)
Smaller n = wider interval.
When the sampling distribution of x̄ is known to be exactly or approximately Normal
about 95% of all possible estimates (like x̄) will be within roughly 2 standard deviations of the sampling distribution (for x̄, 2·σ/√n) of the population parameter (like μ).
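A simulation sketch of this statement, reusing the IQ numbers from these notes (μ = 112, σ = 20, n = 200) and additionally assuming a Normal population. It draws many samples and counts how often x̄ lands within 1.96·σ/√n of μ.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 112, 20, 200, 20_000
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)  # one x-bar per sample
sd_xbar = sigma / np.sqrt(n)                                # about 1.414
within = np.mean(np.abs(xbars - mu) <= 1.96 * sd_xbar)
print(within)  # close to 0.95
```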
3A. What category of inference procedures would be appropriate for exploring the average increase in time due to the Stroop effect? Explain.
Answer: A t inferential procedure (a one-sample t procedure on the differences) would be the most appropriate for exploring the average increase in time due to the Stroop effect because we do not know the population standard deviation; we estimate it with the sample standard deviation s and use the standard error s/√n. The CLT applies because the sample size is large enough, so the sampling distribution of the mean should be close enough to Normal for the t procedure to be reliable.
5. Interpret the margin of error in context.
Answer: The margin of error is the largest distance we expect, at the stated confidence level, between our estimate and the true population value. In context, our 80% confidence interval has a margin of error of 0.663: the method used (estimate ± 0.663) produces intervals that capture the true population value in about 80% of repeated samples.
6B. Is it reasonable to conclude that this sample is only slightly different from the histogram of Normal data?
Answer: Yes. The values follow a t distribution, which has associated degrees of freedom and a slightly larger spread (heavier tails) than Normal data. The sample size is large enough, however, that the t distribution should be very close to the standard Normal distribution.
10B. In February 2013, two California residents filed a lawsuit against Anheuser-Busch alleging that the company was watering down beers to boost profits. Anheuser-Busch advertises that the alcoholic content is 5.0% and the alcoholic content of beer is typically Normally distributed. Suppose that the alcoholic content of 5 cans of beer is tested and the mean and standard deviation are found to be 4.8% and 0.11%, respectively.
Answer: As we don't know the population standard deviation, a t inferential procedure would be the most appropriate: we estimate σ with the sample standard deviation (0.11%) and use the standard error of the sample mean, s/√n. While the sample size is small (n = 5), the alcoholic content of beer is typically Normally distributed, so the t procedure should still be valid.
8. What is the conclusion in context?
Answer: Based on the sample we were given, we have sufficient evidence to state that the true mean age of all coins is above 2.5 years, and thus, the U.S. Mint should produce more coins in order to lower the average age closer to 2.5 years (the hypothesized value in H0).
7. What is the statistical conclusion of the test with a 1% level of significance? Explain.
Answer: Because our p-value is less than 1% (0.00000724176366 < 0.01), the null hypothesis is rejected: the data provide sufficient evidence for the alternative hypothesis.
10A. Excess cellulose in alfalfa that will be fed to dairy cows reduces the relative feed value of the product. If the cellulose content is too high, the price will be lower and the producer will have less profit. A cellulose content that is more than 140 mg/g is considered high. An agronomist examines the cellulose content of one type of alfalfa, which is Normally distributed and has a standard deviation of 8 mg/g. A sample of 40 cuttings has a mean cellulose content of 145 mg/g.
Answer: Because we know the population standard deviation and that the data is normally distributed, a Z inferential procedure would be most appropriate for this scenario. Moreover, the sample size is large enough that the CLT also applies and ensures that the test would be reliable.
D. For the sample mean to be an unbiased estimator of the population mean, a large sample size is needed.
Answer: False. The best way to get an unbiased estimate of the population mean is to use random sampling and select an unbiased statistic for the parameter of interest. A large sample size reduces variability.
B. The sample mean is a good estimator of the population mean because it is always equal to the unknown population mean.
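Answer: False. The sample mean varies from sample to sample and is almost never exactly equal to the unknown population mean. It is a good estimator because it is unbiased (with random sampling, its sampling distribution is centered at μ) and its variability decreases as the sample size increases. Biased samples can produce sample means far from the population mean.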
13. Consider the test statistic for the test for the population mean. How does increasing the sample size affect the evidence against the null hypothesis? Explain.
Answer: For a fixed difference x̄ − μ0, increasing the sample size increases the evidence against the null hypothesis. The test statistic is z = (x̄ − μ0)/(σ/√n), so its magnitude grows with √n; a larger |z| gives a smaller p-value, indicating less compatibility between the null hypothesis and the sampled data.
8.2. The population average age of coins is 3.972344. Did all of the intervals reported to the LTA contain the population average age? Explain.
Answer: No. Most of them did, but some did not: by random sampling variation, some groups drew samples with unusually new or unusually old coins. With 80% confidence intervals, we expect about 20% of the intervals to miss the population average age.
D. What are the appropriate hypotheses to test if there is a difference between the average time increase for current STEM students and STEM TAs? Be sure to specifically define all parameters.
Answer: The Null hypothesis is that the difference in time increase between current STEM students (μ1) and STEM TAs (μ2) is zero. The Alternative hypothesis is that the difference in time increase between current STEM students (μ1) and STEM TAs (μ2) is not equal to zero. Ho: μ1 - μ2 = 0 Ha: μ1 - μ2 ≠ 0
3.3. What would the conclusion of the appropriate test be with 1% significance? Explain.
Answer: The conclusion would be inconclusive: the possible p-value range (0 to 0.05) includes values both above and below the 1% significance level (0.01), so with the information available we cannot tell whether to reject or fail to reject the null hypothesis.
D. Do you think that the mean calculated in activity C. is greater than, less than, or equal to the mean year of the population, μ? Explain.
Answer: The mean calculated in activity C is less than the mean year of the population. We know this because the mean of all sample means is 85.93, and the mean of the population is 86.736. The mean of the sample means can be greater or less than μ by chance, especially when each sample is small (here n = 5) and only a few samples are taken. With larger samples (for example, n = 25) and more of them, the mean of the sample means would be much closer to the mean year of the population.
8.1. How would these intervals change if this activity was repeated building 95% confidence intervals instead of 80% confidence intervals?
Answer: The intervals would be wider (probably about 3.5 to 5.6) because a higher confidence level requires a larger multiplier.
3B. Will the results of these inference procedures be reliable with the information available thus far? Explain.
Answer: The results would be reliable with the information available thus far: even if the data are not exactly Normal, t procedures are approximately valid if the population is roughly similar to a Normal distribution and the sample size is large enough. The sample size should be large enough for the CLT to make the sampling distribution of the mean approximately Normal, so the t-test should be reliable.
7. Is the confidence interval constructed in activity 4 reliable? Explain.
Answer: Yes, it's reliable because t procedures are robust: we have a sufficiently large sample size and the population is roughly similar to a Normal distribution, which are the two conditions required for t procedures to be approximately valid when the data are not exactly Normal.
8.3. Would your answer to the previous question change if 95% confidence intervals had been built? Explain.
Answer: Yes, there would be fewer intervals that did not contain the population average. With a 95% confidence level, each interval would be wider, and we would expect about 95% of the intervals (rather than 80%) to contain the population average age.
7. Suppose that a hypothesis test for the hypotheses H0 : μ = 10 vs. Ha : μ < 10 is conducted with 5% significance and the associated p-value is 0.0564. Which of the following 95% confidence intervals for the population mean is possible? Explain.
Answer: Option 2. The p-value (5.64%) is greater than the level of significance (5%), so we fail to reject the null hypothesis that μ = 10; we can't conclude that a significant difference exists. The matching two-sided p-value is 2(0.0564) = 0.1128 > 0.05, so the 95% confidence interval must contain 10. This eliminates options 1 and 3, because they do not contain 10.
A. A lab safety officer plans to compare the durability of nitrile and latex gloves for chemical experiments. A random sample of 30 students in university labs is selected for the experiment. Each student will perform the same organic synthesis using the same procedure twice, once wearing each type of glove.
Answer: A matched pairs procedure is the most appropriate because each student is measured under both conditions (once with each type of glove), so the two sets of values are paired by student.
B. If a new process for copper mining is to be adopted, it must produce more than 50 tons of ore per day, which is larger than the current process produces.
Answer: A one sample t-test is most appropriate because there is only one population, and we are determining if this population produces more than 50 tons of ore per day.
D. What type of inference procedures is appropriate for this analysis?
Answer: A one-sample z procedure for a proportion would be appropriate for this analysis because we have a random sample of 60 drawn from a population with an unknown proportion of successes (p).
C. Students who want to learn French are divided into groups. One of the groups is flown to France where they live for one month. The other group is enrolled in an intensive, month-long French course at the university. At the end of the month, all students are given a standard French language exam.
Answer: A two-sample t-test is more appropriate because there are two independent groups, and we will compare the mean scores of the two groups on the French language exam.
14.1. The method is based on the sampling distribution of the sample statistic.
Answer: Both
14.3. The method provides a range of plausible values for the unknown population parameter of interest.
Answer: Confidence Intervals
G. In evaluating the usefulness of a sample statistic in estimating an unknown parameter, it is enough to have an unbiased estimator.
Answer: False. We also want the estimator to have low variability, which requires a large enough random sample.
14.2. The method is based on the assumption that the null hypothesis is true.
Answer: Hypothesis Testing
4.4. The method measures the amount of evidence found in the data against a claim.
Answer: Hypothesis Testing
9.2. What changes to this activity would make the use of confidence intervals inappropriate?
Answer: If the sample size was smaller and the central limit theorem did not hold, the use of confidence intervals would be inappropriate.
12.2. What are the consequences of conducting a hypothesis test in an inappropriate situation?
Answer: In an inappropriate situation, such as a very small sample size, the p-value could be inaccurate and lead us to fail to reject the null hypothesis when it should be rejected (or vice versa). In this case, an inappropriate hypothesis test could provide inaccurate justification for not producing more coins.
F. Do you think that the standard deviation calculated in activity E. is greater than, less than, or equal to the standard deviation of the year of the population, σ? Explain.
Answer: It should be less than the standard deviation of the year of the population, σ. Sample means are less variable than individual observations: the standard deviation of x̄ is σ/√n.
11.1. Did all of the samples drawn from the population lead to the same test conclusion? Explain why or why not.
Answer: Most of the samples drawn from the population led to the same test conclusion, but not all of them. This is due to sampling variation among the random samples drawn by each group: a group that happened to draw unusual coins (for example, very young coins) could reach a different conclusion.
5. Based on the results for activity 4, is there evidence that the Stroop effect for STEM TAs at UVa is more than 7.5 seconds? Be sure to state the hypotheses of the relevant test and explain your conclusion.
Answer: No. Ho: μ = 7.5 vs. Ha: μ > 7.5. The value 7.5 is within the confidence interval from question 4, so 7.5 is a plausible value for μ. Consequently, we fail to reject the null hypothesis and conclude that there is not sufficient evidence that the Stroop effect for STEM TAs is more than 7.5 seconds.
A. Create a variable measuring the increase in time due to the Stroop effect. Note: The resulting values should be directly comparable to the values measuring the STAT TAs' increase in time that you created in the previous lab activity. In other words, use the same equation on the student data that you used on the TA data. Are these values measuring the increase in time independent? Explain.
Answer: No. The Round 1 and Round 2 times used to compute the increase are not independent because they form matched pairs: the members of one sample are identical to the members of the other sample (each student played both rounds).
import numpy as np
import pandas as pd
import scipy.stats as stats

STtime = pd.read_csv(r"")  # student Stroop data (file path left blank)
round2 = STtime.Round2
round1 = STtime.Round1
difference = round2 - round1  # increase in time due to the Stroop effect, per student
A. Are the values measuring the increase in time due to the Stroop effect for STAT TAs created in the previous lab activity independent? Explain.
Answer: No, they are dependent because any personal traits in each STAT TA participant will be carried over from the first test to the second. As matched pairs, the participants of both sets of values are identical to one another.
B. What observations can you draw from the interval constructed in activity A.?
Answer: The confidence interval ranges from -2.9 to -0.3. Because zero is not in the interval, there is evidence that the population mean difference in time to complete the first round between STEM TAs and current STEM students is not zero. Both bounds are negative (TA mean minus student mean), which suggests that STEM students take more time to complete the first round than STEM TAs.
sample means are
less variable than individual observations; centered around the population mean; "more Normal" than individual observations.
G. Explain the implication of conducting a conservative test in context.
Answer: A conservative test reports a P-value that is probably a little larger than the true P-value, so it finds slightly less evidence against the null hypothesis than a non-conservative test would. In context, the true P-value may be somewhat smaller, so there may be somewhat more evidence of a difference between the means of STAT TAs and STAT students than our test indicates.
E. Conduct the appropriate test using the conservative approach to determine if there is evidence that there is a difference in the average time increase due to the Stroop effect for current STEM students and STEM TAs.
Answer: The p-value was 0.7273, which is greater than the significance level of 0.05. As such, the null hypothesis is not rejected, and the results are not statistically significant. In context, there is no evidence of a difference in the average time increase due to the Stroop effect for current STEM students and STEM TAs.
# STdifference (students) and TAdifference (TAs) are the time increases computed earlier
n1 = 464
xbar1 = np.mean(STdifference)
s1 = np.std(STdifference)
n2 = 38
xbar2 = np.mean(TAdifference)
s2 = np.std(TAdifference)
test_stat = (xbar1 - xbar2) / np.sqrt(s1**2/n1 + s2**2/n2)
print(test_stat)
df_min = np.min([n1-1, n2-1])  # conservative degrees of freedom
print(df_min)
pval = 2 * stats.t.cdf(-abs(test_stat), df=df_min)  # two-sided
pval = round(pval, 4)
print(pval)
9.1. Explain what makes this case appropriate.
Answer: This case is appropriate because the sample size is large enough that the central limit theorem holds true.
A. Sample statistics, such as the sample mean, vary from sample to sample; they are not fixed values.
Answer: True
C. The sample mean is an unbiased estimator of the population mean because if we take a "large" number of random samples and calculate the sample mean for each one, the average of the resulting sample means is equal to the unknown population mean.
Answer: True
E. The variance of the sample mean describes the average squared distance of the sample means from the population mean. The smaller the variance, the closer the sample means are, on average, to the unknown population mean.
Answer: True. Because the sample mean is unbiased, the variance of x̄ measures how spread out the sample means are around μ.
F. The variance (or standard deviation) of the sample mean decreases as the sample size increases.
Answer: True. The variance of x̄ is σ²/n, which shrinks as n grows (the idea behind the Law of Large Numbers).
10.2. Considering all possible randomly selected samples of size 35 from this population, the percentage of those samples' confidence intervals that include the true value of the population mean is 80%.
Answer: True. If the interval is constructed with 80% confidence, the interval contains the parameter in 80% of samples.
F. Which kind of error (Type I or Type II) could you have made? Explain.
Answer: We could have made a Type I error because we rejected the null hypothesis, but it could actually have been true. When H0 is true, the chance of this error equals the significance level α, so a significance level that is too high makes it more likely; a biased sample could also lead us to falsely reject the null hypothesis.
F. The graphic below shows boxplots for the time increase for both current STAT students and STAT TAs. Is your conclusion from activity E. reliable? Explain.
Answer: Yes, our conclusion from activity E seems reliable. The boxplots show that the STAT students have more outliers and a wider range, but the IQR and median are similar between the two groups, which supports our conclusion. Moreover, the two-sample t procedure should be fairly robust (and so reliable) because n1 + n2 ≥ 40, even though n1 is not approximately equal to n2.
E. Are the results of the test conducted in activity C. reliable? Explain.
Answer: Yes, they are reliable. np0 = 60(0.80) = 48 and n(1 − p0) = 60(0.20) = 12. Both are at least 10, so the Normal approximation to the Binomial holds and the hypothesis test is valid.
5. Is the interval constructed in activity 4 reliable? Explain.
Answer: Yes, it is reliable. np̂ = 32 and n(1 − p̂) = 28. Both are at least 10, so the confidence interval is valid. This essentially means we observed at least ten successes and ten failures in the sample.
10.1. The probability that the true value of the population mean falls between the bounds of your computed interval is 80%.
Answer: False. Once an interval is computed, the true mean either falls in it or does not. The 80% describes the method: across all possible samples, 80% of the intervals constructed this way contain the true mean.
9.3. What are the consequences of using confidence intervals in an inappropriate case?
Answer: The interval would misrepresent the variability of the estimate, so it would probably be inaccurate (its true coverage could differ from the stated confidence level).
Question: To find as much evidence as possible against the null hypothesis, what is the best magnitude of the test statistic?
B. Larger
Quality control randomly selects four of their containers of cherry tomatoes. Each container is labeled 1/2 lb. (227 g.) and the average weight of the selected containers is 222 g. Previous analysis has shown that the weights of cherry tomato containers are Normally distributed with standard deviation 5 g. Question: Is there evidence that the packaging machine needs calibration?
Based on our sample of four containers, we have sufficient evidence to state that the machine does need to be calibrated, as we rejected the null hypothesis of μ = 227.
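The test behind this conclusion can be checked numerically. A minimal Python sketch using the numbers in the question (known σ = 5 g, n = 4, x̄ = 222 g, μ0 = 227 g) and `scipy.stats`, as elsewhere in these notes:

```python
import math
from scipy import stats

# Values from the cherry tomato example above
xbar = 222.0   # sample mean weight (g)
mu0 = 227.0    # labeled weight under H0 (g)
sigma = 5.0    # known population standard deviation (g)
n = 4          # number of containers sampled

# One-sample z statistic: z = (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Two-sided p-value, since Ha: mu != 227
pval = 2 * stats.norm.cdf(-abs(z))

print(round(z, 2))     # -2.0
print(round(pval, 4))  # 0.0455
```

Since 0.0455 < 0.05, H0: μ = 227 is rejected, matching the conclusion above.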
conclusion
Based on the sample we were given, we have sufficient evidence to state that the true mean weight of tomatoes in these containers is not 227 grams. In essence, we have evidence to suggest that there's a difference in the mean weight.
Question: Whatever significance level is used for a hypothesis test, that is the magic value at which the null hypothesis changes from definitely true to definitely false. In other words, when α = 0.05, a p-value of 0.051 means that the null hypothesis is definitely true and a p-value of 0.049 means that the null hypothesis is definitely false.
False
Two-sample z procedure (proportions):
two categorical(binary) variables.
Weights of brown eggs are N(65, 5) grams. Weights of white eggs are N(μ, 5) grams. Question: What range roughly contains the middle 95% of the sample means for each sampling distribution?
Brown: 65 ± 2 · 5/√12. White: μ ± 2 · 5/√12 (the sampling distribution for white eggs is centered at the unknown μ, not at the sample mean 64.2).
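The brown-egg range can be computed directly. A short Python sketch using the values in the question (σ = 5 g, n = 12); the white-egg range has the same half-width but is centered at the unknown μ:

```python
import math

sigma = 5.0      # population SD for both egg colors (g)
n = 12           # a dozen eggs per sample
brown_mu = 65.0  # brown egg population mean (g)

# SD of the sampling distribution of the sample mean
sd_xbar = sigma / math.sqrt(n)

# Roughly 95% of sample means fall within 2 standard deviations of the center
brown_range = (brown_mu - 2 * sd_xbar, brown_mu + 2 * sd_xbar)

print(round(sd_xbar, 3))                        # 1.443
print(tuple(round(v, 2) for v in brown_range))  # (62.11, 67.89)
```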
Question:What type of variable is type/precision of payment?
Categorical
A study is conducted to investigate if the average daily hours spent working at a manufacturing plant is less than 8 hours per day. The mean hours spent working found from a very large sample is 7.998 hours per day.
Confidence interval
sampling variability
Each time a random sample is drawn from a population, we are likely to select a different set of individuals and calculate a different value of the statistic. This fact that the values of the statistic will vary over different samples is called
Confidence intervals generally take the following form:
Estimate ± margin of error.
Example: Venmo. A 2018 UVA Today article summarizes portions of a study conducted by researchers at UVa's Darden School to investigate if people tend to prefer making friends with others who repay dinner tabs in rounded amounts rather than in exact amounts. It reports that "52% of respondents used precise amounts in recent transactions with friends" via Venmo or PayPal. Question: What is the variable of interest being measured?
Exact vs. Rounded Repayment
A test of the population mean has hypotheses:
H0: μ = μ0 vs. Ha: μ ≠ μ0 (two-sided), OR H0: μ = μ0 vs. Ha: μ > μ0 or Ha: μ < μ0 (one-sided). Note: μ0 is the value of the parameter under the null hypothesis.
Question: is there evidence that paper frogs made of copy paper jump less than 13 cm on average?
H0: μ = 13, Ha: μ < 13
A company tests whether the mean volume of tea in their bottles is 500 ml, as stated on the label. The company is concerned that the bottles contain less than advertised, which could lead to complaints of false advertising. Question: What are the null and alternative hypotheses?
H0: μ = 500, Ha: μ < 500
The FDA tests whether a generic drug has an absorption extent similar to the known absorption extent of the brand-name drug it is copying. Higher or lower absorption would both be problematic. Question: What are the null and alternative hypotheses?
H0: μg = μb, Ha: μg ≠ μb
When an SRS of size n is drawn from a population with unknown mean, μ, and known standard deviation, σ, if the population is Normal or the sample size is large enough for the CLT to hold:
Hypotheses: H0: μ = μ0 vs. Ha: μ <, >, or ≠ μ0. Test statistic: z = (x̄ − μ0)/(σ/√n). P-value: P(Z ≤ z) for Ha: μ < μ0; P(Z ≥ z) for Ha: μ > μ0; 2P(Z ≤ −|z|) for Ha: μ ≠ μ0.
When an SRS of size n is drawn from a Normal population with unknown mean, μ, and unknown standard deviation, σ:
Hypotheses: H0: μ = μ0 vs. Ha: μ <, >, or ≠ μ0. Test statistic: t = (x̄ − μ0)/(s/√n). P-value: P(T ≤ t) for Ha: μ < μ0; P(T ≥ t) for Ha: μ > μ0; 2P(T ≤ −|t|) for Ha: μ ≠ μ0. Note: T ∼ t(k), where k = n − 1.
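A sketch of the one-sample t test on real data, using the DMS odor thresholds that appear later in these notes. The null value μ0 = 25 μg/L is hypothetical, chosen only to have something to test; the point is that the hand computation matches `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

# DMS odor thresholds (μg/L) from the oenology example in these notes
data = np.array([31, 31, 43, 36, 23, 34, 32, 30, 20, 24])
mu0 = 25  # hypothetical null value, for illustration only

n = len(data)
xbar = np.mean(data)
s = np.std(data, ddof=1)  # sample SD: divide by n - 1

# t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom
t = (xbar - mu0) / (s / np.sqrt(n))
pval = 2 * stats.t.cdf(-abs(t), df=n - 1)  # two-sided: Ha: mu != mu0

# scipy's built-in one-sample t test (two-sided by default) should agree
result = stats.ttest_1samp(data, mu0)
print(t, pval)
print(result.statistic, result.pvalue)
```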
Confidence level:
If a confidence interval is constructed with a C% confidence level, the confidence interval contains the parameter (like μ) in C% of samples.
Motivation
Many studies collect data on categorical variables that are the variable of interest for analysis. Some quantitative variables can be modified to be recorded as a categorical variable. For example, the quantitative variable age can be recorded as age groups. We will explore inferential methods for binary variables (categorical variables with two classes).
matched pairs design
If an experiment is designed to compare treatments or conditions at the individual level, the two resulting samples of values are not independent (they are related to each other). The members of one sample are identical to, or matched with, the members of the other sample.
sampling distribution
If many random samples of the same size are drawn from a given population, the values calculated for the statistic will follow a predictable pattern: the probability distribution of that statistic.
As a general guideline, the two-sample t procedure is fairly robust, especially if the sample sizes of both groups are similar.
If n1 + n2 < 15, the Normality assumption is critical. If n1 + n2 ≥ 15, proceed in the absence of outliers and strong skewness. If n1 + n2 ≥ 40, the procedures are generally robust. Enhance robustness by planning n1 ≈ n2.
Between matched pairs and two-sample t procedures
If there is a pair of values from each subject, or if subjects form obvious pairs, a matched pairs design is most likely appropriate.
Weights of brown eggs are N(65, 5) grams. Weights of white eggs are N(μ, 5) grams. A dozen white eggs are randomly selected and the sample mean, x̄, is 64.2 grams. Question: Is there evidence that white eggs are lighter than brown eggs, on average? Question: Should this be a left-tailed or right-tailed test?
Left-tailed
When an SRS of size n is drawn from a large population with population proportion p of successes and n is large, the sampling distribution of p̂ is approximately
N(p, √(p(1 − p)/n))
Weights of brown eggs are N(65, 5) grams. Weights of white eggs are N(μ, 5) grams. Question: If repeated samples of size 12 are drawn from each population, what is the distribution of the sample means?
N(μ, 5/√12) for white eggs (and N(65, 5/√12) for brown eggs)
Question: The scores in a course have improved slightly from previous semesters after a huge monetary investment. Is the slight improvement relevant in light of the monetary investment?
No
Weights of brown eggs are N(65, 5) grams. Weights of white eggs are N(μ, 5) grams. A dozen white eggs are randomly selected and the sample mean, x̄, is 64.2 grams. Question: Is it reasonable to say that this provides evidence that white eggs are, on average, lighter than brown eggs? / Question: Is this observed difference between the sample mean of white eggs and the population mean of brown eggs due to chance, or is it due to the (possible) fact that white eggs are lighter than brown eggs on average?
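No, not by inspection alone; a probability calculation (a hypothesis test) is needed to judge whether the difference could be due to chance.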
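The probability calculation can be sketched as a one-sample z test, using the known σ = 5 g and the left-tailed alternative Ha: μ < 65 from the egg example:

```python
import math
from scipy import stats

# Egg example: white eggs ~ N(mu, 5); brown eggs have mean 65
xbar = 64.2   # sample mean of a dozen white eggs (g)
mu0 = 65.0    # H0: mu = 65 (same as brown eggs)
sigma = 5.0   # known population SD (g)
n = 12

z = (xbar - mu0) / (sigma / math.sqrt(n))
pval = stats.norm.cdf(z)  # left-tailed, since Ha: mu < 65

print(round(z, 3))     # -0.554
print(round(pval, 2))  # 0.29
```

With a p-value of about 0.29, a difference this large is quite plausible by chance alone, so there is not convincing evidence that white eggs are lighter.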
Between one-sample and two-sample t procedures
One-sample t procedure: one quantitative variable. Two-sample t procedure: one quantitative variable and one categorical (binary) variable.
Weights of brown eggs are N(65, 5) grams. Weights of white eggs are N(μ, 5) grams. A dozen white eggs are randomly selected and the sample mean, x̄, is 64.2 grams. Question: Is there evidence that white eggs are lighter than brown eggs, on average? Question: Should this be a one-sided or two-sided test?
One-sided
Quality control in a food company randomly selects four of their containers of cherry tomatoes packaged for sale. Each container is labeled 1/2 lb. (227 g.).
Quality control's decision to make: Does the machine that sorts cherry tomatoes into containers need calibration? In statistical terms: Is the population mean μ of the distribution of weights of cherry tomato containers different from 227 g.?
In the study described in this UVA Today article, 52 participants were presented with transaction histories for an individual who paid precise amounts and another individual who paid rounded amounts in Venmo and PayPal. The participants were then asked to select the individual they would rather be friends with. It turns out that 42 preferred to be friends with the individual who paid rounded amounts. Question: What is the 95% confidence interval for the proportion of people who prefer to make friends with others who they perceive as less petty?
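A sketch of the answer with the one-sample z interval for a proportion, p̂ ± z*·√(p̂(1 − p̂)/n), using the counts in the question (42 of 52):

```python
import math
from scipy import stats

n = 52   # participants
X = 42   # preferred the person who paid rounded amounts
phat = X / n

# Rule of thumb: at least 10 successes and 10 failures
print(X, n - X)  # 42 10

z_star = stats.norm.ppf(0.975)
se = math.sqrt(phat * (1 - phat) / n)  # standard error of phat
LL = phat - z_star * se
UL = phat + z_star * se
print(round(LL, 3), round(UL, 3))  # 0.701 0.915
```

The rule-of-thumb check passes, but only just, since there are exactly 10 failures.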
Example: Cholesterol
Recall the data containing cholesterol levels for heart attack patients two days after their heart attack. The study also measured the cholesterol levels for these patients four days after their heart attack. Each subject had a pair of measurements: cholesterol level two days and four days after the heart attack. The measurements are not independent: a subject's cholesterol level four days after a heart attack is most likely associated with their cholesterol level two days after the attack.
The diameter of a component of a motor is designed to be 4 mm. If this component is too large or too small, the motor will not function properly. The manufacturer takes a random (sufficiently large) sample of this component of the motor and finds the 95% confidence interval for the mean diameter of this component to be (3.45, 3.88). Question: If the appropriate hypothesis test with 5% significance had been conducted instead, what would the conclusion have been?
Reject H0 (4 mm lies outside the 95% confidence interval)
When σ1 and σ2 are unknown, the standard deviation of x̄1 − x̄2 is unknown. The standard error of x̄1 − x̄2 is
SE(x̄1 − x̄2) = √(s1^2/n1 + s2^2/n2)
The hypothesis test uses the standard error under the null hypothesis:
SE_H0(p̂1 − p̂2) = √(p̂(1 − p̂)(1/n1 + 1/n2)), where p̂ = (X1 + X2)/(n1 + n2) is the pooled sample proportion
The standard error of the sample proportion is
SE(p̂) = √(p̂(1 − p̂)/n)
The confidence interval uses the standard error of the sample proportion:
SE(p̂) = √(p̂(1 − p̂)/n).
The standard error of the difference in sample proportions
SE(p̂1 − p̂2) = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
The confidence interval uses the standard error of the difference in sample proportions:
SE(p̂1 − p̂2) = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
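As an illustration, this Python sketch applies the unpooled standard error to the counts from the private vs. public institutions question in these notes (38 of 82 and 43 of 87). Which group is labeled 1 is an assumption here.

```python
import math
from scipy import stats

# Counts from the private vs. public institutions example in these notes
n1, X1 = 82, 38
n2, X2 = 87, 43
phat1, phat2 = X1 / n1, X2 / n2

# Unpooled standard error for the confidence interval
se = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
z_star = stats.norm.ppf(0.975)

diff = phat1 - phat2
LL, UL = diff - z_star * se, diff + z_star * se
print(round(LL, 3), round(UL, 3))
```

The interval contains 0, which agrees with that question's conclusion to fail to reject H0: p1 − p2 = 0.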
Question: What is the reported "52% of respondents used precise amounts in recent transactions with friends" via Venmo or PayPal?
Statistic
Case 1: The population of the variable of interest X is N(μ, σ).
The sampling distribution of the sample mean X̄ for all possible samples of size n is N(μ, σ/√n).
Quality control in a food company randomly selects four of their containers of cherry tomatoes packaged for sale. Each container is labeled 1/2 lb. (227 g.). Question: What are the hypotheses to test if the machine that sorts cherry tomatoes into containers needs calibration?
The hypotheses are: H0: μ = 227, Ha: μ ≠ 227
Normal approximation rule of thumb:
The number of successes and the number of failures in both samples are at least 10.
significance level, α
The p-value is compared with a level that we regard as decisive
Case 2 - Central Limit Theorem:
The population of the variable of interest X is not Normally distributed and has mean μ and standard deviation σ. If n is large enough, the sampling distribution of the sample mean, X̄, for all possible samples of size n is approximately N(μ, σ/√n).
Normal approximation rule of thumb:
The sample size should be at least large enough that np0 and n(1 − p0) are both at least 10
Normal approximation rule of thumb:
The sample size should be at least large enough that np̂ and n(1 − p̂) are both at least 10
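Both rules of thumb above are the same check with a different plug-in value: p0 for a test, p̂ for a confidence interval. A minimal sketch, where `normal_approx_ok` is a hypothetical helper name and the examples reuse n = 60 with p0 = 0.80 and p̂ = 32/60 from activities in these notes:

```python
def normal_approx_ok(n, p):
    """Rule of thumb: n*p and n*(1 - p) must both be at least 10.

    Pass p0 when checking a hypothesis test, or phat when checking a CI.
    """
    return n * p >= 10 and n * (1 - p) >= 10

print(normal_approx_ok(60, 0.80))     # test with p0 = 0.80: 48 and 12, True
print(normal_approx_ok(60, 32 / 60))  # CI with phat = 32/60: 32 and 28, True
print(normal_approx_ok(20, 0.90))     # 18 and 2, False
```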
Recall that the standard error of a statistic is the estimate of the standard deviation of the statistic determined with the data
The standard deviation of the difference in sample proportions is σ(p̂1 − p̂2) = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)
Standard deviation of sample proportion
The standard deviation of the sample proportion is: σ(p̂) = √(p(1 − p)/n)
null hypothesis, H0.
The statement being tested in a test of significance is usually a statement of no effect or no difference
Both the t and standard Normal distributions are centered at zero, symmetric, and bell-shaped.
Their differences are: t(k) has an associated degrees of freedom; t(k) has slightly larger spread; as the sample size increases, t(k) approaches the standard Normal.
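To see the larger spread of t(k) and its convergence to the standard Normal, this Python sketch (using `scipy.stats`, as elsewhere in these notes) compares the 95% critical value t* for several degrees of freedom with z*:

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)  # 95% critical value for the standard Normal

# 95% critical values t* for increasing degrees of freedom
t_stars = {df: stats.t.ppf(0.975, df) for df in [2, 5, 10, 30, 100, 1000]}

for df, t_star in t_stars.items():
    print(df, round(t_star, 3))
print("z*:", round(z_star, 3))
```

Every t* exceeds z* ≈ 1.96, which is why a t interval is wider than a z interval computed with the same standard deviation; the gap shrinks as the degrees of freedom grow.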
When σ is unknown and estimated by s, the standard deviation of the sampling distribution of x̄, σ/√n, is unknown and is estimated by the standard error s/√n.
The one-sample t-statistic is t = (x̄ − μ0)/(s/√n). The one-sample t-statistic approximately follows the t distribution with n − 1 degrees of freedom.
Cautions
These CI formulas are based only on data from an SRS. Since the sample mean is not resistant, neither is the CI. In order for the sampling distribution to be exactly or approximately Normal, either 1) the population must be Normal or 2) the sample size must be large enough for the Central Limit Theorem to hold. These CI formulas are based on the fact that the population standard deviation, σ, is known. In reality, this is not practical.
Motivation 2.0
To analyze if there are differences in means between two independent populations, a quantitative variable is compared between the two groups (which are defined by a second, binary variable).
to eliminate bias
Use random sampling. Select an unbiased statistic for the parameter of interest.
scenario
We take many random samples of a given size n from a population with mean μ and standard deviation σ.
Two-sample z-test
When an SRS is drawn from each of two independent populations with unknown proportions of successes, p1 and p2: Hypotheses: H0: p1 − p2 = 0 vs. Ha: p1 − p2 <, >, or ≠ 0. Test statistic: z = (p̂1 − p̂2)/√(p̂(1 − p̂)(1/n1 + 1/n2)), where p̂ = (X1 + X2)/(n1 + n2). P-value: P(Z ≤ z) for Ha: p1 − p2 < 0; P(Z ≥ z) for Ha: p1 − p2 > 0; 2P(Z ≤ −|z|) for Ha: p1 − p2 ≠ 0. Normal approximation rule of thumb: The number of successes and the number of failures in both samples are at least 10.
Sampling distribution of the difference in sample proportions
When an SRS is drawn from each of two independent large populations with population proportions p1 and p2 of successes and both sample sizes are large, the sampling distribution of p̂1 − p̂2 is approximately N(p1 − p2, √(p1(1 − p1)/n1 + p2(1 − p2)/n2)). For the approximation to be valid, the samples should be large enough that n1p1 ≥ 10, n1(1 − p1) ≥ 10, n2p2 ≥ 10, and n2(1 − p2) ≥ 10. When the approximation is valid, the z statistic is z = ((p̂1 − p̂2) − (p1 − p2))/√(p1(1 − p1)/n1 + p2(1 − p2)/n2).
Conservative approach
When computing confidence intervals, an approach is conservative if it results in confidence intervals that are a bit wider than the true confidence interval.
Suppose that an SRS of size n is drawn from N(μ, σ).
When σ is known, the sampling distribution of x̄ is N(μ, σ/√n). The one-sample z-statistic is z = (x̄ − μ)/(σ/√n), which follows the standard Normal distribution.
One-sample t confidence interval
When σ is unknown, the level C confidence interval for a population mean is x̄ ± t*·s/√n, where t* is the value from the t(n − 1) distribution curve with area C between −t* and t*.
A study is conducted to investigate if the average daily hours spent working at a manufacturing plant is less than 8 hours per day. The mean hours spent working found from a very large sample is 7.998 hours per day, which corresponds to a p-value of 0.016. The significance level used for this hypothesis test is α = 0.05. The conclusion stated as a result of the study is that there is evidence that the average daily hours spent working at the manufacturing plant is significantly less than 8 hours per day. Question: Is this conclusion appropriate and reasonable? A. Yes B. No
It is statistically appropriate (p = 0.016 < α = 0.05), but not reasonable in practice: the difference is only 0.002 hours (about 7 seconds) per day, too small to matter.
Question: Is this interval reliable?
Yes, because the numbers of successes and failures in each sample are greater than 10
Question: Is this conclusion reliable?
Yes, because there are more than ten successes and more than ten failures for each sample
12.1. Is this information relevant for the hypothesis test conducted above? Explain.
Yes, this is relevant because a confidence interval can confirm the result of the hypothesis test while also giving a range of plausible values for the parameter, so it is beneficial to use both. Also, both methods rely on the CLT assumption that the sampling distribution is approximately Normal.
Question: In the conservative case, the p-value is
a little larger than the true p-value, so the test is a little less likely to reject the null hypothesis
statistic
a number describing a characteristic of a sample. The value can change from one sample to another. Remember that the sample is the part of the population from which data are collected. It is often used to estimate an unknown parameter.
Matched pairs t procedure (means):
a pair of values from one quantitative variable under two treatments or from obvious pairs.
In hypothesis testing,
an approach is conservative if it results in p-values that are a bit larger than the true p-values
The degrees of freedom
are approximated using the smaller of (n1 − 1, n2 − 1). This degrees of freedom approximation is considered a conservative approach.
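The conservative degrees of freedom can be compared with the Welch-Satterthwaite approximation that statistical software commonly uses. The summary statistics below are hypothetical, chosen only for illustration; the sketch shows that the smaller df gives a larger (conservative) p-value for the same t statistic.

```python
import math
from scipy import stats

# Hypothetical summary statistics, for illustration only
xbar1, s1, n1 = 10.0, 3.0, 15
xbar2, s2, n2 = 8.0, 4.0, 25

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = (xbar1 - xbar2) / se

# Conservative approach: df = smaller of n1 - 1 and n2 - 1
df_cons = min(n1 - 1, n2 - 1)

# Welch-Satterthwaite approximation
v1, v2 = s1**2 / n1, s2**2 / n2
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Two-sided p-values under each choice of df
p_cons = 2 * stats.t.cdf(-abs(t), df=df_cons)
p_welch = 2 * stats.t.cdf(-abs(t), df=df_welch)
print(df_cons, round(df_welch, 1))
print(round(p_cons, 4), round(p_welch, 4))
```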
When testing the hypothesis H0: μ = μ0
based on an SRS of size n from a Normal population with unknown mean μ and known standard deviation σ, the fact that x̄ is distributed N(μ, σ/√n) is used.
A study is conducted to investigate if the average daily hours spent working at a manufacturing plant is less than 8 hours per day. The mean hours spent working found from a very large sample is 7.998 hours per day, which corresponds to a p-value of 0.016. Question: Would the 95% confidence interval created using this sample information include 8 hours per day? A. Yes B. No
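No. The one-sided p-value of 0.016 corresponds to a two-sided p-value of 0.032 < 0.05, so 8 is not a plausible value at the 95% level and the 95% confidence interval would not include 8 hours per day.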
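The link between the one-sided p-value and the 95% interval can be sketched numerically. Assumptions: a z procedure (reasonable for a very large sample), with the standard error backed out of the reported p-value, since the notes do not give the sample size or standard deviation.

```python
from scipy import stats

xbar = 7.998         # sample mean hours per day
mu0 = 8.0            # null value
p_one_sided = 0.016  # reported left-tailed p-value

# Back out the z statistic and standard error implied by the p-value
z = stats.norm.ppf(p_one_sided)  # about -2.14 (negative since xbar < mu0)
se = (xbar - mu0) / z            # positive standard error

# 95% CI and the matching two-sided p-value
z_star = stats.norm.ppf(0.975)
ci = (xbar - z_star * se, xbar + z_star * se)
p_two_sided = 2 * p_one_sided    # 0.032 < 0.05

print(ci)  # upper bound falls just below 8
```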
The sample size for each sample
can be chosen to target a desired margin of error.
One of the inferential methods that we will explore is
confidence intervals
A significant effect
could be too small to be relevant. With a large enough sample size, significance can be shown for tiny effects
t distributions are:
described by their degrees of freedom; denoted by t(k), where k is the degrees of freedom.
The value of a statistic (like x̄)
describes the particular sample that was drawn and estimates the parameter (like μ). However, if another random sample is selected, the value of that statistic will likely be different.
The general form of confidence intervals is
estimate ± margin of error
The parameter of interest
for comparing two population proportions is their difference, p1 − p2. The confidence interval will estimate p1 − p2. The hypothesis test will use the null hypothesis H0: p1 − p2 = 0.
The standard deviation of the sampling distribution of x̄
is σ/√n: it is narrower than the population standard deviation σ by a factor of √n.
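The √n factor can be seen in a quick simulation. This is a sketch under assumed inputs: the exponential population, seed, and replication count are arbitrary choices for illustration. Each printed row compares the simulated standard deviation of x̄ with σ/√n.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical skewed population: exponential with mean 2 and SD 2
sigma = 2.0
n_reps = 20_000  # number of simulated samples per sample size

sim_sd = {}
for n in [4, 16, 64]:
    samples = rng.exponential(scale=sigma, size=(n_reps, n))
    xbars = samples.mean(axis=1)  # one sample mean per simulated sample
    sim_sd[n] = xbars.std()       # spread of the sampling distribution
    print(n, round(sim_sd[n], 3), round(sigma / np.sqrt(n), 3))
```

The simulated spreads track σ/√n closely even though the population is skewed.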
Question: Which of the following is true for a given set of data?
If the only difference is using t* instead of z* (same data and standard deviation), the 95% t confidence interval will be wider than the 95% z confidence interval; otherwise, more information is needed to know which interval will be longer.
Question:How close is the value of the statistic (or estimate) tothe parameter?
If the sample size n is large, the variability of the statistic and the margin of error shrink, so the statistic is likely to be closer to the parameter.
4. Construct a 90% confidence interval for the mean increase in time for STEM TAs at UVa
import numpy as np
import pandas as pd
import scipy.stats as stats

TAtime = pd.read_csv(r"")  # TA Stroop data (file path left blank)
difference = TAtime.Different
same = TAtime.Same
difference2 = difference - same  # time increase due to the Stroop effect
n = 38
xbar = np.mean(difference2)
print(xbar)
sigma = np.std(difference2, ddof=1)  # sample standard deviation s
print(sigma)
t_star = stats.t.ppf(0.950, df=n-1)  # 90% CI: 5% in each tail
print(t_star)
LL = xbar - t_star * sigma/np.sqrt(n)
UL = xbar + t_star * sigma/np.sqrt(n)
print(round(LL,2))
print(round(UL,2))
1. The DMS odor threshold was tested for each of a random sample of 10 beginning students of oenology: 31, 31, 43, 36, 23, 34, 32, 30, 20, and 24 μg/L. Construct a 95% confidence interval for the mean DMS odor threshold for beginning oenology students.
import numpy as np
import scipy.stats as stats

data = np.array([31, 31, 43, 36, 23, 34, 32, 30, 20, 24])  # avoid shadowing the built-in list
n = len(data)
xbar = np.mean(data)
print(xbar)
s = np.std(data, ddof=1)  # σ is not given, so use the sample SD
print(s)
t_star = stats.t.ppf(0.975, df=n-1)  # small n with estimated SD: use t* with n - 1 df
t_star = round(t_star, 3)
print(t_star)
LL = xbar - t_star * s/np.sqrt(n)
UL = xbar + t_star * s/np.sqrt(n)
print(round(LL,1))
print(round(UL,1))
C. Conduct the appropriate hypothesis test for the validity of this statement.
import numpy as np
import scipy.stats as stats

n = 60
X = 32
phat = X/n
p0 = 0.80
test_stat = (phat - p0) / np.sqrt(p0*(1-p0)/n)  # uses p0 under H0
print(test_stat)
pval = 2 * stats.norm.cdf(-abs(test_stat))  # two-sided
print(pval)
Question: Does the opinion of college students differ between students at private and public institutions?
import numpy as np
import scipy.stats as stats

n1 = 82
n2 = 87
X1 = 38
X2 = 43
phat1 = X1/n1
phat2 = X2/n2
phat = (X1 + X2)/(n1 + n2)  # pooled proportion under H0
test_stat = (phat1 - phat2)/np.sqrt(phat*(1-phat)*(1/n1 + 1/n2))
print(test_stat)
pval = 2 * stats.norm.cdf(-abs(test_stat))  # two-sided
print(pval)
Step 4: Since P > 0.05, we fail to reject the null hypothesis.
Step 5: Based on the data, we have insufficient evidence to state that students at private vs. public institutions differ in their opinions on whether hate speech is protected by the First Amendment.
6. Create a histogram of time increase among STAT TAs due to the Stroop effect. 6A. Display your histogram.
import seaborn as sns
import matplotlib.pyplot as plt

# histplot replaces the deprecated sns.distplot
sns.histplot(difference2, bins=10, color="white", edgecolor="gray")
plt.title("Histogram of Stroop Time Increases for STAT TAs")
plt.xlabel("Time increase (seconds)")
plt.ylabel("Frequency")
plt.show()
Numerical summaries for binary (categorical) data
include counts and proportions - parameters are population proportions - statistics are sample proportions Example: - Parameter: Proportion of all Virginians who favor legalizing marijuana - Statistic: Proportion of Virginians in an opinion poll who favor legalizing marijuana.
parameter
is a number describing a characteristic of the population. In practice we do not know its value. Remember that the population is the entire group of interest.
A confidence interval (CI)
is an estimate of an unknown parameter that provides an indication of variability. Confidence intervals are a range of plausible values for the unknown parameter.
variability of statistic
is described by the spread of its sampling distribution. This spread depends on the sampling design and the sample size n. Generally speaking, if the sample size is increased, the variability of a statistic decreases.
The test of significance
is designed to assess the strength of the evidence against the null hypothesis
bias of a statistic
is determined in part by the location of the center (mean) of the sampling distribution.
Evidence against the null hypothesis
is evidence for the alternative hypothesis.
Failing to reject H0
is said to be a statistically insignificant result
Rejecting H0
is said to be a statistically significant result
The value of z*
is such that C% of the area under the standard Normal curve is between −z* and z*. For C = 0.95, use stats.norm.ppf(0.95 + 0.05/2), i.e. stats.norm.ppf(0.975) ≈ 1.96.
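The formula for z* can be wrapped in a small helper. `z_star` is a hypothetical name used only here; the calculation itself is the `stats.norm.ppf` call from the card above.

```python
from scipy import stats

def z_star(C):
    """Critical value z* so that area C lies between -z* and z* under N(0, 1)."""
    # Put (1 - C)/2 in each tail, so the cumulative area up to z* is C + (1 - C)/2
    return stats.norm.ppf(C + (1 - C) / 2)

for C in [0.90, 0.95, 0.99]:
    print(C, round(z_star(C), 3))
```

This gives 1.645, 1.96, and 2.576 for 90%, 95%, and 99% confidence.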
The p-value
is the area under the sampling distribution of values that are at least as extreme as the value from the random sample. The area is in the direction of Ha.
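The direction of Ha decides which tail area is the p-value. A sketch of a helper for a z statistic; the name `p_value` and the `alternative` strings are illustrative choices, not from the course code.

```python
from scipy import stats

def p_value(z, alternative):
    """P-value for a z statistic; alternative is 'less', 'greater', or 'two-sided'."""
    if alternative == "less":
        return stats.norm.cdf(z)             # left tail
    if alternative == "greater":
        return stats.norm.sf(z)              # right tail, same as 1 - cdf(z)
    if alternative == "two-sided":
        return 2 * stats.norm.cdf(-abs(z))  # both tails
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

print(p_value(-2.0, "less"))
print(p_value(-2.0, "two-sided"))
```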
The standard error of a statistic
is the estimate of the standard deviation of the statistic determined with the data
critical value
is the least extreme value of the test statistic that rejects the null hypothesis.
Alternative hypothesis, Ha
is the statement that we suspect is true instead of the null hypothesis.
A conservative approach
is to set p*1 = p*2 = 1/2.
hypothesis
is an assumption or a theory about the characteristics of one or more variables in one or more populations.
cdf =
left tail
Question: Is the observed difference in the sample means due only to variations from the random sampling, or does it reflect a true difference in population means?
# jumps_copy and jumps_con hold the copy paper and construction paper frog data
n1 = 105
xbar1 = np.mean(jumps_copy.distance)
s1 = np.std(jumps_copy.distance, ddof=1)
n2 = 126
xbar2 = np.mean(jumps_con.distance)
s2 = np.std(jumps_con.distance, ddof=1)
test_stat = (xbar1 - xbar2)/np.sqrt(s1**2/n1 + s2**2/n2)
df_min = np.min([n1-1, n2-1])  # conservative degrees of freedom
pval = stats.t.cdf(test_stat, df=df_min)  # left-tailed: Ha: mu_copy < mu_con
print(pval)
P = 0.0000609
P < alpha (for any reasonable test), so REJECT H_0.
Based on our 231 data points, we have sufficient evidence to suggest that construction paper frogs jump further, on average, than copy paper frogs. If you want a frog to jump further, then make it out of construction paper!
A. Construct a conservative 95% confidence interval for the difference in average time to complete the first round of the game between current STEM students and STEM TAs.
n1 = 38
xbar1 = np.mean(TAtime.Same)
s1 = np.std(TAtime.Same)
n2 = 464
xbar2 = np.mean(STtime.Round1)
s2 = np.std(STtime.Round1)
df_min = np.min([n1-1, n2-1])  # conservative degrees of freedom
t_star = stats.t.ppf(0.975, df=df_min)
t_star = round(t_star, 3)
print(t_star)
LL = (xbar1 - xbar2) - t_star * np.sqrt(s1**2/n1 + s2**2/n2)
UL = (xbar1 - xbar2) + t_star * np.sqrt(s1**2/n1 + s2**2/n2)
print(round(LL,1))
print(round(UL,1))
4. What is the margin of error for your confidence interval determined in activity 3.?
# z_star, sigma, and n come from activity 3
m = z_star * sigma/np.sqrt(n)
m = round(m, 3)
print(m)
The sample size can be chosen to target a desired margin of error. The minimum sample size to obtain a margin of error of at most m* at confidence level (1 − α) is
n = (z*/m*)^2 · p*(1 − p*), where p* is an educated guess of the true p (round n up to the next whole number). A conservative approach is to set p* = 1/2.
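The sample size formula can be sketched as a small function; `min_sample_size` is a hypothetical name. Rounding up matters: rounding down would give a margin of error slightly above the target.

```python
import math
from scipy import stats

def min_sample_size(m_star, C=0.95, p_star=0.5):
    """Smallest n giving margin of error at most m_star for one proportion.

    p_star is an educated guess of p; the default 0.5 is the conservative choice.
    """
    z = stats.norm.ppf(C + (1 - C) / 2)  # critical value z*
    n = (z / m_star) ** 2 * p_star * (1 - p_star)
    return math.ceil(n)                  # always round up

print(min_sample_size(0.03))  # 1068
print(min_sample_size(0.05))  # 385
```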
The minimum sample size to obtain a margin of error of at most m* at confidence level (1 − α) is
n = (z*/m*)^2 · (p*1(1 − p*1) + p*2(1 − p*2)), where p*1 and p*2 are educated guesses of the true p1 and p2.
Question: How large is "large enough"?
no magic number (typically n > 40)
Question: is this interval reliable?
Check that np̂ and n(1 − p̂) are both at least 10.
One-sample z procedure (proportions):
one binary variable.
One-sample t procedure (means):
one quantitative variable (σ unknown)
One-sample z procedure (means):
one quantitative variable (σ known)
Two-sample t procedure (means):
one quantitative variable and one categorical (binary) variable
What is the average amount of money that UVa students spend on textbooks for this semester? Question: What is the appropriate procedure?
One-sample t procedure for means: one quantitative variable, no known population standard deviation, and no categorical grouping.
The NFL will investigate if Tom Brady's footballs are inflated to less than 12.5 psi. Question: What is the appropriate procedure?
one sample t test, one quantitative variable
Do a majority of UVa upperclassmen believe that a mentorship program for incoming 1st year UVa students is valuable? Question: What is the appropriate procedure?
one sample z test for proportions
Question: What is a reasonable sample statistic for p1−p2?
p̂1 − p̂2, the difference in sample proportions
6. Determine the p-value and explain what information its value provides.
from scipy import stats
# right-tailed p-value (Ha: parameter > null value); z is the test statistic
pval = 1 - stats.norm.cdf(z)
print(pval)
p-value code
from scipy import stats
# two-sided p-value (Ha: parameter != null value)
pval = 2 * stats.norm.cdf(-abs(test_stat))
print(pval)
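The two snippets above cover the right-tailed and two-sided cases. The sketch below collects all three alternatives in one place. The helper name `z_pvalue` and its `alternative` labels are assumptions made for illustration.

```python
from scipy import stats

def z_pvalue(test_stat, alternative="two-sided"):
    """P-value for a z test statistic, using the standard Normal as reference."""
    if alternative == "greater":                  # Ha: parameter > null value
        return 1 - stats.norm.cdf(test_stat)
    if alternative == "less":                     # Ha: parameter < null value
        return stats.norm.cdf(test_stat)
    return 2 * stats.norm.cdf(-abs(test_stat))    # Ha: parameter != null value

print(round(z_pvalue(1.96), 4))
print(round(z_pvalue(1.96, "greater"), 4))
```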
The level C confidence interval for a population proportion is
$\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$
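The sketch below computes the interval and pairs it with the reliability check. The helper name `proportion_interval` and the example counts are hypothetical. The threshold of 10 successes and 10 failures is assumed to match the reliability check used in these notes.

```python
import numpy as np
from scipy import stats

def proportion_interval(successes, n, conf=0.95):
    """Large-sample level-conf confidence interval for a population proportion."""
    # reliability check: counts of successes and failures both at least 10
    if successes < 10 or n - successes < 10:
        raise ValueError("n*p_hat or n*(1 - p_hat) is below 10; interval not reliable")
    p_hat = successes / n
    z_star = stats.norm.ppf(1 - (1 - conf) / 2)
    m = z_star * np.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - m, p_hat + m

LL, UL = proportion_interval(50, 100)   # hypothetical: 50 successes in 100 trials
print(round(LL, 3), round(UL, 3))
```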
The margin of error
reflects the precision of the estimate, is calculated using the confidence level C, and typically uses C = 0.95.
A random sample of 110 banks was selected and the mean LTDR was found to be 76.7. The standard deviation of LTDR for all banks is known to be 12.3. Question: Based on the calculated CI, is it reasonable to conclude that the average LTDR is less than 80 for the population?
import numpy as np
from scipy import stats
sigma = 12.3
n = 110
xbar = 76.7
# determine z* for 95% confidence
z_star = stats.norm.ppf(0.975)
z_star = round(z_star, 3)
print(z_star)
# determine CI
LL = xbar - z_star * sigma/np.sqrt(n)
UL = xbar + z_star * sigma/np.sqrt(n)
print(round(LL, 1))
print(round(UL, 1))
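Carrying the given numbers through, and assuming 95% confidence as in the code above, yields the interval and the answer to the question. Because the whole interval lies below 80, it is reasonable to conclude that the population average LTDR is less than 80.

```python
import numpy as np
from scipy import stats

sigma, n, xbar = 12.3, 110, 76.7            # values given in the question
z_star = stats.norm.ppf(0.975)              # 95% confidence (assumed)
m = z_star * sigma / np.sqrt(n)             # margin of error
LL, UL = xbar - m, xbar + m
print(round(LL, 1), round(UL, 1))           # 74.4 79.0
print(UL < 80)                              # True: the entire interval is below 80
```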
test statistic
import numpy as np
sigma = 5
n = 4
mu0 = 227
xbar = 222
test_stat = (xbar - mu0) / (sigma / np.sqrt(n))
print(test_stat)
test stat / pval code
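import numpy as np
from scipy import stats
sigma = x   # placeholder: population standard deviation
n = y       # placeholder: sample size
mu0 = z     # placeholder: null value
xbar = 64.2
# test statistic
test_stat = (xbar - mu0) / (sigma / np.sqrt(n))
print(test_stat)
# p-value (two-sided)
pval = 2 * stats.norm.cdf(-abs(test_stat))
print(pval)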
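Filling the template with the numbers from the "test statistic" card (σ = 5, n = 4, μ0 = 227, x̄ = 222) gives a concrete run. The card does not state the alternative, so this sketch assumes a two-sided test.

```python
import numpy as np
from scipy import stats

sigma, n, mu0, xbar = 5, 4, 227, 222              # numbers from the test statistic card
test_stat = (xbar - mu0) / (sigma / np.sqrt(n))   # (222 - 227) / (5 / 2) = -2.0
pval = 2 * stats.norm.cdf(-abs(test_stat))        # two-sided p-value (assumed alternative)
print(test_stat, round(pval, 4))                  # -2.0 0.0455
```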
3.3. Using your average, z∗, and the population standard deviation σ = 3.059924, construct the 80% confidence interval for the average age of all coins in this population.
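import numpy as np
sigma = 3.059924   # population standard deviation given in 3.3
n = y              # placeholder: sample size
# xbar from 3.1 and z_star from 3.2
LL = xbar - z_star * sigma/np.sqrt(n)
UL = xbar + z_star * sigma/np.sqrt(n)
print(round(LL, 1))
print(round(UL, 1))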
1.1. Create the population histogram of age of the coins using 11 bins. Copy and paste the graphic.
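import seaborn as sns
import matplotlib.pyplot as plt
# sns.distplot is deprecated; histplot draws the same histogram (11 bins, white bars, black edges)
sns.histplot(coins['Age'], bins=11, color="white", edgecolor="black")
# call the label functions (assigning plt.ylabel = ("") would overwrite them)
plt.ylabel("")
plt.xlabel("")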
The two-sample t statistic is
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, which approximately follows a t distribution (conservatively, use df = min(n1 − 1, n2 − 1)).
C. Calculate the mean of all of the sample means determined in activity A.
# sum of the 10 sample means from activity A
total = 84 + 86.8 + 87.5 + 91.6 + 86 + 82.8 + 87.6 + 81.6 + 86.6 + 84.8
mu = total / 10
mu = round(mu, 4)
print(mu)
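How is t* related to z*?
For the same confidence level, t* is a little larger than z* because t distributions have heavier tails; t* approaches z* as the degrees of freedom increase.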
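A quick comparison of the 95% multipliers shows t* shrinking toward z* as df grows. The chosen df values are arbitrary examples.

```python
from scipy import stats

z_star = stats.norm.ppf(0.975)                    # 95% z multiplier
# 95% t multipliers for a few example degrees of freedom
t_stars = {df: stats.t.ppf(0.975, df=df) for df in (5, 10, 30, 100, 1000)}
for df, t_star in t_stars.items():
    print(df, round(t_star, 3), t_star > z_star)   # t* always exceeds z*
print("z*", round(z_star, 3))
```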
One-sample t margin of error
$m = t^* s/\sqrt{n}$
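The sketch below computes the margin of error directly from a sample. The five data values are hypothetical. Note the two details that are easy to miss: `ddof=1` for the sample standard deviation, and df = n − 1 for t*.

```python
import numpy as np
from scipy import stats

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical sample
n = len(data)
s = np.std(data, ddof=1)                      # sample standard deviation
t_star = stats.t.ppf(0.975, df=n - 1)         # 95% multiplier with df = n - 1
m = t_star * s / np.sqrt(n)                   # margin of error
print(round(m, 3))                            # 1.963
```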
Example: Jumping frogs. Recall the paper frog triple jump data where distances (in cm) jumped by paper frogs (copy [1] or construction [2]) were recorded. Question: What is the 95% confidence interval for the difference in average distance jumped by both types of paper frogs?
import numpy as np
from scipy import stats
# uses n1, n2, xbar1, xbar2, s1, s2, and df_min from the frog test above
t_star = stats.t.ppf(0.975, df=df_min)
t_star = round(t_star, 3)
print(t_star)
LL = (xbar1 - xbar2) - t_star * np.sqrt(s1**2/n1 + s2**2/n2)
UL = (xbar1 - xbar2) + t_star * np.sqrt(s1**2/n1 + s2**2/n2)
print(round(LL, 1))
print(round(UL, 1))
Conclusion: We are 95% confident that the true difference in average jumping distance between copy paper and construction paper frogs is between 2.2 and 6.5 cm, with construction paper frogs jumping further.
The larger the test statistic (in absolute value)
the farther the sample statistic is from μ0, relative to its standard deviation.
The general approach to inferential statistics is the following: perform probability calculations to distinguish, among the patterns seen in data, between
the patterns that are due to chance and the patterns that reflect a real feature of the phenomenon under study.
To compare the means from two groups,
the same response variable is measured for random samples from both populations.
As the sample size increases
the sample estimates will move closer to the population parameter, on average.
When σ is unknown,
the standard deviation of the sample mean is estimated by the standard error of the sample mean. The standard deviation of the sample mean is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The standard error of the sample mean is $SE_{\bar{x}} = s/\sqrt{n}$.
If the p-value is less than or equal to α,
then H0 is rejected, as the data sufficiently support Ha.
If both populations are Normal and σ1 and σ2 are known,
then the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is $N\left(\mu_1 - \mu_2, \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\right)$.
If the mean of the sampling distribution of a statistic is equal to the true value of the parameter being estimated,
then the statistic is an unbiased estimator of the parameter.
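A small simulation makes unbiasedness concrete. It is the same idea as averaging the sample means in activity C, but with many more samples. The population here is an assumption chosen for the demo: Normal with μ = 50, σ = 10, samples of size n = 25, and a fixed seed.

```python
import numpy as np

# Hypothetical Normal population; these parameters are assumptions for the demo
mu, sigma, n = 50.0, 10.0, 25
rng = np.random.default_rng(0)          # fixed seed so the run is reproducible

# 10,000 random samples of size n, one sample per row
samples = rng.normal(mu, sigma, size=(10_000, n))
sample_means = samples.mean(axis=1)     # one xbar per sample

print(round(sample_means.mean(), 2))        # close to mu = 50 (xbar is unbiased)
print(round(sample_means.std(ddof=1), 2))   # close to sigma/sqrt(n) = 2
```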
If the p-value is greater than α,
then H0 is not rejected, as the data do not sufficiently support Ha.
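The two decision-rule cards reduce to a single comparison. The sketch below shows it. The function name `decide` and the default α = 0.05 are illustrative choices.

```python
def decide(pval, alpha=0.05):
    """Decision rule: reject H0 when the p-value is at most alpha."""
    if pval <= alpha:
        return "Reject H0: the data sufficiently support Ha"
    return "Fail to reject H0: the data do not sufficiently support Ha"

print(decide(0.0000609))   # p-value reported for the frog test
print(decide(0.2))
```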
Question: What is the relationship between the critical values of a two-sided hypothesis test with a 5% significance level and the confidence level multiplier for a 95% confidence interval?
They are the same: the critical values ±z* = ±1.96 match the multiplier z* = 1.96 used for a 95% confidence interval.
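Because the multipliers match, a 95% z interval excludes μ0 exactly when the two-sided 5% z test rejects H0: μ = μ0. The sketch below checks this with the LTDR numbers. The helper names `ci_excludes` and `rejects_h0` are hypothetical.

```python
import numpy as np
from scipy import stats

z_star = stats.norm.ppf(0.975)   # 95% CI multiplier = 5% two-sided critical value

def ci_excludes(xbar, sigma, n, mu0):
    """True if mu0 falls outside the 95% z interval xbar +/- z* sigma/sqrt(n)."""
    m = z_star * sigma / np.sqrt(n)
    return mu0 < xbar - m or mu0 > xbar + m

def rejects_h0(xbar, sigma, n, mu0):
    """True if the two-sided z test rejects H0: mu = mu0 at alpha = 0.05."""
    z = (xbar - mu0) / (sigma / np.sqrt(n))
    return abs(z) > z_star

# LTDR example: both say 80 is not a plausible population mean
print(ci_excludes(76.7, 12.3, 110, 80), rejects_h0(76.7, 12.3, 110, 80))
```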
11. An experiment was conducted to record the jumping distances of paper frogs made from construction paper. Based on the sample, the corresponding 95% confidence interval for the mean jumping distance is (8.8104, 11.1248) cm. What is the corresponding 98% confidence interval for the mean jumping distance? Note: It is reasonable to assume that the sample size is large enough that the Central Limit Theorem holds.
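from scipy import stats
# the center of the 95% interval is the sample mean
xbar = (8.8104 + 11.1248) / 2
xbar = round(xbar, 4)
print(xbar)
# z* for 98% confidence
z_star = stats.norm.ppf(0.99)
z_star = round(z_star, 3)
print(z_star)
# the 95% half-width is 1.96 * SE, so rescale it by z*/1.96
moe = (11.1248 - 8.8104) / 2 / 1.96 * z_star
moe = round(moe, 4)
print(moe)
LL = round(xbar - moe, 4)
UL = round(xbar + moe, 4)
print(LL)
print(UL)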
Suppose that a random sample of 25 students is selected. Question: What is the probability that the sample mean of this sample is less than $350?
import numpy as np
from scipy import stats
# mu and sigma are the population mean and standard deviation from the problem setup
xbar = 350
z = (xbar - mu) / (sigma / np.sqrt(25))
p1 = stats.norm.cdf(z)
Suppose that a random sample of 50 students is selected. Question: What is the probability that the sample mean of this sample is less than $350?
import numpy as np
from scipy import stats
xbar = 350
z = (xbar - mu) / (sigma / np.sqrt(50))
p2 = stats.norm.cdf(z)
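The population mean and standard deviation are not given in this excerpt. The sketch below therefore uses hypothetical values (μ = 400, σ = 100) to show the effect of sample size. A larger n shrinks σ/√n, so the same $350 is more standard deviations below μ and the probability drops. Treating x̄ as Normal assumes a Normal population or a sample large enough for the Central Limit Theorem.

```python
import numpy as np
from scipy import stats

mu, sigma = 400, 100   # hypothetical population mean and SD (not given in these notes)
xbar = 350
# P(sample mean < 350) for each sample size
probs = {n: stats.norm.cdf((xbar - mu) / (sigma / np.sqrt(n))) for n in (25, 50)}
for n, p in probs.items():
    print(n, round(p, 5))
```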
3.1. Determine the average age. State your average using the notation for sample means.
import numpy as np
xbar = np.mean(samp_coins)
xbar = round(xbar, 4)
print(xbar)
confidence interval code
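import numpy as np
xbar = x    # placeholder: sample mean
sigma = y   # placeholder: population standard deviation
n = z       # placeholder: sample size
sigma_xbar = sigma / np.sqrt(n)
# 95% CI: 1.96 is z* for 95% confidence
LL = xbar - 1.96 * sigma_xbar
UL = xbar + 1.96 * sigma_xbar
print(round(LL, 1))
print(round(UL, 1))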
The confidence interval for population mean is
$\bar{x} \pm z^* \sigma/\sqrt{n}$.
Question: Is the test conclusion reliable?
Yes, because our sample size is large enough.
To assess how far the estimate is from the value under the null hypothesis, the estimate is standardized. Generally,
$z = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$
Question: What are the critical values of a two-sided hypothesis test with a 5% significance level?
±1.96, since z* = stats.norm.ppf(0.975) ≈ 1.96.
In the previous unit, the test statistic was defined as a measure of how far the sample statistic is from the null value. For the test of a population mean, that measure is
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
When the approximation is valid, the z statistic is
$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
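The statistic pairs naturally with its reliability check. The check is n p0 ≥ 10 and n(1 − p0) ≥ 10, mirroring the "n(p0) = 52(0.5) = 26 > 10" check in these notes. The helper name `one_proportion_z` and the 60-out-of-100 counts are hypothetical. Note that the standard error uses p0, not p̂.

```python
import numpy as np
from scipy import stats

def one_proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, plus the n*p0 and n*(1 - p0) reliability check."""
    reliable = n * p0 >= 10 and n * (1 - p0) >= 10
    p_hat = successes / n
    z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    return z, reliable

z, reliable = one_proportion_z(60, 100, 0.5)   # hypothetical: 60 successes out of 100
pval = 1 - stats.norm.cdf(z)                   # right-tailed, Ha: p > 0.5
print(round(z, 3), reliable, round(pval, 4))
```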
The test statistic is
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
The two-sample z statistic is
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
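The sketch below computes the two-sample z statistic when σ1 and σ2 are known, using the standard deviation of x̄1 − x̄2 from the sampling-distribution card. The helper name `two_sample_z` and all input numbers are hypothetical. The two-sided alternative is an assumed choice.

```python
import numpy as np
from scipy import stats

def two_sample_z(xbar1, sigma1, n1, xbar2, sigma2, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 when sigma1 and sigma2 are known."""
    sd_diff = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # SD of xbar1 - xbar2
    return ((xbar1 - xbar2) - delta0) / sd_diff

z = two_sample_z(52, 6, 36, 50, 8, 64)   # hypothetical summary values
pval = 2 * stats.norm.cdf(-abs(z))       # two-sided p-value (assumed alternative)
print(round(z, 3), round(pval, 4))
```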
3.2. Determine the appropriate value of z∗ for an 80% confidence interval.
from scipy import stats
# 80% confidence leaves 10% in each tail, so use the 0.90 quantile
z_star = stats.norm.ppf(0.90)
z_star = round(z_star, 3)
print(z_star)
Question: With 95% confidence, how different are the proportions of college students at private and public institutions who believe that the First Amendment protects hate speech?
import numpy as np
from scipy import stats
z_star = stats.norm.ppf(0.975)
z_star = round(z_star, 3)
print(z_star)
# phat1, phat2, n1, n2: sample proportions and sample sizes for the two groups
LL = (phat1 - phat2) - z_star * np.sqrt(phat1*(1-phat1)/n1 + phat2*(1-phat2)/n2)
UL = (phat1 - phat2) + z_star * np.sqrt(phat1*(1-phat1)/n1 + phat2*(1-phat2)/n2)
print(round(LL, 3))
print(round(UL, 3))
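The same interval can be written as a function of success counts, which keeps p̂1, p̂2, n1, and n2 together. The name `two_proportion_interval` and the example counts are hypothetical. The large-sample reliability checks for each group still apply before trusting the result.

```python
import numpy as np
from scipy import stats

def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample level-conf confidence interval for p1 - p2."""
    phat1, phat2 = x1 / n1, x2 / n2
    z_star = stats.norm.ppf(1 - (1 - conf) / 2)
    se = np.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
    diff = phat1 - phat2
    return diff - z_star * se, diff + z_star * se

LL, UL = two_proportion_interval(120, 200, 90, 200)   # hypothetical counts
print(round(LL, 3), round(UL, 3))
```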
Question: What is the margin of error?
$m = z^* \sigma/\sqrt{n}$
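The sketch below shows how m responds to the sample size and the confidence level, matching the earlier card on margin of error. The helper name `margin_of_error` is hypothetical, and σ = 12.3 is borrowed from the LTDR example. Quadrupling n halves m, and raising C from 95% to 99% widens it.

```python
import numpy as np
from scipy import stats

def margin_of_error(sigma, n, conf=0.95):
    """z margin of error m = z* sigma / sqrt(n)."""
    z_star = stats.norm.ppf(1 - (1 - conf) / 2)
    return z_star * sigma / np.sqrt(n)

sigma = 12.3                                            # sigma from the LTDR example
print(round(margin_of_error(sigma, 100), 3))
print(round(margin_of_error(sigma, 400), 3))            # 4x the sample size halves m
print(round(margin_of_error(sigma, 100, conf=0.99), 3)) # higher C gives a larger m
```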