STAT EXAM 2
What You Should Say (Continued)
"I am 90% confident that the true mean amount that students sleep is between 6.272 and 7.008 hours per night."
Illustration of Errors when H0:μ = μ0 vs. Ha:μ > μ0
(a) when the null hypothesis H0:μ = μ0 is true (b) when the alternative hypothesis Ha:μ = μa > μ0 is true. The value μa is chosen to represent a practically important effect
Two sample C-level confidence interval for 𝜇1 − 𝜇2
(for unknown, non-equal population variances) • C is the area between −t* and t* • Find t* in the t-table for the computed (or estimated) degrees of freedom, or use invT on the TI-83+ • Margin of error: m = t* × sqrt(s1²/n1 + s2²/n2) • CI: (x̄1 − x̄2) ± t* × sqrt(s1²/n1 + s2²/n2)
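As a minimal sketch of this interval (not part of the notes), the calculation can be done with scipy; the sample summaries below are made-up numbers used only to show the mechanics of the Welch degrees of freedom and the t* lookup.

```python
import numpy as np
from scipy import stats

# Hypothetical sample summaries (not from the notes)
n1, xbar1, s1 = 30, 7.1, 1.2
n2, xbar2, s2 = 25, 6.4, 1.5
C = 0.90

se = np.sqrt(s1**2 / n1 + s2**2 / n2)            # SE of xbar1 - xbar2
# Welch-Satterthwaite (estimated) degrees of freedom
df = se**4 / ((s1**2 / n1)**2 / (n1 - 1) + (s2**2 / n2)**2 / (n2 - 1))
t_star = stats.t.ppf((1 + C) / 2, df)            # critical value t*
moe = t_star * se                                # margin of error
ci = ((xbar1 - xbar2) - moe, (xbar1 - xbar2) + moe)
print(df, t_star, ci)
```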
Recap: The Steps of a Significance Test
1. Assumptions: randomization, sample size, population distribution. 2. Hypotheses: H0, Ha; one-sided or two-sided. 3. Test statistic: based on its sampling distribution. 4. P-value: depends on the form of Ha. 5. Conclusion: compare the P-value with the pre-assigned α and interpret.
Recap of Confidence Interval of p
Approximate sampling distribution: p̂ ≈ N(p, sqrt(p(1 − p)/n)) • When p is unknown, sqrt(p(1 − p)/n) is unknown • use p̂ in place of p: SE(p̂) = sqrt(p̂(1 − p̂)/n)
Paired Data: Assumptions and Conditions
Assumptions • Randomization of data • For paired data, find the differences to remove the subject effect. • The differences are independent. • Need to assume the differences follow a Normal model. The two-sample t-test and the paired t-test are not interchangeable: their probability models are not the same. Nearly Normal Condition: • Even if the individual measurements are skewed, bimodal, or have outliers, the differences may still be Normal. • Sketch a histogram and normal probability plot of the differences. • Normality is less important for larger sample sizes.
Factors Affecting Margin of Error
Confidence Interval: (estimate − moe, estimate + moe) • The margin of error for a normal population, z* × σ/sqrt(n): - increases as the confidence level increases - decreases as the sample size increases. These properties apply to all confidence intervals, not just the one for the population mean.
One major reason that the two-sample t procedures are widely used is that they are quite robust
Confidence levels and P-values from the t procedures are quite accurate even if the population distribution is not exactly Normal. The two-sample t statistic is most robust when both sample sizes are equal AND both sample data distributions are similar ⇒ the t distribution works for samples as small as n1 = n2 = 5.
Point Estimate and Interval Estimate
Estimator = Statistic • A point estimate is a single number that is our "best guess" for the parameter, i.e., it predicts a parameter by a single number (e.g., p̂ for p, x̄ for μ, s for σ). • An interval estimate is an interval of numbers that are believable for the parameter.
Why do we use a z-score from a normal distribution in constructing large-sample confidence intervals for a proportion?
For large random samples the sampling distribution of the sample proportion is approximately normal.
Test of significance of p1 - p2
Hypotheses: H0: p1 = p2 vs Ha: p1 > p2 (or <, or ≠, depending on the theory or belief). • If H0 is true, p1 = p2 ≡ p0: we are sampling twice from the same population and can pool the information from both samples to estimate the common population proportion p0. • The pooled sample proportion is p̂ = total successes / total observations = (X1 + X2)/(n1 + n2). If H0: p1 = p2 (or p1 − p2 = 0) is true and all sample counts (successes and failures in each sample) are 5 or more, the test statistic is z = (p̂1 − p̂2) / sqrt(p̂(1 − p̂)(1/n1 + 1/n2)) ≈ N(0, 1).
Uncertainty and Confidence
If you picked a different simple random sample (SRS) of the same size from a population, you would probably get a different sample proportion, and almost none of them would exactly equal the true population proportion you are interested in. • However, under a normal probability model, among all the possible intervals computed from all the possible sample proportions, 95% of those intervals will capture the unknown population proportion p and 5% of them will not. • Even though you don't know whether your computed interval contains the unknown p, you have the CONFIDENCE that this method gives correct results 95% of the time and wrong results 5% of the time.
CI When normal with σ unknown
In practice, we don't know the population standard deviation. Use the sample standard deviation s = sqrt(s²) to estimate σ. SD(X̄) = σ/sqrt(n) is replaced by s/sqrt(n) • SE(X̄) = s/sqrt(n), the standard error of X̄, is the estimator of SD(X̄). Substituting s/sqrt(n) for σ/sqrt(n) in (X̄ − μ)/(σ/sqrt(n)) while keeping the Normal model introduces some error. • W. Gosset: the t-distribution is the sampling distribution of the standardized sample mean for a normal population with unknown standard deviation; it depends on the sample size n.
When p is too big or too small
In the case of the confidence interval for a population proportion, the method works poorly for small samples because the normal approximation no longer applies. • Modify the formula if the normal approximation fails: apply the "plus four" method to modify the sample proportion and use the same confidence interval formula. The "plus four" method gives reasonably accurate confidence intervals. We act as if we had four additional observations: two successes and two failures. Thus the new sample size is n + 4 and the count of successes is X + 2. • The "plus four" point estimate (modified sample proportion) of p is p̃ = (X + 2)/(n + 4), and the interval is p̃ ± z* × sqrt(p̃(1 − p̃)/(n + 4)).
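A minimal sketch of the "plus four" interval (not from the notes); the counts below are made-up numbers chosen small enough that the ordinary large-sample interval would be questionable.

```python
import numpy as np
from scipy import stats

# Hypothetical counts (not from the notes): X successes out of n trials
X, n, C = 3, 25, 0.95

p_tilde = (X + 2) / (n + 4)                      # "plus four" point estimate
z_star = stats.norm.ppf((1 + C) / 2)             # critical value z*
moe = z_star * np.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
print(p_tilde, (p_tilde - moe, p_tilde + moe))
```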
Example Does smoking damage the lungs of children exposed to parental smoking? Forced Vital Capacity (FVC) is the volume (in milliliters) of air that an individual can exhale in 6 seconds. FVC was obtained for a sample of children not exposed to parental smoking and for a sample of children exposed to parental smoking.
Is the mean FVC lower in the population of children exposed to parental smoking? • Hypotheses: H0: µsmoke = µno ⇔ (µsmoke − µno) = 0; Ha: µsmoke < µno ⇔ (µsmoke − µno) < 0 (one-sided) • In the t-table, for df = 40, we find |t| > 3.551 ⇒ P < 0.0005 (one-sided) • Software gives P = P(t < −3.92) = 0.00014, highly significant ⇒ we reject H0 • Conclusion: Lung capacity is significantly impaired in children exposed to parental smoking, compared with children not exposed to parental smoking.
Which of the following statements about t distribution are true?
The t distribution approaches the standard normal distribution as the sample size increases; it assumes the population data are normally distributed; and it is used to construct confidence intervals for the population mean when the population standard deviation is unknown.
What does the margin of error in a confidence interval for a population mean describe
It describes how far the sample mean is likely to fall from the population mean. Because of random sampling, different samples give different sample means, so the margin of error bounds the likely size of that difference at the stated confidence level.
extra eq
LL = x̄ − E; UL = x̄ + E; E = (UL − LL)/2; x̄ = (UL + LL)/2; moe = t* × SE
Interpretation of the Confidence Level
Meaning of being 95% confident • a long-run interpretation - how the method performs when used over and over with many different random samples. • We use the 95% confidence interval method to obtain a confidence interval for a population parameter knowing this method gives correct results 95% of the time and wrong results 5% of the time in the long run. A confidence interval is an interval containing the most believable values for a parameter
If a test's P-value was P = 0.03, does that mean there is a 3% chance that H0 is true?
No, P = 0.03 means that the probability, assuming the null hypothesis is true, that the test statistic will take a value at least as extreme as that actually observed is 0.03.
Explain why we cannot safely use either the large-sample confidence interval or the test for comparing the proportions of normal and altered mice that develop tumors.
One of the counts is 0. For large-sample intervals, all counts must be at least 10, and for significance testing, all counts must be at least 5
calc CI and significance test
STAT -> TESTS -> 2-PropZInt... (confidence interval); STAT -> TESTS -> 2-PropZTest... (significance test)
Point Estimation
Suppose X has E(X) = μ and SD(X) = σ • Our interest: the (unknown) population mean μ • From X1, X2, ..., Xn compute X̄ • X̄ is an estimator for μ • The observed data x1, x2, ..., xn give the estimate x̄ of μ
If you were to construct a 95% confidence interval to estimate μw - μnw, which one of the following about this confidence interval is true?
The confidence interval wouldn't contain 0 because the p-value is less than 0.05.
Which of the following is an appropriate description of the significance level of a test?
The proportion of the time we expect that we'll reject null hypotheses when we really shouldn't.
Interpretation of Confidence Interval graph
Blue intervals capture the population mean μ; red ones do not. The fraction of blue intervals reflects the overall success rate of the method. Any one specific interval may or may not capture the true population mean.
Confidence Interval for p
Conditions: both np ≥ 10 and n(1 − p) ≥ 10 (or, using the data, both np̂ ≥ 15 and n(1 − p̂) ≥ 15). 95% confidence interval for p: (p̂ − 1.96 sqrt(p̂(1 − p̂)/n), p̂ + 1.96 sqrt(p̂(1 − p̂)/n)) — use p̂ in place of p. SD(p̂) = sqrt(p(1 − p)/n); SE(p̂) = sqrt(p̂(1 − p̂)/n).
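A minimal sketch of this interval (not from the notes); the counts are made-up numbers and the condition check uses the 15-successes/15-failures version above.

```python
import numpy as np
from scipy import stats

# Hypothetical data (not from the notes): 210 successes in 400 trials
X, n, C = 210, 400, 0.95
p_hat = X / n

# Large-sample condition using p_hat (at least 15 successes and 15 failures)
assert n * p_hat >= 15 and n * (1 - p_hat) >= 15

z_star = stats.norm.ppf((1 + C) / 2)             # 1.96 for C = 0.95
se = np.sqrt(p_hat * (1 - p_hat) / n)            # SE(p_hat), since p is unknown
print((p_hat - z_star * se, p_hat + z_star * se))
```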
narrowest/most informative CI
Decrease the confidence level and increase the sample size n.
response explanatory
response = dependent variable; explanatory = independent variable
Large-sample CI for estimating 𝑝1 − 𝑝2
estimator ± z* × SE(estimator), i.e., (p̂1 − p̂2) ± z* × sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) • z*: the z-score capturing the middle area C under the standard normal curve • the unknown p1 and p2 are replaced with p̂1 and p̂2, respectively. • Use this method when the number of successes and the number of failures are each at least 10 in each sample.
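A minimal sketch of the large-sample interval for p1 − p2 (not from the notes); the counts are hypothetical and are large enough to meet the 10-successes/10-failures rule above.

```python
import numpy as np
from scipy import stats

# Hypothetical counts (not from the notes)
X1, n1 = 120, 200
X2, n2 = 90, 180
C = 0.95

p1, p2 = X1 / n1, X2 / n2
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # SE of p1_hat - p2_hat
z_star = stats.norm.ppf((1 + C) / 2)
diff = p1 - p2
print((diff - z_star * se, diff + z_star * se))
```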
How should we interpret the standard error in the context of this study?
in repeated samples of the same size, the standard deviation of all sample proportions is estimated to be about 0.08.
point est v. interval est
point estimate only tells us the best guess for the parameter; an interval estimate also gives us a measure of how far away that guess is likely to be from the parameter
t distribution for 2 independent samples with σ1 = σ2
Pooled variance: s_p² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2); pooled test statistic: t = (x̄1 − x̄2)/(s_p × sqrt(1/n1 + 1/n2)) with df = n1 + n2 − 2 • it is hard to check the assumption of equal variances • the unequal-variance test is safer, so we only discuss the method assuming unequal population variances
p hat p
p̂ is a sample statistic; p is a population parameter.
CI
The sample proportion always lies in the interval, but we are unsure whether the population proportion does. Conditions: a random sample, with successes and failures both at least 10 (or 15). Don't use the word "chance" when interpreting one specific interval. Link to two-sided tests: if the P-value is 0.052, a 94% CI is the largest standard interval that excludes the null value (because α = 0.06 > 0.052), and a 95% CI is the smallest standard interval that includes it (because α = 0.05 < 0.052).
Their margin of error for estimating the proportion of schools that teach coding was 0.03. What does that margin of error represent?
We can be confident that the sample proportion (0.75) falls within 0.03 of the population proportion of schools that teach coding.
The sampling distribution of 𝑝^ 1 − 𝑝^2
Sampling distribution of p̂1 − p̂2: mean = p1 − p2; SD = sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2); approximately normal for large samples.
uncertainty of point estimate
standard error
test statistic
t = (x̄d − μd)/(sd/sqrt(n)), with df = n − 1
How to use t-table.
Upper-tail area = (1 − C)/2; read t* from the t-table in the row for df = n − 1 and the column for that tail area (or the confidence level C).
The power of the test against the specific alternative μ = 3 is which of the following?
The probability that the test rejects H0 when μ = 3 is true: Power = P(reject H0 | μ = 3).
A test result is highly significant, though the observed difference is quite small. The explanation is that
the sample size is very large.
How to compare two groups
Two categorical variables: independent samples → compare p1 − p2; dependent samples → McNemar's method. Two quantitative variables: independent samples → compare μ1 − μ2; dependent samples → estimate or test the mean difference μd.
Exercise. A hypothesis test has been conducted at the 0.05 significance level, resulting in a p-value of 0.25. Obviously, in this case, we would fail to reject H0. If an error was made, it would be a(n):
type II
Which of the following describes the meaning of the significance level α for a test of the hypotheses: H0: μ = 2 versus Ha: μ > 2 ? The power of the test in (a) when the true value of the population mean is μ = 5 is
α is the probability of selecting a sample whose data leads us to the decision to reject H0 when in reality μ = 2 is true. The power in (a) is the probability of selecting a sample whose data leads us to the decision to reject H0 when μ = 5 is true.
Two-sample t test comparing 𝜇1 and 𝜇2
• Assume the necessary conditions are satisfied. • With 2 independent random samples we test H0: µ1 = µ2 (or µ1 − µ2 = 0). Test statistic: t = (x̄1 − x̄2)/sqrt(s1²/n1 + s2²/n2), compared to a t distribution with the computed (or estimated) degrees of freedom.
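A minimal sketch of the two-sample (Welch) t-test in Python (not from the notes); the two data lists are made-up values, and `equal_var=False` requests the unequal-variance version discussed above.

```python
from scipy import stats

# Hypothetical raw data (not from the notes)
group1 = [7.2, 6.8, 7.5, 6.9, 7.1, 7.4, 6.7, 7.0]
group2 = [6.4, 6.1, 6.8, 6.5, 6.2, 6.9, 6.3, 6.6]

# equal_var=False gives the Welch test (unequal population variances)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(t_stat, p_value)    # two-sided P-value; halve it for a one-sided Ha
```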
Reducing Both Type I and II Errors
• Trade-off between α and β. Q: How can we reduce both? - Increase the sample size! - The standard deviation goes down, so the spread is narrower. - The "overlap" of the two sampling models is reduced. - Both α and β are reduced.
(contd.) Using the TI 83+/84+ calculator: a CI for μ1 − μ2
• Use TI83+/84+ , STAT TESTS 0: 2-SampTInt...
When the P-Value is Not Small
• Wrong: Accept H0 or We have proven H0. • Right: Fail to reject H0 There is insufficient evidence to reject H0. H0 may or may not be true. • Example: H0: All swans are white. If we sample 100 swans that are all white, there could still be a black swan. • A statistically non-significant result does not 'prove' the null hypothesis. • A statistically significant result does not 'prove' the alternative hypothesis.
Significance level
• α = P(reject H0 | H0 is true) = P(Type I error) • α = 0.05: a 5% risk of concluding that a difference exists when there is no actual difference. • α is the largest P-value tolerated for rejecting H0. • It is an arbitrary threshold that can be used when reporting whether a P-value is statistically significant. • P-value ≤ α: we say that the effect is statistically significant at level α. • If we set α = 0.05 and repeat the analysis using many samples of the same size n from the same population, we can expect to reject a true null hypothesis 5% of the time. • If you try the same analysis with 100 random samples of the same size n, you can expect about 5 of them to be statistically significant (reject H0 even though H0 is true).
2. The Steps of Hypotheses Test
① Validity of Assumptions ② Hypotheses ③ Test Statistic ④ P-value ⑤ Conclusion
Correct Decisions and Type I, Type II errors
H0 is rejected & true: Type I error, P(Type I error) = α = P(reject H0 | H0 is true). H0 is rejected & false (Ha is true): correct decision, P(reject H0 | H0 is false) = 1 − β = power of the test. H0 is not rejected & true: correct decision, P(fail to reject H0 | H0 is true) = 1 − α. H0 is not rejected & false (Ha is true): Type II error, P(Type II error) = β = P(fail to reject H0 | H0 is false).
The margin of error (MOE), m
m = z* × sqrt(p̂(1 − p̂)/n), with z* = 1.96 for 95% confidence. The margin of error accounts for the sampling error that occurs when we use a sample statistic to estimate an unknown population parameter.
P-value and statistically significant
• The P-value is a probability summary of the evidence against H0. • Small P-values say that the observed result would be unlikely to occur if H0 were true. • Small values are evidence against H0. • Small values indicate that the data are inconsistent with H0, so we reject H0. • We say that the results are statistically significant.
P-value and "Unusual, Rare Enough or Unlikely"
• P-value = P(TS is at least as extreme as the observed value | H0): - The P-value is the probability of getting results as unusual as those observed, given that H0 is true. • P-value ≠ P(H0 is true | observed value of TS): - The P-value never gives the probability that H0 is true. • P-value = 0.03: - Does not mean a 3% chance of H0 being correct. - It says that if H0 is correct, there would be a 3% chance of observing a statistic value like the one observed, or an even more extreme value.
Confidence Interval for 𝜇𝜇 (σ unknown)
• Population is N(μ, σ) • σ is unknown and estimated by s. • C is the area between the critical values −t* and t*. • We can find t* from the t-table. • The margin of error: m = t* × s/sqrt(n), where t* has n − 1 degrees of freedom • The CI for µ: x̄ ± t* × s/sqrt(n) • The critical value t* depends on - the confidence level C, and - the degrees of freedom n − 1
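A minimal sketch of this one-sample t interval (not from the notes); the data array holds made-up sleep-like values just to show how t*, the sample SD, and the margin of error fit together.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (not from the notes)
x = np.array([6.5, 7.2, 6.8, 7.5, 6.1, 7.0, 6.4, 7.3, 6.9, 6.6])
C = 0.90

n, xbar, s = len(x), x.mean(), x.std(ddof=1)     # ddof=1 gives the sample SD
t_star = stats.t.ppf((1 + C) / 2, df=n - 1)      # critical value t* with n-1 df
moe = t_star * s / np.sqrt(n)                    # margin of error
print((xbar - moe, xbar + moe))
```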
Exercise Suppose that a device advertised to increase a car's gas mileage actually does not work. We test it on a fleet of cars (with H0: not effective), and our data results in a P-value of 0.004. With 5% level of significance , what probably happens as a result of our experiment?
C. We reject H0, making a Type I error
1. Confidence Interval vs. Hypotheses Test
Confidence Intervals • Start with sample data and find a range of plausible values for the population parameter. • Always two-sided (estimate − moe < parameter < estimate + moe). • No preconceived notion of what the parameter should be; we simply want to estimate it. Hypothesis Test • Start with a proposed population parameter value and then use the sample data to see whether that value is implausible. - 1-sided test - 2-sided test
Step 2: Set up hypothesis H0 and Ha
A hypothesis is a statement about the population, not the sample. Each test has two hypotheses: • H0, the null hypothesis - usually represents no effect or no difference - a statement we want to disprove (or reject) - states that the parameter takes a single particular value. • Ha, the alternative hypothesis - states that something has changed or has an effect - a statement we hope the data will support - the value in Ha usually represents an effect of some type - states that the parameter falls in some alternative range of values. The hypotheses are formed before collecting data.
What affects the power of a significance test?
Power = 1 − β, where β is the probability of a Type II error. The factors below are illustrated numerically in the sketch after this list. • Effect size (|μ0 − μA|, or standardized as |μ0 − μA|/σ): the size of the specified effect divided by σ - larger effect sizes are easier to detect - a larger σ leads to lower power. • Significance level α (the probability of a Type I error): a lower α yields lower power. • Sample size n: larger samples have narrower sampling distributions and thus greater power.
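A minimal numeric sketch of how power grows with n (not from the notes): a one-sided z-test with known σ, using made-up values for μ0, μA, σ, and α.

```python
import numpy as np
from scipy import stats

def power_one_sided_z(mu0, muA, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0,
    assuming sigma is known and the true mean is muA (> mu0)."""
    z_crit = stats.norm.ppf(1 - alpha)               # rejection cutoff on the z scale
    shift = (muA - mu0) / (sigma / np.sqrt(n))       # standardized effect at size n
    return 1 - stats.norm.cdf(z_crit - shift)        # P(reject H0 | mu = muA)

# Hypothetical numbers (not from the notes): power rises as n increases
for n in (10, 25, 100):
    print(n, round(power_one_sided_z(mu0=2.0, muA=2.5, sigma=1.0, n=n), 3))
```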
The power, 𝛼 and 𝛽 and their relationship
• Power = P(reject H0 | H0 is false) = 1 − β. • The power of a test is its ability to detect a specified effect size (reject H0 when the given Ha is true) at significance level α. • Reducing α moves the critical value p* to the right; this increases β and decreases the power. • The larger the difference between p and p0, the smaller the chance of a Type II error and the greater the power.
Confidence interval vs Confidence level
A confidence interval is an interval containing the most believable values for a parameter. • Confidence level, C, is the probability that this method produces an interval that contains the true parameter. (i.e. The confidence level is the success rate for the method.) • C is a number chosen to be close to 1, most commonly 0.95. • We have confidence C that µ falls within the interval computed.
Confidence Interval with known σ
An interval of plausible values for an unknown parameter, constructed so that the method captures the true parameter value with a stated probability, the confidence level. • Population is normal with known σ, X ~ N(μ, σ). 95% confidence interval for μ: (X̄ − 1.96 σ/sqrt(n), X̄ + 1.96 σ/sqrt(n))
Confidence Interval for not normal pop.
Cases to be considered: 1. Population is not normal, with σ known. 2. Population is normal, but σ is not known. • Case 1: apply the Central Limit Theorem - the (approximate) confidence interval for the population mean µ is x̄ ± z*(σ/sqrt(n)) - the larger the sample size, the better the approximation. • Case 2: use the t-distribution.
Conditions for inference comparing two means
Conditions • Two simple random samples, representing two distinct populations. • The samples are independent. • We measure the same quantitative variable for both samples. Theory • Symbol 𝑋𝑖~𝑁(𝜇𝑖, 𝜎𝑖), 𝑖 = 1,2 Goal • Confidence interval for 𝜇1 − 𝜇2 • Testing Hypotheses 𝐻0: 𝜇1 − 𝜇2 = 0 vs. 𝐻𝑎: 𝜇1 − 𝜇2 ≠ 0
Exercise Suppose that a manufacturer is testing one of its machines to make sure that the machine is producing more than 97% good parts (H0: p = 0.97 and HA: p > 0.97). The test results in a P-value of 0.12. In reality, the machine is producing 99% good parts. What probably happens as a result of our testing?
E. We fail to reject H0, making a Type II error
How to find point estimate and margin of error from CI
For symmetric confidence intervals: - the margin of error is half the width of the CI - the point estimate is the midpoint of the CI. Given a CI (a, b): m = (b − a)/2, p̂ = (a + b)/2.
Sample mean and margin of error from CI
For symmetric confidence intervals: - the margin of error is half the width of the CI - the point estimate is the midpoint of the CI. Given a CI (a, b): m = (b − a)/2 (half width of the CI); x̄ = (a + b)/2 (midpoint of the CI).
The margin of error (𝑚)
How accurate we believe our guess is, based on the variability (SD) of the estimator. Generic CI: (estimate − m, estimate + m). • The margin of error depends on - the level of confidence, and - the standard deviation of the sample statistic.
Choice of Sample Size
We often need a certain margin of error when estimating an unknown population parameter (e.g., drug trials, manufacturing specs). • The population variability is fixed, but we can choose the number of measurements (sample size n) to achieve the desired margin of error: m = z* σ/sqrt(n) ⇔ n = (z*σ/m)², always rounded up. - The critical value z* depends on the desired confidence level. - We do not use the t* critical value because it depends on n.
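A minimal sketch of this sample-size calculation (not from the notes); σ, m, and C are made-up target values used only to show the rounding-up step.

```python
import math
from scipy import stats

# Hypothetical targets (not from the notes): sigma guess 15, want m = 3 at 95%
sigma, m, C = 15, 3, 0.95

z_star = stats.norm.ppf((1 + C) / 2)
n = math.ceil((z_star * sigma / m) ** 2)   # always round up
print(z_star, n)
```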
REVIEW: t procedures
One-sample t procedure: population parameters µ and σ unknown; one sample summarized by its mean x̄ and standard deviation s (inference about µ). Two-sample t procedure: population parameters µ1, σ1, µ2, σ2 unknown; two independent samples (unrelated individuals in the two samples); we summarize each sample separately (inference about µ1 − µ2). Matched-pairs t procedure: population parameters µdiff and σdiff unknown; two paired datasets (from a matched-pairs design); from the n pairwise differences we compute x̄diff and sdiff (inference about µd).
Interpretation of Confidence Interval
P(A) = 0.95 ≈ #(A)/#(trials) in the long run. Because the event |x̄ − µ| < m is symmetric in x̄ and µ, the interval x̄ ± m captures µ exactly when x̄ falls within m of µ. Interpretation: - 95% of all possible sample means fall within 1.96 SD (σ/sqrt(n)) of the population mean µ. - 95% of all intervals computed with this method capture µ.
Comparing Pairs and No Pairs
Pairing • With pairing husbands and wives, there was a clear difference in ages. • 95% CI: (1.6, 2.8). Without pairing • Without pairing, this would not have been as conclusive. • From the box plots the difference is hard to detect (side-by-side box plots are not appropriate for displaying paired data). Note • DON'T look for the difference between the means of paired groups with side-by-side box plots. • Comparing box plots is likely to be misleading.
Confidence intervals formula with MOE
Parameters could be means, proportions, variances, standard deviations, regression slopes, and others. • Most confidence intervals (CIs) have the form point estimate ± margin of error, e.g., (p̂ − MOE, p̂ + MOE). Not all CIs are symmetric about the point estimate; some complex methods produce asymmetric intervals.
Pros and Cons of Pairing
Pros • Can significantly reduce variability by focusing just on what is being compared. • Pairing is an example of effective blocking. Cons • Fewer degrees of freedom. • Each pair considered as a single data value instead of two values. The advantage outweighs the disadvantage, so consider pairing if possible.
3. Significance test for population mean µ
Recall, when the population is normal: ① If σ is known, z = (x̄ − µ0)/(σ/sqrt(n)) ~ N(0,1); the test procedure is called a Z-test. ② If σ is unknown, t = (x̄ − µ0)/(s/sqrt(n)) ~ t(n−1); the test procedure is called a T-test.
The t distributions
Recall X ~ N(μ, σ) with known σ. When σ is unknown and estimated by S, the statistic T ≡ (X̄ − μ)/(S/sqrt(n)) follows the (Student) t distribution with n − 1 d.f., ~ t(n − 1). Bell shaped and symmetric about 0. • The probabilities depend on the degrees of freedom, df = n − 1, where n is the sample size. The t-distribution ⇒ Z distribution as n → ∞. • Has thicker (heavier) tails than the standard normal distribution, i.e., it is more spread out. • Less peaked than the standard normal distribution, i.e., less dense at the center. • Margin of error = (t-score) × S/sqrt(n). Note: use z* × (σ/sqrt(n)) when the population is normal and the SD is known.
The sampling distribution of X̄1 − X̄2 when σi known
Sample: ni, X̄i, Si, i = 1, 2 — the ith sample size, sample mean, and sample standard deviation. E(X̄1 − X̄2) = μ1 − μ2. SD(X̄1 − X̄2) = sqrt(σ1²/n1 + σ2²/n2) when the σi are known; when they are unknown, estimate it by SE = sqrt(s1²/n1 + s2²/n2). • For non-normal populations with large enough ni, the distribution is approximately normal by the CLT.
The Paired t-Test
Significance test of the mean difference • Assume the conditions are met. • H0: μd = 0 vs Ha: μd ≠ 0 (or > 0, or < 0). • Test statistic: t = (x̄d − 0)/(sd/sqrt(n)) ~ t(n−1), where x̄d is the sample mean of the pairwise differences, sd is the sample standard deviation of the pairwise differences, and n is the number of pairs. • Find the P-value using the t-distribution model with n − 1 df.
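A minimal sketch of the paired t-test in Python (not from the notes); the before/after arrays are made-up measurements, and the second call shows that the paired test is just a one-sample t-test on the differences.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements on the same subjects (not from the notes)
before = np.array([12.1, 11.4, 13.2, 10.8, 12.6, 11.9])
after  = np.array([11.5, 11.0, 12.4, 10.9, 12.0, 11.3])

t1, p1 = stats.ttest_rel(before, after)          # paired t-test
d = before - after                               # pairwise differences
t2, p2 = stats.ttest_1samp(d, popmean=0)         # one-sample t-test on the differences
print(t1, p1)
print(t2, p2)    # identical to the paired test
```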
Statistical Inference.
Statistical procedures (methods) to draw conclusions about a population from sample data. Two areas: 1. Estimation • point estimation (standard error) • interval estimation (confidence level). 2. Testing • complementary hypotheses (significance level or P-value). Using the sampling distributions of statistics, we express how trustworthy our conclusions are.
Using TI83+/84+ Calculator to find P-value when given a value of test statistic t and the alternative hypothesis
Suppose Ha: μ ≠ μ0 and n = 10, TS t = 2.70; then P-value = 2×P(T > 2.70) = 2 × tcdf(2.70, 1E99, 9) = 0.0244. Suppose Ha: μ > μ0 and n = 10, TS t = 2.70; then P-value = P(T > 2.70) = tcdf(2.70, 1E99, 9) = 0.0122. Press 2nd, VARS, select tcdf(, then enter the lower bound, upper bound, and degrees of freedom.
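For the same numbers (t = 2.70, n = 10, df = 9), a quick sketch of the equivalent calculation in Python, using the t survival function in place of tcdf(.

```python
from scipy import stats

t_obs, df = 2.70, 9

p_one_sided = stats.t.sf(t_obs, df)    # P(T > 2.70), same role as tcdf(2.70, 1E99, 9)
p_two_sided = 2 * p_one_sided          # double the tail for Ha: mu != mu0
print(round(p_one_sided, 4), round(p_two_sided, 4))   # ~0.0122 and ~0.0244
```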
Logic of confidence interval test
Suppose we found that a 99% confidence interval for the true mean bacterial density is x̄ ± m = 28 ± 1.5, or 26.5 to 29.5 million bacteria per ml. With 99% confidence, could the population mean be µ = 25? Or µ = 29? Reject H0: µ = 25 (it lies outside the interval); cannot reject H0: µ = 29 (it lies inside). A confidence interval gives a range of likely values for the true population mean µ and can also give a black-and-white answer: reject or don't reject H0 in a two-sided test for µ. A P-value quantifies how strong the evidence is against H0; but if you reject H0, the P-value alone doesn't provide information about the true population mean µ.
P-value for a one- or two-sided alternative when testing the population proportion p
The P-value is the probability, if H0 were true, of obtaining a test statistic like the one computed or more extreme in the direction of Ha. Use the z-table or the TI-calculator function "normalcdf(" to find the tail probability; double the tail probability for a two-sided alternative hypothesis, or use "1-PropZTest" when the raw data are provided.
What value for the confidence level C?
The confidence level C can be 90%, 96%, 99%, etc. It doesn't have to be 95%, although that is a common choice. 90%: p̂ ± 1.645 × SE(p̂); 99%: p̂ ± 2.576 × SE(p̂).
Deciding 1 or 2 sided Hypotheses
The direction of alternative hypothesis depends on the context of the real problem. - one-sided: 𝐻a: 𝜇 > 𝜇0 (upper) or 𝐻a:𝜇 < 𝜇0(lower) - two-sided: 𝐻a: 𝜇 ≠ 𝜇0
Effects of Confidence Level and Sample Size on Margin of Error
The margin of error m = z* × SE(p̂) depends on • Confidence level: m increases as the confidence level increases. • Standard error: m decreases as the sample size increases. • A 99% confidence interval is wider than a 95% confidence interval. • A confidence interval with 200 observations is narrower than one with 100 observations at the same confidence level. • These properties apply to all confidence intervals.
Objectives: Compare two groups of quantitative variables
Two population means, μ1 and μ2. Two independent samples (sampling distribution of X̄1 − X̄2): make statistical inferences by a confidence interval for μ1 − μ2 (the difference of two population means) and a significance test for μ1 − μ2. Matched-pairs data, two dependent samples (sampling distribution of X̄d): make statistical inferences by a confidence interval for μd (the population mean of the differences) and significance tests for μd.
Long Run versus Subjective Probability Interpretation of Confidence
Warning: You might be tempted to interpret a statement such as "We can be 95% confident that the population proportion p falls between, say, 0.42 and 0.48" as meaning that the probability is 0.95 that p falls between 0.42 and 0.48. However, probabilities apply to statistics (such as the sample proportion), not to parameters (such as the population proportion). The 95% confidence refers not to a probability for the population proportion p but rather to a probability that applies to the confidence interval method in its relative frequency sense: If we use it over and over for various samples, in the long run we make correct inferences (i.e., cover the true parameter) 95% of the time. Once a confidence interval is computed based on observed sample statistic from a random sample, it either includes the true parameter or it doesn't. It doesn't make sense to say 'probability' that the parameter value is within this particular CI.
Choosing a Sample size for Estimating p
You may need to choose a sample size large enough to achieve a specified margin of error. • Because the sampling distribution of p̂ is a function of the unknown population proportion p, this process requires that you guess a likely value p* for p: n = (z*/m)² p*(1 − p*), where z* is the critical value for the desired confidence level C and p* is a guessed value for the population proportion. Make an educated guess, or use p* = 0.5 (the most conservative choice).
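A minimal sketch of this calculation (not from the notes); the target m = 0.03 and C = 0.95 are made-up values, and p* = 0.5 is the conservative guess described above.

```python
import math
from scipy import stats

# Hypothetical target (not from the notes): m = 0.03 at 95% confidence
m, C = 0.03, 0.95
z_star = stats.norm.ppf((1 + C) / 2)

p_star = 0.5                                   # the most conservative guess for p
n = math.ceil((z_star / m) ** 2 * p_star * (1 - p_star))
print(n)   # about 1068 with these assumed numbers
```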
Independent samples or Dependent samples?
Two independent samples: A sample of 64 students from the University of Utah. Students were randomly assigned to a cell phone group or to a control group, 32 to each. On a simulation of driving situations, a target flashed red or green at irregular periods. Participants pressed a brake button as soon as they detected a red light. The control group listened to the radio or books-on-tape while they performed the simulated driving. The cell phone group carried out a phone conversation about a political issue with someone in a separate room. The experiment measured each subject's mean response time over many trials. Two dependent samples: A sample of 32 students from the University of Utah. On a simulation of driving situations, a target flashed red or green at irregular periods. Participants pressed a brake button as soon as they detected a red light. Each participant is also his or her own match. In a randomly assigned order, participants either listened to the radio or books-on-tape while they performed the simulated driving or carried out a phone conversation about a political issue with someone in a separate room. Reaction times for the sample subjects using and not using cell phones are recorded.
Exercise Geckos are lizards with specialized toe pads that enable them to easily climb even slick surfaces. Researchers want to know if male and female geckos differ significantly in the size of their toe pads. In a random sample of Tokay geckos, they find that the mean toe pad area is 6.0 cm2 for the males and 5.3 cm2 for the females
H0: μmale − μfemale = 0. Should the alternative hypothesis be one-sided or two-sided? Two-sided (Ha: μmale − μfemale ≠ 0), because the researchers only ask whether the toe-pad sizes differ.
Interpreting CI for Comparing Two Proportions
• (- , +): If the confidence interval includes zero, then it is plausible (but not necessary) that p1 = p2 . There is no evidence of a significant difference between the two proportions from the two independent populations. • (+, +): If the confidence interval does not include zero, and the values in the interval are positive (a, b), then it is plausible that p1 is between a more and b more than p2. (for 1st group minus 2nd group) • (-, -): If the confidence interval does not include zero, and the values in the interval are negative (-a, -b), then it is plausible that p1 is between b less and a less than p2. In other words, p2 is between a more and b more than p1. • The magnitude of values in the confidence interval tells you how large any true difference is. If all values in the confidence interval are near 0, the true difference may be relatively small in practical terms.
Interpret a CI for 𝜇1 − 𝜇2
• (−, +): Check whether 0 falls in the interval. When it does, 0 is a plausible value for μ1 − μ2, meaning that it is possible that μ1 is equal to μ2. • (+, +): A confidence interval for μ1 − μ2 that contains only positive numbers suggests that μ1 − μ2 is positive. We then infer that μ1 is larger than μ2. • (−, −): A confidence interval for μ1 − μ2 that contains only negative numbers suggests that μ1 − μ2 is negative. We then infer that μ1 is smaller than μ2.
Statistical Significance vs Practical Significance
• A downside of hypothesis testing is that we may get a statistically significant outcome that has little or no practical importance. • Statistical significance says whether the effect observed is likely due to chance or some factor of interest. • Statistical significance may not be practically important. Example: A drug to lower temperature is found to consistently lower a patient's temperature by 0.4° Celsius (P-value < 0.01). But clinical benefits of temperature reduction require a 1°C decrease or greater.
Step 3: Test Statistic (T.S.)
• A test statistic (T.S.) describes how far the point estimate falls from the parameter value given in H0 (usually in terms of the number of standard errors between the two). • If the test statistic falls far from the value suggested by H0 in the direction specified by Ha, it is evidence against H0 and in favor of Ha. • We use the test statistic to assess the evidence against the null hypothesis by giving a probability, the P-value.
The role of sample size in statistical significance
• All else being equal, the larger the sample size is, the larger the absolute value of the calculated test statistic is. Example: t = (x̄ − μ0)/(s/sqrt(n)) = ((x̄ − μ0)/s) × sqrt(n). • A small absolute difference between two hypothesized means, for example, is more likely to be reported as statistically significant if it is based on a sample of n = 1000, say, than on a sample of n = 10.
t distribution for 2 independent samples with 𝜎1 ≠ 𝜎2
• Both populations should be Normally distributed. • In practice, it is enough that both distributions have similar shapes and that the sample data contain no extreme outliers. • Test statistic: t = (x̄1 − x̄2)/sqrt(s1²/n1 + s2²/n2), with estimated (Welch–Satterthwaite) degrees of freedom df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)].
4. Two-Sided Tests vs. Confidence Intervals
• Conclusions about population means using two-sided significance tests are consistent with conclusions using confidence intervals (from the same sample data). • A two-sided significance test with significance level α rejects the null hypothesis H0: μ = μ0 when the value μ0 falls outside a level 1 − α confidence interval for μ: |x̄ − μ0|/(s/sqrt(n)) > t(n−1; α/2) ⇔ μ0 > x̄ + t(n−1; α/2) × s/sqrt(n) or μ0 < x̄ − t(n−1; α/2) × s/sqrt(n).
Step 5: Conclusion
• Could random variation alone account for the difference between H0 and observations from a random sample? • Reports the P-value and interprets what it says about the question that motivated the test. - Small P-values are strong evidence AGAINST H0 and we reject H0. The findings are "statistically significant." - P-values that are not small don't give enough evidence against H0 and we fail to reject H0. Beware: Failing to reject H0 does not make H0 true. • An accompanying confidence interval helps also.
Significance Test for a Proportion
• H0: p = p0 (a given value we are testing) versus HA: p > p0 (or HA: p < p0; or HA: p ≠ p0). • If H0 is true, the sampling distribution of p̂ ≈ N(p0, sqrt(p0(1 − p0)/n)), and the standardized form of p̂ is z = (p̂ − p0)/sqrt(p0(1 − p0)/n) ≈ N(0, 1). • This is valid when both expected counts — expected successes np0 and expected failures n(1 − p0) — are each 10 or larger.
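A minimal sketch of this one-proportion z-test (not from the notes); the counts and p0 are made-up, and the assert mirrors the expected-count condition stated above.

```python
import numpy as np
from scipy import stats

# Hypothetical data (not from the notes): H0: p = 0.5 vs Ha: p > 0.5, 60 successes in 100
X, n, p0 = 60, 100, 0.5

# Validity check: expected successes and failures under H0 both at least 10
assert n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = X / n
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)   # standardized test statistic
p_value = stats.norm.sf(z)                      # upper-tail P-value for Ha: p > p0
print(z, p_value)
```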
What Can Go Wrong?
• Null and alternative hypotheses are statements about population parameters, made before data are collected. • Don't base your H0 on what you see in the sample data. - Changing the null hypothesis after looking at the data is just wrong. • Don't base your Ha on what you see in the sample data. - Both the null and alternative hypotheses must be stated before peeking at the data. • Don't accept the null hypothesis. - H0 represents the status quo. You either reject or fail to reject, but never accept, the null hypothesis. You can only say you don't have evidence to reject H0. • Don't interpret the P-value as the probability that H0 is true. - The P-value is a probability about the sample statistic, not about the hypothesis for the parameter. - It is the probability of observing a sample statistic this unusual given that H0 is true. • Don't believe too strongly in arbitrary α levels. - P-value = 0.0499 and P-value = 0.0501 are basically the same. Often it is better to report just the P-value.
The P-Value and Surprise
• P(TS is at least as extreme as the observed value | H0). • Tells us how surprised we would be to get these data if H0 is true. • P-value small: either H0 is not true or something remarkable occurred. • P-value not small enough: not a surprise; the data are consistent with the model. Do not reject H0.
Step 4: P-value
• P(TS is at least as extreme as the observed value | H0). • Under the assumption of H0, the probability that the test statistic (TS) takes a value like the observed one or even more extreme. • Extreme means the test statistic has values in the direction of Ha; if Ha: µ < 5, then extreme means the TS is smaller than the observed value. • The sampling distribution of the TS is considered under H0. • Summarize how far out in the tail the test statistic falls by the tail probability of that value and values even more extreme. • A probability summary of the evidence against H0: the smaller the P-value, the stronger the evidence against H0.
5. Significance Test for a Proportion
• Recall: What are the steps of significance test? • Conditions for inference on proportions (Assumptions) 1. The data used for the estimate are a SRS from the population studied. (the most important) 2. Categorical Data (summarized by proportion) 3. The sample size n is large enough that the shape of the sampling distribution of sample proportion is approximately Normal. How large depends on the type of inference conducted
One-sample t-test
• Significance test for µ when σ is unknown. 1) Assumptions: quantitative variable, randomized data, population distribution is normal (for small n). 2) State the hypotheses (BEFORE data collection): H0: μ = μ0, where μ0 is the hypothesized numerical value; Ha: μ > μ0 or Ha: μ < μ0 (one-sided) or Ha: μ ≠ μ0 (two-sided). 3) Calculate the test statistic t = (x̄ − μ0)/(s/sqrt(n)). 4) Find the P-value using a t-table or calculator (T-Test). 5) Conclusion: smaller P-values give stronger evidence against H0. Describe your results in the context of the practical question so a general audience can understand.
Statistical Hypotheses test
• A study finding is a claim about the unknown value of a population parameter. We check whether or not this claim makes sense in light of the "evidence" gathered (sample data). A hypothesis is a statement about the population parameter, usually of the form that a certain parameter takes a particular numerical value or falls in a certain range of values. A statistical hypothesis test is a method of using data to summarize the evidence about a hypothesis. • When interpreting a finding in a study, a natural question arises as to whether the finding could have occurred just by chance. • Hypothesis testing is a statistical procedure for testing whether chance is a plausible explanation of a study finding. • We compute the probability that the finding could have occurred by chance alone. • If this probability is very small, then it is unlikely the finding occurred by chance alone.
Matched-pairs Design
• Subjects are matched in legitimate (non-arbitrary) pairs and each treatment is given to one subject in each pair. • Or record two responses for each subject (repeated measures), as in before-and-after observations, or when two treatments are given at different times to the same subject. Examples • Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed. • Twin studies often try to sort out the influence of genetic factors by comparing a variable between sets of twins. • Using people matched for age, sex, and education in social studies allows us to cancel out the effect of these potential lurking variables.
Plus Four CI for 2 proportions
• The "plus 4" method improves the accuracy of the confidence interval. We act as if we had 4 additional observations: 1 success and 1 failure in each of the 2 samples. • The new combined sample size is n1 + n2 + 4 and the modified proportions of successes are p̃1 = (X1 + 1)/(n1 + 2) and p̃2 = (X2 + 1)/(n2 + 2). • CI: (p̃1 − p̃2) ± z* × sqrt(p̃1(1 − p̃1)/(n1 + 2) + p̃2(1 − p̃2)/(n2 + 2)).
A graphical presentation of P-value
• The P-value is the probability that Test Statistic has value like the observed one or even more extreme under H0 . • 𝐻a: 𝜇 > 𝜇0; the shaded area in the tail of the sampling distribution.
Calculation of P-value
• The P-value is the probability, if H0: μ = μ0 were true, of randomly drawing a sample like the one obtained or more extreme in the direction of Ha. One-sided (one-tailed): Ha: μ > μ0 → P(T > t); Ha: μ < μ0 → P(T < t). Two-sided (two-tailed): Ha: μ ≠ μ0 → 2P(T > |t|). Test statistic: t = (x̄ − μ0)/(s/sqrt(n)).
Robustness of matched pairs t procedures
• The matched-pairs t procedures for comparing two dependent samples are the same as the one-sample t procedures. • However, their notations and interpretations are different. • As a guideline for sample size, we follow the same rules as for the one-sample t procedures: - when n < 15, the data must be close to Normal and without outliers; - when 15 ≤ n < 40, mild skewness is acceptable, but not outliers; - when n ≥ 40, the t statistic will be valid even with strong skewness.
P-value and Significance level α
• The significance level (α): - the threshold for deciding whether the P-value is low enough to reject H0 - the largest P-value tolerated for rejecting H0 - decided (somewhat arbitrarily) before data collection. • P-value ≤ α: - reject H0 - statistically significant (does not imply Ha is actually true). • P-value > α: - fail to reject H0 - not statistically significant (does not imply H0 is actually true). • α specifies how small the P-value must be to reject H0. • α = 0.05 is most common (a 1-in-20 chance is pretty rare). • 0.001, 0.01, and 0.1 are other commonly used levels of significance.
Step 1: Assumptions
• The statistical test assumes that the data is produced using randomization. • Other assumptions: - the sample size, - population distribution.
Illustration of errors for 𝐻0: 𝑝 = 𝑝0 vs. 𝐻𝑎: 𝑝 > 𝑝0
• In the top figure, p0 is the true proportion. A p̂ far above p0 (p̂ > p*, leading to a P-value less than α) results in a Type I error. • In the bottom figure, p is the true proportion. A p̂ well below p (p̂ < p*, leading to a P-value greater than α) results in a Type II error.
Robustness of the two-sample t procedures
• The two-sample t statistic is most robust when both sample sizes are equal AND both sample data distributions are similar ⇒ the t distribution works for samples as small as n1 = n2 = 5. • But even when we deviate from this, two-sample t procedures tend to remain quite robust ⇒ the t distribution works even with the most skewed distributions when (n1 + n2) ≥ 40. • A histogram, dotplot, stemplot, or Normal quantile plot will help you determine whether the data show evidence of deviations from Normality.
An analogy: Decision in a Legal Trial
𝐻0: The defendant is innocent vs. 𝐻𝑎: The defendant is guilty Jury Decisions • Type I error (reject a true 𝐻0): Found guilty when the defendant is actually innocent. Put an innocent person in jail. • Type II error (fail to reject a false 𝐻0): Not enough evidence to convict, but the defendant is actually guilty. A murderer goes free . • Correct decision: reject a false 𝐻0 or fail to reject a true 𝐻0
Effect size and statistical significance
H0: μ = μ0 vs HA: μ > μ0 • Observed effect = |x̄ − μ0| • Effect size = |μA − μ0|, where μA > μ0 • Standardized effect size = |μA − μ0|/σ • Don't conclude that a small P-value is necessarily due to a large effect size, for all three factors affect the P-value. • Even a tiny effect can be highly statistically significant if the sample size is very large, individual variations are very small, or both. • Conversely, even a large effect size may not reach statistical significance if the sample size is too small, the data are extremely variable, or both.
Calculation of Sample size in practice
n = (z*σ/m)². In practice, we don't know the value of σ. • Substitute an educated guess for σ: - the sample standard deviation from a similar study, or - the range of the data divided by 6 (since P(μ − 3σ < X < μ + 3σ) ≈ 0.997). • Or conduct a pilot study (a smaller-scale study with a small sample) and collect the information needed to design the full study.
Sampling Distribution for 𝑝 hat
p̂ ≈ N(p, sqrt(p(1 − p)/n)) ⇔ (p̂ − p)/sqrt(p(1 − p)/n) ≈ N(0, 1), when np ≥ 10 and n(1 − p) ≥ 10.
Paired T Interval for estimating μdiff or μd
μd or μdiff: the mean of the pairwise differences in the response to the two conditions within matched pairs of subjects in the entire population. When the conditions are met, the CI for μd is x̄d ± t*(n−1) × (sd/sqrt(n)), where • x̄d is the sample mean of the pairwise differences • sd is the sample standard deviation of the pairwise differences • n is the number of pairs • t*(n−1) is the critical value from the Student's t-model corresponding to the confidence level C and degrees of freedom df = n − 1.