t-tests and effect size


Translating Critical Values to Raw Scores

"𝐶𝐼%&''"=𝜇±𝑡()*+(,'∗𝜎𝑛 We don't actually create CI for Nulls, this is just for the example and for translating. Point Estimate:The NULL's mean, in the original number scale t-critical:The level of significance I want (ex. 𝛼=.05, t = refer to chart) in standardized scale Margin of Error:The amount of error based on SE and a desired level of confidence in the original number scale Standard Error (SE):Standard deviation of the sampling distribution

NULL is Zero

Because of random error and chance, the difference will not always be zero.
- Just like before, we want to know a range of reasonable values (a range of "zero enough")
- If our t-statistic is far enough outside the "zero enough" range, we can reject the null and conclude there is a difference in pre and post scores

Degrees of Freedom

Because we have two sample groups now, we will have to update our degrees of freedom to df = N − 2, where N is the total number of scores across both groups.

Standardized Differences

Cohen's d is a standardized difference between two means, while the t-test uses a sampling distribution to test a hypothesis.

Hypothesis Formulation - Differences in Means

In this class, and for most comparisons between groups, the null hypothesis assumes a true difference of 0, so no difference between the groups. That is, we usually assume: μ₁ − μ₂ = 0

What would happen to the standard error and to the t-statistic if we increased the sample?

Increasing the sample size:
- Gives us more degrees of freedom and a lower critical value
- Decreases the standard error
- Increases the t-statistic
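A small numeric sketch of this relationship, using hypothetical numbers (a fixed 2-point mean difference and s = 4): a larger n shrinks the standard error, inflates the t-statistic, and lowers the critical value.

```python
import math
from scipy import stats

def one_sample_t(mean_diff, s, n, alpha=0.05):
    """Return (standard error, t-statistic, two-tailed critical value)."""
    se = s / math.sqrt(n)
    t_stat = mean_diff / se
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return se, t_stat, t_crit

# Same mean difference and standard deviation, two different sample sizes
for n in (25, 100):
    se, t_stat, t_crit = one_sample_t(mean_diff=2.0, s=4.0, n=n)
    print(f"n={n}: SE={se:.2f}, t={t_stat:.2f}, critical t=±{t_crit:.3f}")
# n=25:  SE=0.80, t=2.50, critical t=±2.064
# n=100: SE=0.40, t=5.00, critical t=±1.984
```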

Reject the NULL Example

Notice how the MEAN of the NULL is NOT in the 95% confidence interval for our difference in the means.
- In light grey on the left is the NULL hypothesis and its range of "zero enough"
- On the right is our 95% confidence interval for our difference in the means

Fail to Reject NULL Example

Notice how the MEAN of the NULL is in the 95% confidence interval for our difference in the means.
- A difference of zero (the NULL) is a reasonable mean difference, so you cannot reject the null. The red line (which represents the sample mean difference) is inside of the range of "zero enough".

Independent Samples t-test NULL

The NULL hypothesis for an independent samples t-test is: "There is no difference between the means."

Dependent Samples t-test NULL

The NULL hypothesis for a repeated-measures (dependent samples) t-test is: "There is no difference between the pre and post means."

Independent Samples Visualized

There are some populations out there that we take our samples from in order to test the difference between them. If we sampled and calculated those differences again and again, the average difference between the groups would form a normally distributed sampling distribution.

Pooled Standard Error SEp

We also need to calculate a pooled standard error for the sampling distribution, i.e. the standard deviation of the sampling distribution of differences in means.

Pooled Standard Deviation Sp

We also rarely know σ₁² and σ₂², so we'll have to approximate these with s₁² and s₂². Then we can combine the two samples' standard deviations together to give us one measure of variation, the pooled standard deviation, which we can use for calculating Cohen's d.
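A minimal sketch of the pooled standard deviation and the Cohen's d built from it, assuming the usual textbook pooling formula (the slides do not show it explicitly) and made-up summary statistics:

```python
import math

def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation: each sample's variance weighted by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_d_independent(mean1, mean2, s1, s2, n1, n2):
    """Cohen's d for two independent samples: the mean difference in pooled-SD units."""
    return (mean1 - mean2) / pooled_sd(s1, s2, n1, n2)

# Hypothetical weight-loss numbers (not the actual Low Fat vs Low Carb data)
d = cohens_d_independent(mean1=8.0, mean2=5.0, s1=3.0, s2=3.0, n1=20, n2=20)
print(f"d = {d:.2f}")  # 1.00: the group means sit about one pooled standard deviation apart
```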

Effect Size - Cohen's d

We can also calculate the effect size, Cohen's d, to get an idea of the practical significance of the results. For these samples, the effect size is d = 1.01: the Low Fat dieters lose, on average, a whole standard deviation more than the Low Carb group.

Hypothesis Testing - One Sample t-test

We've actually done a bit of this already with the z and t tests from the previous section, but now we focus on just the t-test (σ unknown).
- We were using them as examples for hypothesis testing

Try it. Freshman Fifteen

You've heard that the average amount of weight change for the first year of college is +15 pounds. To test this claim, you take a sample of 25 UT sophomores and ask them to report the amount their weight has changed during the previous year. You calculate summary statistics for your sample as: x̄ = 13 lbs, s = 4.0. Test the claim of the "Freshman 15" using α = .05. Use the steps of hypothesis testing! What are the hypotheses? One or two tail test?

Step 1: State the Hypotheses (and draw the picture!)
- H0: Freshmen gain an average of 15 pounds during their first year of college (μ = 15).
- H1: Freshmen do not gain an average of 15 pounds during their first year of college (μ ≠ 15).
- This is a two-tailed test. (Freshmen may gain less than 15 or more than 15, hence two tailed. There is no assertion of direction. You are not assuming one way or the other, you just want to know if the amount of change is equal to 15 or not.)

Step 2: Level of Significance
- α = .05

Step 3: Statistical Test
- One sample t-test

Step 4: Find the Critical Value(s)
- α = .05 and n = 25, so df = 24. What is our critical value?
- We know that our test is a two-tailed test, meaning that our two critical values should cut off tails with probabilities of .025 each (½ of .05 is .025). The t-critical values that satisfy df = 24 and t.025 are t_crit = ±2.064.

Step 5: Calculate the test statistic
- Substituting our sample statistics into the formula, we have t_stat = (13 − 15) / (4/√25) = −2 / 0.8 = −2.5.
- This is what it looks like on our curve, assuming that the NULL is true, i.e. +15 lbs is the average weight change. I've "translated" the critical value into pounds; what do you think our results will be?

Step 6: Make a Conclusion
- Our t_stat = −2.5, and our t_crit = ±2.064. Our t_stat is past our t_crit, so we reject H0.

Think critically... Our results were "statistically significant", but what does that mean practically? What could it mean in regards to our sampled group?
- Maybe UT students don't gain as much weight
- Maybe Austin is a healthier city
- Maybe you sampled some athletes and your sample was not representative
- Also, it would not have been significant if it was just .35 lbs more...
Considering the sample size and standard error, the sample mean of x̄ = 13 is not a reasonable value in the null world. It is extreme enough for us to conclude that the probability of seeing x̄ = 13 if the null were true is less than .05, less than a 5% chance.
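As a check on the arithmetic above, here is a small sketch of the same one-sample t-test in code (the numbers come from the example; the scipy calls are just one way to look up the critical value and p-value):

```python
import math
from scipy import stats

# Freshman Fifteen example from above
mu_null = 15.0   # claimed average weight change (H0)
x_bar = 13.0     # sample mean
s = 4.0          # sample standard deviation
n = 25
alpha = 0.05

se = s / math.sqrt(n)                        # 0.8
t_stat = (x_bar - mu_null) / se              # -2.5
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)   # ~2.064, two-tailed
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)

print(f"t = {t_stat:.2f}, critical t = ±{t_crit:.3f}, p = {p_value:.4f}")
# t is past the critical value (and p < .05), so we reject H0
```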

All t-tests...

- Are hypothesis tests
- That compare some sort of average score to another hypothesized average score
- With an unknown σ
- And use a Standard Error (SE)
- Use degrees of freedom (n − 1 or n − 2)

Purpose of a One Sample t-test

A one-sample t-test compares the mean of ONE sample (x̄) to an assumed population mean (μ).
- To determine if the sample is truly different from an assumed population mean
  - Ex. A group of students that received statistics tutoring vs the normal population of those that do not receive tutoring
- A "Statistically Significant Difference"

"The results were significant."

Again, statisticians are not good at naming things. We think "significant" means:
- "sufficiently great or important to be worthy of attention"
This common definition can confuse what "statistical significance" means...
- Results can be statistically significant but only have a tiny, negligible real world effect.
  - Ex. "The results were statistically significant, there was a .02 point increase in IQ when using a special shampoo." A .02 increase when the mean is 100 and the SD is 15 is negligible.
- Statistical significance DOES NOT equal practical significance.
  - Hypothesis tests only tell us the probability that our finding would occur under the null hypothesis
  - With a large enough sample size, lots of negligible differences can be "statistically significant"
- Practical significance refers to differences that are large enough to mean something in everyday life

If the NULL were true, would you expect every sample difference x̄₁ − x̄₂ to equal exactly 0? Why or why not?

Because of random error and chance, the difference will not always be exactly zero.
- Just like before, we want to know a range of reasonable values that we can be 95% confident with: a range of "zero enough"

Cohen's d, Effect Size

Cohen's d is an effect size used to indicate the standardized difference between two means. It looks just like our t-test, but now we are dividing by the standard deviation rather than the standard error: an effect size in standard units.
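A small sketch contrasting the two calculations, reusing the Freshman Fifteen numbers from earlier (dividing by the standard error gives t; dividing by the standard deviation gives the one-sample Cohen's d):

```python
import math

# Freshman Fifteen numbers from earlier in the notes
x_bar, mu, s, n = 13.0, 15.0, 4.0, 25

t_stat = (x_bar - mu) / (s / math.sqrt(n))  # divide by standard error     -> t = -2.5
d = (x_bar - mu) / s                        # divide by standard deviation -> d = -0.5

print(f"t = {t_stat:.2f}, d = {d:.2f}")
# Same mean difference, but d expresses its size in standard-deviation units
```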

Statistical vs. Practical Significance

How different are the two means?
- Not just "are they statistically significantly different", because as we have seen that can mean very little practically. How do we quantify practical significance?

Cohen's d and t-statistic Examples

How would you interpret the following?
- t(24) = +2.45, p < .05, d = .15: Statistically significant, but a very small effect. We can be confident that there is only a small difference.
- t(24) = +2.05, p > .05, d = .15: NOT statistically significant, and a very small effect. We can't be confident...
- t(4) = +2.45, p > .05, d = .80: NOT statistically significant, BUT a very large effect size. This suggests we might be UNDERPOWERED, not a big enough sample.
- t(4) = +2.78, p < .05, d = .80: Statistically significant, and a very large effect size. We can be confident that there is a large effect.

Translating Between Scales

I have been including numbers in the standardized t-scale and numbers in the original scale of the question.
- This is not always common practice, but I find it helps make the idea more tangible.
You can make the same conclusions as your t-test, but the scientific community prefers standardized numbers (because we aren't always familiar with the original scale).
- You still have to report your t-test statistic, but if you want to double check your work, here's how...

Which test and how many tails should you use for the following examples?

- I want to compare the average exam grade of a sample of n = 10 of my students in this Stats Literacy course to a sample of n = 10 students in the other section of the Stats Literacy course.
  - Independent Samples, Two-Tailed
- I want to compare a sample of UT students' average hours of sleep to the national average (unknown σ).
  - One Sample, Two-Tailed
- I want to know if the Premont school district is performing lower than the state average on the STAR test.
  - One Sample, One-Tailed (Left)
- I want to know which drug is better at reducing flu symptoms. I create two groups, one receives drug A, the other drug B.
  - Independent Samples, Two-Tailed
- I want to know if allowing people to fidget changes their average math score. I first tell the students they are not allowed to fidget for one week, then the following week allow them to fidget as much as they would like. I then compare their two averages.
  - Dependent Samples, Two-Tailed

Cohen's d Interpretation

In a one sample t-test example, a Cohen's d value of d = +1.0 would suggest that your sample of people (x̄) is 1.0 standard deviations above the average of the population (μ).
- We may not be familiar with a certain scale and whether or not a difference of 10 points means anything, but standard units are helpful because we are familiar with them and know what they imply.
  - If I said the Cohen's d was +1.5, we'd know there was a large difference between the two means.
  - If I said Cohen's d was .06, you'd know there was little practical difference between the means.

Independent and Dependent

In independent samples, we take the averages of both groups, take the difference between them, and compare it to the NULL (i.e. zero difference). In dependent samples, we compute the difference (ex. After Treatment − Before Treatment), then take the average of those differences and compare it to...? What do you think our NULL hypothesis is here?

Assumptions

- Independence of Observations: there is no relationship between the people in the groups or across the groups
- The Data are Normally Distributed: follow a normal distribution, no skews or crazy outliers
- Homogeneity of Variance: both groups have similar variances within their groups (see the sketch after this list)
  - Ex. If the variance in Group A is 25, the variance in Group B should be fairly close
  - The ratio of the larger over the smaller variance should not exceed 1.5
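A quick sketch of that variance-ratio rule of thumb (the 1.5 cutoff is the one given above; the numbers are made up):

```python
def variance_ratio_ok(var_a, var_b, max_ratio=1.5):
    """Homogeneity-of-variance rule of thumb: larger variance / smaller variance should not exceed 1.5."""
    ratio = max(var_a, var_b) / min(var_a, var_b)
    return ratio, ratio <= max_ratio

ratio, ok = variance_ratio_ok(25.0, 30.0)
print(f"ratio = {ratio:.2f}, homogeneity assumption reasonable: {ok}")  # 1.20, True
```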

Dependent Samples NULL

Just like in independent samples t-tests, we are going to compare to the NULL hypothesis assumption that there is NO DIFFERENCE between before and after treatment scores. We always start with the assumption that there is no difference, no treatment effect, no change, etc.
- The subscript is now D, for difference

Sampling Distribution of Differences in Means

Rather than comparing our one sample to the one known population, now we will test the difference between our two sample means.
- If we did this again and again, drawing samples and taking the difference, x̄₁ − x̄₂, the difference between them over repeated samples will:
  - Follow a normal distribution (thanks to the central limit theorem)
  - Have its own joint standard error
Independent samples t-tests are typically two-tailed, but one-tailed tests are possible if there is already some evidence or theory to show that there is directionality.

Independent Samples t-test

Now, we are going to compare two sample means to each other, rather than to a known population mean.
- Perhaps there is no known population mean for the groups of interest
  - Ex. What is the average number of minutes/week UT students exercise compared to students from St. Edwards?
- Or maybe there is a new thing we are measuring (or a new scale) that doesn't have established population means
  - Ex. Is there a difference in fidget scores on the Fidget Assessment Battery-II between people diagnosed with ADHD and those not diagnosed with ADHD?

One Sample t-test Review

Situations when:
- You are comparing your one sample mean to a known population mean
- σ unknown
  - If it was known, you'd use a z-test
But what if you want to compare two sample means you collected to each other?

One Sample t-tests are for...

Situations when:
- You are comparing your sample mean to a known population mean
  - Ex. Did this school district score lower than the national average?
  - Ex. Are UT students more or less likely to gain the national average of 15 lbs their Freshman year?
  - Ex. Does tracking behavior increase the number of minutes/week of exercise compared to the average of those who don't track their behavior (30 minutes/week)?
- σ unknown
  - If it was known, you'd use a z-test
One Sample t-tests are for one sample to compare against some known population mean, but what if we want to compare two groups that we sampled? Then we need an... Independent Samples t-test

Cohen's d and Test Statistics

Test statistics determine how likely these findings would be if the NULL hypothesis were true.
- This is why they take the variance AND the sample size into account (i.e. the Standard Error)
- They do not determine the magnitude of the difference
Cohen's d does determine the magnitude of the difference, but cannot determine the likelihood of seeing these findings.

Cohen's d

- The effect size (d) is in standard units, aka standard deviations; think back to our simple z-distribution
- So, an effect size of d = 1.0 says that the means are a whole one standard deviation apart
  - Small = 0.2
  - Medium = 0.5
  - Large = 0.8+
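A tiny helper reflecting the small/medium/large benchmarks listed above (the cutoffs are the conventional ones from these notes; the labeling function itself is just an illustration):

```python
def label_effect_size(d):
    """Label |d| using the small/medium/large benchmarks from the notes."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"

for d in (0.06, 0.5, 1.01):
    print(d, "->", label_effect_size(d))  # negligible, medium, large
```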

t-stat formula

The formula for the one-sample t-test statistic should appear familiar:

t = (x̄ − μ) / (s/√n)

- Numerator (x̄ − μ): the difference between the sample mean and the assumed population mean
- Denominator (s/√n): the standard error
- t-statistic (t): the standardized (translated raw mean) score we can compare to a critical value

Independent Groups

The two groups are categorized by some sort of categorical (nominal) variable.
- Ex. Female vs. Male, Mountain People vs. Beach People, Side Sleepers vs. Back Sleepers, etc.
The two groups you choose to compare can be anything, really, but they should only differ on that one grouping quality; all else should be fairly equal.
- Ex. Female vs. Male: the participants should be of similar SES, age, etc.
  - This of course depends heavily on what the research question is and what is available to you

Confidence Intervals and Nulls

When constructing confidence intervals for a difference in means, the null hypothesis is usually H₀: μ₁ − μ₂ = 0.
- So, we can compute our confidence interval for the difference in means, and check to see whether the interval contains 0.
- When the interval contains 0, it is plausible that μ₁ − μ₂ = 0, and this is consistent with failing to reject H₀.
- When the interval does not contain 0, it is unlikely that μ₁ − μ₂ = 0, and this is consistent with rejecting H₀.
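A minimal sketch of that check, assuming a pooled-variance interval (the notes don't give the exact interval formula, so this is one common textbook form) and made-up summary statistics:

```python
import math
from scipy import stats

def diff_ci(mean1, mean2, s1, s2, n1, n2, alpha=0.05):
    """Confidence interval for mu1 - mu2 using a pooled-variance standard error."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # pooled standard error
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se

low, high = diff_ci(mean1=8.0, mean2=5.0, s1=3.0, s2=3.0, n1=20, n2=20)
contains_zero = low <= 0 <= high
print(f"95% CI: ({low:.2f}, {high:.2f}); contains 0: {contains_zero}")
# If 0 is inside the interval we fail to reject H0; if it is not, we reject H0.
```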

Overlaps and Shifts

With Cohen's d, we know that a d = +1.0 value implies that two distributions (ex. UT vs non-UT students) are shifted away from each other by one full standard deviation. If there were no difference between the sample mean and population mean, the two curves would overlap nearly perfectly. But if they have a d = +1.0, the curves are pulled away from each other by one standard deviation.

Dependent Samples t-tests

With dependent samples, we are interested in comparing two means (like we've been doing all along) which are now related in some way ("depend" on each other in some way).
Matched Pairs of Data
- Repeated Samples: taking two measures from one person
  - Ex. Before and After treatment
  - Ex. With notifications on the phone enabled and disabled
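A minimal sketch of the dependent samples idea (compute each person's difference, then run a one-sample t-test of those differences against 0), using made-up before/after scores:

```python
import math
from statistics import mean, stdev
from scipy import stats

# Hypothetical before/after scores for the same five people (not real data)
before = [10, 12, 9, 14, 11]
after = [13, 14, 10, 17, 12]

diffs = [a - b for a, b in zip(after, before)]  # After - Before for each person
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                              # standard deviation of the differences
se = s_d / math.sqrt(n)

t_stat = (d_bar - 0) / se                       # NULL: the mean difference is 0
t_crit = stats.t.ppf(0.975, n - 1)              # two-tailed, alpha = .05
print(f"mean diff = {d_bar:.2f}, t = {t_stat:.2f}, critical t = ±{t_crit:.3f}")
```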

