EDUC 667 2
Three things necessary to establish causality
1. The variables must co-vary/correlate
2. The cause has to precede the effect (temporal order)
3. Alternative explanations need to be ruled out
Points 2 and 3 have more to do with the design. Counterfactual reasoning provides a powerful lens for thinking about these questions: you'd like to know what outcomes these individuals would have had if they had received a different treatment.
Dependent Samples: t-tests
A dependent samples t-test is used when comparing the means of two samples that are related (repeated measures or matched samples)
•Dependent samples t-test measures difference scores
-Difference scores/change scores/gain scores: mean at time 2 − mean at time 1
•Dependent samples t-test is also referred to as a 'paired' samples t-test because every observation in one sample is paired with an observation in the other
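A minimal Python sketch of this test (assuming SciPy is available; the pre/post scores below are hypothetical, not from the slides):

```python
# Hypothetical pre/post scores for the same six students (illustrative only)
from scipy import stats

pre  = [72, 65, 80, 58, 90, 77]
post = [78, 70, 83, 61, 92, 80]

# Paired (dependent) samples t-test on the difference scores (time 2 - time 1)
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```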
F-statistic
An F test statistic is the ratio of two sample variances
•MS_B and MS_W are two sample variances, and that's what we divide to find F: F = MS_B / MS_W
Homoscedasticity
Assumes that the random variance (error variance) is uniform across the entire regression line; on a scatterplot, the points are spread evenly around the line across all values of X
Correlation Characteristics
Correlation describes a relationship between two variables, with ρ or r denoting a population or sample correlation, respectively
Three characteristics of a relationship:
-Form: linear or nonlinear
-Direction: positive or negative
-Strength: from -1 to 1
Critical values for t-statistic
For a t-statistic, the critical value changes depending on the degrees of freedom (df = n − 1)
Easy version of the t-distribution table: http://statcalculators.com/students-t-distribution-table/
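The same critical values can be looked up in code; a minimal sketch (assuming SciPy), with α = .05 and n = 20 as example numbers:

```python
# Two-tailed critical t for alpha = .05 and a sample of n = 20
from scipy import stats

n = 20
alpha = 0.05
df = n - 1
t_crit = stats.t.ppf(1 - alpha / 2, df)   # upper-tail cutoff for a two-tailed test
print(f"critical t (df = {df}) = ±{t_crit:.3f}")
```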
Hypothesis Testing for ANOVA
Hypothesis Testing Steps
-Step 1: State the hypotheses
-Step 2: Set the alpha level; locate the critical region based on alpha and df
-Step 3: Calculate the F-statistic, with assumptions: Random Sample, Interval/Ratio data, Independent Observations, Normality, Homogeneity of Variance
-Step 4: Make a decision about H0 and state the conclusion. Reject H0 if the computed F falls in the critical region
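A minimal sketch of Step 3 in code (assuming SciPy; the three groups of scores are hypothetical):

```python
# One-way ANOVA on three hypothetical groups: F = MS_between / MS_within
from scipy import stats

group1 = [4, 5, 6, 5, 7]
group2 = [6, 7, 8, 7, 9]
group3 = [9, 8, 10, 9, 11]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```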
Effect Size for Independent Samples t-test
If you get a statistically significant result, and you reject the null, then you follow up by calculating the effect size
•Remember, "statistical significance" does not automatically mean that the effect is interesting, impactful, or important
•It DOES mean that whatever effect you claim exists is very likely to be genuine
•There are multiple options for an effect-size measure, but the one that is most commonly used is Cohen's d (the mean difference divided by the standard deviation)
Multiple Comparisons
Important to consider the Type I error consequences of making multiple comparisons with the same data
•If α = .05, then a hypothesis test will make a Type I error 5% of the time
•If 100 t-tests were conducted at the 0.05 level, instead of 1 ANOVA, even if the null were true in all cases, you would expect 5 to be rejected
•With multiple tests it's likely that significant results may not be true findings
•Need to adjust the significance level
Independent Samples
Independent samples are samples where the individuals in both samples are not related to each other •Not comparing individuals between the groups; instead comparing group means with one another
Pearson Correlation
Pearson Correlation is used to measure the linear relationship between X and Y for data measured on an interval or ratio scale
•There are other types of correlations (e.g., biserial)
•Often used to describe data, but also important for validity and reliability evidence
Maximum possible r_xy = sq root (r_xx * r_yy), where r_xx and r_yy are the reliabilities of X and Y
Power Analysis
Provides a recommended minimum sample size per group •Based on a few factors: -Statistical test/# groups -Effect Size -Power -Other factors too (significance level, SD) •www.anzmtg.org/stats/PowerCalculator
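A minimal sketch of the same kind of calculation in code (assuming the statsmodels package; the effect size, alpha, and power values are illustrative choices):

```python
# Minimum n per group for an independent-samples t-test,
# assuming a medium effect (d = 0.5), alpha = .05, power = .80
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"recommended n per group ≈ {n_per_group:.1f}")  # roughly 64 per group
```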
Step 4: Make a Decision
Reject the null if the sample statistic falls in the critical region
•Fail to reject the null if it does not
Check Slide 15
Anova Step 4: Make a Decision
Reject the null hypothesis when the F statistic > critical value
•Similarly, when the p-value is less than the significance level α, we reject the null hypothesis
Look over Scatterplot
Slides 3-7
Independent Samples: t-statistic
State the hypotheses as:
•H0: μ1 − μ2 = 0
•H1: μ1 − μ2 ≠ 0
•Much like with the single-sample t-test, we use the sample variability in the t-statistic
-Check slide 7 for the t formulas
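A minimal sketch of this test (assuming SciPy; the two groups of scores are hypothetical):

```python
# Independent-samples t-test of H0: mu1 - mu2 = 0 (equal variances assumed)
from scipy import stats

treatment = [82, 75, 88, 91, 69, 77]
control   = [70, 72, 65, 80, 68, 74]

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```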
Hypothesis Testing in 4 Steps
Step 1: State the hypothesis
•Step 2: Set the alpha level and criteria for a decision
•Step 3: Collect data and compute sample statistics
•Step 4: Make a decision
Within Group Variation 𝑆𝑆w (Slide 12)
The within group variation is the sum of the individual group variations: SS_W = SS_1 + SS_2 + ... + SS_k
What does a p-value tell us?
What we wish a p-value told us (and many pretend it does):
-"Given these data, what is the probability that H1 is true?", i.e., P(H1 = True | Data)
•What a p-value actually tells us:
-"Given that H0 is true, what is the probability of these (or more extreme) data?", i.e., P(Data | H0 = True)
•The reasoning typically goes: "If the null hypothesis is correct, then these data are highly unlikely. These data have occurred. Therefore, the null hypothesis is highly unlikely."
•The problem is that rejecting H0 does not affirm H1
Step 3: Calculate F-Ratio
Build the basic one-way ANOVA table; look at the sample table on slide 10
Correlation does not imply
causation!!
Look at Linear vs not linear
slide 13
Side Note: Covariance and Correlation - Look at equation slide 22
•Covariance also tells us how two continuous variables are related
•Similar to a correlation, if larger values of one variable are associated with larger values of another variable, the covariance will be positive, and vice versa. But it's difficult to interpret...
•The value that the covariance produces is dependent on the scales of the two variables
-For example, the covariance in our example using years of education and weekly earnings was 2354
-Is that high? Is that low? We don't know...
•For this reason, correlations are more often presented; these are covariances where the two variables have been standardized
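A minimal sketch of the contrast (assuming NumPy; the education/earnings values below are made up, not the numbers from the slides):

```python
# Covariance is scale-dependent; the correlation is the standardized version
import numpy as np

education = np.array([10, 12, 12, 14, 16, 16, 18, 20])          # years
earnings  = np.array([520, 600, 650, 700, 820, 900, 950, 1100])  # weekly $

cov = np.cov(education, earnings)[0, 1]        # hard to interpret on its own
r   = np.corrcoef(education, earnings)[0, 1]   # always between -1 and 1
print(f"covariance = {cov:.1f}, correlation = {r:.3f}")
```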
Considerably unequal sample sizes
•Hochberg's GT2
Cohen's D
= mean difference/standard deviation
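A minimal sketch for two independent groups (hypothetical data; using the pooled standard deviation as one common choice for the denominator):

```python
# Cohen's d = mean difference / pooled standard deviation
import numpy as np

group1 = np.array([82, 75, 88, 91, 69, 77])
group2 = np.array([70, 72, 65, 80, 68, 74])

n1, n2 = len(group1), len(group2)
s1, s2 = group1.var(ddof=1), group2.var(ddof=1)   # sample variances
pooled_sd = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))

d = (group1.mean() - group2.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")
```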
z-statistic vs. t-statistic
Check slide 24 for formulas
Hypothesis Testing
Hypothesis testing, or tests of statistical significance, provides validity evidence to support claims about relationships between variables in the population
Hypotheses always refer to population parameters - NOT sample quantities
Tukey HSD
If the difference in means between any two groups is greater than Tukey's honestly significant difference (HSD) value, then that difference is considered statistically significant
HSD = q * sq root (MS_W / n), where n = sample size for each group
-Tukey's HSD requires equal sample sizes between groups (n1 = n2 = ... = nk)
•The value q is based on α, the number of groups (k), and df_W
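A minimal sketch of the same comparisons in code (assuming statsmodels; the three groups of scores are hypothetical):

```python
# Tukey HSD compares every pair of group means at a family-wise alpha of .05
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([4, 5, 6, 5, 7,  6, 7, 8, 7, 9,  9, 8, 10, 9, 11])
groups = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result)   # table of pairwise mean differences and reject/fail-to-reject decisions
```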
Statistical Significance vs Effect Size
In general, statistical significance means that the result is not due to chance; it does not provide information about the absolute size of the treatment effect
•Effect size provides the magnitude of the effect (e.g., Cohen's d)
•E.g., in a large cohort study with n = 10,000, a mean difference of 1 pt could have statistical significance... but not practical significance
Step1: State the hypothesis
Null hypothesis (Ho)
•Alternative hypothesis (H1)
Statement of a statistical hypothesis involves population parameters (Greek letters)
Two-tailed hypothesis tests: test either direction; use equals and does not equal
One-tailed hypothesis tests: specify either an increase or a decrease in the population mean; use greater than and greater than or equal to (or less than and less than or equal to)
Check the wording of the research question
Pooled Variance
Represents a weighted average of the two sample variances, with each sample weighted by its df (df = n − 1)
Check the other formulas on slide 8
ANOVA Effect Size recommendations:
Small: < .09 Medium: .09-.25 Large: >.25
Effect size interpretation
Small: 0.2 Medium: 0.5 Large: 0.8 Check slide 9 for d formula
Total Group Variation 𝑆𝑆𝑇 (Slide 13)
The total variation comprises all the variation in the sample with respect to the grand mean
Mean Squares (MS)
The variances are also called the Mean Squares, abbreviated MS, often with an accompanying subscript: MS_B or MS_W
•They are an average squared distance from the mean and are found by dividing the variation by the degrees of freedom: Variance = SS / df
•MS_B = SS_B / df_B
•MS_W = SS_W / df_W
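A minimal sketch that computes these pieces by hand (hypothetical data for three equal-sized groups; the eta-squared line uses the effect-size formula covered later in these notes):

```python
# SS_B, SS_W, MS_B, MS_W, F, and eta-squared computed by hand
import numpy as np

groups = [np.array([4, 5, 6, 5, 7]),
          np.array([6, 7, 8, 7, 9]),
          np.array([9, 8, 10, 9, 11])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, N = len(groups), len(all_scores)

ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between-group variation
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within-group variation
ss_t = ((all_scores - grand_mean) ** 2).sum()                      # total variation (= SS_B + SS_W)

ms_b = ss_b / (k - 1)      # MS_B = SS_B / df_B
ms_w = ss_w / (N - k)      # MS_W = SS_W / df_W
F = ms_b / ms_w
eta_sq = ss_b / ss_t       # proportion of total variation between groups

print(f"F = {F:.3f}, eta^2 = {eta_sq:.3f}")
```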
Step1: State the Hypothesis
There are k treatment levels (groups) and one independent variable (one factor)
•H0: μ1 = μ2 = μ3 = ... = μk
•H1: not H0
another problem: The Null hypothesis is never true
This point is largely attributed to the Cohen paper, though, as he clearly demonstrates, many statisticians shared these concerns before him
•"It is foolish to ask 'Are the effects of A and B different?' They are always different—for some decimal place" (Tukey, 1991, p. 100)
•If you pull samples from two populations, it is incredibly unlikely their means will be identical; there will be some difference, no matter how small
Big Data
We live in the age of big data. A p-value for a sample of 1 million is literally meaningless. •It is like the null hypothesis is just begging to be rejected!! •Finding significant p-values does not even really tell us all that much.
Dependent Samples
•Dependent samples are samples where individuals in the samples are related to each other
•Comparing individuals between the groups
-Oftentimes, comparing the same individual over time
-Other times, comparing similar people between the groups, for example, in matching
So what can we do if we violate these assumptions? (e.g., ordinal data)
-We can transform the existing data (e.g., from non-normal to normal)
-Use an adjusted statistic, such as Welch's variant of the t-test and ANOVA
-Use a nonparametric test, which:
•Does not state the hypothesis in terms of specific parameters, and
•Makes few assumptions about the population distribution
Post Hoc
The ANOVA doesn't indicate which condition(s) are different; you need post hoc tests to determine that
•Several options that consider Type I and Type II errors
The Logic of ANOVA
ANOVA compares the variability of scores within groups (SS_W) to the variability between group means (SS_B)
•If the variability between groups is considerably larger than the variability within groups, the result is evidence of a treatment effect
•Comparison of the two types of variance is done by computing a statistic called the F-ratio: F = (Variance Between / Variance Within) = (Treatment + Chance) / Chance
ANOVA vs t-tests
An ANOVA can also be run on just two groups; the conclusion will be exactly the same as with a t-test
•F = t^2
•The F is computed with variances as opposed to mean differences and the SE
•Why not just do all possible t-tests with adjusted alpha?
•ANOVA is robust to slight assumption violations and is sometimes referred to as an Omnibus Test
•More power with the F to detect a difference anywhere, since all the observations are pooled
•ANOVA is best for answering, "Is there a difference anywhere at all?"
Between Group Variation 𝑆𝑆𝐵 (Slide 11)
Between group variation (not variance) is the variation between each sample mean and the grand mean
•Each individual variation is weighted by the sample size: SS_B = sum over the k groups of n_i * (group mean_i − grand mean)^2
•The grand mean is the average of all values when the IV is ignored; a weighted average of all sample means: grand mean = (sum of n_i * group mean_i) / (sum of n_i)
Change Scores
Change scores (post-test score − pre-test score) are controversial
Design notation: O X O (pre-test, treatment, post-test)
Simulated Data
Data generated based on parameters of the original dataset using: http://rlanders.net/dataset-generator/
Ex: Pain Scale Ratings for the 3 groups
ANOVA Effect Size
Effect size in a one-way ANOVA can be denoted by eta squared: η^2 = SS_B / SS_T
•Indicates the proportion of the total variation accounted for by differences between treatment groups
So what are our solutions?
First, we must accept that Hypothesis Testing is not going anywhere, but it does serve a function particularly for smaller data sets •There is a big push for attention towards effect sizes (practical significance over statistical significance). •Be cautious of research that only reports p-values, particularly for large sample sizes •Replication!!!!!
Independent Samples: Comparing Means
For t-tests for independent samples, we compare the means of two unrelated samples
•It is very similar to a single sample t-test
-Both tests compare two values; however, in t-tests for independent samples, the additional sample has additional variance that must be taken into account
•While we are dealing with two sampling distributions now, we can create a single sampling distribution for our critical values, known as the sampling distribution of the differences between the means
Slightly unequal sample sizes
Gabriel
Unequal population variance
Games-Howell
Single Sample t-test
Hypothesis Testing Steps
-Step 1: State the hypotheses
-Step 2: Set the alpha level; locate the critical region based on alpha and df
-Step 3: Calculate the t-statistic, with assumptions: Random Sample, Interval/Ratio data, Independent Observations, Normality
-Step 4: Make a decision about H0 and state the conclusion. Reject H0 if the computed t falls in the critical region
Check slide 23 for the formula
Experimentwise Error Rate
If you have 4 groups, then you are actually making 6 comparisons: c = [k(k−1)]/2
α_experimentwise = 1 − (1 − α)^c
Dependent samples: t-statistic
Just like with our two previous t-statistics, the paired samples t-statistic is the difference in means over the standard error of the sampling distribution
•D̄ is the average difference between the pairs
•Note n pairs in the denominator, so that each pair is counted only once
•Can also use Cohen's d for the effect size: d = D̄ / s_D
Bonferroni Adjustment
Maintain the desired experiment-wise alpha by adjusting the alpha based on the number of comparisons:
•α_corrected = α / c
•α_experimentwise = 1 − (1 − α_corrected)^c
Gets too conservative when k is large
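A minimal sketch tying the experimentwise error rate and the Bonferroni correction together (plain Python; k = 4 and α = .05 are just example values):

```python
# Experimentwise Type I error rate with and without the Bonferroni correction
k = 4                        # number of groups
c = k * (k - 1) // 2         # number of pairwise comparisons (6 for k = 4)
alpha = 0.05

alpha_experimentwise = 1 - (1 - alpha) ** c           # ~0.26 with no correction
alpha_corrected = alpha / c                           # Bonferroni-adjusted per-test alpha
alpha_ew_corrected = 1 - (1 - alpha_corrected) ** c   # back near .05

print(f"c = {c}, uncorrected experimentwise alpha = {alpha_experimentwise:.3f}")
print(f"Bonferroni per-test alpha = {alpha_corrected:.4f}, "
      f"experimentwise ≈ {alpha_ew_corrected:.3f}")
```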
Step 3: Collect Data and Compute Sample statistic
Note that data are collected after the researcher has stated the hypothesis and established criteria for a decision
•Next, compute sample statistics. There are different hypothesis tests for different situations. Parametric tests (z-test, t-test, ANOVA, correlation) generally require these assumptions:
-Random Sample
-Independent observations
-Interval or ratio data
-Normality
-Homogeneity of Variance (homoscedasticity)
•Always test assumptions first!
Null Hypothesis Testing has some problems.
People (ritualistically) put all their attention on the p-value, regardless of the context of their data or the context of the research question
•Not all research questions require a p-value
•For example, Cohen (1994) details a scenario where there is a theory that a certain disease does not exist in a certain population at all. So researchers gather a sample of 30 and find 1 person has the disease
-They then try to determine if this is statistically significant
-The problem? The research question does not require hypothesis testing!!!!!
-We found someone in the population with the disease. Therefore, the theory must be false
•Hypothesis testing has become so much the norm that many researchers just go through the motions without actually questioning whether it is even appropriate given the context
What if we don't have σ?
So far, we knew the standard deviation of the parent population (σ), which allowed us to find the standard deviation of the sampling distribution (σ_x̄)
•Very often, we do not have the standard deviation of the parent population
•In that case, we use the standard deviation of the sample in what we call a t-test
•For a t-test, p-values come from the t-distribution table with df = n − 1, which changes based on n
Step2: Set Alpha/Critical Region
The F test is a right-tail test only
•The F test statistic has an F distribution with df_B as the numerator df and df_W as the denominator df
•The critical region/statistical significance occurs when F_observed > F_critical
Z Test
The Z test is used when testing one group and the population SD is known
Z = (sample mean − population mean) / standard error of the mean (σ/√n), i.e., the obtained difference over the difference expected due to chance
•Assumptions:
-Random sample of observations
-Independent observations
-Interval/ratio data
-Normality
-Population standard deviation (σ) is known
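A minimal sketch of a one-sample z-test (assuming SciPy; all numbers below are made up for illustration):

```python
# One-sample z-test: z = (x-bar - mu) / (sigma / sqrt(n))
import math
from scipy import stats

sample_mean, pop_mean = 104.0, 100.0
sigma, n = 15.0, 36            # known population SD and sample size

z = (sample_mean - pop_mean) / (sigma / math.sqrt(n))
p_two_tailed = 2 * stats.norm.sf(abs(z))   # two-tailed p-value from the normal distribution
print(f"z = {z:.2f}, p = {p_two_tailed:.4f}")
```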
Step2: Set Alpha Level
The alpha level (α) is a probability threshold used to define very unlikely sample outcomes assuming Ho is true, the "region of very unlikely"
The α level depends on the research question
•Defines the threshold for evidence that a result is unlikely to be due to chance
•Usually decided before conducting the test
Degrees of Freedom
The df is often one less than the number of values
•The between group df is one less than the number of groups (k): df_B = k − 1
•The within group df_W is the sum of the individual df's of each group. Since each sample has df = n − 1 and there are k groups, the within group degrees of freedom is k less than the total sample size N: df_W = N − k
•The total df is one less than the sample size: df_T = N − 1
The t-Distribution
The larger the df, the more closely the t distribution resembles a normal distribution
•It is often recommended to have n > 30 because at that point it looks like a normal distribution
•Can use lookup tables to determine probabilities in the tails
•Adjusts to protect against Type I error for small samples
One-Way ANOVA
The one-way analysis of variance, or ANOVA, is used when comparing three or more groups
•ANOVA terminology
-Factor: categorical variable defining the groups
-Levels: distinct groups within a factor
•One-way ANOVA has 1 factor
•Two-way ANOVA has 2 factors, etc.
Violating Assumptions
The tests we discussed so far are based on certain assumptions. For example, samples come from normal populations, equal variances, interval/ratio data, etc.
•These tests are referred to as parametric tests
The sampling distribution of the differences between the means
These distributions have means equal to the population mean of one group minus the population mean of the second group
•The standard error is estimated as the standard deviation for sample mean differences: the fluctuation in the average difference between groups due to chance
•Remember: http://onlinestatbook.com/stat_sim/sampling_dist/
Pearson Correlation Formula
We can summarize the relationship between two continuous variables using the Pearson correlation formula:
r = SP / sq root (SS_X * SS_Y), or the degree to which X and Y vary together divided by the degree to which X and Y vary separately
With the sum of products (SP): SP = ΣXY − (ΣX * ΣY)/n, where n = # of pairs
H0: ρ = 0; H1: ρ ≠ 0 (other forms of the hypotheses are uncommon)
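A minimal sketch of the formula, checked against SciPy (the X and Y values are hypothetical):

```python
# Pearson r from SP and the sums of squares, compared with scipy.stats.pearsonr
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 9.0])

sp   = (x * y).sum() - x.sum() * y.sum() / len(x)   # SP = sum(XY) - (sum(X) * sum(Y)) / n
ss_x = ((x - x.mean()) ** 2).sum()
ss_y = ((y - y.mean()) ** 2).sum()

r_manual = sp / np.sqrt(ss_x * ss_y)
r_scipy, p_value = stats.pearsonr(x, y)
print(f"r (by hand) = {r_manual:.3f}, r (scipy) = {r_scipy:.3f}, p = {p_value:.4f}")
```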
Least Squares Regression
When we draw a regression line, many of the values do not fall perfectly on the line -> some are above, some are below
Like correlation, this suggests that the two variables are not perfectly related (r = ±1) and therefore our regression line also does not perfectly account for the data (i.e., there is error)
Errors of prediction: the difference between the observed value (Y) and the predicted value (ŷ), known as residuals
Least Squares Regression: calculates the regression line that minimizes the squared differences between Y and ŷ: Σ(Y − ŷ)^2
Correlation & Regression Effect Size
•In order to judge how good a relationship is (strength of the relationship) we need to square the correlation
•Effect Size: r = .1, .3, or .5 reflect small, medium, or large effects, respectively
•r^2 = coefficient of determination, a measure of the percent of predictable variability; explains how much of the variance in the dependent variable (Y) is explained or predicted by the independent variable (X)
•The coefficient of alienation (1 − r^2) represents the proportion of variance in the dependent variable that is not accounted for by the independent variable(s)
Bivariate Regression
•Like correlation, bivariate regression measures the relationship (association) between two continuous variables
•Bivariate regression takes the relationship one step further by having one variable (independent/predictor variable) predict the other variable (dependent/criterion variable) using a regression line
•A line is drawn through the middle of the data points that provides a description of the linear relationship and establishes a one-to-one relationship between X and Y values
Hypothesis Testing In Regression
•Since bivariate regression is just between two variables, it is exactly like correlation
•If the correlation between X and Y is significant (H1: ρ ≠ 0), then the relationship of X predicting Y is also (and equally) significant
•An alternative approach (and one which is necessary in multiple regression) is to use a test statistic for the slope:
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist)
Regression Line Equation
•The regression equation has the form ŷ = bX + a, with slope b and intercept a:
b = r * (s_y / s_x), or b = SP / SS_X
a = ȳ − b * x̄ (sample means)
•Restriction of range: only make predictions using X values within the observed range
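A minimal sketch of the slope and intercept calculation (hypothetical X and Y values, using the b = SP / SS_X form above):

```python
# Least-squares slope and intercept for a bivariate regression line
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 3.0, 4.0, 6.0, 9.0])

sp   = (x * y).sum() - x.sum() * y.sum() / len(x)   # sum of products
ss_x = ((x - x.mean()) ** 2).sum()

b = sp / ss_x                 # slope
a = y.mean() - b * x.mean()   # intercept
print(f"y-hat = {b:.3f} * X + {a:.3f}")
```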
Standardized Coefficients (b vs β)
•To standardize a bivariate regression, both the dependent and independent variables must be z-scored
•The regression coefficients are no longer called 'b' coefficients -> rather, they are called 'beta' coefficients (β)
•When it is bivariate regression (only 2 variables), the beta coefficient is actually the correlation coefficient
•The intercept in a standardized bivariate regression is always 0
•Therefore, the beta coefficient measures the effect size directly -> the change in the dependent variable in standard deviation units
Equal sample sizes and similar population variances
•Tukey HSD - good power
•Scheffe - more conservative
Pearson Correlation Assumptions
•Variables are normally distributed
•Interval/Ratio data
•Homogeneity of Variance
•Independence of Errors
•Linear relationship
•Other assumptions: measurement error, truncation of range, etc.
It's related anyways...
•t-test is a generalization of correlation •ANOVA is generalization of t-test •Regression is a generalization of ANOVA •...they are extensions of a correlation