PSYC 3290 Final
Analysis of variance (NHST): one or more null hypotheses?
-One-way ANOVA: one null hypothesis
-Null: all population means are equal (there is no effect of the IV); Alt: at least one population mean differs (there is an effect)
-Two-way factorial ANOVA: 3 null hypotheses
1. Null: IV1 has no effect (its marginal means are equal in the population); Alt: IV1 does have an effect
2. Null: same as above but for IV2
3. Null: no interaction, i.e., the effect of IV1 is the same at every level of IV2 (and vice versa); Alt: the effect of IV1 differs across the levels of IV2
null hypothesis (H0)
-means there is no relationship (no effect) in the population; any relationship in the sample occurred by chance (random sampling variability) -if there is less than a 5 percent chance (alpha = .05) of getting a result as extreme as the sample result if the null were true, then the null is rejected (statistically significant)
What influences type I and type II errors?
-smaller alpha: stronger evidence is required to reject the null, so we are less likely to make a Type I error; however, this higher standard of evidence means we will miss more effects that do exist (more Type II errors) -larger N: increases power, so we are less likely to make a Type II error; alpha stays wherever we set it, so a larger N reduces Type II errors without adding Type I errors -for a fixed N and effect size, smaller alpha means larger beta and vice versa
alternative hypothesis (H1)
-there is a relationship (effect) in the population, and the relationship in the sample reflects it -NHST always starts by assuming the null hypothesis is TRUE and asks how surprising the data would be under that assumption
Repeated measures (within-subjects) ANOVA
-1 IV with 3 or more conditions; every participant is exposed to each level of the IV
-The F test is calculated the same way as with independent groups
-df calculations (k = # of conditions, n = # of participants):
df1 = k - 1 (same as independent groups)
df2 = (k - 1)(n - 1) for repeated measures designs (independent groups was N - k)
-RM is usually more powerful than an independent groups ANOVA, because variability due to individual differences is removed from the error term
-Important assumption, sphericity: the variances of the difference scores between each pair of conditions must be equal
-Test it with Mauchly's test (in R); if Mauchly's test is significant, the sphericity assumption is violated
-If violated, apply a correction factor to adjust the df and recalculate an adjusted p value: the Greenhouse-Geisser correction or the Huynh-Feldt correction
-Pick one (it doesn't matter much as long as you state which); HF is a bit more powerful (less conservative)
-Effect size measure: eta squared, as for independent groups
One-way independent groups design
-1 IV with 3 or more levels; each participant is exposed to only 1 level (between-groups)
-ANOVA tests whether the population means actually differ or are all the same, i.e., whether differences among the sample means are due to random sampling variability or to a true difference in the world
Assumptions of one-way between-subjects ANOVA
1. Homogeneity of variance: the variability of scores is equal in each underlying population
a. Textbook: the population underlying each sample has the same standard deviation
b. If the sample SDs are similar, this assumption is considered satisfied
2. Residuals are normally distributed
3. Independence: individuals are randomly assigned to conditions
-Additional assumptions we make with all testing: the population is normally distributed, and we have a random sample from it
-ANOVA is fairly robust when these assumptions are only moderately violated
Between subjects factorial design
3 null hypotheses (2 × 2 example: word frequency × luminance):
1st: the marginal means for high and low frequency are equal (no main effect of frequency)
2nd: the marginal means for dim and bright are equal (no main effect of luminance)
3rd: the difference between low and high frequency is the same for dim and bright words (and vice versa), i.e., no interaction; this does NOT require all the cell means to be equal
-The # of null hypotheses depends on how many IVs there are: 2 IVs give 2 main effects + 1 interaction = 3; 3 IVs give 3 main effects + 4 interactions = 7
CI around sample correlation (r)
As r gets further from 0, the CI gets shorter and more asymmetric, with the arm further from 0 (closer to ±1) being the shorter one, because r can only take values between -1 and 1 -Larger N, shorter CI
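The asymmetry can be checked numerically. This is a minimal sketch that assumes the common Fisher z-transformation method for a CI on r (the notes don't say which method the textbook software uses); `r_confidence_interval` is a hypothetical helper name.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a population correlation via Fisher's z transform."""
    z = math.atanh(r)                        # Fisher's z' = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)                # standard error of z'
    lower, upper = z - z_crit * se, z + z_crit * se
    return math.tanh(lower), math.tanh(upper)  # back-transform to the r scale

for r in (0.0, 0.5, 0.9):
    lo, hi = r_confidence_interval(r, n=30)
    print(f"r = {r:.1f}: [{lo:.2f}, {hi:.2f}]  lower arm {r - lo:.2f}, upper arm {hi - r:.2f}")
```

With N = 30, the interval around r = .9 is much shorter than the one around r = 0, and its upper arm (toward 1) is the shorter arm.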
SS (for independent groups)
Between-group variability [SS Between] = error variability (random sampling variability) + variability among population means
Within-group variability [SS Within] = error variability (random sampling variability) only
SS Total = SS Between + SS Within
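To see the partition in numbers, here is a minimal sketch with made-up scores for three groups; `ss_partition` is a hypothetical helper, not something from the course materials.

```python
from statistics import mean

def ss_partition(groups):
    """Split total variability into between- and within-group sums of squares."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    # Between: each group mean's squared distance from the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: each score's squared distance from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between, ss_within, ss_total

groups = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]   # made-up scores
ssb, ssw, sst = ss_partition(groups)
print(ssb, ssw, sst)   # 54 + 6 = 60
```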
T distribution
-Compare t obtained with t critical: if |t obtained| > t critical, reject the null hypothesis; otherwise, fail to reject it
-Calculated the same way as z, but uses the SAMPLE SD (s) in place of the population SD (σ)
-When we don't know the population SD, we use the sample SD to estimate it; then we can calculate the CI for the population mean
-t varies more than z because it depends on both M and s (both vary from sample to sample)
-There is a family of t distributions; the parameter df (N - 1 for a single sample) tells us which one to use
-Heavier tails than the normal distribution: because the population SD is unknown, it needs more room for variability
-The larger N is, the larger df is, and the closer t.95 is to z.95 (the closer t is to the normal distribution)
-Important in calculating p values
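A minimal sketch of the one-sample case, assuming made-up data and a hypothesized population mean of 100. The critical value 2.262 (two-tailed, alpha = .05, df = 9) is copied from a standard t table, because Python's standard library has no t distribution.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (M - mu0) / (s / sqrt(N)), using the sample SD s in place of sigma."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)   # stdev divides by N - 1
    se = s / math.sqrt(n)                # estimated standard error of the mean
    return (m - mu0) / se, n - 1, m, se

sample = [102, 98, 110, 105, 99, 107, 103, 111, 96, 104]   # made-up scores
t_obt, df, m, se = one_sample_t(sample, mu0=100)

T_CRIT = 2.262   # two-tailed critical t for alpha = .05, df = 9 (from a t table)
print(f"t({df}) = {t_obt:.2f}; reject H0: {abs(t_obt) > T_CRIT}")
print(f"95% CI: [{m - T_CRIT * se:.2f}, {m + T_CRIT * se:.2f}]")
```

Here |t| is about 2.23, just short of 2.262, and consistently the 95% CI (built with the same critical t) just includes 100.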
Simple Main effects
Differences among particular cell means within the design; precisely, the effect of one IV within one level of a second IV -Simple main effects compare the effect of IV1 at each level of IV2, or conversely, the effect of IV2 at each level of IV1 -Do this if there is a significant interaction in a factorial ANOVA -In an interaction plot (e.g., for a split-plot ANOVA), NON-PARALLEL LINES (they need not cross) suggest an interaction, which is the cue to test simple main effects
Different measures of effect size and when appropriate to use
Eta squared (η²), one-way ANOVA: between-groups variability as a proportion of the total variability
η² = SSbetween / (SSbetween + SSerror), i.e., SSbetween / SStotal (in a one-way design SSerror is SSwithin)
Partial eta squared (ηp²), factorial design: effect size for each main effect and for the interaction; expresses SSIV as a proportion of SSIV + SSerror
ηp² = SSIV / (SSIV + SSerror)
Benchmarks for ηp²: .01 (small), .06 (medium), ≥ .14 (large) (same benchmarks for eta squared)
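Both formulas are simple ratios of sums of squares. This sketch assumes made-up SS values; `benchmark` is a hypothetical helper that applies the .01 / .06 / .14 cutoffs listed above.

```python
def eta_squared(ss_between, ss_within):
    """One-way ANOVA: SS_between / (SS_between + SS_within), i.e. SS_between / SS_total."""
    return ss_between / (ss_between + ss_within)

def partial_eta_squared(ss_effect, ss_error):
    """Factorial ANOVA: SS_effect / (SS_effect + SS_error) for one main effect or the interaction."""
    return ss_effect / (ss_effect + ss_error)

def benchmark(es):
    """Rough label using the .01 / .06 / .14 benchmarks."""
    if es >= 0.14:
        return "large"
    if es >= 0.06:
        return "medium"
    if es >= 0.01:
        return "small"
    return "below small"

# Made-up SS values for a factorial design with SS_error = 200
for name, ss in (("IV1", 30), ("IV2", 10), ("IV1 x IV2", 5)):
    es = partial_eta_squared(ss, 200)
    print(f"{name}: partial eta squared = {es:.3f} ({benchmark(es)})")
```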
Calculating F
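F = MSbetween / MSwithin, where MSbetween = SSbetween / df1 and MSwithin = SSwithin / df2
-We do an ANOVA instead of a t-test for each pair of groups because every extra test at alpha = .05 adds another chance of a Type I error: with m independent tests, the familywise error rate is 1 - (1 - .05)^m, not .05
-The single omnibus F test keeps the Type I error rate at .05 for the overall null hypothesis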
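A minimal sketch of both ideas, assuming made-up SS values (3 groups, 9 participants in total) and independent tests for the familywise calculation; the function names are hypothetical.

```python
def f_from_ss(ss_between, ss_within, k, n_total):
    """F = MS_between / MS_within for a one-way independent-groups ANOVA."""
    df1, df2 = k - 1, n_total - k   # between-groups and within-groups df
    ms_between = ss_between / df1
    ms_within = ss_within / df2
    return ms_between / ms_within, df1, df2

def familywise_error(alpha, m):
    """Chance of at least one Type I error across m independent tests, each run at alpha."""
    return 1 - (1 - alpha) ** m

f, df1, df2 = f_from_ss(ss_between=54, ss_within=6, k=3, n_total=9)
print(f"F({df1}, {df2}) = {f:.1f}")   # MS_between = 54/2 = 27, MS_within = 6/6 = 1

for m in (1, 3, 6):
    print(f"{m} test(s) at .05 -> familywise error rate {familywise_error(0.05, m):.3f}")
```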
Marginal means
Averages across cells: gives the mean for 1 level of 1 IV across all levels of the other IV (in a 2 × 2 design, across 2 cells) -Ex. found the average of all low- and all high-frequency words; found the average of all dim and all bright words
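A minimal sketch for a 2 × 2 design. The cell means are made-up scores, and the code assumes equal n per cell, so each marginal mean is simply the average of the relevant cell means; `marginal_mean` is a hypothetical helper.

```python
from statistics import mean

# Made-up cell means: (frequency, luminance) -> mean score
cell_means = {
    ("low", "dim"): 620, ("low", "bright"): 580,
    ("high", "dim"): 560, ("high", "bright"): 540,
}

def marginal_mean(level, position):
    """Average the cell means for one level of one IV across all levels of the other IV.
    position 0 = frequency, position 1 = luminance; assumes equal n in every cell."""
    return mean(v for key, v in cell_means.items() if key[position] == level)

for freq in ("low", "high"):
    print(freq, marginal_mean(freq, 0))
for lum in ("dim", "bright"):
    print(lum, marginal_mean(lum, 1))
```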
Sample size
Increased sample size = greater power and precision (shorter CIs)
Correlation and Regression
Like correlation, regression is based on a data set of (X, Y) pairs -Large correlations: X predicts Y more strongly (smaller prediction errors) -Small correlations: X only gives part of the story
Understand line of best fit: examples of simple, multiple linear regression
Simple linear regression: the regression line goes through the means of X and Y (the point marked by crossed vertical and horizontal lines in the plot) -The regression line is selected to minimize the estimation error, i.e., the sum of squared residuals (equivalently, the SD of the residuals)
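A minimal least-squares sketch with made-up (X, Y) pairs. It shows that the fitted line passes through (M_X, M_Y), that the residuals sum to zero, and that nudging the slope makes the squared error larger; the helper names are hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Simple linear regression Y' = a + bX chosen to minimise the sum of squared residuals."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx     # slope
    a = my - b * mx   # intercept: forces the line through (M_X, M_Y)
    return a, b

def sse(intercept, slope, xs, ys):
    """Sum of squared residuals for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"Y' = {a:.2f} + {b:.2f}X; prediction at M_X: {a + b * mean(xs):.2f}; M_Y: {mean(ys)}")
print(f"SSE at best slope: {sse(a, b, xs, ys):.2f}; with slope + 0.1: {sse(a, b + 0.1, xs, ys):.2f}")
```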
Relationship/difference between a scatterplot and Pearson's r, and how the test for r generally works
Pearson's r measures only the linear component of an (X, Y) relationship, so we need to see the scatterplot, because r doesn't tell the full story -Viewing a scatterplot can help reveal range restriction -Quadrants (drawn through the means of X and Y) can help eyeball any r -When we calculate a correlation, we are interested in the correlation in the population (ρ), not the sample; the null hypothesis usually states that ρ = 0
Sum of Squares
SS is the sum, over all observations, of the squared differences of each observation from the overall (grand) mean; it measures the total variation around the mean
SS (for repeated measures/within subjects)
SS Within is split into SS Subject (subject variability) and SS Error (error variability): SS Error = SS Within - SS Subject
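A minimal one-way repeated-measures ANOVA sketch, assuming made-up scores for 4 participants in 3 conditions; `rm_anova` is a hypothetical helper. It removes SS Subject from SS Within and uses df2 = (k - 1)(n - 1).

```python
from statistics import mean

def rm_anova(data):
    """One-way repeated-measures ANOVA.
    data[i][j] = score of participant i in condition j (every participant in every condition)."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]

    ss_between = n * sum((m - grand) ** 2 for m in cond_means)
    ss_within = sum((row[j] - cond_means[j]) ** 2 for row in data for j in range(k))
    ss_subject = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_within - ss_subject   # individual differences removed from the error term

    df1, df2 = k - 1, (k - 1) * (n - 1)
    f = (ss_between / df1) / (ss_error / df2)
    return f, df1, df2

data = [[5, 6, 8], [3, 5, 6], [6, 8, 9], [2, 3, 5]]   # made-up scores
f, df1, df2 = rm_anova(data)
print(f"F({df1}, {df2}) = {f:.1f}")
```

For these numbers SS Within = 33, of which about 32.3 is subject variability, so SS Error is only about 0.67 and F(2, 6) = 81. Treating the same scores as independent groups would leave all 33 in the error term: F(2, 9) = 9 / (33/9), roughly 2.45, which is why RM designs tend to be more powerful.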
Split Plot ANOVA, repeated measures
Split plot uses one between-subjects and one within-subjects variable (lecture example: drink × gender) -To compare the means of each drink condition separately for males and females, conduct 2 one-way repeated-measures ANOVAs (one per gender), followed by post hoc tests; these are the tests of simple main effects
Type I error (alpha)
Stating that there is an effect when none exists (rejecting the null hypothesis when it is true) -a false positive
Type II error (beta)
Stating there is no effect when one exists (failing to reject the null hypothesis when it's false) (ex. a pregnancy test says a pregnant woman is not pregnant) -a false negative -there is an effect, but it was missed -reminder: we reject the null only if p < .05 (alpha)
CI on the difference between means
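The CI is on the difference between the two group means (M2 - M1) (figure 7.2 in textbook) -The farther apart the two means (relative to the length of the CI), the more evidence there is of a difference -If the 95% CI on the difference excludes 0, the two-tailed p value for the difference is less than .05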
In an effort to demonstrate the replicability of his research, Andy conducts the same study 50 times; each of those 50 studies has a power of .8. In how many of the 50 studies does Andy expect to obtain a p value less than .05? Assume there is an effect (difference).
The generally accepted power is 0.8 or greater, meaning an 80% chance of finding a statistically significant difference when the effect (of the target size) exists -Expect 0.8 × 50 = 40 of the 50 studies to obtain a p value less than .05 (on average; the actual count varies from run to run)
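A minimal simulation sketch, assuming each study is a two-tailed z test (a normal approximation, not a t test) whose true effect is set so that power is .80. The delta formula and the seed are illustrative assumptions.

```python
import random
from statistics import NormalDist

Z = NormalDist()
z_crit = Z.inv_cdf(0.975)            # 1.96 for a two-tailed test at alpha = .05
delta = z_crit + Z.inv_cdf(0.80)     # true effect (in SE units) that gives power of about .80

def power_two_tailed_z(delta, z_crit):
    """P(|Z_obs| > z_crit) when Z_obs ~ Normal(delta, 1)."""
    return (1 - Z.cdf(z_crit - delta)) + Z.cdf(-z_crit - delta)

random.seed(1)
n_studies = 50
significant = sum(abs(random.gauss(delta, 1)) > z_crit for _ in range(n_studies))

print(f"power = {power_two_tailed_z(delta, z_crit):.3f}")
print(f"expected significant studies: {0.8 * n_studies:.0f}; this simulated run: {significant}")
```

Rerunning with other seeds makes the count dance around 40, which is the "on average" in the answer above.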
p-value
The probability of obtaining a result as extreme as, or more extreme than, the observed result, assuming the null hypothesis is true -It forms the basis for deciding whether results are statistically significant: if p is less than alpha (.05), reject the null
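A minimal sketch for a z statistic, assuming a two-tailed test; for t or F statistics the same logic applies with their own distributions. `two_tailed_p` is a hypothetical helper.

```python
from statistics import NormalDist

def two_tailed_p(z_obtained):
    """P(a result at least this extreme in either direction | H0 true), for a z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z_obtained)))

alpha = 0.05
for z in (1.0, 2.0, 3.0):
    p = two_tailed_p(z)
    print(f"z = {z}: p = {p:.4f} -> {'reject' if p < alpha else 'fail to reject'} H0")
```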
What's beta weight?
The slope of the standardized regression line of ZY on ZX is referred to as β, the standardized regression weight (standardized regression coefficient; the slope in the standardized regression equation) -Beta weight equals the correlation coefficient r when there is a single predictor variable -It measures how strongly each predictor variable influences the criterion (dependent) variable -Beta is measured in units of standard deviation
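A quick check of "beta equals r with one predictor", using made-up (X, Y) pairs. Standardizing both variables and refitting the slope gives r; so does rescaling the raw slope by s_X / s_Y. The helper names are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw scores to z scores (any consistent choice of SD gives the same slope)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b = ols_slope(xs, ys)                          # raw-score slope: Y units per X unit
beta = ols_slope(z_scores(xs), z_scores(ys))   # slope of Z_Y on Z_X
r_from_b = b * pstdev(xs) / pstdev(ys)         # beta = b * (s_X / s_Y)
print(f"b = {b:.3f}, beta = {beta:.3f}, b * sX/sY = {r_from_b:.3f}")
```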
Post Hoc Tests
We do post hocs to find out where the differences occurred among the groups
-Run them only when there is a statistically significant difference in group means, e.g., a significant main effect in a one-way ANOVA; with a significant interaction, test simple main effects first and run post hocs within those
-The Tukey procedure is a post hoc test based on the studentized range distribution; the Tukey HSD tests all pairwise comparisons between group means
-Bonferroni is also considered a post hoc test: it divides alpha by the number of comparisons; it is flexible, easy to compute, and can be used with any type of statistical test (e.g., correlations), not just ANOVA
-Without such a correction, running many comparisons could produce a result that looks statistically significant even when there is no real effect (a Type I error)
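A minimal Bonferroni sketch, assuming made-up uncorrected p values from pairwise tests among three groups. Tukey HSD is not sketched because it needs the studentized range distribution, which Python's standard library doesn't provide.

```python
from itertools import combinations

def bonferroni(p_values, alpha=0.05):
    """Bonferroni: test each of m comparisons at alpha / m (equivalently, multiply each p by m)."""
    m = len(p_values)
    return {name: p < alpha / m for name, p in p_values.items()}

groups = ["A", "B", "C"]
pairs = [f"{g1} vs {g2}" for g1, g2 in combinations(groups, 2)]
p_values = dict(zip(pairs, [0.004, 0.030, 0.200]))   # made-up uncorrected p values
print(bonferroni(p_values))
```

With 3 comparisons the threshold is .05 / 3, about .0167. The made-up p = .030 would pass an uncorrected test but not the corrected one.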
Effect size measures (Cohen's D)
A standard effect size measure, expressed in SD units, so results can be compared even if they are measured on different scales -Effect size (in general): the amount of anything of the researcher's interest, usually what the dependent variable measures in a study -The population ES is simply the true value of an effect in the population; the sample ES is calculated from the data and is typically used as our best point estimate of the population ES -For 2-condition designs: independent groups design, d = (M2 - M1) / sp (sp = pooled SD); paired design, d = Mdiff / sav (sav = the average of the two condition SDs, √((s1² + s2²)/2))
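A minimal sketch of both formulas with made-up data. It assumes sav = √((s1² + s2²)/2) as the paired-design standardizer; check your textbook's definition if it differs.

```python
import math
from statistics import mean, stdev

def d_independent(g1, g2):
    """d = (M2 - M1) / s_p, with s_p the pooled SD of two independent groups."""
    n1, n2 = len(g1), len(g2)
    sp = math.sqrt(((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2) / (n1 + n2 - 2))
    return (mean(g2) - mean(g1)) / sp

def d_paired(pre, post):
    """d = M_diff / s_av, with s_av = sqrt((s1^2 + s2^2) / 2) from the two condition SDs."""
    m_diff = mean(b - a for a, b in zip(pre, post))
    s_av = math.sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return m_diff / s_av

print(d_independent([3, 4, 5], [6, 7, 8]))   # made-up groups
print(d_paired([3, 4, 5], [5, 7, 6]))        # made-up pretest / posttest
```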
Regression models with two or more predictor variables: multiple regression
b is how much change in the outcome variable results from one unit of change in the respective predictor variable, after controlling for the other predictor variable(s) -Controlling for the other predictor is critical when predictor variables are non-orthogonal (correlated with each other) -The amount of unique variance accounted for by each predictor can be quantified using partial R² -It is preferable to choose predictors based on a priori hypotheses, but a data-driven (stepwise) approach can also be used to enter predictor variables: forward selection, backward selection, or forward and backward selection
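How controlling for a correlated predictor changes the weights: a minimal sketch that computes standardized betas for two predictors from the three correlations (standard two-predictor formulas). The correlation values are made up, and `two_predictor_betas` is a hypothetical helper.

```python
def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardised weights for Z_Y' = beta1*Z_1 + beta2*Z_2, computed from the three correlations."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom   # X1's weight after controlling for X2
    beta2 = (r_y2 - r_y1 * r_12) / denom   # X2's weight after controlling for X1
    r_squared = beta1 * r_y1 + beta2 * r_y2   # proportion of Y variance explained by both
    return beta1, beta2, r_squared

# Same made-up predictor-criterion correlations, orthogonal vs correlated predictors
print(two_predictor_betas(0.5, 0.4, 0.0))
print(two_predictor_betas(0.5, 0.4, 0.6))
```

When the predictors are orthogonal (r12 = 0), each beta equals its simple r. When they are correlated, both betas shrink and R² is smaller than the sum of the squared rs, because the predictors share variance.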
power in independent groups design
depends on: design, chosen alpha (Type I error rate), target effect size, size of each group -the larger N is, the higher the power
power in paired design
depends on: target effect size, N, and the population correlation between the 2 measures (ex. pretest and posttest) -the higher that correlation, the higher the power
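A rough sketch of how these factors move power. It assumes a normal approximation to the t test (ignoring the tiny opposite tail), so the numbers are approximate; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()
Z_CRIT = Z.inv_cdf(0.975)   # two-tailed alpha = .05

def power_independent(d, n_per_group):
    """Approximate power for two independent groups of size n each."""
    return 1 - Z.cdf(Z_CRIT - d * math.sqrt(n_per_group / 2))

def power_paired(d, n, rho):
    """Approximate power for a paired design; rho = population correlation between the 2 measures."""
    d_z = d / math.sqrt(2 * (1 - rho))   # effect size in units of the SD of the difference scores
    return 1 - Z.cdf(Z_CRIT - d_z * math.sqrt(n))

for n in (20, 64):
    print(f"independent, d = .5, n = {n} per group: power ~ {power_independent(0.5, n):.2f}")
for rho in (0.0, 0.5, 0.8):
    print(f"paired, d = .5, N = 32, rho = {rho}: power ~ {power_paired(0.5, 32, rho):.2f}")
```

The trend matches the cards: power rises with N, and in a paired design a higher correlation between the two measures shrinks the SD of the differences and raises power.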
Between subjects
different people are tested in different groups, so each participant experiences only one condition
Within group variability for independent groups ANOVA
error/random sampling variability only (the denominator of F) -Under the null hypothesis, the F ratio should be about 1, because there would be no variability among population means -When there is variability among population means (the numerator increases), the F statistic will increase -If the null is true, large F values are unlikely
post hoc power
observed power, calculated after running study using obtained effect size as target
Between group variability for independent groups ANOVA
made up of two kinds of variability: error/random sampling variability + variability among population means (the numerator of F)
Statistical power
probability of detecting differences when they truly exist: the chance that our experiment will find a statistically significant effect of a particular size -also the chance that a replication will find a statistically significant effect -also the probability of correctly rejecting the null hypothesis (not making a Type II error): power = 1 - β -requires dichotomous hypotheses (H0 and H1), both with exact values -as power increases, the probability of making a Type II error decreases
Within subjects
repeated measures: all participants are measured in each condition -we can remove the variability due to overall subject differences, making the denominator of the F test smaller
Dance of the means
take a lot of independent samples from a population: the sample means vary, but they cluster around the population mean -used to visualize the differences between samples and how they relate to the population -smaller samples vary more; larger ones stay closer to the population mean -shows the extent of variability from sample to sample
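A minimal simulation of the dance, assuming a made-up normal population (mean 50, SD 20) and a fixed random seed; `dance_of_means` is a hypothetical helper.

```python
import random
from statistics import mean, stdev

def dance_of_means(pop_mean, pop_sd, n, n_samples, seed=0):
    """Means of n_samples independent samples (each of size n) from a normal population."""
    rng = random.Random(seed)   # fixed seed so the "dance" is reproducible
    return [mean(rng.gauss(pop_mean, pop_sd) for _ in range(n)) for _ in range(n_samples)]

results = {}
for n in (10, 100):
    results[n] = dance_of_means(pop_mean=50, pop_sd=20, n=n, n_samples=1000)
    print(f"N = {n}: means cluster around {mean(results[n]):.1f}, "
          f"SD of the means = {stdev(results[n]):.2f}")
```

Theory predicts the SD of the means (the standard error) to be about 20/√10 ≈ 6.3 for N = 10 and 20/√100 = 2 for N = 100, so the larger samples dance much less.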
95% CI
an interval calculated from sample data, one from an infinite sequence of such intervals, 95% of which would contain the true population mean (19 out of 20 times) -quantifies our uncertainty -when the population SD (σ) is known: [M - 1.96 × σ/√N, M + 1.96 × σ/√N]
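A minimal sketch of the known-σ formula, plus a long-run check of the "19 out of 20" interpretation. It assumes a made-up normal population (μ = 100, σ = 15) and a fixed seed.

```python
import math
import random
from statistics import NormalDist, mean

def ci_known_sigma(m, sigma, n, level=0.95):
    """[M - z*sigma/sqrt(N), M + z*sigma/sqrt(N)], assuming the population SD sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    moe = z * sigma / math.sqrt(n)               # margin of error
    return m - moe, m + moe

# Long-run check: many samples from the made-up population
rng = random.Random(42)
n_reps, mu, sigma, n = 2000, 100, 15, 36
captured = 0
for _ in range(n_reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lo, hi = ci_known_sigma(mean(sample), sigma, n)
    captured += lo <= mu <= hi
print(f"{captured / n_reps:.3f} of the intervals captured mu")
```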
Central Limit Theorem
the sum or mean of a number of independent measurements has approximately a normal distribution -the more measurements (or N), the closer the distribution will be to "normal" -as long as all individual components are independent and not affected by other factors, the theorem applies. -even if population has quite a different shape, sampling distribution will be close to normal
target effect size
the value of the population effect size specified by the alternative hypothesis: the population effect size for which we calculate statistical power
Split Plot ANOVA
-One within-subjects and one between-subjects variable
-Can also get subject-level means across each soft drink condition
-When there is a significant interaction: the simple main effects tests are one-way RM ANOVAs, one for the males and one for the females (one for each level of the between-subjects IV)
1. When there is a significant simple main effect (ex. within females, drink has a significant effect), do 3 paired t-tests as post hoc tests, one for each pair of drink conditions
Within Subjects factorial design
-Repeated measures 2-way ANOVA
-No separate groups per cell in a fully repeated measures design: each individual is exposed to every cell (e.g., a 2 × 2 design with the same 10 people in every condition)
-Cell means: the mean across all participants within a particular condition
-Can also get marginal means: the mean for each IV condition across both levels of the other IV (lecture example: mean of both absent columns, both present columns, both 0 columns, and both 8 columns)
-Can also find subject means, to see how much variation is due to individuals rather than condition
-Post hoc test = paired t-test
-The interaction takes precedence over main effects
-Uses partial eta squared as the effect size measure
F distribution
-Has 2 df values: one associated with the numerator (between groups) and one with the denominator (within groups) -df1 = # of groups - 1 -df2 = # of participants - # of groups -When the null is true, F values cluster close to 1 (the distribution is positively skewed)