Advanced Statistics II Test 1


Problems with the t-test

When using a t-test, one can oversimplify or mask variability within the groups being assessed. Ex: do men and women score differently on tests of intelligence? -By using a t-test you make the assumption that gender is binary --so some people are excluded or not accurately described.

What is the single biggest factor we can use to estimate the probability of T2 errors?

SAMPLE SIZE! -If we don't have a large enough N, we will not be able to identify a significant effect even when one exists (i.e., a T2 error). See the sketch below for how power climbs with N.
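A minimal sketch of the N-power relationship using Python's statsmodels; the medium effect size (Cohen's d = 0.5) and alpha = .05 are assumed values chosen for illustration:

```python
# Sketch: how power (1 - beta) grows with per-group sample size for a
# two-sample t-test. Effect size d = 0.5 and alpha = .05 are assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (10, 25, 50, 100):
    power = analysis.power(effect_size=0.5, nobs1=n_per_group, alpha=0.05)
    print(f"n = {n_per_group:3d} per group -> power = {power:.2f}, "
          f"P(Type II error) = {1 - power:.2f}")
```

With d = 0.5, power is only about .19 at n = 10 per group but about .94 at n = 100, which is why sample size is the biggest lever on Type II error.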

Calculating Difference Scores

The data of interest in a repeated-measures study are difference scores, calculated using the following formula: D = X1 - X2. The sign of each D score tells you the direction of change. The t formula for a repeated-measures design becomes: t = (D-bar - uD) / s(D-bar), where D-bar is the sample mean of the difference scores and s(D-bar) is their estimated standard error. Note that this is again the same basic formula. (A sketch follows.)
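A minimal sketch of the difference-score logic in Python, with made-up pre/post scores; it checks the hand formula against scipy's paired t-test:

```python
# Sketch: difference scores and the repeated-measures t-test.
import numpy as np
from scipy import stats

x1 = np.array([12, 15, 11, 14, 13, 16])   # time 1 (hypothetical scores)
x2 = np.array([10, 14, 10, 11, 12, 13])   # time 2

d = x1 - x2                                # D = X1 - X2; sign = direction of change
d_bar = d.mean()
se_d = d.std(ddof=1) / np.sqrt(len(d))     # estimated standard error of D-bar
t_manual = (d_bar - 0) / se_d              # H0: uD = 0

t_scipy, p = stats.ttest_rel(x1, x2)       # same test via scipy
print(t_manual, t_scipy, p)                # the two t values match
```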

Homogeneity of variance

Definition: states that the 2 populations being compared must have the same or roughly equivalent variance. The importance of this assumption increases when there are large differences in sample size -if one sample variance is more than 3 to 4x larger than the other, there is cause for concern. Why does it matter? It affects the accuracy of the t-test (biases it up or down and makes it inaccurate; there is now a factor other than the mean on which the groups differ). Hartley's F-max test is used to test for homogeneity of variance across 2 or more independent samples ---if it shows that HOV is violated, then the t-test is probably not the right test to use. (A sketch of the F-max statistic follows.)
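A minimal sketch of the F-max statistic with made-up samples; a formal decision needs Hartley's critical-value table, so this just applies the 3-4x rule of thumb from the card:

```python
# Sketch: Hartley's F-max = largest sample variance / smallest sample variance.
import numpy as np

groups = [np.array([4, 6, 5, 7, 6]),
          np.array([2, 9, 4, 11, 3]),
          np.array([5, 5, 6, 6, 5])]

variances = [g.var(ddof=1) for g in groups]
f_max = max(variances) / min(variances)
print(f"sample variances: {np.round(variances, 2)}, F-max = {f_max:.1f}")
if f_max > 3:   # rough rule of thumb from the card, not the tabled critical value
    print("Homogeneity of variance looks violated -- reconsider the t-test.")
```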

ANOVA Terminology: factor, single factor design, factorial design, levels

-In ANOVA, an independent variable (a true IV, which requires random assignment) or a quasi-independent variable (already assigned: race, gender, etc.) is called a factor. -A research study that involves only one factor is called a single-factor design. -A study with more than 1 factor is called a factorial design. -The treatment conditions that make up a factor are called levels, and the number of levels is symbolized by k. Ex: IV 1: self-presentation tactics, k = 2 (ingratiation, self-promotion); IV 2: reference groups, k = 2 (peers, social influencers); DV: self-esteem. Thus, a 2x2 design.

Power Analysis

-You do this before the experiment, typically during the proposal stage. -Frequently required by committees ----they want to see that you've got a chance of finding the effect, if you're right. ----Power analysis is based on your sample size, alpha level, and how big an effect you expect. ---It's really not a "big deal" ----Cohen (1988) did all the work for us, and how to actually run the analysis comes later. (A quick sketch follows.)
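A minimal sketch of an a priori power analysis in Python; the "medium" effect size (d = 0.5) is an assumed input, standing in for whatever effect you expect:

```python
# Sketch: solve for the sample size that gives 80% power at alpha = .05,
# assuming an expected effect of d = 0.5 (two-sample t-test).
from statsmodels.stats.power import TTestIndPower

n_needed = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group needed: {n_needed:.1f}")   # ~64 per group
```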

Testwise vs. Experimentwise Alpha Levels

-T1 error rate = the alpha level of one hypothesis test. -T1 errors are cumulative within the experiment. -With alpha = .05, any given correlation will be declared significant by chance 5% of the time ---> this is a testwise alpha. *Testwise alpha* = the alpha for just 1 hypothesis test. -Think of a correlation matrix with 6 variables: it contains 15 unique correlations, so summing the alphas gives .75 (.05 x 15) ---- roughly a 75% chance that something will be significant even if nothing is truly related. With 7 variables the sum is 1.05 -- not a real probability, which shows the sum is only an upper bound (the exact chance, for independent tests, is 1 - (1 - .05)^c; see the sketch below). THUS, even if nothing is truly related, one correlation will probably be extreme enough by chance that something will "work" (more variables = higher chance of T1 errors) ---> this is an experimentwise alpha. *Experimentwise alpha* = the total alpha level for the entire study, based on all hypothesis tests run (.75). -To approximate the experimentwise alpha, sum the alphas for all the tests you ran (.75 total vs. .05 for one test). ---SO T1 ERRORS ADD UP AS WE DO MORE AND MORE HYPOTHESIS TESTS --> EVENTUALLY SOMETHING WILL WORK AND BE SIGNIFICANT.
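A minimal sketch comparing the card's additive figure with the exact familywise rate (assuming independent tests) for the 6-variable matrix:

```python
# Sketch: testwise vs. experimentwise alpha for a 6-variable correlation matrix.
from math import comb

alpha, n_vars = 0.05, 6
n_tests = comb(n_vars, 2)                  # C(6, 2) = 15 correlations to test
additive = n_tests * alpha                 # the card's sum: an upper bound
exact = 1 - (1 - alpha) ** n_tests         # P(at least one Type I error)
print(n_tests, additive, round(exact, 2))  # 15, 0.75, 0.54
```

Either way the message is the same: run 15 tests at a testwise alpha of .05 and the odds of at least one spurious "significant" result are better than a coin flip.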

The logic of ANOVA

-The first step is to determine the total variability for the set of data. -The next step is to break the total variability down into two separate, basic components: *Between-treatments variance* measures the variability between treatments in order to provide a measure of the overall differences between the treatment conditions or sample means (numerator of the F ratio). *Within-treatments variance* provides a measure of the variability within each treatment condition (denominator of the F ratio). Breaking the total variance of the DV down into components: the total variance in the DV is made up of between-treatments and within-treatments variance, which should sum to the total. (A sketch of this partition follows.)
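A minimal sketch of the partition with made-up data for three groups, verifying that SS(between) + SS(within) = SS(total):

```python
# Sketch: total variability splits exactly into between- and within-treatments
# pieces (shown here as sums of squares).
import numpy as np

groups = [np.array([3, 5, 4, 6]),
          np.array([7, 8, 6, 9]),
          np.array([4, 4, 5, 3])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_total, ss_between + ss_within)   # 40.67 = 28.67 + 12.0
```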

Hypotheses for the Single factor, Independent-Measures Design

-The goal of ANOVA is to help the researcher decide between the following 2 interpretations. H0: There really are no differences between the populations or treatments. Any observed differences are simply due to chance or sampling error (inherent in this framing is that the sample means represent the populations and are generalizable to the populations/treatments). Ex: H0: u1 = u2 = u3 = ... = uk. H1: The differences between sample means represent real differences between the populations or treatments. The sample data accurately reflect real differences in the populations. That is, at least one population mean is different from the others. ALL AN ANOVA TELLS US: one group's average score is different from another's; if you have specific hypotheses, you need a follow-up test to determine which one is different.

Analysis of Variance (ANOVA)

-A hypothesis-testing procedure that is used to evaluate mean differences between 2 or more treatments or populations. (A t-test can compare only 2 treatments or populations -- though it can compare different samples within one population, e.g., males vs. females.) -Being able to compare more than 2 treatments is a major advantage. -ANOVA can be used with either an independent-measures or a repeated-measures design. -The question: do these groups differ, on average, from one another? -Variance: how spread out scores are around the mean; o^2 = average squared deviation from the mean. What are we analyzing? We are always analyzing variance in the DV -- the total o^2 in the DV. (A usage sketch follows.)
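A minimal sketch of a one-way ANOVA in Python with scipy, reusing the made-up groups from the partition sketch above:

```python
# Sketch: single-factor, independent-measures ANOVA.
from scipy import stats

g1, g2, g3 = [3, 5, 4, 6], [7, 8, 6, 9], [4, 4, 5, 3]
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")   # p < .05 -> at least one mean differs
```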

The logic of hypothesis testing

-All hypothesis testing is based on probabilities; one can never be absolutely certain of being right or wrong.

Between-Treatments Variance

-Groups can differ because of a treatment effect, but that is not the only reason. Between-treatments variance can be caused by two different situations: -A true treatment effect exists between the groups, and therefore between the group means. -The differences between the groups (and therefore between the group means) are due to chance alone. -The problem with between-treatments variance is that the treatment effect and chance variation are mixed together, so it is impossible to measure the treatment effect by itself. ANOVA solves this problem by measuring chance differences by themselves through computing the within-treatments variance. Between treatments thus bundles together: true treatment effect, error (random), and individual differences (all measured under "between").

What is it called when we keep running tests?

-Capitalizing on chance: running analysis after analysis on things that are not related until eventually something works out for you.

Statistical Power

-Power is the probability that we will correctly identify a true effect: power = 1 - B. -We want enough power in a study to find the effect (see diagram in notes). In summary: -If there is an effect present and our experiment detects it (significant result), then we had good power and can reject the null. -If an effect was really there but our experiment found it to be n.s., then a Type II error has occurred. -If there really isn't an effect or relationship between our variables, but our experiment finds a significant one, then a Type I error has occurred. -If there is no effect in real life and our experiment also finds an n.s. effect, then we made the correct decision: there is no effect (fail to reject the null). THIS IS WHAT MODERN HYPOTHESIS TESTING (NHST) FOCUSES ON. (A simulation sketch follows.)
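A minimal sketch of power as a long-run probability: simulate many experiments in which a true effect exists and count how often the test detects it. The effect size (d = 0.5) and n = 50 per group are assumed values:

```python
# Sketch: estimating power by simulation. A true effect (d = 0.5) exists in
# every simulated experiment; power is the fraction of experiments detecting it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d, n_sims = 50, 0.5, 5_000
hits = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)          # true effect is present
    if stats.ttest_ind(control, treatment).pvalue < .05:
        hits += 1                              # effect correctly detected

print(f"simulated power = {hits / n_sims:.2f}")   # ~.70; so B = 1 - power ~ .30
```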

Assumptions of the t-statistic

-The values in the sample consist of independent observations. -The population sampled must be normal (normal curve). ----In reality, violating this second assumption has little practical effect on the results obtained using a t statistic. -This is particularly true when you have a large N (e.g., N = 30; with a bigger N, the errors will also be normally distributed and it's fine). -If you are using multiple populations, then we also assume that (a) both populations are normal, and (b) the populations have equal variances. Variance: how spread out scores are around the mean.

Main criticism of hypothesis testing

-We focus on the wrong "correct decision" --> fail to reject the null. -We should instead be using confidence intervals, which provide a range of values within which we're confident the "true value" lies. -This is logically distinct from NHST, but it allows you not to LOSE your information. Logical difference: 1. Hypothesis testing, as it's most commonly done, allows us to say, "With p = .05, we're 95% certain we won't identify a null relationship as being significant." 2. Confidence intervals allow us to say, "We're 95% certain the true value of the relationship we're testing falls between these two values." Ex: r = .37 (observed value), p < .05 --> with NHST all we can say is that we reject the null, but with a CI we can calculate that the observed value of .37 sits in an interval from .17 to .57 and say we are 95% certain the true value lies between .17 and .57 ---> SO WITH A CI WE GET MORE INFO THAN WITH NHST. Ex 2: IQ and shoe size, r = .12 (observed value), p = .50 --> with NHST we would say fail to reject the null, but with a CI we can say we are 95% confident that the true value lies between -.05 and .27. -The difference between the two examples is whether 0 is part of the interval; for the IQ example it is, so a zero relationship remains possible. ----If 0 is part of the interval, you fail to reject the null. ----If 0 is not part of the interval, you can reject the null. OVERALL, you don't lose the info you get from hypothesis tests if you use CIs instead of NHST. (A sketch of the CI calculation follows.)
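A minimal sketch of a 95% CI for a correlation via the Fisher z transformation. The card's example doesn't state the sample size; n = 70 is an assumed value that roughly reproduces the (.17, .57) interval:

```python
# Sketch: 95% confidence interval for r = .37 (n = 70 is an assumption).
import numpy as np
from scipy import stats

r, n = 0.37, 70
z = np.arctanh(r)                       # Fisher z transform of r
se = 1 / np.sqrt(n - 3)                 # standard error of z
z_crit = stats.norm.ppf(0.975)          # 1.96 for a 95% interval
lo, hi = np.tanh([z - z_crit * se, z + z_crit * se])
print(f"95% CI for r: ({lo:.2f}, {hi:.2f})")   # 0 outside -> also reject the null
```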

Type 1 Errors

-We make a Type I error when we obtain a statistically significant effect (p < .05), but in fact there is no "true" relationship between our variables; our effect is due to chance. -One can set the alpha a priori (before the experiment). If you set it at .05, 5% of the time you will get a T1 error; if you set it at .01, then 1% of the time. THE RESEARCHER HAS COMPLETE CONTROL OVER THE T1 ERROR RATE. The lower the alpha level, the more extreme results have to be to reach significance... so there is a tradeoff between guarding against T1 errors and making significant results easier to obtain.

Key terms for related samples/correlated samples:

1) A repeated-measures study is one in which a single sample of individuals is measured more than once on the same dependent variable. Thus, the same participants are used in all treatment conditions. ---Measure the same group of people at 2 times (change from T1 to T2). 2) Matched subjects: match people across groups on whatever variables should be held equal (experimental control); used when random assignment isn't possible. ---Each individual in one sample is matched with a participant in the other sample. The matching is done so that the 2 individuals are equivalent with respect to a particular variable that the researcher would like to control.

Other t-test designs

1. Independent-measures or between-subjects research designs use completely separate samples for each population (male, female) or treatment condition (drug, no drug). 2. Repeated-measures or within-subjects research designs obtain 2 sets of data from the same sample (pre- and post-treatment). This is what chapter 11 is about. The logic of the formulas for these 2 remains the same, but now we're using HYPOTHESIZED population parameters, whereas in the examples above we're using actual parameter values.

p<.05 means? (4 parts)

1. The probability of getting a result this extreme *by chance* is less than .05. Such results are *statistically significant* -- the difference is unlikely to be due to random chance alone. (If p is greater than .05, we fail to reject the null; this is not the same as the null being correct.) 2. If we reject the null and conclude that there is a real effect (difference, relationship, whatever) based on this finding, we will only be wrong 5% of the time. 3. When set a priori (before the experiment), .05 is referred to as our *alpha level*. 4. Our alpha level also defines our Type I error rate.

What are the 2 conclusions we can make when we hypothesis test?

1. Reject the null (p < .05) = differences found. 2. Fail to reject the null (not the same as accepting the null) = no differences found. NHST = null hypothesis significance testing; we are never actually testing the alternative hypothesis.

Total DV o^2 means?

100% of the variance in the DV; we are trying to figure out how much of the variance in the DV is due to the treatment (IV) and how much is due to error.

Type II Error

Definition: saying there is no effect (no treatment effect, no correlation, etc.) when in reality there is one. So: T1 error = saying an intervention lowers anxiety when in reality it does not; T2 error = saying an intervention is not helpful when in fact it would be = missing out on something that could be helpful in life ==> IMPORTANT. Cohen (1962) demonstrated that this error is INCREDIBLY common. Assuming a moderate effect size (relationship between variables), the studies reported in the 1960 Journal of Abnormal and Social Psychology had a 52% chance of making a Type II error. That is, over half the time, they wouldn't be able to detect an effect that was really present. Where Type I error is referred to as alpha, Type II error is labeled beta (B). -We still tend to focus on T1 errors, though.

F statistic (the ANOVA test statistic)

F = variance (differences) between sample means / variance (differences) expected by chance or error, i.e., observed differences / chance differences. -The F ratio is thus based on variance as a measure of mean difference. -If the sample means are clustered close together, the variance will be small. -If the sample means are spread over a wide range of values, the variance will be larger. Therefore, the variance in the numerator of the F ratio provides a single number that describes the differences among the sample means. THE VARIANCE CAPTURES HOW SPREAD OUT THE SAMPLE MEANS ARE, AS AN INDICATOR OF HOW DIFFERENT THE GROUPS ARE. (A sketch follows.)
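A minimal sketch of that idea: for equal group sizes, the F numerator is just n times the variance of the group means, and dividing by the average within-group variance reproduces scipy's F. The data are the made-up groups from the earlier sketches:

```python
# Sketch: the F ratio built directly from variances.
import numpy as np
from scipy import stats

groups = [np.array([3, 5, 4, 6]), np.array([7, 8, 6, 9]), np.array([4, 4, 5, 3])]
n = len(groups[0])                                    # equal group sizes

means = np.array([g.mean() for g in groups])
ms_between = n * means.var(ddof=1)                    # spread of the sample means
ms_within = np.mean([g.var(ddof=1) for g in groups])  # variance expected by chance
print(ms_between / ms_within,                         # F computed by hand: 10.75
      stats.f_oneway(*groups).statistic)              # matches scipy's F
```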

Differences between the t statistic and the f ratio

In each of these statistics, the numerator of the ratio measures the actual difference between populations or treatments obtained from sample data, and the denominator measures the difference that would be expected by chance: F = observed/chance. The denominator of the F ratio is often called the error variance in order to stress its relationship to the standard error in the t statistic. So the F test asks a question similar to the t test, but frames it in terms of variances: how much variance did we get across people, relative to how much variance we'd expect just by random chance?

Hypothesis Testing for a Related Samples t-test

Looking at difference scores: the difference from T1 to T2, or differences between matched individuals. Inferences are made about the theoretical probability distribution of difference scores, with inferences about the general population of difference scores made from the sample data. -H0 states that for the general population there is no change, no effect, or no difference: H0: uD = 0 ---so H0 says there is no change from T1 to T2. -H1 states that there is a treatment effect that causes the scores in one treatment condition to be systematically higher or lower than the scores in the other condition: H1: uD does not equal 0.

Null Hypothesis vs. Alternative Hypothesis

Null (H0) = nothing is going to happen; no effect/relationship. Alternative (H1) = some variables are thought to be related. -The null and alternative cannot both be true at the same time; they are mutually exclusive and together cover all possibilities. -To reject the null, you are hoping for results extreme enough that they aren't due to random chance.

What are the acceptable standards for Type 1 and Type 2 Errors?

T1: .05 (5% chance of a T1 error occurring). T2: .20 (20% chance of missing an effect that is truly present) ---- equivalently, power should be .80 (an 80% chance of detecting an effect when one is truly present). -We don't tend to focus on T2 error and power because a lot goes into power and it is harder to control.

T-test

The t statistic is used to test hypotheses about the population mean (u) when the value of the population standard deviation (o) is not known. The t formula uses the estimated standard error to quantify the expected amount of error (or distance) between the sample mean (M or x-bar) and the population mean. -The estimated standard error, s(x-bar) = s / sqrt(n), is computed from the sample standard deviation (s) and stands in for the true standard error when the population SD (o) is not known. Overall, the t-test tells us how much difference there is between two means relative to the amount of difference (the estimated standard error) we'd expect just by random chance -- how much difference between groups would occur just by chance?

Choosing alpha levels

The way we structure our statistical testing is designed to minimize Type I error: we want to be confident the effects we're seeing are "real." That's not the only kind of error, though... enter Type II errors!

Within Treatments Variance

The within-treatments variance provides a measure of how much variation it is reasonable to expect just by chance. It consists of two parts: -Individual differences: the backgrounds, characteristics, attitudes, etc. that participants bring with them to the experiment. -Experimental error: simple error that does not vary in any systematic way. Note that "error" due to individual differences is only an issue in the between-subjects model. This "error" can also be a focal element of a study: why do people differ within a group?

Independent Measures/Between Subjects T-test

t = [(x1 - x2) - (u1 - u2)] / s(x1-x2). The sample statistic is the difference between sample means. The population parameter is the difference between population means. So... same basic formula as before. The null hypothesis is that there's no difference at the population level, so the formula functionally reduces to: t = (x1 - x2) / s(x1-x2). (A sketch follows.)
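A minimal sketch of an independent-measures t-test with made-up data for two separate samples:

```python
# Sketch: between-subjects t-test on two hypothetical samples.
import numpy as np
from scipy import stats

group1 = np.array([23, 27, 25, 30, 26, 24])   # e.g., treatment
group2 = np.array([20, 22, 25, 19, 21, 23])   # e.g., control

t, p = stats.ttest_ind(group1, group2)        # pooled variance; assumes HOV holds
print(f"t = {t:.2f}, p = {p:.4f}")
# If homogeneity of variance is in doubt, equal_var=False gives Welch's t:
# stats.ttest_ind(group1, group2, equal_var=False)
```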

Sample vs. Population t formula

t = (x-bar - u) / s(x-bar), where u is the population mean and s(x-bar) is the estimated standard error -used when the population parameter (i.e., the mean) is known. Good for broad-level statistics; not used all that often in published research. It's a good one to know about and be able to use when you're evaluating interventions, though. H0: the sample mean equals the population mean (x-bar = u; the numerator is expected to be 0). H1: the sample mean and population mean are not equal. -These are two-tailed tests because they do not specify a direction. H0: M (sample mean) is less than or equal to u (population mean); H1: M is greater than u. Or: H0: M is greater than or equal to u; H1: M is less than u. ALL OF THESE ARE KNOWN AS ONE-TAILED TESTS BECAUSE THEY SPECIFY A DIRECTION. (A sketch follows.)
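A minimal sketch of the one-sample case in Python; the sample scores and the population mean of 100 are hypothetical values:

```python
# Sketch: comparing a sample mean to a known population mean.
import numpy as np
from scipy import stats

sample = np.array([94, 101, 97, 88, 99, 92, 95, 90])
pop_mean = 100                                   # known population mean (assumed)

t, p = stats.ttest_1samp(sample, pop_mean)       # two-tailed by default
print(f"t = {t:.2f}, p = {p:.4f}")
# One-tailed version (H1: mu < 100):
# stats.ttest_1samp(sample, pop_mean, alternative='less')
```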

General Form of the t-Formula

t = (sample statistic - hypothesized population parameter) / estimated standard error. Generally, all versions of the test take the form: observed difference between group means / difference expected by chance (standard error). Example of a research question where we would use a t test comparing a sample mean to a known population mean: do psychology graduate students at Xavier do significantly worse, in terms of salary after graduation, than psychology graduate students on average in the population?

