PSYCH 308 Exam 2


Hypothesis Testing: when σ not known

Can we just substitute s for σ? No, we can't simply substitute s for σ, because s is only an estimate of σ and is more likely to be too small (especially in small samples). However, we do it anyway, but call the answer t instead of z, and compare t to the tabled values of t instead of to the distribution of z.

What are degrees of freedom (df )?

Df is the number of independent pieces of information left after estimating a parameter. Statisticians "give up" one data point for each parameter estimated, which makes the results more conservative and helps account for potential error or flaws in the design.

What does sampling error have to do with all this? i.e. Why is sampling error important to understand?

Df helps compensate for the error that comes from sampling and working with humans

Sampling distribution of mean differences

Distribution approaches normal as n increases. Later we will modify this formula to "pool" variances, but for now this formula is fine.

How does regression differ from correlation?

Regression describes how the dependent variable changes numerically with the independent variable and lets us predict Y from X. Correlation only represents the strength and direction of the linear relationship between the two variables.

What is the problem when the population variance (or st. dev.) is not known?

The problem is that we can't simply substitute the sample standard deviation, s, for the population standard deviation, σ, because s is more likely to be smaller than σ. However, we do it anyway, but call the answer t instead of z.

How do we calculate the degrees of freedom with two groups?

Total df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2
Ex. You have two groups, each with a sample size of 100: 100 + 100 - 2 = 198 df

Degrees of freedom for a two independent t-test

Total df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2
Ex. If each group has 100 subjects, total df = 100 + 100 - 2 = 198

Distinguish between Type I and Type II errors.

Type I error: reject the null hypothesis when it is actually true (we say something is going on when in reality something is not going on); a false positive Type II error: fail to reject the null hypothesis when it is actually false (we say nothing is going on when in reality something is going on)

regression coefficients (Chapter 9)

a and b
b = slope
o The change in predicted Y for a one-unit change in X
a = intercept
o The value of the predicted Y (Y hat) when X = 0
To calculate the regression line we need to solve for a and b

What goes on the y axis?

dependent variable

Name the 3 basic parts of the CLT

*When you have a population with a mean of µ and a standard deviation of σ:
1. The sampling distribution of the mean is going to have a mean very close to µ
2. The sampling distribution of the mean is going to have a standard deviation equal to σ/sq rt(n)
3. As n, the sample size, increases, the distribution is going to become more and more normal
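These three claims can be checked by simulation. A minimal sketch (the population, sample size, and repetition count below are made up for illustration); note the parent population is deliberately skewed, yet the sample means still behave as the CLT predicts:

```python
import random
import statistics

random.seed(42)

# Hypothetical skewed population: exponential with mean 5 (so mu = 5, sigma = 5)
mu, sigma = 5.0, 5.0
n = 100        # sample size
reps = 10000   # number of repeated samples

sample_means = []
for _ in range(reps):
    sample = [random.expovariate(1 / mu) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)  # part 1: close to mu (~5.0)
sd_of_means = statistics.stdev(sample_means)   # part 2: close to sigma/sqrt(n) (~0.5)

print(round(mean_of_means, 2))
print(round(sd_of_means, 2))
```

Plotting `sample_means` as a histogram would show part 3: the distribution of means looks roughly normal even though the population is skewed.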

Concepts critical to hypothesis testing

- Type I Error: reject the null hypothesis when it is actually true (we say something is going on when in reality nothing is going on); a false positive
- Type II Error: fail to reject the null hypothesis when it is actually false (we say nothing is going on when in reality something is going on)
- Critical values: these represent the point at which we decide to reject the null hypothesis
e.g. We might decide to reject the null when p(data | null) < .05. Our test statistic has some value with p = .05, which we find from tables. If our obtained statistic exceeds that critical value, we reject the null hypothesis because something is going on.
- One-tailed test
- Two-tailed test

Two major assumptions for a two-independent t-test

1. Both groups are sampled from populations with the same variance, i.e. the two populations have roughly the same variance
• "Homogeneity of variance" (we are not talking about the samples here, but the populations from which the samples were drawn)
2. Both groups are sampled from normal populations (the populations were big enough to have a normal distribution)
• Assumption of normality
• Frequently violated with little harm

What factors affect the value of t ?

1. The difference between the sample and population means (X bar - µ)
2. The magnitude of the sample variance (s^2)
3. The sample size (N)
4. The significance level (α)
5. Whether the test is one-tailed or two-tailed

Be able to perform any one of the t-test based off a written explanation. MEMORIZE the t-test equations. (Chapters 12, 13, 14)

?

What would a correlation of .00 tell us about the relationship between lab grades and exam grades?

A correlation of .00 would tell us that there is no correlation between the two variables and that H0 is true

What are the advantages and disadvantages of related samples?

Advantages
o Eliminates subject-to-subject variability (ex. If the first client had scores of 48 and 42, this would drastically increase the variability of the data, but the difference score is still 6, so the variability of the differences doesn't change.)
o Controls for extraneous variables (ex. A variable that influences client number 1 influences both before and after scores equally and is not reflected in the results.)
o Needs fewer subjects
Disadvantages
o Order effects: the effect of different conditions may carry over and influence each other
o Carry-over effects: any time you give a person some kind of treatment or intervention, there is a possibility that it will carry over and affect their after scores more severely than anticipated
o Subjects no longer naïve: using the same exam twice may influence scores the second time because subjects are more familiar with the exam
o Change may just be a function of time: PTSD symptoms may have gone down not because of supportive counseling but simply because time has passed and helped lessen the symptoms
o Sometimes it is not logically possible to do a related samples t test

Review the advantages and disadvantages to RELATED samples design. (Chapter 13)

Advantages
o Eliminates subject-to-subject variability (i.e. If the first client had scores of 48 and 42, this would drastically increase the variability of the data, but the difference score is still 6, so the variability of the differences doesn't change.)
o Controls for extraneous variables (A variable that influences client number 1 influences both before and after scores equally and is not reflected in the results)
o Needs fewer subjects
Disadvantages
o Order effects: the effect of different conditions may carry over and influence each other
o Carry-over effects: any time you give a person some kind of treatment or intervention, there is a possibility that it will carry over and affect their after scores more severely than anticipated
o Subjects no longer naïve: using the same exam twice may influence scores the second time because subjects are more familiar with the exam
o Change may just be a function of time (i.e. PTSD symptoms may have gone down not because of supportive counseling but simply because time has passed and helped lessen the symptoms)
o Sometimes it is not logically possible to do a related samples t test

Why do we call Y - Y hat a residual?

Because Y is the actual score and Y hat is the variation we expected (predicted); if you take all the variation and remove what you expected, you have what is left over. A residual is something that is left over, namely the unexplained variation.

Homogeneity of variance (Chapter 14)

Both groups are sampled from populations with the same variance; i.e. the two populations have roughly the same variance (we are not talking about the samples here, but the populations from which the samples were drawn)

What does the text mean by modifying the df when we have heterogeneity of variance?

Modifying the degrees of freedom simply means that when the variances are very unequal, suggesting heterogeneity of variance, we can't pool them, so we compensate by adjusting the degrees of freedom using the given equation.

Difference scores

Calculate the difference between the first and second score
o e.g. Difference = Before - After
Base subsequent analysis on the difference scores
o IGNORE THE BEFORE AND AFTER DATA AND FOCUS ONLY ON THE DIFFERENCE SCORES

Be able to recognize and use the equation for SSExplained (chapter 10)

Explained variation: the deviation of Y hat (or Y') from Y bar; what we anticipate Y to be. Sigma(Y hat (or Y') - Y bar)^2

Type II Error (chapter 8)

Fail to reject the null hypothesis when it is actually false (we say nothing is going on when in reality something is going on) • Ex. Assume the violent video does have an effect on associations and makes a difference, but we conclude that they don't make a difference • Probability denoted by β • Power= (1-β) = probability of correctly rejecting false null hypothesis (we won't really use this in this class)

Hypothesis Testing: when σ not known
T-test for one mean
Ex. Assume the following data from a study of watching violent versus nonviolent videos:
Mean number of aggressive free associates = 7.10
The sample size is 100
Assume that we know that without watching the aggressive video, the mean would = 5.65
We don't know σ, but we know that s = 4.40
Is there a large enough difference between 7.10 and 5.65 to conclude that watching the video affected the results?

H0: µ = 5.65
H1: µ DOES NOT = 5.65 (two-tailed)
Sample mean X bar = 7.10, s = 4.40, n = 100
Since σ is NOT known, we use a t-score (essentially the same as a z-score except we have s in place of σ):
t = (X bar - µ)/(s/sq rt(n)) = (7.10 - 5.65)/(4.40/sq rt(100)) = 3.30
Degrees of Freedom
• Skewness of the sampling distribution of the variance decreases as n increases
• t will differ from z less as the sample size increases
• Therefore, we need to adjust t accordingly
• df = n - 1 = 100 - 1 = 99
• t is based on df
We look up the t-score on the table. For a t distribution, ALWAYS ERR ON THE SIDE OF .05. This is a two-tailed test, so we look up the critical value for a two-tailed test at the .05 level of significance with df = 99: the critical value is 1.984. We need to exceed 1.984 in order to say that we have significant results.
Conclusions:
• With n = 100, t.05(99) = 1.98
• Because t = 3.30 > 1.98, reject H0
• Conclude that viewing the violent video leads to more aggressive free associates than normal
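The arithmetic above can be verified from the summary statistics alone. A quick sketch (assuming SciPy is available for the t critical value):

```python
import math
from scipy import stats  # assumed available for the t critical value

x_bar, mu, s, n = 7.10, 5.65, 4.40, 100

t = (x_bar - mu) / (s / math.sqrt(n))   # t = (X bar - mu) / (s / sqrt(n))
df = n - 1
t_crit = stats.t.ppf(1 - 0.05 / 2, df)  # two-tailed critical value, alpha = .05

print(round(t, 2))       # 3.3
print(round(t_crit, 3))  # 1.984
print(abs(t) > t_crit)   # True -> reject H0
```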

Testing Hypotheses: σ known
Ex. Assume the following data from a study of watching violent versus nonviolent videos:
Mean number of aggressive free associates = 7.10
The sample size is 100
Assume that we know that without watching the aggressive video, the mean would = 5.65 and the standard deviation = 4.5; these are parameters (µ and σ)
Is there a large enough difference between 7.10 and 5.65 to conclude that watching the video affected the results?

H0: µ = 5.65
H1: µ DOES NOT = 5.65 (two-tailed)
Sample mean X bar = 7.10, σ = 4.5, n = 100
Since σ is known, we use a z-score. First, we use the z equation to find the z-score:
z = (X bar - µ)/(σ/sq rt(n)) = (7.10 - 5.65)/(4.5/sq rt(100)) = 3.22
If z > +1.96, reject H0 (1.96 is the two-tailed critical value from the normal (z) table at the .05 level of significance)
3.22 > 1.96, therefore we reject H0 and conclude that the difference is significant and that watching violent videos does affect subsequent violent behavior.
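Since σ is known here, the whole calculation needs nothing beyond Python's standard library. A minimal sketch of the z-test above:

```python
import math
from statistics import NormalDist

x_bar, mu, sigma, n = 7.10, 5.65, 4.5, 100

z = (x_bar - mu) / (sigma / math.sqrt(n))     # z = (X bar - mu) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed critical value from the normal distribution

print(round(z, 2))       # 3.22
print(round(z_crit, 2))  # 1.96
print(abs(z) > z_crit)   # True -> reject H0
```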

Understand hypothesis testing and all its parts (chapter 8)

Hypothesis testing: a structured way of thinking about the problem; includes H0, H1, alpha, and the critical value; it is the scientific method applied to statistics
- Start with the hypothesis that subjects do not differ from the "norm"
o This is the null hypothesis, H0
- Find what normal subjects do
- Compare our subjects to the normal standard by using a critical value
Steps in hypothesis testing
- Define the null hypothesis
- Decide what we would expect to find if the null hypothesis were true
- Define the alternative hypothesis, H1
- Look at what you actually found
- Compare it to a critical value
- Reject the null hypothesis if what you found is not what you would expect under the null

Revised formula with pooled variances

IF YOU HAVE EQUAL SAMPLE SIZES, THERE IS NO REASON TO CALCULATE A POOLED VARIANCE, BECAUSE YOU WILL END UP WITH THE SAME t VALUE AS IF YOU HADN'T POOLED THEM (i.e. it's just extra work)

When would we not pool the variances?

If the sample sizes, n, of the two groups are equal, or if one variance is more than four times the other (which suggests heterogeneity of variance)

What does it mean to "pool" the variances?

If we assume that the population variances are equal, then you take a weighted average of the two sample variances for a better estimate of the population variance (this is only necessary if the sample sizes, n, are different, and only appropriate if the sample variances are homogeneous, i.e. neither is more than four times the other)

How do we calculate difference scores? What happens if we subtract before from after instead of after from before?

To calculate difference scores we simply subtract one score from the other. If we switch the order in which we subtract, the sign of each difference flips, and we have to account for that change (the t value changes sign but not magnitude).

How does the CLT apply to our statistics class?

It applies to almost everything we do in this class

Ex. Therapy for rape victims
We'll focus on a group that received Supportive Counseling
Measured post-traumatic stress disorder symptoms before AND after therapy

      Before   After   Dif
       21       15      6
       24       15      9
       21       17      4
       26       20      6
       32       17     15
       27       20      7
       21        8     13
       25       19      6
       18       10      8
Mean  23.89    15.67   8.22
SD     4.20     4.24   3.60

Mean = 8.22, Range = 11, s.d. = 3.60
Notice that we are ignoring the original scores; WE ARE ONLY FOCUSING ON THE DIFFERENCE SCORES
Was this enough of a change to be significant?
If there were no change, the mean of the differences should be zero
o So, test the obtained mean of the difference scores against µ = 0. IN A TWO SAMPLE RELATED T-TEST, µ WILL ALWAYS EQUAL 0!!!
We now have one sample of data (the differences); we have transitioned into a one-sample t test, and µ is our parameter
o Use the same test as in Chapter 12. We don't know σ, so use s and solve for t
D bar and sD = mean and standard deviation of the differences
t = (D bar - µ)/(sD/sq rt(n)) = 8.22/(3.60/sq rt(9)) = 6.85
df = n - 1 = 9 - 1 = 8
Now we use the t table to look up the critical value for df = 8, two-tailed, at the .05 significance level: t.05(8) = +/- 2.306
Since the obtained t of 6.85 is greater than the critical value of 2.306 (6.85 > 2.306), we reject H0 and conclude that the mean number of symptoms after therapy was significantly less than the mean number before therapy; supportive counseling seems to work
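This example can be reproduced from the raw before/after scores. A sketch (assuming SciPy is available); note the exact t comes out to 6.86, while the 6.85 above results from rounding sD to 3.60 before dividing:

```python
from scipy import stats  # assumed available

before = [21, 24, 21, 26, 32, 27, 21, 25, 18]
after = [15, 15, 17, 20, 17, 20, 8, 19, 10]

diffs = [b - a for b, a in zip(before, after)]
mean_diff = sum(diffs) / len(diffs)

# Related samples t-test: equivalent to a one-sample t test of the diffs against mu = 0
t, p = stats.ttest_rel(before, after)

print(round(mean_diff, 2))  # 8.22
print(round(t, 2))          # 6.86 (6.85 above uses rounded intermediates)
print(p < 0.05)             # True -> reject H0
```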

If the slope is negative, what does that tell us about the sign of r?

Negative. The sign of r always matches the sign of the slope: if the slope of the line is negative, then r is negative, and vice versa.

What is the central limit theorem (CLT)? (chapter 12)

Official definition: Given a population with a mean = µ and a standard deviation = σ, the sampling distribution of the mean (the distribution of sample means) has a mean = µ, and a standard deviation = σ/sq rt (n). The distribution approaches normal as n, the sample size, increases.

When would CovXY be large and negative?

One small number and one big number o Large x values matched with small y values and vice versa

Distinguish between one-tailed and two-tailed tests.

One-tailed test: directional; rejects the null only if the obtained value is too extreme in the one direction we set aside for rejection. Two-tailed test: nondirectional; rejects the null when the obtained value is too extreme in either direction (either too high or too low).
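The practical difference is where the critical value falls. A minimal sketch using the standard normal distribution at α = .05 (the α value is just for illustration):

```python
from statistics import NormalDist

alpha = 0.05
z_one_tailed = NormalDist().inv_cdf(1 - alpha)       # all .05 in one tail
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)   # .025 set aside in each tail

print(round(z_one_tailed, 3))  # 1.645
print(round(z_two_tailed, 2))  # 1.96
```

The two-tailed cutoff is farther out (1.96 vs 1.645), so a one-tailed test is easier to reject in the predicted direction but cannot reject at all in the other direction.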

When would CovXY be large and positive?

Our Xs and Ys will both be large OR Our Xs and Ys will both be small

Why do we have 8 df in our sample when we have 18 observations?

Our degrees of freedom are n - 1 for related samples t tests. Our n is the number of pairs or matches in our sample rather than the number of individual scores, so since we have 9 pairs, n = 9 and df = 9 - 1 = 8.

Type I Error (chapter 8)

Reject the null hypothesis when it is actually true (we say something is going on when in reality something is not going on); a false positive • Ex. Assume violent videos have no effect on associations, but we conclude that there is an effect • The probability of a type I error, α, is set by the researcher (α is usually set at .05) • The probability of type I error is .05; there is a 5% chance of getting a type I error

Chapter 12 (Hypothesis Tests: One Sample Mean)

Shape of the sampling distribution • As we take more and more samples of 100 people again and again, the sampling distribution of the mean approaches normal • The rate of approach depends on the sample size (the bigger the sample size, the faster the distribution approaches normal)

standard error of estimate

The standard deviation of the actual Y values around the regression line (the predicted values); the square root of the residual variance. s(Y - Y') = sq rt[(Sigma(Y - Y')^2)/(N - 2)]

Be able to recognize and use the equation for SSTotal (chapter 10)

Sum of squares total (total variation): deviation of Y (Y scores) from Y bar (our mean) Sigma(Y - Y bar)^2

Be able to recognize and use the equation for SSUnexplained (chapter 10)

Sum of squares unexplained: the deviation of the obtained (actual) scores from our predicted scores, which we can't explain; there must be other factors in play that we can't anticipate, so we have leftover variability. Sigma(Y - Y')^2

Sampling error

The normal variability that we would expect to find from one sample to another or from one study to another; Random variability among observations or statistics that is simply due to chance

What is sampling error?

The normal variability that we would expect to find from one sample to another or from one study to another; the random variability among observations or statistics that is simply due to chance

How do you determine what goes on which axis of a scatterplot?

The independent variable, or predictor variable, goes on the x-axis, while the dependent, or criterion, variable goes on the y-axis

Define a sampling distribution

The distribution of a statistic over repeated sampling from a specified population

Sampling distribution

The distribution of a statistic over repeated sampling from a specified population

Null hypothesis

The hypothesis that our subjects came from a population of normal responders; the hypothesis we usually want to reject; the null hypothesis says that there is nothing really going on; we strive to reject the null hypothesis and conclude something is going on

Hypothesis Tests: Two Related Samples t-test (Chapter 13)

The same participants give us data on two measures
o e.g. before and after treatment
o i.e. aggressive responses before the video and aggressive responses after the video
With related samples, someone high on one measure is probably high on the other.
Sometimes called matched-samples or repeated-measures t tests (other names for the related samples t test)
o A matched sample is a little different from repeated measures or related samples, because a matched sample uses two partners, i.e. husband and wife responses, where each husband is matched with his actual wife
The correlation between before and after scores is central to the problem
o It causes a change in the statistic we can use
We use difference scores for two related samples t-tests

What does the shape of the sampling distribution of the variance have to do with anything?

The skewness of sampling distribution of variance decreases as n, the sample size, increases DEALS WITH THE CLT

residual variance (error variance)

The variability of the actual Y values around the regression line (the predicted values). s^2(Y - Y') = (Sigma(Y - Y')^2)/(N - 2)

Indicate the level (high, medium, or low) and sign of the correlation for: 1. Number of guns in community and number firearm deaths. 2. Robberies and incidence of drug abuse. 3. Protected sex and incidence of AIDS. 4. Community education level and crime rate. 5. Solar flares and suicide.

There should be graphs that accompany a question like this, be able to interpret the graph and which correlation coefficient values are weak, medium, strong, or no correlation

How does t differ from z?

They are basically the same except we use s in place of σ for t

Be able to explain why you calculate different t-scores when you use the same data but in a different design (i.e. going from independent to dependent designs).

This is just to see if you can explain the situations under which we perform the 3 t-tests we have learned

Heterogeneity of variance

This refers to the case of unequal population variances. We don't pool the sample variances; instead we adjust the df and look up t in the tables using the adjusted df.
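One common version of the df adjustment is the Welch-Satterthwaite approximation; a sketch with made-up summary statistics (check your text for the exact form of its equation):

```python
import math

# Hypothetical summary stats for two groups with very unequal variances (made up)
n1, var1 = 20, 4.0
n2, var2 = 30, 25.0

se1 = var1 / n1   # s1^2 / n1
se2 = var2 / n2   # s2^2 / n2

# Welch-Satterthwaite adjusted degrees of freedom
df_adj = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))

print(round(df_adj, 1))  # 41.0, noticeably less than n1 + n2 - 2 = 48
```

The adjusted df is smaller than the usual n1 + n2 - 2, which makes the critical value larger and the test more conservative.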

What is the probability that we'd conclude violent videos cause aggression if they really don't?

This would be a Type I error, because we would be rejecting the null hypothesis when it is actually true. In this case, the probability of a Type I error is equal to α. The researcher chooses α, which is usually set at .05.

Testing r

Use the table of critical values for r (E.2): look up df = n - 2 at the .05 level of significance. If the correlation coefficient exceeds the critical value, reject H0 and conclude H1. *When testing r it is always two-tailed and df = n - 2
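The table values can be reproduced from the t distribution, since r can be converted to a t statistic via t = r·sqrt(n - 2)/sqrt(1 - r²). A sketch (assuming SciPy is available; n = 30 is just an example):

```python
import math
from scipy import stats  # assumed available

n = 30
df = n - 2
alpha = 0.05

# Invert t = r * sqrt(df) / sqrt(1 - r^2) at the two-tailed t critical value
t_crit = stats.t.ppf(1 - alpha / 2, df)
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

print(round(r_crit, 3))  # 0.361, matching the tabled critical r for df = 28
```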

Why do we care about regression?

We may want to make a prediction
More likely, we want to understand the relationship better
o How fast does CHD mortality rise with a one-unit increase in smoking?
o Note we are speaking about predicting, but we often don't actually predict when doing regression

Why do we care about the sum of squares, SSExplained, SSUnexplained, and SSTotal? (Chapter 10)

We need to partition the total variation in Y into the part explained by the regression and the part left unexplained (error) in our data set

When have we used the CLT?

We use it for almost everything we do in this class. We have used it in testing hypotheses. We have used it any time we have worked with sampling distributions, or bell curves.

Hypothesis Tests: Two Independent Samples (chapter 14)

We want to test differences between sample means
o Not between a sample and a population mean
o Instead of comparing our sample to the population, we are comparing the 2 sample means
Ex. Is there a significant difference between the two groups, the one that saw the violent video vs. the educational video?
We have to worry about the distribution of two samples, not just one. We need the sampling distribution of differences between means.
o Same idea as before, except the statistic is (X bar 1 - X bar 2)

Hypothesis Tests: One Sample Mean (Chapter 12)

We want to test the difference between a sample and population mean. There are two types of problems we practiced in this section: 1. Testing Hypotheses when σ (population SD) is known 2. Testing Hypotheses when σ (population SD) is not known

What is considered a weak, moderate, and strong correlation coefficient?

Weak = |r| below 0.3 Moderate = |r| between 0.3 and 0.7 Strong = |r| above 0.7

What is the sampling distribution of the mean?

When you take multiple samples, record their means, and then plot the frequency of their means on a distribution (i.e. the distribution of a statistic over repeated sampling)

How may homogeneity of variance affect your t-test?

The standard (pooled) t-test assumes homogeneity of variance. Without it, pooling the variances is not appropriate and the usual t-test can be misleading; we must adjust the degrees of freedom instead.

Know how to calculate a regression equation. (Chapter 10)

Y hat = bX + a. Y hat = the predicted value of Y (e.g. CHD mortality rate). X = the predictor (e.g. smoking incidence for that country, cigarette consumption)

Pooling Variances

YOU ONLY POOL VARIANCES IF THE SAMPLE SIZES, n, ARE DIFFERENT BETWEEN THE TWO GROUPS, AND ONLY IF NEITHER VARIANCE IS MORE THAN FOUR TIMES THE OTHER.
If we assume both population variances are equal, then a weighted average of the sample variances is a better estimate of the population variance. Substitute sp^2 in place of the separate variances in the formula for t.
*Pooling will not change the result if the sample sizes are equal (with equal sample sizes you will get the same t value either way); the pooled variance is a weighted average, so WE POOL VARIANCES WHEN WE HAVE UNEQUAL SAMPLE SIZES in the t-test equation for two independent samples.
*Do not pool if one variance is more than four times the other: this might indicate heterogeneity of variance (the population variances are not equal) rather than homogeneity, so we don't want to pool the variances.
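A minimal sketch of pooling with unequal sample sizes (the summary statistics below are made up for illustration):

```python
import math

# Hypothetical summary statistics for two independent groups (made up)
n1, mean1, var1 = 20, 7.1, 16.0
n2, mean2, var2 = 30, 5.6, 20.0

# Pooled variance: weighted average of the two sample variances (weights = each group's df)
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Two independent samples t, using sp^2 in place of the separate variances
t = (mean1 - mean2) / math.sqrt(sp2 / n1 + sp2 / n2)
df = n1 + n2 - 2

print(round(sp2, 2))  # 18.42 (closer to var2, the larger group's variance)
print(round(t, 2))    # 1.21
print(df)             # 48
```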

What is the correlation coefficient? (Chapter 9)

a measure of the degree of relationship between two variables
The sign of the correlation coefficient refers to direction (positive or negative)
o Positive: both variables move up or down together
o Negative: one variable moves up while the other moves down
Ranges between -1 and +1
Based on covariance
o A measure of the degree to which large scores on X go with large scores on Y, and small scores on X with small scores on Y

Negative correlation

as one variable increases, the other decreases

How to calculate the slope, b, for a regression line

b = (covxy)/(Sx)^2

Positive correlation

both variables either increase or decrease together

What goes on the x-axis?

independent variable

What is the symbol for the correlation coefficient?

r

Coefficients of determination

r^2
The proportion of variance in Y that can be explained by the relationship between X and Y
This also means that the sum of squares explained divided by the sum of squares total will equal r^2
BE ABLE TO SOLVE FOR r^2 BY
o Squaring your correlation coefficient, r, AND
o Showing that your sum of squares explained divided by the sum of squares total equals the same r^2 value

What does it mean to say that r 2 = percent variation accounted for?

r^2 is the squared correlation coefficient; it tells you the proportion of the variation in Y that is accounted for by X. r^2 = SSexplained/SStotal

Mean of sampling distribution of mean differences

µ1 - µ2

