Exam 4 Stats

Strength of Relationship: The magnitude of the relationship.

A numerical index, the Pearson correlation coefficient (r), indicates the magnitude and direction of the linear relationship between pairs of continuous variables. Regardless of the sign (direction), the closer the value is to 1.00 or -1.00, the stronger the relationship.
1. Range of r: -1 to +1.
2. A + sign indicates a positive relationship; a - sign indicates a negative relationship.
a. r = +1: "perfect positive relationship"
b. r = -1: "perfect negative relationship"
c. r = 0: "no linear relationship"
3. The more closely a value of r approaches either -1 or +1, the stronger the relationship. r = .80 and r = -.80 have the same degree of strength.

Positive relationship

A positive correlation indicates that X and Y change in the same direction. Individuals obtaining high scores on one variable tend to obtain high scores on a second variable. The converse is also true; that is, individuals scoring low on one variable tend to score low on a second variable (e.g., length of study time and exam score).

Correlation

A statistical technique that is used to measure and describe the DIRECTION and MAGNITUDE of a relationship between two variables.

An analysis of variances produces dfbetween = 2 and dfwithin = 24. For this analysis, what is dftotal? a. 26 b. 27 c. 28 d. cannot be determined without additional information

A. 26

For an F-ratio with df = 2, 10, the critical value (cutoff) for a hypothesis test using α = .05 would be __________. a. 4.10 b. 7.56 c. 19.39 d. 99.40

A. 4.10

An analysis of variance produces SStotal = 90 and SSwithin = 40. For this analysis, what is SSbetween? a. 50 b. 130 c. 3600

A. 50

1. Most of the students who scored below the mean on test 1 also scored below the mean on test 2. The correlation between the two tests appears to be ___? a) positive b) negative c) near zero d) curvilinear

A. Positive

A treatment effect refers to differences between scores that are caused by the different treatment conditions. The differences (or variability) produced by treatment effects will contribute to __________. a. the numerator of the F-ratio b. the denominator of the F-ratio c. both the numerator and the denominator of the F-ratio d. Treatment effects do not contribute to the F-ratio because they are removed before the F-ratio is computed.

A. the numerator of the F-ratio

In analysis of variance MS provides a measure of __________. a. variance b. average differences among means c. the total variability for the set of N scores d. the overall mean for the set of N scores

A. variance

If we already have t-test, why is ANOVA necessary?

ANOVA is necessary to protect researchers from excessive risk of a Type I error in situations where a study is comparing more than two means.
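
To see why running several t-tests inflates the risk of a Type I error, the sketch below counts the pairwise comparisons among k means and applies the common approximation 1 - (1 - α)^c. It assumes the c tests are independent, which pairwise t-tests on the same groups are not, so the figures are rough guides only. The function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Approximate chance of at least one Type I error over all pairwise tests.

    Assumption: the tests are treated as independent (a simplification).
    """
    n_tests = comb(k, 2)  # number of pairwise comparisons among k means
    return 1 - (1 - alpha) ** n_tests

# Risk grows quickly as more means are compared with separate t-tests.
for k in (2, 3, 4, 5):
    print(k, comb(k, 2), round(familywise_error(k), 3))
```

With only two means there is a single test and the risk stays at α; with more means the approximate risk climbs well above α, which is the problem a single ANOVA avoids.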

If we are comparing two means from independent samples....

ANOVA will produce the same results as the t-test for two independent samples.
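
One way to check this equivalence is to compute both statistics on the same two groups: with two independent samples, F equals t squared. The scores below are hypothetical and the helper names (`ss`, `independent_t`, `one_way_f`) are illustrative, not taken from the course materials.

```python
from math import sqrt
from statistics import mean

def ss(scores):
    """Sum of squared deviations from the group mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)

def independent_t(g1, g2):
    """Pooled-variance t statistic for two independent samples."""
    df = len(g1) + len(g2) - 2
    pooled_var = (ss(g1) + ss(g2)) / df
    se = sqrt(pooled_var / len(g1) + pooled_var / len(g2))
    return (mean(g1) - mean(g2)) / se

def one_way_f(groups):
    """F-ratio = MSbetween / MSwithin for independent groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(ss(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

group1 = [1, 2, 3]  # hypothetical scores
group2 = [3, 4, 5]  # hypothetical scores
t = independent_t(group1, group2)
f = one_way_f([group1, group2])
print(round(t ** 2, 4), round(f, 4))  # the two values agree
```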

If we are comparing two means from dependent (related) samples...

ANOVA will produce the same results as the t-test for two related samples.

When the null hypothesis is true for an ANOVA, what is the expected value for the F-ratio? a. 0 b. 1.00 c. k - 1 d. N - k

B. 1.00

An analysis of variance produces SSbetween = 40 and SSwithin = 60. Based on this information, the percentage of variance accounted for, η², is equal to __________. a. 40/60 = 67% b. 40/100 = 40% c. 60/40 = 150% d. 60/100 = 60%

B. 40/100 = 40%

The correlation coefficient (r) between hostility and cynicism is .70, whereas the r between hostility and self-esteem is -.70. Which of the following is true? a) The magnitude of the relationship between hostility and cynicism is stronger than that of between hostility and self-esteem. b) The strength of the relationship between hostility and cynicism is the same as that between hostility and self-esteem. c) Hostile people tend to be cynical and have high self esteem. d) None of the above.

B. The strength of the relationship between hostility and cynicism is the same as that between hostility and self-esteem.

In analysis of variance, the F-ratio is a ratio of __________. a. two (or more) sample means b. two variances c. sample means divided by sample variances d. None of the other 3 choices is correct.

B. two variances

Thus, we use variance to define and measure the size of the differences among the sample means.

BUT, the purpose of the t test and F test is the same - to test for mean differences.

Between-group variability

Differences in means across groups (treatments or conditions). For example, three groups who watched different TV programs may show different MEAN aggression scores (experimental or treatment effect).

Total variability is divided into _____________ and ___________________.

Between-group variability and within-group variability

If you are a researcher, which variability would you hope to be large, between or within?

Between-group variability because it is associated with treatment/experimental effect.

Given five pairs for X and Y, respectively: (10,20), (20,50), (30,30), (40,10), and (50,40). Which of the values of r below appears to be reasonable for these data? a) -1.0 b) -.50 c) .00 d) .50 e) 1.0

C. .00
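
The answer can be checked by computing r directly. The sketch below uses the definition of r as the average cross-product of z-scores (with the population standard deviation); `pearson_r` is a hypothetical helper name, and the data are the five pairs from the question.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson r as the mean cross-product of z-scores (population SD)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return mean(products)

x = [10, 20, 30, 40, 50]
y = [20, 50, 30, 10, 40]
r = pearson_r(x, y)
print(abs(round(r, 2)))  # magnitude of r for these five pairs
```

The positive and negative cross-products cancel for these data, so there is no linear relationship.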

In a particular experiment, the value of r is .95 between X and Y. a) X is the cause of Y. b) Y is the cause of X. c) Low scores on X are associated with low scores on Y. d) X and Y share 95% of the variance.

C. Low scores on X are associated with low scores on Y.

Suppose there is a correlation of +0.87 between the length of time a person is in prison and the amount of aggression the person displays on a psychological inventory. This means that spending a longer amount of time in prison causes people to become more aggressive

False

When the null hypothesis is true, the F-ratio for analysis of variance is expected, on average, to be zero.

False (the F-ratio is expected to be about 1.00)

Analysis of variance is used to test for differences in variance between two or more populations.

False (although variances are used in calculation, the purpose of ANOVA is to compare mean differences among groups)

Performing several t-tests rather than one ANOVA involves a greater risk of committing Type II error (True, False).

False (greater risk of committing Type I error)

Suppose it was observed that there is a correlation of r = -0.81 between a driver's age and the cost of the car insurance. This correlation would mean that, in general, older people pay more for car insurance.

False (older people pay less for car insurance)

F sampling distribution is negatively skewed.

False (positively skewed)

The purpose of ANOVA is to test significant differences on variances among groups (True, False).

False (testing significant differences on MEANS).

Total variability

It describes how much the scores in a distribution differ from one another; it reflects total individual differences. Going back to the example of the TV viewing and aggression experiment, total variability in this study indicates how much the 21 children differ in their aggression scores.

Total

SStotal = 822.95; dftotal = 20

Within groups

SSwithin = 258; dfwithin = 18; MSwithin = SSwithin/dfwithin = 258/18 = 14.33

Between groups

SSbetween = 564.95; dfbetween = 2; MSbetween = SSbetween/dfbetween = 564.95/2 = 282.48; F = MSbetween/MSwithin = 282.48/14.33 = 19.71
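
The summary-table arithmetic for the TV viewing example can be reproduced in a few lines. This is a minimal sketch that takes the SS and df values from the cards above as given; the function name `anova_summary` is hypothetical.

```python
def anova_summary(ss_between, df_between, ss_within, df_within):
    """Compute MS values and the F-ratio from sums of squares and df."""
    ms_between = ss_between / df_between  # MSbetween = SSbetween / dfbetween
    ms_within = ss_within / df_within     # MSwithin = SSwithin / dfwithin
    return {
        "ss_total": ss_between + ss_within,
        "df_total": df_between + df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }

result = anova_summary(ss_between=564.95, df_between=2,
                       ss_within=258, df_within=18)
for name, value in result.items():
    print(name, round(value, 2))
```

The totals also confirm the partition: SSbetween + SSwithin gives SStotal, and dfbetween + dfwithin gives dftotal.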

negative relationship

A negative correlation indicates that X and Y change in opposite directions. Individuals obtaining high scores on one variable tend to obtain low scores on a second variable. The converse is also true; that is, individuals scoring low on one variable tend to score high on a second variable (e.g., parental punishment and children's self-esteem).

Hypothesis Testing

Step 1: Set the null and research hypotheses.
Ho: Mean aggression scores are the same across the three groups.
H1: Mean aggression scores are NOT the same across the three groups (at least one mean is different from the others).
Step 2: Set the criterion for rejecting Ho. Sampling distribution of F; α = .05; critical value F(2, 18) = 3.55.
Step 3: Calculate the F statistic.
Step 4: Make a conclusion.

Independent variable (IV)

The independent variable is the variable that is manipulated by the researcher. The independent variable usually consists of the two (or more) treatment conditions to which subjects are exposed.
o Treatment conditions, or
o Type of TV program
This study has only one IV.

Levels

The levels of the factor are the individual conditions or values that make up a factor - Three types (condition/levels) of treatment

A researcher reports an F-ratio with dfbetween = 2 and dfwithin = 40. How many treatment conditions were compared in the experiment? How many participants were in the experiment?

Three treatment conditions (k = dfbetween + 1 = 3); N = 43 participants (N = dfwithin + k = 40 + 3)

A correlation of r = -0.90 means that the data points cluster closely around a line that slopes down from left to right.

True

A research report states that there are "significant differences between treatments with F(2, 27) = 5.36, p < .05." Based on this report, you can conclude that the decision from the ANOVA was to reject the null hypothesis.

True

A researcher obtained a correlation of r = +0.62 between the amount of time spent watching television and level of blood cholesterol. This means that there is a general tendency for people who watch more television also to have higher blood cholesterol.

True

In ANOVA, we use variances to test the mean differences among groups (True, False).

True

In an analysis of variance, dftotal will always equal the sum of dfbetween and dfwithin.

True

In general, a large value for an F-ratio indicates that the null hypothesis is wrong.

True

Pearson correlation of r = -1.00 means that all the data points fit perfectly on a straight line.

True

The basic "analysis" in ANOVA involves partitioning the total variability into two components: between-treatments variability and within-treatments variability.

True

The purpose of using t-test and F-test is the same (True, False).

True

When a study involves more than two treatment conditions, an analysis of variance will evaluate all of the separate mean differences in a single test.

True

Basic Data

Two sets of measurements are obtained on the same individuals (or events)

In an independent-measures experiment with three treatment conditions, all three treatments have the same mean, M1 = M2 = M3. For these data SSbetween equals __________. a. 0 b. 1.00 c. 3(5.50) d. cannot be determined from the information given

a. 0 (if M1 = M2 = M3, there are no differences among the group means)

Nonparametric tests are needed when the research situation does not conform to the requirements of parametric tests...

o Do not state the hypotheses in terms of a specific population parameter
o Make few assumptions about the population distribution (often termed distribution-free tests)
o Participants are usually classified into categories (nominal or ordinal scales are used)
o Data for nonparametric tests are frequencies

Parametric tests share several assumptions

o Normal distribution in the population
o Homogeneity of variance in the population
o Numerical score for each individual

n

refers to the number of participants in each group.

Interpreting correlations

a. Correlation does not demonstrate causation. It simply describes the magnitude and direction of the relationship between two variables.
b. The value of a correlation is affected by the range of scores in the data.
i. r is sensitive to the range of the measurements of the two variables.
ii. As the range decreases, r tends to decrease. On rare occasions, r increases.
c. Extreme data points (outliers) can have a dramatic effect on the correlation. Even a single outlier can have a dramatic effect on r when the sample size is small.
d. Linear transformations of X, Y, or both do not change the magnitude of r (multiplying or dividing by a negative constant only reverses its sign). If either variable or both are transformed to a new variable via addition, subtraction, multiplication, and/or division by a constant, we call this a linear transformation.
e. Strength of relationship. For interpretation, we square r to obtain the coefficient of determination (r²). *It measures the proportion of variability in one variable that can be determined from (or explained by) the relationship with the other variable.*
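
Two of these points (linear transformations and the coefficient of determination) can be illustrated numerically. The sketch below uses hypothetical scores and a hypothetical helper `pearson_r` built from sums of squares and cross-products; it also shows that multiplying one variable by a negative constant reverses the sign of r without changing its size.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson r from the sum of products and the two sums of squares."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]  # hypothetical scores
y = [2, 4, 5, 4, 5]  # hypothetical scores

r = pearson_r(x, y)
# Linear transformations: multiply/add on X, divide/subtract on Y.
r_transformed = pearson_r([2 * v + 10 for v in x], [v / 4 - 1 for v in y])
# Multiplying X by -1 flips the direction but not the strength.
r_flipped = pearson_r([-v for v in x], y)

print(round(r, 4), round(r_transformed, 4), round(r_flipped, 4))
print(round(r ** 2, 2))  # coefficient of determination
```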

An analysis of variance is used to evaluate the mean differences for a research study comparing four treatments with a separate sample of n = 5 in each treatment. If the data produce an F-ratio of F = 3.15, then which of the following is the correct statistical decision? a. Reject the null hypothesis with α = .05 but not with α = .01. b. Reject the null hypothesis with either α = .05 or α = .01. c. Fail to reject the null hypothesis with either α = .05 or α = .01. d. There is not enough information to make a statistical decision.

c. With df = (3, 16), the critical values are Fcv = 3.24 (α = .05) and Fcv = 5.29 (α = .01). The obtained F = 3.15 is not in the rejection area for either alpha level.

The χ² Goodness-of-Fit Test: One Variable

1. It uses sample data to test hypotheses about the proportions (frequencies or %) of a population distribution. It determines how well the obtained sample proportions fit the expected population proportions specified by the null hypothesis. It is a nonparametric test.
2. The data for a chi-square test are remarkably simple. You just select a sample of n individuals or observe n events and count how many are in each category.
Example #1: In 200 flips of a coin, we would expect 100 heads and 100 tails. But what if we observed 92 heads and 108 tails? Would we reject the hypothesis that the coin is fair? Or would we attribute the difference between observed and expected frequencies to random fluctuation? Test this hypothesis using α = .05.
Example #2: Automobile insurance is much more expensive for teenage drivers than for older drivers. To justify this cost difference, insurance companies claim that the younger drivers are much more likely to be involved in costly accidents. To test this claim, a researcher obtains information about registered drivers from the department of motor vehicles and selects a sample of n = 300 accident reports from the police department. The motor vehicle department reports the percentage of registered drivers in each age category as follows: 16% are under age 20; 28% are 20 to 29 years old; and 56% are age 30 or older. The number of accident reports for each age group is as follows: under age 20 = 68; age 20-29 = 92; age 30 or older = 140. Are the data sufficient to conclude that the age distribution for accidents is significantly different from the distribution for the population of registered drivers? Test this hypothesis using α = .01.
3. Four steps of hypothesis testing
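
For Example #2, the expected frequencies come from applying the registered-driver percentages to the n = 300 accident reports. The sketch below works through that step and the resulting statistic. The critical value 9.21 (df = 2, α = .01) is taken from a standard chi-square table and is not stated in these notes; `chi_square_gof` is a hypothetical helper name.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (fo - fe)^2 / fe."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

n = 300
percent_registered = [16, 28, 56]  # under 20, 20-29, 30 or older
observed = [68, 92, 140]           # accident reports per age group

# Expected frequency = population percentage applied to the sample size.
expected = [p * n / 100 for p in percent_registered]

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1             # df = C - 1
critical_value = 9.21              # assumption: standard table, df = 2, alpha = .01
reject_h0 = chi2 > critical_value

print(expected, df, round(chi2, 2), reject_h0)
```

Under these assumptions the obtained statistic exceeds the critical value, so the null hypothesis would be rejected.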

Scatterplot (scatter diagram) : Graphic Presentation of Relationship

1. How to construct a scatterplot
2. Direction of relationship and the scatterplot
3. Scatterplots and r. For scatterplots in which the points do not fall on a perfectly straight line, imagine a line going through the points that best fits in the middle of those points. Then use that visual line to guess the direction and strength of the relationship. (If ALL THE POINTS FALL ON A STRAIGHT LINE, the correlation is PERFECT: r = +1 or r = -1.)
4. Linear vs. non-linear relationship: r describes the linear relationship between two variables. A linear relationship is expressed as a straight line.

factor

A factor is the variable (independent or quasi-independent) that designates the groups.
o Treatment conditions, or
o TV programs
This study has only one factor.

Tell me whether each statement is associated with a positive, negative, or no relationship. 1. A study of married couples showed that the longer they had been married, the more similar their opinions on social and political issues were. 2. An intelligence test was given to all the children in an orphanage. The results showed that the longer children had lived in the orphanage, the lower their IQ scores. 3. In a study of American cities, a relationship was found between the number of violent crimes and the number of stores selling violence-depicting pornography. 4. A college professor notices that the farther students sit toward the back of the room, the worse their grades in the course seem to be. 5. A study found that intelligent people can be either tall or short.

1. positive 2. negative 3. positive 4. negative 5. no correlation

A correlation of r = -0.90 means that there is essentially no consistent relationship between X and Y.

False

ANOVA is a statistical procedure that compares two or more treatment conditions for differences in variance (True, False).

False

If the value of the Pearson correlation is r = 0, then all data points on a scatterplot would fall on a straight line.

False

Which of the following statements make sense? a) The correlation between class 1 and class 2 on their final exam scores is +.70. b) The correlation between my final exam score and my quiz score is +.70. c) The correlation between quiz scores and final exam scores in a statistics class is +.70. d) All of the above make sense.

C. The correlation between quiz scores and final exam scores in a statistics class is +.70. (you need a group of people and two variables measured from the same people)

The correlation coefficient is obtained between academic aptitude test score and academic achievement (1) among honor students, and (2) among students in general. Other things being equal, we expect ____. a) the two coefficients to be about the same b) the first to be higher c) the second to be higher d) one to be negative, the other positive

C. The second to be higher (because the ranges of aptitude and achievement scores are not restricted for students in general)

The r is a measure of the (hint: find the best answer) a) magnitude of the relationship between X and Y. b) linearity of the relationship between X and Y. c) magnitude and direction of the linear relationship between X and Y. d) relationship between two groups of persons on a variable.

C. magnitude and direction of the linear relationship between X and Y.

In ANOVA, variability is more often denoted with sum of squares and variance than...

S (standard deviation). All are measures of variability (see chapter 4).

The r between musical ability and IQ would probably be greatest from which of the following groups? a) professional musicians b) a random sample of college students c) academically gifted adults d) a random sample of adults.

D. A random sample of adults (correlation is expected to be highest among people with wide range of scores)

Which of the following is the most appropriate response to the question, "what is the correlation of abstract reasoning for a group of high school seniors?" a) r is probably positive b) r is probably negative c) r is near zero d) the question is meaningless.

D. The question is meaningless (correlating abstract reasoning with what?)

When interpreting correlation coefficient several things have to be considered EXCEPT _____. a) whether the relationship between X and Y is linear or nonlinear b) whether there is an outlier c) whether the ranges of X and Y scores are restricted d) whether X and Y variables are measured in the same scales

D. whether X and Y variables are measured in the same scales

An analysis of variance is used to evaluate the mean differences among three treatment conditions. The analysis produces SSwithin = 20, SSbetween = 40, and SStotal = 60. For this analysis, what is MSbetween? a. 20/3 b. 20/2 c. 40/3 d. 40/2

d. MSbetween = SSbetween/dfbetween = 40/2

Coefficient of determination = r²

Effect size

When there are more than two treatments in an ANOVA, rejecting the null hypothesis means that all of the treatment means are significantly different from each other.

False (at least one mean is different from the others)

What is the F-test?

F = variance (differences) between treatments / variance (differences) expected with no treatment effect = mean differences between groups who received different treatments / individual differences among people who received the same treatment

Parametric tests

The hypothesis tests used thus far tested hypotheses about population parameters. Because these tests all concern parameters and require assumptions about parameters, they are called parametric tests.

No relationship

No regularity is apparent among the pairs of observations; For example, individuals obtaining high scores on one variable tend to obtain high, medium, or low scores on a second variable (e.g., singing ability and birth order).

Four Steps of Hypothesis testing

Step I: Set Ho and H1.
Example #1
Ho: The proportion of heads will be the same as the proportion of tails (heads 50% (0.5), tails 50% (0.5); or, in 200 flips, heads 100, tails 100).
H1: The proportion of heads will be different from the proportion of tails.
Step II: Find the rejection area and determine the χ² critical value.
1. Chi-square distribution and df. The formula χ² = Σ (fo - fe)² / fe tells you the following facts about the χ² distribution.
a. All χ² values are zero or larger; there are no negative values.
b. When Ho is true, you expect the observed frequency (fo) to be close to the expected frequency (fe), so we expect χ² values to be SMALL.
2. Chi-square distribution: it includes the χ² values for all possible random samples when Ho is true.
3. The critical value and rejection area are determined by
a. α
b. df = C - 1 (C = number of categories)
Example #1: α = .05; C = 2, so df = 1. From Table B.8 (the chi-square distribution), the critical value of χ² is 3.84.
Step III: Calculate the χ² statistic from the observed and expected frequencies. For Example #1, χ² = (92 - 100)²/100 + (108 - 100)²/100 = 1.28.
Step IV: Make a conclusion. The obtained chi-square value falls in the nonrejection area. We fail to reject Ho.
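
The four steps for Example #1 (the coin flips) can be sketched in code. The critical value 3.84 is the one quoted from Table B.8 above; the function name `chi_square_gof` is hypothetical.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (fo - fe)^2 / fe."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Step I: Ho says heads and tails are equally likely (100 each in 200 flips).
observed = [92, 108]
expected = [100, 100]

# Step II: df = C - 1 and the critical value for alpha = .05.
df = len(observed) - 1
critical_value = 3.84  # from Table B.8, as quoted in the notes

# Step III: compute the statistic.
chi2 = chi_square_gof(observed, expected)

# Step IV: decide.
reject_h0 = chi2 > critical_value
print(df, round(chi2, 2), reject_h0)
```

Because the obtained value is smaller than the critical value, the difference between 92/108 and 100/100 is attributed to random fluctuation.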

If SSbetween = 20 and MSbetween = 10, then the ANOVA is comparing three treatment conditions.

True (dfbetween = SSbetween/MSbetween = 20/10 = 2, so the number of groups is k = 3)

Dependent variable (DV)

The dependent variable is the one that is observed for changes in order to assess the effect of the treatment. - aggressive behaviors

The F-test is based on variance instead of

mean differences

Why use variance?

The purpose of ANOVA is to test for significant differences between two or more MEANS. As seen in the F-test formula, variances are used to test the mean differences.

Quasi-independent variable (Quasi-IV)

The quasi-independent variable is a variable that is NOT manipulated by the researcher. Sex (male vs. female) and marital status (married, divorced, single, separated, other) are examples.

Within-group variability

Differences in scores within each group (treatment or condition). For example, children in each group who watched the same TV program may show different aggression scores (experimental error).

The F sampling distribution is

a theoretical distribution of F-ratios (ratios of two VARIANCES) given that Ho is true. Because F-ratios are computed from two variances, F will always be positive, ranging from 0 to +∞. When the null hypothesis is true (there is no treatment effect), the majority of F-ratios are expected to be around F = 1.00 because the numerator and denominator of the ratio are measuring exactly the same variance (Read p. 394). Extreme F-ratios occur with low probability. Therefore, the F sampling distribution is positively skewed.

The sampling distribution is

a theoretical distribution of a sample statistic (mean, median, variance, etc.) given that Ho is true. It is an essential distribution that is used in ALL INFERENTIAL STATISTICS.

A researcher measures job satisfaction among managers, secretaries, skilled workers, and laborers to determine whether the mean job satisfaction is significantly different among four occupation groups. a. IV (or quasi-IV): b. DV: c. Factor: d. Level:

a. Occupation (it is quasi-IV because occupation variable cannot be manipulated) b. Job satisfaction c. Occupation d. 4 Types of occupation

Effect size in ANOVA

a. η² = SSbetween / SStotal = variance due to treatment / total variance (individual differences), where η² = eta squared
b. Interpretation: η² × 100 = "% of variance in the DV that can be explained by the IV."
c. Rule of thumb: .01 = small effect; .06 = medium effect; .14 = large effect
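
A short sketch of the effect-size calculation, using the SS values from the TV viewing example elsewhere in these notes. Treating the rule-of-thumb values as lower bounds for each label is an assumption made here for illustration, and the function names are hypothetical.

```python
def eta_squared(ss_between, ss_total):
    """Proportion of total variability explained by the treatment."""
    return ss_between / ss_total

def effect_label(eta_sq):
    """Label an effect size; cutoffs are treated as lower bounds (assumption)."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"

eta_sq = eta_squared(ss_between=564.95, ss_total=822.95)
print(round(eta_sq * 100, 1), effect_label(eta_sq))  # percent of variance, label
```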

The r is the

average of the cross-products of the z-scores of two variables (= average standardized covariability)

An analysis of variances produces SSbetween = 20 with dfbetween = 2, and SSwithin = 30 with dfwithin = 15. For this analysis, what is the F-ratio? a. 20/30 = 0.67 b. 10/2 = 5.00 c. 30/20 = 1.50 d. 2/10 = 0.20

b. F = MSbetween/MSwithin = (20/2) / (30/15) = 10/2 = 5.00

In general the distribution of F-ratios is __________. a. symmetrical with a mean of zero b. positively skewed with all values greater than or equal to zero c. negatively skewed with all values greater than or equal to zero d. symmetrical with a mean equal to dfbetween

b. positively skewed with all values greater than or equal to zero

Analysis of variance means partitioning the total variability into two basic components:

between-group ("between-treatment") variability and within-group ("within-treatment") variability.

An analysis of variance is used to evaluate the mean differences for a research study comparing four treatment conditions with a separate sample of n = 5 in each treatment. The analysis produces SSwithin = 32, SSbetween = 40, and SStotal = 72. For this analysis, what is MSwithin? a. 32/5 b. 32/4 c. 32/16 d. 32/20

c. MSwithin = SSwithin/dfwithin = 32/(20 - 4) = 32/16

A researcher uses an ANOVA to compare three treatment conditions with a sample of n = 8 in each treatment. Find dftotal ______________, dfbetween ___________, dfwithin ____________.

dftotal = 23; dfbetween = 2; dfwithin = 21
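
The df bookkeeping follows from dftotal = N - 1, dfbetween = k - 1, and dfwithin = N - k. The sketch below applies those formulas in both directions, assuming equal group sizes; the function names are hypothetical.

```python
def anova_df(k, n_per_group):
    """Degrees of freedom for a one-way ANOVA with k equal-sized groups."""
    total_n = k * n_per_group
    return {
        "N": total_n,
        "df_total": total_n - 1,    # N - 1
        "df_between": k - 1,        # k - 1
        "df_within": total_n - k,   # N - k
    }

def design_from_df(df_between, df_within):
    """Recover the number of conditions and participants from reported df."""
    k = df_between + 1
    return {"k": k, "N": df_within + k}

print(anova_df(k=3, n_per_group=8))
print(design_from_df(df_between=2, df_within=40))
```

The second call works backward from a reported F(2, 40) to the number of treatment conditions and participants.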

The F critical value that is associated with alpha level (or rejection area) is determined by

dfbetween and dfwithin.

What statistic is associated with ANOVA?

F-test

k

identifies the number of treatment conditions (the number of levels of a factor)

ANOVA allows researcher to evaluate all of the mean differences

in a single hypothesis test using a single α-level and, thereby, keeps the risk of a Type I error under control no matter how many different means are being compared.

The purpose of ANOVA is to

test for significant differences between two or more means.

The difference between ANOVA and the t tests is

that ANOVA can be used in situations where there are two or more means being compared, whereas the t tests are limited to situations where only two means are involved.

N

the total number of participants in all groups.

F test (F Ratio)

variance between group means / variance expected from sampling error = variance between treatments / variance within treatments

