Social Statistics for exam 2 chapters


When the null hypothesis is false, the F test statistic is most likely

large

In general, you should reject the null hypothesis for

large values of the F test statistic

structure of the F-ratio =

(systematic treatment effects + random, unsystematic differences) / (random, unsystematic differences)

the df value for the independent measures t statistics can be expressed as

df = n1 + n2 - 2

when two different scores are obtained by measuring the same person twice, the difference between the two scores is called a

difference score

MD =

ΣD / n

difference score =

D= X2 - X1

MD=

ΣD / n

SS=

ΣX^2 - (ΣX)^2/N

the estimated standard error for MD

is computed using the sample variance (or sample standard deviation) and the sample size, n: s_MD = sqrt(s^2/n), or equivalently s_MD = s / sqrt(n)
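As a quick sketch (the difference scores below are hypothetical), both forms of the estimated standard error give the same value:

```python
import math

# Hypothetical sample of difference scores (D values)
D = [3, 7, 5, 1, 4]
n = len(D)

# Sample variance of the D scores: s^2 = SS/df, with df = n - 1
mean_D = sum(D) / n
SS = sum((d - mean_D) ** 2 for d in D)
s2 = SS / (n - 1)
s = math.sqrt(s2)

# The two equivalent forms of the estimated standard error
s_MD_a = math.sqrt(s2 / n)  # sqrt(s^2 / n)
s_MD_b = s / math.sqrt(n)   # s / sqrt(n)
print(s_MD_a, s_MD_b)       # both 1.0 for this sample
```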

A study that combines two factors is called a

two-factor design or a factorial design.

F ratio =

(variance between treatments/ variance within treatments) = (differences including any treatment effects / differences with no treatment effects)

chi test null hypothesis #1. No Preference, Equal Proportions

The null hypothesis often states that the population is divided equally among the categories or that there is no preference among the different categories. For example, a hypothesis stating that there is no preference among the three leading brands of soft drinks would specify a population distribution in which each brand has a proportion of 1/3. The no-preference hypothesis is used in situations in which a researcher wants to determine whether there are any preferences among the categories, or whether the proportions differ from one category to another.

when finding the df for between and within

Each df value is associated with a specific SS value. Normally, the value of df is obtained by counting the number of items that were used to calculate SS and then subtracting 1. For example, if you compute SS for a set of n scores, then df = n - 1.

The sum of all the scores in the research study (the grand total) is identified by ____. You can compute ___ by adding up all N scores or by adding up the treatment totals: ___ = ΣT.

G

the null hypothesis is symbolised by

H0: μ1 - μ2 = 0, or μ1 = μ2

SS between =

Σ(T^2/n) - G^2/N
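A minimal sketch of this formula with made-up data (the treatment totals and n are hypothetical):

```python
# Hypothetical treatment totals, each based on n = 5 scores
T = [15, 25, 35]   # treatment totals
n = 5              # scores per treatment
G = sum(T)         # grand total: G = sum of the T values
N = n * len(T)     # total number of scores

# SS between = sum(T^2 / n) - G^2 / N
SS_between = sum(t ** 2 / n for t in T) - G ** 2 / N
print(SS_between)  # 40.0
```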

What is suggested by a large value for the F-ratio in an ANOVA?

There is a treatment effect and the null hypothesis should be rejected.

without satisfying the homogeneity of variance requirement, you cannot

accurately interpret a t statistic, and the hypothesis test becomes meaningless

numerator is

between treatment

the first two assumptions should be

familiar from the single-sample t hypothesis test

All the statistical tests we have examined thus far are designed to test hypotheses about specific population parameters. For example, we used t tests to assess hypotheses about a population mean μ or mean difference μD. In addition, these tests typically make assumptions about other population parameters. Recall that, for analysis of variance (ANOVA), the population distributions are assumed to be normal and homogeneity of variance is required. Because these tests all concern parameters and require assumptions about parameters, they are called ______________________. These tests require a numerical score for each individual in the sample. The scores then are added, squared, averaged, and otherwise manipulated using basic arithmetic. In terms of measurement scales, parametric tests require data from an interval or a ratio scale.

parametric tests.

MS =

s^2 = SS/df

calculating s when you have SS

since SS = (df)(s^2), it follows that s^2 = SS/df, so s = sqrt(SS/df)

Smd =

sqrt (s^2/n)

law of large numbers

statistics obtained from large samples tend to be better (more accurate) estimates of population parameters than statistics obtained from small samples

the null hypothesis states

that there is no change, effect, or, in this case, no difference

homogeneity of variance is most important when _________________; it is less critical with equal or nearly equal sample sizes, but is still important

there is a large discrepancy between the sample sizes.

For an analysis of variance, the differences between the sample means contribute to the __________and appears in the _________of the F-ratio.

variance between treatments, numerator

denominator is

within treatment

for the chi square test to find the degrees of freedom

df = C - 1 (C is the # of categories)

one tailed hypothesis test is also known as a

directional test

estimated d =

estimated mean difference / estimated standard deviation = (M1 - M2) / sqrt(s^2_p)
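For example (the summary statistics here are hypothetical):

```python
import math

# Hypothetical results from an independent-measures study
M1, M2 = 27.0, 22.0   # the two sample means
s2_p = 25.0           # pooled variance

# estimated d = (M1 - M2) / sqrt(s^2_p)
estimated_d = (M1 - M2) / math.sqrt(s2_p)
print(estimated_d)    # 1.0
```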

In the context of ANOVA, an independent variable or a quasi-independent variable is called a

factor.

when the larger sample has a larger df value, it carries more weight when averaging the two variances. This produces an alternative formula for computing pooled variance:

s^2_p = (df1·s1^2 + df2·s2^2) / (df1 + df2)
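A sketch of the weighted-average formula (the sample variances and sizes are hypothetical); note that the result falls between the two variances but lies closer to the variance from the larger sample:

```python
# Hypothetical sample variances and sample sizes
s2_1, n1 = 12.0, 4
s2_2, n2 = 8.0, 10
df1, df2 = n1 - 1, n2 - 1

# s^2_p = (df1*s1^2 + df2*s2^2) / (df1 + df2)
s2_p = (df1 * s2_1 + df2 * s2_2) / (df1 + df2)
print(s2_p)  # 9.0 -- closer to 8.0, the larger sample's variance
```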

in calculating s(M1-M2), you typically first need to calculate

s^2_p (the pooled variance)

the alternative hypothesis states

H1: μ1 - μ2 ≠ 0 (there is a mean difference), or μ1 ≠ μ2

Between-treatments variance is caused by either systematic treatment type variance or random unsystematic variance due to individual differences and sampling error.

In this example, "Flaws in the tool used to measure the pretreatment and posttreatment severity of the arachnophobia," "Individual differences in the severity of the arachnophobia," "Individual differences in the resistance of a person with arachnophobia to any form of treatment," "Systematic differences caused by in vivo flooding being more effective than imaginal flooding," and "Differences in severity of arachnophobia that are caused because in vivo and imaginal flooding both effectively treat arachnophobia" are all causes of between-treatments variance.

the individuals are simply classified into categories and we want to know what proportion of the population is in each category. _________________________________ is specifically designed to answer this type of question. The test determines how well the obtained sample proportions fit the population proportions specified by the null hypothesis.

The chi-square test for goodness of fit

Total Degrees of Freedom, df total. To find the df associated with SS total, you must first recall that this SS value measures variability for the entire set of N scores. Therefore, the df value is

df total = N-1

MS is

mean square in place of the term "variance"

t=

(sample mean - population mean) / estimated standard error = (M - μ) / sM

A low standard deviation indicates that the values tend to be_________ (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a___________

close to the mean, wider range.

Between-Treatments Degrees of Freedom, df between. The df associated with SS between can be found by considering how the SS value is obtained. These SS formulas measure the variability for the set of treatments (totals or means). To find df between, simply count the number of treatments and subtract 1. Because the number of treatments is specified by the letter k, the formula for df is

df between = k - 1 (k = # of treatments)

Within-Treatments Degrees of Freedom, df within. To find the df associated with SS within, we must look at how this SS value is computed. Remember, we first find SS inside each of the treatments and then add these values together. Each treatment's SS measures variability for the n scores in that treatment, so each SS has df = n - 1. When all these individual treatment df values are added together, we obtain

df within = Σ(n - 1) = Σ(df in each treatment)

The goal of an independent-measures research study is to

evaluate the mean difference between two populations (or between two treatment conditions).

the symbol for expected frequency is

fe

the expected frequency

for each category is the frequency value that is predicted from the proportions in the null hypothesis and the sample size (n). The expected frequencies define an ideal, hypothetical sample distribution that would be obtained if the sample proportions were in perfect agreement with the proportions specified in the null hypothesis.

if the t statistic is greater than (to the right of) the positive critical value, or less than (to the left of) the negative critical value then the t statistic is

in the critical region

The letter ____ is used to identify the number of treatment conditions—that is, the number of levels of the factor. For an independent-measures study, ____ also specifies the number of separate samples.

k

for the independent-measures t statistic, there are two SS values and two df values (one from each sample); these values are combined ("pooled") and identified as

s^2_p = (SS1 + SS2) / (df1 + df2)

when there is only one sample, the sample variance is computed as

s^2 = SS/df

t=

sample statistic - population parameter / estimated standard error

s=

sqrt( (ΣD^2 - (ΣD)^2/n) / (n - 1) )
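Putting the pieces together for a repeated-measures t (the D scores are hypothetical, and μD = 0 under H0):

```python
import math

# Hypothetical difference scores
D = [3, 7, 5, 1, 4]
n = len(D)
sum_D = sum(D)
sum_D2 = sum(d ** 2 for d in D)

# s = sqrt( (sum(D^2) - (sum D)^2 / n) / (n - 1) )
SS = sum_D2 - sum_D ** 2 / n
s = math.sqrt(SS / (n - 1))

# t = (MD - uD) / s_MD, with uD = 0 under the null hypothesis
M_D = sum_D / n
s_MD = s / math.sqrt(n)
t = (M_D - 0) / s_MD
print(t)  # 4.0
```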

steps in a one tailed hypothesis test

Step 1: state the hypotheses and select the alpha level. Step 2: locate the critical region. Step 3: compute the t statistic. Step 4: make a decision.

steps in a hypothesis test

Step 1: state the hypotheses and select the alpha level. Step 2: locate the critical region. Step 3: collect the data and calculate the test statistic. Step 4: make a decision.

The problem is to determine how well the data fit the distribution specified in H0; hence ______________

the name goodness of fit.

the mean for the first population is

μ1

the mean for the second population is

μ2

independent-measures research design or a between-subjects design.

uses completely separate groups; two separate samples represent the two different populations (or two different treatments) being compared. ex. comparing grades for freshmen who are given computers and grades for those without

violating the homogeneity of variance assumption can

negate any meaningful interpretation of the data from an independent measures experiment

t statistics =

( (M1 - M2) - (μ1 - μ2) ) / s(M1-M2); that is, (sample mean difference - population mean difference) / estimated standard error

Disadvantages of using a related sample (either one sample of participants with repeated measures or two matched samples) versus using two independent samples include which of the following? Check all that apply.

A related-samples design reduces or eliminates problems caused by individual differences such as age, IQ, gender, or personality. A study that uses related samples to compare two drugs (specifically, one sample of participants with repeated measures) can have a carryover and/or order effect such that the effects of the drug taken before the first measurement may not wear off before the second measurement.

One sample from an independent-measures study has n=4 with SS=72. The other sample has n=8 and SS=168. For these data, compute the pooled variance and the estimated standard error for the mean difference.

pooled variance = (72 + 168) / (3 + 7) = 240/10 = 24; estimated standard error = sqrt(24/4 + 24/8) = sqrt(9) = 3
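The arithmetic in this example can be checked directly (this just reproduces the numbers given above, so nothing here is assumed):

```python
import math

# From the problem: n = 4 with SS = 72, and n = 8 with SS = 168
n1, SS1 = 4, 72
n2, SS2 = 8, 168
df1, df2 = n1 - 1, n2 - 1

# Pooled variance: s^2_p = (SS1 + SS2) / (df1 + df2)
s2_p = (SS1 + SS2) / (df1 + df2)       # 240 / 10 = 24.0

# Estimated standard error: sqrt(s^2_p/n1 + s^2_p/n2)
se = math.sqrt(s2_p / n1 + s2_p / n2)  # sqrt(6 + 3) = 3.0
print(s2_p, se)
```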

To make this formula consistent with the ANOVA notation, we substitute the letter G in place of ΣX and obtain

SS = ΣX^2 - G^2/N

the difference between means is

μ1 - μ2

t=

(M - μ) / sM

when a researcher uses a nonmanipulated variable to designate groups, the variable is called a

quasi-independent variable

the pooled variance is obtained by

averaging or "pooling" the two sample variances using a procedure that allows the bigger sample to carry more weight in determining the final value

if the t statistic is in the critical region, you can _______ the null hypothesis because probability is less than 0.05 of obtaining such a t statistic if the mean difference is truly zero

reject

one-tailed tests can lead to _____________ even when the mean difference is relatively small compared to the magnitude required by a two-tailed test

rejecting H0 (the null hypothesis)

Advantages of using a related sample (either one sample of participants with repeated measures or two matched samples) versus using two independent samples include which of the following? Check all that apply.

Related samples have less sample variance, increasing the likelihood of rejecting the null hypothesis if it is false (that is, increasing power). Related samples (specifically, one sample of participants with repeated measures) can have a carryover effect such that participants can learn from their first measurement and therefore do better on their second measurement. Related samples (specifically, one sample of participants with repeated measures) can have an order effect such that a change observed between one measurement and the next might be attributable to the order in which the measurements were taken rather than to a treatment effect.

estimated standard error of

M1 - M2: s(M1-M2) = sqrt( s^2_p/n1 + s^2_p/n2 )

each of the two sample means represents its own population mean, but in each case there is some error

M1 approximates μ1 with some error; M2 approximates μ2 with some error

t=

(MD - μD) / s_MD

the standard error for the sample mean difference is represented by the symbol

S (M1-M2)

Between-Treatments Sum of Squares, SS between treatments when SS total = 172

SS between = SS total - SS within (172-88= 84)

MS between =

SS between/ df between

MS within =

SS within / df within

Within-Treatments Sum of Squares, SS within treatments. Example: SS1 = 24, SS2 = 34, SS3 = 30

SS within treatments = Σ(SS inside each treatment) = 24 + 34 + 30 = 88

s^2 =

SS/(n - 1) = SS/df, or s = sqrt(SS/df)

for the independent-measures t test, what describes the pooled variance

a weighted average of the two sample variances (weighted by the sample sizes)

for the independent-measures t test, what describes the estimated standard error of M1 - M2

an estimate of the standard distance between the difference in sample means (M1 - M2) and the difference in the corresponding population means (μ1 - μ2)

if the t statistic is not in the critical region you can _________ the null hypothesis

cannot reject

The total number of scores in the entire study is specified by a

capital letter N

The sum of the scores for each treatment condition is identified by the

capital letter T (for treatment total)

the Greek letter χ (chi, pronounced "kye") is used to identify the test statistic for the

chi square test

When the null hypothesis is true, the F test statistic is

close to 1

we prefer to phrase the null hypothesis in terms of the

difference between the two population means

expected frequency=

fe = pn where p is the proportion stated in the null hypothesis and n is the sample size.
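For instance, under a hypothetical no-preference null hypothesis with four categories and a sample of n = 60:

```python
# No-preference H0: equal proportions, p = 1/4 for each of 4 categories
p = [0.25, 0.25, 0.25, 0.25]
n = 60  # hypothetical sample size

# fe = p * n for each category
fe = [pi * n for pi in p]
print(fe)  # [15.0, 15.0, 15.0, 15.0]
```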

the third assumption is referred to as _________________________ and states that the two populations being compared must have the same variance

homogeneity of variance

this standard error measures

how accurately the difference between two sample means represents the difference between the two population means

____________ is the value used in the denominator of the t statistic for the independent-measures t test

s(M1-M2)

r^2 =

t^2 / (t^2 + df)
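For example (the t and df values are hypothetical):

```python
# Hypothetical t test result
t = 3.0
df = 16

# r^2 = t^2 / (t^2 + df), the proportion of variance explained
r2 = t ** 2 / (t ** 2 + df)
print(r2)  # 9 / 25 = 0.36
```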

One of the most obvious differences between parametric and nonparametric tests is ___________________________

the type of data they use.

chi test null hypothesis #2. No Difference from a Known Population.

The null hypothesis can state that the proportions for one population are not different from the proportions that are known to exist for another population. For example, suppose it is known that 28% of the licensed drivers in the state are younger than 30 years old and 72% are 30 or older. A researcher might wonder whether this same proportion holds for the distribution of speeding tickets. The null hypothesis would state that tickets are handed out equally across the population of drivers, so there is no difference between the age distribution for drivers and the age distribution for speeding tickets. Specifically, the null hypothesis would be that 28% of speeding tickets go to drivers younger than 30 and 72% go to drivers 30 or older. The no-difference hypothesis is used when a specific population distribution is already known. For example, you may have a known distribution from an earlier time, and the question is whether there has been any change in the proportions. Or, you may have a known distribution for one population (drivers) and the question is whether a second population (speeding tickets) has the same proportions.

required assumptions for z tests

• The sample is selected using random sampling. This means every member of the population has an equal chance of being included in the sample, and this chance does not change as the members are selected. In practice, it is almost impossible to have a true random sample, but researchers should take precautions to make the selection of the sample as random as possible.
• The observations are independent of one another. This means members of the sample are not connected to one another such that their data values are systematically related. For example, if you include siblings in the same sample, data related to health factors or lifestyle are likely to be similar. This requirement also means that researchers should not, for example, interview members of the same sample simultaneously to ask for their opinions, since one member's opinion might influence another's.
• The standard deviation of the variable of interest is constant across treatments. Here "treatments" might consist of an actual treatment (one group takes a drug, the other does not), or may just refer to a change in conditions (such as comparing a set of measurements from one year to the next).
• The distribution of sample means is normal. By the central limit theorem, this is true when the original population is normal or when the sample size is sufficiently large (typically greater than 30, as long as the original population is not extremely nonnormal). So, having a normally distributed population or a sample larger than 30 is a necessary assumption for the central limit theorem to apply.

standard error tells you

how much discrepancy is reasonable to expect between the sample statistic and the corresponding population parameter

the normality assumption is the

less important of the two, especially with large samples

The number of scores in each treatment is identified by a

lowercase letter n

Cohen's d =

mean difference / standard deviation = (μ1 - μ2) / σ

MD is the

mean for the sample of D scores

the first version of H0 produces a specific numerical value (zero) that is used in the calculation of the

t statistic

between-treatments variance simply measures how much difference exists between the treatment conditions. There are two possible explanations for these between-treatment differences:

the differences between treatments are not caused by any treatment effect but are simply the naturally occurring, random and unsystematic differences that exist between one sample and another. That is, the differences are the result of sampling error. The differences between treatments have been caused by the treatment effects. For example, if treatments really do affect performance, then scores in one treatment should be systematically different from scores in another condition.

the single-sample t uses one sample mean to test a hypothesis. The sample mean and the population mean appear in the numerator of the t formula, which measures how much difference there is between the sample data and the population hypothesis.

the overall t formula

ANOVA can be used with either an independent-measures or a repeated-measures design true or false

true

F =

variance due to both chance and caffeine / variance due to chance

a large value for the chi-square statistic indicates a big discrepancy between the data and the hypothesis, and suggests that ________

we reject H0

how to find x^2

χ^2 = Σ (fo - fe)^2 / fe
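A sketch with hypothetical observed frequencies (and equal expected frequencies, as under a no-preference H0):

```python
# Hypothetical observed and expected frequencies for 3 categories
fo = [35, 25, 30]
fe = [30, 30, 30]

# chi^2 = sum( (fo - fe)^2 / fe )
chi2 = sum((o - e) ** 2 / e for o, e in zip(fo, fe))
print(round(chi2, 4))  # 1.6667
```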

In ANOVA, the F test statistic is the ___________ of the between-treatments variance and the within-treatments variance.

ratio

When there are no systematic treatment effects, the differences between treatments (numerator) are entirely caused by random, unsystematic factors. In this case, the numerator and the denominator of the F-ratio are both measuring random differences and should be roughly the same size. With the numerator and denominator roughly equal, the F-ratio should have a value around 1.00. In terms of the formula, when the treatment effect is zero, we obtain

F = (0 + random, unsystematic differences) / (random, unsystematic differences)

Within-treatments variance is caused by random, unsystematic variance due to individual differences or sampling error.

In this example, "Flaws in the tool used to measure the pretreatment and posttreatment severity of the arachnophobia," "Individual differences in the severity of the arachnophobia," and "Individual differences in the resistance of a person with arachnophobia to any form of treatment" are all causes of within-treatments variance.

F=

MS between / MS within
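Using the SS values from the worked example above (SS between = 84, SS within = 88), and assuming k = 3 treatments with n = 10 scores each (the sample sizes are hypothetical), the full F-ratio calculation looks like this:

```python
# SS values from the example; k and n are assumptions for illustration
k, n = 3, 10
N = k * n
SS_between, SS_within = 84, 88

df_between = k - 1   # number of treatments minus 1
df_within = N - k    # sum of (n - 1) across the k treatments

MS_between = SS_between / df_between  # "mean square" = variance estimate
MS_within = SS_within / df_within

F = MS_between / MS_within
print(round(F, 2))  # 12.89
```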

In some ANOVA summary tables you will see, the labels in the first (source) column are Treatment, Error, and Total. Which of the following reasons best explains why the within-treatments variance is sometimes referred to as the "error variance"?

The within-treatments variance measures random, unsystematic differences within each of the samples assigned to each of the treatments. These differences are NOT due to treatment effects because everyone within each sample received the same treatment; therefore, the differences are sometimes referred to as "error."

The formula for chi-square involves adding squared values, so you can never obtain a negative value. Thus, all chi-square values are zero or larger. When H0 is true, you expect the data (fo values) to be close to the hypothesis (fe values). Thus, we expect chi-square values to be small when H0 is true. These two factors suggest that the typical chi-square distribution will be positively skewed (Figure 15.2). Note that small values, near zero, are expected when H0 is true and large values (in the right-hand tail) are very unlikely. Thus, unusually large values of chi-square form the critical region for the hypothesis test.

chi square distribution

difference scores or

d values

the difference scores measure the amount of change in reaction time for each person

difference scores

repeated-measures research design or a within-subjects design.

ex. obtain one set of scores by measuring depression for a sample of patients before they begin therapy and then obtain a second set of data by measuring the same individuals after six weeks of therapy.

An F-ratio that is much larger than 1.00 is an indication that H0 is

not true.

pooled variance

one method for correcting the bias in the standard error is to combine the two sample variances into a single value

Both tests are based on a statistic known as chi-square and both tests use sample data to evaluate hypotheses about the proportions or relationships that exist within populations. Note that the two chi-square tests, like most nonparametric tests, do not state hypotheses in terms of a specific parameter and they make few (if any) assumptions about the population distribution. For the latter reason, nonparametric tests sometimes are called _______________________--

distribution-free tests.

it may not be appropriate to use a parametric test. Remember that when the assumptions of a test are violated, the test may lead to an erroneous interpretation of the data. Fortunately, there are several hypothesis-testing techniques that provide alternatives to parametric tests. These alternatives are called________________________ participants are usually just classified into categories such as Democrat and Republican, or High, Medium, and Low IQ. Note that these classifications involve measurement on nominal or ordinal scales, and they do not produce numerical values that can be used to calculate means and variances

nonparametric tests.

Because the denominator of the F-ratio measures only random and unsystematic variability, it is called

the error term.

the individual groups or treatment conditions that are used to make up a factor are called

the levels of the factor.

note

the no-preference null hypothesis will always produce equal expected frequencies (fe values) for all categories because the proportions (p) are the same for all categories. On the other hand, the no-difference null hypothesis typically will not produce equal values for the expected frequencies because the hypothesized proportions typically vary from one category to another. You also should note that the expected frequencies are calculated, hypothetical values, and the numbers that you obtain may be decimals or fractions. The observed frequencies, on the other hand, always represent real individuals and always are whole numbers.

