cross-cultural research methods

Conversely, the IRT methods for detecting DIF use an estimate of the _____ latent proficiency level to match test takers across groups; this estimate is obtained by fitting an appropriately chosen IRT model

"true"

An item gives one group an advantage at the lower levels of proficiency while favouring another group at the higher levels

(crossing nonuniform DIF)

Problems have often been detected in what are typically called translation differential item functioning studies _______

(translation DIF studies)

An item gives one group a minor advantage at the lower levels of proficiency and a major advantage at the higher levels

(unidirectional nonuniform DIF)

The use of bilingual test takers constitutes a sound experimental methodology for evaluating translated versions of items. However, some weaknesses of the approach can be pointed out:

- Bilingual test takers may not represent the test behaviour of either of the groups of interest - A truly bilingual test taker may not exist: test takers surely vary in their proficiency in one or both languages, as well as in which language is dominant

Test score methods for detecting DIF

- Delta plot - Standardisation - Mantel-Haenszel - Logistic regression

The review form consists of 25 questions that pertain to five broad topics

- General translation questions - Item format and appearance - Grammar and phrasing - Passages and other item relevant stimulus materials (if relevant) - Cultural relevance and specificity

Which of the following statements is correct? - An appropriate translation from the source language into the target language ensures that the two tests are equivalent - Test adaptation is a broader concept than test translation - The difficulty of educational test items never changes with the translation from one language into another - None of the other alternatives is correct

- Test adaptation is a broader concept than test translation

Common IRT models for unidimensional tests with dichotomous items are:

- The one-parameter logistic (1PL) model - The two-parameter logistic (2PL) model - The three-parameter logistic (3PL) model

The following three pieces of information must always be provided

- the data set - the group membership of the respondents - the focal group label

Equivalence entails (in general translation questions)

- the meaning of the item - the difficulty of the item and related elements such as passages or other stimulus materials - commonality and attractiveness, as well as similarity in connotation

Large sample sizes are desired because:

- They are more likely to be representative of populations - They are more likely to produce replicable results - They increase statistical power (i.e., the probability of rejecting the null hypothesis when it is false)

A fundamental tenet of multiple-choice items is that there is _________ response option for a given item. In preparing translated versions of instruments, care should be taken to ensure that no changes are made that alter the likelihood of another possible correct response option for an item

only one correct

One-way means that there is _______ variable (factor)

only one independent

η̂² tends to ___________ the population effect size η² because its numerator, SSB, is inflated by some error variability

overestimate, Bias is less for larger sample sizes

In the presence of __________ (i.e., vectors that contain information associated with each other), a vector can be used to identify values on the parallel vector

parallel vectors

In most cases, ______ are denoted by Greek letters and _____ by Roman letters

parameters, statistics

It must also be specified whether the index is ______: either TRUE (default) or FALSE. If TRUE, partial indices are returned (for eta-, epsilon-, or omega-squared data input)

partial

plot - it specifies the type of _____ to be displayed. Possible values are: - lrStat - the ______ ratio statistics are displayed (default) - itemCurve - the fitted logistic curves are ______

plot, likelihood, displayed

An advantage of IRT methods is that, to evaluate DIF in polytomous items, it is sufficient to fit an IRT model for _______

polytomous items

For ε̂² and ω̂², it is _______ to compute a value below 0

possible

Interpretations of cultural differences among people based on "statistically significant" findings may be based on "__________" between means (i.e., the nonzero difference between culture means may be so small that it is of little or no practical significance)

practically insignificant differences

Since we observe a sample and not the entire population the conclusions about the population are always formulated on ____________

a probabilistic basis

Note that the probability of endorsing the item is close to the ______ guessing parameter of .2 when the proficiency level of the test taker is as low as −4 and is higher than .5 when the proficiency level equals the difficulty parameter of the item (i.e., 0)

pseudo

cross-cultural studies often involve ______, in which samples are not randomly assigned to conditions (researchers cannot randomly assign an individual to a culture)

quasi-experiments

total variance (or total mean squares)

ratio between the total deviance SS_T and the total degrees of freedom df_T = N - 1 (where N is the total number of individuals)

When a test is translated from one language to another the _______ is typically the group taking the version of the test written in the original language

reference group

The standardisation group can be one of the following:

reference group, focal group, total group

If the null hypothesis is _______, at least two populations have different means

rejected

The null hypothesis is _______ if the sample statistics are associated with a low probability of occurrence (less than α) when the null hypothesis is assumed to be true

rejected

Why do we report a value that is less than 0 instead of converting it to 0?

Replacing negative values with 0 might cause an additional positive bias in an estimate that is based on averaging estimates in a meta-analysis

η² - effect size for one-way ANOVA

represents the proportion of variance in the dependent variable that is accounted for by the independent variable (i.e., the different groups)

The null hypothesis is ______ if the sample statistics are associated with a high probability of occurrence (greater than α) when the null hypothesis is assumed to be true

retained

The researcher conducts the study to determine whether the statement specified by the null hypothesis is likely to be true (the null hypothesis is ______) or not (the null hypothesis is _______)

retained, rejected

Simpson's paradox occurs when groups of data show one particular trend but this trend is _________ or disappears when the groups are combined together

reversed
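
A small numeric illustration of the reversal may help. The counts below are illustrative (they follow the pattern of the well-known kidney-stone treatment example) and the names `counts`, `rate`, and `pooled_rate` are hypothetical helpers, not part of the course material.

```python
# Illustrative (successes, total) counts for two treatments within two subgroups.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    """Success rate within a single subgroup."""
    return successes / total

def pooled_rate(treatment):
    """Success rate after combining all subgroups."""
    successes = sum(counts[g][treatment][0] for g in counts)
    total = sum(counts[g][treatment][1] for g in counts)
    return successes / total

# Within each subgroup A does better, yet B does better once the groups are combined.
for group, by_treatment in counts.items():
    print(group, {t: round(rate(*c), 3) for t, c in by_treatment.items()})
print("pooled", {t: round(pooled_rate(t), 3) for t in ("A", "B")})
```

The reversal arises because the two treatments are unevenly distributed over the subgroups, which is exactly the kind of rival explanation that has to be ruled out when groups are combined.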

All the variables that potentially have an effect on the dependent variable must be taken into account in the explanation of the observed cross-cultural differences, otherwise there is room for _________ of these differences

rival explanations

The criteria by Cohen and those by other authors provide _______

rough benchmarks

If the null hypothesis is retained, all populations have the _______

same mean

The one-way univariate ANOVA (analysis of variance) is a statistical test of the null hypothesis that two or more independent samples are selected from populations with the ________

same mean

One of the main concerns about the null hypothesis significance testing is its dependence on ________

sample size - the larger the sample size, the easier it is for small differences to become statistically significant

X_ij

score of individual i of group j

"data" must be specified only when the variables in the formula are not ______ vectors

separate

population

set of all elements, individuals or units that meet the selection criteria for the group we want to study

statistical inference

set of methods based on probability theory that allow the researcher to make inferences about the behaviour of the population, starting from the observation of a sample drawn from the population

sample

set of n elements, individuals, or units drawn from the population

When the operation involves two vectors, the two vectors must have the same length, or the length of the longer vector must be a multiple of that of the ________

shorter vector

The threshold depends on the set of delta scores and the shape of the delta points (through the sample variances and covariance and the slope of the main axis) and on an appropriately chosen ________ level

significance

Each sample of subjects is assigned to a ________ of the levels of the factors

single combination

Effect ____ are obtained as the difference ΔR2 between proportions of variance R2 accounted for in two nested models

sizes

𝑛1 and 𝑛2

sizes of two samples

If the difference between the means observed on the two samples is ________ then the probability that the two samples are selected from populations with the same mean is large (the null hypothesis is retained)

small; conversely, if the difference between the means observed on the two samples is large, then the probability that the two samples are selected from populations with the same mean is small (the null hypothesis is rejected)

Regardless of how the effect sizes are interpreted, the _______ of the effect size indices must be reported to allow researchers to make their own interpretations of meaningfulness

specific values

Cohen's d measures effect size in terms of the number of ________ that the means of the two populations differ from one another

standard deviations

Actually, the backbone of scientific research in psychology - _________ - may lead to significant biases in the interpretation of results if the measurement instruments do not take linguistic and cultural differences into account

standardisation of measures

stdWeight: Specifies the ______ group - Possible values are "focal" (default), "reference" and "total"

standardization

Effect sizes are

statistical measures of the size of the effect in the population

Given a sufficiently large sample size, null hypothesis significance testing will always show a __________ (i.e., it will lead to rejecting the null hypothesis) even if the difference between population means is trivially small

statistically significant result
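
The dependence on sample size can be sketched numerically. The snippet assumes two groups of equal size and a fixed observed standardised difference d, in which case the t statistic equals d * sqrt(n / 2); the 1.96 cut-off is a normal approximation, and `t_from_d` is a hypothetical helper name.

```python
from math import sqrt

def t_from_d(d, n_per_group):
    """t statistic implied by a standardised difference d with n cases in each of two groups."""
    return d * sqrt(n_per_group / 2)

CRITICAL_Z = 1.96  # two-sided 5% cut-off under a normal approximation (assumption)

# The same trivially small difference (d = 0.02) at increasing sample sizes.
for n in (100, 1_000, 100_000):
    t = t_from_d(0.02, n)
    verdict = "significant" if abs(t) > CRITICAL_Z else "not significant"
    print(n, round(t, 2), verdict)
```

The effect size stays at 0.02 throughout; only the sample size changes the verdict, which is why the specific value of the effect size should be reported alongside the test.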

Hambleton and Zenisky (2011) produced a comprehensive, clear, and validated review form to standardise the checking of translated and adapted items on educational and psychological tests

the 25 question test

df B

the between-groups degree of freedom

σ²_B - effect size for one-way ANOVA

the between-groups population variance

There is an interaction when

the effect of one factor on the dependent variable is not the same at all levels of the other factor

The greater the difference between the samples, compared to the difference within the samples, _________

the greater the ratio

There is a relationship among cultural distance, the probability of observing statistically significant cross-cultural differences, and rival explanations that account for such differences: ___________, the easier it is to find significant cross-cultural differences, but the more difficult it is to interpret them

the larger the cross-cultural distance between groups

the larger the sample sizes

the more likely it is that the null hypothesis is rejected

The symbol # is the comment marker in R. Anything following ______

the symbol is not executed by the system

σ²_T - effect size for one-way ANOVA

the total population variance

σ²_W - effect size for one-way ANOVA

the within-groups population variance

In particular, items with a perpendicular distance to the main axis larger than an acceptable ______ are flagged as DIF

threshold

The more recent approach prescribed deriving the ________ from data using a normality assumption on the delta points

threshold

SS T

total deviance

Whenever possible, ______ of graphics, tables, and other item elements should be limited to the necessary translation of text or labels

translation

Idiomatic expressions often have a specific connotation in the source language that can be difficult to translate. Communicating the nuances of meaning that were more natural in the source language version may require a noticeably longer amount of text in the target language version of the item, which might result in _____

translation DIF

Some words may have additional shades of meaning in one language but not in the other, and this may result in _________

translation DIF

Different font styles can be a source of translation DIF

true

Interpreting findings about similarities and differences is much more difficult in cross-cultural studies than in experimental studies

true

The grammatical structure of translated items may be more susceptible to problems when the sentence completion item format is used

true

The larger absolute value of Cohen's d, the larger the effect in the population

true

The main effects are most appropriately interpreted when there is no interaction

true

The result of an appropriate model (e.g., the result of an ANOVA or factorial design) must be specified

true

The translation must appropriately reflect the society and customs of the target language

true

Then a test statistic is computed that allows for determining the likelihood of obtaining the sample statistics if the null hypothesis was _______

true

logistic regression - A ΔR2 of .02 is negligible according to both the criteria by Jodoin and Gierl (2001) and Zumbo and Thomas (1997)

true

logistic regression - If the slope of an item for the interaction between total test score and group membership is statistically significant, there is evidence of nonuniform DIF on that item

true

pjFm is the proportion of test takers from the focal group with a total test score of m who answered item j = 1, ..., J correctly

true

pjRm is the proportion of test takers from the reference group with a total test score of m who answered item j = 1, ..., J correctly

true

𝑤m is the proportion of test takers from the standardization group with a total test score of m

true

𝛽1j, 𝛽2j, and 𝛽3j are the slopes of item j for S, G, and SG

true

When an item is translated from one language to another and the two separate versions are administered to different groups of test takers, there is nothing to link the data across the separate groups

true (and furthermore: we cannot assume that the two groups are equivalent because the test takers come from different backgrounds, and we cannot consider the items to be equivalent because they are adaptations of one another)

Main effects are most appropriately interpreted when there is no interaction

true - In the case of the example, the differences produced by one factor on the dependent variable are the same at each level of the other factor(s)

Unlike other DIF methods, the delta plot method can be used with small samples of test takers, either in one of the groups or in both groups

true - Indeed the delta plot does not rely on sophisticated statistics with known asymptotic distribution

𝑤m is a weighting factor that weighs the differences between pjFm and pjRm

true - in particular, 𝑤m gives the greatest weight to the difference between pjFm and pjRm at those total score levels that are most often attained by the standardisation group
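
The weighting can be sketched as a weighted sum of focal-minus-reference proportions across total score levels. This is a minimal sketch assuming the weights are proportions that sum to 1; the function name `st_p_dif` and all numbers are hypothetical.

```python
def st_p_dif(weights, p_focal, p_reference):
    """Weighted sum of (focal - reference) proportions correct across total score levels."""
    return sum(w * (pf - pr) for w, pf, pr in zip(weights, p_focal, p_reference))

# Hypothetical item with three total score levels m.
w = [0.2, 0.5, 0.3]      # share of the standardisation group at each score level
p_f = [0.3, 0.5, 0.8]    # focal group proportion correct at each level
p_r = [0.4, 0.6, 0.8]    # reference group proportion correct at each level

value = st_p_dif(w, p_f, p_r)
flag_05 = abs(value) > 0.05   # flagged under the fixed .05 threshold
flag_10 = abs(value) > 0.10   # flagged under the fixed .10 threshold
print(round(value, 3), flag_05, flag_10)
```

With these numbers the index is negative, so the item disadvantages the focal group; it is flagged under the .05 threshold but not under the .10 threshold.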

It follows that the probability that all samples are selected from populations with the same mean is large (the null hypothesis is retained)

true - and conversely: it follows that the probability that all samples are selected from populations with the same mean is small (the null hypothesis is rejected)

Depending on the DIF technique that has been used, different results may be obtained

true - this is one of the reasons why reasoned judgement is necessary

Test translation always concerns at least _____ languages, but this may not be the case with test adaptation

two

In preparing translated versions of instruments, items should remain in the same order and page location and ________ differences should also be taken into account

typesetting

representative sample

unbiased indicator of what the population is like

Conversely, DIF is an ____________ difference among groups of test takers who are supposed to be comparable with respect to the construct measured

unexpected

An item shows _________ DIF if it gives one group an advantage that is constant across all levels of proficiency

uniform

The statistical significance of parameters 𝛽2j and 𝛽3j is tested by means of a likelihood-ratio test. If 𝛽2j is statistically significant, then there is evidence of _____ DIF on item j. If 𝛽3j is statistically significant, then there is evidence of nonuniform DIF on item j

uniform

The population effect size η2 is ______ and must be estimated from samples

unknown

s₁² and s₂²

variances of the two samples

MS W

within-groups variance

X ~ Y * Z means that

X depends on Y, on Z, and on the interaction between Y and Z

factorial design

η² and partial η², ε² and partial ε², ω² and partial ω²

The three estimators of the population effect size η² in ANOVA

eta-squared η̂², epsilon-squared ε̂², and omega-squared ω̂²

η̂² tends to underestimate the population effect size η²

False - η̂² tends to overestimate the population effect size η²

Since all quantities in the formulas of η̂², ε̂², and ω̂² are greater than or equal to 0, the following inequality holds true:

ω̂² ≤ ε̂² ≤ η̂²

St-P-DIF ranges from − 1 to 1. Values close to ___ indicate that the item does not function differently

0

The larger the value of 𝑏j , the more difficult it is to respond correctly to item j (in an educational test) or to endorse it (in a psychological test)

1PL

The larger the value of 𝜃n , the larger the proficiency (ability, attitude) level of test taker n

1PL

𝑏j is the difficulty parameter of item j

1PL

It follows that the _____ model (which considers item difficulty only) can be used to detect only uniform DIF, whereas the 2PL and 3PL models (which consider both item difficulty and discrimination) are suitable for detecting uniform and nonuniform DIF

1PL

Each ______ contingency table has group membership (reference group, focal group) and response type (correct, incorrect) as entries

2 x 2

The larger the value of 𝑎j, the larger the ability of item j to discriminate among test takers with different proficiency levels

2PL

𝑏j and 𝑎j are the difficulty and discrimination parameters of item j

2PL

The larger the value of 𝑐j , the larger the probability that a test taker with a low proficiency level responds correctly to a moderate or difficult item (in an educational test) or endorses it (in a psychological test)

3PL

𝑏j, 𝑎j, and 𝑐j are the difficulty, discrimination, and pseudo-guessing parameters of item j

3PL

X ~ Y can be used for running

ANOVA

Like the one-way ANOVA, the factorial design is also based on the decomposition of the total variance into

Between-groups variance and within-groups variance

The ANOVA test is based on the decomposition of the total variance (difference among all individuals) into

Between-groups variance and within-groups variance

The ratio between the between-groups variance and the within-groups variance expresses the size of the difference between the samples:

F = between-groups variance / within-groups variance

A researcher applied the Lord's chi-square test to the estimates of the 1PL model parameters and concluded that no item exhibited uniform or nonuniform DIF. This statement:

Cannot be correct. The 1PL model does not allow for evaluating nonuniform DIF but only uniform DIF

This group consists of five questions that evaluate whether the different language versions of the tests are appropriate for the cultures in which they are to be administered

Cultural relevance and specificity

Items that substantially depart from the main axis of this elliptical cloud are flagged as ______

DIF

thrTID: it specifies the threshold on the perpendicular distances used to detect ______ items. It can be a ________ (default is 1.5); otherwise it can be given the value "norm" if the threshold must be derived from the data by using a normality assumption on the delta points

DIF, numerical value
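
The delta plot procedure with a fixed threshold can be sketched as follows. This is a hedged reconstruction, not the course's R code: it assumes the usual delta transformation 4 * z + 13 of the proportions correct and the major (principal) axis of the delta points; the function names and the item proportions are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def delta_scores(proportions):
    """Delta transformation of proportions correct (harder items get larger deltas)."""
    return [4 * NormalDist().inv_cdf(1 - p) + 13 for p in proportions]

def perpendicular_distances(p_reference, p_focal):
    """Signed perpendicular distance of each item's delta point from the major axis."""
    x, y = delta_scores(p_reference), delta_scores(p_focal)
    mx, my = mean(x), mean(y)
    sx2, sy2 = variance(x), variance(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    slope = (sy2 - sx2 + sqrt((sy2 - sx2) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return [(slope * xi + intercept - yi) / sqrt(slope ** 2 + 1) for xi, yi in zip(x, y)]

THRESHOLD = 1.5  # fixed threshold, matching the default mentioned on the card

# Hypothetical proportions correct; the fourth item is much harder for the focal group.
p_ref = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
p_foc = [0.9, 0.8, 0.7, 0.2, 0.5, 0.4]
distances = perpendicular_distances(p_ref, p_foc)
flagged = [j for j, dj in enumerate(distances) if abs(dj) > THRESHOLD]
print(flagged)
```

Only the item whose delta point departs substantially from the main axis exceeds the threshold; a data-driven threshold (the "norm" option) would replace `THRESHOLD` with a value derived under a normality assumption.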

A statistical observation that involves matching test takers from different groups on the characteristic measured by the test and then looking for performance differences on an item

Differential item functioning (DIF)

The sampling distribution of the F statistic approximates a _______ with degrees of freedom df = (k - 1) and (N - k)

F-distribution

Assuming that the pseudo guessing parameter of an item is larger than 0, the probability of endorsing the item is .5 when the proficiency level of the test taker equals the difficulty parameter of the item

FALSE - assuming that the pseudo guessing parameter of an item is larger than 0 the probability of endorsing the item is larger than .5 when the proficiency level of the test taker equals the difficulty parameter of the item
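
The claim can be checked with a short sketch of the 3PL item response function. It assumes the common logistic parameterisation without a scaling constant and a discrimination of 1, neither of which is stated on the cards; `p_3pl` is a hypothetical name.

```python
from math import exp

def p_3pl(theta, a, b, c):
    """Probability of a correct (or endorsing) response under the 3PL model."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

# Item with difficulty b = 0 and pseudo-guessing c = .2 (a = 1 is an assumption).
at_difficulty = p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)   # equals (1 + c) / 2
very_low = p_3pl(theta=-4.0, a=1.0, b=0.0, c=0.2)       # approaches c from above
no_guessing = p_3pl(theta=0.0, a=1.0, b=0.0, c=0.0)     # reduces to the 2PL value of .5
print(round(at_difficulty, 3), round(very_low, 3), no_guessing)
```

When the proficiency equals the difficulty, the logistic term is exactly one half, so the probability is c + (1 - c) / 2, which exceeds .5 whenever c is larger than 0.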

DIF is a difference among groups of test takers for which the construct is supposed to differ

FALSE - DIF is a difference among groups of test takers who are supposed to be comparable with respect to the construct

Whenever an item shows statistically significant DIF, item bias is present

FALSE - DIF is a necessary but insufficient condition for item bias

In factorial designs there are two or more dependent variables and one independent variable

FALSE - in factorial designs there are two or more independent variables and one dependent variable

ANOVA tests the alternative hypothesis that two or more independent samples are selected from populations with different means

FALSE - it tests the null hypothesis that two or more independent samples are selected from populations with the same mean

In an analysis comparing k = 5 groups, the degrees of freedom of the within-groups variance are k - 1 = 5 - 1 = 4

FALSE - k - 1 are the degrees of freedom of the between-groups variance (the within-groups degrees of freedom are N - k)

The grand mean is the mean of the scores in the group with the largest number of individuals

FALSE - the grand mean is the mean of the scores of all individuals

The IRT methods for detecting DIF do not match test takers based on proficiency level

FALSE - the IRT methods match test takers on an estimate of the "true" latent proficiency level

Lord's chi-square test consists in evaluating the size of the area between the item characteristic curves of the reference and focal groups

FALSE - Lord's chi-square test consists in comparing the item parameters estimated in the reference and focal groups. The area between the item characteristic curves is considered in Raju's area method

The reference group is usually some type of minority group

FALSE - the reference group is usually some type of majority group

In a factorial design with two independent factors, the total deviance is the sum of the deviances of the main effects of the two factors and the within-groups deviance

FALSE - the sum also includes the deviance of the interaction effect, that is: SSt = SSa + SSb + SSa×b + SSw

In a factorial design with two factors, one having 3 levels and the other having 5 levels, there are 3 + 5 = 8 groups of individuals

FALSE - there are 3 x 5 = 15 groups of individuals

A parameter is a measure referred to a sample

False - a parameter is a measure referred to a population

The t test for two independent samples tests the null hypothesis that two independent samples are selected from populations with different means

False - it tests the null hypothesis that two independent samples are selected from populations with the same mean

Statistical inference allows for making inferences about the behaviour of the sample - starting from the characteristics of the population from which the sample is drawn

False - statistical inference allows for making inferences about the behaviour of the population - starting from the observation of a sample drawn from the population

According to Cohen (1988), d = - .4 denotes a very small effect in the population

False - values of d must be interpreted in absolute value

The null and alternative hypotheses refer to the value of a statistic in two or more samples

False - the null and alternative hypotheses refer to the value of a parameter in two or more populations

Two samples, one made up of mothers and the other made up of their own children, are independent samples

False - these two samples are dependent (or paired) samples because each mother is paired with her child

α is the probability of rejecting the null hypothesis when it is false

False - α is the probability of rejecting the null hypothesis when it is true

logistic regression - A ΔR2 of .10 is large according to the criteria by Jodoin and Gierl (2001) but moderate according to those by Zumbo and Thomas (1997)

False! - A ΔR2 of .10 is large according to the criteria by Jodoin and Gierl (2001) but negligible according to those by Zumbo and Thomas (1997)
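
The two sets of criteria can be compared with a small lookup. The cut-offs below are the values commonly attributed to these authors (.13 and .26 for Zumbo and Thomas, .035 and .070 for Jodoin and Gierl); they are not stated on the cards and should be checked against the original sources. The function name and the handling of values exactly on a boundary are assumptions.

```python
# Lower bounds of the "moderate" and "large" bands (commonly cited values, an assumption here).
CUTOFFS = {
    "ZT": (0.13, 0.26),    # Zumbo and Thomas (1997)
    "JG": (0.035, 0.070),  # Jodoin and Gierl (2001)
}

def classify_delta_r2(delta_r2, criteria):
    """Label a logistic-regression DIF effect size as negligible, moderate, or large."""
    moderate_from, large_from = CUTOFFS[criteria]
    if delta_r2 < moderate_from:
        return "negligible"
    if delta_r2 <= large_from:
        return "moderate"
    return "large"

for value in (0.02, 0.10):
    print(value, {name: classify_delta_r2(value, name) for name in CUTOFFS})
```

The same effect size can therefore be labelled very differently depending on the criteria, which is one reason to report the specific value of ΔR² and not only its label.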

logistic regression - It allows for detecting nonuniform DIF only

False! - It allows for detecting uniform and nonuniform DIF

logistic regression - It evaluates uniform DIF by testing the statistical significance of the effect of the total test score

False! - It evaluates uniform DIF by testing the statistical significance of the effect of group membership. The total test score is used as matching variable

For η̂², ε̂², and ω̂², it is possible to compute a negative value

False! - It is possible to compute a negative value for ε̂² and ω̂², but not for η̂²

ω̂² = .24 indicates that 24% of variance in the independent variable is accounted for by the dependent variable

False! - ω̂² = .24 indicates that 24% of variance in the dependent variable is accounted for by the independent variable

The operators respect the normal operator precedence

First exponentiations, then multiplications and divisions, and finally additions and subtractions

_____ thresholds at .05 or .10 are commonly used to identify DIF items. Items with St-P-DIF in absolute terms above .05 or .10 are flagged for DIF

Fixed

The probability of answering the tested item correctly (pj) is the dependent variable. The total test score (S), the group membership (G), and the interaction between these two (SG) are the independent variables

For each tested item j, a logistic regression model is fitted where

This group consists of four questions that evaluate the extent to which the versions of each item are practically equivalent across the source and target languages of interest

General translation questions

This group consists of six questions that compare source and target language versions of the tests with respect to grammar and syntax. The emphasis is on differences in expression that result in either simplifying the text or making it more complex, resulting in differences in difficulty between the source and target language versions

Grammar and phrasing

The test score methods for detecting DIF use the observed total test score to match test takers across groups

IRT methods for detecting DIF

When evaluating passages the following aspects should be considered: - The source and translated versions of passages should reflect comparable levels of language complexity and formality - _______ should be translated to communicate meaning rather than be translated word for word - No additions, omissions or clarifications of text should emerge in the translated versions of ______

Idioms, passages

1 - does the item have the same or highly similar meaning in both the source and target language versions?

In some cases, differences in meaning stem from deficiencies in translation

t test for two independent samples

It is a statistical test of the null hypothesis that two independent samples are selected from populations with the same mean

test translation

It is used to create a test in one language that is linguistically equivalent to that in another language

Cohen's d

It provides an estimate of the standardised difference between the means of two populations
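
A minimal sketch of the computation, assuming the pooled-standard-deviation form of Cohen's d built from the sample sizes n1 and n2 and the sample variances; the function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s2_1, s2_2 = variance(group1), variance(group2)  # sample variances (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Two vectors, each holding the (hypothetical) scores of one group.
culture_a = [2, 4, 6, 8]
culture_b = [1, 3, 5, 7]
d = cohens_d(culture_a, culture_b)
print(round(d, 3))
```

Swapping the two groups only flips the sign of d, which is why the magnitude is interpreted in absolute value.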

Refers to a significant group difference on an item that cannot be attributed to a factor that is relevant to the construct measured by the test

Item bias

This group consists of five questions that evaluate the extent to which the source and target language versions of the tests are comparable with respect to item format and physical appearance of items on the page (or screen, in the case of computerised tests)

Item format and appearance

The values MH-α are often converted into the index MH-D-DIF via:

MH-D-DIF = − 2.35 ln(MH-α)
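
The conversion can be sketched together with the usual Mantel-Haenszel common odds ratio computed from the 2 x 2 tables at each total score level. The estimator formula is the standard one and is an assumption here, since the cards do not spell it out; the counts and function names are hypothetical.

```python
from math import log

def mh_alpha(tables):
    """Common odds ratio from (ref_correct, ref_incorrect, focal_correct, focal_incorrect) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

def mh_d_dif(alpha):
    """Convert MH-alpha to the delta metric."""
    return -2.35 * log(alpha)

# One hypothetical 2 x 2 table per total score level.
tables = [(20, 20, 10, 30), (30, 10, 20, 20)]
alpha = mh_alpha(tables)
delta = mh_d_dif(alpha)
print(round(alpha, 3), round(delta, 3))
```

Here MH-α is greater than 1 and MH-D-DIF is negative, so both indices point to an item that favours the reference group.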

• A value of 1 indicates that the reference and focal groups exhibited the same performance to the item • A value greater than 1 indicates that the item favours the reference group • A value smaller than 1 indicates that the item favours the focal group

MH-α is interpreted as follows:

The between groups variance is further decomposed into

Main effects Interaction effects

______ DIF is evaluated by testing the statistical significance of the interaction term

Nonuniform

Changes in text introduced through translation generally fall into three categories

Omissions Substitutions Additions

2 - is the language of the translated item of comparable difficulty and commonality with respect to the words in the item in the source language version?

One challenge is choosing between an expression that is an exact equivalent but is rarely used and one that is more commonly used but less equivalent (it is possible to have a translation that is "correct" in the absolute sense, but when (a) words of low frequency are used or (b) words are used in a way that is not common in the target language, the items may not be comparable)

How to specify the data input for Cohen's d in R

One option is to define two vectors, each containing the data of one of the two groups

In many cases items do not consist only of a stem and answer choices but refer to a passage or other item-relevant stimulus material (e.g., table, chart, graph)

Passages and other item-relevant stimulus materials (if relevant)

_____ values of St-P-DIF for an item indicate that the item favours the focal group, while negative values indicate that the item disadvantages the focal group

Positive

The ______ (Raju, 1988) consists in computing the area between the item characteristic curves of the reference and focal groups. If the area is zero, there is no DIF. As the area between the curves moves away from zero, DIF increases

Raju's area method

The total test score ___ is the matching variable used to link test takers on proficiency

S

A researcher conducted a one-way univariate ANOVA and computed η̂², ε̂², and ω̂². He obtained the following values: .455, .501, and .464. However, he got a bit confused and does not know how to associate them with the three indicators. Could you help him?

Since the inequality ω̂² ≤ ε̂² ≤ η̂² holds true, then: η̂² = .501, ε̂² = .464, ω̂² = .455
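
The ordering can be illustrated by computing the three estimators from the sums of squares. The formulas below are the usual textbook ones (an assumption, as the cards do not give them); the sums of squares reuse SSb = 39.22 and SSw = 16.12 from another card, while k = 3 groups and N = 30 individuals are hypothetical.

```python
def anova_effect_sizes(ss_between, ss_within, k, n_total):
    """Return (eta squared, epsilon squared, omega squared) estimates for a one-way ANOVA."""
    df_between = k - 1
    df_within = n_total - k
    ss_total = ss_between + ss_within
    ms_within = ss_within / df_within
    eta2 = ss_between / ss_total
    epsilon2 = (ss_between - df_between * ms_within) / ss_total
    omega2 = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta2, epsilon2, omega2

eta2, epsilon2, omega2 = anova_effect_sizes(39.22, 16.12, k=3, n_total=30)
print(round(eta2, 3), round(epsilon2, 3), round(omega2, 3))

# With a very small between-groups deviance the corrected estimators can drop below 0.
_, epsilon2_small, omega2_small = anova_effect_sizes(0.5, 27.0, k=3, n_total=30)
print(round(epsilon2_small, 3), round(omega2_small, 3))
```

The correction term subtracted in the numerator is what removes part of the positive bias of η̂², and it is also what allows ε̂² and ω̂² to become negative.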

11 - are there any grammatical clues that might make this item easier or harder in the target language version?

Some of these grammatical clues are - Inconsistencies between the stem and response options in multiple-choice items - Inconsistencies among response options in multiple-choice items - Words from the source language version that are mistakenly retained in the target version

In both uniform and unidirectional nonuniform DIF, one group is always favoured over another group

TRUE

In crossing nonuniform DIF, a group is favoured at certain proficiency levels and disadvantaged at other proficiency levels

TRUE

The larger the value of 𝑐𝑗, the larger the probability of observing a 1 response to a moderate or difficult item by a test taker with a low proficiency level

TRUE

Assuming that SSt = 55.34 and SSw = 16.12, then SSb = 39.22

TRUE - since SSt = SSb + SSw, then SSb = SSt - SSw = 55.34 - 16.12 = 39.22

If the difference among the samples is smaller than the difference within the samples, the F statistic is smaller than 1

TRUE - the F statistic is the ratio between the between-groups variance (which expresses the difference among the samples) and the within-groups variance (which expresses the difference within the samples). If the numerator is smaller than the denominator, the F statistic will be smaller than 1

______, on the other hand, is a culturally focused process

Test adaptation

A distinction must be made between:

Test translation and test adaptation

__________ are the focus of the analysis and the ________ serves as a basis of comparison for the focal groups

The focal groups, reference group

The IRT methods for detecting DIF are

- The likelihood ratio test - Lord's chi-square test - Raju's area method

______ test consists in comparing the item parameters estimated in the reference and focal groups

Lord's chi-square

The IRT models prescribe that the response of a test taker to an item can be explained as a function of

- The proficiency (ability, attitude, etc.) level of the test taker in the trait measured by the test - One or more characteristics of the item that depend on the considered IRT model

__________ is usually some type of majority group

The reference group

It has been found to be effective for flagging adapted versions of items for DIF when sample sizes were as small as 100

The standardisation method

Allows for identifying uniform DIF among dichotomous items

The standardisation method

People in different cultures use response scales for psychological tests in different ways ...

The use (or lack of use) of extreme ratings

- A value of 0 indicates that the reference and focal groups exhibited the same performance on the item - A value greater than 0 indicates that the item favours the focal group - A value smaller than 0 indicates that the item favours the reference group

Thus, MH-D-DIF is interpreted as follows: (D = delta)

5 - Is the item format, including physical layout, the same in the two language versions?

To the extent possible an item should look the same or highly similar in all language versions

In the t test for two independent samples (1 and 2) the sampling distribution of statistic t approximates a t-distribution with degrees of freedom df = n1 +n2 - 2

True

Usually σ denotes the standard deviation of the population

True - σ is the Greek letter sigma

The decision about the null hypothesis is based on a significance level (denoted as ______ or α) that expresses the probability of rejecting the null hypothesis when it is ______

Type I error, true

______ DIF is evaluated by testing the statistical significance of the effect of group membership

Uniform

The following arguments must be specified

- Value(s): effect size value or vector of effect size values - Rules: the criteria that must be used for evaluating the effect size value(s)

- First, the total test score is entered (i.e., the matching variable) - Then, group membership is entered - Finally, the interaction between total test score and group membership is entered

Variables are entered into the regression equation hierarchically
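
The hierarchical entry described in this card can be written as three nested logistic regression models. The following is only a sketch: $S$ is the total test score and $\beta_{0j}$ the intercept of item $j$, as used elsewhere in this set, while $G$ (a group-membership indicator), $X_j$ (the response to item $j$), and the remaining coefficients are notation assumed here for illustration.

```latex
\begin{align}
\text{Step 1:}\quad & \operatorname{logit} P(X_j = 1) = \beta_{0j} + \beta_{1j} S \\
\text{Step 2:}\quad & \operatorname{logit} P(X_j = 1) = \beta_{0j} + \beta_{1j} S + \beta_{2j} G \\
\text{Step 3:}\quad & \operatorname{logit} P(X_j = 1) = \beta_{0j} + \beta_{1j} S + \beta_{2j} G + \beta_{3j} (S \times G)
\end{align}
```

Under this notation, a significant group effect $\beta_{2j}$ points to uniform DIF, and a significant interaction $\beta_{3j}$ points to nonuniform DIF.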

_______ are objects that contain sequences of values consisting of elements of the same type (e.g., all numbers, all strings)

Vectors

X ~ Y * Z can be used for running

a factorial design

parameter

a measure (e.g., frequency, mean, variance) referred to a population

statistic

a measure (e.g., frequency, mean, variance) referred to a sample

The larger the _________, the larger the effect in the population

absolute value of d

All the commands start with "dif" followed by the ______ for the specific method

acronym

𝜃 is the proficiency level of test taker n

all 3 pl models

The difference between the $\hat{\eta}^2$, $\hat{\varepsilon}^2$, and $\hat{\omega}^2$ effect sizes and the partial $\hat{\eta}^2$, partial $\hat{\varepsilon}^2$, and partial $\hat{\omega}^2$ effect sizes is that the former take into account ______, whereas the latter take into account only those sources that are directly relevant to the effect under consideration

all sources of variability

Unlike null-hypothesis significance testing, effect sizes _________ by sample size

are not affected

$\bar{X}_1$ and $\bar{X}_2$

are the means of two samples

$SS_T$ and $SS_B$

are the total and the between-groups deviance

The first developers of R were Robert Gentleman and Ross Ihaka of the Department of Statistics at the University of ________

Auckland

As a measure of the proportion of variance in the dependent variable that is accounted for by the independent variable, 𝜂2 cannot be ________

below 0

Another possible solution is using _____ test takers for evaluating comparability of items in original and target languages

bilingual

An item is flagged for DIF if the associated MH-χ2 statistic value is larger than a critical value based on the _______ distribution with one degree of freedom and an appropriately chosen significance level

chi-square

Sometimes an item could be flagged for DIF but there could be no ____ reason why DIF exists - the researcher should try to find a theoretical reason for why DIF occurs

clear (one of the reasons that reasoned judgement is necessary)

If the difference between the samples is substantially equivalent to the difference within the samples, then the ratio is ______

close to 1

The terms small, medium and large are relative, not only to each other but to the area of behavioural science or, even more particularly, to the specific content and research method being employed in any given investigation. In the face of this relativity, there is a certain risk inherent in offering conventional operational definitions for these terms

Cohen's cautions for interpreting effect size

t test for two independent samples

Cohen's d

Simpson's paradox illustrates the importance of comparing the _______, as is done in DIF analysis

comparable

$\hat{\varepsilon}^2$ and $\hat{\omega}^2$ are less biased estimators of the population effect size $\eta^2$. They try to ________ for the fact that $\hat{\eta}^2$ tends to overestimate $\eta^2$ by subtracting $df_B \cdot MS_W$ from the numerator of $\hat{\eta}^2$. Moreover, the formula of $\hat{\omega}^2$ also adds ______ to the denominator

compensate, MSW
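
The corrections described in this card can be sketched in Python as follows. This is an illustration rather than any package's implementation: the function name and arguments are hypothetical, and the group count (3) and total sample size (30) in the usage lines are made-up values; only SSt = 55.34 and SSw = 16.12 are taken from another card in this set.

```python
def anova_effect_sizes(ss_b, ss_w, k, n_total):
    """Return (eta2, epsilon2, omega2) for a one-way ANOVA.

    ss_b, ss_w: between- and within-groups deviances
    k: number of groups; n_total: total number of individuals
    (argument names are illustrative assumptions)
    """
    df_b = k - 1
    df_w = n_total - k
    ss_t = ss_b + ss_w            # SSt = SSb + SSw
    ms_w = ss_w / df_w            # within-groups variance
    eta2 = ss_b / ss_t
    # epsilon2 subtracts dfB * MSW from the numerator of eta2
    epsilon2 = (ss_b - df_b * ms_w) / ss_t
    # omega2 additionally adds MSW to the denominator
    omega2 = (ss_b - df_b * ms_w) / (ss_t + ms_w)
    return eta2, epsilon2, omega2


# Hypothetical design: 3 groups, 30 individuals in total.
eta2, epsilon2, omega2 = anova_effect_sizes(39.22, 16.12, k=3, n_total=30)
print(eta2, epsilon2, omega2)
```

With positive deviances, the three values come out ordered as omega2 ≤ epsilon2 ≤ eta2, which is the inequality used in the card about matching .455, .464, and .501 to the three indicators.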

Some languages define nouns as either masculine or feminine and if the translation is not done carefully, these references can cue test takers to the _______________

correct answer

The delta plot involves first calculating the proportions of test takers in the reference and focal groups who answer an item ________

correctly

The standardisation method involves first calculating the proportions of test takers in the reference and focal groups conditional on the total test score who answer an item ______

correctly
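
A minimal sketch of the standardisation index follows. It assumes the common definition of St-P-DIF as a weighted average of the focal-minus-reference differences in proportion correct at each total score, with the focal group's counts as weights. The function name and the input layout (dictionaries keyed by total score) are hypothetical and are not the interface of any particular package.

```python
def st_p_dif(ref_counts, foc_counts):
    """Standardised p-difference for one item.

    ref_counts / foc_counts: dict mapping total score -> (n_correct, n_total)
    for the reference and focal groups (layout assumed for illustration).
    """
    weighted_diff = 0.0
    total_weight = 0.0
    for score, (foc_correct, foc_total) in foc_counts.items():
        # Skip score levels where one of the groups has no test takers.
        if foc_total == 0 or score not in ref_counts:
            continue
        ref_correct, ref_total = ref_counts[score]
        if ref_total == 0:
            continue
        weight = foc_total  # weight by focal-group frequency at this score
        weighted_diff += weight * (foc_correct / foc_total - ref_correct / ref_total)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no score level is shared by both groups")
    return weighted_diff / total_weight


# Made-up counts for three score levels.
reference = {0: (10, 50), 1: (30, 60), 2: (45, 50)}
focal = {0: (5, 40), 1: (20, 50), 2: (30, 40)}
print(st_p_dif(reference, focal))
```

Positive values indicate that the item favours the focal group and negative values that it disadvantages the focal group; the absolute value can then be compared with a threshold such as the .10 default mentioned for thrSTD.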

The number of rival explanations depends on __________ of the groups involved in the study: _________

cultural distance; more dissimilar groups may show more differences in target variables, but it is also more likely that they differ in background variables

The likelihood ratio test consists of fitting two IRT models to the ______

data

thrSTD: Specifies the threshold on the St-P-DIF statistic to detect DIF items (____ is .10)

default

________ (or transformed item difficulties) is a simple method for identifying uniform DIF among dichotomous items

delta plot

The delta plot usually takes the form of an elliptical cloud of ______

delta points

Recalling that the delta point is a pair of _________, whose first element is the delta score of the reference group and the second element is the delta score of the focal group,

delta scores
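
As a sketch of how a delta point could be computed, the code below assumes the usual delta transformation, in which the proportion of incorrect answers is converted to a standard normal deviate z and rescaled as 4z + 13. That transformation is not spelled out in this set, and the function names are hypothetical.

```python
from statistics import NormalDist


def delta_score(p_correct):
    """Delta score for a proportion correct strictly between 0 and 1."""
    # Normal deviate corresponding to the proportion of incorrect answers.
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return 4.0 * z + 13.0


def delta_point(p_ref, p_foc):
    """Pair (reference-group delta score, focal-group delta score)."""
    return delta_score(p_ref), delta_score(p_foc)


# Made-up proportions correct for one item in the two groups.
print(delta_point(0.70, 0.55))
```

Under this transformation harder items (lower proportions correct) receive larger delta scores, and items whose delta points fall far from the main axis of the elliptical cloud are the candidates for DIF.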

Univariate means that there is only one ________

dependent variable

The sampling distribution of t statistic approximates a t-distribution with degrees of freedom

df=n1 +n2 -2

The MH method is a popular and successful method for identifying uniform DIF among ______ items

dichotomous

In item impact, the group _________ in item performance reflects true group differences on the construct measured

difference

Between groups variance

difference among individuals belonging to different samples

Within-groups variance

difference among the individuals within the same group

Specific terms that are used in items in the source language may not be appropriate for use in the target language, and so it is necessary to adapt terms to reflect terms and content differences that are present in __________

different countries

Note that the probability of endorsing an item is .5 when the proficiency level of the test taker equals the _______ parameter of the item

difficulty
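
The role of the item parameters can be illustrated with a small item characteristic curve function. This is a sketch under the standard three-parameter logistic form; the names `a` (discrimination), `b` (difficulty), and `c` (lower asymptote, written $c_j$ elsewhere in this set) are conventional labels assumed here.

```python
import math


def icc_3pl(theta, a, b, c=0.0):
    """Probability of a 1 response under a 3PL-type model.

    theta: proficiency level of the test taker
    a: discrimination, b: difficulty, c: lower asymptote (guessing)
    With c = 0 this reduces to the 2PL; with a = 1 as well, to the 1PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# With c = 0, the probability at theta = b is .5.
print(icc_3pl(theta=0.3, a=1.2, b=0.3))

# A larger c raises the probability for a low-proficiency test taker.
print(icc_3pl(theta=-3.0, a=1.0, b=0.0, c=0.0))
print(icc_3pl(theta=-3.0, a=1.0, b=0.0, c=0.2))
```

Note that the .5 value at theta = b holds when c is 0; with a nonzero c, the probability at theta = b is (1 + c) / 2.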

The _______ parameters are considered to detect uniform DIF and the discrimination parameters are considered to detect nonuniform DIF

difficulty

Avoiding controversial and inflammatory topics on assessments for which such ideas are not relevant is an important part of ensuring that an educational or psychological instrument does not cause undue emotional ___________ for respondents

distress

In cross-cultural research (as well as in other fields) statistical significance ________ necessarily reflect meaningful differences among people of different cultures

does not

Evidence of DIF _______ directly translate into item bias: DIF is a necessary but insufficient condition for item bias

does not (one of the reasons that reasoned judgement is necessary)

One-way ANOVA for two or more independent samples

eta squared (η2), epsilon squared (ε2), omega squared (ω2)

Factorial designs (or factorial experiments) are

experiments with two (or more) independent variables (factors) and one dependent variable

The main effects

express the differences among the means for one factor, computed over the levels of the other factor

Passages must always be translated literally

false

Topics that are appropriate for a certain culture are certainly also appropriate for other cultures

false

When the target language version of the passage is much shorter than the source language version, text can be added to the former version to make its length comparable to that of the source language version

false

In preparing translated versions of instruments, whenever possible item lengths should be kept comparable across item versions because there may be a _______ due to the use of longer items

fatigue effect

The choice of the IRT model depends on different issues such as item ______ (e.g., dichotomous or polytomous items), the number of traits measured by the test (unidimensional or multidimensional test) and the goodness of fit of the IRT model to the data (how well the model describes the observed data)

format

In ___________ translation, one bilingual translates the test from the source language to the target language and a second bilingual independently translates it back to the source language

forward-backward

x (with a line over it)

grand mean

The substantial significance of the results should be interpreted by __________ or by quantifying their contribution to knowledge

grounding them in a meaningful context

The factors are completely crossed (what does it mean?)

There are all possible combinations of the levels of the factors

The samples are _______, that is, each individual belongs to a single sample

independent

In cross-cultural research, effect sizes allow for examining the degree to which cross-cultural data are ____________ between two or more cultures' populations

indicative of meaningful differences

df effect

is the degree of freedom of a certain effect

SS effect

is the deviance of a certain effect (e.g., effect A, effect B, effect A×B)

S pooled

is the pooled standard deviation of the two samples
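
Cohen's d for two independent samples can be sketched as the difference between the two sample means divided by the pooled standard deviation. The pooling formula used below, which weights each sample variance by its degrees of freedom, is the standard one and is assumed here because this set does not spell it out; the function name is hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(sample1, sample2):
    """Cohen's d for two independent samples (each with at least 2 values)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled standard deviation, with df = n1 + n2 - 2 in the denominator.
    s_pooled = math.sqrt(
        ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2))
        / (n1 + n2 - 2)
    )
    return (mean(sample1) - mean(sample2)) / s_pooled


# Made-up scores for two small samples.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

The sign of d only reflects which sample mean is larger; it is the absolute value that expresses the size of the effect.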

The between-groups variance (or between-groups mean squares)

is the ratio between the between-groups deviance SSB and the between-groups degrees of freedom dfB = k ‒ 1 (where k is the number of groups): $MS_B = SS_B / df_B$

The F statistic

is the ratio between the between-groups variance and the within-groups variance: $F = MS_B / MS_W$

The within-groups variance (or within-groups mean squares)

is the ratio between the within-groups deviance SSW and the within-groups degrees of freedom dfW = N ‒ k (where N is the total number of individuals and k is the number of groups): $MS_W = SS_W / df_W$

The between-groups deviance (or between-groups sum of squares)

is the sum of the squared deviations of the mean of the scores of individuals in group j from the grand mean, each weighted by the group size $n_j$: $SS_B = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2$

The within-groups deviance (or within-groups sum of squares)

is the sum of the squared deviations of the score of each individual from the mean of their group: $SS_W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$

MS W

is the within-groups variance
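
The deviances, variances, and F statistic defined in the preceding cards can be computed by hand in a few lines. The sketch below uses only the definitions given in this set; the function name, the dictionary keys, and the data are illustrative.

```python
def one_way_anova(groups):
    """Compute SSB, SSW, MSB, MSW and F for a list of independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups deviance: group means vs. grand mean, weighted by group size.
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups deviance: each score vs. the mean of its own group.
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_b = ss_b / (k - 1)          # dfB = k - 1
    ms_w = ss_w / (n_total - k)    # dfW = N - k
    return {"SSB": ss_b, "SSW": ss_w, "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}


# Made-up scores for three groups of three individuals.
print(one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]]))
```

When the groups differ more among themselves than the individuals differ within groups, as in this made-up example, F is larger than 1.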

A fixed threshold of 1.5 is commonly used to identify DIF items. However, it appeared that such a fixed threshold was most often too conservative in the presence of DIF. Why is that a problem?

it might miss items that function differently across groups

Differential item functioning does not necessarily signify ________: item bias is present when an item has been statistically flagged for DIF and the reason for the DIF is traced to a factor that is irrelevant to the construct the test is intended to measure

item bias

E.g., when a group has a higher proportion of individuals answering an item correctly than another group in the context of an educational assessment, or a higher endorsement rate in the context of attitude, opinion, or personality assessment

item bias

Item bias must be distinguished from ________ that refers to a significant group difference on an item that reflects true group differences on the construct measured

item impact

𝛽0j is the intercept of item __

j

A disadvantage of IRT methods is that they require a relatively _____ sample size for accurately estimating the item parameters

large

The IRT methods are based on the estimation of an IRT model and use the estimate of the _______ trait level as a matching criterion

latent

One of the best methods for identifying uniform and nonuniform DIF among dichotomous items

logistic regression

The test score methods are usually based on statistical procedures for categorical data and use the total test score as a matching criterion

methods for detecting dif

The values MH-D-DIF are easier to interpret than MH-α. The log transformation centres the value of MH-D-DIF to 0. In addition, the _____ sign changes the interpretation of values greater or smaller than 0

minus
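
A minimal sketch of the two Mantel-Haenszel quantities follows. It assumes the standard definitions, which this set does not spell out: MH-α is the common odds ratio pooled over the total-score levels, and MH-D-DIF = −2.35 · ln(MH-α). The function names and the 2×2 input layout are hypothetical.

```python
import math


def mh_alpha(strata):
    """Mantel-Haenszel common odds ratio.

    strata: one tuple per total-score level, laid out (by assumption) as
    (ref_correct, ref_incorrect, foc_correct, foc_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_ok, ref_ko, foc_ok, foc_ko in strata:
        n = ref_ok + ref_ko + foc_ok + foc_ko
        if n == 0:
            continue  # empty score level contributes nothing
        numerator += ref_ok * foc_ko / n
        denominator += ref_ko * foc_ok / n
    return numerator / denominator


def mh_d_dif(strata):
    """Delta-scale index: the log centres it at 0, the minus flips the sign."""
    return -2.35 * math.log(mh_alpha(strata))


# Made-up counts: the reference group has better odds at both score levels.
strata = [(15, 5, 10, 10), (18, 2, 14, 6)]
print(mh_alpha(strata), mh_d_dif(strata))
```

An MH-α above 1 means the odds of a correct response are higher in the reference group; after the log transformation and the sign change, that becomes a negative MH-D-DIF, in line with the interpretation that values smaller than 0 favour the reference group.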

If different language versions of an item function differentially when administered to bilinguals, it is likely that they will function differentially also when administered to ______

monolinguals

It is a ________ matrix where n represents the number of observations (usually the number of individuals) and m represents the number of variables

n x m

In preparing translated versions of instruments, care should be taken to ensure that the ________ of the task to be used in different language versions is sufficiently common across cultures

nature - Any bias suspected can be addressed with one or two practice items added to the instructions

Mantel and Haenszel (1959) developed a χ2-test of the null hypothesis of ______ association between item response and group membership, which corresponds to the hypothesis of no DIF

no conditional

Usually the null hypothesis is a conservative hypothesis stating that there is ________ between the populations

no difference

itemFit - specifies the model to be selected for drawing the item curves. Possible values are: - Best - two curves are drawn if the item is flagged as DIF and only one curve is drawn if the item is flagged as ____ (default) - Null - two curves are drawn. Not used if plot = "lrStat"

non-DIF

An item shows ________ DIF if the advantage given to a group changes with the level of proficiency

nonuniform

A possible solution is using ____ items or items with little verbal loading, when possible

nonverbal

More precisely the sample of delta points is assumed to arise from a bivariate ______ distribution

normal

Two samples are independent when their elements are _______ to one another

not linked

Under the _______ of no conditional association between item response and group membership (which corresponds to the hypothesis of no DIF), the MH-χ2 statistic follows a chi-square distribution with one degree of freedom

null hypothesis

Item - specifies either the _____ or the name of the item for which logistic curves are plotted (used only if plot = "itemCurve")

number

