cross-cultural research methods
Conversely, the IRT methods for detecting DIF use an estimate of the _____ latent proficiency level to match test takers across groups; the estimate is obtained by fitting an appropriately chosen IRT model
"true"
An item gives an advantage to one group at the lower levels of proficiency while favouring another group at the higher levels
(crossing nonuniform DIF)
Problems have often been detected in what are typically called translation differential item functioning studies _______
(translation DIF studies)
An item gives one group a minor advantage at the lower levels of proficiency and a major advantage at the higher levels
(unidirectional nonuniform DIF)
The use of bilingual test takers constitutes a sound experimental methodology for evaluating translated versions of items. However, some weaknesses of the approach can be pointed out:
- Bilingual test takers may not represent the test behaviour of either of the groups of interest - The idea of a truly bilingual test taker may not exist, and there are surely variations across test takers in their proficiencies in one or both languages, as well as differences in which language they are most dominant
Test score methods for detecting DIF
- Delta plot - Standardisation - Mantel-Haenszel - Logistic regression
The review form consists of 25 questions that pertain to five broad topics
- General translation questions - Item format and appearance - Grammar and phrasing - Passages and other item relevant stimulus materials (if relevant) - Cultural relevance and specificity
Which of the following statements is correct? - An appropriate translation from the source language into the target language ensures that the two tests are equivalent - Test adaptation is a broader concept than test translation - The difficulty of educational test items never changes with the translation from one language into another - None of the other alternatives is correct
- Test adaptation is a broader concept than test translation
Common IRT models for unidimensional tests with dichotomous items are:
- The one-parameter logistic (1PL) model - The two-parameter logistic (2PL) model - The three-parameter logistic (3PL) model
The following three pieces of information must always be provided
- the data set - the group membership of the respondents - the focal group label
Equivalence entails (in general translation questions)
- the meaning of the item - the difficulty of the item and related elements such as passages or other stimulus materials - commonality and attractiveness, as well as similarity in connotation
Large sample sizes are desired because:
- they are more likely to be representative of populations - they are more likely to produce replicable results - They increase statistical power (i.e. The probability of rejecting the null hypothesis when it is false)
A fundamental tenet of multiple-choice items is that there is _________ response option for a given item. In preparing translated versions of instruments, care should be taken to ensure that no changes are made that alter the likelihood of another possible correct response option for an item
only one correct
One-way means that there is _______ variable (factor)
only one independent
η̂² tends to ___________ the population effect size η² because its numerator, SS_B, is inflated by some error variability
overestimate; the bias is smaller for larger sample sizes
In the presence of __________ (i.e., vectors that contain information associated with each other), a vector can be used to identify values on the parallel vector
parallel vectors
In most cases, ______ are denoted by greek letters and _____ by roman letters
parameters, statistics
One also has to specify whether the index is ______: either TRUE (default) or FALSE. If TRUE, partial indices are returned (for eta-, epsilon- or omega-squared data input)
partial
plot: specifies the type of _____ to be displayed. Possible values are "lrStat" (the ______ ratio statistics are displayed; default) and "itemCurve" (the fitted logistic curves are ______)
plot, likelihood, displayed
An advantage of IRT methods is that, to evaluate DIF in polytomous items, it is sufficient to fit an IRT model for _______
polytomous items
For ε̂² and ω̂² it is _______ to compute a value below 0
possible
Interpretations of cultural differences among people based on "statistically significant" findings may be based on "__________" between means (i.e., the nonzero difference between culture means may be so small that it is of little or no practical significance)
practically insignificant differences
Since we observe a sample and not the entire population, the conclusions about the population are always formulated on ____________
a probabilistic basis
Note that the probability of endorsing the item is close to the ______ guessing parameter of .2 when the proficiency level of the test taker is as low as −4 and is higher than .5 when the proficiency level equals the difficulty parameter of the item (i.e., 0)
pseudo
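The values quoted on this card can be checked numerically. The snippet below is a minimal Python sketch of the 3PL item characteristic curve, assuming the plain logistic form without a scaling constant and a discrimination of 1 (the card does not state the discrimination); the function name `irt_probability` is hypothetical.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct (or endorsing) response under the 3PL model.

    theta: proficiency of the test taker
    b: item difficulty, a: item discrimination, c: pseudo guessing parameter.
    With c = 0 this reduces to the 2PL model; with c = 0 and a = 1, to the 1PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Item from the card: difficulty 0, pseudo guessing .2 (a = 1 is an assumption)
p_low = irt_probability(theta=-4.0, b=0.0, a=1.0, c=0.2)   # close to .2
p_at_b = irt_probability(theta=0.0, b=0.0, a=1.0, c=0.2)   # c + (1 - c) / 2
print(round(p_low, 3), round(p_at_b, 3))
```

When the proficiency equals the difficulty, the probability is c + (1 - c) / 2, which exceeds .5 whenever c is larger than 0.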
cross-cultural studies often involve ______, in which samples are not randomly assigned to conditions (researchers cannot randomly assign an individual to a culture)
quasi experiments
total variance (or total mean squares)
ratio between the total deviance SS_T and the total degrees of freedom df_T = N - 1 (where N is the total number of individuals)
When a test is translated from one language to another the _______ is typically the group taking the version of the test written in the original language
reference group
The standardisation group can be one of the following:
reference group, focal group, total group
If the null hypothesis is _______, at least two populations have different means
rejected
The null hypothesis is _______ if the sample statistics are associated with a low probability of occurrence (less than α) when the null hypothesis is assumed to be true
rejected
Why do we report a value that is less than 0 instead of converting it to 0?
Replacing negative values with 0 might cause an additional positive bias in an estimate that is based on averaging estimates in a meta-analysis
η² - effect size for one-way ANOVA
represents the proportion of variance in the dependent variable that is accounted for by the independent variable (i.e., the different groups)
The null hypothesis is ______ if the sample statistics are associated with a high probability of occurrence (greater than α) when the null hypothesis is assumed to be true
retained
The researcher conducts the study to determine whether the statement specified by the null hypothesis is likely to be true (the null hypothesis is ______) or not (the null hypothesis is _______)
retained, rejected
Simpson's paradox occurs when groups of data show one particular trend but this trend is _________ or disappears when the groups are combined together
reversed
All the variables that potentially have an effect on the dependent variable must be taken into account in the explanation of the observed cross-cultural differences, otherwise there is room for _________ of these differences
rival explanations
The criteria by Cohen and those by other authors provide _______
rough benchmarks
If the null hypothesis is retained, all populations have the _______
same mean
The one-way univariate ANOVA (analysis of variance) is a statistical test of the null hypothesis that two or more independent samples are selected from populations with the ________
same mean
One of the main concerns about null hypothesis significance testing is its dependence on ________
sample size - the larger the sample size, the easier it is for small differences to become statistically significant
X_ij
score of individual i of group j
"data" must be specified only when the variables in the formula are not ______ vectors
separate
population
set of all elements, individuals or units that meet the selection criteria for the group we want to study
statistical inference
set of methods based on probability theory that allow the researcher to make inferences about the behaviour of the population, starting from the observation of a sample drawn from the population
sample
set of n elements, individuals, or units drawn from the population
When the operation involves two vectors, the two vectors must have the same length or the length of the longer vector must be a multiple of that of the ________
shorter vector
The threshold depends on the set of delta scores and the shape of the delta points (through the sample variances and covariance and the slope of the main axis) and on an appropriately chosen ________ level
significance
Each sample of subjects is assigned to a ________ of the levels of the factors
single combination
Effect ____ are obtained as the difference ΔR2 between proportions of variance R2 accounted for in two nested models
sizes
𝑛1 and 𝑛2
sizes of two samples
If the difference between the means observed on the two samples is ________ then the probability that the two samples are selected from populations with the same mean is large (the null hypothesis is retained)
small and conversely if the difference between the means observed on the two samples is large then the probability that the two samples are selected from populations with the same mean is small (the null hypothesis is rejected)
Regardless of how the effect sizes are interpreted, the _______ of the effect size indices must be reported to allow researchers to make their own interpretations of meaningfulness
specific values
Cohen's d measures effect size in terms of the number of ________ that the means of the two populations differ from one another
standard deviations
Actually, the backbone of scientific research in psychology - _________ - may lead to significant biases in the interpretation of results if the measurement instruments do not take linguistic and cultural differences into account
standardisation of measures
stdWeight: Specifies the ______ group - Possible values are "focal" (default), "reference" and "total"
standardization
Effect sizes are
statistical measures of the size of the effect in the population
Given a sufficiently large sample size, null hypothesis significance testing will always show a __________ (i.e., it will lead to rejecting the null hypothesis) even if the difference between population means is trivially small
statistically significant result
Hambleton and Zenisky (2011) produced a comprehensive, clear and validated review form to standardise the checking of translated and adapted items on educational and psychological tests
the 25-question review form
df_B
the between-groups degrees of freedom
σ²_B - effect size for one-way ANOVA
the between-groups population variance
There is an interaction when
the effect of one factor on the dependent variable is not the same at all levels of the other factor
The greater the difference between the samples, compared to the difference within the samples, _________
the greater the ratio
There is a relationship among cultural distance, the probability of observing statistically significant cross-cultural differences, and rival explanations that account for such differences: ___________, the easier it is to find significant cross-cultural differences, but the more difficult it is to interpret them
the larger the cross-cultural distance between groups
the larger the sample sizes
the more likely it is that the null hypothesis is rejected
The symbol # represents the comment marker in R. Anything following ______
the symbol is not executed by the system
σ²_T - effect size for one-way ANOVA
the total population variance
σ²_W - effect size for one-way ANOVA
the within-groups population variance
In particular, items with a perpendicular distance to the main axis larger than an acceptable ______ are flagged as DIF
threshold
The more recent approach prescribes deriving the ________ from the data using a normality assumption on the delta points
threshold
SS_T
total deviance
Whenever possible, ______ of graphics, tables and other item elements should be limited to the necessary translation of text or labels
translation
Idiomatic expressions often have a specific connotation in the source language that can be difficult to translate. Communicating the nuances of meaning that were more natural in the source language version may require a noticeably longer text in the target language version of the item, which might result in _____
translation DIF
Some words may have additional shades of meaning in one language but not in the other, and this may result in _________
translation DIF
Different font styles can be a source of translation DIF
true
Interpreting findings about similarities and differences is much more difficult in cross-cultural studies than in experimental studies
true
The grammatical structure of translated items may be more susceptible to problems when the sentence completion item format is used
true
The larger the absolute value of Cohen's d, the larger the effect in the population
true
The main effects are most appropriately interpreted when there is no interaction
true
The result of an appropriate model (e.g. The result of an anova or factorial design) must be specified
true
The translation must appropriately reflect the society and customs of the target language
true
Then a test statistic is computed that allows for determining the likelihood of obtaining the sample statistics if the null hypothesis were _______
true
logistic regression - A ΔR2 of .02 is negligible according to both the criteria by Jodoin and Gierl (2001) and Zumbo and Thomas (1997)
true
logistic regression - If the slope of an item for the interaction between total test score and group membership is statistically significant, there is evidence of nonuniform DIF on that item
true
p_jFm is the proportion of test takers from the focal group with a total test score of m who answered item j = 1, ..., J correctly
true
p_jRm is the proportion of test takers from the reference group with a total test score of m who answered item j = 1, ..., J correctly
true
w_m is the proportion of test takers from the standardization group with a total test score of m
true
β_1j, β_2j, and β_3j are the slopes of item j for S, G and SG
true
When an item is translated from one language to another and the two separate versions are administered to different groups of test takers, there is nothing to link the data across the separate groups
true (and furthermore: we cannot assume that the two groups are equivalent, because the test takers come from different backgrounds; we cannot consider the items to be equivalent, because they are adaptations of one another)
Main effects are most appropriately interpreted when there is no interaction
true - In the case of the example, the differences produced by one factor on the dependent variable are the same at each level of the other factor(s)
Unlike other DIF methods, the delta plot method can be used with small samples of test takers, either in one of the groups or in both groups
true - Indeed, the delta plot does not rely on sophisticated statistics with known asymptotic distributions
w_m is a weighting factor that weighs the differences between p_jFm and p_jRm
true - in particular, w_m gives the greatest weight to the difference between p_jFm and p_jRm at those total score levels that are most often attained by the standardisation group
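The weighting described on these cards can be sketched as a weighted sum of the differences p_jFm - p_jRm. The Python snippet below is a minimal illustration, assuming St-P-DIF is that weighted sum with w_m taken as the relative frequency of the standardisation group at each total score m; the function name and all numbers are hypothetical.

```python
def st_p_dif(p_focal, p_reference, std_group_counts):
    """Standardised p-difference for one item.

    p_focal[m], p_reference[m]: proportion correct at total score level m
    std_group_counts[m]: number of standardisation-group test takers at level m
    """
    total = sum(std_group_counts)
    weights = [count / total for count in std_group_counts]  # w_m, sums to 1
    return sum(w * (pf - pr) for w, pf, pr in zip(weights, p_focal, p_reference))

# Hypothetical item: the focal group is .10 below the reference group at every score level
p_focal = [0.10, 0.30, 0.55, 0.80]
p_reference = [0.20, 0.40, 0.65, 0.90]
counts = [10, 30, 40, 20]

value = st_p_dif(p_focal, p_reference, counts)
print(round(value, 3))  # negative: the item disadvantages the focal group
```

Because the weights sum to 1, the index stays between -1 and 1, and score levels with more standardisation-group test takers contribute more to it.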
It follows that the probability that all samples are selected from populations with the same mean is large (the null hypothesis is retained)
true and conversely It follows that the probability that all samples are selected from populations with the same mean is small (the null hypothesis is rejected)
Depending on the DIF technique that has been used, different results may be obtained
true - this is one of the reasons why reasoned judgement is necessary
Test translation always concerns at least _____ languages, but this may not be the case with test adaptation
two
In preparing translated versions of instruments, items should remain in the same order and page location, and ________ differences should also be taken into account
typesetting
representative sample
unbiased indicator of what the population is like
Conversely, DIF is an ____________ difference among groups of test takers who are supposed to be comparable with respect to the construct measured
unexpected
An item shows _________ DIF if it gives one group an advantage that is constant across all levels of proficiency
uniform
The statistical significance of parameters β_2j and β_3j is tested by means of a likelihood-ratio test. If β_2j is statistically significant, then there is evidence of _____ DIF on item j. If β_3j is statistically significant, then there is evidence of nonuniform DIF on item j
uniform
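The roles of the group slope and the interaction slope can be illustrated with the logistic model itself. The Python sketch below assumes the usual form logit(p_j) = β_0 + β_1 S + β_2 G + β_3 SG, with G coded 0 for the reference group and 1 for the focal group (the coding is an assumption); the coefficient values are made up for illustration.

```python
import math

def prob_correct(s, g, b0, b1, b2, b3):
    """Probability of a correct response to an item under the logistic DIF model."""
    eta = b0 + b1 * s + b2 * g + b3 * s * g
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    return math.log(p / (1.0 - p))

def group_gap(s, b0, b1, b2, b3):
    """Focal minus reference difference on the logit scale at total score s."""
    return logit(prob_correct(s, 1, b0, b1, b2, b3)) - logit(prob_correct(s, 0, b0, b1, b2, b3))

scores = [5, 10, 15]
no_dif = [group_gap(s, -3.0, 0.3, 0.0, 0.0) for s in scores]       # all zero
uniform = [group_gap(s, -3.0, 0.3, -0.5, 0.0) for s in scores]     # constant gap equal to b2
nonuniform = [group_gap(s, -3.0, 0.3, -0.5, 0.1) for s in scores]  # gap changes with s
print(no_dif, uniform, nonuniform)
```

A nonzero group slope shifts the curve by the same amount at every score level (uniform DIF), while a nonzero interaction slope makes the gap depend on the score level (nonuniform DIF).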
The population effect size η² is ______ and must be estimated from samples
unknown
s²_1 and s²_2
variances of the two samples
MS_W
within-groups variance
X ~ Y * Z means that
X depends on Y, on Z and on the interaction between Y and Z
factorial design
η² and partial η², ε² and partial ε², ω² and partial ω²
the three estimators of effect size in ANOVA
The three estimators of η² are eta-squared η̂², epsilon-squared ε̂², and omega-squared ω̂²
η̂² tends to underestimate the population effect size η²
η̂² tends to overestimate the population effect size η²
Since all quantities in the formulas of η̂², ε̂², and ω̂² are greater than or equal to 0, the following inequality holds true:
ω̂² ≤ ε̂² ≤ η̂²
St-P-DIF ranges from − 1 to 1. Values close to ___ indicate that the item does not function differently
0
The larger the value of 𝑏j , the more difficult it is to respond correctly to item j (in an educational test) or to endorse it (in a psychological test)
1PL
The larger the value of 𝜃n , the larger the proficiency (ability, attitude) level of test taker n
1PL
𝑏j is the difficulty parameter of item j
1PL
It follows that the _____ model (which considers item difficulty only) can be used to detect only uniform DIF, whereas the 2PL and 3PL models (which consider both item difficulty and discrimination) are suitable for detecting uniform and nonuniform DIF
1PL
Each ______ contingency table has group membership (reference group, focal group) and response type (correct, incorrect) as entries
2 x 2
The larger the value of 𝑎j , the larger the ability of item j to discriminate among test takers with different proficiency levels
2PL
𝑏j and 𝑎j are the difficulty and discrimination parameters of item j
2PL
The larger the value of 𝑐j , the larger the probability that a test taker with a low proficiency level responds correctly to a moderate or difficult item (in an educational test) or endorses it (in a psychological test)
3PL
𝑏j , 𝑎j , and 𝑐j are the difficulty, discrimination, and pseudo guessing parameters of item j,
3PL
X ~ Y can be used for running
ANOVA
Like the one-way ANOVA, the factorial design is based on the decomposition of the total variance into
Between-groups variance and within-groups variance
The ANOVA test is based on the decomposition of the total variance (difference among all individuals) into
Between-groups variance and within-groups variance
The ratio between the between-groups variance and the within-groups variance expresses the size of the difference between the samples:
F = between-groups variance / within-groups variance
A researcher applied Lord's chi-square test to the estimates of the 1PL model parameters and concluded that no item exhibited uniform or nonuniform DIF. This statement:
Cannot be correct. The 1PL model does not allow for evaluating nonuniform DIF but only uniform DIF
This group consists of five questions that evaluate whether the different language versions of the tests are appropriate for the cultures in which they are to be administered
Cultural relevance and specificity
Items that substantially depart from the main axis of this elliptical cloud are flagged as ______
DIF
thrTID: specifies the threshold on the perpendicular distances used to detect ______ items. It can be a ________ (default is 1.5); otherwise it can be given the value "norm" if the threshold must be derived from the data by using a normality assumption on the delta points
DIF, numerical value
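The fixed-threshold version of the delta plot can be sketched in a few lines. The Python snippet below is an illustration only, assuming the common conventions that delta scores are 4·Φ⁻¹(1 - p) + 13, that the main axis is the major axis of the delta points, and that the distance of an item is (b·Δ_ref + a - Δ_focal) / sqrt(b² + 1); the function names and the proportions are hypothetical, and only the 1.5 default comes from the card.

```python
import math
from statistics import NormalDist, mean, variance

def delta_scores(proportions_correct):
    """Transform proportions of correct responses into delta scores."""
    normal = NormalDist()
    return [4.0 * normal.inv_cdf(1.0 - p) + 13.0 for p in proportions_correct]

def perpendicular_distances(delta_ref, delta_focal):
    """Signed perpendicular distance of each item from the major axis."""
    mx, my = mean(delta_ref), mean(delta_focal)
    sx2, sy2 = variance(delta_ref), variance(delta_focal)
    n = len(delta_ref)
    sxy = sum((x - mx) * (y - my) for x, y in zip(delta_ref, delta_focal)) / (n - 1)
    slope = (sy2 - sx2 + math.sqrt((sy2 - sx2) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    norm = math.sqrt(slope ** 2 + 1.0)
    return [(slope * x + intercept - y) / norm for x, y in zip(delta_ref, delta_focal)]

# Hypothetical proportions correct; item 2 is much harder for the focal group
p_ref = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
p_focal = [0.9, 0.8, 0.3, 0.6, 0.5, 0.4]

distances = perpendicular_distances(delta_scores(p_ref), delta_scores(p_focal))
threshold = 1.5  # fixed threshold, as in the default mentioned on the card
flagged = [j for j, d in enumerate(distances) if abs(d) > threshold]
print(flagged)
```

Deriving the threshold from the data under a normality assumption (the "norm" option) is not shown here.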
is a statistical observation that involves matching test takers from different groups on the characteristic measured by the test and then looking for performance differences on an item
Differential item functioning (DIF)
The sampling distribution of the F statistic approximates a _______ with degrees of freedom df = (k - 1) and (N - k)
F-distribution
Assuming that the pseudo guessing parameter of an item is larger than 0, the probability of endorsing the item is .5 when the proficiency level of the test taker equals the difficulty parameter of the item
FALSE - assuming that the pseudo guessing parameter of an item is larger than 0, the probability of endorsing the item is larger than .5 when the proficiency level of the test taker equals the difficulty parameter of the item
DIF is a difference among groups of test takers for which the construct is supposed to differ
FALSE - DIF is a difference among groups of test takers who are supposed to be comparable with respect to the construct
Whenever an item shows statistically significant DIF, item bias is present
FALSE - DIF is a necessary but not sufficient condition for item bias
In factorial designs there are two or more dependent variables and one independent variable
FALSE - in factorial designs there are two or more independent variables and one dependent variable
ANOVA tests the alternative hypothesis that two or more independent samples are selected from populations with different means
FALSE - it tests the null hypothesis that two or more independent samples are selected from populations with the same mean
In an analysis comparing k = 5 groups, the degrees of freedom of the within-groups variance are k - 1 = 5 - 1 = 4
FALSE - k - 1 are the degrees of freedom of the between-groups variance; the within-groups degrees of freedom are N - k
The grand mean is the mean of the scores in the group with the largest number of individuals
FALSE - the grand mean is the mean of the scores of all individuals
The IRT methods for detecting DIF do not match test takers based on proficiency level
FALSE - the IRT methods match test takers on an estimate of the "true" latent proficiency level
Lord's chi-square test consists in evaluating the size of the area between the item characteristic curves of the reference and focal groups
FALSE - Lord's chi-square test consists in comparing the item parameters estimated in the reference and focal groups. The area between the item characteristic curves is considered in Raju's area method
The reference group is usually some type of minority group
FALSE - the reference group is usually some type of majority group
In a factorial design with two independent factors, the total deviance is the sum of the deviances of the main effects of the two factors and the within-groups deviance
FALSE - the sum also includes the deviance of the interaction effect, that is: SS_T = SS_A + SS_B + SS_AxB + SS_W
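The decomposition of the total deviance can be verified on a small data set. The Python sketch below assumes a balanced design (the same number of scores in every cell) and uses made-up scores; the function name `two_way_sums_of_squares` is hypothetical.

```python
from statistics import mean

def two_way_sums_of_squares(cells):
    """cells maps (level_a, level_b) to a list of scores; balanced design assumed."""
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # scores per cell
    all_scores = [x for scores in cells.values() for x in scores]
    grand = mean(all_scores)
    mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}
    cell_mean = {key: mean(scores) for key, scores in cells.items()}

    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum((cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a in levels_a for b in levels_b)
    ss_w = sum((x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores)
    ss_t = sum((x - grand) ** 2 for x in all_scores)
    return {"A": ss_a, "B": ss_b, "AxB": ss_ab, "W": ss_w, "T": ss_t}

# Hypothetical 2 x 2 design with three scores per cell
cells = {
    ("a1", "b1"): [4.0, 5.0, 6.0],
    ("a1", "b2"): [6.0, 7.0, 8.0],
    ("a2", "b1"): [5.0, 6.0, 7.0],
    ("a2", "b2"): [10.0, 11.0, 12.0],
}
ss = two_way_sums_of_squares(cells)
print(ss)
```

In this example the interaction deviance is not zero, so leaving it out of the sum would not reproduce the total deviance.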
In a factorial design with two factors, one having 3 levels and the other having 5 levels, there are 3 + 5 = 8 groups of individuals
FALSE - there are 3 x 5 = 15 groups of individuals
A parameter is a measure referred to a sample
False - a parameter is a measure referred to a population
The t test for two independent samples tests the null hypothesis that two independent samples are selected from populations with different mean
False - it tests the null hypothesis that two independent samples are selected from populations with the same mean
Statistical inference allows for making inferences about the behaviour of the sample - starting from the characteristics of the population from which the sample is drawn
False - statistical inference allows for making inferences about the behaviour of the population - starting from the observation of a sample drawn from the population
According to Cohen (1988), d = - .4 denotes a very small effect in the population
False - the ds must be interpreted in absolute values
The null and alternative hypothesis refer to the value of a statistic in two or more samples
False - the null and alternative hypothesis refer to the value of a parameter in two or more populations
Two samples, one made up of mothers and the other made up of their own children, are independent samples
False - these two samples are dependent (or paired) samples because each mother is paired with her child
α is the probability of rejecting the null hypothesis when it is false
False - α is the probability of rejecting the null hypothesis when it is true
logistic regression - A ΔR2 of .10 is large according to the criteria by Jodoin and Gierl (2001) but moderate according to those by Zumbo and Thomas (1997)
False! - A ΔR2 of .10 is large according to the criteria by Jodoin and Gierl (2001) but negligible according to those by Zumbo and Thomas (1997)
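The two sets of criteria can be compared side by side with a small lookup. The cut-offs in the Python sketch below are the values commonly quoted for these authors (.13 and .26 for Zumbo and Thomas; .035 and .070 for Jodoin and Gierl); they are not stated on the cards, so they should be treated as an assumption and checked against the course material. The function name is hypothetical.

```python
# Assumed cut-offs: (upper bound of "negligible", upper bound of "moderate")
THRESHOLDS = {
    "zumbo_thomas": (0.13, 0.26),
    "jodoin_gierl": (0.035, 0.070),
}

def classify_delta_r2(delta_r2, criterion):
    """Label a Delta R^2 effect size as negligible, moderate or large."""
    low, high = THRESHOLDS[criterion]
    if delta_r2 < low:
        return "negligible"
    if delta_r2 <= high:
        return "moderate"
    return "large"

for value in (0.02, 0.10):
    print(value,
          classify_delta_r2(value, "jodoin_gierl"),
          classify_delta_r2(value, "zumbo_thomas"))
```

Under these assumed cut-offs, .02 is negligible by both criteria and .10 is large by one and negligible by the other, which matches the answers on the cards.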
logistic regression - It allows for detecting nonuniform DIF only
False! - It allows for detecting uniform and nonuniform DIF
logistic regression - It evaluates uniform DIF by testing the statistical significance of the effect of the total test score
False! - It evaluates uniform DIF by testing the statistical significance of the effect of group membership. The total test score is used as matching variable
For η̂², ε̂², and ω̂², it is possible to compute a negative value
False! - It is possible to compute a negative value for ε̂² and ω̂², but not for η̂²
ω̂² = .24 indicates that 24% of variance in the independent variable is accounted for by the dependent variable
False! - ω̂² = .24 indicates that 24% of the variance in the dependent variable is accounted for by the independent variable
The operators respect the normal operator precedence
First exponentiations, then multiplications and divisions, and finally additions and subtractions
_____ thresholds at .05 or .10 are commonly used to identify DIF items. Items with St-P-DIF in absolute terms above .05 or .10 are flagged for DIF
Fixed
The probability of answering the tested item correctly (p_j) is the dependent variable. The total test score (S), the group membership (G) and the interaction between these two (SG) are the independent variables
For each tested item j, a logistic regression model is fitted where
This group consists of four questions that evaluate the extent to which the versions of each item are practically equivalent across the source and target languages of interest
General translation questions
This group consists of six questions that compare source and target language versions of the tests with respect to grammar and syntax. The emphasis is on differences in expression that either simplify the text or make it more complex, resulting in differences in difficulty between the source and target language versions
Grammar and phrasing
The test score methods for detecting dif use the observed total test score to match test takers across groups
IRT methods for detecting DIF
When evaluating passages the following aspects should be considered: - The source and translated versions of passages should reflect comparable levels of language complexity and formality - _______ should be translated to communicate meaning rather than be translated word for word - No additions, omissions or clarifications of text should emerge in the translated versions of ______
Idioms, passages
1 - does the item have the same or highly similar meaning in both the source and target language versions?
In some cases differences in meaning stem from deficiencies in translation
t test for two independent samples
It is a statistical test of the null hypothesis that two independent samples are selected from populations with the same mean
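The test statistic behind this card can be computed by hand. The Python sketch below assumes the classical pooled-variance form of the t test, which is the version with df = n1 + n2 - 2; the function name and the scores are hypothetical.

```python
import math
from statistics import mean, variance

def independent_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(sample1) - mean(sample2)) / standard_error, df

# Hypothetical scores of two independent groups
group1 = [5.0, 6.0, 7.0, 8.0, 9.0]
group2 = [3.0, 4.0, 5.0, 6.0, 7.0]
t_value, df = independent_t(group1, group2)
print(t_value, df)
```

The p-value would then be read from a t-distribution with df degrees of freedom; that step is left out because the standard library has no t-distribution.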
test translation
It is used to create a test in one language that is linguistically equivalent to that in another language
Cohen's d
It provides an estimate of the standardised difference between the means of two populations
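As a worked illustration, Cohen's d can be computed from two vectors of scores. The Python sketch below assumes the pooled-standard-deviation version of d, built from the sample sizes n1 and n2 and the sample variances s²_1 and s²_2; the function name and the data are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(sample1, sample2):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / math.sqrt(pooled_var)

# Two vectors, each containing the data of one of the two groups (made-up scores)
group1 = [5.0, 6.0, 7.0, 8.0, 9.0]
group2 = [3.0, 4.0, 5.0, 6.0, 7.0]
d = cohens_d(group1, group2)
print(round(d, 3))
```

Swapping the two vectors only flips the sign of d, which is why d is interpreted in absolute value.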
Refers to a significant group difference on an item that cannot be attributed to a factor that is relevant to the construct measured by the test
Item bias
This group consists of five questions that evaluate the extent to which the source and target language versions of the tests are comparable with respect to item format and physical appearance of items on the page (or screen, in the case of computerised tests)
Item format and appearance
The values MH-α are often converted into the index MH-D-DIF via:
MH-D-DIF = − 2.35 ln(MH-α)
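The conversion can be tried on a few 2 x 2 tables. The Python sketch below assumes the usual Mantel-Haenszel common odds ratio, with one table per total score level laid out as (reference correct, reference incorrect, focal correct, focal incorrect); the function names and the counts are hypothetical.

```python
import math

def mantel_haenszel_alpha(tables):
    """Common odds ratio MH-alpha across the 2 x 2 tables of all score levels."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

def mh_d_dif(alpha):
    """Convert MH-alpha to the delta scale: MH-D-DIF = -2.35 ln(MH-alpha)."""
    return -2.35 * math.log(alpha)

# Hypothetical counts: equal odds in both groups at every score level
tables_no_dif = [(20, 10, 10, 5), (30, 10, 15, 5)]
# Hypothetical counts: the reference group has higher odds of a correct response
tables_favour_reference = [(30, 10, 15, 15), (40, 10, 20, 20)]

alpha_equal = mantel_haenszel_alpha(tables_no_dif)
alpha_ref = mantel_haenszel_alpha(tables_favour_reference)
print(alpha_equal, mh_d_dif(alpha_equal))
print(alpha_ref, mh_d_dif(alpha_ref))
```

An MH-α of 1 maps to an MH-D-DIF of 0, and values of MH-α above 1 map to negative MH-D-DIF values because of the minus sign in the conversion.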
• A value of 1 indicates that the reference and focal groups exhibited the same performance to the item • A value greater than 1 indicates that the item favours the reference group • A value smaller than 1 indicates that the item favours the focal group
MH-α is interpreted as follows:
The between groups variance is further decomposed into
Main effects Interaction effects
______ dif is evaluated by testing the statistical significance of the interaction term
Nonuniform
Changes in text introduced through translation generally fall into three categories
Omissions, Substitutions, Additions
2 - is the language of the translated item of comparable difficulty and commonality with respect to the words in the item in the source language version?
One challenge is choosing between one expression that is an exact equivalent but is rarely used and a more commonly used but less equivalent one (it is possible to have a translation that is "correct" in the absolute sense, but when (a) words of low frequency are used or (b) words are used in a way that is not common in the target language, the items may not be comparable)
How to specify the input data for Cohen's d in R
One way is to define two vectors, each containing the data of one of the two groups
In many cases items do not consist only of stem and answer choices but refer to a passage or other item relevant stimulus material (e.g. table, chart, graph)
Passages and other item relevant stimulus materials (if relevant)
_____ values of St-P-DIF for an item indicate that the item favours the focal group, while negative values indicate that the item disadvantages the focal group
Positive
The ______ (Raju, 1988) consists in computing the area between the item characteristic curves of the reference and focal groups. If the area is zero, there is no DIF. As the area between the curves moves away from zero, DIF increases
Raju's area method
The total test score ___ is the matching variable used to link test takers on proficiency
S
A researcher conducted a one-way univariate ANOVA and computed η̂², ε̂², and ω̂². He obtained the following values: .455, .501, and .464. However, he got a bit confused and does not know how to associate them with the three indicators. Could you help him?
Since the inequality ω̂² ≤ ε̂² ≤ η̂² holds true, then: η̂² = .501, ε̂² = .464, ω̂² = .455
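The three estimators can be computed directly from the ANOVA sums of squares. The Python sketch below assumes the usual textbook formulas, η̂² = SS_B / SS_T, ε̂² = (SS_B - df_B·MS_W) / SS_T and ω̂² = (SS_B - df_B·MS_W) / (SS_T + MS_W); the function name, the number of groups and the sample size are hypothetical.

```python
def anova_effect_sizes(ss_between, ss_within, k, n_total):
    """Return (eta squared, epsilon squared, omega squared) estimates for a one-way ANOVA."""
    ss_total = ss_between + ss_within
    df_between = k - 1
    ms_within = ss_within / (n_total - k)
    adjusted = ss_between - df_between * ms_within  # numerator shared by epsilon and omega
    eta_sq = ss_between / ss_total
    epsilon_sq = adjusted / ss_total
    omega_sq = adjusted / (ss_total + ms_within)
    return eta_sq, epsilon_sq, omega_sq

# Hypothetical analysis: 3 groups, 30 individuals in total
eta_sq, epsilon_sq, omega_sq = anova_effect_sizes(39.22, 16.12, k=3, n_total=30)
print(round(eta_sq, 3), round(epsilon_sq, 3), round(omega_sq, 3))

# With a very small between-groups deviance the adjusted numerator turns negative
eta_small, epsilon_small, omega_small = anova_effect_sizes(0.5, 30.0, k=3, n_total=30)
print(eta_small >= 0, epsilon_small < 0, omega_small < 0)
```

For the first set of values the ordering ω̂² ≤ ε̂² ≤ η̂² is visible in the output; the second call shows how ε̂² and ω̂² can fall below 0 while η̂² cannot.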
11 - are there any grammatical clues that might make this item easier or harder in the target language version?
Some of these grammatical clues are - Inconsistencies between the stem and response options in multiple choice items - Inconsistencies among response options in multiple choice items - Words from the source language version that are mistakenly retained in the target version
In both uniform and unidirectional nonuniform DIF a group is always favoured over another group
TRUE
In crossing nonuniform DIF a group is favoured at certain proficiency levels and disadvantaged at other proficiency levels
TRUE
The larger the value of 𝑐𝑗, the larger the probability of observing a 1 response to a moderate or difficult item by a test taker with a low proficiency level
TRUE
Assuming that SSt = 55.34 and SSw = 16.12, then SSb = 39.22
TRUE - since SSt = SSb + SSw, then SSb = SSt - SSw = 55.34 -16.12 = 39.22
If the difference among the samples is smaller than the difference within the samples, the F statistic is smaller than 1
TRUE - the F statistic is the ratio between the between-groups variance (which expresses the difference among the samples) and the within-groups variance (which expresses the difference within the samples). If the numerator is smaller than the denominator, the F statistic will be smaller than 1
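The decomposition and the F ratio can be reproduced with a few lines of code. The Python sketch below computes the deviances, the two variances and F for a one-way design; the function name and the scores are hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """Deviances, mean squares and F statistic for a list of independent samples."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ms_between = ss_between / (k - 1)       # between-groups variance
    ms_within = ss_within / (n_total - k)   # within-groups variance
    return {"SSB": ss_between, "SSW": ss_within, "SST": ss_total,
            "F": ms_between / ms_within, "df": (k - 1, n_total - k)}

# Hypothetical data: group means clearly apart
clear = one_way_anova([[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [6.0, 7.0, 8.0]])
# Hypothetical data: group means almost equal, large spread within groups
noisy = one_way_anova([[1.0, 5.0, 9.0], [2.0, 5.0, 9.0]])
print(clear["F"], noisy["F"])
```

In the second data set the difference among the samples is much smaller than the difference within them, so F falls below 1.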
______, on the other hand, is a culturally focused process
Test adaptation
A distinction must be made between:
Test translation and test adaptation
__________ are the focus of the analysis and the ________ serves as a basis of comparison for the focal groups
The focal groups, reference group
The IRT methods for detecting DIF are
- The likelihood ratio test - Lord's chi-square test - Raju's area method
______ test consists in comparing the item parameters estimated in the reference and focal groups
Lord's chi-square
The IRT models prescribe that the response of a test taker to an item can be explained as a function of
- The proficiency (ability, attitude, etc.) level of the test taker in the trait measured by the test - One or more characteristics of the item that depend on the considered IRT model
__________ is usually some type of majority group
The reference group
It has been found to be effective for flagging adapted versions of items for DIF when sample sizes were as small as 100
The standardisation method
It allows for identifying uniform DIF among dichotomous items
The standardisation method
People in different cultures use response scales for psychological tests in different ways ...
The use (or lack of use) of extreme ratings
• A value of 0 indicates that the reference and focal groups exhibited the same performance to the item • A value greater than 0 indicates that the item favours the focal group • A value smaller than 0 indicates that the item favours the reference group
Thus, MH-D-DIF is interpreted as follows: (D = delta)
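The sign convention above can be illustrated with a minimal sketch. It assumes the usual definition MH-D-DIF = -2.35 ln(MH-α), where MH-α is the Mantel-Haenszel common odds ratio pooled over score levels. The counts are invented, and each stratum is a hypothetical tuple (reference correct, reference incorrect, focal correct, focal incorrect).

```python
import math

def mh_alpha(strata):
    """Mantel-Haenszel common odds ratio across total-score levels."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_d_dif(strata):
    """Log transformation with a minus sign: 0 means equal performance."""
    return -2.35 * math.log(mh_alpha(strata))

# Same performance in both groups at every score level.
balanced = [(30, 10, 30, 10), (20, 20, 20, 20)]
# Reference group does better at every score level.
favours_reference = [(30, 10, 20, 20), (20, 20, 10, 30)]

d_balanced = mh_d_dif(balanced)          # equal performance: 0
d_reference = mh_d_dif(favours_reference)  # negative: favours the reference group
```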
5 - Is the item format, including physical layout, the same in the two language versions?
To the extent possible an item should look the same or highly similar in all language versions
In the t test for two independent samples (1 and 2) the sampling distribution of statistic t approximates a t-distribution with degrees of freedom df = n1 +n2 - 2
True
Usually σ denotes the standard deviation of the population
True - σ is the Greek letter sigma
The decision about the null hypothesis is based on a significance level (denoted as ______ or α) that expresses the probability of rejecting the null hypothesis when it is ______
Type I error, true
______ DIF is evaluated by testing the statistical significance of the effect of group membership
Uniform
The following arguments must be specified
- Value(s): effect size value or vector of effect size values - Rules: the criteria that must be used for evaluating the effect size value(s)
• First, the total test score is entered (i.e., the matching variable) • Then, group membership is entered • Finally, the interaction between total test score and group membership is entered
Variables are entered into the regression equation hierarchically
_______ are objects that contain sequences of values consisting of elements of the same type (e.g., all numbers, all strings)
Vectors
X ~ Y * Z can be used for running
a factorial design
parameter
a measure (e.g., frequency, mean, variance) referred to a population
statistic
a measure (e.g., frequency, mean, variance) referred to a sample
The larger the _________, the larger the effect in the population
absolute value of d
All the commands start with "dif" followed by the ______ for the specific method
acronym
𝜃 is the proficiency level of test taker n
All three PL models (1PL, 2PL, and 3PL)
The difference between the η̂², ε̂², and ω̂² effect sizes and the partial η̂², partial ε̂², and partial ω̂² effect sizes is that the former take into account ______, whereas the latter take into account only those sources that are directly relevant to the effect under consideration
all sources of variability
Differently from null-hypothesis significance testing, effect sizes _________ by sample size
are not affected
X̄1 and X̄2
are the means of two samples
SST and SSB
are the total and the between-groups deviance
The first developers of R were Robert Gentleman and Ross Ihaka of the Department of Statistics at the University of ________
Auckland
As a measure of the proportion of variance in the dependent variable that is accounted for by the independent variable, 𝜂2 cannot be ________
below 0
Another possible solution is using _____ test takers for evaluating comparability of items in original and target languages
bilingual
An item is flagged for DIF if the associated MH-χ2 statistic value is larger than a critical value based on the _______ distribution with one degree of freedom and an appropriately chosen significance level
chi-square
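A minimal sketch of the flagging rule, assuming the usual MH-χ2 formula with continuity correction. The counts are invented, each stratum is a hypothetical tuple (reference correct, reference incorrect, focal correct, focal incorrect), and 3.841 is the chi-square critical value for one degree of freedom at a significance level of .05.

```python
def mh_chi_square(strata):
    """Mantel-Haenszel chi-square statistic with continuity correction."""
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in strata:
        n_ref, n_foc = a + b, c + d      # group sizes at this score level
        m1, m0 = a + c, b + d            # correct / incorrect totals
        t = n_ref + n_foc
        if t < 2:
            continue                     # stratum carries no information
        sum_a += a
        sum_expected += n_ref * m1 / t
        sum_variance += n_ref * n_foc * m1 * m0 / (t * t * (t - 1))
    return (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, alpha = .05

balanced = [(30, 10, 30, 10), (20, 20, 20, 20)]
favours_reference = [(30, 10, 20, 20), (20, 20, 10, 30)]

flag_balanced = mh_chi_square(balanced) > CRITICAL_VALUE
flag_reference = mh_chi_square(favours_reference) > CRITICAL_VALUE
```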
Sometimes an item could be flagged for DIF but there could be no ____ reason why DIF exists - the researcher should try to find a theoretical reason for why DIF occurs
clear (one of the reasons that reasoned judgement is necessary)
If the difference between the samples is substantially equivalent to the difference within the samples, then the ratio is ______
close to 1
The terms small, medium and large are relative, not only to each other but to the area of behavioural science or even more particularly to the specific content and research method being employed in any given investigation. In the face of this relativity there is a certain risk inherent in offering conventional operational definitions for these terms
Cohen's cautions for interpreting effect sizes
t test for two independent samples
Cohen's d
Simpson's paradox illustrates the importance of comparing the _______, as is done in DIF analysis
comparable
ε̂² and ω̂² are less biased estimators of the population effect size η². They try to ________ for the fact that η̂² tends to overestimate η² by subtracting dfB × MSW from the numerator of η̂². Moreover, the formula of ω̂² also adds ______ to the denominator
compensate, MSW
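The three estimators can be sketched as follows, assuming η̂² = SSB / SST and applying the two corrections just described. The deviances reuse the values SSb = 39.22 and SSw = 16.12 from an earlier card, while the degrees of freedom (2 and 27) are hypothetical.

```python
def anova_effect_sizes(ss_b, ss_w, df_b, df_w):
    """Return (eta squared, epsilon squared, omega squared) estimates."""
    ss_t = ss_b + ss_w                  # SST = SSB + SSW
    ms_w = ss_w / df_w                  # within-groups variance
    eta2 = ss_b / ss_t
    eps2 = (ss_b - df_b * ms_w) / ss_t            # corrected numerator
    omega2 = (ss_b - df_b * ms_w) / (ss_t + ms_w)  # MSW added to denominator
    return eta2, eps2, omega2

# Hypothetical design: 3 groups and 30 individuals, so df_b = 2 and df_w = 27.
eta2, eps2, omega2 = anova_effect_sizes(39.22, 16.12, df_b=2, df_w=27)
```

Because each correction can only shrink the estimate when the corrected numerator is positive, the results respect the ordering ω̂² ≤ ε̂² ≤ η̂².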
Some languages define nouns as either masculine or feminine and if the translation is not done carefully, these references can cue test takers to the _______________
correct answer
The delta plot involves first calculating the proportions of test takers in the reference and focal groups who answer an item ________
correctly
The standardisation method involves first calculating the proportions of test takers in the reference and focal groups conditional on the total test score who answer an item ______
correctly
The number of rival explanations depends on the __________ of the groups involved in the study: _________
cultural distance, more dissimilar groups may show more differences in target variables, but it is also more likely that they differ in background variables
The likelihood ratio test consists of fitting two IRT models to the ______
data
thrSTD: Specifies the threshold on the St-P-DIF statistic to detect DIF items (____ is .10)
default
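A minimal sketch of the standardisation statistic and the .10 threshold. It assumes that St-P-DIF is the weighted average, across total-score levels, of the difference between the focal and reference proportions correct, with focal-group counts as weights (a common choice, taken here as an assumption). The data layout and counts are hypothetical.

```python
def st_p_dif(levels):
    """Weighted mean of (P_focal - P_reference) across total-score levels.

    levels: list of (n_ref, correct_ref, n_foc, correct_foc) tuples.
    """
    num = den = 0.0
    for n_ref, correct_ref, n_foc, correct_foc in levels:
        if n_ref == 0 or n_foc == 0:
            continue  # no comparison possible at this score level
        diff = correct_foc / n_foc - correct_ref / n_ref
        num += n_foc * diff   # weight by focal-group count
        den += n_foc
    return num / den

THRESHOLD = 0.10  # same value as the thrSTD default mentioned above

levels = [(50, 20, 40, 8), (60, 36, 50, 20), (40, 32, 30, 21)]
statistic = st_p_dif(levels)
flagged = abs(statistic) > THRESHOLD
```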
The ________ (or transformed item difficulties) method is a simple method for identifying uniform DIF among dichotomous items
delta plot
The delta plot usually takes the form of an elliptical cloud of ______
delta points
A delta point is a pair of _________, whose first element is the delta score of the reference group and the second element is the delta score of the focal group
delta scores
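A minimal sketch of how delta points can be built from proportions correct. It assumes the usual transformation delta = 4 z + 13, where z is the standard normal deviate of the proportion answering incorrectly. The item proportions are invented.

```python
from statistics import NormalDist

def delta_score(p_correct):
    """Transform a proportion correct into a delta score (harder = larger)."""
    return 4.0 * NormalDist().inv_cdf(1.0 - p_correct) + 13.0

# Hypothetical proportions correct per item: (reference group, focal group).
proportions = [(0.80, 0.75), (0.50, 0.50), (0.30, 0.20)]

# One delta point per item: (reference delta score, focal delta score).
delta_points = [(delta_score(p_ref), delta_score(p_foc))
                for p_ref, p_foc in proportions]
```

An item answered correctly by half of a group gets a delta score of 13, and harder items get larger values.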
Univariate means that there is only one ________
dependent variable
The sampling distribution of t statistic approximates a t-distribution with degrees of freedom
df = n1 + n2 - 2
The MH method is a popular and successful method for identifying uniform DIF among ______ items
dichotomous
In item impact the group _________ in item performance reflects true group differences on the construct measured
difference
Between groups variance
difference among individuals belonging to different samples
Within-groups variance
difference among the individuals within the same group
Specific terms that are used in items in the source language may not be appropriate for use in the target language and so it is necessary to adapt items to reflect terms and content differences that are present in __________
different countries
Note that the probability of endorsing an item is .5 when the proficiency level of the test taker equals the _______ parameter of the item
difficulty
The _______ parameters are considered to detect uniform DIF and the discrimination parameters are considered to detect nonuniform DIF
difficulty
Avoiding controversial and inflammatory topics on assessments for which such ideas are not relevant is an important part of ensuring that an educational or psychological instrument does not cause undue emotional ___________ for respondents
distress
In cross-cultural research (as well as in other fields) statistical significance ________ necessarily reflect meaningful differences among people of different cultures
does not
Evidence of DIF _______ directly translate into item bias: DIF is a necessary but insufficient condition for item bias
does not, (one of the reasons that reasoned judgement is necessary)
One-way ANOVA for two or more independent samples
eta squared (η²), epsilon squared (ε²), omega squared (ω²)
Factorial designs (or factorial experiments) are
experiments with two (or more) independent variables (factors) and one dependent variable
The main effects
express the differences among the means for one factor, computed over the levels of the other factor
Passages must always be translated literally
false
Topics that are appropriate for a certain culture are certainly also appropriate for other cultures
false
When the target language version of the passage is much shorter than the source language version, text can be added to the former version to make its length comparable to that of the source language version
false
In preparing translated versions of instruments, whenever possible item lengths should be kept comparable across item versions because there may be a _______ due to the use of longer items
fatigue effect
The choice of the IRT model depends on different issues such as item ______ (e.g., dichotomous or polytomous items), the number of traits measured by the test (unidimensional or multidimensional test) and the goodness of fit of the IRT model to the data (how well the model describes the observed data)
format
In ___________ translation, one bilingual translates the test from the source language to the target language and a second bilingual independently translates it back to the source language
forward-backward
x (with a line over it)
grand mean
The substantial significance of the results should be interpreted by __________ or by quantifying their contribution to knowledge
grounding them in a meaningful context
The factors are completely crossed (what does it mean?)
There are all possible combinations of the levels of the factors
The samples are _______, that is, each individual belongs to a single sample
independent
In cross-cultural research, effect sizes allow for examining the degree to which cross-cultural data are ____________ between two or more cultures' populations
indicative of meaningful differences
df effect
is the degree of freedom of a certain effect
SS effect
is the deviance of a certain effect (e.g., effect A, effect B, effect A×B)
S pooled
is the pooled standard deviation of the two samples
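A minimal sketch of Cohen's d and the related t statistic for two independent samples. It assumes the pooled standard deviation is computed from the two unbiased sample variances weighted by their degrees of freedom. The scores are invented.

```python
import math
from statistics import mean, variance

def pooled_sd(x1, x2):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

def cohens_d(x1, x2):
    """Standardised difference between the two sample means."""
    return (mean(x1) - mean(x2)) / pooled_sd(x1, x2)

def t_statistic(x1, x2):
    """t statistic and degrees of freedom (df = n1 + n2 - 2)."""
    n1, n2 = len(x1), len(x2)
    se = pooled_sd(x1, x2) * math.sqrt(1.0 / n1 + 1.0 / n2)
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2

group_1 = [4, 5, 6, 7, 8]
group_2 = [2, 3, 4, 5, 6]
d = cohens_d(group_1, group_2)
t, df = t_statistic(group_1, group_2)
```

Unlike t, the value of d does not grow with the sample sizes, since it divides by the pooled standard deviation rather than by a standard error.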
The between-groups variance (or between-groups mean squares)
is the ratio between the between-groups deviance SSB and the between-groups degrees of freedom dfB = k ‒ 1 (where k is the number of groups): MSB = SSB / dfB
The F statistic
is the ratio between the between-groups variance and the within-groups variance: F = MSB / MSW
The within-groups variance (or within-groups mean squares)
is the ratio between the within-groups deviance SSW and the within-groups degrees of freedom dfW = N ‒ k (where N is the total number of individuals and k is the number of groups): MSW = SSW / dfW
The between groups deviance (or between groups sum of squares)
is the sum, over groups, of the squared deviations of the mean of the scores of individuals in group j from the grand mean, each weighted by the group size nj: SSB = Σj nj (x̄j - x̄)²
The within-groups deviance (or within-groups sum of squares)
is the sum of the squared deviations of the score of each individual from the mean of their group: SSW = Σj Σi (xij - x̄j)²
MSW
is the within-groups variance
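The deviances, variances and F statistic defined in the preceding cards can be computed step by step. This is a minimal sketch on invented scores for three groups, and the dictionary keys are illustrative names.

```python
from statistics import mean

def one_way_anova(groups):
    """Return deviances, mean squares and F for a one-way design."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    # Between-groups deviance: group means vs grand mean, weighted by size.
    ss_b = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups deviance: each score vs the mean of its own group.
    ss_w = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1            # k - 1
    df_w = len(scores) - len(groups)  # N - k
    ms_b = ss_b / df_b
    ms_w = ss_w / df_w
    return {"SSB": ss_b, "SSW": ss_w, "SST": ss_b + ss_w,
            "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}

result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

Here the groups differ much more from each other than the individuals differ within each group, so F is well above 1.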
A fixed threshold of 1.5 is commonly used to identify DIF items. However, it appeared that such a fixed threshold was most often too conservative in the presence of DIF. What is the consequence?
it might miss items that function differently across groups
Differential item functioning does not necessarily signify ________: item bias is present when an item has been statistically flagged for DIF and the reason for the DIF is traced to a factor that is irrelevant to the construct the test is intended to measure
item bias
E.g., when a group has a higher proportion of individuals answering an item correctly than another group in the context of an educational assessment, or a higher endorsement rate in the context of attitude, opinion, or personality assessment
item bias
Item bias must be distinguished from ________ that refers to a significant group difference on an item that reflects true group differences on the construct measured
item impact
𝛽0j is the intercept of item __
j
A disadvantage of IRT methods is that they require a relatively _____ sample size for accurately estimating the item parameters
large
The IRT methods are based on the estimation of an IRT model and use the estimate of the _______ trait level as a matching criterion
latent
One of the best methods for identifying uniform and nonuniform DIF among dichotomous items
logistic regression
The test score methods are usually based on statistical procedures for categorical data and use the total test score as a matching criterion
Methods for detecting DIF
The values MH-D-DIF are easier to interpret than MH-α. The log transformation centres the value of MH-D-DIF to 0. In addition, the _____ sign changes the interpretation of values greater or smaller than 0
minus
If different language versions of an item function differentially when administered to bilinguals, it is likely that they will function differentially also when administered to ______
monolinguals
It is a ________ matrix where n represents the number of observations (usually the number of individuals) and m represents the number of variables
n x m
In preparing translated versions of instruments, care should be taken to ensure that the ________ of the task to be used in different language versions is sufficiently common across cultures
nature - Any bias suspected can be addressed with one or two practice items added to the instructions
Mantel and Haenszel (1959) developed a χ2-test of the null hypothesis of ______ association between item response and group membership, which corresponds to the hypothesis of no DIF
no conditional
Usually the null hypothesis is a conservative hypothesis stating that there is ________ between the populations
no difference
itemFit - specifies the model to be selected for drawing the item curves. Possible values are: - "best" - two curves are drawn if the item is flagged as DIF and only one curve is drawn if the item is flagged as ______ (default) - "null" - two curves are drawn. Not used if plot = "lrStat"
non-DIF
An item shows ________ DIF if the advantage given to a group changes with the level of proficiency
nonuniform
A possible solution is using ____ items or items with little verbal loading, when possible
nonverbal
More precisely the sample of delta points is assumed to arise from a bivariate ______ distribution
normal
Two samples are independent when their elements are _______ to one another
not linked
Under the _______ of no conditional association between item response and group membership (which corresponds to the hypothesis of no DIF), the MH-χ2 statistic follows a chi-square distribution with one degree of freedom
null hypothesis
item - specifies either the _____ or the name of the item for which logistic curves are plotted (used only if plot = "itemCurve")
number