Week 6: Test Construction
In norming a test we need to bear in mind
how the test is to be used
The empirical approach to psychological test development
relies on the frequency of endorsement of items by selected groups
Psychological tests
seldom achieve more than a statement about rank order in terms of the characteristic of interest
The discriminability of an item refers to the capacity of the item to
separate those who are high on the trait of interest from those who are low
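One common way to put a number on discriminability is to compare how often high and low scorers endorse the item. The sketch below is only an illustration under assumptions: the function name, the group size `k`, and the data are hypothetical, and the upper-minus-lower index is one of several possible statistics.

```python
def discrimination_index(item, totals, k):
    """Proportion endorsing the item among the k highest total scorers
    minus the proportion among the k lowest total scorers."""
    # Rank respondents by total test score, lowest first.
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item[i] for i in high) / k
    p_low = sum(item[i] for i in low) / k
    return p_high - p_low

# Hypothetical data: six respondents, total scores and one 0/1 item.
totals = [2, 3, 5, 6, 8, 9]
good_item = [0, 0, 1, 0, 1, 1]   # endorsed mostly by high scorers
flat_item = [1, 0, 1, 0, 1, 0]   # endorsed equally by both groups

d_good = discrimination_index(good_item, totals, k=2)
d_flat = discrimination_index(flat_item, totals, k=2)
```

An index near 1 means the item separates the two groups well; an index near 0 means it does not.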
Item Response Theory is a stricter model for test construction than classical true score theory in that it
specifies the parameters of the trace line
If p is the proportion of a sample endorsing a dichotomously scored item in the keyed direction and q is 1-p (i.e. the proportion endorsing the item in the opposite direction), then the standard deviation of scores on the item is
square root of pq
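As a quick check on the formula above, the sketch below compares the square root of pq with the standard deviation computed directly from a set of 0/1 item scores. The function name and the responses are made up for illustration.

```python
import math
import statistics

def item_sd(p):
    """Standard deviation of a dichotomous (0/1) item endorsed by proportion p."""
    q = 1 - p
    return math.sqrt(p * q)

# Hypothetical responses: 1 = keyed direction, 0 = opposite direction.
responses = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
p = sum(responses) / len(responses)

sd_formula = item_sd(p)
# Population SD of the raw 0/1 scores, for comparison with the formula.
sd_direct = statistics.pstdev(responses)
```

The two values agree, and the formula also shows that the standard deviation is largest when p is 0.5 and shrinks towards zero as p approaches 0 or 1.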
The Mental Measurements Yearbook is
a catalogue of test reviews
The biserial correlation
can be estimated from the item discrimination index
Multiple choice tests provide more than two options for each question to overcome the problem of
guessing
Scalogram analysis implies that a person's position on a trait indicates
whether they will get a test item right or wrong
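To make the scalogram idea concrete: if items are ordered from easiest to hardest, a perfectly scalable response pattern is a run of passes followed by a run of fails, so the total score alone tells us which items were passed. The sketch below assumes 0/1 scoring and items already sorted by difficulty; both function names are hypothetical.

```python
def is_guttman_pattern(responses):
    """True if responses (items ordered easiest to hardest)
    are all 1s followed by all 0s."""
    seen_zero = False
    for r in responses:
        if r == 0:
            seen_zero = True
        elif seen_zero:
            # A pass after a fail breaks the cumulative pattern.
            return False
    return True

def predicted_pattern(score, n_items):
    """Item responses implied by a total score on a perfect scale."""
    return [1] * score + [0] * (n_items - score)

consistent = is_guttman_pattern([1, 1, 0, 0])
inconsistent = is_guttman_pattern([1, 0, 1, 0])
implied = predicted_pattern(2, 4)
```

Real data rarely fit this pattern perfectly, which is why the sketch only checks whether a given pattern is consistent with it.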
Test construction
follows a sequence of steps but these steps may need to be retraced from time to time
A good manual for a psychological test
indicates to the unqualified user that they should not be using the test, is comprehensible to the qualified test user, and is precise enough to satisfy measurement specialists
The item validity is the
correlation of the item with an external criterion measure of the construct being tested
Thurstone's approach to the construction of attitude scales was replaced for most practical purposes by one developed by
Rensis Likert
Because 0 °C does not represent the complete absence of heat, the Celsius scale cannot be considered
a ratio scale
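A small arithmetic check shows why ratios of Celsius values are not meaningful. The sketch converts to the Kelvin scale, which does have a true zero; the function name is hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (0 K is absolute zero)."""
    return c + 273.15

# The naive ratio suggests 20 degrees C is "twice as hot" as 10 degrees C.
ratio_celsius = 20 / 10

# On a scale with a true zero the same two temperatures differ by only a few percent.
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)
```

Differences between Celsius values remain meaningful, which is what makes it an interval scale.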
The first step in constructing a psychological test is to
be clear about the construct or constructs to be assessed with the test
According to S. S. Stevens, which of the following is not a type of measurement
dichotomous
Although it is useful to include norms for different groups from the population we need to bear in mind that
increasing the number of groups increases the overall sample size required
Classical Test Theory
is a theory of testing based on the idea that a person's observed or obtained score on a test is the sum of a true score (error-free score) and an error score
Item Response Theory
is a theory of testing based on the relationship between individuals' performances on a test item and the test takers' levels of performance on an overall measure of the ability that item was designed to measure.
In using Item Response Theory in practice
item difficulty is often selected as the focus of interest
In studying the behaviour of items in a psychological test, one of the item statistics recommended by some experts is item reliability. This is
the product of the item-total correlation and the standard deviation of the item
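The definition above can be sketched for a dichotomous item, taking the item's standard deviation as the square root of pq. The function names and data are hypothetical, and the item-total correlation here is an uncorrected Pearson correlation (the item is not removed from the total), which is an assumption of this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_reliability_index(item, totals):
    """Item-total correlation multiplied by the item's standard deviation."""
    p = sum(item) / len(item)
    item_sd = math.sqrt(p * (1 - p))   # SD of a 0/1 item
    return pearson_r(item, totals) * item_sd

# Hypothetical data: six respondents, one 0/1 item and their total scores.
item = [0, 0, 1, 0, 1, 1]
totals = [2, 3, 5, 6, 8, 9]
index = item_reliability_index(item, totals)
```

Because the standard deviation of a 0/1 item cannot exceed 0.5, the index rewards items that both correlate with the total and have endorsement rates near the middle of the range.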
Thurstone's model for item construction calls for a
non-monotonic trace line
The term 'social desirability', when used with respect to construction of a personality test, refers to the fact that
people differ in their tendency to create a favourable impression of themselves when answering test items
Systematic bias in a test can occur when
people respond to non-essential features of items rather than to item content
An important step in writing items for psychological tests is to
pilot test the items with individuals similar to those for whom the test is being developed
Items with very high or very low endorsement frequencies generally are
poor items
In preparing a test for publication we need to spend a good deal of time on
preparing a manual for the test user
Monotonic means
the function does not change direction once it begins
A trace line for an item relates
the likelihood of endorsement of the item to the strength of the underlying trait
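The two kinds of trace line mentioned in these notes can be illustrated with simple curves. The sketch below is an assumption-laden illustration, not the form any particular test uses: the logistic curve with parameters `a` and `b` stands in for a monotonic trace line, and a bell-shaped curve stands in for a non-monotonic one. All function and parameter names are hypothetical.

```python
import math

def logistic_trace(theta, a=1.0, b=0.0):
    """Monotonic trace line: endorsement probability rises steadily with the trait.
    a controls steepness, b is the trait level where the probability is 0.5."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def peaked_trace(theta, location=0.0, width=1.0):
    """Non-monotonic trace line: endorsement peaks at one trait level
    and falls away on either side."""
    return math.exp(-((theta - location) ** 2) / (2 * width ** 2))

# Trait levels from low to high.
thetas = [-2, -1, 0, 1, 2]
monotonic = [logistic_trace(t) for t in thetas]
peaked = [peaked_trace(t) for t in thetas]
```

With the monotonic curve, a higher trait level always means a higher likelihood of endorsement; with the peaked curve, people both well below and well above the item's location are unlikely to endorse it.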
In conducting item analysis in test construction
the procedure can be repeated with new samples of items until a satisfactory set has been found
The model of measurement that underlies many commercially available psychological tests is
the weak true score model
If a person endorses a substantial number of items in the improbable direction (e.g. 'I have never told a lie in my life'), we might infer the person is
trying to create a favourable impression of himself or herself