Test Development: Chapter 8


DIF items

items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership

completion item

requires the examinee to provide a word or phrase that completes a sentence

item analysis

statistical procedures that assist in making judgments about which items are good as they are, which items need to be revised, and which items should be discarded

categorical scaling

stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum

scoring drift

a discrepancy between scoring in an anchor protocol and the scoring of another protocol

qualitative item analysis

a general term for various nonstatistical procedures designed to explore how individual test items work

item-characteristic curve

a graphic representation of item difficulty and discrimination

rating scale

a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker

item-discrimination index

a measure of item discrimination, symbolized by a lowercase italic "d"
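
As a rough illustration, d is commonly computed by comparing how many high scorers and how many low scorers on the whole test answered a given item correctly. The sketch below assumes equal-sized upper and lower scoring groups; the function name and the sample numbers are hypothetical.

```python
def item_discrimination(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Return d = (U - L) / n.

    U and L are the numbers of testtakers in the upper- and lower-scoring
    groups who answered the item correctly; n is the size of each group
    (assumed equal here).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct - lower_correct) / group_size


# Hypothetical item: 20 of 32 high scorers and 16 of 32 low scorers answer correctly.
print(item_discrimination(20, 16, 32))  # 0.125
```

Values of d range from -1 to +1. A negative value means more low scorers than high scorers answered the item correctly, which usually flags the item for revision.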

binary-choice item

a multiple-choice item that contains only two possible responses, for example true/false

test conceptualization

the initial stage of test development, in which an idea for a test is conceived

scalogram analysis

an item analysis procedure and approach to test development that involves a graphic mapping of a testtaker's responses

differential item functioning (DIF)

a phenomenon in which an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same level of the underlying trait

biased test item

an item that favors one particular group of examinees in relation to another when differences in group ability are controlled

short-answer item

another name for a completion item

qualitative methods

techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures

test development

an umbrella term for all that goes into the process of creating a test

ipsative scoring

comparing a testtaker's score on one scale within a test to another scale within that same test

multiple-choice format

has three elements: a stem, a correct answer, and several incorrect answers referred to as distractors or foils

co-validation

a test validation process conducted on two or more tests using the same sample of testtakers, also referred to as co-norming

test tryout

administration of a preliminary version of the test under conditions that simulate those under which the final version will be administered

comparative scaling

entails judgments of a stimulus in comparison with every other stimulus on the scale

unidimensional scale

only one dimension is presumed to underlie the ratings

selected-response format

requires testtakers to select a response from a set of alternative responses

constructed-response format

requires testtakers to supply or to create the correct answer, not merely to select it

item-difficulty index

the proportion of testtakers who answered an item correctly; the term used in achievement testing
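
An item-difficulty index is conventionally the proportion of testtakers who answered the item correctly, so a larger value indicates an easier item. A minimal sketch, in which the function name and the response data are hypothetical:

```python
def item_difficulty(responses: list[bool]) -> float:
    """Return the proportion of correct responses to a single item."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


# Hypothetical item answered correctly by 6 of 8 testtakers.
responses = [True, True, False, True, False, True, True, True]
print(item_difficulty(responses))  # 0.75
```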

item-endorsement index

the proportion of testtakers who endorsed an item; the term used in personality testing

Likert scale

a summative scale, used extensively in psychology, usually to scale attitudes

item format

variables such as the form, plan, structure, arrangement, and layout of individual test items

item bank

a relatively large and easily accessible collection of test questions

Guttman scale

a scaling method that yields ordinal-level measures where items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured
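
On a perfectly cumulative scale of this kind, a respondent who agrees with a stronger statement also agrees with every weaker one. The sketch below checks a single response pattern for that property; it assumes the responses are already ordered from the weakest to the strongest statement, and the function name is hypothetical.

```python
def is_guttman_consistent(responses: list[bool]) -> bool:
    """Check a response pattern ordered from weakest to strongest item.

    The pattern is consistent when no agreement follows a disagreement,
    i.e. it looks like [True, ..., True, False, ..., False].
    """
    seen_disagreement = False
    for agreed in responses:
        if agreed and seen_disagreement:
            return False  # agreed with a stronger item after rejecting a weaker one
        if not agreed:
            seen_disagreement = True
    return True


print(is_guttman_consistent([True, True, False, False]))  # True
print(is_guttman_consistent([True, False, True, False]))  # False
```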

test construction

a stage in the process of test development that entails writing test items, as well as formatting items, setting scoring rules, and otherwise designing and building a test

item-validity index

a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure

factor analysis

a statistical tool useful in determining whether items on a test appear to be measuring the same thing

sensitivity review

a study of test items, typically conducted during the test development process, in which items are examined for fairness to all prospective testtakers and for the presence of offensive language, stereotypes, or situations

essay item

a test item that requires the testtaker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation

anchor protocol

a test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies

test revision

action taken to modify a test's content or format for the purpose of improving the test's effectiveness as a tool of measurement

phantom factors

factors that are actually just artifacts of a small sample size

multidimensional scale

more than one dimension is thought to guide the testtaker's responses

item-reliability index

provides an indication of the internal consistency of a test

DIF analysis

test developers scrutinize group-by-group item response curves, looking for what are termed DIF items

class scoring

testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way

method of paired comparisons

testtakers are presented with pairs of stimuli which they are asked to compare
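
To make the procedure concrete, the sketch below generates every pair from a small set of stimuli and tallies how often each stimulus is chosen. The tallying rule, the function names, and the sample choices are assumptions for illustration only; operational tests may score paired comparisons differently, for example against judges' ratings.

```python
from itertools import combinations


def all_pairs(stimuli: list[str]) -> list[tuple[str, str]]:
    """Return every unordered pair of stimuli: n * (n - 1) / 2 pairs."""
    return list(combinations(stimuli, 2))


def tally_choices(stimuli: list[str], choices: dict[tuple[str, str], str]) -> dict[str, int]:
    """Count how many times each stimulus was chosen across all pairs."""
    counts = {s: 0 for s in stimuli}
    for pair in all_pairs(stimuli):
        chosen = choices[pair]
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not a member of pair {pair}")
        counts[chosen] += 1
    return counts


# Hypothetical testtaker choices for three stimuli.
stimuli = ["A", "B", "C"]
choices = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "C"}
print(tally_choices(stimuli, choices))  # {'A': 1, 'B': 0, 'C': 2}
```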

"think aloud" test administration

testtakers take the test and think aloud while they respond to each item

item branching

the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items
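
A toy sketch of the idea: move to a harder item after a correct response and to an easier one after an incorrect response. The one-step rule and the function name are assumptions made for illustration; operational adaptive tests typically use more sophisticated item-selection procedures.

```python
def next_item_index(current: int, answered_correctly: bool, n_items: int) -> int:
    """Pick the next item from a bank sorted from easiest (0) to hardest (n_items - 1)."""
    step = 1 if answered_correctly else -1
    # Clamp so the index never leaves the bank.
    return max(0, min(n_items - 1, current + step))


# Start in the middle of a hypothetical 7-item bank.
index = 3
for correct in [True, True, False]:
    index = next_item_index(index, correct, 7)
print(index)  # 4
```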

validity shrinkage

the decrease in item validities that inevitably occurs after cross-validation of findings

item fairness

the degree, if any, to which a test item is biased

test revision

the development process that the test undergoes as it is modified and revised

item analysis

the different types of statistical scrutiny that the test data can potentially undergo once tryout data have been collected

floor effect

the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured

ceiling effect

the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured

summative scale

the final test score is obtained by summing the ratings across all the items
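
A minimal sketch of summative scoring, assuming a 1-to-5 rating format with no reverse-keyed items; the function name, the rating range, and the sample ratings are hypothetical.

```python
def summative_score(ratings: list[int], low: int = 1, high: int = 5) -> int:
    """Sum the item ratings after checking each one is within the rating range."""
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")
    return sum(ratings)


# Hypothetical ratings on a five-item attitude scale.
print(summative_score([4, 5, 3, 4, 2]))  # 18
```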

pilot work

the preliminary research surrounding the creation of a prototype of the test

scaling

the process of setting rules for assigning numbers in measurement

item pool

the reservoir or well from which items will or will not be drawn for the final version of the test

cross-validation

the revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion

matching item

the testtaker is presented with two columns, premises on the left and responses on the right, and must determine which response is best associated with which premise
