Ch 5: What is test reliability/precision?
internal consistency
The internal reliability of a measurement instrument; the extent to which the questions on a test all measure the same attribute or trait.
generalizability theory
A proposed method for systematically analyzing the many causes of inconsistency or random error in test scores, seeking to find systematic error that can then be eliminated.
scorer reliability
The degree of agreement between or among persons scoring a test or rating an individual; also known as interrater reliability.
reliability coefficient
The correlation between two sets of scores on tests that are, or are expected to be, parallel; the proportion of the observed score variance on a test that is accounted for by the true score.
Spearman-Brown formula
The formula used to estimate what the reliability of the full-length test would be after the test has been split into two halves for the split-half method of estimating reliability.
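As a quick illustration (not from the chapter), the Spearman-Brown correction can be applied directly to the half-test correlation; the function name and the .70 example value below are hypothetical:

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Spearman-Brown prophecy formula:
    r_new = n * r / (1 + (n - 1) * r), where n is the factor by which
    the test is lengthened (n = 2 when correcting a split-half correlation).
    """
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)

# Hypothetical example: the two halves of a test correlate at .70
print(round(spearman_brown(0.70), 2))  # 0.82
```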
true score
The score that would be obtained if an individual took a test an infinite number of times and the average score across all those administrations was computed.
alternate forms
Two forms of a test that are alike in every way except for the questions; used to overcome problems such as practice effects; also referred to as parallel forms.
random error
The unexplained difference between a test taker's true score and the obtained score; error that is nonsystematic and unpredictable, resulting from an unknown cause.
intrarater agreement
how well a scorer makes consistent judgments across all tests.
test-retest method
A method for estimating test reliability in which a test developer gives the same test to the same group of test takers on two different occasions and correlates the scores from the first and second administrations.
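A minimal sketch of the test-retest estimate, using hypothetical scores for five test takers on two administrations (the data and variable names are made up):

```python
import numpy as np

# Hypothetical scores for the same five test takers on two occasions
first_administration = np.array([85, 78, 92, 66, 74])
second_administration = np.array([83, 80, 90, 70, 72])

# The test-retest reliability estimate is the Pearson correlation
# between the first and second sets of scores
r_test_retest = np.corrcoef(first_administration, second_administration)[0, 1]
print(round(r_test_retest, 2))  # 0.97 for these made-up scores
```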
split-half method
A method for estimating the internal consistency or reliability of a test by giving the test once to one group of people, dividing the test into two halves, and correlating the set of scores on one half with the set of scores on the other half.
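A sketch of the split-half procedure under simple assumptions (hypothetical right/wrong item data, an odd-even split, and the Spearman-Brown correction from above):

```python
import numpy as np

# Hypothetical item scores: rows = test takers, columns = items (1 = correct)
items = np.array([
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 1],
])

# Divide the test into halves (here, odd- vs. even-numbered items)
half_a = items[:, 0::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)

# Correlate the two sets of half-test scores...
r_half = np.corrcoef(half_a, half_b)[0, 1]

# ...then correct the correlation up to full test length (Spearman-Brown)
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```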
Reliable test
A test that consistently yields the same measurements for the same phenomena.
homogeneous test
A test that measures only one trait or characteristic.
standard error of measurement (SEM)
An index of the amount of inconsistency or error expected in an individual's test score.
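A small sketch of how the SEM is usually computed, assuming a hypothetical test with a standard deviation of 15 and a reliability coefficient of .91:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = standard deviation * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability/precision coefficient = .91
print(round(standard_error_of_measurement(15, 0.91), 1))  # 4.5
```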
Reliability/precision
The consistency with which an instrument yields measurements.
interscorer agreement
The consistency with which scorers rate or make decisions.
interrater agreement
The consistency with which scorers rate or make yes/no decisions.
parallel forms
Two forms of a test that are alike in every way except questions; used to overcome problems such as practice effects; also referred to as alternate forms.
practice effects
When test takers benefit from having taken a test a first time (practice), so they are able to solve problems more quickly and correctly the second time they take the same test.
confidence interval
a range of scores that the test user can feel confident includes the true score.
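For example (hypothetical numbers), a 95% confidence interval is typically built by going about 1.96 SEMs above and below the observed score:

```python
# Hypothetical observed score and standard error of measurement
observed_score = 110
sem = 4.5
z_95 = 1.96  # z value associated with a 95% confidence level

lower = observed_score - z_95 * sem
upper = observed_score + z_95 * sem
print(f"95% CI: {lower:.1f} to {upper:.1f}")  # 95% CI: 101.2 to 118.8
```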
Correlation
a statistical procedure that provides an index of the strength and direction of the linear relationship between two variables.
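A brief sketch of the Pearson correlation coefficient, computed from hypothetical paired observations (covariance divided by the product of the standard deviations):

```python
import numpy as np

# Hypothetical paired observations on two variables
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 7.0, 6.0, 9.0])

# Pearson r = covariance of x and y / (SD of x * SD of y);
# the sign gives the direction and the size gives the strength (-1 to +1)
r = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(round(r, 2))  # 0.94 for these made-up values
```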
heterogeneous test
a test that measures more than one trait or characteristic.
Cohen's Kappa
an index of agreement between two sets of scores or ratings that corrects for the level of agreement expected by chance.
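A minimal sketch of Cohen's kappa for two raters making yes/no decisions; the ratings below are hypothetical:

```python
from collections import Counter

# Hypothetical yes/no decisions by two raters for ten test takers
rater_1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
rater_2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]

n = len(rater_1)
# Observed proportion of agreement
p_observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Proportion of agreement expected by chance, from each rater's base rates
counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
p_chance = sum(counts_1[c] * counts_2[c] for c in ("yes", "no")) / n**2

# Kappa corrects the observed agreement for chance agreement
kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(kappa, 2))  # 0.58 for these made-up ratings
```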
order effects
changes in test scores resulting from the order in which tests or questions on tests were administered.
measurement error
variations or inconsistencies in the measurements yielded by a test or survey.
systematic error
when a single source of error can be identified as constant across all measurements.
intrascorer reliability
whether each scorer was consistent in the way he or she assigned scores from test to test.