Chapter 6: What is Test Reliability/Precision?
scorer reliability/interscorer agreement
The extent to which two people scoring a test agree on the test score, or the extent to which a test is scored correctly
Schmitt emphasized what about internal consistency
it's not the same as homogeneity
heterogeneous tests
measure more than one trait
internal consistency
a measure of how related the items (or groups of items) on a test are to one another
homogeneous tests
measure one trait
correlation
a statistic that describes the strength and direction of the relationship between two sets of scores; used to measure the stability of test scores
high coefficient alpha
not, by itself, proof that the test is a good measure of a skill set
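As a worked illustration, here is a minimal Python sketch of coefficient alpha using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the score data are hypothetical.

```python
import statistics

# Sketch: coefficient (Cronbach's) alpha for a k-item test.
# `scores` holds one row per test taker and one column per item (hypothetical data).
def coefficient_alpha(scores):
    k = len(scores[0])
    item_vars = [statistics.pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = statistics.pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]  # 4 test takers, 3 items
print(coefficient_alpha(data))  # ~0.97: the items rise and fall together
```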
order effects
occur when the order in which the participants experience conditions in an experiment affects the results of the study
practice effects
test takers' scores improve on a later administration because they benefited from taking the test the first time
Methods for estimating reliability/precision
test-retest, alternate-forms, internal consistency (split-half, coefficient alpha, evaluative)
Reliability/Precision
the ability of a measuring instrument to give consistent results on repeated trials
alternate forms or parallel forms
two equivalent forms of the same test are given to eliminate practice effects
generalizability theory
how well and under what conditions we can generalize an estimate of the reliability/precision of test scores from one test administration to another
Cohen's kappa
A measure of inter-rater or inter-coder reliability between two raters or coders that corrects the observed percentage of agreement for the agreement expected by chance
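As a worked sketch, kappa compares observed agreement (p_o) with chance agreement (p_e) computed from each rater's marginal proportions: kappa = (p_o - p_e) / (1 - p_e). The ratings below are hypothetical.

```python
# Sketch: Cohen's kappa for two raters assigning categorical scores.
def cohens_kappa(ratings_a, ratings_b):
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # observed agreement: proportion of cases where the two raters match
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # chance agreement: product of each rater's marginal proportions per category
    p_e = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

rater_a = ["pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]
print(cohens_kappa(rater_a, rater_b))  # ~0.62; 1.0 would be perfect agreement
```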
standard error of measurement
An index of the amount of error in a test or measure; the standard deviation of the distribution of observed scores a single test taker would obtain on repeated administrations of the same test.
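The usual computing formula is SEM = s * sqrt(1 - r), where s is the standard deviation of the observed scores and r is the test's reliability coefficient; a quick sketch with hypothetical numbers:

```python
import math

# Sketch: standard error of measurement from the score SD and reliability.
def standard_error_of_measurement(sd, reliability):
    return sd * math.sqrt(1 - reliability)

print(standard_error_of_measurement(15, 0.91))  # SD = 15, r = .91 -> SEM = 4.5
```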
split-half method
Divide a test/questionnaire into two parts after the data have been obtained, then correlate the two sets of responses; a high positive correlation indicates a reliable test.
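A sketch of the computation, splitting the items into odd- and even-numbered halves; because the half-test correlation underestimates full-length reliability, it is stepped up with the Spearman-Brown formula, 2r / (1 + r). The data are hypothetical, and statistics.correlation requires Python 3.10+.

```python
import statistics

# Sketch: split-half reliability with the Spearman-Brown correction.
# `scores` holds one row per test taker and one column per item.
def split_half_reliability(scores):
    odd = [sum(row[0::2]) for row in scores]    # total on odd-numbered items
    even = [sum(row[1::2]) for row in scores]   # total on even-numbered items
    r_half = statistics.correlation(odd, even)  # Pearson r between the halves
    return (2 * r_half) / (1 + r_half)          # step up to full test length

data = [[4, 5, 4, 5], [2, 3, 2, 2], [5, 5, 4, 5], [1, 2, 2, 1]]
print(split_half_reliability(data))
```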
Intrarater agreement
How consistently a single scorer makes judgments across all tests (e.g., scores the last exams the same way as the first exams)
Personality Assessment Inventory
developed by Leslie Morey; used in clinical assessment
systematic error
When a single source of error can be identified as constant across all measurements. (ex. a bathroom scale that adds three pounds)
confidence interval
a range of scores that we feel confident will include the test taker's true score
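For example, assuming approximately normal measurement error, a 95% confidence interval runs from the observed score minus 1.96 SEMs to the observed score plus 1.96 SEMs; a sketch with hypothetical numbers:

```python
# Sketch: 95% confidence interval around an observed score using the SEM.
def confidence_interval(observed, sem, z=1.96):   # z = 1.96 for 95% confidence
    return observed - z * sem, observed + z * sem

print(confidence_interval(100, 4.5))  # (91.18, 108.82)
```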
random error
the unpredictable difference between a test taker's observed score and his or her true score; in classical test theory, X = T + E, where X is the observed score, T the true score, and E random error
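A toy simulation of X = T + E: each observed score is the true score plus a random error, so individual measurements scatter but their average converges on the true score (all values hypothetical).

```python
import random

# Sketch: observed score = true score + random error (X = T + E).
true_score = 50
observed = [true_score + random.gauss(0, 3) for _ in range(1000)]  # E ~ N(0, 3)
print(sum(observed) / len(observed))  # close to 50: random errors cancel out
```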
interrater agreement
an index of how consistently the scorers rate or make decisions
reliable test
one we can trust to measure each person in approximately the same way every time it's used
test-retest method
same test administered to the same group after a short period of time and results are compared using a correlation
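A minimal sketch: the reliability estimate is just the Pearson correlation between the two sets of scores (hypothetical data; statistics.correlation requires Python 3.10+).

```python
import statistics

# Sketch: test-retest reliability as the correlation between two administrations.
time_1 = [88, 75, 92, 60, 71]  # hypothetical first-administration scores
time_2 = [90, 73, 95, 58, 74]  # same test takers, two weeks later
print(statistics.correlation(time_1, time_2))  # near 1.0 = stable scores
```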
Measurement error
random variation in repeated measurements of the same thing (for example, repeated measurements of the same room), caused by inconsistencies in the measurement process
intrascorer reliability
whether each scorer is consistent in the way he or she assigns scores from test to test