PSYC 4100, Chapter 6: What is test reliability/precision?
consistency, inconsistency
Basic idea is that test scores reflect 2 sorts of factors: factors that contribute to _________ and factors that contribute to __________
same testing situation
Carefully following all of the instructions for administering a test ensures that all test takers experience the ____ _______ ________ each time the test is given
Random error
Caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process; for example, random, individually experienced events such as illness or a test interruption.
likely to exist
Confidence intervals give us a realistic estimate of how much error is _______ ___ _____ in an individual's observed score
homogeneous
The length of the test influences reliability: the more ____________ questions the respondent answers, the more information the test yields about the concept the test is designed to measure
inter-rater
If a test involves multiple raters, reliability should be checked using
internal consistency
If a test is used to measure only one construct, reliability may be measured using methods of
Reliability coefficient
is simply a Pearson product-moment correlation coefficient applied to test scores
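For illustration only, a minimal Python sketch (not from the chapter; the scores are made up) showing that a reliability coefficient is just a Pearson correlation between two sets of scores from the same test takers:

```python
import numpy as np

# Hypothetical scores for the same five test takers on two administrations
# (or two halves, or two forms) of a test.
scores_1 = np.array([82, 75, 91, 68, 88])
scores_2 = np.array([85, 72, 89, 70, 90])

# The reliability coefficient is the Pearson product-moment correlation
# between the two sets of scores.
r = np.corrcoef(scores_1, scores_2)[0, 1]
print(f"reliability coefficient r = {r:.2f}")
```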
The coefficient alpha formula
is used for test questions that can have more than one correct answer (in contrast to the KR-20, which is used when items are scored dichotomously as right/wrong)
measurement error
more _________ ______ = reduced reliability in the test score
heterogeneous tests
tests measuring more than one trait or characteristic
scorer reliability/interscorer agreement
the amount of consistency among scorers' judgements
observed score
the calculated confidence interval is almost always centered on an _________ ______, not a true score
coefficient alpha
the most commonly applied estimate of a multiple item scale's reliability; represents the average of all possible split-half reliabilities for a construct
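As an illustrative sketch (hypothetical item responses, not textbook data), coefficient alpha can be computed from the item variances and the variance of the total scores using the standard formula alpha = k/(k-1) * (1 - sum of item variances / total-score variance):

```python
import numpy as np

# Hypothetical item responses: rows = test takers, columns = items (e.g., 1-5 ratings).
items = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 2],
    [4, 4, 5, 4],
])

k = items.shape[1]                               # number of items
item_variances = items.var(axis=0, ddof=1)       # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)   # variance of the total scores

# Standard coefficient (Cronbach's) alpha formula.
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"coefficient alpha = {alpha:.2f}")
```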
r
the symbol used to represent a correlation coefficient
Parallel forms (or alternate forms)
two forms of the test that are as much alike as possible
correct decision
The more confident we are that an observed score on a test is really close to the person's true score, the more comfortable we can be that we are making a ______ ________ about the meaning of the score
higher
The number of questions on a test is directly related to reliability -- more questions = _________ reliability (given the items are equivalent in content and difficulty)
consistency
The term reliability/precision describes the ________ of test scores
alternate forms
Their practical advantage is that they can also be used as pretests and posttests if desired
split-half method
divide the test into halves and then compare the set of individual test scores on the first half with the set of individual test scores on the second half; measure of internal consistency
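A rough sketch of the split-half procedure with hypothetical data (it also applies the Spearman-Brown correction described on a later card):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 test takers, 10 right/wrong items driven by a common ability.
ability = rng.normal(size=30)
noise = rng.normal(size=(30, 10))
items = (ability[:, None] + noise > 0).astype(int)

# Randomly assign the items to two halves (as a later card suggests).
cols = rng.permutation(items.shape[1])
half_a = items[:, cols[:5]].sum(axis=1)
half_b = items[:, cols[5:]].sum(axis=1)

# Correlate the scores on the two half-tests, then apply the Spearman-Brown
# correction to estimate the reliability of the full-length test.
r_half = np.corrcoef(half_a, half_b)[0, 1]
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, corrected full-test estimate = {r_full:.2f}")
```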
stable characteristics of the individual
factors that contribute to consistency
features of the situation
factors that contribute to inconsistency
test itself, administration, scoring, test takers
four sources of error that influence reliability
cohen's kappa
a measure of interrater or intercoder reliability between two raters or coders; it takes into account the percentage of agreement and the probability of agreement occurring by chance
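A small illustrative sketch of Cohen's kappa for two raters (hypothetical pass/fail judgments), using kappa = (observed agreement - chance agreement) / (1 - chance agreement):

```python
import numpy as np

# Hypothetical pass/fail judgments from two raters for the same ten test takers.
rater1 = np.array(["pass", "pass", "fail", "pass", "fail", "pass", "fail", "fail", "pass", "pass"])
rater2 = np.array(["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"])

# Observed proportion of agreement.
p_o = np.mean(rater1 == rater2)

# Proportion of agreement expected by chance, from each rater's marginal proportions.
categories = np.unique(np.concatenate([rater1, rater2]))
p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)

# Cohen's kappa corrects observed agreement for chance agreement.
kappa = (p_o - p_e) / (1 - p_e)
print(f"agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```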
Coefficient alpha
describes the extent to which questions on a test or subscale are interrelated
practice effects
A limitation of test-retest is that test takers may score differently because of ________ _________. These occur when test takers benefit from taking the test the first time (practice), which enables them to solve problems more quickly and correctly the next time
internal consistency method
A measure of how related the items (or groups of items) on the test are to one another
approximately the same way
A reliable test is one we can trust to measure each person in ____________ ____ ______ ______ every time it is used. (Strictly speaking, we don't talk about reliable tests -- we refer to reliable scores: reliability has to do with scores and interpretations, and tests can be used incorrectly.)
test-retest
A test developer gives the same test to the same group of test takers on two different occasions, then correlates the scores from the first and second administrations
increase
Adding more questions that measure the same trait can ________ a test's reliability
KR-20, coefficient alpha
Another way of measuring internal consistency is to compare individuals' scores on all possible ways of splitting the test into halves, which can be done using either
68
Approximately ___% of the observed test scores (X) would be within ±1 SEM of the true score (T).
95
Approximately ___% of the observed test scores (X) would be within ±2 SEM of the true score (T).
testing conditions, test instructions
Effective testing practices decrease the chance that test takers' scores will be contaminated with error due to poor _________ _______ or poor _____ _____________
homogeneous
Estimating reliability using methods of internal consistency is appropriate only for __________ tests
quantify, variation
Generalizability theory allows you to ________ each possible source of __________ so that you can determine whether the results obtained are likely to generalize to scores evaluated by different raters or obtained in different situations
systematic
Generalizability theory proposes separating sources of systematic error from random error to eliminate ________ error
truly equivalent
The greatest danger with alternate forms is whether they are ________ __________
test-retest
If the test will be used more than once, for reliability use
true-score variance, total observed-score variance
In classical test theory, reliability is defined as ________ ____________ divided by ________ _________________ ________
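A tiny worked example of that definition, with made-up variance components:

```python
# Classical test theory: observed-score variance = true-score variance + error variance.
# Reliability is the proportion of observed variance that is true-score variance.
true_score_variance = 80.0    # hypothetical
error_variance = 20.0         # hypothetical
observed_variance = true_score_variance + error_variance

reliability = true_score_variance / observed_variance
print(f"reliability = {reliability:.2f}")   # 0.80
```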
homogeneous subtest, factor
It isn't appropriate to calculate an overall estimate of internal consistency when the test is heterogeneous; instead, the developer should calculate and report an estimate of internal consistency for each _________ ________ or _______
valid
Just because a test is reliable does not make it ________, that is, it does not guarantee that the inferences being made from the test scores are correct or that the test is being used properly
well-defined characteristics
It is much easier to develop alternate forms for _____________ ___________ (e.g., math ability) than for personality traits
intraclass correlation coefficient
One statistical way to evaluate inter-rater reliability is the
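For illustration, a sketch of the simplest (one-way) intraclass correlation for ratings from several raters; there are several ICC variants, and the ratings below are hypothetical:

```python
import numpy as np

# Hypothetical ratings: rows = test takers, columns = three raters.
ratings = np.array([
    [7, 8, 7],
    [4, 5, 5],
    [9, 9, 8],
    [3, 2, 3],
    [6, 7, 6],
], dtype=float)

n, k = ratings.shape
row_means = ratings.mean(axis=1)
grand_mean = ratings.mean()

# One-way random-effects ANOVA mean squares.
ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))

# One-way intraclass correlation for single ratings, ICC(1).
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1) = {icc:.2f}")
```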
instructions, constancy, effective practices
Proper test administration affects reliability in three ways
Alternate Forms method
Psychologists may give two forms of the same test, designed to be as much alike as possible, to the same people; the two forms are administered as close together in time as possible
standard error of measurement (SEM)
Psychologists use this as an index of the amount of inconsistency or error expected in an individual's observed test score
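A hedged sketch of how the SEM is typically computed from the score standard deviation and the reliability coefficient (SEM = SD * sqrt(1 - r)), and how the 68%/95% intervals from the earlier cards follow from it; the numbers are hypothetical:

```python
import math

# Hypothetical test statistics.
sd = 10.0             # standard deviation of the observed test scores
reliability = 0.91    # reliability coefficient of the scores
observed_score = 105  # one test taker's observed score

# Standard error of measurement.
sem = sd * math.sqrt(1 - reliability)

# Confidence intervals are built around the observed score:
# roughly 68% within +/- 1 SEM and roughly 95% within +/- 2 SEM.
ci_68 = (observed_score - sem, observed_score + sem)
ci_95 = (observed_score - 2 * sem, observed_score + 2 * sem)
print(f"SEM = {sem:.1f}, 68% CI = {ci_68}, 95% CI = {ci_95}")
```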
cancel themselves out
Random errors that may occur in one test will actually _______ ___________ ____ over an infinite number of testing occasions
unpredictable
Random measurement error affects each individual's score in an ____________ way
true score
Represents the score that would be obtained if the individual took the test an infinite number of times and the scores were averaged across all of those administrations
accuracy
Systematic error affects the _________ of a measurement
haphazardly
Test administrators need to be aware of individuals who complete the test in an unusually short amount of time -- they may have answered ___________, on purpose or by mistake
estimate, differ
The SEM is an ______ of how much the individual's observed test scores (X) might _______ from the individual's true score (T)
longer, shorter
The Spearman-Brown formula is also helpful to test developers who wish to estimate how the reliability/precision of a test would change if the test were made either _________ or _________
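An illustrative sketch of the general Spearman-Brown formula, r_new = n*r / (1 + (n - 1)*r), where n is the factor by which the test is lengthened or shortened (the reliability and lengths below are hypothetical):

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimate reliability if the test were made length_factor times as long
    (length_factor > 1 lengthens the test, < 1 shortens it)."""
    n = length_factor
    return (n * reliability) / (1 + (n - 1) * reliability)

# Hypothetical: a 20-item test whose scores have reliability .70.
print(spearman_brown(0.70, 2.0))   # doubled to 40 items -> about .82
print(spearman_brown(0.70, 0.5))   # halved to 10 items  -> about .54
```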
wider, more
The _______ the confidence interval, the ______ measurement error is present in the test score
random assignment
The best way to divide the test in split-half is to use ________ __________ to place each question in one half or the other
test-retest
This method is only appropriate when test takers are not likely to learn something the first time they take the test that can affect their scores on the second administration, or when the interval between is long enough to prevent these effects
index of the strength
To describe our estimates of reliability/precision of test scores, we use correlation to provide an _______ ____ _____ __________ of the relationship between two sets of test scores
respond
Treating all test takers in the same way decreases error that arises from creating differences in the way individuals __________
shorter
Using split-half method means we are correlating the scores on two _______ versions of the test
sign, number itself
We look at a correlation coefficient in two ways to interpret its meaning
Systematic error
When a single source of error always increases or decreases the true score by the same amount
difference in true scores
When confidence intervals around the true scores overlap, you may not be sure that differences in observed test scores actually correspond to ________ __ ____ _______ ( and you may have to consider the two scores equivalent for decision-making purposes)
prophecy
When doing split-half reliability tests, we use the Spearman-Brown formula, also called the __________ formula, which is used to estimate what the reliability coefficient would be if the test had not been cut in half but instead were the original length
high, low
When reliability of test scores is _____, then the SEM is ____
overlap, equivalent differences
When true-score confidence intervals for two different scores ________, it means that you cannot be sure that the observed scores' differences reflect __________ __________ in true scores
spearman-brown formula
When using split-half method, we must mathematically adjust the reliability coefficient to compensate for the impact of splitting the test into halves; for this we use the
internal consistency
Whether knowing how a person answered one item gives information that helps you predict how they answered another test item
raw scores, larger SEM
____ _______ near the mean of the score distribution tend to have a _______ _____ than very high or very low scores, but scaled scores that have been transformed from raw scores for easier interpretation can sometimes show the opposite pattern
practice, order
______ and ______ effects can add systematic as well as random error to test scores
KR-20
______ is a substitute for coefficient alpha when test items are scored dichotomously (true/false). (Used to estimate internal consistency reliability)
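A minimal sketch (hypothetical right/wrong responses) of the KR-20 formula, KR-20 = k/(k-1) * (1 - sum(p*q) / total-score variance), where p and q are the proportions answering each item correctly and incorrectly:

```python
import numpy as np

# Hypothetical right/wrong (1/0) responses: rows = test takers, columns = items.
items = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
])

k = items.shape[1]
p = items.mean(axis=0)                          # proportion answering each item correctly
q = 1 - p                                       # proportion answering each item incorrectly
total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total scores

kr20 = (k / (k - 1)) * (1 - np.sum(p * q) / total_variance)
print(f"KR-20 = {kr20:.2f}")
```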
random, systematic
_______ error reduces reliability; _________ error does not
heterogeneous
____________ tests can be expected to have lower reliability coefficients
Cohen's kappa
a popular index of agreement
Confidence interval
a range of scores that we feel confident will include the test taker's true score
homogeneous test
a test that measures one trait or characteristic
Interrater agreement
an index of how consistently the scorers rate or make decisions
interval lengthens
as the ______ __________, test-retest reliability will decline because the number of opportunities for the test takers or the testing situation to change increases over time
coefficient alpha
basically splits the test in half in every possible way
order effects
changes in test scores resulting from the order in which the tests were taken; half of the test takers receive Form A first, and the other half receive Form B first
Generalizability theory
concerns how well and under what conditions we can generalize an estimate of the reliability/precision of test scores from one test administration to another
normally
random error is assumed to be ________ distributed
Homogeneity
refers to whether the questions measure the same trait or dimension
variance of true scores
reliability reflects the proportion of the total observed variance in the test scores that is attributable to the
decreases
shortening a test ________ its reliability
Measurement error
variations in measurements due to random mistakes or inconsistencies of the person
reliability coefficient
when we are referring to the results of the statistical evaluation of reliability, the term ___________ ___________ is preferred
Intrascorer reliability
whether a particular scorer is consistent in the way they assign scores from test to test