Psychological Testing: Chapter 4
classical test theory
Also referred to as true score theory, CTT assumes that each testtaker has a true score on a test that would be obtained but for the action of measurement error.
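The true score model is conventionally written as X = T + E (observed score = true score + error). The sketch below assumes that additive form; the function names and numbers are hypothetical and only illustrate the relationship.

```python
def observed_score(true_score: float, error: float) -> float:
    """Observed score X under the additive model X = T + E."""
    return true_score + error


def measurement_error(observed: float, true_score: float) -> float:
    """Error E recovered as the gap between observed and true score."""
    return observed - true_score


# Hypothetical testtaker: true score 100, error of -3 on this occasion.
x = observed_score(100.0, -3.0)
print(x, measurement_error(x, 100.0))
```

In practice the true score is never observed directly, so the second function is a conceptual illustration rather than something that can be computed from real test data.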
Standard error of the mean
A measure of sampling error
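The card gives no formula. A common textbook form is the sample standard deviation divided by the square root of the sample size. A minimal sketch under that assumption (the function name is hypothetical):

```python
import math
import statistics


def standard_error_of_mean(scores):
    """Sample standard deviation divided by sqrt(n)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    return statistics.stdev(scores) / math.sqrt(n)


# Hypothetical sample of test scores.
print(standard_error_of_mean([10, 20]))
```

Larger samples shrink this value, which is why means from large normative samples are treated as more stable estimates.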
Subgroup norms
A normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample. What results from such segmentation are more narrowly defined subgroup norms
Standard error of the difference
A statistic used to estimate how large a difference between two scores should be before the difference is considered statistically significant
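One commonly cited way to compute this statistic combines the standard errors of measurement of the two scores. The sketch below assumes that form, sqrt(SEM1^2 + SEM2^2); the function name and inputs are illustrative.

```python
import math


def standard_error_of_difference(sem_1: float, sem_2: float) -> float:
    """Combine two standard errors of measurement into one SE of the difference."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical SEMs for two scores on the same scale.
print(standard_error_of_difference(3.0, 4.0))  # 5.0
```

A difference between two scores is usually compared against a multiple of this value before being treated as statistically significant.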
Standard error of measurement
A statistic used to estimate the extent to which an observed score deviates from a true score
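The card does not state a formula. A widely used one is SEM = SD * sqrt(1 - r), where SD is the standard deviation of test scores and r is the reliability coefficient. A minimal sketch under that assumption, with hypothetical values:

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test: SD of 15 and reliability of .84 gives an SEM of about 6.
print(standard_error_of_measurement(15.0, 0.84))
```

The more reliable the test, the smaller the SEM, and the closer an observed score is expected to lie to the true score.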
Validity
A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure.
Age norms
Also known as age-equivalent scores, age norms indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered.
Standard
As a noun, standard may be defined as that which others are compared to or evaluated against. As an adjective, standard often refers to what is usual, generally accepted, or commonly employed. The verb "to standardize" refers to making or transforming something into something that can serve as a basis of comparison or judgment.
Error variance
Because error is a variable that must be taken into account in any assessment, we often speak of error variance: the component of a test score attributable to sources other than the trait or ability measured.
Grade norms
Designed to indicate the average test performance of testtakers in a given school grade, grade norms are developed by administering the test to representative samples of children over a range of consecutive grade levels
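As a rough illustration, grade norms can be pictured as the average score per grade level. The sketch below assumes the norm is a simple mean and that the (grade, score) records are hypothetical; real norming depends on representative sampling, which this does not model.

```python
from collections import defaultdict
from statistics import mean


def grade_norms(records):
    """Average test score for each grade from (grade, score) pairs."""
    by_grade = defaultdict(list)
    for grade, score in records:
        by_grade[grade].append(score)
    return {grade: mean(scores) for grade, scores in sorted(by_grade.items())}


# Hypothetical scores from children in grades 3 and 4.
print(grade_norms([(3, 40), (3, 50), (4, 55), (4, 65)]))
```

The same grouping idea applies to age norms, with age in place of grade.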
Standard error of estimate
In regression, an estimate of the degree of error involved in predicting the value of one variable from another
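A common large-sample form of this statistic is SD_y * sqrt(1 - r^2), where SD_y is the standard deviation of the predicted variable and r is the correlation between predictor and criterion. The sketch assumes that form (without a small-sample correction) and uses hypothetical values.

```python
import math


def standard_error_of_estimate(sd_y: float, r_xy: float) -> float:
    """SD of the predicted variable scaled by sqrt(1 - r^2)."""
    if not -1.0 <= r_xy <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    return sd_y * math.sqrt(1.0 - r_xy ** 2)


# Hypothetical: SD of 10 on the criterion and a correlation of .60.
print(standard_error_of_estimate(10.0, 0.6))
```

With a perfect correlation the value is zero; with no correlation it equals the standard deviation of the predicted variable.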
cumulative scoring
Inherent in cumulative scoring is the assumption that the more the testtaker responds in a particular direction as keyed by the test manual as correct or consistent with a particular trait, the higher that testtaker is presumed to be on the targeted ability or trait.
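A minimal sketch of cumulative scoring, assuming each item has a single keyed response and every match adds one point (the response format and function name are hypothetical):

```python
def cumulative_score(responses, key):
    """Count the responses that match the keyed direction."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for given, keyed in zip(responses, key) if given == keyed)


# Hypothetical four-item true/false scale.
print(cumulative_score(["T", "F", "T", "T"], ["T", "T", "T", "F"]))  # 2
```

The higher the count, the higher the testtaker is presumed to stand on the targeted ability or trait.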
norm-referenced
One way to derive meaning from a test score is to evaluate the test score in relation to other scores on the same test. This approach to evaluation is referred to as norm-referenced.
local norms
Provide normative information with respect to the local population's performance on some test.
Assumption 2
Psychological Traits and States Can Be Quantified and Measured
Assumption 1
Psychological Traits and States Exist
Assumption 3
Test-Related Behaviour Predicts Non-Test-Related Behaviour
Assumption 7
Testing and Assessment Benefit Society
Assumption 6
Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
Assumption 4
Tests and Other Measurement Techniques Have Strengths and Weaknesses
Standardisation
The process of administering a test to a representative sample of testtakers for the purpose of establishing norms is referred to as standardisation or test standardisation.
Sampling
The process of selecting the portion of the universe deemed to be representative of the whole population is referred to as sampling. The test developer can obtain a distribution of test responses by administering the test to such a sample of the population.
Assumption 5
Various Sources of Error Are Part of the Assessment Process
norm-referenced testing and assessment
a method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker's score and comparing it to scores of a group of testtakers. In this approach, the meaning of an individual test score is understood relative to other scores on the same test. A common goal of norm-referenced tests is to yield information on a testtaker's standing or ranking relative to some comparison group of testtakers.
criterion
a standard on which a judgment or decision may be based.
States
also distinguish one person from another but are relatively less enduring (Chaplin et al., 1988).
percentile
an expression of the percentage of people whose score on a test or measure falls below a particular raw score.
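The definition above can be sketched directly: count the scores that fall below the raw score and express the count as a percentage of the group. The function name and data are hypothetical, and ties are handled strictly as "below", following the wording of the definition.

```python
def percentile_rank(raw_score, scores):
    """Percentage of scores in the group that fall below raw_score."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < raw_score)
    return 100.0 * below / len(scores)


# Hypothetical group of five scores; three of them fall below 80.
print(percentile_rank(80, [50, 60, 70, 80, 90]))  # 60.0
```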
construct
an informed, scientific concept developed or constructed to describe or explain behaviour. We can't see, hear, or touch constructs, but we can infer their existence from overt behaviour.
national norms
are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.
to norm
as well as related terms such as norming, refer to the process of deriving norms. Norming may be modified to describe a particular type of norm derivation.
trait
has been defined as "any distinguishable, relatively enduring way in which one individual varies from another" (Guilford, 1959, p. 6). The trait term that an observer applies, as well as the strength or magnitude of the trait presumed to be present, is based on observing a sample of behaviour. Samples of behaviour may be obtained in a number of ways, ranging from direct observation to the analysis of self-report statements or pencil-and-paper test answers.
Criterion-referenced testing and assessment
may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard.
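A minimal sketch of the idea, assuming the set standard is a single cut score (the names and values are hypothetical). Note that no other testtaker's score enters the decision, which is what separates this approach from norm-referenced interpretation.

```python
def meets_criterion(score: float, cut_score: float) -> bool:
    """True when the score reaches the set standard."""
    return score >= cut_score


# Hypothetical licensing test with a cut score of 70.
print(meets_criterion(72, 70), meets_criterion(65, 70))  # True False
```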
domain sampling
may refer to either (1) a sample of behaviours from all possible behaviours that could conceivably be indicative of a particular construct or (2) a sample of test items from all possible items that could conceivably be used to measure a particular construct.
overt behaviour
overt behaviour refers to an observable action or the product of an observable action, including test- or assessment-related responses.
Percentage correct
is computed from raw scores: the number of items answered correctly, multiplied by 100 and divided by the total number of items.
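The arithmetic in this definition can be sketched as follows (the function name and numbers are hypothetical):

```python
def percentage_correct(num_correct: int, num_items: int) -> float:
    """Items answered correctly, times 100, divided by total items."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return num_correct * 100 / num_items


# Hypothetical: 18 of 20 items answered correctly.
print(percentage_correct(18, 20))  # 90.0
```

Unlike a percentile, this value depends only on the testtaker's own responses, not on how anyone else performed.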
normative sample
that group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers. Whether broad or narrow in scope, members of the normative sample will all be typical with respect to some characteristic(s) of the people for whom the particular test was designed.
race norming
the controversial practice of norming on the basis of race or ethnic background.
Reliability
the criterion of reliability involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements. In theory, the perfectly reliable measuring tool consistently measures in the same way.
error
traditionally refers to something that is more than expected; it is actually a component of the measurement process. More specifically, error refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.
fixed reference group scoring system
A type of aid for providing a context for interpretation. The distribution of scores obtained on the test from one group of testtakers, referred to as the fixed reference group, is used as the basis for the calculation of test scores for future administrations of the test. Perhaps the test most familiar to college students that exemplifies the use of a fixed reference group scoring system is the SAT.
Norm
used in the scholarly literature to refer to behaviour that is usual, average, normal, standard, expected, or typical. In a psychometric context, norms are the test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores.
Criteria for a good test
would include clear instructions for administration, scoring, and interpretation. It would also seem to be a plus if a test offered economy in the time and money it took to administer, score, and interpret it. Most of all, a good test would seem to be one that measures what it purports to measure. Beyond simple logic, there are technical criteria that assessment professionals use to evaluate the quality of tests and other measurement procedures. Test users often speak of the psychometric soundness of tests, two key aspects of which are reliability and validity.
