Chapter 4: Validity and Test Development
Convergent Validity
demonstrated when a test correlates highly with other variables or tests with which it shares an overlap of constructs. Ex., two tests designed to measure different types of intelligence should, nonetheless, share enough of the general factor in intelligence to produce a hefty correlation when jointly administered to a heterogeneous sample of subjects.
Discriminant Validity
demonstrated when a test does not correlate with variables or tests from which it should differ. Ex., social interest and intelligence are theoretically unrelated, and tests of these two constructs should correlate negligibly, if at all.
Sensitivity
accurate identification of patients who have a syndrome, in this case, dementia.
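As a rough illustration (not from the chapter), sensitivity and its counterpart specificity can be computed by tabulating screening decisions against the diagnoses known from independent workups; the data and cutoff below are entirely hypothetical:

```python
# Hypothetical example: sensitivity and specificity of a screening cutoff.
# has_dementia comes from an independent, comprehensive workup (the criterion);
# test_positive is the screening decision (score at or below a chosen cutoff).
has_dementia  = [True, True, True, True, False, False, False, False, False, False]
test_positive = [True, True, True, False, False, False, False, False, True, False]

true_pos  = sum(d and t for d, t in zip(has_dementia, test_positive))
false_neg = sum(d and not t for d, t in zip(has_dementia, test_positive))
true_neg  = sum((not d) and (not t) for d, t in zip(has_dementia, test_positive))
false_pos = sum((not d) and t for d, t in zip(has_dementia, test_positive))

sensitivity = true_pos / (true_pos + false_neg)   # dementia patients correctly flagged
specificity = true_neg / (true_neg + false_pos)   # normal patients correctly cleared
print(sensitivity, specificity)                   # 0.75 and about 0.83 for these made-up data
```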
Rational Scale construction (Internal consistency)
all scale items correlate positively with each other and also with the total score for the scale
Test Utility
can be summed up by the question, "Does use of this test result in better patient outcomes or more efficient delivery of services?" Ex., we might envision an experiment in which individual psychotherapy clients were randomly assigned to two groups, one assessed with the test and one not, so that the outcomes of the two groups could be compared.
Ratio Scale
has all the characteristics of an interval scale but also possesses a conceptually meaningful zero point in which there is a total absence of the characteristic being measured.
Extra-validity concerns
include side effects and unintended consequences of testing.
Likert Scale
presents the examinee with five responses ordered on a strongly agree/strongly disagree or approve/disapprove continuum.
Testing is big business
test development is extraordinarily expensive, which means that publishers are inherently conservative about introducing new tests.
Ordinal Scale
Constitutes a form of ordering or ranking. A ranking of "1" is "more" than a ranking of "2", and so on. The "more" refers to the order of preference; however, ordinal scales fail to provide information about the relative strength of rankings.
Interval Scale
Provides information about ranking but also supplies a metric for gauging the differences between rankings. Ex., a rating scale from 1 to 100.
Validity coefficient
The correlation between test and criterion. The higher the validity coefficient, the more accurately the test predicts the criterion.
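A minimal sketch of the idea in Python, with made-up test and criterion scores (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical data: entrance-exam scores (test) and later GPA (criterion).
test      = np.array([45, 52, 58, 61, 67, 70, 74, 80], dtype=float)
criterion = np.array([2.1, 2.4, 2.3, 2.9, 3.0, 3.2, 3.1, 3.6])

# The validity coefficient is simply the Pearson correlation between the two.
validity_coefficient = np.corrcoef(test, criterion)[0, 1]
print(round(validity_coefficient, 2))
```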
Feedback from the examinees
Soliciting reactions from examinees is a useful part of test development. Ex., the inter-university entrance exam, a group test consisting of five multiple-choice subtests: General Knowledge, Figural Reasoning, Comprehension, Mathematical Reasoning, and English.
Factor Analysis
The goal of factor analysis is to identify the minimum number of determiners (factors) required to account for the intercorrelations among a battery of tests.
Validity Shrinkage
A common discovery in cross-validation research is that a test predicts the relevant criterion less accurately with the new sample of examinees than with the original tryout sample.
Factor Loading
A correlation between an individual test and a single factor. Factor loadings can vary between -1.0 and +1.0. The final outcome of a factor analysis is a table depicting the correlation of each test with each factor.
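As an illustration only, a factor analysis routine such as scikit-learn's FactorAnalysis can produce a loading-style table from a battery of scores; the simulated two-factor structure below is an assumption, and the loadings come out unrotated:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Simulate 300 examinees on 6 tests driven by 2 latent factors (assumed structure).
latent = rng.normal(size=(300, 2))
weights = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],   # tests 1-3 lean on factor 1
                    [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])  # tests 4-6 lean on factor 2
scores = latent @ weights.T + 0.3 * rng.normal(size=(300, 6))

fa = FactorAnalysis(n_components=2, random_state=0).fit(scores)
loadings = fa.components_.T            # rows = tests, columns = factors (unrotated)
print(np.round(loadings, 2))           # the table of test-factor loadings
```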
Mini-Mental State Examination
A short screening test of cognitive functioning. Consists of a number of simple questions (ex. what day is this?) and easy tasks (ex. remembering three words). The test yields a score from 0 (no items correct) to 30 (all items correct). Dementia is a general term that refers to significant cognitive decline and memory loss caused by a disease process such as Alzheimer's disease or the accumulation of small strokes. Patients are known from independent, comprehensive medical and psychological workups to meet the criteria for dementia or not.
Face Validity
A test has face validity if it looks valid to test users, examiners, and especially the examinees.
Validity
A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.
Criterion Validity
A criterion is any outcome measure against which a test is validated; the test shows criterion validity to the extent that it corresponds with or predicts that measure. Criteria must be more than just imaginative; they must also be reliable, appropriate, and free of contamination from the test itself. Ex., an accumulation of stressful life events such as divorce, job promotion, or traffic tickets.
Content Validity
Determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behaviour the test was designed to sample. Nothing more than a sampling issue.
Multitrait-multimethod matrix
Each test is administered twice to the same group of subjects and scores on all pairs of tests are correlated. This matrix is a rich source of data on reliability, convergent validity, and discriminant validity.
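A toy sketch of the layout, assuming two traits each measured by two methods with simulated scores (none of these data are from the chapter):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
anxiety, sociability = rng.normal(size=n), rng.normal(size=n)

# Two traits x two methods; each measure mixes the trait with method-specific noise.
measures = [
    anxiety     + 0.5 * rng.normal(size=n),   # anxiety, self-report
    anxiety     + 0.5 * rng.normal(size=n),   # anxiety, peer rating
    sociability + 0.5 * rng.normal(size=n),   # sociability, self-report
    sociability + 0.5 * rng.normal(size=n),   # sociability, peer rating
]

matrix = np.corrcoef(np.array(measures))
print(np.round(matrix, 2))
# Same trait, different method (rows 0-1 and rows 2-3): high r -> convergent validity.
# Different traits (e.g., row 0 with rows 2-3): near-zero r -> discriminant validity.
```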
False Positives
Some persons predicted to succeed will, in fact, fail.
Method of empirical keying
Test items are selected for a scale based entirely on how well they differentiate a criterion group from a normal sample. Ex., a depression scale could be derived from a pool of true-false personality inventory questions by retaining the items that depressed patients answer differently from normal subjects.
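A minimal sketch of empirical keying under assumed data: compare how often the criterion group and the normal sample endorse each candidate item, and keep the items with the largest gaps. The item pool, sample sizes, and 0.20 cutoff below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items = 20

# Hypothetical true/false responses (True = endorsed): rows = people, columns = items.
depressed = rng.random((80, n_items))  < np.linspace(0.3, 0.8, n_items)   # criterion group
normals   = rng.random((200, n_items)) < np.linspace(0.3, 0.4, n_items)   # normal sample

endorsement_gap = depressed.mean(axis=0) - normals.mean(axis=0)
keyed_items = np.where(np.abs(endorsement_gap) >= 0.20)[0]   # arbitrary illustrative cutoff
print(keyed_items)   # items kept purely because they separate the two groups
```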
Standard error of estimate
The margin of error to be expected in the predicted criterion score.
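The usual formula is the criterion standard deviation scaled down by the validity coefficient, SEE = SD * sqrt(1 - r^2); a quick worked sketch with made-up values:

```python
import math

# SEE = SD_criterion * sqrt(1 - r**2), where r is the validity coefficient.
sd_criterion = 0.45     # hypothetical standard deviation of the criterion (e.g., GPA)
r = 0.60                # hypothetical validity coefficient
see = sd_criterion * math.sqrt(1 - r**2)
print(round(see, 2))    # predicted criterion scores are typically off by about this much
```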
Homogeneous scale
A scale whose items all measure the same characteristic. The most commonly used method for achieving homogeneity is to correlate each potential item with the total score and select items that show high correlations with the total score.
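A brief sketch of the item-total method with simulated ratings (the data are placeholders, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical item responses (ratings 1-5): rows = examinees, columns = candidate items.
items = rng.integers(1, 6, size=(120, 10)).astype(float)

total = items.sum(axis=1)
item_total_r = np.array([np.corrcoef(items[:, j], total)[0, 1]
                         for j in range(items.shape[1])])
print(np.round(item_total_r, 2))   # retain the items with the highest correlations
```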
Specificity
has to do with accurate identification of normal patients.
Production of testing materials
testing materials must be user friendly if they are to receive wide acceptance by psychologists and educators.
Regression equation
the best-fitting straight line for estimating the criterion from the test. For current purposes, it is more important to understand the nature and function of regression equations than the computational details of how they are derived.
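A minimal sketch, assuming made-up test and criterion scores, of fitting and applying the regression equation Y' = a + bX:

```python
import numpy as np

# Hypothetical data: test scores (X) and criterion scores (Y).
x = np.array([45, 52, 58, 61, 67, 70, 74, 80], dtype=float)
y = np.array([2.1, 2.4, 2.3, 2.9, 3.0, 3.2, 3.1, 3.6])

b, a = np.polyfit(x, y, 1)          # best-fitting straight line: Y' = a + b*X
predicted = a + b * x
print(round(a, 2), round(b, 3))     # intercept and slope of the regression equation
print(np.round(predicted, 2))       # estimated criterion scores for these examinees
```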
Nominal Scale
the numbers serve only as category names. Ex., when collecting data for a demographic study, a researcher might code males as "1" and females as "2". It is simply a form of naming.
Cross Validation
the practice of using the original regression equation in a new sample to determine whether the test predicts the criterion as well as it did with the original sample.
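A rough sketch of the procedure with simulated tryout and new samples (all numbers are made up); the original equation is fit once and then reused:

```python
import numpy as np

rng = np.random.default_rng(4)

def validity(test, criterion):
    return np.corrcoef(test, criterion)[0, 1]

# Hypothetical tryout sample: fit the original regression equation here.
test_old = rng.normal(60, 10, 100)
crit_old = 0.03 * test_old + rng.normal(0, 0.4, 100)
b, a = np.polyfit(test_old, crit_old, 1)

# New sample: reuse the ORIGINAL equation, then check how well it predicts.
test_new = rng.normal(60, 10, 100)
crit_new = 0.03 * test_new + rng.normal(0, 0.4, 100)
predicted_new = a + b * test_new

print(round(validity(test_old, crit_old), 2))       # validity in the tryout sample
print(round(validity(predicted_new, crit_new), 2))  # a drop here illustrates validity shrinkage
```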
Publishing the Test
the test developer must oversee the production of the testing materials, publish a technical manual, and produce a user's manual.
Construct Validity
A construct is a theoretical, intangible quality or trait in which individuals differ; ex., leadership ability, overcontrolled hostility, depression, and intelligence. A test possesses construct validity to the extent that it actually measures the construct it purports to measure.
Guttman Scale
Produced by selecting items that fall into an ordered sequence of examinee endorsement. Ex., for depression:
( ) I occasionally feel sad
( ) I often feel sad
( ) I feel sad most of the time
( ) I always feel sad and I can't stand it
Decision Theory
The purpose of psychological testing is not measurement per se but measurement in the service of decision making.
False Negatives
Some predicted to fail would, if given the chance, succeed.
Technical manual and user's manual
Technical data about a new instrument are usually summarized in the technical manual, with appropriate references. The prospective user can find information about item analyses, scale reliabilities, and cross-validation studies. The user's manual gives instructions for administration and provides guidelines for test interpretation.
Concurrent Validity
Under criterion-related validity: the criterion measures are obtained at approximately the same time as the test scores. For example, the current psychiatric diagnosis of patients would be an appropriate criterion measure to provide validation evidence for a paper-and-pencil psychodiagnostic test. Ex., arithmetic achievement test scores could be used to predict, with reasonable accuracy, the current standing of students in a mathematics course.
Predictive Validity
Under criterion-related validity: the criterion measures are obtained in the future, usually months or years after the test scores are obtained, as with the college grades predicted from an entrance exam. Test scores are used to estimate outcomes to be measured at a later date.
Criterion-related Validity
When a test is shown to be effective in estimating an examinee's performance on some outcome measure. The test score is useful only insofar as it provides a basis for accurate prediction of the criterion. Ex., a college entrance exam that is reasonably accurate in predicting the subsequent grade point average of examinees would possess criterion-related validity.
Method of Absolute scaling
a procedure for obtaining a measure of absolute item difficulty based on results for different age groups of test takers.