Psychological Testing
Mental retardation
Idiocy (historical term): a lifelong developmental condition, believed at the time to be incurable. Classification focused heavily on language skills and distinguished three levels: those using short phrases, those using only monosyllables, and those with cries only and no speech.
Mental Illness
Dementia (historical term): usually had a more abrupt onset in adulthood and could show improvement.
Sir Francis Galton
1822-1911. Created the first battery of mental tests. Demonstrated that individual differences exist and are objectively measurable.
Wilhelm Wundt
1832-1920. Credited with founding the first psychological laboratory in 1879 in Leipzig, Germany. Offered an explanation for individual differences in mental processes.
James McKeen Cattell
1860-1944. Coined the term "mental test".
Clark Wissler
1901. A major influence on the early history of psychological testing. Tested the validity of brass instrument measures and found virtually no correlation with academic achievement, undermining them as measures of intelligence.
Rapport
A comfortable, warm atmosphere that serves to motivate examinees and elicit cooperation. Especially important with children.
Factor Loadings
The correlation between an individual test and a single factor. Can range from -1.0 to +1.0.
Motivation to Deceive
A small fraction of persons seeking benefits from rehabilitation or social agencies will consciously fake bad on personality and ability tests.
Psychometricians
Specialists in psychology or education who develop and evaluate psychological tests.
Test
A standardized procedure for sampling behavior and describing it with categories or scores.
Norm
A summary of test results for a large and representative group of subjects.
Validity
A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.
Classification
A variety of procedures that share a common purpose: assigning a person to one category rather than another. Assignment to categories is the basis for differential treatment.
Item Response Function
Also known as the item characteristic curve (ICC). A mathematical equation that describes the relation between the amount of a latent trait an individual possesses and the probability that he or she will give a designated response to a test item designed to measure that construct.
Responsibilities of Test Publishers
Adequate documentation of the test - Technical report, user's manual, marketing and distribution
Test-Retest Reliability
Administer the identical test twice to the same group of heterogeneous and representative subjects. If the test is reliable, each person's second score will be predictable from the first.
Power test
Allows enough time for a test taker to attempt all items but is constructed so that no test taker is able to obtain a perfect score
Assessment
Appraising or estimating the magnitude of one or more attributes in a person
Best Interest of the Client
Ask yourself what is best for the client; the functional implication of this guideline is that assessment should serve a constructive purpose for the individual examinee.
Creativity Test
Assess novel, original thinking and the capacity to find unusual or unexpected solutions, especially for vaguely defined problems.
Correction for Guessing
Based on established principles of probability. Gives a statistical correction for wild guesses.
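A minimal sketch of the usual correction, assuming the common textbook formula (corrected score = right - wrong / (options - 1)), which the card itself does not spell out. The function name and the choice to leave omitted items out of the wrong count are assumptions for illustration.

```python
def corrected_score(num_right: int, num_wrong: int, options_per_item: int) -> float:
    """Correction for guessing: subtract the expected number of lucky guesses.

    Assumes every wrong answer was a blind guess among equally likely options.
    Omitted items count as neither right nor wrong.
    """
    if options_per_item < 2:
        raise ValueError("each item needs at least two response options")
    return num_right - num_wrong / (options_per_item - 1)


# 40 right, 12 wrong, 4-option multiple choice: 40 - 12/3
print(corrected_score(40, 12, 4))  # 36.0
```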
Responsibilities of Test Users
Best Interest of the client, Confidentiality, Duty to Warn, Informed Consent, Obsolete Tests, Standard of Care, Responsible Report Writing, Communication of Test Results, Consideration of Individual Differences
Changing Conceptions of Mental Retardation
Binet created IQ tests in the early 1900s to help identify children in the Paris school system who were unlikely to profit from ordinary instruction. Mental retardation was separated from mental illness, leading to a newfound humanism that spurred interest in the diagnosis and remediation of mental retardation.
Rudimentary Forms of testing
China 2200 BC - Written tests for examinations
5 Uses of Tests
Classification, Diagnosis and Treatment Planning, Self Knowledge, Program Evaluation, Research
Duty to Warn
Clinicians must communicate any serious threat to the potential victim, law enforcement agencies, or both.
Speed test
Contains items of uniform and generally simple levels of difficulty; if time permitted, most subjects should be able to complete most or all of the items on such a test
3 Ways of Accumulating Validity
Content Validity, Criterion Related Validity, Construct Validity
Proper Diagnosis
Conveys information about strengths, weaknesses, etiology, and best choices for treatment/remediation
Alfred Binet
Created the first modern intelligence test in Europe in 1905.
The Kuder-Richardson Estimate of Reliability
A precursor of Cronbach's alpha, created by Kuder and Richardson (1937). Called KR-20 because it was the 20th in a lengthy series of derivations. Relevant to the special case where each item is scored 1 or 0 (e.g., yes or no).
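A minimal sketch of KR-20, assuming the standard formula KR-20 = (k / (k - 1)) * (1 - sum(p*q) / total-score variance), where p is the proportion scoring 1 on an item and q = 1 - p. The formula is not written on the card; the function name and toy data are made up for illustration, and population variance is assumed.

```python
from statistics import pvariance


def kr20(responses):
    """responses: one row per examinee, each row a list of 0/1 item scores."""
    n_people = len(responses)
    k = len(responses[0])
    # p_i = proportion of examinees scoring 1 on item i
    p = [sum(row[i] for row in responses) / n_people for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # Population variance of the total scores
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical data: 4 examinees, 3 dichotomous items
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))  # 0.75
```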
Desirable Procedures of Test Administrations
Examiners must be intimately familiar with the materials and directions before administration begins. This involves extensive rehearsals and anticipation of unusual circumstances and the appropriate response.
The Brass Era of Testing
Experimental psychology in 1800s Great Britain and Europe mistook simple sensory processes for intelligence; by that standard, someone like Stephen Hawking would have been labeled mentally simple.
Speech Impairment
Failed comprehension by the examiner may lower credit received
Desirable Procedures of Group Testing
Follow standardized procedure and never offer supplementary advice about guessing. Deviations from the instruction manual are unacceptable.
Test Homogeneity
If a scale measures a single construct, then its component items (or subtests) likely will be homogeneous (also referred to as internally consistent)
Sensitivity to Disability
Impairments in hearing, vision, speech, or motor control may seriously distort test results. If an examiner does not recognize the physical disability responsible for the poor test performance, a subject may be branded as intellectually or emotionally impaired when the problem is a sensory or motor disability.
Cattell
Imported brass instruments testing to the US; coined the term "mental test".
Alternate-Forms Reliability
In some cases test developers produce 2 forms of the same test, independently constructed to meet the same specifications. Both forms are given to the same group and the two sets of scores are correlated.
Sources of Errors in Group Testing
Incorrect Timing of Tests, Lack of Clarity in Directions, Noise, Failure to Explain When and If Examinees Should Guess
Individual Tests
Instruments that by their design and purpose must be administered one on one.
Sources of Measurement Error
Item Selection, Test administration, test scoring, unsystematic measurement error, systematic measurement error, measurement error and reliability
Group Tests
Largely pencil and paper measures suitable to the testing of large groups of persons at the same time.
APA Competencies
Level A: simple paper-and-pencil tests; requires minimal training. Level B: requires training in statistics and knowledge of test construction, plus some graduate training. Level C: the most complex instruments; a minimum of a master's degree is required.
Factors influencing the Soundness of Testing
Manner of Administration, Characteristics of the tester, Context of the Testing, Motivation and Experience of the Examinee, Method of Scoring
Motor Impairment
Examinees with motor impairments may be penalized by timed performance tests; examiners may wish to omit timed performance subtests or discount those scores if they are consistently lower than scores from untimed subtests.
Achievement Tests
Measure a person's degree of learning, success, or accomplishment in a subject or task.
Interest Inventories
Measure an individual's preference for certain activities or topics and thereby help determine occupational choice.
Neuropsychological Tests
Measure cognitive, sensory, perceptual, and motor performance to determine the extent, locus, and behavioral consequences of brain damage.
Aptitude tests
Measure one or more clearly defined and relatively homogeneous segments of ability. Can be Single aptitude or multiple aptitude tests.
Personality Tests
Measure the traits, qualities, or behaviors that determine a person's individuality; such tests include checklists, inventories, and projective techniques.
Examiner Sex, Experience, and Race
Most studies find that they make little, if any, difference. In isolated instances, a particular examiner characteristic might have a large effect on examinee test scores.
Behavioral Procedures
Objectively describe and count the frequency of a behavior, identifying the antecedents and consequences of the behavior.
Confidentiality
Obligation to safeguard the confidentiality of information, including test results, obtained from clients while consulting. Information can be ethically released after the client or a legal representative gives unambiguous consent, usually in written form.
Split-half reliability (internal consistency)
Obtained by correlating the pairs of scores obtained from equivalent halves of a test administered only once. If scores show a strong correlation from two halves of a single test then two whole tests should also show a strong correlation. Considered supplementary to the gold standard of test-retest.
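A minimal sketch of a split-half estimate, assuming an odd-even split of the items (one common way to form equivalent halves; the card does not say how to split). The helper names and toy data are made up for illustration. The value returned is the correlation between the two half-tests, before any correction for test length.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_r(item_scores):
    """Correlate odd-numbered-item totals with even-numbered-item totals."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    return pearson_r(odd_half, even_half)


# Hypothetical data: 5 examinees, 4 items scored 0/1
toy_items = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(split_half_r(toy_items))  # 0.25
```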
Intelligence Tests
Originally designed to sample a broad assortment of skills to estimate the individual's general intellectual level. In modern usage, the term refers to a test that yields an overall summary score based on results from a heterogeneous sample of items.
User Validity
Overall accuracy and effectiveness of interpretation resulting from the test output
Certification
Pass/Fail quality. Implies a minimum proficiency in some discipline or activity.
Selection
Pass/Fail quality. Similar to certification in that it confers privileges (e.g., being selected for admission to a university).
Test Anxiety
Phenomenological, physiological, and behavioral responses that accompany concern about possible failure on a test. Time pressures can exacerbate the degree of a personal threat, causing significant reductions in the performance of test-anxious persons.
Variant forms of Classification
Placement, Screening, Certification, Selection
Interscorer Reliability
Projective tests leave judgement to the examiner in the assignment of scores. A sample of tests is independently scored by 2 or more examiners and scores for pairs of examiners are then correlated. Interscorer Reliability supplements other reliability estimates but does not replace them
Coefficient Alpha
Proposed by Cronbach (1951). Thought of as the mean of all possible split-half coefficients, corrected by the Spearman-Brown formula.
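A minimal sketch of coefficient alpha, assuming the usual computational formula alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance). The card describes alpha only as the mean of the corrected split-half coefficients, so the formula, function name, and toy data here are assumptions for illustration.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """scores: one row per examinee, each row a list of numeric item scores."""
    k = len(scores[0])
    # Variance of each item across examinees
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total (summed) scores
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 examinees, 3 items rated 1 to 5
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(coefficient_alpha(ratings), 4))  # 0.9444
```

With items scored only 0 or 1, each item variance equals p*q, so this calculation reduces to KR-20.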
Self Knowledge
Psychological tests can provide self knowledge; the feedback a person receives can alter career or life course.
Screening
Quick and simple tests to identify persons who might have special characteristics or needs. Follow up testing is often advised to avoid misclassifications.
Influence of the Examiner
Rapport; Examiner Sex, Experience, and Race
Indications of Possible Hearing Loss
Signs: lack of response to sound, inattentiveness, difficulty in following oral instructions, intent observation of the speaker's lips, poor articulation. Response: refer for an audiological exam; if a hearing problem is confirmed, use a specialized test.
Norm Group
Referred to as the standardization sample.
Competence of Test Purchaser
Restricted Access, APA Competencies
Spearman-Brown Formula
Since the Pearson's r in a split half reliability only has half the data to work with, it underestimates the reliability of the full instrument. This formula estimates the reliability of the full test.
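A minimal sketch of the correction, assuming the general Spearman-Brown form r_new = n*r / (1 + (n - 1)*r), where n is the factor by which the test is lengthened (n = 2 for a split-half correlation). The function name and parameter names are made up for illustration.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Estimate reliability of a test lengthened by length_factor.

    With the default factor of 2, this steps a half-test correlation
    up to the estimated reliability of the full test.
    """
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)


# A half-test correlation of .60 implies a full-test reliability of about .75
print(round(spearman_brown(0.60), 2))  # 0.75
```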
Program Evaluation
Tests can be used to evaluate whether social and educational programs achieve their aims. Social programs are designed to provide services that improve social conditions and community life.
Indications of a Visual Impairment
Squinting or blinking excessively, losing their place when reading, holding books too close, headaches or nausea after reading
Obsolete Tests and the Standard of Care
The standard of care is that which is "usual, customary or reasonable". Be wary of using obsolete tests.
Tests defining features
Standardized Procedure, Behavior Sample, Scores or Categories, Norms or Standards, Prediction of Nontest Behavior
Standardized Procedure
A test is standardized when the procedures for administering it are uniform from one examiner and setting to another.
Research
Tests play a major role in both the applied and theoretical branches of behavioral research.
Classical Test Theory
The idea that test scores result from the influence of two kinds of factors. (1) Factors that contribute to consistency: these consist entirely of the stable attributes of the individual, which the examiner is trying to measure. (2) Factors that contribute to inconsistency: these include characteristics of the individual, test, or situation that have nothing to do with the attribute being measured but that nonetheless affect scores.
Standardized Procedures in Test Administration
The interpretation of a psychological test is most reliable when the measurements are obtained under the standardized conditions outlined in the publisher's test manual. Nonstandard testing procedures can alter the meaning of test results rendering them invalid and misleading.
Criterion-Referenced Test
The objective is to determine where the examinee stands with respect to very tightly defined educational objectives.
Norm-Referenced Test
The performance of each examinee is interpreted in reference to a relevant standardization sample.
The reliability coefficient
The ratio of true score variance to the total variance of test scores: Variance of the true score divided by the variance of the true score plus the variance of error
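A minimal sketch of this ratio, built on the classical model X = T + e. The true scores and errors below are made-up numbers chosen so that error is uncorrelated with true score; in real testing neither is observed directly, so this only illustrates the definition. Function and variable names are assumptions.

```python
from statistics import pvariance


def reliability_coefficient(true_scores, errors):
    """Reliability = var(T) / (var(T) + var(e))."""
    var_true = pvariance(true_scores)
    var_error = pvariance(errors)
    return var_true / (var_true + var_error)


# Hypothetical true scores and error components (uncorrelated by design)
T = [10.0, 10.0, 20.0, 20.0]
e = [-1.0, 1.0, -1.0, 1.0]
X = [t + err for t, err in zip(T, e)]  # observed scores: 9, 11, 19, 21

print(pvariance(T), pvariance(e), pvariance(X))  # 25.0 1.0 26.0
print(round(reliability_coefficient(T, e), 4))   # 0.9615
```

Because the errors are uncorrelated with the true scores here, the observed-score variance (26) equals the true variance (25) plus the error variance (1).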
Psychometrics
The science of measuring mental capacities and processes
Placement
The sorting of persons into different programs appropriate to their needs or skills (e.g., AP math).
Background and Motivation of the Examinee
The test results may be inaccurate because of the filtering and distorting effects of certain examinee characteristics such as anxiety, malingering, coaching, or cultural background.
Standardization Sample
This group must be representative of the population for whom the test is intended. Selection and testing of this sample is crucial to the usefulness of a test.
Factor analysis
A technique used to identify the minimum number of determiners (factors) required to account for the intercorrelations among a battery of tests.
Special Circumstances in the Estimation of Reliability
Unstable characteristics (galvanic skin response), Speed and power tests, Restriction of range, criterion-referenced tests
Responsible Report Writing
Use simple, direct writing that is helpful to the client. Never suggest that the client undergo specific medical procedures; only refer the client for consultation.
Diagnosis
Usually a precursor to treatment. Consists of 2 intertwined tasks: Determining the nature and source of a person's abnormal behavior; Classifying the behavior pattern within an accepted diagnostic system
Informed Consent
Very important! "Informed consent implies that the test takers or representatives are made aware, in language that they can understand, of the reasons for testing, the type of tests to be used, the intended use and the range of material consequences of the intended use. If written, video, or audio records are made of the testing sessions, or other records are kept, test takers are entitled to know what testing information will be released and to whom." (AERA et al., 1999)
Communication of Test Results
When providing feedback it is the responsibility of the clinician to check for adverse reactions
Measurement Error
X = T + e, where X is the obtained score, T is the true score, and e is the error of measurement.
Measurement of Error
X = T + e, where X is the observed score, T is the true score, and e is a positive or negative error component.
Brass Instruments Era of Testing
Experimental psychology flourished in 1800s Great Britain and Europe. Experimental psychologists mistook simple sensory processes for intelligence. Brass instruments were used to measure sensory thresholds and record reaction times; those measures were then related to intelligence.
Item Response Theory
Also known as latent trait theory. Beginning slowly in the 1960s, psychometricians have increasingly favored this test theory. IRT is a collection of mathematical models and statistical tools with widespread uses. The foundational elements of IRT include item response functions (IRFs), information functions, and the assumption of invariance.
Content Validity
Determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample.
The Correlation Coefficient
Expresses the degree of linear relationship between two sets of scores obtained from the same persons. Can range from -1.0 to +1.0.
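A minimal sketch of the Pearson correlation coefficient, written out by hand so the steps are visible. The scores are made up; they could stand for a first and second administration of the same test, as in a test-retest study. The function name is an assumption.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson r: covariation of x and y scaled by their separate variation."""
    if len(x) != len(y):
        raise ValueError("both score lists must have the same length")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares, x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares, y
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for five examinees on two occasions
first = [1, 2, 3, 4, 5]
second = [2, 4, 5, 4, 5]
print(round(pearson_r(first, second), 4))  # 0.7746
```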
Testing of Cultural and Linguistic Diversities
Most assessment tools have been developed in Western cultures without consideration of language and cultural differences. Improper translations can invalidate tests (e.g., "out of sight, out of mind" was translated as "invisible and insane"). It is advised to use multiple methods of assessment to provide more reliable, holistic perspectives.
Rasch Model
$p(\theta) = \dfrac{1}{1 + e^{-(\theta - b)}}$
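A minimal sketch of the Rasch item response function, assuming the usual reading of the symbols: θ (theta) is the examinee's level on the latent trait and b is the item's difficulty. Those interpretations and the function name are not stated on the card.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of the designated (e.g., correct) response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# When ability equals difficulty, the probability is exactly one half
print(rasch_probability(0.0, 0.0))  # 0.5

# The probability rises as ability increases relative to difficulty
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(rasch_probability(theta, b=0.0), 3))
```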