Psych testing and assessment
Public Law 93-112
"Bill of Rights" for individuals with disabilities (Rehabilitation Act of 1973). Outlawed discrimination on the basis of disability. Psychologists mandated to assess for disabilities using appropriate measures.
Culture Fair Intelligence Test (CFIT)
Tried to measure analytical and reasoning ability in abstract and novel situations in a manner as "free" of cultural bias as possible. The unique approach used in this test was nonverbal measurement of fluid intelligence.
Learning Disability Diagnosis
1975 definition: a disorder in 1 or more of the basic psychological processes involved in understanding or in using language, spoken or written, which may manifest itself in an imperfect ability to listen, speak, read, write, spell, or do mathematical calculations. Included perceptual, but not sensory, impairments. According to the 1990 Individuals with Disabilities Education Act (IDEA): look for a severe discrepancy (1 SD) between intelligence and achievement in 1 or more of 7 areas.
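The severe-discrepancy criterion can be sketched as a simple check. This is only an illustration, assuming both intelligence and achievement are reported as standard scores with mean 100 and SD 15 (so 1 SD = 15 points); the function name, area names, and example scores are hypothetical.

```python
SD = 15  # assumed standard-score SD for both tests (mean 100)

def severe_discrepancies(iq, achievement, threshold_sd=1.0):
    """Return achievement areas scoring at least threshold_sd SDs below IQ."""
    cutoff = threshold_sd * SD
    return [area for area, score in achievement.items() if iq - score >= cutoff]

# Hypothetical scores, for illustration only
scores = {"basic reading": 82, "math calculation": 98, "written expression": 90}
flagged = severe_discrepancies(iq=100, achievement=scores)
print(flagged)
```

Only the area that falls 15 or more points below the IQ score is flagged; a real evaluation would rely on the test publisher's tables rather than a raw difference.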
ACT
4 tests: English, math, reading, science. Composite score is based on the 4 tests; standard scores on a 36-point scale. Heavy emphasis on reading across all tests.
Wonderlic Test
A group intelligence test used for screening purposes. Timed 50-item test. Item format is multiple-choice or fill-in. Questions presented in ascending order of difficulty.
Denver-II
A screening instrument to assess healthy and at-risk children between birth and 6 years. The most widely used. Authored by a pediatrician, William Frankenburg, and a psychologist, Josiah Dodds. Offshoot of the Gesell Developmental Schedules. Designed to be an inexpensive instrument with a small number of tasks that is quick and easy to administer. Full administration: 20-25 minutes. Abbreviated version: 10-15 minutes, often used by pediatricians. Typical administration: 20-25 pass/fail items that are administered, observed, or reported by parent. Assessment is made in four areas: 1. Personal-social 2. Fine motor (e.g., using fingers to pick up an object, wiggling toes) 3. Language 4. Gross motor (e.g., walking, running, jumping); plus 5 behavior ratings: Typical, Compliance, Interest in surroundings, Fearfulness, Attention span. Test interpretation: 1. Normal = passes each evaluated area 2. Questionable = one delay in any evaluated area 3. Abnormal = two or more delays in any evaluated area (* Delay = not passing an item passed by 90% of children their age). Psychometrics: standardized on over 2,000 children from Colorado. Represents U.S. population except for race and maternal education. Geographically limited, but most are okay with the standardization. Excellent inter-rater reliability; test-retest reliability averages .90. Validity: the test refers appropriately for children with serious problems or gross developmental delays but is not sensitive enough to pick up more subtle problems. Sensitivity is decent; specificity is good.
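The interpretation rule (count delays, then label the result) can be sketched in a few lines. This is an illustrative sketch only: the function names and the item data are hypothetical, and the 90% threshold is taken from the definition of a delay given in the notes.

```python
def count_delays(items):
    """items: iterable of (child_passed, pct_of_same_age_children_passing)."""
    # A delay = failing an item that 90% of same-age children pass
    return sum(1 for passed, pct in items if not passed and pct >= 90)

def denver_interpretation(delays):
    """Map a number of delays to the interpretation listed in the notes."""
    if delays < 0:
        raise ValueError("delays must be non-negative")
    if delays == 0:
        return "Normal"
    if delays == 1:
        return "Questionable"
    return "Abnormal"

# Hypothetical item results: (passed?, % of same-age children who pass)
items = [(True, 95), (False, 92), (False, 60)]
result = denver_interpretation(count_delays(items))
print(result)
```

The failed item that only 60% of peers pass does not count as a delay, so this hypothetical child has one delay.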
Test Bias
A technical concept amenable to impartial analysis. Refers to objective statistical indices that examine the patterning of test scores for relevant sub-populations.
Why do postgraduate selection tests have limited predictive validity?
Because of the restriction of range problem. Applicants with low scores are unlikely to be accepted for graduate training in the first place, and thus relatively little information is available with respect to whether low scores predict poor academic performance. Correlation of scores with graduate academic performance is based mainly on persons with middle to high scores.
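A small simulation can make the restriction of range problem concrete. This is a sketch with made-up data: the sample size, the strength of the test-performance relationship, and the "admit the top half" rule are all assumptions chosen for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)
# Simulated applicants: later performance depends partly on the test score
test = [random.gauss(0, 1) for _ in range(5000)]
perf = [0.6 * t + 0.8 * random.gauss(0, 1) for t in test]

r_all = pearson_r(test, perf)

# Keep only applicants above the median test score (the "admitted" group)
cut = sorted(test)[len(test) // 2]
admitted = [(t, p) for t, p in zip(test, perf) if t > cut]
r_admitted = pearson_r([t for t, _ in admitted], [p for _, p in admitted])

print(round(r_all, 2), round(r_admitted, 2))
```

The correlation within the admitted group comes out noticeably lower than the correlation across all applicants, even though the underlying relationship is the same.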
Cattell
Believed intelligence was best defined as having an overall general ability as well as distinct abilities, particularly those that encompassed one's acquired cultural knowledge (Crystallized intelligence) as well as one's high level reasoning that can be used for problem-solving (Fluid Intelligence)
Thurstone
Believed intelligence was best defined as having multiple factors but not an overall general ability (no "g")
Gardner
Believed intelligence was best defined as having multiple factors that included a wide range of abilities, including non-academic abilities (e.g., interpersonal intelligence, musical intelligence).
Guilford
Believed intelligence was best defined as having multiple factors that could be broken down into a large number of very specific factors (Structure of Intellect model).
Sternberg
Believed intelligence was best defined as having three types of abilities: componential, experiential, contextual. All with an emphasis on practical everyday functioning.
Advantages of group testing
Can be used efficiently with a large number of test takers. Inexpensive to administer: cost-effective and time-efficient. Usually objectively scored (often by computer).
Quotas
Candidates should be selected as "politically appropriate" for the representative location, i.e., choose candidates in approximate proportion to their incidence in the local or national population, e.g., 38% White; 26% Hispanic; 22% Asian; 12% Black; 2% Others.
Qualified individualism
Candidates should be selected entirely on tested abilities without consideration for age, sex, race, etc. We are ethically bound to NOT make decisions using age, sex, race, etc.
What type of validity is most important for developmental tests?
Concurrent validity. We are looking for agreement across multiple assessment modalities, i.e., the test, parent report, pediatrician's assessment, teacher's assessment, etc.
Psychometric properties of Wonderlic Test (standardization, reliability, validity)
Concurrent validity: correlates with the WAIS as high as .90.
Wechsler
Created intelligence tests for children and adults that included the evaluation of both verbal and nonverbal skills.
Learning Disability definition
Discrepancy is between general ability (intelligence) and specific achievement
Public Law 94-142
Education for All Handicapped Children Act. Mandated that disabled school children receive appropriate assessment and educational opportunities. Psychologists mandated to assess for disabilities using appropriate measures.
What is the essence of the controversy over race differences in intelligence tests scores?
Essence of the controversy is whether test differences reflect actual group differences or a bias that favors certain groups over others
What is and how do we assess for bias in content validity?
Experts differ in opinion about whether a test is valid for a given population. We assess for it by checking whether an item or subscale of the test is relatively more difficult for members of one group than for members of another group. Can also use panels of reviewers.
nonreading & motor-reduced tests
For those who understand, but can't read, English: young children, those who can't read, those with speech or expressive-language impairments. Examples: performance subtests on most mainstream IQ tests, but these aren't good if the person can't manipulate objects.
Testing the Deaf or Hard of Hearing
Give traditional measures using sign language. If the examinee is not fluent in ASL, testing is very difficult. Caution: sign language has many variants. Should refer to a psychologist who has been immersed in deaf culture. Wechsler performance subtests are the tools of choice. WAIS-III is available in a formal ASL translation.
What is and how do we assess for predictive validity bias?
If the test does not meet the criterion of homogeneous regression, then bias has been demonstrated. Homogeneous regression: data for both groups fall on the same regression line. We assess for it by checking whether the test predicts the criterion equally well for members of different groups, e.g., prediction of academic success/failure in school. The WISC seems to predict school achievement equally well for Caucasian, African American, and Hispanic kids.
Psychometric properties of Shipley2 (standardization, reliability, validity)
Internal consistency is high at .92 for adults. Shipley-2 validity: correlates .86 with the WAIS-III.
Know the general features of the KTEA-II.
Kaufman Test of Educational Achievement-II. For ages 4 ½ to 25. Test is untimed. Core battery has 8 subtests spanning 4 areas: 1) Reading (letter & word recognition, reading comprehension) 2) Math (concepts & applications, computations) 3) Written Language (written expression, spelling) 4) Oral Language (listening comprehension, oral expression). Scores: 3 composite scores (Reading, Math, and Written Language), subtest scores, and a Total Battery Composite score. Scoring is objective and highly reliable (Mean = 100, SD = 15). Systematic method for evaluating the qualitative nature of subtest errors.
WPPSI-III Age
Lower age range (from 2 to 6 years of age)
(Predicting College performance) SAT
Mastery of high school subject matter and reasoning skills. Reasoning test (critical reading, math, writing). Subject tests: for advanced placement. Predictive validity: good, but high school record is better; both together are best!
How well do the overall IQ scores on the mainstream individual intelligence tests correlate with one another?
Most are correlated .80-.90 with one another
How is the HOME used?
Most widely used index of children's home environment. Includes home observation and parent interview; provides a measure of physical and social environments. 3 forms: infant & toddler, early childhood, middle childhood. Promising as a research tool and as a practical adjunct to intervention.
Group testing
Multiple-choice format; objective scoring; group administration; screening purposes. Huge standardization samples for tests. Group tests are easier to administer. Purposes of individualized tests are a lot more detailed.
Visual Impairments Approach
Must rely on non-visual stimuli. Most traditional intelligence tests: verbal scales are OK, performance scales are altered. Examples of tests: - Perkins-Binet: adapted from the SB; retains most of the verbal items, adapts other items to a tactile mode - Haptic Intelligence Scale for the Adult Blind: adapted from Wechsler performance scales, plus 2 new subtests (bead arithmetic & pattern board); no research on the instrument - Blind Learning Aptitude Test (BLAT) - Intelligence Test for Visually Impaired Children (ITVIC)
Cautions in use or factors that may influence test scores for Shipley2 Test
Non-native English speakers; individuals with difficulty reading; Abstraction scores are affected by motivation, working memory, and attention.
Shipley2 Test
Original purpose of the Shipley2 test was to measure intellectual deterioration. Current purpose is to measure general intellectual functioning in educational, counseling, personnel, and research settings. Typically administered in a group setting. Most people have time to complete the items they know (not a speeded test).
Environmental hypothesis
Our environment substantially influences our IQ and accounts for differences across the races. Those raised in impoverished environments (i.e., poor nutrition, less educated parents, fewer resources) develop lower IQs. Response: supported by research studies. Many environmental factors can limit development of IQ. Kaufman: plant example. Tie in motivation and value placed on testing.
Laypersons
Practical problem-solving ability; verbal ability; social competence. Example: street smart
Yerkes
Psychologist and army major appointed by the APA to develop brief group intelligence tests to assess army recruits when the US entered WWI. Army Alpha: for literate soldiers. Army Beta: for non-English-speaking and illiterate persons. Both were group IQ tests: brief screening tests to see who was intelligent enough to be a soldier. **Important because it shows there are different ways to evaluate people's skills.
Cautions in use or factors that may influence test scores for Wonderlic Test
Race differences have been acknowledged in the Wonderlic Personnel Test. Consider age and make adjustments accordingly by adding points to the 12-minute raw score: ages 15-29, add 0; 30-39, add 1; 40-49, add 2; 50-54, add 3; 55-59, add 4.
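The age adjustment amounts to a lookup on the examinee's age. A minimal sketch, using only the age ranges and points listed in the notes; the function name is hypothetical, and ages outside the listed ranges are rejected rather than guessed.

```python
# (lowest age, highest age, points added to the raw score), as listed in the notes
AGE_ADJUSTMENTS = [(15, 29, 0), (30, 39, 1), (40, 49, 2), (50, 54, 3), (55, 59, 4)]

def adjusted_score(raw_score, age):
    """Add the age-based points to a 12-minute raw score."""
    for low, high, points in AGE_ADJUSTMENTS:
        if low <= age <= high:
            return raw_score + points
    raise ValueError("age outside the ranges listed in the notes")

print(adjusted_score(20, 45))  # a 45-year-old gets 2 points added
```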
Scores obtained (types of scores, standard scores used) of Shipley2 Test
Raw scores, standard scores, and percentile ranks for each subtest. **Get a composite score. Scoring guidelines: Superior is over 130; Well Above Average is 120-129; Above Average is 110-119; Average is 90-109; Below Average is 80-89; Well Below Average is 70-79; Low is less than 70.
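The scoring guidelines map a standard score to a descriptive label, which can be sketched as a chain of cutoffs. Two assumptions are made here and flagged in the code: the Average band is taken to be 90-109 (the gap between Below Average and Above Average), and a score of exactly 130 is treated as Superior. The function name is hypothetical.

```python
def shipley2_label(standard_score):
    """Return the descriptive label for a Shipley-2 standard score."""
    if standard_score >= 130:  # notes say "over 130"; 130 itself is assumed Superior
        return "Superior"
    if standard_score >= 120:
        return "Well Above Average"
    if standard_score >= 110:
        return "Above Average"
    if standard_score >= 90:   # Average band assumed to be 90-109
        return "Average"
    if standard_score >= 80:
        return "Below Average"
    if standard_score >= 70:
        return "Well Below Average"
    return "Low"

print(shipley2_label(105), "/", shipley2_label(69))
```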
LSAT (law school)
Reading, understanding of complex materials, organization and management of information, ability to think critically and draw correct inferences. 30-minute writing sample, sent to law schools. Scored 120-180. Reliability is okay. Moderately good predictor.
Drawbacks of Group testing
Some examinees will score below their true score due to motivational problems or difficulty following directions. More dependent upon reading ability. Usually can't differentiate test takers as finely as individual tests. Invalid scores may not be recognized as such.
Genetic hypothesis
Some groups are genetically brighter than others. VERY controversial. Jensen at UC Berkeley (1969). Response: no reliable research supports this hypothesis. Studies find we are more alike than we are different, and variations in cultural and environmental factors account for many of our differences.
Wechsler Intelligence Scale for Children-IV (David Wechsler; Psychological Corporation & Harcourt Assessment Company, 2003)
Subtests (15), of which 10 are core subtests. A supplementary subtest can be substituted for a core subtest. Standard subtest administration order: 1. Block Design 2. Similarities 3. Digit Span 4. Picture Concepts 5. Coding 6. Vocabulary 7. Letter-Number Sequencing 8. Matrix Reasoning 9. Comprehension 10. Symbol Search 11. Picture Completion 12. Cancellation 13. Information 14. Arithmetic 15. Word Reasoning. Composite scores (5), with abbreviations: Verbal Comprehension Index (VCI); Perceptual Reasoning Index (PRI); Working Memory Index (WMI); Processing Speed Index (PSI); Full Scale IQ (FSIQ; sum of 10 core subtests). Mean = 100; SD = 15. The 10 core subtests are divided among the 4 indices. Standardization sample: 2200 cases (200 in each of 11 age groups from 6 ½ through 16 ½ years); 100 males and 100 females in each age group; race/ethnicity (White, African American, Hispanic, Asian, and others) matched to March 2000 census data within each age group; stratified by parent education; standardization testing sites in the West, Midwest, Northeast, and South. Psychometrics for WISC-IV: Reliability is strong and comparable to previous editions of the test. Split-half and test-retest reliability in the .90s; very high for FSIQ and Index scores. Subtests range from .79 (Cancellation and Symbol Search) to .90 (Letter-Number Sequencing), most in the high .80s. Test-retest coefficients tend to be slightly lower. Validity: MANY studies support all 3 types of validity. IQs average 3 points lower on the WISC-IV than on the WISC-III (typical of new tests).
What are the 3 main hypotheses that attempt to explain group differences in intelligence test scores?
Test bias hypothesis Genetic hypothesis Environmental hypothesis
Test bias hypothesis
Tests are biased against minorities. Middle-class Caucasians have more of the experiences that are measured on intelligence tests. Many tests normed on white, middle class individuals. Response: Most mainstream tests show little to no evidence of bias. Attempts to develop culture-free tests have been unsuccessful
Why do scores on infant tests have low correlations with other tests of intelligence?
The types of tasks on these tests are different: primarily sensorimotor vs. primarily verbal.
Vineland Adaptive Behavior Scales-II (VABS-II)
This assessment instrument has a semi-structured interview format for parents or caregivers and a questionnaire format for teachers. Purpose is to measure personal and social skills used for everyday living. Designed for special needs populations, e.g., intellectual disability, developmental delays, functional skills impairment, and speech/language impairment. 3 versions: 1. Interview Edition, Survey Form: 297 items assessing general adaptive behavior; administered to parents or caregivers. 2. Interview Edition, Expanded Form: 577 items; more comprehensive than the Survey Form. 3. Classroom Edition: 244 items assessing adaptive behavior in the classroom; administered to teachers. 5 domains: 1. Communication 2. Daily Living Skills 3. Socialization 4. Motor Skills 5. Maladaptive Behavior (optional; for ages 5 and up). Each of domains 1-4 has subdomains, and the sum of 1-4 yields the Adaptive Behavior Composite (ABC). Scales use standard scores with mean = 100, SD = 15. Item scores: 2 = Yes, usually; 1 = Sometimes or partially; 0 = No, never; N = No opportunity; DK = Don't know. Communication Domain: what the individual understands, says, reads, and writes. Example items: 1. Understands the meaning of the word "no" 2. Says at least 50 recognizable words 3. Speaks in full sentences. Daily Living Skills Domain: how the individual eats, dresses, practices personal hygiene, what household tasks they perform, and how they use time, money, telephone, and job skills. Example items: 1. Urinates in toilet or potty chair 2. Dresses self completely except for shoelaces 3. Demonstrates an understanding of money. Socialization Domain: how the individual interacts with others, how they play and use leisure time, and how they demonstrate responsibility and sensitivity to others. Example items: 1. Labels happiness, sadness, fear, and anger in self 2. Follows rules in simple game without being reminded 3. Follows school or facility rules. Motor Skills Domain: how the individual uses arms and legs for movement and coordination, and how the individual uses hands and fingers to manipulate objects. Example items: 1. Cuts paper along a line with scissors 2. Hops on one foot with ease. Maladaptive Behavior Domain: undesirable behavior which may interfere with the individual's adaptive functioning. Example items: 1. Sucks thumb or fingers 2. Is too physically aggressive 3. Runs away 4. Has temper tantrums.
Psychometrics for the VABS-II: Standardization is excellent; the test was normed on over 3,000 individuals, with multiple supplemental norm groups. High test-retest reliability (.80-.90). Good concurrent validity: correlates .47-.70 with the WISC-R.
Unqualified individualism
This position says the best qualified candidate should be selected for employment, admission, etc., based on free and open competition, using any and all valid predictor variables, e.g., race, sex, ethnic group membership, etc.
What is and how do we assess for bias in construct validity?
A test shows bias when it measures different hypothetical traits for one group versus another, or measures the same trait with different degrees of accuracy. We assess for it by factor analysis. Example: a math test composed of word problems given to elementary school children (non-English speakers need the ability to decode the language as well as understand the math).
Postgraduate selection tests (GRE) graduate record exam
Three parts: verbal, quantitative, analytical writing. New scoring in 2012: changed from M = 500, SD = 100 to M = 150 (range 130-170). Reliability: .90 alpha coefficient. Predictive validity with GPA: .22 (Q), .28 (V). Not good; lots of debate about using this for selection. Probably poor because of restricted range. Subject tests are better predictors.
Administration (subtests, timing, etc.) of Wonderlic
Timed 50-item test; 12 minutes to complete as many items as possible.
What are the three ethical positions regarding social values & test fairness?
Unqualified individualism Qualified individualism Quotas
Aptitude Test Use
Used for educational and/or career counseling and vocational placement. Measure segments of ability and are used for prediction. More specific than ability tests.
Spearman
Using factor analysis he developed the "Two factor theory" of intelligence "g" + "s" factors ( general intelligence + specific factors)
Experts
Verbal intelligence; problem-solving ability; practical intelligence. Example: book smart
MCAT
Verbal reasoning, physical sciences, biological sciences; 1 essay section (writing sample). Scores on a scale of 1-15. Correlates .6 with grades and .7 with licensing exam scores.
Wechsler Adult Intelligence Scale-IV (WAIS-IV)
Very similar to the WISC-IV. Significant revision from the WAIS-III: addition of two subtests; simplified test structure; emphasis on index scores that provide a sharper demarcation of discrete domains of cognitive functioning. The WAIS-IV abandons the familiar bifurcation of intelligence into Verbal IQ and Performance IQ. Same scores, same basic administration and scoring procedures: FSIQ, 4 composites, 15 subtests. Only 10 of the subtests, known as core subtests, are needed to obtain the traditional IQ score and component index scores. The other 5 subtests are deemed supplemental and can be used as acceptable substitutes for core subtests. The WAIS-IV is scored for four index scores, each based on 2 or 3 of the 10 core subtests, derived from factor analysis of the subtests, and based on the familiar mean of 100 and SD of 15. Minor differences in subtests. Four domains: VCI - Verbal Comprehension Index: cleaner and more direct measure of verbal comprehension than VIQ, therefore it is now preferred. PRI - Perceptual Reasoning Index: more refined measure of perceptual reasoning than PIQ. WMI - Working Memory Index: composed of subtests sensitive to attention and immediate memory. PSI - Processing Speed Index: composed of subtests that require highly speeded processing of visual information; sensitive to a wide variety of neurological and neuropsychological conditions. Standardization: 2200 adults ages 16-91 broken down into 13 age bands, 200 people in each (except the 4 oldest groups). Very representative sample. Uncooperative subjects, as well as those for whom English was a second language, were excluded. Same approach to obtaining the sample as described for the WISC-IV. Reliability: exceptional. Split-half averages .90-.96 for factors, .98 for FSIQ. Much weaker for subtest scores: Information (.90) and Vocabulary (.91) are highest; remaining subtest reliability values ranged from the low .70s to the mid .80s.
Therefore, approach subtest profile analysis very cautiously: apparent differences could be a consequence of the generally weak reliability of certain subtests rather than indicating true cognitive strengths or weaknesses. SEM is about 2.6 for 16- and 17-year-olds and 2.1 for all others. 95% of the time, an examinee's true Full Scale IQ will be within +/- 4 points (2 standard errors of measurement) of the obtained value: an 8-point band of error (this is excellent!). Validity: Content: very good; built in from the beginning through comprehensive literature review and consultation with experts to assure that items and subtests tap the relevant range of cognitive processes. Criterion-related (CRV): correlates very highly with other IQ tests. Construct: MUCH research done, highly supportive. The goodness-of-fit of the four-factor hierarchical model of intelligence turns out to be exceptionally strong, although difficult to summarize in visual form.
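The SEM and the error band follow from the standard formula SEM = SD * sqrt(1 - reliability). The sketch below plugs in the FSIQ reliability of .98 and SD of 15 from the notes; the obtained score of 100 is a hypothetical example, and the function names are illustrative.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(obtained, sem_value, n_sems=2):
    """Band of +/- n_sems standard errors around an obtained score."""
    half_width = n_sems * sem_value
    return obtained - half_width, obtained + half_width

fsiq_sem = sem(sd=15, reliability=0.98)
low, high = confidence_band(obtained=100, sem_value=fsiq_sem)  # hypothetical FSIQ of 100
print(round(fsiq_sem, 2), round(low, 1), round(high, 1))
```

With these inputs the SEM comes out at about 2.1, matching the value in the notes, and the band is roughly 4 points on either side of the obtained score.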
Administration (subtests, timing, etc.) of Shipley2 Test
Vocabulary and Abstraction, or Vocabulary and Block Patterns. Vocabulary: 40 multiple-choice items; from 4 options, choose the word closest in meaning to the target word; 10 minutes. Abstraction: 25 sequence-completion items; 12 minutes.
Test Fairness
A broad concept that recognizes the importance of social values in test usage. Subjective: reflects values about whether a test is fair in the way it is used to make decisions.
Fluid Intelligence
One's high-level reasoning that can be used for problem-solving
Achievement Tests
Measure current level of skill or functioning; measure learned information or abilities.
Slope bias
Data follow two non-parallel regression lines for different racial groups; therefore a single regression line would both over- and under-predict for both groups.
Intercept bias
Data follow two parallel regression lines for different racial groups; therefore a single regression line would over-predict for one group and under-predict for the other.
Homogenous regression
Data for both groups fall on the same regression line.
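Slope bias, intercept bias, and homogeneous regression can be told apart by fitting a separate least-squares line to each group and comparing the two. The sketch below does this with made-up data; the tolerances are arbitrary assumptions for illustration, and a real analysis would use a statistical test of the differences rather than fixed cutoffs.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def describe_bias(group_a, group_b, slope_tol=0.01, intercept_tol=0.1):
    """Compare two groups' regression lines (tolerances are illustrative)."""
    (slope_a, int_a), (slope_b, int_b) = fit_line(*group_a), fit_line(*group_b)
    if abs(slope_a - slope_b) > slope_tol:
        return "slope bias"          # non-parallel lines
    if abs(int_a - int_b) > intercept_tol:
        return "intercept bias"      # parallel but separate lines
    return "homogeneous regression"  # same line for both groups

# Made-up test scores (x) and criterion values (y), for illustration only
scores = [80, 90, 100, 110, 120]
group_a = (scores, [2.0, 2.5, 3.0, 3.5, 4.0])
group_b = (scores, [1.5, 2.0, 2.5, 3.0, 3.5])    # parallel to group_a, shifted down
group_c = (scores, [2.5, 2.75, 3.0, 3.25, 3.5])  # flatter line than group_a

print(describe_bias(group_a, group_b))
print(describe_bias(group_a, group_c))
print(describe_bias(group_a, group_a))
```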
Ability Tests
Intelligence tests. Broad range of abilities. Supposed to tap into capacity, or potential.
Scores obtained (types of scores, standard scores used) of Wonderlic
Score is calculated from the raw score. Small SEM, so the calculated confidence interval will be small too.
intellectual disability
is characterized by significant limitations both in intellectual functioning and in adaptive behavior as expressed in conceptual, social, and practical adaptive skills. This disability originates before age 18. Old DSM-IV criteria: onset prior to age 18; IQ < 70 (70-75 when considering confidence intervals); deficits in adaptive behavior. New DSM-5 criteria: emphasize adaptive behavior deficits; no specific IQ score "cut off"; deficits begin in the developmental period.
Experts and Laypersons agree on
many aspects of intelligence but experts place a greater emphasis on verbal skills and lay people on practical problem solving and social skills.
Aptitude Tests
Measure segments of ability. Still capacity, but in narrowly defined areas. Sometimes "ability" and "aptitude" are used interchangeably.
Leiter Measurement of Intelligence
Measures nonverbal intelligence and cognitive abilities; culture-reduced. Testing is performed by the child or adolescent matching small laminated cards underneath corresponding illustrations on an easel display. Test is untimed and completely nonverbal. Contains 20 subtests organized into 2 batteries: (1) Visualization and Reasoning: Reasoning consists of classification and design analogies; Visualization consists of matching, figure-ground, paper folding, and figure rotation. (2) Memory and Attention: Memory consists of memory span, spatial memory, associative memory, and delayed recognition; Attention consists of an underlining test and a measure of divided attention. * Not all subtests are administered to every child. Psychometrics: yields a composite IQ with mean 100 and SD 15. Produces subtest scaled scores with mean of 10 and SD of 3. Normed on over 2000 children and adolescents from 2 to 21 years of age. Internal consistency reliability for subtests, domain scores, and IQ scores is excellent. Has value in assessing medically fragile children, low-functioning children with autism, and children classified as language impaired.
Crystallized intelligence
One's acquired cultural knowledge
Bayley Scale of Infant Development-III (BSID-III)
provides the single most comprehensive measure of the developmental status of infants between birth and 42 months (0-3.5 years old). Core battery of 5 scales. 3 scales are administered with child interaction: cognitive, motor, language. Items are scored as pass or fail. Manual provides the age at which 50% of infants pass each item and the range of ages over which 5% to 95% pass. 2 scales are conducted with parent questionnaires: social-emotional, adaptive behavior. The Adaptive Behavior subtest assesses Communication, Community use, Functional pre-academics, Home living, Health and safety, Leisure, Self-care, Self-direction, Social, and Motor. The Social-Emotional subtest determines the mastery of early capacities of social-emotional growth through parent questionnaire, monitors healthy social and emotional functioning, monitors progress in early intervention programs, and detects deficits or problems with developmental social-emotional capacities. Each scale yields a score with Mean = 100, SD = 15, which makes it tempting to think of it as an IQ. Normed on over 1700 infants using a stratified random sample based on the 2000 census. Psychometric properties: excellent for an infant scale. Test-retest reliability .76; inter-rater reliability .75; split-half .81 to .93, increasing with age. Good concurrent validity for identifying at-risk children and extent of developmental delays.
The American Association on Intellectual and Developmental Disabilities (AAIDD)
states that there are specific skills within three areas of adaptive functioning. Conceptual Skills: language and literacy. Social Skills: interpersonal skills, social responsibility, self-esteem, social problem solving, and the ability to follow rules/obey laws and to avoid being victimized. Practical Skills: activities of daily living (personal care), occupational skills, health care, travel/transportation, schedules/routines. Instruments for assessing adaptive behavior: Vineland Adaptive Behavior Scales (VABS); Scales of Independent Behavior - Revised (SIB-R); Inventory for Client and Agency Planning (ICAP).
Differential Aptitude Test (DAT)
was originally published in 1947; the 5th edition, published in 1992, is very commonly used. Intended for students as part of a career assessment battery to determine suitability for work or choice of college major. For 7th-12th graders, later young adults. Aptitude is assessed in 8 different areas. Each subtest is timed; total test time is 3 hours. Subtests guided by these criteria: each test is independent; tests should measure power; battery should yield a profile; norms should be adequate; test materials should be practical; tests should be easy to administer; alternate forms should be available.