Psychological Testing Midterm
Representativeness Heuristics
"...judging the relationship between variables on the basis of similarity alone disregards other potentially significant factors..." Clinicians base their diagnoses on the degree to which an individual is thought to resemble those making up a diagnostic category. Three bases of comparison: Stereotypes (How similar is person X to the typical person in diagnostic category Y?); Prototypes (How similar is person X to the person showing all characteristics associated with diagnosis Y?); Exemplars (How similar is person X to those that the clinician has seen in their personal work?).
Hindsight Bias
"Ad hoc fallacy": the tendency to explain an event after it occurs, unaware that a biased prediction has occurred (the "knew it all along" effect). Fosters an unrealistic sense of confidence.
Scaling
"the process by which a measuring device is designed and calibrated, and the way numbers - scale values - are assigned to different amounts of the trait, attribute, or characteristic being measured"
Test
A measuring device or procedure in which a sample of behavior is obtained, evaluated, and scored.
Test Retest Reliability
A reliable test should yield similar scores over time. Used to gauge the consistency of scores over time for the same person (via correlation): the test is given to the same group on two occasions, and scores on the first and second administrations are correlated.
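As a rough illustration of "scores on the first and second administrations are correlated," the sketch below computes a Pearson correlation by hand. The scores and the helper name pearson_r are made up for this example; they are not taken from any real test.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people on two occasions.
time_1 = [82, 75, 90, 68, 88]
time_2 = [80, 78, 91, 70, 85]

# A value near 1 suggests scores were stable across the two occasions.
test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```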
Psychological Test Definition
A standardized procedure for sampling behavior and describing it with categories or scores. A measurement tool or technique that requires a person to perform one or more behaviors in order to make inferences about human attributes, traits, or characteristics or predict future outcomes. By inference, we mean using evidence to reach a conclusion.
Non random/systematic measurement error
The test consistently measures something other than the target. Occurs when a source of error always increases or always decreases the true score. Does not lower the reliability of a test, since the test is reliably inaccurate by the same amount each time.
Indirect Behavioral Assessment
Example: ADHD assessments include rating scales designed to be completed by parents or teachers in more than one setting.
Affective Assessment
Assesses all noncognitive features of an individual, including temperament, clinical disposition, personality, attitudes, values, and interests. Structured inventories are used for diagnostic purposes, hypothesis testing, treatment planning, and progress evaluation (e.g., Minnesota Multiphasic Personality Inventory-2 (MMPI-2), Strong Interest Inventory). Unstructured assessment involves the use of projective techniques and qualitative methods: based on psychoanalytic theory, these present the client with unstructured, ambiguous stimuli (inkblots, pictures, incomplete sentences...), allowing the client to "project" thoughts and feelings onto the stimulus and yielding insights into the client's motivation, personality, values, etc.
Late 19th century: Intelligence Tests
Alfred Binet and the Binet-Simon Scale; Lewis Terman and the Stanford-Binet Intelligence Scales; David Wechsler and the Wechsler-Bellevue Intelligence Scale and the Wechsler Adult Intelligence Scale
Psychological Test Differences
Behavior performed; construct measured and outcome predicted; content; administration and format; scoring and interpretation; psychometric quality
Bootstrap approach
A sequential combination of the rational and empirical approaches: first write items based on theory, then validate the items by administering them to samples and statistically analyzing the findings.
Clinician Bias in relation to patient characteristics
The cultural identities of the patient influence the diagnostic, therapeutic, and prognostic decisions made by clinical psychologists. E.g., organic disorder diagnoses increase with patient age. E.g., borderline personality disorder (i.e., emotional dysregulation, fear of abandonment, rejection hypersensitivity...) is diagnosed more often among female patients. E.g., increasing age along with poor health leads to less optimistic psychologist predictions regarding treatment and prognosis ("ageism" and "healthism" in everyday clinical practice).
Limitations of Psychological Tests
Decisions about people's lives should not be made on the basis of a single high-stakes test score. Tests are biased and unfair to minorities and women. Tests create anxiety and stress. Tests label and categorize. Test developers dictate what students must know or learn. "Teaching to the test" inflates scores. Multiple-choice questions punish creative thinkers and trivialize the complexities of the learning process.
Validity
Does your test actually measure what it is designed to measure? Truthfulness
Reliability
Does your test yield consistent results? Consistency
Measurement Error and Reliability
Error reduces the reliability or repeatability of psychological test results. A crucial assumption of classical theory is that unsystematic measurement errors act as random influences. Main features of classical theory: measurement errors are random; mean error of measurement = 0; true scores and errors are uncorrelated (rTE = 0); errors on different tests are uncorrelated (r12 = 0).
Correlation Coefficient [but comparing the same person]
Expresses the degree of linear relationship between two sets of scores obtained from the same person. Rxx = true score variance (T) / total variance of test scores (O), where O = T + E. If measurement error is very small (close to zero), R approaches 1. If measurement error is very large, R approaches 0.
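The ratio above can be checked numerically. The sketch below assumes, as classical theory does, that true scores and errors are uncorrelated, so observed variance is the sum of true variance and error variance. The function name and the variance values are illustrative only.

```python
def reliability(true_variance, error_variance):
    """Rxx = var(T) / var(O), with var(O) = var(T) + var(E).

    The sum is valid only under the classical assumption that
    true scores and errors are uncorrelated.
    """
    observed_variance = true_variance + error_variance
    return true_variance / observed_variance

# Hypothetical variances: the same true-score spread, different error.
print(reliability(100.0, 1.0))      # small error: close to 1
print(reliability(100.0, 10000.0))  # large error: close to 0
```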
Types of Validity
Face validity; content-related validity; construct validity; criterion-related validity
Possible Sources of Measurement error
Group 1: the test itself; Group 2: test administration; Group 3: test scoring; Group 4: test takers
Types of Tests
Group vs. individual; intelligence tests; aptitude tests; achievement tests; creativity tests; personality tests; interest inventories; behavioral procedures; neuropsychological tests
Psychological Test Similarities
Limited sample of behavior (an observable and measurable action); standardized procedure; behavior used to make inferences about some psychological construct (an underlying, unobservable personal attribute, trait, or characteristic of an individual that is thought to be important in describing or understanding human behavior)
central tendency
Mean; median; mode
Neuropsychological Tests
Measure cognitive, sensory, perceptual and motor performance to determine the extent, locus, and behavioral consequences of brain damage
Response Bias from Examinees
Motivation to fake good or fake bad (malingering). Addressed by integrating well-developed validity scales, reviewing both self- and other-report data, and using objective mental-status examination findings...
Creativity Tests
Novel, original thinking
Confidence Interval Calculations
O = X + E; 95% CI = X ± 2(SEM); 99.7% CI = X ± 3(SEM)
Behavioral Procedures
Objectively describe and count the frequency of a behavior. Identify the antecedents and consequences of the behavior.
Availability Heuristic
Pertains to the situation where the information used for prediction or decision making is that which is most easily accessed or recalled. Illusory correlation: forming test sign-symptom correlations without empirical evidence, based on clinicians' personal associations and projections rather than on data. Recall/memory availability and vividness can limit judgment accuracy. Limited by the clinician's memory capacity (e.g., only one piece of information is remembered).
Interest Inventories
Preference for certain activities or topics. Occupational/career interest.
Psychological Test assumptions
Psychological tests measure what they purport to measure or predict what they are intended to predict. An individual's behavior, and therefore test scores, will typically remain stable over time. Individuals understand test items the same way. Individuals will report accurately about themselves. Individuals will report honestly about their thoughts and feelings. The test score an individual receives is equal to his or her true score plus some error.
Measures of variability
Range; standard deviation
Misrepresentation of change
Regression to the mean: extreme observations, scores, or performances on one occasion will likely be followed by less extreme results on future occasions. E.g., "His depression scale score is lower than the one he scored two months ago. He is definitely better! The treatment definitely worked!"
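Regression to the mean can be seen in a small simulation. The sketch below is only an illustration under assumed numbers: true scores with mean 50 and SD 10, and independent random error with SD 10 on each occasion. No treatment is applied between occasions, yet the people with the most extreme first scores still score closer to the average on retest.

```python
import random

def simulate_retest(n_people=2000, seed=0):
    """Return (mean first score, mean retest score) for the top 10% scorers."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_scores = [rng.gauss(50, 10) for _ in range(n_people)]
    # Observed score = true score + random error, drawn fresh each occasion.
    first = [t + rng.gauss(0, 10) for t in true_scores]
    second = [t + rng.gauss(0, 10) for t in true_scores]
    # Select the people with the highest 10% of scores on the first occasion.
    cutoff = sorted(first)[int(0.9 * n_people)]
    extreme = [i for i in range(n_people) if first[i] >= cutoff]
    mean_first = sum(first[i] for i in extreme) / len(extreme)
    mean_second = sum(second[i] for i in extreme) / len(extreme)
    return mean_first, mean_second

mean_first, mean_second = simulate_retest()
print(round(mean_first, 1), round(mean_second, 1))
```

The drop from the first mean to the second comes entirely from random error, which is why a lower score on retest is not, by itself, evidence that a treatment worked.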
Split Half Reliability
Reliability gauged by splitting a test into two parts and comparing an individual's scores on both halves. If a test is split into two (e.g., odd vs. even questions), the two halves should yield similar scores for a given individual.
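The odd vs. even split can be sketched as follows. The response data are invented for illustration (1 = correct, 0 = incorrect), and the helper names are hypothetical; the sketch only correlates the two half scores and applies no further correction.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_scores(item_scores):
    """Return (odd-item total, even-item total) for one person."""
    odd_total = sum(item_scores[0::2])   # items 1, 3, 5, ...
    even_total = sum(item_scores[1::2])  # items 2, 4, 6, ...
    return odd_total, even_total

# Hypothetical data: one row per person, six items each.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

halves = [split_half_scores(row) for row in responses]
odd_totals = [h[0] for h in halves]
even_totals = [h[1] for h in halves]
split_half_r = pearson_r(odd_totals, even_totals)
print(round(split_half_r, 2))
```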
Group One Test Itself Error Factors
Representation of the items, wording of the items, culturally biased items, linguistically biased items, double-barreled items
early 1900s: personality tests
Robert Woodworth and the Personal Data Sheet; Hermann Rorschach and the Rorschach Inkblot Test; Henry Murray and C. D. Morgan and the Thematic Apperception Test
Standard Error of measurement
SEM = SD × √(1 − r), where r is the test's reliability coefficient
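A worked example may help connect this formula to the X ± 2(SEM) and X ± 3(SEM) bands listed under Confidence Interval Calculations. The numbers below (SD of 15, reliability of .91, score of 100) are assumed for illustration, and the function names are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    return sd * sqrt(1.0 - reliability)

def confidence_band(score, sem, multiplier=2):
    """Return (lower, upper) as score -/+ multiplier * SEM.

    multiplier=2 gives the ~95% band; multiplier=3 gives the ~99.7% band.
    """
    return score - multiplier * sem, score + multiplier * sem

# Assumed values: SD = 15, reliability = .91, score X = 100.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
low_95, high_95 = confidence_band(100.0, sem, multiplier=2)
low_997, high_997 = confidence_band(100.0, sem, multiplier=3)
print(round(sem, 2), round(low_95, 1), round(high_95, 1))
```

With these values the SEM works out to about 4.5, so the 95% band runs from roughly 91 to 109. A more reliable test (r closer to 1) shrinks the SEM and narrows the band.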
Empirical approach to test construction
Administer a large pool of items to large samples, then statistically identify the items that relate to the construct being measured.
Content Validity
Shows evidence that the test items adequately reflect the test domain, via literature review and consultation with subject matter experts. Purpose: to ensure that you comprehensively cover/capture the test domain of interest.
Criterion Related Validity
Shows that a test is able to predict the behavior that it is designed to predict (the criterion/outcome). Purpose: to show that your test scores actually lead to the predicted behavioral outcomes, either now (concurrent) or in the future (predictive).
Assessment
Systematic procedures for making inferences about characteristics of people. Broader and more comprehensive than testing.
Adjustment and Anchoring
Tendency for final judgments to be biased in the direction of initially reviewed data. Judgment is overly influenced by the first page of the clinical material reviewed. Reviewers may reach different opinions regarding the same evidence if this evidence is reviewed in different sequences.
Construct Validity
The extent to which the variables being studied represent the constructs they are purported to measure. (Are we comparing oranges to apples?) Shows evidence that the test items adequately capture the concept (or construct) that the test is designed to capture. Purpose: to show that your test correlates positively with tests of similar constructs (convergent) and only weakly, if at all, with tests of dissimilar constructs (discriminant).
Confirmation Bias
The tendency to selectively attend to information that is in line with one's viewpoint while minimizing or disregarding data that may disconfirm this position. e.g., "selective" data review
Rational Approach to test construction
Theoretical approach
Personality Tests
Traits, qualities, or behaviors that determine a person's individuality. Measured with checklists, inventories, and projective techniques.
Observed Exam Score =
True Exam Score + error
early mid 1900s vocational tests
U.S. Employment Service and the General Aptitude Test Battery
Inter Rater Reliability
Used to gauge the consistency of ratings across multiple raters (e.g., figure skating judges, diving judges). For a scoring method to be reliable, independent ratings by multiple judges should be highly similar.
Parallel Forms Reliability
Used to gauge the equivalence of two or more different versions of an assessment that measure the same concept. Different versions of the same test should yield highly similar scores for a given individual. The test developer creates two forms of the test; equivalence is assessed by correlating scores on the two parallel forms.
Internal Consistency Reliability
Used to see if responses to a set of similar items are uniform (or consistent). Measured using coefficient alpha, a way to compare individuals' scores on all possible ways of splitting the test in halves (instead of just one random split). Logic: a reliable test should contain only those questions that measure the same concept. A heterogeneous test (or its homogeneous subsets) is split in half and scores on the first half are compared with scores on the second half. Assesses how related items or groups of items are to one another; scores on both halves are correlated.
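Coefficient alpha is usually computed from item variances rather than by literally enumerating every split. The sketch below uses the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores); the formula itself is not spelled out in these notes, and the response data are invented for illustration.

```python
from statistics import variance

def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha.

    responses: list of rows, one row of item scores per person.
    """
    k = len(responses[0])                  # number of items
    item_columns = list(zip(*responses))   # regroup the scores by item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: five people answering four 5-point items.
ratings = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]

alpha = coefficient_alpha(ratings)
print(round(alpha, 2))
```

Here the items rise and fall together across people, so alpha is high. Items that measure unrelated concepts would push the summed item variances toward the total variance and pull alpha down.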
Psychometrician
a specialist in psychology or education who develops and evaluates psychological tests.
Intelligence Tests
ability in global areas
Ratio Variables
An interval scale, but with a true zero point. Examples: length, number of children, income, temperature in kelvin (Celsius and Fahrenheit have an arbitrary zero and are interval scales)...
Group Tests
Designed for administration to groups of participants simultaneously. Advantage: speed and efficiency. Limited in the types of test formats available (e.g., paper-and-pencil, computer-based). A major drawback is the inability to observe all examinees and control relevant individual factors (e.g., client motivation, mood).
Individual Tests
Often used for diagnostic decision making and generally require some interaction between the examiner and examinee. Allow the two to establish rapport and reduce anxiety. The administrator often requires special training. Can provide information about the client's presentation, affect, attitudes, verbal and nonverbal behaviors, etc.
Aptitude Tests
capability in a specific task
Diagnostic Overshadowing Bias
The client's problem receives inadequate treatment because attention is diverted to a more salient characteristic. E.g., with a gay or lesbian patient, a clinician might perceive the presenting problem as related to conflicts over sexual orientation and fail to address other critical issues. E.g., individuals with AIDS were less likely to be referred to treatment for emotional symptoms (depression) than patients with other medical problems.
Heuristics
Decisional simplification strategies (mental shortcuts): representativeness; availability; adjustment and anchoring
Achievement Tests
degree of learning, success, or accomplishment in a subject or task
Behavioral Observations
Direct; indirect
Nonstandardized tests
Do not follow the specific conditions for administration, timing, and scoring that standardized tests require.
Random/Unsystematic Measurement Error (reduce reliability)
Effects are unpredictable and inconsistent; random in nature. With infinite testing, it will increase and decrease a person's score by exactly the same amount, so it cancels itself out. Lowers the reliability of a test.
Standardized Tests
Have specific conditions for administration, timing, and scoring. This ensures that no matter who the examiner or examinee is, the test will be administered under strict, replicable conditions. Allow comparability of scores and interpretations across time and situations. Conform to rigorous test construction guidelines.
Speed Test
Measures how many simple items a person can complete within a certain amount of time; the score is simply the number of (correct) items completed within the time limit. Example: the Coding subtest of the WAIS-IV.
Objective Tests
Leave no doubt as to the correctness of a given answer: correct answers are predetermined and require no judgment on the part of the examiner. Examples: multiple-choice and true-false items. Help control subjective bias in scoring (interscorer reliability).
Test Administration Error
level of noise, room temperature, lighting, inconsistent way of giving instructions and/or answering questions
Cognitive Tests
Assess memory, perceptual, processing, and reasoning capacities. Intelligence tests: measure a person's ability to learn, solve problems, and understand increasingly complex or abstract information (e.g., Wechsler Adult Intelligence Scale - Fourth Edition (WAIS-IV)). Aptitude tests: predict a person's capacity to perform some skill or task in the future, such as college work (e.g., the SAT, which actually predicts only the freshman year of college). Achievement tests: measure knowledge students have acquired through instruction or training up to a certain point in their academic career.
Test Takers Error
motivation, anxiety, attention, and fatigue level
Norm Referenced Tests
Often standardized tests. Administered to a representative sample of participants (called a standardization sample) to determine average performances for various subgroups of interest (called norm groups). A client's score can then be compared to the average of the standardization sample (i.e., Average, Above Average, Below Average). Commonly used to assess intelligence, achievement, personality, cognitive functioning... The raw score is transformed into some type of standard score or percentile rank. Example: the SAT.
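The step of transforming a raw score into a standard score or percentile rank can be sketched as below. The norm-group scores are invented, the function names are hypothetical, and the percentile rank uses one common definition (percentage of the norm group scoring below the raw score); other definitions handle ties differently.

```python
from statistics import mean, pstdev

def z_score(raw, norm_scores):
    """Standard score: distance from the norm-group mean in SD units."""
    return (raw - mean(norm_scores)) / pstdev(norm_scores)

def percentile_rank(raw, norm_scores):
    """Percentage of the norm group scoring below the raw score."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100.0 * below / len(norm_scores)

# Hypothetical standardization sample of ten raw scores.
norm_group = [40, 45, 50, 50, 55, 60, 50, 45, 55, 50]

client_raw = 55
print(round(z_score(client_raw, norm_group), 2))
print(percentile_rank(client_raw, norm_group))
```

A positive z-score places the client above the norm-group average, and a negative one below it.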
Nominal Variables
Qualitative system; if numbers are used, they are arbitrary. Examples: sex, ethnicity, college major, political affiliation...
Ordinal Variables
Ranking according to characteristics, with intervals not necessarily being consistent between ranks. Examples: tallest to shortest, Olympic medalists...
Interval Variables
Ranking with equal space between units but with no true zero. Many exams and psychological tests are treated as (or at least aspire to be) interval scales...
Subjective Tests
Require the examiner to make a judgment on the quality of the response in scoring an item. Examples: essay and open-ended questions. Can elicit important and rich client information.
Test Scoring Error
scorers' levels of skills and qualifications, criteria of scoring, subjectivity
Criterion Referenced Tests
Compare a person's score to a predetermined standard or level of performance (a criterion). The result is either "passed" or "failed," based on a cutoff score. For example, on a depression screening test, a total score of 20 or above indicates the need for further evaluation. Another example: DSM-5 diagnostic checklists (e.g., three or more of six listed symptom criteria make a clinical diagnosis).
Face Validity
the degree to which a procedure, especially a psychological test or assessment, appears effective in terms of its stated aims.
Direct Behavioral Assessment
The examiner is physically present in the same environment as the client and uses a data collection procedure to assess the frequency, duration, and/or magnitude of one or more target behaviors. E.g., a school counselor may observe a 2nd-grade student referred for overactivity in the classroom using a time-on-task observation system. Has a natural control group.
Power Test
The score is an indicator of the skills or abilities possessed by the examinee, without the pressure of time limits. Items vary in difficulty; administration ceases once the examinee misses or cannot complete many items in a row (reaches the ceiling level). Example: Matrix Reasoning of the WAIS-IV.
Past Behavior Heuristic
Use of previous behavior to predict future behavioral outcomes. E.g., Patient A's past substance use is taken to mean that Patient A is an addict today.