Testing and Measurement - Test 1 - Dr. Alvarez

criterion-referenced interpretation

interpret test performance in relation to some well-defined external criterion rather than in relation to norms (e.g., classroom exams: >90% = A, <60% = F). The EPPP (Nat'l Board Exam for Psych) is a criterion-referenced exam/interpretation. Applicability: requires a well-defined body of content

Non-normal shapes of distribution

kurtosis - the "peakedness" of the distribution; leptokurtic - more peaked than the normal distribution; platykurtic - flatter than the normal distribution. Modes: bimodal (2 modes), multimodal, etc.
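
A minimal sketch of how peakedness can be put into a number, assuming the common moment-based "excess kurtosis" statistic (fourth central moment divided by the squared variance, minus 3, so a normal distribution scores about 0). The function names, the sample data, and the label "mesokurtic" for the in-between case are illustrative assumptions, not part of the card.

```python
def excess_kurtosis(scores):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (about 0 for a normal curve)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


def describe_kurtosis(scores, tolerance=1e-9):
    """Label a distribution using the sign of its excess kurtosis."""
    k = excess_kurtosis(scores)
    if k > tolerance:
        return "leptokurtic"   # more peaked than normal
    if k < -tolerance:
        return "platykurtic"   # flatter than normal
    return "mesokurtic"        # assumed label for "about as peaked as normal"


# Hypothetical data, invented for illustration.
peaked = [1, 3, 3, 3, 3, 3, 5]   # scores piled up at the center
flat = [1, 2, 3, 4, 5]           # scores spread evenly

print(describe_kurtosis(peaked))  # leptokurtic
print(describe_kurtosis(flat))    # platykurtic
```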

Major Force: Practical Applications

most developments came from a need to satisfy practical problems (e.g., Binet tried to solve a practical problem for Parisian schools)

frequency distribution

organizes the data into groups of adjacent scores

frequency histogram

plots score ranges along the x-axis and score frequencies along the y-axis
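
A rough sketch of the frequency distribution and frequency histogram cards, assuming whole-number scores and a hypothetical interval width of 10. The text histogram is turned sideways (score ranges run down the page instead of along the x-axis) only because plain text cannot draw vertical bars. All names and data are made up for illustration.

```python
from collections import Counter


def frequency_distribution(scores, width):
    """Group whole-number scores into adjacent intervals of the given width.

    Returns a dict mapping (low, high) interval bounds to the number of
    scores in that interval, ordered from the lowest interval upward.
    """
    counts = Counter((s // width) * width for s in scores)
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}


def text_histogram(dist):
    """One line per score range; the number of stars is the frequency."""
    return [f"{low:>3}-{high:<3} {'*' * n}" for (low, high), n in dist.items()]


# Hypothetical exam scores, invented for illustration.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 79, 82, 85, 88, 94]
dist = frequency_distribution(scores, width=10)
for line in text_histogram(dist):
    print(line)
```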

Educational Uses of Tests

primarily group-administered tests of ability or achievement, or tests that predict future success in academic work (e.g., SAT, ACT, LSAT, OLSAT, GRE)

Personnel/Employment Uses of Tests

primarily used in business and the military for 2 purposes: 1) select individuals most qualified for a position 2) assign individuals to different tasks to optimize the organization's overall efficiency

Tests Info: Book on Single Tests

Entire book devoted to just one test. E.g., WAIS-IV (Wechsler Adult Intelligence Scale), MMPI-2/MMPI-2-RF (Minnesota Multiphasic Personality Inventory). Strengths: outstanding, in-depth coverage. Weaknesses: available for only a few tests; typically written by people who are positive about the test, not impartial

Tests Info: Other Users

Fellow professionals in the field may be a helpful source of info. Strengths: good source for what is typically used and for peculiarities of a test. Weaknesses: not all are up-to-date

Descriptive Statistics

Help to summarize (describe) raw data to aid our understanding of the data (e.g. mean, median, mode, SD)

Wilhelm Wundt

His psychophysics lab (1879) established experimental psych and emphasized standardized conditions and precision of measurement. Trained many key figures. Theory: Structuralism - the elements of the mind and how they work together

Test development question about tests

How do we develop a good test?

Norms question about tests

How do we interpret the scores?

Validity question about tests

How do we know what the test actually is measuring?

Speed vs. Power Tests

Speed: how fast the examinee completes easy items. Power: how accurately the examinee completes questions testing the depth of their knowledge

Practical issues questions about tests

How much does it cost? (can be very expensive) How long does it take? (we want as much info as possible) Is it easily obtained? Is it available in languages other than English?

Reliability question about tests

How stable are the scores on a test?

Francis Galton

In "The Roots" Era. Darwin's cousin; interested in hereditary genius and believed in fixed IQ. Invented the bivariate distribution chart; his friend Pearson developed "r" (the correlation coefficient)

Alfred Binet

In "The Roots" Era. Worked on a practical problem for Parisian schools; Binet-Simon Scale in 1905

Charles Spearman

In "The Roots" Era. Created the 1st modern, empirically-based theory of intelligence, developed through an early form of factor analysis

Horace Mann

In the "Setting the Stage" Period Through the Boston School Committee he advocated for improvement in the way schools evaluate their students

4 Major Uses and Users of Tests

1. Clinical 2. Educational 3. Personnel/Employment 4. Research

6 Common Elements of a Test

A test is a: 1) Standardized 2) process or device 3) that yields info 4) about a sample of 5) behavior or cognitive processes 6) in a quantified manner

Mental Ability

Category of Psychological Testing. Includes cognitive functions such as vocab and memory. Historically centered on intelligence. 3 subtypes

2 Major textbooks during the Consolidation Period

Cronbach's Essentials of Psychological Testing (1949); Anastasi's Psychological Testing (1954)

Percentiles

Concept: percentage of cases in the norm group falling below a score Range: 1-99, Median: 50
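
A short sketch of the concept on this card, assuming the simplest "percentage strictly below" definition. Other conventions exist (for example, counting half of the tied scores), and published norm tables typically keep reported percentiles within 1-99. The norm group is hypothetical.

```python
def percentile_rank(score, norm_group):
    """Percentage of cases in the norm group falling below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


# Hypothetical norm group of 10 raw scores, invented for illustration.
norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

print(percentile_rank(65, norm_group))  # 50.0 -> 5 of 10 cases fall below 65
print(percentile_rank(85, norm_group))  # 90.0 -> 9 of 10 cases fall below 85
```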

Defining a variable from most general to most specific

Construct --> Measure --> Raw Data

Tests Info: Textbooks on Testing

Lists of tests in major categories but not intended to be the primary source about particular tests Strengths: good overview Weaknesses: examples/illustrative, may be outdated

Standard Scores to Know: T-Scores

M = 50, SD = 10 Widely used in personality tests (e.g. MMPI-2)

Mode (Mo)

Measure of Central Tendency the most frequent score

Median (Md)

Measure of Central Tendency the score that falls in the middle of the distribution of scores

Mean (M)

Measure of Central Tendency the statistical average of the scores of all participants in a sample: add up all the scores and divide by the # of participants Disadvantage: affected by outliers

Major Force: Rise of Clinical Psych

Testing is closely tied to the emergence of this field As new tests developed, clinicians used them or developed tests themselves

Inferential Statistics

drawing conclusions (inferences) from the sample to the population as a whole (e.g. t-test, Chi Square, ANOVA) All of these statistics yield a p-value

Major Force: Statistical Methodology

interactive relationship with test dvpt. (a number of stat methods were invented specifically in response to dvpts. in testing)

Setting the Stage Period

1840-1880. Concern for mental illness increased and diagnostic methods began. Formal written exams - Horace Mann. Darwin's theory of evolution got people thinking about differences between species and individuals. Experimental Psych established 1879 by Wundt

The Roots Period

1880-1915. Important People: Francis "psych testing" Galton, Alfred "intelligence testing" Binet, Charles "modern intelligence theory" Spearman. Concern for reliability emerging: created tests that could be scored objectively, hoping to make education scientific and increase scorer reliability

The Flowering Period

1915-1940. Many new tests during this era: Terman revised Binet-Simon Scale --> Stanford-Binet (1916); Otis revised Stanford-Binet --> group-administered forms called Army Alpha & Beta (1918/WWI); Rorschach ink blots (1921); Wechsler-Bellevue Intelligence Scale (1939)

The Consolidation Period

1940-1965. Development and revision of tests. Application of testing in WWII, clinical practice, schools, and industry

Just Yesterday Period

1965-2000. Item response theory - using item characteristic curves, replacing classical test theory, which was big in the Consolidation Era. Legislative and judicial activism (accountability in education, civil rights movement, appropriate use of tests with disabled persons). Public criticism - against standardized tests of ability & achievement. Influence of computers on recent testing (1990s-on)

And Now Period

2000-present. Huge increase in number and diversity of tests. Influence of managed care: desire for more focused testing on diagnosis --> treatment --> outcomes. Emergence of evidence-based practice. Online administration/reporting of tests and dvpt. of computer programs to simulate human judgement in the analysis of responses

Tests Info: Electronic Listings

4 Major Electronic Sources: 1) Educational Testing Service 2) Mental Measurements Yearbook 3) Health and Psychosocial Instruments (HaPI) 4) PsycTESTS (http://library.tulane.edu) Strengths: Quick access, convenient Weaknesses: some provide only basic info, some entries are outdated

Interval

Type of scale Equal intervals on scale (e.g. Fahrenheit thermometer, z-scores)

Nominal

Type of scale classifies, assigns numerals (e.g. Student ID #)

Ordinal

Type of scale places in order of magnitude (e.g. College football rankings, percentiles, Likert Scale)

Ratio

Type of scale true zero point (e.g. True zero point on the Kelvin scale, height, weight)

Remote Background Period

Up to 1840 Concern for what was common Oral examinations much more common than written exams Chinese civil service (gov't workers) exams (beginning 200 B.C.) helped with job placement

9 Major Sources of Information about Tests

1) Comprehensive Listings 2) Systematic Reviews 3) Electronic Listings 4) Special Purpose Collections 5) Book on Single Tests 6) Textbooks on Testing 7) Professional Journals 8) Test Publishers 9) Other Users

4 Assumptions of Testing

1) Humans have traits and differences in these traits (e.g. extroversion, verbal ability) are potentially important 2) We can quantify these traits on a continuum 3) traits are reasonably stable 4) measures of the traits relate to actual behavior in real-life situations

3 Research Uses for Tests

1) May serve as the operational definition of a dependent variable in a study 2) May be used to describe samples 3) Research on the tests themselves

Standard Score Characteristics

1) Mean is set at a standard value 2) SD is set at a standard value 3) Since SS's follow the Normal distribution, value of any score expresses its relative location in the distribution Note: if raw scores are not normally distributed, need an area transformation to have a normalized SS

5 Fundamental questions about tests

1) Norms - numerical summary of standardized scores 2) Reliability - consistency of the measurement 3) Validity - accuracy 4) Test Development 5) Practical Issues

3 Major types of norms

1) Percentiles 2) Standard Scores 3) Developmental Norms These 3 are systematically related to one another and conceptualized in the context of the normal curve

3 Significant Publications during the Flowering Period

1) Psychometrika 2) Educational & Psychological Measurement 3) Mental Measurements Yearbook - reviews of tests

7 Historical Periods

1) Remote Background (Up to 1840) 2) Setting the Stage (1840-1880) 3) The Roots (1880-1915) 4) The Flowering (1915-1940) 5) The Consolidation (1940-1965) 6) Just Yesterday (1965-2000) 7) And Now (2000-present)

"Standardized" Test

1) Test has norms 2) Uniform Procedures 3) group administered, machine scored, multiple choice

6 Major Forces

1) The Scientific Impulse 2) Concern for the Individual 3) Practical Applications 4) Statistical Methodology 5) Rise of Clinical Psychology 6) Computers

5 Major Categories of Tests

1. Mental Ability 2. Achievement 3. Personality 4. Interests and Attitudes 5. Neuropsychological

5 Additional Test Categories

1. Paper & Pencil vs. Performance Tests 2. Speed vs. Power Tests 3. Individual vs. Group Tests 4. Maximum vs. Typical Performance Tests 5. Norm-referenced vs. Criterion-referenced Tests

Tests Info: Test Publishers

All test publishers have a catalog of their products (e.g., Multidimensional Self-Esteem Inventory MSEI) Most have professional personnel available for consultation Strengths: most important for up-to-date info (notices about upcoming new editions) Weaknesses: obviously not unbiased - want you to spend $

Primary Types (5) of Achievement Tests

Batteries (e.g., Stanford Achievement Test) Single Subject (e.g., Graduate Record Examinations GRE, Subject Test in Psychology) Certification, Licensing (e.g., Examination for Professional Practice in Psychology EPPP) Gov't-Sponsored Programs (e.g., National Assessment of Educational Progress NAEP) - Grades 4, 8, 12 Individual Achievement Tests (e.g., Woodcock-Johnson Tests of Achievement)

Interests and Attitudes

Category for Psychological Testing Aimed at assessing one's interests and attitudes 2 types

Neuropsychological

Category for Psychological Testing. Designed to yield info about the functioning of the CNS (especially the brain). This type of comprehensive assessment overlaps with other categories of psychological testing but takes a unique perspective.

Personality

Category for Psychological Testing designed to yield information about the human personality 3 categories of tests

Achievement

Category of Psychological Testing assesses a person's level of knowledge or skill in a particular domain 5 primary types

Standard Scores (SS)

Conversion of z-scores into a new system with an arbitrarily chosen M and SD Several versions of SS exist (e.g. T-scores, SATs, IQs) Usually a convenient M & SD KNOW THE FORMULA
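
The card does not print the formula it refers to. The standard linear conversion is SS = new M + (new SD x z), where z = (X - M) / SD. A minimal sketch under that assumption follows; the raw score and the norm-group mean and SD are hypothetical, while the T-score and IQ parameters are the ones listed elsewhere in this set.

```python
def z_score(raw, mean, sd):
    """z = (X - M) / SD."""
    return (raw - mean) / sd


def standard_score(z, new_mean, new_sd):
    """SS = new M + (new SD * z): re-express a z-score in a chosen system."""
    return new_mean + new_sd * z


# Hypothetical raw score of 34 on a test whose norm group has M = 28, SD = 4.
z = z_score(34, mean=28, sd=4)

print(z)                            # 1.5
print(standard_score(z, 50, 10))    # 65.0  (T-score: M = 50, SD = 10)
print(standard_score(z, 100, 15))   # 122.5 (IQ-style score: M = 100, SD = 15)
```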

Major Force: Concern for the Individual

Differential perspective Nearly always an interest in individuals

Clinical Uses of Tests

Includes counseling, school, and neuropsychology Testing may help to do 3 things: 1) identify the nature and severity of a problem 2) provide suggestions about how to deal with the problem 3) measure progress in dealing with the problem

Subtypes (3) of Mental Ability Tests

Individually administered (e.g., Wechsler Adult Intelligence Scale WAIS); group administered (e.g., Otis-Lennon School Ability Test OLSAT); other abilities (e.g., memory: WMS-IV)

Tests Info: Special Purpose Collections

Info about tests in a narrow range of topics. E.g., Handbook of Personality Assessment, Positive Psychological Assessment. Strengths: very concentrated coverage. Weaknesses: only available in a few areas, often not updated

Construct

Most general level of defining a variable Trait or characteristic we want to measure (e.g. egalitarianism)

Standard Scores Linear vs. Nonlinear

Most will be linear transformations of raw scores. Some are derived from nonlinear transformations: when the raw scores are not normally distributed, we do an area transformation, making normalized standard scores
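
One way an area transformation might look in code, sketched under stated assumptions: each raw score is first turned into a proportion of the group at or below it (using a midpoint convention for ties, which is an assumption, chosen so the proportion never hits exactly 0 or 1), then that proportion is looked up as a z-score on the normal curve and rescaled to T-score units. `NormalDist` is from Python's standard `statistics` module; the data are invented.

```python
from statistics import NormalDist


def normalized_t_scores(raw_scores):
    """Area transformation: raw score -> proportion below -> normal z -> T-score."""
    n = len(raw_scores)
    unit_normal = NormalDist()  # mean 0, SD 1
    result = []
    for x in raw_scores:
        below = sum(1 for s in raw_scores if s < x)
        tied = sum(1 for s in raw_scores if s == x)
        proportion = (below + 0.5 * tied) / n   # midpoint convention (assumption)
        z = unit_normal.inv_cdf(proportion)     # z-score with that area below it
        result.append(50 + 10 * z)              # T-score: M = 50, SD = 10
    return result


# Hypothetical, strongly skewed raw scores: one extreme value.
raw = [1, 2, 3, 4, 100]
t_scores = normalized_t_scores(raw)
print([round(t, 1) for t in t_scores])
```

Note that the extreme raw score of 100 ends up no farther above the middle than the score of 1 is below it: the transformation keeps only rank order and reshapes the distances to fit the normal curve.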

Paper & Pencil vs. Performance Tests

Paper & Pencil: multiple choice or true/false questions answered using pencil and paper. Performance: examinee completes an action such as assembling a product, delivering a speech, conducting an experiment, or leading a group

Raw Data

Numbers resulting from application of the measures - most specific level of a variable. Do statistics to this to find out info about the construct

Categories (3) of Personality Tests

Objective - objectively scored (e.g., Minnesota Multiphasic Personality Inventory MMPI). Projective - the examiner interprets the examinee's responses to ambiguous, unstructured stimuli/tasks as potential indicators of personality (e.g., Rorschach Inkblot Test). Other Approaches - all other techniques devised to assess personality (e.g., clinical interviews)

Measure

Operational definition, often a test

Major Force: Computers

Originally: just statistical processing Phase 2: computers prepared reports of test scores Phase 3: Test administration in 3 ways 1) Computer-based test admin: puts test questions in a text file 2) computer-adaptive testing: next test item is based on previous responses 3) automated scoring: program simulates human judgement in scoring essays - NOT YET
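
A deliberately simplified sketch of the idea behind computer-adaptive testing ("next test item is based on previous responses"). It uses a hypothetical up/down rule on a 1-10 difficulty scale; operational adaptive tests generally select items with item response theory rather than a fixed step, so treat this only as an illustration of the principle.

```python
def next_difficulty(current, was_correct, lowest=1, highest=10):
    """Move one difficulty level up after a correct answer, one down after a miss."""
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))


def difficulties_presented(responses, start=5):
    """Difficulty of each item shown, given the examinee's right/wrong responses."""
    presented = [start]
    for was_correct in responses[:-1]:  # the last response selects no further item
        presented.append(next_difficulty(presented[-1], was_correct))
    return presented


# Hypothetical examinee: right, right, wrong, right.
print(difficulties_presented([True, True, False, True]))  # [5, 6, 7, 6]
```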

Tests Info: Professional Journals

Some journals are devoted exclusively to testing and measurement topics: Psychometrika; Educational & Psychological Measurement (both from the Flowering Era). Journal of Applied Psych often features testing topics. Strengths: provide the latest technical research developments in testing methodology. Weaknesses: limited for considering the ordinary use of a test - not practical info on a test

Percentiles Strengths and Weaknesses

Strengths: simple concept, easy to explain, easy to calculate. Weaknesses: 1) confusion with %-correct scores 2) they represent an ordinal scale of measurement, so no absolute differences between examinees can be calculated 3) inequality of units throughout the scale: it's a non-linear transformation, so percentile units are bunched up in the middle of the distribution and spread out at the extremes
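
The "inequality of units" weakness can be checked numerically. This sketch assumes normally distributed scores and compares the same half-SD step taken in the middle of the curve and out in the tail, using `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist


def percentile_from_z(z):
    """Percentage of a normal distribution falling below the given z-score."""
    return 100 * NormalDist().cdf(z)


# The same 0.5 SD difference, in two places on the curve.
middle_gap = percentile_from_z(0.5) - percentile_from_z(0.0)
tail_gap = percentile_from_z(2.5) - percentile_from_z(2.0)

print(round(middle_gap, 1))  # about 19 percentile points near the middle
print(round(tail_gap, 1))    # under 2 percentile points out in the tail
```

An equal difference in standing therefore shows up as a large percentile gap near the median and a tiny one at the extremes, which is why percentile differences cannot be treated as equal units.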

Tests Info: Comprehensive Listings

Summary: Breadth but not depth. 1) Tests in Print (TIP) - all English commercially published tests 2) Tests - comprehensive reference for tests 3) Directory of Unpublished Experimental Mental Measures - excludes tests from commercial publishers. Strengths: very thorough. Weaknesses: just basic info about each test; you would have to order and buy a test to use it (just a catalog)

Tests Info: Systematic Reviews

Summary: Depth but not breadth. Mental Measurements Yearbook (MMY) - new copy every 3-4 years and covers each test in depth. Started in Flowering Period. Test Critiques - similar but not as many tests as MMY. Strengths: critical evaluations of tests. Weaknesses: not available for all tests (or most recent editions), only opinion

Major Force: Scientific Impulse

The need to measure scientifically

z-score

The score resulting from subtracting the mean (M) from a raw score (X), then dividing by the standard deviation (SD) It has M = 0 and SD = 1 They are used to map out the normal curve in terms of areas under the curve
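
A small check of the two claims on this card, using hypothetical data: converting every score in a group to a z-score yields M = 0 and SD = 1, and a z-score maps to an area under the normal curve. The SD here divides by N (`pstdev`), which is an assumption about the convention in use.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical raw scores, invented for illustration (M = 5, SD = 2).
raw = [2, 4, 4, 4, 5, 5, 7, 9]
m, sd = mean(raw), pstdev(raw)

# z = (X - M) / SD for every score in the group.
z_scores = [(x - m) / sd for x in raw]
print(z_scores)                          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z_scores), pstdev(z_scores))  # 0.0 1.0

# Areas under the normal curve in terms of z.
unit_normal = NormalDist()               # M = 0, SD = 1
area_below_mean = unit_normal.cdf(0)                         # half the curve
area_within_1sd = unit_normal.cdf(1) - unit_normal.cdf(-1)   # about 68%
print(area_below_mean, round(area_within_1sd, 4))
```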

Standardized IQ Scores

These scores fall on a normal curve with M=100 and SD=15

Types (2) of Interests and Attitudes Tests

Vocational Interests - help individuals explore careers relevant to their interests (e.g., Strong Interest Inventory). Attitude Scales - measure attitudes toward topics, groups, and practices (e.g., attitudes toward older adults)

Central Tendency (3 kinds)

What? the center around which the raw data tend to cluster Mean (M) Median (Md) Mode (Mo)
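
A brief sketch of the three measures using Python's standard `statistics` module. The scores are hypothetical and include one outlier to show why the Mean card lists "affected by outliers" as a disadvantage.

```python
from statistics import mean, median, mode

# Hypothetical scores with one extreme value (155), invented for illustration.
scores = [70, 75, 75, 80, 85, 90, 155]

print(mean(scores))    # 90 -> pulled upward by the outlier
print(median(scores))  # 80 -> middle score, unaffected by how extreme 155 is
print(mode(scores))    # 75 -> most frequent score

# Dropping the outlier moves the mean much more than the median.
trimmed = scores[:-1]
print(round(mean(trimmed), 2), median(trimmed))  # 79.17 77.5
```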

Variable

a construct or dimension along which individuals differ/vary

Differential perspective

a general disposition to view human behavior in terms of differences between people rather than in terms of general laws that apply to everyone

Normal ("bell") curve

a unimodal distribution shape that is symmetrical around its central axis. Most naturally occurring phenomena are normally distributed (e.g. intelligence)

Individual vs. Group Tests

Individual: test administered to one person at a time. Group: test administered to many people at the same time

categorical variable

can assume only a finite # of values (e.g., had heart attack: yes or no)

Variability

how much participants' scores differ from one another

Maximum vs. Typical Performance Tests

Maximum: how well the examinees perform when at their best (achievement & ability tests). Typical: someone's typical or normal personality, attitude, interest, etc.

Norm-referenced vs. Criterion-referenced Tests

Norm-referenced: a score is interpreted based on how groups of people actually perform on the test (e.g. percentile score). Criterion-referenced: a score is interpreted in relation to some well-defined external criterion (e.g. 90% correct, 70% correct)
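
To make the contrast concrete, this sketch interprets one and the same score both ways. The 70% cutoff, the function names, and both comparison groups are hypothetical.

```python
def criterion_referenced(percent_correct, cutoff=70):
    """Compare the score with a fixed external standard (hypothetical 70% cutoff)."""
    return "meets criterion" if percent_correct >= cutoff else "below criterion"


def norm_referenced(score, norm_group):
    """Compare the score with other people: percentage of the group scoring below it."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


# Hypothetical comparison groups, invented for illustration.
strong_group = [80, 85, 88, 90, 92, 95, 97, 99]
weaker_group = [50, 55, 60, 65, 70, 75, 80, 85]

score = 85
print(criterion_referenced(score))           # meets criterion, whoever else took the test
print(norm_referenced(score, strong_group))  # 12.5
print(norm_referenced(score, weaker_group))  # 87.5
```

The criterion-referenced reading stays the same regardless of the group, while the norm-referenced reading depends entirely on who the score is compared with.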

Raw score

immediate result of an individual's responses to a test; difficult to interpret by itself, so we usually norm/standardize the scores

Range

simplest measure of variability: it displays the difference between the highest and lowest value observed on the variable Negative: can be biased due to outliers
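
A two-line illustration with hypothetical scores, showing how a single outlier inflates the range.

```python
def score_range(scores):
    """Difference between the highest and lowest observed value."""
    return max(scores) - min(scores)


# Hypothetical scores, invented for illustration.
typical = [70, 75, 75, 80, 85, 90]
with_outlier = typical + [155]

print(score_range(typical))       # 20
print(score_range(with_outlier))  # 85 -> one extreme score changes the picture
```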

Departures from normality

skewness: the degree of asymmetry between the right and left sides of the curve. Negative skew: long tail on the left. Positive skew: long tail on the right
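
A sketch of how the direction of skew can be computed, assuming the common moment-based coefficient (third central moment divided by the variance raised to the 1.5 power). The sign gives the side of the long tail. All data are hypothetical.

```python
def skewness(scores):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric distribution)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data, invented for illustration.
long_right_tail = [1, 2, 3, 4, 10]
long_left_tail = [1, 7, 8, 9, 10]
symmetric = [1, 2, 3, 4, 5]

print(round(skewness(long_right_tail), 3))  # positive -> positive skew
print(round(skewness(long_left_tail), 3))   # negative -> negative skew
print(skewness(symmetric))                  # 0.0
```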

Standard Deviation (SD)

the amount the average participant deviates from the mean of the sample. NEED TO KNOW THE FORMULA
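
The card does not print the formula. A minimal sketch, assuming the usual definition: SD is the square root of the sum of squared deviations from the mean divided by N (or by N - 1 when estimating a population value from a sample). Which denominator the course expects is not stated here, so both are shown; the data are hypothetical.

```python
import math


def standard_deviation(scores, sample=False):
    """SD = sqrt(sum((X - M)**2) / N); uses N - 1 instead when sample=True."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    denominator = n - 1 if sample else n
    return math.sqrt(sum_of_squares / denominator)


# Hypothetical scores with M = 5 and squared deviations summing to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(scores))                         # 2.0   (32 / 8 = 4)
print(round(standard_deviation(scores, sample=True), 3))  # 2.138 (32 / 7)
```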

Normed Score

the examinee's raw score is compared with the scores of individuals in the normative (standardization) group

continuous variable

theoretically can take on an infinite # of values (e.g. body weight)

