Lecture 2 : Reliability, validity, and the test standards


What is the difference between reliability & validity?

Reliability: The test measures one and only one thing (precisely). Validity: The test measures what it is supposed to measure.

5. The intended and unintended consequences of testing - example

Consider the consequences of testing, which can be unforeseen by both the test developer and the test user. EXAMPLE: Test scores (e.g., NAPLAN) were intended by the developer to identify progress and to target policy towards areas of need. ACTUAL consequences: league tables, high-SES flight from low-NAPLAN schools.

What is the Basis of Construct Validity?

BASIS: The idea of a nomological network - "the interlocking system of laws which constitute a theory" (p. 290)
- The theoretical framework about what the test should measure
- Encompasses things like:
  • Correlations with other tests and outcomes
  • Group differences
  • Structure
  • Differences over time
  • Differences under different conditions, training, interventions etc.
-> Validity = empirical tests of this theoretical framework

What is the GENERALIZABILITY THEORY?

GENERALIZABILITY THEORY (G-Theory): looks at all the different sources of error as part of the same analysis and estimates the amount of inconsistency due to each source. OR (Wikipedia): a statistical framework for conceptualising, investigating, and designing reliable observations. It is used to determine the reliability (i.e., reproducibility) of measurements under specific conditions.
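
To make "the amount of inconsistency due to each source" concrete, here is a minimal sketch of a G-study for the simplest case: every person is scored once by every rater (a fully crossed persons x raters design). The function name `g_study`, the ratings, and the choice of this design are assumptions for illustration, not something the lecture specifies. The variance components come from the usual two-way ANOVA mean squares, and negative estimates are clamped to zero.

```python
def g_study(scores):
    """Variance components for a fully crossed persons x raters design.

    scores[p][r] is the single score that rater r gave to person p.
    Assumes at least 2 persons, 2 raters, and scores that are not all equal.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Mean squares from a two-way ANOVA without replication.
    ms_person = n_r * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_rater = n_p * sum((m - grand) ** 2 for m in rater_means) / (n_r - 1)
    ss_resid = sum(
        (scores[p][r] - person_means[p] - rater_means[r] + grand) ** 2
        for p in range(n_p)
        for r in range(n_r)
    )
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are set to 0.
    var_person = max((ms_person - ms_resid) / n_r, 0.0)
    var_rater = max((ms_rater - ms_resid) / n_p, 0.0)
    var_resid = ms_resid  # person x rater interaction confounded with error

    # Relative G coefficient for the mean score across n_r raters.
    g_coef = var_person / (var_person + var_resid / n_r)
    return {
        "person": var_person,
        "rater": var_rater,
        "residual": var_resid,
        "g_coefficient": g_coef,
    }


# Hypothetical ratings: 5 persons (rows) scored by 3 raters (columns).
ratings = [
    [7, 8, 7],
    [4, 5, 5],
    [9, 9, 8],
    [3, 4, 2],
    [6, 7, 6],
]
result = g_study(ratings)
for source, value in result.items():
    print(f"{source}: {value:.3f}")
```

The person component is the "true" variance of interest. The rater and residual components are the separate sources of error that G-theory estimates within one analysis.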

2. Response Processes as Evidence for Validity - explain

If the test is intended to capture a particular process, evidence should show that the test does measure this process - Think aloud protocols - Eye-tracking - Computer models - Susceptibility to manipulations, coaching etc. (in line with theory)

What are the problems with Construct Validity?

PROBLEMS: - If a researcher tests validity and does not find the expected outcomes, is the theory mis-specified or is the test invalid? - No clear specification of HOW to test this

What are the three parts to the test standards?

Part I : Foundations Part II: Operations Part III: Testing Applications

What are the 3 tenets of professional practice for using and interpreting test scores?

Part I: Foundations 1) Validity 2) Reliability/precision and errors of measurement 3) Fairness in testing

What are the Test Standards?

Recommendations for using and interpreting test scores, developed and distributed by:
• American Psychological Association (APA)
• American Educational Research Association (AERA)
• National Council on Measurement in Education (NCME)
Editions: 1954, 1966, 1974, 1985, 1999 - new standards "available" now (2014)

Why are the test standards important?

Test standards are a framework - Represent current consensus (therefore current operational guidelines) - Alternative viewpoints, arguments, propositions - Psychometric models for evaluating validity, reliability, and bias (including generalizability theory)

What was the 1985 Standards definition of Validity?

Tripartite + outcomes (intended or unintended consequences of the test)

What was the 1966 Standards definition of Validity?

Tripartite view (dominated psychology):
- CONTENT VALIDITY
- CRITERION VALIDITY (concurrent & predictive)
- CONSTRUCT VALIDITY (discriminant & convergent)

What is validity dependent on?

Validity is dependent on: - Test purpose and use - Characteristics of the test-takers

What is Validity? (Test Standards, 2014)

"Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests"
- EVIDENCE (empirical observations)
- THEORY (meaning of observations within frameworks)
- INTERPRETATION (not scores themselves, but the meaning that test users derive from them)
- USE OF TESTS (need to know the test's purpose/outcome to evaluate validity)

Give two examples of tests being used for different purposes in different groups

- English-language reading comprehension test: a valid indicator of sixth-grade academic achievement, BUT a poor indicator of the intelligence of adult migrants from non-English-speaking countries
- MMPI (Minnesota Multiphasic Personality Inventory): validity = differences between normal vs psychiatric disorder groups, but the MMPI and its derivatives are now used for employment selection

Support for Reliability - what are three things that need consistency?

1) CONSISTENCY ACROSS ITEMS: All items measure the same thing.
- Internal consistency
- Alternate forms
- Split-half reliability
2) CONSISTENCY ACROSS TIME: The test measures the same thing every time.
- Test-retest
3) CONSISTENCY ACROSS OTHER SOURCES (e.g., raters - inter-rater reliability)
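
As a rough illustration of consistency across items, the sketch below computes two common coefficients: Cronbach's alpha (internal consistency) and a split-half correlation stepped up with the Spearman-Brown formula. The lecture names the kinds of evidence but not these particular formulas, so treat the choice of statistics, the function names, and the response data as assumptions. Test-retest and inter-rater reliability could reuse `pearson` on two administrations or two raters.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(scores):
    """scores: one row per respondent, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def split_half(scores):
    """Correlate odd- and even-numbered item halves, then apply Spearman-Brown."""
    half_a = [sum(row[0::2]) for row in scores]  # items 1, 3, ...
    half_b = [sum(row[1::2]) for row in scores]  # items 2, 4, ...
    r = pearson(half_a, half_b)
    return 2 * r / (1 + r)  # estimated reliability of the full-length test


# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
half_reliability = split_half(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
print(f"Split-half (Spearman-Brown): {half_reliability:.3f}")
```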

Support for Validity - what are the five sources of evidence?

1) EVIDENCE FROM ITEM CONTENT
2) EVIDENCE FROM PROCESS / MANIPULATIONS
3) EVIDENCE FROM INTERNAL STRUCTURE
4) EVIDENCE FROM RELATIONSHIP TO OTHER VARIABLES:
- Criterion (concurrent + predictive)
- Construct (discriminant + convergent)
5) EVIDENCE FROM CONSEQUENCES OF TEST USE

What is the over-arching standard of validity?

A rationale should be presented for each recommended interpretation and use of test scores, together with a summary of the evidence and theory bearing on the intended use or interpretation.

What was the 1999 Standards definition of Validity?

A unitary form of validity, based on evidence from multiple sources to support an argument for what the test scores actually mean:
1. Evidence from the content of the test
2. Evidence from response processes
3. Evidence from internal structure
4. Evidence from the relationship to other variables
5. Evidence regarding the consequences of testing

The 2014 Standards definition of Validity:

A unitary form of validity, based on evidence from multiple sources to support an argument for what the test scores actually mean (essentially unchanged from 1999)

What was the 1954 Standards definition of Validity?

Criterion-based view - "A test is valid for anything with which it correlates" (Guilford, 1946)

Expand upon the criterion view of Validity:

Early 1950s: Guilford, Gulliksen, Cureton
BASIS: A test is used to predict an outcome. How well it predicts this outcome is the validity of the test.
- Validity = correlation with criterion (e.g., intelligence with school grades)
- Possible to talk about a validity coefficient in an absolute sense
- Validity conceptualized as a static property of the measure: A test is either valid or not valid
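
Under the criterion view, the validity coefficient is simply the Pearson correlation between test scores and the criterion. A minimal sketch, assuming made-up intelligence scores and school grades for six students (the function name and both lists are hypothetical):

```python
from statistics import mean


def validity_coefficient(test_scores, criterion):
    """Pearson correlation between test scores and a criterion measure."""
    mt, mc = mean(test_scores), mean(criterion)
    cov = sum((t - mt) * (c - mc) for t, c in zip(test_scores, criterion))
    ss_t = sum((t - mt) ** 2 for t in test_scores)
    ss_c = sum((c - mc) ** 2 for c in criterion)
    return cov / (ss_t * ss_c) ** 0.5


# Hypothetical data: intelligence test scores and school grades.
iq_scores = [95, 110, 102, 125, 88, 117]
school_grades = [61, 74, 70, 85, 58, 77]

r = validity_coefficient(iq_scores, school_grades)
print(f"validity coefficient r = {r:.2f}")
```

The same computation covers concurrent and predictive evidence. The only difference is whether the criterion was measured at the same time as the test or some time afterwards.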

3. Internal Structure as Evidence for Validity - give an example

The number of subcomponents found empirically = the number of subcomponents theoretically expected. EXAMPLE: 6 facets of Conscientiousness from the NEO-PI-R personality model (competence, order, dutifulness, achievement striving, self-discipline, and deliberation)

What are the three key components in the Tripartite view?

The three key components in the Tripartite View:
1. Content validity: Content of the test is both relevant to the domain and representative of the domain
2. Criterion validity: Correlations with a criterion
• Concurrent: Criterion measured at the same time as the test is administered
• Predictive: Criterion measured at some time after the test is administered
3. Construct validity
• Convergent: concepts that are theoretically related demonstrate empirical relationships
• Discriminant: concepts that are theoretically unrelated show no empirical relationships
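
The convergent and discriminant definitions above can be checked with two correlations: the new test should correlate strongly with a theoretically related measure and close to zero with a theoretically unrelated one. The sketch below uses made-up scores for six people. The variable names are hypothetical, and no cut-off values are implied by the lecture.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for the same six people on three measures.
new_test = [1, 2, 3, 4, 5, 6]
related_measure = [2, 1, 4, 3, 6, 5]    # theoretically related construct
unrelated_measure = [3, 5, 1, 6, 2, 4]  # theoretically unrelated construct

r_convergent = pearson(new_test, related_measure)
r_discriminant = pearson(new_test, unrelated_measure)

print(f"convergent r   = {r_convergent:.2f}")
print(f"discriminant r = {r_discriminant:.2f}")
```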

What are the two key publications in the Tripartite view?

The two key publications in the Tripartite View:
1. Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
2. Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

What is a major problem with the "new" standards?

They are currently very difficult to get!

4. Relationship to Other Variables as Evidence of Validity - explain

• Convergent and discriminant evidence (as in tripartite model)
• Test-criterion relationships (as in tripartite model)
• Validity generalization
- Replication in different situations:
  • Different POPULATIONS (e.g., different countries, states, sectors)
  • Different CONDITIONS (e.g., proctored versus unproctored, timed versus untimed)
- Replication for different purposes:
  • Job performance vs academic achievement
  • Performance on different types of jobs

Validity as expressed in 1999/2014 Standards

• No longer conceptualise different "types" of validity
• Validity is a property of the interpretation of test scores, not the test scores themselves
• Evidence that the interpretation is valid derives from:
1. The content of the test
2. The response processes captured by the test
3. The internal structure of the test
4. The relationship of the test to other variables
  • Convergent and discriminant evidence
  • Test-criterion relationships
  • Validity generalization
5. The intended and unintended consequences of testing

1. Test Content Evidence for Validity - what are the two parts?

• Relevance • Representativeness

What were some of the issues with the criterion view of validity?

• There is not always one obvious criterion variable
- Criterion for a test of self-control? Aggression?
- Thought experiment: Other than test scores, what is a pure measure of the attribute?
• Some tests are used for different purposes, in different groups

What are some issues with the Tripartite View?

• Too much emphasis on different forms of validity
- The test can have "convergent validity" but not "predictive validity" - what does this mean?
- The distinction between convergent and concurrent is not always clear, e.g., the correlation between a vocabulary test and English language grade?
• Over-emphasis on correlations as proof
• No explicit mention of test use and consequences
