Lecture 4: Validity
What is the general process involved in testing empirical validity?
1. Create a hypothesis regarding how your measure should perform if it is valid. 2. Design and run a study to test this hypothesis.
Give two examples of criterion contamination, explaining what is contaminated in each case.
1. Suppose you have a test for schizophrenia; you might claim the test is valid because it predicts schizophrenia diagnoses. But if the diagnosis is itself based on your test, the criterion (the diagnosis) is contaminated by the test, making the validation circular. 2. The Zuckerman Sensation Seeking Scale was validated by comparing it with another test, but that criterion test contained the same items as Zuckerman's scale, so the criterion was contaminated by the shared item content.
Criterion Validity
A form of validity in which a psychological measure is able to predict some future behavior or is meaningfully related to some other measure
Criterion
A standard against which the test is evaluated. For example, actual driving speed is one of the criteria used to validate the driving speed questionnaire.
Predictive Validity
A subset of criterion validity in which the criterion is measured at a later time than the original test score. For example, IQ at age 17 predicting GPA at university.
Internal Structure As Expected
Are the items on the test heterogeneous or homogeneous, as we intended them to be? This is evaluated using factor analysis.
How could I create an exam that had great empirical validity but poor content validity?
Ask a bunch of questions that correlate with the construct but do not measure the construct itself. For example, for a PSYC3020 exam you could ask individuals how well they did in PSYC3010, how many hours they spent studying, etc., without asking anything about the course content itself.
Non-random attrition
Certain types of people dropping out of the study.
What are different ways you can go about testing the empirical validity of a test?
Criterion validity (concurrent, predictive and incremental validity), convergent validity, discriminant/divergent validity, developmental changes as expected, experimental effects as expected, internal structure as expected.
Describe the main features of the WISC IV intelligence test.
Designed for use with children aged 6-16. Ten subtests arranged into 4 groups (derived through factor analysis) that are combined to form a measure of general intellectual functioning. Good test-retest, inter-rater, and internal consistency reliabilities; SEdiff = 3.79. Predicts academic achievement, shows the expected differences between normative and special populations, and correlates with other IQ tests.
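Aside on the SEdiff figure above: the standard error of the difference between two scores is conventionally built from each score's standard error of measurement (SEM). A minimal LaTeX sketch of the usual psychometric formula (the general definition, not one stated in the lecture):

```latex
% Standard error of the difference between scores A and B,
% built from their standard errors of measurement:
SE_{\mathrm{diff}} = \sqrt{SEM_A^2 + SEM_B^2}
```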
Why is it not strictly accurate to talk about the validity of a test?
Different tests have different purposes, so the validity of a particular test varies with the context in which it is used.
Opinion-based Validity
Face validity and Content Validity
What is the difference between content and face validity?
Face validity is how valid a test seems to a layperson, while content validity is how much of the actual content of the area we are trying to measure is sampled by the measure. Face validity rests on the opinion of the layperson/test-taker, while content validity is best judged by the opinion of experts.
Measuring Content Validity
Getting experts to rate each item based on relevance and level of representation with respect to the material being tested.
Explain what factor analysis does.
Uses mathematical techniques to group individual question items according to how strongly they correlate with one another. This identifies the homogeneity/heterogeneity of the test (see the sketch below).
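A minimal Python sketch of that grouping, using simulated item responses and a hypothetical two-trait structure (sklearn's FactorAnalysis stands in here; the lecture does not name a specific implementation):

```python
# Illustration only: factor analysis groups items that correlate strongly.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people = 500

# Simulate two latent traits; items 1-3 load on trait A, items 4-6 on trait B.
trait_a = rng.normal(size=n_people)
trait_b = rng.normal(size=n_people)
noise = rng.normal(scale=0.5, size=(n_people, 6))
items = np.column_stack([trait_a] * 3 + [trait_b] * 3) + noise

fa = FactorAnalysis(n_components=2).fit(items)

# Rows = factors, columns = items; large absolute loadings on the same
# factor mark items that group together (a homogeneous item cluster).
print(np.round(fa.components_, 2))
```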
Highly Valid Measurement
Has minimal construct underrepresentation and minimal construct irrelevant variance.
Incremental Validity
How much each individual predictor adds to predicting the criterion, over and above the effect of the other predictors (see the sketch below).
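A minimal Python sketch of incremental validity as a gain in R-squared; the predictors, criterion, and effect sizes are all hypothetical:

```python
# Illustration only: incremental validity measured as the change in R^2
# when a second predictor is added to the model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 1_000
iq = rng.normal(size=n)
study_hours = rng.normal(size=n)
gpa = 0.5 * iq + 0.3 * study_hours + rng.normal(scale=0.8, size=n)

# R^2 with IQ alone, then with both predictors.
r2_base = LinearRegression().fit(iq[:, None], gpa).score(iq[:, None], gpa)
X_full = np.column_stack([iq, study_hours])
r2_full = LinearRegression().fit(X_full, gpa).score(X_full, gpa)

# The increment is what study_hours adds over and above IQ.
print(f"delta R^2 = {r2_full - r2_base:.3f}")
```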
Face Validity
How valid a test appears to be - usually from the perspective of the test-taker.
Developmental Changes As Expected
How well scores vary with age, as predicted.
Convergent Validity
How well test scores correlate with another measure of the same/similar thing.
Construct Validity
How well the scores on your test reflect the construct that your test is supposed to be measuring.
Magnitude of Validity
Is dependent upon context. For example, a new IQ test must have a validity coefficient of at least .80; other tests might not require a number that large.
Empirical Evidence Based Validity
Is measured through formulating hypotheses that are later tested. Constitutes criterion validity, convergent validity, discriminant/divergent validity, developmental changes as expected, experimental effects as expected, internal structure as expected (factor analysis).
Context
Must be validated for each different context in which the test is used.
Is validity necessary for reliability?
No, because you can measure something over and over again and get the same results, but you might not be measuring what you are trying to measure/the measure is not valid.
Give two examples of things that might restrict the range of scores in a test and indicate what influence this could have on the validity coefficient.
Non-random attrition: the validity coefficient is determined only by those who stay in the study. Self-selection: only certain types of people sign up for the study, bringing characteristics that can confound the study and distort the validity coefficient. Both are confounds that restrict the range of scores, which attenuates (lowers) the observed validity coefficient (see the simulation below).
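A minimal sketch, using simulated data, of how range restriction lowers an observed validity coefficient; the scenario and numbers are illustrative, not from the lecture:

```python
# Illustration only: simulated data showing that restricting the range
# of test scores attenuates the observed validity coefficient.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
test = rng.normal(size=n)                               # test scores
criterion = 0.6 * test + rng.normal(scale=0.8, size=n)  # criterion scores

full_r = np.corrcoef(test, criterion)[0, 1]

# Keep only the top half of test scorers, as if low scorers dropped out
# (non-random attrition) or never signed up (self-selection).
keep = test > np.median(test)
restricted_r = np.corrcoef(test[keep], criterion[keep])[0, 1]

print(f"full-range r = {full_r:.2f}, restricted r = {restricted_r:.2f}")
```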
Factors that restrict the range of scores in a validity study
Non-random attrition and self-selection.
Self-selection
Only a certain type of person makes up the sample in the first place.
Concurrent Validity
Subset of criterion validity that measures the criterion and test scores at the same time.
Experimental Effects As Expected
The extent to which scores differ before and after an experimental manipulation, in the direction predicted.
Construct Validity
The extent to which a test measures its construct. Essentially covers all forms of validity.
Content Validity
The extent to which a test samples the behavior that is of interest (such as a driving test that samples driving tasks, or an exam that contains material from the lectures)
Discriminant/Divergent Validity
The extent to which test scores DO NOT correlate with another measure that you DO NOT expect them to correlate with.
Validity
The extent to which the measure actually measures what it's supposed to measure.
What is the relationship between the reliability of the test/criterion and the validity coefficient?
The validity coefficient is always less than or equal to the square root of the test's reliability multiplied by the square root of the criterion's reliability (see the formula below).
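In symbols (notation assumed here, not given in the lecture), with r_XY the validity coefficient and r_XX, r_YY the reliabilities of the test and the criterion:

```latex
% The validity coefficient cannot exceed the geometric mean
% of the test's and the criterion's reliabilities:
r_{XY} \le \sqrt{r_{XX}} \times \sqrt{r_{YY}} = \sqrt{r_{XX}\, r_{YY}}
```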
Construct
The variable that we are trying to measure
Construct Irrelevant Variance
Variation that is captured in the measurement of a particular construct but is not related to the construct itself.
Construct Underrepresentation
Variation in the underlying construct that is not captured by the measure.
Criterion Contamination
When the criterion used to assess the validity of the test is pre-determined by the test itself.
Does it matter if a criterion used to validate a test is not that reliable? Why?
Yes, because the reliability of each scale limits how large the validity coefficient can be.
Is reliability necessary for validity?
Yes, because you cannot have a valid test that is completely unreliable as there is nothing stable enough to be valid/invalid.