Exam 4: Chapter 4A Test Validity

convergent validity

demonstrated by a strong relationship between the scores obtained from two (or more) different methods of measuring the same construct; high correlation between tests and variables

content validity

determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample

weak, acceptable, or strong

Determining whether inferences are appropriate, meaningful, and useful typically requires numerous studies of the relationship ("predictive validity") between test performance and independently observed behaviors. Validity is characterized as ___.

example of maximum possible rxy

test reliability (rxx) = 0.88; criterion (outcome) reliability (ryy) = 0.33; maximum rxy = square root of (0.88)(0.33) = 0.54. The maximum validity coefficient here is 0.54 (it will usually be lower than 1).
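
A minimal Python sketch of this ceiling calculation (the only inputs are the two reliabilities from the example; the function name max_validity is just illustrative):

```python
import math

# Maximum possible validity coefficient, limited by the reliabilities of
# the test (rxx) and of the criterion (ryy): rxy_max = sqrt(rxx * ryy)
def max_validity(rxx: float, ryy: float) -> float:
    return math.sqrt(rxx * ryy)

print(round(max_validity(0.88, 0.33), 2))  # -> 0.54
```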

example of appropriate developmental changes

vocabulary: in order to establish the construct validity of a new vocabulary test, older people should do better than younger people. Example item: stanza = poem (intended definition on the test) vs. stanza = car (what younger people might say, because the car of that name was new around that time).

Sy

standard deviation of the CRITERION (outcome variable)

test utility

"Does use of this test result in better patient outcomes or more efficient delivery of services?"

validity

"a test is to the extent that inferences made from it are appropriate, meaningful, and useful"

2 points about construct validity

1) psychopathy is a multifaceted construct 2) no SINGLE study is going to establish construct validity on its own

categories of construct validity

- analysis to determine whether the test items or subtests are homogeneous and therefore measure a single construct
- study of developmental changes to determine whether they are consistent with the theory of the construct
- research to ascertain whether group differences on test scores are theory-consistent
- analysis to determine whether intervention effects on test scores are theory-consistent
- correlation of the test with other related and unrelated tests and measures
- factor analysis of test scores in relation to other sources of information
- analysis to determine whether test scores allow for the correct classification of examinees

95% confidence interval

.95 +/- (1.96)(.152) = .95 +/- .30 = .65 and 1.25. There is a 95% probability that his actual GPA will be between .65 and 1.25. We are making a prediction of Johnny's GPA, but we are also taking prediction error into account.
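
A quick Python sketch reproducing this interval, assuming the SEe of .152 given in the example (the helper name ci95 is hypothetical):

```python
# 95% confidence interval around a predicted score: y_hat +/- 1.96 * SEe
def ci95(y_hat: float, se_est: float) -> tuple[float, float]:
    margin = 1.96 * se_est
    return y_hat - margin, y_hat + margin

low, high = ci95(0.95, 0.152)
print(round(low, 2), round(high, 2))  # -> 0.65 1.25
```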

steps of establishing test homogeneity

1) validity process - a weaker form of validity by itself; necessary but not sufficient
2) item analysis - gives us a set of homogeneous items, but we still have to be able to predict some real-world behavior
3) homogeneous scale - sets the stage for good construct validity studies later

decision theory assumptions

1. The value of various outcomes to the institution can be expressed in terms of a common utility scale. One such scale is profit and loss. The cost of the selection procedure must also be factored into the utility scale.
2. In institutional decisions, the most generally useful strategy is one that maximizes the average gain on the utility scale (or minimizes average loss) over many similar decisions.

characteristics of all psychological constructs

1. There is no single external referent sufficient to validate the existence of the construct; that is, the construct cannot be operationally defined. 2. Nonetheless, a network of interlocking suppositions can be derived from existing theory about the construct.

methods of establishing construct validity

1. test homogeneity 2. appropriate developmental changes 3. theory-consistent group differences 4. convergent/divergent validity

content, criterion-related, construct

3 broad areas of validity

AUM example with sensitivity and specificity

87.2% sensitivity; 22.8% specificity. The test is good at determining who will earn a degree but not who will not earn a degree; the model is sensitive but not specific.

factor loading

A correlation between a single measure and the factor to which it's being related.

multitrait-multimethod matrix

A matrix that includes information on correlations between the measure and traits that it should be related to and traits that it should not theoretically be related to. The matrix also includes correlations between the measure of interest and other same-methods measures and measures that use different assessment methods.

homogeneous scale

A scale in which the individual items tend to measure the same thing; homogeneity is gauged by item-total correlations.

convergent validity example

California Psychological Inventory - Self-Control Scale. The scale is trying to show that there is a correlation between temperament and self-control in both males and females. Because it has fairly high (in context) correlations with similar scales from the Guilford-Zimmerman Temperament Survey, the scale has convergent validity (and content validity).

discriminant (divergent) validity example

California Psychological Inventory - Self-Control Scale. The scale is trying to show that intelligence is not related to impulsivity (self-control) in both males and females. On the scale, self-control has a low correlation with intelligence/achievement, so there is discriminant (divergent) validity. (It also has content validity.)

types of validity

Content validity Criterion-related validity Construct validity

example of concurrent validity

What is the correlation (rxy) between Reading Achievement Test Scores and CURRENT class ranking in reading?

example of criterion-related validity

Do SAT scores predict college GPA?

appropriate developmental changes

Does the test react in an expected way to developmental changes? Some constructs show regular, age-related changes across all or part of the lifespan; not all constructs show an age-related pattern of change (e.g., depression).

test sensitivity

How good is the test at predicting the positive outcome (e.g., earning a degree)? true positives / (true positives + false negatives)

test specificity

How well is the test able to predict the negative outcomes? true negatives / (true negatives + false positives)

negative predictive value meaning

If a person gets a 'negative' test score, there is a __% probability that they will be in the 'negative' outcome category.

positive predictive value meaning

If a person gets a 'positive' test score, there is a __% probability that they will be in the 'positive' outcome category.

face validity

Deals with how the test appears to the examinee; not "validity" in a technical sense. If a test lacks this, the person can doubt the test and may not take it seriously, affecting the results; it could also cause the examinee to doubt the usefulness of psychological testing.

theoretical understanding

Is the test able to predict the non-test criteria that the theory predicts? This is what tests claim to measure (the test is not used to predict behavior per se, but to see whether it accounts for a specific behavior).

criterion-related validity

Is the test effective in estimating (predicting) the examinee's performance on some OUTCOME measure? OUTCOME measure=criterion

content validity

Is the test representative of the universe of behavior the test was designed to sample? Boils down to a SAMPLING issue: if the sample is representative of the population, then the test has content validity. Has more application in the realm of achievement tests, where it is possible ahead of time to delineate what people are supposed to know.

construct example

Jack gives money to the beggar because Jack is generous. We infer that Jack is generous because of his behavior. There is something about Jack (generosity) that causes him to give money to the beggar.

example of prediction error

Johnny takes a test designed to predict his 1st semester GPA in the 10th grade. The test score leads us to predict that Johnny will earn a 0.95 GPA during his 1st semester in the 10th grade. But Johnny's actual GPA turns out to be 0.75. The difference between the 0.95 PREDICTED GPA and the 0.75 ACTUAL GPA is a PREDICTION ERROR!! In this case, 0.20 GPA points

^Y +/- 1.96 (SEe)

Once you've calculated the SEe, you can then calculate the 95% CI around the predicted outcome score ^Y. Multiply SEe by 1.96, then add this product to the predicted outcome score and subtract this product from the predicted outcome score.

base rate

PPV is similar to this

construct validity example

Psychopaths are supposed to have behavioral disinhibition. There is an MMPI scale called the Psychopathic Deviate scale. Does this scale really measure this component of psychopathy? To have construct validity, this scale must measure behavioral disinhibition.

criterion-related validity must be...

RELIABLE (how can we predict the outcome if the outcome is unreliable? unreliable = unpredictable); APPROPRIATE (there should be a clear rationale for choosing the criterion; the criterion must be described accurately); FREE OF CONTAMINATION (no overlapping 'test items,' and raters must not know test scores in advance of making ratings)

regression equation

The best-fitting straight line's equation for estimating the criterion from the test
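
A short sketch, with made-up paired scores, of fitting that least-squares line and using it to estimate the criterion (the data and the test score of 63 are invented for illustration):

```python
import numpy as np

# hypothetical paired scores: x = test, y = criterion (outcome)
x = np.array([45, 52, 61, 58, 70, 66, 49, 75], dtype=float)
y = np.array([2.1, 2.4, 2.9, 2.7, 3.4, 3.1, 2.2, 3.6])

slope, intercept = np.polyfit(x, y, deg=1)  # best-fitting (least-squares) line
y_hat = slope * 63 + intercept              # predicted criterion for a test score of 63
print(round(slope, 3), round(intercept, 3), round(y_hat, 2))
```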

overall correct classification meaning

The overall correct classification of the test is __ %.

positive predictive value

The proportion of patients with positive test results who are correctly diagnosed "If a person gets a 'positive' test score, what is the probability that he actually is or will be in the 'positive' outcome category?"

test sensitivity meaning

The test accurately detects __% of the people who have the positive outcome.

test specificity meaning

The test accurately detects __% of the people with the negative outcome.

example of predictive validity

Time 1 (ACT) ---> Time 2 (GPA): rxy = 0.40 predictive validity. The big time interval allows this amount of error to happen; external factors like stress, working, kids, illness, wrong major, and mental health make this number so low.

predictive validity

Validity coefficient (rxy) = Pearson correlation between test and outcome, the same as for concurrent validity, except that the test (time one) precedes the outcome (time two).

inferences from observable behavior

Where do constructs come from?

situational

Which type of validity is best? It is impossible to say that one is better than another; they answer different questions, they follow from different definitions of the term validity, and proper validation depends upon the type of test.

domain specification

a description of the domain that the test is trying to measure

factor analysis

a statistical procedure that identifies clusters of related items (called factors) on a test; used to identify different dimensions of performance that underlie a person's total score.
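
A rough illustration of estimating factor loadings from item scores, using simulated data and scikit-learn's FactorAnalysis as one possible tool (the sample size, loading pattern, and noise level are all made up):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# simulate 200 examinees answering 6 items driven by 2 latent factors
latent = rng.normal(size=(200, 2))
true_loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.0],
                          [0.0, 0.8], [0.1, 0.7], [0.0, 0.6]])
items = latent @ true_loadings.T + rng.normal(scale=0.5, size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(items)
# rows = factors, columns = estimated loading of each item on that factor
print(np.round(fa.components_, 2))
```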

discriminant validity

a test does not correlate with variables or tests from which it should differ

example of validity coefficient formula

a test has a reliability (rxx) = 0.90; the criterion (outcome) has a reliability (ryy) = 0.82; the MAX correlation (rxy) that can exist between these two variables is: rxy = square root of (0.90)(0.82) = 0.86

construct

a theoretical, intangible quality or trait in which individuals differ; people have differing "amounts" of this trait or quality. ex: introversion, depression, leadership ability, overcontrolled hostility

true positive and true negative

accurately predicted outcomes; less error, more linear, off-diagonals minimized, and diagonals maximized

standard error of estimate (SEe)

allows us to estimate the AMOUNT of error in a predicted criterion (outcome). SEe = Sy × square root of (1 - rxy^2), where Sy = standard deviation of the criterion (outcome variable) and rxy^2 = the square of the validity coefficient. Units will be whatever the outcome variable's units are.

test specificity

Among people who were negative on the outcome variable, what percent scored below the cut-off score? How good is the test at detecting the negative outcome?

test sensitivity

Among people who were positive on the outcome variable, what percent scored above the cut-off score? How good is the test at detecting the positive outcome?

base rate + test validity

The base rate is the proportion of cases in the population; the population can be the general population or some more restricted population. Generally, it is easier to detect "true" positives when the base rate is high than when it is low.

dichotomous

binary outcomes

example of predictive validity

college exams predicted from an entrance exam

2 types of criterion validity

concurrent and predictive validity

Theory-Consistent Group Differences

construct validity may be established by demonstrating that groups that SHOULD be different, ARE different on the test. contrasting groups approach

test validation research

continues long after test publication and throughout the life of the test

convergent and discriminant (divergent) validity

convergent validity- a test correlates highly with other variables or tests with which it shares overlap of constructs discriminant validity- the test DOES NOT correlate with variables or tests from which it should differ
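
A small sketch of checking both ideas with Pearson correlations; the three score vectors below are simulated stand-ins for a self-control scale, a related temperament scale, and an unrelated intelligence measure:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
self_control = rng.normal(size=100)
# a related measure should correlate highly with the scale (convergent validity)
temperament = 0.8 * self_control + rng.normal(scale=0.6, size=100)
# an unrelated measure should correlate near zero (discriminant validity)
intelligence = rng.normal(size=100)

r_convergent, _ = pearsonr(self_control, temperament)
r_discriminant, _ = pearsonr(self_control, intelligence)
print(round(r_convergent, 2), round(r_discriminant, 2))
```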

test homogeneity

correlate each potential item with total scores and toss out items that don't show high correlations with total scores

false positive and false negative

did not accurately predict outcomes; off-diagonals

concurrent validity

eliminates the time interval that allows all of these extraneous variables to affect the amount of error; easier and less expensive than predictive validity. rxy = Pearson correlation coefficient between test scores and outcome measure

D

false negative

A

false positive

incorrect predictions formula

(false positives + false negatives) / total number of outcomes, or 100% - (percent of correct predictions) = __%

face validity example

Firefighter promotion exam: seeing a problem stated in terms of the Pythagorean theorem, candidates may not think it belongs on the exam, versus seeing a problem with a fire truck and a building and having to calculate the angle at which the ladder needs to be placed. It is the same problem, but one looks more relevant than the other.

Cronbach and Meehl (1955)

found that hunters who "carelessly" shot somebody had Psychopathic Deviate scores that were significantly higher than the scores of other hunters. If Psychopathy involves behavioral disinhibition, and if the MMPI measures psychopathy, then this is the outcome that should have been found

correlation and prediction error

High scores on X should go with high scores on the outcome, while low scores on X should go with low scores on the outcome; false positives and false negatives are the errors (the off-diagonal quadrants).

face validity

if it looks valid to test users, examiners, and especially the examinees

error in predictions

if rxy=1 means that you would NEVER make an error in prediction, then any rxy<1 means that you will make SOME errors in predicting the criterion (outcome)

criterion-related validity

If the relationship between the Test and the Criterion were perfect, then rxy = 1 AND there would be NO error in predicting the Criterion from the Test. But rxy is never perfect, and it's good if you can get rxy up to about 0.80 (which serves as a practical ceiling).

meaning of validity coefficient formula

It means that the MAX correlation that can exist between Test X and Criterion Y is limited by the RELIABILITIES of each measure; the maximum correlation equals the square root of the product of the reliability of the test and the reliability of the outcome variable. Validity is fundamentally limited by reliability.

self-control

low impulsivity

false positives; false negatives

No selection test is a perfect predictor, so two other types of outcomes are also possible: some persons predicted to succeed would, if given the chance, fail, and some persons predicted to fail would, if given the chance, succeed.

content validity

Not only do you need to consider the infinite universe of items, but also the infinite universe of responses, how the test items are constructed, the instructions, and so on. When the test is measuring something that is not very well defined, content validity is difficult to establish.

construct validity

pertains to psychological tests that claim to measure complex, multifaceted, and theory-bound psychological attributes such as psychopathy, intelligence, leadership ability, and the like

theory-consistent group differences example

On a social interest scale, people with different statuses should have different social interest scores (nuns and church members scored highest; convicted felons scored lowest). On an IQ scale, people in different occupations should have different IQs (professional and technical workers scored highest; unskilled laborers scored lowest). If these groups are not different, then the construct validity needs to be questioned.

Y

outcome variable

contamination of criterion-related validity

Leads to overestimation of validity. The criterion must be independent of the test; if the two share parts (e.g., overlapping items, or raters who know the test scores), the validity coefficient will be inflated.

negative predictive value

proportion of patients with negative test results who are correctly diagnosed. "If a person gets a 'negative' test score, what is the probability that she actually is or will be in the 'negative' outcome category?

Negative Predictive Value (NPV)

proportion of people who scored below the cut score who did not earn a degree; the ratio of true negatives to all people who tested negative. Not everyone who is predicted to be negative will receive the negative outcome, but this percentage of people did. true negatives / (true negatives + false negatives)

Positive Predictive Value (PPV)

proportion of people who test positive who are also positive on the outcome variable; the ratio of true positives to all people who tested positive. Not everyone who is predicted to be positive will receive the positive outcome, but this percentage of people did. true positives / (true positives + false positives)

validity coefficient formula

rxy = square root of (rxx)(ryy); gives the MAXIMUM possible correlation between test and criterion (rxy)

contrasting groups approach

comparing groups that should differ on the construct to see whether they actually earn different scores on the test; if the expected theory-consistent group differences do not appear, the construct validity of the test is in question

difference between sensitivity and specificity

sensitivity - how well the test detects the positive outcome; specificity - how well the test detects the negative outcome

0.80

serves as a practical ceiling because reliability can NEVER be perfect; rxy = 1 is mathematically possible but not very practical (criterion-related validity)

extravalidity concerns

side effects and unintended consequences of testing

SEe = sdy square root (1-rxy^2)

standard error of estimate formula; we use it to place a 95% confidence interval around the predicted outcome score ^Y. The formula says to subtract the square of the validity coefficient (rxy^2) from 1, take the square root, and multiply this by the standard deviation of the outcome variable (sdy). DON'T FORGET TO SQUARE RXY

basics of validity

Still evolving as a construct (still debated and discussed). Validity is the DEMONSTRATION that a test measures what it claims to measure; the test developer is responsible for providing validity information, usually in the test manual. Validation always involves something other than the test itself: test scores and outside behavior/an external criterion.

decision theory

stresses that the purpose of psychological testing is not measurement per se but measurement in the service of decision making

X

test

examples of predictive validity

Test --> predicted outcome:
SAT test --> 1st semester college GPA
Pre-employment test --> supervisor job performance ratings
Cholesterol screening test --> later heart attacks
In each case, the validity coefficient (rxy) = the Pearson correlation between the TEST and the PREDICTED OUTCOME.

meaning of test scores

test scores are meaningless until the examiner draws Inferences from it based on research findings "A score of 65 on an IQ test is meaningless until the examiner infers that the individual is intellectually disabled and will have problems with academic pursuits"

construct validity

tests that claim to measure complex, multifaceted theory-bound attributes like psychopathy, intelligence, and leadership ability. no single study is capable of establishing the construct validity of a test it is an ongoing enterprise, and is refined as more becomes known through research

concurrent validity

the criterion measures are obtained at approximately the same time as test scores

predictive validity

the criterion measures are obtained in the future, usually months or years after the test scores are obtained

population

the infinite universe of test items

standard error of estimate (SEest)

the margin of error to be expected in the predicted criterion score

overall correct classification

the proportion of all cases that are true positive or true negative

base rate

the proportion of cases in the population. ex: if the base rate of earning a degree is 81%, then someone selected at random from the applicant pool has an 81% probability of earning a degree

Inversely proportional relationship

the relationship between test sensitivity and test specificity

rxy^2

the square of the validity coefficient

expert judges

the test developer creates a "DOMAIN SPECIFICATION"; the expert judges then rate each item, using a rating scale, according to its relevance to the domain

sample

the test items; usually relies on expert judgment

validity relationship to reliability

the theoretical upper limit of the validity coefficient: rxy = square root of (rxx)(ryy), where rxy = validity, rxx = reliability of the test, and ryy = reliability of the outcome variable. This is a VERY IMPORTANT formula that basically sets the limit on how highly a test and an outcome may correlate.

meaning of 95% confidence interval

there is a 95% probability that the predicted outcome score will fall in this interval

rxy = square root (rxx)(ryy)

this formula defines the MAXIMUM validity coefficient that may be obtained between a test X and an outcome variable Y. The formula says: take the reliability coefficient of the test (rxx), multiply it by the reliability coefficient of the outcome variable (ryy), and then take the square root of the product

C

true negative

C/(C+D)

true negatives / (true negatives + false negatives) = negative predictive value

C/(A+C)

true negatives / (true negatives + false positives) = test specificity

B

true positive

(B+C) / (A+B+C+D)

(true positives + true negatives) / (TP + TN + FP + FN) = overall correct classification

correct predictions formula

(true positives + true negatives) / total number of outcomes

B/(B+D)

true positives / (true positives + false negatives) = test sensitivity

B/(B+A)

true positives / (true positives + false positives) = positive predictive value
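
A compact sketch tying these cell formulas together, using the same labels as above (A = false positives, B = true positives, C = true negatives, D = false negatives); the counts are made up:

```python
# made-up counts for the four cells of the 2x2 classification table
A, B, C, D = 15, 120, 40, 25  # FP, TP, TN, FN

sensitivity = B / (B + D)            # how well the test detects the positive outcome
specificity = C / (C + A)            # how well the test detects the negative outcome
ppv = B / (B + A)                    # positive predictive value
npv = C / (C + D)                    # negative predictive value
overall = (B + C) / (A + B + C + D)  # overall correct classification

print(round(sensitivity, 2), round(specificity, 2),
      round(ppv, 2), round(npv, 2), round(overall, 2))
```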

prediction error

unitless. Example: rxy = .75; perfect validity would be rxy = 1; 1 - .75 = .25, so the prediction error is .25 (unitless)

test homogeneity example

We have a test that consists of 10 items; each item is scored on a 5-point Likert scale (1-5). We then give the test to 30 people. Everyone will have a total score (range 10-50) and 10 item scores, each ranging 1-5. If item 3 has an item-total correlation of .11 and item 6 has an item-total correlation of .12, you would likely eliminate those items. But you could also rewrite or restructure them, since they don't seem to be measuring the construct that the other items (with better correlations) are measuring.
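
A small sketch of the item-total correlation step, using simulated responses (30 people x 10 items on a 1-5 scale) and a 0.20 cut-off chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# simulated responses: 30 people x 10 items, each scored 1-5
responses = rng.integers(1, 6, size=(30, 10))
totals = responses.sum(axis=1)

for item in range(responses.shape[1]):
    # correlate each item with the total score, as described above
    r = np.corrcoef(responses[:, item], totals)[0, 1]
    flag = "  <- low item-total correlation: drop, rewrite, or restructure" if r < 0.20 else ""
    print(f"item {item + 1}: r = {r:.2f}{flag}")
```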

criterion

what you are trying to predict; aka the outcome variable

