Psych Assess - Exam III

Domain Sampling

(1) A sample of behaviors from all possible behaviors that could be indicative of a particular construct; (2) a sample of test items from all possible items that could be used to measure a particular construct. Items must represent the construct (e.g., romantic satisfaction).

Sources of Error

- Test construction: use domain sampling
- Test administration: uneven tables, loud noise, uncomfortable temperature
- Test scoring and interpretation

Validity Coefficient

A correlation coefficient that provides a measure of the relationship between test scores and scores on a criterion measure. Rarely greater than 0.5.

Kuder-Richardson formula 20

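A formula for estimating the internal consistency (inter-item consistency) of a test whose items are scored dichotomously (e.g., right or wrong). The correction for scores being based on only half of the total test items is the Spearman-Brown formula, not KR-20.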
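
As a rough illustration, KR-20 is commonly written as (k / (k - 1)) × (1 - Σpq / σ²), where k is the number of items, p is the proportion passing each item, q = 1 - p, and σ² is the variance of total scores. The sketch below assumes items scored 0/1 and uses population variance; the function name and the sample data are made up for the example.

```python
from statistics import pvariance


def kr20(responses):
    """Estimate KR-20 from rows of 0/1 item scores (one row per examinee)."""
    k = len(responses[0])                     # number of items
    n = len(responses)                        # number of examinees
    totals = [sum(row) for row in responses]  # total score per examinee
    total_var = pvariance(totals)             # population variance of totals
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        pq_sum += p * (1 - p)                     # p * q for item j
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical data: 4 examinees, 3 right/wrong items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))
```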

Split-half reliability

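A measure of internal consistency reliability in which a test is split into two halves and each individual's scores on the two halves are correlated. Because each half is shorter than the full test, the half-test correlation underestimates full-test reliability until it is corrected with the Spearman-Brown formula.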

Internal Consistency

A measure of reliability: are the items all measuring the same construct?

Kappa statistic

A measure of the degree of nonrandom agreement between observers of the same categorical variable.
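
A minimal sketch of the usual two-rater form of kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category frequencies. The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Kappa for two raters who each rated the same cases categorically."""
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of the two raters' marginal proportions, summed.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four cases by two observers.
a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(cohens_kappa(a, b))
```

Here the raters agree on 3 of 4 cases (75%), but half of that agreement would be expected by chance, so kappa is lower than the raw percent agreement.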

coefficient of stability

An estimate of test-retest reliability obtained when the interval between administrations is six months or longer.

Standard Error of Measurement

An index of how much error there is in observed scores; the amount of rubber in our tape measure. Used to compute confidence intervals: a 95% interval means we are 95% sure that the true score lies within it. Example: gifted or special education placement, where a cutoff score is combined with a confidence interval to ask how sure we can be that the person meets the cutoff.

Cronbach's alpha

An indicator of internal consistency reliability assessed by examining the average correlation of each item (question) in a measure with every other question.
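
One common way to compute alpha uses variances rather than correlations: (k / (k - 1)) × (1 - Σ item variances / variance of total scores). The sketch below assumes that variance-based form; the function name and the ratings are invented for illustration.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Alpha from rows of item scores (one row per respondent)."""
    k = len(responses[0])  # number of items
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of respondents' total scores.
    total_var = pvariance([sum(row) for row in responses])
    # The ratio is the same for sample or population variance,
    # as long as one convention is used for both terms.
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 3 respondents, 2 rating-scale items.
ratings = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(ratings), 3))
```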

Who gets surveyed?

Not only highly educated people: about 33% of Americans are college graduates. Use simple, direct language. Avoid colloquialisms, even if all your friends know them.

If everyone gives the same answer, it is most likely not a useful question.

Can you conceive of a test item on a rating scale requiring human judgment that all raters will score the same 100% of the time?

What is your construct of interest?

Can you define it? Who is using this information? Is there a specific context? Example: "Tends to find fault with others AT WORK."

Random Error

Caused by unpredictable fluctuations of other variables in the measurement process (e.g., noise).

Generalizability theory

Compatible with true score theory; considers non-test sources of error.

Utility

Considers how tests and assessments benefit society. Usually quantitative.

Ways of considering utility

Cost efficiency; time savings; cost-benefit ratio. Clinical utility: some tests are more efficient. Selection utility: can help choose the best people.

Item Response Theory

Different from true score theory. Focus: item difficulty, which ranges from 0 (everyone answers the same way) to 1 (extreme). Allows for adaptive testing: questions adjust depending on how well previous questions were answered, so less time is spent testing. Requires a large pool of items.

Parallel forms

Different versions of a test used to assess test reliability; changing forms reduces the effects of direct practice, memory, or the desire of an individual to appear consistent on the same items. The forms have been tested and shown to be psychometrically identical: same means and variances, and r must be at least .80. Easier to build for intellectual ability than for personality.

How to improve reliability

Eliminate poor items; add new items; use factor analysis.

True Score Theory

Every score reflects a true level of knowledge plus some error: X = T + E (Observed Score = True Score + Error).

Test-retest reliability; minimizes cheating; provides a parallel form in case someone gets sick; minimizes practice effects.

From the perspective of the test user, what are other possible advantages of having alternate or parallel forms of the same test?

Audio-recording method: one clinician interviews a patient and assigns diagnoses. Then a second clinician, who does not know what diagnoses were assigned, listens to an audio recording of the interview and independently assigns diagnoses. This method inflates agreement: the second clinician cannot ask about remaining symptoms of the disorder, only the interviewing clinician can follow up patient responses with further questions, and even when semistructured interviews are used, two highly trained clinicians interviewing separately might obtain different responses. The reliability of diagnoses is therefore far lower than commonly believed.

How is the reliability of diagnoses determined? Why does the method affect the estimated reliability? What does this mean?

Discrimination

How well an item distinguishes between those who chose a given response and those who didn't. Ranges from -1 to 1 (-1 reflects the opposite of the trait); 0.8 = excellent, -0.2 to +0.2 = poor.

May not like to shoot someone; increasing suicide rates.

How would you describe the non-economic cost of a nation's armed forces using ineffective screening mechanisms to screen military recruits?

Spearman-Brown

In order to estimate the internal-consistency reliability of a test, one would use the __________ formula
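
For reference, the general Spearman-Brown formula is r_SB = n × r / (1 + (n - 1) × r), where r is the observed correlation (for example, between two halves of a test) and n is the factor by which test length changes (n = 2 when stepping a half-test correlation up to full length). A small sketch, with a hypothetical function name:

```python
def spearman_brown(r, n=2.0):
    """Estimate reliability after changing test length by a factor of n.

    r: observed reliability (e.g., the correlation between two test halves)
    n: new length divided by old length (2 for a split-half correction)
    """
    return n * r / (1 + (n - 1) * r)


# A half-test correlation of .60 corresponds to a full-test estimate near .75.
print(round(spearman_brown(0.6), 2))
```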

Why is the phrase valid test sometimes misleading?

Validity can be context dependent: a test might be valid in some contexts but not in others.

Face Validity

Measures whether a test looks like it tests what it is supposed to test. Doesn't really tell us whether the test is actually valid. Does not help interpret the scores.

Pre-post designs are not optimal because there is no way of telling whether the treatment worked. Having an experimental group and a control group addresses this.

Might it have been advisable to have simultaneous testing of a matched group of couples who did not participate in sex therapy and simultaneous testing of a matched group of couples who did not consult divorce attorneys? In both instances, would there have been any reason to expect any significant changes in the test scores of these 2 control groups?

Federale Express Status Quo

Must have a valid driver's license No criminal record 3 months probation

Deciding in patient evaluation is not easy. Examples: police officer assessment, suspected child abuse, whether someone is suicidal, guilty or innocent, or has cancer.

Provide an example of another situation in which the stakes involving the utility of a tool of psychological assessment are high.

Top-Down Selection

Ranking applicants on the basis of their total scores and selecting from the top down until the desired number of candidates has been selected. Disadvantage: disparate impact.
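
The procedure can be sketched in a few lines, assuming each applicant has a single total score and a fixed number of openings. The applicant labels, scores, and function name are hypothetical.

```python
def top_down_select(scores, n_openings):
    """Rank applicants by total score and take the top n_openings."""
    # Sort from highest to lowest score; ties keep their original order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n_openings]]


applicants = {"A": 82, "B": 91, "C": 77, "D": 88}
print(top_down_select(applicants, 2))  # ['B', 'D']
```

Because selection depends only on rank order, any group difference in average scores carries straight through to who is hired, which is how disparate impact can arise.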

Interpreting Reliability

Research = 0.7 or better; screening = 0.8 or better; diagnosis = 0.9 or better.
Screening is broad and is used to see whether a follow-up is needed; it is cheaper and is done with many people.
With higher reliability, observed scores are closer to the true score and their variability is narrower.
Each observed score contains error, but the more subjects we have, the closer the error in the average gets to 0: as n increases toward infinity, the error goes down toward 0.

Depression False Positive Consequences

Resources are taken away from those who need them; medication is given to those who don't need it; time is lost; unnecessary labels are applied.

Things to Avoid

Responses with more than one meaning (items should be clear and unambiguous); double-barreled questions.

Standard Error of the Difference Equation

SD × sqrt(2 - r1 - r2), where r1 and r2 are the reliability coefficients of the two tests. Used when comparing 2 scores.
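
A quick numeric sketch of this formula, assuming both scores are on the same scale (same SD). The function name and the example values (SD = 15, both reliabilities .90) are illustrative only.

```python
import math


def se_difference(sd, r1, r2):
    """Standard error of the difference between two scores on the same scale."""
    return sd * math.sqrt(2 - r1 - r2)


# Hypothetical values: SD = 15, reliabilities of .90 for each test.
print(round(se_difference(15, 0.90, 0.90), 2))  # 6.71
```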

SEM equation

SD × sqrt(1 - rxx), where rxx is the test's reliability coefficient.
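
The same formula as a small function, using made-up values (SD = 15, rxx = .91) to show how a high reliability shrinks the SEM relative to the SD. The function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd, rxx):
    """SEM from the test's standard deviation and reliability coefficient."""
    return sd * math.sqrt(1 - rxx)


# Hypothetical test with SD = 15 and reliability .91.
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
```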

Historical sources may not always be accurate; developers must decide what is important.

Test developers who publish history tests must maintain the content validity of their tests. What challenges do they face in doing so?

Parallel Form Reliability

The correlation coefficient determined by comparing the scores of the two similar measuring devices (or forms of the same test) administered to the same people at one time.

Reliability

The degree of consistency in measurement. A prerequisite for validity: a necessary but not a sufficient condition.

Construct Validity

The extent to which there is evidence that a test measures a particular hypothetical construct.

Sampling Error

The level of confidence in the findings of a public opinion poll. The more people interviewed, the more confident one can be of the results.

Analogy of a rubber tape measure

The smaller the standard deviation, the less rubber there is, the more reliable it is, the closer it is to the mean

Content Sampling

The variety of the subject matter contained in the items

Broad Item Strategies

Think about verbs and behaviors related to the construct. Remember to change the polarity of some items to promote paying attention. Think about emotions related to behaviors related to the construct. Think about time-related behaviors.

Systematic Error

Typically constant or proportionate to what is presumed to be the true value of the measured variable.

Depression False Negative Consequences

Undermining the need for help; misdiagnosis.

FERT

Utility: less likely to ruin reputation, narrows down choices, less harm to self-esteem. Face validity: people take the test more seriously. Content validity: how did people fail in the past? How can they succeed? The goal is criterion validity.

Divergent Evidence

Variables that are not theoretically related should show low correlations. Examples: jealousy and social desirability; for marital satisfaction, the DAS should NOT have strong correlations with career satisfaction or introversion/extroversion.

Convergent Evidence

Variables that theoretically should correlate; usually shows high correlations. Examples: jealousy measured with both self-report and peer report; for marital satisfaction, the DAS should correlate well with overall problem-solving, life satisfaction, and lower anxiety.

False Positive Rate

We say they will be successful, but they are not

False Negative Rate

We say they won't be successful, but they are

Hit Rate

What % of people do we correctly classify?

Miss Rate

What % of people do we incorrectly classify?
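
These four rates can be computed from the counts in a 2 × 2 decision table. The sketch below makes one assumption worth flagging: false positives and false negatives are expressed as proportions of everyone tested, so that they add up to the miss rate. Other fields divide by different totals. The counts and function name are hypothetical.

```python
def classification_rates(true_pos, false_pos, true_neg, false_neg):
    """Rates as proportions of all people classified (an assumed convention)."""
    total = true_pos + false_pos + true_neg + false_neg
    return {
        "hit_rate": (true_pos + true_neg) / total,     # correctly classified
        "miss_rate": (false_pos + false_neg) / total,  # incorrectly classified
        "false_positive": false_pos / total,  # predicted success, did not succeed
        "false_negative": false_neg / total,  # predicted failure, did succeed
    }


# Hypothetical outcomes for 100 applicants.
print(classification_rates(true_pos=40, false_pos=10, true_neg=35, false_neg=15))
```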

More items = more reliability (cost). Reliability is a prerequisite for validity; the test must be consistent (cost). Children might lose attention (benefit). There might be some questions missing (benefit).

What are other situations in which a reduction in test size or the time it takes to administer a test might be desirable? What are the arguments against reducing test size?

Disparate Impact

a condition in which employment practices are seemingly neutral yet disproportionately exclude a protected group from employment opportunities

percent agreement

a measure of interrater reliability in which the percentage of times that the raters agree is computed

Criterion-Related Validity

a measure of validity based on showing a substantial correlation between test scores and job performance scores. Measuring the proficiency of people who are already hired can create restriction of range, so one must strive for a large range. Incremental validity: does adding this measure improve prediction? E.g., does adding the SAT score improve HS GPA prediction?

Average proportional distance (APD)

a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores

test-retest reliability

a method for determining the reliability of a test by comparing a test taker's scores on the same test taken on separate occasions. Measures the correlation between 2 administrations: the higher the correlation, the higher the reliability. Most appropriate for abilities and traits; not usually appropriate for states. Practice effects reduce it.

Factor Analysis

a statistical procedure that identifies clusters of related items (called factors) on a test; used to identify different dimensions of performance that underlie a person's total score.

Error Variance

deviation of the individual scores in each group from their respective group means

Validity

how well a test measures what it is supposed to measure in a particular context, e.g., can personality tests predict salesmanship?

What is the value of face validity from the perspective of the test-user?

E.g., the CBCL (Child Behavior Checklist): widely used; measures internalization (depressive symptoms) and externalization (disruptive behavior); empirically validated; has versions for parents and teachers. Parents balk at questions that aren't face valid. If a test is face valid, people take it more seriously.

Methodological Error

interviewers may not have been trained properly, the wording in the questionnaire may have been ambiguous, or the items may have somehow been biased to favor one or another of the candidates.

Weighted Kappa

takes into account the consequence of disagreements between examiners; used when not all disagreements are equally consequential

Content Validity

the extent to which a test samples the behavior that is of interest. Abstract constructs can't be measured directly, e.g., openness to experience might be sampled with "Do you like trying new foods?" How many items do we need? Can be established by reviewing other instruments, interviewing experts, and reading about the construct.

True Variance

the portion of the differences in scores that is (theoretically) real

Confidence Intervals

the range on either side of an estimate that is likely to contain the true value for the whole population
90%: X (observed score) ± 1.65 × SEM
95%: X (observed score) ± 1.96 × SEM
99%: X (observed score) ± 2.58 × SEM
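
A short sketch of how these multipliers turn an observed score and an SEM into an interval. The observed score of 100 and SEM of 4.5 are made-up values, and the function name is hypothetical.

```python
# z multipliers as listed on this card.
Z_VALUES = {90: 1.65, 95: 1.96, 99: 2.58}


def confidence_interval(observed, sem, level=95):
    """Return (lower, upper) bounds around an observed score."""
    margin = Z_VALUES[level] * sem
    return observed - margin, observed + margin


# Hypothetical examinee: observed score 100 on a test with SEM = 4.5.
low, high = confidence_interval(100, 4.5, level=95)
print(round(low, 2), round(high, 2))  # 91.18 108.82
```

If a cutoff score falls inside the interval, the observed score alone does not settle whether the person meets the cutoff.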

Alternate Forms

two or more different forms of a test designed to measure exactly the same skills or abilities, which use the same methods of testing, and which are of equal length and difficulty. In general, if test takers receive similar scores on _________ of a test, this suggests that the test is reliable. Intended to be equivalent, but their equivalence has not been tested.
