Exam 3 psych test/measurement
A written test is to paper and pencil as a practical test is to...
Demonstrating skills
Kevin did not pass his cardiology board exam (barely!). Based on his score, we could infer that he is not knowledgeable enough about cardiology to be board certified at this time. Which type of evidence refutes (i.e., goes against) that inference?
evidence based on reliability
Sources of validity evidence include all of the following EXCEPT...
evidence based on response processes; {evidence based on inter-rater agreement}; evidence based on test content; evidence based on relations with other variables
Kevin did not pass his cardiology board exam (barely!). Based on his score, we could infer that he is not knowledgeable enough about cardiology to be board certified at this time. Which of the following is criterion-related validity evidence to support that inference?
there is a strong, positive correlation between scores on the board exam and patient outcomes
Tests are discriminatory if...
there is evidence of testing bias
Suppose a university is selecting applicants based on their SAT score. If the validity coefficient is .9 ...
they are likely to have more true positives than false positives
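A toy illustration of why a high validity coefficient implies more true positives than false positives (simulated data and hypothetical median cutoffs, not real SAT figures):

```python
import numpy as np

# Simulate an applicant pool where the predictor (SAT) and criterion
# (college success) correlate at r = .9, the assumed validity coefficient.
rng = np.random.default_rng(0)
n = 10_000
r = 0.9
sat, gpa = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=n).T

selected = sat > np.median(sat)       # admitted on SAT score
successful = gpa > np.median(gpa)     # hypothetical success criterion

true_pos = np.sum(selected & successful)    # admitted AND successful
false_pos = np.sum(selected & ~successful)  # admitted but NOT successful
print(true_pos, false_pos)
```

With the correlation this strong, most admitted applicants succeed, so true positives heavily outnumber false positives.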
Heterotrait-heteromethod correlations
different traits measured with different methods (should be below .2; e.g., a survey measure of environmental activism correlated with how many usernames someone has for online video games)
Heterotrait-monomethod correlations
different traits measured with the same method (should be below .3; e.g., the correlation between dog-lover and conscientiousness scores measured with the same type of scale)
how do we make a test fair
ensure content validity; use within-group norming
Kevin did not pass his cardiology board exam (barely!). Based on his score, we could infer that he is not knowledgeable enough about cardiology to be board certified at this time. Which of the following is content validity evidence to support that inference
the test does not measure anything irrelevant
in terms of construct validity what questions are we asking?
- are test scores strongly and positively associated with scores on similar constructs? - are they associated with other measures of the same construct? - are the scores unrelated to scores on dissimilar constructs?
why are cut scores problematic, and what can we do about it?
- because measurement is not perfectly reliable, examinees near the cut score (borderline cases) may be misclassified - one remedy is to round up
Ethical Principles of Psychologists: assessment standards
1) Bases of assessment 2) Use of assessments 3) Informed consent 4) Release of test data 5) Test construction 6) Interpretation of assessment results 7) Assessment by unqualified persons 8) Obsolete tests/outdated results 9) Test scoring/interpretation services 10) Explaining assessment results 11) Maintaining test security
When developing a test, what are the recommended steps to improve the content validity?
1) define the testing universe 2) develop test specifications 3) establish a test format 4) construct test questions
test specifications
A documented plan containing details about a test's content
Objective criterion
A measurement that is observable and measurable, such as the number of accidents on the job.
Common problem with criterion related validity
Restriction of range
Which of the following is NOT an example of validity evidence?
Scores on this quiz are consistent with scores on a parallel form of this quiz
Concurrent evidence of validity
A method for establishing evidence of validity based on a test's relationships with other variables, in which test administration and criterion measurement happen at roughly the same time. It provides information about the present and the status quo, NOT prediction.
Discrimination index
A statistic that compares the performance of those who made very high test scores with the performance of those who made very low test scores on each item. The ideal value is .30 or higher.
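A minimal sketch with made-up item responses: the index is simply the proportion correct in the high-scoring group minus the proportion correct in the low-scoring group.

```python
# Hypothetical responses to one item: 1 = correct, 0 = incorrect,
# split into the highest- and lowest-scoring groups on the whole test.
upper = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # high scorers
lower = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]   # low scorers

p_upper = sum(upper) / len(upper)   # proportion of high scorers correct (.8)
p_lower = sum(lower) / len(lower)   # proportion of low scorers correct (.3)
D = p_upper - p_lower               # discrimination index
print(round(D, 2))                  # 0.5, above the .30 ideal
```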
Quantitative item analysis
A statistical analysis of the responses that test takers gave to individual test questions.
Based on the Ethical Principles of Psychologists and Code of Conduct outlined by the American Psychological Association (APA), which of the following is NOT an ethical principle involving assessments?
Psychologists should obtain informed consent for routine educational assessment
A large, positive correlation between individuals' scores on the ACT and SAT would be...
Black individuals who score lower than White individuals on the SAT are expected to perform the same in college
examining the content of a test to evaluate the validity of an inference or decision is...
Content validity
How would you measure test-retest reliability?
Correlate individuals' scores on two occasions of a test
face validity
Do the questions seem relevant and important to the test taker? Face validity does NOT demonstrate evidence of validity on a test.
what would an expert review look like
Experts review and rate how relevant each question/item is to the attribute the test measures
What would expert content categorization look like
Experts look at each question and try to match it to the construct it is intended to measure. (Exam note: ~40% most-missed questions from the last exam, 10% item analysis, 40% validity.)
The textbook tells the story of Michael who was administered an intelligence test by his school. The school determined he was "retarded," and they moved him into a special education class. When his parents asked about his intelligence score, the principal said it was better to leave such matters to school authorities. Which ethical principle of assessments was violated?
Explaining Assessment Results
To have bias is to
have poor evidence of content validity
Internal structure is to
Homogeneous or heterogeneous
What do integrity tests measure
Individual attitudes and experiences towards honesty, dependability, trustworthiness, reliability and prosocial behavior
Which of the following questions is relevant to content validity?
Is the test content representative? Do the test questions measure anything irrelevant? Does the test fail to assess any important concepts?
What two questions should we ask in terms of validity?
Is there evidence supporting the interpretation of the test scores? Is there evidence supporting the proposed use of test scores?
Assume a distribution of scores is negatively skewed. What are the mean, median, and mode in order of smallest to largest?
Mean, Median, Mode
Interitem correlations tend to be relatively
Small in size, often in .15 to .20 range
item difficulty
The percentage of test takers who answer a question correctly: divide the number of persons who answered correctly by the total number who responded to the question. 0-.2 = too hard; .9-1.0 = too easy; most variation at .5.
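A small sketch with invented responses, applying the cutoffs above:

```python
# Hypothetical responses to one question: 1 = correct, 0 = incorrect.
responses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Item difficulty (p value) = number correct / number who responded.
p = sum(responses) / len(responses)
if p <= 0.2:
    verdict = "too hard"
elif p >= 0.9:
    verdict = "too easy"
else:
    verdict = "acceptable"
print(p, verdict)   # 0.7 acceptable
```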
what is a construct?
an attribute, trait, or characteristic that is not directly observable but can be inferred from observable behaviors (e.g., aggression or knowledge)
content validity ratio
an index that describes how essential each test item is to measuring the attribute or construct that the item is supposed to measure. Ranges from -1.00 to +1.00; each item's ratio is compared against a minimum value.
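Lawshe's content validity ratio is CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating the item "essential" and N is the panel size. A sketch with a hypothetical panel of 10 experts:

```python
def cvr(n_essential, n_experts):
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half

print(cvr(9, 10))   # 0.8: experts agree the item is essential
print(cvr(5, 10))   # 0.0: the panel is split
print(cvr(1, 10))   # -0.8: experts agree the item is NOT essential
```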
what questions are we asking with relations to criterion (criterion related validity)
are scores related to things we expect (what are the meaningful outcomes)
What is one way to examine content validity after a test is developed?
ask experts to match every test question to the content area that is covered
what can one do to mitigate problems associated with score differences by race?
changing the decisions we make based on test scores: take scores into consideration but do not make decisions contingent on them alone (consider other factors)
Suppose you are developing a selection test for a company (i.e., a test that will help the company make hiring decisions). Which of the following could you do to help define the testing universe?
conduct a job analysis
A large, positive correlation between individuals' scores on the ACT and SAT would be...
convergent
validity coefficient
correlation coefficient between a test score (predictor) and a performance measure (criterion). It is desirable to get the widest range of test scores
multitrait-multimethod matrix
a correlation matrix that displays convergent and discriminant validity evidence; trait = construct, method = how the construct is measured
SPSS can be used to do all of the following EXCEPT...
help write good scale items; display the distribution of scores on a scale; {provide Cronbach's alpha of a scale}; help inform decisions about which scale items to keep or remove
For a good survey item, the item-total correlation is
high if the survey is homogeneous
Restriction of range
If there is not a full range of scores on one of the variables in the association, it can make the correlation appear smaller than it really is. The validity coefficient will only be calculated from the restricted group and is likely to be lower than if all candidates had been hired and included in the study.
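A simulated demonstration of the effect (assumed true validity of .6, hiring only the top 30% of test scorers; all numbers are invented):

```python
import numpy as np

# Simulate an applicant pool where test and job performance truly
# correlate at r = .6.
rng = np.random.default_rng(42)
r_true = 0.6
test, perf = rng.multivariate_normal(
    [0, 0], [[1, r_true], [r_true, 1]], size=5000
).T

# Validity coefficient in the full applicant pool.
r_full = np.corrcoef(test, perf)[0, 1]

# Validity coefficient computed only on the hired (top 30%) group:
# range restriction shrinks the observed correlation.
hired = test > np.quantile(test, 0.7)
r_restricted = np.corrcoef(test[hired], perf[hired])[0, 1]

print(round(r_full, 2), round(r_restricted, 2))
```

The restricted-group coefficient comes out noticeably smaller than the full-pool coefficient, which is the problem the flashcard describes.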
If a test question has a content validity ratio of .9 ...
it is a good test question; experts agree that it is important and relevant
Project Stage 4 and the final exam are both intended to assess students' overall knowledge of the course material. The correlation between scores on Project Stage 4 and the final exam would be a...
monotrait-heteromethod correlation
Cut scores are typically...
problematic because measurement is not perfectly reliable
Monotrait-monomethod correlations
same trait measured with the same method (e.g., test-retest reliability: have people take your scale twice; should be above .7)
convergent validity is to
scores being strongly, positively associated with scores on similar constructs
Which of the following is evidence related to criterion-related validity?
scores on the SAT are moderately, positively associated with college GPA
Monotrait-heteromethod correlations
the same trait measured with different methods (e.g., a dog-lover scale and how many dogs someone owns); should be above .6
Which of the following is a subjective criterion?
supervisor ratings of performance
what steps could we take to examine discriminant validity
take two unrelated topics, then correlate scores on both scales (should be below .3)
item-total correlation
the correlation between scores on an individual item and the total score on all items of a measure. It tends to be higher on homogeneous tests and is calculated with the Pearson product-moment correlation formula, unlike the interitem correlation, which correlates items with each other. Typical range: .2-.4.
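A sketch with a made-up 6-person, 4-item response matrix, correlating each item with the total score via the Pearson formula:

```python
import numpy as np

# Hypothetical 5-point responses: rows = people, columns = items.
items = np.array([
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [2, 3, 2, 2],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 5, 4, 5],
])

total = items.sum(axis=1)           # each person's total score
for j in range(items.shape[1]):
    # Pearson correlation between item j and the total score
    r = np.corrcoef(items[:, j], total)[0, 1]
    print(f"item {j + 1}: r = {r:.2f}")
```

(On such a small, deliberately homogeneous example the correlations come out far above the .2-.4 range quoted for real items.)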
Which of the following is a common problem when examining predictive validity?
the full range of data is unavailable
Validity is about...
the quality of the inferences individuals draw from a test score
If Cronbach's alpha of a scale is .60...
the reliability of the scale is inadequate (i.e., not acceptable)
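Cronbach's alpha can be computed as k/(k-1) x (1 - sum of item variances / variance of total scores). A sketch with invented data:

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_vars = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical responses: rows = people, columns = items.
scores = [[5, 4, 5], [4, 4, 3], [2, 3, 2], [1, 2, 2], [3, 3, 4]]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))   # 0.9, comfortably above the .60 judged inadequate
```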
Scenario: the same assessment was used in two different situations; participants were not asked to consent to the test or told why they were taking it; items on the test did not correlate with one another and had overall low correlations (validity was miscalculated)
use of assessment; informed consent; test construction
Scenario: job applicants were judged on intellect without a job analysis; feedback was not provided; applicants who scored high were misled; management assumed that those with lesser intellect would be less likely to leave the job
use of assessment; explaining assessment results