Chapter 11 - How do we develop a test?
Specific guidelines for essay and interview questions
-Use essay items appropriately -consider the time necessary for response -prepare an answer key -score essays anonymously -use multiple independent scorers
Subjective items
-essay -interview questions -projective techniques -sentence completion (evidence of interrater reliability [IRR] is of particular importance for subjective tests)
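To make the IRR idea concrete, here is a minimal sketch (not from the text) computing Cohen's kappa, a chance-corrected agreement index, between two independent essay scorers; the rater data and the 1-4 rubric are invented.

```python
# Hypothetical sketch: interrater reliability via Cohen's kappa
# between two independent essay scorers (made-up ratings, 1-4 rubric).

from collections import Counter

rater_a = [3, 2, 4, 4, 1, 3, 2, 4, 3, 1]
rater_b = [3, 2, 4, 3, 1, 3, 2, 4, 2, 1]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Agreement expected by chance, from each rater's marginal distribution
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```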
Complex item formats
-performance assessments -simulations -portfolios
Best practices for test taker instructions
-are simple, concise, and written at a low reading level -are delivered by test administrator from a script -appear in writing at beginning of test -include encouragement for answering accurately and honestly -include "context" for answering (e.g., "when responding, think about...")
Scoring methods - Cumulative model
-the most common method for determining an individual's final test score -the more the test taker responds in a particular fashion (either with "correct" answers or with answers consistent with a particular attribute), the more the test taker exhibits the attribute being measured (e.g., multiple-choice questions) -accumulates to a "raw score" for the test -yields interval-level data -scoring used for Dr. Glaser's in-class exams! (# or % correct)
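As a worked illustration (not from the text), the sketch below accumulates keyed responses into a raw score and a percent correct; the answer key and responses are made up.

```python
# Hypothetical sketch of cumulative-model scoring: each keyed
# ("correct") response adds to a raw score, reported as % correct.

answer_key = ["B", "D", "A", "C", "B", "A", "D", "C"]
responses  = ["B", "D", "A", "A", "B", "A", "C", "C"]

raw_score = sum(r == k for r, k in zip(responses, answer_key))
percent_correct = 100 * raw_score / len(answer_key)

print(f"Raw score: {raw_score}/{len(answer_key)} ({percent_correct:.0f}% correct)")
```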
Forced-choice
-similar to multiple-choice format -usually used for personality and attitude tests rather than for knowledge tests -usually two or more words/phrases that appear unrelated but are equally acceptable -can sometimes be aggravating, as respondents may want to respond "it depends"! -more difficult for respondents to guess or fake -has little face validity (no apparent connection with the stated purpose of the test), which may produce poor responses from test takers
Composing the "test items"
-test questions are not always traditional "questions" -stimuli can often be statements, pictures, or incomplete sentences that a test taker must respond to -refer to stimuli as "test items"
First steps in test development include
- Defining the Testing Universe - Defining the Target Audience - Defining the Test Purpose These steps provide the foundation for all other development activities
Defining the Test Purpose
- Identify exactly what the test will measure - How test scores will be used (e.g., normative or criterion approach)
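To make the normative-vs-criterion distinction concrete, here is a hypothetical sketch: the same score is interpreted once relative to a norm group (percentile rank) and once against a fixed cut score; all numbers are invented.

```python
# Hypothetical sketch: the same raw score interpreted two ways.

from bisect import bisect_left

norm_group = sorted([52, 58, 61, 64, 67, 70, 73, 75, 79, 84])  # made-up norms
cut_score = 70                                                  # made-up criterion
score = 73

# Normative approach: where the score stands relative to other test takers
percentile = 100 * bisect_left(norm_group, score) / len(norm_group)

# Criterion approach: whether the score meets a fixed standard
passed = score >= cut_score

print(f"Percentile rank: {percentile:.0f}; meets criterion: {passed}")
```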
Writing effective items (overall guidelines for multiple-choice items)
-Identify item topics by consulting the test plan -ensure each item is based on an important objective -write each item in a clear and direct manner -use vocabulary and language appropriate for target audience -avoid slang or colloquial language -make all items independent -ask someone else to review items to reduce ambiguity and inaccuracies BE PREPARED TO WRITE AT LEAST TWICE AS MANY ITEMS AS WILL ULTIMATELY BE USED IN THE FINAL VERSION!
Objective items
-Multiple choice -True/false -Forced choice
Subjective test formats
-No one correct response -Interpretation of response is a judgment call of the scorer/interpreter Examples: essay questions, projective tests
Objective test formats
-One response is designated as correct -Easier to establish as reliable/valid than subjective test formats Examples: multiple choice, true/false, fill-in-the-blank
Sample response styles/sets
-Social desirability -Acquiescence -Random responding -Faking
What to include in administrator instructions
-whether the test should be administered in a group and/or individually -specific requirements for the test administration location, such as privacy, quiet, and comfortable chairs, tables, or desks -required equipment such as No. 2 pencils, a computer with a DVD or CD-ROM drive, and/or Internet access -time limitations or the approx. time for completing the test when there are no time limits -a script for the administrator to read to test takers, including answers to queries that test takers are likely to ask -credentials or training required for the test administrator
Best practices for scoring instructions
-written to ensure all persons follow the same scoring process -explain how scores relate to the construct measured (what a high/low score means)
What are the three primary reasons to develop a new test?
1. Meet needs of a special group of test takers 2. Sample behaviors from a newly defined test domain 3. Improve accuracy of test scores for intended purposes
Defining the construct and the content to be measured
Definition should be: - concise - operationalized in terms of observable and measurable behaviors
Scoring methods - Categorical model
Places test takers in a particular group or class (e.g., displays a pattern of responses that indicates a clinical diagnosis of a certain psychological disorder - or attributes that make up a behavioral trait) -yields nominal data because it places test takers into categories example: Beck Depression Inventory (BDI) -> total scores are mapped to severity categories, with higher scores indicating more severe depressive symptoms
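A minimal sketch of categorical scoring: a total score is mapped to a named category, yielding nominal data. The cutoffs follow commonly cited BDI-II severity ranges, but treat the exact cut points as illustrative rather than authoritative.

```python
# Hypothetical sketch of categorical-model scoring: a total score is
# mapped to a severity label (cutoffs illustrative, BDI-II style).

def severity_category(total_score: int) -> str:
    """Map a depression-inventory total score to a severity category."""
    if total_score <= 13:
        return "minimal"
    elif total_score <= 19:
        return "mild"
    elif total_score <= 28:
        return "moderate"
    return "severe"

print(severity_category(24))  # -> "moderate"
```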
Test formats / two elements
Refers to the type of questions that the test will contain - Stimulus: to which the test taker responds (e.g., "Which one of the following is true?") - Mechanism: for response (e.g., A, B, C, or D)
Scoring methods - Ipsative model
Requires test takers to choose among the constructs the test measures (forced-choice format) -test taker picks the statement "most like me" and the statement "least like me" -comparisons CANNOT be made across individuals (the ipsative model can only provide information about where test takers stand relative to themselves on the constructs the test is designed to measure)
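A hypothetical sketch of ipsative scoring: each forced-choice item awards +1 to the "most like me" construct and -1 to the "least like me" construct, producing a within-person ranking only; the construct names and choices below are invented.

```python
# Hypothetical sketch of ipsative scoring. Scores rank the constructs
# within ONE person; they cannot be compared across people.

from collections import defaultdict

# Made-up choices: (most_like_me, least_like_me) per forced-choice item
choices = [
    ("dominance", "steadiness"),
    ("influence", "compliance"),
    ("dominance", "compliance"),
    ("influence", "steadiness"),
]

scores = defaultdict(int)
for most, least in choices:
    scores[most] += 1
    scores[least] -= 1

# Within-person ranking of the constructs, highest first
for construct, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(construct, score)
```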
A comparison of objective and subjective formats (sampling and construction)
Sampling:
-objective: provides opportunity to sample the testing universe broadly
-subjective: limited to the number of questions/topics a test taker can respond to in one session; validity based on content may suffer
Construction:
-objective: requires much thought and development time
-subjective: easier to construct and revise; better suited for testing higher-order skills
A comparison of objective and subjective formats (scoring and response set)
Scoring:
-objective: simple and can be done by computer
-subjective: time-consuming and relies on judgment
Response set:
-objective: test takers can guess the correct response
-subjective: test takers may bluff or pad answers with superfluous information; scorers may be influenced by irrelevant factors
Defining the Target Audience
Target audience: group of individuals who will take the test - List characteristics of the persons who will take the test - Consider appropriate reading level, possible disabilities, likelihood of answering honestly
Developing a Test Plan
Test plan - specifies the characteristics of the test, including an operational definition of the construct and content to be measured (testing universe), the format for the questions, and the administration and scoring of the test
Defining the Testing Universe
Testing Universe: the body of knowledge or behaviors that the test represents - Prepare a working definition of the construct -If the construct is abstract, review psychological literature to locate studies that explain the construct and find tests that measure it
Faking and MMPI and Validity scales
The MMPI-2 is not a valid measure of a person's psychopathology or behavior if the person taking the test does so in a way that is not honest or frank. A person may decide, for whatever reasons, to overreport (exaggerate) or underreport (deny) the behavior being assessed by the test.
Administering and scoring the test - questions to consider
Will influence the format and content of the test items -will the test be administered in writing, orally, or by computer? -how much time will the test takers have to complete the test? -will the test be administered to groups or individuals? -will the test be scored by a test publisher, the test administrator, or by test takers? -what type of data is the test expected to yield? (will the test scores provide the information required by the purpose of the test?)
Response Bias and Response styles/sets
Response bias - a source of error in test scores that comes from the test takers themselves Response styles/sets - a pattern of responding to test items that results in false or misleading information ->limits the accuracy and usefulness of test scores
Writing the administration instructions
administrator instructions -> test taker instructions -> scoring instructions
Social desirability
provides answers that are socially acceptable or that present the test taker in a favorable light
Random responding
provides answers to test items in a random fashion without reading/considering them -likely to occur when: test takers lack the necessary skills (such as reading) to take the test, do not wish to be evaluated, or lack the attention span necessary to complete the task
Faking
provides answers to test items in a way that will cause a desired outcome or diagnosis -can be faking good or faking bad -one way to prevent this is to use a forced-choice format for the test items
Acquiescence (yeah-saying)
tendency to agree with ideas/behaviors (nay-saying is the opposite tendency) -may have cultural implications (deference and politeness) -see "Methods for offsetting effects of acquiescence" below
Test taker instructions
test administrator typically delivers instructions by reading prepared script -instructions also typically appear in writing -test taker needs to: --know where to respond --know how to respond --have specific directions for each type of item
Methods for offsetting effects of acquiescence
test and survey users should balance items for which the correct response would be positive with an equal number of items for which the correct response would be negative -test scorer reverses the response numbers of negative items and uses cumulative model of scoring (don't forget to change labels!)
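A minimal sketch of this reverse-coding step, assuming a 1-5 Likert scale; the responses and the set of negatively worded items are made up.

```python
# Hypothetical sketch: reverse-score negatively worded items, then
# apply cumulative-model scoring. Assumes a 1-5 Likert scale.

responses = {1: 4, 2: 2, 3: 5, 4: 1, 5: 3}  # item number -> response (1-5)
reversed_items = {2, 4}                      # negatively worded items

SCALE_MAX = 5

def recode(item: int, value: int) -> int:
    """Reverse the response number for negatively worded items."""
    return (SCALE_MAX + 1 - value) if item in reversed_items else value

total = sum(recode(item, value) for item, value in responses.items())
print(f"Cumulative score after reverse-coding: {total}")
```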
True/false
On a short test containing only true/false items, guessing can have a large impact on scores
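Worked arithmetic (not from the text) showing how much guessing can matter: on a 10-item true/false test, pure guessing yields an expected score of 50%, and the binomial probability of scoring 70% or better by luck alone is non-trivial.

```python
# Illustrative arithmetic: guessing on a 10-item true/false test.

from math import comb

n_items, p_guess = 10, 0.5

expected_score = n_items * p_guess  # 5 of 10 correct, on average

# Probability of guessing 7 or more items correct (binomial sum)
p_pass = sum(comb(n_items, k) * p_guess**k * (1 - p_guess)**(n_items - k)
             for k in range(7, n_items + 1))

print(f"Expected score: {expected_score:.0f}/{n_items}; "
      f"P(7+ correct by guessing) = {p_pass:.3f}")  # ~0.172
```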
Multiple choice
Used most often -common for pre-employment tests, standardized achievement tests, and classroom tests -partial sentence (stem) followed by one correct answer and a number of incorrect answers (distracters)