Chapter 11 - How do we develop a test?

Specific guidelines for essay and interview questions

-use essay items appropriately
-consider the time necessary for response
-prepare an answer key
-score essays anonymously
-use multiple independent scorers

Subjective items

-essay
-interview questions
-projective techniques
-sentence completion
(evidence of interrater reliability [IRR] is of particular importance for subjective tests)

Complex item formats

-performance assessments
-simulations
-portfolios

Best practices for test taker instructions

-are simple, concise, and written at a low reading level
-are delivered by the test administrator from a script
-appear in writing at the beginning of the test
-include encouragement for answering accurately and honestly
-include "context" for answering (e.g., "when responding, think about...")

Scoring methods - Cumulative model

-the most common method for determining an individual's final test score
-the more the test taker responds in a particular fashion (either with "correct" answers or with answers consistent with a particular attribute), the more the test taker exhibits the attribute being measured (e.g., multiple-choice questions)
-accumulates to a "raw score" for the test
-yields interval-level data
-scoring used for Dr. Glaser's in-class exams! (# or % correct)
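
A minimal sketch of cumulative scoring, assuming a hypothetical five-item multiple-choice test (the answer key and function name are made up for illustration). The raw score is the count of responses that match the key, and the percent correct follows from it:

```python
# Hypothetical answer key; not taken from any real test.
ANSWER_KEY = ["B", "D", "A", "C", "B"]

def cumulative_score(responses, key=ANSWER_KEY):
    """Return (raw score, percent correct) under the cumulative model."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    # Each response matching the key adds one point to the raw score.
    raw = sum(1 for given, correct in zip(responses, key) if given == correct)
    return raw, 100.0 * raw / len(key)

raw, pct = cumulative_score(["B", "D", "C", "C", "A"])
print(raw, pct)  # 3 60.0
```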

Forced-choice

-similar to the multiple-choice format, but usually used for personality and attitude tests rather than for knowledge tests
-usually presents two or more words/phrases that appear unrelated but are equally acceptable
-can sometimes be aggravating, as respondents may want to respond "it depends"!
-more difficult for respondents to guess or fake
-has little face validity (no apparent connection with the stated purpose of the test); this may produce poor responses from test takers

Composing the "test items"

-test questions are not always traditional "questions"
-stimuli can often be statements, pictures, or incomplete sentences that a test taker must respond to
-refer to stimuli as "test items"

First steps in test development include

-Defining the Testing Universe
-Defining the Target Audience
-Defining the Test Purpose
These steps provide the foundation for all other development activities

Defining the Test Purpose

-identify exactly what the test will measure
-identify how test scores will be used (e.g., normative or criterion approach)

Writing effective items (overall guidelines for multiple-choice items)

-identify item topics by consulting the test plan
-ensure each item is based on an important objective
-write each item in a clear and direct manner
-use vocabulary and language appropriate for the target audience
-avoid slang or colloquial language
-make all items independent
-ask someone else to review items to reduce ambiguity and inaccuracies
BE PREPARED TO WRITE AT LEAST TWICE AS MANY ITEMS AS WILL ULTIMATELY BE USED IN THE FINAL VERSION!

Objective items

-multiple choice
-true/false
-forced choice

Subjective test formats

-no one correct response
-interpretation of the response is a judgment call of the scorer/interpreter
Examples: essay questions, projective tests

Objective test formats

-one response is designated as correct
-easier to establish as reliable/valid than the subjective test format
Examples: multiple choice, true/false, fill-in-the-blank

Sample response styles/sets

-social desirability
-acquiescence
-random responding
-faking

What to include in administrator instructions

-whether the test should be administered in a group and/or individually
-specific requirements for the test administration location, such as privacy, quiet, and comfortable chairs, tables, or desks
-required equipment, such as No. 2 pencils, a computer with a DVD or CD-ROM drive, and/or Internet access
-time limitations, or the approximate time for completing the test when there are no time limits
-a script for the administrator to read to test takers, including answers to queries that test takers are likely to ask
-credentials or training required for the test administrator

Best practices for scoring instructions

-are written to ensure all persons follow the same scoring process
-explain how scores relate to the construct measured (what does a high/low score mean?)

What are the three primary reasons to develop a new test?

1. Meet the needs of a special group of test takers
2. Sample behaviors from a newly defined test domain
3. Improve the accuracy of test scores for their intended purposes

Defining the construct and the content to be measured

Definition should be:
-concise
-operationalized in terms of observable and measurable behaviors

Scoring methods - Categorical model

Places test takers in a particular group or class (e.g., displays a pattern of responses that indicates a clinical diagnosis of a certain psychological disorder, or attributes that make up a behavioral trait)
-yields nominal data because it places test takers into categories
-example: Beck Depression Inventory (BDI) -> higher total scores indicate more severe depressive symptoms, and score ranges are used to place test takers in severity categories
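
A rough sketch of categorical scoring, assuming hypothetical cut points and labels (these are placeholders, NOT the published cutoffs of the BDI or any real inventory). The output is a category name, which is nominal data rather than a number to be compared:

```python
import bisect

# Hypothetical cut points and labels, for illustration only.
CUT_POINTS = [10, 20, 30]
LABELS = ["minimal", "mild", "moderate", "severe"]

def categorize(total_score):
    """Place a total score into exactly one category."""
    # bisect_right counts how many cut points the score has reached.
    return LABELS[bisect.bisect_right(CUT_POINTS, total_score)]

print(categorize(9))   # minimal
print(categorize(25))  # moderate
```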

Test formats/ two elements

Refers to the type of questions that the test will contain. Two elements:
-Stimulus: what the test taker responds to (e.g., "Which one of the following is true?")
-Mechanism: how the test taker responds (e.g., A, B, C, or D)

Scoring methods - Ipsative model

Requires test takers to choose among the constructs the test measures (forced-choice format)
-test taker picks the statements "most like me" and "least like me"
-comparisons CANNOT be made across individuals (the ipsative model can only provide information about where test takers stand relative to themselves on the constructs the test is designed to measure)
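
One way to picture ipsative scoring, as a sketch under assumptions: the construct names and the +1/-1 weights below are hypothetical, not taken from any published test. Because every "most like me" pick is offset by a "least like me" pick, each person's scores sum to zero, which is why the scores only rank constructs within one person:

```python
from collections import Counter

def ipsative_scores(choices):
    """choices: list of (most_like_me, least_like_me) construct pairs."""
    scores = Counter()
    for most, least in choices:
        scores[most] += 1   # construct endorsed as "most like me"
        scores[least] -= 1  # construct endorsed as "least like me"
    return dict(scores)

# Hypothetical constructs and responses for one test taker.
person = ipsative_scores([
    ("dominance", "patience"),
    ("sociability", "dominance"),
    ("dominance", "sociability"),
])
print(person)                # {'dominance': 1, 'patience': -1, 'sociability': 0}
print(sum(person.values()))  # 0
```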

A comparison of objective and subjective formats (sampling and construction)

Sampling
-objective: provides opportunity to sample the testing universe
-subjective: limited to the number of questions/topics to which the test taker can respond in one session; validity based on content may suffer
Construction
-objective: requires much thought and development time
-subjective: easier to construct and revise; better suited for testing higher-order skills

A comparison of objective and subjective formats (scoring and response set)

Scoring
-objective: simple and can be done by computer
-subjective: time consuming and relies on judgment
Response set
-objective: test taker can guess the correct response
-subjective: test taker may bluff or pad answers with superfluous or excessive information; scorers may be influenced by irrelevant factors

Defining the Target Audience

Target audience: the group of individuals who will take the test
-list characteristics of the persons who will take the test
-consider appropriate reading level, possible disabilities, and likelihood of answering honestly

Developing a Test Plan

Test plan - specifies the characteristics of the test, including an operational definition of the construct and content to be measured (testing universe), the format for the questions, and the administration and scoring of the test

Defining the Testing Universe

Testing universe: the body of knowledge or behaviors that the test represents
-prepare a working definition of the construct
-if the construct is abstract, review the psychological literature to locate studies that explain the construct and find tests that measure it

Faking and MMPI and Validity scales

The MMPI-2 is not a valid measure of a person's psychopathology or behavior if the person taking the test does so in a way that is not honest or frank. A person may decide, for whatever reasons, to overreport (exaggerate) or underreport (deny) the behavior being assessed by the test.

Administering and scoring the test - questions to consider

These decisions will influence the format and content of the test items:
-will the test be administered in writing, orally, or by computer?
-how much time will the test takers have to complete the test?
-will the test be administered to groups or individuals?
-will the test be scored by a test publisher, the test administrator, or by test takers?
-what type of data is the test expected to yield? (will the test scores provide the information required by the purpose of the test?)

Response Bias and Response styles/sets

Response bias: a source of error in test scores that comes from the test takers themselves
Response styles/sets: patterns of responding to test items that result in false or misleading information
-> limit the accuracy and usefulness of test scores

Writing the administration instructions

administrator instructions -> test taker instructions -> scoring instructions

Social desirability

providing answers that are socially acceptable or that present the test taker in a favorable light

Random responding

providing answers to test items in a random fashion without reading or considering them
-likely to occur when test takers lack the necessary skills (such as reading) to take the test, do not wish to be evaluated, or lack the attention span necessary to complete the task

Faking

providing answers to test items in a way that will cause a desired outcome or diagnosis
-can be faking good or faking bad
-one way to prevent this is to use a forced-choice format for the test items

Acquiescence (yea-saying)

tendency to agree with any ideas/behaviors presented [the opposite tendency, nay-saying, also occurs]
-may have cultural implications (deference and politeness)
-test and survey users should balance items for which the correct response would be positive with an equal number of items for which the correct response would be negative

Test taker instructions

test administrator typically delivers instructions by reading a prepared script
-instructions also typically appear in writing
-test taker needs to:
--know where to respond
--know how to respond
--have specific directions for each type of item

Methods for offsetting effects of acquiescence

-test and survey users should balance items for which the correct response would be positive with an equal number of items for which the correct response would be negative
-test scorer reverses the response numbers of negative items and uses the cumulative model of scoring (don't forget to change labels!)
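
A minimal sketch of reverse scoring, assuming a hypothetical 1-5 agreement scale and made-up item positions. Negatively worded items are flipped (5 becomes 1, 4 becomes 2, and so on) before the responses are summed under the cumulative model:

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response on the scale, e.g. 1 <-> 5 and 2 <-> 4 on a 1-5 scale."""
    return scale_max + scale_min - response

def total_score(responses, reversed_items):
    """Sum responses, reversing the items whose index is in reversed_items."""
    return sum(
        reverse_score(r) if i in reversed_items else r
        for i, r in enumerate(responses)
    )

# Hypothetical data: items at index 1 and 3 are negatively worded.
print(total_score([5, 1, 4, 2], reversed_items={1, 3}))  # 18
```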

True/false

On a short test containing only true/false items, guessing can have a large impact on scores
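
To see why test length matters, here is a sketch that assumes pure guessing with a 50% chance per true/false item (a simplifying assumption; the function name and the 70% threshold are illustrative). It computes the binomial probability of reaching a given score by guessing alone:

```python
from math import comb

def prob_at_least(n_items, min_correct, p_guess=0.5):
    """Probability of getting at least min_correct items right by guessing."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )

short_test = prob_at_least(10, 7)    # 70% or better on 10 items
long_test = prob_at_least(100, 70)   # 70% or better on 100 items
print(short_test)              # 0.171875
print(short_test > long_test)  # True
```

Under these assumptions, roughly one guesser in six reaches 70% on a 10-item test, while the same result on a 100-item test is far less likely.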

Multiple choice

-the most often used format; commonly used for pre-employment tests, standardized achievement tests, and classroom tests
-consists of a partial sentence (stem) followed by one correct answer and a number of incorrect answers (distracters)
