Test Development
Grade-based
if performance is a function of grade
Stanine
if raw scores are transformed into scores that range from 1 to 9
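The 1-to-9 conversion above can be sketched in Python. The band cut points (roughly 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the distribution) reflect the usual stanine percentile bands; the function name and mapping via percentile rank are illustrative assumptions, not part of the source.

```python
import bisect

# Cumulative percentile cut points for stanines 1 through 8
# (stanine 9 is everything above the last cut). These follow the
# conventional 4-7-12-17-20-17-12-7-4 percent bands.
CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect.bisect_right(CUTS, percentile_rank) + 1

# A score at the 50th percentile falls in the middle band:
print(stanine(50))  # 5
```

In practice raw scores are first converted to percentile ranks (or a normalized z-scale) before banding; this sketch assumes that step has already been done.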
Age-based
if test performance is a function of age
Constructing Relevant Test Items
Items can be classified as either selection-type or supply-type.
Scale Values
are assigned to different amounts of the trait, attribute, or characteristic being measured
Item difficulty
calculate the proportion of the total number of test-takers who answered the item correctly
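The difficulty index described above is just a proportion. A minimal sketch, assuming items are scored 1 (correct) or 0 (incorrect); the function name is illustrative:

```python
def item_difficulty(responses):
    """Return p, the proportion of test-takers who answered the item correctly.

    responses: list of 0/1 scored answers for a single item.
    """
    return sum(responses) / len(responses)

# 8 of 10 examinees answered correctly, so p = 0.8
print(item_difficulty([1, 1, 1, 0, 1, 1, 0, 1, 1, 1]))  # 0.8
```

Note that a higher p means an easier item, so the statistic is sometimes called an item-easiness index.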
Item fairness
degree to which a test item is biased; a biased test item is an item that favors one particular group of examinees in relation to another when differences in group ability are controlled
Interval
differences between any two adjacent points on the scale represent the same number of scale units; has no absolute zero point
Item discrimination
indicate how adequately an item separates or discriminates between high scorers and low scorers on an entire test
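One common discrimination index, d, compares the proportion answering an item correctly in the highest- and lowest-scoring groups on the whole test (often the upper and lower 27% of examinees). A minimal sketch under that extreme-groups approach; the function and variable names are illustrative:

```python
def discrimination_index(upper_group, lower_group):
    """d = proportion correct among high scorers minus proportion
    correct among low scorers, for one item (scored 0/1)."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# High scorers mostly got the item right; low scorers mostly missed it:
print(discrimination_index([1, 1, 1, 1, 0], [0, 1, 0, 0, 0]))  # ~0.6
```

A d near +1 means the item separates high from low scorers well; a d near 0 (or negative) flags an item for revision or discard.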
Speed tests
the closer an item is to the end of the test, the more difficult it may appear to be; this is because test takers may not get to items near the end of the test before time runs out
Guessing
the issues surrounding guessing are complex; on selection-type items, a test developer must decide whether, and how, scores should be corrected for the effects of chance success
Pilot Work
Also called pilot study or pilot research. Refers to the preliminary research surrounding the creation of a prototype of the test. The test developer attempts to determine how best to measure a targeted construct. May include literature reviews, experimentation, creation, revision and deletion of preliminary test items.
Item Analysis
Data from the tryout will be collected and test-takers' performance on the test as a whole and on each item will be analyzed. Statistical procedures are employed to assist in making judgments about which items are good as they are, which items need to be revised, and which items should be discarded
Scaling
Defined as the process of setting rules for assigning numbers in measurement. Process by which a measuring device is designed and calibrated and by which numbers (or other indices) are assigned to different amounts of the trait, attribute, or characteristic being measured
General Guidelines in Item Writing
1. Select the type of test item that measures the intended learning outcome most directly. 2. Write the test item so that the performance it elicits matches the performance in the learning task. 3. Write the test item so that the task is clear and definite. 4. Write the test item so that it is free from nonfunctional material. 5. Write the test item so that irrelevant factors do not prevent an informed student from responding correctly. 6. Write the test item so that irrelevant clues do not enable the uninformed student to respond correctly. 7. Write the test item so that the difficulty level matches the intent of the learning outcome, the age group to be tested, and the use to be made of the results. 8. Write the test item so that there is no disagreement concerning the answer. 9. Write test items far enough in advance that they can later be reviewed and modified as needed. 10. Write more test items than called for by the test plan.
Test Construction
Stage that entails writing test items, as well as formatting items, setting scoring rules, and designing and building a test
Test Conceptualization
Takes place once the idea for a test is conceived
Use of Item Response Theory (IRT) in Building and Revising Tests
Item response theory is a probabilistic model that attempts to explain the response of a person to an item
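A minimal sketch of that probabilistic idea, using the one-parameter logistic (Rasch) model; the source does not name a specific IRT model, so this choice, and the function name, are illustrative:

```python
import math

def rasch_probability(theta, b):
    """P(correct response) under the one-parameter (Rasch) IRT model.

    theta: person ability, b: item difficulty (both on the same logit scale).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability exactly equals item difficulty, the probability
# of a correct response is .50:
print(rasch_probability(0.0, 0.0))  # 0.5
```

Richer models (2PL, 3PL) add item discrimination and guessing parameters, but the core idea is the same: the model predicts the probability of a response from person and item parameters.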
Ratio
meaningful zero point
Supply-type items
1. Completion 2. Essay (restricted response) 3. Essay (extended response)
Selection-type items
1. Multiple choice 2. True-false 3. Matching 4. Classification
5 Stages of Test Development
1. Test conceptualization 2. Test construction 3. Test tryout 4. Item analysis 5. Test revision
Due for revision when
1. The stimulus materials look dated and current test-takers cannot relate to them. 2. The verbal content of the test, including the administration instructions and the test items, contains dated vocabulary that is not readily understood by current test-takers. 3. Certain words or expressions in the test items or directions may be perceived as inappropriate or even offensive to a particular group. 4. Test norms are no longer adequate due to changes in group membership. 5. Reliability or validity of the test can be significantly improved by a revision. 6. The theory on which the test was originally based has been improved significantly, and these changes should be reflected in the design and content of the test.
Test Tryout
Happens once a preliminary form of the test has been developed. The test is administered to a representative sample of test-takers under conditions that simulate the conditions under which the final version of the test will be administered. The test should be tried out on people who are similar in critical respects to the people for whom the test was designed. One issue is the number of people on whom the test should be tried out. Rule of thumb: no fewer than 5 and preferably as many as 10 for each item. The more subjects in the tryout, the better.
Test Revision As a Stage in New Test Development
Having conceptualized the new test, constructed it, tried it out, and item-analyzed it, what remains is to act judiciously on all the information and mold the test into its final form
Types of Scales
Nominal, Ordinal, Interval, Ratio, Age-based, Grade-based, Stanine, Unidimensional or Multidimensional (e.g., height vs. achievement), Comparative or Categorical
Test Revision
Refers to action taken to modify a test's content or format for the purpose of improving the test's effectiveness as a tool of measurement. Usually based on item analyses, as well as related information derived from the test tryout
Selection-type Items
Require students to select from a predetermined list of potential answers. These questions include multiple-choice, true/false, matching, and classification questions. They are often viewed as less challenging in terms of the thinking skills required to answer them. However, when well written, they can measure higher levels of thinking, not simply the recall of facts. Writing these test items can be challenging, and they frequently take more time to construct. When well written, though, they are easier to score and can provide a more objective method of assessment than supply-type items.
Test Revision in the Life Cycle of an Existing Test
Tests should be revised "when significant changes in the domain represented, or new conditions of test use and interpretation, make the test inappropriate for its intended test use"
Supply-type Items
These questions measure the student's ability to communicate effectively, not just their understanding of content. They include completion items and essays (restricted and extended response). Most teachers have to balance several considerations when choosing what to include in a test, including the fair assessment of knowledge or skills, the number of students taking the test, and the amount of time available to score the test. This type of question is often easier to write but can require more time to score. Scoring is often less reliable because it is more subjective.
Test Development
Umbrella term for all that goes into the process of creating a test (Cohen, Swerdlik, & Sturman, 2013). The impetus may be an emerging social phenomenon or pattern of behavior, or a need to assess mastery in an emerging occupation or profession.
Writing Items
What range of content should the items cover? Which of the many different types of items should be employed? How many items should be written in total and for each content area covered?
Item validity
provides an indication of the degree to which the test is measuring what it purports to measure
Item reliability
provides an indication of the internal consistency of a test; the higher this index, the greater the test's internal consistency. Equal to the product of the item score standard deviation and the correlation between the item score and the total test score
Nominal
purpose of naming
Ordinal
ranking