Assessment Vocabulary
Primary trait scoring
A scoring procedure by which products or performances are evaluated by limiting attention to a single criterion or a few selected criteria. These criteria are typically based upon the trait or traits that are most essential to a good performance. For example, if a student is asked to write to the Department of Energy urging the opening or closing of a nuclear power plant, the primary traits might be the ability to communicate persuasively and the correct application of scientific knowledge to back up one's position. Scorers would attend only to these two traits.
Holistic Scoring
A single, overall score is assigned to a performance.
Performance criteria
A description of the characteristics that define the basis on which the response to the task will be judged. Performance criteria are expressed as a rubric or scoring guide. Anchor papers or benchmark performances may be used to identify each level of competency in the rubric or scoring guide.
Task (as in performance task)
A goal-directed assessment exercise. For example, a particular math problem to solve, a lab to do, or a paper to write.
Evaluation
A judgement regarding the quality or worth of the assessment results. Evaluations are usually based on multiple sources of assessment information. For example, "The information we collected indicates that students are performing above expectations."
Indicator
A more specific description of an outcome in terms of observable and assessable behaviors. It specifies what a person who possesses the qualities articulated in an outcome understands or can do. For example, a student may demonstrate his or her understanding of problem solving by finding a solution to a mathematics problem. The solution is the _____________.
Analytical trait scoring
A performance is judged several times along several different important dimensions or traits of the performance. Use of a scoring rubric and anchor papers for each trait is common. An example might be the judging of student problem solving for understanding the problem, correct use of procedures and strategies, and the ability to communicate clearly what was done.
Portfolio
A purposeful, integrated collection of student work showing effort, progress, or degree of proficiency.
Generalized rubric
A rubric that can be used to score performance on a large number of related tasks. For example, a generalized rubric might score problem-solving and communication skills on any math problem.
Task-specific rubric/scoring
A scoring guide or rubric that can only be used with a single exercise or performance task. A new rubric is developed for each task.
Criterion-referenced assessment
An assessment designed to reveal what a student knows, understands, or can do in relation to specific performance objectives. Such assessments are used to identify students' strengths and weaknesses in terms of the specific knowledge or skills that are the goals of the instructional program.
Dispositions
Affective outcomes such as flexibility, perseverance, self-confidence, and a positive attitude toward science and mathematics. Some new assessments attempt to measure these outcomes.
Norm-referenced assessment
An assessment designed to reveal how an individual student's performance or test result ranks or compares to that of an appropriate peer group.
Rubric
An established and written-down set of criteria for scoring or rating students' performance on tests, portfolios, writing samples, or other performance tasks.
Standards (performance)
An established level of achievement, quality of performance, or degree of proficiency expected of students. Examples include a cut-off score on a multiple-choice test or an expected benchmark performance on a performance assessment.
Validity
An indication of how well an assessment actually measures what it is supposed to measure rather than extraneous features. For example, a valid assessment of mathematics problem solving would measure the student's ability to solve a problem and not the ability to read the problem.
Reliability
An indication of the consistency of scores across evaluators, over time, or across different versions of the test. An assessment is considered reliable when the same answers receive the same score no matter when the assessment occurs, how it is scored, or who does the scoring, and when students receive the same scores no matter which version of the test they took.
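One common way to check consistency across evaluators is an exact-agreement rate: the fraction of papers to which two raters assigned identical rubric scores. The sketch below is a hypothetical illustration with invented scores, not a method prescribed by this glossary; a real reliability study would involve more raters, more papers, and often a chance-corrected statistic.

```python
# Hypothetical illustration of inter-rater agreement as a rough
# reliability check. The scores below are invented for the example.

rater_a = [4, 3, 2, 4, 1, 3, 4, 2]  # rubric scores from evaluator A
rater_b = [4, 3, 3, 4, 1, 3, 4, 2]  # same eight papers, evaluator B

# Exact-agreement rate: fraction of papers given identical scores.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
rate = agreements / len(rater_a)
print(f"Exact agreement: {rate:.2f}")  # 7 of 8 papers match -> 0.88
```

A low agreement rate would suggest the rubric's performance criteria or anchor papers need refinement before scores from different evaluators can be treated as interchangeable.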
Alternative Assessment
Any type of assessment in which students create a response to a question, as opposed to assessments in which students choose a response from a given list, such as multiple-choice, true/false, or matching. Alternative assessments can include short-answer questions, essays, performance assessments, oral presentations, demonstrations, exhibitions, and portfolios.
Authentic (assessment)
Assessment tasks that elicit demonstrations of knowledge and skills in ways that resemble "real life" as closely as possible, engage students in the activity, and reflect sound instructional practice.
On-demand assessment
Assessment that takes place at a predetermined time and place. State tests, SATs, and most final exams are examples.
Standardized assessments
Assessments that are administered and scored in exactly the same way for all students. They are typically mass-produced and machine scored and are designed to measure skills and knowledge that are thought to be taught to all students in a fairly standardized way. Performance assessments can also be standardized if they are administered and scored in the same way for all students.
Performance assessment
Direct, systematic observation of actual student performances according to pre-established performance criteria.
Anchor papers or benchmark performances
Examples of performances that serve as a standard against which other papers or performances may be judged, often used as examples of performances at different points on a scoring rubric. In math problem solving, for example, anchor papers are selected from actual student work considered to exemplify performance levels 1, 2, 3, and so forth. If used with analytical scoring, there may be anchor papers or benchmark performances for each trait being assessed. Frequently there are also anchors for each grade level assessed.
Benchmark performance
See "anchor papers".
Criteria
See "performance criteria".
Standards (content or curriculum)
Statements of what should be taught. For example, the NCTM standards.
Selected-response assessments
Students select the correct response from among a set of responses offered by the developer of the assessment. Multiple-choice and matching tests are examples.
Assessment
The act of collecting information about individuals or groups of individuals in order to understand them better.
Generalizability
The extent to which the performances sampled by a set of assessment items/tasks are representative of the broader domain being assessed. For example, can we generalize about a student's problem-solving ability in general from the performance of the student on a specific set of 10 problem-solving tasks?
Open-response tasks
The kind of performance required of students when they must generate an answer, rather than select it from among several possible answers, and there is a single, correct response. An example is: "There are four pieces of wood, each measuring seven feet. If you used them as a fence around your square yard, how large an area would you create?"
Open-ended tasks
The kind of performance required of students when they must generate a solution to a problem or perform a task when there is no single, right answer. Example: "Below you see a bar graph with no labels. What might this be a graph of?"
Scale
The range of scores possible on an individual item or task. Performance assessment items are typically scored on a 4-to-6-point scale, compared to the two-point (right/wrong) scale of multiple-choice items.
Context (of an alternative assessment)
The surrounding circumstances within which the assessment is embedded. For example, problem solving can be assessed in the context of a specific subject (for example, mathematics) or in the context of a real-life laboratory problem requiring the use of mathematical, scientific, and communication skills. Or, science process skills can be assessed in the context of a large-scale, high-stakes assessment or a classroom grading context.
Extraneous interference (or error)
Things that might cause us to mismeasure students. For example, heavy reading demands on a mathematics test, or the need to role-play on a science test.