Assessment and Measurement

Anchor papers or benchmark performances

Examples of performances that serve as a standard against which other papers or performances may be judged; often used as examples of performances at different points on a scoring rubric. In math problem solving, for example, anchor papers are selected from actual student work that is considered to exemplify the quality of a performance level of 1, 2, 3, and so forth. If used with analytical scoring, there may be anchor papers or benchmark performances for each trait being assessed. Frequently there are also anchors for each grade level assessed.

Benchmark performance

See anchor papers

Performance criteria

a description of the characteristics that define the basis on which the response to the task will be judged. Performance criteria may be holistic, analytical trait, general, or specific. Performance criteria are expressed as a rubric or scoring guide. Anchor papers or benchmark performances may be used to identify each level of competency in the rubric or scoring guide.

Task (as in a "performance task")

a goal-directed assessment exercise. For example, a particular math problem to solve, a lab to do, or a paper to write.

Evaluation

a judgment regarding the quality or worth of the assessment results. Evaluations are usually based on multiple sources of assessment information. For example, "The information we collected indicates that students are performing above expectations."

Indicator

a more specific description of an outcome in terms of observable and assessable behaviors. An indicator specifies what a person who possesses the qualities articulated in the outcome understands or can do. For example, a student may demonstrate his or her understanding of problem solving by finding a solution to a mathematics problem. The solution is an indicator.

Analytical trait scoring

a performance is judged several times, once along each of several important dimensions or traits of the performance. Use of a scoring rubric and anchor papers for each trait is common. An example might be judging student problem solving on understanding of the problem, correct use of procedures and strategies, and the ability to communicate clearly what was done.
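
As an illustration only, analytical trait scoring can be sketched as keeping one score per trait rather than collapsing them into a single number. The trait names below paraphrase the example in the definition; the 1-4 scale and the sample ratings are assumptions made up for this sketch.

```python
# Hypothetical trait names (paraphrasing the definition's example) and an
# assumed 1-4 scale; neither comes from a specific published rubric.
TRAITS = ("understanding", "procedures", "communication")
SCALE_MIN, SCALE_MAX = 1, 4


def analytical_scores(ratings):
    """Return one score per trait, checking that each is on the scale."""
    profile = {}
    for trait in TRAITS:
        score = ratings[trait]
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"{trait} score {score} is off the scale")
        profile[trait] = score
    return profile


# Made-up ratings for one student's problem-solving performance.
student = {"understanding": 4, "procedures": 2, "communication": 3}
profile = analytical_scores(student)
print(profile)
```

The result is a profile of scores, so a strong rating on one trait does not hide a weak rating on another.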

Portfolio

a purposeful, integrated collection of student work showing effort, progress, or degree of proficiency.

Generalized rubric

a rubric that can be used to score performance on a large number of related tasks. For example, a rubric used to score problem-solving and communication skills on any math problem-solving task.

Task-specific rubric/scoring

a scoring guide or rubric that can only be used with a single exercise or performance task. A new rubric is developed for each task.

Primary trait scoring

a scoring procedure by which products or performances are evaluated by limiting attention to a single criterion or a few selected criteria. These criteria are typically based upon the trait or traits that are most essential to a good performance. For example, if a student is asked to write to the Department of Energy urging the opening or closing of a nuclear power plant, the primary traits might be the ability to communicate persuasively and the correct application of scientific knowledge to back up one's position. Scorers would attend only to these two traits.

Holistic scoring

a single, overall score is assigned to a performance.

Dispositions

affective outcomes such as flexibility, perseverance, self-confidence and a positive attitude toward science and mathematics. Some new assessments attempt to measure these outcomes.

Norm-referenced assessments

an assessment designed to reveal how an individual student's performance or test result ranks or compares to that of an appropriate peer group.
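
One common way to express such a ranking is a percentile rank. The sketch below is a minimal illustration, assuming the "percentage of peers scoring strictly below" convention (other conventions exist); the function name and the peer scores are invented for the example.

```python
def percentile_rank(score, peer_scores):
    """Percentage of peer scores strictly below the given score."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)


# Made-up scores for a peer group of ten students.
peers = [52, 60, 61, 67, 70, 74, 78, 81, 85, 93]
rank = percentile_rank(78, peers)
print(rank)  # 6 of the 10 peers scored below 78, so this prints 60.0
```

The same raw score can yield a different rank against a different peer group, which is why the choice of an appropriate group matters.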

Criterion-referenced assessment

an assessment designed to reveal what a student knows, understands, or can do in relation to specific performance objectives. Criterion-referenced assessments are used to identify student strengths and weaknesses in terms of specific knowledge or skills which are the goals of the instructional program.
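
By contrast with a peer ranking, a criterion-referenced report compares each student with the objectives themselves. The following is a rough sketch under stated assumptions: the objective names, the item counts, and the 80 percent mastery cutoff are all hypothetical.

```python
def mastery_report(results, cutoff=0.8):
    """Map each objective to True when the fraction correct meets the cutoff.

    `results` maps an objective name to a (correct, total) pair. The default
    cutoff of 0.8 is an assumption for illustration, not a standard value.
    """
    return {
        objective: correct / total >= cutoff
        for objective, (correct, total) in results.items()
    }


# Made-up item counts for one student on three objectives.
results = {"fractions": (9, 10), "ratios": (6, 10), "graphing": (8, 10)}
report = mastery_report(results)
print(report)
```

Here the report flags "ratios" as a weakness regardless of how other students performed.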

Rubric

an established and written-down set of criteria for scoring or rating students' performance on tests, portfolios, writing samples, or other performance tasks.

Standards (performance)

an established level of achievement, quality of performance, or degree of proficiency expected of students. Examples include a cut-off score on a multiple-choice test or an expected benchmark performance on a performance assessment.

Reliability

an indication of consistency of scores across evaluators, over time, or across different versions of the test. An assessment is considered reliable when the same answers receive the same score no matter when the assessment occurs or how or by whom it is scored, or when students receive the same scores no matter which version of the test they took.
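
Consistency across evaluators is often summarized with an agreement statistic. The sketch below computes the exact-agreement rate and Cohen's kappa (agreement corrected for chance) for two raters. These are two common choices among several, not measures the definition itself prescribes, and the ratings are made up.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Fraction of performances given the same score by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each assigned scores independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up rubric scores (1-4) from two raters on the same eight papers.
rater_a = [1, 2, 3, 4, 2, 3, 3, 4]
rater_b = [1, 2, 3, 3, 2, 3, 4, 4]
agreement = exact_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, round(kappa, 3))
```

Kappa comes out lower than raw agreement because some matches would occur by chance alone.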

Validity

an indication of how well an assessment actually measures what it is supposed to measure rather than extraneous features. For example, a valid assessment of mathematics problem solving would measure the student's ability to solve a problem and not the ability to read the problem.

Alternative Assessment

any type of assessment in which students create a response to a question, as opposed to assessments in which students choose a response from a given list, such as multiple-choice, true/false, or matching. Alternative assessments can include short answer questions, essays, performance assessments, oral presentations, demonstrations, exhibitions, and portfolios.

Authentic (assessment)

assessment tasks that elicit demonstrations of knowledge and skills in ways that resemble "real life" as closely as possible, engage students in the activity, and reflect sound instructional practice.

On-demand assessment

assessment that takes place at a predetermined time and place. State tests, SATs, and most final exams are examples of on-demand assessments.

Standardized assessments

assessments that are administered and scored in exactly the same way for all students. Traditional standardized tests are typically mass-produced and machine-scored and are designed to measure skills and knowledge that are thought to be taught to all students in a fairly standardized way. Performance assessments can also be standardized if they are administered and scored in the same way for all students. Standardization is an important consideration if comparisons are to be made between scores of different individuals or groups.

Performance assessments

direct, systematic observation of actual student performances and rating those performances according to pre-established performance criteria.

Criteria

See performance criteria

Standards (content or curriculum)

statements of what should be taught. For example, the NCTM curriculum standards.

Selected-response assessments

students select the correct response from among a set of responses offered by the developer of the assessment. Multiple-choice and matching tests are examples of selected-response assessments.

Assessment

the act of collecting information about individuals or groups of individuals in order to understand them better.

Generalizability

the extent to which the performances sampled by a set of assessment items/tasks are representative of the broader domain being assessed. For example, can we generalize about a student's overall problem-solving ability from the student's performance on a specific set of 10 problem-solving tasks?

Open-response tasks

the kind of performance required of students when they must generate an answer, rather than select it from among several possible answers, but there is still a single correct response. An example is: "There are four pieces of wood, each measuring seven feet. If you used them as a fence around your square yard, how large an area would you create?"
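
The fence example has exactly one correct answer, which can be worked out as follows. This is only a sketch of the arithmetic, assuming each piece of wood forms one full side of the square.

```python
pieces = 4          # pieces of wood given in the problem
length_ft = 7       # length of each piece, in feet

perimeter_ft = pieces * length_ft   # total fencing: 28 feet
side_ft = perimeter_ft // 4         # a square has four equal sides: 7 feet
area_sq_ft = side_ft ** 2           # enclosed area of the square yard

print(area_sq_ft)  # prints 49
```

Any response other than 49 square feet is incorrect, which is what distinguishes this task from an open-ended one.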

Open-ended tasks

the kind of performance required of students when they must generate a solution to a problem or perform a task when there is no single, right answer. An example is: "Below you see a bar graph without any labels. What might this be a graph of?"

Scale

the range of scores possible on an individual item or task. Performance assessment items are typically scored on a 4- to 6-point scale, compared to a scale of 2 (right/wrong) on multiple-choice items.

Context (of an authentic assessment)

the surrounding circumstances within which the assessment takes place.

Extraneous interference (or error)

things that might cause us to mismeasure students, for example, excessive reading on a mathematics test, or role-playing on a science assessment.

