Assessment of student achievement

Shift changes

(1) when there are only two possible alternatives, a shift can be made to a true-false item; (2) when there are a number of similar factors to be related, a shift can be made to a matching item; and (3) when the items are to measure analysis, interpretation, and other complex outcomes, a shift can be made to the interpretive exercise. This procedure makes it possible to use the special strengths of the multiple-choice item and to use the other selection-type items more appropriately.

The degree of realism present

(1) paper-and-pencil performance, (2) identification test, (3) structured performance test, (4) simulated performance, (5) work sample, and (6) student project.

specific learning objectives

(1) recall of knowledge, (2) intellectual abilities and skills, (3) general skills (laboratory, performance, communication, work-study), and (4) attitudes, interests, and appreciation

Locating and Selecting Resources.

2.1 Has a variety of resources been selected? 2.2 Is the resource material relevant to the problem? 2.3 Do the resources provide various possible solutions to the problem? 2.4 Does the resource material include evidence supporting the suggested solutions? 2.5 Is there enough resource material to provide for valid conclusions?

A common outline for a problem-solving project includes the following items.

1. Establishing criteria and standards. 2. Selecting and stating the problem. 3. Locating and selecting resources. 4. Writing the report. 5. Designing and completing a research study or making a product. 6. Orally presenting and defending the project.

Rules for Scoring Essay Answers

1. Evaluate answers to essay questions in terms of the learning outcomes being measured. 2. Score restricted-response answers by the point method, using a model answer as a guide. 3. Grade extended-response answers by the rating method, using defined criteria as a guide. 4. Evaluate all of the students' answers to one question before proceeding to the next question. 5. Evaluate answers to essay questions without knowing the identity of the writer. 6. Whenever possible, have two or more persons grade each answer.

portfolio

A collection of your work along with documentation of achievements is known as a

rubric

A method of scoring work (e.g. an essay) using a numerical value (e.g. 1-5), where each value is associated with certain characteristics.

Which of the following is the best-stated true-false item?

A rising barometer forecasts fair weather

achievement tests

A test designed to assess what a person has learned

Anecdotal record

a brief description of some significant event. It typically includes the observed behavior, the setting in which it occurred, and a separate interpretation of the event. Although keeping anecdotal records can be time consuming, the task can be kept manageable by limiting the records to certain types of behavior (e.g., safety) and to those individuals needing the most help (e.g., slow, careless students).

Constructing a graph from a given set of data is an example of

a restricted performance task

Which test item is least useful for educational diagnosis?

a. Multiple-choice item. *b. True-false item. c. Short-answer item.

Which test item provides the highest score by guessing?

a. Multiple-choice item. *b. True-false item. c. Short-answer item.

Which test item is difficult to score objectively?

a. Multiple-choice item. b. True-false item. *c. Short-answer item.

What is another name for true-false items?

alternative-response items

A series of selection-type test items based on introductory material such as a graph is an example of

an interpretive exercise

An advantage of restricted performance tasks over extended performance tasks is that they

make performance easier to judge

When preparing the environment for a performance assessment, activities should be created for students waiting to perform that

are unrelated to the performance task

rating scale

assessment in which a numerical value is assigned to specific behavior that is listed in the scale

A weakness of short-answer questions is that they

can potentially have several answers

Asking students to judge each statement as true or false, and then to change the false statements so they are true increases the item's level of

difficulty

The incorrect responses in a multiple-choice item are called

distracters

What are the incorrect responses in a multiple-choice item called?

distracters

A limitation of the interpretive exercise is that scoring is highly subjective.

f

A true-false item with the correct answer being "false" provides evidence that the student knows the correct answer.

f

According to the "Rules for Writing Essay Questions" in your textbook, it is best to limit the amount of time a student has to answer each essay.

f

In writing true-false items, one useful rule is to include absolute terms like "always" or "never."

f

extended-response questions

give students almost unlimited freedom to determine the form and scope of their responses.

A major advantage of specifying the performance outcomes prior to administering the task is that they can

help students understand what is expected

Types of Essay Questions

restricted-response questions and extended-response questions.

Essay questions are more appropriate than multiple-choice items when the specific outcome calls for

supplying the answer

A statement of opinion, by itself, cannot be marked true or false.

t

According to the "Rules for Scoring Essay Answers" in your textbook, it is best to grade essay tests question by question, rather than student by student.

t

Analytic scoring is most appropriate when it is desired to provide the student with feedback on performance of each step in a process.

t

Asking a student to defend a position or point of view would best be assessed with an extended-response essay question.

t

Having two or more persons grade each essay question is the best way to check the reliability of scoring.

t

Only grade spelling, grammar, and punctuation of supply-type answers when they relate to the intended learning outcome.

t

Restricted performance outcomes typically require students to identify, construct, or demonstrate completion of a simple task.

t

Scores from true-false items are more likely to be influenced by student guessing than matching items.

t

Absolute Grading

the use of letter grades defined by a 100-point system.
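
Since absolute grading maps scores on a fixed point scale directly to letter grades, the idea can be sketched in Python. The cutoffs below (90/80/70/60) are an illustrative assumption, not prescribed standards; as noted under its limitations, such standards are set in an arbitrary manner.

```python
# Illustrative absolute grading on a 100-point scale.
# The cutoff values are an assumption for this sketch, not prescribed standards.
def absolute_grade(score: float) -> str:
    cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"

print(absolute_grade(93))  # A
print(absolute_grade(59))  # F
```

Note that the grade depends only on the student's own score, not on the performance of other students.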

placement assessment

used to determine whether students have skills or knowledge necessary to move on to new material

Observation

will judge the response. It should be recognized, however, that a series of restricted tasks do not provide sufficient evidence of a comprehensive performance. For that we need more extended tasks that integrate the specific skills into a complex pattern of movements or the production of a high-quality product.

Action verbs-IDENTIFY: Selects the correct objects, part of the object, procedure, or property (typical verbs: identify, locate, select, touch, pick up, mark, describe)

Select the proper tool. Identify the parts of a typewriter. Choose correct laboratory equipment. Select the most relevant statistical procedure. Locate an automobile malfunction. Identify a musical selection. Identify the experimental equipment needed. Identify a specimen under the microscope.

GENERAL GUIDELINES FOR ITEM WRITING

1. Select the type of test item that measures the intended learning outcome most directly. Use a supply-type item if supplying the answer is an important element of the task (e.g., writing). Use a selection-type item if appropriate (e.g., identification) or if both types are equally appropriate. 2. Write the test item so that the performance it elicits matches the performance in the learning task. The intended learning outcome specifies the learning task in performance terms, and the test task should call forth the same performance. 3. Write the test item so that the test task is clear and definite. Keep the reading level low, use simple and direct language, and follow the rules for correct punctuation and grammar. 4. Write the test item so that it is free from nonfunctional material. Material not directly relevant to the problem being presented increases the reading load and may detract from the intent of the item. Use extraneous material only where its detection is part of the task (e.g., in math problems). 5. Write the test item so that irrelevant factors do not prevent an informed student from responding correctly. Avoid trick questions that might cause a knowledgeable student to focus on the wrong aspect of the task. Use clear, unambiguous statements that maximize the performance to be measured and minimize all other influences. For example, word problems measuring mathematical reasoning should keep reading level and computational demands simple if an uncontaminated measure of reasoning ability is desired. 6. Write the test item so that irrelevant clues do not enable the uninformed student to respond correctly. Removing unwanted clues from test items requires alertness during item writing and reviewing the items after setting them aside for a while. The most common clues for each item type will be considered in the following chapters. It is also important to prevent the information given in one item from providing an answer to another item in the test. 7. Write the test item so that the difficulty level matches the intent of the learning outcome, the age group to be tested, and the use to be made of the results. When difficulty is being evaluated, check to be certain that it is relevant to the intended learning outcome and that the item is free from sources of irrelevant difficulty (e.g., obscure materials, overly fine discriminations). 8. Write the test item so that there is no disagreement concerning the answer. Typically, the answer should be one that experts would agree is the correct or best answer. Most problems arise when students are to provide the best answer (best procedure, best explanation). This involves a matter of judgment, and to be defensible the answer must be clearly best and identified as such by experts in the area. Where experts disagree, it may be desirable to ask what a particular authority would consider to be the best method, the best reason, and the like. When attributed to a source, the answer can be judged as correct or incorrect. 9. Write the test items far enough in advance that they can be reviewed and modified as needed. A good time to write test items is shortly after the material has been taught, while the questions and context are still clearly in mind. In any event, reviewing and editing items after they have been set aside for a while can detect flaws that were inadvertently introduced during the original item writing. 10. Write more test items than called for by the test plan. This will enable you to discard weak or inappropriate items during item review and make it easier to match the final set of items to the test specifications.

ABSOLUTE GRADING strengths and weaknesses

Strengths 1. Grades can be described directly in terms of student performance, without reference to the performance of others. 2. All students can obtain high grades if mastery outcomes are stressed and instruction is effective. Limitations 1. Performance standards are set in an arbitrary manner and are difficult to specify and justify. 2. Performance standards tend to vary unintentionally due to variations in test difficulty, assignments, student ability, and instructional effectiveness. 3. Grades can be assigned without clear reference to what has been achieved (but, of course, they should not be).

Relative Grading strengths and weaknesses

Strengths 1. Grades can be easily described and interpreted in terms of rank in a group. 2. Grades distinguish among levels of student performance that are useful in making prediction and selection decisions. Limitations 1. The percent of students receiving each grade is arbitrarily set. 2. The meaning of a grade varies with the ability of the student group. 3. Grades can be assigned without clear reference to what has been achieved (but, of course, they should not be).

Scoring: Selection-Type Items vs. Essay Questions

Selection-type items: objective, simple, and highly reliable. Essay questions: subjective, difficult, and less reliable.

essay item

Supply-type items used to measure the ability to organize and integrate material are called

Relative Grading

When assigning grades on a relative basis, the students are typically ranked in order of performance (based on a set of test scores or combined assessment results), and the students ranking highest receive a letter grade of A, the next highest receive a B, and so on
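
The ranking procedure can be sketched as follows. The grade percentages (10/20/40/20/10) are an invented example, since, as noted under its limitations, the percent of students receiving each grade is arbitrarily set.

```python
# Illustrative relative ("on the curve") grading: rank students by score,
# then assign letter grades to preset fractions of the group.
# The 10/20/40/20/10 split is an assumption for this sketch.
def relative_grades(scores):
    ranked = sorted(scores, key=scores.get, reverse=True)
    split = [(0.10, "A"), (0.20, "B"), (0.40, "C"), (0.20, "D"), (0.10, "F")]
    grades, start = {}, 0
    for fraction, letter in split:
        count = round(fraction * len(ranked))
        for name in ranked[start:start + count]:
            grades[name] = letter
        start += count
    for name in ranked[start:]:  # rounding leftovers receive the lowest grade
        grades[name] = "F"
    return grades
```

Because grades depend on rank within the group, the same score can earn different letters in groups of different ability.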

Factors Distorting Scores: Selection-Type Items vs. Essay Questions

Selection-type items: reading ability and guessing. Essay questions: writing ability and bluffing.

Which test item measures the greatest variety of learning outcomes?

*a. Multiple-choice item. b. True-false item. c. Short-answer item.

Matching Items Strengths

1. A compact and efficient form is provided where the same set of responses fit a series of item stems (i.e., premises). 2. Reading and response time is short. 3. This item type is easily constructed if converted from multiple-choice items having a common set of alternatives. 4. Scoring is easy, objective, and reliable.

Interpretive Exercises strengths

1. An efficient means of measuring the interpretation of printed information in various forms (e.g., written, charts, graphs, maps, pictures) is provided. 2. More meaningful complex learning outcomes can be measured than with the single-item format. 3. The use of introductory material provides a common basis for responding. 4. Scoring is easy, objective, and reliable.

Rules for Writing Matching Items

1. Include only homogeneous material in each matching item. 2. Keep the lists of items short and place the brief responses on the right. 3. Use a larger, or smaller, number of responses than premises, and permit the responses to be used more than once. 4. Place the responses in alphabetical or numerical order. 5. Specify in the directions the basis for matching and indicate that each response may be used once, more than once, or not at all. 6. Put all of the matching item on the same page.

GUIDELINES FOR EFFECTIVE AND FAIR GRADING

1. Inform students at the beginning of instruction what grading procedures will be used. 2. Base grades on student achievement, and achievement only. 3. Base grades on a wide variety of valid assessment data. 4. When combining scores for grading, use a proper weighting technique. 5. Select an appropriate frame of reference for grading. 6. Review borderline cases by reexamining all achievement evidence.

Checklist for Evaluating Short-Answer Items

1. Is this type of item appropriate for measuring the intended learning outcome? 2. Does the item task match the learning task to be measured? 3. Does the item call for a single, brief answer? 4. Has the item been written as a direct question or a well-stated incomplete sentence? 5. Does the desired response relate to the main point of the item? 6. Is the blank placed at the end of the statement? 7. Have clues to the answer been avoided (e.g., "a" or "an," length of the blank)? 8. Are the units and degree of precision indicated for numerical answers?

Characteristics of a Good Student Project

1. It focuses on multiple learning outcomes. 2. It includes the integration of understanding, skills, and strategies. 3. It is concerned with problems and activities that relate to out-of-school life. 4. It involves the active participation of students in all phases of the project. 5. It provides for student self-assessment and independent learning. 6. It requires performance skills that are generalizable to similar situations. 7. It is feasible within the constraints of the students' present knowledge, time limits, and available resources and equipment. 8. It is both challenging and motivating to students. 9. It is fair and doable by all students. 10. It provides for collaboration between the students and the teacher.

Interpretive Exercises limitations

1. It is difficult to construct effective items. 2. Written material is highly dependent on reading skill. 3. This item type is highly subject to extraneous clues. 4. It is ineffective in measuring the ability to originate, organize, and express ideas.

Short answer item limitations

1. It is difficult to phrase statements so that only one answer is correct. 2. Scoring is contaminated by spelling ability when responses are verbal. 3. Scoring is tedious and time consuming. 4. This item type is not very adaptable to measuring complex learning outcomes.

Short answer item strengths

1. It is easy to write test items. 2. Guessing is less likely than in selection-type items. 3. This item type is well suited to computational problems and other learning outcomes where supplying the answer is important. 4. A broad range of knowledge outcomes can be measured.

Construction of a checklist for performance assessment involves the following steps.

1. List the procedural steps or product characteristics to be evaluated. 2. Add common errors to the list, if such is useful in diagnosing poor performance. 3. Arrange the list in some logical order (e.g., sequence of steps). 4. Provide instructions and a place for checking each item. 5. Add a place for comments at the bottom of the form, if needed.

The construction of a rating scale for performance assessment typically includes the following step

1. List the procedural steps or product characteristics to be evaluated. 2. Select the number of points to use on the scale and define them by descriptive terms or phrases. 3. Arrange the items on the rating scale so that they are easy to use. 4. Provide clear, brief instructions that tell the rater how to mark items on the scale. 5. Provide a place for comments, if needed for diagnostic or instructional purposes.

Guidelines When Preparing the Environment

1. Make certain that all tools, equipment, and instruments to be used by each student during the performance assessment are available and in good working condition. 2. Ensure that conditions that may handicap performance (e.g., weather, climate, temperature, lighting, time of day, etc.) are the same for each student. 3. Provide students with adequate space to perform. 4. Eliminate unnecessary conditions that may distract a student during performance (e.g., ringing telephone, disruptive students, conversation, etc.). 5. Allow sufficient time to observe performance, record observations, and provide feedback to the student. 6. Create activities for students waiting to perform. These activities should have educational value and occupy their time, but they should be unrelated to the performance task so that waiting students do not gain an unfair advantage over those who have already performed. 7. Create activities for students who have completed the performance assessment. Again, these activities should have educational value and occupy their time. These students should not have the opportunity to discuss the assessment with students still waiting to be assessed, who might otherwise unfairly benefit from this counsel.

Computing Composite Scores for Grading

1. Select assessments to be included in the composite score and assign percentages. 2. Record desired weight for each assessment. 3. Equate range of scores by using multiplier. 4. Determine weight to apply to each score by multiplying "desired weight" by "multiplier to equate ranges."
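
These four steps can be sketched numerically. The assessments, score ranges, and desired weights below are invented for illustration:

```python
# Sketch of computing a composite score for grading.
# name: (student_score, range_of_scores, desired_weight)
# All values here are invented for illustration.
assessments = {
    "tests":    (45, 50, 50),
    "projects": (18, 25, 30),
    "homework": (9,  10, 20),
}

# Step 3: equate ranges with a multiplier (largest range / this range).
largest = max(rng for _, rng, _ in assessments.values())

composite = 0.0
for score, rng, weight in assessments.values():
    multiplier = largest / rng
    # Step 4: weight to apply = desired weight x multiplier.
    composite += score * weight * multiplier

print(composite)
```

Without the multiplier to equate ranges, an assessment with a wide spread of scores would carry more weight in the composite than intended.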

Rules for Constructing Interpretive Exercises

1. Select introductory material that is relevant to the learning outcomes to be measured. 2. Select introductory material that is new to the examinees. 3. Keep the introductory material brief and readable. It is inefficient for both the test maker and the test taker to use extended introductory material and only one or two test items. 4. Construct test items that call forth the type of performance specified in the learning outcome.

STEPS IN PREPARING PERFORMANCE ASSESSMENTS

1. Specifying the performance outcomes. 2. Selecting the focus of the assessment (procedure, product, or both). 3. Selecting an appropriate degree of realism. 4. Selecting the performance situation. 5. Selecting the method of observing, recording, and scoring.

Rules for Writing Short-Answer Items

1. State the item so that only a single, brief answer is possible. 2. Start with a direct question and switch to an incomplete statement only when greater conciseness is possible by doing so. 3. Leave only one blank, and make sure it relates to the main point of the statement.

TEACHERS' STANDARDS FOR STUDENT ASSESSMENT

1. Teachers should be skilled in choosing assessment methods appropriate for instructional decisions. Skill in choosing appropriate, useful, administratively convenient, technically adequate, and fair assessment methods is prerequisite to good use of information to support instructional decisions. 2. Teachers should be skilled in developing assessment methods appropriate for instructional decisions. While teachers often use published or other external assessment tools, the bulk of the assessment information they use for decision making comes from approaches they create and implement. 3. Teachers should be skilled in administering, scoring, and interpreting the results of both externally produced and teacher-produced assessment methods. It is not enough that teachers are able to select and develop good assessment methods; they must also be able to apply them properly. 4. Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement. Assessment results are used to make educational decisions at several levels: in the classroom about students, in the community about a school and a school district, and in society, generally, about the purposes and outcomes of the educational enterprise. Teachers play a vital role when participating in decision making at each of these levels and must be able to use assessment results effectively. 5. Teachers should be skilled in developing valid pupil grading procedures that use pupil assessments. Grading students is an important part of professional practice for teachers. Grading is defined as indicating both a student's level of performance and a teacher's valuing of that performance. The principles for using assessments to obtain valid grades are known and teachers should employ them. 6. Teachers should be skilled in communicating assessment results to students, parents, other lay audiences, and other educators. 
Teachers must routinely report assessment results to students and to parents or guardians. In addition, they are frequently asked to report or to discuss assessment results with other educators and with diverse lay audiences. If the results are not communicated effectively, they may be misused or not used. To communicate effectively with others on matters of student assessment, teachers must be able to use assessment terminology appropriately and must be able to articulate the meaning, limitations, and implications of assessment results. 7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information. Fairness, the rights of all concerned, and professional ethical behavior must undergird all student assessment activities, from the initial planning for and gathering of information to the interpretation, use, and communication of the results.

Essay Questions strengths

1. The highest level learning outcomes (analyzing, evaluating, creating) can be measured. 2. Preparation time is less than that for selection-type items. 3. The integration and application of ideas is emphasized.

Essay Questions limitations

1. There is an inadequate sampling of achievement due to time needed for answering each question. 2. It is difficult to relate to intended learning outcomes because of freedom to select, organize, and express ideas. 3. Scores are raised by writing skill and bluffing and lowered by poor handwriting, misspelling, and grammatical errors. 4. Scoring is time consuming and subjective, and it tends to be unreliable.

Characteristics of Sound Performance Criteria

1. They describe the components that are most crucial to satisfactory completion of the performance (e.g., beware of peripheral activities that are trivial). 2. They focus on observable aspects of the performance (e.g., "Follows safety procedures," not "Demonstrates safety consciousness"). 3. They apply in various contextual settings (e.g., "skill in computation" is applicable in all contexts). 4. They represent aspects of performance that experts would agree are necessary for a successful performance (e.g., "Good organization" would be recognized by experts as basic in all types of writing). 5. They are stated in terms that are readily understood and usable by students in evaluating performance (e.g., for self-evaluation and peer evaluation). 6. They are in harmony with the instructional objectives and the use to be made of the assessment results (e.g., criteria used in judging writing skills and their improvement over time).

matching item limitations

1. This item type is largely restricted to simple knowledge outcomes based on association. 2. It is difficult to construct items that contain a sufficient number of homogeneous responses. 3. Susceptibility to irrelevant clues is greater than in other item types.

Rules for Writing Essay Questions

1. Use essay questions to measure complex learning outcomes only. 2. Relate the questions as directly as possible to the learning outcomes being measured. 3. Formulate questions that present a clear task to be performed. 4. Do not permit a choice of questions unless the learning outcome requires it. 5. Provide ample time for answering and suggest a time limit on each question.

Three websites that I can use to create tables of specifications for assessments

1. http://www.slideshare.net/ymdp08/table-of-specifications-29682915 2. http://www.biz.colostate.edu/MTI/summer/Documents/3-2Grading.pdf 3. http://www.parcconline.org/assessment-blueprints-test-specs

Writing the Report.

3.1 Has the problem been clearly stated? 3.2 Have the study procedures been adequately described? 3.3 Has the material from various sources been analyzed, compared, and evaluated? 3.4 Have the findings been integrated into a well-organized report? 3.5 Have the findings been supported by adequate and relevant information? 3.6 Does the summary include the main points? 3.7 Are the conclusions in harmony with the findings and the limits of the study? 3.8 Does the report exhibit good reasoning ability?

self-assessment

An evaluation of your strengths and weaknesses.

performance assessment

Assessment of a student's ability to perform tasks, not just knowledge.

authentic assessment

Assessment procedures that test skills and abilities as they would be applied in real-life situations

Checklist

Assessment tool with which a teacher evaluates student performance by indicating whether specific behaviors or qualities are present or absent.

formative assessment

Assessment used throughout teaching of a lesson and/or unit to gauge students' understanding and inform and guide teaching

Extended performance examples

Designs and conducts an experiment. Writes an accurate account of the study. States valid conclusions. Writes a critique of the procedure and findings. Presents and defends the study in class.

Action verbs-CONSTRUCT: Makes a product to fit a given set of specifications (typical verbs: construct, assemble, build, design, draw, make, prepare)

Draw a diagram for an electrical circuit. Design a pattern for making a dress. Assemble equipment for an experimental study. Prepare a circle graph. Construct a weather map. Prepare an experimental design.

Action verbs-DEMONSTRATE: Performs a set of operations or procedures (typical verbs: demonstrate, drive, measure, operate, perform, repair, set up)

Drive an automobile. Measure the volume of a liquid. Operate a filmstrip projector. Perform a modern dance step. Repair a malfunctioning TV set. Set up laboratory equipment. Demonstrate taking a patient's temperature. Demonstrate the procedure for tuning an automobile.

Analytic scoring rubric

Enable a teacher to focus on one characteristic of a response at a time. Separate scores for each characteristic provide the student with clearer feedback about the strengths and weaknesses of the response. Different aspects of the essay are specified, such as content, organization, or word choice, and points are assigned based on the level of fulfillment of each category. The rubric should explain what type of performance is necessary to receive each of the differing score levels in each category.
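
As a sketch, an analytic rubric can be represented as per-category score descriptions. The categories, levels, and wording below are assumptions for illustration:

```python
# Illustrative analytic scoring rubric: each characteristic of the response
# is scored separately, so the student gets feedback on each aspect.
# Categories, levels, and descriptions are invented for this sketch.
rubric = {
    "content":      {5: "thorough and accurate", 3: "adequate", 1: "inaccurate"},
    "organization": {5: "clear structure", 3: "uneven", 1: "disorganized"},
    "word choice":  {5: "precise", 3: "serviceable", 1: "vague"},
}

scores = {"content": 4, "organization": 3, "word choice": 5}
total = sum(scores.values())
print(scores, total)  # per-category scores are reported alongside the total
```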

Probable Effect on Learning: Selection-Type Items vs. Essay Questions

Selection-type items encourage students to remember, interpret, and use the ideas of others. Essay questions encourage students to organize, integrate, and express their own ideas.

summative assessment

Evaluation at the conclusion of a unit.

diagnostic assessment

Highly specialized, comprehensive and detailed procedures used to uncover persistent or recurring learning difficulties that require specially prepared diagnostic tests as well as various observational techniques.

Learning Outcomes Measured: Selection-Type Items vs. Essay Questions

Selection-type items are good for measuring the recall of knowledge, understanding, and application levels of learning, but inadequate for organizing and expressing ideas. Essay questions are inefficient for measuring the recall of knowledge, but best for measuring the ability to organize, integrate, and express ideas.

Preparation of Items: Selection-Type Items vs. Essay Questions

For selection-type items, preparation of good items is difficult and time consuming. For essay questions, preparation of good items is difficult, but easier than for selection-type items.

For which of the following general outcomes is the essay item least appropriate?

Remembering

Holistic Scoring Rubric

The assignment of a score based on the overall quality of a performance or product rather than on a consideration of individual elements. Holistic rubrics can generally be constructed more rapidly than analytic scoring rubrics and grade the essay as a whole. Like analytic rubrics, they should describe what level of performance is needed to receive each score. While holistic rubrics are easier and faster to use than analytic rubrics, they will most likely not provide students with sufficient feedback.

Sampling of Content: Selection-Type Items vs. Essay Questions

Selection-type items: the use of a large number of items results in broad coverage, which makes representative sampling of content feasible. Essay questions: the use of a small number of items limits coverage, which makes representative sampling of content infeasible.

informal observation

the method most often used by preschool teachers to collect data; it is more appropriate for program planning

restricted-response questions

place strict limits on the answers to be given

The matching item consists of

premises and responses

When a specific procedure is not necessary to produce a product, the focus of assessment should be on the

product

Short-answer items are typically limited to measuring a student's ability to

remember information

Supply-type items

require students to produce the answer. This may be a single-word or a several-page response. Supply-type items are typically divided into (1) short-answer items, (2) restricted-response essays, and (3) extended-response essays.

One method to diagnose a student's depth of understanding during a performance assessment is to

require the student to explain the purpose of each step

