Role of Assessment

Standardized test

A commercially developed test that samples behavior under uniform procedures.

Holistic Rubric

A holistic rubric makes assessment decisions based on a global, overall impression of the work rather than on its individual components.

Point Grading System

A point grading system is fairly simple and easy to use. The importance of each assignment, quiz, or test is reflected in the points allocated. For example, you may decide that assignments will be worth 10 points, quizzes 25 points, and tests 100 points. At the end of the grading period, the points are added up, and grades are assigned according to the established grade range.
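As a concrete illustration, here is a minimal Python sketch of the computation; the point values match the example above, but the specific scores and the 90/80/70/60 grade ranges are hypothetical stand-ins for whatever allocation and ranges a teacher establishes.

```python
# A minimal sketch of a point grading system. Scores and grade ranges
# below are hypothetical; a teacher would substitute the point
# allocation and grade range actually established for the class.

def letter_grade(earned, possible):
    """Convert total points earned into a letter grade."""
    percent = 100 * earned / possible
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"

# (points earned, points possible): two 10-point assignments,
# two 25-point quizzes, and one 100-point test
scores = [(9, 10), (8, 10), (22, 25), (19, 25), (88, 100)]
earned = sum(e for e, _ in scores)
possible = sum(p for _, p in scores)
print(earned, possible, letter_grade(earned, possible))  # 146 170 B
```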

Brief-essay item

A question to which respondents formulate a short-answer response in their own words or solve a problem.

Extended essay item

A question to which respondents formulate responses of several paragraphs in their own words.

Alternate-choice item

A statement to which respondents react either positively or negatively.

Completion item

A statement with a missing word or phrase, which must be supplied by the respondent.

Multiple-choice item

A test question with a stem that poses a problem or asks a question to be answered by one of several alternative responses.

Weighted Grading System

A weighted grading system is more complex than the point grading system. Every assignment is given a letter grade, and all grades are then weighted to arrive at a final grade. The determination of a final grade can be made simpler and more objective by changing grades to numerical values: A = 4; B = 3; C = 2; D = 1; F = 0. Once numerical values are assigned, an average score is calculated for homework, quizzes, and tests. For example, you would calculate a homework average for seven homework assignments with the grades of A, B, B, C, C, D, and A by carrying out the following computation: (4 + 3 + 3 + 2 + 2 + 1 + 4) ÷ 7 = 19 ÷ 7 ≈ 2.71, an average that falls between a C and a B.
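A minimal Python sketch of the full weighted computation follows. The homework grades are those from the example above; the category weights (homework 20%, quizzes 30%, tests 50%) and the quiz and test grades are hypothetical, since the actual weights are whatever the teacher assigns.

```python
# A minimal sketch of a weighted grading system, using the standard
# A=4 ... F=0 conversion described above. Category weights and the
# quiz/test grades are hypothetical.

GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def category_average(letter_grades):
    """Average the numeric values of one category's letter grades."""
    return sum(GRADE_POINTS[g] for g in letter_grades) / len(letter_grades)

weights = {"homework": 0.20, "quizzes": 0.30, "tests": 0.50}
grades = {
    "homework": ["A", "B", "B", "C", "C", "D", "A"],  # average 19/7 ~ 2.71
    "quizzes": ["B", "B", "A"],                       # average 10/3 ~ 3.33
    "tests": ["B", "C"],                              # average 2.5
}

final = sum(weights[c] * category_average(grades[c]) for c in weights)
print(round(final, 2))  # 0.2*2.71 + 0.3*3.33 + 0.5*2.5 ~ 2.79, between C and B
```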

Assigning Grades

Absolute grading standard: Performance is compared with an established set of criteria.
Relative grading standard: Students' performance is compared with that of classmates, including grading on the curve.
Point grading system: Student work is allocated points, and grades are assigned according to an established grade range.
Weighted grading system: Assignments are given a letter grade, and all grades are weighted to determine the final grade.
Percentage grading system: The percentage correct is recorded for each assignment, and an average is calculated to determine the final grade.
Grade contract: A written agreement between student and teacher as to what students will do to earn a specific grade.

Essay

Advantages: Measures higher cognitive levels; less time needed to construct.
Disadvantages: Difficult to score; questions sometimes ambiguous.

Matching

Advantages: Large sampling of content; can test associations; easy to construct and score.
Disadvantages: Tests for recognition; guessing.

Multiple Choice

Advantages: Large sampling of content; scoring simple and fast; reduces guessing; measures a wide range of cognitive levels.
Disadvantages: Often used to test trivial content; question construction is time-consuming.

Completion

Advantages: Large sampling of content; easy to construct; limited guessing.
Disadvantages: Tests for memorization; writing good items is difficult; difficult to score.

Alternate Choice

Advantages: Large sampling of content; easy to score.
Disadvantages: Guessing; writing clear items is difficult; tends to test memorization.

Analytic Rubric

An analytic rubric makes assessment decisions by examining the individual components or criteria of the work separately.

Matching item

An arranged series of premises, each of which is matched with a specific item from a second list of responses.

Teacher-made test

An evaluative instrument developed and scored by a teacher for classroom assessment.

Standard deviation

An understanding of the normal curve requires a basic knowledge of the concept of variability; that is, you must understand standard deviation. The standard deviation is a measure of the extent to which scores are spread out around the mean. The greater the variability of scores around the mean, the larger the standard deviation. A small standard deviation means that most of the scores are packed in close to the mean. A large standard deviation means that the scores are spread out. When all the scores are identical, the standard deviation is zero.
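In symbols, this is the standard population formula: for N scores x_1, ..., x_N with mean \bar{x},

\sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 }

When every score equals the mean, every term in the sum is zero and \sigma = 0, matching the description above.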

Student work samples

Can be used to offer credible evidence of student learning and to provide evidence of teacher effectiveness. Work samples portray the learning process of students over a sufficiently long period of time for appreciable progress in learning to occur. They might include (a) written work such as a report, science experiment results, test, or story; (b) artwork; (c) tape recordings; (d) a constructed project done in art, industrial arts, or an appropriate subject; and (e) other types of finished products, depending on the subject area. Work samples collected over sufficiently long periods of time allow teachers to assess and evaluate the success of their teaching and the progress of their students in relation to the objectives and standards delineated in the planning of instruction, and to reflect on student achievement in relation to teaching and make the changes necessary for improved teaching and student success. Work samples also give teachers the opportunity to think reflectively about teaching: planning, instruction, assessment, management of the learning environment, and professionalism. They give teachers the opportunity to demonstrate with hard evidence that they are, in fact, able to foster learning gains in students.

Quizzes

Classroom quizzes can be used for evaluating student progress. In fact, quizzes are an excellent way to check homework and find out whether concepts from the preceding lesson were understood. Teacher quizzes differ from regular teacher-made tests in that they usually consist of three to five questions and are limited to the material taught in the immediate or preceding lesson. They are easy to develop, administer, and grade; thus, they provide prompt evaluative information to both students and teacher. Quizzes encourage students to keep up with their homework, and they show students their strengths and weaknesses in learning. In addition, quizzes help teachers improve instruction by providing feedback related to their effectiveness. Problems identified through quizzes serve as early warning signals of teaching or learning problems. Early identification allows the teacher to focus on problems before they worsen.

Assessment Concepts

Diagnostic evaluation: Evaluation administered prior to instruction for placement purposes.
Formative evaluation: The use of evaluation to supply feedback during the course of a program.
Summative evaluation: A judgment made at the end of a project that determines whether it has been successful; commonly used to assign grades.
Competitive evaluation: Evaluation that forces students to compete with each other.
Noncompetitive evaluation: Evaluation that does not force students to compete with each other.
Performance assessment: Assessment in which students demonstrate the behaviors to be measured.
Student work sample: A collection of students' work over a sufficiently long period of time.
Portfolio: A systematic, organized collection of evidence that documents growth and development and that represents progress made toward reaching specified goals and objectives.
Standard score: A score based on the number of standard deviations an individual is from the mean.
Percentile: The point on a distribution of scores below which a given percentage of individuals fall.
Reliability: The extent to which individual differences are measured consistently; the coefficient of stability of scores.
Validity: The extent to which measurement corresponds with criteria, that is, the ability of a device to measure what it is supposed to measure.
Usability: The suitability of a measurement device for collecting desired data.

Pretest Evaluation

Diagnostic pretest evaluations normally are administered before instruction to assess students' prior knowledge of a particular topic for placement purposes. Diagnostic pretests can also be used to identify specific deficits that need remediation. More sophisticated assessments can help teachers determine students' cognitive styles and the depth of their understanding of complex concepts. Diagnostic evaluations, however, may also become necessary during the course of study when the teacher feels students are having difficulty with the material. The purpose of diagnostic pretest evaluations generally is to anticipate potential learning problems and, in many cases, to place students in the proper course or unit of study.

Authentic Assessment

Educators are trying to redesign schools to reflect changing world conditions. An essential element of that redesign is the way student learning is assessed. Simply testing isolated skills or retained facts does not effectively measure a student's capabilities. To accurately evaluate what students have learned, an assessment method must examine their collective abilities. This can be accomplished through authentic assessment. Authentic assessment usually includes a task for students to perform and a rubric by which their performance on the task will be evaluated. Authentic assessment presents students with real-world situations that require them to apply their relevant skills and knowledge. In other words, students apply their skills to authentic tasks and projects. In authentic assessment, students:

• Make oral reports
• Play tennis
• Write stories and reports
• Solve math problems that have real-world applications
• Do science experiments
• Read and interpret literature

In authentic assessment, assessment drives the curriculum. That is, teachers first determine the tasks that students will perform to demonstrate the desired outcomes, and then a curriculum is developed that will enable students to perform those tasks well, which would include the acquisition of essential knowledge and skills. This has been referred to as planning backwards (see Chapter 6). Authentic assessment has evolved to encompass a range of approaches, including portfolio assessment, journals and logs, products, videotapes of performances, and projects. Portfolio assessment is presently being widely adopted in many schools.

Systems of Evaluation

Evaluation systems can be grouped into two categories: competitive and noncompetitive.

Formative Evaluation

Formative evaluation is carried out during instruction to provide feedback on students' progress and learning. It is used in monitoring instruction and promoting learning. Formative evaluation is a continuous process, but comparatively little use has been made of it. Thus, although pretest evaluation alone usually is considered diagnostic, formative evaluation is also diagnostic in that it provides information about the strengths and weaknesses of students. Formative evaluation generally focuses on small, independent pieces of instruction and a narrow range of instructional objectives. Essentially, formative evaluation answers the question "How are you doing?" using checkup tests, homework, and classroom questioning. You should use the results obtained from formative evaluation to adjust your instruction or revise the curriculum, rather than to assign grades.

Percentile score

Indicates the percentage of the population whose scores fall at or below a given score. A percentile score of 20, for example, means that 20% of the group falls at or below that score and 80% falls above it.

Homework

It is not essential for teachers to grade every student assignment and record a grade. For assignments that are designed for understanding and practice, allow students to check themselves or allow students to grade each other's papers. Have students trade papers, sign the papers they grade, and use a special color of pen to mark the papers. You might want to spot check some of the student-checked papers.

Contracting for Grades

Most schools give teachers considerable freedom in establishing grading standards. Some teachers have used this flexibility in implementing a contract approach to grading. With a contract, the teacher promises to award a specific grade for specified performance. Students know exactly what they must do to receive a certain grade; depending on the amount of work they wish to do, they receive a particular grade. For example, a simple contract follows:

To receive a grade of D, you must satisfactorily complete activities 1 through 6, satisfactorily complete half of the homework, and pass the posttest.

To receive a grade of C, you must complete activities 1 through 6, satisfactorily complete 60% of the homework, do one of the optional activities satisfactorily, and receive at least a C on the posttest.

To receive a grade of B, you must complete activities 1 through 6, satisfactorily complete 80% of the homework, do two of the optional activities very well, and receive at least a B on the posttest.

To receive a grade of A, you must complete activities 1 through 6, satisfactorily complete 90% of the homework, do four of the optional activities excellently, complete at least one of the major project activities satisfactorily, and receive at least a B+ on the posttest.

When you establish a contract system, you must develop sets of objectives that correspond to specific letter grades. You then decide the activities and assignments that will be required at each level. These objectives, corresponding letter grades, and requirements are shared with students in writing, so students can study them and make decisions on a contract grade.

Standard scores

Most schools report student performances in terms of standard scores such as z scores, T scores, and stanine scores, as well as in terms of percentiles. These methods use the normal distribution curve to show how a student's performance compares with the distribution of scores above and below the mean. Standard scores provide a standard scale by which scores on different evaluative instruments, taken by different groups, may be reasonably compared.
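For reference, the standard definitions behind these scores: a raw score x from a distribution with mean \bar{x} and standard deviation s has

z = \frac{x - \bar{x}}{s}, \qquad T = 50 + 10z

so a score one standard deviation above the mean has z = 1 and T = 60. Stanines divide the same scale into nine bands (approximately 2z + 5, rounded and capped at 1 and 9), with stanine 5 centered on the mean.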

Portfolios

Portfolios are a very popular form of authentic assessment. The portfolio is a purposeful collection of student work that tells the story of the student's efforts, progress, or achievement in a given area. Teachers design activities resulting in student-made products that are collected to make a portfolio for each student. The portfolio of work is then used as the product by which students are evaluated. Some teachers also require students to include a reflection upon their skills and accomplishments as part of their portfolios. The reflective caption is designed so that the student can explain why he or she chose a particular piece of evidence. A portfolio can be thought of as a systematic, organized collection of evidence that documents growth and development and that represents progress made toward reaching specified goals and objectives. Portfolios enable students to display their skills and accomplishments for examination by others. Portfolio advocates suggest that portfolios are a more equitable and sensitive portrait of what students know and are able to do. They encourage teachers and schools to focus on important student outcomes.

Posttest Evaluation (Summative)

Posttest (summative) evaluation is the final phase in an evaluation program. Because posttest evaluation is primarily aimed at determining student achievement for grading purposes, it is generally conducted at the conclusion of a chapter, unit, grading period, semester, or course. Thus, posttest evaluation is used for determining student achievement and for judging teaching success. Grades provide the school with a rationale for passing or failing students and are usually based on a comprehensive range of accumulated behaviors, skills, and knowledge. Posttest evaluation, as the term implies, provides an account of students' performances. It is usually based on test scores and written work related to cognitive knowledge and rarely addresses such areas of learning as values, attitudes, and motor performance. Student performance on end-of-chapter tests, homework, classroom projects, and standardized achievement tests is commonly used in posttest evaluation. Posttest evaluation can be used in judging not only student achievement but also the effectiveness of a teacher or a particular school curriculum. The data collected and the instrumentation used in collecting the data differ, depending on the type of posttest evaluation being considered.

Recordkeeping

Recordkeeping is often a burden for teachers and is time-consuming. It also requires a great deal of accuracy. Fortunately, the recordkeeping burden can be reduced by computer technology. For example, electronic grade books can keep track of students' assessment data and other classroom information. Most electronic grade books available today can store many types of student information, including test scores, homework grades, project grades, semester averages, teacher judgments, and attendance. Some programs can also handle missing assignments, automatically e-mail parents, and keep records of student and parent information such as mailing addresses, phone numbers, locker numbers, book numbers, and other details. Many programs allow each component of the assessment system to be weighted, and the program will then compute the students' 6- or 9-week grades and semester grades based on the formula you create. Some school districts give parents and/or students the opportunity to connect with the school computer, enter a personal identification number and a password, and access the student's grades and teacher assignments and comments.
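As a rough illustration of what such a program computes, here is a minimal Python sketch of a grade book's weighted-average calculation. The class name, component weights, and scores are all hypothetical; real grade book software adds the attendance, missing-work, and reporting features described above.

```python
# A minimal sketch of an electronic grade book's weighted-average
# computation. Names, weights, and scores are hypothetical.

from collections import defaultdict

class GradeBook:
    def __init__(self, weights):
        # weights: component name -> fraction of the term grade (sums to 1.0)
        self.weights = weights
        # student -> component -> list of percentage scores
        self.scores = defaultdict(lambda: defaultdict(list))

    def record(self, student, component, percent):
        self.scores[student][component].append(percent)

    def term_grade(self, student):
        # Weighted average of each component's mean percentage.
        # Assumes at least one score exists for every weighted component.
        comps = self.scores[student]
        return sum(w * sum(comps[c]) / len(comps[c])
                   for c, w in self.weights.items())

book = GradeBook({"homework": 0.25, "quizzes": 0.25, "tests": 0.50})
book.record("Ann", "homework", 90)
book.record("Ann", "quizzes", 80)
book.record("Ann", "tests", 70)
print(book.term_grade("Ann"))  # 0.25*90 + 0.25*80 + 0.50*70 = 77.5
```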

Reliability

Reliability is the consistency with which a measurement device gives the same results when the measurement is repeated. In other words, it is the measurement device's trustworthiness or dependability. A reliable bathroom scale, for example, gives identical weights for each of three separate weighings taken in a single morning. If, on the other hand, the three weights differ by 5 pounds, the scale could not be considered very reliable. Likewise, a true/false test that is so ambiguously worded that students are forced to guess would probably yield different scores from one administration of the test to the next. In short, it would be extremely unreliable. How can teachers increase the reliability of their measurement devices? Basically, the reliability of measurement instruments can be improved by incorporating the following suggestions into their construction.

1. Increase the number of evaluative items. Reliability can be improved by increasing the amount of data collected. Because you have a larger sample of the trait being evaluated, chance errors will tend to cancel each other out. Thus, a test of 30 items is more reliable than one of 20 items.

2. Establish optimum item difficulty. Reliability can be increased by making the items being evaluated (test or observational) of moderate difficulty. In effect, moderate difficulty gives a moderate spread of scores, which allows you to better judge each student's performance in relation to other students. Conversely, difficult and easy items result in bunched scores, which make it more difficult to differentiate among scores. Thus, tests made up of moderate items spread the data over a greater range than devices composed mainly of difficult or easy items. In the case of observational scales, a 5-point scale generally gives more reliable information than a 7-point scale, which asks for finer distinctions than raters can make consistently, or a 3-point scale, which offers too few distinctions to spread the data.

3. Write clear items and directions. Reliability is improved when students clearly understand what is being asked. Ambiguities and misunderstood directions lead to irrelevant errors.

4. Administer the evaluative instrument carefully. Reliability is improved when students are not distracted by noises and are not rushed.

5. Score objectively. Reliability is greater when objective data are collected. With subjective data, internal differences within the scorer can result in identical responses or behaviors being scored differently on different occasions.

Measurement Accuracy

Reliability, validity, and usability are three important qualities of every measurement device. If a teacher-made test reveals that 50% of an algebra class was unable to solve algebraic equations, should the teacher be concerned? The answer depends on the reliability, validity, and usability of the test—that is, the ability of the test to consistently measure what it is supposed to measure: problem-solving ability.

Constructing a rubric

Step 1. Examine the standards or objectives that the product or performance is meant to address.
Step 2. Write or identify the criteria that will be used to judge the student's product or performance, and make sure they match the standards or objectives.
Step 3. Design a frame by deciding on the major categories or attributes the rubric will address.
Step 4. Describe the different levels of performance (exceptional, very good, adequate, etc.) that match each criterion. Be sure to choose words or phrases that show the actual differences among the levels, and make sure they are observable.
Step 5. Test the rubric with students to make sure it is understandable.
Step 6. Revise the rubric as necessary.

Performance Assessment

Students demonstrate the behaviors that the assessor wants to measure (Airasian, 2001; Meyer, 1992). For example, if the desired behavior is writing, students write; or if the desired behavior is identification of geometric figures, they draw or locate geometric figures. Assessment is done by measuring the individual works against specified criteria that match the objectives toward a specific purpose. In effect, samples of students' work are compiled for evaluation. The students, the teacher, or both can select items for assessment of performance. These items are often accumulated in portfolios, thus allowing students to display a variety of evidence of performance.

Non-competitive evaluation (Criterion Referenced)

Noncompetitive evaluation systems do not require interstudent comparisons; rather, they are based on established standards of mastery (criterion referenced). Some researchers suggest that criterion-referenced evaluation (which does not force competition among students) contributes more to student progress than does norm-referenced evaluation. In effect, these researchers suggest that not all students are motivated through competition. In fact, they suggest that competition can discourage less able students who are forced to compete with more capable students. Competition can even be harmful to more capable students because it often teaches that winning is all-important. Criterion-referenced evaluation focuses on assessing students' mastery of specific skills, regardless of how other students did on the same skills.

Competitive evaluation (Norm Referenced)

Competitive evaluation systems force students to compete with each other (norm referenced). Most evaluators concerned with students' standing within a group make use of the normal curve. This curve is commonly called the natural curve or chance curve because it reflects the natural distribution of all sorts of things in nature. This distribution is shown in Figure 8.1. Such a curve is appropriately used when the group being studied is large and diversified. Within a classroom, the curve can be used to give teachers an idea of how well a student has performed in comparison with classmates.

Percentage Grading System

The percentage grading system is probably the simplest of all grading systems and the most widely used. The system typically relies on the calculation of the percentage correct of the responses attempted. For example, a student who gets 20 of 25 correct on a homework assignment has a score of 80 written in the grade book; a student who gets 6 of 8 correct on a quiz has a 75 recorded; and a student who gets 40 of 60 correct on an examination has a 67 written in the grade book. You typically calculate an average of these and other term scores in arriving at a final score on which to base the term grade. The problem with this system is that all student exercises carry the same weight, even though the types of exercises are markedly different: homework, quiz, and examination. Even with the noted flaw, teachers tend to use the percentage system extensively for two reasons. First, it is simple to use and understand. Second, parents and students prefer the system because of its simplicity and their ability to understand it.
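A minimal Python sketch of this calculation, using the three scores from the example above; note that the plain average treats the homework, quiz, and examination identically, which is exactly the flaw just noted.

```python
# A minimal sketch of the percentage grading system: each exercise is
# converted to percent correct, and the term score is the plain
# (unweighted) average of those percentages.

def percent(correct, attempted):
    return round(100 * correct / attempted)

term_scores = [percent(20, 25),   # homework -> 80
               percent(6, 8),     # quiz     -> 75
               percent(40, 60)]   # exam     -> 67

print(sum(term_scores) / len(term_scores))  # (80 + 75 + 67) / 3 = 74.0
```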

Rubrics

To make the evaluation of portfolios as objective as possible, most teachers develop a scoring rubric to guide the grading process. A rubric is a summarization of the performance criteria at different levels of performance. Often, teachers label the different levels as "excellent," "good," "fair," and "poor" or with particular grades to summarize the performance. As shown in the example below, the criteria are usually listed in the column on the left, and the columns to the right of the criteria describe varying degrees of quality. As concisely as possible, the columns explain what makes a piece of work exceptional, good, or bad. A rubric is incomplete unless it contains both elements: the criteria and the descriptions of quality at each level. Generally, there are two types of rubrics: holistic rubrics and analytic rubrics. Rubrics communicate standards and scoring criteria to students before the performance.
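A minimal illustration of this layout for a writing task; the criteria, levels, and descriptors here are hypothetical and would be tailored to the actual standards being assessed:

Criterion    | Excellent                     | Good                      | Poor
Organization | Ideas sequenced logically     | Order mostly logical      | No discernible order
Mechanics    | No spelling or grammar errors | A few minor errors        | Frequent errors obscure meaning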

Usability

Usability is how well a measurement device is suited for gathering the desired information. For example, a 2-hour science test would not be suitable for a 50-minute class period. A test should be easy to administer and score, fall within budget limitations, be suitable to the test conditions, and have the appropriate degree of difficulty. Reliability, validity, and usability are all interrelated. In fact, measurement devices must be reliable and suitable for the purposes for which they are used before they can be valid. For example, if you cannot get consistent height measurements from a yardstick (not reliable), you cannot expect it to be accurate. Conversely, the measurements might be very consistent (reliable) but still not accurate (not valid). A pencil-and-paper test would hardly be suitable for evaluating the ability to hit a tennis ball. Clearly, if a measurement device is to be used in making decisions, it is essential that the information be reliable and valid, as well as suitable.

Validity

Validity is the extent to which an evaluative device measures what it is supposed to measure; that is, a valid test measures what was actually taught and learned. For instance, if social studies content knowledge was taught and learned, but students scored low on the test because they could not understand the questions, then the test is not valid. We all have had teachers who taught one thing and tested over something else, or who made the test so difficult that we performed poorly. Although there are several types of validity, the most important one to teachers is content, or face, validity, which is established by determining whether the instrument's items correspond to the content that was taught in the course.

