Profstud 301 Final
If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a high-stakes test's previously determined cut-score, which of the following indicators would be most useful for this purpose?
A conditional standard error of measurement (near the cut-score)
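For readers who want the mechanics: classical test theory ties the overall standard error of measurement to a test's score spread and reliability, and a conditional SEM simply estimates that quantity separately at different score levels, so the value reported near the cut-score describes how much borderline examinees' scores are likely to fluctuate. Below is a minimal sketch with hypothetical figures; the SD, reliability, and the example numbers are illustrative assumptions, not values from any real test.

```python
import math

def overall_sem(sd: float, reliability: float) -> float:
    """Classical estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: overall score SD of 12 points, reliability of .90.
print(round(overall_sem(12.0, 0.90), 2))  # -> 3.79 score points

# A conditional SEM replaces this single overall value with separate
# estimates at different score levels; the estimate computed near the
# cut-score is the one that speaks to classification decisions about
# examinees whose scores sit at or near that cut-score.
```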
A self-report inventory intended to measure secondary students' confidence that they are "college and career ready" has recently been developed by administrators in an urban school district. To collect evidence bearing on the consistency with which this new inventory measures students' status with respect to this affective disposition, the inventory is administered to nearly 500 students in late January and then, a few weeks later, in mid-February. When students' scores on the two administrations have been correlated, which one of the following indicators of reliability will have been generated?
A test-retest reliability coefficient
A recently established for-profit measurement company has just published a brand-new set of "interim tests" intended to measure students' progress in attaining certain scientific skills designated as "21st century competencies." There are four supposedly equivalent versions of each interim test, and each of these four versions is to be administered about every two months. Correlation coefficients showing the relationship between every pair of the four versions are made available to users. What kind of coefficient do these between-version correlations represent?
An alternate-form coefficient
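Both coefficients in the two items above are ordinary Pearson correlations; only the data differ. A minimal sketch, using hypothetical scores and the statistics.correlation function available in Python 3.10+: correlating the same examinees' scores from two administrations yields a test-retest coefficient, while correlating every pair among four versions yields the six alternate-form coefficients.

```python
from itertools import combinations
from statistics import correlation  # Python 3.10+

# Hypothetical scores for the same eight examinees on two administrations.
january  = [14, 18, 22, 9, 27, 16, 21, 12]
february = [15, 17, 24, 11, 25, 18, 20, 13]
print(round(correlation(january, february), 3))  # test-retest coefficient

# Hypothetical scores on four "equivalent" versions of an interim test;
# every pairwise correlation is one alternate-form coefficient.
forms = {
    "A": [14, 18, 22, 9, 27, 16, 21, 12],
    "B": [13, 19, 21, 10, 26, 17, 22, 11],
    "C": [15, 17, 23, 8, 28, 15, 20, 13],
    "D": [12, 20, 22, 11, 25, 18, 19, 12],
}
for f1, f2 in combinations(forms, 2):
    print(f"{f1}-{f2}:", round(correlation(forms[f1], forms[f2]), 3))
```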
Consider the following multiple-choice item, which deals with either the nature of assessment bias or ways of reducing it, and select the best of the three answer options.
Because assessment bias erodes the validity of inferences derived from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.
Why do some members of the measurement community prefer to use the phrase "absence-of-bias" rather than "assessment bias" when quantitatively reporting the degree to which an educational test appears to be biased?
Because both reliability and validity, two key attributes of educational tests, are positive, "to be sought" qualities, "absence-of-bias" is likewise a positive quality to be sought in educational tests.
Suppose that the developers of a new science achievement test had inadvertently laden their test's items with gender-based stereotypes regarding the role of women in science and, when the new test was given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in students' scores?
Construct-irrelevant variance
This illustrative essay item was written for sixth graders: "Thinking back over the mathematics lessons and homework assignments that you received during the past 12 weeks, what mathematical conclusions can you draw? Describe those conclusions in no more than 300 words, written by hand in the test booklets provided or as a printed copy of your conclusions composed on one of our classroom computers." Select the statement that most accurately appraises this essay item for sixth-grade students.
Despite its adherence to one of the chapter's item-writing guidelines for essay items, the shoddy depiction of a student's task renders the item dysfunctional.
Which one of the following four pairs of validity evidence most frequently revolves exclusively around judgments focused on test content?
Developmental-care documentation and external content reviews by nonpartisan judges
Although the way a state's public schools are run is up to officials of that state, not the federal government, the U.S. Supreme Court has ruled that state-taught students must still be granted their constitutionally guaranteed rights, and this means that teachers should be guided about classroom-assessment coverage by the U.S. Constitution.
False
Although unintended side effects of a teacher's instructional efforts are often encountered, the unpredictability of such unintended effects renders them essentially useless for serious-minded teacher evaluation.
False
A teacher's effective use of performance assessment will almost certainly lead to substantial time-savings for that teacher.
False
Because of the inherent novelty of performance tests, they should be used as a powerful method of measuring students' mastery of "to-be-memorized" bodies of factual knowledge.
False
Because the National Assessment of Educational Progress (NAEP) is widely employed as a "grade-promotion" and "diploma-denial" exam for individual students, teachers whose students take NAEP tests should familiarize themselves with the content in NAEP assessment frameworks to identify potential emphases for classroom assessments.
False
Because parents' preferences regarding what their children should be learning are not only motivationally useful for teachers to employ but also constitute significant curricular guidance for educators, teachers should strive to incorporate parents' curricular opinions in all of their classroom assessments.
False
Because students' growth in their mastery of cognitive skills and knowledge is such a patently important factor by which to evaluate the success of not only schools, but also teachers, classroom assessments should focus exclusively on measuring students' cognitive status.
False
If a commercial publisher of educational tests announces that it is selling "instructionally diagnostic tests," teachers can be relatively certain the results of such tests will provide useful instructional guidance to teachers.
False
If a teacher decides to seek advice from, say, a group of several teacher colleagues regarding the appropriateness of the content for the teacher's planned classroom assessment, professional ethics demand that the curricular counsel of those colleagues must be accepted.
False
If a fifth-grade student's performance on a nationally standardized test indicates that the student has scored with a grade-equivalent score of 7.6, this result indicates the fifth-grader has been placed at too low a grade level.
False
Most U.S. standardized aptitude tests have been developed by state departments of education in response to legislatively enacted accountability laws.
False
Norms tables based on the performances of local students in, for instance, a particular school district are typically based on a higher level of students' performance than the performance levels seen in national normative tables.
False
On an educational test created to fulfill an evaluative function, a test item will typically be most effective in distinguishing between successful and unsuccessful instruction if responses to the item are positively correlated with test-takers' socioeconomic status.
False
Performance testing, because of its requisite reliance on sometimes flawed human scoring, should be chiefly restricted to measuring students' mastery of lower-order cognitive skills.
False
Student growth percentiles, employed in a large number of state teacher-evaluation programs, are calculated by subtracting last year's mean "accountability-test percentile" for the state's entire student population from this year's mean "accountability-test percentile" to determine if any resultant differences are positive.
False
Fortunately, the students of almost all U.S. teachers are currently required to complete federally required, state-administered accountability examinations—thus making students' test performances available for use in teacher-evaluation systems.
False
Scale-score interpretations of students' performances on standardized tests are almost always based on a mean score of 500 and a standard deviation of 100.
False
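The 500/100 convention is only one of many, which is why the statement is false. A scale score is typically just a linear transformation of standardized scores, and the target mean and standard deviation are design choices that vary from test to test. A minimal sketch with hypothetical raw scores:

```python
from statistics import mean, pstdev

def scale_scores(raw, target_mean=500.0, target_sd=100.0):
    """Linearly rescale raw scores: scale = target_mean + target_sd * z."""
    m, sd = mean(raw), pstdev(raw)
    return [target_mean + target_sd * (x - m) / sd for x in raw]

raw = [34, 41, 29, 45, 38, 50, 27, 36]
print([round(s) for s in scale_scores(raw)])               # SAT-style 500/100 scale
print([round(s) for s in scale_scores(raw, 250.0, 50.0)])  # an equally legitimate alternative
```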
Value-added models (VAM) are widely accepted, reasonably accurate, and easily understood statistical procedures for evaluating the instructional effectiveness of schools and teachers.
False
The relationship between the degree to which an educational test is biased and the test's disparate impact on certain groups of learners is an important one. Which of the following statements best captures the nature of this relationship?
If an educational assessment displays a disparate impact on different groups of test-takers, it may or may not be biased.
What are the two major causes of assessment bias we encounter in typical educational tests?
Offensiveness and unfair penalization
Suppose that you and several other teachers in a middle school were trying to construct a new test intended to be predictive of high-school students' subsequent scores on the SAT and ACT college admissions exams. Moreover, suppose that you were in no particular hurry to assemble validity evidence in support of the accuracy of those inferred predictions. Which one of the following sources of validity evidence would supply the most compelling support for the validity of your anticipated predictions?
Predictive validity evidence based on the new test's relation to other variables
Based on the 2014 edition of the Standards for Educational and Psychological Testing, and on common sense, which one of the following statements about students' test results represents a potentially appropriate phrasing that's descriptive of a set of students' test performances?
Students' scores on the test permit valid interpretations for this test's use.
If a multistate assessment consortium has generated a new performance test of students' oral communication skills and wishes to verify that students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was completed, which of the following kinds of consistency evidence would be most appropriate?
Test-retest evidence of reliability
Ramon Ruiz is sorting out empty tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify Ramon's addition statement that is in error.
a. 20 bean cans plus 32 cans = 52 cans
b. 43 bean cans plus 18 cans = 61 cans
c. 38 bean cans plus 39 cans = 76 cans
d. 54 bean cans plus 12 cans = 66 cans
The assessment item appears to be biased against Americans of Latino backgrounds.
When external reviewers of a test's content attempt to judge how well a test's items mesh with a specified collection of curricular aims, which one of the following pairs of alignment indicators should be present?
The degree to which each of a test's items is aligned to one or more of the specified curricular aims and a content-coverage indication representing the proportion of the curricular aims adequately represented by the test's items
Here is an illustrative response-scoring plan devised by a high-school Latin teacher. The teacher works in an urban high school that has a long and oft-honored history of preparing students for college, and she frequently expresses during faculty meetings her complete disdain for what she calls "multiple-guess exams." As part of her annual teacher-evaluation evidence, her school's principal has asked her to present a written description of how she plans to evaluate students' responses to her constructed-response items. Consider the description supplied by the teacher, then select the statement that most accurately depicts her scoring plans.

"I plan to score my students' essay responses holistically, not analytically, because I invariably ask students to generate brief essays in which they must incorporate at least half of the new vocabulary terms encountered during the previous week. I supply students with a set of explicit evaluative criteria that I will incorporate in arriving at a single, overall judgment of an essay's quality. Actually, I always pre-weight each of these evaluative criteria and post those weights for students in advance of their tackling this task. Because this is a course emphasizing the writing of Latin (rather than oral Latin), I make it clear to my students—well in advance—that grammar and the other mechanics of writing are very important. When I score students' essays, if there is more than one essay per test, I score all of Essay One before moving on to Essay Two. Because I want these students to become, in a sense, Latin 'journalists,' I require that they clearly identify themselves with a byline at the outset of each essay. This scoring system, based on nearly 20 years of my teaching Latin to hundreds of our school's students, really works!"
The teacher's approach violates one of the chapter's essay-scoring guidelines.
Consider the following illustrative binary-choice item.

Please decide whether the following statement regarding the reliability of educational tests is True or False, placing a check after True or False to indicate your answer. True ___ False ___
When determining a test's classification consistency, there is no need to consider the cut score employed nor that cut score's location in the score distribution.

Which of the following statements best describes the illustrative item?
This illustrative item violates the item-specific guideline regarding the use of negative statements in a binary-choice item.
Consider the following illustrative matching item. Choose the best match between the item categories in List X and the strengths/weaknesses in List Y.

List X                              List Y
___ (1) matching                    a. Can cover much content
___ (2) binary-choice               b. Can test high-order cognition
___ (3) multiple binary-choice      c. May elicit only low-level knowledge
                                    d. Cannot assess creative responses

Which of the following statements best describes the quality of the illustrative item?
This illustrative matching item contains several departures from Chapter Six's item-writing guidelines for matching items.
What is the chief function of validity evidence when employed to confirm the accuracy of score-based interpretations about test-takers' status in relation to specific uses of an educational test?
To support relevant propositions in a validity argument that's marshaled to determine the defensibility of certain score-based interpretations
Which one of the following sources of validity evidence should be of most interest to teachers when evaluating their own teacher-made tests?
Validity evidence based on test content
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Mrs. Gordon makes sure she has her sixth-grade students practice each fall for the nationally standardized achievement tests required in the spring by the district's school board. Fortunately, she has been able to make photocopies of most of the test's pages during the past several years, so she can organize a highly relevant two-week preparation unit wherein students are given actual test items to solve. At the close of the unit, students are supplied with a practice test on which about 60 percent of the test consists of actual items copied from the nationally standardized commercial test. Mrs. Gordon provides students with an answer key after they have taken the practice test so that they can check their answers. Mrs. Gordon's activities constitute:
A violation of both guidelines
Which of the following indices of a test's reliability is most often provided by developers of the kinds of standardized tests destined for use with large numbers of students?
Internal-consistency reliability coefficients
Suppose that a state's governor has appointed a blue-ribbon committee to establish a test-based promotion-denial system for reducing the number of sixth-grade students who are "socially" promoted to the seventh grade. The blue-ribbon committee's proposal calls for sixth-graders to be able to take a new high-stakes promotion exam at any time they wish during their grade-six school year. Given these circumstances, which one of the following evidences of the new promotion exam's measurement consistency should be collected?
Test-retest reliability
Among the most prevalent personal-bias errors made when scoring students' responses to performance tests are generosity errors, severity errors, and central-tendency errors.
True
A test's instructional sensitivity represents the degree to which students' performances on the test accurately reflect the quality of instruction specifically provided with the intention of promoting students' mastery of what is being assessed.
True
Jürgen James, an experienced mathematics teacher, loves fussing with numbers based on his classroom assessments. He studies the performance of his previous year's students on key tests to help him arrive at criterion-referenced interpretations regarding which segments of his instructional program seem to be working well or badly. Based on the performances of both of his algebra classes, he learns that the differences in p-values for items taken by uninstructed students (based on a first-of-year pretest) and p-values for items taken by instructed students (based on final exams) are whopping. That is, when pretest p-values are subtracted from final exam p-values, the resulting differences are mostly .40 or higher. Mr. James concludes that his algebra items were doing precisely the job they were created to do. Which of the quotes from Mr. James's fellow teachers most accurately characterizes his conclusion regarding his algebra items?
"Jürgen came to the correct conclusion, and I'm not surprised by his students' p-value jumps—he is a spectacular math teacher!"
Rodney Gardner teaches history in a very large urban high school, and he has been experimenting this year with the use of different kinds of classroom assessments to gauge his students' mastery of key historical concepts and facts. About a month into the new school year, Mr. Gardner used a 25-item multiple-choice test to measure how well students had mastered a rather large array of factual information about key events in U.S. history. He calculated the p-value for each of the test's 25 four-option items, and he was relatively pleased when the average p-value for the entire set of 25 items was .56. Much later in the year, with the same class, he tried out another assessment tactic with a brand-new test consisting of 40 True/False items. He chose the binary-choice items because, after almost six months of instruction, there was substantially more historical knowledge to assess. When he calculated p-values for each of the 40 items, he was gratified to discover that the average p-value was .73. Mr. Gardner concluded that because of the p-value "bump" of .17, his students' learning had increased substantially. Please select from the four choices below the statement that most accurately describes Mr. Gardner's interpretation of his students' test results.
A serious flaw in Mr. Gardner's conclusion about his students' improved learning is that students have a higher probability of guessing correct answers from a set of early-instruction four-option items than from a later-instruction set of binary-choice items.
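The arithmetic behind this flaw is easy to demonstrate: blind guessing alone yields an expected p-value of about 1/4 = .25 on four-option items but 1/2 = .50 on True/False items, so the two averages rest on different chance floors. The sketch below applies one rough, hypothetical adjustment (performance above the chance floor, expressed as a fraction of the room available) to the scenario's averages; most of the apparent .17 "bump" disappears.

```python
# Expected p-value from blind guessing: 1 / (number of options).
chance_mc = 1 / 4   # four-option multiple choice -> .25
chance_tf = 1 / 2   # True/False                  -> .50

# Observed average p-values from Mr. Gardner's two tests.
early_mc, later_tf = 0.56, 0.73

# Performance above the chance floor, as a fraction of the room available.
adj_mc = (early_mc - chance_mc) / (1 - chance_mc)   # (.56 - .25) / .75 ~= .41
adj_tf = (later_tf - chance_tf) / (1 - chance_tf)   # (.73 - .50) / .50  = .46
print(round(adj_mc, 2), round(adj_tf, 2))  # the .17 "bump" shrinks to about .05
```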
Anita Gonzales teaches middle-school English courses. At least half of her classroom tests call for students to author original compositions. Her other tests are typically composed of selected-response items. Anita has recently committed herself to the improvement of these selected-response tests, so when she distributes those tests to her students, she also supplies an item-improvement questionnaire to each student. The questionnaire asks students as they complete their tests to identify any items that they regard as (a) confusing, (b) having multiple correct answers, (c) having no correct answers, or (d) containing unfamiliar vocabulary terms. Students are to turn in their questionnaires along with their completed tests, but are given the option of turning in the questionnaires anonymously or not. Which of the following statements most accurately portrays Anita's test-improvement procedures?
Although seeking students' judgments regarding her tests has much to commend it, Anita should have sought students' reactions to a test only after they had completed it—by distributing blank copies of the test along with the item-improvement questionnaire.
These illustrative short-answer items were created for use in a twelfth-grade English course and are intended to be used in the course's midterm exam. Please complete the short-answer items below by filling in the blank you will find in each item.
• __________ is the case to be employed with all modifiers of gerunds—definitely including pronouns.
• A __________ infinitive that, in former times, was regarded as a grammatical error is now acceptably encountered in all kinds of writing.
Which of the following assertions best reflects how these two short-answer items conform to the chapter's item-writing guidelines for such items?
Although several of the chapter's item-writing guidelines have been properly followed, there is the same, rather obvious, violation of an item-writing guideline in both items.
During last year's end-of-school evaluation conference, Jessica Jones, a high-school social studies teacher, was told by the principal that her classroom tests were "haphazard at best." Jessica now intends to systematically review each of the classroom tests she builds, based on her principal's suggestions. She intends to personally evaluate each test on the basis of (a) its likely contribution to a valid test-based inference, (b) the accuracy of its content, (c) the absence of any important content omissions, and (d) the test's fundamental fairness. Choose, from the following options, the best appraisal of Jessica's test-review plans.
Although the four test-review factors Jessica chose will help her identify certain deficiencies in her tests, she should also incorporate as review criteria a full range of widely endorsed experience-based and research-based (a) item-specific guidelines and (b) general item-writing guidelines.
A compulsive middle-school teacher, even after reading Chapter 2's recommendation urging teachers not to collect reliability evidence for their own teacher-made tests, perseverates in calculating Kuder-Richardson indices for all of his major and minor classroom exams. What kind of reliability indicator is this teacher attempting to compute?
An internal-consistency reliability coefficient
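For the curious, the Kuder-Richardson formula 20 (KR-20) this teacher is computing has a compact form: KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / total-score variance), where k is the number of items and p_i is the proportion answering item i correctly. A minimal sketch for dichotomously scored items, using hypothetical response data; a real analysis would normally rely on an established psychometrics package.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for 0/1-scored items.
    responses: one list per student, each containing a 0 or 1 per item."""
    k = len(responses[0])                                  # number of items
    totals = [sum(student) for student in responses]       # total scores
    pq_sum = 0.0
    for i in range(k):
        p = sum(student[i] for student in responses) / len(responses)
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))

# Hypothetical 5-item quiz taken by six students (1 = correct, 0 = incorrect).
data = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(round(kr20(data), 3))  # -> about 0.80 for these made-up data
```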
Please assume you are a middle-school English teacher who, despite this chapter's urging that you rarely, if ever, collect reliability evidence for your own tests, stubbornly decides to do so for all of your mid-term and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your classes, you only wish to administer the tests destined for such reliability analyses on one occasion, not two or more. Given this constraint, which of the following coefficients would be most suitable for your reliability-determination purposes?
An internal-consistency reliability coefficient
A dozen middle-school mathematics teachers in a large school district have collaborated to create a 30-item test of students' grasp of what the test's developers have labeled "Essential Quantitative Aptitude," that is, students' EQA. All 30 items were constructed in an effort to measure each student's EQA. Before using the test with many students, however, the developers wish to verify that all or most of its items are functioning homogeneously, that is, are properly aimed at gauging a test-taker's EQA. On which of the following indicators of assessment reliability should the test developers focus their efforts?
An internal-consistency reliability coefficient
This illustrative item is intended for use in a middle-school American history course.

Directions: Remembering the class discussions of America's current immigration issues, please provide a brief essay on each of the issues cited below. You will have a full 50-minute class period to complete this examination, and you should divide your essay-writing efforts equally between the two topics. In grading your twin essays, equal weight will be given to each essay. Remember, compose two clear essays—one for each issue.

Your Two Essay Topics
1. Why would some form of "amnesty" for illegal aliens be a helpful solution to at least part of today's U.S. immigration problems?
2. Why would some form of "amnesty" for illegal aliens be a disastrous solution to today's U.S. immigration problems?

Which of the following statements most accurately describes the match between the illustrative item and the Chapter 7 guidelines for creating essay items?
At least one of the chapter's guidelines has been explicitly followed in the illustrative item.
The teaching staff in a suburban middle school is concerned with the quality of their school's teacher-made classroom assessments. This issue has arisen because the district school board has directed all schools to install a teacher-evaluation process featuring "prominently weighted" evidence of students' learning—as measured chiefly by teacher-made tests. Because many of these teachers have never completed any sort of educational measurement course, they are worried about whether their teacher-made tests will be up to the evaluative challenge presented by the district's new teacher-appraisal procedure. The district office requires teachers to submit all students' responses from each classroom assessment immediately after those assessments have been administered. Then, in less than two weeks after submission, teachers receive descriptive statistics for each test (such as students' means and standard deviations). Teachers also receive an internal consistency reliability coefficient for the total test and, in addition, a p-value and an item-discrimination index for each item. Teachers then must personally judge the quality of their own tests' items. The teachers' reviews of their test's individual items are seen as "subjective" by almost everyone involved, whereas the empirical evidence of item quality is regarded as "objective." Thus, the school's faculty unanimously decides to weight teachers' own per-item judgments at 25 percent while weighting the statistical per-item p-values and item-discrimination indices at 75 percent. Please select the statement that most accurately characterizes the test-improvement procedures in this suburban middle school.
Because the relevance of traditional item-quality indicators, such as those supplied by this school's district office, can vary depending on the specific use to which a teacher-made test will be put, the across-the-board weightings (25 percent judgmental; 75 percent empirical) may be inappropriate for the proposed teacher-evaluation process.
Only one of the following statements about a test's classification consistency is accurate. Select the accurate statement regarding classification consistency.
Classification consistency indicators represent the proportion of students classified identically on two testing occasions.
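A small sketch can make the computation concrete. Given a cut score, classification consistency is simply the proportion of examinees who land in the same category (for example, pass/fail) on both testing occasions. The scores and cut score below are hypothetical.

```python
def classification_consistency(first, second, cut):
    """Proportion of examinees classified the same way on both occasions,
    where "pass" means scoring at or above the cut score."""
    same = sum((a >= cut) == (b >= cut) for a, b in zip(first, second))
    return same / len(first)

# Hypothetical scores from two administrations, with a cut score of 20.
occasion_1 = [24, 18, 21, 15, 26, 19, 22, 20]
occasion_2 = [22, 17, 19, 16, 27, 21, 23, 18]
print(classification_consistency(occasion_1, occasion_2, cut=20))  # -> 0.625
```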
Mr. Wong, a second-year mathematics teacher in a large urban high school, is seeking frequent reactions to his teacher-made tests from the other mathematics teachers in his school. He typically first secures his colleagues' agreement during informal faculty-lounge conversations, then relays copies of his tests—along with brief review forms—at several points in the school year. Although Mr. Wong simultaneously carries out systematic reviews of his own tests by employing what he regards as "a first-rate" test-appraisal rubric from his school district, when his own views regarding any of his test's items conflict with those of his colleagues, he always defers to the reactions of his much more experienced fellow teachers. Please choose from the following options the most accurate statement regarding Mr. Wong's test-improvement efforts.
Even though Mr. Wong is wise in seeking the item-quality reactions of his school's other math teachers—especially because he is only in his second year of teaching—the ultimate decision about the quality of any of his test items should not be deferentially based on collegial input but, rather, based on Mr. Wong's own judgment.
A district's new teacher-evaluation procedure is heavily based on observations of teachers' classroom performances. School-site administrators, along with a small group of recently retired school principals, have been observing the teachers, then supplying evaluations related to teachers' observed instructional effectiveness. When officials of the teachers' union raise a concern about these teacher-evaluators' inconsistencies of judgment when using a district-devised observation form, the district's superintendent asks her staff to collect validity evidence bearing on the teachers' union concern. Which one of the following sources of validity evidence will most likely play a major role in resolving the charge that the classroom-observation evidence is flawed?
Evidence based on response processes
Assume a state's education authorities have recently established a policy that, in order for students to be promoted to the next grade level, those students must pass a state-supervised English and language arts (ELA) exam. Administered near the close of Grades three, six, and eight, the three new grade-level exams are intended to determine a student's mastery of the official state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these "promotion-denial" exams, which one of the following sources of validity evidence are they likely to rely on most heavily?
Evidence based on test content
One of your colleagues, a high-school chemistry teacher, believes that certain of her students have somehow gained access to the final exams she has always used in her classes. To address what she calls "this serious security violation," she has created four new versions of all of her major exams—four versions that she regards as "equally challenging." She has recently sought your advice regarding what sort of reliability evidence she ought to be collecting regarding these new multiple renditions of her chemistry exams. In this situation, which one of the following should you be recommending to her?
Evidence regarding the alternate-form reliability of her several exams
Consider the following description of a fictional teacher's implementation of portfolio assessment, then indicate whether the item following the description is True or False.

Maria Flores Installs Portfolio Assessment
Having decided to adopt a portfolio assessment approach for the written-composition segment of her middle-school English classes, Maria Flores introduces her students to the new assessment scheme by asking a commercial artist friend of hers to speak to each class. The artist brings his own portfolio and shows students how it allows prospective clients to judge his work. Ms. Flores tells her students that her friend's portfolio is called a "showcase portfolio" and that students will be preparing both a showcase portfolio, to periodically involve their parents in reviewing a student's work products, and a "working portfolio," to keep track of all of their composition drafts and final products. Ms. Flores and her friend emphasize that both kinds of portfolios must be owned by the student, not the teacher. Early in the academic year, Ms. Flores works with each of her classes to decide collaboratively on the evaluative criteria to be used in the rubrics for judging the composition efforts of a given class. Although these "per-class" rubrics occasionally differ in certain respects for different classes, they are generally quite similar. Students are directed to place all of their drafts and final versions in folders and then put those folders in a designated file drawer in the classroom. Ms. Flores makes sure to review all students' portfolios at least once a month. Typically, she devotes one preparation period a day to a different class's portfolios. Because the portfolios are readily available, Ms. Flores finds it convenient and time-efficient to evaluate students' progress in this manner. She provides a brief (dated) "teacher's evaluation" for students to consider when they work with their own portfolios. At least twice every term, Ms. Flores selects what she considers to be the students' best finished compositions from their working portfolios. She places such work products in a showcase portfolio. Students are directed to take these showcase portfolios home to let their families see what kinds of compositions they have been creating. Parents are enthusiastic about this practice. A number of parents have told the school's principal that Ms. Flores's "take-home" portfolio system is the way they would like to see other aspects of their children's performances evaluated.

To be fair, Ms. Flores should have established a uniform set of evaluative criteria for all of her classes.
False
Consider the following description of a fictional teacher's implementation of portfolio assessment, then indicate whether the item following the description is True or False.

Gary Owens Gives Portfolio Assessment a Try
A third-grade teacher, Gary Owens, has just completed a summer professional-development workshop on portfolio assessment. He and a number of the teachers at his school have decided to try out performance assessment in at least limited parts of their instructional and assessment programs. Gary has decided to use portfolios with his third-graders' mathematics work for a full school year. He introduces students to the activity by stressing the importance of their personal ownership of the portfolios and the significance of their choosing the kinds of mathematics work they put in their portfolios. Gary suggests to the class that students include only problem-solution mathematics work in their portfolios. Thus, they should not put drill work and simple computational work in the portfolios. The students discuss this suggestion for a while and then unanimously agree. Early on, Gary works with students for two full days to decide on the evaluative criteria in the rubrics he and they will use when evaluating the mathematics work in the portfolios. They decide, collaboratively, that the major evaluative criteria will be (1) selection of proper solution strategies, (2) accurate completion of selected solution procedures, and (3) arrival at the correct solution to the problem. Students routinely collect their work and place it for safekeeping in specially marked cardboard boxes that Gary has arranged on the "Portfolio Shelf." Every two months, Gary holds an individual portfolio conference with each student during which he supplies the student with a "teacher's appraisal" of that student's portfolio work. It is clear to Gary that his students' ability to solve mathematics problems has improved substantially. Although it took most students several weeks to get used to the process, they now seem to thoroughly enjoy Gary's version of portfolio assessment in mathematics. He does also.

Although Gary holds individual portfolio conferences with students every two months, he should have been holding such conferences weekly.
False
Consider the following description of a fictional teacher's implementation of portfolio assessment, then indicate whether the item following the description is True or False.

Gary Owens Gives Portfolio Assessment a Try
A third-grade teacher, Gary Owens, has just completed a summer professional-development workshop on portfolio assessment. He and a number of the teachers at his school have decided to try out performance assessment in at least limited parts of their instructional and assessment programs. Gary has decided to use portfolios with his third-graders' mathematics work for a full school year. He introduces students to the activity by stressing the importance of their personal ownership of the portfolios and the significance of their choosing the kinds of mathematics work they put in their portfolios. Gary suggests to the class that students include only problem-solution mathematics work in their portfolios. Thus, they should not put drill work and simple computational work in the portfolios. The students discuss this suggestion for a while and then unanimously agree. Early on, Gary works with students for two full days to decide on the evaluative criteria in the rubrics he and they will use when evaluating the mathematics work in the portfolios. They decide, collaboratively, that the major evaluative criteria will be (1) selection of proper solution strategies, (2) accurate completion of selected solution procedures, and (3) arrival at the correct solution to the problem. Students routinely collect their work and place it for safekeeping in specially marked cardboard boxes that Gary has arranged on the "Portfolio Shelf." Every two months, Gary holds an individual portfolio conference with each student during which he supplies the student with a "teacher's appraisal" of that student's portfolio work. It is clear to Gary that his students' ability to solve mathematics problems has improved substantially. Although it took most students several weeks to get used to the process, they now seem to thoroughly enjoy Gary's version of portfolio assessment in mathematics. He does also.

Gary's early effort to work with his students in determining the evaluative criteria for their rubrics was premature and should have been delayed until at least the middle of the school year so that students would better understand the nature of the mathematics skills being sought.
False
Consider the following description of a fictional teacher's implementation of portfolio assessment, then indicate whether the item following the description is True or False.

Maria Flores Installs Portfolio Assessment
Having decided to adopt a portfolio assessment approach for the written-composition segment of her middle-school English classes, Maria Flores introduces her students to the new assessment scheme by asking a commercial artist friend of hers to speak to each class. The artist brings his own portfolio and shows students how it allows prospective clients to judge his work. Ms. Flores tells her students that her friend's portfolio is called a "showcase portfolio" and that students will be preparing both a showcase portfolio, to periodically involve their parents in reviewing a student's work products, and a "working portfolio," to keep track of all of their composition drafts and final products. Ms. Flores and her friend emphasize that both kinds of portfolios must be owned by the student, not the teacher. Early in the academic year, Ms. Flores works with each of her classes to decide collaboratively on the evaluative criteria to be used in the rubrics for judging the composition efforts of a given class. Although these "per-class" rubrics occasionally differ in certain respects for different classes, they are generally quite similar. Students are directed to place all of their drafts and final versions in folders and then put those folders in a designated file drawer in the classroom. Ms. Flores makes sure to review all students' portfolios at least once a month. Typically, she devotes one preparation period a day to a different class's portfolios. Because the portfolios are readily available, Ms. Flores finds it convenient and time-efficient to evaluate students' progress in this manner. She provides a brief (dated) "teacher's evaluation" for students to consider when they work with their own portfolios. At least twice every term, Ms. Flores selects what she considers to be the students' best finished compositions from their working portfolios. She places such work products in a showcase portfolio. Students are directed to take these showcase portfolios home to let their families see what kinds of compositions they have been creating. Parents are enthusiastic about this practice. A number of parents have told the school's principal that Ms. Flores's "take-home" portfolio system is the way they would like to see other aspects of their children's performances evaluated.

Ms. Flores and her artist friend should never have urged middle-school students to personally own their portfolios, particularly while students were first learning about portfolio-assessment procedures.
False
Consider the following description of a fictional teacher's implementation of portfolio assessment, then indicate whether the item following the description is True or False.

Maria Flores Installs Portfolio Assessment
Having decided to adopt a portfolio assessment approach for the written-composition segment of her middle-school English classes, Maria Flores introduces her students to the new assessment scheme by asking a commercial artist friend of hers to speak to each class. The artist brings his own portfolio and shows students how it allows prospective clients to judge his work. Ms. Flores tells her students that her friend's portfolio is called a "showcase portfolio" and that students will be preparing both a showcase portfolio, to periodically involve their parents in reviewing a student's work products, and a "working portfolio," to keep track of all of their composition drafts and final products. Ms. Flores and her friend emphasize that both kinds of portfolios must be owned by the student, not the teacher. Early in the academic year, Ms. Flores works with each of her classes to decide collaboratively on the evaluative criteria to be used in the rubrics for judging the composition efforts of a given class. Although these "per-class" rubrics occasionally differ in certain respects for different classes, they are generally quite similar. Students are directed to place all of their drafts and final versions in folders and then put those folders in a designated file drawer in the classroom. Ms. Flores makes sure to review all students' portfolios at least once a month. Typically, she devotes one preparation period a day to a different class's portfolios. Because the portfolios are readily available, Ms. Flores finds it convenient and time-efficient to evaluate students' progress in this manner. She provides a brief (dated) "teacher's evaluation" for students to consider when they work with their own portfolios. At least twice every term, Ms. Flores selects what she considers to be the students' best finished compositions from their working portfolios. She places such work products in a showcase portfolio. Students are directed to take these showcase portfolios home to let their families see what kinds of compositions they have been creating. Parents are enthusiastic about this practice. A number of parents have told the school's principal that Ms. Flores's "take-home" portfolio system is the way they would like to see other aspects of their children's performances evaluated.

Ms. Flores's major omission in her implementation of portfolio assessment is her failure to engage her students in one-on-one portfolio conferences during the school year. Because she teaches at the middle-school level, Ms. Flores should not have used both showcase and working portfolios at the same time with her students.
False
Note that this Mastery Check was specifically designed both to provide evidence regarding your outcome mastery and to fulfill a diagnostic function. It addresses four topics: (1) the relationship between affective assessment and instruction (Items 1, 5, 9, 13, and 17), (2) self-report inventories (Items 2, 6, 10, 14, and 18), (3) respondents' anonymity (Items 3, 7, 11, 15, and 19), and (4) how to interpret results of affective assessment (Items 4, 8, 12, 16, and 20). Thus, you can determine how well you seem to understand each of the four topics by focusing on how you performed on the five items for that topic.

If respondents who are completing an affective self-report inventory that's presented by a computer are informed at the outset that their responses will be anonymous, it can be safely assumed that almost all students will believe their responses to be truly anonymous.
False
Darrell Ito teaches English in a suburban middle school and has been intrigued by his principal's strong advocacy of formative assessment. He has read several articles about formative assessment and borrowed a book from his school's professional-development library dealing solely with how a teacher launches a formative-assessment program. Based on what he has read, Darrell decides not to use formative assessment when his students are learning how to create original compositions but, rather, when pursuing any curricular aims involving "Writer's Rules," such as punctuation rules, spelling rules, and usage conventions. Moreover, he decides to rely almost exclusively on students' self-reported understandings, that is, their use of red, green, and yellow plastic cups to indicate the degree to which they are "getting it" as the class proceeds. Every few weeks, based on his analysis of the sequence of rules his students must master (part of what he calls a "learning progression"), Darrell orally presents a set of three or four "Writer's Rules" to his students. After each rule has been presented, Darrell says, "Traffic-signal colors, class!" At that point, students put a green cup on top of their cup-stack to indicate that they understand the presented rule well. Students put a yellow cup on top of their stack if they are unsure about their understanding of the rule just given. And, of course, the red cup goes on top of the stack if a student really is baffled by the particular rule that Darrell has described. As students' cup stacks are being rearranged, Darrell visually surveys the stacks to determine which colors predominate. Because formative assessment calls for an obligatory instructional adjustment based on assessment-elicited evidence, Darrell provides additional instruction regarding each rule—but tries to make the adjusted instruction quite different from the way he taught the rule earlier. The greater the prevalence of yellow or red cups, the more instructional time Darrell devotes to what he calls his "second stab" at rule-teaching. Based on several months of Darrell's attempt to implement formative assessment in his English classes, almost all students are pleased with the approach. So is Darrell.

Darrell's decision to use formative assessment only with the more explicit Writer's Rules, rather than with the promotion of students' actual composition skills, was acceptable chiefly because formative assessment is not effective when used in pursuit of students' cognitive skills that must be measured by constructed-response items.
False
Emily Contreras teaches "Modern Spanish" in a large suburban high school whose Latino students constitute less than 10 percent of the student body. Sensing the approach of substantial demographic changes in the United States, she wants to make certain that many more of her school's non-Latino students have at least a modicum of Spanish-speaking skills. Because Emily has a solid belief in the value of assessment-illuminated instruction and especially the merits of formative assessment, she was particularly pleased last year to see that a commercial test organization had published a set of new "formative assessments in Spanish designed for students at all levels of Spanish-speaking proficiency." Emily persuaded her school's principal, in collaboration with principals from several other district high schools, to purchase sufficient copies of these new tests to be used in the manner the tests' publisher specifies. The tests, according to their publisher, are to be administered four times a year—at the beginning of the school year, at its conclusion, and at two separate times during the middle three months of the school year. In this way, the publisher asserts, "the tests' formative dividends" will be maximized for teachers and students alike. The new tests are described by their developers as "consistent with findings of the widely accepted Black and Wiliam research review of 1998" and can also function as excellent predictors of high-school students' subsequent college accomplishments if they take additional courses in Spanish. Emily is simply delighted that these new assessments in Spanish, previously unavailable wherever she has been teaching, can be used in her classes.

Emily's acceptance of the prescribed fashion in which the new tests are to be administered leads to the warranted conclusion that she is definitely engaged in the formative-assessment process.
False
Even though teachers should not take away too much instructional time because of their classroom assessments, the assessment targets addressed by any classroom test should still be numerous and wide-ranging so that more curricular content can be covered.
False
For evaluation of instructional quality, if the evidence of students' learning collected from external accountability tests disagrees with the evidence of students' learning collected from teacher-made classroom assessments, the accountability-test evidence should always trump the classroom-assessment evidence.
False
From a purely statistical perspective, when student growth is to play a prominent role in the evaluation of teachers and pre-instruction assessment results are to be contrasted with post-instruction assessment results, it is advantageous for a teacher's students to score particularly high on the pretest.
False
Fully 50 percent of any college student's grades can be linked back to that student's performance on the ACT or SAT while in high school.
False
George Lee has recently been asked by his principal to teach a geography course in the high school where he has taught for the past four years. Although George has never taken a college-level course in geography, his principal assures him that he can "pull this off with ease." The emergency situation was caused by the sudden illness of the school's regular geography teacher, Mr. Hibbard, just before the start of the school year. As he surveys the materials that Mr. Hibbard has left him, George sees there are five major curricular outcomes that students are supposed to achieve during the one-semester course. However, as far as George can see, Mr. Hibbard's exams seem to be based more on the content covered in the course's geography textbook than on the five curricular outcomes themselves. Nonetheless, after reviewing the five curricular goals, which Mr. Hibbard has described to him in an e-mail and also prominently posted on the classroom bulletin board, George is increasingly convinced that much of the content in the textbook (and on Mr. Hibbard's midterm and final exams) is not particularly relevant to the five "Geography Goals" emphasized by Mr. Hibbard. Given his modest familiarity with geography, however, George is delighted that Mr. Hibbard has left his exams to be used if a substitute teacher wishes to employ them. George definitely plans to do so. George urges his students to pay careful attention to the five goals as they study their geography material throughout the semester. He then determines each student's grade primarily on the basis of the two major exams in the course—that is, 50 percent for the final exam; 35 percent for the midterm exam; 10 percent for short, in-class quizzes; and 5 percent for class participation.

Is George a GGAG? (TRUE or FALSE)
False
If a state's education officials have endorsed the Common Core State Standards, but have chosen to create their state's own accountability tests to measure those standards (instead of using tests built by a multistate assessment consortium), it is still sensible for a teacher in that state to seek test-construction guidance from what's measured by consortium-created tests.
False
If ever a single term captured Mary Evan's conception of teaching, it would surely be the word "differentiation." Mary knows all too well, from her 13 years of teaching primary-grade children, that students vary enormously not only in what they have learned as part of their own family experiences but also with respect to their innate strengths and weaknesses. Accordingly, when she dispenses grades in her class, she is repelled by any evaluative procedure that fails to take these enormous differences into consideration. Because her school district's leaders insist on emphasizing the state's official content standards (curricular aims), they require that all of the district's teachers structure their major semester grades around the state's content standards. In her case, however, Mary differentiates by adjusting each student's grade so that it represents how well a student has mastered the state-approved curricular aims—but always in relation to each student's actual "potential to excel." What Mary's approach means, in practical terms, is that sometimes students whose potential, according to a judgment by Mary, is quite limited may, even though scoring much lower on key exams than many other students, end up with the same grade as other, more able students.

Is Mary a GGAG? (TRUE or FALSE)
False
In recognition of the significant impact a state's official accountability tests can have on what that state's students ought to be learning, it is apparent that a teacher's classroom tests should only measure the very same skills and knowledge that are assessed on the state's accountability tests—and never assess students' mastery of en route skills or bodies of knowledge that teachers might see as contributory to mastering what's measured on the state tests.
False
Information about how to differentiate the quality of students' responses to performance-test tasks should be supplied for at least half of a rubric's evaluative criteria.
False
Instructionally diagnostic tests, because their mission is to promote greater learning by students, must necessarily be focused on detecting a student's weaknesses.
False
In general, holistically scored rubrics are more useful for pinpointing students' strengths and weaknesses than are analytically scored rubrics.
False
James Jackson teaches third graders in an inner-city elementary school. He has been a faculty member at the school for two years, having taught the third grade at a rural school for five years before that. Because he is dismayed with the skills of his students, especially in mathematics, he has decided to implement a formative-assessment strategy—in math only—for his students. He understands that formative assessment will generally be more successful if it focuses attention on only a modest number of higher-order mathematics skills rather than on a large number of less important subskills and bodies of knowledge that third-grade children should master. Accordingly, he identifies six truly challenging mathematics skills and splits the school year into six separate skill-promotion units of at least six weeks' duration aimed at each of the six target skills. For each of the six units, he then identifies a learning progression identifying no more than four "building blocks," that is, subskills or bodies of knowledge James regards as precursive to students' attainment of the mathematics skill being promoted in each unit. These learning progressions serve as a sort of "instructional map" intended to guide James with his teaching. As students near the close of each building block, James uses a variety of selected-response tests to measure his students' mastery of the subskill or knowledge embodied in that particular building block. Then, for any building block with which many students are having difficulty, he provides additional and sometimes dramatically different instruction dealing with what the building block measured. James believes his approach to formative assessment seems to be working.

James definitely selected too few complex skills for a full academic year because students—rural or inner city—can master many more than six challenging cognitive skills in math.
False
Leonard teaches four general science courses in an urban middle school. He is familiar with the substantial body of research evidence indicating that students' level of effort is a powerful predictor of their subsequent success in school and, perhaps more importantly, later in life when school years will only be a memory. Accordingly, when Leonard grades his 114 students each term, he never grades them simply on the basis of how well they have achieved the six curricular aims around which his science course is organized. Rather, for each student, Leonard attempts to reach an informed judgment about that particular student's level of effort exhibited in the science course. He tries to observe students closely during the course and often makes recorded judgments about each student's effort level on a one-to-five-point scale. Leonard always makes this mix of effort and goal achievement well known to his students in advance, feeling it only fair, as he says, to "lay his grading cards on the table." Moreover, he believes that his effort-oriented grading system will spur his students to work harder in the course.

Is Leonard a GGAG? (TRUE or FALSE)
False
Many standardized educational tests created by commercial assessment firms are instructionally insensitive primarily because their developers deliberately intended to construct standardized tests uninfluenced by the often unpredictable caliber of different teachers.
False
Note that this Mastery Check was specifically designed both to provide evidence regarding your outcome mastery and to fulfill a diagnostic function. It addresses four topics: (1) the relationship between affective assessment and instruction (Items 1, 5, 9, 13, and 17), (2) self-report inventories (Items 2, 6, 10, 14, and 18), (3) respondents' anonymity (Items 3, 7, 11, 15, and 19), and (4) how to interpret results of affective assessment (Items 4, 8, 12, 16, and 20). Thus, you can determine how well you seem to understand each of the four topics by focusing on how you performed on the five items for that topic. A teacher's challenge in reducing students' tendency to supply socially desirable answers on a self-report inventory is identical to the challenge in reducing students' socially desirable responses to the items on a cognitive test.
False
If a teacher sets out to bring about changes in students' values, he or she needs to select as a curricular aim the promotion of only those values that are supported by more than 50 percent of the students' parents and at least half of the general citizenry of the state in which the teacher's school is located.
False
The difficulties stemming from the presence of "social desirability's" contamination of students' responses can be effectively addressed by informing students in the initial directions that they are to identify themselves only after responding to all items.
False
The greater the educational significance that a teacher attributes to the pursuit of affective curricular aims, the more acceptable it is for the teacher to use students' self-report responses not only to arrive at affectively focused inferences about groups of students but also to make inferences about a particular student's affective status.
False
The only acceptable response options presented to a student who must complete a self-report affective inventory containing statements about the same topic should be the following: Strongly Agree, Agree, Uncertain, Disagree, Strongly Disagree.
False
The vast majority of educators think revealing to students the nature of a teacher's affective curricular aims—at the outset of instruction—is an instructionally sensible action to take.
False
To provide a more complete picture of students' current affective status, it is sensible to ask students to supplement their anonymous responses to a self-report inventory by adding optional, unsigned explanatory comments if they wish to do so.
False
When anonymously completed self-report inventories are being used in an attempt to assess students' affect, if some students respond too positively and, at the same time, some students respond too negatively, teachers simply cannot draw valid group-focused inferences about students' affective status.
False
When assessing the affective dispositions of a group of students by using a self-report inventory, it is obligatory to employ a Likert inventory as similar as possible to the inventories introduced by Rensis Likert in 1932.
False
One of the most practical ways of reducing the reactive effects of pretesting on students' performances is to construct two equidifficult forms of a test, then randomly administer one of the forms as a pretest and the other form as a posttest.
False
Prior to its recent redesign, the SAT was fundamentally an achievement test, whereas the ACT is, at bottom, an aptitude test.
False
Rose Stanley believes in the importance of letting her fifth-grade students know what the educational objectives are, and she does so at the beginning of each school year. Moreover, Rose believes that a prominent determinant of the grades she gives to students should be directly linked to how well students have achieved the objectives set out for them. However, Rose realizes that statements of educational objectives are sometimes similar to the ink blots in a projective Rorschach test where we often see in those ink blots what we wish to see. Accordingly, Rose enlists the advice of her students in deciding how much to weight the various kinds of evidence that will be used in determining whether a student has, in fact, mastered each of the nine major objectives she had chosen for the students. At least two hours, on and off, are devoted early in the school year to class discussions of how to weight such things as students' test performances, in-class participation, effort level, attendance, and independent and group projects related to one or more of the nine major objectives. Rose's students appear to be particularly content with the grade-related weightings and, after those weights have been applied, with the actual grades Rose awards. Is Rose a GGAG? (TRUE or FALSE)
False
Task-specific rubrics typically contribute more to promoting students' generalizable mastery of high-level cognitive skills than do skill-focused rubrics.
False
Teachers who rely chiefly on hypergeneral rubrics are most likely to spur students to acquire a generalized mastery of whatever skills are being assessed by the performance tests involved.
False
The most effective way to construct rubrics for efficient and accurate scoring of students' responses to performance tests is to build tests that can be scored simultaneously using analytic and holistic evaluative approaches.
False
Two federal initiatives, the Race to the Top Program in 2009 and the ESEA Flexibility Program in 2011, caused a dramatic increase in U.S. educators' attempts to create defensible teacher-appraisal programs focused on the formative evaluation of a teacher.
False
Whenever possible, teachers should attempt to have their assessments focus quite equally on the cognitive, affective, and psychomotor domains because almost all human acts—including students' test-taking—rely to a considerable extent on those three domains of behavior.
False
When large-scale achievement tests contain many items linked to students' socioeconomic status or to students' inherited academic aptitudes, such tests more accurately identify the instructional quality of the students' teachers.
False
Whereas percentiles function as a relative scheme for interpreting standardized test results, grade-equivalent scores represent an absolute procedure for interpreting results of standardized testing.
False
What's most important to Antonio Lopez is that students learn what they are supposed to learn. He teaches social studies in an inner-city middle school, and his students are remarkably heterogeneous. Many of the families served by Antonio's school are definitely low-income, but in the past few years a major "gentrification" project has led to the arrival of a fair number of more affluent families. Faced with such diversity, Antonio attempts to make his grading system mesh with the composition of the school's students. The essence of Antonio's approach to grading is quite simple. What he does is wait a full month before assigning instructional goals to his students, but those assignments are particularized. In other words, they're peculiar to each student—not applicable to all students. During the first month of school, Antonio attempts to get a fix on each student's probability of goal mastery. He administers several classroom assessments that, although focused on assessing students' social studies achievement, seem to function in a fashion similar to that fulfilled by a group-administrable aptitude test. The resultant sets of instructional goals are quite distinctive. Any pair of students are likely to have few in-common goals. Antonio is convinced that his particularization of grading expectations will, in the long term, benefit his students. Is Antonio a GGAG? (TRUE or FALSE)
False
This illustrative short-answer item was written for a third-grade class. The purpose is to help both the teacher and the students determine how well those students have achieved mastery of a recent state-approved language arts curriculum goal. Please write your answer legibly. _____________ is a good one-word description for commas, periods, question marks, and colons. Which one of the following statements most accurately describes the illustrative item?
For young students such as these third graders, direct questions should be used instead of incomplete statements—so the illustrative item violates an item-writing guideline for short-answer items.
Which of the following represents the most appropriate strategy by which to support the validity of score-based interpretations for specific uses?
Generation of an evidence-laden validity argument in support of a particular usage-specified score interpretation
A first-year classroom teacher, George Jenkins, has just finished preparing the initial set of three classroom tests he intends to use with his fifth-grade students early in the school year (one test each in mathematics, language arts, and social studies). In an effort to improve those tests, he has e-mailed a draft version of the three tests to his mother, who provided continuing support for George while he completed his teacher-education coursework as well as a semester-long student teaching experience. He asks his mother to suggest improvements that he might make in the early-version tests. Which of the following best describes George's effort to enhance the quality of his tests?
George could probably have secured better advice about his draft tests had he solicited it from his school's teachers and from his fifth-grade students before they took the tests "for real."
Flo Philips, a health education teacher in a large urban middle school, has recently begun analyzing her selected-response classroom tests using empirical data from students' current performances on those tests. She has acquired a simplified test-analysis program from her district's administrators and enjoys applying the program on her own laptop computer. She tries to base students' grades chiefly on their test scores and hopes to find that her items display item-discrimination indices below .20. A recent analysis of items from one of her major classroom tests indicated that three items were negative discriminators. Flo was elated. Please select the statement that most accurately describes Flo's test-improvement understanding.
Given Flo's use of students' test performances to assign grades, her understanding of item-discrimination indices is confused—actually, her items should be yielding strong positive indices rather than low or negative indices.
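For readers who want to see the arithmetic behind such indices, here is a minimal illustrative sketch in Python of the classic upper-minus-lower discrimination index; the student data and the function name are invented for illustration, not drawn from Flo's classroom.

def discrimination_index(item_scores, total_scores, fraction=0.27):
    # D = proportion correct in the upper group minus proportion correct
    # in the lower group, with groups formed from total test scores.
    paired = sorted(zip(total_scores, item_scores), key=lambda pair: pair[0])
    n = max(1, int(len(paired) * fraction))
    lower = [item for _, item in paired[:n]]
    upper = [item for _, item in paired[-n:]]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

# Ten invented students: 0/1 scores on one item, plus total test scores.
item = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
total = [38, 12, 30, 35, 10, 15, 33, 36, 14, 29]
print(discrimination_index(item, total))  # 1.0, a strongly positive discriminator

A test used for grading should yield clearly positive values from a computation like this one, which is exactly why Flo's hope for indices below .20, let alone negative ones, is misguided.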
A district's new computer-administered test of students' mastery of "composition conventions" has recently been used with the district's eleventh- and twelfth-grade students. To help judge the consistency with which the test measures students' knowledge of the assessed conventions, district officials have computed Cronbach's coefficient alpha for students who completed this brand-new exam. Which of the following kinds of reliability evidence do these alpha coefficients represent?
Internal consistency
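As a rough sketch of what such officials compute, coefficient alpha can be obtained from a students-by-items score matrix with the standard formula alpha = k/(k-1) * (1 - sum of item variances / total-score variance); the tiny score matrix below is invented for illustration.

import numpy as np

def cronbach_alpha(scores):
    # scores: 2-D array, rows = students, columns = items
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

scores = [[1, 1, 1, 0],
          [1, 0, 1, 1],
          [0, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(round(cronbach_alpha(scores), 2))  # 0.79 for these invented data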
An independent, for-profit measurement firm has recently published what the firm's promotional literature claims to be "an instructionally diagnostic" interim test in mathematics. Different forms of the new test are to be administered to students every two or three months. A student's results are reported as a total, all-encompassing score and also as five "strands" that are advertised as "distinctive and diagnostic." Your district's administrators are deciding whether to purchase copies of this new test. Which one of the following would be the most appropriate source of validity evidence for the newly published test?
Internal structure evidence
Please imagine that the reading specialists in a district's central office have developed what they have labeled a "diagnostic reading test." You think its so-called subscale scores are not diagnostic at all but are simply measuring a single overall dimension you believe to be "reading comprehension." In this setting, which of the following kinds of reliability evidence would supply the most relevant information related to your disagreement with the reading test's developers?
Internal-consistency reliability evidence
A high-school biology teacher, Nicholas, relies heavily on his students' test performances when he assigns grades to those students. Typically, he sends his selected-response classroom tests to the district's assessment director who, usually in 24 hours, returns a set of item analyses to the teachers. These analyses usually contain an overall mean and a standard deviation for each class's test performances, as well as p-values and item-discrimination indicators for every item in each test. Nicholas teaches four different biology classes, so four separate analyses are carried out at the district office. Nicholas is pleased that very few of his tests' items display exceptionally high or exceptionally low p-values. Moreover, the vast majority of the items appear to have discrimination indices of a positive .25 or above. Three items have negative discrimination indices. After looking at the phrasing of those items, Nicholas sees how he should revise them to eliminate potentially confusing ambiguities. Please consider the four options below, then select the alternative that most accurately describes how Nicholas is dealing with the analyses of his biology tests.
Nicholas was appropriately pleased with the results of the district-conducted item analyses, and he made a sensible decision to revise the three negatively discriminating items.
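For reference, an item's p-value in analyses like these is nothing more exotic than the proportion of examinees answering the item correctly; here is a one-line illustrative computation with invented responses.

# 0/1 responses from ten invented students on a single item
responses = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
p_value = sum(responses) / len(responses)
print(p_value)  # 0.7, neither exceptionally high nor exceptionally low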
This excerpt from a teacher's memo includes faculty-created rules for scoring their students' responses to essay items. The following rules for scoring students' responses to essay items were created last year by our faculty and were approved by a near-unanimous vote of the faculty. Please review what those rules recommend prior to our taking this year's "confirmatory" faculty vote on these rules. RULES FOR SCORING RESPONSES TO ESSAY ITEMS When teachers in this school score their students' responses to essay items, those teachers should always (1) make a preliminary judgment about how much importance should be assigned to the conventions of writing, such as spelling, (2) decide whether to score holistically or analytically, (3) prepare a tentative scoring key prior to actually scoring students' responses, (4) try to score students' responses anonymously without knowing which student supplied which response, and (5) score a given student's responses to all essay items on a test and then move on to the next student's responses. Please select the most accurate assertion regarding these rules.
Only one of the faculty-approved rules is basically opposed to the Chapter 7 guidelines for scoring students' responses to essay items.
Measurement specialists assert that validation efforts are preoccupied with the degree to which we use students' test performances to support the accuracy of score-based inferences. Which of the following best identifies the focus of those inferences?
Students' unseen skills and knowledge
Which of the following strategies seems most suitable for teachers to use when trying to detect and eliminate assessment bias in their own teacher-made tests?
Teachers should pay particular attention to the possibility that assessment bias may have crept into their teacher-made tests and should strive to rely on their best judgments about the presence of such bias on all of their classroom tests—but especially on their most significant classroom assessments.
In certain Christian religions, there are gradients of sinful acts. For example, in the Roman Catholic Church, a venial sin need not be confessed to a priest, whereas a mortal sin must definitely be confessed. Based on a context clue contained in the paragraph above, which of the following statements is most accurate? a. For Catholics, there is no difference in the gravity of mortal or venial sins. b. For Catholics, a mortal sin is more serious than a venial sin. c. For Catholics, a venial sin is more serious than a mortal sin. d. Catholic priests are required to forgive all mortal sins that are confessed.
The assessment item appears to be biased in favor of students who are Roman Catholics.
Amy Johnson has a large collection of Barbie dolls. Originally, she had 49. Recently, she somehow lost 12 Barbies. How many Barbies does Amy have left? (Show your work.) a. 37 Barbies b. 61 Barbies c. 27 Barbies
The assessment might offend people who view girls as having much broader interests than playing with dolls.
This illustrative item is destined for use in a high-school speech course that, in recent weeks, has been focused on debate preparation. Directions: To conclude our unit on how to prepare successfully for a debate, please consider carefully the following preparation-focused topics. After doing so, choose one that you regard as most important—to you—and then write a 300- to 400-word essay describing how best to prepare for whatever topic you chose. Be sure to identify which of the potential topics you have selected. You will have 40 minutes to prepare your essay. Potential Essay Topics Introducing your position and defending it Use of evidence during the body of the debate Preparing for your opponents' rebuttal Please choose the statement that most accurately reflects the illustrative item's congruence with Chapter 7's guidelines for writing essay items.
The illustrative item is structured in direct opposition to one of the chapter's guidelines for writing essay items.
Consider the following illustrative three-option multiple-choice item. An anonymously completed, self-report item regarding a student's values—an item that has no clearly correct answer—is best suited for use in an: a. cognitive examination b. affective inventory c. psychomotor skills test Which of the following statements best characterizes the illustrative item?
The illustrative item violates a general item-writing guideline by providing a blatant grammatical clue to the correct answer.
This illustrative short-answer item was constructed for tenth-grade students. Following World War Two, an international organization intended to maintain world peace was established, namely, the United Nations. Similarly, after World War One a peace-oriented international organization was established. What was the name of that earlier organization? _____________________ Which of the following statements best mirrors the degree to which the illustrative item is in accord with Chapter 7's guidelines for writing short-answer items?
The illustrative item violates none of the chapter's guidelines for writing short-answer items.
Which one of the following kinds of validity evidence represents a different category of evidence than the other three kinds of validity evidence identified? a. Convergent evidence, that is, positive relationships between test scores and other measures intended to measure the same or similar constructs b. Discriminant evidence, that is, positive relationships between test scores and other measures purportedly assessing different constructs c. Alignment evidence d. Test-criterion relationship evidence representing the degree to which a test score predicts a relevant variable that is operationally distinct from the test Which of the following statements best describes the illustrative item?
The illustrative item violates one of the chapter's general item-writing guidelines by presenting a blatant cue regarding which answer is correct.
Consider the following illustrative binary-choice item. For this next True/False item, indicate whether the item's statement is true or false by circling the T or F following the item. Validation is the joint responsibility of the test developer and the test user, but the accumulation of reliability/precision evidence is the exclusive responsibility of the test user. (Circle one: T or F) Which of the following statements best describes the illustrative item?
The illustrative True/False item violates one of the item-category guidelines by including two substantial concepts in a single item.
Here's an illustrative short-response item intended for use with ninth-graders in a high-school government course: Please accurately fill in the blanks you find in the statement given below regarding "How a bill becomes a law." In _______, _______ and _______ explored what ultimately became the _______ section of the northwestern United States with the assistance of a native-American guide known as _______. (Note: These blank lines MUST be equal in length.) Select the most accurate of the following statements regarding this illustrative short-answer item.
The item satisfies the guideline regarding linear equality, yet violates the number-of-blanks guideline.
For the following item, select the option that best illustrates the degree to which the item adheres to the chapter's general item-writing guidelines or the guidelines for specific categories of items. Note that the following item deals with assessment-related content and thus might be regarded as a rudimentary form of "assessment enrichment." Consider whether the following binary-choice item adheres to the item-writing guidelines presented in the text. Presented below is a binary-choice item. Please indicate—by circling the R or W—whether the statement given in the item is right (R) or wrong (W). R or W Absence-of-bias determinations are typically made as a function of judgmental scrutiny and, when possible, empirical analysis. Which of the following statements best describes the illustrative item?
The item violates none of the chapter's guidelines, either the five general guidelines or the specific guidelines for binary-choice items.
Consider the following illustrative binary-choice item. Please consider the following binary-choice item and then indicate whether it is Accurate (A) or Inaccurate (I). A or I ___ If a teacher wishes to create assessments that truly tap students' mastery of higher order cognitive challenges, the teacher will not be working within the affective domain. Which of the following statements best describes the illustrative item?
The item violates the item-category guideline discouraging the use of negatives in such items.
Validity evidence can be collected from a number of sources. Suppose, for instance, that a mathematics test has been built by a school district's officials to help identify those middle-school students who are unlikely to pass a statewide eleventh-grade high-school diploma test. The new test will routinely be given to the district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive application, the new test will be administered to current seventh-graders, and the seventh-grade tests will also be given to the district's current eleventh-graders. This will permit the eleventh-graders' two sets of test results to be compared. Which of the following best describes this source of validity evidence?
The relationship of eleventh-graders' performances on the two tests
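The evidence in question boils down to a single correlation coefficient between the eleventh-graders' scores on the seventh-grade test and on the diploma test. A minimal sketch with invented score pairs (statistics.correlation requires Python 3.10 or later):

import statistics

seventh_grade_test = [22, 35, 28, 40, 18, 31, 25, 38]
diploma_test = [51, 70, 60, 78, 45, 66, 55, 74]
r = statistics.correlation(seventh_grade_test, diploma_test)
print(round(r, 2))  # a strong positive r would support the predictive use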
In which one of the following four statements are all of the pronouns used properly? a. I truly enjoyed his telling of the joke. b. We watched him going to the coffee shop. c. We listened to them singing the once-popular, but rarely heard song. d. Dad watched them joking about politicians, while approving of it all.
This assessment item does not appear to be biased.
This illustrative essay item was written for eleventh-grade students taking an English course. In the space provided in your test booklet, please compose a brief editorial (of 250 words or less) in favor of the school district's after-school tutorial program. The intended audience for your position statement consists of those people who routinely read this town's weekly newspaper. Because you will have the entire class period to complete this task, you may wish to write a draft editorial using the scratch paper provided so that you can then revise the draft before copying your final version into the test booklet. Your grade on this task will contribute 40 percent toward the grade for the Six-Week Persuasive Writing Unit. Which of the following statements best characterizes this item?
This illustrative item contains no serious violation of any of the chapter's guidelines for writing essay items.
Consider the following multiple binary-choice item with its four separate sub-items and then decide how well the item adheres to the chapter's item-writing guidelines. Directions: For each statement in the following cluster of four statements, please indicate whether the statement is true (T) or false (F) by circling the appropriate letter. In an elaborate effort to ascertain the reliability of a new high-stakes test developed in their district, central-office administrators have calculated the following types of evidence based on a tryout of the test with nearly 2,300 students: • Internal consistency r = .83 • Test-retest r = .78 • Standard error of measurement = 4.3 T or F (1) The three types of reliability evidence calculated by the central-office staff are essentially interchangeable. T or F (2) The trivial difference between the test-retest coefficient and the internal consistency coefficient constitutes no cause for alarm. T or F (3) The test-retest r should never be smaller than a test's internal consistency estimate of reliability. T or F (4) The standard error of measurement (4.3 in this instance) is derived more from validity evidence than from reliability evidence. Choose the most accurate of the following statements regarding the illustrative multiple binary-choice item as a whole.
This illustrative item seems to violate none of the chapter's guidelines for constructing such items, that is, the general guidelines, the guidelines for multiple binary-choice items, and the guidelines for binary-choice items.
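The three quantities in this item are related by the conventional classical-test-theory formula SEM = SD * sqrt(1 - reliability), so one can check whether the reported figures hang together. The sketch below backs out the implied standard deviation; that SD is an inference of this illustration, not a figure given in the item.

import math

reliability = 0.83  # the internal-consistency r reported in the item
sem = 4.3           # the standard error of measurement reported in the item
implied_sd = sem / math.sqrt(1 - reliability)
print(round(implied_sd, 1))  # ~10.4 raw-score points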
Consider the following illustrative binary-choice item. It deals with a reliability/precision concept treated in the Standards for Educational and Psychological Testing (2014). Directions: Please indicate whether the statement below regarding the reliability/precision of educational tests is Accurate (Circle the A) or Inaccurate (Circle the I). A or I Because the standard error of measurement can be employed to generate confidence intervals around reported scores, it is typically more informative than a reliability coefficient. Which of the following statements best describes the illustrative item?
This illustrative binary-choice item violates none of the general or item-category guidelines for this type of selected-response item.
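To illustrate why the SEM is so directly informative, here is a sketch of the confidence bands it generates around an observed score; the observed score of 72 is invented, and the SEM of 4.3 is carried over from the preceding item.

sem = 4.3
observed = 72  # invented observed score
print(f"68% band: {observed - sem:.1f} to {observed + sem:.1f}")  # +/- 1 SEM
print(f"95% band: {observed - 1.96 * sem:.1f} to {observed + 1.96 * sem:.1f}")  # +/- 1.96 SEM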
Consider the following illustrative five-option multiple-choice item. It addresses content presented in the Standards for Educational and Psychological Testing (2014) related to the fundamental notion of assessment validity. When we encounter a test whose scores are affected by processes that are quite extraneous to the test's intended purpose, we assert that the test displays which one of the following? a. Construct underrepresentation b. Construct deficiency c. Construct corruption d. Construct-irrelevant variance e. All of the above Which of the following statements best describes the illustrative item?
This illustrative item, because it includes an "all of the above" alternative, violates an important item-writing guideline.
Please compose a short essay of between 500 and 1,000 words on the topic: "Soccer Outside the United States." Either use one of our classroom computers or write the essay by hand. Be sure to engage in appropriate prewriting activities, draft an initial version of the essay, and then revise your draft at least once. You will have ninety minutes to complete this task.
This item seems to be biased in favor of children born outside the United States, many of whom may be more familiar with non-U.S. soccer than are children born in the United States.
A considerable degree of disagreement can be found among educators regarding the precise meaning of the label "performance assessment."
True
A major challenge facing those teachers who personally employ performance tests is the difficulty of drawing valid inferences about students' generalized mastery of the skill(s) or bodies of knowledge being measured.
True
Although judgmental methods can be readily employed to identify a test's items that are apt to be instructionally insensitive, reliance on empirical methods of doing so requires large samples of students and teachers—as well as the use of sophisticated statistical analyses.
True
Although the NAEP assessment frameworks are, technically, supposed to guide NAEP item-development and not function as curricular frameworks because of the long-standing U.S. tradition that the federal government shouldn't influence what is taught in state-governed public schools, teachers can still get good ideas about what to assess and how to assess it from the illustrative NAEP items that are available to the public.
True
Although students' results on standardized tests are reported frequently as scale scores, percentiles are more intuitively understandable for most people.
True
Although test-elicited evidence of students' learning can play a prominent role in the summative evaluation of teachers, and most commentators believe that it should do so, a number of other useful sources of teacher-evaluation evidence exist.
True
Because a classroom test's influence on a teacher's instructional decision making is one of the most beneficial dividends of classroom assessment, a teacher should think through in advance how certain levels of student performances would influence a teacher's test-based instructional decisions—and then abandon or revise any tests that have no decision-impact linked to their results.
True
Because an excessive number of assessment targets in a teacher's classroom assessments can make it difficult to maintain instructional focus, it is wiser for teachers to employ grain-sizes calling for a modest number of broad-scope assessment targets than to adopt a large number of small-scope assessment targets.
True
Because most items on traditionally standardized achievement tests must contribute to a sufficient amount of spread in test-takers' total test scores to permit fine-grained comparisons among those test takers, some items on today's accountability tests end up being closely linked to test-takers' innate academic aptitudes.
True
Because of such needs as how to grade this year's students or whether changes are needed in next year's instructional procedures, teachers should invariably link their planned classroom assessments explicitly to these sorts of decisions from the earliest moments a classroom test is being conceptualized.
True
Because of the manner in which certain of their items have been constructed, some commercially created nationally standardized achievement tests tend to measure, in part, the composition of a school's student body rather than the effectiveness with which those students have been taught.
True
Because of today's continuing advances in technology, it seems certain that creators of performance assessment will increasingly structure their computer-based assessments around a wide range of digitally simulated tasks.
True
Because recent years have seen both schools and teachers being evaluated on the basis of students' performances on high-stakes tests, such as a state's annual accountability tests, it becomes almost imperative for teachers to determine the degree to which what's measured by their classroom assessments can contribute to improved students' performances on such significant tests.
True
Because the curricular recommendations of national subject-matter associations typically represent the best curricular thinking of the most able subject-matter specialists in a given field, as teachers try to identify the knowledge, skills, and affect to measure in their own classroom assessments, the views of such national organizations can often provide helpful curricular insights.
True
Because task-specific rubrics are almost always more particularized in their expectations than are other kinds of rubrics, it is usually possible to score students' responses more quickly and more accurately using such rubrics than when using other kinds of rubrics.
True
Classroom observation systems can be developed and refined so that accurate and reliable observations of extremely able and extremely inept teachers can be made, but substantial difficulties arise when trying to base evaluative judgments on classroom observations of "middle-ability" teachers.
True
Collection of students' affective dispositions, by employing anonymously completed self-report inventories—prior to and following instruction—can make a useful contribution to an evidence-based evaluation of a teacher's instructional success or a school's instructional success.
True
Consider each description of fictional teachers carrying out their implementations of portfolio assessment, then indicate whether the item following that description is True or False. Maria Flores Installs Portfolio Assessment Having decided to adopt a portfolio assessment approach for the written-composition segment of her middle-school English classes, Maria Flores introduces her students to the new assessment scheme by asking a commercial artist friend of hers to speak to each class. The artist brings his own portfolio and shows students how it allows prospective clients to judge his work. Ms. Flores tells her students that her friend's portfolio is called a "showcase portfolio" and that students will be preparing both a showcase portfolio to periodically involve their parents in reviewing a student's work products, as well as a "working portfolio" to keep track of all of their composition drafts and final products. Ms. Flores and her friend emphasize that both kinds of portfolios must be owned by the student, not the teacher. Early in the academic year, Ms. Flores works with each of her classes to decide collaboratively on the evaluative criteria to be used in the rubrics that will be used in a particular class for judging the composition efforts of a given class. Although these "per-class" rubrics occasionally differ in certain respects for different classes, they are generally quite similar. Students are directed to place all of their drafts and final versions in folders and then put those folders in a designated file drawer in the classroom. Ms. Flores makes sure to review all students' portfolios at least once a month. Typically, she devotes one preparation period a day to a different class's portfolios. Because the portfolios are readily available, Ms. Flores finds it convenient and time-efficient to evaluate students' progress in this manner. She provides a brief (dated) "teacher's evaluation" for students to consider when they work with their own portfolios. At least twice every term, Ms. Flores selects what she considers to be the students' best finished compositions from their working portfolios. She places such work products in a showcase portfolio. Students are directed to take these showcase portfolios home to let their families see what kinds of compositions they have been creating. Parents are enthusiastic about this practice. A number of parents have told the school's principal that Ms. Flores's "take-home" portfolio system is the way they would like to see other aspects of their children's performances evaluated. Although Ms. Flores incorporated many fine features in her attempt to make portfolio assessment a success in her classes, she seriously overlooked activities intended to enhance students' self-evaluation abilities.
True
Consider each description of fictional teachers carrying out their implementations of portfolio assessment, then indicate whether the item following that description is True or False. Gary Owens Gives Portfolio Assessment a Try A third-grade teacher, Gary Owens, has just completed a summer professional-development workshop on portfolio assessment. He and a number of the teachers at his school have decided to try out performance assessment at least in limited parts of their instructional and assessment programs. Gary has decided to use portfolios with his third-graders' mathematics work for a full school year. He introduces students to the activity by stressing the importance of their personal ownership of the portfolios and the significance of their choosing the kinds of mathematics work they put in their portfolios. Gary suggests to the class that students include only problem-solution mathematics work in their portfolios. Thus, they should not put drill work and simple computational work in the portfolios. The students discuss this suggestion for a while and then unanimously agree. Early on, Gary works with students for two full days to decide on the evaluative criteria in the rubrics he and they will use when evaluating the mathematics work in the portfolios. They decide, collaboratively, that the major evaluative criteria will be (1) selection of proper solution strategies, (2) accurate completion of selected solution procedures, and (3) arrival at the correct solution to the problem. Students routinely collect their work and place it for safekeeping in specially marked cardboard boxes that Gary has arranged on the "Portfolio Shelf." Every two months, Gary holds an individual portfolio conference with each student during which he supplies the student with a "teacher's appraisal" of that student's portfolio work. It is clear to Gary that his students' ability to solve mathematics problems has improved substantially. Although it took most students several weeks to get used to the process, they now seem to thoroughly enjoy Gary's version of portfolio assessment in mathematics. He does also. Although Gary's approach to portfolio assessment has much to commend it, he failed to include two key ingredients for successful portfolio assessment.
True
Referring again to the Gary Owens scenario above: Gary's emphasis on the importance of students' personal ownership of their portfolios was given in a timely manner—at the outset of introducing students to portfolio assessment.
True
Referring again to the Gary Owens scenario above: Gary's failure to involve parents meaningfully in the portfolio assessment process represents a serious constraint on the learning dividends obtainable from portfolio assessment.
True
Referring again to the Maria Flores scenario above: Ms. Flores's major omission in her implementation of portfolio assessment is her failure to engage her students in one-on-one portfolio conferences during the school year.
True
Darrell Ito teaches English in a suburban middle school and has been intrigued by his principal's strong advocacy of formative assessment. He has read several articles about formative assessment and borrowed a book from his school's professional-development library dealing solely with how a teacher launches a formative-assessment program. Based on what he has read, Darrell decides not to use formative assessment when his students are learning how to create original compositions but, rather, when pursuing any curricular aims involving "Writer's Rules," such as punctuation rules, spelling rules, and usage conventions. Moreover, he decides to rely almost exclusively on students' self-reported understandings, that is, their use of red, green, and yellow plastic cups to indicate the degree to which they are "getting it" as the class proceeds. Every few weeks, based on his analysis of the sequence of rules his students must master (part of what he calls a "learning progression"), Darrell orally presents a set of three or four "Writer's Rules" to his students. After each rule has been presented, Darrell says, "Traffic-signal colors, class!" At that point, students put a green cup on top of their cup-stack to indicate that they understand the presented rule well. Students put a yellow cup on top of their stack if they are unsure about their understanding of the rule just given. And, of course, the red cup goes on top of the stack if a student really is baffled by the particular rule that Darrell has described. As students' cup stacks are being rearranged, Darrell visually surveys the stacks to determine which colors predominate. Because formative assessment calls for an obligatory instructional adjustment based on assessment-elicited evidence, Darrell provides additional instruction regarding each rule—but tries to make the adjusted instruction quite different from the way he taught the rule earlier. The greater the prevalence of yellow or red cups, the more instructional time Darrell devotes to what he calls his "second stab" at rule-teaching. Based on several months of Darrell's attempt to implement formative assessment in his English classes, almost all students are pleased with the approach. So is Darrell. Although other techniques exist for securing assessment-elicited evidence of students' status, Darrell's reliance on students' self-reported levels of understanding was consistent with an acceptable implementation of formative assessment.
True
Most teachers want their students, at the close of an instructional period, to exhibit subject-approaching tendencies (that is, an interest in the subject being taught) equal to or greater than the subject-approaching tendencies those students displayed at the beginning of instruction.
True
Referring again to the Darrell Ito scenario above: Because assessment-informed adjustments of a teacher's instruction are not obligatory during the formative-assessment process, Darrell need not have supplied additional instructional adjustments for all of the Writer's Rules.
True
Emily Contreras teaches "Modern Spanish" in a large suburban high school whose Latino students constitute less than 10 percent of the student body. Sensing the approach of substantial demographic changes in the United States, she wants to make certain that many more of her school's non-Latino students have at least a modicum of Spanish-speaking skills. Because Emily has a solid belief in the value of assessment-illuminated instruction and especially the merits of formative assessment, she was particularly pleased last year to see that a commercial test organization had published a set of new "formative assessments in Spanish designed for students at all levels of Spanish-speaking proficiency." Emily persuaded her school's principal, in collaboration with principals from several other district high schools, to purchase sufficient copies of these new tests to be used in the manner the tests' publisher specifies. The tests, according to their publisher, are to be administered four times a year—at the beginning of the school year, at its conclusion, and at two separate times during the middle three months of the school year. In this way, the publisher asserts, "the tests' formative dividends" will be maximized for both teachers and students alike. The new tests are described by their developers as "consistent with findings of the widely accepted Black and Wiliam research review of 1998" and can also function as excellent predictors of high-school students' subsequent college accomplishments if they take additional courses in Spanish. Emily is simply delighted that these new assessments in Spanish, previously unavailable wherever she has been teaching, can be used in her classes. The publisher's statement that the tests are "consistent" with the conclusions of the Black and Wiliam 1998 research review does not automatically guarantee that such consistency is, in fact, present.
True
First developed by the College Board to function as an admission exam for elite Northeastern universities, the SAT has had its items replenished by the Educational Testing Service since 1947.
True
For the past several years Nguyen Nguyen has taught French in a suburban high school. Because of his Vietnamese background and the longtime involvement of France in a nation that used to be called French Indochina, Nguyen's French accent is excellent. People often remark that he sounds as though he had spent his entire life in Paris. This is his fourth year of highly acclaimed teaching in this school. When Nguyen reviewed his state's specified instructional outcomes for secondary-level French courses, and for language courses in general, he was dismayed to discover that most of the official outcomes centered on written French rather than spoken French. Because the state's language teachers are given a certain degree of latitude in assigning grading weights to the state's prescribed instructional objectives, Nguyen takes advantage of this flexibility by basing 80 percent of the grade for each student on the degree to which they have achieved mastery of the course objectives reflecting fluency in speaking and listening to French. Nguyen explains his grading plans to students and, via a take-home note, to their parents early in the school year. Although a few parents carp about the heavy stress on spoken French, most agree with Nguyen's grading plans. Is Nguyen a GGAG? (TRUE or FALSE)
True
If a distribution of standardized test scores were particularly heterogeneous, it would have a larger standard deviation than if it were particularly homogeneous.
True
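To see why this holds, here is a minimal Python sketch (the two score lists are invented for illustration): the more the scores spread out around their mean, the larger the standard deviation becomes.

    # Hypothetical score sets: one heterogeneous (spread out), one homogeneous (clustered).
    from statistics import pstdev

    heterogeneous = [35, 48, 60, 72, 85, 98]   # scores scattered widely
    homogeneous = [64, 65, 66, 67, 68, 69]     # scores tightly clustered

    print(pstdev(heterogeneous))  # roughly 21.4
    print(pstdev(homogeneous))    # roughly 1.7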
If an elementary teacher has designed his instructional system so it centers on the use of "catch-up" and "enrichment" learning centers where, based on classroom-assessment performances, students assign themselves to one of these centers, an early-on factor to consider is whether the classroom assessments should yield norm-referenced or criterion-referenced inferences.
True
If appropriately conceived and implemented, performance assessment can contribute substantially not only to improving a teacher's instructional effectiveness but also to increasing the quality of students' learning.
True
If a teacher's students are annually supposed to master an officially approved set of state curricular standards, and a state accountability test aligned with those standards is given each year, teachers should surely try to make sure that what their classroom tests measure is congruent with—or contributory to—what's assessed by such state accountability tests.
True
If the performances of a large group of students on a standardized test are arrayed in a relatively normal manner, then approximately two-thirds of the students' scores will be located within plus-or-minus one standard deviation from the group's mean score.
True
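The "approximately two-thirds" figure is the 68 percent rule for normal distributions. A quick check with Python's standard library (the mean of 500 and standard deviation of 100 are arbitrary, illustrative values):

    from statistics import NormalDist

    scores = NormalDist(mu=500, sigma=100)  # hypothetical score distribution
    # Proportion of scores falling between one SD below and one SD above the mean:
    print(scores.cdf(600) - scores.cdf(400))  # about 0.6827, i.e., roughly two-thirds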
Note that this Mastery Check was specifically designed both to provide evidence regarding your outcome mastery and to fulfill a diagnostic function. It addresses four topics: (1) the relationship between affective assessment and instruction (Items 1, 5, 9, 13, and 17), (2) self-report inventories (Items 2, 6, 10, 14, and 18), (3) respondents' anonymity (Items 3, 7, 11, 15, and 19), and (4) how to interpret results of affective assessment (Items 4, 8, 12, 16, and 20). Thus, you can determine how well you seem to understand each of the four topics by focusing on how you performed on the five items for that topic. Multifocus self-report affective inventories typically contain far fewer items related to each affective variable being measured than do traditional Likert inventories.
True
When ACT tests are revised, content suggestions are routinely solicited from large samples of secondary school teachers and curriculum specialists, as well as from college professors in the subject areas to be assessed.
True
In her fourth-grade class, Belinda Avery has recently tried to rely heavily on the formative-assessment process in pursuing the language arts curricular aims identified by her state's department of education as "Mandatory Language Arts Expectancies." In all, there are 13 such expectancies that her students are supposed to master, eight in reading and five in writing. For each of these 13 expectancies, Belinda has identified either one or two "building blocks"—namely, subskills or bodies of enabling knowledge that she believes students must master en route to their achievement of whatever specific expectancy is involved. Belinda weights the importance of each of the building blocks as a contributor to students' mastery of the expectancy, then bases her during-the-year grades on students' mastery of the most heavily weighted building blocks. When the school year is nearing its conclusion, however, Belinda plans to measure her students' mastery of each of the 13 expectancies directly and to grade students on their expectancy attainment. During the bulk of the school year, however, she is only assessing students' building-block learning for purposes of improving learning, not evaluating students. Is Belinda a GGAG? (TRUE or FALSE)
True
In recognition of how much time it typically takes for teachers to score students' responses to constructed-response items, especially those items calling for extended responses, an early factor for a teacher to consider when creating a classroom assessment is whether the teacher has sufficient time to properly score students' responses to a test containing constructed-response items.
True
In the Midwest state where Sarah Ortiz teaches third graders, the state's school board has officially endorsed an exceedingly large number of "learning objectives." Because Sarah wants to organize her instruction and grading practices around a set of intended curricular outcomes and because she believes the state's learning objectives are too numerous, she decides to prioritize those objectives by focusing only on a half-dozen of the most important ones. Her school district's report cards call only for an "overall achievement" grade (not broken down by particular objectives), so Sarah informs all of her students—and their parents—about which objectives she will be using as the chief targets for her instruction and for her grading. A description of each of those six objectives, along with two sample items that might be used on a test assessing students' achievement of that objective, is sent home with each student at the start of the school year. Sarah then grades her students, using evidence from formal and informal assessments, according to how well she believes a student has mastered the six objectives. Although the students and parents seem to appreciate the way Sarah is grading her students, several teachers in her school chide her for dealing with only six objectives out of a state-approved set of 34 third-grade objectives. Is Sarah a GGAG? (TRUE or FALSE)
True
In the school district where Harry Harper teaches middle-school science, a number of recent demographic changes have occurred so that most schools now serve very divergent groups of students. In his own classes, Harry finds that roughly half of his students come from schools with excellent instructional programs and half have come from schools where instruction has been less than stellar. His students' entry-level achievement, therefore, varies enormously. When Harry communicates about grades to parents and students during the midyear and end-of-year grading conferences in which students take part, he makes no grading allowances for differences in students' prior instruction. The only factor Harry uses, as he presents his midyear and end-of-year grades during these conferences, is the degree to which each student has accomplished the curricular goals identified by Harry's district for middle-school science courses. Is Harry a GGAG? (TRUE or FALSE)
True
It is often remarkably helpful for teachers to ask their coworkers to review the potential emphases of still-under-development classroom assessments because teachers and administrators, especially those who are familiar with what's being taught and the sorts of students to whom it is taught, can provide useful insights regarding what should be assessed—and what shouldn't.
True
James Jackson teaches third graders in an inner-city elementary school. He has been a faculty member at the school for two years, having taught the third grade at a rural school for five years before that. Because he is dismayed with the skills of his students, especially in mathematics, he has decided to implement a formative-assessment strategy—in math only—for his students. He understands that formative assessment will generally be more successful if it focuses attention on only a modest number of higher-order mathematics skills rather than on a large number of less important subskills and bodies of knowledge that third-grade children should master. Accordingly, he identifies six truly challenging mathematics skills and splits the school year into six separate skill-promotion units of at least six weeks' duration aimed at each of the six target skills. For each of the six units, he then identifies a learning progression containing no more than four "building blocks," that is, subskills or bodies of knowledge James regards as precursive to students' attainment of the mathematics skill being promoted in each unit. These learning progressions serve as a sort of "instructional map" intended to guide James with his teaching. As students near the close of each building block, James uses a variety of selected-response tests to measure his students' mastery of the subskill or knowledge embodied in that particular building block. Then, for any building block with which many students are having difficulty, he provides additional and sometimes dramatically different instruction dealing with what the building block measured. James believes his approach to formative assessment seems to be working. Although the particular manner in which James employed formative assessment in his classroom seems sensible, research evidence suggests that a variety of other formative-assessment implementation strategies will also yield improved learning for students.
True
(Refers to the James Jackson scenario above.) James' reliance on the learning progression he devised to assess his students' mastery of the building blocks, that is, bodies of enabling knowledge and cognitive subskills, represents a common way of implementing a formative-assessment strategy in classrooms such as his.
True
When describing the performance of a group of students on a standardized test, the most commonly accepted index of variability is the standard deviation.
True
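For reference, the standard deviation of a group of $N$ scores $x_1, \ldots, x_N$ with mean $\bar{x}$ is

    $$SD = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N}}$$

that is, the square root of the average squared distance of the scores from their mean, which is why it grows as the scores become more dispersed.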
Jethro Jones teaches high-school English classes and has been directed by his school's principal to make students' achievement of the state's recently authorized Language Arts/English learning goals the heart of his grade-giving system. Jethro first sorts out the "measurable" learning goals for which he is responsible in his classes (a few of the goals aren't measurable). Then he makes sure that he has one or, preferably, two assessments or student-conducted projects that relate to each goal. These assessments or student projects are administered throughout the school year. When grading time comes, twice per year, Jethro compiles the evidence of goal-achievement for each student, then awards a grade based on each student's evidence related to a particular goal. Jethro, of course, must coalesce these per-goal grades into a more general, overall grade. He does this privately, then translates the complete array of per-goal evidence into semester and full-year grades for his students. Because Jethro is quite attentive to students' attitudinal dispositions toward English, he supplies students and their parents with a separate report form dealing with such variables as students' effort, attendance, class participation, etc. Is Jethro a GGAG? (TRUE or FALSE)
True
Many users of the kinds of scoring rubrics employed to evaluate students' performance-test responses agree that the most significant feature of such rubrics is their set of evaluative criteria.
True
Melinda Stevens is teaching fifth-graders in a rather rural elementary school, and she has done so in the same school ever since she completed her teacher-education program at a nearby state university. This is her fourth year at the school, and she desperately wants to become a more effective teacher. She has been reading several articles about formative assessment and has decided to "give it a twirl" this year. She'll determine how well formative assessment is working for her once the school year is over. Melinda plans to use formative assessment only in connection with her mathematics and language arts curricular goals. She can expand its use to other curricular areas if she concludes that its measurement orientation functions as an effective instructional adjunct. Although most of the articles she's been reading urge that formative assessment be used by both teachers and students, Melinda decides to keep her students out of the formative-assessment process for the first year so that she can become more comfortable with how she should personally use this approach. She starts off the year by giving her students an extensive pretest, using both selected-response items and constructed-response items, so that she can ascertain what her fifth-graders know and don't know. The pretest takes almost two hours for most students to complete, so Melinda breaks the pretesting into four half-hour segments given on adjacent days of the week. Based on her students' performances on the four-part pretest, Melinda adjusts the math and language arts curricular goals so that she can include only those curricular targets that students should definitely master but currently have not mastered. This extensive pretest helps Melinda select suitable goals for her fifth-graders, and she also picks up some useful insights regarding students' current skills and knowledge in both mathematics and in language arts. As the school year unfolds, Melinda administers shorter, more focused pretests at the beginning of any instructional unit destined to last more than three weeks. She typically uses the results of these pre-instruction tests to help her decide whether to retain or discard the potential instructional-unit goals she has tentatively chosen for her students. Because Melinda chooses to preclude her students from taking part in along-the-way classroom assessment during the school year, there is little likelihood that genuine formative assessment could ever transpire in her fifth-grade class.
True
(Refers to the Melinda Stevens scenario above.) Melinda's use of extensive pre-instruction assessment was probably helpful to her at the start of the school year and also as the school year progressed, but she was not using the formative-assessment process.
True
A substantial number of educators regard affective curricular aims as being equal in importance to cognitive curricular aims—or, possibly, of even greater importance.
True
For an affective self-report inventory, very young students can be asked to reply to simple statements—sometimes presented orally—by the use of only two or three agreement-options per statement.
True
Generally speaking, teachers can make defensible decisions about the impact of affectively oriented instruction by arriving at group-focused inferences regarding their students' affective status prior to and following that instruction.
True
One of the more difficult decisions to be faced when constructing a multifocus self-report affective inventory is arriving at a response to the following question: How many items are needed for each affective variable being measured?
True
When teachers administer an affective assessment to their students early in an instructional program and intend to administer the same or a similar assessment to their students later, the assessments often have a substantial impact on the teacher's instruction.
True
Whenever self-report inventories are employed to measure students' affective dispositions, students' perceptions that their responses are anonymous are typically far more important than is "actual" anonymity.
True
Whereas most cognitively oriented classroom assessments attempt to measure students' optimal performances, affectively oriented classroom assessments attempt to get an accurate fix on students' typical dispositions.
True
Students' affect is most often measured in school because evidence of students' affect is believed to help predict how students are apt to behave in particular ways later—when those students' educations have been concluded.
True
One of the best ways to minimize halo effect—and its negative impact on scoring accuracy—is to employ analytic scoring and then urge rubric-users to render separate judgments for each evaluative criterion.
True
One of the most useful ways of determining the instructional dividends of a standardized achievement test is to analyze the manner in which students' performances are reported to the various user groups, for instance, educators, parents, and policymakers.
True
One of the shortcomings of the range as an indicator of the variability in a set of students' test scores is that it is derived from only two raw scores.
True
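A brief hypothetical illustration of that shortcoming: because the range uses only the highest and lowest scores, two score sets with very different variability can share an identical range.

    from statistics import pstdev

    set_a = [0, 49, 50, 51, 100]  # middle scores tightly clustered
    set_b = [0, 2, 50, 98, 100]   # middle scores pushed toward the extremes

    print(max(set_a) - min(set_a), max(set_b) - min(set_b))  # 100 100 -- identical ranges
    print(round(pstdev(set_a), 1), round(pstdev(set_b), 1))  # 31.6 43.8 -- quite different SDs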
One clear-cut dividend of using item-response theory's scale scores when reporting students' performances on standardized tests is that this type of scale score allows for the statistical equating of test forms with dissimilar difficulty levels.
True
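As a loose illustration of why IRT scale scores permit such equating, here is a minimal sketch of the one-parameter (Rasch) model; the ability value and item difficulties are invented, and operational equating is considerably more elaborate than this.

    import math

    def rasch_p(theta, b):
        # Probability that a student of ability theta answers an item of difficulty b correctly.
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    theta = 0.5                     # hypothetical student ability on the common scale
    easy_form = [-1.0, -0.5, 0.0]   # invented item difficulties for an easier test form
    hard_form = [0.5, 1.0, 1.5]     # invented item difficulties for a harder test form

    # Expected raw scores differ across the two forms...
    print(sum(rasch_p(theta, b) for b in easy_form))  # about 2.2 items correct
    print(sum(rasch_p(theta, b) for b in hard_form))  # about 1.1 items correct
    # ...yet both expectations flow from the same theta on the same scale, which is
    # what allows scores from forms of dissimilar difficulty to be statistically equated.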
Significant factors in determining the quality of a diagnostic test are the following: curricular alignment, sufficiency of items, quality of items, and ease of usage.
True
Standardized tests, whether focused on achievement or aptitude, are assessment instruments administered, scored, and interpreted in a standard, predetermined manner.
True
Teachers will find that their classroom assessments are most useful when their earliest thinking about the nature of such assessments is explicitly intended to contribute to an upcoming educational decision to be made by the teacher.
True
The split-and-switch design, if used to collect pretest-to-posttest evidence based on students' test scores, is most appropriate when used with a relatively large class, for instance, a class containing 25 or more students.
True
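For context, the split-and-switch design halves a class, gives each half a different test form before instruction, then switches the forms for post-instruction testing, so growth on each form is judged by comparing one half's pretests with the other half's posttests. A minimal sketch of the assignment logic (hypothetical roster, Python) shows why small classes are problematic: each per-form comparison rests on only about half the class.

    import random

    roster = [f"student_{i}" for i in range(1, 26)]  # hypothetical 25-student class
    random.shuffle(roster)
    half_1 = roster[:len(roster) // 2]   # takes Form A as pretest, Form B as posttest
    half_2 = roster[len(roster) // 2:]   # takes Form B as pretest, Form A as posttest

    # Growth on Form A is gauged by comparing half_1's pretests with half_2's
    # posttests (and vice versa for Form B), so each comparison involves only
    # about half the class -- here, 12 or 13 students.
    print(len(half_1), len(half_2))  # 12 13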
To avoid the excessive time-consumption often associated with performance assessment, it is helpful for teachers to focus their performance tests on measuring only a modest number of particularly significant skills.
True
When scoring students' responses to performance tests, the three common sources of errors contributing to invalid inferences are the scoring scale, the scorers themselves, and the procedures by which scorers employ the scoring scale.
True
When teachers employ skill-focused rubrics to evaluate students' responses to performance tests, it is useful—both evaluatively and instructionally—to briefly label each of a rubric's evaluative criteria.
True
Whenever possible, the following evaluative criteria should be employed when teachers select performance-test tasks: generalizability, authenticity, multiple foci, teachability, fairness, feasibility, and scorability.
True
America's long-standing preoccupation with using tests to arrive at comparative score interpretations was heavily influenced by the considerable success of the Army Alpha, a World War I group-administered aptitude test intended to identify Army recruits who would be successful in the Army's officer-training programs.
True
A district chooses a commercial test to provide information about the social studies skills and knowledge that the students seem to be having difficulty in mastering. A relatively elaborate series of "alignment" studies will be carried out early in the school year in an attempt to provide validity evidence to confirm this instructionally supportive usage. On which of the following sources of validity evidence is it most likely those who are supervising these alignment studies will rely?
Validity evidence based on test content
Webb's alignment procedure has become increasingly popular among American educators. Which one of the following statements is an accurate assertion regarding this widely used procedure for gauging the degree to which a test's items are representatively reflective of a set of curricular aims?
Webb's alignment procedure is, at bottom, a judgmentally based validation procedure centered on the appropriateness of test content.
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Because she is eager for her students to perform well on their twelfth-grade senior mathematics tests (administered by the state department of education), Mrs. Williamson gives students answer keys for all of the test's selected-response items. When her students take the test in the school auditorium, along with all of the school's other twelfth-graders, she urges them to use the answer keys discreetly, and only if necessary. Mrs. Williamson's activities constitute:
a violation of both guidelines
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Christi Jones, a third-grade teacher, was asked by officials of her state department of education two years ago to serve as a member of a "Bias Review Committee" whose task was to consider whether a set of not-yet-final items being prepared for the state's annual accountability tests contained any assessment bias that would preclude their use. Even though Christi realized that her committee's item-by-item reviews would not be the only factor determining whether such still-under-development items would actually be used on the state-administered accountability tests, she was convinced that many of the items she had reviewed would end up on those tests. Accordingly, based on the informal notes she had taken during a two-day meeting of the Bias Review Committee, she always makes certain to give her own third-grade students plenty of guided and independent practice in responding to items similar to those she had reviewed. Christi generates these practice items herself, always trying to make her practice items resemble the specific details of the items she reviewed. Because a new teacher-evaluation system in her district calls for the inclusion of state test scores of each teacher's students, Christi was pleased to see that her own third-graders scored well on this year's state tests. Christi's activities constitute:
a violation of both guidelines
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. The district school board where Todd Blanding teaches high school chemistry has developed a teacher-evaluation process in which fully 40 to 60 percent of a teacher's overall annual evaluation must be based on "measures of student growth." Moreover, because not all of the district's teachers instruct students who are required to complete an external achievement test, the board has stipulated that "up to 100 percent of a teacher's student-growth evidence (40 to 60 percent) can be based on before-instruction and after-instruction classroom assessments." Todd and the other teachers in his high school realize how important it is for their students to score well on classroom tests, particularly any tests being used to collect evidence of pre-instruction to post-instruction growth. Accordingly, each month the high school's staff participates in content-alike learning communities so they can explore together suitable test-preparation alternatives. Based on these monthly explorations, Todd has developed a pretest-to-posttest instructional approach whereby he never provides "item-specific instruction" for more than half of the items he intends to use for any upcoming posttest. ("Item-specific instruction" explicitly explores the nuances of a particular item.) Because at least half of the items on an instructional unit's posttest will not have been discussed in class prior to the posttest, Todd is confident that he can base valid interpretations about students' growth on their pretest-to-posttest performances. Todd's activities constitute:
a violation of both guidelines
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Consuela Sanchez realizes that many of her fourth graders are relatively recent arrivals in the United States, having come from Mexico and Central America. Most of her students speak English as a second language and possess limited experience in taking the kinds of standardized tests used so frequently these days in U.S. schools. Accordingly, Consuela has located a number of English-language standardized tests for her fourth-grade students, and she has photocopied segments of the tests so the introductory pages will be available to all of her students. Once every few weeks, Consuela asks her fourth-graders to spend classroom instructional time trying, as she says, to "make sense" out of these tests. About 20 minutes is devoted to students' reading the tests' directions and then determining if they can understand specifically how they are to complete each of the standardized tests. She makes no copies of any items other than those used in a test's directions. Consuela tells her students, "If you understand exactly what you are to do with a test, you will almost always do better on that test." Students seem to regard these occasional exercises positively, thinking of Consuela's test analyses as something akin to solving a detective mystery. Consuela's activities constitute:
a violation of neither guideline
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Mr. Thompkin teaches mathematics in an urban middle school serving many students from lower-income families. Although Mr. Thompkin personally finds his district's heavy emphasis on educational testing to be excessive, he concedes that his students will benefit from scoring well on the many math tests he is obliged to administer during a school year. Because most of his students cannot afford to enroll in the commercial test-preparation programs that are available throughout his city, Mr. Thompkin entices a psychologist friend of his—a friend who is particularly knowledgeable about test-taking skills—to visit all of his courses one day during the first month of school. The psychologist explains to students not only how to take tests successfully but also how to prepare in advance for any high-stakes testing situations. Mr. Thompkin believes one class period per year that's focused on test-taking rather than learning mathematics is a decent trade-off for his students. Mr. Thompkin's activities constitute:
a violation of neither guideline
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Mrs. Hilliard knows the reading test administered to all state eighth graders contains a set of five fairly lengthy reading selections, each of which is followed by about eight multiple-choice items dealing with such topics as (1) the main idea of the selection or the main idea of its constituent paragraphs, (2) the meaning of technical terms that can be inferred from contextual clues, and (3) the defensibility of post-reading inferential statements linked to the selection. Mrs. Hilliard routinely spends time in her eighth-grade language arts class trying to improve her students' reading comprehension capabilities. She has the students read passages similar to those used in the statewide test, then gives her students a variety of practice tests, including written multiple-choice, true-false, and oral short-answer tests in which, for example, individual students must state aloud what they believe to be the main idea of a specific paragraph in the passage. Mrs. Hilliard's activities constitute:
a violation of neither guideline
Fred Phillips prepares his sixth-grade social studies students to do well on a state-administered social studies examination by having all of his students take part in practice exercises using test items similar to those found on the state examination. Fred tries to replicate the nature of the state examination's items without ever using exactly the same content that is apt to appear on the examination. He weaves his test-preparation activities into his regular social studies instruction so cleverly that most students really don't know they are receiving examination-related preparation. Fred Phillips' activities constitute:
a violation of the educational defensibility guideline
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Srijati is eager to have her fourth-grade students become better "close readers," that is, to be better able to read written materials carefully so that they are capable of, as Srijati says, "sucking all of the meaning out of what they read." Because of reductions in assessment funds, however, Srijati's school district has been obliged to eliminate all constructed-response items assessing students' reading comprehension. All items measuring students' reading comprehension, therefore, must be selected-response types of items and, beyond that, district officials have indicated that only three specific item types will be used in district-developed reading tests. One type asks students to read a brief passage and then select from four alternatives the passage's main idea. Two of the wrong-answer alternatives in these "choose the main idea" items must always be completely unrelated to the passage itself. The remaining wrong-answer alternative must be a reversal of the actual main idea. The other two kinds of district-stipulated types of acceptable items are equally constrained in their structures. So that her students will perform optimally on the district-developed reading tests, Srijati provides "close-reading practice" based exclusively on the three district-approved ways for students to display their reading comprehension. Srijati's fourth-graders really shine when it is time to take the district reading tests. Srijati's activities constitute:
a violation of the educational defensibility guideline
Please read the descriptions of fictitious teachers prepping students for upcoming exams, then select the most accurate characterization of the teacher's activities. Because there is a statewide reading comprehension test that must be passed by all high-school students before they receive state-sanctioned diplomas, Mr. Gillette, a tenth-grade English teacher, spends about four weeks of his regular class sessions getting students ready to pass standardized tests. He devotes one week to each of the following topics: (1) time management in examinations, (2) dealing with test-induced anxiety, (3) making calculated guesses, and (4) trying to think like the test's item writers. Mr. Gillette's students seem appreciative of his efforts. Mr. Gillette's activities constitute:
a violation of the professional ethics guideline