TLA Final
Based on the 2014 edition of the Standards for Educational and Psychological Testing, and on common sense, which one of the following statements about students' test results represents a potentially appropriate phrasing that's descriptive of a set of students' test performances?
Students' scores on the test permit valid interpretations for this test's use.
Measurement specialists assert that validation efforts are preoccupied with the degree to which we use students' test performances to support the accuracy of score-based inferences. Which of the following best identifies the focus of those inferences?
Students' unseen skills and knowledge
Which of the following strategies seems most suitable for teachers to use when trying to detect and eliminate assessment bias in their own teacher-made tests?
Teachers should pay particular attention to the possibility that assessment bias may have crept into their teacher-made tests and should strive to rely on their best judgments about the presence of such bias on all of their classroom tests—but especially on their most significant classroom assessments.
Webb's alignment procedures have become increasingly popular among American educators. Which one of the following statements is an accurate assertion regarding this widely used procedure for gauging the degree to which a test's items are representatively reflective of a set of curricular aims?
The alignment process designed by Webb is dominated by a single factor, namely, the degree to which different curricular aims are given equal emphasis on the test for which validity evidence is being collected.
Ramon Ruiz is sorting out empty tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify Ramon's addition statement that is in error. a. 20 bean cans plus 32 cans = 52 cans. b. 43 bean cans plus 18 cans = 61 cans. c. 38 bean cans plus 39 cans = 76 cans. d. 54 bean cans plus 12 cans = 66 cans
The assessment item appears to be biased against Americans of Latino backgrounds.
In certain Christian religions, there are gradients of sinful acts. For example, in the Roman Catholic Church, a venial sin need not be confessed to a priest, whereas a mortal sin must definitely be confessed. Based on a context clue contained in the paragraph above, which of the following statements is most accurate? a. For Catholics, there is no difference in the gravity of mortal or venial sins. b. For Catholics, a mortal sin is more serious than a venial sin. c. For Catholics, a venial sin is more serious than a mortal sin. d. Catholic priests are required to forgive all mortal sins that are confessed.
The assessment item appears to be biased in favor of students who are Roman Catholics.
Amy Johnson has a large collection of Barbie dolls. Originally, she had 49. Recently, she somehow lost 12 Barbies. How many Barbies does Amy have left? (Show your work.) a. 37 Barbies b. 61 Barbies c. 27 Barbies
The assessment might offend people who view girls as having much broader interests than playing with dolls.
Directions: To conclude our unit on how to prepare successfully for a debate, please consider carefully the following preparation-focused topics. After doing so, choose one that you regard as most important—to you—and then write a 300-400 word essay describing how best to prepare for whatever topic you chose. Be sure to identify which of the potential topics you have selected. You will have 40 minutes to prepare your essay. Potential Essay Topics Introducing your position and defending it Use of evidence during the body of the debate Preparing for your opponents' rebuttal
The illustrative item is structured in direct opposition to one of the chapter's guidelines for writing essay items.
An anonymously completed, self-report item regarding a student's values —an item that has no clearly correct answer—is best suited for use in an: a. cognitive examination b. affective inventory c. psychomotor skills test
The illustrative item violates a general item-writing guideline by providing a blatant grammatical clue to the correct answer.
Following World War Two, an international organization intended to maintain world peace was established, namely, the United Nations. Similarly, after World War One a peace-oriented international organization was established. What was the name of that earlier organization? _____________________
The illustrative item violates none of the chapter's guidelines for writing short-answer items.
Validation is the joint responsibility of the test developer and the test user, but the accumulation of reliability/precision evidence is the exclusive responsibility of the test user. (Circle one: T or F)
The illustrative True/False item violates one of the item-category guidelines by including two substantial concepts in a single item.
Which one of the following kinds of validity evidence represents a different category of evidence than the other three kinds of validity evidence identified? a. Convergent evidence, that is, positive relationships between test scores and other measures intended to measure the same or similar constructs b. Discriminant evidence, that is, positive relationships between test scores and other measures purportedly assessing different constructs c. Alignment evidence d. Test-criterion relationship evidence representing the degree to which a test score predicts a relevant variable that is operationally distinct from the test
The item violates at least one of the chapter's general item-writing guidelines and at least one of the chapter's item-writing guidelines for binary-choice items.
For the following item, select the option that best illustrates the degree to which the item adheres to the chapter's general item-writing guidelines or the guidelines for specific categories of items. Note that the following item deals with assessment-related content and thus might be regarded as a rudimentary form of "assessment enrichment." Consider whether the following binary-choice item adheres to the item-writing guidelines presented in the text. Presented below is a binary-choice item. Please indicate, by circling the R or W, whether the statement given in the item is right (R) or wrong (W). R or W Absence-of-bias determinations are typically made as a function of judgmental scrutiny and, when possible, empirical analysis.
The item violates none of the chapter's guidelines, either the five general guidelines or the specific guidelines for binary-choice items.
A or I ___ If a teacher wishes to create assessments that truly tap students' mastery of higher order cognitive challenges, the teacher will not be working within the affective domain.
The item violates the item-category guideline discouraging the use of negatives in such items.
When external reviewers of a test's content attempt to judge how well a test's items mesh with a specified collection of curricular aims, which one of the following pairs of alignment indicators should be present?
The percentage of the test's items judged to measure one or more curricular aims and the percentage of the test's items judged to measure none of the specified curricular aims
Validity evidence can be collected from a number of sources. Suppose, for instance, that a mathematics test has been built by a school district's officials to help identify those middle-school students who are unlikely to pass a statewide eleventh-grade high-school diploma test. The new test will routinely be given to the district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive application, the new test will be administered to current seventh-graders, and the seventh-grade tests will also be given to the district's current eleventh-graders. This will permit the eleventh-graders' two sets of test results to be compared. Which of the following best describes this source of validity evidence?
The relationship of eleventh-graders' performances on the two tests
A Latin teacher in an urban high school (that has a long and oft-honored history of preparing students for college) frequently expresses during faculty meetings her complete disdain for what she calls "multiple-guess exams." As part of her annual teacher-evaluation evidence, she has been asked by her school's principal to present a written description of how she plans to evaluate students' responses to her constructed-response items. Please consider the following description supplied by the teacher, then select from four alternatives the most accurate comment regarding this teacher's scoring plans. "I plan to score my students' essay responses holistically, not analytically, because I invariably ask students to generate brief essays in which they must incorporate at least half of the new vocabulary terms encountered during the previous week. I supply students with a set of explicit evaluative criteria that I will incorporate in arriving at a single, overall judgment of an essay's quality. Actually, I always pre-weight each of these evaluative criteria and post those weights for students in advance of their tackling this task. Because this is a course emphasizing the writing of Latin (rather than oral Latin), I make it clear to my students—well in advance—that grammar and the other mechanics of writing are very important. When I score students' essays, if there is more than one essay per test, I score all of Essay One before moving on to Essay Two. Because I want these students to become, in a sense, Latin "journalists," I require that they clearly identify themselves with a byline at the outset of each essay. This scoring system, based on nearly 20 years of my teaching Latin to hundreds of our school's students, really works!"
The teacher's approach violates one of the chapter's essay-scoring guidelines.
In the space provided in your test booklet, please compose a brief editorial (of 250 words or less) in favor of the school district's after-school tutorial program. The intended audience for your position statement consists of those people who routinely read this town's weekly newspaper. Because you will have the entire class period to complete this task, you may wish to write a draft editorial using the scratch paper provided so that you can then revise the draft before copying your final version into the test booklet. Your grade on this task will contribute 40 percent toward the grade for the Six-Week Persuasive Writing Unit.
This illustrative item contains no serious violation of any of the chapter's guidelines for writing essay items.
Directions: For each statement in the following cluster of four statements, please indicate whether the statement is true (T) or false (F) by circling the appropriate letter. In an elaborate effort to ascertain the reliability of a new high-stakes test developed in their district, central-office administrators have calculated the following types of evidence based on a tryout of the test with nearly 2,300 students: • Internal consistency r = .83 • Test-retest r = .78 • Standard error of measurement = 4.3 T or F (1) The three types of reliability evidence calculated by the central-office staff are essentially interchangeable. T or F (2) The trivial difference between the test-retest coefficient and the internal consistency coefficient constitutes no cause for alarm. T or F (3) The test-retest r should never be smaller than a test's internal consistency estimate of reliability. T or F (4) The standard error of measurement (4.3 in this instance) is derived more from validity evidence than from reliability evidence.
This illustrative item seems to violate none of the chapter's guidelines for constructing such items, that is, the general guidelines, the guidelines for multiple binary-choice items, and the guidelines for binary-choice items.
True ___ False___ When determining a test's classification consistency, there is no need to consider the cut score employed nor that cut score's location in the score distribution.
This illustrative item violates the item-specific guideline regarding the use of negative statements in a binary-choice item.
List X List Y ___ (1) matching a. Can cover much content ___ (2) binary-choice b. Can test high-order cognition ___ (3) multiple binary-choice c. May elicit only low-level knowledge d. Cannot assess creative responses
This illustrative matching item contains several departures from Chapter Six's item-writing guidelines for matching items.
Consider the following illustrative binary-choice item. It deals with a reliability/precision concept treated in the Standards for Educational and Psychological Testing (2014). Directions: Please indicate whether the statement below regarding the reliability/precision of educational tests is Accurate (Circle the A) or Inaccurate (Circle the I). A or I Because the standard error of measurement can be employed to generate confidence intervals around reported scores, it is typically more informative than a reliability coefficient. Which of the following statements best describes the illustrative item?
This illustrative binary-choice item violates none of the general or item-category guidelines for this type of selected-response item.
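As a worked illustration of the confidence-interval idea in the item above, the following is a minimal sketch. It assumes the classical formula SEM = SD × sqrt(1 − reliability). The function names and the sample numbers (a score standard deviation of 10, a reliability of .84, an observed score of 70) are hypothetical and do not come from the source.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical SEM: score standard deviation times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sem, z=1.0):
    """Band of plus or minus z SEMs around an observed score."""
    return (observed - z * sem, observed + z * sem)

# Hypothetical values chosen only for illustration.
sem = standard_error_of_measurement(sd=10.0, reliability=0.84)
low, high = score_band(observed=70.0, sem=sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

With these assumed values the SEM is about 4 score points, so a band of one SEM around an observed 70 runs from roughly 66 to 74. A reliability coefficient alone does not express precision in score points this way.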
When we encounter a test whose scores are affected by processes that are quite extraneous to the test's intended purpose, we assert that the test displays which one of the following? a. Construct underrepresentation b. Construct deficiency c. Construct corruption d. Construct-irrelevant variance e. All of the above
This illustrative item, because it includes an "all of the above" alternative, violates an important item-writing guideline.
Please compose a short essay of between 500 and 1,000 words on the topic: "Soccer Outside the United States." Either use one of our classroom computers or write the essay by hand. Be sure to engage in appropriate prewriting activities, draft an initial version of the essay, and then revise your draft at least once. You will have ninety minutes to complete this task.
This item seems to be biased in favor of children born outside the United States, many of whom may be more familiar with non-U.S. soccer than children born in the United States will be.
What is the chief function of validity evidence when employed to confirm the accuracy of score-based interpretations about test-takers' status in relation to specific uses of an educational test?
To support relevant propositions in a validity argument that's marshaled to determine the defensibility of certain score-based interpretations
A considerable degree of disagreement can be found among educators regarding the precise meaning of the label "performance assessment."
True
A major challenge facing those teachers who personally employ performance tests is the difficulty of drawing valid inferences about students' generalized mastery of the skill(s) or bodies of knowledge being measured.
True
Although the NAEP assessment frameworks are, technically, supposed to guide NAEP item-development and not function as curricular frameworks because of the long-standing U.S. tradition that the federal government shouldn't influence what is taught in state-governed public schools, teachers can still get good ideas about what to assess and how to assess it from the illustrative NAEP items that are available to the public.
True
Among the most prevalent personal-bias errors made when scoring students' responses to performance tests are generosity errors, severity errors, and central-tendency errors.
True
Because a classroom test's influence on a teacher's instructional decision making is one of the most beneficial dividends of classroom assessment, a teacher should think through in advance how certain levels of student performances would influence a teacher's test-based instructional decisions—and then abandon or revise any tests that have no decision-impact linked to their results.
True
Because of such needs as how to grade this year's students or whether changes are needed in next year's instructional procedures, teachers should invariably link their planned classroom assessments explicitly to these sorts of decisions from the earliest moments a classroom test is being conceptualized.
True
Because of today's continuing advances in technology, it seems certain that creators of performance assessment will increasingly structure their computer-based assessments around a wide range of digitally simulated tasks.
True
Because recent years have seen both schools and teachers being evaluated on the basis of students' performances on high-stakes tests, such as a state's annual accountability tests, it becomes almost imperative for teachers to determine the degree to which what's measured by their classroom assessments can contribute to improved students' performances on such significant tests.
True
Because the curricular recommendations of national subject-matter associations typically represent the best curricular thinking of the most able subject-matter specialists in a given field, as teachers try to identify the knowledge, skills, and affect to measure in their own classroom assessments, the views of such national organizations can often provide helpful curricular insights.
True
If an elementary teacher has designed his instructional system so it centers on the use of "catch-up" and "enrichment" learning centers where, based on classroom-assessment performances, students self-assign themselves to one of these centers, an early-on factor to consider is whether the classroom assessments should yield norm-referenced or criterion-referenced inferences.
True
If appropriately conceived and implemented, performance assessment can contribute substantially not only to improving a teacher's instructional effectiveness but also to increasing the quality of students' learning.
True
If a teacher's students are annually supposed to master an officially approved set of state curricular standards, and a state accountability test aligned with those standards is given each year, teachers should surely try to make sure that what their classroom tests measure is congruent with, or contributory to, what's assessed by such state accountability tests.
True
In recognition of how much time it typically takes for teachers to score students' responses to constructed-response items, especially those items calling for extended responses, an early factor for a teacher to consider when creating a classroom assessment is whether the teacher has sufficient time to properly score students' responses to a test containing constructed-response items.
True
It is often remarkably helpful for teachers to ask their coworkers to review the potential emphases of under-development classroom assessments because teachers and administrators, especially those who are familiar with what's being taught and the sorts of students to whom it is taught, can provide useful insights regarding what should be assessed and what shouldn't.
True
Many users of the kinds of scoring rubrics employed to evaluate students' performance-test responses agree that the most significant feature of such a rubric is its set of evaluative criteria.
True
One of the best ways to minimize halo effect—and its negative impact on scoring accuracy—is to employ analytic scoring and then implore rubric-users to render separate judgments for each evaluative criterion.
True
Teachers will find that their classroom assessments are most useful when a teacher's earliest thinking about the nature of such assessments is explicitly intended to contribute to an upcoming educational decision to be made by the teacher.
True
When scoring students' responses to performance tests, the three common sources of errors contributing to invalid inferences are the scoring scale, the scorers themselves, and the procedures by which scorers employ the scoring scale.
True
A district chooses a commercial test to provide information about the social studies skills and knowledge that the students seem to be having difficulty in mastering. A relatively elaborate series of "alignment" studies will be carried out early in the school year in an attempt to provide validity evidence to confirm this instructionally supportive usage. On which of the following sources of validity evidence is it most likely those who are supervising these alignment studies will rely?
Validity evidence based on internal structure of the social studies test
Which one of the following sources of validity evidence should be of most interest to teachers when evaluating their own teacher-made tests?
Validity evidence based on test content
If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a high-stakes test's previously determined cut-score, which of the following indicators would be most useful for this purpose?
a standard error of measurement for the entire test
A dozen middle-school mathematics teachers in a large school district have collaborated to create a 30-item test of students' grasp of what the test's developers have labeled "Essential Quantitative Aptitude," that is, students' EQA. All 30 items were constructed in an effort to measure each student's EQA. Before using the test with many students, however, the developers wish to verify that all or most of its items are functioning homogeneously, that is, are properly aimed at gauging a test-taker's EQA. On which of the following indicators of assessment reliability should the test developers focus their efforts?
an internal-consistency reliability coefficient
Please assume you are a middle-school English teacher who, despite this chapter's urging that you rarely, if ever, collect reliability evidence for your own tests, stubbornly decides to do so for all of your mid-term and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your classes, you only wish to administer the tests destined for such reliability analyses on one occasion, not two or more. Given this constraint, which of the following coefficients would be most suitable for your reliability-determination purposes?
an internal-consistency reliability coefficient
A self-report inventory intended to measure secondary students' confidence that they are "college and career ready" has recently been developed by administrators in an urban school district. To collect evidence bearing on the consistency with which this new inventory measures students' status with respect to this affective disposition, the inventory is administered to nearly 500 students in late January and then, a few weeks later, in mid-February. When students' scores on the two administrations have been correlated, which one of the following indicators of reliability will have been generated?
a test-retest reliability coefficient
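The correlation described in the item above can be sketched as follows. This is a minimal illustration that assumes the test-retest coefficient is computed as a Pearson correlation between the two administrations. The function name and the six pairs of scores are hypothetical and are not data from the source.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inventory scores for the same six students on two occasions.
january = [12, 15, 9, 20, 17, 11]
february = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(january, february)
print(round(test_retest_r, 3))
```

The same calculation, applied to scores from two different versions of a test rather than two occasions, would yield an alternate-form coefficient instead.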
One of your colleagues, a high-school chemistry teacher, believes that certain of her students have somehow gained access to the final exams she has always used in her classes. To address what she calls "this serious security violation," she has created four new versions of all of her major exams—four versions that she regards as "equally challenging." She has recently sought your advice regarding what sort of reliability evidence she ought to be collecting regarding these new multiple renditions of her chemistry exams. In this situation, which one of the following should you be recommending to her?
evidence regarding the alternate-form reliability of her several exams
A district's new computer-administered test of students' mastery of "composition conventions" has recently been used with their district's eleventh- and twelfth-grade students. To help judge the consistency with which the test measures students' knowledge of the assessed conventions, district officials have computed Cronbach's coefficient alpha for students who completed this brand-new exam. Which of the following kinds of reliability evidence do these alpha coefficients represent?
internal consistency
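For readers who want to see what a coefficient alpha calculation involves, here is a minimal sketch. It assumes the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the small response matrix are hypothetical, not drawn from the source.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per student, each row a list of k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 students")
    # Transpose rows (students) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical right (1) / wrong (0) scores for five students on four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Because alpha needs only one administration of the test, it fits situations where students are tested on a single occasion.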
Please imagine that the reading specialists in a district's central office have developed what they have labeled a "diagnostic reading test." You think its so-called subscale scores are not diagnostic at all but are simply measuring a single overall dimension you believe to be "reading comprehension." In this setting, which of the following kinds of reliability evidence would supply the most relevant information related to your disagreement with the reading test's developers?
internal-consistency reliability evidence
If a multistate assessment consortium has generated a new performance test of students' oral communication skills and wishes to verify that students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was completed, which of the following kinds of consistency evidence would be most appropriate?
test-retest evidence of reliability
Suppose that a state's governor has appointed a blue-ribbon committee to establish a test-based promotion-denial system for reducing the number of sixth-grade students who are "socially" promoted to the seventh grade. The blue-ribbon committee's proposal calls for sixth-graders to be able to take a new high-stakes promotion exam at any time they wish during their grade-six school year. Given these circumstances, which one of the following evidences of the new promotion exam's measurement consistency should be collected?
test-retest reliability
Which one of the following four pairs of validity evidence most frequently revolves exclusively around judgments focused on test content?
Developmental-care documentation and external content reviews by nonpartisan judges
In general, holistically scored rubrics are more useful for pinpointing students' strengths and weaknesses than are analytically scored rubrics.
False
Please indicate whether the following statement related to performance assessment is Accurate (True) or Inaccurate (False).
False
Whenever possible, teachers should attempt to have their assessments focus quite equally on the cognitive, affective, and psychomotor domains because almost all human acts, including students' test-taking, rely to a considerable extent on those three domains of behavior.
False
"Because terrific pressures are currently obliging teachers to significantly boost their students' test scores, every teacher needs to understand how, in most instances, an educational test actually defines the nature of what's to be taught." This is:
One of today's reasons for teachers to know about assessment
Please accurately fill in the blanks you find in the statement given below regarding "How a bill becomes a law." In _______, _______ and _______ explored what ultimately became the _______ section of the northwestern United States with the assistance of a native-American guide known as _______. (Prod. These blank lines MUST be equal in length.)
The item satisfies the guideline regarding linear equality, yet violates the number-of-blanks guideline.
In which one of the following four statements are all of the pronouns used properly? a. I truly enjoyed his telling of the joke. b. We watched him going to the coffee shop. c. We listened to them singing the once-popular, but rarely heard, song. d. Dad watched them joking about politicians, while approving of it all.
This assessment item does not appear to be biased.
To avoid the excessive time-consumption often associated with performance assessment, it is helpful for teachers to focus their performance tests on measuring only a modest number of particularly significant skills.
True
Which of the following indices of a test's reliability is most often provided by developers of the kinds of standardized tests destined for use with large numbers of students?
internal-consistency reliability coefficients
"Wishing that students will make progress does not guarantee that students actually will do so. And this is why I believe teachers have a fundamental responsibility to monitor their students' progress throughout the school year. I try to administer informal progress-monitoring quizzes every few weeks to make sure my instruction is "taking." If my instruction is not working as well as I want it to work, then I can make modifications in my upcoming teaching plans. Assessment-based monitoring of students' progress is so very sensible that it's hard for me to understand why it is not more widely used. This is:
A traditional reason for teachers to know about assessment
"Teachers need to give classroom assessments in order to assign grades to students indicating how well each student has attained the learning outcomes set for them." This is:
A traditional reason for teachers to know about assessment
"We have an enormously diverse collection of students in our school, and their levels of achievement are all over the lot. Accordingly, when I get a new group of students for my third-grade classroom each fall, you can bet that during the early days of the school year I assess their entry behavior, that is, the knowledge and skills those children already possess. It helps me to know where I need to put my instructional energies during the school year." This is:
A traditional reason for teachers to know about assessment
Please complete the short-answer items below by filling in the blank you will find in each item. • __________ is the case to be employed with all modifiers of a gerund, definitely including pronouns. • A __________ infinitive that, in former times, was regarded as a grammatical error is now acceptably encountered in all kinds of writing.
Although several of the chapter's item-writing guidelines have been properly followed, there is the same, rather obvious, violation of an item-writing guideline in both items.
A recently established for-profit measurement company has just published a brand-new set of "interim tests" intended to measure students' progress in attaining certain scientific skills designated as "21st century competencies." There are four supposedly equivalent versions of each interim test, and each of these four versions is to be administered about every two months. Correlation coefficients showing the relationship between every pair of the four versions are made available to users. What kind of coefficient do these between-version correlations represent?
An alternate-form coefficient
A compulsive middle-school teacher, even after reading Chapter 2's recommendation urging teachers not to collect reliability evidence for their own teacher-made tests, perseverates in calculating Kuder-Richardson indices for all of his major and minor classroom exams. What kind of reliability indicator is this teacher attempting to compute?
An internal-consistency reliability coefficient
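A Kuder-Richardson index of the kind mentioned above can be sketched as follows. This minimal example assumes the KR-20 formula, k/(k − 1) × (1 − sum of p×q / variance of total scores), where p is the proportion answering an item correctly and q = 1 − p. It applies only to items scored right or wrong. The function name and the response matrix are hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """responses: one row per student, each row a list of 0/1 item scores."""
    n_students = len(responses)
    k = len(responses[0])
    if k < 2 or n_students < 2:
        raise ValueError("need at least 2 items and 2 students")
    # p = proportion correct on each item; q = 1 - p.
    pq_sum = 0.0
    for item in zip(*responses):
        p = sum(item) / n_students
        pq_sum += p * (1 - p)
    # Population variance of total scores matches the p*q item variances.
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_variance)

# Hypothetical right (1) / wrong (0) scores for five students on four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

kr20_value = kr20(responses)
print(round(kr20_value, 2))
```

KR-20 is the special case of coefficient alpha for dichotomously scored items, which is why it counts as internal-consistency evidence.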
Directions: Remembering the class discussions of America's current immigration issues, please provide a brief essay on each of the issues cited below. You will have a full 50-minute class period to complete this examination, and you should divide your essay-writing efforts equally between the two topics. In grading your twin essays, equal weight will be given to each essay. Remember, compose two clear essays—one for each issue. Your Two Essay Topics 1. Why would some form of "amnesty" for illegal aliens be a helpful solution to at least part of today's U.S. immigration problems? 2. Why would some form of "amnesty" for illegal aliens be a disastrous solution to today's U.S. immigration problems?
At least one of the chapter's guidelines has been explicitly followed in the illustrative item.
If a teacher's students include children with disabilities or children who are English Language Learners, which one of the following three assertions about assessment bias is most defensible?
Because assessment bias erodes the validity of inferences derivative from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.
Why do some members of the measurement community prefer to use the phrase "absence-of-bias" rather than "assessment bias" when quantitatively reporting the degree to which an educational test appears to be biased?
Because both reliability and validity, two key attributes of educational tests, are positive, "to be sought" qualities, so too is "absence-of-bias" a positive quality to be sought in educational tests.
"I was quite surprised when our state's department of education insisted that each of the state's teachers collect accurate evidence of their students' growth because such evidence was to be used in evaluating all of the state's teachers. I have, for my entire career, collected pretest and posttest evidence of my students' achievement status because this helps me—irrespective of what the state wants me to do—determine which changes, if any, are needed during next year's instruction." This is:
Both a traditional reason and one of today's reasons for teachers to know about assessment
Only one of the following statements about a test's classification consistency is accurate. Select the accurate statement regarding classification consistency.
Classification consistency indicators represent the proportion of students classified identically on two testing occasions.
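The definition above can be sketched in a few lines. The function name and the pass/fail labels are hypothetical, and the sketch assumes both lists give the same students in the same order.

```python
def classification_consistency(first_occasion, second_occasion):
    """Proportion of students placed in the same category on both occasions."""
    if len(first_occasion) != len(second_occasion) or not first_occasion:
        raise ValueError("need two non-empty lists of equal length")
    same = sum(a == b for a, b in zip(first_occasion, second_occasion))
    return same / len(first_occasion)


# Hypothetical classifications for four students on two testing occasions.
first = ["pass", "fail", "pass", "pass"]
second = ["pass", "fail", "fail", "pass"]
consistency = classification_consistency(first, second)  # 3 of 4 match: 0.75
```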
Suppose that the developers of a new science achievement test had inadvertently laden their test's items with gender-based stereotypes regarding the role of women in science and, when the new test was given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in students' scores?
Construct-irrelevant variance
Thinking back over the mathematics lessons and homework assignments that you received during the past 12 weeks, what mathematical conclusions can you draw? Describe those conclusions in no more than 300 words, written by hand on the test-booklets provided or as a printed copy of your conclusions composed on one of our classroom computers.
Despite its adherence to one of the chapter's item-writing guidelines for essay items, the shoddy depiction of a student's task renders the item dysfunctional.
Assume a state's education authorities have recently established a policy that, in order for students to be promoted to the next grade level, those students must pass a state-supervised English and language arts (ELA) exam. Administered near the close of Grades three, six, and eight, the three new grade-level exams are intended to determine a student's mastery of the official state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these "promotion-denial" exams, which one of the following sources of validity evidence are they likely to rely on most heavily?
Evidence based on test content
Although the way a state's public schools are run is up to officials of that state, not the federal government, the U.S. Supreme Court has ruled that state-taught students must still be granted their constitutionally guaranteed rights, and this means that teachers should be guided about classroom-assessment coverage by the U.S. Constitution.
False
Because the National Assessment of Educational Progress (NAEP) is widely employed as a "grade-promotion" and "diploma-denial" exam for individual students, teachers whose students take NAEP tests should familiarize themselves with the content in NAEP assessment frameworks to identify potential emphases for classroom assessments.
False
Because parents' preferences regarding what their children should be learning are not only motivationally useful for teachers to employ but also constitute significant curricular guidance for educators, teachers should strive to incorporate parents' curricular opinions in all of their classroom assessments.
False
Because students' growth in their mastery of cognitive skills and knowledge is such a patently important factor by which to evaluate the success of not only schools, but also teachers, classroom assessments should focus exclusively on measuring students' cognitive status.
False
Even though teachers should not take away too much instructional time because of their classroom assessments, the number of assessment targets addressed by any classroom test should still be numerous and wide-ranging so that more curricular content can be covered.
False
If a teacher decides to seek advice from, say, a group of several teacher colleagues regarding the appropriateness of the content for the teacher's planned classroom assessment, professional ethics demand that the curricular counsel of those colleagues must be accepted.
False
If a state's education officials have endorsed the Common Core State Standards, but have chosen to create their state's own accountability tests to measure those standards (instead of using tests built by a multistate assessment consortium), it is still sensible for a teacher in that state to seek test-construction guidance from what's measured by consortium-created tests.
False
Information about how to differentiate the quality of students' responses to performance-test tasks should be supplied for at least half of a rubric's evaluative criteria.
False
Performance testing, because of its requisite reliance on sometimes flawed human scoring, should be chiefly restricted to measuring students' mastery of lower-order cognitive skills.
False
Teachers who rely chiefly on hypergeneral rubrics are most likely to spur students to acquire a generalized mastery of whatever skills are being assessed by the performance tests involved.
False
The most effective way to construct rubrics for efficient and accurate scoring of students' responses to performance tests is to build tests that can be scored simultaneously using analytic and holistic evaluative approaches.
False
_____________ is a good one-word description for commas, periods, question marks, and colons.
For young students such as these third graders, direct questions should be used instead of incomplete statements—so the illustrative item violates an item-writing guideline for short-answer items.
Which of the following represents the most appropriate strategy by which to support the validity of score-based interpretations for specific uses?
Generation of an evidence-laden validity argument in support of a particular usage-specified score interpretation
The relationship between the degree to which an educational test is biased and the test's disparate impact on certain groups of learners is an important one. Which of the following statements best captures the nature of this relationship?
If an educational assessment displays a disparate impact on different groups of test-takers, it may or may not be biased.
What are the two major causes of assessment bias we encounter in typical educational tests?
Inappropriate vocabulary and unfamiliar types of test items
An independent, for-profit measurement firm has recently published what the firm's promotional literature claims to be "an instructionally diagnostic" interim test in mathematics. Different forms of the new test are to be administered to students every two or three months. A student's results are reported as a total, all-encompassing score and also as five "strands" that are advertised as "distinctive and diagnostic." Your district's administrators are deciding whether to purchase copies of this new test. Which one of the following would be the most appropriate source of validity evidence for the newly published test?
Internal structure evidence
"Just as physicians need to know about patients' blood pressure and what it indicates, teachers need to know about educational testing. It is simply part of what a solid educational professional needs to understand." This is:
Neither a traditional reason nor one of today's reasons for teachers to know about assessment
"Leaders of both the National Education Association and the American Federation of Teachers have strongly endorsed the more frequent assessment of students as a way to better educate the nation's children. Accordingly, current teachers need to know about assessment fundamentals before they try to teach their students." This is:
Neither a traditional reason nor one of today's reasons for teachers to know about assessment
"I think every teacher has a professional obligation to make sure their school is accurately evaluated, but also to assure that each teacher in the school is accurately evaluated. Moreover, when appraising schools or teachers, most people rely on students' test performances. So it is abundantly clear that teachers must understand what's going on when kids are tested." This is:
One of today's reasons for teachers to know about assessment
"Just a year ago, the voters in our school district voted favorably in a huge school-levy election that brought in substantial tax dollars for our schools. Most of the district's teachers are convinced that this positive support for the schools was based on our schools' consistently high rankings on the state's annual accountability tests." This is:
One of today's reasons for teachers to know about assessment
"When I plan a new unit of instruction for my fourth-grade students, I always—and I mean always—first create the end-of-unit assessments I'll be using. By doing so, I acquire a much more clear idea of where I am heading instructionally, and thereby help my lesson-planning immensely." This is:
One of today's reasons for teachers to know about assessment
When teachers in this school score their students' responses to essay items, those teachers should always (1) make a preliminary judgment about how much importance should be assigned to the conventions of writing, such as spelling, (2) decide whether to score holistically or analytically, (3) prepare a tentative scoring key prior to actually scoring students' responses, (4) try to score students' responses anonymously without knowing which student supplied which response, and (5) score a given student's responses to all essay items on a test and then move on to the next student's responses.
Only one of the faculty-approved rules is basically opposed to the Chapter 7 guidelines for scoring students' responses to essay items.
Suppose that you and several other teachers in a middle school were trying to construct a new test intended to be predictive of high-school students' subsequent scores on the SAT and ACT college admissions exams. Moreover, suppose that you were in no particular hurry to assemble validity evidence in support of the accuracy of those inferred predictions. Which one of the following sources of validity evidence would supply the most compelling support for the validity of your anticipated predictions?
Predictive validity evidence based on the new test's relation to other variables