TLA Final

Based on the 2014 edition of the Standards for Educational and Psychological​ Testing, and on common​ sense, which one of the following statements about​ students' test results represents a potentially appropriate phrasing​ that's descriptive of a set of​ students' test​ performances?

Students' scores on the test permit valid interpretations for this test's use.

Measurement specialists assert that validation efforts are preoccupied with the degree to which we use​ students' test performances to support the accuracy of​ score-based inferences. Which of the following best identifies the focus of those​ inferences?

Students' unseen skills and knowledge

Which of the following strategies seems most suitable for teachers to use when trying to detect and eliminate assessment bias in their own​ teacher-made tests?

Teachers should pay particular attention to the possibility that assessment bias may have crept into their​ teacher-made tests and should strive to rely on their best judgments about the presence of such bias on all of their classroom tests—but especially on their most significant classroom assessments.

Webb's alignment procedures have become increasingly popular among American educators. Which one of the following statements is an accurate assertion regarding this widely used procedure for gauging the degree to which a​ test's items are representatively reflective of a set of curricular​ aims?

The alignment process designed by Webb is dominated by a single​ factor, namely, the degree to which different curricular aims are given equal emphasis on the test for which validity evidence is being collected.

Ramon Ruiz is sorting out empty tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify​ Ramon's addition statement that is in error. a. 20 bean cans plus 32 cans​ = 52 cans. b. 43 bean cans plus 18 cans​ = 61 cans. c. 38 bean cans plus 39 cans​ = 76 cans. d. 54 bean cans plus 12 cans​ = 66 cans

The assessment item appears to be biased against Americans of Latino backgrounds.

In certain Christian religions, there are gradients of sinful acts. For example, in the Roman Catholic Church, a venial sin need not be confessed to a priest, whereas a mortal sin must definitely be confessed. Based on a context clue contained in the paragraph above, which of the following statements is most accurate? a. For Catholics, there is no difference in the gravity of mortal or venial sins. b. For Catholics, a mortal sin is more serious than a venial sin. c. For Catholics, a venial sin is more serious than a mortal sin. d. Catholic priests are required to forgive all mortal sins that are confessed.

The assessment item appears to be biased in favor of students who are Roman Catholics.

Amy Johnson has a large collection of Barbie dolls.​ Originally, she had 49.​ Recently, she somehow lost 12 Barbies. How many Barbies does Amy have​ left? (Show your​ work.) a. 37 Barbies b. 61 Barbies c. 27 Barbies

The assessment might offend people who view girls as having much broader interests than playing with dolls.

Directions: To conclude our unit on how to prepare successfully for a debate, please consider carefully the following preparation-focused topics. After doing so, choose one that you regard as most important, to you, and then write a 300-400 word essay describing how best to prepare for whatever topic you chose. Be sure to identify which of the potential topics you have selected. You will have 40 minutes to prepare your essay.
Potential Essay Topics
• Introducing your position and defending it
• Use of evidence during the body of the debate
• Preparing for your opponents' rebuttal

The illustrative item is structured in direct opposition to one of the​ chapter's guidelines for writing essay items.

An anonymously​ completed, self-report item regarding a​ student's values —an item that has no clearly correct answer—is best suited for use in​ an: a. cognitive examination b. affective inventory c. psychomotor skills test

The illustrative item violates a general​ item-writing guideline by providing a blatant grammatical clue to the correct answer.

Following World War​ Two, an international organization intended to maintain world peace was​ established, namely, the United Nations.​ Similarly, after World War One a​ peace-oriented international organization was established. What was the name of that earlier​ organization? ​ _____________________

The illustrative item violates none of the​ chapter's guidelines for writing​ short-answer items.

Validation is the joint responsibility of the test developer and the test​ user, but the accumulation of​ reliability/precision evidence is the exclusive responsibility of the test user.​ (Circle one: T or​ F)

The illustrative​ True/False item violates one of the​ item-category guidelines by including two substantial concepts in a single item.

Which one of the following kinds of validity evidence represents a different category of evidence than the other three kinds of validity evidence​ identified? a. Convergent​ evidence, that​ is, positive relationships between test scores and other measures intended to measure the same or similar constructs b. Discriminant​ evidence, that​ is, positive relationships between test scores and other measures purportedly assessing different constructs c. Alignment evidence d.​ Test-criterion relationship evidence representing the degree to which a test score predicts a relevant variable that is operationally distinct from the test

The item violates at least one of the​ chapter's general​ item-writing guidelines and at least one of the​ chapter's item-writing guidelines for​ binary-choice items.

For the following item, select the option that best illustrates the degree to which the item adheres to the chapter's general item-writing guidelines or the guidelines for specific categories of items. Note that the following item deals with assessment-related content and thus might be regarded as a rudimentary form of "assessment enrichment." Consider whether the following binary-choice item adheres to the item-writing guidelines presented in the text. Presented below is a binary-choice item. Please indicate, by circling the R or W, whether the statement given in the item is right (R) or wrong (W). R or W Absence-of-bias determinations are typically made as a function of judgmental scrutiny and, when possible, empirical analysis.

The item violates none of the​ chapter's guidelines, either the five general guidelines or the specific guidelines for​ binary-choice items.

A or I​ ___ If a teacher wishes to create assessments that truly tap​ students' mastery of higher order cognitive​ challenges, the teacher will not be working within the affective domain.

The item violates the​ item-category guideline discouraging the use of negatives in such items.

When external reviewers of a​ test's content attempt to judge how well a​ test's items mesh with a specified collection of curricular​ aims, which one of the following pairs of alignment indicators should be​ present?

The percentage of the​ test's items judged to measure one or more curricular aims and the percentage of the​ test's items judged to measure none of the specified curricular aims

Validity evidence can be collected from a number of sources.​ Suppose, for​ instance, that a mathematics test has been built by a school​ district's officials to help identify those​ middle-school students who are unlikely to pass a statewide​ eleventh-grade high-school diploma test. The new test will routinely be given to the​ district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive​ application, the new test will be administered to current​ seventh-graders, and the​ seventh-grade tests will also be given to the​ district's current​ eleventh-graders. This will permit the​ eleventh-graders' two sets of test results to be compared. Which of the following best describes this source of validity​ evidence?

The relationship of​ eleventh-graders' performances on the two tests

A Latin teacher in an urban high school​ (that has a long and​ oft-honored history of preparing students for​ college) frequently expresses during faculty meetings her complete disdain for what she calls​ "multiple-guess exams." As part of her annual​ teacher-evaluation evidence, she has been asked by her​ school's principal to present a written description of how she plans to evaluate​ students' responses to her​ constructed-response items. Please consider the following description supplied by the​ teacher, then select from four alternatives the most accurate comment regarding this​ teacher's scoring plans. ​"I plan to score my​ students' essay responses​ holistically, not​ analytically, because I invariably ask students to generate brief essays in which they must incorporate at least half of the new vocabulary terms encountered during the previous week. I supply students with a set of explicit evaluative criteria that I will incorporate in arriving at a​ single, overall judgment of an​ essay's quality.​ Actually, I always​ pre-weight each of these evaluative criteria and post those weights for students in advance of their tackling this task. Because this is a course emphasizing the writing of Latin​ (rather than oral​ Latin), I make it clear to my students—well in advance—that grammar and the other mechanics of writing are very important. When I score​ students' essays, if there is more than one essay per​ test, I score all of Essay One before moving on to Essay Two. Because I want these students to​ become, in a​ sense, Latin​ "journalists," I require that they clearly identify themselves with a byline at the outset of each essay. This scoring​ system, based on nearly 20 years of my teaching Latin to hundreds of our​ school's students, really​ works!"

The​ teacher's approach violates one of the​ chapter's essay-scoring guidelines.

In the space provided in your test​ booklet, please compose a brief editorial​ (of 250 words or​ less) in favor of the school​ district's after-school tutorial program. The intended audience for your position statement consists of those people who routinely read this​ town's weekly newspaper. Because you will have the entire class period to complete this​ task, you may wish to write a draft editorial using the scratch paper provided so that you can then revise the draft before copying your final version into the test booklet. Your grade on this task will contribute 40 percent toward the grade for the​ Six-Week Persuasive Writing Unit.

This illustrative item contains no serious violation of any of the​ chapter's guidelines for writing essay items.

Directions: For each statement in the following cluster of four statements, please indicate whether the statement is true (T) or false (F) by circling the appropriate letter.
In an elaborate effort to ascertain the reliability of a new high-stakes test developed in their district, central-office administrators have calculated the following types of evidence based on a tryout of the test with nearly 2,300 students:
• Internal consistency r = .83
• Test-retest r = .78
• Standard error of measurement = 4.3
T or F (1) The three types of reliability evidence calculated by the central-office staff are essentially interchangeable.
T or F (2) The trivial difference between the test-retest coefficient and the internal consistency coefficient constitutes no cause for alarm.
T or F (3) The test-retest r should never be smaller than a test's internal consistency estimate of reliability.
T or F (4) The standard error of measurement (4.3 in this instance) is derived more from validity evidence than from reliability evidence.

This illustrative item seems to violate none of the chapter's guidelines for constructing such items, that is, the general guidelines, the guidelines for multiple binary-choice items, and the guidelines for binary-choice items.

True​ ___ False___ When determining a​ test's classification​ consistency, there is no need to consider the cut score employed nor that cut​ score's location in the score distribution.

This illustrative item violates the​ item-specific guideline regarding the use of negative statements in a​ binary-choice item.

List X                              List Y
___ (1) matching                    a. Can cover much content
___ (2) binary-choice               b. Can test high-order cognition
___ (3) multiple binary-choice      c. May elicit only low-level knowledge
                                    d. Cannot assess creative responses

This illustrative matching item contains several departures from Chapter​ Six's item-writing guidelines for matching items.

Consider the following illustrative​ binary-choice item. It deals with a​ reliability/precision concept treated in the Standards for Educational and Psychological Testing ​(2014). ​Directions: Please indicate whether the statement below regarding the​ reliability/precision of educational tests is Accurate ​(Circle the​ A) or Inaccurate ​(Circle the​ I). A or I Because the standard error of measurement can be employed to generate confidence intervals around reported​ scores, it is typically more informative than a reliability coefficient. Which of the following statements best describes the illustrative​ item?

This illustrative​ binary-choice item violates none of the general or​ item-category guidelines for this type of​ selected-response item.
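As a side note on the concept inside the item above, the usual textbook relationship between the standard error of measurement (SEM), the score standard deviation, and a reliability coefficient can be sketched in a few lines. This is only an illustration: the function names, the standard deviation of 10, the reliability of .84, and the observed score of 70 are hypothetical values chosen for easy arithmetic, not figures from the flashcards.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Classical formula: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)

def score_band(observed_score, sem, z=1.0):
    """Band of +/- z SEMs around an observed score (z=1.0 is roughly a 68% band)."""
    return (observed_score - z * sem, observed_score + z * sem)

# Hypothetical test: score SD of 10 points, reliability coefficient of .84.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.84)
low, high = score_band(observed_score=70.0, sem=sem)
print(f"SEM = {sem:.1f}; band = {low:.1f} to {high:.1f}")
```

The band is what makes the SEM easy to interpret for an individual student's score, whereas a reliability coefficient describes the test's consistency for a whole group.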

When we encounter a test whose scores are affected by processes that are quite extraneous to the​ test's intended​ purpose, we assert that the test displays which one of the​ following? a. Construct underrepresentation b. Construct deficiency c. Construct corruption d.​ Construct-irrelevant variance e. All of the above

This illustrative item, because it includes an "all of the above" alternative, violates an important item-writing guideline.

Please compose a short essay of between 500 and 1,000 words on the topic: "Soccer Outside the United States." Either use one of our classroom computers or write the essay by hand. Be sure to engage in appropriate prewriting activities, draft an initial version of the essay, and then revise your draft at least once. You will have ninety minutes to complete this task.

This item seems to be biased in favor of children born outside the United States, many of whom may be more familiar with non-U.S. soccer than children born in the United States will be.

What is the chief function of validity evidence when employed to confirm the accuracy of​ score-based interpretations about​ test-takers' status in relation to specific uses of an educational​ test?

To support relevant propositions in a validity argument​ that's marshaled to determine the defensibility of certain​ score-based interpretations

A considerable degree of disagreement can be found among educators regarding the precise meaning of the label​ "performance assessment."

True

A major challenge facing those teachers who personally employ performance tests is the difficulty of drawing valid inferences about​ students' generalized mastery of the​ skill(s) or bodies of knowledge being measured.

True

Although the NAEP assessment frameworks​ are, technically, supposed to guide NAEP​ item-development and not function as curricular frameworks because of the​ long-standing U.S. tradition that the federal government​ shouldn't influence what is taught in​ state-governed public​ schools, teachers can still get good ideas about what to assess and how to assess it from the illustrative NAEP items that are available to the public.

True

Among the most prevalent​ personal-bias errors made when scoring​ students' responses to performance tests are generosity​ errors, severity​ errors, and​ central-tendency errors.

True

Because a classroom​ test's influence on a​ teacher's instructional decision making is one of the most beneficial dividends of classroom​ assessment, a teacher should think through in advance how certain levels of student performances would influence a​ teacher's test-based instructional decisions—and then abandon or revise any tests that have no​ decision-impact linked to their results.

True

Because of such needs as how to grade this​ year's students or whether changes are needed in next​ year's instructional​ procedures, teachers should invariably link their planned classroom assessments explicitly to these sorts of decisions from the earliest moments a classroom test is being conceptualized.

True

Because of​ today's continuing advances in​ technology, it seems certain that creators of performance assessment will increasingly structure their​ computer-based assessments around a wide range of digitally simulated tasks.

True

Because recent years have seen both schools and teachers being evaluated on the basis of​ students' performances on​ high-stakes tests, such as a​ state's annual accountability​ tests, it becomes almost imperative for teachers to determine the degree to which​ what's measured by their classroom assessments can contribute to improved​ students' performances on such significant tests.

True

Because the curricular recommendations of national​ subject-matter associations typically represent the best curricular thinking of the most able​ subject-matter specialists in a given​ field, as teachers try to identify the​ knowledge, skills, and affect to measure in their own classroom​ assessments, the views of such national organizations can often provide helpful curricular insights.

True

If an elementary teacher has designed his instructional system so it centers on the use of​ "catch-up" and​ "enrichment" learning centers​ where, based on​ classroom-assessment performances, students​ self-assign themselves to one of these​ centers, an​ early-on factor to consider is whether the classroom assessments should yield​ norm-referenced or​ criterion-referenced inferences.

True

If appropriately conceived and​ implemented, performance assessment can contribute substantially not only to improving a​ teacher's instructional effectiveness but also to increasing the quality of​ students' learning.

True

If a teacher's students are annually supposed to master an officially approved set of state curricular standards, and a state accountability test aligned with those standards is given each year, teachers should surely try to make sure that what their classroom tests measure is congruent with, or contributory to, what's assessed by such state accountability tests.

True

In recognition of how much time it typically takes for teachers to score​ students' responses to​ constructed-response items, especially those items calling for extended​ responses, an early factor for a teacher to consider when creating a classroom assessment is whether the teacher has sufficient time to properly score​ students' responses to a test containing​ constructed-response items.

True

It is often remarkably helpful for teachers to ask their coworkers to review the potential emphases of under-development classroom assessments because teachers and administrators, especially those who are familiar with what's being taught and the sorts of students to whom it is taught, can provide useful insights regarding what should be assessed, and what shouldn't.

True

Many users of the kinds of scoring rubrics employed to evaluate students' performance-test responses agree that the most significant feature of such a rubric is its set of evaluative criteria.

True

One of the best ways to minimize halo effect—and its negative impact on scoring accuracy—is to employ analytic scoring and then implore​ rubric-users to render separate judgments for each evaluative criterion.

True

Teachers will find that their classroom assessments are most useful when a​ teacher's earliest thinking about the nature of such assessments is explicitly intended to contribute to an upcoming educational decision to be made by the teacher.

True

When scoring​ students' responses to performance​ tests, the three common sources of errors contributing to invalid inferences are the scoring​ scale, the scorers​ themselves, and the procedures by which scorers employ the scoring scale.

True

A district chooses a commercial test to provide information about the social studies skills and knowledge that the students seem to be having difficulty in mastering. A relatively elaborate series of​ "alignment" studies will be carried out early in the school year in an attempt to provide validity evidence to confirm this instructionally supportive usage. On which of the following sources of validity evidence is it most likely those who are supervising these alignment studies will​ rely?

Validity evidence based on internal structure of the social studies test

Which one of the following sources of validity evidence should be of most interest to teachers when evaluating their own​ teacher-made tests?

Validity evidence based on test content

If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a​ high-stakes test's previously determined​ cut-score, which of the following indicators would be most useful for this​ purpose?

a standard error of measurement for the entire test

A dozen​ middle-school mathematics teachers in a large school district have collaborated to create a​ 30-item test of​ students' grasp of what the​ test's developers have labeled​ "Essential Quantitative​ Aptitude," that​ is, students' EQA. All 30 items were constructed in an effort to measure each​ student's EQA. Before using the test with many​ students, however, the developers wish to verify that all or most of its items are functioning​ homogeneously, that​ is, are properly aimed at gauging a​ test-taker's EQA. On which of the following indicators of assessment reliability should the test developers focus their​ efforts?

an​ internal-consistency reliability coefficient

Please assume you are a middle-school English teacher who, despite this chapter's urging that you rarely, if ever, collect reliability evidence for your own tests, stubbornly decides to do so for all of your mid-term and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your classes, you only wish to administer the tests destined for such reliability analyses on one occasion, not two or more. Given this constraint, which of the following coefficients would be most suitable for your reliability-determination purposes?

an​ internal-consistency reliability coefficient

A​ self-report inventory intended to measure secondary​ students' confidence that they are​ "college and career​ ready" has recently been developed by administrators in an urban school district. To collect evidence bearing on the consistency with which this new inventory measures​ students' status with respect to this affective​ disposition, the inventory is administered to nearly 500 students in late January and​ then, a few weeks​ later, in​ mid-February. When​ students' scores on the two administrations have been​ correlated, which one of the following indicators of reliability will have been​ generated?

a​ test-retest reliability coefficient
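A test-retest coefficient of the kind described above is simply the correlation between the same students' scores on the two administrations. A minimal sketch follows, assuming a Pearson correlation is used; the function name and the five pairs of scores are hypothetical and only illustrate the computation.

```python
import math

def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariation = sum((a - mean_first) * (b - mean_second)
                      for a, b in zip(first, second))
    spread_first = math.sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = math.sqrt(sum((b - mean_second) ** 2 for b in second))
    return covariation / (spread_first * spread_second)

# Hypothetical inventory scores for the same five students, a few weeks apart.
january_scores = [31, 42, 25, 38, 47]
february_scores = [33, 40, 27, 35, 48]

test_retest_r = pearson_r(january_scores, february_scores)
print(f"test-retest r = {test_retest_r:.2f}")
```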

One of your​ colleagues, a​ high-school chemistry​ teacher, believes that certain of her students have somehow gained access to the final exams she has always used in her classes. To address what she calls​ "this serious security​ violation," she has created four new versions of all of her major exams—four versions that she regards as​ "equally challenging." She has recently sought your advice regarding what sort of reliability evidence she ought to be collecting regarding these new multiple renditions of her chemistry exams. In this​ situation, which one of the following should you be recommending to​ her?

evidence regarding the​ alternate-form reliability of her several exams

A​ district's new​ computer-administered test of​ students' mastery of​ "composition conventions" has recently been used with their​ district's eleventh- and​ twelfth-grade students. To help judge the consistency with which the test measures​ students' knowledge of the assessed​ conventions, district officials have computed​ Cronbach's coefficient alpha for students who completed this​ brand-new exam. Which of the following kinds of reliability evidence do these alpha coefficients​ represent?

internal consistency
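For reference, coefficient alpha is computed from a single administration by comparing the sum of the item variances with the variance of students' total scores. The sketch below assumes the standard formula, alpha = k/(k-1) × (1 − Σ item variances / total-score variance); the function name and the small score matrix are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(score_rows):
    """Coefficient alpha; each row holds one student's scores on k items."""
    k = len(score_rows[0])
    item_columns = list(zip(*score_rows))          # one tuple of scores per item
    sum_item_variances = sum(pvariance(col) for col in item_columns)
    total_variance = pvariance([sum(row) for row in score_rows])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical scores: four students, three items each scored 1 to 5.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Population variances are used for both the items and the totals here; using sample variances for both would give the same ratio and therefore the same alpha.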

Please imagine that the reading specialists in a​ district's central office have developed what they have labeled a​ "diagnostic reading​ test." You think its​ so-called subscale scores are not diagnostic at all but are simply measuring a single overall dimension you believe to be​ "reading comprehension." In this​ setting, which of the following kinds of reliability evidence would supply the most relevant information related to your disagreement with the reading​ test's developers?

internal-consistency reliability evidence

If a multistate assessment consortium has generated a new performance test of​ students' oral communication skills and wishes to verify that​ students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was​ completed, which of the following kinds of consistency evidence would be most​ appropriate?

test-retest evidence of reliability

Suppose that a​ state's governor has appointed a​ blue-ribbon committee to establish a​ test-based promotion-denial system for reducing the number of​ sixth-grade students who are​ "socially" promoted to the seventh grade. The​ blue-ribbon committee's proposal calls for​ sixth-graders to be able to take a new​ high-stakes promotion exam at any time they wish during their​ grade-six school year. Given these​ circumstances, which one of the following evidences of the new promotion​ exam's measurement consistency should be​ collected?

test-retest reliability

Which one of the following four pairs of validity evidence most frequently revolves exclusively around judgments focused on test​ content?

Developmental-care documentation and external content reviews by nonpartisan judges

In​ general, holistically scored rubrics are more useful for pinpointing​ students' strengths and weaknesses than are analytically scored rubrics.

False

Please indicate whether the following statement related to performance assessment is Accurate (True) or Inaccurate (False).

False

Whenever possible, teachers should attempt to have their assessments focus quite equally on the cognitive, affective, and psychomotor domains because almost all human acts, including students' test-taking, rely to a considerable extent on those three domains of behavior.

False

​"Because terrific pressures are currently obliging teachers to significantly boost their​ students' test​ scores, every teacher needs to understand​ how, in most​ instances, an educational test actually defines the nature of​ what's to be​ taught." This​ is:

One of ​today's reasons for teachers to know about assessment

Please accurately fill in the blanks you find in the statement given below regarding​ "How a bill becomes a​ law." In​ _______, _______ and​ _______ explored what ultimately became the​ _______ section of the northwestern United States with the assistance of a​ native-American guide known as​ _______. ​(Prod. These blank lines MUST be equal in​ length.)

The item satisfies the guideline regarding linear​ equality, yet violates the​ number-of-blanks guideline.

In which one of the following four statements are all of the pronouns used properly? a. I truly enjoyed his telling of the joke. b. We watched him going to the coffee shop. c. We listened to them singing the once-popular, but rarely heard song. d. Dad watched them joking about politicians while approving of it all.

This assessment item does not appear to be biased.

To avoid the excessive​ time-consumption often associated with performance​ assessment, it is helpful for teachers to focus their performance tests on measuring only a modest number of particularly significant skills.

True

Which of the following indices of a​ test's reliability is most often provided by developers of the kinds of standardized tests destined for use with large numbers of​ students?

​internal-consistency reliability coefficients

"Wishing that students will make progress does not guarantee that students actually will do so. And this is why I believe teachers have a fundamental responsibility to monitor their students' progress throughout the school year. I try to administer informal progress-monitoring quizzes every few weeks to make sure my instruction is 'taking.' If my instruction is not working as well as I want it to work, then I can make modifications in my upcoming teaching plans. Assessment-based monitoring of students' progress is so very sensible that it's hard for me to understand why it is not more widely used." This is:

A traditional reason for teachers to know about assessment

​"Teachers need to give classroom assessments in order to assign grades to students indicating how well each student has attained the learning outcomes set for​ them." This​ is:

A traditional reason for teachers to know about assessment

​"We have an enormously diverse collection of students in our​ school, and their levels of achievement are all over the lot.​ Accordingly, when I get a new group of students for my​ third-grade classroom each​ fall, you can bet that during the early days of the school year I assess their entry​ behavior, that​ is, the knowledge and skills those children already possess. It helps me to know where I need to put my instructional energies during the school​ year." This​ is:

A traditional reason for teachers to know about assessment

Please complete the short-answer items below by filling in the blank you will find in each item.
• __________ is the case to be employed with all modifiers of gerunds, definitely including pronouns.
• A __________ infinitive that, in former times, was regarded as a grammatical error is now acceptably encountered in all kinds of writing.

Although several of the​ chapter's item-writing guidelines have been properly​ followed, there is the​ same, rather​ obvious, violation of an​ item-writing guideline in both items.

A recently established​ for-profit measurement company has just published a​ brand-new set of​ "interim tests" intended to measure​ students' progress in attaining certain scientific skills designated as​ "21st century​ competencies." There are four supposedly equivalent versions of each interim​ test, and each of these four versions is to be administered about every two months. Correlation coefficients showing the relationship between every pair of the four versions are made available to users. What kind of coefficient do these​ between-version correlations​ represent?

An​ alternate-form coefficient

A compulsive​ middle-school teacher, even after reading Chapter​ 2's recommendation urging teachers not to collect reliability evidence for their own​ teacher-made tests, perseverates in calculating​ Kuder-Richardson indices for all of his major and minor classroom exams. What kind of reliability indicator is this teacher attempting to​ compute?

An​ internal-consistency reliability coefficient
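A Kuder-Richardson index of the kind mentioned above (KR-20) applies to items scored right (1) or wrong (0). A minimal sketch, assuming the common form KR-20 = k/(k-1) × (1 − Σ p·q / total-score variance) with a population variance for the totals (some texts use the sample variance instead); the function name and the response matrix are hypothetical.

```python
from statistics import pvariance

def kr20(response_rows):
    """KR-20 for dichotomous items; each row is one student's 0/1 responses."""
    n_students = len(response_rows)
    k = len(response_rows[0])
    # p = proportion answering each item correctly; q = 1 - p.
    proportions = [sum(col) / n_students for col in zip(*response_rows)]
    sum_pq = sum(p * (1 - p) for p in proportions)
    total_variance = pvariance([sum(row) for row in response_rows])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical right/wrong responses: five students, four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(responses):.2f}")
```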

Directions: Remembering the class discussions of America's current immigration issues, please provide a brief essay on each of the issues cited below. You will have a full 50-minute class period to complete this examination, and you should divide your essay-writing efforts equally between the two topics. In grading your twin essays, equal weight will be given to each essay. Remember, compose two clear essays, one for each issue.
Your Two Essay Topics
1. Why would some form of "amnesty" for illegal aliens be a helpful solution to at least part of today's U.S. immigration problems?
2. Why would some form of "amnesty" for illegal aliens be a disastrous solution to today's U.S. immigration problems?

At least one of the​ chapter's guidelines has been explicitly followed in the illustrative item.

If a​ teacher's students include children with disabilities or children who are English Language​ Learners, which one of the following three assertions about assessment bias is most​ defensible?

Because assessment bias erodes the validity of inferences derivative from​ students' test​ performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.

Why do some members of the measurement community prefer to use the phrase​ "absence-of-bias" rather than​ "assessment bias" when quantitatively reporting the degree to which an educational test appears to be​ biased?

Because both reliability and​ validity, two key attributes of educational​ tests, are​ positive, "to be​ sought" qualities, so too is​ "absence-of-bias" a positive quality to be sought in educational tests.

"I was quite surprised when our​ state's department of education insisted that each of the​ state's teachers collect accurate evidence of their​ students' growth because such evidence was to be used in evaluating all of the​ state's teachers. I​ have, for my entire​ career, collected pretest and posttest evidence of my​ students' achievement status because this helps me—irrespective of what the state wants me to do—determine which​ changes, if​ any, are needed during next​ year's instruction." This​ is:

Both a traditional reason and one of​ today's reasons for teachers to know about assessment

Only one of the following statements about a​ test's classification consistency is accurate. Select the accurate statement regarding classification consistency.

Classification consistency indicators represent the proportion of students classified identically on two testing occasions.
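
The proportion described in this answer can be sketched in a few lines. The scores, the cut score of 60, and the function name below are hypothetical; the sketch assumes a simple pass/fail classification based on a single cut score.

```python
def classification_consistency(first, second, cut_score):
    """Proportion of students classified the same way on both occasions."""
    same = sum(
        (a >= cut_score) == (b >= cut_score)
        for a, b in zip(first, second)
    )
    return same / len(first)


# Hypothetical scores for five students on two testing occasions.
occasion_1 = [70, 55, 80, 62, 48]
occasion_2 = [72, 61, 78, 58, 50]

print(classification_consistency(occasion_1, occasion_2, cut_score=60))
```

Here three of the five students land on the same side of the cut score both times, so the indicator is 0.6, even though every student's score shifted a little between occasions.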

Suppose that the developers of a new science achievement test had inadvertently laden their​ test's items with​ gender-based stereotypes regarding the role of women in science​ and, when the new test was​ given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in​ students' scores?

Construct-irrelevant variance

Thinking back over the mathematics lessons and homework assignments that you received during the past 12​ weeks, what mathematical conclusions can you​ draw? Describe those conclusions in no more than 300​ words, written by hand on the​ test-booklets provided or as a printed copy of your conclusions composed on one of our classroom computers.

Despite its adherence to one of the​ chapter's item-writing guidelines for essay​ items, the shoddy depiction of a​ student's task renders the item dysfunctional.

Assume a​ state's education authorities have recently established a policy​ that, in order for students to be promoted to the next grade​ level, those students must pass a​ state-supervised English and language arts​ (ELA) exam. Administered near the close of Grades​ three, six, and​ eight, the three new​ grade-level exams are intended to determine a​ student's mastery of the official​ state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these​ "promotion-denial" exams, which one of the following sources of validity evidence are they likely to rely on most​ heavily?

Evidence based on test content

Although the way a​ state's public schools are run is up to officials of that​ state, not the federal​ government, the U.S. Supreme Court has ruled that​ state-taught students must still be granted their constitutionally guaranteed​ rights, and this means that teachers should be guided about​ classroom-assessment coverage by the U.S. Constitution.

False

Because the National Assessment of Educational Progress​ (NAEP) is widely employed as a​ "grade-promotion" and​ "diploma-denial" exam for individual​ students, teachers whose students take NAEP tests should familiarize themselves with the content in NAEP assessment frameworks to identify potential emphases for classroom assessments.

False

Because​ parents' preferences regarding what their children should be learning are not only motivationally useful for teachers to employ but also constitute significant curricular guidance for​ educators, teachers should strive to incorporate​ parents' curricular opinions in all of their classroom assessments.

False

Because​ students' growth in their mastery of cognitive skills and knowledge is such a patently important factor by which to evaluate the success of not only​ schools, but also​ teachers, classroom assessments should focus exclusively on measuring​ students' cognitive status.

False

Even though teachers should not take away too much instructional time because of their classroom​ assessments, the number of assessment targets addressed by any classroom test should still be numerous and​ wide-ranging so that more curricular content can be covered.

False

If a teacher decides to seek advice​ from, say, a group of several teacher colleagues regarding the appropriateness of the content for the​ teacher's planned classroom​ assessment, professional ethics demand that the curricular counsel of those colleagues must be accepted.

False

If a​ state's education officials have endorsed the Common Core State​ Standards, but have chosen to create their​ state's own accountability tests to measure those standards​ (instead of using tests built by a multistate assessment​ consortium), it is still sensible for a teacher in that state to seek​ test-construction guidance from​ what's measured by​ consortium-created tests.

False

Information about how to differentiate the quality of​ students' responses to​ performance-test tasks should be supplied for a minimum of at least half of a​ rubric's evaluative criteria.

False

Performance​ testing, because of its requisite reliance on sometimes flawed human​ scoring, should be chiefly restricted to measuring​ students' mastery of​ lower-order cognitive skills.

False

Teachers who rely chiefly on hypergeneral rubrics are most likely to spur students to acquire a generalized mastery of whatever skills are being assessed by the performance tests involved.

False

The most effective way to construct rubrics for efficient and accurate scoring of​ students' responses to performance tests is to build tests that can be scored simultaneously using analytic and holistic evaluative approaches.

False

​_____________ is a good​ one-word description for​ commas, periods, question​ marks, and colons.

For young students such as these third​ graders, direct questions should be used instead of incomplete statements—so the illustrative item violates an​ item-writing guideline for​ short-answer items.

Which of the following represents the most appropriate strategy by which to support the validity of​ score-based interpretations for specific​ uses?

Generation of an​ evidence-laden validity argument in support of a particular​ usage-specified score interpretation

The relationship between the degree to which an educational test is biased and the​ test's disparate impact on certain groups of learners is an important one. Which of the following statements best captures the nature of this​ relationship?

If an educational assessment displays a disparate impact on different groups of​ test-takers, it may or may not be biased.

What are the two major causes of assessment bias we encounter in typical educational​ tests?

Inappropriate vocabulary and unfamiliar types of test items

An​ independent, for-profit measurement firm has recently published what the​ firm's promotional literature claims to be​ "an instructionally​ diagnostic" interim test in mathematics. Different forms of the new test are to be administered to students every two or three months. A​ student's results are reported as a​ total, all-encompassing score and also as five​ "strands" that are advertised as​ "distinctive and​ diagnostic." Your​ district's administrators are deciding whether to purchase copies of this new test. Which one of the following would be the most appropriate source of validity evidence for the newly published​ test?

Internal structure evidence

​"Just as physicians need to know about​ patients' blood pressure and what it​ indicates, teachers need to know about educational testing. It is simply part of what a solid educational professional needs to​ understand." This​ is:

Neither a traditional reason nor one of​ today's reasons for teachers to know about assessment

​"Leaders of both the National Education Association and the American Federation of Teachers have strongly endorsed the more frequent assessment of students as a way to better educate the​ nation's children.​ Accordingly, current teachers need to know about assessment fundamentals before they try to teach their​ students." This​ is:

Neither a traditional reason nor one of​ today's reasons for teachers to know about assessment

​ "I think every teacher has a professional obligation to make sure their school is accurately​ evaluated, but also to assure that each teacher in the school is accurately evaluated.​ Moreover, when appraising schools or​ teachers, most people rely on​ students' test performances. So it is abundantly clear that teachers must understand​ what's going on when kids are​ tested." This​ is:

One of ​today's reasons for teachers to know about assessment

"Just a year ago, the voters in our school district voted favorably in a huge school-levy election that brought in substantial tax dollars for our schools. Most of the district's teachers are convinced that this positive support for the schools was based on our schools' consistently high rankings on the state's annual accountability tests." This is:

One of ​today's reasons for teachers to know about assessment

​"When I plan a new unit of instruction for my​ fourth-grade students, I always—and I mean always—first create the​ end-of-unit assessments​ I'll be using. By doing​ so, I acquire a much more clear idea of where I am heading​ instructionally, and thereby help my​ lesson-planning immensely." This​ is:

One of ​today's reasons for teachers to know about assessment

When teachers in this school score their​ students' responses to essay​ items, those teachers should always​ (1) make a preliminary judgment about how much importance should be assigned to the conventions of​ writing, such as​ spelling, (2) decide whether to score holistically or​ analytically, (3) prepare a tentative scoring key prior to actually scoring​ students' responses,​ (4) try to score​ students' responses anonymously without knowing which student supplied which​ response, and​ (5) score a given​ student's responses to all essay items on a test and then move on to the next​ student's responses.

Only one of the​ faculty-approved rules is basically opposed to the Chapter 7 guidelines for scoring​ students' responses to essay items.

Suppose that you and several other teachers in a middle school were trying to construct a new test intended to be predictive of​ high-school students' subsequent scores on the SAT and ACT college admissions exams.​ Moreover, suppose that you were in no particular hurry to assemble validity evidence in support of the accuracy of those inferred predictions. Which one of the following sources of validity evidence would supply the most compelling support for the validity of your anticipated​ predictions?

Predictive validity evidence based on the new​ test's relation to other variables

