Ed Measurement & Evaluation

If a​ teacher's students include children with disabilities or children who are English Language​ Learners, which one of the following three assertions about assessment bias is most​ defensible?

Because assessment bias erodes the validity of inferences derived from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.

A dozen​ middle-school mathematics teachers in a large school district have collaborated to create a​ 30-item test of​ students' grasp of what the​ test's developers have labeled​ "Essential Quantitative​ Aptitude," that​ is, students' EQA. All 30 items were constructed in an effort to measure each​ student's EQA. Before using the test with many​ students, however, the developers wish to verify that all or most of its items are functioning​ homogeneously, that​ is, are properly aimed at gauging a​ test-taker's EQA. On which of the following indicators of assessment reliability should the test developers focus their​ efforts?

An​ internal-consistency reliability coefficient
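
A minimal sketch, in Python, of the kind of internal-consistency index the developers might compute, here Cronbach's coefficient alpha, which treats the degree to which items covary as evidence that they are functioning homogeneously. The response matrix, its size, and all values below are invented for illustration and are not from the source.

    # Minimal sketch: Cronbach's coefficient alpha from a student-by-item score
    # matrix. Rows are students, columns are items; a real EQA analysis would
    # use all 30 items and far more examinees.
    responses = [
        [1, 1, 0, 1, 1],
        [1, 0, 1, 1, 0],
        [0, 1, 1, 1, 1],
        [1, 1, 1, 0, 1],
        [0, 0, 0, 1, 0],
    ]

    k = len(responses[0])                     # number of items
    totals = [sum(row) for row in responses]  # each student's total score

    def variance(values):
        """Population variance, as in most textbook presentations of alpha."""
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    alpha = (k / (k - 1)) * (1 - sum(item_variances) / variance(totals))
    print(f"Cronbach's alpha = {alpha:.3f}")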

Please assume you are a​ middle-school English teacher​ who, despite this​ chapter's urging that you​ rarely, if​ ever, collect reliability evidence for your own​ tests, stubbornly decides to do so for all of your​ mid-term and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your​ classes, you only wish to administer the tests destined for such reliability analyses on one​ occasion, not two or more. Given this​ constraint, which of the following coefficients would be most suitable for your​ reliability-determination purposes?

An​ internal-consistency reliability coefficient
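
A sketch of one single-administration option, an odd-even split-half correlation stepped up with the Spearman-Brown formula; coefficient alpha (sketched earlier) is the other common choice. The item scores below are invented for illustration.

    # Minimal sketch: split-half reliability from one testing occasion. Odd- and
    # even-numbered item scores are summed separately, correlated, and the
    # half-test correlation is stepped up with the Spearman-Brown formula.
    responses = [
        [1, 0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1, 0],
        [0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1, 0],
    ]

    odd_half = [sum(row[0::2]) for row in responses]    # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, ...

    def pearson(x, y):
        """Pearson product-moment correlation between two lists of scores."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    r_half = pearson(odd_half, even_half)
    r_full = (2 * r_half) / (1 + r_half)   # Spearman-Brown step-up to full length
    print(f"split-half r = {r_half:.3f}, Spearman-Brown corrected r = {r_full:.3f}")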

Assume a​ state's education authorities have recently established a policy​ that, in order for students to be promoted to the next grade​ level, those students must pass a​ state-supervised English and language arts​ (ELA) exam. Administered near the close of Grades​ three, six, and​ eight, the three new​ grade-level exams are intended to determine a​ student's mastery of the official state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these​ "promotion-denial" exams, which one of the following sources of validity evidence are they likely to rely on most​ heavily?

Evidence based on test content

Which of the following represents the most appropriate strategy by which to support the validity of​ score-based interpretations for specific uses?

Generation of an​ evidence-laden validity argument in support of a particular​ usage-specified score interpretation

If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a​ high-stakes test's previously determined​ cut-score, which of the following indicators would be most useful for this​ purpose?

A conditional standard error of measurement​ (near the​ cut-score)
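
The conditional standard error of measurement can be estimated in several ways; one common, simple estimator is Lord's binomial-error formula, sketched below. The test length and cut-score are invented for illustration.

    # Minimal sketch: Lord's binomial-error estimate of the conditional standard
    # error of measurement (CSEM) at raw scores near a cut-score.
    def conditional_sem(raw_score, num_items):
        """CSEM(x) = sqrt(x * (k - x) / (k - 1)) under the binomial error model."""
        return (raw_score * (num_items - raw_score) / (num_items - 1)) ** 0.5

    k = 65      # hypothetical number of items on the high-stakes test
    cut = 42    # hypothetical, previously determined cut-score

    for score in range(cut - 2, cut + 3):
        print(f"raw score {score}: CSEM = {conditional_sem(score, k):.2f}")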

Imagine that you had rejected the​ chapter's recommendation for teachers not to seek reliability evidence for most of their own classroom tests.​ Moreover, you routinely ask your students to complete each of your exams ​twice, usually two or three days apart. You make sure that nothing takes place during the two or three days separating those​ test-taking occasions that would bear directly on​ students' mastery of​ what's being tested. You then correlate​ students' scores on the two testing occasions. What you hope to determine by these​ two-time testing activities is an answer to the​ question: How stable are my classroom​ assessments? Accordingly, which of the following reliability evidence would you regard as most​ appropriate, and personally​ gratifying, for any of your​ "twice-taken" tests?

A test-retest reliability coefficient
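
A sketch of what that computation amounts to: the Pearson correlation between students' scores on the two testing occasions. The score pairs below are invented for illustration.

    # Minimal sketch: a test-retest (stability) coefficient, computed as the
    # Pearson correlation between the first and second administrations.
    first_occasion = [14, 18, 22, 9, 17, 25, 12, 20]
    second_occasion = [15, 17, 21, 11, 18, 24, 10, 22]

    n = len(first_occasion)
    m1, m2 = sum(first_occasion) / n, sum(second_occasion) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first_occasion, second_occasion))
    s1 = sum((a - m1) ** 2 for a in first_occasion) ** 0.5
    s2 = sum((b - m2) ** 2 for b in second_occasion) ** 0.5

    test_retest_r = cov / (s1 * s2)
    print(f"test-retest r = {test_retest_r:.3f}")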

"Teachers need to give classroom assessments in order to assign grades to students indicating how well each student has attained the learning outcomes set for​ them." This​ is:

A traditional reason for teachers to know about assessment

"We have an enormously diverse collection of students in our​ school, and their levels of achievement are all over the lot.​ Accordingly, when I get a new group of students for my​ third-grade classroom each​ fall, you can bet that during the early days of the school year I assess their entry​ behavior, that​ is, the knowledge and skills those children already possess. It helps me to know where I need to put my instructional energies during the school​ year."

A traditional reason for teachers to know about assessment

​"Wishing that students will make progress does not guarantee that students actually will do so. And this is why I believe teachers have a fundamental responsibility to monitor their​ students' progress throughout the school year. I try to administer informal​ progress-monitoring quizzes every few weeks to make sure my instruction is​ "taking." If my instruction is not working as well as I want it to​ work, then I can make modifications in my upcoming teaching plans.​ Assessment-based monitoring of​ students' progress is so very sensible that​ it's hard for me to understand why it is not more widely used."

A traditional reason for teachers to know about assessment

"Because the curricular recommendations of national​ subject-matter associations typically represent the best curricular thinking of the most able​ subject-matter specialists in a given​ field, as teachers try to identify the​ knowledge, skills, and affect to measure in their own classroom​ assessments, the views of such national organizations can often provide helpful curricular insights."

Accurate

"If an elementary teacher has designed his instructional system so it centers on the use of​ "catch-up" and​ "enrichment" learning centers​ where, based on​ classroom-assessment performances, students​ self-assign themselves to one of these​ centers, an​ early-on factor to consider is whether the classroom assessments should yield​ norm-referenced or​ criterion-referenced inferences."

Accurate

"If a​ teacher's students are annually supposed to master an officially approved set of state curricular standards, and a state accountability test aligned with those standards is given each​ year, teachers should surely try to make sure that what their classroom tests measure is congruent with—or contributory to—​what's assessed by such state accountability tests."

Accurate

"In recognition of how much time it typically takes for teachers to score​ students' responses to​ constructed-response items, especially those items calling for extended​ responses, an early factor for a teacher to consider when creating a classroom assessment is whether the teacher has sufficient time to properly score students' responses to a test containing​ constructed-response items."

Accurate

"It is often remarkably helpful for teachers to ask their coworkers to review the potential emphases of underdevelopment classroom assessments because teachers and​ administrators, especially those who are familiar with​ what's being taught and the sorts of students to whom it is​ taught, can provide useful insights regarding what should be assessed—and what​ shouldn't."

Accurate

"Teachers will find that their classroom assessments are most useful when a​ teacher's earliest thinking about the nature of such assessments is explicitly intended to contribute to an upcoming educational decision to be made by the teacher."

Accurate

A recently established​ for-profit measurement company has just published a​ brand-new set of​ "interim tests" intended to measure​ students' progress in attaining certain scientific skills designated as​ "21st century​ competencies." There are four supposedly equivalent versions of each interim​ test, and each of these four versions is to be administered about every two months. Correlation coefficients showing the relationship between every pair of the four versions are made available to users. What kind of coefficient do these​ between-version correlations​ represent?

Alternate-form coefficient
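
A sketch of how those between-version coefficients could be generated: correlate the scores earned on each pair of forms. The form labels and score lists are invented for illustration, and each pair would ordinarily be based on the same students taking both forms.

    # Minimal sketch: alternate-form coefficients, i.e., correlations between
    # every pair of the four interim-test versions.
    from itertools import combinations

    form_scores = {
        "Form A": [31, 27, 35, 22, 29, 33],
        "Form B": [30, 28, 34, 24, 27, 32],
        "Form C": [29, 25, 36, 21, 30, 31],
        "Form D": [32, 26, 33, 23, 28, 34],
    }

    def pearson(x, y):
        """Pearson product-moment correlation between two lists of scores."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    for f1, f2 in combinations(form_scores, 2):
        r = pearson(form_scores[f1], form_scores[f2])
        print(f"{f1} vs {f2}: alternate-form r = {r:.3f}")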

Suppose that you are an elementary school teacher whose students are tested each spring using a​ state-adopted accountability test in mathematics. This​ year, the state has shifted its annual accountability tests to a new set of exams developed collaboratively by a group of​ "partner-states" during the previous several years. At each grade​ level, one of four available versions of that​ grade's tests may be administered. You have reviewed the technical manual for the new​ 50-item tests, and you are pleased with the new​ test's reliability coefficients reported for students at the same grade level as the grade you currently teach. Although the following indicators are totally​ fictitious, which of the following reliability indicators should have properly triggered the greatest satisfaction on your​ part?

Alternate-form r ​= .69

"I was quite surprised when our​ state's department of education insisted that each of the​ state's teachers collect accurate evidence of their​ students' growth because such evidence was to be used in evaluating all of the​ state's teachers. I​ have, for my entire​ career, collected pretest and posttest evidence of my​ students' achievement status because this helps irrespective of what the state wants me to do determine which​ changes, if​ any, are needed during next​ year's instruction." This​ is:

Both a traditional reason and one of​ today's reasons for teachers to know about assessment

Only one of the following statements about a test's classification consistency is accurate. Select the accurate statement regarding classification consistency.

Classification consistency indicators represent the proportion of students classified identically on two testing occasions.
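
A sketch of that proportion-of-identical-classifications idea, using invented pass/fail decisions from two administrations of the same high-stakes test.

    # Minimal sketch: classification consistency as the proportion of students
    # who receive the same classification (pass/fail) on two testing occasions.
    occasion_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
    occasion_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

    agreements = sum(a == b for a, b in zip(occasion_1, occasion_2))
    consistency = agreements / len(occasion_1)
    print(f"classification consistency = {consistency:.2f}")   # 6 of 8 agree -> 0.75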

The relationship between the degree to which an educational test is biased and the​ test's disparate impact on certain groups of learners is an important one. Which of the following statements best captures the nature of this​ relationship?

If an educational assessment displays a disparate impact on different groups of test-takers, it may or may not be biased.
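
A sketch of how disparate impact might be screened for in practice; the pass counts and the four-fifths threshold (a screening convention borrowed from employment selection) are assumptions for illustration. A flag from a screen like this invites a bias review (judgmental scrutiny of items, differential-item-functioning analyses), but it does not by itself show that the assessment is biased, since the groups may genuinely differ on what the test measures.

    # Minimal sketch: screening for disparate impact with a simple pass-rate
    # ratio. A low ratio flags disparate impact; whether the test is biased is a
    # separate question requiring further review.
    group_results = {
        "group_1": {"passed": 84, "tested": 100},
        "group_2": {"passed": 58, "tested": 100},
    }

    pass_rates = {g: d["passed"] / d["tested"] for g, d in group_results.items()}
    impact_ratio = min(pass_rates.values()) / max(pass_rates.values())

    print(f"pass rates: {pass_rates}")
    print(f"impact ratio = {impact_ratio:.2f} (flagged if below the 0.80 screen)")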

"Because the National Assessment of Educational Progress​ (NAEP) is widely employed as a​ "grade-promotion" and​ "diploma-denial" exam for individual​ students, teachers whose students take NAEP tests should familiarize themselves with the content in NAEP assessment frameworks to identify potential emphases for classroom assessments."

Inaccurate

"Because​ parents' preferences regarding what their children should be learning are not only motivationally useful for teachers to employ but also constitute significant curricular guidance for​ educators, teachers should strive to incorporate​ parents' curricular opinions in all of their classroom assessments."

Inaccurate

"Because​ students' growth in their mastery of cognitive skills and knowledge is such a patently important factor by which to evaluate the success of not only​ schools, but also​ teachers, classroom assessments should focus exclusively on measuring​ students' cognitive status."

Inaccurate

"Even though teachers should not take away too much instructional time because of their classroom​ assessments, the number of assessment targets addressed by any classroom test should still be numerous and​ wide-ranging so that more curricular content can be covered."

Inaccurate

"If a teacher decides to seek advice​ from, say, a group of several teacher colleagues regarding the appropriateness of the content for the​ teacher's planned classroom​ assessment, professional ethics demand that the curricular counsel of those colleagues must be accepted."

Inaccurate

"If a​ state's education officials have endorsed the Common Core State​ Standards, but have chosen to create their​ state's own accountability tests to measure those standards​ (instead of using tests built by a multistate assessment​ consortium), it is still sensible for a teacher in that state to seek​ test-construction guidance from​ what's measured by​ consortium-created tests."

Inaccurate

"Whenever​ possible, teachers should attempt to have their assessments focus quite equally on the​ cognitive, affective, and psychomotor domains because almost all human acts—including ​students' ​test-taking—rely to a considerable extent on those three domains of behavior."

Inaccurate

A district's new computer-administered test of students' mastery of "composition conventions" has recently been used with the district's eleventh- and twelfth-grade students. To help judge the consistency with which the test measures students' knowledge of the assessed conventions, district officials have computed Cronbach's coefficient alpha for students who completed this brand-new exam. Which of the following kinds of reliability evidence do these alpha coefficients represent?

Internal consistency

An​ independent, for-profit measurement firm has recently published what the​ firm's promotional literature claims to be​ "an instructionally​ diagnostic" interim test in mathematics. Different forms of the new test are to be administered to students every two or three months. A​ student's results are reported as a​ total, all-encompassing score and also as five​ "strands" that are advertised as​ "distinctive and​ diagnostic." Your​ district's administrators are deciding whether to purchase copies of this new test. Which one of the following would be the most appropriate source of validity evidence for the newly published​ test?

Internal structure evidence

A compulsive​ middle-school teacher, even after reading Chapter​ 2's recommendation urging teachers not to collect reliability evidence for their own​ teacher-made tests, perseverates in calculating​ Kuder-Richardson indices for all of his major and minor classroom exams. What kind of reliability indicator is this teacher attempting to​ compute?

Internal-consistency reliability coefficient
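
A sketch of one such index, Kuder-Richardson formula 21 (KR-21), the simpler of the two classic KR indices because it requires only the number of items and the mean and variance of total scores; the figures below are invented for illustration.

    # Minimal sketch: KR-21 for a dichotomously scored classroom test, computed
    # from summary statistics rather than an item-by-item matrix.
    k = 40                                       # number of items on the exam
    totals = [28, 33, 25, 36, 30, 22, 31, 27]    # students' total scores

    n = len(totals)
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n   # population variance

    kr21 = (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * variance))
    print(f"KR-21 = {kr21:.3f}")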

Which indices of a test's reliability are most often provided by developers of the kinds of standardized tests destined for use with large numbers of students?

Internal-consistency reliability coefficients

Please imagine that the reading specialists in a​ district's central office have developed what they have labeled a​ "diagnostic reading​ test." You think its​ so-called subscale scores are not diagnostic at all but are simply measuring a single overall dimension you believe to be​ "reading comprehension." In this​ setting, which of the following kinds of reliability evidence would supply the most relevant information related to your disagreement with the reading​ test's developers?

Internal-consistency reliability evidence

"Just as physicians need to know about​ patients' blood pressure and what it​ indicates, teachers need to know about educational testing. It is simply part of what a solid educational professional needs to​ understand." This​ is:

Neither a traditional reason nor one of​ today's reasons for teachers to know about assessment

"Leaders of both the National Education Association and the American Federation of Teachers have strongly endorsed the more frequent assessment of students as a way to better educate the​ nation's children.​ Accordingly, current teachers need to know about assessment fundamentals before they try to teach their​ students."

Neither a traditional reason nor one of​ today's reasons for teachers to know about assessment

What are the two major causes of assessment bias we encounter in typical educational​ tests?

Offensiveness and unfair penalization

"Because terrific pressures are currently obliging teachers to significantly boost their students' test scores, every teacher needs to understand​ how, in most​ instances, an educational test actually defines the nature of what's to be​ taught."

One of ​today's reasons for teachers to know about assessment

"Just a year​ ago, the voters in our school district voted favorably in a huge​ school-levy election that brought in substantial tax dollars for our schools. Most of the​ district's teachers are convinced that this positive support for the schools was based on our​ schools' consistently high rankings on the​ state's annual accountability​ tests."

One of ​today's reasons for teachers to know about assessment

"When I plan a new unit of instruction for my​ fourth-grade students, I always, I mean, first create the​ end-of-unit assessments​ I'll be using. By doing​ so, I acquire a much more clear idea of where I am heading​ instructionally, and thereby help my​ lesson-planning immensely." This​ is:

One of ​today's reasons for teachers to know about assessment

​ "I think every teacher has a professional obligation to make sure their school is accurately​ evaluated, but also to assure that each teacher in the school is accurately evaluated.​ Moreover, when appraising schools or​ teachers, most people rely on​ students' test performances. So it is abundantly clear that teachers must understand​ what's going on when kids are​ tested." This​ is:

One of ​today's reasons for teachers to know about assessment

Measurement specialists assert that validation efforts are preoccupied with the degree to which we use​ students' test performances to support the accuracy of​ score-based inferences. Which of the following best identifies the focus of those​ inferences?

Students' unseen skills and knowledge

Suppose that a​ state's governor has appointed a​ blue-ribbon committee to establish a​ test-based promotion-denial system for reducing the number of​ sixth-grade students who are​ "socially" promoted to the seventh grade. The​ blue-ribbon committee's proposal calls for​ sixth-graders to be able to take a new​ high-stakes promotion exam at any time they wish during their​ grade-six school year. Given these​ circumstances, which one of the following evidences of the new promotion​ exam's measurement consistency should be​ collected?

Test-retest reliability

A self-report inventory intended to measure secondary students' confidence that they are "college and career ready" has recently been developed by administrators in an urban school district. To collect evidence bearing on the consistency with which this new inventory measures students' status with respect to this affective disposition, the inventory is administered to nearly 500 students in late January and then, a few weeks later, in mid-February. When students' scores on the two administrations have been correlated, which one of the following indicators of reliability will have been generated?

Test-retest reliability coefficient

Please review the following item for assessment bias. It was used to assess the basic computation mathematics aims being pursued by an inner-city elementary school's staff in a Midwestern state.

Ramon Ruiz is sorting out empty tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify Ramon's addition statement that is in error.

a. 20 bean cans plus 32 cans = 52 cans.
b. 43 bean cans plus 18 cans = 61 cans.
c. 38 bean cans plus 39 cans = 76 cans.
d. 54 bean cans plus 12 cans = 66 cans.

The assessment item appears to be biased against Americans of Latino backgrounds.
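
Whatever one concludes about offensiveness or unfair penalization, the item's arithmetic can be checked directly; a quick sketch confirming that option c is the addition statement in error (38 + 39 = 77, not 76):

    # Quick check of the four addition statements embedded in the item.
    statements = {
        "a": (20, 32, 52),
        "b": (43, 18, 61),
        "c": (38, 39, 76),
        "d": (54, 12, 66),
    }

    for option, (x, y, claimed) in statements.items():
        verdict = "correct" if x + y == claimed else f"in error (actual sum is {x + y})"
        print(f"option {option}: {x} + {y} = {claimed} is {verdict}")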

Review the following​ reading-comprehension item for assessment bias. In certain Christian​ religions, there are gradients of sinful acts. For​ example, in the Roman Catholic​ Church, a venial sin need not be confessed to a​ priest, whereas a mortal sin must definitely be confessed. Based on a context clue contained in the paragraph​ above, which of the following statements is most​ accurate?

The assessment item appears to be biased in favor of students who are Roman Catholics.

Validity evidence can be collected from a number of sources.​ Suppose, for​ instance, that a mathematics test has been built by a school​ district's officials to help identify those​ middle-school students who are unlikely to pass a statewide​ eleventh-grade high-school diploma test. The new test will routinely be given to the​ district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive​ application, the new test will be administered to current​ seventh-graders, and the​ seventh-grade tests will also be given to the​ district's current​ eleventh-graders. This will permit the​ eleventh-graders' two sets of test results to be compared. Which of the following best describes this source of validity​ evidence?

The relationship of​ eleventh-graders' performances on the two tests
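
One way to summarize this kind of test-criterion (predictive) evidence is to check how well the seventh-grade screening scores separate the eleventh-graders who did and did not pass the diploma test; the scores, the screening cut, and the outcomes below are invented for illustration, and a correlation between the two sets of scores would be the more conventional validity coefficient.

    # Minimal sketch: test-criterion (predictive) evidence, summarized as the
    # proportion of eleventh-graders whose screening-test result matched their
    # diploma-test outcome.
    screening_scores = [12, 25, 18, 31, 9, 22, 27, 15]                  # seventh-grade test
    passed_diploma = [False, True, True, True, False, True, True, False]

    screening_cut = 20    # hypothetical threshold separating "at risk" students

    predicted_pass = [score >= screening_cut for score in screening_scores]
    hits = sum(p == actual for p, actual in zip(predicted_pass, passed_diploma))
    print(f"screening/diploma classification agreement = {hits}/{len(passed_diploma)}")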

What is the chief function of validity evidence when employed to confirm the accuracy of score-based interpretations about​ test-takers' status in relation to specific uses of an educational test?

To support relevant propositions in a validity argument​ that's marshaled to determine the defensibility of certain​ score-based interpretations

One of your​ colleagues, a​ high-school chemistry​ teacher, believes that certain of her students have somehow gained access to the final exams she has always used in her classes. To address what she calls​ "this serious security​ violation," she has created four new versions of all of her major exams—four versions that she regards as​ "equally challenging." She has recently sought your advice regarding what sort of reliability evidence she ought to be collecting regarding these new multiple renditions of her chemistry exams. In this​ situation, which one of the following should you be recommending to​ her?

Evidence regarding the alternate-form reliability of her several exams

Based on the 2014 edition of the Standards for Educational and Psychological​ Testing, and on common​ sense, which one of the following statements about​ students' test results represents a potentially appropriate phrasing​ that's descriptive of a set of​ students' test​ performances?

​"Students' scores on the test permit valid interpretations for this​ test's use."

Suppose that the developers of a new science achievement test had inadvertently laden their​ test's items with​ gender-based stereotypes regarding the role of women in science​ and, when the new test was​ given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in​ students' scores?

​Construct-irrelevant variance

If a multistate assessment consortium has generated a new performance test of​ students' oral communication skills and wishes to verify that​ students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was​ completed, which of the following kinds of consistency evidence would be most​ appropriate?

​Test-retest evidence of reliability

