Ed Measurement & Evaluation
If a teacher's students include children with disabilities or children who are English Language Learners, which one of the following three assertions about assessment bias is most defensible?
Because assessment bias erodes the validity of inferences derived from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.
A dozen middle-school mathematics teachers in a large school district have collaborated to create a 30-item test of students' grasp of what the test's developers have labeled "Essential Quantitative Aptitude," that is, students' EQA. All 30 items were constructed in an effort to measure each student's EQA. Before using the test with many students, however, the developers wish to verify that all or most of its items are functioning homogeneously, that is, are properly aimed at gauging a test-taker's EQA. On which of the following indicators of assessment reliability should the test developers focus their efforts?
An internal-consistency reliability coefficient
Please assume you are a middle-school English teacher who, despite this chapter's urging that you rarely, if ever, collect reliability evidence for your own tests, stubbornly decides to do so for all of your mid-term and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your classes, you only wish to administer the tests destined for such reliability analyses on one occasion, not two or more. Given this constraint, which of the following coefficients would be most suitable for your reliability-determination purposes?
An internal-consistency reliability coefficient
Assume a state's education authorities have recently established a policy that, in order for students to be promoted to the next grade level, those students must pass a state-supervised English and language arts (ELA) exam. Administered near the close of Grades three, six, and eight, the three new grade-level exams are intended to determine a student's mastery of the official state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these "promotion-denial" exams, which one of the following sources of validity evidence are they likely to rely on most heavily?
Evidence based on test content
Which of the following represents the most appropriate strategy by which to support the validity of score-based interpretations for specific uses?
Generation of an evidence-laden validity argument in support of a particular usage-specified score interpretation
If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a high-stakes test's previously determined cut-score, which of the following indicators would be most useful for this purpose?
A conditional standard error of measurement (near the cut-score)
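For readers who want to see the arithmetic behind a standard error of measurement, the sketch below shows the conventional overall SEM formula; a conditional SEM applies the same idea to students scoring at a particular level, such as at or near the cut-score. The function name and the numbers are invented purely for illustration.

```python
import math

def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Overall SEM: the typical spread of a student's observed scores
    around that student's true score, given the test's reliability."""
    return score_sd * math.sqrt(1.0 - reliability)

# Fictitious values: raw-score SD of 8 points, reliability of .90
print(round(standard_error_of_measurement(8.0, 0.90), 2))  # about 2.53 points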
Imagine that you had rejected the chapter's recommendation for teachers not to seek reliability evidence for most of their own classroom tests. Moreover, you routinely ask your students to complete each of your exams twice, usually two or three days apart. You make sure that nothing takes place during the two or three days separating those test-taking occasions that would bear directly on students' mastery of what's being tested. You then correlate students' scores on the two testing occasions. What you hope to determine by these two-time testing activities is an answer to the question: How stable are my classroom assessments? Accordingly, which of the following kinds of reliability evidence would you regard as most appropriate, and personally gratifying, for any of your "twice-taken" tests?
A test-retest reliability coefficient
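As a minimal sketch with made-up scores, a test-retest (stability) coefficient is simply the Pearson correlation between students' scores on the two testing occasions; note that statistics.correlation requires Python 3.10 or later.

```python
from statistics import correlation  # available in Python 3.10+

# Fictitious scores for the same six students on two testing occasions,
# a few days apart, with no intervening instruction on the tested content
first_administration = [14, 18, 22, 25, 30, 12]
second_administration = [15, 17, 23, 24, 29, 13]

# The test-retest (stability) coefficient is the Pearson r between them
print(round(correlation(first_administration, second_administration), 2))
```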
"Teachers need to give classroom assessments in order to assign grades to students indicating how well each student has attained the learning outcomes set for them." This is:
A traditional reason for teachers to know about assessment
"We have an enormously diverse collection of students in our school, and their levels of achievement are all over the lot. Accordingly, when I get a new group of students for my third-grade classroom each fall, you can bet that during the early days of the school year I assess their entry behavior, that is, the knowledge and skills those children already possess. It helps me to know where I need to put my instructional energies during the school year."
A traditional reason for teachers to know about assessment
"Wishing that students will make progress does not guarantee that students actually will do so. And this is why I believe teachers have a fundamental responsibility to monitor their students' progress throughout the school year. I try to administer informal progress-monitoring quizzes every few weeks to make sure my instruction is "taking." If my instruction is not working as well as I want it to work, then I can make modifications in my upcoming teaching plans. Assessment-based monitoring of students' progress is so very sensible that it's hard for me to understand why it is not more widely used."
A traditional reason for teachers to know about assessment
"Because the curricular recommendations of national subject-matter associations typically represent the best curricular thinking of the most able subject-matter specialists in a given field, as teachers try to identify the knowledge, skills, and affect to measure in their own classroom assessments, the views of such national organizations can often provide helpful curricular insights."
Accurate
"If an elementary teacher has designed his instructional system so it centers on the use of "catch-up" and "enrichment" learning centers where, based on classroom-assessment performances, students self-assign themselves to one of these centers, an early-on factor to consider is whether the classroom assessments should yield norm-referenced or criterion-referenced inferences."
Accurate
"If a teacher's students are annually supposed to master an officially approved set of state curricular standards, and a state accountability test aligned with those standards is given each year, teachers should surely try to make sure that what their classroom tests measure is congruent with—or contributory to—what's assessed by such state accountability tests."
Accurate
"In recognition of how much time it typically takes for teachers to score students' responses to constructed-response items, especially those items calling for extended responses, an early factor for a teacher to consider when creating a classroom assessment is whether the teacher has sufficient time to properly score students' responses to a test containing constructed-response items."
Accurate
"It is often remarkably helpful for teachers to ask their coworkers to review the potential emphases of underdevelopment classroom assessments because teachers and administrators, especially those who are familiar with what's being taught and the sorts of students to whom it is taught, can provide useful insights regarding what should be assessed—and what shouldn't."
Accurate
"Teachers will find that their classroom assessments are most useful when a teacher's earliest thinking about the nature of such assessments is explicitly intended to contribute to an upcoming educational decision to be made by the teacher."
Accurate
A recently established for-profit measurement company has just published a brand-new set of "interim tests" intended to measure students' progress in attaining certain scientific skills designated as "21st century competencies." There are four supposedly equivalent versions of each interim test, and each of these four versions is to be administered about every two months. Correlation coefficients showing the relationship between every pair of the four versions are made available to users. What kind of coefficient do these between-version correlations represent?
Alternate-form coefficient
Suppose that you are an elementary school teacher whose students are tested each spring using a state-adopted accountability test in mathematics. This year, the state has shifted its annual accountability tests to a new set of exams developed collaboratively by a group of "partner-states" during the previous several years. At each grade level, one of four available versions of that grade's tests may be administered. You have reviewed the technical manual for the new 50-item tests, and you are pleased with the new test's reliability coefficients reported for students at the same grade level as the grade you currently teach. Although the following indicators are totally fictitious, which of the following reliability indicators should have properly triggered the greatest satisfaction on your part?
Alternate-form r = .69
"I was quite surprised when our state's department of education insisted that each of the state's teachers collect accurate evidence of their students' growth because such evidence was to be used in evaluating all of the state's teachers. I have, for my entire career, collected pretest and posttest evidence of my students' achievement status because this helps irrespective of what the state wants me to do determine which changes, if any, are needed during next year's instruction." This is:
Both a traditional reason and one of today's reasons for teachers to know about assessment
Only one of the following statements about a test's classification consistency is accurate. Select the accurate statement regarding classification consistency.
Classification consistency indicators represent the proportion of students classified identically on two testing occasions.
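A quick, purely illustrative calculation of such a proportion (the classifications below are invented) might look like this:

```python
# Fictitious pass/fail classifications for ten students on two occasions
occasion_1 = ["pass", "pass", "fail", "pass", "fail",
              "pass", "pass", "fail", "pass", "pass"]
occasion_2 = ["pass", "fail", "fail", "pass", "fail",
              "pass", "pass", "pass", "pass", "pass"]

# Classification consistency: the proportion of students placed in the
# same category (pass-pass or fail-fail) on both testing occasions
identical = sum(a == b for a, b in zip(occasion_1, occasion_2))
print(identical / len(occasion_1))  # 0.8 for these made-up data
```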
The relationship between the degree to which an educational test is biased and the test's disparate impact on certain groups of learners is an important one. Which of the following statements best captures the nature of this relationship?
If an educational assessment displays a disparate impact on different groups of test-takers, it may or may not be biased
"Because the National Assessment of Educational Progress (NAEP) is widely employed as a "grade-promotion" and "diploma-denial" exam for individual students, teachers whose students take NAEP tests should familiarize themselves with the content in NAEP assessment frameworks to identify potential emphases for classroom assessments."
Inaccurate
"Because parents' preferences regarding what their children should be learning are not only motivationally useful for teachers to employ but also constitute significant curricular guidance for educators, teachers should strive to incorporate parents' curricular opinions in all of their classroom assessments."
Inaccurate
"Because students' growth in their mastery of cognitive skills and knowledge is such a patently important factor by which to evaluate the success of not only schools, but also teachers, classroom assessments should focus exclusively on measuring students' cognitive status."
Inaccurate
"Even though teachers should not take away too much instructional time because of their classroom assessments, the number of assessment targets addressed by any classroom test should still be numerous and wide-ranging so that more curricular content can be covered."
Inaccurate
"If a teacher decides to seek advice from, say, a group of several teacher colleagues regarding the appropriateness of the content for the teacher's planned classroom assessment, professional ethics demand that the curricular counsel of those colleagues must be accepted."
Inaccurate
"If a state's education officials have endorsed the Common Core State Standards, but have chosen to create their state's own accountability tests to measure those standards (instead of using tests built by a multistate assessment consortium), it is still sensible for a teacher in that state to seek test-construction guidance from what's measured by consortium-created tests."
Inaccurate
"Whenever possible, teachers should attempt to have their assessments focus quite equally on the cognitive, affective, and psychomotor domains because almost all human acts—including students' test-taking—rely to a considerable extent on those three domains of behavior."
Inaccurate
A district's new computer-administered test of students' mastery of "composition conventions" has recently been used with the district's eleventh- and twelfth-grade students. To help judge the consistency with which the test measures students' knowledge of the assessed conventions, district officials have computed Cronbach's coefficient alpha for students who completed this brand-new exam. Which of the following kinds of reliability evidence do these alpha coefficients represent?
Internal consistency
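For those curious about the computation itself, coefficient alpha compares the sum of the individual item variances with the variance of students' total scores. The sketch below uses invented scores and follows the standard formula alpha = [k/(k-1)] x (1 - sum of item variances / total-score variance); the function name and data are assumptions for illustration only.

```python
def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of per-student item-score lists.

    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores)),
    using population (n-denominator) variances throughout.
    """
    k = len(item_scores[0])  # number of items

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    item_variances = [variance([student[i] for student in item_scores])
                      for i in range(k)]
    total_variance = variance([sum(student) for student in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Fictitious data: four students' scores on a three-item quiz
print(round(cronbach_alpha([[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1]]), 2))  # ~0.96
```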
An independent, for-profit measurement firm has recently published what the firm's promotional literature claims to be "an instructionally diagnostic" interim test in mathematics. Different forms of the new test are to be administered to students every two or three months. A student's results are reported as a total, all-encompassing score and also as five "strands" that are advertised as "distinctive and diagnostic." Your district's administrators are deciding whether to purchase copies of this new test. Which one of the following would be the most appropriate source of validity evidence for the newly published test?
Internal structure evidence
A compulsive middle-school teacher, even after reading Chapter 2's recommendation urging teachers not to collect reliability evidence for their own teacher-made tests, perseverates in calculating Kuder-Richardson indices for all of his major and minor classroom exams. What kind of reliability indicator is this teacher attempting to compute?
Internal-consistency reliability coefficient
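As a point of reference, the Kuder-Richardson formula 20 (KR-20) is the special case of coefficient alpha for dichotomously scored (right/wrong) items: KR-20 = [k/(k-1)] x (1 - Σ p_i q_i / σ_X²), where k is the number of items, p_i is the proportion of students answering item i correctly, q_i = 1 - p_i, and σ_X² is the variance of students' total scores.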
Which of the following indices of a test's reliability are most often provided by developers of the kinds of standardized tests destined for use with large numbers of students?
Internal-consistency reliability coefficients
Please imagine that the reading specialists in a district's central office have developed what they have labeled a "diagnostic reading test." You think its so-called subscale scores are not diagnostic at all but are simply measuring a single overall dimension you believe to be "reading comprehension." In this setting, which of the following kinds of reliability evidence would supply the most relevant information related to your disagreement with the reading test's developers?
Internal-consistency reliability evidence
"Just as physicians need to know about patients' blood pressure and what it indicates, teachers need to know about educational testing. It is simply part of what a solid educational professional needs to understand." This is:
Neither a traditional reason nor one of today's reasons for teachers to know about assessment
"Leaders of both the National Education Association and the American Federation of Teachers have strongly endorsed the more frequent assessment of students as a way to better educate the nation's children. Accordingly, current teachers need to know about assessment fundamentals before they try to teach their students."
Neither a traditional reason nor one of today's reasons for teachers to know about assessment
What are the two major causes of assessment bias we encounter in typical educational tests?
Offensiveness and unfair penalization
"Because terrific pressures are currently obliging teachers to significantly boost their students' test scores, every teacher needs to understand how, in most instances, an educational test actually defines the nature of what's to be taught."
One of today's reasons for teachers to know about assessment
"Just a year ago, the voters in our school district voted favorably in a huge school-levy election that brought in substantial tax dollars for our schools. Most of the district's teachers are convinced that this positive support for the schools was based on our schools' consistently high rankings on the state's annual accountability tests."
One of today's reasons for teachers to know about assessment
"When I plan a new unit of instruction for my fourth-grade students, I always, I mean, first create the end-of-unit assessments I'll be using. By doing so, I acquire a much more clear idea of where I am heading instructionally, and thereby help my lesson-planning immensely." This is:
One of today's reasons for teachers to know about assessment
"I think every teacher has a professional obligation to make sure their school is accurately evaluated, but also to assure that each teacher in the school is accurately evaluated. Moreover, when appraising schools or teachers, most people rely on students' test performances. So it is abundantly clear that teachers must understand what's going on when kids are tested." This is:
One of today's reasons for teachers to know about assessment
Measurement specialists assert that validation efforts center on the degree to which students' test performances support the accuracy of score-based inferences. Which of the following best identifies the focus of those inferences?
Students' unseen skills and knowledge
Suppose that a state's governor has appointed a blue-ribbon committee to establish a test-based promotion-denial system for reducing the number of sixth-grade students who are "socially" promoted to the seventh grade. The blue-ribbon committee's proposal calls for sixth-graders to be able to take a new high-stakes promotion exam at any time they wish during their grade-six school year. Given these circumstances, which one of the following kinds of evidence of the new promotion exam's measurement consistency should be collected?
Test-retest reliability
A self-report inventory intended to measure secondary students' confidence that they are "college and career ready" has recently been developed by administrators in an urban school district. To collect evidence bearing on the consistency with which this new inventory measures students' status with respect to this affective disposition, the inventory is administered to nearly 500 students in late January and then, a few weeks later, in mid-February. When students' scores on the two administrations have been correlated, which one of the following indicators of reliability will have been generated?
Test-retest reliability coefficient
Please review the following item for assessment bias. It was used to assess the basic computation mathematics aims being pursued by an inner-city elementary school's staff in a Midwestern state. Ramon Ruiz is sorting out empty tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify Ramon's addition statement that is in error.
a. 20 bean cans plus 32 cans = 52 cans
b. 43 bean cans plus 18 cans = 61 cans
c. 38 bean cans plus 39 cans = 76 cans
d. 54 bean cans plus 12 cans = 66 cans
The assessment item appears to be biased against Americans of Latino backgrounds.
Review the following reading-comprehension item for assessment bias. In certain Christian religions, there are gradients of sinful acts. For example, in the Roman Catholic Church, a venial sin need not be confessed to a priest, whereas a mortal sin must definitely be confessed. Based on a context clue contained in the paragraph above, which of the following statements is most accurate?
The assessment item appears to be biased in favor of students who are Roman Catholics
Validity evidence can be collected from a number of sources. Suppose, for instance, that a mathematics test has been built by a school district's officials to help identify those middle-school students who are unlikely to pass a statewide eleventh-grade high-school diploma test. The new test will routinely be given to the district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive application, the new test will be administered to current seventh-graders, and the seventh-grade tests will also be given to the district's current eleventh-graders. This will permit the eleventh-graders' two sets of test results to be compared. Which of the following best describes this source of validity evidence?
The relationship of eleventh-graders' performances on the two tests
What is the chief function of validity evidence when employed to confirm the accuracy of score-based interpretations about test-takers' status in relation to specific uses of an educational test?
To support relevant propositions in a validity argument that's marshaled to determine the defensibility of certain score-based interpretations
One of your colleagues, a high-school chemistry teacher, believes that certain of her students have somehow gained access to the final exams she has always used in her classes. To address what she calls "this serious security violation," she has created four new versions of all of her major exams—four versions that she regards as "equally challenging." She has recently sought your advice regarding what sort of reliability evidence she ought to be collecting regarding these new multiple renditions of her chemistry exams. In this situation, which one of the following should you be recommending to her?
Evidence regarding the alternate-form reliability of her several exams
Based on the 2014 edition of the Standards for Educational and Psychological Testing, and on common sense, which one of the following statements about students' test results represents a potentially appropriate phrasing that's descriptive of a set of students' test performances?
"Students' scores on the test permit valid interpretations for this test's use."
Suppose that the developers of a new science achievement test had inadvertently laden their test's items with gender-based stereotypes regarding the role of women in science and, when the new test was given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in students' scores?
Construct-irrelevant variance
If a multistate assessment consortium has generated a new performance test of students' oral communication skills and wishes to verify that students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was completed, which of the following kinds of consistency evidence would be most appropriate?
Test-retest evidence of reliability