FINAL EXAM CHAP 2, 3, 4, 5
A recently established for-profit measurement company has just published a brand-new set of "interim tests" intended to measure students' progress in attaining certain scientific skills designated as "21st century competencies." There are four supposedly equivalent versions of each interim test, and each of these four versions is to be administered about every two months. Correlation coefficients showing the relationship between every pair of the four versions are made available to users. What kind of coefficient do these between-version correlations represent?
An alternate-form coefficient
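A minimal sketch of how such between-version coefficients could be computed, assuming the same students took every version. The score lists and the function names (`pearson`, `alternate_form_coefficients`) are hypothetical and are not taken from any publisher's materials.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def alternate_form_coefficients(scores_by_version):
    """Correlate every pair of versions; each value is an alternate-form coefficient."""
    versions = sorted(scores_by_version)
    return {
        (a, b): pearson(scores_by_version[a], scores_by_version[b])
        for a, b in combinations(versions, 2)
    }


# Hypothetical scores: the same five students on four versions of one interim test.
scores = {
    "A": [12, 15, 9, 18, 14],
    "B": [11, 16, 10, 17, 13],
    "C": [13, 14, 8, 19, 15],
    "D": [12, 15, 9, 18, 14],
}

coefficients = alternate_form_coefficients(scores)
for pair, r in coefficients.items():
    print(pair, round(r, 3))
```

Four versions yield six pairwise coefficients, which matches the "every pair" wording in the question.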
A compulsive middle-school teacher, even after reading Chapter 2's recommendation urging teachers not to collect reliability evidence for their own teacher-made tests, perseverates in calculating Kuder-Richardson indices for all of his major and minor classroom exams. What kind of reliability indicator is this teacher attempting to compute?
An internal-consistency reliability coefficient
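For illustration, the Kuder-Richardson formula 20 (KR-20) can be sketched as below. This assumes items scored 1 (right) or 0 (wrong) and uses the population variance of total scores; the function name `kr20` and the response data are hypothetical.

```python
def kr20(responses):
    """KR-20 for a list of students, each a list of 0/1 item scores."""
    n_students = len(responses)
    n_items = len(responses[0])
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least 2 students and 2 items")

    # Population variance of students' total scores (an assumption of this sketch).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    total_variance = sum((t - mean_total) ** 2 for t in totals) / n_students
    if total_variance == 0:
        raise ValueError("total scores do not vary; KR-20 is undefined")

    # Sum of p * q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: five students, four right/wrong items.
exam = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(exam), 3))
```

Because KR-20 needs only one administration of one test, it fits the single-occasion setting described in the question.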
A district's new teacher-evaluation procedure is heavily based on observations of teachers' classroom performances. School-site administrators, along with a small group of recently retired school principals, have been observing the teachers, then supplying evaluations related to teachers' observed instructional effectiveness. When officials of the teachers' union raise a concern about these teacher-evaluators' inconsistencies of judgment when using a district-devised observation form, the district's superintendent asks her staff to collect validity evidence bearing on the teachers' union concern. Which one of the following sources of validity evidence will most likely play a major role in resolving the charge that the classroom-observation evidence is flawed?
Evidence based on response processes
Because the National Assessment of Educational Progress (NAEP) is widely employed as a "grade-promotion" and "diploma-denial" exam for individual students, teachers whose students take NAEP tests should familiarize themselves with the content in NAEP assessment frameworks to identify potential emphases for classroom assessments.
FALSE
Even though teachers should not take away too much instructional time because of their classroom assessments, the number of assessment targets addressed by any classroom test should still be numerous and wide-ranging so that more curricular content can be covered.
FALSE
In recognition of the significant impact a state's official accountability tests can have on what that state's students ought to be learning, it is apparent that a teacher's classroom tests should only measure the very same skills and knowledge that are assessed on the state's accountability tests—and never assess students' mastery of en route skills or bodies of knowledge that teachers might see as contributory to mastering what's measured on the state tests.
FALSE
Whenever possible, teachers should attempt to have their assessments focus quite equally on the cognitive, affective, and psychomotor domains because almost all human acts, including students' test-taking, rely to a considerable extent on those three domains of behavior.
FALSE
One of your colleagues, a high-school chemistry teacher, believes that certain of her students have somehow gained access to the final exams she has always used in her classes. To address what she calls "this serious security violation," she has created four new versions of all of her major exams—four versions that she regards as "equally challenging." She has recently sought your advice regarding what sort of reliability evidence she ought to be collecting regarding these new multiple renditions of her chemistry exams. In this situation, which one of the following should you be recommending to her?
evidence regarding the alternate-form reliability of her several exams
Amy Johnson has a large collection of Barbie dolls. Originally, she had 49. Recently, she somehow lost 12 Barbies. How many Barbies does Amy have left? (Show your work.)
- The assessment might offend people who view girls as having much broader interests than playing with dolls.
If a teacher's students include children with disabilities or children who are English Language Learners, which one of the following three assertions about assessment bias is most defensible?
-Because assessment bias erodes the validity of inferences derivative from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.
Why do some members of the measurement community prefer to use the phrase "absence-of-bias" rather than "assessment bias" when quantitatively reporting the degree to which an educational test appears to be biased?
-Because both reliability and validity, two key attributes of educational tests, are positive, "to be sought" qualities, so too is "absence-of-bias" a positive quality to be sought in educational tests.
Suppose that the developers of a new science achievement test had inadvertently laden their test's items with gender-based stereotypes regarding the role of women in science and, when the new test was given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in students' scores?
-Construct-irrelevant variance
Which one of the following four pairs of validity evidence most frequently revolves exclusively around judgments focused on test content?
-Developmental-care documentation and external content reviews by nonpartisan judges
Assume a state's education authorities have recently established a policy that, in order for students to be promoted to the next grade level, those students must pass a state-supervised English and language arts (ELA) exam. Administered near the close of Grades three, six, and eight, the three new grade-level exams are intended to determine a student's mastery of the official state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these "promotion-denial" exams, which one of the following sources of validity evidence are they likely to rely on most heavily?
-Evidence based on test content
Although the way a state's public schools are run is up to officials of that state, not the federal government, the U.S. Supreme Court has ruled that state-taught students must still be granted their constitutionally guaranteed rights, and this means that teachers should be guided about classroom-assessment coverage by the U.S. Constitution.
-FALSE
Because parents' preferences regarding what their children should be learning are not only motivationally useful for teachers to employ but also constitute significant curricular guidance for educators, teachers should strive to incorporate parents' curricular opinions in all of their classroom assessments.
-FALSE
Because students' growth in their mastery of cognitive skills and knowledge is such a patently important factor by which to evaluate the success of not only schools, but also teachers, classroom assessments should focus exclusively on measuring students' cognitive status.
-FALSE
If a state's education officials have endorsed the Common Core State Standards, but have chosen to create their state's own accountability tests to measure those standards (instead of using tests built by a multistate assessment consortium), it is still sensible for a teacher in that state to seek test-construction guidance from what's measured by consortium-created tests.
-FALSE
If a teacher decides to seek advice from, say, a group of several teacher colleagues regarding the appropriateness of the content for the teacher's planned classroom assessment, professional ethics demand that the curricular counsel of those colleagues must be accepted.
-FALSE
Which of the following represents the most appropriate strategy by which to support the validity of score-based interpretations for specific uses?
-Generation of an evidence-laden validity argument in support of a particular usage-specified score interpretation
The relationship between the degree to which an educational test is biased and the test's disparate impact on certain groups of learners is an important one. Which of the following statements best captures the nature of this relationship?
-If an educational assessment displays a disparate impact on different groups of test-takers, it may or may not be biased.
An independent, for-profit measurement firm has recently published what the firm's promotional literature claims to be "an instructionally diagnostic" interim test in mathematics. Different forms of the new test are to be administered to students every two or three months. A student's results are reported as a total, all-encompassing score and also as five "strands" that are advertised as "distinctive and diagnostic." Your district's administrators are deciding whether to purchase copies of this new test. Which one of the following would be the most appropriate source of validity evidence for the newly published test?
-Internal structure evidence
What are the two major causes of assessment bias we encounter in typical educational tests?
-Offensiveness and unfair penalization
Suppose that you and several other teachers in a middle school were trying to construct a new test intended to be predictive of high-school students' subsequent scores on the SAT and ACT college admissions exams. Moreover, suppose that you were in no particular hurry to assemble validity evidence in support of the accuracy of those inferred predictions. Which one of the following sources of validity evidence would supply the most compelling support for the validity of your anticipated predictions?
-Predictive validity evidence based on the new test's relation to other variables
Based on the 2014 edition of the Standards for Educational and Psychological Testing, and on common sense, which one of the following statements about students' test results represents a potentially appropriate phrasing that's descriptive of a set of students' test performances?
-Students' scores on the test permit valid interpretations for this test's use.
Measurement specialists assert that validation efforts are preoccupied with the degree to which we use students' test performances to support the accuracy of score-based inferences. Which of the following best identifies the focus of those inferences?
-Students' unseen skills and knowledge
Because a classroom test's influence on a teacher's instructional decision making is one of the most beneficial dividends of classroom assessment, a teacher should think through in advance how certain levels of student performances would influence a teacher's test-based instructional decisions—and then abandon or revise any tests that have no decision-impact linked to their results.
-TRUE
Because recent years have seen both schools and teachers being evaluated on the basis of students' performances on high-stakes tests, such as a state's annual accountability tests, it becomes almost imperative for teachers to determine the degree to which what's measured by their classroom assessments can contribute to improved students' performances on such significant tests.
-TRUE
Because the curricular recommendations of national subject-matter associations typically represent the best curricular thinking of the most able subject-matter specialists in a given field, as teachers try to identify the knowledge, skills, and affect to measure in their own classroom assessments, the views of such national organizations can often provide helpful curricular insights.
-TRUE
If a teacher's students are annually supposed to master an officially approved set of state curricular standards, and a state accountability test aligned with those standards is given each year, teachers should surely try to make sure that what their classroom tests measure is congruent with, or contributory to, what's assessed by such state accountability tests.
-TRUE
If an elementary teacher has designed his instructional system so it centers on the use of "catch-up" and "enrichment" learning centers where, based on classroom-assessment performances, students self-assign themselves to one of these centers, an early-on factor to consider is whether the classroom assessments should yield norm-referenced or criterion-referenced inferences.
-TRUE
In recognition of how much time it typically takes for teachers to score students' responses to constructed-response items, especially those items calling for extended responses, an early factor for a teacher to consider when creating a classroom assessment is whether the teacher has sufficient time to properly score students' responses to a test containing constructed-response items.
-TRUE
Teachers will find that their classroom assessments are most useful when a teacher's earliest thinking about the nature of such assessments is explicitly intended to contribute to an upcoming educational decision to be made by the teacher.
-TRUE
Which of the following strategies seems most suitable for teachers to use when trying to detect and eliminate assessment bias in their own teacher-made tests?
-Teachers should pay particular attention to the possibility that assessment bias may have crept into their teacher-made tests and should strive to rely on their best judgments about the presence of such bias on all of their classroom tests—but especially on their most significant classroom assessments.
Ramon Ruiz is sorting out empty BEAN tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify
-The assessment item appears to be biased against Americans of Latino backgrounds.
In certain Christian religions, there are gradients of sinful acts. For example, in the Roman Catholic Church, a venial sin need not be confessed to a priest, whereas a mortal sin must definitely be confessed. Based on a context clue contained in the paragraph above, which of the following statements is most accurate?
-The assessment item appears to be biased in favor of students who are Roman Catholics.
When external reviewers of a test's content attempt to judge how well a test's items mesh with a specified collection of curricular aims, which one of the following pairs of alignment indicators should be present?
-The degree to which each of a test's items is aligned to one or more of the specified curricular aims and a content-coverage indication representing the proportion of the curricular aims adequately represented by the test's items
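The two indicators named in this answer can be sketched as simple proportions. Everything here is an assumption for illustration: the reviewers' judgments are represented as a mapping from item number to the aims each item was judged to match, and an aim counts as "adequately represented" when at least `min_items` items match it. The function name `alignment_indices` is hypothetical.

```python
def alignment_indices(item_to_aims, curricular_aims, min_items=1):
    """Return (item_alignment, content_coverage) as proportions between 0 and 1.

    item_alignment: share of items judged aligned to one or more specified aims.
    content_coverage: share of aims matched by at least `min_items` items
    (the threshold is an assumption of this sketch, not a fixed standard).
    """
    aims = set(curricular_aims)
    if not item_to_aims or not aims:
        raise ValueError("need at least one item and one curricular aim")

    items_per_aim = {aim: 0 for aim in aims}
    aligned_items = 0
    for judged_aims in item_to_aims.values():
        matched = set(judged_aims) & aims
        if matched:
            aligned_items += 1
        for aim in matched:
            items_per_aim[aim] += 1

    item_alignment = aligned_items / len(item_to_aims)
    covered = sum(1 for count in items_per_aim.values() if count >= min_items)
    content_coverage = covered / len(aims)
    return item_alignment, content_coverage


# Hypothetical reviewer judgments for a four-item test and four curricular aims.
judgments = {1: ["A"], 2: ["A", "B"], 3: [], 4: ["C"]}
aims = ["A", "B", "C", "D"]
print(alignment_indices(judgments, aims))
print(alignment_indices(judgments, aims, min_items=2))
```

Reporting both numbers matters because a test can have every item aligned to some aim while still leaving several aims unmeasured.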
Validity evidence can be collected from a number of sources. Suppose, for instance, that a mathematics test has been built by a school district's officials to help identify those middle-school students who are unlikely to pass a statewide eleventh-grade high-school diploma test. The new test will routinely be given to the district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive application, the new test will be administered to current seventh-graders, and the seventh-grade tests will also be given to the district's current eleventh-graders. This will permit the eleventh-graders' two sets of test results to be compared. Which of the following best describes this source of validity evidence?
-The relationship of eleventh-graders' performances on the two tests
In which one of the following four statements are all of the pronouns used properly?
-This assessment item does not appear to be biased.
Please compose a short essay of between 500 and 1,000 words on the topic: "Soccer Outside the United States." Either use one of our classroom computers or write the essay by hand. Be sure to engage in appropriate prewriting activities, draft an initial version of the essay, and then revise your draft at least once. You will have ninety minutes to complete this task.
-This item seems to be biased in favor of children born outside the United States, many of whom may be more familiar with non-U.S. soccer than children who were born in the United States.
What is the chief function of validity evidence when employed to confirm the accuracy of score-based interpretations about test-takers' status in relation to specific uses of an educational test?
-To support relevant propositions in a validity argument that's marshaled to determine the defensibility of certain score-based interpretations
A district chooses a commercial test to provide information about the social studies skills and knowledge that the students seem to be having difficulty in mastering. A relatively elaborate series of "alignment" studies will be carried out early in the school year in an attempt to provide validity evidence to confirm this instructionally supportive usage. On which of the following sources of validity evidence is it most likely those who are supervising these alignment studies will rely?
-Validity evidence based on test content
Which one of the following sources of validity evidence should be of most interest to teachers when evaluating their own teacher-made tests?
-Validity evidence based on test content
Webb's alignment procedures have become increasingly popular among American educators. Which one of the following statements is an accurate assertion regarding this widely used procedure for gauging the degree to which a test's items are representatively reflective of a set of curricular aims?
-Webb's alignment procedure is, at bottom, a judgmentally based validation procedure centered on the appropriateness of test content.
Only one of the following statements about a test's classification consistency is accurate. Select the accurate statement regarding classification consistency.
Classification consistency indicators represent the proportion of students classified identically on two testing occasions.
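As a rough sketch of this definition, the indicator can be computed by classifying each student on both occasions and counting the matches. The pass/fail rule (score at or above a cut-score), the function name, and the scores below are all hypothetical.

```python
def classification_consistency(first_scores, second_scores, cut_score):
    """Proportion of students classified identically (pass/fail) on two occasions."""
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("need two non-empty, equal-length score lists")
    same = sum(
        (a >= cut_score) == (b >= cut_score)
        for a, b in zip(first_scores, second_scores)
    )
    return same / len(first_scores)


# Hypothetical scores for four students on two testing occasions, cut-score of 22.
occasion_1 = [10, 20, 30, 40]
occasion_2 = [12, 25, 35, 15]
print(classification_consistency(occasion_1, occasion_2, cut_score=22))
```

Here the first and third students receive the same classification both times, while the second and fourth switch categories, so half of the students are classified identically.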
Although the NAEP assessment frameworks are, technically, supposed to guide NAEP item-development and not function as curricular frameworks because of the long-standing U.S. tradition that the federal government shouldn't influence what is taught in state-governed public schools, teachers can still get good ideas about what to assess and how to assess it from the illustrative NAEP items that are available to the public.
TRUE
Because an excessive number of assessment targets in a teacher's classroom assessments can make it difficult to maintain an instructional focus on too many assessable outcomes, it is wiser for teachers to employ grain-sizes calling for a modest number of broad-scope assessment targets than to adopt a large number of small-scope assessment targets.
TRUE
Because of such needs as how to grade this year's students or whether changes are needed in next year's instructional procedures, teachers should invariably link their planned classroom assessments explicitly to these sorts of decisions from the earliest moments a classroom test is being conceptualized.
TRUE
It is often remarkably helpful for teachers to ask their coworkers to review the potential emphases of under-development classroom assessments because teachers and administrators, especially those who are familiar with what's being taught and the sorts of students to whom it is taught, can provide useful insights regarding what should be assessed—and what shouldn't.
TRUE
If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a high-stakes test's previously determined cut-score, which of the following indicators would be most useful for this purpose?
a conditional standard error of measurement (near the cut-score)
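One way a conditional standard error could be estimated is Lord's binomial error model, which applies to number-correct scores on right/wrong items. This is a sketch under that assumption only; operational testing programs may use other methods, and the function name, test length, and cut-score below are hypothetical.

```python
from math import sqrt


def binomial_csem(raw_score, n_items):
    """Conditional SEM at a given number-correct score (Lord's binomial model)."""
    if n_items < 2 or not 0 <= raw_score <= n_items:
        raise ValueError("need n_items >= 2 and 0 <= raw_score <= n_items")
    return sqrt(raw_score * (n_items - raw_score) / (n_items - 1))


# Hypothetical 40-item test with a cut-score of 28 items correct.
cut = 28
csem_at_cut = binomial_csem(cut, 40)
# Students scoring inside this band are the ones most at risk of an
# inconsistent pass/fail decision on a retest.
band = (cut - csem_at_cut, cut + csem_at_cut)
print(round(csem_at_cut, 2), band)
```

The value differs by score level, which is why an estimate taken near the cut-score is more informative for pass/fail decisions than a single standard error for the whole test.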
A dozen middle-school mathematics teachers in a large school district have collaborated to create a 30-item test of students' grasp of what the test's developers have labeled "Essential Quantitative Aptitude," that is, students' EQA. All 30 items were constructed in an effort to measure each student's EQA. Before using the test with many students, however, the developers wish to verify that all or most of its items are functioning homogeneously, that is, are properly aimed at gauging a test-taker's EQA. On which of the following indicators of assessment reliability should the test developers focus their efforts?
an internal-consistency reliability coefficient
Please assume you are a middle-school English teacher who, despite this chapter's urging that you rarely, if ever, collect reliability evidence for your own tests, stubbornly decides to do so for all of your mid-term and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your classes, you only wish to administer the tests destined for such reliability analyses on one occasion, not two or more. Given this constraint, which of the following coefficients would be most suitable for your reliability-determination purposes?
an internal-consistency reliability coefficient
A self-report inventory intended to measure secondary students' confidence that they are "college and career ready" has recently been developed by administrators in an urban school district. To collect evidence bearing on the consistency with which this new inventory measures students' status with respect to this affective disposition, the inventory is administered to nearly 500 students in late January and then, a few weeks later, in mid-February. When students' scores on the two administrations have been correlated, which one of the following indicators of reliability will have been generated?
a test-retest reliability coefficient
A district's new computer-administered test of students' mastery of "composition conventions" has recently been used with the district's eleventh- and twelfth-grade students. To help judge the consistency with which the test measures students' knowledge of the assessed conventions, district officials have computed Cronbach's coefficient alpha for students who completed this brand-new exam. Which of the following kinds of reliability evidence do these alpha coefficients represent?
internal consistency
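A minimal sketch of coefficient alpha, assuming a complete students-by-items table of numeric item scores. Unlike the Kuder-Richardson formulas, alpha also handles items worth more than one point; for right/wrong items it reduces to KR-20. The function name and data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of students, each a list of item scores."""
    n_items = len(item_scores[0])
    if len(item_scores) < 2 or n_items < 2:
        raise ValueError("need at least 2 students and 2 items")

    # Variance of each item across students, and variance of total scores.
    item_variances = [
        pvariance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five students, three items scored 0 to 4 points each.
conventions_test = [
    [4, 3, 4],
    [3, 3, 2],
    [2, 1, 2],
    [1, 2, 1],
    [0, 1, 0],
]
print(round(cronbach_alpha(conventions_test), 3))
```

Alpha rises as the items vary together across students, which is the sense in which it reflects internal consistency.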
Please imagine that the reading specialists in a district's central office have developed what they have labeled a "diagnostic reading test." You think its so-called subscale scores are not diagnostic at all but are simply measuring a single overall dimension you believe to be "reading comprehension." In this setting, which of the following kinds of reliability evidence would supply the most relevant information related to your disagreement with the reading test's developers?
internal-consistency reliability evidence
If a multistate assessment consortium has generated a new performance test of students' oral communication skills and wishes to verify that students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was completed, which of the following kinds of consistency evidence would be most appropriate?
test-retest evidence of reliability
Suppose that a state's governor has appointed a blue-ribbon committee to establish a test-based promotion-denial system for reducing the number of sixth-grade students who are "socially" promoted to the seventh grade. The blue-ribbon committee's proposal calls for sixth-graders to be able to take a new high-stakes promotion exam at any time they wish during their grade-six school year. Given these circumstances, which one of the following evidences of the new promotion exam's measurement consistency should be collected?
test-retest reliability
Which of the following indices of a test's reliability is most often provided by developers of the kinds of standardized tests destined for use with large numbers of students?
internal-consistency reliability coefficients