FINAL EXAM CHAP 2, 3, 4, 5

A recently established​ for-profit measurement company has just published a​ brand-new set of​ "interim tests" intended to measure​ students' progress in attaining certain scientific skills designated as​ "21st century​ competencies." There are four supposedly equivalent versions of each interim​ test, and each of these four versions is to be administered about every two months. Correlation coefficients showing the relationship between every pair of the four versions are made available to users. What kind of coefficient do these​ between-version correlations​ represent?

An​ alternate-form coefficient
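As an illustration of how such between-version coefficients might be produced, the sketch below correlates every pair of four versions taken by the same students. The scores, the version labels, and the function names are hypothetical; the publisher's actual procedure is not described in the card.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def pairwise_correlations(scores_by_version):
    """Return {(version_a, version_b): r} for every pair of versions."""
    versions = sorted(scores_by_version)
    return {
        (a, b): pearson_r(scores_by_version[a], scores_by_version[b])
        for a, b in combinations(versions, 2)
    }


# Hypothetical scores of the same five students on four test versions.
scores = {
    "A": [12, 15, 9, 20, 17],
    "B": [11, 16, 10, 19, 18],
    "C": [13, 14, 8, 20, 16],
    "D": [12, 15, 10, 18, 17],
}
alternate_form = pairwise_correlations(scores)
```

Four versions yield six pairs, so six alternate-form coefficients would be reported.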

A compulsive​ middle-school teacher, even after reading Chapter​ 2's recommendation urging teachers not to collect reliability evidence for their own​ teacher-made tests, perseverates in calculating​ Kuder-Richardson indices for all of his major and minor classroom exams. What kind of reliability indicator is this teacher attempting to​ compute?

An​ internal-consistency reliability coefficient
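For readers who want to see what a Kuder-Richardson index involves, here is a minimal sketch of KR-20 for items scored right (1) or wrong (0). The response matrix and the function name are hypothetical, and population variance is used throughout so that the item and total-score variances are on the same footing.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of student rows, each a list of 0/1 item scores."""
    n_students = len(responses)
    n_items = len(responses[0])
    # Sum of p*q across items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students
        pq_sum += p * (1 - p)
    # Variance of students' total scores.
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Hypothetical results: five students, four dichotomously scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(responses)
```

Because KR-20 needs only one administration of one test form, it falls under internal-consistency evidence.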

A district's new teacher-evaluation procedure is heavily based on observations of teachers' classroom performances. School-site administrators, along with a small group of recently retired school principals, have been observing the teachers, then supplying evaluations related to teachers' observed instructional effectiveness. When officials of the teachers' union raise a concern about these teacher-evaluators' inconsistencies of judgment when using a district-devised observation form, the district's superintendent asks her staff to collect validity evidence bearing on the teachers' union concern. Which one of the following sources of validity evidence will most likely play a major role in resolving the charge that the classroom-observation evidence is flawed?

Evidence based on response processes

Because the National Assessment of Educational Progress (NAEP) is widely employed as a "grade-promotion" and "diploma-denial" exam for individual students, teachers whose students take NAEP tests should familiarize themselves with the content in NAEP assessment frameworks to identify potential emphases for classroom assessments.

FALSE

Even though teachers should not take away too much instructional time because of their classroom assessments, the number of assessment targets addressed by any classroom test should still be numerous and wide-ranging so that more curricular content can be covered.

FALSE

In recognition of the significant impact a state's official accountability tests can have on what that state's students ought to be learning, it is apparent that a teacher's classroom tests should only measure the very same skills and knowledge that are assessed on the state's accountability tests—and never assess students' mastery of en route skills or bodies of knowledge that teachers might see as contributory to mastering what's measured on the state tests.

FALSE

Whenever possible, teachers should attempt to have their assessments focus quite equally on the cognitive, affective, and psychomotor domains because almost all human acts, including students' test-taking, rely to a considerable extent on those three domains of behavior.

FALSE

One of your​ colleagues, a​ high-school chemistry​ teacher, believes that certain of her students have somehow gained access to the final exams she has always used in her classes. To address what she calls​ "this serious security​ violation," she has created four new versions of all of her major exams—four versions that she regards as​ "equally challenging." She has recently sought your advice regarding what sort of reliability evidence she ought to be collecting regarding these new multiple renditions of her chemistry exams. In this​ situation, which one of the following should you be recommending to​ her?

evidence regarding the alternate-form reliability of her several exams

Amy Johnson has a large collection of Barbie dolls. Originally, she had 49. Recently, she somehow lost 12 Barbies. How many Barbies does Amy have left? (Show your work.)

- The assessment might offend people who view girls as having much broader interests than playing with dolls.

If a teacher's students include children with disabilities or children who are English Language Learners, which one of the following three assertions about assessment bias is most defensible?

-Because assessment bias erodes the validity of inferences derivative from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.

Why do some members of the measurement community prefer to use the phrase "absence-of-bias" rather than "assessment bias" when quantitatively reporting the degree to which an educational test appears to be biased?

-Because both reliability and validity, two key attributes of educational tests, are positive, "to be sought" qualities, so too is "absence-of-bias" a positive quality to be sought in educational tests.

Suppose that the developers of a new science achievement test had inadvertently laden their test's items with gender-based stereotypes regarding the role of women in science and, when the new test was given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in students' scores?

-Construct-irrelevant variance

Which one of the following four pairs of validity evidence most frequently revolves exclusively around judgments focused on test content?

-Developmental-care documentation and external content reviews by nonpartisan judges

Assume a state's education authorities have recently established a policy that, in order for students to be promoted to the next grade level, those students must pass a state-supervised English and language arts (ELA) exam. Administered near the close of Grades three, six, and eight, the three new grade-level exams are intended to determine a student's mastery of the official state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these "promotion-denial" exams, which one of the following sources of validity evidence are they likely to rely on most heavily?

-Evidence based on test content

Although the way a state's public schools are run is up to officials of that state, not the federal government, the U.S. Supreme Court has ruled that state-taught students must still be granted their constitutionally guaranteed rights, and this means that teachers should be guided about classroom-assessment coverage by the U.S. Constitution.

-FALSE

Because parents' preferences regarding what their children should be learning are not only motivationally useful for teachers to employ but also constitute significant curricular guidance for educators, teachers should strive to incorporate parents' curricular opinions in all of their classroom assessments.

-FALSE

Because students' growth in their mastery of cognitive skills and knowledge is such a patently important factor by which to evaluate the success of not only schools, but also teachers, classroom assessments should focus exclusively on measuring students' cognitive status.

-FALSE

If a state's education officials have endorsed the Common Core State Standards, but have chosen to create their state's own accountability tests to measure those standards (instead of using tests built by a multistate assessment consortium), it is still sensible for a teacher in that state to seek test-construction guidance from what's measured by consortium-created tests.

-FALSE

If a teacher decides to seek advice from, say, a group of several teacher colleagues regarding the appropriateness of the content for the teacher's planned classroom assessment, professional ethics demand that the curricular counsel of those colleagues must be accepted.

-FALSE

Which of the following represents the most appropriate strategy by which to support the validity of score-based interpretations for specific uses?

-Generation of an evidence-laden validity argument in support of a particular usage-specified score interpretation

The relationship between the degree to which an educational test is biased and the test's disparate impact on certain groups of learners is an important one. Which of the following statements best captures the nature of this relationship?

-If an educational assessment displays a disparate impact on different groups of test-takers, it may or may not be biased.

An independent, for-profit measurement firm has recently published what the firm's promotional literature claims to be "an instructionally diagnostic" interim test in mathematics. Different forms of the new test are to be administered to students every two or three months. A student's results are reported as a total, all-encompassing score and also as five "strands" that are advertised as "distinctive and diagnostic." Your district's administrators are deciding whether to purchase copies of this new test. Which one of the following would be the most appropriate source of validity evidence for the newly published test?

-Internal structure evidence

What are the two major causes of assessment bias we encounter in typical educational tests?

-Offensiveness and unfair penalization

Suppose that you and several other teachers in a middle school were trying to construct a new test intended to be predictive of high-school students' subsequent scores on the SAT and ACT college admissions exams. Moreover, suppose that you were in no particular hurry to assemble validity evidence in support of the accuracy of those inferred predictions. Which one of the following sources of validity evidence would supply the most compelling support for the validity of your anticipated predictions?

-Predictive validity evidence based on the new test's relation to other variables

Based on the 2014 edition of the Standards for Educational and Psychological Testing, and on common sense, which one of the following statements about students' test results represents a potentially appropriate phrasing that's descriptive of a set of students' test performances?

-Students' scores on the test permit valid interpretations for this test's use.

Measurement specialists assert that validation efforts are preoccupied with the degree to which we use students' test performances to support the accuracy of score-based inferences. Which of the following best identifies the focus of those inferences?

-Students' unseen skills and knowledge

Because a classroom test's influence on a teacher's instructional decision making is one of the most beneficial dividends of classroom assessment, a teacher should think through in advance how certain levels of student performances would influence a teacher's test-based instructional decisions, and then abandon or revise any tests that have no decision-impact linked to their results.

-TRUE

Because recent years have seen both schools and teachers being evaluated on the basis of students' performances on high-stakes tests, such as a state's annual accountability tests, it becomes almost imperative for teachers to determine the degree to which what's measured by their classroom assessments can contribute to improved students' performances on such significant tests.

-TRUE

Because the curricular recommendations of national subject-matter associations typically represent the best curricular thinking of the most able subject-matter specialists in a given field, as teachers try to identify the knowledge, skills, and affect to measure in their own classroom assessments, the views of such national organizations can often provide helpful curricular insights.

-TRUE

If a teacher's students are annually supposed to master an officially approved set of state curricular standards, and a state accountability test aligned with those standards is given each year, teachers should surely try to make sure that what their classroom tests measure is congruent with, or contributory to, what's assessed by such state accountability tests.

-TRUE

If an elementary teacher has designed his instructional system so it centers on the use of "catch-up" and "enrichment" learning centers where, based on classroom-assessment performances, students self-assign themselves to one of these centers, an early-on factor to consider is whether the classroom assessments should yield norm-referenced or criterion-referenced inferences.

-TRUE

In recognition of how much time it typically takes for teachers to score students' responses to constructed-response items, especially those items calling for extended responses, an early factor for a teacher to consider when creating a classroom assessment is whether the teacher has sufficient time to properly score students' responses to a test containing constructed-response items.

-TRUE

Teachers will find that their classroom assessments are most useful when a teacher's earliest thinking about the nature of such assessments is explicitly intended to contribute to an upcoming educational decision to be made by the teacher.

-TRUE

Which of the following strategies seems most suitable for teachers to use when trying to detect and eliminate assessment bias in their own teacher-made tests?

-Teachers should pay particular attention to the possibility that assessment bias may have crept into their teacher-made tests and should strive to rely on their best judgments about the presence of such bias on all of their classroom tests—but especially on their most significant classroom assessments.

Ramon Ruiz is sorting out empty BEAN tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify

-The assessment item appears to be biased against Americans of Latino backgrounds.

In certain Christian religions, there are gradients of sinful acts. For example, in the Roman Catholic Church, a venial sin need not be confessed to a priest, whereas a mortal sin must definitely be confessed. Based on a context clue contained in the paragraph above, which of the following statements is most accurate?

-The assessment item appears to be biased in favor of students who are Roman Catholics.

When external reviewers of a test's content attempt to judge how well a test's items mesh with a specified collection of curricular aims, which one of the following pairs of alignment indicators should be present?

-The degree to which each of a test's items is aligned to one or more of the specified curricular aims and a content-coverage indication representing the proportion of the curricular aims adequately represented by the test's items
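The two alignment indicators named in this answer can be sketched as simple proportions. Everything here is an assumption made for illustration: the function name, the reviewer judgments, and the `min_items` threshold standing in for "adequately represented".

```python
def alignment_indicators(item_to_aims, curricular_aims, min_items=1):
    """Return (proportion of items aligned, proportion of aims covered).

    item_to_aims maps each item to the aims that reviewers judged it to measure.
    min_items is an assumed threshold for an aim to count as adequately covered.
    """
    aim_set = set(curricular_aims)
    # Indicator 1: share of items aligned to at least one specified aim.
    aligned = sum(1 for aims in item_to_aims.values() if aim_set & set(aims))
    item_alignment = aligned / len(item_to_aims)
    # Indicator 2: share of aims measured by at least min_items items.
    counts = {aim: 0 for aim in aim_set}
    for aims in item_to_aims.values():
        for aim in set(aims) & aim_set:
            counts[aim] += 1
    covered = sum(1 for c in counts.values() if c >= min_items)
    content_coverage = covered / len(aim_set)
    return item_alignment, content_coverage


# Hypothetical reviewer judgments for a four-item test and four aims.
judgments = {"q1": ["A"], "q2": ["A", "B"], "q3": [], "q4": ["C"]}
aims = ["A", "B", "C", "D"]
item_alignment, content_coverage = alignment_indicators(judgments, aims)
```

A test can score well on one indicator and poorly on the other, which is why reviewers would want both.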

Validity evidence can be collected from a number of sources. Suppose, for instance, that a mathematics test has been built by a school district's officials to help identify those middle-school students who are unlikely to pass a statewide eleventh-grade high-school diploma test. The new test will routinely be given to the district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive application, the new test will be administered to current seventh-graders, and the seventh-grade tests will also be given to the district's current eleventh-graders. This will permit the eleventh-graders' two sets of test results to be compared. Which of the following best describes this source of validity evidence?

-The relationship of eleventh-graders' performances on the two tests

In which one of the following four statements are all of the pronouns used properly?

-This assessment item does not appear to be biased.

Please compose a short essay of 500 to 1,000 words on the topic: "Soccer Outside the United States." Either use one of our classroom computers or write the essay by hand. Be sure to engage in appropriate prewriting activities, draft an initial version of the essay, and then revise your draft at least once. You will have ninety minutes to complete this task.

-This item seems to be biased in favor of children born outside the United States, many of whom may be more familiar with non-U.S. soccer than children born in the United States.

What is the chief function of validity evidence when employed to confirm the accuracy of score-based interpretations about test-takers' status in relation to specific uses of an educational test?

-To support relevant propositions in a validity argument that's marshaled to determine the defensibility of certain score-based interpretations

A district chooses a commercial test to provide information about the social studies skills and knowledge that the students seem to be having difficulty in mastering. A relatively elaborate series of "alignment" studies will be carried out early in the school year in an attempt to provide validity evidence to confirm this instructionally supportive usage. On which of the following sources of validity evidence is it most likely those who are supervising these alignment studies will rely?

-Validity evidence based on test content

Which one of the following sources of validity evidence should be of most interest to teachers when evaluating their own teacher-made tests?

-Validity evidence based on test content

Webb's alignment procedures have become increasingly popular among American educators. Which one of the following statements is an accurate assertion regarding this widely used procedure for gauging the degree to which a test's items are representatively reflective of a set of curricular aims?

-Webb's alignment procedure is, at bottom, a judgmentally based validation procedure centered on the appropriateness of test content.

Only one of the following statements about a​ test's classification consistency is accurate. Select the accurate statement regarding classification consistency.

Classification consistency indicators represent the proportion of students classified identically on two testing occasions.
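A minimal sketch of that proportion, assuming a single pass/fail cut-score and two sets of scores for the same students. The scores, the cut-score, and the function name are hypothetical.

```python
def classification_consistency(scores_first, scores_second, cut_score):
    """Proportion of students placed in the same category on both occasions."""
    same = sum(
        (first >= cut_score) == (second >= cut_score)
        for first, second in zip(scores_first, scores_second)
    )
    return same / len(scores_first)


# Hypothetical scores for five students on two testing occasions.
first_occasion = [55, 62, 70, 48, 81]
second_occasion = [58, 59, 72, 50, 79]
consistency = classification_consistency(first_occasion, second_occasion, cut_score=60)
```

Here only the second student changes category (pass, then fail), so four of five students are classified identically.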

Although the NAEP assessment frameworks are, technically, supposed to guide NAEP item-development and not function as curricular frameworks because of the long-standing U.S. tradition that the federal government shouldn't influence what is taught in state-governed public schools, teachers can still get good ideas about what to assess and how to assess it from the illustrative NAEP items that are available to the public.

TRUE

Because an excessive number of assessment targets in a teacher's classroom assessments can make it difficult to maintain an instructional focus on too many assessable outcomes, it is wiser for teachers to employ grain-sizes calling for a modest number of broad-scope assessment targets than to adopt a large number of small-scope assessment targets.

TRUE

Because of such needs as how to grade this year's students or whether changes are needed in next year's instructional procedures, teachers should invariably link their planned classroom assessments explicitly to these sorts of decisions from the earliest moments a classroom test is being conceptualized.

TRUE

It is often remarkably helpful for teachers to ask their coworkers to review the potential emphases of under-development classroom assessments because teachers and administrators, especially those who are familiar with what's being taught and the sorts of students to whom it is taught, can provide useful insights regarding what should be assessed, and what shouldn't.

TRUE

If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a​ high-stakes test's previously determined​ cut-score, which of the following indicators would be most useful for this​ purpose?

a conditional standard error of measurement​ (near the​ cut-score)
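As a rough illustration, the sketch below uses the classical overall standard error of measurement, SEM = SD * sqrt(1 - reliability), and builds a band around an observed score. A conditional SEM is estimated for a particular score region, such as near the cut-score, and this sketch does not attempt that estimation; a conditional value would simply be substituted for `sem`. The function names and numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """Classical overall SEM from the score SD and a reliability coefficient."""
    return score_sd * sqrt(1 - reliability)


def score_band(observed_score, sem, width=1.0):
    """Band of +/- width SEMs around an observed score."""
    return observed_score - width * sem, observed_score + width * sem


# Hypothetical test: score SD of 10, reliability of .91, cut-score of 60.
sem = standard_error_of_measurement(score_sd=10, reliability=0.91)
low, high = score_band(observed_score=58, sem=sem)
cut_inside_band = low <= 60 <= high
```

When the cut-score falls inside a student's band, a pass or fail decision for that student is less certain, which is why the error estimate near the cut-score matters most.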

A dozen​ middle-school mathematics teachers in a large school district have collaborated to create a​ 30-item test of​ students' grasp of what the​ test's developers have labeled​ "Essential Quantitative​ Aptitude," that​ is, students' EQA. All 30 items were constructed in an effort to measure each​ student's EQA. Before using the test with many​ students, however, the developers wish to verify that all or most of its items are functioning​ homogeneously, that​ is, are properly aimed at gauging a​ test-taker's EQA. On which of the following indicators of assessment reliability should the test developers focus their​ efforts?

an​ internal-consistency reliability coefficient

Please assume you are a middle-school English teacher who, despite this chapter's urging that you rarely, if ever, collect reliability evidence for your own tests, stubbornly decides to do so for all of your mid-term and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your classes, you only wish to administer the tests destined for such reliability analyses on one occasion, not two or more. Given this constraint, which of the following coefficients would be most suitable for your reliability-determination purposes?

an​ internal-consistency reliability coefficient

A​ self-report inventory intended to measure secondary​ students' confidence that they are​ "college and career​ ready" has recently been developed by administrators in an urban school district. To collect evidence bearing on the consistency with which this new inventory measures​ students' status with respect to this affective​ disposition, the inventory is administered to nearly 500 students in late January and​ then, a few weeks​ later, in​ mid-February. When​ students' scores on the two administrations have been​ correlated, which one of the following indicators of reliability will have been​ generated?

a​ test-retest reliability coefficient

A district's new computer-administered test of students' mastery of "composition conventions" has recently been used with the district's eleventh- and twelfth-grade students. To help judge the consistency with which the test measures students' knowledge of the assessed conventions, district officials have computed Cronbach's coefficient alpha for students who completed this brand-new exam. Which of the following kinds of reliability evidence do these alpha coefficients represent?

internal consistency
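A minimal sketch of coefficient alpha, assuming each student's item scores are available as a row of numbers. The data and the function name are hypothetical. Unlike KR-20, alpha also handles items scored on more than two points.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of student rows of item scores."""
    k = len(scores[0])
    # Variance of each item across students, then variance of total scores.
    # Using the same variance formula for both keeps the ratio consistent.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: four students, three items scored 1 to 5.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
```

Alpha comes from a single administration, so it speaks to how homogeneously the items function rather than to stability over time.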

Please imagine that the reading specialists in a​ district's central office have developed what they have labeled a​ "diagnostic reading​ test." You think its​ so-called subscale scores are not diagnostic at all but are simply measuring a single overall dimension you believe to be​ "reading comprehension." In this​ setting, which of the following kinds of reliability evidence would supply the most relevant information related to your disagreement with the reading​ test's developers?

internal-consistency reliability evidence

If a multistate assessment consortium has generated a new performance test of​ students' oral communication skills and wishes to verify that​ students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was​ completed, which of the following kinds of consistency evidence would be most​ appropriate?

test-retest evidence of reliability

Suppose that a​ state's governor has appointed a​ blue-ribbon committee to establish a​ test-based promotion-denial system for reducing the number of​ sixth-grade students who are​ "socially" promoted to the seventh grade. The​ blue-ribbon committee's proposal calls for​ sixth-graders to be able to take a new​ high-stakes promotion exam at any time they wish during their​ grade-six school year. Given these​ circumstances, which one of the following evidences of the new promotion​ exam's measurement consistency should be​ collected?

test-retest reliability

Which of the following indices of a​ test's reliability is most often provided by developers of the kinds of standardized tests destined for use with large numbers of​ students?

​internal-consistency reliability coefficients

