Classroom Assessment Midterm
The standard error of measurement is focused chiefly on: A. Test content evidence of validity B. None of the above C. Relations to other variables evidence of validity D. Response processes evidence of validity
B. None of the above
Classroom teachers are most apt to focus on which of the following? A. Evidence of alternate-form reliability B. Test content evidence of validity C. Internal structure evidence of validity D. Evidence of internal-consistency reliability
B. Test content evidence of validity
A history teacher, Mrs. Scroggins, tries to determine the consistency of her tests by occasionally re-administering them to her students, then seeing how much similarity there was in the way her students performed. What kind of reliability evidence is Mrs. Scroggins attempting to collect? A. Internal consistency B. Test-retest C. Alternate form D. Precision
B. Test-retest
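Test-retest evidence of the kind Mrs. Scroggins is collecting is usually summarized as a correlation between the two sets of scores. The sketch below shows one way to compute it, assuming a plain Pearson correlation; the function name and the five pairs of scores are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five students on two occasions.
first_administration = [70, 82, 90, 65, 88]
second_administration = [72, 80, 93, 60, 85]

test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 2))
```

A coefficient near 1.00 would suggest that students kept roughly the same relative standing across the two administrations.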
Which of the following is not a traditional reason that teachers assess students? A. To determine instructional effectiveness B. To clarify instructional intentions C. To monitor students' progress D. To assign grades to students
B. To clarify instructional intentions
Which one of the following statements could be technically correct? A. "The test is face-valid." B. "The test is consequentially valid." C. "The test-based inference is valid." D. "The test is definitely valid."
C. "The test-based inference is valid."
Which of the following answer choices depicts the chronologically accurate development of assessment legislation in the United States? a. ESSA, ESEA, NCLB b. ESEA, ESSA, NCLB c. ESEA, NCLB, ESSA d. NCLB, ESSA, ESEA
c. ESEA, NCLB, ESSA
Which of the following approaches to bias-elimination is it most reasonable to expect classroom teachers to use? a. Neither empirical nor judgmental approaches are reasonable for classroom teachers to use. b. Empirical approaches c. Judgmental approaches d. Both empirical and judgmental approaches are equally reasonable for classroom teachers to use.
c. Judgmental approaches
Consider the following test-item. Which of the following decisions requires educators to use quality assessment information? a. Choosing who should get into this college b. Deciding what reading group a student should be placed in c. Determining whether a student is legally disabled d. All of the above e. Only (a) and (b) Which category best describes this item? a. Matching b. Multiple binary-choice c. Multiple choice d. Binary-choice
c. Multiple choice
A district chooses a commercial test to provide information about the social studies skills and knowledge that the students seem to be having difficulty in mastering. A relatively elaborate series of alignment studies will be carried out early in the school year in an attempt to provide validity evidence to confirm this instructionally supportive usage. Which source of validity evidence will a person supervising the alignment studies rely on? a. Validity evidence based on test content b. Validity evidence based on internal structure of the social studies test c. Validity evidence based on relationships between students' test scores and other variables d. Validity evidence based on response processes
a. Validity evidence based on test content
A prominent procedure to minimize assessment bias for students with disabilities is to employ ________________. a. assessment accommodations b. classroom modifications c. individualized education programs d. positive behavior supports
a. assessment accommodations
Which of the following types of assessment targets a student's attitudes, interests, and values? a. Cognitive assessment b. Affective assessment c. Psychomotor assessment d. Standardized assessment
b. Affective assessment
Which of the following is not a step in the four steps for creating a learning progression? a. Form a basic and introductory understanding of a target curricular aim. b. Determine the measurability of each preliminary identified building block. c. Arrange all the building blocks in an instructionally sensible sequence. d. Identify all requisite precursory subskills and bodies of enabling knowledge.
a. Form a basic and introductory understanding of a target curricular aim.
Review the following mathematics item for assessment bias. Amy Johnson has a large collection of Barbie dolls. Originally, she had 49. Recently, she somehow lost 12 Barbies. How many Barbies does Amy have left? (Show your work.) a. 37 Barbies b. 61 Barbies c. 27 Barbies a. The assessment might offend people who view girls as having much broader interests than playing with dolls. b. The assessment item incorporates a discernible stereotype regarding socioeconomic status. c. The assessment item does not appear to be biased. d. This assessment item is biased against students who struggle with mathematics.
a. The assessment might offend people who view girls as having much broader interests than playing with dolls.
One important group of students in need of protection from assessment bias is English language learners (ELLs). Therefore, it is important to understand the qualifying categories for an ELL student. Which is not a category of students considered to be English language learners (ELLs)? a. Students who are beginning to learn English but could benefit from school instruction. b. Students who are fluent in English, but prefer to speak another primary language. c. Students who are proficient in English but could use additional assistance in academic or social contexts. d. Students whose first language is not English and know little, if any, English.
b. Students who are fluent in English, but prefer to speak another primary language.
Based on the 2014 edition of the Standards for Educational and Psychological Testing, and on common sense, which one of the following statements about students' test results represents a potentially appropriate phrasing that's descriptive of a set of students' test performances? a. The students' test scores are invalid. b. Students' scores on the test permit valid interpretations for this test's use. c. The scores are valid if, and only if, they were elicited by a valid test. d. The students' test scores are valid for unlimited educational purposes.
b. Students' scores on the test permit valid interpretations for this test's use.
Which of the following is a consequence of a collaborative effort by the National Governors Association (NGA) and the Council of Chief State School Officers (CCSSO)? a. No Child Left Behind b. The Common Core State Standards (CCSS) c. Every Student Succeeds Act d. Elementary and Secondary Education Act
b. The Common Core State Standards (CCSS)
The Elementary and Secondary Education Act contains various subsections referred to as "titles." Which of the following "titles" gets the most attention (and funding) from policymakers? a. Title III b. Title I c. Title II d. Title IX
b. Title I
Which of the following is not one of the three types of reliability evidence? a. Internal consistency b. Validity c. Alternative form d. Test-retest
b. Validity
Which of the following descriptions of validity is most accurate? a. Validity refers to the consistency with which a test measures whatever it is measuring. b. Validity refers to the accuracy of score-based interpretations for specific purposes. c. Validity describes the legitimacy of the decision to which a test-based inference will be put. d. Validity describes the degree to which a test's usage leads to appropriate consequences for students.
b. Validity refers to the accuracy of score-based interpretations for specific purposes.
Public Law 94-142 installed the use of an ___________________ to outline the educational processes for students with disabilities. a. accommodating education plan b. individualized education program c. individualized education plan d. assessment monitoring protocol
b. individualized education program
When educators collect test-based evidence to inform decisions about already completed instructional activities they are engaging in a basic form of _____________________. a. instructional assessment b. summative assessment c. standardized assessment d. formative assessment
b. summative assessment
If educators wish to accurately estimate the likelihood of consistent decisions about students who score at or near a high-stakes test's previously determined cut-score, which of the following indicators would be most useful for this purpose? a. A standard error of measurement for the entire test b. A traditionally computed internal-consistency reliability coefficient c. A conditional standard error of measurement (near the cut-score) d. A standard deviation signifying the degree of score spread among test-takers
c. A conditional standard error of measurement (near the cut-score)
Which is the most appropriate description of a learning progression? a. A learning progression is an ordered sequence of the stuff a student must learn in order to perform at a high level on a summative assessment. b. A learning progression is an ordered sequence of the stuff a teacher must teach so as to achieve a significant curricular outcome. c. A learning progression is an ordered sequence of the stuff a student must learn so as to achieve a significant curricular outcome. d. None of these appropriately describe a learning progression.
c. A learning progression is an ordered sequence of the stuff a student must learn so as to achieve a significant curricular outcome.
Which statement best characterizes this nation's current use of formative assessment? a. A four-levels approach to formative assessment is used in the lower grades. b. Only a five-strategies approach to formative assessment is often encountered. c. Although research-supported, formative assessment is not widely used. d. Most teachers now employ the formative-assessment process in their classes.
c. Although research-supported, formative assessment is not widely used.
Please assume you are a middle-school English teacher who, despite this chapter's urging that you rarely, if ever, collect reliability evidence for your own tests, stubbornly decides to do so for all of your midterm and final exams. Although you wish to determine the reliability of your tests for the group of students in each of your classes, you only wish to administer the tests destined for such reliability analyses on one occasion, not two or more. Given this constraint, which of the following coefficients would be most suitable for your reliability-determination purposes? a. A standard error of measurement b. A test-retest reliability coefficient c. An internal-consistency reliability coefficient d. An alternate-forms reliability coefficient
c. An internal-consistency reliability coefficient
Which of the following conclusions regarding multiple binary-choice items has not been supported by available research? A. These items are a bit less difficult for students than multiple-choice items. B. These items are highly efficient in gathering student achievement data. C. These items are regarded by students as more difficult than multiple-choice items. D. These items tend to be more reliable than other forms of selected-response items.
A. These items are a bit less difficult for students than multiple-choice items.
What should be the two major fairness concerns of a classroom teacher who wishes to eliminate bias in the teacher's assessment instruments? A. Offensiveness and absence-of-bias in items in the teacher's tests B. Unfair penalization and reliability of items in the teacher's tests C. Offensiveness and unfair penalization of items in the teacher's tests D. Disparate impact and offensiveness of items in the teacher's tests
C. Offensiveness and unfair penalization of items in the teacher's tests
Which of the following is not an item-writing rule for the creation of binary-choice items? A. Include only a single concept in any statement. B. Phrase items so that a superficial analysis by students will suggest an incorrect answer. C. Rarely use statements containing double negatives, although single or triple negatives are acceptable. D. Keep item-length similar for both of the binary categories being assessed.
C. Rarely use statements containing double negatives, although single or triple negatives are acceptable.
What kind of evidence is most eagerly sought by the commercial testing firms that develop academic aptitude tests? A. Test content evidence of validity B. Evidence that a test is regarded by some as face-valid C. Relations to other variables evidence of validity D. Evidence that the consequences of a test's use will be appropriate
C. Relations to other variables evidence of validity
Suppose that you and several other teachers in a middle school were trying to construct a new test intended to be predictive of high-school students' subsequent scores on the SAT and ACT college admissions exams. Moreover, suppose that you were in no particular hurry to assemble validity evidence in support of the accuracy of those inferred predictions. Which source of validity evidence would supply the most compelling support for the validity of your anticipated predictions? a. Validity evidence based on the new test's internal structure b. Concurrent validity evidence based on the new test's relation to other variables c. Predictive validity evidence based on the new test's relation to other variables d. Validity evidence based on the content of your new test
c. Predictive validity evidence based on the new test's relation to other variables
Validity evidence can be collected from a number of sources. For instance, suppose that a mathematics test has been built by a school district's officials to help identify those middle-school students who are unlikely to pass a statewide 11th-grade high-school diploma test. The new test will routinely be given to the district's seventh-grade students. To secure evidence supporting the validity of this kind of predictive application, the new test will be administered to current seventh-graders, and the seventh-grade tests will also be given to the district's current eleventh-graders. This will permit the eleventh-graders' two sets of test results to be compared. Which best describes this source of validity evidence? a. The internal structure of the seventh-grade test b. Students' responses observed during the two test-taking experiences c. The relationship of 11th-graders' performances on the two tests d. Test content evidence
c. The relationship of 11th-graders' performances on the two tests
Consider the following illustrative five-option multiple-choice item. It addresses content presented in the Standards for Educational and Psychological Testing (2014) related to the fundamental notion of assessment validity. When we encounter a test whose scores are affected by processes that are quite extraneous to the test's intended purpose, we assert that the test displays which one of the following? a. Construct underrepresentation b. Construct deficiency c. Construct corruption d. Construct-irrelevant variance e. All of the above Which statement best describes the illustrative item? a. This illustrative multiple-choice item contains negatively phrased options, and thus violates a general item-writing guideline. b. The multiple-choice item illustrated here employs ambiguous directions to the test-taker and, therefore, violates one of the chapter's item-category guidelines. c. This illustrative item, because it includes an "all of the above" alternative, violates an important item-writing guideline. d. This illustrative item contains no violation of the general or item-category guidelines presented for this type of selected-response item.
c. This illustrative item, because it includes an "all of the above" alternative, violates an important item-writing guideline.
Proper formative assessment is conducted ___________________. a. prior to the beginning of the instructional process b. when summative assessment does not provide clear evaluative information c. during the instructional process d. at the conclusion of the instructional process
c. during the instructional process
Decisions linked to classroom assessments should be made _____________. a. after thorough review b. at the conclusion of the assessment c. in advance d. during initial review of the exam
c. in advance
Which of the following is the most useful indicator of the consistency of an individual student's test performance? a. A dichotomous item-analysis b. An alternate-form reliability coefficient c. A polytomous item-analysis d. A standard error of measurement
d. A standard error of measurement
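For reference, the standard error of measurement is commonly estimated from a test's standard deviation and its reliability coefficient as SEM = SD x sqrt(1 - reliability). The sketch below applies that classical formula; the standard deviation, reliability, and observed score are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def standard_error_of_measurement(score_sd, reliability):
    """Classical estimate: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * sqrt(1.0 - reliability)

# Hypothetical test: scores have SD = 10 and reliability = 0.91.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)

# Band of plus or minus one SEM around one student's observed score.
observed_score = 75
band = (observed_score - sem, observed_score + sem)
print(round(sem, 2), tuple(round(b, 2) for b in band))
```

The smaller the SEM, the narrower the band around an individual student's score, which is why it speaks to the consistency of one student's performance rather than of the group's.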
Which of the following is typically recommended for use with students who have the most serious cognitive disabilities? A. Alternate assessments B. Regular assessments C. Assessment accommodations D. No assessments whatsoever
A. Alternate assessments
Which of the following is a generally recommended item-writing rule for matching items? A. In the test's directions, describe the basis for matching and the number of times a response can be used. B. Employ relatively long lists, usually containing at least two-dozen premises or responses. C. Whenever possible, employ heterogeneous lists of premises and responses. D. Typically, employ more premises than responses, allowing each response to be used more than once.
A. In the test's directions, describe the basis for matching and the number of times a response can be used.
Stability data regarding assessment consistency is an instance of: A. Test-retest reliability B. Precision reliability C. Internal-consistency reliability D. Alternate-form reliability
A. Test-retest reliability
Which of the following conceptions of assessment validity is most consistent with the Standards for Educational and Psychological Testing released in 2014 by AERA, APA, and NCME? A. How consistently a test measures whatever it measures B. The accuracy of score-based interpretations for intended test uses C. The accuracy of score-based inferences about test-takers D. The appropriateness of a test's usage
B. The accuracy of score-based interpretations for intended test uses
Please review the following item for assessment bias. It was used to assess the basic computation mathematics aims being pursued by an inner-city, elementary school's staff in a Midwestern state. Ramon Ruiz is sorting out empty tin cans he found in the neighborhood. He has four piles based on different colors of the cans. He thinks he has made a mistake in adding up how many cans are in each pile. Please identify Ramon's addition statement that is in error. a. 20 bean cans plus 32 cans = 52 cans. b. 43 bean cans plus 18 cans = 61 cans. c. 38 bean cans plus 39 cans = 76 cans. d. 54 bean cans plus 12 cans = 66 cans. A. The assessment item does not appear to be biased. B. These are not two major causes of assessment bias encountered in typical educational tests. C. The assessment item appears to be biased against Americans of Latino backgrounds. D. The assessment item appears to be biased in favor of recyclers.
C. The assessment item appears to be biased against Americans of Latino backgrounds.
Which of the following statements best describes the relationship among the three sanctioned forms of reliability evidence? A. All three types of evidence are essentially equivalent. B. Test-retest reliability evidence is more important than either internal-consistency evidence or alternate-form evidence. C. The three forms of evidence represent fundamentally different ways of representing a test's consistency. D. The three forms of evidence differ in their significance because internal-consistency evidence of a test's reliability is a necessary condition for the other two types of consistency.
C. The three forms of evidence represent fundamentally different ways of representing a test's consistency.
Of the following four statements, one is not a guideline to be followed when constructing multiple-choice items. Which statement is it? A. Never use "all of the above" as a response option. B. Randomly assign correct answers to the available answer-choice positions. C. To keep stems brief, place most words in an item's alternatives. D. Don't let the length of alternatives suggest correct or incorrect answers.
C. To keep stems brief, place most words in an item's alternatives.
Which of the following is not a general item-writing rule for classroom assessments? A. Do not employ sentence-structures unlikely to be easily understood by students. B. Do not provide unclear directions to students about how to respond to an assessment. C. Do not employ words, phrases, or sentences apt to be regarded as ambiguous to students. D. Do not inform students about how much the items on a test will be weighted.
D. Do not inform students about how much the items on a test will be weighted.
Which of the following should be most influential in guiding those who create educational tests? A. The Common Core State Standards B. Test specifications of the PARCC Assessment Consortium C. Test specifications of the Smarter Balanced Assessment Consortium D. The AERA-APA-NCME Standards for Educational and Psychological Testing
D. The AERA-APA-NCME Standards for Educational and Psychological Testing
Which of the following views regarding the assessment of English language learners (ELL) is most defensible? A. Almost all groups charged with the assessment of ELL students urge the use either of translated tests or, failing that, the provision of interpreters for ELL students. B. It is relatively easy to develop alternate tests for ELL students in those students' native languages. C. ELL students, if tested with regular English-language tests, will often outperform students whose native language is English. D. The use of assessment accommodations for ELL students typically leads to more valid test-based inferences about those students.
D. The use of assessment accommodations for ELL students typically leads to more valid test-based inferences about those students.
The National Assessment of Educational Progress (NAEP): A. was first established in 1996 by the U.S. Congress to serve as an accountability oriented nation's "report card." B. is a mandatory examination in reading and mathematics that must be taken annually by a sample of students in each state. C. sets forth prescribed assessment frameworks at three grade levels, such frameworks to be followed by the 50 states in framing their own state content standards. D. assesses national samples of U.S. students at three grade levels every few years in certain academic subjects.
D. assesses national samples of U.S. students at three grade levels every few years in certain academic subjects.
The Common Core State Standards were an attempt to outline what students should know at each grade level in which of the following subjects? a. English Language Arts and Mathematics b. Science and Mathematics c. English Language Arts and Science d. Mathematics and Social Studies
a. English Language Arts and Mathematics
Which of the following is a recommended item-writing rule for the construction of binary-choice items? a. Employ a roughly equal number of statements representing the two categories being tested. b. If one category being assessed requires longer statements than the other category, be sure that the disparity in statement-length is constant. c. Employ relatively few double-negative statements and, if you do, be sure to emphasize with italics or bold-face type that a negative is involved. d. Include no more than two concepts in any one statement.
a. Employ a roughly equal number of statements representing the two categories being tested.
Reliability coefficients range from ________________. a. -1.00 to +1.00 b. 0 to 1.00 c. -100.00 to +100.00 d. -10.00 to +10.00
a. -1.00 to +1.00
Please select the one answer that most accurately identifies the particular item's "quoted" reason for teachers to know about assessment. "Teachers need to give classroom assessments in order to assign grades to students indicating how well each student has attained the learning outcomes set for them." This is: a. A traditional reason for teachers to know about assessment b. One of today's reasons for teachers to know about assessment c. Both a traditional reason and one of today's reasons for teachers to know about assessment d. Neither a traditional reason nor one of today's reasons for teachers to know about assessment
a. A traditional reason for teachers to know about assessment
Which of the following terms refers to the degree to which there is a meaningful agreement between two or more of the following: curriculum, instruction, and assessment? a. Alignment b. Common core c. Continuity d. Standardization
a. Alignment
Which of the following is a reasonable explanation for why the word assessment may be used over the word testing? a. Assessment is a broader descriptor of the type of measurement practices in which teachers engage. b. The word testing produces anxiety in students. c. Students are no longer asked to participate in testing given the more nuanced measurement approaches required by federal legislation. d. Teachers would rather avoid the word testing as it creates situations where classroom management becomes difficult.
a. Assessment is a broader descriptor of the type of measurement practices in which teachers engage.
Which category of test items best describes the following item: According to educators, one of the major advantages of the Every Student Succeeds Act is that it forces schools and teachers to focus only on the important material included in the state test. a. True b. False a. Binary-choice b. Matching item c. Multiple binary-choice d. Multiple choice
a. Binary-choice
Which category of test items best describes the following: True or False: Mount Everest is the tallest mountain on Earth. a. Binary-choice b. Multiple choice c. Matching item d. Multiple binary-choice
a. Binary-choice
Considering the knowledge you've gained regarding formative assessment in this chapter, which of the following characteristics of a class discussion could yield formative assessment information? a. Class discussions that show what students are thinking. b. Class discussions that get all students involved. c. Class discussions that are about important topics. d. None of these types of class discussions would yield formative assessment information.
a. Class discussions that show what students are thinking.
How does "classification consistency" differ conceptually from more traditional indicators of test reliability? a. Classification-consistency approaches are focused more on capturing the degree of students' consistent categorizations rather than supplying only numerical indices of students' score consistency. b. Whereas the three traditional forms of reliability evidence are largely interchangeable, classification-consistency approaches to reliability are truly distinctive. c. In contrast to traditional reliability approaches, if there are no actual decisions linked to a test's use, it is impossible to determine a test's classification consistency. d. Classification-consistency approaches do not employ numerical indicators of an assessment's consistency, unlike traditional reliability procedures.
a. Classification-consistency approaches are focused more on capturing the degree of students' consistent categorizations rather than supplying only numerical indices of students' score consistency.
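One simple way to picture classification consistency is as the proportion of students who land in the same category (for example, pass or fail) on two administrations. The sketch below assumes that percent-agreement view; the scores, cut scores, and function name are hypothetical, and operational programs may use more elaborate estimates.

```python
def classification_consistency(first, second, cut_score):
    """Proportion of students classified the same way on both occasions."""
    same = sum(
        (a >= cut_score) == (b >= cut_score)
        for a, b in zip(first, second)
    )
    return same / len(first)

# Hypothetical scores for the same five students on two occasions.
first_scores = [70, 82, 90, 65, 88]
second_scores = [72, 80, 93, 60, 85]

# The same score data can yield different consistency at different cut scores.
consistency_at_71 = classification_consistency(first_scores, second_scores, 71)
consistency_at_75 = classification_consistency(first_scores, second_scores, 75)
print(consistency_at_71, consistency_at_75)
```

In this invented data set, one student crosses the line when the cut score is 71 but not when it is 75, so the result depends on where the cut score sits in the score distribution.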
Which represents the most appropriate definition of formative assessment? a. Formative assessment is a planned process in which assessment-elicited evidence of students' status is used by teachers to adjust their ongoing instructional procedures or by students to adjust their current learning tactics. b. Formative assessment is a testing structure in which teachers assess unplanned student activities and use the results to inform their instruction. c. Formative assessment is an unplanned process in which assessment-elicited evidence of students' status is used by teachers to adjust their ongoing instructional procedures or by students to adjust their current learning tactics. d. Formative assessment is a testing structure in which student evidence is used by teachers to assign quantitative scores to student work.
a. Formative assessment is a planned process in which assessment-elicited evidence of students' status is used by teachers to adjust their ongoing instructional procedures or by students to adjust their current learning tactics.
Cronbach's coefficient alpha and the Kuder-Richardson reliability formulae are examples of: a. Internal-consistency coefficients b. Test-retest coefficients c. Alternate-form coefficients d. None of the above
a. Internal-consistency coefficients
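As a rough illustration of how an internal-consistency coefficient is obtained from a single administration, the sketch below computes coefficient alpha as k/(k-1) x (1 - sum of item variances / variance of total scores). The response matrix is invented, and population variances are used throughout as an assumption; with right/wrong items scored 0 or 1 and variances computed this way, the result coincides with the Kuder-Richardson 20 value.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of students, each a list of item scores."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across students.
    item_variances = [
        pvariance([student[i] for student in item_scores]) for i in range(k)
    ]
    # Variance of students' total scores.
    total_variance = pvariance([sum(student) for student in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: five students, four items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Because only one set of responses is needed, this is the kind of coefficient that fits a teacher who wants to test on a single occasion.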
Which category of test items best describes the following: Consider these three categories of test items: multiple-choice, binary, and matching. Choose the appropriate term to match the description: _____ Multiple choice (a.) This type of question offers the test-taker only two options from which to choose. _____ Binary (b.) This type of question offers the test-taker several options from which to choose. _____ Matching (c.) This type of question may ask the test taker to attach vocabulary words with their proper definition. a. Matching b. Binary-choice c. Multiple binary-choice d. Multiple choice
a. Matching
Which of the following pieces of federal legislation had notably been an attempt at reversing the growing achievement gap that left poor and minority students in failing schools while requiring a "show us" approach to student evaluation? a. No Child Left Behind b. Elementary and Secondary Education Act c. Every Student Succeeds Act d. Individuals with Disabilities Education Act
a. No Child Left Behind
Please select the one answer that most accurately identifies the particular item's "quoted" reason for teachers to know about assessment. "Just a year ago, the voters in our school district voted favorably in a huge school-levy election that brought in substantial tax dollars for our schools. Most of the district's teachers are convinced that this positive support for the schools was based on our schools' consistently high rankings on the state's annual accountability tests." This is: a. One of today's reasons for teachers to know about assessment b. A traditional reason for teachers to know about assessment c. Both a traditional reason and one of today's reasons for teachers to know about assessment d. Neither a traditional reason nor one of today's reasons for teachers to know about assessment
a. One of today's reasons for teachers to know about assessment
Which strategy seems most suitable for teachers to use when trying to detect and eliminate assessment bias in their own teacher-made tests? a. Teachers should pay particular attention to the possibility that assessment bias may have crept into their teacher-made tests and should strive to rely on their best judgments about the presence of such bias on all of their classroom tests but especially on their most significant classroom assessments. b. Because decisions based on hard evidence will almost always be more defensible than decisions based dominantly on human judgment, teachers should identify potentially biased items in their teacher-made tests by using empirical methods and only thereafter confirm those identifications using human judgment. c. Given that students' self-identification of potentially biased items can prove remarkably illuminating to teachers regarding the items in their teacher-made tests, all significant classroom assessments should provide an opportunity for students themselves to indicate that they regarded an item as biased and to indicate the nature of this bias while actually completing a test's items. d. Because teachers spend so much time focusing on avoiding bias, it is highly unlikely that teacher-produced exams contain bias.
a. Teachers should pay particular attention to the possibility that assessment bias may have crept into their teacher-made tests and should strive to rely on their best judgments about the presence of such bias on all of their classroom tests but especially on their most significant classroom assessments.
Which of the following sources of validity evidence are teachers most likely to collect? a. Test content b. Response processes c. Relations to other variables d. Internal structure
a. Test content
If a multi-state assessment consortium has generated a new performance test of students' oral communication skills and wishes to verify that students' scores on the performance test remain relatively similar regardless of the time during the school year when the test was completed, which of the following kinds of consistency evidence would be most appropriate? a. Test-retest evidence of reliability b. Internal-consistency evidence of reliability c. Alternate-form evidence of reliability d. The standard error of measurement
a. Test-retest evidence of reliability
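As a rough illustration of the idea behind test-retest evidence, the sketch below correlates scores from two administrations of the same test. The score lists and the function name `retest_coefficient` are hypothetical, and Pearson's r is assumed here as the consistency index; the quiz itself does not prescribe a particular statistic.

```python
from math import sqrt

def retest_coefficient(first, second):
    """Pearson correlation between two administrations of the same test."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products and sums of squared deviations
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    return cross / sqrt(ss_first * ss_second)

# Hypothetical scores for the same five students, tested at two points in the year
fall_scores = [12, 15, 9, 18, 14]
spring_scores = [13, 14, 10, 19, 15]

r = retest_coefficient(fall_scores, spring_scores)
print(round(r, 2))
```

A coefficient near 1.0 would suggest that students' relative standing stayed similar across the two occasions.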
Consider the illustrative binary-choice item. Please decide whether the following statement regarding the reliability of educational tests is True or False. Please place a check behind the True or False to indicate your answer. True ___ False___ When determining a test's classification consistency, there is no need to consider the cut score employed nor that cut score's location in the score distribution. Which statement best describes the illustrative item? a. This illustrative item violates the item-specific guideline regarding the use of negative statements in a binary-choice item. b. This illustrative item is quite consistent with the item-specific guideline regarding the phrasing of items to elicit wrong answers. c. This illustrative item, regrettably, relies on particularly complicated syntax and, therefore, is apt to confuse test-takers. d. This illustrative item violates none of the chapter's general item-writing guidelines or the specific guidelines for writing binary-choice items.
a. This illustrative item violates the item-specific guideline regarding the use of negative statements in a binary-choice item.
Which source of validity evidence should be of most interest to teachers when evaluating their own teacher-made tests? a. Validity evidence based on test content b. Validity evidence based on response processes of students' taking the tests c. Validity evidence based on the internal structure of the teacher-made tests d. Validity evidence based on relationships between students' scores on teacher-made tests and those students' performances on other variables
a. Validity evidence based on test content
Which of the following statements most accurately reflects the relationship between students' aptitude and their achievement? a. Whereas aptitude tends to reflect potential, achievement tends to reflect prior learning. b. The level of a student's aptitude can never exceed the level of the student's achievement. c. Actually, achievement is little more than an operationalization of aptitude. d. Both aptitude and achievement are equivalent to a traditional conception of intelligence.
a. Whereas aptitude tends to reflect potential, achievement tends to reflect prior learning.
Reliability refers to the ___________________. a. consistency of the test scores b. relevancy of the test scores c. usefulness of the test scores d. None of the provided answer choices
a. consistency of the test scores
One of the important rules to be followed in creating multiple binary-choice items is that: a. Most items should mesh sensibly with a cluster's stimulus material. b. The stimulus material for any cluster of items should contain a substantial amount of extraneous information. c. Item clusters should be strikingly separated from one another. d. Multiple binary-choice items, to avoid confusion, should never be included in a test already containing binary-choice items.
c. Item clusters should be strikingly separated from one another.
Please select the one answer that most accurately identifies the particular item's "quoted" reason for teachers to know about assessment. "Wishing that students will make progress does not guarantee that students actually will do so. And this is why I believe teachers have a fundamental responsibility to monitor their students' progress throughout the school year. I try to administer informal progress-monitoring quizzes every few weeks to make sure my instruction is 'taking.' If my instruction is not working as well as I want it to work, then I can make modifications in my upcoming teaching plans. Assessment-based monitoring of students' progress is so very sensible that it's hard for me to understand why it is not more widely used." This is: a. One of today's reasons for teachers to know about assessment b. A traditional reason for teachers to know about assessment c. Both a traditional reason and one of today's reasons for teachers to know about assessment d. Neither a traditional reason nor one of today's reasons for teachers to know about assessment
b. A traditional reason for teachers to know about assessment
Which of the following is the best definition for the concept of accessibility for students with disabilities? a. Accessibility refers to the notion that some test takers must have an unobstructed opportunity to demonstrate their status with respect to the construct(s) being measured by an educational test. While most students have an unobstructed opportunity, educators only need to focus on a small few. b. Accessibility refers to the notion that all test takers must have an unobstructed opportunity to demonstrate their status with respect to the construct(s) being measured by an educational test. This is a key component of supporting individuals with disabilities. c. Accessibility refers to the notion that most test takers must have an unobstructed opportunity to demonstrate their status with respect to the construct(s) being measured by an educational test. It is impossible to achieve this for everyone, but is achievable for most students. d. None of these are reasonable definitions.
b. Accessibility refers to the notion that all test takers must have an unobstructed opportunity to demonstrate their status with respect to the construct(s) being measured by an educational test. This is a key component of supporting individuals with disabilities.
Suppose that the developers of a new science achievement test had inadvertently laden their test's items with gender-based stereotypes regarding the role of women in science and, when the new test was given, the test scores of girls were markedly lower than the test scores of boys. Which of the following deficits most likely accounts for this gender disparity in students' scores? a. Concurrent invalidity b. Construct-irrelevant variance c. Construct underrepresentation d. Predictive invalidity
b. Construct-irrelevant variance
Which of the following pieces of federal legislation attempts to install greater degrees of flexibility so that states and districts can particularize their programs for implementing chief provisions of the current successor to ESEA? a. NCLB b. ESSA c. ESEA d. IDEA
b. ESSA
Assume a state's education authorities have recently established a policy that, in order for students to be promoted to the next grade level, those students must pass a state-supervised English and language arts (ELA) exam. Administered near the close of grades three, six, and eight, the three new grade-level exams are intended to determine a student's mastery of the official state-approved ELA curricular targets for those three grades. As state authorities set out to provide support for these promotion-denial exams, which one of the following sources of validity evidence are they likely to rely on most heavily? a. Evidence based on responses of students as they take a grade-level test b. Evidence based on test content c. Evidence based on the test scores' relationship to other variables d. Evidence based on internal structure of each of the tests
b. Evidence based on test content
Which of the following statements is not an element in a research-supported conception of formative assessment? a. Formative assessment is a process, not a test. b. Formative assessment should be used only by teachers to adjust their ongoing instructional activities. c. Formative assessment must be carefully planned. d. Formative assessment calls for the use of assessment-elicited evidence in making adjustment decisions.
b. Formative assessment should be used only by teachers to adjust their ongoing instructional activities.
Which of the following represents the most appropriate strategy by which to support the validity of score-based interpretations for specific uses? a. Assembly of test-users' personal perceptions regarding the accuracy of their own score-based interpretations of test-takers' performances b. Generation of an evidence-laden validity argument in support of a particular usage-specified score interpretation c. Isolation of as much as possible inference-linked validity evidence, both positive and negative, regarding how to interpret test-takers' scores d. Collection of validity-relevant evidence, as well as validity-irrelevant evidence, regarding the best and the worst ways to give meaning to students' test scores
b. Generation of an evidence-laden validity argument in support of a particular usage-specified score interpretation
Which term best describes the type of measurement that would yield the following feedback: Jonathan scored within the 92nd percentile on the SAT? a. Assessment b. Classroom-created measurement c. Norm-referenced measurement d. Criterion-referenced measurement
c. Norm-referenced measurement
Why do some members of the measurement community prefer to use the phrase "absence-of-bias" rather than "assessment bias" when quantitatively reporting the degree to which an educational test appears to be biased? a. Because "absence of bias" does not include the qualifier "assessment" and, thus, is more comprehensive in its applicability. b. Because the quantitative indices typically employed when describing the extent to which a test is biased are most commonly conceptualized in a negative rather than a positive fashion. c. Because both reliability and validity, two key attributes of educational tests, are positive qualities. "Absence-of-bias" is a positive quality to be sought in educational tests. d. None of the provided answer choices is accurate.
c. Because both reliability and validity, two key attributes of educational tests, are positive qualities. "Absence-of-bias" is a positive quality to be sought in educational tests.
Consider the following test item. Your primary concern in selecting techniques to assess a learning objective or objectives should be classroom practicality and efficiency. a. True b. False Which category best describes this item? a. Multiple binary-choice b. Multiple choice c. Binary-choice d. Matching
c. Binary-choice
Please select the one answer that most accurately identifies the particular item's "quoted" reason for teachers to know about assessment. "I was quite surprised when our state's department of education insisted that each of the state's teachers collect accurate evidence of their students' growth because such evidence was to be used in evaluating all of the state's teachers. I have, for my entire career, collected pretest and posttest evidence of my students' achievement status because this helps me irrespective of what the state wants me to do to determine which changes, if any, are needed during next year's instruction." This is: a. A traditional reason for teachers to know about assessment b. One of today's reasons for teachers to know about assessment c. Both a traditional reason and one of today's reasons for teachers to know about assessment d. Neither a traditional reason nor one of today's reasons for teachers to know about assessment
c. Both a traditional reason and one of today's reasons for teachers to know about assessment
These standards, released by the Council of Chief State School Officers and the National Governors Association Center for Best Practices, were an attempt to establish continuity and consistency across varying state curricular aims. a. Every Student Succeeds Act b. Individuals with Disabilities Education Act c. Common Core State Standards d. No Child Left Behind
c. Common Core State Standards
One of the most commonly misused terms in educational jargon is the word "standards." In reality, there is no singular, all-encompassing concept of a standard, but rather more specific subtypes of educational standards. Which of the following subtypes of standards could best be described as "the knowledge or skills that educators want students to learn"? a. Teaching standard b. Performance standard c. Content standard d. Academic standard
c. Content standard
Ms. Brown attempted to design an assessment that assessed the reading comprehension of her students in regard to their ability to comprehend informative text. After reviewing her drafted assessment, she realized she had relied on other kinds of text and had not included an adequate amount of informative text. Which phrase best describes Ms. Brown's mistake? a. Inaccurate assessment formation b. Test bias c. Content underrepresentation d. Construct-irrelevant variance
c. Content underrepresentation
When students' test scores are, as predicted, correlated positively with those students' scores on a test aimed at a similar measurement mission, of what is this an example? a. Construct-irrelevant variance b. Divergent validity evidence c. Convergent validity evidence d. Construct underrepresentation
c. Convergent validity evidence
Which of the following was a major shift in the 2014 AERA-APA-NCME Standards for Educational and Psychological Testing? a. Formally sanctioning both the construct and the label: "Consequential Validity." b. Adding a complete chapter on affective assessment for individual students. c. Defining assessment validity as inference-accuracy for a specific purpose. d. Deleting a significant chapter of previous standards dealing with "fairness."
c. Defining assessment validity as inference-accuracy for a specific purpose.
Which one of the following pairs of validity evidence most frequently revolves exclusively around judgments focused on test content? a. Clearly explicated test developers' statements about what is to be assessed and correlations between test-takers' scores and their performances on external variables b. Descriptions of developmental care and correlations of test-takers' performances with their scores on similar tests c. Developmental-care documentation and external content reviews by nonpartisan judges d. Careful analyses of a test's internal structure and test-developers' descriptions of the care with which a test was built
c. Developmental-care documentation and external content reviews by nonpartisan judges
Which of the following best captures the testing comparison between NCLB and ESSA? a. ESSA demands more stringent assessment than NCLB. b. ESSA eliminates NCLB's focus on fairly assessing special subgroups of students. c. ESSA permits greater state-determinations of testing than NCLB. d. ESSA emphasizes more appropriate testing of female students than NCLB.
c. ESSA permits greater state-determinations of testing than NCLB.
Classroom assessment in public schools is often a function of federal legislation. Which of the following pieces of legislation is historically considered to have had the greatest impact on public school testing policies? a. No Child Left Behind b. Individuals with Disabilities Education Act of 2004 c. Elementary and Secondary Education Act (ESEA) of 1965 d. Every Student Succeeds Act
c. Elementary and Secondary Education Act (ESEA) of 1965
Which of the following terms best describes the means teachers employ in their attempt to promote students' achievement of the curricular ends being sought? a. Curriculum b. Measurability c. Instruction d. Assessment
c. Instruction
If Mr. Higgins, a fourth-grade teacher, tries to evaluate his major exams by ascertaining the degree to which his test's items are functioning in a similar manner, what kind of test-evaluative evidence is this? a. Test-retest reliability evidence b. Alternate-form reliability evidence c. Internal-consistency reliability evidence d. None of the above
c. Internal-consistency reliability evidence
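One common way to quantify how similarly a test's items are functioning is Cronbach's coefficient alpha. The sketch below is only an illustration under assumed data: the response matrix is made up, and the function name `cronbach_alpha` is hypothetical. It uses the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a students-by-items matrix of item scores."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # one tuple of scores per item
    item_variance_sum = sum(pvariance(col) for col in columns)
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical right/wrong (1/0) responses: five students, four items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.8
```

Because only one administration of one form is needed, this kind of evidence is easier to gather than test-retest or alternate-form evidence.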
If a teacher's students include children with disabilities or children who are English language learners, which assertion about assessment bias is most defensible? a. In view of the inherent difficulties that students with disabilities and English language learners are bound to experience in completing most educational tests, meaningful relaxations should be allowed in the levels of assessment challenges given to those two groups of students. b. Given the atypical nature of these two groups of students and the inability of most test accommodations to adequately level the playing field, there is really no need for test-developers to give any extra attention to bias-reduction for either of these two special groups of students. c. Students with disabilities and English language learners are at no greater risk for experiencing assessment bias. d. Because assessment bias erodes the validity of inferences derivative from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.
d. Because assessment bias erodes the validity of inferences derivative from students' test performances, even greater effort should be made to reduce assessment bias when working with these two distinctive populations.
One of the following rules for the construction of essay items is accurate. The other three rules are not. Which is the correct rule? a. Give students an opportunity to match their achievement levels with the essay test by allowing them to choose, from optional items, those they will answer. b. Judge the quality of a given set of essay items by seeing how accurately a tryout group of students can comprehend what responses are sought. c. Force students to allocate their time judiciously by never indicating how much time should be expended on a particular item. d. Construct all essay items so the student's task for each item is unambiguously described.
d. Construct all essay items so the student's task for each item is unambiguously described.
Which term best describes the type of measurement that would yield the following feedback: Jonathan mastered 92 percent of the tested content? a. Assessment b. Classroom-created measurement c. Norm-referenced measurement d. Criterion-referenced measurement
d. Criterion-referenced measurement
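The contrast between the two Jonathan items can be made concrete with a small sketch. The same raw score is read two ways: against the tested content (criterion-referenced) and against a comparison group (norm-referenced). The norm-group scores and both function names are hypothetical, and the percentile rank here is assumed to mean the percentage of the norm group scoring below the given score.

```python
def percent_mastered(correct, total_items):
    """Criterion-referenced reading: share of the tested content answered correctly."""
    return 100 * correct / total_items

def percentile_rank(score, norm_group_scores):
    """Norm-referenced reading: percent of the norm group scoring below this score."""
    below = sum(1 for s in norm_group_scores if s < score)
    return 100 * below / len(norm_group_scores)

# Hypothetical 50-item test and a made-up, high-scoring norm group
norm_group = [40, 42, 45, 46, 47, 48, 49, 50, 44, 41]
raw_score = 46

print(percent_mastered(raw_score, 50))         # 92.0
print(percentile_rank(raw_score, norm_group))  # 50.0
```

In this made-up case a student who mastered 92 percent of the content sits only at the middle of the norm group, which is why the two interpretations should not be confused.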
Differential item functioning (DIF) is employed in connection with which of the following approaches to bias-detection? a. Neither empirical nor judgmental approaches b. Both empirical and judgmental approaches c. Judgmental approaches d. Empirical approaches
d. Empirical approaches
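To give a sense of what an empirical DIF check involves, the sketch below compares how two groups perform on one item after matching students on total score. It is a simplified illustration in the spirit of a standardized p-difference index, not a full Mantel-Haenszel analysis; the records, field names, and the function name `standardized_p_difference` are all hypothetical.

```python
from collections import defaultdict

def standardized_p_difference(records, item):
    """Focal-weighted average of (focal - reference) proportion correct on one item,
    computed within groups of students who share the same total score."""
    strata = defaultdict(lambda: {"reference": [], "focal": []})
    for rec in records:
        strata[rec["total"]][rec["group"]].append(rec[item])

    weighted_diff = 0.0
    weight_sum = 0
    for groups in strata.values():
        ref, foc = groups["reference"], groups["focal"]
        if not ref or not foc:
            continue  # skip score levels where one group has no students
        weight = len(foc)
        weighted_diff += weight * (sum(foc) / len(foc) - sum(ref) / len(ref))
        weight_sum += weight
    return weighted_diff / weight_sum if weight_sum else 0.0

# Hypothetical data: equally able students (same total), differing item results
records = [
    {"group": "reference", "total": 3, "item7": 1},
    {"group": "reference", "total": 3, "item7": 1},
    {"group": "focal", "total": 3, "item7": 1},
    {"group": "focal", "total": 3, "item7": 0},
    {"group": "reference", "total": 2, "item7": 1},
    {"group": "reference", "total": 2, "item7": 0},
    {"group": "focal", "total": 2, "item7": 0},
    {"group": "focal", "total": 2, "item7": 0},
]

dif = standardized_p_difference(records, "item7")
print(dif)  # -0.5
```

A value well below zero flags the item for review because matched focal-group students answer it correctly less often; it does not by itself establish that the item is biased.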
Which of the following rules is often recommended for the generation of matching items? a. Order all of the premises alphabetically, but arrange the responses in an unpredictable manner. b. Ideally, both the premises and the responses should represent fundamentally heterogeneous lists. c. Place the premises for an item on one page, then put most of the responses for that item on the following page. d. Employ relatively brief lists, placing the shorter words or phrases at the right.
d. Employ relatively brief lists, placing the shorter words or phrases at the right.
Which of the following assertions most accurately captures the relationship between disparate impact of a test and assessment bias? a. If a test is biased, it will only rarely have a disparate impact on different student subgroups. b. A test that is biased against either gender group will almost certainly be biased against ethnic groups. c. If a test has a disparate impact on different student subgroups, the test is a priori biased. d. If a test has a disparate impact on different student subgroups, the test is not necessarily biased.
d. If a test has a disparate impact on different student subgroups, the test is not necessarily biased.
Please select the one answer that most accurately identifies the particular item's "quoted" reason for teachers to know about assessment. "Just as physicians need to know about patients' blood pressure and what it indicates, teachers need to know about educational testing. It is simply part of what a solid educational professional needs to understand." This is: a. One of today's reasons for teachers to know about assessment b. A traditional reason for teachers to know about assessment c. Both a traditional reason and one of today's reasons for teachers to know about assessment d. Neither a traditional reason nor one of today's reasons for teachers to know about assessment
d. Neither a traditional reason nor one of today's reasons for teachers to know about assessment
Which of the following non-profit organizations played a significant role in the rapid adoption of the Common Core State Standards? a. The National Education Association b. The American Federation of Teachers c. The Council for Exceptional Children d. The Bill and Melinda Gates Foundation
d. The Bill and Melinda Gates Foundation
Which of the following is generally conceded to be a key component of formative assessment? a. A teacher's exclusive reliance on the collection of data using constructed-response tests. b. Data obtained via standardized achievement tests c. A heavy emphasis on using students' classroom test results as a dominant factor in determining students' grades d. The framework provided by a learning progression's building blocks
d. The framework provided by a learning progression's building blocks
What is the chief function of validity evidence when employed to confirm the accuracy of score-based interpretations about test-takers' status in relation to specific uses of an educational test? a. To underscore the importance of consequential validity that's intended to verify that the particular uses for test-takers' scores are legitimate b. To explore the full range of potential uses for educational tests originally built to support one or, at most, two specific uses of test-takers' results c. To confirm rival hypotheses that might legitimately challenge a proposed interpretation of a test-taker's performance d. To support relevant propositions in a validity argument that's marshaled to determine the defensibility of certain score-based interpretations
d. To support relevant propositions in a validity argument that's marshaled to determine the defensibility of certain score-based interpretations