401 Final
Which of the following is the most often misinterpreted score-interpretation indicator used with standardized tests? a)Grade-equivalent score b)Percentile c)Stanine d)Raw score
a)Grade-equivalent score
A substantial number of educators regard affective curricular aims as being equal in importance to cognitive curricular aims—or, possibly, of even greater importance. a) True b) False
a) True
For an affective self-report inventory, very young students can be asked to reply to simple statements—sometimes presented orally—by the use of only two or three agreement-options per statement. a) True b) False
a) True
Generally speaking, teachers can make defensible decisions about the impact of affectively oriented instruction by arriving at group-focused inferences regarding their students' affective status prior to and following that instruction. a) True b) False
a) True
Most teachers want their students, at the close of an instructional period, to exhibit subject-approaching tendencies (that is, an interest in the subject being taught) equal to or greater than the subject-approaching tendencies those students displayed at the beginning of instruction. a) True b) False
a) True
Multifocus self-report affective inventories typically contain far fewer items related to each affective variable being measured than do traditional Likert inventories. a) True b) False
a) True
One of the more difficult decisions to be faced when constructing a multifocus self-report affective inventory is arriving at a response to the following question: How many items are needed for each affective variable being measured? a) True b) False
a) True
When teachers administer an affective assessment to their students early in an instructional program and intend to administer the same or a similar assessment to their students later, the assessments often have a substantial impact on the teacher's instruction. a) True b) False
a) True
Whenever self-report inventories are employed to measure students' affective dispositions, students' perceptions that their responses are anonymous are typically far more important than is "actual" anonymity. a) True b) False
a) True
Whereas most cognitively oriented classroom assessments attempt to measure students' optimal performances, affectively oriented classroom assessments attempt to get an accurate fix on students' typical dispositions. a) True b) False
a) True
Students' affect is most often measured in school because evidence of students' affect is believed to help predict how students are apt to behave in particular ways later—when those students' educations have been concluded. a) True b) False
a) True
About a month into the new school year, high school teacher Rodney Gardner used a 25-item multiple-choice test to measure how well students had mastered a rather large array of factual information about key events in U.S. history. He calculated the p-value for each of the test's 25 four-option items, and he was relatively pleased when the average p-value for the entire set of 25 items was .56. Much later in the year, with the same class, he tried out another assessment tactic with a brand-new test consisting of 40 true/false items. When he calculated p-values for each of the 40 items, he was gratified to discover that the average p-value was .73. Mr. Gardner concluded that because of the p-value "bump" of .17, his students' learning had increased substantially. Please select from the four choices below the statement that most accurately describes Mr. Gardner's interpretation of his students' test results. a)A serious flaw in Mr. Gardner's conclusion about his students' hypothetical improved learning is that the increase in p value is a univariate result of student learning. b)Mr. Gardner's interpretation of substantially increased student learning is erroneous because p-values are solely indicative of the proportion of correct answers produced in response to particular test items, hence such p-values are almost completely unrelated to the degree of students' learning. c)Regrettably, Mr. Gardner's interpretation was clearly in error because when comparing students' performances on different tests, it is customary to use students' average raw scores rather than p-values, so the p-value comparison he made is likely to be misleading. d)Mr. Gardner's interpretation of a "bump" in student learning is mistaken because lower p-values are indicative of more difficult items, so Mr. Gardner's multiple-choice exam early in the school year contained more difficult items than the binary-choice exam he administered later in the school year.
a)A serious flaw in Mr. Gardner's conclusion about his students' hypothetical improved learning is that the increase in p value is a univariate result of student learning.
Which of the following is an instructionally beneficial rubric? a)A skill-focused rubric b)A task-specific rubric c)A hypergeneral rubric d)None of the above
a)A skill-focused rubric
Mr. McMillan was busy assigning grades to his sixth-graders' science fair projects. One of his star students, Darren, submitted an excellent project. However, Mr. McMillan felt the project didn't represent the full extent of Darren's potential, so he gave Darren a B. What type of grading is Mr. McMillan applying? a)Aptitude-based grading b)Absolute grading c)Relative grading d)None of these is correct
a)Aptitude-based grading
The teaching staff in a suburban middle school is concerned with the quality of their school's teacher-made classroom assessments. This issue has arisen because the district school board has directed all schools to install a teacher-evaluation process featuring prominently weighted evidence of students' learning as measured chiefly by teacher-made tests. The district office requires teachers to submit all students' responses from each classroom assessment immediately after those assessments have been administered. Then, in less than 2 weeks after submission, teachers receive descriptive statistics for each test (such as students' means and standard deviations). Teachers also receive an internal consistency reliability coefficient for the total test and a p-value and an item-discrimination index for each item. Teachers then must personally judge the quality of their own tests' items. The teachers' reviews of their tests' individual items are seen as subjective by almost everyone involved, whereas the empirical evidence of item quality is regarded as objective. The school's faculty unanimously decides to weight teachers' own per-item judgments at 25 percent while weighting the statistical per-item p-values and item-discrimination indices at 75 percent. Please select the statement that most accurately characterizes the test-improvement procedures in this suburban middle school. a)Because the relevance of traditional item-quality indicators, such as those supplied by this school's district office, can vary depending on the specific use to which a teacher-made test will be put, the across-the-board weightings (25 percent judgmental; 75 percent empirical) may be inappropriate for the proposed teacher-evaluation process. b)The teaching staff's unanimous judgment about the relative per-item weightings of teachers' judgments and per-item empirical indicators is essentially backward - and should have been a 75 percent weighting of judgmental evidence and a 25 percent weighting of empirical evidence. c)Because of the imprecision of both judgmental and empirical indicators of an item's quality, the only truly defensible weightings of those two categories of item-quality evidence should be identical, with 50 percent for empirical indicators and 50 percent for judgmental indicators. d)Once a differential per-item weighting contrast has been decided on for judgmental and empirical indicators of items' quality, that difference must remain constant for all items involved in a test-improvement effort.
a)Because the relevance of traditional item-quality indicators, such as those supplied by this school's district office, can vary depending on the specific use to which a teacher-made test will be put, the across-the-board weightings (25 percent judgmental; 75 percent empirical) may be inappropriate for the proposed teacher-evaluation process.
Which of the following steps is not one that should be followed in the creation of a multifocus affective inventory for use in classroom assessment? a)Create a series of exclusively positive statements related to each affective variable selected. b)Determine the number and phrasing of students' response options. c)Select the affective variables to measure. d)Decide on how many items to use in measuring each variable.
a)Create a series of exclusively positive statements related to each affective variable selected.
Mr. Ramirez administered an instructionally diagnostic test to his students in order to better understand how to proceed in his teaching of fractions. The diagnostic test he administered took him nearly three days to score, thus countering its practical usefulness. Which attribute of an instructionally diagnostic test did this test violate? a)Ease of usage b)Item quality c)Curricular alignment d)Sufficiency of items
a)Ease of usage
Which is not a factor around which teachers should structure the evaluation of the quality of their instruction? a)Evidence constructed exclusively from classroom observations b)Assessment evidence collected via classroom assessments c)Evidence pertaining to any positive or negative side effects d)Assessment evidence collected via accountability tests
a)Evidence constructed exclusively from classroom observations
Ms. Cooke computed some basic descriptive statistics for each of her two Algebra I classes. Her first period class had a standard deviation of 7.6. Her second period class had a standard deviation of 4.5. Which class has a greater variability in test scores and how do you know? a)First period has a greater variability because the standard deviation of 7.6 is larger than the standard deviation of 4.5. b)There is not enough information given to make this determination. c)Second period has a greater variability because the standard deviation of 4.5 is smaller than the standard deviation of 7.6. d)The standard deviation is not the proper statistic for determining variability.
a)First period has a greater variability because the standard deviation of 7.6 is larger than the standard deviation of 4.5.
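To illustrate the comparison in the card above, here is a minimal Python sketch. The two score lists are invented for illustration (the question supplies only the standard deviations 7.6 and 4.5), and `more_variable` is a hypothetical helper name. It shows that the class with the larger standard deviation is the one whose scores are more spread out.

```python
from statistics import pstdev

# Hypothetical test scores, invented for illustration only.
first_period = [60, 70, 75, 80, 95]
second_period = [72, 75, 76, 78, 80]


def more_variable(scores_a, scores_b):
    """Return 'first' if scores_a has the larger standard deviation, else 'second'."""
    return "first" if pstdev(scores_a) > pstdev(scores_b) else "second"


# The more widely spread list has the larger standard deviation.
print(more_variable(first_period, second_period))
```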
Which best describes the two types of evaluation processes under which teachers are routinely evaluated? a)Formative and summative evaluations b)Summative and instructional evaluations c)Formative and instructional evaluations d)None of the provided answer choices is correct
a)Formative and summative evaluations
Consider the three statements. Which are considered characteristics of portfolio assessments? I. Represents the range of reading and writing students are engaged in II. Engages students in assessing their progress and/or accomplishments and establishing ongoing learning goals III. Mechanically scored or scored by teachers who have little input a)I and II b)III only c)II only d)I, II, and III
a)I and II
Which of the following contentions about affective assessment in the classroom is not accurate? a)If classroom affective assessment is introduced, it substantially diminishes the attention given to the measurement of higher-level cognitive outcomes. b)Anonymity enhancement procedures for classroom affective assessment are extremely desirable. c)Classroom affective assessment typically focuses a teacher's instructional concerns more directly on the promotion of affective objectives for students. d)If affective assessment is to take place in classrooms, many teachers will need to learn about affect-focused instructional techniques.
a)If classroom affective assessment is introduced, it substantially diminishes the attention given to the measurement of higher-level cognitive outcomes.
Which of the following is not one of the five review criteria for judgmentally based improvement procedures listed in Chapter 11? a)Item's ease of grading b)Accuracy of content c)Contribution to score-based inference d)Fairness
a)Item's ease of grading
Which of the following is the most troublesome problem facing those educators who wish to rely heavily on the use of performance tests? a)Making valid inferences about students' generalized skill-mastery b)Generating tasks for performance tests c)Persuading students to respond seriously to a performance test's task(s) d)Scoring students' responses to a performance test's task(s)
a)Making valid inferences about students' generalized skill-mastery
Ms. Troy is the principal of Sunnyside Elementary School. In order to evaluate one of her first-grade teachers, Ms. Stelter, Ms. Troy sits down and observes a 30-minute lesson. At the conclusion of the lesson, Ms. Troy concludes that Ms. Stelter is an impactful teacher. What is the main issue with Ms. Troy's evaluation process? a)Ms. Troy has not collected any outcome data. b)There is no problem with Ms. Troy's process or conclusion. c)Ms. Troy needs to observe Ms. Stelter for more than one day. d)Ms. Troy needs to observe Ms. Stelter for longer than 30 minutes.
a)Ms. Troy has not collected any outcome data.
Consider the three statements. Which are considered characteristics of portfolio assessments? I. Assesses all students on the same dimensions. II. Addresses achievement only. III. Separates learning, testing, and teaching. a)None of these represent characteristics of portfolio assessment. b)I and III c)I only d)I and II
a)None of these represent characteristics of portfolio assessment.
Which term represents the number of items that a student has answered correctly on an assessment? a)Raw score b)Mean c)Median d)Range
a)Raw score
Mrs. Kate administered a test on the scientific method to her 10th grade Biology students. When Mrs. Kate's administrator reviewed her assigned grades on the test he noticed that the grades represented a normal distribution. When he asked why, Mrs. Kate said, "The students who performed best in the class received the highest grades and then grades were distributed normally based on class performance." What type of grading is Mrs. Kate describing? a)Relative grading b)Absolute grading c)Aptitude-based grading d)None of these is correct
a)Relative grading
Scale scores are converted raw scores that use a new, arbitrarily chosen scale to represent levels of achievement or ability. What is one clear advantage of a scale score? a)Scale scores allow for the comparison of several equidifficult forms of a test. b)Scale scores are easier to read than raw scores. c)Scale scores are more accurate than raw scores. d)None of these represent an advantage of scale scores.
a)Scale scores allow for the comparison of several equidifficult forms of a test.
Mr. Miller has decided to incorporate a pretest/posttest design to his classroom evaluation procedures in order to gather better classroom data to guide his instruction. He created two equidifficult forms of the test he plans to use, divided his class in half, and administered one of the two forms to each half of his class. He then administered the opposite version of the test to each different half. So, in the end, each half of the class had taken both tests. What type of testing design is Mr. Miller following? a)Split-and-switch design b)Blind-scoring design c)Pretest - posttest design d)None of these answer choices is correct.
a)Split-and-switch design
Which of the following is not a reason that should dissuade policymakers from evaluating educational quality on the basis of students' scores on certain educational achievement tests? a)Substantial gaps between minority and majority students' performance on most accountability tests will rarely be found. b)There are often substantial mismatches between the content covered on some of these tests and local curricular emphases. c)It is difficult to tell from such tests how much of a student's test performance is due to what was taught in school rather than to students' socioeconomic status or inherited academic aptitudes. d)There is a technical tendency to remove from such tests items covering important, teacher-stressed knowledge and skills.
a)Substantial gaps between minority and majority students' performance on most accountability tests will rarely be found.
Which of the following is the most accurate in regards to grading student effort? a)There is no clear, commonly defined, approach to evaluating student effort and therefore effort cannot be graded. b)Effort contains a common definition that makes it easy to grade. c)Effort is an important factor that should be calculated routinely. d)None of these answer choices is correct.
a)There is no clear, commonly defined, approach to evaluating student effort and therefore effort cannot be graded.
The only acceptable response options presented to a student who must complete a self-report affective inventory containing statements about the same topic should be the following: Strongly Agree, Agree, Uncertain, Disagree, Strongly Disagree. a) True b) False
b) False
The vast majority of educators think revealing to students the nature of a teacher's affective curricular aims—at the outset of instruction—is an instructionally sensible action to take. a) True b) False
b) False
Because there is a statewide reading comprehension test that must be passed by all high-school students before they receive state-sanctioned diplomas, Mr. Gillette, a 10th-grade English teacher, spends about four weeks of his regular class sessions getting students ready to pass standardized tests. He devotes one week to each of the following topics: (1) time management in examinations, (2) dealing with test-induced anxiety, (3) making calculated guesses, and (4) trying to think like the test's item writers. Mr. Gillette's students seem appreciative of his efforts. Mr. Gillette's activities constitute ________. a)a violation of the professional ethics guideline b)a violation of the educational defensibility guideline c)a violation of both guidelines d)a violation of neither guideline
a)a violation of the professional ethics guideline
Which of the following indices is most commonly used to represent an item's difficulty? a)p value b)Correlation coefficient c)Distractor efficiency d)Discrimination index
a)p value
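As a small sketch of the index named above: an item's p value is the proportion of students who answered the item correctly, so lower values indicate harder items. The response data and the function name `p_value` below are assumptions made for illustration.

```python
def p_value(responses):
    """Proportion of students answering an item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


# Hypothetical responses of ten students to a single item.
item_responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Seven of ten students answered correctly.
print(round(p_value(item_responses), 2))
```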
Analytic scoring is better than holistic scoring when an educator is trying to _________. a)provide diagnostic feedback to students b)remove sources of unreliability c)increase the efficiency of scoring d)increase overall validity
a)provide diagnostic feedback to students
When assessing the affective dispositions of a group of students by using a self-report inventory, it is obligatory to employ a Likert inventory as similar as possible to the inventories introduced by Rensis Likert in 1932. a) True b) False
b) False
When anonymously completed self-report inventories are being used in an attempt to assess students' affect, if some students respond too positively and, at the same time, some students respond too negatively, teachers simply cannot draw valid group-focused inferences about students' affective status. a) True b) False
b) False
In a normal distribution, approximately what percentage of test scores would fall more than two standard deviations above the mean? a)50 percent b)2 percent c)10 percent d)15 percent
b)2 percent
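The 2 percent figure corresponds to the share of scores lying more than two standard deviations above the mean. A quick check, assuming a standard normal distribution and using Python's standard-library `statistics.NormalDist` (the variable names are illustrative):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of scores lying more than two standard deviations above the mean.
upper_tail = 1.0 - standard_normal.cdf(2.0)

print(f"{upper_tail:.1%}")  # about 2.3 percent
```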
A teacher's challenge in reducing students' tendency to supply socially desirable answers on a self-report inventory is identical to the challenge in reducing students' socially desirable responses to the items on a cognitive test. a) True b) False
b) False
If a teacher sets out to bring about changes in students' values, he or she needs to select as a curricular aim the promotion of only those values that are supported by more than 50 percent of the students' parents and at least half of the general citizenry of the state in which the teacher's school is located. a) True b) False
b) False
If respondents who are completing an affective self-report inventory that's presented by a computer are informed at the outset that their responses will be anonymous, it can be safely assumed that almost all students will believe their responses to be truly anonymous. a) True b) False
b) False
The greater the educational significance that a teacher attributes to the pursuit of affective curricular aims, the more acceptable it is for the teacher to use students' self-report responses not only to arrive at affectively focused inferences about groups of students but also to make inferences about particular students' affective status. a) True b) False
b) False
Suppose a group of test scores forms a perfectly normal distribution. Approximately what percentage of test scores will fall within one standard deviation of the mean? a)100 percent b)66 percent c)25 percent d)33 percent
b)66 percent
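For reference, the share of a normal distribution within one standard deviation of the mean is closer to 68 percent, so 66 percent is simply the nearest of the listed options. A minimal check, again assuming a standard normal distribution and the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of scores between one standard deviation below and one above the mean.
within_one_sd = z.cdf(1.0) - z.cdf(-1.0)

print(f"{within_one_sd:.1%}")  # about 68.3 percent
```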
Which type of instructionally diagnostic tests are most commonly found in special education? a)Instruction-oriented b)Classification-focused c)Standardized diagnostic tests d)None of these are commonly found in special education
b)Classification-focused
Which of the following is not one of the five rules for rubrics outlined in Chapter 8? a)Make certain that all the rubric's evaluative criteria can be addressed instructionally b)Employ as many evaluative criteria as possible c)Provide a succinct label for each evaluative criterion d)Make sure the skill to be assessed is significant
b)Employ as many evaluative criteria as possible
Which of the following rules is not one that is recommended when creating a scoring rubric that will have a positive impact on classroom instruction? a)Make certain the skill to be assessed is truly significant. b)Employ as many evaluative criteria as possible to judge major aspects of students' responses. c)Provide a terse label for each of the rubric's evaluative criteria. d)Be sure that all of the rubric's evaluative criteria can be addressed instructionally by teachers.
b)Employ as many evaluative criteria as possible to judge major aspects of students' responses.
Which of the following is not one of the four described steps in the development of a goal-attainment approach to grading? a)Choosing goal-attainment evidence b)Ensuring that grading criteria are kept private for security purposes c)Arriving at a final goal-attainment grade d)Clarifying curricular aims
b)Ensuring that grading criteria are kept private for security purposes
The difficulties stemming from the presence of "social desirability's" contamination of students' responses can be effectively addressed by informing students in the initial directions that they are to identify themselves only after responding to all items. a)True b)False
b)False
To provide a more complete picture of students' current affective status, it is sensible to ask students to supplement their anonymous responses to a self-report inventory by adding optional, unsigned explanatory comments if they wish to do so. a)True b)False
b)False
A first-year classroom teacher, George Jenkins, has just finished preparing the initial set of three classroom tests he intends to use with his fifth-grade students early in the school year (one test each in mathematics, language arts, and social studies). In an effort to improve those tests, he has e-mailed a draft version of the three tests to his mother, who provided continuing support for George while he completed his teacher-education coursework as well as a semester-long student teaching experience. He asks his mother to suggest improvements that he might make in the early-version tests. Which best describes George's effort to enhance the quality of his tests? a)George probably chose the best source of improvement advice for his draft tests, but he should have provided his mother with more specific guidelines regarding what she should be looking for as she reviews his tests' items. b)George could probably have secured better advice about his draft tests had he solicited it from his school's teachers and from his fifth-grade students before they took the tests. c)Although George's mother might have some useful suggestions regarding the quality of his early-version tests, a novice teacher such as George can almost always get the best instruction and/or assessment advice from his school's principal - so he should head to the school office for test-related suggestions. d)In addition to his own review (and, if necessary, editing) of the three draft tests - carried out at least a few days after having written them - George should seek critiques from his fellow elementary school teachers and, after his students have finished the tests, from those students as well.
b)George could probably have secured better advice about his draft tests had he solicited it from his school's teachers and from his fifth-grade students before they took the tests.
Sue Philips, a health education teacher in a large urban middle school, has recently begun analyzing her selected-response classroom tests using empirical data from students' current performances on those tests. She has acquired a simplified test-analysis program from her district's administrators and applies the program on her own laptop computer. She tries to base students' grades chiefly on their test scores and hopes to find that her items display item-discrimination indices below .20. A recent analysis of items from one of her major classroom tests indicated that three items were negative discriminators. Sue was elated. Please select the statement that most accurately describes Sue's test-improvement understanding. a)Sue's elation with the three negatively discriminating items was warranted because items that are genuinely capable of negative discriminations can accurately unearth otherwise hidden item-construction flaws. b)Given Sue's use of students' test performances to assign grades, her understanding of item-discrimination indices is confused—actually, her items should be yielding strong positive indices rather than low or negative indices. c)Sue should have been relying on her test items' p-values to spot discrimination disparities in those items' ability to differentiate between the most and least knowledgeable students. d)Sue should not have used her current students' test performances to determine indicators of item quality—past students' performances represent more typical reflections of students' abilities.
b)Given Sue's use of students' test performances to assign grades, her understanding of item-discrimination indices is confused—actually, her items should be yielding strong positive indices rather than low or negative indices.
Engaging in assessment improvement is a natural part of the teaching process. However, sometimes reviewing your self-created materials alone can prove problematic. Which represents a reasonable explanation of why self-review can be problematic? a)Since you created the assessment, it isn't possible for you to critique it. b)If you created the assessment, you are prone to be biased in its favor. c)Teachers should play no part in reviewing their own assessments. This should be left to the help of colleagues and other professionals. d)You may be the expert in your content, but you're not an expert in assessment.
b)If you created the assessment, you are prone to be biased in its favor.
Which represents an accurate description of a performance assessment? a)An approach to measuring a student's status based solely on accuracy of response. b)An approach to measuring a student's status through their performance on predominately binary-choice items. c)An approach to measuring a student's status based on the way the student completes a specified task. d)An approach to measuring a student's status based on a combination of binary and multiple-choice items.
c)An approach to measuring a student's status based on the way the student completes a specified task.
Joshua Jenkins teaches a particularly popular high-enrollment series of American government courses in a large suburban high school. He has recently been tinkering with the multiple-choice exams he gives to his students because he wants to select the very best students to take part in an upcoming off-campus community project. He carries out a distractor analysis for one exam, given at the end of a 6-week unit on U.S. politics. He uses the data from the two largest of his four American government classes. After reviewing the distractor analysis data for item 27, Joshua decides to continue using the item in his exam covering the U.S. politics unit. (The distractor analysis table for item No. 27 is not reproduced here.) a)Joshua's decision to retain the item was correct - chiefly because all five multiple-choice options were chosen by one or more of his students in both the high total-test scorers and the low total-test scorers. b)Joshua erred in deciding to retain the item because, in order to make the intended norm-referenced interpretations about his students, the discrimination index of .15 is too low. c)Joshua wisely decided to retain the item because more students in the high-scoring group chose the correct answer than did students in the low-scoring group. d)Joshua's decision to retain the item was mistaken because of the item's atypically low p-value.
b)Joshua erred in deciding to retain the item because, in order to make the intended norm-referenced interpretations about his students, the discrimination index of .15 is too low.
Which of the following is not a common flaw when scoring performance assessments? a)Teachers' personal-bias errors b)Lack of scorer familiarity with the totality of the content being assessed c)Scoring-instrument flaws d)Procedural flaws
b)Lack of scorer familiarity with the totality of the content being assessed
Which characteristic would not be considered a quality of an item that would cause the item to be instructionally insensitive? a)Alignment leniency b)Length of item c)Socioeconomic status links d)Excessive difficulty
b)Length of item
Consider the following set of factors that could be employed to judge the quality of the tasks for performance tests. Which one is not generally endorsed as a task-selection factor? a)Feasibility of implementation b)Motivational impact on students c)Authenticity d)Teachability of the skill assessed by a task
b)Motivational impact on students
For tests intended to provide norm-referenced interpretations, which of the following kinds of items should be sought? a)Negative discriminators b)Positive discriminators c)Nondiscriminators d)None of the above
b)Positive discriminators
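A common way to quantify this is an item-discrimination index, computed as the proportion of high total-test scorers who answer the item correctly minus the same proportion for low scorers. The sketch below assumes that simple definition. The group data and the function name `discrimination_index` are hypothetical.

```python
def discrimination_index(high_group, low_group):
    """p value in the high-scoring group minus p value in the low-scoring group.

    Each list holds 1 (correct) or 0 (incorrect) for one item.
    """
    p_high = sum(high_group) / len(high_group)
    p_low = sum(low_group) / len(low_group)
    return p_high - p_low


# Hypothetical responses to one item from ten high and ten low total-test scorers.
high_scorers = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]  # 8 of 10 correct
low_scorers = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # 4 of 10 correct

d = discrimination_index(high_scorers, low_scorers)
# A positive value means high scorers outperform low scorers on the item.
print(round(d, 2))
```

A positive index marks a positive discriminator, a value near zero a nondiscriminator, and a negative index a negative discriminator.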
Which of the following is not one of the attributes of concern for an instructionally diagnostic test? a)Curricular alignment b)Simplicity of content c)Ease of usage d)Item quality
b)Simplicity of content
Mr. Byron is concerned about his students' end of year test scores. In his state, student test data are compared across years. For example, his students' scores on last year's exam were taken into account so that this year, students who scored similarly last year can be compared to one another this year. The comparisons are reported in percentiles such as, "Addy scored as well or better than 95 percent of her peers who took this assessment and who scored at the same scale score last year." What type of approach to evaluation is being described? a)Scale score comparisons b)Student growth percentile c)Pretest - posttest design d)None of these is correct
b)Student growth percentile
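The idea in the card above can be sketched in a much-simplified form: compare a student's current score only with "academic peers" who earned the same scale score last year. The peer scores and the function name `growth_percentile` are invented for illustration, and operational growth-percentile systems rely on more elaborate statistical models than this direct count.

```python
def growth_percentile(student_score, peer_scores):
    """Percent of academic peers whose current score is at or below the student's."""
    at_or_below = sum(1 for score in peer_scores if score <= student_score)
    return 100 * at_or_below / len(peer_scores)


# Hypothetical current-year scores of ten students who all earned the
# same scale score last year (the student being described is included).
peers_this_year = [410, 420, 425, 430, 440, 445, 450, 455, 460, 470]

# A student scoring 460 did as well as or better than 9 of the 10 peers.
print(growth_percentile(460, peers_this_year))
```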
When considering the clarification of curricular aims, what two groups of stakeholders should be the teacher's focus? a)Students and administrators b)Students and parents c)Parents and administrators d)None of these is correct
b)Students and parents
Which one of the following statements regarding the improvement of classroom assessments is not accurate? a)Performance-based approaches to item improvement usually are different for tests aimed at criterion-referenced inferences than for those aimed at norm-referenced inferences. b)Students' reactions to test items should play little or no role in item improvement. c)Classroom teachers can employ judgmental improvement procedures, empirical procedures, or both. d)For selected-response tests, especially multiple-choice items, distractor analyses can prove useful in item-improvement.
b)Students' reactions to test items should play little or no role in item improvement.
If classroom teachers set out to improve their own tests using judgmental approaches, which of the following review criteria is not a factor teachers ought to consider? a)Absence of any significant gaps in a test's content coverage b)The likelihood that, if a test is seen by parents, those parents will recognize the suitability of an item's content coverage c)Adherence to item-specific guidelines and general item-writing rules d)Each item's likely contribution to a valid score-based inference about a student's status
b)The likelihood that, if a test is seen by parents, those parents will recognize the suitability of an item's content coverage
Given your reading of Chapter 9, which would you anticipate being a major obstacle to the effective use of portfolios? a)They are unpopular with students. b)They are labor intensive. c)They include only examples of a student's best work. d)They frequently involve student collaboration.
b)They are labor intensive.
When considering instructionally diagnostic tests, at least how many strengths and weaknesses should be addressed? a)There is no minimum. b)Two c)One d)Three
b)Two
Which of the following phrases is most commonly used to describe a portfolio focused on a student's self-evaluative and ongoing improvement in the quality of work products? a)Formative portfolio b)Working portfolio c)Summative portfolio d)Showcase portfolio
b)Working portfolio
Fred Phillips prepares his sixth-grade social studies students to do well on a state-administered social studies examination by having all of his students take part in practice exercises using test items similar to those found on the state examination. Fred tries to replicate the nature of the state examination's items without ever using exactly the same content as it is apt to appear on the examination. He weaves his test-preparation activities into his regular social studies instruction so cleverly that most students really don't know they are receiving examination-related preparation. Fred Phillips' activities constitute _________. a)a violation of the professional ethics guideline b)a violation of the educational defensibility guideline c)a violation of both guidelines d)a violation of neither guideline
b)a violation of the educational defensibility guideline
Srijati is eager to have her fourth-grade students become better "close readers"—that is, to be better able to read written materials carefully so that they are capable of, as Srijati says, "sucking all of the meaning out of what they read." Because of reductions in assessment funds, Srijati's school district has been obliged to eliminate all constructed-response items assessing students' reading comprehension. All items measuring students' reading comprehension, therefore, must be selected-response types of items and, beyond that, district officials have indicated that only three specific item types will be used in district-developed reading tests. So that her students will perform optimally on the district-developed reading tests, Srijati provides "close-reading practice" based exclusively on the three district-approved ways for students to display their reading comprehension. Srijati's fourth-graders really shine when it is time to take the district reading tests. Srijati's activities constitute _______. a)a violation of the professional ethics guideline b)a violation of the educational defensibility guideline c)a violation of both guidelines d)a violation of neither guideline
b)a violation of the educational defensibility guideline
Instructionally diagnostic tests are generally designed to yield informative results regarding _____________________. a)subgroups of students b)individual students c)large groups of students d)None of these is correct.
b)individual students
A rubric is a scoring guide to be employed in judging students' responses to constructed-response assessments such as a performance test. Which one of the following elements is the least necessary feature of a properly constructed rubric? a)An identification of the evaluative criteria to be used in appraising a student's response b)Descriptions of different quality levels associated with each evaluative criterion c)A designation of a performance standard required for skill-mastery d)An indication of whether a holistic or analytic scoring approach is to be used
c)A designation of a performance standard required for skill-mastery
Mr. Cory has established a set of guidelines for grading his ninth-grade students' English essays. He was quite pleased, because as he graded the essays he realized that almost 95 percent of the class earned an A. Which type of grading is most likely being described? a)Aptitude-based grading b)Relative grading c)Absolute grading d)None of these is correct
c)Absolute grading
Anita Gonzales teaches middle-school English courses. At least half of her classroom tests call for students to author original compositions. Her other tests are typically composed of selected-response items. Anita has recently committed herself to the improvement of these selected-response tests, so when she distributes those tests to her students, she also supplies an item-improvement questionnaire to each student. The questionnaire asks students as they complete their tests to identify any items that they regard as (1) confusing, (2) having multiple correct answers, (3) having no correct answers, or (4) containing unfamiliar vocabulary terms. Students are to turn in their questionnaires along with their completed tests, but are given the option of turning in the questionnaires anonymously or not. Which statement most accurately portrays Anita's test-improvement procedures? a)Anita should make sure to have her students put their names on the item-improvement questionnaires so that she can identify which items were criticized by which students - then discount students' criticisms of any items that a complaining student had answered incorrectly. b)Instead of combining students' reactions to the items on the tests in a single item-improvement questionnaire, Anita should have developed and distributed four separate questionnaires - one for each of the four potential deficits in the items. c)Although seeking students' judgments regarding her tests has much to commend it, Anita should have sought students' reactions to a test only after they had completed it - by distributing blank copies of the test along with the item-improvement questionnaire. d)Because flawed test-directions often distort students' responses to selected-response tests' items and because Anita's own test-directions might be seriously defective, she should have asked her students to evaluate only the directions for her tests, not the tests' actual items.
c)Although seeking students' judgments regarding her tests has much to commend it, Anita should have sought students' reactions to a test only after they had completed it - by distributing blank copies of the test along with the item-improvement questionnaire.
During last year's end-of-school evaluation conference, Jessica Jones, a high-school social studies teacher, was told by the principal that her classroom tests were "haphazard at best." Jessica now intends to systematically review each of the classroom tests she builds, based on her principal's suggestions. She intends to personally evaluate each test on the basis of (1) its likely contribution to a valid test-based inference, (2) the accuracy of its content, (3) the absence of any important content omissions, and (4) the test's fundamental fairness. Which option represents the best appraisal of Jessica's test-review plans? a)Jessica's four evaluative criteria not only represent useful factors for teachers to employ when carrying out their own test-review efforts, but these four evaluative criteria also exhaust the potentially helpful factors that teachers should consider when evaluating their own teacher-made tests. b)Jessica suffers from a misperception about the potential test-improvement dividends derivative from teachers' often excessively favorable evaluations of their own teacher-made classroom tests. c)Although the four test-review factors Jessica chose will help her identify certain deficiencies in her tests, she should also incorporate as review criteria a full range of widely endorsed experience-based and research-based (1) item-specific guidelines and (2) general item-writing guidelines. d)Because of the frequent dependence on judgmentally determined policy positions when dealing with social studies content (as opposed to the content typically encountered in, say, mathematics and science courses), Jessica's well-intentioned judgmental reviews of her own teacher-made tests are likely to be unsuccessful.
c)Although the four test-review factors Jessica chose will help her identify certain deficiencies in her tests, she should also incorporate as review criteria a full range of widely endorsed experience-based and research-based (1) item-specific guidelines and (2) general item-writing guidelines.
Many experienced assessors of student affect suggest the most appropriate way for classroom teachers to monitor their students' affect is through the use of: a)Teacher observations of affect-related student behaviors b)Anonymously completed personalized essays c)Anonymously completed self-report inventories d)Affectively oriented performance tests
c)Anonymously completed self-report inventories
Mr. Smith, a second-year mathematics teacher in a large urban high school, is seeking frequent reactions to his teacher-made tests from the other mathematics teachers in his school. He typically first secures his colleagues' agreement during informal faculty-lounge conversations, then relays copies of his tests - along with brief review forms - at several points in the school year. Although Mr. Smith simultaneously carries out systematic reviews of his own tests by employing what he regards as a first-rate test-appraisal rubric from his school district, when his own views regarding any of his test's items conflict with those of his colleagues, he always defers to the reactions of his much more experienced fellow teachers. Which option represents the most accurate statement regarding Mr. Smith's test-improvement efforts? a)Mr. Smith should first identify the math teachers who have at least 5 years' worth of teaching experience in his school or in another school - then solicit the item-review judgments only from such seasoned mathematics instructors rather than seeking reactions from less experienced teachers. b)Because he has chosen to solicit the per-item reactions of his fellow mathematics teachers, there is really no need for Mr. Smith to undertake his own, separate review of the items he originally authored. c)Even though Mr. Smith is wise in seeking the item-quality reactions of his school's other math teachers - especially because he is only in his second year of teaching - the ultimate decision about the quality of any of his test items should not be deferentially based on collegial input but, rather, based on Mr. Smith's own judgment. d)To make sure that he and his colleagues are covering the full range of factors on which to evaluate a test's items, Mr. Smith should make sure that the evaluative criteria he considers in the district-developed test-appraisal rubric he uses are different than the evaluative criteria he asks his fellow math teachers to employ when they review his tests.
c)Even though Mr. Smith is wise in seeking the item-quality reactions of his school's other math teachers - especially because he is only in his second year of teaching - the ultimate decision about the quality of any of his test items should not be deferentially based on collegial input but, rather, based on Mr. Smith's own judgment.
Which characteristic would not be considered a quality of an item that would cause the item to be instructionally insensitive? a)Item flaws b)Socioeconomic status links c)Format of the item (binary, multiple choice, etc.) d)Academic aptitude links
c)Format of the item (binary, multiple choice, etc.)
Given your reading of Chapter 8, which would you anticipate being a limitation of performance-based assessments? a)Alignment to state standards b)Ability to measure higher-order learning c)Lengthy administration time d)The ability to assess all learners accurately
c)Lengthy administration time
Mr. Miller administered a test to his students following his unit on converting fractions to decimals. He concluded that his class did not respond well to his instruction due to low test scores. While this may seem like common practice on the surface, what piece of information is Mr. Miller missing to make an adequate determination of his students' response to his instruction? a)Mr. Miller needs to also consider the students' classwork. b)Mr. Miller needs to test his students again with a different test version. c)Mr. Miller is missing pretest data. d)Mr. Miller has all of the information he needs.
c)Mr. Miller is missing pretest data.
Which type of score would indicate a test-taker's standing in relation to that of a norm group? a)Mean score b)Grade equivalent score c)Percentile score d)None of these would provide this information
c)Percentile score
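A minimal sketch of how a percentile score locates one test-taker within a norm group. It assumes the "scored as well as or better than" definition; other definitions (for example, counting only scores strictly below) give slightly different values. The function name and the norm-group scores are hypothetical illustrations, not data from the course text.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group that this score equals or exceeds.

    Assumption: ties count in the test-taker's favor.
    """
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)


# Hypothetical norm group of ten raw scores.
norm_group = [40, 50, 55, 60, 65, 70, 75, 80, 85, 90]

rank = percentile_rank(80, norm_group)
print(rank)
```

Because the result depends entirely on who is in `norm_group`, the same raw score can earn a very different percentile against a different norm group.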
Which of the following is not a component of the seven-step sequence for portfolio assessment found in Chapter 9? a)Select criteria by which to evaluate portfolio work samples. b)Involve parents in the portfolio assessment process. c)Portfolio assessment is complicated and should only be done by the teacher. d)Schedule and conduct portfolio conferences.
c)Portfolio assessment is complicated and should only be done by the teacher.
Which of the following, from a classroom teacher's perspective, is probably the most serious drawback of portfolio assessment? a)Parents' negative reactions to portfolio assessment b)Students' negative reactions to portfolio assessment c)Portfolio assessment's time-demands on teachers d)The excessive attention given to portfolio assessment by educational policymakers
c)Portfolio assessment's time-demands on teachers
Which represents an appropriate linear representation of the classic pretest versus posttest model? a)Instruction --> Pretest --> Review --> Posttest b)Instruction --> Pretest --> Posttest c)Pretest --> Instruction --> Posttest d)None of these accurately represents the classic pretest versus posttest model
c)Pretest --> Instruction --> Posttest
Which statistic would yield information regarding the variability of a group's test scores? a)Mode b)Mean c)Range d)Median
c)Range
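A short illustration of why the range, unlike the mean, median, or mode, describes variability. The two score lists are made-up classes chosen so that the central-tendency statistics match while the spread differs; `score_range` is a hypothetical helper name.

```python
from statistics import mean, median, mode


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


# Two hypothetical classes with identical mean, median, and mode.
class_a = [70, 75, 75, 80]
class_b = [50, 75, 75, 100]

for name, scores in (("A", class_a), ("B", class_b)):
    print(name, mean(scores), median(scores), mode(scores), score_range(scores))
```

The mean, median, and mode are 75 for both classes, so only the range (10 versus 50) reveals that class B's scores are far more spread out.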
Given your reading of Chapter 9, which would you anticipate being a common misperception of portfolios? a)They can be used for communication with parents. b)They require a clear and specific purpose. c)They consist of a haphazard collection of student work. d)They include student self-evaluations of their work.
c)They consist of a haphazard collection of student work.
Which of the following represents a key impediment to teachers' portfolio assessment? a)General public disapproval of the use of this version of classroom assessment b)Students' unfamiliarity with this assessment strategy c)Time demands linked to a teacher's implementation of portfolio assessment d)School-site administrators' unfamiliarity with the portfolio-assessment process
c)Time demands linked to a teacher's implementation of portfolio assessment
Because she is eager for her students to perform well on their 12th-grade senior mathematics tests (administered by the state department of education), Mrs. Williamson gives students answer keys for all of the test's selected-response items. When her students take the test in the school auditorium, along with all of the school's other 12th-graders, she urges them to use the answer keys discreetly, and only if necessary. Mrs. Williamson's activities constitute ________. a)a violation of the educational defensibility guideline b)a violation of the professional ethics guideline c)a violation of both guidelines d)a violation of neither guideline
c)a violation of both guidelines
Mrs. Gordon makes sure she has her sixth-grade students practice each fall for the nationally standardized achievement tests required in the spring by the district's school board. Fortunately, she has been able to make photocopies of most of the test's pages during the past several years, so she can organize a highly relevant 2-week preparation unit wherein students are given actual test items to solve. At the close of the unit, students are supplied with a practice test on which about 60 percent of the test consists of actual items copied from the nationally standardized commercial test. Mrs. Gordon provides students with an answer key after they have taken the practice test so that they can check their answers. Mrs. Gordon's activities constitute _______. a)a violation of the professional ethics guideline b)a violation of the educational defensibility guideline c)a violation of both guidelines d)a violation of neither guideline
c)a violation of both guidelines
Mrs. Jones, a third-grade teacher, was asked by officials of her state department of education two years ago to serve as a member of a Bias Review Committee whose task was to consider whether a set of not-yet-final items being prepared for the state's annual accountability tests contained any assessment bias that would preclude their use. Even though Mrs. Jones realized that her committee's item-by-item reviews would not be the only factor determining whether such underdeveloped items would actually be used on the state-administered accountability tests, she was convinced that many of the items she had reviewed would end up on those tests. Accordingly, based on the informal notes she had taken during a two-day meeting of the Bias Review Committee, she always makes certain to give her own third-grade students plenty of guided and independent practice in responding to items similar to those she had reviewed. Mrs. Jones generates these practice items herself, always trying to make her practice items resemble the specific details of the items she reviewed. Because a new teacher-evaluation system in her district calls for the inclusion of state test scores of each teacher's students, Mrs. Jones was pleased to see that her own third-graders scored well on this year's state tests. Mrs. Jones's activities constitute ________. a)a violation of the educational defensibility guideline b)a violation of the professional ethics guideline c)a violation of both guidelines d)a violation of neither guideline
c)a violation of both guidelines
The district where Todd Blanding teaches high-school chemistry stipulates that up to 100 percent of a teacher's student-growth evidence, used for teacher evaluations, can be based on before-instruction and after-instruction classroom assessments. Todd and the other teachers in his high school realize how important it is for their students to score well on classroom tests, particularly any tests being used to collect evidence of pre-instruction to post-instruction growth. Accordingly, each month the high school's staff participates in content-alike learning communities so they can explore together suitable test-preparation alternatives. Based on these monthly explorations, Todd has developed a pretest-to-posttest instructional approach whereby he never provides item-specific instruction for more than half of the items he intends to use for any upcoming posttest. (Item-specific instruction explicitly explores the nuances of a particular item.) Because at least half of the items on an instructional unit's posttest will not have been discussed in class prior to the posttest, Todd is confident that he can base valid interpretations about students' growth from their pretest-to-posttest performances. Todd's activities constitute __________. a)a violation of the professional ethics guideline b)a violation of the educational defensibility guideline c)a violation of both guidelines d)a violation of neither guideline
c)a violation of both guidelines
Mrs. Hilliard knows the reading test administered to all state eighth graders contains a set of five fairly lengthy reading selections, each of which is followed by about eight multiple-choice items dealing with such topics as (1) the main idea of the selection or the main idea of its constituent paragraphs, (2) the meaning of technical terms that can be inferred from contextual clues, and (3) the defensibility of post-reading inferential statements linked to the selection. Mrs. Hilliard routinely spends time in her eighth-grade language arts class trying to improve her students' reading comprehension capabilities. She has the students read passages similar to those used in the statewide test, then gives her students a variety of practice tests, including written multiple-choice, true-false, and oral short-answer tests in which, for example, individual students must state aloud what they believe to be the main idea of a specific paragraph in the passage. Mrs. Hilliard's activities constitute _________. a)a violation of the educational defensibility guideline b)a violation of the professional ethics guideline c)a violation of neither guideline d)a violation of both guidelines
c)a violation of neither guideline
An advantage of performance-based assessments over achievement tests is that they can be used to evaluate _________. a)the reading skills of students b)the level of a student's knowledge c)both the process and product of a task d)the attitudes of students
c)both the process and product of a task
A ______________________ is a test that is designed to yield either norm-referenced or criterion-referenced inferences and that is administered, scored, and interpreted in a predetermined manner. a)summative assessment b)formative assessment c)standardized test d)diagnostic test
c)standardized test
Which of the following is not an element typically embodied in performance tests? a)Prespecified evaluative criteria for judging students' responses b)Multiple evaluative criteria c)Judgmental appraisal of students' responses d) A direct link to a preexisting content standard
d) A direct link to a preexisting content standard
Which of the following statements about goal-attainment grading is most defensible? a)"Because of the centrality of curricular aims in any goal-attainment conception of grading, the curricular targets being sought should be carefully described to students' parents and to students themselves at grading time." b)"Given its focus on students' mastery of curricular-targets, goal-attainment grading essentially precludes the possibility of teachers' measuring students' affective dispositions." c)"Because students' effort plays such a pivotal role in a student's ultimate learning, all goal-attainment grading must include a provision for incorporating students' levels of effort." d)"If a teacher can collect defensible assessment evidence of a student's mastery of the teacher's designated curricular aims, then this evidence should be the only basis for goal-attainment grading."
d)"If a teacher can collect defensible assessment evidence of a student's mastery of the teacher's designated curricular aims, then this evidence should be the only basis for goal-attainment grading."
Which of the following is not a key step that a classroom teacher needs to take in implementing a portfolio assessment program? a)Schedule and conduct a meaningful number of portfolio conferences. b)Decide on the kinds of work samples to collect. c)Require students to evaluate continually their own portfolio products. d)Decide which students should be involved in the portfolio assessment program.
d)Decide which students should be involved in the portfolio assessment program.
Consider the following statements regarding test-preparation. Which one is not accurate? a)Appropriate test-preparation will simultaneously improve students' test scores as well as students' mastery of the knowledge and/or skills represented by the test. b)If teachers adhere to the ethical norms of the education profession while preparing their students, this is an important ingredient in appropriate test-preparation activities. c)If relatively brief, generalized test-taking preparation focused on such skills as how to manage one's time during test-taking is quite appropriate. d)If teachers simultaneously direct their instruction toward a test's specific items and the curricular aim on which the test is based, this constitutes appropriate test preparation.
d)If teachers simultaneously direct their instruction toward a test's specific items and the curricular aim on which the test is based, this constitutes appropriate test preparation.
Which of the following is not one of the attributes of concern for an instructionally diagnostic test? a)Item quality b)Ease of usage c)Curricular alignment d)Length of assessment
d)Length of assessment
A high-school biology teacher, Nicholas, relies heavily on his students' test performances when he assigns grades to those students. Typically, he sends his selected-response classroom tests to the district's assessment director who, usually in 24 hours, returns a set of item analyses to the teachers. These analyses usually contain an overall mean and a standard deviation for each class's test performances, as well as p-values and item-discrimination indicators for every item in each test. Nicholas teaches four different biology classes, so four separate analyses are carried out at the district office. Nicholas is pleased that very few of his tests' items display exceptionally high or exceptionally low p-values. Moreover, the vast majority of the items appear to have discrimination indices of a positive .25 or above. Three items have negative discrimination indices. After looking at the phrasing of those items, Nicholas sees how he should revise them to eliminate potentially confusing ambiguities. Please consider the four options below, then select the alternative that most accurately describes how Nicholas is dealing with the analyses of his biology tests. a)To obtain a more reliable estimate of item quality, Nicholas should have combined the four sets of per-class test results, then sent them to the district office for a single set of item analyses. b)The three negatively discriminating items were actually the most effective items for Nicholas's purposes, so the other items should be modified, not the three negative discriminators. c)Nicholas's satisfaction with the many items possessing .25 or higher item-discrimination indices was unwarranted because most items deemed suitable for yielding norm-referenced interpretations must possess indices of .50 or better. d)Nicholas was appropriately pleased with the results of the district-conducted item analyses, and he made a sensible decision to revise the three negatively discriminating items.
d)Nicholas was appropriately pleased with the results of the district-conducted item analyses, and he made a sensible decision to revise the three negatively discriminating items.
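The item statistics in the Nicholas scenario can be sketched as follows. This assumes one common classroom convention: an item's p-value is the proportion of students answering it correctly, and its discrimination index is the p-value among high scorers minus the p-value among low scorers. Splitting the class into top and bottom halves by total score is an assumption made here for simplicity; the district office in the scenario may use a different split or statistic. All names and data are hypothetical.

```python
def p_value(item_responses):
    """Proportion answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores):
    """p-value of the high-scoring half minus p-value of the low-scoring half."""
    # Pair each student's total score with their response, best scorers first.
    ranked = sorted(zip(total_scores, item_responses),
                    key=lambda pair: pair[0], reverse=True)
    half = len(ranked) // 2
    high = [response for _, response in ranked[:half]]
    low = [response for _, response in ranked[-half:]]
    return p_value(high) - p_value(low)


# Hypothetical total test scores for eight students.
total_scores = [95, 90, 85, 80, 60, 55, 50, 45]
# Item answered correctly mostly by high scorers (positive discriminator).
item_good = [1, 1, 1, 1, 0, 1, 0, 0]
# Item answered correctly mostly by low scorers (negative discriminator).
item_flawed = [0, 0, 1, 0, 1, 1, 1, 0]

print(p_value(item_good), discrimination_index(item_good, total_scores))
print(p_value(item_flawed), discrimination_index(item_flawed, total_scores))
```

A negative index, as for `item_flawed`, means low scorers outperformed high scorers on that item, which is the signal that led Nicholas to look for confusing ambiguities in its wording.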
Which represents a disadvantage of percentile scores? a)Percentile scores do not provide accurate information. b)Percentile scores depend on the quality of the norm group. c)Percentile scores are difficult to interpret. d)None of these are disadvantages of percentile scores.
d)None of these are disadvantages of percentile scores.
A parent receives his child's grade equivalent score on the third-grade math assessment. The grade equivalent score is 5.3. The parent calls the teacher to discuss whether or not his daughter should skip the fourth grade. Which of the following pieces of advice is most appropriate? a)The child should not skip the fourth grade. Grade equivalent scores are not trustworthy scores. b)The child should skip the fourth grade since she has shown mastery at the fifth-grade level. c)The child should not skip the fourth grade. The grade equivalent score of 5.3 indicates that the student scored at a level that would be expected of an average fifth grader on the third-grade assessment. d)None of these represent appropriate advice.
d)None of these represent appropriate advice.
Performance assessments are often scored via rubric. Given the knowledge you've gained from this chapter, which type of rubric do you feel is most often best-suited for performance assessments? a)Task-specific rubrics b)Rubrics are not an appropriate way to score performance assessments. c)Hypergeneral rubrics d)Skill-focused rubrics
d)Skill-focused rubrics
Which of the following would be the most suitable assessment target for a multifocus affective inventory? a)Students' abilities to perform routine physical skills such as ball-tossing and ball-catching b)Students' ability to write a persuasive essay c)Students' skill in solving complex geometric problems d)Students' interests in different subjects
d)Students' interests in different subjects
Which of the following is not a component of the seven-step sequence for portfolio assessment found in Chapter 9? a)Require students to evaluate continually their own portfolio products. b)Make sure students "own" their portfolios. c)Decide on what kinds of work samples to collect. d)Teachers should autonomously decide on evaluative criteria as the content experts.
d)Teachers should autonomously decide on evaluative criteria as the content experts.
Given your reading in this chapter, which would you suggest are more effectively measured by performance-based assessments? a)Understanding of concepts b)The ability to recognize faulty procedures c)Distinguishing between accurate and inaccurate information d)The ability to formulate problems
d)The ability to formulate problems
Which would be the best justification for the relatively large amount of time required to respond to many performance-based assessment tasks? a)Students and parents like them b)Performance on one task generalizes well to performance on other tasks c)Multiple scores can be derived from a single task d)The tasks can provide students with valuable learning opportunities
d)The tasks can provide students with valuable learning opportunities
Which of the following is not an important rule to be followed in the classroom assessment of student affect? a)Make all inferences about students' affective status so the inferences are group-focused rather than individual-focused. b)If any self-report inventories are used, students' responses must be truly anonymous. c)Any affective variable being measured must be genuinely noncontroversial. d)To monitor students' ever-changing attitudes and interests, assess affect on at least a bi-weekly basis.
d)To monitor students' ever-changing attitudes and interests, assess affect on at least a bi-weekly basis.
Which of the following terms is appropriate for an evaluation model that employs a student's prior achievement and background characteristics as statistical controls to help isolate the effects on student achievement of specific teachers, schools, or districts? a)Teacher contribution model b)Student achievement model c)Socio-economic model d)Value-added model
d)Value-added model
Mr. Thompkin teaches mathematics in an urban middle school serving many students from lower-income families. Although Mr. Thompkin personally finds his district's heavy emphasis on educational testing to be excessive, he concedes that his students will benefit from scoring well on the many math tests he is obliged to administer during a school year. Because most of his students cannot afford to enroll in the commercial test-preparation programs that are available throughout his city, Mr. Thompkin entices a psychologist friend of his - a friend who is particularly knowledgeable about test-taking skills - to visit all of his courses one day during the first month of school. The psychologist explains to students not only how to take tests successfully but also how to prepare in advance for any high-stakes testing situations. Mr. Thompkin believes one class period per year that's focused on test-taking rather than learning mathematics is a decent trade-off for his students. Mr. Thompkin's activities constitute ________. a)a violation of the educational defensibility guideline b)a violation of the professional ethics guideline c)a violation of both guidelines d)a violation of neither guideline
d)a violation of neither guideline
Ms. Sanchez realizes that many of her fourth-graders are relatively recent arrivals in the United States, having come from Mexico and Central America. Most of her students speak English as a second language and possess limited experience in taking the kinds of standardized tests used so frequently these days in U.S. schools. Accordingly, Ms. Sanchez has located a number of English-language standardized tests for her fourth-grade students, and she has photocopied segments of the tests so the introductory pages will be available to all of her students. Once every few weeks, Ms. Sanchez asks her fourth-graders to spend classroom instructional time trying, as she says, to "make sense" out of these tests. About 20 minutes is devoted to students' reading the tests' directions and then determining if they can understand specifically how they are to complete each of the standardized tests. She makes no copies of any items other than those used in a test's directions. Ms. Sanchez's activities constitute _________. a)a violation of the professional ethics guideline b)a violation of the educational defensibility guideline c)a violation of both guidelines d)a violation of neither guideline
d)a violation of neither guideline
An example of analytic scoring would be to evaluate _______ a)overall performance based on a process checklist b)overall performance based on a product review c)the performance of each step listed on a product review d)the performance of each step listed on a process checklist
d)the performance of each step listed on a process checklist
Jurgen James, an experienced mathematics teacher, loves fussing with numbers based on his classroom assessments. He studies the performance of his previous year's students on key tests to help him arrive at criterion-referenced interpretations regarding which segments of his instructional program seem to be working well or badly. Based on the performances of both of his algebra classes, he learns that the differences in p-values for items taken by uninstructed students (based on a first-of-year pretest) and p-values for items taken by instructed students (based on final exams) are staggering. That is, when pretest p-values are subtracted from final exam p-values, the resulting differences are mostly .40 or higher. Mr. James concludes that his algebra items were doing precisely the job they were created to do. Which of the quotes from Mr. James's fellow teachers most accurately characterizes his conclusion regarding his algebra items? a)"Jurgen came to the correct conclusion, and I'm not surprised by his students' p-value jumps - he is a spectacular math teacher!" b)"Jurgen is, in this instance at least, simply turned around - his high pretest-to-posttest differences are precisely what an effective teacher should not want to see." c)"Even though Jurgen may have accurately concluded that his algebra items were of high quality, he should have arrived at this judgment by focusing only on his students' performances after instruction, not on both pre-instruction and post-instruction performances." d)"The substantial differences in p-values for Jurgen's students, while commendable, are more suitable for tests from which teachers should be arriving at norm-referenced, not criterion-referenced interpretations."
a)"Jurgen came to the correct conclusion, and I'm not surprised by his students' p-value jumps - he is a spectacular math teacher!"
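The arithmetic behind the scenario above can be sketched briefly. An item's p-value is the proportion of students who answer it correctly, and the comparison subtracts each pretest p-value from the corresponding final exam p-value. The item names and response data below are hypothetical, chosen only to illustrate the calculation.

```python
def p_value(responses):
    """Proportion of students answering an item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

def pre_post_differences(pretest, posttest):
    """Posttest p-value minus pretest p-value for each item, rounded to two places."""
    return {
        item: round(p_value(posttest[item]) - p_value(pretest[item]), 2)
        for item in pretest
    }

# Hypothetical responses from ten students per item.
pretest = {
    "item_1": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],  # p = .20
    "item_2": [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],  # p = .30
}
posttest = {
    "item_1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],  # p = .80
    "item_2": [1, 1, 1, 1, 1, 0, 1, 1, 1, 0],  # p = .80
}

diffs = pre_post_differences(pretest, posttest)
# diffs == {"item_1": 0.6, "item_2": 0.5}
```

Both hypothetical items show differences above .40, the kind of pretest-to-posttest jump described in the scenario.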