Understanding By Design Book
Project
A complex set of intellectual challenges, typically occurring over lengthy periods of time. Projects typically involve extensive student inquiry, culminating in student products and performances. A unit might be composed of a single project but include other forms of assessment evidence (quizzes, tests, observations) along the way.
Apply
Effectively use and adapt knowledge in diverse contexts.
Empathize
Get inside, find value in what others might find odd, alien, or implausible; perceive sensitively, based on prior direct experience.
Perform
To act upon and bring to completion.
Rubric
A criterion-based scoring guide that enables judges to make reliable judgments about student work and enables students to self-assess. A rubric assesses one or more traits of performance. The rubric answers the question, What does understanding or proficiency for an identified result look like? See also analytic-trait scoring.
Academic Prompt
A form of assessment that falls between an authentic performance task and a short-answer test or quiz. Academic prompts are open-ended written performance tests. As the word academic suggests, they are tests that occur only in school or exam situations. The tester prompts a response to a particular quote, idea, or request for performance. Such prompts are not authentic (even though they prompt performance) because typical school constraints are placed on the task, the access to resources, and the time allotted.
Template
A guide or framework for designers. In everyday usage, the term refers to a form, constructed of paper, wood, or sheet metal, whose edge provides a guide for cutting a particular shape. In Understanding by Design, the unit planning template provides a conceptual guide to applying the various elements of backward design in the development or refinement of a unit of study. Each page of the template contains key questions, prompting the user to consider particular elements of backward design, and a graphic organizer containing frames for recording design ideas. See also intelligent tool.
Concept
A mental construct or category represented by a word or phrase. Concepts include both tangible objects (e.g., chair, rabbit) and abstract ideas (e.g., democracy, bravery). Overarching understandings are derived from concepts.
Have perspective
See points of view, with critical eyes and ears; see the big picture.
Holistic Scoring
A representation of an overall impression of the quality of a performance or product. Holistic scoring is distinguished from analytic-trait scoring, in which separate rubrics are used for each separate criterion that makes up an aspect of performance. However, multiple holistic scores are possible for a multifaceted performance task involving several standards. For example, separate holistic scores might apply to an oral presentation and a written report that are part of the same task, without breaking down those scores into the analytic components of each mode (e.g., the organization and clarity of the oral performance).
Portfolio
A representative collection of one's work. As the word's roots suggest (and as is still the case in the arts), the sample of work is fashioned for a particular objective and carried from place to place for inspection or exhibition. In academic subject areas, a portfolio often serves two distinct purposes: providing documentation of the student's work, and serving as the basis for evaluation of work in progress or work over time. The documentation typically serves three functions: revealing the student's control over all the major areas, techniques, genres, and topics of the course or program; allowing students to reflect on and show off their best work (by letting them select which works will be put in the portfolio); and providing evidence of how works evolved and were refined.
Essential Question
A question that lies at the heart of a subject or a curriculum (as opposed to being either trivial or leading), and promotes inquiry and uncoverage of a subject. Essential questions thus do not yield a single straightforward answer (as a leading question does) but produce different plausible responses, about which thoughtful and knowledgeable people may disagree. An essential question can be either overarching or topical (unit-specific) in scope. (Note that this represents a change in language use from earlier UbD material. In the first edition of Understanding by Design, essential questions were overarching only.)
Leading Question
A question used to teach, clarify, or assess for knowledge. Unlike essential questions, leading questions have correct and straightforward answers. To call a question "leading" is not to damn it; leading questions have a useful role in teaching and checking for understanding. Their purpose, however, is quite different from that of essential questions.
Entry Question
A simple, thought-provoking question that opens a lesson or unit. It often introduces a key idea or understanding in an accessible way. Effective entry questions spark discussion about a common experience, provocative issue, or perplexing problem, as a lead-in to the unit and essential questions. Entry questions should be framed for maximal simplicity, be worded in student-friendly language, have provocation value, and point toward the larger unit and essential questions. The design challenge is to enable essential and unit questions to arise naturally from the entry questions, problems, and activities.
Desired Results
A specific educational goal or achievement target. In Understanding by Design, Stage 1 sums up all desired results. Common synonyms include target, goal, objective, and intended outcome. Desired results in education are generally of five kinds: (1) factual or rule-based declarative knowledge (e.g., a noun is the name of a person, place, or thing); (2) skills and processes (e.g., rendering a perspective drawing, researching a topic); (3) understandings, insights derived from inferences into ideas, people, situations, and processes (e.g., visible light represents a very small band within the electromagnetic spectrum); (4) habits of mind (e.g., persistence, tolerance for ambiguity); and (5) attitudes (e.g., appreciation of reading as a valuable leisure-time pursuit). Though they involve complex learnings, the desired results must be cast in measurable terms. Any valid assessment, in other words, is designed to measure the degree to which the learner's work hit the target. See also achievement target.
Proposition
A statement that describes a relationship between or among concepts. Understanding by Design suggests that targeted understandings be framed as specific propositions to be understood, not just phrases that refer to the topic or content standard. Propositions include principles, generalizations, axioms, and laws.
Achievement target
A synonym for desired result, learning outcome, and similar terms related to the educational end sought.
Uncoverage
A teaching approach that is required for all matters of understanding. To "uncover" a subject is to do the opposite of "covering" it, namely to go into depth. Three types of content typically demand such uncoverage. The content may be principles, laws, theories, or concepts that are likely to have meaning for a student only if they are seen as sensible and plausible; that is, the student can verify, induce, or justify the content through inquiry and construction. The content may be counterintuitive, nuanced, subtle, or otherwise easily misunderstood ideas, such as gravity, evolution, imaginary numbers, irony, texts, formulas, theories, or concepts. The content may be the conceptual or strategic element of any skill (e.g., persuasion in writing or "creating space" in soccer). Such uncoverage involves clarifying effective and efficient means, given the ends of skill, leading to greater purposefulness and less mindless use of techniques. Contrast coverage.
Coverage
A teaching approach that superficially teaches and tests content knowledge irrespective of student understanding or engagement. The term generally has a negative connotation: It implies that the goal is to march through a body of material (often a textbook) within a specified time frame. (Ironically, one meaning of the term to cover is "to obscure.") Teachers often couple the term with an excuse linked to demands of curriculum frameworks ("I would have liked to go into greater depth, but we have to cover the content") or external testing ("but the students will be tested on . . . and the results are published in the paper"). Contrast uncoverage.
Ill-structured
A term used to describe a question, problem, or task that lacks a recipe or obvious formula to answer or solve it. Ill-structured tasks or problems do not suggest or imply a specific strategy or approach guaranteed to yield success. Often the problem is fuzzy and needs to be further defined or clarified before a solution is offered. Such questions or problems thus demand more than knowledge; they demand good judgment and imagination. All good essay questions, science problems, or design challenges are ill structured: Even when the goal is understood or the expectations clear, a procedure must be developed along the way. Invariably, ill-structured tasks require constant self-assessment and revision, not just the simple application of knowledge. Most real problems in life are ill structured; most test items are not. Test questions are well structured in that they have a single, unambiguous right answer, or an obvious solution procedure. Such items are fine for validly assessing elements of knowledge but not appropriate for judging the student's ability to use knowledge wisely—namely, how to judge which knowledge and skill to use when. (A basketball analogy clarifies the distinction. The "test" of each drill in basketball differs from the "test" of playing the game well in performance: The drill is predictable and structured; the game is unpredictable and not scriptable.)
Scoring Scale
An equally divided continuum (number line) used in evaluating performance. The scale identifies how many different scores will be used. Performance assessments typically use a much smaller scale for scoring than standardized tests. Rather than a scale of 100 or more, most performance-based assessment uses a 4- or 6-point scale. Two interrelated reasons explain this use of a small number of score points. First, each place on the scale is not arbitrary (as it is in norm-referenced scoring); it is meant to correspond to a specific criterion or quality of work. The second reason is practical: using a scale with many discrete numbers reduces scoring reliability.
Standardized
A term used to describe a test or assessment in which the administrative conditions and protocol are uniform for all students. In other words, if all students face similar logistical, time, material, and feedback guidelines and constraints, then the test is standardized. Standardized tests prompt three common misconceptions:
• "Multiple-choice test" and "standardized test" are synonymous. A performance task, administered uniformly as a test, is also a standardized test, as seen, for example, in the road test for a driver's license or a qualifying meet for the Olympics.
• Standardized tests are always objectively (that is, machine) scored. The advanced placement exam essays and all state writing tests are scored by judges yet are standard in their administration.
• Only national norm-referenced or criterion-referenced tests (such as the SAT) can be standardized. A departmental exam in a high school is also a standardized test.
An important implication, then, is that all formal tests are standardized. This is not true of an assessment, however. In an assessment, the administrator is free to vary the questions, the tasks, the order of the tasks, and the time allotted in order to be satisfied that the results are fair, valid, and reliable. This was the argument made by Piaget for his "clinical method" as opposed to the "test method" of Binet. See also assessment.
Secure
A term used to describe a test with questions that are not accessible to teachers or students for purposes of preparation. Most multiple-choice tests must be secure or their validity is compromised, because they rely on a small number of uncomplicated questions. Many valid performance assessments are not secure, however. Examples include a baseball game or the road test for getting a driver's license. The student to be assessed often knows the musical piece, debate topic, oral exam questions, or term paper subject in advance, and the teacher appropriately "teaches to the test" of performance.
Open-Ended Question
A term used to describe tasks or questions that do not lead to a single right answer. This does not imply that all answers are of equal value, however. Rather, it implies that many different acceptable answers are possible. Such answers are thus "justified" or "plausible" or "well-defended" as opposed to "correct." Essay test questions, for example, are open-ended, whereas multiple-choice tests are not (by design).
Indirect Test
A test that measures performance out of its normal context. Thus, any multiple-choice test of any complex performance (reading, writing, problem solving) is, by definition, indirect. The ACT and SAT are indirect ways of assessing likely success in college, because their results correlate with freshman grade-point averages. An indirect test is less authentic than a direct test, by definition. However, an indirect test of performance can still be valid: if results on the indirect test correlate with results on direct tests, then the indirect test is valid.
Direct Test
A test that measures the achievement of a targeted performance in the context in which the performance is expected to occur (e.g., the parallel-parking portion of a driving test). In comparison, an indirect test uses often deliberately simplified ways of measuring the same performance out of context (e.g., the written portion of a driver's test). A direct test is more authentic than an indirect test, by definition. Contrast audit test.
Intelligent Tool
A tool that puts abstract ideas and processes in a tangible form. An intelligent tool enhances performance on cognitive tasks, such as the design of learning units. For example, an effective graphic organizer like a story map helps students internalize the elements of a story in ways that enhance their reading and writing of stories. Likewise, routinely using intelligent tools like the unit planning template and the Understanding by Design tools should help users develop a mental template of the key ideas of UbD. See also template.
Analytic-trait scoring
A type of scoring that uses several distinct criteria to evaluate student products and performances. In effect, a performance is assessed several times, using the lens of separate criteria each time. For example, in the analytic scoring of essays, we might evaluate five traits—organization, use of detail, attention to audience, persuasiveness, and conventions. Analytic-trait scoring contrasts with holistic scoring, whereby a judge forms a single, overall impression about a performance.
Genre of Performance
A type or category of intellectual performance or product. For example, people commonly speak of genres of writing (narrative, essay, letter) or speaking (seminar discussion, formal speech, giving directions). A genre is thus a subset of the three main modes of intellectual performance: oral, written, displayed.
Facet of Understanding
A way in which a person's understanding manifests itself. Understanding by Design identifies six kinds of understanding: application, empathy, explanation, interpretation, perspective, and self-knowledge. True understanding thus is revealed by a person's ability to explain, interpret, apply, have perspective, empathize, and have self-knowledge. Speaking of facets of understanding implies that understanding (or lack of it) reveals itself in different mutually reinforcing ways. In other words, the more a student can explain, apply, and offer multiple points of view on the same idea, the more likely it is that the student understands that idea. A facet is thus more like a criterion in performance assessment than a learning style. It refers to how teachers judge whether understanding is present, not to a learner's abilities or preferences. In the same way that an essay, to be effective, has to be persuasive and logical (whether or not a person has those traits or values them), so, too, do the facets suggest what teachers need to see if they are to conclude a student has understanding. This is not meant to imply that all six facets are always involved in any particular matter of understanding. For example, self-knowledge and empathy would not often be at stake in looking for evidence of student understanding of many mathematical concepts. The facets do not present a quota but a framework or set of criteria for designing lessons and assessments that better develop and measure understanding.
Understanding
An insight into ideas, people, situations, and processes manifested in various appropriate performances. To understand is to make sense of what one knows, to be able to know why it's so, and to have the ability to use it in various situations and contexts.
Sampling
All unit and test design involves the act of sampling from a vast domain of possible knowledge, skills, and tasks. Like the Gallup polls, sampling enables the assessor to draw valid inferences from a limited inquiry if the sample of work or answers is appropriate and justified. Unit and test design uses two different kinds of sampling: sampling from the wider domain of all possible curricular questions, topics, and tasks; and sampling that involves assessing only a subset of the entire student population instead of testing everyone. These two kinds of sampling get combined in large-scale testing systems to form matrix sampling, whereby one can test many or all students using different tests to cover as much of the domain of knowledge as possible. Teachers attempting to sample the domain of subject matter in a unit through a specific task must ask, What feasible and efficient sample of tasks or questions will enable us to make valid inferences about the student's overall performance (because we cannot possibly test the student on everything that was taught and learned)? When teachers try to use a subset of the population to construct a more efficient and cost-effective approach to testing, they are asking the question the pollsters ask: What must be the composition of any small sample of students so that we can validly infer conclusions about the systemwide performance of all students using the results from our sample?
Performance Task
Also called "performance." A task in which one uses knowledge to act effectively or to bring to fruition a complex product that reveals one's knowledge and expertise. Music recitals, oral presentations, art displays, and auto mechanic competitions are performances in both senses. Many educators mistakenly use the phrase "performance assessment" when they really mean "performance test" (see assess, assessment). A performance assessment involves more than a single test of performance and might use other modes of assessment as well (such as surveys, interviews of the performer, observations, and quizzes). Tests of performance, whether authentic or not, differ from multiple-choice or short-answer tests. In a test of performance, the student must put it all together in the context of ill-structured, nonroutine, or unpredictable problems or challenges. By contrast, most conventional short-answer or multiple-choice tests are more like the drills in sports than the test of performance. Real performers (athletes, debaters, dancers, scientists, or actors) must learn to innovate and use their judgment as well as their knowledge. By contrast, multiple-choice test items merely ask the student to recall, recognize, or "plug in" isolated, discrete bits of knowledge or skill, one at a time. Because many types of performance are ephemeral actions, a fair and technically sound assessment typically involves the creation of products. This ensures adequate documentation and the possibility of appropriate review and oversight in scoring the performance. See also perform.
WHERETO
An acronym for Where is it going?; Hook the students; Explore and equip; Rethink and revise; Exhibit and evaluate; Tailor to student needs, interests, and styles; Organize for maximum engagement and effectiveness. Considered in greater detail, WHERETO consists of the following components:
• Where is the work headed? Why is it headed there? What are the student's final performance obligations, the anchoring performance assessments? What are the criteria by which student work will be judged for understanding? (These are questions asked by students. Help the student see the answers to these questions upfront.)
• Hook the student through engaging and provocative entry points: thought-provoking and focusing experiences, issues, oddities, problems, and challenges that point toward essential questions, core ideas, and final performance tasks.
• Explore and equip. Engage students in learning experiences that allow them to explore the big ideas and essential questions; that cause them to pursue leads or hunches, research and test ideas, try things out. Equip students for the final performances through guided instruction and coaching on needed skill and knowledge. Have them experience the ideas to make them real.
• Rethink and revise. Dig deeper into ideas at issue (through the facets of understanding). Revise, rehearse, and refine, as needed. Guide students in self-assessment and self-adjustment, based on feedback from inquiry, results, and discussion.
• Evaluate understanding. Reveal what has been understood through final performances and products. Involve students in a final self-assessment to identify remaining questions, set future goals, and point toward new units and lessons.
• Tailor (personalize) the work to ensure maximum interest and achievement. Differentiate the approaches used and provide sufficient options and variety (without compromising goals) to make it most likely that all students will be engaged and effective.
• Organize and sequence the learning for maximal engagement and effectiveness, given the desired results.
Backward Design
An approach to designing a curriculum or unit that begins with the end in mind and designs toward that end. Although such an approach seems logical, it is viewed as backward because many teachers begin their unit design with the means—textbooks, favored lessons, and time-honored activities—rather than deriving those from the end—the targeted results, such as content standards or understandings. We advocate the reverse of habit: starting with the end (the desired results) and then identifying the evidence necessary to determine that the results have been achieved (assessments). With the results and assessments clearly specified, the designer determines the necessary (enabling) knowledge and skill, and only then, the teaching needed to equip students to perform. This view is not new. Ralph Tyler (1949) described the logic of backward design clearly and succinctly more than 50 years ago: Educational objectives become the criteria by which materials are selected, content is outlined, instructional procedures are developed and tests and examinations are prepared. . . . The purpose of a statement of objectives is to indicate the kinds of changes in the student to be brought about so that instructional activities can be planned and developed in a way likely to attain these objectives. (pp. 1, 45)
Authentic Assessment, Authentic Task
An assessment composed of performance tasks and activities designed to simulate or replicate important real-world challenges. The heart of authentic assessment is realistic performance-based testing—asking the student to use knowledge in real-world ways, with genuine purposes, audiences, and situational variables. Thus, the context of the assessment, not just the task itself and whether it is performance-based or hands-on, is what makes the work authentic (e.g., the "messiness" of the problem, the ability to seek feedback and revise, access to appropriate resources). Authentic assessments are meant to do more than "test": they should teach students (and teachers) what the "doing" of a subject looks like and what kinds of performance challenges are actually considered most important in a field or profession. The tasks are chosen because they are representative of essential questions or challenges facing practitioners in the field. An authentic test directly measures students on the valued performances. By contrast, multiple-choice tests are indirect measures of performance. (Compare, for example, the road test versus the written test for getting a driver's license.) In the field of measurement, authentic tests are called "direct" tests. Contrast academic prompt and quiz.
Quiz
Any selected-response or short-answer test (be it oral or written) whose sole purpose is to assess for discrete knowledge and skill. Contrast academic prompt and authentic assessment.
Longitudinal Assessment
Assessment of the same performances at numerous points in time, using a fixed scoring continuum, to track progress (or lack thereof) toward a standard; also called "developmental assessment." For example, the National Assessment of Educational Progress (NAEP) uses a fixed scale for measuring gains in mathematics performance across grades 4, 8, and 12. Similarly, the American Council on the Teaching of Foreign Languages (ACTFL) uses a novice-expert continuum for charting the progress of all language students over time. Most school testing, whether done locally or statewide, is not longitudinal because the tests are one-time events with one-time scoring systems. Understanding by Design proposes an assessment system that uses scoring scales and tasks that can be used across many grade levels to provide longitudinal assessment.
Big Idea
In Understanding by Design, the core concepts, principles, theories, and processes that should serve as the focal point of curricula, instruction, and assessment. By definition, big ideas are important and enduring. Big ideas are transferable beyond the scope of a particular unit (e.g., adaptation, allegory, the American Dream, significant figures). Big ideas are the building material of understandings. They can be thought of as the meaningful patterns that enable one to connect the dots of otherwise fragmented knowledge. Such ideas go beyond discrete facts or skills to focus on larger concepts, principles, or processes. These are applicable to new situations within or beyond the subject. For example, students study the enactment of the Magna Carta as a specific historical event because of its significance to a larger idea, the rule of law, whereby written laws specify the limits of a government's power and the rights of individuals, such as due process. This big idea transcends its roots in 13th-century England and is a cornerstone of modern democratic societies. A big idea can also be described as a "linchpin" idea. The linchpin is the pin that keeps the wheel in place on an axle. Thus, a linchpin idea is one that is essential for understanding, without which the student cannot go anywhere. For instance, without grasping the distinction between the letter and the spirit of the law, students cannot understand the American constitutional and legal system—even if they are highly knowledgeable and articulate about facts of history. Without a focus on linchpin ideas with lasting value, students may be left with easily forgotten fragments of knowledge.
Benchmark
In an assessment system, a developmentally appropriate standard; sometimes called a "milepost" standard. For example, many districtwide systems set benchmarks for grades 4, 8, 10, and 12. In many state content standards, benchmarks provide further concrete indicators for the standards— they serve as substandards. In athletics and industry, the term is often used to describe the highest level of performance—the exemplars. Used as a verb, benchmark means to search for a best performance or achievement specification for a particular objective. The resulting benchmark (noun) sets the highest possible standard of performance, a goal to aim toward. Thus, a benchmark in this sense is used when teachers want their assessment to be anchored by the best possible samples of work (versus being anchored by samples of work from an average school district). An assessment anchored by benchmarks, in either sense of the word, should not be expected to yield a predictable curve of results. Standards differ from reasonable expectations. (See also standard.) It is possible that very few products or performances—or even none at all—will match the benchmark performance.
Outcome
In education, shorthand for "intended outcomes of instruction." An intended outcome is a desired result, a specific goal to which educators commit. Understanding by Design uses the terms achievement target and goal to describe such intents. To determine if outcomes have been attained requires agreement on specific measures—the assessment tasks, criteria, and standards. Despite the controversies in past years about Outcomes-Based Education, the word outcome is neutral, implying no particular kind of target or educational philosophy. It refers to the priorities of a curriculum or an educational program. An outcome-based approach focuses on desired outputs, not the inputs (content and methods). The key question is results-oriented (What will students know and be able to do as a result of instruction?) rather than input based (What instructional methods and materials shall we use?).
Reliability
In measurement and testing, the accuracy of the score. Is it sufficiently free of error? What is the likelihood that the score or grade would be constant if the test were retaken or the same performance were rescored by someone else? Error is unavoidable; all tests, including the best multiple-choice tests, lack 100 percent reliability. The aim is to minimize error to tolerable levels. In performance assessment the reliability problem typically occurs in two forms: (1) To what extent can we generalize from the single or small number of performances to the student's likely performance in general? and (2) What is the likelihood that different judges will see the same performance in the same way? The second question involves what is typically termed "inter-rater reliability." Score error is not necessarily a defect in the test-maker's methods, but a statistical fact related to (1) how extraneous factors inevitably influence test takers or judges, or (2) the limits of using a small sample of questions or tasks in a single sitting. It is possible to obtain adequate reliability by ensuring that there are multiple tasks for the same outcome; better reliability is obtained when the student has many tasks, not just one. Also, scoring reliability is greatly improved when evaluation is performed by well-trained and supervised judges, working from clear rubrics and specific anchor papers or performances. (These procedures have long been used in large-scale writing assessments and in the advanced placement program.)
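As an illustrative aside (not from the book): inter-rater reliability is commonly quantified with Cohen's kappa, which corrects the raw rate of judge-to-judge agreement for the agreement expected by chance from each judge's score distribution. A minimal sketch, assuming two hypothetical judges scoring the same ten essays on a 4-point rubric:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Inter-rater agreement corrected for chance (Cohen's kappa)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both judges scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal score frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two judges scoring ten essays on a 4-point rubric (illustrative data).
judge_1 = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
judge_2 = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]
print(round(cohens_kappa(judge_1, judge_2), 2))  # → 0.72
```

A kappa near 1.0 indicates the well-trained, rubric-guided judging the entry describes; a kappa near 0 means the judges agree no more often than chance would predict.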
Process
In the context of assessment, the intermediate steps the student takes in reaching the final performance or end-product specified by the assessment. Process thus includes all strategies, decisions, subskills, rough drafts, and rehearsals used in completing the given task. When asked to evaluate the process leading to the final performance or product, the assessor is sometimes asked to explicitly judge the student's intermediate steps, independent of what can be inferred about those processes from the end result. For example, one might rate a student's ability to work within a group or prepare an outline as a prewriting component of a research project, independent of the ultimate product the group or individual writer produces. However, evaluating process skills separately requires caution. The emphasis should be on whether the final product or performance met the standards set—irrespective of how the student got there.
Resultant knowledge and skill
Knowledge and skill that are meant to result from a unit of study. In addition to the targeted understanding, teachers identify other desired outcomes (for example, "skill in listening"). Resultant knowledge and skill differs from prerequisite knowledge and skill. Resultant knowledge is the goal of the unit. Prerequisite knowledge is what is needed to accomplish the goals of the unit. For example, in a unit that culminates in a historical role-play, the prerequisite knowledge involves the biographical facts of the people being portrayed and the prerequisite skill is the ability to role-play. Designers using UbD identify the resultant knowledge and skill in Stage 1, and they weave the prerequisite knowledge into Stage 3, the learning plan.
Curriculum
Literally, "the course to be run." In Understanding by Design, the term refers to the explicit and comprehensive plan developed to honor a framework based on content and performance standards.
Application
One of the six facets of understanding and a time-honored indicator of understanding. The ability to apply knowledge and skill in diverse situations provides important evidence of the learner's understanding. The idea is not new or specific to UbD. Bloom and his colleagues (1956) saw application as central to understanding and quite different from the kind of plugging-in and fill-in-the-blanks activity found in so many classrooms: "Teachers frequently say: If a student really comprehends something, he can apply it. . . . Application is different in two ways from knowledge and simple comprehension: the student is not prompted to give specific knowledge, nor is the problem old-hat" (p. 120).
Self-knowledge
One of the six facets of understanding. As discussed in the context of the facets theory, self-knowledge refers to accuracy of self-assessment and awareness of the biases in one's understanding because of favored styles of inquiry, habitual ways of thinking, and unexamined beliefs. Accuracy of self-assessment in this case means that the learner understands what he does not understand with clarity and specificity. (Socrates referred to this capacity as "wisdom.") Self-knowledge also involves the degree of awareness of biases and how these influence thinking, perceptions, and beliefs about how the subject is to be understood. One does not just receive understanding (like images through eyes), in other words; ways of thinking and categorizing are projected onto situations in ways that inevitably shape understanding. See also application; empathy; explanation; interpretation; and perspective.
Empathy
One of the six facets of understanding. Empathy, the ability to "walk in another's shoes," to escape one's own emotional reactions to grasp another's, is central to the most common colloquial use of the term understanding. When we "try to understand" another person, people, or culture, we strive for empathy. It is thus not simply affective response; it is not sympathy. It is a learned ability to grasp the world (or text) from someone else's point of view. It is the discipline of using one's imagination to see and feel as others see and feel, to imagine that something different might be possible, even desirable. Empathy is not the same as perspective. Seeing something in perspective involves seeing from a critical distance; detaching oneself to see more objectively. Empathy involves seeing from inside another person's worldview; embracing the insights, experience, and feelings that are found in the subjective or aesthetic realm. The term was coined by a German scholar, Theodor Lipps, at the turn of the 20th century to describe what the audience must do to understand a work or performance of art. Empathy is thus the deliberate act of finding what is plausible, sensible, or meaningful in the ideas and actions of others, even if they appear puzzling or off-putting. See also application; explanation; interpretation; perspective; self-knowledge.
Perspective
One of the six facets of understanding. The ability to see other plausible points of view. It also implies that understanding enables a distance from what one knows, an avoidance of getting caught up in the views and passions of the moment. See also application; empathy; explanation; interpretation; self-knowledge.
Interpretation
One of the six facets of understanding. To interpret is to find meaning, significance, sense, or value in human experience, data, and texts. It is to tell a good story, provide a powerful metaphor, or sharpen ideas through an editorial. Interpretation is thus fraught with more inherent subjectivity and tentativeness than the theorizing or analyzing involved in explanation. Even if one knows the relevant facts and theoretical principles it is necessary to ask, What does it all mean? What is its importance? (In fact, one definition in the dictionary for the verb understand is "know the import of.") A jury trying to understand child abuse seeks significance and intent, not accurate generalizations from theoretical science. The theorist builds objective knowledge about the phenomenon called abuse, but the novelist may offer as much or more insight through inquiry into the psychic life of a unique person. This narrative building is the true meaning of constructivism. When teachers say that students must "make their own meaning," they mean that handing students prepackaged interpretations or notions of significance, without having the students work it through and come to see some explanations and interpretations as more valid than others, leads to sham understanding. A purely didactic teaching of the interpretation is likely to lead to superficial and quickly forgotten knowledge, and it misleads students about the inherently arguable nature of all interpretation. See also application; empathy; explanation; perspective; self-knowledge.
Explanation
One of the six facets of understanding. Understanding involves more than just knowing information. A person with understanding is able to explain why it is so, not just state the facts. Such understanding emerges as a well-developed and supported theory, an account that makes sense of data, phenomena, ideas, or feelings. Understanding is revealed through performances and products that clearly, thoroughly, and instructively explain how things work, what they imply, where they connect, and why they happened. Understandings in this sense thus go beyond merely giving back "right" answers to providing warranted opinions (to justify how the student got there and why it's right). Such verbs as justify, generalize, support, verify, prove, and substantiate get at what is needed. Regardless of content or the student's age or sophistication, understanding in this sense reveals itself in the ability to "show your work," to explain why the answer is correct, to subsume current work under more general and powerful principles, to give valid evidence and argument for a view, and to defend that view. See also application; empathy; interpretation; perspective; self-knowledge.
Audit Test
Our term for the state or national standardized test. Like the business audit or doctor's physical exam, it is a brief test that assesses something important and complex using simpler indicators. The test questions are proxies for more important goals and standards, in the same way that a blood pressure reading gives a quick snapshot of overall health. We think it important to make this point to remind readers that the goal and look of the standardized test are very different from the goal and look of more direct assessment of the goals and standards, so it makes little sense to attend solely to the audit. Rather, the audit will go well to the extent that "health" is attended to locally. Contrast direct test.
Iterative
Requiring continual revisiting of earlier work. An iterative approach is thus the opposite of linear or step-by-step processes. Synonyms for iterative are recursive, circular, and spiral-like. The curricular design process is always iterative; designers keep revisiting their initial ideas about what they are after, how to assess it, and how they should teach to it as they keep working on each element of the design. They rethink earlier units and lessons in light of later designs and results—the learning that does (or does not) occur.
Anchors
Samples of work or performance used to set the specific performance standard for each level of a rubric. For example, attached to the paragraph describing a level-six performance in writing would be two or three samples of writing that illustrate what a level-six performance is. (The anchor for the top score is often called the "exemplar.") Anchors contribute significantly to scoring reliability. A rubric without such anchors is typically far too ambiguous to set a clear standard for judges and performers alike. Such phrases as "sophisticated and persuasive" or "insightful mathematical solution" have little meaning unless teachers have examples of work that provide concrete and stable definitions. Anchors also support students by providing tangible models of quality work.
Unit
Short for a "unit of study." Units represent a coherent chunk of work in courses or strands, across days or weeks. An example is a unit on natural habitats and adaptation that falls under the yearlong strand of living things (the course), under 3rd grade science (the subject), and under science (the program). Though no hard and fast criteria signify what a unit is, educators generally think of a unit as a body of subject matter that is somewhere in length between a lesson and an entire course of study; that focuses on a major topic (e.g., Revolutionary War) or process (e.g., research process); and that lasts between a few days and a few weeks.
Design
To plan the form and structure of something or the pattern or motif of a work of art. In education, teachers are designers in both senses, aiming to develop purposeful, coherent, effective, and engaging lessons, units, and courses of study and accompanying assessments to achieve identified results. To say that something happens by design is to say that it occurs through thoughtful planning as opposed to by accident or by "winging it." At the heart of Understanding by Design is the idea that what happens before the teacher gets in the classroom may be as important as, or more important than, the teaching that goes on inside the classroom.
Assess
To thoroughly and methodically analyze student accomplishment against specific goals and criteria. The word comes from the Latin assidere, meaning "to sit beside." See also performance task.
Assessment
Techniques used to analyze student accomplishment against specific goals and criteria. A test is one type of assessment. Others include clinical interviews (as in Piaget's work), observations, self-assessments, and surveys. Good assessment requires a balance of techniques because each technique is limited and prone to error. To refer to "assessments" instead of just "tests" is also a distinction of manner and attitude, as implied by the Latin origin of the word assess: to assess is to "sit with" the student. The implication is that in an assessment the teacher makes thoughtful observations and disinterested judgments, and offers clear and helpful feedback. Assessment is sometimes viewed as synonymous with evaluation, though common usage differs. A teacher can assess a student's strengths and weaknesses without placing a value or a grade on the performance. See also performance task; standardized.
Transferability
The ability to use knowledge appropriately and fruitfully in a new or different context from that in which it was initially learned. For example, a student who understands the concept of "balanced diet" (based on the USDA food pyramid guidelines) transfers that understanding by evaluating hypothetical diets for their nutritional values and by creating nutritional menus that meet the food pyramid recommendations.
Bloom's Taxonomy
The common name of a system that classifies and clarifies the range of possible intellectual objectives, from the cognitively easy to the difficult; in effect, a classification of degrees of understanding. More than 40 years ago, Benjamin Bloom and his colleagues in testing and measurement developed this schema for distinguishing the simplest forms of recall from the most sophisticated uses of knowledge in designing student assessments. Their work was summarized in the now ubiquitous text titled Taxonomy of Educational Objectives: Cognitive Domain. As the authors often note, the writing of this book was driven by persistent problems in testing. Educators needed to know how educational objectives or teacher goals should be measured, given the absence of clear agreement about the meaning of objectives such as "critical grasp of" and "thorough knowledge of"—phrases that test developers typically use. In the introduction to the Taxonomy, Bloom and his colleagues (1956) refer to "understanding" as a commonly sought but ill-defined objective: For example, some teachers believe their students should "really understand," others desire their students to "internalize knowledge," still others want their students to "grasp the core or essence." Do they all mean the same thing? Specifically, what does a student do who "really understands" which he does not do when he does not understand? Through reference to the Taxonomy . . . teachers should be able to define such nebulous terms. (p. 1) They identified six cognitive levels: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation, with the last three commonly referred to as "higher order." Note that in this scheme, higher-order thinking does not include application as they defined it. This seems odd, given the seemingly complex demands of application and the concern expressed by many advocates of authentic assessment about getting the student to more effectively apply knowledge.
But this is not what Bloom and his colleagues meant by apply. They were speaking of those narrower cases in which a student must use discrete knowledge or skill in an exam setting, as when constructing a sentence or solving a math word problem; they were not referring to the more sophisticated act of drawing upon a repertoire to solve a complex, multifaceted, contextualized problem. The authors' description of synthesis thus better fits the meaning of application used in Understanding by Design in particular and the performance assessment movement in general, because they stress that such an aim requires the "students' unique production."
Validity
The inferences one can confidently draw about student learning based on the results of an assessment. Does the test measure what it purports to measure? Do the test results correlate with other performance results educators consider valid? Does the sample of questions or tasks accurately correlate with what students would do if tested on everything that was taught? Do the results have predictive value; that is, do they correlate with likely future success in the subject in question? A test is valid to the extent that such questions can be answered "yes." Because most tests provide a sample of student performance, the scope and nature of the samples influence the extent to which valid conclusions may be drawn. Is it possible to accurately and reliably predict from the performance on a specific task that the student has control over the entire domain? Does one type of task enable an inference to other types of tasks (say, one genre of writing to all others)? No. Thus, the typically few tasks used in performance assessment often provide an inadequate basis for generalizing. One solution is to use a wide variety of student work of a similar type or genre, collected over the year, as part of the summative assessment. To be precise, it is not the test itself that is valid, but the inferences that educators claim to be able to make from the test results. Thus, the purpose of the test must be considered when assessing validity. Multiple-choice reading tests may well be valid if they are used to test the student's comprehension ability or to monitor grade-level reading ability of a district's population as compared to other large populations. They may not be valid as measures of a pupil's repertoire of reading strategies and the ability to construct apt and insightful responses to texts. The format of the test can be misleading; an inauthentic test can still be technically valid.
It may aptly sample from the subject domain and predict future performance accurately but nonetheless be based on inauthentic, even trivial, tasks. The SAT college admissions test and tests such as the Otis-Lennon School Ability Test are said by their makers to be valid in this more limited sense: they are efficient proxies that serve as useful predictors. Conversely, an authentic task may not be valid. The scoring system can raise other questions about validity. To ask if a performance task is valid is to ask, within the limits of feasibility, if the scoring targets the most important aspects of performance as opposed to that which is most easily scored. Have the most apt criteria been identified, and is the rubric built upon the most apt differences in quality? Or has scoring focused merely on what is easy to count and score? Has validity been sacrificed for reliability, in other words?
Prerequisite Knowledge and Skill
The knowledge and skill required to successfully perform a culminating performance task or achieve a targeted understanding. Typically prerequisites identify the more discrete knowledge and know-how required to put everything together in a meaningful final performance. For example, knowledge of the USDA food pyramid guidelines would be considered a prerequisite to the task of planning a healthy, balanced diet for a week. Contrast resultant knowledge and skill.
Criteria
The qualities that must be met for work to measure up to a standard. To ask, "What are the criteria?" is the same as asking, "What should we look for when examining students' products or performances to know if they were successful? How will we determine acceptable work?" Criteria should be considered before the design of specific performance tasks (though this seems odd to novice designers). Designing a task that measures critical thinking requires knowing beforehand what the indicators of such thinking are, and then designing the task so that students must demonstrate those traits through performance. An assessment must also determine how much weight each criterion should receive relative to other criteria. Thus, if teachers agree that spelling, organization, and the development of ideas are all important in judging writing, they must then ask, "Are they of equal importance? If not, what percentage should we assign to each?" The criteria used in judging performance, like a test itself, can be valid or invalid, and authentic or inauthentic. For example, a teacher can assign students to do some original historical research (an authentic task) but grade the work only on whether four sources were used and whether the report is exactly five pages long. Such criteria would be invalid because a piece of historical work could easily not meet those two criteria but still be excellent research. Criteria should correspond to the qualities of masterful performance. Many performance assessments undervalue so-called impact criteria. (See Chapters 5 and 6 in Wiggins [1998] for more on these types of criteria.)
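The weighting question above can be made concrete with a short sketch. Assuming, hypothetically, that teachers have agreed on fractional weights for three writing criteria scored on a 1-6 rubric (the criterion names, weights, and scores here are invented for illustration), a weighted total might be computed like this:

```python
# Illustrative sketch: combining per-criterion rubric scores using
# agreed-upon fractional weights. The weights must sum to 1 so the
# combined score stays on the same 1-6 scale as the rubric.
def weighted_score(scores, weights):
    """Weighted combination of criterion scores; weights sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[criterion] * w for criterion, w in weights.items())

# Hypothetical agreement: ideas matter most, spelling least.
weights = {"development of ideas": 0.5, "organization": 0.3, "spelling": 0.2}
scores = {"development of ideas": 5, "organization": 4, "spelling": 6}
print(round(weighted_score(scores, weights), 2))  # 4.9
```

Note how the weighting encodes the teachers' value judgments: a paper strong in spelling but weak in ideas cannot score well, which matches the point that criteria should target masterful performance rather than what is easiest to count.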
Enduring Understanding
The specific inferences, based on big ideas, that have lasting value beyond the classroom. In UbD, designers are encouraged to write them as full-sentence statements, describing what, specifically, students should understand about the topic. The stem "Students will understand that . . ." provides a practical tool for identifying understandings. In thinking about the enduring understandings for a unit or course, teachers are encouraged to ask, "What do we want students to understand and be able to use several years from now, after they have forgotten the details?" Enduring understandings are central to a discipline and are transferable to new situations. For example, in learning about the rule of law, students come to understand that "written laws specify the limits of a government's power and articulate the rights of individuals, such as due process." This inference from facts, based on big ideas such as "rights" and "due process," provides a conceptual unifying lens through which to recognize the significance of the Magna Carta as well as to examine emerging democracies in the developing world. Because such understandings are generally abstract in nature and often not obvious, they require uncoverage through sustained inquiry rather than one-shot coverage. The student must come to understand or be helped to grasp the idea, as a result of work. If teachers treat an understanding like a fact, the student is unlikely to get it.
Design Standards
The specific standards used to evaluate the quality of unit designs. Rather than treating design as merely a function of good intentions and hard work, standards and a peer review process provide a way for teacher work to be assessed in the same way that student work is assessed against rubrics and anchors. The design standards have a dual purpose: (1) to guide self-assessment and peer reviews to identify design strengths and needed improvements; and (2) to provide a mechanism for quality control, a means of validating curricular designs.
Product
The tangible and stable result of a performance and the processes that led to it. The product is valid for assessing the student's knowledge to the extent that success or failure in producing the product (1) reflects the knowledge taught and being assessed, and (2) is an appropriate sample from the whole curriculum of the relative importance of the material in the course.
Standard
To ask, "What is the standard?" is to question how well the student must perform, at what kinds of tasks, based on what content, to be considered proficient or effective. Thus, there are three kinds of standards, each addressing a different question. Content standards answer the question, "What should students know and be able to do?" Performance standards answer the question, "How well must students do their work?" Design standards answer the question, "What worthy work should students encounter?" Most state documents identify only content standards. Some also identify performance standards—a specific result or level of achievement that is deemed exemplary or appropriate (typically measured by a standardized test). Understanding by Design also identifies and emphasizes design standards related to the quality of the task itself; these are the standards and criteria by which educators distinguish sound from unsound units. Confusions abound because of these various kinds of standards. Worse, the word standard is sometimes used as a synonym for high expectations. At other times, it is used as a synonym for benchmark—the best performance or product that can be accomplished by anyone. And in large-scale testing, standard has often implicitly meant minimal standard; that is, the lowest passing score. One also often hears standards discussed as if they were general guidelines or principles. Finally, standard is routinely confused with the criteria for judging performance. (Many people falsely believe that a rubric is sufficient for evaluation. But an articulated performance standard, often made real by anchors or exemplars, is also necessary.) When talking about standards-based education, educators should consider a number of points. First, in a general sense, they must be careful not to confuse standards with expectations. A performance standard is not necessarily meant to be reachable by all who try and are well trained; that's better thought of as an expectation. 
A standard remains worthy whether or not anyone can currently meet it. That is very different from an expectation that happens to be high or a "reach"—something that a good number of students not only can but ought to meet, if they persist and get good teaching from teachers (who have high expectations). Second, a performance standard in assessment is typically set by an "exemplary" anchor performance or some specification or cut-off score. Consider wider-world benchmarks: the four-minute mile, the Malcolm Baldrige Award-winning companies, Hemingway's writing, Peter Jennings's oral presentation. Few student performers, if any, will meet such standards, but they are still worthy targets for framing a program and an assessment. School tests rarely set performance standards using such professional benchmarks (though such exemplars serve as instructional models and as sources of criteria for rubrics). A school standard is typically set through the selection of peer-based anchors or exemplars of performance—what might be called "milepost" or "age-appropriate" standards. The choice of such exemplary work samples sets the de facto standard. A key assessment question thus becomes, Where should the samples of student work come from? What would be a valid choice of anchors? And how do teachers link school standards to wider-world and adult standards? What teachers typically do is select the best work available from the overall student population being tested. (Proponents of UbD believe, however, that students need to be more routinely provided with anchors that come from slightly more advanced and experienced students, to serve as a helpful longer-range target and to guide ongoing feedback.) Third, a standard differs from the criteria used to judge performance. The criteria for the high jump or the persuasive essay are more or less fixed no matter the age or ability of the student. All high jumps, to be successful, must meet the same criterion: The bar must stay on.
In writing, all persuasive essays must use appropriate evidence and effective reasoning. But how high should the bar be? How sophisticated and rigorous should the arguments be? Those are questions about standards. (The descriptors for the different levels in a rubric typically contain both criteria and standards.) Standards are not norms, however, even if norms are used to determine age-appropriate standards. Traditionally, performance standards have been put into operation by fixing a minimally acceptable performance level through so-called cutoff, or cut, scores. Typically, in both classroom grading and on state tests, a score of 60 is considered a minimal standard of performance. But test designers are rarely asked to establish a defensible cut score. Stating at the outset that 60 is passing and 59 is failing is arbitrary; few tests are designed so that a significant, qualitative difference distinguishes a 59 and a 61. It is thus all too easy, when thinking of a standard as a cutoff point, to turn what should be a criterion-referenced scoring system into a norm-referenced scoring system. Thus, improving content standards will not necessarily raise performance standards. Content refers to input and performance to output. Content standards state the particular knowledge the student should master. Many current reforms assume improving the inputs will necessarily improve the output. But this is clearly false. One can still receive poor-quality work from students in a demanding course of study. In fact, it is reasonable in the short term to expect to obtain worse performance by raising content standards only; establishing higher standards only in the difficulty of what is taught will likely lead to greater failure by students, if all other factors (teaching and time spent on work) remain constant. 
The key question to ask in setting valid and useful performance standards must always be, At what level of performance would the student be "appropriately qualified or certified"? An effective solution to putting standards into operation is thus to equate internal teacher and school standards to some equivalent, worthy level of achievement in the outside world—a wider-world benchmark—thus lending substance, stability, and credibility to the scoring. This is a common feature of vocational, musical, athletic, and other performance-based forms of learning.
Self-Knowledge
perceive the personal style, prejudices, projections, and habits of mind that both shape and impede understanding; be aware of what is not understood and why it is so hard to understand.
Explain
provide thorough, supported, and justifiable accounts of phenomena, facts, and data.
Interpret
tell meaningful stories; offer apt translations; provide a revealing historical or personal dimension to ideas and events; make something personal or accessible through images, anecdotes, analogies, or models.