1b. What Are We Testing and Why?
Proficiency tests
Assesses an examinee's level of language ability, but without reference to a particular curriculum. Typically, this involves assessing more than one narrow aspect of language ability.
Performance Assessments
Assessments that require actual performances of relevant tasks, usually involving writing or speaking.
Test Usefulness
Characteristics that make a test more or less useful for its intended purposes. One or more will often be prioritized in a given situation, but not to the extent that the others are ignored. Test developers need to decide how important each quality is and set minimally acceptable levels for it.
Semi-direct tests
Speaking tests that require test takers to record their speech rather than talk directly to a human interlocutor. These tests are generally tape-mediated or computer-mediated.
Item Response Theory
Statistical approach to performing item and reliability analysis with large groups of test takers. It assumes that for an item with a given level of difficulty, the probability that a test taker will answer correctly depends on their level of ability.
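The simplest case of this idea is the one-parameter (Rasch) model, in which the probability of a correct answer depends only on the gap between ability and difficulty. A minimal sketch in Python (the function name and the logit-scale values are illustrative, not from the text):

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the one-parameter (Rasch)
    IRT model: it depends only on the difference between the test
    taker's ability and the item's difficulty (both on a logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When ability exactly matches item difficulty, the chance of a
# correct answer is 50%; higher ability raises the probability.
print(rasch_probability(0.0, 0.0))   # 0.5
```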
Target Language Use (TLU) domain
The contexts outside of the test, whether in the real world or in the classroom, where test takers will use the language. It is these contexts to which we are hoping to generalize our score-based inferences.
Interpretive Framework
The way in which a test or test score is viewed; most importantly, whether the test is norm-referenced or criterion-referenced.
Washback
The effect of a test on teaching and learning, including effects on all aspects of the curriculum, such as materials and teaching approaches, as well as on what students do to learn and prepare for tests.
Correlation coefficient
A mathematical estimate of the strength of relationship between two sets of numbers.
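For interval-scale test scores, the usual estimate is Pearson's r. A small illustrative implementation (assuming two equal-length score lists, each with some variance; the score values are invented):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two sets of scores;
    ranges from -1 (perfect inverse relationship) through 0 (no
    relationship) to +1 (perfect positive relationship)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Two score sets that move together in lockstep correlate at 1.0.
print(pearson_r([70, 80, 90], [65, 75, 85]))   # 1.0
```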
Rating scale
A set of generic descriptions of test taker performance which can be used to assign scores to an individual examinee's performance in a systematic fashion.
Scoring rubric
A set of generic descriptions of test taker performance which can be used to assign scores to an individual examinee's performance in a systematic fashion.
Objective test
A test that can be scored objectively, and therefore uses selected-response questions (particularly multiple-choice questions, but sometimes true-false or matching questions as well). Because the planning and writing of a test cannot be truly objective, this is a misnomer, and it is better to think of these as objectively scored tests.
Subjective test
A test that involves human judgement to score, as in most tests of writing or speaking. Despite the name, however, it is possible to reduce the degree of subjectivity in rated tests through well-written rating scales and appropriate rater training.
Reliability
Consistency of scoring, estimated statistically. Strictly speaking, the concept applies to NRTs.
Construct validity
Degree to which it is appropriate to interpret a test score as an indicator of the construct of interest.
Standard Deviation
Measure of dispersion that is analogous to the average difference between individual scores and the mean.
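A minimal computation of the population standard deviation (a sketch; the score list is invented for illustration):

```python
import math

def standard_deviation(scores):
    """Population standard deviation: the square root of the average
    squared difference between each score and the mean."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]     # mean is 5
print(standard_deviation(scores))     # 2.0
```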
Screening tests
Proficiency test used to make selection decisions--most commonly about whether someone is sufficiently proficient in the target language to be qualified for a particular job.
Descriptive Statistics
Quantitative measures that describe how a group of scores are distributed, particularly how much they are clustered together, how much they are spread out from each other, and how the distribution of scores is shaped.
Norm-referenced tests (NRTs)
Test on which an examinee's results are interpreted by comparing them to how well others did on the test. Scores are often reported in terms of test takers' percentile scores, not percentage correct.
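The difference between a percentile score and percentage correct can be shown with a short sketch (the norm-group scores are invented; this uses the simple "percentage scoring below" definition of percentile rank, one of several common variants):

```python
def percentile_rank(score, all_scores):
    """Percentage of the norm group scoring strictly below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)

group = [42, 55, 58, 61, 64, 70, 73, 77, 85, 90]
# A raw score of 73 (73% correct on a 100-point test) beats 6 of these
# 10 test takers, so the examinee is reported at the 60th percentile.
print(percentile_rank(73, group))   # 60.0
```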
Formative Assessment
Test or other assessment that takes place while students are still in the process of learning something, and that is used to monitor how well that learning is progressing. Closely related to Progress assessment.
Summative Assessment
Test or other assessment typically given at the end of a unit, course, program, etc., that provides information about how much students learned. Closely related to achievement tests; in fact, most achievement testing is largely summative, and summative testing usually aims to assess learner achievement.
Indirect test
Test that attempts to assess one of the so-called "productive skills" through related tasks that do not require any speaking or writing. Instead, they rely upon tasks that will be easier and/or faster to grade; for example, an indirect test of writing might include a multiple-choice test of grammatical knowledge or error detection.
Direct test
Test that requires examinees to use the very ability being assessed; for example, a writing test that requires test takers to write something, or a speaking test that requires examinees to speak.
Discrete-point testing
Test that uses a series of separate, unrelated tasks (usually test questions) to assess one "bit" of language ability at a time. This is typically done with multiple-choice questions.
Admission test
Test used to decide whether a student should be admitted to a program.
Progress test
Test used to assess how well students are doing in terms of mastering course content and meeting course objectives. This is done from the point of view that learning is still ongoing.
Achievement test
Test used to identify how well students have met course objectives or mastered course content.
Diagnostic test
Test used to identify learners' areas of strength and weakness. Some language programs also use diagnostic tests to confirm that students were placed accurately.
Construct
The ability we want to assess. Not directly observable.