DELTA (Testing Terms)
Construct validity
A quality of an effective test. If a language test has construct validity, it reflects an accepted theory of language use and tests only the subskills/competencies that would naturally be involved. Example: Dictation as a test of listening comprehension lacks construct validity because a) it involves a number of skills not generally needed when listening - ie the ability to retain the exact words used, and the ability to write and to spell; b) it is generally carried out under conditions quite different from those in which listening naturally occurs.
Achievement test
A test given at the end of a course to check whether the learners have assimilated what has been taught, have passed the course, and are therefore (potentially) ready to go on to the next level. Example: many coursebooks include an "end of course" test which can be used for this purpose.
Objective tests
Tests in which there are exact "right" answers and no judgement is called for on the part of the marker. Examples: a multiple-choice test where each item has one right answer and three or four incorrect distractors; a true/false question.
Cohort / Test Cohort
A group of people who share the same characteristics. A test cohort is therefore composed of all the people who take a particular test at a particular time. For example, anyone who takes IELTS this summer will be part of the IELTS test cohort for that exam session.
Placement test
A test used to determine which course a learner should attend. It would be used, eg in a private language school, to assign newly-arrived learners to a class at an appropriate level.
Distractors
The incorrect alternatives in a multiple-choice test. Example: a, c and d in: Thanks for .... me about it. a) tell b) telling c) told d) you tell
Proficiency tests
Proficiency tests aim to certify that a learner is at a specific level, regardless of what previous course(s) s/he has or has not taken. Many proficiency tests are run by external examining boards such as Cambridge English. For example, the Cambridge First exam is a proficiency test which, if passed, certifies that the learner is at CEFR B2 level.
Summative assessment
Evaluates the success of past learning, typically in terms of pass/fail, showing to what extent the learner has or hasn't achieved the standards required by the programme. Examples of summative tests include achievement tests and proficiency tests.
Fit for Purpose
A good test needs to be fit for purpose - in other words, it needs to match the requirements of the situation in which it is being used. This means demonstrating a number of qualities associated with effective testing in general - reliability, transparency, coverage, different types of validity, practicality etc - but also matching the needs of the learners with whom it is being used, which might include such factors as age, level, culture, communicative needs and so on.
Predictive validity
A test has predictive validity if the results accurately indicate whether or not the test taker will be able to perform a specified "real life" task. In language testing this implies "...will be able to perform a specified "real life" task in terms of the level of language competence required". Example: The IELTS exam is intended to predict whether or not the test taker will be able to cope, as far as English language competence is concerned, when following a university course taught in English.
Content validity
A test quality concerned with whether the test under consideration tests only what has been covered in the preceding course (for progress tests), the course syllabus (for achievement tests) or the test specification (for proficiency tests). Example: in their course, the learners have been studying the present continuous for ongoing present events. These are included in their progress test, but the reading comprehension also includes several examples of the present continuous used to express future arrangements - and tests learners on their understanding of this. The test therefore lacks content validity.
Transparency
A test quality which refers to the extent to which it is clear to the learner what s/he has to do. This may be affected by factors such as the clarity of the instructions, and whether learners know how many marks and how much time are allocated to each task.
Test Battery
A test which involves more than one activity type - in other words, the test is composed of a group of activities which may be scored individually or whose scores may be combined into one overall score. For example, the Cambridge English PET has three components: a Reading and Writing test, a Listening test, and a Speaking test. The results are combined into a single score, which is then graded as Fail, Pass, Pass with Merit or Pass with Distinction.
Progress Tests
Tests given during a course to check how well the items/subskills taught up to that point have been assimilated and retained. They have a formative purpose - ie they help the teacher and learners decide how successful the learning has been so far, what needs recycling and consolidation, and whether different learning strategies need to be introduced. For example, a coursebook with twenty units might provide a test after every five units which focuses on the language items/subskills that the learners have met in those units.
Discrete item (or discrete point) test
Tests one element of language at a time. For example, the following multiple-choice item tests only the learner's knowledge of the correct past form of the verb sing: 14. When I was a child I .......... in a choir. a. sing b. singed c. song d. sung e. sang Discrete item tests have the advantage of often being practical to administer and mark, and objective in terms of marking. However, they show only the learner's ability to recognise or produce individual items - not how s/he would use the language in actual communication. In other words, they are inevitably indirect tests.
Norm-referenced testing
The percentage of the test cohort (ie the people taking the exam) who will pass or get certain grades is decided in advance. Eg if I have a class of 20 learners I might decide in advance that, when the test is marked, the top 25% (ie the 5 learners with the highest marks) will get Grade A, the second 25% Grade B, the third 25% Grade C, and the final 25% will fail.
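The arithmetic in this example is purely rank-based: a learner's grade depends on their position in the cohort, not on any external standard. As a minimal illustrative sketch (learner names and marks invented), the quartile grading described above could be computed like this:

```python
# Hypothetical sketch of the norm-referenced grading scheme described above:
# grade boundaries are set by rank within the cohort, not by an external standard.
def norm_referenced_grades(scores):
    """Assign A/B/C/Fail by quartile of rank, highest marks first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    band_size = max(len(ranked) // 4, 1)  # 5 learners per band in a class of 20
    labels = ["A", "B", "C", "Fail"]
    return {name: labels[min(pos // band_size, 3)]
            for pos, (name, _mark) in enumerate(ranked)}

# Invented marks for a class of 20 learners.
marks = {f"learner{i:02d}": m for i, m in enumerate(
    [92, 88, 85, 83, 80, 78, 75, 74, 70, 68,
     66, 64, 61, 58, 55, 52, 49, 45, 40, 35])}
print(norm_referenced_grades(marks))
```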
Impact
Refers to the socio-economic effect the introduction of a test may have. Example: Since specific IELTS grades have been adopted by a wide range of universities as an entrance requirement, a significant market has arisen for IELTS preparation courses, with a resulting positive economic effect for the language schools which run these courses and/or act as exam centres.
Diagnostic Testing
A test given at the beginning of a course which aims to discover exactly what the learners already know or don't know and where their strengths and weaknesses lie. This information is then used to decide the course content. Diagnostic tests are therefore formative in function. Example: a teacher in a private language school has a new group of learners. In the school placement test, they all tested out at CEFR B1 level. A diagnostic test would then show where, within that level, each learner's strengths and weaknesses lie, so that the teacher can plan the course content accordingly.
Reliability
The term used in testing to indicate whether the scores on a test give an accurate result - ie competent users of the language do well whilst those with less competence do less well. There are two types of reliability: 1) test-retest reliability (if the learner took the test twice, would s/he get the same result?); 2) mark-remark reliability (if the test was marked by two different people, would the result be the same?).
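The entry above frames reliability as two questions; in practice both are often quantified as a correlation between the two sets of scores, with a coefficient near 1 indicating consistency. The source does not name a statistic, so the choice of Pearson's r here is an assumption; a minimal sketch of the mark-remark case (marks invented):

```python
from statistics import correlation  # requires Python 3.10+

# Hypothetical mark-remark check: the same ten scripts marked by two people.
marker_a = [14, 12, 18, 9, 16, 11, 15, 13, 17, 10]
marker_b = [13, 12, 17, 10, 16, 12, 14, 13, 18, 9]

# Pearson's r close to 1.0 suggests the marking criteria are being applied
# consistently; a low r signals a mark-remark reliability problem.
r = correlation(marker_a, marker_b)
print(f"mark-remark reliability: r = {r:.2f}")
```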
Direct tests
Direct tests ask the learner to demonstrate their ability to perform a specific "real life" communicative task by actually doing it. They therefore demonstrate the learner's ability to use the language in actual communication. An example of a direct test which is also integrative might be role-playing a job interview, where the learner has to listen to and understand the examiner's questions, choose the relevant grammar and lexis to express their ideas, speak with intelligible/accurate pronunciation and intonation, etc.
Formative assessment
Assessment used to improve the quality of future learning - ie to help the teacher and learners decide how successful the learning has been up to that point, what needs recycling and consolidation before they can move on, and whether different learning strategies need to be introduced. Examples of tests with a formative purpose are diagnostic tests and progress tests.
Coverage
A quality of a test, referring to the number and appropriacy of the items included. These should be balanced and reflect what the learner has been taught and/or is expected to know. A test might have inadequate coverage if: a) it was too short overall; b) it over-emphasised some items at the expense of others; c) it over-emphasised some systems/skills at the expense of others; d) it over-emphasised some sections of a course at the expense of others; e) it was too easy or too difficult.
Discrimination (testing)
Refers to the extent to which a test distinguishes between strong and weak students. When a test is pre-tested, attention will be paid to each item's discrimination index. If all the members of the cohort get an item right, or if all get it wrong, the item will be rejected - it gives no evidence of the differing knowledge/ability of the members of the cohort and is therefore useless for testing purposes. If, on the other hand, only 20% of the pre-test cohort get an item right, and they are the same 20% who score highest on the test overall, then the item clearly discriminates between the "top" group and the rest.
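One common way of computing an item discrimination index (the source does not specify a formula, so the details here are an assumption) is to compare the proportion answering correctly in the top-scoring and bottom-scoring groups of the pre-test cohort:

```python
def discrimination_index(item_correct, total_scores, group_fraction=0.27):
    """Classic upper/lower-group discrimination index.

    item_correct: 1/0 per candidate for one item; total_scores: overall marks.
    Returns p(correct in top group) - p(correct in bottom group).
    """
    n = len(total_scores)
    k = max(1, round(n * group_fraction))
    # Rank candidates by overall score, strongest first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    p_top = sum(item_correct[i] for i in order[:k]) / k
    p_bottom = sum(item_correct[i] for i in order[-k:]) / k
    return p_top - p_bottom

# Invented pre-test data: only the four strongest candidates got this item
# right, so it discriminates well (D close to 1). An item everyone got right
# (or everyone got wrong) would give D = 0 and would be rejected.
item = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
totals = [48, 45, 44, 41, 37, 33, 30, 26, 22, 18]
print(discrimination_index(item, totals))  # -> 1.0
```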
Backwash (Washback)
Backwash (also called washback) is the effect that knowledge of the contents of a test may have on the course which precedes it. It may be positive or negative. Example: if students are working towards an exam where all of the test items focus on grammatical accuracy, the teacher (possibly at the students' instigation) may spend much of the preceding course focusing on this area - and possibly neglecting other areas that the students will need outside the course, such as spoken fluency or listening. The test would thus have negative backwash - it pushes the teacher into "teaching for the test" rather than providing a balanced course which deals with all the students' needs and develops areas of competence other than just grammatical knowledge.
Practicality
In testing, how easy or difficult a test is to administer and mark. Example: a multiple-choice test is more practical than a test using essay writing, as the multiple-choice test can be marked quickly and easily using a template (without even needing to read the answers) or by computer, while the essay test requires a marker to take the time to read and carefully consider what has been written.
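The "template" in this example is trivially automatable, which is part of what makes objective tests practical at scale. A minimal illustrative sketch (answer key and responses invented):

```python
# Hypothetical sketch: marking a multiple-choice paper against a fixed key.
# No judgement is required of the marker - the score is fully determined.
answer_key = {1: "b", 2: "d", 3: "a", 4: "c", 5: "b"}

def mark(responses):
    """Count the items answered correctly."""
    return sum(1 for item, answer in responses.items()
               if answer_key.get(item) == answer)

candidate = {1: "b", 2: "a", 3: "a", 4: "c", 5: "d"}
print(f"score: {mark(candidate)}/{len(answer_key)}")  # -> score: 3/5
```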
Face validity
Concerns the extent to which a test appears valid - ie appears to be "a good test" to the people using it - the learners taking the test, their parents, the teachers and institutions putting them in for the test etc. As these people may not be testing experts, they may of course not fully understand what is involved. As an example: if you look at the example given under construct validity, you will see that there are various reasons why dictation is not a valid test of listening comprehension. Yet many learners (and teachers) who are used to it as an activity type may accept quite happily that listening ability should be tested in this way, and make no objection to a test which includes it.
Integrative tests
May be either direct or indirect. The use of the term integrative indicates that they test more than one skill and/or item of knowledge at a time. Dictation is an integrative test because it involves listening skills, writing skills, recognition of specific language items, grammar (eg in order to distinguish whether /əv/ should be written as have or of), and so on.
Criterion-referenced testing
Measures learner performance against a set of predetermined criteria - ie descriptions of what learners are expected to know and/or be able to do at a specific stage of their learning. For example, a test based on CEFR can-do descriptors is criterion-referenced: learners who meet the descriptors pass, regardless of how the rest of the cohort performs.