Chapter 4 - Of Tests and Testing

Psychological Traits

'Any distinguishable, relatively enduring way in which one individual varies from another.' - Guilford, 1956. Often described with adjectives (e.g., honest, kind, diligent, cold, reliable etc.). These exist as constructs, and we can infer their existence from overt behaviour, such as test scores. They are relatively stable but may change over time, and there are often high correlations between _____ scores at different time points. The nature of the situation influences how they will be manifested.

Stanines

'Standard Nines'. Percentiles are grouped into 9 bands, and the percentage of cases in each band differs. This simplifies scores to a range of 1 to 9 (M = 5, SD = 2). They are commonly used in educational testing. However, one major issue is that discrimination between scores is lost within a band.
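A minimal sketch of how percentile ranks collapse into stanine bands. It assumes the conventional 4-7-12-17-20-17-12-7-4 percentage split (not stated on the card), and the function name and the handling of scores that sit exactly on a boundary are illustrative choices.

```python
from bisect import bisect_right

# Cumulative upper bounds (in percent) for stanines 1 to 8, assuming the
# conventional 4-7-12-17-20-17-12-7-4 split. Above 96 falls in stanine 9.
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_stanine(percentile_rank: float) -> int:
    """Map a percentile rank (0 to 100) to a stanine (1 to 9).

    Boundary values are assigned to the higher band; that is an
    assumption of this sketch, not a rule from the source.
    """
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1


# Two quite different percentile ranks land in the same band,
# which is the loss of discrimination the card mentions.
print(percentile_to_stanine(41), percentile_to_stanine(59))
```

Percentile ranks 41 and 59 both map to stanine 5, even though they are 18 percentile points apart.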

Norm-referenced testing and assessment

A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test-taker's score and comparing it to scores of a group of test-takers. The meaning of an individual test score is understood relative to other scores on the same test. E.g., NAPLAN, IQ etc

National Norms

A norm derived from a normative sample that was nationally representative of the population at the time the norming study was conducted. Involves testing large numbers of people representative of different variables (e.g., age, gender, racial/ethnic background, SES etc). A lot of our norms are based on American statistics; we need to be mindful of this.

Incidental/Convenience Sample

A sample that is convenient or available for use. May not be representative of the population. Generalisation of findings from these samples must be made with caution.

Norm referencing

An approach to interpretation that uses reference to peers or others with similar characteristics. Requires a set of 'norms' derived from the test scores of a representative sample of the target population. It requires the conversion of raw test scores into standard scores or percentiles that are based on normative data. Interpretation: the relative position of the individual, or the likelihood of getting that score, compared to a representative sample of peers.

Random Error

A source of error in measuring a targeted variable caused by random, unpredictable fluctuations and inconsistencies of other variables in the measurement process (i.e., noise).

Systematic Error

A source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

Test Construction

A source of error variance where variation may exist within items on a test or between tests (i.e., item sampling or content sampling).

Test Scoring and Interpretation

A source of error variance. Although computer scoring reduces error in test scoring, many tests still require expert interpretation (e.g., projective tests). Subjectivity in scoring can also enter into behavioural assessment.

Percentile

A type of norm expressing the percentage of people whose score on a test or measure falls below a particular raw score. They are a popular method for organising test-related data because they are easily calculated. ___________ divide the distribution into 100 equal parts. They provide simple ranking information that allows comparison with others. One problem is that real differences between raw scores may be minimised near the ends of the distribution and exaggerated in the middle of the distribution.
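The definition above can be sketched in a few lines of Python. The function name and the normative sample are hypothetical. The sample is deliberately bunched in the middle to show why raw-score differences look exaggerated there and minimised near the ends.

```python
def percentile_rank(scores, raw_score):
    """Percentage of scores in `scores` that fall below `raw_score`."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < raw_score)
    return 100 * below / len(scores)


# Hypothetical normative sample, bunched around 50.
norm_sample = [40, 45, 48, 49, 50, 50, 51, 52, 55, 60]

# A 1-point raw difference in the middle of the distribution...
middle_gap = percentile_rank(norm_sample, 51) - percentile_rank(norm_sample, 50)
# ...versus a 5-point raw difference near the top end.
top_gap = percentile_rank(norm_sample, 60) - percentile_rank(norm_sample, 55)

print(middle_gap, top_gap)
```

In this made-up sample, the 1-point gap in the middle spans 20 percentile points, while the 5-point gap at the top spans only 10.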

Age Norms

A type of norm that indicates average performance of different samples of test takers who were at various ages at the time the test was administered. E.g., height usually increases at various rates as a function of age up to middle-late teens; performance on various tests as a function of advancing age. Can include measuring the concept of 'mental age' vs. biological age.

Grade Norms

A type of norm that indicates average test performance of test-takers in a given school grade. The mean/median score for children at each grade level is calculated, which provides a gauge of how one student's performance compares with that of fellow students in the same grade. However, it is not designed for adults who have returned to school.

Local Norms

A type of norm that provides normative information with respect to the local population's performance on some test. E.g., an individual high school developing its own norms on a state-wide test. Also, abbreviated forms of tests require new norms.

Subgroup Norms

A type of norm where a normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample, i.e., the same variables as national norms (age, gender, SES etc). Different institutions will find more use in a variety of the subgroups than others.

Developing Norms

Administering a test according to a standard set of instructions. Includes a recommended test setting (e.g., quiet, well-lit room). This makes past, present and future results more comparable. A normative sample must take the test under a standard set of conditions which can be replicated. The data is summarised with descriptive statistics, measures of central tendency and variability. A detailed description of the findings must be provided.

Measurement Error

All of the factors associated with the process of measuring some variable, other than the actual variable being measured. It can consist of random and systematic error.

National Anchor Norms

An equivalency table for scores on two different tests which allows for a basis of comparison. E.g., DAS vs. Beck DI.

Reliability Coefficient

An index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
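As a worked illustration, the ratio can be computed directly once the two variance components are known. This is a sketch that assumes total variance is the sum of true score variance and error variance. The function name and the numbers are hypothetical.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Proportion of total variance attributable to true score variance.

    Assumes total variance = true variance + error variance.
    """
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances cannot be negative")
    total_variance = true_variance + error_variance
    if total_variance == 0:
        raise ValueError("total variance must be greater than zero")
    return true_variance / total_variance


# Hypothetical test: 80 units of true variance, 20 units of error variance.
print(reliability_coefficient(80, 20))  # 0.8
```

The closer the value is to 1, the less of the observed variation between test-takers is due to error.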

Purposive Sample

Arbitrarily selecting a sample that is believed to be representative of the population.

Z Scores

Can use _ ______ to directly compare across children within the same test, across tests for the same child, and different children across different tests.

Reliability

Consistency in measurement.

Sampling Error

Election example: the extent to which the population of voters in the study actually was representative of voters in the election.

Methodological Error

Election example: interviewers may not have been trained properly, the wording in the questionnaire may have been ambiguous, or the items may have been biased to favour one or another of the candidates.

Stratified-Random Sampling

Stratified sampling in which selection is random, i.e., every member of the population has an equal opportunity of being included in the sample.

Error

Refers to the component of the observed score that does not have to do with the test-taker's true ability or the trait being measured.

Stratified Sampling

Sampling that includes different subgroups, or strata, from the population.
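A rough sketch of drawing a stratified sample at random, so that each stratum is represented in proportion to its size. Proportional allocation, the function name, and the urban/rural population are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Randomly draw the same fraction of members from every stratum.

    `stratum_of` is a function returning the stratum label for a member.
    """
    rng = random.Random(seed)

    # Group the population by stratum.
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    # Sample at random within each stratum (proportional allocation).
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 urban and 40 rural test-takers.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_random_sample(population, lambda m: m[0], 0.1, seed=1)

urban_count = sum(1 for m in sample if m[0] == "urban")
rural_count = sum(1 for m in sample if m[0] == "rural")
print(urban_count, rural_count)
```

A 10% draw yields 6 urban and 4 rural members, mirroring the 60/40 split in the population.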

Test Administration

Sources of error may stem from the testing environment. Also, test-taker variables such as pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication. Examiner-related variables such as physical appearance and demeanour may play a role.

Variance

Standard deviation squared. Made up of true ________ plus error ________.

Normalised Standard Scores

Standard scores retain the same distribution as the raw scores, but the assumption with norms is that the distribution is normal. __________ ________ ______ involve transforming the raw scores first to fit/approximate a normal distribution, and then standardising them.
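One way to sketch the two-step idea in Python: first convert each raw score to a proportion of the sample at or below it, then look up the z value that cuts off that proportion of a normal curve. The mid-rank method used here is one common choice and is an assumption, as are the function name and the data. It requires Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist


def normalised_z_scores(raw_scores):
    """Rank-based normalisation of raw scores into z scores.

    Each score's proportion is (number below + half the ties) / n,
    which always lies strictly between 0 and 1.
    """
    n = len(raw_scores)
    if n == 0:
        raise ValueError("raw_scores must not be empty")
    standard_normal = NormalDist()  # mean 0, SD 1
    result = []
    for x in raw_scores:
        below = sum(1 for s in raw_scores if s < x)
        ties = sum(1 for s in raw_scores if s == x)
        proportion = (below + 0.5 * ties) / n
        result.append(standard_normal.inv_cdf(proportion))
    return result


# Hypothetical, heavily skewed raw scores.
skewed = [1, 2, 3, 4, 100]
normalised = normalised_z_scores(skewed)
print([round(z, 2) for z in normalised])
```

After normalisation the scores are spaced symmetrically around zero, so the extreme raw score of 100 no longer dominates the scale.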

Sampling

Test developers select a population, for which the test is intended, that has at least one common, observable characteristic.

Criterion referencing

Tests measure how much of an attribute a person has by reference to a set criterion or cut-off score. E.g., driving test, uni exam. It's possible for everyone to pass, and is common with tests of knowledge, ability, or skill. It is less useful for tests of psychological traits.

Fixed Reference Group Scoring Systems

The distribution of scores obtained on a test from one group is used as a basis for calculation of test scores for future administrations of the test. Scale scores can differ when raw scores are the same.

Various Sources of Error Are a Part of the Assessment Process

The fifth assumption about psychological testing. In everyday conversation, error means mistakes or miscalculations; in assessment, it refers to the factors, other than what you're measuring, that influence performance on a test. Error must be taken into account in any assessment. Remember, assessors can be a source of error, but error can also be random.

Psychological Traits and States Exist

The first assumption about psychological testing. A trait is based on observable behaviour, and cultural evolution can bring trait terms in and out of existence. Traits exist as constructs: unobservable concepts used to describe or explain behaviour, whose existence can be inferred from overt behaviour. They are relatively enduring, but they don't appear in behaviour 100% of the time and can be situation dependent. Variation can also come from the strength of the trait.

Tests and Other Measurement Techniques Have Strengths and Weaknesses

The fourth assumption about psychological testing. It is important to understand how a test was developed, the appropriate circumstances in which to administer the test, to whom the test should be administered, and how test results should be interpreted. Appreciating the limitations of a test is important, as well as how these limitations can be compensated for.

Standardisation

The process of administering a test to a representative sample of test-takers for the purpose of establishing norms.

Normative Sample

The reference group to which test-takers are compared. Therefore, the results of your test depend on who you're comparing them to.

Traits and States can be Quantified and Measured

The second assumption about psychological testing. Different test developers may define and measure constructs in different ways; once a construct is defined, test developers turn to item content and item weighting.

Testing and Assessment Benefit Society

The seventh assumption about psychological testing. Many critical decisions within society are based on testing and assessment procedures (e.g., employment, schooling etc).

Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner

The sixth assumption about psychological testing. This assumption can be controversial and questionable. The problems with test fairness can be psychometric or political. We must ask ourselves, what do we, as a society, wish to accomplish by the use of this test/assessment? Remember that tests are tools; they can be used both properly and improperly.

Norms

The test performance data of a particular group of test-takers that are designed for use as a reference when evaluating or interpreting individual test scores. They are derived through standardised testing and sampling.

Test-Related Behaviour Predicts Non-Test-Related Behaviour

The third assumption about psychological testing. The results in the lab must predict the results outside of the lab. The obtained behaviour from a test would be used to make predictions about future behaviour, or to understand past behaviour.

Psychological States

These distinguish one person from another but are relatively less enduring than psychological traits.

Ipsative Referencing

This type of referencing compares an individual's score to their own previous score. It's common in coaching, therapy and education. Interpretation: personal bests, progress and improvement.

Reliability, Validity and Other considerations

What makes a 'good test'?

Culture

When interpreting test results, it helps to know about the _______ and era of the test taker.

Norm-Referenced vs. Criterion-Referenced Interpretation

____ __________ tests involve comparing individuals to the normative group. With _________ __________ tests, test-takers are evaluated as to whether they meet a set standard (e.g., driving exam).

Observed Score

________ _____ = true score + error (X = T + E)

Standard Scores

z = (X - M) / SD. Using ________ ______ instead of percentiles overcomes the distribution issues. A benefit of using ________ ______ is that they immediately tell us the distance and direction of a score from the mean of the normative sample. They allow comparisons of scores within tests, across tests, and between participants on the same or different tests.
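A short sketch of the formula in use, comparing one child's results on two tests that have different scales. The function name, the test names, and the norm values are hypothetical.

```python
def z_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """z = (X - M) / SD, using the mean and SD of the normative sample."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - norm_mean) / norm_sd


# Hypothetical norms: maths test (M = 50, SD = 10), reading test (M = 100, SD = 15).
maths_z = z_score(65, 50, 10)
reading_z = z_score(92.5, 100, 15)

# The sign gives the direction from the mean, the size gives the distance in SDs.
print(maths_z, reading_z)  # 1.5 -0.5
```

The raw reading score (92.5) is higher than the raw maths score (65), yet the z scores show the child is 1.5 SDs above the mean in maths and half an SD below the mean in reading.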

