PSYCH ASSESSMENT 106


SOURCES OF ERROR VARIANCE

-Assessee -Assessor -Measuring instruments

TYPES OF NORMS

1. Age norms 2. Grade norms 3. National norms 4. National anchor norms 5. Local norms 6. Norms from a fixed reference group 7. Subgroup norms 8. Percentile norms

2 types of sampling procedure

1. Purposive sampling 2. Incidental sampling

Sources of Error Variance

1. Test construction, administration, scoring, and/or interpretation

Sources of error variance during test administration

1. Test environment, such as room temperature, level of lighting, and amount of ventilation and noise. 2. Test taker variables, such as pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication. 3. Formal learning experiences, casual life experiences, therapy, illness, and changes in mood or mental state. 4. Body weight can be a source of error variance. 5. Examiner-related variables are potential sources of error variance. Scorers and scoring systems are potential sources of error variance.

Basic 3 approaches to the estimation of reliability

1. Test-retest 2. Alternate or parallel forms 3. Internal or inter-item consistency

- also known as age-equivalent scores, indicate the average performance of different samples of test takers who were at various ages at the time the test was administered

AGE NORMS

are simply different versions of a test that have been constructed so as to be parallel.

ALTERNATE FORMS

refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error or other error.

Alternate forms reliability

an informed, scientific concept developed or constructed to describe or explain behavior. We can't see, hear, or touch constructs, but we can infer their existence from overt behavior.

CONSTRUCT

a standard on which a judgment or decision may be based.

CRITERION

is also referred to as the true score model of measurement. It is the most widely used and accepted model in the psychometric literature today.

Classical test theory

states that a score on an ability test is presumed to reflect not only the test taker's true score on the ability being measured but also error

Classical test theory
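
The true score model described above is commonly written as a single equation. The symbols below are conventional notation assumed for illustration, not taken from these notes: X is the observed score, T the true score, and E the error.

```latex
X = T + E
```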

may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard

Criterion-referenced testing and assessment

is designed to provide an indication of where a test taker stands with respect to some variable or criterion such as an educational or a vocational objective.

Criterion-referenced tests

a test item or question that can be answered with only one of two response options such as true or false or yes-no.

Dichotomous test item

The computation of a coefficient of split-half reliability generally entails three steps:

Divide the test into equivalent halves. Calculate a Pearson r between scores on the two halves of the test Adjust the half-test reliability using Spearman-Brown formula.
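
The three steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: it assumes an odd-even split as the way of forming the halves, and the function names and the `responses` data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step 3: adjust a half-test correlation to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per test taker."""
    # Step 1: divide the test into halves (odd-even split assumed here).
    odd_half = [sum(person[0::2]) for person in item_scores]
    even_half = [sum(person[1::2]) for person in item_scores]
    # Step 2: calculate a Pearson r between scores on the two halves.
    r_half = pearson_r(odd_half, even_half)
    # Step 3: apply the Spearman-Brown adjustment.
    return spearman_brown(r_half)


# Hypothetical data: 5 test takers, 4 dichotomously scored items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(split_half_reliability(responses))
```

The adjustment is needed because each half is only half as long as the full test, and shorter tests tend to yield lower reliability estimates.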

- In everyday usage, refers to mistakes, miscalculations, and the like. - In assessment, traditionally refers to something that is more than expected; it is a component of the measurement process. - Refers to the long-standing assumption that factors other than what a test attempts to measure will influence performance on the test.

ERROR

is used as the basis for the calculation of test scores for future administration of the test.

Fixed reference group

- a system of scoring wherein the distribution of scores obtained on the test from one group of test takers is used as the basis for the calculation of test scores for future administrations

Fixed reference group scoring system

are developed by administering the test to representative samples of children over a range of consecutive grade levels

GRADE NORMS

designed to indicate the average test performance of test takers in a given school grade.

GRADE NORMS

- examines how generalizable scores from a particular test are if the test is administered in different situations

Generalizability study

describes the degree to which a test measures different factors.

Heterogeneity

is composed of items that measure more than one trait.

Heterogeneous test

is one source of variance during test construction. It refers to the variation among items within a test as well as to variation among items between tests.

ITEM SAMPLING OR CONTENT SAMPLING

- also referred to as convenience sampling. The process of arbitrarily selecting some people to be part of a sample because they are readily available, not because they are most representative of the population being studied.

Incidental sampling

refers to the degree of correlation among all the items on a scale. A measure of inter-item consistency is calculated from a single administration of a single form of a test.

Inter-item consistency

Is the degree of agreement or consistency between two or more scorers with regard to a particular measure.

Inter-scorer reliability

an estimate of the reliability of a test obtained from a measure of inter-item consistency

Internal consistency estimates of reliability

- refers, collectively, to all of the factors associated with the process of measuring some variable, other than the variable being measured. Categories of measurement error: random error and systematic error.

MEASUREMENT ERROR

are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.

NATIONAL NORMS

- refer to behavior that is usual, average, normal, standard, expected, or typical

NORM

- provide a standard with which the results of measurement can be compared.

NORMS

an equivalency table for scores on two nationally standardized tests designed to measure the same thing.

National anchor norms

refers to an observable action or product of an observable action

OVERT BEHAVIOR

exist when for each form of the test the means and the variances of observed test scores are equal.

PARALLEL FORMS OF A TEST

refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.

PARALLEL FORMS RELIABILITY

refers to the distribution of raw scores, more specifically to the number of items answered correctly, multiplied by 100 and divided by the total number of items.

PERCENTAGE CORRECT

is an expression of the percentage of people whose score on a test or measure falls below a particular raw score. It is a converted score that refers to a percentage of test takers.

PERCENTILE
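
The difference between percentage correct and a percentile can be made concrete with a short sketch. The function names and the sample numbers are hypothetical; the percentile calculation follows the "falls below" wording of the definition above.

```python
def percentage_correct(num_correct, num_items):
    """Items answered correctly, multiplied by 100, divided by total items."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return num_correct * 100 / num_items


def percentile_rank(raw_score, norm_scores):
    """Percentage of people in norm_scores whose score falls below raw_score."""
    below = sum(1 for score in norm_scores if score < raw_score)
    return 100 * below / len(norm_scores)


# Percentage correct describes performance on the items themselves.
print(percentage_correct(45, 60))              # 75.0

# A percentile describes standing relative to other test takers.
print(percentile_rank(30, [10, 20, 30, 40]))   # 50.0
```

The first value depends only on the test taker's own answers; the second changes whenever the comparison group changes.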

are the raw data from a test's standardization sample converted to percentile form.

PERCENTILE NORMS

- a test item or question with three or more alternative responses where only one alternative is scored correct or scored as being consistent with a targeted trait or other construct

Polytomous test items

- tests in which the time limit is long enough to allow test takers to attempt all items, and in which some items are so difficult that no test taker is able to obtain a perfect score

Power tests

Assumption #1

Psychological Traits and States Exist

Methods of obtaining internal consistency estimates of reliability

SPLIT-HALF ESTIMATE

norms for any defined group within a large group.

Subgroup norms

is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test. It is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait.

TEST-RETEST RELIABILITY

is presumed to represent the strength of the targeted ability or trait or state and is frequently based on cumulative scoring

THE TEST SCORE

any distinguishable, relatively enduring way in which one individual varies from another

TRAIT

The greater the proportion of the total variance attributed to true variance, the more reliable the test

TRUE

Assumption #3

Test-Related Behavior Predicts Non-Test-Related Behavior. Patterns of answers to true-false questions on one widely used test of personality are used in decision making regarding mental disorders. The tasks in some tests mimic the actual behaviors that the test user is attempting to understand.

Assumption #4

Tests and Other Measurement Techniques Have Strengths and Weaknesses. Competent test users understand a great deal about the tests they use. They understand, among other things, how a test was developed, the circumstances under which it is appropriate to administer the test, how the test should be administered and to whom, and how the test results should be interpreted. Competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources.

- allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.

The Spearman-Brown formula

Assumption #5

Various Sources of Error Are Part of the Assessment Process

variance from true differences

true variance

a relatively new measure for evaluating the internal consistency of a test. It focuses on the degree of difference that exists between item scores.

Average proportional distance (APD)

refers to the component of the observed test score that does not have to do with the test taker's ability.

Error

the assumption is made that each test taker has a true score on a test that would be obtained but for the action of measurement error.

Classical test theory or true score theory

- developed by Cronbach and elaborated on by others

Coefficient alpha

the mean of all possible split-half correlations

Coefficient alpha
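
Coefficient alpha is usually computed from item and total-score variances rather than by averaging split-half correlations directly. The sketch below uses the standard variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), which is brought in here as an assumption because it is not stated in these notes. The function name and the data are hypothetical.

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """item_scores holds one list of item scores per test taker."""
    k = len(item_scores[0])                 # number of items on the test
    columns = list(zip(*item_scores))       # regroup the scores by item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical data: 5 test takers, 4 dichotomously scored items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(coefficient_alpha(responses))
```

Sample variance is used for both the items and the totals; any variance estimator gives the same alpha as long as it is applied consistently to both.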

the term often used for the estimate of test-retest reliability when the interval between testings is greater than six months

Coefficient of stability

also referred to as criterion-referenced or domain-referenced testing and assessment. A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard; contrast with norm-referenced testing and assessment.

Content-referenced testing and assessment

- seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score. Under this theory, a test's reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample.

Domain sampling theory

- the component of a test score attributable to sources other than the trait or ability measured

ERROR VARIANCE

the equivalency of scores on different tests is calculated with reference to corresponding percentile score

Equipercentile method

another alternative to the true score model. It is also referred to as latent-trait theory or the latent-trait model, a system of assumptions about measurement and the extent to which each test item measures the trait.

Item response theory (IRT)

- provide normative information with respect to the local population's performance on some test.

Local norms

are the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individuals' test scores

Norm in the psychometric context

a way to derive meaning from test scores. An approach that evaluates a test score in relation to other scores on the same test

Norm-referenced

a method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to the scores of a group of test takers. A common goal of norm-referenced tests is to yield information on a test taker's standing or ranking relative to some comparison group of test takers.

Norm-referenced testing and assessment

Assumption #2

Psychological Traits and States Can Be Quantified and Measured. Measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results. The test score is presumed to represent the strength of the targeted ability or trait or state and is frequently based on cumulative scoring.

the arbitrary selection of people to be part of a sample because they are thought to be representative of the population being studied

Purposive sampling

- is a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process

RANDOM ERROR

A good test or, more generally, a good measuring tool or procedure is reliable. The criterion of reliability involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements. In theory, the perfectly reliable measuring tool consistently measures in the same way.

RELIABILITY

refers to the proportion of the total variance attributed to true variance

RELIABILITY
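
The variance-ratio definition of reliability can be written out as follows. The symbols are conventional notation assumed for illustration, and the decomposition assumes, as classical test theory does, that true scores and errors are uncorrelated.

```latex
\sigma^2 = \sigma^2_{\mathrm{true}} + \sigma^2_{\mathrm{error}},
\qquad
r_{xx} = \frac{\sigma^2_{\mathrm{true}}}{\sigma^2}
```

Under these assumptions, a test whose total variance is 10 and whose true variance is 8 would have a reliability of 0.8.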

also referred to as inflation of variance; a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used, so that the resulting correlation coefficient tends to be higher. Contrast with restriction of range.

Restriction or inflation of range

a portion of the universe of people deemed to be representative of the whole population

SAMPLE

the process of selecting the portion of the universe deemed to be representative of the whole population.

SAMPLING

- distinguish one person from another but are relatively less enduring than traits. Like a trait, a state exists only as a construct.

STATES

-refers to a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

SYSTEMATIC ERROR

- some defined group as the population for which the test is designed.

Sampling

generally contain items of uniform level of difficulty so that when given generous time limits all test takers should be able to complete all the test items correctly

Speed tests

is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.

Split-half reliability

is the process of administering a test to a representative sample of test takers for the purpose of establishing norms.

Standardization or test standardization

a generalizability study examines how much of an impact different facets of the universe have on the test score.

Stated in the language of generalizability theory

a trait, state, or ability presumed to be relatively unchanging over time; contrast with dynamic.

Static characteristics

is the process of developing a sample based on specific subgroups of a population

Stratified sampling

is the process of developing a sample based on specific subgroups of a population in which every member has the same chance of being included in the sample.

Stratified-random sampling
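
Stratified-random sampling can be sketched as random selection within each subgroup at a common sampling fraction. This is a minimal illustration under that assumption; the function name, its parameters, and the example population are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Randomly draw the same fraction of members from every stratum.

    stratum_of is a function mapping a member to its subgroup label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = []
    for members in strata.values():
        # Rounding means small strata may be slightly over- or under-sampled.
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 members of subgroup "A" and 20 of subgroup "B".
population = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
sample = stratified_random_sample(population, lambda m: m[0], 0.1, seed=0)
print(len(sample))
```

Using one fraction for every stratum keeps the subgroup proportions of the population and gives each member approximately the same chance of selection.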

Assumption #7

Testing and Assessment Benefit Society. In a world without tests, teachers and school administrators could arbitrarily place children in different types of special classes simply because that is where they believed the children belonged. In a world without tests, there would be a great need for instruments to diagnose educational difficulties in reading and math and point the way to remediation. The criteria for a good test would include clear instructions for administering, scoring, and interpretation. It would also seem to be a plus if a test offered economy in time and money it took to administer, score, and interpret it.

Assumption #6

Testing and Assessment can be conducted in a fair and unbiased manner. Today all major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual. One source of fairness-related problems is the test user who attempts to use a particular test with people whose background and experience are different from the background and experience of people for whom the test was intended

True

The extent to which a test taker's score is affected by the content sampled on a test and by the way the content is sampled is a source of error variance.

a value that, according to classical test theory, genuinely reflects an individual's ability level as measured by a particular test.

True score

A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure. A test of reaction time is a valid test if it accurately measures reaction time. A test of intelligence is a valid test if it truly measures intelligence.

VALIDITY

The degree of relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the

coefficient of equivalence

The influence of particular facets on the test score is represented by

coefficients of generalizability. In the decision study, developers examine the usefulness of test scores in helping the test user make decisions.

- is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.

dynamic characteristic

variance from irrelevant, random sources

error variance

Split-half reliability is also referred to as

odd-even reliability

Tests are said to be homogeneous if

they contain items that measure a single trait.

The simplest way of determining the degree of consistency among scorers in the scoring of a test is

to calculate a coefficient of correlation: the coefficient of inter-scorer reliability

A statistic useful in describing sources of test score variability is the

variance: the standard deviation squared.
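
The relationship between the variance and the standard deviation can be checked numerically. This is a small sketch using Python's standard `statistics` module; the scores are hypothetical, and the population forms of both statistics are used.

```python
from statistics import pstdev, pvariance

# Hypothetical set of test scores (mean = 5).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd = pstdev(scores)          # population standard deviation
var = pvariance(scores)      # population variance

print(sd, var)               # the variance equals sd squared
```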

