PSYC 4100: Chapter 6 What is test reliability/precision?

consistency, inconsistency

The basic idea is that test scores reflect two sorts of factors: factors that contribute to _________ and factors that contribute to __________

same testing situation

Carefully following all of the instructions for administering a test ensures that all test takers experience the ____ _______ ________ each time the test is given

Random error

Caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process, for example, random, individually experienced events such as illness or a test interruption.

likely to exist

Confidence intervals give us a realistic estimate of how much error is _______ ___ _____ in an individual's observed score

homogeneous

The length of the test influences its reliability: the more ____________ questions the respondent answers, the more information the test yields about the concept it is designed to measure

inter-rater

if a test involves multiple raters, to check reliability one should use

internal consistency

if a test is used to measure only one construct, reliability may be measured using methods of

Reliability coefficient

is simply a Pearson product-moment correlation coefficient applied to test scores

The coefficient alpha formula

is used for test questions, such as rating scales, that have more than two possible responses rather than being scored simply right or wrong

measurement error

more _________ ______ = reduced reliability in the test score

heterogeneous tests

tests measuring more than one trait or characteristic

scorer reliability/interscorer agreement

the amount of consistency among scorers' judgements

observed score

the calculated confidence interval is almost always centered on an _________ ______, not a true score

coefficient alpha

the most commonly applied estimate of a multiple item scale's reliability; represents the average of all possible split-half reliabilities for a construct
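
As a rough illustration of the definition above, the usual textbook formula for coefficient alpha is k/(k-1) times (1 minus the sum of the item variances divided by the variance of the total scores). The sketch below assumes that formula and uses population variances; the function name and the rating data are hypothetical.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Coefficient alpha for a table of scores.

    item_scores: one row per test taker, one column per item.
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item (column) across test takers.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each test taker's total score.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 ratings from five test takers on three items.
ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = coefficient_alpha(ratings)
```

With these made-up ratings alpha comes out a little above 0.93, which reflects how closely the three items track one another.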

r

the symbol used to represent a correlation coefficient

Parallel forms (or alternate forms)

two forms of the test that are as much alike as possible

correct decision

The more confident we are that an observed score on a test is really close to the person's true score, the more comfortable we can be that we are making a ______ ________ about the meaning of the score

higher

The number of questions on a test is directly related to reliability -- more questions = _________ reliability (given the items are equivalent in content and difficulty)

consistency

The term reliability/precision describes the ________ of test scores

alternate forms

Their practical advantage is that they can also be used as pretests and posttests if desired

split-half method

divide the test into halves and then compare the set of individual test scores on the first half with the set of individual test scores on the second half; measure of internal consistency

stable characteristics of the individual

factors that contribute to consistency

features of the situation

factors that contribute to inconsistency

test itself, administration, scoring, test takers

four sources of error that influence reliability

Cohen's kappa

a measure of interrater or intercoder reliability between two raters or coders; the test yields the percentage of agreement and the probability of error.
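
A minimal sketch of how kappa can be computed for two raters, assuming the standard formula (observed agreement minus chance agreement, divided by 1 minus chance agreement). The function name and the pass/fail ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each rated the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail decisions by two raters on ten test takers.
rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail", "fail", "pass"]
kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on 8 of 10 cases (0.80), chance agreement is 0.52, and kappa is about 0.58, noticeably lower than the raw percentage of agreement.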

Coefficient alpha

describes the extent to which questions on a test or subscale are interrelated

practice effects

A limitation of test-retest is that test takers may score differently because of ________ _________. These occur when test takers benefit from taking the test the first time (practice), which enables them to solve problems more quickly and correctly the next time

internal consistency method

A measure of how related the items (or groups of items) on the test are to one another

approximately the same way

A reliable test is one we can trust to measure each person in ____________ ____ ______ ______ every time it is used. (Strictly, we refer to reliable scores rather than reliable tests: reliability concerns scores and their interpretation, and a test can be used incorrectly.)

test-retest

A test developer gives the same test to the same group of test takers on two different occasions, then correlates the scores from the first and second administrations
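
The correlation step can be sketched as follows, assuming the reliability coefficient is the Pearson product-moment correlation between the two sets of scores. The helper name and the scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for five test takers on two occasions.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
test_retest_r = pearson_r(first_administration, second_administration)
```

For these made-up scores the coefficient is roughly 0.98: test takers keep nearly the same rank order across the two administrations.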

increase

Adding more questions that measure the same trait can ________ a test's reliability

KR-20, coefficient alpha

Another way of measuring internal consistency is to compare individuals' scores on all possible ways of splitting the test into halves, which can be done using either

68

Approximately ___% of the observed test scores (X) would be within ±1 SEM of the true score (T).

95

Approximately ___% of the observed test scores (X) would be within ±2 SEM of the true score (T).

testing conditions, test instructions

Effective testing practices decrease the chances that test takers' scores will be contaminated with error due to poor _________ _______ or poor _____ _____________

homogeneous

Estimating reliability using methods of internal consistency is appropriate only for __________ tests

quantify, variation

Generalizability theory allows you to ________ each possible source of __________ so that you can determine whether the results obtained are likely to generalize to scores evaluated by different raters or in different situations

systematic

Generalizability theory proposes separating sources of systematic error from random error to eliminate ________ error

truly equivalent

The greatest danger in using alternate forms is whether they are ________ __________

test-retest

If the test will be used more than once, for reliability use

true-score variance, total observed-score variance

In classical test theory, reliability is defined as ________ ____________ divided by ________ _________________ ________
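
Written out as a formula, the definition above looks like this. The symbols are assumptions chosen for illustration: X is the observed score, T the true score, E the error, and r_xx the reliability coefficient; the last equality additionally assumes that true scores and errors are uncorrelated.

```latex
X = T + E, \qquad
r_{xx} = \frac{\sigma_T^{2}}{\sigma_X^{2}}
       = \frac{\sigma_T^{2}}{\sigma_T^{2} + \sigma_E^{2}}
```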

homogeneous subtest, factor

It isn't appropriate to calculate an overall estimate of internal consistency when the test is heterogeneous; instead, the developer should calculate and report an estimate of internal consistency for each _________ ________ or _______

valid

Just because a test is reliable does not make it ________; that is, it does not mean that the inferences being made from the test scores are correct or that the test is being used properly

well-defined characteristics

It is much easier to develop alternate forms for _____________ ___________ (e.g., math ability) than for personality traits

intraclass correlation coefficient

One statistical way to evaluate inter rater reliability is the

instructions, constancy, effective practices

Proper test administration affects reliability in three ways

Alternate Forms method

Psychologists may give two forms of the same test, designed to be as much alike as possible, to the same people, administering them as close together in time as possible

standard error of measurement (SEM)

Psychologists use this as an index of the amount of inconsistency or error expected in an individual's observed test score
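
A minimal sketch of the calculation, assuming the standard classical test theory formula SEM = SD × sqrt(1 − reliability), which the cards do not state explicitly. The function name, the standard deviation of 15, and the reliability values are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(score_sd, reliability):
    """SEM from the standard deviation of test scores and a reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * sqrt(1.0 - reliability)

# Hypothetical test with a score standard deviation of 15.
sem_high_reliability = standard_error_of_measurement(15, 0.91)  # about 4.5
sem_low_reliability = standard_error_of_measurement(15, 0.64)   # about 9.0
```

The two calls show the inverse relationship: as reliability drops from 0.91 to 0.64, the SEM doubles.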

cancel themselves out

Random errors that may occur in one test will actually _______ ___________ ____ over an infinite number of testing occasions

unpredictable

Random measurement error affects each individual's score in an ____________ way

true score

Represents the score that would be obtained if the individual took a test an infinite number of times and the scores were then averaged across all of those administrations

accuracy

Systematic error affects the _________ of a measurement

haphazardly

Test administrators need to be aware of individuals who complete the test in an unusually short amount of time -- they may have answered ___________, on purpose or by mistake

estimate, differ

The SEM is an ______ of how much the individual's observed test scores (X) might _______ from the individual's true score (T)

longer, shorter

The Spearman-Brown formula is also helpful to test developers who wish to estimate how the reliability/precision of a test would change if the test were made either _________ or _________

wider, more

The _______ the confidence interval, the ______ measurement error is present in the test score

random assignment

The best way to divide the test in split-half is to use ________ __________ to place each question in one half or the other
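
One way this could look in practice, as a sketch only: item positions are shuffled with a seeded random generator, each half is summed per test taker, and the two half-scores are correlated. The function names, the seed, and the data layout (one row per test taker) are assumptions; the result is the uncorrected half-test correlation.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

def random_split_half(item_scores, seed=0):
    """Randomly assign items to two halves and correlate the half scores."""
    k = len(item_scores[0])
    indices = list(range(k))
    random.Random(seed).shuffle(indices)  # random assignment of items
    half_a, half_b = indices[: k // 2], indices[k // 2:]
    scores_a = [sum(row[i] for i in half_a) for row in item_scores]
    scores_b = [sum(row[i] for i in half_b) for row in item_scores]
    return pearson_r(scores_a, scores_b)

# Hypothetical right/wrong (1/0) answers from five test takers on six items.
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
half_test_r = random_split_half(answers, seed=42)
```

Fixing the seed only makes the split reproducible; a different seed gives a different, equally legitimate split and usually a slightly different coefficient.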

test-retest

This method is only appropriate when test takers are not likely to learn something the first time they take the test that can affect their scores on the second administration, or when the interval between administrations is long enough to prevent these effects

index of the strength

To describe our estimates of reliability/precision of test scores, we use correlation to provide an _______ ____ _____ __________ of the relationship between two sets of test scores

respond

Treating all test takers in the same way decreases errors that arise from creating differences in the way individuals __________

shorter

Using split-half method means we are correlating the scores on two _______ versions of the test

sign, number itself

We look at a correlation coefficient in two ways to interpret its meaning

Systematic error

When a single source of error always increases or decreases the true score by the same amount

difference in true scores

When confidence intervals around the true scores overlap, you may not be sure that differences in observed test scores actually correspond to ________ __ ____ _______ (and you may have to consider the two scores equivalent for decision-making purposes)

prophecy

When doing split-half reliability tests, we use the Spearman-Brown formula, also called the __________ formula, to estimate what the reliability coefficient would be if the test had not been cut in half but instead were its original length

high, low

When reliability of test scores is _____, then the SEM is ____

overlap, equivalent differences

When true-score confidence intervals for two different scores ________, it means that you cannot be sure that the observed scores' differences reflect __________ __________ in true scores

Spearman-Brown formula

When using split-half method, we must mathematically adjust the reliability coefficient to compensate for the impact of splitting the test into halves; for this we use the
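
The adjustment can be sketched as below, assuming the general Spearman-Brown form r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test length changes (n = 2 for the split-half correction). The function name and the example coefficients are hypothetical.

```python
def spearman_brown(reliability, length_factor=2.0):
    """Estimated reliability after changing test length by length_factor.

    length_factor=2 steps a half-test correlation up to full length;
    values below 1 estimate the effect of shortening a test.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Split-half correction: a half-test correlation of 0.60 (hypothetical).
full_length_estimate = spearman_brown(0.60)        # 1.2 / 1.6, about 0.75
# Cutting a test with reliability 0.80 to half its length (hypothetical).
shortened_estimate = spearman_brown(0.80, 0.5)     # 0.4 / 0.6, about 0.67
```

The same function therefore covers both uses: correcting a split-half coefficient upward and projecting how reliability changes when a test is lengthened or shortened.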

internal consistency

Whether knowledge of how a person answered one item can give information to help you predict how they answered on another test item

raw scores, larger SEM

____ _______ near the mean of the score distribution tend to have a _______ _____ than very high or very low scores, but scaled scores that have been transformed from raw scores for easier interpretation can sometimes show the opposite pattern

practice, order

______ and ______ effects can add systematic as well as random error to test scores

KR-20

______ is a substitute for coefficient alpha when test items are scored dichotomously (true/false). (Used to estimate internal consistency reliability)
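
A minimal sketch, assuming the usual KR-20 formula: k/(k-1) times (1 minus the sum of p·q across items divided by the variance of total scores), where p is the proportion answering an item correctly and q = 1 − p. Population variance is used; the function name and the response data are hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for dichotomously scored items (1 = correct, 0 = incorrect).

    responses: one row per test taker, one column per item.
    """
    n = len(responses)
    k = len(responses[0])
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion correct on item i
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical right/wrong answers from five test takers on four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
kr20_estimate = kr20(responses)
```

For this made-up data the sum of p·q is 0.8, the total-score variance is 2, and the estimate works out to 0.8.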

random, systematic

_______ error reduces reliability; _________ error does not

heterogeneous

____________ tests can be expected to have lower reliability coefficients

Cohen's kappa

a popular index of agreement

Confidence interval

a range of scores that we feel confident will include the test taker's true score
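
As an illustration, an interval can be built by stepping a chosen number of SEMs either side of the observed score (about 68% for ±1 SEM and about 95% for ±2 SEM). The function names, the scores, and the SEM of 4.5 are hypothetical.

```python
def confidence_interval(observed_score, sem, n_sems=2):
    """Interval centered on the observed score, n_sems SEMs wide on each side."""
    margin = n_sems * sem
    return (observed_score - margin, observed_score + margin)

def intervals_overlap(first, second):
    """True when two (low, high) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]

# Two hypothetical test takers on a test with an SEM of 4.5.
interval_a = confidence_interval(100, 4.5)   # (91.0, 109.0)
interval_b = confidence_interval(106, 4.5)   # (97.0, 115.0)
scores_may_be_equivalent = intervals_overlap(interval_a, interval_b)
```

Because the two intervals overlap, the 6-point gap in observed scores may not reflect a real difference in true scores.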

homogeneous test

a test that measures one trait or characteristic

Interrater agreement

an index of how consistently the scorers rate or make decisions

interval lengthens

as the ______ __________ the test-retest reliability will decline because the number of opportunities for the test takers or testing situation to change increases over time

coefficient alpha

basically splits the test in half in every possible way

order effects

changes in test scores resulting from the order in which the tests were taken; half of the test takers receive Form A first, the other half Form B first

Generalizability theory

concerns how well and under what conditions we can generalize an estimation of reliability/precision of test scores from one test administration to another

normally

random error is assumed to be ________ distributed

Homogeneity

refers to whether the questions measure the same trait or dimension

variance of true scores

reliability reflects the proportion of the total observed variance in the test scores that is attributable to the

decreases

shortening a test ________ its reliability

Measurement error

variations in measurements due to random mistakes or inconsistencies of the person

reliability coefficient

when we are referring to the results of the statistical evaluation of reliability, the term ___________ ___________ is preferred

Intrascorer reliability

whether a particular scorer is consistent in the way they assign scores from test to test

