Reliability and validity of measurement

systematic error (aka method error)

Instrument errors
-e.g., lack of calibration
Investigator/interviewer biases
-Desire for favorable patient outcomes
-Personality and beliefs of the interviewer
-Not following standard procedures
Subject biases
-Social desirability: subjects say things they think you want to hear
-Hawthorne effect: the act of observing behavior may change the behavior
-Increased familiarity/comfort with measurement: elevated BP when first measured by a physician decreases as the patient becomes more comfortable with the medical setting

when is intertester reliability best assessed

Intertester reliability is best assessed when all testers are able to measure the variable during the same trial, having observed the subject's performance simultaneously and independently. Many research protocols use only one tester, both to avoid the need to establish intertester reliability and to provide more consistent measurement.

Criterion-Related Validity - Predictive Validity

"form of criterion-based validity in which an inferred interpretation is justified by comparing a measurement with supporting evidence that is obtained at a later point in time; examines the justification of using a measurement to say something about future events or conditions" Measurement can be used to predict some future event, measurement SAT tests are thought to have predictive validity for forecasting future academic success in college Predictive validity of measurements of smoking, weight, and exercise habits shown by development of CAD in individuals with these factors

validity

"the degree to which a useful (meaningful) interpretation can be inferred from a measurement" Emphasis on the ability to make inferences from the measurements

true score continued

Because no measurement is exact or free from error, every observed score on any measuring instrument is made up of 2 quantities: the true score and measurement error. True score = the score that would be obtained if there were no errors of measurement. It is conceived of as a hypothetical, unobservable quantity that cannot be directly measured: the average score that would be obtained if the person were remeasured an infinite number of times on that variable. No single measurement would pinpoint the true score exactly, but the average of an infinite number of repeated measurements would be equal to the true score.

content validity is usually concerned with...

Concerned with sample-population representativeness, i.e., the knowledge and skills covered by the test items should adequately sample the universe of content that defines the variable(s) being tested. Usually established by agreement of content experts.
You have in all likelihood taken some exams that seemed to you to be poor assessments of your mastery of the subject, even though you personally might have done well on them; and you have no doubt taken other exams that seemed to you to be good assessments, even though you personally might not have done well on them. In the language of students, it is the distinction between a "fair" exam and an "unfair" exam. A fair exam is one that actually measures what it purports to measure, namely, the student's knowledge and understanding of the subject matter; an unfair exam is one for which the student's score substantially reflects something other than knowledge and understanding, for example, the student's ability to spot and deal with trick questions, to remember picayune details mentioned by the instructor or the textbook, or to adhere to some particular theoretical or ideological party line favored by the instructor.
Determination of content validity is essentially a *subjective process*. Tests are usually reviewed by a panel of "experts" who determine if the test satisfies the content domain. There are no statistical indices that assess content validity.

random error

Differences from the true score due to:
-Mood
-Motivation
-Fatigue
-Inattention
-Noise
-Lighting
-Coding errors
Because the effect is random, sometimes these factors will cause the score to increase and sometimes to decrease. Over the long run we expect random error to average out to 0; there is nothing we can do to control for random error. Error makes our observed score different from the true score, but over time the effects cancel out, resulting in: E(e) = 0 and E(O) = E(T)
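
A minimal numeric sketch of these two expectations (hypothetical numbers: a true systolic BP of 120 mmHg and purely random error):

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 120.0                       # hypothetical true systolic BP (mmHg)
random_error = rng.normal(0, 5, 10_000)  # random error: mean 0, SD 5
observed = true_score + random_error     # X = T + e for each repeated measurement

print(f"mean error    = {random_error.mean():.2f}")  # close to 0 -> E(e) = 0
print(f"mean observed = {observed.mean():.2f}")      # close to 120 -> E(O) = E(T)
```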

Construct Validity - Convergent Validity

Measurements believed to reflect the same underlying phenomenon will yield similar results, or will be related to (correlate with) each other

Statistical Measures of Reliability: Intraclass Correlation Coefficient

Single index to assess the correlation and agreement of quantitative measurements between different sets of scores
-->Used with scores on interval/ratio and ordinal scales of measurement
-->Range: 0.00 (random) to 1.00 (perfect)
Because reliability is a characteristic of measurement obtained to varying degrees, the clinician/researcher must determine 'how much' reliability is needed to justify the use of the measurement. General guidelines: > 0.75 indicates good reliability; < 0.75 indicates poor reliability.
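
The notes do not specify which ICC model is intended; as an illustration only, here is a minimal sketch of the two-way random-effects form ICC(2,1) (Shrout & Fleiss), using hypothetical ratings from 3 testers on 5 subjects:

```python
import numpy as np

# Hypothetical ratings: 5 subjects (rows) scored by 3 testers (columns).
scores = np.array([
    [9.0, 10.0, 9.5],
    [7.5,  8.0, 7.0],
    [6.0,  6.5, 6.5],
    [8.5,  9.0, 8.0],
    [5.0,  5.5, 5.0],
])

n, k = scores.shape
grand_mean = scores.mean()

# Two-way ANOVA mean squares (subjects = rows, testers = columns).
ms_rows = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2) / (n - 1)
ms_cols = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2) / (k - 1)
ss_total = np.sum((scores - grand_mean) ** 2)
ms_error = (ss_total
            - (n - 1) * ms_rows
            - (k - 1) * ms_cols) / ((n - 1) * (k - 1))

# ICC(2,1): two-way random effects, absolute agreement, single measurement.
icc = (ms_rows - ms_error) / (
    ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
)

print(f"ICC(2,1) = {icc:.2f}")  # > 0.75 would suggest good reliability
```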

types of reliability of measurement

1. Test-retest
2. Intra-tester
3. Inter-tester
4. Internal consistency
5. Parallel (alternate) forms

Construct Validity - Discriminant Validity

Constructs that theoretically should not be related to each other are, in fact, observed to be unrelated. Measurements discriminate between individuals with the trait and individuals without the trait.

Statistical Measures of Reliability: Kappa Statistic score ranges

A chance-corrected measure of agreement. Used with scores on a nominal scale. Range: 0.00 (no more than chance agreement) to 1.00 (perfect agreement).
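
A minimal sketch of the chance correction, assuming hypothetical nominal ratings ("normal"/"abnormal") from two testers on 10 subjects:

```python
# Hypothetical nominal ratings from two testers on the same 10 subjects.
rater_a = ["normal", "abnormal", "normal", "normal", "abnormal",
           "normal", "abnormal", "normal", "normal", "abnormal"]
rater_b = ["normal", "abnormal", "normal", "abnormal", "abnormal",
           "normal", "normal", "normal", "normal", "abnormal"]

categories = sorted(set(rater_a) | set(rater_b))
n = len(rater_a)

# Observed agreement: proportion of subjects on which the raters agree.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected chance agreement from each rater's marginal category proportions.
p_expected = sum(
    (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"kappa = {kappa:.2f}")  # 0.00 = chance-level agreement, 1.00 = perfect
```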

content validity

the extent to which a measurement is judged to reflect the meaningful elements of a construct and not any extraneous elements An instrument is said to have content validity if it covers all parts of the universe of content and reflects the relative importance of each part

Measurement error

The extent to which the observed score varies from the true score due to other variables that impact the observed score. These errors may be random or systematic. Measurement errors reduce the reliability of the measurement.

inter-tester reliability

"Consistency or equivalency of measurements when more than one person takes the measurements; indicates agreement of measurements taken by different examiners"

intra-tester reliability

"Consistency or equivalency of measurements when one person takes the measurement; indicates agreement in measurement over time"

test-retest reliability

"The consistency of repeated measurements separated in time; indicates stability over time" Biggest problem with assessing this type of reliability is the memory effect. Time interval between test administrations is important. Should be far enough apart to avoid effects of fatigue, learning or memory, but close enough so that changes have not occurred in the measured variable.

Criterion-Related Validity - Concurrent Validity

"a form of criterion related validity in which an inferred interpretation is justified by comparing a measurement with supporting evidence that was obtained at approximately the same time as the measurement being validated" Concurrent validity of a measurement is evaluated by comparing it to another "criterion" measure taken at relatively the same time (concurrently), so that both measures reflect the same event, or behavior

systematic error over time

Over the long run, these errors are not expected to cancel each other out; therefore, the average error does not equal 0 (E(e) ≠ 0).
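
A short sketch of this contrast with random error, assuming a hypothetical scale that always reads 2 kg heavy:

```python
import numpy as np

rng = np.random.default_rng(1)

true_weight = 70.0                         # hypothetical true body weight (kg)
systematic_error = 2.0                     # miscalibrated scale: always +2 kg
random_error = rng.normal(0, 0.5, 10_000)  # random error still averages to ~0
observed = true_weight + systematic_error + random_error

# The systematic component does not cancel out over repeated measurements:
print(f"mean error = {(observed - true_weight).mean():.2f}")  # ~2.0, not 0
```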

concurrent validity example

A pressure catheter in a heart ventricle provides a direct measure of BP. BP measurements with sphygmomanometers provide indirect measurements of BP and can be shown to have concurrent validity with the direct BP measures. Goniometric measurements of range of motion have concurrent validity because they are associated with joint angles measured from x-rays.

internal consistency reliability

"The extent to which items or elements that contribute to a measurement reflect one basic phenomenon or dimension" In internal consistency reliability we are looking at how consistent the results are for different items for the same construct within the measure. It measures whether several items that propose to measure the same general construct produce similar scores. For example, if a respondent expressed agreement with the statements "I like to ride bicycles" and "I've enjoyed riding bicycles in the past", and disagreement with the statement "I hate bicycles", this would be indicative of good internal consistency of the test.

Parallel (Alternate) Forms Reliability

"the consistency or agreement of measurements obtained with different (alternative) forms of a test; indicates whether measurements obtained with different forms of a test can be used interchangeably" Scores on the two forms should show a high positive correlation Many standardized tests (SAT, GRE, licensing exams (PT has 4 different versions) exist in 2 or more versions, called equivalent, alternate, or parallel forms. Parallel forms of tests are constructed so that the different forms can be used independent of each other and considered equivalent measures

reliability

"the consistency or repeatability of measurements; the degree to which measurements are error-free and the degree to which repeated measurements will agree" A reliable measure is reproducible and precise: Each time it is used it produces the same value (assuming the underlying variable being measured has not changed between tests).

types of validity of measurement

1. Face validity
2. Content validity
3. Construct validity
-->Convergent validity
-->Discriminant validity
4. Criterion-related validity
-->Concurrent validity
-->Predictive validity
-->Prescriptive validity

classic examples of poor intertester reliability

Classic examples of a lack of inter-tester reliability of measurement are the judging of diving, ice skating, gymnastics, etc. Different judges (testers) observe the same performance and, using the same rules for scoring, still assign different scores (measurements) to the same performance.

face validity

Concerned with how a measure or procedure appears to measure what it is supposed to measure.
-Least rigorous form of validity
-More of a 'public relations' issue: is the test credible to the users?
Face validity serves an important purpose, however, in that a test with face validity will be accepted by those who use it and by those who are tested by it. Patients may not be compliant with repeated testing if they don't see how the test relates to their problems or difficulties.

construct validity

Construct validity reflects the ability of an instrument to measure an abstract concept (a construct). Establishing construct validity represents a considerable challenge because constructs are not real; they exist only as concepts that represent an abstract, multidimensional trait and are not directly observable.

true score theory

Measurements are rarely perfectly reliable. All measurements are fallible to some extent because all humans (PTs and patients) and tests respond with some inconsistency. Essentially, true score theory maintains that every measurement (observed score) is an additive composite of two components: the true ability (or the true level) of the respondent on that measure, and measurement error. Observed score (measurement): X = T (true score) + e (error). We observe the measurement -- the score on the test, BP, HR, ROM, body weight, etc. We do not observe what is on the right side of the equation; we assume that there are two components to the right side.

observed score, true score, reliability, random error, systematic error

Observed score: your weight as measured that day on the scale
True score: what your weight really is (we don't really know what it is)
Reliability: the extent to which the test is error-free
Random error: can't be controlled; it is random and it happens
Systematic error: controlling systematic error improves the reliability of our measure

Take home messages

Reliability and validity are properties of the *measurement*, not of the test or instrument used to make the measurement. A measurement is reliable only under certain types of conditions and for certain types of patients/clients (pay close attention to the operational definitions). When reading research reports, look for evidence that the measurements used in the study have appropriate forms of reliability and validity.

