Measurement Quiz 2


Kappa 0.81 - 1.00

Almost Perfect Agreement

Predictive Validity

Determine whether a measure will be a valid predictor of some future criterion score or behavior.

Validity is needed to...

Draw conclusions or make inferences about data/measures.

Random Errors

Due to chance. Can affect the score in unpredictable ways. Random errors will cancel each other out over time, making the average a good estimate of the true score. (EX: patient giving too much or too little effort at varying times during grip strength testing)

Examiner/Rater Biological Variation

Inherent variation in the examiner's senses.

Validity

Is the test measuring what it is supposed to measure? Degree to which a useful/meaningful interpretation can be inferred from a measurement. What is assessed is indeed what is intended to be assessed.

Examiner/Rater Expectation

Knowing the previous value may influence the examiner's perception of the current measured value.

Carryover Effect

Practice or learning during the initial trial alters performance on subsequent trials. Strength measurements can improve following warm-up trials. A series of pretest trials may be given to neutralize the carryover effects & data are collected only after performance has stabilized.

Systematic Errors

Predictable errors of measurement. Over or underestimating a score consistently.

Content Validity

Refers to the adequacy with which the complete universe of content is sampled by a test's items.

Kappa 0.61 - 0.80

Substantial Agreement

Validity is not All-or-None

A characteristic may be present to a greater or lesser degree.

Defective Instrument

A goniometer with a 'sloppy' pivot point would result in measurements with considerable variability.

Standard Error of Measurement

A range of scores within which the true score for a given test is likely to lie.
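
A standard way to compute the SEM (not stated on this card, but widely used): SEM = SD × √(1 − reliability), and an approximate 95% range for the true score is the observed score ± 1.96 × SEM. A minimal Python sketch with made-up values:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from the between-subject SD and a reliability coefficient (e.g., ICC)."""
    return sd * math.sqrt(1.0 - reliability)

def true_score_range(observed: float, sem_value: float, z: float = 1.96):
    """Approximate 95% range within which the true score is likely to lie."""
    return observed - z * sem_value, observed + z * sem_value

# Hypothetical example: grip strength, SD = 6 kg across subjects, ICC = 0.90
s = sem(sd=6.0, reliability=0.90)               # about 1.9 kg
print(round(s, 2), true_score_range(40.0, s))   # ~ (36.3, 43.7) kg for an observed 40 kg
```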

Perform Test-Retest Study

A sample of individuals is subjected to the identical test on two separate occasions, keeping all testing conditions as constant as possible. A researcher must be able to anticipate the inherent stability of the targeted response variable.

Do participants have sufficient diversity to assess the full range of the outcome measure?

A study of reliability should include the full spectrum of people that the outcome measure was designed to test. For example, consider a measure designed to assess a child's ability to engage in elementary school activities. Study participants should include all ages of elementary school children & children with a full range of participation abilities.

Reliable

A valid measure is usually...

Criterion-Related Validity

Ability of test to align with results obtained on an external criterion. Test to be validated (target test) is compared to a gold standard (criterion measure) that is already established & assumed to be valid.

Examiner/Rater Potential Sources of Error

Biological Variation; Instructions; Expectation

Patient Sources of Measurement Error

Biological Variation; Motivation

Measuring Instrument Potential Sources of Error

Calibration/Scale; Defective Instrument

Validity is not an immutable characteristic of a measurement

Can be evaluated within the context of an instrument's intended use, such as the population or setting to which it will be applied.

Criterion Related Validity Two Basic Approaches

Concurrent Validity; Predictive Validity

Key Questions About Quality for Studies of Outcome-Measures Reliability

Consider the number of participants being rated & the number of raters. Does the study have sufficient sample size? Do participants have sufficient diversity to assess the full range of the outcome measure? Are study participants stable in the characteristic of interest? Is there evidence that the outcome measure was conducted in a reasonably consistent manner between assessments? Were raters blinded to the scores of previous participants? Is the time between assessments appropriate?

Types of Validity

Criterion-Related Validity; Content Validity; Construct Validity

Responsiveness

Ability to detect change in scores; sensitivity to change. Must be good enough to measure small changes that are clinically important. A longitudinal construct of the assessment process.

Reliability - Measurement Error

Difference between the true value & the observed value (OV = TV ± ME, where ME is noise). We would have no way of knowing exactly how much error was included in each of the measured values. We never really know a true value.
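
A small simulation can illustrate OV = TV ± ME and why random error averages out: repeated noisy observations scatter around the true value, and their mean approaches it. This is only a sketch; the true value and noise level are made up.

```python
import random

random.seed(1)

TRUE_VALUE = 120.0   # hypothetical true systolic blood pressure
NOISE_SD = 5.0       # spread of the random measurement error (ME)

def observe() -> float:
    """Observed value = true value +/- random measurement error (noise)."""
    return TRUE_VALUE + random.gauss(0.0, NOISE_SD)

observations = [observe() for _ in range(50)]
mean_obs = sum(observations) / len(observations)
print(f"single observation: {observations[0]:.1f}")
print(f"mean of 50 observations: {mean_obs:.1f} (close to the true value {TRUE_VALUE})")
```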

Environment Potential Sources of Error

Disruptive Environment; Inadequate Lighting

Has the tool been measured in a population similar to mine?

Do the study samples have the same condition? Is the study sample similar in disease severity? Is the study sample similar in disease-specific factors?

Understanding Psychometric Properties...

Empowers you to select & use measurements effectively in your practice.

ICC Greater than 0.90 Indicates

Excellent Reliability

Considerations for Choosing a Measurement you Trust?

External Validity; Psychometric Properties

Kappa 0.21 - 0.40

Fair Agreement

ICC Between 0.75 - 0.9

Good Reliability

External Validity

Has the tool been measured in a population similar to mine?

Absolute Reliability

How much of a measured value, expressed in the original units, is likely due to error. Standard error of measurement (SEM)

Example Concurrent Validity

How do you evaluate the validity of a newly developed wearable-sensor motion system for joint kinematics measures? What's the gold standard to compare to?

Poor Reliability

ICC Less than 0.5

Is the time between assessments appropriate?

If assessments are conducted within too short or too long a time interval, the reliability may be inaccurate. If the time is too short, raters (or participants) may be influenced by the initial test. Consider a study of the intra-rater reliability of manual muscle testing. If assessments are taken with only a minute between tests, the raters are likely to remember the strength scores from one test to another, & participants may have fatigue that causes a decrease in their strength. In this case, rater recall could overestimate intra-rater reliability, & participant fatigue could underestimate intra-rater reliability, making the results difficult to interpret. If the time is too long, there is more risk that participants will change on the outcome of interest. A general rule for time between assessments is 1 to 2 days; however, this can vary depending on the outcome measure being studied.

Is there evidence that the outcome measure was conducted in a reasonably consistent manner between assessments?

If it is conducted differently between assessments, the reliability of the outcome measure will be inaccurate. Consider a questionnaire completed by patients regarding their satisfaction with therapy services. If the first assessment is conducted in a quiet, private room, whereas the second assessment is conducted over the phone & without privacy, patients might be prone to give different answers. This scenario would give inaccurate estimates of the measure's test-retest reliability.

Are study participants stable in the characteristic of interest?

If the participants in a reliability study change substantially in the characteristic of interest, the reliability result will not be accurate. For example, in the case of an outcome measure designed to assess urinary incontinence, it is important that the participants not change any medication regimen that could change the severity of their incontinence between study assessments. Such a change would produce erroneous results for a study of reliability.

Source of Measurement Error

Inherent variability of the characteristic being measured (Blood pressure, body temp are not constant). Person performing measurement (rater) (MMT - oral encouragement?). Environment (Noise, distraction). Measuring Instrument (Poorly calibrated grip strength dynamometer).

Examiner/Rater Instructions

Instructions or amount of verbal encouragement may vary from one measurement session to the next.

Rater Reliability

Inter- (Between) Rater Reliability; Intra- (Within) Rater Reliability

Measuring Reliability

Intraclass correlation coefficient (ICC); Standard Error of Measurement (SEM)

Psychometric Properties Definition

Intrinsic properties of a measurement include reliability, validity, & responsiveness (clinically meaningful change). Can be applied to questionnaires, outcome measures, clinical tools, scales or special tests. AKA clinimetric properties, or methodological qualities. How confident can you be about the measurement on your patient?

Is the measurement trustworthy?

Is it reliable? Is it valid? Is it responsive to change?

Reliability Exists in a Context

Is not an inherent or immutable trait of a measurement. It can only be understood within a context of its application: Characteristics of a subject; Training & skill of the examiners; In different languages. It is not necessarily the same in all situations.

Psychometric Properties

Is the measurement trustworthy?

Validity Definition

Is the test measuring what it is supposed to measure? Extent to which a test measures what it is intended to measure. Degree to which a useful/meaningful interpretation can be inferred from a measurement. It is needed to draw conclusions or make inferences about data/measures.

Because Inferences are difficult to verify, establishing validity is not as straightforward as establishing reliability

Less of a problem for direct observations such as distance, size or speed. For variables that represent abstract constructs such as anxiety, depression, intelligence, or pain, direct observation may not be possible. We are required to take measurements of a correlate or proxy of the actual property under consideration. We make inferences about the magnitude of a latent trait based on observations of related discernible behaviors. Example: restlessness & agitation may be observed to quantify anxiety.

Intra-Rater Reliability

Means that one person should come out with the same results on every repetition of the test, within acceptable levels. Consistency in measurement & scoring by the evaluator when results from two tests in similar situations are correlated. Rater bias may occur. Did the same tester give the same instructions each time?

What influences reliability

Measurement error

Internal Consistency

Measurements such as surveys, questionnaires, written examinations & interviews are composed of a set of questions or items intended to measure the different attributes of a multifaceted construct. Correlations among each subsection or between a subsection & the summative score. Split-half Reliability

Calibration/Scale

Measuring range of motion to the closest 5 degrees because the goniometer is calibrated in 5-degree increments would yield different values compared to a goniometer calibrated in 1-degree increments.

Types of Responsiveness

Minimum Detectable Difference (MDD); Minimal Clinically Important Difference (MCID)
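
For the detectable-change side, a commonly used relationship (assumed here; MDD is often reported as the minimal detectable change, MDC) is MDC95 = SEM × 1.96 × √2, the smallest change that exceeds measurement error with 95% confidence. The MCID, by contrast, is anchored to patient-perceived importance rather than computed from the SEM. A sketch with a hypothetical SEM:

```python
import math

def mdc95(sem_value: float) -> float:
    """Minimal detectable change at 95% confidence: the smallest change
    unlikely to be explained by measurement error alone."""
    return sem_value * 1.96 * math.sqrt(2)

# Hypothetical SEM of 2.0 points on an outcome scale
print(round(mdc95(2.0), 1))   # about 5.5 points
```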

Kappa 0.41 - 0.60

Moderate Agreement

ICC between 0.5 - 0.75

Moderate Reliability

Fleiss' Kappa

More than two raters

Kappa Less than 0

No Agreement (less than chance agreement)

Are Systematic Errors a Threat to Reliability?

No, but they threaten validity because none of the observed values will succeed in accurately quantifying the target construct.

Kappa 0.01 - 0.20

None to Slight Agreement

Patient Biological Variation

On repeated knee flexion efforts the patient may provide different joint angles.

Patient Motivation

Patient may try harder on some occasions compared to others.

Were raters blinded to the scores of previous participants?

Raters should not have knowledge of previously collected scores. For example, for a study of physical therapy students' inter-rater reliability in taking manual blood pressure measurements, students must be blinded to the measures collected by other students. Lack of blinding is likely to produce inaccurate results.

Relative Reliability Coefficient

Reflects true variance as a proportion of the total variance. With maximum relative reliability (no error at all), this ratio will be 1, or 100%. It is unitless, which allows us to compare the reliability of different tests.
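
In symbols, a relative reliability coefficient is true-score variance divided by total variance (true + error). A toy sketch with made-up variance components shows why the ratio is unitless and capped at 1:

```python
def relative_reliability(var_true: float, var_error: float) -> float:
    """Reliability = true-score variance / (true-score variance + error variance)."""
    return var_true / (var_true + var_error)

# Hypothetical variance components
print(relative_reliability(var_true=9.0, var_error=1.0))   # 0.90 -> most variance reflects real differences
print(relative_reliability(var_true=9.0, var_error=9.0))   # 0.50 -> half the variance is noise
```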

All Standardized Tests/Outcome Measures include what?

Reliability; Validity; Responsiveness (change of score)

Clinical Diagnostic Tests include what?

Reliability; Validity; Responsiveness (change of score); Sensitivity; Specificity; Positive Predictive Value; Negative Predictive Value

Example: The Interrater Reliability of MMT:

SCI patients: excellent interrater reliability, ICC 0.94. ICU patients: UE ICC 0.62 & LE ICC 0.66 (moderate).

Test-Retest Reliability

Same test repeated at two different times. Determine the ability of an instrument to measure subject performance consistently.

Test-Retest Reliability in Situations in which Raters are Minimally Involved...

Self-report outcome measures. Instrumented physiological tests that provide automated digital readouts.

Test Retest Interval

Should be far enough apart to avoid fatigue, learning, or memory effects, but close enough to avoid genuine changes in the targeted variable. The amount of time between tests matters (BP, infant development). It all depends on the inherent stability of the targeted response variable.

Inter & Intra-Rater Reliability Coefficients: Cohen's Kappa (K)

Statistical measures for assessing the extent of agreement among two or more raters of qualitative (categorical) data. K takes into account the possibility of agreement happening by chance.
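
For two raters, kappa is computed as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal proportions. A minimal sketch; the ratings below are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same subjects on a categorical scale."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n             # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical MMT grades for 10 patients from two raters
a = ["3", "4", "4", "5", "3", "4", "5", "5", "3", "4"]
b = ["3", "4", "3", "5", "3", "4", "5", "4", "3", "4"]
print(round(cohens_kappa(a, b), 2))   # about 0.70 -> substantial agreement
```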

Inter & Intra-Rater Reliability Coefficients: Intraclass Correlation Coefficient (ICC)

Statistical measure for assessing the extent of agreement among two or more raters of quantitative measurements that are organized into groups (e.g., repeated ratings grouped by subject). It describes how strongly units in the same group correlate with (resemble) each other.
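
One simple form, ICC(1,1) from a one-way ANOVA (each subject rated by the same k raters), is (MS_between − MS_within) / (MS_between + (k − 1) × MS_within). Published studies often use more elaborate two-way ICC models; the sketch below and its data are illustrative only.

```python
import numpy as np

def icc_1_1(scores: np.ndarray) -> float:
    """ICC(1,1), one-way random effects, single measure. Rows = subjects, columns = raters."""
    n, k = scores.shape
    subject_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)        # between-subject mean square
    ms_within = ((scores - subject_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-subject mean square
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical knee-flexion ROM (degrees) for 5 subjects rated by 3 raters
data = np.array([
    [130, 128, 131],
    [110, 112, 109],
    [142, 140, 143],
    [95, 97, 96],
    [120, 118, 121],
])
print(round(icc_1_1(data), 3))   # close to 1 -> excellent reliability for this toy data
```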

Concurrent Validity

Studies when target & criterion test scores are obtained at approximately the same time. Useful when a new untested tool is potentially more efficient, easier to administer, more practical or safer than the current method, & when it is being proposed as an alternative.

Consider the number of participants being rated & the number of raters. Does the study have sufficient sample size?

Sufficient sample size is required to reduce the chance for random error to affect the results. A study that reports the method used to determine the minimum sample size (i.e., sample size calculation) has higher quality.

Type of Measurement Error that Influence Reliability

Systematic Errors; Random Errors

Testing Effects

The test itself is responsible for observed changes in a measured variable. ROM testing can stretch soft tissue around a joint & thereby increase the ROM on subsequent testing.

Test-Retest Reliability is Influenced by:

Test-Retest Interval Carryover Effects Testing Effects Both Carryover & testing effects may manifest as systematic error, creating unidirectional changes across all subjects (or observations). It will not affect reliability coefficients, but will affect validity of the measure.

Types of Reliability

Test-retest; Interrater; Intrarater; Alternate Forms/Parallel Forms; Internal Consistency

Validity Relates to...

The confidence we have that our measurement tools are giving us accurate information about a relevant construct so that we can apply results in a meaningful way.

Reproducibility & Dependability

The degree to which repeated measures will agree.

Reliability

The stability, consistency, or repeatability of a measurement (of unchanging behavior). How free the test is from error. Reproducibility & Dependability. An attempt to identify sources of error.

Rater Bias

This can be decreased by blinding the rater or by making the grading criteria very objective.

Inadequate Lighting

This may cause the examiner to read the goniometer incorrectly.

Disruptive Environment

This may distract the examiner or patient & result in an inaccurate reading.

Measurements of Accuracy of Tests/Measures

To what extent do we trust the findings of a test & how much do we rely on them to inform our clinical decision-making? Reliability, Validity, Responsiveness (change of score), Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value.

Cohen's Kappa

Two raters

Split-Half Reliability

Two sets of items testing the same content (redundant items) are used for reliability testing. They can be considered alternate forms of the same test.
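
One common workflow (assumed here for illustration): split the items into halves, such as odd vs. even items, correlate the two half-scores across respondents, then apply the Spearman-Brown correction r_full = 2r / (1 + r) to estimate the reliability of the full-length test. The item scores below are hypothetical.

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Odd-even split-half reliability with Spearman-Brown correction.
    items: rows = respondents, columns = item scores."""
    odd_half = items[:, 0::2].sum(axis=1)
    even_half = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd_half, even_half)[0, 1]   # correlation between the two half-scores
    return 2 * r / (1 + r)                       # Spearman-Brown prophecy formula

# Hypothetical 6-item questionnaire answered by 5 respondents (0-4 Likert scores)
responses = np.array([
    [4, 3, 4, 4, 3, 4],
    [1, 2, 1, 1, 2, 1],
    [3, 3, 2, 3, 3, 3],
    [0, 1, 0, 1, 0, 0],
    [2, 2, 3, 2, 2, 2],
])
print(round(split_half_reliability(responses), 2))
```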

Alternate Forms of Reliability

Two versions of an instrument are compared to see if they obtain similar scores. Equivalent or parallel forms. Paper vs. electronic versions of pain visual analog scale.

Inter-Rater Reliability

Used to assess the degree to which different raters/observers give consistent estimates/scores of the same phenomenon. Degree of agreement between scores from two raters following observation & rating of the same subject. Think about providing instructions for grip strength testing or asking about the VAS - did each tester give the same instructions?

Predictive Validation can be used for:

Validating screening procedures used to identify risk factors for future disease. Determining the validity of prognostic indicators used to predict treatment outcomes.

What Validity Is & Is Not

Validity is not about labeling a measurement as simply valid or invalid. It asks how valid the results of a test are for a given purpose within a given setting.

How does validity differ from reliability?

Validity looks at the objective information of a test & the ability to make inferences from the test score. Validity addresses what you are able to do with a test. It implies that a test is mostly free of error. Thus a valid test is typically a reliable test.

Without Reliability...

We cannot have confidence in our data.

Reliability is not all-or-none

We estimate the 'degree' of reliability present in a measurement system.

Measuring Reliability: Concept of Variance

We estimate the reliability of a test based on the statistical concept of variance. In general, the greater the dispersion of scores, the larger the variance. There are two sources of variance when considering a group of repeated observations: differences among true scores for different individuals & measurement error.

0.75

What relative reliability coefficient value generally signals acceptable reliability?

Predictive Validity Example

When a measure is validated by predicting scores on a criterion measured in the future. The level of spinal cord injury on the ASIA scale predicts the likelihood of walking.

Are Systematic Errors Easy to Correct?

Yes, when detected, because they are consistent.

