Test 3 OT 667


Criterion referenced measures

- used to compare an individual's performance on a set of objectives with a particular pre-set criterion for acceptable achievement

When evaluating a study discuss the:

- internal validity
- external validity
- social validity


If the mean is 10 and the standard deviation is 2:
•If a student's score is 8, what is z?
•If a student scores at the 84th percentile, what is her raw score? Z-score?
•Would you expect someone to have a score of 20?

1. z = (8 - 10) / 2 = -1
2. 84th percentile = 1 standard deviation above the mean >>> z-score = 1; 1 = (raw score - 10) / 2 >>> raw score = 12
3. z = (20 - 10) / 2 = 5. Even a z-score of 3 corresponds to roughly the 99.9th percentile, so while a score of 20 is technically possible, it is extremely unlikely
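The pop-quiz arithmetic above can be checked with a short sketch using only Python's standard library (the helper names are illustrative, not from the course):

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Standardize a raw score: how many SDs it sits from the mean."""
    return (raw - mean) / sd

def raw_from_z(z, mean, sd):
    """Invert the z-score formula to recover the raw score."""
    return z * sd + mean

# Mean 10, SD 2 (the quiz numbers above)
z_for_8 = z_score(8, 10, 2)        # -1.0: one SD below the mean
raw_at_84th = raw_from_z(1, 10, 2)  # 84th percentile ~ z = 1 -> raw score 12
p_below_20 = NormalDist().cdf(z_score(20, 10, 2))  # z = 5: a score of 20 is extremely unlikely
```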

High vs Low Reliability Score

A low reliability score for an instrument, as applied to a specific population, has serious implications: you cannot be confident that observed effects are the result of the intervention rather than measurement error

Norm con:

Cons: •Being above average does not necessarily imply "A" performance •Half the test-takers must be below average

Norm pro:

Pros: •Ensures a "spread" between top & bottom of the class for clear grade setting •Assumes a standard normal distribution •Shows individual performance relative to group

Reliability: Variance

Reliability = between-subject variance / (between-subject variance + within-subject variance (error))
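As a quick numeric illustration of the ratio above (the variance values are made up):

```python
def reliability(between_var, within_var):
    """Reliability = between-subject variance / (between + within-subject variance)."""
    return between_var / (between_var + within_var)

# High between-subject and low within-subject variance -> high reliability
high = reliability(9.0, 1.0)   # 0.9
# Mostly within-subject (error) variance -> low reliability
low = reliability(1.0, 9.0)    # 0.1
```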

Two methods to interpret the data collected from standardized tests:

- Criterion-referenced measures
- Norm-referenced measures

criterion referenced measures

- an individual's score is usually expressed as a percentage of items answered correctly
- every test taker knows what the benchmarks/objectives are
- it is possible for ALL test takers to achieve 100% mastery

CON criterion referenced measures

- based on a predetermined set of criteria; it is difficult to know just where to set the criteria
- lack of comparison data with other individuals

Types of Reliability

- inter-rater
- intra-rater
- test-retest
- internal consistency
(all compare 2 sets of data points except internal consistency, which uses only 1)

Norm referenced measures

- scores are often transformed into a common distribution (normal distribution curve)

PRO of criterion referenced measures

- sets minimum performance expectations
- demonstrates what the individual can and cannot do in relation to important content-area standards

What are the factors in determining the cut-off score of a test? (criterion-referenced measures)

- statistical procedures (ex. 1 z-score)
- clinical judgement (art)
- political (depending on the impact/consequence of test results; ex. a high-stakes test used to make major decisions about candidates)

norm referenced measures are used:

- to show how individual scores compare to scores of a well-defined norm group of individuals, with little emphasis on the absolute amount of knowledge or skill
- to rank each individual with respect to the achievement of others in broad areas of knowledge (sampled from a variety of domains)

Scores for the NBCOT examinations are reported on a scale from 300 to 600, with a mean of 480. A total scaled score of at least 450 is required to pass, and the passing rate = 84% •Using the above information, find the z-score given that the scaled score is 525

Scores for the NBCOT examinations are reported on a scale from 300 to 600, with a mean of 480. A total scaled score of at least 450 is required to pass, and the passing rate = 84% (find the z-score for a score of 525)
•480 = z-score of 0, since it is the mean
•84% pass and the passing score is 450, so 84% - 50% = 16% fall below 450: the passing score sits 1 standard deviation below the mean (z = -1)
•Standard deviation = 480 (mean) - 450 (1 SD below) = 30
•z = (525 - 480) / 30 = 1.5
•Plug the z-score into a percentile calculator (one-tailed) to convert it to a percentile
•Answer: 525 >> 93.3rd percentile
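The same worked example, sketched with Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

mean, passing_score = 480, 450
# 84% pass, so 16% fall below 450: the passing score is 1 SD below the mean
sd = mean - passing_score            # 30
z = (525 - mean) / sd                # 1.5
percentile = NormalDist().cdf(z)     # ~0.933, i.e. the 93.3rd percentile
```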

Z score formula

z = (raw score - mean) / standard deviation
For a sample: z = (x - x̄) / s

•When a clinician rates a group of patients' performance on a certain skill at two different times, what type of reliability coefficient is computed?

•Intra-rater or test-retest reliability — the question does not give enough information to separate the two
•If NO clinician rated the performance (e.g., a repeated self-report), it would be test-retest reliability
•When the same clinician rates at both times, there is no practical difference between the two

Reliability Coefficients (range: 0-1)

•Inter-rater, intra-rater, and test-retest reliability use ICC

Inter-rater, intra-rater, and test-retest reliability use ICC

•Intra-class correlation coefficient
•Used to indicate the degree of agreement between measurements
•ICC = 0.9 or above is high and desirable
•ICC = 0.7 or below is low and undesirable
•Does NOT take into account systematic error

ICC continued

•Correlation alone should not be used to estimate reliability
•When there is a systematic error between the raters, correlation cannot tell the extent of agreement between the two sets of measurements, because correlation tells only how the two sets of scores vary together
•ICC does NOT take systematic errors into consideration

norm referenced measures

•Items are selected that discriminate between high and low achievers
•If too many people get a question correct, test questions are thrown out until the scores form a normal curve again
•The same applies when too many people get a question incorrect

SEM

•Its purpose is to indicate how precise an estimate of the true score an observed score is
•Given an observed score, SEM is used to estimate the range within which an individual's true score probably falls (confidence interval)
•Gives the margin of error to expect in an individual test score because of the imperfect reliability of the test
•We can never know an individual's true score, but we can calculate a true-score range using the observed score plus or minus the error
•Biden: 65% +/- 5 >>> true score range of 60-70
•Trump: 63% +/- 5 >>> true score range of 58-68
•Cannot tell who will win because of the overlap

MDC Clinical Scenario: Disabilities of the Arm, Shoulder, and Hand Questionnaire (DASH)
•SEM in athletes is 3.61 points
•A patient scores 67 out of 100 on the instrument
•What is the minimum score a patient must achieve at the follow-up test to be confident a change has occurred?

•MDC = SEM x 1.96 x √2 ≈ 10
•Minimum follow-up score = 67 - 10 = 57 (lower DASH scores indicate less disability)
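The calculation spelled out as code (assuming, per the DASH card later in this set, that lower scores mean less disability, so improvement means dropping below the benchmark):

```python
import math

sem = 3.61                             # SEM for the DASH in athletes (from the card)
mdc_95 = 1.96 * sem * math.sqrt(2)     # ~10.0 points
baseline = 67
benchmark = baseline - mdc_95          # ~57: follow-up score must fall below this
```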

•A client got a score of 85 on a norm-referenced test. This means that the client . . .

•Mastered 85% of the material covered in the test
•Achieved a score better than 84% of those taking the test
•Achieved a score 1 standard deviation above the mean of that population group
•None of the possible answers
= None of the possible answers
•85 what? Out of what? The score needs to be presented as a percentage or percentile
•If 85 were a percentile, then 2 answer choices would be correct:
•Achieved a score better than 84% of those taking the test
•Achieved a score 1 standard deviation above the mean of that population group

Observed score variance

•Observed score variance = true score variance + error variance (or noise) •Error can be systematic or random

•Inter-rater—2 clinicians rate same client

•The one we used for our project
•2 people rating, then calculating agreement
•Inter-rater reliability — the degree to which two or more independent researchers using the same data collection instrument obtain the same results
•This needs to be > 85% to be valid

•High within subject variance

•Person one: 44 >> 1000

•High between subject variance

•Person one: 5 >> 6 •Person two: 80 >> 90

•Low within subject variance

•Person one: 50 >>55

•Low between subject variance

•Person one: 55 >> 56 •Person two: 50 >> 55

T-score

•Provides the location of a score in a distribution with mean 50 and standard deviation 10
•T = 10(z-score) + 50
•Often misinterpreted as percentages

The instrument chosen to evaluate a patient depends on several factors

•Psychometric properties: accuracy (= validity), precision (= reliability)
•Appropriateness
•Ease of use
•Access
•Cost

•Patient bias due to:

•Reactivity/social desirability •Recall bias •Faking

Measurement Properties: psychometric info

•Reliability •Standard error of measurement •Minimal detectable change •Validity • •Not all assessments are standardized tests with criterion or norm-referenced measures

•When evaluating a measure discuss the:

•Reliability •validity

Reliability of Measures

•Reliability = true score variance / observed score variance

•Consider an IQ test with a standard deviation of 15 and a reliability of 0.9. If a person scores 100 on the IQ test, how confident are we that the person's true IQ is 100?

•SEM = 15 × √(1 - 0.9) ≈ 4.7
•With a 68% CI, we estimate the person's true score to be between 95.3 and 104.7
•In other words, 68% of the time the true score would fall in this band
•If you know the standard deviation and the reliability, you know the range of the true score
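The IQ example as a short sketch:

```python
import math

sd, icc = 15, 0.9
sem = sd * math.sqrt(1 - icc)          # ~4.7
observed = 100
ci_68 = (observed - sem, observed + sem)                 # ~ (95.3, 104.7)
ci_95 = (observed - 1.96 * sem, observed + 1.96 * sem)   # ~ (90.7, 109.3)
```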

SEM continued

•SEM is estimated as the standard deviation of the observed scores multiplied by the square root of 1 minus the reliability of the scores
•SEM = (SD baseline) × √(1 - ICC)
•ICC = intraclass correlation coefficient = reliability
•In application, SEM is used to calculate confidence intervals around the observed scores
•The purpose of the confidence interval is to determine the range of scores that we are reasonably confident represents an individual's true ability
•68% CI = score ± SEM
•95% CI = score ± (1.96 × SEM)
•Score: the observed score; 1.96 is the z-value that captures 95% of a normal distribution
•This produces a range within which the true score is located

Standard Error of Measurement SEM

•SEM is the standard deviation of the theoretically normal distribution of test scores that one person would obtain on equivalent tests
•An alternative way to express reliability

Standardized score

•Score that results from transformation to fit normal distribution •Allows for comparison of performance across 2 different measures •Example: ACT vs SAT •Reports performance on various scales to determine how many standard deviations the score is away from the mean

•Test-retest

•Take the ACT, then take it again
•Compare test 1 to test 2 and check the correlation
•Consistency of stable characteristics over a period of time
•Procedure: administer the same test to the same individuals over a period of time
•Considerations: intervals, carryover, and testing effects
•Assumes no change in the construct over the time interval

Factors Affecting Reliability & Possible Sources of Measurement Error

•Test-retest interval
•The manner in which the measure is scored
•Videotaped vs real time
•Test length
•Increasing length with more similar items increases reliability
•https://en.wikipedia.org/wiki/Spearman%E2%80%93Brown_prediction_formula
•The Spearman-Brown prediction formula relates psychometric reliability to test length and is used by psychometricians to predict the reliability of a test after changing the test length
•Poorly constructed test
•Guessing is not a measure of ability
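The Spearman-Brown prediction the card links to can be sketched as follows (the example numbers are illustrative):

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after the test length is multiplied by length_factor."""
    n, r = length_factor, reliability
    return n * r / (1 + (n - 1) * r)

# Doubling a test whose current reliability is 0.70:
predicted = spearman_brown(0.70, 2)   # ~0.82: more similar items -> higher reliability
```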

The objective of a criterion-referenced test should include evaluation of . . .

•The client's skill level
•The client's current level of functioning
•The client's skill & performance levels
•The conditions and standards
= The objective of a criterion-referenced test should include evaluation of the conditions and standards

Validity

•The degree to which a measurement instrument accurately measures the outcome of interest

Reliability

•The degree to which an assessment tool produces stable and consistent results •It is a way to reflect the amount of error inherent in any measurement •The consistency of the measurement (no error)

Continued: Factors Affecting Reliability & Possible Sources of Measurement Error

•The physical and emotional state of the individual at measurement time •Variation of the test situations •Poor test equipment •Malfunctioning or improperly calibrated

Example: Timed Up & Go Test

•This is a standardized test with clear instructions
•HOWEVER, just because there are clear instructions doesn't mean people will follow them
•Test-retest reliability — test one day, then test again the next day
•Correlation can be used for test-retest across different people
•The reliability of a test depends on the population
•You cannot say the reliability of an instrument itself is good or bad
•Reliability is not a property of the test, but of the people being evaluated with that test
•The Timed Up & Go test has high reliability for young healthy adults, but poor reliability for toddlers
•Toddlers will not be able to follow the instructions
**Reliability is not a property of the instrument, but of the population**

Disabilities of the Arm, Shoulder, and Hand Questionnaire (DASH, 30-items) [1=no difficulty/symptoms; 5=unable/extreme symptoms]

•This is an example of internal consistency
•The DASH assesses any joint in the upper limbs
•Raw scores range from 30 to 150, with 150 being severely disabled
•We want to know whether all of these items measure disability
•How closely do these items all measure the same thing?

True (universe) score

•True (universe) score for a particular assessment procedure is the hypothetical average of the observed scores the participant would obtain if the participant is repeatedly assessed under the same conditions

Clinical scenario: the SEM for the Box and Blocks Test in chronic stroke is 3.7 blocks per minute
•On evaluation, the patient is able to move 7 blocks in 1 minute
•After 4 weeks of treatment, the patient moves 10 blocks in 1 minute
•Did the patient make a change that is beyond measurement error?

•True score range at evaluation: 7 +/- 3.7 = 3.3 to 10.7
•True score range after treatment: 10 +/- 3.7 = 6.3 to 13.7
•Equivalently: the true score range at evaluation is 3.3 to 10.7, and 10 falls within this range
•The patient did NOT make a change beyond measurement error
•The ranges overlap, so we cannot attribute the improvement to the intervention
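The overlap check from the scenario, as a minimal sketch:

```python
sem = 3.7
eval_score, post_score = 7, 10
eval_hi = eval_score + sem          # 10.7: top of the true-score range at evaluation
# A change beyond measurement error requires the new score to clear that range:
beyond_error = post_score > eval_hi  # False: 10 still falls inside 3.3-10.7
```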

•Internal consistency

•Used when unable to get 2 data points •Used when there is only 1 data point

range for reliability

•Value range for reliability in a group of individuals is 0 - 1

Reliability: Variance

•Want high between-subject variance & low within-subject variance (error)

MDC continued

•We use the MDC to calculate confidence intervals around the observed score at baseline
•Benchmark for a noticeable change = observed score (at baseline) +/- MDC
•If the observed score at post-treatment is beyond the MDC confidence interval, the change in the patient's score at post-treatment is noticeable, or real (beyond measurement error)
•This produces a range beyond measurement error

Clinical scenario: a patient scores 46 out of 56 on the Berg Balance Scale (BBS), which ranges from 0 to 56 (14 items). Higher scores = better balance.
•SEM for the BBS is 2.3 points for the elderly; the cut-off for fall risk is < 45
•http://www.chiropractic.on.ca/wp-content/uploads/fp-berg-balance-scale.pdf
What is the range in which the true score lies (68% CI)?

•Range = 46 +/- 2.3 = 43.7 to 48.3
•Is this patient at risk of falls?
•The patient IS a fall risk, because the range extends below the fall-risk cut-off
•He has the potential to fall, so he cannot be released

Interpreting test scores: Normal curve equivalent (NCE)

•Y-axis is probability (relative frequency) •Shape of distribution changes with only two parameters •σ (sigma) and μ (mu) •All = 100% •Cumulative percentage roughly equals percentiles

Z score

•Z-scores correspond to different percentiles and are used as a converter between scales

Transforming Z-score to Percentiles

•Z-scores tell you where a value fits into a normal distribution •Based on the normal distribution, there are rules about where scores with a z value will fall, and how it will relate to a percentile rank •You can use the area under the normal curve to calculate percentiles for any score

Test-retest reliability of Timed up and go test (https://www.youtube.com/watch?v=j77QUMPTnE0) for •(a) young healthy adults •(b) adults with hemiplegia •(c) toddlers •Why they are different?

•between-subject variance vs. within-subject variance

Raw score

•A raw score must be transformed in order to be useful
•Raw > percentile
•This makes it easier to transform into a z-score

Percentile rank

•A single number that indicates the percentage of the norm group that scored below a given raw score
•Assumes the elements in a data set are rank-ordered from smallest to largest; percentile ranks range from 1-99
•Much more compact in the middle of the distribution (not equal intervals)
•Major changes in the middle, minor changes further out
•ACT scores 20 > 26, percentile: 52nd > 82nd (1 point ≈ 5 percentile points)
•ACT scores 30 > 35, percentile: 93rd > 99th (1 point ≈ 1 percentile point)

When Conducting a standardized test:

Examiners must use the exact same instructions, materials, and procedures each time they administer the test, and they must score the test using the criteria specified in the test manual

Examples of norm measures

Examples: ACT, SAT, GRE
•In these tests, one competes against other people
•Scores are based on and reported as percentiles
•There is no pass or fail
•Percentiles range from 1-99 and compare the abilities of individuals
•Composed of ordinal data
•However, if a cut-off is put in place, it becomes a criterion measure
•An ACT score of 22 = norm measurement
•UAB's ACT requirement of 22 = criterion measurement

You administer 4 standardized developmental assessments to a child; each assessment evaluates a different area - motor, cognition, language, & emotion. Her scores are as follows. Which area of her development is most impaired?
•Motor: T-score = 40
•Cognition: z-score = -1
•Language: ~16th percentile
•Emotion: raw score = 350 [mean = 500, standard deviation = 100]

First, convert all scores to z-scores
•Motor: T-score = 40 >>> z-score = -1
•(Used the bell curve: found T-score 40 and went up to find the equivalent z-score)
•Cognition: z-score = -1
•Language: ~16th percentile >>> z-score = -1
•(Used the bell curve: found the 16th percentile and went down to the z-score)
•Emotion: raw score = 350 [mean = 500, SD = 100] >>> z-score = (350 - 500) / 100 = -1.5
Answer: Emotion
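The conversion steps can be sketched end to end; `NormalDist().inv_cdf` plays the role of the bell-curve lookup:

```python
from statistics import NormalDist

def z_from_t(t):
    """T-scores have mean 50 and SD 10."""
    return (t - 50) / 10

z_scores = {
    "motor": z_from_t(40),                    # -1.0
    "cognition": -1.0,                        # already a z-score
    "language": NormalDist().inv_cdf(0.16),   # ~ -0.99
    "emotion": (350 - 500) / 100,             # -1.5
}
# The most impaired area has the lowest z-score:
most_impaired = min(z_scores, key=z_scores.get)
```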

standardized test

Has uniform procedures for administration and scoring

Intra-class correlation (ICC)

ICC is used when the variables are different measurements of the same trait (unit)
•Weight measured on 2 different scales (inter-instrument reliability)
•Administer an ADL assessment twice to the same group of patients within a short period of time, by the same rater (intra-rater reliability)
•Ask the same group of patients to rate their ADL performance using the Barthel Index twice within a short period of time (test-retest reliability)
•ADL performance of a group of patients evaluated by 2 clinicians using the Barthel Index (inter-rater reliability)
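A minimal one-way ICC(1,1) sketch built from ANOVA mean squares; this is only one of several ICC forms, and real analyses typically use a statistics package:

```python
from statistics import mean

def icc_1_1(data):
    """One-way random-effects ICC(1,1).

    data: one row per subject, one column per measurement (rater/occasion).
    """
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

perfect = icc_1_1([[1, 1], [2, 2], [3, 3]])   # two raters agree exactly -> 1.0
noisy = icc_1_1([[1, 2], [5, 6], [9, 8]])     # small disagreements -> high but < 1
```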

Standardized scores:

Z-score

Z score is used to measure

an observed score's distance from the mean

goal of criterion measures

determine which skills a person can and cannot accomplish, thereby providing a focus for intervention

external validity

extent to which we can generalize findings to real-world settings

Ex of criterion:

Getting a driver's license:
- the test to get a license is pass/fail
- everyone who takes it can pass it
- you can't differentiate who is better, because scores beyond the cut-off don't mean anything
- a pass is a pass: whether a person gets 100 or 80, they all get a license
- any measure with a cut-off point meets the definition of a criterion measure (ex: a cut-off point for admissions)

Item homogeneity:

Homogeneity of a set of items
•Consistency of performance of a group of individuals across the items on a single test
•Procedure: administer the measure under standardized conditions to a representative group on one occasion, then determine the value of the Cronbach's alpha coefficient
•A measure of scale reliability
•How closely related a set of items are as a group
•This only needs to be completed once
•1 data point, vs 2 data points for all other types of reliability
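Cronbach's alpha itself can be sketched in a few lines using population variances (a minimal illustration; real scale analyses use dedicated software):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]          # each respondent's total score
    item_var_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))

# Two items that rank respondents almost identically -> reasonably high alpha
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])      # 0.75
```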

•Internal validity

the change in outcome is indeed the result of a particular intervention

intent of criterion referenced measures

to measure an individual's performance on specific tasks rather than to compare the individual's performance with that of his or her peers

The normal curve and percentages

•**Need to remember these numbers for the exam**
•Z-score = 1
•1 standard deviation above the mean
•Percentile = 34% + 50% = 84th (because a z-score of 0 = the 50th percentile)

Z-score

•0 if at the mean (middle)
•1 = 1 standard deviation above the mean = 84th percentile
•-2 = 2 standard deviations below the mean = 2nd percentile
•Can indicate a severe delay

Why is most of the reliability value not 1 or 0?

•1 is the maximum: with zero error, reliability = true score variance / true score variance = 1
•A reliability of exactly 1 is impossible, because zero error is impossible
•Reliability approaches 0 as error grows: with an enormous error variance in the denominator, the ratio approaches zero
•Exactly 0 is likewise impossible, because error variance cannot be infinite

Reliability is:

•A characteristic of scores of a sample NOT a characteristic of an instrument •A property of the score of a test in a particular group of examinees •Always established for a defined population •A prerequisite for validity •Is a necessary, but not sufficient, evidence for validity

Z-scores

•A z-score states the position of a raw score in relation to the mean of the distribution, using the standard deviation as the unit of measurement

Minimal Detectable Change (MDC)

•Aka smallest detectable change (SDC)
•A statistical estimate of the smallest amount of change that can be detected by a measure and that corresponds to a noticeable change in ability
•The MDC is the minimum amount of change in a patient's score that ensures the change isn't the result of measurement error
•MDC is calculated in terms of confidence of prediction
•The MDC based on a 95% confidence interval (CI) is:
•MDC(95% CI) = 1.96 × SEM × √2
•**Don't need to know the formula; basically the observed change needs to be larger than the MDC**

•Intra-rater—1 clinician rates client

•The clinician rates the client, then a day later rates the same client again
•Shows the difference in one rater's scores across time for the same person

•Internal consistency uses α

•Cronbach's alpha, or alpha coefficient
•Takes into account systematic errors

norm referenced measures

•Designed to yield a normal curve, with 50% of test takers scoring above the 50th percentile and 50% scoring below it

3 main characteristics of normal distribution

•Distribution is symmetrical
•Mean, median, and mode are the same score & located at the center of the distribution
•The percentage of cases in each standard deviation is precisely known

Why is the std used to divide?

•Divide this difference by the standard deviation (in order to assess how big it really is)

social validity

•Evaluate significance of goal, magnitude & importance of effects, and appropriateness/acceptability

what is the z score equation used for?

•Find the difference between a score and the mean of the set of scores

clinician bias due to

•Halo effect
•Expectation
•Drift
•Environmental bias

Z score: distance is measured in std units

•If a z-score is zero, it's on the mean. •If a z-score is positive, it's above the mean. •If a z-score is negative, it's below the mean. •If a z-score is 1, it's 1 SD above the mean. •If a z-score is -2, it's 2 SDs below the mean.

Why might test-retest reliability become less important?
•If participants become familiar with a test & all perform better on the second occasion
•If the concept being measured is not expected to be stable over time
•If alternate-form reliability cannot be calculated
•If convergent validity was not high

•If concept being measured is not expected to be stable over time •Reliability is seeking a stable measurement day after day

•High vs low Cronbach's alpha (internal consistency)

•If low then items do not all measure the same thing •Score is not representative of variable being measured

•High vs low inter-rater reliability score

•If low then unable to switch out evaluators and one person will need to evaluate it all •Need a good agreement between evaluators to allow for switching if needed

High vs low test-retest reliability score

•If low, intra-subject variance is high
•Conditions should stay constant when testing reliability
•If weighing participants twice, don't let them eat a dozen donuts before the second measurement if they didn't before the first

•SEM can be used to determine how much variability could be expected on retesting the individual (SEM example continued)

•If the person could be retested on the same test a number of times, one could expect: •In about 68% of the retests the scores would fall within a range between 95.3 and 104.7 points •In about 95% of the retests the scores would fall within a range between 90.7 and 109.3

•Z-scores are standardized, so they can be compared

•If you know your score on an exam and your friend's score, you can convert to z-scores to determine who did better and by how much

•Instrument bias due to:

•Improper calibration

