Measurement, reliability and validity

Example of ordinal questions

1. Are you underweight, healthy weight or overweight? 2. How many times a week do you eat pizza? (never, sometimes, often) 3. How close do you live to campus? (close, not too far, really far)

What are the two types of construct validity?

1. Convergent validity: 2 measures of a similar construct should be correlated 2. Discriminant validity: 2 measures of unrelated constructs should not be correlated

How to establish construct validity?

1. Correlate new test with an established test 2. Show that people with and w/o certain traits score differently 3. Compare your measure with other related measures of similar or differing constructs

Define item to total correlation

Correlate performance on each item w/ overall (total) performance ∙ computed ACROSS ALL PARTICIPANTS (within the study)
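The item-to-total idea can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed procedure: the `scores` table is hypothetical data (one row per participant, one column per item), `pearson` and `item_total_correlations` are illustrative names, and the total here includes the item being correlated (the uncorrected form).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_total_correlations(scores):
    """Correlate each item column with the participants' total scores."""
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    return [pearson([row[j] for row in scores], totals) for j in range(n_items)]

# Hypothetical responses: 5 participants (rows) x 3 items (columns).
scores = [
    [1, 2, 5],
    [2, 3, 4],
    [3, 3, 3],
    [4, 5, 2],
    [5, 5, 1],
]
correlations = item_total_correlations(scores)
```

In this made-up data the third item correlates negatively with the total, which is the kind of item a researcher would flag for review.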

How do you measure internal consistency?

1. Split-half reliability 2. Cronbach's alpha

Types of reliability

1. Test-retest 2. Parallel forms 3. Interrater/intrarater 4. Internal consistency

Factors that influence error score

1. Tester or rater (errors in taking measurements, recording behavior or data entry) 2. Measurement instrument (equipment malfunction or miscalibration, unclear questionnaire) 3. Variability in characteristic being measured (transient states of the participants, e.g. food intake, mood, blood pressure, fatigue level) 4. Situational factors (room temperature, lighting, crowding)

Decreasing Error/Increasing Reliability

1. Tester or rater (maintain consistent scoring procedures) 2. Measurement instrument (increase number of items/observations, standardize instructions, eliminate unclear questions) 3. Variability in characteristic being measured (standardize testing conditions) 4. Situational factors (minimize the effects of external events)

If you are measuring BMI in your research study, what issues could contribute to: A. systematic error B. random error

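A. Systematic error: e.g. an uncalibrated scale or height rod that consistently over- or underestimates weight or height B. Random error: e.g. chance mistakes in taking or recording measurements, tester fatigue, transient states of the participants (such as recent food intake)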

Which are strategies to improve reliability in a study?

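Increase the number of items/observations ∙ Standardize instructions and testing conditions ∙ Eliminate unclear questions ∙ Maintain consistent scoring procedures ∙ Minimize the effects of external events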

Nominal Scale

Assignment of labels ∙ quality but not numbers ∙ for categorical variables - cannot quantify! ∙ categories vary in quality but not amount ∙ cannot say that one is more or less than another

Which of the following variables is an example of the nominal level of measurement? A. rank in graduating class B. gender C. age of students D. amount of money earned

B. Gender

How to increase reliability?

Decrease error! ∙ Increase the number of items/observations ∙ Standardize instructions ∙ Eliminate unclear questions ∙ Standardize testing conditions ∙ Minimize the effects of external events ∙ Maintain consistent scoring procedures ∙ Moderate test difficulty

What is validity?

Does the test do what it's supposed to do and measure what it's supposed to measure? ∙ The tool measures "what it should": truthfulness, accuracy, authenticity ∙ Refers to the meaning of the test's results, not the test itself

Concurrent Validity (Present-criterion)

E.g. a food frequency questionnaire validated with 24 hr recalls or diet records collected for the same time period.

Predictive validity (future)

E.g. how well do nutrition screening tools predict mortality among seniors.

Parallel/Alternate forms

Equivalence ∙ Give 2 different forms of the same test to the same group ∙ e.g. to assess dietary knowledge, give slightly different forms of the test (make sure they are comparable) ∙ Eliminates practice effects (having done the test once, the participant knows the answers)

A valid measurement doesn't have to be reliable. T/F?

False

How do you know if you're measuring the right thing in the right way

First you'll need to find a measure that's "reliable": consistent, dependable, predictable

Examples of nominal questions.

Gender, favorite color, where do you live, mode of transportation to school

What is internal and external validity?

Internal: are the methods correct and the results accurate? External: are the findings generalizable

CQ: Temperature in degrees Celsius is an example of which level of measurement?

Interval ∙ can you rank temperatures (yes - 25C hotter than 15C) ∙ are spaces between ranks equal? (Yes 24-25C is the same as 14-15C) ∙ Does it have an absolute zero & can you make a meaningful ratio? (No - 0C does not mean absence of temperature)
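A short Python sketch can illustrate why Celsius is interval rather than ratio. The conversion to Fahrenheit is an addition for illustration only (it does not appear in the card), and `c_to_f` is an illustrative name. Equal gaps stay equal after a change of units, but ratios change, so "twice as hot" has no meaning.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Equal intervals survive a change of units: both gaps are 1 degree C,
# and both are the same size when expressed in Fahrenheit.
gap_warm = c_to_f(25) - c_to_f(24)
gap_cool = c_to_f(15) - c_to_f(14)

# Ratios do not survive: 20 C looks like "twice" 10 C,
# but the same two temperatures in Fahrenheit give a different ratio.
ratio_c = 20 / 10
ratio_f = c_to_f(20) / c_to_f(10)
```

Because the zero point is arbitrary, the ratio depends on the unit chosen, which is exactly what rules out the ratio level.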

What is considered least precise level of measurements?

Nominal

NOIR - no one is ready

Nominal - lowest, categories, no rank ∙ Ordinal - second lowest, ranked categories ∙ Interval - next to highest, ranked categories with known units b/w rankings ∙ Ratio - highest, ranked categories with known intervals and an absolute zero

True score equation:

Observed score = True score + error score ∙ True: perfect reflection of true value (theoretical but never truly known) ∙ Error: diff b/w true and observed

Define reliability using an equation.

Reliability = true score variance / (true score variance + error score variance) ∙ Reliability of the observed score becomes higher if error is reduced!
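As a rough numeric illustration of the equation above, the following Python sketch reads the "scores" as score variances, which is the usual classical-test-theory form and is an assumption here. The function name `reliability` and the variance values are hypothetical.

```python
def reliability(true_var, error_var):
    """Share of observed-score variance that is true-score variance."""
    return true_var / (true_var + error_var)

# Hypothetical variances: same true-score variance, less error the second time.
noisy = reliability(true_var=9.0, error_var=3.0)    # 9 / 12
cleaner = reliability(true_var=9.0, error_var=1.0)  # 9 / 10
```

Cutting the error variance from 3 to 1 raises the ratio from 0.75 to 0.9, which is the sense in which reducing error increases reliability.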

Systematic error

Source of error ∙ "predictable" errors of measurement ∙ i.e. consistent under/over estimation ∙ e.g. measured height consistently 0.5 cm greater than true height ∙ major concern for *validity of measure*

Random Error

Source of error ∙ due to chance ∙ e.g. fatigue, mistake ∙ major concern for *reliability* ∙ e.g. True height = 167 cm measured (observed) height = 166.5, 168, 166 etc
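The difference between the two error types can be sketched numerically in Python using the height readings from the card. This is only an illustration: `describe_error` is a hypothetical helper, and the 0.5 cm offset mimics a consistently over-reading instrument.

```python
from statistics import mean, pstdev

def describe_error(true_value, observed):
    """Return (average error, spread of errors) for repeated measurements."""
    errors = [o - true_value for o in observed]
    return mean(errors), pstdev(errors)

true_height = 167.0
observed = [166.5, 168.0, 166.0]  # readings scattered by chance

# Random error shows up as spread around the true value.
bias, spread = describe_error(true_height, observed)

# Add a constant 0.5 cm over-reading to every measurement (systematic error).
shifted = [o + 0.5 for o in observed]
shifted_bias, shifted_spread = describe_error(true_height, shifted)
```

The constant offset moves the average error by 0.5 cm but leaves the spread unchanged, which is why systematic error threatens validity while random error threatens reliability.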

operational definitions

Specifying exactly what will be observed and how it will be done ∙ how to measure the variables e.g. measuring socioeconomic status --> total family income + highest level of school completed

Conceptualization

Specifying what we mean by a term ∙ helps translate an abstract theory/ construct into specific variables ∙ makes it possible to test hypotheses ∙ e.g. Are children healthier when they eat well? (what do we mean by "healthier" and "eat well"?)

Test Re-test Reliability

Stability over time: ∙ give same test to same people to see if same results are obtained ∙ choosing the correct time periods b/w tests ∙ characteristic measured does not change over time ∙ e.g. give IQ test twice

"To assess the _____ of the DHQ, 58 pregnant women completed it twice within a 4-5 week interval" which type of reliability assessment was used?

Test-retest reliability

Level of measurement

The degree of precision by which a variable may be assessed

Operationalization

The process of connecting concepts to observations

Measurements can be reliable but not valid. T/F?

True

Poor operationalization

can influence study results (invalid operational definitions) ∙ E.g. operationalizing "success in career" by looking at pay cheque only

Cronbach's Alpha

conceptually, it is the average consistency across all possible split-half reliabilities ∙ can be directly computed from data (0.7 is seen as acceptable) ∙ used for internal consistency reliability ∙ often reported to show scale reliability
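For readers who want to see the computation, here is a minimal Python sketch. It uses the common variance-based formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), which is not spelled out in the card and is brought in here as an assumption. The `scores` table is hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for rows of participants and columns of items."""
    k = len(scores[0])  # number of items
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 participants (rows) x 3 items (columns).
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
```

For this made-up data alpha comes out near 0.96, above the 0.7 threshold mentioned in the card.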

What are the types of validity?

face, content, criterion, construct

Variables are measured at one of four levels which are:

nominal, ordinal, interval, and ratio ∙ (ordered from least to most precise) ∙ The higher the level of measurement, the more precise the measurement process

Good example that uses NOIR

∙ numbers assigned to runners - nominal ∙ rank order of winners - ordinal ∙ performance rating on 0-10 - interval ∙ time to finish - ratio

Split-half reliability

∙ Randomly divide items into 2 subsets and examine the consistency in total scores across the 2 subsets ∙ consistency is assessed within each person's responses
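The split-half procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the data are hypothetical, `pearson` and `split_half_reliability` are illustrative names, the random split is seeded only so the example is repeatable, and consistency between the two half totals is summarized with a Pearson correlation.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(scores, seed=0):
    """Randomly split the items in two and correlate the two half totals."""
    n_items = len(scores[0])
    items = list(range(n_items))
    random.Random(seed).shuffle(items)  # random assignment of items to halves
    half_a, half_b = items[: n_items // 2], items[n_items // 2:]
    totals_a = [sum(row[j] for j in half_a) for row in scores]
    totals_b = [sum(row[j] for j in half_b) for row in scores]
    return pearson(totals_a, totals_b)

# Hypothetical responses: 5 participants (rows) x 4 items (columns).
scores = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]
r = split_half_reliability(scores)
```

A different seed gives a different split and a slightly different coefficient, which is one reason Cronbach's alpha is often reported instead.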

Define reliability

Reproducibility of a measurement ∙ a measurement that is consistent and free from error

Criterion validity

∙ Ability of the tool to predict results obtained on an external criterion or reference standard ∙ How well does test estimate performance in comparison to an external criterion or reference standard ∙ criterion should be a valid indicator of variable of interest and relevant to variable being measured ∙ concurrent validity (current) ∙ predictive validity (future)

Examples of ratio scale

∙ Age (days, months, years) ∙ Height (cm, inches) ∙ Nutrient intake (kcal/day) ∙ Length of hospital stay (days) ∙ Medical costs ($)

Ordinal scale

∙ Assignment of values along some underlying dimension ∙ One observation is ranked above or below another (variables are ordered) ∙ BUT you cannot say how much more or less one observation is than another

Interval scale

∙ Assignment of values with equal distances between points ∙ one score differs from another on a scale that has equally appearing intervals ∙ arbitrary zero ∙ BUT cannot say the amount of difference is an exact representation of differences in the variable being studied (i.e. not why)

Criterion Validity Examples

∙ Brief beverage and snack questionnaire (target tool) compared with intakes from FFQ/24 hr recall (criterion) ∙ FFQ (target tool) compared with objective measure such as blood biomarker (criterion) ∙ Questionnaire to assess childcare nutrition environment (target tool) compared with child observation or caregiver interview (criterion)

Internal consistency

∙ Consistency of underlying measures ∙ Extent to which items measure the same characteristic ∙ degree of correlation/consistency among items in a scale ∙ correlate performance of each item with overall performance across participants.

Face validity

∙ Does the measuring instrument appear to test what it is supposed to? ∙ Does it appear to be valid to the persons completing it?

define construct validity

∙ Extent to which test results are related to underlying construct ∙ Difficult to establish - often use a combination of methods ∙ Can try to compare to "gold standard" if there is one... (chicken and egg). ∙ how well does it measure an abstract concept (e.g. how well does a questionnaire measure healthy eating)

Absolute vs. Relative Validity

∙ For "absolute" validity you need a true gold standard that is an exact measure of what the tool is intended to capture ∙ in nutrition studies, we seldom achieve absolute validity

Importance of levels of measurements

∙ How a variable is measured can determine the amount of information we obtain (e.g. measuring elevated body mass as BMI gives more information than overweight/underweight) ∙ The level of measurement affects the types of statistical tests you can use.

Content validity

∙ How well the items represent the entire universe of items ∙ ask an expert: does this instrument measure everything it is supposed to? ∙ think about a short FFQ to assess vitamin D intake. Are all relevant vitamin D sources covered? ∙ Does HEI capture key aspects of dietary quality in nutritional guidelines?

Rater reliability (inter/intra)

∙ Inter: consistency b/w raters (two raters judge the same event/behavior and assess agreement) ∙ Intra: stability of measures by the same person (to assess effects of bias, fatigue etc.)

Why should we care about levels of measurements?

∙ Measurement should be as precise as possible ∙ In social science, variables are often measured at the nominal or ordinal level (but in nutritional/food science and dietetics we often want precise data such as blood glucose, Na concentration) ∙ How a variable is measured can determine the level of precision ∙ Affects the types of statistical tests you can use.

Challenges in measuring change

∙ Need valid & reliable tools of measurement ∙ level of original measurements matters ∙ reliability of the tool ∙ starting point matters (floor/ceiling effects) ∙ Variables change naturally over time (and people can get better at taking tests with practice) ∙ can you detect the difference? how much is clinically important?

Vitamin D intake measurements

∙ Nominal = Vit D supplements (yes/no) ∙ Ordinal = intake reported as exceeds/meets/does not meet DRI ∙ Interval = set RDA as 0 and report scores as (+) above or (-) below RDA (e.g. for RDA 600 IU/d, an intake of 1000 IU/d would be reported as +400) ∙ Ratio = Vitamin D intake as IU/d
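The four codings above can be sketched as one small Python function. This is an illustration only: `vitamin_d_levels` and its parameter names are hypothetical, and the 600 IU/d reference value is the RDA used in the card's own example.

```python
def vitamin_d_levels(intake_iu, takes_supplement, rda_iu=600):
    """Express one person's vitamin D data at each level of measurement."""
    if intake_iu > rda_iu:
        ordinal = "exceeds"
    elif intake_iu == rda_iu:
        ordinal = "meets"
    else:
        ordinal = "does not meet"
    return {
        "nominal": "yes" if takes_supplement else "no",  # supplement use
        "ordinal": ordinal,                               # ranked category
        "interval": intake_iu - rda_iu,                   # RDA set as 0
        "ratio": intake_iu,                               # IU/d, true zero
    }

# The card's example: 1000 IU/d against an RDA of 600 IU/d.
person = vitamin_d_levels(intake_iu=1000, takes_supplement=True)
```

Only the ratio coding keeps enough information to rebuild the other three, which shows how precision is lost as you move down the levels.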

Ratio scale

∙ Possess all the properties of the nominal, ordinal, interval scales ∙ has an absolute zero point ∙ CAN say values differ, by how much, and what this means

What are the types of errors made in measurements?

∙ Systematic error ∙ Random error

What are the factors that can lower validity?

∙ Tests that are too short ∙ Identifiable patterns of answers ∙ Unclear directions/vocabulary ∙ Time limits ∙ Level of difficulty ∙ Poorly structured test items ∙ Difficult reading vocabulary and sentence structure

How is reliability measured?

∙ Using a correlation coefficient (r) ∙ indicates how scores on one test change relative to scores on a 2nd test ∙ can range from -1 to +1 (+1 = perfect reliability, while 0 = no reliability)
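As a worked illustration, the Python sketch below computes r for a hypothetical test given twice to the same five people. The scores are made up, and `pearson` is an illustrative helper rather than a prescribed method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same 5 people on two administrations.
first_test = [10, 12, 14, 16, 18]
second_test = [11, 12, 15, 15, 19]

r = pearson(first_test, second_test)
```

Here r is about 0.96: people who scored high the first time also scored high the second time, so the scores are highly consistent.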

Examples of interval questions

∙ What is your GPA? (4 vs. 2 doesn't mean you are twice as smart) ∙ What is your IQ? (120 vs. 40 doesn't mean you are 3 times as smart) ∙ What temperature (C) was it when you left the house? ∙ What is your score on the "Healthy Eating Index"? ∙ Year (AD, BC): equal intervals (1980-81 is the same as 2010-11, but year 0 is not the beginning of time)

Approaches for establishing validity

∙ correlate new test with an established test ∙ show that people with and without certain traits score differently ∙ determine whether tasks required on test are consistent with theory guiding test development

