Reliability and validity

What are the three kinds of validity?

1) Face validity 2) Content validity 3) Criterion validity

What are the three types of reliability?

1) Over time (test-retest reliability) 2) Across items (internal consistency) 3) Across different researchers (inter-rater reliability)

Reliability - thresholds

1 = perfect reliability; >0.9 = excellent; 0.8-0.9 = good; 0.7-0.8 = acceptable; 0.6-0.7 = questionable; 0.5-0.6 = poor; <0.5 = unacceptable; 0 = no reliability.
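A minimal sketch of these cutoffs as code (the function name is hypothetical; it just restates the labels above for a coefficient such as Cronbach's alpha):

```python
# Hypothetical helper: map a reliability coefficient to the
# qualitative labels from the thresholds card above.
def interpret_reliability(coef: float) -> str:
    if coef >= 1.0:
        return "perfect"
    if coef > 0.9:
        return "excellent"
    if coef > 0.8:
        return "good"
    if coef > 0.7:
        return "acceptable"
    if coef > 0.6:
        return "questionable"
    if coef > 0.5:
        return "poor"
    if coef > 0.0:
        return "unacceptable"
    return "no reliability"

print(interpret_reliability(0.85))  # -> "good"
```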

internal consistency

A measure of reliability; the degree to which a test yields similar scores across its different parts, such as on odd versus even items. On a multiple-item measure, all items are supposed to reflect the same underlying construct, so people's scores on a set of items should be correlated with each other. E.g. on a loneliness scale, a person who agrees that they feel left out should also agree that they feel alone. If people's responses to the different items aren't correlated with each other, it no longer makes sense to claim that the items are all measuring the same underlying construct.
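A minimal sketch of this idea, using numpy and made-up responses to a three-item loneliness scale (the data are illustrative only):

```python
import numpy as np

# Made-up respondents x items matrix: rows = people, columns = items
# such as "I feel left out" and "I feel alone".
items = np.array([
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [4, 5, 4],
    [2, 1, 2],
])

# Correlation matrix across items (columns); the off-diagonal entries
# should be clearly positive if all items tap the same construct.
print(np.corrcoef(items, rowvar=False).round(2))
```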

Convergent validity: criterion validity

Criteria can also include other measures of the same construct; this is known as convergent validity. For example, one would expect a new anxiety measure to be positively correlated with existing measures of the same construct.

Inter-rater reliability

The extent to which different observers are consistent in their judgements. Assessed using Cronbach's alpha when the judgements are quantitative, or Cohen's kappa (κ) when the judgements are categorical.
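A minimal sketch of Cohen's kappa for two raters, computed by hand from made-up categorical judgements (the raters, items, and labels are all illustrative):

```python
from collections import Counter

# Made-up categorical judgements from two raters on the same 8 items.
rater1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
n = len(rater1)

# Observed agreement: proportion of items the raters label identically.
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

# Expected chance agreement, from each rater's marginal proportions.
c1, c2 = Counter(rater1), Counter(rater2)
p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater1) | set(rater2))

kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")
```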

Discriminant validity

The extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct (the degree to which items designed to measure different constructs discriminate between each other). A new scale shouldn't correlate with other scales designed to measure different constructs.

Internal consistency: Cronbach's α

Refers to how closely related a set of items are as a group; the extent to which different items on the same test (or on the same subscale of a larger test) correlate with each other. The alpha coefficient ranges from 0 to 1: the higher the value, the more reliable the scale. A value of +.70 or greater is generally taken to indicate good internal consistency.
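A minimal sketch of Cronbach's alpha computed from its standard variance formula, using numpy and a made-up respondents-by-items score matrix:

```python
import numpy as np

# Made-up scores: rows = respondents, columns = items on one scale.
scores = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 1],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
])

k = scores.shape[1]                         # number of items
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"alpha = {alpha:.2f}")  # +.70 or greater suggests good consistency
```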

How can internal consistency be assessed?

The split-half method!

Predictive validity:

The success with which a test predicts the behavior it is designed to predict; it is assessed by computing the correlation between test scores and the criterion behavior. How well does the test predict something in the future, such as job performance or degree grade?

What's the split half method?

This method involves splitting the items in the questionnaire into two halves, with each half measuring the same elements but in slightly different ways (an even/odd item split, for example). If the scale is very reliable, a person's score on one half of the scale should be the same as or similar to their score on the other half. A split-half correlation of +.80 is generally considered to indicate good internal consistency/reliability. (-) The problem with this method is that there are several ways a set of items can be split into two, so the result could be a product of the way the data were split.
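A minimal sketch of an odd/even split-half check, using numpy and a made-up score matrix (rows = respondents, columns = items):

```python
import numpy as np

# Made-up responses to a six-item scale.
scores = np.array([
    [4, 5, 4, 5, 4, 4],
    [2, 1, 2, 1, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
])

odd_half = scores[:, 0::2].sum(axis=1)   # totals over items 1, 3, 5
even_half = scores[:, 1::2].sum(axis=1)  # totals over items 2, 4, 6

r = np.corrcoef(odd_half, even_half)[0, 1]
print(f"split-half correlation = {r:.2f}")  # +.80 or above is good
```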

Concurrent vs predictive validity: criterion validity

When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity. However, when the criterion is measured at some point in the FUTURE (after the construct has been measured), it is referred to as predictive validity (as it concerns a future outcome).

Reliability is:

The degree of consistency with which an instrument measures the attribute it is supposed to be measuring. Reliability = the stability, consistency, and dependability of a measuring tool; it can reproduce consistent results. The LESS VARIATION an instrument produces, the HIGHER ITS RELIABILITY.

face validity (content validity)

Refers simply to whether the test looks as though it's measuring what it's supposed to. Assessed informally, such as at a pilot stage where researchers ask pilot participants "does this appear to measure X?". Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth or whether they think they have good qualities, so a questionnaire that included these kinds of items would have good face validity.

criterion validity (predictive validity)

The degree to which test scores are consistent with some other criterion (a variable or variables) of the characteristic being assessed. A criterion can be any variable that one has reason to think should be correlated with the construct being measured. You would expect test anxiety scores to be positively correlated with general anxiety and with blood pressure during an examination (positive correlation), and negatively correlated with exam performance and grades (negative correlation).
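A minimal sketch of this check with made-up data, correlating test anxiety with one criterion expected to be positive (blood pressure during an exam) and one expected to be negative (exam score):

```python
import numpy as np

# Made-up scores for 8 students.
test_anxiety = [8, 3, 6, 2, 9, 4, 7, 1]
blood_pressure = [140, 118, 132, 115, 145, 122, 136, 110]
exam_score = [55, 80, 62, 85, 50, 75, 58, 90]

r_bp = np.corrcoef(test_anxiety, blood_pressure)[0, 1]
r_exam = np.corrcoef(test_anxiety, exam_score)[0, 1]
print(f"anxiety vs blood pressure: r = {r_bp:.2f}")  # expected positive
print(f"anxiety vs exam score:     r = {r_exam:.2f}")  # expected negative
```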

content validity

The extent to which a test covers the behavior that is of interest (the construct of interest). It is assessed by carefully checking the measurement method against the conceptual definition of the construct.

Validity

The extent to which a test measures or predicts what it is supposed to; the extent to which the scores from a measure represent the variable they are intended to represent.

concurrent validity

The extent to which two measures of the same trait or ability agree; how well the test correlates with other established tests administered at around the same time.

test-retest reliability

Using the same test on two occasions to measure consistency (e.g. a questionnaire). (-) A drawback for many tools is that the second administration of the measure is not under exactly the same conditions as the first. Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group at a later time, and then looking at the test-retest correlation between the two sets of scores, typically computed as Pearson's r. In general, a test-retest correlation of +.80 is considered to indicate good reliability. High test-retest correlations make sense when the construct being measured is assumed to be CONSISTENT over time, such as self-esteem or intelligence. (-) However, some constructs are not assumed to be stable over time (e.g. mood); for these, a low test-retest correlation over a period of a month would not be a problem.
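A minimal sketch of a test-retest correlation, using scipy's pearsonr on made-up time-1 and time-2 scores for the same group:

```python
from scipy.stats import pearsonr

# Made-up scores for the same 8 people at two administrations.
time1 = [32, 25, 40, 28, 35, 22, 38, 30]
time2 = [30, 27, 41, 26, 36, 24, 37, 31]

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # +.80 or above indicates good reliability
```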

