Research Methods - Chapter 5


reliability as item-total correlation

Another way to calculate reliability as internal consistency: correlate each person's score on an individual item with their total score on the remaining items. Items that correlate poorly with the rest of the scale are likely measuring something other than the intended construct (or mostly random error) and are candidates for revision or removal.
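
A minimal sketch of this idea using hypothetical Likert-type responses (the item names q1-q4 and the data are invented, not from the course materials):

```python
# A minimal sketch (hypothetical Likert responses, not course data):
# corrected item-total correlations for a 4-item scale.
import pandas as pd

# Rows = respondents, columns = items (1-5 Likert responses)
items = pd.DataFrame({
    "q1": [4, 5, 3, 4, 2, 5],
    "q2": [4, 4, 3, 5, 2, 5],
    "q3": [3, 5, 2, 4, 1, 4],
    "q4": [5, 4, 3, 4, 2, 5],
})

for col in items.columns:
    rest_total = items.drop(columns=col).sum(axis=1)  # total of the remaining items
    r = items[col].corr(rest_total)                   # item-total correlation
    print(f"{col}: r = {r:.2f}")
```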

Construct Validity (1) Face validity

(1) Face validity - the extent to which the measured variable appears, on its face, to measure the construct. Example: for the construct emotional intelligence, face-valid items would be "I am good at judging others" and "I am in control of my emotions." Because the purpose of such items is obvious to respondents, face-valid measures are prone to reactivity and dishonesty.

Validity

Although reliability shows the extent to which our measure is free from random error, it tells us little about whether we are measuring the construct we intend to measure (e.g., a math test written in overly difficult language may end up measuring language ability rather than math ability). Therefore, in addition to being reliable, a measure should be valid - it should measure the construct it was designed to measure.

Random Error

Measured variables may contain random error that affects the consistency of the results. Participant sources: misunderstanding the question; mood, amount of sleep, weather; being tested in a different place; etc. Experimenter sources: misprinting the materials; mistakes in recording answers; etc. Random error usually does not alter results very much because it tends to cancel itself out across observations.

Systematic Error

A measured variable may also contain systematic error: the measured variable is influenced by other conceptual variables, and these relationships are not taken into account by the researcher. While random error is self-cancelling, systematic error can significantly alter results by systematically increasing or decreasing scores on the measured variable. Example: high self-esteem -> systematically lower scores on an anxiety measure.

(2) Equivalent forms reliability States

personality variables that are expected to vary over time within the same person (e.g. feelings, mood)

(2) Equivalent forms reliability Traits

stable personality variables that are not expected to vary much within people over time (e.g. extroversion)

Criterion validity (2) Predictive validity

(2) Predictive validity - establishing that the scores from a measurement procedure (e.g., a test or survey) make accurate predictions about the construct they represent (e.g., constructs like intelligence, achievement, burnout, depression). To test for predictive validity, the criterion - the well-established measurement - is taken some time (months or years) after the new measurement procedure. Example: universities often use ACT (American College Testing) or SAT (Scholastic Aptitude Test) scores to help with student admissions because there is strong predictive validity between these tests of intellectual ability and later academic performance, where academic performance is measured as freshman (first-year) GPA (grade point average) at university.
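
A rough illustration with invented numbers (not real admissions data): predictive validity can be quantified as the correlation between the earlier test scores and the criterion measured later.

```python
# A rough illustration (invented numbers, not real admissions data):
# predictive validity as the correlation between earlier test scores
# and a criterion (freshman GPA) measured later.
from scipy.stats import pearsonr

sat_scores   = [1050, 1180, 1300, 990, 1420, 1220, 1100, 1350]
freshman_gpa = [2.8, 3.1, 3.4, 2.5, 3.8, 3.2, 2.9, 3.6]

r, p = pearsonr(sat_scores, freshman_gpa)
print(f"predictive validity: r = {r:.2f} (p = {p:.3f})")
```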

split-half reliability

One way to calculate reliability as internal consistency: split the test into two halves, compute each person's score on each half, and then correlate the two half-scores across respondents. How to divide the items? Odd-even, or first half/second half. A general statistical measure of internal consistency is Cronbach's alpha, ranging from 0.0 (completely unreliable) to 1.0 (perfectly reliable and consistent).
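
A minimal sketch with simulated data: the odd-even split-half correlation (stepped up to full test length with the Spearman-Brown correction, a standard adjustment not mentioned in the card) alongside Cronbach's alpha computed from the same items.

```python
# A minimal sketch with simulated data: odd-even split-half reliability
# (with the Spearman-Brown step-up to full test length) and Cronbach's alpha.
import numpy as np

rng = np.random.default_rng(0)
true_score = rng.normal(0, 1, size=(200, 1))
items = true_score + rng.normal(0, 0.8, size=(200, 8))  # 8 items sharing one true score, plus noise

# Split-half: correlate each person's score on the odd items with their score on the even items
odd_total = items[:, 0::2].sum(axis=1)
even_total = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_total, even_total)[0, 1]
split_half = 2 * r_half / (1 + r_half)  # Spearman-Brown correction

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))

print(f"split-half reliability = {split_half:.2f}, Cronbach's alpha = {alpha:.2f}")
```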

Criterion validity (1) Concurrent validity

(1) Concurrent validity - two different measurement procedures are carried out at the same time and their scores compared. Example: comparing scores on an algebra exam with course grades in college algebra to determine the degree to which exam scores are related to performance in the college algebra class. Note: both measurements happen at the same time.

Improving reliability and validity

(1) Conduct a pilot test - try the survey on a small group of people: troubleshoot poor items, make sure respondents understand the items correctly, use free-format self-report to develop fixed-format items, etc.
(2) Use multiple measures - the more items measuring the same construct, the more reliable the scale.
(3) Ensure variability (discrimination) of your measure - make sure the items actually discriminate among respondents.
(4) Write good, nonreactive items - avoid ambiguous wording (more on this later).
(5) Develop clear instructions and stress the importance of accurate answers.
(6) Consider face and content validity when creating items - include a broad range of reasonable questions in your scale.
(7) Prefer existing, validated scales where possible - they have already been shown to be valid and are likely to deliver reliable results.

Reliability - Classification (1) Test-Retest reliability

(1) Test-Retest reliability - the extent to which scores on the same measured variable correlate across two different testing occasions; i.e., the correlation between scores from the first administration and scores from the second administration of the same test. Example: a test designed to assess student learning in psychology could be given to a group of students twice, with the second administration coming perhaps a week after the first; the obtained correlation coefficient indicates the stability of the scores. Advantages: a smaller sample is needed, since each person is measured twice. Disadvantages: only appropriate for measuring stable traits; carry-over/practice effects between administrations; changes of mood; reactivity; and choosing the time between tests (too short - carry-over effects, too long - real changes in mood or knowledge).
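
A minimal sketch with hypothetical scores: test-retest reliability is simply the correlation between the two administrations.

```python
# A minimal sketch (hypothetical scores): test-retest reliability is the
# correlation between the two administrations of the same test.
import numpy as np

time1 = [72, 85, 90, 64, 78, 88, 70, 95]  # first administration
time2 = [75, 82, 93, 66, 74, 90, 72, 94]  # same students, one week later

r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability: r = {r:.2f}")
```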

Construct Validity (2) Content validity

(2) Content validity - the degree to which the measured variable adequately samples from the pool (domain) of potential questions that relate to the conceptual variable of interest. Example: a measure of math ability should include geometry, algebra, and arithmetic questions, not just geometry.

Reliability - Classification (2) Equivalent forms reliability

(2) Equivalent forms reliability - the correlation between scores on two different but equivalent versions of the same measure given at different times; designed to overcome the shortcomings of test-retest reliability. Example: to evaluate the reliability of a critical thinking assessment, you might create a large set of items that all pertain to critical thinking and then randomly split them into two sets, which would represent the parallel forms. Advantages: because the two forms of the test are different, carry-over effects are less of a problem; it is reasonable to assume the effect will not be as strong with alternate forms as with two administrations of the same test. Disadvantages: in practice it is not possible to verify that the two tests really are parallel, and taking the first test may still change responses to the second test.

Construct Validity (3) Convergent validity

(3) Convergent validity - the extent to which a variable that measures a certain construct (e.g., anxiety) correlates with other variables that measure the same construct. Example: if you developed a survey to measure emotional intelligence, you want total scores on your survey to be highly correlated with scores on established emotional intelligence scales.

Construct Validity (4) Discriminant validity

(4) Discriminant validity - the extent to which a variable that measures a certain construct (e.g., anxiety) is unrelated to variables that measure a different construct (e.g., emotional intelligence). Example: if you developed a survey to measure emotional intelligence, you do NOT want total scores on your survey to be highly correlated with scores on established scales of unrelated constructs.
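
A minimal sketch that checks both convergent and discriminant validity at once (all scores are hypothetical): the new emotional intelligence survey should correlate highly with an established EI scale, and only weakly with an unrelated measure such as math ability.

```python
# A minimal sketch (all scores hypothetical): the new emotional-intelligence (EI)
# survey should correlate highly with an established EI scale (convergent validity)
# and only weakly with an unrelated construct such as math ability (discriminant validity).
import numpy as np

new_ei_survey  = np.array([34, 41, 28, 45, 38, 30, 42, 36])
established_ei = np.array([36, 44, 27, 47, 40, 31, 43, 35])
math_ability   = np.array([88, 52, 70, 65, 91, 58, 74, 80])

convergent_r   = np.corrcoef(new_ei_survey, established_ei)[0, 1]  # expect high
discriminant_r = np.corrcoef(new_ei_survey, math_ability)[0, 1]    # expect near zero

print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```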

(3) Reliability as Internal Consistency

- the extent to which scores on the items of a measure correlate with each other and thus reflect the true score rather than random error: Observed score = True score + Random error, e.g. 120 (observed) = 115 (true) + 5 (random error); Reliability = True score / Observed score. Example: for a scale (set of items) measuring the same state (anxiety), if you score high on one anxiety item you are expected to score high on the rest of the items; therefore, the average correlation among item scores should be close to 1.0 to indicate high consistency. Advantages: suitable for measuring unstable states (e.g., fatigue); only one test administration is required. Disadvantages: only appropriate for tests that measure a single trait; not suitable for timed tests; cannot be used if the test's items cannot be meaningfully divided (for example, the Minnesota Multiphasic Personality Inventory has subscales measuring different behaviors such as depression, schizophrenia, and social introversion, so the split-half method would not be an appropriate way to assess reliability for that personality test).

(4) Inter-rater reliability

- the degree to which different judges or raters agree in their assessment decisions. This type of reliability is more relevant when evaluating, say, artwork than math problems. Example: course evaluations or presentation evaluations. A common statistic is Cohen's kappa: if the raters are in complete agreement, κ = 1; if there is no agreement among the raters beyond what would be expected by chance, κ = 0. Problem: if two researchers observe 'aggressive behavior' of children at a nursery without an agreed operational definition, each will have their own subjective opinion of what counts as aggression; they are unlikely to record the behavior in the same way, and the data will be unreliable.
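
A minimal sketch with made-up ratings, using scikit-learn's cohen_kappa_score to quantify agreement between two raters:

```python
# A minimal sketch (made-up ratings): Cohen's kappa for two raters who each
# grade the same ten presentations as "pass" or "fail".
from sklearn.metrics import cohen_kappa_score

rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 1 = perfect agreement, 0 = chance-level agreement
```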

Validity Construct Validity

Construct validity - the extent to which a measured variable actually measures the conceptual variable it is designed to assess, e.g. the extent to which questions in a survey designed to measure bias are indeed measuring bias.

Validity Criterion validity

Criterion validity - reflects the use of a criterion, a well-established measurement procedure, when creating a new measurement procedure for the construct you are interested in. It is assessed as the correlation between the well-established scale and the new one (the two scales must be theoretically related). Typical reasons for creating a new measure: to produce a shorter version, or to capture interesting aspects that are not captured in the original scale.

Classical Test Theory (CTT):

Observed score = True score + Random error, e.g. 120 (observed score) = 115 (true score) + 5 (random error). Reliability = True score / Observed score; more formally, reliability is the proportion of observed-score variance that is true-score variance. If True score = Observed score (no random error), reliability = 1.0.
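
A small simulation illustrating the decomposition (assumed parameters, chosen for illustration: true scores with SD 15, random error with SD 5). With the variance formulation, reliability works out to about 225/250 = 0.90.

```python
# A small simulation of the CTT decomposition (assumed parameters: true-score SD = 15,
# random-error SD = 5). Reliability is estimated as the share of observed-score
# variance that is true-score variance: 15^2 / (15^2 + 5^2) = 0.90.
import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(100, 15, size=5000)          # latent true scores
observed = true_scores + rng.normal(0, 5, size=5000)  # observed = true + random error

reliability = true_scores.var() / observed.var()
print(f"estimated reliability = {reliability:.2f}")
```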

Evaluating Measures - Reliability

Reliability is the overall consistency of a measure: a measure has high reliability if it produces similar results under consistent conditions. Equivalently, reliability is the extent to which a measure is free from random error. For example, measurements of adult weight are often extremely reliable, and measurements of adult height are even more reliable.

The relationship between reliability and validity:

Reliability = consistency (how tightly grouped the measurements are); validity = accuracy (how close they are to the target). Validity is considered harder to achieve than reliability. A test can be reliable without being valid, but it cannot be valid without being reliable: reliability is a precondition for validity.

