SOCI1001 - RELIABILITY & VALIDITY
The major forms of validity are
• Content validity • Criterion-related validity (concurrent and predictive) • Construct validity • Face validity
TEST RE-TEST
This method estimates reliability by administering the same test to the same people on different occasions and correlating the scores obtained. • This is the most basic form of reliability assessment and also one of the easiest ways to estimate the reliability of an empirical measurement.
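As a minimal sketch of the test-retest idea, the snippet below correlates scores from two administrations of the same test. The score lists and the `pearson_r` helper are illustrative assumptions, not data or code from the course; a correlation close to 1 would suggest a stable measure.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people on two occasions.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

# The correlation between occasions is the reliability estimate.
test_retest_r = pearson_r(time_1, time_2)
print(f"Test-retest reliability estimate: {test_retest_r:.2f}")
```

How high the correlation must be to count as "reliable" is a judgement call that depends on the measure and the field.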
If what you are measuring is stable and the indicator has stability reliability, then the results will be
constant.
STABILITY RELIABILITY
This addresses the extent to which a measure produces the same results when applied to the same group at different time periods.
Content validity
a reflection of the extent to which the measuring device used covers the entire range of meanings of the concept being studied.
For a test to be ________ it also needs to be ______.
Valid; Reliable
Construct validity
concerned with the logical relationship between variables. It is dependent on theory and must be placed in a theoretical context. • Moser and Kalton (1971, p. 355), in capturing this approach, state: "on the basis of theoretical considerations, the researcher postulates the types and degrees of association between the scale and other variables and he examines these associations to see whether they conform to his expectations."
Construct validity has two components:
Convergent validity and divergent (discriminant) validity
Stability reliability can be assessed
using the test-retest method
Disadvantages of Test Re-Test
• Affected by practice and memory • Influenced by events that might occur between testing sessions. • Requires the administration of two tests
There are three types of reliability:
• Stability reliability • Representative reliability • Equivalence reliability
Criterion-related validity has two components:
Concurrent and Predictive
Assessing ___________________ validity involves establishing that the scores from a measurement procedure (e.g., a test or survey) make accurate predictions about the construct they represent.
predictive
Predictive validity example
The new intelligence test is a new measurement procedure, equivalent to an IQ test, that is designed to detect the highest levels of intellectual ability. • A sample of students takes the new test just before they go off to university. • After one year, the GPA scores of these students are collected. • The aim is to assess whether there is a strong, consistent relationship between the scores from the new measurement procedure (i.e., the new intelligence test) and the scores from the well-established measurement procedure (i.e., the GPA scores). • If such a strong, consistent relationship is demonstrated, we can say that the new measurement procedure (i.e., the new intelligence test) has predictive validity.
RELIABILITY
According to Carmines and Zeller (1979, p. 11), "reliability is the extent to which an experiment, test or any measuring procedure yields the same results on repeated trials." Dixon et al. (1987, p. 102) describe it in terms of different researchers obtaining the same results when measuring the same phenomenon with the same measuring device.
Disadvantage of face validity
Assessing face validity might involve simply showing your survey to a few untrained individuals to see whether the items look okay to them. It is the least scientific of all the validity measures and is often confused with content validity.
We assess the __________________ validity of a measurement procedure when two different measurement procedures are carried out at the same time.
Concurrent
Steps of content validity
First, the full domain of content relevant to the particular measurement situation is identified. Next, a set of items intended to reflect the given content of the concept is formulated. After this, a sample of the items is obtained for use in the testing situation. Lastly, the test items are arranged in a "testable" form for administration.
Split Half Steps
Only one administration of a test is required. The total set of test items is halved, which may be done in a variety of ways. The most common is to place the even-numbered items in one group and the odd-numbered items in the other. • Another method of halving is to administer the first half and then the second half. The estimate of reliability is then obtained in a two-step process. First, the scores on the two halves are correlated to obtain the estimate for the two halves. • This allows the researcher to determine whether the halves of the test are measuring the same quality or characteristic. • Then a statistical correction known as the Spearman-Brown prophecy formula is applied to find the estimate for the whole test.
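The two-step process above can be sketched as follows, assuming an odd/even split and the standard two-half form of the Spearman-Brown formula, r_full = 2 × r_half / (1 + r_half). The response data and the function names (`pearson_r`, `spearman_brown`, `split_half_reliability`) are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Step 2: correct the half-test correlation to full test length."""
    return (2 * r_half) / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per respondent."""
    # Split the items: odd-numbered (1, 3, 5, ...) vs even-numbered (2, 4, 6, ...).
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    # Step 1: correlate the scores on the two halves.
    r_half = pearson_r(odd_totals, even_totals)
    return r_half, spearman_brown(r_half)

# Hypothetical responses: five respondents, six items each, one administration.
responses = [
    [4, 5, 4, 5, 3, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 1, 2, 1],
]

r_half, r_full = split_half_reliability(responses)
print(f"Half-test correlation: {r_half:.2f}")
print(f"Spearman-Brown estimate for the whole test: {r_full:.2f}")
```

The correction is needed because each half is only half as long as the full test, and shorter tests tend to give lower reliability estimates.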
What are the two basic properties of measuring devices, such as experiments or tests, used in social research? They also serve as tools to determine the existence of these properties. • One may exist without the other, but both are needed for the device to be accepted as scientifically useful.
Reliability and validity
REPRESENTATIVE RELIABILITY
This evaluates the extent to which the indicator or measure produces the same results when applied to different populations, such as those of different ages, genders, or groups. • To have this form of reliability, the indicator must consistently produce accurate results when applied to different groups.
External Validity
This is about generalization. To what extent can an effect in research be generalized to populations, settings, treatment variables, and measurement variables? (Can we generalize to other persons, places, times?)
Validity
This is concerned with matching a concept to be measured with the device being used to measure it. According to Babbie (1979, p. 16), this is evidenced by "the degree that a particular indicator measures what it is supposed to measure rather than reflecting some other phenomenon".
Advantages of Split-Half
This is the most widely used of the three forms and is considered superior because, by administering only one test, it eliminates the difficulty of constructing parallel forms that arises in the alternative form approach. • It also eliminates the problem of respondents' memory and experience inflating the reliability estimate, as happens in the test-retest method. • Furthermore, by dividing its test items into halves, which incidentally causes it to approximate the alternative form, it is good at determining the internal consistency of a test relatively quickly.
Disadvantages of split half method
The splitting of the test items into halves also serves as a limitation, since different ways of splitting can produce different reliability estimates, and causes this form not to be recommended for estimating reliability.
EQUIVALENCE RELIABILITY
This applies when a researcher has multiple indicators of a construct. "It addresses the question: Does the measure yield consistent results across different indicators?" (Neuman, 2003, p. 180). It can be assessed using the split-half method.
Split-half method
This method establishes equivalence reliability by dividing the indicators of the same construct into two groups in order to determine whether they produce the same results.
Convergent validity
helps to establish construct validity when you use two different measurement procedures and research methods (e.g., participant observation and a survey) in your study to collect data about a construct.
Divergent (discriminant) validity
This type of validity helps to establish construct validity by demonstrating that the construct you are interested in (e.g., sleep quality) is different from other constructs that might be present in your study (e.g., sleep quantity).
Face validity
it is merely a subjective, superficial assessment of whether the measurement procedure you use in a study appears to be a valid measure of a given variable or construct.
External invalidity
refers to the possibility that conclusions from a survey cannot be generalized to the real world.
Internal Validity
refers to whether the effects observed in a study are due to the manipulation of the independent variable and not some other factor. (Is the relationship causal?)
Criterion validity
reflects the use of a criterion (a well-established measurement procedure) to create a new measurement procedure to measure the construct you are interested in. This is demonstrated when there is a strong relationship between the scores from the two measurement procedures, which is typically examined using a correlation.
If the results are the same or tend to be consistent across repeated administrations or measurements, the device would be deemed __________________ or ___________________. Conversely, should the scores fluctuate considerably, the device would be deemed __________________ or _____________________.
reliable or to have (high) reliability; unreliable or to have low reliability.
Divergent example
we are unsure whether sleep quality and sleep quantity are part of the same construct or are two different constructs. We ask participants to complete a survey, as well as observing participants whilst sleeping. However, the survey contains (a) questions that measure sleep quality and (b) questions that measure sleep quantity. Similarly, when we observe participants, we record scores separately for (a) sleep quality and (b) sleep quantity. • In order to assess whether the two constructs (i.e., sleep quality and sleep quantity) are different, we first need to find that both constructs have convergent validity. Therefore, there should be a strong relationship between the survey scores and observational scores for (a) sleep quality and (b) sleep quantity. Next, we need to find that these two constructs are distinct; that is, that we have divergent validity. Therefore, there should be little or no relationship between (a) the survey scores for sleep quality and the survey scores for sleep quantity and (b) the observational scores for sleep quality and the observational scores for sleep quantity. If this is the case, we can be more confident that sleep quality and sleep quantity are, in fact, two separate constructs.
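The sleep example can be sketched as a small set of correlations: same construct measured by different methods (convergent) and different constructs measured by the same method (divergent). All scores below are invented for illustration, and `pearson_r` is a hypothetical helper, not part of the course material.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for six participants.
survey_quality = [1, 2, 3, 4, 5, 6]
observed_quality = [2, 1, 3, 5, 4, 6]
survey_quantity = [7, 6, 8, 8, 6, 7.5]       # e.g. hours of sleep
observed_quantity = [7.5, 6, 8, 7.5, 6.5, 7]

# Convergent validity: same construct, different methods (expect strong).
convergent = {
    "quality (survey vs observation)": pearson_r(survey_quality, observed_quality),
    "quantity (survey vs observation)": pearson_r(survey_quantity, observed_quantity),
}

# Divergent validity: different constructs, same method (expect weak).
divergent = {
    "survey (quality vs quantity)": pearson_r(survey_quality, survey_quantity),
    "observation (quality vs quantity)": pearson_r(observed_quality, observed_quantity),
}

for label, r in convergent.items():
    print(f"Convergent, {label}: r = {r:.2f}")
for label, r in divergent.items():
    print(f"Divergent, {label}: r = {r:.2f}")
```

With these made-up numbers the convergent correlations are strong and the divergent ones are weak, which is the pattern that would support treating sleep quality and sleep quantity as separate constructs.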
Example of concurrent validity
we want to know whether the new measurement procedure really measures intellectual ability. • A sample of students complete the two tests (e.g., the GSAT test (already established measure) and the new measurement procedure). • There is little if any interval between the taking of the two tests. • Participants who score high on the GSAT test would also score high on the new measurement test; and the same would be said for medium and low scores.
Reasons for criterion validity:
• (a) to create a shorter version of a well-established measurement procedure; • (b) to account for a new context, location, and/or culture where well-established measurement procedures need to be modified or completely altered; and • (c) to help test the theoretical relatedness and construct validity of a well-established measurement procedure.
Factors Contributing to the unreliability of a test:
• Familiarity with the particular test • Fatigue • Stress • Physical conditions of the room in which the test is given • Fluctuation of human memory • Amount of practice or experience gained outside of the experience being evaluated by the test. A test that is overly sensitive to the above items is not reliable.
TEST RE-TEST Advantages
• Requires only one form of a test. • Provides information as to the consistency of the test over time.
