Chapter 3: Defining and Measuring Variables
Although operational definitions are used to measure and define a variety of constructs, such as beauty, hunger, and pain, the most familiar example is probably the
IQ test
validity
- the degree to which the measurement process measures the variable that it claims to measure
The accuracy of a measurement is
the degree to which the measurement conforms to the established standard
Ex. of an operational definition (the most familiar one)
- "intelligence" is a hypothetical construct; it is an internal attribute that cannot be observed directly. However, intelligence is assumed to influence external behaviors that can be observed and measured. - An IQ test actually measures external behavior consisting of responses to questions. The test includes both elements of an operational definition: There are specific procedures for administering and scoring the test, and the resulting scores are used as a definition and a measurement of intelligence. Thus, an IQ score is actually a measure of intelligent behavior, but we use the score both to define intelligence and to measure it
6 validities and their meanings (in general terms)
- Face validity: an unscientific form of validity demonstrated when a measurement procedure superficially appears to measure what it claims to measure.
- Concurrent validity: demonstrated when scores obtained from a new measure are directly related to scores obtained from an established measure of the same variable.
- Predictive validity: demonstrated when scores obtained from a measure accurately predict behavior according to a theory.
- Construct validity: requires that the scores obtained from a measurement procedure behave exactly the same as the variable itself. Construct validity is based on many research studies that use the same measurement procedure and grows gradually as each new study contributes more evidence.
- Convergent validity: demonstrated by a strong relationship between the scores obtained from two (or more) different methods of measuring the same construct.
- Divergent validity: demonstrated by showing little or no relationship between the measurements of two different constructs.
- the fact that the categories form an ordered sequence means that there is a directional relationship between the categories. - With measurements from an ordinal scale, we can determine whether two individuals are different, and we can determine the direction of difference. However...
- However, ordinal measurements do not allow us to determine the magnitude of the difference between the two individuals. For example, a large coffee is bigger than a small coffee, but we do not know how much bigger. Other examples of ordinal scales are socioeconomic class (upper, middle, and lower) and T-shirt sizes (small, medium, and large).
the simple fact that the two sets of measurements are related does not necessarily mean that they are identical. Ex. we could claim to measure people's height by having them step on a bathroom scale and recording the number that appears. Note that we claim to be measuring height, although we are actually measuring weight.
- However, we could provide support for our claim by demonstrating a reasonably strong relationship between our scores and more traditional measurements of height (taller people tend to weigh more; shorter people tend to weigh less). - Although we can establish some degree of concurrent validity for our measurements, it should be obvious that a measurement of weight is not really a valid measure of height. In particular, these two measurements behave in different ways and are influenced by different factors. Manipulating diet, for example, influences weight but has little or no effect on height.
2nd step in the research process
- involves forming a hypothesis, a tentative answer to the question.
There are circumstances in which a high level of face validity can create problems.
- If the purpose of the measurement is obvious, the participants in a research study can see exactly what is being measured and may adjust their answers to produce a better image of themselves.
- For this reason, researchers often try to disguise the true purpose of measurement devices such as questionnaires, deliberately concealing the variables that they intend to measure.
convergent validity
- In general terms, it involves creating two different methods for measuring the same construct, and then showing that the two methods produce strongly related scores. The goal is to demonstrate that different measurement procedures "converge"—or join— on the same construct.
an operational definition is an indirect method of measuring something that cannot be measured directly. How can we be sure that the measurements obtained from an operational definition actually represent the intangible construct?
- In general, we are asking how good a measurement procedure, or a measurement, is.
- Internal consistency: Often, a complex construct such as intelligence or personality is measured using a test or questionnaire consisting of multiple items. The idea is that no single item or question is sufficient to provide a complete measure of the construct. A common example is the use of exams that consist of multiple items (questions or problems) to measure performance in an academic course.
- The final measurement for each individual is then determined by adding or averaging the responses across the full set of items. A basic assumption in this process is that each item (or group of items) measures a part of the total construct.
- If this is true, then there should be some consistency between the scores for different items or different groups of items.
error expressed as an equation:
- Measured Score = True Score + Error
- Ex. if we try to measure your intelligence with an IQ test, the score we get is determined partially by your actual level of intelligence (your true score), but also is influenced by a variety of other factors such as your current mood, your level of fatigue, your general health, how lucky you are at guessing on questions to which you do not know the answers, and so on. These other factors are lumped together as error and are typically a part of any measurement.
- It is generally assumed that the error component changes randomly from one measurement to the next, raising your score for some measurements and lowering it for others.
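The equation above lends itself to a quick simulation. The sketch below uses a made-up true score (110) and a made-up error spread; the point is only that random error pushes individual measurements up and down while the long-run average stays near the true score:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

def measure(true_score, error_sd):
    """One measurement: Measured Score = True Score + Error."""
    return true_score + rng.gauss(0, error_sd)

true_iq = 110  # hypothetical true score
scores = [measure(true_iq, error_sd=8) for _ in range(10_000)]

# Individual measurements vary, but the error is random, so over
# many measurements the increases and decreases roughly cancel.
mean_score = sum(scores) / len(scores)
```

Any single score may miss by several points, but `mean_score` lands very close to 110, which is exactly the sense in which error is assumed to "average to zero."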
Measures of reliability(terms+general definitions)
- Test-retest reliability is established by comparing the scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores. If alternative versions of the measuring instrument are used for the two measurements, the reliability measure is called parallel-forms reliability.
- Inter-rater reliability is the degree of agreement between two observers who simultaneously record measurements of the behaviors.
- Split-half reliability is obtained by splitting the items on a questionnaire or test in half, computing a separate score for each half, and then calculating the degree of consistency between the two scores for a group of participants.
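Split-half reliability, for instance, comes down to a single correlation. The sketch below uses made-up item scores for five participants on a six-item test and an odd/even split; the participant data and the small Pearson helper are illustrative, not from the chapter:

```python
def pearson(xs, ys):
    """Pearson correlation between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical responses: rows = participants, columns = items.
responses = [
    [4, 5, 4, 5, 3, 4],
    [2, 1, 2, 2, 1, 2],
    [5, 5, 4, 5, 5, 4],
    [3, 2, 3, 3, 2, 3],
    [1, 2, 1, 1, 2, 1],
]

# Split the items in half (odd vs. even positions), score each half,
# then correlate the two half-scores across participants.
odd_scores = [sum(r[0::2]) for r in responses]
even_scores = [sum(r[1::2]) for r in responses]
split_half_r = pearson(odd_scores, even_scores)
```

A `split_half_r` close to +1.00 indicates that the two halves of the test are measuring consistently.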
How are validity and reliability related?
- They are related to each other in that reliability is a prerequisite for validity; that is, a measurement procedure cannot be valid unless it is reliable. - If we measure your IQ twice and obtain measurements of 75 and 160, not only are the measurements unreliable but we also have no idea what your IQ actually is. The huge discrepancy between the two measurements is impossible if we are truly measuring intelligence. Therefore, we must conclude that there is so much error in the measurements that the numbers themselves have no meaning.
Thus, the process of measurement involves two components:
- a set of categories and a procedure for assigning individuals to categories.
theory
- a set of statements about the mechanisms underlying a particular behavior. Theories help organize and unify different observations of the behavior and its relationship with other variables. A good theory generates predictions about the behavior.
A ratio scale, on the other hand, is characterized by
- a zero point that is not an arbitrary location.
- Instead, the value 0 on a ratio scale is a meaningful point representing none (a complete absence) of the variable being measured. The existence of an absolute, non-arbitrary zero point means that we can measure the absolute amount of the variable; that is, we can measure the distance from 0.
- This makes it possible to compare measurements in terms of ratios.
o For example, a glass with 8 ounces of water (8 more than 0) has twice as much as a glass with 4 ounces (4 more than 0).
- With a ratio scale, we can measure the direction and magnitude of the difference between measurements and describe differences in terms of ratios.
- Ratio scales are quite common and include physical measures, such as height and weight, as well as variables such as reaction time or number of errors on a test.
In situations in which there is an established standard for measurement units, it is possible to define the
- accuracy of a measurement process. For example, we have standards that define precisely what is meant by an inch, a liter, a pound, and a second.
3rd step in the research process
- is determining a method for defining and measuring the variables that are being studied. - Occasionally, a research study involves variables that are well defined, easily observed, and easily measured. For example, a study of physical development might involve the variables of height and weight. Both of these variables are tangible, concrete attributes that can be observed and measured directly.
Limitations of Operational Definitions
- An operational definition is not the same as the construct itself; we can define and measure variables (intelligence, motivation, anxiety, etc.), but in fact we are measuring external manifestations that provide an indication of the underlying variables. As a result, there are always concerns about the quality of operational definitions and the measurements they produce.
- There is not a one-to-one relationship between the variable that is being measured and the actual measurements produced by the operational definition.
- Ex. an instructor evaluating the students in a class. The underlying variable is knowledge or mastery of subject matter, and the instructor's goal is to obtain a measure of knowledge for each student. But knowledge is a construct that cannot be directly observed or measured. Therefore, instructors typically give students a task (such as an exam, an essay, or a set of problems), and then measure how well students perform the task. Although it makes sense to expect that performance is a reflection of knowledge, performance and knowledge are not the same thing. For example, physical illness or fatigue may affect performance on an exam, but they probably do not affect knowledge. There is not a one-to-one relationship between the variable that the instructor wants to measure (knowledge) and the actual measurements that are made (performance).
A measure cannot be valid unless it is reliable...
- but a measure can be reliable without being valid.
- We begin by specifying how each of the variables will be measured. We defined variables as
- characteristics or conditions that change or have different values for different individuals. - Usually, researchers are interested in how variables are affected by different conditions or how variables differ from one group of individuals to another.
Face validity
- concerns the superficial appearance, or face value, of a measurement procedure. Does the measurement technique look like it measures the variable that it claims to measure?
- Face validity is based on subjective judgment and is difficult to quantify.
Because new research results are reported every day
- construct validity is never established absolutely. Instead, construct validity is an ideal or a goal that develops gradually from the results of many research studies that examine the measurement procedure in a wide variety of situations.
Many research variables, particularly variables of interest to behavioral scientists, are in fact hypothetical entities created from theory and speculation. Such variables are called
- constructs, or hypothetical constructs
In addition to serving as a basis for measuring variables, operational definitions also can be used to
- define variables to be manipulated. Ex. the construct "hunger" can be operationally defined as the number of hours of food deprivation. Ex. In a research study one group could be tested immediately after eating a full meal, a second group could be tested 6 hours after eating, and a third group could be tested 12 hours after eating. In this study, we are comparing three different levels of hunger, which are defined by the number of hours without food. Alternatively, we could measure hunger for a group of rats by recording how much food each animal eats when given free access to a dish of rat chow. The amount that each rat eats defines how hungry it is.
The fact that the categories are all the same size makes it possible to
- determine the distance between two points on the scale. o For example, you know that a measurement of 70 degrees Fahrenheit is higher than a measurement of 55 degrees, and you know that it is exactly 15 degrees higher.
The categories that make up an ordinal scale have
- different names and are organized in an ordered series.
concurrent validity
- establishes consistency between two different procedures for measuring the same variable, suggesting that the two measurement procedures are measuring the same thing. Because one procedure is well established and accepted as being valid, we infer that the second procedure must also be valid. -the validity of a new measurement is established by demonstrating that the scores obtained from the new measurement technique are directly related to the scores obtained from another, better-established procedure for measuring the same variable.
1st step
- find an unanswered question that will serve as a research idea
Remember, the difference between an interval scale and a ratio scale is the definition of the zero point. Thus, measurements of
- height in inches or weight in pounds could be either interval or ratio depending on how the zero point is defined.
o For example, with traditional measurements of weight, zero corresponds to none (no weight) and the measurements form a ratio scale. In this case, an 80-pound child (80 pounds above 0) weighs twice as much as a 40-pound child (40 pounds above 0).
- Now consider a set of measurements that define the zero point as the average weight for the age group. In this situation, each child is being measured relative to the average, so a child who is 12 pounds above average receives a score of +12 pounds. A child who is 4 pounds below average is assigned a score of −4 pounds.
- Now the measurements make up an interval scale. In particular, a child who is 12 pounds above average (+12) does not weigh twice as much as a child who is 6 pounds above average (+6). You should note, however, that the ratio and the interval measurements provide the same information about the distance between two scores.
- For the ratio measurements, 84 pounds is 4 more than 80 pounds.
- For the interval measurements, a score of +8 pounds is 4 more than a score of +4 pounds.
- For most applications, the ability to measure distances is far more important than the ability to measure ratios. Therefore, in most situations, the distinction between interval and ratio scales has little practical significance.
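The weight example can be checked in a few lines. The sketch assumes a group average of 76 pounds (a made-up reference point) so that children weighing 84 and 80 pounds score +8 and +4 on the relative scale:

```python
# Ratio scale: 0 means a complete absence of weight.
weights = {"child_a": 84, "child_b": 80}

# Interval scale: the same children measured relative to a group
# average of 76 lb, so 0 is an arbitrary reference point.
group_average = 76
relative = {name: w - group_average for name, w in weights.items()}

# Distances (differences) are identical on both scales...
ratio_diff = weights["child_a"] - weights["child_b"]        # 84 - 80
interval_diff = relative["child_a"] - relative["child_b"]   # (+8) - (+4)

# ...but ratios are not preserved once the zero point moves.
ratio_on_ratio_scale = weights["child_a"] / weights["child_b"]       # 84/80
ratio_on_interval_scale = relative["child_a"] / relative["child_b"]  # 8/4
```

Shifting the zero point leaves every difference intact but changes every ratio, which is exactly the interval/ratio distinction.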
- For most variables that you are likely to encounter, numerous research studies probably already have examined the same variables. Past research has studied each variable in a variety of different situations and has documented which factors influence the variable and how different values of the variable produce different kinds of behavior. In short, past research has demonstrated
- how the specific variable behaves
Constructs
- hypothetical attributes or mechanisms that help explain and predict behavior in a theory.
Divergent validity
- involves demonstrating that we are measuring one specific construct and not combining two different constructs in the same measurement process. The goal is to differentiate between two conceptually distinct constructs by measuring both constructs and then showing that there is little or no relationship between the two measurements.
The distinguishing characteristic of an interval scale is that
- it has an arbitrary zero point. That is, the value 0 is assigned to a particular location on the scale simply as a matter of convenience or reference. Specifically, a value of 0 does not indicate the total absence of the variable being measured. o For example, a temperature of 0 degrees Fahrenheit does not mean that there is no temperature, and it does not prohibit the temperature from going even lower. Interval scales with an arbitrary zero point are fairly rare. The two most common examples are the Fahrenheit and Celsius temperature scales. o Other examples are altitude (above and below sea level), golf scores (above and below par), and other relative measures, such as above and below average rainfall.
A measurement procedure is said to have reliability if
- it produces identical (or nearly identical) results when it is used repeatedly to measure the same individual under the same conditions. - ex. if we use an IQ test to measure a person's intelligence today, then use the same test for the same person under similar conditions next week, we should obtain nearly identical IQ scores.
In the case of reaction time, most researchers solve the problem by
- measuring reaction times in several trials and computing an average. The average value provides a much more stable, more reliable measure of performance.
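A small simulation illustrates why averaging helps. All numbers are hypothetical (a "true" reaction time of 400 ms with a large random error on each trial); the point is that an average over many trials varies far less than any single trial:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

def trial_rt(true_rt=400, error_sd=120):
    """One reaction-time trial (ms): true speed plus large random error."""
    return true_rt + rng.gauss(0, error_sd)

# Spread of single-trial measurements...
single_trials = [trial_rt() for _ in range(1000)]
sd_single = statistics.stdev(single_trials)

# ...versus the spread of 50-trial averages.
averages = [statistics.mean(trial_rt() for _ in range(50))
            for _ in range(1000)]
sd_average = statistics.stdev(averages)
```

`sd_average` comes out several times smaller than `sd_single`, which is why the averaged score is the more stable, more reliable measure of performance.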
- Earlier, we used the example of attempting to measure height by having people step on a bathroom scale. Because height and weight are related, the measurement that we obtain from the scale would be considered a valid measure of height, at least in terms of concurrent validity. However, the weight measurement is
- not a valid method of measuring height in terms of construct validity. In particular, height is not influenced by short periods of food deprivation. - Weight measurements, on the other hand, are affected by food deprivation. Therefore, measurements of weight do not behave in accordance with what is known about the construct "height," which means that the weight-measurement procedure does not have construct validity as a measure of height.
Researchers can measure these external, observable events as an indirect method of measuring the construct itself. Typically, researchers identify a behavior, or a cluster of behaviors associated with a construct; the behavior is then measured, and the resulting measurements are used as a definition and a measure of the construct. This method of defining and measuring a construct is called an
- operational definition; Researchers often refer to the process of using an operational definition as operationalizing a construct.
In general, critically examine any measurement procedure and ask yourself whether a different technique might
- produce better measurements.
The categories that make up a nominal scale simply represent
- qualitative (not quantitative) differences in the variable measured. The categories have different names but are not related to each other in any systematic way. o For example, if you were measuring academic majors for a group of college students, the categories would be art, chemistry, English, history, psychology, and so on. Each student would be placed in a category according to his or her major.
A common example of a measurement with a large error component is
- reaction time
- Ex. you are participating in a cognitive skill test. A pair of one-digit numbers is presented on a screen and your task is to press a button as quickly as possible if the two digits add to 10 (Ackerman & Beier, 2007).
- On some trials, you will be fully alert and focused on the screen, with your finger tensed and ready to move.
- On other trials, you may be daydreaming or distracted, with your attention elsewhere, so that extra time passes before you can refocus on the numbers, mentally add them, and respond.
- In general, it is quite common for reaction time on some trials to be twice as long as reaction time on other trials. When scores change dramatically from one trial to another, the measurements are said to be unreliable, and we cannot trust any single measurement to provide an accurate indication of an individual's true score.
Over a series of many measurements, the increases and decreases caused by error
- should average to zero. - Ex. your IQ score is likely to be higher when you are well rested and feeling good and lower when you are tired and depressed. Although your actual intelligence has not changed, the error component causes your score to change from one measurement to another.
- Although the distinction between interval and ratio scales has little practical significance, the differences among the other measurement scales can be enormous.
- the amount of information provided by each scale can limit the interpretation of the scores. For example, nominal scales only allow you to determine whether two scores are the same or different. If two scores are different, you cannot measure the size of the difference and you cannot determine whether one score is greater than or less than the other. If your research question is concerned with the direction or the size of differences, nominal measurements cannot be used.
The numerical value of the correlation (independent of the sign) describes
- the consistency of the relationship by measuring the degree to which the data points form a straight line. If the points fit perfectly on a line, the correlation is +1.00 or −1.00. If there is no linear fit whatsoever, the correlation is 0.
The differences among these four types of measurement scales are based on
- the relationships that exist among the categories that make up the scales and the information provided by each measurement.
In essence, reliability is
- the stability, or the consistency, of the measurements produced by a specific measurement procedure. The concept of reliability is based on the assumption that the variable being measured is stable or constant. Ex. your intelligence does not change dramatically from one day to another but rather stays at a fairly constant level.
Measurements from a nominal scale allow us to determine whether two individuals are the same or different, but
- they do not permit any quantitative comparison. For example, if two individuals are in different categories, we cannot determine the direction of the difference (is art "more than" English?), and we cannot determine the magnitude of the difference. Other examples of nominal scales are classifying people by race, political affiliation, or occupation.
Although these mechanisms and elements cannot be seen and are only assumed to exist, we accept them as real because
- they seem to describe and explain behaviors that we see.
Ordinal scales also allow you to determine whether
- two scores are the same or different and provide additional information about the direction of the difference. o For example, with ordinal measurements you can determine whether one option is preferred over another. However, ordinal scales do not provide information about the magnitude of the difference between the two measurements. Again, this may limit the research questions for which ordinal scales are appropriate.
The next steps in the research process involve
- using the hypothesis to develop an empirical research study that will either support or refute the hypothesis
To evaluate differences or changes in variables, it is essential that
- we are able to measure them.
if the error component is relatively large
- you will find huge differences from one measurement to the next, and the measurements are, therefore, not reliable.
Participant changes
- The participant can change between measurements. As noted earlier, a person's degree of focus and attention can change quickly and can have a dramatic effect on measures of reaction time.
- Such changes may cause the obtained measurements to differ, producing what appear to be inconsistent or unreliable measurements.
Often, an ordinal scale consists of
- a series of ranks (first, second, third, and so on)
- verbal labels such as small, medium, and large
The indirect connection between the variable and the measurements can result in two general problems (limitations of operational definitions, part 2)
- 1st: operational definitions leave out important components of a construct. Ex. it is possible to define depression in terms of behavioral symptoms (social withdrawal, insomnia, etc.). However, behavior represents only a part of the total construct. Depression includes cognitive and emotional components that are not included in a totally behavioral definition. One way to reduce this problem is to include two or more different procedures to measure the same variable.
- 2nd: operational definitions often include extra components that are not part of the construct being measured. Ex. a self-report of depression in a clinical interview or on a questionnaire is influenced by the participant's verbal skills (ability to understand questions and express feelings and thoughts) as well as the participant's willingness to reveal personal feelings or behaviors that might be perceived as odd or undesirable. A participant who is able and willing to describe personal symptoms may appear to be more depressed than someone who withholds or conceals information.
Internal consistency
A measure of reliability; the degree to which a test yields similar scores across its different parts, such as on odd versus even items.
construct validity
- If we can demonstrate that measurements of a variable behave in exactly the same way as the variable itself, the measurement procedure has construct validity.
- Ex. you are examining a measurement procedure that claims to measure aggression. Past research has demonstrated a relationship between temperature and aggression: In the summer, as temperature rises, people tend to become more aggressive.
- To help establish construct validity, you would need to demonstrate that the scores you obtain from the measurement procedure also increase as the temperature goes up. Note, however, that this single demonstration is only one small part of construct validity.
- To completely establish construct validity, you would need to examine all the past research on aggression and show that the measurement procedure produces scores that behave in accordance with everything that is known about the construct "aggression."
Observer error
The individual who makes the measurements can introduce simple human error into the measurement process, especially when the measurement involves a degree of human judgment.
Successive measurements
The reliability estimate obtained by comparing the scores obtained from two successive measurements is commonly called test-retest reliability
The following scenario illustrates the concepts of convergent and divergent validity.
- One variable of concern for researchers interested in couples therapy is the quality of the relationship.
o Lawrence et al. (2011) have introduced the Relationship Quality Interview (RQI), which is intended to measure relationship quality in five specific domains: (a) emotional intimacy, (b) sexual relationship, (c) support transactions, (d) power sharing, and (e) problem solving.
- The researchers demonstrated convergent validity by showing strong relationships among the five RQI scale ratings, indicating that the five domains of the RQI are converging on the same construct (relationship quality).
- After establishing convergent validity, however, the researchers wanted to demonstrate that the RQI is really measuring relationship quality and not some other variable.
o For example, the scores may actually reflect the general level of satisfaction with the relationship rather than the quality. It is possible, for example, for couples to be satisfied with a low-quality relationship.
o To resolve this problem, it is necessary to demonstrate that the two constructs, "quality" and "satisfaction," are separate and distinct.
o The researchers established divergent validity by showing a weak relationship between the RQI quality scores and measures of general satisfaction. Specifically, correlations between the domain-specific measures of quality from the RQI and global relationship satisfaction scores were generally low.
- By demonstrating that two or more different methods of measurement produce strongly related scores for the same construct (convergent validity), and by demonstrating a weak relationship between the measurements for two distinct constructs (divergent validity), you can provide very strong and convincing evidence of validity. That is, there is little doubt that you are actually measuring the construct that you intend to measure.
The characteristic that differentiates interval and ratio scales is
The zero point
- On the other hand, it is not necessary for a measurement to be valid for it to be reliable. For example, we could measure your height and claim that it is a measure of intelligence. Although this is a foolish and invalid method for defining and measuring intelligence, it would be very reliable, producing consistent scores from one measurement to the next
Thus, the consistency of measurement is no guarantee of validity.
Simultaneous measurements
When measurements are obtained by direct observation of behaviors, it is common to use two or more separate observers who simultaneously record measurements
Predictive validity
When the measurements of a construct accurately predict behavior (according to the theory)
operational definition
a procedure for indirectly measuring and defining a variable that cannot be observed or measured directly. An operational definition specifies a measurement procedure (a set of operations) for measuring an external, observable behavior and uses the resulting measurements as a definition and a measurement of the hypothetical construct.
To show the amount of consistency between two different measurements, the two scores obtained for each person can be presented in
a scatterplot
it is quite common to measure variables for which there is no established standard. In such cases, it is impossible to define or measure
accuracy. - A test designed to measure depression, for example, cannot be evaluated in terms of accuracy because there is no standard unit of depression that can be used for comparison. For such a test, the question of accuracy is moot, and the only concerns are the validity and the reliability of the measurements.
The first criterion for evaluating a measurement procedure is validity. To establish validity, you must demonstrate that the measurement procedure is
actually measuring what it claims to be measuring.
Ex. your intelligence does not change dramatically from one day to another but rather stays at a fairly constant level. However, when we measure a variable such as intelligence, the measurement procedure introduces
an element of error
Inter-rater reliability can be measured by
computing the correlation between the scores from the two observers (Figure 3.1 and Chapter 13, p. 317) or by computing a percentage of agreement between the two observers
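Both approaches can be sketched in a short Python example. The observer ratings and the two helper functions are invented for illustration, not taken from the text.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def percent_agreement(x, y):
    """Percentage of observations on which the two observers agree exactly."""
    return 100 * sum(a == b for a, b in zip(x, y)) / len(x)

# Hypothetical aggression ratings of 10 children by two independent observers.
observer_1 = [3, 5, 2, 4, 4, 1, 5, 2, 3, 4]
observer_2 = [3, 4, 2, 4, 5, 1, 5, 2, 3, 4]

print(round(pearson_r(observer_1, observer_2), 2))  # near +1: high inter-rater reliability
print(percent_agreement(observer_1, observer_2))    # 80.0 (exact agreement on 8 of 10 children)
```

The correlation approach credits observers for ranking children similarly even when their numbers differ slightly; percent agreement counts only exact matches.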
constructs can be influenced by blank and in turn can influence blank
constructs can be influenced by external stimuli and, in turn, can influence external behaviors - Ex. external factors such as rewards or reinforcements can affect motivation (a construct), and motivation can then affect performance.
Another technique for establishing the validity of a measurement procedure is to demonstrate a combination of
convergent and divergent validity
Often, the consistency of a relationship is determined by computing a correlation between the two measurements. A consistent positive relationship like the one in (a) produces a
correlation near +1.00
a consistent negative relationship like the one in (b) produces a
correlation near −1.00,
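A minimal sketch of both cases, using invented scores:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

first_score = [2, 4, 5, 7, 9]

# Consistent positive relationship: second scores rise with the first.
second_pos = [3, 5, 6, 8, 9]
# Consistent negative relationship: second scores fall as the first rise.
second_neg = [9, 8, 6, 5, 3]

print(round(pearson_r(first_score, second_pos), 2))  # near +1.00
print(round(pearson_r(first_score, second_neg), 2))  # near -1.00
```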
Each individual records (measures) what he or she observes, and the degree of agreement between the two observers is called
inter-rater reliability
four different types of measurement scales (just the terms)
nominal, ordinal, interval, and ratio
Occasionally, a measurement procedure produces results that are consistently wrong by a constant amount. The speedometer on a car, for example, may consistently read 10 mph faster than the actual speed. In this case, the speedometer readings are
not accurate, but they are valid and reliable. When the car is traveling at 40 mph, the speedometer consistently (reliably) reads 50 mph, and when the car is actually going 30 mph, the speedometer reads 40 mph. Note that the speedometer correctly differentiates different speeds, which means that it is producing valid measurements of speed. (Note that a measurement process can be valid and reliable even if it is not accurate.)
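The speedometer example can be checked numerically. This is a sketch with invented speeds; `pearson_r` is a simple hand-rolled Pearson correlation helper.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of values."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Actual speeds vs. a speedometer that consistently reads 10 mph too fast.
actual   = [30, 40, 50, 60, 70]
readings = [s + 10 for s in actual]  # constant error of +10 mph

# Every reading is wrong by 10 mph (not accurate), but the readings rise and
# fall in perfect step with the actual speeds, so the correlation is +1.00:
# the measurements are reliable and still validly differentiate speeds.
print(pearson_r(actual, readings))
```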
- the horizontal position of the point is determined by blank and the vertical position is determined by blank
one score, second score
The categories on interval and ratio scales are
organized sequentially, and all categories are the same size. Thus, the scale of measurement consists of a series of equal intervals like the inches on a ruler.
choice of a measurement procedure involves a number of decisions. Usually, there is no absolutely right or absolutely wrong choice; nonetheless, be aware that
other researchers had options and choices when they decided how to measure their variables
A researcher may use exactly the same measurement procedure for the same group of individuals at two different times. Or a researcher may use modified versions of the measurement instrument (such as alternative versions of an IQ test) to obtain two different measurements for the same group of participants. When different versions of the instrument are used for the test and the retest, the reliability measure is often called
parallel-forms reliability
In very general terms, measurement is a procedure for classifying individuals into categories. The set of categories is called
scale of measurement
one definition of validity requires that
scores obtained from a new measurement procedure are consistently related to the scores from a well-established technique for measuring the same variable.
Thus far, the discussion has concentrated on situations involving successive measurements. Although this is one common example of reliability, it is also possible to measure reliability for
simultaneous measurements and to measure reliability in terms of the internal consistency among the many items that make up a test or questionnaire.
In a scatterplot, the two scores for each person are represented as a
single point
To measure the degree of consistency, researchers commonly split the set of items in half and compute a separate score for each half. The degree of agreement between the two scores is then evaluated, usually with a correlation (Chapter 15, p. 387). This general process results in a measure of
split-half reliability
- Note that there are many different ways to divide a set of items in half prior to computing split-half reliability, and the value you obtain depends on the method you use to split the items. However, there are statistical techniques for dealing with this problem.
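The split-half procedure can be sketched as follows. The item responses are invented, and the split shown (odd- vs. even-numbered items) is just one of the many possible splits mentioned above; the Spearman-Brown formula at the end is one well-known statistical technique for estimating full-test reliability from a half-test correlation.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical item scores (1 = correct, 0 = wrong) for 6 people on a 10-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
]

# Split the items in half and compute a total score for each half per person.
odd_totals  = [sum(person[0::2]) for person in responses]
even_totals = [sum(person[1::2]) for person in responses]

# Split-half reliability: the correlation between the two half-test scores.
r_half = pearson_r(odd_totals, even_totals)

# Spearman-Brown estimate of the reliability of the full-length test.
r_full = (2 * r_half) / (1 + r_half)

print(round(r_half, 2))  # correlation between the two half-test scores
print(round(r_full, 2))  # corrected estimate for the full test
```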
Important to keep in mind
that any measurement procedure, particularly an operational definition, is simply one attempt to define and measure the variable being considered. Other measurement procedures are always possible and may provide a better way to define and measure the variable.
Other common examples of interval or ratio scales are
the measures of time in seconds, weight in pounds, and temperature in degrees Fahrenheit. Notice that in each case, one interval (1 inch, 1 second, 1 pound, and 1 degree) is the same size, no matter where it is located on the scale
In attempting to explain and predict behavior, scientists and philosophers often develop
theories that contain hypothetical mechanisms and intangible elements
- Whenever the variables in a research study are hypothetical constructs, you must use operational definitions to define and measure the variables. Usually, however,
this does not mean creating your own operational definition
Best method of determining how a variable should be measured is
to consult previous research involving the same variable
- Research reports typically describe in detail how each variable is defined and measured. By reading several research reports concerning the same variable, you typically can discover that a standard, generally accepted measurement procedure has already been developed.
negative relationship
two measurements change in opposite directions so that people who score high on one measurement tend to score low on the other.
positive relationship between two measurements
two measurements change together in the same direction. Therefore, people who score high on the first measurement (toward the right of the graph) also tend to score high on the second measurement (toward the top of the graph).
best method to plan your own research (w/ O.D.)
use the conventional method of defining and measuring your variables. In this way, your results will be directly comparable to the results obtained in past research
Researchers have developed two general criteria for evaluating the quality of any measurement procedure
validity and reliability.
- Validity and reliability are often defined and measured by the consistency of the relationship between two sets of measurements.
As long as the error component is relatively small
your scores will be relatively consistent from one measurement to the next, and the measurements are said to be reliable
common sources of error are:
· Observer error: The individual who makes the measurements can introduce simple human error into the measurement process, especially when the measurement involves a degree of human judgment.
o Ex. consider a baseball umpire judging balls and strikes or a college professor grading student essays. The same pitch could be called a ball once and a strike later in the game, or the same essay could receive an A one semester and a B at a different time. In each case, the measurement includes some error introduced by the observer.
· Environmental changes: Although the goal is to measure the same individual under identical circumstances, this ideal is difficult to attain. Often, there are small changes in the environment from one measurement to another, and these small changes can influence the measurements. There are so many environmental variables (such as time of day, temperature, weather conditions, and lighting) that it is essentially impossible to obtain two identical environmental conditions.
· Participant changes: The participant can change between measurements. As noted earlier, a person's degree of focus and attention can change quickly and can have a dramatic effect on measures of reaction time.
o Such changes may cause the obtained measurements to differ, producing what appear to be inconsistent or unreliable measurements. Ex. hunger probably does not lower intelligence, but it can be a distraction that causes a lower score on an IQ test.