Chapter 3: Defining and Measuring Variables


Another strategy for reducing experimenter bias is to use a

"blind" experiment. If the research study is conducted by an experimenter (assistant) who does not know the expected results, the experimenter should not be able to influence the participants. This is called single-blind research.

physiological measures

- A second option for measuring a construct is to look at the physiological manifestations of the underlying construct. - Other physiological measures involve brain-imaging techniques such as positron emission tomography (PET) scanning and magnetic resonance imaging (MRI). These techniques allow researchers to monitor activity levels in specific areas of the brain during different kinds of activity.

Although many measurements are clearly classified as either ordinal or interval, there are others that are not obviously in one category or the other.

- IQ scores, for example, are numerical values that appear to form an interval scale. However, there is some question about the size of one point of IQ. Is the difference between an IQ of 85 and an IQ of 86 exactly the same as the difference between an IQ of 145 and an IQ of 146? If the answer is yes, then IQ scores form an interval scale. However, if you are not sure that one point is exactly the same everywhere on the scale, then IQ scores must be classified as ordinal measurements.

There are circumstances in which a high level of face validity can create problems.

- If the purpose of the measurement is obvious, the participants in a research study can see exactly what is being measured and may adjust their answers to produce a better image of themselves. - For this reason, researchers often try to disguise the true purpose of measurement devices such as questionnaires, deliberately trying to conceal the variables that they are trying to measure

convergent validity

- In general terms, it involves creating two different methods for measuring the same construct, and then showing that the two methods produce strongly related scores. The goal is to demonstrate that different measurement procedures "converge"—or join— on the same construct.

an operational definition is an indirect method of measuring something that cannot be measured directly. How can we be sure that the measurements obtained from an operational definition actually represent the intangible construct?

- In general, we are asking how good a measurement procedure, or a measurement, is.

experimenter bias.

The experimenter is manipulating participant motivation, and this manipulation can distort the results. When researchers influence results in this way

- On the other hand, it is not necessary for a measurement to be valid for it to be reliable. For example, we could measure your height and claim that it is a measure of intelligence. Although this is a foolish and invalid method for defining and measuring intelligence, it would be very reliable, producing consistent scores from one measurement to the next

Thus, the consistency of measurement is no guarantee of validity.

Simultaneous measurements

When measurements are obtained by direct observation of behaviors, it is common to use two or more separate observers who simultaneously record measurements

Predictive validity

When the measurements of a construct accurately predict behavior (according to the theory)

With nominal scales, we can determine whether

a change in one variable is accompanied by a change in the other variable, but we cannot determine the direction of the change (an increase or a decrease), and we cannot determine the magnitude of the change

An interval or a ratio scale, on the other hand, allows

a much more sophisticated description of a relationship. For example, we could determine that a 1-point increase in one variable (such as drug dose) results in a 4-point decrease in another variable (such as heart rate).

To show the amount of consistency between two different measurements, the two scores obtained for each person can be presented in

a scatterplot

A common example of a measurement with a large error component is

- reaction time - Ex. you are participating in a cognitive skill test. A pair of one-digit numbers is presented on a screen and your task is to press a button as quickly as possible if the two digits add to 10 (Ackerman & Beier, 2007). - On some trials, you will be fully alert and focused on the screen, with your finger tensed and ready to move. - On other trials, you may be daydreaming or distracted, with your attention elsewhere, so that extra time passes before you can refocus on the numbers, mentally add them, and respond. - In general, it is quite common for reaction time on some trials to be twice as long as reaction time on other trials. When scores change dramatically from one trial to another, the measurements are said to be unreliable, and we cannot trust any single measurement to provide an accurate indication of an individual's true score.

- As seen in the preceding sections, the choice of a measurement procedure involves several decisions. Because each decision has implications for the results of the study, it is important to consider all the options before deciding on a scheme for measurement for your own study or when critically reading a report of results from another research study. The best starting point for selecting a measurement procedure is to

- review past research reports involving the variables or constructs to be examined.

In particular, consider which measure has a level of

- sensitivity appropriate for detecting the individual differences and group differences that you expect to observe.

Over a series of many measurements, the increases and decreases caused by error

- should average to zero. - Ex. your IQ score is likely to be higher when you are well rested and feeling good and lower when you are tired and depressed. Although your actual intelligence has not changed, the error component causes your score to change from one measurement to another.

Typically, a researcher begins a study with some expectation of how the variables will behave, specifically the direction and magnitude of changes that are likely to be observed. An important concern for any measurement procedure is

- that the measurements are sensitive enough to respond to the type and magnitude of the changes that are expected. o For example, if a medication is expected to have only a small effect on reaction time, then it is essential that time be measured in units small enough to detect the change. If we measure time in seconds and the magnitude of the effect is 1/100 of a second, then the change will not be noticed.
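The seconds-versus-hundredths point can be sketched numerically. This is a hypothetical illustration, not from the text; all of the reaction times below are invented:

```python
# Hypothetical illustration (numbers invented): a 1/100-second drug effect
# on reaction time disappears when times are recorded only to the nearest
# whole second.
baseline = [0.31, 0.28, 0.35, 0.30]        # reaction times in seconds
with_drug = [t + 0.01 for t in baseline]   # true effect: +0.01 s per trial

# Sensitive measurement: keep hundredths of a second
mean_diff = sum(with_drug) / 4 - sum(baseline) / 4
print(round(mean_diff, 3))                 # the 0.01-s change is detected

# Insensitive measurement: round each time to the nearest whole second
coarse_base = [round(t) for t in baseline]
coarse_drug = [round(t) for t in with_drug]
print(sum(coarse_drug) - sum(coarse_base)) # the effect vanishes: 0
```

The coarse scale maps every trial, with or without the drug, to the same value, so no analysis applied afterward can recover the effect.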

One advantage of physiological measures is

- that they are extremely objective. The equipment provides accurate, reliable, and well-defined measurements that are not dependent on subjective interpretation by either the researcher or the participant.

- Although the distinction between interval and ratio scales has little practical significance, the differences among the other measurement scales can be enormous.

- the amount of information provided by each scale can limit the interpretation of the scores. For example, nominal scales only allow you to determine whether two scores are the same or different. If two scores are different, you cannot measure the size of the difference and you cannot determine whether one score is greater than or less than the other. If your research question is concerned with the direction or the size of differences, nominal measurements cannot be used.

The numerical value of the correlation (independent of the sign) describes

- the consistency of the relationship by measuring the degree to which the data points form a straight line. If the points fit perfectly on a line, the correlation is +1.00 or −1.00. If there is no linear fit whatsoever, the correlation is 0.
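As a rough sketch of how this numerical value is computed, the Pearson correlation can be coded directly from paired scores; the `pearson_r` helper and the data below are invented for illustration:

```python
# Minimal sketch (helper name and data invented): the Pearson correlation
# measures how closely paired scores fall on a straight line.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

first = [2, 4, 6, 8]             # first measurement for four people
second = [5, 9, 13, 17]          # second = 2*first + 1: a perfect line
print(pearson_r(first, second))  # perfect positive linear fit -> 1.0
```

In practice researchers would use an established routine such as `scipy.stats.pearsonr`, but the arithmetic is the same.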

An ordinal scale tells us

- the direction of the difference (which is more, and which is less).

The differences among these four types of measurement scales are based on

- the relationships that exist among the categories that make up the scales and the information provided by each measurement.

One particular sensitivity problem occurs when

- the scores obtained in a research study tend to cluster at one end of the measurement scale. o For example, suppose that an educational psychologist intends to evaluate a new teaching program by measuring reading comprehension for a group of students before and after the program is administered. If the students all score around 95% before the program starts, there is essentially no room for improvement. Even if the program does improve reading comprehension, the measurement procedure probably will not detect an increase in scores. In this case, the measurement procedure is insensitive to changes that may occur in one direction.

In essence, reliability is

- the stability, or consistency, of the measurements produced by a specific measurement procedure. The concept of reliability is based on the assumption that the variable being measured is stable or constant. Ex. your intelligence does not change dramatically from one day to another but rather stays at a fairly constant level

Measurements from a nominal scale allow us to determine whether two individuals are the same or different, but

- they do not permit any quantitative comparison. For example, if two individuals are in different categories, we cannot determine the direction of the difference (is art "more than" English?), and we cannot determine the magnitude of the difference. Other examples of nominal scales are classifying people by race, political affiliation, or occupation.

Although these mechanisms and elements cannot be seen and are only assumed to exist, we accept them as real because

- they seem to describe and explain behaviors that we see.

One method for limiting the problems associated with multiple measures is

- to combine them into a single score for each individual.

One option for limiting experimenter bias is

- to standardize or automate the experiment. o For example, a researcher could read from a prepared script to ensure that all participants receive exactly the same instructions. Or instructions could be presented on a printed handout or by video. o In each case, the goal is to limit the personal contact between the experimenter and the participant.

Ordinal scales also allow you to determine whether

- two scores are the same or different and provide additional information about the direction of the difference. o For example, with ordinal measurements you can determine whether one option is preferred over another. However, ordinal scales do not provide information about the magnitude of the difference between the two measurements. Again, this may limit the research questions for which ordinal scales are appropriate.

One method of obtaining a more complete measure of a construct is to

- use two (or more) different procedures to measure the same variable. o For example, we could record both heart rate and behavior as measures of fear. -The advantage of this multiple-measure technique is that it usually provides more confidence in the validity of the measurements.

The next steps in the research process involve

- using the hypothesis to develop an empirical research study that will either support or refute the hypothesis

To evaluate differences or changes in variables, it is essential that

- we are able to measure them.

In general, critically examine any measurement procedure and ask

- whether a different technique might produce better measurements.

to determine the magnitude of a difference,

- you need either an interval or a ratio scale.

if the error component is relatively large

- you will find huge differences from one measurement to the next, and the measurements are, therefore, not reliable.

Participant changes

-The participant can change between measurements. As noted earlier, a person's degree of focus and attention can change quickly and can have a dramatic effect on measures of reaction time. -Such changes may cause the obtained measurements to differ, producing what appear to be inconsistent or unreliable measurements

An alternative is to set up a study in which neither the experimenter nor the participant knows the expected result. This procedure is called

-double-blind research and is commonly used in drug studies in which some participants get the real drug (expected to be effective) and others get a placebo (expected to have no effect). -The double-blind study is structured so that neither the researcher nor the participants know exactly who is getting which drug until the study is completed.

One disadvantage of such measures

-is that they typically require equipment that may be expensive or unavailable. -In addition, the presence of monitoring devices creates an unnatural situation that may cause participants to react differently than they would under normal circumstances. -- A more important concern with physiological measures is whether they provide a valid measure of the construct. Heart rate, for example, may be related to fear, but heart rate is not the same thing as fear. Increased heart rate may be caused by anxiety, arousal, embarrassment, or exertion as well as by fear. Can we be sure that measurements of heart rate are, in fact, measurements of fear?

On the negative side, a behavior

-may be only a temporary or situational indicator of an underlying construct -Usually, it is best to measure a cluster of related behaviors rather than rely on a single indicator.

Often, an ordinal scale consists of

- a series of ranks (first, second, third, and so on) or verbal labels such as small, medium, and large

construct validity

If we can demonstrate that measurements of a variable behave in exactly the same way as the variable itself - Ex. you are examining a measurement procedure that claims to measure aggression. Past research has demonstrated a relationship between temperature and aggression: In the summer, as temperature rises, people tend to become more aggressive. - To help establish construct validity, you would need to demonstrate that the scores you obtain from the measurement procedure also increase as the temperature goes up. Note, however, that this single demonstration is only one small part of construct validity. - To completely establish construct validity, you would need to examine all the past research on aggression and show that the measurement procedure produces scores that behave in accordance with everything that is known about the construct "aggression."

it is quite common to measure variables for which there is no established standard. In such cases, it is impossible to define or measure

accuracy. - A test designed to measure depression, for example, cannot be evaluated in terms of accuracy because there is no standard unit of depression that can be used for comparison. For such a test, the question of accuracy is moot, and the only concerns are the validity and the reliability of the measurements.

The first criterion for evaluating a measurement procedure is validity. To establish validity, you must demonstrate that the measurement procedure is

actually measuring what it claims to be measuring.

Ex. your intelligence does not change dramatically from one day to another but rather stays at a fairly constant level. However, when we measure a variable such as intelligence, the measurement procedure introduces

an element of error

double-blind

both the researcher and the participants are unaware of the predicted outcome.

Inter-rater reliability can be measured by

computing the correlation between the scores from the two observers (Figure 3.1 and Chapter 13, p. 317) or by computing a percentage of agreement between the two observers
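The percentage-of-agreement option can be sketched in a few lines; the observer records below are invented for illustration:

```python
# Hedged sketch: percentage of agreement between two observers who
# simultaneously coded the same 10 trials (records invented).
observer_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # e.g., 1 = behavior occurred
observer_b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

agreements = sum(a == b for a, b in zip(observer_a, observer_b))
percent_agreement = 100 * agreements / len(observer_a)
print(percent_agreement)   # observers match on 9 of 10 trials -> 90.0
```

The correlation option works the same way as any other consistency check: each trial contributes a pair of scores, one from each observer, and the pairs are correlated.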

Often, the consistency of a relationship is determined by computing a correlation between the two measurements. A consistent positive relationship like the one in (a) produces a

correlation near +1.00

Measurements from a ratio scale allow us to

determine the direction, the magnitude, and the ratio of the difference.

Finally, interval and ratio scales provide information about

differences between individuals, including the direction of the difference (greater than or less than) and the magnitude of the difference.

If more than one procedure exists for defining and measuring a particular variable,

examine the options and determine which method is best suited for the specific research question.

2 potential artifacts

experimenter bias and participant reactivity

Each individual records (measures) what he or she observes, and the degree of agreement between the two observers is called

inter-rater reliability

Reactivity is especially a problem in studies conducted in a

laboratory, where participants are fully aware that they are participants in a study. Although it is essentially impossible to prevent participants from noticing the demand characteristics of a study and adjusting their behaviors, there are steps to help reduce the effects of reactivity. Often, it is possible to observe and measure individuals without their awareness.

four different types of measurement scales (just the terms)

nominal, ordinal, interval, and ratio

Occasionally, a measurement procedure produces results that are consistently wrong by a constant amount. The speedometer on a car, for example, may consistently read 10 mph faster than the actual speed. In this case, the speedometer readings are

not accurate, but they are valid and reliable. When the car is traveling at 40 mph, the speedometer consistently (reliably) reads 50 mph, and when the car is actually going 30 mph, it reads 40 mph. The speedometer still correctly differentiates different speeds, which means that it is producing valid measurements of speed. Thus, a measurement process can be valid and reliable even if it is not accurate.

- The ability to compare measurements has a direct effect on the ability to describe relationships between variables.

o For example, when a research study involves measurements from nominal scales, the results of the study can establish the existence of only a qualitative relationship between variables.

To develop an operational definition for the construct fear

o researchers must first determine which type of external expression should be used to define and measure fear.

multiple measures can introduce some problems. Two problems are involved:

o the statistical analysis and interpretation of the results. Although there are statistical techniques for evaluating multivariate data, they are complex and not well understood by many researchers. o A more serious problem is that the two measures may not behave in the same way. For example, a therapy might produce an immediate and large effect on behavior but no effect on heart rate. As a result, participants are willing to approach a feared object after therapy, but their hearts still race. The lack of agreement between two measures can confuse the interpretation of results (did the therapy reduce fear?). The discrepancy between the measurements may be caused by the fact that one measure is more sensitive than the other, or it may indicate that different dimensions of the variable change at different times during treatment (behavior may change quickly, but the physiological aspects of fear take more time).

- the horizontal position of the point determined by blank and the vertical position determined by the blank

one score, second score

A nominal scale can tell us

only that a difference exists.

The categories on interval and ratio scales are

organized sequentially, and all categories are the same size. Thus, the scale of measurement consists of a series of equal intervals like the inches on a ruler.

choice of a measurement procedure involves a number of decisions. Usually, there is no absolutely right or absolutely wrong choice; nonetheless, be aware that

other researchers had options and choices when they decided how to measure their variables

A researcher may use exactly the same measurement procedure for the same group of individuals at two different times. Or a researcher may use modified versions of the measurement instrument (such as alternative versions of an IQ test) to obtain two different measurements for the same group of participants. When different versions of the instrument are used for the test and the retest, the reliability measure is often called

parallel-forms reliability

Reactivity occurs when

participants modify their natural behavior in response to the fact that they are participating in a research study or the knowledge that they are being measured.

- One particular sensitivity problem occurs when the scores obtained in a research study tend to cluster at one end of the measurement scale. It is called

range effect.

Demand characteristics

refer to any of the potential cues or features of a study that (1) suggest to the participants what the purpose and hypothesis is and (2) influence the participants to respond or behave in a certain way.

- Most commonly used procedures have been evaluated for reliability and validity. In addition, using an established measurement procedure means that

results can be compared directly to the previous literature in the area.

In very general terms, measurement is a procedure for classifying individuals into categories. The set of categories is called

scale of measurement

one definition of validity requires that

scores obtained from a new measurement procedure are consistently related to the scores from a well-established technique for measuring the same variable.

- The many different external expressions of a construct are traditionally classified into three categories that also define three different types, or modalities, of measurement. The three categories are

self-report, physiological, and behavioral Ex. the hypothetical construct "fear," and suppose that a researcher would like to evaluate the effectiveness of a therapy program designed to reduce the fear of flying. This researcher must somehow obtain measurements of fear before the therapy begins, then compare them with measurements of fear obtained after therapy.

Thus far, the discussion has concentrated on situations involving successive measurements. Although this is one common example of reliability, it is also possible to measure reliability for

simultaneous measurements and to measure reliability in terms of the internal consistency among the many items that make up a test or questionnaire.

In a scatterplot, the two scores for each person are represented as a

single point

To measure the degree of consistency, researchers commonly split the set of items in half and compute a separate score for each half. The degree of agreement between the two scores is then evaluated, usually with a correlation (Chapter 15, p. 387). This general process results in a measure of

split-half reliability - You should note that there are many different ways to divide a set of items in half prior to computing split-half reliability, and the value you obtain depends on the method you use to split the items. However, there are statistical techniques for dealing with this problem
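The split-half procedure can be sketched as follows, assuming an odd/even item split; the `pearson_r` helper and the questionnaire scores below are invented for illustration:

```python
# Hedged sketch of split-half reliability: each person's test is split
# into odd and even items, a score is computed for each half, and the
# half-scores are correlated across people. All data are invented.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Each row: one person's answers to a 6-item questionnaire (1-5 scale)
people = [
    [5, 4, 5, 5, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [4, 4, 3, 4, 4, 4],
    [1, 2, 1, 2, 1, 2],
]
odd_halves = [sum(p[0::2]) for p in people]    # items 1, 3, 5
even_halves = [sum(p[1::2]) for p in people]   # items 2, 4, 6
print(pearson_r(odd_halves, even_halves))      # high value -> consistent halves
```

A different split (for example, first half versus second half) would produce a different value, which is exactly the problem the statistical corrections mentioned above are designed to address.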

Important to keep in mind

that any measurement procedure, particularly an operational definition, is simply an attempt to classify the variable being considered. Other measurement procedures are always possible and may provide a better way to define and measure the variable.

The primary advantage of a self-report measure is

that it is probably the most direct way to assess a construct. Each individual is in a unique position of self-knowledge and self-awareness; presumably, no one knows more about the individual's fear than the individual. -- Also, a direct question and its answer have more face validity than measuring some other response that theoretically is influenced by fear.

single-blind

the researcher does not know the predicted outcome.

One obvious factor that differentiates the four types of measurement scales is

their ability to compare different measurements.

Best method of determining how a variable should be measured is

to consult previous research involving the same variable - Research reports typically describe in detail how each variable is defined and measured; By reading several research reports concerning the same variable, you typically can discover that a standard, generally accepted measurement procedure has already been developed.

Researchers observe and measure these external manifestations to develop operational definitions for constructs. However, one major decision for a researcher is

to determine which of these external manifestations provide the best indication of the underlying construct

positive relationship between two measurements

two measurements change together in the same direction. Therefore, people who score high on the first measurement (toward the right of the graph) also tend to score high on the second measurement (toward the top of the graph).

best method to plan your own research (with operational definitions)

use the conventional method of defining and measuring your variables. In this way, your results will be directly comparable to the results obtained in past research

Researchers have developed two general criteria for evaluating the quality of any measurement procedure

validity and reliability. - validity and reliability are often defined and measured by the consistency of the relationship between two sets of measurements.

An artifact can threaten the validity of the measurements because

you are not really measuring what you intended, and it can be a threat to reliability.

As long as the error component is relatively small

your scores will be relatively consistent from one measurement to the next, and the measurements are said to be reliable

variety of ways that an experimenter can influence a participant's behavior:

· By paralinguistic cues (variations in tone of voice) that influence the participants to give the expected or desired responses · By kinesthetic cues (body posture or facial expressions) · By verbal reinforcement of expected or desired responses · By misjudgment of participants' responses in the direction of the expected results · By not recording participants' responses accurately (errors in recording of data) in the direction of the expected or desired results

although operational definitions are used to measure and define a variety of constructs, such as beauty, hunger, and pain, the most familiar example is probably the

IQ test

validity

the degree to which a measurement procedure measures the variable that it claims to measure

The accuracy of a measurement is

the degree to which the measurement conforms to the established standard

ex. of an operational definition; the most familiar one

- "intelligence" is a hypothetical construct; it is an internal attribute that cannot be observed directly. However, intelligence is assumed to influence external behaviors that can be observed and measured. - An IQ test actually measures external behavior consisting of responses to questions. The test includes both elements of an operational definition: There are specific procedures for administering and scoring the test, and the resulting scores are used as a definition and a measurement of intelligence. Thus, an IQ score is actually a measure of intelligent behavior, but we use the score both to define intelligence and to measure it

2nd step of the research process

- involves forming a hypothesis, a tentative answer to the question.

Observer error

The individual who makes the measurements can introduce simple human error into the measurement process, especially when the measurement involves a degree of human judgment.

Successive measurements

The reliability estimate obtained by comparing the scores obtained from two successive measurements is commonly called test-retest reliability

operational definition

a procedure for indirectly measuring and defining a variable that cannot be observed or measured directly. An operational definition specifies a measurement procedure (a set of operations) for measuring an external, observable behavior and uses the resulting measurements as a definition and a measurement of the hypothetical construct.

constructs can be influenced by blank and in turn can influence blank

constructs can be influenced by external stimuli and, in turn, can influence external behaviors - Ex. external factors such as rewards or reinforcements can affect motivation (a construct), and motivation can then affect performance.

Another technique for establishing the validity of a measurement procedure is to demonstrate a combination of

convergent and divergent validity

a consistent negative relationship like the one in (b) produces a

correlation near −1.00,

Other common examples of interval or ratio scales are

the measures of time in seconds, weight in pounds, and temperature in degrees Fahrenheit. Notice that in each case, one interval (1 inch, 1 second, 1 pound, and 1 degree) is the same size, no matter where it is located on the scale

In attempting to explain and predict behavior, scientists and philosophers often develop

theories that contain hypothetical mechanisms and intangible elements

- Whenever the variables in a research study are hypothetical constructs, you must use operational definitions to define and measure the variables. Usually, however,

this does not mean creating your own operational definition

negative relationship

two measurements change in opposite directions so that people who score high on one measurement tend to score low on the other.

6 validities and their meanings (in general terms)

- Face validity: an unscientific form of validity demonstrated when a measurement procedure superficially appears to measure what it claims to measure.
- Concurrent validity: demonstrated when scores obtained from a new measure are directly related to scores obtained from an established measure of the same variable.
- Predictive validity: demonstrated when scores obtained from a measure accurately predict behavior according to a theory.
- Construct validity: requires that the scores obtained from a measurement procedure behave exactly the same as the variable itself. Construct validity is based on many research studies that use the same measurement procedure and grows gradually as each new study contributes more evidence.
- Convergent validity: demonstrated by a strong relationship between the scores obtained from two (or more) different methods of measuring the same construct.
- Divergent validity: demonstrated by showing little or no relationship between the measurements of two different constructs.

hypotheses and double blind and single blind

- Finally, we should note that there are many research studies in which the participants do not know the hypothesis. Often participants are deliberately misled about the purpose of the study to minimize the likelihood that their expectations will influence their behaviors. - In other studies, the hypothesis simply is never presented to the participants. In these cases, we often describe the participants as being "blind" to the hypothesis or simply as "naive." However, there is no official term that is used to describe this type of research. - In particular, studies in which the participants do not know the hypothesis are not classified as single-blind or double-blind research. These two terms apply to studies in which the hypothesis is unknown to the researcher (single-blind) or is unknown to both the researcher and the participants (double-blind).

Orne (1962) describes participation in a research study as a social experience in which both the researcher and the participant have roles to play. In particular, the researcher is clearly in charge and is expected to give instructions. The participant, on the other hand, is expected to follow instructions. In fact, most participants strive to be a "good subject" and work hard to do a good job for the researcher. Although this may appear to be good for the researcher's study, it can create two serious problems

- First, participants often try to figure out the purpose of the study and then modify their responses to fit their perception of the researcher's goals. - Second, participants can become so dedicated to performing well that they do things in a research study that they would never do in a normal situation.

It also is common for researchers in the behavioral sciences to measure variables using rating scales.

- For example, participants are asked to use a scale from 1 to 5 to rate the degree to which they agree (or disagree) with controversial statements. The five numerical values are often labeled, for example: 1 = Strongly Agree, 2 = Somewhat Agree, 3 = Neutral, 4 = Somewhat Disagree, 5 = Strongly Disagree.

- Although the choices appear to form an interval scale with equal distance between successive numbers, is the distance between Strongly Agree and Somewhat Agree exactly equal to the distance between Neutral and Somewhat Disagree? Again, should the scale be treated as ordinal or interval?

- Fortunately, the issue of distinguishing between ordinal and interval scales of measurement has been resolved. - First, researchers have routinely treated scores from ambiguous scales, such as IQ scores and rating scales, as if they were from an interval scale. By tradition or convention, such scores have been added and averaged and multiplied as if they were regular numerical values. - In addition, scientists have argued convincingly for over 50 years that this kind of mathematical treatment is appropriate for these types of ordinal data (Lord, 1953). For a recent review of the history of this issue, see Norman (2010).

- the fact that the categories form an ordered sequence means that there is a directional relationship between the categories. - With measurements from an ordinal scale, we can determine whether two individuals are different, and we can determine the direction of difference. However...

- However, ordinal measurements do not allow us to determine the magnitude of the difference between the two individuals. For example, a large coffee is bigger than a small coffee, but we do not know how much bigger. Other examples of ordinal scales are socioeconomic class (upper, middle, and lower) and T-shirt sizes (small, medium, and large).

The simple fact that two sets of measurements are related does not necessarily mean that they are identical. For example, we could claim to measure people's height by having them step on a bathroom scale and recording the number that appears. Note that we claim to be measuring height, although we are actually measuring weight.

- However, we could provide support for our claim by demonstrating a reasonably strong relationship between our scores and more traditional measurements of height (taller people tend to weigh more; shorter people tend to weigh less). - Although we can establish some degree of concurrent validity for our measurements, it should be obvious that a measurement of weight is not really a valid measure of height. In particular, these two measurements behave in different ways and are influenced by different factors. Manipulating diet, for example, influences weight but has little or no effect on height.

How are validity and reliability related?

- They are related to each other in that reliability is a prerequisite for validity; that is, a measurement procedure cannot be valid unless it is reliable. - If we measure your IQ twice and obtain measurements of 75 and 160, not only are the measurements unreliable but we also have no idea what your IQ actually is. The huge discrepancy between the two measurements is impossible if we are truly measuring intelligence. Therefore, we must conclude that there is so much error in the measurements that the numbers themselves have no meaning.

- Internal consistency: Often, a complex construct such as intelligence or personality is measured using a test or questionnaire consisting of multiple items. The idea is that no single item or question is sufficient to provide a complete measure of the construct. A common example is the use of exams that consist of multiple items (questions or problems) to measure performance in an academic course. - The final measurement for each individual is then determined by adding or averaging the responses across the full set of items. A basic assumption in this process is that each item (or group of items) measures a part of the total construct. - If this is true, then there should be some consistency between the scores for different items or different groups of items.

error expressed as an equation:

- Measured Score = True Score + Error - Ex. if we try to measure your intelligence with an IQ test, the score we get is determined partially by your actual level of intelligence (your true score), but also is influenced by a variety of other factors such as your current mood, your level of fatigue, your general health, how lucky you are at guessing on questions to which you do not know the answers, and so on. These other factors are lumped together as error and are typically a part of any measurement. - It is generally assumed that the error component changes randomly from one measurement to the next, raising your score for some measurements and lowering it for others.
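The equation above can be illustrated with a short simulation. This is a hypothetical sketch: the true score of 110 and the error spread of 5 points are made-up values chosen only to show how random error averages out across repeated measurements.

```python
import random

random.seed(42)

TRUE_SCORE = 110   # hypothetical "true" IQ (assumed value for illustration)
ERROR_SPREAD = 5   # hypothetical spread of the random error component

def measure_iq():
    # Measured Score = True Score + Error, where the error component
    # changes randomly from one measurement to the next.
    return TRUE_SCORE + random.gauss(0, ERROR_SPREAD)

scores = [measure_iq() for _ in range(1000)]
average = sum(scores) / len(scores)

# Because the error is random (sometimes raising the score, sometimes
# lowering it), the average of many measurements converges on the true score.
print(round(average, 1))
```

Any single measured score may land several points above or below 110, which is why a single measurement mixes the true score with error.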

In a classic example of experimenter bias

- Rosenthal and Fode (1963) had student volunteers act as the experimenters in a learning study. The students were given rats to train in a maze. Half of the students were led to believe that their rats were specially bred to be "maze bright." The remainder were told that their rats were bred to be "maze dull." In reality, both groups of students received the same type of ordinary laboratory rat, neither bright nor dull. - Nevertheless, the findings showed differences in the rats' performance between the two groups of experimenters. The "bright" rats were better at learning the maze. The student expectations influenced the outcome of the study. How did their expectations have this effect? Apparently, there were differences in how the students in each group handled their rats, and the handling, in turn, altered the rats' behavior. - Note that the existence of experimenter bias means that the researcher is not obtaining valid measurements. Instead, the behaviors or measurements are being distorted by the experimenter. In addition, experimenter bias undermines reliability because the participants may produce very different scores if tested under the same conditions by a different experimenter.

Measures of reliability(terms+general definitions)

- Test-retest reliability is established by comparing the scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores. If alternative versions of the measuring instrument are used for the two measurements, the reliability measure is called parallel-forms reliability. - Inter-rater reliability is the degree of agreement between two observers who simultaneously record measurements of the behaviors. - Split-half reliability is obtained by splitting the items on a questionnaire or test in half, computing a separate score for each half, and then calculating the degree of consistency between the two scores for a group of participants.
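The "calculating a correlation between the two sets of scores" step above can be sketched as follows. The two score lists are hypothetical example data (not from the text), and the correlation is computed with a plain Pearson formula:

```python
# Test-retest reliability: the same individuals measured on two occasions.
# These scores are invented purely for illustration.
test1 = [98, 105, 110, 120, 95, 130, 102, 115]
test2 = [100, 103, 112, 118, 97, 127, 105, 113]

def pearson_r(x, y):
    # Pearson correlation: covariance divided by the product of
    # the standard deviations (computed from sums of squared deviations).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(test1, test2)
# A correlation near +1.00 indicates high test-retest reliability.
print(round(r, 2))
```

The same correlation computation applies to parallel-forms, inter-rater, and split-half reliability; only the source of the two score sets changes.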

- For example, in a field study, participants are observed in their natural environment and are much less likely to know that they are being investigated; hence, they are less reactive. Although this strategy is often possible, some variables are difficult to observe directly (e.g., attitudes), and in some situations, ethical considerations prevent researchers from secretly observing people. Alternative strategies include the following:

- The true purpose of a questionnaire can be masked by embedding a few critical questions in a larger set of irrelevant items or by deliberately using questions with low face validity. - Another option is to suggest (subtly or openly) that the participant is performing one task when, in fact, we are observing and measuring something else. In either case, some level of deception is involved, which can raise a question of ethics (see Chapter 4). - The most direct strategy for limiting reactivity is to reassure participants that their performance or responses are completely confidential and anonymous, and encourage them to make honest, natural responses. Any attempt to reassure and relax participants helps reduce reactivity.

The fact that the categories are all the same size makes it possible to

- determine the distance between two points on the scale. o For example, you know that a measurement of 70 degrees Fahrenheit is higher than a measurement of 55 degrees, and you know that it is exactly 15 degrees higher.

- Typically, a researcher knows the predicted outcome of a research study and is in a position to influence the results, either intentionally or unintentionally. o For example, an experimenter might be warm, friendly, and encouraging when presenting instructions to a group of participants in a treatment condition expected to produce good performance, and appear cold, aloof, and somewhat stern when presenting the instructions to another group in a comparison treatment for which performance is expected to be relatively poor.

- Critically examining and questioning a published measurement procedure can lead to new research ideas. As you read published research reports, always question the measurement procedures, asking:

- Why was the variable measured as it was? - Would a different scale have been better? - Were the results biased by a lack of sensitivity or by range effects? - What would happen if the variable(s) were defined and measured in a different way? - If you can reasonably predict that a different measurement strategy would change the results, then you have the grounds for a new research study. - Keep in mind, however, that if you develop your own operational definition or measurement procedure, you need to demonstrate validity and reliability, a task that is very detailed and time consuming. Some researchers dedicate their entire careers to developing a measure.

In general, a range effect suggests

- a basic incompatibility between the measurement procedure and the individuals measured.

- Also decide whether the scale of measurement (nominal, ordinal, interval, or ratio) is appropriate for the kind of conclusion you would like to make. Simply to establish that differences exist,

- a nominal scale may be sufficient.

An artifact is

- a non-natural feature accidentally introduced into something being observed. In the context of a research study, an artifact is an external factor that may influence or distort the measurements.

Thus, the process of measurement involves two components:

- a set of categories and a procedure for assigning individuals to categories.

theory

- a set of statements about the mechanisms underlying a particular behavior. Theories help organize and unify different observations of the behavior and its relationship with other variables. A good theory generates predictions about the behavior.

Behavioral measures provide researchers with

- a vast number of options, making it possible to select the behaviors that seem to be best for defining and measuring the construct. o For example, the construct "mental alertness" could be operationally defined by behaviors such as reaction time, reading comprehension, logical reasoning ability, or ability to focus attention. Depending on the specific purpose of a research study, one of these measures probably is more appropriate than the others. - In clinical situations in which a researcher works with individual clients, a single construct such as depression may reveal itself as a separate, unique behavioral problem for each client. In this case, the clinician can construct a separate, unique behavioral definition of depression that is appropriate for each patient. - In other situations, the behavior may be the actual variable of interest and not just an indicator of some hypothetical construct. For a school psychologist trying to reduce disruptive behavior in the classroom, it is the actual behavior that the psychologist wants to observe and measure. In this case, the psychologist does not use the overt behavior as an operational definition of an intangible construct but rather simply studies the behavior itself.

A ratio scale, on the other hand, is characterized by

- a zero point that is not an arbitrary location. - Instead, the value 0 on a ratio scale is a meaningful point representing none (a complete absence) of the variable being measured. The existence of an absolute, non-arbitrary zero point means that we can measure the absolute amount of the variable; that is, we can measure the distance from 0. - This makes it possible to compare measurements in terms of ratios. o For example, a glass with 8 ounces of water (8 more than 0) has twice as much as a glass with 4 ounces (4 more than 0). - With a ratio scale, we can measure the direction and magnitude of the difference between measurements and describe differences in terms of ratios. - Ratio scales are quite common and include physical measures, such as height and weight, as well as variables, such as reaction time or number of errors on a test.

In situations in which there is an established standard for measurement units, it is possible to define the

- accuracy of a measurement process. For example, we have standards that define precisely what is meant by an inch, a liter, a pound, and a second.

The categories that make up an ordinal scale have

- different names and are organized in an ordered series.

concurrent validity

- establishes consistency between two different procedures for measuring the same variable, suggesting that the two measurement procedures are measuring the same thing. Because one procedure is well established and accepted as being valid, we infer that the second procedure must also be valid. -the validity of a new measurement is established by demonstrating that the scores obtained from the new measurement technique are directly related to the scores obtained from another, better-established procedure for measuring the same variable.

1st step

- find an unanswered question that will serve as a research idea

Similarly, clustering at the low end of the scale can produce a

- floor effect: the clustering of scores at the low end of a measurement scale, allowing little or no possibility of decreases in value.

Limitations of Operational Definitions

- An operational definition is not the same as the construct itself; we can define and measure variables (intelligence, motivation, anxiety, etc.), but in fact we are measuring external manifestations that provide an indication of the underlying variables. As a result, there are always concerns about the quality of operational definitions and the measurements they produce. - There is not a one-to-one relationship between the variable that is being measured and the actual measurements produced by the operational definition. - For example, consider an instructor evaluating the students in a class. The underlying variable is knowledge or mastery of subject matter, and the instructor's goal is to obtain a measure of knowledge for each student. But knowledge is a construct that cannot be directly observed or measured. Therefore, instructors typically give students a task (such as an exam, an essay, or a set of problems), and then measure how well students perform the task. Although it makes sense to expect that performance is a reflection of knowledge, performance and knowledge are not the same thing. For example, physical illness or fatigue may affect performance on an exam, but they probably do not affect knowledge. There is not a one-to-one relationship between the variable that the instructor wants to measure (knowledge) and the actual measurements that are made (performance).

Often, in a range effect, the measurement is

- based on a task that is too easy (thereby producing high scores) or too difficult (thereby producing low scores) for the participants being tested. Note that it is not the measurement procedure that is at fault but rather the fact that the procedure is used with a particular group of individuals. o For example, a measurement that works well for 4-year-old children may produce serious range effects if used with adolescents. For this reason, it is advisable to pretest any measurement procedure for which potential range effects are suspected. Simply measure a small sample of representative individuals to be sure that the obtained values are far enough from the extremes of the scale to allow room to measure changes in either direction.

In general, if we expect fairly small, subtle changes in a variable, then the measurement procedure must

- be sensitive enough to detect the changes, and the scale of measurement must have enough different categories to allow discrimination among individuals.

A measure cannot be valid unless it is reliable...

- but a measure can be reliable without being valid.

When the range is restricted at the high end, the problem is called a

- ceiling effect (the measurements bump into a ceiling and can go no higher): the clustering of scores at the high end of a measurement scale, allowing little or no possibility of increases in value.

- We begin by specifying how each of the variables will be measured. We defined variables as

- characteristics or conditions that change or have different values for different individuals. - Usually, researchers are interested in how variables are affected by different conditions or how variables differ from one group of individuals to another.

scores from interval or ratio scales are

- compatible with basic arithmetic, which permits more sophisticated analysis and interpretation. For example, measurements from interval or ratio scales can be used to compute means and variances, and they allow hypothesis testing with t tests or analysis of variance. Ordinal measurements, on the other hand, do not produce meaningful values for means and variances and are not appropriate for most commonly used hypothesis tests. As a result, interval or ratio scale data are usually preferred for most research situations.

Face validity

- concerns the superficial appearance, or face value, of a measurement procedure. Does the measurement technique look like it measures the variable that it claims to measure? - It is based on subjective judgment and is difficult to quantify.

Because new research results are reported every day

- construct validity is never established absolutely. Instead, construct validity is an ideal or a goal that develops gradually from the results of many research studies that examine the measurement procedure in a wide variety of situations.

behavioral measures

- constructs often reveal themselves in overt behaviors that can be observed and measured. The behaviors may be completely natural events such as laughing, playing, eating, sleeping, arguing, or speaking. Or the behaviors may be structured, as when a researcher measures performance on a designated task. In the latter case, a researcher usually develops a specific task in which performance is theoretically dependent on the construct being measured. o For example, reaction time could be measured to determine whether a drug affects mental alertness; the number of words recalled from a list provides a measure of memory ability; and performance on an IQ test is a measure of intelligence. To measure the "fear of flying," a researcher could construct a hierarchy of potential behaviors (visiting an airport, walking onto a plane, sitting in a plane while it idles at the gate, riding in a plane while it taxies on a runway, and actually flying) and measure how far up the hierarchy an individual is willing to go.

Many research variables, particularly variables of interest to behavioral scientists, are in fact hypothetical entities created from theory and speculation. Such variables are called

- constructs or hypothetical construct

In addition to using operational definitions as a basis for measuring variables, they also can be used to

- define variables to be manipulated. Ex. the construct "hunger" can be operationally defined as the number of hours of food deprivation. Ex. In a research study one group could be tested immediately after eating a full meal, a second group could be tested 6 hours after eating, and a third group could be tested 12 hours after eating. In this study, we are comparing three different levels of hunger, which are defined by the number of hours without food. Alternatively, we could measure hunger for a group of rats by recording how much food each animal eats when given free access to a dish of rat chow. The amount that each rat eats defines how hungry it is.

Participants who are aware they are being observed and measured may react in unpredictable ways. In addition, the research setting often creates a set of cues or demand characteristics that suggests what kinds of behavior are appropriate or expected. The combination of

- demand characteristics and participant reactivity can change participants' normal behavior and thereby influence the measurements they produce.

interval scale, we can

- determine the direction and the magnitude of a difference.

Remember, the difference between an interval scale and a ratio scale is the definition of the zero point. Thus, measurements of

- height in inches or weight in pounds could be either interval or ratio depending on how the zero point is defined. o For example, with traditional measurements of weight, zero corresponds to none (no weight) and the measurements form a ratio scale. In this case, an 80-pound child (80 pounds above 0) weighs twice as much as a 40-pound child (40 pounds above 0). - Now consider a set of measurements that define the zero point as the average weight for the age group. In this situation, each child is being measured relative to the average, so a child who is 12 pounds above average receives a score of +12 pounds. A child who is 4 pounds below average is assigned a score of −4 pounds. - Now the measurements make up an interval scale. In particular, a child who is 12 pounds above average (+12) does not weigh twice as much as a child who is 6 pounds above average (+6). You should note, however, that the ratio and the interval measurements provide the same information about the distance between two scores. - For the ratio measurements, 84 pounds is 4 more than 80 pounds. - For the interval measurements (with a group average of 76 pounds), a score of +8 pounds is 4 more than a score of +4 pounds. For most applications, the ability to measure distances is far more important than the ability to measure ratios. - Therefore, in most situations, the distinction between interval and ratio scales has little practical significance.
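The interval-versus-ratio distinction above can be checked with simple arithmetic. This sketch reuses the 80-pound and 40-pound children from the text; the group average of 70 pounds used to build the interval scores is an assumed value for illustration:

```python
# Ratio scale: zero means "no weight," so ratios are meaningful.
weights = [80, 40]                 # pounds, for two hypothetical children
print(weights[0] / weights[1])     # 2.0 -> the first child weighs twice as much

# Interval scale: the same children scored relative to an assumed group
# average of 70 pounds. The zero point is now arbitrary, so ratios are
# NOT meaningful (+10 is not "twice" -30 in any sensible way).
average = 70
relative = [w - average for w in weights]   # [10, -30]

# Distances, however, are preserved on both scales:
print(weights[0] - weights[1])     # 40-pound difference (ratio scale)
print(relative[0] - relative[1])   # same 40-pound difference (interval scale)
```

The ratio 2.0 only makes sense because the raw weights are measured from an absolute zero; the relative scores carry the same distance information but lose the meaningful ratios.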

- For most variables that you are likely to encounter, numerous research studies probably already have examined the same variables. Past research has studied each variable in a variety of different situations and has documented which factors influence the variable and how different values of the variable produce different kinds of behavior In short, past research has demonstrated

- how the specific variable behaves

Constructs

- hypothetical attributes or mechanisms that help explain and predict behavior in a theory.

Divergent validity

- involves demonstrating that we are measuring one specific construct and not combining two different constructs in the same measurement process. The goal is to differentiate between two conceptually distinct constructs by measuring both constructs and then showing that there is little or no relationship between the two measurements.

3rd step in the research process

- is determining a method for defining and measuring the variables that are being studied. - Occasionally, a research study involves variables that are well defined, easily observed, and easily measured. For example, a study of physical development might involve the variables of height and weight. Both of these variables are tangible, concrete attributes that can be observed and measured directly.

The distinguishing characteristic of an interval scale is that

- it has an arbitrary zero point. That is, the value 0 is assigned to a particular location on the scale simply as a matter of convenience or reference. Specifically, a value of 0 does not indicate the total absence of the variable being measured. o For example, a temperature of 0 degrees Fahrenheit does not mean that there is no temperature, and it does not prohibit the temperature from going even lower. Interval scales with an arbitrary zero point are fairly rare. The two most common examples are the Fahrenheit and Celsius temperature scales. o Other examples are altitude (above and below sea level), golf scores (above and below par), and other relative measures, such as above and below average rainfall.

negative aspects of self report

- it is very easy for participants to distort self-report measures. A participant may deliberately lie to create a better self-image, or a response may be influenced subtly by the presence of a researcher, the wording of the questions, or other aspects of the research situation. - When a participant distorts self-report responses, the validity of the measurement is undermined.

A measurement procedure is said to have reliability if

- it produces identical (or nearly identical) results when it is used repeatedly to measure the same individual under the same conditions. - For example, if we use an IQ test to measure a person's intelligence today, then use the same test for the same person under similar conditions next week, we should obtain nearly identical IQ scores.

In the case of reaction time, most researchers solve the problem by

- measuring reaction times in several trials and computing an average. The average value provides a much more stable, more reliable measure of performance.

- Earlier, we used the example of attempting to measure height by having people step on a bathroom scale. Because height and weight are related, the measurement that we obtain from the scale would be considered a valid measure of height, at least in terms of concurrent validity. However, the weight measurement is

- not a valid method of measuring height in terms of construct validity. In particular, height is not influenced by short periods of food deprivation. - Weight measurements, on the other hand, are affected by food deprivation. Therefore, measurements of weight do not behave in accordance with what is known about the construct "height," which means that the weight-measurement procedure does not have construct validity as a measure of height.

For example, in a field study, participants are

- observed in their natural environment and are much less likely to know that they are being investigated, hence they are less reactive. Although this strategy is often possible, some variables are difficult to observe directly (e.g., attitudes), and in some situations, ethical considerations prevent researchers from secretly observing people.

Researchers can measure these external, observable events as an indirect method of measuring the construct itself. Typically, researchers identify a behavior, or a cluster of behaviors associated with a construct; the behavior is then measured, and the resulting measurements are used as a definition and a measure of the construct. This method of defining and measuring a construct is called an

- operational definition; Researchers often refer to the process of using an operational definition as operationalizing a construct.

In general, critically examine any measurement procedure and ask yourself whether a different technique might

- produce better measurements.

The categories that make up a nominal scale simply represent

- qualitative (not quantitative) differences in the variable measured. The categories have different names but are not related to each other in any systematic way. o For example, if you were measuring academic majors for a group of college students, the categories would be art, chemistry, English, history, psychology, and so on. Each student would be placed in a category according to his or her major.

The indirect connection between the variable and the measurements can result in two general problems(limitations of O.D prt 2)

1st - Operational definitions leave out important components of a construct. For example, it is possible to define depression in terms of behavioral symptoms (social withdrawal, insomnia, etc.). However, behavior represents only a part of the total construct. Depression includes cognitive and emotional components that are not included in a totally behavioral definition. One way to reduce this problem is to include two or more different procedures to measure the same variable. 2nd - Operational definitions often include extra components that are not part of the construct being measured. For example, a self-report of depression in a clinical interview or on a questionnaire is influenced by the participant's verbal skills (ability to understand questions and express feelings and thoughts) as well as the participant's willingness to reveal personal feelings or behaviors that might be perceived as odd or undesirable. A participant who is able and willing to describe personal symptoms may appear to be more depressed than someone who withholds or conceals information.

Internal consistency

A measure of reliability; the degree to which a test yields similar scores across its different parts, such as on odd versus even items.
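The odd-versus-even scoring described above can be sketched directly. The item responses below are hypothetical (1 = correct, 0 = incorrect), invented only to show how the two half-scores are formed:

```python
# Split-half scoring for internal consistency: score the odd-numbered
# items and the even-numbered items separately for each participant.
# Each row is one hypothetical participant's responses to 8 test items.
participants = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
]

odd_scores  = [sum(items[0::2]) for items in participants]  # items 1, 3, 5, 7
even_scores = [sum(items[1::2]) for items in participants]  # items 2, 4, 6, 8

# If the test is internally consistent, participants who score high on one
# half should also score high on the other; in practice the two sets of
# half-scores would then be correlated to quantify the agreement.
for odd, even in zip(odd_scores, even_scores):
    print(odd, even)
```

Here the two half-scores track each other closely across participants, which is the pattern a test with good internal consistency should show.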

Although striving to be a responsible participant is the most common response, individuals may adopt different ways of responding to experimental cues based on whatever they judge to be an appropriate role in the situation. These ways of responding are referred to as subject roles or subject role behaviors. Four different subject roles have been identified

- The good subject role. These participants have identified the hypothesis of the study and are trying to produce responses that support the investigator's hypothesis. As good as this may sound, we do not want participants to adopt the good subject role because then we do not know if the results of the study extend to individuals who did not adopt such a role.
- The negativistic subject role. These participants have identified the hypothesis of the study and are trying to act contrary to the investigator's hypothesis. Often these individuals are upset that they must participate and are directing their anger toward sabotaging the study. Clearly, we do not want participants in our study to adopt this role.
- The apprehensive subject role. These participants are overly concerned that their performance in the study will be used to evaluate their abilities or personal characteristics. They try to place themselves in a desirable light by responding in a socially desirable fashion instead of truthfully. Again, we do not want participants to adopt this role because they are not providing truthful responses.
- The faithful subject role. These participants attempt to follow instructions to the letter and avoid acting on any suspicions they have about the purpose of the study. Two types of participants take on this role: those who want to help science and know they should not allow their suspicions to enter into their responses, and those who are simply apathetic and do not give the study much thought. These are the participants we really want in our study.

The following scenario illustrates the concepts of convergent and divergent validity.
- One variable of concern for researchers interested in couples therapy is the quality of the relationship.
o Lawrence et al. (2011) introduced the Relationship Quality Interview (RQI), which is intended to measure relationship quality in five specific domains: (a) emotional intimacy, (b) sexual relationship, (c) support transactions, (d) power sharing, and (e) problem solving.

The researchers demonstrated convergent validity by showing strong relationships among the five RQI scale ratings, indicating that the five domains of the RQI converge on the same construct (relationship quality).
- After establishing convergent validity, however, the researchers wanted to demonstrate that the RQI is really measuring relationship quality and not some other variable.
o For example, the scores may actually reflect the general level of satisfaction with the relationship rather than the quality.
o It is possible, for example, for couples to be satisfied with a low-quality relationship.
o To resolve this problem, it is necessary to demonstrate that the two constructs, "quality" and "satisfaction," are separate and distinct.
o The researchers established divergent validity by showing a weak relationship between the RQI quality scores and measures of general satisfaction. Specifically, correlations between the domain-specific measures of quality from the RQI and global relationship satisfaction scores were generally low.
o By demonstrating that two or more different methods of measurement produce strongly related scores for the same construct (convergent validity), and by demonstrating a weak relationship between the measurements for two distinct constructs (divergent validity), you can provide very strong and convincing evidence of validity. That is, there is little doubt that you are actually measuring the construct that you intend to measure.
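The logic above boils down to a pattern of correlations: strong between two measures of the same construct, weak between measures of different constructs. A minimal sketch with made-up numbers (these scores are hypothetical, not the Lawrence et al., 2011 data):

```python
# Convergent vs. divergent validity as a pattern of correlations.
# Two measures of the SAME construct should correlate strongly (convergent);
# a measure of a DISTINCT construct should correlate weakly (divergent).

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores for six couples (illustrative only).
quality_interview = [8, 3, 9, 5, 7, 2]   # quality via interview ratings
quality_survey    = [7, 4, 9, 5, 8, 3]   # quality via a questionnaire
satisfaction      = [6, 5, 7, 6, 5, 7]   # a distinct construct

convergent = pearson_r(quality_interview, quality_survey)
divergent = pearson_r(quality_interview, satisfaction)
print(f"convergent r = {convergent:.2f}")  # strong: near 1
print(f"divergent  r = {divergent:.2f}")   # weak: near 0
```

The strong first correlation supports convergent validity; the weak second correlation supports divergent validity, suggesting quality and satisfaction are separate constructs.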

The characteristic that differentiates interval and ratio scales is

The zero point

common sources of error are:

· Observer error: The individual who makes the measurements can introduce simple human error into the measurement process, especially when the measurement involves a degree of human judgment.
o Ex. consider a baseball umpire judging balls and strikes or a college professor grading student essays. The same pitch could be called a ball once and a strike later in the game, or the same essay could receive an A one semester and a B at a different time. In each case, the measurement includes some error introduced by the observer.
· Environmental changes: Although the goal is to measure the same individual under identical circumstances, this ideal is difficult to attain. Often, there are small changes in the environment from one measurement to another, and these small changes can influence the measurements. There are so many environmental variables (such as time of day, temperature, weather conditions, and lighting) that it is essentially impossible to obtain two identical environmental conditions.
· Participant changes: The participant can change between measurements. As noted earlier, a person's degree of focus and attention can change quickly and can have a dramatic effect on measures of reaction time.
o Such changes may cause the obtained measurements to differ, producing what appear to be inconsistent or unreliable measurements. Ex. hunger probably does not lower intelligence, but it can be a distraction that causes a lower score on an IQ test.
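These error sources are often summarized as: observed score = true score + error. A small simulation (entirely hypothetical numbers) shows how larger random error from any of the sources above makes two measurements of the same people less consistent:

```python
# Sketch of measurement error: each observed score is a stable true score
# plus random error (observer, environment, or participant changes).
# More error noise -> lower test-retest consistency. Data are simulated.
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def measure(true_scores, error_sd, rng):
    """One round of measurement: true score plus random error."""
    return [t + rng.gauss(0, error_sd) for t in true_scores]

rng = random.Random(0)
# 200 hypothetical people with stable IQ-like true scores.
true_scores = [rng.uniform(80, 120) for _ in range(200)]

results = {}
for error_sd in (1, 5, 15):
    test1 = measure(true_scores, error_sd, rng)   # first measurement
    test2 = measure(true_scores, error_sd, rng)   # second measurement
    results[error_sd] = pearson_r(test1, test2)
    print(f"error sd {error_sd:>2}: test-retest r = {results[error_sd]:.2f}")
```

With little error the two measurements agree almost perfectly; as the error grows, the same people produce noticeably different scores on the two occasions, which is exactly what "unreliable measurement" means.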

