Research Methods in Psychology: Chapter 5


It is usually not necessary to establish discriminant validity between

a measure and something that is completely unrelated

To ensure content validity

a measure must capture all parts of a defined construct

The operational definition of a variable represents

a researcher's specific decision about how to measure or manipulate the conceptual variable

Face validity is

a subjective judgment: if it looks as if it should be a good measure, it has face validity

For research on children

self-reports may be replaced with parent reports or teacher reports. These reports ask parents or teachers to respond to a series of questions about the child

Any variable can be expressed in two ways:

conceptual variable and operational definition

Validity

concerns whether the operationalization is measuring what it is supposed to measure

Researchers check face validity by

consulting experts

Interval scale

- A scale of measurement that applies to the numerals of a quantitative variable that meet two conditions: First, the numerals represent equal intervals (distances) between levels, and second, there is no "true zero" (a person can get a score of 0, but the 0 does not really mean "nothing") - Ex: IQ score, shoe size, degree of agreement on a 1-7 scale

Ratio scale

- A scale of measurement that applies when the numerals of a quantitative variable have equal intervals and when the value of 0 truly means "nothing" - Ex: Height in cm, number of exam questions answered correctly, number of seconds to respond to a computer task

Ordinal scale

- A scale of measurement that applies when the numerals of a quantitative variable represent a ranked order - Ex: Order of finishers in a swimming race and a ranking of 10 movies from most to least favorite

Slope direction

- The direction of a relationship - Can be positive, negative, or zero - that is, sloping up, sloping down, or not sloping at all

Categorical variables (aka nominal variables)

- Variables whose levels are categories rather than numbers - Examples: sex, species, nationality, and favorite song

Strength

A description of an association indicating how closely the data points in a scatterplot cluster along a line of best fit drawn through them

Physiological measures

A measure which operationalizes a variable by recording biological data such as brain activity, hormone levels, or heart rate.

Observational measure

A measure which operationalizes a variable by recording observable behaviors or physical traces of behaviors.

Self-report measure

A measure which operationalizes a variable by recording people's answers to questions about themselves in a questionnaire or interview

Correlation coefficient (r)

A method of evaluation which indicates how close the dots on scatterplots are to a line drawn through them

A negative r indicates

a big problem when evaluating reliability: the two administrations, coders, or versions of a measure should be positively correlated

Internal reliability is relevant

for measures that use more than one item to get at the same construct

A measurement should correlate more strongly with ______________________ ____________________ and less strongly with _____________________ ______________________

similar traits (convergent validity); dissimilar traits (discriminant validity)

The closer the Cronbach's alpha is to 1

the better the scale's reliability (for self-report measures, researchers are looking for Cronbach's alpha of .70 or higher)
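As a minimal sketch (using made-up item scores, not data from the text), Cronbach's alpha can be computed from the item variances and the variance of participants' total scores:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: a list of k lists, each holding one item's scores
    across all participants.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each person's total
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical: 5 participants answer a 3-item scale (1-7 agreement)
items = [
    [5, 6, 4, 7, 3],  # item 1
    [5, 7, 4, 6, 2],  # item 2
    [4, 6, 5, 7, 3],  # item 3
]
alpha = cronbach_alpha(items)  # about .95, above the .70 benchmark
```

Because alpha here exceeds .70, a researcher would be justified in averaging the three items into a single scale score.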

The numbers below the scatterplots are

the correlation coefficients, or r

The r indicates the same two things on a scatterplot:

the direction of the relationship and the strength of the relationship, both of which psychologists use in evaluating reliability evidence

If the internal reliability is good

the researchers can average all the items together

Intelligence tests can be considered

observational measures; this is because the people who administer such tests in person are observing people's intelligent behaviors (such as being able to correctly solve a puzzle or quickly detect a pattern)

A researcher could operationalize happiness by

observing how many times a person smiles

Evidence for reliability is a special example of an association claim because

of the association between one version of the measure and another, between one coder and another, or between an earlier time and a later time.

The value of r can only fall between

1.0 and -1.0

Quantitative variables

- Variables that are coded with meaningful numbers - Height and weight are quantitative because they are measured in numbers such as 150 centimeters or 45 kilograms

Known-groups paradigm

- A method for establishing criterion validity in which a researcher tests two or more groups, known to differ on the variable of interest, to ensure that they score differently on a measure of that variable - The known-groups paradigm can be used to validate self-report measures

Cronbach's alpha (or coefficient alpha)

A correlation-based statistic that measures a scale's internal reliability

Discriminant validity

An empirical test of the extent to which a measure does not associate strongly with measures of other, theoretically different constructs

Convergent validity

An empirical test of the extent to which a measure is associated with other measures of a theoretically similar construct

With interrater reliability

Two or more independent observers will come up with consistent (or very similar) findings. This reliability is the most relevant for observational measures

Criterion validity

Validity that evaluates whether the measure under consideration is related to a concrete outcome, such as behavior, that it should be related to, according to the theory being tested

Test-retest reliability

When a researcher gets consistent scores every time he or she uses the measure

Internal reliability

When a study participant gives a consistent pattern of answers, no matter how the researcher has phrased the question

Interrater reliability

When consistent scores are obtained no matter who measures or observes

Researchers may use two or more statistical devices for data analysis:

scatterplots and the correlation coefficient (r)

Facial EMG can detect

a happy facial expression, because people who are smiling show particular patterns of muscle movements around the eyes and cheeks

As with r

a kappa close to 1.0 means that the two raters agreed

4. Classify each result below as an example of face validity, content validity, convergent and discriminant validity, or criterion validity.
a. A professor gives a class of 40 people his five-item measure of conscientiousness (e.g., "I get chores done right away," "I follow a schedule," "I do not make a mess of things"). Average scores are correlated (r = -.20) with how many times each student has been late to class during the semester.
b. A professor gives a class of 40 people his five-item measure of conscientiousness (e.g., "I get chores done right away," "I follow a schedule," "I do not make a mess of things"). Average scores are more highly correlated with a self-report measure of tidiness (r = .50) than with a measure of general knowledge (r = .09).
c. The researcher e-mails his five-item measure of conscientiousness (e.g., "I get chores done right away," "I follow a schedule," "I do not make a mess of things") to 20 experts in personality psychology and asks them if they think his items are a good measure of conscientiousness.
d. The researcher e-mails his five-item measure of conscientiousness (e.g., "I get chores done right away," "I follow a schedule," "I do not make a mess of things") to 20 experts in personality psychology and asks them if they think he has included all the important aspects of conscientiousness.

a. Criterion validity
b. Convergent and discriminant validity
c. Face validity
d. Content validity

1. Classify each operational variable below as categorical or quantitative. If the variable is quantitative, further classify it as ordinal, interval, or ratio.
a. Degree of pupil dilation in a person's eyes in a study of romantic couples (measured in millimeters)
b. Number of books a person owns
c. A book's sales rank on Amazon.com
d. Location of a person's hometown (urban, rural, or suburban)
e. Nationality of the participants in a cross-cultural study of Canadian, Ghanaian, and French students
f. A student's grade in school

a. Quantitative, ratio
b. Quantitative, ratio
c. Quantitative, ordinal
d. Categorical
e. Categorical
f. Quantitative, interval

3. Classify each of the following results as an example of internal reliability, interrater reliability, or test-retest reliability.
a. A researcher finds that people's scores on a measure of extroversion stay stable over two months.
b. An infancy researcher wants to measure how long a 3-month-old baby looks at a stimulus on the right and left sides of a screen. Two undergraduates watch a tape of the eye movements of ten infants and time how long each baby looks to the right and to the left.
c. A researcher asks a sample of 40 people a set of five items that are all capturing how extroverted they are. The Cronbach's alpha for the five items is found to be .65.

a. Test-retest reliability
b. Interrater reliability
c. Internal reliability

If a measurement is reliable

there is a consistent pattern of scores every time it is measured

Measurement reliability and measurement validity

are separate steps in establishing construct validity

Convergent validity and discriminant validity

are usually evaluated together, as a pattern

2. Which of the following correlation coefficients best describes the pictured scatterplot?

b. r = -.95

An IQ test is an interval scale

because the distance between IQ scores of 100 and 105 represents the same distance as that between 110 and 115. However, a score of 0 on an IQ test does not mean a person has "no intelligence".

Test-retest reliability

can apply whether the operationalization is self-report, observational, or physiological. However, it is primarily relevant when researchers are measuring constructs (such as intelligence, personality, religiosity) that they expect to be relatively stable in most people

Moment-to-moment happiness

has been measured using facial electromyography (EMG) - a way of electronically recording tiny movements in the muscles in the face.

A set of items has internal reliability

if its items correlate strongly with one another

The validity of a measure

is not the same as its reliability

The conceptual definition

is the researcher's definition of the variable in question at a theoretical level

A more common and efficient way to evaluate reliability relationships

is to use the correlation coefficient

A measurement has face validity when

it appears to experts to be a plausible measure of the variable in question

If an IQ test has criterion validity

it should be correlated with behaviors that capture the construct of intelligence, such as how fast people can learn a complex set of symbols (an outcome that represents the conceptual definition of intelligence)

No matter what type of operationalization is used, if it is a good measure of its construct,

it should correlate with a criterion behavior or outcome that is related to that construct

Observational measures

may record physical traces of behavior

A physiological way to operationalize people's level of stress might be to

measure the degree of cortisol that is released in their saliva, because people under stress show higher levels of cortisol

Many researchers are most convinced when measures have been associated with a variety of behaviors (using criterion validity), however,

no definitive test will establish validity

When the slope is positive, r is ________________________; when the slope is negative, r is ________________________

positive; negative

The behavioral criteria provided by criterion validity

provide excellent evidence for construct validity

When the relationship is strong

r is close to either 1.0 or -1.0

When the relationship is weak

r is closer to zero

If there is no relationship between two variables

r will be .00 or close to .00 (i.e., .02 or -.04)

Reliability

refers to how consistent the results of a measure are

Physiological measures usually

require the use of equipment to amplify, record, and analyze biological data

To assess the test-retest reliability of some measure

researchers measure the same set of participants on that measure at least twice - at Time 1 and Time 2. After this, they are able to compute r
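As a sketch of that computation (with hypothetical Time 1 and Time 2 scores, not data from the text), r is the Pearson correlation between the two administrations:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical IQ scores for 6 participants tested twice, two months apart
time1 = [100, 112, 95, 130, 104, 120]
time2 = [98, 115, 97, 127, 100, 122]
r = pearson_r(time1, time2)  # about .97
```

Since r here is positive and well above the .50 benchmark for test-retest, these hypothetical scores would show good test-retest reliability.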

Before using a particular measure in a study

researchers should not only check to be sure that the measures are reliable but also be sure that they measure the conceptual variables they were intended to measure. This is validity.

To study conceptual variables (other than happiness)

researchers start by stating a definition of their construct (the conceptual variable) and then create an operational definition

The relationship is strong when ________________________; it is weak when ______________

the dots are close to the line; the dots are spread out from it

Body temperature in degrees Celsius is an example of an interval scale because

the interval levels are equal; however, a temperature of 0 degrees does not mean that a person has "no temperature".

Operationalization

the process of turning a concept of interest into a measured or manipulated variable

If a set of items correlate strongly

the researcher can reasonably take an average to create a single overall score for each person

If the internal reliability is low

the researchers are not justified in combining all the items into one scale. They have to go back and revise the items -- or average together only those items that correlate strongly with one another

A scatterplot is a helpful tool for assessing the agreement between two administrations of

the same measurement (test-retest assessment) or between two coders (interrater reliability)

Kappa

the statistic observers use when rating a sample of a categorical variable

An r of -1.0 represents

the strongest possible negative relationship

An r of 1.0 represents

the strongest possible positive relationship

Researchers worry about discriminant validity when

they want to be sure that their measure is not accidentally capturing a similar but different construct

Kappa measures the extent

to which two raters place participants into the same categories.
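A minimal sketch of Cohen's kappa (with hypothetical categorical codes, not data from the text): it compares the observed agreement rate with the agreement expected by chance from each rater's category frequencies.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters coding the same cases into categories.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of each category's marginal proportions
    expected = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical: two observers code 10 infants' looks as left ("L") or right ("R")
rater1 = ["L", "L", "R", "R", "L", "R", "L", "L", "R", "R"]
rater2 = ["L", "L", "R", "R", "L", "R", "L", "R", "R", "R"]
kappa = cohens_kappa(rater1, rater2)  # 0.8
```

The two hypothetical raters agree on 9 of 10 cases, and the resulting kappa of .80 is close to 1.0, indicating strong interrater agreement on this categorical variable.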

If r is positive but weak

we could not trust the observers' ratings

To test interrater reliability of some measure

we might ask two observers to rate the same participants at the same time, and then we would compute r

If r is positive and strong (according to many researchers, r = .70 or higher)

we would have very good interrater reliability

If r is positive and strong (for test-retest, we might expect .50 or above),

we would have very good test-retest reliability

If r is positive but weak

we would know that participants' scores on the test changed from Time 1 to Time 2. This is a sign of poor measurement reliability.

Criterion validity examines

whether a measure correlates with key outcomes and behaviors.

By using a scatterplot

you can see if the two ratings agree (if the individual dots are close to a straight line drawn through them) or whether they disagree (if the individual dots scatter widely from a straight line drawn through them)

