Chapter 5: Evaluating Research
A team of researchers is studying how economic stability impacts mental health in men and in women. Place the sampling methods in order from least to most capable of producing results that are generalizable to the U.S. adult population.
(LEAST) 1. Emailing the survey to the researchers' friends and colleagues. 2. Going to the unemployment benefits office and asking people there to fill out a survey. 3. Selecting every other household in an urban neighborhood and asking people who answered the door to fill out a survey. 4. Randomly selecting names from voter registration lists. (MOST)
Place the measures of physical health in order from least precise to most precise.
(least precise) 1. A question about whether the respondent has ever been diagnosed with a serious illness. 2. A question asking the respondent to rate their overall health on a 5-point scale. 3. A composite variable created from questions asking whether the respondent has or has had a number of conditions (e.g., asthma, diabetes, or cancer). 4. A composite variable created from in-person measures of the respondent's physical condition (e.g., blood pressure) and from a review of their medical records. (most precise)
Researchers are interested in the effect that receiving free school lunches has on children's health. Place their study designs in order from least likely to most likely to establish internal validity.
1. 2. 3.
The Milgram experiment showed that 1____________ of participants were willing to shock confederates, and decades later, the mock reality show The Xtreme Zone showed that 2____________ of contestants were willing to administer shocks to a confederate. This suggests that these study results may be 3____________. In both cases, however, the study subjects shared particular characteristics and knew that the situation was carefully monitored, which introduces problems with the study's 4____________.
1. 65% 2. 81% 3. reliable 4. external validity
Match each dimension of internal validity of measures to the correct question. 1. Is the measure comprehensive? 2. Does each of the individual items relate to the concept being measured? 3. Does the measure correlate with an established measure of the same concept? 4. Does the measure seem to make sense? 5. Does the measure correlate with an outcome that it should predict?
1. content 2. construct 3. concurrent 4. face 5. predictive
Two dimensions of internal validity that are related to construct validity are 1____________ validity, which assesses whether the concepts of a larger measure that should be associated with one another are indeed associated, and 2____________ validity, which assesses whether the concepts of a larger measure that should not be related to one another are not, in fact, related. These dimensions of validity are important for concepts that have 3____________ but related behaviors.
1. convergent 2. discriminant 3. separate
The best way to establish reliability in measurement is for researchers to carefully 1____________ the concepts they wish to measure and then 2____________ based on this 3____________.
1. define 2. operationalize 3. conceptualization
According to the textbook, a measure that is reliable is one that is 1____________ , whereas a measure that is valid is one that is 2____________.
1. dependable 2. accurate
Researchers must be concerned with 1____________ reliability as well as reliability at the level of measurement. This can be tested by running a study again to determine whether the findings can be 2____________.
1. macro-level 2. replicated
The federal poverty line is a measure of poverty that is 1____________ but not 2____________.
1. reliable 2. valid
Match each test of robustness to the correct example. 1. A researcher studying how academic ability impacts life satisfaction divides an ability test in half and gives the same sample each half of the test, then compares their average scores on each half. 2. A researcher receives funding to interview college students about their plans after graduation. She first conducts interviews with a set of 10 students to determine whether her questions make sense to students and whether she needs to add or cut some questions. 3. A researcher is interested in studying political attitudes among retirees. He devises a set of questions tapping into agreement with different political positions and philosophies, and then he gives these questions to a sample of 25 retirees twice, about a month apart.
1. split-half method 2. pilot testing 3. test-retest method
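The split-half and test-retest checks above both come down to comparing two sets of scores from the same respondents. The following is a minimal sketch, assuming made-up scores and using a Pearson correlation as the comparison. The cards only say that the scores are compared, so the correlation, the function name `pearson_r`, and all data here are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: each position in a list is one respondent.
# Split-half: scores on the two halves of the same test.
first_half = [12, 15, 9, 20, 17, 11]
second_half = [13, 14, 10, 19, 18, 10]

# Test-retest: the same questions asked about a month apart.
time_1 = [3.0, 4.5, 2.0, 5.0, 3.5]
time_2 = [3.5, 4.0, 2.5, 5.0, 3.0]

split_half_r = pearson_r(first_half, second_half)
test_retest_r = pearson_r(time_1, time_2)
print(f"split-half r = {split_half_r:.2f}")
print(f"test-retest r = {test_retest_r:.2f}")
```

A correlation near 1 would suggest the measure behaves consistently across halves or across time; how high is "high enough" is a judgment the cards do not specify.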
Match each measure of financial resources to the correct set of qualities. 1. Researchers ask respondents about the value of all assets the members of their household own (e.g., homes, stocks and bonds, cars) and their total debts. 2. Researchers ask respondents how often they worry about money. 3. Researchers collect detailed information about the amount of money respondents spent in the past month. 4. Researchers ask respondents to report the amount of income they earned over the previous year.
1. strong in both reliability and validity 2. strong reliability but weak validity 3. weak in both reliability and validity 4. weak reliability but strong validity
Measures likely to have concepts that should not be closely associated with one another can be assessed for discriminant validity. Identify each measure as either assessable or not assessable for discriminant validity.
Assessable for Discriminant Validity: -a measure of romantic relationship quality that includes three measures of overall satisfaction and happiness, five measures of behavior within the relationship, and two measures of conflict -a measure of health that includes questions about health conditions and health behaviors Not Assessable for Discriminant Validity: -a measure of socioeconomic status that includes years of schooling, income, and occupational prestige -a measure of marital status that includes five categories
Which term refers to a measure of intercoder reliability based on agreement between coders?
Cohen's kappa
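Cohen's kappa adjusts raw agreement between two coders for the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The sketch below is a minimal illustration for two coders, assuming hypothetical "yes"/"no" codes; the function name `cohens_kappa` and the data are not from the textbook.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: for each category, the product of the two coders'
    # marginal proportions, summed over categories.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same 10 items.
coder_a = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"]
coder_b = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the coders agree on 8 of 10 items (observed = 0.8) and chance agreement is 0.5, so kappa works out to 0.6.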
Select the bold phrase that is true about intercoder reliability. Three coders examine the same set of data. Sociologists always measure intercoder reliability, whether there is one coder, three coders, or many. These three coders use Cohen's kappa as a rigorous measure of intercoder reliability. They are pleased with .63 as their calculation of Cohen's kappa, because this indicates that their measure was accurate.
Cohen's kappa as a rigorous measure of intercoder reliability.
Researchers are developing a measure of job satisfaction. They first draft a few questions about how much the respondents enjoy going to work and getting along with their colleagues based on their own experiences of work and job satisfaction. Then the researchers analyze whether the measure predicts the length of time people stay in their jobs and how strongly associated it is with a question about overall job satisfaction. Identify whether or not the following dimensions of internal validity were considered by the researchers.
Considered by the Researchers: -concurrent -face -predictive Not Considered by the Researchers: -content -construct
Identify the examples of assessing robustness.
Examples: -following a set of respondents for two months and asking them to take a survey about their work habits at the beginning and end of that period, then comparing the results -taking a random sample of 100 respondents, and testing a composite variable by giving 50 people half of the items in the larger variable and the other 50 people the other half, then comparing the average scores -asking a small sample to answer a survey before distributing it to a much larger sample of respondents Not Examples: -following a sample of respondents over the course of a year, having them report their level of stress every day at the same time, and then comparing those responses to the times of the year and weather conditions when they were made -taking a small random sample of people and asking them for feedback on several composite variable items
The two types of study validity are ____________ validity, which concerns the degree to which a study is ____________ beyond the study itself, and ____________ validity, which is concerned with whether a study can establish a ____________ effect of an independent variable on a dependent variable. -causal -internal -generalizable -external
External, Generalizable, Internal, Causal
True or False: In response to widespread criticism of its poverty measures, the National Academy of Sciences recommends counting the number of people who self-identify as poor.
False
True or False: The Milgram experiment had problems with external validity, but The Xtreme Zone did not.
False
Identify each experiment as either involving or not involving manipulation.
Involving Manipulation: -Researchers randomly assign some participants to listen to a lecture on history and the others to watch a documentary on the same topic. Then each group of participants takes a test to see how well they retained the information. -Researchers randomly assign half of the participants to read a news article about how stress can interfere with test outcomes before asking all participants to take a math test, to see whether reading the article affects performance. Not Involving Manipulation: -Participants are asked to fill out surveys. Some participants receive extra questions about their interest in the topic of the survey, and other participants do not. -An ethnographer observes participants in a range of settings in order to see how their behavior changes based on who they are with and where they are.
Imagine a group of researchers set out to improve the CES-D scale of depressive symptoms by revising the wording of some items to be more applicable across racial or ethnic identification and immigrant background. Identify the possible effects of this project.
Possible Effect(s): -It would be more useful for comparing depressive symptoms across groups. -It would improve the scale's validity. Not Possible Effect(s): -The new measure would produce the same data as the old measure. -It would improve the scale's reliability.
Professor Smiley is constructing a measure of college prestige which would indicate the relative ranking of colleges, from the least to most prestigious, on a scale from 0 to 100. The measure includes 10 individual items which will be averaged together to create this scale. He uses several criteria to assess his measure's validity. Match each type of internal validity of measures to the correct example.
Predictive: He tests whether his scale correlates with graduates' income and likelihood to attend a graduate or professional school. Construct: He uses statistical tests to determine whether each individual item relates well to the overall construct. Content: Based on prior research and discussions with colleagues, he makes a list of all the dimensions of prestige and compares them to his items. Face: He reads through the 10 items to see if they make sense to him. Concurrent: He downloads the rankings of colleges and universities from U.S. News & World Report and compares its rankings to his ratings.
Identify whether or not the following studies have problems with external validity.
Problems with External Validity: -A researcher is interested in women's attitudes toward, and experiences with, the gendered division of household labor in their homes, so she interviews 50 white women for her study. -Researchers are interested in whether stress affects one's sense of smell. They set up a lab experiment in which subjects are put through a difficult test and then asked to identify foods by scent. -A survey on nutrition and eating habits is handed out each day in front of a fast-food restaurant. No Problems with External Validity: -Researchers test whether employers racially discriminate against job applicants by submitting applications for a range of jobs and varying the names of applicants to signal different racial backgrounds. -A study of college students' dating and sexual lives was conducted in 100 colleges across the country, with researchers asking a range of faculty members to distribute the surveys in their classes.
According to Dr. Krista Perreira, identify the reasons why researchers have only recently begun to reconsider the CES-D measure.
Reasons: -Early statistical tests were insufficient to detect differences across groups. -Funding for health and social sciences measures was limited. -Immigrants comprised a smaller proportion of the population in the past. -The measure was popular and widely used. Not Reasons: -Not as many researchers were interested in the topic of mental health.
Identify each criterion as either recommended or not recommended by Lincoln and Guba for evaluation of qualitative research.
Recommended: -dependability -credibility Not Recommended: -validity -reliability
Identify each scenario as either strengthening or not strengthening the reliability of the CLASS and HOME measures.
Strengthening Reliability: -Researchers train observers with detailed directions regarding what to look for and how to interpret observations. -Researchers train observers using videos before having them collect data. Not Strengthening Reliability: -Researchers use people who are comfortable in the environment, such as teachers or parents, to conduct observations. -Researchers alter the criteria slightly to accommodate different state policies and contextual factors.
Identify each of the following as either a systematic error or a random error.
Systematic Error: -A question measuring one's experience in marriage assumes respondents are all heterosexual. -A survey question uses advanced vocabulary that many respondents do not understand. Random Error: -A survey question asking about employment status is misread as educational status by some respondents. -A study conducted by a call center surveys respondents anytime between 9 a.m. and 9 p.m. Researchers find that those interviewed late in the day answered "don't know" to more questions than those interviewed in the morning.
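One common way to picture the distinction: random error scatters measurements around the true value and tends to cancel out over many observations, while systematic error pushes them in a consistent direction. The simulation below is a rough sketch under that assumption. The true score, noise level, and bias size are invented for illustration, and `simulate_measurements` is a hypothetical helper.

```python
import random
from statistics import mean


def simulate_measurements(true_value, n, noise_sd, bias=0.0, seed=0):
    """Return n simulated measurements of true_value.

    noise_sd controls random error; bias adds a constant systematic error.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_SCORE = 50.0  # hypothetical true value of the concept

# Random error only: individual measurements miss, but the average is close.
random_only = simulate_measurements(TRUE_SCORE, n=10_000, noise_sd=5.0)

# Random error plus a systematic error of +3 on every measurement.
with_bias = simulate_measurements(TRUE_SCORE, n=10_000, noise_sd=5.0, bias=3.0)

print(f"true score:            {TRUE_SCORE:.1f}")
print(f"mean, random error:    {mean(random_only):.1f}")
print(f"mean, systematic bias: {mean(with_bias):.1f}")
```

Collecting more data shrinks the effect of random error on the average, but it does nothing to remove the constant shift introduced by the bias.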
Identify the true and false statements about vignettes.
TRUE: - - FALSE: - -
Identify the true and false statements about the concept of precision.
TRUE: -A measure should be detailed and specific. FALSE: -Precision measures how much information a respondent gives on a survey. -Precision is a dimension of measurement that supports validity. -Researchers should be well organized and prepared in order to avoid human error.
Identify the true and false statements about the importance of reliability and validity.
TRUE: -Reliability and validity help make research generalizable and replicable. -Fulfilling both criteria ensures that researchers are operationalizing variables to match their conceptualizations. FALSE: -Reliability is a criterion of operationalization, and validity is a criterion of conceptualization. -When designing their studies, researchers can only fulfill either the reliability criterion or the validity criterion.
Select the bold phrases that represent composite variables. Researchers Cho and England wish to conduct a study on gender, work ethic, and happiness. They measure gender using an open-ended question to account for a variety of gender identities. They measure work ethic using the average of five questions regarding respondents' attitudes and behaviors toward work, and they measure happiness with a question asking respondents to rate their level of happiness. They later add two additional variables: an open-ended question asking the respondents' annual household income and an average of all their adult household members' educational attainment.
average of five questions regarding respondents' attitudes and behaviors toward work, average of all their adult household members' educational attainment.
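A composite variable of this kind combines several items into one score. As a minimal sketch, assuming five hypothetical work-ethic items each answered on a 1-to-5 scale (the respondent IDs, item values, and the name `composite_score` are all illustrative):

```python
from statistics import mean


def composite_score(item_responses):
    """Average a respondent's item responses into a single composite score."""
    if not item_responses:
        raise ValueError("need at least one item response")
    return mean(item_responses)


# Hypothetical responses: five work-ethic items per respondent, scored 1-5.
responses = {
    "respondent_1": [4, 5, 3, 4, 4],
    "respondent_2": [2, 3, 2, 1, 2],
    "respondent_3": [5, 5, 4, 5, 5],
}

work_ethic = {rid: composite_score(items) for rid, items in responses.items()}
for rid, score in work_ethic.items():
    print(f"{rid}: work ethic composite = {score}")
```

By contrast, a single self-rated happiness question or a single open-ended income question is one item, not a composite.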
Select the bold phrase that refers to internal validity of a measure. Researchers Liu and Jones are studying the effect of teacher quality on elementary school students' achievement. The researchers first had to create a measure that accurately represented teacher quality. They spent a long time building and testing this measure, including discussing different facets of teacher quality and whether their proposed measures correlated with previous measures. Then Liu and Jones decided to use test scores to measure achievement. Finally, they analyzed the data to determine whether their measure of teacher quality successfully predicted achievement.
create a measure that accurately represented teacher quality.
Which term refers to how closely a measure correlates with some other factor?
criterion-related validity
Select the bold phrases that represent dependability, a criterion slightly different from reliability that is used for judging research. In a national multi-method study, an interviewer found positive long-term effects of early childhood exposure to books among rural adults in Maine. The interviewer found similar results across numerous rural adults in Maine. Another researcher conducted a survey about the same topic and reached similar conclusions. A third researcher used a national sample, across fourteen states, and also drew similar conclusions.
found similar results across numerous rural adults in Maine.
One way to improve reliability is to assess ____________, which is a protocol testing how well a measure works.
robustness