Psychology 2
Levels of measurement ordinal
"A level of measurement in which different numbers indicate rank order of cases on a variable" (Dixon et al., 2018, p. 120). Meaningful values, but the intervals between them are unequal Examples: Anything on a Likert type scale (e.g., Strongly Disagree to Strongly Agree, Never to Often, etc.), class rank You can count, take percentiles
levels of measurement ratio
"The highest level of measurement, which has the features of all the other levels plus an absolute (nonarbitrary) zero point" (Dixon et al., 2018, p. 122) i.e. equal intervals plus a meaningful zero You can talk about fractions at this level E.g. X was half as large as Y Examples: age, height, accuracy, reaction time, number of siblings
Levels of measurement nominal
"numbers serve only to label categories of a variable" (Dixon et al., 2018, p. 119). Examples: gender identification, race/ethnicity, religion, college major, political party affiliation Ex: What is your gender identity? 1. Man 2. Woman 3. Non-binary 4. Other (Please specify): _____ Some special considerations for nominal survey questions: Categories should be exhaustive (all possible values should be represented) Categories should be mutually exclusive
General Features of Surveys
(1) Surveys often use large-scale probability sampling
• Big samples
• Random selection of respondents
• Allows researchers to draw statistical inferences about a population!
(2) Surveys use structured questionnaire or interview procedures
• Structured interview - "A type of interview with highly specific objectives in which all questions are written beforehand and asked in the same order for all respondents, and the interviewer's remarks are standardized"
• Structured interviews are more rigid & standardized than unstructured interviews (broad objectives/questions developed as interview proceeds) or semi-structured interviews (specific objectives, but some freedom for interviewer to meet them)
• Surveys generally use closed-ended questions - "Survey questions that require respondents to choose responses from those provided" E.g., What political party do you identify with? 1. Democrat 2. Republican 3. Independent 4. Libertarian 5. Green 6. Other
• Open-ended questions are less common & yield qualitative data - "A survey question that requires respondents to answer in their own words" (Dixon et al., 2018, p. 216) E.g., Why do you use Instagram? Open-ended questions can provide rich data but are time-consuming, less likely to be completed by respondents, and costly.
(3) Surveys usually employ quantitative analysis
• Closed-ended questions are common in surveys because they are easier to examine using quantitative analysis
• Descriptive survey - "A survey undertaken to provide estimates of the characteristics of a population"
• Explanatory survey - "A survey that investigates relationships between two or more variables, often attempting to explain them in cause-and-effect terms"
Assessing Validity
1. Convergent Validation - "Measurement validation is based on the extent to which independent measures of the same concept are associated with one another" • Compare alternate measures of the same concept (e.g., self esteem) • Compare alternate operational methods for measuring the same concept (e.g., self report vs. observation) 2. Construct Validation - "Measurement validation based on an accumulation of research evidence indicating that a measure is related to other variables as theoretically expected" • Examine theory underlying the concept • Form and test hypotheses about how other variables should be related to the measure • Accumulate evidence for the construct validity of the measure over multiple studies
Measurement Process Steps: Operationalization
The process then moves to operationalization: 1. Specify Empirical Indicators 2. Spell Out Procedures. Operationalization is the process of identifying empirical indicators and the procedures for applying them to a concept.
Data Collection Modes overall
Data collection modes can be compared along five dimensions: cost, time, response rate, population coverage, and quality of measure.
Data Collection Modes Cost
Computer-assisted self-administered surveys are the least costly; face-to-face interviews are the most costly.
internal consistency A form of reliability assessment; the consistency of "scores" across all the items of a composite measure (i.e., index or scale). The second method of reliability assessment, internal consistency, avoids the practical problem of repeating applications of the same operational definition; however, this method only applies to composite measures based on multiple items, such as the self-esteem scale. Rather than obtain a stability estimate based on consistency over time, as in test-retest reliability, this method estimates the agreement or equivalence among the constituent items of a multi-item measure.
Cronbach's alpha A statistical index of internal consistency reliability that ranges from 0 (unreliable) to 1 (perfectly reliable).
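A minimal sketch of how this index can be computed from raw item scores, using the standard formula α = (k / (k − 1)) × (1 − Σ item variances / total-score variance); the item scores below are hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from raw item scores.

    items: a list of equal-length lists, one per questionnaire item,
    each holding one score per respondent.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]        # each respondent's total score
    item_variance = sum(pvariance(item) for item in items)  # sum of per-item variances
    total_variance = pvariance(totals)                      # variance of total scores
    return (k / (k - 1)) * (1 - item_variance / total_variance)

# Hypothetical scores: three self-esteem items answered by five respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # closer to 1 = more internally consistent
```

When respondents who score high on one item tend to score high on the others, the total-score variance dominates the summed item variances and alpha approaches 1.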
Cross-Sectional vs. Longitudinal Designs
Cross-sectional design - "The most common survey design, in which data are gathered from a sample of respondents at essentially one point in time" • Hard to determine causal relationships • Can't study change over time Longitudinal design- "Survey design in which data are collected at more than one point in time" • Provides stronger causal inference and can help assess change over time
What's the problem with this question?
Do you agree that Twitter is destroying society? 1. Strongly Disagree 2. Disagree 3. Agree 4. Strongly Agree Leading question & emotionally loaded language
levels of measurement interval
Equal intervals between numbers, but no meaningful zero Ex: Temperature in Celsius or Fahrenheit, Years B.C.E. 0 degrees arbitrary, year 0 also arbitrary At this level, you can take the mean (average) Uncommon in psychology (IQ, maybe?)
The Measurement Process- the process of assigning numbers or labels to units of analysis
Example: Rating a movie 3 stars What is the number or label here & what does it mean? 3 stars = Indicator of the movie's quality What is the unit of analysis? Movie Example: Labeling a student as a Psychology major What is the number or label here & what does it mean? • Psychology major = provides info about a student's major area of study What is the unit of analysis? • Student
Constructing a Survey What's the problem with this question?
How often do you post to Instagram? 1. Never 2. Rarely 3. Sometimes 4. Always The response options are vague and could be interpreted in different ways by different respondents How often do you post to Instagram? 1. Once a year or less 2. A few times a year 3. Once a month 4. Two or Three Times a Month 5. Once a Week 6. More than Once a Week 7. Every Day These options are more concrete
confidence intervals:
If we were to repeat our study with many samples, we could be confident that a certain percentage (e.g., 95%) of the calculated confidence intervals in those samples would contain the population value
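This interpretation can be checked with a small simulation; the sketch below assumes a hypothetical normal population (mean 50, SD 10) and counts how often a 95% interval built from each sample contains the true mean:

```python
import random
import statistics

random.seed(7)

POP_MEAN, POP_SD, N, Z95 = 50.0, 10.0, 100, 1.96

def ci_covers_population_mean():
    """Draw one sample, build a 95% CI around its mean, and check coverage."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5  # estimated standard error
    return m - Z95 * se <= POP_MEAN <= m + Z95 * se

trials = 2000
coverage = sum(ci_covers_population_mean() for _ in range(trials)) / trials
print(round(coverage, 3))  # should land near 0.95
```

Note that each individual interval either does or does not contain the population value; the 95% describes the long-run proportion across repeated samples, not any single interval.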
Reliability & Validity bullseye low reliability low validity
In measurement, we are trying to hit the bullseye (our theoretical concept) Low reliability is reflected in the random, loose pattern of shots Low validity is reflected in the distance of shots to the bullseye Both are critical to quality measurement!
Probability Sampling sampling
Probability sampling: "Sampling based on a process of random selection that gives each case in the population an equal or known chance of being included in the sample" (Dixon et al., 2018, p. 144). You can't always take a measurement from every case in a population. So if you want to know about a population, you instead have to select a sample from that population and make inferences based on the measurements from this sample.
Reliability & Validity high reliability high validity
Reliability is a necessary but not sufficient criterion for measurement validity!
summary 126
Researchers try to select operational definitions that best capture the meaning of the concept being measured. How you choose to operationalize a concept also depends on the data source or general methodological approach and the desired level of measurement.
Measurement Process Steps: Data Collection & Validation
The process then ends with data collection and validation: 1. Apply Operational Definitions to Produce Data 2. Analyze Data/Assess Quality of Operational Definitions. This involves both measuring the concept and "assessing how well your measure represents the underlying concept"
Once a site is selected, nonprobability sampling can be used to select cases within the site to observe.
There are four common types:
1. Convenience sampling - "The selection of cases that are conveniently available" This method is haphazard & often involves just taking the first cases that come along Examples: • Surveying the first 15 people you run into at the grocery store • Asking undergrads to sign up for your experiments for course credit You can't use convenience samples to generalize to a population!
2. Purposive sampling - "Sampling that involves the careful and informed selection of typical cases or of cases that represent relevant dimensions of the population" Involves selecting cases that can best help you answer a research question • "Typical" cases that represent the average • Expert cases who are experienced with the phenomenon • "Deviant" or extreme cases
3. Snowball sampling - "A sampling procedure that uses a process of chain referral, whereby each contact is asked to identify additional members of the target population, who are asked to name others, and so on" • Assumes members of your target population know and can refer each other • Is particularly useful in cases where your target population might be hidden or hard to find (e.g., injection drug users, undocumented immigrants, etc.)
4. Theoretical sampling - "A sampling process used in qualitative research in which observations are selected in order to develop aspects of an emerging theory" • A more iterative process where the need to sample new cases emerges from the data
reliability The stability or consistency of an operational definition.
measurement validity The congruence or "goodness of fit" between an operational definition and the concept it is purported to measure.
unidimensionality Evidence that a scale or index is measuring only a single dimension of a concept.
scale A composite measure of a concept constructed by combining separate indicators according to procedures designed to ensure unidimensionality or other desirable qualities.
archival records A source of operational definitions that consists of existing documents and institutional records.
summary 1118 There are two main sources of operational definitions: manipulated operations and measured operations, the latter of which include self/verbal reports, observation, and archival records. Each of these data sources typifies a general research approach. Experiments always involve manipulated operations; surveys and in-depth interviews involve self-reports, or replies to direct questions; field research involves direct observations of behavior, and the analysis of existing data may include a range of archival records.
Probability sampling should begin with a careful definition of the population to which inferences are to be made. The second step is to construct a sampling frame, usually by locating one or more available lists. If a complete list is available, it is possible to draw a simple random sample or, to increase sampling efficiency, draw a stratified random sample. When a complete list is unavailable, multistage cluster sampling may be used. Determining sample size usually involves striking a balance between desired precision and available resources such as time and monetary cost. Besides the measurable error produced by random selection, probability sampling is subject to coverage error and nonresponse error.
summary 169
snowball sampling A sampling procedure that uses a process of chain referral, whereby each contact is asked to identify additional members of the target population, who are asked to name others, and so on.
theoretical sampling A sampling process used in qualitative research in which observations are selected in order to develop aspects of an emerging theory.
saturation In purposive and theoretical sampling, the point at which new data cease to yield new information or theoretical insights.
summary 176 Nonprobability sampling involves non-random selection. It is useful for studying populations to which researchers have limited access or cannot construct sampling frames, and is appropriate for studying a small number of cases or deciding what to observe and whom to interview within a research setting. Nonprobability sampling typically occurs in two stages: selecting cases or research sites and selecting observations within selected sites. Four common methods of selecting units of observation within sites are convenience sampling, purposive sampling, snowball sampling, and theoretical sampling. Although nonprobability samples should not be used to make precise statistical inferences, they can be used effectively for developing and generalizing theories.
Principles of Probability Sampling
1. Probability - "The likelihood that something will occur, which may vary from 0 to 100%" Probability sampling allows researchers to estimate characteristics of a population from a sample using probability sampling theory. Probability sampling relies on concepts of both probability and random selection. In probability sampling, because the probability of selection is known, we can also construct the probability distribution of a particular variable. We usually don't know the population values, so we need a way to determine how much sample estimates are likely to vary from the population value (i.e., the sampling error)!
2. Random Selection - "A selection process that gives each element in a population a known and independent chance of being selected" • Known = We know the probability of selection • Independent = Choosing one case doesn't affect the chances of choosing another
3. Probability Distribution - "A distribution of the probabilities for a variable, which indicates the likelihood that each category or value of the variable will occur"
4. Sampling Distribution - "A theoretical distribution of sample results for all possible samples of a given size" Sampling distributions have 4 features that allow for statistical inference: a. Sampling Distribution Mean = Population Value b. Standard error - "A statistical measure of the average sampling error for a particular sampling distribution, which indicates how much sample results will vary from sample to sample" c. The sampling distribution is normally distributed d. Predictable percentages of sample estimates fall within measurable distances from the population value, which allows researchers to calculate confidence intervals
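Features (a) and (b) of the sampling distribution can be illustrated with a short simulation; the population below is hypothetical, and the sampling distribution is approximated by drawing 3,000 repeated samples:

```python
import random
import statistics

random.seed(0)

# A hypothetical population of 10,000 test scores.
population = [random.gauss(100, 15) for _ in range(10_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

# Approximate the sampling distribution: many sample means at n = 50.
n = 50
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(3_000)]

# Feature (a): the mean of the sampling distribution equals the population value.
print(round(statistics.mean(sample_means), 1), "vs", round(pop_mean, 1))

# Feature (b): its spread (the standard error) is about pop_sd / sqrt(n).
print(round(statistics.stdev(sample_means), 2), "vs", round(pop_sd / n ** 0.5, 2))
```

The true sampling distribution covers all possible samples; the simulation only approximates it, so the two printed pairs agree closely rather than exactly.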
Assessing Reliability
1. Test-retest reliability - "The association between repeated applications of an operational definition" -Based on consistency over time • Give the same measure to the same set of people on two separate occasions • Calculate a statistical correlation between the two sets of measures • If correlation is above 0.8, suggests that measure is relatively stable between two time points 2. Internal consistency - "The consistency of scores across all the items of a composite measure • Based on consistency across multiple items of an index or scale • Cronbach's alpha - "A statistical index of internal consistency reliability that ranges from 0 (unreliable) to 1 (perfectly reliable) 3. Inter-rater reliability - The extent to which different observers or coders get equivalent results when applying the same measure Based on consistency across observers • Used for observational and archival measures • Can look at the percentage of agreement between raters
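A minimal sketch of two of these checks, test-retest correlation and inter-rater percent agreement; all scores and behavior codes below are hypothetical:

```python
def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def percent_agreement(rater1, rater2):
    """Share of cases that two observers coded identically."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

# Test-retest: hypothetical self-esteem scores for six people, measured twice.
time1 = [30, 25, 38, 22, 34, 28]
time2 = [31, 24, 37, 23, 35, 27]
r = pearson_r(time1, time2)
print(round(r, 2))  # above 0.8 suggests a stable measure

# Inter-rater: hypothetical behavior codes from two observers for five cases.
codes_a = ["hit", "share", "hit", "talk", "share"]
codes_b = ["hit", "share", "talk", "talk", "share"]
print(percent_agreement(codes_a, codes_b))  # 4 of 5 codes match
```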
Levels of Measurement Ordinal measurement
A level of measurement in which different numbers indicate rank order of cases on a variable How often do you post to Instagram? 1. Never 2. Rarely 3. Sometimes 4. Often The responses can be rank ordered (e.g., Never is less than Rarely) But...there are not equal intervals between response options Can't perform most mathematical operations
Levels of Measurement Interval Measurement
A level of measurement that has the qualities of ordinal level plus equal distances (intervals) between assigned numbers Example: Fahrenheit temperature scale • The difference between 20 and 30 degrees is the same as the differences between 90 and 100 degrees • Arbitrary zero point • Can add/subtract but not multiply/divide • Rare in psychology measurement
Operationalization
Aggression - overt and relational. Overt: Who pushes and shoves others around? Relational: Who tells a kid's secrets to other kids in the class? These items are indicators. Operational definitions are the recipes of research. We operationally defined aggression using an existing scale: • Child Social Behavior Scale - Peer Report (Crick & Grotpeter, 1995) • 13 items (6 overt & 7 relational aggression) - E.g., Who pushes and shoves others around? • Classroom peers asked to nominate an unlimited number of classmates from a roster that fit each item - For each item, the proportion of classmates who nominated the child was calculated - An overall aggression score ranging between 0-1 was calculated by averaging all items
summary 124
An important variation in operational definitions is their level of measurement. Operational definitions may produce four different levels: nominal, ordinal, interval, and ratio. The four levels themselves form an ordinal scale with regard to the amount of information they provide. Each level has the features of the level(s) below it plus something else. Table 5.1 illustrates this. In most social science research, however, the distinction between interval and ratio levels of measurement is not very important compared with the differences between the interval and nominal or ordinal levels. Indeed, Chapter 12 distinguishes statistical analyses appropriate for nominal/ordinal versus interval/ratio measures.
Manipulated operations are designed to change the value or category of a variable, whereas measurement operations estimate existing values or categories.
verbal reports An operational definition based on respondents' answers to questions in an interview or questionnaire. Also called self-report.
summary 154
Based on random selection and statistical inference, all probability sampling follows a unified framework for making inferences from a sample to a population: researchers calculate statistical estimates from a random sample; then, they use knowledge of the sampling distribution of the statistic to establish a confidence interval, which indicates the level of confidence that a population value falls within a specified range.
The Survey Process choose sampling frame design and select sample
Because surveys typically rely on probability sampling, it is necessary to build a sampling frame • Frame depends on mode of data collection (e.g., telephone interviewing requires lists of telephone numbers) • For telephone interviews, random digit dialing is often used - "A sampling technique in which dialable telephone numbers are generated (sampled) randomly" • Telephone prefixes from a particular geographic area • Last four digits are random
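A minimal sketch of the random digit dialing idea described above: keep a sampled geographic prefix fixed and randomize the last four digits (the prefixes below are made up for illustration):

```python
import random

def rdd_sample(prefixes, k, seed=1):
    """Draw k dialable numbers: a prefix from the target area plus four random digits."""
    rng = random.Random(seed)
    numbers = []
    for _ in range(k):
        prefix = rng.choice(prefixes)               # area-code + exchange for the region
        last_four = f"{rng.randrange(10_000):04d}"  # 0000-9999, each equally likely
        numbers.append(f"{prefix}-{last_four}")
    return numbers

# Hypothetical prefixes for a target geographic area.
sample = rdd_sample(["517-353", "517-355"], k=5)
print(sample)
```

Because the last four digits are generated rather than taken from a directory, unlisted numbers have the same chance of selection as listed ones, which is the main advantage of this technique.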
Data Collection Modes time
Computer-assisted self-administered surveys take the least time Face-to-face interviews take the most time
The Survey Process Construct and Pretest Questionnaire
Constructing a Survey Step 1: Outline topics/concepts that you want to cover in your survey Step 2: Select or develop survey questions • When possible, it can be helpful to select pre-existing well-established scales or questions • Questions should: • Be understood by a respondent in a consistent way (i.e., reliability) • Be understood by all respondents in the same way • Mean the same thing to respondents that they do to the researcher (i.e., validity) Step 3: Determine ordering of survey questions & transitions between topics Start with questions that are easy to answer • Place routine demographic questions at the survey end • Place sensitive questions toward the middle once you've had a chance to build rapport • Questions on the same topic should be grouped
The Survey Process: Data Collection Modes
Face-to-face interviews - "A type of interview in which the interviewer interacts face-to-face with the respondent"
Advantages: • Improves response rates - "Proportion of people from whom completed interviews or questionnaires are obtained" (Dixon et al., 2018, p. 222) • Maintains participant motivation during long interviews
Disadvantages: • Costly in terms of both time and money
Computer-assisted personal interviewing (CAPI) - "A software program, usually on a portable computer, that aids interviewers by providing appropriate instructions, question wording, and data entry supervision"
Telephone interviews - "A type of interview in which the interviewers interact with respondents by telephone" Computer-assisted telephone interviewing (CATI) - "A set of computerized tools that aid telephone interviewers and supervisors by automating various data collection tasks"
Advantages: • Less costly in terms of time & money • More quality control over data collection
Disadvantages: • Requires less complex questions with fewer response options • Harder to maintain participant motivation • Lower response rates • Need to be short!
Computer-assisted self-administered surveys
Advantages: • Least costly in time & money • Flexible design (more interactive)
Disadvantages: • Lowest response rates • Internet requirement may leave some populations out
Paper-pencil questionnaire survey - "A survey form filled out by respondents" Can be hand-delivered or mailed
Advantages: • Less costly than face-to-face or telephone • No interviewers • Can be deployed over large geographic area • Anonymity
Disadvantages: • Much lower response rates • Bad for respondents with low levels of literacy • Low participant motivation
Data Collection Modes population coverage
Face-to-face interviews have the best population coverage Computer-assisted self-administered surveys have the worst population coverage
Data Collection Modes response rate
Face-to-face interviews have the best response rates Computer-assisted self-administered surveys have the worst response rates
Data Collection Modes Quality of Measure
Face-to-face interviews tend to yield the best measurement quality Paper pencil questionnaires tend to yield the worst measurement quality
What are the different levels of measurement?
There are 4 levels of measurement: Nominal, Ordinal, Interval, Ratio. We also need to ask how we assess the quality of operational definitions (i.e., reliability & validity).
Levels of Measurement nominal variable example pt 2
How would you describe your political preference? 1. Democrat 2. Republican 3. Liberal 4. Conservative 5. Independent 6. Libertarian 7. Green 8. Other Do you notice any problems with this question? The categories do not exhibit mutual exclusivity: "The measurement requirement that each case can be placed in one and only one category of a variable"
Nonprobability Sampling & Inferences
If you are using a nonprobability sample, it is not appropriate to make statistical inferences to a population! Instead, nonprobability samples are used to build theory Usually sample until you reach saturation - "In purposive and theoretical sampling, the point at which new data cease to yield new information or theoretical insights"
The measurement process
Involves moving from a conceptual definition to an operationalization. Operationalization: "The process of identifying empirical indicators and the procedures for applying them to measure a concept" (Dixon et al., 2018, p. 107). Empirical indicator: "A single concrete proxy for a concept such as a questionnaire item in a survey" (Dixon et al., 2018, p. 109). "An observable characteristic of a concept"
Manipulated vs. Measured Operations
Manipulated operations "are designed to change the value or category of a variable" • Experiments use manipulated operations to study independent variables. Measured operations "estimate existing values or categories" • Verbal reports - "An operational definition based on respondents' answers to questions in an interview or questionnaire" • Observation • Archival records - "A source of operational definitions that consists of existing documents and institutional records"
Operationalization Example pt 2
Manipulated operations: need to define how I will manipulate the stimuli that I present to participants. Tone sequences: Either increasing, decreasing, or stable in pitch (with each tone, increase/decrease pitch by a certain %). Either increasing, decreasing, or stable in tempo (with each tone, increase/decrease the space between the tones by a certain amount).
Measurement Process Steps: Conceptualization
Measurement starts with conceptualization: 1. Review the Literature on the Concept 2. Define/Refine the Meaning of the Concept. Conceptualization is defining or clarifying the meaning of a given concept. Conceptual definition: The meaning of a concept expressed in words that are derived from theory and/or observation. Also called theoretical definition.
Data Collection Modes
Mixed-mode survey - "A survey that uses more than one mode of data collection either sequentially or concurrently, to sample and/or collect the data" • Using paper-pencil mailings or telephone interviews to recruit a sample & computer-assisted self-administered surveys to collect data • Starting with face-to-face interviews and shifting to computer assisted self administered surveys to preserve privacy for some questions • Using telephone interviews to catch respondents who didn't initially respond to a paper-pencil questionnaire
Levels of measurement chart
Level     Meaningful values?  Equal intervals?  Absolute/meaningful 0?
Nominal   No                  No                No
Ordinal   Yes                 No                No
Interval  Yes                 Yes               No
Ratio     Yes                 Yes               Yes
Levels of Measurement
Nominal measurement - "A level of measurement in which numbers serve only to label categories of a variable" Common nominal variables include things like: • Gender • Race/ethnicity • Political party preference • College Major • Parental Status
Nonprobability Sampling
Nonprobability sampling is used in cases where we can't feasibly draw a sampling frame • Hidden populations (e.g., people living with HIV, sexual assault survivors) • Rare populations (e.g., elderly caregivers) It is also used in field research when researchers are seeking information from particular types of participants
Pretesting a Survey
Once you've constructed a survey or interview, you should conduct a field pretest: • "An evaluation of a survey instrument that involves trying it out on a small sample of persons" • You can use field pretests to: • Estimate survey length • Identify questions that are hard to understand • Identify questions that have no variation in response
Pros & Cons of Surveys
PROS Probability sampling makes surveys effective for generalizing to a population Surveys are versatile and can be used to explore many different psychological concepts Surveys are efficient and can be used to study multiple research questions at once CONS Less control over the independent variable; makes it hard to infer causality Typically relies on self-reported behavior which can lead to problems such as inaccuracy and social desirability
The Survey Process recruit sample and collect data code and edit data
Recruiting participants • It's often helpful to have a study description • Incentives can help improve response rates • All participants must receive an informed consent statement • Describes the study activities (and how much time it will take) • Explains what will be done with the data • Explains that participation is voluntary • Describes incentives • Describes risks and benefits of participating in the survey Data Coding and Editing • Coding - "The sorting of data into numbered (used for closed-ended questions) or textual (used for open-ended questions) categories" • Editing - "Checking data and correcting for errors in completed interviews or questionnaires" (Dixon et al., 2018, p. 241) • Multiple responses to the same item • Code outside the range of valid responses (e.g., saying your age is 1002) • Inconsistencies between responses to items (e.g., saying your age is 5 and that you are married)
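The editing checks listed above can be sketched as simple validation rules; the field names, ranges, and rule wording below are hypothetical:

```python
def edit_checks(record):
    """Return a list of editing problems found in one survey record.

    The field names and rules are hypothetical, for illustration only.
    """
    problems = []
    # Code outside the range of valid responses (e.g., an age of 1002).
    if not 0 <= record.get("age", -1) <= 120:
        problems.append("age outside valid range")
    # Inconsistency between responses to items (e.g., a married 5-year-old).
    if record.get("marital_status") == "married" and record.get("age", 0) < 15:
        problems.append("inconsistent: married but implausibly young")
    # Multiple responses to a single-response item.
    if isinstance(record.get("party"), list) and len(record["party"]) > 1:
        problems.append("multiple responses to single-response item")
    return problems

print(edit_checks({"age": 1002, "party": "Democrat"}))
print(edit_checks({"age": 5, "marital_status": "married"}))
print(edit_checks({"age": 30, "party": ["Democrat", "Green"]}))
```

In practice such rules run over every completed questionnaire, and flagged records are reviewed or corrected before analysis.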
Reliability and Validity
Reliability "The stability or consistency of an operational definition" (Dixon et al., 2018, p. 126) Types: test/retest: the same person will score the same at any point in time internal consistency: if you divide the questions in half (in any combination), the scores on each half will agree inter-rater: two different people will consistently give the same rating Validity "The goodness of fit between an operational definition and the concept it is purported to measure" (Dixon et al., 2018, p. 126) Are you measuring what you think you're measuring? Types: Convergent (are independent measures of the same construct associated with each other?) Construct (Is the measure related to other variables in ways that we would theoretically expect?)
What is Sampling?
Sampling is the "process of selecting cases" for your research and there are two main types of sampling designs:
1. Probability Sampling - "Sampling based on a process of random selection that gives each case in the population an equal or known chance of being included in the sample" The purpose of probability sampling is to make inferences from a sample to a population. A. Inference - "A conclusion or generalization based on an observation" Example: PSY 395 students know a lot about psychology research methods! B. Population - "The total membership of a defined class of people, objects, or events" Example: All students enrolled in this PSY 395 class C. Sample - "A subset of cases selected from a population" Example: 75 randomly selected students in this PSY 395 class
2. Nonprobability Sampling - "Methods of case selection other than random selection"
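A minimal sketch of drawing a simple random sample from a defined population; the roster names and class size below are illustrative, with the sample size of 75 taken from the example above:

```python
import random

# A hypothetical class roster; each student is one case in the population.
population = [f"student_{i:03d}" for i in range(1, 396)]

rng = random.Random(2024)
sample = rng.sample(population, k=75)  # simple random sample: equal, known chance

print(len(sample), len(set(sample)))   # 75 distinct cases (sampling without replacement)
prob_selected = 75 / len(population)   # known probability of inclusion for each case
print(round(prob_selected, 3))
```

The known, equal inclusion probability is what separates this draw from a convenience sample and is what licenses statistical inference back to the roster.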
The steps of nonprobability sampling are simpler than those of probability sampling
Selecting Cases or Sites In field research, the first step of nonprobability sampling involves selecting a particular case or research site 1. Case Study - "The holistic analysis of a single person, group, or event by one or more research methods" Four common strategies for selecting cases: 1. Convenient location 2. Fit to research topic 3. Provides relevant theoretical comparisons 4. Represents an extreme or "deviant" case
Why is Conceptualization necessary?
Some concepts are complex & the meaning needs to be clarified before we can measure them! Examples: love, aggression, neuroticism, depression Step 1: Review the psychological literature to see how the concept has been defined in the past Aggression = Behavior intended to inflict harm on others Step 2: Define the concept, distinguishing it from other similar concepts & determining its dimensions Multiple forms (e.g., overt, relational) of aggression
Operationalization Example
Study on pitch-based temporal illusions. Basically, changes in the pitch of a sound sequence can influence the perception of speeding up and slowing down: if increasing in pitch = speeding up; if decreasing in pitch = slowing down. Interested in how people synchronize to stimuli that cause these illusions. Synchronization = matching the timing of movements to the timing of a stimulus rhythm
What is a Survey & Why Use One?
Survey - "Basic approach to social research that involves asking a relatively large sample of people direct questions through interviews or questionnaires" In psychology: • Personality psychologists use surveys to assess individual differences in personality and changes in personality over time • Clinical psychologists use surveys to identify levels of depression, anxiety, and other mental disorders • Community psychologists use surveys to examine people's sense of community in their neighborhood and their fit to their environments (e.g., workplace, school)
If you remember anything... lec 7
Surveys are characterized by (1) large-scale probability sampling, (2) structured interviews or questionnaires and (3) quantitative analysis • Cross-sectional designs involve one time point of data collection; longitudinal designs involve multiple time points of data collection and can include trend or panel studies • Different modes of data collection (face-to-face, telephone, paper-and-pencil, computer-assisted) have different strengths and weaknesses. You should know these! • The process of conducting a survey involves constructing and pretesting questions, sampling, data collection, coding and editing • Surveys are great for generalizing to populations & answering lots of research questions but not so great for causal inference and may be prone to self-report bias
index A composite measure of a concept constructed by adding or averaging the scores of separate indicators; differs from a scale, which uses less arbitrary procedures for combining indicators.
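A minimal sketch of index construction, assuming invented Likert items: three separate indicator scores are averaged into a single composite, as the definition describes.

```python
# Hypothetical 3-item index scored on a 1-4 Likert scale; averaging the
# separate indicators yields one composite score per respondent.
def index_score(item_responses):
    """Average separate indicator scores into a composite index."""
    return sum(item_responses) / len(item_responses)

respondent_items = [3, 4, 3]  # invented responses to three items
composite = index_score(respondent_items)
print(round(composite, 2))  # 3.33
```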
operational definition The counterpart of a conceptual definition: a detailed description of the research procedures necessary to assign units of analysis to variable categories. An operational definition describes the exact procedures used to observe the categories or values of a variable.
Levels of Measurement ratio measurement
The highest level of measurement, which has the features of the other levels plus an absolute (nonarbitrary) zero point Example: How many days a week do you exercise? 0 • 1 • 2 • 3 • 4 • 5 • 6 • 7 There is a nonarbitrary zero The intervals are equal between response options It is possible to multiply/divide. A person who exercises 4 times a week exercises twice as much as a person who exercises 2 times a week
What is a sampling distribution? "A theoretical distribution of sample results for all possible samples of a given size" (Dixon et al., 2018, p. 151) i.e. a distribution of sample means For a sample of size n, what possible values can it take on, and what is the probability of each value? The "mean of the means" is always the population mean i.e. if you take an infinite number of samples and get the mean from each, the average of those sample means = the population mean. The sampling distribution is normally distributed
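A short simulation (not from the lecture; the population parameters and sample sizes are invented) can show that the mean of many sample means converges on the population mean:

```python
import random
import statistics

# Invented population: 100,000 values drawn from a normal distribution
# with mean 50 and SD 10 (hypothetical numbers for illustration).
random.seed(42)
population = [random.gauss(50, 10) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Draw 2,000 samples of size n = 25 and record each sample mean.
n = 25
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(2_000)]

# The "mean of the means" approximates the population mean.
print(round(statistics.mean(sample_means), 2), round(pop_mean, 2))
```

With infinitely many samples the two values would match exactly; with 2,000 they agree to within a small fraction of a point.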
The sampling distribution is the theoretical probability distribution of all possible sample means for a given population. So where do confidence intervals come in? Because the sampling distribution is normal, we know that ~95% of sample means will be within the interval of +/-2 standard errors of the population mean (the standard error being the SD of the sampling distribution). So, if the population mean is 50 and the standard error is 2.5, and you take one random sample, there is a 95% chance that the sample mean will be between 45 and 55. If we take an infinite # of samples and create 95% CIs (mean +/-2 SE) for each one, then for any one of these intervals there is a 95% chance that the interval contains the population mean. Which means that for any sample for which we construct a 95% CI, we can be 95% sure that the population mean falls within that interval
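The CI logic above can be checked with a quick simulation. The population mean of 50 and standard error of 2.5 match the example in the text; the SD of 10 and n of 16 that produce that standard error are assumed for illustration.

```python
import random
import statistics

# Construct 95% CIs (mean +/- 2 SE) for many random samples and check
# how often they contain the population mean.
random.seed(1)
pop_mean, pop_sd, n = 50.0, 10.0, 16
se = pop_sd / n ** 0.5  # standard error = 10 / 4 = 2.5, as in the example

hits = 0
trials = 2_000
for _ in range(trials):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    m = statistics.mean(sample)
    if m - 2 * se <= pop_mean <= m + 2 * se:  # does the CI cover the mean?
        hits += 1

coverage = hits / trials
print(coverage)  # close to 0.95 (+/-2 SE covers ~95.4% under normality)
```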
The sampling process chart
There are several key steps to probability sampling 1. Target population - "The population to which the researcher would like to generalize [their] results" example: On average, how many points per game do college football teams score? • Target population = NCAA Division I Football Bowl Subdivision (FBS) teams 2. Sampling frame - "An operational definition of the population that provides the basis for drawing the sample; ordinarily consists of a list of cases" example: There are 130 FBS Division I teams 3. Coverage error - "The error that occurs when the sampling frame does not match the target population" example: coverage error is avoided here because a complete sampling frame of all 130 teams can be built from the NCAA website 4. Simple Random Sample - "A probability sampling design in which every case and every possible combination of cases has an equal chance of being included in the sample" example: Used Excel's random number generator to select 20 teams. 5. Proportionate Stratified Sample - "A sampling procedure in which strata are sampled proportionate to population composition" 6. Disproportionate Stratified Sample - "A sampling procedure in which strata are sampled disproportionately to population composition" Disproportionate Stratified Sampling requires weighting to obtain unbiased estimates of population characteristics
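A sketch of the simple random and proportionate stratified designs in Python rather than Excel; the team names and the two-stratum split are placeholders, not real NCAA data.

```python
import random

# Hypothetical sampling frame: 130 placeholder FBS team names.
frame = [f"Team {i}" for i in range(1, 131)]

# Simple random sample: every case has an equal chance of selection,
# drawn without replacement.
random.seed(7)
srs = random.sample(frame, 20)
print(len(srs), len(set(srs)))  # 20 distinct teams

# Proportionate stratified sample: sample each stratum in proportion to
# its share of the population (the 65/65 split is invented).
strata = {"Stratum A": frame[:65], "Stratum B": frame[65:]}
stratified = []
for name, members in strata.items():
    k = round(20 * len(members) / len(frame))  # proportionate allocation
    stratified.extend(random.sample(members, k))
print(len(stratified))  # still 20 cases total
```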
summary
There are three general features of surveys. First, a large number of respondents are chosen, usually through probability sampling, to represent a population of interest. Second, surveys tend to use structured questionnaire or interview procedures that ask a predetermined set of closed-ended rather than open-ended questions. Third, surveys involve quantitative analysis of responses, which may be descriptive (to describe a population), explanatory (to test hypotheses), or both.
summary 114
To answer quantitative questions, the measurement process initially follows the deductive logic of inquiry in which theory informs hypotheses and guides the collection and analysis of data. The first major step is conceptualization, which includes reviewing the literature on a concept and defining/refining its meaning. The second major step is operationalization, which involves specifying empirical indicators of a concept and spelling out procedures by which these indicators will be applied.
What's the problem with this question?
To what extent do you agree or disagree with the following statement: I'd be lost without Instagram and Twitter 1. Strongly Disagree 2. Disagree 3. Agree 4. Strongly Agree It is double-barreled because it presents Instagram and Twitter together
• Two types of longitudinal designs: • Trend studies • Panel studies
Trend Study • "A longitudinal design in which a research question is investigated by repeated surveys of independently selected samples of the same population" • "Survey questions are repeated to understand change over time" • "A different independent sample of respondents, representative of the same target population, is asked the same questions in each survey" • "Data show how the population changes over time" Panel Study • "A longitudinal design in which the same individuals are surveyed more than once, permitting the study of individual and group change" • "Survey questions are repeated to understand change over time" • "The initial sample of respondents is asked the same questions each time the survey is administered" • "Data show how individuals change over time"
Reliability & Validity
Two criteria are used to assess the quality of measures: Reliability - "The stability or consistency of an operational definition" • Do I get the same results each time I apply the operational definition? • Are items within a measure consistent? Measurement Validity - "The goodness of fit between an operational definition and the concept it is purported to measure" • Am I measuring what I intend to measure? • Low validity occurs when there is measurement error In measurement, we are trying to hit the bullseye (our theoretical concept)
Why is Operationalization necessary?
We need to be able to recognize the concept if we observe it in our research Step 1: Specify empirical indicators - "A single, concrete proxy for a concept such as a questionnaire item in a survey" A single empirical indicator of a concept may not be sufficient - It can have errors - It may not capture the whole concept An index is "a composite measure of a concept constructed by adding or averaging scores of separate indicators" Step 2: Spell out the procedures for applying the empirical indicators by formulating an operational definition - "A detailed description of the research procedures necessary to assign units of analysis to variable categories"
Levels of Measurement nominal variable example
What political party do you identify with? 1. Democrat 2. Republican Do the numerical labels have mathematical meaning? NO Any problems with this question? The categories are not exhaustive: a measure must include all possible values or categories of a variable so that every case can be classified (e.g., Independent, Libertarian, Green, and Other are missing)
summary 139
Whereas the first major steps in the measurement process, conceptualization and operationalization, largely follow the deductive logic of inquiry, concepts and operational definitions may emerge and be refined through data analysis, reflective of the inductive logic of inquiry. Researchers may develop concepts from data; assess the quality of measures by using in-depth interviews; and modify operational definitions and refine their meaning through statistical analysis.
Reliability & Validity bullseye high reliability low validity
You can have high reliability and still have a measure that is not valid High reliability is reflected in a tightly clustered pattern of shots But...these are not close to the bullseye, indicating that the measure is not valid (it doesn't measure what it is intended to measure)
conceptualization : Defining and clarifying the meaning of concepts
conceptual definition The meaning of a concept expressed in words that is derived from theory and/or observation. Also called theoretical definition.
inter-rater reliability The extent to which different observers or coders get equivalent results when applying the same measure. Also called inter-coder reliability.
convergent validation Measurement validation based on the extent to which independent measures of the same concept are associated with one another. Convergent validation consists of examining the association between alternative measures of the same concept.
operationalization The process of identifying empirical indicators and the procedures for applying them to measure a concept; i.e., identifying ways of observing the concept in real life and spelling out the procedures for applying these "indicators" when you carry out your research. To answer quantitative questions, you first need to find ways of indicating the concept in question. Second, you need to spell out the procedures by which you will apply these indicators.
empirical indicator A single, concrete proxy for a concept such as a questionnaire item in a survey. An empirical indicator is an observable characteristic of a concept.
nominal measurement A level of measurement in which numbers serve only to label categories of a variable. The lowest level, nominal measurement, is a system in which cases are classified into two or more categories on some variable.
exhaustive The measurement requirement that a measure includes all possible values or categories of a variable so that every case can be classified exhaustive means that there must be sufficient categories so that virtually all persons, events, or objects being classified will fit into one of the categories.
weighting A procedure that corrects for the unequal probability of selecting one or more segments (e.g., strata) of the population. cluster sampling A probability sampling design in which the population is broken down into natural groupings or areas, called clusters, and a random sample of clusters is drawn. multistage cluster sampling A sampling design in which sampling occurs at two or more steps or stages. probability proportionate to size sampling The selection of cases in cluster sampling so that the probability of selection is proportionate to the size of (i.e., the number of cases in) the cluster.
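A minimal sketch of why disproportionate stratified sampling needs weighting; all numbers are invented. Stratum A is 80% of the population but only half the sample, so its cases must be weighted up to avoid a biased estimate.

```python
# Invented strata: population shares, sample sizes, and stratum means.
pop_share = {"A": 0.8, "B": 0.2}
sample_n = {"A": 50, "B": 50}          # disproportionate allocation
sample_mean = {"A": 10.0, "B": 30.0}   # stratum means observed in the sample

total_n = sum(sample_n.values())
# weight = (population share) / (sample share) for each stratum
weights = {s: pop_share[s] / (sample_n[s] / total_n) for s in pop_share}

unweighted = sum(sample_mean[s] * sample_n[s] for s in sample_n) / total_n
weighted = sum(sample_mean[s] * sample_n[s] * weights[s]
               for s in sample_n) / total_n

print(unweighted, weighted)  # 20.0 vs 14.0
```

The unweighted estimate (20.0) overstates stratum B; the weighted estimate (14.0) matches the true population mix (0.8 × 10 + 0.2 × 30).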
nonresponse error In survey sampling, the error that occurs when nonrespondents (sampled individuals who do not respond or cannot be contacted) differ systematically from respondents. Also called nonresponse bias. case study The holistic analysis of a single person, group, or event by one or more research methods. convenience sampling The selection of cases that are conveniently available. purposive sampling Sampling that involves the careful and informed selection of typical cases or of cases that represent relevant dimensions of the population. Also called judgmental sampling.
structured interview A type of interview with highly specific objectives in which all questions are written beforehand and asked in the same order for all respondents, and the interviewer's remarks are standardized. unstructured interview A type of interview guided by broad objectives in which questions are developed as the interview proceeds. semi-structured interview A type of interview that, while having specific objectives, permits the interviewer some freedom in meeting them. closed-ended question Survey questions that require respondents to choose responses from those provided.
open-ended question A survey question that requires respondents to answer in their own words. descriptive survey A survey undertaken to provide estimates of the characteristics of a population. explanatory survey A survey that investigates relationships between two or more variables, often attempting to explain them in cause-and-effect terms. cross-sectional design The most common survey design, in which data are gathered from a sample of respondents at essentially one point in time.
mutual exclusivity The measurement requirement that each case can be placed in one and only one category of a variable. of mutual exclusivity means that the persons or things being classified must not fit into more than one category.
ordinal measurement A level of measurement in which different numbers indicate rank order of cases on a variable. In ordinal measurement, numbers indicate the rank order of cases on some variable.
longitudinal design Survey design in which data are collected at more than one point in time. trend study A longitudinal design in which a research question is investigated by repeated surveys of independently selected samples of the same population. face-to-face (FTF) interview A type of interview in which the interviewer interacts face-to-face with the respondent. computer-assisted personal interviewing (CAPI) A software program, usually on a portable computer, that aids interviewers by providing appropriate instructions, question wording, and data-entry supervision.
panel study A longitudinal design in which the same individuals are surveyed more than once, permitting the study of individual and group change. response rate In a survey, the proportion of people in the sample from whom completed interviews or questionnaires are obtained. telephone interview A type of interview in which interviewers interact with respondents by telephone. computer-assisted telephone interviewing (CATI) A set of computerized tools that aid telephone interviewers and supervisors by automating various data-collection tasks.
probability sampling Sampling based on a process of random selection that gives each case in the population an equal or known chance of being included in the sample. nonprobability sampling Methods of case selection other than random selection. population The total membership of a defined class of people, objects, or events. sample A subset of cases selected from a population.
probability The likelihood that something will occur, which may vary from 0 to 100 percent. random selection A selection process that gives each element in a population a known and independent chance of being selected. sampling without replacement A sampling procedure whereby once a case is selected, it is NOT returned to the sampling frame, so that it cannot be selected again. sampling with replacement A sampling procedure whereby once a case is selected, it is returned to the sampling frame, so that it may be selected again. probability distribution A distribution of the probabilities for a variable, which indicates the likelihood that each category or value of the variable will occur.
double-barreled question A question in which two separate ideas are presented together as a unit. leading question A question in which a possible answer is suggested, or some answers are presented as more acceptable than others. interview schedule A survey form used by interviewers that consists of instructions, the questions to be asked, and, if they are used, response options. field pretesting An evaluation of a survey instrument that involves trying it out on a small sample of persons.
random-digit-dialing (RDD) A sampling-frame technique in which dialable telephone numbers are generated (sampled) randomly. coding The sorting of data into numbered or textual categories. editing Checking data and correcting for errors in completed interviews or questionnaires.
interval measurement A level of measurement that has the qualities of the ordinal level plus equal distances (intervals) between assigned numbers. Interval measurement has the qualities of the nominal and ordinal levels plus the requirement that equal distances or intervals between "numbers" represent equal distances in the variable being measured.
ratio measurement The highest level of measurement, which has the features of the other levels plus an absolute (nonarbitrary) zero point. The fourth level, called ratio measurement, includes the features of the other levels plus an absolute (nonarbitrary) zero point.
sampling error The difference between an actual population value (e.g., a percentage) and the population value estimated from a sample. standard error A statistical measure of the "average" sampling error for a particular sampling distribution, which indicates how much sample results will vary from sample to sample. confidence interval A range (interval) within which a population value is estimated to lie at a specific level of confidence.
sampling distribution A theoretical distribution of sample results for all possible samples of a given size. normal curve A bell-shaped distribution of data that characterizes many variables and statistics, such as the sampling distribution of a proportion or mean. Also called normal distribution.
measurement error A lack of correspondence between a concept and measure that is due to problems with an operational definition or with its application.
social desirability effect A tendency of respondents to bias answers to self-report measures so as to project socially desirable traits and attitudes.
target population The population to which the researcher would like to generalize his or her results. sampling frame An operational definition of the population that provides the basis for drawing a sample; ordinarily consists of a list of cases. coverage error The error that occurs when the sampling frame does not match the target population. simple random sample A probability sampling design in which every case and every possible combination of cases has an equal chance of being included in the sample.
stratified random sample A probability sampling design in which the population is divided into strata (or variable categories) and independent random samples are drawn from each stratum. proportionate stratified sampling A sampling procedure in which strata are sampled proportionately to population composition. disproportionate stratified sampling A sampling procedure in which strata are sampled disproportionately to population composition.
construct validation Measurement validation based on an accumulation of research evidence indicating that a measure is related to other variables as theoretically expected.
summary 137 One criterion by which the quality of an operational definition is assessed is reliability, which refers to the consistency or stability of measurement. The three major forms of reliability assessment are test-retest reliability, internal consistency, and inter-rater (or inter-coder) reliability. Another criterion by which the quality of an operational definition is assessed is validity, which refers to whether the operational definition is accurately measuring the concept in question. Compared to reliability assessment, validity assessment is more difficult and is often indirect. Two major forms of validity assessment are convergent and construct validation. Both reliability and validity can be improved through repeated tests and by refining operational definitions through data analysis, discussed next.
paper-and-pencil questionnaire survey A survey form filled out by respondents. computer-assisted self-administered interviewing (CASI) An electronic survey in which a questionnaire is transmitted on a computer disk mailed to the respondent or on a laptop computer provided by the researcher. mixed-mode survey A survey that uses more than one mode of data collection, either sequentially or concurrently, to sample and/or collect the data.
summary 229 Surveys vary in their design and modes of data collection. Surveys using a cross-sectional design ask a sample of people questions at one point in time, while surveys using a longitudinal design ask people questions at two or more points in time. Of the two major types of longitudinal designs, a trend study asks the same questions of independent samples of people, whereas a panel study asks the same questions of the same sample of people at multiple points in time. Surveys collect data through face-to-face interviews, telephone interviews, paper-and-pencil questionnaires including mail surveys, computer-assisted self-interviews, or some combination of these modes (mixed-mode surveys). Each mode has strengths and limitations related to its costs, the time it takes to administer the survey, the response rate, the population coverage, and its quality of measurement. Face-to-face interviewing is considered to be the best (and most interactive) mode, but its costs and time investment can be prohibitive.
The process of planning and conducting a survey involves choosing a mode of data collection, constructing and pretesting the questionnaire, choosing a sampling frame, designing and selecting the sample, recruiting the sample and collecting data, and coding and editing data, which are then analyzed. Each of these steps involves additional steps or considerations. Choosing a mode of data collection depends on the researcher's goals and the resources available. Constructing and pretesting the questionnaire depends on the mode of data collection, and researchers should strive to write unambiguous and neutral questions, present them in a logical order, and get feedback on question drafts. Likewise, choosing a sampling frame, selecting and recruiting a sample, and collecting data depend on the mode of data collection. At the minimum, this involves ensuring that the sampling frame is as close to the target population as is possible; selecting respondents randomly; clearly explaining the purposes of the survey to potential respondents and their rights; and attempting to gain their cooperation. Once the data are collected, they need to be coded and edited before being analyzed.
summary 241
secondary analysis Analysis of survey or other data originally collected by another researcher, ordinarily for a different purpose.
summary 244 The major strengths of surveys lie in their ability to provide reasonably accurate estimations of population characteristics, their versatility in speaking to a wide range of topics, and their efficiency. However, surveys are limited in their ability to establish causal relationships and in their reliance on self-reports of human behavior, and in the quality of measurement. Finally, it is important to note several problems that have surfaced in the past quarter-century, which are making it increasingly difficult and costly to conduct surveys, forcing some researchers to seek alternative approaches to social research. These include access impediments such as "walled subdivision, locked apartment buildings, telephone answering machines, [and] telephone caller ID," declining response rates, increasing costs due to "increased effort to contact and interview the public," and telephone survey coverage issues created by the increase in mobile phones.
The three principal methods of reliability assessment are test-retest reliability, internal consistency, and inter-coder reliability.
test-retest reliability The association between repeated applications of an operational definition
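Test-retest reliability is typically quantified as the Pearson correlation between two administrations of the same measure; a hand-rolled sketch with invented scores:

```python
# Pearson correlation between two applications of an operational
# definition (the scores below are invented for illustration).
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 9, 20, 17, 11]  # six respondents at time 1
time2 = [13, 14, 10, 19, 18, 12]  # same respondents, retested later
print(round(pearson_r(time1, time2), 3))  # close to 1 = high reliability
```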
Cluster Sampling
• "A probability sampling design in which the population is broken down into natural groupings or areas, called clusters, and a random sample of clusters is drawn" • Cluster sampling can be multi-stage! Cluster sampling has some problems! • Sometimes the number of cases in clusters varies quite a bit and requires probability proportionate to size sampling • It is less precise than simple random sampling or stratified sampling because variation within clusters is smaller than variation between clusters • This increases sampling error!
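A sketch of two-stage (multistage) cluster sampling; the schools and students are placeholder data, not from the lecture.

```python
import random

# Invented clusters: 20 schools of 30 students each.
random.seed(3)
clusters = {f"School {i}": [f"S{i}-student{j}" for j in range(30)]
            for i in range(20)}

# Stage 1: randomly sample 5 of the 20 clusters.
stage1 = random.sample(sorted(clusters), 5)

# Stage 2: randomly sample 6 students within each selected cluster.
stage2 = [random.sample(clusters[c], 6) for c in stage1]

sample = [s for group in stage2 for s in group]
print(len(sample))  # 5 clusters x 6 students = 30 cases
```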
If you remember anything... lec 6
• Probability sampling is used to make statistical inferences to a population and involves random selection • Probability sampling can be used to make inferences because we can use theoretical knowledge of the sampling distribution of sample statistic • Steps of probability sampling include defining a target population, making a sampling frame, and choosing a sampling design (simple random sample, stratified sample, cluster sample) • Nonprobability samples cannot be used to make statistical inferences and do not involve random selection
key points
• The measurement process may follow the deductive logic of inquiry in which theory informs data collection and analysis, but there may be "feedback loops" in this process, which reflect an inductive logic of inquiry. • In research addressing quantitative questions, the measurement process begins with conceptualization, in which the meaning of a concept is defined and refined based on a careful review of the scientific literature. • Following conceptualization, concepts are operationalized by specifying empirical indicators and spelling out the procedures to gather data. • To measure a concept, a researcher may use manipulation operations, which are by definition experimental, or measured operations, which include verbal reports, observation, and the use of archival records. • An important consideration in operationalization is the level of measurement. Four levels of measurement—nominal, ordinal, interval, and ratio—indicate the meaning of numbers or labels assigned to variable categories and provide progressively more information. • Concepts are operationalized based on the data source and desired level of precision with the aim of providing the best possible fit between concept and measure. • Operational definitions may be assessed on the basis of their reliability and validity. • Reliability may be assessed by calculating the correlation between repeated applications of an operational definition (test-retest reliability), examining the consistency of responses across the items of a composite measure (internal consistency reliability), or observing the correspondence between different coders or raters applying the same operational definition (inter-rater reliability). 
• Validity may be assessed by examining the correlation between alternative measures of a concept (convergent validation) or by examining the pattern of associations between an operational definition of a concept and other variables with which the concept should and should not be related (construct validation). • Analyzing data and assessing the quality of measures may lead to the generation of new concepts and the refinement of operational definitions.
If you remember anything... lec 5
• The measurement process starts with conceptualization then moves to operationalization and finally to data collection and assessment of measurement quality • Manipulated operations are used in experiments to assess independent variables while measured operations are used in multiple approaches to research • The 4 levels of measurement are nominal, ordinal, interval, and ratio. Each has important qualities with implications for how measures can be used • Reliability is a necessary but not sufficient criterion for establishing validity. There are many methods of assessing both reliability and validity
key points ch8
• The primary features of surveys are relatively large probability samples, structured questioning, and quantitative analysis. • Structured interviews address specific objectives with mostly closed-ended questions, whereas unstructured interviews address broad objectives with mostly open-ended questions. • Quantitative analysis of surveys may be descriptive and/or explanatory. • Survey designs may ask questions at one point in time (cross-sectional) or may repeat the same questions at multiple points in time (longitudinal). • Survey data-collection modes include face-to-face interviews, telephone interviews, paper-and-pencil questionnaires, computer-assisted self-interviews, as well as combinations of these modes (i.e., "mixed-mode" surveys). • Survey data-collection modes vary in their costs, the time they take to complete, their response rates, their population coverage, and their quality of measurement. • The process of planning and conducting a survey involves key decisions about how to measure variables and how to select a sample. • Measuring variables entails choosing a mode of data collection and constructing and pretesting a questionnaire. • Selecting a sample involves choosing an appropriate sampling frame, drawing a probability sample, and, in interview surveys, randomly selecting respondents within households. • Following recruitment of respondents and administration of the questionnaire, survey responses are coded, edited, and analyzed. • The strengths of surveys are their versatility, efficiency, and ability to produce accurate generalizations about targeted populations, but surveys offer relatively weak inferences about causality, are limited to self-reports, and are susceptible to reactive measurement effects.
key points ch 6
• The two general strategies for selecting cases or observations are probability sampling and nonprobability sampling. • Based on random selection, probability sampling is used to make precise statistical inferences from a sample to a population. • To make statistical inferences, researchers use theoretical knowledge of the sampling distribution of a sample statistic to determine the confidence interval, or margin of error. • The steps in probability sampling consist of defining the target population, selecting a sampling frame, devising the sampling design, and determining the sample size. • The most basic probability sampling design, simple random sampling, gives each case in a sampling frame an equal chance of being selected. • Stratified random sampling divides the frame into strata (variable categories) and samples within each stratum; multistage cluster sampling divides the population into a succession of clusters (natural or geographic groupings), first sampling across clusters and then within each selected cluster. • In probability sampling, the two primary considerations in determining an appropriate sample size are desired precision and available resources. • Surveys using probability sampling may be subject to two sources of sample bias: coverage error and nonresponse error. • Based on nonrandom selection, nonprobability sampling may be used when the target population cannot be readily identified, a sampling frame cannot be obtained or easily constructed, and research goals seek a holistic or in-depth understanding of a small number of cases. • Nonprobability sampling may occur at two stages: when choosing one or a few cases or research sites and when choosing whom or what to observe within selected sites. • Cases and research sites may be selected because they are conveniently located, fit the research topic, provide theoretical comparisons, or represent deviant cases.
• Nonprobability methods of selecting interviewees or observations consist of convenience sampling, purposive sampling, snowball sampling, and theoretical sampling. • Probability sampling provides a basis for statistical inference; nonprobability sampling generally is intended to provide a basis for theoretical inference.