Research Methods Exam 1
observational measures
raters record the visible behaviors of people or animals
to establish criterion validity, researchers make sure the scale or measure is correlated with
some relevant behavior or outcome
factors that can make participant responses less accurate
*Response sets - people might adopt a consistent way of answering all the questions, especially toward the end of a long questionnaire. Rather than thinking carefully about each question, people might answer all of them positively, negatively, or neutrally. Response sets weaken construct validity because these survey respondents are not saying what they really think. (shortcut) Common response sets: 1) Yea-saying or nay-saying -Solution: reverse-scored items 2) Fence-sitting: playing it safe by answering in the middle of the scale, especially when survey items are controversial. People might also answer in the middle (or say "I don't know") when a question is confusing or unclear. It is difficult to distinguish those who are unwilling to take a side from those who are truly ambivalent (holding simultaneous conflicting feelings). -Solution: remove midpoints, add N/A, use a forced-choice format
*Question order - previous answers can influence later answers. -Solution: separate emotionally charged questions; ask the most important question first
*Socially desirable responding - when participants give answers they believe make them "look good". -Solution: encourage honesty through study design; remove individuals high in socially desirable responding; use filler items to disguise the survey's true purpose
*Cognitively impenetrable items - ask for information that we cannot access consciously, so we can't always access true motivations via self-report
abstract
-concise summary of the article (120 words long) -briefly describes the study's hypotheses, method, and major results
introduction
-the first paragraph explains the topic of the study -the middle paragraphs lay out the theoretical and empirical background for the research: What theory is being tested? What have past studies tested? What makes the present study important? -pay attention to the final paragraph: it states the specific research questions, goals, or hypotheses for the current study
what are the two ways that journalists might distort the science they attempt to publicize?
-the research had not been peer-reviewed -the methods and conclusions had not been assessed by scientists in the field
Observer can also bias their subjects through EXPECTANCY EFFECTS
-people behave in ways they think they're expected to -even when not introducing a bias, researchers can affect the behavior of subjects via REACTIVITY (subjects behaving differently when being observed by anyone)
2 common methods to reduce observer bias
-use coding manuals -employ multiple raters (and calculate inter-rater reliability)
Because observational research is naturally subjective, it is prone to OBSERVER BIAS
-when observers record what they EXPECT to see rather than what ACTUALLY occurred -when observers create a Self-fulfilling prophecy
Scatterplot differs in 2 important ways
1) Direction of the relationship: the slope can be positive, negative, or zero - sloping up, sloping down, or not sloping at all
2) Strength of the relationship (r): in some scatterplots the dots are close to a straight, sloping line; in others, the dots are more spread out. This spread corresponds to the strength of the relationship. In general, the relationship is strong when the dots are close to the line; it is weak when the dots are spread out.
The numbers below scatterplots are correlation coefficients (r). r indicates the same 2 things as the scatterplot: the direction of the relationship & the strength of the relationship, both of which evaluate reliability evidence.
when the slope is positive, r is positive; when the slope is negative, r is negative
r of -1.0 = strongest possible negative relationship
r of 1.0 = strongest possible positive relationship
If there is no relationship between the 2 variables, r will be .00 or close to .00 (e.g., .02-.04)
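The direction and strength described above can be checked numerically. A minimal sketch in Python, computing Pearson's r from scratch (the data values are made up purely to illustrate direction):

```python
# Minimal sketch: computing the correlation coefficient (r) by hand.
# The data below are hypothetical, chosen only to show slope direction.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # upward slope -> 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # downward slope -> -1.0
```

Dots that fall exactly on a sloping line give r = 1.0 or -1.0; scattered dots pull r toward .00.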
Describe at least 3 things that STATISTICAL validity addresses
1) You can ask how well the two variables were measured (construct validity) 2) You can ask whether you can generalize the result to a population (external validity) 3) You can ask whether the researcher might have made any statistical conclusion errors, as well as evaluate the strength and significance of the association (statistical validity)
Expectancy effects and reactivity can be remedied in several ways
1) hide 2) wait it out - let them get comfortable 3) measure artifacts of the behavior (measure soap, water, paper towels..)
Ways to measure
1) self report 2) observational 3) physiological
Criteria for causal claims
1) the cause variable and outcome variable are correlated (the correlation between the cause variable and the outcome variable cannot be zero) 2) the causal variable came first and the outcome variable came later 3) there is no other explanation for the relationship
reading with a purpose : empirical journals articles
1) What is the argument? 2) What is the evidence to support the argument?
After reading the abstract, you can skip to the end of the introduction to find the primary goals and hypotheses of the study. After reading the goals and hypotheses, you can read the rest of the introduction to learn more about the theory the hypotheses are testing. Another way to find information about the argument of the paper is the FIRST paragraph of the Discussion section, where most authors summarize the key results of their study and state whether the results supported their hypotheses. Once you have a sense of what the argument is, you can look for the evidence; it is contained in the Method and Results sections. What did the researchers do, and what results did they find? How well do these results support their argument (i.e., their hypotheses)?
Causal claim verbs
causes affects may curb exacerbates changes may lead to makes sometimes makes hurts promotes reduces prevents distracts fights worsens increases trims adds
Why is being falsifiable important in scientific thinking?
A hypothesis must be falsifiable: it must entail predictions that are in principle testable and could turn out to be false. Falsifiability is important in scientific thinking because you can reformulate your hypothesis if it is falsified and better explain real-world phenomena.
Operational Definition
A precise description of how a variable is to be measured or manipulated in a particular study. Operational definitions must be: 1) OBJECTIVE - they do not depend on judgment or opinion. 2) EASY TO REPLICATE - another researcher could exactly re-create the measurements. 3) RELIABLE - under the same conditions, they produce the same results.
Validity
concerns whether the operationalization is measuring what it is supposed to measure
Define each of the four validities and, for each, indicate when that validity is important to assess (meaning that for which types of claims is that type of validity important?)
Construct validity is how well the operational definition of a variable captures its conceptual definition - whether we are measuring or manipulating the variables correctly. It is important for Frequency, Association, and Causal Claims. Statistical validity is how well the numbers support the claims - the degree to which a study's statistical conclusions are accurate and reasonable. It is important for Frequency, Association, and Causal Claims. External validity is the extent to which the results of a study generalize to some larger population, as well as to other times or situations. It is important for Frequency and Association Claims. Internal validity is how well a study can rule out alternative explanations for the results. It is important for Causal Claims.
Criterion Validity
Criterion validity examines whether a measure correlates with key outcomes and behaviors. Is the operational definition consistent with real-world outcomes? Strong patterns between measurements and outcomes are evidence of high validity. No matter what type of operationalization is used, if it is a good measure of its construct, it should correlate with a criterion behavior or outcome related to that construct. Evidence for criterion validity is commonly represented with correlation coefficients, but it does not have to be. Another way to gather evidence for criterion validity is the known-groups paradigm, in which researchers see whether scores on the measure can discriminate among a set of groups whose behavior is already well understood.
Conceptual Variable
Key variable of interest. Variables that are often expressed in abstract, broad terms.
references
contains a full bibliographic listing of all the sources the authors cited in writing their article, enabling interested readers to locate these studies
leading questions
are framed to suggest a particular answer; participants are likely to conform to perceived expectations. Responses can be influenced by HALO EFFECTS - feelings about one part are transferred to linked parts
What does it mean for a hypothesis to be falsifiable?
For a hypothesis to be falsifiable means that it could be tested and have the potential to be disconfirmed.
questions that can influence responses
Format: free response, forced choice, likert-scale Wording: unclear, leading (halo effect), double-barreled, negatively worded, framing questions (ceiling, floor effects)
what are the differences between PsycInfo and Google scholar?
Google Scholar does not allow you to search as easily in specific fields, does not categorize the articles, and is less well organized than PsycInfo
What question would you ask to interrogate a study's CONSTRUCT VALIDITY
How well does the OPERATIONAL definition of a variable capture its CONCEPTUAL definition? Are we measuring or manipulating the variables appropriately? How well did they measure the DV? How well did they manipulate the IV?
Face Validity
It looks like what you want to measure: does our measure appear to be a good way of assessing whatever variable we're interested in? (SUBJECTIVE)
empirical journal articles
contains details about the study's method, the statistical tests used, and the numerical results of the study
discussion
generally summarizes the study's research question and methods and indicates how well the data supported the hypotheses; may discuss alternative explanations for the data and pose interesting questions raised by the research
Explain the Third Variable Problem & indicate to which rule for establishing causation for the problem is relevant
The Third Variable Problem refers to the fact that when we find a correlation between 2 variables, we can't know whether some other variable is actually causing the relationship. This is relevant to the Internal Validity rule for causation. The problem is that we never know whether we have eliminated confounding variables: in a correlation, we don't know whether some third variable is causing the relationship, and that is an issue for Internal Validity. To have Internal Validity we must establish that there are no alternative explanations for the association between variable A and variable B.
Content Validity
The measure contains all the parts that your theory says it should contain: are we capturing all the content of the variable? Does the operational definition capture all relevant parts (content) of the construct? (SUBJECTIVE)
Why don't researchers usually aim to achieve all four of the big validities at once?
They usually find it impossible to conduct a study that satisfies all four validities at once, so researchers decide what their priorities are.
Inter-rater reliability
Two or more independent observers will come up with consistent (or very similar) findings. Most relevant for OBSERVATIONAL measures: consistent scores are obtained no matter who measures or observes. Using different criteria, or observing different situations, can cause LOW inter-rater reliability. One reason inter-rater reliability might be LOW is that the observers did not have a clear enough operational definition of "happiness" to work with. Another reason could be that one or both of the coders has not been trained well enough yet. ex) You are assigned to observe the number of times each child smiles in 1 hour on a daycare playground. Your lab partner is assigned to sit on the other side of the playground and make his own count of the same children's smiles. If, for one child, you record 12 smiles during the first hour, and your lab partner also records 12 smiles in that hour for the same child, there is inter-rater reliability. Any 2 observers watching the same children at the same time should agree about which child smiled the most and which child smiled the least.
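For counts like the daycare example, inter-rater reliability is often quantified as the correlation between the two raters' scores. A hedged sketch (the smile counts below are invented for illustration):

```python
# Hypothetical smile counts for five children, recorded independently
# by two raters watching the same playground (numbers are made up).
rater_a = [12, 5, 8, 0, 3]
rater_b = [12, 6, 7, 1, 3]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A strong positive r (close to +1) indicates good inter-rater reliability:
# both raters rank the children's smiling in nearly the same order.
print(round(pearson_r(rater_a, rater_b), 2))
```

If the two raters disagreed wildly about which children smiled most, r would drop toward zero, signaling a problem with the operational definition or the coders' training.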
Why are some pseudoscientific hypotheses often not falsifiable?
A hypothesis cannot be falsified when it is untestable or too vague, which makes it difficult to generate predictions that could possibly disconfirm the theory.
data
a set of observations. Depending on whether or not the data are consistent with hypotheses based on a theory, the data may either support or challenge the theory. Data that match the theory's hypotheses strengthen the researcher's confidence in the theory. When the data do not match the theory's hypotheses, however, those results indicate that the theory needs to be revised.
theories
a set of statements that describes general principles about how variables relate to one another -good theories are supported by data -good theories are falsifiable -good theories have parsimony (all other things being equal, the simplest solution is best) -theories don't prove anything -evaluate theories based on the weight of the evidence
hypothesis/prediction
a way of stating the specific outcome the researcher expects to observe if the theory is accurate
Divergent Validity
aka discriminant validity: does the operational definition correlate with measures of dissimilar concepts? (BAD if it does)
External Validity
an indication of how well the results of a study generalize to, or represent, individuals or contexts besides those in the study itself. Important for Frequency & Association claims
Convergent Validity and Divergent Validity
are usually evaluated together as a pattern: a measurement should correlate MORE strongly with measures of similar traits (convergent validity) and LESS strongly with measures of dissimilar traits (divergent validity). There are no hard-and-fast rules for what the correlations should be. Instead, the overall pattern of convergent and divergent validity helps researchers decide whether their operationalization really measures the construct they want it to measure, as opposed to other constructs.
double barreled questions
ask two things at once (poor construct validity). It is impossible to answer them separately, so you cannot tell which question drives the response. Fix: ask the questions separately
Measurement validity
can be established with subjective judgments (face validity and content validity) or with empirical data
unclear questions
can introduce error participants respond to their 'understanding' of what the question means, not necessarily your intent Fix = clarify question
what are the 3 criteria that causal claims must satisfy?
covariance temporal precedence internal validity
results
describes the quantitative and, as relevant, qualitative results of the study, including the statistical tests the authors used to analyze the data (tables & figures summarize key results)
Empiricism
empirical methods/research = using evidence from the senses (sight, hearing, touch) or from instruments that assist the senses (thermometers, timers, photographs, weight scales, and questionnaires) as the basis for conclusions. Empirical research can be used for both applied and basic research questions. -basic research is not intended to address a specific, practical problem; the goal is to enhance the general body of knowledge (basic research is an important basis for later, applied studies) -applied research is done with a practical problem in mind; the researchers hope their findings will be directly APPLIED to the solution of that problem in a particular real-world context -translational research represents a dynamic bridge from basic to applied research (translational research attempts to translate the findings of basic science into applied arenas)
test-retest reliability
establishes whether a sample gives a consistent pattern of scores at more than one testing consistency over time
inter-rater reliability
establishes whether two observers give consistent ratings of a sample of targets consistency between raters
method
explains in detail how the researchers conducted their study. (participants, materials, procedure, and purpose)
open-ended format
free response; responses are not constrained Advantage = allows for unexpected responses Disadvantage = irrelevant data, time consuming, often difficult and subjective to score
Reliability
how consistent the results of a measure are. If your measurement is reliable, you get a consistent pattern of scores every time. Evidence for reliability is a special example of an association claim: the association between one version of the measure and another, between one coder and another, or between an earlier time and a later time. When assessing reliability, a negative correlation is rare and undesirable.
Scatterplot
important first step in evaluating reliability (also r) can be a helpful tool for assessing the agreement between 2 administrations of the same measurement (test-retest reliability) or between 2 coders (inter-rater reliability)
empirically derived validity tests
include criterion validity as well as convergent and divergent validity
internal reliability
is established when a sample of people answer a set of similarly worded items in a consistent way consistency regardless of wording
Association claim verbs
is linked to is at higher risk for is associated with is correlated with prefers are more/less likely to may predict is tied to goes with
measurement reliability
is necessary but not sufficient for measurement validity
Convergent Validity
is the operational definition "specifically" (and only) measuring what we expect? Just like criterion validity, convergent validity asks: does the operational definition correlate with other measures of similar concepts? (GOOD if it does)
Observations
observational research involves watching and recording the behavior of others 1) must be able to collect unbiased data 2) must ensure our presence doesn't affect the behavior being measured
types of surveys
open-ended format closed-ended format likert-scale quick, easy, cheap
closed-ended format
participants must choose the best option Advantage = easy to score & analyze Disadvantage = options might not reflect participants' true responses (their reason might not be listed)
likert-scale
participants must select a level of agreement (most popular in personality scale)
Forced choice
people must pick one of 2 answers
self report
person reports on his or her own behaviors, beliefs, or attitudes
review journal articles
provide a summary of all the published studies that have been done in one research area (meta-analysis, effect-size)
Internal reliability
relevant for measures that use more than 1 item to get at the same construct: a study participant gives a consistent pattern of answers, no matter how the researcher has phrased the questions. A set of items has internal reliability if its items correlate strongly with one another. If they correlate strongly, the researcher can reasonably take an average to create a single overall score for each person. ex) A sample of people take Diener's five-item subjective well-being scale. The questions on the scale are worded differently, but each item is intended to be a measure of the same construct. Therefore, people who AGREE with the 1st item on the scale should ALSO AGREE with the 2nd item (as well as with items 3, 4, and 5). Similarly, people who DISAGREE with the 1st item should ALSO DISAGREE with items 2, 3, 4, and 5. If the pattern is consistent across items in this way, the scale has internal reliability. Correlation-based statistic: Cronbach's alpha - used to see if a measurement scale has internal reliability. First researchers collect data on the scale from a large sample of participants, and then they compute all possible correlations among the items (does item 1 correlate with item 2? how about 1 & 3? 2 & 3? ...). The formula for Cronbach's alpha returns one number, computed from the average of the inter-item correlations and the number of items in the scale. The closer Cronbach's alpha is to 1, the better the scale's reliability (for self-report, an alpha of .70 or higher). If the internal reliability is GOOD, the researchers can average all the items together. If the internal reliability is LOW, the researchers are not justified in combining all the items into one scale. They have to go back and revise the items, or average together only those items that correlate strongly with one another.
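The Cronbach's alpha computation described above can be sketched in Python. This is a simplified illustration with made-up response data, using the standard variance-based form of the alpha formula:

```python
# Sketch of Cronbach's alpha from the variance-based formula:
#   alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores))
# where k is the number of items. The response data below are invented.

def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list per scale item, each holding scores across participants."""
    k = len(items)                                   # number of items
    n = len(items[0])                                # number of participants
    totals = [sum(item[p] for item in items) for p in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))

# Three items that agree perfectly across five participants -> alpha = 1.0,
# so averaging the items into one overall score is justified.
items = [[1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5]]
print(cronbach_alpha(items))  # 1.0
```

With real, noisier data alpha falls below 1; by the rule of thumb above, a self-report scale with alpha under .70 would need its items revised.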
convergent and divergent validity
require showing that a measure correlates in a pattern: more strongly with measures of similar constructs than with measures of dissimilar constructs
criterion validity
requires showing that an operationalization is correlated with outcomes that it should correlate with, according to the understanding of the construct
Correlation coefficient (r)
researchers can use a single number (r) to indicate how close the dots on a scatterplot are to a line drawn through them
explain what the consumer of research and producer of research roles have in common, and describe how they differ
Researchers in both roles require curiosity about behavior, emotion, and cognition. Research producers and consumers share a desire to ask, answer, and communicate interesting questions. Both share a commitment to the practice of empiricism - answering psychological questions with direct, formal observations and communicating with others about what they have learned. The producer role deepens your understanding of psychological inquiry and supports EVIDENCE-BASED treatment; a consumer of research needs to know how to ask the right questions, determine the answers, and evaluate a study on the basis of those answers.
physiological measures
researchers measure biological data, such as heart rate, brain activity, and hormone levels
Ways to measure variables
self-report observational measures physiological measures
semantic differential
similar to likert scale but with opposite adjective anchors
types of measurement reliability
test-retest reliability inter-rater reliability internal reliability
Explain the Directionality Problem & indicate to which rule for establishing causation of the problem is relevant
The Directionality Problem applies when we find a correlation between two variables: we can't know which one causes the other. This is relevant to the Temporal Precedence rule for causation. If we do a correlation analysis, even when the 2 variables are related, we cannot tell which one is causing the other; even if a causal relationship exists, a correlation analysis alone can't tell us the direction of causation. The relevant criterion for establishing causation is Temporal Precedence: establishing which variable precedes the other in time (that variable A occurs before variable B). From a correlation alone, we can't tell which variable came first.
Internal Validity
the ability to rule out alternative explanations for a causal relationship between two variables. Important for Causal claims
Statistical Validity
the extent to which statistical conclusions derived from a study are accurate and reasonable
Generalizability
the extent to which the subjects in a study represent the populations they are intended to represent; how well the settings in a study represent other settings or contexts
Test-retest reliability
the researcher gets consistent scores every time he/she uses the measure ex) People take an IQ test today. When they take it again 1 month later, the scores should be consistent: the pattern should be the same, with people who scored highest at Time 1 also scoring highest at Time 2. Test-retest reliability can apply whether the operationalization is self-report, observational, or physiological. It is primarily relevant when researchers are measuring constructs expected to be stable (intelligence, personality, religiosity). If we find test-retest reliability to be low, it may simply be because the construct does not stay the same over time (e.g., measuring flu symptoms or work stress)
framing
the scale of response options can also bias the data Ceiling effect - an experimental design problem in which independent variable groups score almost the same on a dependent variable, all scores fall at the high end of their possible distribution. Floor effect - an experimental design problem in which independent variable groups score almost the same on dependent variable, all scores fall at the low end of their possible distribution.
negatively-worded questions
use 2+ types of negation in the same question ("impossible," "never"). Participants can be easily confused and answer the opposite of what they intended. Like double-barreled questions, negatively worded questions can reduce construct validity because they might capture people's ability or motivation to do the cognitive work, rather than their true opinions
After an initial study, what are some of the further questions scientists might ask in their research?
why does this occur? when does this happen the most? for whom does this apply? what are the limits?