exam research 312
operationalization
"does helping behavior become diffused?" They hypothesized that participants in a lab would be less likely to help when they believed there were more potential helpers besides themselves. This conversion from research question to experiment design is called...
Each of the *Big Five* can even be defined in terms of six more specific constructs called
"facets" of personality. (The Big Five themselves are openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism.)
• number of orders required to complete counterbalancing is computed using...
"n!" where n is the number of conditions. (exp: 3! = 3x2x1 = 6 possible orders for 3 conditions) (6! = 6x5x4x3x2x1 = 720 possible orders for 6 conditions)
o Specify the four broad steps in the measurement process.
(a) conceptually defining the construct, (b) operationally defining the construct, (c) implementing the measure, and (d) evaluating the measure.
• statistics in single-factor two level design
(means, t-test) For interval- or ratio-scale data, calculate the 2 *means*/averages & compare them using a *t-test*. For nominal-scale data, calculate the proportions for each group & compare them using a chi-square test.
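A minimal sketch of the t-test comparison described above, assuming a between-subjects design and roughly equal variances in the two groups (the pooled-variance form). The function name and the scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic for two independent groups."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error

# Hypothetical scores for the two levels of the IV.
t = independent_t([2, 4, 6], [1, 3, 5])
print(round(t, 2))  # 0.61, with n1 + n2 - 2 = 4 degrees of freedom
```

The statistic would then be compared against a critical value for the chosen alpha level and degrees of freedom.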
• Random-Selected-Orders design
(random counterbalancing) - from the entire set of all possible orders, a subset of orders is randomly selected & each selected order is administered to one participant. Useful when there are a large number of conditions & the number of possible orders exceeds the number of participants
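One way to picture the random-selected-orders procedure is the sketch below. It assumes the number of conditions is small enough to list every order in memory; the function name and the `seed` parameter are illustrative, not part of any standard procedure.

```python
import random
from itertools import permutations

def random_selected_orders(conditions, n_participants, seed=None):
    """Randomly pick one distinct order of conditions per participant."""
    rng = random.Random(seed)
    every_order = list(permutations(conditions))
    # sample() draws without replacement, so no order is used twice.
    # It raises ValueError if n_participants exceeds the number of orders.
    return rng.sample(every_order, n_participants)

# 4 conditions give 24 possible orders; only 10 participants are available.
for order in random_selected_orders(["A", "B", "C", "D"], 10, seed=1):
    print(order)
```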
o (b) operationally defining the construct - creating your own measure (measurement process)
(self-report, behavioral, and physiological measures)- you can modify an existing questionnaire, create a paper-and-pencil version of a measure that is normally computerized (or vice versa), or adapt a measure that has traditionally been used for another purpose. You should strive for simplicity. The need for brevity, however, needs to be weighed against the fact that it is nearly always better for a measure to include multiple items rather than a single item.
simulation studies
(where experiment attempts to reproduce real life events in a lab) often have good internal validity & better external validity than lab experiments (but not as good as field experiments) [greater psychological realism]
*random assignment* must be used in order for a ____
*between-subjects* design to be considered a true experiment
• There are several approaches for dealing with order effects. Each can be considered its own type of within-subjects design
*counterbalancing:* It is pretty easy to counterbalance when you only have two conditions but it gets more complicated with more conditions. With 3 conditions there are now 6 possible orders that you will need to present the conditions in, so you would need some multiple of 6 participants, because you need to have an equal number of participants complete each order for complete counterbalancing
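The count of orders for complete counterbalancing is n!, which can be checked with a short sketch. The function names here are made up for illustration.

```python
from itertools import permutations
from math import factorial

def count_orders(n_conditions):
    """Number of possible orders under complete counterbalancing (n!)."""
    return factorial(n_conditions)

def all_orders(conditions):
    """Every possible order in which the conditions could be presented."""
    return list(permutations(conditions))

print(count_orders(3))  # 6
print(count_orders(6))  # 720
for order in all_orders(["A", "B", "C"]):
    print(order)
```

Because each order needs an equal number of participants, the sample size has to be a multiple of `count_orders(n)`.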
• Repeated measures = ?
*within-group* design: all participants complete all conditions; each participant engages in every condition of the experiment
some form of *counterbalancing* (complete, latin squares, random-selected-order) must be used in order for a____
*within-subjects* design to be considered a true experiment
o a measure can be extremely reliable but have no validity whatsoever. for example...
, imagine someone who believes that people's index finger length reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people's index fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity.
• no-treatment control condition
, participants receive no treatment whatsoever. One problem with this approach, however, is the existence of placebo effects. A placebo is a simulated treatment that lacks any active ingredient or element that should make it effective, and a placebo effect is a positive effect of such a treatment
• block randomization (ABC) (BCA) (ACB)
- a single round of all the conditions is completed, then another & another, for as many rounds as are needed to complete the experiment. Within each round, the order of conditions is randomly determined.
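Block randomization can be sketched as follows. The function name and the `seed` parameter are assumptions added for illustration and reproducibility.

```python
import random

def block_randomize(conditions, n_rounds, seed=None):
    """Return n_rounds blocks; each block holds every condition once, shuffled."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_rounds):
        block = list(conditions)  # copy so the caller's list is untouched
        rng.shuffle(block)        # random order within this round only
        blocks.append(block)
    return blocks

for block in block_randomize(["A", "B", "C"], n_rounds=3, seed=42):
    print(block)
```

Every round contains each condition exactly once, so the conditions stay balanced across the session even though the order within a round is random.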
• Explain what pilot testing is and why it is important
- a small-scale study conducted to make sure that a new procedure works as planned. Participants can be recruited formally (from the participant pool) or informally. It ensures participants understand the instructions, clears up misunderstandings, avoids mistakes, checks whether an indirect manipulation is effective, finds out how long the procedure takes, whether it works properly, & whether data are recorded correctly
• Continuous variables
- variables for which, between any two adjacent scale values, further intermediate values are still possible (height, weight). - limitless possibilities (142.6777768 pounds)
• Example of internal validity
- if people who exercise regularly are happier than people who do not exercise regularly, this implication would not necessarily mean that exercising increases people's happiness.
High internal validity
- a study is high in internal validity if the way it was conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Experiments are high in internal validity because the way they are conducted (with the manipulation of the independent variable and the control of extraneous variables) provides strong support for causal conclusions.
• Single factor two-level design
- includes one independent variable (factor) with only two conditions.
• Nominal level = name (scales of measurement)
- involve meaningful but potentially arbitrary, non-numerical categories. attributes are only named; weakest. involves assigning scores that are category labels to communicate whether any two individuals are the same or different in terms of the variable being measured. (gender, race, hair color, brand of jeans).
• independent variable IV
- is systematically manipulated by the investigator (the presumed causal factor).
• Divergent validity
- the extent to which scores on two measures of different constructs are not related to each other. Scores on a measure of trustworthiness should correlate with scores on a measure of integrity but not with scores on a measure of physical activity
• external validity
- the generalizability of the findings beyond the present study. A concern because experiments are often conducted under conditions that seem artificial.
9) assume again we are conducting a study with 3 conditions (no music, pop music, classical) & we want 10 people in each condition. How many participants will we need to recruit for a *within-subject* design?
10
8) assume we are conducting a study with 3 conditions (no music, pop music, classical) & we want 10 people in each condition. How many participants will we need to recruit for a between-subject design?
30
10) Now assume we are conducting a study with 3 conditions (no music, pop music, classical) & we want 10 people in each condition. How many participants will we need to recruit for a *matched-pairs* design?
30 (same as between-subjects)
6) Dr. E is planning to conduct an experiment to examine the influence of self-esteem on pain tolerance. She plans to manipulate self-esteem by giving false feedback (positive, negative) on an essay. She measures pain by counting the seconds that a participant leaves their arm in an ice bath. -should she use a between-subjects design (random assignment) or a within-subjects design?
A between-subjects
12) what type of statistic would we need to calculate for a study with 3 levels of the IV & DV is the number of words participants were able to recall.
ANOVA & Post hocs
in order to determine whether the difference in the proportions is large enough to be considered statistically significant, that is, unlikely to be just a result of chance, we use a____
Chi-Square Test. It is just like a t-test, but it is used to statistically compare proportions, to determine whether the difference in two proportions or percentages is large enough to be unlikely due to chance alone. So if the dependent variable was measured using a nominal scale (yes/no; like/dislike; remembered/didn't remember), then proportions or percentages are calculated and compared using a chi-square test.
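As a rough sketch of the chi-square comparison, the code below computes the Pearson chi-square statistic for a 2×2 table (two groups, yes/no outcome) without a continuity correction. The function name and the counts are hypothetical; 3.841 is the standard critical value for alpha = .05 with 1 degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected count for each cell = (row total * column total) / grand total.
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_VALUE_05_DF1 = 3.841

# Hypothetical data: group 1 had 30 remember and 20 not; group 2 had 15 and 35.
chi2 = chi_square_2x2(30, 20, 15, 35)
print(round(chi2, 2))                # 9.09
print(chi2 > CRITICAL_VALUE_05_DF1)  # True: the proportions differ significantly
```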
4. _____ is particularly important in a within-groups design:
Counterbalancing
• Ways to standardize the procedure of an experiment
(1) Create a written protocol that specifies everything that the experimenters are to do, (2) create written instructions, (3) automate the procedure by using software packages, (4) train multiple experimenters together, (5) be sure that each experimenter tests participants in all conditions
o Perhaps the most common measure of *internal consistency* used by researchers in psychology is a statistic called...
Cronbach's α (the Greek letter alpha). - is the mean of all possible split-half correlations for a set of items. For example, there are 252 ways to split a set of 10 items into two sets of five. Cronbach's α would be the mean of the 252 split-half correlations. Note that this is not how α is actually computed, but it is a correct way of interpreting the meaning of this statistic.
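Since the card notes that α is not actually computed by averaging split-half correlations, here is a sketch of the usual variance-based formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the response data are made up for illustration.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, each row a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup scores so there is one tuple per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical answers from 5 people on a 4-item scale (1 to 5 ratings).
data = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(data), 2))
```

When every item ranks respondents identically, the item variances make up a small share of the total variance and α approaches 1.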
o Interrater reliability is often assessed using...
Cronbach's α when the judgments are quantitative, or an analogous statistic called Cohen's κ (the Greek letter kappa) when they are categorical.
o (a) conceptually defining the construct (measurement process)
Having a clear and complete conceptual definition of a construct allows you to make sound decisions about exactly how to measure it. If you had only a vague idea that you wanted to measure people's "memory," for example, you would have no way to choose whether you should have them remember a list of vocabulary words, a set of photographs, or a newly learned skill.
o Example ratio
Height measured in meters and weight measured in kilograms are good examples. So are counts of discrete objects or events such as the number of siblings one has or the number of questions a student answers correctly on an exam.
• Example of Latin Square in partial counterbalancing
If we have four conditions, then our Latin square will be 4×4, so we end up with the same number of orders as conditions. At least one participant must complete each order in the square, and an equal number of participants must complete each order, so with 4 conditions you need a multiple of 4 participants.
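A simple way to build such a square is to rotate the list of conditions one step per row. This is only a sketch: the cyclic construction guarantees that each condition appears once in every ordinal position, but it does not balance which condition immediately precedes which. The function name is illustrative.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated left by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

for order in latin_square(["A", "B", "C", "D"]):
    print(order)
# ['A', 'B', 'C', 'D']
# ['B', 'C', 'D', 'A']
# ['C', 'D', 'A', 'B']
# ['D', 'A', 'B', 'C']
```

Each row is one order to administer, so four conditions yield four orders instead of the 24 needed for complete counterbalancing.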
25) A race has been run and the finishing places have been posted (1st, 2nd, 3rd, etc.) along with the times for each runner. What scales of measurement are represented by the data described here?
Ordinal (finishing places) & ratio (times)
how is content validity assessed?
Like face validity, content validity is not usually assessed quantitatively.
7) let's say we are running a study examining the effect of listening to music on memory test performance. One groups listens to music & the other does not. Our DV is__, our IV is ___, & there are __ levels of the IV.
Memory test performance; music, 2
o content validity in new measurements. a new measure must have...
Multiple items, for two reasons. One is a matter of content validity: multiple items are often required to cover a construct adequately. The other is a matter of reliability: people's responses to single items can be influenced by all sorts of irrelevant factors, such as misunderstanding the particular item, a momentary distraction, or a simple error. A new measure must also have clear instructions.
16) Octavia is measuring the type of bed people sleep on (memory foam, coil mattress, water bed), so what type of measurement scale is this?
Nominal
• low internal validity
Nonexperimental research designs (e.g., correlational designs), in which variables are measured but are not manipulated by an experimenter, are low in internal validity.
o Internal consistency example - (can only be assessed by collecting and analyzing data)
On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people's responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioral and physiological measures as for self-report measures.
• Carryover effects - type of order effects
Participating in one condition may influence your behavior in the other condition. (exp) if we give everyone the drug 1st & the placebo 2nd, then the effects of the drug may linger in the participant's system, carrying over to the placebo condition. This may make it look like the drug was not effective when really it was so effective that it continued to have an effect after it was terminated. (drug studies almost always use a between-subjects design.)
• what is used to control for participant characteristics in between-subjects designs?
Random assignment: where each participant has an equal probability of being assigned to any one of the conditions OR within-group design where each participant engages in all conditions
• Ordinal level - scales of measurement
Ranking levels of a variable (contest winner, birth order, drink size SML) Assigning scores so that they represent the rank order of the individuals to show whether any two individuals are the same or different & whether one individual is higher or lower on that variable. (differences are not the same size: large is not twice as big as small)
Why do we need internal validity?
Two variables being statistically related does not necessarily mean that one causes the other. "Correlation does not imply causation." The purpose of an experiment, however, is to show that two variables are statistically related in a way that supports the conclusion that the independent variable caused the differences in the dependent variable. If the researcher creates two or more highly similar conditions and then manipulates the independent variable to produce just one difference between them, then any later difference between the conditions must have been caused by the independent variable.
o (b) operationally defining the construct (measurement process)
Use an existing measure or create your own. Advantages of using an existing measure: (a) you save the time and trouble of creating your own, (b) there is already some evidence that the measure is valid (if it has been used successfully), and (c) your results can more easily be compared with and combined with previous results.
Explain what a psychological *construct* is and give several examples.
Constructs are variables that cannot be observed directly. We cannot accurately assess people's level of intelligence by looking at them, and we certainly cannot put their self-esteem on a bathroom scale. These kinds of variables include personality traits (e.g., extraversion), emotional states (e.g., fear), attitudes (e.g., toward taxes), and abilities (e.g., athleticism).
o Ratio meaning - adds that the same ratio at two places on the scale also carries the same meaning
You can think of a ratio scale as the three earlier scales rolled up in one. Like a nominal scale, it provides a name or category for each object (the numbers serve as labels). Like an ordinal scale, the objects are ordered (in terms of the ordering of the numbers). Like an interval scale, the same difference at two places on the scale has the same meaning.
• Systematic error
a constant amount of error that occurs with each measurement
replication & extension
a new design element is added to the original study (new levels of IV, new IVs, and/or new DVs)
example of demand characteristics
a participant whose attitude toward exercise is measured immediately after she is asked to read a passage about the dangers of heart disease might reasonably conclude that the passage was meant to improve her attitude. As a result, she might respond more favorably because she believes she is expected to by the researcher.
• *manipulation check* - an attempt to directly measure whether the manipulation of the independent variable had its intended effect, in order to assess construct validity. Helps the experimenter determine whether a failure to find an effect (of the IV on the DV) is due to a problem with the manipulation rather than a real lack of effect. -- running one during a pilot study (trial run) can save the experimenter expense/time
a separate measure of the construct the researcher is trying to manipulate. confirm that the independent variable was, in fact, successfully manipulated. determines whether the null result (not significant) is due to a real absence of an effect of the independent variable on the dependent variable or if it is due to a problem with the manipulation of the independent variable
11) how does a single factor two-level design differ from a single factor multilevel design?
a single factor two-level design has only 2 levels of IV (2 conditions) & a multilevel design has more than 2 levels of IV
• pilot study
a trial run of the study, usually conducted with a small number of participants, prior to initiating the actual study. *practice* procedure; *materials:* verify equipment is working or surveys are clear; identify potential problems with manipulation of independent variable & sensitivity of the dependent variable; can identify demand characteristics [materials, ceiling, practice]
• Confounding variable
a variable that varies with the independent variable in such a way that we can no longer determine which one has caused the change in the dependent variable. It varies systematically & therefore becomes a competing explanation for differences in the dependent variable. (decreases internal validity, or the ability to conclude that the IV caused the differences)
• Ratio - scales of measurement
absolute zero - involves assigning scores in such a way that there is a true zero point that represents the complete absence of the quantity. - Contain a zero point that represents the absence of the variable being measured (age, weight, time) [A weight of 0 indicates no weight.]
• confounding variable
an extraneous variable that varies systematically with the IV. they interfere with our ability to say the independent variable caused a change in the dependent variable. they threaten internal validity which is the ability to make causal conclusions.
• Scales of measurement are important because...
they affect how data are analyzed, which statistical tests can be used, & how results are interpreted
5) Dr. C is planning to conduct an experiment to examine the influence of self-esteem on pain tolerance. She plans to manipulate self-esteem by giving false feedback (positive, negative) on an essay. She measures pain by counting the seconds that a participant leaves their arm in an ice bath. -assume that participants with false feedback became angry. What is the *confounding* variable?
anger
Variable (anything that varies)
any factor or attribute that can assume two or more values
Experimenter Expectancy Effect
any influence that the experimenter exerts on participants to support the hypothesis under investigation. (1) Experimenter may actually treat participants differently according to how they think participants should perform. (2) Experimenter may be biased in the way that he records & interprets the behaviors of participants
• extraneous variable
any variable other than the IV that might influence the DV. (gender/women more likely to have depression)
• Extraneous variable
any variable other than the independent variable & dependent variable, that may influence the dependent variable (gender/room conditions)
o Self-report measures
are those in which participants report on their own thoughts, feelings, and actions, as with the Rosenberg Self-Esteem Scale.
direct replication/exact replication
attempt to replicate precisely the procedures of a study to see whether results are reliable. the exact same manipulation of the independent variable is used in the same way. So a researcher who obtains an unexpected finding will often replicate it to make sure it is reliable and not a fluke, or Type I Error.
o Behavioral measures
behavior is observed & recorded. This is an extremely broad category that includes the observation of people's behavior both in highly structured laboratory tasks and in more natural settings. (exp) measuring working memory capacity using the backward digit span task.
• Nominal Scale Data
calculate *proportions* for each group & compare them using a *chi-Square test* for both multilevel and two-level designs
• disadvantages of matched-groups design
can be costly & time-consuming. All variables cannot be the same. Demand characteristics may tip off participants as to what the study is looking for.
• Independent variable = ?
cause, while • Dependent variable = effect (• True experiments - highest internal validity)
• Situational variable
characteristics that differ across environments or stimuli (temperature/light)
• Statistical validity (inferential statistics tests)
concerns the proper statistical treatment of data and the soundness of the researchers' statistical conclusions. There are many different types of inferential statistics tests (e.g., t-tests, ANOVA, regression, correlation) and statistical validity concerns the use of the proper type of test to analyze the data. researchers must consider the scale of measure their dependent variable was measured on and the design of their study. Are confounds controlled?
• disadvantages of within-subjects design
concerns with demand characteristics (participants realize hypothesis). *order effects* occur when responses are affected by order of conditions. *carryover effects* when one condition influences another. *progressive effects* when order effect changes responses from their cumulative exposure to prior conditions (practice/fatigue)
o When the criterion is measured at the same time as the construct, criterion validity is referred to as...
concurrent validity
• complete counterbalancing
conditions of an independent variable are arranged in every possible sequence, & an equal number of participants are assigned to each sequence.
6. A researcher conducts a study examining the alcohol consumption of male and female college students. She brings the male students into the laboratory the week before spring break. She brings the female students into the laboratory the week after spring break. If the researcher is interested in the influence of gender, then the difference in timing of the two groups may be a(n)____
confounding variable
10) A researcher is interested in the effects of exercise on memory test performance. She varies whether or not participants exercise/run for 30 minutes & then administers a memory test. Pretend that the treadmill which is only on for those in the running condition makes a very loud noise. *Noise* is an ___
confounding variable
• How do we choose the best operational definition?
consider practical factors, previous research, validity & reliability, & use multiple definitions sometimes
26) Which phrase is the best description of the concept of reliability of scores on psychological measures of intelligence, personality, and attitudes?
consistency
• Internal consistency
consistency of a measure within itself
o internal consistency
consistency of people's responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people's scores on those items should be correlated with each other.
2) the fact that she used her own made up measure of depression may decrease____
construct validity
6) Length of a desk is an example of a _____ variable
continuous (just like age)
4) Age is an example of a ___ variable
continuous (just like length)
• Define what a control condition is, explain its purpose in research on treatment effectiveness, and describe some alternative types of control conditions.
Participants are randomly assigned to either a treatment condition, in which they receive the treatment, or a control condition, in which they do not receive the treatment. If participants in the treatment condition end up better off than participants in the control condition (for example, they are less depressed, learn faster, conserve more, express less prejudice), then the researcher can conclude that the treatment works. In research on the effectiveness of psychotherapies and medical treatments, this type of experiment is often called a *randomized clinical trial.*
o Criteria can also include other measures of the same construct. For example, one would expect new measures of test anxiety or physical risk taking to be positively correlated with existing established measures of the same constructs. This is known as....
convergent validity.
o Example split-half correlation
correlation between several university students' scores on the even-numbered items and their scores on the odd-numbered items of the Rosenberg Self-Esteem Scale. The correlation coefficient for these data is +.88. A split-half correlation of +.80 or greater is generally considered good internal consistency.
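The odd/even split described above could be computed along these lines. This is a sketch with hypothetical responses; the helper names are invented, and Pearson's r is written out by hand so the block needs only the standard library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_correlation(responses):
    """Correlate each person's odd-item total with their even-item total."""
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)

# Hypothetical answers from 5 people on a 4-item scale.
data = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(split_half_correlation(data), 2))
```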
• Cronbach's Alpha
correlations between each item & every other item are calculated & the average of those correlations is computed
criteria need to be met in order to conclude *causation* (that X causes Y)
(1) *covariation* of X & Y - as X varies, Y varies. IV is manipulated to determine whether it causes DV to vary across conditions. (2) *temporal order* - the variation of X occurs before the variation of Y. IV is manipulated 1st, then DV is measured. (3) *absence of plausible alternative explanations* - extraneous variables are controlled to ensure they don't become confounding variables (competing explanations)
• Operational definitions
defining a variable in terms of the procedures used to measure or manipulate it (makes concepts public & allows other scientists to study the same constructs the same way). (not vague, but specific)
• Face validity - is the extent to which a measurement method appears "on its face" to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities.
degree to which the items on a measure appear to measure the construct of interest. Appears to measure what it is supposed to.
• mundane realism
degree to which the materials & procedures involved in an experiment are similar to events that occur in the real world (high ecological validity). Designed to be like real life events [reduces artificiality]. (exp) a study of shopping decisions would be high in mundane realism if researchers studied the decisions of ordinary people doing their weekly shopping in a real grocery store. But if they observed people in a lab, then it would have *low external* validity.
• *Construct validity* -another element to scrutinize in a study is the quality of the experiment's manipulations.
degree to which the measure actually measures what it is intended to measure. -- Or the degree to which the constructs that the researchers claim to be studying are, in fact, the constructs that they are truly manipulating and measuring. (relates the quality of the operational definition of the construct & quality of manipulation of IV)
7) a study looks at the effect of a chaotic environment (garbage cans overflowing/no cleaning) on cognitive functioning. researchers were concerned that the research environment may have been a ____ because participants could have noticed that the lab was intentionally manipulated.
demand characteristic
Researchers conducted a study examining the effects of a pill designed to reduce anxiety on self-reported anxiety. Patients who received the sugar pill reported decreases in anxiety. The researchers reported their findings and suggested that there may have been a _________;
placebo effect
9) A researcher is interested in the effects of exercise on memory test performance. She varies whether or not participants exercise/run for 30 minutes & then administers a memory test. Further assume there is noise by a construction crew nearby. *Memory* is an ___
dependent variable
Distinguish *conceptual definition* from operational definitions, give examples of each, and create simple operational definitions
describes the behaviors & internal processes that make up that construct, & how it relates to other variables. (exp) neuroticism is people's tendency to experience negative emotions such as anxiety, anger, and sadness. it has a strong genetic component, remains fairly stable over time, & is positively correlated with the tendency to experience pain
• The proper statistical analysis should be conducted on the data to determine whether the...
difference or relationship that was predicted was found. The number of conditions and the total number of participants will determine the overall size of the effect. With this information, a power analysis can be conducted to ascertain whether you are likely to find a real difference
• between-subject designs
different participants are randomly assigned to different groups. To ensure that participant characteristics do not become confounds, researchers use [Random assignment], a procedure in which each participant has an equal probability of being assigned to any one of the conditions in the experiment (flip a coin/draw out of a bag)
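Simple random assignment (the coin-flip idea above) can be sketched like this. The function name, participant labels, and `seed` parameter are illustrative assumptions.

```python
import random

def random_assignment(participants, conditions, seed=None):
    """Give each participant an equal chance of landing in any condition."""
    rng = random.Random(seed)
    # choice() picks uniformly, independently for every participant,
    # so group sizes are not guaranteed to come out equal.
    return {person: rng.choice(conditions) for person in participants}

people = [f"P{i}" for i in range(1, 11)]
for person, condition in random_assignment(people, ["music", "no music"], seed=7).items():
    print(person, condition)
```

If equal group sizes matter, block randomization is the usual alternative to independent draws.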
conceptual replication
different procedures (operational definitions, manipulations of IV) are used to try to address the same research question. discover whether a relationship between conceptual variables exists.
3) Number of words recalled by an individual is an example of a ___ variable
discrete
the price of a 6-pack of Coke is an example of a ___ variable
discrete (like numbers)
• counterbalancing [complete, partial, or random]
disrupts any systematic relationship between conditions & the order of presentation. Order effects can still occur but they are no longer systematic-- they are reduced to an extraneous variable rather than a confound
pros of Interval - scales of measurement
distance is meaningful. Involves use of real numbers designating real amounts that reflect relative differences in magnitude (temperature) - [the difference between 1 and 2 is the same as the difference between 2 and 3] values are separated by equal intervals, but the scale does not have a true zero point
• random sampling (probability sampling)
each member of a population has an equal chance of being selected from the population to participate in the study. used to select a sample that is representative of the population [to increase external validity]
• advantages of matched-groups design
ensures groups are equivalent without the disadvantages of within-subjects designs. (exp) assess depression using a pretest. We could then rank order people with respect to their levels of depression. We would take the two most depressed people and randomly assign each to a different condition
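The pretest-and-pair procedure in the example can be sketched for two conditions as below. The function name, condition labels, and scores are hypothetical, and the sketch simply leaves out an unpaired participant when the sample size is odd.

```python
import random

def matched_pairs_assignment(pretest_scores, seed=None):
    """pretest_scores: dict of participant -> pretest score (e.g., depression)."""
    rng = random.Random(seed)
    # Rank participants from highest to lowest pretest score.
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    assignment = {}
    # Walk down the ranking two at a time; each adjacent pair is a matched pair.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the pair
        assignment[pair[0]] = "treatment"
        assignment[pair[1]] = "control"
    return assignment

scores = {"P1": 30, "P2": 28, "P3": 12, "P4": 10}
print(matched_pairs_assignment(scores, seed=1))
```

Each matched pair contributes one person to each condition, so the groups start out equivalent on the pretest variable.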
4) Researcher is planning to conduct an experiment to examine the influence of self-esteem on pain tolerance. She plans to manipulate self-esteem by giving false feedback (positive, negative) on an essay. She measures pain by counting the seconds that a participant leaves their arm in an ice bath. -assume that participants with false feedback became angry. What is the *extraneous* variable?
essay writing
researchers are replicating an earlier experiment that indicated that participants who received task-specific feedback were more likely to persist at a task than general feedback. in an effort to ensure participants are not treated differently, researchers automate all procedures & follow written instructions. researchers are trying to minimize_____
experimenter expectancy effects
• Not all experiments are low in external validity because...
experiments do not need to be artificial & can simulate real life. They are often conducted to learn about psychological processes that are likely to operate in a variety of people and situations.
o Discriminant validity
extent to which scores are not correlated with measures of variables that are conceptually distinct. (exp) self-esteem is not the same as mood, which is how good or bad one happens to be feeling right now. So people's scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.
• Convergent validity
extent to which scores on the measure correlate with scores on the other measures of the same construct (we are simply examining the extent to which scores on two different measures that are not established relate to each other)
• population validity
extent to which the findings can be generalized to other populations. generalized from the specific sample it was conducted on to the larger population from which the sample was drawn. (generalized to other countries and cultures.)
• Concurrent validity
extent to which the measure predicts a simultaneous behavior. A gold-standard established measure is used to demonstrate this __. (we examine the extent to which scores on our measure correlate with scores on an already established gold-standard measure of the construct.)
8. A researcher designs a study whose procedures and setting resemble a classroom setting. The researcher is confident that the results of the study will closely approximate what will be observed in the real world regarding the relationship between the independent and dependent variables. He believes the study is high in ______
external validity
8) A researcher is interested in the effects of exercise on memory test performance. She varies whether or not participants exercise/run for 30 minutes & then administers a memory test. Further assume there is noise by a construction crew nearby. *Noise* is an ___
extraneous variable
• advantages of within-subjects design
fewer participants are required (small sample size). Groups are guaranteed to be equivalent in every way. Decreased random variability (less error). Increased power [our ability to find the effects of interest] • if I ran two studies that were the same in every way except one used a between subjects design and one used a within subjects design the effect may only be *significant* in the study using the within subject design.
• Random measurement error
fluctuations in the measuring situation that cause the obtained scores to deviate from a true score: observer error (scoring/grading error); environmental (light/noise), participant (fatigue), faulty equipment
3. Random assignment ensures that participants_____
have an equal probability of being assigned to any of the experimental conditions
field studies
have excellent external validity but have low internal validity. experimenters take advantage of naturally occurring circumstances and naturally occurring differences, where experiments are carried out in natural environments
lab experiments
have good internal validity but often suffer from problems with external validity. As the artificiality of the experiment increases, internal validity increases.
*converging operations*
when psychologists use multiple operational definitions of the same construct, either within a study or across studies. Various operational definitions are coming together on the same construct. Scores on several different operational definitions are closely related & produce similar patterns of results
A good operational definition will lead to...
high construct validity. A poor operational definition of the construct will decrease construct validity
high mundane realism = ?
high ecological validity
example of Content validity = attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that he or she thinks positive thoughts about exercising, feels good about exercising, and actually exercises
if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then his measure of test anxiety should include items about both nervous feelings and negative thoughts.
when a measure is not sensitive...
if all scores are at the maximum or minimum score level then it will not be possible to detect differences between the groups
• advantages of manipulation check
(1) if a pilot study shows that your manipulation was not effective/not strong enough, you have saved the expense of running the actual experiment. (2) if you get non-significant results (fail to find a difference between your 2 groups, i.e., no effect of the independent variable on the dependent variable), the manipulation check helps identify whether the failure is due to a problem with the manipulation or to something else, like a true absence of an effect
• Evaluate studies in terms of their external validity
if the way it was conducted supports generalizing the results to people and situations beyond those actually studied. when the participants and the situation studied are similar to those that the researchers want to generalize to and participants encounter every day, often described as *mundane realism.*
o Example inter-rater reliability
if you were interested in measuring university students' social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student's level of social skills.
• wait-list control condition
in which participants are told that they will receive the treatment but must wait until the participants in the treatment condition have already received it. This disclosure allows researchers to compare participants who have received the treatment with participants who are not currently receiving it but who still expect to improve (eventually).
• Single factor *multilevel* design
includes one independent variable with more than 2 levels/conditions (3+).
ways to reduce demand characteristics/expectation issues
increase psychological realism, pilot test (trial run/practice), use physiological measure for DV, use between-subjects design so participants only exposed to 1 condition, deceive participants about true purpose of study, manipulation of knowledge of hypothesis, interview participants during the debriefing to see if they were correct about hypothesis
• between-groups design uses what statistic?
independent groups t-test
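(exp) A minimal sketch of how the independent groups t statistic can be computed by hand, assuming equal-variance (pooled) groups. The function name `independent_t` and the scores are made up for illustration; a stats package would normally also report the p-value.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, df) for two independent groups using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means.
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Hypothetical scores for two conditions (different people in each group).
t, df = independent_t([2, 4, 6], [1, 3, 5])
print(t, df)
```

The t value is then compared against the critical value for `df` degrees of freedom to decide whether the two means differ significantly.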
7) A researcher is interested in the effects of exercise on memory test performance. She varies whether or not participants exercise/run for 30 minutes & then administers a memory test. Further assume there is noise by a construction crew nearby. Exercise is an ___
independent variable
o Example of retest reliability
intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.
17) If two psychologists use the same IQ test to assess Bob's IQ & one finds that he has an IQ of 116 & the other finds that Bob has an IQ of 98, then the test demonstrates poor ____.
inter-rater reliability
• Describe several strategies for recruiting participants for an experiment.
interest in topic, education, need for approval, high IQ, social, high social class. use participants from a formal *subject pool*—an established group of people who have agreed to be contacted about participating in research studies.
3) if participants were not randomly assigned to the conditions but rather chose whether or not they would exercise, then ______ is threatened.
internal validity
two researchers were studying the effects of intelligence on coping strategies. One of the researchers was concerned that her measure of intelligence (grade point average in high school) might not accurately represent intelligence. The researcher is concerned about _________
construct validity
12) John is measuring the number of hours people sleep using the following scale: 0-3 hours; 3-6 hours; 6-9 hours; 9-12 hours. What type of measurement scale is this?
interval
• Interval - scales of measurement
involves assigning scores using numerical scales in which intervals have the same interpretation throughout. (exp: Fahrenheit or Celsius temperatures.) The difference between 30 degrees & 40 degrees represents the same temperature difference as the difference between 80 degrees & 90 degrees. This is because each 10-degree interval has the same physical meaning
operational definition
is a definition of a variable in terms of precisely how it is to be measured. = Self-report measures, Behavioral measures, & physiological measures
treatment
is any intervention meant to change people's behavior for the better. This intervention includes psychotherapies and medical treatments for psychological disorders but also interventions designed to improve learning, promote conservation, reduce prejudice, and so on.
• dependent variable DV
is measured to determine the effect of the other variable (presumed effect)
• Content validity
is the extent to which a measure "covers" the construct of interest. - the degree to which the items on a measure adequately represent the entire range or set of items that could have been appropriately included
• Criterion validity (variables = criteria)
is the extent to which people's scores on a measure are correlated with other variables that one would expect them to be correlated with. -addresses the ability of a measure to predict an outcome.
o Validity
is the extent to which the scores from a measure represent the variable they are intended to. But how do researchers make this judgment? We have already considered one factor that they take into account—reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to.
o There are several precautions you can take to minimize demand characteristics
make the procedure as clear and brief as possible & guarantee participants' anonymity. You can even allow them to seal completed questionnaires into individual envelopes or put them into a drop box where they immediately become mixed with others' questionnaires. Although informed consent requires telling participants what they will be doing, it does not require revealing your hypothesis or other information that might suggest to participants how you expect them to respond.
6) in a pilot study assessing the effects of fear on attention, a researcher makes a very loud noise while participants answer questions about their comprehension. then they are asked their fear level due to the noise. but they say they were not afraid because the noise was not loud. researcher decides to produce a louder noise. this is an example of a ___
manipulation check
Learn the key features of true experimental designs
manipulation, measurement, control (MMC) - 1+ independent variables are *manipulated* to produce 2+ conditions. A dependent variable is *measured* & compared across conditions. either random assignment or repeated measures (within) design is used to assign participants to conditions & all other aspects are *controlled* to ensure extraneous variables are not confounding
• statistics in single-factor *multilevel* design =
means compared using ANOVA. run post hoc tests if ANOVA is significant to determine which means differ from each other
o (c) implementing the measure (measurement process)
measure in a way that maximizes its reliability and validity.- test everyone under similar conditions that, ideally, are quiet and free of distractions.
what does measurement require?
measurement does not require any particular instruments or procedures. It does not require placing individuals or objects on bathroom scales, holding rulers up to them, or inserting thermometers into them. What it does require is some systematic procedure for assigning scores to individuals or objects so that those scores represent the characteristic of interest.
• Disadvantages of between-subject designs
more participants are required (than other designs). Random assignment does not always work to produce equivalent groups (in small samples). Increased random variability [error variance] due to differences in participant variables, which decreases power (our ability to find the effects of interest)
• Disadvantage of single factor multilevel design
more time and resources are typically required to conduct study. If each group has different people in it then more participants are required. A researcher is likely to use as strong a manipulation as possible, which increases the likelihood that a significant difference will be found
• Latin Square Design (Partial Counterbalancing)
n x n matrix [number of positions in a series x number of orders] in which each condition appears only once in each column & each row. Constructed so that: each condition appears at each ordinal position & each condition precedes/follows each other condition one time. provides a technique to control for order effects without including all possible orders
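(exp) A sketch of one common way to build a balanced Latin square like the one described above. This simple zigzag construction is an assumption chosen for illustration and only works for an even number of conditions; the function name `balanced_latin_square` is made up, and conditions are numbered 0 to n-1.

```python
def balanced_latin_square(n):
    """Return n orders (rows) of conditions 0..n-1; n must be even."""
    if n % 2 != 0:
        raise ValueError("this simple construction needs an even number of conditions")
    # First order zigzags between low and high conditions: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    for position in range(1, n):
        if position % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Every later order shifts each condition in the first order up by one (wrapping around).
    return [[(condition + shift) % n for condition in first] for shift in range(n)]


# 4 conditions need only 4 orders instead of 4! = 24.
for order in balanced_latin_square(4):
    print(order)
```

Each row is the order given to one participant (or one group of participants). Each condition shows up once in every ordinal position, and each condition comes immediately before every other condition exactly once.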
• advantages of between-subject designs
no concerns that participation in one condition will influence another condition. Fewer concerns with demand characteristics/that participant will know hypothesis. fewer worries with progressive effects (practice & fatigue effect: they may get tired/takes longer)
• Discrete variable
no intermediate values are possible between two adjacent values (gender, number of students) --cannot divide in half/only whole numbers (can't say half a pet)
23) When one describes a variable in terms of the procedures used to measure or manipulate it this is called establishing the_____
operational definition
13) A scientist is measuring hours of sleep using the following scale: 0-2 hours; 2-6 hours; 6-12 hours. What type of scale is used?
ordinal
15) Malfoy is measuring sleep quality by asking people to report whether they slept well, ok, or poorly. What type of measurement scale is this?
ordinal (like sleep)
o Cons of ordinal level
ordinal scales fail to capture important information that will be present in the other levels of measurement. for example, the difference between the responses "very dissatisfied" and "somewhat dissatisfied" is probably not equivalent to the difference between "somewhat dissatisfied" and "somewhat satisfied."
Dr. Sade is planning to conduct an experiment to examine the influence of self-esteem on pain tolerance. She plans to manipulate self-esteem by giving false feedback (positive, negative) on an essay. She measures pain by counting the seconds that a participant leaves their arm in an ice bath. What is the *dependent* variable?
pain tolerance
1) Race is an example of which variable?
participant variable & qualitative variable
• matched-groups design
participants are matched on one or more characteristics & each individual in the matched set is randomly assigned to complete a different condition of the experiment. the matching variable is typically the DV or something closely related to the DV (a potentially confounding variable).
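(exp) A rough sketch of the matching procedure: rank people on a pretest, take them off the top in matched sets, and randomly split each set across conditions. The function name `matched_assignment`, the participant IDs, and the pretest scores are all hypothetical.

```python
import random


def matched_assignment(pretest_scores, conditions=("treatment", "control"), seed=None):
    """Rank participants by pretest score, then randomly assign within each matched set."""
    rng = random.Random(seed)
    # Highest pretest score first, so neighbors in the list are the closest matches.
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    set_size = len(conditions)
    usable = len(ranked) - len(ranked) % set_size  # leftovers without a full set are skipped
    assignment = {}
    for start in range(0, usable, set_size):
        matched_set = ranked[start:start + set_size]
        shuffled_conditions = list(conditions)
        rng.shuffle(shuffled_conditions)  # random assignment inside the matched set
        for person, condition in zip(matched_set, shuffled_conditions):
            assignment[person] = condition
    return assignment


# Hypothetical depression pretest scores.
scores = {"P1": 30, "P2": 28, "P3": 21, "P4": 20, "P5": 12, "P6": 11}
print(matched_assignment(scores, seed=1))
```

Because the two most similar people always end up in different conditions, the groups start out roughly equal on the matching variable.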
o Criterion validity example
people's scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people's scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people's test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.
• Participant variable
personal characteristics that differ across individuals (gender, race, age, drug use, etc.)
Explain what internal validity is... and why experiments are considered to be high in internal validity
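plausible alternative explanations are ruled out. Experiments are high in internal validity because manipulation & control (random assignment) rule out confounds. Confounds diminish internal validity.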
1) Dr. S is conducting a study to examine exercise in reducing depression. her participants exercise for 30 minutes daily for 1 month. she used students in her sample, which limits her ability to generalize to more severely depressed patients. this issue concerns___.
population validity
18) If Raven takes an IQ test when she is 6 & it is found to predict her performance in high school, then the IQ demonstrates good ___.
predictive validity
when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as...
predictive validity (because scores on the measure have "predicted" a future outcome).
• Measurement -
process of systematically assigning values (numbers, labels or symbols) to represent attributes of organisms, objects, or events. --is the assignment of scores to individuals so that the scores represent some characteristic of the individuals.
2) Room temperature is an example of a...?
quantitative variable & environmental variable
• random assignment
randomly assigning individuals from the sample to participate in each of the experimental conditions. used to create equivalent groups [to increase *internal* validity]
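(exp) A minimal sketch of random assignment: shuffle the sample, then deal people out to the conditions in turn so the groups end up the same size (or within one person of each other). The function name `randomly_assign` and the participant IDs are made up for illustration.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in rotation."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # chance alone decides who lands in which position
    return {
        person: conditions[index % len(conditions)]
        for index, person in enumerate(shuffled)
    }


sample = ["P1", "P2", "P3", "P4", "P5", "P6"]
print(randomly_assign(sample, ["exercise", "no exercise"], seed=42))
```

Since the shuffle is random, participant variables (age, mood, IQ, etc.) should be spread about evenly across conditions, especially in larger samples.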
o The items in *ordinal* scale are ranked/ordered...
ranging from *least to most satisfied.* - Unlike nominal scales, ordinal scales allow comparisons of the degree to which two individuals rate the variable. For example, our satisfaction ordering makes it meaningful to assert that one person is more satisfied than another with their microwave ovens.
• Inter-rater reliability
ratings of research assistants are correlated with each other
11) If we measured the amount of water people drank in a day in milliliters then the measurement scale would be___
ratio
14) Clark is measuring the number of hours people sleep by asking them to report it. what type of measurement scale is this?
ratio (like amount of water)
o physiological measures
recording any of a wide variety of biological processes, including heart rate and blood pressure, galvanic skin response, hormone levels, and electrical activity and blood flow in the brain.
• There are two distinct criteria by which researchers evaluate their measures:
reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
• within-groups design uses what statistic?
repeated measures t-test
• Quantitative variables
represent properties that differ in amount (age, weight, volume)
• Qualitative variables
represent properties that differ in type (hair color, religion)
• Example of manipulation check
researchers trying to manipulate participants' stress levels might give them a paper-and-pencil stress questionnaire or take their blood pressure—perhaps right after the manipulation or at the end of the procedure—to verify that they successfully manipulated this variable.
• Advantage of multilevel design
save the time, expense and resources of running several different experiments and it allows us to detect curvilinear relationships.
floor effect
scores on a dependent variable bunch up at the *minimum* score level (very difficult test)
ceiling effect
scores on a dependent variable bunch up at the maximum score level (extremely easy test).
Dr. Blank is planning to conduct an experiment to examine the influence of self-esteem on pain tolerance. She plans to manipulate self-esteem by giving false feedback (positive, negative) on an essay. She measures pain by counting the seconds that a participant leaves their arm in an ice bath. What is the *operational definition* of the dependent variable?
seconds arm held in ice bath
• Categories of measurement
self report/survey --physiological (heartbeat, cortisol, MRI) --behavioral [rate/reaction time]
1) Dr. Sade is planning to conduct an experiment to examine the influence of self-esteem on pain tolerance. She plans to manipulate self-esteem by giving false feedback (positive, negative) on an essay. She measures pain by counting the seconds that a participant leaves their arm in an ice bath. What is the independent variable?
self-esteem
20) What does a picture of a target with all the arrows at the same spot, but not in the middle (not where they are supposed to be), represent?
something that is reliable but not valid
22) What does a picture of a target with random arrows, missed everywhere, represent?
something that is unreliable and not valid
split-half correlation (for Internal consistency)
splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined.
• Understand *reliability* and learn about the different ways reliability can be assessed
stability or consistency of a measure. Produce same results again & again. 3 types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).
demand characteristics
subtle cues that reveal how the researcher expects participants to behave. refers to cues that influence participants' beliefs about the hypothesis being tested & the behaviors expected of them. Responses affected by features of settings & procedures, along with info & rumors about the study.
• Psychological constructs
such as intelligence, self-esteem, and depression are variables that are not directly observable because they represent behavioral tendencies or complex patterns of behavior and internal processes. An important goal of scientific research is to conceptually define psychological constructs in ways that accurately describe them.
13) what type of statistic would you need to calculate for a study on self-esteem (low vs. high) on pain tolerance (seconds arm held in ice)?
t-test
19) If two groups of students complete my new DFAQ_CU inventory on two different occasions & their scores on the two occasions are found to correlate highly, then my inventory has demonstrated good___.
test-retest reliability
sensitivity
the ability to detect an effect that is actually present. Sensitive measures can detect differences or changes that occur in response to a manipulation of the independent variable.
o Test-retest correlation example
the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart. The correlation coefficient for these data is +.95. In general, a test-retest correlation of *+.80* or greater is considered to indicate good reliability. (assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. ).
Internal validity
the degree to which we can be confident that a study demonstrated that one variable had a causal effect on another variable. Our ability to draw conclusions about causal relationships from data
• Parallel forms of reliability
the extent to which scores on two different but equivalent versions of the same measure, administered to the same participants, at two different times, correlate with each other
• Test-retest reliability
the extent to which scores on the same measure, administered to the same participants, at two different times, under equivalent conditions, correlate with each other. --a construct must be consistent across time
temporal validity
the extent to which the findings can be generalized to other points in time (other seasons, past years, future years). it may be more difficult to generalize the results of a study that was conducted 50 years ago than to generalize the results of a study conducted 5 years ago.
• Predictive validity
the extent to which the measure accurately predicts a later outcome or future behavior
o Assessing test-retest reliability requires using...
the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at the correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing the correlation coefficient.
replication
the process of repeating a study in order to determine whether the original findings will be upheld. It is a self-correcting mechanism that ensures that methodological flaws are eventually discovered.
• Split-half reliability
the test items are divided in half, scores on each half are calculated, and the correlation between the scores on the two halves is computed (indicator of internal consistency)
o how do researchers know that the scores actually represent the characteristic, especially when it is a construct like intelligence, self-esteem, depression, or working memory capacity?
they conduct research using the measure to confirm that the scores make sense based on their understanding of the construct being measured. Psychologists do not simply assume that their measures work. Instead, they collect data to *demonstrate* that they work.
o Cons of interval
they do not have a true zero point even if one of the scaled values happens to carry the name "zero." Zero degrees Fahrenheit does not represent the complete absence of temperature, so it does not make sense to compute ratios of temperatures.
o Stevens's levels of measurement are important because...
(1) they emphasize the generality of the concept of measurement. Categorizing or ranking individuals is a measurement, as long as the categories or ranks represent some characteristic of the individuals. (2) levels of measurement can serve as a rough guide to the statistical procedures that can be used with the data
Learn about placebo effects and experimenter expectancy effects and how to control for them
training experimenters in a consistent manner, automating procedures (instructions recorded by a computer), run all conditions simultaneously so that experimenter's behavior is the same for all participants; *replication* helps ensure that results due to bias are identified
• Measured score = ?
true score + random error
• independent groups design
type of between-subjects design where participants are randomly assigned to the various conditions of the experiment
• Constructs
underlying, hypothetical characteristics or processes that cannot be directly observed but instead are inferred from measurable behaviors or outcomes. (problem) some cannot be observed directly, like attraction. So researchers rely on systematic empiricism/making observations (solution: operational definitions)
• In order to detect curvilinear relations, anything other than a strictly linear relationship, the researcher must...
use more than 2 conditions; that is, he or she must use more than 2 levels of the independent variable. (exp) compare 3 groups of participants: one that receives the new drug, one that receives Paxil, and one that receives a placebo.
• Understand how and why researchers operationally define their variables
variables like word length & weight are easy to measure. But abstract vague constructs like attraction, love, aggression are more difficult to define, which makes them harder to measure
True Score (Reliability)
what you would obtain if there was no error in the measure
• *psychological realism* - degree to which the experimental setting is made psychologically involving for participants, thereby increasing the likelihood that they will behave naturally. (Milgram's study). if the experiment has an impact on the participants, participants become immersed in the situation and behave naturally.
where the same mental process is used in both the laboratory and in the real world. If the students judged purple to be more appealing than yellow, the researchers would not be very confident that this preference is relevant to grocery shoppers' cereal-buying decisions because of low external validity but they could be confident that the visual processing of colors has high psychological realism.
• Errors in validity & reliability
with reliability we worry about random measurement error, while with validity we worry about systematic error (bias)
1. An experimenter conducts a study with three different conditions. The experimenter ensures that participants engage in every condition of the experiment. The experimenter is using a______
within-groups design
what experiments are often used to determine whether a treatment works?
• Between-subjects designs
what is true about both reliability & validity?
• Reliable measures do not need to be valid (bathroom scale has consistent results but cannot measure IQ) • Valid measures must be reliable