Psych 302 Test II
Alpha Level
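The probability threshold (commonly .05) below which a result is treated as statistically significant. It is also the probability of making a Type I error, i.e., concluding there is an effect when the result actually occurred by chance.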
Z - Score
How many standard deviations a single score is above or below the mean
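A minimal sketch of the calculation, assuming a small made-up set of scores. The function name `z_score` and the data are illustrative only, and the population standard deviation is an assumed choice (a sample SD is also common).

```python
from statistics import mean, pstdev

def z_score(score, scores):
    """Return how many standard deviations `score` lies above or below the mean of `scores`."""
    # Assumption: population SD (pstdev); statistics.stdev would give the sample SD instead.
    return (score - mean(scores)) / pstdev(scores)

# Hypothetical exam scores, invented for illustration.
exam_scores = [70, 80, 90]
z = z_score(90, exam_scores)
print(round(z, 2))
```

A positive result means the score is above the mean, a negative result means it is below, and 0 means it equals the mean.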
F Test
Used in ANOVA for studies that compare two or more group means; it compares the variability between groups with the variability within groups.
Purposive Sampling
If researchers want to study only certain kinds of people, they only recruit those particular participants. When it is done in a nonrandom way, it's called purposive. ii. Limiting a sample to only one type of participant does not make a sample purposive.
Wait it Out
A researcher who plans to observe at a school might let the children get used to his or her presence until they forget about being observed
Oversampling
A variation of stratified sampling in which the researcher intentionally over-represents one or more groups. ii. Ex: If a researcher has 1000 participants and wants to have a representative percentage of South Asians (4% of the total population), 40 individuals might not be enough to have an accurate representation of how those individuals actually feel/react. Instead, the researcher might oversample this group. iii. In a survey that includes an oversampled group, the results are weighted back to the group's actual proportion in the population.
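The weighting step can be sketched as follows. This is only an illustration: the oversampled count of 200 is a made-up number, and the function name `poststratification_weights` is hypothetical. Each group's weight is its population share divided by its share of the sample.

```python
def poststratification_weights(population_share, sample_counts):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (sample_counts[group] / total)
        for group in sample_counts
    }

# 4% comes from the note above; 200 of 1000 is an assumed oversample.
population_share = {"south_asian": 0.04, "other": 0.96}
sample_counts = {"south_asian": 200, "other": 800}

weights = poststratification_weights(population_share, sample_counts)
for group, weight in weights.items():
    print(group, round(weight, 2))
```

With these assumed numbers, each oversampled respondent counts for less than one person and everyone else counts for slightly more, so the weighted sample matches the 4% / 96% split.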
Writing Well-Worded Questions
A. Avoid Leading Questions: B. Opt for Simplicity: C. Avoid Double-Barreled Questions D. Avoid Negative Wording: c. Can lower construct validity E. Question Order G. Using Shortcuts H. Trying to Look Good: i. Ex: Implicit Association Test I. Halo Effect
Cross Cultural Research
A. Translational: Research that uses knowledge derived from basic research to develop and test solutions to real-world problems. B. Ecological Validity: Cultural Differences in behavioral norms, worldviews etc. C. Differences across physiological measurement systems (machine related error)
Measure the Behavior's Results
Another way to avoid reactivity is to use unobtrusive data. i. Instead of measuring behavior directly, researchers measure the traces that a behavior leaves behind. ii. Ex: In a museum, wear-and-tear on the flooring can signal which areas of the museum are the most popular, and the height of smudges on the windows can indicate the ages of visitors; the number of empty liquor bottles in residential garbage cans can indicate how much alcohol is being consumed in a community.
Observational Measures
B. Observational Measures: Sometimes called a behavioral measure, operationalize a variable by recording observable behaviors or physical traces of the behaviors. a. Ex: A researcher could operationalize happiness by observing how many times a person smiles b. Type of Reliability: Interrater; not often but maybe. c. Validity i. Criterion: predictive of behavior
Types of Variables - Outcome
B. Outcome a. Dependent Variable i. The variable being measured. b. Criterion i. The extent to which a measure is related to an outcome
Larger Samples Don't Mean...
C. LARGER SAMPLES ARE NOT NECESSARILY MORE REPRESENTATIVE. a. If the sample is selected using probabilistic methods, then precision increases with sample size (smaller margin of error). b. Sample size is a statistical validity issue c. Sampling strategy is an external validity issue
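Point a. can be illustrated with the standard margin-of-error formula for a proportion. This is a sketch that assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5; the function name is illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Precision improves as n grows, but only if the sample was drawn probabilistically.
for n in (100, 1000, 10000):
    print(n, round(margin_of_error(n), 3))
```

The formula says nothing about how the sample was recruited, which is why a large but biased sample is still not representative.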
Physiological Measures
C. Physiological Measures: Operationalize a variable by recording biological data such as brain activity, hormone levels, or heart rate. a. Usually require the use of equipment to amplify, record, and analyze biological data. i. (EMG) - a way of electronically recording tiny movements in the face. ii. fMRI scans show relatively less brain activity for complex problems in people who are more intelligent. b. Ex: A physiological way to operationalize people's level of stress might be to measure the amount of the hormone cortisol that is released in their saliva. c. Type of Reliability: Test-retest reliability. (Phantom scans ensure reliability of scanners; Human trials - different trials within the same visit). d. Validity: i. Convergent ii. Discriminant iii. Criterion: Concurrent, predictive, or postdictive
Coding Responses
Clear rating scales and codebook
Cluster Sampling
Clusters of participants within a population of interest are randomly selected and then all individuals in each selected cluster are used i. Ex: Researcher starts off with a list of colleges (clusters) in a state and randomly selects five of those colleges (clusters) and then includes each student within those five colleges in the sample (Representative).
Likert
Contains more than one item and is anchored by the terms "strongly agree" and "strongly disagree." Commonly used measure of self-esteem.
Cronbach's Alpha
Also called coefficient alpha. A correlation-based statistic used to check internal reliability. The formula returns one number, computed from the average of the inter-item correlations and the number of items in the scale. The closer Cronbach's alpha is to 1, the better the scale's reliability. For self-report measures, the expected Cronbach's alpha is .70 or higher.
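One common form that uses exactly those two ingredients is the standardized alpha, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the average inter-item correlation. The sketch below assumes that form; the function name and the example values are made up for illustration.

```python
def standardized_alpha(mean_inter_item_r, n_items):
    """Standardized Cronbach's alpha from the average inter-item correlation and item count."""
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)

# Hypothetical scales with the same average inter-item correlation (.30).
alpha_10_items = standardized_alpha(0.30, 10)
alpha_5_items = standardized_alpha(0.30, 5)
print(round(alpha_10_items, 2), round(alpha_5_items, 2))
```

With the same average correlation, the longer scale has the higher alpha, so the number of items matters as well as how strongly the items correlate.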
Simple Random Sampling
Each person in a population has an equal chance of being chosen for the sample
Observer Effects
Expectancy Effects, this phenomenon can occur even in seemingly objective observations. The observers actually change the behavior of those they are observing. b. Clever Hans - clever horse that people believed could add/subtract but was actually looking for visual cues from those asking him questions. c. Solutions i. Blinding/Masking ii. Unobtrusive observations
Opt for Simplicity
Information, wording, questions all should be clear a. Choose simple over specialized words b. Choose as few words as possible c. Avoid linguistic ambiguity d. Use complete sentences e. Avoid slang/jargon
Quota Sampling
The researcher identifies subsets of the population of interest and then sets a target number for each category in the sample. ii. Next, the researcher samples from the population of interest nonrandomly until the quotas are filled. 1. In quota sampling the participants are selected nonrandomly (perhaps through convenience or purposive sampling) 2. In stratified random sampling they are selected using random selection techniques.
Open Ended Questions
a. Allow respondents to answer any way they like. b. Provide researchers with spontaneous, rich information. c. However, the responses must be coded and categorized, a process that can be difficult and time-consuming i. In response, psychologists often restrict the answers people can give.
Test-Retest
a. Consistent across time, coefficient of stability b. R c. Most relevant for stable/trait like constructs
Biased Sampling Techniques
a. Non-probability sampling. Usually only include people who are easy to find or people who self-select into the sample b. Nonresponse Bias: c. Selection Bias: e. Purposive Sampling: g. Quota Sampling: i.
Halo Effect
The tendency for an impression created in one area to influence opinion in another area. i. Ex: "the convertible furnishes a sporty image and provides a halo effect for other cars in the showrooms"
Self-Reporting More than they can Know...
A. Are people capable of reporting accurately on their own feelings, thoughts, and actions? a. Ex: When women were asked which pair of stockings they preferred (all pairs were identical, yet a majority of the women selected one on the end), they easily formulated answers for the researchers, but their answers had nothing to do with the real reason they selected that pair. The women did not seem to be aware that they were inventing a justification for their preference. People may not be able to accurately explain why they acted the way they did. B. Memories are not always accurate... a. Surveys and polls can be excellent measures of people's subjective states: what they think they are doing and what they think is influencing their behavior. However, to know what they are really doing, it may be better to watch them. b. Many researchers prefer to observe behavior directly rather than rely on self-reports.
Categorical Variables
A. Categorical Variable (Qualitative): The levels are categories. Also called nominal variables. (Ex: Sex, whose levels are male and female; species of monkey in an experiment.) THE NUMBERS DO NOT HAVE NUMERICAL MEANING. a. Categorical b. Dichotomous: compares two groups. c. Varies in kind - averages not meaningful
Choosing Question Format
A. Open Ended Questions: B. Forced-Choice Format C. Likert D. Semantic Differential Format: E. Visual Analog Scale: F. Coding Responses
Types of Variables - Predictors
A. Predictor: a. Independent i. The variable being manipulated b. Antecedent: i. A stimulus that cues an organism to perform a learned behavior. When an organism perceives an antecedent stimulus, it behaves in a way that maximizes reinforcing consequences and minimizes punishing consequences. c. Subject: i. A person the experiment is based on.
Probability Sampling
A. Probability Sampling: random sampling that offers a representative sample of the population in question. a. Simple Random Sampling: b. Cluster Sampling: c. Multistage Sampling: d. Stratified Random Sampling: e. Oversampling: f. Systematic Sampling g. Combining Techniques:
Self-Report Measures
A. Self-Report Measures: Operationalize a variable by recording people's answers about themselves in a questionnaire or interview. a. Interviewers might ask respondents to report on the frequency of specific events they might have experienced in the past year b. In studies of children, self-reports might be replaced by reports from their parents or teachers (bear in mind that these reports might have traces of bias). c. Reliability: i. Test-retest ii. Internal (alpha) d. Validity: i. Convergent ii. Discriminant iii. Criterion: Predictive, Postdictive, concurrent
Survey Research
A. Strengths: a. Why/when should I use survey methods? i. Good way to gather data on subjective states, attitudes, opinions, and traits ii. Good way to gather initial information about a phenomenon iii. Relatively cheap b. Easy to administer to a large sample i. People generally like taking surveys B. Weaknesses: a. Cannot manipulate variables (no causation) i. Presentation/social desirability concerns - participants might change answers in order to be viewed in a better light b. People aren't always good at introspection and remembering i. Inherently subjective
Quantitative Variables
B. Quantitative Variables: The numbers are coded in meaningful ways. (Ex: Height and weight are quantitative because they are measured numbers, IQ numbers are measured in that they measure intelligence). Varies in amount a. Ordinal Scale: Applies when the numerals of quantitative variable represent a ranked order. (Discrete) b. Interval Scale: Applies to the numerals of a quantitative variable that meets two conditions. (Continuous) i. The numerals represent equal intervals between levels ii. There is no true zero (Ex: IQ Test) c. Ratio Scale: Applies when the numerals of a quantitative variable have equal intervals and when the value of zero truly means zero. (Continuous)
Cohen's D
How far apart two means are in standard deviation units, and how much overlap there is between the two groups' scores
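A minimal sketch of one common way to compute it, assuming two independent groups and a pooled standard deviation. The function name and the scores are invented for illustration.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between two group means, in pooled-standard-deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical scores for two groups.
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]
d = cohens_d(treatment, control)
print(round(d, 2))
```

A larger absolute value of d means the means are farther apart and the two distributions overlap less.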
Special Consideration When Observing Behavior
a. Experimenter i. Clear, unambiguous operationalization ii. Clear rating scales (codebook) iii. Hire independent coders b. Coders/Observers i. Observer Bias ii. Interrater Reliability 1. Multiple Coders 2. Statistics a. (R) - Continuous variables (2 coders) b. Kappa (k) - categorical variables c. ICC (intraclass correlation coefficient) - continuous variables, more than two coders c. Participants/Subjects i. Observer Effects ii. Reactivity d. Ethics i. Public place and no interference OR informed consent (with proper ethics board approval).
Generalizability
Generalizability: Always an important consideration, especially in frequency data. i. Desired Variability: Want variability in both predictors and outcomes. ii. Undesired Variability: random error (low reliability); systematic error (high reliability)
Implicit Measures
Indirect Assessments: Useful when self-report is expected to be inaccurate (when participants are unwilling or unable to provide accurate information). Ex: implicit association test, word fragment completion test. Implicit Bias: Can be the opposite of explicit beliefs/values (cultural values/norms). Not harmless (affects behavior). Not unchangeable/inevitable: it operates below conscious awareness, but once people are aware of it, conscious effort can change the bias.
Cohen's Kappa
Measure of interrater agreement for categorical items. It takes the agreement expected by chance out of the equation.
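The chance correction can be sketched with the usual formula kappa = (observed agreement - expected agreement) / (1 - expected agreement). The codes below are invented, the function name is illustrative, and the sketch does not handle the edge case where expected agreement equals 1.

```python
from collections import Counter

def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters who each assigned one category per participant."""
    n = len(rater_1)
    # Proportion of participants both raters placed in the same category.
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two coders for ten participants.
coder_a = ["happy", "happy", "sad", "sad", "happy", "sad", "happy", "sad", "happy", "happy"]
coder_b = ["happy", "happy", "sad", "happy", "happy", "sad", "happy", "sad", "sad", "happy"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

Here the coders agree on 8 of 10 participants, but kappa is lower than .80 because some of that agreement would be expected by chance alone.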
Combining Techniques
Methods of sampling may combine techniques mentioned. ii. Researchers might do a combination of multistage sampling and oversampling iii. Researchers might supplement random selection with other techniques to control for bias.
Selection Bias
Most research participants are volunteers. i. Self-selection occurs on most Internet polls and can cause serious problems for external validity. ii. Rate My Professor: Only people who care enough to write a rating do so. These people are self-selecting when they write one. d. Convenience Sampling: non-probabilistic sampling that uses samples chosen merely on the basis of who is easy to access
Avoid Negative Wording
Negatively worded questions can also make survey items unnecessarily complicated. Whenever a question is cognitively difficult for people, it can cause confusion and thus reduce the construct validity of a survey b. When possible, negative wording should be avoided, but researchers sometimes ask questions both ways: i. Abortion should never be restricted ii. I favor strict restrictions on abortions iii. By asking the question both ways, the researcher can study the items' internal consistency to see whether people respond similarly to both questions.
Observer Bias
Occurs when observers' expectations influence their interpretation of the participants' behaviors or the outcome of the study b. Observers rate behaviors not objectively but according to their own expectations or hypotheses c. Solutions i. Blind Design, in which the observers are unaware of the conditions to which participants have been assigned and are unaware of what the study is about ii. Clear rating scales/codebooks iii. Multiple Observers
Reactivity
Occurs when people change their behavior (react) in some way when they know another person is watching. They might be on their best behavior - or in some cases, their worst behavior - rather than displaying their typical behavior. Reactivity occurs not only with human participants but also with animal subjects. Solutions → a. Blend In b. Wait it Out c. Measure the Behavior's Result
Blend In
One way to avoid observer effects is to make unobtrusive observations where one makes him or herself less noticeable. i. Developmental researchers might sit behind one-way mirrors ii. In a public setting, a researcher might look like a casual observer
Forced Choice Format
People give their opinion by picking the best of two or more options. b. Strengths: i. Easy to code and analyze ii. Easy for respondents iii. Can easily be used in large samples. c. Ex: Narcissistic Personality Inventory - instrument asks people to choose one statement from each of 40 pairs of items.
Postdictive
Postdictive: Used if the test is a valid measure of something that happened before. It can also refer to when a test replaces another test (e.g., because it's cheaper). For example, a written driver's test replaces an in-person test with an instructor.
Predictive
Predictive: If scores on a measurement are accurately able to predict future performance of some other measure of the construct they represent. 1. Ex: SAT or ACT are predictive of future success. 2. Compare test score to performance in the future 3. Compare to tests that have been previously validated.
Quick Terminology
Quick Terminology: • Population: Entire set of people the researcher is interested in. • Sample: A subset of the population used in research. • Stratum: A variable that divides the sample into mutually exclusive segments. A. Populations and Samples a. Definitions: i. Population: The entire set of people or products in which you are interested ii. Sample: Smaller set taken from that population iii. Census: getting results from every person in a given population b. Coming from a Population vs. Representing that Population i. Coming from a population: Not sufficient by itself; not necessarily representative of that population. 1. Biased Sample: Some members of the population of interest have a much higher probability of being included in the sample compared to other members. ii. Representative Sample: All members of the population have an equal chance of being included in the sample. 1. Only representative samples allow us to make inferences about the population of interest
Known Group Paradigms
Researchers see whether scores on the measure can discriminate among a set of groups whose behavior is already well understood. Can be used to validate self-report measures. It is sometimes difficult to establish validity with physiological measures. Reverse inference (measure → construct): How accurately can knowing the physiological responses alone predict the psychological state of interest (construct)? Known groups have a postdictive criterion.
Semantic Differential Format
Respondents might be asked to rate a target object using a numerical scale that is anchored with adjectives i. Common Five Star Rating System ii. Easy → Hard
Using Shortcuts
Response Sets: Also known as non-differentiation, are types of shortcut respondents can take when answering survey questions. Although response sets do not cause many problems for answering a single, stand-alone item, people might adopt a consistent way of answering all the questions - especially toward the end of a long questionnaire. People might answer them all positive, negative, or neutral. b. Acquiescence: "Yes-saying" to all of the questions. i. The survey could accidentally be measuring the tendency to agree, or the lack of motivation to think and rate carefully. ii. One way to account for this is to ask reverse-worded items 1. Race relations are going well in this country a. Strongly Disagree → Strongly Agree 2. Race relations are going terribly in this country a. Strongly Disagree → Strongly Agree c. Fence Sitting: Playing it safe by answering in the middle of the scale, especially when survey items are controversial (or answering IDK). i. When a scale contains an even number of response options, the person has to choose one side or the other.
Three Common Types of Measures
Self-Report Observational Physiological
Snowball Sampling
Snowball Sampling: Help researchers find rare individuals. This sampling involves asking participants to recommend a few acquaintances for the study. Participants are supplying other participants.
Trying to Look Good
Socially Desirable Responding/Faking Good: When respondents are embarrassed, shy, or worried about giving an unpopular opinion, they may not tell the truth on a survey or other self-report measure. A similar but less common phenomenon is "faking bad." b. To avoid SDR, a researcher might ensure that the participants know their responses are anonymous - perhaps by conducting the survey online, or by having people put their unsigned responses into a large closed box. Anonymity may not be the perfect solution. c. One way to minimize this problem is to include special survey items that identify socially desirable responders with target items: i. "My table manners at home are as good as when I eat out in a restaurant; I never hesitate to go out of my way to help someone in trouble" ii. If people agree with many such items, researchers may discard their data from the final set under suspicion that they are exaggerating on the other survey items or not paying close attention in general. iii. Or, researchers might ask friends to rate each other. d. Researchers increasingly use special computerized measures to evaluate people's implicit opinions about sensitive topics. One widely used test
Types of Measurement Validity
Subjective: Face, Content Related Empirical Construct Related (Discriminant, Convergent) Criterion
Types of Measurement Reliability
Test-Retest Interrater Internal
Multitrait Multimethod Matrix
The MTMM is simply a matrix or table of correlations arranged to facilitate the interpretation of the assessment of construct validity. The MTMM assumes that you measure each of several concepts (called traits by Campbell and Fiske) by each of several methods (e.g., a paper-and-pencil test, a direct observation, a performance measure). The MTMM is a very restrictive methodology -- ideally you should measure each concept by each method.
Matrix Diagram
The Reliability Diagonal (monotrait-monomethod) Estimates of the reliability of each measure in the matrix. You can estimate reliabilities a number of different ways (e.g., test-retest, internal consistency). There are as many correlations in the reliability diagonal as there are measures -- in this example there are nine measures and nine reliabilities. The first reliability in the example is the correlation of Trait A, Method 1 with Trait A, Method 1 (hereafter, I'll abbreviate this relationship A1-A1). Notice that this is essentially the correlation of the measure with itself. In fact such a correlation would always be perfect (i.e., r=1.0). Instead, we substitute an estimate of reliability. You could also consider these values to be monotrait-monomethod correlations. The Validity Diagonals (monotrait-heteromethod) Correlations between measures of the same trait measured using different methods. Since the MTMM is organized into method blocks, there is one validity diagonal in each method block. For example, look at the A1-A2 correlation of .57. This is the correlation between two measures of the same trait (A) measured with two different measures (1 and 2). Because the two measures are of the same trait or concept, we would expect them to be strongly correlated. You could also consider these values to be monotrait-heteromethod correlations. The Heterotrait-Monomethod Triangles These are the correlations among measures that share the same method of measurement. For instance, A1-B1 = .51 in the upper left heterotrait-monomethod triangle. Note that what these correlations share is method, not trait or concept. If these correlations are high, it is because measuring different things with the same method results in correlated measures. Or, in more straightforward terms, you've got a strong "methods" factor. Heterotrait-Heteromethod Triangles These are correlations that differ in both trait and method. For instance, A1-B2 is .22 in the example. 
Generally, because these correlations share neither trait nor method we expect them to be the lowest in the matrix. The Monomethod Blocks These consist of all of the correlations that share the same method of measurement. There are as many blocks as there are methods of measurement. The Heteromethod Blocks These consist of all correlations that do not share the same methods. There are (K(K-1))/2 such blocks, where K = the number of methods. In the example, there are 3 methods and so there are (3(3-1))/2 = (3(2))/2 = 6/2 = 3 such blocks.
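The regions of the matrix described above can be sketched as a small classifier. This is an illustration only: measures are written as hypothetical (trait, method) pairs such as ("A", 1), and the function names are not from the source.

```python
def classify_correlation(measure_1, measure_2):
    """Name the MTMM region for the correlation between two (trait, method) measures."""
    trait_1, method_1 = measure_1
    trait_2, method_2 = measure_2
    same_trait = trait_1 == trait_2
    same_method = method_1 == method_2
    if same_trait and same_method:
        return "reliability diagonal (monotrait-monomethod)"
    if same_trait:
        return "validity diagonal (monotrait-heteromethod)"
    if same_method:
        return "heterotrait-monomethod triangle"
    return "heterotrait-heteromethod triangle"

def heteromethod_blocks(n_methods):
    """Number of heteromethod blocks: K(K-1)/2 for K methods."""
    return n_methods * (n_methods - 1) // 2

print(classify_correlation(("A", 1), ("A", 2)))
print(classify_correlation(("A", 1), ("B", 1)))
print(heteromethod_blocks(3))
```

Reading the matrix this way makes the expected pattern explicit: validity-diagonal values should be high, and correlations that share neither trait nor method should be the lowest.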
Double-Barreled Questions
The wording of questions can become so complicated that respondents have trouble answering in a way that accurately reflects their opinions. b. Double-barreled questions ask two questions in one, and have poor construct validity because people might be responding to the first half of the question, the second, or both. c. Careful researchers would ask both questions separately
Leading Questions
The intention of a survey is to capture respondents' true opinions. b. Ex: i. Do you think that relations between Blacks and Whites 1. - Will always be a problem? 2. - Or that a solution will eventually be worked out? 45% ii. Do you think that relations between Blacks and Whites 1. - Are as good as they're going to get? 2. - Or will they eventually get better? 75% iii. There is a difference in meaning because of the wording of the two versions. Framing race relations as a problem that needs to be worked out is more negative than framing race relations as good and possibly getting better.
Question Order
The order in which questions are asked can also affect the responses to a survey. i. Ex: Wilson found that Whites reported more support for affirmative action for minorities when they had first been asked about affirmative action for women. ii. One way to control for this is to prepare different versions of a survey with the questions in different sequences. That way, researchers can look for order effects. F. Encouraging Actual Responses a. In certain situations, people are less likely to respond accurately. They might not make an effort to think about each question b. Giving Meaningful Responses Efficiently: i. Some students are skeptical that people can ever report accurately on surveys. However, self-reports can sometimes be ideal. ii. More importantly, self-reports often provide the most meaningful information you can get. iii. Some traits are not very observable
Stratified Random Sampling
The researcher selects particular demographic categories on purpose and then randomly selects individuals within each of the categories. Tries to have representative samples, especially of minority groups.
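A minimal sketch of the procedure, assuming a proportional design in which the same fraction is drawn from every stratum. The population, the group labels, and the function name are made up for illustration.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Randomly draw the same fraction of people from each stratum."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }

# Hypothetical population: 90 people in group "A" and 10 in group "B".
population = [("A", i) for i in range(90)] + [("B", i) for i in range(10)]
sample = stratified_sample(population, stratum_of=lambda person: person[0], fraction=0.1)
print({name: len(members) for name, members in sample.items()})
```

Because selection within each category is random and every category is guaranteed a place, the small group cannot be missed by chance the way it could be in a simple random sample.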
Multistage Sampling
Two random samples are selected. i. A random sample of clusters, then a random sample of people within those clusters. 1. A researcher starts with a list of colleges (clusters) in the state and selects a random five of those colleges 2. Then the researcher selects a random sample of students from within each of the five selected colleges. (Representative)
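The difference between cluster sampling and multistage sampling can be sketched side by side. The twelve colleges, the twenty students per college, and the function names are assumptions made up for the example.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick clusters, then keep EVERY member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick clusters, then randomly pick members WITHIN each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [
        person
        for name in chosen
        for person in rng.sample(clusters[name], n_per_cluster)
    ]

# Hypothetical state with 12 colleges of 20 students each.
colleges = {
    f"college_{c}": [f"college_{c}_student_{s}" for s in range(20)]
    for c in range(12)
}
rng = random.Random(0)  # fixed seed only so the example is repeatable
cluster = cluster_sample(colleges, 5, rng)
multistage = multistage_sample(colleges, 5, 4, rng)
print(len(cluster), len(multistage))
```

Both designs start with the same random draw of five colleges; only the second step differs.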
Systematic Sampling
Using a computer or a random number table, the researcher starts by selecting two random numbers. 1. Ex: If the numbers are 4 and 7 and the population of interest is a room full of students, the experimenter might start off at four and count every 7th student to find his or her sample.
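The example with 4 and 7 can be sketched as below. The room of 30 students and the function name are made up; in practice the two numbers would themselves be drawn at random, for instance with `random.randint`.

```python
def systematic_sample(population, start, step):
    """Begin at the `start`-th member (1-based) and take every `step`-th member after that."""
    return population[start - 1::step]

# Hypothetical room of 30 students, in seating order.
students = [f"student_{i}" for i in range(1, 31)]
sample = systematic_sample(students, start=4, step=7)
print(sample)
```

Starting at the fourth student and counting every seventh selects students 4, 11, 18, and 25.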
Goals of Sampling
a. Generalization: Maximize external validity, the extent to which the results of the study can be generalized to the rest of the population. Must be generalized using a probabilistic based method drawing from clearly defined populations. b. Theoretical: test a hypothesis derived from a theory - often the goal in a lab study. Maximize internal validity; minimize threats to internal validity.
Internal
a. Consistent across items in a measurement instrument/survey b. Coefficient alpha - continuous variables i. Correlation-based statistic that shows whether a measurement scale has internal reliability. First, researchers collect data on the scale from a large sample of participants and then compute all possible correlations among the items. ii. Returns one number, computed from the average of the inter-item correlations and the number of items on the scale. Typically look for a number >.70 iii. The closer the number is to 1, the better the scale's reliability. If the internal reliability is good, the researchers can average all the items together. c. (KR-20) - dichotomous variables (e.g. true/false) d. Most relevant for self-report/survey measures
Interrater
a. Consistent across raters/coders b. R - continuous variables c. Kappa (k) - categorical variables i. Measures the extent to which two raters place participants into the same categories. Kappa close to 1.0 means that two raters agreed. d. Reasons i. One reason it could be low is that observers did not have clear enough operational definition of happiness to work with. ii. Another reason could be that one or both of the coders have not been trained well enough. 1. A scatterplot can be a helpful tool in assessing the agreement between two administrations of the same measurement or between two coders. Most relevant for observational measures, self report measures
Empirical Validity - Construct Related:
a. Construct Related: i. Convergent: The measure should correlate more strongly with other measures of the same constructs. 1. One must look at the weight and pattern of evidence. ii. Discriminant: It should correlate less strongly with measures of different constructs. 1. What matters is if the correlation is weak, not that it is negative. 2. It is not usually necessary to establish discriminant validity between a measure of something that is completely unrelated. 3. Instead researchers worry about discriminant validity when they want to be sure that their measure is not accidentally capturing a similar but different construct.
Subjective Validity
a. Face: i. The extent that it appears to experts to be a plausible measure of the variable in question. b. Content-Related: i. A measure must capture all parts of a defined construct. ii. Ex: The conceptual definition of IQ contains the distinct elements such as the ability to reason, plan, solve problems, think abstractly etc. To have good content validity, an operationalization of IQ should include questions or items to assess each of these
Methods of Observing Behavior
a. Visual/In-person observations: i. Leon Festinger (1956) 1. Written observations, qualitative data 2. Ethnography ii. Cognitive Dissonance Theory iii. Limitations b. Indirect Measures of behavior c. Video recordings d. EAR (Electronically Activated Recorder) i. Time sampling ii. Naturalistic Observations iii. Maximize Ecological Validity
Empirical Validity - Criterion Related
b. Criterion Related: (Is the measure under consideration related to a concrete outcome; is it associated with what it's supposed to be associated with and not associated with what it's not supposed to be associated with?) i. Predictive: compares the measure in question with an outcome assessed at a later time. ii. Concurrent: Reflects only the status quo at a particular time. iii. Postdictive: Used if the test is a valid measure of something that happened before. It can also refer to when a test replaces another test (e.g., because it's cheaper). For example, a written driver's test replaces an in-person test with an instructor.
R
correlation coefficient - for interrater must be greater than .7, for test-retest it must be greater than .5
Concurrent
Concurrent: Degree to which scores on a measurement are related to other scores, on other measurements, that have already been established as valid 1. Both measures are taken at the same time 2. Compared against previously validated tests
Content Related
i. A measure must capture all parts of a defined construct. ii. Does the test contain items from the desired domain? 1. Based upon assessments by experts in the domain fields 2. Content validity is not tested for, rather it is assured by the informed item selections made by the experts in its domain. iii. Ex: The conceptual definition of IQ contains the distinct elements such as the ability to reason, plan, solve problems, think abstractly etc. To have good content validity, an operationalization of IQ should include questions or items to assess each of these components.
Visual Analog Scale
is a psychometric response scale which can be used in questionnaires. It is a measurement instrument for subjective characteristics or attitudes that cannot be directly measured.
Non-response bias
Reluctant respondents and nonresponders are typically not included in the sample, and the people who do respond may differ from those who do not. Researchers can't make people respond.
Cognitive Dissonance Theory
we act to reduce the discomfort we feel when two of our thoughts are inconsistent