PSY 10A Final Prep

The Science of Psychology

A hub science

Theory

An interrelated set of concepts that coherently explain and describe a body of evidence, and must generate novel, falsifiable hypotheses • Often, in psychology, are verbally stated • Models: often mathematical formalizations of theories • Composed of abstract theoretical constructs, and the relationships between them

Write up of results

As predicted, there was a significant difference in memory performance for those who received soft music (M = 9.00, SD = 1.41) compared to no music (M = 4.33, SD = 1.51) or loud music (M = 4.13, SD = 1.17), F(2, 15) = 24.05, p < .001. Post-hoc tests confirmed that the soft music group performed significantly better than both the no music and loud music groups (ps < .001). There was no significant difference in performance between the no music and loud music groups (p > .250).

Parametric data

Assumptions • Normal distribution • Uses means and SDs • Variances are equal • Applies to ratio and interval data

Type II Error

Claiming there is NOT a statistically significant difference between two groups WHEN THERE IS A DIFFERENCE IN REALITY ~ MISS

Type I Error

Claiming there is a statistically significant difference between two groups WHEN THERE IS NOT A DIFFERENCE IN REALITY ~ FALSE ALARM

Crafting hypotheses for Single Factor designs

Hypotheses: • A claim about the world that is testable and falsifiable • State a predicted (causal) relationship between (operationalized) variables • Your hypothesis must include the comparison group and the direction of the effect (which group is predicted to perform better/worse) • H1: It was hypothesized that bilinguals have superior cognitive inhibition as indexed by quicker response time and greater accuracy on the Stroop task compared to monolingual controls.

Ratio Scales (NOIR)

Same properties as interval, but has a true zero point that indicates an absence of the quantity measured. 0 = absence of quantity measured (values cannot be negative). Even if 0 cannot be observed, theoretically, it's there. E.g., accuracy (ACC), response time (RT), Kelvin, GPA

Measures of dispersion

Standard Deviation, Range • Describes the spread of the data • Range: largest minus smallest value • Sensitive to outliers • Standard Deviation (SD): the average amount by which a datapoint differs from the sample mean • In a normal distribution the majority (~68%) of datapoints fall within +/- 1 SD of the mean • We expect scores to deviate from the mean by about one SD • Means without their SD are impossible to interpret!
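
A minimal sketch of the two dispersion measures, using Python's standard `statistics` module. The function name `describe_spread` and both score lists are hypothetical, chosen so that the two samples share a mean of 5 but differ in spread.

```python
import statistics

def describe_spread(scores):
    """Return (range, sample SD) for a list of numeric scores."""
    value_range = max(scores) - min(scores)   # largest minus smallest value
    sd = statistics.stdev(scores)             # sample SD (n - 1 in the denominator)
    return value_range, sd

# Hypothetical recall scores; both lists have a mean of 5.
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

for label, scores in (("tight", tight), ("wide", wide)):
    value_range, sd = describe_spread(scores)
    print(f"{label}: M = {statistics.mean(scores)}, range = {value_range}, SD = {sd:.2f}")
```

Reporting only the mean would make these two samples look identical, which is why the SD is reported alongside it.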

Which is worse - Type I or Type II Error?

Type I Error

Normal Distribution

mean = median = mode

Negative skew

Mean < median < mode. Outliers are on the negative tail, so the mean is dragged down.

Positive skew

Mode < median < mean. Outliers are on the positive tail, so the mean is dragged up.

Descriptive Statistics

summarizes the set of data the researcher collected

Varieties of 2 x 2 Designs and ANOVA

• 2 x 2 between-subjects design = 2 x 2 between-subjects ANOVA • 2 x 2 within-subjects design = 2 x 2 repeated measures ANOVA • 2 x 2 mixed design (one IV between, one within) = 2 x 2 mixed ANOVA • 2 x 2 quasi-experimental design (one true IV that's within or between and a quasi-IV) = mixed or between-subjects ANOVA depending on the true IV

Nominal Scales (NOIR)

• Defines a set of categories, discrete data; no in-between, either in a category or not. • Values have different names and are weighted equivalently i.e. religious affiliation, ethnicity, political affiliation

Statistical decision making

• If p < α (where α = 0.05), then reject the null hypothesis; otherwise fail to reject the null. • Do we have a statistically significant result? What about effect size?
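
The decision rule can be written as a short sketch. The function name `decide` and the example p-values are illustrative assumptions; α defaults to .05 as on the card. Note that the rule says nothing about effect size, which has to be examined separately.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null only when p < alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical p-values for illustration.
print(decide(0.001))  # reject the null hypothesis
print(decide(0.250))  # fail to reject the null hypothesis
```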

Quasi-Experimental Design Example

• Impact of schizophrenia on the hippocampus during memory tasks • Quasi-IV: Healthy controls vs. patients with schizophrenia • "Groups were matched for age, gender, handedness, and parental education" • DV: Brain oxygen consumption (BOLD) during a memory task measured by fMRI

Participants

• Includes age, sex, distinguishing features and how participants were recruited • How were categories of participants formed? • E.g., how was low vs high SES operationalized?

Operationalizing an Experiment

• Independent and dependent variables are designed to manipulate and measure theoretical constructs of interest • Endless debate is possible about the construct validity of IVs and DVs • Are we manipulating / measuring what we think we are manipulating / measuring? • It's often a judgment call made on data, theory, and professional opinion

Empiricism / Empirical Observations

• Knowledge comes from systematic experiences • Replicable, objective observations or demonstrations of a phenomenon - often through experiments or other careful tests • The Scientific Method uses empirical observations to generate explanations about the universe

Law

• Law: universally true statements • Not very many in science, and rare in psychology because of equifinality • Usually mathematically formalized

What makes for a good theory?

• Testable: open to empirical tests • Generates hypotheses (claims) that lead to observations • Coherent: theories tie together data and logic to produce explanations about phenomena that "stick together" • Logical organization to the gathered evidence • Replicable: theories require data that may be reproduced by others, hopefully with a variety of different methods and in different contexts • Parsimonious: we generally prefer the explanations with the fewest possible principles / assumptions... the fewer the better. • Occam's razor • Example: heliocentrism vs. geocentrism • Falsifiable: explanations must have the chance to be refuted through empirical observation (Karl Popper) • Explanations must be stated such that they may be falsified • Explanations cannot be proven, but they can be falsified • Explanations develop over time with additional findings, with adjustments and corrections • If enough counter-evidence is found, then the explanation needs to be retested, adjusted, or entirely rejected • Productive: generate testable and falsifiable hypotheses

Ordinal Scales (NOIR)

• Values have different names but can also be ranked according to quantity (e.g., low, moderate, high) • The exact distance between the levels is not known... all you can say is which is more than which. E.g., ranking favorite toppings on pizza from 1-5: maybe you rank sausage as 1, mushroom as 2, pepperoni as 3, olives as 4, and anchovies as 5. The distance between 1 and 2 could be small, meaning you don't have much of a preference between sausage and mushroom, but the distance between 4 and 5 could be massive, meaning you're fine with olives but absolutely can't stand anchovies. Or runners in a race: the distance between 1st and 2nd place and between 2nd and 3rd place can vary widely.

Analyzing experimental results

1. State null and alternative hypotheses 2. Inspect distribution of results for normality (histogram) 3. Calculate descriptive statistics (mean and SD) for each condition 4. Based on the design, use the correct inferential statistic • E.g., t-test 5. Statistical decision-making: Based on the inferential statistic, reject or retain the null hypothesis 6. Present results

Using the correct inferential statistic

1. What is the scale of measurement of the DV? (NOIR) • RATIO (parametric) 2. Number of Independent variables and how many levels to each? • 1 IV, with 2 levels 3. For each IV, is it between or within -subjects? • Between subjects

The Scientific Method

A series of steps followed to solve problems including collecting data, formulating a hypothesis, testing the hypothesis, and stating conclusions.

Example: Harlow's Monkeys (testing between 2 theories)

Behaviorists → Infants form an association between caregiver and food, become attached to mother because she is a source of food. Prediction: Any entity could be considered a caregiver if it provides food Harlow → Caretakers provide for offspring emotional needs & physical comfort; offspring instinctively seek caretakers. Prediction: Infants seek caretaker comfort and support regardless if she is the food source Harlow's Operationalization: Harry Harlow raised infant monkeys apart from their mothers, but provided them two alternative "mothers": • Wire mother: provides food but no comfort • Cloth mother: provides comfort but no food • When scared, which mother will an infant monkey approach (a measure of attachment)? Results: Infants took food from whichever mother provided it • Regardless of which mother provided food... • Infants spent more time on cloth mother • Sought comfort from cloth mother when scary stimulus introduced • Used cloth mother (but not wire mother) as a secure base • Strong evidence in favor of Harlow's theory, over that of Behaviorism

Independent Samples t-test

Between subjects
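
A minimal sketch of how the independent samples t statistic is computed, assuming the classic pooled-variance (Student's) form. The function name `independent_t` and the recall scores are hypothetical. Only t and its degrees of freedom are computed; turning t into a p-value requires the t distribution, which is not part of Python's standard library.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, df) for two independent groups using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Hypothetical words recalled by two different groups of participants.
deep = [12, 14, 11, 15, 13]
shallow = [8, 9, 7, 10, 6]

t, df = independent_t(deep, shallow)
print(f"t({df}) = {t:.2f}")  # t(8) = 5.00
```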

Cohen's d

Cohen's d: the difference in performance between conditions in standard deviation units. Rules of thumb: • .1 - .3 = Small* effect • .3 - .6 = Medium effect • .6 and above = Large effect *Small doesn't mean unimportant depending on the research area
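
A minimal sketch of Cohen's d computed from summary statistics. It assumes equal group sizes, so the pooled SD is the square root of the average of the two variances; the function name `cohens_d` is illustrative. The example values are the soft music and no music means and SDs from the results write-up card.

```python
import math

def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Difference between two means in pooled-SD units (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd

# Soft music (M = 9.00, SD = 1.41) vs. no music (M = 4.33, SD = 1.51).
d = cohens_d(9.00, 1.41, 4.33, 1.51)
print(f"d = {d:.2f}")  # roughly 3.2 SDs apart, far above the "large" threshold
```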

Establishing construct validity

• Convergent validity: is the current measure related to other measures of the same construct? • Discriminant validity: does the current measure diverge from, or is it NOT related to, measures of dissimilar constructs? • Criterion validity: to what extent does the current measure predict more concrete behavioral, real-world, applicable outcomes? • Content validity: extent to which the measure covers a representative sample of the construct of interest • Face validity: does the measure 'look like' the construct it is attempting to assess?

Conceptual integration

Explanations of reality that are mutually compatible, from the most micro to the most macro of systems

Construct Validity

Extent to which measure assesses intended theoretical concepts (i.e., reflects the quality of the operational definition of the construct) • To what degree does the measure "capture" what we are trying to study

Not all journals are created equal

General rule: The more "general" the name of the journal, the more prestigious; e.g., Science, Nature, Proceedings of the National Academy of Sciences (PNAS) • Work published here is most "important" in all the sciences • Top journals in psychology: Psychological Science, Journal of Personality and Social Psychology, Journal of Experimental Psychology: General, Journal of Neuroscience, etc. • Mid-tier journals are excellent too, just more specialized: Cognition, Evolution & Human Behavior, Developmental Science, NeuroImage, Journal of Educational Psychology, etc.

How to recruit?

Goal: recruit a sample that is representative of the population of interest. • Achieved through random selection from a population • Probability sampling: individuals are randomly selected from the population such that each person is equally likely to participate. Unbiased. (Ideal method) • Non-probability sampling: individuals selected non-randomly. Could introduce bias. (More likely to be used)

Theoretical constructs

Include: Depression, Attachment, Self-Esteem, Intelligence, and many more

Hypotheses Generation

Inductive reasoning: moves from specific instances to a generalized conclusion (e.g., "The coin I pulled from the bag is a penny. That coin is a penny. A third coin from the bag is a penny. Therefore, all the coins in the bag are pennies.") Deductive reasoning: moves from generalized principles that are known to be true to a true and specific conclusion (e.g., "All men are mortal. Harold is a man. Therefore, Harold is mortal.") Abductive reasoning: Science never "proves" anything, but we can make an inference to the best explanation (e.g., on your first day as a member of a jury, you conclude that the defendant is guilty, but you are not certain. Here, you have made a decision based on your observations, but you are not certain it is the right decision.)

Measures of central tendency

Mean, Median and Mode • describes the bulk of the data • Mean: "average score" calculated by summing the scores and dividing by the total number of scores • Median: midmost score in the series of n scores; the score that bisects the distribution (even if not observed) • Mode: Score that occurs with greatest frequency - Concern: Outliers with undue influence
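
A minimal sketch of the three measures using Python's standard `statistics` module, showing the concern about outliers. The function name `central_tendency` and the scores are hypothetical.

```python
import statistics

def central_tendency(scores):
    """Return the mean, median, and mode of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
    }

# Hypothetical scores, then the same scores with one extreme high value added.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [40]

print(central_tendency(scores))        # mean 3.4, median 3, mode 3
print(central_tendency(with_outlier))  # mean 9.5, median 3.5, mode 3
```

The single outlier pulls the mean from 3.4 to 9.5 while the median and mode barely move, which is the mode < median < mean ordering of a positive skew.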

Non-Parametric data

No assumptions • No distribution assumption • Uses counts or medians • No assumption about variance • Applies to ordinal and nominal data

NOIR

Nominal Ordinal Interval Ratio

Power

Probability of rejecting the null hypothesis when it needs to be rejected (when it is false)

Pros and Cons of Within-Subjects

Pros • Need half as many participants as between-subjects (time / money) • Relatively greater sensitivity to effects (statistical power) • Participants serve as their own control (see how they behave under both conditions) Cons • Need to control for order, carryover, practice, and fatigue effects

Pros and Cons of Between-Subject Design

Pros • No order, carryover, practice, fatigue effects • Can be simpler to administer Cons • Need twice as many participants (time / money) • Relatively less sensitivity to effects (statistical power)

There's always a tradeoff for validity in experimental designs

Psychologists tend to prioritize internal validity the most, because you must have internal validity to even consider external and ecological validity.

Psychology is multi-determined

Rarely (never) is there a sole causal factor that explains much of human behavior • Theory often specifies the many variables predicted to have causal impact on behavior • And often what variables are expected to not have an impact • Thus researchers often will manipulate multiple factors in a single study

Deduction / Deductive reasoning

Reasoning from a general theory to specific observations • E.g., deriving hypotheses from existing psychological, evolutionary, or economic theories • How psychology typically proceeds: theory-driven

Induction / (Inductive reasoning)

Reasoning from specific observations to general claims about how the world works (think Sherlock Holmes) • Case studies • Patient H. M. and the role of the hippocampus in memory • Serendipity (with good observation skills) • The discovery of penicillin by Alexander Fleming

Abduction / Abductive reasoning

Science never "proves" anything, but we can make an inference to the best explanation. Example: Sexual Strategies Theory suggests the existence of evolved, sex-differentiated human mate preferences • Specifically, human males and females are both choosy, but in different ways: • Women should more value resource acquisition in their mates - because reproductive success depends upon getting needed investment • Men should more value reproductive capacity in their mates - because reproductive success depends upon quality / quantity of offspring • We can never "prove" that these data reflect "evolved, human universal mate preferences" • But science never proves anything • But Sexual Strategies Theory is best able to predict and explain the data observed versus competing alternative theories • Abduction gives us reason to then prefer this theory over others

Scientific Reasoning

Scientists use logical inference to derive hypotheses from theory or observations; and when evaluating the fit between data and theory

Meta-theories

Sets of principles that you organize your whole view of human nature around Includes: • Evolutionary Psychology • Humanistic Psychology • Standard Social Science Model • Strict Behaviorism (zombie) • Psychodynamic (zombie)

Peer review

The journal's editor contacts 2-4 experts on the paper's topic. They evaluate the paper and make a decision: • Accept • Revise and resubmit • Reject Papers are evaluated on the strength of the argument, the methods and statistics, the fit of the data with the claims, the "importance" of the work, and the fit of the paper with the journal. Be skeptical; Peer reviewed doesn't mean "true".

Example: Operationalizing an experiment

Theory: the long-documented male advantages in spatial reasoning stem from stereotype threat • Stereotype threat: if negative stereotypes regarding a specific group are made salient, then group members are likely to become anxious about their performance (reducing cognitive resources), which may hinder their ability to perform to their full potential (Steele, 1997) • Hypothesis: Women's spatial reasoning is significantly better when stereotype threat is eliminated compared to when stereotype threat is present. • IV - Stereotype threat: Spatial priming vs. Social priming, randomly assigned • Spatial: "Spatial ability is a cognitive ability that is defined as understanding the relations between objects in space and being able to mentally manipulate them and respond correctly. Males often score higher on measures of spatial ability" • Social: "Empathetic ability is a social ability that is defined as being able to identify with and understand what another person is seeing or feeling, and respond appropriately. Females often score higher on measures of empathetic ability" • DV - degree of error between participant response and the correct answer • Prediction: women given the social prime will be more accurate than those given the spatial prime • Results: same performance for men before and after being presented with stereotype threat, but women perform significantly worse after being presented with the stereotype threat; before, they perform equally to men.

Citations

When do I cite? • You need to cite the sources of specific previous theories, claims, findings, and methods • For very broad general concepts you don't need a citation. Which of the following would need a citation? • Working memory has a capacity of 7 +/- 2 items. • Evolution occurs through natural selection. • Counterintuitive concepts have a memorability advantage. • fMRI is a commonly used neuroimaging method.
Two types of in-text citations • Parenthetical citation: (AUTHOR(S), YEAR) • Ex: Birth order has been shown to have effects on personality characteristics (Smith & West, 1986). • List multiple citations in one set of parentheses, separated by semicolons and in alphabetical order (Meltzoff, 1995; Woodward, 1998) • Uses "&" • In-line citation: AUTHOR(S) (YEAR) • Ex: In their study, Smith and West (1986) showed that.... • Use this style for papers that are relatively more important for your argument / that you want to emphasize • Uses "and"
APA Style In-Text Citations • If fewer than 7 authors, the first time you cite you write every name • (Heiphetz, Lane, Waytz, & Young, 2016) • If 7 or more authors or the second time you cite the same paper with 3 or more authors, use et al. • (Heiphetz et al., 2016) • In-line: Heiphetz and colleagues (2016) or Heiphetz et al. (2016) • Two authors: always write both names
APA Style Reference section • How to cite journal articles • Other types of sources have different formats, use OWL ~ Author's Last Name(s), Initials (year). Title of the article, Journal Title, Journal number(journal issue), page numbers. doi: if available ~ • Use a hanging indent every time an entry wraps onto a new line • Hanging indent in Word: Ctrl+T • List references in alphabetical order by first author's last name
Other APA Style Notes • Try not to use "I" • Paraphrase! Avoid direct quotes unless absolutely necessary • Do not put any specific statistics (e.g., sample size) in the introduction • No article titles in the introduction • No first names of authors. • When in doubt, cite. • Avoid phrasing things as a question. • Use clinical tone: no grand or vague claims

ANOVA

When you need ANOVA • One factor experiment with more than 2 conditions/levels • We have a single IV, with 3 or more levels = One Way ANOVA • Multi factorial design (multiple IVs/variables (e.g. quasi)) • We have 2 independent variables, each with 2 levels = 2 x 2 ANOVA • NOTE: Each IV can be between-subjects or repeated measures • Design and associated ANOVA can be: • Fully between (all IV's are between) • Repeated measures (all IV's are within) • Mixed (mix of both between and within for the IVs)
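
A minimal sketch of the F ratio behind a one-way between-subjects ANOVA, computed from raw scores as mean square between divided by mean square within. The function name `one_way_anova` and the scores for the three music conditions are hypothetical, not the class data. The p-value and post-hoc tests are left out because they need the F distribution, which Python's standard library does not provide.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)

    # Variability of the group means around the grand mean (effect).
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variability of scores around their own group mean (error).
    ss_within = sum(
        (score - statistics.mean(group)) ** 2 for group in groups for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical words recalled in each condition.
no_music = [4, 5, 3, 4]
soft_music = [8, 9, 10, 9]
loud_music = [4, 3, 5, 4]

f_ratio, df_b, df_w = one_way_anova([no_music, soft_music, loud_music])
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")  # F(2, 9) = 50.00
```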

Dependent Samples t-test (AKA paired-samples t-test)

Within subjects
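
A minimal sketch of the paired-samples t statistic: each participant's two scores are reduced to one difference score, and the mean difference is divided by its standard error. The function name `paired_t` and the response times are hypothetical. As with the other sketches, only t and df are computed, not the p-value.

```python
import math
import statistics

def paired_t(condition_a, condition_b):
    """Return (t, df) for two conditions measured on the same participants."""
    if len(condition_a) != len(condition_b):
        raise ValueError("Each participant needs a score in both conditions.")
    differences = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1

# Hypothetical response times (ms) for five participants in both conditions.
incongruent = [650, 700, 620, 680, 710]
congruent = [600, 640, 590, 610, 660]

t, df = paired_t(incongruent, congruent)
print(f"t({df}) = {t:.2f}")  # t(4) = 7.84
```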

Hypotheses

a claim about the state of the world derived from a theory. It is not a question, but rather a statement • Contain operational definitions: specific, observable phenomena that can be measured → how we translate theoretical constructs into measurable quantities • State the expected relationship between different variables → predictions

Confirmation bias

ask only the questions and consider only the evidence that supports hypothesis • Solution: attempt to falsify your hypothesis, not confirm it

Writing your research proposal

https://gauchospace.ucsb.edu/courses/pluginfile.php/9570200/mod_resource/content/1/Topic9-Proposal.pdf

Interaction?

Not parallel (lines cross) = INTERACTION. This indicates that the effect of one variable depends on the level of the other variable.
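
A minimal sketch of what "not parallel" means numerically for a 2 x 2 design: the effect of IV-1 is computed at each level of IV-2, and the two effects are compared. The function name `interaction_contrast` and the cell means are hypothetical. This only describes the pattern of means; whether the interaction is statistically significant is decided by the ANOVA.

```python
def interaction_contrast(cell_means):
    """cell_means[i][j] is the mean for level i of IV-1 and level j of IV-2.

    Returns the difference between the IV-1 effect at each level of IV-2.
    Zero means the lines in the profile plot are parallel.
    """
    effect_at_first_level = cell_means[1][0] - cell_means[0][0]
    effect_at_second_level = cell_means[1][1] - cell_means[0][1]
    return effect_at_second_level - effect_at_first_level

# Hypothetical cell means.
parallel = [[10, 14], [12, 16]]   # IV-1 adds 2 points at both levels of IV-2
crossing = [[10, 14], [14, 10]]   # IV-1 adds 4 at one level, removes 4 at the other

print(interaction_contrast(parallel))  # 0
print(interaction_contrast(crossing))  # -8
```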

Behavioral

operationalizes a construct by having people perform tasks that measure: accuracy, reaction time, preference, memory, etc. - Subjective/Objective -

Self-report

operationalizes a construct by having people respond on their own to questions in surveys ("scales") or interviews - Subjective -

Physiological

operationalizes a construct by recording OBJECTIVE biological data: brain activity (e.g., fMRI, EEG), heart rate, etc. - Objective -

N = 1 is not a theory

over-reliance on personal experience and opinions • Solution: remain objective and willing to falsify yourself

Yerkes-Dodson Law

performance increases with arousal only up to a point, beyond which performance decreases

Availability heuristic

rely on information that comes to mind most readily. • Solution: look up the actual base rates of phenomena

Same mean, different standard deviation

reporting ONLY the mean can be deceptive, because the SD can have a massive effect

Inferential statistics in psychology (concept)

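An inferential statistic such as t or F is a ratio of the variability between conditions (effect) to the variability within conditions (error); the larger the ratio, the smaller the p-value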

Statistical significance

when p < .05

Designing control conditions

• "Do nothing" vs. "do something" is not adequate experimental control • Rather, design the two conditions to be minimally different: as similar as possible but for the variable of importance • Eliminates alternative explanations

Profile plot

• 2 x 2 factorial designs • Y-axis: DV • X-axis = IV-1 • Different lines = IV-2 • Error bars around each data point (represents mean for that combination of conditions)

2 x 2 factorial notation example

• 2 x 2 x 2 • 3 IVs each with 2 levels • 3 x 2 • 1 IV with 3 levels and 1 IV with 2 levels

2 x 2 Designs test 3 different predictions

• 2(Cognitive load: Yes vs. No) x 2(Seductive detail: Yes vs. No) between-subjects design - Main effect of Cognitive load - Main effect of Seductive detail - Interaction of Cognitive load * Seductive detail • There is a possible main effect for each IV • There is a possible interaction for each combination of IVs

Ecological validity

• Are the tasks / scales representative of everyday life? Or do tasks in psychology seem very artificial? • Mundane realism: does the experiment physically resemble real life? • Psychological realism: does the experiment trigger the same psychological processes / mechanisms that are used in real life situations? • Consider the fit between the task / question / situations with what people face in everyday life or with the conditions under which the mind evolved.

Ways of Knowing (Epistemology)

• Authority: Believe in or do whatever the highly prestigious, important, or powerful tell you (e.g., Milgram's (1963) famous obedience study) • Tradition: Believe or do what lots of other people have believed in or done (e.g., Asch's (1951) conformity study) • Intuition + Personal Experiences: Implicit understandings about things in the absence of formal training → generated by our own psychology! Everyday learning about the world • Logic: Pure reason → from simple premises, derive true conclusions • Empiricism: Knowledge comes from systematic experiences

Two Types of True Experimental Designs

• Between-subjects • Participants are randomly assigned to one and only one level of the IV • Different groups of people for each condition • Within-subjects • Participants experience every level of the IV • Same group of people for every condition • No random assignment, so other steps needed

Establishing Internal Validity

• Between-subjects • Was random-assignment to condition used? • Did only the IV change between conditions (everything else held constant) • Within-subjects • Was randomization / counterbalancing of the order of conditions used? • Both • Standardization of procedures • Construct validity of the IV and DV • Elimination of confounds
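
A minimal sketch of the two procedures named above: random assignment for a between-subjects design and counterbalancing of condition order for a within-subjects design. The function names, participant IDs, and condition labels are hypothetical. The counterbalancing helper simply cycles through the possible orders in list order, so in practice the participant list would be shuffled first.

```python
import itertools
import random

def randomly_assign(participants, conditions, seed=None):
    """Between-subjects: shuffle participants, then deal them across conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Dealing in turn keeps group sizes as equal as possible.
    return {person: conditions[i % len(conditions)] for i, person in enumerate(shuffled)}

def counterbalance(participants, conditions):
    """Within-subjects: rotate participants through every possible condition order."""
    orders = list(itertools.permutations(conditions))
    return {person: orders[i % len(orders)] for i, person in enumerate(participants)}

people = ["P1", "P2", "P3", "P4", "P5", "P6"]
print(randomly_assign(people, ["control", "treatment"], seed=1))
print(counterbalance(people, ["congruent", "incongruent"]))
```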

WEIRD Consequences

• Clearly, studying certain questions requires participants that are not WEIRD, hypotheses about certain cultures, etc. • But what about more "universal" human characteristics like visual perception, surely that's invariant across populations? • The diversity of human experience impacts all levels of human psychology and physiology Why does this matter? • The use of convenience and WEIRD samples constrains the generalizability of our theories and findings in psychology (external validity - more on this later) • We have to think a lot more carefully about our theories and findings • There's no such thing as a "generic" human, but we are each immersed in socio-cultural contexts that impact our brains, bodies, beliefs, behaviors

Confounds

• Confound: An unmeasured / uncontrolled factor that varies along with your IV • The presence of a confound reduces internal validity • The change in the DV could be because of the IV or because of the confound • Thus, confounds present an alternative explanation for the results

What if there are more than 2 levels to the IV?

• Consider 1 IV with 3 levels • Impact of music on memory, 3 between-subject conditions: • No music • Soft music • Loud music • DV: number of words recalled • Hypothesis: There will be a U-shaped pattern of performance, where those receiving no music and loud music will do worse compared to those receiving soft music. How to analyze? • With the tools that we have so far, maybe a series of independent samples t-tests? Conduct t-tests on: • No music vs. soft music • No music vs. loud music • Soft music vs. loud music • Compare significant results and effect sizes?
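
A minimal sketch that lists the pairwise t-tests the card proposes. It also illustrates one commonly cited concern with that approach, which the card itself does not state: running several tests at α = .05 raises the chance of at least one Type I error. The function names are hypothetical, and the error-rate formula assumes the tests are independent, which pairwise tests sharing a group are not exactly.

```python
from itertools import combinations

def pairwise_comparisons(conditions):
    """Every pair of conditions that would need its own t-test."""
    return list(combinations(conditions, 2))

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false alarm across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

pairs = pairwise_comparisons(["no music", "soft music", "loud music"])
print(pairs)                                   # 3 comparisons
print(round(familywise_error(len(pairs)), 3))  # 0.143
```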

Multifactorial Designs: the 2 x 2 design

• Consider the simplest multifactorial study... the 2 x 2 • Two independent variables each with two levels: • IV-1: between or within subjects • IV-2: between or within subjects • A single continuous DV (interval or ratio) • Design determines analysis

Non-probability Sampling

• Convenience: Testing most readily available people, getting participants where you can, who happens to be available - Most common in psychology (e.g., SONA subject pool) - Gaining in popularity, online subject pools like mTurk • Haphazard: Going to a location (e.g., library) and inviting those around to participate • Snowball: Asking participants to recruit other participants via word-of-mouth. - Useful when dealing with sensitive topic or vulnerable population • Concerns: could introduce bias

Types of Correlation

• Correlations are defined by • Strength: closer to +/-1 • Direction: positive or negative • Pearson's r correlation coefficient • Slope of the regression line Strengths and Weaknesses: Strengths: • Test hypotheses that cannot be experimentally investigated • Ethics / practicality: amount of drug use, number of sexual encounters, etc. • Existing archival or government data: rates of suicide, divorce, educational attainment, etc. Weaknesses • Correlational designs do NOT imply causality! Inferences are limited to statements about the association between variables only • Problem of directionality: X could cause Y, or Y could cause X • Third variable problem: unmeasured variable W could cause the relationship between X and Y
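
A minimal sketch of Pearson's r computed from its definition, to show how strength and direction come out of one number. The function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together, relative to how much each varies alone.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

hours_studied = [1, 2, 3, 4]
print(pearson_r(hours_studied, [2, 4, 6, 8]))  # perfect positive relationship
print(pearson_r(hours_studied, [8, 6, 4, 2]))  # perfect negative relationship
```

A strong r in either direction still says nothing about which variable causes which, or whether a third variable drives both.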

Class Memory Experiment

• Depth of processing theory: "perception involves rapid analysis of stimuli at a number of different stages...[shallow stages] are concerned with the physical or sensory features ... while [deeper stages] are concerned with extraction of meaning..." • Hypothesis: "we suggest memory trace persistence is a function of depth of analysis, with deeper levels of analysis associated with more elaborate, longer lasting, and stronger traces..." • Independent variable: Type of processing instructions. Assign participants to receive one of the following: • Shallow group instructions: "your task is to determine whether the word contains the letter 'E' or not" (physical level of processing) • Deep group instructions: "your task is to determine whether the word is pleasant to you or not" (semantic, or meaning level of processing) • Dependent variable: After processing a list of 20 words, the number of words recalled after a brief delay (ratio scale) • Alternative hypothesis: Deeper processing (e.g., considering semantics) of information results in better recall than shallower processing (e.g., considering physical characteristics of the stimuli) • H₁: μ₁ ≠ μ₂ • Null hypothesis: Deep and shallow processing have equivalent effects on recall. • H₀: μ₁ = μ₂ • μ = population parameter (here, the population mean)

Measures / Materials

• Describe the stimuli, scales, surveys, manipulations administered to participants. • How were the IVs and DVs operationalized? • For each instrument, name it, describe what it is and how it works. Give examples of items on scales, the scale on which responses were collected (e.g., on a 1 to 7 Likert scale, from extremely unlikely to extremely likely) • How do the stimuli in different conditions differ? • Do you need any special equipment (e.g., chemicals, an fMRI scanner)? • If you have modified or generated new stimuli or survey questions, describe how and explain what they are

Educational practices example

• Dr. Rich Mayer @ UCSB • Uses cognitive psychology to develop principles for best educational practices • Working memory: a cognitive system where new information is first stored and processed - characterized by a tightly limited capacity • Coherence: People learn better when extraneous words, pictures and sounds are excluded rather than included in a multimedia presentation Both factors may affect learning outcomes • Conditions of cognitive load may stress working memory and impair encoding of information • A really loud TV; people walking by and asking you questions; having to switch between multiple tasks all while trying to study • Powerpoint slides that include seductive detail, irrelevant text and pictures, may impair encoding of information • People focus on the wrong and useless information - Research question: Can well designed materials offset the impact of conditions of cognitive load? 2 x 2 between-subjects experimental design • Students enter the lab to watch a slideshow on climate change and will later be tested on what they learned. • Manipulate two independent variables each with two levels: • IV-1: Cognitive load = stress working memory resources or not • Operationalized: A TV in the lab room that's loud or muted • IV-2: Seductive detail = slides are coherent or not • Operationalized: slides include extraneous content or not

Number of main effects and interactions

• Each IV in a factorial design can have a main effect • Each unique combination of IVs can enter into an interaction
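
A minimal sketch that turns factorial notation into counts of conditions, main effects, and interactions. The function name `count_effects` is hypothetical; the designs passed in are the ones named on the notation card.

```python
import math
from itertools import combinations

def count_effects(levels):
    """levels lists the number of levels per IV, e.g. [2, 2, 2] for a 2 x 2 x 2."""
    n_ivs = len(levels)
    # One possible interaction for every combination of two or more IVs.
    interactions = sum(
        1 for size in range(2, n_ivs + 1) for _ in combinations(range(n_ivs), size)
    )
    return {
        "conditions": math.prod(levels),  # cells in the design
        "main_effects": n_ivs,            # one possible main effect per IV
        "interactions": interactions,
    }

print(count_effects([2, 2]))     # 4 conditions, 2 main effects, 1 interaction
print(count_effects([3, 2]))     # 6 conditions, 2 main effects, 1 interaction
print(count_effects([2, 2, 2]))  # 8 conditions, 3 main effects, 4 interactions
```

This is why a 2 x 2 design tests three predictions: two main effects plus one interaction.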

Effect Size (e.g. Cohen's d)

• Effect size: indicates strength / magnitude of the effect • Helpful for determining the practical implications of the effect

Types of papers

• Empirical Research Articles • Reports one or more novel studies that test hypotheses • New data • Use APA structure • What information is covered in: • Introduction • Method • Results • Discussion • Theory papers • Introduces or advances theoretical ideas • Summarizes lots of existing data • Review papers • Comments on the state of a field of research • Meta-analyses

Quasi-Experimental Designs

• Equivalent to experimental designs but where random assignment of a variable of interest is not possible for practical or ethical reasons • Participant variables: intrinsic characteristics of participants that are measured, not manipulated. For now, dichotomize: • Ethnicity • SES • Age (young vs. old) • Drug use • Brain damage / neuro-disorders • Sexual orientation • Endless others • We are often very interested in these variables, so measure them (maybe you need a scale and then a dichotomous split) and treat them as quasi-independent variables • Quasi-IV: the different levels (categories) are determined by the researcher using some sort of criteria (who belongs in which group?) • E.g., political affiliation: Republican vs. Democrat • Matching the experimental group with the control group now becomes critical for internal validity • Match groups as closely as possible on all other relevant variables except the one you treat as the quasi-IV • Otherwise, risk of confounds! • For variables you cannot match, measure, and later control statistically

Matched Groups

• Especially when interested in special populations, it might take more effort to ensure only the IV is different between groups • E.g., autism is often comorbid with many deficits (e.g., IQ) • If you want to compare children with and without autism, measure potential confounds such as IQ, vocabulary, mental age • Only include control participants matched on those measures, then randomly assign - If you can't control it, measure it and statistically control for it later.

Within-subjects, or repeated-measures, designs

• Every participant experiences every level of the independent variable • Each participant experiences the control and experimental conditions, measure performance on both • Thus, every participant serves as their own control group • Appropriate for some research questions, not for others • When appropriate, must make additional design choices to ensure validity e.g., Stroop Effect (1935); often used as a measure of inhibitory control; people vary in how efficiently they perform the task and this could be related to a host of other psychological variables. • IV: Congruent vs. Incongruent items presented within-subjects • DV1: Response time • DV2: Response accuracy

Bar chart

• Experimental results • X-axis: IV-1 • Y-axis: DV • Different shades: IV-2 • Mean is top of bar • Error bars +/- 1SD around the mean • * denotes significance

Mounting Type I Error

• For each p-value we calculate, there is a risk of Type I Error (α) • Conducting a series of t-tests on the same dataset would inflate the Type I Error rate beyond acceptable levels • Solution: use an "omnibus" inferential statistic to determine if any differences between the means are present • If a difference is present, then use post-hoc tests to pinpoint where those differences lie
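
A minimal sketch of why the error rate mounts, assuming the tests are independent (an assumption made only for this illustration; tests on the same dataset are usually correlated). The helper name `familywise_error` is hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Three groups need 3 pairwise t-tests; four groups need 6; five groups need 10.
for k in (1, 3, 6, 10):
    print(k, round(familywise_error(0.05, k), 3))
```

With α = .05, three independent tests already carry roughly a 14% chance of at least one false alarm, which is why a single omnibus test is run first.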

Higher order factorial design

• For example, 2 x 2 x 2 • That is, 3 IVs each with 2 levels • Each IV could be between or within • Results • Main effects for A, B, C • 1 Three-way interaction A*B*C • 3 Two-way interactions: A*B, A*C, B*C • Complex! Analyze using two sets of 2 x 2 graphs!

Results

• For our purposes, state in words what the authors found with regard to each hypothesis, e.g.: • Do Experts and Non-gamers differ in cognitive performance? • They compared the two groups, unless otherwise noted. • Visual and attentional tasks • Functional field of view, Attentional blink, Enumeration - Experts scored higher, but the difference was not statistically significant • Multiple object tracking - Experts were significantly faster than Non-gamers but their overall accuracy did not differ

Histogram

• Frequencies of observed scores for one variable • Used to depict the sample distribution • Evaluate normality

External validity

• Generalizability: extent to which an effect can be obtained under theoretically unimportant differences in settings, samples, and procedures. • Robustness: the extent to which findings replicate across settings, samples, and changes to procedure • In establishing External Validity, consider the sampling technique. • Was probability sampling (random selection) used? Or was a non-probability convenience sample used? • Non-probability convenience sample might have weak external validity

Generating Hypotheses

• Hypothesis: A claim about the world that is testable and falsifiable • States a predicted (causal) relationship between variables • Must state the groups being compared • Hypotheses are often stated at the conceptual level, and predictions are operationalized: ex) It was hypothesized that bilinguals have superior cognitive inhibition capacities compared to monolingual controls, and therefore it was predicted that bilinguals would be more accurate and quicker on the Stroop task versus monolinguals. • Induction: Specific observations → general claim about people → test that claim • Deduction: General claims → specific predictions → Do observations match predictions? • Psychology is typically deductive: we start with existing theory and test the hypotheses that the theory produces

Hypothesis Testing

• H₀ (Null): No mean difference between two groups in the population • H₁ (Alternative): There is a difference between the means of two groups, in the population • Type I Error: You claim there is a difference between groups that does not actually exist - (Rejected the null when it is true) - Think FALSE ALARM • Type II Error: You failed to claim a difference between groups when there actually is a difference - (Retained the null when it is false) - Think MISS • What does your p value tell you? • The p value is the probability of obtaining data at least as extreme as yours if the null hypothesis were true - So, when we say the effect is significant (p < .05), we are saying that data like ours would occur less than 5% of the time if there were truly no difference - It is not the probability that the null hypothesis is true, nor the probability that we have made a Type I error • How do I interpret a p-value? When is an effect statistically significant? - The researcher gets to determine what level of Type I (false alarm) error rate she can tolerate. This threshold is called alpha (α). By convention, α = .05 in psychology

Post Hoc Tests

• If the ANOVA is statistically significant, you are then allowed to conduct a series of all pair-wise comparisons: • No music vs. soft music • No music vs. loud music • Soft music vs. loud music • These are basically a series of t-tests • But often with additional corrections to adjust for mounting Type I error given multiple comparisons (e.g., Tukey test; Bonferroni-adjustment)
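
The pairwise comparisons listed above can be enumerated mechanically. This sketch also applies a Bonferroni adjustment, which divides α by the number of comparisons. The variable names are illustrative only.

```python
from itertools import combinations

groups = ["no music", "soft music", "loud music"]

# All unique pairs of conditions: 3 groups give 3 comparisons.
pairs = list(combinations(groups, 2))
for a, b in pairs:
    print(f"{a} vs. {b}")

# Bonferroni: each comparison is tested against alpha / number of comparisons.
alpha = 0.05
bonferroni_alpha = alpha / len(pairs)
print(round(bonferroni_alpha, 4))  # 0.0167
```

Bonferroni is the simplest correction and is conservative. The Tukey test mentioned above takes a different approach that is not shown here.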

The "weight of the evidence"

• If the data support a hypothesis, then the larger theory may gain some support • As many hypotheses gain support, so does the larger theory • If a hypothesis is falsified, that doesn't falsify the theory • We might have to qualify the theory, amend it for certain contexts • If many hypotheses are falsified, then we may have to then reject the theory • Each finding is just a datapoint. We need to aggregate across many papers to really see if a theory holds

One-Way Analysis of Variance (ANOVA)

• Inferential statistic to use when there is 1 IV with 3 or more levels • Quasi-IV • Between or repeated-measures • One-Way ANOVA produces a single F-ratio and a p-value • Conceptually: F = Treatment Effects / Error • small F ratio = large p-value (p>.05) = no significant difference between means • large F ratio = small p-value (p<.05) = significant diff between means
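
To make the conceptual F-ratio concrete, here is a hand-rolled sketch of a between-subjects one-way ANOVA. The scores are made up for illustration, and the function `one_way_anova` is not from any library. A repeated-measures ANOVA partitions variance differently and is not covered by this sketch.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for independent groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups variability: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups variability: how far each score is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical memory scores, three participants per condition.
soft, none, loud = [8, 9, 10], [4, 5, 6], [3, 4, 5]
f_ratio, df_b, df_w = one_way_anova([soft, none, loud])
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")  # F(2, 6) = 21.00
```

Turning F into a p-value requires the F distribution, which statistics software supplies. It is omitted here to keep the sketch dependency-free.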

p-values

• Inferential statistics produce p-values: quantify the probability of the data you collected given that the null hypothesis is true • p never equals 0 • In other words, how likely are the data under the null hypothesis? • If p < α (where typically α = 0.05), then reject the null hypothesis; otherwise fail to reject the null. • That is, when p < α, the data are very unlikely given the null hypothesis, so reject the null hypothesis • Thus, conclude statistical significance when p < .05

Is the scale reliable?

• Internal consistency: Do the different items in the scale "hang" together? • Quantified by Cronbach's α (alpha), closer to 1.0 = better • Test-retest reliability: If you take the test multiple times, do you get the same results? • Inter-rater reliability: Do different observers (raters) come to the same conclusion with the same data?
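
A minimal sketch of Cronbach's α using the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total scores). The three-item, five-respondent dataset is invented for illustration, and `cronbach_alpha` is a hypothetical helper.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    k = len(item_scores)
    # Total scale score for each respondent (sum across items).
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical 1-5 ratings from five respondents on three items.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.89
```

When respondents answer the items consistently, the variance of the totals is large relative to the summed item variances, and α moves toward 1.0.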

Internal validity for experiments

• Internal validity: How sure are we that the IV uniquely caused the changes in the DV? • Crucial for arguing there is an effect of the factor you're investigating • Depends how well the study was designed, evaluate: • Construct validity: Face, content, convergent / discriminant • Operationalization of the variables • Experimental controls: use of random assignment, elimination of confounds / use of randomization

One-Way ANOVA results table

• Is there a statistically significant result? • If so, then there is a difference or differences somewhere between the means but at this point we don't know where they are

2 x 2 ANOVA results

• Main effect of cognitive load? • Yes: YES < NO. Increasing cognitive load impairs test performance • Main effect of seductive detail? • Yes: YES < NO. Presence of seductive detail in multimedia impairs test performance • Interaction of cognitive load * seductive detail? • YES: Well designed multimedia (no seductive detail) can offset the impairments to test scores caused by high cognitive load. However, the presence of both seductive detail and cognitive load produces disproportionately worse test scores. • Inferential statistics support our conclusions from the plot • If significant interaction, main effects less interpretable

2x2 Designs: Main effects and interactions

• Main effect: Holding the other variable(s) constant, is there a difference between the means of the levels of the IV on the dependent variable • Evaluate this for each IV using marginal means (to average over the other variable) • Interaction effect: Where the effect of one IV depends on the level of the other IV • Evaluate for every unique combination of IVs in the design • Have to evaluate both IVs together, use a profile plot • Is there a difference in slope? Design: • 2(Cognitive load: Yes vs. No) x 2(Seductive detail: Yes vs. No) between-subjects design on test scores Hypotheses: • Given past research, it was predicted that conditions of cognitive load would impair performance compared to those without conditions of cognitive load. Also, it was predicted that those who received slides with seductive details would have impaired performance compared to those who received slides without seductive detail. Finally, it was predicted that under conditions of cognitive load, those who received slides without seductive detail will perform better than those who do.
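
The sketch below shows how marginal means and a difference in slopes could be read off a 2 x 2 table of cell means. The four cell means are invented to mirror the predicted pattern and are not data from the study.

```python
from statistics import mean

# Hypothetical mean test scores for each (cognitive load, seductive detail) cell.
cell_means = {
    ("no load", "no detail"): 9.0,
    ("no load", "detail"): 8.0,
    ("load", "no detail"): 8.0,
    ("load", "detail"): 4.0,
}

def marginal_mean(factor_index, level):
    """Average the cells at one level of an IV, collapsing over the other IV."""
    return mean(v for key, v in cell_means.items() if key[factor_index] == level)

# Main effects: compare marginal means for each IV.
load_effect = marginal_mean(0, "load") - marginal_mean(0, "no load")
detail_effect = marginal_mean(1, "detail") - marginal_mean(1, "no detail")

# Interaction: does the effect of seductive detail differ across load levels?
detail_effect_no_load = cell_means[("no load", "detail")] - cell_means[("no load", "no detail")]
detail_effect_load = cell_means[("load", "detail")] - cell_means[("load", "no detail")]
interaction_contrast = detail_effect_load - detail_effect_no_load

print(load_effect, detail_effect, interaction_contrast)  # -2.5 -2.5 -3.0
```

A nonzero interaction contrast corresponds to non-parallel lines on the profile plot. Whether it is statistically significant is a question for the ANOVA.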

Descriptive Statistics: Measures of Central Tendency

• Mean: "average score" calculated by the sum of scores divided by the total number of scores • Median: midmost score in the series of n scores; must arrange them in rank order (lowest to highest) before finding this • Mode: score that occurs with greatest frequency
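
A short sketch of the three measures using Python's standard `statistics` module on a made-up set of scores:

```python
from statistics import mean, median, mode

scores = [3, 5, 5, 6, 7, 8, 8, 8, 9]  # hypothetical quiz scores

print(mean(scores))    # sum of scores / number of scores
print(median(scores))  # middle score once the scores are in rank order
print(mode(scores))    # most frequent score
```

`median` sorts the scores internally, so the input does not need to be ordered first.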

Correlational research methods

• Measure two continuous variables to determine if they are significantly related to each other • Quantitative research • Predict positive or negative relationships, or associations, between two scales • Can be computed from survey responses, from observations out in the 'field', or using archived data

How many participants?

• N = the total number of participants in a study • n = number of people in a condition in the study • Larger samples tend to produce more replicable results (all else equal) • Because of the law of large numbers and the central limit theorem: as a random sample gets larger, its mean tends to fall closer to the population mean, and the sampling distribution of the mean approaches a normal distribution • Sample size is a major determinant of statistical power: our ability to detect an effect in the data, should one exist • Larger samples = more power (trivial stuff becomes significant if too large) • Most samples in psychology are very underpowered

Concerns for repeated-measure designs

• Order effects: being exposed to one condition changes how people react to the next condition • Carryover effects: one condition contaminates the next • Practice effects: Participant's experience in one task makes it easier to perform a later task (even when the task is different) • Fatigue: too many tasks / measures Solutions: • Randomize the order of conditions • If too many different tasks / measures or impractical to randomize, counterbalance: create different orderings of presentation and later analyze if the order has an effect • Consider a between-subjects design if order effects are likely to be a problem
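
One way counterbalancing could be set up is sketched below: generate every possible ordering of the conditions, then cycle participants through those orderings. This is full counterbalancing, which only scales to a handful of conditions. The function names and participant IDs are placeholders.

```python
from itertools import permutations

def counterbalanced_orders(conditions):
    """Every possible presentation order of the conditions."""
    return list(permutations(conditions))

def assign_orders(participant_ids, conditions):
    """Cycle through the orders so each is used about equally often."""
    orders = counterbalanced_orders(conditions)
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}

orders = counterbalanced_orders(["A", "B", "C"])
print(len(orders))  # 6

schedule = assign_orders(range(1, 13), ["A", "B", "C"])
print(schedule[1])  # ('A', 'B', 'C')
```

Recording which order each participant received is what makes it possible to test later whether order had an effect.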

Presenting Results

• Participants that received deep processing instructions recalled an average of 9.49 (SD = 2.76) words whereas those with shallow processing instructions recalled an average of 5.50 (SD = 2.43) words. There was a significant difference in mean recall between the groups t(87) = 7.20, p < .001, Cohen's d = 1.53.
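
The reported Cohen's d can be approximately recovered from the means and SDs in the write-up. This sketch assumes equal group sizes so that the pooled SD is a simple average of the two variances. The actual group sizes are not given in the write-up, so treat this as an approximation.

```python
def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Standardized mean difference, pooling the SDs as if groups were equal in size."""
    pooled_sd = ((sd_1 ** 2 + sd_2 ** 2) / 2) ** 0.5
    return (mean_1 - mean_2) / pooled_sd

# Deep processing: M = 9.49, SD = 2.76; shallow processing: M = 5.50, SD = 2.43
d = cohens_d(9.49, 2.76, 5.50, 2.43)
print(round(d, 2))  # 1.53
```

A d of about 1.5 means the group means differ by roughly one and a half pooled standard deviations, which is a very large effect by conventional benchmarks.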

Between subject design

• Post-test only: • Participants randomly assigned to one and only one level of the Independent Variable • To the control or to the experimental condition • Then tested once on the Dependent Variable after the manipulation • Compare performance for both groups on the DV

Statistical Power

• Power: Probability of rejecting the null hypothesis when it needs to be rejected (when it is false) • β is the probability of making a Type II error (missing it) • Power is the probability of NOT making a Type II error (1-β) **Sample size and effect size are major determinants of power**
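
To see how sample size and effect size drive power, here is a rough sketch using a normal approximation for a two-sided comparison of two equal-sized groups. It is a simplification of the exact t-based calculation, and `approx_power` is a name made up for this example.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) for effect size d with n participants per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # two-sided critical value
    shift = d * (n_per_group / 2) ** 0.5       # expected test statistic under H1
    # Probability the statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

For a medium effect (d = 0.5), roughly 64 participants per group are needed to reach the conventional 80% power under this approximation. Smaller effects need far more.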

Pretest/Posttest Design

• Randomly assign participants to two different groups and test participants twice on the key DV: once before and once after exposure to the independent variable • Ensure no selection effects (differences between groups other than the IV) • Often compute a difference score: score at T2 minus score at T1 • Then compare difference scores between conditions

Descriptive Statistics: Dispersion of Scores ("spread")

• Range: difference between the highest & lowest scores • Standard deviation: average measure of how much each score in the sample differs from the sample mean
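
A brief sketch of both measures on invented scores. Note that `statistics.stdev` divides by n - 1 (the sample SD usually reported in papers), while `statistics.pstdev` divides by n.

```python
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

score_range = max(scores) - min(scores)
print(score_range)               # 7
print(pstdev(scores))            # 2.0 (divides by n)
print(round(stdev(scores), 3))   # 2.138 (divides by n - 1)
```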

Rationale: from past theory to current hypothesis

• Rationale is the progression of ideas in the introduction that leads the reader to your hypotheses • Logically connects existing theory and findings, points out their gaps, and suggests why the current hypotheses are likely to be true • Your hypotheses should feel inevitable given the past theory / research you cite • Rationale follows a "Past research has shown" to therefore "The current study will" format. • Key: The introduction section should have a clear transition from dealing with past research to introducing the current study (last one or two paragraphs of the introduction section) Introductions provide rationale • We cite past work to provide context and to build on • Often, critiquing past work, pointing out its limitations • The introduction is not a series of summaries, but an argument, your case for why your hypothesis is correct • Transition to the current study, with words like "however", "in contrast"... • Start a paragraph toward the end of the Introduction with "The current study proposes to.." or "In light of previous research, the current study proposes to..." or "Given limitations in previous research, the current study proposes to" or something like this!

Scatterplot

• Relationship between two variables (X, Y) • Each datapoint represents a participant's response for both X and Y • Evaluate for correlations • Regression line • Error around line

Null and alternative hypotheses

• Researchers are interested in the alternative hypothesis • But must always keep in mind the null hypothesis • Always a statement of equivalence, in the past tense • A description of what observation counts as a falsification of the alternative hypothesis • Do NOT write it in the paper, but you must be able to articulate it • Essential for our statistics: Null-hypothesis significance testing Example: • H₁: It was hypothesized that bilinguals have superior cognitive inhibition as indexed by significantly quicker response time and greater accuracy on the Stroop task compared to monolingual controls. • H₀: Bilinguals performed equivalently to monolinguals on the Stroop task in terms of response time and accuracy ** The null hypothesis is not that the IV had no effect on the DV (although that can happen); it's that both levels of the IV have an equivalent effect on the DV

Discussion

• Restates the hypotheses, explains how the collected data fits with the hypotheses and the larger theoretical landscape • States potential limitations • Discusses importance of the results, and implications of the results for theories and practice • Suggests future directions of the research

Method

• Series of subsections, often • Participants • Design • Materials / Measures • Procedure

Statistical decision making: Can I reject the null hypothesis?

• Set the α (alpha) level: How much Type I error can I tolerate? • You select your α • In psychology, by convention, α = 0.05 • α = .1 is more liberal, α = .01 is more conservative • The more we try to minimize one error, the more we risk the other

Probability Sampling

• Simple Random: every individual in the population gets a number, randomly select numbers • Systematic Random: every individual in the population gets a number, select every nth individual Concerns: not practical, often don't have access to such a large population, not enough money or resources
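
The two sampling schemes can be sketched as follows, assuming the population is available as a numbered list. The function names, the population of 100, and the seeds are illustrative choices.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k individuals at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=None):
    """Pick a random starting point, then take every step-th individual."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

population = list(range(1, 101))  # individuals numbered 1..100

simple = simple_random_sample(population, 10, seed=1)
systematic = systematic_sample(population, 10, seed=1)
print(sorted(simple))
print(systematic)
```

Both approaches require a complete list of the population, which is the practical obstacle noted above.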

Student's t-test

• Single factor designs: test the effect of 1 IV with n levels on the DV • One IV with 2 levels • DV is continuous (interval or ratio scale) • 2 levels = t-test • 3 or more levels = ANOVA • Between-subjects = Independent Samples t-test • Within-subjects = Dependent Samples t-test • AKA paired-samples t-test • Compares means (calibrated for SD) from each condition, produces a p-value • We use the p-value to guide our statistical decision making
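
As an illustration of "compares means, calibrated for SD", here is a hand computation of an independent-samples t statistic with pooled variance. The two groups of scores are invented, and `independent_t` is a hypothetical helper rather than a library function.

```python
from statistics import mean, variance

def independent_t(group_1, group_2):
    """Return (t, df) for a between-subjects comparison assuming equal variances."""
    n1, n2 = len(group_1), len(group_2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    standard_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t = (mean(group_1) - mean(group_2)) / standard_error
    return t, n1 + n2 - 2

experimental = [8, 9, 10, 11, 12]
control = [4, 5, 6, 7, 8]
t, df = independent_t(experimental, control)
print(f"t({df}) = {t:.2f}")  # t(8) = 4.00
```

A dependent-samples (paired) t-test would instead be computed on each participant's difference score. Converting t to a p-value requires the t distribution from statistics software.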

Between vs Within

• Some designs are more appropriate than others for a given hypothesis, but often it's a judgement call • Evaluate the pros and cons of each choice • Key class concept: Design choices determine what statistic you need to analyze the data • Number of IVs? • Whether IVs are between or within? • DV continuous or discrete?

Measure or Manipulate?

• Some variables you can choose to measure or manipulate • Your theory and hypothesis will determine which way is best, but sometimes it's a judgement call • Example: Self-esteem • Measure trait self-esteem (Rosenberg scale) then use scores to create quasi-IV (low vs. high) • Induce high or low state self-esteem (false feedback) as a true IV

Random Assignment

• Something besides the IV varies by condition! The participants! • What if the differences observed on the DV are a result of who was in that group and the IV had no causal role? • Random assignment: participants are equally likely to experience either level of the IV. • Flip a coin • Have a computer randomly assign participants • Let's say half of the participants are alert (green) and half are sleepy (orange) • Randomly assigned to experimental or control group, equally spread out the effects of alert vs tired across the whole sample - Controls for an infinite number of variables that differ by participants, such that we may isolate the effect of the IV on the DV!* *With sufficient sample size
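
"Have a computer randomly assign participants" might look like the sketch below: shuffle the participant list, then deal participants into conditions in turn so the groups end up equal in size. The function name, condition labels, and seed are assumptions for the example.

```python
import random

def randomly_assign(participants, conditions=("control", "experimental"), seed=None):
    """Shuffle participants, then deal them into conditions one at a time."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(participant)
    return groups

groups = randomly_assign(range(1, 21), seed=42)
print(len(groups["control"]), len(groups["experimental"]))  # 10 10
```

Because the shuffle ignores every participant characteristic, traits such as alertness are spread across conditions by chance alone, and more evenly so as the sample grows.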

Interval Scales (NOIR)

• Spacing between values is meaningful; the distance between two points is exactly known and consistent • Zero point does not indicate absence of the quantity; 0 does not mean nothing e.g., Likert scale, Celsius

Design

• Specifies the independent variables, their levels, and whether they are between or within subjects • Specifies the dependent variables

Minimizing Confounds

• Standardization of procedure • Standardize (as much as possible) the testing room, the experimenters, time of experiment, etc. • Design minimally different control conditions or control experiments • "Do nothing" is often not a good control condition

The Replication Crisis

• Starting around 2010, renewed interest in trying to replicate existing studies in psychology, which then spread to other life sciences • Many studies failed to replicate • This forces the field, and us, to carefully think about how we conduct our science. Factors behind replication crisis: • The incentive structure to have a job in academia: publish or perish • It's not outright fraud, that is still rare, but people cut corners • Misuse of statistics, especially null hypothesis significance testing • The p-value is deeply misunderstood by most • File drawer problem: only "positive" results get published, so we don't really know about all the failed experiments out there or what an area of research really looks like • "Researcher degrees of freedom": Researchers often make arbitrary decisions that have undue impact on study outcome SOLUTIONS: • "Open Science": Researchers should make all their data and materials available -- This wasn't previously the case! • Total transparency with all analyses and experiments conducted: some journals are now interested in publishing replications and null results • Pre-registration: Publicly post, in as much detail as possible, your hypotheses, your methods, and your statistical analysis plan BEFORE you collect data. Then note in the final paper where you deviated from the pre-registration.

Write up results

• Test score means were submitted to a 2(Cognitive load: Yes vs. No) x 2(Seductive detail: Yes vs. No) between-subjects analysis of variance (ANOVA). Results revealed main effects of cognitive load [F(1, 60) = 27.69, p < .001] and seductive detail [F(1, 60) = 13.14, p < .001]. These effects were qualified by an interaction between cognitive load and seductive detail [F(1, 60) = 14.89, p < .001], such that the absence of seductive detail offset the impairment caused by high levels of cognitive load on test scores. On the other hand, the presence of seductive detail and cognitive load differentially impaired test scores.

Descriptive research methods

• The goal is to describe behaviors / events • Qualitative research • Sociology / Anthropology • Ethnography, interviews • Psychology • Case studies such as Oliver Sacks on Clive Wearing Strengths and Weaknesses: Strengths: • Beautiful, rich presentations of human behavior and life in its full natural, real contexts • Case studies can illustrate theories at their most extreme • E.g., the hippocampus must have something to do with episodic memory because when that brain area is destroyed, one loses their episodic memory (the case of patient H.M.) Weaknesses: • Case studies often have poor generalizability • Often no controls or comparison groups, which limits causal inference • Evocative, but rarely conclusive

Descriptive Statistics: Outliers & Skew

• The mean is directly affected by the magnitude of each score in the distribution, thus its sensitivity to individual score values makes it susceptible to outliers and extreme scores - Not always the best measure of central tendency in skewed or bimodal distributions - One of the most common measures

Inferential Statistics

• The means look different from each other, but we need to test if they are significantly different from each other. • Or are the apparent differences due to chance alone (e.g., sampling or measurement error)? • Inferential statistics allow us to make claims about the population using the descriptive statistics computed from your sample • Allow us to conclude if two means are actually different from each other or if the differences are due to error (chance) • Statistical decision making: Do I have enough evidence to reject the null hypothesis?

Null Hypothesis Significance Testing (NHST)

• The primary type of inferential statistics used in psychology • CRITICAL POINT: The design of your experiment determines which inferential statistic you need to use 1. What is the scale of measurement of the DV? (NOIR) 2. Number of Independent variables and how many levels to each? 3. For each IV, is it between or within -subjects? • Statistical decision making: Do I have enough evidence to reject the null hypothesis?

Introduction

• The purpose of the study, why it is important to conduct / briefly note the real-world importance • Defines key concepts, constructs concisely but accurately • The theoretical framework that motivates the research with citations • The new contribution the research brings and the relationship between the current study and previous theory/research; what is the rationale*? • A note on the general methodology taken • The hypothesis(es) under test and the predictions that follow from that hypothesis *It is NOT a series of summaries of other work, it is a logical argument that justifies why the hypothesis should be correct...

Operationalization

• The translation of theoretical constructs into measurable observations e.g., depression; a theoretical construct that we must infer through observations

Proposed Statistical Analysis

• This study used a 2(Cognitive load: Yes vs. No) x 2(Seductive detail: Yes vs. No) between-subjects design on the DV of test score. Therefore, test score means will be entered into a 2(Cognitive load: Yes vs. No) x 2(Seductive detail: Yes vs. No) between-subjects analysis of variance (ANOVA) to test for both main effects and the interaction.

Design section of an APA paper

• This study used a 2(Cognitive load: Yes vs. No) x 2(Seductive detail: Yes vs. No) between-subjects design. The dependent variable was scores on a post-test about climate change knowledge.

True Experiment Designs

• Two primary features of experimental design: 1. Independent variable: the researcher manipulated 'something' and that is the only thing that differs between conditions • Experimental group vs. control group 2. Random assignment of participants: each participant is equally likely to experience either level (condition) of the IV

WEIRD

• USA undergraduates are by far the most common population sampled in psychology, because of convenience. • Are you representative of the world's humans? • Western • Educated • Industrialized • Rich • Democratic Introduces replication bias

Who to recruit?

• We aim for a representative sample of the population of interest • The sample accurately reflects the characteristics of the population • Allows us to infer things about the population without studying every single person in the population • The appropriate population depends on the study question • Cross-cultural work: need to sample people from different societies • Autism: need to sample those with autism • Development: sample infants • Often, however, any "neuro-typical" humans will do, like college undergraduates → "convenience samples"

Experimental Design Example

• We are interested in a drug that reduces the frequency of headaches • Hypothesis: Those who receive the drug will have significantly fewer headaches compared to those who receive a placebo. • IV: randomly assign participants to receive the drug or placebo (double-blind) • Note, this is categorical • DV: number of headaches over a week's time • Note, this is continuous

Only the IV varies by condition

• We conclude the IV causes changes in the DV if: a) The changes in the IV co-vary (correlate) with changes in the DV b) The change in the IV occurs before change in the DV c) Necessity: If the IV doesn't change, the DV doesn't change (on average) d) Sufficiency: Every time the IV changes, the DV changes (on average) e) There is a logical, theoretical explanation for why the IV might change the DV f) Confounds have been eliminated (no alternative explanations)

Psychology requires samples

• We rarely can study an entire population we are interested in • So we need to recruit samples, or subsets, of people from the population of interest

Procedure

• What did the participants do in the study, and in what order? • Step by step, describe what participants experience as they go through the study. • What do participants experience in different conditions? Describe how subjects were divided into different groups (e.g., random assignment or on the basis of some observed characteristic) • How are manipulations conducted? How are responses recorded?

Experimental research methods

• Where we manipulate an independent variable (IV) and measure its effects on a dependent variable (DV) • Independent variable (IV): the experimenter sets the values - this is what the experimenter controls • Each IV has at least two different levels (also called conditions, groups) • Participants are assigned to condition • We seek to demonstrate a difference in performance on some measure between groups (e.g., experimental vs. control) • Dependent variable (DV): what the experimenter observes and records. Experimenter has no control of this. • An experiment can have one or more DVs • Experiments aim to demonstrate causality (the IV has a causal impact on the DV) Strengths and Weaknesses: Strengths: • When properly done, can demonstrate the causal factors and mechanisms underlying behavior • Psychology places a big premium on causal findings • Experiments often allow for more precise observations: controlled environments, comparison groups, experimental controls, etc. Weaknesses: • Can be highly artificial, experimental conditions often do not match the "real world" • Experiments often have unknown generalizability to other populations

2 x 2 within-subjects design: worked example

• Within-subjects: every participant experiences every level of every IV • Last summer: in-class Taste Test Experiment • Every participant sampled (in random order) • Gummy bears • Sour bears • Gummy worms • Sour worms • Rated each for taste (yummy-ness) from 1 (yuck) to 10 (yum) https://gauchospace.ucsb.edu/courses/pluginfile.php/9646795/mod_resource/content/1/Topic12_FactorialDesign.pdf

