Psy 120L Exam Lec 1-2


Hypothesis Testing - Significance

*Significance* expresses the likelihood that the observed effect would occur by chance alone • p = .05 is the standard alpha level for significance - If p < .05, then the null hypothesis should be rejected • However, significance DOES NOT tell you how big the effect is... • p = .001 is NOT "more significant" than p = .049

Type I Error

You claim there is a relationship between variables that does not actually exist (Rejected the null when it is true)

Evaluate

-How valid is the study? -How confident can we be in the results? -What do the results tell us about the behavior in the real world?

Correlation CANNOT Imply

CAUSATION. Why not? • Third variable - a variable that you did not measure that is significantly correlated with both variable X and variable Y may be responsible for the relationship • The classic example: can you conclude that buying ice cream causes murderous behavior? The third variable could be heat - in the summer, murder rates go up • For more weird correlations, google "spurious correlations"

The process of soliciting feedback from the driving instructors is intended to improve which aspect of the questionnaire's validity?

Content validity. It concerns whether the content on the survey covers all facets of what you are measuring.

Within-subjects example - one factor/single factor (Pearson et al., 2003) Result

Does higher dosage lead to better inhibitory control? YES! Post-hoc tests* revealed: Placebo < .30 mg/kg and .60 mg/kg; .15 mg/kg < .30 mg/kg and .60 mg/kg *What are post-hoc tests? They are additional statistical analyses used to tell you WHERE the differences are between your conditions. (Remember that a significant F only tells you that there is a difference, but you don't know where.)

(Fiorella & Mayer, 2013) Experiment 1 Hypothesis

Experiment 1 Hypothesis: Both *Preparing to teach* and *Learning by teaching* will improve students' comprehension of to-be-learned material on an IMMEDIATE test of materials as compared to controls. • Prepare > Control • Teach > Control Results: F(2, 90) = 5.92, p = .004 Does IMMEDIATE comprehension performance differ significantly across groups? YES! Teaching group & preparation groups outperformed control group, but did not differ significantly from each other in immediate testing performance • Teach = prepare > control on IMMEDIATE comprehension exam

Multiple IVs vs. Moderators Multiple IVs

Main effects: • The effect of IV #1 on a DV • The effect of IV #2 on a DV Interaction: • The effect of one IV on a DV is different for the different levels of another IV

Measure

Once our variables are operationally defined, we can manipulate or measure them in our experiment

Measure - Potential Problems

The whole point of running an experiment is to determine whether X caused the differences in *Y -->* to infer causality, we need to rule out alternative explanations • What if the subjects in the different conditions were different types of people? -If you used some rule to decide who got assigned to the experimental condition (e.g. those who show up early), this would be correlated with X, and would be a problematic third variable...or *confound*. -To solve this problem, subjects must be *randomly assigned* to the various levels of X -Thus, participants in different groups should have random variability on any possible confounding variable

Hypothesis Testing - Errors Types

Type I Error: false positive ("you're pregnant" when you're not) Type II Error: false negative ("you're not pregnant" when you actually are)

Correlation Effect Size

how related are they to each other? • In a Pearson correlation, r can tell you how strong the linear relationship is • Cohen's (1988) conventions: Small .10 Moderate .30 Large .50

How many effects and interactions are there?

Two main effects and 0 interactions (parallel lines on the graph) -Remember the lines are parallel, so the effect of one IV is the same across the levels of the other IV, meaning there is no interaction

Multifactorial Design ex

• 2 x 2 between subjects example

Hypothesis Testing - How do we do it?

*Inferential Statistics* - what we use in hypothesis testing to make conclusions about our sample data (typically to infer how the whole population behaves) • Correlation • T-test • ANOVA • Need 3 pieces of information: • Significance • Effect size • Power

Multifactorial Design Features

*Main Effect(s)* • How each independent variable affects the dependent variable • Each IV has an effect on DV, regardless of the level of your other independent variables • # of IVs = # of main effects Interaction • When two or more independent variables work together to affect the DV • Interaction effect: The effect of IV 1 on the DV is different for the different levels of IV 2 -How both your IVs impact your DV • # of interactions = # of combinations of 2 or more IVs • For the purposes of this class, we will focus on interactions in 2 x 2 designs.

Examples of falsifiability

-Is this falsifiable: "All cars are red"? Yes, by finding one non-red car -Is this falsifiable: "There exists a green swan"? No

Statistical Tests for Multiple IVs vs. Moderators -2 IVs

2 Main effects and 1 Interaction • 2-way ANOVA • You will get 3 F-values and 3 associated p-values • Post-hoc if necessary • To do a planned comparison (compare specific conditions) you need: A hypothesis that looks at two or more groups compared to each other • A significant F for the interaction
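Not from the lecture, but a minimal sketch of what a 2-way ANOVA could look like in Python with statsmodels; the column names (mood, ad_type, persuasion) and the scores are invented for illustration only.

```python
# Hypothetical sketch of a 2 x 2 between-subjects (2-way) ANOVA with statsmodels.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "mood":       ["negative"] * 4 + ["positive"] * 4,   # invented data
    "ad_type":    ["emotion", "emotion", "reason", "reason"] * 2,
    "persuasion": [6, 5, 7, 6, 5, 4, 3, 2],
})

# C() marks a categorical factor; "*" expands to both main effects plus the
# interaction, so the table printed below contains 3 F-values and 3 p-values.
model = ols("persuasion ~ C(mood) * C(ad_type)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```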

In a 2 x 2 mixed factorial design, how many separate groups of participants would you most likely have?

2 groups of participants -remember mixed means one within and one between

Hypothesis Testing - Drawing Conclusions • You are justified in concluding NO relationship between 2 variables/ NO differences between experimental & control group when you have...

A very small average effect size • A significance test that does not reach an acceptable p-value • Power that is acceptable -NOTE: If you have a small sample size, you probably don't have very much power, so you should NOT conclude there was no effect even if your statistical tests don't result in significant p-values. Instead, it is better to conclude that you did not have enough power to detect an effect-->you should talk about this in your discussion • Regardless of what you conclude, you should address all three things in your discussion in your paper

Within-subjects example - one factor/single factor (Pearson et al., 2003)

Background: This study investigated the cognitive effects of stimulant medication in children with ADHD. Does higher dosage lead to better inhibitory control? • Participants: 6 children (mean age = 10.9 years, SD = 2.4) • Procedure: • Children were given 4 dosages of a drug, methylphenidate (MPH), each for 1 week, and were tested after each week. The order of doses was counterbalanced so that each dose appeared equally often in each order. -Dosages: .15 mg/kg, .30 mg/kg, .60 mg/kg, and a placebo (a within-subjects design) • Main DV = Delay of Gratification task. They were told that a star would appear on the computer screen if they waited "long enough" to press a response key, which measured the ability to suppress or delay impulsive behavioral responses

Pre-existing individual differences as a (potential) confound apply to

Between-subjects designs

Within-subjects example - two-factor (Anderson, Benjamin, and Bartholow, 1998) Their Hypothesis and results

DV = reaction time to say target word after the primed word Their hypotheses: • Main effects: They made no predictions for main effects for either the Prime word or Target word • Interaction: "The priming hypothesis predicts that the relative accessibility of aggressive thoughts (average RT to non aggressive target words minus average RT to aggressive target words) will be greater in the weapon-prime than in the nonweapon-prime condition" • In other words, when a non-weapon prime is presented, the response time to say the target word will not differ, whereas when a weapon prime is presented, the response time to say an aggressive word will be much faster than a non-aggressive word. • Results: • The interaction was statistically significant, F(1, 31) = 4.72, p < .04. In other words, on animal-prime trials, participants were slightly slower (5 ms) at naming aggressive words than nonaggressive words, but exposure to weapon primes reversed this pattern, enabling participants to name aggressive words 9 ms faster than nonaggressive words. • Neither the main effect of prime stimulus, F(1, 31) = 1.97, p > .15, d = .15, nor the main effect of word type, F(1, 31) = 0.21, p > .5, d = .05, approached significance.

Three Research Designs

1) Descriptive - Goal: to simply describe behavior or phenomena. Advantages: provides a relatively complete picture of what is occurring at a given time; allows development of questions for further study. Disadvantages: does not assess relationships among variables; may be unethical if participants do not know they are being observed.
2) Correlational - Goal: to assess relationships between and among two or more variables. Advantages: allows testing of expected relationships; can assess relationships between many everyday life events. Disadvantage: cannot be used to draw inferences about causal relationships between and among the variables.
3) Experimental - Goal: to assess the causal impact of one or more experimental manipulations on a dependent variable. Advantage: allows conclusions to be drawn about *causal relationships* among variables. Disadvantages: cannot experimentally manipulate many independent variables; may be expensive and time consuming; lab experiments may not reflect real world events.

Within-subjects designs - Problems

A form of order effect occurs when systematic changes in performance occur *as a result of completing one sequence* of conditions rather than a different sequence • Carryover effects - may affect *internal validity* -Order effects: what is presented 1st can have an influence on subsequent items/tasks (can also happen in between-subjects designs): the first task will influence the second -Practice effects: a participant's *experience* in one task makes it easier to perform a later task (even when the task is different) -Interference effects: performing one task *disrupts* performance on a 2nd task • Reduce carryover effects through *counterbalancing* - varying the order in which different tasks are completed (see the sketch below) • Fatigue - participants become more fatigued and bored the longer they have to engage in tasks -Ways to reduce these problems: give people breaks, run fewer trials
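A minimal sketch of counterbalancing in Python; the condition labels are placeholders, and the simple rotation shown here is just one way to vary order (full permutations also work when the number of conditions is small).

```python
# Hypothetical sketch: generate counterbalanced orders so each condition appears
# equally often in each serial position (a simple Latin-square-style rotation).
conditions = ["placebo", ".15 mg/kg", ".30 mg/kg", ".60 mg/kg"]  # placeholder labels

def rotated_orders(conds):
    n = len(conds)
    return [[conds[(start + i) % n] for i in range(n)] for start in range(n)]

for participant, order in enumerate(rotated_orders(conditions), start=1):
    print(f"Participant {participant}: {order}")
```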

Descriptive Research Examples

Goal is to simply describe behavior or phenomena Case Study • Often conducted on unusual or unique phenomena - Phineas Gage, H.M., split brain studies • Piaget observed his own children in many of his studies to develop his stage theory of cognitive development Population surveys • Summarize the distribution of scores on a measured variable - height, weight, political views • Epidemiological research - analyses of the distribution and determinants (who, when, where) of health and disease conditions Marketing research • Research used to identify and define marketing opportunities and problems • Could describe who, when, where people use a specific product

Interpreting Stats Results Review

Hypothesis testing • Significance • Effect size • Power • Type I and Type II Errors • Drawing conclusions • Reading mean tables/graphs

Moderator Example (Romero-Canyas, Downey et al., 2010) Hypotheses

IV: Nature of online interaction (acceptance, mild/harsh rejection) • Moderator: Rejection sensitivity • DVs: Hostility, negative mood, feelings towards the group • Hypotheses? • What could they have predicted about the effect of the IV on the DVs? •People who receive (IV) mild or harsh rejection feedback will report more (DV) negative mood, hostility, and negative feelings towards the group than people who receive (IV) acceptance feedback • What could they have predicted about the effect of the moderator on the DVs? • (M) HRS individuals will report more (DV) negative mood, hostility, and negative feelings towards the online community compared to (M) LRS individuals

Revise or Replicate

If findings do not support our hypotheses, we revise our hypotheses or operationalizations and go from there • In scientific research, we require that findings be replicable to ensure that our results were not due to statistical error. • Researchers should strive to test hypotheses both in the lab and in the field -Internal validity is easier to attain in the lab -External validity is easier to attain in the field

What makes good Psychological Research?

It must be testable (Can test a theory with hypotheses and the scientific method) and it must be falsifiable

Hypothesis Testing - Graphs

Like mean tables, just looking at a graph can give you a good estimate about whether you have main effects or interactions, and can help you interpret your data -If the graph has two parallel lines: No main effect of variable A • Main effect of variable B • No interaction -If the line graph crosses like an X: No main effect of variable A • No main effect of variable B • Interaction

Multiple IVs vs. Moderators Moderators + IV

Main effects: • The effect of IV #1 on the DV • The *relationship* of the moderator (quasi or continuous variable) to a DV Moderation: (this you can't manipulate) • The strength of the effect of the IV on the DV is different based on the level of the moderator -We are not interested in a moderator effect. You only need to state a moderator hypothesis to help you understand these relationships

Experimental Design

Manipulation of at least 1 independent variable (IV) and measurement of a dependent variable (DV) • IV = variable whose values are chosen and set by the experimenter [this is what is manipulated in the experimental group(s)] • DV = variable whose value you observe and record • Aim is to show causality Today we will talk about a *one-factor between-subjects* experimental design • 1 IV that contains 2 or more levels • 1 DV

Hypothesis Testing - Reading a mean table for a 2x2 • Comparing marginal means estimate main effects and comparing condition (cell) means estimate interactions Second part

Means represent level of self esteem Main effects found by examining differences between marginal means

Mixed Design Example (Mackie & Worth, 1989)

• You are interested in examining the effects of a person's mood and the type of appeal used in a speech on how persuasive the argument is • Theory: Positive mood diffuses attention leading to less persuasiveness regardless of how strong or weak the argument is • Procedure • You randomly assign participants to a happy or negative mood induction • Then they read two environmental speeches with an appeal to emotion and an appeal to reason (counterbalanced) • Finally, they rate how persuasive each ad is on a scale from 1-9
Mean (SD) Persuasiveness:
AD TYPE            Negative Mood   Positive Mood   Marginal Means
Emotional Appeal   5.6 (.18)       4.65 (.17)      5.13
Reasoned Appeal    6.1 (.17)       2.56 (.17)      4.33
Marginal Means     5.85            3.6

Between-Subjects Designs

One factor designs: • Examines the effects of 1 IV (with 2 or more levels) on 1 DV • Uses one-way ANOVA to examine differences between groups, and post-hoc tests (e.g. LSD, Tukey) to find where the differences are • Can also use t-test if you only have 2 levels of your IV *Multifactorial designs: • More than 1 IV (or factor) • Uses factorial ANOVA to examine the effects - which include main effects and interactions • (Can also use multiple linear regression, but we won't be using that for this class)* -for more complicated designs
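A minimal sketch of a one-way between-subjects ANOVA followed by a Tukey post-hoc test in Python; the group labels and scores are made-up example data, not results from any study.

```python
# Hypothetical sketch: one-way ANOVA (omnibus test) followed by Tukey post-hoc tests.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

no_caffeine   = [12, 14, 11, 13, 12, 15]   # invented memory-test scores
low_caffeine  = [15, 16, 14, 17, 15, 16]
high_caffeine = [18, 17, 19, 20, 18, 17]

# Omnibus test: is there any difference among the three group means?
f_val, p_val = stats.f_oneway(no_caffeine, low_caffeine, high_caffeine)
print(f"F = {f_val:.2f}, p = {p_val:.4f}")

# Post-hoc test: WHERE are the differences?
scores = np.concatenate([no_caffeine, low_caffeine, high_caffeine])
groups = ["none"] * 6 + ["low"] * 6 + ["high"] * 6
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```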

Researchers are interested in examining the effect of caffeine on memory. There are 3 different groups; after 20 mins they perform a short memory test. What statistical analysis should be conducted?

Run a one-way between-subjects ANOVA

4-hour sleep condition one night, then 2-hour sleep condition the next / You have 1 IV, so...

Run a one-factor within-subjects ANOVA

Hypothesis Testing - Going from a mean table to a graph

Start with all 4 condition means • One IV will be on the x-axis, one IV will be separate bars or lines • Plot each mean for all 4 conditions • Make sure you label the axes correctly for each variable • If you graph it with a line graph: • If lines are parallel, then there is no interaction • Main effects are the two lines being separated from each other or the points at each end being separated • If you graph it with a bar graph: • If the 4 bars are not too different in heights, then there is no interaction • Main effects are the matching bars being different from each other
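A minimal sketch of turning a 2x2 mean table into a line graph with matplotlib; the condition labels and means are placeholders chosen so the lines cross.

```python
# Hypothetical sketch: plot a 2 x 2 table of condition means as a line graph.
import matplotlib.pyplot as plt

conflict_levels   = ["Low conflict", "High conflict"]   # IV on the x-axis
low_commit_means  = [4.0, 6.0]    # one line per level of the other IV (invented)
high_commit_means = [6.5, 3.5]

plt.plot(conflict_levels, low_commit_means, marker="o", label="Low commitment")
plt.plot(conflict_levels, high_commit_means, marker="o", label="High commitment")
plt.xlabel("Conflict condition")
plt.ylabel("Mean self-esteem")
plt.legend(title="Commitment")
plt.show()   # parallel lines suggest no interaction; crossing lines suggest one
```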

(Fiorella & Mayer, 2013) Summary

Teaching and preparing to teach led to better learning than only studying on an immediate test -Teaching and the mere act of preparing to teach may affect the tutor's cognitive processing of the information, which leads to better learning outcomes Teaching led to better learning than preparing to teach and only studying on a delayed test -Actually teaching may lead to deeper processing that leads to longer retention of information • Possibly a new study technique to try for your exam in this class?

Theory must be stated so that it can be falsified (what does this mean?)

Theory must be stated so that it can be falsified • Theory can only be falsified, NOT PROVEN CORRECT • Theory can evolve from adjustments to previous tests/formulations • If theory/hypotheses are not supported, then the theory in its current form might be wrong, and may need to be retested, revamped, or scrapped

Real Multifactorial Example (Prybylski & Winstein, 2012) • How many main effects does this study have? -Conversation type -Presence of phone • How many interactions does this study have? -Conversation type X presence of phone

Two IVs, two main effects, one interaction • Meaningful conversation will improve relationship quality • Presence of phone will decrease relationship quality • Meaningful conversation with the presence of a book and absence of a phone will improve the relationship quality rating the most • Casual conversation with the presence of a phone will decrease relationship quality the most.

Hypothesis Testing - Drawing Conclusions

You are justified in concluding there is a relationship between 2 variables or that there are differences between conditions when you have... • An appropriate effect size • A significance test that reaches an acceptable p-value • Power that is acceptable -Best case scenario: You ran a power analysis and you ran enough subjects to be confident in the power in your study -If power is small, you cannot conclude there is no effect just because your test was not significant

Hypothesis

an explicit, testable prediction about the conditions under which an outcome will occur How to develop a ______: • Inspiration from previous theories and research • Personal observation Key Features: -Contain at least *2 Concepts* and a statement of *the relationship* between them • They can be operationalized and tested (theories can only be tested by testing _____)

Moderators

• Sometimes experimenters hypothesize that the characteristics/attitudes/predispositions that people bring into the lab may affect how they will react to the experimental manipulation. • A moderator variable is a continuous or categorical variable that you measure (you never manipulate it!) and is used to potentially explain how individual differences in participants might impact the main relationship you are testing (IV--->DV) • It affects the STRENGTH of the relationship between the IV and DV • Example: -Age - young participants and old participants may be affected differently by the manipulation IV-------------------->DV ^Moderator

Correlation -Significance

• Statistical significance - p < .05 • Remember that if your p-value happens to be < .001, it is NOT "more significant" than p < .01 or p < .05! • You need to look at the effect size to determine HOW STRONG the relationship is
p greater than or equal to .10 -> No, not significant
p < .10 -> No, but there is a trend ("marginally significant")
-------------------------------------------------
p < .05 -> YES! Significant
p < .01 -> YES! Significant
p < .001 -> YES! Significant

Hypothesis Testing - Errors

• The p-value estimates the probability of making a *Type I error* -So, when we say the effect is significant (p < .05), we are actually saying that there is less than a 5% chance that we have made a *Type I error*; less than 5% chance that we claimed a relationship between variables and the relationship does not actually exist -Can decrease likelihood by picking a smaller alpha level • Power is the probability of NOT making a *Type II error* -So the probability of making a Type II error is 1 - power. -If you have power of .80, what are the chances you made a Type II error? (1 - .80 = .20, a 20% chance) -Adequate sample sizes are key!

Reading a Graph Example • What would this look like in a bar graph?

• What would this look like in a bar graph? Look at the means and imagine averaging across the two variables -Commitment looks like it averages to about the same (brown averaged with orange) so I would expect no main effect of commitment -Conflict looks like averages will be different (orange bars averaged vs. brown bars averaged) so I would expect a main effect of conflict

Scientific Method and Validity - Five Steps of the scientific method (HOMER)

" Hypothesize: explicit prediction testable about the conditions under which an outcome will occur " Operationalize: how to develop a. hypothesis -inspiration from " Measure: Once our variables look out " Evaluate " Revise or Replicate

Validity clarification

*Construct Validity*: The extent to which operationalizations of a construct measure that construct as *defined by a theory*. • To assess construct validity, you would statistically assess its convergent validity (test correlates with other measures of THE SAME or Very Similar Constructs) and *divergent validity* (test does not correlate with measures of DIFFERENT constructs). -For example, a measure of romantic love should only measure romantic love, not other related but different constructs, like liking in a friendship. *Content Validity*: Does the test contain items from the desired "content domain?" Does it measure every element or facet of a construct? • Based more on "experts" in that content domain, not statistical tests -For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature? • An element of subjectivity exists in relation to determining content validity -For example, determining the content validity of a personality test requires a degree of agreement about what a particular personality trait, such as extraversion, represents *Face Validity* Does the test "look like" a measure of the construct? • It does not guarantee that the test actually measures phenomena in that domain. • Asking a participant who does not know the purpose of a test whether they thought that test measured the construct you were measuring could tell you how face valid it is. • Note: Low face validity is not necessarily good or bad. -If the test has high face validity, participants will likely know what the questionnaire is trying to get at, so they can use that context to help interpret the questions and provide more useful, accurate answers -However, they may also try to change their answer depending on what they think we want *--->demand characteristics*

Moderator Example (Romero-Canyas, Downey et al., 2010)

*Research question* about the formation of online communities: How does interaction with new and established members of a group affect how the person perceives the group and him/herself? • Participants were told they would be assigned to a "compatible" group based on their responses to a questionnaire & would receive emails from other group members • Randomly assigned to receive emails with... -Acceptance -Mild rejection -Harsh rejection • The researchers measured the participants' hostility, negative mood, feelings toward the group after receiving the emails • IV: Nature of online interaction (acceptance, mild/harsh rejection) • DVs: Hostility, negative mood, feelings towards the group • The researchers also thought that another variable could affect how they react to the IV... -Rejection sensitivity (RS) = Propensity to anxiously expect, readily perceive, and overreact to rejection -Research has shown there are individual differences in rejection sensitivity -Some people are high in RS (HRS) and some people are low in RS (LRS) • MODERATOR = Rejection Sensitivity

Mixed design experiment example

All subjects are given a pre-test and a post-test (within-subjects IV) • Subjects are divided into two groups: experimental vs. control (between-subjects IV) • Example: You are interested in examining the effects of a new type of cognitive therapy on depression. First, participants would complete a depression questionnaire (pre-test). Then they would be assigned to a treatment or control condition. Finally, they would complete the depression questionnaire at the end (post-test). IV1 is between subjects - need to randomly assign participants to groups. IV2 is within subjects - need to randomize the order in which they are exposed to each treatment/condition (**counterbalance the order of presentation)

Operationalize example

An example from Spencer, Steele, & Quinn, 1999: Stereotype threat • Conceptual definition: the fear that we will confirm the stereotypes that others have regarding some salient group of which we are a member • "When women perform math, unlike men, they risk being judged by the negative stereotype that women have weaker math ability. We call this predicament stereotype threat and *hypothesize that the apprehension it causes may disrupt women's math performance*." • How can they operationalize STEREOTYPE THREAT and MATH PERFORMANCE in this context? (The threat is either there or it is not, rather than a quantity; one way to induce it is to give participants a short paragraph saying women are bad at math) Conceptual ---> Operational: -Gender stereotype threat ---> Test instructions said this test has shown gender differences in the past -Stereotype-relevant math performance ---> Performance on math questions from the math GRE Hypothesis (stated with conceptual definitions): For women, when *the gender stereotype* is present, *math performance* will suffer, while for men, when the gender stereotype is present, math performance will improve. Hypothesis (stated with operational definitions): For women, when *the test instructions say the test has shown gender differences in the past*, this will lead to *worse performance on a portion of the math GRE*, whereas men who receive instructions that said the test has shown gender differences in the past will perform better on a portion of the math GRE.

Construct validity example:

Ella is interested in studying the impact of shyness on making a good first impression in early dating relationships. She sets up a study in which participants engage in a speed dating session. Each participant and each "date" in the speed dating session RATES INTEREST AND LIKING of the people they meet. Here is the measure Ella used to measure *interest and liking* (Likert scale 1-7): 1) How cute do you think this person was? Not at all---Somewhat---Extremely 2) To what extent are you sure this person wasn't being fake with you? Not at all---Somewhat---Extremely 3) How did this person compare with other people you've met speed dating? Fine---Average---Much Better 4) Would you go on a date with this person again given the chance? Not at all---Maybe---Definitely • Is this a face valid measure of *interest and liking*? -No, none of the items ask about interest or liking on the surface level • Does this have adequate content validity? -No, these behaviors may not necessarily capture the concepts of interest and liking • Importance of operationalization!

Operational Definition (ex on web)

Identifies one or more specific, observable events or conditions such that any other researcher can independently measure and/or test for them. Example: A researcher measuring happiness and depression in college students decides to use a ten-question happiness scale to measure positive outlook in her subjects. -In other words, her operational definition of happiness in this case is a given subject's score on the test.

Reading a Graph Example

Is there a main effect of commitment? • No • Is there a main effect of conflict? • Yes! • Is there an interaction? • Yes! • How could we describe the interaction? -In the low commitment condition, those who received a high conflict scenario reported higher self-esteem than those who received a low conflict scenario, whereas in the high commitment condition, those who received a low conflict scenario reported higher self esteem than those who received a high conflict scenario

Statistical Tests for Multiple IVs vs. Moderators -Moderator + IV

Main effect(s) & moderation: -If your moderator is continuous, you should run a regression analysis* -If your moderator is categorical, you treat it (from a statistical analysis perspective) like 2 IVs - 1 quasi and 1 true - and you can run a 2 x 2 ANOVA • See left column for details This also means you will need to state your hypotheses in the same format as 2 IVs - main effect for your IV, main effect for your moderator, and an interaction (moderation) *NOTE: We will NOT be running regressions, so if you have a continuous moderator in your study you will be doing a median split on the variable to make it categorical. Please know that this is not the most appropriate way to do the analysis, and if you did your same experiment in the real world you would learn regression and then analyze your moderator differently.
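A minimal sketch of the median split described above, using pandas; the column name rejection_sensitivity and the scores are hypothetical.

```python
# Hypothetical sketch: median split of a continuous moderator into a categorical one.
import pandas as pd

df = pd.DataFrame({"rejection_sensitivity": [2.1, 3.4, 4.8, 5.5, 1.9, 4.2]})

median = df["rejection_sensitivity"].median()
df["rs_group"] = ["high" if x > median else "low" for x in df["rejection_sensitivity"]]
print(df)   # rs_group can now be entered into a 2 x 2 ANOVA like a quasi-IV
```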

Correlation • Bringing a little stats back into research design... • How do we state our predictions about a correlation?

Null hypothesis: There is no association between variable X and variable Y Alternative hypothesis: There is a (positive/negative) association between variable X and variable Y • A correlational analysis tells us 3 things about a relationship (under the assumption that the relationship is linear): -Direction (+/-) -Significance (p-value) -Strength (effect size)
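A minimal sketch of running a Pearson correlation in Python with scipy; the variable names and numbers are made up for illustration.

```python
# Hypothetical sketch: a Pearson correlation gives direction (sign of r),
# strength (size of r, the effect size), and significance (the p-value).
from scipy import stats

self_esteem    = [3, 5, 4, 6, 7, 2, 5, 6]   # invented scores
social_support = [4, 6, 5, 7, 8, 3, 5, 7]

r, p = stats.pearsonr(self_esteem, social_support)
print(f"r = {r:.2f}, p = {p:.4f}")
```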

Between-Subjects Example (Fiorella & Mayer, 2013)

Participants: -In experiment 1: 93 undergrads at UCSB (26 men, 67 women) -In experiment 2: 75 undergrads at UCSB (31 men, 44 women) Procedure: • Students watched a 10 minute video about the Doppler effect • Then they were assigned to 1 of 3 conditions with different instructions: -Teaching: you will be given 5 mins to present material as if you were teaching the material to someone else & you will be video-recorded -Preparation: you will be given 5 mins to prepare material as if you were going to teach it to someone else -Control: You will be given 5 minutes to study and will be asked questions Finally, they took a comprehension test (e.g. Explain the Doppler effect) -In experiment 1, they took the test immediately after the lesson -In experiment 2, they took the test after a delay of one week

Experimental Designs - General Features

Random assignment of participants to conditions • All participants should have an equal chance of being in an experimental group or control group • Why do we need to randomly assign participants? -Provides an "equal chance" of biases in each group -Prevents confounds related to people being variable on how they respond to a manipulation • How do we randomly assign participants to conditions? • Examples: coin flip, random number generator *Hypothesis(es)*: Statement of the expected relationship between your variables • Null hypothesis - no predicted differences BETWEEN THE MEAN LEVEL OF EFFECTS of the experimental conditions on the DV -In a one-factor design with 1 IV with 2 levels (control vs. experimental groups), the null hypothesis would be that there is no effect of the IV on the DV (Note: simply stating there is no effect is too simplistic) -However, many designs are more complicated than this • Alternative hypothesis - predicted effects of the experimental conditions on the DV -Researchers typically only state this hypothesis in their articles
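A minimal sketch of random assignment in Python (the "random number generator" option above); the participant IDs are placeholders.

```python
# Hypothetical sketch: give every participant an equal chance of being in either group.
import random

participants = [f"P{i:02d}" for i in range(1, 11)]   # placeholder IDs
random.shuffle(participants)                          # like drawing names from a hat
half = len(participants) // 2
assignment = {p: "experimental" for p in participants[:half]}
assignment.update({p: "control" for p in participants[half:]})
print(assignment)
```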

Operationalize

Researchers can define a *construct* both CONCEPTUALLY and OPERATIONALLY • Construct = broad concept or topic of study that is not directly observable (e.g. aggression, love, intelligence) • Conceptual definition = defines a construct in abstract or theoretical terms - but how can we MEASURE something abstract? • Operational definition = defines a construct by specifying how it can be measured using numbers • Operationalization is the process by which we turn an abstract *construct* into something we can quantify

Between-Subjects Example 2 (Baumeister et al., 1998) Results

Results: compared time spent on maze between the Radish, Cookie, and Control groups • F(2,64)=26.88, p<.001 Radish group also self-reported feeling more fatigued, tired by the task; no effect of mood • Results support ego depletion hypothesis - The radish group spent significantly less time at the unsolvable mazes compared to those who ate cookies or controls.

Hypothesis Testing - Reading a mean table for a 2x2 • Comparing marginal means estimate main effects and comparing condition (cell) means estimate interactions

So you can have the following possibilities: • No main effects and no interaction • 1 main effect and no interaction • 2 main effects and no interaction • 1 significant interaction and no main effects • 1 significant interaction and 1 significant main effect • 1 significant interaction and 2 significant main effects Means represent self reported self esteem

How to spot good research

Sources • In psychological research, it is customary to use only empirical research or review articles Peer Review - the peer review process is assumed to evaluate the scientific merit of a study/review... -other people in that area have evaluated it and found it scientifically merited -try to catch falsified evidence -if you find something once, you want to find it multiple times

IS THIS TRUE OF THE IV AND DV?

The Independent variable is the one the researcher manipulates, while the Dependent variable is the one the researcher measures

Between-Subjects Example 2 (Baumeister et al., 1998) How Can We Say This in Paper

The group that had to exert the most self-control (radish group) showed they had fewer resources available to persist at a challenging task. When participants did an ego-depleting task (like not eating cookies) they were left with fewer resources after, and this ego depletion affected their persistence in a problem-solving task. Specifically, these participants were less persistent in solving unsolvable mazes (i.e. spent significantly less time on them), as compared with those who did not perform an ego-depleting task.

Real Multifactorial Example (Prybylski & Winstein, 2012) • Summary:

The mere presence of mobile phones inhibited the development of interpersonal closeness and trust, and reduced the extent to which individuals felt empathy and understanding from their partners. • These effects were most pronounced if individuals were discussing a personally meaningful topic. No main effect of convo type - marginal mean of casual not significantly different from important • Significant main effect of phone presence - marginal mean for phone absent is significantly higher than marginal mean for phone present • Significant interaction - lowest mean was phone present/important conversation compared to all other means

Why do we care about the null hypothesis then?

The null is what your statistical tests are based on, so to understand what our stats are doing we need to also understand the null hypothesis.

Hypothesis Testing - Reading a mean table for a 2x2

What information is given in a 2x2 ANOVA mean table? • Mean of condition: mean of one DV in a particular condition • Marginal Means: mean of the effects of one IV collapsed across the other IV • Looking at a mean table can give you a good estimate of whether you have main effects and possible interactions (but always run inferential statistics before concluding anything!)
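A minimal sketch of computing cell means and marginal means for a 2x2 design with pandas; the variable names and scores are invented for illustration.

```python
# Hypothetical sketch: build a 2 x 2 mean table (cell means plus marginal means).
import pandas as pd

df = pd.DataFrame({
    "commitment":  ["low"] * 4 + ["high"] * 4,        # invented data
    "conflict":    ["low", "low", "high", "high"] * 2,
    "self_esteem": [4.0, 4.2, 6.1, 5.9, 6.4, 6.6, 3.6, 3.4],
})

table = pd.pivot_table(df, values="self_esteem", index="commitment",
                       columns="conflict", aggfunc="mean", margins=True)
print(table)   # the "All" row and column hold the marginal means
```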

Null vs. Alternative Hypotheses Example

You are conducting your honors thesis looking at how verbal feedback influences students' intrinsic motivation (interest & enjoyment) in their Psychology class. Your design has two groups: *those who receive positive verbal feedback and those who receive no feedback*. • What is the IV? Verbal feedback • What is the DV? Self-reported intrinsic motivation (interest & enjoyment) • The null hypothesis must be specific about "no effect" -There will be no differences in self-report measures of intrinsic motivation between the positive and no feedback conditions • The alternative hypothesis always needs to be stated such that you are comparing the effects of the experimental conditions on the DV. -BAD HYPOTHESIS: Students who receive positive feedback will score higher on self-report measures of intrinsic motivation. NO COMPARISON GROUP! -GOOD HYPOTHESIS: Students who receive positive feedback will score higher on self-report measures of intrinsic motivation, as compared to students who received no feedback.

Type II Error

You failed to claim a relationship between variables and there actually is a relationship (Retained the null when it is false)

Hypothesis Testing - Power

________ Probability of rejecting the null hypothesis when it needs to be rejected (when the null is false and the alternative is true) • We ideally want our power to be > .80 • Things that affect power • Sample size - the larger, the better • Variance - the smaller, the better • Power Analysis: At the outset of a study, researchers can determine sample size needed to potentially detect effect by assuming a particular effect size • You can estimate your effect size by averaging the effect sizes of previous research that examined similar effects
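A minimal sketch of an a priori power analysis in Python with statsmodels, assuming an independent-samples t-test and a medium expected effect size (d = .50); the numbers plugged in are illustrative defaults, not values from any study.

```python
# Hypothetical sketch: how many participants per group are needed for power = .80?
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"About {round(n_per_group)} participants per group")   # roughly 64 per group
```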

Evaluate - Types of Validity Internal Validity

__________ Degree to which there can be reasonable certainty that the independent variables in an experiment caused the effects obtained on the dependent variables Can you be sure that X actually caused (a change in) Y? • Proper true experimental methods should ensure high internal validity Issues to consider with ______ • *Random assignment*: participants must be randomized to experimental conditions to avoid CONFOUNDS • *Construct validity*: did you measure what you intended to measure? Did you manipulate the IV in an effective way?

Hypothesis Testing - Effect Size

___________ indicates the strength/magnitude of an effect • Correlational: (r) strength of association between two variables • Experimental: (d) difference between means of experimental & control groups [e.g. degree of change in the dependent variable (DV) attributable to the independent variable (IV)] • Conventions for d: .2-.3 "weak", .3-.5 "moderate", .6-.8 "large" (an effect size is not inherently good or bad) • NOTE: Statistically small effects may still be of considerable practical importance (e.g., aspirin & heart disease)
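A minimal sketch of computing Cohen's d by hand in Python; the two groups of scores are made-up example data.

```python
# Hypothetical sketch: Cohen's d = difference between group means / pooled SD.
import numpy as np

experimental = np.array([7.0, 8.0, 6.5, 7.5, 8.5])   # invented DV scores
control      = np.array([5.5, 6.0, 5.0, 6.5, 6.0])

n1, n2 = len(experimental), len(control)
pooled_sd = np.sqrt(((n1 - 1) * experimental.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (experimental.mean() - control.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")
```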

Evaluate - Types of Validity Construct Validity

_____________ How well do operationalizations represent the intended concepts? Issues to consider with ______________: • *Content validity* - Extent to which survey/test items are a representative sample of the behavior being measured • *Face validity* - Extent to which survey items appear on the surface to measure what they intend to measure • *Manipulation check* - additional measure or question in the study that checks to make sure your manipulation induced the changes it was assumed to have made

Evaluate - Types of Validity Statistical Validity

_______________ • Have you demonstrated that an effect is not likely due to chance? • Statistical significance testing -95% confidence p<.05 • Obtaining statistical significance -Large number of subjects in a study -Large experimental effects • Statistical significance does NOT mean importance, just a degree of certainty that results are not due to chance

Evaluate - Types of Validity External Validity

________________ • Degree to which there can be reasonable confidence that results of a study would be obtained for other people and in other situations • Issues to consider with ____________ • *Random sampling*: how well people in your experiment represent a random sample of the larger population of interest -Most psychological research uses convenience sample • *Mundane realism*: the extent to which an experiment physically resembles real life situations -E.g. list learning is often used for memory paradigms, but how often do we memorize lists in the real world? • *Psychological realism*: the extent to which the psychological processes triggered in an experiment are similar to those that occur in everyday life

Hypothesis Testing - What is it?

• AKA Null Hypothesis Significance Testing (NHST) • *H0 (Null)*: No relationship between 2 variables -There is no difference between the experimental vs. control group (this example only works for a correlation or simple manipulation vs. control design) -There is no mean difference between two variables; no difference in treatment group improvement (this example is necessary for all other more complicated designs) • *H1 (Alternative)*: Relationship does exist between 2 variables -There is a difference between experimental and control groups (IV does have an effect on DV) or a difference in amount of improvement produced by one treatment compared to another. • When we run our statistical analyses, we are essentially calculating the probability of obtaining our results if the null hypothesis were true.

Within-subjects designs - Statistics

• Although we interpret and report our results in the same way as between-subjects designs, the underlying statistics are different • Statistics need to account for the fact that *observations are not independent* (what they have in common is that each score is from the same person) ex: Jack completes Condition 1, Condition 2, Condition 3, and Condition 4 (all four scores come from the same person)

Real Multifactorial Example (Prybylski & Winstein, 2012)

• Background: Recent advancements in communication technology have enabled billions of people to connect using mobile phones, yet little is known about how the frequent presence of these devices in social settings influences face-to-face interactions. • Purpose of the study: Evaluate the extent to which the mere presence of mobile communication devices shapes relationship quality in dyads Participants: 68 (25 men, 43 women) strangers paired with each other (34 pairs) • Procedure: • Spend 10 mins engaging in a brief relationship formation task • Randomly assigned to conversation type (IV #1) -Casual: talk about plastic holiday trees -Meaningful: talk about the most meaningful events of the past year • Randomly assigned to presence of phone (IV #2) • Phone on table • No phone on table (notebook in its place instead) • DVs = Relationship quality

Within-subjects example - two-factor (Anderson, Benjamin, and Bartholow, 1998)

• Background: The "weapons effect" is the finding that the presence of a weapon or even a picture of a weapon can cause people to behave more aggressively. • Purpose of the study: to examine the hypothesis that the presence of a weapon-word prime (such as "dagger" or "bullet") should increase the accessibility of an aggressive word (such as "destroy" or "wound"). • Does the mere presence of a weapon increase the accessibility of aggressive thoughts? More specifically, can a person name an aggressive word more quickly if it is preceded by a weapon word prime than if it is preceded by a neutral (non-aggressive) word prime? Participants: 35 undergrads (19 men, 16 women, 18-24 years) • Procedure: • Told that the purpose of this study was to test reading ability of various words • A computer presented a priming stimulus word (either a weapon or non weapon word) for 1.25 seconds, a blank screen for 0.5 seconds, and then a target word (aggressive or non-aggressive word). • Each subject named both aggressive and non-aggressive words following both weapon and non-weapon "primes." Procedure (continued): • Examples of the four types of words: -Weapon word primes: shotgun, grenade -Non-weapon word primes: rabbit, fish -Aggressive word: injure, shatter -Non-aggressive word: consider, relocate • The experimenter instructed the subjects to read the first word to themselves and then to read the second word out loud as quickly as they could. The computer recorded response times and computed mean response times for each participant for each of the four conditions

True Experimental Designs

• Between-subjects -Focuses on the differences between individuals on a given variable/process--->participants are assigned to one treatment condition each • Within-subjects -Focuses on differences between multiple measurements of a variable within the same person; every person gets all of the same conditions as the other participants • Mixed design -Combines measurement at both between and within person levels

Real Multifactorial Example (Prybylski & Winstein, 2012) What were their hypotheses for each IV?

• Conversation type: Participants engaged in meaningful conversation will report higher quality of the relationship with their paired partner than those engaged in casual conversation • Presence of phone: Participants with phone present will report lower quality of the relationship with their paired partner than those who don't have a phone present Remember that interaction effects *qualify* the main effects • How could we state potential interaction hypotheses? -Participants with a phone present will report lower relationship quality with their paired partner, and this will be most pronounced for those engaging in the meaningful conversation. -Relationship quality will suffer the most when people are trying to make a meaningful connection and a mobile phone is present as compared to all other conditions. -When people engage in casual conversation, the presence of a phone will not affect relationship quality, whereas when people engage in meaningful conversation, the presence of a phone will decrease relationship quality compared to the absence of a phone.

(Fiorella & Mayer, 2013)

• Experiment 2 Hypothesis: *Learning by teaching* a concept will produce differential cognitive structures of to-be-learned material that will increase performance on a DELAYED measure of comprehension compared to those in the *Prepare to teach condition* and the control condition. • Teach > Prepare • Teach > Control Results: F(2, 72) = 5.00, p = .009 Does DELAYED comprehension performance differ significantly across groups? YES! • Teaching group outperformed prepare & control group; prepare & control groups did not differ significantly from each other • Teach > prepare = control on DELAYED comprehension exam

Within-subjects designs

• Features • Requires fewer participants because participants *serve as their own control* -Higher power • Reduces variability because there is usually more variation *between* people than *within* people • Can see how participants change over time in response to various conditions • Example: -Often used in health studies to examine effects of different treatments or exposures to treatment on person's functioning -Patient could receive placebo first, then be tested, then receive a drug, then be tested

Correlation

• Goal is to assess relationships between and among two or more variables • Common methods: -Observational designs - both in lab and in the field -Survey (self-report) designs

Measure - Potential problems • What if people in different conditions had different experiences during the study?

• If the levels of X differed in any way other than X (e.g. run at different times of day), this also becomes a potential *third variable or confound*. • For this reason, attempts need to be made to hold the situations for all conditions constant except for differences in X, e.g. same non-manipulated materials • A good true experiment should have: -Random assignment of subjects to the conditions or levels of the IV -A manipulated IV -Experimental control - reasonable attempts to hold everything constant other than the manipulated IV. If you can't control a possible confounding variable, then measure it so you can statistically control for it later

Multifactorial Design

• If we have a 3 x 4 between-subjects factorial design, how many IVs do we have and how many levels of each IV are there? -(two IVs: three levels of IV1 and four levels of IV2) Number of digits = Number of IVs • "2 x 2" "3 x 4" - two IVs • "2 x 2 x 2" "3 x 2 x 4" - three IVs Value of digit = # of levels in each IV • "2 x 2" - both IVs have 2 levels • "3 x 4 x 2" - the first IV has 3 levels, the second has 4, and the third has 2

Moderator Example (Romero-Canyas, Downey et al., 2010) Interaction

• Now let's think about how the IV and moderator could interact... • Would HRS individuals react the same way to mild or harsh rejection as LRS individuals? • One way to think about moderation is to ask "which condition would be most affected by the moderator?" • *Interaction hypothesis*: The effect of (IV) acceptance or rejection from an online community group member on a person's (DV) mood and feelings towards a group will be *more pronounced* for those (M) high in rejection sensitivity, such that those who receive (IV) rejection feedback would report *more* (DV) negative mood & feelings towards the online community than those who receive (IV) acceptance feedback, and this effect would be even greater in the people who score (M) high in rejection sensitivity compared to those who score (M) low in rejection sensitivity. IV: Type of Response (acceptance vs. mild rejection vs. harsh rejection) ---------->DV: Negative Mood ^ Rejection Sensitivity (High vs. Low)

Between-Subjects Example 2 (Baumeister et al., 1998)

• Participants: 67 intro psych students (31 men, 36 women) • Procedure: • Told this is a study on "taste perception" and to skip 1 meal and not eat for 3 hours prior to session • 2 foods presented on table - cookie and radish • Told there will be follow-up about sensation memory the next day (although they didn't actually do this) • Take 5 minutes to taste the assigned food, BUT eat ONLY the food assigned to you! • Experimenter leaves the room during "tasting" and observes the participants unobtrusively through a 1-way mirror, recording # of items eaten and verifying that they only ate the assigned food Condition -Eat 2-3 cookies -Eat 2-3 radishes -Control: Neither food presented (skip directly to the final tasks) Procedure (continued): • Participants then filled out mood and self-restraint questionnaires • Then they were asked to complete a Maze task. "Try as many times as you like, but don't pick up pencil from the paper or retrace over any lines. You will be judged on whether or not you finish tracing the figure. If you wish to stop before you finish, ring the bell on the table" • NOTE: These were unsolvable mazes! Thus they were interested in looking at the amount of time the participant persists at the task before giving up. • *Ego-depletion hypothesis*: Resisting temptation uses energy, therefore participants who have to resist a tempting food (those in the radish condition) will feel ego-depleted and they will persist less at the maze task than the cookie and control groups. • Radish < Cookie • Radish < Control

Correlation • Direction

• Positive - both variables change in the same direction, as one increases the other increases, or as one decreases the other decreases -ex: as self-esteem goes up, perceived social support goes up • Negative - variables are inversely related, as one increases the other decreases or vice versa -ex: as self-esteem goes up, depression goes down

Example Multifactorial Design

• We are interested in examining the effect of temperature of a room (warm vs. cool) and test difficulty (easy vs. hard) on test performance • IVs -Temperature -Test difficulty • Interaction = temperature X test difficulty • DV = test performance How to write an interaction hypothesis: • Remember that you will need to write more than "We hypothesized an interaction." You need to describe it too! • It is similar to writing a moderation hypothesis, like we did last week. • One way is to describe the effect of 1 IV across the levels of another IV • Let's start with test difficulty... • "Do students do better with a hard or easy test?" • If you answer "it depends on room temperature," then you expect an interaction • Then you describe the differences you expect Possible interactions: • In a cool room, students will perform better on the hard test compared to the easy test, whereas in a warm room, students will perform better on the easy test compared to the hard test. • When the room is cool, there will be no differences in test performance between easy and hard tests, whereas when the room is warm, test performance will be higher on the easy test than the hard test. • Students will perform higher on the easy tests, and the effect of test difficulty on test performance will be more pronounced in a warm room than a cool room, such that test performance is the highest with an easy test in a warm room.

Experimental Designs Week 4

• We've already talked about one factor and multifactorial (2x2) between-subjects designs -Participants in different conditions • Within-subjects/repeated measures designs -Participants complete all conditions or are tested multiple times • Mixed factorial designs -1 IV is between and 1 IV is within • Quasi-experimental designs -Not quite experimental because you don't randomly assign conditions -For statistical analyses, we treat them the same way as true IVs, but conclusions may be limited

