Psych 220 Exam 2
Two examples of simple experiments 1. taking notes 2. eating pasta
*1. Taking notes* - compared two note-taking practices: handwritten notes vs. laptop notes - taking notes on a laptop let students record more of the lecture, but they didn't stop to think much about what the professor was saying - handwritten notes forced students to summarize and connect ideas about the information presented, using fewer words than on the computer - students were then distracted for a while and took an assessment of how much they had learned - on factual questions, the laptop and handwritten-notes groups scored the same, but on conceptual questions the handwritten group scored higher - causal claim = taking handwritten notes causes students to learn conceptual material better
*2. Eating pasta* - students were assigned to one of two experimental sessions by a coin flip - some were given a large serving bowl and others a medium bowl - they served themselves pasta, and once a bowl was half empty it was refilled - after each person finished eating at their own pace, the researchers weighed the bowl to see how much each person ate - on average, participants took more pasta from the large serving bowl than the medium one - the large-bowl group ate about 140 more calories than the medium-bowl group
interrogating causal claims with the four validities 1. construct validity 2. external validity 3. statistical validity 4. internal validity
*1. construct validity* - how well were the dependent variables measured? - how well were the independent variables manipulated? - manipulation check = an extra DV that researchers can insert into an experiment to confirm that their experimental manipulation worked - pilot study = a simple study, using a separate group of participants, completed before conducting the study of primary interest - ask whether the measures and manipulations used in the study captured the conceptual variables in the theory
*2. external validity* - can the causal relationship generalize to other people, places, and times? - generalizing to people = ask how participants were recruited (random sampling matters here, not random assignment) - generalizing to other situations = consider the other situations to which the experiment might generalize, which requires considering the results of other research - researchers often prioritize internal validity, so external validity can sometimes be poor
*3. statistical validity* - statistical significance = determine whether the difference between the obtained means is significant; if it is, the results suggest covariance exists, and if not, there may be no covariance - effect size = tiny differences can be significant in large samples, so the effect size helps evaluate the strength of the covariance - statistical significance does not always mean a large effect size
*4. internal validity* - the priority when interrogating causal claims - if internal validity is sound, the causal claim is almost certainly appropriate - three fundamental internal validity questions: a) were there any design confounds? b) if an independent-groups design was used, were selection effects controlled for using random assignment or matching? c) if a within-groups design was used, were order effects controlled for through counterbalancing?
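The effect-size point above (significance does not imply a large effect) can be made concrete with Cohen's d, the standardized difference between two group means. This is only an illustrative sketch: the function name and the sample scores are hypothetical, not from any study in these notes.

```python
# Sketch: Cohen's d, a common effect size for the difference between two group
# means. The data below are hypothetical conceptual-test scores.
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (((na - 1) * stdev(group_a) ** 2 +
                   (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

treatment = [7, 8, 6, 9, 8, 7, 9, 8]  # hypothetical handwritten-notes group
control   = [5, 6, 5, 7, 6, 5, 6, 7]  # hypothetical laptop group
d = cohens_d(treatment, control)
print(round(d, 2))
```

A rough convention treats d around 0.2 as small, 0.5 as medium, and 0.8 as large; a huge sample can make even d = 0.05 statistically significant, which is exactly why effect size is checked separately.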
why experiments support causal claims 1. experiments establish covariance 2. experiments establish temporal precedence 3. experiments establish internal validity
*1. experiments establish covariance* - covariance between the causal variable and the outcome variable often appears as a difference in group means - if the independent variable did not vary, there would be no way to establish covariance - the covariance criterion seems obvious, but everyday comparisons often fail it because they lack comparison groups - having a comparison group matters as much as manipulating the independent variable - control group = a level of the independent variable intended to represent a no-treatment or neutral condition - treatment groups = the other levels of the independent variable, which receive the treatment - placebo group = a control group that thinks it is receiving treatment but is not - not every study needs a control group
*2. experiments establish temporal precedence* - the experimenters manipulate the causal variable to ensure that it comes first in time; by manipulating the independent variable, the experimenter virtually ensures that the cause comes before the effect - the ability to establish temporal precedence is a strong advantage of experimental designs over correlational studies
*3. experiments establish internal validity* - well-designed experiments can establish internal validity (the most important criterion when interrogating causal claims) - the study must ensure that the causal variable, and not other factors, is responsible for the change in the outcome variable - confounds = potential threats to internal validity, because they offer alternative explanations for the results - design confound = an experimenter's mistake in designing the independent variable; a second variable that varies systematically along with the IV and provides an alternative explanation for the results - a classic threat to internal validity; a causal claim cannot be supported if one is present - systematic variability = a second variable varies systematically with the levels of the IV; this is a confound - unsystematic variability = a variable varies randomly across both groups; this adds noise but is not a confound - selection effects = when the participants at one level of the IV are systematically different from those at the other - selection effects can occur when researchers let participants choose the group they want to be in, or when they assign one kind of participant to one condition and another kind to the other - to avoid selection effects, use random assignment - matched groups can also avoid selection effects: participants with similar scores on a relevant variable are matched into sets, and one member of each set is randomly assigned to each condition
1. Experiments with two independent variables can show interactions 2. intuitive interactions
*1. experiments with two IVs can show interactions* - adding additional independent variables to a study allows researchers to look for interaction effects - interaction effect = whether the effect of the original IV depends on the level of another IV - an interaction of two IVs lets researchers establish whether "it depends" - the mathematical way to describe an interaction of two IVs is as a "difference in differences" (ex: the difference between the cell phone and control conditions might be different for older and younger drivers) - interaction --> a difference in differences --> the effect of one IV depends on the level of the other IV
*2. intuitive interactions* - crossover interaction = a graph of an interaction in which the lines cross each other; the results can be described as "it depends" because the effect of one IV depends on the level of the other - spreading interaction = a graph of an interaction in which the lines are not parallel but do not cross either; the results can be described as "only when" because one IV has an effect only under certain levels of the other
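The "difference in differences" idea above can be computed directly from the four cell means of a 2x2 design. A minimal sketch, using hypothetical cell means loosely modeled on the cell-phone-and-driver-age example (the numbers are made up):

```python
# Sketch: an interaction as a "difference in differences" in a 2x2 design.
# Hypothetical cell means, e.g. braking time (seconds) by age and phone use.
cell_means = {
    ("younger", "control"): 1.0, ("younger", "phone"): 1.4,
    ("older",   "control"): 1.2, ("older",   "phone"): 2.0,
}

# Simple effect of the phone IV within each level of the age IV:
effect_younger = cell_means[("younger", "phone")] - cell_means[("younger", "control")]
effect_older   = cell_means[("older", "phone")]   - cell_means[("older", "control")]

# The interaction is the difference between those two differences.
interaction = effect_older - effect_younger
print(effect_younger, effect_older, interaction)
```

Here the phone slows older drivers more than younger ones (0.8 s vs. 0.4 s), so the difference in differences is nonzero: the effect of one IV depends on the level of the other.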
Responses of self reporting methods 1. inaccurate answers 2. meaningful responses 3. using shortcuts 4. trying to look good 5. "more than they can know" 6. memories of events 7. rating products 8. advantages of self-reports 9. disadvantages of self-reports
*1. inaccurate answers* - people may not want to make the effort to think about each question - they want to look good - they may be unable to accurately report their own motivations or memories
*2. meaningful responses* - self-reports can provide the most meaningful information obtainable - in some cases they are the only possible measure (ex: knowing what someone dreams) - it is meaningful to ask people to report on their own subjective experiences
*3. using shortcuts* - response set (nondifferentiation) = a type of shortcut respondents can take when answering survey questions - with a response set, participants answer all the questions the same way (neutral, positive, or negative) without thinking about each question - acquiescence (yea-saying) = saying "yes" or "strongly agree" to every item instead of thinking it through - fence sitting = playing it safe by answering in the middle of the scale - these shortcuts make it difficult to know when someone is answering honestly
*4. trying to look good* - socially desirable responding (faking good) = when respondents give answers that make them look better than they really are - faking bad = when respondents, embarrassed to give an unpopular opinion, do not tell the truth - to control for these, researchers tell participants their responses are anonymous
*5. "more than they can know"* - researchers question whether people can accurately report on their feelings, thoughts, and actions - we cannot assume the reasons people give for their own behavior are the truth
*6. memories of events* - people's memories of events they participated in are not very accurate - people tend to be more confident in their recollections than their accuracy warrants
*7. rating products* - people may not be able to accurately rate products they buy because their responses are influenced by the cost and prestige of the product
*8. advantages of self-reports* - quick, easy, and inexpensive - potentially anonymous - allow researchers to assess personality, attitudes, demographics, etc. simultaneously
*9. disadvantages of self-reports* - assume that people provide honest answers - assume that people know themselves - assume that "my 7 is your 7"
factorial variations 1. increasing number of levels of an IV 2. increasing the number of IV's 3. Two way interactions 4. three way interactions 5. why worry about these interactions?
*1. increasing the number of levels of an IV* - researchers can add more levels to an IV instead of being limited to a 2x2; some studies use a 2x3 design, where one IV has two levels and the other has three - in the "__x__" notation, the first number is the levels of one IV and the second is the levels of the other - main effects and interactions are still computed using marginal means
*2. increasing the number of IVs* - sometimes it is necessary to have more than two IVs in a study - three-way design = a 2x2x2 factorial design with three IVs of two levels each - the number of differences investigated in this design increases - you are concerned with three main effects, three separate two-way interactions, and one three-way interaction
*3. two-way interactions* - in a three-way design there are three possible two-way interactions - to inspect each interaction, construct three 2x2 tables - after computing the means, you can investigate the differences using a table or a graph
*4. three-way interactions* - in a three-way design the final result is a single three-way interaction - if it is significant, the two-way interaction between two of the IVs depends on the level of the third IV - easiest to detect by looking at line graphs - more main effects and more interactions must be examined as the number of IVs increases
*5. why worry about all these interactions?* - many outcomes in psychological science and in life are not main effects, they are interactions - ex: is it okay to be forgiving in a relationship? - the answer reflects an interaction because people say it depends on how serious the problems in the relationship are
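The count of effects in a three-way design (three main effects, three two-way interactions, one three-way interaction) is just the number of ways to pick subsets of the IVs. A small sketch, with the IV labels being placeholders:

```python
# Sketch: counting the effects tested in a factorial design with three IVs.
# Each nonempty subset of IVs corresponds to one main effect or interaction.
from itertools import combinations

ivs = ["A", "B", "C"]  # placeholder names for three independent variables
effects = [combo for k in range(1, len(ivs) + 1)
           for combo in combinations(ivs, k)]
print(len(effects))  # 3 main effects + 3 two-way + 1 three-way = 7
```

The same logic explains why adding IVs multiplies the interpretive work: with four IVs there would be 4 + 6 + 4 + 1 = 15 effects to examine.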
factorial variations 1. independent groups factorial designs 2. within groups factorial designs 3. mixed factorial designs
*1. independent-groups factorial designs* - both IVs are studied as independent groups - if the design is 2x2, there are four different groups of participants in the study - ex: if researchers want 50 people per cell, they need a total of 200
*2. within-groups factorial designs* - both IVs are manipulated as within groups - if the design is 2x2, there is only one group of participants, but they participate in all four combinations of the design - requires fewer participants
*3. mixed factorial designs* - one IV is manipulated as independent groups and the other as within groups - if researchers wanted 50 people per cell of a 2x2 mixed design, they'd need 100 people, because each group of 50 participates at both levels of the within-groups IV
independent groups design 1. independent groups vs within groups design 2. posttest-only design 3. pretest/posttest design 4. which design is better?
*1. independent groups vs within groups design* - independent groups = different groups of participants are placed into different levels of the IV - within groups = only one group, and each person is presented with all levels of the independent variable
*2. posttest-only design* - participants are randomly assigned to independent-variable groups and are tested on the DV once - the simplest independent-groups design - satisfies the three criteria for causation: covariance by detecting differences in the DV, temporal precedence because the IV comes first in time, and internal validity through random assignment and experimental control
*3. pretest/posttest design* - participants are randomly assigned to at least two different groups and are tested on the DV twice: before and after exposure to the IV - used when researchers want to demonstrate that random assignment made the groups equal - allows tracking performance over time
*4. which design is better?* - in some situations it is problematic to use a pretest/posttest design - combining a posttest-only design with random assignment and a manipulated variable can lead to powerful causal conclusions
Choosing question formats 1. open ended questions 2. forced choice questions
*1. open-ended questions* - allow respondents to answer in any way they like - categorizing these answers can be difficult because there are many answers people can give
*2. forced-choice questions* - restrict the answers people can give - people give their opinions by picking the best of two or more options - Likert scale = a rating scale running from strongly agree to strongly disagree - semantic differential format = rating target objects using a numeric scale anchored with adjectives
Experimental variables 1. psychology experiment 2. manipulated variable 3. measured variables 4. independent variable 5. dependent variable 6. control variables
*1. psychology experiment* - researchers manipulate at least one variable and measure another - can take place in a laboratory or just about anywhere
*2. manipulated variables* - variables the researcher controls, such as when researchers assign people to specific levels of that variable
*3. measured variables* - take the form of records of behavior or attitudes, such as self-reports, behavioral observations, etc.
*4. independent variables* - the manipulated variable in the experiment - the levels of the independent variable are referred to as conditions
*5. dependent variables* - the measured, or outcome, variable - how participants score on the measured variable depends on the level of the independent variable - researchers have less control over this variable
*6. control variables* - control variable = any variable that an experimenter holds constant on purpose - when researchers manipulate independent variables, they need to make sure they vary only one thing at a time - they control for potential third variables by holding all other factors constant between the levels of the independent variable
writing well-worded questions 1. question wording matters 2. double barreled questions 3. negative wording 4. question order
*1. question wording matters* - the way a question is worded can influence the answers given - leading question = one whose wording leads people to a particular response - questions should be worded as neutrally as possible
*2. double-barreled questions* - two questions in one - such questions are difficult to understand, so it is hard for respondents to answer accurately - they have poor construct validity because respondents might respond to the first half of the question but not the second
*3. negative wording* - negatively phrased questions can confuse participants, which reduces construct validity - they might capture people's ability to decode the question instead of their true opinions
*4. question order* - the order in which questions are asked can affect the responses to a survey - earlier questions can change the way respondents understand or answer later questions - the best way to control for this is to create different versions of the survey with the questions in different orders
Balancing priorities in quasi experiments 1. real world opportunities 2. external validity 3. ethics 4. construct validity and statistical validity in quasi-experiments
*1. real world opportunities* - quasi-experiments allow researchers to study interesting phenomena and important real-world events - some studies take advantage of events that occurred in real-world settings
*2. external validity* - real-world settings can enhance external validity because the patterns observed are more likely to generalize to other circumstances or individuals
*3. ethics* - ethical concerns are one reason researchers choose quasi-experiments - many questions would be interesting but unethical to study in true experiments because people should not be assigned to certain conditions
*4. construct and statistical validity in quasi-experiments* - quasi-experiments usually show good construct validity for the IV, but you need to ask whether the measures successfully captured the DV - to check statistical validity, ask how large the effect size was and whether the results were statistically significant
within-groups design 1. repeated measures design 2. concurrent-measures design 3. advantages of within groups design 4. covariance, temporal precedence, and internal validity in within groups 5. disadvantages of within groups
*1. repeated-measures design* - participants are measured on a DV more than once, after exposure to each level of the IV
*2. concurrent-measures design* - participants are exposed to all levels of an IV at roughly the same time, and a single attitudinal or behavioral preference is the DV
*3. advantages of within-groups designs* - ensure that the participants in the two conditions are equivalent - give researchers more power to notice differences between conditions - power = the probability that a study will show a statistically significant result when the IV truly has an effect in the population - attractive because they generally require fewer participants overall
*4. covariance, temporal precedence, and internal validity in within-groups designs* - covariance and temporal precedence can be established when comparison conditions are incorporated - no worries about selection effects because the participants are the same in all conditions - the main threat to internal validity is order effects - order effects = being exposed to one condition changes how participants react to the other condition - occur when one level of the IV influences responses to the next level; they are a confound - practice effects = repeated exposure to a task might lead participants to get better at it, or to get tired or bored toward the end - carryover effects = some form of contamination carries over from one condition to the next - counterbalancing = presenting the levels of the IV in different sequences to control for order effects - full counterbalancing = all possible condition orders are presented - partial counterbalancing = only some of the possible orders are presented, or conditions are presented in randomized order
*5. disadvantages of within-groups designs* - potential for order effects, which threaten internal validity but can be addressed through counterbalancing - might not be possible or practical - people see all levels of the IV and may change the way they would normally act - demand characteristic = a cue that lets participants guess the experimenter's hypothesis, which may change their behavior
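The difference between full and partial counterbalancing comes down to how many of the possible condition orders are actually used. A minimal sketch with hypothetical condition names:

```python
# Sketch: generating counterbalanced orders for a within-groups design.
# Condition names are hypothetical placeholders.
from itertools import permutations
import random

conditions = ["easy", "medium", "hard"]

# Full counterbalancing: every possible sequence of the conditions (3! = 6).
full = list(permutations(conditions))
print(len(full))

# Partial counterbalancing: only a subset of the possible sequences,
# here a few randomly chosen orders.
random.seed(0)  # fixed seed just so the sketch is reproducible
partial = random.sample(full, 3)
```

Full counterbalancing grows factorially (four conditions already need 24 orders), which is why partial counterbalancing or randomized orders are common in practice.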
Factorial designs 1. studying two independent variables 2. can test limits
*1. studying two IVs* - factorial design = a design with two or more independent variables - used when researchers want to test for interactions - researchers cross the two independent variables to study each possible combination - 2x2 factorial design = two levels of one IV crossed with two levels of another IV - participant variable = a variable whose levels are selected (measured), not manipulated - because these variables are not manipulated they are not truly IVs, but they are treated as such in factorial designs
*2. can test limits* - researchers conduct factorial studies to test whether an IV affects different kinds of people, or people in different situations, in the same way - when testing an IV in more than one group at once, they are testing whether its effects generalize - using factorial designs to test limits is often referred to as testing for moderators - a moderator is an IV that changes the relationship between another IV and the DV - a moderator usually shows up as an interaction = the effect of one IV depends on the level of another IV
Problems with observational research 1. threats to construct validity 2. preventing observer bias and effects
*1. threats to construct validity* a) observer bias - when observers' expectations influence their interpretation of participants' behaviors and the outcome of the study - instead of being objective, their observations tend to confirm what they believe or hypothesize b) observer effects - when observers inadvertently change the behavior of those they are observing to match their expectations ex 1: bright and dull rats - students were each given five rats and had to record how long it took the rats to escape a simple maze - researchers told some students their rats were bred to be "maze bright" and others that theirs were "maze dull" - although this was a lie, the students believed it, and it influenced the times they recorded for their rats ex 2: Clever Hans - a man thought he had taught his horse to do math, but when a psychologist investigated, it turned out that when the questioner knew the answer to the math problem Hans got it right, and when the questioner didn't know, Hans got it wrong - the questioner was unknowingly giving Hans head signals that cued the answer c) reactivity - a change in behavior when study participants know another person is watching (whether for better or worse) - sometimes the mere presence of an outsider is enough to change the behavior of those being observed - solution 1: blend in = make unobtrusive observations so those observed take less notice of you - solution 2: wait it out = stay long enough that those being observed get used to your presence - solution 3: measure the behavior's results rather than the behavior itself
*2. preventing observer bias and observer effects* - researchers train their observers well, using codebooks, to make reliable judgments without bias - they use multiple observers and test interrater reliability to support construct validity - masked (blind) design = observers are unaware of the purpose of the study and the conditions to which participants have been assigned
Association claims 1. Bivariate Correlations 2. association claims with two quantitative variables 3. associations with categorical data 4. study with all correlational variables measured
*association claims* - describe the relationship found between two measured variables
*1. bivariate correlations* - an association that involves exactly two variables - researchers measure the first and second variables in the same group of people, then use graphs or simple statistics to describe the type of relationship between them - even if more than two variables are measured in a study, a bivariate correlation looks at only two variables at a time
*2. association claims with two quantitative variables* - after testing an association claim, describe the relationship between the two variables using scatterplots and the correlation coefficient r
*3. associations with categorical data* - some studies mix variable types, with one categorical variable and one quantitative variable - when one variable is categorical and the other quantitative, a bar graph is most appropriate - each bar represents the mean of the participants in that category - with a bar graph, you examine the difference between the group averages to find an association - when analyzing these associations, researchers typically use a t test to determine whether the difference between means is statistically significant
*4. study with all correlational variables measured* - an association claim is not supported by a particular kind of statistic or graph - it is supported by a study design (correlational research) in which all the variables are measured
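The correlation coefficient r mentioned above can be computed directly from two lists of scores. A minimal sketch; the variables (hours of sleep and mood ratings) and the data are hypothetical, chosen only to show a strong positive association:

```python
# Sketch: Pearson's r for a bivariate correlation between two measured
# quantitative variables. Data are hypothetical.
from statistics import mean

def pearson_r(xs, ys):
    """Correlation coefficient: covariance scaled to the range -1..+1."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

sleep = [5, 6, 7, 8, 9]  # hypothetical hours of sleep per person
mood  = [2, 3, 3, 5, 6]  # hypothetical mood rating per person
print(round(pearson_r(sleep, mood), 2))
```

r near +1 or -1 indicates a strong association, r near 0 a weak one; the sign gives the direction, and a scatterplot should still be inspected for nonlinearity and outliers.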
evaluating the four validities in small N designs
*internal* - can be very high if the study is carefully designed *external* - can be problematic depending on the goals of the study *construct* - can also be very high if definitions and observations are precise *statistical* - not always relevant to small N studies
mediation 1. steps to measure mediators 2. mediators vs third variables 3. mediators vs moderators
*mediator* = the reason why one variable influences another, or why the correlation between two variables exists - both correlational and experimental studies can include mediators
*1. steps to measure mediators* - mediation analyses require multivariate tools such as multiple regression 1) establish relationship c: the causal variable predicts the outcome variable 2) establish relationship a: the causal variable predicts the proposed mediator 3) establish relationship b: the mediator predicts the outcome variable 4) run a regression using both the causal variable and the mediator to predict the outcome, and see whether relationship c shrinks or goes away 5) ideally, the proposed causal variable is measured or manipulated first in the study, followed later by the mediating variable - if the design establishes temporal precedence and follows these steps, there is evidence for mediation
*2. mediators vs third variables* - mediators can appear similar to third-variable explanations - they function differently, but both involve multivariate designs and multiple regression - a proposed third variable is external to the two variables in the original bivariate correlation, whereas a mediator is internal: it isolates which aspect of the presumed causal variable is responsible for the relationship
*3. mediators vs moderators* - when testing for mediators, the question is: why are these two variables linked? - when testing for moderators, the question is: are these two variables linked the same way for everyone? - a mediating variable comes in the middle of the causal chain between two variables, while a moderator changes the relationship between two variables by making it stronger or weaker
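The mediation steps above can be sketched numerically. This is a toy illustration, not a full mediation analysis: the data and variable names (X = exercise, M = endorphins as the proposed mediator, Y = mood) are invented, and the two-predictor regression is computed via the standardized-beta formula from the pairwise correlations.

```python
# Sketch of the mediation steps using standardized regression betas.
# All data and variable names are hypothetical.
from statistics import mean

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    num = sum((p - mx) * (q - my) for p, q in zip(xs, ys))
    den = (sum((p - mx) ** 2 for p in xs) *
           sum((q - my) ** 2 for q in ys)) ** 0.5
    return num / den

X = [1, 2, 3, 4, 5, 6]  # causal variable
M = [2, 3, 3, 5, 6, 6]  # proposed mediator, tracks X closely
Y = [1, 3, 3, 4, 6, 6]  # outcome, tracks M closely

c = pearson_r(X, Y)  # step 1: causal variable predicts outcome
a = pearson_r(X, M)  # step 2: causal variable predicts mediator
b = pearson_r(M, Y)  # step 3: mediator predicts outcome
# step 4: standardized beta for X in the model Y ~ X + M;
# if M mediates, this should shrink relative to c
c_prime = (c - b * a) / (1 - a ** 2)
print(round(c, 2), round(c_prime, 2))
```

With these toy numbers, c is large but c' (the X beta once M is controlled for) is much smaller, which is the pattern the notes describe as evidence for mediation.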
Ruling out third variables with multiple regression analyses 1. measuring more than two variables 2. regression results indicate if a third variable affects the relationship 3. adding more predictors to a regression 4. regression does not establish causation
*multiple regression* = a statistical technique that can help rule out some third variables, addressing some internal validity concerns
*1. measuring more than two variables* - researchers conduct a multivariate correlational study, measuring more than two variables to test the interrelationships among them all - multivariate designs allow researchers to evaluate whether a relationship between two key variables still holds when they control for another variable - "controlling for" = identifying subgroups on a third variable and determining whether the correlation between the key variables still holds within those subgroups, or whether the third variable was driving the correlation
*2. regression results indicate whether a third variable affects the relationship* - when researchers use regression, they are testing whether a key relationship holds even when a suspected third variable is statistically controlled for - criterion variable (DV) = the variable they are most interested in understanding or predicting - predictor variables (IVs) = the rest of the variables measured in the regression analysis - beta is similar to r but reveals more; there is one beta per predictor variable - positive beta = a positive relationship between that predictor and the DV when the other predictors are statistically controlled for - negative beta = a negative relationship - beta changes depending on which other predictors are controlled for
*3. adding more predictors to a regression* - adding several predictors to a regression analysis can help answer two kinds of questions: a) it helps control for several third variables at once b) by comparing the betas of all the predictors, you can see which factor most strongly predicts the criterion variable
*4. regression does not establish causation* - multivariate designs do not always establish temporal precedence - even when studies take place over a long period of time, researchers cannot control for variables they did not measure that could be influencing the relationship - experimental studies are better at establishing causation than correlational studies
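The "identifying subgroups" meaning of controlling for a third variable can be illustrated with a classic made-up example: ice cream sales (A) and drownings (B) correlate overall, but only because season (C) drives both. All data here are fabricated for illustration; a real analysis would do this statistically with multiple regression rather than by eyeballing subgroups.

```python
# Sketch: "controlling for" a third variable by checking the association
# within subgroups. A = ice cream sales, B = drownings, C = season
# (the suspected third variable). Data are made up.
from statistics import mean

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    num = sum((p - mx) * (q - my) for p, q in zip(xs, ys))
    den = (sum((p - mx) ** 2 for p in xs) *
           sum((q - my) ** 2 for q in ys)) ** 0.5
    return num / den

data = [  # (A, B, C)
    (2, 1, "winter"), (3, 2, "winter"), (1, 2, "winter"), (2, 3, "winter"),
    (8, 7, "summer"), (9, 8, "summer"), (7, 8, "summer"), (8, 9, "summer"),
]

# Strong overall correlation across all eight observations...
overall = pearson_r([d[0] for d in data], [d[1] for d in data])

# ...but no correlation within either season subgroup.
within = {}
for season in ("winter", "summer"):
    sub = [d for d in data if d[2] == season]
    within[season] = pearson_r([d[0] for d in sub], [d[1] for d in sub])
print(round(overall, 2), within)
```

Because the A-B relationship vanishes once C is held constant, the overall correlation is spurious: exactly the pattern a near-zero beta for A would show in a regression predicting B from A and C.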
two examples of independent groups quasi experiments 1. psychological effects of walking by a church 2. psychological effects of cosmetic surgery
*nonequivalent control group design* - an independent-groups design in which the groups are not formed by random assignment, so there are different (nonequivalent) participants at each level of the IV
*1. psychological effects of walking by a church* - tested whether passing by a church influences how people feel about others - compared people walking by a church to people walking by government buildings - asked people to complete a short questionnaire about their political views and attitudes toward various social groups - IV = type of building; DV = the attitudes measured - people who walked by the church were expected to be less accepting of different social groups - nonequivalent control group posttest-only design = participants were not randomly assigned to groups and were tested only after exposure to one level of the IV
*2. psychological effects of cosmetic surgery* - compared women who had already gotten cosmetic surgery to women who were waiting to get it, to see how surgery affects self-esteem - questionnaires measured how they felt before the surgery and 3 months, 6 months, and a year after it - nonequivalent control group pretest/posttest design = participants were not randomly assigned to groups but were tested before and after an intervention
Observational research background 1. three examples of observational research 2. observations can be better than self-reports
*observational research* = a researcher watches people or animals and systematically records how they behave or what they are doing - preferred because some people can't accurately self-report certain behaviors - can be the basis for frequency claims - can be used to operationalize variables in association or causal claims
*1. three examples of observational research* a) observing how much people talk - researchers gave people small recording devices to wear for 2-10 days and at the end counted how many words the device had recorded - people could not accurately self-report how many words they would say in that time b) observing hockey moms and dads - researchers sat through hockey games to observe how many times parents acted in violent ways - parents may not be able to self-report how violently they act because they may not be aware of it c) observing families in the evening - researchers had cameras follow working parents through the day and later coded a variety of behaviors from the videotapes
*2. observations can be better than self-reports* - better for finding out what a person is actually doing or what influences their behavior - people are sometimes not consciously aware of their own behavior in certain situations
two examples of repeated measures quasi experiments 1. food breaks and parole decisions 2. assessing the impact of health care laws
*repeated-measures quasi-experiments* - participants experience all levels of an IV - the researcher relies on an already scheduled event, a new policy, or a chance occurrence to manipulate the IV
*1. food breaks and parole decisions* - examined how often judges granted parole as a function of when they last ate - overall, judges denied parole about 65% of the time, but the probability of a favorable ruling was highest at the start of a session and declined as the session wore on - after the judges had a mid-morning snack, the probability of favorable parole rulings went back up and then declined again - interrupted time-series design = participants are measured repeatedly on a DV before, during, and after an "interruption" caused by some event
*2. assessing the impact of health care laws* - used answers from a general survey asking about health insurance before and after the Massachusetts law that expanded health care coverage - nonequivalent control group interrupted time-series design = combines the nonequivalent control group design and the interrupted time-series design
Surveys and Polls background info
*surveys* = term used when people are asked about a consumer product *polls* = when people are asked about their social or political opinions *book meaning* = a method of posing questions to people on the phone, in personal interviews, on written questionnaires or online - researchers who develop questions carefully can support frequency, association, or causal claims with good construct validity
Interrogating association claims 3. internal validity
- can we make a causal inference from an association? - correlation is not causation unless it meets three criteria *applying the three causal criteria* 1. covariance of cause and effect - results must show a correlation or association between the cause variable and the effect variable - are the variables correlated (A <--> B)? 2. temporal precedence - the cause variable must precede the effect variable (must come first in time) - sometimes referred to as the directionality problem because we don't know which variable came first - did A cause B or did B cause A? 3. internal validity - there must be no plausible alternative explanations for the relationship between the two variables - referred to as the third-variable problem because when we come up with alternative explanations for an association between two variables, that alternative is some lurking third variable - there's a C variable that is independently associated with both A and B *more on internal validity: when is the potential third variable a problem?* - spurious association = the bivariate correlation exists, but only because of some third variable - introducing a new variable that could explain a bivariate correlation does not always present an internal validity problem - before assuming that an association suggests a cause, we have to apply what we know about temporal precedence and internal validity
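The third-variable problem can be made concrete with a small simulation. This is a hedged sketch (the variables and numbers are invented, not from the notes): a lurking variable C independently drives both A and B, producing a strong but spurious A-B correlation; statistically "controlling for" C (correlating the residuals after removing C's influence) makes the association nearly vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical third variable C (e.g., outdoor temperature)
c = rng.normal(size=n)
# A and B are each driven by C plus independent noise
# (e.g., ice cream sales and drowning incidents)
a = c + rng.normal(scale=0.5, size=n)
b = c + rng.normal(scale=0.5, size=n)

# Spurious bivariate association: A <--> B looks strong
r_ab = np.corrcoef(a, b)[0, 1]

# "Control for" C: correlate the residuals of A and B
# after regressing each on C
resid_a = a - np.polyval(np.polyfit(c, a, 1), c)
resid_b = b - np.polyval(np.polyfit(c, b, 1), c)
r_partial = np.corrcoef(resid_a, resid_b)[0, 1]

print(round(r_ab, 2), round(r_partial, 2))  # strong r, then near zero
```

The residual trick here is a simple form of partial correlation; it illustrates why an association between A and B alone can't rule out a C variable.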
Quasi experiments
- differs from true experiments because the researchers do not have full experimental control - researchers may not be able to randomly assign participants to one level or another, so they study participants who are already exposed to each level of the IV - groups are usually naturally occurring or cannot be randomly assigned - quasi experiments have several methodological limitations, but their questions can be broad in scope and span long periods of time
small N designs 1. research in human memory 2. disadvantages of small N studies 3. other examples of small N studies
- having a large sample isn't always necessary, so studies with few participants are conducted at times - single-N designs limit the study to only a single person or animal *1. research in human memory* - studied Henry Molaison, whose seizures could no longer be controlled by medication, so he underwent surgery - had his hippocampus and nearby brain structures removed because doctors thought they were the source of the seizures, which led to memory loss: he couldn't recognize the nurses helping him at the hospital but could remember his family - studied for over 40 years; researchers found his short-term memory worked fine when he rehearsed something continuously, but a slight distraction erased it all - experimental control = case studies can effectively advance our knowledge when researchers use careful research designs - studying special cases = small-N designs often take advantage of special medical cases *2. disadvantages of small N studies* - small-N designs may not be fully representative of the general population *3. other examples of small N studies* - Jean Piaget observed his own children's perception to develop a theory of cognitive development; although he used only a few children, he asked questions systematically and replicated his extensive interviews with each of them - Hermann Ebbinghaus studied memory by making himself memorize nonsense syllables and varying how long he studied each list, which allowed him to develop the "forgetting curve": memory for a newly learned list of nonsense syllables declines most dramatically over the first hour and then declines slowly after that - although he studied only himself, he was able to create testing situations with experimental precision
Interrogating association claims 2. statistical validity
- how well does the data support the conclusion? - interrogating the statistical validity of an association claim means asking what factors might have affected the scatterplot, correlation coefficient, etc. that led to the claim - need to consider the effect size and statistical significance of the relationship, any outliers, restriction of range, and curvilinear associations *question 1: what is the effect size?* - effect size = describes the strength of a relationship between two or more variables - characterized by the direction (positive or negative) of the association as well as its strength - the more strongly correlated two variables are, the more accurate predictions can be - larger effect sizes are usually considered more important - small effect sizes can still matter when they have life-or-death implications (ex: treatments for illnesses) *question 2: is the correlation statistically significant?* - statistical significance = the conclusion a researcher reaches regarding the likelihood of getting a correlation of that size just by chance, assuming there's no correlation in the real world - the logic of statistical inference is the process of determining statistical significance - researchers study samples instead of whole populations; correlations found in samples are likely to mirror the correlations in their populations, but not always - statistical significance calculations provide a probability estimate (p value) - p value = the probability that the sample's association came from a population in which the association is zero - a p value less than 5% means the result is unlikely to have come from a zero-association population - a p value higher than 5% means the sample plausibly came from a zero-association population - the larger the effect size --> the stronger the correlation --> the more likely it is to be statistically significant - statistical significance also depends on sample size - if both sample size and effect size are small, the result could easily be due to chance
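A small sketch of the p-value logic above, with invented data (sleep hours and mood ratings are hypothetical): compute the effect size (Pearson's r), then estimate the chance of getting an r that large from a zero-association population by shuffling one variable many times. This uses a permutation test rather than the textbook's formula-based significance test, but the logic is the same.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data for n = 100 people: mood is genuinely
# (moderately) related to hours of sleep
n = 100
sleep = rng.normal(7, 1, size=n)
mood = 0.5 * sleep + rng.normal(0, 1, size=n)

r_obs = np.corrcoef(sleep, mood)[0, 1]  # the effect size (Pearson's r)

# Permutation test: how often would shuffled (zero-association)
# data produce an r at least this large, just by chance?
reps = 5000
count = 0
for _ in range(reps):
    r_perm = np.corrcoef(sleep, rng.permutation(mood))[0, 1]
    if abs(r_perm) >= abs(r_obs):
        count += 1
p_value = count / reps

print(round(r_obs, 2), p_value)  # moderate r, p well below .05
```

With a much smaller n, the same true effect would often fail to reach p < .05, which is the "statistical significance depends on sample size" point.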
*question 3: could outliers be affecting the association?* - outlier = an extreme score, a case that stands out in the data - an outlier can sometimes have a large influence on the correlation coefficient r - can make a medium-sized correlation appear stronger or weaker - outliers in bivariate correlations are most problematic when they involve extreme scores on both variables - they have greater influence in small samples *question 4: is there restriction of range?* - restriction of range = when a correlational study does not include the full range of scores on one of the variables in the association - can make the correlation appear smaller than it really is - correction for restriction of range = a statistical technique that estimates the full set of scores based on what we know about the existing restricted set and then recomputes the correlation - can apply when one of the variables has very little variance *question 5: is the association curvilinear?* - when a study reports no relationship between two variables, the relationship might not actually be zero - curvilinear association = the relationship between two variables is not a straight line - ex: health care use could be high during childhood, low in early adulthood, and high again in old age - r may be 0, but the scatterplot may show a curvilinear correlation
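Both the outlier and the curvilinear points can be demonstrated with toy numbers (all values below are invented): one case that is extreme on both variables turns a near-zero correlation into a near-perfect one, and a perfectly U-shaped relationship yields r of about zero.

```python
import numpy as np

# Outlier demo: five cases with essentially no linear relationship
x = np.array([1., 2., 3., 4., 5.])
y = np.array([3., 4., 1., 5., 2.])
r_small = np.corrcoef(x, y)[0, 1]        # near zero

# Add one case that is extreme on BOTH variables
x_out = np.append(x, 30.)
y_out = np.append(y, 30.)
r_with_outlier = np.corrcoef(x_out, y_out)[0, 1]  # now very strong

# Curvilinear demo: health care use high in childhood,
# low in mid-adulthood, high again in old age (U-shaped)
age = np.arange(0, 81)
care = (age - 40) ** 2 / 100.0
r_linear = np.corrcoef(age, care)[0, 1]  # ~0 despite a strong pattern

print(round(r_small, 2), round(r_with_outlier, 2), round(r_linear, 2))
```

This is why inspecting the scatterplot, not just r, matters: the same r of roughly zero can mean "no relationship" or "a strong curvilinear relationship."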
Interrogating association claims 1. construct validity
- how well was each variable measured? - need to check construct validity for each variable - you can use face validity, etc. to determine if the measure used for each variable has construct validity
interpreting factorial results 1. main effects 2. interactions 3. interactions are more important than main effects 4. possible main effects and interactions in a 2x2 factorial design
- in a design with two IVs there are three results to inspect: two main effects and one interaction effect *1. main effects* - researchers test each IV to look for a main effect - main effect = the overall effect of one IV on the DV, averaging over the levels of the other IV - marginal means = the arithmetic means for each level of one IV, averaging over the levels of the other IV - if the sample sizes in each cell are equal, a marginal mean is a simple average; if they're unequal, it's computed using a weighted average - use statistics to find out whether differences between marginal means are statistically significant; sometimes they are and sometimes they're not - the main effect is not the most important result in the study (the interaction is); it is just the overall effect *2. interactions* - the first two results obtained in a factorial design with two IVs are the main effects for each IV - the third result is the interaction effect, which is the difference in differences - tables can be used to estimate whether a study's results show an interaction by first computing the two differences, then using significance tests to confirm it - in line graphs it's easier: if the lines cross or are not parallel, there's an interaction; if they are parallel, there's no interaction - in a bar graph it's the same: mentally draw lines connecting the same-colored bars and see whether they would be parallel - there are many possible patterns for interactions, so there's no single way to describe them in words - can sometimes use key phrases like "it depends" for crossover interactions *3. interactions are more important than main effects* - there may be real differences in marginal means, but the most important part of the study is the interaction *4. possible main effects and interactions in a 2x2 factorial design* - various outcomes can be invented to show the different possible combinations of main effects and interactions
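The marginal-means and difference-in-differences arithmetic can be sketched with a hypothetical 2x2 table (the cell means below are invented, assuming equal cell sizes so marginal means are simple averages):

```python
import numpy as np

# Hypothetical 2x2 cell means: IV1 = cell phone use (no/yes),
# IV2 = driver age (young/old), DV = braking reaction time (ms)
#                 young   old
cells = np.array([[450.,  560.],   # no phone
                  [510.,  700.]])  # phone

# Main effects: marginal means average over the other IV's levels
main_phone = cells.mean(axis=1)    # [no phone, phone] -> [505, 605]
main_age = cells.mean(axis=0)      # [young, old]      -> [480, 630]

# Interaction: the "difference in differences"
effect_of_phone_young = cells[1, 0] - cells[0, 0]   # 60 ms
effect_of_phone_old = cells[1, 1] - cells[0, 1]     # 140 ms
interaction = effect_of_phone_old - effect_of_phone_young  # 80 ms

print(main_phone, main_age, interaction)
```

Because the two simple effects differ (60 vs. 140 ms), the effect of phone use "depends on" age, which is exactly what a nonzero difference in differences means; significance tests would then confirm whether each effect is reliable.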
Three potential internal validity threats in a study 1. observer bias 2. demand characteristics 3. placebo effects 4. controlling for observer bias and demand characteristics 5. combined threats
- internal validity threats usually occur in pretest/posttest designs and posttest-only designs and can often be fixed by adding a comparison group, although some threats apply to designs with comparison groups too *1. observer bias* - a threat to internal validity in almost any study with a behavioral DV - occurs when researchers' expectations influence their interpretation of the results - comparison groups don't control for observer bias - means that an alternative explanation for the results exists *2. demand characteristics* - a problem when participants guess what the study is supposed to be about and change their behavior in the expected direction *3. placebo effects* - occurs when people receive a treatment and really improve, but only because they believe they are receiving a valid treatment - can occur when treatments are used to control symptoms - to rule out placebo effects, a special kind of comparison group is needed: one that receives a placebo instead of the real treatment - double-blind placebo control study = neither the people treating the patients nor the patients know whether they are in the treatment group or the placebo group *4. controlling for observer bias and demand characteristics* - to avoid these two threats, researchers need to conduct a double-blind study - a variation might be an acceptable alternative if a double-blind study is not possible - masked design = participants know which group they're in, but the observers don't *5. combined threats* - selection-history threat = an outside event or factor systematically affects participants at only one level of the IV - selection-attrition threat = participants in only one experimental group experience attrition
Establishing temporal precedence with longitudinal designs 1. interpreting results from longitudinal designs 2. longitudinal studies and the three criteria causation 3. why not just do an experiment?
- longitudinal design = can provide evidence for temporal precedence by measuring the same variables in the same people at several points in time - helps study changes in a trait or ability as a person grows older - can be adapted to test causal claims *1. interpreting results* - multivariate design = involves more than two measured variables - multivariate designs yield several individual correlations a) cross-sectional correlations - test whether two variables, measured at the same point in time, are correlated - a cross-sectional correlation cannot establish temporal precedence because you don't know which variable came first b) autocorrelations - determine the correlation of one variable with itself, measured on two different occasions c) cross-lag correlations - the researchers' primary interest - show whether the earlier measure of one variable is associated with the later measure of the other variable - address the directionality problem and help establish temporal precedence - possible patterns from a cross-lag study = instead of showing that one variable comes first and affects the other, the study could find the opposite pattern, or find both cross-lag correlations to be significant *2. longitudinal studies and the three criteria for causation* - longitudinal designs can provide evidence for causal relationships by means of the three criteria for causation - significant relationships in longitudinal designs help establish covariance - longitudinal designs can help researchers make inferences about temporal precedence - studies can sometimes analyze third variables, depending on how the study is designed *3. why not just do an experiment?* - performing an experiment isn't always the solution for causal claims because people sometimes cannot be randomly assigned to a causal variable of interest - it may be unethical to assign children to a specific variable for long periods of time - it is unethical to make children experience potentially harmful feedback, so such studies need to pass strict ethical review and approval
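The three kinds of correlations in a two-wave multivariate design can be sketched with simulated data (the scenario and numbers are invented; here variable A at time 1 genuinely feeds into variable B at time 2, so the A1-to-B2 cross-lag correlation comes out stronger than the B1-to-A2 one):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Simulated two-wave longitudinal data, e.g., A = media exposure,
# B = aggression, each measured at time 1 and time 2
a1 = rng.normal(size=n)
b1 = 0.3 * a1 + rng.normal(size=n)   # A and B related at time 1
a2 = 0.6 * a1 + rng.normal(size=n)   # A is stable over time
b2 = 0.5 * a1 + 0.4 * b1 + rng.normal(size=n)  # earlier A drives later B

corr = lambda x, y: np.corrcoef(x, y)[0, 1]

cross_sectional_t1 = corr(a1, b1)  # same time point: no temporal precedence
auto_a = corr(a1, a2)              # one variable with itself over time
cross_lag_ab = corr(a1, b2)        # earlier A with later B
cross_lag_ba = corr(b1, a2)        # earlier B with later A

print(round(cross_lag_ab, 2), round(cross_lag_ba, 2))
```

Comparing the two cross-lag correlations is what addresses the directionality problem: in this simulation A1-to-B2 is clearly stronger, matching the way the data were generated.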
Perhaps within groups variability obscured the group differences 1. measurement error 2. individual differences 3. situation noise 4. power
- noise = a null effect can occur when there's too much unsystematic variability within each group - noisy within-groups variability can get in the way of detecting true differences between groups - researchers prefer to keep within-groups variability to a minimum *1. measurement error* - one reason for high within-groups variability - a human or instrument factor that can inflate or deflate a person's true score on the DV - all DVs have a certain amount of measurement error, but researchers try to keep it to a minimum - lower within-groups variability is better because it makes it easier to detect a difference between the IV groups - solution 1: use precise measurement tools - solution 2: measure more instances *2. individual differences* - individual differences also cause within-groups variability - usually a problem in independent-groups designs because different people will always respond differently - solution 1: change the design to within-groups because that accommodates individual differences - solution 2: add more participants to the study *3. situation noise* - external distractions that can cause variability within groups and obscure true group differences - adds unsystematic variability to each group in an experiment - can be minimized by carefully controlling the surroundings of an experiment *4. power* - another name for these solutions is power - employing a strong manipulation, carefully controlling the experimental situation, or adding more participants increases the power of a study - the easiest way to increase power is to add more participants - experiments with more power are more likely to detect true patterns
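The "more participants = more power" point can be checked by simulation. A hedged sketch (invented effect size; significance is approximated with a simple z criterion rather than a full t-test): simulate many two-group experiments with a real but modest group difference, and count how often the difference comes out significant at small vs. large n.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_power(n_per_group, true_diff=0.5, reps=2000):
    """Fraction of simulated experiments in which the group difference
    exceeds an approximate critical value (alpha = .05, two-tailed)."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(true_diff, 1.0, n_per_group)
        se = np.sqrt(a.var(ddof=1) / n_per_group +
                     b.var(ddof=1) / n_per_group)
        if abs(b.mean() - a.mean()) > 1.96 * se:
            hits += 1
    return hits / reps

power_small = estimate_power(10)    # few participants: misses often
power_large = estimate_power(100)   # many participants: detects reliably

print(power_small, power_large)
```

The true effect is identical in both cases; only the sample size changes, yet the small-n studies mostly return null effects, which is exactly why underpowered studies are hard to interpret.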
interrogating null effects
- null effect = the IV did not make a difference in the DV; there is no significant covariance between the two - null effects can occur in independent-groups designs, within-groups designs, pretest/posttest designs, or correlational studies - when results show a null effect, it may be that the IV truly does not affect the DV - another reason for null effects is that the study was not designed or conducted carefully enough, so the IV actually does affect the DV but something in the study is obscuring that
internal validity in quasi experiments 1. selection effects 2. design confounds 3. maturation threat 4. history threat 5. regression to the mean 6. attrition threat 7. testing and instrumentation threats 8. observer bias, demand characteristics, and placebo effects
- quasi experiments look like experiments but are missing experimenter control over the IV *1. selection effects* - relevant only for independent-groups studies - apply when the kinds of participants at one level of the IV are systematically different from those at the other level - to control for this you can use a wait-list design, in which all participants plan to receive the treatment but are randomly assigned to do so at different times *2. design confounds* - can be a problem in certain quasi experiments - some outside variable accidentally and systematically varies with the levels of the targeted IV *3. maturation threat* - occurs when, in an experimental or quasi-experimental design with a pretest and posttest, an observed change could have emerged more or less spontaneously over time *4. history threat* - occurs when an external event happens to everyone in a study at the same time as the treatment - makes it unclear whether the outcome is caused by the treatment or by the external event or factor - can usually be ruled out with comparison groups *5. regression to the mean* - occurs when an extreme finding is caused by a combination of random factors that are unlikely to happen in the same combination again, so the extreme finding becomes less extreme over time - can threaten internal validity in pretest/posttest designs, especially when a group is selected for its extreme scores - random assignment plus a comparison group controls for this, since both groups would regress equally *6. attrition threat* - occurs in designs with pretests and posttests when people drop out of the study over time *7. testing and instrumentation threats* - testing threat = arises from measuring participants more than once, because participants tend to change as a result of having been tested before - instrumentation threat = a measuring instrument changes over repeated uses, which can threaten internal validity *8. observer bias, demand characteristics, and placebo effects* - observer bias = when experimenters' expectations influence their interpretation of results - demand characteristics = when participants guess what the study is about and change their behavior in the expected direction - placebo effects = when participants improve because they believe they are receiving an effective treatment
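Regression to the mean is easy to see in a simulation (all numbers invented): if observed scores are true ability plus random luck, people selected for extreme scores at time 1 score closer to the average at time 2, even with no treatment at all.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000

# Observed score = stable "true" ability + random luck on that day
ability = rng.normal(100, 10, n)
test1 = ability + rng.normal(0, 10, n)
test2 = ability + rng.normal(0, 10, n)   # fresh luck on the retest

# Select the people with extreme (top 5%) scores at time 1
cutoff = np.quantile(test1, 0.95)
extreme = test1 >= cutoff

mean_t1 = test1[extreme].mean()  # far above 100 (ability + good luck)
mean_t2 = test2[extreme].mean()  # closer to 100: regression to the mean

print(round(mean_t1, 1), round(mean_t2, 1))
```

An apparent "decline" in this selected group is pure statistics, not a treatment effect, which is why a pretest/posttest study on an extreme group needs a comparison group that would regress equally.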
Three small N designs 1. stable-baseline designs 2. multiple baseline designs 3. reversal designs
- small-N designs are often used in behavior analysis, where practitioners use reinforcement principles to improve a client's behavior - when they see improvement in a patient's behavior, they wonder if it was the result of their intervention or something else *1. stable-baseline designs* - a study in which practitioners or researchers observe behavior for an extended baseline period before beginning a treatment or other intervention - consistent baseline behavior makes them more sure of the treatment's effectiveness - when changes occur after the baseline, it is easier to conclude that the intervention caused them because behavior had remained consistent without it *2. multiple-baseline designs* - researchers stagger their introduction of an intervention across a variety of individuals, times, or situations to rule out alternative explanations - first record baseline behaviors for several days in various individuals, then begin the treatment at staggered times to see whether behavior changes - if behavior changes only after the treatment is introduced for each individual, alternative explanations are ruled out *3. reversal designs* - a researcher observes a problem behavior both with and without treatment, takes the treatment away for a while to see whether the problem behavior returns, and then introduces the treatment again to see whether the behavior improves once more - observing behavior with and without treatment supports internal validity and allows causal statements - appropriate only when the treatment doesn't have long-lasting effects, so the effect of taking it away can be seen
Null effects yielding non significant results
- sometimes there's really no effect to find - when a study yields a nonsignificant result but was conducted to maximize its power, you can conclude that the IV truly does not affect the DV - there are many occurrences of real null effects in psychological research - null effects are not published often and are rarely reported in the popular media
are quasi experiments the same as correlational studies?
- when quasi experiments use an independent-groups design, the groups can look similar to categorical variables in correlational studies - quasi experiments involve more researcher intervention than correlational studies - correlational studies simply measure variables to analyze relationships, while quasi experiments more actively select or arrange groups for an IV - both can use independent-groups designs - neither uses random assignment
interrogating association claims 4. external validity
- to determine the external validity of an association claim, you need to ask whether the association can generalize to other people, places, and times - the size of the sample matters less than how the sample was selected - even when a sample was not chosen randomly, the study should not be completely disregarded - some bivariate correlational studies may have good statistical validity but weak external validity - moderator = a variable that changes the relationship between two other variables depending on its level - a moderator doesn't explain why the relationship changes, but it implies the relationship differs across situations or groups - in correlational research, moderators can inform external validity - when an association is moderated, you can determine that it does not generalize from one situation to others
Perhaps there's not enough between groups difference 1. weak manipulations 2. insensitive measures 3. ceiling and floor effects 4. manipulation checks 5. design confounds acting in reverse
Some things can prevent a study's results from revealing a true difference that exists between two or more experimental groups *1. weak manipulations* - can obscure a true causal relationship - important to ask how the researchers operationalized the IV *2. insensitive measures* - null results are sometimes due to researchers not having used an operationalization of the DV with enough sensitivity - for dependent measures it's good to use detailed quantitative increments, not just two or three levels *3. ceiling and floor effects* - ceiling effect = all scores are squeezed together at the high end - floor effect = all scores are clustered at the low end - both can cause IV groups to score almost the same on the DV - a poorly designed DV can also lead to these effects *4. manipulation checks* - manipulation check = an extra dependent variable that researchers insert into an experiment specifically to make sure the manipulation worked - if the manipulation check shows the manipulation worked, there might be another reason for the null effect *5. design confounds acting in reverse* - usually considered internal validity threats, but they apply to null effects too - a study might be designed in a way that a design confound actually counteracts a true effect of the IV
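A ceiling effect can be demonstrated with a simulation (all numbers invented): two groups really differ by 10 points, but a test capped at a maximum of 100 squeezes both groups toward the top, so the observed difference shrinks sharply.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# True scores: the treatment group really is 10 points higher
control = rng.normal(95, 10, n)
treatment = rng.normal(105, 10, n)

# The test is too easy: scores are capped at a maximum of 100,
# so many people in BOTH groups pile up at the ceiling
ceiling = 100.0
obs_control = np.minimum(control, ceiling)
obs_treatment = np.minimum(treatment, ceiling)

true_diff = treatment.mean() - control.mean()              # about 10
observed_diff = obs_treatment.mean() - obs_control.mean()  # much smaller

print(round(true_diff, 1), round(observed_diff, 1))
```

A floor effect works the same way in reverse (scores clustered at the minimum); in both cases the DV's poor design, not the absence of a real effect, produces the near-null result.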