EXAM #3: 10-13


power

'Power' describes the likelihood that a study will show a statistically significant result (a result unlikely to have occurred by chance) when an independent variable truly has an effect in the population. Power is the probability of not making a Type II error (a Type II error is a 'missed clue': researchers conclude the treatment had no effect, or that there is no significant relationship between the variables, when there actually is one and they missed it). In other words, it's the probability of correctly rejecting a false null hypothesis (affirming that the treatment does have an effect and the result isn't due to chance). In simple terms, the power of a study is the 'extent to which the study can detect a difference when a difference exists'.
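
A minimal simulation sketch of this idea (not from the textbook; the effect size, group size, and alpha level are invented for illustration): power is estimated as the proportion of simulated studies that correctly reject a false null hypothesis when a real effect exists.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, alpha, reps = 0.5, 50, 0.05, 5000  # assumed true effect, per-group n, significance level

hits = 0
for _ in range(reps):
    control = rng.normal(0, 1, n)          # no-treatment group
    treatment = rng.normal(true_d, 1, n)   # treatment group shifted by the true effect
    _, p = stats.ttest_ind(treatment, control)
    hits += p < alpha                      # study correctly detected the real effect

print(f"Estimated power: {hits / reps:.2f}")  # roughly .70 for these illustrative settings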

threats to internal validity in sections

1. Individuals change (changes in the individual that occur not because of the iv but because of some other external or personal factor)
- maturation effects
- history effects
- testing effects
2. Methodological errors (errors in design and measurement)
- design confounds (a variable other than the iv that unintentionally and systematically varies with the iv levels and changes the dv; these other variables should be kept constant, aka made into control variables - variables the researcher intentionally holds constant so they don't become design confounds / alternative explanations for changes in the dv)
- order effects
- regression to the mean
- instrumentation (the measurement instrument, such as raters or self-report questionnaires, changes from pretest to posttest, and this changes the dv scores rather than the iv changing them)
3. Non-treatment differences may exist or emerge between groups
- selection effects
- attrition / mortality

Construct validity of independent variables (how well an independent variable is manipulated) can be checked with...

1. Manipulation checks (extra, secondary dependent variables that researchers include in a study in addition to the actual dv to assess whether the iv manipulation worked as intended, aka whether their manipulation of the iv was strong enough to actually affect the dv in the intended way) ensure construct validity of independent variables 2. Pilot studies (smaller studies, usually done before a main study, that assess the effectiveness of an iv manipulation) ensure construct validity of independent variables

disadvantages of within groups designs

1. Order effects: potential for order effects such as carryover effects or practice effects (in repeated-measures designs). Solution: counterbalancing the order in which iv levels are presented.
2. Might not be practical or possible.
3. Demand characteristics: experiencing all levels of the independent variable can change the way participants respond to the levels, aka seeing one level can change the response to another level (demand characteristics).
more:
1. They have the potential for order effects, which can threaten internal validity (does the iv really cause the dv, or does the order of iv levels 1 and 2 - say level 1 is presented before level 2 - impact participants' responses to level 2? If so, the order would be a confound / alternative explanation for changes in the dv rather than the actual effect of the iv). But researchers can usually control for order effects by counterbalancing (switching the order in which iv levels are presented to participants).
2. A second disadvantage is that a within-groups design may not be possible or practical. Suppose a researcher wants to test the best way to teach kids to ride a bike, method A or method B. The iv is bike-riding method, and the levels are method A and method B. She literally cannot present both levels of bike-riding method to all participants, because she can't teach a group of kids to ride with method A, return them to baseline, and then teach them with method B. Once taught, the kids are permanently changed. In such a case, a within-groups design wouldn't make sense, with or without counterbalancing.
3. A third problem occurs when participants see all levels of the independent variable and then change the way they would normally act. Seeing all levels of the variable instead of just one may clue people in to the goals of the study or the expectations of the researcher, so their responses to the levels would not be due to the iv manipulation but to their conforming to researcher expectations or otherwise changing their behavior. Imagine a study that asks people to rate the attractiveness of two people, one black and one white. Participants who see both levels of the iv (the black and white faces) may guess that the study has something to do with prejudice and, as a result, change their behavior not because of actual perceived attractiveness but because they guessed what the study was about. A cue that can lead participants to guess an experimenter's hypothesis and thus change their behavior is known as a demand characteristic, or an experimental demand.

two advantages of studies with larger samples

1. The CI range is narrower, so the estimate is more precise. Large samples are more likely to lead to statistically significant results (aka CIs that do not include 0). Larger sample = narrower CI and a more precise estimate; a larger sample reduces the impact of within-groups variability, making between-groups differences easier to detect. 2. Effects detected from small samples sometimes can't be repeated. (Imagine a study of online reading games that tested only 10 children. Even if reading games don't work, it's possible that, just by chance, 3 children show a terrific improvement in reading after using them. Those children would have a disproportionate effect on the results because the sample was so small, and because the result was due to 3 exceptional children, researchers may not be able to replicate it.) Thus, large samples have a better chance of estimating real effects.
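
An illustrative sketch of the first point (population mean and SD below are made up): the 95% CI around a sample mean narrows as n grows, because the margin of error shrinks with the standard error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    sample = rng.normal(loc=5, scale=10, size=n)     # hypothetical scores
    sem = stats.sem(sample)                          # standard error of the mean
    margin = stats.t.ppf(0.975, df=n - 1) * sem      # 95% margin of error
    print(f"n={n:4d}  mean={sample.mean():5.2f}  CI width={2 * margin:5.2f}")
```

Running this shows the CI width dropping sharply from n=10 to n=1000, which is the "more precise estimate" advantage.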

statistical validity considerations

1. how strong is the effect (effect size d / strength of point estimate) 2. how precise is it (CI range or margin of error width) 3. were there any replication studies to confirm these statistics?
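
A hedged sketch of the "how strong is the effect" check: Cohen's d is the difference between group means divided by the pooled standard deviation. The two score lists are invented for illustration.

```python
import numpy as np

treatment = np.array([12, 15, 14, 16, 13, 17])   # hypothetical treatment-group scores
control = np.array([10, 11, 9, 12, 10, 11])      # hypothetical comparison-group scores

n1, n2 = len(treatment), len(control)
pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (treatment.mean() - control.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # larger |d| = stronger effect
```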

observer bias

A bias that occurs when observer expectations influence the interpretation of participant behaviors or the outcome of the study. Observer bias affects both 1. construct validity and 2. internal validity, and comparison groups don't control for observer bias. Observer bias can affect any experiment regardless of design. In the depression study, the researcher may expect the iv therapy treatment to improve depression levels. Thus, if using certain observational measures to assess depression, the researcher may unconsciously perceive the treatment group to be improving even if they are not. Comparison groups do not control for observer bias: even if the researcher used a no-therapy comparison group, her expectation that the treatment group who received therapy would improve could lead her to see more improvement in the therapy group than in the comparison group, even if it's not there. Observer bias threatens both internal validity and construct validity. It threatens internal validity because an alternative explanation may exist for the results - the observers' expectations skewing their interpretation of participants' behavior. It threatens construct validity of the dependent variable (ie: depression levels) because it means the measurement of depression was influenced or skewed by the researcher's expectations.

placebo group

A control group in an experiment that is exposed to an inert treatment, such as a sugar pill. Also called placebo control group

demand characteristics

A cue that leads participants to guess a study's hypotheses or goals; a threat to internal validity present in within-groups studies where participants are presented with all levels of an independent variable and, by seeing all levels, change the way they respond to them. Also called experimental demand: cues in an experiment that tell the participant what behavior is expected, so they change their behavior on the dv not because of the independent variable manipulation, but because they guessed what the researchers expect. Demand characteristic example: Imagine a study that asks people to rate the attractiveness of two people, one black and one white. Participants in such a study may think the study has something to do with prejudice and, as a result of seeing both levels of the variable (both faces), may change their behavior and rate them as equally attractive so as not to seem prejudiced. Basically, by seeing both levels of the variable (black face vs white face), participants were cued in to the hypothesis and goals of the study (that it was about prejudice), and thus changed their responses to the different iv levels. In the face attractiveness study, it may have been more appropriate to use an independent-groups design in which participants were not presented with all levels of the variable, but rather split into 2 groups with each group presented with only one level of the iv. Thus, one group would rate the attractiveness of a white man and the other would rate the attractiveness of a black man, and the demand cue (suggesting to participants that the researchers' hypothesis dealt with prejudice) that comes from seeing both levels of the iv wouldn't come into play and cause participants to change their responses to the iv. When asking about demand characteristics in a within-groups study, you ask: did the independent variable manipulation really work, or did the participants simply guess what the researchers expected them to do and change their behavior accordingly?

demand characteristics

A cue that leads participants to guess a study's hypotheses or goals; a threat to internal validity. Occurs when participants' guesses about a study's goals or the researcher's hypothesis cause them to change their behavior on the dv, rather than the intended iv manipulation causing the change (thus an alternative-explanation threat to internal validity). Demand characteristics can affect any study regardless of design. Demand characteristics are a problem for internal validity when participants guess what the study is supposed to be about and change their behavior in the expected direction. For example, in the depression study, depressed people know they are getting therapy and that the researcher expects it to improve their depression, so they may change their self-reports of depression in the expected direction. Demand characteristics may occur in within-groups studies when participants are exposed to both levels of a variable, and exposure to both levels cues them in to what the study is about (ie: rating the attractiveness of a black person vs a white person, seeing both levels of the variable - a white face vs a black face - and subsequently guessing that the study is about prejudice and changing their responses to these levels so as not to seem prejudiced).

null effect

A finding that an independent variable did not make a difference in the dependent variable; there is no significant covariance between the two. If a CI range includes zero (ie: a point estimate of 9 with a wide CI range of -2 to 17), we conclude the result is not statistically significant, aka just as likely to be due to chance as to the treatment, and we cannot conclude the treatment actually had an effect or that there was any relationship/covariance at all between the variables; there may be literally zero association. A CI range that includes zero renders a finding a null effect (we conclude the iv didn't affect the dv in a significant way, aka in a way that isn't likely to be due to chance). Even when null effects are found, it's possible there still may be an effect of the iv on the dv or a relationship between the variables, just one that this particular study did not detect. Replication studies can either confirm or contradict null effects found by earlier studies.

Latin square

A formal system of partial counterbalancing that ensures that each condition in a within-groups design appears in each ordinal position at least once. So if you had 4 iv levels (24 possible sequences), a Latin square would use only a small subset of those sequences (eg: 4 of them), arranged so that each condition appears first, second, third, and fourth at least once across the participants.

confound

A general term for a potential alternative explanation for a research finding; a threat to internal validity. Confounds can include third-variable alternative explanations, or moderator variables, which moderate/change the relationship between two studied variables. For within-groups studies (where all participants are exposed to all levels of a variable, either at the same time in a concurrent design or at diff times in a repeated-measures design), a special confound / threat to internal validity is order effects: exposure to one level of a variable changes participants' response to the other level. Order effects include practice effects (participants' performance improves over time because they become practiced at the dependent measure, not because of the iv manipulation or treatment) and carryover effects (some form of contamination carries over from one condition to the next, affecting participants' response to the second condition; ie: when sipping orange juice right after you brushed your teeth, the first condition (teeth brushing) affects your response to the second condition (orange juice tasting)). Counterbalancing (switching the sequence of variable levels presented within the participant group, like half get level 1 then level 2 and half get level 2 then level 1 - it's still within groups because all participants see all levels of the variable) can control for order effects. For independent-groups studies, a confound can take the form of selection effects, when the kinds of participants at one level of the variable are systematically diff from the kinds at the other level (ie: all the meditators were introverts and all the non-meditators were extroverts, aka systematic variation in differences between groups). Random assignment prevents selection effects. Confounds can be prevented with control variables: in studies, researchers think carefully about potential confounds and turn them into control variables instead.

control group

A level of an independent variable that is intended to represent "no treatment" or a neutral condition. Also called control condition (ie: my non-meditating, no-treatment group was a control group as they got no treatment / a neutral condition). When a study has a control group, the other level or levels of the independent variable are usually called the treatment group(s). When the control group is exposed to an inert treatment such as a sugar pill, it's called a placebo group, or placebo control group. Not every experiment has or needs a control group. For example, in the notetaking study, one group took longhand notes and the other took laptop notes; there was no group that represented a 'no treatment' condition or 'no notetaking' - aka there was no control group, and there didn't need to be. All experiments need a comparison group, but the comparison group may not need to be a control group.

preventing weak manipulations, ceiling effects, and floor effects

A manipulation check can be used to rule out ceiling and floor effects due to weak manipulations. A manipulation check is a separate/extra dependent variable that experimenters include in a study specifically to make sure the iv manipulation worked. In the anxiety voltage study, a manipulation check would entail: after telling participants in one group they'd receive a 10-volt shock and the other group they'd receive a 50-volt shock, researchers would ask the participants how anxious they are right now on a scale of 1 to 10. The extra dependent variable would be anxiety level given the iv of differing shock-voltage expectations, included in order to see if this manipulation of anxiety worked, while the main dependent variable was logical reasoning under differing anxiety levels. For this poorly manipulated iv of anxiety (manipulated via shock-voltage expectation), a manipulation check would have revealed that all the groups had similar levels of anxiety regardless of which voltage they thought they were going to get, and researchers would have known their manipulation of anxiety didn't work and the iv of anxiety should be operationalised differently. On the other hand, if the manipulation check showed that the iv levels varied in the expected way - people who were told they'd get 10 volts were less anxious than those told they'd get 50 volts - then researchers could conclude their manipulation of anxiety was sufficient and could be used with the ultimate dv of logical reasoning. If the manipulation check worked, the researchers may look for another reason they got a null effect - perhaps the dv measurement was inappropriate and led to a floor effect; perhaps the logical reasoning test was too hard in general and both the effectively manipulated anxious and non-anxious groups got low scores on it. When interrogating null results for potential ceiling and floor effects, ask: Was the iv manipulation strong enough to create a difference between the groups? And was the dependent variable measure sensitive enough (and appropriate) to detect the difference?
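
An illustrative sketch of how a manipulation check might be analyzed: compare self-reported anxiety (the extra dv) between the two shock-expectation groups. The anxiety ratings below are hypothetical 1-10 scores, not data from the study.

```python
from scipy import stats

anxiety_10_volt = [3, 4, 2, 3, 5, 4, 3]    # hypothetical ratings, 10-volt expectation group
anxiety_50_volt = [7, 8, 6, 9, 7, 8, 7]    # hypothetical ratings, 50-volt expectation group

t, p = stats.ttest_ind(anxiety_50_volt, anxiety_10_volt)
if p < 0.05:
    print("Manipulation check passed: the groups differed in anxiety as intended.")
else:
    print("Manipulation check failed: the anxiety manipulation may have been too weak.")
```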

full counterbalancing

A method of counterbalancing in which all possible condition orders are represented. A repeated-measures design with 2 conditions (2 levels of the iv), such as the teeth-brushing/orange-juice study or the chocolate-tasting study, is easy to counterbalance because there are only two orders, A → B and B → A. In a repeated-measures design with 3 conditions, or 3 levels of the iv being represented in all possible orders, each group of participants could be randomly assigned to one of the six following sequences: A → B → C, A → C → B, B → A → C, B → C → A, C → A → B, C → B → A. But as the number of conditions (the number of levels of the independent variable) increases, the number of possible orders needed for full counterbalancing increases - a study with four iv levels would require 24 possible sequences to fully counterbalance. Thus, full counterbalancing is typically used only for experiments with 3 iv levels or fewer. So for a study with 2 conditions, there are only 2 order sequences that need to be presented. For a study with 3 conditions, there are 6 sequences. For a study with 4 conditions, there are 24 possible sequences that would need to be represented to achieve full counterbalancing. In cases with more than 3 conditions / independent variable levels that need to be presented to each participant, partial counterbalancing is used.
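
A small sketch of why full counterbalancing gets expensive: the number of possible condition orders is k! for k conditions (2 → 2, 3 → 6, 4 → 24). The condition labels are placeholders.

```python
from itertools import permutations

for conditions in (["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    orders = list(permutations(conditions))
    print(f"{len(conditions)} conditions -> {len(orders)} possible orders")
    # e.g., for 3 conditions: ABC, ACB, BAC, BCA, CAB, CBA
```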

partial counterbalancing

A method of counterbalancing in which some, but not all, of the possible condition orders are represented. In partial counterbalancing, not all orders of the independent variable presentation are represented. All levels of the independent variable are still presented to each participant, making it within groups, but not all orders/sequences of these levels are represented as they would be in full counterbalancing. Methods of partial counterbalancing:
- One way to partially counterbalance is to present the conditions in a randomized order for every subject. So if your iv had 4 levels, each participant would simply be randomly assigned to one of the 24 possible order sequences.
- Another way to partially counterbalance is to use a Latin square: a formal system of partial counterbalancing which ensures that every condition in a within-groups design appears in each ordinal position at least once. So if you had 4 iv levels (24 possible sequences), a Latin square would use only a subset of those sequences, arranged so that each condition appears first, second, third, and fourth at least once across the participants. (A minimal construction is sketched below.)
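
A minimal sketch of a (cyclic) Latin square for 4 hypothetical conditions: each condition appears in each ordinal position exactly once, so only 4 of the 24 possible orders are used. This is one common construction, not the only one (balanced Latin squares use a different arrangement).

```python
conditions = ["A", "B", "C", "D"]
k = len(conditions)
# Row r is the base order rotated by r positions, so every condition hits every position once.
latin_square = [[conditions[(row + col) % k] for col in range(k)] for row in range(k)]
for sequence in latin_square:
    print(" -> ".join(sequence))
# A -> B -> C -> D
# B -> C -> D -> A
# C -> D -> A -> B
# D -> A -> B -> C
```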

regression to the mean

A phenomenon in which an extreme finding is likely to be closer to its own typical, or mean, level the next time it is measured, because the same combination of chance factors that made the finding extreme is unlikely to be present the second time.
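
An illustrative simulation of this phenomenon (all numbers invented): select the most extreme scorers at time 1, where each score is a stable true level plus chance, remeasure them at time 2 with new chance factors, and the group mean moves back toward the overall mean even with no treatment at all.

```python
import numpy as np

rng = np.random.default_rng(2)
true_level = rng.normal(50, 10, 1000)            # each person's typical level
time1 = true_level + rng.normal(0, 10, 1000)     # measurement 1 = true level + chance
time2 = true_level + rng.normal(0, 10, 1000)     # measurement 2 = same level, new chance

extreme = time1 > np.percentile(time1, 90)       # pick the most extreme pretest scorers
print(f"Extreme group at time 1: {time1[extreme].mean():.1f}")
print(f"Same people at time 2:  {time2[extreme].mean():.1f}  (closer to the overall mean of ~50)")
```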

placebo effects

A response or effect that occurs when people receiving an experimental treatment experience a change only because they believe they are receiving a valid treatment. For example, the women in the depression study may have improved from pretest to posttest not because the iv treatment worked, but because they expected to improve due to the treatment. This happens in studies on pills, when a control group is given a sugar pill instead of a depression pill, yet their depression improves because they believe the treatment they're receiving is valid and will work. Herbal remedies are often considered to be placebos. But placebo effects aren't imaginary; they've been shown to notably reduce real symptoms, both psychological and physical. People's symptoms appear to improve not only because of active ingredients in medications or therapy, but also because of their belief in what the treatment can do to alter their situation. Placebo effects threatening internal validity can be prevented with a double-blind placebo control study, which uses a special type of comparison group and in which neither the researchers nor the participants know who is getting the placebo treatment and who is getting the real thing (ie: researchers and participants don't know who is getting a real pill vs the sugar pill).

interaction effect

A result from a factorial design in which the effect of one independent variable on the dependent variable changes depending on the level of the other independent variable; a difference in differences. Also called an interaction. IE: in a does-cell-phone-use-affect-driving study, the two independent variables are cell phone usage (levels: no usage and usage) and driver age (levels: above or below 40). The dv is driving speed. These variables may interact - does the effect of the original independent variable (cell phone use) on the dv of speed depend on the level of the other independent variable (driver age)? The mathematical way to describe an interaction is to call it a 'difference in differences': the difference the first iv makes in the dv is itself different depending on the level of the other iv. In the driving example, the difference in driving speed (dv) between cell phone and control conditions might be different for older drivers than for younger drivers. You can see interactions on a graph whenever the lines are not parallel - they may either cross over each other (a crossover interaction) or branch out from each other (<) (a spreading interaction).
Types of interactions: 1. Two-way interactions, in which the effect of one variable on the dv depends on the level of another variable. 2. Three-way interactions (in 2x2x2 factorial designs), in which the two-way interaction between two of the independent variables depends on the level of the third independent variable. This is easiest to see on a line graph.
You find a three-way interaction in a 2x2x2 factorial design if 1. there is a two-way interaction for one level of a third independent variable but not for the other level, or 2. a graph shows one pattern of two-way interaction on one side but a different pattern on the other side - aka there are different two-way interactions. But if you found the same kind of two-way interaction for both levels of the third independent variable, there would not be a three-way interaction. In other words, if the two-way interactions are the same there's no three-way interaction. If the two-way interactions are different there is a three-way interaction. If there are no two-way interactions there is no three-way interaction.
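
A hedged sketch of the "difference in differences" idea for the 2x2 cell-phone/age example: compute the cell-phone effect on speed separately for each age group and compare the two effects. The cell means below are hypothetical, not data from the study.

```python
# Hypothetical mean driving speeds (mph) for each cell of the 2x2 design.
mean_speed = {
    ("young", "no_phone"): 70, ("young", "phone"): 74,   # phone effect = +4
    ("older", "no_phone"): 65, ("older", "phone"): 77,   # phone effect = +12
}

young_diff = mean_speed[("young", "phone")] - mean_speed[("young", "no_phone")]
older_diff = mean_speed[("older", "phone")] - mean_speed[("older", "no_phone")]
interaction = older_diff - young_diff   # nonzero -> the lines on a graph are not parallel
print(f"Phone effect for younger drivers: {young_diff}")
print(f"Phone effect for older drivers:   {older_diff}")
print(f"Difference in differences (interaction): {interaction}")
```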

masked design

A study design in which the observers are unaware of the experimental conditions to which participants have been assigned, but the participants know which condition they're in. Also called blind design. In the depression study, participants would have to know which condition they were assigned to, as they'd either be getting therapy or not getting therapy. But if using an observational measure or even a self-report measure, raters wouldn't have to know which participants were in which group. They'd either observe participants' depression symptoms without knowing whether they'd received the treatment, or score their symptoms on a self-report test without knowing whether they'd been in the treatment therapy group or the no-treatment comparison group. This would prevent observer bias, but unfortunately demand characteristics (cues that lead participants to guess the goals of the study and change their behavior accordingly) could still be an issue, because the participants know which experimental group they're in and may change their behavior to match what they think the researchers expect from that group. Consider the notetaking (laptop vs longhand notes) study. This was a masked/blind study design because the research assistants in that study were blind to the condition each participant was in when they graded their tests on the lectures. The participants themselves were not blind to their notetaking method. However, since the test-takers participated in only one condition (an independent-groups design), they were not aware that the form of notetaking was an important feature of the experiment. Therefore, they were blind to the reason they were taking notes in longhand or on a laptop.

double blind study

A study in which neither the participants nor the researchers who evaluate them know who is in the treatment group and who is in the comparison group. A double-blind study is the best way to control for observer bias (in which researchers' expectations affect their interpretation of participant behavior) and demand characteristics (in which participants change their behavior because they know the purpose of the study, or change it to match researcher expectations). When a full double-blind study isn't possible, a variation called a masked design can be used, in which observers are blind to which experimental group participants are in, but the participants know which condition they're in by virtue of the study design.

double blind placebo control study

A study that uses a treatment group and a placebo group and in which neither the researchers nor the participants know who is in which group. Researchers can include this special kind of comparison group to determine whether an effect is really caused by a treatment or is a placebo effect. One group receives a placebo treatment, and the other receives the real thing. The crucial feature is that neither the people treating the participants nor the participants themselves know whether they are in the real group or the placebo group.

selection history threat

A threat to internal validity in a proper 2-group pretest/posttest design, in which an external ('historical' or seasonal) event systematically affects only the participants in the treatment group or only those in the comparison group, not both. An outside event or factor systematically affects people in the study - but only those at one level of the independent variable, either the treatment group or the comparison group. For example, in the camping study, say there was a comparison group: some campers got a low-sugar diet and some got their normal diet. Between the pretest and posttest of their unruly behavior, perhaps those that got the low-sugar diet had to walk to a farther dining hall while the comparison group walked to the normal, close-by one. Perhaps the treatment group who had to walk far to get their food showed reduced unruly behavior because they were tired out from the walk, while the comparison group who went to the normal dining hall did not get tired - so an external factor influenced only the participants at one level of the independent variable and not the other. The treatment group was influenced by an external factor (the tiring walk to the far dining hall) while the comparison group was not.

design confound

A threat to internal validity in an experiment in which a second variable happens to vary systematically along with the independent variable and therefore is an alternative explanation for the results. A design confound is a second variable that unintentionally varies systematically with the independent variable. For example, in the parental perseverance baby study, if the adult models modeling either perseverance or giving up had accidentally exhibited more cheerful attitudes in the perseverance condition than in the giving-up condition, the study would have a design confound, because the second variable (cheerfulness level) would have systematically varied along with the intended independent variable (perseverance vs giving up). Remember that only systematic variability of a second variable in tandem with the independent variable is a problem; unsystematic variability is fine and inevitable. Considering the baby study, it might have been the case that some parental models were naturally more cheerful in disposition and others were more reserved. The emotional disposition of the models is a problem for internal validity only if it shows systematic variability with the independent variable.
- Systematic variability: when the levels of some second variable coincide in a predictable and systematic way with the independent variable, creating a potential design confound or second-variable explanation. In the baby perseverance study, if the parental models' emotional disposition varied systematically with the independent variable - as in, cheerful parents systematically went with perseverance condition 1 and reserved parents went with giving-up condition 2 - then emotional disposition would be a design confound, as it would vary systematically with the independent variable and thus be a potential alternative explanation for the dependent outcome variable.
- Unsystematic variability: when levels of a second variable vary unsystematically, in an unpredictable way, across the levels of the independent variable. In the baby study, if some parents happened to be cheerful and others happened to be reserved, but these dispositions did not vary systematically with the iv groups of perseverance vs giving up, it would be unsystematic variation and would thus not affect internal validity. In other words, if some parents in the perseverance group were cheerful and some were reserved, and the same went for the giving-up group, this would be unsystematic variation between groups that did not systematically coincide with the independent variable levels - thus not causing a confound / potential alternative explanation. Unsystematic variability is also called haphazard or random variability.

selection attrition threat

A threat to internal validity in which participants are likely to drop out of either the treatment group or the comparison group, not both. In a selection-attrition threat, only one of the groups (either treatment or comparison) experiences attrition, aka a certain kind of participant systematically dropping out midstudy, after the pretest but before the posttest. If the depression study were a true 2-group pretest/posttest design - say half the participants were assigned to a therapy treatment group and half to a no-treatment comparison group (iv level 1: therapy treatment, iv level 2: no therapy) - it might be the case that the most severely depressed people dropped out, but only from the treatment group, not the control group. Perhaps the therapy in the treatment group was just too much for the most depressed people, so only the treatment group experienced attrition (certain kinds of participants dropping out). This poses a threat to internal validity: did the cognitive therapy really work compared to the comparison control group? Or is it just that the most severely depressed people dropped out of the treatment group, so that on the posttest the treatment group's dv depression scores appeared to improve, when the improvement was actually due to the most depressed people dropping out of the treatment group?

regression threats

A threat to internal validity related to regression to the mean, a phenomenon in which any extreme finding is likely to be closer to its own typical, or mean, level the next time it is measured (with or without the experimental treatment or intervention). Regression threats occur when scores become less extreme over time (regression to the mean): scores on the pretest may be extreme and then become less extreme on the posttest, not due to the iv manipulation but due to natural regression to the mean. When a group average (mean) is unusually extreme at time 1, the next time that group is measured (time 2), the mean is likely to be less extreme, aka closer to its typical or average level. Thus, in regression threats, an unusual pretest group mean that was partly due to chance factors will likely be different at the posttest not because of the iv treatment, but because the initial pretest measurement was unusually extreme due to chance. Regression threats occur only when a group is measured twice, and only when the group has an unusually extreme score at the pretest.
Preventing regression threats: comparison groups allow researchers to account for regression threats. If there was a control group that did not receive depression treatment, but their depression levels still went down after 12 weeks and regressed toward their baseline (mean), the researchers could attribute these changes to regression effects and see whether the iv treatment in the experimental group led to further decreases in depression beyond regression to the mean. Researchers can account for regression effects best if the comparison group and experimental group means are equally extreme at the pretest. However, even if the means are not equally extreme at pretest, regression alone can't make an extreme group cross over the comparison group to the other extreme, so researchers could conclude that therapy has some effect, in addition to a little help from regression. Changes in dv measurements in an experimental group can be due to both regression to the mean and the actual effect of the iv. Regression threats are an issue especially when a group is selected specifically for its extreme scores (ie: severely depressed people).

selection effects

A threat to internal validity that occurs in an independent-groups design when the kinds of participants at one level of the independent variable are systematically different from those at the other level. Selection effects are a kind of systematic variability among the different groups/levels of the iv that becomes a confound if not prevented via random assignment or matched-groups assignment. A selection effect may result if experimenters don't use random assignment and accidentally assign one type of person (ie: all the introverts or high achievers, or all who sign up early in the semester) to one condition, and another type of person (ie: all the extroverts or low achievers, or all those who wait until later in the semester) to the other condition. This would result in systematic variation between the kinds of participants in one group/level of the iv and those in the other (for example, in my meditation study, if I assigned all those who responded to the email first to meditating group 1 and all those who responded last to non-meditating group 2, the kinds of participants would likely vary systematically between groups, creating an internal validity issue). Of course, in any study there will be natural variation among the kinds of participants in each group. These variations only become a selection-effects confound when the kinds of participants in one group are systematically different from the kinds in another. Basically, selection effects occur when the participants in the groups of an independent-groups study vary systematically from each other (ie: participants in group 1 have high gpas while participants in group 2 have low gpas). Random assignment prevents systematic variation between groups, or selection effects. Matched assignment is another way to prevent selection effects and ensures near-complete equality between groups, even beyond normal random assignment.
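
A minimal sketch of simple random assignment (participant labels are placeholders): shuffling breaks any systematic ordering, so early vs late responders (or introverts vs extroverts) can't end up systematically stacked in one condition.

```python
import random

random.seed(3)
participants = [f"P{i:02d}" for i in range(1, 21)]  # listed in sign-up order
random.shuffle(participants)                        # break any systematic ordering

midpoint = len(participants) // 2
meditation_group = participants[:midpoint]
control_group = participants[midpoint:]
print("Meditation:", meditation_group)
print("Control:   ", control_group)
```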

instrumentation threats

A threat to internal validity that occurs when a measuring instrument changes over time. Instrumentation threats occur when a measuring instrument changes from pretest to posttest, so the group's scores change from pretest to posttest not because of the iv manipulation but because the measuring instrument changed over time. In observational studies, the people or 'raters' who are coding/rating behaviors are the measuring instrument, and over a period of time they may unconsciously change their standards for judging behavior by becoming stricter or more lenient. For example, maybe the campers didn't really become less disruptive; maybe the observers judging their behavior unconsciously became more tolerant of loud voices and rough play. In that case, changes in the pretest vs posttest scores would not be due to iv manipulation but to the measuring instrument (in this case raters; in other cases differing pretest/posttest items) taking different measurements. Another form of instrumentation threat is when a researcher uses different forms of measurement at pretest vs posttest, like a different self-report scale (perhaps to prevent testing effects), but the two forms are not sufficiently equivalent. In the women's depression study, perhaps the pretest measure of depression was a bit different from the posttest one, and the women tended to score higher on the pretest measure because of this difference. Thus, the changes in the dv measurement from pretest to posttest would be due not to the iv manipulation but to the pretest and posttest measurement methods being different.
Preventing instrumentation threats: to prevent instrumentation threats in which measuring methods inadvertently change from pretest to posttest, researchers can switch to a posttest-only design, or they can take steps to ensure that the pretest and posttest measures are equivalent. To do so, they may collect data from each instrument to be sure the two are calibrated the same. To avoid observers/raters whose standards shift, researchers might retain the same coders throughout the experiment, establishing their reliability and validity at both pretest and posttest. Clear coding manuals (codebooks with instructions on how to rate observed behaviors) help raters' observations remain consistent from pretest to posttest.

maturation threat

A threat to internal validity that occurs when an observed change in an experimental group could have emerged more or less spontaneously over time: dv measurements change from pretest to posttest not because of the iv treatment but because of spontaneous changes that naturally happen over time (ie: spontaneous remission) and have nothing to do with the iv treatment. Maturation threats occur when people's measurements on a dv change because they naturally adapt to situations, or 'mature,' over time (ie: the campers' unruly behaviour decreasing not because of the iv low-sugar diet but because they adapted to the camp environment; students doing better on the second exam in a class because they are more used to taking that type of test; etc.). People adapt and change, and when these natural spontaneous changes over time cause the changes in the dv instead of the iv causing them, that's a maturation threat. This includes spontaneous remission over time unrelated to the iv manipulation (ie: of depression in the women's depression study).
Preventing maturation threats: comparison groups are the way to tell whether the iv treatment caused the changes in the dv or whether they were due to something else, like a maturation effect. In one-group pretest/posttest designs that lack a comparison group, there is no way of knowing whether improvements (ie: in the women's depression levels or the boys' behavior) were due to the iv treatments administered (the low-sugar diet and therapy) or to spontaneous maturation effects. A real experiment would have contained 2 groups, including a comparison group (a group of boys who kept their normal diet, or a group of women who didn't receive therapy), which would allow the researchers to see whether it really was their iv causing these changes or just maturation.

history threat

A threat to internal validity that occurs when it is unclear whether a change in the treatment group is caused by the treatment itself or by an external (or 'historical') factor that affects most members of the group. It occurs when external factors affect most members of the group at the same time as they are receiving the iv treatment, so it's unclear whether these external factors that affected the group between the pretest at time 1 and the posttest at time 2 caused the changes in the dv measurement, or whether the changes were actually due to the iv treatment. Sometimes a threat to internal validity occurs not just because time has passed, but because something specific has happened between the pretest and the posttest. In the camping example, was it the low-sugar diet that caused improvements in the campers' behavior, or did they all start a difficult swimming course in the middle of the study whose exercise tired them out? In that case the external factor/event that occurred while the iv treatment was being administered (between pretest and posttest) caused the change in the dv (their behavior), not the intended iv of diet change.
IE: A city in California has asked Professor Rodriguez to conduct an experiment on earthquake preparedness. Professor Rodriguez will assess the preparedness of a random sample of residents in the city, and the city will mail out its annual brochure on earthquake safety. Then, 2 weeks later, he will again assess the preparedness of those residents. Right after the brochures are mailed, a large earthquake is reported in Japan. This poses a history threat to internal validity because an external factor (the faraway earthquake) happened between the pretest and the posttest (the earthquake readiness assessments): people may be more prepared for earthquakes on the posttest because they're worried about the earthquake that happened in Japan and did their own research, not because of the intended iv of the earthquake brochure.
Preventing history threats: as with maturation effects, a comparison group helps control for history threats. In the camping study, a comparison group in which the boys did not receive the diet change would have allowed researchers to see whether the low-sugar diet iv manipulation was actually the cause of the dv change (behavior), or whether the behavioral change was due to some other external factor that affected most members of the groups at the same time as they were receiving the iv treatment, like the swimming class. In the depression example, a history threat could be an external factor influencing the women's depression levels (dv) between the pretest and posttest, aka an external factor affecting most members of the group at the same time they were undergoing the iv treatment. For example, a change in the weather between the depression pretest and posttest could have affected most of the group's depression levels at the same time as they were receiving the treatment, making it unclear whether this external weather factor caused the dv (depression level) changes or whether the iv manipulation of therapy did. A comparison group with no therapy would allow researchers to tell whether this external weather factor could have been the sole cause of the dv changes (making it a history effect), or whether their iv manipulation was actually the cause.

practice effect

A type of order effect in which participants' performance improves over time because they become practiced at the dependent measure (not because of the manipulation or treatment). Also called fatigue effect. Practice effects can occur when participants get better at the dv measurement task over repeated measurements OR when they get tired or bored by the second dv measurement and thus do worse on it. Practice effects require the dv to be measured more than once, so they occur in repeated-measures designs.

carryover effect

A type of order effect in which some form of contamination carries over from one condition to the next: exposure to one level of a variable 'contaminates,' aka changes, participants' experience of the other level. For example, imagine a study determining which tastes better, sipping orange juice or brushing your teeth. The iv is orange juice vs teeth brushing, participants are exposed to both levels at different times (level 1 teeth brushing, level 2 orange juice tasting), then the dv of taste preference is measured. Carryover order effects pop up because when sipping orange juice right after you brushed your teeth, the first condition (teeth brushing) affects or contaminates your response to the second condition (orange juice tasting). The attractiveness study of white vs black men, if done in a repeated-measures manner, would have been subject to carryover effects if exposure to one level (a white face) changed participants' attractiveness rating of the other level (a black face) because they didn't want to seem prejudiced (they guessed the study was about prejudice because of demand characteristics), so exposure to one level contaminated their responses to the next level. This could have been controlled for via counterbalancing so carryover effects would be cancelled out. IE: a study that tests reduction in depression via iv level 1: psychotherapy and iv level 2: depression medication. If this were a within-groups (repeated-measures) design, it would be subject to order effects in the form of carryover effects. If participants got one level, say depression meds, then got the other level, psychotherapy (or vice versa), this could cause carryover effects: is getting the meds first carrying over and causing the decreased depression when participants are exposed to the next condition of psychotherapy, or, vice versa, is getting the psychotherapy first carrying over into the next condition of meds and causing the decreased depression?

one-group pretest-posttest design

An experiment in which a researcher recruits one group of participants; measures them on a pretest; exposes them to a treatment, intervention, or change; and then measures them on a posttest. There is no comparison group, which makes this a very weak experimental design (the 'really bad experiment'). One-group pretest/posttest designs are vulnerable to specific internal validity threats including maturation threats, history threats (external-factor threats), regression threats, attrition threats, testing threats, and instrumentation threats. Because there's no comparison group that isn't getting the treatment, it's impossible to tell whether changes in the dv are due to these threats or to the iv manipulation. A one-group pretest/posttest design is a within-groups design, but it's a poorly controlled one and shouldn't be used: pretest/posttest designs should be independent-groups designs and contain a comparison group to rule out these special internal validity threats.

concurrent-measures design

An experiment using a within-groups design in which participants are exposed to all the levels of an independent variable at roughly the same time, and a single attitudinal or behavioral preference is the dependent variable. Unlike a repeated-measures design, in which the one group of participants is exposed to diff levels of the iv at diff times, in a concurrent-measures design participants are exposed to the diff levels of the iv at the same time. For example, in a study in which infants were shown a male face and a female face at the same time, experimenters measured which face they looked at the longest. The independent variable was the gender of the face (two levels: male or female), and the dependent variable was how long the baby looked at each face. The babies experienced both levels of the independent variable (exposure to the male and female faces) at the same time. If this had been a repeated-measures design, researchers would have presented the diff levels of the iv (male face and female face) one after the other at different times, and measured the dependent variable of looking preference twice. But in the concurrent-measures design, they presented both levels of the iv (male and female faces) at the same time, and measured the dv (looking preference) only once. In other words, 1. the measure of the dv wasn't repeated, it happened only once, and 2. the levels of the iv were presented to participants at the same time, as opposed to one after the other at diff times as they would be in a repeated-measures design. Concurrent-measures designs aren't subject to order effects because participants don't get the levels of the iv in a certain order; they get them all at the same time.

posttest only design

An experiment using an independent-groups design in which participants are tested on the dependent variable only once, after exposure to the independent variable. Also called equivalent-groups posttest-only design. The notetaking example is a posttest-only design with two independent variable levels. Participants were assigned to either iv notetaking level 1 (laptop notes) or iv notetaking level 2 (longhand notes), and they were tested on the dv (knowledge retention of the lecture) only once, after their exposure to these iv levels. Pretest/posttest designs are better at ensuring the groups are equivalent (on the pretest) before exposure to the iv, but sometimes posttest-only designs are more appropriate (ie: in the baby study).

pretest/posttest design

An experiment using an independent-groups design in which participants are tested on the key dependent variable twice: once before and once after exposure to the independent variable. Also called equivalent-groups pretest/posttest design. In a meditation study in which gpa is the dependent variable, a pretest/posttest design would test each iv group's (the meditators' and non-meditators') gpa before they underwent the treatment conditions of meditation vs no meditation, then test their gpa after they underwent those conditions. The initial gpa test (dv measure) would be the pretest, taken before exposure to the independent variable condition (meditation or no meditation); the posttest would be the gpa dv measurement taken after exposure to the conditions.
When is a pretest/posttest design used? Researchers may use a pretest/posttest design if... 1. they want to measure improvement over time, or 2. they want to be extra sure that the two groups are equivalent in traits before exposing them to the iv. In other words, they may use a pretest/posttest design when they want to be sure random assignment made the groups equal. In this case, the pretest helps rule out a selection effect (systematic variation between groups) in the study (ie: if the meditating and non-meditating groups had very similar pretest gpas, this would confirm the groups started out equivalent and there were no selection effects making the kinds of participants in each group systematically differ).
When not to use a pretest/posttest design: pretest/posttest designs are effective as long as the pretest does not make the participants change their subsequent behavior on the posttest, as it might in the baby persistence study. Pretest/posttest designs have the advantage of ensuring the groups are equal prior to being exposed to the iv condition and of showing the degree of change within groups, but in some situations a pretest/posttest design is problematic and will compromise the study. Imagine that in the persistence baby study, researchers had pretested babies to see how persistent they were in solving the toy puzzle before exposing them to the iv conditions of giving-up parent vs persistent parent. If they had, the babies may have been too frustrated to continue dealing with the toy at all after being exposed to the iv - the pretest may have burned out their efforts, and they wouldn't have responded the same way to the iv.

independent groups design

An experimental design in which different groups of participants are exposed to different levels of the independent variable, such that each participant experiences only one level of the independent variable. Independent groups designs are also called between-subjects design or between-groups design. There are at least 2 distinct groups in independent groups designs, and the separate groups are placed into different levels of the independent variable. Independent groups designs include... 1. pretest/posttest designs (in which participants in both groups are measured on the dv twice, once before exposure to the iv and once after), and 2. posttest only designs (in which participants in both groups are measured on the dv only once, after exposure to the iv)

within groups design

An experimental design in which there is only one participant pool/group, and each participant is presented with all levels of the independent variable. Also called within-subjects design. There is only one 'group' or participant pool in a within-groups design; the participants are presented with all levels of a given variable and then compared to themselves across the different levels of the variable. Each participant is thus their own control/comparison. My meditation study would've been a within-groups design if there was only one group presented with both levels of the variable: one participant pool that meditated for a certain time, then didn't meditate for a certain time, with the results compared between the same participants on these diff levels of the variable. The notetaking study could have been within groups if the researchers had asked each participant to take notes on a laptop, then also take notes by hand, then compared these treatment conditions within the group.
Two types of within-groups designs are... 1. repeated-measures designs (all participants in a study are exposed to all levels of the independent variable, but at diff times - iv level 1 at time 1 and iv level 2 at time 2 - and are measured on the dv twice, once after exposure to each level), and 2. concurrent-measures designs (all participants in a study are exposed to all levels of the independent variable at the same time, and are measured once on the dv).
Matched-groups designs as within-groups designs? Matched-groups designs can also be treated like within-groups designs. In matched-groups designs, participants are matched in sets/pairs between groups on some key control variable (such as gpa). The matched participants are assumed to be more similar to each other than in traditional independent-groups designs using simple random assignment, so in a certain sense the comparison resembles a within-groups one; but more generally they are considered between-groups designs, as they contain two distinct groups with different participants, albeit groups that are closely matched, with selection effects largely prevented via matching.

floor effects

An experimental design problem in which independent variable groups score almost the same on a dependent variable, such that all scores fall at the low end of their possible distribution. DV floor effects can result from a dependent variable test being too hard, so that both the manipulated group and the control group score low on it - their scores reflecting the flawed difficulty of the dv test rather than any actual effect of the iv. This can lead to a false null effect. IV floor effects can result from independent variables being weakly manipulated, so that participants at each iv level all score low on the dv because the iv manipulation is too weak to affect them or to show differences on the dv between levels. IE: Consider the money study in which researchers studied whether giving people money had an effect on mood and gave one group 1 dollar and the other group 2 dollars. These levels were too close together (too weak a manipulation) to have different effects on people's moods, so both groups' dv measures of mood would be clustered on the low side of the possible distribution - the difference between groups/levels would not be big enough to produce changes in the dv, aka the manipulation would be too weak. Thus, even if amount of money does have an effect on mood, it wouldn't show up on the dv test in this study because the manipulation of the iv was ineffective/weak - weak enough that we saw no effect on the dv and got a false null effect.

ceiling effects

An experimental design problem in which the separate independent variable groups score almost the same on a dependent variable, such that all scores fall at the high end of their possible distribution.
DV ceiling effects can result from a dependent variable test that is too easy, so that both the manipulated group and the control group score high on it - their scores reflecting the flawed easiness of the dv test rather than any actual effect of the iv. This can lead to a false null effect.
IV ceiling effects can also result from weak manipulations of the iv - consider the anxiety voltage shock study in which participants in group 1 were told they'd get a 10-volt shock, group 2 was told they'd get 50 volts, and group 3 was told they'd get 100 volts. The differences between the levels of this variable were too weak to create meaningful differences in the dv of participants' anxiety; all the conditions caused anxiety, so all dv anxiety scores clustered toward the high end - not because voltage difference doesn't affect anxiety, but because the manipulation of voltage difference was too weak.
Ceiling effect example on the dependent variable: Frances investigated the effect of concreteness on memory. She created a list of 12 items that are very concrete (e.g., pencil and table) and a list of 12 items that are very abstract (e.g., justice and freedom). Each item was viewed for 1 second, then participants recalled them in order. Ten participants were randomly assigned to each list of items. The study showed null effects, and almost all the participants remembered all the words on both lists - this is a ceiling effect because all participants' scores clustered toward the high end regardless of which iv group they were in. This was a ceiling effect on the dependent variable, as the word-recall test was too easy.
Ceiling effect example on the independent variable: consider my meditation study - if I gave the meditating group only 1 minute of meditation once a week and the control group no meditation, this iv manipulation would be too weak to produce differences in the groups' scores on the dv of GPA. The difference between the levels of the iv (meditation vs no meditation) wouldn't be large enough to let us detect any real effect of meditation on GPA, even if one existed, which would lead to a false null effect. It would be a ceiling effect if everyone scored high on the posttest GPA measurement and there were no differences between the treatment and comparison groups because the levels of the iv weren't different enough, aka didn't produce enough between-groups variability.

matched groups design

An experimental design technique in which participants who are similar on some measured variable are grouped into sets; the members of each matched set are then randomly assigned to different experimental conditions. Also called matching. Matching is used when researchers want to be sure the experimental groups are as equal as possible on a trait that matters. Matched groups are an independent-groups design, as there are still 2 separate groups. In matched groups designs, the measured dv scores of each participant are often compared directly against those of their matched counterpart in the other group. To create matched groups from a sample of 30, the researchers would first measure participants on a particular variable that might matter to the dependent variable. For example, student achievement, operationalized by a measurement of GPA, might matter in the notetaking study. The researchers would match up participants according to GPA similarity - the two with the highest GPA, the two with the next highest, and so on. Within each matched set, they would randomly assign one participant to the control group and the other to the experimental group, continuing until every participant was matched by GPA and assigned. The scores on the notetaking method's effectiveness at knowledge retention would then be compared directly between the matched sets. Matched random assignment is more deliberate than simple random assignment: it equalizes the groups on the matched variable, minimizing systematic differences between them and thus greatly reducing potential selection effects. However, this method takes more time than simple random assignment.

computing a 95% CI

Computing the 95% CI for a set of data requires 3 components: a variability component, a sample size component, and a constant (which we have no control over).
1. A variability component (based on the standard deviation of scores). As error variability (noise; aka unsystematic variance within groups) decreases, the 95% CI becomes narrower, aka the estimate becomes more precise. Error variability can be reduced by using precise measurements, reducing situation noise, and studying only one type of person or animal (ie: only high-income families or only low-income families).
2. A sample size component (sample size goes in the denominator of the CI formula). As sample size increases, the CI becomes narrower, aka more precise, so researchers can increase the precision of the estimate by using a larger sample.
3. A constant (in a 95% CI, the constant is at least 1.96; we have no real control over the constant when we estimate a 95% CI).
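A minimal Python sketch of these three components, assuming the common large-sample form of a 95% CI (mean plus or minus 1.96 times SD divided by the square root of n); the scores below are made up for illustration, not data from any study discussed here:

import math

def ci_95(scores):
    # mean +/- constant * (variability component / sample size component)
    n = len(scores)                                                    # sample size component
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))     # variability component (SD)
    margin = 1.96 * (sd / math.sqrt(n))                                # 1.96 is the constant for a 95% CI
    return (mean - margin, mean + margin)

print(ci_95([3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3]))   # smaller SD or larger n -> narrower, more precise CI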

quiz questions

How many possible orders for full counterbalancing are there in a study with four conditions? - 24
In a business class experiment on the endowment effect, Theo is comparing the value of a coffee mug to someone who owns it and is selling it vs. someone who is buying it. The endowment effect describes the tendency of sellers to value something they own more than buyers do. Participants are randomly assigned to be buyers or sellers of a mug with their first name on it. Buyers select the maximum price they would pay for the mug; sellers select the minimum price they would accept for it. Theo controls for selection effects in which of the following ways? - by using random assignment of participants (Random assignment avoids selection effects - in which participants in one group/condition/iv level of a study vary systematically from participants in the other - because each participant has an equal chance of being assigned to either condition, so extraneous (unsystematic) differences between participants are assumed to be evenly divided between the two groups.)
Selection effects (a threat to internal validity in which participants in one group or at one level of the iv vary systematically from those at another level) are controlled for via... random assignment (which reduces selection effects and thus enhances internal validity by ensuring participants in each group do not vary systematically, since they were randomly assigned).
In psychology lab, Tetiana is conducting an experiment on depth perception using the Howard-Dolman box. Inside the box are two vertical rods and a horizontal ruler. The participant manipulates the rods until they appear to be aligned at the same distance, then the experimenter measures how far out of alignment they are. There are three conditions: left eye only, right eye only, and both eyes. Tetiana is using a repeated-measures design. The independent variable in Tetiana's design is being manipulated in which of the following ways? - within groups (Tetiana is having each participant complete each of the conditions, so it is a within-groups study.)
Leigh is interested in how caloric intake affects performance. She conducts a study in which participants drink a cup of water before completing a task, then eat a small meal before completing the task again. Based on her study design, which of the following should she be concerned about? - practice effects (Practice effects are one of the possible problems in within-groups designs because participants are measured on the dv more than once. Task performance might be better after the meal simply because participants were completing the task for the second time.)
Participants in a research study are given a list of words to study for 3 minutes and then, after a delay, are asked to recall the list. The length of the delay is manipulated between participants to be either 2 minutes, 5 minutes, or 10 minutes. Which of the following scenarios would present a design confound in this experiment? - All participants in the 2-minute condition are tested at 8:00 a.m., those in the 5-minute condition at noon, and those in the 10-minute condition at 4:00 p.m. (The time of day of the tests varies systematically with the independent variable, so it is a confounding variable. Time of day could be an alternative explanation for the differences in word recall.)
Participants in a research study are given a list of words to study for 3 minutes and then, after a delay, are asked to recall the list. The length of the delay is manipulated between participants to be either 2 minutes, 5 minutes, or 10 minutes. Because different groups need different amounts of time, the first 25 participants who arrive are assigned to the 10-minute group, the next 25 to the 5-minute group, and the final 25 to the 2-minute group. Which confound does this create? - a selection effect (Participants who arrive first may be systematically different from later arrivals, so differences in group performance may result from this rather than from the independent variable.)
Which is the name for a variable the experimenter holds constant on purpose? - control variable (Control variables are variables other than the iv that could affect the dv, also called nuisance variables; they are held constant intentionally so as not to create a confound/alternative explanation.)
Which is the name for the level of the independent variable intended to represent a neutral condition? - control group
Which of the following is a potential disadvantage of within-groups designs? - demand characteristics (Exposure to all levels of a variable lets participants guess the study's hypothesis or goals, aka what the researchers expect, and change their behavior accordingly. In a within-groups design, seeing one level of a variable can change participants' responses to another level based on what they suspect the study is about / what they think the researchers expect to see.) Other disadvantages of within-groups designs include potential order effects, and they may not be possible or practical (think of the study teaching kids to ride bikes with method A vs method B).
Which of the following validities is correctly matched with the technique to address concerns regarding that validity? - internal validity and random assignment (Random assignment pertains to internal validity, whereas random sampling pertains to external validity. Random assignment supports internal validity by removing selection-effect confounds, aka making sure the groups are balanced/equal in traits so that potential systematic differences between them won't pose an alternative explanation for changes in the dv between groups.)
Which of the following is an extra dependent variable that can be used to help researchers quantify how well an experimental manipulation worked? - a manipulation check

attrition threat

In a pretest/posttest, repeated-measures, or quasi-experimental study, a threat to internal validity that occurs when a systematic type of participant drops out of the study before it ends, aka after the pretest but before the posttest. Attrition is basically a participant drop-out effect, also referred to as mortality. Attrition (participants dropping out between pretest and posttest) becomes a problem for internal validity when it is systematic; that is, when only a certain kind of participant drops out. The most extreme participants dropping out causes the most trouble, since they change the mean the most from pretest to posttest; participants with more average scores dropping out doesn't change the group mean as much. (ie: only unruly campers drop out or only well-behaved campers drop out; only the most depressed women drop out or only the least depressed women drop out. If a random camper or a random woman drops out, it may not be a problem because it wouldn't change the average group mean from pretest to posttest, but if participants with extreme scores drop out between the pretest and posttest, the group mean changes from pretest to posttest because an extreme score is gone.) Attrition threats lead pretest measurements to differ from posttest measurements not because the iv treatment changed the dv, but because extreme scores included in the pretest were absent from the posttest, changing the group's average posttest results. Why did the average level of unruly behavior in the campers decrease over the course of the one-week low-sugar diet? Perhaps because of the diet, or maybe because the most unruly camper had to leave camp early. Similarly, depression levels among the women may have decreased from pretest to posttest because three of the most depressed women dropped out of the study (perhaps the treatment was too intense for them to follow through with), changing the mean of the scores. In attrition threats, posttest averages are lower or higher only because these extreme participant scores are no longer included.
Preventing attrition threats: attrition threats are fairly easy for researchers to identify and correct. When participants drop out of a study after the pretest but before the posttest, researchers can remove those participants' scores from the pretest. If those who dropped out didn't have extreme scores, their dropout may not affect the group mean from pretest to posttest, may not threaten internal validity, and may not even need to be removed.

testing threat

In a repeated-measures experiment or quasi-experiment, a kind of order effect in which scores change over time just because participants have taken the test more than once, not because of the iv manipulation; includes practice effects / fatigue effects. Testing threats are the result of people taking the dv measure more than once. They can become more practiced at taking the test, leading to improved scores, or they may become fatigued or bored, which can lead to worse scores over time. Testing threats include practice effects.
Preventing testing threats: to avoid testing threats, researchers might abandon a pretest altogether and use a posttest-only design. If they do use a pretest, they may use alternative forms of the test for the two measurements (pretest and posttest) so participants can't become practiced at the dv test. The two forms might both measure depression, for example, but use different items to do so. A comparison group also helps (which is absent in a weak one-group pretest/posttest design). If the comparison group takes the same pretest and posttest as the treatment group, but the treatment group shows a larger change from pretest to posttest, then testing threats can be ruled out. If there is no comparison group, it's hard to know whether the improvement (or decline) from pretest to posttest is due to testing threats (practice effects) or due to the iv treatment. If a comparison group is used and both groups change (improve or get worse) from pretest to posttest, but the treatment group's scores change more, then it seems that both a practice effect and a true effect of the treatment are causing the change.

counterbalancing

In a repeated-measures experiment, presenting the levels of the independent variable to participants in different sequences to control for order effects. A method of controlling for order effects in a repeated-measures design by either including all orders of treatment or randomly determining the order for each subject. When researchers use counterbalancing, they present the levels of the independent variable to participants in different sequences, so any order effects should cancel each other out when all the data are combined. When researchers counterbalance conditions (or levels) in a within-groups design, they split their participants into groups, and each group receives one of the condition sequences; ultimately, everyone is exposed to every level of the independent variable. How do researchers decide which participants receive the first order of presentation and which receive the second? Through random assignment! They might recruit 50 participants into a within-groups study, then randomly assign 25 of them to receive independent variable level 1 first and level 2 second, and the other 25 to receive level 2 first and level 1 second. There are two forms of counterbalancing: full counterbalancing (all condition / iv level sequences are represented) and partial counterbalancing (some, but not all, condition / iv level sequences are represented).
examples:
- In the orange juice vs teeth brushing taste-preference example, the conditions of brushing teeth and tasting orange juice would be presented to participants in different orders so that carryover (contamination) order effects cancel out when the data are combined.
- In the chocolate tasting study, counterbalancing was incorporated by having half of the participants taste their first chocolate in the shared tasting condition and their second chocolate in the unshared condition, while the other half tasted their first chocolate in the unshared condition and their second chocolate in the shared condition. Therefore, the potential order effect of "first taste of chocolate" was present for half of the people in each condition. When the data were combined from these two sequences, any order effect dropped out of the comparison between the shared and unshared conditions.
Note that while participants are split into groups, it is still a within-groups design in the sense that each participant is exposed to both levels of the independent variable, just in a different order, so as to avoid order effects of either practice or carryover (the chocolate tasting example would be a carryover order effect, as exposure to the shared tasting level could 'contaminate' the unshared tasting level with the alternative explanation / confound that the first bite of chocolate is always better).
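A small Python sketch of this logic, using hypothetical participant labels and the two chocolate-tasting conditions named above; it lists every possible sequence (full counterbalancing) and randomly assigns participants to sequences:

import itertools
import random

conditions = ["shared tasting", "unshared tasting"]

# Full counterbalancing: every possible order of the IV levels is represented
all_orders = list(itertools.permutations(conditions))
print(len(all_orders))          # 2 conditions -> 2 orders (4 conditions would give 24)

# Randomly assign each (hypothetical) participant to one of the sequences
participants = ["P" + str(i) for i in range(1, 11)]
random.shuffle(participants)
assignment = {}
for i, p in enumerate(participants):
    assignment[p] = all_orders[i % len(all_orders)]   # alternate sequences across the shuffled participants
print(assignment)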

order effects

In a within-groups repeated-measures design (doesn't apply to concurrent measures), a threat to internal validity in which exposure to one condition (iv level 1) changes participant responses to a later condition (iv level 2). An order effect is a type of confound, or alternative explanation, for changes in the dependent/outcome variable that are not due to manipulation of the independent variable. In other words, participants' behavior at later levels of the independent variable might be caused not by the experimental manipulation but rather by the sequence in which the conditions were experienced. The two types of order effects are practice effects (when participants become practiced at or fatigued by the dv measurement) and carryover effects (exposure to one level of a variable 'contaminates' participants' experience of the other level). An order effect could have occurred in the chocolate tasting study if people rated the first chocolate they tasted higher because the first bite of chocolate is always the best, not because of the intended iv of shared vs unshared chocolate tasting. In other words, tasting one chocolate after another would be an order effect (a carryover effect) and a threat to internal validity because the order of tasting chocolate is confounded with the iv condition (shared versus unshared experiences). The solution to order effects is counterbalancing (changing the sequence in which the levels of the variable are presented to participants, often by splitting them into 2 groups; both groups still get exposed to all levels of the variable, just in a different sequence - there's still only one participant pool seeing all the iv levels, making it a within-groups experiment).

unsystematic variability

In an experiment, a description of when the levels of a variable fluctuate independently of experimental group membership, contributing to natural variability within groups that is not a confound / threat to internal validity. This occurs when levels of a second variable vary unsystematically, in an unpredictable way, with the independent variable (which does not create an internal validity threat). In the baby study, if some parents happened to be cheerful and others happened to be reserved, but these levels did not vary systematically with the iv groups of perseverance vs giving up, it would be unsystematic variation and would not affect internal validity. In other words, if some parents in the perseverance group were cheerful and some were reserved, and the same went for the giving-up group, this would be an unsystematic variation between groups that did not systematically coincide with the independent variable levels - thus not causing a confound / potential alternative explanation. Unsystematic variability is also called haphazard or random variability. In the notetaking study, suppose some students were interested in the video lectures they were taking notes on and others were not. If those in the longhand notes group were all very interested in the lecture and those in the laptop group were all uninterested, this would be a second variable (student interest) varying systematically with the intended independent variable (notetaking method). But if this variation in student interest was unsystematic/random and did not vary systematically with the notetaking method variable - as in, some students in the longhand group and some students in the laptop group were uninterested while others were interested - it would be unsystematic variation and would not threaten internal validity.

control variable

In an experiment, a variable that a researcher holds constant on purpose; also called a nuisance variable. When researchers manipulate an independent variable, they need to make sure they are varying only one thing / one variable at a time - the potential causal force or proposed 'active ingredient'. In the notetaking experiment, they'd need to be sure the only variable varying was the notetaking method, and they'd have to rule out other potential alternative explanations / confounds. Thus, besides the independent variable, researchers also control potential third variables (nuisance variables) in their studies by holding all other factors constant across the levels of the independent variable. In the notetaking study, researchers manipulated the notetaking variable but held several other potential variables constant (making them control variables), including having people in both iv groups watch the same videos and answer the same questions about them in the same setting, etc.

manipulation checks

In an experiment, an extra dependent variable researchers can include to determine how well a manipulation worked. Manipulation checks ensure construct validity of independent (manipulated) variables. A manipulation check is an extra dependent variable that researchers can insert into an experiment to convince themselves that their experimental manipulation worked. Manipulation checks are often used when the intention is to make participants think or feel certain ways. For example, researchers may want to manipulate feelings of anxiety by telling some students they're going to have to give a public speech, or they may wish to manipulate empathy by showing poignant images of people crying. A manipulation check may be used in these cases, aka inserting an extra dependent variable in the study to check whether the manipulation worked (IE: an extra dv would be asking the people who were told they'd have to give a speech how anxious they felt, or asking the people who were shown poignant images whether they felt sad on a self-report scale).
Here's an example. Researchers were interested in investigating whether humor would improve students' memory of a college lecture (Kaplan & Pascoe, 1977). Students were randomly assigned to listen to a serious lecture or one punctuated by humorous examples, and the key dependent variable was their memory of the material. In addition, to ensure students actually found the humorous lecture funnier than the serious one, researchers had the students rate the lecture on how "funny" and "light" it was. As expected, students in the humorous lecture condition rated the speaker as funnier and lighter than students in the serious lecture condition. The researchers concluded that the iv manipulation of lecture humor worked as expected. So the second dependent variable the researchers snuck in was the self-reported perceived humor of the lecture, inserted in addition to the main dependent variable of memory retention of the lecture. This second dv allowed researchers to confirm that their manipulation of the lecture's humor level worked.

advantages of within groups designs

In sum, the advantages of within-groups designs are:
1. They ensure the participants in the conditions will be equivalent, as they are the same participants (so selection effects are eliminated).
2. They allow researchers to make more precise estimates of the actual effect of the independent variable conditions, as opposed to confound effects from extraneous differences among groups, because extraneous differences (unsystematic variation) between participants are ruled out given that the participants are the same across conditions.
3. They require fewer participants.
1. The main advantage of within-groups designs is that they ensure the participants in the two conditions will be equivalent, as they are literally the same participants, so there will be no systematic differences between them, and since they aren't split up into groups, there will be no selection effects. Thus, because there's only one participant pool without any systematic differences between these same participants, the only differences between the two conditions can be attributed to the independent variable. This is often referred to as using participants as their own controls; in within-groups designs it's often said that "each participant is his or her own control."
2. Besides providing the ability to use each participant as his or her own control, within-groups designs also allow researchers to make more precise estimates of the differences between conditions, aka more precise estimates of the actual effect of the independent variable in the absence of extraneous variation between groups. In independent-groups designs, extraneous differences (unsystematic variability) in personality, preferences, gender, ability, and so on create small differences in the traits of each group, even when matched grouping is used. Within groups, these differences are held constant because the participants are the same, so there are no varying extraneous differences between 'groups' (levels of the treatment condition). When within-groups designs naturally hold these extraneous differences constant, researchers can estimate the effect of the independent variable more precisely. "When extraneous differences (unsystematic variability) in personality, food preferences, gender, ability, and so on are held constant across all conditions, researchers can estimate the effect of the independent variable manipulation more precisely - there is less extraneous error in the measurement."
3. Another advantage of within-groups designs is that they generally require fewer participants overall. Suppose I wanted groups for my meditation study with 50 meditators and 50 non-meditators. If I used an independent-groups design, I'd need 100 participants, with 50 in each group. But if I used a within-groups design, I'd only need 50 participants, and would expose each participant to both levels of the meditation variable.

individual differences

Individual differences among participants within a group can be another source of within-groups variability that obscures actual effects of the iv on the dv and can lead to a false null. IE: in the experiment on money and mood, participants' normal moods must have naturally varied - an inevitable individual difference that will affect each participant's response to the money (if their mood is typically low, it may still be low after the money; if their mood is typically high, it may still be high even after getting less money). Individual differences spread out the scores of participants within a group.
Controlling for individual differences (1. change the design from independent groups to within groups, or 2. add more participants):
1. Change the design: one way to accommodate individual differences causing within-group variation is to use a within-groups design, or a matched design, instead of an independent-groups design. A within-groups design, in which all participants are compared with themselves, controls for irrelevant individual differences - individual differences within one group are better than individual differences between two groups. Individual differences between two matched groups are better accounted for via the matching than if the groups were not matched.
2. Add more participants: if a within-groups or matched-groups design is inappropriate (ie: because of order effects, demand characteristics, or because they are impractical), another way to keep individual differences from causing too much variability within experimental groups is to measure more people. When a great deal of variability within groups exists because of individual participant differences, the more people you measure, the less impact any single person has on the group's average. Adding more participants to a study reduces the influence of individual differences within the groups, thereby enhancing the study's ability to detect true differences between groups due to the iv and avoid concluding a false null effect.

noise (error variance / unsystematic variance within groups)

Noise refers to unsystematic variability among the members of a group in an experiment, which might be caused by 1. situation noise, 2. individual differences, or 3. measurement error. Also called error variance or unsystematic variance. Another reason a study might find a null effect is that there is too much unsystematic variability within each group in an experiment - aka too much 'noise' within the group. So noise refers to unsystematic within-groups variability, aka 'bad' or error variability, aka error variance. Too much unsystematic variability (noise) within each experimental group can obscure group differences caused by the iv, making those iv effects undetectable and leading to a null effect. When there is more variability (noise) within experimental groups, it obscures the differences between the groups because more overlap exists between the members of the two groups. This is a statistical validity concern: the greater the overlap between groups, the less precisely the two group means are estimated and the smaller the standardized effect size. More variability within groups = wider CI and smaller effect size, as there's more room for error via the variation within each group. Less variability within groups = narrower CI and larger effect size, as there's less room for error via the minimal variation within each group. (IE: If the two bowls of salsa contained nothing but tomatoes (limited variation within the two groups, aka the two salsa bowls), the difference between two and four shakes of hot sauce would be more easily detectable because there would be fewer competing, "noisy" flavors within the bowls (fewer variations within the bowls/groups).)

repeated measures design

One of the two types of within-groups designs (the other is concurrent measures), in which participants respond to a dependent variable more than once, after exposure to each level of the independent variable. In a repeated measures design, participants are measured on a dependent variable (ie: GPA) more than once, after exposure to each level of the independent variable (ie: meditation level 1 vs no meditation level 2). They are exposed to both levels of the iv at different times: iv level 1 at time 1, measured on the dv, then iv level 2 at time 2, and measured again on the dv. For example, consider a study in which researchers looked at whether sharing a good experience with another person would make it better than having it alone, and whether sharing a bad experience would make it even worse than having it alone. The independent variable was tasting chocolate: level 1 was tasting it alone, level 2 was tasting it at the same time as another participant. The researchers found that participants rated the chocolate better when tasting it along with another person. The independent variable had two levels: sharing and not sharing an experience. Instead of having 2 groups as they would in an independent-groups design (one that tasted alone and one that tasted along with others), participants experienced both levels of the variable at different times. It was a within-groups design because there weren't two groups (all participants got all levels of the iv, chocolate tasting), and it was a repeated measures design because each participant rated the chocolate (dv) twice - aka repeatedly, given different levels of the independent variable presented one after the other. In other words, they were measured on the dependent variable (chocolate enjoyment) twice, once after exposure to the first level (tasting alone) and once after exposure to the second level (tasting together). Repeated measures designs are subject to a particular design confound known as order effects, such as carryover effects, in which the order in which the iv levels are presented can affect participants' responses to them (perhaps there was a carryover effect in that participants rated the first chocolate better because the first taste is always better, aka the first level of the iv (first chocolate taste) contaminated the second level (second chocolate taste)). Order effects (carryover effects in particular) can be prevented by counterbalancing, either full or partial (presenting the iv levels in different sequences for different participants, like half get level 1 first and half get level 2 first - but it's still a within-groups repeated measures design because everyone is exposed to all iv levels at different times, is measured after each, and it's all one participant pool).

ceiling and floor effects on the dependent variable

Poorly designed dependent variables can lead to ceiling and floor effects. For dependent variable ceiling and floor effects, ask: was the dependent variable measure sensitive enough (and appropriate) to detect the difference? Imagine if, in the anxiety and reasoning study, the logical reasoning dv test was so difficult that nobody could solve the problems - neither the intended low-anxiety group nor the high-anxiety group. That would cause a floor effect: the 2 anxiety groups would score the same, and the iv manipulation would seem to have no effect not because it truly had none but because the dv measurement was flawed and produced uniformly low scores - aka the reasoning test was too hard, so everyone got a low reasoning-ability dv score. Another example: suppose the dv reading test used in the online reading games study asked the children to point to the first letter of their own name. This dv test would be too easy - both groups (those who played online reading games and those who didn't) would pass the test. Their scores would sit at the high end of the distribution and we'd see a ceiling effect, not because the iv manipulation of online games vs no games had no effect, but because the dv measurement was flawed and didn't accurately measure the effect of the iv. So we'd see a null effect (no difference between groups, with both scoring high) when there really may have been a difference; the dv test was just flawed and unable to detect it. Another example: The students in Professor Zhao's 50-student Introductory Psychology class were randomly assigned to one of two review sessions, each taught with a different technique. The next day, all 50 students got all 10 test questions correct. This is a ceiling effect on the dv because the test was too easy. The iv levels may have had differing effects, but the dv test was designed to be too easy and didn't accurately measure these effects - so we would see no differences between groups and detect a false null effect.

power and precision

Power and precision: the opposite of obscuring.
Power: an aspect of statistical validity, the likelihood that a study will return an accurate result when the independent variable really does have an effect; aka the extent to which a study can detect an effect of the iv when there really is one. It's the likelihood of avoiding a Type 2 error (missed a clue, falsely accepted a null).
Power can be improved by:
- using a within-groups design (or matched design) to control for individual differences causing within-groups variability
- employing a strong manipulation
- taking accurate and sensitive dv measurements
- using a larger number of participants, thus having a smaller CI and a more precise estimate
- minimizing situation noise
The easiest way to increase precision and power is to add more participants.
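A rough simulation sketch of that last point, assuming made-up normally distributed scores and a true effect of about half a standard deviation; it estimates power as the proportion of simulated experiments that detect the effect at p < .05 and shows that a larger sample detects it more often:

import numpy as np
from scipy import stats

def estimate_power(n_per_group, true_effect=0.5, sims=2000, alpha=0.05):
    # proportion of simulated experiments that detect a real effect of size d = true_effect
    rng = np.random.default_rng(0)
    hits = 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_effect, 1.0, n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        hits += (p < alpha)
    return hits / sims

print(estimate_power(20))    # smaller sample -> lower power
print(estimate_power(100))   # larger sample -> higher power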

lecture notes ch. 11

Power: if our study does not have adequate power, we may get a false null effect, aka a Type 2 error (Type 2 missed a clue: we falsely accepted a null, said there was no effect when there actually was one, our study just didn't detect it due to either 1. too much within-groups variability or 2. too little between-groups variability). Power is the ability of our study to detect an effect of the treatment when there really is one - power is essentially a measure of how sensitive your test is: if there is a difference, will our study be able to detect it, with sensitive enough dv measurements and strong enough iv manipulations, as well as control for confounds?
Increasing power:
- Power can be increased by increasing between-groups variability (ie: making the different levels of the iv more different, aka using strong manipulations, such as having people meditate for an hour a day vs no meditation instead of one minute a day vs no meditation, and making sure our dv measurements can detect differences).
- Power can also be increased by reducing within-groups variability, aka reducing 'noise' (unsystematic variance) within the group, by using reliable measures, using homogeneous (similar) participants, and switching designs from independent groups to within groups so as to control for only one set of individual differences instead of individual differences within two separate groups. However, you don't want to make your groups too homogeneous / overcontrolled; this could render your results ungeneralizable. While studies do sacrifice external validity for internal validity, you don't want to overdo it such that the participants and study are so controlled that the results don't even apply to the external population.
- The easiest way to increase power is to add more participants. Larger sample sizes have the advantages that 1. random errors in measurement and individual differences between participants cancel each other out, and 2. the probability of a confound looking like it made a difference decreases.
Statistics:
- statistics (ie: F-statistic, t-statistic) are ratios of good variability (true variability due to the iv treatment) to bad variability (error variability)
- statistics = good variability / bad variability (good variability comes from between groups, bad variability comes from within groups)
- aka statistics = between-groups variability (good variability) / within-groups variability (bad variability)
- the bigger the statistic (the bigger F, bigger t, bigger z), the more likely you are to reject the null hypothesis, aka to find a statistically significant result or an effect of the iv on the dv that wasn't due to chance
- t = (M1 - M2) / SE, where M1 and M2 are the two group means and SE is the standard error. In other words, t = between-groups mean difference / within-groups variability
- the standard error is a measure of the average variability within the groups (standard error is a measure of 'bad' or error variability, aka within-groups variability!)
- when we have limited variability within the research groups, we can have more confidence that the variability we see between the groups is meaningful, or statistically significant, aka not due to chance
Causes of too much within-groups (bad) variability:
- individual differences (between participants; people respond differently to treatments)
- situation noise (external events occurring during the study that are distracting and obscure the true effect of the iv on the dv)
- measurement error (variations or errors in dv measurements between participants within the group, often due to a poorly designed dv)
Causes of too little between-groups (good) variability:
- insensitive (dv) measures
- weak (iv) manipulations
- ceiling and floor effects on the iv and dv
- reverse design confounds (when a design confound counteracts or 'reverses' the intended effect of the iv on the dv, ie: giving a lot of money to people to raise their mood vs giving a small amount of money - perhaps the experimenter who gave the large amount of money was a jerk and thus reversed any potential mood-enhancing effects of the money, or perhaps the experimenter who gave small amounts of money was super cheerful and thus counteracted the potential mood-dampening effects of getting less money by being so cheerful to the participants)
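A small Python sketch of the 'statistic = between-groups / within-groups variability' idea, using made-up scores for two hypothetical groups; it computes t by hand from the mean difference and the pooled standard error, then checks it against scipy's independent-samples t-test:

import numpy as np
from scipy import stats

group1 = np.array([4.2, 3.9, 4.5, 4.1, 4.4, 3.8])   # hypothetical treatment scores
group2 = np.array([3.5, 3.7, 3.4, 3.9, 3.6, 3.3])   # hypothetical comparison scores
n1, n2 = len(group1), len(group2)

between = group1.mean() - group2.mean()              # "good" variability: between-groups mean difference

# "bad" variability: standard error built from pooled within-groups variance
pooled_var = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_by_hand = between / se                             # t = between-groups / within-groups variability
t_scipy, p = stats.ttest_ind(group1, group2)         # should match t_by_hand
print(t_by_hand, t_scipy, p)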

treatment group

The participants in an experiment who are exposed to the level of the independent variable that involves a medication, therapy, or intervention (my treatment group was my meditating group). In a medication testing study, the treatment group would be those taking the medication while the control group would be those taking a sugar pill.

Situation noise

Unrelated events or distractions in the external environment that create unsystematic variability (noise) within groups in an experiment and can obscure iv effects, leading to a false null result. Besides measurement error and irrelevant individual differences, situation noise (external distractions in the experiment environment) is a third factor that can cause variability within groups (bad variability) and subsequently obscure true group differences. Situation noise: unrelated events or distractions in the external study environment that create unsystematic variability within groups in an experiment (aka create noise). IE: suppose the money and mood researchers had conducted the study in the student union - the distractions in the environment, such as music, lots of people, etc., would obscure the actual effect of the iv (money) on the dv (mood). The kinds of distractions would vary from moment to moment and from participant to participant, resulting in unsystematic variability within the experimental groups that obscured any true difference the iv made on the dv, resulting in a false null effect. Researchers can control for situation noise (external distractions) by conducting experiments in controlled settings that stay the same across participants. IE: the researchers in the online reading games study might control for situation noise by administering the dv reading test in a quiet, controlled room.

external validity of causal claims

We know that causal claim experiments prioritize internal validity over external validity / generalizability. BUT, if you've got too much limitation in your causal claim experiment in the form of too many control variables, the external validity may be too low for the results to even matter in the real world at all. Using too many control variables or holding too many things constant in a study can render the study non-representative of the actual population, and thus the results may not even matter.

expressing effect size (strength of the relationship between two variables): using original units and using standardised effect sizes

When we do experiments, we have 2 ways to express effect size:
1) Using original units (ie: People in the longhand condition earned an average of 4.29 points on the conceptual questions, compared with 3.77 in the laptop condition. Therefore, the effect size in original units is 0.52 points of improvement). Original units are good when you want to estimate the real-world impact of an intervention (ie: how much would taking laptop notes affect a course grade?).
2) Using standardized effect sizes (such as the correlation coefficient r or the standardized effect size d). Standardized effect sizes like d are good when you want to compare effect sizes that are based on different units from different studies. Standardized effect sizes allow you to compare results found in one study to a larger body of knowledge, while original-unit effect sizes help researchers estimate the real-world impact of interventions.

Single Blind vs. Double Blind study

A single-blind study is a study in which the subjects/participants do not know whether they are in the experimental or the control group - a placebo group is an example of this. Say the study is on the effects of Prozac on depression. In a single-blind (participant-blind) study, the participants wouldn't know whether they got the real pill or not, as one group would be getting a sugar pill; thus they'd be blind to which experimental group they were in. A double-blind study is an experiment in which neither the participant nor the researcher knows whether the participant has received the treatment or the placebo, aka an experiment in which both the researchers and the participants are blind to group assignments. In a double-blind study, both the participants and the researchers are blind to which experimental condition each group is receiving (ie: as in a double-blind placebo study).

statistical significance

a statistical statement (usually based on a test statistic such as t and its associated probability) of how likely it is that an obtained result occurred by chance. Statistical significance describes the likelihood that a dependent variable result is due to the independent variable manipulation rather than to chance; a statistically significant result is one unlikely to have occurred by chance alone, so we reject the null hypothesis. Significance is related to, but distinct from, effect size (r or d, which describe how strong the relationship is) and power (how likely the study was to detect a true effect in the first place) - a high-powered study is more likely to obtain a statistically significant result when the iv really does have an effect.

pilot studies

a study completed before (or sometimes after) the study of primary interest, usually to test the effectiveness of the iv manipulation. A pilot study is a simple study, using a separate group of participants, that is completed before (or sometimes after) the main study to determine the effectiveness of the iv manipulations. for example: In the humorous lecture study, a pilot study may have exposed a separate group of students to either a serious or a humorous lecture (iv level 1 serious lecture and iv level 2 humorous lecture), then measured how funny they found the lectures as the dv. This would allow them to determine the effectiveness of their manipulation of the iv (lecture humor level) in a smaller study done prior to the main study, which was intended to test whether lecture humor level affected knowledge retention.

factorial designs

a study in which there are two or more independent variables, or factors. The simplest factorial design is a 2x2 factorial design, in which there are 2 independent variables and 4 cells (4 possible combinations of the independent variable levels). You can determine the number of cells in a factorial design by multiplying the number of levels of each independent variable (2x2 = 4 cells, 4x5 = 20 cells, 2x2x2 = 8 cells). There are two main types of factorial designs: independent-groups factorial designs and within-groups factorial designs. There are also mixed factorial designs, in which one independent variable is manipulated as independent groups and the other is manipulated as within groups.

cells

cells refer to a condition in an experiment. In a simple one-iv experiment, a cell represents one level of the iv. In a factorial 2-iv experiment, a cell represents one of the possible combinations of the two independent variables. Cells are all the possible combinations of independent variable levels in a factorial experiment. In the cell phone driver study, there were 4 cells: 1. young drivers on cell phones, 2. young drivers not on cell phones, 3. old drivers on cell phones, 4. old drivers not on cell phones. The 4 different experimental conditions, which crossed the different levels of both ivs, allowed the researchers to test whether driving speed depended on an interaction between the two ivs. This particular 2-independent-variable, 4-cell design is called a 2x2 factorial design, or a two-by-two factorial design.
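A tiny Python sketch of the 'multiply the levels' rule and the cells it produces, using the cell phone driver study's two factors (the variable names are just illustrative labels):

import itertools
from math import prod

driver_age = ["young", "old"]             # IV 1: 2 levels
phone_use = ["on phone", "not on phone"]  # IV 2: 2 levels

cells = list(itertools.product(driver_age, phone_use))   # every combination of levels = the cells
print(len(cells), cells)                                 # 4 cells in a 2x2 design

# The cell-count rule generalizes: multiply the number of levels of each IV
print(prod([4, 5]))      # a 4x5 design has 20 cells
print(prod([2, 2, 2]))   # a 2x2x2 design has 8 cells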

effect size d

effect size 'd' is a standardized effect size that takes into account both...
1. the difference between the means of each condition (experimental group mean vs comparison group mean), and...
2. the spread of scores within each group (aka the standard deviation).
So effect size d takes into account both the difference between the means of the 2 groups and the standard deviation within those groups, aka the spread of scores within each group.
- When effect size d is large, it means the independent variable caused a large change in the dependent variable relative to how spread out the scores in the experimental and comparison groups are - aka the iv had a large effect.
- When effect size d is small, it means the scores of participants in the experimental group and comparison group overlap more, so the independent variable caused a minimal change in the dependent variable. Effect size d is larger when the scores in the two groups overlap less, and smaller when the scores overlap more.
- Overlap is a function of how far apart the group means are (how far the experimental group mean is from the comparison group mean) as well as how variable the scores are within each group.
In the laptop vs longhand notes study, the effect size was d = 0.38, describing the difference in mean conceptual test performance between the longhand group and the laptop group: the longhand group scored 0.38 of a standard deviation higher than the laptop group.
Benchmarks for effect size d: d = 0.2 is considered small, d = 0.5 is moderate, and d = 0.8 is large. These are rough benchmarks, not strict cutoffs. Compare to r effect size benchmarks, where r = 0.1 is small, r = 0.3 is moderate, and r = 0.5+ is large.
- According to these guidelines, d = 0.38 is considered small to moderate. But remember, small effect sizes can have a large real-world impact when accumulated over many people or situations.
- d effect sizes are often used when there are 2 groups in an experiment, aka an independent-groups design (pretest/posttest design or posttest-only design)
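A short Python sketch of Cohen's d computed from two groups' scores (mean difference divided by the pooled standard deviation). The scores below are made up for illustration, not data from the notetaking study:

import numpy as np

def cohens_d(group1, group2):
    # standardized effect size: mean difference / pooled standard deviation
    g1, g2 = np.asarray(group1, dtype=float), np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))
    return (g1.mean() - g2.mean()) / pooled_sd

longhand = [4.5, 4.1, 4.6, 4.0, 4.3, 4.2]   # hypothetical conceptual-question scores
laptop   = [3.9, 3.6, 4.0, 3.5, 3.8, 3.8]
print(cohens_d(longhand, laptop))            # positive d -> the longhand group scored higher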

interrogating null effects

false null effects can be the result of either...
1. too little difference/variation between the separate experimental groups
- this can be due to 1. weak manipulations, 2. insensitive dv measures, 3. ceiling and floor effects, and 4. reverse design confounds
- too little difference between groups can be prevented by using manipulation checks and pilot studies to ensure strong manipulations, and by designing proper dv measurements to avoid ceiling and floor effects
2. too much difference/variation within individual groups
- this can be due to 1. measurement error, 2. individual differences (between participants in each group), and 3. situation noise (distracting external factors in the study setting)
- too much variation within groups can be prevented by using more participants, or by changing the design from independent groups to within groups

6 internal validity threats special to one group pretest post-test designs ('really bad experiments')

history threats, maturation threats, regression threats, testing threats, attrition threats, instrumentation threats

weak manipulations

ineffective manipulation of the independent variable; causes not enough variation between groups and can lead to false null effects. IE: one week of reading games may not improve reading skill in an experimental group compared to a control group, but 3 months of online reading games would have been a stronger manipulation. Questions to ask: how did the researchers manipulate the iv? Was it strong enough? Was the difference between the iv levels given to the experimental and control groups enough to produce differences on the dv measurement? Do manipulation checks suggest the manipulation did what it was intended to do? In short, weak manipulations are changes in the independent variable that are not large enough to affect the dependent variable.

ceiling and floor effects on the independent variable

ineffective or weak manipulations of the iv can lead participants' scores on the dv to cluster either low or high - not because the differing iv levels actually have no effect on the dv for the experimental vs control group, but because the iv manipulation is too weak to produce an observable effect on the dv. So researchers wouldn't detect an effect of the differing iv levels even if there was one, and would get a false null effect. For ceiling and floor effects on the independent variable, ask: was the iv manipulation strong enough to create a difference between the groups? IE: researchers manipulated three levels of anxiety by threatening people with a 10-volt shock, a 50-volt shock, or a 100-volt shock. All these iv levels made people anxious, so their dv measurements of anxiety all clustered toward the high end of the dv anxiety variable (a ceiling effect) regardless of which iv condition they were exposed to - this was due to the levels of the iv not being different enough between the 3 conditions, all of them producing anxiety. Questions to ask: are there meaningful differences between the levels of the iv? Do manipulation checks suggest the manipulation did what it intended to do?

insensitive measurements

insufficiently sensitive measurement of the dependent variable; can cause researchers to miss an effect of the iv when there really was one (their dv measurement just wasn't sensitive or appropriate enough to detect it), leading to not enough between-groups variability and thus a false null effect. IE: researchers used a pass/fail measure when the improvement or difference between the iv treatment and control groups was only detectable with a finer-grained measurement scale. Questions to ask: how did the researchers measure the dependent variable, and was the measure sensitive enough to detect group differences?

controlling for observer bias and demand characteristics

observer bias and demand characteristics can best be controlled for with either 1. a double-blind study or 2. a masked design. A double-blind study is one in which neither the researchers nor the participants are aware of the experimental conditions, aka neither the participants nor the researchers know who is in the treatment group and who is in the comparison group. When full double-blind studies aren't possible, a masked design can be used. A masked design is also called a blind study, or a single-blind study: the researchers / those administering the treatment are unaware of the experimental conditions, aka unaware of who is in the treatment group and who is in the comparison group, but the participants know which group they're in (often by virtue of the study design). Keeping the observers blind to group membership controls for observer bias, because the observers won't know what to expect from each group; keeping the participants blind as well (as in a double-blind study) also controls for demand characteristics, because the participants won't know whether they're getting the treatment and so won't have expectations that influence their behavior.

3 internal validity threats that can affect any study, regardless of design

observer bias, demand characteristics, and placebo effects

if you only have 2 experimental groups to compare, the preferred statistical test is a...

t-test If you only have 2 groups to compare, the t-test is the preferred statistical test (if you have more than 2 groups, or a factorial design with more than one iv, you must use an ANOVA) two types of t-test: - independent-samples t-test: used for independent-groups designs; the difference between the two group means goes in the numerator and the standard error of that difference in the denominator - dependent-samples t-test: used in within-groups designs (also called a paired or 'one-group' t-test); compares the means of the same participants across the two levels of the iv
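A minimal sketch of the two t-test flavors using scipy.stats (the scores below are made-up numbers purely for illustration):

from scipy import stats

# Independent-groups design: two different sets of participants, one per iv level.
group_a = [5.1, 6.3, 5.8, 7.0, 6.1]
group_b = [4.2, 5.0, 4.8, 5.5, 4.9]
print(stats.ttest_ind(group_a, group_b))   # independent-samples t-test

# Within-groups design: the same participants measured at both iv levels,
# so the scores are paired and a dependent-samples (paired) t-test is used.
level_1 = [5.1, 6.3, 5.8, 7.0, 6.1]
level_2 = [4.9, 6.0, 5.9, 6.5, 5.8]
print(stats.ttest_rel(level_1, level_2))   # dependent-samples t-test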

measurement error

The degree to which the recorded measure for a participant on some variable differs from the true value of that variable for that participant. Measurement error is basically when a participant's measurement is inaccurate relative to their true value on the measured variable, either too high or too low. Measurement errors may be random, such that scores that are too high and scores that are too low cancel each other out, or systematic, such that most scores are biased too high or too low. Measurement error can cause high within-groups variability, which increases the overlap in scores between the two groups, obscures the true effect of the iv, and can lead to a null effect. One reason for high within-groups variability is measurement error: a human or instrument factor that can randomly inflate or deflate a person's true score on the dependent variable. An example of measurement error is a person who is 160 cm tall being measured at 161 cm because of the angle of vision of the person using the meter stick. All dependent measures include a certain amount of measurement error, but researchers try to minimize it. Measurement error in the dependent variable results in scores that are more spread out within a group (i.e., increased within-groups variability): dependent variable score = participant's true score +/- random error of measurement. In the online reading games study, measurement error could show up because some students had been exposed to the words on the reading test at home, some students were tired when they took the test while others were focused, etc. Their dv scores would then be inaccurately measured, i.e., the dv scores would not accurately reflect their true reading comprehension given the iv manipulation. When these distortions of measurement are random like this, they cancel each other out across a sample of people and will not affect the group average/mean, but an operationalization with a lot of measurement error will result in a set of scores that are more spread out around the group mean. The more sources of random error there are in a dependent variable's measurement, the more variability there will be within each group in an experiment; conversely, the more precisely and carefully a dependent variable is measured, the less variability there will be within each group. Ways to reduce measurement error (1. use precise measurement tools, or 2. measure more instances with a larger sample or repeated measurements): 1. Use reliable and precise tools (make sure measurement tools are appropriate and reliable; check inter-rater reliability, test-retest reliability, and internal reliability of the measures). 2. Measure more instances (use a larger sample, or take multiple measurements on the sample you have and assess the consistency between them to determine whether there is too much measurement error). The more participants or items there are, the better the chance of representing all the possible errors, and the more the random errors will cancel each other out, resulting in a better estimate of the true average within a group.
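A small simulation of the "dv score = true score +/- random error" idea (all numbers are assumed): random measurement error leaves the group mean roughly unchanged, because errors that are too high and too low cancel out, but it spreads the scores out within the group.

import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(100, 5, 200)          # participants' assumed true values

precise = true_scores + rng.normal(0, 1, 200)   # small random measurement error
noisy = true_scores + rng.normal(0, 15, 200)    # large random measurement error

# Both means stay close to 100; the spread (within-groups variability)
# is much larger for the noisy measurement.
print(round(precise.mean(), 1), round(precise.std(), 1))
print(round(noisy.mean(), 1), round(noisy.std(), 1))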

systematic variability

When the levels of some second variable coincide in a predictable and systematic way with the independent variable, creating a potential design confound, aka a second-variable explanation (a threat to internal validity). In the baby perseverance study, if the parental models' emotional disposition varied systematically with the independent variable, as in cheerful parents systematically went with the perseverance condition and reserved parents went with the giving-up condition, then emotional disposition would be a design confound: it would vary systematically with the independent variable and thus be a potential alternative explanation for the dependent (outcome) variable. In the notetaking study, consider that some students were interested in the video lectures they were taking notes on and others were not. If those in the handwritten notes group were all very interested in the lecture and those in the laptop group were all uninterested, then a second variable (student interest) would vary systematically with the intended independent variable (notetaking method). But if student interest varied unsystematically/randomly and did not track the notetaking method variable, as in some students in the handwritten notes group and some in the laptop group were uninterested while others were interested, it would be unsystematic variability and would not threaten internal validity.

