PSYC 270 Final
Ex post facto
"after the fact". The condition/treatment/event is not within the researcher's control, but we want to measure its impact on behavior, mental health, etc. Refers to conditions in an experiment that are not determined prior to the experiment, but only after some manipulation has occurred naturally.
2x2 factorial design
2 factors, each with 2 levels. All possible combinations of the levels of the IVs are considered: A1B1, A2B1, A1B2, A2B2.
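The full set of conditions is just the Cartesian product of the factor levels; a minimal Python sketch (the factor labels here are generic placeholders):

```python
from itertools import product

# Each factor has 2 levels; a 2x2 design crosses every level of A with every level of B.
factor_a = ["A1", "A2"]
factor_b = ["B1", "B2"]

# The four cells of the design
conditions = [a + b for a, b in product(factor_a, factor_b)]
print(conditions)  # ['A1B1', 'A1B2', 'A2B1', 'A2B2']
```

The same pattern generalizes: a 2x3 design would yield 6 cells, a 2x2x2 design 8 cells.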
Pseudo-science vs Science: evasion of risky tests
A good scientific experiment is set up in such a way that it is extremely unlikely to yield a positive result unless the hypothesis is indeed true. Science is concerned with truth. Pseudo-sciences have a lot to lose if they are wrong, so they tend to avoid tests that might yield unfavorable outcomes.
Comparing groups or conditions
A significant difference between groups or conditions allows the researcher to conclude that an effect exists in the population, because the null hypothesis can be rejected in a significance test. Sample mean differences can be caused by the effect, by sampling error (random group differences), or both; concluding there is an effect when the difference is really due to chance is a Type 1 error. A larger statistical value indicates a larger treatment effect because it suggests the sample means differ due to a true effect rather than chance: if the means are VERY DIFFERENT, it increases the chance that the effect you found is true. Individual variability could produce differences due to chance alone, and the smaller the group, the higher the chance of random differences.
One-shot case studies
An ex post facto study: observing behavior after some treatment or event has happened. The researcher has no control over the timing of the intervention but still wants to describe bx after something has happened. Also called a one-group posttest-only design or a single-group design. There is no control or comparison group. Why use it? It is better to describe the phenomenon while you have the chance if it is something that will not last long or won't happen again. Ex: take a group of 40 newly hired wait staff and give all of them a 2-hr training session in which they introduce themselves to customers by first name and check in on customers 8-10 minutes after serving the food (treatment). Participants begin employment and record the amount of tips for one month (posttest score). A better design would split them into 2 groups with random assignment, or start without training, then train, then measure tips.
Hidden assumptions
Avoid questions that assume a situation that may not apply to everyone. "Even with help from family" assumes someone has family.
Limitations to correlational research
Correlation does not equal causation. A third variable might account for the true underlying relationship: Z may cause X and Y. What other variables might account for the relationship between smoking and lung cancer? As head size increases, so does memory, but age might be the true variable. Low correlations may be the product of a truncated range and not because there is a lack of actual correlation. One variable may have a broad range (50 to 115) while age on the other scale has a very narrow range (18-20); with such low variability it is tougher to know if there is a correlation. If there is a narrow range on one variable, it is hard to find a correlation (like in our group project). Always important to look at descriptive stats to see the variability.
Double-barreled questions
Each question should only contain one concept. How many cups of coffee and tea do you drink per day? Trying to assess caffeine intake.
Pseudo-science
In contrast to science, pseudo-science will often lack the fundamental aspects of research. It can be hard to distinguish science from pseudo-science, but they differ in 10 fundamental ways.
Examples of interrupted-time series designs
Introducing the 20-cent fee for directory assistance calls in Cincinnati: calls dropped from 90,000 to 15,000. Divorce rates after a terrorist event. Drug use trends before and after introduction of a drug education program.
Threats to validity in quasi-experiments
Maturation- change in people over time because of growth and natural change. Most noticeable for kids (white matter development, social intelligence). History- a confound caused by outside events that take place between measurements and affect the participants.
Illusory correlations
Might be a correlation, but it's just a coincidence. There are things to do to rule it out. Ex: DID and childhood trauma- abuse, neglect, parent death, etc. Need a contingency table to prove the correlation is illusory.
Two types of quasi-experiments
One-shot case studies (the worst form of quasi-experimentation; used to form hypotheses). Interrupted-time series designs.
ABA Exercise Chart
Plotting minutes of exercise across 28 days. Get a baseline; some variability. Treatment- monetary loss contingency: you give the therapist an extra $500 up front, and if you don't meet your goal, the therapist takes $20 of it and keeps it (loss aversion). Then return to baseline without the intervention. Behavior dropped back to baseline.
Factorial designs simple effect
A simple effect is the effect of one IV at a single level of the other IV- comparing the cell means within one row or column of the design.
Predictive
The variable used to make the prediction is called the predictor variable; the variable being predicted is the outcome variable.
Science
There are several ways of knowing, but science is the best because it uses empiricism and rationalism. We push information through the scientific filter with hypotheses and theories. Science is self-correcting because of replicability and falsifiability. Good scientists have a healthy dose of skepticism and critical thinking skills.
Multi-group time series
Using nonequivalent control groups and comparing them to the treatment group on the line graph.
Validity: response scales
Want response scales to be as precise as possible; more precise scales yield more valid scores.
Replication
3 kinds of replication: 1. Direct replication- repeating the experiment as closely as possible to determine whether the same results will be obtained. 2. Systematic replication- repeating an experiment while varying numerous factors considered to be irrelevant to the phenomenon to see if it will survive these changes. 3. Conceptual replication- attempting to demonstrate an experimental phenomenon with an entirely new paradigm or set of experimental conditions (related to converging operations).
ABAB Design Example
4-year-old Bill's crying over mild frustration. 1. First, establish a baseline (no treatment yet) of the behavior- the number of tantrums. Researchers used a ten-day time period (first A phase). 2. Introduce the intervention: ignore the crying and reward more appropriate bx (first B phase). 3. Now all we have is AB. We see an effect but we can't say if it's truly from the intervention. 4. Next, the researchers return to baseline by treating Bill as they did before the treatment ever started (second A). Baseline bx comes back, but not as severe. 5. The behavior returned, so the researchers implemented the treatment again (second B).
Hope for correlational studies
A pattern of correlation can be combined to strengthen your assumptions Criteria must be met in order to strengthen the assumption of causality: 1. Consistency. Do results replicate across multiple tests and multiple studies to rule out 'flukes'? YES- studies consistently find a relationship between smoking and cancer. 2. Strength of relationship. Is the association clinically meaningful, ruling out trivialities? YES- studies show that smokers live significantly shorter lives than non-smokers. 3. Dose-response relationship. Are higher levels of the presumed cause paired with higher levels of the presumed effect? YES- studies consistently find that the more you smoke, the higher the chance of cancer and that if you stop smoking, you reduce the chance of cancer. 4. Plausibility- Can the relationship be meaningfully explained by sound theorizing? YES- there are known carcinogens in cigarettes. 5. Coherence. Is the explanation consistent with other known facts? YES- all the data lead to one explanation and there are not any other theories that better account for increased cancer rates in smokers.
Bias disrupting validity
A sample that does not accurately reflect the population of interest is a biased sample. Ways researchers may introduce bias: Nonresponse error- a sampling error that occurs when individuals chosen for the sample do not respond to the survey, biasing the sample. Can happen with things like timing, e.g., doing a survey during finals week. Bias also comes from sampling only the highly motivated. Ex: a political survey is not a survey everyone fills out; if you have strong opinions, you are more likely to fill it out, and if you don't care, you probably won't take it, so the people who do respond are those who are already strongly opinionated. Coverage error- a sampling error that occurs when the sample chosen to complete a survey does not provide a good representation of the population: a very small sample, a specific university, a specific age.
AB Design
A= baseline bx. B= bx after some therapy is introduced. Not a good design: it is an invalid way of testing the effectiveness of an intervention because of the problem of falsifiability. We don't know if it was the intervention (IV) that caused a change in bx (DV) or some other extraneous variable. The typical solution to this problem is to use a large-n design and randomly assign participants to a treatment condition or control condition. Ex: a point system in which a teen loses privileges. At the same time, however, she starts asking her parents if they are proud of her and they lavish her with praise. Also at that same time, she stops hanging out with two peers who were teasing her. These are the extraneous variables that we didn't know about.
Quasi-experiment
An experiment in which the independent variable occurs naturally and is not under direct control of the experimenter. Similar to ex post facto. The design of a quasi-experiment is similar to the small-n designs- like an ABA design except the experimenter is not applying the IV but is making an observation of its effects after the fact. Might be thought of as an observation-treatment-observation design. There is no true reversal phase in this design because the IV is not typically under the control of the experimenter and the treatment is likely to have a carryover effect. Examples: the 9/11 attacks; now COVID-19.
Comparing more than two means with ANOVA
Analysis of variance is used to test an effect when more than two groups or conditions are being considered. An F-statistic is calculated for each effect tested, based on the ratio of the average variance between the condition means to the average variance within the conditions. Variance is variability within scores (range, SD). The comparison of means for each level of an independent (or quasi-independent) variable is the main effect. If a main effect is significant in an ANOVA, it indicates that there is a mean difference somewhere among the means tested. The main effect does not tell you which means are different from one another. You must conduct additional tests, called post hoc tests, to determine which means are different; post hoc tests compare the groups a pair at a time, like individual t-tests, and give a table of all the comparisons. If the main effect is significant, then what's causing it? An interaction effect is also tested in an ANOVA for all combinations of independent variables. The interaction effect tests the effect of one independent variable at each level of the other variable. It is tested by comparing the difference in condition means across levels of one independent variable for each level of the other independent variables.
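The F-ratio described above (average variance between condition means over average variance within conditions) can be sketched in plain Python. The groups below are made-up numbers, and equal group sizes are assumed to keep the formula simple:

```python
from statistics import mean, variance

def f_statistic(groups):
    """One-way ANOVA F: between-conditions mean square divided by
    within-conditions mean square (equal group sizes assumed)."""
    k = len(groups)                     # number of conditions
    n = len(groups[0])                  # scores per condition
    grand = mean(x for g in groups for x in g)
    # Between-conditions mean square: n times the spread of the group means
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    # Within-conditions mean square: average of the group variances
    ms_within = sum(variance(g) for g in groups) / k
    return ms_between / ms_within

# Similar means -> small F; very different means -> large F
print(f_statistic([[1, 2, 3], [2, 3, 4]]))     # 1.5
print(f_statistic([[1, 2, 3], [11, 12, 13]]))  # 150.0
```

A real analysis would compare F against the F-distribution to get a p-value; the sketch only shows where the ratio comes from.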
Converging operations
Because any particular measurement procedure may provide only a rough and imperfect measure of a given construct, researchers sometimes measure a given construct in several different ways. By using several types of measures- each coming at the construct from a different angle- a researcher can more accurately assess the variable of interest. When different kinds of measures provide the same results, we have more confidence in their validity. This approach to measurement is called converging operations or triangulation. Related to falsifiability. Strong inference: scientific progress comes about through a series of tests of alternative theoretical outcomes. We cannot prove a theory; the best way to show support for a theory is by the method of strong inference. We try to disprove as many opposing theories as possible, while showing support for the main theory- coming up with alternative hypotheses to avoid confirmation bias. A few potential problems with converging operations: generalizing from non-human to human species; converging operations require good communication between scientists and their subfields; converging operations take a lot of time and energy.
Benefits and limitations of factorial designs
Benefits: enhances the signal- if there is an effect, we are likely to find it; increases the chances that we identify the signal. An efficient design that combines two studies. The only design that can test for interactions. Limitations: complex- more statistical steps, and interpretation is more challenging. More participants required.
ABA Design
A better way: baseline bx, intervention, then bx after the intervention is removed. The most common single-participant research design. Attempts to demonstrate that the IV affects behavior, first by showing that the variable causes a target behavior to occur, and then by showing that removal of the variable causes the behavior to cease. Often referred to as a reversal design (returning to baseline). The participant is first observed in the absence of the IV; the target bx is measured many times to establish an adequate baseline for comparison. After the target bx is seen to be relatively stable, the IV is introduced and the bx is observed again. If the IV influences bx, we should see a change in the bx. If so, we may conclude there is an effect, but we could be wrong; this is just AB right now. We have to remove the treatment and see if this has an effect. If the bx returns to baseline, the treatment probably had an effect. But if the treatment causes a permanent effect, it's a carryover. In therapy situations, the clinician will oftentimes decide not to withdraw the IV because of such good improvements in the target bx (ethical considerations). There are many variations of the ABA design; to increase confidence in the effectiveness of the intervention, the researcher may decide to introduce the IV again in an ABAB design.
Age as a variable
Can have a big impact on things. Cross-sectional design- taking a large sample of the population of various ages at one time and testing them. This design makes it hard to distinguish age effects from generational effects- when groups differ in age as well as the generation in which they grew up. Longitudinal design- testing one group repeatedly as they age. Three weaknesses: it is hard to find people willing to be tested again and again; it is hard to keep track of people; repeated testing can take years, lots of money, and effort. Time-lag design- used to control for time-of-testing. Subjects of a particular age (19-year-olds) are tested at different time periods. Cross-sequential design- used when trying to control for cohort and time-of-testing effects. Involves testing several different age groups at several different time periods.
Looking for relationships with Pearson R and regression tests
Chi-square test: a significance test to determine if a relationship exists between two variables measured on nominal or ordinal scales; works with proportions, e.g., the gender ratio of UNC engineering majors vs. the national average. Pearson r test: a significance test used to determine if a linear relationship exists between two variables measured on interval or ratio scales. Linear regression: a statistical technique that determines the best-fit line to a set of data to allow prediction of the score on one variable from the score on another variable.
Main effect
DV: aggressive play. IVs: prior state (frustrated or not), cartoon type (violent or nonviolent). Looking at the impact of just 1 IV by itself- the marginal means: each IV by itself, independent of the other ones. The means are in the margins of the table. If there is a statistical difference between the marginal means (4.15 vs 2.71), we say there is a main effect- here, a main effect for cartoon type. The effect of 1 IV on the DV is referred to as the main effect. If the IV doesn't cause a change in the DV, there is no main effect. Looks at one IV at a time.
Research questions in correlational studies
Descriptive research question: is the behavior present, and how frequently does it occur? Predictive: does one behavior predict another? Predictor variable vs. outcome variable.
Creating a survey
Easiest way: find a questionnaire that has already been developed, tested, and validated. Two good sources: the Health and Psychosocial Instruments database and the Mental Measurements Yearbook. What if your questionnaire is new? Psychometrics: involves the development, validation, and refinement of surveys and tests for measuring psychological constructs.
Factorial Design: Memory Testing
Effects of sleep deprivation and caffeine on memory. Factor 1: sleep deprivation, 1 hr vs. 24 hrs. Factor 2: caffeine, pill vs. placebo. Main effects: collapsed across the other IV. There is a main effect for sleep deprivation; significantly worse for 24 hrs. Now looking at the supplement- if you ignore sleep and go straight down the middle, there is a main effect; those with caffeine do much better. Interactions: compare the ranges of the two lines, 10 vs. 75- there is an interaction. If the lines on the graph cross at all (or look like they will), there is an interaction. If parallel or overlapping, then no interaction.
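With the four cell means in hand, the marginal means and the interaction check are simple arithmetic. The cell means below are hypothetical stand-ins for the pattern described in the notes:

```python
# Hypothetical cell means (memory score) for the 2x2 sleep x caffeine design
means = {
    ("1hr",  "caffeine"): 85, ("1hr",  "placebo"): 80,
    ("24hr", "caffeine"): 75, ("24hr", "placebo"): 10,
}

# Main effect of sleep: collapse (average) across the caffeine factor
sleep_1hr  = (means[("1hr", "caffeine")] + means[("1hr", "placebo")]) / 2    # 82.5
sleep_24hr = (means[("24hr", "caffeine")] + means[("24hr", "placebo")]) / 2  # 42.5

# Interaction: does the caffeine effect change across sleep levels?
caffeine_effect_1hr  = means[("1hr", "caffeine")] - means[("1hr", "placebo")]    # 5
caffeine_effect_24hr = means[("24hr", "caffeine")] - means[("24hr", "placebo")]  # 65
interaction = caffeine_effect_24hr - caffeine_effect_1hr  # nonzero -> lines not parallel
```

A zero (or near-zero) difference of differences corresponds to parallel lines on the graph, i.e., no interaction.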
Science loves empiricism
Empiricism- gaining knowledge through observation of organisms or events in the real world. Science- gaining knowledge through empirical methods and logical reasoning. Science is the baby of rationalism (logic) and empiricism, and the only acceptable way for researchers and scientists to gain knowledge.
Correlational study
Examines relationships between multiple variables, without manipulating any of the variables. The simple finding of a relationship among variables does not provide information about the causal relationship between those variables. This is the most important issue to consider when designing a correlational study. Looking for prediction: can suggest the incidence or likelihood of something occurring in the presence or absence of something else. Correlational studies are usually the only option for health studies- you can't subject people to unhealthy things. Strong positive correlations between age and incidence of Alzheimer's disease have been found, such that by the time one reaches 85, they have a 50% chance of getting Alzheimer's. Few researchers would argue that age causes Alzheimer's.
Quasi-experiment example
Family planning before and after a parent gets cancer. Housing trends before and after Hurricane Katrina. Music sales before and after the Vietnam War. The researcher has no control over these events but looks at how they have changed behavior.
Within subjects design
Group 1: Drug A, treatment for 3 months. Group 1 again: Drug B, treatment for 3 months.
Between subject designs (independent samples)
Group 1: men; change in depression after 3 months on Drug A. Group 2: women; change in depression after 3 months on Drug A. How robust is the effect? Effect size! The difference between 2 bell curves: the bigger the difference, the larger the effect size. _________________________________ Group 1: Drug A. Group 2: Drug B.
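One standard way to quantify that "difference between 2 bell curves" is Cohen's d (not named in these notes, but a common effect-size measure): the mean difference divided by the pooled standard deviation. A sketch with made-up change-in-depression scores:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Standardized mean difference: how far apart the two bell curves sit,
    in units of their pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = sqrt(((n1 - 1) * stdev(group1) ** 2 + (n2 - 1) * stdev(group2) ** 2)
                     / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Made-up change scores for Drug A vs. Drug B
print(cohens_d([5, 6, 7], [2, 3, 4]))  # 3.0 -- a very large effect
```

A common rule of thumb treats d around 0.2 as small, 0.5 as medium, and 0.8 as large.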
Null and alternative hypothesis
Null hypothesis: groups will not differ- Drugs A and B do not differ in their impact on depression. Alternative hypothesis: groups will differ (can specify the direction if you want). If there is a difference in means, is it by chance alone, or is it really meaningful?
Factorial design
Have only discussed 1 IV (2 levels). Now we want to focus on experimentation with more than one IV each having multiple levels. Will rarely find a study that doesn't have multiple IVs. Having multiple IVs helps with control and generalization. Because Bx is so complex, having more than one IV increases ecological validity.
Hypothesis testing
Hypothesis tests are designed to look for evidence against the null hypothesis, which predicts no difference between conditions (or no relationship between variables). If our calculated statistic falls in the most extreme portion of this distribution, called the critical region, we have enough evidence to reject the null hypothesis and accept the scientific/alternative hypothesis that there is a difference or relationship. A strong enough difference in averages increases the chance that the null hypothesis is wrong.
Changing-criterion design
Involves changing the bx necessary to obtain reinforcement. Ex: a child is unable to sit still in class, so reinforce a positive behavior. First, get a baseline. Set the criterion for reward (sit still for 5 mins). After the bx changes and is consistent, increase the bx necessary for reward: the child must now sit still for 10 mins for the reward.
Partial correlations
Is there a relationship between online students' sense of community and their course grades? That's a simple correlation. Is there a relationship between online students' sense of community and their course grades, while controlling for sex? Here you want r for men and r for women. Ex: is there an association between healthcare funding and disease rates? Assumed to be a negative correlation, but there is actually a positive correlation (as funding increases, disease rates increase). Why? After controlling for office visits to a doctor, the positive correlation is removed. They initially appear to be positively correlated because more people have access to healthcare when funding increases, which results in more REPORTED diseases by doctors and hospitals.
Alternating-treatment designs
Just as in large-n designs, researchers will want to test more than one IV in an experiment. With small-n studies, this can be done with alternating-treatment designs. Simply put, alternating treatment designs occur when the presentation of different IVs alternates. ABABCBCBA. Pretty messy design.
Factorial design: Women's Health Study
Measuring cardiovascular events when treated with low-dose aspirin and a vit E supplement. 2x2 design, 4 groups: aspirin and vit E; aspirin and vit E placebo; aspirin placebo and vit E; aspirin placebo and vit E placebo. Might find that aspirin has an impact by collapsing the groups- that's the marginal means, the main effect. But the effect of aspirin may depend on the presence of vit E: aspirin reduces the risk of cardio events, but to a greater degree among women who also take vit E. This is the interaction.
Pseudo-science vs Science: stagnation
Modern science will have rapid changes as new discoveries come forth. Pseudo-science is static and will rely more on "ancient wisdom" and cherished beliefs.
Controlling in quasi-experiments
One way to control for these threats is to find a comparable group that did not receive the same treatment and use them as a "control" or comparison group. Referred to as a nonequivalent control group design. Ex: cavities before and after fluoride introduction into water- could look at a neighboring town that is demographically similar but did not introduce fluoride into the water. Can have multiple of these 'controls' to broaden the comparison and increase the sample. However, there is no random assignment, so now we have a potential selection bias problem. Selection bias occurs when subjects are not selected randomly and therefore may have significant differences not recognized by the researcher.
Common inferential statistics and their uses
One-sample t test: Tests an effect. When the population mean without the treatment is known and is compared with a single sample Independent samples t test: Tests an effect. When two samples of different individuals are compared Repeated measures/paired samples: Tests an effect. When two related samples or two sets of scores from the same individual are compared ANOVA: Tests an effect. When more than two samples or sets of scores from the same individual are compared Pearson r test: Tests a relationship. When the relationship between two sets of scores is being tested Regression: Tests a relationship. When you want to predict an individual's score on one variable from the score on a second, related variable
Comparing two means with t-tests: one-sample t-test
One-sample t-test- is often used when the population mean without a treatment is known and is being compared with a sample mean that represents the population with a treatment. Population means are known for certain types of variables, such as standardized tests that have been given to many samples in the past or were designed to have a specific population mean. Is the sample meaningful? We have the population data! Ex: GRE score. Average GRE for our sample. Take that average and compare the average reported by the company that makes the GRE for the nation. Are UNC students doing better? Is the mean statistically higher or lower than average? Another ex: Comparing the mean of people taking the GRE that took a prep class vs. the national average.
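The comparison described above reduces to one formula: the distance between the sample mean and the known population mean, in units of the estimated standard error. A sketch with invented GRE-style scores (a real test would also look up the p-value for this t):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, population_mean):
    """t = (sample mean - known population mean) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - population_mean) / (stdev(sample) / sqrt(n))

# Invented scores for a prep-class sample vs. an assumed population mean of 150
t = one_sample_t([155, 158, 152, 160, 150], 150)
print(round(t, 2))  # 2.71
```

A larger |t| means the sample mean is farther from the population mean relative to the sample's variability, so the null hypothesis becomes harder to retain.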
Types of survey response scales
Open-ended: How do you feel today? ____ What is your political affiliation? ____ Free response: respondents answer in any manner they feel is appropriate for the question. Strengths: flexible and data-rich- often, an open-ended questionnaire is administered in the early phases of a research project and the responses are then used to form more specific close-ended questionnaires. Shortcomings: you need some way of analyzing the data; there is an incredible amount of variability; time-consuming; subjective when trying to interpret what a respondent meant; requires a coding scheme with interrater reliability of 80% or greater. Close-ended- respondents select responses from 2 or more preselected possible answers. Strengths: easier to answer, easier to understand if participants don't know as much, easier to tabulate- no recoding required. Shortcomings: limits options to a set of responses that may not accurately describe how participants feel or what they think; based on bias from the researcher; may not include an exhaustive list of possibilities, so you might need an "other" or "none of the above" option. Look out for: too few choices; too many choices may confuse but can allow for more variable responses (increasing validity). Are there equal numbers of positive and negative response categories? Is there a rating scale with labels for the endpoints and possibly the midpoints of the scale? Visual analog scales are used for kids a lot and capture direction.
Example ANOVA
Outcome: depression after three months of treatment (dependent variable). Groups: placebo (control), Wellbutrin (Tx1), psychotherapy CBT (Tx2), positive psych (Tx3). You get a curve for all 4 groups, which may or may not overlap. ANOVA gives an overall value first- the main effect. Interaction effect: breaking each group down into men and women; Wellbutrin as a whole may not be significantly different, but it could work significantly better for men.
Descriptive studies
Positive relationship: increase in one variable that occurs with an increase in the other variable Negative relationship: as one variable increases, a decrease occurs in the other variable.
Pseudo-science vs Science: Reliance on personal experience
Pseudo-science has a reliance on personal experience: science will move away from anecdotal evidence and personal testimonies toward empirical testing. Opinions can often lead to hypotheses, but science will test these, whereas pseudo-science will encourage people to believe that the anecdotal evidence is just fine.
Pseudo-science vs Science: Outward appearance of science
Pseudo-science has the outward appearance of science: pseudo-science will often use scientific language to sound legitimate. Whereas a scientist can give a clear, precise definition of energy, pseudo-scientists use the term "human energy field" in a vague way to mystify the consumer without ever giving a definition of the term.
Pseudo-science vs Science: appeals to authority
Pseudo-science has very little data for the consumer to examine. They instead encourage people to just take their word for it.
Pseudo-science vs Science: Absence of skeptical peer review
Pseudo-science will have the absence of skeptical peer review. Both science and pseudo-science will publish articles in journals, but science uses a very rigorous peer review process. Research findings are written up and submitted to a journal and the journal will send the paper to 2 or more experts in the field to scrutinize the merit of the paper and the methods used in the research.
Regression
Same as r when you only have 2 variables. Multiple regression is used when we use more than 1 IV to predict a DV. Can use head size to predict word recall, but also age and IQ. We call these additional variables covariates. Head size and age are probably also related; both get plugged into the regression.
Pseudo-science vs Science: tolerance of inconsistencies
Science believes contradictory statements cannot be true. Pseudo-science is tolerant of these logical inconsistencies. Because the signs of the Zodiac are based on seasons, some astrologers reverse their order for the southern hemisphere while others do not. Orthomolecular medicine involves taking mega-doses of dietary supplements to increase potency, whereas homeopathy involves diluting them to increase potency.
Pseudo-science vs Science: retreats to the supernatural
Science is concerned with measuring the natural world. Pseudo-science will often rely on things outside the natural because they cannot be proven false. Ex: human energy field cannot be observed or measured.
Pseudo-science vs Science: promising the impossible
Science respects the limits of present knowledge. Pseudo-science is not bound by reality: UFOs go faster than the speed of light, perpetual motion machines work, polygraph is 100% accurate
Pseudo-science vs Science: the mantra of holism
Science tries to get more and more specific in the way it answers questions. Pseudo-science avoids such distinctions in the name of holism. Most pseudo-scientists use holism in a vague way to escape deeper investigation
Threats to internal validity with survey
Social desirability bias: respondents' desire to be viewed more favorably by others, typically resulting in overreporting of "positive" behaviors and underreporting of "negative" behaviors. Measured with the Marlowe-Crowne Social Desirability Scale- 33 T/F items. A common threat to the validity of the study.
Factorial designs Interaction
Sometimes the true effect in an experiment is masked when we look at the main effects of the IVs in isolation. Only when we test for an interaction can we know if there is an underlying effect caused by the combination of IVs. Recall that an interaction is when the effect of one IV depends on the level of another IV. Sometimes called effect modification.
Designs employing subject variables
Subject variables can be used as independent variables: gender, age, IQ, brain damage, etc.
Survey research
Survey research: a research study that uses the survey observational technique to measure behavior (can include all types of questionnaires and interviews). Important information can be gathered from questionnaires, including clinical, demographic, and behavioral information. Questionnaires can be administered in person or electronically; they can even be administered through clinical interviews. Surveys are a common data collection technique used in psychological research, but it can be difficult to construct a valid survey. One of the most important issues to consider when designing a new survey: how to maximize the validity of the scores from the survey.
Survey reliability
Test-retest: scores on the survey will be similar when participants complete the same survey more than once. There are issues trying to get a participant to take a survey twice: attrition is when participants choose not to do the survey again (mortality as well), and testing effects occur when participants are tested more than once in a study, with early testing affecting later testing. Internal consistency: a form of reliability that tests relationships between scores on different items of a survey. Split-half reliability: the test is split into two halves (odds and evens, maybe); if the scores on the halves are similar, the test is reliable. Cronbach's alpha- tests the scores' internal consistency and indicates the average correlation between all pairs of items on a survey.
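Cronbach's alpha has a simple closed form: k/(k-1) times one minus the ratio of summed item variances to the variance of the total scores. A sketch with toy item scores (the data are invented):

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per survey item, respondents in the same order."""
    k = len(items)
    sum_item_vars = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Two items that rise and fall together -> perfectly consistent scale
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
```

Alpha drops toward zero (or even negative values) as the items stop moving together; a common rule of thumb is that alpha of about .70 or higher indicates acceptable internal consistency.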
Maturation example
Testing the efficacy of a new reading program in a school: the children may become better readers because of the program, but the children also become more sophisticated and naturally smarter with time. In adults, the onset of disease may affect the influence of the IV on the DV. Kids are more vulnerable to maturation effects.
Interrupted-time series design
The design is similar to the observation-treatment-observation design, but the treatment occurs outside of the experimenter's control. Example: academic achievement before and after fluoride was added to the water, with variability in the data before the intervention. Threats to validity: maturation and history effects. After the fluoride, scores improved. Dental and skeletal health is better with fluoride; with fewer health problems, children are less likely to miss school, which could influence academics. History effects: educational programming is different, better in the '80s and '90s compared to the '60s and '70s. Flynn effect. When possible, it's always best to get a no-treatment group for comparison. Another problem is that it takes time.
P-value
The distribution of statistical values is used to determine the p-value for the value calculated in the test. Assuming the null is true, we reject it only if the data are so different between groups that such a difference would occur by chance at most 5% of the time. Alpha is the probability of committing a Type 1 error: if the null were actually true and we repeated the study 100 times, we would expect to falsely "find" an effect about 5 times. If the p-value is less than or equal to alpha, the null hypothesis is rejected, and the researcher concludes that there is an effect. If the p-value is greater than the alpha level, the researcher cannot reject the null hypothesis and must conclude that there is no evidence of an effect, which reduces confidence in the result. Even if the means look different, a p-value over .05 suggests the difference could be random chance, and the study is not convincing.
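The decision rule above is mechanical once a p-value is in hand. A minimal sketch (the p-values passed in are made up; in practice they come from a statistical test such as a t-test):

```python
ALPHA = 0.05  # conventional Type 1 error rate

def decide(p_value, alpha=ALPHA):
    """Reject the null only when p <= alpha."""
    if p_value <= alpha:
        return "reject null (effect found)"
    return "fail to reject null (no evidence of effect)"

print(decide(0.03))  # at or below .05 -> reject
print(decide(0.20))  # above .05 -> could be random chance
```

Note the asymmetry: failing to reject is not the same as proving the null; it only means the data were not surprising enough under the null.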
Survey administration
The more participants in your sample who are willing to complete the survey, the lower the chance of nonresponse error in your study. How can you increase participation? Presentation! You don't want the page to be visually overwhelming. Clear instructions. Contacting participants multiple times will increase completion rates. Including an incentive.
Factorial notation
The notation indicates how many factors there are in a study and how many levels each of these factors has. The number of numerals in the notation tells you the number of factors, and each numeral gives that factor's levels. 2x2: two factors because two numbers are listed, each with 2 levels, so 4 groups. 2x3: 2 factors, one with 2 levels and the other with 3 levels, so 6 groups.
Cross-lagged panel design
The problem with correlational research is that we don't know which variable causes which. The only way to know is to measure over time. A cross-lagged panel design helps us understand the direction of causation between two variables across more than one time point: measure at Time 1... years go by... measure the same 2 variables again at Time 2. Does violent TV watching cause aggression, or does aggression lead to violent TV watching? Directionality problem! In 1972, aggressive children watched more violent TV (r = .21), not an impressive correlation. If watching violent TV produces aggressive Bx, we would not expect a relationship between aggression and later TV watching; we would expect early TV watching to correlate with later aggression. Time 1: correlation between TV watching and aggressive behavior, correlation only, so cause cannot be found, same as any other correlational study. Time 2: exact same thing, correlation between TV watching and aggressive behavior. The cause has to come before the effect in time; by measuring each set across time points, you can determine which variables influence each other. Compare the strength of the relationship between variable A (TV) at Time 1 and variable B (aggression) at Time 2 with the strength of the relationship between variable B (aggression) at Time 1 and variable A (TV) at Time 2. In other words, compare the association between kids' TV preferences and their aggressiveness 10 years later with the association between kids' aggressiveness and their TV preferences 10 years later. If aggressiveness at T1 predicts TV at T2, but TV at T1 does not predict aggression at T2, aggressiveness causes us to watch more violent TV. If TV at T1 predicts aggressiveness at T2 and aggressiveness at T1 does not predict TV habits at T2, TV causes aggression. Path diagram: Time 1 on the left (r = .21); Time 2 on the right (r = -.05). The correlations are different.
TV at T1 is not well correlated with TV at T2 (.05). Those aggressive in childhood are likely to be aggressive as adults (.38). The path from TV at T1 to aggression at T2 is strong (.38), while aggression at T1 is not correlated with TV at T2. Early TV watching is predictive of later aggressive behavior. By taking it apart in time, we know which one is predictive of the other. The media you consume at 8 is predictive of aggressiveness later in life.
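The cross-lagged comparison is just two correlations computed on the same panel of people. A minimal sketch with made-up scores (the data and the resulting r values are hypothetical, chosen only to mimic the pattern described above):

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Made-up scores for the same six children at two time points.
tv_t1  = [1, 2, 3, 4, 5, 6]
agg_t1 = [2, 1, 3, 3, 5, 4]
tv_t2  = [2, 3, 1, 4, 2, 3]
agg_t2 = [1, 3, 3, 5, 5, 7]

# The two cross-lagged paths; the stronger one suggests which
# variable comes first in time.
tv1_to_agg2 = pearson_r(tv_t1, agg_t2)  # TV at T1 -> aggression at T2
agg1_to_tv2 = pearson_r(agg_t1, tv_t2)  # aggression at T1 -> TV at T2

if abs(tv1_to_agg2) > abs(agg1_to_tv2):
    print("early TV predicts later aggression")
```

With these invented numbers the TV(T1)-aggression(T2) path is strong while the reverse path is near zero, the same asymmetry the notes use to argue TV watching precedes aggression.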
Matching in quasi
Things like gender, age, IQ, etc. are quasi-IVs. We cannot assign subjects to these categories. We may find differences in bx among these groups, but we cannot make causal statements. Regression to the mean: we want to match people based on things like test scores, but regression to the mean might cloud the true abilities of the person.
Contingency tables
To establish a correlation, you need to know the number of cases reporting childhood trauma that (1) have a Dx of DID and (2) don't have a Dx of DID, as well as the number of cases not reporting childhood trauma that (3) have a Dx of DID and (4) don't have a Dx of DID. The total from all 4 cells matches the sample size. To establish that childhood trauma is related to a DID dx, the rate of DID among individuals with trauma must differ from the rate of DID among those without trauma. Rate 1 = cell 1 / (cell 1 + cell 2), both cells reporting trauma. Rate 2 = cell 3 / (cell 3 + cell 4), both cells reporting no trauma. Rate 1 has to differ from Rate 2; if the percentages are the same, there's no difference. Three possible outcomes: if the rates are equal, the rate of DID is the same in both groups and has nothing to do with trauma. If Rate 1 is greater, there is a positive association: trauma is associated with an increased rate of DID. If Rate 2 is greater, there is a negative association: trauma is almost protective. Why this matters: you get into trouble when you look at just 1 cell instead of comparing rates.
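The rate comparison above is easy to make concrete. A minimal sketch with made-up cell counts (the numbers are hypothetical, not real prevalence data):

```python
# Hypothetical 2x2 contingency table counts (made-up numbers).
trauma_yes = {"did": 30, "no_did": 70}  # cells 1 and 2
trauma_no  = {"did": 10, "no_did": 90}  # cells 3 and 4

# Rate of DID within each trauma group -- the comparison that a
# single cell count can never give you.
rate1 = trauma_yes["did"] / (trauma_yes["did"] + trauma_yes["no_did"])
rate2 = trauma_no["did"] / (trauma_no["did"] + trauma_no["no_did"])

if rate1 > rate2:
    print("positive association: trauma linked to a higher DID rate")
elif rate1 < rate2:
    print("negative association: trauma looks protective")
else:
    print("no association: DID rate is the same in both groups")
```

Looking only at cell 1 ("30 DID cases reported trauma") tells you nothing until it is turned into a rate and compared against the no-trauma group.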
Causality
To support the idea that one event causes another, certain conditions must be present. X and Y must be correlated. X must precede Y in time (directionality problem); this can be complicated by reciprocal causality, where X causes a change in Y and the change in Y causes a new change in X, and so on. Ex: self-esteem causes good grades, better grades cause an increase in self-esteem, etc. All other factors (Z) must be ruled out (third-variable problem). Thus, there are a lot of problems with correlational research.
Small n designs
Up to now we have covered large-n designs. However, small-n designs are often used to assess the clinical utility of techniques to modify Bx. The same behavior modification techniques will not work in all situations, so a therapist will often do research on different techniques on an individual basis. Small-n designs are often called single-case designs. They try to introduce some kind of intervention, so there is an IV. A case study is meant to describe some person, group, or event; a single-case design is similar but will have an IV (intervention or treatment) of some kind.
Comparing two means with t-tests: paired/related samples t-test
Used if two related samples (e.g., each individual in one sample is matched with one individual in the other sample on some characteristic) or two sets of scores from the same individual (i.e., the individual provides scores in more than one condition) are compared. If two sample means come from matched individuals or the same individual, the variability is likely to be lower, and an effect is more likely to be detected (i.e., the test will have more power) than if the samples are independent. Match as much as possible on the demographics that are likely to be influential (like age in a memory study). This also helps with within-subjects designs.
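The power advantage comes from working with difference scores: each person's own baseline cancels out. A minimal sketch with made-up before/after scores from the same participants:

```python
import math
import statistics

# Made-up before/after scores from the same six participants.
before = [10, 12, 9, 14, 11, 13]
after  = [12, 14, 9, 17, 13, 14]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

# Paired t: mean difference divided by its standard error. Because
# each person serves as their own control, person-to-person
# variability drops out of the difference scores -- the source of
# the extra power.
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
print(round(t, 2))  # 3.95
```

The t value would then be compared against the t distribution with n - 1 degrees of freedom to obtain the p-value.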
Validity review
Validity: the degree to which a test measures what it intends to measure. Internal validity: being confident the cause of the change in a variable is the independent variable; you don't want a confounding variable (e.g., being confident it was the medication that changed the headaches). External validity: how generalizable are the results? With a lot of control over variables, the setting is not like the real world. Face validity: does a measure look like it's measuring what we want it to measure? Construct validity: does it actually measure what we want it to measure? Content validity: does the measure include all aspects of the construct? For example, when measuring anxiety, a measure should include the affective, cognitive, and somatic aspects of the construct. Criterion-related validity: determining the validity of the scores of a survey by examining the relationship between survey scores and other established measures of the behavior of interest; compare your survey scores to other established/previously validated surveys. Gold standard: when making a new measure, we should validate it against the gold standard of measurement; a new stress measure would need to be highly correlated with pre-existing measures or something like cortisol levels. Predictive validity: does the test predict things accurately? Does the GRE predict first-year grad school GPA? Nope; it's an awful predictor of GPA, so it has low predictive validity.
History effect example
Voting trends in 18-year-olds now vs. 18-year-olds in the '60s: 18-year-olds are different now in general. Or drug use before and after the introduction of a drug education program, but at the same time a rock star dies from an overdose or an athlete is banned from playing due to drug use.
Multiple-baseline design
We cannot be sure that the introduction of the IV didn't cause some permanent effect on behavior (carryover). This design works around that issue: it measures the influence of treatment while acknowledging there may be a carryover effect. A therapist teaches you new coping mechanisms; you can't unlearn the new techniques. Introduce an intervention at different times for each of several different behaviors to see if the onset of behavior change coincides with the manipulation. An intellectually disabled child is rewarded for doing certain tasks like brushing teeth, face washing, and hair combing. If all bxs are rewarded at the same time, it is possible that the change in bx was due to the presence of the researcher, the attention received, or a spontaneous decision to change. Instead, reward just one bx at a time for a week at a time. This sequence makes it possible to see whether the increase in bx coincided with the reward. We can use multiple baselines for multiple bxs in the same person, the same bx in multiple people, or a bx in one person in multiple situations. The design uses a varying time schedule that allows the researcher to determine if the application of treatment is truly influencing the change in behavior. You might vary the length of the initial baseline period and then apply the treatment to determine whether the change in behavior corresponds with the introduction of treatment. You might also apply varying amounts of a specific treatment (verbal praise vs. verbal and physical praise) to better understand not only the best treatment but also the best amount of treatment.
Tips for developing a new survey
When developing a new questionnaire, it is important to have clear instructions. Mind the reading level. Provide an example item if questions are complex. Use headings so questions related to a certain topic are grouped together. Include warm-up questions so that people don't get defensive. Clarity: concrete, easy-to-interpret items with no ambiguous language; ambiguous items produce poor internal consistency in responses. Avoid jargon and language that might be confusing. Neutrality: avoid loaded words and do not imply judgment.
Comparing two means with t-tests: independent samples t-test
Used when two samples of different individuals are compared. Each sample represents the population for that group or condition; thus, there are two sample means that are compared in the test. Examples: depression scores between men and women; Drug A vs. Drug B (groups are matched on average age, ratio of men/women, level of education).
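Unlike the paired case, here the two groups contain different people, so the test pools the variability of both samples. A minimal sketch with made-up scores (a pooled-variance t, which assumes roughly equal group variances):

```python
import math
import statistics

# Made-up depression scores for two independent groups.
group_a = [14, 10, 12, 15, 11, 13]
group_b = [9, 7, 10, 8, 6, 8]

n1, n2 = len(group_a), len(group_b)
m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
v1, v2 = statistics.variance(group_a), statistics.variance(group_b)

# Pooled-variance independent-samples t: the difference in means
# scaled by the combined variability of both samples.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print(round(t, 2))  # 4.7
```

Because between-person variability stays in the denominator here, an independent-samples design typically needs a larger effect or sample than a matched/paired design to reach significance.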