PSY3213L Exam 3


Inferential Statistics - Hypothesis Testing Two-tailed hypotheses -Null Hypothesis - H0 : -Alternative Hypothesis - H1 : One-tailed hypotheses -Null Hypothesis - H0 : -Alternative Hypothesis - H1 :

Two-tailed hypotheses:
-Null Hypothesis (H0), the "hypothesis of no difference": Group 1 = Group 2
-Alternative Hypothesis (H1), the "hypothesis of difference": Group 1 ≠ Group 2
One-tailed hypotheses:
-Null Hypothesis (H0): Group 1 >= Group 2
-Alternative Hypothesis (H1): Group 1 < Group 2

5. A histogram displays what, for what type of variable?

-A histogram is basically a bar graph on which the bars touch ◦ uses bars to display a frequency distribution for a quantitative variable -The y-axis represents a frequency count of the number of observations falling into each category -Categories are represented on the x-axis ◦ from the shape, you should be able to tell whether the distribution is positively or negatively skewed.

Descriptives - Measures of Central Tendency Mean used for what type of variables? -Used if data are measured along what scale?

-Average of all scores in a distribution -Sum of all scores divided by number of scores Used for continuous variables -Used if data are measured along an interval or ratio scale and are normally distributed -doesn't represent central tendency as well in skewed distributions -More sensitive to outliers (extreme scores)
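
A minimal sketch of the two points above, using made-up scores: the mean is the sum of scores divided by the number of scores, and a single extreme score pulls it away from the center.

```python
# Illustrative data (not from the course); shows the mean and its outlier sensitivity.
from statistics import mean

scores = [3, 5, 7, 9, 2, 5, 1, 9, 9, 3, 4]  # roughly symmetric distribution
print(mean(scores))                          # sum of all scores / number of scores

skewed = scores + [100]                      # add one extreme score (outlier)
print(mean(skewed))                          # mean is pulled toward the outlier
```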

Classic test theory : -True Score -Systematic error -Random error

-The goal of every measure is to reduce error as much as possible
True score
-What you are actually trying to measure
-A measure that is both highly reliable and highly valid has little systematic or random error
-ex: the true score would be the scale reporting your actual weight
Systematic error
-Lots of systematic error, little true score
-Error that occurs the same way every time, so the measure seems reliable but is not valid
-ex: a scale that adds 10 pounds every time; because it is consistent, you would think it is reliable. This is the worst kind of error.
Random error
-Lots of random error, little true score: neither reliable nor valid
-ex: a scale that adds 20 pounds one time, 5 pounds the next, etc.; you would never think it is reliable

♣ Correlation coefficient

-A statistic that describes how strongly variables are related to one another
-To calculate a correlation coefficient, we need to obtain pairs of observations from each subject; thus, each individual has 2 scores, one on each of the variables
-Ranges from -1 to +1
-The size (absolute value) of the coefficient indicates the STRENGTH of the relationship
-The sign of the coefficient (+ or -) indicates the DIRECTION of the linear relationship
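
A sketch of these ideas with invented paired observations (the variable names and numbers are hypothetical): each subject contributes two scores, and the sign of r shows the direction of the linear relationship.

```python
# Hand-rolled Pearson r on made-up data: sign = direction, |r| = strength.
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

hours_studied = [1, 2, 3, 4, 5]
exam_score    = [55, 62, 70, 81, 90]  # rises with hours studied -> r near +1
tv_hours      = [9, 7, 6, 4, 2]       # falls as studying rises  -> r near -1

print(round(pearson_r(hours_studied, exam_score), 3))
print(round(pearson_r(hours_studied, tv_hours), 3))
```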

• Effect size

-Refers to the strength of association between variables
-The Pearson r correlation coefficient is one indicator of effect size; it indicates the strength of the linear association between 2 variables (how far apart our distributions are)
ex:
♣ Effect of noise on test-taking ability: compare a silent group and a noisy group. A 3-point difference on the exam is a small effect; blast music and find a 15-point difference, and the effect is much greater.
♣ Predicting how well you will do on this last exam: gender (not much effect), IQ (not a huge effect, maybe 10% of variance), how long you study (25%), how you study (40% of variance). The IV "how you study" accounts for 40% of the variance in exam scores.
♣ The larger the effect, the more power, and the more likely you are to find statistical significance

Graphing Data (5)

1. Bar graph 2. Line graph 3. Scatterplot 4. Pie chart 5. Histogram

Decision Time...

1. Compare two sample means
2. Assess the probability that the means of the two samples are not different (no effect); this is basically testing the null hypothesis
3. If the probability is small enough (< .05), and in the predicted direction, then we reject the null hypothesis and say the difference is statistically significant (e.g., p = .03 is statistically significant). The p value is the probability of obtaining data as extreme as observed (or more extreme), assuming that the null hypothesis is true
4. Direction of statistical test
-One-tailed = more power
-Two-tailed = less power

What do you consider? (2) Which one do you use?

1. Consider the scale of measurement
2. Consider the shape of the distribution
A. If you use a nominal scale you can ONLY use the mode
B. If you use an ordinal scale you can ONLY use the mode or median
-In reality we will often use a mean with an ordinal scale; just be careful with your interpretation
C. If you use an interval or ratio scale you can use the mean IF the distribution is normal
-If not normally distributed, use the median
ex: A researcher documents the number of fights couples have during a one-hour observation at a jewelry store: 2, 2, 1, 18, 2, 1. Mean ≈ 4.33. The 18 is an outlier that distorts the data; the mean becomes biased and no longer has diagnostic value as a measure of the typical score (central tendency).
ex: Incomes of 10,500; 12,000; 11,000; 15,000; 50,000 give a mean of 19,700; replace the 50,000 with 20,000 and the mean drops to 13,700. By giving the highest tax bracket a big tax break, I can claim the 'average' American saves 6K in taxes under my tax plan. In this case, the median family income of a county is usually a better measure of central tendency than the mean family income: not many people have extremely high incomes, so the mean makes it appear that the "average" person earns more money than is actually the case.
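
The fight-count example above can be sketched directly: the one outlier (18) biases the mean, while the median still reflects a typical couple.

```python
# Fight counts from the example above: mean vs. median with an outlier.
from statistics import mean, median

fights = [2, 2, 1, 18, 2, 1]
print(round(mean(fights), 2))  # pulled up by the outlier (18)
print(median(fights))          # typical couple: 2 fights
```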

Running an ANOVA

1. Get the F-ratio
2. Is the F-ratio significant? If no: stop. If yes: do some follow-up tests
3. Unplanned comparisons (post hoc) can become "fishing expeditions" and increase the probability of error

Recording behavior (5)

1. Observation can be costly, complex, and fast-paced
o Ex: 50 kids playing while you try to catch every instance of aggression is very difficult; the way around it is time sampling or individual sampling
2. Time sampling
-Scan subjects for a specific period (e.g., every 30 seconds), then record your observations during the next period
3. Individual sampling
-Select a subject, observe their behavior for a given period, then shift to another subject and repeat the observations (randomly pick someone, code her, then after 30 seconds move to another person, code, and so on)
4. Event sampling
-Select one behavior for observation and record all instances of that behavior
-Good when one specific behavior is the focus of the study and more important than other observed behaviors
-Ex: one day you go out and record only punches, the next day slaps, the next day hair pulling
5. Recording
a) Cameras: allow more than one person to observe and code behaviors; allow covert observation
b) Paper-and-pencil coding sheets: quiet, but take more time
c) Voice recorders: efficient but noisy

Analyzing your data 1. once you are done looking at ________, you can look at __________ 2. you must make ? 3. An ANOVA test assumes? 4. inferential statistics again see if ? 5. statistical significance refers to ?

1. Once you are done looking at descriptive statistics, you can look at inferential statistics.
2. You must make assumptions when using inferential statistics.
3. An ANOVA test assumes that the data are normally distributed (bell shaped) and that there is homogeneity of variance (the variance between your groups is equal).
4. Inferential statistics again see if your IV affects your DV.
5. Statistical significance refers to whether or not the effect is due to chance (if it is not, you are able to generalize beyond your sample).

Descriptives - Measures of Variability 1. Range

1. Range
-Subtract the lowest from the highest score in a distribution of scores (R = H - L)
-Simplest and least informative measure of spread
-3 5 7 9 2 5 1 9 9 3 4 -> What is the range? 9 - 1 = 8
-Problem: scores between the extremes are not taken into account (the range reflects only the extreme scores, not the scores in between)
-Problem: very sensitive to outliers (extreme scores heavily affect the range)
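
The R = H - L formula above reduces to one line of code, using the same distribution:

```python
# Range of the distribution from the card above: R = H - L.
scores = [3, 5, 7, 9, 2, 5, 1, 9, 9, 3, 4]
r = max(scores) - min(scores)
print(r)  # 9 - 1 = 8
```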

Descriptives - Measures of Variability 2. Interquartile range

2. Interquartile range
• More informative: contains the middle 50% of the scores in a distribution
• Less sensitive to outliers and skewness
-Find the median, then the median of the lower half (Q1) and the median of the upper half (Q3)
-Q3 - Q1 = IQR
-Knocks off the bottom 25% and the top 25%, which removes outliers
-If the data are skewed, use the interquartile range because it is less sensitive to outliers
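
A sketch of the median-of-halves procedure described above, on invented data. (Note that statistical packages use several slightly different quartile conventions; this follows the steps on the card.)

```python
# IQR via the "median of each half" method from the card (illustrative data).
from statistics import median

def iqr(data):
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]                       # bottom half of the ordered scores
    upper = s[-half:]                      # top half of the ordered scores
    return median(upper) - median(lower)   # Q3 - Q1

print(iqr([1, 3, 5, 7, 9, 11, 13, 15]))   # Q1 = 4, Q3 = 12 -> IQR = 8
```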

Descriptives - Measures of Variability 3. Variance

3. Variance
-The average squared deviation of your scores from the mean: tells us, on average, how much our scores deviate from the mean
-Symbolized s^2
-The problem is that it can range from 0 to infinity; your variance could be 50 even if your scale only runs from 1-10, which is hard to interpret
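
A minimal sketch with made-up 1-10 ratings: sample variance is the sum of squared deviations from the mean divided by n - 1, matching the stdlib implementation.

```python
# Sample variance s^2 on invented ratings; cross-checked against statistics.variance.
from statistics import mean, variance

ratings = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores on a 1-10 scale
m = mean(ratings)                     # 5
s2 = sum((x - m) ** 2 for x in ratings) / (len(ratings) - 1)  # n - 1 for a sample
print(s2)                             # about 4.57: average squared deviation
```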

What 4 things affect Power?

4 things that affect power:
1. *Sample size: a bigger sample is more representative and gives more power (the sample gives you better eyesight / better glasses)
2. Direction of the test: one-tailed vs. two-tailed; one-tailed gives you more power (looking for a difference in one direction gives better eyesight than being told to look in two directions)
3. Alpha level: a less conservative alpha means you need less evidence to say there is an effect, which gives more power but also more Type I errors
4. Effect size: bigger effect sizes = more power

Descriptives - Measures of Variability 4. Standard Deviation

4. Standard Deviation
◦ Square root of the variance
◦ Most widely used measure of spread
◦ Indicates the average deviation of scores from the mean (ex: M = 3.00, SD = 1.00; ex: income)
◦ Relatively good measure, often reported alongside the standard error
◦ Derived by first calculating the variance (symbolized s^2); the standard deviation is its square root
1. Puts your variance back into the range of your data (e.g., 1-10); a standard deviation of 7 on a 10-point scale tells you that you have a lot of variability
2. Your standard deviation must fall within the range of your data
3. It can become biased when your distribution is not normally distributed
4. Population standard deviation is sigma (σ)
5. Sample standard deviation is s
• The range and standard deviation are sensitive to extreme scores
• When your distribution of scores is skewed, the standard deviation does not provide a good index of spread
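
The "square root of the variance" relationship can be sketched on the same invented ratings used for the variance card; the SD lands back in the data's own units.

```python
# SD = sqrt(variance), back in the scale of the data (invented 1-10 ratings).
from statistics import stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = variance(scores) ** 0.5      # square root of the sample variance
print(round(sd, 2))               # about 2.14, well within the data's range
```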

● Naturalistic observation Advantages Disadvantages

Advantages: high external validity
■ Why? Because you are letting the variables act freely in the real world
-Can be useful for generating hypotheses
-Provides information about behavior in the natural environment
Disadvantages: low internal validity, cost
■ You are not manipulating anything, so you have no control
■ Cannot explore underlying causes of behavior
■ May take a lot of time and more money
-Sometimes yields biased results
-May be difficult to do unobtrusively
-Doesn't allow conclusions about cause-and-effect relationships

Chapter 13: Inferential Statistics ○ Population: ○ Sample: The sample size

Inferential statistics allow you to infer things about the population based on the data you gather from your sample
○ Population: all people of interest for your study
○ Sample: a chosen selection of people from a population
○ The sample size (the total number of observations) affects determinations of statistical significance
○ We take samples to make estimates about millions of people; we compute the deviation of the sample
-The ability to state (with confidence) that the difference observed in your study will also occur in the real world
-Assess the reliability of your finding
■ Are your results repeatable?
-Used to determine whether the results match what would happen if we were to conduct the experiment again and again with multiple samples
-In essence, we are asking whether we can infer that the difference in the sample means reflects a true difference in the population means
-Inferential statistics allow us to arrive at such conclusions on the basis of sample data
-They give the probability that the difference between means reflects random error rather than a real difference

● Sampling error

Always an issue in statistics and in psychological research Difference between sample and population Are the results of your study due to chance or error? Are the results of your study indicative of what happens in the real world? Is the difference between the means of the 2 groups (treatment & control) due to sampling error - or are there truly differences in the real world?

● Reactivity

An issue: the possibility that the presence of the observer will affect people's behavior
-Can be reduced by concealed observation
-Can also be reduced by allowing time for people to become used to the observer and equipment (habituation)

1. Which graph(s) work best when the IV is categorical? What about when the IV is quantitative?

Bar Graph- Best if IV is categorical Line Graph- Used if IV is quantitative

♣ Cohen's Kappa

Cohen's Kappa
o The professor says this is the best method
o Allows you to determine whether the agreement observed is due to chance
o A Cohen's Kappa of .70 or more indicates acceptable interrater reliability
♣ A chance-corrected index of agreement between raters
♣ Ranges from 0 to 1; the higher the better
♣ Used when your categories are yes/no, e.g., whether the child was aggressive or not
♣ You cannot use Cohen's Kappa if the data are continuous
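
A sketch of kappa for two raters coding a yes/no category (e.g., "aggressive or not"); the counts are invented. Kappa is observed agreement minus chance agreement, divided by the maximum possible improvement over chance.

```python
# Cohen's kappa from a 2x2 agreement table (made-up counts for two raters).
def cohens_kappa(both_yes, yes_no, no_yes, both_no):
    n = both_yes + yes_no + no_yes + both_no
    p_observed = (both_yes + both_no) / n                # agreements / total
    # Chance agreement: product of each rater's marginal "yes" and "no" rates
    p_yes = ((both_yes + yes_no) / n) * ((both_yes + no_yes) / n)
    p_no = ((no_yes + both_no) / n) * ((yes_no + both_no) / n)
    p_chance = p_yes + p_no
    return (p_observed - p_chance) / (1 - p_chance)

kappa = cohens_kappa(12, 1, 1, 6)   # hypothetical tallies over 20 observations
print(round(kappa, 2))              # 0.78 here: >= .70, so acceptable reliability
```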

● Meta-analysis similar to? statistical tests can be done to? one adv? -Studies= -IV= -DV= Weighted Analysis should be done (2) Publication Bias? (2)

Combining or comparing findings across studies to get an overall picture of a behavior or phenomenon
■ Similar to a literature review
■ Statistical tests can be done to assess the size of the effect
■ One advantage is that you can find moderator variables by analyzing all studies on a subject
■ Studies = participants (1 study = 1 "participant")
■ Study characteristics (method, setting, etc.) = IVs
■ Effect size = DV
Weighted analysis should be done:
■ For studies with larger sample sizes (less error)
■ For studies with better methodology
-Across all these studies, do we find a reliable effect? Meta-analysis is a very good method for answering that
-It also points to moderating variables (methodology issues) that explain why a study did not find something. Ex: measuring verdicts on a continuous scale vs. a dichotomous scale changes the data
Publication bias?
■ Use both published and unpublished studies
-If two published studies find an effect but you run the study and do not find it, the unpublished null results may be Type II errors sitting in file drawers
■ Calculate the Fail-Safe N: indicates how many null findings would be necessary to make the observed effect non-significant; the more unpublished null studies it would take, the more confident we can be that the effect is real

Construct validity Internal validity External validity

Construct validity -concerns whether our methods of studying variables are accurate Internal validity -refers to the accuracy of conclusions about cause and effect External validity -concerns whether we can generalize the findings of a study to other populations and settings


Descriptive statistics Inferential statistics

Descriptive statistics -are used to describe the data by looking at the pattern of your data , the people who are responding, variance, spread, distribution etc. Inferential statistics -are used to infer past your data to see if there is a relationship between your variables above and beyond chance. -This tells you if you found an effect and if it's significant.

Examine Distribution Distribution of the data can be described using 2 properties:

Descriptive statistics allow researchers to make precise statements about the data
The distribution of the data can be described using 2 properties:
1. Central tendency: one number that represents the central score of your data set (mean/median/mode)
2. Variability: the number that gives the best description of the variation, i.e., how widely the distribution of scores is spread (standard deviation, range)
These 2 numbers summarize the information contained in a frequency distribution

● Sampling distribution

Distribution of every possible sample taken from the population Characteristics of the sample (mean, SD, etc.) Gathered to make inferences about the population distribution Related to "normal distribution" Sampling distribution is based on the assumption that the null is true All statistical tests rely on sampling distribution to determine the probability that the results are consistent with the null. When the obtained data are very unlikely according to null hypothesis expectations (usually a .05 probability or less), the researcher decides to reject the null and therefore to accept the research hypothesis

Entering the group

Entering the group
-Case and approach
-Gatekeepers can give you access
-Guides and informants can help you gain access to gatekeepers
You usually want to take notes if you can
-If covert, take notes at the end of the day
-How do you take notes? Record things covertly: take notes after a meeting or bring in a tape recorder; relying on memory creates problems
Analyze the data
-Do important themes emerge?
-ex: How do you enter the group? If it were a situation like being a narc entering a gang, your life could be in danger if they found out. How are disputes handled within the group? What are their goals?
CLASSIC EX: Festinger, "When Prophecy Fails"
-Researchers covertly infiltrated a small UFO religion in Chicago called the Seekers, which believed in an imminent apocalypse, and studied its coping mechanisms after the event did not occur
-They outlined themes and investigated how cognitive dissonance operates and is maintained in the group: members maintained their consistent level of thought even when the prophecy failed
-Festinger's theory of cognitive dissonance can account for the psychological consequences of disconfirmed expectations

ANOVA Example

Group 1 = Zoloft, m = 20
Group 2 = Paxil, m = 10
Group 3 = Valium, m = 30
F-ratio: F(2, 219) = 5.91, p = .02
Is this a significant F-ratio? Can you tell me where the differences lie?

1. Distinguish between Inferential statistics and Descriptive statistics.

Main difference: descriptive statistics look at the patterns in the data: who is responding, how many times, how many are male/female/trans, ethnicity, etc. They lay the groundwork for running the inferential tests; there are certain things we need to check to get an idea of what the data should look like.
Inferential statistics take the data and infer whether there is an effect/relationship above and beyond chance. Did we find an effect? Is it significant?

Evaluating Interrater Reliability

The main problem with observational research is lack of objectivity
You must establish the reliability of observations from multiple observers; this is very important
EX: in an aggression study, a single person coding all the data for you might interpret your coding scheme differently than you would

Descriptives - Measures of Central Tendency Median -Used if data are measured along what scale?

Median
◦ Central score in an ordered distribution
◦ Used if data are measured along an ordinal scale
◦ Order the scores from lowest to highest
◦ Find the middle score in an odd-numbered distribution
◦ Find the middle two scores and average them in an even-numbered distribution
-It is not affected by outliers
-It doesn't take into account all of the numbers in the distribution
-Used for skewed distributions
-Position: the ((n + 1)/2)th item
ex: 1, 2, 3, 7, 10, 11 -> (6 + 1)/2 = 3.5, so average the 3rd and 4th scores: (3 + 7)/2 = 5
ex: 1, 10, 13, 25, 35 -> middle score = 13
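
The two worked examples above can be checked directly; the stdlib applies the same odd/even rule.

```python
# Median for an even-numbered and an odd-numbered distribution (card examples).
from statistics import median

print(median([1, 2, 3, 7, 10, 11]))  # even n: average of 3 and 7 -> 5.0
print(median([1, 10, 13, 25, 35]))   # odd n: middle score -> 13
```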

Descriptives - Measures of Central Tendency Mode -Used if data are measured along what scale?

Mode
◦ Most frequent score in a distribution
◦ Used with nominal data (like gender or ethnicity)
◦ Limited application and value
◦ Distributions can be bimodal (these are the worst cases for the mode)
ex: 6, 6, 4, 3, 7, 4, 6 -> in general, how do people score on this scale? The mode (6) is a good number to use
ex: 6, 38, 78, 6, 45, 6, 52 -> the mode (6) is not a good number to use
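
The first card example can be checked in code; `multimode` also exposes the bimodal case mentioned above (the second list here is invented to show two modes).

```python
# Mode of the card's example, plus a bimodal illustration with made-up data.
from statistics import mode, multimode

print(mode([6, 6, 4, 3, 7, 4, 6]))  # 6 appears most often
print(multimode([1, 1, 2, 3, 3]))   # bimodal distribution: two modes, 1 and 3
```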

Nominal Scale Ordinal scale Interval scale Ratio scale -An important implication of interval/ratio scales is what?

Nominal scale (most IVs in experiments are nominal)
♣ Defines cases or types (qualitative/categorical)
♣ Variables differ in QUALITY, not QUANTITY (no numerical, quantitative properties)
♣ No order; no mathematical operations, just frequency (counting)
-The levels are simply different categories or groups
♣ Ex: gender (male/female); majors (psych/biology/English); ethnicity
• Do the two groups DIFFER on a quality? (gender)
Ordinal scale
♣ Defines cases or types that may be assigned a rank ordering (distance between intervals not known); exhibits minimal quantitative distinctions
♣ Still uses categorical labels
♣ Greater-than or less-than operations
♣ Ex: answers on a Likert scale (strongly agree, agree, disagree, strongly disagree); USDA quality grades of beef (good, choice, prime)
♣ Does one group have MORE of a quality than the other? (grade in school, knowledge)
• We can rank order the levels of the variable being studied from lowest to highest
• Ex: you might ask people to rank the most important problems facing your state today. If education is ranked first, health care second, and crime third, you know the order, but you do not know how strongly people feel about each problem
• The intervals between the items are probably not equal
Interval scale
♣ Spacing between ranked values is known
♣ Quantitative: indicates how much values differ
♣ The difference between 8 and 9 is the same as the difference between 76 and 77; the intervals between the levels are equal in size
♣ Ex: WAIS (Wechsler Adult Intelligence Scale); degrees Fahrenheit
♣ Zero point ≠ absence of the variable (e.g., temperature)
♣ Addition and subtraction operations
♣ How MUCH MORE of the quality does one group have? (8 vs. 18 on a depression scale)
-Ex: asking people to rate their mood on a 7-point scale ranging from "very negative" to "very positive"; there is no absolute zero point that indicates an absence of mood
Ratio scale
♣ Spacing between ranked values is known
♣ Quantitative
♣ Zero point = absence of the variable; ratio scales have both equal intervals and an absolute zero point that indicates the absence of the variable being measured
♣ Multiplication and division operations
♣ Ratios are equivalent: the ratio of 2 to 1 is the same as the ratio of 8 to 4
♣ Ex: weight, annual income in dollars, retention interval, length, hours spent studying for Research Methods, other physical measurements
♣ Does one group have TWO TIMES as much of a quantity? (60 hours donated is twice as much as 30 hours donated)
Interval/ratio scales
-An important implication of interval/ratio scales is that the data can be summarized using the mean, or arithmetic average.

Null Hypothesis - H0 : Alternative Hypothesis - H1 :

Null Hypothesis - H0:
-The null is "made up": a random-based, hypothetical distribution that we test our research (alternative) hypothesis against mathematically
-ex: men and women perform equally on a spatial reasoning task
-States simply that the population means are equal and that the observed difference is due to random error (the IV had no effect)
-The logic of the null is this: if we can determine that the null is incorrect, then we accept the research hypothesis as correct. Acceptance of the research hypothesis means that the IV had an effect on the DV
Alternative Hypothesis - H1:
-The research/original hypothesis; the opposite of the null
-Do I have support for this hypothesis? Collect data, analyze
-ex: women perform differently than men on a spatial reasoning task
-States that the population means are, in fact, not equal (the IV did have an effect)

Chapter 6: Observational Methods -Observational methods can be broadly classified as (2) Overview (5) Nonexperimental Research Types of Designs- Nonexperimental (5)

Observational methods can be broadly classified as primarily quantitative or qualitative.
Qualitative research focuses on people behaving in natural settings and describing their world in their own words
-Conclusions are based on interpretations drawn by the investigator
Quantitative research tends to focus on specific behaviors that can be easily quantified (counted)
-Conclusions are based upon statistical analysis of data
Overview:
-Types of non-experimental research
-Quantifying behavior
-Interrater reliability
-Content analysis
-Meta-analysis
Nonexperimental research
-Is when there is no manipulation of variables
-It only involves measurement
-Only produces descriptive or predictive knowledge
-Remember, no CAUSAL relationships can be established using nonexperimental research
-Correlational, no manipulation
-Not just surveys; also via observational reports
Types of nonexperimental designs:
-Naturalistic observation
-Ethnography
-Case history
-Archival research
-Content analysis

● Naturalistic observation Problem: examples

Observing subjects in their natural environment without manipulating any variables
-The goal of naturalistic observation is to provide a complete and accurate picture of what occurred in the setting, rather than to test hypotheses formed prior to the study
-To achieve this goal, the researcher must keep detailed field notes
-The data in naturalistic observation studies are primarily qualitative in nature; that is, they are descriptions of the observations themselves rather than quantitative statistical summaries
■ Mall, courtroom, day-care center, place of employment, etc.
■ ex: investigating aggression on playgrounds
● Code for different types of aggressive behavior (slapping, pushing, elbowing), but if the children are playing basketball and push, that is not aggression; count duration/frequency, beginning of week vs. end of week, male or female. No manipulations, but you can still look at correlations
Problem: observation changes behavior
■ You should try to be unobtrusive (one-way mirrors, blinds)
■ Habituation: letting subjects get used to observers or cameras
● Ex: couples' arguments in banks or grocery stores: what are they arguing about? Observation changes behavior, so don't let them know you are observing them (within ethical standards); in the lab we can use a one-way mirror or cameras
● Ex: at work, when the boss isn't around you talk, and when he walks by you start working

More than Two Groups

One-way ANOVA (Analysis of Variance)
-Uses the F ratio (a larger F ratio can lead to significant results)
-An extension of the t test
-Used for one categorical IV with 3 or more levels and a continuous DV
-A more general statistical test that can be used to ask whether there is a difference among 3 or more groups, or to evaluate the results of factorial designs (ex: Zoloft, Paxil, Valium, so 3 comparisons)
The F statistic is a ratio of 2 types of variance:
1. Systematic variance: deviation of the group means from the grand mean, or the mean score of all individuals in all groups
-It is small when the differences between groups are small and increases as the group mean differences increase
-It is the variability of scores between groups
2. Error variance: deviation of the individual scores in each group from their respective group means
-It is the variability of scores within groups
Look at the overall effect (omnibus test)
i. A p-value less than 0.05 only tells you that at least one of the means differs from the others. With 3 groups, you can make 3 comparisons; you want to know where the difference lies.
Look at the post hoc effects to see where the effect lies
i. You are basically doing 3 t-tests and outputting a p-value to see WHICH comparison is significant.
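
The two variance components above can be sketched directly. The group scores here are invented (three small "drug" groups in the spirit of the Zoloft/Paxil/Valium example); the function computes systematic variance over error variance.

```python
# F ratio = between-group (systematic) variance / within-group (error) variance.
from statistics import mean

def f_ratio(groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Systematic variance: deviation of group means from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)
    # Error variance: deviation of individual scores from their group means
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical scores, 3 groups
print(f_ratio(groups))                       # F with df (2, 6) = 3.0 here
```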

♣ Pearson Product-Moment Correlation used when data are? can test? correlation vs agreement ex:

Pearson Product-Moment Correlation
-Correlate the ratings of multiple observers using Pearson r
-Used when data are continuous
o Can test the statistical significance of the observed r
o Two sets of scores may correlate highly but still differ (i.e., high consistency, low agreement): check the means; if they are the same and Pearson r is high, then you can assume this is OK
Ex: counting how many times a child acted aggressively
• First observer: 1, 2, 3, 4, 5; second observer: 6, 7, 8, 9, 10; correlate the pairs 1/6, 2/7, 3/8, 4/9, 5/10
• If the two means are equal, don't worry about low agreement; you can trust the correlation
• If they aren't equal, the high correlation gives a false sense of agreement
-The Pearson r correlation coefficient is the appropriate way to describe the relationship between 2 variables with interval or ratio scales. Pearson r provides information about the strength of the relationship and the direction of the relationship.
-A correlation of 0.00 indicates no relationship between the variables.
-The nearer a correlation is to 1.00 (plus or minus), the stronger the relationship; the size indicates the strength.
-A 1.00 correlation is sometimes called a perfect relationship because the 2 variables go together in a perfect fashion.
-The sign of Pearson r tells us the direction of the relationship: whether it is positive or negative.
Ex: a coefficient of -.54 indicates a stronger relationship than a coefficient of +.45
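
The observer example above can be sketched exactly: the two count series correlate perfectly (high consistency) even though the observers never agree on a single number, which shows up in their different means.

```python
# High consistency, low agreement: the card's two-observer counts.
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) ** 0.5 *
                  sum((b - my) ** 2 for b in y) ** 0.5)

obs1 = [1, 2, 3, 4, 5]
obs2 = [6, 7, 8, 9, 10]              # always exactly 5 higher than observer 1
print(round(pearson_r(obs1, obs2), 6))  # rounds to 1.0: perfect consistency
print(mean(obs1), mean(obs2))           # 3 vs 8: unequal means, so low agreement
```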

♣ Percent agreement

Percent agreement
o A bad method, in the professor's opinion
o Simplest method
o Should be around 70%
o Can underestimate, if agreement is defined as an exact match
o Does not take chance into account
o Can overestimate: if behaviors happen a lot (or very little), then agreement is likely based on chance
Formula: (Total Number of Agreements / Total Number of Observations) x 100
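
The formula above in code, with invented tallies for two observers:

```python
# Percent agreement = agreements / total observations x 100 (made-up tallies).
def percent_agreement(agreements, total_observations):
    return agreements * 100 / total_observations

pa = percent_agreement(35, 50)  # two observers agreed on 35 of 50 intervals
print(pa)                        # 70.0: right at the conventional cutoff
```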

Issues Power of statistical test (2) Power is affected by (4) Too much power?

Power of a statistical test
-The ability to detect differences among groups
-The ability to correctly reject the null hypothesis
Power is affected by:
1. Alpha level
-More conservative = less power
-The lower the significance level, the more the data must diverge from the null hypothesis to be significant; therefore, the 0.01 level is more conservative than the 0.05 level. The Greek letter alpha (α) is sometimes used to indicate the significance level.
-With a less conservative alpha you need less evidence to say there is an effect, but you also make more errors
2. **Sample size**
-Larger samples = more power (the sample gives you better eyesight / better glasses)
3. Effect size
-The manipulation of your IV; the actual effect in the population (power calculations)
-Bigger effect sizes = more power
4. Direction of the statistical test
-One-tailed = more power; two-tailed = less power (looking for a difference in one direction gives better eyesight than being told to look in two directions)
Too much power?
-Finding a statistical effect doesn't mean that it happens in the real world
-If you have a significant t, z, or F value, so what? You also need to "interpret" the effect, to put it into words
-Just like interpreting the sign of the correlation, but here you have 2 means to report

Probability

Probability is the likelihood of the occurrence of some event or outcome. We want to specify the probability that an event (in this case, a difference between means in the sample) will occur if there is no difference in the population.

Quantifying Behavior (2 methods)

Quantifying behavior: turning behavior into mathematical pieces of information
o The more you code for, the more error you will have
Frequency method
o Record the frequency with which a behavior occurs within a certain time period
o Ex: how many times does a child exhibit aggressive behavior? It's a number
Duration method
o Record how long a behavior lasts
o Ex: a couple's dispute: they fought for this amount of time

Quantitative vs. Qualitative

Quantitative Theory: -Expressed in # -Formulas used to describe behavior -Variables are given a certain mathematical weight to predict a certain outcome Qualitative Theories: -Relationships between variables are expressed in words. -Information can be ranked but no specific measures can be given.

● Systematic observation

Refers to the careful observation of one or more specific behaviors in a particular setting
-This is much less global than naturalistic observation research
Ex: Bakeman and Brownlee were interested in the social behavior of young children. 3-year-olds were videotaped in a room in a "free play" situation, and each child's behavior was coded every 15 seconds using a coding system
-We set up the study so that we eliminate or reduce bias, using decision rules established ahead of time that reduce inferences. A decision rule is a procedure set in place before we begin data collection.

● Ethnography

Researcher becomes part of the group they are observing
Used in anthropology to study different cultures
Participant vs. Nonparticipant Observation / Member vs. Nonmember
-Participant Observation/Member: you actually become part of the group, so people believe you belong (think of an undercover "narc")
-Nonparticipant Observation/Nonmember: people know you're not in the group
Overt vs. Covert
-Consider ethical constraints (no consent form can be administered)
-Overt: you take notes and people know it
-Covert: they don't know you're taking notes; the lack of consent raises ethical problems (and how do you record your data?), but it yields more accurate data because people act naturally

Significance Level

Researchers traditionally use a .05 or a .01 significance level in the decision to reject the null hypothesis
The significance level specifies the probability of a Type I error: rejecting the null hypothesis when it is actually correct
The significance level chosen, and how the consequences of a Type I or Type II error are weighed, are determined by the use of the results

What is meant by statistical significance? How is it different from practical significance?

Statistical vs. Practical Significance
A statistically significant effect is not likely due to chance
--This does not mean that the difference is important
--A finding may have practical significance if it is large enough to be meaningful
Sometimes practical significance can be small but still relevant
--Consider an effect size of .034 for aspirin on heart attacks. This is small, but it translates to 3.4%. If 750,000 people have heart attacks, that translates to 25,500 people.
Sometimes practical significance can be large but not relevant
--e.g., implementing very costly interventions or equipment in everyday life

Statistical significance/p value marginal means cell means

Statistical significance - an effect is unlikely to be due to chance
-The p value is the probability of obtaining data as extreme as those observed, assuming the null hypothesis is true
-You compare the p value to alpha; if it is less than .05 you have a significant effect
-But a significant test only tells you that at least one of the means differs from another
-So you have to figure out what the effect is: look at the means and the differences between them
-t/F ratio: numerator = systematic variance (explained by your manipulation); denominator = error variance (unexplained)
-e.g., one group was more likely to find the defendant guilty (report the group means)
Marginal means represent main effects; cell means (the combined effects) are where you look for an interaction
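The marginal-vs.-cell-means distinction can be sketched on a hypothetical 2x2 design. The IV levels ("expert"/"control" testimony, "standard"/"enhanced" instructions) and all guilt ratings below are invented for illustration:

```python
# Sketch: cell means vs. marginal means for a hypothetical 2x2 design.
from statistics import mean

# guilt ratings keyed by (level of IV1, level of IV2); all data made up
cells = {
    ("expert", "standard"): [7, 8, 6, 7],
    ("expert", "enhanced"): [5, 4, 5, 6],
    ("control", "standard"): [6, 7, 6, 7],
    ("control", "enhanced"): [6, 6, 7, 5],
}

cell_means = {key: mean(scores) for key, scores in cells.items()}

def marginal(level, position):
    """Marginal mean: pool all scores at one level of one IV, ignoring the other IV."""
    pooled = [s for key, scores in cells.items() if key[position] == level
              for s in scores]
    return mean(pooled)

print(cell_means)                                     # look here for an interaction
print(marginal("expert", 0), marginal("control", 0))  # main effect of IV1
```

The two marginal means answer "is there a main effect of testimony?", while the four cell means show whether the testimony effect changes across instruction conditions, i.e., an interaction.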

● Coding system

The researcher must decide which behaviors are of interest, choose a setting in which the behaviors can be observed, and, most important, develop a coding system to measure the behaviors
Ex: a systematic-observation coding system for children:
○ Unoccupied: the child is not doing anything in particular or is simply watching other children
○ Solitary play: the child plays alone with toys but is not interested in or affected by the activities of other children

To use a statistical test, you must first ... You must also specify the

To use a statistical test, you must first specify the alternative/research hypothesis, decide whether the test is one-tailed or two-tailed (the null changes depending on which), and then state the null hypothesis that you are evaluating. You must also specify the significance level that you will use to decide whether to reject the null; this is the alpha level. As noted, researchers generally use a significance level of .05.

• Dummy Coding

Turning categorical variables into numerical values ♣ tell SPSS the levels
-If you have a dichotomous variable, use 0 and 1. With 0/1 coding you can take the average and intuitively understand it:
-If your average is 0.78, your sample is mostly female (78% female and 22% male)
-If your variable has more than 2 levels, just use 1, 2, 3, 4, etc.
EX:
o Male = 0, Female = 1
o Caucasian = 1, Hispanic = 2, African American = 3, Asian = 4
o Freshman = 1, Sophomore = 2, Junior = 3, Senior = 4
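The "mean of a 0/1 variable is a proportion" point can be sketched in a few lines (the sample below is hypothetical):

```python
# Dummy-coding sketch: the labels and sample below are made up for illustration.
sex = ["F", "F", "M", "F", "M", "F", "F", "F", "M", "F"]

codes = [1 if s == "F" else 0 for s in sex]   # Male = 0, Female = 1

# With 0/1 coding, the mean of the codes is the proportion coded 1
proportion_female = sum(codes) / len(codes)
print(proportion_female)   # 0.7 -> the sample is 70% female
```

Note this intuition only works for the 0/1 case; with 1, 2, 3, 4 codes for multi-level variables, the mean of the codes has no such direct interpretation.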

Distinguish between a Type I error and a Type II error. Why is your significance level the probability of making a Type I error? How do you correct for a Type I error?

Type 1
● Alpha = the probability of a Type 1 error (saying there is an effect when there is no effect: a false positive, the worst mistake)
● Type 1 error example: a drug that supposedly reduces heart attacks; an at-risk patient takes it, has a heart attack, and the drug doesn't help. That is the worst kind of mistake.
● We control for this by setting alpha at a low number, .05
● If we ran the study 100 times the same exact way, 5 times out of 100 we would get a false positive
Type 2
● A Type 2 error is less serious
● Saying there is no effect when there really is one

Type 1 and type 2 errors

Type 1: "False Alarm"
-False positive, the worst mistake
-Finding a significant mean difference between the groups in the study when there really isn't a difference between the groups in the population
-Saying there is an effect when there is no effect
Example: a drug that supposedly reduces heart attacks; an at-risk patient has a heart attack and the drug doesn't help
● Alpha = the probability of a Type 1 error; we control for it by setting alpha at a low number, .05
● If we ran the study 100 times the same exact way, 5 times out of 100 we would get a false positive
Type 2: "Miss"
-False negative, not as bad as Type 1
-Finding no difference between the groups in the study when there really is a difference between the groups in the population
-Saying there is no effect when there really is one
● Beta is set at .2 (a 20% chance)
● Beta compared to alpha: statistically we are more willing to accept a Type 2 error (20%) than a Type 1 error (5%)
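The "5 times out of 100" claim can be checked by simulation: draw both groups from the same population (so the null is true) and count how often a t test still comes out "significant". This is a hypothetical sketch; the critical value |t| ≈ 2.0 is an approximation to the .05 two-tailed cutoff for this sample size.

```python
# Sketch: with the null TRUE, about alpha (5%) of experiments still come out
# "significant". Each such rejection is a Type 1 error (a false alarm).
import random
from statistics import mean, stdev

def t_stat(a, b):
    """Pooled-variance t for two equal-size independent samples."""
    n = len(a)
    sp2 = (stdev(a) ** 2 + stdev(b) ** 2) / 2
    return (mean(a) - mean(b)) / (2 * sp2 / n) ** 0.5

random.seed(2)
false_alarms = 0
reps = 2000
for _ in range(reps):
    g1 = [random.gauss(0, 1) for _ in range(30)]   # both groups drawn from
    g2 = [random.gauss(0, 1) for _ in range(30)]   # the SAME population
    if abs(t_stat(g1, g2)) > 2.0:                  # approx. .05 two-tailed cutoff
        false_alarms += 1

print(false_alarms / reps)   # close to .05
```

Lowering the cutoff's alpha (e.g., to .01) would shrink this false-alarm rate, which is exactly what "setting alpha at a low number" buys you.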

Chapter 12: Describing Data Overview: -Types of data analysis -Exploratory data analysis -Techniques Research Process (7)

Types of data analysis
• Descriptive vs. inferential
Exploratory data analysis
• Normal distribution
• Graphs
• Frequency histograms
Techniques
• Central tendency
• Variability
• Measures of association
Research Process
1. Developing an idea and a hypothesis
2. Choosing an appropriate research design
3. Choosing an appropriate subject population
4. Conducting a study
5. Analyzing data
6. Reporting results
7. Start again. Repeat, repeat, repeat!

Nonparametric Statistics

Used if you have categorical data
Chi-square: used when your DV is dichotomous (e.g., yes/no, guilty/not guilty) and your IV is categorical (e.g., expert testimony vs. control group)
Contingency (frequency) tables: compare observed cell frequencies with expected cell frequencies
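The observed-vs.-expected comparison can be sketched on a hypothetical 2x2 contingency table (testimony condition by verdict; all counts below are invented):

```python
# Chi-square sketch for a hypothetical 2x2 contingency table:
# rows = IV (expert testimony vs. control), columns = DV (guilty / not guilty).
observed = [[30, 20],    # expert testimony
            [18, 32]]    # control

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand   # E = (row x col) / N
        chi_sq += (obs - expected) ** 2 / expected         # sum of (O - E)^2 / E

print(round(chi_sq, 3))   # compare to the df = 1 critical value (3.84 at alpha = .05)
```

The expected frequencies are what the cells would look like if verdict were independent of condition; chi-square grows as the observed counts diverge from that independence baseline.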

Descriptives - Measures of Variability

Variability or Spread
-Measures how close to the center the variable's values are
◦ How spread out the scores are, using the range, interquartile range, variance, or standard deviation
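The four spread measures named above can be computed directly from a small made-up sample (quartile values depend slightly on the interpolation method; this uses Python's default):

```python
# Sketch: common measures of spread for a small hypothetical sample.
from statistics import variance, stdev, quantiles

scores = [4, 8, 6, 5, 3, 7, 9, 5]

spread_range = max(scores) - min(scores)   # range: max minus min
q1, q2, q3 = quantiles(scores, n=4)        # quartile cut points
iqr = q3 - q1                              # interquartile range: middle 50%
print(spread_range, iqr, variance(scores), stdev(scores))
```

The standard deviation is just the square root of the variance, which puts the spread back into the original units of the scores.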

For each test: what are the criteria for each statistical test that was covered in lecture (think: type of IV, type of DV, type of design)? Don't worry about chi-square

What kind of IV do you have?
◦ Categorical or continuous
♣ If the IV is continuous, the answer is to run a regression (y = mx + b)
♣ All other tests require a categorical IV
• Guilty/not guilty = dichotomous
• IQ, temperature, Likert scale, level of depression = all continuous
Is there 1 IV or are there 2 IVs?
◦ t test: only 1 IV with 2 levels; the IV must be categorical and the DV must be continuous
♣ Between/independent samples
♣ Within/non-independent samples: everybody experiences both levels, so the groups are dependent on each other
Is the DV categorical, continuous, or dichotomous?
◦ The majority of tests require a continuous DV
◦ If the DV is categorical, use chi-square
What kind of design?
◦ Between
◦ Within
◦ Mixed (requires 2 IVs)
◦ Matched and within are considered the same thing in statistics
If you have 2 IVs, rule out the t test and the one-way ANOVA; it must be a factorial ANOVA
o 2 IVs, both categorical, each with possibly multiple levels; the DV is continuous
o Design? Between, within, or mixed: now you know which test you have to run

Two Sample Tests

When you want to compare 2 groups (e.g., treatment and control) and your DV is continuous
-Commonly used to examine whether the 2 groups are significantly different from each other
Two-Sample Tests
1. t test for independent samples: used when subjects were randomly assigned to your two groups
-Between-subject designs
2. t test for non-independent samples: used when the samples are not independent
-Within-subject and matched-pairs designs
Practice: you run a t test to compare the means of Group 1 and Group 2 and find that t = 2.99, p = .06
1. What is the null hypothesis for this test?
2. Is the difference between Group 1 and Group 2 statistically significant? 2b. How can you tell?
3. In reality Group 1 and Group 2 are the same (the means are not different). Have you made an error or a correct decision? If you made an error, what type?
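An independent-samples t can be computed by hand from two small hypothetical groups (equal group sizes assumed, pooled-variance formula):

```python
# Independent-samples t sketch (hypothetical scores; equal group sizes assumed).
from statistics import mean, stdev

treatment = [12, 14, 11, 15, 13, 12]
control = [10, 11, 9, 12, 10, 11]

n = len(treatment)
pooled_var = (stdev(treatment) ** 2 + stdev(control) ** 2) / 2
t = (mean(treatment) - mean(control)) / (2 * pooled_var / n) ** 0.5

df = 2 * n - 2   # degrees of freedom for two independent groups
print(round(t, 2), df)   # compare |t| to the critical value for df = 10
```

The numerator is the difference between the group means (systematic variance) and the denominator is the standard error built from within-group spread (error variance), which mirrors the systematic/error ratio described for statistical significance.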

● Content analysis Rules (3) Observers must be ?

You analyze spoken or written records for the occurrence of specific categories of events (e.g., a word or phrase)
Can be done on TV shows, magazines, interactions between couples in therapy, etc.
The systematic analysis of existing documents; it requires researchers to devise coding systems that raters can use to quantify the information in the documents
Rules:
■ Do not look at the materials before creating the coding system, so you go in with an unbiased perspective
■ Be sure to create rules about what you are observing
■ Be sure to create operational definitions (violence, fighting, etc.)
Observers doing content analysis must be blind. Why should they be blind?
■ Because you don't want them to know what you are looking for, or to know what each other is coding for
■ Once complete, they bring their coding together and note where they agree and disagree
■ Then they repeat the process until they reach interrater reliability
Materials to be analyzed should be chosen carefully to increase generality
Mainly descriptive

● Archival research

You use existing records (e.g., police records, prison records, medical records) as your source of data
The trick is to gain access
A really great way to get real-life data
The use of archival data allows researchers to study interesting questions, some of which could not be studied in any other way. Archival data are a valuable supplement to more traditional data collection methods.

Possible exam question: given the mean, median and mode- what is type of distribution do you have (negatively skewed, normal, positively skewed)?

a. Mean < median < mode: negatively skewed
b. Mean = median = mode: normal (no skew)
c. Mean > median > mode: positively skewed
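The three orderings above translate directly into a small classifier (this is the card's simple ordering rule; real data can be messier and may not fall into a clean ordering):

```python
# Sketch: classify a distribution's skew from its mean, median, and mode,
# using the simple ordering rule from the card.
def skew_type(mean, median, mode):
    if mean < median < mode:
        return "negatively skewed"
    if mean == median == mode:
        return "normal (no skew)"
    if mean > median > mode:
        return "positively skewed"
    return "no clean ordering"

print(skew_type(40, 50, 60))   # negatively skewed
print(skew_type(50, 50, 50))   # normal (no skew)
print(skew_type(60, 50, 40))   # positively skewed
```

A quick memory aid: the mean gets dragged toward the tail, so whichever side the mean sits on relative to the mode is the side the tail is on.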

categorical- continuous - dichotomous -

categorical - the color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier)
continuous - IQ, temperature, height, weight, age; with a continuous IV the answer is to run a regression (y = mx + b)
dichotomous - only 2 possible answers (guilty/not guilty, yes/no)

bimodal distribution

A bimodal distribution is a continuous probability distribution with two different modes. These appear as two distinct peaks (local maxima) in the probability density function.

● Family-wise error rate

The probability of making one or more false discoveries (Type I errors) when performing multiple hypothesis tests.
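For independent tests each run at alpha, the family-wise error rate follows the standard formula FWER = 1 − (1 − α)^m, which shows how quickly the error rate inflates as the number of tests m grows:

```python
# Sketch: family-wise error rate for m independent tests, each at alpha = .05,
# using FWER = 1 - (1 - alpha)^m.
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(m, round(fwer, 3))   # e.g., 20 tests give roughly a 64% chance
                               # of at least one false positive
```

This is why running many uncorrected tests is dangerous, and why corrections (e.g., dividing alpha by the number of tests) exist.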

Understand what effect sizes are and be able to interpret effect size.

o An effect size describes the magnitude of an effect (e.g., the standardized size of the difference between two means), independent of sample size
o Common interpretation for Cohen's d: roughly .2 = small, .5 = medium, .8 = large
o Bigger effect sizes = more power
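Cohen's d for two independent groups can be sketched as the mean difference divided by the pooled standard deviation (hypothetical scores; pooled-SD version for equal group sizes):

```python
# Cohen's d sketch (hypothetical scores; pooled-SD formula for equal ns).
from statistics import mean, stdev

group1 = [12, 14, 11, 15, 13, 12]
group2 = [10, 11, 9, 12, 10, 11]

pooled_sd = ((stdev(group1) ** 2 + stdev(group2) ** 2) / 2) ** 0.5
d = (mean(group1) - mean(group2)) / pooled_sd
print(round(d, 2))   # interpret against ~.2 small, ~.5 medium, ~.8 large
```

Because d is expressed in standard-deviation units, it can be compared across studies that used different measurement scales, which is what makes it useful for power calculations and meta-analysis.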

Exploratory Data Analysis ◦ Descriptive Statistics - Search for patterns in the data (3)

o Are the data normally distributed?
-Important because many statistical analyses assume that the data are normally distributed
-A nice bell-shaped curve that is symmetric, with half of the population above the mean and the other half below it
-Mean, median, and mode are all equal to the same number
-Frequency distributions tell you the distribution of your data for one variable; a frequency distribution is a histogram turned on its side
-Positively skewed graph --Ex: income; very few people make a lot of money
-Negatively skewed graph --Ex: letter grades at UF; very few people make low grades such as D's and E's
o Variance in the data: how spread out are the scores? (more spread = more error variance)
o Outliers
-Abnormally high or low scores
-More than 2 standard deviations away from the mean is suspicious; a score 3 SDs away from the mean is definitely an outlier
-If you remove outliers you may find an effect; if you don't remove them you may not find the effect
-But just because scores are unusual doesn't mean you should remove them from your data set
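The 2-SD outlier screen described above can be sketched on a small made-up sample:

```python
# Sketch: flag outliers as scores more than 2 SDs from the mean
# (use 3 SDs for a stricter rule). The scores are hypothetical.
from statistics import mean, stdev

scores = [52, 55, 49, 51, 53, 50, 54, 90]   # 90 looks suspicious

m, s = mean(scores), stdev(scores)
flagged = [(x, round(abs(x - m) / s, 2))    # score and its distance in SDs
           for x in scores if abs(x - m) > 2 * s]
print(flagged)
```

One caveat worth remembering: an extreme score inflates the very mean and SD used to screen it, so with large outliers it can help to inspect the data visually as well.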

reliability interrater reliability

reliability - refers to the consistency or stability of a measure of behavior
interrater reliability - the extent to which raters agree in their observations. Thus, if two raters are judging whether behaviors are aggressive, high interrater reliability is obtained when most of the observations result in the same judgment. A commonly used indicator of interrater reliability is Cohen's kappa.
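Cohen's kappa corrects raw agreement for the agreement two raters would reach by chance alone. A minimal sketch on made-up codes ("agg" = aggressive, "play" = playing), using the standard formula kappa = (observed − expected) / (1 − expected):

```python
# Cohen's kappa sketch for two raters' categorical judgments (made-up codes).
from collections import Counter

rater1 = ["agg", "agg", "play", "play", "agg", "play", "play", "agg", "play", "play"]
rater2 = ["agg", "play", "play", "play", "agg", "play", "play", "agg", "play", "agg"]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n   # raw agreement

# chance agreement from each rater's marginal proportions
c1, c2 = Counter(rater1), Counter(rater2)
expected = sum(c1[k] * c2[k] for k in c1) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))
```

Here the raters agree on 80% of observations, but because both code "play" often, chance alone predicts 52% agreement, so kappa lands well below the raw 80%.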

Background information Error variance ex: Confounding variable

What causes error variance? Extraneous variables. Another term is unexplained variance.
How a design seeks to handle error variance is key.
EX: How well you do on a test
-Study time: one group studies 10 hours, one group studies 20
-Sleep, stress, study techniques, intelligence, etc. are all noise that affects the scores, making it hard to see the difference
-There is nothing we can do to eliminate error variance; it is always present, but we need to reduce it
-Individual differences will always be present
Confounding variable - when we actually know that an uncontrolled third variable is operating, we call that third variable a confounding variable

Analyzing Data - Frequency Distribution

• Indicates the number of individuals who receive each possible score on a variable
• Frequency distributions are familiar to most college students: they tell how many students received a given score on an exam. Along with the number of individuals associated with each response or score, it is useful to examine the percentage associated with that number.
◦ Can take the form of a table or a graph
◦ Graphically, a frequency distribution is shown on a histogram for quantitative variables (or a bar chart for categorical variables)
◦ Take each person's score (e.g., the ethnicity each person marked, such as Caucasian or African American) and tally the distribution of responses; it is basically a histogram turned on its side
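A frequency table with counts and percentages can be built in a few lines (the category labels and responses below are hypothetical):

```python
# Frequency-distribution sketch: counts and percentages for one variable.
from collections import Counter

responses = ["Caucasian", "Hispanic", "Caucasian", "Asian",
             "Caucasian", "Hispanic", "African American", "Caucasian"]

freq = Counter(responses)
n = len(responses)
for category, count in freq.most_common():
    print(f"{category}: {count} ({100 * count / n:.1f}%)")
```

Plotting these counts as bars gives the bar chart (categorical variable) or histogram (quantitative variable) described above.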

Which type of error do scientists place more emphasis on not committing?

• Type 1
• Ex: a drug that supposedly reduces heart attacks; an at-risk patient has a heart attack and the drug doesn't help. That is the worst kind of mistake.

Graphing Data IV on which axis? , DV on which axis? predictor/criterion? how do graphs help more instead of looking at raw data?

• Typically the IV goes on the x-axis and the DV on the y-axis
• (predictor/IV) on the x-axis; (criterion/DV) on the y-axis
• It is easier to see patterns in a graph than in raw data
• Graphs help when reporting your results
• A graph shows the distribution of all the scores on your DV; listing everyone's score is unwieldy with hundreds of observations, so the easiest way is to graph them
• When researchers are interested in predicting some future behavior (called the criterion variable) on the basis of a person's score on some other variable (called the predictor variable), it is necessary to demonstrate that there is a reasonably high correlation between the criterion and predictor variables

Distinguish between the null hypothesis and the research hypothesis. When does the researcher decide to reject the null hypothesis?

○ All statistical tests rely on a sampling distribution to determine the probability that the results are consistent with the null hypothesis. When the obtained data are very unlikely according to null hypothesis expectations (usually a probability of .05 or less), the researcher decides to reject the null and therefore to accept the research hypothesis.
○ We reject the null when we find a very low probability that the obtained results could be due to random error. This is what is meant by statistical significance: a significant result is one that has a very low probability of occurring if the population means are equal. More simply, significance indicates that there is a low probability that the difference between the obtained sample means was due to random error. Significance, then, is a matter of probability.

■ Content analysis- Example:

○ Example:
■ A content analysis of children's books in 1972 revealed that boys were depicted as active and outdoorsy while girls were depicted as passive and mostly indoors. Researchers believe that this may influence self-esteem and identity issues in girls.
■ So maybe you go into a children's library, look in the fiction section, and count how many superheroes are male or female; if only 3 are women, the books carry themes that characterize the genders in ways that help people maintain stereotypes.

○ Meta-Analysis Example

○ Meta-Analysis Example
■ Verbal Overshadowing
■ Popular in the 1990s
■ Schooler (1990) found the effect in 6 different studies
■ A meta-analysis found a modest verbal overshadowing effect across 29 studies and over 2,000 participants (Meissner & Brigham, 2001)

● Case history/study

○ You observe and report on a single case, because some conditions are so rare that you can't study large groups, so you study individuals
○ Psychological disorders
○ An observational method that provides a description of an individual
○ A naturalistic observation is sometimes called a case study, and the two approaches overlap
○ A case study may also be a description of a patient by a clinical psychologist, or a historical account of an event such as a model school that failed
Advantages:
-Provides a good way to generate hypotheses
-Yields data that other methods can't provide
Disadvantages:
-Sometimes gives incomplete information
-Sometimes relies only on self-report data, which can be misleading
-Can be subjective and thus may yield biased results
-Doesn't allow conclusions about cause-and-effect relationships
Ex: H.M. had surgery that removed part of the brain known as the hippocampus to alleviate severe symptoms of epilepsy
■ He became the most studied medical case in history, and on his death his brain was dissected and preserved, keeping his tragically unique brain for posterity

● Alpha level

○ The probability of a Type 1 error
○ To control for Type 1 error, set alpha at a low number, .05
○ If we ran the study 100 times the same exact way, 5 times out of 100 we would get a false positive
○ The lower you set alpha, the more control you have and the less likely you are to commit a Type 1 error
○ Alpha is the chance of making a Type 1 error: usually set at .05, a 5% chance of saying there's an effect when there really isn't

● Power

○ Power is the probability of finding an effect when it exists
○ Set power to .8 (beta = .2, alpha = .05)
○ Certain tests give you a greater likelihood of finding an effect (any test on continuous data will have more power compared to chi-square)

● Beta

○ Beta is set at .2 (a 20% chance of a Type 2 error)
○ Beta compared to alpha: statistically we are more willing to accept a Type 2 error (20%) than a Type 1 error (5%)

1. Bar graph best if IV is? DV is?

◦ Best if the IV is categorical
◦ Y-axis = DV; X-axis = IV
◦ The bars don't touch each other because what's on the x-axis is categorical (the y-axis shows some descriptive statistic); you can't use a bar graph with a continuous IV
◦ A separate and distinct bar for each piece of information

2. Line graph IV is? used to display? ex:

◦ Connects the data points with a line
◦ Used if the IV is quantitative
◦ Used to display functional relationships
♣ When the value of the dependent variable varies as a function of the independent variable (e.g., salary as a function of years of education)
■ Line graphs are used when the IV represented on the horizontal axis is quantitative; that is, the levels of the IV are increasing amounts of that variable (not differences in category)

3. Scatterplot type of variables?

◦ Used to plot the relationship between two continuous variables
◦ The value of one variable is represented on the x-axis and the value of the other on the y-axis
◦ Shows a correlation
◦ Tells you if there's a strong positive relationship (tightly packed points indicate homogeneous variance)

4. Pie chart

◦ Used to represent proportions or percentages
◦ Particularly useful when representing nominal-scale information
◦ Most commonly used to depict simple descriptions of categories for a single variable, which is how psychology usually uses it

◦ positive skew ◦ Negatively skew

◦ Positive skew
♣ Classic example: income (a few people have very high incomes)
♣ Mean is highest, median slightly lower, mode is lowest
♣ The tail is on the right (the scores trail off toward the high end)
◦ Negative skew
♣ Example: grades at UF (most people score A's and B's; a few score lower)
♣ Mean is lowest, median higher, mode is highest
♣ The tail is on the left (the scores trail off toward the low end)

Observational Research

♣ The goal is to measure naturally occurring behaviors
♣ A behavioral category (or coding scheme) specifies the general and specific behaviors to be observed
♣ Most important: categories must be carefully (operationally!) defined by the researcher before observation begins
♣ This is crucial for replication and for critique/analysis
-Observational research hinges on the quality of your system of observation
o Say we are going to observe aggression in children during recess: how will you operationally define aggression?

