PY 211 Test 2


CI practice: Film and alcohol attitudes: M = 70, SE = 2, 95% CI. Reaction time: M = 1.5, SE = 0.1, 99% CI.

(2)(1.96) = 3.92
70 + 3.92 = 73.92, upper limit
70 - 3.92 = 66.08, lower limit
95% CI: 66.08 - 73.92
We are 95% confident that the mean score of people who have seen this film falls between 66.08 and 73.92. Based on this information, if the true population mean were 70, 95% of the time sample means would fall between raw scores of 66.08 and 73.92.
(0.1)(2.58) = 0.258
1.5 - 0.258 = 1.24, lower limit
1.5 + 0.258 = 1.76, upper limit
99% CI: 1.24 - 1.76
Based on this information, if the true population mean were 1.5, 99% of the time sample means would fall between raw scores of 1.24 and 1.76.
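The arithmetic above can be checked with a short Python helper (the numbers are the ones from this practice problem):

```python
def confidence_interval(mean, se, z):
    """Return (lower, upper) confidence limits: mean +/- z * SE."""
    margin = z * se
    return mean - margin, mean + margin

# 95% CI for the film-attitude scores: M = 70, SE = 2, z = 1.96
lo95, hi95 = confidence_interval(70, 2, 1.96)     # (66.08, 73.92)

# 99% CI for reaction time: M = 1.5, SE = 0.1, z = 2.58
lo99, hi99 = confidence_interval(1.5, 0.1, 2.58)  # (1.242, 1.758)
```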

Confidence Intervals

-Interval estimates: confidence limits; 95% confidence interval; 99% confidence interval
How to figure:
1. Figure the standard error: σM = √σ²M = √(σ²/N)
2. Figure the raw scores 1.96 standard errors (two-tailed p = 0.05, 95% CI) or 2.58 standard errors (two-tailed p = 0.01, 99% CI) above and below the sample mean, M.
-Build the interval around the sample's own mean, M! Example: the general population has a mean of 5 and the individual scored a 2; you would use the 2.
Example: σM = 1. Raw scores: M ± (1)(1.96) = 95% confidence interval

2. Describe how we determine if a result is "statistically significant" and the impact of this decision on null and research hypotheses.

Statistically significant: when a sample score is so extreme (beyond our cutoff score) that the null is rejected. The biggest complaint about significance tests is that they are often misused.
Example: cutoff = 1.7, sample = 2.3. Reject or retain the null? REJECT -> results support the research hypothesis; our sample score is more extreme than the cutoff score.
Example: cutoff = +/- 2, sample = -1.2. Reject or retain the null? RETAIN (fail to reject) -> results are inconclusive; our sample score was not more extreme than the cutoff score.

More on confidence intervals

-95% confidence interval: the area in a normal curve on each side between the mean and the z score includes 47.5% (47.5 + 47.5 = 95); 2.5% to the tail = z of 1.96. The 95% confidence interval runs from z = -1.96 to z = +1.96.
-99% confidence interval: the area on each side between the mean and the z score includes 49.5%; 0.5% to the tail = z of 2.58. The 99% confidence interval runs from z = -2.58 to z = +2.58.
How to figure:
1. Find the standard error: σM = √σ²M = √(σ²/N) = σ/√N
2. Figure the raw scores for the z score given your confidence interval:
 -95% CI -> raw scores 1.96 standard errors above and below the sample mean
 -99% CI -> raw scores 2.58 standard errors above and below the sample mean
-Multiply 1.96 or 2.58 by the SE; subtract this number from the mean for the lower limit and add it to the mean for the upper limit: M ± (SE)(1.96) or M ± (SE)(2.58)

Hypothesis testing with a dist. of means. What would the comparison distribution be and how would you find your sample score to compare to the cutoff score?

-A distribution of means is the comparison distribution when a sample has more than one individual.
-The Z score of a sample's mean on a distribution of means is Z = (M - μM) / σM, i.e., (sample's mean - the distribution of means' mean) / the standard deviation of that distribution of means.
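A minimal sketch of this formula in Python. The population values (μ = 60, σ = 10, N = 4, so σM = 5) come from the practice item later in these notes; the sample mean of 65 is a made-up number for illustration:

```python
import math

def z_for_sample_mean(m, mu_m, sigma, n):
    """Z = (M - mu_M) / sigma_M, where sigma_M = sigma / sqrt(N)."""
    sigma_m = sigma / math.sqrt(n)
    return (m - mu_m) / sigma_m

# mu = 60, sigma = 10, samples of N = 4 -> sigma_M = 5
# A hypothetical sample mean of 65 sits one standard error above mu:
z = z_for_sample_mean(65, 60, 10, 4)  # 1.0
```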

Effect size

-The amount that two populations do not overlap; the distance of one population's mean (μ) from another's.
-Effect: the difference between the sample mean and the population mean stated in the NH.
 -Insignificant when the null is retained (H0 supported, H1 inconclusive)
 -Significant when the null is rejected (H1 supported)
-Effect size: the size of an effect in a population; how far scores have shifted in the population; the percent of variance that can be explained by a given variable.
-The amount of overlap is influenced by the predicted mean difference and the population standard deviation.
-A standardized effect size adjusts the difference between means for the standard deviation.

Factors that decrease standard error, law of large numbers

-As the population standard deviation decreases, standard error decreases.
-Example: samples of size 2 (N = 2) selected from one of five populations with σ = 4, 9, 16, 25, or 81.
-If the standard deviation is low, the standard error is low; if the standard deviation is high, the standard error is high, EVEN WITH the sample size held the same.
-A smaller standard deviation lets us be more confident in our predictions, our data results, and the conclusions we come to based on our hypothesis testing.
-As sample size increases, standard error decreases.
-LAW OF LARGE NUMBERS: increasing the number of observations (sample size) decreases standard error. The smaller the SE, the closer a distribution of sample means will be to the population mean. The bigger our sample, the closer we approximate the overall population.
-Example: samples of size N = 4, 9, 16, 25, or 81, with the population standard deviation held at 4 for all samples.
-As sample size increases, standard error decreases; if sample size decreases, SE increases.
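The relationship σM = σ/√N behind both examples can be demonstrated directly (a sketch; σ = 4 is the value held constant in the second example above):

```python
import math

def standard_error(sigma, n):
    """sigma_M = sigma / sqrt(N): shrinks as N grows or sigma shrinks."""
    return sigma / math.sqrt(n)

# Hold the population standard deviation at 4 and grow the sample:
for n in (4, 9, 16, 25, 81):
    print(n, round(standard_error(4, n), 3))
# SE drops from 2.0 at N = 4 to about 0.444 at N = 81
```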

Effect size assumptions, effect size assumptions in psychology

-Assume the sample standard deviation is representative (the population it comes from has the same variance).
-Assume the population standard deviation for the experimental group is the same as that of the comparison distribution.
-In psychology, effect size is usually small (0.06-0.25) because human behavior is affected by many different variables. Effects of 0.25 and larger are of interest. 0.5 IS THE DESIRED EFFECT SIZE!
What effect size means:
-In an experiment, it measures the strength of your manipulation.
-In a comparison of groups (example: comparing two sections of 211), it measures the raw difference between them (two groups in our example).
-In a correlational study, it measures the strength of association between 2 variables.
-In general, the larger the effect size, the more likely a result is significant; a larger effect size means less overlap between the 2 populations.
-However, a result can have a large effect size and not be significant, and similarly a result can have a small effect size and be significant. It depends on the mean difference of the populations, the standard deviation, and the range of values.

Hypothesis testing controversies and HT in research articles

-Criticisms of the basic logic of significance tests: the direction we use in HT (the goal is rejecting the null) rather than proving a thing or proving our RH can be confusing and frustrating to people.
-BIGGEST COMPLAINT: misuse of significance tests. "This proves x, y, z..." is not how it works; the RH is either supported or the study is inconclusive.
-Significance and HT have been the lay of the land for so long that sometimes people default to this work/research without considering the broader context and whether there is a better option. "Rigorous research requires continued use of significance testing in the appropriate context, and adherence to recommendations that promote its rational use."
In research articles:
-Reported with regard to the specific statistical procedures used:
 -t tests: t(137) = 2.23, p < 0.01 (the 137 reflects the number of participants; the result fell in the cutoff range of p < 0.01, so it was significant and the NH was rejected)
 -F tests: F(2, 321) = 4.61, p < 0.05 (fell in the p < 0.05 cutoff, so there is less than a 5% chance of this result if the EP did not have an effect)
 -Chi-square tests: χ²(2, N = 89) = 12.31, p < 0.001
-There is less than a 5%, 1%, or 0.1% chance that we would have gotten this result just by chance if the EP did not have an effect [i.e., if the NH were true].
-"Near significant trend" or "results approached statistical significance": really close but not quite. Example: cutoff p < 0.05, result p = 0.06.
-Set cutoff scores BEFORE running the test. Setting expectations keeps people from conveying something as more significant than it is.
-Close to a statistically significant result, but it did not reach the cutoff: not significant, ns. Do not reject the null; the study is inconclusive.
-Asterisks are often provided in a table to show statistical significance.

Decision errors: Type 1 and Type 2 Occurs even when... Blue ink example

-Describes the relation of the decision made using the HT procedure (the decision to reject or retain H0, the null hypothesis) in a real study to the true (but unknown) real situation.
-It is the decision we make from our analysis/study compared to what is happening in the real population: the true situation, which is typically unknown to us. We pull samples out of populations to make inferences about the overall population. We do not always know the truth about the overall population, but we do the best we can by getting samples as approximate as possible to the population, so we can say with a degree of confidence that we think this is what is happening.
-Decision errors occur even if all computations are correct. This is not a matter of "I must have messed up the equation, subbed in the wrong number, etc."; even the decision our correct math leads us to may not be exactly correct. This is why statistical significance says "I am 95% sure," not "I am certain."
Type 1: Rejecting the NH when it is in fact true (supporting a false RH, H1).
 -Alpha (α): the probability of making a Type 1 error. Example: significance level 0.05, α = 0.05; significance level 0.01, α = 0.01.
 -"The manipulation had an effect" when it did not. Blue ink example: the math tells us that students who take notes with blue ink DO better on exams, but in truth students who use blue ink do no better; some other factor or noise is involved.
Type 2: Failing to reject the NH when it is in fact false (the RH, H1, is true, but our results are inconclusive).
 -Beta (β): the probability of making a Type 2 error.
 -Blue ink example: people who write with blue ink really do perform better on exams, but our data lead us to say there is no difference between blue ink and other inks.
-The math can be all right, but there is ALWAYS the possibility of error. This is why we never say a point is proven by any particular study; we say the RH is supported (reject the null) or the data are inconclusive (retain the null).

One-Tailed and Two-Tailed Hyp. Tests

-Directional hypotheses need a one-tailed test: we hypothesize the direction in which things will occur. Example RH: rats will move faster through the maze if they are given CBD. We look for a result on ONE of the tails, one side of the mean.
-The direction of the result is predicted. If the result does not fall beyond the cutoff, we do not support our research hypothesis; if it falls beyond the cutoff, we do.
-It is called a one-tailed test because the hypothesis test looks for an extreme result at just one end of the comparison distribution.
-EVEN IF A SCORE IS VERY EXTREME, IF IT IS ON THE OTHER SIDE OF THE MEAN (THE TAIL WE ARE NOT LOOKING AT), IT IS STILL NOT IN OUR CUTOFF ZONE AND WE DO NOT SUPPORT OUR RESEARCH HYPOTHESIS.
-Cue words: "more than," "less than," "faster," "slower."
-Non-directional hypotheses need a two-tailed test. Example RH: giving rats CBD will affect their time to complete the maze. There is no clear direction (the rats could go slower or faster and the hypothesis would be supported); we look for a result extreme enough to fall on EITHER tail.
-Look for an extreme result at either end of the comparison distribution; the direction of the result is not predicted!
-Look for results in potentially 2 cutoff zones. If the significance level is 0.05, split it into 0.025 in each tail; if the significance level is 0.01, split it into 0.005 in each tail.
-Cue words: "alters/affects the speed of the rats" (does not specify faster or slower), "a difference," "is different than."
-One-tailed tests have a bigger rejection area on just one side of the mean.
-If a one-tailed test is used and the result is in the direction opposite to the predicted direction, no matter how extreme, it cannot be considered significant. Therefore, researchers generally prefer two-tailed tests except where it is very clear that only one direction of outcome would be of interest.

Meta-Analysis

-A systematic procedure for combining results from different studies; the current gold standard of research today.
-You cannot compare significance results from different studies. Example: one study has a significance level of 0.05 and another 0.01 (potentially more stringent), but you cannot say the 0.01 study had a larger effect than the 0.05 study.
-Provides an overall effect size, and lets you compare effect sizes for different subgroups of studies.
-Common in the more applied areas of psychology, such as intervention studies.

Core Logic of Hypothesis Testing CBD rats example: Considers the probability that...

-HT considers the probability that the result of a study could have occurred if the experimental procedure had no effect.
-We set up a study, then look at the probability that we would have gotten the result we did assuming the procedure had no effect.
-Example: hypothesis: dosing rats with CBD will make them go through the maze faster because it lowers anxiety or helps them think more clearly. The experimental procedure (EP) is giving the rats a dose of a drug or chemical like CBD and seeing how they perform in the maze. Whether the EP has an effect or not, we will have the results of that study, and HT looks at those results and asks: "If the CBD wasn't really having an effect here, how likely is it that we would have gotten these same results?" If that probability is really low, we are more likely to say the EP actually had an effect, and that the theory is supported because the CBD does seem to have affected these rats.
-If this probability is low, the probability of no effect is rejected, and the theory behind the experimental procedure is considered to be supported.

Cohen's d: What does it measure? What causes it to increase?

-Measures the number of standard deviations an effect is shifted above or below the population mean stated by the NH.
-One way to standardize the effect size.
-Cohen's d is 0 when there is no difference between the 2 means and increases as the difference gets larger.
How to figure: d = (μ1 - μ2) / σ, or d = (M - μ) / σ
 -μ1 = mean of Population 1, the hypothesized mean for the population subjected to the experimental manipulation
 -μ2 = mean of Population 2, the mean of the comparison distribution
 -σ = standard deviation of Population 2, assumed to be the standard deviation of both populations
Example: the population mean on the GRE was 558 with σ = 139 (μ ± σ). The mean in our sample was 585 (M = 585). What is the effect size using Cohen's d?
 -d = (M - μ) / σ. Numerator: the difference between the sample mean and the population mean. Denominator: the population standard deviation.
 -(585 - 558) / 139 = 0.19
 -The observed effect shifted 0.19 standard deviations above the population mean: students in the elite school score 0.19 standard deviations higher, on average, than students in the general population.
Example: sample mean 208, comparison distribution mean 200, standard deviation 48: (208 - 200) / 48 = 0.17
 -What happens when the mean difference is sixteen and the population standard deviation is still 48? d = 0.33. AS THE MEAN DIFFERENCE BETWEEN SCORES INCREASES, OUR EFFECT SIZE (d) ALSO INCREASES, BECAUSE THE CURVES MOVE FURTHER AWAY FROM EACH OTHER AND OVERLAP LESS.
 -What happens to the effect size when the mean difference is back to 8, but the population standard deviation is 24? d = 0.33. AS WE LOWER THE POPULATION'S STANDARD DEVIATION, THE EFFECT SIZE GETS LARGER; IT TAKES LESS DIFFERENCE BETWEEN THE SCORES FOR THE EFFECT SIZE TO BE BIGGER BECAUSE THE STANDARD DEVIATION IS SMALLER.
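The worked examples above in Python (same numbers as the notes):

```python
def cohens_d(m, mu, sigma):
    """d = (M - mu) / sigma: the mean difference in SD units."""
    return (m - mu) / sigma

d_gre = cohens_d(585, 558, 139)    # GRE example: ~0.19

d_base   = cohens_d(208, 200, 48)  # ~0.17
d_bigger = cohens_d(216, 200, 48)  # mean difference doubled: ~0.33
d_narrow = cohens_d(208, 200, 24)  # sigma halved instead: ~0.33
```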

3. How do we choose between conducting a one-tail or two-tail test and how does this impact the determination of our cutoff Z score?

-One-Tail: the research question asks about change in a particular direction (i.e., increase or decrease, improvement, etc.). Look for the entire alpha/p-value/significance level in only 1 tail (e.g., 0.01 = 1% in the single tail, 0.05 = 5% in the single tail).
-Two-Tail: the research question asks about a change, but the direction is not specified or predicted. Look for alpha in both tails; divide the percentage into the 2 tails (divide by 2) (e.g., a 0.01 significance level = 0.5% in either tail, 0.05 = 2.5% in either tail).
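The cutoff z scores follow from the normal curve; Python's statistics.NormalDist can recover them (a sketch; inv_cdf returns 1.645/1.960/2.576, which the notes round to 1.64/1.96/2.58):

```python
from statistics import NormalDist

def cutoff_z(alpha, two_tailed):
    """Critical z: all of alpha in one tail, or alpha split across two."""
    if two_tailed:
        alpha = alpha / 2
    return NormalDist().inv_cdf(1 - alpha)

z_one_05 = cutoff_z(0.05, two_tailed=False)  # ~1.64 (5% in the one tail)
z_two_05 = cutoff_z(0.05, two_tailed=True)   # ~1.96 (2.5% in either tail)
z_two_01 = cutoff_z(0.01, two_tailed=True)   # ~2.58 (0.5% in either tail)
```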

Statistical power

-The probability that the study will produce a statistically significant result if the RH is true. Power: the ability to detect an effect. It is possible for H1 to be true, but if we do not have enough power, our study will not be able to detect that significance.
-Analogy: the analysis is a magnifying glass used to look for differences, but if your magnifying glass is not powerful enough, you will not be able to find the differences that actually exist.
How to figure power:
1. Gather the needed information: the mean and standard deviation for Population 2 and the predicted mean of Population 1.
2. Figure the raw-score cutoff point on the comparison distribution needed to reject the NH.
3. Figure the z score for this cutoff point, but on the distribution of means for Population 1.
4. Figure the probability of getting a z score more extreme than that z score.
[In most situations, figuring power by hand is complicated; power software packages, internet power calculators, and power tables are used instead. Researchers often run power analyses before collecting data: they tell us how many participants we need in each sample (N) and what difference we will actually be looking for, so we know whether a study is worth doing. This is especially important when writing research grants, as for NIH.]
Example: a curve with power at 37% and beta at 63%. The sample we collected and the research hypothesis are based on Population 1; there is only a 37% chance of the study panning out, so we might want to increase our power, perhaps by increasing our sample size (N). Beta is the opposite of power: the probability of making a Type 2 error.
 -The NH situation (comparison distribution) is based on Population 2, with mean 200; the predicted mean of Population 1 is 208. Alpha is 5% in one tail, so the cutoff z score was +1.64, and its raw cutoff score works out to 209.84.
 -Our sample mean, M = 208, falls before the cutoff (rejection region), so we retain the null; the study is inconclusive.
 -Recall that we set our alpha/cutoff zone on Population 2 (the comparison distribution). From there, draw that determined raw cutoff score onto Population 1, with its predicted mean as the mean of that curve, and compare where the line falls.
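The four steps can be sketched with the numbers from this example. The standard error of 6 is inferred here so that the raw cutoff comes out to the 209.84 in the notes; the resulting power lands at roughly the 37% quoted:

```python
from statistics import NormalDist

mu2, mu1 = 200, 208  # null (comparison) mean and predicted mean
sigma_m = 6          # assumed SE so that 200 + 1.64 * 6 = 209.84

# Step 2: raw-score cutoff on the comparison distribution (one-tailed, alpha = .05)
raw_cut = mu2 + 1.64 * sigma_m           # 209.84

# Step 3: that cutoff's z score on the distribution of means for Population 1
z_on_pop1 = (raw_cut - mu1) / sigma_m    # ~0.31

# Step 4: power = probability of landing beyond the cutoff under Population 1
power = 1 - NormalDist().cdf(z_on_pop1)  # ~0.38 (the notes round to 37%)
beta = 1 - power                         # ~0.62, the Type 2 error rate
```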

Statistical inference, null hypothesis significant testing, p-values

-SI: how we go from describing data we already have to making inferences about data we do not have; making decisions about ideas/hypotheses. Example: "Listening to Mozart improves calculus grades."
 -2 groups of 25 randomly chosen students: half listened, half did not. Those who listened to Mozart scored, on average, 3 points higher.
 -Sample statistics are just estimates of the mean of the population they are taken from.
1. One way to test a hypothesis is by how well it predicts the data you got. Example: you guess baby giraffe spots have M = 175, SD = 50, and your friend guesses M = 209, SD = 45. You then take the zoo's sample of M = 200 and calculate which of you is more likely to be right: draw your two curves and place a line at the sample's mean (M = 200).
2. Null Hypothesis Significance Testing (NHST). Example hypothesis: people with Gene X eat a different amount of calories than the general population.
 -NHST asks you to test a different hypothesis, which says there is no difference or effect of this gene. How well does this null hypothesis predict the data we collected? H0: the population mean caloric intake for people with Gene X is 2,300, the same as the regular population. H0: μ(Gene X) = 2,300.
 -Reductio ad absurdum: an argument that tries to discredit an idea by assuming the idea (H0) is correct and then showing that, under that assumption, something contradictory (H1) happens. So, assume μ = 2,300. 60 people with Gene X had M = 2,400 and SD = 500. How rare or absurd is it to get a sample mean this far from our assumed mean of 2,300?
 -P-values: answer the question of how rare your data are by telling you the probability of getting data as extreme as the data you observed if the NH were true. If p were 0.10, your sample would be in the top 10% most extreme samples we would expect to see based on the distribution of sample means.
-This would be a one-tailed test if our research hypothesis is that people with Gene X have a higher caloric intake. Two-tail example: does this medicine have a different level of efficacy than the existing treatment? You do not know which direction the effect will be (it could be better or worse).
-You decide if the sample is extreme enough to reject the null. In NHST, p-values need a cutoff. Say the cutoff is p = 0.05: if p is less than 0.05, we have sufficient evidence to reject the NH, and any such p-value counts (0.049 or 0.0001). When this occurs, the result is statistically significant and is in the top 5% (unlikely to happen due to random chance alone).
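The Gene X example as a one-tailed, z-based p-value (a sketch: it treats the sample SD of 500 as the population σ, which is a simplification; a t test would be the stricter choice):

```python
import math
from statistics import NormalDist

mu0, m, sd, n = 2300, 2400, 500, 60  # H0 mean; sample M, SD, and size

se = sd / math.sqrt(n)                # standard error, ~64.55
z = (m - mu0) / se                    # ~1.55
p_one_tail = 1 - NormalDist().cdf(z)  # ~0.06: not below a 0.05 cutoff
```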

Influences on power continued

-Sample size: affects the standard deviation of the distribution of means; remember, as sample size increases, the standard error decreases. THE MOST COMMON WAY OF INCREASING POWER. σM = √(σ²/N)
-Significance level (alpha): raising alpha (e.g., .10 instead of .05 or .01) gives more opportunity to find a significant effect, but alpha is usually kept at 0.05 or 0.01. Con: raising alpha increases the likelihood of a Type 1 error.
-One- versus two-tailed tests: there is more power in a one-tailed test because the rejection region is bigger on the side where we expect the effect (5% in one tail versus 2.5% per tail in a two-tailed test). Most of the time we increase N; we also do literature reviews and study prior work to decide whether a 1-tailed test is justified (a lot of research in that area). More N, less SE.
 -Example: perpetration of sexual aggression in college students, and how exposure to childhood violence and parenting style influence it. The literature already shows that exposure to violence makes perpetration more likely, so we could set it up as a 1-tail test, confident the effect should be on one side of the graph, with all of the rejection region gathered in 1 tail (which will be bigger).
Summary of influences:
-Effect size (d): a large d increases power. Effect size combines two features: mean difference (large differences increase power; increase the intensity of the experimental procedure) and standard deviation (a small σ increases power; use a less diverse population).
-Sample size: large N.
-Significance level: lenient (high alpha, 0.05 or 0.1).
-One-tail or two: one tail.
-Type of hypothesis-testing procedure used: varies.

-A population of individuals follows a normal curve with M = 60, SD = 10.
-What are the characteristics of a distribution of means from this population for samples of 4 scores each?
-Shape of the distribution? Mean of the distribution? Variance of the distribution of means? Standard error?
Formulas: Z = (M - μM) / σM; σ²M = σ²/N; σM = √σ²M = √(σ²/N) = σ/√N

-Shape = normal
-Mean = 60
-Variance = 25: σ²M = σ²/N = 10²/4 = 25
-SE = 5: σM = √σ²M = √25 = 5, OR σM = σ/√N = 10/√4 = 5

Distribution of means, shape. POPULATION NOTATION: MEAN: μ; VARIANCE: σ²; SD: σ

-The shape is approximately normal if either: each sample includes 30 or more individuals, or the distribution of the population of individuals is normal.
-It is possible for your population of individuals to be skewed in one direction or to be more or less kurtotic; this is another reason we like bigger sample sizes, which approximate more of a normal curve.
-(A) The distribution of the population of individuals is a normal curve.
-(B) The distribution of a particular sample, drawn from (A), with much less data (blocky looking, a lot smaller): the distribution of a SAMPLE pulled from a population.
-(C) The distribution of means, built by taking several samples. It maps onto the population of individuals, but because the sample means cluster so closely around the mean, we get a highly kurtotic, MUCH MORE PEAKED curve.

The Hypothesis Testing Process Steps 2, 3 (critical value), 4, 5 YOUR COMPARISON DISTRIBUTION IS THE SITUATION IN WHICH THE NULL IS TRUE

-Step 2: Determine the characteristics of the comparison distribution (also called a sampling distribution).
 -In HT, the actual sample's score is compared to this comparison distribution.
 -You could be comparing two different groups, or comparing back to what you think is happening in the overall population (or to that data, if you already have it).
 -Example: compare our 211 section to all 211 students. All 211 students is then the comparison distribution, and our class (our sample) is compared back to it. Do students in our section do better or worse? Does the time you take the class make a difference?
-Step 3: Determine the cutoff sample score (critical value) on the comparison distribution at which the NH should be rejected. If our score goes beyond this point, we reject the NH (meaning our experiment did have an effect).
-Step 4: Determine your sample's score on the comparison distribution. The sample score will be one of the statistics presented in this course.
-Step 5: Decide whether to reject the NH. The big decision.
 -If the sample score is more extreme than the cutoff sample score, the NH can be rejected: the experiment has had an effect.
 -If the sample score is not as extreme as the cutoff sample score, the NH cannot be rejected: the results are inconclusive. THIS DOES NOT MEAN the NH is accepted or our RH is rejected; we say our results are inconclusive.

7. Define effect sizes. What are advantages, typical conventions, and influences on effect size?

-The degree to which an experimental manipulation separates 2 populations; indicates the size of a statistical effect.
-ADVANTAGE: THE SIGNIFICANCE LEVEL DOES NOT TELL US THE SIZE OF THE STATISTICAL EFFECT; only the effect size can do this.
-ADVANTAGE: because effect sizes are standardized, we can use them to compare different studies and conduct meta-analyses.
CONVENTIONS: small: Cohen's d ± 0.2; medium: Cohen's d ± 0.5; large: Cohen's d ± 0.8
INFLUENCES:
-Increasing the mean difference (the actual difference between the means of our sample and the comparison distribution) increases the effect size; the gap between the two distributions grows.
-Decreasing the standard deviation increases the effect size.

Influences on power

-The greater the effect size, the greater our power.
-Difference between population means [very different means between the two populations]: as the difference between means goes up, power goes up.
-Population standard deviation [very small population SD]: as the population SD goes down, power goes up.
-Figuring power from predicted effect sizes: predicted μ1 = μ2 + (d)(σ)

8. Define power and its influences.

-The probability that, if the research hypothesis is true, the experiment will support it.
-Think of the magnifying glass: we want our magnifying glass/experiment to be powerful enough that, if there actually is a true effect, it can detect it.
INFLUENCES:
-Larger effect size = greater power
-More participants (larger sample size) = greater power
-Smaller standard deviation of the original population = greater power
-1-tailed versus 2-tailed test: a 1-tailed test has more power because the rejection region is bigger when it sits in only 1 tail of the distribution.

Effect size versus statistical significance

-Theoretically oriented psychologists emphasize significance.
-Applied researchers emphasize effect size (it gets at the idea of practical significance better), e.g., treatment and intervention studies.
-But it is increasingly common for effect sizes to be reported in research articles [especially in meta-analyses]. It is more and more a 'BOTH/AND': use both effect size and significance!
-Tables often include effect sizes as columns and then provide asterisks for significance.

Inferential Statistics -Hypotheses

-These statistics help us make inferences or predictions about an overall population based on our sample statistics.
-They involve making inferences or educated guesses about populations based on information from samples, and supporting or refuting hypotheses about those populations based on that information.
-Compare descriptive statistics, which merely summarize known information (mean, median, mode, SD, variance).
-Inferential statistics are especially important because they provide a basis for drawing conclusions about the world in general (populations that cannot be measured as a whole; remember, we can very rarely measure an entire population) based on results from particular groups of people studied (samples).
-You can very rarely measure the entire population, but you can get information from samples, and if our samples are representative enough of the entire population, we can draw broader conclusions from a smaller set of data.

One-Dependent Sample z-Test: When do you use?

-Use to test hypotheses concerning the mean of a single population with known variance.
-In z tests we have a known variance: we know the variance of the population. Very rare! It is usually not available unless the population is small and feasible to measure (e.g., all 3rd graders in one school district, versus everyone in the US).
-Non-directional, two-tailed tests (H1: not equal): the alternative hypothesis is stated as "not equal to." We are interested in any alternative to the NH; greater than or less than, as long as it is not equal to.
-Directional, one-tailed tests (H1: greater than, or H1: less than): the alternative hypothesis is stated as greater than or less than. We are interested in a specific alternative to the NH: is it greater than or not greater than? Is it less than or not less than? Not necessarily both.

Importance of power when evaluating study results

-When a result is significant:
 a. Statistical significance versus practical significance: a study can be statistically significant yet not lead to very practical outcomes.
 b. With a big sample size there is a lot of power, but is the difference enough to make a noticeable impact?
 -Example: a drug treatment program for opioid addiction. Statistical significance is great, but our significant result could come from reducing the number of times a person shares heroin needles from 5 times a day to 2. That is good harm reduction, but practically we would prefer no needle sharing at all. The math significance is good; the practical significance is not.
-When a result is not statistically significant:
 -Significant? Yes; sample size small -> important result.
 -Significant? Yes; sample size large -> might or might not have practical importance.
 -Significant? No; sample size small -> inconclusive (little power to start with; there could be an effect, but we did not have enough power to find it; make bigger changes to the study, like reworking H1).
 -Significant? No; sample size large (big power) -> the research hypothesis is probably false.

Controversies of significance tests and confidence intervals: Which should we use? Reporting in research articles

-Which should we use?
-CIs: give additional information, focus attention on estimation, and are less likely to be misused by researchers (because they are estimates, one is not saying "I prove this!", only that a result is likely to fall in a range).
-Significance tests: necessary for some advanced statistical procedures.
-More recently, we see both in the same studies. This is helpful because CIs contextualize and give more information around the significance tests, and significance tests help us understand what we are saying about those CIs or estimations.
In research articles:
-The z test is rarely reported because the population's M and SD are usually not known.
-Standard error (SE, SEM) is seen much more often, reported as standard error bars: a bar chart with lines on top showing "based on the data, what we found could fall as high or as low as this."

Data-Driven Hypothesis Testing -What is a hypothesis? -What is the null hypothesis and the alternative/research hypothesis? -NH = innocent until proven guilty

1. Formulate the hypothesis. 2. Find the right test for the hypothesis. 3. Execute the test. 4. Make a decision based on the result. Hypothesis: An idea that can be tested. Example: The mean salary is $113k. Null hypothesis, H0: the mean salary is $113k (μ0 = $113k). Alternative/research hypothesis, H1/HA: the mean data salary is not $113k (μ0 ≠ $113k). You retain the NH if the sample mean is close enough to μ0; reject the NH if the sample mean is too far from μ0. NH: Innocent until proven guilty (you assume H0 is true until it is rejected). One-tail test: "Data scientists make more than $125k per year." NH: H0: μ is less than or equal to $125k. AH: H1: μ > $125k (the claim being tested goes in H1; the null keeps the equality). Hypotheses refer to population parameters; generally researchers are trying to reject the NH.

The Hypothesis Testing Process Step 1 HYPOTHESES ABOUT THE POPULATIONS

1. Restate the question as a research hypothesis and a null hypothesis about the populations. Example: Does CBD affect maze completion time in rats? Population 1: Rats who do not have any CBD Population 2: Rats who do Research hypothesis/alternative hypothesis: We believe that CBD is going to affect maze completion time in rats. Null hypothesis: CBD does not affect maze completion time in rats.

9. A large school district is considering implementing a program to improve the reading scores of its students. The current reading scores for the district are normally distributed with a mean of 37 and a standard deviation of 12. The administrators decide to test the new program in one school of 340 students and found an average reading score of 48 in this sample. a. What is the null hypothesis? b. What is the research hypothesis? c. What is the μM? d. What is the variance of the comparison distribution? e. What is σM? f. Is this a one- or two-tailed test? g. What is the cutoff sample score with a significance level of 5%? h. If the mean score of the sample is more extreme than the cutoff score on the comparison distribution, what will the administrators decide? i. What is the sample's Z score on the comparison distribution? j. Should the administrators reject or retain the null?

A. Reading scores for the students participating in the program will be no different from or LESS THAN (we are particularly looking to IMPROVE reading scores) those of the population of students who did not participate in the program. B. Reading scores for the students participating in the program will be higher than those of the population of students who did not participate in the program. C. μM = μ = 37 D. σ²M = σ²/N = 12²/340 = 0.42 E. σM = square root of σ²M = square root of 0.42 = 0.65 F. One-tailed, because you are only looking for improved (increased) reading scores. G. +1.64 (1-tailed, 5% significance level) H. The program significantly improved reading scores (reject the null). I. z = (M - μM)/σM = (48 - 37)/0.65 = 16.92 [far more extreme than our cutoff score.] J. REJECT THE NULL; the data support the research hypothesis that this program improves reading scores.
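To double-check the arithmetic in parts C through I, here is a quick Python sketch (Python is not part of the notes; this is just a calculator for the formulas above, using the numbers from the problem):

```python
import math

# Problem values: population mu = 37, sigma = 12; one school of N = 340 with M = 48
mu, sigma = 37, 12
n, sample_mean = 340, 48

var_m = sigma ** 2 / n           # variance of the comparison distribution
se = math.sqrt(var_m)            # sigma_M, the standard error
z = (sample_mean - mu) / se      # sample's Z score on the comparison distribution

print(round(var_m, 2))           # 0.42
print(round(se, 2))              # 0.65
print(round(z, 1))               # 16.9, far beyond the +1.64 cutoff
```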

5. Define confidence intervals.

The range in which a researcher can be 95% or 99% confident that the true population mean falls. -Figure by finding the cutoff points: -95% CI: cut off the lower 2.5% and upper 2.5% of the distribution; the area between those 2 cutoff points is the 95% CI. -99% CI: cut off the lower 0.5% and upper 0.5% of the distribution (1% left over, split 0.5% between the 2 tails); the area between these two cutoff points is the 99% CI.
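The two limits are just M plus or minus z times SE; a minimal Python sketch using the numbers from the CI practice problems above (M = 70, SE = 2 for the 95% interval; M = 1.5, SE = 0.1 for the 99% interval):

```python
# CI limits = M +/- (z * SE); z = 1.96 for 95% confidence, 2.58 for 99%
def confidence_interval(m, se, z):
    margin = z * se
    return m - margin, m + margin

lo95, hi95 = confidence_interval(70, 2, 1.96)      # 95% CI
print(round(lo95, 2), round(hi95, 2))              # 66.08 73.92

lo99, hi99 = confidence_interval(1.5, 0.1, 2.58)   # 99% CI
print(round(lo99, 2), round(hi99, 2))              # 1.24 1.76
```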

Distribution of means and their characteristics (mean, variance, SE) TAKE SEVERAL RANDOMLY SELECTED SAMPLES FROM A POPULATION, EACH OF THE SAME SIZE, AND MAKE A DISTRIBUTION OF THOSE SAMPLES' MEANS. THIS TENDS TO FORM A NORMAL DISTRIBUTION: UNIMODAL AND SYMMETRICAL. AS SAMPLE SIZE INCREASES, THE DISTRIBUTION OF MEANS BECOMES A BETTER APPROXIMATION OF THE NORMAL CURVE.

Comparison Distribution In the chapter before: comparing a sample score for one person versus a comparison distribution, i.e., the distribution the individual score came from. Now: comparing a large number of samples of the same size, with each sample being randomly drawn from the same population of individuals. -Comparison distributions considered so far were distributions of individual scores (comparing an individual score to the rest of the distribution that score came from). -Now we are using hypothesis tests involving means of groups of scores, so the comparison distribution will be a distribution of means. -Our CD is no longer the distribution the individual score came from, but a distribution of means. -Theoretically, distributions of means are based on a very large number of samples of the same size (with each sample randomly drawn from the same population of individuals). -Example: Population of individuals: UA students. Take 5 different samples from the student body population, each with 100 people. 5 samples of 100 each is 500 students total, all divided into samples that have the same size. Versus originally comparing 1 UA student's score. CHARACTERISTICS: -Its mean is the same as the mean of the population of individuals: μM = μ (mean of the distribution of means = population mean). -Its variance is the variance of the population divided by the number of individuals in each sample: σ²M = σ²/N. Example: 5 samples of 100, so N = 100. -Its standard deviation is the square root of its variance, AKA the STANDARD ERROR (SE) of the mean (SEM): σM = SE = SEM = square root of σ²M = square root of (σ²/N) = σ / square root of N.
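These characteristics can be checked by simulation: take many same-size samples, collect their means, and compare the result to μ and σ/√N. A sketch with illustrative population values (not from the notes), using only Python's standard library:

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 50, 10, 100       # hypothetical population and per-sample size

# Take many random samples of the same size and collect their means
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(3000)]

print(round(statistics.fmean(means), 1))   # close to mu = 50
print(round(statistics.stdev(means), 2))   # close to sigma / sqrt(n) = 1.0
```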

One-Sample z-Test example problem 2

Directional, upper-tail test -Population mean on the GRE is 558 plus or minus 139 (μ ± σ). Sample of 100 students (N = 100), sample mean M = 585. -Compute a one-sample z-test at a 0.05 significance level. 1. H0: μ is less than or equal to 558 (mean test scores are less than or equal to 558 in this population). H1: μ is greater than 558. 2. Level of significance is 0.05, alpha = 0.05. One-tail test, Table A-1 (0.05 in tail): z = 1.64 and 1.65 (average = 1.645). ONLY POSITIVE 1.645. 3. Compute the statistic: z = (M - μ)/σM, where σM = σ / square root of N = 139 / square root of 100 = 13.9. (585 - 558)/13.9 = 1.94. 4. Decision: compare the obtained value to the critical value 1.645. The obtained value is greater than the critical value and falls in the rejection region. Decision: reject the NH. Conclusion: The mean GRE score in this population is significantly greater than 558.
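Steps 3 and 4 can be wrapped in a small helper (a sketch; the function name is my own, not from the notes):

```python
import math

def one_sample_z(sample_mean, mu, sigma, n):
    """z = (M - mu) / sigma_M, with sigma_M = sigma / sqrt(N)."""
    se = sigma / math.sqrt(n)
    return (sample_mean - mu) / se

z_obt = one_sample_z(585, 558, 139, 100)
print(round(z_obt, 2))        # 1.94
print(z_obt > 1.645)          # True -> falls in the rejection region, reject H0
```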

4. What are the basic assumptions and principles about a distribution of means that is used as a comparison distribution in a hypothesis test? a. What is the mean, variance, and standard error of the comparison distribution in a study with 49 participants pulled from a known population with μ = 500 and σ² = 98?

Distribution of Means: -Take several randomly selected samples from a population, each of the same size, and make a distribution of those samples' means. -Tends to form a 'normal distribution': unimodal and symmetrical. -As sample size increases, our distribution of means becomes a better approximation of the normal curve. (A) μM = 500 | The distribution of means' mean = the population mean. Variance of the distribution of means = σ²/N = 98/49 = 2 | Variance of the DOM = variance of the population / number of participants per sample. SEM = square root of 2 ≈ 1.41 | SEM = square root of the variance of the DOM.

Type 1 Error, Alpha and Type 2 Error, Beta

Example: -If the truth is the person is innocent of the crime and the jury decides they are innocent, that is correct. -If the truth is the person is innocent but the jury finds them guilty, that is a Type 1 error. -If the person committed the crime and the jury decides they are guilty, that is also correct. -Alternatively, if the person is guilty and the jury finds them innocent, that is a Type 2 error. -If the truth is an experimental procedure (EP) had no effect and our test says it had no effect, that is correct. If the truth is the EP did have an effect and we say the NH (that there was no effect) is false, that is also correct. -If the truth is that CBD did have an effect on rats (H0 is false, so we should reject the null) but our data said CBD did not (we retained the null), that is a Type 2 error. -If the truth is the CBD had no effect (we should retain the null) but in our data we said CBD did have an effect (we rejected the null), that is a Type 1 error. -If we claim CBD had an effect when it actually did not, and we keep dosing rats and people with CBD, that is eventually going to come out. We need to control for these errors as much as possible so we do not publish false information/science. -This is part of why science is a longer process. -Type 1 error example: publishing false results claiming vaccines affect neurological ability. Truth: vaccines do not cause neurological differences (null hypothesis). Data: vaccines DO cause neurological differences (research hypothesis). -Type 1 error is what we are mostly looking at with our p-values (p < 0.05: less than a 5% chance of getting a result this extreme if the EP actually had no effect; p < 0.01: less than a 1% chance of such a result if the EP had no effect), also known as alpha. -Type 2 error is also known as beta. AS YOU REDUCE THE CHANCE FOR ALPHA, YOU MIGHT BE INCREASING THE CHANCE FOR BETA, AND VICE VERSA. BUT REMEMBER: TYPE 1 ERRORS ARE CONSIDERED FAR MORE SEVERE (THE LARGEST CONCERN) THAN TYPE 2.

One-Sample z-Test example problem 1 Population mean on the GRE was 558 plus or minus 139 (μ ± σ). Select a sample of 100 participants (N = 100). Sample mean equal to 585 (M = 585). Determine whether we will retain the population mean stated by the null (μ = 558) at a 0.05 significance level (alpha = 0.05, p < 0.05).

Example: Non-Directional, Two-Tail -Population mean on the GRE was 558 plus or minus 139 (μ ± σ). Select a sample of 100 participants (N = 100). Sample mean equal to 585 (M = 585). Will we retain the population mean stated by the null (μ = 558) at a 0.05 significance level (alpha = 0.05, p < 0.05)? Is 585 different enough from 558, given a σ of 139, to say it is statistically significant? 1) State H0 and H1. H0: μ = 558 (mean test scores equal 558 in the population). H1: μ ≠ 558. 2) Determine the level of significance. The level of significance is 0.05, which makes alpha 0.05. TWO-TAILED TEST, so divide 0.05 in half to get 0.025 in each tail. -Locate the critical values in Table A-1 (0.025 to tail): z = 1.96 to the right of the mean (upper tail) and z = -1.96 to the left of the mean (lower tail). -Regions beyond the critical values are called rejection regions (above 1.96 and below -1.96). 3) Compute the test statistic: (sample mean - population mean) / standard error of the mean. z-statistic: z(obt) = (M - μ)/σM, where σM = σ / square root of N [population SD / square root of N]. Standard error for the denominator: 139 / square root of 100 = 13.9. (585 - 558)/13.9 = 1.94. 4) Make a decision. Compare the obtained value to the critical value (1.94 vs. 1.96). Reject the null if the obtained value exceeds the critical value. The obtained value is less than the critical value and does not fall in the rejection region. The decision is to retain the NH. Conclusion: The mean score on the quantitative portion of the GRE in this population is not significantly different from 558 (the value stated in the null).
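For the same obtained z, the exact two-tailed p-value can be computed from the standard normal CDF. A sketch using Python's `statistics.NormalDist` (the notes use Table A-1 instead):

```python
from statistics import NormalDist

# GRE example: M = 585, mu = 558, sigma = 139, N = 100
z_obt = (585 - 558) / (139 / 100 ** 0.5)          # about 1.94
p = 2 * (1 - NormalDist().cdf(abs(z_obt)))        # two-tailed p-value
print(round(p, 3))   # ~0.052, just above alpha = 0.05 -> retain the null
```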

-A researcher predicts that making people hungry will affect how well they do on a coordination test. -A randomly selected person does not eat for 24 hours and scores 400. -People in general: M = 500, SD = 40. -0.01 significance level. Does hunger affect coordination test performance?

H0 = Coordination test scores when hungry are the same / do not differ from the general population. H1 = Coordination test scores when hungry differ from the general population. Comparison distribution: M = 500, σ = 40. Cutoff: 2-tail, significance level 0.01 -> -2.58 and 2.58. Z-score: z = (X - M)/SD = (400 - 500)/40 = -2.5. RETAIN THE NULL. Not extreme enough; the result is not significant. Retain the null; the study is inconclusive as to whether hunger affects coordination test performance. WE ARE NEVER PROVING A HYPOTHESIS; IT IS EITHER SUPPORTED OR INCONCLUSIVE.

A training program to increase friendliness is tried on one individual. -General public mean is 30, standard deviation is 4. -5% significance level. -The individual's score is 40. Significance level 0.05: 1-tail = -1.64 or 1.64 | 2-tail = -1.96 and 1.96. Significance level 0.01: 1-tail = -2.33 or 2.33 | 2-tail = -2.58 and 2.58. Does the training program increase friendliness?

H0 = Friendliness score after the program is the same as / does not differ from the general public. H1 = Friendliness scores are higher after receiving the training program. Comparison distribution: M = 30, σ = 4. Cutoff: significance level 5%, 1-tail = +1.64. Z-score on the comparison distribution: z = (X - M)/SD = (40 - 30)/4 = +2.5. REJECT THE NULL. Extreme enough (beyond +1.64). The research hypothesis is supported; the program appears to increase friendliness based on this statistically significant result.

-25 women take part in a special program to DECREASE reaction time. -Sample M of the women = 1.5. -Women in general: M = 1.8, SD = 0.5. -0.01 significance level.

H0 = Mean reaction times are the same / do not differ from the general population of women. H1 = Mean reaction times are less than the general population's mean reaction time. Comparison distribution: M = 1.8, SD = 0.5. σ²M = σ²/N = 0.5²/25 = 0.01, so σM = 0.1. Cutoff: 1% significance level, one tail: -2.33. Z-score: z = (M - μM)/σM = (1.5 - 1.8)/0.1 = -3. REJECT THE NULL. The research hypothesis is supported; the program does appear to decrease reaction time. The distribution of means uses many samples of the same size, with each sample randomly taken from the population of individuals. Since the sample mean could occur by chance 1% of the time or less on the distribution of means, the results support the hypothesis that the program significantly decreases reaction times.
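The reaction-time arithmetic, verified with a quick Python sketch (just the problem's numbers run through the distribution-of-means formulas):

```python
import math

# Reaction-time problem: mu = 1.8, sigma = 0.5, N = 25, sample M = 1.5
mu, sigma, n, m = 1.8, 0.5, 25, 1.5

var_m = sigma ** 2 / n          # 0.01, variance of the distribution of means
se = math.sqrt(var_m)           # 0.1, the standard error (sigma_M)
z = (m - mu) / se               # -3.0
print(round(z, 2))              # -3.0
print(z < -2.33)                # True -> reject the null (one-tailed, 1% level)
```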

-A psychologist is working with people who have had major surgery. She proposes people will recover quicker if friends and family are in the room during the first 48 hours. -It is known that M = 12, SD = 5. -A patient with social support recovers in 18. -0.01 significance level.

H0 = Surgery recovery time is the same / does not change. H1 = Surgery recovery time is faster when social support is present. Comparison distribution: M = 12, σ = 5. Cutoff: 1% significance level, 1-tail (taking less time = faster): -2.33. Z-score: z = (X - M)/SD = (18 - 12)/5 = +1.2. Reject the null? NO! The result is not significant. Retain the null; the study is inconclusive as to whether social support speeds up recovery. Given only this information, how do we know to retain the NH based on the individual sample before performing any calculations? -Because it is a 1-tail test and our one sample recovered MUCH SLOWER: the sample goes in the opposite direction of what we are testing. This is why we get larger sample sizes; one individual could be an outlier.

-A certain film will change people's attitudes toward alcohol. -Select 36 people; sample M = 70. -People in general: M = 75, SD = 12. -5% significance level. -Does viewing the film change people's attitudes toward alcohol?

H0 = The mean alcohol-attitude score of the sample shown the film is the same / does not differ from the general population. H1 = The mean alcohol-attitude score of the sample shown the film is different. Comparison distribution: M = 75, σ = 12. σ²M = σ²/N = 12²/36 = 4, so σM = 2. Cutoff: 5% significance, 2-tail: -1.96 and 1.96. Z-score: z = (M - μM)/σM = (70 - 75)/2 = -2.5. REJECT THE NULL. The research hypothesis is supported; seeing the film does appear to change alcohol attitudes. The distribution of means uses many samples of the same size, with each sample randomly taken from the population of individuals. Since the sample mean could occur by chance 5% of the time or less on the distribution of means, the results support the hypothesis that seeing the film significantly changes people's attitudes toward alcohol.

Comparing the three types of distributions

Population's Distribution: Scores of all individuals in the population. Could be any shape, usually normal. Mean: μ. Variance: σ². St. dev.: σ. Particular Sample's Distribution: Scores of the individuals in a single sample. Could be any shape. Mean: M = (ΣX)/N. Variance: SD² = [Σ(X - M)²]/N. St. dev.: SD = square root of SD². Distribution of Means: Means of samples randomly taken from the population. Approximately normal if samples have 30 or more individuals each or if the population is normal. Mean: μM = μ. Variance: σ²M = σ²/N. St. dev.: σM = square root of σ²M.
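Note that the sample formulas above divide by N (not N - 1); in Python's standard library that matches `statistics.pvariance`. A sketch with made-up scores (not from the notes):

```python
import statistics

scores = [2, 4, 6, 8]                     # hypothetical sample scores
m = statistics.fmean(scores)              # M = (ΣX)/N
sd_sq = statistics.pvariance(scores, m)   # SD² = [Σ(X - M)²]/N, divides by N
sd = sd_sq ** 0.5                         # SD = square root of SD²
print(m, sd_sq, round(sd, 3))             # 5.0 5.0 2.236
```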

One-Tail and Two-Tail Cutoff Scores

Significance Level: 0.05 -> 1T = -1.64 or 1.64 -> 2T = -1.96 and 1.96. Significance Level: 0.01 -> 1T = -2.33 or 2.33 -> 2T = -2.58 and 2.58. -Recall that for one-tailed tests we are looking at either the left or the right tail, not both tails: 'or.' -Two-tailed tests are non-directional. We are not predicting less than or greater than, so we use BOTH z-scores/critical values. "Expecting a score lower than": -1.64, -2.33. "Greater than": +1.64, +2.33. ONE-TAILED TESTS LOOK AT THE LEFT OR RIGHT TAIL EXCLUSIVELY. When we move our significance level from 0.05 to 0.01, the absolute values of these cutoffs get bigger because we are giving ourselves a 1% chance as opposed to 5%. The cutoffs move further out on the curve, and the rejection regions get smaller.
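These cutoff scores are just inverse-normal values, so they can be reproduced directly with `statistics.NormalDist` rather than a table (a sketch):

```python
from statistics import NormalDist

nd = NormalDist()                        # standard normal curve (mean 0, SD 1)
print(round(nd.inv_cdf(0.95), 2))        # 1.64 -> one-tailed, 0.05 level
print(round(nd.inv_cdf(0.975), 2))       # 1.96 -> two-tailed, 0.05 level
print(round(nd.inv_cdf(0.99), 2))        # 2.33 -> one-tailed, 0.01 level
print(round(nd.inv_cdf(0.995), 2))       # 2.58 -> two-tailed, 0.01 level
```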

Cohen's effect size conventions

Small d = 0.2 Medium d = 0.5 Large d = 0.8 or more
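For a one-sample design, Cohen's d is the raw mean difference expressed in population-SD units, d = (M - μ)/σ. A sketch applying it to the GRE numbers from the earlier z-test example:

```python
def cohens_d(sample_mean, mu, sigma):
    """Effect size: mean difference in population-SD units."""
    return (sample_mean - mu) / sigma

d = cohens_d(585, 558, 139)   # GRE example: M = 585, mu = 558, sigma = 139
print(round(d, 2))            # 0.19 -> roughly a "small" effect by Cohen's conventions
```

This shows why a statistically significant z (1.94) can still be a small effect: significance depends on N, while d does not.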

STUDY GUIDE: 1. Explain hypothesis testing including the goal of the process, null hypotheses, research hypotheses, and how a comparison distribution with a "cutoff score" relates to these hypotheses.

Step 1: Null and Research Hypotheses. Determine if an experiment, program, or manipulation works by attempting to reject the hypothesis that it does not work (the null). -Null: typically worded as 'no difference' or 'zero difference.' -Research: typically worded as there 'is a difference,' or one is 'greater' or 'less than' another. Step 2: Comparison Distribution. The distribution to which we compare our sample. Could be our population, or another group or sample we have data/information for. -This is the situation in which the null is true!!! We compare back to it because the null says 'there is no difference.' Step 3: Cutoff Score. The point we set on the comparison distribution in order to decide whether to reject/retain the null. -Points are determined by the level of significance (typically 0.01 or 0.05) and whether the test is 1- or 2-tailed. -Reported in research as p < 0.01 or p < 0.05 (the probability of getting this result if the null is true). -Represents the point at which, if the null is true, a result more extreme than this is unlikely. E.g., if we reject the null at the 0.01 level, there is a less than 1% chance of getting such an extreme result if the null is true.

6. Define and provide examples of Type I and Type II errors.

Type 1 Error: -Rejecting the null when it should be retained (the research hypothesis is actually false). -Alpha (equal to the significance level) = the probability of a T1E. -Increasing the significance level increases the probability of a T1E (e.g., 0.10 or 0.15 instead of 0.05 or 0.01), which is why we keep the significance level as small and strict as possible. Example: A stats professor concludes grades do improve (there is a difference) with a particular teaching technique when the true situation is that they do not (there is no difference). Type 2 Error: -Retaining/failing to reject the null when it should be rejected (the research hypothesis is actually true). -Beta = the probability of a T2E. Example: A school psychologist concludes that sticker charts do not lead to behavior improvements (there is no difference) when the true situation is that they do (there is a difference).
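Alpha as the Type 1 error rate can be seen by simulation: if the null is really true, a two-tailed 0.05 cutoff should falsely reject on roughly 5% of studies. A sketch with made-up population values (not from the notes):

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 100, 15, 25        # hypothetical population; the null is TRUE
se = sigma / n ** 0.5             # standard error of the mean

trials, rejections = 3000, 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]   # drawn under H0
    z = (statistics.fmean(sample) - mu) / se
    if abs(z) > 1.96:             # two-tailed cutoff, alpha = 0.05
        rejections += 1           # a Type 1 error: rejecting a true null

print(rejections / trials)        # close to alpha = 0.05
```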

Standard Error of the mean, sampling error

Variance of a sampling distribution of sample means: σ²M = σ²/N. Standard error of a sampling distribution of sample means: σM = square root of (σ²/N). SAMPLING ERROR: The extent to which sample means selected from the same population differ from one another. Example: The population may be 3rd graders in a school district, but we are comparing 2 different classes of third graders (Mr. Thomas's and Mrs. Alex's). The sampling error is the extent to which the sample means from these two classes differ even though both come from the same population. Example: A population is normally distributed with a mean of 56 and an SD of 12. 1. What is the mean of the sampling distribution (μM) for this population? μM = population mean = 56. 2. If a sample of 36 participants (N = 36) is selected from this population, what is its standard error of the mean (σM)? σM = σ / square root of N = 12 / square root of 36 = 12/6 = 2.
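The closing example as a two-line Python check (same numbers as above):

```python
# Population: normally distributed with mu = 56, sigma = 12; sample of N = 36
sigma, n = 12, 36
sem = sigma / n ** 0.5      # sigma_M = sigma / sqrt(N) = 12 / 6
print(sem)                  # 2.0 (and mu_M is simply mu = 56)
```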

