Psych 2220 Chapter 6


The Six Steps of Hypothesis Testing

1. Identify the populations, the comparison distribution (e.g., a distribution of means), and the assumptions, and then choose the appropriate hypothesis test.
2. State the null and research hypotheses in both words and symbolic notation. Hypotheses are about populations, not about samples.
3. Determine the characteristics of the comparison distribution. For z tests, we determine the mean and standard error of the comparison distribution. These numbers describe the distribution represented by the null hypothesis and are used when we calculate the test statistic.
4. Determine the critical values, or cutoffs, that indicate the points beyond which we will reject the null hypothesis.
5. Calculate the test statistic. Use the information from step 3 to calculate the test statistic, in this case the z statistic. We can then directly compare the test statistic to the critical values to determine whether the sample is extreme enough to warrant rejecting the null hypothesis.
6. Decide whether to reject or fail to reject the null hypothesis. Fail to reject if the test statistic is not beyond the cutoffs; reject if the test statistic is beyond the cutoffs (the p value associated with the test statistic is smaller than the alpha level of 0.05).
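A minimal code sketch of steps 3 through 6 for a z test (Python, using scipy). All of the numbers (population mean, standard deviation, sample mean, sample size) are hypothetical, not from the chapter:

```python
from scipy.stats import norm

# Hypothetical numbers, for illustration only
mu, sigma = 100, 15      # population mean and standard deviation under the null hypothesis
M, N = 105, 36           # sample mean and sample size
alpha = 0.05             # conventional alpha level, two-tailed

# Step 3: characteristics of the comparison distribution (a distribution of means)
std_error = sigma / N ** 0.5          # sigma_M = sigma / sqrt(N) = 2.5

# Step 4: critical values cutting off the most extreme 5% (2.5% in each tail)
z_crit = norm.ppf(1 - alpha / 2)      # about 1.96

# Step 5: the z statistic
z = (M - mu) / std_error              # (105 - 100) / 2.5 = 2.0

# Step 6: decide
if abs(z) > z_crit:
    print(f"z = {z:.2f}: reject the null hypothesis")
else:
    print(f"z = {z:.2f}: fail to reject the null hypothesis")
```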

Ways to increase statistical power, from easiest to most difficult (power can be calculated post hoc, after the study, or estimated with electronic power calculators before the study to identify the sample size necessary to achieve a given level of power)

1. Increase alpha - like changing the rules by widening the goal posts in football or the goal in soccer. Side effect: increases the probability of a Type I error, e.g., from 5% to 10%.
2. Turn a two-tailed hypothesis into a one-tailed hypothesis - increases the chances of rejecting the null hypothesis, which translates into an increase in statistical power.
3. Increase sample size - leads to an increase in the test statistic, making it easier to reject the null hypothesis.
4. Exaggerate the mean difference between levels of the independent variable - e.g., when studying the effectiveness of group therapy for social phobia, we increase the length of therapy from 3 months to 6 months. A longer program may lead to a larger change in means than the shorter program would.
5. Decrease standard deviation - decreasing the standard deviation has the same effect on statistical power as increasing sample size. When the standard deviation is smaller, the standard error is smaller and the curves are narrower. Curves can become narrower not just because the denominator of the standard error calculation is larger, but also because the numerator is smaller. How? (1) by using reliable measures from the beginning of the study, thus reducing error, or (2) by sampling from a more homogeneous group in which participants' responses are more likely to be similar to begin with.

three different ways to identify the same point beneath the normal curve:

1. raw score 2. z score 3. percentile ranking

how can we decrease overlap (increase effect size)?

1. when means are farther apart (increase distance) 2. when variability within each distribution of scores is smaller. (decrease standard deviation)

conventional alpha level of 5% for a two-tailed test gives..

2.5% (0.025) of the area in each tail of the distribution. When you look this up, you get -1.96 and 1.96. Phrased differently, positive and negative 1.96 cut off the most extreme 5% of the standard normal distribution.

alpha level of 5% for a ONE-tailed test gives...

5% (0.05) in one tail of the distribution. When you look this up, you get 1.645 (or 1.65) for your z score (negative if the hypothesized direction is a decrease).
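If a z table isn't handy, these cutoffs can be looked up with the inverse of the standard normal distribution; a small sketch with the 0.05 alpha level as the only input:

```python
from scipy.stats import norm

alpha = 0.05

# Two-tailed: split alpha across the two tails (2.5% each)
print(norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2))   # about -1.96 and 1.96

# One-tailed: put all 5% in one tail
print(norm.ppf(1 - alpha))                             # about 1.645
```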

what is true about a distribution of means vs scores?

A distribution of means is more tightly clustered (has a smaller standard deviation) than a distribution of scores. AND: A distribution of means has the same mean as a distribution of scores from the same population, BUT a smaller standard deviation.
- Why? When we plotted individual scores, each extreme score was plotted on the distribution. But when we plotted means (of samples of 3), we averaged each extreme score with two other scores. So each time we pulled a score in the 7s, we tended to pull two lower scores as well; when we pulled a low score, such as 2.905, we tended to pull two higher scores as well.
- What would happen if we created a distribution of means of 10 scores rather than 3? The distribution would be even narrower, because there would be more scores to balance the occasional extreme score.

standard normal distribution

A normal distribution with a mean of 0 and a standard deviation of 1. - notation: N(0,1) - a normal distribution of z scores - mean, median, and mode = 0 - variance = 1

Null Hypothesis (H0)

A statement of "no difference." - the "boring" one that posits no change or no difference between groups.

Three Assumptions for Conducting Analyses for Parametric Tests (z tests)

Assumptions: characteristics that we require the POPULATION from which we're sampling to have so that we can make accurate inferences.
Assumption 1: The dependent variable is assessed using a scale measure. If it's clear that the dependent variable is nominal or ordinal, we can't make this first assumption and thus shouldn't use a parametric hypothesis test.
Assumption 2: The participants are randomly selected. Every member of the population must have an equal chance of being selected for the study. This assumption is often violated; it is more likely that participants are a convenience sample. If we violate this second assumption, we must be cautious when generalizing from a sample to the population.
Assumption 3: The distribution of the population of interest must be approximately normal. Because hypothesis tests deal with sample means rather than individual scores, as long as the sample size is at least 30 (recall the discussion of the CLT), it is likely that this third assumption is met.
***ALSO: the independent variable must be nominal.

what conclusion can we reach from obtaining a percentile ranking of 98.75%?

Based on this percentage, the average rating of a sample of women at your university is quite high. But we still can't arrive at a conclusion about whether these women are rated as significantly more attractive compared to the population average of 2.5 until we conduct a hypothesis test.

HARKing

Hypothesizing After the Results are Known - presenting a post hoc hypothesis as if it were a priori; knowledge of the study results alters the set of hypotheses advanced in the report's introduction

confidence interval script

If we were to sample 20 U.S. teens (the sample size) from the same population over and over, we would expect the resulting intervals to include the population mean 95% of the time. Thus, a confidence interval provides a range of plausible values for the population mean.
- The problem's confidence interval was 8.12-9.88. Note that the population mean for U.S. adults, 11, falls outside of this interval. So it is not plausible that the sample of U.S. teens comes from the null-hypothesis population of U.S. adults: reject the null hypothesis.
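A sketch of the interval calculation behind this script. The sample mean (9.0) is just the midpoint of the reported 8.12-9.88 interval, and the standard error (0.45) is back-calculated from it, so treat both numbers as illustrative assumptions:

```python
from scipy.stats import norm

M = 9.0            # sample mean (midpoint of the reported 8.12-9.88 interval)
std_error = 0.45   # back-calculated from the interval; treat as illustrative
confidence = 0.95

z_crit = norm.ppf(1 - (1 - confidence) / 2)    # about 1.96
lower, upper = M - z_crit * std_error, M + z_crit * std_error
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")   # about [8.12, 9.88]

# The adult population mean of 11 falls outside this interval,
# so it is not plausible that the teen sample comes from that population.
```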

EXAMPLE: Let's calculate z scores without a calculator or formula. If the mean on a statistics exam is 70, the standard deviation is 10, and your score is 80, what is your z score? If your score is 50? What if your score is 85?

In this case, you are exactly 10 points, or 1 standard deviation, above the mean, so your z score is 1.0. - Now let's say your score is 50, which is 20 points, or 2 standard deviations, below the mean, so your z score is −2.0. - Now you're 15 points, or 1.5 standard deviations, above the mean, so your z score is 1.5.
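The same arithmetic as a tiny function, using the exam numbers from the example (mean 70, standard deviation 10):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations a raw score x falls from the mean."""
    return (x - mu) / sigma

mu, sigma = 70, 10            # exam mean and standard deviation from the example
for score in (80, 50, 85):
    print(score, z_score(score, mu, sigma))   # 1.0, -2.0, 1.5
```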

As sample size increases, so does the test statistic (if all else stays the same), and it becomes easier to reject the null hypothesis. This is how a small but statistically significant correlation can occur.

Increasing sample size always increases the test statistic if all else stays the same (the standard error decreases and the test statistic increases) - Because of this, a small difference might not be statistically significant with a small sample but might be statistically significant with a large sample.

Meeting the assumptions improves the quality of research, but not meeting the assumptions doesn't necessarily invalidate research.

Meeting the assumptions improves the quality of research, but not meeting the assumptions doesn't necessarily invalidate research.

Parametric vs Nonparametric Tests

Parametric: inferential statistical analyses based on a set of assumptions about the population Nonparametric: inferential statistical analyses that are not based on a set of assumptions about the population

point estimates

Single sample values used to estimate population parameters - a summary statistic from a sample that is just one number used as an estimate of the population parameter - rarely exactly accurate

Comparing apples to oranges (turning everything to mangoes)

Standardization allows us to compare apples with oranges. If we can standardize the raw scores on two different scales, converting both scores to z scores, we can then compare the scores directly. - We can take any apple from a normal distribution of apples, find its z score using the mean and standard deviation for the distribution of apples, convert the z score to a percentile, and discover that a particular apple is, say, larger than 85% of all apples. Similarly, we can take any orange from a normal distribution of oranges, find its z score using the mean and standard deviation for the distribution of oranges, convert the z score to a percentile, and discover that this particular orange is, say, larger than 97% of all oranges. The orange (with respect to other oranges) is bigger than the apple (with respect to other apples), and yes, that is an honest comparison. With standardization, we can compare anything, each relative to its own group.
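A sketch of the apples-and-oranges comparison. The means, standard deviations, and the two particular fruits are hypothetical numbers chosen so the percentiles land near the 85% and 97% mentioned above:

```python
from scipy.stats import norm

# Hypothetical distributions (all numbers made up for illustration)
apple_mean, apple_sd = 150, 20     # grams
orange_mean, orange_sd = 200, 25   # grams

apple, orange = 171, 247           # one particular apple and one particular orange

z_apple = (apple - apple_mean) / apple_sd        # about 1.05
z_orange = (orange - orange_mean) / orange_sd    # about 1.88

# Convert each z score to a percentile within its own distribution
print(norm.cdf(z_apple))    # about 0.85 -> larger than ~85% of apples
print(norm.cdf(z_orange))   # about 0.97 -> larger than ~97% of oranges
```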

Misinterpreting Statistical Significance

Statistical significance that is achieved by merely collecting a large sample can make a research finding appear to be far more important than it really is (like how a curved mirror can exaggerate a person's size)

determine the percentage associated with a given z statistic

Step 1: Convert a raw score into a z score. Step 2: Look up a given z score on the z table to find the percentage of scores between the mean and that z score.
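Step 2 can also be done in software instead of a printed z table; a minimal sketch (the z score of 1.0 is arbitrary):

```python
from scipy.stats import norm

z = 1.0   # arbitrary z score for illustration
# Percentage of scores between the mean and this z score (what the z table lists)
print((norm.cdf(z) - 0.5) * 100)   # about 34.13%
```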

The scores at least as extreme as a particular z score

THIS IS IN BOTH DIRECTIONS. EX. Jessica has a z score of 0.98. What percentage of scores are at least as extreme as hers? 16.35% lies beyond z = 0.98 in one tail, but you must remember to multiply by 2 because the curve is symmetric. ANSWER: 32.7%
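The same two-tailed calculation for Jessica's z score, sketched in code:

```python
from scipy.stats import norm

z = 0.98
one_tail = 1 - norm.cdf(z)     # about 0.1635: proportion beyond z in one direction
both_tails = 2 * one_tail      # about 0.327: at least as extreme in either direction
print(f"{both_tails * 100:.1f}%")   # 32.7%
```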

Confidence level vs confidence interval

The confidence LEVEL is 95%, but the confidence INTERVAL is the range between the two values that surround the sample mean

file drawer problem

The idea that reviews and meta-analyses of published literature might overestimate the support for a theory, because studies finding null effects are less likely to be published than studies finding significant results, and are thus less likely to be included in such reviews. SOLUTIONS: 1. file drawer analysis: a statistical calculation, following a meta-analysis, of the number of studies with null results that would have to exist so that a mean effect size would no longer be statistically significant 2. replication or reproducibility

alpha levels (p levels)

The probabilities used to determine the critical values, or cutoffs, in hypothesis testing. TYPE I ERROR RATE (set before the study) - the "acceptable" risk of a Type I error when we run a study and conduct analyses. By convention, this is usually 0.05, or 5% (a 5% rate of Type I errors).
- The alpha level that we set, along with the type of test (one-tailed or two-tailed), lets us determine our critical value(s) and critical region(s).

Making a Decision About a Hypothesis

When we decide to fail to reject the null hypothesis, we can only say that we do not have evidence to support our research hypothesis. Whichever decision we make (reject or fail to reject), we are inferring something about population means based on sample means.

when do we use z tests in real life?

We only use z tests when we have one sample and know the population mean and standard deviation (or standard error), a rare situation. ex. do people consume fewer calories when they know how many calories are in their favorite latte and doughnut? We can use the z test to compare the average numbers of calories that customers consume when calorie counts are either posted or not posted on their menu boards

distribution of means (3 properties)

a distribution composed of many means that are calculated from all possible samples of a given size, all taken from the same population - basically, the numbers that make up the distribution of means are not individual scores; they are means of samples of individual scores
****A distribution of means reduces the influence of individual outliers.****
****The larger the sample size, the smaller the spread of the distribution of means.****
1. As sample size increases, the mean of a distribution of means remains the same.
2. The standard deviation of a distribution of means (called the standard error) is smaller than the standard deviation of a distribution of scores. As sample size increases, the standard error becomes ever smaller.
3. The shape of the distribution of means approximates the normal curve if the distribution of the population of individual scores has a normal shape or if the size of each sample that makes up the distribution is at least 30 (the central limit theorem).

Central Limit Theorem (CLT)

a distribution made up of the MEANS of many samples (rather than individual scores) approximates a normal curve (although with less variance), even if the underlying population is not normally distributed, provided the samples are composed of AT LEAST 30 scores
2 principles:
1. Repeated sampling approximates a normal curve, even when the original population is not normally distributed.
2. A distribution of means is less variable than a distribution of individual scores.
****reduces the influence of individual outliers****
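A quick simulation of the CLT under assumed conditions (a uniform, clearly non-normal population and samples of size 30). The numbers are made up, but the pattern is the point: the means look normal, share the population's mean, and have a spread near sigma divided by the square root of N:

```python
import numpy as np

rng = np.random.default_rng(0)

# A clearly non-normal population: uniform on [0, 10], mean 5, SD about 2.89.
# Draw 10,000 samples of size 30 and take the mean of each one.
sample_means = rng.uniform(0, 10, size=(10_000, 30)).mean(axis=1)

print(sample_means.mean())   # about 5.0 -> same mean as the population
print(sample_means.std())    # about 2.89 / sqrt(30) = 0.53 -> the standard error
# A histogram of sample_means would look approximately normal (the CLT)
```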

cohen's d

a measure of effect size that expresses the difference between two means in terms of standard deviation (the standardized difference between means) - FOR Z TESTS: small effect (0.2), medium effect (0.5), large effect (0.8)
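A minimal sketch of Cohen's d for a z test; the sample mean, population mean, and population standard deviation below are hypothetical:

```python
def cohens_d(sample_mean, population_mean, population_sd):
    """Cohen's d for a z test: the difference between means in standard-deviation units.
    Note it uses the population standard deviation of scores, not the standard error."""
    return (sample_mean - population_mean) / population_sd

d = cohens_d(105, 100, 15)   # hypothetical numbers -> d of about 0.33, a small-to-medium effect
print(d)
```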

z distribution

a normal distribution of standardized scores—a distribution of z scores. - we convert raw scores to z scores and z scores to percentiles using the z distributions - z distribution forms a normal curve with a unimodal, symmetric shape - 100% of the population falls beneath the normal curve

meta-analysis

a study that involves the calculation of a mean effect size from the individual effect sizes of more than one study - provides added statistical power by considering multiple studies simultaneously and helps to resolve debates fueled by contradictory research findings
STEPS:
1. Select the topic of interest and decide exactly how to proceed before beginning to track down studies.
2. Locate every study that has been conducted and meets the criteria. A key part of meta-analysis is finding studies that have been conducted but not published (often unpublished because the studies didn't find a significant difference); the overall effect size seems larger without these studies.
3. Calculate an effect size, often Cohen's d, for every study.
4. Calculate statistics - ideally, summary statistics, a hypothesis test, a confidence interval, and a visual display of the effect sizes. Calculate a MEAN EFFECT SIZE.

Standardization

a way to create meaningful comparisons by converting individual scores from different normal distributions to a shared normal distribution with a known mean, standard deviation, and percentiles, USING Z SCORES - different variables' raw scores -> z scores -> percentage of the population that falls above or below the z score

confidence interval

an interval estimate based on a sample statistic; it includes the population mean a certain percentage of the time if the same population is sampled from repeatedly
- We are NOT saying that we are confident that the population mean falls in the interval; we are merely saying that we expect to find the population mean within a certain interval a certain percentage of the time, usually 95%, when we conduct this same study with the same sample size.
- Centered on the mean of the SAMPLE, indicating the 95% that falls between the two tails (i.e., 100% − 5% = 95%).
- The range of values within which a population parameter is estimated to lie.

statistically significant

an observed effect so large that it would rarely occur by chance - the data differ from what we would expect by chance if there were, in fact, no actual difference (null hypothesis is true) - does not necessarily mean that the finding is important or meaningful.

interval estimate

based on a sample statistic and provides a range of plausible values for the population parameter - adding and subtracting a margin of error from a point estimate ex. "Whatever" was chosen by 47% of respondents with a margin of error of 3.2%, so the interval estimate is 43.8% to 50.2% - when intervals overlap: this means that it's plausible that the two phrases are perceived as equal—in this case, equally annoying—in the population. - when intervals DON'T overlap: we conclude that the population means are likely different. In the population, it seems that "whatever" really is more annoying than "you know."
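The "whatever" example as simple arithmetic:

```python
point_estimate = 47.0     # percent of respondents choosing "whatever" (from the example)
margin_of_error = 3.2

interval = (point_estimate - margin_of_error, point_estimate + margin_of_error)
print(interval)   # (43.8, 50.2)
```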

normal curve

bell-shaped curve that is unimodal, symmetric, and defined mathematically
- can be used to identify cheaters in the form of extreme scores (tails)
- determines probabilities about data and lets us draw conclusions
- as sample size increases, the shape of the distribution becomes more and more like a normal curve (as long as the underlying population distribution is normal)

even though we had nondirectional hypotheses, we can report the direction of the finding

even though we had nondirectional hypotheses, we can report the direction of the finding

type 2 error

failing to reject a false null hypothesis. An effect does exist out in the world, but we don't detect it. - we commit a Type II error when we fail to reject the null hypothesis given the null hypothesis is false. - finding nothing when something is there !!!!!! - results in a failure to take action ex. FALSE NEGATIVE in medical testing. getting a negative result on pregnancy test (failing to reject) but you're actually pregnant

cohen's d script

for example, let's say d = -1.00: we know that the two means are 1.00 standard deviation apart

reject the null hypothesis script

given that the null hypothesis is true, data as extreme or more extreme than ours are unlikely to have occurred. Therefore, reject H0 - conditional probability

two-tailed test

hypothesis test in which the research hypothesis does not indicate a direction of the mean difference or change in the dependent variable, but merely indicates that there will be a mean difference. - more common than one-tailed tests

one-tailed test

hypothesis test in which the research hypothesis is directional, positing either a mean decrease or a mean increase in the dependent variable, but not both, as a result of the independent variable - used only when the researcher is absolutely certain that the effect cannot go in the other direction OR the researcher would not be interested in the result if it did. EX. H0: μ1≥μ2 H1: μ1<μ2

If a researcher always sets the critical region as 8% of the distribution, and the null hypothesis is true, how often will he reject the null hypothesis if the null hypothesis is true?

if the null hypothesis is true, he will reject it 8% of the time

Effect size (Cohen's d script)

indicates the size of a difference and is UNAFFECTED by sample size (the strength of an association)
- speaks to practical importance (though not always; a small effect could still have large importance)
- tells us whether a statistically significant difference might also be an important difference; generally we want a big effect size
- based only on the variability in the distribution of scores (so we can compare the effect sizes of different studies with each other, even when the studies have different sample sizes)
***tells us how much two populations do not overlap; the less overlap, the bigger the effect size
****a standardized measure based on distributions of scores rather than distributions of means

with smaller samples we are more likely to...

make Type I AND Type II errors. Solution? Use a power calculator, such as G*Power, to plan your sample size based on the expected effect size before the study begins.

p value

the probability of obtaining a test statistic at least this extreme if the null hypothesis is true - that is, if there is no difference between means
- a p value of 0.05 is the same as 5%

Statistical power

the probability that we will reject the null hypothesis when we should reject the null hypothesis - the probability that we will not make a Type II error
- Researchers consider a probability of 0.80 (an 80% chance of rejecting the null hypothesis if we should reject it) to be the minimum for conducting a study.
- Conduct a power analysis prior to conducting a study.
- Conceptually, Power = Effect Size × Sample Size. This means we could achieve high power because the effect is large, or because the sample is large even when the effect is small.
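A rough sketch of a power calculation for a one-tailed one-sample z test. This is a simplified hand calculation, not a full G*Power analysis, and the effect sizes and sample sizes are made up:

```python
from scipy.stats import norm

def z_test_power(d, n, alpha=0.05):
    """Approximate power of a one-tailed one-sample z test.
    d = expected effect size (Cohen's d), n = sample size."""
    z_crit = norm.ppf(1 - alpha)            # cutoff under the null distribution
    # Under the alternative, the z statistic is centered at d * sqrt(n)
    return 1 - norm.cdf(z_crit - d * (n ** 0.5))

print(z_test_power(d=0.5, n=25))   # about 0.80 with a medium effect and n = 25
print(z_test_power(d=0.5, n=50))   # larger sample -> higher power
```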

how do z scores relate to raw scores and percentile ranks?

raw scores are used to compute z scores, and z scores are used to determine what percentage of scores fall below and above that particular position on the distribution. a z score can also be used to compute a raw score

significant

reject the null

Why doesn't the z-table tell you probabilities for negative z-scores?

remember that the normal curve is symmetric: One side always mirrors the other. So, the percentages are the same for negative and positive z scores.

forest plot

shows the confidence interval for the effect size of every study

Larger sample size means what?

smaller standard error

Research Hypothesis (H1, alternative hypothesis)

the "exciting" one that posits that a given intervention will lead to a change or a difference ex. for instance, that a particular kind of psychotherapeutic intervention will reduce general anxiety.

critical region

the area in the tail(s) of the comparison distribution in which the null hypothesis can be rejected - often shown as a shaded region in figures

z-score

the number of standard deviations a particular score is from the mean EX. your score on midterm is 2 standard deviations above mean, your z-score is 2.0...a way to create meaningful comparisons - z-score is part of its own distribution, the z distribution (like a raw score is part of its own distribution ex. a person's height part of a distribution of heights) - z distribution has a mean of 0, always has a standard deviation of 1 - we need mean and standard deviation of population of interest - z scores can be transformed into percentiles.

critical values (cutoffs)

the test statistic values beyond which we reject the null hypothesis. - critical values of the comparison distribution indicate how extreme the data must be, in terms of the z statistic, to reject the null hypothesis - we determine two cutoffs: one for extreme samples below the mean and one for extreme samples above the mean. arbitrary standard—the most extreme 5% of the comparison distribution curve: 2.5% on either end. ex. 1.96 and -1.96 are critical values

σM (Standard error: standard deviation of the distribution of means)

the typical amount that a sample mean varies from the population mean. - standard error is the name for the standard deviation of a distribution of means. - extreme scores are balanced by less extreme scores when means are calculated

p-hacking

the use of questionable research practices to increase the chances of achieving a statistically significant result
- e.g., creating criteria for removing certain scores (outliers, for example) after initial analyses have been performed, rather than before; or analyzing the data repeatedly throughout the data collection process (after collecting data from 30 participants, then 60, and so on) and stopping data collection once the null hypothesis is rejected
- researchers "play" with their data until it edges under the arbitrary criterion of p < 0.05

Robust hypothesis tests

those that produce fairly accurate results even when the data suggest that the population might not meet some of the assumptions - pretty accurate even if you violate the assumptions

As the size of the sample approaches the size of the population, the shape of the distribution tends to be normally distributed.

true

Type 1 error

we commit a Type I error when we reject the null hypothesis given that the null hypothesis is true
ex. FALSE POSITIVE in a medical test (finding something when nothing is there!!)
- results in taking action when you didn't need to (e.g., telling family you are pregnant when you are not), which is often more hurtful

why can we make meaningful comparisons with the normal curve?

when data are normally distributed, we can compare one particular score to an entire distribution of scores BY converting the raw score to a standardized score (for which percentiles are already known)

A consumer advocate is interested in evaluating the claim that a new granola cereal contains 4 ounces of cashews in every bag. The advocate recognizes that amounts of cashews will vary slightly from bag to bag, but she suspects that the mean amount of cashews per bag is less than 4 ounces. To check the claim, the advocate purchases a random sample of 40 bags of cereal and calculates a sample mean of 3.68 ounces of cashews. She calculates the z score as -2.59 for testing H0: μ = 4 versus Ha: μ < 4. Can she conclude that the bags are incorrectly labeled at an alpha level of 0.05?

yes
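The decision can be checked directly from the reported z of -2.59 (one-tailed, alpha = 0.05):

```python
from scipy.stats import norm

z = -2.59          # the advocate's calculated z statistic
alpha = 0.05

# One-tailed test (Ha: mu < 4), so the critical value is in the lower tail
z_crit = norm.ppf(alpha)      # about -1.645
p_value = norm.cdf(z)         # about 0.0048

print(z < z_crit, p_value < alpha)   # True, True -> reject H0; the bags appear underfilled
```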

Z-statistic formula (Z-score for a sample mean, rather than for an individual score)

z = (M − μM) / σM - tells us how many standard errors a sample mean is from the population mean.
M = mean of the sample; μM = mean of the distribution of means; σM = standard error (the standard deviation of the distribution of means)
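A small sketch of the formula in code; the sample mean, population parameters, and sample size are hypothetical:

```python
def z_statistic(sample_mean, mu, sigma, n):
    """z statistic for a sample mean: how many standard errors it falls from mu."""
    std_error = sigma / n ** 0.5        # sigma_M = sigma / sqrt(N)
    return (sample_mean - mu) / std_error

# Hypothetical values for illustration
print(z_statistic(sample_mean=105, mu=100, sigma=15, n=36))   # 2.0
```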

steps of calculating z-score formula

z = (X − μ) / σ, where X = particular score, μ = population mean, σ = population standard deviation

μM meaning

μM = mean of a distribution of means - μ indicates that it's the mean of a population, & the subscript M indicates that the population is composed of sample means

formula that lets us know exactly how much smaller the standard error is compared to the standard deviation (standard error calculation)

σM = σ / √N
- This occurs because the larger the sample size, the narrower the distribution of means and the smaller the standard deviation of the distribution of means (the standard error), because outliers have less influence.
σ = standard deviation of the population; N = sample size
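A quick illustration of how the standard error shrinks as N grows (the population standard deviation of 15 is hypothetical):

```python
sigma = 15   # hypothetical population standard deviation

for n in (4, 25, 100):
    print(n, sigma / n ** 0.5)   # 7.5, 3.0, 1.5 - the standard error shrinks as N grows
```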

