Chapter 6: How Do We Know When to Reject the Null Hypothesis? Alpha, P-Values, the Normal Distribution, Confidence Intervals, Power, and Effect Sizes


What Affects the CI?

• The population standard deviation • The sample size • The level of confidence (determined by alpha)

Some factors affecting power:

• alpha • the magnitude of the effect or the true difference • the standard deviations of the distributions • the sample size

Using a free online power calculator supplied by Statistical Solutions, we have the following result:

Calculate Sample Size [+] / Calculate Power: choose which calculation you desire, either to determine a power or a sample size. (See more on the power of the test below.)
Enter a value for mu(0) [80]: this should be the "known" mean value for your population.
Enter a value for mu(1) [76]: this should be the "expected" mean value from your sample. The delta between mu(0) and mu(1) is what you should consider a significant difference for the test.
Enter a value for sigma [8]: this should be the known sigma (standard deviation) for the population.

The Central Limit Theorem and the Normal Distribution (2)

The normal curve is sectioned by standard deviations. If we take random samples from a population, the area under the bell curve represents the probability of obtaining a particular sample mean. For example, 68 percent of the means taken from random samples of the population will fall within plus or minus one standard deviation (of the sampling distribution) of the population mean.
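The 68 percent figure can be checked with a quick simulation. One caution: for sample means, "one standard deviation" refers to the standard deviation of the sampling distribution, i.e., the standard error sigma/sqrt(n). All numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 100.0, 15.0, 25      # hypothetical population mean, SD, and sample size
se = sigma / np.sqrt(n)             # SD of the sampling distribution (standard error)

# Draw 100,000 random samples and record each sample mean.
means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

# Fraction of sample means within +/- one standard error of the population mean.
within_1_sd = np.mean(np.abs(means - mu) <= se)
print(round(within_1_sd, 3))        # close to 0.683
```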

Alpha (Significance Level) (3)

We cannot conclude that we proved or disproved a hypothesis. If the statistical test indicates that the probability of obtaining a data set is less than an alpha level of .05, it simply means that such a data set would occur by chance less than 5 percent of the time.

Figure 3 depicts distribution curves from two samples and three samples.

• Note that the mean values are not equal and that the distribution curves overlap. • The question researchers try to answer is: "Given the means and standard deviations for each data set, would we find this amount of difference between data sets more than 5% of the time?" • If so, the statistical analyses would produce a p-value greater than .05. • If this divergence of data sets occurs 5% of the time or less, researchers may postulate that the divergence is due to the unique characteristics of each data set.

What affects the test statistic?

• The size of the samples • The alpha level (the level of confidence set by the researcher; see below) • The differences between the measures of central tendency (means or medians)

Confidence Intervals (3)

When we are testing the null hypothesis using a t-test, we can also use the CI of the mean difference to determine if we have statistical significance. If the 95% CI of the mean difference does not contain zero (the value specified by the null hypothesis), we can conclude that we have significance at the .05 level. Why? Because the CI indicates an estimate of a range of mean-difference values that lie within the center 95% of the sampling distribution.
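As a sketch of this relationship (with made-up data), the t-test p-value and the 95% CI of the mean difference give the same verdict: p < .05 exactly when the interval excludes zero:

```python
import numpy as np
from scipy import stats

# Two illustrative (invented) blood pressure samples.
a = np.array([128, 131, 125, 136, 130, 127, 133, 129])
b = np.array([120, 124, 118, 126, 121, 119, 125, 122])

t, p = stats.ttest_ind(a, b)        # pooled two-sample t-test

# 95% CI for the mean difference, using the pooled standard error.
diff = a.mean() - b.mean()
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
tcrit = stats.t.ppf(0.975, df=n1 + n2 - 2)
lo, hi = diff - tcrit * se, diff + tcrit * se

# The two criteria agree: p < .05 exactly when the CI excludes zero.
print(p < 0.05, (lo > 0) or (hi < 0))
```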

Alpha (Significance Level)

• Alpha (α) is the probability of a Type I error given that the null hypothesis is true. • Because of the many factors that can adversely affect the statistical outcome of a research study, there is always a degree of chance that researchers take. Researchers decide on an alpha level prior to data analysis. In most cases, alpha is set at .05 (p<.05).

Darouiche, R. O., Wall, M. J., Jr., Itani, K. M. F., Otterson, M. F., Webb, A. L., Carrick, M. M., Miller, H. J., Awad, S. S., Crosby, C. T., Mosier, M. C., AlSharif, A., & Berger, D. H. (2010). Chlorhexidine-Alcohol versus Povidone-Iodine for Surgical-Site Antisepsis. N Engl J Med, 362, 18-26 (January 7, 2010).

"The average baseline rate of surgical-site infection at the six participating hospitals was 14% after clean-contaminated surgery with povidone-iodine skin preparation, and we estimated that substituting chlorhexidine-alcohol for povidone-iodine would reduce this rate to 7%. Therefore, we planned to enroll approximately 430 patients in each study group who could be evaluated in order for the study to have 90% power to detect a significant difference in the rates of surgical-site infection between the two groups, at a two-tailed significance level of 0.05 or less." (p.20)

Lautenschlager, N. T., Cox, K. L., Flicker, L., Foster, J. K., van Bockxmeer, F. M., Xiao, J., Greenop, K. R., & Almeida, O. P. (2008). Effect of Physical Activity on Cognitive Function in Older Adults at Risk for Alzheimer Disease: A Randomized Trial. JAMA, 300(9) (September 3, 2008).

"We collected 12-month prospective ADAS-Cog data on an independent sample of older adults with subjective memory complaints living in Perth. The results showed a mean (SD) increase of 3.5 (4.5) points over that period. Because the participants were more likely to be at risk of impairment than those in the independent sample, we then estimated that the ADAS-Cog scores of participants not receiving the physical activity intervention would deteriorate an additional 2.5 points (total, 6.0 points; SD, 4.5) per year. This is the smallest difference considered to be clinically meaningful in clinical treatment trials. The participation of 84 volunteers in each of the 2 groups (n=168) at baseline resulted in power of 90% with alpha set at .05. We estimated a dropout rate of 20%, which led to the recruitment of 170 participants with a power of 80% (85 randomly allocated to each group)." (p. 1030)

The Central Limit Theorem and the Normal Distribution

Alpha levels and p-values are rooted in the central limit theorem. The central limit theorem states that given large enough samples from a population, the distribution of sample means taken from this population will approach a normal distribution (bell curve or normal curve).
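A minimal simulation of the theorem, using an invented exponential (non-normal) population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A clearly non-normal population: exponential (strongly right-skewed).
population = rng.exponential(scale=2.0, size=1_000_000)

# Means of 20,000 random samples of size 50 from the same distribution.
means = rng.exponential(scale=2.0, size=(20_000, 50)).mean(axis=1)

# The distribution of sample means is far less skewed, i.e., closer to normal.
print(round(stats.skew(population), 2), round(stats.skew(means), 2))
```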

Confidence Intervals (4)

It is much more common to use p-values instead of CIs to determine statistical significance. However, when dealing with odds ratios and relative risks, CIs are used in concert with p-values. (See the section on odds ratios and relative risks.) A 95% confidence interval indicates that there is a 95% chance that the population parameter is contained within the CI.

Alpha (Significance Level) (2)

NOTE: The p-value must be less than .05 (when alpha is set at .05). The alpha level indicates the maximum probability of obtaining a particular set of data, given that the null hypothesis is correct, that the researcher is willing to accept. If a statistical test indicates that the probability of obtaining a particular data set was greater than the alpha level set by the researcher, we fail to reject the null hypothesis. If the statistical test indicates that the probability of obtaining a particular data set was equal to or less than the alpha level, the null hypothesis is rejected. But we must exercise caution.

alpha and p-values

Ninety-five percent of the area under the bell curve is contained within +/- 1.96 standard deviations. The remaining five percent lies outside +/- 1.96 standard deviations, with 2.5 percent of the area residing in each tail of the curve. So, if our sample mean is converted to a z-score, and that z-score is equal to or beyond +/- 1.96, we can conclude that such a z-score would occur by chance only 5 percent of the time.
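These areas can be confirmed directly from the normal distribution:

```python
from scipy import stats

# Critical z for a two-tailed test at alpha = .05: the value that leaves
# 2.5% of the area in each tail.
z_crit = stats.norm.ppf(0.975)
print(round(z_crit, 2))             # 1.96

# Area under the curve between -1.96 and +1.96.
central_area = stats.norm.cdf(1.96) - stats.norm.cdf(-1.96)
print(round(central_area, 3))       # 0.95
```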

Using a free online power calculator supplied by Statistical Solutions, we have the following result: (2)

One-sided test / Two-sided test [+]: choose whether your test is based on one-sided or two-sided criteria. A single specification limit or pass/fail is one-sided; an upper and lower specification is two-sided.
Enter a value for alpha (a) (default is .05) [.05]: the outcome of the test depends on whether the null hypothesis (H0) is true or false and whether you reject or fail to reject it. When H0 is true and you reject it, you make a Type I error. The probability (p) of making a Type I error is called alpha (a), or the level of significance of the test. When H0 is false and you fail to reject it, you make a Type II error. The probability (p) of making a Type II error is called beta (b). (See more on Type I and Type II errors below.)

Using a free online power calculator supplied by Statistical Solutions, we have the following result: (3)

Power of the test (default is .80) [.80] The power of a test is the probability of correctly rejecting H0 when it is false. In other words, power is the likelihood that you will identify a significant effect when one exists. If you are solving for power, leave this field blank. If you are solving for sample size, it is recommended to leave this field at the default value of .80 and the associated sample size will be generated when you click the calculate button.

alpha and p-values (2)

Recall that the p-value indicates the probability level for a statistical occurrence. If we compare a sample mean to the population mean and the sample mean is equal to or greater than +/-1.96 standard deviations away from the population mean, we can conclude that there was a 5% or less chance that this difference randomly occurred. It is this logic that allows researchers to postulate that something else may have caused the sample mean to be so different.

Using a free online power calculator supplied by Statistical Solutions, we have the following result: (4)

Sample size for the test [56]: if you are solving for sample size, leave this field blank. If you are solving for power, enter your desired sample size and the associated power will be generated when you click the calculate button.

• We would need 56 subjects in our study to have a power of .80.

Problems with Null Hypothesis Significance Testing (2)

Schmitz (2007) further states that in order to gain any useful information from NHST, researchers must ensure that: • "Randomization has occurred." • "Samples are of a reasonable size, neither too small nor too large to generate meaningless results." • "A limited number of variables are being examined." • "The selected alpha level takes into consideration the type of research and hypotheses being tested." • "P-values are accompanied by measures of effect size and/or confidence intervals."

Problems with Null Hypothesis Significance Testing

Schmitz (2007) highlighted criticisms of null hypothesis statistical testing (NHST). They are: "1. Researchers want to know if the null hypothesis is true or false. NHST tells us the probability of obtaining our data set given the null hypothesis is true. Note the subtle difference between the objective and the output. 2. All data sets are different. As we begin to understand how we use the number of subjects and the number of comparisons to determine the cut off for reaching statistical significance, we will find that given a large enough sample, researchers will eventually reach significance."

Confidence Intervals

The 95% confidence interval (CI) indicates the observed range within which 95 of 100 sample statistics (means, medians) taken from this population would fall. (Note: CI also symbolizes cumulative incidence, a biostatistical measure.) CIs are good estimates of the range in which we may find the population parameter.

Confidence Intervals (2)

The CI is always accompanied by a confidence level, usually the 95% confidence level. This confidence level is similar to selecting .05 as an alpha level. For example, we may find a study reporting a Relative Risk = 1.75 (95% CI, 1.11-2.34). We can interpret this RR as indicative of a 75% increase in risk, with 95% confidence that the population increase in risk is somewhere between 11% and 134%. (This interpretation is subject to debate.)

Effect Size (3)

This measure (d, Cohen's d, D, or delta) is often referred to as the standardized effect size, calculated as the mean of group A minus the mean of group B divided by the pooled standard deviation. Interpretation of effect size has been defined as: 0.2 = small effect; 0.5 = medium effect; 0.8 = large effect. • It would be extremely rare to find two samples that do not have any overlap. An effect size of .8 indicates roughly 50 percent overlap between the distributions. • When evaluating correlation outcomes, .2 is considered weak, .5 moderate, and .8 strong.
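A small sketch of the calculation just described (the data are simulated for illustration):

```python
import numpy as np

def cohens_d(a, b):
    """Standardized effect size: mean difference / pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Illustrative: group A sits half a pooled SD above group B, so d is near 0.5 (medium).
rng = np.random.default_rng(2)
a = rng.normal(105, 10, 500)
b = rng.normal(100, 10, 500)
print(round(cohens_d(a, b), 2))
```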

Effect Size

• As noted, we have statistical significance when the p-value is less than the alpha level set by the researcher. • We also know that the p-value only addresses the probability of finding such a sample as it relates to the population distribution. • If our p-value is less than .05, we can conclude that given a null hypothesis that is true, we would only find such a sample data set less than five percent of the time.

Figure 3 depicts distribution curves from two samples and three samples. (3)

• But again, this significant p-value does not mean that there is a real difference between the two data sets. • The p-value below .05 indicates that the sample data set would only occur less than five percent of the time if the null hypothesis is correct. • Similar logic is used to detect the probability associated with more than two data sets. See diagram above.

The Central Limit Theorem and the Normal Distribution (3)

• Examination of the percentage of area contained within +/- two standard deviations indicates that 95.4 percent of the sample means exist within this area of the normal curve. • The probability of obtaining a mean value residing outside +/- two standard deviations is only 4.6 percent.

Figure 3 depicts distribution curves from two samples and three samples. (2)

• For example, assume that the solid-line curve represents blood pressure values from the entire population, and the dotted curve represents blood pressure measures from a sample of individuals who are clinically obese. Do the sample's blood pressure measures differ from the population's? We could use the data to conduct a z-test or a one-sample t-test. This test considers the means and the variances of each group and produces a p-value indicating the probability of observing this much difference by chance. With alpha set at .05 (p<.05), a p-value of .049 or less would indicate significance.
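A sketch of such a one-sample test, with invented blood pressure readings and an assumed population mean of 80 mm Hg:

```python
import numpy as np
from scipy import stats

# Illustrative diastolic BP readings from a sample of obese individuals
# (made-up numbers); the population mean of 80 mm Hg is assumed.
sample = np.array([84, 88, 91, 79, 86, 90, 83, 87, 85, 92])

t_stat, p_value = stats.ttest_1samp(sample, popmean=80)
print(round(t_stat, 2), round(p_value, 4))

# With alpha = .05, reject the null when p < .05.
print("significant" if p_value < 0.05 else "not significant")
```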

P-Values

• Many statistical tests are designed to produce a p-value (probability value). • The p-value indicates the degree of extremeness associated with the data set, assuming the null hypothesis is true. • The p-value is compared to the alpha level set by the researcher. • As stated by Schmitz (2007), "the p-value does not mean the probability that the null hypothesis is correct. It (the p-value) indicates the probability of obtaining our data, assuming the null is correct."

More About Power

• More power is needed if alpha is set below .05 (p < 0.01 or p < 0.001). • If the difference between data sets is small, more power is required to detect such a difference. • Everything else being equal, the more subjects, the more power. • To increase power from 0.80 to 0.90, multiply N by 1.33. • To move alpha from α = 0.05 to α = 0.01, multiply N by 1.5.
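These multipliers follow from the normal-approximation sample-size formula, in which the required n is proportional to (z(1-alpha/2) + z(power))^2 for a fixed effect size. A quick check (a sketch of the approximation, not the calculator used elsewhere in this chapter):

```python
from scipy import stats

def n_ratio(alpha1, power1, alpha2, power2):
    """Ratio of required sample sizes under the normal approximation,
    where n is proportional to (z_{1-alpha/2} + z_{power})^2."""
    def k(alpha, power):
        return (stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)) ** 2
    return k(alpha2, power2) / k(alpha1, power1)

print(round(n_ratio(0.05, 0.80, 0.05, 0.90), 2))  # power .80 -> .90: the "x 1.33" rule
print(round(n_ratio(0.05, 0.80, 0.01, 0.80), 2))  # alpha .05 -> .01: the "x 1.5" rule
print(round(n_ratio(0.05, 0.80, 0.01, 0.90), 2))  # both at once: roughly double N
```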

Theory Behind a Test Statistic

• Most test statistics reduce to the difference between measures of central tendency divided by the variability in those differences expected from random error. • Usually this is reflected as the mean difference (numerator) divided by a modified version of the variance (denominator). The denominator is also affected by the size of the samples.
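This anatomy can be seen by rebuilding the two-sample t statistic by hand (on simulated data) and comparing it with scipy's result:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(12.0, 3.0, 40)
b = rng.normal(10.0, 3.0, 40)

# Numerator: the difference between measures of central tendency.
mean_diff = a.mean() - b.mean()

# Denominator: the variability expected from random error (pooled standard
# error); note how the sample sizes enter into it.
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))

t_manual = mean_diff / se
t_scipy, _ = stats.ttest_ind(a, b)
print(round(t_manual, 4) == round(t_scipy, 4))  # True: same statistic
```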

Power (2)

• Researchers use the expected variance, the alpha level, and the magnitude of the effect size to determine how many subjects are needed to reach a certain power. • Power is not reported as often as it should be reported. This is in part due to researchers selecting sample sizes based on convenience. • Any study incorporating small sample sizes and failing to report the power of the study should be viewed with caution.
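One way to see what power means is to simulate it directly: repeat the experiment many times under an assumed true effect and count how often the null is rejected. A sketch with illustrative values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def simulated_power(effect_size, n, alpha=0.05, reps=5_000):
    """Estimate the power of a one-sample two-tailed t-test by simulation:
    the fraction of repeated experiments in which H0 is rejected."""
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(effect_size, 1.0, n)  # true mean shifted by the effect
        _, p = stats.ttest_1samp(sample, 0.0)
        rejections += p < alpha
    return rejections / reps

# A medium effect (d = 0.5) with n = 34 gives power near the usual .80 target.
print(round(simulated_power(0.5, 34), 2))
```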

P-Values (2)

• Researchers will reject the null hypothesis because the probability of obtaining the particular data fell below the .05 level. But this does not mean that a real difference was proven. • A significant p-value only tells us that any differences between the groups being compared are unlikely to be a consequence of sampling error. • A significant p-value does not tell us anything about the magnitude, importance, or practicality of the differences.

Power

• The statistical power of a study indicates the study design's capability of avoiding a false negative (a Type II error) when the researchers are trying to detect a specific effect size. • As we have seen, researchers try to strike a healthy balance between the possibilities of Type I and Type II errors.

More About Power (2)

• To move from 0.80 power and α = 0.05 to 0.90 power and α = 0.01, double N. • As you can see, raising the power of a study requires significant increases in the sample size. Power is most often set at 0.80-0.90. • Specifically, power is an indication of the experimental design's ability to detect an effect of a specified size. In the journal examples quoted in this chapter, note that the researchers defined the specific effect size, the desired power, and the alpha level.

Effect Size (2)

• We also know that the likelihood of finding statistical significance increases as our sample size increases; the larger the sample, the smaller the difference between measures of central tendency needs to be to reach statistical significance. • So, how do we determine if a statistically significant difference is clinically significant? The calculation of effect size is helpful.

Example of Power & Sample Size Calculator

• We want to see if a sample of aerobic exercisers has a statistically different diastolic blood pressure when compared to the population. • We know the population mean for diastolic blood pressure equals 80 mm Hg. • We know the standard deviation for the population equals 8. • Based on past research, we suspect that we may see a 4 mm Hg drop in the exercise group. But we could see a rise in pressure, so we decide to run a two-tailed t-test. • We want a power of .80. • What is our sample size?

What Do We Mean by "Significance"?

• When a statistical test indicates significance at the .05 level, we can assume that the likelihood of the difference between data sets occurring by chance is low. • As stated by Schmitz (2007), "The term "significant" does not mean "a really important finding," or that a particularly large difference or relationship was found. A finding that falls below .01 ("highly significant") is not necessarily larger, smaller, or more important than one that falls just below .05 ("significant")."

