Biostatistics quiz #2 review

CIs (confidence intervals) answer the question...

"What is the size of that difference between those groups?" -How large is the difference between groups? *Can't have CI without P-value

Type II error

(false negative) (regular hypothesis) -There really is a difference between the populations, but random sampling leads to a difference in the samples that isn't statistically significant -You fail to reject the null hypothesis when it is actually false -False negative -Example: bugs are bigger on native plants, but we found no difference

Type I error

(false positive) (regular hypothesis) -There is actually no difference between the populations, but random sampling led to a difference in the samples -You reject the null hypothesis when it is actually true -False positive -Example: bugs are the same size in both plots, but our sample showed bigger bugs on native plants

Differences of P- values & confidence intervals CIs

-A CI provides the same info as a statistical test, plus more -A CI reminds the reader of variability -A CI more clearly shows the influence of sample size

Similarities of P- values & confidence intervals CIs

-Neither accounts for bias -They are statistically equivalent

P values are:

-A fraction between 0.0 and 1.0 **always shown as a decimal -The probability of obtaining the observed data (or data showing an even greater difference from the null hypothesis) if the null hypothesis were true -A small P value indicates strong evidence against the null hypothesis

P-Values and Confidence intervals

-Related! -If the 95% confidence intervals for 2 samples do not overlap, the P value will be < 0.05 -If the 90% confidence intervals for 2 samples do not overlap, the P value will be < 0.10 -If the 99% confidence intervals for 2 samples do not overlap, the P value will be < 0.01
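As a rough sanity check of these rules of thumb, here is a small Python sketch (the data are made up, and NumPy/SciPy are assumed to be available) that computes two 95% CIs and the two-sample t-test P value:

```python
# Sketch: compare 95% CIs for two made-up samples with the two-sample t-test P value.
import numpy as np
from scipy import stats

a = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5])   # invented sample 1
b = np.array([14.6, 15.2, 14.9, 15.8, 14.4, 15.1])   # invented sample 2

def ci95(x):
    m, sem = x.mean(), stats.sem(x)
    half = stats.t.ppf(0.975, len(x) - 1) * sem   # t critical value * SEM
    return m - half, m + half

lo_a, hi_a = ci95(a)
lo_b, hi_b = ci95(b)
t, p = stats.ttest_ind(a, b)

print("95% CI for a:", (round(lo_a, 2), round(hi_a, 2)))
print("95% CI for b:", (round(lo_b, 2), round(hi_b, 2)))
print("CIs overlap?", hi_a >= lo_b and hi_b >= lo_a)
print("two-tailed P value:", round(p, 4))   # non-overlapping 95% CIs go with P < 0.05
```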

5 questions to ask yourself when determining if there is an outlier

1.) Data entry mistake? 2.) Did you enter an incorrect code in the program? 3.) Did you notice any issues during the experiment? 4.) Could it just be biological variability? 5.) Is it possible that the distribution is not Gaussian?

What are the steps to hypothesis testing?

1.) Hypothesis (set up H0 and Ha) 2.) Choose the significance level 3.) Collect the sample 4.) Compute the P value 5.) Decide (reject or fail to reject H0)

What does Beta (b) equal?

1.0 minus power (the probability of a Type II error)
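A one-line illustration (the 80% power figure is just an example, not a value from the course):

```python
power = 0.80        # e.g., a study designed to have 80% power
beta = 1.0 - power  # beta = probability of a Type II error (missing a real effect)
print(beta)         # 0.2
```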

Can a P value equal 0.0?

A P value can be very small but can never equal 0.0. If you see a report that a P value equals 0.0000, that really means that it is less than 0.0001.

Can a P value equal 1.0?

A P value would equal 1.0 only in the rare case in which the treatment effect in your sample precisely equals the one defined by the null hypothesis. When a computer program reports a P value of 1.0, it usually means the observed effect was very close to the null hypothesis value and the P value was rounded up.

By how much do you need to increase the sample size to make a CI half as wide?

A general rule of thumb is that increasing the sample size by a factor of 4 will cut the expected width of the CI by a factor of 2. (Note that 2 is the square root of 4.)
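This follows because the width of a CI for a mean is roughly proportional to 1/sqrt(n). A quick check in Python (the SD and z value are illustrative):

```python
import math

sd, z = 10.0, 1.96                            # illustrative SD and 95% z value
width = lambda n: 2 * z * sd / math.sqrt(n)   # approximate 95% CI width for a mean

print(width(25))    # ~7.84
print(width(100))   # ~3.92 -> 4x the sample size, about half the width
```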

What is an outlier?

A value that is so far from others that it appears to have come from a different population

Why do I have to specify the effect size for which I am looking? I want to detect any size effect.

All studies have small power to detect tiny effects and large power to detect enormous effects. You can't calculate power without specifying the effect size for which you are looking.

Should P values be reported as fractions or percentages?

By tradition, P values are always presented as fractions and never as percentages.

How to calculate power?

Compute it with software, but you need the SD, the sample size, the alpha value (the threshold for calling a result significant or not), and the size of the difference you are hoping to detect
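One way to compute it "in a machine" is with the statsmodels package (assuming it is installed); the effect size, sample size, and alpha below are placeholders, not course values:

```python
# Power of a two-sample t test, using statsmodels.
# effect_size is the difference you hope to detect divided by the SD (Cohen's d).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
power = analysis.power(effect_size=0.5,   # hypothesized difference / SD
                       nobs1=30,          # sample size per group
                       alpha=0.05)        # significance level
print(round(power, 2))
```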

Is the concept of statistical hypothesis testing about making decisions or about making conclusions?

Decision-making. The system of statistical hypothesis testing makes perfect sense when it is necessary to make a crisp decision based on one statistical analysis. If you have no need to make a decision from one analysis, then it may not be helpful to use the term statistically significant.

What is a type II error?

False-Negative

What is a type I error?

False-Positive

How is a QQ plot created?

First the program computes the percentile of each value in the data set. Next, for each percentile, it calculates how many SDs from the mean you need to go to reach that percentile on a Gaussian distribution. Finally, using the actual mean and SD computed from the data, it calculates the corresponding predicted value.
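The same three steps can be written out directly; this is a sketch with made-up data, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

data = np.sort(np.array([4.1, 5.0, 5.3, 5.8, 6.2, 6.9, 7.4]))  # made-up sample

# 1. Percentile of each value in the data set
pct = (np.arange(1, len(data) + 1) - 0.5) / len(data)

# 2. How many SDs from the mean you must go to reach that percentile on a Gaussian
z = stats.norm.ppf(pct)

# 3. Predicted values, using the actual mean and SD computed from the data
predicted = data.mean() + data.std(ddof=1) * z

for pred, actual in zip(predicted, data):
    print(round(pred, 2), actual)   # X axis: predicted, Y axis: actual
```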

Null hypothesis

H0

Alternative hypothesis

Ha

My P value to nine digits is 0.050000000. Can I call the result statistically significant?

Having a P value equal 0.05 (to many digits) is rare, so this won't come up often. It is just a matter of definition. But I think most would reject the null hypothesis when a P value is exactly equal to 0.05.

If the 95% CI just barely reaches the value that defines the null hypothesis, what can you conclude about the P value?

If the 95% CI includes the value that defines the null hypothesis, you can conclude that the P value is greater than 0.05. If the 95% CI excludes the null hypothesis value, you can conclude that the P value is less than 0.05. So if the 95% CI ends right at the value that defines the null hypothesis, then the P value must equal 0.05.

The 99% CI includes the value that defines the null hypothesis, but the P value is reported to be < 0.05. How is this possible?

If the 99% CI includes the value that defines the null hypothesis, then the P value must be greater than 0.01. But the P value was reported to be less than 0.05. You can conclude that the P value must be between 0.01 and 0.05.

Interpreting a high P value (normality test)

If the P value from a normality test is large, all you can say is that the data are not inconsistent with a Gaussian distribution. Ugh! Statistics requires the use of double negatives. A normality test cannot prove the data were sampled from a Gaussian distribution. All the normality test can do is demonstrate that the deviation from the Gaussian ideal is not more than you'd expect to see from chance alone.

Interpreting a small P value (normality test)

If the P value from the normality test is small, you reject that null hypothesis and so accept the alternative hypothesis that the data are not sampled from a Gaussian population. But this is less useful than it sounds. As explained at the beginning of this chapter, few variables measured in scientific studies are completely Gaussian. So with large sample sizes, normality tests will almost always report a small P value, even if the distribution only mildly deviates from a Gaussian distribution.

Why is the observed power 50% when the P value equals 0.05?

If the P value is 0.05 in one experiment, that is your best guess for what it will be in repeat experiments. You expect half the P values to be higher and half to be lower. Only that last half will lead to the conclusion that the result is statistically significant, so the power is 50%.

If the P value is large and there are outliers

If the P value is high, you have no evidence that the extreme value came from a different distribution than the rest. This does not prove that the value came from the same distribution as the others. All you can say is that there is no strong evidence that the value came from a different distribution

High power (basement example)

If the child spent a long time looking for a large tool in an organized basement, there is a high chance that he would have found the tool if it were there. So you could be quite confident of his conclusion that the tool isn't there. Similarly, an experiment has high power when you have a large sample size, are looking for a large effect, and have data with little scatter (small SD). In this situation, there is a high chance that you would have obtained a statistically significant effect if the effect existed.

Low power

If the child spent a short time looking for a small tool in a messy basement, his conclusion that the tool isn't there doesn't really mean very much. Even if the tool were there, he probably would not have found it. Similarly, an experiment has little power when you use a small sample size, are looking for a small effect, and the data have lots of scatter. In this situation, there is a high chance of obtaining a conclusion of "not statistically significant" even if the effect exists.

Statistical significance:

If there really is no difference between population means, there is a 5% chance of obtaining a statistically significant result. That is the definition of statistical significance (using the traditional 5% significance level).

If the P value is small and there are outliers

If this P value is small, conclude that the outlier is not from the same distribution as the other values. Assuming you answered no to all five questions previously listed, you have justification to exclude it from your analyses.

How to narrow the CI in a study?

Increase the sample size

What if I want to make the CI one-quarter as wide as it is?

Increasing the sample size by a factor of 16 will be expected to reduce the width of the CI by a factor of 4. (Note that 4 is the square root of 16.)

What can cause outliers?

Invalid data entry, biological diversity, random chance, experimental mistakes, invalid assumption

P-values answer the question...

"Is there a statistically significant difference between the two groups?" -Is there a difference between the groups?

When is an effect large enough to care about, to be scientifically significant (important)?

It depends on what you are measuring and why you are measuring it. This question can only be answered by someone familiar with your particular field of science. It is a scientific question, not a statistical one.

The 99% CI includes the value that defines the null hypothesis, but the P value is reported to be < 0.01. How is this possible?

It is inconsistent. Perhaps the CI or P value was calculated incorrectly. Or perhaps the definition of the null hypothesis included in the 99% CI is not the same definition used to compute the P value.

If confidence intervals do not overlap

The difference is statistically significant (P < 0.05)

If confidence intervals overlap

You cannot conclude statistical significance from the overlap alone; the difference may or may not be statistically significant

Statistical significance vs biological significance

Just because something is statistically significant (unlikely to have occurred by chance) doesn't mean it is important or has biological significance

High power

Large sample size • Looking for large effect • Little scatter

Is it better to rely on the median or the mean when there are possible outliers?

The median, because the mean changes a lot if there is an extreme outlier

The problem of multiple comparisons shows up in many situations:

Multiple end points ◆ Multiple time points ◆ Multiple subgroups ◆ Multiple geographical areas ◆ Multiple predictions ◆ Multiple geographical groups ◆ Multiple ways to select variables in multiple regression ◆ Multiple methods of preprocessing data

Is it better to have a wider or narrower CI?

Narrower

If a P value is greater than 0.05, can you conclude that you have disproven the null hypothesis?

No

When a P value is less than 0.05 and thus the comparison is deemed to be statistically significant, can you be 95% sure that the effect is real?

No! It depends on the situation, on the prior probability

If I perform many statistical tests, is it true that the conclusion "statistically significant" will be incorrect 5% of the time?

No! That would only be true if the null hypothesis is, in fact, true in every single experiment. It depends on the scientific context

Do normality tests determine whether the data are parametric or nonparametric?

No! The terms parametric and nonparametric refer to statistical tests, not data distributions.

Shouldn't P values always be presented with a conclusion about whether the results are statistically significant?

No. A P value can be interpreted on its own. In some situations, it can make sense to go one step further and report whether the results are statistically significant (as will be explained in Chapter 16). But this step is optional.

Are the P value and α the same?

No. A P value is computed from the data. The significance level α is chosen by the experimenter as part of the experimental design before collecting any data. A difference is termed statistically significant if the P value computed from the data is less than the value of α set in advance.

Does it make sense to ask whether a particular data set is Gaussian?

No. A common misconception is to think that the normality test asks whether a particular data set is Gaussian. But the term Gaussian refers to an entire distribution or population. It only makes sense to ask about the population or distribution from which your data were sampled. Normality tests ask whether the data are consistent with the assumption of sampling from a Gaussian distribution.

Is it possible to compute a CI of a P value?

No. CIs are computed for parameters like the mean or a slope. The idea is to express the likely range of values that includes the true population value for the mean or slope, etc. The P value is not an estimate of a value from the population. It is computed for that one sample. Since it doesn't make sense to ask what the overall P value is in the population, it doesn't make sense to compute a CI of a P value.

Two of the examples came from papers that reported CIs, but not P values or conclusions about statistical significance. Isn't this incomplete reporting?

No. In many cases, knowing the P value and a conclusion about statistical significance really adds nothing to understanding the data. Just the opposite: conclusions about statistical significance often act to reduce careful thinking about the size of the effect.

Does a normality test decide whether the data are far enough from Gaussian to require a nonparametric statistical test?

No. It is hard to define what "far enough" means, and the normality tests were not designed with this aim in mind.

Can P values be negative?

No. P values are fractions, so they are always between 0.0 and 1.0.

But isn't the whole point of statistics to decide when an effect is statistically significant?

No. The goal of statistics is to quantify scientific evidence and uncertainty.

My P value to four digits is 0.0501. Can I round to 0.05 and call the result statistically significant?

No. The whole idea of statistical hypothesis testing is to set a strict criterion (usually P = 0.05) for deciding between rejecting and not rejecting the null hypothesis. Your P value is greater than α, so you cannot reject the null hypothesis or call the results statistically significant.

Is a one-tailed P value always equal to half the two-tailed P value?

Not always. Some distributions are asymmetrical. For example, a one-tailed P value from a Fisher's exact test (see Chapter 27) is usually not exactly equal to half the two-tailed P value. With some data, in fact, the one- and two-tailed P values can be identical, but this is rare. Even if the distribution is symmetrical (as most are), the one-tailed P value is only equal to half the two-tailed value if you correctly predicted the direction of the difference (correlation, association, etc.) in advance. If the effect actually went in the opposite direction to your prediction, the one-tailed P value will not be half the two-tailed P value; it will equal 1.0 minus half the two-tailed P value.
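For a symmetric test such as the t test, the halving is easy to see with SciPy (version 1.6 or later, which accepts the `alternative` argument); the samples are invented and the prediction (a > b) happens to be correct:

```python
import numpy as np
from scipy import stats

a = np.array([5.1, 5.9, 6.3, 5.5, 6.1])   # made-up samples
b = np.array([4.2, 4.8, 4.5, 5.0, 4.4])

_, p_two = stats.ttest_ind(a, b)                          # two-tailed
_, p_one = stats.ttest_ind(a, b, alternative='greater')   # predicted a > b in advance

print(round(p_two, 4), round(p_one, 4))   # one-tailed is half the two-tailed value here
```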

Should a normality test be run as part of every experiment?

Not necessarily. You want to know whether a certain kind of data are consistent with sampling from a Gaussian distribution. The best way to find out is to run a special experiment just to ask about the distribution of data collected using a particular method. This experiment would need to generate plenty of data points but would not have to make any comparisons or ask any scientific questions. If analysis of many data points convinces you that a particular experimental protocol generates data that are consistent with a Gaussian distribution, there is no point in testing smaller data sets from individual runs of that experiment.

Is it ever possible to know for sure whether a particular data set was sampled from a Gaussian distribution?

Not unless the data were simulated. Otherwise, you can never know for sure the distribution from which the data were sampled.

Is α the probability of rejecting the null hypothesis?

Only if the null hypothesis is true. In some experimental protocols, the null hypothesis is often true (or close to it). In other experimental protocols, the null hypothesis is almost certainly false. If the null hypothesis is actually true, α is the probability that random sampling will result in data that will lead you to reject the null hypothesis, thus making a Type I error

Does the context of the experiment (the prior probability) come into play when deciding whether a result is statistically significant?

Only if you take into account prior probability when deciding on a value for α. Once you have chosen α, the decision about when to call a result "statistically significant" depends only on the P value and not on the context of the experiment.

Mistake: Believing that a study design has a single power

Power can be computed for any proposed effect size, so there is a range of power values. If you want to compute a single power value, you must ask, "What is the power of this experimental design to detect an effect of a specified size?" The effect size might be a difference between two means, a relative risk, or some other measure of effect.

I chose to use a one-tailed P value, but the results came out in the direction opposite to my prediction. Can I report the one-tailed P value calculated by a statistical program?

Probably not. Most statistical programs don't ask you to specify the direction of your hypothesis and report the one-tailed P value assuming you correctly predicted the direction of the effect. If your prediction was wrong, then the correct one-tailed P value equals 1.0 minus the P value reported by the program.

Kurtosis

Quantifies whether the tails of the data distribution match the Gaussian distribution. A Gaussian distribution has a kurtosis of zero; a distribution with fewer values in the tails than a Gaussian distribution has negative kurtosis; and a distribution with more values in the tails (or values further out in the tails) than a Gaussian distribution has positive kurtosis.
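SciPy's `kurtosis` function uses this same convention (excess kurtosis, so a Gaussian gives roughly zero); a quick illustration with simulated data:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
gauss = rng.normal(size=100_000)             # Gaussian: kurtosis near 0
heavy = rng.standard_t(df=5, size=100_000)   # heavier tails: positive kurtosis
flat = rng.uniform(-1, 1, size=100_000)      # fewer tail values: negative kurtosis

print(round(kurtosis(gauss), 2), round(kurtosis(heavy), 2), round(kurtosis(flat), 2))
```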

Robust statistical test

Rather than eliminate outliers, an alternative strategy is to use statistical methods that are designed to ensure that outliers have little effect on the results. Methods of data analysis that are not much affected by the presence of outliers are called robust. You don't need to decide when to eliminate an outlier, because the method is designed to accommodate them. Outliers just automatically fade away. The simplest robust statistic is the median. If one value is very high or very low, the value of the median won't change, whereas the value of the mean will change a lot. The median is robust; the mean is not.
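A small demonstration of why the median is robust (the values are made up):

```python
import numpy as np

values = np.array([10, 11, 12, 13, 14])
with_outlier = np.append(values, 300)   # one wildly extreme value

print(values.mean(), np.median(values))               # 12.0 12.0
print(with_outlier.mean(), np.median(with_outlier))   # mean jumps to 60, median barely moves
```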

Low Power

Small sample size • Looking for small effect • Lots of scatter

If the 95% CIs for two samples overlap, that shows

Not necessarily statistical significance; the P value may be above or below 0.05

My two-tailed P value is not low enough to be statistically significant, but the one- tailed P value is. What do I conclude?

Stop playing games with your analysis. It is only OK to compute a one-tailed P value when you decided to do so as part of the experimental protocol.

Normality tests

Tests that quantify how far a data set deviates from what is expected of a Gaussian distribution
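For example, SciPy's `normaltest` (the D'Agostino-Pearson test) returns a P value for the null hypothesis that the data were sampled from a Gaussian population; the data here are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=5, size=200)   # simulated Gaussian data

stat, p = stats.normaltest(sample)   # D'Agostino-Pearson K^2 test
print(round(p, 3))   # large P: data are consistent with a Gaussian distribution
```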

Who invented the threshold of P < 0.05 as meaning statistically significant?

That threshold, like much of statistics, came from the work of Ronald Fisher

If the 95% CI is centered on the value that defines the null hypothesis, what can you conclude about the P value?

The observed outcome equals the value that defines the null hypothesis. In this case, the two-tailed P value must equal 1.000.

If you conclude that a result is not statistically significant, it is possible that you are making a Type II error as a result of missing a real effect. What factors influence the chance of this happening?

The probability of a Type II error depends on the significance level (α) you have chosen, the sample size, and the size of the true effect.

What is the Bonferroni correction?

The simplest approach to controlling the familywise error rate is to divide the value of α (often 5%) by the number of comparisons, and then define a particular comparison as statistically significant only when its P value is less than that ratio. This is called the Bonferroni correction. (Divide alpha by the number of comparisons.)
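A minimal sketch of the correction (the P values are placeholders):

```python
pvalues = [0.001, 0.020, 0.049, 0.300]   # made-up P values from 4 comparisons
alpha = 0.05
threshold = alpha / len(pvalues)         # Bonferroni: divide alpha by the number of comparisons

for p in pvalues:
    print(p, "significant" if p < threshold else "not significant")
```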

Alternative hypothesis example

There is a difference in insect size between native plants and porcelain berry on Purchase College campus

Why is statistical hypothesis testing so popular?

There is a natural aversion to ambiguity. The crisp conclusion "The results are statistically significant" is more satisfying to many than the wordy conclusion "Random sampling would create a difference this big or bigger in 3% of experiments if the null hypothesis were true."

Null hypothesis example

There is no difference in insect size between native plants and porcelain berry on Purchase College campus

What is the difference between Type I and Type II errors?

Type I errors reject a true null hypothesis. Type II errors fail to reject a false null hypothesis.

When does it make sense to compute power?

• When deciding how many subjects (or experimental units) you need • When evaluating or critiquing experiments

What is a QQ plot?

A plot used to check normality. Instead of plotting the cumulative frequency distribution, the axes are transformed so that data from a Gaussian distribution fall along a straight line. The Y axis plots the actual values in a sample of data; the X axis plots the ideal predicted values assuming the data were sampled from a Gaussian distribution.

When does it make sense to do calculations involving power?

When planning a study, you need to decide how large a sample size to use. Those calculations will require you to specify how much power you want to detect some hypothetical effect size. After completing a study that has results that are not statistically significant, it can make sense to ask how much power that study had to detect some specified effect size.
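For the planning case, a power class such as statsmodels' TTestIndPower can solve for the per-group sample size by leaving `nobs1` unspecified (the effect size, power, and alpha below are placeholders):

```python
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,   # hypothesized effect (Cohen's d)
                                          power=0.80,        # desired power
                                          alpha=0.05)        # significance level
print(round(n_per_group))   # subjects needed in each group (about 64 here)
```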

Why do some investigators present the negative logarithm of P values?

When presenting many tiny P values, some investigators (especially those doing genome-wide association studies; see Chapter 22) present the negative logarithm of the P value to avoid the confusion of dealing with tiny fractions. For example, if the P value is 0.01, its logarithm (base 10) is -2, and the negative logarithm is 2. If the P value is 0.00000001, its logarithm is -8, and the negative logarithm is 8. When these are plotted on a bar graph, it is called a Manhattan plot (Figure 15.3). Why that name? Because the arrangement of high and low bars makes the plot vaguely resemble the skyline of Manhattan, New York (maybe you need to have a few drinks before seeing that resemblance).
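The transformation itself is just a base-10 logarithm:

```python
import math

for p in (0.01, 0.00000001):
    print(p, -math.log10(p))   # 0.01 -> 2.0, 1e-8 -> 8.0
```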

Isn't it possible to look at statistical hypothesis testing as a way to choose between alternative models?

Yes

Can a one-tailed P value have a value greater than 0.50?

Yes, this happens when the direction of the treatment effect is opposite to the prediction.

Is a P value always associated with a null hypothesis?

Yes. If you can't state the null hypothesis, then you can't interpret the P value.

Is it possible to report scientific data without using the word significant?

Yes. Report the data, along with CIs and perhaps P values. Decisions about statistical significance are often not helpful.

Is the False Positive Report Probability (FPRP) the same as the False Positive Risk (FPR) and the False Discovery Rate (FDR)?

Yes. The FPRP and FPR mean the same thing. The FDR is very similar, but usually is used in the context of multiple comparisons rather than interpreting a single P value.

Does your choice of a value for α influence the calculated value of the FPRP?

Yes. Your decision for a value of α determines the chance that a result will end up in the first or second column of Table 18.1.

How is it possible for an effect to be statistically significant but not scientifically significant (important)?

You reach the conclusion that an effect is statistically significant when the P value is less than 0.05. With large sample sizes, this can happen even when the effect is tiny and irrelevant. The small P value tells us that the effect would not often occur by chance but says nothing about whether the effect is large enough to care about.

If 90% confidence intervals for 2 samples do not overlap, the P value will be ___?

Less than 0.10 (α = 0.10, i.e., 10%)

If 95% confidence intervals for 2 samples do not overlap, the P value will be ___?

Less than 0.05 (α = 0.05, i.e., 5%)

Mistake: Believing that if a result is statistically significant, the effect must be _______

large

If 99% confidence intervals for 2 samples do not overlap, P-value will be ___?

Less than 0.01 (here α = 0.01, i.e., 1%; the α level changes with the CI level)

If the 95% CIs for two samples don't overlap, that shows

Statistical significance (P < 0.05)

Null vs. alternate hypothesis

Null hypothesis: no effect, no difference; results will be similar among the levels of the IV. Alternative hypothesis: there is an effect; results will differ between the levels of the IV.

The best way to deal with multiple comparisons

Plan studies with focused hypotheses and defined methods to account for multiple comparisons in the analyses.

Skewness

Quantifies symmetry. A symmetrical distribution (such as the Gaussian) has a skewness of zero. An asymmetrical distribution with a heavy right tail (more large values) has a positive skew, and one with a heavy left tail (more small values) has a negative skew.
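SciPy's `skew` function follows the same convention; an illustration with simulated data:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
symmetric = rng.normal(size=100_000)         # skewness near 0
right_tail = rng.exponential(size=100_000)   # heavy right tail: positive skew

print(round(skew(symmetric), 2), round(skew(right_tail), 2))
```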

Avoid overemphasis on _______ (aka "stargazing") in scientific papers, where results are treated as simply statistically significant or not

statistical significance

P values and CIs can help you state _______ without using the words ________

statistical significance

The power of an experiment depends on four values:

• The sample size • The amount of scatter (if comparing values of a continuous variable) or starting proportion (if comparing proportions) • The size of the effect you hypothesize exists • The significance level you choose

