STATS 1001 Post Midterm (HW 5-7)


What is the difference between a histogram and a bar plot?

- bar plot: categories are on the x-axis, frequency is reflected by the y-axis; compares different categories of data - histogram: shows the frequency distribution of the different values of a single numerical variable; presents numerical data

When is correlation a valid measure of association?

- correlation (r) measures the strength of a specific type of relationship between two measurement variables: how closely their values fall to a straight line on a scatterplot, i.e. the strength of a linear relationship - association means that two variables are related in some way; an association always has a direction (positive or negative) - correlation is a valid measure of association only when the relationship is linear, is not horizontal or vertical, and is not due to the relationship changing over time - if the value of one measurement variable is always the square of the value of the other, the two have a perfect relationship but may still have little or no correlation, because correlation measures linear relationships only [how close the individual points in a scatterplot are to a straight line]

replicability crisis

- the inability of researchers to replicate earlier research findings - often stems from a misunderstanding of what statistics can do; a famous example is the p < 0.05 criterion in null hypothesis significance testing, the main statistical tool in many sciences for decades, used to decide whether to believe a theory - but a p-value says nothing about how confident you should be, and sometimes statistics is not the best method for understanding a problem

What is the difference between a p-value and a significance level?

- p-value: a quantitative measure of significance - the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct - helps determine whether there is evidence to reject the null hypothesis; a smaller p-value means stronger evidence in favor of the alternative hypothesis - significance level: a threshold that you choose and specify in advance; it defines which p-values count as "statistically significant" - "statistically significant" means the actual p-value was lower than the chosen/specified threshold; a common choice of threshold is 0.05

central limit theorem

- as the sample size n increases, the distribution of sample means from randomly selected samples of size n [the sampling distribution] approaches a normal distribution
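
The theorem can be checked empirically. A minimal sketch (population and sample sizes chosen arbitrarily): draw from a skewed exponential population and watch the sample means cluster near the population mean, more tightly as n grows.

```python
# Sketch: empirical look at the central limit theorem (illustrative values).
# The population is exponential (skewed, mean 1.0), yet sample means
# behave like draws from a distribution centered on 1.0 whose spread
# shrinks with the sample size.
import random
import statistics

random.seed(0)

def sample_mean(n):
    """Mean of one random sample of size n from an exponential population."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# 2000 sample means for each of two sample sizes
means_small = [sample_mean(5) for _ in range(2000)]
means_large = [sample_mean(100) for _ in range(2000)]

print(statistics.mean(means_large))   # centers near the population mean, 1.0
print(statistics.stdev(means_small))
print(statistics.stdev(means_large))  # much smaller spread than for n = 5
```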

hypothesis tests

- ask whether the relationship observed in the sample is large enough to be called statistically significant, or could have been due to chance - use a known distribution to determine whether a hypothesis of no difference (the null hypothesis) can be rejected

Interpret the confidence level of a confidence interval

- A confidence interval indicates where the population parameter is likely to reside - The confidence level refers to the long-run success rate of the method, that is, how often this type of interval will capture the parameter of interest (a 95% confidence level means 95% of such intervals will cover the true population value) - A specific confidence interval gives a range of plausible values for the parameter of interest - in short, the confidence level describes how reliable the interval-building method is, not the probability that any one interval contains the truth
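
The long-run interpretation can be demonstrated by simulation. A minimal sketch (the population mean, sd, and sample size below are made up): build many 95% intervals from independent samples and count how often they capture the truth.

```python
# Sketch: the long-run meaning of a 95% confidence level (illustrative values).
# Repeatedly sample from a known population, build a 95% interval each time,
# and count how often the interval captures the true mean.
import random
import statistics

random.seed(1)
TRUE_MEAN, TRUE_SD, N = 50.0, 10.0, 40

covered = 0
trials = 1000
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    est = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    lo, hi = est - 1.96 * se, est + 1.96 * se
    if lo <= TRUE_MEAN <= hi:
        covered += 1

print(covered / trials)  # close to 0.95
```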

Gathering data that supports estimating interactions and heterogeneity

- Heterogeneity in statistics means that your populations, samples, or results are different; it is the opposite of homogeneity, which means that the populations/data/results are the same - An interaction is a property of three or more variables, where two or more variables combine to affect a third variable in a non-additive manner; in other words, the variables interact to have an effect that is more than the sum of their parts

Does increasing the size of the quantity being estimated change the width of the confidence interval?

- Increasing the sample size decreases the width of a confidence interval, because it decreases the standard error - however, increasing the value of the quantity being estimated, for example the percentage of people who voted for a candidate [from 45% to 50% support], does not change the width of the confidence interval; it merely shifts the interval upward
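
Both claims follow from the width formula for a proportion's 95% interval, 2 × 1.96 × sqrt(p(1−p)/n). A quick sketch (poll numbers are illustrative):

```python
# Sketch: what moves a proportion CI's width, the estimate or the sample size?
def ci_width(p, n):
    """Approximate width of a 95% confidence interval for a proportion."""
    return 2 * 1.96 * (p * (1 - p) / n) ** 0.5

# Support rising from 45% to 50% leaves the width essentially unchanged...
print(round(ci_width(0.45, 1000), 4))
print(round(ci_width(0.50, 1000), 4))
# ...while quadrupling the sample size halves the width.
print(round(ci_width(0.45, 4000), 4))
```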

What is the difference between randomized sampling and randomized assignment?

- Random sampling is a way of selecting members of a population to include in your study; it lets us obtain a sample representative of the population, so results of the study can be generalized to the population - Random assignment, in contrast, is a way of sorting the sample participants into control and treatment groups; it ensures that the only difference between the treatment groups is what we are studying, so causality can be inferred - For example, if we randomly assign half the people to read serif fonts and the other half sans serif to see which group reads faster, random assignment creates treatment groups that are similar to each other, and the only difference between them is the font - A study can do both, usually random sampling first, then assignment: first draw a random sample, then randomly assign half of the sample to one group and half to the other

When a study is conducted, why do we think of there being many samples from the population distribution but just one from the sampling distribution?

- The population distribution gives the values of the variable for all the individuals in the population - The sampling distribution is the distribution of a sample-derived estimate (such as a sample mean) across every possible random sample of the same size - When we conduct a study, we get a sample of size greater than 1 from the population of interest (e.g. a poll of 1000 voters), but from that sample we get only a single estimate (e.g. the proportion of those 1000 voters who approve of the president's job performance) - Thus, in a single study, we get a sample of many individuals from the population distribution, but only a sample of size 1 from the sampling distribution (our one estimate, among all the estimates we could have gotten if our sample from the population had been different) - For example, in a study of US asthma rates with a random sample of 100 people, we take 100 draws from the population distribution, since the population distribution includes the entire US population's asthma rates - but the study takes only 1 draw from the sampling distribution, because it produces a single estimate of the US average rate; each point on the sampling distribution is the estimate from one possible 100-person sample

If we compute an estimate from a random sample, what is the difference between the population distribution of the sample and the sampling distribution of the estimate?

- population distribution of the sample: describes the values of the variable for all individuals in the population; the actual distribution/frequency curve of the entire population that you're sampling from, not necessarily normal - in a single experiment/instance of data collection, you sample many people from the population distribution and ultimately combine them into a single estimate - sampling distribution of an estimate: describes the values of the estimate in all possible samples of the same size from the same population - it describes all the different estimates you might get if you repeated the sampling process over and over; as long as the estimate is an average, it should look roughly like a normal curve - equivalently, the sampling distribution of a statistic is a probability distribution based on a large number of samples of size n from a given population - EX: to estimate the average length of fish in a tank, take a random sample of 20 fish and use the sample mean to estimate the population mean - the sampling distribution of that estimate is what you would get by repeatedly taking samples of 20 fish, computing each sample's mean, and looking at the distribution of all those means

Why are low power studies less likely to replicate?

- the power of a test is the probability of correctly declaring that a true alternative hypothesis is true [correctly detecting a real relationship/effect in the study, i.e. rejecting the null hypothesis when it is false] - power depends on the true value of the parameter: assuming the alternative is true, power is the probability that the estimate, which behaves like a single draw from the alternative sampling distribution, lands beyond the significance threshold in the rejection region - lower power means a higher chance of a type 2 error and a lower likelihood of achieving statistical significance - 80% power means an 80% chance that the estimate will land in the rejection region and the null hypothesis will be successfully rejected - replication means repeating a study exactly but with a different random sample; the second estimate is another random draw from the same alternative sampling distribution, so the replication has the same 80% power and the same 80% chance of getting an estimate strong enough to reject the null - if power is reduced [e.g. by shrinking the sample size, an effect size closer to 0, or more measurement spread], power might drop from 80% to 25% - the first study's estimate may have landed in the rejection region, allowing the null to be rejected, but an identical second study must also get its new estimate into that region, and with only a 25% chance of that happening, the replicators will most likely fail to reject the null hypothesis - think of replications as additional samples from the same alternative sampling distribution [if the null is false]; power is then the long-run proportion of those replications expected to get a statistically significant result/reject the null - low power means the probability of future replications of a study succeeding is low - the factors that cause low power, such as small sample size, also make it less likely that any single result is accurate rather than a product of chance - low power is also bad because when most studies fail to reach significance, the ones that do reach it are large overestimates of the true effect - if results are published only when statistically significant, estimates have to exceed the significance threshold; when power is low [the effect size is close to 0 or the sample size is too small] and the true effect is small, any estimate that clears the threshold will be much larger than the actual effect size

standard error

- the standard deviation of a sampling distribution - measures how accurately a sample estimate represents the population value: the smaller the standard error, the closer a typical estimate falls to the true parameter
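
A minimal simulation (parameter values are arbitrary) checking that the standard error formula sd/sqrt(n) matches the empirical spread of sample means:

```python
# Sketch: standard error = population sd / sqrt(n), verified by simulation.
import random
import statistics

random.seed(3)
POP_SD, N = 12.0, 36
theoretical_se = POP_SD / N ** 0.5   # 12 / 6 = 2.0

# Empirical sd of many sample means should match the theoretical SE
means = [statistics.mean(random.gauss(100, POP_SD) for _ in range(N))
         for _ in range(4000)]
print(theoretical_se)            # 2.0
print(statistics.stdev(means))   # close to 2.0
```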

Why does increasing the confidence level of a confidence interval also increase its width?

- a higher confidence level requires a wider interval: the wider the interval, the more likely it is to contain the true answer - if you want to be more sure of the answer, you need to make the frame wider to cover more potential answers and increase the odds that your interval captures the actual value - confidence level: if the poll or survey were repeated over and over again, the resulting intervals would capture the true population value 95 percent of the time (for a 95% level) - a higher level of confidence is generally preferred - choose a narrower interval with a lower confidence level when the interval is too wide to be useful, trading some confidence that the results reflect the population for a smaller range of values that we believe the actual population value is in

What is the formula for a confidence interval? What role does z play?

- confidence interval = estimate ± z × standard error (the standard deviation of the sampling distribution) - z is determined by the desired confidence level and is how many sampling-distribution standard deviations we add to or subtract from the estimate; it determines how wide our confidence interval is (e.g. z = 1.96 for 95% confidence)
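
The formula in code, as a sketch. The sample values are made up; the z values are the standard normal multipliers for each level.

```python
# Sketch: confidence interval = estimate ± z * standard error.
# A larger z (higher confidence level) directly widens the interval.
import statistics

sample = [24.1, 25.3, 24.8, 26.0, 25.5, 24.4, 25.9, 25.2, 24.6]  # made up
est = statistics.mean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5

ci = {level: (est - z * se, est + z * se)
      for level, z in [(0.90, 1.645), (0.95, 1.96), (0.99, 2.576)]}
for level, (lo, hi) in ci.items():
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")  # wider as z grows
```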

common significance level

0.05

what affects power?

1. The significance level α of the test. If all other things are held constant, then as α increases, so does the power of the test. - a larger α means a larger rejection region for the test and thus a greater probability of rejecting the null hypothesis -> a more powerful test. - The price of this increased power is that as α goes up, so does the probability of a Type I error should the null hypothesis in fact be true. 2. The sample size n. As n increases, so does the power of the significance test. This is because a larger sample size narrows the distribution of the test statistic. The hypothesized distribution of the test statistic and the true distribution of the test statistic (should the null hypothesis in fact be false) become more distinct from one another as they become narrower, so it becomes easier to tell whether the observed statistic comes from one distribution or the other. The price paid for this increase in power is the higher cost in time and resources required for collecting more data. There is usually a sort of "point of diminishing returns" up to which it is worth the cost of the data to gain more power, but beyond which the extra power is not worth the price. 3. The inherent variability in the measured response variable. As the variability increases, the power of the test of significance decreases. - a test of significance is like trying to detect the presence of a "signal," such as the effect of a treatment, and the inherent variability in the response variable is "noise" that will drown out the signal if it is too great. - Researchers can't completely control the variability in the response variable, but they can sometimes reduce it through especially careful data collection and conscientiously uniform handling of experimental units or subjects. The design of a study may also reduce unexplained variability, and one primary reason for choosing such a design is that it allows for increased power without necessarily having exorbitantly costly sample sizes.
- For example, a matched-pairs design usually reduces unexplained variability by "subtracting out" some of the variability that individual subjects bring to a study. - Researchers may do a preliminary study before conducting a full-blown study intended for publication, in part so that they can assess the inherent variability within the populations they are studying. - An estimate of that variability allows them to determine the sample size they will require for a future test having a desired power. - A test lacking statistical power could easily result in a costly study that produces no significant findings. 4. The difference between the hypothesized value of a parameter and its true value. - sometimes called the "magnitude of the effect" in the case when the parameter of interest is the difference between parameter values (say, means) for two treatment groups. - The larger the effect, the more powerful the test is. - This is because when the effect is large, the true distribution of the test statistic is far from its hypothesized distribution, so the two distributions are distinct, and it's easy to tell which one an observation came from. - The intuitive idea is simply that it's easier to detect a large effect than a small one. This principle has two consequences that students should understand, and that are essentially two sides of the same coin. - On the one hand, it's important to understand that a subtle but important effect (say, a modest increase in the life-saving ability of a hypertension treatment) may be demonstrable but could require a powerful test with a large sample size to produce statistical significance. - On the other hand, a small, unimportant effect may be demonstrated with a high degree of statistical significance if the sample size is large enough. - Because of this, too much power can almost be a bad thing, at least so long as many people continue to misunderstand the meaning of statistical significance.
- For your students to appreciate this aspect of power, they must understand that statistical significance is a measure of the strength of evidence of the presence of an effect. It is not a measure of the magnitude of the effect. For that, statisticians would construct a confidence interval.
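
These factors can be seen in a short simulation. All parameter values below are made up; the test is a simple one-sample two-sided z-test with known sd, chosen only to keep the sketch small.

```python
# Sketch: power estimated by simulation.
# Power = fraction of repeated studies that reject H0: mean = 0 at the
# 5% level when a real effect is present.
import random
import statistics

random.seed(4)

def power(effect, sd, n, trials=2000, z=1.96):
    """Simulated power of a one-sample two-sided z-test with known sd."""
    rejections = 0
    for _ in range(trials):
        m = statistics.mean(random.gauss(effect, sd) for _ in range(n))
        if abs(m / (sd / n ** 0.5)) > z:
            rejections += 1
    return rejections / trials

p_small = power(effect=0.5, sd=1.0, n=10)   # small sample: modest power
p_big = power(effect=0.5, sd=1.0, n=40)     # larger n: much higher power
p_noisy = power(effect=0.5, sd=2.0, n=40)   # extra variability cuts it back
print(p_small, p_big, p_noisy)
```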

HW 5 19.4. Suppose you are interested in estimating the average number of miles per gallon of gasoline your car can get. You calculate the miles per gallon for each of the next nine times you fill the tank. Suppose, in truth, the values for your car are bell-shaped, with a mean of 25 miles per gallon and a standard deviation of 1. Draw a picture of the possible sample means you are likely to get based on your sample of nine observations. Include the intervals into which 68%, 95%, and almost all of the potential sample means will fall.

19 #4 [5 points] We are told that the population distribution (i.e. the population of possible mile per gallon measurements) has a mean of 25 and standard deviation of 1. The sampling distribution will share this mean, and the standard deviation will be the population standard deviation divided by square root of sample size, which we are told is 9. Thus, a correct answer should draw a normal curve with mean at 25 and with a standard deviation of 1/3. One, two, and three standard deviation intervals around the mean should also be shown. Points may have been deducted for any of the following: ● The frequency curve is obviously non-normal (slight asymmetry or other artifacts resulting from imperfect drawing are okay as long as the general bell curve shape is present) ● The frequency curve's mean is clearly different from 25 ● The frequency curve's standard deviation is clearly different from 1/3 ● The 1, 2, and 3 standard deviation intervals are missing or obviously incorrect
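
A quick numerical check of this answer (not part of the original solution):

```python
# Sketch of the HW 19.4 numbers: sampling distribution of the mean mpg
# for samples of n = 9 from a population with mean 25 and sd 1.
mean, pop_sd, n = 25, 1, 9
se = pop_sd / n ** 0.5   # 1/3

# Empirical-rule intervals holding ~68%, ~95%, and ~99.7% of sample means
intervals = {k: (mean - k * se, mean + k * se) for k in (1, 2, 3)}
for k, (lo, hi) in intervals.items():
    print(f"{k} sd: ({lo:.2f}, {hi:.2f})")
```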

In Class.1: Which of the following is a correct completion of the following sentence? The power of a hypothesis test is the probability of _______ the null hypothesis under the assumption that the null hypothesis is _______.

Answer: Rejecting, false

Suppose that we perform a hypothesis test, and we get a p-value larger than the significance level. Which of the following is true? a. We can conclude with confidence that the null hypothesis is false. b. We can conclude with confidence that the null hypothesis is true. c. We cannot draw a confident conclusion about the truth or falsity of the null hypothesis.

Answer: We cannot draw a confident conclusion about the truth or falsity of the null hypothesis. A large p-value only says that an estimate as extreme as ours would not be surprising if the null hypothesis were true; it is not evidence that the null hypothesis is true.
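
The p-value in this scenario can be illustrated by simulation. All numbers below are made up: an observed mean of 0.6 with n = 25 and sd = 2 gives a p-value above 0.05, the "fail to reject" case described here.

```python
# Sketch: a p-value as "probability of an estimate at least this extreme
# if the null is true," estimated by simulating the null distribution.
import random
import statistics

random.seed(6)
observed = 0.6            # hypothetical observed sample mean
n, null_mean, sd = 25, 0.0, 2.0

# Simulate the null sampling distribution and count estimates as extreme
null_means = [statistics.mean(random.gauss(null_mean, sd) for _ in range(n))
              for _ in range(5000)]
p_value = sum(abs(m) >= abs(observed) for m in null_means) / 5000
print(p_value)  # above 0.05: no confident conclusion either way
```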

HW 5 19.10 Use the Rule for Sample Means to explain why it is desirable to take as large a sample as possible when trying to estimate a population value

Correct answers should (1) correctly connect the sample size with the standard deviation of the sampling distribution and (2) relate the standard deviation of the sampling distribution with the accuracy of a given estimate. Concretely, the sampling distribution standard deviation decreases as sample size increases, and a smaller standard deviation for the sampling distribution implies that we should expect any given estimate to be closer to the mean, which is the true value of the quantity we are estimating. Thus higher sample sizes translate to higher average accuracy for our estimates. Points may have been deducted for any of the following: ● Answer makes no connection between standard deviation and sample size. ● Answer incorrectly connects standard deviation and sample size, or otherwise makes an incorrect statement about the sampling distribution and the sample size. ● Answer makes no connection between the sampling distribution standard deviation and estimation accuracy. ● Answer tries to connect sampling distribution standard deviation and estimation accuracy but makes incorrect statements or otherwise does not express that smaller standard deviation gives higher accuracy. ACTUAL ANS: The Rule for Sample Means says that if several samples of the same size are taken, the frequency curve of the sample means will be approximately bell-shaped, its mean will equal the population mean, and its standard deviation will be the population's standard deviation divided by the square root of the sample size. This explains why it is best to take as large a sample as possible when estimating a population value: the standard deviation of the possible sample means decreases as the sample size increases. Consequently, as the sample size grows, the potential sample means almost all fall closer and closer to the mean of the collection of sample means, which is the mean of the population.
This means that as the sample size increases and the standard deviation of the sampling distribution decreases, any one sample mean is increasingly likely to be accurate, i.e. close to the population's mean. Additionally, the rule applies whenever the population of measurements of interest is bell-shaped, or whenever the sample is large and random even if the population is not bell-shaped, which further supports taking a large random sample.

HW 7 24.26 In Original Source 1, the researchers addressed some limitations with the study. One of them was: First, there was a relatively small number of subjects who participated and this limited our statistical power. A number of our hypothesized effects were in the predicted direction, but failed to reach significance (Davidson et al., p. 569). Explain what is meant by the second sentence of the quote.

Correct answers should explain that some of the researchers' results were compatible with their hypothesis but that they were not large enough/far enough from zero to allow them to reject the null hypothesis. Any explanation that expresses this idea is acceptable. Points may have been deducted if the given explanation makes incorrect statements, is confusing or otherwise does not demonstrate understanding, or simply rephrases the original sentence without elaboration. ACTUAL ANS: The second sentence means that even though the researchers found results consistent with the alternative hypothesis, the p-value was not small enough to convincingly rule out chance; it was above the significance level, so the relationships were not statistically significant and there was not enough evidence to reject the null hypothesis.

HW 6 3.. Suppose that you are presented with 1000 confidence intervals for a particular unobserved quantity. Suppose these intervals are computed from 1000 independent replications of the same study (i.e. each study had a different independent random sample, but all samples were drawn randomly from the same population and all were analyzed in exactly the same way). Further imagine that all of these studies used a 95% confidence level when computing their confidence intervals. How many of these 1000 intervals would you expect to contain the true value of the unobserved quantity? Why?

Correct answers should state that about 950 of the 1000 confidence intervals would be expected to contain the true value of the parameter. This is because the confidence level is a probability under the sampling distribution, meaning that it can be interpreted as a proportion of possible samples that would result in a confidence interval containing the true value of the quantity of interest. There are a wide range of explanations that are acceptable here, but the primary criteria are (i) that the explanation make sense, and (ii) that it is non-trivial (i.e. that it does not just claim that this is exactly the definition of confidence level, or something like this). ACTUAL ANS: Of these 1000 intervals, I would expect about 0.95 x 1000, or 950, of the confidence intervals to contain the true value of the unobserved quantity, because a 95% confidence level means that the procedure produces an interval covering the true value in 95% of possible samples. Therefore, if each of the 1000 intervals independently had a 95% chance of covering the true value, you would expect about 95% of the 1000 intervals, or 950 intervals, to contain it.

HW 7 22.21 Many researchers decide to reject the null hypothesis as long as the p-value is 0.05 or less. In a testing situation for which a type 2 error is much more serious than a type 1 error, should researchers require a higher or a lower p-value in order to reject the null hypothesis? Explain your reasoning.

Correct answers should state that if a type II error is much more serious than a type I error then researchers should increase the significance level (i.e. raise the p-value threshold, making it easier to reject the null hypothesis). Answers should also explain why this is true (e.g. that choosing a larger significance level makes us less likely to fail to reject the null if the alternative is true). Points may have been deducted for an incorrect answer or for an explanation that makes incorrect statements or otherwise does not indicate understanding of the relevant concepts. ACTUAL ANS: 21. In a testing situation for which a type 2 error is much more serious than a type 1 error, researchers should allow a higher cutoff for the p-value to reduce the probability of a type 2 error. That is, you should choose a higher level of significance and be willing to reject the null hypothesis even with a moderately large p-value, because in this case you want above all to avoid a type 2 error, or false negative, and are therefore more willing to reject the null hypothesis.

HW 5 1 Also complete the following problem: 1. Suppose that you want to estimate the average cholesterol level in some population. In order to do this, you collect a simple random sample of 100 people from the population and measure their individual cholesterol levels. You then estimate the population average cholesterol level to be the average cholesterol level in your sample. Because the sample was randomly drawn from the population, the estimate you compute from the sample is itself random. The frequency curve or probability distribution for the estimate is its sampling distribution. a. If you changed your study by decreasing the sample size from 100 to 50, would the standard deviation (or spread) of the sampling distribution of the new estimate (from 50 measurements) be smaller or larger than the standard deviation (or spread) of the sampling distribution of the old estimate (with 100 measurements)? Explain your answer. b. If you changed the population of individuals that you were studying and the new population had greater variation in cholesterol levels, would the standard deviation (or spread) of the sampling distribution of the new estimate (from the more variable population) be smaller or larger than the standard deviation (or spread) of the sampling distribution of the old estimate (from the less variable population)? Explain your answer.

For part (a), correct answers should explain that the spread of the sampling distribution will go up as we decrease sample size. For (b), correct answers should explain that the spread of the sampling distribution will again go up if there is greater variability in the underlying population from which individuals are sampled. Both of these conclusions follow directly from the formula for the sampling distribution standard deviation. Points may have been deducted for an incorrect answer or for a correct answer that lacks an explanation or for which the explanation does not support the conclusion or makes clearly incorrect statements. ACTUAL ANS A. If you decreased the sample size from 100 to 50, the standard deviation of the new sampling distribution would be greater than that of the old estimate, because the sampling distribution's standard deviation is the population standard deviation divided by the square root of the sample size. Therefore, the smaller the sample size, the larger the standard deviation, so the standard deviation of the new estimate with a sample size of 50 is bigger than that of the old estimate with a sample size of 100.

HW 5 19.5 Refer to Exercise 4. Redraw the picture under the assumption that you will collect 100 measurements instead of only nine. Discuss how the picture differs from the one in Exercise 4. Exercise 4: 19.4. Suppose you are interested in estimating the average number of miles per gallon of gasoline your car can get. You calculate the miles per gallon for each of the next nine times you fill the tank. Suppose, in truth, the values for your car are bell-shaped, with a mean of 25 miles per gallon and a standard deviation of 1. Draw a picture of the possible sample means you are likely to get based on your sample of nine observations. Include the intervals into which 68%, 95%, and almost all of the potential sample means will fall.

Increasing the sample size to 100 changes the sampling distribution standard deviation to 1/10 from 1/3. The sampling distribution mean is unchanged. Thus a correct answer should draw a frequency curve like in the last problem, but now the standard deviation should be 1/10. Points may have been deducted for any of the following: ● The frequency curve is obviously non-normal (slight asymmetry or other artifacts resulting from imperfect drawing are okay as long as the general bell curve shape is present) ● The frequency curve's mean is clearly different from 25 ● The frequency curve's standard deviation is clearly different from 1/10 ● The 1, 2, and 3 standard deviation intervals are missing or obviously incorrect ● No written explanation is given of the difference between the curves (e.g. of the difference in standard deviations), or the explanation is clearly incorrect.

What is the difference between power and the significance level?

Significance level (α): the probability of rejecting the null hypothesis when it is true - therefore the probability of a Type I error, i.e. incorrectly rejecting a true null hypothesis - the p-value measures how likely results at least as extreme as yours would be due to chance alone; the smaller the p-value, the stronger the evidence that we should reject the null hypothesis and that the results are not just due to chance - a result is significant when p-value ≤ level of significance (the cutoff for the p-value's maximum value in order for it to be small enough to rule out the null hypothesis) - the p-value is calculated using the probability distribution for the null hypothesis - statistical significance: a relationship as strong as the one observed in the sample (or stronger) would be unlikely without a real relationship in the population [the relationship observed is very likely not due to mere chance] Power: the probability of rejecting the null hypothesis when it is false - the probability of avoiding a Type II error/probability of correctly rejecting a false null hypothesis - 1 − power = the probability of a Type II error - calculated using the probability distribution for the alternative hypothesis - a study may fail to find a relationship between two variables because the test had low power, a common consequence of conducting research with too small a sample size

Overcoming the cycle of noisy estimates and overconfidence

Statistical noise: the random irregularity found in any real-life data; it has no pattern.

HW 7 24.25 In Original Source 1, the researchers addressed some limitations with the study. One of them was: "First, there was a relatively small number of subjects who participated and this limited our statistical power. A number of our hypothesized effects were in the predicted direction, but failed to reach significance" (Davidson et al., p. 569). Explain what is meant by the first sentence of the quote.

The question wording is a bit ambiguous, so correct answers can explain either the reasoning or the meaning of the sentence. A correct explanation of the reasoning would be that a smaller sample size (i) leads to a larger standard deviation for the sampling distribution and (ii) that a larger sampling distribution standard deviation in turn decreases the power of a hypothesis test. An explanation of the meaning of the sentence would state that power quantifies our probability of successfully rejecting the null/detecting a nonzero effect if the null is in fact false, and the small sample size made it more likely that the researchers would be unable to correctly reject the null hypothesis. Points may have been deducted if the given explanation makes incorrect statements, is confusing or otherwise does not demonstrate understanding, or simply rephrases the original sentence without elaboration. ACTUAL ANS: The first sentence means that the sample size was small, which limited the statistical power: the probability of rejecting the null hypothesis and detecting an effect when the alternative hypothesis is true, thereby avoiding a Type II error (false negative). This means there is a chance that the alternative hypothesis is true but that the sample size was too small to detect it, thus making the power too low.
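The reasoning-path answer (smaller n → larger sampling-distribution standard deviation → lower power) can be worked analytically for a one-sided z-test. A sketch with hypothetical values (an effect of 0.3 standard deviations and the two sample sizes are my own choices, not from the study):

```python
from statistics import NormalDist

def power(effect, sigma, n, alpha=0.05):
    """Analytic power of a one-sided z-test of H0: no effect."""
    se = sigma / n ** 0.5                              # grows as n shrinks
    z_crit = NormalDist().inv_cdf(1 - alpha)           # rejection cutoff under H0
    return 1 - NormalDist().cdf(z_crit - effect / se)  # P(reject) when H1 is true

# Same hypothetical effect; only the sample size changes:
print(power(effect=0.3, sigma=1.0, n=20))   # small study: low power
print(power(effect=0.3, sigma=1.0, n=200))  # larger study: much higher power
```

With the small n, a real effect in the predicted direction will often fail to reach significance, which is exactly the situation the quoted sentence describes.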

HW 6 4. Consider the following two scenarios. In each scenario, choose the confidence interval that you think would be more appropriate for the task described. a. Imagine you are trying to estimate the average concentration of a certain harmful chemical in the tap water in a neighborhood in NYC. You are trying to determine if the concentration is below a certain threshold. If you conclude that the average is almost certainly below the threshold, then no further safety analysis will be conducted. If you conclude that average concentration may be above the threshold, further analysis of the pipes will be performed. You choose to assess whether the average could be above this threshold by computing a confidence interval and checking whether the upper limit of the interval is below the threshold. For this purpose, would you prefer to compute an 80% confidence interval that is narrower, or a 99% confidence interval that is wider? Why? b. Imagine that you are helping to set the budget for snow-clearing devices in a city that has historically gotten up to 32 inches of snowfall annually. You are tasked with projecting the amount of snowfall that the city may experience next year. You gather data to predict this level of snowfall and compute two confidence intervals: a 99% interval ranging from 2 inches to 30 inches, and an 80% interval ranging from 12 inches to 20 inches. If the city is primarily interested in efficiently allocating its funds and can adapt to higher or lower snowfall later if needed, which of these intervals would you prefer to report as your estimated range? Why?

This problem is meant to test your understanding of the tradeoff between confidence level and interval width. The primary criterion for a correct answer is that it fully explains your reasoning and correctly uses the concepts of confidence level and/or interval width. The answer I had in mind for (a) was that the wider interval would be preferable as the health consequences of mistakenly estimating that the contaminant level is below the threshold could be severe. The answer I had in mind for (b) was that the narrower interval would be preferable since the consequences of making a mistake are small, but the efficiency and cost gains from successfully estimating lower snowfall are potentially large. However, if you preferred the other interval in either case, you could still receive full credit if you convincingly argued for your position. ACTUAL ANS: 4a. In this case it would be best to use the 99% confidence interval that is wider, since certainty about whether the average is below the threshold is what matters here: if it is almost certainly below the threshold, no more safety analysis will need to be conducted. Because the goal is near-certainty that the average concentration is below the threshold, you want a higher confidence level, and thus a wider interval, so you can be almost certain whether the concentration falls below the upper limit of the interval. 4b. Since the city's primary interest is efficient allocation of funds, I would report the 80% interval ranging from 12 to 20 inches. Here extreme accuracy is less important than a general range for next year's snowfall, so 80% confidence that the interval captures the true snowfall is high enough. Additionally, since the goal is being able to efficiently allocate funds, a narrower confidence interval is more monetarily efficient, as the 80% interval will most likely, though not certainly, include next year's snowfall.
Additionally, even if the actual snowfall next year ended up being lower or higher than the limits of the 80% interval, the city can adapt to higher or lower snowfall later if needed, which further implies the goal is not near-certain coverage with a wider interval, but a relatively accurate interval that covers most of the likely snowfall amounts.
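The tradeoff the problem is testing comes down to the z multiplier: the only thing that changes between an 80% and a 99% interval for the same data is z. A sketch with hypothetical snowfall numbers (the sample s.d. of 8 inches and n = 10 years are my own inventions, not from the problem):

```python
from statistics import NormalDist

def ci_width(sd, n, level):
    """Width of a normal-theory confidence interval for a mean."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-sided multiplier
    return 2 * z * sd / n ** 0.5

# Hypothetical snowfall data: sample s.d. of 8 inches over n = 10 years.
w80 = ci_width(sd=8, n=10, level=0.80)  # multiplier z is about 1.28
w99 = ci_width(sd=8, n=10, level=0.99)  # multiplier z is about 2.58
print(w80 < w99)  # True: more confidence requires a wider interval
```

Which width you should prefer is then a judgment about consequences, as the rubric says, not something the math decides for you.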

HW 6 2. Now imagine that you do this process again, but you specifically focus on steakhouses. a. Suppose that you use the same sample size, and that your estimate of the population standard deviation (i.e. sigma) is roughly the same as before. If you find that the customers in steakhouses are older on average, then, compared to your confidence interval for all restaurants, your confidence interval for steakhouses will be [wider, narrower, roughly unchanged]. b. Now suppose that you recompute the confidence interval with a higher confidence level of 99.9%. Compared to the original 95% confidence interval, this confidence interval will be [wider, narrower, roughly unchanged].

[5 points] Correct answers should state "roughly unchanged" for (a), "wider" for (b), and explain both answers coherently. The answer for (a) follows from the fact that changing the value of the quantity being estimated does not affect the width of the confidence interval as long as the sampling distribution's standard deviation is unchanged and the confidence level is the same. This can be seen directly from the formula. For (b), the answer follows from the fact that obtaining a larger confidence level requires choosing a larger value of z, which in turn increases the width (as can again be seen from the confidence interval formula). ACTUAL ANS: 2a. If you find that the customers in steakhouses are older on average, then, compared to your confidence interval for all restaurants, your confidence interval for steakhouses will be roughly unchanged, because of the formula for the confidence interval. The sample average is only added to or subtracted from the rest of the formula, so it does not affect the margin of error; therefore, if only the average changes, the width of the confidence interval does not change. Rather, the only change an increase in the average causes is an increase in the lowest and highest possible average ages in the confidence interval. 2b. Compared to the original 95% confidence interval, an increase in confidence level to 99.9% will cause the new confidence interval to be wider, because z, or the multiplier (how many sampling distribution standard deviations we add and subtract), depends on the desired confidence level. Therefore, the higher the confidence level, the larger the multiplier, and thus the wider the confidence interval, since the multiplier is multiplied into the confidence interval formula.
After all, if you want more confidence that the actual value falls in the range of your confidence interval, you have to expand your range of potential values, i.e. the width of the interval. Additionally, since another common confidence level is 99.7%, and we know the value of z for a 99.7% confidence interval is 3, which is larger than the roughly 2 for a 95% confidence interval, increasing the confidence level even further to 99.9% will only increase the width of the interval further.
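Both answers can be read straight off the formula mean ± z · sd/√n. A sketch with hypothetical numbers (the means 42 and 55, sd 15, and n = 100 are invented for illustration):

```python
from statistics import NormalDist

def ci(mean, sd, n, level):
    """Normal-theory confidence interval for a population mean."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-sided multiplier
    half = z * sd / n ** 0.5                   # margin of error: the mean plays no part
    return mean - half, mean + half

# (a) Same sd and n, only the sample mean differs:
lo1, hi1 = ci(mean=42, sd=15, n=100, level=0.95)  # all restaurants
lo2, hi2 = ci(mean=55, sd=15, n=100, level=0.95)  # steakhouses, older on average
print(abs((hi1 - lo1) - (hi2 - lo2)) < 1e-9)  # True: endpoints shift, width unchanged

# (b) A higher confidence level means a larger z and a wider interval:
lo3, hi3 = ci(mean=42, sd=15, n=100, level=0.999)
print((hi3 - lo3) > (hi1 - lo1))  # True
```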

In Class.2: The setup for this question is a hypothetical tutoring program that increases students' grades by an average of 5%. The problem asks you to consider a test of the null hypothesis that the average effect of the tutoring program is 0 (i.e. on average student grades do not change). a. Imagine there is an increase in the variation of students' outcomes after tutoring (i.e. the average increase is still 5% but there is more variation around this average). Does this increase or decrease the power of the researcher's hypothesis test? b. Imagine that the researcher decides to change their significance level from 10% to 1%. In order to reject the null hypothesis, the estimated grade improvement in their sample... c. After decreasing their significance level, the power of the researcher's hypothesis test... d. Imagine the researcher decides to double their sample size. Will the power of their hypothesis test then increase or decrease? e. Imagine the researcher repeats this study with a different tutoring program that increases students' calculus grades by an average of 7%. If the variation in student outcomes is the same as in the first program, and if the researcher uses the same sample size and significance level, then the power of their test will now be...

a. Decreases power
b. Must be larger
c. Decreases
d. Increases
e. Higher
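All five answers can be verified with an analytic power function for a one-sided z-test. The sd of 10 and n of 50 below are hypothetical filler values (the question never states them); the 5%, 7%, 10%, and 1% figures are from the question:

```python
from statistics import NormalDist

def power(effect, sd, n, alpha):
    """Analytic power of a one-sided z-test for a mean improvement of `effect`."""
    se = sd / n ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha)           # rejection cutoff under H0
    return 1 - NormalDist().cdf(z_crit - effect / se)  # P(reject) when H1 is true

base = power(effect=5, sd=10, n=50, alpha=0.10)
print(power(5, 15, 50, 0.10) < base)   # (a) more variation -> power decreases
print(power(5, 10, 50, 0.01) < base)   # (b, c) stricter alpha raises the cutoff, lowering power
print(power(5, 10, 100, 0.10) > base)  # (d) doubled sample size -> power increases
print(power(7, 10, 50, 0.10) > base)   # (e) larger true effect (7%) -> power is higher
```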

HW 7 22.13 Given the convention of declaring that a result is statistically significant if the p-value is 0.05 or less, what decision would be made concerning the null and alternative hypotheses in each of the following cases? Be explicit about the wording of the decision. a. p-value = 0.35 b. p-value = 0.04

a. Correct answers should indicate that in this case the researchers do not reject the null hypothesis or that their estimate does not attain statistical significance. Any language that expresses this idea is acceptable, but answers should not state or imply that the researchers accept the null hypothesis or conclude that the alternative hypothesis is false. Points may have been deducted for any of the following: i. Answer states that null hypothesis should be rejected ii. Answer states that null hypothesis is accepted or alternative hypothesis is rejected b. Correct answers should indicate that the null hypothesis is rejected or that the estimate attains statistical significance. Any language that expresses this idea is acceptable. Points may have been deducted for any of the following: i. Answer states that null hypothesis cannot be rejected ACTUAL ANS: 13a. Since the p-value is 0.35, which is greater than 0.05, the p-value is not small enough to convincingly rule out chance. Therefore, we cannot reject the null hypothesis as an explanation for the results. 13b. Since the p-value is 0.04, which is less than 0.05, the null hypothesis is rejected in favor of the alternative hypothesis.
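The convention the question describes is a one-line decision rule; a minimal sketch (the wording of the returned strings is my own, chosen to match the rubric's acceptable language):

```python
def decision(p_value, alpha=0.05):
    """Apply the conventional significance cutoff to a p-value."""
    if p_value <= alpha:  # "0.05 or less" counts as significant
        return "reject the null hypothesis (statistically significant)"
    return "do not reject the null hypothesis (not statistically significant)"

print(decision(0.35))  # a: do not reject the null hypothesis
print(decision(0.04))  # b: reject the null hypothesis
```

Note the asymmetry the rubric warns about: failing to reject is never the same as accepting the null hypothesis.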

HW 7 22.5 The news story discussed in Case Study 6.5 (pp. 132-133) was headlined "NIH study finds that coffee drinkers have lower risk of death" (Source: http://www.nih.gov). The news story was based on an article published in The New England Journal of Medicine (Freedman et al., 2012) that followed hundreds of thousands of older adults from 1995 to 2008 and examined the association between drinking coffee and longevity. a. What are the null and alternative hypotheses for this study? b. The authors concluded that those who drank coffee had a statistically significantly lower risk of death (during the time period of the study) than those who did not. Which type of error, type 1 or type 2, could have been made in making this conclusion? c. Explain what a type 1 error and a type 2 error would be in this situation. Which one do you think is more serious? Explain.

a. Correct answers should state that the null hypothesis is that there is no association between coffee consumption and longevity for the relevant population (e.g. older adults), and that the alternative hypothesis is that coffee consumption is associated with longevity. It is acceptable if the answer assumes a particular direction for the alternative hypothesis (i.e. coffee consumption associated with increased or decreased longevity specifically). Points may have been deducted for any of the following: i. Incorrect null hypothesis ii. Incorrect alternative hypothesis b. Correct answers should state that a Type I error could have occurred. No explanation is required since the question did not ask for one. Points may have been deducted for any of the following: i. Incorrect answer (i.e. type II error or other error) c. Correct answers should explain that a type I error would occur if the researchers conclude that there is an association when there is none, and a type II error would occur if the researchers do not conclude that there is an association when in fact there is one. Points may have been deducted for any of the following: i. Incorrect explanation of type I error ii. Incorrect explanation of type II error (e.g. if answer states researchers conclude null hypothesis is true) ACTUAL ANS: 5a. The null hypothesis is that drinking coffee has no effect on how long one lives. The alternative hypothesis is that there is an association between drinking coffee and risk of death. 5b. If the authors concluded that those who drank coffee had a significantly lower risk of death, then a type 1 error, which can only be made if the null hypothesis is actually true, could have been made in making this conclusion.
Since the authors concluded that the alternative hypothesis, that there is an association between drinking coffee and longevity, is true, a potential error could be that the alternative hypothesis is actually not true and there is no association between coffee consumption and longevity, i.e. a type 1 error. 5c. A type 1 error would be concluding that there is an association between drinking coffee and longevity when there is no association. A type 2 error, which is a false negative and occurs when the alternative hypothesis is actually true but the null hypothesis is mistakenly retained, would be concluding that there is no association between drinking coffee and longevity when there actually is one. A type 2 error would be more serious, because if people are told there is no association between drinking coffee and lifespan when there actually is either a positive or negative association between coffee consumption and longevity, this is more troublesome: people would assume they do not need to alter their coffee consumption habits to maximize their lifespan when they actually should, and would think coffee has no effect on their lifespan when it does. In contrast, a type 1 error, or false positive, is less serious, since telling people there is an association between drinking coffee and longevity when there is none merely means encouraging people to alter their coffee habits for no reason, since coffee actually has no effect on their lifespan.

The p-value is...*

the probability of getting an estimate at least as extreme as the one we observed, assuming the null hypothesis is true.

Type One vs Type Two Error

A type 1 error can only be made if the null hypothesis is actually true: a false positive, rejecting the null hypothesis when it is in fact true. A type 2 error can only be made if the alternative hypothesis is actually true: a false negative, failing to reject the null hypothesis when the alternative is true.

Replicability

when a study's findings are able to be duplicated, ideally by independent investigators

simpson's paradox

Occurs when you combine the results of two separate contingency tables into one. The problem arises when a relationship is different for one population than it is for another, but the results for samples from the two populations are combined; the combined table can then show a different (even reversed) relationship from the one seen in each table separately.
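A numeric sketch of the paradox, using an illustrative set of counts (successes, total) chosen so that treatment A wins within each subgroup but loses after the two tables are combined; the numbers are not from the course materials:

```python
# Illustrative counts (successes, total) for two subgroups and two treatments:
small = {"A": (81, 87),   "B": (234, 270)}
large = {"A": (192, 263), "B": (55, 80)}

def rate(successes, total):
    return successes / total

# Within each separate table, treatment A has the higher success rate:
print(rate(*small["A"]) > rate(*small["B"]))  # True
print(rate(*large["A"]) > rate(*large["B"]))  # True

# But after combining the two tables into one, B looks better:
combined_a = rate(81 + 192, 87 + 263)  # 273/350 = 0.78
combined_b = rate(234 + 55, 270 + 80)  # 289/350, about 0.83
print(combined_a < combined_b)  # True: the relationship reverses
```

The reversal happens because A was applied mostly to the harder (large) cases and B mostly to the easier (small) ones, so the combined table mixes unequal group sizes.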

