STAT FINAL STUDY GUIDE

A university wanted to find out the proportion of students who felt comfortable reporting cheating by their fellow students. A survey of 2,800 students was conducted and the students were asked if they felt comfortable reporting cheating by their fellow students. The results were 1,344 answered "Yes" and 1,456 answered "no". A 99% confidence interval for the proportion of the student population who feel comfortable reporting cheating by their fellow students is from __________ to __________:

0.456 to 0.504 Response Feedback: p = 1344/2800 = 0.48; n = 2800; standard error = sqrt(0.48*0.52/2800) = 0.009442; multiplier = 2.58 (99 percent confidence); confidence interval = 0.48 +/- (2.58)*(0.009442) = (0.456, 0.504)
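
The arithmetic in the feedback above can be checked with a short Python sketch. The function name `proportion_ci` is hypothetical, and the multiplier 2.58 is the value this guide uses for 99 percent confidence.

```python
import math

def proportion_ci(successes, n, multiplier):
    """Return (low, high) for p_hat +/- multiplier * standard error."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * standard_error
    return p_hat - margin, p_hat + margin

# 1,344 "Yes" answers out of 2,800 students, 99 percent confidence
low, high = proportion_ci(1344, 2800, 2.58)
print(round(low, 3), round(high, 3))
```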

If numerous large random samples or repetitions of the same size are taken from a population, the curve made from means from the various samples will have what approximate shape?

A bell shape

Suppose you think that the proportion of coupon users in grocery stores in your town has decreased from 10 years ago. You know from previous research that 10 years ago, 35% of grocery store customers in your town used coupons. Suppose you take a random sample of 100 customers from a variety of grocery stores in your town, and you find that 25 percent of them use coupons. What is the p-value (one-sided) for your observed results?

0.0179 Response Feedback: Use standard normal lookup table. For z=-2.10, you will see 0.0179, which is the one-sided p-value. Probability of getting a -2.10 or more negative is 0.0179.

Using critical values of -2 and +2, this is a test at the

0.05 or 5 percent level of significance

What is a Hypothesis?

A hypothesis is a claim (assertion) about a population parameter

The owner of a local nightclub has recently surveyed a random sample of n = 250 customers of the club. She would like to determine whether or not the mean age of her customers is 30. Suppose she found that the sample mean was 30.45 years and the standard deviation of age is 5 years. What is the p-value for the hypothesis test?

0.16 ZSTAT was 1.42. From standard normal look up table, the probability associated with 1.42 is 0.9222 (means the probability of z less than 1.42 is 0.9222). Therefore the probability of z greater than 1.42 is (1-0.9222) or 0.0778. Two sided test, so 0.0778 x 2 or about 0.16

Suppose you think that the proportion of coupon users in grocery stores in your town has decreased from 10 years ago. You know from previous research that 10 years ago, 35% of grocery store customers in your town used coupons. Suppose you take a random sample of 100 customers from a variety of grocery stores in your town, and you find that 25 percent of them use coupons. What is your conclusion? Assume a significance level of .05.

Reject the null hypothesis Response Feedback: Reject. The p-value (0.0179) is less than 0.05.

The owner of a local nightclub has recently surveyed a random sample of n = 250 customers of the club. She would like to determine whether or not the mean age of her customers is 30. Suppose she found that the sample mean was 30.45 years and the standard deviation of age is 5 years. Using a significance level of 0.05, what conclusion can she make?

Do not reject the null hypothesis that mean age is 30 Response Feedback: H0: mean age is 30, H1 not equal to 30 What is standard error? 5/sqrt(250) = 0.3162 ZSTAT=(30.45-30)/0.3162 = 1.42 Critical values for 0.05 are -2 and +2 Not beyond critical: Do not reject the null hypothesis
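
As a rough check of the nightclub example, the sketch below computes the standard error, ZSTAT, and the two-sided p-value. It assumes Python 3.8+ for `statistics.NormalDist` and uses the guide's critical values of -2 and +2. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

sample_mean, hypothesized_mean = 30.45, 30
std_dev, n = 5, 250

standard_error = std_dev / math.sqrt(n)                  # about 0.3162
z_stat = (sample_mean - hypothesized_mean) / standard_error

# Two-sided p-value: twice the area beyond |z| under the standard normal curve
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Guide's rule: reject only if ZSTAT is beyond -2 or +2
reject = abs(z_stat) > 2
print(round(z_stat, 2), round(p_two_sided, 2), reject)
```

Without rounding ZSTAT first, the p-value comes out near 0.155, which matches the card's "about 0.16".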

When you experience a coincidence, which of the following interpretations is appropriate?

If an event has a million to one chance, it is expected to happen to 330 people in the U.S. in a given day, on average (because the U.S. population is 330 million). It is not unlikely that something surprising will happen to someone, somewhere, someday. There is a big difference between the probability of a rare event happening to someone somewhere, and the

If ZSTAT < -2 or If ZSTAT > +2 we...

Reject the Null

What does the range between -1 and 1 mean as a feature of the coefficient of correlation?

The closer to -1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship.

Suppose that test scores on a particular exam have a mean of 77 and standard deviation of 5. Suppose you take numerous random samples of size 100 from this population. Which of the following statements is true?

The curve for the sample means will be bell-shaped, with a mean of 77 and a standard deviation of 0.5.

Standard error =

standard deviation / sqrt(sample size)

Practical significance means

the magnitude of an effect is large.

A Type I error is committed when

you reject a null hypothesis that is true

Vaccine Example: Statistical Significance

The null hypothesis is: the vaccine has no effect (vaccine is no better than placebo shot). • If we reject the null, we conclude vaccine had a statistically significant effect

When a relationship or value from a sample is so strong that we decide to rule out chance as an explanation for its magnitude, what does this mean?

The observed result is statistically significant. We conclude that the observed result carries over to the population, and cannot be explained away by chance. We could have been unlucky with our sample, and come to the wrong conclusion, but that chance is

For Covid-19, the CDC reported the relative risk of hospitalization for vaccinated compared to unvaccinated was 0.06 and the 95 % confidence interval was 0.03 to 0.12. Consider the null hypothesis: the vaccine is no more effective than a placebo shot at preventing hospitalization. You reject the null hypothesis at the 0.05 level of significance.

True

In the American judicial system, you must presume that the defendant is innocent unless there is enough evidence to conclude that he/she is guilty. True or False: In this situation, we can state the null hypothesis is that the defendant is innocent and the alternative hypothesis is that the defendant is guilty

True

Larger samples tend to result in more accurate estimates of population values than do smaller samples.

True

We never say we accept the null; we either reject it or do not reject it

True

Suppose, in testing a hypothesis about a mean, the appropriate p-value is computed to be 0.043. True or False: The null hypothesis should be rejected if the chosen level of significance is 0.05.

True Response Feedback: p-value is low, so null must go (0.043 is less than 0.05)

The test statistic and p-value do not provide information about the magnitude of the effect

True, Confidence intervals can be more useful: show you both the magnitude and uncertainty

The covariance has a limited interpretation

True, It is not possible to determine the relative strength of the relationship from the size of the covariance

Type I Error: innocent person convicted (an innocent person is falsely convicted and the guilty party remains free). Type II Error: guilty person acquitted (a criminal is set free). Which is more serious?

U.S. legal system: Type I Error more serious

Covariance between two variables

cov(X,Y) > 0: X and Y tend to move in the same direction. cov(X,Y) < 0: X and Y tend to move in opposite directions. cov(X,Y) = 0: X and Y have no linear relationship (zero covariance alone does not guarantee independence).

The ZSTAT just tells us

how many standard errors our sample mean is from the hypothesized mean

The width of a confidence interval estimate for a proportion will be

narrower for 90% confidence than for 95% confidence Response Feedback: Higher levels of confidence require a greater range of values (wider interval). A larger sample size leads to narrower intervals (the interval tightly bounds the true value).

A statistically significant relationship or difference does

not necessarily mean an important one.

• Let's suppose we know the overall proportion of Americans in poverty is 0.12. • By way of context, a family of four that has total income less than about $28,000 would be poor according to the U.S. government's definition of poverty. • We wish to test the null hypothesis that the proportion of Native Americans in poverty is equal to 0.12 against the alternative that the proportion in poverty is not equal to 0.12. • The U.S. Census Bureau interviews a sample of 50 Native Americans and determines 12 are in poverty. How do we estimate the proportion of Native Americans in poverty?

p = 12/50 = 0.24. This differs from the null hypothesized value of 0.12, but we need to construct a test statistic to determine whether the difference is statistically significant.

Suppose a poll was conducted to find out what percentage of Americans intend to vote in the next Presidential election. The 95% confidence interval from the poll is 49% to 55%. What is the poll's margin of error?

plus or minus 3%

The power of a test is measured by its capability of

rejecting a null hypothesis that is false.

Regression of Doctor Visits on Age • Each additional year of age leads to 0.138 more doctor visits - Each additional 10 years of age leads to 1.38 more doctor visits on average • Formula Y = -2.555 + 0.138*X What if we wanted to predict the number of doctor visits for a 35 year old?

-2.555 + 0.138*35 = 2.275
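
The prediction above is just the fitted line evaluated at X = 35. A minimal sketch, using the intercept and slope quoted in the card (the function name `predict_visits` is hypothetical):

```python
INTERCEPT = -2.555   # a: value of Y where the line crosses X = 0
SLOPE = 0.138        # b: change in Y for each additional year of age

def predict_visits(age):
    """Predicted doctor visits from the fitted line Y = a + b*X."""
    return INTERCEPT + SLOPE * age

prediction_35 = predict_visits(35)
# Ten extra years of age changes the prediction by 10 * slope
ten_year_change = predict_visits(45) - predict_visits(35)
print(round(prediction_35, 3), round(ten_year_change, 2))
```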

Suppose the Senator read in a report that average household income in PPCs for the United States (as a whole) is $40,700. • He wants to know if average household income in PPCs in South Carolina is different than $40,700. • H0 : μ = $40,700 • H1 : μ ≠ $40,700 • Let's suppose the state of South Carolina pulls a sample of 1,000 households in the state PPCs and conducts a survey of sample members. • Tabulations indicate mean income in the sample is $40,000. • The standard deviation of income in these counties is $10,000. What is the standard error of the mean?

$10,000/sqrt(1000) = $316.23

An economist is interested in studying the incomes of consumers in a country. A random sample of 50 individuals resulted in a mean income of $15,000 and a standard deviation of $1,000. What is the upper end in a 99% confidence interval for the average income?

$15,365
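
The card gives only the final answer. One way to arrive at it, assuming the same 2.58 multiplier this guide uses elsewhere for 99 percent confidence, is sketched below.

```python
import math

sample_mean, std_dev, n = 15_000, 1_000, 50
multiplier = 2.58  # assumed: the guide's 99 percent multiplier

standard_error = std_dev / math.sqrt(n)            # about 141.42
upper_limit = sample_mean + multiplier * standard_error
print(round(upper_limit))
```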

• Let's suppose the state of South Carolina pulls a sample of 1,000 households in the state Persistent Poverty Counties (PPCs) and conducts a survey of sample members. • Tabulations indicate mean income in the sample is $40,000. • The standard deviation of income in these counties is $10,000. • Construct a 95 percent confidence interval for mean income in the PPCs

$40,000 +/- (2)*($316.23) = ($39,368, $40,632) • Lower limit: $40,000-(2)($316.23) = $39,368 • Upper limit: $40,000+(2)*($316.23)=$40,632
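
The same interval can be reproduced in a few lines. This is a sketch that applies the guide's general formula (point estimate +/- multiplier * standard error) with the guide's 95 percent multiplier of 2.

```python
import math

sample_mean, std_dev, n = 40_000, 10_000, 1_000
multiplier = 2  # the guide's 95 percent multiplier

standard_error = std_dev / math.sqrt(n)            # about 316.23
lower_limit = sample_mean - multiplier * standard_error
upper_limit = sample_mean + multiplier * standard_error
print(round(lower_limit), round(upper_limit))
```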

ZSTAT =

(sample mean - hypothesized mean) / standard error

Critical values are

-2 and +2

Suppose you want to test whether the proportion of coupon users in grocery stores in your town has decreased from 10 years ago. You know from previous research that 10 years ago, 35% of grocery store customers in your town used coupons. Suppose you take a random sample of 100 customers from a variety of grocery stores in your town, and you find that 25 percent of them use coupons. What is the value of the test statistic for your observed results? HINT:Construct your standard error using a proportion of 0.35.

-2.10 Response Feedback: H0: Proportion 0.35. H1 Proportion < 0.35. Standard error (use null value of 0.35) = sqrt(0.35*(1-0.35)/100) = 0.0477 ZSTAT=(0.25-0.35)/0.0477=-2.10
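
A short sketch of the coupon test, assuming Python 3.8+ for `statistics.NormalDist`. Rounding ZSTAT to two decimals before the lookup mimics reading a printed standard normal table. The unrounded statistic gives a slightly different p-value (about 0.0180).

```python
import math
from statistics import NormalDist

p_null, p_sample, n = 0.35, 0.25, 100

# Standard error is built from the hypothesized (null) proportion
standard_error = math.sqrt(p_null * (1 - p_null) / n)
z_stat = (p_sample - p_null) / standard_error

# One-sided (left-tail) p-value, using z rounded as in a lookup table
p_table = NormalDist().cdf(round(z_stat, 2))
print(round(z_stat, 2), round(p_table, 4))
```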

Suppose a 95% confidence interval for the average amount of weight loss on a diet program for males is between 13.4 and 18.3 pounds. These results were based on a sample of 42 male participants who were deemed to be overweight at the start of the 4-month study. What is the standard error of the sample mean?

1.225

Suppose a 95% confidence interval for the average amount of weight loss on a diet program for males is between 13.4 and 18.3 pounds. These results were based on a sample of 42 male participants who were deemed to be overweight at the start of the 4-month study. What is the sample mean?

15.85

Multiplier for 95 percent confidence interval will always be

2

Suppose a 95% confidence interval for the average amount of weight loss on a diet program for males is between 13.4 and 18.3 pounds. These results were based on a sample of 42 male participants who were deemed to be overweight at the start of the 4-month study. What is the margin of error?

2.45
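
The three weight-loss answers (sample mean, margin of error, standard error) all follow from the interval endpoints. A sketch of the back-calculation, assuming the guide's 95 percent multiplier of 2:

```python
lower, upper = 13.4, 18.3
multiplier = 2  # the guide's 95 percent multiplier

sample_mean = (lower + upper) / 2          # midpoint of the interval
margin_of_error = (upper - lower) / 2      # half the interval width
standard_error = margin_of_error / multiplier
print(round(sample_mean, 2), round(margin_of_error, 2), round(standard_error, 3))
```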

The introductory biology class at a large university is taught to hundreds of students each semester. For planning purposes, the instructor wants to find out the average amount of time that students would use to take the first quiz, if they could have as long as necessary to take it. She takes a random sample of 100 students from this population and finds that their average time for taking the quiz is 24 minutes, and the standard deviation is 16 minutes. What is the 95% confidence interval for the average amount of time that students would use to take the first quiz?

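21 to 27 minutes (standard error = 16/sqrt(100) = 1.6; 24 +/- (2)*(1.6) = (20.8, 27.2), rounded to 21 to 27 minutes)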

About how many people would need to be gathered together to be at least 50% sure that two of them will share the same birthday (the same day of the year, not necessarily the same year)?

23
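
The answer of 23 can be verified numerically. The sketch below assumes 365 equally likely birthdays and ignores leap years. The function name is hypothetical.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for k in range(people):
        p_all_different *= (days - k) / days
    return 1 - p_all_different

# Find the smallest group size with at least a 50 percent chance of a match
group_size = 1
while shared_birthday_probability(group_size) < 0.5:
    group_size += 1
print(group_size)
```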

Suppose a sample of 120 smokers were given nicotine patches and, after 8 weeks, 55 individuals had quit smoking. What is the 95% confidence interval for the percentage of nicotine-patch users who quit smoking by the eighth week?

37%-55% Response Feedback: p = 55/120, n = 120; plug in these values for standard error = sqrt(p*(1-p)/120); confidence interval: p +/- (2)*(standard error) = (0.37, 0.55), or 37 to 55 percent

How do men and women compare when it comes to talking on the cell phone? Suppose you take a random sample of 100 male cell phone owners and a random sample of 100 female cell phone owners. The average number of minutes for the women per month was 280 with a standard deviation of 20; the average number of minutes for the men per month was 190 with a standard deviation of 30. Based on these sample results, what is the 95% confidence interval for the difference in average time spent on the cell phone for females versus males?

83-97 minutes Response Feedback: point estimate of difference: 280-190=90 standard error for women: 20/sqrt(100) =2 standard error for men: 30/sqrt(100)=3 standard error of difference=sqrt(2^2 + 3^2) = 3.6 95 percent confidence interval: 90 +/- (2)*(3.6) = (82.8,97.2) or rounded 83 to 97 minutes
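
The steps in the feedback can be sketched as follows. The two standard errors combine as the square root of the sum of their squares, and the multiplier of 2 is the guide's 95 percent convention.

```python
import math

mean_women, sd_women, n_women = 280, 20, 100
mean_men, sd_men, n_men = 190, 30, 100
multiplier = 2  # the guide's 95 percent multiplier

se_women = sd_women / math.sqrt(n_women)   # 2.0
se_men = sd_men / math.sqrt(n_men)         # 3.0
se_difference = math.sqrt(se_women**2 + se_men**2)

difference = mean_women - mean_men
low = difference - multiplier * se_difference
high = difference + multiplier * se_difference
print(round(low, 1), round(high, 1))
```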

Which of the following is a correct interpretation of a 95% confidence interval?

95% of the random samples you could select would result in intervals that contain the true population value

How do you calculate Relative Risk?

Risk in the experimental group (experimental positives divided by experimental total) divided by risk in the control/placebo group (placebo positives divided by placebo total)

90% confidence intervals are wider than 95% confidence intervals.

False

A fair coin is to be flipped two times. The sequence HT is more likely than HH (where H denotes coin landed on heads, T on tails).

False

Holding the level of confidence fixed, increasing the sample size will lead to a wider confidence interval

False

The relative risk and the odds ratio will have very different values if the risk of disease under study is low.

False

A sample is used to obtain a 95% confidence interval for the mean of a population. The confidence interval goes from 15 to 19. True or False: If the same sample had been used to test the null hypothesis that the mean of the population is equal to 18 versus the alternative hypothesis that the mean of the population differs from 18, the null hypothesis could be rejected at a level of significance of 0.05.

False Response Feedback: 18 is in your confidence interval; you would not reject it

What does "unit free" mean as a feature of the coefficient of correlation?

For example, the correlation between weight and height remains the same regardless of whether height is expressed in inches, feet or millimeters

The Alternative Hypothesis, H1

Is the opposite of the null hypothesis.

Sampling methods and confidence intervals are routinely used for financial audits of large companies. According to Case Study 20.1 in the textbook, which of the following is an advantage of doing it this way, versus having a complete audit of all records?

It is much cheaper. A sample can be done more carefully than a complete audit. A well-designed sampling audit may yield a more accurate estimate than a less carefully carried out complete audit or census.

Which of the following is true about a p-value for a one-sided test?

It measures how likely you would be to observe results at or beyond your test statistic, assuming the null hypothesis is true.

Factors Affecting Power of Test: "power" is the ability to reject the null when it is false. What things affect this?

Less power when the actual population value is close to the hypothesized value. Higher power with a larger sample size: with large samples, you can detect even small differences. Suppose the true proportion of the population in poverty is 11.99 and the null hypothesized value is 12.00. It would be difficult to detect such a small difference, other things equal.

Our 95 percent confidence interval for average income was found to be: $39,368 and $40,632. Is $40,700 in the confidence interval?

No, so we can reject the null hypothesis since it falls outside confidence interval

Let's suppose the state of South Carolina pulls a sample of 1,000 households in the state Persistent Poverty Counties (PPCs) and conducts a survey of sample members. • Tabulations indicate mean income in the sample is $40,000. • The standard deviation of income in these counties is $10,000. • Construct a 95 percent confidence interval for mean income in the PPCs If someone asks you what is average income among all households in PPCs, can you know the answer?

No. With only a sample, you will never know the true value for the entire population BUT you can say something about the true value. You can say you are 95 percent certain the average income among all households in these counties is between: $39,368 and $40,632.

• Formula for confidence interval is always

Point estimate +/- (multiplier)*(standard error)

Suppose the bank manager wants to test the null hypothesis that mean balance in all checking accounts is greater than or equal to $3,600 against an alternative that mean balance is less than $3,600. The bank manager examines a sample of 31 checking accounts and finds the mean balance in the sample is $3,450. The standard deviation for balances is $500. Using a significance level of 0.05, what would you conclude? HINT: The critical value for a one-tailed (left tail) test at the 0.05 level is -1.645.

Reject the null hypothesis that mean balance in all accounts is greater than or equal to $3,600 Response Feedback: Null hypothesis is mean balance greater than or equal to $3,600. This means a one-tailed test. You need the critical value for the lower 0.05 of the left tail. What is that? -1.645. ZSTAT = (3450-3600)/(500/sqrt(31)) = -1.67. It is "beyond critical" (more negative than -1.645), so reject.
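
A minimal sketch of this left-tail test, using the critical value of -1.645 given in the question's hint (variable names are illustrative):

```python
import math

sample_mean, hypothesized_mean = 3450, 3600
std_dev, n = 500, 31
critical_value = -1.645  # left-tail critical value at the 0.05 level (from the hint)

standard_error = std_dev / math.sqrt(n)
z_stat = (sample_mean - hypothesized_mean) / standard_error

# Reject only when ZSTAT is more negative than the critical value
reject = z_stat < critical_value
print(round(z_stat, 2), reject)
```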

Baseline Risk - • The probability of getting Covid-19 one week after second placebo shot is: 0.008091072 • Risk with Treatment: • The probability of getting Covid-19 one week after second vaccine shot is: 0.000422258. Calculate Relative Risk

Relative Risk = 0.000422258/0.008091072 = 0.05 (rounded) • The risk for the vaccinated group is only 5 percent of the risk for the unvaccinated group. • This is the basis for the widely discussed statistic "vaccine is 95 percent effective" • Could think about it like this: if you got the vaccine, the risk (probability) of getting Covid-19 would be reduced 95 percent. Relative Risk Reduction is 95 percent.
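
The relative risk and relative risk reduction can be computed directly from the two probabilities quoted in the card. A short sketch:

```python
baseline_risk = 0.008091072    # risk one week after second placebo shot
treatment_risk = 0.000422258   # risk one week after second vaccine shot

relative_risk = treatment_risk / baseline_risk
relative_risk_reduction = 1 - relative_risk

print(round(relative_risk, 2))                 # rounded relative risk
print(round(relative_risk_reduction * 100))    # reduction, in percent
```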

Standard error of the proportion

Sqrt [ ( proportion * (1- proportion) ) / sample size ] • In hypothesis testing, use the hypothesized proportion

The Null Hypothesis, H0

States the claim or assertion to be tested.

Why is Power of Test Important?

Suppose there is new drug that cures a disease, but sample size is small •The null hypothesis would be the drug has no effect on the disease •With a small sample, the test will have little power and very unlikely to reject the null •In policy circles, the press, etc. the results would be interpreted as the drug has no effect and should not be brought to market.

Suppose the Senator read in a report that average household income in PPCs for the United States (as a whole) is $40,700. • He wants to know if average household income in PPCs in South Carolina is different than $40,700. • H0 : μ = $40,700 • H1 : μ ≠ $40,700 • Let's suppose the state of South Carolina pulls a sample of 1,000 households in the state PPCs and conducts a survey of sample members. • Tabulations indicate mean income in the sample is $40,000. • The standard deviation of income in these counties is $10,000. • From the earlier calculation, we found a test statistic ZSTAT of -2.21. • What is the probability of getting a z-score of -2.21 or lower? Use your lookup table: probability is 0.0136. How do we calculate the p-value in a two-tailed test?

The p-value, for a two-tail test, is defined to be two times this number: 2 x 0.0136 = 0.0272.

In which of the following situations can you construct a confidence interval for the population proportion with what is given?

The sample proportion and the margin of error. The sample proportion and the sample size. The number of individuals in the sample with the trait of interest, and the total sample size.

Suppose 40% of the adult population owns a cell phone. A sample of 2,500 adults is to be taken from the population. Which of the following statements are true?

There is 68 percent chance that between 0.39 and 0.41 of the people in the sample will own a cell phone

Suppose that test scores on a particular exam have a mean of 77 and standard deviation of 5. You plan to take a random sample of 100 and calculate a sample mean. Which of the following statements is true?

There is 95 percent chance the sample mean will be between 76 and 78 Response Feedback: Rule of sample means says standard deviation is 5/sqrt(100)= 0.5 And curve is normal with mean equal to population mean (77) From empirical rule if you want 95 percent coverage, take the mean +/- 2 standard deviations 77 +/- (2)(0.5) or 76 to 78

Suppose that test scores on a particular exam have a mean of 77 and standard deviation of 5. You plan to take a random sample of 100 scores and calculate a sample mean. Which of the following statements is true?

There is a 16 percent chance the mean will be below 76.5 Response Feedback: Rule of sample means says standard deviation is 5/sqrt(100) = 0.5 And that curve is normal with mean equal to population mean (77) Construct a Z-score for 76.5: (76.5-77)/0.5 = -1.00 Probability z less than -1.00 is about 0.16 or 16 percent
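
Both test-score statements (16 percent below 76.5, and 95 percent between 76 and 78) follow from the normal curve for sample means. A sketch assuming Python 3.8+ for `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

population_mean, population_sd, n = 77, 5, 100

# Rule of sample means: same mean, standard deviation shrinks by sqrt(n)
sd_of_sample_means = population_sd / math.sqrt(n)   # 0.5
sampling_curve = NormalDist(mu=population_mean, sigma=sd_of_sample_means)

p_below_76_5 = sampling_curve.cdf(76.5)
p_between_76_and_78 = sampling_curve.cdf(78) - sampling_curve.cdf(76)
print(round(p_below_76_5, 2), round(p_between_76_and_78, 2))
```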

Suppose 40% of the adult population owns a cell phone. A sample of 2,500 adults is to be taken from the population. Which of the following statements is true?

There is an 84 percent chance the proportion of people in the sample who own cell phones is less than 0.41. Response Feedback: From the rule of sample proportions, the standard deviation is sqrt(0.4*0.6/2500) or about 0.01. The rule also tells us the curve for sample proportions is normal with mean equal to the population proportion (0.40). Construct a Z-score for 0.41: (0.41-0.40)/0.01 = 1. From the lookup table for the standard normal, we see the probability is about 0.84 or 84 percent.

Suppose you think that the proportion of coupon users in grocery stores in your town has changed from 10 years ago but you do not have an initial impression as to whether it has increased or decreased. Which of the following is true?

This is a two-sided hypothesis test and the p-value is twice what it would be under a one-tail test

A statistically significant difference may not be of practical importance because the estimated magnitude of the difference is very small

True

Features of the Coefficient of Correlation

Unit free and Range between -1 and 1

Suppose a confidence interval for the difference in mean weight loss for two different weight loss programs (Program 1 - Program 2) is entirely above zero. What does this mean?

We can say with confidence that there is a difference in mean weight loss for the populations of people on these two programs; further, we can say that the average weight loss on Program 1 is higher.

What is a Type II error?

We do not reject the null hypothesis when it is false

What is a Type I error?

We reject the null hypothesis when it is true

Linear Regression: Fitting a Line to Scatter Plot

Y = a + bX • a = intercept - where the line crosses the vertical axis when X = 0. • b = slope - how much of an increase there is in Y when X increases by one unit. • Y is the dependent variable and X is the explanatory variable (or independent variable)

Suppose the Senator read in a report that average household income in PPCs for the United States (as a whole) is $40,700. • He wants to know if average household income in PPCs in South Carolina is different than $40,700. • H0 : μ = $40,700 • H1 : μ ≠ $40,700 • Let's suppose the state of South Carolina pulls a sample of 1,000 households in the state PPCs and conducts a survey of sample members. • Tabulations indicate mean income in the sample is $40,000. • The standard deviation of income in these counties is $10,000. • From the earlier calculation, we found a test statistic ZSTAT of -2.21. • What is the probability of getting a z-score of -2.21 or lower? Use your lookup table: probability is 0.0136. We know the two-tailed p-value is 0.0272. Do we reject the null hypothesis at the 0.05 level?

Yes. The p-value (0.0272) is less than 0.05: the p-value is low, so the null hypothesis must go.

Let's suppose we know the overall proportion of Americans in poverty is 0.12. • By way of context, a family of four that has total income less than about $28,000 would be poor according to the U.S. government's definition of poverty. • We wish to test the null hypothesis that the proportion of Native Americans in poverty is equal to 0.12 against the alternative that the proportion in poverty is not equal to 0.12. • The U.S. Census Bureau interviews a sample of 50 Native Americans and determines 12 are in poverty, so p = 0.24. How do we construct a test statistic to determine whether we should reject or not reject the null hypothesis?

You always need a standard error. Start there. • Standard error: sqrt(0.12*(1-0.12)/50) = 0.046 • For hypothesis tests, always use the hypothesized proportion (0.12 in this case) when determining standard error. • You can now construct your test statistic ZSTAT: • (sample proportion - hypothesized proportion) / standard error • (0.24-0.12)/0.046 = 2.61

Suppose the Senator read in a report that average household income in PPCs for the United States (as a whole) is $40,700. • He wants to know if average household income in PPCs in South Carolina is different than $40,700. • H0 : μ = $40,700 • H1 : μ ≠ $40,700 • Let's suppose the state of South Carolina pulls a sample of 1,000 households in the state PPCs and conducts a survey of sample members. • Tabulations indicate mean income in the sample is $40,000. • The standard deviation of income in these counties is $10,000. What is our test statistic? Do we Reject the Null hypothesis ?

ZSTAT =($40,000 - $40,700) / ($316.23) = -2.21, Yes we reject the null hypothesis because it is beyond critical point -2. We reject the null hypothesis that average income in these counties is $40,700 in favor of the alternative that average income is different than $40,700.
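
The South Carolina test, including the two-tailed p-value discussed in the related cards, can be sketched as below (Python 3.8+ assumed for `statistics.NormalDist`). Because the sketch does not round ZSTAT before the lookup, its p-value is about 0.027 rather than exactly the table-based 0.0272.

```python
import math
from statistics import NormalDist

sample_mean, hypothesized_mean = 40_000, 40_700
std_dev, n = 10_000, 1_000

standard_error = std_dev / math.sqrt(n)                       # about 316.23
z_stat = (sample_mean - hypothesized_mean) / standard_error

# Two-tailed p-value: double the area in the tail beyond |z|
p_two_tailed = 2 * NormalDist().cdf(-abs(z_stat))

# Guide's rule: reject if ZSTAT is beyond -2 or +2
reject = abs(z_stat) > 2
print(round(z_stat, 2), round(p_two_tailed, 3), reject)
```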

