The Ultimate Exam Study Guide Stat 226


According to history.com, Pope Gelasius declared February 14 St. Valentine's Day around 498 A.D., though the actual festival may have begun sometime in the 3rd century, during the life of the unidentified Valentine. Today, many marriage proposals happen on February 14. It turns out that the age that young women first married in 1892 follows a normal distribution with a mean of 19 years and a standard deviation of 4.1 years. What is the probability that a woman got married between the ages of 23 and 26 during that time? Round your answers to 4 decimal places.

.1199
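
The answer above can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_between` and the option to round z-scores to two decimals (to mimic a printed Table A lookup) are assumptions made for illustration, not part of the original solution.

```python
from statistics import NormalDist

def prob_between(lo, hi, mean, sd, table_rounding=True):
    """P(lo < X < hi) for X ~ N(mean, sd)."""
    z_lo = (lo - mean) / sd
    z_hi = (hi - mean) / sd
    if table_rounding:
        # Table A lists z to two decimals, so round before the lookup.
        z_lo, z_hi = round(z_lo, 2), round(z_hi, 2)
    std = NormalDist()  # standard normal, mean 0 and sd 1
    return std.cdf(z_hi) - std.cdf(z_lo)

# Age at first marriage: mean 19 years, sd 4.1 years.
p_table = prob_between(23, 26, mean=19, sd=4.1)
p_exact = prob_between(23, 26, mean=19, sd=4.1, table_rounding=False)
print(round(p_table, 4), round(p_exact, 4))
```

With two-decimal z-scores the result agrees with the tabled answer; the unrounded calculation differs slightly in the third decimal place, which is expected when working from a printed table.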

For an efficiency study, the library is polling students at ISU to determine the mean number of students at ISU who study at the library at least once a week. The sample is

All students we poll

For a Student t-distribution with 100 degrees of freedom we have that P(t>5) is (choose the correct answer from those provided below):

Approximately 0

Using Table A (i.e. the standard normal table), find the value of z that makes the following probability true: P(|Z|>z)=0.005. Report your answer to two decimal places.

2.81 There are several ways of reaching the answer. All of them exploit the symmetry and sum-to-1 properties of the normal distribution. We suggest you draw a curve and shade the areas corresponding to the probability required here, as a way to reach the answer without making errors. One possible way to get the answer is to note that P(|Z|>z)=P(Z>z)+P(Z<−z)=2P(Z<−z)=0.005. Thus, the lower tail of the distribution is half of this value, P(Z<−z)=0.0025. Using Table A, we obtain −2.81 as the z-value corresponding to this probability. Please note that the z-value in the question HAS to be positive (a negative value would give us a probability of 1). Therefore, the correct answer is ONLY z=2.81.
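
The symmetry argument can be sketched in code. The function name `two_tailed_critical_z` is hypothetical; it uses `NormalDist.inv_cdf` from the standard library in place of a reverse lookup in Table A.

```python
from statistics import NormalDist

def two_tailed_critical_z(tail_total):
    """Return z > 0 such that P(|Z| > z) = tail_total."""
    # Symmetry: each tail holds half of the total probability.
    lower_tail = tail_total / 2
    # inv_cdf returns the negative cutoff; flip the sign for the positive z.
    return -NormalDist().inv_cdf(lower_tail)

z = two_tailed_critical_z(0.005)
print(round(z, 2))
```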

Lightbulbs. A company produces lightbulbs. We know that the lifetimes (in hours) of lightbulbs follow a bell-shaped (symmetric and unimodal) distribution with a mean of 7,207 hours and a standard deviation of 641 hours. Use the Empirical Rule (68-95-99.7 rule) to answer the following question: The shortest lived 2.5% of the lightbulbs burn out before how many hours? Report your answer as a whole number. For example, if your answer is 8550.60 hours, report 8551.

5925
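
As a quick check of the arithmetic: the middle 95% lies within 2 standard deviations of the mean, which leaves 2.5% in each tail. The helper name `empirical_rule_interval` below is an assumption for illustration.

```python
def empirical_rule_interval(mean, sd, k):
    """Interval mean ± k*sd; k = 1, 2, 3 covers about 68%, 95%, 99.7%."""
    return mean - k * sd, mean + k * sd

# Lightbulb lifetimes: mean 7,207 hours, sd 641 hours.
lower, upper = empirical_rule_interval(7207, 641, k=2)
print(lower)  # cutoff below which the shortest-lived 2.5% fall
```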

A questionnaire concerning satisfaction with the Financial Aid office on campus was mailed to 50 students on a university campus. The 50 students in this survey are an example of a

Sample

Consider two samples in the same town. Both samples are the same size (50). Sample 1 consists of a set of kindergarten students. Sample 2 consists of members of a small church. Consider the variable age. Which sample would most likely have the largest sample standard deviation?

Sample 2 would more likely have a higher standard deviation than Sample 1.

The respondents who indicated support for stricter gun control in a recent study were asked to identify their top choice for the kind of restrictions on firearms that should be imposed. The pie chart below shows the percentage of respondents selecting each of the four choices. Using the information in this chart, the median for variable Top Choice is (choose one):

Cannot be calculated (there is a tie, with values at 60, 15, 15, 10; 15 is not the answer, as the top choice is not a number but an action [ban online sales, ban mentally ill, etc.])

When estimating the mean of a distribution, larger samples are required for larger populations.

False The sample mean is an unbiased estimator of the population mean. For any sample size, we can get an estimate of the population mean.

The best way to get a representative sample is to handpick the sample carefully.

False The best way to get a representative sample is to get a random sample.

For two observations from a normal distribution, it holds true that the observation with the smaller z-score is also the observation closer to the mean.

False The one with the smaller absolute value is closer to the mean. Closer to the mean refers to distance from mean which can be viewed in terms of how many standard deviations away we are.

Residuals should never be negative.

False Residual = Observed Value − Predicted Value

University of Louisville researchers examined the process of filling plastic pouches of dry blended biscuit mix (Quality Engineering, Vol. 91, 1996). The current fill mean of the process is set at μ=407 grams. Operators monitor the process by randomly sampling 36 pouches each day and measuring the amount of biscuit mix in each. Suppose that on one particular day, two operators, Finny and Quinn, observe a sample mean of 400 grams and a standard deviation of 10.1 grams. Operator Finny believes that this indicates that the true process fill mean μ is off-target, i.e. the true process mean is actually different from 407 grams. Operator Quinn argues that μ=407, and the small value of x̄ observed is due to random variation in the fill process. Which operator do you agree with? (Hint: think of this in the context of hypothesis testing).

Finny
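
One way to see why the data favor Finny is to compute the one-sample test statistic. The sketch below is illustrative only: the function name `one_sample_t` is hypothetical, and the p-value uses a standard normal approximation because the Python standard library has no t-distribution (the exact value would use 35 degrees of freedom).

```python
from math import sqrt
from statistics import NormalDist

def one_sample_t(xbar, mu0, s, n):
    """Test statistic t = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

t = one_sample_t(xbar=400, mu0=407, s=10.1, n=36)

# Rough two-sided p-value from the standard normal (an approximation).
p_approx = 2 * NormalDist().cdf(-abs(t))
print(round(t, 2), p_approx < 0.05)
```

The sample mean sits more than four standard errors below the target, which is hard to attribute to random variation alone.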

Suppose you were to collect data for grip strength of adults as well as their age. You want to make a scatterplot but need to determine what should be the proper x and y axes. The results of the study found that younger adults tended to have slightly higher grip strength. Which variable would you use as the response variable? Furthermore, would you expect a positive, negative, or no linear association between the two variables?

Grip Strength Slight Negative

Ames Housing. Data Science is everywhere! Real estate agents understand and appreciate the importance of the data in learning about the housing market. In this problem, we will provide summaries of the data from the Ames Housing market and ask you several questions that would inform realtors about the state of the housing market. The variable of interest here is the sale price of houses in Ames (reported in thousands US$), between 2015 and 2019. Comparing the mean with the median in the previous part, what can you say about the shape of the distribution?

The distribution is right-skewed because the mean is larger than the median

Hot Dogs. Since their introduction in 1893, hot dogs have been hugely popular because they are easy to eat, convenient, and inexpensive. Two graduate students in the Food Sciences Department at ISU have decided to learn more about the calorie content of hot dogs for their thesis project. They collected data on the calorie content of 118 brands of hot dogs across the United States. The distribution and summary statistics are given below. What numerical summaries best describe the center and variability of these data?

The median and IQR because the distribution is left skewed

In order to rate TV shows, phone surveys are sometimes used. Such a survey might record several variables, some of which are listed below. Which of these variables are categorical? (Choose all that apply.)

The name of the show if any being watched

A description of different houses on the market includes the following three variables. Which of these variables is quantitative? Choose one answer from those provided below.

The square footage of the house The monthly gas bill The monthly electric bill *All of the above

Managers at Story Time Bookstore monitor their book sales each year. Data were collected for the past year on the genre of each book sold and the price for each book sold. Also considered was the day each book was sold. The managers would also like to know if their total book sales stay steady throughout the year. They should examine a:

Time series plot

If the standard error increases, the p-value for a test of hypotheses for the population mean increases.

True

If we have a dataset with n = 90 values that are all equal to -5 then the standard deviation, IQR, and range of this dataset are all equal to zero.

True
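
A small check of this claim, using only the standard library. The dictionary name `spread` is an assumption for illustration; `statistics.quantiles` uses its default quartile convention, which does not matter here because every value is identical.

```python
from statistics import stdev, quantiles

data = [-5] * 90  # n = 90 values, all equal to -5

q1, _, q3 = quantiles(data, n=4)  # the three quartile cut points
spread = {
    "sd": stdev(data),                # sample standard deviation
    "iqr": q3 - q1,                   # interquartile range
    "range": max(data) - min(data),   # max minus min
}
print(spread)
```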

The assumption of independence of observations is required for the correct conclusions of a statistical test of hypotheses.

True

The test statistic for a test of hypotheses is unitless.

True

When we increase the sample size, the p-value generally decreases.

True

For a standard normal variable, find the P(Z≥−6). Report your answer as a probability to 4 decimal places.

Larger than .9997

One day Betty bought a box of chocolate chip cookies from her local grocery store. A lover of chocolate, she is curious if the cookies she bought have enough chocolate chips per cookie. We are told that the number of chocolate chips per cookie follows a bell-shaped distribution (symmetric and unimodal) with a mean of 26 chips and a standard deviation of 7 chips. Using the 68-95-99.7 rule, the cookies corresponding to the middle 95% of all cookies in terms of their chocolate chip count have how many chocolate chips?

Lower Bound 12 Upper Bound 40

Consider a sample with data values of 27, 25, 20, 5, 30, 34, 28, 25. Find the 5-number summary.

Min- 5 Q1- 22.5 Median- 26 Q3- 29 Max- 34
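
The summary above can be reproduced with a short function. This is a sketch under a stated assumption: quartiles are found by the median-of-halves rule, with the overall median excluded from both halves when n is odd. Software packages may use other quartile conventions and return slightly different Q1 and Q3 values. The name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values. For odd n, the overall median is left
    out of both the lower and the upper half.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return min(data), median(lower), median(data), median(upper), max(data)

print(five_number_summary([27, 25, 20, 5, 30, 34, 28, 25]))
```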

Treasury bills (T-bills) are short-term borrowing instruments for the U.S. government. T-bills are largely regarded as risk-free investments (unless the U.S. government defaults). Below is information concerning the annualized rate of return for 6 months Treasury bills from 1958 to 2007.

Modality: Unimodal Shape: Right Skewed Presence of outliers: Yes

Which of the following are examples of people mistaking correlation for causation? Choose all correct answers.

More books at home make you better at reading; wearing a lucky shirt to the game helps the team win; drinking Diet Coke leads to weight gain

Data regarding quarterly sales (in thousands of Dollars) were collected for a random sample of n=10 Armand's restaurants located near college campuses. The size of the student population(in thousands) was thought to be related to the sales of the Pizza Franchises. Would you use this model to predict Armand's quarterly sales in a town with a student population of 50,000 students?

No

Data were collected for 100 Stat 226 students on what their current Facebook relationship status was. Below is a graphical summary of these data. Would a 5-number summary be appropriate to use for these data?

No

A catalog sales company promises to deliver orders placed on the Internet within 3 days. Follow-up calls to randomly selected customers show that a 95% confidence interval for the mean delivery time is 2.8 +/- 1.1 days. Which, if any, of the following interpretations of this interval are correct? Select ALL that apply.

We are 95% confident that the true mean delivery time of all orders from this seller is between 1.7 and 3.9 days. The procedure that produced this interval generates ranges that hold the true mean delivery time for all orders from this company for 95% of the samples.

For a normal distribution, the mean can be any real number

true

Consider the five scatterplots that are shown below: Select the scatterplot that shows the strongest linear relationship between the X and Y variables (Select all that apply).

a b The observed points in the scatterplot with strongest linear association will almost fall on a perfect line. The scatterplot with strongest linear association may show positive or negative association.

A test in which the alternative hypothesis allows any value of a parameter larger than a specified value is...

a one sided test

For a Student t-distribution with 10 degrees of freedom we have that P(t<5) is (choose the correct answer from those provided below):

approximately 1

The alternative hypothesis (Ha)...

contradicts the null hypothesis

A class of 30 introductory statistics students took a 15 item quiz, with each item worth 1 point. The standard deviation for the resulting score distribution is 0. You know that:

everyone correctly answered the same number of items

To preserve confidentiality, the actual profits of all 97 Fortune 500 companies have not been published, but the five-number summary of profits is available. The five-number summary is tabulated and displayed graphically below, with units in millions of dollars. What is the label for the MEDIAN in the plot in Figure 1?

i

Suppose you have a data set with 9 distinct observations. If the largest number in the data set increases the range will

increase

Tina's score on her midterm exam was at the 50th percentile. The scores were Normally distributed. The exam average was 60.89 and the standard deviation was 7.35. What was Tina's score on the exam? Report your answer as a number, with two decimal places.

60.89

Iowa Winery Sales. A local winery near Ames is considering expanding its distribution network to include several grocery chains. To make a convincing sales pitch, they would like to summarize data they have on sales volume for recent years. They decide to consult with an ISU student who has taken Stat 226 on how to best present and analyze their data. The winery owner knows that time of year plays a role in wine sales and she asks the student to focus only on weekly wine sales for the Summer months (i.e. April through September). Below is a histogram of weekly summer wine sales during the years 2012-2016. Use this graph to answer the following question. What measure of center would be best to use for the distribution of Summer Wine Sales?

Median

Ames Housing. Data Science is everywhere! Real estate agents understand and appreciate the importance of the data in learning about the housing market. In this problem, we will provide summaries of the data from the Ames Housing market and ask you several questions that would inform realtors about the state of the housing market. The variable of interest here is the sale price of houses in Ames (reported in thousands US$), between 2015 and 2019. Which summary statistics are the most appropriate to report the center and spread of the data?

Median and IQR

Treasury bills (T-bills) are short-term borrowing instruments for the U.S. government. T-bills are largely regarded as risk-free investments (unless the U.S. government defaults). Below is information concerning the annualized rate of return for 6 months Treasury bills from 1958 to 2007. Find and report the 5-number summary for the above data:

Min- 1.050 Q1- 3.645 Median- 5.130 Q3- 6.983 Max- 13.810

McWait Times. One reason for the renovation of the McDonald's in West Ames was to make the kitchen more efficient. The restaurant's General Manager would like to know the maximum amount of time her customers have to wait to get their order after the renovation. A sample of n= 13 customers of the West Ames McDonald's was sampled and their wait times are given below, rounded to the nearest minute. 1, 0, 3, 12, 2, 1, 3, 4, 2, 3, 9, 1, 8 Find and report the 5-number summary for the above data:

Min- 0 Q1- 1 Median- 3 Q3- 6 Max- 12

In a group of ten students attending a party, the average age is 21.5 years. A 27-year-old student joins the party. Can we calculate the new median age of the 11 students?

No

Iowa Winery Sales. A local winery near Ames is considering expanding its distribution network to include several grocery chains. To make a convincing sales pitch, they would like to summarize data they have on sales volume for recent years. They decide to consult with an ISU student who has taken Stat 226 on how to best present and analyze their data. The winery owner knows that time of year plays a role in wine sales and she asks the student to focus only on weekly wine sales for the Summer months (i.e. April through September). Below is a histogram of weekly summer wine sales during the years 2012-2016. Use this graph to answer the following question. Assume that the weekly summer wine sales at the local winery vary with a mean of μ = 2600 dollars and standard deviation σ= 1300 dollars. The student is actually going to attend a wedding at the winery in August, and he is wondering what is the probability that the sales for that week will be more than 2500 dollars. Can he calculate this probability?

No

Baseball Game Length. Baseball has long been America's pastime, but the appeal of the game seems to be fading away. One of the reasons is that the length of games seems to be growing over time. In 2017, the average length of Major League Baseball games was 185 minutes with a standard deviation of 24 minutes. The distribution of length of Major League Baseball games is known to be skewed. Can you find the probability that a randomly chosen game had a length of less than 3 hours (180 minutes)?

No Before calculating the probability, it is necessary to check the distribution shape (Normal or not). Table A can only be used to calculate the probability of a Normal distribution.

Unlike the standard deviation or the mean, the correlation is not influenced by outliers.

False

Quality assurance. In the food industry, quality assurance is an important business practice. One manufacturer produces packages of potato chips that are advertised to contain 16 ounces of product. The actual weight of contents in a package has a mean of 16.5 ounces and a population standard deviation of 0.6 ounces. The mean amount filled, μ, is set high so that under-filling will not occur frequently. Find the probability that the mean weight for a sample of 35 packages is between 16.25 and 16.5 ounces.

.4932 Note: we do not know anything about the distribution we are sampling from, so we need to check the condition n≥30 to use the CLT. Sampling distribution of the sample mean by the CLT: X̄∼approx N(μ,σ/√n). Calculate the standard error: SE=σ/√n=0.6/√35≈0.1014. We want P(16.25<X̄<16.5)=P(X̄<16.5)−P(X̄<16.25). Find the z-scores: upper: (x̄−μ)/SE=(16.5−16.5)/0.1014=0; lower: (x̄−μ)/SE=(16.25−16.5)/0.1014=−2.47. From Table A: P(Z<0)−P(Z<−2.47)=0.5−0.0068=0.4932.
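
The same steps (standard error, then a difference of two cumulative probabilities) can be sketched with the standard library. The function name `sample_mean_prob_between` is an assumption for illustration, and the code treats the sample mean as exactly normal, which the CLT only justifies approximately for n≥30.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_prob_between(lo, hi, mu, sigma, n):
    """P(lo < X-bar < hi), treating X-bar as N(mu, sigma / sqrt(n))."""
    se = sigma / sqrt(n)       # standard error of the sample mean
    xbar = NormalDist(mu, se)  # approximate sampling distribution
    return xbar.cdf(hi) - xbar.cdf(lo)

# Potato chip packages: mu = 16.5 oz, sigma = 0.6 oz, n = 35.
p = sample_mean_prob_between(16.25, 16.5, mu=16.5, sigma=0.6, n=35)
print(p)
```

Because the code does not round the z-score to two decimals, the result may differ from the Table A answer in the fourth decimal place.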

For a standard normal variable, find the P(Z≥3.08). Report your answer as a probability to 4 decimal places.

0.0010 P(Z ≥ 3.08) = 1 − P(Z ≤ 3.08) = 1 − 0.9990 = 0.0010

Gallup Poll: Atheism in U.S. Elections. A recent Gallup Poll tested American's willingness to vote for candidates who do not fit the traditional mold. The survey is based on telephone interviews conducted between January 16 and 29, 2020. 1033 interviews were conducted with respondents (eligible to vote) on landline telephones and cellular phones. Landline respondents are chosen at random within each household. The primary poll question that is of interest here is the following: "Between now and the 2020 political conventions, there will be discussion about the qualifications of presidential candidates - their education, age, religion, race and so on. If your party nominated a generally well-qualified person for president who happened to be [characteristic], would you vote for that person?" The results for this question are shown in the chart below. Please note that each respondent was allowed to choose only one party as their political affiliation, and remember that only U.S. citizens 18 and older are eligible to vote. Use this chart and the description of the Gallup poll to answer the following question. Select one example of a statistic.

41% (i.e., the proportion of the respondents who identify as Republican and would vote for an atheist candidate)

The weekly salary paid to each employee of a small company is normally distributed with a mean of $813 and a standard deviation of $108. This small company has 36 employees. What is the average weekly salary for all 36 employees such that the probability of being above this weekly average salary is only 10%? (Report your answer rounding to 2 decimal places.)

836.04 X̄∼N(μ,σ/√n), so the cutoff is x̄=μ+z·σ/√n. P(Z>z)=0.1 implies that z=1.28, so x̄=813+1.28×108/√36=813+1.28×18=836.04.
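
The cutoff can also be found by inverting the sampling distribution directly. This is a minimal sketch; the function name `sample_mean_cutoff` is hypothetical, and `NormalDist.inv_cdf` replaces the reverse Table A lookup.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_cutoff(upper_tail, mu, sigma, n):
    """Value c with P(X-bar > c) = upper_tail, for X-bar ~ N(mu, sigma/sqrt(n))."""
    xbar = NormalDist(mu, sigma / sqrt(n))
    # An upper-tail area of upper_tail equals a lower-tail area of 1 - upper_tail.
    return xbar.inv_cdf(1 - upper_tail)

c = sample_mean_cutoff(0.10, mu=813, sigma=108, n=36)
print(round(c, 2))
```

The unrounded z-value is about 1.2816 rather than the tabled 1.28, so the computed cutoff is a few cents above 836.04.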

Least-squares regression minimizes the sum of the squares of the horizontal distances of points from the regression line.

False Least-squares regression minimizes the sum of the squared residuals. Consider the direction of the residuals.

For two random variables X and Y such that X∼N(8,σ=4) and Y∼N(250,σ=2), the interquartile range (IQR) for the X distribution is SMALLER than the IQR for the Y distribution.

False

A p-value for a two-sided test of hypotheses is always double the p-value for a one-sided test.

False

A p-value is only meaningful for a two-sided test of hypotheses.

False

A scatterplot shows the relationship between two qualitative variables.

False

Consider the following situations for a random variable X. (i) The population of X follows a normal distribution and we have a sample of size n= 14. (ii) The population of X follows a normal distribution and we have a sample of size n= 45. (iii) The population of X is not normally distributed and we have a sample of size n= 17. (iv) The population of X is not normally distributed and we have a sample of size n= 50. For which of the scenarios (i) -(iv) can you use the Normal Distribution Table to find probabilities for the random variable X. Select all that apply.

(i) and (ii)

The standard deviation of the heights of a previous 226 class was 26 inches. John was a lot taller than the rest of the class. When he was removed from the study, the standard deviation remained the same

Incorrect

Gallup Poll: Age and Willingness to elect an atheist president. To gain more insight into the demographics and political affiliations of American voters who have said they would vote for a qualified atheist presidential candidate, the designers of a Gallup Poll recorded additional information for each respondent, such as their age (reported as Age Group) and political affiliation (reported as Party). Using the charts below, answer the following question: What is the mode of the political party affiliation of the respondents who are willing to vote for a qualified atheist presidential candidate?

Independent

A college student wishes to study if herbal tea can improve the health of nursing home patients. She makes weekly visits to a local nursing home, visiting and talking with the residents, and serving them herbal tea. After six months, the residents drinking tea on more occasions were observed to have fewer days of ill and cheerful attitudes. Identify the explanatory variable from the choices given above. Also, indicate what your findings would be for the correlation.

Explanatory variable: number of weeks drinking herbal tea. Correlation: negative. When considering the explanatory variable, consider which variable may have an impact on the second variable (the response). In this case: would you expect drinking more tea to impact health, or health to impact drinking tea? For the correlation, consider the statement in the problem: "After six months, the residents drinking tea on more occasions were observed to have fewer days of ill and cheerful attitudes."

Gallup Poll: Atheism in U.S. Elections. A recent Gallup Poll tested American's willingness to vote for candidates who do not fit the traditional mold. The survey is based on telephone interviews conducted between January 16 and 29, 2020. 1033 interviews were conducted with respondents (eligible to vote) on landline telephones and cellular phones. Landline respondents are chosen at random within each household. The primary poll question that is of interest here is the following: "Between now and the 2020 political conventions, there will be discussion about the qualifications of presidential candidates - their education, age, religion, race and so on. If your party nominated a generally well-qualified person for president who happened to be [characteristic], would you vote for that person?" The results for this question are shown in the chart below. Please note that each respondent was allowed to choose only one party as their political affiliation, and remember that only U.S. citizens 18 or older are eligible to vote. Use this chart and the description of the Gallup poll to answer the following question. Which would be the correct conclusion of this study? Please CHOOSE the correct answer to complete the sentence (only one answer is correct): "There is some evidence that..."

...the Democrats express somewhat more willingness than Republicans to support most of the candidate types tested, with the widest gaps seen for Muslims, atheists and socialists.

Iowa Winery Sales. A local winery near Ames is considering expanding its distribution network to include several grocery chains. To make a convincing sales pitch, they would like to summarize data they have on sales volume for recent years. They decide to consult with an ISU student who has taken Stat 226 on how to best present and analyze their data. The winery owner knows that time of year plays a role in wine sales and she asks the student to focus only on weekly wine sales for the Summer months (i.e. April through September). Below is a histogram of weekly summer wine sales during the years 2012-2016. Use this graph to answer the following question. Assume that the weekly summer wine sales at the local winery vary with a mean of μ = 2600 dollars and standard deviation σ= 1300 dollars. The student draws a simple random sample of 34 summer weeks and computes the sample mean. Find the probability that the mean weekly summer sales is less than 2000 dollars.

.0036 In this case, we want P(X̄<2000) and our sample size is n=34. Here n is large enough to apply the Central Limit Theorem, so that X̄∼approx N(μ,σ/√n). Sampling distribution of the sample mean: N(2600,σ/√n). Calculate the standard error: SE=σ/√n=1300/√34=222.948. We want P(x̄<2000). Find the z-score: (x̄−μ)/SE=(2000−2600)/222.948=−2.69. From Table A: P(Z<−2.69)=0.0036.

Tuition. Fighting increased attendance costs, Iowa State learns they must increase student tuition. Interim President Benjamin Allen decides to consult the Statistics Department in regard to this controversial issue. A professor in the department concludes that it may be best to look at the yearly cost of tuition for all Universities in the state of Iowa. After further research, it is discovered that the yearly cost of tuition for all Universities in Iowa follows a normal distribution with mean $13,960 and a standard deviation of $3,250. Using this information, answer the following question. What is the probability that a University in Iowa charges less than $7,250 for yearly tuition? Report answer to 4 decimal places.

.0197

The Robertson family of Duck Dynasty fame would like to start selling camouflage sleeping bags. They assume the amount people will pay for their sleeping bags is normally distributed with a mean of $250 and a standard deviation of $40. What is the proportion of customers who would pay more than $310? Report answer as a proportion (NOT a percentage) to 4 decimal places.

.0668 Plugging x=310 and the given mean and standard deviation into the z-score formula gives z=(310−250)/40=1.5. From Table A, P(Z<1.5)=0.9332, so P(Z>1.5)=1−P(Z<1.5)=1−0.9332=0.0668.

The weekly salary paid to each employee of a small company is normally distributed with a mean of $800 and a standard deviation of $100. This small company has 36 employees. What is the probability the average weekly salary of all 36 employees is between $775 and $735?

.0668 In this case, we want P(735<X̄<775), and we know X̄∼N(μ,σ/√n). Calculate the standard error: SE=σ/√n=100/√36=16.667. We want P(735<x̄<775)=P(x̄<775)−P(x̄<735). Find the z-scores: upper: (x̄−μ)/SE=(775−800)/16.667=−1.50; lower: (x̄−μ)/SE=(735−800)/16.667=−3.90. From Table A: P(Z<−1.50)−P(Z<−3.90)=0.0668−0=0.0668 (note that a z-score of −3.90 goes beyond the lower tail of the table, so the probability is very close to 0).

Game of Thrones. Last July, a new episode of the hit HBO series "Game of Thrones" (GOT) was released every Sunday at 8 PM CST for 7 weeks. Many fans of the show started streaming the new episodes soon after 8 PM through the streaming service HBO Now or HBO Go. However, other fans waited longer to start watching the new episodes for various reasons, such as having to wait for their kids to go to bed or their parents to stop hogging the TV with reruns of The Golden Girls. Overall, pretend that the distribution of the minutes that a typical GOT fan waits (after 8 PM) to stream a new episode is normally distributed with mean μ = 30.8 minutes and standard deviation σ = 20.4 minutes. Suppose HBO could provide us with the times it took 100 randomly sampled GOT watchers to start watching the 6th episode of last season: "Beyond the Wall." Find the probability that the mean time (in minutes) it takes for the 100 fans to start watching the 6th episode is less than 28 minutes. (Leave answer as a proportion NOT a percent and report the value to 4 decimal places.)

.0853 In this case, we want P(X̄<28), and we know X̄∼N(μ,σ/√n). Calculate the standard error: SE=σ/√n=20.4/√100=2.04. We want P(X̄<28). Find the z-score: (x̄−μ)/SE=(28−30.8)/2.04=−1.37. From Table A: P(Z<−1.37)=0.0853.

According to Business Week (Special Annual Issue, Spring 2003) the average stock price for companies making up the S&P500 is $30, and the standard deviation is $8.20. Assume that stock prices are normally distributed. What is the probability a company will have a stock price of at least $40? Report answer to 4 decimal places.

.1112 Plugging x=40 and the given mean and standard deviation into the z-score formula gives z=(40−30)/8.20=1.22. From Table A, P(Z<1.22)=0.8888, so P(Z>1.22)=1−P(Z<1.22)=1−0.8888=0.1112.

According to the IRS, income tax returns 5 years ago averaged $1,332 in refunds for the taxpayer. Suppose the refund amount is normally distributed with a mean refund of $1,332 and a standard deviation of $725. What proportion of the tax returns shows a refund between $100 and $700? Leave answer as a proportion NOT a percent and report the value to 4 decimal places.

.1476 The z-score of 100 is (100−1332)/725=−1.70; the z-score of 700 is (700−1332)/725=−0.87; P(100 < X < 700) = P(−1.70 < Z < −0.87) = P(Z < −0.87) − P(Z < −1.70) = 0.1922 − 0.0446 = 0.1476

The distribution of heights of students in Stat 226 is bell-shaped (symmetric and unimodal) with a mean of 69.5 inches and a standard deviation of 5.6 inches. Use the 68-95-99.7 rule to answer the following question: What is the proportion of Stat 226 students who are shorter than 63.9 inches? (Report value as a proportion NOT a percent and round to 2 decimal places.)

.16 Following the Empirical rule, the answer is 16% or 0.16, corresponding to the area in the lower tail below the value that sits 1 standard deviation to the left of the mean (69.5−5.6=63.9): (100%−68%)/2=16%. Answers using the normal distribution (and thus Table A) are not correct, since the population is described only as bell-shaped, not as normally distributed. Answers obtained using this approach did not receive credit.

Weekly TV Habits. A study is being conducted regarding the number of hours spent by STAT 226 students watching TV. The purpose of this study is to determine whether STAT226 students are spending too much time outside of class watching TV. Several Graduate students conducted a survey and found that STAT 226 students' weekly time spent watching TV was normally distributed with a mean of 21 hours and a standard deviation of 2.5 hours. Using this information, answer the following question. What is the probability that a STAT 226 student spends less than 19 hours a week watching TV? Report answer to 4 decimal places.

.2119

For a standard normal variable, find the P(Z≥0.6). Report your answer as a probability to 4 decimal places.

.2743 Method 1: P(Z ≥ 0.6) = P(Z ≤ -0.6) = 0.2743 Method 2: P(Z ≥ 0.6) = 1-P(Z ≤ 0.6) = 1-0.7257 = 0.2743
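
Both methods can be confirmed numerically. A minimal sketch with the standard library; the variable names `method_1` and `method_2` are chosen here only to mirror the two approaches above.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal

method_1 = std.cdf(-0.6)     # symmetry: P(Z >= 0.6) = P(Z <= -0.6)
method_2 = 1 - std.cdf(0.6)  # complement: 1 - P(Z <= 0.6)
print(round(method_1, 4), round(method_2, 4))
```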

How does the cost of a movie depend on its length? The scatterplot below provides information on a random sample of 120 films. Data were collected on the cost (millions of dollars) and the running time (minutes) for each film. The analysts fit a simple linear regression of cost vs. length and obtained a coefficient of determination (R²) of 0.101. Using all the information provided in this problem, report the value of the correlation coefficient up to four decimal places.

.3178 Positive association: r = √(R²). Negative association: r = −√(R²).
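The sign rule can be written as a small helper. This is only a sketch: `correlation_from_r2` is a hypothetical name, and because the scatterplot is not reproduced here, the positive slope passed in below is an assumption chosen to match the positive answer on the card.

```python
import math

def correlation_from_r2(r_squared, slope):
    """Return r = sign(slope) * sqrt(R^2) for a simple linear regression."""
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)

# Movie cost card: R^2 = 0.101; the slope value 1.0 is a placeholder for "positive".
r = correlation_from_r2(0.101, slope=1.0)
print(round(r, 4))
```

Only the sign of the slope matters here, so any positive number gives the same r.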

For a standard normal variable, find P(0.3≤Z≤2.2). Report your answer as a probability to 4 decimal places.

.3682 P(0.3 ≤ Z ≤ 2.2) = P(Z ≤ 2.2) − P(Z ≤ 0.3) = .9861 − .6179 = .3682

Every second of our lives we are using oxygen to keep our bodies alive, even though we don't think about it very often. When we exercise, we need more oxygen than usual. After consulting with a statistician, the YMCA designed a survey to see how oxygen uptake is related to pulse (heartbeat) while running. The data for thirty-one individuals are displayed in a scatterplot below, along with some summary statistics corresponding to the simple linear regression model fit to these data. Use this information to calculate the linear correlation coefficient between these two variables. Report your answer to TWO decimal places.

.40 Look at the prediction line (e.g. in the JMP output) and get the direction from the sign of the slope b1. b1 > 0: a positive slope indicates a positive relationship (r > 0), so r = √(R²). b1 < 0: a negative slope indicates a negative relationship (r < 0), so r = −√(R²).

The weekly salary paid to each employee of a small company is normally distributed with a mean of $800 and a standard deviation of $100. What is the probability that an employee of the small company will make between $775 and $925 in one week? Round your answers to 4 decimal places.

.4931 z-score of 775 is (775-800)/100=-0.25; z-score of 925 is (925-800)/100=1.25; P(775 < X < 925) = P(-0.25 < Z < 1.25) = P(Z < 1.25) - P(Z < -0.25) = 0.8944 - 0.4013 = 0.4931

One day Betty bought a box of chocolate chip cookies from her local grocery store. A lover of chocolate, she is curious if the cookies she bought have enough chocolate chips per cookie. We are told that the number of chocolate chips per cookie are normally distributed with a mean of 26 chips and a standard deviation of 7 chips. Betty's first cookie had 21 chocolate chips. Using table A, what proportion of cookies will have 21 chocolate chips or more? Report answer as a proportion (NOT a percentage) to 4 decimal places.

.7611

Assume that the distribution of 5k running times for women who run on a regular basis follows a normal distribution with a mean of 31.2 min and a standard deviation of 2.38 min. What proportion of runners run a 5k race in 28 minutes or more? Report answer as a proportion (NOT a percentage) to 4 decimal places.

.9099

Since the Federal Reserve Rate is strongly tied to the T-bill rates, we may be able to predict the T-bill rate from the Federal Fund rate. Use the JMP output below to calculate the linear correlation coefficient between these two variables. Report your answer to TWO decimal places.

.9885 Look at the prediction line (e.g. in the JMP output) and get the direction from the sign of the slope b1. b1 > 0: a positive slope indicates a positive relationship (r > 0), so r = √(R²). b1 < 0: a negative slope indicates a negative relationship (r < 0), so r = −√(R²).

The figure below contains four scatterplots. Match each scatterplot to the correlation r below that best describes it.

1) 1 2) 0 3) .91 4) -.33 1. When x and y show a positive, linear relationship in the scatterplot ⇒ r will be positive. When x and y show a negative, linear relationship in the scatterplot ⇒ r will be negative. 2. When r=1, we have a perfect, positive linear relationship. When r=−1, we have a perfect, negative linear relationship. 3. Categorizing the strength: considered weak: 0<|r|≤0.3; considered moderate: 0.3<|r|≤0.7; considered strong: 0.7<|r|≤1. 4. Of note in this problem is that, even though the relationship is very tight in Plot 2, the curved pattern makes the correlation coefficient meaningless. r=0 implies that x and y are not linearly related, but x and y could have a perfect non-linear relationship.

Using Table A (i.e. the standard normal table) find the value of z that makes the following probability true: P(|Z|>z)=0.242 Report your answer to two decimal places.

1.17 There are several ways of reaching the answer. All of them exploit the symmetry and sum-to-1 properties of the normal distribution. We suggest you draw a curve and shade the areas corresponding to the probability required here, as a way to reach the answer without making errors. First recognize that z is positive in this scenario. One possible way to get the answer is to note that P(|Z|>z) = P(Z>z) + P(Z<−z) = 2P(Z<−z) = 0.242, so P(Z<−z) = 0.242/2 = 0.121. Using Table A, we find P(Z < -1.17) = 0.121, so −z=−1.17 and z=1.17
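The halve-the-tail step can be checked with Python's standard library in place of Table A. This is a sketch under that substitution; the function name `two_sided_z` is hypothetical.

```python
from statistics import NormalDist

def two_sided_z(total_tail_prob):
    """Positive z with P(|Z| > z) = total_tail_prob for a standard normal Z."""
    lower_tail = total_tail_prob / 2          # P(Z < -z), by symmetry
    return -NormalDist().inv_cdf(lower_tail)  # flip the sign to report the positive z

z = two_sided_z(0.242)
print(round(z, 2))
```

`inv_cdf` returns the negative lower-tail cut-off, so the sign flip mirrors the last step of the worked answer (−z = −1.17, so z = 1.17).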

Assume Miles per Gallon (MPG) of U.S. manufactured muscle cars follows a normal distribution. Presently, the average MPG for the U.S. manufactured muscle cars is 10MPG with a variance of 9MPG. What is the median MPG for the US manufactured muscle cars? Choose the (ONE) correct answer among those provided below.

10 The median is the same as the mean for a normal distribution

A group of friends compares what they received while trick or treating. They find that the average number of pieces of candy received is 32, with a standard deviation of 2. What is the z-score corresponding to 60 pieces of candy? Report your answer as a number, rounding to two decimal places (you should be careful to report the negative sign if your z-score is negative). For example, if your answer is -3.4569, report -3.46.

14 Standardizing: if X∼N(μ,σ), then Z=(X−μ)/σ∼N(0,1). For an observed value x, a z-score is obtained by standardizing, i.e. z=(x−μ)/σ. Here x represents the number of pieces of candy. Using the given mean and standard deviation, z=(60−32)/2=14.
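For readers who prefer to see the standardization as code, here is a minimal sketch; `z_score` is just an illustrative name.

```python
def z_score(x, mean, sd):
    """Standardize an observed value: z = (x - mean) / sd."""
    return (x - mean) / sd

# Candy card: mean 32 pieces, standard deviation 2, observed 60 pieces.
print(z_score(60, mean=32, sd=2))
```

Writing the subtraction inside parentheses is the important part: x − μ/σ without them would divide only μ by σ.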

Quality assurance. In the food industry, quality assurance is an important business practice. One manufacturer produces packages of potato chips that are advertised to contain 16 ounces of product. The actual weight of contents in a package has a mean of 16.9 ounces and a population standard deviation of 0.7 ounces. The mean amount filled, μ, is set high so that under-filling will not occur frequently. The company will not ship the day's production if the sample mean weight of the 35 packages is too low. Managers would like to have a "cut off" value whereby they will decide not to ship the day's production if the sample mean weight of the 35 packages falls below the cut-off. What should the cut-off value be so that the probability of not shipping the day's production is 0.0025? (Round up to 2 decimal places.)

16.57 X̄ ∼ approx. N(μ, σ/√n), so x = μ + z·σ/√n. P(Z<z)=.0025 implies that z=-2.81, so x = mean + (-2.81)*sd/sqrt(35) = 16.9 - 2.81*0.7/sqrt(35) = 16.57
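A quantile of the sampling distribution of the sample mean can be sketched as below. The helper name `sample_mean_quantile` is hypothetical, and it uses the exact normal quantile from the standard library rather than the two-decimal z from Table A, so small rounding differences from table-based answers are possible on other cards.

```python
import math
from statistics import NormalDist

def sample_mean_quantile(p, mu, sigma, n):
    """Value x with P(Xbar < x) = p when Xbar ~ N(mu, sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return NormalDist(mu, standard_error).inv_cdf(p)

# Potato chip card: mu = 16.9 oz, sigma = 0.7 oz, n = 35 packages, p = 0.0025.
cutoff = sample_mean_quantile(0.0025, mu=16.9, sigma=0.7, n=35)
print(round(cutoff, 2))
```

The same call with p = 0.90, mu = 30.0, sigma = 20.6 and n = 100 covers the percentile question for the Game of Thrones streaming times.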

Retirement savings. Research indicates that many Americans do not save enough for retirement, on average. A group of economists at the Federal Reserve Bank of St. Louis are interested in conducting a study to examine whether Americans between the ages of 55 and 65 have saved too little. The analysts obtain a random sample of Americans in this age group and proceed to test if there is evidence of insufficient savings from these data. The variable considered is the total amount of savings for each individual, reported in thousands of US dollars. For this question, assume that financial specialists suggest that the minimum level of retirement savings should be 1.5 million US dollars. Use the JMP output provided below to report the value of the test statistic used to gather evidence against the null hypothesis. Report your answer as a number (no symbols) and round to two decimal places. Test Mean Hypothesized Value 1500 Actual Estimate 1613.07 DF 251 Std Dev 782.626 t Test Prob > |t| 0.0226*

2.2935 t = (estimate − μ0)/(s/√n). Plugging in our values, t = (1613.07 − 1500)/(782.626/√252) = 2.2935. Note that n=252, since df=n-1=251.
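The test statistic can be recomputed from the JMP summary values. This is a minimal sketch; `one_sample_t` is a hypothetical helper and the inputs are the numbers quoted in the question.

```python
import math

def one_sample_t(estimate, hypothesized, sample_sd, n):
    """t = (estimate - hypothesized) / (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (estimate - hypothesized) / standard_error

# Retirement savings card: df = 251, so n = 252.
t_stat = one_sample_t(1613.07, 1500, 782.626, n=252)
print(round(t_stat, 4))
```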

Using Table A (i.e. the standard normal table) find the value of z that makes the following probability true: P(Z<z)=0.997 Report your answer to two decimal places.

2.75 Directly from Table A: P(Z < 2.75) = 0.9970, so z = 2.75.

A group of friends compares what they received while trick or treating. They find that the average number of pieces of candy received is 37, with a standard deviation of 7. What is the z-score corresponding to 57 pieces of candy? Report your answer as a number, rounding to two decimal places (you should be careful to report the negative sign if your z-score is negative). For example, if your answer is -3.4569, report -3.46.

2.86

A liquor wholesaler is interested in measuring how the price per bottle (in dollars) of a premium scotch whiskey affects sales (in quantity sold). Data for eight randomly selected weeks are displayed below together with the corresponding JMP output from fitting a least-squares regression line to the data. Use the fitted least-squares regression line presented in the JMP output above to predict the number of bottles sold per week when the price of a bottle is $19.47. Keep FOUR decimals in your intermediate calculations, and report your answer to TWO decimal places.

21.09

Soccer Referees. A manager for a well-known soccer club in England has complained that his players are injured more often because the match referees are allowing opposing players to commit fouls without being penalized. The manager for a rival club claims the opposite, that more fouls are being assessed than necessary due to players simulating injuries. We know that in 2016-17, there was an average of 21.5 fouls per match across all matches played. To see which manager is correct, you are asked to test the claim that referees called fewer fouls, on average, during the entire 2017-18 English Premier League (soccer) season than the previous season. The JMP output presented in the figure below presents partial JMP output from the statistical analysis conducting the test of hypotheses. Answer the following questions using this output. Please report the values exactly as they appear in JMP so you get credit for your correct answers. If you need to compute something, round to 3 decimal places

21.5 -2.2811 .0116

Corn. Corn yield in Iowa was historically low in 2012. For all IA counties, the yield of corn (bushels per acre) during 2012 follows a bell-shaped distribution (symmetric and unimodal) with a mean of 167 bushels per acre and a standard deviation of 24 bushels per acre. Use the 68-95-99.7 Rule (Empirical Rule) to answer the following question. 0.15% of counties have a corn yield more than or equal to what value?

239

Corn. Corn yield in Iowa was historically low in 2012. For all IA counties, the yield of corn (bushels per acre) during 2012 follows a bell-shaped distribution (symmetric and unimodal) with a mean of 239 bushels per acre and a standard deviation of 19 bushels per acre. Use the 68-95-99.7 Rule (Empirical Rule) to answer the following question. 0.15% of counties have a corn yield more than or equal to what value?

296

Maria is the owner of Frozen Spoon, an ice cream shop that sells over 100 varieties of ice cream. Maria's ice cream is so good that many of her customers eat it until they get a brain freeze (i.e. a headache that occurs when one eats too much ice cream). Maria is concerned about the well being of her customers, and has decided to determine the mean amount of ice cream (in ounces, or oz) it takes for her customers to get brain freeze. For this reason, she selected a random sample of 29 customers to which she offered unlimited free ice cream on the condition that they eat until they get a brain freeze; not surprisingly, everyone accepted her offer. When she looked at the data collected, Maria found a sample mean of 4.36 oz. It is known that the distribution of the amount of ice cream it takes for all Maria's customers to get a brain freeze is symmetric, unimodal, and has a standard deviation of 1.886 oz. Compute a 95% confidence interval for the mean amount of ice cream it takes Maria's customers to get a brain freeze. Report the LOWER BOUND of this interval. Keep FOUR decimals in your intermediate calculations and report your final answer as a number, rounding to TWO decimal places.

3.67 Note that σ is KNOWN, and the CLT condition for sampling from a symmetric and unimodal distribution is met (n≥15), indicating the use of the standard normal distribution (z). 95% confidence ⟹ α=0.05, so P(Z>z(α/2)) = α/2 = 0.05/2 = 0.025 and P(Z<z(α/2)) = 1−0.025 = 0.975 (we want the lower tail probability to use Table A). From Table A, z(α/2)=1.96. Lower bound of the confidence interval: x̄ − z(α/2)⋅(σ/√n) = 4.36 − 1.96⋅(1.886/√29) = 3.67
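The interval can be reproduced with a short sketch. The function name `z_interval` is hypothetical, and the critical value comes from the standard library's normal quantile (about 1.96 at 95%) instead of Table A.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Two-sided confidence interval for a mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # lower-tail area 1 - alpha/2
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Brain freeze card: sample mean 4.36 oz, sigma = 1.886 oz, n = 29.
lower, upper = z_interval(4.36, sigma=1.886, n=29, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```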

Game of Thrones. Last July, a new episode of the hit HBO series "Game of Thrones" (GOT) was released every Sunday at 8 PM CST for 7 weeks. Many fans of the show started streaming the new episodes soon after 8 PM through the streaming service HBO Now or HBO Go. However, other fans waited longer to start watching the new episodes for various reasons, such as having to wait for their kids to go to bed or their parents to stop hogging the TV with reruns of The Golden Girls. Overall, assume that the distribution of the minutes that a typical GOT fan waits (after 8 PM) to stream a new episode is normally distributed with mean μ = 30.0 minutes and standard deviation σ = 20.6 minutes. Suppose HBO could provide us with the times it took 100 randomly sampled GOT watchers to start watching the 6th episode of last season: "Beyond the Wall." What is the value (in minutes) that corresponds to the 90th percentile of the sampling distribution of the sample mean of the 100 times? (Report your answer in minutes rounding to 2 decimal places.)

32.64 X̄ ∼ N(μ, σ/√n), so x = μ + z·σ/√n. P(Z<z)=.9 implies that z=1.28, so x = mean + 1.28*sd/sqrt(100) = 30.0 + 0.128*20.6 = 32.64

Page Loading Speed. widgetwarehouse.com is a popular e-commerce site that processes thousands of transactions every day. Tom, product manager at the site is analyzing some historical user data that widgetwarehouse.com has collected over several months to gain insights into how fast the site loads for each of its users. Tom knows that the time it takes each user to load the site follows a normal distribution with mean μ= 350 milliseconds and standard deviation σ= 6 milliseconds. Tom doesn't have a table or calculator with statistical capabilities, but he does know about the Normal distribution. Tom wants to know more about the typical user's experience rather than any individual. He investigates the sample mean for samples of size n= 160,000, which is a typical number of visitors the site has each minute. Tom knows that the sampling distribution of the sample means also follows a normal distribution. Because the sampling distribution is normal, you can use the empirical rule (68-95-99.7 rule) to calculate approximate probabilities for sample means as well. Use this fact to find the time that is greater than 99.85% of average loading times for samples of 160,000 users. (Report your answer in milliseconds rounding to 2 decimal places.)

350.05 X̄ ∼ approx. N(μ, σ/√n). The empirical rule tells us that P(X̄ < μ + 3σ/√n) = 0.9985; therefore the value we are looking for is x = mean + 3*sd/sqrt(160,000) = 350 + 3*6/400 = 350.045, reported as 350.05

World GDP. Gross Domestic Product for a specific country is the monetary ($) measure of the market value of all the final goods and services produced by that country within a specific time period. There are a total of 193 countries in the world that report GDP to the International Monetary Fund. Suppose the distribution of GDP (billions of $) follows a skewed-right distribution with a mean μ = $432 billion and a standard deviation σ = $482 billion. What is the value (in billions of $) that corresponds to the third quartile of the sampling distribution of the sample mean for the GDP of 100 randomly chosen countries? (Report your answer in billions of $ rounding to 2 decimal places.)

464.29 X̄ ∼ approx. N(μ, σ/√n), so x = μ + z·σ/√n. P(Z<z)=.75 implies that z=0.67, so x = mean + 0.67*sd/10 = 432 + 0.67*482/10 = 464.29

Highway Mpg. A United States car manufacturer is attempting to develop a model that has the highest miles per gallon (mpg) on the highway. The company gathered data of 93 cars on the road and a statistical consultant calculated their average highway mpg. The histogram and the summary statistics of the data collected are shown below. Use the JMP output provided to answer the following questions. For the distribution estimated by the histogram provided in the JMP output, what is the value of IQR?

5

Hot Dogs. Since their introduction in 1893, hot dogs have been hugely popular because they are easy to eat, convenient, and inexpensive. Two graduate students in the Food Sciences Department at ISU have decided to learn more about the calorie content of hot dogs for their thesis project. They collected data on the calorie content of 118 brands of hot dogs across the United States. The distribution and summary statistics are given below. Report the IQR.

53.75

Skinny jeans. A clothing company makes skinny jeans and has a design specification for the elasticity of these jeans set to 57 Pascals (Pa). They would like to know if the mean elasticity is below this specification. The figure below presents relevant JMP output for conducting a test of hypothesis to investigate this question. Answer the questions below using this output. Summary Statistics Mean 55.533796 Std Dev 4.5528105 Std Err Mean 1.0180394 Upper 95% Mean 57.664577 Lower 95% Mean 53.403015 N 20 t Test Test Statistic Prob > t 0.9170 Answer the following questions by filling in the blanks for each situation. Report each answer as a number and round to two (2) decimal places. Please note that answers that do not follow this exact specification will not receive any credit.

57 19 .0830 H0: μ=57, Ha: μ<57. df=n-1, where n is the sample size. p-value: (Prob<t) = 1-(Prob>t) = 1-0.9170 = 0.0830

Gallup Poll: Age and Willingness to elect an atheist president. To gain more insight into the demographics and political affiliations of American voters who have said they would vote for a qualified atheist presidential candidate, the designers of a Gallup Poll recorded additional information for each respondent, such as their age (reported as Age Group) and political affiliation (reported as Party). Using the charts below, answer the following question: How many respondents said "Yes" to vote for a qualified atheist candidate? Report your answer as a number, not a proportion (i.e. report 15 if your answer is 15 respondents.) There is more than one way you could answer this question, so feel free to select the chart that is easiest for you to use.

606

Gallup Poll: Age and Willingness to elect an atheist president. To gain more insight into the demographics and political affiliations of American voters who have said they would vote for a qualified atheist presidential candidate, the designers of a Gallup Poll recorded additional information for each respondent, such as their age (reported as Age Group) and political affiliation (reported as Party). Using the charts below, answer the following question: What percentage of the respondents who are willing to vote for a qualified atheist candidate are 31 years or older? Round your answer to 2 decimal places.

69 Answers may vary depending on the chart used and the rounding within. Using Chart D, we could get the answer as: (152+121+140)*100/606=68.15 Using Chart C, we could get the answer as: 12+8+5+9+6+5+12+7+5=69

The panels in the picture below (labeled A, B, C, D, E and F) present six separate instances where 50 random samples were drawn from a normal population (with variance 1) and confidence intervals for the population mean were computed for each case and represented by horizontal segments. Within each panel, the intervals were constructed using the same sample size and confidence level. The panels differ in the following ways: • Only in some panels was the population standard deviation assumed to be known. • For some, the sample size was 4, while for others it was 15. • The confidence level was not the same in all panels. SOLID lines indicate the intervals that do NOT include the true population mean. Each panel shows results for 50 random samples. We recommend zooming in as much as you can and thinking about each panel before answering the question below. For panel B, what is the most likely confidence level used?

80% Let "a" be the number of intervals which contain the true population mean. Then the most likely confidence level will be (a*100)/50=2*a

The weekly salary paid to each employee of a small company is normally distributed with a mean of $820 and a standard deviation of $107. This small company has 36 employees. What is the average weekly salary for all 36 employees such that the probability of being above this weekly average salary is only 10%? (Report your answer rounding to 2 decimal places.)

842.83 X̄ ∼ N(μ, σ/√n), so x = μ + z·σ/√n. P(Z>z)=.1 implies that z=1.28, so x = mean + 1.28*sd/6 = 820 + 1.28*107/6 = 842.83

Gallup Poll: Age and Willingness to elect an atheist president. To gain more insight into the demographics and political affiliations of American voters who have said they would vote for a qualified atheist presidential candidate, the designers of a Gallup Poll recorded additional information for each respondent, such as their age (reported as Age Group) and political affiliation (reported as Party). These data are represented in the charts below, from different perspectives. What percentage of ALL respondents who voted "Yes" for an atheist presidential candidate is between 51 and 70 years old AND self-identified as Independent?

9 The correct answer follows from Chart B

The panels in the picture below, labeled A, B, C, D, E and F, present six separate instances/situations where 50 random samples were drawn from a normal population (with variance 1) and confidence intervals for the population mean were computed for each case and represented by horizontal segments. Within each panel, the intervals were constructed using the same sample size and confidence level. The panels differ in the following ways: • Only in some panels was the population standard deviation assumed to be known. • For some, the sample size was 4, while for others it was 15. • The confidence level was not the same in all panels. SOLID lines indicate the intervals that do NOT include the true population mean. Each panel shows results for 50 random samples. We recommend zooming in as much as you can and thinking about each panel before answering the question below. Which were the panels for which the population standard deviation was known? Select all that apply.

A B C If the population standard deviation is known, then width = 2z⋅σ/√n If the population standard deviation is unknown, then width = 2t⋅s/√n For different samples we will have different values of s, hence the width will be different if the sigma is unknown.

Correlation Detective. The figure below consists of graphs for four different data sets, labeled as A, B, C, and D. For each of these graphs, you are required to identify the correct value of the linear correlation coefficient from the list provided.

A) .81 B) -.54 C) .04 D) .39 1. When x and y show a positive, linear relationship in the scatterplot ⇒ r will be positive. When x and y show a negative, linear relationship in the scatterplot ⇒ r will be negative. 2. Categorizing the strength: considered weak: 0<|r|≤0.3; considered moderate: 0.3<|r|≤0.7; considered strong: 0.7<|r|≤1.

Gallup Poll: Atheism in U.S. Elections. A recent Gallup Poll tested Americans' willingness to vote for candidates who do not fit the traditional mold. The survey is based on telephone interviews conducted between January 16 and 29, 2020. 1033 interviews were conducted with respondents (eligible to vote) on landline telephones and cellular phones. Landline respondents are chosen at random within each household. The primary poll question that is of interest here is the following: "Between now and the 2020 political conventions, there will be discussion about the qualifications of presidential candidates - their education, age, religion, race and so on. If your party nominated a generally well-qualified person for president who happened to be [characteristic], would you vote for that person?" The results for this question are shown in the chart below. Please note that each respondent was allowed to choose only one party as their political affiliation, and remember that only U.S. citizens 18 and older are eligible to vote. Use this chart and the description of the Gallup poll to answer the following question. What is the population of this study?

All U.S. citizens age 18 or older

Central Limit Theorem in Practice. The figure below presents four histograms, labeled with upper case letters A through D. One of these graphs displays the population of interest for a study. The other three graphs represent the sample means for 400 random samples of size 5, 12, and 60. Answer the following question using this information and the graphical displays. The histogram corresponding to samples of size 60 is shown in the graph labeled

B

Central Limit Theorem in Practice. The figure below presents four histograms, labeled with upper case letters A through D. One of these graphs displays the population of interest for a study. The other three graphs represent the sample means for 400 random samples of size 5, 12, and 60. Answer the following question using this information and the graphical displays. The histogram corresponding to samples of size 5 is shown in the graph labeled

C Graph D displays the population (with the greatest skew and largest spread). Graph C is the next largest spread so this should represent the sampling distribution for samples of size 5. The options are 5, 12 and 60. The larger the sample size the smaller the spread and the closer to normal the shape. Graph A should represent the sampling distribution for samples of size of 12. Graph B should represent the sampling distribution for samples of size 60. Notice that Graph B looks closest to a normal shape and also has the smallest spread/range.

White Sharks. White sharks are known to be a migratory species of shark, meaning they do not tend to remain in the same location year-round. However, there are usually a couple of white sharks together around the same location for some amount of time. Suppose that the amount of time they spend in Southern California follows a normal distribution with mean μ = 62 days and standard deviation σ = 5.4 days. Suppose we obtained the time spent in Southern California for 81 randomly sampled white sharks. Find the probability that the mean time (in days) it takes for the 81 sharks to leave Southern California is less than 65 days. (Leave answer as a proportion NOT a percent and report the value to 4 decimal places.)

Close to 1 Note that the standard error is given by SE = σ/√n. In this case σ = 5.4 and n = 81, so our standard error is SE = σ/√n = 5.4/9 = 0.6. We know that X̄ ∼ N(μ = 62, σ/√n = 0.6). Then P(X̄ < 65) will be large and close to 1, since 65 is (65 − 62)/0.6 = 5 standard errors above the mean of 62.
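A quick numerical check of the "close to 1" claim, using the standard library in place of Table A (which typically stops well before z = 5). This is a sketch; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# White shark card: mu = 62 days, sigma = 5.4 days, n = 81 sharks.
mu, sigma, n = 62, 5.4, 81
standard_error = sigma / math.sqrt(n)          # 5.4 / 9 = 0.6
prob = NormalDist(mu, standard_error).cdf(65)  # P(Xbar < 65)
print(standard_error, prob)
```

Because 65 sits five standard errors above the mean, the probability is indistinguishable from 1 at four decimal places.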

The panels in the picture below, labeled A, B, C, D, E and F, present six separate instances/situations where 50 random samples were drawn from a normal population (with variance 1) and confidence intervals for the population mean were computed for each case and represented by horizontal segments. Within each panel, the intervals were constructed using the same sample size and confidence level. The panels differ in the following ways: • Only in some panels was the population standard deviation assumed to be known. • For some, the sample size was 4, while for others it was 15. • The confidence level was not the same in all panels. SOLID lines indicate the intervals that do NOT include the true population mean. Each panel shows results for 50 random samples. We recommend zooming in as much as you can and thinking about each panel before answering the question below. Which were the panels for which the population standard deviation was NOT known? Select all that apply.

D E F If the population standard deviation is known, then width = 2z⋅σ/√n. If the population standard deviation is unknown, then width = 2t⋅s/√n. Different samples give different values of s, so the widths differ from interval to interval when σ is unknown.

For a standard normal variable, find P(Z≥5). Report your answer as a probability to 4 decimal places.

Less than .0003

Opinion polls. You may all remember the Parkland shooting, which took place a little over a year ago, on 2/14/18. On February 8th, 2019, The New York Times published an article with the following headline: "Americans Support Gun Control but Doubt Lawmakers Will Act." The Reuters/Ipsos poll was conducted online in English between Jan. 11 and Jan. 28, 2019 throughout the United States. It gathered responses from 6,813 adults, including 2,701 who identified as Democrats and 2,359 who identified as Republicans. According to the poll, 69 percent of Americans, including 85 percent of Democrats and 57 percent of Republicans, want restrictions placed on firearms. In this question, you will be asked to comment on several aspects of the study designed to assess adult Americans' enthusiasm for gun control. If we were to draw a bar chart of Americans who favor restrictions on firearms by political party, which of the following categories would have the highest bar?

Democrats who favor firearm restrictions

A college statistics class conducted a survey of how students spend their money. They gathered data from a large random sample of college students who estimated how much money they typically spent each week in different categories (e.g., food, entertainment, etc.). The following statistics were calculated for money spent weekly on food: mean = $31.52; median = $30.00; interquartile range = $34.00; standard deviation = $21.60; range = $132.50. A student states that the median food cost tells you that a majority of students in this sample spend about $30 each week on food. How do you respond?

Disagree, the median tells you that 50% of the sample spent less than $30 and 50% of the sample spent more.

A liquor wholesaler is interested in measuring how the price per bottle (in dollars) of a premium scotch whiskey affects sales (in quantity sold). Data for eight randomly selected weeks are displayed below together with the corresponding JMP output from fitting a least-squares regression line to the data. Use the output given below to identify the explanatory variable and report the correlation coefficient.

Explanatory variable: Price (X). Correlation: -.9381. When determining the explanatory variable, consider which variable we put on the x-axis versus which variable we typically put on the y-axis. For the correlation, consider using the relationship between R² and the correlation. In addition, consider the sign of the estimated slope when answering this question.
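The hint above relies on the fact that, in simple linear regression, the correlation is the square root of R² carrying the sign of the slope. A minimal sketch follows. The JMP output is not reproduced on this card, so the R² value is back-computed from the stated answer and the slope is a hypothetical placeholder; only its sign is used.

```python
import math

def correlation_from_r2(r_squared, slope):
    """Return r = sign(slope) * sqrt(R^2) for a simple linear regression."""
    return math.copysign(math.sqrt(r_squared), slope)

r_squared = 0.9381 ** 2  # assumption: back-computed from the card's answer, not read from JMP
slope = -1.0             # hypothetical value; a negative slope means price up, sales down

r = correlation_from_r2(r_squared, slope)
print(round(r, 4))
```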

The panels in the picture below (also linked here), labeled A, B, C, D, E and F, present six separate instances/situations where 50 random samples were drawn from a normal population (with variance 1) and confidence intervals for the population mean were computed for each case and represented by horizontal segments. Within each panel, the intervals were constructed using the same sample size and confidence level. The panels differ in the following ways:
• Only in some panels was the population standard deviation assumed to be known.
• For some, the sample size was 4, while for others it was 15.
• The confidence level was not the same in all panels.
SOLID lines indicate the intervals that do NOT include the true population mean. Each panel shows results for 50 random samples. We recommend zooming in as much as you can and thinking about each panel before answering the question below. The intervals in panels D and F were computed using the same confidence level. Which panel (D or F) was constructed using sample size 15?

F The larger sample size will produce a narrower confidence interval (smaller width).
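To see the effect of sample size numerically, the sketch below evaluates the known-σ width formula 2·z*·σ/√n for n = 4 and n = 15. The population standard deviation of 1 comes from the question. The critical value 1.96 (a 95% level) is an assumption for illustration, since the card does not state the confidence level used in panels D and F.

```python
import math

def z_interval_width(z_star, sigma, n):
    """Width of a z confidence interval for the mean: 2 * z* * sigma / sqrt(n)."""
    return 2 * z_star * sigma / math.sqrt(n)

sigma = 1.0    # population variance 1, as stated in the question
z_star = 1.96  # assumed 95% confidence level; not given on the card

width_n4 = z_interval_width(z_star, sigma, 4)
width_n15 = z_interval_width(z_star, sigma, 15)
print(round(width_n4, 3), round(width_n15, 3))
print(width_n15 < width_n4)  # True: the larger sample gives the narrower interval
```

The same comparison holds for any fixed confidence level, because only the √n in the denominator changes.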

Correlation implies causation as long as we can ensure that the collected data are a random sample from the population.

False

Correlation near ±1 always implies a cause and effect relationship between both variables

False

Daniel is studying to be a string music teacher. As part of his student teaching assignment, he has to collect and analyze data relating the length of musical education to talent. He is working with middle schoolers, so he asks his 100 students to provide data on the following two variables: Variable 1: The number of years each student has played their instrument. Variable 2: Musical talent index. This is a measurement provided by a group of independent listeners/judges. A value closer to 1 indicates a beginner and a value closer to 100 indicates a master musician. Although Daniel took an intro-level statistics class, he didn't pay close attention during class. After studying the relationships between these variables, Daniel reported the following statement in his student teaching report. Is the statement correct? "The correlation between variable 1 and variable 2 is .89. This implies that more years playing an instrument causes an increase in the talent of the musician."

False

Data have been published which indicates that the more children a couple has, the less likely the couple is to get a divorce. Therefore we can conclude that having fewer or no children causes couples to divorce

False

Deleting outliers from a data set is considered good statistical practice.

False

If we are only given the value of r² and the least squares prediction line, we cannot tell whether the relationship is positive or negative without looking at a scatterplot.

False

Increasing the sample size will increase the width of a 90% confidence interval.

False

Researchers from the Centers for Disease Control and Prevention (CDC) collected data for each of the 50 U.S. states to estimate the percentage of residents in each state who eat at least five servings of fruits/vegetables per day. We are interested in determining if these percentages differ across the four regions of the country: Midwest, Northeast, South, and West. Based on the above graphical display, decide whether the statement below is TRUE or FALSE. The mean percentage of residents who consume at least five fruits and vegetables per day is about the same for the MW and S regions, namely around 22%.

False

Statistical significance always implies practical significance.

False

The 80th percentile of a probability distribution is the value x such that P(X > x) = 0.8.

False

The correlation coefficient between two quantitative variables is zero. This implies that there is no association between the two variables.

False

The larger the p-value the stronger the evidence against the null hypothesis.

False

The p-value for a statistical test of hypotheses is the same as the confidence level for a confidence interval

False

The standard deviation of a dataset can be positive or negative.

False

The value of the correlation contains only information about the strength of the linear relationship but not the direction.

False

Toyota Corolla. A large Toyota car dealership offers purchasers of new Toyota cars the option to buy their used cars as part of a trade-in. The dealership then sells the used cars for a small profit. To ensure a reasonable profit the dealers need to predict the price based on the age of the car. For that reason, data were collected on all previous sales of used Toyota Corollas at the dealership. The data include the sales price of used cars (recorded in US dollars) and the age of the cars (recorded in months). The JMP output provided below presents the results obtained by the dealer when fitting a least-squares regression model to the data collected for 654 cars. Use this JMP output to answer the following question. Based on the information provided in the JMP output, are all the residuals obtained from the least-squares regression model positive?

False

When describing the variability of a numerical distribution, you should always report the standard deviation because it has the same units as the data.

False

When finding the Least Squares Regression line it does not matter which variable is used as the response variable and which one as the explanatory variable, the LS regression line will be the same.

False

p-values can be positive or negative, depending on the null hypothesis and the sample mean.

False

Shere Hite is the author of the 1987 book called "Women and Love". In summary, the book is about the satisfaction of American women with their personal relationships. To prepare and research for her book, Hite sent questionnaires to 100,000 U.S. women asking about their marriages and relationships. Of the 100,000 women, about 4.5% responded and Hite used those responses to write her book. For the 4,500 women that responded, 98% claimed to be unsatisfied with their relationship in some way. Hite wanted to make the following claim in the book: "Ninety-eight percent of American women are unsatisfied with some aspect of their relationship." Is this a fair statement, based on the description of the study? Identify which of the following statements is correct (choose one):

No, because the findings were based on a voluntary response survey and only 4,500 of the 100,000 women who were asked to complete the survey responded.

A recent study by Pew Research Center investigated whether adults who use Facebook in the United States mainly get their news from Facebook. The study randomly surveyed 8,000 adult U.S. Facebook users. Of these, 3,600 (or 45%) responded that they do get their news primarily from Facebook. A news report comes out with the following headline: "Pew Research Center finds that 45% of all adults who use any social media site mainly get their news from those sites". Is this a fair statement, based on the description of the study? Identify which of the following statements is correct (choose one):

No, because the sample is not representative of the stated population.

Game of Thrones. Last July, a new episode of the hit HBO series "Game of Thrones" (GOT) was released every Sunday at 8 PM CST for 7 weeks. Many fans of the show started streaming the new episodes soon after 8 PM through the streaming service HBO Now or HBO Go. However, other fans waited longer to start watching the new episodes for various reasons, such as having to wait for their kids to go to bed or their parents to stop hogging the TV with reruns of The Golden Girls. Overall, assume that the number of minutes a typical GOT fan waits (after 8 PM) to stream a new episode is normally distributed with mean μ = 30.8 minutes and standard deviation σ = 20.4 minutes. Suppose HBO could provide us with the times it took 100 randomly sampled GOT watchers to start watching the 6th episode of last season: "Beyond the Wall." The corresponding sampling distribution of the sample mean, i.e. the distribution of all sample means based on all samples of size 100 from all GOT watchers, has what shape? Select one answer from those provided below.

Normal The sampling distribution of the sample means is a Normal distribution (regardless of the sample size), if the population follows a Normal distribution. Or, the sampling distribution of the sample means is approximately a Normal distribution, if the population follows a non-Normal distribution and the sample size is large enough (CLT). Usually, to apply CLT, the sample size needs to be at least 15 if the population follows a bell-shaped distribution, and the sample size needs to be at least 30 if the population follows a skewed distribution.

Baseball. During the early part of the 1994 baseball season, many sports fans and baseball players noticed that the number of home runs being hit seemed to be unusually large. Below is the data set (SORTED) for the number of home runs by American League team through June of 1994. Data Set (American League): {35, 40, 43, 49, 51, 54, 57, 58, 58, 64, 68, 68, 75, 77} Find the range and IQR of the data set.

Range: 42; IQR: 19
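The range and IQR above can be reproduced with a short script. This sketch assumes the common intro-course convention of taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data; other quartile conventions (including some software defaults) can give slightly different values. The helper name `quartiles_by_halves` is made up for illustration.

```python
from statistics import median

data = [35, 40, 43, 49, 51, 54, 57, 58, 58, 64, 68, 68, 75, 77]

def quartiles_by_halves(sorted_values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded if n is odd)."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    upper = sorted_values[half + (n % 2):]
    return median(lower), median(upper)

data_range = max(data) - min(data)
q1, q3 = quartiles_by_halves(sorted(data))
iqr = q3 - q1
print(data_range, q1, q3, iqr)  # 42 49 68 19
```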

Suppose you were to collect data for blood alcohol level and reaction time (in minutes). You want to make a scatterplot but need to determine what should be the proper x and y axes. Which variable should you use as the explanatory variable? Furthermore, would you expect a positive, negative, or no linear association between the two variables?

Explanatory variable: Blood alcohol level. Association: Positive. When choosing the explanatory variable, consider which variable may have an impact on the second variable (the response). In this case: Would you expect alcohol to impact the reaction time, or would you expect the reaction time to impact the blood alcohol level? For the correlation: Remember that a slower reaction time means it takes longer (more minutes) to react.

Operating expenses in U.S. private and public colleges are funded through individual, corporation, and foundation contributions (a.k.a. donations). Much of this money is put into an endowment fund, and the college spends only the interest earned by the fund. A random sample of sixteen college endowments was drawn from the list of endowments in the Chronicle of Higher Education Almanac (Sept. 2, 1996). The endowments (in millions of dollars) were recorded and provided to users to be analyzed. Analysts calculated a confidence interval for the mean college endowment across all U.S. private and public colleges. The interval calculations were done using JMP, and are shown in the rightmost part of the output provided to you (under the heading Confidence Intervals). Using this JMP output (below and linked here download), report the following quantities

Sample size: 16. The Student's t distribution was used to calculate the interval. Sample mean: 625.79. Margin of error: 78.697. Interval width/length: 157.393. Lower bound: 547.090. Confidence level used: 99%.

Operating expenses in U.S. private and public colleges are funded through individual, corporation, and foundation contributions (a.k.a. donations). Much of this money is put into an endowment fund, and the college spends only the interest earned by the fund. A random sample of sixteen college endowments was drawn from the list of endowments in the Chronicle of Higher Education Almanac (Sept. 2, 1996). The endowments (in millions of dollars) were recorded and provided to users to be analyzed. Analysts calculated a confidence interval for the mean college endowment across all U.S. private and public colleges. The interval calculations were done using JMP, and are shown in the rightmost part of the output provided to you (under the heading Confidence Intervals). Using this JMP output (below and linked here download), report the following quantities:

Sample size: 16. The Student's t distribution was used to calculate the interval. Sample mean: 625.79. Margin of error: 46.818. Interval width/length: 93.64. Lower bound: 578.968. Confidence level used: 90%. The Normal distribution (z*) should be used if the population standard deviation is known, and the Student's t distribution (t*) should be used if only the sample standard deviation is known. For a given interval (a, b): width = b - a; margin of error ME = (b - a)/2; critical value z* = ME·√n/σ or t* = ME·√n/s; standard deviation σ = ME·√n/z* or s = ME·√n/t*. If a critical value and the degrees of freedom (df) are known, the confidence level can be found in Table D.
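The relationships in the explanation above can be sketched in a few lines. The lower bound and width are the values on this card; the upper bound is not given, so it is derived here as lower bound plus width, which is an assumption. The critical value 1.753 is the standard table value of t* for 15 degrees of freedom at 90% confidence, and the resulting s is a back-calculation for illustration only, not a number from the JMP output.

```python
import math

def interval_summary(a, b):
    """Return (width, margin of error, center) for an interval (a, b)."""
    width = b - a
    return width, width / 2, (a + b) / 2

def sd_from_margin(margin, n, critical_value):
    """Solve ME = critical_value * sd / sqrt(n) for sd."""
    return margin * math.sqrt(n) / critical_value

lower = 578.968          # from the card
upper = lower + 93.64    # assumption: derived from the card's stated width
n = 16

width, margin, center = interval_summary(lower, upper)
print(round(width, 2), round(margin, 2), round(center, 2))

t_star = 1.753           # table value for df = 15, 90% confidence
s = sd_from_margin(margin, n, t_star)
print(round(s, 1))       # implied sample standard deviation (illustrative)
```

The center and margin agree with the card's sample mean and margin of error up to rounding.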

It takes different times for different workers to perform the same specific task, as it is shown in the distribution below. The box plot displays time in minutes. Which of the following statements must be true?

The distribution is skewed to the left

A third variable that is not originally included in a study but may help to explain relationships between other variables is called a lurking variable.

True

Assume there is a visual linear relationship between the explanatory variable x and the response variable y. In this case a negative correlation between the response variable y and the explanatory variable x indicates that larger values of x are generally associated with smaller values of y

True

Changing the mean of a normal distribution does not affect the standard deviation of the distribution.

True

For a simple linear regression model fit to a set of data, even if the R²-value (i.e., the coefficient of determination) is close to 1 (or 100%), we cannot conclude that the model provides an adequate fit to the data.

True

For different values of an explanatory variable the simple linear regression model describes the change in the predicted value (or mean) of the response variable.

True

If men always marry women who are 2 years younger than them, then the correlation between the age of a husband and his wife would be equal to 1.

True
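The husband/wife statement can be checked directly: if every wife's age is exactly the husband's age minus 2, the points fall on a straight line with positive slope, so the correlation is 1. The ages below are hypothetical and chosen only for illustration; the Pearson formula is written out by hand to keep the sketch self-contained.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

husband_ages = [24, 29, 31, 38, 45, 52]      # hypothetical ages
wife_ages = [h - 2 for h in husband_ages]    # always exactly 2 years younger

r = pearson_r(husband_ages, wife_ages)
print(round(r, 10))  # 1.0
```

Shifting every value by a constant changes the mean but not the deviations from it, which is why the correlation stays at exactly 1.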

The correlation indicates both the direction and strength of a linear relationship.

True

The distinction between explanatory and response variables is not important for correlation.

True

The t-statistic for a test of hypotheses follows a t-distribution under the null hypothesis.

True

The variance of a random variable is sensitive to outliers.

True

The weekly salary paid to each employee of a small company is normally distributed with a mean of $800 and a standard deviation of $100. This small company has 36 employees. What is the probability that the average of all 36 employees' salaries is above $950?

Very close to 0 but greater than 0. Note that the standard error is given by SE = σ/√n. In this case σ = 100 and n = 36, so the standard error is SE = σ/√n = 100/6 ≈ 16.667. We know that X̄ ∼ N(μ = 800, σ/√n = 100/6). Then P(X̄ > 950) will be small and very close to 0, since 950 is nine standard errors above the mean of 800.
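The reasoning above can be verified numerically. This sketch standardizes 950 using the standard error and evaluates the upper tail of the standard normal with `math.erfc`; the variable names are chosen only for illustration.

```python
import math

mu, sigma, n = 800, 100, 36

standard_error = sigma / math.sqrt(n)   # 100 / 6, about 16.667
z = (950 - mu) / standard_error         # about 9 standard errors above the mean

# P(sample mean > 950) = P(Z > z) for a standard normal Z
p = 0.5 * math.erfc(z / math.sqrt(2))
print(round(standard_error, 3), round(z, 3))
print(p)  # positive, but far smaller than any entry in Table A
```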

Choose all that apply. In statistical inference for the population mean μ when σ is unknown, we use the t-distribution which has heavier tails than the normal distribution. This is necessary because using the sample standard deviation instead of the population standard deviation

adds more variability to the distribution of the test statistic. adds more uncertainty to the estimation of the population mean μ.

For a Student t-distribution with 30 degrees of freedom we have that P(t<5) is (choose the correct answer from those provided below):

approximately 1

For a standard normal variable, what is true about P(−5≤Z≤−0.5)?

It is approximately .3085 (slightly below it). P(−5≤Z≤−0.5) = P(Z≤−0.5) − P(Z≤−5) = 0.3085 − P(Z≤−5). This is strictly lower than 0.3085, since it equals 0.3085 minus a positive probability. The value −5 lies outside the range of Table A, which means P(Z≤−5) is smaller than P(Z≤−3.49) = 0.0002; equivalently, −P(Z≤−5) > −0.0002. Then P(−5≤Z≤−0.5) = 0.3085 − P(Z≤−5) > 0.3085 − 0.0002 = 0.3083. Therefore P(−5≤Z≤−0.5) is greater than 0.3083 and less than 0.3085.

A negative correlation coefficient between the response variable y and the explanatory variable x indicates that

Large values of x are associated with small values of y. Negative association: smaller (larger) values of the explanatory variable are associated with larger (smaller) values of the response variable.

