Quantitative Analysis II: Review of All Assignments (Chapters 1-5)


The Annenberg Inclusion Initiative tallied the number of films released by seven top distributors in 2019 that had female leads or​ co-leads and that had leads or​ co-leads who were from an underrepresented population. The results are shown in the accompanying table. Which studio earned the most from films with leads from underrepresented populations​?

Walt Disney Studios earned the most.
EXPLANATION
- The data for the total revenues from films with female leads is in the second column of the given table.
- The row that corresponds to the studio that earned the most will have the largest value.
- The largest value in the second column is the 4099.7 in the row for Walt Disney Studios, so that studio earned the most for films with female leads.

A particular IQ test is standardized to a Normal​ model, with a mean of 100 and a standard deviation of 15.

explained in depth in notebook.

A distribution is said to be skewed to the right if

it has a long tail that trails toward the right side

A distribution is said to be skewed to the left if

it has a long tail that trails to the left

Some IQ tests are standardized to a Normal model ​N(100​,15​). ​a) What cutoff value bounds the highest 10​% of all​ IQs? ​

- The highest 10% of all IQs lies above the 90th percentile, so the area to the left of the cutoff is 90/100 = 0.9.
- Find the cut point using the invNorm function on your calculator: enter 0.9 for the area, leave both the mean and standard deviation as they are, and set the tail to LEFT.
- The cut point is z = 1.28.
- Convert the z-score to an IQ with z = (y - μ)/σ: 1.28 = (y - 100)/15, so y = 119.2.
a) The cutoff value is 119.2.
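
The calculator steps above can be cross-checked in software. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the calculator's invNorm; the variable names are only illustrative.

```python
from statistics import NormalDist

# N(100, 15) model from the question
iq = NormalDist(mu=100, sigma=15)

# The top 10% starts at the 90th percentile (area 0.90 to the left)
cutoff = iq.inv_cdf(0.90)
print(round(cutoff, 1))
```

Working directly on N(100, 15) skips the separate z-score step; the result should agree with the hand calculation to one decimal place.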

A company that manufactures rivets believes the shear strength​ (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. ​b) Would it be safe to use these rivets in a situation requiring a shear strength of 630 pounds? Explain

​No, because about​ 16% of all of this​ company's rivets have a shear strength of less than 630 pounds.

Some IQ tests are standardized to a Normal model ​N(100​,15​).​ c) What cutoff values bound the middle 70​% of the​ IQs?

- 100 - 70 = 30, so the middle 70% leaves 30% left over, half (15%) in each tail. The cutoff scores for the middle 70% are therefore the 15th and 85th (100 - 15) percentiles.
- 15th percentile: 15/100 = 0.15. Use the invNorm function on your calculator: enter 0.15 for the area, leave both the mean and standard deviation as they are, and set the tail to LEFT. The cut point is z = -1.04 (rounded to two decimal places).
- z = (y - μ)/σ: -1.04 = (y - 100)/15, so y = 84.4.
- 85th percentile: repeat with 0.85 for the area. The cut point is z = 1.04.
- z = (y - μ)/σ: 1.04 = (y - 100)/15, so y = 115.6.
c) The cutoff values are 84.4 and 115.6.
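
As a hedged cross-check of the steps above, the sketch below finds both cutoffs with Python's standard-library `NormalDist` (an assumed stand-in for invNorm). It computes them twice: once with z rounded to two decimals, as on this card, and once unrounded. The unrounded cutoffs come out near 84.5 and 115.5, so a small difference from 84.4 and 115.6 is only a rounding effect.

```python
from statistics import NormalDist

mu, sigma = 100, 15
middle = 0.70
tail = (1 - middle) / 2          # 15% in each tail

# z-score at the 85th percentile of the standard Normal model
z = NormalDist().inv_cdf(1 - tail)

# Hand-calculation style: round z to two decimals first
z_rounded = round(z, 2)
low_rounded = mu - z_rounded * sigma
high_rounded = mu + z_rounded * sigma

# Software style: no intermediate rounding
low_exact = mu - z * sigma
high_exact = mu + z * sigma

print(round(low_rounded, 1), round(high_rounded, 1))
print(round(low_exact, 2), round(high_exact, 2))
```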

A company that manufactures rivets believes the shear strength​ (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. ​d) Rivets are used in a variety of applications with varying shear strength requirements. What is the maximum shear strength for which you would feel comfortable approving this​ company's rivets? Explain your reasoning. Select the correct choice below and fill in the answer box to complete your choice.

- A very small probability of failure is desired, less than 1 out of a million, or 6 standard deviations below the mean.
- Since the cutoff is 6 standard deviations below the mean, use μ - 6σ: 650 - 6(20) = 530 pounds.
d) The maximum shear strength that should be approved is 530 pounds.

People with​ z-scores of 2.25 or above on a certain aptitude test are sometimes classified as geniuses. If aptitude test scores have a mean of 100 and a standard deviation of 24 ​points, what is the minimum aptitude test score needed to be considered a​ genius?

- A z-score measures the distance of a data value from the mean in standard deviations. The z-score of a data value y is given by the formula z = (y - ȳ)/s, where ȳ is the mean and s is the standard deviation.
- 2.25 = (y - 100)/24, so y = 100 + 2.25(24) = 154.
The minimum aptitude test score needed to be considered a genius is 154 points.
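
Solving the z-score formula for the data value gives y = mean + z × s. The short sketch below applies that rearrangement to the numbers on this card; the function name `score_from_z` is made up for illustration.

```python
def score_from_z(z, mean, sd):
    """Invert z = (y - mean) / sd to recover the raw score y."""
    return mean + z * sd

genius_cutoff = score_from_z(2.25, mean=100, sd=24)
print(genius_cutoff)
```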

Here are boxplots of the points scored during the first 10 games of the season for both Holly and Sue. ​a) Summarize the similarities and differences in their performance so far. ​b) The coach can take only one player to the state championship. Which one should she​ take? Why?

a) Both girls have the same approximate median, but Holly has a larger IQR.
b) Either choice is possible, depending on the coach's preference. She could take Holly, because she has the ability to score a higher point total, or she could take Sue, because she is a more consistent performer.

A highly rated community college has over​ 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The​ college's leadership is very interested in the relationship between the class size of its statistics courses and​ students' final grades for the course.​ Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The​ college's institutional research department recently collected data for analysis in order to support​ leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted​ analysis, using archived​ data, to determine the class size of these 300 class sections. The Class​ Number, Campus, Class​ Size, Average Final​ Grade, Number of​ "F"s, Average G.P.A. and​ Successful/Unsuccessful data were collected for these 300 class sections. Using the Empirical Rule with a mean of 2.66 and a standard deviation of​ 0.23, between what two values do​ 68% of the Average G.P.A. data​ fall?

68% of the data fall within one standard deviation of the mean: 2.66 - 0.23 = 2.43 and 2.66 + 0.23 = 2.89. Between 2.43 and 2.89.

Two companies are vying for a​ city's "Best Local​ Employer" award, to be given to the company most committed to hiring local residents. Although both employers hired 300 new people in the past​ year, Company A brags that it deserves the award because​ 70% of its new jobs went to local​ residents, compared to only​ 60% for Company B. Company B concedes that those percentages are​ correct, but points out that most of its new jobs were​ full-time, while most of Company​ A's were​ part-time. Not only​ that, says Company​ B, but a higher percentage of its​ full-time jobs went to local residents than did Company​ A's, and the same was true for​ part-time jobs.​ Thus, Company B​ argues, it's a better local employer than Company A. Suppose Company A filled 50 of 100​ full-time and 160 of 200​ part-time positions with local residents. Which hiring by Company B gives an example of the situation described​ above?

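Company B filled 179 of 299 full-time positions and 1 of 1 part-time position with local residents. That is 179/299 = 59.9% of full-time jobs (above Company A's 50%) and 100% of part-time jobs (above Company A's 80%), yet only 180/300 = 60% overall.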
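
The comparison can be checked numerically. In this sketch, Company A's counts come from the question and Company B's from the answer above; the single part-time position for Company B is implied by its 300 total hires. The dictionary layout and function names are only one illustrative way to organize it.

```python
# (local hires, positions filled) by job type
company_a = {"full-time": (50, 100), "part-time": (160, 200)}
company_b = {"full-time": (179, 299), "part-time": (1, 1)}

def rate(local, total):
    return local / total

def overall(company):
    local = sum(l for l, _ in company.values())
    total = sum(t for _, t in company.values())
    return local / total

for job in ("full-time", "part-time"):
    print(job, f"A: {rate(*company_a[job]):.1%}", f"B: {rate(*company_b[job]):.1%}")
print("overall", f"A: {overall(company_a):.1%}", f"B: {overall(company_b):.1%}")
```

Company B wins within each job type but loses overall because almost all of its hires fall in the category with the lower local-hiring rate.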

Fifty-three men completed the​ men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times​ (in seconds) for all competitors. Complete parts a through d below ​d) Make a histogram of these times. What can be seen from the​ histogram? Choose the correct graph below.

The data are not Normal. They are unimodal and skewed right.

Fifty-three men completed the​ men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times​ (in seconds) for all competitors. Complete parts a through d below ​c) Why would these two percentages not​ agree?

The data distribution is not unimodal and symmetric.

A highly rated community college has over​ 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The​ college's leadership is very interested in the relationship between the class size of its statistics courses and​ students' final grades for the course.​ Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The​ college's institutional research department recently collected data for analysis in order to support​ leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted​ analysis, using archived​ data, to determine the class size of these 300 class sections. The Class​ Number, Campus, Class​ Size, Average Final​ Grade, Number of​ "F"s, Average G.P.A. and​ Successful/Unsuccessful data were collected for these 300 class sections. What needs to hold true in order to use the Empirical​ Rule?

The distribution must be approximately normal.

A university teacher saved every​ e-mail from students in a large introductory statistics class during an entire term. He then​ counted, for each student who had sent him at least one​ e-mail, how many​ e-mails each student had sent. The accompanying histogram shows the distribution of​ e-mails sent by students. Describe the shape of the distribution.

The shape is unimodal and skewed right

A survey of autos parked in student and staff lots at a large university classified the brands by country of​ origin, as seen in the table. Complete parts a through f below.​​ f) Do you think that the origin of the car is independent of the type of​ driver? Explain.

They do not appear to be independent because the conditional distributions of origin are significantly different for at least one group for the two driver classifications.

A survey of autos parked in student and staff lots at a large university classified the brands by country of​ origin, as seen in the table. Complete parts a through f below. c) What percent of the students owned American​ cars?

To calculate the percent, divide the number of American cars owned by students by the number of cars owned by students. The number of cars owned by students was 106 + 30 + 58 = 194. The number of American cars owned by students was 106. Divide the count 106 by the total 194 and convert the result to a percent, rounding to one decimal place: 106/194 = 0.546. c) 54.6%

A survey of autos parked in student and staff lots at a large university classified the brands by country of​ origin, as seen in the table. Complete parts a through f below. ​a) What percent of all the cars surveyed were​ foreign? ​

To calculate the percent, divide the number of foreign cars by the total number of cars. The total number of cars was 106 + 105 + 30 + 13 + 58 + 50 = 362. The number of foreign cars was 30 + 13 + 58 + 50 = 151. Divide the count 151 by the total 362 and convert the result to a percent, rounding to one decimal place: 151/362 = 0.417. a) 41.7%

Students were asked how many songs they had in their digital music libraries. To the right is a display of the responses. Use this display to complete parts a and b below. ​a) What aspect of this distribution makes it difficult to​ summarize, or to​ discuss, center and​ spread? ​b) What would you suggest doing with these data if we want to understand them​ better?

a) The extreme skew. b) Re-express the data.

The pie chart shows the ratings assigned to 842 ​first-run movies released in a recent year. ​a) Is this an appropriate display for these​ data? Explain. ​b) Which was the most common​ rating?

a)​Yes, because each movie falls into only one category and no categories overlap. b) Not rated

A highly rated community college has over​ 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The​ college's leadership is very interested in the relationship between the class size of its statistics courses and​ students' final grades for the course.​ Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The​ college's institutional research department recently collected data for analysis in order to support​ leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted​ analysis, using archived​ data, to determine the class size of these 300 class sections. The Class​ Number, Campus, Class​ Size, Average Final​ Grade, Number of​ "F"s, Average G.P.A. and​ Successful/Unsuccessful data were collected for these 300 class sections. College leadership is interested in analyzing the average G.P.A. of this sample of 300 of its statistics class sections. Calculate the probability of randomly selecting a class section with an average G.P.A. greater than 3.00.​ (Use the mean and standard deviation of the Average G.P.A. data.​ Also, if appropriate based upon your visual analysis of a histogram of the Average G.P.A.​ data, use the Normal distribution to answer this​ question.)

- In StatCrunch, go to Stat > Summary Stats > Columns, select Average G.P.A., and click Compute to get the summary table with the mean and standard deviation.
- On your calculator, choose normalcdf. Leave the lower bound as it is and enter 3 for the upper bound.
- For μ, enter the mean from the summary table, 2.6557333 (type the entire number). For σ, enter the standard deviation, 0.22523224 (type the entire number).
- Press Enter. You'll get 0.9368053301, the proportion of sections with an average G.P.A. below 3.00.
- Subtract from 1 to get the proportion above 3.00: 1 - 0.9368053301 = 0.0631946699.
- Multiply by 100: 0.0631946699 × 100 = 6.32%.
6.32% is your answer.
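
The normalcdf step can be reproduced in software as a check. This is a minimal sketch, assuming Python's standard-library `NormalDist` and the summary statistics quoted on this card; it is not a replacement for inspecting the histogram first, as the question asks.

```python
from statistics import NormalDist

# Mean and standard deviation of Average G.P.A. from the summary table
gpa = NormalDist(mu=2.6557333, sigma=0.22523224)

# P(G.P.A. > 3.00) = 1 - P(G.P.A. <= 3.00)
p_above = 1 - gpa.cdf(3.00)
print(f"{p_above:.2%}")
```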

The company that sells frozen pizza to stores in four markets in the United States​ (Denver, Baltimore,​ Dallas, and​ Chicago) wants to examine the prices that the stores charge for pizza slices. The accompanying boxplots compare data from a sample of stores in each market. Complete parts a and b below. ​a) Do prices appear to be the same in the four​ markets? Explain.

​No, prices appear to be both higher on average and more variable in Baltimore than in the other three cities.

A company that sells frozen pizza to stores in four markets in the United States​ (Denver, Baltimore,​ Dallas, and​ Chicago) wants to examine the prices that the stores charge for pizza slices. Boxplots are given comparing the data from a sample of stores in each market. The mean price of pizza in Baltimore was​ $2.85, $0.23 higher than the mean price of​ $2.62 in Dallas. To see if that difference was​ real, or due to​ chance, we took the 156 prices from Baltimore and Dallas and mixed those 312 prices together. Then we randomly chose 2 groups of 156 prices​ 10,000 times, and computed the difference in mean price each time. The histogram shows the distribution of those​ 10,000 differences. Use the accompanying histogram and boxplots to complete parts a through c below. ​c) Consider a similar analysis using shuffling to compare prices in Chicago and Denver. Do you think that the actual difference in mean prices would be different from what you might expect by​ chance?

​No, since the majority of data in the boxplots for Chicago and Denver are close​ together, the difference in sample means would likely be a value observed often by randomly shuffling the data.

A company that sells frozen pizza to stores in four markets in the United States​ (Denver, Baltimore,​ Dallas, and​ Chicago) wants to examine the prices that the stores charge for pizza slices. Boxplots are given comparing the data from a sample of stores in each market. The mean price of pizza in Baltimore was​ $2.85, $0.23 higher than the mean price of​ $2.62 in Dallas. To see if that difference was​ real, or due to​ chance, we took the 156 prices from Baltimore and Dallas and mixed those 312 prices together. Then we randomly chose 2 groups of 156 prices​ 10,000 times, and computed the difference in mean price each time. The histogram shows the distribution of those​ 10,000 differences. Use the accompanying histogram and boxplots to complete parts a through c below ​b) Do you think the presence of the outliers in the accompanying boxplots affects your​ conclusion?

​No, the outliers lie fairly close to the minimum and maximum​ values, and only account for a small proportion of the observations.

The company that sells frozen pizza to stores in four markets in the United States​ (Denver, Baltimore,​ Dallas, and​ Chicago) wants to examine the prices that the stores charge for pizza slices. The accompanying boxplots compare data from a sample of stores in each market. Complete parts a and b below. ​b) Does the presence of any outliers affect your overall conclusions about prices in the four​ markets?

​No, the presence of outliers does not affect the overall conclusions.

A company must decide which of two delivery services it will contract with. During a recent trial​ period, the company shipped numerous packages with each service and kept track of how often deliveries did not arrive on time. Use the accompanying table to complete parts​ a) through​ c) below. ​c) The results here are an instance of what​ phenomenon? Choose the correct answer below.

​Simpson's Paradox

A company must decide which of two delivery services it will contract with. During a recent trial​ period, the company shipped numerous packages with each service and kept track of how often deliveries did not arrive on time. Use the accompanying table to complete parts​ a) through​ c) below. ​b) On the basis of the results in part​ a, the company has decided to hire Company A. Based on the information given in the data​ table, do you agree that Company A delivers on time more​ often? Why or why​ not? Be specific. Choose the correct answer below.

- Percentage of late regular deliveries for Company A: (14/400) times 100% = 3.5%
- Percentage of late overnight deliveries for Company A: (14/100) times 100% = 14%
- Percentage of late regular deliveries for Company B: (2/100) times 100% = 2%
- Percentage of late overnight deliveries for Company B: (28/400) times 100% = 7%
- Compare the individual percentages to one another to determine the better company in terms of delivery times: 2% is less than 3.5% for regular deliveries, and 7% is less than 14% for overnight deliveries.
ANSWER b) No, Company B has a lower percentage of regular deliveries that are late and a lower percentage of overnight deliveries that are late.

A company must decide which of two delivery services it will contract with. During a recent trial​ period, the company shipped numerous packages with each service and kept track of how often deliveries did not arrive on time. Use the accompanying table to complete parts​ a) through​ c) below. ​a) Compare the two​ services' overall percentage of late deliveries.

- The total number of late deliveries for Company A is 14 + 14 = 28.
- The total number of overall deliveries for Company A is 400 + 100 = 500.
- Overall percentage of late deliveries for Company A: (28/500) times 100 = 5.6%
- The total number of late deliveries for Company B is 2 + 28 = 30.
- The total number of overall deliveries for Company B is 100 + 400 = 500.
- Overall percentage of late deliveries for Company B: (30/500) times 100 = 6%
ANSWER: Overall, Company A is late 5.6% of the time. Company B is late 6.0% of the time.
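
The overall and by-type percentages can be computed side by side to see Simpson's Paradox in these numbers. The counts below are reconstructed from the arithmetic on these cards (the original table is not reproduced here); in particular, Company A's overnight figure of 14 late out of 100 is inferred from its totals of 14 + 14 late and 400 + 100 deliveries. Treat the layout as an illustrative sketch.

```python
# (late deliveries, total deliveries) by delivery type
deliveries = {
    "Company A": {"regular": (14, 400), "overnight": (14, 100)},
    "Company B": {"regular": (2, 100), "overnight": (28, 400)},
}

def pct(late, total):
    return 100 * late / total

overall = {}
for company, by_type in deliveries.items():
    late = sum(l for l, _ in by_type.values())
    total = sum(t for _, t in by_type.values())
    overall[company] = pct(late, total)
    detail = ", ".join(f"{kind} {pct(*c):.1f}%" for kind, c in by_type.items())
    print(f"{company}: overall {overall[company]:.1f}% late ({detail})")
```

Company A looks better overall only because most of its packages went by the delivery type that is late less often.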

A survey of autos parked in student and staff lots at a large university classified the brands by country of​ origin, as seen in the table. Complete parts a through f below. b) What percent of the American cars were owned by​ students? ​

-To calculate the​ percent, divide the number of American cars owned by students by the total number of American cars. 106+105= 211 -The total number of American cars was 211 -The number of American cars owned by students was 106. Divide the count 106 by the total 211 and convert the result to a percent​,rounding to one decimal place. 106/211=0.502 b) 50.2%

An incoming MBA student took placement exams in economics and mathematics. In​ economics, she scored 81 and in math 86. The overall results on the economics exam had a mean of 71 and a standard deviation of 7​, while the mean math score was 66​, with a standard deviation of 11. On which exam did she do better compared with the other​ students?

- To determine which exam the student did better on, find her z-score on each test using z = (y - μ)/σ, where y is the student's score on the exam, μ is the mean of the exam scores, and σ is the standard deviation. The exam she performed better on is the one with the larger z-score.
- Economics (mean 71, standard deviation 7): z = (81 - 71)/7 = 1.43
- Math (mean 66, standard deviation 11): z = (86 - 66)/11 = 1.82
- Since the z-score for the mathematics exam is higher than the z-score for the economics exam, the student performed better on the mathematics exam.
ANSWER: Since she scored 1.43 standard deviations above the mean in economics and 1.82 standard deviations above the mean in mathematics, she did better on the mathematics exam.
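
The same comparison as a short sketch; the helper name `z_score` is an illustrative choice, and the numbers are the ones given in the question.

```python
def z_score(y, mean, sd):
    """Number of standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

z_econ = z_score(81, mean=71, sd=7)
z_math = z_score(86, mean=66, sd=11)

better = "mathematics" if z_math > z_econ else "economics"
print(round(z_econ, 2), round(z_math, 2), better)
```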

A company that manufactures rivets believes the shear strength​ (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. ​c) About what percent of these rivets can be expected to fall below 690 pounds?

-Use the 68-95-99.7 rule to determine the percent of rivets that are more than two standard deviations from the mean. -About ​100%−​95%=​5% of rivets are more than two standard deviations from the mean. Since the Normal model is​ symmetric, divide that result by two to get the percent that are more than two standard deviations greater than the mean. -About​ 2.5% of rivets have a strength greater than 690 pounds C) ​Therefore, about​ 97.5% of rivets have a strength less than 690 pounds.

A highly rated community college has over​ 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The​ college's leadership is very interested in the relationship between the class size of its statistics courses and​ students' final grades for the course.​ Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The​ college's institutional research department recently collected data for analysis in order to support​ leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted​ analysis, using archived​ data, to determine the class size of these 300 class sections. The Class​ Number, Campus, Class​ Size, Average Final​ Grade, Number of​ "F"s, Average G.P.A. and​ Successful/Unsuccessful data were collected for these 300 class sections. Using the Empirical Rule with a mean of 2.66 and a standard deviation of​ 0.23, what percent of Average G.P.A. data fall above​ 3.35?

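3.35 is three standard deviations above the mean (2.66 + 3(0.23) = 3.35). About 99.7% of the data fall within three standard deviations, leaving 0.3% outside, half of it in each tail. 0.15%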

Fifty-three men completed the​ men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times​ (in seconds) for all competitors. Complete parts a through d below ​a) The mean time was 103.90 seconds, with a standard deviation of 2.65 seconds. If the Normal model is​ appropriate, what percent of times will be greater than 108.49seconds?

- Find the z-score using the formula z = (y - μ)/σ, where y is the observation, μ is the mean, and σ is the standard deviation, rounding to two decimal places: z = (108.49 - 103.90)/2.65 = 1.73.
- We need Area(z > 1.73), the area to the right of 1.73, rounded to three decimal places.
- On your calculator, choose the normalcdf option. Enter the z-score as the lower bound, then press 2nd, comma, and 99 to enter positive 1E99 as the upper bound. Leave the mean (0) and standard deviation (1) as they are.
- Area(z > 1.73) = 0.042. Multiply 0.042 by 100.
a) 4.2% of times will be greater than 108.49 seconds.
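
A hedged software version of the normalcdf step, assuming Python's standard-library `NormalDist`. It works on the N(103.90, 2.65) model directly, so no upper bound such as 1E99 is needed: the right-tail area is one minus the cumulative area.

```python
from statistics import NormalDist

times = NormalDist(mu=103.90, sigma=2.65)

z = (108.49 - times.mean) / times.stdev   # z-score of the cutoff time
area_right = 1 - times.cdf(108.49)        # P(time > 108.49 seconds)

print(round(z, 2), round(area_right, 3))
```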

A​ company's customer service hotline handles many calls relating to​ orders, refunds, and other issues. The​ company's records indicate that the median length of calls to the hotline is 5.2 minutes with an IQR of 2.7 minutes. ​a)If the company were to describe the duration of these calls in seconds instead of​ minutes, what would the median and IQR​ be? ​b)In an effort to speed up the customer service​ process, the company decides to streamline the series of pushbutton menus customers must​ navigate, cutting the time by 36 seconds. What will the median and IQR of the length of hotline calls​ become?

PART A -Since the durations of all the calls are being multiplied by​ 60, multiply the median and IQR by 60. 5.2 times 60 = 312 2.7 times 60 = 162 Therefore, the new median will be 312 seconds and the new IQR will be 162 seconds. PART B Find the new median after the durations of all the calls have decreased by 36 seconds 312-36= 276 After the durations of all the calls have been decreased by 36 seconds the IQR will remain the SAME. Therefore, the new median is 276 seconds and the IQR remains the same at 162 seconds.
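
The rule at work here: multiplying every value by a constant rescales both center and spread, while adding or subtracting a constant shifts the center only. A minimal sketch with the numbers from this card (variable names are illustrative):

```python
median_min, iqr_min = 5.2, 2.7

# Part a: minutes -> seconds multiplies every call length by 60
median_sec = median_min * 60
iqr_sec = iqr_min * 60

# Part b: cutting 36 seconds from every call shifts the median only
new_median = median_sec - 36
new_iqr = iqr_sec            # spread is unchanged by a shift

print(round(median_sec), round(iqr_sec), round(new_median), round(new_iqr))
```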

The mean household income in the US in 2019 was about ​$89,930 and the standard deviation was about ​$85,000. ​(The median income was ​$59,039​.) ​a) If a Normal model is used for these​ incomes, what would be the household income of the top 15​%? ​b) How confident can you be in the answer in part​ a? ​c) Why might the Normal model not be a good one for​ incomes?

PART A
- Divide 15 by 100 = 0.15.
- Find the cut point using the invNorm option on your calculator. Type in 0.15 for the area. The mean should remain 0 and the standard deviation should remain 1. Switch the tail to RIGHT and press Enter.
- The cut point is z = 1.04.
- Use z = (y - μ)/σ to convert the z-score into an income by solving for y: 1.04 = (y - 89,930)/85,000, so y = 178,330.
a) The household income of the top 15% would be $178,330 or more.
b) It is only possible to be confident in the answer from part a if the distribution of incomes is unimodal and symmetric without obvious outliers.
c) Since the median is much less than the mean and the standard deviation and mean are very close, the distribution of incomes is likely right skewed.
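
Part a can be cross-checked with Python's standard-library `NormalDist` (an assumed stand-in for invNorm). A right-tail area of 0.15 is the same as a left-tail area of 0.85. The sketch shows both the hand-style result (z rounded to 1.04) and the unrounded one, which lands a few hundred dollars lower because the large standard deviation magnifies the rounding in z.

```python
from statistics import NormalDist

mu, sigma = 89_930, 85_000

# Top 15% = right-tail area 0.15 = left-tail area 0.85
z = NormalDist().inv_cdf(1 - 0.15)

cutoff_rounded_z = mu + round(z, 2) * sigma   # hand calculation with z = 1.04
cutoff_exact = mu + z * sigma                 # no intermediate rounding

print(round(cutoff_rounded_z), round(cutoff_exact))
```

Either way, parts b and c still apply: the Normal model itself is doubtful for income data.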

A study was conducted on shoe sizes of​ students, reported in European sizes. For the​ women, the mean size was 37.44 with a standard deviation of 1.65. To convert European shoe sizes to U.S. sizes for​ women, use the equation shown below. USsize=EuroSize×0.7906−22 ​a) What is the mean​ women's shoe size for these responses in U.S.​ units? ​b) What is the standard deviation in U.S.​ units?

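PART A
US mean = European mean × 0.7906 - 22 = 37.44 × 0.7906 - 22 = 7.60
a) The mean women's shoe size in U.S. units is 7.60.
PART B
Subtracting 22 shifts every size by the same amount, so it does not change the spread; only the multiplication affects the standard deviation.
US SD = European SD × 0.7906 = 1.65 × 0.7906 = 1.30
b) The standard deviation in U.S. units is 1.30.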
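
A short sketch of the conversion, using the equation given in the question. The constant names are illustrative. The mean goes through the full equation, while the standard deviation is only scaled by the multiplier.

```python
SCALE, SHIFT = 0.7906, -22      # USsize = EuroSize * 0.7906 - 22

euro_mean, euro_sd = 37.44, 1.65

us_mean = euro_mean * SCALE + SHIFT   # center: scaled and shifted
us_sd = euro_sd * abs(SCALE)          # spread: scaled only

print(round(us_mean, 2), round(us_sd, 2))
```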

A study looked at outliers arising from a plot of average wind speed by month in the Hopkins Forest. Each was associated with an unusually strong​ storm, but which was the most remarkable for its​ month? The summary statistics for each month are shown in the accompanying table. The outliers had values of 6.729 ​mph, 3.931 ​mph, and 2.533 ​mph, for​ February, June, and​ August, respectively. ​a) What are their​ z-scores? ​b) Which was the most extraordinary wind​ event?

PART A
z = (y - ȳ)/s, where y = observation, ȳ = mean, s = standard deviation. The mean and standard deviation for each month come from the summary table.
- February: y = 6.729, ȳ = 2.3237, s = 1.5766, so z = (6.729 - 2.3237)/1.5766 = 2.79
- June: y = 3.931, ȳ = 0.8565, s = 0.7948, so z = (3.931 - 0.8565)/0.7948 = 3.87
- August: y = 2.533, ȳ = 0.626, s = 0.5969, so z = (2.533 - 0.626)/0.5969 = 3.19
PART B
b) The most extraordinary wind event was in JUNE. It has the LARGEST z-score.
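
The three z-scores can be computed in one pass. The monthly means and standard deviations below are the ones quoted in the answer above; the dictionary layout is just one illustrative way to hold them.

```python
# month: (outlier value, monthly mean, monthly standard deviation), all in mph
storms = {
    "February": (6.729, 2.3237, 1.5766),
    "June": (3.931, 0.8565, 0.7948),
    "August": (2.533, 0.626, 0.5969),
}

z_scores = {month: (y - mean) / sd for month, (y, mean, sd) in storms.items()}
most_extreme = max(z_scores, key=z_scores.get)

for month, z in z_scores.items():
    print(month, round(z, 2))
print("most extraordinary:", most_extreme)
```

February had the highest wind speed in mph, but June's storm was the most unusual relative to its own month.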

The accompanying histogram shows the distribution of mean ACT composite scores for all Wisconsin public schools in 2019.​ 75.3% of the data points fall between one standard deviation below the mean and one standard deviation above the mean. Complete parts a and b below. ​b) The Normal probability plot on the left shows the distribution of these scores. The plot on the right shows the same data with the Milwaukee area schools​ (mostly in the low​ mode) removed. What do these plots tell you about the shape of the​ distributions? Select the correct choice below.

PART B The shape of the distribution with the Milwaukee area schools removed is more approximately Normal than the distribution with all the schools.

The accompanying histogram shows the distribution of mean ACT composite scores for all Wisconsin public schools in 2019.​ 75.3% of the data points fall between one standard deviation below the mean and one standard deviation above the mean. Complete parts a and b below. ​a) Give two reasons a Normal model is not appropriate for these data. Select all that apply.

Part A -If a Normal model were​ appropriate, about​ 68% of the data points would fall within one standard deviation of the mean. -The data are bimodal with a high mode and a low mode.

A company that sells frozen pizza to stores in four markets in the United States​ (Denver, Baltimore,​ Dallas, and​ Chicago) wants to examine the prices that the stores charge for pizza slices. Boxplots are given comparing the data from a sample of stores in each market. The mean price of pizza in Baltimore was​ $2.85, $0.23 higher than the mean price of​ $2.62 in Dallas. To see if that difference was​ real, or due to​ chance, we took the 156 prices from Baltimore and Dallas and mixed those 312 prices together. Then we randomly chose 2 groups of 156 prices​ 10,000 times, and computed the difference in mean price each time. The histogram shows the distribution of those​ 10,000 differences. Use the accompanying histogram and boxplots to complete parts a through c below. ​a) Given this​ histogram, what do you conclude about the actual difference of​ $0.23 between the mean prices of Baltimore and​ Dallas?

Since the resampling process never generated a difference in sample means close to​ $0.23, it appears that the observed difference of​ $0.23 did not occur by chance.
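
The shuffling procedure described in the question can be sketched as below. The actual 312 store prices are not on this card, so the code generates HYPOTHETICAL prices (Normal, with the stated means of $2.85 and $2.62 and an assumed standard deviation of $0.30) purely to show the mechanics; the printed numbers illustrate the method and are not results from the study.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 156

# HYPOTHETICAL stand-in data: the real prices are not available here
baltimore = [rng.gauss(2.85, 0.30) for _ in range(n)]
dallas = [rng.gauss(2.62, 0.30) for _ in range(n)]
observed = sum(baltimore) / n - sum(dallas) / n

# Mix all 312 prices, then repeatedly split them into two random groups of 156
pooled = baltimore + dallas
diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)
    diffs.append(sum(pooled[:n]) / n - sum(pooled[n:]) / n)

# Share of shuffles with a difference at least as large as the observed one
share = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
print(f"observed difference: {observed:.3f}")
print(f"share of shuffles at least that extreme: {share:.4f}")
```

If shuffled differences almost never reach the observed one, chance alone is an unlikely explanation, which is the reasoning behind the answer above.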

The thrill of riding a roller coaster is​ addictive, both for its users who take the​ ride, and for the designers and engineers to make them bigger and faster. On the​ internet, there are many data sets available that rank popular roller coasters according to their maximum​ speed, the​ g-force experienced, or the​ height, to allow you maximizing your thrill. In this case​ study, we will investigate one of these data​ sets, counting 408 roller​ coasters, half of them in North​ America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height​ (in meters), Speed​ (in miles per​ hours, mph), Length​ (in meters), and Duration​ (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood​ (Type or also​ Construction), if there are any Inversions​ (Yes/No) and if​ so, how many​ (Numinversions), the maximum​ g-force (GForce) or when it was constructed​ (Opened). As with all real data​ sets, the data are​ incomplete: for quite a lot of​ coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data​ set, excluding only those records that contain missing values for the specific analysis. Given that a ride is such​ thrill, one can only hope it lasts as long as possible. So what determines the variable Duration of a​ ride? Let us first look at the distribution of the variable duration. Construct a histogram of the distribution of Duration. What describes best the distribution of​ Duration?

Somewhat skewed right

A survey of autos parked in student and staff lots at a large university classified the brands by country of​ origin, as seen in the table. Complete parts a through f below.​ e) What are the conditional distributions of origin by driver​ classification?

The driver classifications are the columns, so the conditional distributions are the column percentages. Part 1 Calculate the column totals. Recall from part c that the total number of student cars is 106+30+58=194. The total number of staff cars is 105+13+50=168. Part 2 Divide each cell in the student column by the student total 194 to get the conditional distribution of origin for STUDENTS and convert the results to percents, rounding to one decimal place: 106/194=0.546 times 100 = 54.6%, 30/194=0.155 times 100 = 15.5%, 58/194=0.299 times 100 = 29.9%. Divide each cell in the staff column by the staff total 168 to get the conditional distribution of origin for STAFF and convert the results to percents, rounding to one decimal place: 105/168=0.625 times 100 = 62.5%, 13/168=0.077 times 100 = 7.7%, 50/168=0.298 times 100 = 29.8%.
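
As a check on the column percentages, here is a minimal Python sketch. The counts come from the card; the row labels (American, European, Asian) are taken from the marginal-distribution card for the same table, and the helper name `column_percents` is an assumption made for illustration.

```python
# Counts of cars by origin (rows) and driver classification (columns).
counts = {
    "American": {"Student": 106, "Staff": 105},
    "European": {"Student": 30, "Staff": 13},
    "Asian": {"Student": 58, "Staff": 50},
}

def column_percents(table, column):
    """Conditional distribution of origin within one driver classification."""
    total = sum(row[column] for row in table.values())
    return {origin: round(100 * row[column] / total, 1)
            for origin, row in table.items()}

student = column_percents(counts, "Student")
staff = column_percents(counts, "Staff")
print("Student:", student)
print("Staff:  ", staff)
```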

Some IQ tests are standardized to a Normal model ​N(100​,15​). b) What cutoff value bounds the lowest 25​% of the​ IQs?

The lowest 25% of the IQs corresponds to the 25th percentile. 25/100 = 0.25 -Find the cut point using the invNorm function on your calculator. Enter 0.25 for area. Leave both the mean and standard deviation as they are. Move the tail to the LEFT. -The cut point is z = -0.67 (about -0.6745 before rounding). z = (y-μ)/σ, so -0.67 = (y-100)/15 and y ≈ 89.9 when the unrounded z is used. b) The cutoff value is 89.9
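
The calculator's invNorm step can also be reproduced with Python's standard library. This is a sketch, not the method the course requires; `statistics.NormalDist.inv_cdf` plays the role of invNorm with the tail on the left.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # the N(100, 15) model from the question

# inv_cdf takes the area to the LEFT of the cut point, like invNorm.
cutoff_low_25 = iq.inv_cdf(0.25)
z_low_25 = (cutoff_low_25 - iq.mean) / iq.stdev

print(f"z = {z_low_25:.2f}, cutoff = {cutoff_low_25:.1f}")
```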

A survey of autos parked in student and staff lots at a large university classified the brands by country of​ origin, as seen in the table. Complete parts a through f below.​ d) What is the marginal distribution of​ origin? ​

The marginal distribution of origin can be calculated by dividing each row total by the total for all of the rows. Total of row 1 (American): 106+105 = 211. Total of row 2 (European): 30+13 = 43. Total of row 3 (Asian): 58+50 = 108. TOTAL OF ALL THE ROWS: 211+43+108 = 362. Divide the row totals by the total for all the rows to get the marginal distribution of origin and convert the results to percents, rounding to one decimal place. 211/362=0.583 times 100 = 58.3%, 43/362=0.119 times 100 = 11.9%, 108/362=0.298 times 100 = 29.8%. d) MARGINAL: 58.3%, 11.9%, 29.8%
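
The row-total arithmetic above can be checked with a short Python sketch. The counts and row labels are the ones on the card; the variable names are chosen here for illustration.

```python
# (student count, staff count) for each origin, as listed on the card.
cells = {
    "American": (106, 105),
    "European": (30, 13),
    "Asian": (58, 50),
}

row_totals = {origin: sum(pair) for origin, pair in cells.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of origin: each row total as a percent of all cars.
marginal = {origin: round(100 * total / grand_total, 1)
            for origin, total in row_totals.items()}
print(row_totals, grand_total)
print(marginal)
```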

Fifty-three(53) men completed the​ men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times​ (in seconds) for all competitors. Complete parts a through d below ​b) What is the actual percent of times greater than 108.49 seconds

To find the actual percent of times greater than 108.49 seconds, count all the times in the table that are greater than 108.49 seconds. There are 5 such times. Divide that count by the total number of men, which in this case is 53, and multiply by 100 to find the percentage: 5/53 = 0.094, and 0.094 times 100 = 9.4%

The Annenberg Inclusion Initiative tallied the number of films released by seven top distributors in 2019 that had female leads or​ co-leads and that had leads or​ co-leads who were from an underrepresented population. The results are shown in the accompanying table. Which studio earned the most on average from those​ films?

Walt Disney Studios earned the most on average. EXPLANATION -The data for the average revenues from films with female leads is in the fourth column of the given table. The largest value in the fourth column is the 1020.1 in the row for Walt Disney Studios. So that studio earned the most on average for films with female leads.

A National Vital Statistics Report provides information on deaths by​ age, sex, and race. Displays of the distributions of ages at death for White and Black males are provided. Use these displays to complete parts a through c below. ​a) Describe the overall shapes of these distributions. ​b) How do the distributions​ differ? ​c) Look carefully at the bar definitions. Where do these plots violate the rules for statistical​ graphs? Select all that apply.

a) Both distributions are skewed to the left. The White distribution has one small​ peak, while the Black distribution has three. b)The center for the distribution of Black males is less than the center of the distribution of White males. c)The interval widths are not constant. The vertical axes do not have the same maximum.

The thrill of riding a roller coaster is​ addictive, both for its users who take the​ ride, and for the designers and engineers to make them bigger and faster. On the​ internet, there are many data sets available that rank popular roller coasters according to their maximum​ speed, the​ g-force experienced, or the​ height, to allow you maximizing your thrill. In this case​ study, we will investigate one of these data​ sets, counting 408 roller​ coasters, half of them in North​ America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height​ (in meters), Speed​ (in miles per​ hours, mph), Length​ (in meters), and Duration​ (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood​ (Type or also​ Construction), if there are any Inversions​ (Yes/No) and if​ so, how many​ (Numinversions), the maximum​ g-force (GForce) or when it was constructed​ (Opened). As with all real data​ sets, the data are​ incomplete: for quite a lot of​ coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data​ set, excluding only those records that contain missing values for the specific analysis. a) What variables in our data set are of categorical​ type? Select all that apply. b) What variables in our data set are of quantitative​ type? Select all that apply. What are the measurement units of the variable​ Type?

a) Country, Inversions, Type b) Numinversions, Height, Duration, Opened, GForce, Speed, Length c) As a categorical variable, Type has no units.

Pew Research surveyed 5006 U.S. adults to ask their opinions about the state of jobs in the United States in 2016. Respondents were asked how satisfied they are with their current job and how their current standard of living compares with that of their parents at the same age. The accompanying table summarizes their responses. Complete parts a and b below. ​ a) Is this a table of row​ percents, column​ percents, or table​ percents? How can you​ tell? ​b) Which of the following can you tell from this​ table? If you​ can, then give the value specified. i. What percent of all respondents are both worse off than their parents and somewhat satisfied with their​ jobs? Select the correct choice below​ and, if​ necessary, fill in the answer box to complete your choice. ii. What percent of those respondents who are better off than their parents were at the same age are nevertheless dissatisfied with their current​ job? Select the correct choice below​ and, if​ necessary, fill in the answer box to complete your choice. iii. What percent of those respondents who are dissatisfied with their current job are actually better off than their parents were at the same​ age? Select the correct choice below​ and, if​ necessary, fill in the answer box to complete your choice. iv. What percent of all respondents are dissatisfied with their current​ job? Select the correct choice below​ and, if​ necessary, fill in the answer box to complete your choice

a) The table has column percents because the total for each column is about 100​%. b)i)The value cannot be found from the table. ii)The value cannot be found from the table. iii) 39.2% iv) The value cannot be found from the table.

A survey of 299 undergraduate students asked about​ respondents' diet preference​ (Carnivore, Omnivore,​ Vegetarian) and political alignment​ (Liberal, Moderate,​ Conservative). A mosaic plot of the results is given. Complete parts a through d below. ​a) Are there more men or women in the​ survey? Explain briefly. ​b) Does there appear to be an association between Politics and​ Gender? Explain briefly. ​c) Does there appear to be an association between Politics and​ Diet? Explain briefly. d) Does the association between Politics and Diet seem to differ between men and​ women? Explain briefly.

a) There are more men because the total area of the bars labeled M is larger. b)There appears to be a strong association between Politics and Gender because the relative proportion of females in each group for Politics tends to become smaller as the political alignment becomes more conservative. c)There appears to be a strong association between Politics and Diet because the conditional distribution of diet tends to include meat more as the political alignment becomes more conservative. d)The difference in the proportion of vegetarians between liberals and conservatives is larger for women. The difference in the proportion of carnivores between liberals and conservatives is larger for men.

Students in an introduction to statistics course were asked to describe their politics as​ "Liberal," "Moderate," or​ "Conservative." ​a) Produce a graphical display comparing the conditional distributions of males and females among the three categories of politics. Choose the correct answer below. b) Comment briefly on what you see from the display in part a.

a) answer in the picture b)The proportions of females and males for conservatives are significantly different from the proportions for the other categories.

Here are boxplots of the points scored during the first 10 games of the season for both Diane and Kate. ​a) Summarize the similarities and differences in their performance so far. ​b) The coach can take only one player to the state championship. Which one should she​ take? Why?

a)Both girls have the same approximate​ median, but Diane has a larger IQR. b)A and B are both​ possible, depending on the​ coach's preference.(She should take Kate​, because she is a more consistent performer. She should take Diane​, because she has the ability to score a higher point total.)

The following histograms are of the assets​ (in millions of​ dollars) of 79 companies. ​a) Which​ re-expression of the assets histogram do you​ prefer? Why? ​b) In the square root​ re-expression, what does the value 45 actually indicate about the​ company's assets? c) In the logarithm​ re-expression, what does the value 3 actually indicate about the​ company's assets?

a) Logarithm re-expression, because its histogram is more symmetric. b) The value 45 is a square root, so squaring undoes it: (45)^2 = 2025. The company's actual assets would be 2025 million dollars. c) The value 3 is a base-10 logarithm, so (10)^3 = 1000. The company's actual assets would be 1000 million dollars.

The thrill of riding a roller coaster is​ addictive, both for its users who take the​ ride, and for the designers and engineers to make them bigger and faster. On the​ internet, there are many data sets available that rank popular roller coasters according to their maximum​ speed, the​ g-force experienced, or the​ height, to allow you maximizing your thrill. In this case​ study, we will investigate one of these data​ sets, counting 408 roller​ coasters, half of them in North​ America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height​ (in meters), Speed​ (in miles per​ hours, mph), Length​ (in meters), and Duration​ (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood​ (Type or also​ Construction), if there are any Inversions​ (Yes/No) and if​ so, how many​ (Numinversions), the maximum​ g-force (GForce) or when it was constructed​ (Opened). As with all real data​ sets, the data are​ incomplete: for quite a lot of​ coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data​ set, excluding only those records that contain missing values for the specific analysis. To satisfy our interest in the history of roller​ coasters, let us first look when most of them were designed. Construct a​ stem-and-leaf display of all roller coasters by year of opening. Part 2 a)Describe the shape of the distribution of the variable year of opening. Choose the correct answer below. b)From what year is the oldest roller​ coaster? c)In what time period are the most roller coasters opened according to the​ graph? d)What is the advantage of making a​ stem-and-leaf display instead of a​ dotplot? e)Describe in your own words the distribution of the observations of​ Opened, covering both​ shape, center, and spread of the distribution. 
f)Given that the distribution of opening years is skewed to the​ left, what combinations of measures of central tendency and spread best describe this​ distribution?

a)Skewed to the left b) 1924 c)​2000-2004 d)A​ stem-and-leaf display preserves the individual data values. e) The distribution is strongly skewed to the​ left, with 91 roller coasters that opened in the​ 2000-2005 time​ period, which is the mode of the distribution. Only a few of the roller coasters are from the​ 1920-1970 time​ period, making a long left tail. f)Median and interquartile range

The U.S. Census Bureau keeps track of the number of adoptions in each state. The accompanying histogram shows the distribution of adoptions for 47 of the states. ​a) Which would you expect to be​ larger: the median or the​ mean? Explain briefly. ​b) Which would you​ report: the mean or the​ median? Explain briefly.

a)The mean is larger because the distribution is skewed to the right, so the mean is pulled toward the higher values. b)The median is resistant to the skewed shape of the​ distribution, so it is a better choice for most summaries.

The thrill of riding a roller coaster is​ addictive, both for its users who take the​ ride, and for the designers and engineers to make them bigger and faster. On the​ internet, there are many data sets available that rank popular roller coasters according to their maximum​ speed, the​ g-force experienced, or the​ height, to allow you maximizing your thrill. In this case​ study, we will investigate one of these data​ sets, counting 408 roller​ coasters, half of them in North​ America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height​ (in meters), Speed​ (in miles per​ hours, mph), Length​ (in meters), and Duration​ (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood​ (Type or also​ Construction), if there are any Inversions​ (Yes/No) and if​ so, how many​ (Numinversions), the maximum​ g-force (GForce) or when it was constructed​ (Opened). As with all real data​ sets, the data are​ incomplete: for quite a lot of​ coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data​ set, excluding only those records that contain missing values for the specific analysis. Part 1 a) Construct the​ five-number summary of the distribution of opening years. b) One of the more spectacular things of a roller coaster are the​ loopings, or inversions as they are called in this data set. Is your interest in riding historic roller​ coasters, at odds with achieving maximum thrill by going through at least one​ looping? Construct a​ side-by-side boxplot and​ stem-and-leaf plots of the distribution of opening​ years, grouped by the presence of inversions. c) Summarize the main differences between the two distributions of openings years for roller coasters with and without loopings.

a)The minimum is equal to 1924. The first quartile is equal to 1991. The median is equal to 1999. The third quartile is equal to 2004. The maximum is equal to 2014. b) The oldest 15 coasters are all without any inversion. The newest 3 coasters are all with inversion. Both distributions have equal median opening year. c) One major difference is that roller coasters with inversions are relatively​ new, built from 1975 on. As a​ result, the distribution of opening years of roller coasters with loopings is much less strongly left skewed than the distribution of coasters without​ loopings, which date back to the​ 1920's.

Suppose your statistics professor reports test grades as​ z-scores, and you got a score of 1.78 on an exam. ​a) Write a sentence explaining what that means. ​b) Your friend got a​ z-score of −2. If the grades satisfy the Nearly Normal​ Condition, about what percent of the class scored lower than your​ friend?

a) The score was 1.78 standard deviations higher than the mean score in the class. Part B Use the 68-95-99.7 picture: the area lower than z = -2 is 2.35% + 0.15% = 2.5%. b) About 2.5% of the class scored lower than your friend.

ANOTHER EXAMPLE Suppose your statistics professor reports test grades as​ z-scores, and you got a score of 2.59 on an exam. ​a) Write a sentence explaining what that means. ​b) Your friend got a​ z-score of −1. If the grades satisfy the Nearly Normal​ Condition, about what percent of the class scored lower than your​ friend?

a) The score was 2.59 standard deviations higher than the mean score in the class. Part B Use the 68-95-99.7 picture: the area lower than z = -1 is 13.5% + 2.35% + 0.15% = 16%. b) About 16% of the class scored lower than your friend.
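
The tail areas in the two z-score examples above follow from the 68-95-99.7 Rule by symmetry: whatever is not within k standard deviations is split evenly between the two tails. A small Python sketch of that reasoning follows; the function name `percent_below` is made up for illustration, and it only handles z-scores of ±1, ±2 and ±3.

```python
# Percent of a Normal model within k standard deviations of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def percent_below(z):
    """Approximate percent of values below z, for z in {-3, -2, -1, 1, 2, 3}."""
    one_tail = (100.0 - WITHIN[abs(z)]) / 2  # the two tails are equal by symmetry
    return one_tail if z < 0 else 100.0 - one_tail

for z in (-2, -1):
    print(f"about {percent_below(z):.1f}% of the class scored below z = {z}")
```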

The Titanic was a British passenger liner that sank in the North Atlantic Ocean in​ 1912, after colliding with an iceberg during her maiden voyage from Southampton to New York City. The sinking caused more than​ 1,500 passengers and crew to​ die, making it one of the deadliest peacetime maritime disasters in modern history. The Titanic catastrophe is not only famous for the​ movie, but also it allows several types of statistical​ analyses, and is therefore a popular case study in statistics texts. One of these analyses focuses on the following​ question: who survived the​ sinking? "Women and children​ first" is a code of conduct dating from​ 1852, whereby the lives of women and children were to be saved first in a​ life-threatening situation, typically abandoning​ ship, when survival resources such as lifeboats were limited. But is it indeed true that women and children had the best perspectives to​ survive? This data set has the survival data for the passengers on the Titanic.​ Also, for each​ person, the data set indicates if they were an adult or​ child, their​ gender, and the class they were staying in. The fact that survival rates by class were unequal can be nicely illustrated by different types of graphs of the association of Survival and​ Class: the​ side-by-side bar​ chart, the segmented bar​ chart, and the​ side-by-side pie chart. Focus on the segmented or stacked bar chart. Please generate one such bar chart with Class as​ Column, choosing Group by​ Survival, Stack Bars as grouping​ option, and Percent​ (within category) as Type.​ Next, redo the segmented bar​ chart, now with Percent as Type. a) Which of the two graphs allows you to read if more crew than first class passengers​ (in absolute​ numbers) survived? b)Which of the two graphs allows you to read how the proportion of crew that survived compares to the proportion of first class passengers that​ survived?

a)Type as Percent is the better display because the goal is to compare total counts. b)Type as Percent​ (within category) is the better display because the goal is to compare relative frequencies.

The pie chart shows the ratings assigned to 846 ​first-run movies released in a recent year. ​a) Is this an appropriate display for these​ data? Explain. ​b) Which was the least common​ rating?

a)Yes, because each movie falls into only one category and no categories overlap. b) NC-17

Use the Normal model ​N(100​,16​) describing IQ scores to answer the following. ​a) What percent of​ people's IQs are expected to be over 80​? ​b) What percent of​ people's IQs are expected to be under 95​? ​c) What percent of​ people's IQs are expected to be between 116 and 124​?

PART A Mean = 100, standard deviation = 16. Step 1: Find the z-score using the formula z=(y-μ)/σ = (80-100)/16, so z = -1.25. Step 2: Find the area to the left of z = -1.25 using the normalCDF option on your calculator (make sure to use the negative sign, not the minus sign). The area to the left of z = -1.25 is 0.1056. -Since the area to the left of z = -1.25 is 0.1056, 10.56% of people have IQs BELOW 80 (10.56% was found by multiplying 0.1056 by 100). -Subtract this percent from 100% to find the percent of people with IQs ABOVE 80. 100-10.56 = 89.4 a) Approximately 89.4% of people's IQs are expected to be above 80. PART B To find the percent of people with IQs under 95, use the same formula. (95-100)/16 gives z = -0.31. The area to the left of z = -0.31 is 0.378. b) So 37.8% of people's IQs are expected to be under 95. PART C Step 1: Find the z-score of people with IQs equal to 116. (116-100)/16 gives z = 1. The area to the left of z = 1 is 0.841. Step 2: Find the z-score of people with IQs equal to 124. (124-100)/16 gives z = 1.5. The area to the left of z = 1.5 is 0.933. Step 3: Subtract the area to the left of z = 1 from the area to the left of z = 1.5. 0.933-0.841 = 0.092 Step 4: Multiply the difference by 100 to find the percent of people with IQs between 116 and 124. 0.092 times 100 = 9.2% c) So 9.2% of people's IQs are expected to be between 116 and 124.
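
The three normalCDF lookups can be cross-checked with Python's standard library. This is a sketch rather than the calculator procedure the card describes; `NormalDist.cdf` returns the area to the left of a value, and because it does not round z to two decimals, the last digit can differ slightly from the hand-worked answers.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)  # the N(100, 16) model from the question

# a) over 80: complement of the area to the left of 80
pct_over_80 = 100 * (1 - iq.cdf(80))
# b) under 95: area to the left of 95
pct_under_95 = 100 * iq.cdf(95)
# c) between 116 and 124: difference of two left-hand areas
pct_116_to_124 = 100 * (iq.cdf(124) - iq.cdf(116))

print(f"over 80: {pct_over_80:.1f}%")
print(f"under 95: {pct_under_95:.1f}%")
print(f"between 116 and 124: {pct_116_to_124:.1f}%")
```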

A company that manufactures rivets believes the shear strength​ (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. ​a) Draw and label the Normal model. Choose the correct graph below.

Step 1: μ=650, σ=20. Determine the values that are 1 standard deviation away from the mean: 650-1(20)=630 and 650+1(20)=670. Determine the values that are 2 standard deviations away from the mean: 650-2(20)=610 and 650+2(20)=690. Determine the values that are 3 standard deviations away from the mean: 650-3(20)=590 and 650+3(20)=710. a) In the Normal model for these rivets: -about 68% of the rivets have a shear strength between 630 and 670 pounds -about 95% of the rivets have a shear strength between 610 and 690 pounds -about 99.7% of the rivets have a shear strength between 590 and 710 pounds
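
The interval endpoints for the rivet model can be generated with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are chosen here for illustration.

```python
mu, sigma = 650, 20  # N(650, 20) shear-strength model, in pounds

# 68-95-99.7 Rule: percent of values within 1, 2 and 3 standard deviations.
rule = {1: 68, 2: 95, 3: 99.7}

intervals = {pct: (mu - k * sigma, mu + k * sigma) for k, pct in rule.items()}

for pct, (low, high) in intervals.items():
    print(f"about {pct}% of rivets fall between {low} and {high} pounds")
```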

