Quantitative Analysis II: review of all the assignments (chapters 1-5)
The Annenberg Inclusion Initiative tallied the number of films released by seven top distributors in 2019 that had female leads or co-leads and that had leads or co-leads who were from an underrepresented population. The results are shown in the accompanying table. Which studio earned the most from films with leads from underrepresented populations?
Walt Disney Studios earned the most.
EXPLANATION
- The data for the total revenues from films with female leads is in the second column of the given table.
- The row that corresponds to the studio that earned the most will have the largest value.
- The largest value in the second column is the 4099.7 in the row for Walt Disney Studios. So that studio earned the most for films with female leads.
A particular IQ test is standardized to a Normal model, with a mean of 100 and a standard deviation of 15.
explained in depth in notebook.
A distribution is said to be skewed to the right if
it has a long tail that trails toward the right side
A distribution is said to be skewed to the left if
it has a long tail that trails to the left
Some IQ tests are standardized to a Normal model N(100,15). a) What cutoff value bounds the highest 10% of all IQs?
- The highest 10% of all IQs lie above the 90th percentile, so the area to the left of the cutoff is 90/100 = 0.9.
- Find the cut point using the invNorm function on your calculator. Enter 0.9 for the area, leave both the mean and standard deviation as they are, and move the tail to the LEFT.
- The cut point is z = 1.28.
- Solve z = (y-μ)/σ for y: 1.28 = (y-100)/15, so y = 119.2.
a) The cutoff value is 119.2.
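The invNorm step above can also be checked in software. The following is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are only illustrative.

```python
from statistics import NormalDist

# IQ model from the question: N(100, 15).
iq = NormalDist(mu=100, sigma=15)

# The top 10% starts where the area to the LEFT is 0.90,
# which is the same area entered into invNorm.
cutoff = iq.inv_cdf(0.90)

print(round(cutoff, 1))  # 119.2
```

Because the model already carries the mean and standard deviation, there is no separate step to convert z back into an IQ score.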
A company that manufactures rivets believes the shear strength (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. b) Would it be safe to use these rivets in a situation requiring a shear strength of 630 pounds? Explain
No, because about 16% of all of this company's rivets have a shear strength of less than 630 pounds.
Some IQ tests are standardized to a Normal model N(100,15). c) What cutoff values bound the middle 70% of the IQs?
- 100-70 = 30. The middle 70% leaves 30% left over, half on each side. Thus -z is at the 15th percentile and z is at the 85th percentile (from 100-15).
- The cutoff scores for the middle 70% are the 15th and 85th percentiles.
- Find the z-score of the 15th percentile and round to two decimal places. 15/100 = 0.15. Use the invNorm function on your calculator: enter 0.15 for the area, leave both the mean and standard deviation as they are, and move the tail to the LEFT. The cut point is z = −1.04.
- Solve z = (y-μ)/σ for y: −1.04 = (y-100)/15, so y = 84.4.
- Find the z-score of the 85th percentile and round to two decimal places. Use invNorm again with 0.85 for the area, the same mean and standard deviation, and the tail to the LEFT. The cut point is z = 1.04.
- Solve z = (y-μ)/σ for y: 1.04 = (y-100)/15, so y = 115.6.
c) The cutoff values are 84.4 and 115.6.
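The two-tail procedure above can be sketched in Python as follows. The helper name `middle_cutoffs` is made up for this example, and z is rounded to two decimal places on purpose so the result matches the calculator steps on the card.

```python
from statistics import NormalDist

def middle_cutoffs(middle, mu, sigma):
    """Return (low, high) values bounding the central `middle` fraction."""
    tail = (1 - middle) / 2                    # area left over on each side
    z = round(NormalDist().inv_cdf(tail), 2)   # negative z, rounded like the card
    return mu + z * sigma, mu - z * sigma

low, high = middle_cutoffs(0.70, 100, 15)
print(round(low, 1), round(high, 1))  # 84.4 115.6
```

Skipping the rounding of z gives cutoffs near 84.5 and 115.5, so a small difference from software output is just rounding, not a mistake in the method.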
A company that manufactures rivets believes the shear strength (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. d) Rivets are used in a variety of applications with varying shear strength requirements. What is the maximum shear strength for which you would feel comfortable approving this company's rivets? Explain your reasoning. Select the correct choice below and fill in the answer box to complete your choice.
- A very small probability of failure is desired, less than 1 out of a million, or 6 standard deviations below the mean.
- Since it's 6 standard deviations below the mean, use the formula μ - 6σ: 650 - 6(20) = 530 pounds.
d) A very small probability of failure is desired, less than 1 out of a million, or 6 standard deviations below the mean. The maximum shear strength that should be approved is 530 pounds.
People with z-scores of 2.25 or above on a certain aptitude test are sometimes classified as geniuses. If aptitude test scores have a mean of 100 and a standard deviation of 24 points, what is the minimum aptitude test score needed to be considered a genius?
- A z-score measures the distance of a data value from the mean in standard deviations. The z-score of a data value y is given by the formula z = (y−ȳ)/s, where ȳ is the mean and s is the standard deviation.
- Substitute the known values and solve for y: 2.25 = (y-100)/24, so y = 100 + 2.25(24) = 154.
The minimum aptitude test score needed to be considered a genius is 154 points.
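Solving the z-score formula for y, as done above, amounts to y = mean + z × s. A tiny Python sketch of that rearrangement (the function name is hypothetical):

```python
def score_from_z(z, mean, sd):
    """Invert z = (y - mean) / sd to recover the raw score y."""
    return mean + z * sd

genius_cutoff = score_from_z(2.25, mean=100, sd=24)
print(genius_cutoff)  # 154.0
```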
Here are boxplots of the points scored during the first 10 games of the season for both Holly and Sue. a) Summarize the similarities and differences in their performance so far. b) The coach can take only one player to the state championship. Which one should she take? Why?
-Both girls have the same approximate median, but Holly has a larger IQR. -A and B are both possible, depending on the coach's preference. (a: She should take Holly, because she has the ability to score a higher point total. b: She should take Sue, because she is a more consistent performer.)
A highly rated community college has over 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The college's leadership is very interested in the relationship between the class size of its statistics courses and students' final grades for the course. Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The college's institutional research department recently collected data for analysis in order to support leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted analysis, using archived data, to determine the class size of these 300 class sections. The Class Number, Campus, Class Size, Average Final Grade, Number of "F"s, Average G.P.A. and Successful/Unsuccessful data were collected for these 300 class sections. Using the Empirical Rule with a mean of 2.66 and a standard deviation of 0.23, between what two values do 68% of the Average G.P.A. data fall?
2.66-0.23= 2.43 2.66+0.23= 2.89 Between 2.43 and 2.89
Two companies are vying for a city's "Best Local Employer" award, to be given to the company most committed to hiring local residents. Although both employers hired 300 new people in the past year, Company A brags that it deserves the award because 70% of its new jobs went to local residents, compared to only 60% for Company B. Company B concedes that those percentages are correct, but points out that most of its new jobs were full-time, while most of Company A's were part-time. Not only that, says Company B, but a higher percentage of its full-time jobs went to local residents than did Company A's, and the same was true for part-time jobs. Thus, Company B argues, it's a better local employer than Company A. Suppose Company A filled 50 of 100 full-time and 160 of 200 part-time positions with local residents. Which hiring by Company B gives an example of the situation described above?
Company B filled 179 of 299 full-time positions and 1 of 1 part-time positions with local residents.
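A quick way to confirm that this answer produces the reversal described in the question is to compute the rates by job type and overall. This is a sketch, assuming Company B's single part-time hire is 1 of 1 (its 300 hires split as 299 full-time plus 1 part-time); the dictionary layout is only illustrative.

```python
def pct(local, total):
    """Percent of positions filled with local residents."""
    return 100 * local / total

# (local hires, total hires) by job type
company_a = {"full-time": (50, 100), "part-time": (160, 200)}
company_b = {"full-time": (179, 299), "part-time": (1, 1)}  # assumed reading: 1 of 1

def overall(company):
    local = sum(l for l, _ in company.values())
    total = sum(t for _, t in company.values())
    return pct(local, total)

for job in company_a:
    print(job, round(pct(*company_a[job]), 1), round(pct(*company_b[job]), 1))

# B is ahead within each job type, yet A is ahead overall.
b_wins_each_type = all(pct(*company_b[j]) > pct(*company_a[j]) for j in company_a)
a_wins_overall = overall(company_a) > overall(company_b)
print(overall(company_a), overall(company_b), b_wins_each_type, a_wins_overall)
```

The reversal happens because almost all of Company B's jobs are full-time, the category with the lower local-hire rate for both companies.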
Fifty-three men completed the men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times (in seconds) for all competitors. Complete parts a through d below d) Make a histogram of these times. What can be seen from the histogram? Choose the correct graph below.
The data are not Normal. They are unimodal and skewed right.
Fifty-three men completed the men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times (in seconds) for all competitors. Complete parts a through d below c) Why would these two percentages not agree?
The data distribution is not unimodal and symmetric.
A highly rated community college has over 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The college's leadership is very interested in the relationship between the class size of its statistics courses and students' final grades for the course. Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The college's institutional research department recently collected data for analysis in order to support leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted analysis, using archived data, to determine the class size of these 300 class sections. The Class Number, Campus, Class Size, Average Final Grade, Number of "F"s, Average G.P.A. and Successful/Unsuccessful data were collected for these 300 class sections. What needs to hold true in order to use the Empirical Rule?
The distribution must be approximately normal.
A university teacher saved every e-mail from students in a large introductory statistics class during an entire term. He then counted, for each student who had sent him at least one e-mail, how many e-mails each student had sent. The accompanying histogram shows the distribution of e-mails sent by students. Describe the shape of the distribution.
The shape is unimodal and skewed right
A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table. Complete parts a through f below. f) Do you think that the origin of the car is independent of the type of driver? Explain.
They do not appear to be independent because the conditional distributions of origin are significantly different for at least one group for the two driver classifications.
A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table. Complete parts a through f below. c) What percent of the students owned American cars?
To calculate the percent, divide the number of American cars owned by students by the number of cars owned by students. The number of cars owned by students was 106+30+58 = 194. The number of American cars owned by students was 106. Divide the count 106 by the total 194 and convert the result to a percent, rounding to one decimal place. 106/194 = 0.546 c) 54.6%
A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table. Complete parts a through f below. a) What percent of all the cars surveyed were foreign?
- To calculate the percent, divide the number of foreign cars by the total number of cars.
- The total number of cars was 106+105+30+13+58+50 = 362.
- The number of foreign cars was 30+13+58+50 = 151.
- Divide the count 151 by the total 362 and convert the result to a percent, rounding to one decimal place: 151/362 = 0.417.
a) 41.7%
Students were asked how many songs they had in their digital music libraries. To the right is a display of the responses. Use this display to complete parts a and b below. a) What aspect of this distribution makes it difficult to summarize, or to discuss, center and spread? b) What would you suggest doing with these data if we want to understand them better?
a) The extreme skew. b) Re-express the data.
The pie chart shows the ratings assigned to 842 first-run movies released in a recent year. a) Is this an appropriate display for these data? Explain. b) Which was the most common rating?
a) Yes, because each movie falls into only one category and no categories overlap. b) Not rated
A highly rated community college has over 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The college's leadership is very interested in the relationship between the class size of its statistics courses and students' final grades for the course. Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The college's institutional research department recently collected data for analysis in order to support leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted analysis, using archived data, to determine the class size of these 300 class sections. The Class Number, Campus, Class Size, Average Final Grade, Number of "F"s, Average G.P.A. and Successful/Unsuccessful data were collected for these 300 class sections. College leadership is interested in analyzing the average G.P.A. of this sample of 300 of its statistics class sections. Calculate the probability of randomly selecting a class section with an average G.P.A. greater than 3.00. (Use the mean and standard deviation of the Average G.P.A. data. Also, if appropriate based upon your visual analysis of a histogram of the Average G.P.A. data, use the Normal distribution to answer this question.)
- In StatCrunch, go to Stats > Summary Stats > Columns, select Average G.P.A., and click Compute. You'll get the values shown in the picture.
- On your calculator, choose normalcdf and leave the lower bound as it is.
- For the upper bound, type 3 (the G.P.A. cutoff from the question).
- For μ, type the value under the mean in the summary table, 2.6557333 (type the entire number).
- For σ, type the value under the standard deviation in the summary table, 0.22523224 (type the entire number).
- Press Enter. You'll get 0.9368053301, which is the area below 3.00.
- The area above 3.00 is 1 - 0.9368053301 = 0.0631946699.
- Multiply 0.0631946699 by 100 to get 6.32%.
6.32% is your answer.
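The normalcdf steps above can be reproduced with Python's standard library. This sketch uses the mean and standard deviation quoted on the card and assumes, as the question allows, that the Normal model is appropriate for the Average G.P.A. data.

```python
from statistics import NormalDist

# Summary statistics quoted on the card (from the StatCrunch output).
gpa = NormalDist(mu=2.6557333, sigma=0.22523224)

# cdf(3.00) is the area BELOW 3.00, so subtract from 1 for the upper tail.
p_above_3 = 1 - gpa.cdf(3.00)

print(f"{p_above_3:.2%}")  # 6.32%
```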
The company that sells frozen pizza to stores in four markets in the United States (Denver, Baltimore, Dallas, and Chicago) wants to examine the prices that the stores charge for pizza slices. The accompanying boxplots compare data from a sample of stores in each market. Complete parts a and b below. a) Do prices appear to be the same in the four markets? Explain.
No, prices appear to be both higher on average and more variable in Baltimore than in the other three cities.
A company that sells frozen pizza to stores in four markets in the United States (Denver, Baltimore, Dallas, and Chicago) wants to examine the prices that the stores charge for pizza slices. Boxplots are given comparing the data from a sample of stores in each market. The mean price of pizza in Baltimore was $2.85, $0.23 higher than the mean price of $2.62 in Dallas. To see if that difference was real, or due to chance, we took the 156 prices from Baltimore and Dallas and mixed those 312 prices together. Then we randomly chose 2 groups of 156 prices 10,000 times, and computed the difference in mean price each time. The histogram shows the distribution of those 10,000 differences. Use the accompanying histogram and boxplots to complete parts a through c below. c) Consider a similar analysis using shuffling to compare prices in Chicago and Denver. Do you think that the actual difference in mean prices would be different from what you might expect by chance?
No, since the majority of data in the boxplots for Chicago and Denver are close together, the difference in sample means would likely be a value observed often by randomly shuffling the data.
A company that sells frozen pizza to stores in four markets in the United States (Denver, Baltimore, Dallas, and Chicago) wants to examine the prices that the stores charge for pizza slices. Boxplots are given comparing the data from a sample of stores in each market. The mean price of pizza in Baltimore was $2.85, $0.23 higher than the mean price of $2.62 in Dallas. To see if that difference was real, or due to chance, we took the 156 prices from Baltimore and Dallas and mixed those 312 prices together. Then we randomly chose 2 groups of 156 prices 10,000 times, and computed the difference in mean price each time. The histogram shows the distribution of those 10,000 differences. Use the accompanying histogram and boxplots to complete parts a through c below b) Do you think the presence of the outliers in the accompanying boxplots affects your conclusion?
No, the outliers lie fairly close to the minimum and maximum values, and only account for a small proportion of the observations.
The company that sells frozen pizza to stores in four markets in the United States (Denver, Baltimore, Dallas, and Chicago) wants to examine the prices that the stores charge for pizza slices. The accompanying boxplots compare data from a sample of stores in each market. Complete parts a and b below. b) Does the presence of any outliers affect your overall conclusions about prices in the four markets?
No, the presence of outliers does not affect the overall conclusions.
A company must decide which of two delivery services it will contract with. During a recent trial period, the company shipped numerous packages with each service and kept track of how often deliveries did not arrive on time. Use the accompanying table to complete parts a) through c) below. c) The results here are an instance of what phenomenon? Choose the correct answer below.
Simpson's Paradox
A company must decide which of two delivery services it will contract with. During a recent trial period, the company shipped numerous packages with each service and kept track of how often deliveries did not arrive on time. Use the accompanying table to complete parts a) through c) below. b) On the basis of the results in part a, the company has decided to hire Company A. Based on the information given in the data table, do you agree that Company A delivers on time more often? Why or why not? Be specific. Choose the correct answer below.
- Determine the percentage of late regular deliveries for Company A: (14/400) times 100% = 3.5%.
- Determine the percentage of late overnight deliveries for Company A: (14/100) times 100% = 14%.
- Determine the percentage of late regular deliveries for Company B: (2/100) times 100% = 2%.
- Determine the percentage of late overnight deliveries for Company B: (28/400) times 100% = 7%.
- Compare the percentages within each delivery type to determine the better company in terms of delivery times. For regular deliveries, Company B's 2% is lower than Company A's 3.5%. For overnight deliveries, Company B's 7% is lower than Company A's 14%.
ANSWER b) No, Company B has a lower percentage of regular deliveries that are late and a lower percentage of overnight deliveries that are late.
A company must decide which of two delivery services it will contract with. During a recent trial period, the company shipped numerous packages with each service and kept track of how often deliveries did not arrive on time. Use the accompanying table to complete parts a) through c) below. a) Compare the two services' overall percentage of late deliveries.
- The total number of late deliveries for Company A is 14+14 = 28.
- The total number of overall deliveries for Company A is 400+100 = 500.
- Overall percentage of late deliveries for Company A: (28/500) times 100 = 5.6%.
- The total number of late deliveries for Company B is 2+28 = 30.
- The total number of overall deliveries for Company B is 400+100 = 500.
- Overall percentage of late deliveries for Company B: (30/500) times 100 = 6%.
Answer: Overall, Company A is late 5.6% of the time. Company B is late 6.0% of the time.
A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table. Complete parts a through f below. b) What percent of the American cars were owned by students?
- To calculate the percent, divide the number of American cars owned by students by the total number of American cars.
- The total number of American cars was 106+105 = 211.
- The number of American cars owned by students was 106.
- Divide the count 106 by the total 211 and convert the result to a percent, rounding to one decimal place: 106/211 = 0.502.
b) 50.2%
An incoming MBA student took placement exams in economics and mathematics. In economics, she scored 81 and in math 86. The overall results on the economics exam had a mean of 71 and a standard deviation of 7, while the mean math score was 66, with a standard deviation of 11. On which exam did she do better compared with the other students?
- To determine which exam the student did better on, find her z-score on each test using the formula z = (y-μ)/σ, where y is the student's score on the exam, μ is the mean of the exam scores, and σ is the standard deviation. The exam the student performed better on is the exam with the larger z-score.
- Find the student's z-score on the economics exam. The mean economics exam score was 71 and the standard deviation was 7. z = ((student's score on economics exam) - (mean score on economics exam)) / (standard deviation of economics exam) = (81-71)/7 = 1.43. z econ = 1.43
- Find the student's z-score on the math exam, rounding to two decimal places. The mean math exam score was 66 and the standard deviation was 11. z = ((student's score on math exam) - (mean score on math exam)) / (standard deviation of math exam) = (86-66)/11 = 1.82. z math = 1.82
- Since the z-score for the mathematics exam is higher than the z-score for the economics exam, the student performed better on the mathematics exam.
ANSWER: Since she scored 1.43 standard deviations above the mean in economics and 1.82 standard deviations above the mean in mathematics, she did better on the mathematics exam.
A company that manufactures rivets believes the shear strength (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. c) About what percent of these rivets can be expected to fall below 690 pounds?
- 690 pounds is two standard deviations above the mean: 650 + 2(20) = 690.
- Use the 68-95-99.7 rule to determine the percent of rivets that are more than two standard deviations from the mean. About 100%−95% = 5% of rivets are more than two standard deviations from the mean.
- Since the Normal model is symmetric, divide that result by two to get the percent that are more than two standard deviations greater than the mean. About 2.5% of rivets have a strength greater than 690 pounds.
c) Therefore, about 97.5% of rivets have a strength less than 690 pounds.
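To see how close the 68-95-99.7 shortcut is to the Normal model itself, the rule-based estimate can be compared with the exact area below 690. A minimal sketch with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

rivets = NormalDist(mu=650, sigma=20)

# 68-95-99.7 rule: 95% within two SDs, plus the lower half of the remaining 5%.
rule_estimate = 0.95 + (1 - 0.95) / 2

# Exact area below 690 pounds under N(650, 20).
exact = rivets.cdf(690)

print(round(rule_estimate, 3), round(exact, 4))  # 0.975 0.9772
```

The rule rounds the two-standard-deviation band to 95%, which is why the estimate and the exact area differ slightly in the third decimal place.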
A highly rated community college has over 60,000 students and seven different campuses. One of its highest density classes offered is Introduction to Statistics. The statistics course is required for nearly every major offered at the college and therefore is considered a strategic course for the college. The college's leadership is very interested in the relationship between the class size of its statistics courses and students' final grades for the course. Specifically, the college is concerned with the low pass rate of some of its class sections and is determined to remedy the situation. The college's institutional research department recently collected data for analysis in order to support leadership's upcoming discussion regarding the low pass rate of some of its statistics class sections. Final grades from a random sample of 300 class sections over the last five years were collected. The research division also conducted analysis, using archived data, to determine the class size of these 300 class sections. The Class Number, Campus, Class Size, Average Final Grade, Number of "F"s, Average G.P.A. and Successful/Unsuccessful data were collected for these 300 class sections. Using the Empirical Rule with a mean of 2.66 and a standard deviation of 0.23, what percent of Average G.P.A. data fall above 3.35?
0.15%
Fifty-three men completed the men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times (in seconds) for all competitors. Complete parts a through d below a) The mean time was 103.90 seconds, with a standard deviation of 2.65 seconds. If the Normal model is appropriate, what percent of times will be greater than 108.49seconds?
- Find the z-score using the formula z = (y−μ)/σ, where y is the observation, μ is the mean, and σ is the standard deviation, rounding to two decimal places: z = (108.49-103.90)/2.65 = 1.73.
- The percent of times greater than 108.49 seconds is the area to the right of z = 1.73 (round the area to three decimal places).
- On your calculator, choose the normalcdf option. This time, enter your z-score for the lower bound, then press 2nd, comma, and 99 to enter positive 1E99 as the upper bound. Leave the mean (0) and standard deviation (1) as they are.
- Area(z > 1.73) = 0.042. Multiply 0.042 by 100.
a) 4.2% of times will be greater than 108.49 seconds.
A company's customer service hotline handles many calls relating to orders, refunds, and other issues. The company's records indicate that the median length of calls to the hotline is 5.2 minutes with an IQR of 2.7 minutes. a)If the company were to describe the duration of these calls in seconds instead of minutes, what would the median and IQR be? b)In an effort to speed up the customer service process, the company decides to streamline the series of pushbutton menus customers must navigate, cutting the time by 36 seconds. What will the median and IQR of the length of hotline calls become?
PART A
- Since the durations of all the calls are being multiplied by 60, multiply the median and IQR by 60: 5.2 times 60 = 312 and 2.7 times 60 = 162.
- Therefore, the new median will be 312 seconds and the new IQR will be 162 seconds.
PART B
- Find the new median after the durations of all the calls have decreased by 36 seconds: 312-36 = 276.
- After the durations of all the calls have been decreased by 36 seconds, the IQR will remain the SAME, because subtracting a constant shifts every value equally and does not change the spread.
- Therefore, the new median is 276 seconds and the IQR remains the same at 162 seconds.
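The two rules used above (multiplying rescales both center and spread, subtracting shifts only the center) can be checked on any data set. The sketch below uses made-up call lengths, since the company's actual records are not given; `iqr` is a small helper written for this example.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

# Hypothetical call lengths in minutes, invented for illustration only.
minutes = [2.1, 3.4, 4.0, 4.8, 5.2, 5.9, 6.5, 7.3, 9.0]

seconds = [m * 60 for m in minutes]      # change of units
shortened = [s - 36 for s in seconds]    # every call 36 seconds shorter

print(median(minutes), median(seconds), median(shortened))
print(iqr(minutes), iqr(seconds), iqr(shortened))
```

In the output, the median is multiplied by 60 and then drops by 36, while the IQR is multiplied by 60 and then stays put.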
The mean household income in the US in 2019 was about $89,930 and the standard deviation was about $85,000. (The median income was $59,039.) a) If a Normal model is used for these incomes, what would be the household income of the top 15%? b) How confident can you be in the answer in part a? c) Why might the Normal model not be a good one for incomes?
PART A
- Divide 15 by 100 = 0.15.
- Find the cut point by using the invNorm option on your calculator. Type in 0.15 for the area. The mean should remain 0 and the standard deviation should remain 1. Switch the tail to RIGHT and press Enter.
- The cut point is z = 1.04.
- Use z = (y-μ)/σ to convert the z-score into an income by solving for y: 1.04 = (y-89,930)/85,000, so y = 178,330.
a) The household income of the top 15% would be $178,330.
b) It is only possible to be confident in the answer from part a if the distribution of incomes is unimodal and symmetric without obvious outliers.
c) Since the median is much less than the mean and the standard deviation and mean are very close, the distribution of incomes is likely right skewed.
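One way to see the concern raised in parts b and c is to ask what the Normal model N(89,930, 85,000) would imply about incomes below zero and below the reported median. This is only an illustrative check of the model, not part of the original solution.

```python
from statistics import NormalDist

# Normal model built from the mean and SD given in the question.
income = NormalDist(mu=89_930, sigma=85_000)

# Share of households the model places below $0.
p_negative = income.cdf(0)

# Share the model places below the reported median of $59,039.
# For a true median this share would be 50%.
p_below_median = income.cdf(59_039)

print(f"{p_negative:.1%} {p_below_median:.1%}")
```

The model puts roughly one household in seven below zero income and well under half of households below the actual median, both signs that a symmetric Normal curve fits these right-skewed data poorly.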
A study was conducted on shoe sizes of students, reported in European sizes. For the women, the mean size was 37.44 with a standard deviation of 1.65. To convert European shoe sizes to U.S. sizes for women, use the equation shown below. USsize=EuroSize×0.7906−22 a) What is the mean women's shoe size for these responses in U.S. units? b) What is the standard deviation in U.S. units?
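PART A
US mean = European mean × 0.7906 − 22 = 37.44 × 0.7906 − 22 = 7.60
a) The mean women's shoe size in U.S. units is 7.60.
PART B
Subtracting 22 shifts every size equally and does not change the spread, so only the multiplier applies to the standard deviation.
US SD = European SD × 0.7906 = 1.65 × 0.7906 = 1.30
b) The standard deviation in U.S. units is 1.30.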
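The conversion above can be written as a short Python sketch. The function name `to_us_size` is made up here; the constants come from the equation in the question.

```python
def to_us_size(euro_size):
    """Conversion from the question: USsize = EuroSize * 0.7906 - 22."""
    return euro_size * 0.7906 - 22

# The mean goes through the full conversion (multiply, then shift).
us_mean = to_us_size(37.44)

# The standard deviation is only rescaled; the -22 shift drops out.
us_sd = 1.65 * 0.7906

print(round(us_mean, 2), round(us_sd, 2))  # 7.6 1.3
```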
A study looked at outliers arising from a plot of average wind speed by month in the Hopkins Forest. Each was associated with an unusually strong storm, but which was the most remarkable for its month? The summary statistics for each month are shown in the accompanying table. The outliers had values of 6.729 mph, 3.931 mph, and 2.533 mph, for February, June, and August, respectively. a) What are their z-scores? b) Which was the most extraordinary wind event?
PART A
z = (y-ȳ)/s, where y = observation, ȳ = mean, s = standard deviation
- February z-score (look at the picture): y = 6.729, ȳ = 2.3237, s = 1.5766, so z = (6.729-2.3237)/1.5766 = 2.79.
- June z-score (look at the picture), same formula with new numbers: y = 3.931, ȳ = 0.8565, s = 0.7948, so z = (3.931-0.8565)/0.7948 = 3.87.
- August z-score (look at the picture), same formula with new numbers: y = 2.533, ȳ = 0.626, s = 0.5969, so z = (2.533-0.626)/0.5969 = 3.19.
PART B
b) The most extraordinary wind event was in JUNE. It has the LARGEST z-score.
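The three z-scores above, and the comparison in part b, can be reproduced with a few lines of Python. The dictionary layout is just one convenient way to hold the numbers quoted on the card.

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd

# month -> (outlier value, monthly mean, monthly SD), all in mph
storms = {
    "February": (6.729, 2.3237, 1.5766),
    "June": (3.931, 0.8565, 0.7948),
    "August": (2.533, 0.626, 0.5969),
}

z = {month: z_score(*values) for month, values in storms.items()}
most_extreme = max(z, key=z.get)

for month, score in z.items():
    print(month, round(score, 2))
print(most_extreme)  # June
```

February has the highest raw wind speed, but June's storm is the furthest from what is typical for its own month, which is what the z-score measures.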
The accompanying histogram shows the distribution of mean ACT composite scores for all Wisconsin public schools in 2019. 75.3% of the data points fall between one standard deviation below the mean and one standard deviation above the mean. Complete parts a and b below. b) The Normal probability plot on the left shows the distribution of these scores. The plot on the right shows the same data with the Milwaukee area schools (mostly in the low mode) removed. What do these plots tell you about the shape of the distributions? Select the correct choice below.
PART B The shape of the distribution with the Milwaukee area schools removed is more approximately Normal than the distribution with all the schools.
The accompanying histogram shows the distribution of mean ACT composite scores for all Wisconsin public schools in 2019. 75.3% of the data points fall between one standard deviation below the mean and one standard deviation above the mean. Complete parts a and b below. a) Give two reasons a Normal model is not appropriate for these data. Select all that apply.
Part A -If a Normal model were appropriate, about 68% of the data points would fall within one standard deviation of the mean. -The data are bimodal with a high mode and a low mode.
A company that sells frozen pizza to stores in four markets in the United States (Denver, Baltimore, Dallas, and Chicago) wants to examine the prices that the stores charge for pizza slices. Boxplots are given comparing the data from a sample of stores in each market. The mean price of pizza in Baltimore was $2.85, $0.23 higher than the mean price of $2.62 in Dallas. To see if that difference was real, or due to chance, we took the 156 prices from Baltimore and Dallas and mixed those 312 prices together. Then we randomly chose 2 groups of 156 prices 10,000 times, and computed the difference in mean price each time. The histogram shows the distribution of those 10,000 differences. Use the accompanying histogram and boxplots to complete parts a through c below. a) Given this histogram, what do you conclude about the actual difference of $0.23 between the mean prices of Baltimore and Dallas?
Since the resampling process never generated a difference in sample means close to $0.23, it appears that the observed difference of $0.23 did not occur by chance.
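The shuffling procedure described in the question can be sketched as follows. The real store prices are not available here, so the two samples below are synthetic: they are generated around the reported means of $2.85 and $2.62 with an assumed spread of $0.25, and only 2,000 shuffles are run to keep the example quick.

```python
import random
from statistics import fmean

def shuffle_differences(group_a, group_b, reps, rng):
    """Pool both groups, reshuffle, and record the difference in means each time."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)
        diffs.append(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
    return diffs

rng = random.Random(0)

# Synthetic prices, NOT the real data (the 0.25 spread is an assumption).
baltimore = [rng.gauss(2.85, 0.25) for _ in range(156)]
dallas = [rng.gauss(2.62, 0.25) for _ in range(156)]

observed = fmean(baltimore) - fmean(dallas)
diffs = shuffle_differences(baltimore, dallas, reps=2_000, rng=rng)

# Share of shuffled differences at least as large as the observed one.
share_as_large = sum(d >= observed for d in diffs) / len(diffs)
print(round(observed, 2), share_as_large)
```

If the observed difference sits far out in the tail of the shuffled differences, as the histogram in the question shows for $0.23, chance alone is an unlikely explanation.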
The thrill of riding a roller coaster is addictive, both for its users who take the ride, and for the designers and engineers to make them bigger and faster. On the internet, there are many data sets available that rank popular roller coasters according to their maximum speed, the g-force experienced, or the height, to allow you maximizing your thrill. In this case study, we will investigate one of these data sets, counting 408 roller coasters, half of them in North America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height (in meters), Speed (in miles per hours, mph), Length (in meters), and Duration (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood (Type or also Construction), if there are any Inversions (Yes/No) and if so, how many (Numinversions), the maximum g-force (GForce) or when it was constructed (Opened). As with all real data sets, the data are incomplete: for quite a lot of coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data set, excluding only those records that contain missing values for the specific analysis. Given that a ride is such thrill, one can only hope it lasts as long as possible. So what determines the variable Duration of a ride? Let us first look at the distribution of the variable duration. Construct a histogram of the distribution of Duration. What describes best the distribution of Duration?
Somewhat skewed right
A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table. Complete parts a through f below. e) What are the conditional distributions of origin by driver classification?
The driver classifications are the columns, so the conditional distributions are the column percentages. Part 1: Calculate the column totals. Recall from part c that the total number of student cars is 106 + 30 + 58 = 194. The total number of staff cars is 105 + 13 + 50 = 168. Part 2: Divide each cell in the student column by the student total, 194, to get the conditional distribution of origin for STUDENTS, and convert the results to percents rounded to one decimal place: 106/194 ≈ 0.546, so 54.6%; 30/194 ≈ 0.155, so 15.5%; 58/194 ≈ 0.299, so 29.9%. Divide each cell in the staff column by the staff total, 168, to get the conditional distribution of origin for STAFF: 105/168 = 0.625, so 62.5%; 13/168 ≈ 0.077, so 7.7%; 50/168 ≈ 0.298, so 29.8%.
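The column-percent arithmetic can be checked with a short Python sketch. The dictionary layout and the helper name `column_percents` are assumptions made for illustration; the counts and the row labels (American, European, Asian) come from the cards on this table.

```python
# Counts from the survey table: rows are origin, columns are driver type.
counts = {
    "American": {"Student": 106, "Staff": 105},
    "European": {"Student": 30, "Staff": 13},
    "Asian": {"Student": 58, "Staff": 50},
}

def column_percents(table, column):
    """Conditional distribution of origin within one driver classification."""
    total = sum(row[column] for row in table.values())
    return {origin: round(100 * row[column] / total, 1)
            for origin, row in table.items()}

student = column_percents(counts, "Student")
staff = column_percents(counts, "Staff")
print(student)
print(staff)
```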
Some IQ tests are standardized to a Normal model N(100,15). b) What cutoff value bounds the lowest 25% of the IQs?
The lowest 25% of the IQs corresponds to the 25th percentile, so the area to the left of the cutoff is 25/100 = 0.25. -Find the cut point using the invNorm function on your calculator: enter 0.25 for the area, leave the mean and standard deviation as they are (0 and 1, since the result is a z-score), and set the tail to LEFT. -The cut point is z = -0.67. z = (y-μ)/σ, so -0.67 = (y-100)/15 and y ≈ 89.9 (using the unrounded z ≈ -0.6745). b) The cutoff value is 89.9
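As a cross-check on the calculator steps, the same cutoff can be found with `statistics.NormalDist` from the Python standard library, whose `inv_cdf` method plays the role of invNorm here. This is a sketch for checking the answer, not part of the assignment's method.

```python
from statistics import NormalDist

# z-score with 25% of the area to its left (standard Normal: mean 0, sd 1).
z = NormalDist().inv_cdf(0.25)

# Same percentile directly on the IQ model N(100, 15).
cutoff = NormalDist(mu=100, sigma=15).inv_cdf(0.25)

print(round(z, 2), round(cutoff, 1))
```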
A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table. Complete parts a through f below. d) What is the marginal distribution of origin?
The marginal distribution of origin can be calculated by dividing each row total by the total for all of the rows. Total of row 1 (American): 106 + 105 = 211. Total of row 2 (European): 30 + 13 = 43. Total of row 3 (Asian): 58 + 50 = 108. TOTAL OF ALL THE ROWS: 211 + 43 + 108 = 362. Divide the row totals by the total for all the rows to get the marginal distribution of origin, and convert the results to percents rounded to one decimal place: 211/362 ≈ 0.583, so 58.3%; 43/362 ≈ 0.119, so 11.9%; 108/362 ≈ 0.298, so 29.8%. d) MARGINAL: 58.3%, 11.9%, 29.8%
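The row-total calculation can be written as a few lines of Python. The variable names are made up for illustration; the counts are the ones used in the answer above.

```python
# (student, staff) counts for each origin, taken from the survey table.
rows = {"American": (106, 105), "European": (30, 13), "Asian": (58, 50)}

row_totals = {origin: sum(cells) for origin, cells in rows.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of origin, in percent, rounded to one decimal place.
marginal = {origin: round(100 * t / grand_total, 1)
            for origin, t in row_totals.items()}
print(grand_total, marginal)
```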
Fifty-three(53) men completed the men's alpine downhill part of the super combined. The gold medal winner finished in 100.25 seconds. The accompanying table lists the times (in seconds) for all competitors. Complete parts a through d below b) What is the actual percent of times greater than 108.49 seconds
To find the actual percent of times greater than 108.49 seconds, count all the times in the table that are greater than 108.49 seconds. There are 5 such times. Divide that number by the total number of men, which in this case is 53, and multiply by 100 to find the percentage: 5/53 ≈ 0.094, and 0.094 times 100 = 9.4%
The Annenberg Inclusion Initiative tallied the number of films released by seven top distributors in 2019 that had female leads or co-leads and that had leads or co-leads who were from an underrepresented population. The results are shown in the accompanying table. Which studio earned the most on average from those films?
Walt Disney Studios earned the most on average. EXPLANATION -The data for the average revenues from films with female leads is in the fourth column of the given table. The largest value in the fourth column is the 1020.1 in the row for Walt Disney Studios. So that studio earned the most on average for films with female leads.
A National Vital Statistics Report provides information on deaths by age, sex, and race. Displays of the distributions of ages at death for White and Black males are provided. Use these displays to complete parts a through c below. a) Describe the overall shapes of these distributions. b) How do the distributions differ? c) Look carefully at the bar definitions. Where do these plots violate the rules for statistical graphs? Select all that apply.
a) Both distributions are skewed to the left. The White distribution has one small peak, while the Black distribution has three. b)The center for the distribution of Black males is less than the center of the distribution of White males. c)The interval widths are not constant. The vertical axes do not have the same maximum.
The thrill of riding a roller coaster is addictive, both for its users who take the ride, and for the designers and engineers to make them bigger and faster. On the internet, there are many data sets available that rank popular roller coasters according to their maximum speed, the g-force experienced, or the height, to allow you maximizing your thrill. In this case study, we will investigate one of these data sets, counting 408 roller coasters, half of them in North America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height (in meters), Speed (in miles per hours, mph), Length (in meters), and Duration (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood (Type or also Construction), if there are any Inversions (Yes/No) and if so, how many (Numinversions), the maximum g-force (GForce) or when it was constructed (Opened). As with all real data sets, the data are incomplete: for quite a lot of coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data set, excluding only those records that contain missing values for the specific analysis. a) What variables in our data set are of categorical type? Select all that apply. b) What variables in our data set are of quantitative type? Select all that apply. What are the measurement units of the variable Type?
a) Country Inversions Type b)Numinversions Height Duration Opened GForce Speed Length c)As a categorical variable, Type has no units.
Pew Research surveyed 5006 U.S. adults to ask their opinions about the state of jobs in the United States in 2016. Respondents were asked how satisfied they are with their current job and how their current standard of living compares with that of their parents at the same age. The accompanying table summarizes their responses. Complete parts a and b below. a) Is this a table of row percents, column percents, or table percents? How can you tell? b) Which of the following can you tell from this table? If you can, then give the value specified. i. What percent of all respondents are both worse off than their parents and somewhat satisfied with their jobs? Select the correct choice below and, if necessary, fill in the answer box to complete your choice. ii. What percent of those respondents who are better off than their parents were at the same age are nevertheless dissatisfied with their current job? Select the correct choice below and, if necessary, fill in the answer box to complete your choice. iii. What percent of those respondents who are dissatisfied with their current job are actually better off than their parents were at the same age? Select the correct choice below and, if necessary, fill in the answer box to complete your choice. iv. What percent of all respondents are dissatisfied with their current job? Select the correct choice below and, if necessary, fill in the answer box to complete your choice
a) The table has column percents because the total for each column is about 100%. b)i)The value cannot be found from the table. ii)The value cannot be found from the table. iii) 39.2% iv) The value cannot be found from the table.
A survey of 299 undergraduate students asked about respondents' diet preference (Carnivore, Omnivore, Vegetarian) and political alignment (Liberal, Moderate, Conservative). A mosaic plot of the results is given. Complete parts a through d below. a) Are there more men or women in the survey? Explain briefly. b) Does there appear to be an association between Politics and Gender? Explain briefly. c) Does there appear to be an association between Politics and Diet? Explain briefly. d) Does the association between Politics and Diet seem to differ between men and women? Explain briefly.
a) There are more men because the total area of the bars labeled M is larger. b)There appears to be a strong association between Politics and Gender because the relative proportion of females in each group for Politics tends to become smaller as the political alignment becomes more conservative. c)There appears to be a strong association between Politics and Diet because the conditional distribution of diet tends to include meat more as the political alignment becomes more conservative. d)The difference in the proportion of vegetarians between liberals and conservatives is larger for women. The difference in the proportion of carnivores between liberals and conservatives is larger for men.
Students in an introduction to statistics course were asked to describe their politics as "Liberal," "Moderate," or "Conservative." a) Produce a graphical display comparing the conditional distributions of males and females among the three categories of politics. Choose the correct answer below. b) Comment briefly on what you see from the display in part a.
a) answer in the picture b)The proportions of females and males for conservatives are significantly different from the proportions for the other categories.
Here are boxplots of the points scored during the first 10 games of the season for both Diane and Kate. a) Summarize the similarities and differences in their performance so far. b) The coach can take only one player to the state championship. Which one should she take? Why?
a)Both girls have the same approximate median, but Diane has a larger IQR. b)A and B are both possible, depending on the coach's preference.(She should take Kate, because she is a more consistent performer. She should take Diane, because she has the ability to score a higher point total.)
The following histograms are of the assets (in millions of dollars) of 79 companies. a) Which re-expression of the assets histogram do you prefer? Why? b) In the square root re-expression, what does the value 45 actually indicate about the company's assets? c) In the logarithm re-expression, what does the value 3 actually indicate about the company's assets?
a) Logarithm re-expression, because its histogram is more symmetric. b) Undo the square root by squaring: (45)^2 = 2025. The company's actual assets would be 2025 million dollars. c) Undo the base-10 logarithm by raising 10 to that power: (10)^3 = 1000. The company's actual assets would be 1000 million dollars.
The thrill of riding a roller coaster is addictive, both for its users who take the ride, and for the designers and engineers to make them bigger and faster. On the internet, there are many data sets available that rank popular roller coasters according to their maximum speed, the g-force experienced, or the height, to allow you maximizing your thrill. In this case study, we will investigate one of these data sets, counting 408 roller coasters, half of them in North America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height (in meters), Speed (in miles per hours, mph), Length (in meters), and Duration (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood (Type or also Construction), if there are any Inversions (Yes/No) and if so, how many (Numinversions), the maximum g-force (GForce) or when it was constructed (Opened). As with all real data sets, the data are incomplete: for quite a lot of coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data set, excluding only those records that contain missing values for the specific analysis. To satisfy our interest in the history of roller coasters, let us first look when most of them were designed. Construct a stem-and-leaf display of all roller coasters by year of opening. Part 2 a)Describe the shape of the distribution of the variable year of opening. Choose the correct answer below. b)From what year is the oldest roller coaster? c)In what time period are the most roller coasters opened according to the graph? d)What is the advantage of making a stem-and-leaf display instead of a dotplot? e)Describe in your own words the distribution of the observations of Opened, covering both shape, center, and spread of the distribution. 
f)Given that the distribution of opening years is skewed to the left, what combinations of measures of central tendency and spread best describe this distribution?
a)Skewed to the left b) 1924 c)2000-2004 d)A stem-and-leaf display preserves the individual data values. e) The distribution is strongly skewed to the left, with 91 roller coasters that opened in the 2000-2005 time period, which is the mode of the distribution. Only a few of the roller coasters are from the 1920-1970 time period, making a long left tail. f)Median and interquartile range
The U.S. Census Bureau keeps track of the number of adoptions in each state. The accompanying histogram shows the distribution of adoptions for 47 of the states. a) Which would you expect to be larger: the median or the mean? Explain briefly. b) Which would you report: the mean or the median? Explain briefly.
a)The mean is larger because the distribution is skewed to the right, so the mean is pulled toward the higher values. b)The median is resistant to the skewed shape of the distribution, so it is a better choice for most summaries.
The thrill of riding a roller coaster is addictive, both for its users who take the ride, and for the designers and engineers to make them bigger and faster. On the internet, there are many data sets available that rank popular roller coasters according to their maximum speed, the g-force experienced, or the height, to allow you maximizing your thrill. In this case study, we will investigate one of these data sets, counting 408 roller coasters, half of them in North America, and the others in Latin America and Europe. Important variables in the data set that affect the thrill are Height (in meters), Speed (in miles per hours, mph), Length (in meters), and Duration (in seconds). But connoisseurs will also be interested in knowing if the coaster is of steel or wood (Type or also Construction), if there are any Inversions (Yes/No) and if so, how many (Numinversions), the maximum g-force (GForce) or when it was constructed (Opened). As with all real data sets, the data are incomplete: for quite a lot of coasters, we are missing values for one or several of the variables. We will perform all our analyses on the largest possible data set, excluding only those records that contain missing values for the specific analysis. Part 1 a) Construct the five-number summary of the distribution of opening years. b) One of the more spectacular things of a roller coaster are the loopings, or inversions as they are called in this data set. Is your interest in riding historic roller coasters, at odds with achieving maximum thrill by going through at least one looping? Construct a side-by-side boxplot and stem-and-leaf plots of the distribution of opening years, grouped by the presence of inversions. c) Summarize the main differences between the two distributions of openings years for roller coasters with and without loopings.
a)The minimum is equal to 1924. The first quartile is equal to 1991. The median is equal to 1999. The third quartile is equal to 2004. The maximum is equal to 2014. b) The oldest 15 coasters are all without any inversion. The newest 3 coasters are all with inversion. Both distributions have equal median opening year. c) One major difference is that roller coasters with inversions are relatively new, built from 1975 on. As a result, the distribution of opening years of roller coasters with loopings is much less strongly left skewed than the distribution of coasters without loopings, which date back to the 1920's.
Suppose your statistics professor reports test grades as z-scores, and you got a score of 1.78 on an exam. a) Write a sentence explaining what that means. b) Your friend got a z-score of −2. If the grades satisfy the Nearly Normal Condition, about what percent of the class scored lower than your friend?
a) The score was 1.78 standard deviations higher than the mean score in the class. Part B: Use the picture of the 68-95-99.7 Rule. The area lower than z = -2 is 2.35% + 0.15% = 2.5%. b) About 2.5% of the class scored lower than your friend.
ANOTHER EXAMPLE Suppose your statistics professor reports test grades as z-scores, and you got a score of 2.59 on an exam. a) Write a sentence explaining what that means. b) Your friend got a z-score of −1. If the grades satisfy the Nearly Normal Condition, about what percent of the class scored lower than your friend?
a) The score was 2.59 standard deviations higher than the mean score in the class. Part B: Use the picture of the 68-95-99.7 Rule. The area lower than z = -1 is 13.5% + 2.35% + 0.15% = 16%. b) About 16% of the class scored lower than your friend.
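The two z-score examples above add up slices of the 68-95-99.7 picture. A small Python sketch of that addition, next to the exact Normal areas for comparison (the slice values are the ones used in the answers; the variable names are made up):

```python
from statistics import NormalDist

# Slices of the 68-95-99.7 picture, in percent, on the left side of the mean.
below_minus_3 = 0.15   # below z = -3
minus_3_to_2 = 2.35    # between z = -3 and z = -2
minus_2_to_1 = 13.5    # between z = -2 and z = -1

rule_below_minus_2 = below_minus_3 + minus_3_to_2
rule_below_minus_1 = rule_below_minus_2 + minus_2_to_1

# Exact areas under the standard Normal curve, for comparison with the rule.
exact_below_minus_2 = 100 * NormalDist().cdf(-2)
exact_below_minus_1 = 100 * NormalDist().cdf(-1)

print(rule_below_minus_2, round(exact_below_minus_2, 1))
print(rule_below_minus_1, round(exact_below_minus_1, 1))
```

The rule's 2.5% and 16% are rounded approximations, so they sit slightly above the exact areas.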
The Titanic was a British passenger liner that sank in the North Atlantic Ocean in 1912, after colliding with an iceberg during her maiden voyage from Southampton to New York City. The sinking caused more than 1,500 passengers and crew to die, making it one of the deadliest peacetime maritime disasters in modern history. The Titanic catastrophe is not only famous for the movie, but also it allows several types of statistical analyses, and is therefore a popular case study in statistics texts. One of these analyses focuses on the following question: who survived the sinking? "Women and children first" is a code of conduct dating from 1852, whereby the lives of women and children were to be saved first in a life-threatening situation, typically abandoning ship, when survival resources such as lifeboats were limited. But is it indeed true that women and children had the best perspectives to survive? This data set has the survival data for the passengers on the Titanic. Also, for each person, the data set indicates if they were an adult or child, their gender, and the class they were staying in. The fact that survival rates by class were unequal can be nicely illustrated by different types of graphs of the association of Survival and Class: the side-by-side bar chart, the segmented bar chart, and the side-by-side pie chart. Focus on the segmented or stacked bar chart. Please generate one such bar chart with Class as Column, choosing Group by Survival, Stack Bars as grouping option, and Percent (within category) as Type. Next, redo the segmented bar chart, now with Percent as Type. a) Which of the two graphs allows you to read if more crew than first class passengers (in absolute numbers) survived? b)Which of the two graphs allows you to read how the proportion of crew that survived compares to the proportion of first class passengers that survived?
a)Type as Percent is the better display because the goal is to compare total counts. b)Type as Percent (within category) is the better display because the goal is to compare relative frequencies.
The pie chart shows the ratings assigned to 846 first-run movies released in a recent year. a) Is this an appropriate display for these data? Explain. b) Which was the least common rating?
a)Yes, because each movie falls into only one category and no categories overlap. b) NC-17
Use the Normal model N(100,16) describing IQ scores to answer the following. a) What percent of people's IQs are expected to be over 80? b) What percent of people's IQs are expected to be under 95? c) What percent of people's IQs are expected to be between 116 and 124?
Part A: mean = 100, standard deviation = 16. Step 1: Find the z-score using the formula z = (y-μ)/σ = (80-100)/16, so z = -1.25. Step 2: Find the area to the left of z = -1.25 using the normalCDF option on your calculator (make sure to use the negative sign, not the minus sign). The area to the left of z = -1.25 is 0.1056. -Since the area to the left of z = -1.25 is 0.1056, 10.56% of people have IQs BELOW 80 (10.56% was found by multiplying 0.1056 by 100). -Subtract this percent from 100% to find the percent of people with IQs ABOVE 80: 100 - 10.56 = 89.44, or about 89.4. a) Approximately 89.4% of people's IQs are expected to be above 80. Part B: To find the percent of people with IQs under 95, use the same formula: z = (95-100)/16 = -0.31. The area to the left of z = -0.31 is 0.378. b) So 37.8% of people's IQs are expected to be under 95. Part C: Step 1: Find the z-score for an IQ of 116: z = (116-100)/16 = 1. The area to the left of z = 1 is 0.841. Step 2: Find the z-score for an IQ of 124: z = (124-100)/16 = 1.5. The area to the left of z = 1.5 is 0.933. Step 3: Subtract the area to the left of z = 1 from the area to the left of z = 1.5: 0.933 - 0.841 = 0.092. Step 4: Multiply the difference by 100 to find the percent of people with IQs between 116 and 124: 0.092 times 100 = 9.2%. c) So 9.2% of people's IQs are expected to be between 116 and 124.
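The three areas can be checked with `statistics.NormalDist` from the Python standard library, using its `cdf` method in place of normalCDF. This is a sketch for checking the arithmetic; note that it works with unrounded z-scores, so part b comes out as about 37.7% rather than the 37.8% obtained from the rounded z = -0.31.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

over_80 = 100 * (1 - iq.cdf(80))             # part a: area to the right of 80
under_95 = 100 * iq.cdf(95)                  # part b: area to the left of 95
between = 100 * (iq.cdf(124) - iq.cdf(116))  # part c: area between 116 and 124

print(round(over_80, 1), round(under_95, 1), round(between, 1))
```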
A company that manufactures rivets believes the shear strength (in pounds) is modeled by N(650,20).Use the 68-95-99.7 Rule to complete parts a through d below. a) Draw and label the Normal model. Choose the correct graph below.
step1 μ=650 σ=20 Determine the values that are 1 standard deviation away from the mean. 650-1(20)= 630 650+1(20)= 670 Determine the values that are 2 standard deviations away from the mean. 650-2(20)=610 650+2(20)=690 Determine the values that are 3 standard deviations away from the mean. 650-3(20)=590 650+3(20)=710 a) In the Normal model for these rivets, about 68% of the rivets have a shear strength between 630 and 670 pounds -about 95% of the rivets have a shear strength between 610 and 690 -about 99.7% of the rivets have a shear strength between 590 and 710
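The interval endpoints follow directly from μ ± kσ. A minimal Python sketch (the variable names are made up for illustration), which also reproduces the roughly 16% below 630 pounds used in part b:

```python
mu, sigma = 650, 20

# Endpoints 1, 2, and 3 standard deviations from the mean.
bands = {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}
print(bands)

# About 68% lies within 1 sd, so the remaining 32% splits evenly between tails.
percent_below_630 = (100 - 68) / 2
print(percent_below_630)
```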