STAT 10 - 1.7 Corequisite
Would you consider a breakfast cereal with 215 calories per serving to be unusual?
Yes, 215 is far from the center of the distribution.
Now let's explore a data set involving six numbers where the median is greater than the mean. Here is the data set: 20, 30, 130, 130, 130, 160. What is the mean of the data set above?
100
Now let's explore a data set involving six numbers where the median is less than the mean. Here is the data set: 50, 50, 60, 80, 160, 200. What is the mean of the data set above?
100
What is the approximate mode of the sample?
120
Now let's see what happens to the mean and median if we remove the three highest salaries. Calculate the total salary of all the players with the three highest salaries removed. Enter your answer rounded to the nearest million.
282 million. Recall that the total salary of all players is $377 million. Subtract the three highest salaries (34 million, 32 million, and 29 million), which together come to 95 million.
First, calculate the original total salary of all the players. You can do this by multiplying the mean by the number of players in the sample. Enter your answer rounded to the nearest million.
377 million. The mean is 6.29 million and the sample size is 60 players, so the total is 60(6.29) = 377.4, which rounds to 377 million.
What is the mean salary after those three players are removed? Enter your answer in millions of dollars and round to two decimal places.
4.95 million
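The salary arithmetic in these cards can be checked with a short Python sketch. The variable names are illustrative; the figures (a mean of 6.29 million, 60 players, and top salaries of 34, 32, and 29 million) are the ones stated on the cards.

```python
# Figures taken from the cards; variable names are illustrative.
mean_salary = 6.29          # millions of dollars
n_players = 60
top_three = [34, 32, 29]    # millions of dollars

# Total = mean * count, rounded to the nearest million.
total = round(mean_salary * n_players)

# Remove the three highest salaries from the total and from the count.
reduced_total = total - sum(top_three)
reduced_mean = round(reduced_total / (n_players - len(top_three)), 2)

print(total, reduced_total, reduced_mean)   # 377 282 4.95
```

Note that the count drops from 60 to 57 when the three players are removed, so the new mean divides by 57, not 60.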
What is the approximate median of the sample?
42.5
The histogram below displays 60 annual precipitation amounts (in inches) from a sample of 60 years in Connecticut. The annual precipitation of a region is a measurement of the total amount of rain and snow received in the region. The precipitation data in the sample were selected randomly from all annual precipitation data in the years from 1890 to 2013. What is the approximate mean of the sample?
45
What is the approximate mode of the sample?
47.5
What is the mode of the data set above?
50
What is the median of the data set above?
70
What is the approximate median of the sample? Assume each number in the bin is at the midpoint of the bin. For example, the first bin has one country with a life expectancy of 47.5 years. To find the median in a set of 50 numbers, you would need to order the values from least to greatest, then locate the 25th and 26th values in the ordered set.
72.5. There are 15 values between 45 and 70. There are 12 values between 70 and 75. That means that by the time we get to 75, we have included 27 of the 50 countries. Both the 25th and 26th values are in the 70-75 bin, so a good estimate for the median is 72.5.
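The counting argument above can be sketched in Python. This is a minimal check that uses only the counts stated on the card (15 values below 70, 12 values in the 70-75 bin, 50 values in total); the variable names are illustrative.

```python
# Counts stated on the card; names are illustrative.
n = 50
below_70 = 15        # values in the bins from 45 up to 70
in_70_to_75 = 12     # values in the 70-75 bin

# With an even count, the median averages the two middle positions.
middle_positions = (n // 2, n // 2 + 1)   # 25th and 26th values

# The 70-75 bin holds ordered positions 16 through 27.
first_pos = below_70 + 1
last_pos = below_70 + in_70_to_75
both_in_bin = all(first_pos <= p <= last_pos for p in middle_positions)

# Every value in a bin is treated as the bin midpoint.
midpoint = (70 + 75) / 2
median_estimate = midpoint if both_in_bin else None
print(middle_positions, (first_pos, last_pos), median_estimate)
```

Because both middle positions fall in the same bin, averaging them gives the bin midpoint itself.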
What is the approximate mean of the sample? Assume each number in the bin is at the midpoint of the bin. For example, the third bin (ranging from 55 to 60) has three countries and a midpoint of 57.5. Quickly find the total of these three countries by multiplying 3(57.5) = 172.5.
73. Mean = 3645/50 = 72.9, which rounds to 73.
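The midpoint method can be written as a small Python function. This is a sketch under stated assumptions: `grouped_mean` is a hypothetical helper name, and the three-bin example at the end is made-up data for illustration, not the life expectancy histogram (whose full list of frequencies is not given on these cards).

```python
def grouped_mean(midpoints, frequencies):
    """Estimate the mean by treating every value as its bin midpoint."""
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)

# Checks that use only numbers stated on the cards.
third_bin_total = 3 * 57.5      # three countries at midpoint 57.5
card_mean = 3645 / 50           # stated total divided by sample size

# Hypothetical three-bin histogram, NOT the life expectancy data.
toy_mean = grouped_mean([47.5, 52.5, 57.5], [1, 2, 3])
print(third_bin_total, card_mean, toy_mean)
```

Multiplying each midpoint by its frequency is a shortcut for adding the midpoint once per country in the bin.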
Why the median is often preferred with these types of data sets
In the previous questions, you saw that removing the 3 highest salaries dropped the mean dramatically, from 6.29 million to 4.95 million. The median, however, would move much less: in this case it would drop from 2.17 million to 2.0 million.
Median
Median is the middle number in a sorted list of numbers. To determine the median value in a sequence of numbers, the numbers must first be sorted, or arranged, in value order from lowest to highest or highest to lowest.
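The sort-then-pick-the-middle procedure can be sketched in Python. The function below is an illustrative hand-written version, not a library routine. It also covers the even-count case, where the median is the average of the two middle values. The two data sets are the six-number sets from this card set.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)           # sort lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]            # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([50, 50, 60, 80, 160, 200]))    # (60 + 80) / 2 = 70.0
print(median([20, 30, 130, 130, 130, 160]))  # (130 + 130) / 2 = 130.0
```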
Would you consider a breakfast cereal with 153 calories per serving to be unusual?
No, 153 is near the center of the distribution.
The dotplot below displays a random sample of 60 MLB player salaries from the 2018 season. We can use this sample to get a sense of how salaries in the population of major league baseball players are distributed. In the sample, 23 of the 60 players make close to the league minimum. These 23 salaries are plotted in the first column of dots. The mean, median, and mode of this sample are provided below. The mode of a data set is the value that occurs with the greatest frequency. Which measure best describes the typical annual salary in this sample? Create at least two arguments to support your answer. Use the sample data in your arguments.
One could argue that the median is the best measure, since half the players make less than 2.17 million and half make more, so the typical player's salary sits in the middle of the salaries. The same person might argue against the mean, since the majority of players in the sample (40 out of 60, or 2/3) make less than 5 million. Since 2/3 of the players in the sample make 5 million or less, 6.3 million cannot represent a typical salary. They could then argue that the mode is too low, since it does not reflect all the players who make a higher salary. One could argue that the mean is the best measure, since the median (2.17 million) is too low and does not adequately capture the higher values in the sample. One could argue that the mode is the best measure, since more players have this salary than any other salary.
We can use this histogram to estimate the mean and median of the data set. There are 8 bins and each bin has a width of 5 years. The midpoint of a bin is the average of the lower and upper limits of the bin. For example, the first bin has a midpoint of 47.5 and a frequency of 1. You can interpret the bar for the first bin to indicate that there is one country in the sample with a life expectancy in 2016 of 47.5 years. The frequency of each bin is shown above the corresponding bar. Interpret the bar that represents the bin ranging from 70 to 75.
The bar indicates that the sample contains 12 countries with a 2016 life expectancy of about 72.5 years.
This lesson showed how to think about sample means and sample medians through the examination of graphical representations of data and distributions.
The center of a data set is an important and commonly studied characteristic. There is more than one way to quantify the center. Three commonly used measures are the mean, median and mode. The characteristics of a data set affect which measure of center (median or mean) is most appropriate.
The distribution is skewed...
The distribution is skewed right. The extreme large values (160, 200) pull the mean to the right, but do not change the median.
The distribution is skewed...
Left. The distribution is skewed left. The extremely small values (20, 30) pull the mean to the left, but do not change the median.
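The relationship between skew and the mean can be checked on both six-number data sets from these cards. The sketch below uses Python's `statistics` module; `skew_hint` is a hypothetical helper, and comparing the mean to the median is only a rough heuristic for the direction of skew, not a definition of it.

```python
from statistics import mean, median

def skew_hint(values):
    """Rough heuristic: compare mean and median to guess skew direction."""
    m, med = mean(values), median(values)
    if m < med:
        return "left"      # small extreme values pull the mean down
    if m > med:
        return "right"     # large extreme values pull the mean up
    return "roughly symmetric"

low_tail = [20, 30, 130, 130, 130, 160]    # median greater than mean
high_tail = [50, 50, 60, 80, 160, 200]     # median less than mean
print(mean(low_tail), median(low_tail), skew_hint(low_tail))
print(mean(high_tail), median(high_tail), skew_hint(high_tail))
```

Both data sets have a mean of 100, but their medians (130 and 70) sit on opposite sides of it.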
What is the approximate median of the sample?
120
What is the median of the data set above?
130
What is the mode of the data set above?
130
How many calories is too many calories for a breakfast cereal? A student was interested in understanding how calories per serving amounts vary among common breakfast cereals. He collected a random sample of 25 cereals and recorded the calories per serving for each cereal. The histogram below summarizes the calories per serving for the 25 cereals. What is the approximate mean of the sample?
134
Compare the mean and median. How does the shape of this data set impact the mean and median?
The histogram is approximately bell-shaped, so the mean and median are similar. (The alternative answer, "The histogram is skewed right so the mean is pulled to the left," appears to be wrong: a right skew pulls the mean to the right.)
Compare the mean and median. How does the shape of this data set impact the mean and median?
The histogram is skewed right so the mean is pulled to the right.
Mean
The mean is the average of the numbers. It is easy to calculate: add up all the numbers, then divide by how many numbers there are. In other words it is the sum divided by the count.
Which measure, mean or median, better represents the typical life expectancy of the countries represented in the sample?
The median is a better measure since the distribution is skewed.
Typically, with distributions that are skewed, the median is used
The median is the preferred measure of center when data are skewed since the mean is heavily influenced by outliers, or extreme values in a distribution, whereas the median is not.
Mode
The mode is the value that appears most frequently in a data set. A set of data may have one mode, more than one mode, or no mode at all. Other popular measures of central tendency include the mean, or the average of a set, and the median, the middle value in a set.
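The one-mode and several-mode cases can be illustrated with `statistics.multimode` (available in Python 3.8 and later). The first two lists are the six-number data sets from these cards; the last two are hypothetical data added only for illustration.

```python
from statistics import multimode

# multimode returns every value tied for the highest frequency.
print(multimode([50, 50, 60, 80, 160, 200]))    # [50]
print(multimode([20, 30, 130, 130, 130, 160]))  # [130]
print(multimode([1, 1, 2, 2, 3]))               # hypothetical data: [1, 2]
print(multimode([1, 2, 3]))                     # hypothetical data: [1, 2, 3]
```

When every value occurs equally often, as in the last list, `multimode` returns all of them; under the convention described above, such a data set is said to have no mode.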