Data distribution
Comparing the median ages, younger people tend to buy the BMW 3 series, while older people tend to buy the BMW 7 series. However, this is not a rule, because there is so much variability in each data set.
Compare the three box plots. What do they imply about the ages of buyers of each BMW series relative to one another?
The mean and the median are both six.
Describe the relationship between the mean and the median of this distribution.
The mean is 4.1 and is slightly greater than the median, which is four.
Describe the relationship between the mean and the median of this distribution.
The mode and the median are the same. In this case, they are both five.
Describe the relationship between the mode and the median of this distribution.
The distribution is skewed right.
Describe the shape of this distribution.
The distribution is skewed left because it looks pulled out to the left.
Describe the shape of this distribution.
There is not enough information to tell. Each interval lies within a quarter, so we cannot tell exactly where the data in that quarter is concentrated.
Look at the BMW 5 series. Are there more data in the interval 31 to 38 or in the interval 45 to 55? How do you know this?
IQR ~ 17 years
Look at the BMW 5 series. Estimate the interquartile range (IQR).
The interval from 31 to 35 years has the fewest data values. Twenty-five percent of the values fall in the interval 38 to 41, and more than 25% fall between 41 and 64, because that interval contains the entire third quarter. Since 25% of values fall between 31 and 38, we know that fewer than 25% fall between 31 and 35.
Look at the BMW 5 series. Which interval has the fewest data in it? How do you know this? 31-35, 38-41, 41-64
The third quarter has the largest spread. There seems to be approximately a 14-year difference between the median and the third quartile.
Look at the BMW 5 series. Which quarter has the largest spread of data? What is the spread?
The second quarter has the smallest spread. There seems to be only a three-year difference between the first quartile and the median.
Look at the BMW 5 series. Which quarter has the smallest spread of data? What is the spread?
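The quarter spreads and the IQR in the BMW 5 series answers can be checked with simple subtraction. The sketch below is only an illustration: the values are approximate readings implied by the answers in this set, and the third quartile (41 + 14 = 55) and the maximum (64) are inferred rather than read from the plot itself.

```python
# Approximate BMW 5 series values implied by the answers in this set.
# q3 = 41 + 14 and max = 64 are inferred, not read from the plot itself.
summary = {"min": 31, "q1": 38, "median": 41, "q3": 55, "max": 64}

# Spread of each quarter is the distance between neighboring summary points.
spreads = {
    "first": summary["q1"] - summary["min"],
    "second": summary["median"] - summary["q1"],
    "third": summary["q3"] - summary["median"],
    "fourth": summary["max"] - summary["q3"],
}

# The interquartile range is the width of the box: Q3 minus Q1.
iqr = summary["q3"] - summary["q1"]

widest = max(spreads, key=spreads.get)
narrowest = min(spreads, key=spreads.get)
print(spreads, iqr, widest, narrowest)
```

Under these assumed values, the third quarter is the widest, the second is the narrowest, and the IQR comes out to about 17 years, which agrees with the answers above.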
The median of a data set is the middle value once the data are arranged in order. When the set has an even number of values, it is the mean of the two middle values.
Median
median, quartiles, whiskers
What are the parts of a boxplot?
none (symmetric), negative (skewed left), positive (skewed right)
What are the types of skew?
A longer box indicates a greater interquartile range, since the sides of the box mark the first and third quartiles. A greater interquartile range means the middle 50% of the data is more spread out, so the data may be less reliable for prediction.
What does a longer box indicate?
Each box plot is spread out more in the greater values. Each plot is skewed to the right, so the ages of the top 50% of buyers are more variable than the ages of the lower 50%.
What does the shape of each box plot imply about the distribution of the data collected for that car series?
The BMW 3 series is most likely to have an outlier. It has the longest whisker.
Which group is most likely to have an outlier? Explain how you determined that.
Longer whiskers are less of a concern than a long box. A broad range of possibilities with a strong likelihood of central values is more reliable for prediction than a moderate overall range with little concentration at the median.
Which is more of a concern, long whiskers or a long box? Why?
The mode is 12, the median is 12.5, and the mean is 15.1. The mean is the largest.
Which is the greatest, the mean, the mode, or the median of the data set? 11; 11; 12; 12; 12; 12; 13; 15; 17; 22; 22; 22
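The three measures for this data set can be checked with a short calculation. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

data = [11, 11, 12, 12, 12, 12, 13, 15, 17, 22, 22, 22]

data_mean = mean(data)      # sum of the values divided by the count (181 / 12)
data_median = median(data)  # average of the 6th and 7th ordered values
data_mode = mode(data)      # most frequent value

# Find which measure is the largest.
measures = {"mean": data_mean, "median": data_median, "mode": data_mode}
largest = max(measures, key=measures.get)

print(round(data_mean, 1), data_median, data_mode, largest)
```

The mean is pulled above the median and mode by the three values of 22 at the upper end of the data.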
Since the interquartile range represents the 50% of the data closest to the median, a greater range in this section of the plot suggests that the median may not be a great indicator of central tendency.
Why is a longer box not a good indicator of central tendency?
The numbers needed to construct a box-and-whisker plot are called the five-number summary. The five numbers are the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.
five-number summary
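As an illustration, the five numbers can be computed by taking the median of each half of the ordered data. This is a sketch of one common classroom convention, not the only definition: the function name `five_number_summary` is made up here, the middle value is left out of both halves when the count is odd, and software packages may compute quartiles slightly differently.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values. With an odd count, the middle value
    is excluded from both halves (one convention among several).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Example with the twelve-value data set used elsewhere in this set.
ages = [11, 11, 12, 12, 12, 12, 13, 15, 17, 22, 22, 22]
print(five_number_summary(ages))
```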
Statistical correlation is a measure of how changes in the values of one data set tend to be related to changes in the values of another.
statistical correlation
The mean tends to reflect skewing the most because it is affected the most by outliers.
Of the three measures, which tends to reflect skewing the most, the mean, the mode, or the median? Why?
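A small numerical experiment can make this concrete. The data below are made up purely for illustration: a single large value is appended to a roughly symmetric set, and the shifts in the mean and the median are compared.

```python
from statistics import mean, median

values = [4, 5, 5, 6, 7]      # made-up, roughly symmetric data
with_outlier = values + [40]  # the same data plus one large value

# How far does each measure move when the outlier is added?
mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)

print(round(mean_shift, 2), median_shift)
```

In this example the mean moves by several units while the median moves by only half a unit, which is the behavior the answer describes.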