STAT300 Unit 2 Review
Recently a group of students answered the question, "On average, how many expensive coffee beverages do you consume each week?" The boxplots show the distributions for the weekly number of expensive coffee beverages consumed for men and women. Which of the following statements is the most reasonable conclusion about the variability depicted in the boxplots?
There is less variability in the number of expensive coffee beverages consumed by women because the IQR is smaller. (The IQR measures the variability in the typical range of values, the middle 50% of the data. A smaller IQR suggests less variability in the typical range of weekly coffee beverages consumed by the women.)
According to OpenSecrets.org, the net worth of U.S. senators is strongly skewed to the right. In 2010, the two measures of center for U.S. senators were $2,502,770 and $13,224,333. Which number best represents the mean net worth and which represents the median net worth of U.S. senators?
$2,502,770 is the median and $13,224,333 is the mean. The data is strongly skewed to the right. The mean will be pulled away from the median toward the right skew, so the mean is more than the median.
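A minimal Python sketch of the same effect, assuming a small made-up data set (these values are hypothetical, in millions of dollars, and are not the OpenSecrets figures). A few very large values on the right pull the mean well above the median:

```python
from statistics import mean, median

# Hypothetical net worths in millions of dollars (made-up values, right skewed).
net_worths = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 20.0, 85.0]

center_mean = mean(net_worths)      # pulled toward the two large values
center_median = median(net_worths)  # middle value of the ordered list

print(f"median = {center_median}, mean = {center_mean:.2f}")
```

Removing the two largest values from the list brings the mean back down close to the median, which is one way to see that the skew is what separates them.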
The mean is 22 in each of the distributions. Which distribution has the least variability about its mean?
B (has the most data at the mean, which decreases the average amount of variability about the mean.)
more variability
larger and wider box plot (IQR)
less variability
smaller box plot (IQR)
use the mean and standard deviation as measures of center and spread when:
the distribution is reasonably symmetric with a central peak and no outliers
center
the center of a distribution is a typical value that represents the group. We have two different measurements for determining the center of a distribution: mean and median.
To analyze the distribution of a quantitative variable we describe the (1._______________) and any (2.____________). We use three types of graphs to analyze the distribution of a quantitative variable: dotplots, histograms, and boxplots.
1. overall pattern of the data (shape, center, spread) 2. deviations from the pattern (outliers)
The mean is 22 in each of the distributions. Which distribution has the most variability about its mean?
C (In all three distributions, all of the data is within one unit of the mean, but C has less data at the mean than the other distributions. Data at the mean decreases the average variability about the mean, so C has the most variability.)
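The comparison of variability about the mean can be sketched in Python. The three lists below are hypothetical stand-ins for distributions A, B, and C, since the original graphs are not reproduced here; each has a mean of 22 and every value within one unit of the mean, and they differ only in how much data sits at the mean.

```python
from statistics import mean, stdev

# Hypothetical data: the actual distributions A, B, and C are not given in the text.
distributions = {
    "A": [21, 21, 22, 22, 22, 22, 23, 23],
    "B": [21, 22, 22, 22, 22, 22, 22, 23],  # most data at the mean
    "C": [21, 21, 21, 22, 22, 23, 23, 23],  # least data at the mean
}

def average_distance_from_mean(values):
    """Average absolute distance of the values from their mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)

avg_dist = {name: average_distance_from_mean(v) for name, v in distributions.items()}
sample_sd = {name: stdev(v) for name, v in distributions.items()}

for name in distributions:
    print(name, mean(distributions[name]), avg_dist[name], round(sample_sd[name], 3))
```

Under these assumed values, B has the smallest average distance from the mean and C the largest, matching the two answers above.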
The boxplots below show the distribution of test scores for two classes. Which class has more students?
It is impossible to tell (A boxplot divides the data values into four groups of equal counts. In a boxplot there are the same number of students between the minimum and Q1, between Q1 and the median, between the median and Q3, and between Q3 and the maximum. However, a boxplot does not depict the number of students that are in each group. The boxplot for each class does not indicate how many students are enrolled in that class.)
median
is the physical center of the data when we make an ordered list. It has the same number of values above it as below it.
Which histogram shows a distribution of exam scores on an easy exam?
Histogram I (The data set is skewed left. Most of the students are clustered on the right around very high exam scores, and just a few students are on the left with low scores. This was likely an easy exam.)
spread
of a distribution is a description of how the data varies. We studied three ways to measure spread: range (max - min), the interquartile range (Q3 - Q1), and the standard deviation. When we use the median, Q1 to Q3 gives a typical range of values associated with the middle 50% of the data. When we use the mean, Mean ± SD gives a typical range of values. 1. The interquartile range (IQR) measures the variability in the middle half of the data. 2. Standard deviation measures roughly the average distance of data from the mean.
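The three measures of spread can be computed side by side, as a rough sketch. The data set is hypothetical. Two choices here are assumptions rather than anything the review specifies: quartiles are computed with the "inclusive" method of `statistics.quantiles` (textbooks and software use several slightly different quartile rules), and the standard deviation is the sample version (`stdev`, which divides by n - 1).

```python
from statistics import mean, quantiles, stdev

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 5, 6, 7, 8, 9, 9, 12]

data_range = max(data) - min(data)            # range = max - min

# Assumption: "inclusive" quartile method; other methods give slightly different Q1/Q3.
q1, med, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                 # spread of the middle 50% of the data

m = mean(data)
sd = stdev(data)                              # assumption: sample SD (divides by n - 1)

print(f"range = {data_range}")
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
print(f"typical range using the mean: {m - sd:.2f} to {m + sd:.2f}")
```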
mean
the average. We calculate the mean by adding the data values and dividing by the number of individual data points. The mean is the fair share measure.
General Guidelines for Choosing a Measure of Center
1) Always plot the data: We need to use a graph to determine the shape of the distribution. By looking at the shape, we can determine which measure of center best describes the data. 2) Use the mean as a measure of center only for distributions that are reasonably symmetric with a central peak. When outliers are present, the mean is not a good choice. 3) Use the median as a measure of center for all other cases.
Q2: Assume that the following histograms are drawn on the same scale. Which one of the histograms has a mean that is larger than the median?
Histogram II. The histogram is skewed right. The mean is pulled away from the median and toward the right skew. So the mean is larger than the median.
Outliers
are data points that fall outside the overall pattern of the distribution. When using the median and IQR to measure center and spread, we use the 1.5 * IQR interval to identify outliers. Specifically, points outside the interval Q1 - 1.5 * IQR to Q3 + 1.5 * IQR are labeled as outliers.
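The 1.5 * IQR rule can be sketched as a small Python function. The function name `iqr_outliers` and the sample data are hypothetical, and the "inclusive" quartile method is an assumption: a course or calculator that computes Q1 and Q3 differently may get slightly different fences.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return (outliers, (lower_fence, upper_fence)) using the 1.5 * IQR rule."""
    # Assumption: "inclusive" quartile method; quantiles() sorts the data itself.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return outliers, (lower_fence, upper_fence)

# Hypothetical data with one unusually large value.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]
outliers, fences = iqr_outliers(sample)
print(outliers, fences)
```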
Q6: A local chain of coffee shops is collecting data on the time each customer spends in line before placing an order. The frequency of each number of minutes spent waiting in line is given in a histogram for each of the four coffee shops. Consider the variable time-spent-waiting-in-line. In which coffee shop is the standard deviation of time spent waiting in line zero?
Coffee shop III, because all customers spent the same amount of time waiting in line. (There is no variability in the time customers spend waiting in line at coffee shop III, so the standard deviation is 0.)
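A quick check in Python, assuming a hypothetical shop where every customer waits exactly 3 minutes (the actual histogram values are not given in the text):

```python
from statistics import mean, stdev

# Hypothetical wait times in minutes: every customer waits the same amount of time.
wait_times = [3, 3, 3, 3, 3, 3]

sd = stdev(wait_times)  # every value equals the mean, so every deviation is 0
print(mean(wait_times), sd)
```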
balancing point
the mean is also known as the balancing point of a distribution. If we measure the distance between each data point and the mean, the distances are balanced on each side of the mean.
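The balancing-point idea can be checked numerically. In this sketch the data set is hypothetical; the total distance of the values below the mean equals the total distance of the values above it.

```python
from statistics import mean

# Hypothetical data set with mean 22.
data = [18, 20, 21, 22, 29]

m = mean(data)
deviations = [x - m for x in data]

total_below = sum(-d for d in deviations if d < 0)  # distances on the left side
total_above = sum(d for d in deviations if d > 0)   # distances on the right side

print(m, total_below, total_above)
```

Note that the balance is in total distance, not in the count of points: here three values sit below the mean and only one above it.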
The graphs below give the cancer mortality rate for countries with low per capita income and countries with high per capita income. The cancer mortality rate is given as a percentage of total deaths. Which of the following statements are valid?
1) Typical countries with low per-capita income have a lower cancer mortality rate than typical countries with high per-capita income. Therefore, people living in countries with low per-capita income are less likely to die from cancer. (The countries in the low-income group are clustered on the left, so the typical cancer mortality rates for this group of countries are small. The countries in the high-income group are clustered on the right, so the typical cancer mortality rates for this group of countries are large. So you are correct that typical countries with low per-capita income have a lower cancer mortality rate than typical countries with high per-capita income.) 2) Based on the shape, center, and spread of each distribution, people living in countries with high per-capita income are more likely to die from cancer. (You are correct that we must compare the shape, center, and spread for both dotplots. The center of the data for the low-income group is less than the center of the data for the high-income group. More importantly, every country in the largest cluster of the high-income group has a higher cancer mortality rate than every country in the largest cluster of the low-income group.)
Q3: A local swim club offers competitive meets throughout the season. Some races are open competitions with no qualifying times. Other races are qualified competitions. For example, 15- to 16-year-old girls are eligible to swim the 100-meter freestyle in a qualified competition if they posted a time of 1 minute 8 seconds or less during the qualifying period. Consider the group of 15- to 16-year-old females who swim the 100-meter freestyle in the open competitions and the group of 15- to 16-year-old females who swim the 100-meter freestyle in the qualified competitions. Which group's distribution of race times would most likely have the largest standard deviation?
The group competing in the open competitions is most likely to have a higher standard deviation than the group competing in the qualified competitions. (Standard deviation is one way to measure variability. Any 15- to 16-year-old female is allowed to compete in the open races, so there will be slower swimmers in this group and more variability in the swim times.)
Q12: A chemistry class has learned how to read a sensitive high-caliber balance scale (this is not a digital scale). Since most students are very good at reading the scale, the instructor decides it's time to test their skills. The instructor asks each student to weigh the same 1.6 ounce bag of Peanut M&Ms, and then records each student's reading of the balance scale. Which histogram shows the distribution of the students' balance scale readings?
Histogram II (Since most of the students are very good at reading the balance scale, we would expect most students to read the scale close to the accurate weight, with a few readings that are too high and a few that are too low.)
Consider the data distribution given in each of the histograms. Assume that the horizontal scales are the same. Which distribution of data has the smallest standard deviation and why?
Histogram IV, because more of the data values are clustered close to the mean. (This distribution has the smallest standard deviation because the data values are mostly clustered close to the center, so there is very little spread from the mean. Standard deviation approximates the average distance, or spread, from the mean.)
Consider the data distribution given in the histogram. Which of the following tools are most appropriate to measure the center and spread for this distribution?
Mean and standard deviation (This histogram is bell shaped with a central peak and some of the data values to the right. The mean is the most appropriate measure of center. Since standard deviation measures variability about the mean, it is the most appropriate measure of variability.)
shape
we describe the shape of a distribution as: left skewed, right skewed, symmetric with a central peak (bell shaped), or uniform. Not all distributions have a simple shape that fits into one of these categories.