Chapter 3 STATS
3.1.RA-2 Fill in the blank below. The mean of a collection of data is located at the ______ of a distribution of data.
The mean of a collection of data is located at the "balancing point" of a distribution of data.
In a boxplot, the vertical line inside the box marks the location of the
median.
Which of the following is NOT one of the five numbers needed to make a boxplot? Q1 The maximum Median Mean
The mean is not shown in a boxplot, so it is not used to construct boxplots.
When a distribution contains outliers, which of the following is the best choice for a measure of center? Choose the correct answer below. Interquartile range Mean Median Standard deviation
note: The median is resistant to outliers, so when a distribution contains outliers, the median is the best choice for a measure of center.
3.1.RA-4 The mean represents the typical value in a set of data for what type of distribution? A. For distributions that are roughly symmetric B. For distributions that are bimodal C. For distributions that are skewed D. For all distributions
A. For distributions that are roughly symmetric
In a right-skewed distribution, which of the following is true? A. The mean and median are approximately the same. B. The mean tends to be greater than the median. C. The mean tends to be less than the median. D. None of these
B. The mean tends to be greater than the median. note: The mean tends to be greater than the median in a right-skewed distribution. This is because the higher values to the right of the center pull the mean up more than they affect the median.
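A minimal sketch of this pull on the mean, using a small made-up data set (the values are hypothetical and chosen only to be right-skewed):

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one lies far to the right.
data = [1, 2, 2, 3, 3, 3, 4, 20]

data_mean = mean(data)      # pulled upward by the large value
data_median = median(data)  # middle of the sorted values

print("mean:", data_mean)
print("median:", data_median)
```

In this example the single large value raises the mean above the median, while the median stays with the bulk of the data.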
The value that would be right in the middle if you were to sort the data from smallest to largest is called the
median note: The median is the value that would be right in the middle if you were to sort the data from smallest to largest. About 50% of the observations are below it and about 50% of the observations are above it.
Name two measures of the center of a distribution, and state the conditions under which each is preferred for describing the typical value of a single data set. What are two measures of the center of a distribution? interquartile range and standard deviation first quartile and third quartile median and mean
median and mean
3.2.RA-3 A standard unit measures which of the following? A. How many standard deviations away an observation is from the mean B. How many standard deviations away an observation is from the median C. The magnitude of the standard deviation D. The interval within which approximately 68% of the observations fall
A. How many standard deviations away an observation is from the mean note: A standard unit is how many standard deviations away an observation is from the mean. A measurement converted to standard units is called a z-score.
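As a rough illustration, a z-score can be computed by subtracting the mean and dividing by the standard deviation. The function name `z_score` and the scores below are assumptions made up for this sketch, and the sample standard deviation is used:

```python
from statistics import mean, stdev

def z_score(x, values):
    """Return how many sample standard deviations x lies from the mean of values."""
    return (x - mean(values)) / stdev(values)

# Hypothetical exam scores, used only for illustration.
scores = [70, 75, 80, 85, 90]

print(z_score(80, scores))  # the mean itself is 0 standard deviations from the mean
print(z_score(90, scores))  # a positive z-score: above the mean
```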
For most applications, why is the standard deviation preferred over the variance? A. The units for the variance are always squared. B. The standard deviation is easier to calculate than the variance. C. The standard deviation more accurately measures the variability in a distribution. D. All of the above
A. The units for the variance are always squared. note: The units for the variance are always squared. Measuring spread with the variance would put the spread in different units from the center, whereas the standard deviation is in the same units as the data and the mean.
Boxplots are NOT recommended for use with which of the following types of distributions? Unimodal Skewed Symmetric Bimodal or any multimodal distribution
Correct answer: Bimodal or any multimodal distribution Boxplots are best used only for unimodal distributions because they hide bimodality or any multimodality.
What is the first step to do with potential outliers? Choose the correct answer below. A. Eliminate them from the data set B. Assume there was an error in the sampling process C. Assume there was an error in entering the data D. Investigate further
The first step with potential outliers is always to investigate. A potential outlier might not be an outlier at all. Or a potential outlier might tell an interesting story, or it might be the result of an error in entering data.
A dieter recorded the number of calories he consumed at lunch for one week. As you can see, a mistake was made on one entry. The calories are listed in increasing order below. 349, 371, 386, 398, 412, 4190 When the error is corrected by removing the extra 0, will the mean change? Will the median? Explain without doing any calculation.
note: The median is resistant to outliers and extreme values because it orders the data from lowest to highest and looks at the middle value. The highest value does not change the order, and so it does not change the median. The mean is the balancing point for the data set. When looking at the shape of a histogram, the mean is the point which balances the weight on both sides. If an extreme value is placed on one end of the mean, it has to shift in that direction to keep everything balanced.
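The reasoning can be checked afterward with a short computation. This is only a sketch; the list names `recorded` and `corrected` are made up here, and the corrected entry is 419 (the recorded 4190 with the extra 0 removed):

```python
from statistics import mean, median

recorded = [349, 371, 386, 398, 412, 4190]   # as recorded, with the extra 0
corrected = [349, 371, 386, 398, 412, 419]   # extra 0 removed from the last entry

# The middle two values (386 and 398) are the same in both lists,
# so the median is unchanged.
print("median:", median(recorded), "->", median(corrected))

# The mean uses every value, so fixing the largest entry lowers it.
print("mean:", mean(recorded), "->", mean(corrected))
```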
The symbol ∑ stands for which of the following? Multiplication Summation Division Finding the mean
note: The symbol ∑ stands for summation. If x represented a single observation, then ∑x would mean that all the values should be added together.
3.1.RA-7 To compute the variance, what should one do? A. Double the standard deviation. B. Square the standard deviation. C. Divide the standard deviation by n − 1. D. Take the square root of the standard deviation.
note: The variance is the square of the standard deviation. It is represented symbolically by s² (s squared).
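A quick check of this relationship, using Python's standard `statistics` module on a made-up data set (the values are hypothetical):

```python
import math
from statistics import stdev, variance

# Hypothetical data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = stdev(data)             # sample standard deviation
s_squared = variance(data)  # sample variance

# Squaring the standard deviation gives the variance (up to rounding error).
print(math.isclose(s ** 2, s_squared))
```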
For what purpose is the median used? A. To give the spread of a distribution B. To measure the variation of a data set C. To give a typical value of a data set D. None of these
C. To give a typical value of a data set note: The median is a typical value of a data set. It is used particularly when the distribution is skewed.
Name two measures of the center of a distribution, and state the conditions under which each is preferred for describing the typical value of a single data set. Under what conditions is the median preferred? A. The median is preferred when there are few data points. B. The median is preferred when the data is strongly skewed or has outliers. C. The median is preferred when there are many data points. D. The median is preferred when the data is relatively symmetric.
B. The median is preferred when the data is strongly skewed or has outliers. note: The median provides a better measure of center when the data is skewed or has outliers because the presence of an outlier has a much greater effect on the mean.
The median is often used for which of the following types of distribution? Uniform Skewed Symmetric Bimodal
Skewed note: The median is often used for skewed distributions. The mean is not often used for skewed distributions because skew affects the mean more than it affects the median.
Why is the mean different from the median?
note: The median gives a better measure of center for this distribution because the professor's age is an outlying observation. The median also tends to give a better representation of a typical observation in a skewed distribution.
Which measure of the center (mean or median) is more resistant to outliers, and what does "resistant to outliers" mean?
The median is more resistant, which indicates that it usually changes less than the mean when comparing data with and without outliers. note: The median is more resistant to outliers than the mean, especially when the outliers have extreme values. An extreme value can distort the mean because the mean shifts heavily in the direction of that value. The amount the median shifts depends on the number of observations, because the median is determined by the middle value after ordering all the observations from lowest to highest. If there is only one outlier with an extremely large value, the median will shift very slightly, while the mean will change significantly.
3.2.RA-1 According to the Empirical Rule, ________ will be within two standard deviations of the mean.
note: According to the Empirical Rule, if a distribution is unimodal and symmetric, approximately 95% of the observations will be within two standard deviations of the mean.
If the mean and the median of a distribution are approximately the same, then the shape of the distribution is likely to be _______.
note: If the mean and the median of a distribution are approximately the same, then the shape of the distribution is likely to be symmetric.
In a boxplot, the whiskers extend to which of the following? Choose the correct answer below. A. The smallest and largest values in the data set B. To the most extreme values that are not potential outliers C. To the first and third quartiles D. None of these
note: In a boxplot, the whiskers extend to the most extreme values that are not potential outliers. Potential outliers are then represented by other markers, such as dots.
The interquartile range tells us how much space the _____ of the data occupy.
note: The interquartile range tells us how much space the middle 50% of the data occupy. It is found by subtracting the first quartile from the third quartile (IQR = Q3 − Q1).
The length of the box in a boxplot is proportional to which of the following? Choose the correct answer below. IQR Mean Median Standard deviation
note: The length of the box in a boxplot is proportional to the IQR. The left edge of the box is at the first quartile and the right edge is at the third quartile.
In a boxplot, potential outliers are points that are more than ___ IQRs from the edges of the box.
In a boxplot, potential outliers are points that are more than 1.5 IQRs below the first quartile or above the third quartile.
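The 1.5 IQR rule can be sketched in a few lines. The helper names `quartiles` and `potential_outliers` are made up for this example, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one common convention (software packages may compute quartiles slightly differently). The data are the dieter's calorie entries from this chapter:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(upper)

def potential_outliers(values):
    """Return values more than 1.5 IQRs below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

calories = [349, 371, 386, 398, 412, 4190]
flagged = potential_outliers(calories)
print(flagged)
```

Under this convention, only the mistaken entry falls outside the fences.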
When comparing groups, if one group is strongly skewed or has outliers and the other is symmetric, which of the following should be used to compare the groups? A. The median and interquartile range for the skewed group and the mean and standard deviation for the symmetric group B. The mean and standard deviation for the skewed group and the median and interquartile range for the symmetric group C. The means and standard deviations D. The medians and interquartile ranges
D. The medians and interquartile ranges note: When comparing two distributions, one should always use the same measures of center and spread for both distributions. Since the mean will be affected by the skew or outliers in the first distribution, use the median and interquartile range for both distributions.
The ______ is a number that measures how far away the typical observation is from the mean.
answer: standard deviation note: For most distributions, a majority of observations are within one standard deviation of the mean value.
3.2.RA-2 The Empirical Rule applies to distributions that are
answer: symmetric and unimodal. note: According to the Empirical Rule, if a distribution is unimodal and symmetric, approximately 68% of the observations will be within one standard deviation of the mean, approximately 95% of the observations will be within two standard deviations of the mean, and nearly all the observations will be within three standard deviations of the mean.
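These percentages can be illustrated with a normal model, which is the idealized unimodal, symmetric distribution. This is a sketch under that assumption; the helper name `share_within` is made up here, and real data will match the rule only approximately:

```python
from statistics import NormalDist

# Assumption: a standard normal model (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, "standard deviation(s):", round(share_within(k), 4))
```

The three proportions come out close to 68%, 95%, and 99.7%, matching the rule.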
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.
note: The median is resistant to outliers. This makes it a good choice for a measure of center when a distribution is skewed.
The ______ is another term for the arithmetic average.
mean note: The mean is another term for the arithmetic average. It can be thought of as the balancing point of a distribution of data.
3.2.RA-4 If an observation has a z-score of 0, this means which of the following? Choose the correct answer below. A. The observation is equal to the standard deviation. B. The observation is equal to the median. C. The z-score was computed incorrectly. D. The observation is equal to the mean.
D. The observation is equal to the mean. note: If an observation has a z-score of 0, then it is equal to the mean. The mean is 0 standard deviations away from itself, so it has a z-score of 0.
When a distribution is skewed, the _______ is used to measure the center and the _______ is used to measure variation.
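note: When a distribution is skewed, the median is used to measure the center and the interquartile range is used to measure variation. The mean and standard deviation are used to measure the center and variation, respectively, when a distribution is symmetric.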
3.1.1 A sociologist says, "Typically, men in a certain country still earn more than women." What does this statement mean? A. The center of the distribution of salaries for men in the country is greater than the center for women. B. The highest paid people in the country are men. C. All women's salaries in the country are less varied than all men's salaries. D. All men make more than all women in the country.
A. The center of the distribution of salaries for men in the country is greater than the center for women. note: In a distribution of values, the typical value is given by the mean. In this case, the average salary of a man is higher than that of a woman, so when comparing the distribution of men's salaries to women's salaries, the center of the distribution for men is greater than the center of the distribution for the women.
3.1.RA-6 In a symmetric, unimodal distribution, about two-thirds of the observations are where? A. Within three standard deviations of the mean B. Within two standard deviations of the mean C. More than one standard deviation from the mean D. Within one standard deviation of the mean
D. Within one standard deviation of the mean
In your own words, describe to someone who knows only a little statistics how to recognize when an observation is an outlier. What action(s) should be taken with an outlier?
Outliers are observed values far from the main group of data. In a histogram they are separated from the others by space. Outliers must be looked at in closer context to know how to treat them. If they are mistakes, they might be removed or corrected. If they are not mistakes, you might do the analysis twice, once with and once without the outliers. note: Outliers are observed values that lie outside the range of the main group of data. When an outlier is present, the observer needs to consider it more closely. Sometimes it is just a mistake that happened while collecting the data and can be corrected or discarded. Other times it is a legitimately observed value. In those cases, the analysis needs to be presented once with outliers and once without outliers to give a better idea of what a typical value is. Note that in statistics, potential outliers are defined as observations that are more than 1.5 interquartile ranges below the first quartile or above the third quartile, not above or below the median. Also note that a potential outlier is not the same thing as an outlier.
Name two measures of the center of a distribution, and state the conditions under which each is preferred for describing the typical value of a single data set. Under what conditions is the mean preferred? A. The mean is preferred when the data is relatively symmetric. B. The mean is preferred when the data is strongly skewed or has outliers. C. The mean is preferred when there are few data points. D. The mean is preferred when there are many data points.
A. The mean is preferred when the data is relatively symmetric.
3.2.RA-5 Which of the following can be used to compare values measured in different units, such as inches and pounds? z-score standard deviation standard error interquartile range
answer: z-score note: The z-score measures distance from a mean in terms of standard deviations, so it can be used to compare values measured in different units, such as inches and pounds.
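A small sketch of such a comparison. All of the numbers below (the person's height and weight, and the group means and standard deviations) are hypothetical, and the function name `z_score` is an assumption for this example:

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Hypothetical values: a height in inches and a weight in pounds,
# each compared against a made-up group mean and standard deviation.
height_z = z_score(74, mean=70, sd=3)
weight_z = z_score(190, mean=170, sd=25)

# Both results are in standard units, so they can be compared directly.
print(height_z, weight_z)
print("height is more unusual" if abs(height_z) > abs(weight_z) else "weight is more unusual")
```

Because both z-scores are unitless, the height (more than one standard deviation above its mean) can be judged more unusual than the weight, even though inches and pounds cannot be compared directly.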
3.1.19 In a recent competition, do you think the standard deviation of the running times for all men who ran the 100-meter race would be larger or smaller than the standard deviation of the running times for the men's marathon? Explain. A. The standard deviation for the 100-meter event would be less. All the runners come to the finish line within a few seconds of each other. In the marathon, the runners can be quite widely spread after running that long distance. B. The standard deviation for the marathon event would be less. Many more runners compete in a marathon rather than a 100-meter event. Therefore, the average time will be determined with greater precision. C. The standard deviation for the marathon event would be less. All the runners finish the race in a matter of seconds. In the marathon, the runners can be quite widely spread after running that long distance. D. The standard deviation for the 100-meter event would be less. All the runners finish the race in a matter of seconds. In the marathon, the runners take at least a few hours to complete the course.
note: Since the running times in the 100-meter event will be within a few seconds of each other, the running times will have small variation. In the marathon, since the running times are likely to be minutes apart, the times will have greater variation. Thus, the marathon running times will have a greater standard deviation.