Stats - Chapter 3
In a boxplot, potential outliers are points that are more than ___ IQRs from the edges of the box.
1.5 IQRs
The mean of a collection of data is located at the ________ of a distribution of data.
Balancing Point
Boxplots are NOT recommended for use with which of the following types of distributions?
Bimodal or any multimodal distribution Boxplots are best used only for unimodal distributions because they hide bimodality or any multimodality.
The mean represents the typical value in a set of data for what type of distribution?
Distributions that are roughly symmetric.
A standard unit measures which of the following?
How many standard deviations away an observation is from the mean
The length of the box in a boxplot is proportional to which of the following?
IQR The length of the box in a boxplot is proportional to the IQR. The left edge of the box is at the first quartile and the right edge is at the third quartile.
If an observation has a z-score of 0, this means which of the following?
If an observation has a z-score of 0, then it is equal to the mean. The mean is 0 standard deviations away from itself, so it has a z-score of 0.
What is the first step to do with potential outliers?
Investigate Further The first step with potential outliers is always to investigate. A potential outlier might not be an outlier at all. Or a potential outlier might tell an interesting story, or it might be the result of an error in entering data.
The five-number summary for a distribution of final exam scores is 40, 72, 79, 90, 96. Explain why it is not possible to draw a boxplot based on this information. (Hint: What more do you need to know?)
It isn't possible to draw a boxplot based on the five-number summary because the boxplot must mark all potential outliers. Since the minimum is lower than the left limit, there may be other unknown outliers between the minimum and left limit, and so the boxplot cannot mark all potential outliers.
What does a Symmetric Distribution look like?
Left and right hand sides are roughly mirror images.
The ____ is another term for the arithmetic average.
Mean The mean is another term for the arithmatic average. It can be thought of as the balancing point of a distribution of data.
When a distribution is skewed, the _______ is used to measure the center and the _______ is used to measure variation.
Median and interquartile range. The mean and standard deviation are used to measure the center and variation, respectively, when a distribution is symmetric.
What are the 5 numbers needed for a boxplot?
Min Q1 Median Q3 Max
In your own words, describe to someone who knows only a little statistics how to recognize when an observation is an outlier. What action(s) should be taken with an outlier?
Outliers are observed values far from the main group of data. In a histogram they are separated from the others by space. Outliers must be looked at in closer context to know how to treat them. If they are mistakes, they might be removed or corrected. If they are not mistakes, you might do the analysis twice, once with and once without the outliers.
The median is often used for which of the following types of distribution?
Skewed The median is often used for skewed distributions. The mean is not often used for skewed distributions because skew affects the mean more than it affects the median.
The ______ is a number that measures how far away the typical observation is from the mean.
Standard Deviation For most distributions, a majority of observations are within one standard deviation of the mean value.
What doe the symbol ∑ stand for?
Summation
The Empirical Rule applies to distributions that are ________.
Symmetric and Unimodal According to the Empirical Rule, if a distribution is unimodal and symmetric, approximately 68% of the observations will be within one standard deviation of the mean, approximately 95% of the observations will be within two standard deviations of the mean, and nearly all the observations will be within three standard deviations of the mean.
A sociologist says, "Typically, men in a certain country still earn more than women." What does this statement mean?
The center of the distribution of salaries for men in the country is greater than the center for women. In a distribution of values, the typical value is given by the mean. In this case, the average salary of a man is higher than that of a woman, so when comparing the distribution of men's salaries to women's salaries, the center of the distribution for men is greater than the center of the distribution for the women.
A dieter recorded the number of calories he consumed at lunch for one week. As you can see, a mistake was made on one entry. The calories are listed in increasing order below. 349, 371, 386, 398, 412, 4190 When the error is corrected by removing the extra 0, will the mean change? Will the median? Explain without doing any calculation.
The corrected value will give a different mean but not a different median. Medians are resistant to outliers and not as affected by extreme values, but the more extreme a value is, the more the mean is affected by it.
In a right-skewed distribution, which of the following is true?
The mean tends to be greater than the median in a right-skewed distribution. This is because the higher values to the right of the center pull the mean up more than they affect the median.
The value that would be right in the middle if you were to sort the data from smallest to largest is called the ______.
The median Median is the value that would be right in the middle if you were to sort the data from smallest to largest. About 50% of the observations are below it and about 50% of the observations are above it.
For what purpose is the median used?
The median is a typical value of a data set. It is used particularly when the distribution is skewed.
Which measure of the center (mean or median) is more resistant to outliers, and what does "resistant to outliers" mean?
The median is more resistant to outliers than the mean, especially when the outliers have extreme values. The presence of an extreme value can cause the mean to become very skewed because it will shift heavily in the direction of the extreme value. The amount the median shifts by is based on the number of data observations, because it is determined by the middle value after ordering all the observations from lowest to highest. If there is only one outlier with an extremely large value the median will shift very slightly, while the mean will change significantly.
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.
The median is resistant to outliers. This makes it a good choice for a measure of center when a distribution is skewed.
When a distribution contains outliers, which of the following is the best choice for a measure of center?
The median is resistant to outliers, so when a distribution contains outliers, the median is the best choice for a measure of center.
For most applications, why is the standard deviation is preferred over the variance?
The units for the variance are always squared. The units for the variance are always squared. Measuring spread distance with the variance implies that the units for measuring spread are different from the units for measuring center, which is not true.
In a boxplot, the vertical line inside the box marks the location of the _______.
The vertical line inside the box marks the location of the median.
In a boxplot, the whiskers extend to which of the following?
To the most extreme values that are not potential outliers In a boxplot, the whiskers extend to the most extreme values that are not potential outliers. Potential outliers are then represented by others markers, such as dots.
In a symmetric, unimodal distribution, about two-thirds of the observations are where?
Within one standard deviation of the mean
What can be used to compare values measured in different units, such as inches and pounds?
Z-score The z-score measures distance from a mean in terms of standard deviations, so it can be used to compare values measured in different units, such as inches and pounds.
According to the Empirical Rule, ________ will be within two standard deviations of the mean.
approximately 95% of the observations
The interquartile range tells us how much space the _____ of the data occupy.
middle 50% The interquartile range tells us how much space the middle 50% of the data occupy. It is found by subtracting the third quartile from the first quartile.
To compute the variance, what should one do?
the square of the standard deviation The variance is the square of the standard deviation. It is represented symbolically by s2.