Statistics Chapter 3
Distributions of gestation periods (lengths of pregnancy) for a particular animal are roughly bell-shaped. The mean gestation period for this animal is 258 days, and the standard deviation is 10 days for females who go into spontaneous labor. Which is more unusual, a baby being born 10 days early or a baby being born 10 days late? Explain.
Both events are equally likely
The mean represents the typical value in a set of data for what type of distribution?
For distributions that are roughly symmetric
A standard unit measures?
How many standard deviations away an observation is from the mean
The length of the box in a boxplot is proportional to?
IQR (interquartile range)
Suppose you have a data set with the weights of all members of a high school soccer team and all members of a high school academic decathlon team (a team of students selected because they often answer quiz questions correctly). Which team do you think would have a larger standard deviation of weights? Explain.
The academic decathlon team would have a larger standard deviation of weights. Academics don't require a specific weight to succeed, so the distribution of weights should mirror that of the general population.
A sociologist says, "Typically, men in a certain country still earn more than women." What does this statement mean?
The center of the distribution of salaries for men in the country is greater than the center for women.
According to the Empirical Rule, ________ will be within two standard deviations of the mean.
approximately 95% of the observations
Boxplots are NOT recommended for use with which types of distributions?
bimodal or any multimodal distribution
What is the first step to do with potential outliers?
investigate further
The ____ is another term for the arithmetic average.
mean
Which of the following is NOT one of the five numbers needed to make a boxplot? Mean The maximum Median Q1
mean
In a boxplot, the vertical line inside the box marks the location of the _______.
median
The value that would be right in the middle if you were to sort the data from smallest to largest is called the ______.
median
When a distribution contains outliers, which of the following is the best choice for a measure of center?
median
The interquartile range tells us how much space the _____ of the data occupy.
middle 50%
The median is often used for which types of distribution?
skewed
To compute the variance, what should one do?
square the standard deviation
The ______ is a number that measures how far away the typical observation is from the mean.
standard deviation
The symbol ∑ stands for?
summation
If the mean and the median of a distribution are approximately the same, then the shape of the distribution is likely to be _______.
symmetric
The Empirical Rule applies to distributions that are ________.
symmetric and unimodal
Which of the following can be used to compare values measured in different units, such as inches and pounds?
z-score
In a boxplot, potential outliers are points that are more than ___ IQRs from the edges of the box.
1.5
In a right-skewed distribution, which of the following is true?
The mean tends to be greater than the median.
The mean of a collection of data is located at the ______ of a distribution of data.
"balancing point"
When you are comparing two sets of data, and one set is strongly skewed and the other is symmetric, which measures of the center and variation should you choose for the comparison?
The medians and interquartile ranges
If an observation has a z-score of 0, this means which of the following?
The observation is equal to the mean
In a recent competition, do you think the standard deviation of the running times for all men who ran the 100-meter race would be larger or smaller than the standard deviation of the running times for the men's marathon? Explain.
The standard deviation for the 100-meter event would be less. All the runners come to the finish line within a few seconds of each other. In the marathon, the runners can be quite widely spread after running that long distance.
A real estate agent claims that all things being equal, houses with swimming pools tend to sell for less than those without swimming pools. What does this statement mean?
The typical price for homes with pools is smaller than the typical price for homes without pools.
For what purpose is the median used?
To give a typical value of a data set
In a boxplot, the whiskers extend to which of the following?
To the most extreme values that are not potential outliers
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.
resistant
An IQ test has a mean of 95 and a standard deviation of 5. Which is more unusual, an IQ of 90 or an IQ of 105?
An IQ of 105 is more unusual because its corresponding z-score, 2, is further from 0 than the corresponding z-score of −1 for an IQ of 90.
Name two measures of the variation of a distribution, and state the conditions under which each measure is preferred for measuring the variability of a single data set.
The interquartile range is preferred when the data is strongly skewed or has outliers. The standard deviation is preferred when the data is relatively symmetric.
In your own words, describe to someone who knows only a little statistics how to recognize when an observation is an outlier. What action(s) should be taken with an outlier?
Outliers are observed values far from the main group of data. In a histogram they are separated from the others by space. Outliers must be looked at in closer context to know how to treat them. If they are mistakes, they might be removed or corrected. If they are not mistakes, you might do the analysis twice, once with and once without the outliers.
A dieter recorded the number of calories he consumed at lunch for one week. As you can see, a mistake was made on one entry. The calories are listed in increasing order below. 349, 371, 386, 398, 412, 4190 When the error is corrected by removing the extra 0, will the mean change? Will the median? Explain without doing any calculation.
The corrected value will give a different mean but not a different median. Medians are resistant to outliers and not as affected by extreme values, but the more extreme a value is, the more the mean is affected by it.
Which measure of the center (mean or median) is more resistant to outliers, and what does "resistant to outliers" mean?
The median is more resistant, which indicates that it usually changes less than the mean when comparing data with and without outliers.