STATS Chapter 3
A company advertises a mean lifespan of 1000 hours for a particular type of light bulb. If you were in charge of quality control at the factory, would you prefer that the standard deviation of the lifespans for the light bulbs be 5 hours or 50 hours? Why?
5 hours would be preferable since a smaller standard deviation indicates more consistency. The company would prefer to have a consistent product on which customers can depend. A smaller standard deviation is more desirable since it indicates that the light bulb lifespans do not vary much.
According to the Empirical Rule, ________ will be within two standard deviations of the mean.
According to the Empirical Rule, if a distribution is unimodal and symmetric, approximately 95% of the observations will be within two standard deviations of the mean.
A z-score represents how many ______________ a data value is above or below the ______________.
A z-score represents how many standard deviations a data value is above or below the mean.
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is resistant to outliers.
In a typical boxplot, the length of the box indicates which measure of spread?
IQR. Since the box in a boxplot runs from the first quartile to the third quartile, the length of the box is the IQR.
The length of the box in a boxplot is proportional to which of the following?
IQR. The length of the box in a boxplot is proportional to the IQR. The left edge of the box is at the first quartile and the right edge is at the third quartile.
If all the data values in a set are identical, what can you conclude about the standard deviation?
If the data values are all identical, then the mean is equal to that data value. Therefore, there is no spread from the mean, and the standard deviation is zero.
In a boxplot, potential outliers are points that are more than ___ IQRs from the edges of the box.
1.5 In a boxplot, potential outliers are points that are more than 1.5 IQRs below the first quartile or above the third quartile.
The median is often used for which of the following types of distribution?
Skewed. The median is often used for skewed distributions. The mean is not often used for skewed distributions because skew affects the mean more than it affects the median.
If the standard deviation for a data set is zero, what can you conclude about the data?
The data values must all be equal. Standard deviation measures spread. If there is no spread, one can conclude that the values are all the same.
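A quick check of this fact, as a minimal sketch using Python's standard statistics module. The data values are made up for illustration.

```python
import statistics

# Hypothetical data set in which every value is identical.
identical = [7, 7, 7, 7, 7]

# Every deviation from the mean is zero, so the standard deviation is zero.
sd_identical = statistics.pstdev(identical)

# Changing a single value introduces spread, so the standard deviation
# becomes positive.
varied = [7, 7, 7, 7, 8]
sd_varied = statistics.pstdev(varied)
```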
Which statement is NOT true regarding the mean? A. The mean should be used when the distribution is roughly symmetric. B. The mean is the center of gravity or balancing point for the data set. C. The mean is always the best measure of center. D. The calculation of the mean uses all the values in the data set.
The mean is not always the best measure of center. If the distribution is skewed, it might be better to report the median since it is resistant. If the data is qualitative, the mean cannot be used.
Suppose, on the warmest day of the month, the daily high temperature in a city is accidentally recorded as 700 instead of 70 degrees Fahrenheit. Compare the effect this mistake will have on the mean monthly high temperature to the effect on the median monthly high temperature.
The mean will increase significantly, but the median will not change as a result of the mistake. Unlike the median, the mean is not resistant to extreme values and will be significantly affected by changing only one temperature to such an extreme value.
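The effect can be illustrated with a small sketch. The temperatures below are hypothetical and cover only a handful of days rather than a full month; the single changed value plays the role of the recording mistake.

```python
import statistics

# Hypothetical daily highs in degrees Fahrenheit; the warmest day is 70.
correct = [58, 60, 61, 63, 64, 65, 66, 68, 70]

# The same data with the warmest day mistakenly recorded as 700.
mistaken = correct[:-1] + [700]

# The mean uses every value, so the one bad value pulls it upward.
mean_shift = statistics.mean(mistaken) - statistics.mean(correct)

# The median depends only on the middle of the ordered list, so it stays put.
median_shift = statistics.median(mistaken) - statistics.median(correct)
```

With these made-up values the mean rises by 70 degrees (the extra 630 degrees spread over 9 days), while the median does not move.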
Which statement is NOT true regarding the median?
The statement "The median is always one of the values in the data set" is not true. If there is an even number of values, then the median will be the mean of the middle two values, which will not necessarily be a value in the data set.
Which measure of center (mean or median) is resistant? Explain what it means for that measure to be resistant.
The median is resistant because it is not sensitive to extreme values in the data set. If the largest observation were doubled, for example, the median would not change since that largest value does not factor into its computation. This is why the median is used for skewed data.
If an observation has a z-score of 0, this means which of the following?
The observation is equal to the mean. If an observation has a z-score of 0, then it is equal to the mean. The mean is 0 standard deviations away from itself, so it has a z-score of 0.
Describe the sample standard deviation in words rather than with a formula.
The sample standard deviation is the square root of the quotient of the sum of the squared deviations from the mean and (n−1). The formula for the sample standard deviation is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the sample size. The sample variance, $s^2$, is the square of the sample standard deviation.
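The verbal description can be followed step by step in code. This is a minimal sketch: the function name sample_sd and the data values are assumptions chosen for illustration, and the result is compared against the standard library's statistics.stdev.

```python
import math
import statistics


def sample_sd(values):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n                                   # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]   # (x - x_bar)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))


# Hypothetical sample: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = sample_sd(data)    # sample standard deviation
s_squared = s ** 2     # sample variance, 32 / 7
```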
Name two measures of the variation of a distribution, and state the conditions under which each measure is preferred for measuring the variability of a single data set.
The standard deviation is preferred when the data is relatively symmetric. The interquartile range is preferred when the data is strongly skewed or has outliers.
If you calculate the z-score for your height in inches, what unit is used on the z-score?
The z-score will have no units. Calculating the z-score involves subtracting the mean in inches and then dividing by the standard deviation in inches. The inches divide out and leave a number with no units.
Identify when the interquartile range is better than the standard deviation as a measure of dispersion and explain its advantage.
When the distribution is skewed left or right or contains some extreme observations, then the interquartile range is preferred since it is resistant. The IQR is resistant to extreme values in the data, making it a better choice for a skewed distribution.
In a symmetric, unimodal distribution, about two-thirds of the observations are where?
Within one standard deviation of the mean. Approximately two-thirds of the observations in a symmetric, unimodal distribution are within one standard deviation of the mean.
Which of the following is NOT needed to construct a boxplot?
Mean. A boxplot uses the median as a measure of the center, not the mean.
The value that would be right in the middle if you were to sort the data from smallest to largest is called the ______.
Median
Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each day, which measure of central tendency better describes the typical number of text messages per day? 21 22 24 26 26 29 32 32 33 88
Median; The median of 27.5 is a better representative of the center since it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts since only one number is larger than the mean. The one extremely high value of 88 texts/day increases the mean substantially so that it is no longer a good measure of the typical number of texts. The median is often a better measure of center when there are extreme values.
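The arithmetic behind these figures can be checked with a short sketch using Python's standard statistics module and the ten values from the question.

```python
import statistics

# Text messages Elyse sent on each of the last 10 days (from the question).
texts = [21, 22, 24, 26, 26, 29, 32, 32, 33, 88]

mean_texts = statistics.mean(texts)      # 333 / 10 = 33.3
median_texts = statistics.median(texts)  # (26 + 29) / 2 = 27.5

# Values larger than the mean: only the extreme day.
above_mean = [x for x in texts if x > mean_texts]  # [88]
```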
The interquartile range tells us how much space the _____ of the data occupy.
Middle 50%. The interquartile range tells us how much space the middle 50% of the data occupy. It is found by subtracting the first quartile from the third quartile.
Name two measures of the center of a distribution, and state the conditions under which each is preferred for describing the typical value of a single data set.
One measure of the center of a distribution is the mean. This measure is preferred when the distribution is relatively symmetric. One measure of the center of a distribution is the median. This measure is preferred when the distribution is strongly skewed.
When an odd number of data values are arranged in order, the _________ is the middle value.
median
The Empirical Rule applies to distributions that are ________.
Symmetric and unimodal. According to the Empirical Rule, if a distribution is unimodal and symmetric, approximately 68% of the observations will be within one standard deviation of the mean, approximately 95% of the observations will be within two standard deviations of the mean, and nearly all the observations will be within three standard deviations of the mean.
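The percentages in the Empirical Rule match what a normal model gives. The sketch below assumes a standard normal distribution, which is one particular symmetric, unimodal distribution; real data will only approximately follow these figures. The helper name share_within is made up for illustration.

```python
from statistics import NormalDist

# Assumption: a standard normal model (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of the model within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one = share_within(1)    # about 0.68
within_two = share_within(2)    # about 0.95
within_three = share_within(3)  # about 0.997, "nearly all"
```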
The interquartile range (IQR) is the difference between the _______ quartile and the _______ quartile.
Third and first. Quartiles divide the ordered data into four equal parts. The first quartile is the median of the bottom half of the data and separates the first quarter, or 25%, of the data from the upper three-quarters, or 75%, of the data. The second quartile is the median. The third quartile is the median of the top half of the data and separates the lower three-quarters, or 75%, of the data from the upper quarter, or 25%, of the data. The IQR is the difference between the third and first quartiles, that is, IQR=Q3−Q1.
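A minimal sketch of the "median of each half" definition, applied to the ten text-message counts from the Elyse question. The function name quartiles_by_halves is hypothetical. For an odd number of values this sketch leaves the middle value out of both halves; textbooks and software differ on that convention, so other tools may report slightly different quartiles. The fences use the 1.5 IQR rule for potential outliers.

```python
import statistics


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the bottom and top halves.

    Assumes at least two values. With an odd count, the middle value
    is excluded from both halves (one common convention).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)


texts = [21, 22, 24, 26, 26, 29, 32, 32, 33, 88]

q1, q3 = quartiles_by_halves(texts)  # 24 and 32
iqr = q3 - q1                        # 8

# Potential outliers lie more than 1.5 IQRs beyond the edges of the box.
lower_fence = q1 - 1.5 * iqr         # 12.0
upper_fence = q3 + 1.5 * iqr         # 44.0
potential_outliers = [x for x in texts if x < lower_fence or x > upper_fence]
```

Here only the value 88 falls outside the fences, so it is the one potential outlier.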
Can the variance of a data set ever be negative? Explain.
No; since the variance is based on the squared deviations from the mean and N, it cannot be negative. A population variance is the sum of the squared deviations from the mean, divided by N. Since the deviations from the mean are squared, each one is zero or positive, never negative. N is the number of objects in the population and must be positive. As a result, the variance must be zero or a positive number, never negative.
A community college faculty is negotiating a new contract with the school board. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the faculty want to give the community the impression that they deserve higher salaries, should they advertise the mean or median of their current salaries?
The faculty should use the median to make their argument. The median will be lower than the mean since the mean is influenced by the few extremely high salaries. The median is resistant to the extremely high salaries and will not be influenced as much as the mean. As a result, the median will be lower than the mean. By reporting the lower measure of center, the faculty can better make their case that they deserve higher salaries.
If all the data values in a population are converted to z-scores, the distribution of z-scores will have what mean?
The mean of the z-scores will be zero. When data is standardized by converting it to z-scores, the new distribution has a mean of zero and a standard deviation of 1.
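This can be checked numerically. The sketch below treats a small, made-up list of heights as the whole population, so it uses the population standard deviation (statistics.pstdev) when standardizing.

```python
import statistics

# Hypothetical population of heights in inches.
population = [61, 64, 66, 67, 69, 70, 72, 75]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# z-score: how many standard deviations each value is above or below the mean.
z_scores = [(x - mu) / sigma for x in population]

z_mean = statistics.mean(z_scores)   # 0, up to floating-point rounding
z_sd = statistics.pstdev(z_scores)   # 1, up to floating-point rounding
```

Because inches are divided by inches, the z-scores carry no units.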
The mean represents the typical value in a set of data for what type of distribution?
The mean represents the typical value in a set of data for distributions that are roughly symmetric.
How can you tell from a boxplot if the distribution is symmetric?
The median is in the center of the box, and the left and right whiskers are approximately the same length. If a distribution is symmetric, then the distance from the median to the first quartile will be about the same as the distance from the median to the third quartile, and the distance from the minimum to Q1 will be about the same as the distance from Q3 to the maximum.
A community college school board is negotiating a new contract with the college faculty. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the school board wants to give the community the impression that the faculty are already overpaid, should they advertise the mean or median of the faculty salaries?
The school board should use the mean to make their argument. The mean will be higher than the median since it will be influenced by the few high salaries. The mean is not resistant and will be pulled in the direction of the tail for a skewed distribution. Since the distribution is right-skewed, the mean will be pulled to the right which is a higher value.
To compute the variance, what should one do?
The variance is the square of the standard deviation. It is represented symbolically by s^2.
If someone's gross annual income has a z-score of positive 2, what can be concluded?
Their income is 2 standard deviations above the mean income.
a. In your own words, describe to someone who knows only a little statistics how to recognize when an observation is an outlier. What action(s) should be taken with an outlier? b. Which measure of the center (mean or median) is more resistant to outliers, and what does "resistant to outliers" mean?
a. Outliers are observed values far from the main group of data. In a histogram they are separated from the others by space. Outliers must be looked at in closer context to know how to treat them. If they are mistakes, they might be removed or corrected. If they are not mistakes, you might do the analysis twice, once with and once without the outliers. Outliers are observed values that lie outside the range of the main group of data. When an outlier is present, the observer needs to consider it more closely. Sometimes it is just a mistake that happened while collecting the data and can be corrected or discarded. Other times it is a legitimately observed value. In those cases, the analysis needs to be presented once with outliers and once without outliers to give a better idea of what a typical value is. Note that in statistics, potential outliers are defined as observations that are more than 1.5 interquartile ranges below the first quartile or above the third quartile, not above or below the median. Also note that a potential outlier is not the same thing as an outlier. b. The median is more resistant, which indicates that it usually changes less than the mean when comparing data with and without outliers. The median is more resistant to outliers than the mean, especially when the outliers have extreme values. An extreme value can distort the mean because the mean shifts heavily in the direction of the extreme value. The amount the median shifts depends on the number of observations, because the median is determined by the middle value after ordering all the observations from lowest to highest. If there is only one outlier with an extremely large value, the median will shift very slightly, while the mean will change significantly.