Stats 2 Chapter 3
Both the Empirical Rule and Chebyshev's Inequality can be used to determine the percentage of data that lie within a certain range. What assumptions must be made about the underlying distribution before using these rules?
The distribution must be roughly bell-shaped for the Empirical Rule to apply, but Chebyshev's Inequality holds for distributions of any shape.
State an advantage and a disadvantage of using the range instead of the variance as a measure of dispersion in sample data.
The range is easier to calculate, but because it uses only the largest and smallest values, it is heavily affected by extreme values in the data set.
In a typical boxplot, the length of the box indicates which measure of spread?
interquartile range (IQR)
The value that divides a histogram into two equal areas is called the ____________. The value that serves as a balancing point for a histogram is the ____________.
The value that divides a histogram into two equal areas is called the MEDIAN. The value that serves as a balancing point for a histogram is the MEAN.
What is the symbol used to represent the sample mean?
x̄ (x-bar)
Population standard deviation
1. Stat Edit 2. Enter the numbers 3. Stat Calc 4. vars-1 5. σx
Population variance
1. Stat Edit 2. Enter the numbers 3. Stat Calc 4. vars-1 5. σx squared
Shape of distribution
If the mean and median are very close, the distribution is roughly symmetric. If the mean is greater than the median, the distribution is right skewed. If the mean is less than the median, the distribution is left skewed.
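The mean-versus-median rule above can be sketched in Python as an illustration. The `tolerance` used to decide what counts as "very close" (5% of the median here) is a hypothetical choice, not a standard cutoff, and the data sets are made up.

```python
from statistics import mean, median


def describe_shape(data, tolerance=0.05):
    """Guess a distribution's shape by comparing its mean and median.

    tolerance is a hypothetical cutoff: if the mean and median differ by
    at most tolerance * |median|, they are treated as "very close".
    """
    m, md = mean(data), median(data)
    if abs(m - md) <= tolerance * abs(md):
        return "roughly symmetric"
    # A mean pulled above the median suggests a long right tail, and vice versa.
    return "right skewed" if m > md else "left skewed"


print(describe_shape([1, 2, 3, 4, 5]))      # roughly symmetric
print(describe_shape([1, 2, 3, 4, 50]))     # right skewed
print(describe_shape([1, 46, 47, 48, 49]))  # left skewed
```

Real data rarely give a clean answer, so a histogram is still worth checking alongside this comparison.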
quartiles
First quartile (Q1): 25th percentile. Second quartile (Q2): 50th percentile (the median). Third quartile (Q3): 75th percentile.
percentile
i = (p/100)·n, where p = desired percentile and n = number of values in the data set. Example: i = 2 means the second position in the ordered data set.
What is the symbol used to represent the sample standard deviation?
s
Explain why the mean should not be found for a sample of zip codes. Which measure of center should be used instead?
Even though they are numeric data, zip codes are qualitative since they do not measure or count anything. The mean cannot be found since adding zip codes would be meaningless. For qualitative data, the mode is the only measure of center that can be found.
Which measure of center must be equal to an actual data value? Explain why.
Since the mode is the most frequent observation that occurs in the data set, it must be an actual value from the data set.
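Chebyshev's Theorem
At least 75% of the values lie within 2 standard deviations of the mean; at least 88.9% (often rounded to 89%) lie within 3 standard deviations.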
Using Chebyshev's Theorem, determine the range of prices that includes at least 82% of the homes around the mean.
According to Chebyshev's Theorem, for any distribution, the percent of the values that fall within z standard deviations of the mean is at least (1 − 1/z²)·100%, for z greater than 1. Setting (1 − 1/z²)·100% = 82% gives z ≈ 2.36, so the range is x̄ − 2.36s to x̄ + 2.36s.
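The Chebyshev bound, and solving it backward for z, can be sketched in Python. The function names are hypothetical. The home-price mean and standard deviation are not given on this card, so the sketch stops at z; the price range would then be x̄ − zs to x̄ + zs.

```python
import math


def chebyshev_min_percent(z):
    """Smallest percent of values guaranteed within z SDs of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's Theorem applies for z > 1")
    return (1 - 1 / z**2) * 100


def chebyshev_z(percent):
    """Solve (1 - 1/z^2) * 100 = percent for z."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    return math.sqrt(1 / (1 - percent / 100))


print(chebyshev_min_percent(2))            # 75.0
print(round(chebyshev_min_percent(3), 1))  # 88.9
print(round(chebyshev_z(82), 2))           # 2.36
```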
To find how many of the data values fall within one standard deviation from the mean, find the upper and lower bounds of the interval.
Lower bound: x̄ − s; upper bound: x̄ + s
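Finding the bounds x̄ − s and x̄ + s and then counting the values between them can be sketched in Python. The data are hypothetical, and `within_k_sd` is an illustrative name; `k` generalizes to any number of standard deviations.

```python
from statistics import mean, stdev


def within_k_sd(data, k=1):
    """Return (lower, upper, count) for the interval x̄ - k*s to x̄ + k*s."""
    xbar, s = mean(data), stdev(data)  # stdev divides by n - 1 (sample s)
    lower, upper = xbar - k * s, xbar + k * s
    count = sum(lower <= x <= upper for x in data)
    return lower, upper, count


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
lower, upper, count = within_k_sd(data)
print(f"{lower:.2f} to {upper:.2f}: {count} of {len(data)} values")
# 2.86 to 7.14: 6 of 8 values
```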
Name a feature of a distribution that is more easily seen in a histogram than a boxplot.
the shape of the distribution
Empirical Rule
If a distribution follows a bell-shaped, symmetric curve centered at the mean, approximately 68%, 95%, and 99.7% of the values are expected to fall within one, two, and three standard deviations of the mean, respectively. A data value x can be expressed in terms of its z-score as x = μ + zσ, where μ is the population mean, σ is the population standard deviation, and z is the z-score.
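The Empirical Rule intervals follow from x = μ + zσ with z = ±1, ±2, ±3. A minimal Python sketch, assuming a hypothetical bell-shaped population with μ = 100 and σ = 15:

```python
def value_from_z(mu, sigma, z):
    """x = μ + zσ: the data value that lies z standard deviations from μ."""
    return mu + z * sigma


def empirical_rule_intervals(mu, sigma):
    """Approximate percent of a bell-shaped population within 1, 2, 3 SDs."""
    approx_percent = {1: 68, 2: 95, 3: 99.7}
    return {
        z: (value_from_z(mu, sigma, -z), value_from_z(mu, sigma, z), pct)
        for z, pct in approx_percent.items()
    }


for z, (low, high, pct) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct}% between {low} and {high}")
# about 68% between 85 and 115
# about 95% between 70 and 130
# about 99.7% between 55 and 145
```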
Which of the following is NOT a measure of spread?
midrange
Five-number summary
Minimum, Q1, Q2 (median), Q3, maximum
variability
More variability means the data values are more spread out (greater variation); less variability means they are clustered more closely together (less variation).
What is the symbol used to represent the sample variance?
s^2
A z-score represents how many ______________ a data value is above or below the ______________.
standard deviations; mean
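Upper bound
Add to the mean (e.g., x̄ + s for one standard deviation)
Lower bound
Subtract from the mean (e.g., x̄ − s for one standard deviation)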
Sample standard deviation
1. Stat Edit 2. Enter the numbers 3. Stat Calc 4. vars-1 5. Sx
Sample Variance
1. Stat Edit 2. Enter the numbers 3. Stat Calc 4. vars-1 5. Sx squared
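To cross-check calculator results on a computer, Python's standard-library `statistics` module has matching functions: `stdev` and `variance` divide by n − 1 like Sx, while `pstdev` and `pvariance` divide by n like σx. The data are hypothetical; this is a check, not a replacement for the calculator steps.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Sample statistics divide by n - 1 (the calculator's Sx and Sx squared).
s = stdev(data)
s_squared = variance(data)

# Population statistics divide by n (the calculator's σx and σx squared).
sigma = pstdev(data)
sigma_squared = pvariance(data)

print(f"s = {s:.4f}, s^2 = {s_squared:.4f}")                  # s = 2.1381, s^2 = 4.5714
print(f"sigma = {sigma:.4f}, sigma^2 = {sigma_squared:.4f}")  # sigma = 2.0000, sigma^2 = 4.0000
```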
If i is an integer
The pth percentile (such as the 25th, 50th, or 75th) is the average of the values in position i and position i + 1.
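The position rule, i = (p/100)·n with averaging when i is an integer, can be sketched in Python. The card only covers the integer case; for a non-integer i, this sketch assumes the common textbook convention of rounding i up to the next position. The data and function name are hypothetical.

```python
import math


def percentile(data, p):
    """pth percentile using the position rule i = (p/100)·n.

    If i is an integer, average the values in positions i and i + 1.
    Otherwise (an assumed convention, not stated on the card), round i up.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    i = p * n / 100  # multiply first so whole-number positions stay exact
    if i.is_integer():
        i = int(i)
        # positions are 1-based, list indexes are 0-based
        return (values[i - 1] + values[i]) / 2
    return values[math.ceil(i) - 1]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, n = 8
print(percentile(data, 25))  # i = 2   -> (4 + 4) / 2 = 4.0
print(percentile(data, 50))  # i = 4   -> (4 + 5) / 2 = 4.5
print(percentile(data, 75))  # i = 6   -> (5 + 7) / 2 = 6.0
print(percentile(data, 30))  # i = 2.4 -> position 3 = 4
```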
coefficient of variation
CV = (s / x̄)·100%, i.e., the sample standard deviation divided by the sample mean, times 100%
Interquartile range
Q3-Q1
Range
largest number minus smallest number
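The five-number summary, IQR, and range can be computed together in Python with `statistics.quantiles`. Note that quartile conventions vary: with `method="inclusive"` this data gives Q3 = 5.5, while the position rule on the percentile cards gives Q3 = 6.0, so hand, calculator, and software answers may differ slightly. The data are hypothetical.

```python
from statistics import quantiles


def five_number_summary(data):
    values = sorted(data)
    # method="inclusive" interpolates between positions; other quartile
    # conventions can give slightly different Q1 and Q3.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
mn, q1, q2, q3, mx = five_number_summary(data)
print("five-number summary:", (mn, q1, q2, q3, mx))  # (2, 4.0, 4.5, 5.5, 9)
print("IQR =", q3 - q1)                               # Q3 - Q1 = 1.5
print("range =", mx - mn)                             # largest - smallest = 7
```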
Population z-score
z = (x − μ) / σ
What is the symbol used to represent the population mean?
μ
What is the symbol used to represent the population standard deviation?
σ
What is the symbol used to represent the population variance?
σ^2
Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each day, which measure of central tendency better describes the typical number of text messages per day? 21 22 24 26 26 29 32 32 33 88
Median. The median of 27.5 better represents the center because it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts, since only one value is larger than the mean.
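The answer above can be checked in Python. Ten days need ten values, and the stated mean of 33.3 with median 27.5 implies the extreme tenth value is 88 (the only whole number that makes the mean work with the other nine values), so this sketch assumes 88.

```python
from statistics import mean, median

# Ten days of texts; the final value (88) is inferred from the stated mean.
texts = [21, 22, 24, 26, 26, 29, 32, 32, 33, 88]

avg = mean(texts)
mid = median(texts)
above_mean = sum(x > avg for x in texts)

print(f"mean = {avg:.1f}, median = {mid}")     # mean = 33.3, median = 27.5
print(f"values above the mean: {above_mean}")  # 1
```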
Why does the formula for calculating the sample variance, s² = Σ(x − x̄)² / (n − 1), involve division by n − 1 instead of n?
If the formula involved division by n, the sample variance would be biased and consistently underestimate the population variance.
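A small simulation makes the bias concrete. This sketch draws many samples of size 5 from a normal population with σ² = 100 (arbitrary, hypothetical settings) and averages both versions of the variance. Dividing by n should average near (n − 1)/n · σ² = 80, while dividing by n − 1 should average near 100; the exact numbers vary with the random seed.

```python
import random


def sample_variance(xs, use_n_minus_1=True):
    """Sum of squared deviations from the sample mean, divided by n - 1 or n."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / (n - 1) if use_n_minus_1 else ss / n


def average_estimates(n=5, trials=20_000, sigma=10.0, seed=0):
    """Average each estimator over many samples from a normal population."""
    rng = random.Random(seed)
    total_n = total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0, sigma) for _ in range(n)]
        total_n += sample_variance(sample, use_n_minus_1=False)
        total_n_minus_1 += sample_variance(sample)
    return total_n / trials, total_n_minus_1 / trials


by_n, by_n_minus_1 = average_estimates()
print(f"divide by n:     {by_n:.1f}")          # close to 80 (underestimates)
print(f"divide by n - 1: {by_n_minus_1:.1f}")  # close to 100 (true σ² = 100)
```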
Suppose a pediatrician is wondering whether there is more variability in the heights or weights of the 2-year-old boys that he sees and collects the data below for a sample of 100 2-year-old boys in his practice. He concludes that the boys' weights vary more than their heights since the standard deviation is greater for weight than for height. What is wrong with this conclusion? Heights: mean = 30.2 in., standard deviation = 1.9 in. Weights: mean = 29.4 lb, standard deviation = 2.1 lb
Since the standard deviations have different units, he cannot compare them directly; otherwise, he is comparing inches to pounds. The coefficient of variation (CV), which expresses the standard deviation as a percent of the mean and has no units, should be used instead.
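Applying the CV formula to the pediatrician's summary statistics in Python (the function name is illustrative). With these numbers the weights do show more relative variability, about 7.1% versus 6.3%, so the conclusion happens to hold, but only the unitless comparison supports it.

```python
def coefficient_of_variation(s, xbar):
    """CV as a percent: standard deviation relative to the mean (unitless)."""
    return s / xbar * 100


height_cv = coefficient_of_variation(1.9, 30.2)  # inches / inches
weight_cv = coefficient_of_variation(2.1, 29.4)  # pounds / pounds
print(f"height CV = {height_cv:.1f}%")  # height CV = 6.3%
print(f"weight CV = {weight_cv:.1f}%")  # weight CV = 7.1%
```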
Which measure of center (mean or median) is resistant? Explain what it means for that measure to be resistant.
The median is resistant because it is not sensitive to extreme values in the data set. For example, if the largest observation were doubled, the median would not change, since the median depends only on the middle value(s) of the ordered data.
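The doubling example can be tried directly in Python on a hypothetical data set: the mean moves when the largest value is doubled, while the median stays put.

```python
from statistics import mean, median

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical data
doubled = sorted(data)[:-1] + [2 * max(data)]   # largest value doubled: 9 -> 18

print("mean:  ", mean(data), "->", mean(doubled))      # mean rises from 5 to 6.125
print("median:", median(data), "->", median(doubled))  # median stays 4.5
```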
Sample z-score
z = (x − x̄) / s
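Both z-score formulas can be sketched in Python. The population values (μ = 100, σ = 15) and the sample data are hypothetical; a positive z means the value is above the mean, and a negative z means it is below.

```python
from statistics import mean, stdev


def z_population(x, mu, sigma):
    """Population z-score: (x - μ) / σ."""
    return (x - mu) / sigma


def z_sample(x, xbar, s):
    """Sample z-score: (x - x̄) / s."""
    return (x - xbar) / s


print(z_population(130, 100, 15))  # 2.0 -> two SDs above the mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
z = z_sample(9, mean(data), stdev(data))
print(round(z, 2))  # 1.87 -> about 1.87 SDs above the sample mean
```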