Chapter 2
What is the first step to finding the median of a distribution?
1. Arrange all observations in order of size, from the smallest to the largest.
What are 2 measures of center and spread that can be used for reasonably symmetric distributions that are free of outliers?
1. Mean 2. Standard Deviation
What is the Four-Step Process to organize our problems?
1. STATE: What is the practical question, in the context of the real-world setting? 2. PLAN: What specific statistical operations does this problem call for? 3. SOLVE: Make the graphs and carry out all required calculations. 4. CONCLUDE: Give your practical conclusion in the setting of the real-world problem.
The five-number summary of a distribution consists of the:
1. minimum (smallest observation) 2. Q1 3. median (M) 4. Q3 5. maximum (largest observation)
When the distribution is roughly symmetric, then the mean and median are ___.
Close together
If the distribution is exactly symmetric, then the mean and median are ___.
Exactly the same
What is the equation for the interquartile range (IQR)?
IQR = Q3 - Q1
___ tend to require additional investigation.
Outliers
What is a boxplot?
a graphical display of the five-number summary.
The mean is...
an arithmetic average and is calculated by summing the observations and then dividing by the number of observations.
When extreme observations are involved, be sure to choose statistical methods that ___ influenced by outliers.
are not
Which must be true about the standard deviation? a. It is always positive (i.e., it is always greater than 0). b. It can equal 0. c. It is resistant to outliers. d. It is a function of the median.
b. It can equal 0.
In a skewed distribution, the mean is usually where?
farther in the long tail than the median.
The deviation of an observation xi is ___ from the mean x̄. (xi - x̄)
how far it is
Since the deviation depends on the mean x̄, it ___ resistant to outliers and skewness.
is not
Boxplot: A ___ in the box marks the median M.
line
Boxplot: Lines extend from the box out to the ___ observations.
smallest and largest
The median is...
the midpoint of a distribution: the number such that half of the observations are smaller than it and the other half are larger.
What is the first quartile, Q1?
the value in the sample that has 25% of the data at or below it. (the median of the lower half of the data)
What is the second step in finding the median of a distribution if the number of observations is 1. odd 2. even?
1. Odd: the median M is the center observation in the ordered list. 2. Even: the median M is midway between the two center observations in the ordered list (the average of those two values). (In both cases, the location of the median is found by counting up (n+1)/2 observations from the start of the ordered list. Ex: with 9 observations, (9+1)/2 = 5, so the median is the 5th observation. With an even n, (n+1)/2 falls halfway between the two center positions.)
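The two steps for finding the median can be sketched in Python. This is only an illustration: the function name `median` and the sample data are made up for the example and are not part of the original cards.

```python
def median(values):
    """Median found by ordering the data and locating the center."""
    ordered = sorted(values)      # step 1: arrange from smallest to largest
    n = len(ordered)
    mid = n // 2                  # 0-based index of the center (or upper-center) value
    if n % 2 == 1:
        return ordered[mid]       # odd n: the single center observation
    # even n: midway between the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 9, 3, 5, 8, 2, 6, 4]))  # 9 observations: the 5th in order, 5
print(median([4, 1, 3, 2]))                 # 4 observations: midway between 2 and 3, 2.5
```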
What are 3 measures of center and spread that are resistant to strong outliers?
1. 5 number summary 2. Median 3. IQR (interquartile range)
How do you find the quartiles?
1. Arrange the observations in increasing order and locate the median in the ordered list of observations. 2. The first quartile Q1 is the median of the observations that are to the left of M. 3. The third quartile Q3 is the median of the observations that are to the right of M.
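The quartile steps above can be sketched as follows. This is a minimal sketch that assumes at least two observations and, when n is odd, leaves the median M out of both halves; the function names are hypothetical. Statistical software often uses other interpolation rules, so its quartiles can differ slightly from this hand method.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the 'median of each half' rule."""
    ordered = sorted(values)                 # step 1: increasing order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                   # observations to the left of M
    # observations to the right of M (skip M itself when n is odd)
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (2.5, 5, 7.5)
```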
A suspected outlier falls more than ___ below Q1 or above Q3.
1.5 "IQRs"
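As an illustration of the 1.5 × IQR rule, the sketch below flags values lying more than 1.5 IQRs below Q1 or above Q3. The function name, the data, and the quartile values passed in are assumptions made for the example.

```python
def suspected_outliers(values, q1, q3):
    """Return the observations flagged by the 1.5 x IQR rule."""
    iqr = q3 - q1                    # interquartile range
    low_fence = q1 - 1.5 * iqr       # anything below this is suspect
    high_fence = q3 + 1.5 * iqr      # anything above this is suspect
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical data with Q1 = 2.5 and Q3 = 7.5, so IQR = 5 and the fences are -5 and 15.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(suspected_outliers(data, 2.5, 7.5))  # [30]
```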
What portion of data is between the Q1 and Q3?
50%
A suspected outlier will influence which statistic the most? a) Q1 b) IQR c) Median d) Mean
d) Mean
Standard deviation is or is not resistant to outliers and skewness?
IS NOT (like the mean)
Standard deviation ___ as the observations become more spread out about their mean.
Increases
Which measure of center is most impacted by extreme observations?
Mean (if one value changes drastically, then the average will change, not the middle number or median.)
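A small check of this idea, using Python's standard `statistics` module and made-up numbers: replacing the largest value with an extreme one moves the mean a long way but leaves the median where it was.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]          # hypothetical data
with_outlier = [30, 32, 35, 38, 400]   # same data, largest value made extreme

print(mean(values), median(values))              # mean 35, median 35
print(mean(with_outlier), median(with_outlier))  # mean 107, median 35
```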
Which measure of center is most appropriate for heavily skewed data sets? Mean or Median?
Median
What is considered the "balance point" of the distribution?
The mean
What is the difference between the mean and the median?
The mean is the arithmetic average of the observations; the median is the middle value when the observations are listed in ascending order.
Why is the sum of the deviations always 0?
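Because the mean is the balance point of the data: the positive deviations (observations above the mean) exactly cancel the negative deviations (observations below it). Mathematically, the sum of (xi - x̄) equals the sum of the xi minus n times x̄, and both of those are the total of the observations. (The number of observations above and below the mean need not be equal.)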
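A quick numerical check, with made-up data, shows the deviations from the mean summing to zero even though three observations lie below the mean and only one lies above it. The numbers were chosen so the arithmetic is exact; with other data, floating-point rounding can leave a tiny nonzero remainder.

```python
data = [1, 2, 3, 4, 10]                   # hypothetical observations
x_bar = sum(data) / len(data)             # mean = 20 / 5 = 4.0
deviations = [x - x_bar for x in data]    # xi - x_bar for each observation

print(deviations)       # [-3.0, -2.0, -1.0, 0.0, 6.0]
print(sum(deviations))  # 0.0
```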
Which is not true about Q3? a) It is the median of the ordered list of observations that are to the right of the median. b) It is the 75th percentile. c) It is the mean of the ordered list of observations that are to the right of the median. d) It is one of the statistics used in computing the "IQR."
c) It is the mean of the ordered list of observations that are to the right of the median.
The number n-1 in the denominator of the variance is called the ___.
degrees of freedom
Quartiles and medians are also measures of ___ - hence the need to order the values.
location
Boxplot: A central box spans the ___ of the data (marked by the first and third quartiles).
middle 50%
A ___ is similar to a boxplot, but it shows suspected outliers as ___.
modified boxplot; dots (or another symbol, such as an asterisk)
The ___ is used to describe the variation around the mean.
standard deviation, s
What is the second quartile, Q2?
synonymous with the median
What is the interquartile range (IQR)?
the distance between Q1 and Q3. (measure of spread that plays a role in mathematically determining outliers.)
What is the third quartile, Q3?
the value in the sample that has 75% of the data at or below it. (The median of the upper half of the data)
Prior to finding the standard deviation, we first calculate the ___, the average of the squares of the deviations of the observations from their mean.
variance, s²
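The variance and standard deviation can be sketched in a few lines. This assumes the sample versions with n-1 in the denominator, as on the degrees-of-freedom card, and at least two observations; the function names and data are made up for the example.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [1, 2, 3, 4, 10]            # mean 4, squared deviations 9 + 4 + 1 + 0 + 36 = 50
print(sample_variance(data))       # 50 / 4 = 12.5
print(sample_std(data))            # square root of 12.5, about 3.54
print(sample_std([5, 5, 5]))       # 0.0: no spread when all observations are equal
```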