3.4 Measures of Position and Outliers
Finding quartiles
1) Arrange the data in ascending order. 2) Determine the median, M, or second quartile, Q2. 3) Determine the first and third quartiles, Q1 and Q3, by dividing the data set into two halves: the bottom half is the observations below (to the left of) the location of the median, and the top half is the observations above (to the right of) the location of the median. The first quartile is the median of the bottom half and the third quartile is the median of the top half.
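The three steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `quartiles` is hypothetical, and it assumes the median-of-halves convention described above, where the median itself is left out of both halves when the number of observations is odd. Statistical software often uses other conventions and may return slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves steps (hypothetical helper)."""
    xs = sorted(data)                  # step 1: ascending order
    n = len(xs)
    m = median(xs)                     # step 2: median, M = Q2
    half = n // 2
    lower = xs[:half]                  # observations to the left of the median's location
    # When n is odd, skip the middle observation so it is in neither half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), m, median(upper)   # step 3: medians of the two halves

example = [1, 3, 5, 7, 9, 11, 13, 15, 17]
print(quartiles(example))  # (4.0, 9, 14.0)
```

The sketch needs at least two observations, since each half must be non-empty.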
Steps for drawing a box plot
1) Determine the lower and upper fences: Lower fence = Q1 - 1.5(IQR), Upper fence = Q3 + 1.5(IQR). 2) Draw vertical lines at Q1, M, and Q3. Enclose these lines in a box. 3) Label the lower and upper fences. 4) Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. 5) Mark any data values that are outliers (less than the lower fence or greater than the upper fence) with an asterisk (*).
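The numeric part of these steps (fences, whisker endpoints, and outliers) can be sketched as follows. The function name `fences_and_whiskers` and the sample data are illustrative assumptions, and Q1 and Q3 are passed in rather than computed. A value lying exactly on a fence is treated here as not an outlier, which is one reasonable reading of the steps.

```python
def fences_and_whiskers(data, q1, q3):
    """Compute the numbers needed to draw a box plot (hypothetical helper)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Values within the fences; the whiskers stop at the extremes of these.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Values beyond either fence are outliers (marked with * on the plot).
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]
summary = fences_and_whiskers(data, q1=4.5, q3=9.5)
print(summary["lower_fence"], summary["upper_fence"])  # -3.0 17.0
print(summary["outliers"])                             # [30]
```

In this example the right whisker ends at 10, the largest value below the upper fence, and 30 is plotted separately as an outlier.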
Quartiles (most common percentiles) --> resistant to extreme values
Divide data sets into fourths, or four equal parts. The first quartile, denoted Q1, divides the bottom 25% of the data from the top 75%, so the first quartile is equivalent to the 25th percentile. The second quartile, Q2, divides the bottom 50% of the data from the top 50%, so the second quartile is equivalent to the 50th percentile, which is equivalent to the median. Finally, the third quartile, Q3, divides the bottom 75% of the data from the top 25%, so the third quartile is equivalent to the 75th percentile.
outliers
Extreme values that don't appear to belong with the rest of the data.
Determining z-score
Z-scores measure the number of standard deviations an observation is above or below the mean. If a data value is larger than the mean, the z-score will be positive. If a data value is smaller than the mean, the z-score will be negative. If the data value equals the mean, the z-score will be zero. Ex. A z-score of 1.24 is interpreted as "the data value is 1.24 standard deviations above the mean," so the value is GREATER than the mean. Ex. A z-score of -0.5 (or -1/2) means the data value is half a standard deviation below the mean, so the value is LESS than the mean. Ex. A z-score of 0 indicates that the value of the observation is EQUAL to the mean.
z-score (often called the standardized value)
Represents the distance that a data value is from the mean in terms of the number of standard deviations. It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation. The z-score is unitless. When every value in a data set is converted to a z-score, the resulting z-scores have mean 0 and standard deviation 1.
Rounding rule:
Round z-scores to 2 decimal places
Determining outliers
Standardized values (z-scores) can be used to identify outliers. It is recommended to treat any data value with a z-score less than -3 or greater than +3 as an outlier. Such data values can then be reviewed for accuracy and to determine whether they belong in the data set.
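As a rough sketch of this rule, the snippet below flags every value whose z-score is below -3 or above +3. The function name `z_score_outliers` and the data are made up for illustration, and the sample standard deviation (`statistics.stdev`) is assumed; use `pstdev` instead if the data are a whole population.

```python
from statistics import mean, stdev

def z_score_outliers(data, threshold=3):
    """Return the values whose z-score is beyond +/- threshold (hypothetical helper)."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (an assumption)
    return [x for x in data if abs((x - x_bar) / s) > threshold]

readings = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 50, 51, 49, 50, 90]
print(z_score_outliers(readings))  # [90]
```

Flagged values are candidates for review, not automatic deletions. Note also that in very small data sets no z-score can reach 3 in magnitude, so this rule only becomes useful with a moderate number of observations.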
Interquartile range (IQR)
The range of the middle 50% of the observations in a data set. The difference between the upper quartile and the lower quartile. IQR = Q3 - Q1 Interpretation of the interquartile range is similar to that of the range and standard deviation. That is, the more spread a set of data has, the higher the interquartile range will be.
kth percentile
denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value. Percentiles divide a set of data that is written in ascending order into 100 parts; thus 99 percentiles can be determined. Ex. P1 divides the bottom 1% of the observations from the top 99%, P2 divides the bottom 2% of the observations from the top 98% and so on.
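The definition can be checked numerically with a short sketch. Both function names are hypothetical. `nearest_rank_percentile` assumes the nearest-rank convention, which is only one of several ways to locate Pk; textbooks and software often interpolate between observations and may give different values.

```python
import math

def percent_at_or_below(data, value):
    """Percent of observations less than or equal to value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

def nearest_rank_percentile(data, k):
    """P_k by the nearest-rank convention (an assumption, not the only method)."""
    if not 0 < k <= 100:
        raise ValueError("k must be in (0, 100]")
    xs = sorted(data)                      # percentiles require ascending order
    rank = math.ceil(k * len(xs) / 100)    # 1-based position in the sorted data
    return xs[rank - 1]

scores = [15, 20, 35, 40, 50]
p40 = nearest_rank_percentile(scores, 40)
print(p40, percent_at_or_below(scores, p40))  # 20 40.0
```

Here P40 = 20, and exactly 40% of the observations are less than or equal to 20, which matches the definition.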
percentile
provides information about how the data are spread over the interval from the smallest value to the largest value. (Recall that the median divides the lower 50% of a set of data from the upper 50%. The median is a special case of a general concept called the percentile.)
sample z-score
z = (x - x̄) / s
population z-score
z = (x - µ) / σ
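A minimal sketch of both formulas, rounded to 2 decimal places. The function names and the data are illustrative assumptions; `statistics.stdev` gives the sample standard deviation s and `statistics.pstdev` gives the population standard deviation σ.

```python
from statistics import mean, stdev, pstdev

def sample_z(x, data):
    """z = (x - x_bar) / s, treating data as a sample."""
    return round((x - mean(data)) / stdev(data), 2)

def population_z(x, data):
    """z = (x - mu) / sigma, treating data as the whole population."""
    return round((x - mean(data)) / pstdev(data), 2)

values = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5
print(population_z(9, values))  # 2.0
print(sample_z(9, values))      # 1.87
print(sample_z(5, values))      # 0.0
```

The same observation gets a slightly smaller z-score under the sample formula because s divides by n - 1 and is therefore a little larger than σ.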