Elementary Statistics W2L6
Interquartile Range
One method for detecting outliers involves a measure called the Interquartile Range.
Boxplot
A boxplot is a graph that presents the five-number summary along with some additional information about a data set. There are several different kinds of boxplots. The one we describe here is sometimes called a modified boxplot.
drawing a boxplot solution pt 1
see picture
drawing a boxplot solution pt 2
see picture
drawing a boxplot solution pt 3
see picture
drawing a boxplot solution pt 4
see picture
drawing a boxplot solution pt 5
see picture
outliers
An outlier is a value that is considerably larger or considerably smaller than most of the values in a data set. Some outliers result from errors; for example, a misplaced decimal point may cause a number to be much larger or smaller than the other values in a data set. Other outliers are correct values and simply reflect the fact that the population contains some extreme values.
Determining Skewness 1
Boxplots can help determine the skewness of a data set. If the median is closer to the first quartile than to the third quartile, or the upper whisker is longer than the lower whisker, the data are skewed to the right.
Determining Skewness 3
If the median is approximately halfway between the first and third quartiles, and the two whiskers are approximately equal in length, the data are approximately symmetric.
z-score and the empirical rule
Since the z-score is the number of standard deviations from the mean, we can easily interpret the z-score for bell-shaped populations using the Empirical Rule. When a population has a histogram that is approximately bell-shaped, then: Approximately 68% of the data will have z-scores between -1 and 1. Approximately 95% of the data will have z-scores between -2 and 2. All, or almost all, of the data will have z-scores between -3 and 3.
Computing the percentile corresponding to a given value
Sometimes we are given a value from a data set and wish to compute the percentile corresponding to that value. The procedure is: Step 1: Arrange the data in increasing order. Step 2: Let x be the data value whose percentile is to be computed. Use the following formula to compute the percentile: Percentile = 100 · [(number of values less than x) + 0.5] / (number of values in the data set). Round this result to the nearest integer. This is the percentile corresponding to the value x.
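The formula above can be sketched in Python as follows. This is a minimal illustration, not part of the original deck: the function name `percentile_of_value` and the sample data are hypothetical, and ties at exactly .5 are assumed to round up.

```python
def percentile_of_value(data, x):
    """Percentile corresponding to the value x, following the two steps above."""
    ordered = sorted(data)                       # Step 1: increasing order
    below = sum(1 for v in ordered if v < x)     # number of values less than x
    pct = 100 * (below + 0.5) / len(ordered)     # Step 2: apply the formula
    # Round to the nearest integer (assumption: a result ending in .5 rounds up).
    return int(pct + 0.5)


# Hypothetical data, for illustration only.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
print(percentile_of_value(sample, 8))
```

Here five of the ten values are less than 8, so the formula gives 100 · (5 + 0.5) / 10 = 55, and the value 8 sits at the 55th percentile.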
Procedure for Computing Percentiles
Step 1: Arrange the data in increasing order. Step 2: Let n be the number of values in the data set. For the pth percentile, compute the index L = (p/100) · n. Step 3: If L is a whole number, then the pth percentile is the average of the number in position L and the number in position L + 1. If L is not a whole number, round it up to the next higher whole number. The pth percentile is the number in the position corresponding to the rounded-up value.
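The three steps above might be written in Python roughly like this. It is a sketch under stated assumptions: the function name `percentile` and the sample data are hypothetical, and p is assumed to be an integer between 1 and 99 so that the whole-number check can be done with integer arithmetic.

```python
def percentile(data, p):
    """pth percentile using the index L = (p/100) * n described above."""
    ordered = sorted(data)                 # Step 1: increasing order
    n = len(ordered)                       # Step 2: n values in the data set
    whole, remainder = divmod(p * n, 100)  # L = whole + remainder/100
    if remainder == 0:
        # Step 3a: L is a whole number, so average positions L and L + 1.
        # Positions are 1-based in the procedure; Python lists are 0-based.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Step 3b: L is not whole, so round up and take that position.
    return ordered[whole]


# Hypothetical data, for illustration only.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
print(percentile(sample, 25))  # L = 2.5, round up to position 3
print(percentile(sample, 50))  # L = 5, average of positions 5 and 6
```

Using `divmod` on p · n avoids floating-point error when testing whether L is a whole number.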
The five-number summary
The five-number summary of a data set consists of the median, the first quartile, the third quartile, the smallest value, and the largest value. These values are generally arranged in order.
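As a rough illustration, the five-number summary could be computed in Python as below. The sketch assumes the quartiles are the 25th, 50th, and 75th percentiles found with this deck's percentile procedure; the helper names and the sample data are hypothetical.

```python
def percentile(data, p):
    """pth percentile via the index L = (p/100) * n (p an integer from 1 to 99)."""
    ordered = sorted(data)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # L is whole: average the values in positions L and L + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not whole: round up and take the value in that position.
    return ordered[whole]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), arranged in order."""
    return (
        min(data),
        percentile(data, 25),   # first quartile (assumed 25th percentile)
        percentile(data, 50),   # median (assumed 50th percentile)
        percentile(data, 75),   # third quartile (assumed 75th percentile)
        max(data),
    )


# Hypothetical data, for illustration only.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]
print(five_number_summary(sample))
```

Calculators and software packages sometimes use slightly different quartile rules, so their results may differ from this sketch for small data sets.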
Percentile
The mean and median of a data set describe the center of a distribution. For some data it is useful to compute measures of position other than the center, to get a more detailed description of the distribution. Percentiles provide a way to do this. Percentiles divide a data set into hundredths. For a number p between 1 and 99, the pth percentile separates the lowest p% of the data from the highest (100 - p)%.
(Interquartile Range) IQR Method for Detecting Outliers
The most frequently used method for detecting outliers in a data set is the IQR Method. The procedure for the IQR Method is: Step 1: Find the first quartile, Q1, and the third quartile, Q3. Step 2: Compute the interquartile range: IQR = Q3 - Q1. Step 3: Compute the outlier boundaries. These boundaries are the cutoff points for determining outliers: Lower Outlier Boundary = Q1 - 1.5(IQR); Upper Outlier Boundary = Q3 + 1.5(IQR). Step 4: Any data value that is less than the lower outlier boundary or greater than the upper outlier boundary is considered to be an outlier.
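Steps 2 through 4 of the IQR Method can be sketched in Python as follows. The function name `iqr_outliers` and the sample data are hypothetical; Q1 and Q3 are passed in as arguments, on the assumption that they were already found in Step 1.

```python
def iqr_outliers(data, q1, q3):
    """Return (lower boundary, upper boundary, list of outliers)."""
    iqr = q3 - q1                  # Step 2: interquartile range
    lower = q1 - 1.5 * iqr         # Step 3: lower outlier boundary
    upper = q3 + 1.5 * iqr         # Step 3: upper outlier boundary
    # Step 4: values strictly outside the boundaries are outliers.
    outliers = [v for v in data if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical data, for illustration only, with Q1 = 4 and Q3 = 10.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]
lower, upper, outliers = iqr_outliers(sample, q1=4, q3=10)
print(lower, upper, outliers)
```

With IQR = 10 - 4 = 6, the boundaries are 4 - 9 = -5 and 10 + 9 = 19, so 30 is the only value flagged as an outlier.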
Quartiles
There are three special percentiles which divide a data set into four pieces, each of which contains approximately one quarter of the data. These values are called the quartiles.
Z-score
Who is taller, a man 73 inches tall or a woman 68 inches tall? The obvious answer is that the man is taller. However, men are taller than women on the average. Suppose the question is asked this way: Who is taller relative to their gender, a man 73 inches tall or a woman 68 inches tall? One way to answer this question is with a z-score. The z-score of an individual data value tells how many standard deviations that value is from its population mean. For example, a value one standard deviation above the mean has a z-score of 1. A value two standard deviations below the mean has a z-score of -2.
Determining Skewness 2
If the median is closer to the third quartile than to the first quartile, or the lower whisker is longer than the upper whisker, the data are skewed to the left.
A National Center for Health Statistics study states that the mean height for adult men in the U.S. is μ = 69.4 inches, with a standard deviation of σ = 3.1 inches. The mean height for adult women is μ = 63.8 inches, with a standard deviation of σ = 2.8 inches. Who is taller relative to their gender, a man 73 inches tall or a woman 68 inches tall?
see picture
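Since the pictured solution is not reproduced here, the comparison can be sketched in Python using the means and standard deviations quoted in the question. The function name `z_score` is hypothetical; it simply applies z = (x - μ) / σ, the number of standard deviations a value lies from its population mean.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the population mean mu."""
    return (x - mu) / sigma


# Values taken from the question above.
z_man = z_score(73, mu=69.4, sigma=3.1)
z_woman = z_score(68, mu=63.8, sigma=2.8)

print(f"man:   z = {z_man:.2f}")
print(f"woman: z = {z_woman:.2f}")
print("taller relative to gender:", "woman" if z_woman > z_man else "man")
```

The man's z-score is about 1.16 and the woman's is about 1.50, so the woman is taller relative to her gender.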
Drawing a boxplot example
see picture
IQR Method solution pt 1
see picture
IQR Method solution pt 2
see picture
IQR method example
see picture
In 1989, the rainfall in Los Angeles during the month of February was 1.90. What percentile does this correspond to?
see picture
Procedure for Drawing a boxplot
see picture
Quartiles example
see picture
Quartiles on the TI-84 plus
see picture
Quartiles solution
see picture
The five-number summary example and solution
see picture
computing percentiles example
see picture
computing percentiles solution pt 1
see picture
computing percentiles solution pt 2
see picture