1.3: Describing Quantitative Data with Numbers
Measuring Center
1.) Mean 2.) Median
Measuring Variability
1.) Range 2.) Standard Deviation 3.) Interquartile Range (IQR)
Identifying Outliers
- Any data value less than Q1 - 1.5(IQR) is an outlier - Any data value more than Q3 + 1.5(IQR) is an outlier - Outliers may be inaccurate data values or may indicate a remarkable occurrence - Outliers heavily influence the values of non-resistant summary statistics (mean, range, standard deviation)
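A minimal Python sketch of the 1.5(IQR) rule above. The names `quartiles` and `find_outliers` are illustrative, the data set is made up, and the quartiles use the median-of-halves rule from these notes (statistical software often uses a slightly different convention).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2                 # the median itself is excluded when n is odd
    lower = data[:half]                   # values left of the median
    upper = data[len(data) - half:]       # values right of the median
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values that fall outside the 1.5(IQR) fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [10, 12, 13, 14, 15, 16, 18, 40]   # hypothetical data
print(find_outliers(data))                # [40]
```

Here Q1 = 12.5 and Q3 = 17, so IQR = 4.5 and the fences are 5.75 and 23.75; only 40 lies outside them.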
Effect of Skewness and Outliers on Measures of Center
- If the distribution of a quantitative variable is roughly symmetric and has no outliers, the mean and median will be approximately the same. - If the distribution is strongly skewed, the mean will be pulled in the direction of the skewness but the median won't. - The median is resistant to outliers; the mean is not.
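A small Python illustration of the points above, using made-up data and the standard-library `statistics` module. Replacing the largest value with an extreme one pulls the mean toward it while the median stays put.

```python
from statistics import mean, median

symmetric = [2, 4, 6, 8, 10]       # roughly symmetric, no outliers
skewed = [2, 4, 6, 8, 100]         # same data, largest value replaced by an extreme one

# Symmetric case: mean and median agree (both 6).
print(mean(symmetric), median(symmetric))

# Skewed case: the mean is pulled up to 24, the median is still 6.
print(mean(skewed), median(skewed))
```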
3.) Interquartile Range (IQR)
the distance between Q1 and Q3 - IQR = Q3 - Q1 - Measures the variability of the middle 50% of the data - Resistant measure
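A short Python sketch of the IQR, assuming the median-of-halves rule for Q1 and Q3. The function name `iqr` and both data sets are hypothetical. The second call shows why the IQR counts as resistant: an extreme maximum does not change it.

```python
from statistics import median

def iqr(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                  # median of the lower half
    q3 = median(data[len(data) - half:])      # median of the upper half
    return q3 - q1

print(iqr([1, 3, 5, 7, 9, 11, 13]))      # 8
print(iqr([1, 3, 5, 7, 9, 11, 1300]))    # still 8
```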
1.) Range
the distance between the minimum value and the maximum value of a distribution - Range = Maximum - Minimum - Range is a SINGLE number - Not a resistant measure
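As a quick Python check (the function name `value_range` and the data are made up), the range is one number, and a single extreme value changes it drastically:

```python
def value_range(values):
    """Range = maximum - minimum, returned as a single number."""
    return max(values) - min(values)

print(value_range([3, 7, 8, 12, 15]))     # 12
print(value_range([3, 7, 8, 12, 150]))    # 147: one extreme value changes the range
```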
Five-number summary
the minimum, Q1, median, Q3, and maximum
Resistant
a statistical measure is resistant if it is not sensitive to extreme values.
Quartiles
divide the ordered data set into four groups having roughly the same number of values.
2.) Standard Deviation
measures the typical distance of the values in a distribution from the mean - Find the mean of the distribution - Calculate the deviation of each value from the mean: value - mean - Square each deviation - Add all the squared deviations, divide by n-1, and take the square root - Sx² is called the variance - Sx ≥ 0 - Sx = 0 when all data values are equal - Not a resistant measure - Measures variation about the mean; use only when the mean is the chosen measure of center
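The calculation steps above can be sketched in Python as follows. The function name `sample_sd` and the data are hypothetical; the code assumes at least two data values so that n - 1 is not zero.

```python
import math

def sample_sd(values):
    """Sample standard deviation Sx, following the steps in the notes."""
    n = len(values)
    x_bar = sum(values) / n                       # 1. find the mean
    deviations = [x - x_bar for x in values]      # 2. deviation = value - mean
    squared = [d ** 2 for d in deviations]        # 3. square each deviation
    variance = sum(squared) / (n - 1)             # 4. add up, divide by n - 1 (this is Sx²)
    return math.sqrt(variance)                    # 5. take the square root

data = [2, 4, 4, 4, 5, 5, 7, 9]                   # mean is 5
print(round(sample_sd(data), 3))                  # 2.138
print(sample_sd([5, 5, 5]))                       # 0.0 when all values are equal
```

For this data the squared deviations sum to 32, so Sx² = 32/7 and Sx is about 2.138. The standard library's `statistics.stdev` uses the same n - 1 divisor.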
First Quartile (Q1)
the median of the data values that are to the left of the median in the ordered list
Third Quartile (Q3)
the median of the data values that are to the right of the median in the ordered list
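A Python sketch of Q1 and Q3 as defined above, shown for an odd and an even number of values. The function name `first_and_third_quartile` is made up. When n is odd, the median itself belongs to neither half. Calculators and libraries may use other quartile conventions and return slightly different values.

```python
from statistics import median

def first_and_third_quartile(values):
    """Q1 and Q3 as medians of the values left and right of the median."""
    data = sorted(values)
    half = len(data) // 2
    left = data[:half]                    # values to the left of the median
    right = data[len(data) - half:]       # values to the right of the median
    return median(left), median(right)

print(first_and_third_quartile([1, 2, 3, 4, 5, 6, 7]))       # (2, 6)
print(first_and_third_quartile([1, 2, 3, 4, 5, 6, 7, 8]))    # (2.5, 6.5)
```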
2.) Median
the midpoint of a distribution, the number such that about half of the observations are smaller and about half are larger - Arrange the data values in numerical order - If n is odd, the median is the middle value in the ordered list - if n is even, the median is the average of the two middle values in the ordered list - A resistant measure
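The procedure above, written out by hand in Python for illustration. The function name `median_by_hand` is hypothetical; in practice `statistics.median` does the same job.

```python
def median_by_hand(values):
    """Median: middle value if n is odd, average of the two middle values if n is even."""
    data = sorted(values)                 # arrange the values in numerical order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                  # odd n: the middle value
    return (data[mid - 1] + data[mid]) / 2    # even n: average of the two middle values

print(median_by_hand([7, 1, 5]))          # 5
print(median_by_hand([7, 1, 5, 3]))       # 4.0
```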
Boxplot
visual representation of the five-number summary - Find the five-number summary - Identify outliers - Draw and label the axis - Scale the axis - Draw a box from Q1 to Q3 - Mark the median inside the box - Draw whiskers to the largest and smallest data values that are not outliers. Mark any outliers with a * - Boxplots don't show gaps, clusters, or peaks - Does not show shape well
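Before drawing, the numbers a boxplot needs can be worked out in Python. This is only a sketch: the function name `boxplot_values` and the data are made up, quartiles use the median-of-halves rule, and no plotting is done.

```python
from statistics import median

def boxplot_values(values):
    """Return the box, whisker ends, and outliers needed to draw a boxplot."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the smallest and largest values that are NOT outliers.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "box": (q1, median(data), q3),        # box from Q1 to Q3, median marked inside
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,                 # each one is marked with a *
    }

print(boxplot_values([10, 12, 13, 14, 15, 16, 18, 40]))
```

For this data the box runs from 12.5 to 17 with the median at 14.5, the whiskers reach 10 and 18, and 40 is plotted separately as an outlier.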
1.) Mean
x-bar is the average of all the individual data values. - x-bar = (sum xi)/n - not a resistant measure
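The formula above as a short Python sketch, with a made-up function name `mean_by_hand` and made-up scores. Adding one extreme value shows why the mean is not resistant.

```python
def mean_by_hand(values):
    """x-bar = (sum of all xi) / n."""
    return sum(values) / len(values)

scores = [80, 85, 90, 95]
print(mean_by_hand(scores))          # 87.5
print(mean_by_hand(scores + [0]))    # 70.0: a single extreme value drags the mean down
```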