Stat: Chapter 3
Measure of Center
A value at the center or middle of a data set, commonly measured by the Mean, Median, or Mode.
Coefficient of Variation
Also called CV. Describes the standard deviation relative to the mean, expressed as a percent (the standard deviation divided by the mean, times 100). Allows a better comparison of variation when means are vastly different.
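A minimal sketch of the CV calculation, assuming sample data and the sample standard deviation (dividing by n - 1). The function name and the two small data sets are made up for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample CV as a percent: (standard deviation / mean) * 100."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return s / mean * 100

# Hypothetical data: both sets have the same standard deviation (10),
# but very different means, so the CVs differ.
heights_cm = [160, 170, 180]
weights_kg = [50, 60, 70]

print(round(coefficient_of_variation(heights_cm), 1))  # 5.9
print(round(coefficient_of_variation(weights_kg), 1))  # 16.7
```

Relative to its mean, the weight data varies almost three times as much as the height data, even though the raw standard deviations are equal.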
Mean
Also called the average. It is found by adding the data values and dividing the total by the number of data values.
Outliers
Any value that is more than 1.5 times the IQR below Q1 or above Q3 is considered an outlier. If it is more than 3 times the IQR away, it is an extreme outlier. If it is between 1.5 and 3 times the IQR away, it is a mild outlier.
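The rule above can be sketched as a small classifier. The function name and the quartile values are hypothetical, and Q1 and Q3 are passed in directly so the sketch does not depend on any particular quartile method.

```python
def classify_outlier(value, q1, q3):
    """Label a value using the 1.5 * IQR and 3 * IQR cutoffs."""
    iqr = q3 - q1
    # Distance beyond the nearer quartile (0 if the value lies between Q1 and Q3).
    if value < q1:
        distance = q1 - value
    elif value > q3:
        distance = value - q3
    else:
        distance = 0
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"

# Hypothetical quartiles: Q1 = 10 and Q3 = 20, so IQR = 10.
print(classify_outlier(36, 10, 20))  # mild outlier (16 above Q3)
print(classify_outlier(55, 10, 20))  # extreme outlier (35 above Q3)
print(classify_outlier(22, 10, 20))  # not an outlier (only 2 above Q3)
```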
Five-Number Summary
Consists of the minimum value, lower quartile (Q1), Median, upper quartile (Q3), and maximum value of a data set.
Quartiles
Measures of location which divide a set of data into four groups with about 25% of the values in each group. The first quartile (Q1) is also called the lower quartile. The second quartile (Q2) is also called the Median. The third quartile (Q3) is also called the upper quartile. The Interquartile Range (IQR) is the difference between the upper and lower quartiles (Q3 - Q1), in the same way that the range is the difference between the max and min values.
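A rough sketch of finding the quartiles and the IQR. It assumes the "median of each half" convention, which is only one of several in use. Textbooks and software can give slightly different quartiles for the same data. The function name and data values are made up for illustration.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the middle position
    upper = data[half + n % 2:]  # values above it (middle value left out when n is odd)
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))

q1, q2, q3 = quartiles_by_halves([7, 15, 36, 39, 40, 41])
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 15 37.5 40 25
```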
Box Plot
A graph that shows a picture of the five-number summary, with a box representing the interquartile range and "whiskers" extending to the extremes. A modified box plot also shows any outliers, using open circles for extreme outliers and closed circles for mild outliers.
Variance
A measure of how much the data values deviate from the mean. A five-step formula is used to calculate it, and it can be computed as either a population variance or a sample variance.
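The steps are not spelled out here, so the sketch below assumes the usual breakdown: find the mean, subtract it from each value, square the results, add them up, and divide. The population version divides by n and the sample version divides by n - 1. The function name and data are illustrative only.

```python
def variance(values, sample=True):
    """Compute the variance step by step (sample version by default)."""
    n = len(values)
    mean = sum(values) / n                    # step 1: find the mean
    deviations = [x - mean for x in values]   # step 2: subtract the mean from each value
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    total = sum(squared)                      # step 4: add up the squared deviations
    divisor = n - 1 if sample else n          # step 5: divide by n - 1 (sample) or n (population)
    return total / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]       # mean is 5, squared deviations sum to 32
print(variance(data, sample=False))   # 4.0
print(variance(data, sample=True))    # about 4.57
```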
Z-Scores
A measure of relative standing: the number of standard deviations a data value is away from the Mean. A Z-Score is calculated by subtracting the Mean from the given value and then dividing by the standard deviation. Values whose Z-Scores are more than two standard deviations from the Mean (below -2 or above +2) are considered unusual. Z-Scores are positive for values above the mean and negative for values below it.
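A short sketch of the calculation and the "unusual" check described above. The function names and the test-score numbers are hypothetical.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

def is_unusual(z):
    """Treat Z-Scores beyond 2 in either direction as unusual."""
    return abs(z) > 2

# Hypothetical test with mean 70 and standard deviation 10.
high = z_score(95, 70, 10)
low = z_score(60, 70, 10)
print(high, is_unusual(high))  # 2.5 True
print(low, is_unusual(low))    # -1.0 False
```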
Empirical Rule
Also called the 68-95-99.7 Rule. If a distribution has a normal, bell-shaped, approximately symmetrical shape, then about 68% of all data values fall within one standard deviation of the Mean, about 95% fall within two standard deviations, and about 99.7% fall within three.
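As an illustration, the three intervals can be computed directly from a mean and standard deviation. The function name and the values used (mean 100, standard deviation 15) are assumptions chosen for the example, and the percentages only apply if the data really are bell-shaped.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return (approximate percent, low, high) for 1, 2, and 3 standard deviations."""
    percents = [68, 95, 99.7]
    return [(pct, mean - k * std_dev, mean + k * std_dev)
            for k, pct in enumerate(percents, start=1)]

# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
for pct, low, high in empirical_rule_intervals(100, 15):
    print(f"about {pct}% of values fall between {low} and {high}")
```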
Chebyshev's Theorem
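The Empirical Rule only works for bell-shaped, symmetrical distributions, whereas this theorem works for any kind of distribution, symmetrical or not, and is a more conservative rule. The percentages change from about 68-95-99.7% to at least 0-75-89% for the first three standard deviations. For any number of standard deviations k, the Chebyshev formula for the minimum proportion of values within k standard deviations of the Mean is 1 - 1/k^2 (it only gives useful information when k is greater than 1).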
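The formula can be checked against the 0-75-89% figures with a few lines of code. The function name is made up for this sketch.

```python
def chebyshev_minimum(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # The formula goes negative for k < 1, where the theorem says nothing useful,
    # so the result is floored at 0.
    return max(0.0, 1 - 1 / k ** 2)

for k in (1, 2, 3):
    print(k, round(chebyshev_minimum(k) * 100, 1))
# 1 0.0
# 2 75.0
# 3 88.9
```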
Range
The difference between the maximum data value and the minimum data value. A very weak measure of spread, since it uses only the extremes and can be greatly affected by outliers.
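A small example, using made-up numbers, of how a single outlier changes the range.

```python
def data_range(values):
    """Maximum data value minus minimum data value."""
    return max(values) - min(values)

typical = [10, 12, 13, 15, 16]
with_outlier = typical + [90]    # one hypothetical extreme value added

print(data_range(typical))       # 6
print(data_range(with_outlier))  # 80
```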
Median
The middle value when the data are arranged in order of increasing magnitude. If there are two values in the middle, find the mean of those two.
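Both cases (one middle value, two middle values) can be sketched as follows. The function name and data are illustrative.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    data = sorted(values)          # arrange in increasing order first
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # odd count: a single middle value
    return (data[mid - 1] + data[mid]) / 2  # even count: average the middle pair

print(median([5, 1, 9]))     # 5
print(median([5, 1, 9, 3]))  # 4.0
```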
Standard Deviation
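The square root of the variance, and more commonly used than the variance. Because the variance formula squares the deviations to make them positive, the variance is inflated and is expressed in squared units. Taking the square root brings the measure back to the original units of the data.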
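A brief sketch of the relationship between the two measures, using Python's standard `statistics` module and the population versions of each. The data values and the centimeter units are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements in cm

pop_variance = statistics.pvariance(data)  # population variance, in squared units (cm^2)
pop_std_dev = math.sqrt(pop_variance)      # square root returns to the original units (cm)

print(pop_std_dev)  # 2.0
```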
Midrange
The value midway between the maximum and minimum values in the data set. It is found by adding the max and min values and then dividing by two.
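In code, with a made-up function name and data set, the calculation looks like this:

```python
def midrange(values):
    """Add the maximum and minimum values, then divide by two."""
    return (max(values) + min(values)) / 2

print(midrange([3, 7, 8, 12]))  # 7.5
```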
Mode
The value that occurs with the greatest frequency. If two or more values are tied for the greatest frequency, each of them is a mode.
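A quick illustration using `statistics.multimode` from Python's standard library (available in Python 3.8 and later), which returns every value tied for the greatest frequency. The data sets are made up.

```python
import statistics

one_mode = [5, 5, 6]
two_modes = [1, 2, 2, 3, 3, 4]   # 2 and 3 are tied for the greatest frequency

print(statistics.multimode(one_mode))   # [5]
print(statistics.multimode(two_modes))  # [2, 3]
```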