3.2-3.4
The Coefficient of Variation
The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean, and is given by the following:
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
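A minimal Python sketch of the CV formula. The function name `coefficient_of_variation` and the two data sets are hypothetical. It uses the standard-library `statistics.stdev` (sample, n − 1 divisor) or `statistics.pstdev` (population, n divisor).

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """CV as a percent: (standard deviation / mean) * 100.

    sample=True  -> s / x-bar   (sample SD, n - 1 divisor)
    sample=False -> sigma / mu  (population SD, n divisor)
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean * 100

# Hypothetical samples with identical spread (same s) but different means.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [60, 65, 70, 75, 80]

print(f"CV heights: {coefficient_of_variation(heights_cm):.2f}%")
print(f"CV weights: {coefficient_of_variation(weights_kg):.2f}%")
```

Both samples have s ≈ 7.91, but relative to their means the variation differs: about 4.65% versus about 11.29%. This is why the CV is useful for comparing variation across data sets with different scales or units.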
Correlation
exists between two variables when the values of one variable are somehow associated with the values of the other variable.
Linear Correlation
exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.
Important Properties of Variance
The units of the variance are the squares of the units of the original data values. (If the original data values are in feet, the variance will have units of ft²; if the original data values are in seconds, the variance will have units of sec².) The value of the variance can increase dramatically with the inclusion of one or more outliers (data values that are very far away from all of the others). The value of the variance is usually positive. It is zero only when all of the data values are the same number. (It is never negative.) The sample variance s² is an unbiased estimator of the population variance σ², as described in Part 2 of this section. The variance has the serious disadvantage of using units that are different from the units of the original data set, which makes it difficult to understand variance as it relates to the original data. Because of this property, it is better to focus on the standard deviation when trying to develop an understanding of variation, as we do in this section.
Variance
The variance of a set of values is a measure of variation equal to the square of the standard deviation (sample variance s², population variance σ²).
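A short sketch of the variance properties listed above, using Python's standard `statistics` module. The data values (in feet) are hypothetical. It checks that the sample variance is the square of the sample standard deviation, that identical values give a variance of zero, and that one outlier inflates the variance.

```python
import statistics

# Hypothetical lengths measured in feet.
lengths_ft = [2.0, 4.0, 4.0, 5.0, 7.0]

s = statistics.stdev(lengths_ft)        # sample standard deviation, in ft
var = statistics.variance(lengths_ft)   # sample variance s^2, in ft^2

# Variance equals the square of the standard deviation (up to float rounding).
assert abs(var - s ** 2) < 1e-12

# Zero only when every data value is the same number.
print(statistics.variance([5.0, 5.0, 5.0]))

# A single outlier inflates the variance dramatically.
print(var, statistics.variance(lengths_ft + [40.0]))
```

Here the variance jumps from 3.3 ft² to roughly 214 ft² when the single value 40.0 is added.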
Measures of Relative Standing
are numbers showing the location of data values relative to the other values within the same data set.
The Empirical Rule
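A concept helpful in interpreting the value of a standard deviation is the empirical rule. This rule states that for data sets having a distribution that is approximately bell-shaped, the following properties apply:
About 68% of all values fall within 1 standard deviation of the mean.
About 95% of all values fall within 2 standard deviations of the mean.
About 99.7% of all values fall within 3 standard deviations of the mean.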
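One way to see the empirical rule is to simulate bell-shaped data and count how many values fall within 1, 2, and 3 standard deviations of the mean. This is a sketch under stated assumptions: the data are simulated normal values (mean 100, SD 15) from Python's `random` module with a fixed seed, and `fraction_within` is a hypothetical helper.

```python
import random
import statistics

def fraction_within(values, k):
    """Fraction of values lying within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low, high = mean - k * sd, mean + k * sd
    return sum(low <= x <= high for x in values) / len(values)

# Simulated bell-shaped data (normal, mean 100, SD 15); seeded so runs repeat.
rng = random.Random(42)
scores = [rng.gauss(100, 15) for _ in range(10_000)]

for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    print(f"within {k} SD: {fraction_within(scores, k):.3f}  (empirical rule: about {rule})")
```

The simulated fractions land close to 68%, 95%, and 99.7%. For data that are not bell-shaped, they can be quite different.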
Interpreting a known value of the standard deviation
If the standard deviation of a collection of data is a known value, use it to find rough estimates of the minimum and maximum "usual" sample values as follows:
minimum "usual" value = (mean) − 2 × (standard deviation)
maximum "usual" value = (mean) + 2 × (standard deviation)
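The two formulas above translate directly into code. In this minimal sketch, the helper names `usual_limits` and `is_unusual` and the example numbers (mean 72, standard deviation 10) are hypothetical. A value exactly on a limit is treated as usual.

```python
def usual_limits(mean, sd):
    """Return (minimum usual value, maximum usual value) = mean -/+ 2 * sd."""
    return mean - 2 * sd, mean + 2 * sd

def is_unusual(x, mean, sd):
    """True if x falls outside the mean -/+ 2 * sd band."""
    low, high = usual_limits(mean, sd)
    return x < low or x > high

# Hypothetical example: pulse rates with mean 72 and standard deviation 10.
print(usual_limits(72, 10))      # (52, 92)
print(is_unusual(100, 72, 10))   # True
```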
Percentiles
Percentiles are measures of location, denoted P1, P2, ..., P99, which divide a set of data into 100 groups with about 1% of the values in each group. Percentiles are one type of quantiles (or fractiles), which partition data into groups with roughly the same number of values in each group. For example, the 50th percentile, denoted P50, has about 50% of the data values below it and about 50% of the data values above it, so the 50th percentile is the same as the median.
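These notes don't give a procedure for computing a percentile. The sketch below assumes the locator convention common in introductory texts: L = (k/100)·n. If L is a whole number, average the Lth and (L+1)th sorted values; otherwise round L up and take that sorted value. Statistical software often uses other interpolation rules, so results can differ slightly. The function name `percentile` is hypothetical.

```python
import math

def percentile(values, k):
    """P_k for 1 <= k <= 99 using the locator L = (k/100) * n (assumed convention).

    If L is a whole number, average the L-th and (L+1)-th sorted values;
    otherwise round L up to the next whole number and take that sorted value.
    """
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one data value")
    if (k * n) % 100 == 0:                  # L is a whole number
        L = k * n // 100
        return (data[L - 1] + data[L]) / 2  # L-th and (L+1)-th values, 1-based
    L = math.ceil(k * n / 100)              # round L up
    return data[L - 1]

data = list(range(1, 11))    # hypothetical data: 1, 2, ..., 10
print(percentile(data, 50))  # 5.5, the same as the median
print(percentile(data, 25))  # 3
```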
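Chebyshev's Theorem
The empirical rule applies only to data sets with bell-shaped distributions, but Chebyshev's theorem applies to any data set: the proportion of any set of data lying within K standard deviations of the mean is always at least 1 − 1/K², where K is any positive number greater than 1. (For K = 2, at least 3/4, or 75%, of the values lie within 2 standard deviations of the mean.) Unfortunately, results from Chebyshev's theorem are only approximate. Because the results are lower limits ("at least"), Chebyshev's theorem has limited usefulness.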
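A minimal sketch comparing Chebyshev's lower bound 1 − 1/K² with the actual fraction of a skewed (clearly not bell-shaped) data set within K standard deviations. The data and helper names are hypothetical. The population standard deviation (`statistics.pstdev`) is used, matching the theorem's population form.

```python
import statistics

def chebyshev_bound(k):
    """At least 1 - 1/k^2 of ANY data set lies within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Actual fraction of values within k population SDs of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)

# Hypothetical, strongly right-skewed data.
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 20, 45]

for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(skewed, k))
```

The bound (75% for K = 2, about 88.9% for K = 3) is always satisfied, but the actual fractions here (90% and 100%) are higher. This is why the theorem's "at least" results are of limited use for describing a particular data set.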
The Range
The range of a set of data values is the difference between the maximum data value and the minimum data value:
Range = (maximum data value) − (minimum data value)
Because it uses only the maximum and minimum data values, the range is very sensitive to extreme values and isn't as useful as the other measures of variation that use every data value.
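A short sketch of both weaknesses of the range. The data are hypothetical and `data_range` is an illustrative name. Adding one extreme value changes the range drastically. Two data sets can also share a range while having very different spread, because the values in between are ignored.

```python
import statistics

def data_range(values):
    """Range = (maximum data value) - (minimum data value)."""
    return max(values) - min(values)

# Hypothetical times in minutes, with and without one extreme value.
times = [12, 14, 15, 15, 16, 18]
print(data_range(times), data_range(times + [60]))   # 6 48

# Same range, different spread: the range ignores every value in between.
a = [1, 5, 5, 5, 9]
b = [1, 1, 5, 9, 9]
print(data_range(a), data_range(b))                  # 8 8
print(statistics.stdev(a), statistics.stdev(b))
```

Both `a` and `b` have range 8, but their standard deviations (about 2.83 and 4.0) show that `b` is more spread out.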
Standard Deviation
The standard deviation of a set of sample values, denoted by s, is a measure of how much data values deviate away from the mean. It is calculated by using Formula 3-4 or 3-5. Formula 3-5 is just a different version of Formula 3-4, so both formulas are algebraically the same. The value of the standard deviation s is usually positive. It is zero only when all of the data values are the same number. (It is never negative.) Also, larger values of s indicate greater amounts of variation. The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values that are very far away from all of the others). The units of the standard deviation s (such as minutes, feet, pounds, and so on) are the same as the units of the original data values. The sample standard deviation s is a biased estimator of the population standard deviation σ, as described in Part 2 of this section.
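Formulas 3-4 and 3-5 aren't reproduced in these notes. Assuming they are the usual definitional form s = √(Σ(x − x̄)² / (n − 1)) and the shortcut form s = √((nΣx² − (Σx)²) / (n(n − 1))), the sketch below computes s both ways and checks that they agree with each other and with `statistics.stdev`. The function names and data are hypothetical.

```python
import math
import statistics

def sd_definitional(x):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

def sd_shortcut(x):
    """s = sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )"""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(xi * xi for xi in x)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

data = [2, 4, 4, 5, 7]   # hypothetical sample
print(sd_definitional(data), sd_shortcut(data), statistics.stdev(data))
```

All three give s = √3.3 ≈ 1.82. The shortcut form needs only the running totals Σx and Σx², which is why it was convenient for hand calculation.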
Estimating a value of the standard deviation
To roughly estimate the standard deviation from a collection of known sample data, use s ≈ range/4, where range = (maximum data value) − (minimum data value).
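A minimal sketch of the range/4 estimate next to the actual sample standard deviation. The data and the helper name `estimate_sd` are hypothetical.

```python
import statistics

def estimate_sd(values):
    """Range rule of thumb: s is roughly (maximum - minimum) / 4."""
    return (max(values) - min(values)) / 4

# Hypothetical sample; compare the rough estimate with the actual s.
sample = [12, 14, 15, 15, 16, 18]
print(estimate_sd(sample), statistics.stdev(sample))   # 1.5 2.0
```

The estimate (1.5) is in the right neighborhood of the actual value (2.0), but only roughly. That is the trade-off of accuracy for simplicity that the rule accepts.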
Quartiles
are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
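Python's standard library can compute the three quartiles with `statistics.quantiles(data, n=4)` (Python 3.8+). Its default `method="exclusive"` interpolates between sorted values, so Q1 and Q3 may differ slightly from a textbook locator rule. Q2 always matches the median. The data below are hypothetical.

```python
import statistics

data = list(range(1, 11))   # hypothetical data: 1, 2, ..., 10

q1, q2, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
print(q1, q2, q3)                       # 2.75 5.5 8.25
print(q2 == statistics.median(data))    # True
```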
The Range Rule of Thumb
is a crude but simple tool for understanding and interpreting standard deviation. It is based on the principle that for many data sets, the vast majority (such as 95%) of sample values lie within 2 standard deviations of the mean. We could improve the accuracy of this rule by taking into account such factors as the size of the sample and the distribution, but here we sacrifice accuracy for the sake of simplicity.
Measure of Center
is a value at the center or middle of a data set.
Linear correlation coefficient r
measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample. Its value is computed by using Formula 10-1 or Formula 10-2. (The linear correlation coefficient is sometimes referred to as the Pearson product-moment correlation coefficient in honor of Karl Pearson (1857–1936), who originally developed it.)
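Formulas 10-1 and 10-2 aren't reproduced in these notes. The sketch below assumes the common computational form of Pearson's r: r = (nΣxy − (Σx)(Σy)) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)). The function name `pearson_r` and the paired data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Linear correlation coefficient r for paired sample data (x[i], y[i])."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return numerator / math.sqrt(spread)

# Hypothetical paired data.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0 (perfect positive linear pattern)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0 (perfect negative linear pattern)
```

Values of r near +1 or −1 indicate that the plotted points lie close to a straight line. Values near 0 indicate little linear correlation.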
Important Property of the Mean
non-resistance: the value of the mean can change substantially when even one outlier is added.
Mean
of a set of data is the measure of center found by adding the data values and dividing the total by the number of data values.
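A minimal sketch of the mean as "sum of values divided by how many there are," plus a demonstration of its non-resistance. The median (via `statistics.median`) is shown for comparison. The salary figures and the function name `mean` are hypothetical.

```python
import statistics

def mean(values):
    """Measure of center: add the data values, divide by how many there are."""
    return sum(values) / len(values)

# Hypothetical salaries in thousands of dollars, then the same data plus one outlier.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [500]

print(mean(salaries), statistics.median(salaries))
print(mean(with_outlier), statistics.median(with_outlier))
```

Adding the single value 500 moves the mean from 44.8 to about 120.7, while the median only moves from 45 to 46.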