Chapter 3: Numerical Descriptive Measures
Coefficient of Variation
Measures relative variation, always expressed as a percentage. Shows variation relative to the mean: CV = (standard deviation / mean) x 100%.
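As a rough illustration of the formula, the sketch below computes the coefficient of variation with Python's standard `statistics` module. The function name `coefficient_of_variation` and the data are made up for this example, and the sample standard deviation (n - 1 divisor) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100

# Hypothetical data set, chosen only for illustration.
data = [10, 12, 14, 16, 18]
cv = coefficient_of_variation(data)
print(f"CV = {cv:.1f}%")  # prints "CV = 22.6%"
```

Because the result is a percentage, it can be used to compare the variability of data sets measured in different units.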
Measures of Central Tendency:
Mean, Median, Mode
Quartile Measures
Quartiles split the ranked data into 4 segments with an equal number of values per segment. Q1 is the value for which 25% of the observations are smaller and 75% are larger. Q2 is the same as the median: 50% of the observations are smaller and 50% are larger. Q3 is the value for which 75% of the observations are smaller and only 25% are larger.
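One possible way to compute quartiles is sketched below with `statistics.quantiles` from the Python standard library. The data are hypothetical, and the "exclusive" method (positions based on n + 1, with linear interpolation) is an assumption; textbooks and software use several slightly different quartile rules, so results can differ for some data sets.

```python
import statistics

# Hypothetical data; quantiles() sorts its input internally.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # prints "12.5 16.0 19.5"
```

Here Q2 equals the median of the data (16), as the definition requires.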
Measures of Variation:
Range, Variance, Standard Deviation, Coefficient of Variation
Chebyshev Rule
Regardless of how the data are distributed, at least (1 - 1/k^2) x 100% of the values will fall within k standard deviations of the mean (for k > 1).
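The bound is easy to tabulate. The sketch below evaluates it for a few values of k; the function name `chebyshev_bound` is an assumption made for this example.

```python
def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    # Prints 75.00%, 88.89%, and 93.75% for k = 2, 3, 4.
    print(f"k = {k}: at least {chebyshev_bound(k):.2%}")
```

These are guaranteed minimums for any distribution, so the actual share of values within k standard deviations is often higher.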
The Empirical Rule
The rule approximates the variation of data in bell-shaped distributions. Approximately 68% of the data in a bell-shaped distribution lies within one standard deviation of the mean, approximately 95% lies within two standard deviations of the mean, and approximately 99.7% lies within three standard deviations of the mean.
Box-and-Whisker Plot
a graphical display of the five number summary.
Interpreting Covariance:
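cov(X,Y) > 0: X and Y tend to move in the same direction. cov(X,Y) < 0: X and Y tend to move in opposite directions. cov(X,Y) = 0: X and Y have no linear relationship (independent variables have zero covariance, but zero covariance alone does not prove independence). The covariance has a major flaw: its size depends on the units of X and Y, so it is not possible to determine the relative strength of the relationship from the size of the covariance.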
Measures of Variation
give information on the spread, variability, or dispersion of the data values.
Median
in an ordered array, the median is the "middle" number (50% above, 50% below). Not affected by extreme values.
Skewness
measures the amount of asymmetry in a distribution. (symmetric, or skewed)
Kurtosis
measures the relative concentration of values in the center of a distribution as compared with the tails. (flatter than bell-shaped, bell-shaped, sharper peak than bell-shaped).
Correlation Coefficient
measures the relative strength of a linear relationship between two numerical variables.
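A minimal sketch of the sample correlation coefficient follows, written out by hand so that each step is visible. The function name `sample_correlation` and the data are hypothetical.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the n - 1 divisors
    # of the covariance and the standard deviations cancel out in r.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 57, 70, 71]
r = sample_correlation(hours, scores)
print(f"r = {r:.3f}")  # prints "r = 0.917"

# Changing the units of one variable leaves r unchanged.
r_rescaled = sample_correlation(hours, [s * 10 for s in scores])
```

The coefficient always lies between -1 and +1, which is what makes it a relative measure of strength.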
Covariance
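measures the direction of the linear relationship between two numerical variables. Its size depends on the units of measurement, so it does not show the relative strength of the relationship.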
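The sample covariance can be sketched as follows. The function name `sample_covariance` and the data are made up for illustration, and the n - 1 divisor for sample data is assumed.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)

# Hypothetical data with a positive relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
cov = sample_covariance(xs, ys)
print(cov)  # prints "5.0"

# The same relationship with Y expressed in units 100 times smaller.
cov_rescaled = sample_covariance(xs, [y * 100 for y in ys])
print(cov_rescaled)  # prints "500.0"
```

The sign of the result gives the direction of the relationship, while the size changes with the units of measurement.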
Five Number Summary
minimum, Q1, median, Q3, maximum
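As an illustration, the five values can be collected with a small helper. The name `five_number_summary` and the data are hypothetical, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, which is only one of several quartile conventions.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical data set.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
summary = five_number_summary(data)
print(summary)  # prints "(11, 12.5, 16.0, 19.5, 22)"
```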
Standard Deviation
most commonly used measure of variation. Shows variation (average scatter) about the mean. Is the square root of the variance.
Variation
the amount of dispersion, or scattering, of values.
Variance
the average (approximately) of squared deviations of values from the mean. For a sample, the sum of squared deviations is divided by n - 1 rather than n.
Population Variance
the average of squared deviations of values from the mean.
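The difference between the sample and population versions can be seen with the standard `statistics` module. This is a sketch on hypothetical data; `variance` and `stdev` use the n - 1 divisor, while `pvariance` and `pstdev` use N.

```python
import statistics

# Hypothetical data; the squared deviations from the mean (14) sum to 40.
data = [10, 12, 14, 16, 18]

sample_var = statistics.variance(data)       # 40 / (5 - 1) = 10
population_var = statistics.pvariance(data)  # 40 / 5 = 8

# Each standard deviation is the square root of the matching variance.
sample_sd = statistics.stdev(data)
population_sd = statistics.pstdev(data)
print(sample_var, population_var)
```

Treating the same values as a sample rather than a full population always gives the slightly larger variance.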
Range
the difference between the largest and the smallest values. Disadvantages: ignores the way in which data are distributed. Sensitive to outliers.
Central Tendency
the extent to which all the data values group around a typical or central value.
The arithmetic mean
the most common measure of central tendency. The sum of all values divided by the total number of values. Affected by extreme values (outliers).
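The effect of an extreme value on the mean can be shown with a small experiment. The two data sets below are hypothetical and differ only in their largest value.

```python
import statistics

typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # same data, largest value replaced

mean_typical = statistics.mean(typical)         # 15 / 5 = 3
mean_outlier = statistics.mean(with_outlier)    # 110 / 5 = 22
median_outlier = statistics.median(with_outlier)  # middle value is still 3
print(mean_typical, mean_outlier, median_outlier)  # prints "3 22 3"
```

A single outlier pulls the mean from 3 to 22, while the middle value of the ordered data stays at 3.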
Population Standard Deviation
the most commonly used measure of variation for a population. Is the square root of the population variance.
Z-Score
the number of standard deviations a data value is from the mean. To compute the z-score of a data value, subtract the mean and divide by the standard deviation. A data value with a z-score less than -3.0 or greater than +3.0 is considered an outlier.
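A minimal sketch of the calculation and the outlier check follows. The function name `z_scores` and the data are made up for this example, and the sample standard deviation is assumed.

```python
import statistics

def z_scores(values):
    """Return (value - mean) / standard deviation for every value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical data: nineteen values of 10 and one extreme value of 50.
data = [10] * 19 + [50]

# Keep the values whose z-score lies beyond -3.0 or +3.0.
outliers = [v for v, z in zip(data, z_scores(data)) if abs(z) > 3.0]
print(outliers)  # prints "[50]"
```

In this data set the mean is 12 and the standard deviation is about 8.94, so the value 50 has a z-score of about 4.25.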
Shape
the pattern of the distribution of values from the lowest value to the highest value.
Population Mean
the sum of the values in the population divided by the population size, N.
Mode
the value that occurs most often. Not affected by extreme values. Used for either numerical or categorical data. There can be no mode, or several modes.
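One way to find modes in Python is `statistics.multimode` (available from Python 3.8), sketched below on hypothetical data. Note that it returns every value tied for the highest count, so when no value repeats it returns all values, which these notes would describe as having no mode.

```python
import statistics

two_modes = statistics.multimode([1, 2, 2, 3, 3])
print(two_modes)  # prints "[2, 3]"

# Works for categorical data as well.
color_mode = statistics.multimode(["red", "blue", "red"])
print(color_mode)  # prints "['red']"

# No value repeats: every value is tied with a count of 1.
no_repeat = statistics.multimode([4, 5, 6])
print(no_repeat)  # prints "[4, 5, 6]"
```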