Statistics Chapter 3
Evaluating Normality
- Construct charts or graphs - Compute descriptive summary measures - Observe the distribution of the data set - Evaluate normal probability plot
Range
- Does not account for how the data are distributed - Sensitive to outliers
The Coefficient of Variation
- Measures relative variation - Always in percentage (%) - Shows variation relative to mean - Can be used to compare the variability of two or more sets of data measured in different units
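As a rough illustration of why the coefficient of variation is useful across units, the sketch below computes CV = (standard deviation / mean) x 100% for the same hypothetical heights expressed in centimeters and in inches. The helper name `coefficient_of_variation` and the data are assumptions made for this example, and the sample standard deviation is used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample coefficient of variation as a percentage."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return std_dev / mean * 100


# Hypothetical data: the same five heights in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_in = [h / 2.54 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)

print(f"CV in cm: {cv_cm:.2f}%")
print(f"CV in inches: {cv_in:.2f}%")
```

The two results agree because the unit cancels when the standard deviation is divided by the mean, which is what makes the measure comparable across data sets in different units.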
Standard Deviation
- Most commonly used measure of variation - Shows variation about the mean - Is the square root of the variance - Has the same units as the original data
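A minimal sketch of the relationship between variance and standard deviation, using Python's `statistics` module on hypothetical values. It shows both the population versions (divide by N) and the sample versions (divide by n - 1); which one applies depends on whether the data are a full population or a sample.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

pop_var = statistics.pvariance(data)    # population variance (divides by N)
pop_std = statistics.pstdev(data)       # population standard deviation
sample_var = statistics.variance(data)  # sample variance (divides by n - 1)
sample_std = statistics.stdev(data)     # sample standard deviation

# In both cases the standard deviation is the square root of the variance.
print(pop_var, pop_std)
print(sample_var, sample_std)
```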
Evaluating Normality
- Not all continuous distributions are normal - It is important to evaluate how well the data set is approximated by a normal distribution - Normally distributed data should approximate the theoretical normal distribution
IQR
- IQR = Q3 - Q1; measures the spread in the middle 50% of the data - Also called the midspread - A measure of variability that is not influenced by outliers or extreme values - A resistant measure
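The sketch below illustrates the resistance of the IQR by comparing it with the range on two hypothetical data sets that differ only in their largest value. It assumes the default quartile rule of `statistics.quantiles` (Python 3.8+); textbooks use slightly different quartile rules, so hand-computed quartiles may differ for other data.

```python
import statistics


def interquartile_range(values):
    """Return Q3 - Q1 using the default rule of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


# Hypothetical data: identical except for one extreme value.
base = list(range(1, 12))                 # 1, 2, ..., 11
with_outlier = list(range(1, 11)) + [100]  # 1, 2, ..., 10, 100

print(value_range(base), value_range(with_outlier))
print(interquartile_range(base), interquartile_range(with_outlier))
```

The range jumps from 10 to 99 when the outlier is introduced, while the IQR stays at 6 for both data sets.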
Mean Absolute Deviation (MAD)
- A measure of dispersion - The average of the absolute differences between each value in a set and the mean of that set
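The definition above can be sketched directly in code. The function name `mean_absolute_deviation` and the data are hypothetical, chosen only for illustration.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mean = statistics.mean(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [2, 4, 6, 8, 10]  # hypothetical values, mean = 6

# Absolute deviations are 4, 2, 0, 2, 4, so the average is 12 / 5.
mad = mean_absolute_deviation(data)
print(mad)
```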
Measures of Variation: Summary Characteristics
- The more the data are spread out, the greater the range, variance, and standard deviation. - The more the data are concentrated, the smaller the range, variance, and standard deviation. - If the values are all the same (no variation), all these measures will be zero. - None of these measures are ever negative.
Z-Score: Locating Extreme Outliers
- The number of standard deviations a data value is from the mean - A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0 - The larger the absolute value of the Z-score, the farther the data value is from the mean
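A minimal sketch of the Z-score rule, Z = (X - mean) / standard deviation, applied to hypothetical data. The helper names `z_scores` and `extreme_outliers` are assumptions for this example, and the sample standard deviation is used.

```python
import statistics


def z_scores(values):
    """Return the Z-score of every value, using the sample standard deviation."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return [(x - mean) / std_dev for x in values]


def extreme_outliers(values, threshold=3.0):
    """Return the values whose Z-score lies outside -threshold..+threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical data: twenty ordinary values and one very large one.
data = [8, 9, 10, 11, 12] * 4 + [100]

print(extreme_outliers(data))
```

Only the value 100 is flagged here. Note that in very small samples no value can reach a Z-score of 3, because the outlier itself inflates the standard deviation.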
The Empirical Rule
- Approximates the variation of data in a bell-shaped distribution - Approximately 68% of the data in a bell-shaped distribution lies within one standard deviation of the mean - Approximately 95% of the data in a bell-shaped distribution lies within two standard deviations of the mean - Approximately 99.7% of the data in a bell-shaped distribution lies within three standard deviations of the mean
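The percentages in the empirical rule can be checked against the theoretical normal distribution. The sketch below uses `statistics.NormalDist` (available in Python 3.8 and later) to compute the share of a standard normal distribution within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```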
Mode
- Value that occurs most often - Not affected by extreme values - Used for either numerical or categorical (nominal) data - There may be no mode - There may be several modes
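The cases listed above (one mode, several modes, no mode, categorical data) can be sketched with a small helper. The name `modes` is hypothetical, and the choice to report no mode when every value occurs exactly once is a convention assumed for this example.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: every value occurs once, so no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 1, 2, 2, 3]))            # two modes
print(modes(["red", "blue", "red"]))     # categorical data, one mode
print(modes([1, 2, 3]))                  # no mode
```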
The Boxplot
A graphical display of the data based on the five-number summary
Mean, Median, Mode, Geometric Mean
Central Tendency
Median
In an ordered array, the median is the "middle" number (50% above, 50% below). Less sensitive than the mean to extreme values.
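To illustrate the difference in sensitivity, the sketch below replaces the largest value in a hypothetical data set with an extreme value and compares how far the mean and the median move.

```python
import statistics

typical = [1, 2, 3, 4, 5]          # hypothetical values
with_outlier = [1, 2, 3, 4, 500]   # same data, largest value made extreme

# The mean moves from 3 to 102; the median stays at 3.
mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)

print(mean_shift, median_shift)
```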
Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Variation
Chebyshev Rule
Regardless of how the data are distributed, at least (1 - 1/k^2) x 100% of the values will fall within k standard deviations of the mean (for k > 1)
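A small sketch of the Chebyshev bound, (1 - 1/k^2) x 100%, for a few values of k. The function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the Chebyshev rule applies only for k > 1")
    return (1 - 1 / k**2) * 100


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2f}%")
```

For k = 2 the bound is 75% and for k = 3 it is about 88.89%. These are lower than the 95% and 99.7% of the empirical rule because the Chebyshev rule makes no assumption about the shape of the distribution.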
Variation
The amount of dispersion or scattering of values
Central Tendency
The extent to which all the data values group around a typical or central value
Skewness
The extent to which data values are not symmetrical
Mean
The most common measure of central tendency. Affected by extreme values (outliers).
Shape
The pattern of the distribution of values from the lowest value to the highest value
Population mean
The sum of the values in the population divided by the population size, N
The Five Number Summary
Five numbers that help describe the center, spread, and shape of data: the smallest value, Q1, the median, Q3, and the largest value
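A minimal sketch of a five-number summary (smallest value, Q1, median, Q3, largest value) for hypothetical data. It relies on the default quartile rule of `statistics.quantiles` (Python 3.8+); other quartile conventions can give slightly different Q1 and Q3. The function name `five_number_summary` is an assumption for this example.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for the given values."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]


data = [7, 3, 11, 1, 9, 5, 2, 10, 4, 8, 6]  # hypothetical values 1..11, unordered

summary = five_number_summary(data)
print(summary)
```

These five values are also the ones a boxplot draws.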
The Covariance
A measure of the linear relationship between two numerical variables; its sign indicates the direction of the relationship
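As an illustration, the sketch below computes the sample covariance, the sum of (X - mean of X)(Y - mean of Y) divided by n - 1, for hypothetical paired data. The function name `sample_covariance` is an assumption for this example.

```python
def sample_covariance(xs, ys):
    """Return the sample covariance of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]            # hypothetical values
ys_rising = [2, 4, 6, 8, 10]    # increases with xs
ys_falling = [10, 8, 6, 4, 2]   # decreases as xs increases

print(sample_covariance(xs, ys_rising))   # positive
print(sample_covariance(xs, ys_falling))  # negative
```

A positive result means the two variables tend to move in the same direction, and a negative result means they tend to move in opposite directions. The size of the covariance depends on the units of both variables.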
Quartile
Quartiles split the ranked data into four segments with an equal number of values per segment
Symmetric
In a perfectly symmetric, bell-shaped distribution, the mean, median, and mode are identical.
Left-Skewed (Negatively Skewed)
The mean is lowest, followed by the median and then the mode (mean < median < mode).
Right-Skewed (Positively Skewed)
The mode is lowest, followed by the median and then the mean (mode < median < mean).
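The orderings for skewed data can be checked on small hypothetical data sets. The helper name `center_measures` and both data sets are assumptions for this example; the ordering is a typical pattern and is not guaranteed for every skewed data set.

```python
import statistics


def center_measures(values):
    """Return (mean, median, mode) for the given values."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


right_skewed = [1, 2, 2, 2, 3, 3, 4, 10]   # long tail toward large values
left_skewed = [1, 7, 8, 8, 9, 9, 9, 10]    # long tail toward small values

r_mean, r_median, r_mode = center_measures(right_skewed)
l_mean, l_median, l_mode = center_measures(left_skewed)

print(r_mode, r_median, r_mean)  # mode < median < mean
print(l_mean, l_median, l_mode)  # mean < median < mode
```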