Chapter 3 Central Tendency, Variation, and Position
Measures of Variation
Measures of dispersion, variability, or spread: numbers that describe how spread out or tightly packed the data values are. The main ones are the range, the variance, and the standard deviation.
Range
Measure of variation: the difference between the largest and the smallest values in the data set (maximum minus minimum). It equals zero when all values are the same.
Standard deviation
Measure of variation that quantifies how far the data values are dispersed from their mean. It equals zero when all values are the same, and it is the square root of the variance.
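The two properties on this card can be checked with a short Python sketch. It assumes the population formulas (dividing by the number of values), and the data values are made up for illustration.

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: the mean of the squared deviations from the mean.
variance = statistics.pvariance(data)
# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# When every value is the same, there is no spread at all.
constant = [6, 6, 6, 6]
constant_sd = statistics.pstdev(constant)
```

Here the variance works out to 4 and the standard deviation to 2, while the constant data set has a standard deviation of 0.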
Mean vs Median
The median is less sensitive to outliers than the mean.
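A small Python sketch of this comparison: one extreme value is added to a made-up data set and the change in each measure is recorded. The numbers are hypothetical.

```python
import statistics

# Hypothetical values; the second list adds one extreme value.
values = [30, 32, 35, 36, 40]
with_outlier = values + [400]

# How far each measure moves once the outlier is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)
```

The mean moves by about 61 units, while the median moves by only 0.5.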
Properties of the Coefficient of Variation
Applies only to ratio-scale data. Usually expressed as a percentage, which can be less than, equal to, or more than 100%. A higher CV means more relative variability in the data. Unit-less (dimensionless).
Normal Distribution
50% of the data values are less than the mean and 50% are greater than the mean. The mean, median, and mode are all located at the peak.
Empirical Rule
About 68% of values lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations. ***Applies to every normal distribution, no matter how wide or narrow the bell is.***
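The three percentages can be reproduced with Python's standard-library `statistics.NormalDist`. The helper name `share_within` and the example mean of 100 and standard deviation of 15 are assumptions made for this sketch.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Shares for k = 1, 2, 3 on the standard normal distribution.
standard = [round(share_within(k), 3) for k in (1, 2, 3)]
# The same shares on a normal distribution with a different mean and spread.
shifted = [round(share_within(k, mu=100, sigma=15), 3) for k in (1, 2, 3)]
```

Both lists come out as roughly 0.683, 0.954, and 0.997, which is why the rule does not depend on the particular mean or standard deviation.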
Shape of a Normal Distribution
Bell-shaped. Symmetric about the mean, which is its axis of symmetry. The mean, median, and mode coincide. Defined by its mean and standard deviation: the mean sets the location (position) of the bell, and the SD sets its width (broadness).
Outlier
Data value that is much larger or much smaller than most values in the data set. The mean is sensitive to outliers.
Four Properties of Z-scores
Dimensionless (plain numbers, no units). Mean is always zero. A positive z-score is above the mean. A negative z-score is below the mean.
Ungrouped Frequency Distribution
Frequency distribution in which the values of the variable are displayed individually
Grouped Frequency Distribution
Frequency distribution in which the values of the variable are grouped into classes
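The difference between the two kinds of frequency distribution can be sketched in Python. The scores, the class width of 5, and the class limits are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical quiz scores.
scores = [3, 7, 7, 8, 12, 12, 12, 15, 18, 19]

# Ungrouped: each distinct value gets its own count.
ungrouped = Counter(scores)

# Grouped: values are counted by class. The class width of 5 and the
# class limits (0-4, 5-9, ...) are assumptions made for this example.
width = 5
grouped = Counter((s // width) * width for s in scores)
classes = {f"{lo}-{lo + width - 1}": n for lo, n in sorted(grouped.items())}
```

With these scores, `ungrouped` lists seven distinct values, while `classes` condenses them into four classes.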
Boxplot (a.k.a. Box and Whisker Diagram)
Graphical representation of the five-number summary. Outliers are shown as asterisks or dots beyond the whiskers.
Interquartile Range
IQR = Q3 - Q1. Measure of variation: the distance from Q1 to Q3.
Chebyshev's Theorem
In any data set, the fraction of data values that lie within k standard deviations of the mean is at least 1 - 1/k^2, for k > 1. Holds for all distributions. Gives a lower bound ("at least").
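A minimal Python sketch of the bound, with a check against a deliberately skewed data set. The function names and the data are hypothetical, and the check uses the population standard deviation.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def share_within(data, k):
    """Observed fraction of values within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)

# Hypothetical, strongly skewed data set.
skewed = [1, 1, 1, 2, 2, 3, 50]
bound = chebyshev_lower_bound(2)      # at least 75% of values
observed = share_within(skewed, 2)    # actual share for this data set
```

For k = 2 the bound is 0.75, and the observed share for this data set (6 of 7 values) is above it. The observed share may be much higher than the bound, but it is never lower.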
Coefficient of Variation
Measure of variation that quantifies the variability in a data set relative to its mean. Also called the relative standard deviation. CV = standard deviation / mean. Unit-less.
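Because the CV has no units, it can compare variability across data sets measured in different units. The sketch below assumes the sample standard deviation and reports the CV as a percentage. The data sets are made up.

```python
import statistics

def coefficient_of_variation(data):
    """Return the CV as a percentage: standard deviation divided by the mean."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: the sample standard deviation is used here.
    return statistics.stdev(data) / mean * 100

# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
```

Both data sets have the same standard deviation, but the weights have the smaller mean and therefore the higher CV (about 22.6% against about 9.3%).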
Central Tendency
Mean (average), median, mode
Median
Measure of central tendency defined as the value that separates the upper and lower halves of a data set. It is the middle value once the numbers are put in order, and it is not always a value in the data set.
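A minimal Python sketch of the procedure, showing one case where the median is in the data set and one where it is not. The function is illustrative and the input values are made up.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # the numbers must be put in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([9, 1, 5])        # 5, which is in the data set
even_case = median([9, 1, 5, 6])    # 5.5, which is not in the data set
```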
Variance
Measure of variation that describes how far a set of numbers is spread out from its mean. It equals zero when all values are the same, and it is the standard deviation squared.
Percentiles
Numbers that divide a numerically ordered data set into one hundred equal groups, each containing a hundredth of the data values: P1, P2, P3, ..., P99.
Quartiles
Numbers that divide a numerically ordered data set into 4 equal groups, each containing a quarter of the data values: the 1st (lower) quartile, the 2nd quartile (the median), and the 3rd (upper) quartile.
Fences
Numbers used to determine whether given values are outliers. LF = Q1 - 1.5 × IQR and UF = Q3 + 1.5 × IQR. If x is greater than LF and less than UF, it is not an outlier. If x is less than or equal to LF, or greater than or equal to UF, it is an outlier.
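The fence rule on this card can be sketched in Python as follows. The function names and the example quartiles are hypothetical, and the comparison follows the card's convention that a value exactly on a fence counts as an outlier.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    """Values on or beyond a fence are treated as outliers."""
    lower, upper = fences(q1, q3)
    return x <= lower or x >= upper

# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
lower, upper = fences(10, 20)   # (-5.0, 35.0)
```

With these quartiles, 34 is not an outlier, while 35 and 36 are.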
Extreme Fences
Numbers used to determine whether outliers are mild or extreme. ELF = Q1 - 3 × IQR and EUF = Q3 + 3 × IQR. If x is greater than ELF and less than or equal to LF, or greater than or equal to UF and less than EUF, then x is a mild outlier. If x is less than or equal to ELF, or greater than or equal to EUF, then x is an extreme outlier.
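The mild/extreme classification can be written as one small Python function. This is a sketch: the name `classify` and the example quartiles are hypothetical, and the boundary cases follow the inequalities on this card.

```python
def classify(x, q1, q3):
    """Label x as 'not an outlier', 'mild outlier', or 'extreme outlier'."""
    iqr = q3 - q1
    lf, uf = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # fences
    elf, euf = q1 - 3 * iqr, q3 + 3 * iqr        # extreme fences
    # Check the extreme fences first, since they lie beyond the ordinary fences.
    if x <= elf or x >= euf:
        return "extreme outlier"
    if x <= lf or x >= uf:
        return "mild outlier"
    return "not an outlier"

# Hypothetical quartiles Q1 = 10 and Q3 = 20 give LF = -5, UF = 35, ELF = -20, EUF = 50.
labels = [classify(x, 10, 20) for x in (20, 40, 55)]
```

For these quartiles, 20 is not an outlier, 40 is a mild outlier, and 55 is an extreme outlier.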
Relations between Quartiles and Percentiles
Q1 = P25, Q2 = P50 = median, Q3 = P75
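These relations can be checked numerically with `statistics.quantiles` from Python's standard library. Quartile and percentile conventions vary between textbooks and software, so the exact cut points below reflect that function's default method rather than any particular course's rule. The data set is made up.

```python
import math
import statistics

# Hypothetical data set.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 37, 41]

# Three cut points split the data into 4 groups; 99 cut points split it into 100.
q1, q2, q3 = statistics.quantiles(data, n=4)
percentiles = statistics.quantiles(data, n=100)

# percentiles[0] is P1, so P25 sits at index 24, P50 at 49, and P75 at 74.
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

relations_hold = (
    math.isclose(q1, p25)
    and math.isclose(q2, p50)
    and math.isclose(q2, statistics.median(data))
    and math.isclose(q3, p75)
)
```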
Variation
Range, variance, standard deviation
Rounding Rule for Mean
Round the answer to one more decimal place than the original data
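A small Python sketch of the rounding rule. The helper name `rounded_mean` and its `data_decimals` parameter (the number of decimal places in the original data) are assumptions made for this example.

```python
def rounded_mean(data, data_decimals):
    """Mean rounded to one more decimal place than the original data."""
    mean = sum(data) / len(data)
    return round(mean, data_decimals + 1)

# Whole-number data: the mean is reported to one decimal place.
whole_numbers = rounded_mean([3, 5, 8], data_decimals=0)        # 5.3
# Data with one decimal place: the mean is reported to two.
one_decimal = rounded_mean([2.5, 3.1, 4.1], data_decimals=1)    # 3.23
```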
The Five Number Summary
Set of 5 values that provides information about the structure of a data set: minimum, Q1, Q2 (median), Q3, maximum.
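One way to compute the five values in Python is sketched below. It assumes the "median of each half" rule for quartiles, leaving out the middle value when the count is odd. Other textbooks and software use slightly different quartile rules, so results may differ by convention. The data set is hypothetical and the function needs at least two values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the middle value when the number of values is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

summary = five_number_summary([7, 1, 3, 9, 5, 11, 13])   # (1, 3, 7, 11, 13)
```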
The wider the bell curve, the bigger...
Standard deviation
Position
Standard scores, quartiles, percentiles
Mean (average)
Sum of all the numbers in a data set divided by the number of values
Table of Weighted Values
Table in which each value of the variable has an assigned weight
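A table of weighted values can be represented in Python as a list of (value, weight) pairs. The sketch below also computes a weighted mean, which is a common use of such a table but is an assumption here, since the card defines only the table itself. The grades and credit weights are made up.

```python
# Hypothetical table: each value of the variable paired with its weight.
weighted_table = [
    (90, 3),   # (value, weight), for example a grade and its credit hours
    (80, 4),
    (70, 2),
]

total_weight = sum(w for _, w in weighted_table)
# Weighted mean: each value counts in proportion to its weight.
weighted_mean = sum(v * w for v, w in weighted_table) / total_weight
```

Here the weighted mean is 730 / 9, or about 81.1, slightly above the unweighted mean of 80 because the highest value carries more weight than the lowest.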
Properties of Normal Distribution
The total area under the curve of a normal distribution is equal to 1 (100%). The curve approaches but NEVER touches the x-axis, and it extends indefinitely far from the mean in both directions.
Mode
Value or values that appear most often in a data set. Not affected by outliers. Applies to both numerical and categorical data. A data set can have more than one mode, or no mode at all.
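The cases on this card (one mode, several modes, no mode, categorical data) can be sketched in Python. The function name `modes` is hypothetical, and treating a data set in which no value repeats as having no mode is an assumed convention.

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []          # every value appears once: no mode
    return [v for v, c in counts.items() if c == top]

one_mode = modes([1, 2, 2, 3])
two_modes = modes([1, 1, 2, 2, 3])
no_mode = modes([1, 2, 3])
categorical = modes(["red", "blue", "red"])
with_outlier = modes([1, 2, 2, 3, 1000])   # the outlier does not change the mode
```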
Standard Scores (or Z-scores)
Z = (data value - mean) / SD. Indicates how many standard deviations away from the mean the value is.
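A minimal Python sketch of the formula. The function name and the example mean of 70 and standard deviation of 10 are made up for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

above = z_score(85, mean=70, sd=10)     # 1.5: above the mean
below = z_score(55, mean=70, sd=10)     # -1.5: below the mean
at_mean = z_score(70, mean=70, sd=10)   # 0.0: exactly at the mean
```

The subtraction is done before the division, and the result carries no units because the units of the numerator and denominator cancel.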
Population z-score
Z = (x - mean) / SD, written with GREEK letters: z = (x - μ) / σ
Sample z-score
Z = (x - mean) / SD, written with LATIN letters: z = (x - x̄) / s
Measures of central tendency
Numbers that describe the "center" or "middle point" of a data set (mean, median, mode)