Statistics, Chapter 3
Standard deviation (s)
Measures spread about the mean; the larger the SD, the more dispersed the distribution; not a resistant statistic
Estimate SD (for outlier test)
Min usual observation: mean - 2s; Max usual observation: mean + 2s; SD is approximately [range]/4= [max obs - min obs]/4
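The range-based estimate and the "usual observation" bounds above can be sketched in a few lines. This is only an illustration: the function names and the data values are hypothetical, and the range/4 figure is a rough approximation, not the exact SD.

```python
def range_rule_sd(data):
    """Rough SD estimate: (max obs - min obs) / 4."""
    return (max(data) - min(data)) / 4


def usual_bounds(mean, s):
    """Return (min usual, max usual) = (mean - 2s, mean + 2s)."""
    return mean - 2 * s, mean + 2 * s


# Hypothetical observations, used only to show the arithmetic.
data = [2, 4, 4, 4, 5, 5, 7, 9]
estimate = range_rule_sd(data)           # (9 - 2) / 4
low, high = usual_bounds(5, estimate)    # the mean of this data is 5
print(estimate, low, high)
```

An observation below `low` or above `high` would be flagged as unusual under this rule of thumb.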
Sample standard deviation
Most commonly used measure of variation - defined as the positive square root of sample variance (s-squared)
Degrees of freedom
(n-1)
Shape of frequency distribution
- perfectly symmetric: mean= median= mode - rightward skewness: mean> median> mode - leftward skewness: mean< median< mode
Measures of location
- symmetric with no outliers --- best measure is mean; all measures of center coincide (mean= median= mode) - asymmetric (skewed) and/or contains outliers --- best measure is median
Best measure: symmetric and no outliers
Center -- mean Dispersion -- SD
Best measure: skewed and outliers
Center -- median Dispersion -- IQR/2
Measures of relative standing (position)
Descriptive measures of the relationship of a data value to the rest of the data
Range
Difference between the largest and smallest observations R= LDV-SDV
Population coefficient of variation
Same equation as the sample CV, but the Roman letters are replaced with Greek: CV= sigma/(|mu|) X 100%
Resistant
Extreme values (very large or small) relative to the data do not affect its value substantially; not affected by outliers
Chebyshev's Rule
For any data set and any number k of at least 1, AT LEAST 100[1-(1/k-squared)]% of the observations fall within k standard deviations of the mean -- intervals (similar in form to those used on a bell-shaped graph) sample: (x-bar - ks, x-bar + ks); population: (mu - k sigma, mu + k sigma). When k= 1: 0% (no information); k= 2: 75%; k= 3: 88.9%; k= 4: 93.75%
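The percentages listed for Chebyshev's Rule follow directly from the formula 100[1-(1/k-squared)]. A minimal sketch, with a hypothetical function name, that reproduces them:

```python
def chebyshev_lower_bound(k):
    """Minimum percent of observations within k SDs of the mean."""
    if k < 1:
        raise ValueError("the bound is only meaningful for k >= 1")
    return 100 * (1 - 1 / k ** 2)


for k in (1, 2, 3, 4):
    print(k, round(chebyshev_lower_bound(k), 2))
```

Because the rule makes no assumption about shape, these are guaranteed minimums; bell-shaped data usually has far more than this share inside each interval.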
Empirical Rule
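Applies when the frequency distribution is bell-shaped; APPROX is the key word. (mu - sigma, mu + sigma) -- 68% [34% on each side of the mean]; (mu - 2sigma, mu + 2sigma) -- 95% [13.5% in each band between 1 and 2 sigma]; (mu - 3sigma, mu + 3sigma) -- 99.7% [2.35% in each band between 2 and 3 sigma]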
Box-and-whiskers plot
Graph representing information about the five-number summary and outliers; Simple - no outliers; Extended - outliers (mild - *; extreme - o)
Population variance
Greek sigma-squared; only difference between PV and SV is the denominator -- PV does not have the degrees of freedom (n-1) but instead just N
Population standard deviation
Greek sigma; only difference between PSD and SSD is the denominator -- PSD does not have the degrees of freedom (n-1) but instead just N
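The N versus n-1 denominators can be compared on a single data set. The sketch below uses Python's standard `statistics` module; the data values are hypothetical and chosen so the arithmetic is easy to check by hand (mean 5, sum of squared deviations 32).

```python
import statistics

# Hypothetical data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as a whole population: divide by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treating the same values as a sample: divide by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

The sample versions come out slightly larger, since the same sum of squared deviations is divided by a smaller number.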
Variability
How spread out the data are around the middle
Percentile
The kth percentile (Pk) is the value such that about k% of observations fall below Pk and about (100-k)% fall above Pk
Lower and upper fences
Lower inner fence: LIF= Q1 - 1.5(IQR); upper inner fence: UIF= Q3 + 1.5(IQR); lower outer fence: LOF= Q1 - 3(IQR); upper outer fence: UOF= Q3 + 3(IQR)
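A short sketch of the fence calculation: lower fences subtract a multiple of the IQR from Q1, and upper fences add the same multiple to Q3. The function names and quartile values are hypothetical, and the classification step assumes the common convention that values between the inner and outer fences are mild outliers and values beyond the outer fences are extreme outliers.

```python
def fences(q1, q3):
    """Return the inner and outer fences for the given quartiles."""
    iqr = q3 - q1
    return {
        "LOF": q1 - 3 * iqr,
        "LIF": q1 - 1.5 * iqr,
        "UIF": q3 + 1.5 * iqr,
        "UOF": q3 + 3 * iqr,
    }


def classify(x, f):
    """Classify x against the fences (assumed convention, see lead-in)."""
    if x < f["LOF"] or x > f["UOF"]:
        return "extreme outlier"
    if x < f["LIF"] or x > f["UIF"]:
        return "mild outlier"
    return "not an outlier"


f = fences(10, 20)  # hypothetical quartiles, so IQR = 10
print(f)
print(classify(40, f), classify(60, f), classify(15, f))
```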
Variance
Measure of variation that involves differences among all observations in the data set
Outlier
Observation unusually large or small relative to the other values; Usual/ordinary observation: |z(x)| </= 2; Unusual observation: |z(x)| > 2; Mild outlier: 2 < |z(x)| </= 3; Extreme outlier: |z(x)| > 3
Interquartile range (IQR)
Range of the middle 50% of the observations; IQR= Q3-Q1. If the data set is skewed and/or there are outliers, the best measure of dispersion is IQR/2 (SKEWED LEFT if the distance between Q1 and Q2 is larger; SKEWED RIGHT if the distance between Q2 and Q3 is larger)
Five number summary
SDV (x-min); Q1; M; Q3; LDV (x-max); SDV - smallest data value larger than LIF; LDV - largest data value smaller than UIF
Sample coefficient of variation (CV)
Defined for a sample of n observations with mean x-bar and variance s-squared; the lower the CV, the less variation in the data; CV= s/(|x-bar|) X 100%; CV (A) > CV (B) -- data (A) is more variable than data (B)
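The CV is useful because it compares variation relative to the size of the mean. In the sketch below, two hypothetical samples have the same standard deviation but very different means, so their CVs differ; the function name and data are illustrative only.

```python
import statistics


def coefficient_of_variation(data):
    """Sample CV in percent: s / |x-bar| * 100."""
    x_bar = statistics.mean(data)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.stdev(data) / abs(x_bar) * 100


# Hypothetical samples: both have s = 2, but the means are 12 and 102.
cv_a = coefficient_of_variation([10, 12, 14])
cv_b = coefficient_of_variation([100, 102, 104])
print(round(cv_a, 2), round(cv_b, 2))
```

Here CV (A) > CV (B), so sample A is more variable relative to its mean even though both samples have the same s.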
Sample variance
For a sample of n observations with mean x-bar, equal to the sum of the squared deviations from x-bar, divided by n-1: s-squared= [1/(n-1)] X SUM of (xi - x-bar)-squared, for i= 1 to n
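The definition can be computed step by step. The sketch below gives the sum-of-squared-deviations form along with an algebraically equivalent shortcut, (sum of xi-squared minus n times x-bar-squared) divided by n-1; the function names and data are hypothetical.

```python
def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Equivalent form: (sum of x^2 - n * x_bar^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(data) / n
    return (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
s_squared = sample_variance(data)
s = s_squared ** 0.5  # sample SD is the positive square root
print(s_squared, sample_variance_shortcut(data), s)
```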
Quartiles
Split the sorted data into four equal parts; Q1= P25; Q2= P50 (the median); Q3= P75 **If n is odd, leave out the median: Q1 is the median of the values below it and Q3 is the median of the values above it
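One way to compute quartiles by hand is the median-of-halves method, sketched below under the assumption that the median itself is left out of both halves when n is odd. The function name is hypothetical, and other conventions (including the interpolation used by some software) can give slightly different values.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


q1, q2, q3 = quartiles([7, 1, 5, 3, 2, 6, 4])  # hypothetical data, n = 7
print(q1, q2, q3, "IQR =", q3 - q1)
```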
Sample median
Value located in the middle of the data when arranged in ascending order, with 50% of observations above and 50% below; not affected by outliers -- denoted by M
Mode
Value that occurs most often in a data set -- one mode - unimodal -- two modes - bimodal -- more than two modes - multimodal
Arithmetic mean
Measure of central tendency; affected by extreme values (outliers) -- population mean= mu -- sample mean= x-bar
Summation sign (sigma)
n [top] -- endpoint; i= 1 [bottom] -- starting point
Z-score
z(x)= [x-mean]/SD ***apply correct sample vs. population symbols; z(x)= 0 -- x= mean; z(x)< 0 -- x< mean; z(x)> 0 -- x> mean; for bell-shaped data: z(x) in (-1,1) -- about 68%; z(x) in (-2,2) -- about 95%; z(x) in (-3,3) -- about 99.7%
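A z-score simply counts how many standard deviations an observation lies from the mean. The sketch below computes it and applies the cutoffs from the Outlier card to the absolute value of z (usual up to 2, mild outlier above 2 and up to 3, extreme outlier above 3). The function names and numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd


def classify_by_z(z):
    """Apply the |z| cutoffs: <= 2 usual, (2, 3] mild, > 3 extreme."""
    size = abs(z)
    if size <= 2:
        return "usual"
    if size <= 3:
        return "mild outlier"
    return "extreme outlier"


# Hypothetical distribution with mean 5 and SD 2.
for x in (5, 9, 10.5, -2):
    z = z_score(x, 5, 2)
    print(x, z, classify_by_z(z))
```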