Statistics, Chapter 3

Standard deviation (s)

Measures the spread of the data about the mean; a larger SD means the distribution is more dispersed. Not a resistant statistic (it is affected by outliers).

Estimate SD (for outlier test)

- Minimum usual observation: mean - 2s
- Maximum usual observation: mean + 2s
- Rough estimate: SD ≈ range/4 = (max obs - min obs)/4
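A minimal sketch of the range estimate above, assuming the data sit in a plain Python list. The function names and the numbers are made up for illustration.

```python
def range_rule_sd(data):
    """Rough SD estimate from the card: (max obs - min obs) / 4."""
    return (max(data) - min(data)) / 4


def usual_bounds(mean, s):
    """Minimum and maximum 'usual' observations: mean - 2s and mean + 2s."""
    return mean - 2 * s, mean + 2 * s


data = [4, 8, 10, 12, 16]          # illustrative values, mean is 10
s_est = range_rule_sd(data)        # (16 - 4) / 4
low, high = usual_bounds(10, s_est)
print(s_est, low, high)
```

Values outside (low, high) would be flagged as unusual under this rough test.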

Sample standard deviation

Most commonly used measure of variation; defined as the positive square root of the sample variance: s = sqrt(s²)

Degrees of freedom

n - 1; the denominator used in the sample variance and sample standard deviation

Shape of frequency distribution

- Perfectly symmetric: mean = median = mode
- Rightward skewness: mean > median > mode
- Leftward skewness: mean < median < mode
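A small sketch of how the ordering of mean and median can hint at skew, using Python's standard `statistics` module. This is a rule of thumb only; `shape_hint` and the sample data are hypothetical.

```python
import statistics


def shape_hint(data):
    """Compare mean and median to guess the direction of skew (rule of thumb only)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"


sample = [1, 2, 2, 3, 4, 5, 12]    # made-up data with a long right tail
print(statistics.mean(sample), statistics.median(sample), statistics.mode(sample))
print(shape_hint(sample))
```

For this sample the mean exceeds the median, which exceeds the mode, matching the rightward-skewness pattern.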

Measures of location

- Symmetric with no outliers: the best measure is the mean; all measures of center coincide (mean = median = mode)
- Asymmetric (skewed) and/or contains outliers: the best measure is the median

Best measure: symmetric and no outliers

- Center: mean
- Dispersion: SD

Best measure: skewed and outliers

- Center: median
- Dispersion: IQR/2

Measures of relative standing (position)

Descriptive measures of the relationship of a data value to the rest of the data

Range

Difference between the largest and smallest observations: R = LDV - SDV (largest data value minus smallest data value)

Population coefficient of variation

Same equation as the sample coefficient of variation (SCV), with the Roman letters replaced by Greek ones: CV = (σ/|μ|) × 100%

Resistant

A statistic is resistant if extreme values (very large or very small relative to the rest of the data) do not affect its value substantially; it is not affected by outliers

Chebyshev's Rule

For any data set and any number k ≥ 1, AT LEAST 100[1 - 1/k²]% of the observations fall within k standard deviations of the mean (the bound is only informative for k > 1). The intervals look similar to those used for a bell-shaped graph:
- Sample: (x̄ - ks, x̄ + ks)
- Population: (μ - kσ, μ + kσ)
Minimum percentage for each k:
- k = 1: 0%
- k = 2: 75%
- k = 3: 88.9%
- k = 4: 93.75%
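The percentages listed for Chebyshev's Rule can be reproduced with a one-line formula. This is a minimal sketch; the function name is an assumption, not from the source.

```python
def chebyshev_min_percent(k):
    """Minimum percentage of observations within k SDs of the mean: 100 * (1 - 1/k^2)."""
    if k < 1:
        raise ValueError("the bound is only meaningful for k >= 1")
    return 100 * (1 - 1 / k**2)


for k in (1, 2, 3, 4):
    print(k, round(chebyshev_min_percent(k), 2))
```

Because the rule holds for any data set, these are lower bounds; a bell-shaped distribution typically has far more than 75% of its values within two standard deviations.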

Empirical Rule

Applies when the frequency distribution is bell-shaped; APPROXIMATELY is the key word.
- (μ - σ, μ + σ): about 68% [34% on each side of the mean]
- (μ - 2σ, μ + 2σ): about 95% [a further 13.5% on each side]
- (μ - 3σ, μ + 3σ): about 99.7% [a further 2.35% on each side]
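As an illustration, the three Empirical Rule intervals can be computed for any mean and standard deviation. The function name and the values 100 and 15 are hypothetical.

```python
def empirical_intervals(mu, sigma):
    """Intervals holding roughly 68%, 95% and 99.7% of a bell-shaped distribution."""
    return {
        "68%": (mu - sigma, mu + sigma),
        "95%": (mu - 2 * sigma, mu + 2 * sigma),
        "99.7%": (mu - 3 * sigma, mu + 3 * sigma),
    }


# Hypothetical bell-shaped variable with mean 100 and SD 15
for share, interval in empirical_intervals(100, 15).items():
    print(share, interval)
```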

Box-and-whiskers plot

Graph representing the five-number summary and any outliers.
- Simple: no outliers shown
- Extended: outliers marked (mild: *; extreme: o)

Population variance

Denoted by the Greek σ² (sigma squared). The only difference between the population variance (PV) and the sample variance (SV) is the denominator: PV divides by N rather than by the degrees of freedom (n - 1)

Population standard deviation

Denoted by the Greek σ (sigma). The only difference between the population standard deviation (PSD) and the sample standard deviation (SSD) is the denominator: PSD divides by N rather than by the degrees of freedom (n - 1)

Variability

How spread out the data are around the middle

Percentile

The kth percentile, Pk, is the value such that k% of the observations fall below it and (100 - k)% fall above it

Lower and upper fences

- Lower inner fence: LIF = Q1 - 1.5(IQR)
- Upper inner fence: UIF = Q3 + 1.5(IQR)
- Lower outer fence: LOF = Q1 - 3(IQR)
- Upper outer fence: UOF = Q3 + 3(IQR)
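A short sketch of the fences: the lower fences subtract a multiple of the IQR from Q1 and the upper fences add the same multiple to Q3. The `classify` helper assumes the usual convention that values between the inner and outer fences are mild outliers and values beyond the outer fences are extreme; the function names and quartile values are made up.

```python
def fences(q1, q3):
    """Inner fences sit 1.5 IQRs beyond the quartiles, outer fences 3 IQRs beyond."""
    iqr = q3 - q1
    return {
        "LOF": q1 - 3 * iqr,
        "LIF": q1 - 1.5 * iqr,
        "UIF": q3 + 1.5 * iqr,
        "UOF": q3 + 3 * iqr,
    }


def classify(x, q1, q3):
    """Label a value by where it falls relative to the fences (assumed convention)."""
    f = fences(q1, q3)
    if x < f["LOF"] or x > f["UOF"]:
        return "extreme outlier"
    if x < f["LIF"] or x > f["UIF"]:
        return "mild outlier"
    return "not an outlier"


print(fences(10, 20))              # hypothetical quartiles Q1 = 10, Q3 = 20
print(classify(40, 10, 20))
```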

Variance

Measure of variation that involves differences among all observations in the data set

Outlier

Observation that is unusually large or small relative to the other values.
- Usual (ordinary) observation: |z(x)| ≤ 2
- Unusual observation: |z(x)| > 2
- Mild outlier: 2 < |z(x)| ≤ 3
- Extreme outlier: |z(x)| > 3

Interquartile range (IQR)

Range of the middle 50% of the observations: IQR = Q3 - Q1. If the data set is skewed and/or contains outliers, the best measure of dispersion is IQR/2.
- Skewed left: the distance between Q1 and Q2 is larger
- Skewed right: the distance between Q2 and Q3 is larger

Five number summary

SDV (x-min); Q1; M; Q3; LDV (x-max)
- SDV: smallest data value larger than the LIF (lower inner fence)
- LDV: largest data value smaller than the UIF (upper inner fence)

Sample coefficient of variation (CV)

For a sample of n observations with mean x̄ and variance s²: CV = (s/|x̄|) × 100%. The lower the CV, the less variation in the data; if CV(A) > CV(B), data set A is more variable than data set B
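A minimal sketch of comparing two samples by CV, using `statistics.stdev` (which divides by n - 1). The function name and both data sets are hypothetical.

```python
import statistics


def coefficient_of_variation(data):
    """Sample CV as a percentage: s / |x-bar| * 100."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(data) / abs(mean) * 100


a = [10, 20, 30]                   # made-up sample A
b = [18, 20, 22]                   # made-up sample B, same mean, tighter spread
print(coefficient_of_variation(a), coefficient_of_variation(b))
```

Both samples have the same mean, so the larger CV for A reflects only its larger standard deviation.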

Sample variance

For a sample of n observations with mean x̄, the sum of the squared deviations from the mean divided by n - 1: s² = [1/(n - 1)] Σ(xi - x̄)². Equivalent computing form: s² = [1/(n - 1)] (Σxi² - n·x̄²)
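The sketch below computes the sample variance two ways (squared deviations and the equivalent sum-of-squares form) and contrasts it with the population variance, which divides by N. Function names and data are illustrative; `statistics.variance` and `statistics.pvariance` are the standard-library counterparts.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Equivalent computing form: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return (sum(x * x for x in data) - n * mean**2) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up data with mean 5
print(sample_variance(data), statistics.variance(data))   # divide by n - 1
print(statistics.pvariance(data))                         # divide by N
print(math.sqrt(sample_variance(data)))                   # sample SD
```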

Quartiles

Split the sorted data into four equal parts.
- Q1 = P25
- Q2 = P50 (the median)
- Q3 = P75
If n is odd, leave out the median: Q1 is the median of the values below it and Q3 is the median of the values above it.
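One way to sketch the quartiles is the median-of-halves method, leaving the median out of both halves when n is odd. Other textbooks and software use different quartile conventions, so results may differ slightly; the function name and data are hypothetical.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    # For odd n the middle value is the median and is excluded from both halves
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


print(quartiles([1, 3, 5, 7, 9, 11, 13]))      # odd n
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))     # even n
```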

Sample median

Value located in the middle of the data when arranged in ascending order, with 50% of the observations above it and 50% below; not affected by outliers. Denoted by M

Mode

Value that occurs most often in a data set.
- One mode: unimodal
- Two modes: bimodal
- More than two modes: multimodal
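A brief sketch of labeling a data set by its number of modes with `statistics.multimode` (Python 3.8+). The `modality` helper is hypothetical. Note that when every value occurs exactly once, `multimode` returns all of them, a case many textbooks describe as having no mode.

```python
import statistics


def modality(data):
    """Return the list of modes and a label based on how many there are."""
    if not data:
        raise ValueError("data must not be empty")
    modes = statistics.multimode(data)
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label


print(modality([1, 2, 2, 3]))
print(modality([1, 1, 2, 2, 3]))
```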

Arithmetic mean

Measure of central tendency; affected by extreme values (outliers).
- Population mean: μ (mu)
- Sample mean: x̄ (x-bar)

Summation sign (sigma)

- n (written above Σ): endpoint of the sum
- i = 1 (written below Σ): starting point of the sum

Z-score

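z(x) = (x - mean)/SD; apply the correct symbols for a sample (x̄, s) or a population (μ, σ).
- z(x) = 0: x = mean
- z(x) < 0: x < mean
- z(x) > 0: x > mean
For a bell-shaped distribution:
- z(x) in (-1, 1): about 68% of observations
- z(x) in (-2, 2): about 95% of observations
- z(x) in (-3, 3): about 99.7% of observations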
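A minimal sketch of computing z-scores and applying the cutoffs from the Outlier card (usual up to 2, mild outlier up to 3, extreme beyond 3, all in absolute value). The function names and exam scores are made up.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


def classify(z):
    """Label an observation by the absolute value of its z-score."""
    size = abs(z)
    if size <= 2:
        return "usual"
    if size <= 3:
        return "mild outlier"
    return "extreme outlier"


# Hypothetical exam scores with mean 70 and SD 10
for score in (70, 85, 95, 105):
    z = z_score(score, 70, 10)
    print(score, z, classify(z))
```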

