Chapter 3 - Numerical Descriptive Measures
sample standard deviation
- MOST COMMONLY used measure of variation - shows variation about the MEAN (shows average "scatter" around the mean) - is the square root of the sample variance - has same units as the orig. data
population variance
- average of squared deviations of values from the mean: sum of (x - population mean)^2 / population size (N)
coefficient of variation
- measures relative variation - always in percentage - shows variation (standard deviation) relative to mean - can be used to compare variability of 2 or more sets of data
standard deviation
- most commonly used measure of variation - shows variation about the mean - is SQUARE ROOT of POPULATION VARIANCE
(n+1)/4
1st quartile location
(n+1)/2
2nd quartile location
3(n+1)/4
3rd quartile location
smallest, 1st quartile, median, 3rd quartile, largest
5 number summary includes:
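A minimal sketch of the five number summary built from the quartile location cards. The data values and the function names are made up for illustration, and the handling of fractional positions is an assumed convention that the cards do not state. It also assumes at least 2 values.

```python
def quartile_positions(n):
    """Ranked positions (1-based) of Q1, Q2 and Q3 for n values."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4


def value_at(ranked, position):
    """Look up a 1-based ranked position.

    Assumed convention (not stated on the cards): a whole-number position
    picks that value, a position ending in .5 averages its two neighbours,
    and any other position is rounded to the nearest whole number.
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ranked[whole - 1]
    if fraction == 0.5:
        return (ranked[whole - 1] + ranked[whole]) / 2
    return ranked[round(position) - 1]


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    ranked = sorted(values)  # the location formulas need ranked data
    q1_pos, q2_pos, q3_pos = quartile_positions(len(ranked))
    return (
        ranked[0],
        value_at(ranked, q1_pos),
        value_at(ranked, q2_pos),
        value_at(ranked, q3_pos),
        ranked[-1],
    )


data = [9, 1, 20, 4, 8, 3, 15, 6, 12]  # made-up values, n = 9
print(quartile_positions(len(data)))   # (2.5, 5.0, 7.5)
print(five_number_summary(data))       # (1, 3.5, 8, 13.5, 20)
```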
1 standard deviation
68% of data
2 standard deviations
95% of data
3 standard deviations
99.7% of data
kurtosis
measures the peakedness of the distribution curve, i.e. how sharply the curve rises approaching the center of the distribution
empirical rule
approximates the variation of data in a bell shaped distribution
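A short sketch of where the 68% / 95% / 99.7% figures come from, assuming an exactly normal (bell shaped) distribution. The function name share_within is made up for illustration; NormalDist is from Python's standard statistics module (3.8+).

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The printed shares come out at roughly 68%, 95% and 99.7%. Real data that is only approximately bell shaped will match these figures only approximately.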
sample variance
average of squared deviations of values from the mean: sum of (value - mean)^2 / (n-1)
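A minimal sketch comparing the population and sample formulas on the same made-up values. The only difference is the divisor (N versus n-1). Function names are illustrative.

```python
import math


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values, mean = 5
print(population_variance(data))  # 4.0  (32 / 8)
print(sample_variance(data))      # 32 / 7, slightly larger
print(sample_std_dev(data))
```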
(s/mean)*100%
coefficient of variation equation (s = standard deviation)
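A small sketch of the coefficient of variation, using made-up weights. It shows why this measure suits comparisons across data sets: changing the units (kg to g) changes s and the mean, but not their ratio. The function name is illustrative, and the sketch assumes a nonzero mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


weights_kg = [50, 60, 65, 70, 80]           # made-up values
weights_g = [w * 1000 for w in weights_kg]  # same data, different units

print(stdev(weights_kg), stdev(weights_g))  # differ by a factor of 1000
print(coefficient_of_variation(weights_kg))
print(coefficient_of_variation(weights_g))  # same percentage
```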
sum[(xi-xmean)(yi-ymean)]/(n-1)
covariance equation
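A minimal sketch of the sample covariance: each x deviation is multiplied by the matching y deviation, the products are summed, and the total is divided by n-1. Data and function name are made up for illustration.

```python
def sample_covariance(x, y):
    """Sum of (xi - xmean)(yi - ymean), divided by n - 1."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    products = ((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(sample_covariance(x, y_up))    # 5.0  (positive: same direction)
print(sample_covariance(x, y_down))  # -5.0 (negative: opposite direction)
```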
3
data value is considered an extreme value if z-score is less than -____ or greater than +____
measures of variation
give information on the SPREAD or VARIABILITY or dispersion of the data values
zero
if the values are all the same (NO variation), range, variance, and standard deviation will be ______
Q3-Q1
interquartile range
z score
is the number of standard deviations a data value is from the MEAN - the larger its absolute value, the farther the data value is from the mean
left-skewed
mean < median
right skewed
mean > median
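A rough sketch of the mean versus median comparison on the two skewness cards. The data sets and the function name are made up, and treating "mean equals median" as symmetric is an added assumption not stated on the cards. This is a rule of thumb, not a formal skewness statistic.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify shape by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "symmetric (assumed when mean equals median)"


long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]       # mean 4.75, median 3
long_left_tail = [1, 17, 18, 18, 19, 19, 19, 20]  # mean 16.375, median 18.5

print(skew_direction(long_right_tail))  # right-skewed
print(skew_direction(long_left_tail))   # left-skewed
```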
median
measure of central tendency - in an ordered array, the median is the middle - LESS SENSITIVE TO EXTREME VALUES
mean
measure of central tendency - most common, known as avg. - sum of values / number of values - AFFECTED BY EXTREME OUTLIERS
mode
measure of central tendency - value that occurs the most often - NOT AFFECTED BY EXTREME VALUES - used for either numerical or categorical data - can be several of these
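A short sketch of how the three measures of central tendency react to one extreme value, using made-up data and Python's standard statistics module (multimode needs 3.8+). The helper name center_summary is illustrative.

```python
from statistics import mean, median, multimode


def center_summary(values):
    """Mean, median and every mode of a list of values."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # a list, since there can be several
    }


typical = [1, 2, 2, 3, 4]
with_extreme = [1, 2, 2, 3, 100]  # largest value replaced by an extreme one

print(center_summary(typical))       # mean 2.4, median 2, modes [2]
print(center_summary(with_extreme))  # mean 21.6, median 2, modes [2]
print(multimode([1, 1, 2, 2, 3]))    # [1, 2] (two modes)
```

Only the mean moves when the extreme value is introduced.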
range
measure of variation - difference b/t largest and smallest value - SIMPLEST measure of variation - can be MISLEADING B/C DOES NOT ACCOUNT FOR HOW DATA IS DISTRIBUTED - sensitive to outliers
negative
measures of variation will never be _____
skewness
measures the extent to which data values are NOT symmetrical
coefficient of correlation
measures the relative strength of the linear relationship b/t 2 numerical variables
interquartile range
measures the spread in the middle 50% of the data - not influenced by outliers or extreme values (resistant measure)
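A small sketch contrasting the interquartile range with the range when one value becomes extreme. The data is made up. statistics.quantiles with method="exclusive" interpolates between ranked values, which may differ slightly from a textbook rounding rule for some sample sizes, so treat the quartiles here as one convention among several.

```python
from statistics import quantiles


def interquartile_range(values):
    """Q3 - Q1, using the 'exclusive' quartile method."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


data = [1, 3, 4, 6, 8, 9, 12, 15, 20]
stretched = [1, 3, 4, 6, 8, 9, 12, 15, 200]  # same data, largest value made extreme

print(value_range(data), value_range(stretched))                  # 19 199
print(interquartile_range(data), interquartile_range(stretched))  # 10.0 10.0
```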
covariance
measures the strength of the linear relationship b/t 2 numerical variables (X&Y) - BIGGEST FLAW: the size of the covariance does not show the relative strength of the relationship, b/c it depends on the units of X&Y - if covariance > 0, X&Y tend to move in the same direction - if < 0, X&Y tend to move in opposite directions - if = 0, X&Y have no linear relationship
(n+1)/2
median position equation -only works when values are in NUMERICAL order
population mean
numerical descriptive measure for a population - the sum of the values in the population divided by the population size N
Chebyshev rule
regardless of how the data are distributed, at least (1-1/k^2)*100% of the values fall within "k" standard deviations of the mean (for any k > 1).
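A minimal sketch of the Chebyshev bound for a few values of k. The function name is made up for illustration. The bound is a minimum that holds for any distribution, so it is lower than the bell shaped figures of about 95% and 99.7% for k = 2 and k = 3.

```python
def chebyshev_minimum_percent(k):
    """Minimum % of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return (1 - 1 / k**2) * 100


for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_minimum_percent(k):.2f}%")
# k = 2: at least 75.00%
# k = 3: at least 88.89%
# k = 4: at least 93.75%
```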
r = cov(x,y)/(sx*sy)
sample coefficient of correlation equation - sx = sqrt of [sum of (xi-xmean)^2/(n-1)] - sy = sqrt of [sum of (yi-ymean)^2/(n-1)]
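A sketch of r computed from the sample covariance and the two sample standard deviations, on made-up data. Rescaling x from hours to minutes multiplies the covariance by 60 but leaves r unchanged, which is why r can express relative strength. All names are illustrative.

```python
import math


def sample_covariance(x, y):
    """Sum of (xi - xmean)(yi - ymean), divided by n - 1."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance (covariance of a variable with itself)."""
    return math.sqrt(sample_covariance(values, values))


def correlation(x, y):
    """r = cov(x, y) / (sx * sy)."""
    return sample_covariance(x, y) / (sample_std_dev(x) * sample_std_dev(y))


hours = [1, 2, 3, 4, 5]             # made-up study times
score = [52, 60, 61, 70, 77]        # made-up test scores
minutes = [h * 60 for h in hours]   # same times in different units

print(sample_covariance(hours, score))    # 15.0
print(sample_covariance(minutes, score))  # 900.0
print(correlation(hours, score))          # about 0.98
print(correlation(minutes, score))        # same r
```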
square root of variance
sample standard deviation
quartiles
split ranked data into 4 segments w/ an equal number of values per segment
variation
the amount of dispersion or scattering away from a central value that the values of a numerical variable show
central tendency
the extent to which the values of a numerical variable group around a typical / central value
shape
the pattern of the distribution of values from the lowest value to the highest value
spread out
the range, variance, and standard deviation are greater as the data is more ______
concentrated
the range, variance, and standard deviation are smaller as the data is more ______
r closer to +1
the stronger the positive linear relationship
r closer to -1
the stronger the negative linear relationship
r closer to zero
the weaker the linear relationship (points do not fall along a straight line on a scatter plot)
(x - mean)/standard deviation
z-score equation
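A minimal sketch of z-scores and the +/-3 extreme value check, using the sample mean and sample standard deviation. The data and function names are made up for illustration, and the sketch assumes the values are not all identical (otherwise the standard deviation is zero and the division fails).

```python
from statistics import mean, stdev


def z_scores(values):
    """(x - mean) / standard deviation for every value."""
    center, spread = mean(values), stdev(values)
    return [(x - center) / spread for x in values]


def extreme_values(values, cutoff=3):
    """Values whose z-score is below -cutoff or above +cutoff."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]


data = [9, 10, 11] * 6 + [10, 30]  # 20 made-up values, mean = 11
print(extreme_values(data))        # [30]
```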