ch 3 b
correlation formula
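$r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations of x and y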
five-number summary
1. smallest value 2. first quartile (Q1) 3. median (Q2) 4. third quartile (Q3) 5. largest value
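A minimal Python sketch of the five-number summary. It assumes quartiles are located at position p(n + 1) with linear interpolation, which is the "exclusive" method in Python's statistics module; other software may use a different quartile convention and return slightly different Q1 and Q3. The function name and sample values are made up for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    ordered = sorted(values)
    # method="exclusive" locates each quartile at position p * (n + 1)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

sample = [12, 2, 7, 4, 15, 5, 9, 4, 10, 6, 8]  # hypothetical data
print(five_number_summary(sample))  # (2, 4.0, 7.0, 10.0, 15)
```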
for data having a bell-shaped distribution
about 68.3% of the values of a normal random variable are within +/- 1 standard deviation of its mean > about 95.4% of the values of a normal random variable are within +/- 2 standard deviations of its mean > about 99.7% of the values of a normal random variable are within +/- 3 standard deviations of its mean
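The percentages can be checked numerically. This is a small sketch, assuming an exactly normal distribution, that uses the identity P(|Z| < k) = erf(k / sqrt(2)); the function name share_within is made up for illustration.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {share_within(k):.2%}")
```

Values read from a printed normal table can differ in the last digit because the table entries are rounded.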
box plot
> graphical summary of data that is based on a five-number summary > a key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3 > provides another way to identify outliers > limits are located using the interquartile range (IQR = Q3 - Q1): 1.5(IQR) below Q1 and 1.5(IQR) above Q3
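A rough Python sketch of how box plot limits can flag outliers. It assumes the common convention of placing the limits 1.5 IQR below Q1 and 1.5 IQR above Q3, and the "exclusive" quartile method from Python's statistics module; the function names and data are hypothetical.

```python
import statistics

def box_plot_limits(values, k=1.5):
    """Return (lower limit, upper limit) located k * IQR beyond Q1 and Q3."""
    q1, _, q3 = statistics.quantiles(sorted(values), n=4, method="exclusive")
    iqr = q3 - q1  # interquartile range
    return (q1 - k * iqr, q3 + k * iqr)

def outliers_by_iqr(values, k=1.5):
    """Return the values that fall outside the box plot limits."""
    lower, upper = box_plot_limits(values, k)
    return [v for v in values if v < lower or v > upper]

sample = [12, 2, 7, 4, 15, 5, 9, 4, 10, 6, 30]  # hypothetical data
print(box_plot_limits(sample))   # (-8.0, 24.0)
print(outliers_by_iqr(sample))   # [30]
```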
z-score definition
> a measure of the relative location of an observation in a data set > a data value less than the sample mean will have a z-score less than zero > a data value greater than the sample mean will have a z-score greater than zero > a data value equal to the sample mean will have a z-score of zero
correlation coefficient
> can take on values between -1 and +1 > values near -1 indicate a strong negative linear relationship > values near +1 indicate a strong positive linear relationship > the closer the correlation is to zero, the weaker the linear relationship
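A short Python sketch showing the range of the correlation coefficient. It assumes the Pearson sample correlation, computed as the sample covariance divided by the product of the sample standard deviations (n - 1 divisors); the function name and data are made up for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Pearson sample correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

xs = [1, 2, 3, 4, 5]  # hypothetical data
print(sample_correlation(xs, [2, 4, 6, 8, 10]))  # perfect positive line, close to +1
print(sample_correlation(xs, [10, 8, 6, 4, 2]))  # perfect negative line, close to -1
print(sample_correlation(xs, [3, 1, 4, 1, 5]))   # weak positive relationship
```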
outlier
> an unusually small or unusually large value in a data set > a data value with a z-score less than -3 or greater than +3 might be considered an outlier
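A minimal Python sketch of the z-score rule for outliers, assuming the sample standard deviation (n - 1 divisor) and a cutoff of 3. The function name and data are hypothetical. With very small samples no z-score can exceed 3 in absolute value, so the rule only becomes useful once there are more than about ten observations.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs((v - mean) / s) > threshold]

sample = [48, 49, 50, 51, 52] * 4 + [95]  # hypothetical data, one extreme value
print(z_score_outliers(sample))  # [95]
```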
correlation
a measure of linear association and not necessarily causation
data dashboards
are not limited to graphical displays > the addition of numerical measures, such as the mean and standard deviation of KPIs, to a data dashboard is critical > dashboards are often interactive
Chebyshev's theorem
at least (1 - 1/z^2) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1 > requires z > 1, but z need not be an integer > at least 75% of the data values must be within z = 2 standard deviations of the mean > at least 89% of the data values must be within z = 3 standard deviations of the mean > at least 94% of the data values must be within z = 4 standard deviations of the mean
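The bound is simple enough to compute directly. A small Python sketch (the function name is made up for illustration) that also shows a non-integer z:

```python
def chebyshev_lower_bound(z):
    """Minimum share of any data set within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

for z in (1.5, 2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.1%}")
```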
covariance formula
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ (sample covariance)
covariance
a measure of the linear association between two variables > positive values indicate a positive relationship > negative values indicate a negative relationship
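A minimal Python sketch of the sign behaviour, assuming the sample covariance with an n - 1 divisor. The function name and data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 divisor)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]  # hypothetical data
print(sample_covariance(xs, [2, 4, 6, 8, 10]))  # 5.0, y rises with x
print(sample_covariance(xs, [10, 8, 6, 4, 2]))  # -5.0, y falls as x rises
```

Unlike the correlation coefficient, the size of the covariance depends on the units of measurement, so only its sign is easy to interpret.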
skewness
$\text{skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$
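A short Python sketch of this skewness formula, assuming s is the sample standard deviation (n - 1 divisor). The function name and data are made up for illustration.

```python
import statistics

def skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

right = [1, 2, 3, 4, 15]     # hypothetical data with a long right tail
left = [1, 12, 13, 14, 15]   # hypothetical data with a long left tail
for data in (right, left, [1, 2, 3, 4, 5]):
    print(skewness(data), statistics.mean(data), statistics.median(data))
```

In these examples the right-tailed data give positive skewness with the mean above the median, the left-tailed data give negative skewness with the mean below the median, and the symmetric data give skewness of essentially zero.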
drilling down
refers to the functionality in interactive dashboards that allows the user to access information and analyses at increasingly detailed levels
moderately skewed left
skewness is negative; the mean will usually be less than the median
highly skewed right
skewness is positive; the mean will usually be more than the median
moderately skewed right
skewness is positive; the mean will usually be more than the median
skewness symmetric
skewness is zero, mean and median are equal
z-scores
standardized value > $z_i = \frac{x_i - \bar{x}}{s}$ > computed in Excel with the STANDARDIZE function
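A minimal Python sketch of the same calculation, assuming the sample mean and sample standard deviation are used. The standardize helper takes the same three inputs as Excel's STANDARDIZE(x, mean, standard_dev); the Python function names and data are hypothetical.

```python
import statistics

def standardize(x, mean, standard_dev):
    """z-score of x given a mean and a standard deviation."""
    return (x - mean) / standard_dev

def z_scores(values):
    """z-score of every value, using the sample mean and standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [standardize(v, mean, s) for v in values]

sample = [2, 4, 6, 8, 10]  # hypothetical data, mean is 6
for value, z in zip(sample, z_scores(sample)):
    print(value, round(z, 3))
```

Values below the mean come out negative, values above it positive, and a value equal to the mean gets a z-score of zero.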
empirical rule
used to determine the percentage of data values that must be within a specified number of standard deviations of the mean > based on the normal distribution, so it applies to data with a bell-shaped distribution