Chapter 3: Describing, exploring, and comparing data
median
The middle value after ranking the data values in order from least to greatest. If there is an even number of data values, average the two middle values (add them together and divide by 2).
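A minimal sketch of the median procedure described above. The function name `median` and the sample numbers are made up for illustration.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)            # rank from least to greatest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                     # odd count: a single middle value
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2   # even count: average the two middle values

odd_result = median([7, 1, 5])         # ranked: 1, 5, 7 -> 5
even_result = median([8, 2, 4, 6])     # ranked: 2, 4, 6, 8 -> (4 + 6) / 2 = 5.0
```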
Midquartile
(Q3+Q1)/2
Empirical Rule
A rule that uses the standard deviation to describe data; it applies only to data that are APPROXIMATELY NORMAL. If a population is normally distributed, then approximately: I. 68% of the data are within one standard deviation of the mean; II. 95% of the data are within two standard deviations of the mean; III. 99.7% of the data are within three standard deviations of the mean.
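As a rough illustration, the three intervals in the empirical rule can be computed from a mean and a standard deviation. The function name and the values mean = 100, sd = 15 are assumptions chosen only for this example, and the percentages hold only when the data are approximately normal.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate percentage to its (low, high) interval around the mean."""
    percents = {1: 68, 2: 95, 3: 99.7}   # standard deviations -> approximate % of data
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in percents.items()}

# Made-up values for illustration only.
intervals = empirical_rule_intervals(mean=100, sd=15)
# intervals[68]   -> (85, 115)
# intervals[95]   -> (70, 130)
# intervals[99.7] -> (55, 145)
```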
How to find Quartiles
1. Rank the data in ascending order. 2. Find the median of the data (Q2). 3. Find the median of the lower 50% of the data (Q1). 4. Find the median of the upper 50% of the data (Q3).
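The four steps above can be sketched as follows. One assumption is not stated in the notes: when the number of data values is odd, this sketch leaves the median itself out of both halves. Other textbooks and software use different conventions, so results may differ slightly. The function name `quartiles` and the data are made up.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    ranked = sorted(values)              # step 1: rank in ascending order
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                # lower 50% of the data
    upper = ranked[half + (n % 2):]      # upper 50%; skips the median when n is odd (assumption)
    return median(lower), median(ranked), median(upper)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
# lower half 1, 3, 5, 7 -> Q1 = 4.0; all data -> Q2 = 8.0; upper half 9, 11, 13, 15 -> Q3 = 12.0
```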
box plot/ percentiles/quartiles
L (lowest value), Q1 = P25, Q2 = P50 (median), Q3 = P75, H (highest value)
Inter-Quartile Range (IQR)
The difference between the third and first quartiles: IQR = Q3 - Q1, i.e., the range of the middle 50% of the data. (Note that the first and third quartiles are sometimes called the lower and upper quartiles, respectively.)
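A small worked example of the IQR, together with the midquartile defined earlier, assuming the quartiles have already been found. The function names and the quartile values are hypothetical.

```python
def iqr(q1, q3):
    """Inter-quartile range: the spread of the middle 50% of the data."""
    return q3 - q1

def midquartile(q1, q3):
    """Midquartile: the value halfway between Q1 and Q3."""
    return (q3 + q1) / 2

# Made-up quartiles for illustration only.
spread = iqr(q1=10, q3=18)            # 18 - 10 = 8
center = midquartile(q1=10, q3=18)    # (18 + 10) / 2 = 14.0
```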
Range
formula: (max. data value) - (min. data value)
midrange
formula: (L + H)/2, where L is the lowest data value and H is the highest data value. The midrange is affected by outliers.
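To illustrate the range and midrange formulas, and why the midrange is affected by outliers, here is a short sketch. The function names and the scores are made up.

```python
def data_range(values):
    """Range: (max data value) - (min data value)."""
    return max(values) - min(values)

def midrange(values):
    """Midrange: (L + H) / 2, with L the lowest and H the highest data value."""
    return (min(values) + max(values)) / 2

scores = [70, 75, 80, 85, 90]          # made-up data
with_outlier = scores + [200]          # same data plus one extreme value

range_plain = data_range(scores)             # 90 - 70 = 20
midrange_plain = midrange(scores)            # (70 + 90) / 2 = 80.0
midrange_outlier = midrange(with_outlier)    # (70 + 200) / 2 = 135.0
```

A single extreme value moves the midrange from 80.0 to 135.0, because the formula uses only the lowest and highest values.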
Z-score
Indicates how many standard deviations a given value is from the mean. Two ways to calculate a z-score: 1. sample: z = (x - x-bar)/s 2. population: z = (x - mu)/sigma
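Both z-score formulas have the same shape, so one hypothetical helper covers the sample case (x-bar, s) and the population case (mu, sigma). The numbers below are made up for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_sample = z_score(x=85, mean=70, sd=10)       # sample: x-bar = 70, s = 10 -> 1.5
z_population = z_score(x=40, mean=50, sd=5)    # population: mu = 50, sigma = 5 -> -2.0
```

A positive z-score means the value is above the mean; a negative z-score means it is below the mean.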
Standard Deviation
The square root of the variance. It measures how much the data values typically differ or deviate from the mean. formula (sample): s = sqrt( (sum of (x - x-bar)^2) / (n - 1) )
mode
The value in the data set that occurs with the greatest frequency. When no data value is repeated, there is no mode.
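A possible sketch of finding the mode, including the "no mode" case. The notes do not say what to do when several values tie for the greatest frequency; returning all of them is an assumption of this example, as is the function name `modes`.

```python
from collections import Counter

def modes(values):
    """Return the value(s) with the greatest frequency, or [] when nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                       # no data value is repeated: no mode
        return []
    return [value for value, count in counts.items() if count == top]

one_mode = modes([2, 3, 3, 5, 7])      # 3 occurs twice -> [3]
no_mode = modes([1, 2, 3, 4])          # nothing repeats -> []
```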
Special mean note
The mean of a sample is read as x-bar. The population mean is written as the lowercase Greek letter mu; it is the mean of all x-values in the entire population.
measure of central tendency
mean, median, mode, midrange
5 number summary
min (L), Q1, median, Q3, max (H)
The "range rule of thumb"
The rule states that for many data sets, the vast majority (about 95%) of sample values lie within 2 standard deviations of the mean. To find the minimum "usual" value, take the mean and subtract twice the standard deviation: mu - 2(sigma). To find the maximum "usual" value, take the mean and add twice the standard deviation: mu + 2(sigma). A value is unusual if its z-score is less than or equal to -2 or greater than or equal to 2. usual z-scores: (-2, 2); unusual z-scores: (-infinity, -2] or [2, infinity)
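The range rule of thumb can be sketched as two small helpers. The names `usual_range` and `is_unusual` and the values mean = 50, sd = 5 are hypothetical.

```python
def usual_range(mean, sd):
    """Minimum and maximum "usual" values: mean - 2*sd and mean + 2*sd."""
    return mean - 2 * sd, mean + 2 * sd

def is_unusual(x, mean, sd):
    """True when the z-score of x is <= -2 or >= 2."""
    z = (x - mean) / sd
    return z <= -2 or z >= 2

# Made-up values for illustration only.
low, high = usual_range(mean=50, sd=5)     # (40, 60)
check_high = is_unusual(62, 50, 5)         # z = 2.4  -> True
check_mid = is_unusual(55, 50, 5)          # z = 1.0  -> False
check_edge = is_unusual(40, 50, 5)         # z = -2.0 -> True (the boundary counts as unusual)
```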
Variance
The variance of a set of data values is a measure of variation equal to the square of the standard deviation. The variance must be non-negative (0 or greater), and it can increase dramatically with the inclusion of one or more outliers. formula: sample variance s^2 = (sum of all values in the (x - x-bar)^2 column) / (n - 1), where n = sample size. (Ignore the i in the formula.)
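The sample variance formula, and the standard deviation as its square root, can be worked through as below. The function names and the data are made up for illustration.

```python
from math import sqrt

def sample_variance(values):
    """Sum of the squared deviations from x-bar, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]   # the (x - x-bar)^2 column
    return sum(squared_deviations) / (n - 1)

def sample_std_dev(values):
    """Standard deviation: the square root of the variance."""
    return sqrt(sample_variance(values))

data = [7, 7, 10, 13, 13]                  # made-up data, x-bar = 10
variance = sample_variance(data)           # (9 + 9 + 0 + 9 + 9) / 4 = 9.0
std_dev = sample_std_dev(data)             # sqrt(9.0) = 3.0

# Adding a single outlier makes the variance far larger.
variance_with_outlier = sample_variance(data + [100])
```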
Measures of Variation
Variation is a measure of how spread out (dispersed) the data are. Measures of variation: standard deviation, variance, and range.
mean
To find the mean, add up all the data values and divide by the total number of data values. Also known as x-bar (written as an x with a bar over it). Very sensitive to outliers. formula: mean = (sum of all x-values) / (number of data values)
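A brief sketch of the mean formula that also shows its sensitivity to outliers. The function name and the salary figures are made up.

```python
def mean(values):
    """Sum of all x-values divided by the number of data values."""
    return sum(values) / len(values)

salaries = [30, 32, 35, 38, 40]            # made-up data, in thousands
mean_plain = mean(salaries)                # 175 / 5 = 35.0
mean_outlier = mean(salaries + [305])      # 480 / 6 = 80.0
```

One extreme value pulls the mean from 35.0 up to 80.0, well above every other value in the data set.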