Numerically Summarizing Data
Steps for Finding the Median
1) First sort the data values from smallest to largest. 2) If the number of observations is odd, the median is the middle observation. 3) If the number of observations is even, the median is the average of the middle two observations.
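The three steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `median` is chosen here for convenience, and the sample input is the travel-time data from Ex. 7 in this set.

```python
def median(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)              # Step 1: sort smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                        # Step 2: odd count, take the middle value
        return ordered[mid]
    # Step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


travel_times = [23, 36, 23, 18, 5, 26, 43]   # data from Ex. 7
print(median(travel_times))                  # sorted: 5 18 23 23 26 36 43 -> 23
```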
Interpretation:
• 25% of the speeds are less than or equal to the first quartile, 28 miles per hour, and 75% of the speeds are greater than 28 miles per hour. • 50% of the speeds are less than or equal to the second quartile, 32.5 miles per hour, and 50% of the speeds are greater than 32.5 miles per hour. • 75% of the speeds are less than or equal to the third quartile, 38 miles per hour, and 25% of the speeds are greater than 38 miles per hour.
The 2nd quartile, denoted Q2, divides the bottom ______ from the top ______
50% of the data, 50%
The 3rd quartile, denoted Q3, divides the bottom ______ from the top ______
75% of the data, 25%
Ex. 13: Determining and Interpreting the Interquartile Range
Check the speed data for outliers. Step 1: The first and third quartiles are Q1 = 28 mph and Q3 = 38 mph. Step 2: The interquartile range is IQR = Q3 − Q1 = 38 − 28 = 10 mph. Step 3: The boundaries are: Lower Boundary = Q1 − 1.5(IQR) = 28 − 1.5(10) = 13 mph; Upper Boundary = Q3 + 1.5(IQR) = 38 + 1.5(10) = 53 mph. Step 4: There are no values less than 13 mph or greater than 53 mph. Therefore, there are no outliers.
Chebyshev's Inequality
For any data set or distribution, at least (1 − 1/k²) · 100% of the observations lie within k standard deviations of the mean, where k is any number greater than 1.
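As a quick illustration of the bound, the sketch below evaluates (1 − 1/k²) · 100% for a few values of k. The function name `chebyshev_lower_bound` and the chosen values of k are assumptions made for this example only.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality requires k > 1")
    return (1 - 1 / k**2) * 100


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}% of observations")
# k = 1.5: at least 55.6% of observations
# k = 2: at least 75.0% of observations
# k = 3: at least 88.9% of observations
```

Note that these are guaranteed minimums for any distribution; an actual data set will often have more of its observations inside each interval.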
Comparing Standard Deviations from Data Sets
Knowing the standard deviations for different data sets can be beneficial to compare their spreads. This can be especially useful for data sets having the same mean but different standard deviations.
Ex. 7: Finding the Range of a Set of Data. The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company: 23, 36, 23, 18, 5, 26, 43. Find the range.
Solution: min = 5, max = 43, so R = max − min = 43 − 5 = 38 minutes.
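The same calculation can be checked in a couple of lines of Python. The variable names are chosen for this sketch; the data are the travel times from Ex. 7.

```python
travel_times = [23, 36, 23, 18, 5, 26, 43]   # minutes, from Ex. 7

# Range = largest value minus smallest value
data_range = max(travel_times) - min(travel_times)
print(data_range)   # 43 - 5 = 38
```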
Checking for Outliers by Using Quartiles
Step 1: Determine the first and third quartiles of the data. Step 2: Compute the interquartile range. Step 3: Determine the boundaries. Boundaries serve as cutoff points for determining outliers. Lower Boundary = Q1 − 1.5(IQR); Upper Boundary = Q3 + 1.5(IQR). Step 4: If a data value is less than the lower boundary or greater than the upper boundary, it is considered an outlier.
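The procedure above might be sketched as follows, assuming the quartiles have already been found (Step 1). The function names are hypothetical. The quartiles Q1 = 28 and Q3 = 38 come from Ex. 13, while the short list of speeds is made up purely to show a value falling outside each boundary.

```python
def outlier_boundaries(q1, q3):
    """Return (lower, upper) cutoff points from the first and third quartiles."""
    iqr = q3 - q1                 # Step 2: interquartile range
    lower = q1 - 1.5 * iqr        # Step 3: lower boundary
    upper = q3 + 1.5 * iqr        # Step 3: upper boundary
    return lower, upper


def find_outliers(values, q1, q3):
    """Step 4: keep the values that fall outside the boundaries."""
    lower, upper = outlier_boundaries(q1, q3)
    return [v for v in values if v < lower or v > upper]


print(outlier_boundaries(28, 38))            # (13.0, 53.0), as in Ex. 13

made_up_speeds = [12, 30, 41, 55]            # illustrative values, not the Ex. 13 data
print(find_outliers(made_up_speeds, 28, 38)) # [12, 55]
```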
Chebyshev's Inequality
The proportion of the observations that lie within k standard deviations of the mean, that is, between μ − kσ and μ + kσ, is at least 1 − 1/k² for any k > 1.
Can there be more than one mode?
Yes, if two or more values each have the largest frequency.
Can there be no mode?
Yes, if each value has a frequency of 1.
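Both cases (several modes, or none) can be seen in a small Python sketch. The function name `modes` is an assumption for this example, and it follows the convention of these cards that a data set in which every value occurs once has no mode.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the greatest frequency, or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                  # every value occurs once: no mode
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([23, 36, 23, 18, 5, 26, 43]))   # [23]    one mode (Ex. 7 data)
print(modes([1, 1, 2, 2, 3]))               # [1, 2]  two modes
print(modes([1, 2, 3]))                     # []      no mode
```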
The Median
is a measure of the center of data that describes the 50th percentile, or the "middle value". About half of the data are less than the median and about half of the data are greater than it. We can only find the median for quantitative data.
Small standard deviation measures indicate that the data values
are less spread out and closer to the mean.
Large standard deviation measures indicate that the data values
are more spread out and farther away from the mean.
Quartiles
divide data sets into fourths, or four equal parts.
The 1st quartile, denoted Q1, divides the bottom ______ from the top ______
25% of the data, 75%
Standard Deviation (how it is found)
For a sample, it is found by summing the squared differences between each data value and the mean, dividing that sum by the number of data values minus 1, and taking the square root of the result.
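The description above can be turned into a short Python sketch. The function name `sample_std` and the small data list are made up for illustration; the result is compared with `statistics.stdev` from the standard library, which also uses the n − 1 divisor.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: sqrt( sum of squared deviations / (n - 1) )."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data values are required")
    mean = sum(values) / n
    squared_diffs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_diffs / (n - 1))


made_up_sample = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative values only
print(sample_std(made_up_sample))
print(statistics.stdev(made_up_sample))      # should agree with the line above
```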
Standard Deviation
gives a measure of how spread out all values of a data set are from the mean (center).
arithmetic mean (or mean)
is a measure of the center of data. The mean is found by dividing the sum of all of the observations by the number of observations.
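A minimal Python check of this definition, using the travel times from Ex. 7 in this set (the variable names are chosen for this sketch):

```python
travel_times = [23, 36, 23, 18, 5, 26, 43]   # minutes, from Ex. 7

# Mean = sum of all observations / number of observations
mean_time = sum(travel_times) / len(travel_times)
print(round(mean_time, 2))   # 174 / 7, about 24.86 minutes
```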
The Mode
is a measure of the center of data: the observation with the greatest frequency.
population arithmetic mean, μ
is computed using all the individuals in a population. The population mean is a parameter
sample arithmetic mean, x̄
is computed using all the values in a sample. The sample mean is a statistic.
Interquartile Range (IQR)
is the range of the middle 50% of the observations in a data set: IQR = Q3 − Q1.
population standard deviation
of a variable is the square root of the sum of squared deviations about the population mean divided by the size of the population, N
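Because the Ex. 7 travel times cover all seven employees, they can be treated as a population, which makes them a convenient input for a sketch of this definition. The function name `population_std` is an assumption; the result is compared with `statistics.pstdev`, which also divides by N.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation: sqrt( sum of squared deviations / N )."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one data value is required")
    mu = sum(values) / n                          # population mean
    squared_devs = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_devs / n)


travel_times = [23, 36, 23, 18, 5, 26, 43]        # all seven employees (Ex. 7)
print(round(population_std(travel_times), 2))     # about 11.36 minutes
print(round(statistics.pstdev(travel_times), 2))  # should agree with the line above
```

Dividing by N here, rather than by n − 1 as in the sample formula, is the only difference between the two calculations.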
Range
The range, R, of a set of data values is the difference between the largest data value and the smallest data value. That is, R = Largest Value − Smallest Value. It tells how far apart the smallest and largest values are, without revealing the spread of the values in between.
Empirical Rule (68-95-99.7 Rule)
states that, in a normal distribution, about 68% of the observations lie within one standard deviation of the mean (between μ − 1σ and μ + 1σ), about 95% lie within two standard deviations (between μ − 2σ and μ + 2σ), and about 99.7% lie within three standard deviations (between μ − 3σ and μ + 3σ).
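The three percentages can be checked against the standard normal distribution using `statistics.NormalDist` from the Python standard library. The helper name `share_within` is an assumption for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```

The rule's round figures of 68% and 95% are approximations of these values.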
sample variance
s²
variance of a variable is
the square of the standard deviation.
population variance
σ²
sample mean
of 𝑛 data values 𝑥1, 𝑥2, 𝑥3, ... , 𝑥𝑛 is x̄ = (𝑥1 + 𝑥2 + 𝑥3 + ... + 𝑥𝑛) / 𝑛, the sum of the data values divided by 𝑛.