Unit 2 part 2
At the first quartile q1:
25% of the data lies below and 75% lies above
At the second quartile - median:
50% of the data lies below and 50% of the data lies above
At the third quartile - q3:
75% of the data lies below and 25% lies above
To make a box plot:
- Draw a single axis, vertically or horizontally, that spans the extent of the data.
- Draw short lines at the values of q1 and q3. Connect them to form a box. Then draw a short line in the box at the value of the median.
- Determine the fences by using the 1.5 times IQR rule.
- Draw the lower whisker to the minimum value or the lower fence, whichever comes first. Any values beyond the lower fence should be marked as outliers.
- Draw the upper whisker to the maximum value or the upper fence, whichever comes first. Any values beyond the upper fence should be marked as outliers.
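The whisker step can be sketched in Python. This is only a sketch: the function name `whisker_ends` and the sample data are made up for illustration, and it assumes the common convention that a whisker stops at the most extreme observation still inside the fence (some textbooks draw it to the fence itself).

```python
def whisker_ends(values, lower_fence, upper_fence):
    """Return the data values where the lower and upper whiskers end.

    Assumed convention: each whisker stops at the most extreme
    observation that is still inside the fences.
    """
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return min(inside), max(inside)


# Made-up data; the fences -9 and 23 come from q1 = 3, q3 = 11, IQR = 8.
data = [1, 3, 5, 7, 9, 11, 40]
print(whisker_ends(data, -9, 23))  # (1, 11), so 40 is plotted as an outlier
```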
To find the quartiles by hand:
- Find the median of the data set.
- Find the median of the observations below the overall median. This is the first quartile - q1.
- Find the median of the observations above the overall median. This is the third quartile - q3.
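The by-hand steps above can be sketched in Python. The function name `quartiles_by_hand` and the sample scores are made up for illustration, and the sketch assumes that when the number of observations is odd, the overall median itself is left out of both halves.

```python
from statistics import median


def quartiles_by_hand(values):
    """Return (q1, median, q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]  # observations below the overall median
    # Assumption: with an odd count, skip the middle value itself.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


scores = [7, 1, 9, 3, 5, 11, 13]  # made-up data
q1, med, q3 = quartiles_by_hand(scores)
print(q1, med, q3)  # 3 7 11
```

Software often uses slightly different quartile conventions, so a calculator or library may report values that differ a little from the by-hand ones.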
Quartiles
Values that break the data set into four groups of equal size
Deviations can be what?
Positive or negative depending on whether they are above or below the mean
The 1.5 * IQR rule for outliers:
Call an observation an outlier if it falls more than 1.5 times IQR below the first quartile or above the third quartile.
- Lower fence: q1 - 1.5 times IQR
- Upper fence: q3 + 1.5 times IQR
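A minimal Python sketch of the rule, assuming the quartiles have already been found. The function names `fences` and `find_outliers` and the data are made up for illustration.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 times IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the observations that fall outside the fences."""
    low, high = fences(q1, q3)
    return [x for x in values if x < low or x > high]


data = [1, 3, 5, 7, 9, 11, 40]      # made-up data with q1 = 3 and q3 = 11
print(fences(3, 11))                # (-9.0, 23.0)
print(find_outliers(data, 3, 11))   # [40]
```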
The five number summary can be displayed how?
Can be displayed graphically in a box plot
To find the interquartile range:
Find the difference between the third and first quartiles: IQR = q3 - q1
The range gives us what?
Gives us a sense of the variability in the entire data set. It is not, however, resistant to outliers.
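One way to see the lack of resistance is to change a single value and compare the range with the IQR. This is a sketch with made-up data; the helper names `data_range` and `iqr` are illustrative, and the quartiles come from Python's `statistics.quantiles`, whose default convention may differ slightly from quartiles found by hand.

```python
from statistics import quantiles


def data_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range using the library's default quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # only the largest value changed

print(data_range(typical), data_range(with_outlier))  # 8 89
print(iqr(typical), iqr(with_outlier))                # 5.0 5.0
```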
Boxplots show less detail than stem plots or histograms, but they're good for what?
Good for comparing multiple distributions side by side.
Median always needs a what?
IQR
The box in a box plot indicates what?
Indicates the interquartile range
As standard deviation becomes larger what happens?
The spread increases, meaning there is more variability in the dataset.
In general means are reported for what? And medians are reported for what?
Means are reported for symmetric distributions. Medians are reported for skewed distributions
The five number summary:
Minimum, q1, median, q3, and the maximum.
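A short Python sketch that collects the five numbers. The function name `five_number_summary` and the data are made up for illustration; the quartiles come from `statistics.quantiles` with its default method, which is an assumption here and can give slightly different quartiles than the by-hand method on some data sets.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, q1, median, q3, maximum)."""
    q1, med, q3 = quantiles(values, n=4)  # three cut points for four groups
    return min(values), q1, med, q3, max(values)


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# (1, 2.5, 5.0, 7.5, 9)
```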
It is more difficult to determine what about a box plot?
More difficult to determine the shape of a distribution from a box plot, but it can be done. If either whisker is significantly longer, then there is a skew in that direction. You can also look at the location of the median to determine if one half of the data is more spread out than the other half.
The quartiles and the interquartile range can be used to numerically determine what?
Outliers
Standard deviation is not resistant to what?
Outliers. In fact, it is more sensitive to a few extreme observations than the mean.
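One way to see this is to replace a single value with an extreme one and compare how much the mean and the standard deviation grow. This is a sketch with made-up data; comparing growth factors is just one reasonable way to measure sensitivity.

```python
from statistics import mean, stdev

typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # same data with one extreme observation

# How many times larger each statistic becomes once the outlier is present.
mean_ratio = mean(with_outlier) / mean(typical)
sd_ratio = stdev(with_outlier) / stdev(typical)

print(f"mean grows by a factor of {mean_ratio:.1f}")
print(f"standard deviation grows by a factor of {sd_ratio:.1f}")
```

In this example the standard deviation grows by a much larger factor than the mean does.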
Mean always needs a what?
Standard deviation
In most cases what is the value of the standard deviation?
Standard deviation is greater than 0
What does standard deviation tell us?
Tells us on average how far we expect individual observations to be from the mean
Standard deviation
Roughly the average distance of the observations from the mean
Deviations
The difference between each data value and the mean
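A worked sketch in Python with made-up data. It assumes the sample standard deviation (dividing by n - 1), which is the version `statistics.stdev` computes; a course that divides by n would get a slightly smaller value.

```python
from math import sqrt
from statistics import mean, stdev

data = [2, 4, 4, 6, 9]  # made-up data
m = mean(data)          # 5

# Deviations: each value minus the mean. Negative below the mean,
# positive above it, and together they always sum to zero.
deviations = [x - m for x in data]  # [-3, -1, -1, 1, 4]

# Sample standard deviation: square the deviations, add them up,
# divide by n - 1, then take the square root.
sample_sd = sqrt(sum(d ** 2 for d in deviations) / (len(data) - 1))

print(deviations, sum(deviations))
print(sample_sd, stdev(data))  # the two values should agree
```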
The five number summary
The five number summary of a distribution reports the median, quartiles and extremes.
The more resistant measure of spread.
The interquartile range, which looks at the middle 50% of the data.
If the standard deviation is 0 there is no what?
There is no variability in the dataset. This only occurs when all the values are the same.
Spread refers to what?
To the amount of variability in a data set
Box plots are also very good for what?
Very good for comparing multiple data sets. Two or more boxplots can be stacked, using the same horizontal axis, which makes differences in center and spread much more apparent.
Measures of center are meaningless without what?
Without a measure of spread to accompany them, and vice versa
Standard deviation makes no sense without what?
Without the mean