chapter 3 pt 1
skewed left
(a.k.a. negatively skewed) The distribution of quantitative data is skewed left if most of the data lies on the right (upper) side, i.e. most of the data values are large, with a long tail on the left (lower) side.
skewed right
(a.k.a. positively skewed) The distribution of quantitative data is skewed right if most of the data lies on the left (lower) side, i.e. most of the data values are small, with a long tail on the right (upper) side.
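One way to see skew in numbers: the long tail tends to pull the mean toward it, while the median stays near the bulk of the data. The sketch below uses two made-up data sets (hypothetical, not from the chapter) to illustrate that tendency. It is a rule of thumb, not a guarantee for every data set.

```python
from statistics import mean, median

# Hypothetical data: most values small, one large value forms a right tail.
right_skewed = [1, 2, 2, 3, 3, 4, 30]
# Hypothetical data: most values large, one small value forms a left tail.
left_skewed = [1, 27, 28, 28, 29, 29, 30]

for name, data in [("skewed right", right_skewed), ("skewed left", left_skewed)]:
    # The mean is dragged toward the tail; the median is not.
    print(name, "mean =", round(mean(data), 2), "median =", median(data))
```

In the right-skewed list the mean comes out above the median, and in the left-skewed list it comes out below.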
boxplot
A graphical display of the five-number summary. A simple boxplot does not show outliers; a modified boxplot shows outliers.
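The card does not say how a modified boxplot decides which points are outliers. A common convention, assumed here, is the 1.5 x IQR rule: flag any value below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. The function name `iqr_outliers`, the multiplier `k`, and the sample data are illustrative only, and the quartiles are computed as medians of the lower and upper halves.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])           # median of the lower half
    q3 = median(data[n // 2 + n % 2 :])   # median of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data set with one value far from the rest.
sample = [0, 1, 1, 3, 3, 4, 4, 5, 20]
outliers = iqr_outliers(sample)
print(outliers)
```

A modified boxplot would draw the flagged values as separate points and end the whiskers at the most extreme values that are not flagged.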
describing a distribution
1.) Center: What is the typical value? Mean or median.
2.) Variability: How spread out is the data? Standard deviation, range, or interquartile range.
3.) Shape: What does the data look like?
4.) Outliers: Values that stray far from the bulk of the data. If outliers are present, give their location and count.
bimodal
Two peaks separated by a valley.
deviation
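The deviation of a data value is its signed distance from the mean (data value minus mean). Values above the mean have positive deviations and values below the mean have negative deviations.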
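A short sketch of deviations on a made-up data set (hypothetical values). It also shows a fact worth remembering: the deviations from the mean always add up to zero, which is why spread is measured with squared deviations instead.

```python
# Hypothetical data set.
data = [2, 4, 9]

mean = sum(data) / len(data)            # arithmetic average
deviations = [x - mean for x in data]   # data value minus mean

print("mean =", mean)
print("deviations =", deviations)
print("sum of deviations =", sum(deviations))
```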
Symmetric Distribution
The distribution of quantitative data is symmetric if, when you draw a line at the center of the distribution, the two halves are mirror images. Real data is almost never perfectly symmetric but is often roughly symmetric.
first quartile
The first quartile (Q1) of a data set is a value such that at least 25% of the data values are less than or equal to Q1 and at least 75% of the data values are greater than or equal to Q1. We can think of Q1 as splitting the lower 50% of the ordered data set in half. Q1 is also known as the 25th percentile. Example: Q1 = 1.
five number summary
The five-number summary is the minimum (abbreviated min), the first quartile (denoted Q1), the median (abbreviated med), the third quartile (denoted Q3), and the maximum (abbreviated max). Example: Min = 0, Q1 = 1, M = 3, Q3 = 4, Max = 5.
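A minimal sketch of computing a five-number summary, assuming the "median of each half" rule for the quartiles (textbooks and software use several slightly different quartile rules). The function name `five_number_summary` is illustrative, and the data set is hypothetical, chosen only so that it reproduces the example values Min = 0, Q1 = 1, M = 3, Q3 = 4, Max = 5. The original data behind that example is not given.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median position
    upper = data[n // 2 + n % 2 :]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set matching the card's example summary.
sample = [0, 1, 1, 3, 3, 4, 4, 5]
summary = five_number_summary(sample)
print(summary)
```

When the data set has an odd number of values, this version leaves the middle value out of both halves. Some courses include it in both, which can shift Q1 and Q3 slightly.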
mean
The mean is the arithmetic average of the data values.
median
The median of a data set is a value such that at least half of the data values are less than or equal to the median and at least half of the data values are greater than or equal to the median. We can think of the median as splitting the ordered data set in half. The median is also known as the 50th percentile and the second quartile (Q2). Example: M = 3.
percentiles
The pth percentile of a data set is a value such that at least p% of the data values are less than or equal to the pth percentile and at least (100 - p)% of the data values are greater than or equal to it.
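One simple way to pick a value that satisfies the percentile definition is the nearest-rank rule: sort the data and take the value at position ceil(p/100 * n). This is only one of several conventions, so software may report slightly different answers. The function name `percentile_nearest_rank` and the data set are hypothetical.

```python
from math import ceil

def percentile_nearest_rank(values, p):
    """Smallest data value with at least p% of the data at or below it."""
    data = sorted(values)
    rank = max(1, ceil(p / 100 * len(data)))  # 1-based position in sorted data
    return data[rank - 1]

# Hypothetical data set.
sample = [0, 1, 1, 3, 3, 4, 4, 5]
for p in (25, 50, 75):
    print(p, "th percentile:", percentile_nearest_rank(sample, p))
```

On this data set the 25th, 50th, and 75th percentiles line up with Q1, the median, and Q3.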
third quartile
The third quartile (Q3) of a data set is a value such that at least 75% of the data values are less than or equal to Q3 and at least 25% of the data values are greater than or equal to Q3. We can think of Q3 as splitting the upper 50% of the ordered data set in half. Q3 is also known as the 75th percentile. Example: Q3 = 4.
histograms
These are essentially bar graphs for quantitative data. A histogram is most useful for identifying shape, but it can also show the other three features of a distribution: center, spread, and outliers.
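To make the "bar graph for quantitative data" idea concrete, here is a rough sketch that groups whole-number values into equal-width bins and prints one row of `#` marks per bin. The function name `bin_counts`, the bin width, and the scores are all hypothetical.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many integer values fall in each bin, keyed by bin start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    # Include empty bins so gaps in the data stay visible.
    return {start: counts[start] for start in range(first, last + bin_width, bin_width)}

# Hypothetical exam scores.
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 92]
histogram = bin_counts(scores, 10)
for start, count in histogram.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Reading the rows from top to bottom gives the same picture a histogram gives from left to right: where the bars are tallest, how far they spread, and whether any bar sits far from the rest.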
variance
A measure of spread. It is the square of the standard deviation.
standard deviation
a measure of the spread of values. Another way to think of it is as roughly the average distance values fall from the mean. Also, standard deviation is the square root of the variance
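The cards do not say whether the sample or population version is meant. The sketch below assumes the sample version (dividing by n - 1), which is what Python's `statistics.variance` and `statistics.stdev` compute. The data set is hypothetical.

```python
from math import isclose, sqrt
from statistics import stdev, variance

# Hypothetical data set.
data = [2, 4, 9]

mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]     # squared deviations
sample_var = sum(squared_devs) / (len(data) - 1)   # sample variance (n - 1)
sample_sd = sqrt(sample_var)                       # standard deviation

print("variance =", sample_var)
print("standard deviation =", round(sample_sd, 3))

# Cross-check the hand computation against the standard library.
print(isclose(sample_var, variance(data)), isclose(sample_sd, stdev(data)))
```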
interquartile range (IQR)
The distance from Q1 to Q3: IQR = Q3 - Q1. It is the range of the middle 50% of the data.
range
The distance from the minimum to the maximum: range = max - min.
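Both the IQR and the range can be read straight off a five-number summary. A quick sketch using the summary values from the five-number summary card (Min = 0, Q1 = 1, Q3 = 4, Max = 5); the variable names are illustrative.

```python
# Values taken from the five-number summary example in these notes.
minimum, q1, q3, maximum = 0, 1, 4, 5

iqr = q3 - q1                    # spread of the middle 50% of the data
data_range = maximum - minimum   # spread of all the data

print("IQR =", iqr)
print("range =", data_range)
```

The range uses only the two most extreme values, so a single outlier can change it a lot, while the IQR ignores the top and bottom quarters of the data.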
bell shaped
A distribution is bell shaped if it is unimodal and roughly symmetric, with its peak at the center, so that the graph looks similar to a bell.
unimodal
single peak
maximum
the largest value in a data set.
minimum
the smallest value in a data set.