Chapter 3.1
Tails
The parts of a distribution that typically trail off on either side. Distributions can be characterized as having long tails (if they straggle off for some distance) or short tails (if they don't) (p. 49).
Histogram (relative frequency histogram)
A histogram uses adjacent bars to show the distribution of a quantitative variable.
Mode
A hump or local high point in the shape of the distribution of a variable. The apparent location of modes can change as the scale of a histogram is changed (p. 48).
Spread
A numerical summary of how tightly the values are clustered around the center. Measures of spread include the IQR and standard deviation (p. 52).
Gap
A region of the distribution where there are no values (p. 45).
Unimodal (bimodal)
Having one mode. This is a useful term for describing the shape of a histogram when it's generally mound-shaped. Distributions with two modes are called bimodal. Those with more than two are multimodal (p. 48).
5-number summary
The 5-number summary of a distribution reports the minimum value, Q1, the median, Q3, and the maximum value (p. 54).
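As a rough illustration, the five values can be computed from a sorted list. This sketch assumes one common quartile convention (Q1 and Q3 are the medians of the lower and upper halves, leaving the overall median out of both halves when n is odd); other conventions give slightly different quartiles. The function name and the data are made up for the example.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two values.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the overall median from both halves when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical values
print(five_number_summary(data))  # (3, 6.0, 12, 16.0, 21)
```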
Interquartile range (IQR)
The IQR is the difference between the first and third quartiles. IQR = Q3 - Q1. It is usually reported along with the median (p. 53).
Range
The difference between the lowest and highest values in a data set (p. 52). Range = max - min
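Because the range uses only the two most extreme values, a single unusual value can change it a great deal. A minimal sketch with made-up scores (the function name is illustrative only):

```python
def value_range(values):
    """Range = max - min."""
    return max(values) - min(values)


scores = [62, 70, 71, 75, 78, 80, 84]  # hypothetical values
print(value_range(scores))          # 22
print(value_range(scores + [150]))  # 88, one extreme value quadruples the range
```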
Distribution
The distribution of a quantitative variable slices up all the possible values of the variable into equal-width bins and gives the number of values (or counts) falling into each bin (p. 44).
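The binning step can be sketched in a few lines. This example assumes bins of the form [left, left + width), so a value on a boundary goes into the higher bin; that choice, the function name, and the data are assumptions made for illustration. The counts it returns are the bar heights a histogram would draw.

```python
import math


def bin_counts(values, start, width):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / width)] += 1
    return counts


data = [1, 2, 2, 3, 5, 6, 6, 7, 14]  # hypothetical values
counts = bin_counts(data, start=0, width=5)
print(counts)  # [4, 4, 1]

# A crude text histogram: one '#' per value in the bin.
for k, count in enumerate(counts):
    left = k * 5
    print(f"{left} to <{left + 5}: {'#' * count}")
```

Dividing each count by the number of values would give the relative frequencies instead.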
Percentile
The ith percentile is the number that falls above i% of the data (p. 53).
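Software packages compute percentiles in several slightly different ways. The sketch below uses the nearest-rank convention, which is only one of them and is an assumption here, not necessarily the method behind the definition above. The function name and data are illustrative.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method.

    Assumption: the percentile is the smallest data value with at least
    p% of the data at or below it.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]


data = [15, 20, 35, 40, 50]  # hypothetical values
print(percentile_nearest_rank(data, 30))   # 20
print(percentile_nearest_rank(data, 50))   # 35
print(percentile_nearest_rank(data, 100))  # 50
```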
Quartile
The lower quartile (Q1) is the value with a quarter of the data below it. The upper quartile (Q3) has three quarters of the data below it. The median and quartiles divide data into four parts with equal numbers of data values (p. 53).
Mean
The mean is found by summing all the data values and dividing by the count: $\bar{y} = \frac{\text{Total}}{n} = \frac{\sum y}{n}$. It is usually paired with the standard deviation (p. 57).
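A small worked example of the calculation, with the standard deviation alongside since the two are usually reported together. The data are made up, and `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is an assumption about which version is wanted.

```python
import statistics


def mean(values):
    """Sum all the data values and divide by the count."""
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; total = 40, n = 8
print(mean(data))  # 5.0

# Sample standard deviation (divides by n - 1), from the standard library.
sd = statistics.stdev(data)
print(round(sd, 3))  # 2.138
```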
Median
The median is the middle value, with half of the data above and half below it. If n is even, it is the average of the two middle values. It is usually paired with the IQR (p. 51).
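The odd and even cases can be made concrete with a short sketch (the function name and data are illustrative):

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5   (n is odd: the middle value)
print(median([7, 1, 5, 3]))  # 4.0 (n is even: average of 3 and 5)
```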
Center
The place in the distribution of a variable that you'd point to if you wanted to attempt the impossible by summarizing the entire distribution with a single number. Measures of center include the mean and median (p. 51).
Dotplot
A dotplot graphs a dot for each case against a single axis (p. 47).
Uniform
A distribution that doesn't appear to have any mode and in which all the bars of its histogram are approximately the same height (p. 48).
Boxplot
A boxplot displays the 5-number summary as a central box with whiskers that extend to the nonoutlying data values. Boxplots are particularly effective for comparing groups and for displaying possible outliers (p. 54).
Stem-and-leaf display
A display that shows quantitative data values in a way that sketches the distribution of the data. It's best described in detail by example (p. 46).
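One such example, as a sketch: the code below assumes non-negative whole numbers, uses the tens digit (and above) as the stem and the ones digit as the leaf, and keeps empty stems so that gaps stay visible. The data and function name are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the display as a list of text lines, one per stem.

    Assumption: values are non-negative integers; stem = value // 10,
    leaf = value % 10.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


pulse_rates = [56, 60, 64, 64, 68, 72, 72, 76, 80, 88]  # hypothetical values
print("\n".join(stem_and_leaf(pulse_rates)))
# 5 | 6
# 6 | 0448
# 7 | 226
# 8 | 08
```

Read the row `6 | 0448` as the values 60, 64, 64, and 68.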
Skewed
A distribution is skewed if it's not symmetric and one tail stretches out farther than the other. Distributions are said to be skewed left when the longer tail stretches to the left, and skewed right when it goes to the right (p. 49).
Symmetric
A distribution is symmetric if the two halves on either side of the center look approximately like mirror images of each other (p. 49).
Outliers
Outliers are extreme values that don't appear to belong with the rest of the data. They may be unusual values that deserve further investigation, or they may be just mistakes; there's no obvious way to tell. Don't delete outliers automatically; you have to think about them. Outliers can affect many statistical analyses, so you should always be alert for them. Boxplots display points more than 1.5 IQR from either end of the box individually, but this is just a rule of thumb and not a definition of what is an outlier (p. 49).
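The 1.5 IQR rule of thumb can be sketched as follows. The quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions and an assumption here; the data and function name are made up. The result only flags values worth a closer look and does not decide whether they are errors.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return (lower fence, upper fence, flagged values) for the k * IQR rule."""
    # Assumption: quartiles by the standard library's "inclusive" method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    flagged = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, flagged


times = [10, 12, 13, 14, 15, 16, 18, 19, 45]  # hypothetical values
print(boxplot_outliers(times))  # (5.5, 25.5, [45])
```

Here Q1 = 13 and Q3 = 18, so IQR = 5 and the fences sit 7.5 beyond each end of the box.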
Shape
To describe the shape of a distribution, look for (p. 48)
■ single vs. multiple modes.
■ symmetry vs. skewness.
■ outliers and gaps.