Chapter 2
Nonzero Axis
Always examine a graph to see whether an axis begins at some value other than zero, which can exaggerate differences.
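A quick calculation shows why a nonzero axis misleads. This sketch assumes two hypothetical values, 95 and 100. The helper names `bar_heights` and `apparent_ratio` are made up for illustration. The drawn height of a bar is its value minus the value where the axis starts.

```python
def bar_heights(values, axis_start=0.0):
    """Drawn bar heights when the vertical axis starts at axis_start."""
    return [v - axis_start for v in values]

def apparent_ratio(a, b, axis_start=0.0):
    """How many times taller the bar for b looks than the bar for a."""
    ha, hb = bar_heights([a, b], axis_start)
    return hb / ha

print(apparent_ratio(95, 100))                 # about 1.05 (axis starts at zero)
print(apparent_ratio(95, 100, axis_start=90))  # 2.0 (axis starts at 90)
```

With the axis starting at 90, the bar for 100 is drawn twice as tall as the bar for 95, although the two values differ by only about 5%.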
4 Common Distribution Shapes
1. Bell-Shaped (Normal) Distribution: A normal distribution has a "bell" shape. In a normal distribution, (1) the frequencies increase to a maximum and then decrease, and (2) the graph is symmetric, with the left half of the histogram being roughly a mirror image of the right half. 2. Uniform Distribution: With a uniform distribution, the different possible values occur with approximately the same frequency, so the heights of the bars in the histogram are approximately uniform. 3. Right-Skewed Distribution: Data skewed to the right have a longer right tail. 4. Left-Skewed Distribution: Data skewed to the left have a longer left tail.
4 Important Principles of Tufte
1. For small data sets of 20 values or fewer, use a table instead of a graph. 2. A graph of data should make us focus on the true nature of the data, not on other elements, such as eye-catching but distracting design features. 3. Do not distort data; construct a graph to reveal the true nature of the data. 4. Almost all of the ink in a graph should be used for the data, not for other design elements.
6 Steps for Manually Constructing a Frequency Distribution
1. Select the number of classes, usually between 5 and 20. The number of classes might be affected by the convenience of using round numbers. 2. Calculate the class width: class width ≈ [(maximum data value) - (minimum data value)] / (number of classes). Round this result to get a convenient number; it's usually best to round up. 3. Choose the value for the first lower class limit by using either the minimum value or a convenient value below the minimum. 4. Using the first lower class limit and the class width, list the other lower class limits. 5. List the lower class limits in a vertical column and then determine and enter the upper class limits. 6. Take each individual data value and put a tally mark in the appropriate class. Add the tally marks to find the total frequency for each class.
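The six steps can be sketched in Python as below. This is a minimal sketch that assumes whole-number data, so each upper class limit is one less than the next lower class limit. The function names and the sample data are hypothetical. For step 2 it makes one specific rounding choice (drop the fractional part and add 1) so the classes always reach the maximum value.

```python
import math

def class_width(data, num_classes):
    """Step 2: (max - min) / number of classes, rounded up to the next whole number.
    Assumption: an exact quotient is still bumped up by 1 so the last class reaches the maximum."""
    return math.floor((max(data) - min(data)) / num_classes) + 1

def frequency_distribution(data, num_classes, first_lower=None):
    """Steps 3-6 for whole-number data; returns (lower limit, upper limit, frequency) rows."""
    width = class_width(data, num_classes)
    lower = min(data) if first_lower is None else first_lower  # step 3
    rows = []
    for k in range(num_classes):
        lcl = lower + k * width                    # step 4: next lower class limit
        ucl = lcl + width - 1                      # step 5: upper limit (whole-number data)
        freq = sum(lcl <= x <= ucl for x in data)  # step 6: tally the values in the class
        rows.append((lcl, ucl, freq))
    if sum(f for _, _, f in rows) != len(data):
        raise ValueError("classes do not cover every data value; adjust width or first limit")
    return rows

data = [2, 5, 7, 8, 11, 12, 15, 18, 21, 24]  # hypothetical sample
for row in frequency_distribution(data, num_classes=5):
    print(row)
```

For this sample the range is 22, so 22/5 = 4.4 rounds up to a width of 5. The resulting classes are 2-6, 7-11, 12-16, 17-21 and 22-26, with frequencies 2, 3, 2, 2 and 1.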
3 Reasons for Constructing a Frequency Distribution/Table
1. So that we can summarize large data sets; 2. So that we can analyze the data to see the distribution and identify outliers; 3. So that we have a basis for constructing graphs such as histograms.
Normal Distribution
1. The frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency. 2. The distribution is approximately symmetric, with frequencies preceding the maximum being roughly a mirror image of those that follow the maximum. Gaps: The presence of gaps can suggest that the data are from two or more different populations, but the converse is not true: data from different populations do not necessarily result in gaps.
Bar Graphs
A bar graph uses bars of equal width to show frequencies of categories of categorical or qualitative data. The vertical scale represents frequencies or relative frequencies. The horizontal scale identifies the different categories of qualitative data. The bars may or may not be separated by small gaps. A multiple bar graph has two or more sets of bars and is used to compare two or more data sets.
Dotplots
A dotplot consists of a graph in which each data value is plotted as a point or dot along a horizontal scale of values. Dots representing equal values are stacked.
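A text-only dotplot can be sketched as follows. It assumes integer data, so every value between the minimum and the maximum gets its own column and empty columns reveal gaps. The function name `dotplot` is hypothetical. Each column stacks one dot per occurrence of that value.

```python
from collections import Counter

def dotplot(data):
    """Return the rows of a text dotplot for integer data, top row first."""
    counts = Counter(data)
    scale = range(min(data), max(data) + 1)   # horizontal scale of values
    width = max(len(str(v)) for v in scale)   # column width for the labels
    rows = []
    for level in range(max(counts.values()), 0, -1):
        # a dot appears in a column if that value occurs at least `level` times
        cells = (("•" if counts[v] >= level else " ").center(width) for v in scale)
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v).center(width) for v in scale))  # axis labels
    return rows

for line in dotplot([3, 4, 4, 6]):  # hypothetical data
    print(line)
```

The two 4s are stacked, and the empty column above 5 shows that no value equals 5.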
Using Frequency Distributions to Understand Data
A frequency distribution can help us understand the distribution of a data set, which is the nature or shape of the spread of the data over the range of values (such as bell-shaped).
Frequency Distribution
A frequency distribution (or frequency table) shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them. The frequency for a particular class is the number of original values that fall into that class.
Frequency Polygon
A frequency polygon uses line segments connected to points located directly above class midpoint values, with the height of each point equal to the class frequency. A frequency polygon is very similar to a histogram, but it uses line segments instead of bars.
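The points of a frequency polygon can be computed from a frequency table as below. This is a sketch under assumptions: classes are given as hypothetical (lower limit, upper limit, frequency) tuples, and, following a common convention, the polygon is extended one class width beyond each end to a point with frequency 0 so that it starts and ends on the horizontal axis.

```python
def frequency_polygon_points(classes, width):
    """classes: (lower limit, upper limit, frequency) tuples in order.
    Returns (x, y) points: one above each class midpoint, plus a zero-frequency
    point one class width beyond each end."""
    points = [((lcl + ucl) / 2, freq) for lcl, ucl, freq in classes]
    left = (points[0][0] - width, 0)
    right = (points[-1][0] + width, 0)
    return [left] + points + [right]

table = [(2, 6, 2), (7, 11, 3), (12, 16, 2)]  # hypothetical frequency table
print(frequency_polygon_points(table, width=5))
# [(-1.0, 0), (4.0, 2), (9.0, 3), (14.0, 2), (19.0, 0)]
```

Connecting these points in order with line segments gives the polygon.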
Histogram
A histogram is a graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values. A histogram is a graph of a frequency distribution. Class frequencies should be used for the vertical scale. The bar locations on the horizontal scale are usually labeled with one of the following: 1. class boundaries; 2. class midpoints; or 3. lower class limits.
Pie Charts
A pie chart is a graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category. Graphics expert Edward Tufte advises: never use pie charts, because they waste ink on components that are not data, and they lack an appropriate scale.
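Because each slice is proportional to its category's frequency, its angle is (frequency / total) × 360°. The sketch below uses a hypothetical helper name and hypothetical category counts.

```python
def pie_slice_angles(freqs):
    """Map each category's frequency to a slice angle in degrees."""
    total = sum(freqs.values())
    return {cat: 360 * f / total for cat, f in freqs.items()}

print(pie_slice_angles({"A": 10, "B": 5, "C": 5}))
# {'A': 180.0, 'B': 90.0, 'C': 90.0}
```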
Relative Frequency Histogram
A relative frequency histogram has the same shape and horizontal scale as a histogram, but the vertical scale uses relative frequencies (as percentages or proportions) instead of actual frequencies.
Stemplots
A stemplot or stem-and-leaf plot represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit). Better stemplots are often obtained by first rounding the original data values. Also, stemplots can be expanded to include more rows and can be condensed to include fewer rows. One advantage of the stemplot is that we can see the distribution of data while keeping the original data values. Another advantage is that constructing a stemplot is a quick way to sort data (arrange them in order), which is required for some statistical procedures, such as finding a median or percentiles.
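A minimal stemplot sketch follows. It assumes non-negative integers whose stem is everything but the last digit (`value // 10`) and whose leaf is the last digit (`value % 10`). The function name and data are hypothetical.

```python
from collections import defaultdict

def stemplot(data):
    """Stem = value // 10 (all but the last digit), leaf = value % 10 (last digit).
    Assumes non-negative integers; round the original values first if needed."""
    groups = defaultdict(list)
    for x in sorted(data):              # sorting first keeps each row's leaves in order
        groups[x // 10].append(x % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems as rows
        leaves = "".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for line in stemplot([73, 52, 80, 71, 57, 73]):  # hypothetical data
    print(line)
```

Reading the leaves row by row recovers the sorted data 52, 57, 71, 73, 73, 80, which is the sorting advantage described above.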
Time-Series Graph
A time-series graph is a graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly.
Relative Frequency Distribution
A variation of the basic frequency distribution is a relative frequency distribution or percentage frequency distribution, in which each class frequency is replaced by a relative frequency (a proportion) or a percentage. Relative frequency for a class = (frequency for a class) / (sum of all frequencies). Percentage for a class = (frequency for a class) / (sum of all frequencies) × 100%. The sum of the percentages in a relative frequency distribution must be very close to 100% (small discrepancies come from rounding).
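The two formulas translate directly into code. In this sketch the helper name and the class frequencies are hypothetical, and percentages are rounded to one decimal place (an assumed convention). It shows why the percentages sum only to "very close to" 100%.

```python
def relative_frequencies(freqs):
    """Return (relative frequencies as proportions, percentages rounded to 0.1%)."""
    total = sum(freqs)
    proportions = [f / total for f in freqs]
    percentages = [round(100 * p, 1) for p in proportions]
    return proportions, percentages

props, pcts = relative_frequencies([1, 1, 1])  # hypothetical class frequencies
print(pcts)       # [33.3, 33.3, 33.3]
print(sum(pcts))  # about 99.9: close to, but not exactly, 100% because of rounding
```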
Ogive
Another type of statistical graph is an ogive, which depicts cumulative frequencies. Ogives are useful for determining the number of values below some particular value. An ogive uses class boundaries along the horizontal scale and uses cumulative frequencies along the vertical scale.
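The points of an ogive can be sketched as follows. This assumes whole-number class limits, so each class boundary is taken as a limit ± 0.5. The function name and table are hypothetical. The first point sits on the axis at the first lower boundary, and each later point pairs an upper class boundary with the cumulative frequency up to that class.

```python
from itertools import accumulate

def ogive_points(classes):
    """classes: (lower limit, upper limit, frequency) tuples for whole-number data.
    Assumes class boundaries are the limits -/+ 0.5."""
    cumulative = list(accumulate(freq for _, _, freq in classes))
    points = [(classes[0][0] - 0.5, 0)]   # start on the axis at the first lower boundary
    for (_, ucl, _), cum in zip(classes, cumulative):
        points.append((ucl + 0.5, cum))   # cumulative count up to each upper boundary
    return points

print(ogive_points([(2, 6, 2), (7, 11, 3), (12, 16, 2)]))  # hypothetical table
# [(1.5, 0), (6.5, 2), (11.5, 5), (16.5, 7)]
```

For example, the point (11.5, 5) says that 5 values lie below 11.5.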
5 Characteristics of Data (CVDOT)
CVDOT: "Computer Viruses Destroy or Terminate". Center: A representative value that indicates where the middle of the data set is located; Variation: A measure of the amount that the data values vary; Distribution: The nature or shape of the spread of the data over the range of values (such as bell-shaped); Outliers: Sample values that lie very far away from the vast majority of the other sample values; Time: Any change in the characteristics of the data over time.
10 Graphs That Enlighten
Histogram; Scatterplots; Time-Series Graph; Dotplots; Stemplots; Bar Graphs; Pareto Charts; Pie Charts; Frequency Polygon; Ogive.
5 Terms Used in Constructing a Frequency Table
Lower Class Limits (LCL): lower class limits are the smallest numbers that can belong to the different classes. Upper Class Limits (UCL): upper class limits are the largest numbers that can belong to the different classes. Class Boundaries: class boundaries are the numbers used to separate the classes, but without the gaps created by class limits. Class Midpoints: class midpoints are the values in the middle of the classes. Each class midpoint is computed by adding the lower class limit to the upper class limit and dividing the sum by 2. Class Width: class width is the difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution.
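Given the class limits, the other three terms can be computed. In this sketch the name `class_summary` and the limits are hypothetical, and it assumes at least two classes. It places each interior boundary halfway between one class's upper limit and the next class's lower limit, and it extends the same half-gap at both ends.

```python
def class_summary(limits):
    """limits: (lower class limit, upper class limit) pairs in order (at least two).
    Returns (class boundaries, class midpoints, class width)."""
    # interior boundaries: halfway between an upper limit and the next lower limit
    inner = [(limits[i][1] + limits[i + 1][0]) / 2 for i in range(len(limits) - 1)]
    half_gap = (limits[1][0] - limits[0][1]) / 2
    boundaries = [limits[0][0] - half_gap] + inner + [limits[-1][1] + half_gap]
    midpoints = [(lcl + ucl) / 2 for lcl, ucl in limits]  # (LCL + UCL) / 2
    width = limits[1][0] - limits[0][0]  # difference between consecutive lower class limits
    return boundaries, midpoints, width

print(class_summary([(2, 6), (7, 11), (12, 16)]))  # hypothetical limits
# ([1.5, 6.5, 11.5, 16.5], [4.0, 9.0, 14.0], 5)
```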
2 Graphs That Deceive
Nonzero Axis; Pictographs.
Criteria for Assessing Normality with a Normal Quantile Plot
Normal Distribution: The population distribution is normal if the pattern of the points in the normal quantile plot is reasonably close to a straight line, and the points do not show some systematic pattern that is not a straight-line pattern. Not a Normal Distribution: The population distribution is not normal if the normal quantile plot has either or both of these two conditions: (1) the points do not lie reasonably close to a straight line; (2) the points show some systematic pattern that is not a straight-line pattern.
Scatterplots
A scatterplot (or scatter diagram) is a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis. The horizontal axis is used for the first (x) variable, and the vertical axis is used for the second (y) variable. The pattern of the plotted points is often helpful in determining whether there is a correlation or relationship between the two variables.
Cumulative Frequency Distribution
In a cumulative frequency distribution, the frequency for each class is the sum of the frequencies for that class and all previous classes.
Pictographs
When examining data depicted with a pictograph, determine whether the graph is misleading because objects of area or volume are used to depict amounts that are actually one-dimensional. Histograms and bar charts represent one-dimensional data with two-dimensional bars, but they use bars with the same width so that the graph is not misleading.
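The distortion is easy to quantify. Suppose the true ratio of two amounts is k and an icon is enlarged by k in every direction. Its area then grows by k² and its volume by k³. The hypothetical helper below computes the apparent ratio for a given number of enlarged dimensions, using made-up amounts where one is twice the other.

```python
def perceived_ratio(small, large, dimensions):
    """Ratio of drawn sizes when a picture for `small` is scaled by large / small
    along `dimensions` directions (1 = length, 2 = area, 3 = volume)."""
    return (large / small) ** dimensions

for d in (1, 2, 3):
    print(d, perceived_ratio(10, 20, d))  # 2.0, then 4.0, then 8.0
```

A bar of equal width changes only in length, so it keeps the true ratio of 2. An icon scaled in both directions looks 4 times as large, and a three-dimensional object looks 8 times as large.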
Pareto Charts
When we want a bar graph to draw attention to the more important categories, we can use a Pareto chart, which is a bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies. The vertical scale in a Pareto chart represents frequencies or relative frequencies. The horizontal scale identifies the different categories of qualitative data. The bars decrease in height from left to right. A Pareto chart does a better job than a pie chart of showing the relative sizes of the different components.
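The only extra rule for a Pareto chart is the descending order of the bars, which is a sort by frequency. In this sketch the helper name and category counts are hypothetical. Python's `sorted` is stable even with `reverse=True`, so tied categories keep their original order.

```python
def pareto_order(freqs):
    """Return (category, frequency) pairs in descending order of frequency.
    sorted() is stable, so tied categories keep their original order."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

counts = {"A": 3, "B": 7, "C": 3, "D": 5}  # hypothetical category counts
print(pareto_order(counts))
# [('B', 7), ('D', 5), ('A', 3), ('C', 3)]
```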
