Chapter 2
Features of a bar graph
1. Bars can be vertical or horizontal. 2. Bars are of uniform width and uniformly spaced. 3. The lengths of the bars represent values of the variable being displayed, the frequency of occurrence, or the percentage of occurrence. The same measurement scale is used for the length of each bar. 4. The graph is well annotated with title, labels for each bar, and vertical scale or actual value for the length of each bar.
How to make a frequency table
1. Determine the number of classes and the corresponding class width. 2. Create the distinct classes. We use the convention that the lower class limit of the first class is the smallest data value. Add the class width to this number to get the lower class limit of the next class. 3. Fill in upper class limits to create distinct classes that accommodate all possible data values from the data set. 4. Tally the data into classes. Each data value should fall into exactly one class. Total the tallies to obtain each class frequency. 5. Compute the midpoint (class mark) for each class. 6. Determine the class boundaries.
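The six steps above can be sketched in code. This is only an illustration under stated assumptions: the data are whole numbers, the class width follows the "increase to the next whole number" convention, and the function name `frequency_table` and its dictionary keys are hypothetical, not taken from the textbook.

```python
def frequency_table(data, num_classes):
    """Build a frequency table for whole-number data (illustrative sketch)."""
    low, high = min(data), max(data)
    # Step 1: class width, always increased to the next whole number.
    width = (high - low) // num_classes + 1
    rows = []
    for i in range(num_classes):
        # Step 2: the first lower limit is the smallest data value;
        # add the class width to get each following lower limit.
        lower = low + i * width
        # Step 3: for whole-number data the upper limit is one less
        # than the next lower limit.
        upper = lower + width - 1
        # Step 4: tally; each value falls into exactly one class.
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append({
            "limits": (lower, upper),
            "frequency": freq,
            # Step 5: midpoint (class mark).
            "midpoint": (lower + upper) / 2,
            # Step 6: class boundaries for whole-number data.
            "boundaries": (lower - 0.5, upper + 0.5),
        })
    return rows
```

For the made-up data set 1, 2, 5, 8, 9, 12, 13, 13, 17, 20 with 4 classes, this sketch gives a class width of 5, classes 1-5, 6-10, 11-15, and 16-20, and frequencies 3, 2, 3, and 2.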
How to make a stem and leaf display
1. Divide the digits of each data value into two parts. The leftmost part is called the stem and the rightmost part is called the leaf. 2. Align all the stems in a vertical column from smallest to largest. Draw a vertical line to the right of all the stems. 3. Place all the leaves with the same stem in the same row as the stem, and arrange the leaves in increasing order. 4. Use a label to indicate the magnitude of the numbers in the display. We include the decimal position in the label rather than with the stems or leaves.
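As a rough sketch of these steps, the following assumes nonnegative whole-number data, with the ones digit as the leaf and everything to its left as the stem. The function name `stem_and_leaf` is hypothetical, and the data in the usage note are made up.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf display as text lines (sketch)."""
    rows = defaultdict(list)
    # Sorting first means the leaves on each row end up in increasing order.
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # stem = leading digits, leaf = ones digit
        rows[stem].append(leaf)
    lines = []
    # List every stem from smallest to largest, including stems with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    # Label showing the magnitude of the numbers in the display.
    key_stem, key_leaf = divmod(min(data), 10)
    lines.append(f"Key: {key_stem} | {key_leaf} represents {min(data)}")
    return lines
```

With the values 27, 12, 48, 15, 21, 27, the rows are ` 1 | 2 5`, ` 2 | 1 7 7`, ` 3 |`, and ` 4 | 8`, followed by the label `Key: 1 | 2 represents 12`.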
How to make an ogive
1. Make a frequency table showing class boundaries and cumulative frequencies. 2. For each class, make a dot over the upper class boundary at the height of the cumulative class frequency. The coordinates of the dots are (upper class boundary, cumulative class frequency). Connect these dots with line segments. 3. By convention, an ogive begins on the horizontal axis at the lower class boundary of the first class.
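The points of an ogive can be computed before any drawing is done. The sketch below assumes the upper class boundaries and class frequencies are already known. The function name `ogive_points` and its parameters are hypothetical.

```python
from itertools import accumulate

def ogive_points(upper_boundaries, frequencies, first_lower_boundary):
    """Return the (x, y) points to connect with line segments (sketch)."""
    # Running totals of the class frequencies.
    cumulative = list(accumulate(frequencies))
    # By convention the ogive starts on the horizontal axis
    # at the lower class boundary of the first class.
    points = [(first_lower_boundary, 0)]
    # One dot per class: (upper class boundary, cumulative frequency).
    points += list(zip(upper_boundaries, cumulative))
    return points
```

For upper boundaries 5.5, 10.5, 15.5, 20.5 with frequencies 3, 2, 3, 2 and a first lower boundary of 0.5, the points are (0.5, 0), (5.5, 3), (10.5, 5), (15.5, 8), and (20.5, 10).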
What do graphs tell us?
Appropriate graphs provide a visual summary of data that tells us • how data are distributed over several categories or data intervals; • how data from two or more data sets compare; • how data change over time.
Pareto Chart
A Pareto chart is a bar graph in which the bar height represents the frequency of an event. In addition, the bars are arranged from left to right in order of decreasing height.
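The ordering rule is the only thing that separates a Pareto chart from an ordinary bar graph, so it can be shown with a simple sort. This is a minimal sketch; the function name `pareto_order` and the event counts in the usage note are made up for illustration.

```python
def pareto_order(counts):
    """Return (event, frequency) pairs from tallest bar to shortest (sketch)."""
    # Sort by frequency, largest first, which is the left-to-right bar order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

For example, `pareto_order({"late": 4, "damaged": 9, "wrong item": 2})` puts "damaged" first, then "late", then "wrong item".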
Stem and Leaf Display
A stem and leaf display is a method of exploratory data analysis that is used to rank-order and arrange data into groups.
What does an ogive tell us?
An ogive (also known as a cumulative-frequency diagram) tells us • how many data are less than the indicated value on the horizontal axis; • how slowly or rapidly the data values accumulate over the range of the data. In addition, the vertical scale can be changed to cumulative percentages by dividing the cumulative frequencies by the total number of data. Then we can tell what percentage of data are below values specified on the horizontal axis.
Ogive
An ogive is a graph that displays cumulative frequencies.
Histogram
Histograms and relative frequency histograms provide effective visual displays of data organized into frequency tables. In these graphs, we use bars to represent each class, where the width of the bar is the class width. For histograms, the height of the bar is the class frequency, whereas for relative-frequency histograms, the height of the bar is the relative frequency of that class.
Time series graph
Data are plotted in order of occurrence at regular intervals over a period of time.
Circle or pie chart
In a circle or pie chart, wedges of a circle visually display proportional parts of the total population that share a common characteristic.
Cumulative Frequency
The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes.
Outliers
Outliers in a data set are the data values that are very different from other measurements in the data set.
How to find the class width
Class width = (largest data value - smallest data value) / desired number of classes. Increase the computed value to the next highest whole number, even if the calculation results in a whole number. Usually 5 to 15 classes are used.
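The rounding rule is easy to get wrong when the division comes out even, so a small sketch may help. It assumes whole-number data, which lets integer division do the rounding exactly. The function name `class_width` is hypothetical.

```python
def class_width(data, num_classes):
    """Class width for whole-number data, by the convention above (sketch)."""
    spread = max(data) - min(data)
    # Floor division plus 1 moves to the next whole number,
    # even when the division comes out even.
    return spread // num_classes + 1
```

With a smallest value of 1, a largest value of 47, and 6 classes, 46/6 is about 7.67, so the width is 8. With values from 0 to 40 and 5 classes, 40/5 is exactly 8, and the width is still increased to 9.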
Midpoint
(Lower class limit + upper class limit)/2
How to make a dotplot
Display the data along a horizontal axis. Then plot each data value with a dot or point above the corresponding value on the horizontal axis. For repeated data values, stack the dots.
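A text-only version of a dotplot shows the stacking idea without any plotting library. This is a sketch that assumes nonnegative whole-number data. The function name `text_dotplot` is hypothetical, and `*` stands in for a dot.

```python
from collections import Counter

def text_dotplot(data):
    """Return the lines of a text dotplot, top row first (sketch)."""
    counts = Counter(data)  # how many dots to stack above each value
    values = range(min(data), max(data) + 1)  # the horizontal axis
    cell = len(str(max(data))) + 1  # column width for each axis value
    lines = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("*" if counts[v] >= level else "").rjust(cell) for v in values)
        lines.append(row.rstrip())
    # Last line: the axis labels.
    lines.append("".join(str(v).rjust(cell) for v in values))
    return lines
```

Printing `"\n".join(text_dotplot([1, 2, 2, 3, 3, 3, 5]))` gives a stack of three marks above 3, two above 2, one above 1 and 5, and none above 4.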
What do histograms and relative frequency histograms tell us
Histograms and relative frequency histograms show us how the data are distributed. By looking at such graphs, we can tell • if the data distribution is more symmetric, skewed, or bimodal; • if there are possible outliers; • which data intervals contain the most data; • how spread out the data are.
Frequency Table
A frequency table partitions data into classes or intervals of equal width and shows how many data values are in each class. The classes or intervals are constructed so that each data value falls into exactly one class.
How to split a stem
When a stem has many leaves, it is useful to split the stem into two lines or more. For two lines per stem, place leaves 0 to 4 on the first line and leaves 5 to 9 on the next line.
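The two-lines-per-stem rule can be sketched as follows, assuming nonnegative whole-number data with the ones digit as the leaf. The function name `split_stem_rows` is hypothetical, and for brevity this sketch leaves out lines that would have no leaves.

```python
def split_stem_rows(data):
    """Return stem-and-leaf rows with two lines per stem (sketch)."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        # Leaves 0 to 4 go on the first line of a stem, 5 to 9 on the second.
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return [
        f"{stem} | {' '.join(map(str, leaves))}"
        for (stem, _half), leaves in sorted(rows.items())
    ]
```

For the values 20, 21, 23, 24, 25, 27, 28, 29, 29, 31, 36, the rows are `2 | 0 1 3 4`, `2 | 5 7 8 9 9`, `3 | 1`, and `3 | 6`.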
Changing the scale
Use a squiggle on the changed axis to show the viewer that the scale on the graph has been changed.
How to decide which type of graph to use
Bar graphs are useful for quantitative or qualitative data. With qualitative data, the frequency or percentage of occurrence can be displayed. With quantitative data, the measurement itself can be displayed, as was done in the bar graph showing life expectancy. Watch that the measurement scale is consistent or that a jump scale squiggle is used. Pareto charts identify the frequency of events or categories in decreasing order of frequency of occurrence. Circle graphs display how a total is dispersed into several categories. The circle graph is very appropriate for qualitative data, or any data for which percentage of occurrence makes sense. Circle graphs are most effective when the number of categories or wedges is 10 or fewer. Time-series graphs display how data change over time. It is best if the units of time are consistent in a given graph. For instance, measurements taken every day should not be mixed on the same graph with data taken every week. For any graph: Provide a title, label the axes, and identify units of measure. As Edward Tufte suggests in his book The Visual Display of Quantitative Information, don't let artwork or skewed perspective cloud the clarity of the information displayed.
What do stem and leaf displays tell us
Stem-and-leaf displays give a visual display that • shows us all the data (or truncated data) in order from smallest to largest; • helps us spot extreme data values or clusters of data values; • displays the shape of the data distribution.
How to Tally Data/Class Frequency
Tallying data is a method of counting data values that fall into a particular class or category. 1) To tally data into classes of a frequency table, examine each data value. 2) Determine which class contains the data value and make a tally mark or vertical stroke (|) beside that class. For ease of counting, each fifth tally mark of a class is placed diagonally across the prior four marks (||||). The class frequency for a class is the number of tally marks corresponding to that class.
Cluster Bar Graph
A cluster bar graph groups two or more bars together for each category. In the textbook example, the graphs are called cluster bar graphs because there are two bars for each year of birth.
Lower Class Limit & Upper Class Limit
The lower class limit is the lowest data value that can fit in a class. The upper class limit is the highest data value that can fit in a class. The class width is the difference between the lower class limit of one class and the lower class limit of the next class.
Relative Frequency
The relative frequency of a class is the proportion of all data values that fall into that class. To find it, divide the class frequency f by the total of all frequencies n (the sample size): relative frequency = f/n. The relative frequencies should total 1.
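The calculation can be sketched in a few lines. The function name `relative_frequencies` is hypothetical, and the frequencies in the usage note are made up.

```python
def relative_frequencies(frequencies):
    """Return f/n for each class frequency f (sketch)."""
    n = sum(frequencies)  # total of all frequencies (sample size)
    return [f / n for f in frequencies]
```

For class frequencies 3, 2, 3, 2 the sample size is 10, so the relative frequencies are 0.3, 0.2, 0.3, and 0.2, which total 1.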
Uniform or rectangular
These terms refer to a histogram in which every class has equal frequency. From one point of view, a uniform distribution is symmetrical with the added property that the bars are of the same height. Figure 2-8(b) illustrates a typical histogram with a uniform shape.
Skewed left or skewed right
These terms refer to a histogram in which one tail is stretched out longer than the other. The direction of skewness is on the side of the longer tail. So, if the longer tail is on the left, we say the histogram is skewed to the left. Figure 2-8(c) shows a typical histogram skewed to the left and another skewed to the right.
Mound-shaped symmetrical distribution
This term refers to a histogram in which both sides are (more or less) the same when the graph is folded vertically down the middle.
Bimodal
This term refers to a histogram in which the two classes with the largest frequencies are separated by at least one class. The top two frequencies of these classes may have slightly different values. This type of situation sometimes indicates that we are sampling from two different populations. Figure 2-8(d) illustrates a typical histogram with a bimodal shape.
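The separation rule in this definition can be turned into a crude check on a list of class frequencies. This is only a sketch of that one rule, not a statistical test. The function name `looks_bimodal` is hypothetical, and when frequencies are tied the choice of the "top two" classes is arbitrary.

```python
def looks_bimodal(frequencies):
    """True if the two largest classes have at least one class between them (sketch)."""
    # Class positions ranked from largest frequency to smallest.
    ranked = sorted(range(len(frequencies)), key=lambda i: frequencies[i], reverse=True)
    first, second = ranked[0], ranked[1]
    # Positions that differ by 2 or more have at least one class between them.
    return abs(first - second) >= 2
```

With made-up frequencies 2, 9, 4, 3, 8, 1 the two tallest classes are the second and fifth, so the check returns True. With 1, 5, 9, 8, 3 the two tallest classes are adjacent, so it returns False.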
How to find class boundaries (integer data)
To find upper class boundaries, add 0.5 unit to the upper class limits. To find lower class boundaries, subtract 0.5 unit from the lower class limits.
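A minimal sketch of this rule, assuming whole-number data and class limits given as (lower, upper) pairs. The function name `class_boundaries` is hypothetical.

```python
def class_boundaries(limits):
    """Convert (lower limit, upper limit) pairs to class boundaries (sketch)."""
    # Subtract 0.5 from each lower limit and add 0.5 to each upper limit.
    return [(lower - 0.5, upper + 0.5) for lower, upper in limits]
```

For classes 1-5 and 6-10 the boundaries are 0.5 to 5.5 and 5.5 to 10.5, so the upper boundary of one class is the lower boundary of the next.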