Exploring Data With Tables and Graphs 2.1-2.3
Define bar graph.
A *bar graph* uses bars of equal width to show frequencies of categories of categorical (or qualitative) data. The bars may or may not be separated by small gaps. Feature of a Bar Graph: ■ Shows the relative distribution of categorical data so that it is easier to compare the different categories.
Describe two of the ways in which graphs are commonly used to misrepresent data.
• Nonzero vertical axis: --> A common deceptive graph involves using a vertical scale that starts at some value greater than zero to exaggerate differences between groups. • Pictographs: --> By using pictographs, artists can create false impressions that grossly distort differences.
Describe the standard terms used in constructing frequency distributions and graphs.
• *Lower class limits* are the smallest numbers that can belong to each of the different classes. • *Upper class limits* are the largest numbers that can belong to each of the different classes. • *Class boundaries* are the numbers used to separate the classes, but without the gaps created by class limits. --> They split the difference between the end of one class and the beginning of the next class. • *Class midpoints* are the values in the middle of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2. • *Class width* is the difference between two consecutive lower class limits (or two consecutive lower class boundaries) in a frequency distribution.
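The definitions above can be checked with a few lines of Python. This is a minimal sketch; the class limits are made-up values chosen only for illustration, and the variable names are assumptions, not part of the original notes.

```python
# Hypothetical class limits for illustration (not data from the notes).
lower_limits = [50, 60, 70, 80]
upper_limits = [59, 69, 79, 89]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: (lower class limit + upper class limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between the end of one class
# and the beginning of the next class.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print(class_width)  # 10
print(midpoints)    # [54.5, 64.5, 74.5, 84.5]
print(boundaries)   # [49.5, 59.5, 69.5, 79.5, 89.5]
```

Note that four classes produce five boundaries, and that the class width (10) is not the same as upper limit minus lower limit (9).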
Define histogram and relative frequency histogram.
• A *histogram* is a graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies. The heights of the bars correspond to frequency values. --> A histogram is basically a graph of a frequency distribution. • A *relative frequency histogram* has the same shape and horizontal scale as a histogram, but the vertical scale uses relative frequencies (as percentages or proportions) instead of actual frequencies. Features of a Histogram: ■ Visually displays the shape of the distribution of the data. ■ Shows the location of the centre of the data. ■ Shows the spread of the data. ■ Identifies outliers.
Describe the characteristics of data.
1) *Centre*: A representative value that shows us where the middle of the data set is located. 2) *Variation*: A measure of the amount that the data values vary. 3) *Distribution*: The nature or shape of the spread of the data over the range of values (such as bell-shaped). 4) *Outliers*: Sample values that lie very far away from the vast majority of the other sample values. 5) *Time*: Any change in the characteristics of the data over time. *** Use "CVDOT" to remember.
Describe four common histogram distributions.
1) *Normal Distribution*: --> A normal distribution has a "bell" shape. 2) *Uniform distribution*: --> The different possible values occur with approximately the same frequency, so the heights of the bars in the histogram are approximately uniform. 3) *Skewed to the right*: --> Data skewed to the right (also called positively skewed) have a longer right tail. --> Distributions skewed to the right are more common than those skewed to the left because it's often easier to get exceptionally large values than values that are exceptionally small. 4) *Skewed to the left*: --> Data skewed to the left (also called negatively skewed) have a longer left tail. ***Hint: a distribution skewed to the right resembles the toes on your right foot, and one skewed to the left resembles the toes on your left foot.
Describe the principles of frequency distributions.
1) Frequencies of last digits sometimes reveal how the data were collected or measured. --> E.g. In many surveys, we can determine that surveyed subjects were asked to report some values, such as their heights or weights (rather than having them measured), because disproportionately many values end in 0 or 5. 2) The presence of gaps can suggest that the data are from two or more different populations. --> E.g. Consider a frequency distribution of the weights of randomly selected pennies. Examination of the frequencies reveals a large gap between the lightest pennies and the heaviest pennies. This suggests that we have two different populations: Pennies made before 1983 are 95% copper and 5% zinc, but pennies made after 1983 are 2.5% copper and 97.5% zinc, which explains the large gap between the lightest pennies and the heaviest pennies. 3) Combining two or more relative frequency distributions in one table makes comparisons of data much easier. --> E.g. A single table showing the relative frequency distributions of drive-through lunch service times (seconds) for McDonald's and Dunkin' Donuts.
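The first principle (last digits) is easy to try out. The sketch below counts the last digits of a list of weights; the values are invented for illustration and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical self-reported weights (made up for illustration).
weights = [150, 165, 170, 155, 180, 172, 160, 145, 200, 175]

# Frequency of each last digit.
last_digits = Counter(w % 10 for w in weights)

# Proportion of values ending in 0 or 5.
share_0_or_5 = (last_digits[0] + last_digits[5]) / len(weights)

print(sorted(last_digits.items()))  # [(0, 5), (2, 1), (5, 4)]
print(share_0_or_5)                 # 0.9
```

If the values had been measured, each last digit would be expected to occur about equally often, so a share this high for 0 and 5 points towards reported (rounded) values.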
Describe the procedure for constructing a frequency distribution.
1) Select the number of classes, usually between 5 and 20. 2) Calculate the class width. --> Class width ≈ (max data value - min data value) ÷ number of classes --> Round this result to get a convenient number. It's usually best to round up. 3) Choose the value for the first lower class limit by using either the minimum value or a convenient value below the minimum. 4) Using the first lower class limit and the class width, list the other lower class limits. (Do this by adding the class width to the first lower class limit to get the second lower class limit. Add the class width to the second lower class limit to get the third lower class limit, and so on.) 5) List the lower class limits in a vertical column and then determine and enter the upper class limits. 6) Take each individual data value and put a tally mark in the appropriate class. Add the tally marks to find the total frequency for each class.
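The six steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are whole numbers, the first lower class limit is the minimum value, and the class width (range divided by the number of classes) is always rounded up to the next whole number so that the last class reaches the maximum. The function name `frequency_distribution` and the sample data are hypothetical.

```python
def frequency_distribution(data, num_classes):
    """Return (lower limit, upper limit, frequency) for whole-number data."""
    # Step 2: class width = (max - min) / number of classes,
    # rounded up to the next whole number.
    width = (max(data) - min(data)) // num_classes + 1
    # Step 3: use the minimum value as the first lower class limit.
    first_lower = min(data)
    # Step 4: add the class width repeatedly to get the other lower limits.
    lowers = [first_lower + i * width for i in range(num_classes)]
    # Step 5: for whole numbers, each upper limit is one less than
    # the next lower limit.
    classes = [(lo, lo + width - 1) for lo in lowers]
    # Step 6: tally the values that fall in each class.
    return [(lo, hi, sum(1 for x in data if lo <= x <= hi))
            for lo, hi in classes]

# Hypothetical data (made up for illustration).
data = [52, 55, 61, 63, 64, 68, 70, 71, 75, 79, 83, 88, 90, 97]
for lo, hi, freq in frequency_distribution(data, 5):
    print(f"{lo}-{hi}: {freq}")
```

With these values the range is 45, so the class width is 10 and the classes are 52-61, 62-71, 72-81, 82-91 and 92-101, with frequencies 3, 5, 2, 3 and 1.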
What are features that a frequency distribution must have in order to have an approximately normal distribution?
1) The frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency. 2) The distribution is approximately symmetric: Frequencies preceding the maximum frequency should be roughly a mirror image of those that follow the maximum frequency.
Define Pareto chart.
A *Pareto chart* is a bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies, so the bars decrease in height from left to right. Features of a Pareto Chart: ■ Shows the relative distribution of categorical data so that it is easier to compare the different categories. ■ Draws attention to the more important categories.
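The ordering that turns a bar graph into a Pareto chart can be sketched with the standard library. The complaint categories below are invented for illustration; `Counter.most_common()` returns categories sorted from highest to lowest frequency.

```python
from collections import Counter

# Hypothetical categorical data (made up for illustration).
responses = ["late", "damaged", "late", "wrong item", "late", "damaged"]

# Count each category, then sort in descending order of frequency.
pareto_order = Counter(responses).most_common()
print(pareto_order)  # [('late', 3), ('damaged', 2), ('wrong item', 1)]

# Rough text version of the chart: one row per bar, tallest first.
for category, freq in pareto_order:
    print(f"{category:<12}{'#' * freq}")
```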
Define dotplot.
A *dotplot* consists of a graph of quantitative data in which each data value is plotted as a point (or dot) above a horizontal scale of values. Dots representing equal values are stacked. Features of a Dotplot: ■ Displays the shape of the distribution of data. ■ It is usually possible to recreate the original list of data values.
Define frequency distribution.
A *frequency distribution (or frequency table)* shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.
Define frequency polygon and relative frequency polygon.
A *frequency polygon* uses line segments connected to points located directly above class midpoint values. A frequency polygon is very similar to a histogram, but a frequency polygon uses line segments instead of bars. ---> A variation of the basic frequency polygon is the *relative frequency polygon*, which uses relative frequencies (proportions or percentages) for the vertical scale. An advantage of relative frequency polygons is that two or more of them can be combined on a single graph for easy comparison.
Define pie chart.
A *pie chart* is a very common graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category. --> Although pie charts are very common, they are not as effective as Pareto charts. Feature of a Pie Chart: ■ Shows the distribution of categorical data in a commonly used format.
Define stemplot.
A *stemplot* (or stem-and-leaf plot) represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit). --> Better stemplots are often obtained by first rounding the original data values. --> Stemplots can be expanded to include more rows and can be condensed to include fewer rows. Features of a Stemplot: ■ Shows the shape of the distribution of the data. ■ Retains the original data values. ■ The sample data are sorted (arranged in order).
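A minimal sketch of a stemplot for two-digit whole numbers, with the tens digit as the stem and the units digit as the leaf. The function name `stemplot` and the data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative whole numbers."""
    rows = defaultdict(list)
    # Sorting first means the leaves in each row appear in order.
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 63 -> stem 6, leaf 3
        rows[stem].append(str(leaf))
    # List every stem from smallest to largest so gaps stay visible.
    return [f"{stem} | {''.join(rows.get(stem, []))}"
            for stem in range(min(rows), max(rows) + 1)]

# Hypothetical data (made up for illustration).
data = [52, 55, 61, 63, 64, 68, 70, 71, 75, 79, 83, 88, 90, 97]
for row in stemplot(data):
    print(row)
# 5 | 25
# 6 | 1348
# 7 | 0159
# 8 | 38
# 9 | 07
```

Each original value can be read back from its row (stem 6 with leaf 3 is 63), which is why a stemplot retains the data.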
Define time-series graph.
A *time-series graph* is a graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly. Feature of a Time-Series Graph: ■ Reveals information about trends over time.
Define relative frequency distribution (or percentage frequency distribution).
A variation of the basic frequency distribution is a *relative frequency distribution* or *percentage frequency distribution*, in which each class frequency is replaced by a relative frequency (or proportion) or a percentage. --> Percentage for a class = (frequency for a class ÷ sum of all frequencies) × 100% --> The sum of the percentages in a relative frequency distribution must be very close to 100% (with a little wiggle room for rounding errors).
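The percentage formula can be applied directly in Python. The class frequencies below are made up for illustration; rounding to one decimal place is an assumption, not a rule from the notes.

```python
# Hypothetical class frequencies (made up for illustration).
frequencies = [3, 5, 2, 3, 1]
total = sum(frequencies)

# Percentage for a class = (frequency for the class / sum of all frequencies) * 100
percentages = [round(f / total * 100, 1) for f in frequencies]

print(percentages)  # [21.4, 35.7, 14.3, 21.4, 7.1]
```

These rounded percentages add up to 99.9% rather than exactly 100%, which is the kind of rounding error the definition allows for.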
Define cumulative frequency distribution.
Another variation of a frequency distribution is a *cumulative frequency distribution* in which the frequency for each class is the sum of the frequencies for that class and all previous classes. --> In addition to the use of cumulative frequencies, the class limits are replaced by "less than" expressions that describe the new ranges of values.
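A short sketch of how cumulative frequencies and their "less than" labels can be built. The lower class limits, class width and frequencies are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical frequency distribution (made up for illustration).
lower_limits = [52, 62, 72, 82, 92]
frequencies = [3, 5, 2, 3, 1]
class_width = 10

# Each cumulative frequency is the sum for that class and all previous classes.
cumulative = list(accumulate(frequencies))

# Class limits are replaced by "less than" expressions, using the
# lower class limit of the next class as the cut-off.
labels = [f"Less than {lo + class_width}" for lo in lower_limits]

for label, cum in zip(labels, cumulative):
    print(f"{label}: {cum}")
# Less than 62: 3
# Less than 72: 8
# Less than 82: 10
# Less than 92: 13
# Less than 102: 14
```

The last cumulative frequency always equals the total number of data values.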
What are the criteria for assessing normality with a normal quantile plot?
• Normal Distribution: The population distribution is normal if the pattern of the points in the normal quantile plot is reasonably close to a straight line, and the points do not show some systematic pattern that is not a straight-line pattern. • Not a Normal Distribution: The population distribution is not normal if the normal quantile plot has either or both of these two conditions: --> The points do not lie reasonably close to a straight-line pattern. --> The points show some systematic pattern that is not a straight-line pattern.
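The points of a normal quantile plot can be computed with Python's standard library. This is a sketch under assumptions: it pairs each sorted sample value with the standard normal z score for the cumulative area (i - 0.5)/n, which is one common choice (other sources use slightly different areas), and the function name and sample values are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_points(sample):
    """Return (z score, data value) pairs for a normal quantile plot."""
    n = len(sample)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    points = []
    for i, x in enumerate(sorted(sample), start=1):
        area = (i - 0.5) / n              # assumed cumulative area for the i-th value
        z = std_normal.inv_cdf(area)      # z score with that area to its left
        points.append((z, x))
    return points

# Hypothetical sample (made up for illustration).
sample = [61, 64, 66, 69, 72]
points = normal_quantile_points(sample)
for z, x in points:
    print(f"z = {z:6.2f}, x = {x}")
```

Plotting x against z and judging whether the points lie reasonably close to a straight line, with no other systematic pattern, applies the criteria above.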