Chapter 2 Organizing and Summarizing Data
If the data are discrete but there are many different values of the variable, or if the data are continuous, the categories of data (the classes)
must be created using intervals of numbers.
If the data are discrete and there are relatively few different values of the variable, the categories of data (classes) will be the
observations (as in qualitative data).
time-series plot
obtained by plotting the time at which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis.
When data are collected from a survey or a designed experiment, they must be __________
organized into a manageable form
stem and leaf plot
uses digits to the left of the rightmost digit to form the stem. Each rightmost digit forms a leaf.
time series data
value of a variable is measured at different points in time
first step in summarizing quantitative data is to determine
whether the data are discrete or continuous
Raw data.
data that are not organized
Advantage of Stem-and-Leaf Diagrams over Histograms?
Once a frequency distribution or histogram of continuous data is created, the raw data are lost (unless reported with the frequency distribution). However, the raw data can be retrieved from a stem-and-leaf plot.
cumulative frequency distribution
displays the aggregate frequency of the category. For continuous data, it displays the total number of observations less than or equal to the upper class limit of a class.
cumulative relative frequency distribution
displays the proportion (or percentage) of observations less than or equal to the category for discrete data and the proportion (or percentage) of observations less than or equal to the upper class limit for continuous data.
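A minimal sketch of how the cumulative and cumulative relative frequencies could be computed from a list of class frequencies. The frequencies below are hypothetical values chosen for illustration, and the function name is an assumption, not part of the course material.

```python
from itertools import accumulate

def cumulative_distributions(frequencies):
    """Return (cumulative frequencies, cumulative relative frequencies)."""
    total = sum(frequencies)
    cumulative = list(accumulate(frequencies))       # running totals per class
    cumulative_relative = [c / total for c in cumulative]
    return cumulative, cumulative_relative

# Hypothetical frequencies for five classes, listed in increasing class order.
class_frequencies = [4, 6, 5, 3, 2]
cum, cum_rel = cumulative_distributions(class_frequencies)
print(cum)
print(cum_rel)
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.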
relative frequency formula
frequency/sum of all frequencies
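The formula can be sketched in a few lines of Python. The sample observations and the function name are hypothetical and only illustrate dividing each frequency by the sum of all frequencies.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each category to frequency / sum of all frequencies."""
    counts = Counter(observations)            # frequency of each category
    total = sum(counts.values())              # sum of all frequencies
    return {category: count / total for category, count in counts.items()}

# Hypothetical qualitative observations.
sample = ["A", "B", "A", "C", "A", "B", "A", "B"]
rel_freq = relative_frequencies(sample)
print(rel_freq)
```

The relative frequencies of all categories add up to 1.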
Guidelines for Determining the Lower Class Limit of the First Class and Class Width: Determining the Class Width
Decide on the number of classes. Generally, there should be between 5 and 20 classes. The smaller the data set, the fewer classes you should have.
Horizontal Bars
Bar graphs may also be drawn with horizontal bars. Horizontal bars are preferable when category names are lengthy.
Determine the class width by computing (largest data value − smallest data value) / number of classes
Round this value up to a convenient number.
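A rough sketch of this step, assuming the class width is computed as (largest data value − smallest data value) / number of classes and then rounded up. The data set, the function name, and the `round_to` parameter are hypothetical choices made for illustration.

```python
import math

def class_width(data, num_classes, round_to=1):
    """Raw width = range / number of classes, rounded UP to a multiple of round_to."""
    raw = (max(data) - min(data)) / num_classes
    return math.ceil(raw / round_to) * round_to

# Hypothetical data set with 10 observations, grouped into 5 classes.
data = [12, 15, 21, 27, 33, 38, 44, 49, 52, 58]
width = class_width(data, 5, round_to=5)   # raw width is 46 / 5 = 9.2
print(width)
```

Rounding up rather than to the nearest value keeps the chosen number of classes wide enough to cover the whole range of the data.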
Ex. 11: Constructing a Stem-and-Leaf Plot. An individual is considered to be unemployed if he or she does not have a job but is actively seeking employment. The following data represent the unemployment rate in each of the fifty states plus the District of Columbia in June 2008.
We let the stem represent the integer portion of the number and the leaf the decimal portion. For example, the stem for Alabama (4.7) will be 4 and the leaf will be 7.
Stem-and-leaf Plot Example
A data value of 267 would have 26 as the stem and 7 as the leaf. Repeated leaves are possible.
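A minimal sketch of building a stem-and-leaf plot, assuming non-negative whole numbers so that the rightmost digit is the leaf and the remaining digits are the stem. The values are hypothetical (apart from 267, used in the card above), and the function name is an assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's rightmost digit (leaf) under its remaining digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)     # 267 -> stem 26, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data; note that 267 appears twice, so the leaf 7 repeats.
values = [267, 251, 263, 267, 258, 270]
plot = stem_and_leaf(values)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

# The raw data can be rebuilt from the plot, unlike from a histogram.
recovered = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]
```

For data with one decimal place, such as 4.7, the same idea applies with the integer portion as the stem and the decimal digit as the leaf.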
Pie Charts Ex. 7 (continued): a) What variable is described by this pie chart? Is it qualitative or quantitative? b) What proportion of fathers stayed home due to being ill or disabled? c) What percentage of fathers stayed home for reasons not related to school, being retired, or other? d) If 560 of the fathers surveyed stayed home due to being ill or disabled, how many fathers who stayed home were sampled in total?
a) Reason why fathers stay home; qualitative b) 0.34 c) 100% − 22% = 78% d) Let N = total number of fathers who stayed home. The proportion ill or disabled is 560/N = 0.34, so 560 = 0.34N and N = 560/0.34 ≈ 1647
Classes
are categories into which data are grouped.
Pareto chart
is a bar graph where the bars are drawn in decreasing order of frequency or relative frequency.
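The ordering that distinguishes a Pareto chart from an ordinary bar graph can be sketched as a sort on frequency. The frequency table and the function name here are hypothetical.

```python
def pareto_order(frequency_table):
    """Return (category, frequency) pairs in decreasing order of frequency."""
    return sorted(frequency_table.items(), key=lambda item: item[1], reverse=True)

# Hypothetical frequency table for three categories.
table = {"X": 3, "Y": 7, "Z": 5}
ordered = pareto_order(table)
print(ordered)
```

Drawing the bars in this order gives the Pareto chart; the same sort works on relative frequencies.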
pie chart
is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
ogive
is a graph that represents the cumulative frequency or cumulative relative frequency for the class. It is constructed by plotting points whose x-coordinates are the upper class limits and whose y-coordinates are the cumulative frequencies or cumulative relative frequencies of the class. Line segments are drawn connecting consecutive points.
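A small sketch of the points an ogive connects, pairing each upper class limit with the cumulative frequency of its class. The class limits, frequencies, and function name are hypothetical, and the plotting itself is left out.

```python
from itertools import accumulate

def ogive_points(upper_limits, frequencies):
    """Pair each upper class limit (x) with the cumulative frequency (y)."""
    return list(zip(upper_limits, accumulate(frequencies)))

# Hypothetical classes 10-19, 20-29, 30-39 with their frequencies.
points = ogive_points([19, 29, 39], [4, 6, 5])
print(points)
```

Connecting consecutive points with line segments gives the ogive; dividing each y-value by the total number of observations gives the cumulative relative frequency version.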
frequency polygon
is a graph that uses points, connected by line segments, to represent the frequencies for the classes. It is constructed by plotting a point above each class midpoint on a horizontal axis at a height equal to the frequency of the class. Line segments are drawn connecting consecutive points.
histogram
is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.
bar graph
is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis.
dot plot
is drawn by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed.
Frequency of a category
is the number of observations in that category
relative frequency
is the proportion (or percent) of observations within a category
class midpoint
is the sum of consecutive lower class limits divided by 2.
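As an illustration of this definition, the midpoints can be computed from a list of lower class limits. The limits and the function name are hypothetical; the final limit in the list is assumed to be the lower limit of the class that would follow the last one.

```python
def class_midpoints(lower_limits):
    """Midpoint of each class = (its lower limit + next lower limit) / 2."""
    return [(a + b) / 2 for a, b in zip(lower_limits, lower_limits[1:])]

# Hypothetical lower class limits for classes 10-19, 20-29, 30-39, 40-49;
# 50 is the lower limit of the class that would come next.
lower_limits = [10, 20, 30, 40, 50]
midpoints = class_midpoints(lower_limits)
print(midpoints)
```

These midpoints are the x-coordinates used when plotting a frequency polygon.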
frequency distribution
lists each category of data and the number of occurrences for each category of data.
relative frequency distribution
lists each category of data with the relative frequency
Ways to Organize Data
• Tables • Graphs • Numerical Summaries