Chapter 2
Choosing a Graph Type for Quantitative Variables
-dot plots and stem-and-leaf plots are more useful for small data sets, since they portray each individual observation. Large data sets are usually displayed with histograms -more flexibility is possible in defining the intervals with a histogram than in defining the stems with a stem-and-leaf plot -data values are retained with the stem-and-leaf plot and dot plot, but NOT with a histogram.
proportion
the proportion of the observations that fall in a certain category is the frequency (count) of observations in that category divided by the total number of observations
Categorical variable
each observation belongs to one of a set of categories
Quantitative variable
observations take on numerical values. Classified into two types: discrete and continuous
Types of Variables
Categorical and Quantitative
Graphs for Categorical Variables
pie chart, bar graph
Measuring the Variability/Spread of Quantitative Data
range, variance, standard deviation
Shapes of Histograms
When describing the shape, look for the overall pattern of the data: unimodal, bimodal, bell-shaped, symmetric, skewed right, skewed left
bimodal
a distribution with two distinct mounds. Can result when a population is polarized on a controversial issue
The Shape of a Distribution
a graph for a data set describes the distribution of the data, that is, the values the variable takes and the frequency of occurrence of each value. The distribution of the data (or data distribution) can also be described by a frequency table.
position
a measure of position tells us the point below (or above) which a certain percentage of the data fall. Or a measure of position tells us how far an observation falls from a particular point, such as the number of standard deviations an observation falls from the mean.
the 1.5 x IQR Criterion for Outliers
an observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile
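The criterion can be sketched in Python as below. This is only an illustration: the function names are hypothetical, and the quartiles are computed as medians of the lower and upper halves of the sorted data (leaving out the overall median when n is odd). Software packages use several different quartile conventions, so their fences can differ slightly.

```python
def median(values):
    """Median of a list (average of the two middle values when n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention).

    When n is odd, the overall median is excluded from both halves.
    Needs at least 2 observations.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[len(s) - half:]
    return median(lower), median(upper)


def potential_outliers(values):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Made-up data: Q1 = 3.5, Q3 = 7.5, IQR = 4, so the fences are -2.5 and 13.5.
data = [2, 3, 4, 5, 6, 7, 8, 30]
print(potential_outliers(data))
```

Here only 30 lies beyond the upper fence of 13.5, so it is flagged as a potential outlier.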
outlier
an observation that falls well above or well below the overall bulk of the data
relative frequencies
are also called proportions and percentages, and serve as a way to summarize the measurements in categories of a categorical variable
bell-shaped
distribution is unimodal and approximately symmetric
Graphs for Quantitative Variables
dot plot, stem-and-leaf plot, histogram
The Empirical Rule
if a distribution of data is bell-shaped, then approximately: -68% of the data fall within 1 standard deviation of the mean -95% of the data fall within 2 standard deviations of the mean -99.7% (all or nearly all) of the data fall within 3 standard deviations of the mean
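The Empirical Rule percentages are rounded values from the normal curve, which is a common model for a bell-shaped distribution. Assuming a normal distribution, the exact fraction within k standard deviations of the mean is erf(k / sqrt(2)), which the sketch below evaluates with Python's standard `math.erf` (the helper name is hypothetical).

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%, which round to the 68%, 95%, and 99.7% quoted by the rule. Real data are only approximately bell-shaped, so observed percentages will differ somewhat.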
symmetric
if the side of the distribution below a central value is a mirror image of the side above the central value
Five Number Summary
includes the minimum, Q1, Q2, Q3, and maximum value of the data set. Graphed by a box plot
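A small sketch of computing the five number summary. The helper names are hypothetical, and Q1 and Q3 are taken as medians of the lower and upper halves of the sorted data (excluding the overall median when n is odd); other quartile conventions exist and may give slightly different values.

```python
def median_of_sorted(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least 2 observations."""
    s = sorted(values)
    half = len(s) // 2
    q1 = median_of_sorted(s[:half])             # lower half
    q3 = median_of_sorted(s[len(s) - half:])    # upper half
    return (s[0], q1, median_of_sorted(s), q3, s[-1])


# Made-up data set with 7 observations.
print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))
```

For this data the summary is (1, 3, 7, 11, 13); a box plot draws the box from Q1 to Q3 with a line at the median.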
Frequency table
is a listing of possible values for a variable, together with the number of observations for each value
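A frequency table can be built with Python's `collections.Counter`. The sketch below also adds the proportion (count divided by the total) and the percentage (proportion times 100) for each category. The function name and the survey responses are made up for illustration.

```python
from collections import Counter


def frequency_table(observations):
    """Rows of (value, count, proportion, percentage), most frequent first."""
    counts = Counter(observations)
    n = len(observations)
    rows = []
    for value, count in counts.most_common():
        proportion = count / n
        rows.append((value, count, proportion, 100 * proportion))
    return rows


# Hypothetical responses to a yes/no/undecided question.
responses = ["yes", "no", "yes", "undecided", "yes", "no"]
for value, count, prop, pct in frequency_table(responses):
    print(f"{value:<10} {count:>3} {prop:>6.3f} {pct:>6.1f}%")
```

The proportions always sum to 1 and the percentages to 100 (up to rounding), which is a quick check that no category was missed.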
The pth percentile
is a value such that p percent of the observations fall at or below that value
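The definition can be checked directly by counting how many observations fall at or below a candidate value. The helper name and the exam scores are hypothetical. For small data sets several values can satisfy the definition, which is why textbooks and software use interpolation rules to report a single percentile.

```python
def percent_at_or_below(values, cutoff):
    """Percentage of observations less than or equal to cutoff."""
    count = sum(1 for x in values if x <= cutoff)
    return 100 * count / len(values)


# Made-up exam scores for 10 students.
scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]
print(percent_at_or_below(scores, 80))
```

Seven of the ten scores are at or below 80, so 80 qualifies as a 70th percentile of these scores.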
Interquartile Range
is the distance between Q3 and Q1
percentage
is the proportion multiplied by 100
Discrete Quantitative variable
its possible values form a set of separate numbers such as 0, 1, 2, 3
Continuous Quantitative variable
its possible values form an interval
Measuring the Center (of Quantitative Data)
mean, median
variance
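approximately the average of the squared deviations of each observation from the mean. For sample data, s² = Σ(x − x̄)² / (n − 1): the sum of the squared deviations is divided by n − 1 rather than n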
Unimodal
the distribution has a single mound (its highest point is at the mode)
range
the difference between the largest and the smallest observations
skewed left
the left tail is longer than the right tail
median
the midpoint of the observations when they are ordered from the smallest to the largest (for an even number of observations, the average of the two middle values) -not as influenced by outliers as the mean is
z-score
the number of standard deviations an observation is from the mean
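In symbols, z = (x − x̄) / s. The sketch below computes it with Python's `statistics` module, assuming s is the sample standard deviation (`statistics.stdev`). The function name and data are illustrative.

```python
import statistics


def z_score(x, values):
    """Number of sample standard deviations that x lies from the mean of values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 divisor)
    return (x - mean) / sd


data = [10, 20, 30]   # mean 20, standard deviation 10
print(z_score(30, data), z_score(5, data))
```

A positive z-score means the observation is above the mean and a negative one means it is below; here 30 is 1 standard deviation above the mean and 5 is 1.5 standard deviations below it.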
Quartile 1
the value at the 25th percentile
Quartile 2
the value at the 50th percentile (the median)
Quartile 3
the value at the 75th percentile
mode
the value that occurs most frequently
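A data set can have more than one mode when several values tie for the highest count. This sketch (hypothetical function name) returns all of them; Python 3.8 and later also provide `statistics.multimode` for the same purpose.

```python
from collections import Counter


def modes(values):
    """All values that occur with the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


print(modes([3, 7, 7, 2, 9, 2]))   # 7 and 2 each occur twice
print(modes([1, 1, 2]))
```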
skewed right
the right tail is longer than the left tail
standard deviation
the square root of the variance. A typical distance of an observation from the mean -the larger the standard deviation, the greater the variability of the data -s = 0 only when all observations take the same value
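The variance and standard deviation can be computed by hand and compared with Python's `statistics.variance` and `statistics.stdev`, which use the n − 1 divisor for sample data. The helper names and data are illustrative.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared deviations sum to 32
print(sample_variance(data), statistics.variance(data))
print(sample_sd(data), statistics.stdev(data))
print(sample_sd([6, 6, 6, 6]))    # all observations equal, so s = 0
```

For this data the variance is 32/7 (about 4.57) and s is about 2.14. The last line shows that s is 0 when there is no variability at all.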
mean
the sum of the observations divided by the number of observations -highly influenced by outliers
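The different sensitivity of the mean and the median to an outlier shows up clearly in a small example. The incomes below are made up (in thousands of dollars); adding one extreme value moves the mean from 35 to about 95.8, while the median moves only from 35 to 36.5.

```python
import statistics

incomes = [30, 32, 35, 38, 40]    # hypothetical incomes, in $1000s
with_outlier = incomes + [400]    # one extreme observation added

print("mean:  ", statistics.mean(incomes), "->", round(statistics.mean(with_outlier), 1))
print("median:", statistics.median(incomes), "->", statistics.median(with_outlier))
```

This is why the median is often preferred for describing the center of skewed data or data with outliers.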
bar graph
type of graph for Categorical variables. Displays a vertical bar for each category. The height of the bar is the percentage of observations in the category. Typically, the vertical bars for each category are separated by gaps rather than touching.
pie chart
type of graph for Categorical variables. Is a circle having a "slice of the pie" for each category. The size of a slice corresponds to the percentage of observations in the category.
histogram
type of graph for Quantitative variables. A graph that uses bars to portray the frequencies or relative frequencies of the possible outcomes for a Quantitative variable. -for a discrete variable, the graph has a separate bar for each possible value -for a continuous variable, you need to divide the interval of possible values into smaller intervals formed with values grouped together. You can use a frequency table for the intervals and graph the frequencies or percentages for those intervals. Intervals should have the same width.
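For a continuous variable, the grouping step can be sketched as counting observations in equal-width intervals and printing one row of asterisks per interval. The function names, the interval choice, and the ages are assumptions for illustration; each interval includes its left endpoint but not its right one, and values outside the chosen range are simply ignored.

```python
def histogram_counts(values, start, width, num_bins):
    """Counts in equal-width intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_bins
    for x in values:
        i = int((x - start) // width)
        if 0 <= i < num_bins:         # values outside the range are skipped
            counts[i] += 1
    return counts


def print_histogram(values, start, width, num_bins):
    """Text histogram: one asterisk per observation in each interval."""
    counts = histogram_counts(values, start, width, num_bins)
    for i, count in enumerate(counts):
        low = start + i * width
        print(f"[{low}, {low + width})  {'*' * count}")


ages = [21, 25, 33, 37, 38, 42, 45, 47, 52, 68]   # made-up data
print_histogram(ages, start=20, width=10, num_bins=5)
```

Unlike the stem-and-leaf plot, the printed histogram no longer shows the individual values, only how many fall in each interval.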
stem-and-leaf plot
type of graph for Quantitative variables. Each observation is represented by a stem and a leaf. The stem consists of all the digits except for the final digit, which is the leaf. -sort data in order from smallest to largest -to make a stem-and-leaf plot more compact, we can truncate the data values: cut off the final digit and plot the data as 0, 34, 7, 14, 20 instead of 0, 340, 70, 140, 200 (it is not necessary to round)
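A minimal text stem-and-leaf plot, assuming non-negative integer data. The function name and its `truncate_digits` parameter are hypothetical: the parameter drops that many final digits first (truncating, not rounding), matching the 340 to 34 example above.

```python
from collections import defaultdict


def stem_and_leaf(values, truncate_digits=0):
    """Lines of a stem-and-leaf plot for non-negative integers.

    The final digit is the leaf; the remaining digits form the stem.
    Empty stems between the smallest and largest are still shown.
    """
    trimmed = sorted(x // 10 ** truncate_digits for x in values)
    stems = defaultdict(list)
    for x in trimmed:
        stems[x // 10].append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems[stem])
        lines.append(f"{stem:>3} | {leaves}")
    return lines


for line in stem_and_leaf([0, 340, 70, 140, 200], truncate_digits=1):
    print(line)
```

After truncation the values are 0, 7, 14, 20, 34, so stem 0 has leaves 0 and 7, and stems 1, 2, and 3 have one leaf each.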
dot plot
type of graph for Quantitative variables. Shows a dot for each observation, placed just above the value on the number line for that observation.
Measures of Position
used to describe variability: z-score, pth percentile, quartiles, five number summary, interquartile range, the 1.5 x IQR Criterion for Outliers