AP Stats Chapter 1 - Exploring Data
Simpson's Paradox
An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined.
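A small numeric sketch may make the reversal concrete. The counts below are made up purely for illustration (they are not from the textbook); `counts`, `within`, and `combined` are hypothetical names.

```python
# Made-up counts: (successes, trials) for treatments A and B in two groups.
counts = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

# Success rate of each treatment within each group (the third variable held fixed).
within = {
    group: {t: s / n for t, (s, n) in by_treatment.items()}
    for group, by_treatment in counts.items()
}

# Success rate of each treatment after combining the groups.
combined = {}
for t in ("A", "B"):
    successes = sum(counts[g][t][0] for g in counts)
    trials = sum(counts[g][t][1] for g in counts)
    combined[t] = successes / trials

print(within)    # A has the higher rate inside each group
print(combined)  # B has the higher rate once the groups are pooled
```

In this made-up data the reversal happens because A was mostly used in the group with low success rates and B mostly in the group with high success rates.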
_____ based on the five-number summary are useful for comparing distributions.
Boxplots
The 1.5 * IQR rule for outliers
Call an observation an outlier if it falls more than 1.5 * IQR above the third quartile or below the first quartile.
There are two sets of ________ for a two-way table: the distributions of the row variable for each value of the column variable, and the distributions of the column variable for each value of the row variable.
Conditional Distributions
Dotplot
Each data value is shown as a dot above its location on a number line.
The distribution of a categorical variable lists the categories and gives the count (________) or percent (________) of individuals that fall in each category.
Frequency Table, relative frequency table.
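As a rough sketch, a frequency table and a relative frequency table can be built from raw categorical data like this. The data and the names `formats`, `counts`, and `percents` are invented for illustration.

```python
from collections import Counter

# Made-up categorical data: preferred radio format for 10 listeners.
formats = ["rock", "news", "rock", "country", "news",
           "rock", "country", "rock", "news", "rock"]

counts = Counter(formats)          # frequency table: category -> count
total = sum(counts.values())
percents = {cat: 100 * n / total for cat, n in counts.items()}  # relative frequencies

for cat in counts:
    print(f"{cat:8s} {counts[cat]:3d} {percents[cat]:5.1f}%")
```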
Interquartile Range
IQR measures the range of the middle 50% of the data. IQR = Q3 - Q1.
A data set contains information on a number of ________.
Individuals.
The row totals and column totals in a two-way table give the ________ of the two individual variables. It is clearer to present these distributions as percents of the table total.
Marginal Distributions
An important kind of departure is an _____, an individual value that falls outside the overall pattern.
Outlier
______ are observations that lie outside the overall pattern of a distribution.
Outliers
__________ and ________ display the distribution of a categorical variable.
Pie charts and bar graphs
First Quartile
Q1 lies one-quarter of the way up the list; it is the median of the observations whose position in the ordered list is to the left of the median.
Third Quartile
Q3 lies three-quarters of the way up the list; it is the median of the observations whose position in the ordered list is to the right of the median.
_____, ____, and _____ describe the overall pattern of the distribution of a quantitative variable.
Shape, center, and spread
You may want to use a ______ or possibly a ______ to display conditional distributions.
Side-by-side bar graph, segmented bar graph
An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is _______.
Simpson's Paradox
The five-number summary
The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is Minimum, Q1, M, Q3, and maximum.
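A minimal sketch of the five-number summary, assuming the median-of-halves rule for quartiles described on the quartile cards (the median itself is left out of both halves when the number of observations is odd). The function name `five_number_summary` and the data are hypothetical; statistical software often uses slightly different quartile rules and may give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum); needs at least two observations."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]                              # observations left of the median
    upper = s[half + 1:] if n % 2 else s[half:]   # observations right of the median
    return s[0], median(lower), median(s), median(upper), s[-1]

data = [1, 3, 4, 6, 7, 9, 12, 15, 20]   # made-up observations
minimum, q1, m, q3, maximum = five_number_summary(data)
print(minimum, q1, m, q3, maximum)       # 1 3.5 7 13.5 20
print("IQR =", q3 - q1)                  # IQR = 10.0
```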
The mean
To find the mean of a set of observations, add their values and divide by the number of observations.
A _________ of counts organizes data about two categorical variables.
Two-way table
For each individual, the data give values for one or more _____.
Variables.
Conditional Distribution
of a variable describes the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
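One way to sketch this computation, assuming a two-way table stored as nested dictionaries. The counts are invented for illustration, and `table` and `conditional_distribution` are hypothetical names.

```python
# Made-up two-way table of counts: rows = opinion, columns = gender.
table = {
    "likely":     {"female": 60,  "male": 45},
    "not likely": {"female": 140, "male": 55},
}

def conditional_distribution(table, column):
    """Percent of each row category among individuals in one column."""
    column_total = sum(row[column] for row in table.values())
    return {label: 100 * row[column] / column_total
            for label, row in table.items()}

# One conditional distribution for each value of the column variable.
for gender in ("female", "male"):
    print(gender, conditional_distribution(table, gender))
```

Comparing the two printed distributions is how the association between the row and column variables would be described.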
Individuals
are the objects described by a set of data. Individuals may be people, animals, or things.
To describe the _____ between the row and column variables, compare an appropriate set of conditional distributions.
association
Association
exists between two variables if specific values of one variable tend to occur in common with specific values of the other; that is, if knowing the value of one variable helps predict the value of the other.
A _______ variable places each individual into a category such as male or female.
categorical
Bar graphs are for
categorical data.
Some variables are ________ and others are ________.
categorical, quantitative.
A numerical summary of a distribution should report at least its
center and its spread, or variability.
The median describes the
center, and the quartiles and extremes show the spread.
Two Way Table
describes two categorical variables, for example gender and opinion about becoming rich.
Numerical summaries do not fully describe the shape of a _____.
distribution
You can use a _______, ______, or ______ to show the distribution of a quantitative variable.
dotplot, stemplot, or histogram
Stemplots separate ______.
each observation into a stem and a one-digit leaf.
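A rough sketch of the stem/leaf split, assuming non-negative whole numbers where the leaf is the final digit. The function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: stem = all digits but the last, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}".rstrip())
    return rows

data = [12, 15, 21, 21, 28, 40, 43]   # made-up observations
for row in stemplot(data):
    print(row)
```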
The ______ has about one-fourth of the observations below it, and the ________ has about three-fourths of the observations below it.
first quartile Q1, third quartile Q3
The _________ consisting of the median, the quartiles, and the maximum and minimum values provides a quick overall description of a distribution.
five-number summary
Stemplots
give us a quick picture of the shape of a distribution while including the actual numerical values in the graph.
No Association
if knowing the value of one variable does not help you predict the value of the other, then there is no association between the variables.
A dotplot displays _________.
individual values on a number line.
A variable describes some characteristic of an ______, such as ________, ______, or _______.
individual, person's height, gender, or salary.
The _________ is the range of the middle 50% of the observations and is found by IQR = Q3 - Q1.
interquartile range (IQR)
Variable
is any characteristic of an individual. A variable can take different values for different individuals.
Median
is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger.
Data Analysis
is the process of organizing, displaying, summarizing, and asking questions about data.
Outliers are plotted as
isolated points.
Bimodal
it has two clear peaks.
The ____ is the average of the observations, and the ____ is the midpoint of the values.
mean, median
The _____ and the _____ describe the center of a distribution in different ways.
mean, median M
Standard Deviation
measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance.
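A minimal sketch of the calculation. The card says only "an average" of the squared distances; this version assumes the sample convention of dividing by n - 1, which is common in introductory courses (dividing by n instead gives the population version). Function names and data are hypothetical.

```python
from math import sqrt

def variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed convention)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def standard_deviation(values):
    """Square root of the variance."""
    return sqrt(variance(values))

data = [1, 3, 4, 4, 8]                  # made-up observations, mean 4
print(variance(data))                    # 6.5
print(standard_deviation(data))          # about 2.55
print(standard_deviation([5, 5, 5]))     # 0.0 when there is no variability
```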
The _____ is marked within the box. Lines extend from the box to the smallest and the largest observations that are not outliers.
median
The mean and standard deviation are most useful for the Normal distribution introduced in the next chapter. The ______ and ___ are a better description for skewed distributions.
median, IQR
The number of _____ (major peaks) is another aspect of overall shape. Not all distributions have a simple overall shape, especially when there are few observations.
modes
The standard deviation is zero when there is
no variability and gets larger as the spread increases.
The mean is ______. Among measures of spread, the IQR is _____, but the standard deviation is not.
nonresistant, resistant
Distribution
of a variable tells us what values the variable takes and how often it takes these values.
Marginal Distribution
of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table. Marginal distributions tell us nothing about the relationship between two variables.
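As a sketch, the marginal distributions can be computed from the row and column totals as percents of the table total. The counts are invented for illustration, and the names `table`, `row_marginal`, and `col_marginal` are hypothetical.

```python
# Made-up two-way table of counts: rows = opinion, columns = gender.
table = {
    "likely":     {"female": 60,  "male": 45},
    "not likely": {"female": 140, "male": 55},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: row totals as percents of the table total.
row_marginal = {label: 100 * sum(row.values()) / grand_total
                for label, row in table.items()}

# Marginal distribution of the column variable: column totals as percents of the table total.
columns = list(next(iter(table.values())))
col_marginal = {c: 100 * sum(row[c] for row in table.values()) / grand_total
                for c in columns}

print(row_marginal)
print(col_marginal)
```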
An extreme observation is an ______ if it is smaller than Q1 - (1.5 * IQR) or larger than Q3 + (1.5 * IQR).
outlier
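The rule can be sketched as a pair of cutoffs. Here `q1` and `q3` are assumed to have been computed already for the made-up data (using the median-of-halves rule), and the function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Observations smaller than the low cutoff or larger than the high cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

data = [1, 3, 4, 6, 7, 9, 12, 15, 40]   # made-up observations
q1, q3 = 3.5, 13.5                       # assumed quartiles for this data

print(outlier_fences(q1, q3))            # (-11.5, 28.5)
print(find_outliers(data, q1, q3))       # [40]
```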
The mean and standard deviation are good descriptions for symmetric distributions without ____.
outliers.
When examining any graph, look for a(n) ________ and for notable _______ from that pattern.
overall pattern, departures
Individuals may be _____, _____, or _____.
people, animals, things.
Categorical Variable
places an individual into one of several groups or categories.
Always _____ your data.
plot
A ______ variable has numerical values that measure some characteristic of each individual, such as height in centimeters or salary in dollars.
quantitative
Histograms are for
quantitative data.
Bar graphs can also compare any set of ______.
quantities measured in the same units.
When you use the median to indicate the center of a distribution, describe the spread using the _____.
quartiles.
The median is a ______ measure of center because it is relatively unaffected by extreme observations.
resistant
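A quick illustration of resistance, using made-up numbers: adding one extreme observation moves the mean a lot but barely moves the median.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]          # made-up observations
with_extreme = data + [100]     # same data plus one extreme value

print(mean(data), median(data))                   # 4 4
print(mean(with_extreme), median(with_extreme))   # 20 4.5
```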
You can describe the overall pattern by its ____, ____, ____.
shape, center, and spread.
When comparing distributions of quantitative data, be sure to discuss
shape, center, spread, and possible outliers.
A distribution is _______ if the left side of the graph is much longer than the right side.
skewed to the left
A distribution is ________ if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
skewed to the right
Inference
is the process of drawing conclusions that go beyond the data at hand.
A statistics problem has a real-world setting. You can organize many problems using the four steps:
state, plan, do, and conclude.
Two-way tables are often used to ______.
summarize large amounts of information by grouping outcomes into categories.
Some distributions have simple shapes, such as _____ or _____.
symmetric and skewed
A distribution is roughly _____.
symmetric if the right and left sides of the graph are approximately mirror images of each other.
Quantitative Variable
takes numerical values for which it makes sense to find an average.
Marginal distributions
tell us nothing about the relationship between the variables.
Histograms plot _______.
the counts (frequencies) or percents (relative frequencies) of values in equal-width classes.
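A minimal sketch of the counting behind a histogram, assuming classes that include their left endpoint and exclude their right endpoint. The function name, its parameters, and the data are hypothetical.

```python
def histogram_counts(values, start, width, n_classes):
    """Count values in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for x in values:
        k = int((x - start) // width)   # index of the class containing x
        if 0 <= k < n_classes:          # values outside all classes are ignored
            counts[k] += 1
    return counts

data = [2, 5, 7, 11, 12, 14, 18, 23, 25, 29]   # made-up observations
counts = histogram_counts(data, start=0, width=10, n_classes=3)
percents = [100 * c / len(data) for c in counts]

print(counts)     # [3, 4, 3]
print(percents)   # [30.0, 40.0, 30.0]
```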
Histograms
the most common graph of the distribution of one quantitative variable.
Mean
the most common measure of center is the ordinary arithmetic average, or mean.
The box spans the quartiles and shows
the spread of the central half of the distribution.
Unimodal
they have a single peak.
The ______ and especially its square root, the _______, are common measures of spread about the mean as center.
variance, standard deviation
Multimodal
with more than two clear peaks.