AP Stats Chapter 1- Exploring Data

¡Supera tus tareas y exámenes ahora con Quizwiz!

Simpson's Paradox

An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of third variable are combined.

_____ based on the five-number summary are useful for comparing distributions.

Boxplots

The 1.5 * IQR rule for outliers

Call an observation if it falls more than 1.5 * IQR above the third quartile or below the first quartile

There are two sets of ________ for a two-way table: the distribution of the row variable for each value of the column variable, and the distributions of the column variable for each value of the row variable.

Conditional Distributions

Dotplot

Each data value is shown as a dot above its location on a number line.

The distribution of a categorical variable lists the categories and gives the count (________) or percent (________) of individuals that fall in each category.

Frequency Table, relative frequency table.

Interquartile Range

IQR measures the range of the middle 50% of the data. IQR = Q3 - Q1.

A dat set contains information on a number of ________.

Individuals.

The row totals and column totals in a two-way table gave the ________ of the two individual variables. It is clearer to present these distributions as percents of table total.

Marginal Distributions

An important kind of departure is an _____, an individual value that falls outside the overall pattern.

Outliers

______ are observation that lie outside the overall pattern of a distribution.

Outliers

__________ and ________ display the distribution of a categorical variable.

Pie charts and bar graphs

First Quartile

Q1 lies one-quarter of the way up the list; is the median of the observations whose position in the ordered list is to the left of the median.

Third Quartile

Q3 lies three-quarters of the way up the list; is the median of the observation whose position in the ordered list is to the right of the median.

_____, ____, and _____ describe the overall pattern of the distribution of a quantitative variable.

Shape, center, and spread

You may want to use a ______ or possibly a ______ to display conditional distributions.

Side-by-side bar graph, segmented bar graph

An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the dat for all values of the third variable are combined. This is _______.

Simpson's Paradox

The five-number summary

The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is Minimum, Q1, M, Q3, and maximum.

The mean

To find the mean of a set of observations, add their values and divide by the number of observations.

A _________ of counts organizes data about two categorical values.

Two-way table

For each individual, the data give values for one or more _____.

Variables.

Conditional Distribution

a variable described the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.

Individuals

are the objects described by a set of data. Individuals may be people, animals, or things.

To describe the _____ between the row and column variables, compare an appropriate set of conditional distributions.

association

Association

between two variables if specific values of one variable tend to occur in common with specific value of the other. If knowing the value of one variable helps predict the value of the other.

A _______ variable places each individual into a category such as male or female.

categorical

Bar graphs are for

categorical data.

Some variables are ________ and others are ________.

categorical, quantitative.

A numerical summary of distribution should report at least its

center and its spread, or variability.

The median describes the

center, and the quartiles, and extremes show the spread.

Two Way Table

describes two categorical variables, gender and opinion about becoming rich.

Numerical summaries do not fully describe the shape of a _____.

distribution

You can use a _______, ______, or ______ to show the distribution of a quantitative variable.

dotplot, stemplot, or histogram

Stemplots separate ______.

each observation into a stem and a one-digit leaf.

The ______ has about one-fourth of the observations below it, and the ________ has about three-fourths of the observations below it.

first quartile Q1, third quartile Q3

The _________ consisting of the median, the quartiles, and the maximum and minimum values provides a quick overall description of a distribution.

five-number summary

Stemplots

give us a quick picture of the picture of the shape of a distribution while including the actual numerical values in the graph.

No Association

if knowing the value of one variable does not help you predict the value of the other, then there is no association between the variables.

A dotplot displays _________.

individual vales on a number line.

A variable described some characteristic of an ______, such as ________, ______, or _______.

individual, person's height, gender, or salary.

The _________ is the range of the middle 50% of the observations and is found by IQR = Q3 - Q1.

interquartile range (IQR)

Variable

is any characteristic of an individual. A variable can take different values for different individuals.

Median

is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger.

Data Analysis

is the process of organizing, displaying, summarizing, and asking questions about data.

Outliers are plotted as

isolated points.

Biomodal

it has two clear peaks.

The ____ is the average of the observations, and the ____ is the midpoint of the values.

mean, and median

The _____ and the _____ describe the center of a distribution in different ways.

mean, and median M

Standard Deviation

measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance.

The _____ is marked within the box. Lines extend from the box to the smallest and the largest observations that are not outliers.

median

They are most useful for the Normal distribution introduced in the next chapter. The ______ and ___ are a better description for skewed distributions.

median, IQR

The number of _____ (major peaks) is another aspect of overall shape. Not all distributions have a simple overall shape, especially when there are few observations.

modes

The standard deviation is zero when there is

no variability and gets larger as the spread increases.

The mean is ______. Among measures of spread, the IQR is _____, but the standard deviation is not.

nonresistant, resistant

Distribution

of a variable tells us what values the variable takes and how often it takes these values.

Marginal Distribution

of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table. Marginal distributions tell us nothing about the relationship between two variables.

An extreme observation is an ______ if it is smaller than Q1 - (1.5 * IQR) or larger than Q3 + (1.5 * IQR).

outlier

The mean and standard deviation are good descriptions for symmetric distributions without ____.

outliers.

When examining any graph, for a(n) ________ and for notable _______ from that pattern.

overall pattern, departures

Individuals may be _____, _____, or _____.

people, animals, things.

Categorical Variable

places an individual into one of several groups or categories.

Always _____ your data.

plot

A ______ variable has numerical values that measure some characteristic of each individual, such as height in centimeters or salary in dollars.

quantitative

Histogram are for

quantitative data.

Bar graphs can also compare any set of ______.

quantities measured in the same units.

When you use the median to indicate the center of a distribution, describe the spread using the _____.

quartiles.

The median is a ______ measure of center because it is relatively unaffected by extreme observations.

resistant

You can describe the overall pattern by its ____, ____, ____.

shape, center, and spread.

When comparing distributions of quantitative data, be sure to discuss

shape, center, spread, and possible outliers.

It is _______ if the left side of the graph is much longer than the right side.

skewed to the left

A distribution is ________ if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.

skewed to the right

Inference

sometimes, we're interested in drawing conclusions that go beyond the data at hand.

A statistic problem has a real world setting. You can organize many problems use the four steps:

step, plan, do, and conclude.

Two-way tables are often used to ______.

summarize large amounts of information by grouping outcomes into categories.

Some distributions have simple shapes, such as _____ or _____.

symmetric and skewed

A distribution is roughly _____.

symmetric if the right and left sides of the graph are approximately mirror images of each other.

Quantitative Variable

takes numerical values for which it makes sense to find an average.

Marginal distributions

tell us nothing about the relationship between the variables.

Histograms plot _______.

the counts (frequencies) or percents (relative frequencies) of values in equal-width classes.

Histograms

the most common graph of the distribution of one quantitative variable.

Mean

the most common measure of center is the ordinary arithmetic average, or mean.

The box spans the quartiles and shows

the spread of the central half of the distribution.

Unimodal

they have a single peak.

The ______ and especially its square root, the _______, are common measures of spread about the mean as center.

variance, standard variation

Multimodal

with more than two clear peaks.


Conjuntos de estudio relacionados

500 SHRM-SCP Exam Study Prep 1200+

View Set

Chapter 28: Care of Patients Requiring Oxygen Therapy or Tracheostomy

View Set

Fortnite, Stump Mr. Wagner, Greece

View Set

BIOL 3204 Minghetti Ch 3 Test Bank

View Set

PQ 2.4.6 Data Encapsulation and Communications

View Set

Urinary and Bowel Elimination, Therapeutic Communication

View Set