AP Statistics Chapter 1

1.1 Summary 1

The distribution of a categorical variable lists the categories and gives the count (frequency table) or percent (relative frequency table) of individuals that fall in each category.
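
As an illustration, a frequency table and a relative frequency table can be built from a list of category labels. This is a minimal sketch; the `pets` data and the function names are made up for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: count} for a list of categorical values."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Return {category: percent of individuals} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical data: favorite pet reported by 8 students.
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat"]
counts = frequency_table(pets)            # counts per category
percents = relative_frequency_table(pets) # percents per category, summing to 100
```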

1.2 Summary 5

Remember: histograms are for quantitative data; bar graphs are for categorical data. Also, be sure to use relative frequency histograms when comparing data sets of different sizes.

1.1 Summary 4

The row totals and column totals in a two-way table give the marginal distributions of the two individual variables. It is clearer to present these distributions as percents of the table total. Marginal distributions tell us nothing about the relationship between the variables.

1.1 Categorical variable

A categorical variable places an individual into one of several groups or categories.

1.1 Conditional distribution

A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
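
A small sketch of the idea, assuming a two-way table stored as nested dictionaries. The table, its labels, and the function name are hypothetical.

```python
# Hypothetical two-way table of counts: rows are class year,
# columns are preferred study method.
table = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

def conditional_distribution(table, row):
    """Percent distribution of the column variable among individuals in one row."""
    counts = table[row]
    row_total = sum(counts.values())
    return {col: 100 * n / row_total for col, n in counts.items()}

# One conditional distribution for each value of the row variable.
junior_dist = conditional_distribution(table, "junior")
senior_dist = conditional_distribution(table, "senior")
```

Here the two conditional distributions differ (60% of juniors but only 20% of seniors prefer flashcards), which is what an association between the two variables looks like.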

1.2 Symmetric

A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other.

1.2 Skew to the left

A distribution is skewed to the left if the left side of the graph is much longer than the right side.

1.2 Skew to the right

A distribution is skewed to the right if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.

1.3 Summary 1

A numerical summary of a distribution should report at least its center and its spread, or variability.

1.1 Quantitative variable

A quantitative variable takes numerical values for which it makes sense to find an average.

1.1 Summary 6

A statistical problem has a real-world setting. You can organize many problems using the four steps state, plan, do and conclude.

1.1 Summary 3

A two-way table of counts organizes data about two categorical variables. Two-way tables are often used to summarize large amounts of information by grouping outcomes into categories.

1.1 Variable

A variable is any characteristic of an individual. A variable can take different values for different individuals.

1.1 Summary 8

An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is Simpson's paradox.

1.1 Simpson's paradox

An association between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This reversal is called Simpson's paradox.
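
The reversal can be checked with a few lines of arithmetic. The counts below are invented purely to illustrate the effect; the hospital names and the "condition" variable are hypothetical.

```python
# Hypothetical counts: (successes, patients) for two hospitals,
# split by a third variable, patient condition.
data = {
    "good condition": {"Hospital A": (9, 10), "Hospital B": (80, 100)},
    "poor condition": {"Hospital A": (30, 100), "Hospital B": (2, 10)},
}

def rate_within(data, condition, hospital):
    """Success rate for one hospital within one value of the third variable."""
    successes, patients = data[condition][hospital]
    return successes / patients

def rate_combined(data, hospital):
    """Success rate for one hospital after combining all conditions."""
    successes = sum(data[c][hospital][0] for c in data)
    patients = sum(data[c][hospital][1] for c in data)
    return successes / patients

# Hospital A does better within every condition...
a_wins_each = all(
    rate_within(data, c, "Hospital A") > rate_within(data, c, "Hospital B")
    for c in data
)
# ...yet Hospital B does better once the conditions are combined.
b_wins_overall = rate_combined(data, "Hospital B") > rate_combined(data, "Hospital A")
```

The reversal happens because Hospital A treats mostly poor-condition patients, so the combined rates mix very different group sizes.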

1.3 Summary 5

Boxplots based on the five-number summary are useful for comparing distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the smallest and the largest observations that are not outliers. Outliers are plotted as isolated points.

1.3 The 1.5*IQR rule for outliers

Call an observation an outlier if it falls more than 1.5*IQR above the third quartile or below the first quartile.
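
A minimal sketch of the rule, assuming the quartiles have already been found. The data set and the function name are hypothetical.

```python
def find_outliers(data, q1, q3):
    """Return the observations more than 1.5*IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with Q1 = 11 and Q3 = 17, so IQR = 6
# and the fences are 11 - 9 = 2 and 17 + 9 = 26.
data = [1, 10, 12, 13, 14, 15, 16, 18, 40]
outliers = find_outliers(data, q1=11, q3=17)
```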

1.3 Summary 9

Numerical summaries do not fully describe the shape of a distribution. Always plot your data.

1.1 Summary 2

Pie charts and bar graphs display the distribution of a categorical variable. Bar graphs can also compare any set of quantities measured in the same units. When examining any graph, ask yourself, "What do I see?"

1.2 Summary 3

Some distributions have simple shapes, such as symmetric or skewed. The number of modes (major peaks) is another aspect of overall shape. Not all distributions have a simple overall shape, especially when there are few observations.

1.1 Distribution

The distribution of a variable tells us what values the variable takes and how often it takes these values.

1.3 Summary 4

The five-number summary, consisting of the median, the quartiles, and the minimum and maximum values, provides a quick overall description of a distribution. The median describes the center, and the quartiles and extremes show the spread.

1.1 Marginal distribution

The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
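
As a sketch, both marginal distributions can be computed from the row and column totals, expressed as percents of the table total. The table and the function name are hypothetical.

```python
# Hypothetical two-way table of counts: rows are class year,
# columns are preferred study method.
table = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

def marginal_distributions(table):
    """Return (row marginal, column marginal) as percents of the table total."""
    grand_total = sum(sum(cols.values()) for cols in table.values())
    row_marginal = {
        row: 100 * sum(cols.values()) / grand_total for row, cols in table.items()
    }
    col_totals = {}
    for cols in table.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    col_marginal = {col: 100 * n / grand_total for col, n in col_totals.items()}
    return row_marginal, col_marginal

row_marginal, col_marginal = marginal_distributions(table)
```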

1.3 Summary 8

The mean and standard deviation are good descriptions for symmetric distributions without outliers. They are most useful for the normal distribution introduced in the next chapter. The median and IQR are a better description for skewed distributions.

1.3 Summary 2

The mean and the median describe the center of a distribution in different ways. The mean is the average of the observations, and the median is the midpoint of the values.

1.3 The median M

The median M is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution: 1. Arrange all observations in order of size, from smallest to largest. 2. If the number of observations n is odd, the median M is the center observation in the ordered list. 3. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.
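
The three steps above translate directly into code. This is a sketch with a hypothetical function name; it assumes at least one observation.

```python
def median(observations):
    """Find the median M by the three steps in the definition."""
    ordered = sorted(observations)   # step 1: arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                   # step 2: n odd, take the center observation
        return ordered[middle]
    # step 3: n even, average the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_case = median([5, 1, 3])       # ordered: 1, 3, 5
even_case = median([4, 1, 3, 2])   # ordered: 1, 2, 3, 4
```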

1.3 Summary 7

The median is a resistant measure of center because it is relatively unaffected by extreme observations. The mean is nonresistant. Among measures of spread, the IQR is resistant, but the standard deviation is not.
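
A quick check of resistance, using two made-up data sets that differ only in their largest value:

```python
from statistics import mean, median

original = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 100]   # hypothetical: largest value replaced by an extreme one

# How far each measure of center moves when the extreme value appears.
mean_shift = mean(with_extreme) - mean(original)
median_shift = median(with_extreme) - median(original)
```

The mean moves from 3 to 22, while the median stays at 3.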

1.3 The standard deviation s_x and variance s_x^2

The standard deviation s_x measures the typical distance of the observations from their mean. It is calculated by finding an average of the squared distances (dividing by n - 1 rather than n) and then taking the square root. This average squared distance is called the variance.
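
A sketch of the calculation, assuming the usual sample formula in which the sum of squared distances is divided by n - 1. The function names and data are hypothetical, and at least two observations are required.

```python
import math

def variance(observations):
    """Sample variance: sum of squared distances from the mean, divided by n - 1."""
    n = len(observations)
    x_bar = sum(observations) / n
    squared_distances = [(x - x_bar) ** 2 for x in observations]
    return sum(squared_distances) / (n - 1)

def standard_deviation(observations):
    """Square root of the sample variance."""
    return math.sqrt(variance(observations))

# Hypothetical data: mean is 5 and the squared distances sum to 32,
# so the variance is 32/7.
data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)
sd = standard_deviation(data)
```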

1.3 Summary 6

The variance and especially its square root, the standard deviation, are common measures of spread about the mean as center. The standard deviation is zero when there is no variability and gets larger as the spread increases.

1.1 Summary 5

There are two sets of conditional distributions for a two-way table: the distributions of the row variable for each value of the column variable, and the distributions of the column variable for each value of the row variable. You may want to use a side-by-side bar graph (or possibly a segmented bar graph) to display conditional distributions.

1.1 Summary 7

To describe the association between the row and column variables, compare an appropriate set of conditional distributions. Remember that even a strong association between two categorical variables can be influenced by other variables lurking in the background.

1.3 The mean x̄

To find the mean x̄ (pronounced "x-bar") of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, ..., xn, their mean is x̄ = (x1 + x2 + ... + xn)/n.

1.1 Association

We say that there is an association between two variables if specific values of one variable tend to occur in common with specific values of the other.

1.2 Summary 4

When comparing distributions of quantitative data, be sure to discuss shape, center, spread, and possible outliers.

1.2 Summary 2

When examining any graph, look for an overall pattern and for notable departures from that pattern. Shape, center, and spread describe the overall pattern of the distribution of a quantitative variable. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them. Don't forget your SOCS!

1.3 Summary 3

When you use the median to indicate the center of a distribution, describe its spread using the quartiles. The first quartile Q1 has about one-fourth of the observations below it, and the third quartile Q3 has about three-fourths of the observations below it. The interquartile range (IQR) is the range of the middle 50% of the observations and is found by IQR = Q3 - Q1. An extreme observation is an outlier if it is smaller than Q1 - (1.5*IQR) or larger than Q3 + (1.5*IQR).

1.2 Summary 1

You can use a dotplot, stemplot or histogram to show the distribution of a quantitative variable. A dotplot displays individual values on a number line. Stemplots separate each observation into a stem and a one-digit leaf. Histograms plot the counts (frequencies) or percents (relative frequencies) of values in equal-width classes.
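
As a sketch of what a histogram tabulates, the code below counts values in equal-width classes and converts the counts to percents. The two data sets, the class boundaries, and the function names are all hypothetical; values outside the classes are ignored.

```python
def class_counts(data, start, width, n_classes):
    """Count observations in n_classes equal-width classes beginning at start."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)   # class that x falls into
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

def relative_frequencies(counts):
    """Convert class counts to percents of all counted observations."""
    total = sum(counts)
    return [100 * c / total for c in counts]

# Two hypothetical data sets of different sizes; classes are 0-<10, 10-<20, 20-<30.
small = [3, 12, 15, 25]
large = [1, 8, 11, 13, 15, 19, 22, 27]

small_counts = class_counts(small, start=0, width=10, n_classes=3)
large_counts = class_counts(large, start=0, width=10, n_classes=3)
small_percents = relative_frequencies(small_counts)
large_percents = relative_frequencies(large_counts)
```

The counts differ because the data sets differ in size, but the relative frequencies match, which is why percents are the fairer basis for comparison.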

1.1 Individuals

Individuals are the objects described by a set of data. Individuals may be people, animals, or things.

1.3 The five-number summary

minimum, Q1, median, Q3, maximum
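
A sketch of one way to compute the five numbers. It assumes the common classroom convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd; software often uses other quartile rules and may give slightly different values. The function name and data are hypothetical, and at least two observations are required.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]          # observations below the median position
    upper_half = values[(n + 1) // 2 :]    # observations above the median position
    return (
        values[0],
        median(lower_half),
        median(values),
        median(upper_half),
        values[-1],
    )

# Hypothetical data, given in no particular order.
summary = five_number_summary([14, 1, 40, 10, 18, 12, 16, 13, 15])
```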

