12.1 Median and quartiles

The five-number summary consists of

the median M to measure center, and the two quartiles Q1 and Q3 together with the smallest and largest observations to describe spread.

s is calculated by finding the mean first, then the squared distance of each observation from the mean, then the variance, and finally the standard deviation.

1) Find the mean x̄: the sum of the observations divided by n. 2) For each observation, compute its squared distance from the mean, (observation − x̄)², and add up all those results. 3) The variance s² is a kind of average of the squared distances: divide that sum by n − 1. 4) The standard deviation s is the square root of the variance.
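The four steps can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `sample_std_dev` and the data values are made up for the example, and it assumes the usual sample convention of dividing the sum of squared distances by n − 1.

```python
import math

def sample_std_dev(observations):
    """Standard deviation s, computed step by step (divides by n - 1)."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    # Step 1: the mean is the sum of the observations over n.
    mean = sum(observations) / n
    # Step 2: squared distance of each observation from the mean.
    squared_distances = [(x - mean) ** 2 for x in observations]
    # Step 3: the variance is the sum of squared distances over n - 1.
    variance = sum(squared_distances) / (n - 1)
    # Step 4: the standard deviation is the square root of the variance.
    return math.sqrt(variance)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_std_dev(data)
print(round(s, 3))
```

Here the mean is 5 and the squared distances add up to 32, so the variance is 32/7 and s is its square root.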

The five-number summary of a distribution consists of (written in order from smallest to largest)... minimum Q1 M Q3 maximum

1) the smallest observation 2) the first quartile 3) the median 4) the third quartile 5) the largest observation. * The five-number summary is not the most common numerical description of a distribution. * The five-number summary is easy to understand and is the best short description for most distributions. * The five-number summary, with its two quartiles and two extremes, does a better job in describing skewed distributions.

A boxplot is a graph of the five-number summary.

A central box spans the quartiles. A line in the box marks the median. Lines extend from the box out to the smallest and largest observations.

s > 0

As the observations become more spread out about their mean, s gets larger.

You can draw boxplots either horizontally or vertically.

Because boxplots show less detail than histograms or stemplots, they are best used for side-by-side comparison of more than one distribution. Be sure to include a numerical scale in the graph.

Avoid the standard deviation in describing skewed distributions.

Because the two sides of a strongly skewed distribution have different spreads, no single number such as s describes the spread well. In most situations, it is wise to use x̄ and s only for distributions that are roughly symmetric.

Normal distribution N(μ, σ).

Distribution described by a special family of bell-shaped, symmetric density curves, called Normal curves. The mean μ and standard deviation σ completely specify a Normal distribution N(μ, σ).
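A short Python sketch can illustrate what "completely specify" means. It uses `statistics.NormalDist` from the standard library; the particular values of μ and σ are hypothetical. Whatever values are chosen, half the distribution lies below μ (symmetry), and the proportion within one σ of μ comes out the same.

```python
from statistics import NormalDist

def share_within_one_sd(mu, sigma):
    """Proportion of an N(mu, sigma) distribution between mu - sigma and mu + sigma."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

# Two hypothetical Normal distributions with very different mean and spread.
share_a = share_within_one_sd(mu=100.0, sigma=15.0)
share_b = share_within_one_sd(mu=0.0, sigma=1.0)

# Symmetry: half of the distribution lies below the mean.
below_mean = NormalDist(mu=100.0, sigma=15.0).cdf(100.0)

print(round(share_a, 4), round(share_b, 4), below_mean)
```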

Always start with a graph of your data

Do remember that a graph gives the best overall picture of a distribution. Numerical measures of center and spread report specific facts about a distribution, but they do not describe its entire shape. Numerical summaries do not disclose the presence of multiple peaks or gaps, for example.

Q1

Has one-fourth of the observations below it. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median. The overall median is not included in the observations considered to be to the left of the overall median.

Q3

Has three-fourths of the observations below it. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median. The overall median is not included in the observations considered to be to the right of the overall median.

Median M.

Midpoint of the values in a data set: the value that separates the smaller half of the observations from the larger half. To find it, order the observations from smallest to largest. If n is odd, M is the middle observation; if n is even, M is the average of the two middle observations.

Quartile

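Statistic used to describe the spread of a distribution. When n is odd, the median is the central observation in the ordered list; ignore this central observation when finding the quartiles. The quartiles are much less sensitive to a few extreme observations than the standard deviation is.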
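The rules in the Q1, Q3, median, and quartile cards can be sketched together in Python. This is an illustration under the convention stated in these cards (the overall median is excluded from both halves when n is odd); statistical software often uses other quartile rules and may give slightly different values. The function names and data are made up for the example.

```python
def median(ordered):
    """Median of a list that is already sorted from smallest to largest."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd n: the middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle ones

def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]  # observations to the left of the median's location
    # For odd n, skip the central observation itself.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data sets: one with odd n, one with even n.
odd_summary = five_number_summary([13, 5, 9, 1, 7, 3, 11])
even_summary = five_number_summary([1, 3, 5, 7, 9, 11])
print(odd_summary)   # (1, 3, 7, 11, 13)
print(even_summary)  # (1, 3, 6.0, 9, 11)
```

In the odd case the median 7 is left out of both halves, so Q1 is the median of 1, 3, 5 and Q3 is the median of 9, 11, 13.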

Why is the mean so much higher than the median in a stemplot of the salaries, with millions as stems?

The distribution is skewed to the right and there are high outliers. If we drop the outliers, the mean drops drastically. The median doesn't change nearly as much.
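A small Python sketch shows the effect. The salary figures below are invented for illustration (they are not the data the card refers to), and the cutoff of 10 million used to drop the outliers is likewise an assumption.

```python
from statistics import mean, median

# Hypothetical salaries in millions: mostly modest, with two high outliers.
salaries = [0.5, 0.6, 0.8, 1.0, 1.2, 1.5, 2.0, 18.0, 25.0]
without_outliers = [x for x in salaries if x < 10]

mean_all, median_all = mean(salaries), median(salaries)
mean_trimmed, median_trimmed = mean(without_outliers), median(without_outliers)

# How far each measure moves when the outliers are dropped.
mean_drop = mean_all - mean_trimmed
median_drop = median_all - median_trimmed

print(round(mean_all, 2), median_all)
print(round(mean_trimmed, 2), median_trimmed)
```

With these made-up numbers the mean falls by several million when the two outliers are removed, while the median moves by only a fraction of a million.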

Choosing a summary: the five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers. Use x̄ and s only for reasonably symmetric distributions that are free of outliers.

The mean and standard deviation are strongly affected by outliers or by the long tail of a skewed distribution. The median and quartiles are less affected.

Mean and Median contrasts

The most important distinction is that the mean (the average) is strongly influenced by a few extreme observations and the median (the midpoint) is not.

Why do we bother with the standard deviation at all?

The mean and standard deviation are the natural measures of center and spread for an important kind of symmetric distribution, called the Normal distribution.

variability

You can see that risk (variability) goes up as the mean return goes up, just as financial theory claims.

The mean and standard deviation can be changed a lot by ...

a few outliers

To calculate the quartiles first...

arrange the observations in increasing order and locate the median M in the ordered list of observations.

The standard deviation s measures spread as a kind of ...

average distance from the mean, so use it only with the mean

The simplest useful description of a distribution consists of

both a measure of center and a measure of spread.

The mean and median are the same for symmetric distributions, but the mean moves...

farther toward the long tail of a skewed distribution.

In general, use the five-number summary to describe most distributions and the mean and standard deviation only for...

roughly symmetric distributions.

Properties of the standard deviation s

s measures spread about the mean x̄. Use s to describe the spread of a distribution only when you use x̄ to describe the center. s = 0 only when there is no spread. This happens only when all observations have the same value. Standard deviation zero means no spread at all.
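These properties can be checked quickly with `statistics.stdev` from the Python standard library, which computes the sample standard deviation. The three data sets are hypothetical; they share the same mean (5) and differ only in how spread out they are.

```python
from statistics import stdev

same = [5, 5, 5, 5]   # all observations equal: no spread
tight = [4, 5, 5, 6]  # observations close to the mean
wide = [1, 5, 5, 9]   # observations far from the mean

s_same = stdev(same)
s_tight = stdev(tight)
s_wide = stdev(wide)

print(s_same, round(s_tight, 3), round(s_wide, 3))
```

The first value is zero, and s grows as the observations move farther from their mean.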

Standard deviation measures...

the average distance of the observations from their mean. The mean and standard deviation are harder to understand but are more common.

The mean x̄ is...

the average of the observations

There are two common descriptions of center and spread...

the five-number summary and the mean and standard deviation.

A boxplot is a graph of...

the five-number summary.

When you look at a boxplot first locate...

the median, which marks the center of the distribution. Then look at the spread. The quartiles show the spread of the middle half of the data, and the extremes (the smallest and largest observations) show the spread of the entire data set.

The variance is...

the square of the standard deviation.

A back-to-back stemplot is better yet for...

very small numbers of observations

If we have data on a single quantitative variable, we start with a histogram or stemplot to display the distribution. Then...

we add numbers to describe the center and spread of the distribution.

