MATH 1680 Chapter 3


Symmetric box plot

The median is in the center of the box. The left and right whiskers are roughly the same length.

Skewed right box plot

The median is left of center in the box. The left whisker is shorter than the right whisker.

Skewed left box plot

The median is right of center in the box. The left whisker is longer than the right whisker.

Empirical Rule

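For a bell-shaped distribution, approximately 68% of the data lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.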
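The three percentages describe how much of a bell-shaped (normal) distribution lies within 1, 2, and 3 standard deviations of the mean. As a rough check, assuming an exactly normal distribution, they can be recovered from the error function; the helper name `share_within` is made up for this sketch.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Print the share as a percentage rounded to one decimal place.
    print(k, round(share_within(k) * 100, 1))
# prints 68.3, 95.4, and 99.7 for k = 1, 2, 3
```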

Define what it means for a numerical summary of data to be resistant.

A numerical summary of data is said to be resistant if values that are extreme (very large or very small) relative to the rest of the data do not affect its value substantially.

What does a z-score represent?

A z-score represents the distance of a data value from the mean, measured in standard deviations.

What does a z-score measure?

A z-score measures the number of standard deviations an observation is above or below the mean. For example, a z-score of 1.24 means the data value is 1.24 standard deviations above the mean. A z-score of −2.31 means the data value is 2.31 standard deviations below the mean.
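A small sketch of the calculation may help. The function name `z_score` and the exam figures (mean 70, standard deviation 8) are invented for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

# Hypothetical exam with mean 70 and standard deviation 8.
print(z_score(80, 70, 8))  # 1.25  (1.25 standard deviations above the mean)
print(z_score(60, 70, 8))  # -1.25 (1.25 standard deviations below the mean)
print(z_score(70, 70, 8))  # 0.0   (equal to the mean)
```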

Explain how to compute the arithmetic mean of a variable.

Add all the values of the variable in the data set and divide by the number of observations
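As a minimal sketch of that rule, with made-up scores and a hypothetical helper name `arithmetic_mean`:

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

scores = [82, 77, 90, 71, 85]   # the five scores sum to 405
print(arithmetic_mean(scores))  # 81.0
```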

When describing the shape of a distribution from a boxplot, be sure to justify your conclusion. Possible areas to discuss:

Compare the length of the left whisker to the length of the right whisker.
Describe the position of the median in the box.
Compare the distance between the median and the first quartile to the distance between the median and the third quartile.
Compare the distance between the median and the minimum value to the distance between the median and the maximum value.

When an observation that is much larger than the rest of the data is added to a data set, the value of the median will increase substantially.

False

What does a positive z-score for a data value indicate? What does a negative z-score indicate?

If a data value is larger than the mean, the z-score is positive. If a data value is smaller than the mean, the z-score is negative. If the data value equals the mean, the z-score is zero.

When comparing two populations, what does a larger standard deviation imply about dispersion?

It implies that there is a greater dispersion or spread of the distribution provided the variable of interest from the two populations has the same unit of measure.

If the shape of a distribution is skewed left or skewed right, which measure of central tendency and which measure of dispersion should be reported? Why?

It is best to use the median as the measure of central tendency and the interquartile range as the measure of dispersion because these measures are resistant

What does a measure of central tendency describe?

It numerically describes the average or "typical" data value. In everyday language, the word average often refers to the arithmetic mean. (To compute the arithmetic mean of a set of data, the data must be quantitative.)

For a distribution that is symmetric, which of the following is true?

Mean = median

If the shape of a distribution is symmetric, which measure of central tendency and which measure of dispersion should be reported?

Mean should be the measure of central tendency and standard deviation should be the measure of dispersion

What is an outlier?

Outliers are extreme observations in data sets. They can occur by chance, because of errors in measurement of a variable, during data entry, or from errors in sampling.

Which measure of dispersion is resistant?

The interquartile range (IQR). Quartiles are resistant, and for this reason they are used to define a resistant measure of dispersion.

______ is not resistant

Range

Is standard deviation resistant? Why or why not?

Standard deviation is NOT resistant because adding, removing, or changing an extreme value can dramatically increase or decrease the standard deviation.

List the four steps for checking for outliers by using quartiles.

Step 1: Determine the first and third quartiles of the data.
Step 2: Compute the interquartile range, IQR = Q3 − Q1.
Step 3: Determine the fences, which serve as cutoff points for determining outliers. Lower fence = Q1 − 1.5(IQR). Upper fence = Q3 + 1.5(IQR).
Step 4: If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
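The fence check can be sketched as follows. The quartiles are passed in rather than computed, since quartile conventions vary between textbooks and software; the function name `find_outliers` and the data are illustrative only.

```python
def find_outliers(values, q1, q3):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # A value is an outlier only if it falls strictly outside the fences.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

data = [1, 3, 5, 7, 9, 11, 40]
# Assumed quartiles for this data: Q1 = 3 and Q3 = 11, so IQR = 8.
print(find_outliers(data, q1=3, q3=11))  # (-9.0, 23.0, [40])
```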

List the three steps for finding quartiles.

Step 1: Arrange the data in ascending order.
Step 2: Determine the median, M, or second quartile, Q2.
Step 3: Divide the data set into two halves: the observations less than M and the observations greater than M. The first quartile, Q1, is the median of the bottom half, and the third quartile, Q3, is the median of the top half. Do not include M in these halves.
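One way to sketch these steps in code is shown below. It splits the sorted data by position, which is an assumption about how to handle repeated values equal to M; other textbooks and software packages use different quartile conventions and can give slightly different answers. The function names are made up for this sketch.

```python
def median(ordered):
    """Median of a list that is already sorted in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, M, Q3), leaving M out of both halves when n is odd."""
    ordered = sorted(values)               # Step 1
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    m = median(ordered)                    # Step 2
    half = n // 2
    bottom = ordered[:half]                # Step 3: bottom half
    top = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(bottom), m, median(top)

print(quartiles([13, 1, 9, 5, 11, 3, 7]))    # (3, 7, 11)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # (2.5, 4.5, 6.5)
```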

List the three steps in finding the median of a data set.

Step 1: Arrange the data in ascending order.
Step 2: Determine the number of observations, n.
Step 3: Determine the observation in the middle of the data set. If n is odd, the median is the data value exactly in the middle of the data set, that is, the observation in the (n+1)/2 position. If n is even, the median is the mean of the two middle observations, that is, the observations in the n/2 position and the (n/2) + 1 position.
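These steps translate fairly directly into code. In the sketch below, the positions from the card are 1-based while Python lists are 0-based, so the (n+1)/2 position becomes index n // 2; the function name `find_median` is just illustrative.

```python
def find_median(values):
    """Median following the three steps: sort, count, pick the middle."""
    ordered = sorted(values)   # Step 1: ascending order
    n = len(ordered)           # Step 2: number of observations
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2               # Step 3: locate the middle
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: mean of 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(find_median([5, 1, 3]))      # 3
print(find_median([4, 1, 3, 2]))   # 2.5
```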

List the five steps for drawing a boxplot.

Step 1: Determine the lower and upper fences: LF = Q1 − 1.5(IQR) and UF = Q3 + 1.5(IQR), where IQR = Q3 − Q1.
Step 2: Draw a number line long enough to include the maximum and minimum values. Insert vertical lines at Q1, M, and Q3. Enclose those vertical lines in a box.
Step 3: Label the lower and upper fences with a temporary mark.
Step 4: Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. These lines are called whiskers.
Step 5: Plot any data values less than the lower fence or greater than the upper fence as outliers. Outliers are marked with an asterisk (*). Remove the temporary marks labeling the fences.
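Before drawing, it can help to compute where the whiskers end and which points get an asterisk. The sketch below does only that arithmetic and does not draw anything; the quartiles are supplied by the caller, and the function name and data are hypothetical.

```python
def boxplot_whiskers(values, q1, q3):
    """Return (left whisker end, right whisker end, outliers) for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return min(inside), max(inside), outliers

data = [2, 10, 12, 13, 15, 16, 18, 19, 21, 22, 45]
# Assumed quartiles: Q1 = 12 and Q3 = 21, so the fences are -1.5 and 34.5.
print(boxplot_whiskers(data, q1=12, q3=21))  # (2, 22, [45])
```

Here the right whisker stops at 22, and 45 is plotted separately as an outlier.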

What is the range of a variable?

The difference between the largest and smallest data value

Define the first, second, and third quartiles

The first quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. The second quartile, Q2, divides the bottom 50% of the data from the top 50%. The third quartile, Q3, divides the bottom 75% of the data from the top 25%.

What values does the five-number summary consist of?

The five-number summary of a set of data consists of the smallest data value, Q1, the median, Q3, and the largest data value. We use the five-number summary to learn information about the extremes of the data set. The summary is organized as: Minimum Q1 M(edian) Q3 Maximum

If a data set has many values that are "far" from the mean, how is the standard deviation affected?

The further an observation is from the mean, the larger the squared deviation. If a data set has many observations that are "far" from the mean, then the sum of the squared deviations will be large and the standard deviation will be large.
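To see the effect numerically, the sketch below compares two made-up data sets with the same mean. It uses the population formula (dividing by n); the sample formula divides by n − 1 instead, but the comparison comes out the same way.

```python
import math

def population_std_dev(values):
    """Square root of the average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)

close_to_mean = [9, 10, 10, 11]   # mean 10, squared deviations sum to 2
far_from_mean = [1, 5, 15, 19]    # mean 10, squared deviations sum to 212

print(population_std_dev(close_to_mean))  # about 0.71
print(population_std_dev(far_from_mean))  # about 7.28
```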

Explain the circumstances for which the interquartile range is the preferred measure of dispersion. What is an advantage that the standard deviation has over the interquartile range?

The interquartile range is preferred when the data are skewed or have outliers. An advantage of the standard deviation is that it uses all the observations in its computation.

Define the interquartile range, IQR.

The interquartile range, IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles, found using the formula IQR = Q3 − Q1.

Why is the median resistant but the mean is not?

The mean is not resistant because when data are skewed, there are extreme values in the tail, which tend to pull the mean in the direction of the tail. The median is resistant because the median of a variable is the value that lies in the middle of the data when arranged in ascending order and does not depend on the extreme values of the data.
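A quick illustration with invented numbers: adding one extreme value moves the mean a long way but barely moves the median. The local `mean` helper is defined here only to keep the sketch self-contained.

```python
import statistics

def mean(values):
    return sum(values) / len(values)

original = [10, 12, 13, 15, 20]
with_extreme = original + [200]   # add one very large observation

print(mean(original), statistics.median(original))          # 14.0 13
print(mean(with_extreme), statistics.median(with_extreme))  # 45.0 14.0
```

The mean jumps from 14 to 45, while the median only moves from 13 to 14.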

Define the mode of a variable.

The observation of the variable that occurs most frequently in the data set

What symbols are used to represent the population mean and the sample mean?

The population arithmetic mean, μ (pronounced "mew"), is a parameter that is computed using data from all the individuals in a population. The sample arithmetic mean, x̄ (pronounced "x-bar"), is a statistic that is computed using data from individuals in a sample.

What is the mean of the data?

The value such that a histogram of the data is perfectly balanced, with equal weight on each side of the mean

Define the median of a variable.

The value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.

Define variance.

The variance of a variable is the square of the standard deviation.
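The relationship can be checked with Python's standard `statistics` module, which offers both population (`pstdev`, `pvariance`) and sample (`stdev`, `variance`) versions. The data set below is made up, and `math.isclose` is used because floating-point squaring may not reproduce the variance exactly.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared deviations sum to 32

pop_sd = statistics.pstdev(data)       # population standard deviation
pop_var = statistics.pvariance(data)   # population variance
print(math.isclose(pop_sd ** 2, pop_var))    # True

samp_sd = statistics.stdev(data)       # sample standard deviation
samp_var = statistics.variance(data)   # sample variance
print(math.isclose(samp_sd ** 2, samp_var))  # True
```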

Range, standard deviation, and variance are not resistant.

True

List the conditions for determining when to use mean:

Use mean when data are quantitative and the frequency distribution is roughly symmetric

List the conditions for determining when to use median:

Use median when the data are quantitative and the frequency distribution is skewed left or skewed right

List the conditions for determining when to use mode:

Use mode when the most frequent observation is the desired measure of central tendency or the data are qualitative.

State the reason that we compute the mean.

We compute the mean because much of statistical inference is based on the mean

The standard deviation can be negative.

False

When an observation that is much larger than the rest of the data is added to a data set, the value of the mean will

increase

For a distribution that is skewed right, the median is ___________ of the box

left of center

For a distribution that is skewed left, which of the following is true?

mean < median

For a distribution that is skewed right, which of the following is true?

mean > median

In distributions that are skewed to the right, what is the relationship of the mean, median, and mode?

mean > median > mode

After all, the mean and median are close in value for symmetric data, and the ___________ is the better measure of central tendency for skewed data.

median

Therefore, when the distribution of data is highly skewed or contains extreme observations, it is best to use the _________ as the measure of central tendency and the interquartile range as the measure of dispersion because these measures are resistant.

median

Which measure, the mean or the median, is resistant?

median is resistant, mean is not resistant

With ____________ z-scores, we need to be careful when deciding the better outcome. For example, when comparing finishing times for a marathon the lower score is better because it is more standard deviations below the mean.

negative

The interpretation of the interquartile range is the range of the middle 50% of the data. The more spread a set of data has, the higher the interquartile range will be. The interquartile range, IQR, is a __________ measure of dispersion.

resistant

Which measure, the mean or the median, is least affected by extreme observations?

the median

The _______ represents the number of standard deviations an observation is from the mean.

z score

The sum of the deviations about the mean always equals _______

zero
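A short check with made-up data shows the deviations cancelling out. With other data, floating-point rounding can leave a tiny nonzero remainder, so comparing against a small tolerance is safer than testing for exact equality.

```python
data = [3, 8, 10, 15]
mean = sum(data) / len(data)              # 36 / 4 = 9.0

deviations = [x - mean for x in data]     # [-6.0, -1.0, 1.0, 6.0]
total = sum(deviations)

print(total)                 # 0.0
print(abs(total) < 1e-9)     # True
```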

True or False: When comparing two populations, the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure.

True, because the standard deviation describes how far, on average, each observation is from the typical value. A larger standard deviation means that observations are more distant from the typical value and, therefore, more dispersed.

