Statistics Chapter 2

typical

The center of a data set tells us information about the "___________" value. We will also use numerical measurements to describe the shape of the data.

minus

The deviation of an observation xi is how far it is from the mean... xi _________ the mean

zero

The sum of the deviations is always _______.

variance

s^2 = 1/(n - 1) x the sum of the squared deviations
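
As a quick check on the deviation and variance cards, the sketch below computes deviations from the mean, confirms that they sum to zero, and divides the sum of squared deviations by n - 1. The data values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean, variance

# Hypothetical observations, used only for illustration.
data = [3, 4, 5, 8]

x_bar = mean(data)                        # mean = 20 / 4 = 5
deviations = [x - x_bar for x in data]    # xi minus the mean: [-2, -1, 0, 3]
sum_dev = sum(deviations)                 # deviations always sum to zero

# Sample variance: sum of squared deviations divided by n - 1.
s2 = sum(d ** 2 for d in deviations) / (len(data) - 1)   # 14 / 3

print(sum_dev, s2, variance(data))
```

The hand computation agrees with `statistics.variance`, which also uses the n - 1 divisor.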

four-step process

1. *State*: What is the practical question in the context of the real-world setting? 2. *Plan*: What specific statistical operations does this problem call for? 3. *Solve*: Make the graphs and carry out all required calculations. 4. *Conclude*: Give your practical conclusion in the setting of the real-world problem.

quartiles

1. Arrange the observations in increasing order and locate the median M in the ordered list of observations. 2. The first quartile, Q1, is the median of the observations that are to the left of M. 3. The third quartile, Q3, is the median of the observations that are to the right of M.

3, 1

IQR = Q__ - Q__

below, above

Mathematically, a suspected outlier lies ___________ Q1 *-* 1.5 x IQR or __________ Q3 *+* 1.5 x IQR
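
The quartile steps and the 1.5 x IQR rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the median-of-halves rule described on the quartiles card; library quantile functions may use other conventions and give slightly different quartiles. The function names and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # observations to the left of M
    upper = ordered[half + n % 2:]     # observations to the right of M
    return median(lower), median(ordered), median(upper)

def suspected_outliers(values):
    """Flag observations below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical data set with one unusually large value.
data = [1, 3, 4, 5, 6, 7, 8, 9, 30]
print(quartiles(data))            # (3.5, 6, 8.5)
print(suspected_outliers(data))   # [30]
```

Here IQR = 8.5 - 3.5 = 5, so the fences sit at -4 and 16, and only 30 falls outside them.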

False... Since both mean and standard deviation are affected by strong outliers, they should not be used as measures for describing distributions when there is a strong outlier in the dataset.

True or false: The mean and standard deviation are always valid measures for describing a distribution even if there is a strong outlier in the dataset.

real data outlier

Which one of the following types of outliers can NOT be corrected or deleted? A. Real data outlier B. Typing error outlier C. Frivolous response outlier

symmetric

The mean and standard deviation should be used for reasonably ______________ distributions that are free of outliers.

The mean will increase

There are eight boys in a pre-school class. Their mean height is 33 inches and their median height is 33 inches. The tallest boy whose height is 38 inches moves away and is replaced by a boy whose height is 39 inches. How does this affect the mean?

interquartile range (IQR)

a measure of spread that plays a role in mathematically determining outliers. The IQR is the distance between Q1 and Q3.

standard deviation

used to describe the variation around the mean... s

modified

A ___________ boxplot is similar to a boxplot, but it shows suspected outliers as dots or another symbol, such as an asterisk.

five-number summary

Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation of a distribution, written in order from smallest to largest.

right

_________ skew = Mean > Median

outliers

_____________ can strongly influence the mean. The mean is not a resistant measure of center.

mean

A suspected outlier will influence which statistic the most?

no

Since the standard deviation depends on the mean, is it resistant to outliers and skewness?

5-number summary

The ___-___________ _________ is resistant to strong *outliers*.

interquartile range

The distance between the first and third quartiles.

Median... The median will be less affected by outliers and is a good choice for this data set.

When examining the gas mileage for cars of a particular make and model, you notice that most of the cars have similar gas mileages but one or two have gas mileages very different from the others. Should you use the mean or the median to describe the center?

left

_______ skew = Mean < Median

mean

Standard deviation measures spread about the _________ and should be used when the mean is most appropriate.

median

the midpoint of a distribution: the number such that half of the observations are smaller than it and the other half are larger... M

first quartile, Q1

the value in the sample that has 25% of the data at or below it.

third quartile, Q3

the value in the sample that has 75% of the data at or below it.

1.5 x IQR rule for outliers

Call an observation a suspected outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.

False... The central box spans where the middle 50% of the data lie regardless of how wide the box is.

True or false: If a boxplot has a wide central box, a high percentage of the data falls between the quartiles Q1 and Q3.

False... Deleting the value of $75.58 would reduce the mean from $34.32 to $28.43.

A shopper at a local supermarket spent the following amounts in his last eight trips to the store: $32.92 $14.14 $30.80 $28.34 $75.58 $36.33 $33.51 $22.94 True or false: Even though $75.58 appears to be an outlier, deleting it would not change the value of the mean.
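
The effect of the suspected outlier on the mean can be checked directly from the eight amounts in the question. The variable names below are illustrative only.

```python
# Amounts spent on the shopper's last eight trips, taken from the question.
amounts = [32.92, 14.14, 30.80, 28.34, 75.58, 36.33, 33.51, 22.94]

mean_all = sum(amounts) / len(amounts)

# Drop the suspected outlier and recompute the mean on the remaining seven.
without_outlier = [a for a in amounts if a != 75.58]
mean_without = sum(without_outlier) / len(without_outlier)

print(round(mean_all, 2))       # 34.32
print(round(mean_without, 2))   # 28.43
```

Removing a single value shifts the mean by almost six dollars, which is why the mean is not a resistant measure of center.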

123

A statistics class has 245 students. To find the median score on the first midterm, you should first order the exam scores and then find the score in the ___rd position. Give your answer as a whole number.

1.5

A suspected outlier falls more than _____ "IQRs" away from either Q1 or Q3.

6.5

Below is the five-number summary for the tar content of 25 different brands of cigarettes. Min=1, Q1=8.5, Median=12.6, Q3=15, Max=17 The value of the interquartile range is _________________. Give answer as X.X.

boxplot

a graphical display of the five-number summary... A central box spans the middle 50% of the data (marked by the first and third quartiles)... A line in the box marks the median M... Lines extend from the box out to the smallest and largest observations.

Median... Since the median is resistant to the outliers of a few very wealthy individuals, it is a better choice than the mean.

For the dataset, incomes for people in the United States, should you use the mean or the median to describe the center?

even

If n is ________, the median M is midway between the two center observations in the ordered list.

odd

If the number of observations (n) is _______, the median M is the center observation in the list.
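
The odd and even cases for the median can be written as a short sketch. The helper names `median_position` and `median_of` are hypothetical; positions are counted from 1, as in the cards.

```python
def median_position(n):
    """1-based position(s) of the median in an ordered list of n observations."""
    if n % 2 == 1:
        return ((n + 1) // 2,)        # odd n: the single center observation
    return (n // 2, n // 2 + 1)       # even n: the two center observations

def median_of(values):
    """Median as the center observation, or midway between the two centers."""
    ordered = sorted(values)
    positions = median_position(len(ordered))
    return sum(ordered[p - 1] for p in positions) / len(positions)

print(median_position(245))      # (123,)
print(median_of([5, 1, 3]))      # 3.0
print(median_of([4, 1, 3, 2]))   # 2.5
```

For a class of 245 students the median is the score in position (245 + 1) / 2 = 123 of the ordered list.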

median... Since the median is resistant to the outliers of a few very truant students, it is a better choice than the mean.

If you were interested in studying the number of days missed during the last school year for 9th graders in a large urban school district, should you use the mean or the median to describe the center? Hint: Consider whether or not there would be any outliers.

skewed

In a ______________ distribution, the mean is usually farther in the long tail than the median.

center, spread

Numerical measures of ____________ and __________ report specific facts about a distribution. Be sure to produce graphical displays to better understand the behavior of the data.

variance

Prior to finding the standard deviation, we first calculate the _______________ s^2, the average of the squares of the deviations of the observations from their mean.

resistant measure

Relatively unaffected by changes in the numerical value of a small proportion of the total number of observations of any aspect of a distribution, no matter how large these changes are.

increases

Standard deviation ____________ as the observations become more spread out about their mean.

units of measurement

Standard deviation has the same ________ ____ ___________ as the original observations.

greater

Standard deviation is always ___________ than or equal to 0.

resistant

Standard deviation, like the mean, is NOT ____________ to outliers and skewness.

spread

The *IQR* is a resistant measure of __________.

center

The *median* is a resistant measure of _________.

median

The ___________ is often unaffected by extreme observations like outliers. It is a more resistant measure than the mean.

symmetric

The mean and median of a roughly ______________ distribution are close together. If the distribution is exactly this, the mean and median are exactly the same.

variability

The mean is a measure of center whereas the standard deviation measures the ____________ of data about the mean.

square root

The standard deviation is the __________ ________ of the variance.

stay the same, decrease

There are three children in a room, ages 3, 4, and 5. If another 4 year old enters the room, the mean age will ... but the variance will ...
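
A short check of the ages example, using Python's `statistics` module (whose `variance` function uses the n - 1 divisor):

```python
from statistics import mean, variance

before = [3, 4, 5]        # ages of the three children
after = before + [4]      # another 4-year-old enters the room

mean_before, mean_after = mean(before), mean(after)
var_before, var_after = variance(before), variance(after)

print(mean_before, mean_after)   # both means equal 4
print(var_before, var_after)     # 1 versus 2/3
```

The new observation sits exactly at the mean, so it adds nothing to the sum of squared deviations while n - 1 grows from 2 to 3, and the variance drops from 1 to 2/3.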

second quartile, Q2

synonymous with the median

False... Two datasets can have the same values for mean and standard deviation, but totally different shapes. For example, one data set could be right-skewed, and the other could be left-skewed.

True or false: If two datasets have the same values for the mean and standard deviation, then their shapes will also be the same.

True... Since none of these are computed using values in the lower or upper 10% of the data, they are resistant measures

True or false: The first quartile, the median and the third quartile are all resistant measures.

True... The sum of the data values has the same unit of measure as the data values. Since the mean is computed by dividing this sum by n, it has the same unit of measure.

True or false: The mean has the same unit of measure as the data values.

true

True or false: You should always plot your data because numerical summaries do not reveal multiple peaks, clusters or outliers.

outliers

When extreme observations are involved, be sure to choose statistical methods that are not influenced by ____________.

mean

an arithmetic average and is calculated by summing the observations and then dividing by the number of observations. It is considered the "balance point" of the distribution.

