Statistics Chapter 2
typical
The center of a data set tells us information about the "___________" value. We will also use numerical measurements to describe the shape of the data.
minus
The deviation of an observation x_i is how far it is from the mean... x_i _________ the mean
zero
The sum of the deviations is always _______.
variance
s^2 = 1/(n-1) x the sum of the squared deviations
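A minimal Python sketch of the deviation and variance cards, using a small made-up dataset. The helper names deviations and sample_variance are hypothetical. The standard library's statistics.variance, which also divides by n - 1, is used only as a cross-check.

```python
from statistics import variance  # standard-library sample variance, used as a cross-check


def deviations(data):
    """Deviation of each observation x_i from the mean: x_i minus the mean."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]


def sample_variance(data):
    """s^2 = 1/(n - 1) x the sum of the squared deviations."""
    squared = [d ** 2 for d in deviations(data)]
    return sum(squared) / (len(data) - 1)


data = [2, 4, 4, 5, 7, 8]  # hypothetical observations, mean = 5

print(sum(deviations(data)))         # 0.0: the deviations sum to zero
print(sample_variance(data))         # 4.8
print(variance(data))                # 4.8 from the statistics module
print(sample_variance(data) ** 0.5)  # standard deviation s = square root of s^2
```

Squaring the deviations before summing is what keeps them from cancelling out to zero.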
four-step process
1. *State*: What is the practical question in the context of the real-world setting? 2. *Plan*: What specific statistical operations does this problem call for? 3. *Solve*: Make the graphs and carry out all required calculations. 4. *Conclude*: Give your practical conclusion in the setting of the real-world problem.
quartiles
1. Arrange the observations in increasing order and locate the median M in the ordered list of observations. 2. The first quartile, Q1, is the median of the observations that are to the left of M. 3. The third quartile, Q3, is the median of the observations that are to the right of M.
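The three quartile steps can be sketched in Python as follows. This assumes the convention that the median itself is left out of both halves when n is odd. Statistical software may use a different rule and report slightly different quartiles. The function names are illustrative.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]  # odd n: the center observation
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2  # even n: midway between the two centers


def quartiles(data):
    """Return (Q1, M, Q3). Assumes at least two observations."""
    values = sorted(data)            # step 1: order the observations
    n = len(values)
    lower = values[: n // 2]         # observations to the left of M
    upper = values[(n + 1) // 2 :]   # observations to the right of M
    return median(lower), median(values), median(upper)


print(quartiles([1, 3, 5, 7, 9, 11, 13]))  # (3, 7, 11)
print(quartiles([8, 1, 4, 2, 7, 3, 6, 5]))  # (2.5, 4.5, 6.5)
```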
3, 1
IQR = Q__ - Q__
below, above
Mathematically, a suspected outlier lies ___________ Q1 - 1.5 x IQR or __________ Q3 + 1.5 x IQR
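A small sketch of the 1.5 x IQR rule, assuming Q1 and Q3 have already been found. The function names and the data are hypothetical. "More than" 1.5 x IQR is read as a strict inequality, so a value sitting exactly on a fence is not flagged.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences Q1 - 1.5 x IQR and Q3 + 1.5 x IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def suspected_outliers(data, q1, q3):
    """List the observations strictly below the lower fence or strictly above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical data; by the median-of-halves method, Q1 = 12 and Q3 = 16.
data = [10, 12, 13, 14, 15, 16, 40]
print(outlier_fences(12, 16))            # (6.0, 22.0)
print(suspected_outliers(data, 12, 16))  # [40]
```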
False... Since both mean and standard deviation are affected by strong outliers, they should not be used as measures for describing distributions when there is a strong outlier in the dataset.
True or false: The mean and standard deviation are always valid measures for describing a distribution even if there is a strong outlier in the dataset.
real data outlier
Which one of the following types of outliers can NOT be corrected or deleted? A. Real data outlier B. Typing error outlier C. Frivolous response outlier
symmetric
The mean and standard deviation should be used for reasonably ______________ distributions that are free of outliers.
The mean will increase
There are eight boys in a pre-school class. Their mean height is 33 inches and their median height is 33 inches. The tallest boy, whose height is 38 inches, moves away and is replaced by a boy whose height is 39 inches. How does this affect the mean?
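The card does not list the individual heights, so the sketch below uses hypothetical heights chosen to match the stated mean and median (33 inches) and the tallest height (38 inches). Whatever the other heights are, the mean rises by (39 - 38) / 8 = 0.125 inch. The median, the average of the 4th and 5th heights, does not change.

```python
from statistics import mean, median

# Hypothetical heights in inches: mean = 33, median = 33, tallest = 38.
heights = [29, 30, 32, 33, 33, 34, 35, 38]

# Replace the tallest boy (38 in) with a 39-inch boy.
new_heights = heights.copy()
new_heights[new_heights.index(max(new_heights))] = 39

print(mean(heights), mean(new_heights))      # mean rises from 33 to 33.125
print(median(heights), median(new_heights))  # median stays at 33
```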
interquartile range (IQR)
a measure of spread that plays a role in mathematically determining outliers. The IQR is the distance between Q1 and Q3.
standard deviation
used to describe the variation around the mean... s
modified
A ___________ boxplot is similar to a standard boxplot, but it shows suspected outliers as dots or another symbol, such as an asterisk.
five-number summary
Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation of a distribution, written in order from smallest to largest.
right
_________ skew = Mean > Median
outliers
_____________ can strongly influence the mean. The mean is not a resistant measure of center.
mean
A suspected outlier will influence which statistic the most?
no
Since the standard deviation depends on the mean, is it resistant to outliers and skewness?
5-number summary
The ___-___________ _________ is resistant to strong *outliers*.
interquartile range
The distance between the first and third quartiles.
Median... The median will be less affected by outliers and is a good choice for this data set.
When examining the gas mileage for cars of a particular make and model, you notice that most of the cars have similar gas mileages but one or two have gas mileages very different from the others. Should you use the mean or the median to describe the center?
left
_______ skew = Mean < Median
mean
Standard deviation measures spread about the _________ and should be used when the mean is the most appropriate measure of center.
median
the midpoint of a distribution: the number such that half of the observations are smaller than it and the other half are larger... M
first quartile, Q1
the value in the sample that has 25% of the data at or below it.
third quartile, Q3
the value in the sample that has 75% of the data at or below it.
1.5 x IQR rule for outliers
Call an observation a suspected outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
False... The central box spans where the middle 50% of the data lie regardless of how wide the box is.
True or false: If a boxplot has a wide central box, a high percentage of the data falls between the quartiles Q1 and Q3.
False... Deleting the value of $75.58 would reduce the mean from $34.32 to $28.43.
A shopper at a local supermarket spent the following amounts in his last eight trips to the store: $32.92 $14.14 $30.80 $28.34 $75.58 $36.33 $33.51 $22.94 True or false: Even though $75.58 appears to be an outlier, deleting it would not change the value of the mean.
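A quick Python check of the shopper's means, using only the eight amounts given on the card and the standard library's statistics.mean.

```python
from statistics import mean

trips = [32.92, 14.14, 30.80, 28.34, 75.58, 36.33, 33.51, 22.94]
without_outlier = [x for x in trips if x != 75.58]  # delete the suspected outlier

print(round(mean(trips), 2))            # 34.32
print(round(mean(without_outlier), 2))  # 28.43
```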
123
A statistics class has 245 students. To find the median score on the first midterm, you should first order the exam scores and then find the score in the ___rd position. Give your answer as a whole number.
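In an ordered list of n observations, the median sits at position (n + 1) / 2. A one-line sketch (the function name is illustrative):

```python
def median_position(n):
    """1-based position of the median in an ordered list of n observations."""
    return (n + 1) / 2


print(median_position(245))  # 123.0: the 123rd score is the median
print(median_position(8))    # 4.5: average the 4th and 5th values
```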
1.5
A suspected outlier falls more than _____ "IQRs" away from either Q1 or Q3.
6.5
Below is the five-number summary for the tar content of 25 different brands of cigarettes. Min=1, Q1=8.5, Median=12.6, Q3=15, Max=17 The value of the interquartile range is _________________. Give answer as X.X.
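Working from the five-number summary on this card, the IQR and the 1.5 x IQR fences can be checked in Python. The dictionary layout is just one convenient choice. Because the minimum (1) and maximum (17) both fall inside the fences, none of the 25 brands would be flagged as a suspected outlier.

```python
# Five-number summary for the tar content of 25 cigarette brands (from the card).
summary = {"min": 1, "q1": 8.5, "median": 12.6, "q3": 15, "max": 17}

iqr = summary["q3"] - summary["q1"]
lower_fence = summary["q1"] - 1.5 * iqr
upper_fence = summary["q3"] + 1.5 * iqr

print(iqr)                       # 6.5
print(lower_fence, upper_fence)  # -1.25 24.75
print(summary["min"] < lower_fence or summary["max"] > upper_fence)  # False: no suspected outliers
```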
boxplot
a graphical display of the five-number summary... A central box spans the middle 50% of the data (marked by the first and third quartiles)... A line in the box marks the median M... Lines extend from the box out to the smallest and largest observations.
Median... Since the median is resistant to the outliers of a few very wealthy individuals, it is a better choice than the mean.
For a dataset of incomes of people in the United States, should you use the mean or the median to describe the center?
even
If n is ________, the median M is midway between the two center observations in the ordered list.
odd
If the number of observations (n) is _______, the median M is the center observation in the list.
Median... Since the median is resistant to the outliers created by a few students with very many absences, it is a better choice than the mean.
If you were interested in studying the number of days missed during the last school year for 9th graders in a large urban school district, should you use the mean or the median to describe the center? Hint: Consider whether or not there would be any outliers.
skewed
In a ______________ distribution, the mean is usually farther in the long tail than the median.
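A sketch with two small made-up datasets shows the mean being pulled toward the long tail: it sits above the median in the right-skewed set and below it in the left-skewed set.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 4, 20]      # hypothetical data, long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 20]  # hypothetical data, long tail to the left

print(mean(right_skewed), median(right_skewed))  # mean 5 > median 3
print(mean(left_skewed), median(left_skewed))    # mean 16 < median 18
```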
center, spread
Numerical measures of ____________ and __________ report specific facts about a distribution. Be sure to produce graphical displays to better understand the behavior of the data.
variance
Prior to finding the standard deviation, we first calculate the _______________ s^2, the average of the squares of the deviations of the observations from their mean (dividing their sum by n - 1 rather than n).
resistant measure
A measure of any aspect of a distribution that is relatively unaffected by changes in the numerical values of a small proportion of the total number of observations, no matter how large these changes are.
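To see resistance in action, the sketch below changes a single observation in a hypothetical dataset by a huge amount. The median does not move, while the mean does.

```python
from statistics import mean, median

data = [2, 4, 6, 8, 10]       # hypothetical observations
changed = [2, 4, 6, 8, 1000]  # only the largest observation changes

print(mean(data), mean(changed))      # 6 -> 204: the mean is not resistant
print(median(data), median(changed))  # 6 -> 6: the median is resistant
```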
increases
Standard deviation ____________ as the observations become more spread out about their mean.
units of measurement
Standard deviation has the same ________ ____ ___________ as the original observations.
greater
Standard deviation is always ___________ than or equal to 0.
resistant
Standard deviation, like the mean, is NOT ____________ to outliers and skewness.
spread
The *IQR* is a resistant measure of __________.
center
The *median* is a resistant measure of _________.
median
The ___________ is often unaffected by extreme observations like outliers. It is a more resistant measure than the mean.
symmetric
The mean and median of a roughly ______________ distribution are close together. If the distribution is exactly this, the mean and median are exactly the same.
variability
The mean is a measure of center whereas the standard deviation measures the ____________ of data about the mean.
square root
The standard deviation is the __________ ________ of the variance.
stay the same, decrease
There are three children in a room, ages 3, 4, and 5. If another 4 year old enters the room, the mean age will ... but the variance will ...
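The ages on this card can be checked with the standard library's statistics module, whose variance function uses the n - 1 divisor. The new child's age equals the mean, so it adds nothing to the sum of squared deviations while n - 1 grows from 2 to 3.

```python
from statistics import mean, variance

before = [3, 4, 5]
after = before + [4]  # another 4-year-old enters the room

print(mean(before), mean(after))          # mean stays at 4
print(variance(before), variance(after))  # variance falls from 1 to 2/3
```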
second quartile, Q2
synonymous with the median
False... Two datasets can have the same values for the mean and standard deviation but totally different shapes. For example, one dataset could be right-skewed and the other left-skewed.
True or false: If two datasets have the same values for the mean and standard deviation, then their shapes will also be the same.
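One way to build such a pair, sketched with made-up numbers: mirror a right-skewed dataset about its mean. Reflecting each value x to 2 x mean - x keeps the mean and the standard deviation unchanged but flips the direction of the skew.

```python
from statistics import mean, median, stdev

right_skewed = [1, 2, 2, 3, 7]  # hypothetical data with a long right tail
center = mean(right_skewed)     # 3
left_skewed = [2 * center - x for x in right_skewed]  # mirror image: [5, 4, 4, 3, -1]

print(mean(right_skewed), mean(left_skewed))      # both 3
print(stdev(right_skewed), stdev(left_skewed))    # equal standard deviations
print(median(right_skewed), median(left_skewed))  # 2 vs 4: mean > median, then mean < median
```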
True... Since none of these depends on the exact values of the observations in the lower or upper tail of the data, they are resistant measures.
True or false: The first quartile, the median and the third quartile are all resistant measures.
True... The sum of the data values has the same unit of measure as the data values. Since the mean is computed by dividing this sum by n, it has the same unit of measure.
True or false: The mean has the same unit of measure as the data values.
true
True or false: You should always plot your data because numerical summaries do not reveal multiple peaks, clusters or outliers.
outliers
When extreme observations are involved, be sure to choose statistical methods that are not influenced by ____________.
mean
an arithmetic average and is calculated by summing the observations and then dividing by the number of observations. It is considered the "balance point" of the distribution.