Chapter 3: Numerical Descriptive Measures
Q1 = ?
25th percentile
As a measure of central location, the mode's value diminishes with data sets that have more than ___ modes
3
IQR:
the range of the middle 50% of the data (the middle chunk)
Q2 = ?
50th percentile = median
Q3 = ?
75th percentile
what does the z-score measure?
It measures the distance of a given sample value from the mean, in standard deviations; it is unitless
to calculate the position of the percentile (Lp):
Lp = (n + 1) (p/100)
Median:
Midpoint in a string of sorted data, where 50% of the observations, or values, are below and 50% are above.
downfall of Chebyshev's theorem?
it results in conservative bounds for the percentage of observations falling in a particular interval. The actual percentage of observations lying in the interval may in fact be much larger.
dispersion (variability) is good to look at together with ____?
location; e.g., in choosing between supplier A and supplier B, we should consider not only the average delivery time for each but also the variability in delivery time for each
Use Chebyshev's theorem and the empirical rule to make ___?
precise statements regarding the percentage of data values that fall within a specified number of standard deviations from the mean
Empirical rule:
provides the approximate percentage of observations that fall within 1, 2, or 3 standard deviations from the mean.
the measures of dispersion/variability:
range, variance/standard deviation, coefficient of variation (CV)
Coefficient of variation (CV) textbook definition:
serves as a relative measure of dispersion and adjusts for differences in the magnitudes of the means.
the two main causes of variation:
special and common
if CV is greater than 1 or 100%, then ?
standard deviation is greater than the mean
if CV is less than 1 or 100%, then ?
standard deviation is less than the mean
what does a CV of 1.5 or greater indicate?
that it is an out-of-control condition and the data shouldn't be used to draw conclusions
the common measures of central tendency (mean, median, and mode) prove which principle of variation?
that it can be measured
what does a z-score of −1.5 imply?
that the given sample value is 1.5 standard deviations below the mean.
a high standard deviation indicates that ___?
the data are spread out
if the number calculated with the empirical rule and Chebyshev's theorem is different, then...
the data is non-normal
A low value of standard deviation indicates that ___?
the data points are close to the mean
Variance is based on ?
the difference between the value of each observation (xi) and the mean
in a box plot, if the median is right of center and the left whisker is longer than the right whisker....?
the distribution is negatively skewed.
in a box plot, If the median is left of center and the right whisker is longer than the left whisker....?
the distribution is positively skewed
For example, a z-score of 2 implies that...?
the given sample value is 2 standard deviations above the mean.
what is the Best measure of location for normal data?
the mean
which measure of location is Influenced by outliers or skewed data?
the mean
what is the Best measure of location for skewed or non-normal data?
the median
If a process is affected only by common or inherent causes of variation...
the process measurements will form distributions that are stable or predictable over time
if a process is affected by special causes of variation...
the process measurements will form distributions that are unstable or unpredictable over time
Use the z-score to find...
the relative position of a sample value within the data set by dividing the deviation of the sample value from the mean by the standard deviation
Range:
the simplest measure of dispersion; it is the difference between the maximum (Max) and the minimum (Min) values in a data set.
for a sample, standard deviation = s = ?
the square root of s^2 (the variance)
for a sample, variance = s^2 = ?
the sum of ((each observation) - (the mean)) squared, divided by ((the number of observations) - 1)
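As a rough illustration of the sample variance and standard deviation cards, here is a minimal Python sketch. The function names and the data values are made up for this example; they are not from the textbook.

```python
import math

def sample_variance(values):
    """Sample variance s^2: sum of squared deviations from the mean, over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Sample standard deviation s: positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(sample_variance(data))     # squared deviations sum to 32, divided by 7
print(sample_std(data))
```

Python's built-in statistics.variance and statistics.stdev use the same n - 1 divisor, so they can serve as a cross-check.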
to calculate weighted mean (x-bar):
the sum of (the weight of an observation multiplied by the value of the observation), where the weights sum to 1 (otherwise, divide by the sum of the weights)
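A small Python sketch of the weighted mean, assuming the weights may or may not already sum to 1 (the function divides by their total either way). The scores and weights below are invented for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the total weight."""
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight

scores = [80, 90, 70]        # hypothetical exam scores
weights = [0.5, 0.3, 0.2]    # hypothetical weights; here they sum to 1
print(weighted_mean(scores, weights))  # approximately 81
```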
mode:
the value of a data set that occurs most frequently
what happens when the CV increases?
the variation, relative to the mean, increases
Central location:
the way quantitative data tend to cluster around some middle or central value
what should you use when trying to forecast?
the weighted mean
why doesn't the interquartile range, IQR = Q3 − Q1, depend on the extreme values?
because it is computed only from Q1 and Q3, which bound the middle 50% of the data; however, this measure still does not incorporate all the data
The purpose of measuring central location is...
to find a typical or central value that describes the data
true or false: An equal number of observations lie above and below the median
true
true or false: z-scores can be used to detect outliers
true
With Chebyshev's theorem you can...
use the standard deviation to make statements about the proportion of observations that fall within certain intervals
Population mean:
used when the data come from the entire population, as opposed to a sample
finding the Median of a data set:
value in the middle when data items are arranged in ascending order
Mode of a data set:
value that occurs with greatest frequency
dispersion (in this sense) = ?
variability
multimodal:
when more than two modes exist
bimodal:
when two modes exist
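To make the mode vocabulary concrete, here is a hedged Python sketch that labels a data set by its number of modes. The helper name describe_modes and the data are hypothetical. Note that statistics.multimode returns every value when all values are distinct, so this sketch would label such data "multimodal" even though it arguably has no mode.

```python
import statistics

def describe_modes(values):
    """Return the list of modes and a label based on how many modes there are."""
    modes = statistics.multimode(values)  # all values tied for the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

ratings = [1, 2, 2, 3, 3, 4]  # made-up values: 2 and 3 each occur twice
print(describe_modes(ratings))
```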
with the z-score of the min and max, what can you determine?
whether the data is normal or non-normal
Mode:
The most frequently occurring value
Standard Deviation of a data set:
positive square root of the variance
CV is Calculated by...
(s / x-bar) × 100, i.e., dividing a data set's standard deviation by its mean and expressing the result as a percentage; CV is a unitless measure that allows for direct comparisons of mean-adjusted dispersion across different data sets.
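A short Python sketch of the CV calculation, using the sample standard deviation. The two data sets are invented: they have the same spread but very different means, which is the situation the CV is meant to adjust for.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation divided by the mean, times 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

small_scale = [10, 12, 14, 16, 18]            # hypothetical data, mean 14
large_scale = [1000, 1002, 1004, 1006, 1008]  # same spread, mean 1004

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
print(cv_small, cv_large)  # same s, but relative dispersion is far larger for small_scale
```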
z-score = ?
(x - x-bar) / s
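The z-score formula can be sketched in Python as follows. The function name z_scores and the data are assumptions made for this example; the sample standard deviation (n - 1 divisor) is used for s.

```python
import statistics

def z_scores(values):
    """Standardize each value: (x - x_bar) / s, with s the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

data = [10, 12, 14, 16, 18]  # hypothetical sample, mean 14
z = z_scores(data)
print(z)  # the middle value sits at the mean, so its z-score is 0
```

Negative z-scores mark values below the mean and positive ones mark values above it, matching the cards on z = −1.5 and z = 2.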
the principles of variation:
1. variation always exists; 2. variation can be measured; 3. variation forms a pattern called a distribution; 4. a distribution can vary in central tendency/location, spread, and shape
the pth percentile divides a data set into two parts:
Approximately p percent of the observations have values less than the pth percentile; AND Approximately (100 - p ) percent of the observations have values greater than the pth percentile.
Mean/Average:
Average of a set of values
what is The main difference between Chebyshev's theorem and the empirical rule?
Chebyshev's theorem applies to all data sets whereas the empirical rule is appropriate when the distribution is symmetric and bell-shaped
standardizing the data
Converting sample data into z-scores
Chebyshev's theorem:
For any data set, the proportion of observations that lie within k standard deviations from the mean is at least 1 − 1/k^2, where k is any number greater than 1.
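A minimal Python sketch of the Chebyshev bound, with a hypothetical function name. It only evaluates the lower bound stated in the card; it says nothing about the actual percentage in any particular data set.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, chebyshev_lower_bound(k))
```

For k = 2 the bound is 0.75, well below the roughly 95% that the empirical rule gives for bell-shaped data, which illustrates why Chebyshev's bounds are described as conservative.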
IQR = ?
Q3 - Q1
Box plot (box-and-whisker plot):
a convenient way to graphically display the minimum value (Min), the quartiles (Q1, Q2, and Q3), and the maximum value (Max) of a data set; also an effective tool for identifying outliers and skewness, and for informally gauging the shape of the distribution
The median is the measure most often reported for
annual income and property value data [A few extremely large incomes or property values can inflate the mean.]
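To see how one extreme value inflates the mean but barely moves the median, here is a small Python sketch. The income figures are invented purely for illustration.

```python
import statistics

incomes = [40, 45, 50, 55, 60, 1000]  # hypothetical incomes in $1,000s; 1000 is extreme

mean_income = statistics.mean(incomes)      # pulled upward by the single large value
median_income = statistics.median(incomes)  # average of the two middle values, 50 and 55

print(mean_income, median_income)
```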
why is the range Not considered a good measure of dispersion?
because it focuses solely on the extreme values and ignores every other observation in the data set
why use the variance and standard deviation instead of finding the average deviation from the mean?
because the positive and negative deviations from the mean always sum to 0, so their average would be 0
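A quick Python check of this card, using made-up data: the signed deviations from the mean cancel out, while the squared deviations do not.

```python
def deviations_from_mean(values):
    """Signed deviation of each value from the mean of the data set."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
devs = deviations_from_mean(data)

print(sum(devs))                   # signed deviations cancel
print(sum(d ** 2 for d in devs))   # squared deviations do not cancel
```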
the following are examples that describe what? a typical value that describes the return on an investment; the number of defects in a production process; the salary of a business graduate; the rental price in a neighborhood; the number of customers at a local convenience store
central location
variation exists in ______ and can be measured by ____ _____ because it has a _____ that we want to find
everything; central tendency; pattern
what do measures of dispersion indicate?
how the data vary around the center
when is it preferable to use the empirical rule?
if the histogram or other visual and numerical measures suggest a symmetric and bell-shaped distribution
what does "1.5 × IQR " tell us when used correctly?
whether there is an outlier: a value more than 1.5 × IQR below Q1 or above Q3
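The 1.5 × IQR check can be sketched in Python as below. This assumes quartiles from statistics.quantiles with its default "exclusive" method, which interpolates at position (n + 1)(p/100); other software may compute quartiles slightly differently. The function name and data are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

data = [3, 5, 7, 8, 9, 11, 13, 40]  # made-up values; 40 looks suspicious
print(iqr_outliers(data))
```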
Coefficient of Variation:
indicates how large the standard deviation is in relation to the mean
what does it mean If the mean and median are substantially different?
it is most likely that the data set contains outliers
to calculate "mu":
pop mean = Mu = (Sum of values) / (Number of observations in the population)
measures of location:
mean, median, mode, weighted mean, percentiles/quartiles
Variance:
measure of variability that utilizes all the data
Population Parameters:
measures computed for data from a population
Sample Statistics:
measures computed for data from a sample
what answers this question? Where does the average or location of the population "tend to center?"
measures of central tendency
what divides the data in half?
median
what is Also referred to as the 50th percentile?
median
When data set has extreme values...
median is the preferred measure of central location
which measure of location is not often used?
mode
unimodal:
when exactly one mode exists
the five summary values:
Min = smallest value; Q1 = first quartile = 25th percentile; Q2 = median = second quartile = 50th percentile; Q3 = third quartile = 75th percentile; Max = largest value
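One possible way to compute the five summary values in Python, assuming the default "exclusive" method of statistics.quantiles is an acceptable stand-in for the textbook's quartile calculation. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (Min, Q1, Q2, Q3, Max) for a data set."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return min(values), q1, q2, q3, max(values)

data = [3, 5, 7, 8, 9, 11, 13, 40]  # made-up observations
print(five_number_summary(data))
```

These are the five values a box plot displays.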
the weighted mean is used when...
some observations are more important than others
after finding the position Lp, you find the percentile or quartile =
p1 + [(decimal in Lp)(p2 - p1)], where p1 is the sorted observation at the integer part of Lp and p2 is the next observation; so, the pth percentile is located __% (decimal in Lp converted into a percentage) of the distance between the ___th and ___th observations
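Putting the position formula Lp = (n + 1)(p/100) and the interpolation step together, a hedged Python sketch might look like this. The function name is hypothetical, and clamping to the smallest or largest observation when Lp falls outside the data is an assumption of this sketch, not something stated in the notes.

```python
def percentile(values, p):
    """pth percentile via position L_p = (n + 1)(p / 100) and linear interpolation."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100      # L_p, counted from 1
    lower_index = int(position)       # integer part of L_p
    fraction = position - lower_index # decimal part of L_p
    if lower_index < 1:               # assumption: clamp to the smallest value
        return data[0]
    if lower_index >= n:              # assumption: clamp to the largest value
        return data[-1]
    lower = data[lower_index - 1]     # observation at the integer part of L_p
    upper = data[lower_index]         # next observation
    return lower + fraction * (upper - lower)

data = [3, 5, 7, 8, 9, 11, 13, 40]  # made-up observations, n = 8
print(percentile(data, 25))  # L_25 = 2.25: 25% of the way from the 2nd to the 3rd value
print(percentile(data, 75))  # L_75 = 6.75: 75% of the way from the 6th to the 7th value
```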
sample mean (x-bar):
point estimator of the population mean (mu)
Sample Statistic =
point estimator for corresponding population parameter