QM 214 - Chapter 3
outliers
extremely small or large values (observations) in a data set
the 25th percentile is referred to as:
first quartile (Q1)
the median is especially (useful/useless) when outliers are present
useful
in order to return to the original units of measurement, we take the positive square root of ________, which gives us the standard deviation
variance
the ___________ is defined as the average of the squared differences between the observations and the mean
variance
two most widely used measures of dispersion:
variance and standard deviation
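a minimal sketch of both measures, assuming plain Python lists of numbers. the function names and the data are hypothetical. it assumes the usual convention that the population version divides by N and the sample version divides by n - 1.

```python
import math


def population_variance(values):
    """sigma^2: average squared difference from the mean, dividing by N."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """s^2: same sum of squared differences, but dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(population_variance(data))             # 4.0 (in squared units)
print(math.sqrt(population_variance(data)))  # 2.0 (back in original units)
print(math.sqrt(sample_variance(data)))      # sample standard deviation
```

taking the square root is what brings the result back to the original units of measurement.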
the ________ ________ is relevant when some observations contribute more than others
weighted mean
the mean of the sample (sample mean) is referred to as:
x̅ (x-bar)
sample mean formula:
x̅ = ( Σ xi ) / n -all that formula is saying is: add up all of the numbers in your data set, then divide by how many there are (Σ means "add up", xi means "each number in the data set", and n is the # of items in the sample)
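a minimal sketch of the sample mean formula, assuming the sample is a plain Python list. the function name and the data are hypothetical.

```python
def sample_mean(values):
    """Return x-bar: the sum of the observations divided by the sample size n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample must contain at least one observation")
    return sum(values) / n


sample = [4, 8, 15, 16, 23, 42]  # hypothetical sample, sum = 108, n = 6
print(sample_mean(sample))       # 18.0
```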
weighted mean formula (sample):
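x̅ = Σ wi*xi -multiply each observation xi by its weight wi and add up the results (this form assumes the weights add up to 1; if they do not, divide the total by Σ wi)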
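a minimal sketch of the weighted mean, with a hypothetical function name and made-up grades. it divides by the total weight, so it also works when the weights do not already add up to 1 (when they do, this is the same as Σ wi*xi).

```python
def weighted_mean(values, weights):
    """Return the weighted mean: sum of w_i * x_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not add up to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


grades = [80, 90, 70]   # hypothetical course grades
credits = [3, 4, 1]     # hypothetical weights (credit hours)
print(weighted_mean(grades, credits))  # 83.75
```

the 4-credit course pulls the result toward 90, which the plain mean of the three grades (80) would not show.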
the mean of the population (population mean) is referred to as:
μ
population mean formula:
μ = ( Σ xi ) / N -all that formula is saying is: add up all of the numbers in your data set, then divide by how many there are (Σ means "add up", xi means "each number in the data set", and N is the # of items in the population)
in general, the pth percentile divides a data set into two parts:
- approximately p percent of the observations have values less than the pth percentile - approximately (100-p) percent of the observations have values greater than the pth percentile
the median is also known as the ______ percentile
50th
locate approximate position of the percentile by calculating Lp:
Lp = (n+1)*p/100
we calculate the median as the average of the two middle values if the number of observations is _______
even
mean absolute deviation (MAD)
an average of the absolute differences between the observations and the mean
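a minimal sketch of the MAD, assuming a plain Python list. the function name and the data are hypothetical.

```python
def mean_absolute_deviation(values):
    """Average of the absolute differences between each observation and the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data, mean = 5
print(mean_absolute_deviation(data))  # 1.5
```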
to calculate a percentile, first arrange the data in ________ order
ascending (smallest to largest)
to find the median, we first arrange the data in __________ order, from ______ to ________.
ascending / smallest / largest
a _______ ______ is (more or less) a visual representation of particular percentiles
box plot
a _______ _______ is a convenient way to graphically display the minimum value (Min), the quartiles (Q1, Q2, Q3), and the maximum value (Max) of a data set
box plot
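a minimal sketch of the five values a box plot displays. the function name and data are hypothetical. it assumes Python 3.8+ and that the "exclusive" method of statistics.quantiles, which places quartiles using (n+1) positions, is an acceptable way to locate Q1, Q2 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return Min, Q1, Q2, Q3 and Max for a data set."""
    data = sorted(values)  # ascending order, smallest to largest
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return {"Min": data[0], "Q1": q1, "Q2": q2, "Q3": q3, "Max": data[-1]}


data = [12, 15, 11, 20, 18, 25, 30]  # hypothetical data
print(five_number_summary(data))
# {'Min': 11, 'Q1': 12.0, 'Q2': 18.0, 'Q3': 25.0, 'Max': 30}
```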
a unitless measure that allows for direct comparisons of mean-adjusted dispersion across different data sets
coefficient of variation (CV)
unimodal
data set has one mode
bimodal
data set has two modes
multimodal
data set has two or more modes
calculating coefficient of variation (CV):
dividing a data set's standard deviation by its mean
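a minimal sketch of the sample CV, using the standard library's statistics.mean and statistics.stdev. the function name and both data sets are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV: the sample standard deviation divided by the sample mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


small_scale = [10, 12, 14]     # hypothetical data, mean 12, s = 2
large_scale = [100, 120, 140]  # same data multiplied by 10, mean 120, s = 20

print(coefficient_of_variation(small_scale))
print(coefficient_of_variation(large_scale))
```

both calls give about 0.167: the standard deviations differ by a factor of 10, but so do the means, so the relative dispersion is the same.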
if Lp is an ______ then Lp denotes the location of the pth percentile
integer
if Lp is not an integer, we need to _________ between two observations to approximate the desired percentile
interpolate
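the percentile steps (sort ascending, compute Lp, interpolate if Lp is not an integer) can be sketched as below. the function name and data are hypothetical, and returning the smallest or largest value when Lp falls outside the data is an assumption of this sketch.

```python
def percentile(values, p):
    """Approximate the pth percentile using the location Lp = (n + 1) * p / 100."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)             # ascending order
    n = len(data)
    location = (n + 1) * p / 100      # Lp, counted from 1
    lower = int(location)             # whole-number part of Lp
    fraction = location - lower       # how far to move toward the next value
    if lower < 1:                     # assumption: clamp to the minimum
        return data[0]
    if lower >= n:                    # assumption: clamp to the maximum
        return data[-1]
    below, above = data[lower - 1], data[lower]
    return below + fraction * (above - below)  # interpolate between the two


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, n = 8
print(percentile(data, 75))      # L75 = 6.75, so 5 + 0.75 * (7 - 5) = 6.5
```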
population CV formula
CV = σ / μ
population MAD formula
MAD = ( Σ |xi - μ| ) / N
population standard deviation formula
σ = √σ² = √[ ( Σ (xi - μ)² ) / N ]
population variance formula
σ² = ( Σ (xi - μ)² ) / N
sample CV formula
CV = s / x̅
sample MAD formula
MAD = ( Σ |xi - x̅| ) / n
sample standard deviation formula
s = √s² = √[ ( Σ (xi - x̅)² ) / (n - 1) ]
sample variance formula
s² = ( Σ (xi - x̅)² ) / (n - 1)
the range is the difference between:
the maximum (Max) and minimum (Min) values: Range = Max - Min
the _______ is the most commonly used measure of central location
mean
the _______ divides the data in _____. an equal number of observations lie above and below the _________
median / half / median
the __________ is the only meaningful measure of central location in qualitative data, as opposed to quantitative data
mode
if the median is right of center and the left whisker is longer than the right whisker, then the distribution is
negatively skewed
we calculate the median as the middle value if the number of observations is _______
odd
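a minimal sketch of the odd/even rule for the median. the function name and data are hypothetical.

```python
def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    data = sorted(values)  # ascending order, smallest to largest
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # odd n: single middle value
    return (data[mid - 1] + data[mid]) / 2  # even n: average the two middle values


print(median([3, 1, 7]))     # 3   (sorted: 1, 3, 7)
print(median([3, 1, 7, 5]))  # 4.0 (sorted: 1, 3, 5, 7 -> (3 + 5) / 2)
```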
variance squares the __________ units of measurement
original
if the values of the mean and median differ significantly, it is likely that the data set contains ___________
outliers
one weakness of the mean is that it is unduly influenced by __________
outliers
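a small illustration of how one outlier moves the mean but barely moves the median, using the standard library's statistics module. the numbers are made up.

```python
import statistics

typical = [30, 32, 35, 38, 40]  # hypothetical data, no outliers
with_outlier = typical + [400]  # same data plus one extreme value

# without the outlier, the mean and the median are both 35
print(statistics.mean(typical), statistics.median(typical))

# with the outlier, the mean jumps to about 95.8 while the median only moves to 36.5
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

a large gap between the two, as in the second line, is the hint that outliers may be present.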
we refer to the population mean as a _______ and the sample mean as a _______ since the population mean is generally unknown and the sample mean is used to estimate it.
parameter / statistic
_________ provide detailed information about how data are spread over the interval from the smallest value to the largest value
percentiles
if the median is left of center and the right whisker is longer than the left whisker, then the distribution is
positively skewed
arithmetic mean
the primary measure of central location, also called the mean or the average: add up the values of all data points and divide by the number of data points in the population/sample
the ________ is not considered a good measure of dispersion because it focuses solely on the extreme values and ignores every other observation in the data set
range
the simplest measure of dispersion
range
n indicates:
sample size (for population percentile, replace n with N)
the 50th percentile is referred to as:
second quartile (Q2)
coefficient of variation (CV)
serves as a relative measure of dispersion and adjusts for differences in the magnitudes of the means
interquartile range (IQR)
the difference between the third and first quartiles: IQR = Q3 - Q1
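a minimal sketch comparing the IQR with the range. the function names and data are hypothetical. it assumes Python 3.8+ and the "exclusive" method of statistics.quantiles, which places quartiles using (n+1) positions.

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


def data_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


data = [2, 4, 4, 5, 7, 9, 10]      # hypothetical data
extreme = [2, 4, 4, 5, 7, 9, 100]  # same data with one extreme maximum

print(data_range(data), interquartile_range(data))        # 8 5.0
print(data_range(extreme), interquartile_range(extreme))  # 98 5.0
```

the range changes from 8 to 98 because it only looks at the two extreme values, while the IQR stays at 5.0.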
Lp indicates:
the location of the desired pth percentile
median
the middle value of a data set
mode
the most frequently occurring value in a data set. a data set may have no mode or more than one mode
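a minimal sketch for finding the mode(s), using collections.Counter. the function name and data are hypothetical, and treating a data set in which no value repeats as having no mode is an assumption of this sketch.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequently occurring value(s)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3, 3]))                   # [2, 3]  (two modes)
print(modes([1, 2, 3]))                         # []      (no mode)
print(modes(["red", "blue", "red", "green"]))   # ['red'] (works for qualitative data)
```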
central location
the way quantitative data tend to cluster around some middle or central value. measures of central location attempt to find a typical or central value that describes the data
the 75th percentile is referred to as:
third quartile (Q3)
the mode's usefulness as a measure of central location tends to diminish with data sets that have more than ______ modes
three