Quantitative Survey Methods Ch 3
What are the five values in a five-number summary?
Smallest value, first quartile, median, third quartile, largest value
An outlier might be:
an incorrectly recorded data value; a data value that was incorrectly included in the data set; or a correctly recorded unusual data value that belongs in the data set
For Chebyshev's theorem, at least 75% of the data values must be within z = ____ standard deviations of the mean.
2
First Quartile = ____th Percentile
25th
According to the empirical rule, almost all of the data values will be within ±____ standard deviations of the mean.
3
For Chebyshev's theorem, at least 94% of the data values must be within z = ____ standard deviations of the mean.
4
Second Quartile = ___th Percentile = Median
50th
According to the empirical rule, for data having a bell-shaped distribution, approximately ____% of the data values will be within ±1 standard deviation of the mean.
68%
Third Quartile = ___th Percentile
75th
For Chebyshev's theorem, at least _____% of the data values must be within z = 3 standard deviations of the mean.
89
According to the empirical rule, approximately ___% of the data values will be within ±2 standard deviations of the mean.
95
___________ is a measure of linear association and not necessarily causation.
Correlation
_________ _______ refers to functionality in interactive dashboards that allows the user to access information and analyses at increasingly detailed levels.
Drilling down
Range =
Largest value - Smallest value
Formula to compute Lp, the location of the pth percentile:
Lp = (p/100)(n + 1)
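The location formula can be turned into a small function. This is a minimal Python sketch, assuming linear interpolation between neighboring positions when Lp is not a whole number, and clamping Lp to the range 1..n for very small or very large p (an assumption; the card does not say what to do there). The function name `percentile` is hypothetical.

```python
import math

def percentile(data, p):
    """Return the pth percentile using the location Lp = (p/100)(n + 1).

    Positions are 1-based. If Lp falls between two positions, the result is
    linearly interpolated between the neighboring sorted values. Clamping Lp
    to [1, n] for extreme p is an assumption made for this sketch.
    """
    values = sorted(data)
    n = len(values)
    lp = (p / 100) * (n + 1)
    lp = min(max(lp, 1), n)       # keep the location inside the data
    lower = math.floor(lp)        # 1-based position at or just below Lp
    frac = lp - lower             # fractional part used for interpolation
    if frac == 0:
        return values[lower - 1]
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])

data = [10, 20, 30, 40]           # hypothetical data, n = 4
print(percentile(data, 50))       # L50 = 2.5 -> 25.0 (halfway between 20 and 30)
print(percentile(data, 25))       # L25 = 1.25 -> 12.5
```

Other tools (spreadsheets, NumPy) offer several percentile conventions, so their results can differ slightly from this formula on small data sets.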
___________ are specific percentiles.
Quartiles
If the data distribution is symmetric, the skewness is _____. a. 0 b. .5 c. 1 d. None of these answers are correct.
a. 0
Which of the following is NOT a measure of variability of a single variable? a. covariance b. standard deviation c. range d. coefficient of variation
a. covariance
A(n) _____ is an unusually small or unusually large data value. a. outlier b. median c. sample statistic d. z-score
a. outlier
A numerical measure computed from a sample, such as sample mean, is known as a _____. a. sample statistic b. population parameter c. sample parameter d. population statistic
a. sample statistic
Which of the following is not a measure of location? a) median b) variance c) mode d) mean
b) variance
Which of the following is not a measure of dispersion? a. range b. 50th percentile c. standard deviation d. interquartile range
b. 50th percentile
The measure of location often used in analyzing growth rates in financial data is the _____. a. hyperbolic mean b. geometric mean c. arithmetic mean d. weighted mean
b. geometric mean
The coefficient of variation indicates how large the standard deviation is relative to the _____. a. median b. mean c. range d. variance
b. mean
For data skewed to the left, the skewness is _____. a. positive b. negative c. between 0 and .5 d. less than 1
b. negative
Which of the following symbols represents the standard deviation of a population? a. μ b. σ c. x̄ d. σ²
b. σ
If the data have exactly two modes, the data are ____________
bimodal
A ____ ________ is a graphical summary of data that is based on a five-number summary.
box plot
A set of visual displays organizing and presenting information used to monitor the performance of a company or organization in a manner that is easy to read, understand, and interpret is called a _____. a. crosstabulation b. stem-and-leaf display c. data dashboard d. stacked bar chart
c. data dashboard
In a five-number summary, which of the following is NOT used for data summarization? a. largest value b. median c. mean d. smallest value
c. mean
The measure of variability easiest to compute, but seldom used as the only measure, is the _____. a. interquartile range b. variance c. range d. standard deviation
c. range
Which of the following values of r indicates the strongest correlation? a. .361 b. 0 c. −.9 d. .82
c. −.9
The mean provides a measure of ___________ _____________.
central location
The _____________ ___ __________ indicates how large the standard deviation is in relation to the mean
coefficient of variation
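The coefficient of variation is (standard deviation / mean) × 100%. A minimal sketch using Python's standard `statistics` module (sample standard deviation); the two data sets are hypothetical and have the same spread but different means, which is exactly when the coefficient of variation is useful.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean: (s / mean) * 100."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(data) / mean * 100

small = [8, 10, 12]       # hypothetical: s = 2, mean = 10
large = [98, 100, 102]    # hypothetical: s = 2, mean = 100
print(coefficient_of_variation(small))   # about 20 (%)
print(coefficient_of_variation(large))   # about 2 (%)
```

Both sets have a standard deviation of 2, but relative to its mean the first set varies ten times as much.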
The _____________ is a measure of the linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship.
covariance
Two descriptive measures of the relationship between two variables are ______________ and _____________ ________________
covariance and correlation coefficient
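A minimal sketch of both measures for sample data: the sample covariance divides the sum of cross-products of deviations by n − 1, and the correlation coefficient divides the covariance by the product of the two sample standard deviations. The function names and data are hypothetical.

```python
import math

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_mean) * (y_i - y_mean)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson r = s_xy / (s_x * s_y); always between -1 and +1."""
    sx = math.sqrt(sample_covariance(x, x))   # s_x, since cov(x, x) = s_x^2
    sy = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]
print(sample_covariance(x, [2, 4, 6, 8, 10]))  # 5.0
print(correlation(x, [2, 4, 6, 8, 10]))        # very close to +1
print(correlation(x, [10, 8, 6, 4, 2]))        # very close to -1
```

The covariance changes if the units change (e.g. dollars vs. thousands of dollars), while the correlation coefficient does not, which is why r is the easier of the two to interpret.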
Since the median is the middle value of a data set, it must always be _____. a. smaller than the mode b. smaller than the mean c. larger than the mode d. None of these answers are correct
d. None of these answers are correct
When the data are positively skewed, the mean will usually be _____. a. less than the median b. greater than the mode c. less than the mode d. greater than the median
d. greater than the median
A graph with skewness −1.8 would be which of the following? a. moderately skewed right b. highly skewed left c. moderately skewed left d. highly skewed right
b. highly skewed left (a negative skewness value means the data are skewed left)
The correlation coefficient ranges from which two values? a. 0 and 1 b. 1 and 100 c. minus infinity and plus infinity d. −1 and +1
d. −1 and +1
When the data are believed to approximate a bell-shaped distribution, the _____________ ________ can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
empirical rule
Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data. Two tools that accomplish this are _______-________ summaries and _____ ______
five-number summaries and box plots
The _______________ _________ is calculated by finding the nth root of the product of n values. It is often used in analyzing growth rates in financial data (where using the arithmetic mean will provide misleading results). It should be applied any time you want to determine the mean rate of change over several successive periods (be it years, quarters, weeks, . . .). Other common applications include changes in populations of species, crop yields, pollution levels, and birth and death rates.
geometric mean
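A minimal sketch of the geometric mean applied to growth rates: each percentage change is converted to a growth factor (e.g. +10% becomes 1.10), the nth root of the product of the factors is taken, and 1 is subtracted. The function names and the +50%/−50% example are hypothetical; Python 3.8+ also ships `statistics.geometric_mean`.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values (computed via logs to avoid overflow)."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def mean_growth_rate(percent_changes):
    """Mean percentage change per period over successive periods."""
    factors = [1 + pct / 100 for pct in percent_changes]   # +10% -> 1.10
    return (geometric_mean(factors) - 1) * 100

changes = [50, -50]                   # hypothetical: +50% one year, -50% the next
print(sum(changes) / len(changes))    # arithmetic mean: 0.0 (misleading)
print(mean_growth_rate(changes))      # about -13.4 (% per period)
```

The arithmetic mean suggests no change, but an investment that rises 50% and then falls 50% ends at 75% of its starting value; a steady loss of about 13.4% per period produces the same result.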
A data value greater than the sample mean will have a z-score ___________ than zero.
greater
At least (1 − 1/z²) of the items in any data set will be within z standard deviations of the mean, where z is any value _________ than 1.
greater
Chebyshev's theorem requires z > 1, but z need not be an __________.
integer
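Chebyshev's bound is a one-line formula. This minimal sketch (the function name is hypothetical) reproduces the 75%, 89%, and 94% answers from the cards above and shows that a non-integer z such as 1.5 is allowed.

```python
def chebyshev_min_fraction(z):
    """Minimum fraction of any data set within z standard deviations of the mean: 1 - 1/z**2."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2

for z in (2, 3, 4, 1.5):
    # print each z with the minimum percentage of values within z standard deviations
    print(z, round(100 * chebyshev_min_fraction(z), 1))
```

z = 2, 3, and 4 give exactly 75%, about 88.9%, and 93.75%, which the cards round to 75, 89, and 94. Unlike the empirical rule, the bound holds for any distribution shape.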
The ________________ _______ of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data. It overcomes the sensitivity to extreme data values.
interquartile range
Limits for a box plot are located (not drawn) using the _______________ ________.
interquartile range (IQR)
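A minimal sketch of the five-number summary and box-plot limits. It assumes the percentile location formula Lp = (p/100)(n + 1) with interpolation, and the conventional limits Q1 − 1.5(IQR) and Q3 + 1.5(IQR); the 1.5 multiplier is the usual textbook choice, not something stated on these cards. Function names and data are hypothetical.

```python
import math

def percentile(data, p):
    """pth percentile via Lp = (p/100)(n + 1), interpolating between positions."""
    values = sorted(data)
    n = len(values)
    lp = min(max((p / 100) * (n + 1), 1), n)   # clamp the location to 1..n
    lower = math.floor(lp)
    frac = lp - lower
    if frac == 0:
        return values[lower - 1]
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])

def five_number_summary(data):
    """(smallest value, Q1, median, Q3, largest value)."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))

def box_plot_limits(data, k=1.5):
    """Return (lower_limit, upper_limit, outliers).

    The limits are located k * IQR below Q1 and above Q3 (k = 1.5 by
    convention); data values outside the limits are flagged as outliers.
    """
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40]   # hypothetical data, n = 11
print(five_number_summary(data))   # (2, 4, 7, 10, 40)
print(box_plot_limits(data))       # (-5.0, 19.0, [40])
```

Here IQR = 10 − 4 = 6, so the limits are 4 − 9 = −5 and 10 + 9 = 19, and 40 would be drawn as a dot beyond the upper whisker.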
A data value less than the sample mean will have a z-score _____ than zero.
less
The median is the measure of location most often reported for annual income and property value data, because a few extremely large incomes or property values can inflate the __________.
mean
Perhaps the most important measure of location is the __________.
mean
The ______ of a data set is the average of all the data values.
mean
Data dashboards are not limited to graphical displays. The addition of numerical measures, such as the ______ and _________ __________ of KPIs, to a data dashboard is often critical
mean and standard deviation
Whenever a data set has extreme values, _________ is the preferred measure of central location.
median
The _________ of a data set is the value in the middle when the data items are arranged in ascending order.
median
The _____________ provides the preferred measure of location when the data are highly skewed.
median
For an even number of observations, the _______ is the average of the two middle values.
median
The 50th percentile is the __________.
median
A key to the development of a box plot is the computation of the __________ and the quartiles ____ and ____.
median; Q1 and Q3
For an odd number of observations, the median is the _________ value.
middle
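The odd/even rule for the median can be written directly. A minimal sketch with hypothetical data; Python's built-in `statistics.median` follows the same rule.

```python
def median(data):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    values = sorted(data)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:                              # odd n: the single middle value
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2  # even n: average the two middle values

print(median([46, 54, 42, 46, 32]))       # odd n: sorted 32 42 46 46 54 -> 46
print(median([27, 25, 27, 30, 20, 26]))   # even n: (26 + 27) / 2 -> 26.5
```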
The _________ of a data set is the value that occurs with greatest frequency. The greatest frequency can occur at two or more different values.
mode
If the data have more than two modes, the data are ________________
multimodal
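A minimal sketch that finds every mode and labels the data by how many modes it has. The names are hypothetical, and "unimodal" for a single mode is the standard term though it does not appear on these cards. This sketch does not special-case data in which every value occurs once (it would report every value as a mode); textbooks differ on how to treat that case.

```python
from collections import Counter

def modes(data):
    """All values that occur with the greatest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]

def describe_modality(data):
    """Label data as unimodal, bimodal, or multimodal by its number of modes."""
    k = len(modes(data))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"

print(modes([1, 1, 2, 3, 3]))                       # [1, 3]
print(describe_modality([1, 1, 2, 3, 3]))           # bimodal
print(describe_modality([1, 1, 2, 2, 3, 3, 4]))     # multimodal
```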
If the distribution is moderately skewed left, the skewness is _______________. The mean will usually be less than the median.
negative
The empirical rule is based on _________ _________________
normal distribution
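Because the empirical rule comes from the normal distribution, its 68/95/99.7 percentages can be recovered from the normal curve. For a standard normal Z, the probability of falling within k standard deviations is P(|Z| < k) = erf(k/√2), and `math.erf` is in Python's standard library. The function name is hypothetical.

```python
import math

def normal_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{100 * normal_within(k):.1f}%")   # about 68.3%, 95.4%, 99.7%
```

The rule's 68%, 95%, and "almost all" are these exact values rounded; for data that are not bell-shaped, only Chebyshev's weaker guarantee applies.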
n=
number of observations in the sample
An __________ is an unusually small or unusually large value in a data set. A data value with a z-score less than −3 or greater than +3 might be considered an outlier; some analysts use −2 and +2 instead, depending on how strictly outliers are defined.
outlier
Box plots provide another way to identify __________
outliers
In a box plot, data values outside the interquartile range limits are considered __________. The location of each is marked with a dot.
outliers
The pth percentile of a data set is a value such that at least ___ percent of the items take on this value or less and at least (100 - p ) percent of the items take on this value or more
p
A ____________ provides information about how the data are spread over the interval from the smallest value to the largest value.
percentile
A sample statistic is referred to as the _________ ______________ of the corresponding population parameter
point estimator
The sample mean x̄ is the ________ _______________ of the population mean μ.
point estimator
If the measures are computed for data from a population, they are called ________________ ______________.
population parameters
If the distribution is moderately skewed right, the skewness is ____________. The mean will usually be greater than the median.
positive
The __________ of a data set is the difference between the largest and smallest data values. It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.
range
If the measures are computed for data from a sample, they are called ____________ _________
sample statistics
An important measure of the shape of a distribution is called _____________
skewness
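A minimal sketch of a sample skewness calculation. It assumes the adjusted formula n/((n − 1)(n − 2)) · Σ((xi − x̄)/s)³, which is the one Excel's SKEW function uses; whether this course uses the same formula is an assumption. The function name and data are hypothetical.

```python
import math

def sample_skewness(data):
    """Adjusted sample skewness: n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3).

    s is the sample standard deviation (n - 1 divisor); at least 3 values are needed.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

print(sample_skewness([1, 2, 3, 4, 5]))            # symmetric: about 0
print(sample_skewness([1, 2, 2, 3, 3, 3, 20]))     # long right tail: positive
print(sample_skewness([-20, 1, 2, 2, 3, 3, 3]))    # long left tail: negative
```

The sign follows the long tail: a few very large values pull the cubed deviations positive (skewed right), and a few very small values pull them negative (skewed left).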
The ____________ __________ of a data set is the positive square root of the variance. It is measured in the same units as the data, making it more easily interpreted than the variance
standard deviation
The z-score is often called the ____________ _____________.
standardized value
ΣXi=
sum of the values of the n observations
Another measure sometimes used when extreme values are present is the ___________ _________. It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
trimmed mean
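A minimal sketch of a trimmed mean that removes the given percentage of values from each end of the sorted data. It assumes the number removed from each end is rounded to the nearest whole number (conventions vary between texts and software). The function name and data are hypothetical.

```python
import math

def trimmed_mean(data, percent):
    """Mean after deleting `percent`% of the values from each end of the sorted data."""
    values = sorted(data)
    n = len(values)
    k = math.floor(n * percent / 100 + 0.5)   # values removed from each end (round half up)
    if 2 * k >= n:
        raise ValueError("trimming would remove every value")
    kept = values[k:n - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme value
print(sum(data) / len(data))              # ordinary mean: 14.5
print(trimmed_mean(data, 10))             # drop 1 and 100 -> 5.5
```

The single extreme value nearly triples the ordinary mean, while the 10% trimmed mean stays close to the bulk of the data.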
True or false: just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
true
Often a manager or decision-maker is interested in the relationship between _____ variables.
two
It is often desirable to consider measures of ____________ (dispersion), as well as measures of location.
variability
The ____________ is a measure of variability that utilizes all the data. It is based on the difference between the value of each observation (xi) and the mean (X̅ for a sample, μ for a population). The variance is useful in comparing the variability of two or more variables
variance
The ______________ is the average of the squared differences between each data value and the mean. (For a sample, the sum of squared differences is divided by n − 1 rather than n.)
variance
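A minimal sketch of variance and standard deviation, showing the one difference between the sample and population versions: the divisor (n − 1 for a sample, N for a population). The function names and data are hypothetical.

```python
import math

def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

def std_dev(data, sample=True):
    """Positive square root of the variance, in the same units as the data."""
    return math.sqrt(variance(data, sample))

data = [46, 54, 42, 46, 32]          # hypothetical data, mean 44
print(variance(data))                # sample variance s^2: 256 / 4 = 64.0
print(variance(data, sample=False))  # population variance sigma^2: 256 / 5 = 51.2
print(std_dev(data))                 # sample standard deviation s: 8.0
```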
The correlation coefficient can take on values between -1 and +1. Values near -1 indicate a strong negative linear relationship. Values near +1 indicate a strong positive linear relationship. The closer the correlation is to zero, the __________ the relationship
weaker
In some instances the mean is computed by giving each observation a _________ that reflects its relative importance. The choice of weights depends on the application
weight
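The weighted mean is Σ(wi·xi) / Σwi. A minimal sketch; the grade-point example (grade points weighted by credit hours) is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical GPA: grade points 4, 3, 2 earned in courses of 3, 4, and 1 credit hours.
print(weighted_mean([4, 3, 2], [3, 4, 1]))   # (12 + 12 + 2) / 8 = 3.25
```

With equal weights this reduces to the ordinary mean; here the 4-credit course pulls the result toward its grade of 3.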
The process of converting a value for a variable to a z-score is often referred to as a ___ ___________________.
z transformation
An observation's ____-_______ is a measure of the relative location of the observation in a data set.
z-score
The ____-_____ denotes the number of standard deviations a data value xi is from the mean.
z-score
Suppose annual salaries for sales associates from Hayley's Heirlooms have a bell-shaped distribution with a mean of $32,500 and a standard deviation of $2,500. The z-score for a sales associate from this store who earns $37,500 is _____.
z-score = 2
A data value equal to the sample mean will have a z-score of _________
zero
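The z-score is (x − mean) / standard deviation. This minimal sketch reproduces the Hayley's Heirlooms answer and adds an outlier flag using the |z| > 3 cutoff from the outlier card (with 2 as an optional stricter cutoff). The function names are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

def is_outlier(x, mean, std_dev, cutoff=3.0):
    """Flag a value whose |z| exceeds the cutoff (3 by default; some analysts use 2)."""
    return abs(z_score(x, mean, std_dev)) > cutoff

print(z_score(37_500, 32_500, 2_500))      # (37,500 - 32,500) / 2,500 = 2.0
print(z_score(32_500, 32_500, 2_500))      # equal to the mean: 0.0
print(is_outlier(41_000, 32_500, 2_500))   # z = 3.4 -> True
```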
If the distribution is symmetric (not skewed), the skewness is ______. The mean and median are equal.
zero
Sample mean (x̄) formula
x̄ = ΣXi / n
Population mean (μ) formula
μ = ΣXi / N