Reading 7: Statistical Concepts and Market Returns
zero
-All interval and ratio data sets have an arithmetic mean -All data values are considered and included in the arithmetic mean calculation -A data set has only one arithmetic mean (i.e., it is unique) -The sum of the deviations of each observation in the data set from the mean is always ___________
underestimate
Based on the mathematical theory behind statistical procedures, using the full number of sample observations, n, instead of n-1 as the divisor in the computation of s^2 will systematically _____________________ the population variance. Because of this systematic underestimation, a sample variance computed with n as the divisor is referred to as a biased estimator.
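As a rough illustration of the two divisors, the sketch below computes the variance of one small data set both ways. The data values are made up for illustration and are not from the reading. A single sample cannot demonstrate bias, which is a statement about the average over many samples; the sketch only shows that the n divisor always gives the smaller number.

```python
import statistics

# Illustrative observations (an assumption, not data from the reading).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(sample)
mean = sum(sample) / n
sum_sq_dev = sum((x - mean) ** 2 for x in sample)

var_divisor_n = sum_sq_dev / n              # divisor n
var_divisor_n_minus_1 = sum_sq_dev / (n - 1)  # divisor n-1, the usual s^2

print(var_divisor_n)            # 4.0
print(var_divisor_n_minus_1)    # about 4.571

# The standard library exposes both conventions.
print(statistics.pvariance(sample))  # divisor n
print(statistics.variance(sample))   # divisor n-1
```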
Central Tendency
Measures of _________________ include: 1. Arithmetic Mean 2. Geometric Mean 3. Weighted Mean 4. Median 5. Mode
dispersion
Measures of ______________________ indicate the riskiness of an investment.
three
The computed kurtosis for all normal distributions is ______________.
standard deviation
The problem with variance being computed in terms of squared deviations is solved by using the ____________________.
arithmetic mean
Unusually large or small values can have disproportionate effect on the computed value for the ____________________.
Nominal scales
_______________ are the level of measurement that contains the least information. Observations are classified or counted with no particular order.
Leptokurtic
________________ describes a distribution that is more peaked than a normal distribution.
kurtosis
________________ is a measure of the degree to which a distribution is more or less "peaked" than a normal distribution.
outliers
_________________ are observations with extraordinarily large values, either positive or negative.
platykurtic
_________________ refers to a distribution that is less peaked, or flatter than a normal distribution.
Ratio Scales
__________________ represent the most refined level of measurement. These scales provide ranking and equal differences between scale values, and they also have a true zero point as the origin.
relative dispersion
____________________ is the amount of variability in a distribution relative to a reference point or benchmark.
Interval scale
____________________ measurements provide relative ranking, like ordinal scales, plus the assurance that differences between scale values are equal. Temperature measured in degrees Celsius or Fahrenheit is a prime example.
skewness
____________________ refers to the extent to which a distribution is not symmetrical. Nonsymmetrical distributions may be either positively or negatively skewed and result from the occurrence of outliers in a data set.
Ordinal scales
____________________ represent a higher level of measurement than nominal scales. These scales involve every observation being assigned to one of several categories. These categories are then ordered with respect to a specified characteristic.
dispersion
______________________ is defined as the variability around the central tendency.
relative frequency
_______________________ is calculated by dividing the absolute frequency of each return interval by the total number of observations.
coefficient of variation (CV)
______________________________ is the amount of dispersion in a distribution relative to the distribution's mean. It is useful because it enables us to make a direct comparison of dispersion across different sets of data.
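A minimal sketch of the coefficient of variation as standard deviation divided by mean, assuming sample data and the sample standard deviation. The function name and the two return series are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical periodic returns in percent for two investments.
returns_a = [2.0, 4.0, 6.0]   # mean 4, standard deviation 2
returns_b = [1.0, 5.0, 9.0]   # mean 5, standard deviation 4

cv_a = coefficient_of_variation(returns_a)
cv_b = coefficient_of_variation(returns_b)

# The investment with the higher CV has more dispersion per unit of mean return.
print(cv_a, cv_b)
```

Because the CV is a ratio, the two series can be compared directly even though their means differ.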
Chebyshev's Inequality
_________________________________ states that for any set of observations, whether sample or population data and regardless of the shape of the distribution, the percentage of the observations that lie within k standard deviations of the mean is at least 1 - 1/k^2 for all k > 1.
measures of central tendency
____________________________________________ identify the center, or average, of a data set. This central point can then be used to represent the typical, or expected, value in the data set.
sample
a _____________ is defined as a subset of the population of interest.
population
a ___________________ is defined as the set of all possible members of a stated group.
histogram
a ____________________ is the graphical presentation of the absolute frequency distribution. It is simply a bar chart of continuous data that has been classified into a frequency distribution.
harmonic mean
a ______________________ is used for certain computations, such as the average cost of shares purchased over time.
positively skewed
a ____________________________ distribution is characterized by many outliers in the upper region, or right tail.
negatively skewed
a ______________________________ distribution has a disproportionately large number of outliers that fall within its lower (left) tail.
frequency distribution
a ___________________________________ is a tabular presentation of statistical data that aids the analysis of large data sets. These distributions summarize statistical data by assigning it to specified groups, or intervals.
mesokurtic
a distribution is ___________________ if it has the same kurtosis as a normal distribution.
symmetrical
a distribution is ____________________ if it is shaped identically on both sides of its mean. Distributional symmetry implies that intervals of losses and gains will exhibit the same frequency.
excess kurtosis
a distribution is said to exhibit ____________________ if it has either more or less kurtosis than the normal distribution.
greater than
a leptokurtic distribution has excess kurtosis _________________ zero.
fatter tails
a leptokurtic return distribution will have more returns clustered around the mean and more returns with large deviations from the mean (________________)
squared units
a major problem with using the variance is the difficulty of interpreting it since it is in terms of ____________________ of measurement.
less than
a platykurtic distribution has excess kurtosis _________________ zero.
56
according to Chebyshev's inequality, the following relationships hold for any distribution. At least: -36% of observations lie within ±1.25 standard deviations of the mean. -___% of observations lie within ±1.50 standard deviations of the mean. -75% of observations lie within ±2 standard deviations of the mean. -89% of observations lie within ±3 standard deviations of the mean. -94% of observations lie within ±4 standard deviations of the mean.
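The percentages listed above follow directly from the bound 1 - 1/k^2. A short sketch that reproduces them, using a hypothetical helper name:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is stated for k > 1")
    return 1 - 1 / k ** 2

for k in (1.25, 1.5, 2, 3, 4):
    # Rounded to the nearest whole percent, as in the list above.
    print(k, round(chebyshev_lower_bound(k) * 100))
```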
quartile
any ________________ may be expressed as a percentile. For example, the third quartile partitions the distribution at a value such that three-fourths, or 75%, of the observations fall below that value. Thus, the third quartile is the 75th percentile.
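To illustrate that the third quartile and the 75th percentile are the same cut point, the sketch below uses `statistics.quantiles` from the Python standard library on a made-up data set. Quantile conventions differ between textbooks and software, so the exact values may not match a hand calculation that uses another interpolation rule.

```python
import statistics

# Hypothetical data set of 11 observations (not from the reading).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

quartiles = statistics.quantiles(data, n=4)      # 3 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

third_quartile = quartiles[2]    # third of the 3 quartile cut points
percentile_75 = percentiles[74]  # 75th of the 99 percentile cut points

print(third_quartile, percentile_75)
```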
Descriptive Statistics
are used to summarize the important characteristics of large data sets.
Measurement scales
different statistical methods use different levels of measurement, or ________________________. These may be classified into four main categories: 1. Nominal Scales 2. Ordinal Scales 3. Interval Scales 4. Ratio Scales
zero
excess kurtosis is defined as kurtosis minus 3. Thus, a normal distribution has an excess kurtosis equal to ___________.
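A minimal sketch of excess kurtosis as kurtosis minus 3. It assumes the simple population-moment form (fourth central moment divided by the squared second central moment); sample formulas used in practice may apply additional adjustments. The function name and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Population-moment kurtosis minus 3 (an assumed, simplified formula)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                      # evenly spread values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # clustered center, two extremes

print(excess_kurtosis(flat))    # negative: platykurtic
print(excess_kurtosis(peaked))  # positive: leptokurtic
```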
less than
for a negatively skewed, unimodal distribution, the mean is ____________ the median, which is less than the mode. In this case, there are large, negative outliers that tend to "pull" the mean downward (to the left)
less than
for a positively skewed, unimodal distribution, the mode is ___________________ the median, which is less than the mean. The mean is affected by outliers; in a positively skewed distribution, there are large, positive outliers which will tend to "pull" the mean upward, or more positive.
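The ordering mode < median < mean can be checked on a small, made-up data set with one large positive outlier. This is only an illustration; the values are not from the reading.

```python
import statistics

# Hypothetical unimodal data with a large positive outlier (12).
data = [1, 2, 2, 3, 4, 12]

mode = statistics.mode(data)      # 2
median = statistics.median(data)  # 2.5
mean = statistics.mean(data)      # 4

# The outlier pulls the mean above the median, which sits above the mode.
print(mode, median, mean)
```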
equal
for a symmetrical distribution, the mean, median, and mode are __________.
dollar cost averaging
for values that are not all equal: harmonic mean < geometric mean < arithmetic mean. This mathematical fact is the basis for the claimed benefit of purchasing the same dollar amount of mutual fund shares each month or each week - this is referred to as: "________________________________________"
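As a sketch of why the harmonic mean appears here: investing the same dollar amount at each price gives an average cost per share equal to the harmonic mean of the prices. The prices and the dollar amount below are hypothetical, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Hypothetical share prices at three purchase dates and a fixed amount per purchase.
prices = [8.0, 9.0, 10.0]
amount_per_purchase = 1000.0

shares_bought = sum(amount_per_purchase / p for p in prices)
average_cost = amount_per_purchase * len(prices) / shares_bought

harmonic = statistics.harmonic_mean(prices)
geometric = statistics.geometric_mean(prices)
arithmetic = statistics.mean(prices)

# The average cost per share matches the harmonic mean of the prices.
print(average_cost, harmonic)
# For prices that are not all equal: harmonic < geometric < arithmetic.
print(harmonic, geometric, arithmetic)
```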
CV
in an investments setting, the _____ is used to measure the risk (variability) per unit of expected return (mean).
sample statistic
in the same manner that a parameter may be used to describe a characteristic of a population, a ______________________ is used to measure a characteristic of a sample.
summing
it is also possible to compute the cumulative absolute frequency and cumulative relative frequency by ________________ the absolute or relative frequencies starting at the lowest interval and progressing through the highest.
Inferential Statistics
pertain to the procedures used to make forecasts, estimates, or judgments about a large set of data on the basis of the statistical characteristics of a smaller set (a sample).
greater
population standard deviation is typically _____________ than MAD
measures of location
quantiles and measures of central tendency are known collectively as ____________________________.
coefficient of variation (CV)
relative dispersion is commonly measured with the _______________________.
mode
the __________ is the value that occurs most frequently in a data set. A data set may have more than one mode or even no mode.
range
the ___________ is the distance between the largest and the smallest value in a data set.
median
the _______________ is the midpoint of a data set when the data is arranged in ascending or descending order. Half of the observations lie above the median and half are below.
sample variance
the ________________ is the measure of dispersion that applies when we are evaluating a sample of n observations from a population.
geometric mean
the ___________________ is often used when calculating investment returns over multiple periods or when measuring compound growth rates.
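A minimal sketch of a geometric mean return over multiple periods, assuming returns are given as decimals and compounded as (1 + r). The function name and the return series are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean_return(returns):
    """Compound growth rate per period implied by a series of periodic returns."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

# Hypothetical annual returns: +10%, -5%, +20%.
annual_returns = [0.10, -0.05, 0.20]

g = geometric_mean_return(annual_returns)
arithmetic = sum(annual_returns) / len(annual_returns)

# Compounding at g for three periods reproduces the total growth,
# and g does not exceed the arithmetic mean return.
print(g, arithmetic)
```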
sample mean
the ________________________ is the sum of all the values in a sample of a population, divided by the number of observations in the sample, n. It is used to make inferences about the population mean.
mean absolute deviation (MAD)
the ___________________________ is the average of the absolute values of the deviations of individual observations from the arithmetic mean.
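A short sketch of the MAD calculation, alongside the population standard deviation of the same values for comparison. The observations are made up for illustration.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute deviation of the observations from their arithmetic mean."""
    mean = statistics.mean(values)
    return sum(abs(x - mean) for x in values) / len(values)

# Illustrative observations (an assumption, not data from the reading).
observations = [2, 4, 4, 4, 5, 5, 7, 9]

mad = mean_absolute_deviation(observations)      # 1.5
population_sd = statistics.pstdev(observations)  # 2.0

print(mad, population_sd)
```

In this example the population standard deviation is larger than the MAD, because squaring gives more weight to the larger deviations.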
population variance
the _______________________________ is defined as the average of the squared deviations from the mean.
population standard deviation
the ________________________________ is the square root of the population variance.
sample standard deviation
the __________________________________ can be calculated by taking the square root of the sample variance.
absolute frequency
the ___________________________________ or simply, the frequency, is the actual number of observations that fall within a given interval.
zero
the arithmetic mean is the only measure of central tendency for which the sum of the deviations from the mean is ___________________.
risk
the common theme in finance and investments is the tradeoff between reward and variability, where the central tendency is the measure of the reward and dispersion is a measure of _________.
weighted mean
the computation of a ________________________ recognizes that different observations may have a disproportionate influence on the mean.
frequency distribution
the data employed with a _______________________________________ may be measured using any type of measurement scale.
frequency distribution
the following procedure describes how to construct a _________________________: 1. Define the intervals 2. Tally the observations 3. Count the observations
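The three steps above can be sketched as follows, with relative and cumulative frequencies added. The return values and interval boundaries are hypothetical, and the half-open convention [low, high) is one assumed way to keep the intervals nonoverlapping.

```python
from itertools import accumulate

# Hypothetical returns in percent.
returns = [-4.2, -1.5, 0.3, 1.1, 2.8, 3.0, 4.9, 6.7, 7.5, 9.9]

# Step 1: define nonoverlapping intervals, here treated as [low, high).
intervals = [(-5, 0), (0, 5), (5, 10)]

# Steps 2 and 3: tally and count the observations in each interval.
absolute = [sum(1 for r in returns if low <= r < high) for low, high in intervals]

# Relative frequency: absolute frequency divided by the total number of observations.
relative = [count / len(returns) for count in absolute]

# Cumulative frequencies: running sums from the lowest interval upward.
cumulative_absolute = list(accumulate(absolute))
cumulative_relative = list(accumulate(relative))

for interval, a, r in zip(intervals, absolute, relative):
    print(interval, a, r)
```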
less than or equal
the geometric mean is always ________________ to the arithmetic mean, and the difference increases as the dispersion of the observations increases.
any distribution
the importance of Chebyshev's inequality is that it applies to _________________________.
nonoverlapping
the intervals in a frequency distribution should always be __________________________________________.
n-1
the most noteworthy difference between the formulas for population variance and sample variance is that the denominator for s^2 is __________________.
arithmetic means
the population mean and sample mean are both examples of ____________________. The _______________________ is the sum of the observation values divided by the number of observations.
portfolio
the return for a portfolio is the weighted average of the returns of the individual assets in the ______________.
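A minimal sketch of a portfolio return as a weighted mean, assuming the weights are fractions of portfolio value that sum to 1. The function name, weights, and asset returns are hypothetical.

```python
def portfolio_return(weights, returns):
    """Weighted average of asset returns; weights are assumed to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(w * r for w, r in zip(weights, returns))

# Hypothetical three-asset portfolio.
weights = [0.5, 0.3, 0.2]
asset_returns = [0.10, 0.04, -0.02]

result = portfolio_return(weights, asset_returns)
print(result)  # approximately 0.058
```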
population mean
to compute the ________________________, all the observed values in the population are summed and divided by the number of observations in the population, N.
frequency polygon
to construct a _____________________________, the midpoint of each interval is plotted on the horizontal axis, and the absolute frequency for that interval is plotted on the vertical axis. Each point is then connected with a straight line. (line graph)
unimodal
when a distribution has one value that appears most frequently, it is said to be ______________________.
bimodal, trimodal
when a set of data has two or three values that occur most frequently, it is said to be ________________ or _________________, respectively.
median
when the arithmetic mean is affected by extremely large or small values (outliers), the __________________ is a better measure of central tendency than the mean because it is not affected by extreme values.
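A small illustration of this point with made-up numbers: replacing one observation with an extreme value moves the mean sharply but leaves the median unchanged.

```python
import statistics

# Hypothetical data, then the same data with the largest value replaced by an outlier.
without_outlier = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

print(statistics.mean(without_outlier), statistics.median(without_outlier))  # 4 4
print(statistics.mean(with_outlier), statistics.median(with_outlier))        # 14.8 4
```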