1.2.2. Measures of Central Tendency, Quantiles, and Dispersion
Computing the Population Mean and Sample Mean
The scores of 20 students on a 100-point exam are given as 77, 90, 57, 85, 68, 31, 45, 86, 46, 98, 25, 10, 57, 67, 88, 77, 34, 89, 47, and 77. Calculate the population mean and the mean for a sample including only the first 5 observations. Show that the sum of the deviations from the sample mean equals zero.
Solution
Population mean = 1,254/20 = 62.7
Sample mean = (77 + 90 + 57 + 85 + 68)/5 = 377/5 = 75.4
Sum of deviations = (77 − 75.4) + (90 − 75.4) + (57 − 75.4) + (85 − 75.4) + (68 − 75.4) = 0
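The arithmetic above can be checked with a short Python sketch. The variable names are illustrative only; the data are the 20 exam scores from the example.

```python
# Exam scores from the worked example (treated as the full population).
scores = [77, 90, 57, 85, 68, 31, 45, 86, 46, 98,
          25, 10, 57, 67, 88, 77, 34, 89, 47, 77]

population_mean = sum(scores) / len(scores)

# The sample is the first five observations.
sample = scores[:5]
sample_mean = sum(sample) / len(sample)

# Deviations from the sample mean should cancel (up to rounding error).
deviation_sum = sum(x - sample_mean for x in sample)

print(population_mean, sample_mean, abs(deviation_sum) < 1e-9)
```

The final comparison uses a small tolerance because floating-point subtraction may leave a tiny residual instead of an exact zero.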
Calculating the Weighted Mean
An individual invests 30% of her portfolio in Stock A, 40% in Stock B, and 30% in Stock C. The expected returns on Stock A, B, and C are 10%, 14%, and 6% respectively. What is the portfolio's expected return?
Solution
Expected return = (0.3 × 0.10) + (0.4 × 0.14) + (0.3 × 0.06) = 0.104 or 10.4%
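As a rough sketch, the same weighted mean can be computed in Python by pairing each weight with its return. The list names are assumptions made for illustration.

```python
# Portfolio weights and expected returns for Stocks A, B, and C.
weights = [0.30, 0.40, 0.30]
expected_returns = [0.10, 0.14, 0.06]

# The weights should sum to 1 for a fully invested portfolio.
assert abs(sum(weights) - 1.0) < 1e-12

# Weighted mean: sum of weight times return.
portfolio_return = sum(w * r for w, r in zip(weights, expected_returns))

print(f"{portfolio_return:.1%}")
```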
The mode is the most frequently occurring value in a data set. For grouped data, the mode is the interval with the greatest number of observations.
A data set may have no mode, one mode, or more than one mode.
The deviations of each observation from the mean sum to zero, so it does little good to average them. The mean absolute deviation (MAD), however, uses the absolute values of these deviations.
Although difficult to manipulate mathematically, the MAD does use all observations in the data set, whereas the range does not.
LOS j: Calculate and interpret measures of dispersion. Dispersion describes variability of the data series around the central tendency and is often described as risk in the context of average returns.
Although it cannot describe the shape of the distribution, the range (i.e., maximum less minimum value) is the simplest dispersion measure. If extreme outliers exist, the range will not be representative of the average risk to return.
Cutoffs for the other quantiles may be stated in terms of percentages; for example, 75% of observations fall below the 4th quartile (i.e., at or below the third quartile cutoff, Q3).
The interquartile range (IQR) is the difference between the top of the third quartile and the bottom of the second quartile (i.e., top of the first quartile), or IQR = Q3 − Q1.
When the observations are returns, for example, the coefficient of variation measures the amount of risk (standard deviation) per unit of reward (mean return).
An issue that may arise, especially when dealing with returns, is that if the mean of X is negative, the statistic is meaningless.
The geometric mean uses multiplication rather than addition to better establish an average proportional change.
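Because the geometric mean formula takes the nth root of a product, all values must be positive. For returns, this is achieved by adding 1 to each periodic return before multiplying.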
LOS k: Calculate and interpret target downside deviation. Target downside deviation (TDD), also known as target semi-deviation, measures dispersion of observations below the target rather than around the average.
First, identify observations less than the target, and then determine the average dispersion below the target (see image). Because it considers only shortfalls, this measure focuses on downside risk better than the standard deviation does, although the calculation is similar. As an investor's return target increases, the number of observations used for a target downside deviation calculation in a notably large sample will most likely increase.
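The two steps described above might be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive formula: it divides the squared shortfalls by n − 1, where n is the total number of observations (the convention commonly used for sample data), and the return series is hypothetical.

```python
import math

def target_downside_deviation(returns, target):
    """Square root of the summed squared shortfalls below target, over n - 1.

    Assumption: n is the full sample size, not the count of shortfalls.
    """
    n = len(returns)
    squared_shortfalls = [(r - target) ** 2 for r in returns if r < target]
    return math.sqrt(sum(squared_shortfalls) / (n - 1))

# Hypothetical periodic returns, for illustration only.
returns = [0.05, -0.02, 0.01, 0.08, -0.04]

tdd_3pct = target_downside_deviation(returns, 0.03)

# A higher target captures more observations as shortfalls.
count_below_3pct = sum(r < 0.03 for r in returns)
count_below_6pct = sum(r < 0.06 for r in returns)

print(round(tdd_3pct, 4), count_below_3pct, count_below_6pct)
```

The two counts show the effect noted in the text: raising the target from 3% to 6% brings one more observation into the calculation.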
Relative dispersion describes dispersion relative to a reference value or benchmark. One measure of relative dispersion is the coefficient of variation (CV), which identifies risk per unit of return.
For returns, the CV measures the standard deviation of returns per unit of average return. The statistic is meaningless, however, if average return is negative.
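A minimal sketch of the coefficient of variation is shown below. It assumes the sample standard deviation (n − 1 denominator) and reuses the six annual returns from the mean absolute deviation example in this section. The function name and the decision to raise an error for a non-positive mean are illustrative choices.

```python
from statistics import mean, stdev

def coefficient_of_variation(returns):
    """Sample standard deviation per unit of mean return."""
    average = mean(returns)
    if average <= 0:
        # The ratio has no sensible interpretation for a non-positive mean.
        raise ValueError("CV is not meaningful when the mean is not positive")
    return stdev(returns) / average

annual_returns = [4, 1, 8, 10, 3, 15]  # percent per year
cv = coefficient_of_variation(annual_returns)

print(round(cv, 2))
```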
The arithmetic mean provides the best estimate for a one-period growth rate because it is the average of all the periodic growth rates over time.
Geometric mean, however, indicates how the periodic returns are linked to the total value change over time.
The mean is also unique; that is, each data set has only one mean.
However, the arithmetic mean is sensitive to extreme values; that is, disproportionately large or small values will pull the mean toward them.
The harmonic mean will be less than the geometric mean which is, in turn, less than the arithmetic mean . . . unless all observations are equal. HM < GM < AM
In fact, they are related: GM² = HM × AM (the relationship is exact for two observations). The harmonic mean would allow an analyst to use a data set without trimming or replacing the outlier while still giving less influence to an outlier.
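The ordering of the three means can be checked numerically with Python's standard `statistics` module. The sketch below uses the share prices from the harmonic mean example plus a hypothetical two-value data set. It also illustrates that the squared geometric mean equals the product of the harmonic and arithmetic means exactly for two observations, but only approximately in general.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Four positive observations (share prices from the harmonic mean example).
prices = [4, 5, 6, 7]
am = mean(prices)
gm = geometric_mean(prices)
hm = harmonic_mean(prices)
print(hm < gm < am)

# Hypothetical pair of observations: GM squared equals HM times AM.
pair = [4, 9]
pair_am = mean(pair)
pair_gm = geometric_mean(pair)
pair_hm = harmonic_mean(pair)
print(abs(pair_gm ** 2 - pair_hm * pair_am) < 1e-9)
```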
The arithmetic mean essentially assigns a weight of 1/n, an equal weighting, to each observation.
In financial applications, it often makes sense to weight an observation to assess its value in the ultimate outcome.
The harmonic mean reduces the tendency of extreme outliers to pull the mean in that direction.
It is also useful in dollar cost averaging or other applications in which a ratio is applied to a fixed quantity to obtain a variable number of units.
The geometric mean represents the average growth rate of an investment over time and may be called the compound annual growth rate or CAGR.
It is the periodic rate of growth that can take a beginning investment forward in time to its ultimate value.
The positional location, L, in an ascending array of the value below which y% of observations lie is:
Ly=(n+1)(y/100) When L is a whole number, it represents an actual observation. When L is not a whole number, it becomes necessary to use the linear interpolation between the two observations around L.
Computing the Geometric Mean
The returns on XYZ common stock in the years 2000, 2001, 2002, and 2003 were 15.3%, 6.7%, −10%, and −2.3% respectively. Compute the geometric mean of these returns.
Solution
RG = [(1.153)(1.067)(0.90)(0.977)]^(1/4) − 1 = 1.0198 − 1 = 0.0198 or 1.98%
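The geometric mean return can be reproduced with a few lines of Python. This sketch converts each return to a growth factor (1 + R), multiplies the factors, takes the nth root, and subtracts 1. The variable names are illustrative.

```python
import math

# Annual returns on XYZ common stock, 2000 through 2003.
annual_returns = [0.153, 0.067, -0.10, -0.023]

# Growth factors must be positive for the nth root to be meaningful.
growth_factors = [1 + r for r in annual_returns]
n = len(growth_factors)

geometric_return = math.prod(growth_factors) ** (1 / n) - 1
arithmetic_return = sum(annual_returns) / n

print(f"{geometric_return:.2%}", f"{arithmetic_return:.2%}")
```

The arithmetic mean of the same returns is higher, which is consistent with the geometric mean falling below the arithmetic mean whenever returns vary.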
Calculation of the Median
Calculate the median score of the 20 scores: 77, 90, 57, 85, 68, 31, 45, 86, 46, 98, 25, 10, 57, 67, 88, 77, 34, 89, 47, and 77.
Solution
First we must arrange the scores in ascending or descending order. We'll go with ascending order.
10, 25, 31, 34, 45, 46, 47, 57, 57, 67, 68, 77, 77, 77, 85, 86, 88, 89, 90, 98
Because this is an even-numbered data set (20 observations), the median equals the average of the 10th (n/2) and 11th [(n+2)/2] observations. The median for this data set is therefore 67.5 [(67+68)/2].
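As a quick check, the positional rule for an even-numbered data set can be applied directly in Python and compared with the standard library's `statistics.median`. The same scores also illustrate the mode, found here with `statistics.multimode`, which returns every value tied for the highest frequency.

```python
from statistics import median, multimode

scores = [77, 90, 57, 85, 68, 31, 45, 86, 46, 98,
          25, 10, 57, 67, 88, 77, 34, 89, 47, 77]

ordered = sorted(scores)
n = len(ordered)

# Even n: average the observations in positions n/2 and (n + 2)/2.
# Python lists are zero-indexed, so subtract 1 from each position.
manual_median = (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2

library_median = median(scores)
modes = multimode(scores)

print(manual_median, library_median, modes)
```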
Calculating the Mean Absolute Deviation
Calculate the mean absolute deviation of the following data set that contains average returns earned by a portfolio manager over the last 6 years: 4%, 1%, 8%, 10%, 3%, 15%.
Solution
Mean = (4 + 1 + 8 + 10 + 3 + 15)/6 = 6.83%
MAD = (|4 − 6.83| + |1 − 6.83| + |8 − 6.83| + |10 − 6.83| + |3 − 6.83| + |15 − 6.83|)/6 = 4.167%
On average, the observations in the data set deviate 4.167% from the mean return of 6.83%.
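The MAD calculation can be sketched in Python as below. For comparison, the sketch also computes the population standard deviation of the same data with `statistics.pstdev`. The function name is an illustrative choice.

```python
from statistics import mean, pstdev

def mean_absolute_deviation(data):
    """Average absolute distance of each observation from the mean."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)

annual_returns = [4, 1, 8, 10, 3, 15]  # percent per year

mad = mean_absolute_deviation(annual_returns)
population_sd = pstdev(annual_returns)

print(round(mad, 3), mad <= population_sd)
```

For this data set the MAD is smaller than the standard deviation, because squaring gives the larger deviations more weight.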
Harmonic Mean
An investor purchased $5,000 worth of RPS Stock each month over the last 4 months at prices of $4, $5, $6, and $7. Determine the average cost of the shares acquired.
Solution
Harmonic mean = 4 / (1/4 + 1/5 + 1/6 + 1/7) = $5.27
To check this result, calculate the total number of shares purchased and compute the average price.
5,000/4 + 5,000/5 + 5,000/6 + 5,000/7 = 3,798 shares
The average price equals $20,000 / 3,798 = $5.27
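The dollar cost averaging check can be written as a short Python sketch. It computes the harmonic mean of the four prices and then confirms that it matches total dollars invested divided by total shares acquired. The variable names are illustrative.

```python
# Monthly purchase prices and the fixed amount invested each month.
prices = [4, 5, 6, 7]
monthly_investment = 5_000

# Harmonic mean: n divided by the sum of reciprocals.
harmonic_price = len(prices) / sum(1 / p for p in prices)

# Check: total dollars divided by total shares acquired.
total_shares = sum(monthly_investment / p for p in prices)
average_cost = monthly_investment * len(prices) / total_shares

print(round(harmonic_price, 2), round(total_shares), round(average_cost, 2))
```

The two figures agree because a fixed dollar amount buys more shares at low prices, which is the inverse weighting that the harmonic mean applies.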
Quartiles
Calculate the first quartile of a distribution that consists of the following asset returns: 10%, 23%, 13%, 17%, 19%, 5%, 4%. If we include one more return observation of 10% in our data set, what is the new value of the first quartile?
Solution
Ly = (n + 1)(y/100)
First, arrange the data in ascending order: 4%, 5%, 10%, 13%, 17%, 19%, 23%
The location of the first quartile = (7 + 1)(25/100) = 2, the 2nd item in the data set. One-fourth or 25% of the observed returns lie at or below the second observation from the left, which is 5%.
With the additional observation, we again begin by arranging the data in ascending order: 4%, 5%, 10%, 10%, 13%, 17%, 19%, 23%
The location of the first quartile = (8 + 1)(25/100) = 2.25
This means that once the data set has been rearranged in ascending order, the first quartile is the second observation from the left plus 0.25 times the difference between the 2nd and 3rd observations. Therefore, one-fourth of the observed returns are below 6.25% [5% + 0.25(10% − 5%)].
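The location formula and the linear interpolation step can be combined into one small Python function. This is a sketch under one added assumption: when the computed location falls before the first or after the last observation, the function returns the nearest end value, a case the notes do not address. The function name is hypothetical.

```python
def percentile_value(data, y):
    """Value at the y-th percentile using the location L = (n + 1) * y / 100."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * y / 100

    lower_position = int(location)          # whole-number part of L
    fraction = location - lower_position    # fractional part of L

    # Assumption: clamp to the end values when L falls outside 1..n.
    if lower_position < 1:
        return ordered[0]
    if lower_position >= n:
        return ordered[-1]

    below = ordered[lower_position - 1]     # positions are 1-based
    above = ordered[lower_position]
    return below + fraction * (above - below)

seven_returns = [10, 23, 13, 17, 19, 5, 4]
eight_returns = seven_returns + [10]

q1_seven = percentile_value(seven_returns, 25)
q1_eight = percentile_value(eight_returns, 25)
print(q1_seven, q1_eight)
```

Note that library routines such as NumPy's percentile function use different default location conventions, so their results may not match this formula exactly.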
Variance is defined as the average of the squared deviations around the mean, which takes care of the problem of differences from mean cancelling each other out.
The downside is that using the squared differences is counterintuitive; i.e., the unit values of the original data are not preserved.
When the variable is a rate or a ratio, the harmonic mean presents a better picture of central tendency.
The harmonic mean equals the reciprocal of the average of the reciprocals of the observations. An observation's weight in a harmonic mean is inversely proportional to its magnitude.
LOS g: Calculate and interpret measures of central tendency. LOS h: Evaluate alternative definitions of mean to address an investment problem. A measure of central tendency describes the central value or the midpoint of the arrayed data.
The mean is a parameter if it derives from population data and is a statistic if it derives from sample data.
The median is the middle value of the data once arranged in ascending or descending order. The median for an odd number of observations is the observation in the position identified by (n + 1)/2.
The median for an even number of observations is the average of the observations in the positions identified by n/2 and (n + 2)/2.
The geometric mean will almost always be less than the arithmetic mean, and the difference increases with the variability of the returns.
The only time the geometric mean equals the arithmetic mean is when there is no variability in the return data.
The arithmetic average is usually presented with the standard deviation to provide context. If the series is returns over time, the geometric return may also be provided.
The relationship between the geometric and arithmetic means is (see image)
Calculating Population Variance and Standard Deviation
Calculate the variance and standard deviation for five golfers, assuming that they represent the entire population of golfers participating in a particular tournament. Their scores are 67, 71, 72, 75, and 68.
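One way to work this example is with Python's standard `statistics` module, as sketched below. Because the five golfers are treated as the entire population, the population functions `pvariance` and `pstdev` (denominator n) apply. The sample versions `variance` and `stdev` (denominator n − 1) are shown only for contrast.

```python
from statistics import mean, pvariance, pstdev, variance, stdev

scores = [67, 71, 72, 75, 68]

average = mean(scores)

# Population measures: squared deviations averaged over n.
population_variance = pvariance(scores)
population_sd = pstdev(scores)

# Sample measures: squared deviations divided by n - 1.
sample_variance = variance(scores)
sample_sd = stdev(scores)

print(average, round(population_variance, 2), round(population_sd, 2))
print(round(sample_variance, 2), round(sample_sd, 2))
```

The sum of squared deviations is 41.2 in both cases; only the divisor changes, so the sample variance is larger than the population variance.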
Quantiles may be visualized by a box-and-whisker plot, which shows whiskers at the highest and lowest values and boxes for the middle quartile ranges.
To better identify outliers, the whiskers may instead be set at the boundaries of the IQR plus or minus some multiple of the IQR (1.5 times is a common choice). Outliers will then appear outside the whiskers.
Market index returns, for example, are computed as the sum of weighted price changes across all constituents of the index.
Weighted means are useful for establishing return from a portfolio of securities with different asset classes, among other things.
To help avoid the outlier problem, a trimmed mean includes only some percentage of the middle of the values. A trimmed mean excludes extreme outliers from the calculation, thus compromising the integrity of the data set.
A winsorized mean replaces outliers with the highest or lowest observation in some percentage of the middle data. A winsorized mean substitutes more likely values for extreme outliers, thus also compromising the integrity of the data set.
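The difference between trimming and winsorizing can be illustrated with a short Python sketch. Both helper functions are hypothetical and take a count k of observations to treat at each end rather than a percentage. The data set is made up for illustration and contains one extreme value.

```python
def trimmed_mean(data, k):
    """Mean after dropping the k lowest and k highest observations."""
    ordered = sorted(data)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def winsorized_mean(data, k):
    """Mean after replacing the k lowest and k highest observations
    with the nearest retained values."""
    ordered = sorted(data)
    low, high = ordered[k], ordered[-k - 1]
    clipped = [min(max(x, low), high) for x in ordered]
    return sum(clipped) / len(clipped)

# Hypothetical data with one extreme outlier (60).
observations = [1, 4, 5, 6, 7, 9, 60]

plain = sum(observations) / len(observations)
trimmed = trimmed_mean(observations, 1)
winsorized = winsorized_mean(observations, 1)

print(round(plain, 2), round(trimmed, 2), round(winsorized, 2))
```

Trimming shrinks the data set from seven observations to five, whereas winsorizing keeps all seven positions but changes two of the values.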
To summarize, use the
arithmetic mean if the data are well-behaved (i.e., no outliers);
geometric mean to represent the compound growth rate;
harmonic mean to suppress outliers (or use the trimmed mean or winsorized mean as well).
The geometric mean can be alternatively written for portfolio returns (see image).
The arithmetic mean is the sum of the observations divided by the number of observations.
In the context of sample data, the variance formula uses n − 1 rather than n in the denominator because only n − 1 observations are free to vary: if you know n − 1 observations and the sample mean, you can calculate the remaining observation.
The advantage of the median is that it is not sensitive to extreme values and is useful for asymmetrical distributions.
Mean absolute deviation will always be less than or equal to the standard deviation because the differences from the mean are squared prior to averaging, which gives more weight to large deviations.
LOS i: Calculate quantiles and interpret related visualizations. Where the median divides ascending (i.e., lowest to highest) observations in a data set into halves, quartiles divide the data into four groups, quintiles into five groups, deciles into 10ths, and percentiles into 100ths. The yth percentile describes the value at or below which y% of observations occur.
Standard deviation is the positive square root of the variance and is intuitive because it is stated in units of the original variable.
The arithmetic mean of a sample is the best estimate of the next observed value, and the sum of the deviations from the arithmetic mean always equals 0.