Statistical Concepts & Market Returns
ratio scales
Most refined level of measurement. Provides ranking and equal differences between scale values, and a true zero point as the origin. (ex. $) LOS7.a
geometric mean return formula
1 + RG = [(1 + R1) * (1 + R2) * ... * (1 + Rn)] ^ (1/n) LOS7.e
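A minimal Python sketch of the geometric mean return may help here. The function name and the sample returns are hypothetical, chosen only to illustrate compounding the growth factors (1 + Ri) and taking the n-th root.

```python
import math

def geometric_mean_return(returns):
    """Compound the periodic growth factors, take the n-th root, subtract 1."""
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0

# Hypothetical annual returns: +10%, -5%, +20%
sample_returns = [0.10, -0.05, 0.20]
rg = geometric_mean_return(sample_returns)
arithmetic = sum(sample_returns) / len(sample_returns)
print(f"geometric: {rg:.4f}, arithmetic: {arithmetic:.4f}")
```

With returns that vary from period to period, the geometric mean return comes out below the arithmetic mean.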
parameter
A characteristic of a population. LOS7.b
histogram
A graphical presentation of the absolute frequency distribution. Frequency is on the vertical axis, interval is on the horizontal axis. Allows us to see where most of the observations are concentrated. LOS7.d
sample
A subset of the population of interest. LOS7.a
frequency distribution
A tabular presentation of statistical data that aids the analysis of large data sets. LOS7.b
relative dispersion
Amount of variability in a distribution relative to a reference point or benchmark. Commonly measured with a coefficient of variation (CV). LOS7.i
mean absolute deviation (MAD) formula
Average of the absolute values of the deviations of individual observations from the arithmetic mean. MAD = (Σ |Xi - x̅|) / n LOS7.g
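As a rough illustration, MAD can be computed in a few lines of Python. The function name and the data are hypothetical.

```python
def mean_absolute_deviation(xs):
    """Average absolute distance of each observation from the arithmetic mean."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

data = [2, 4, 6, 8, 10]  # hypothetical observations, mean = 6
mad = mean_absolute_deviation(data)
print(mad)
```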
coefficient of variation (CV)
CV = standard deviation of x / average value of x. Measures the amount of dispersion in a distribution relative to the distribution's mean. Enables us to make a direct comparison of dispersion across different sets of data. In an investment setting, the CV is used to measure the risk (variability) per unit of expected return (mean). LOS7.i
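The sketch below compares the CV of two hypothetical funds. It assumes the sample standard deviation (n - 1 divisor) is the intended "standard deviation of x"; the fund names and returns are made up for illustration.

```python
import statistics

def coefficient_of_variation(xs):
    """Sample standard deviation divided by the arithmetic mean (assumed definition)."""
    return statistics.stdev(xs) / statistics.mean(xs)

# Hypothetical annual returns for two funds
fund_a = [0.04, 0.06, 0.05, 0.07, 0.03]
fund_b = [0.10, -0.02, 0.15, 0.01, 0.06]

cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)
print(f"CV A: {cv_a:.3f}, CV B: {cv_b:.3f}")
```

The fund with the lower CV carries less variability per unit of mean return.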
relative frequency
Calculated by dividing the frequency of each return interval by the total number of observations. The percentage of total observations falling within each interval. LOS7.c
cumulative absolute frequency / cumulative relative frequency
Calculated by summing the absolute or relative frequencies, starting at the lowest interval and progressing through the highest. LOS7.c
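A short sketch of how relative and cumulative frequencies follow from absolute frequencies. The interval counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for four return intervals, lowest interval first
absolute = [3, 7, 8, 2]
total = sum(absolute)

# Relative frequency: each interval's count divided by the total number of observations
relative = [f / total for f in absolute]

# Cumulative frequencies: running totals from the lowest interval to the highest
cumulative_absolute = list(accumulate(absolute))
cumulative_relative = list(accumulate(relative))

print(cumulative_absolute)
print(cumulative_relative)
```

The last cumulative absolute frequency equals the number of observations, and the last cumulative relative frequency equals 1 (up to floating-point rounding).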
uni-modal / bimodal / trimodal
Data sets that have one / two / three values that occur most frequently. LOS7.e
population variance formula
Defined as the average of the squared deviations from the mean. σ^2 = Σ (Xi - μ)^2 / N LOS7.g
platykurtic
Describes a distribution that is less peaked than a normal distribution. LOS7.l
leptokurtic
Describes a distribution that is more peaked than a normal distribution. LOS7.l
range
Distance between the largest and smallest value in the data set. range = max value - min value LOS7.g
sample skewness
Equal to the sum of the cubed deviations from the mean divided by the cubed standard deviation and the number of observations. Sk = 1/n * ( Σ (Xi - x̅)^3 / s^3 ) LOS7.l
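A sketch of the sample skewness formula as stated on this card, using the sample standard deviation (n - 1 divisor) for s. The function name and data sets are hypothetical.

```python
import math

def sample_skewness(xs):
    """Sk = (1/n) * sum((Xi - mean)^3) / s^3, with s the sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 3 for x in xs) / (s ** 3) / n

symmetric = [2, 4, 6, 8, 10]    # hypothetical symmetric data
right_tailed = [1, 2, 3, 4, 15]  # hypothetical data with one large upper outlier

print(sample_skewness(symmetric))
print(sample_skewness(right_tailed))
```

The symmetric set gives a skewness of zero, while the set with the large upper outlier gives a positive value.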
arithmetic means
Examples are population mean and sample mean. Most widely used measure of central tendency. The sum of the deviations of each observation in the data set from the mean is always zero. LOS7.e
Chebyshev's inequality
For any set of observations, whether sample or population data and regardless of the shape of the distribution, the percentage of the observations that lie within k standard deviations of the mean is at least 1 - 1/k^2 for all k > 1. LOS7.h
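The bound is easy to tabulate. The helper below is a hypothetical illustration of 1 - 1/k^2 for a few values of k.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is stated for k > 1")
    return 1.0 - 1.0 / k ** 2

for k in (1.5, 2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%}")
```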
Quantile
General term for the value at or below which a stated proportion of the data in a distribution lies. Ex. quartiles (quarters), quintiles (fifths), deciles (tenths), percentiles (hundredths). Position of the observation at the y-th percentile, with n observations sorted in ascending order: Ly = (n + 1) * y/100 LOS7.f
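A sketch of locating a percentile with Ly = (n + 1) * y/100. When Ly is not a whole number, this example assumes linear interpolation between the two neighboring observations, which the card does not state; the function name and data are hypothetical.

```python
def percentile(sorted_data, y):
    """Value at the y-th percentile using position Ly = (n + 1) * y / 100."""
    n = len(sorted_data)
    loc = (n + 1) * y / 100      # 1-based position in the sorted data
    lower = int(loc)             # whole part of the position
    frac = loc - lower           # fractional part, used for interpolation
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    # Assumption: interpolate linearly between adjacent observations
    return below + frac * (above - below)

data = sorted([2, 4, 6, 8, 10, 12, 14])  # hypothetical observations, n = 7
print(percentile(data, 50), percentile(data, 75), percentile(data, 30))
```

For this data the 75th percentile sits at position 8 * 0.75 = 6, the sixth observation.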
Analyzing investment returns & Arithmetic / Geometric Means
Geometric mean = use for returns over multiple years (compound growth). Arithmetic mean = use for the expected return over the next year. LOS7.g
mesokurtic
Has the same kurtosis as a normal distribution. LOS7.l
ordinal scales
Higher level of measurement than nominal scale. Each observation is assigned to a category, and the categories are then ordered with respect to a characteristic. LOS7.a
measures of central tendency
Identify the center, or average (mean) of the data set. LOS7.e
nominal scales
Level of measurement that contains the least information (ex. numbered from 1-10 in no order). LOS7.a
frequency polygon
Like a histogram. The midpoint of each interval is plotted on the horizontal axis, the absolute frequency is plotted on the vertical axis. Each point is connected with a straight line. LOS7.d
sample variance
Measure of dispersion that applies when we are evaluating a sample of n observations from the population. s^2 = Σ (Xi - x̅)^2 / (n-1) LOS7.g
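The sketch below contrasts the sample variance (n - 1 divisor) with the population variance (N divisor) on the same hypothetical data. The function names are illustrative only.

```python
def population_variance(xs):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 6, 8, 10]  # hypothetical observations
print(population_variance(data), sample_variance(data))
```

Dividing by n - 1 rather than N makes the sample variance slightly larger than the population formula applied to the same numbers.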
sample kurtosis
Measured using deviations raised to the fourth power. Sample kurtosis = 1/n * ( Σ (Xi - x̅)^4 / s^4 ) LOS7.l
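A sketch of the sample kurtosis formula as written on this card, with s^4 taken as the squared sample variance. Excess kurtosis is shown as kurtosis minus 3, the normal-distribution value; the function name and data are hypothetical.

```python
def sample_kurtosis(xs):
    """(1/n) * sum((Xi - mean)^4) / s^4, with s^2 the sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return sum((x - mean) ** 4 for x in xs) / (variance ** 2) / n

data = [2, 4, 6, 8, 10]  # hypothetical, evenly spread observations
kurt = sample_kurtosis(data)
excess = kurt - 3.0       # negative here, i.e. less peaked than normal
print(kurt, excess)
```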
inferential statistics
Pertains to the procedures used to make forecasts, estimates, or judgements about a large set of data on the basis of the statistical characteristics of a smaller set (a sample). LOS7.a
interval scales
Provide relative ranking, plus assurance that differences between scale values are equal. (ex. temperature) LOS7.a
measures of location
Quantiles and measures of central tendency are known collectively as Measures of Location. LOS7.f
weighted mean formula
Recognizes that different observations may have a disproportionate influence on the mean. x̅w = Σ wiXi = (w1X1 + w2X2 + ... + wnXn) LOS7.e
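A small sketch of the weighted mean, applied to a hypothetical three-asset portfolio. It assumes the weights sum to 1; the weights and returns are made up.

```python
def weighted_mean(weights, values):
    """Sum of wi * Xi, assuming the weights sum to 1."""
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical portfolio weights and asset returns
weights = [0.5, 0.3, 0.2]
asset_returns = [0.10, 0.05, -0.02]

portfolio_return = weighted_mean(weights, asset_returns)
print(f"{portfolio_return:.3f}")
```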
skewness
Refers to the extent to which a distribution is not symmetrical. Positively skewed (many outliers in the upper region, or right tail) or negatively skewed (many outliers in the lower region, or left tail). The mean is affected more than the median and mode and is pulled in the direction of the skew. LOS7.j
Sharpe measure (Sharpe ratio)
Sharpe ratio = (rp - rf) / σp, where rp = portfolio return, rf = risk-free return, σp = standard deviation of portfolio returns. Measures the excess return per unit of risk. LOS7.i
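A minimal sketch of the Sharpe ratio calculation. The input figures are hypothetical.

```python
def sharpe_ratio(portfolio_return, risk_free_return, portfolio_std_dev):
    """Excess return over the risk-free rate per unit of portfolio risk."""
    return (portfolio_return - risk_free_return) / portfolio_std_dev

# Hypothetical inputs: 12% portfolio return, 3% risk-free rate, 18% standard deviation
ratio = sharpe_ratio(0.12, 0.03, 0.18)
print(f"{ratio:.2f}")
```

A higher ratio indicates more excess return earned per unit of risk taken.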
population standard deviation
Square root of the population variance. σ = [ Σ (Xi - μ)^2 / N ] ^ (1/2) LOS7.g
sample standard deviation
Square root of the sample variance. s = [ Σ (Xi - x̅)^2 / (n-1) ] ^ (1/2) LOS7.g
median
The midpoint of a data set when the data is arranged in ascending or descending order. Important because mean can be affected by outliers. LOS7.e
population
The set of all possible members of a stated group. LOS7.a
mode
The value that occurs most frequently in a data set. LOS7.e
harmonic mean
Used for certain computations, such as the average cost of shares purchased over time. Harmonic mean = N / ( Σ 1 / Xi ) LOS7.e
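The average-cost use case can be sketched as follows, assuming an equal dollar amount is invested at each purchase price. The prices and the $100 amount are hypothetical.

```python
def harmonic_mean(xs):
    """N divided by the sum of reciprocals."""
    return len(xs) / sum(1.0 / x for x in xs)

# Hypothetical share prices at three purchase dates, $100 invested each time
prices = [8.0, 10.0, 12.5]
hm = harmonic_mean(prices)

# Cross-check: total dollars spent divided by total shares bought
shares_bought = sum(100.0 / p for p in prices)
average_cost = (100.0 * len(prices)) / shares_bought
print(f"{hm:.4f} {average_cost:.4f}")
```

The harmonic mean of the prices matches the average cost per share from the cross-check.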
sample statistic
Used to measure a characteristic of a sample. LOS7.b
descriptive statistics
Used to summarize the important characteristics of large data sets. LOS7.a
geometric mean (average) formula
Used when calculating investment returns over multiple periods or when measuring compound growth rates. G = (X1 * X2 * X3 * ... * Xn) ^ (1/n) LOS7.e
dispersion
Variability around the central tendency. LOS7.g
symmetrical distribution
A distribution that is identical on both sides of its mean. LOS7.j
kurtosis
A measure of the degree to which a distribution is more or less peaked than a normal distribution. Kurtosis for all normal distributions is 3. LOS7.l
outliers
Observations with extraordinarily large values, either positive or negative. LOS7.j
sample mean (average) formula
x̅ = Σ Xi / n LOS7.e
sum of mean deviations formula
Σ (Xi - x̅) = 0 LOS7.e
population mean (average) formula
μ = Σ Xi / N LOS7.e