CFA Level 1 - Section 2: Quantitative Methods - Reading 10 and 11: Probability and Sampling
Efficiency
An estimator is efficient if no other unbiased estimator of the same population parameter has a sampling distribution with smaller variance. That is, in repeated samples, analysts expect the estimates from an efficient estimator to be more tightly grouped around the mean than estimates from other unbiased estimators. For example, the sample mean is an efficient estimator of the population mean, and the sample variance is an efficient estimator of the population variance.
Unbiasedness
An estimator's expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate. For example, the sample mean is an unbiased estimator of the population mean because the expected value of the sample mean is equal to the population mean.
Students' t-Distribution Graph
The value of t can be determined from a t-table. The degrees of freedom for t are equal to the degrees of freedom for the estimate of σm, which is equal to N-1.
Assumptions Regarding Confidence Intervals
1.) The point estimate will always lie exactly at the midpoint of the confidence interval. This is because it is the "best" estimate for μ, and so the confidence interval expands out from it in both directions.
2.) The higher the percentage of confidence, the wider the interval will be. As the percentage is increased, a wider interval is needed to give us a greater chance of capturing the unknown population value within that interval.
3.) The width of the confidence interval is always twice the term after the ± sign, that is, twice the reliability factor x standard error. The width is simply the upper limit minus the lower limit.
It is very rare for a researcher wishing to estimate the mean of a population to already know its standard deviation. Therefore, the construction of a confidence interval almost always involves the estimation of both μ and σ.
Biased Sample
A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population. For instance, consider a research project on attitudes toward sex. Collecting the data by publishing a questionnaire in a magazine and asking people to fill it out and send it in would produce a biased sample. People interested enough to spend their time and energy filling out and sending in the questionnaire are likely to have different attitudes toward sex than those not taking the time to fill out the questionnaire. It is important to realize that it is the method used to create the sample, not the actual makeup of the sample, that defines the bias. A random sample that is very different from the population is not biased: it is by definition not systematically different from the population. It is randomly different.
Binomial Random Variable
A binomial random variable X is defined as the number of successes in n Bernoulli trials. The assumptions are: The probability (p) of success is constant for all trials. Similarly, the failure probability 1 - p stays constant throughout the experiment. The trials are independent. Thus, the outcome of one trial does not in any way affect the outcome of any subsequent trial. The sampling is done with replacement. This means that once an outcome has occurred, it is not precluded from occurring again.
Confidence Interval
A confidence interval is an interval for which one can assert with a given probability 1 - α, called the degree of confidence, that it will contain the parameter it is intended to estimate. This interval is often referred to as the (1 - α)% confidence interval for the parameter, where α is referred to as the level of significance. The end points of a confidence interval are called the lower and upper confidence limits. For example, suppose that a 95% confidence interval for the population mean is 20 to 40. This means that:
1.) There is a 95% probability that the population mean lies in the range of 20 to 40.
2.) "95%" is the degree of confidence.
3.) "5%" is the level of significance.
4.) 20 and 40 are the lower and upper confidence limits, respectively.
Consistency
A consistent estimator is one for which the probability of accurate estimates (estimates close to the value of the population parameter) increases as sample size increases. In other words, a consistent estimator's sampling distribution becomes concentrated on the value of the parameter it is intended to estimate as the sample size approaches infinity. As the sample size increases to infinity, the standard error of the sample mean declines to 0 and the sampling distribution concentrates around the population mean. Therefore, the sample mean is a consistent estimator of the population mean.
Cumulative Frequency Distribution
A cumulative frequency distribution is a plot of the number of observations falling in or below an interval. It can show either the actual frequencies at or below each interval or the percentage of the scores at or below each interval. The plot can be a histogram or a polygon. Example: Consider a probability function: p(X) = X/6 for X = 1, 2, 3 and p(X) = 0 otherwise. In a previous example it was shown that p(1) = 1/6, p(2) = 2/6, and p(3) = 3/6. F(1) indicates the probability that has been accumulated up to and including the point X = 1. Clearly, 1/6 of the probability has been accumulated up to this point, so F(1) = 1/6. F(2) indicates the probability that has been accumulated up to and including the point X = 2. When X = 2 is reached, 1/6 has been accumulated from X = 1 and 2/6 from X = 2; the total accumulation is 1/6 + 2/6 = 3/6, or half of the probability, so F(2) = 3/6. F(3) indicates the probability that has been accumulated up to and including the point X = 3. By the time X = 3 is reached, all the probability has been accumulated: 1/6 from X = 1, 2/6 from X = 2 and 3/6 from X = 3. Thus, 1/6 + 2/6 + 3/6 = 1, so F(3) = 1. It is also possible to calculate F(X) for intermediate values. F(0) = 0, as no probability has been accumulated up to the point X = 0; F(1.5) = 1/6, as by the time X = 1.5 is reached, 1/6 of the probability has been accumulated from X = 1; F(7) = 1, as by the time 7 is reached, all possible probability from X = 1, 2 and 3 has been collected.
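A minimal Python sketch of this step-function behavior, evaluating F at the same points (the function names are illustrative):

```python
# Cumulative distribution function F(x) for p(X) = X/6, X = 1, 2, 3.
def p(x):
    return x / 6 if x in (1, 2, 3) else 0.0

def F(x):
    # Accumulate probability over all outcomes less than or equal to x.
    return sum(p(k) for k in (1, 2, 3) if k <= x)

for x in (0, 1, 1.5, 2, 3, 7):
    print(f"F({x}) = {F(x):.4f}")
# F(0) = 0.0000, F(1) = 0.1667, F(1.5) = 0.1667,
# F(2) = 0.5000, F(3) = 1.0000, F(7) = 1.0000
```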
Discretely Compounded Rate of Return
A discretely compounded rate of return measures the rate of change in the value of an asset over a period under the assumption that the number of compounding periods is countable. Most standard deposit and loan instruments are compounded at discrete and evenly spaced periods, such as annually or monthly. For example, suppose that the holding period return on a stock over a year is 50%. If the rate of return is compounded on a quarterly basis, the compounded quarterly rate of return on the stock is (1 + 0.5)^(1/4) - 1 = 10.67%.
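A quick sketch of the conversion, assuming the 50% holding period return and four evenly spaced periods from the example:

```python
# Convert a 50% annual holding period return to a compounded
# quarterly rate, assuming four evenly spaced compounding periods.
holding_period_return = 0.50
quarterly_rate = (1 + holding_period_return) ** (1 / 4) - 1
print(f"{quarterly_rate:.4%}")  # 10.6682%, i.e., about 10.67%
```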
Multivariate Distribution
A multivariate distribution specifies the probabilities for a group of related random variables. A multivariate normal distribution describes a group of continuous random variables when each of the individual variables follows a normal distribution. Each individual normal random variable has its own mean and its own standard deviation, and hence its own variance. When you are dealing with two or more random variables in tandem, the strength of the relationship between (or among) the variables assumes huge importance. You will recall that the strength of the relationship between two random variables is known as the correlation. When there is a group of assets, the distribution of returns can be modeled either individually for each asset or for the assets as a group. A multivariate normal distribution for the returns on n stocks is completely defined by three lists of parameters:
1.) The list of the mean returns on the individual securities (n means in total).
2.) The list of the securities' variances of return (n variances in total).
3.) The list of all the distinct pairwise return correlations (n(n-1)/2 distinct correlations in total).
The higher the correlation values, the higher the variance of the overall portfolio. In general, it is better to build a portfolio of stocks whose prices are not strongly correlated with each other, as this lowers the variance of the overall portfolio. It is the correlation values that distinguish a multivariate normal distribution from a univariate normal distribution. Consider a portfolio consisting of 2 assets (n = 2). The multivariate normal distribution can be defined with 2 means, 2 variances, and 2 x (2 - 1)/2 = 1 correlation. If an analyst has a portfolio of 100 securities, the multivariate normal distribution can be defined with 100 means, 100 variances, and 100 x (100 - 1)/2 = 4,950 correlations. Portfolio return is a weighted average of the returns on the 100 securities. A weighted average is a linear combination. Thus, portfolio return is normally distributed if the individual security returns are (jointly) normally distributed. To specify the normal distribution for portfolio return, analysts need the means, variances, and distinct pairwise correlations of the component securities.
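A short sketch of how the three parameter lists define the portfolio's distribution for n = 2; all numbers (means, standard deviations, correlation, weights) are illustrative assumptions:

```python
import numpy as np

# n = 2 assets: 2 means, 2 variances, and 1 correlation fully define
# the multivariate normal distribution of returns.
means = np.array([0.10, 0.08])      # expected returns (assumed)
stdevs = np.array([0.15, 0.12])     # return standard deviations (assumed)
corr = 0.30                         # pairwise correlation (assumed)
weights = np.array([0.60, 0.40])    # portfolio weights (assumed)

# Build the covariance matrix from the standard deviations and correlation.
corr_matrix = np.array([[1.0, corr], [corr, 1.0]])
cov = np.outer(stdevs, stdevs) * corr_matrix

# Portfolio return is a linear combination, hence normally distributed
# with these parameters if the asset returns are jointly normal.
port_mean = weights @ means
port_sd = (weights @ cov @ weights) ** 0.5
print(port_mean, port_sd)   # lowering corr lowers port_sd
```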
Probability Distribution
A probability distribution specifies the probabilities of the possible outcomes of a random variable. If you toss a coin 3 times, the possible outcomes are as follows (where H means heads and T means tails): TTT, TTH, THT, HTT, THH, HTH, HHT, HHH. In total, there are 8 possible outcomes. Of these: only 1 (TTT) has 0 heads occurring; three (TTH, THT and HTT) have 1 head occurring; three (THH, HTH and HHT) have 2 heads occurring; one (HHH) has 3 heads occurring. Thus, if x = number of heads in 3 tosses of a coin, then x = 0, 1, 2 or 3, and the respective probabilities are 1/8, 3/8, 3/8 and 1/8, as you have just seen. So:
p(0) = P(0 heads) = 1/8
p(1) = P(1 head) = 3/8
p(2) = P(2 heads) = 3/8
p(3) = P(3 heads) = 1/8
This is a probability distribution; it records probabilities for each possible outcome of the random variable.
Lognormal Distribution
A random variable, Y, follows a lognormal distribution if its natural logarithm, lnY, is normally distributed. You can think of the term lognormal as "the log is normal." For example, suppose X is a normal random variable and Y = e^X. Then lnY = ln(e^X) = X. Because X is normally distributed, Y follows a lognormal distribution.
1.) Like the normal distribution, the lognormal distribution is completely described by two parameters: mean and variance.
2.) Unlike the normal distribution, the lognormal distribution is defined in terms of the parameters of the associated normal distribution. Note that the mean of Y is not equal to the mean of X, and the variance of Y is not equal to the variance of X. In contrast, the normal distribution is defined by its own mean and variance.
3.) The lognormal distribution is bounded below by 0. In contrast, the normal distribution extends to negative infinity without limit.
4.) The lognormal distribution is skewed to the right (i.e., it has a long right tail). In contrast, the normal distribution is bell-shaped (i.e., it is symmetrical).
The reverse is also true: if a random variable X follows a normal distribution, then e^X follows a lognormal distribution.
Sampling Distribution
A sample statistic itself is a random variable, which varies depending upon the composition of the sample. It therefore has a probability distribution. The sampling distribution of a statistic is the distribution of all the distinct possible values that the statistic can assume when computed from samples of the same size randomly drawn from the same population. The most commonly used sample statistics include mean, variance, and standard deviation. If you compute the mean of a sample of 10 numbers, the value you obtain will not equal the population mean exactly; by chance, it will be a little bit higher or a little bit lower. If you sampled sets of 10 numbers over and over again (computing the mean for each set), you would find that some sample means come much closer to the population mean than others. Some would be higher than the population mean and some would be lower. Imagine sampling 10 numbers and computing the mean over and over again, say about 1,000 times, and then constructing a relative frequency distribution of those 1,000 means. This distribution of means is a very good approximation to the sampling distribution of the mean. The sampling distribution of the mean is a theoretical distribution that is approached as the number of samples in the relative frequency distribution increases. With 1,000 samples, the relative frequency distribution is quite close; with 10,000, it is even closer. As the number of samples approaches infinity, the relative frequency distribution approaches the sampling distribution. The sampling distribution of the mean for a sample size of 10 is just an example; there is a different sampling distribution for other sample sizes. Also, keep in mind that the relative frequency distribution approaches the sampling distribution as the number of samples increases, not as the sample size increases, since there is a different sampling distribution for each sample size. A sampling distribution can also be defined as the relative frequency distribution that would be obtained if all possible samples of a particular sample size were taken. For example, the sampling distribution of the mean for a sample size of 10 would be constructed by computing the mean for each of the possible ways in which 10 scores could be sampled from the population and creating a relative frequency distribution of these means. Although these two definitions may seem different, they are actually the same: Both procedures produce exactly the same sampling distribution. Statistics other than the mean have sampling distributions too. The sampling distribution of the median is the distribution that would result if the median instead of the mean were computed in each sample.
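A minimal simulation sketch, assuming an illustrative normal population, that approximates the sampling distribution of the mean for samples of size 10 and compares its spread to σ/√N:

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate the sampling distribution of the mean for samples of
# size 10 by drawing many samples and recording each sample's mean.
mu, sigma, n = 50.0, 10.0, 10   # illustrative population parameters
sample_means = [rng.normal(mu, sigma, n).mean() for _ in range(10_000)]

print(np.mean(sample_means))                # close to mu
# The spread of the means approximates the standard error sigma/sqrt(n).
print(np.std(sample_means), sigma / n ** 0.5)
```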
Simple Random Sample
A simple random sample is a sample obtained in such a way that each element of a population has an equal probability of being selected. The selection of any one element has no impact on the chance of selecting another element. A sample is random if the method for obtaining the sample meets the criterion of randomness (each element having an equal chance at each draw). "Simple" indicates that this process is not difficult, and "random" indicates that you don't know in advance which observations will be selected in the sample. The actual composition of the sample itself does not determine whether or not it's a random sample.
Discrete Probability Distribution
A table, graph, or rule that associates a probability P(X = x) with each possible value x that the discrete random variable X can assume is called a discrete probability distribution. It is a theoretical model for the relative frequency distribution of a population.
Cumulative Distribution Function
Analysts are often interested in finding the probability of a range of outcomes rather than a specific outcome. A cumulative distribution function (cdf) gives the probability that a random variable X is less than or equal to a particular value x, P(X ≤ x). In contrast, a probability function is used to find the probability of a specific outcome. To derive a cumulative distribution function F(x), simply sum the values of the probability function for all outcomes less than or equal to x. The two characteristics are:
1.) The cumulative distribution function lies between 0 and 1 for any x: 0 ≤ F(x) ≤ 1.
2.) As x increases, the cdf either increases or remains constant.
Given the cumulative distribution function, the probabilities for the random variable can also be calculated. In general: P(X = xn) = F(xn) - F(xn-1)
Point Estimates
Analysts can use the sample mean to estimate the population mean, and the sample standard deviation to estimate the population standard deviation. The sample mean and sample standard deviation are point estimates.
Cross-Sectional Data
Cross-sectional data are observations that come from different individuals or groups at a single point in time. If one considered the closing prices of a group of 20 different tech stocks on December 15, 1986, this would be an example of cross-sectional data. Note that the underlying population should consist of members with similar characteristics. For example, suppose you are interested in how much companies spend on research and development expenses. Firms in some industries, such as retail, spend little on research and development (R&D), while firms in industries such as technology spend heavily on R&D. Therefore, it's inappropriate to summarize R&D data across all companies. Rather, analysts should summarize R&D data by industry and then analyze the data in each industry group. Other examples of cross-sectional data would be an inventory of all ice creams in stock at a particular store and a list of grades obtained by a class of students on a specific test.
Data-Snooping Bias
Data-snooping bias is the bias in the inference drawn as a result of prying into the empirical results of others to guide your own analysis. Finding seemingly significant but in fact spurious patterns in data is a serious problem in financial analysis. Although it afflicts all non-experimental sciences, data-snooping is particularly problematic for financial analysis because of the large number of empirical studies performed on the same datasets. Given enough time, enough attempts, and enough imagination, almost any pattern can be teased out of any dataset. In some cases, these spurious patterns are statistically small, almost unnoticeable in isolation. But because small effects in financial calculations can often lead to very large differences in investment performance, data-snooping biases can be surprisingly substantial. Data-snooping bias can easily lead to data-mining bias.
Probability Function
Every random variable is associated with a probability distribution that describes the variable completely. A probability function is one way to view a probability distribution. It specifies the probability that the random variable takes on a specific value; P(X = x) is the probability that a random variable X takes on the value x. A probability function has two key properties:
1.) 0 ≤ P(X = x) ≤ 1, because probability is a number between 0 and 1.
2.) ΣP(X = x) = 1. The sum of the probabilities P(X = x) over all values of X equals 1. If there is an exhaustive list of the distinct possible outcomes of a random variable and the probabilities of each are added up, the probabilities must sum to 1.
The following examples use these two properties to examine whether a given function is a probability function.
Example 1: p(x) = x/6 for X = 1, 2, 3, and p(x) = 0 otherwise. Substituting into p(x): p(1) = 1/6, p(2) = 2/6 and p(3) = 3/6. It is not necessary to substitute in any other values, as p(x) is non-zero only for X = 1, 2 and 3. In all 3 cases, p(x) lies between 0 and 1, as 1/6, 2/6 and 3/6 are all values in the range 0 to 1 inclusive. So the first property is satisfied. Summing the probabilities gives 1/6 + 2/6 + 3/6 = 1, showing the second property is also satisfied.
Example 2: p(x) = (2x - 3)/16 for X = 1, 2, 3, 4 and p(x) = 0 otherwise. Substituting into p(x): p(1) = -1/16. Stop here: it is impossible for any probability to be negative, so it's not necessary to continue. Property 1 is violated, so p(x) is not a probability function.
Note that in the continuous case, individual values have zero probability, so P(X = 5), say, is 0 if X is continuous. In a continuous case, only a range of values can be considered (for example, 0 < X < 10), whereas in a discrete case, individual values have positive probabilities associated with them.
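A small sketch that applies the two properties to the candidate functions from Examples 1 and 2 (the helper name check is illustrative):

```python
# Check the two probability-function properties for a candidate p(x)
# defined over a finite list of outcomes.
def check(p, outcomes):
    values = [p(x) for x in outcomes]
    in_range = all(0 <= v <= 1 for v in values)   # property 1
    sums_to_one = abs(sum(values) - 1) < 1e-12    # property 2
    return in_range and sums_to_one

print(check(lambda x: x / 6, [1, 2, 3]))                 # True
print(check(lambda x: (2 * x - 3) / 16, [1, 2, 3, 4]))   # False: p(1) < 0
```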
Probability Density Function (pdf)
For continuous random variables, the probability function is denoted f(x) and called the probability density function (pdf), or just the density. This function is effectively the continuous analogue of the discrete probability function p(x).
1.) The probability density function, which has the symbol f(x), does not give probabilities, despite its name. Instead, it is the area between the graph and the horizontal axis that gives probabilities. Because of this, the height of f(x) is not restricted to the range 0 to 1, and the graph, which in itself is not a probability, is unrestricted as far as its height is concerned.
2.) From this information, it follows that the area under the entire graph (i.e., between the graph and the x-axis) must equal 1, because this area encapsulates all the probability contained in the random variable. Recall that for discrete distributions, the probabilities add up to 1.
3.) Because continuous random variables are concerned with a range of values, individual values have no probabilities, because there is no area associated with individual values. Rather, probabilities are calculated over a range of values. Another way of saying this is that P(X = x) = 0 for every individual x.
4.) If a discrete random variable has many possible outcomes, then it can be treated as a continuous random variable for conciseness, and ranges of values can be considered in determining probabilities.
Continuous Probability Distribution
If a continuous random variable is equally likely to fall at any point between its maximum and minimum values, it is a continuous uniform random variable, and its probability distribution is a continuous probability distribution. For example, suppose the possible outcomes are all values from 1 to 8 (inclusive), with every value in that range equally likely (i.e., the distribution is uniform).
1.) The probability density function is a horizontal line with a height of 1/(b-a) over the range of values from a to b.
2.) The cumulative distribution function is a line sloping upward from 0 to 1 over the range of values from a to b, and a horizontal line with a height of 1 when the value of the variable equals or exceeds b.
Historical simulation
In historical simulation, samples from a historical record of returns (or other underlying variables) are used to simulate a process, on the grounds that the historical record provides the most direct evidence on distributions (and that the past applies to the future). In contrast, Monte Carlo simulation uses a random number generator with a specified distribution. A drawback is that any risk not represented in the time period selected will not be reflected in the simulation. For example, if a stock market crash did not take place in the sample period, such a risk will not be reflected in the simulation. In addition, this method does not lend itself to "what-if" analysis.
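A minimal sketch of the idea, resampling an illustrative (assumed) return history with replacement:

```python
import numpy as np

rng = np.random.default_rng(1)

# Historical simulation: resample past returns (with replacement)
# to simulate future one-period outcomes.
historical_returns = np.array([0.02, -0.01, 0.03, -0.04, 0.01, 0.05])
simulated = rng.choice(historical_returns, size=10_000, replace=True)

# Any scenario absent from the record (e.g., a crash) cannot appear
# in the simulated outcomes.
print(simulated.mean(), simulated.std())
```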
Z-Statistic for a Standard Normal Random Variable
If a population is normally distributed with a known variance, a z-statistic is used as the reliability factor to construct confidence intervals for the population mean. In practice, the population standard deviation is rarely known. However, learning how to compute a confidence interval when the standard deviation is known is an excellent introduction to how to compute a confidence interval when the standard deviation has to be estimated. Three values are used to construct a confidence interval for μ:
1.) The sample mean (m)
2.) The value of z (which depends on the level of confidence)
3.) The standard error of the mean (σm)
The confidence interval has m for its center and extends a distance equal to the product of z and σm in both directions. Therefore, the formula for a confidence interval is: m - zσm <= μ <= m + zσm. For a (1 - α)% confidence interval for the population mean, the z-statistic to be used is z(α/2). z(α/2) denotes the point of the standard normal distribution such that α/2 of the probability falls in the right-hand tail. Effectively, the (1 - α)% of the area that makes up the confidence interval falls in the center of the graph, symmetrically around the mean. This leaves α% of the area in the two tails combined, or (α/2)% of the area in each tail. Commonly used reliability factors are as follows:
1.) 90% confidence intervals: z(0.05) = 1.645. α is 10%, with 5% in each tail.
2.) 95% confidence intervals: z(0.025) = 1.96. α is 5%, with 2.5% in each tail.
3.) 99% confidence intervals: z(0.005) = 2.575. α is 1%, with 0.5% in each tail.
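A short sketch of a 95% interval with known σ, using illustrative numbers for the sample mean, σ, and sample size:

```python
# 95% confidence interval for mu with known sigma (illustrative numbers).
m, sigma, n, z = 25.0, 4.0, 36, 1.96   # sample mean, known sigma, size, z(0.025)
se = sigma / n ** 0.5                  # standard error of the mean
lower, upper = m - z * se, m + z * se
print(f"[{lower:.3f}, {upper:.3f}]")   # [23.693, 26.307]
```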
Stratified Random Sampling
In stratified random sampling, the population is subdivided into subpopulations (strata) based on one or more classification criteria. Simple random samples are then drawn from each stratum (the sizes of the samples are proportional to the relative size of each stratum in the population). These samples are then pooled. It is important to note that the size of the data in each stratum does not have to be the same or even similar, and frequently isn't. Stratified random sampling guarantees that population subdivisions of interest are represented in the sample. The estimates of parameters produced from stratified sampling have greater precision (i.e., smaller variance or dispersion) than estimates obtained from simple random sampling. For example, investors may want to fully duplicate a bond index by owning all the bonds in the index in proportion to their market value weights. This is known as pure bond indexing. However, it's difficult and costly to implement because a bond index typically consists of thousands of issues. If simple sampling is used, the sample selected may not accurately reflect the risk factors of the index. Stratified random sampling can be used to replicate the bond index. 1.) Divide the population of index bonds into groups with similar risk factors (e.g., issuer, duration/maturity, coupon rate, credit rating, call exposure, etc.). Each group is called a stratum or cell. 2.) Select a sample from each cell proportional to the relative market weighting of the cell in the index. A stratified sample will ensure that at least one issue in each cell is included in the sample.
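A minimal sketch of the two steps, with illustrative strata standing in for index cells:

```python
import random

random.seed(0)

# Stratified sampling: draw a simple random sample from each stratum,
# sized in proportion to the stratum's weight, then pool the samples.
strata = {
    "AAA short": list(range(50)),   # 50 issues (illustrative)
    "AAA long": list(range(30)),    # 30 issues
    "BBB short": list(range(20)),   # 20 issues
}
total = sum(len(v) for v in strata.values())
sample_size = 10

pooled = []
for name, issues in strata.items():
    # Proportional allocation; at least one issue from every cell.
    k = max(1, round(sample_size * len(issues) / total))
    pooled.extend((name, i) for i in random.sample(issues, k))
print(pooled)
```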
Confidence Intervals
Probability statements about a random variable are often framed using confidence intervals built around point estimates. In investment work, confidence intervals for a normal random variable in relation to its estimated mean are often used. Confidence intervals use point estimates to make probability statements about the dispersion of the outcomes of a normal distribution. A confidence interval specifies the percentage of all observations that fall in a particular interval. The exact confidence intervals for a normal random variable X are:
1.) 90% confidence interval for X: x-bar - 1.645σ to x-bar + 1.645σ. This means that 10% of the observations fall outside the 90% confidence interval, with 5% on each side.
2.) 95% confidence interval for X: x-bar - 1.96σ to x-bar + 1.96σ. This means that 5% of the observations fall outside the 95% confidence interval, with 2.5% on each side.
3.) 99% confidence interval for X: x-bar - 2.58σ to x-bar + 2.58σ. This means that 1% of the observations fall outside the 99% confidence interval, with 0.5% on each side.
Roy's Safety-First Criterion Example
Roy's safety-first criterion states that the optimal portfolio should minimize the probability that the rate of return of the portfolio (Rp) will fall below a stated threshold level (Rl). If returns are normally distributed, it states that the optimal portfolio maximizes the safety-first ratio. Therefore, assuming that returns are normally distributed, the safety-first optimal portfolio can be selected using one of the following two criteria:
1.) Lowest probability of Rp < Rl
2.) Highest safety-first ratio
There are three steps in choosing among portfolios using Roy's criterion (assuming normality):
1.) Calculate the portfolio's SFRatio.
2.) Evaluate the standard normal cdf at the value calculated for the SFRatio; the probability that return will be less than Rl is N(-SFRatio).
3.) Choose the portfolio with the lowest probability.
Example: Suppose that a certain fund has reached a value of $500,000. At the end of the next year, the fund managers wish to withdraw $20,000 for additional funding purposes, but do not wish to tap into the original $500,000. There are three possible investment options (expected return and standard deviation of return): A: 10% and 15%; B: 8% and 12%; C: 9% and 14%. Which option is most preferable? (You may assume normally distributed returns throughout.)
Answer: First, note that since the managers do not want to tap into the original fund, a return of 20,000/500,000 = 0.04 is the minimum acceptable return; this is the threshold return, Rl. You now need to calculate the SFRatio in each case: A: (10 - 4)/15 = 0.4; B: (8 - 4)/12 = 0.33; C: (9 - 4)/14 = 0.36. You can conclude that portfolio A, with the highest SFRatio of the three, is the most preferable. You can also take this a step further and calculate the probability that the portfolio return will fall below the threshold return, that is, P(Rp < Rl). To do this, take the negative of the SFRatio in each case and find the cdf of the standard normal distribution for this value. In symbols, P(Rp < Rl) = F(-SFRatio), where F is the cdf of the standard normal distribution. From normal tables, F(-0.4) = 0.3446, F(-0.33) = 0.3707 and F(-0.36) = 0.3594. This indicates that, for portfolio A, the chance of obtaining a return below Rl is 0.3446, with corresponding values for portfolios B and C of 0.3707 and 0.3594 respectively. Since the chance of not exceeding the threshold return is lowest for portfolio A, this is again the best option.
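The same example, sketched in Python with the standard library's normal cdf (values match the figures above up to rounding of the SFRatio):

```python
from statistics import NormalDist

# Threshold return 4%; portfolios as (expected return %, std dev %).
portfolios = {"A": (10, 15), "B": (8, 12), "C": (9, 14)}
r_l = 4

for name, (er, sd) in portfolios.items():
    sf_ratio = (er - r_l) / sd
    shortfall_prob = NormalDist().cdf(-sf_ratio)  # P(Rp < Rl) = N(-SFRatio)
    print(name, round(sf_ratio, 2), round(shortfall_prob, 4))
# A: 0.4, 0.3446 -- highest SFRatio and lowest shortfall probability.
# (The text's 0.3707 for B comes from rounding its ratio to 0.33 first.)
```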
Central Limit Theorem
The central limit theorem states that, given a distribution with mean μ and variance σ^2, the sampling distribution of the mean x-bar approaches a normal distribution with mean μ and variance (σ^2)/N as N, the sample size, increases. The amazing and counter-intuitive thing about the central limit theorem is that no matter the shape of the original distribution, x-bar approaches a normal distribution.
1.) If the original variable X has a normal distribution, then x-bar will be normal regardless of the sample size.
2.) If the original variable X does not have a normal distribution, then x-bar will be approximately normal when N ≥ 30.
This is called a distribution-free result: no matter what distribution X has, x-bar will still be approximately normal for sufficiently large N. Keep in mind that N is the sample size for each mean and not the number of samples. Remember that in a sampling distribution the number of samples is assumed to be infinite. The sample size is the number of scores in each sample; it is the number of scores that goes into the computation of each mean. Two things should be noted about the effect of increasing N:
1.) The distributions become more and more normal.
2.) The spread of the distributions decreases.
Based on the central limit theorem, when the sample size is large, you can:
1.) Use the sample mean to infer the population mean.
2.) Construct confidence intervals for the population mean based on the normal distribution.
Note that the central limit theorem does not prescribe that the underlying population must be normally distributed. Therefore, the central limit theorem can be applied to a population with any probability distribution.
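A small simulation sketch of the theorem, assuming a heavily skewed exponential population; the skewness of the sample means shrinks toward the normal value of 0 as N grows:

```python
import numpy as np

rng = np.random.default_rng(2)

def skew(a):
    # Sample skewness: third central moment over the 3/2 power of the second.
    d = np.asarray(a) - np.mean(a)
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)

# Means of samples from a skewed (exponential) population look
# progressively more normal as the sample size N grows.
for n in (2, 5, 30):
    means = rng.exponential(scale=1.0, size=(5_000, n)).mean(axis=1)
    print(n, round(skew(means), 3))
# Skewness shrinks toward 0, the value for a normal distribution.
```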
Discrete Uniform Distribution
The discrete uniform distribution is the simplest of all probability distributions. This distribution has a finite number of specified outcomes, and each outcome is equally likely. Mathematically, suppose that a discrete uniform random variable, X, has n possible outcomes: x1, x2, ..., xn-1, and xn.
1.) p(x1) = p(x2) = ... = p(xn-1) = p(xn) = p(x). That is, the probabilities for all possible outcomes are equal.
2.) F(xk) = kp(x). That is, the cumulative distribution function evaluated at the kth outcome is k times the probability of each outcome.
3.) If there are k possible outcomes in a particular range, the probability for that range of outcomes is kp(x).
Sampling Error
The sample taken from a population is used to infer conclusions about that population. However, it's unlikely that the sample statistic would be identical to the population parameter. Suppose there is a class of 100 students and a sample of 10 from that class is chosen. If, by chance, most of the brightest students are selected in this sample, the sample will provide a misguided idea of what the population looks like (because the sample mean x-bar will be much higher than the population mean in this case). Equally, a sample comprising mainly weaker students could be chosen, and then the opposite applicable characteristics would apply. The ideal is to have a sample which comprises a few bright students, a few weaker students, and mainly average students, as this selection will give a good idea of the composition of the population. However, because which items go into the sample cannot be controlled, you are dependent to some degree on chance as to whether the results are favorable (indicative of the population) or not. Sampling error (also called error of estimation) is the difference between the observed value of a statistic and the quantity it is intended to estimate. For example, sampling error of the mean equals sample mean minus population mean. Sampling error can apply to statistics such as the mean, the variance, the standard deviation, or any other values that can be obtained from the sample. The sampling error varies from sample to sample. A good estimator is one whose sample error distribution is highly concentrated about the population parameter value. Sampling error of the mean would be: Sample mean - population mean = x-bar - μ. Sampling error of the standard deviation would be: Sample standard deviation - population standard deviation = s - σ.
Standard Error of a Statistic
The standard error of a statistic is the standard deviation of the sampling distribution of that statistic. Standard errors are important because they reflect how much sampling fluctuation a statistic will show. The inferential statistics involved in the construction of confidence intervals and significance testing are based on standard errors. The standard error of a statistic depends on the sample size. In general, the larger the sample size, the smaller the standard error. The standard error of a statistic is usually designated by the Greek letter sigma (σ) with a subscript indicating the statistic.
The Standard Error of the Mean
The standard error of the mean is designated as σm. It is the standard deviation of the sampling distribution of the mean. The formula for the standard error of the mean is: σm = σ/N^(1/2), where σ is the standard deviation of the original distribution and N is the sample size (the number of scores each mean is based upon). This formula does not assume a normal distribution. However, many of the uses of the formula do assume a normal distribution. The formula shows that the larger the sample size, the smaller the standard error of the mean. More specifically, the size of the standard error of the mean is inversely proportional to the square root of the sample size. Example: Suppose that the mean grade of students in a class is 62%, with a standard deviation of 10%. A sample of 30 students is taken from the class. Calculate the standard error of the sample mean and interpret your results. You are given that μ = 62 and σ = 10. Since N = 30, the standard error of the sample mean is: σm = 10/(30^(1/2)) = 1.8257. This means that if you took all possible samples of size 30 from the class, the mean of all those sample means would be 62 and their standard deviation (the standard error) would be 1.8257. Note that if you took a sample of size 50, the standard error would then be: σm = 10/(50^(1/2)) = 1.4142. The standard error drops as the sample size increases, which agrees with the information above.
Z distribution (Z Score)
The standard normal distribution is sometimes called the z distribution. A z score (also called a z-value or z-statistic) is the distance between a selected value (X) and the population mean, divided by the population standard deviation. It is in fact a standard normal random variable. For instance, if a person scored 70 on a test with a mean of 50 and a standard deviation of 10, that person scored 2 standard deviations above the mean. Converting the test score to a z score: z = (70 - 50)/10 = 2.
Monte Carlo Simulation
When a system is too complex to be analyzed using ordinary methods, investment analysts frequently use Monte Carlo simulation. Monte Carlo simulation involves trying to simulate the conditions that apply to a specific problem by generating a large number of random samples using a random number generator on a computer. After generating the data, quantities such as the mean and variance of the generated numbers can be used as estimates of the unknown parameters of the population (when the parameters are too complex to find through analytical methods). The term "Monte Carlo simulation" derives from the generation of a large number of random samples, such as might occur in the Monte Carlo Casino. Applications of Monte Carlo simulation:
1.) It allows us to experiment with a proposed policy and assess the risks before actually implementing it. For example, it is used to simulate the interaction of pension assets and the liabilities of defined benefit pension plans.
2.) It is widely used to develop estimates of Value at Risk (VAR). VAR involves estimating the probability that portfolio losses exceed a predefined level.
3.) It is used to value complex securities such as European options and mortgage-backed securities with complex embedded options.
4.) Researchers use it to test their models and tools.
Limitations of Monte Carlo simulation:
1.) It is a complement to analytical methods. It provides only statistical estimates, not exact results.
2.) It does not directly provide precise insights as analytical methods do. For example, it cannot reveal cause-and-effect relationships.
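A minimal sketch of one common application, estimating 5% Value at Risk under an assumed normal return distribution with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo estimate of 5% Value at Risk for a portfolio whose
# one-year return is assumed normal (illustrative parameters).
mu, sigma, trials = 0.08, 0.20, 100_000
simulated_returns = rng.normal(mu, sigma, trials)

# VaR: the loss threshold exceeded in only 5% of simulated outcomes.
var_95 = -np.percentile(simulated_returns, 5)
print(f"5% one-year VaR: {var_95:.2%} of portfolio value")
```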
Students' t-Distribution
When σ is known, the formula m - zσm <= μ <= m + zσm is used for a confidence interval. When σ is not known, sm = s/N^(1/2) (N is the sample size) is used as an estimate of σm. Whenever the standard deviation is estimated, the t rather than the normal (z) distribution should be used. The values of t are larger than the values of z, so confidence intervals when σ is estimated are wider than confidence intervals when σ is known. The formula for a confidence interval for μ when σ is estimated is: m - t sm <= μ <= m + t sm, where m is the sample mean, sm is the estimated standard error of the mean, and t depends on the degrees of freedom and the level of confidence. The t-distribution is a symmetrical probability distribution defined by a single parameter known as degrees of freedom (df). Each value for the number of degrees of freedom defines one distribution in this family of distributions. Like a standard normal distribution (e.g., a z-distribution), the t-distribution is symmetrical around its mean. Unlike a standard normal distribution, the t-distribution has the following unique characteristics.
1.) It is an estimated standardized normal distribution. When n gets larger, t approximates z (s approaches σ).
2.) The mean is 0 and the distribution is bell-shaped.
3.) There is not one t-distribution, but a family of t-distributions. All t-distributions have the same mean of 0. Standard deviations of these t-distributions differ according to the sample size, n.
4.) The shape of the distribution depends on the degrees of freedom (n - 1). The t-distribution is less peaked than a standard normal distribution and has fatter tails (i.e., more probability in the tails).
5.) t(α/2) tends to be greater than z(α/2) for a given level of significance, α.
6.) Its variance is v/(v - 2) (for v > 2), where v = n - 1. It is always larger than 1. As v increases, the variance approaches 1.
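A short sketch of a t-based 95% interval, using SciPy for the reliability factor and an illustrative sample:

```python
import numpy as np
from scipy import stats

# t-based 95% confidence interval when sigma is estimated
# (the sample values are illustrative).
sample = np.array([12.1, 11.4, 13.2, 12.8, 10.9, 12.5, 11.8, 12.0])
m, s, n = sample.mean(), sample.std(ddof=1), len(sample)

s_m = s / n ** 0.5                    # estimated standard error of the mean
t = stats.t.ppf(0.975, df=n - 1)      # reliability factor, df = n - 1
print(f"[{m - t * s_m:.3f}, {m + t * s_m:.3f}]")
```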
Continuous Variable
A continuous variable is one that can take on any value within the limits of its range. The number of possible values cannot be counted, and, as you will see later, each individual value has zero probability associated with it. For example, the variable "time to solve an anagram problem" is continuous, since it could take 2 minutes or 2.13 minutes, etc., to finish a problem. A variable such as a person's height can take on any value as well. The rate of return on an asset is also a continuous random variable, since the exact value of the rate of return depends on the desired number of decimal places. Statistics computed from discrete variables are continuous: the mean on a five-point scale could be 3.117, even though 3.117 is not possible for an individual score.
Bernoulli Trial
A Bernoulli trial is an experiment with two outcomes, which can represent success or failure, up move or down move, or another binary outcome. As one of these two outcomes must definitely occur, that is, they are exhaustive, and also mutually exclusive, it follows immediately that the sum of the probabilities of a "success" and a "failure" is 1.
Discrete Variable
A discrete variable is one that cannot take on all values within the limits of the variable. It can assume only a countable number of possible values. For example, responses to a five-point rating scale can only take on the values 1, 2, 3, 4, and 5. The variable cannot have the value 1.7. The variable "number of correct answers on a 100-point multiple-choice test" is also a discrete variable since it is not possible to get 54.12 problems correct. The number of movies you will see this year, the number of trades a broker will perform next month, and the number of securities in a portfolio are all examples of discrete variables.
Random Variable
A random variable is a quantity whose future outcomes are uncertain. Depending on the characteristics of the random variable, a probability distribution may be either discrete or continuous. For any random variable, it is necessary to know two things: 1.) The list of all possible values that the random variable can take on. 2.) The probability of each value occurring.
Uniform Distribution (Rectangular Distribution)
A uniform distribution is one for which the probability of occurrence is the same for all values of X. It is just one type of special random variable and is sometimes called a rectangular distribution. For example, if a die is thrown, the probability of obtaining any one of the six possible outcomes is 1/6. Since all outcomes are equally probable, the distribution is uniform. If a uniform distribution is divided into equally spaced intervals, there will be an equal number of members of the population in each interval.
Confidence Intervals for the Population Mean
Confidence intervals are typically constructed using the following structure: Confidence Interval = Point Estimate ± Reliability Factor x Standard Error
1.) The point estimate is the value of a sample statistic used to estimate the population parameter.
2.) The reliability factor is a number based on the sampling distribution of the point estimate and the degree of confidence (1 - α).
3.) The standard error is the standard error of the sample statistic that is used to produce the point estimate.
Whatever the distribution of the population, the sample mean is always the point estimate used to construct the confidence intervals for the population mean. The reliability factor and the standard error, however, may vary depending on three factors:
1.) Distribution of the population: normal or non-normal
2.) Population variance: known or unknown
3.) Sample size: large or small
Data-Mining
Data-mining is the practice of finding forecasting models by extensive searching through databases for patterns or trading rules (i.e., repeatedly "drilling" in the same data until you find something). It has a very specific definition: continually mixing and matching the elements of a database until one "discovers" two or more data series that are highly correlated. Data-mining also refers more generically to any of a number of practices in which data can be tortured into confessing anything. Two signs may indicate the existence of data-mining in research findings about profitable trading strategies:
1.) Many of the variables actually used in the research are not reported. This may indicate that the researchers searched through many unreported variables.
2.) There is no plausible economic theory available to explain why these strategies work.
To avoid data-mining, analysts should use out-of-sample data to test a potentially profitable trading rule. That is, analysts should test the trading rule on a data set other than the one used to establish the rule.
Discrete Random Variable
For a discrete random variable, the shorthand notation is p(x) = P(X = x).
Continuous Uniform Random Variable
If a continuous random variable is equally likely to fall at any point between its maximum and minimum values, it is a continuous uniform random variable, and its probability distribution is a continuous probability distribution. For example, suppose the possible outcomes are all values from 1 to 8 (inclusive), with every value in that range equally likely (i.e., the distribution is uniform). The probability density function is: f(x) = 1/(b - a) for a ≤ x ≤ b; or 0 otherwise.
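A minimal sketch of the pdf and cdf for an assumed range a = 1 to b = 8:

```python
# pdf and cdf of a continuous uniform random variable on [a, b].
a, b = 1.0, 8.0

def pdf(x):
    return 1 / (b - a) if a <= x <= b else 0.0

def cdf(x):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# The probability of a range is the area under the flat density.
print(cdf(5) - cdf(2))   # P(2 <= X <= 5) = 3/7
print(pdf(4))            # height 1/(b - a) = 1/7, not a probability
```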
Sampling
In investment analysis, it is often impossible to study every member of a population. Even if analysts could examine an entire population, it may not be economically efficient to do so. Sampling is the process of obtaining a sample.
Look-Ahead Bias
Look-ahead bias exists when studies assume that fundamental information is available when it is not. For example, researchers often assume that annual earnings data were available in January; in reality, the data might not be available until March. This usually biases results upwards.
Normal Distributions
Normal distributions are a family of distributions that have the same general shape.
1.) They are symmetrical, with scores more concentrated in the middle than in the tails.
2.) Normal distributions are sometimes described as bell-shaped, with a single peak at the exact center of the distribution.
3.) The tails of the normal curve extend indefinitely in both directions. That is, possible outcomes of a normal distribution lie between -∞ and +∞.
4.) Normal distributions may differ in how spread out they are.
The key properties of a normal distribution:
1.) The normal distribution is completely described by two parameters: the mean (μ) and the standard deviation (σ).
2.) The normal distribution is symmetrical: it has a skewness of 0, a kurtosis (a measure of the peakedness of a distribution) of 3, and an excess kurtosis (kurtosis less 3) of 0. As a consequence, the mean, median, and mode are all equal for a normal random variable.
3.) A linear combination of two or more normal random variables is also normally distributed.
One reason the normal distribution is important is that many psychological, educational, and financial variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close.
Standard Normal Distribution Formula - Z-Score Formula
Normal distributions can be transformed to standard normal distributions by the formula: z = (X - μ)/σ, where X is a score from the original normal distribution, μ is the mean of the original normal distribution, and σ is the standard deviation of the original normal distribution.
Roy's Safety-First Criterion
Roy's safety-first criterion states that the optimal portfolio minimizes the probability that portfolio return, Rp, falls below the threshold level, Rl. In symbols, the investor's objective is to choose a portfolio that minimizes P(Rp < Rl). When portfolio returns are normally distributed, the investor can calculate P(Rp < Rl) using the number of standard deviations that Rl lies below the expected portfolio return, E(Rp). The portfolio for which E(Rp) - Rl is largest in units of standard deviation minimizes P(Rp < Rl). Thus, if returns are normally distributed, the safety-first optimal portfolio maximizes the safety-first ratio (SFRatio): SFRatio = [E(Rp) - Rl]/σp, where σp is the standard deviation of portfolio return. The quantity E(Rp) - Rl is the distance from the mean return to the shortfall level. It measures the excess return over the threshold return level. The SFRatio gives the distance in units of standard deviation: it measures the excess return over the threshold level per unit of risk. For example, the expected return on a portfolio is 20%, and the standard deviation of the portfolio return is 15%. Suppose the minimum acceptable return level is 10%. SFRatio = (20% - 10%)/15% = 0.67. Note that the SFRatio is similar to the Sharpe ratio, (E(Rp) - Rf)/σp, where Rf is the risk-free rate. The SFRatio becomes the Sharpe ratio if the risk-free rate is substituted for the threshold return Rl. Therefore, the safety-first criterion focuses on the excess return over the threshold return, while the Sharpe ratio focuses on the excess return over the risk-free rate.
Sample Selection Bias
Sample selection bias occurs when data availability leads to certain assets being excluded from the analysis. For example, the discrete-choice approach has become a popular tool for assessing the value of non-market goods, and the surveys used in these studies frequently suffer from large numbers of non-responses, which can lead to significant bias in parameter estimates and in the estimate of the mean.
Survivorship Bias
Survivorship bias is the most common type of sample selection bias. It occurs when studies are conducted on databases that have eliminated all companies that have ceased to exist (often due to bankruptcy). The findings from such studies most likely will be upwardly biased, since the surviving companies will look better than those that no longer exist. For example, many mutual fund databases provide historical data about only those funds that are currently in existence. As a result, funds that have ceased to exist due to closure or merger do not appear in these databases. Generally, funds that have ceased to exist have lower returns relative to the surviving funds. Therefore, the analysis of a mutual fund database with survivorship bias will overestimate the average mutual fund return because the database only includes the better-performing funds. Another example is the return data on stocks listed on an exchange, as it is subject to survivorship bias; it's difficult to collect information on delisted companies and these companies often have poor performances.
Binomial Probability Formula
The binomial probability for obtaining r successes in n trials is: p(r) = {n!/[(n - r)! x r!]} x p^r x (1 - p)^(n-r), where p(r) is the probability of exactly r successes, n is the number of events, and p is the probability of success on any one trial. This formula assumes that the events are:
1.) Dichotomous (fall into only two categories)
2.) Mutually exclusive
3.) Independent
4.) Randomly selected
To remember the formula, note that there are three components:
1.) n!/[(n - r)! x r!]. This indicates the number of ways r successes and n - r failures can be arranged among n trials, where the order of success or failure does not matter. This is the combination formula.
2.) p^r. This is the probability of getting r consecutive successes.
3.) (1 - p)^(n-r). This is the probability of getting n - r consecutive failures.
The values for n and p will always be given to you in a question; their values will never have to be guessed. Consider this simple application of the binomial distribution: what is the probability of obtaining exactly 3 heads if a coin is flipped 6 times? For this problem, n = 6, r = 3, and p = 0.5, so p(3) = {6!/[(6 - 3)! x 3!]} x 0.5^3 x (1 - 0.5)^(6-3) = 0.3125.
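A one-function sketch of the formula, reproducing the coin-flip result:

```python
from math import comb

# Binomial probability of exactly r successes in n trials.
def binomial_pmf(r, n, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

print(binomial_pmf(3, 6, 0.5))   # 0.3125, matching the coin example
```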
Continuously Compounded Rate of Return
The continuously compounded rate of return measures the rate of change in the value of an asset over a holding period under the assumption of continuous compounding. It is the natural logarithm of 1 plus the holding period return, or, equivalently, the natural logarithm of the ending price over the beginning price. From t to t + 1: r(t,t+1) = ln(S(t+1)/St) = ln(1 + R(t,t+1)), where St is the stock price at time t and R(t,t+1) is the rate of return from t to t + 1. Example: S0 = $30 and S1 = $34.50. Then R0,1 = $34.50/$30 - 1 = 0.15, and r0,1 = ln(1.15) = 0.139762. The continuously compounded return is smaller than the associated holding period return.
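The worked example as a short sketch:

```python
from math import log

# Continuously compounded return from price S0 = 30 to S1 = 34.50.
s0, s1 = 30.0, 34.50
hpr = s1 / s0 - 1            # holding period return = 0.15
r_cc = log(1 + hpr)          # equivalently log(s1 / s0)
print(hpr, round(r_cc, 6))   # 0.15 0.139762
```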
Confidence Limits
The end points of a confidence interval are called the lower and upper confidence limits. For example, suppose that a 95% confidence interval for the population mean is 20 to 40. This means that there is a 95% probability that the population mean lies in the range of 20 to 40, where:
1.) "95%" is the degree of confidence.
2.) "5%" is the level of significance.
3.) 20 and 40 are the lower and upper confidence limits, respectively.
Shortfall Risk
The focus of this section is assessing risks in a portfolio, a process that allows us to establish rules for dealing with those risks and minimize them as much as possible. Shortfall risk is the risk that portfolio value will fall below some minimum acceptable level over some time horizon. The risk that assets in a defined benefit plan will fall below plan liabilities is an example of a shortfall risk. Therefore, shortfall risk is a downside risk. In contrast, when a risk-averse investor makes portfolio decisions in terms of the mean return and the variance (or standard deviation) of return, both upside and downside risks are considered. For example, portfolios A and B have the same mean return of 20%. The standard deviations of the returns on A and B are 5% and 8% respectively. Portfolio B has a higher risk because its standard deviation is higher. However, though the return on portfolio B is more likely to fall below 20%, it's also more likely to exceed 20%.
Standard Normal Distribution
The problem with working with a normal distribution is that its formula is very complicated. A computer is needed to calculate areas under the graph; this is required in order to calculate probabilities. The way to get around this problem is to standardize a normal random variable, which involves converting it to a general scale for which probability tables exist. The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is denoted as N(0,1). Some confidence intervals for the standard normal distribution: 90% of observations fall within ±1.645 standard deviations of the mean, 95% within ±1.96, and 99% within ±2.58.
Point Estimate
The single estimate of an unknown population parameter calculated as a sample mean is called a point estimate of the mean. The formula used to compute the point estimate is called an estimator. The specific value calculated from sample observations using an estimator is called an estimate. For example, the sample mean is a point estimate of the population mean. Suppose two samples are taken from a population and the sample means are 16 and 21 respectively; then 16 and 21 are two estimates of the population mean. Note that an estimator will yield different estimates as repeated samples are taken from the same population.
Time period Bias
Time period bias occurs when a test design is based on a time period that may make the results time-period specific. Even the worst performers have months or even years in which they look wonderful. After all, stopped clocks are right twice a day. To eliminate strategies that have just been lucky, research must encompass many years. However, if the time period is too long, the fundamental economic structure may have changed during the time frame, resulting in two data sets that reflect different relationships.
Time-Series Data
Time-series data is a set of observations collected at usually discrete and equally spaced time intervals. The daily closing price of a certain stock recorded over the last six weeks is an example of time-series data. Note that a too-long or too-short time period may lead to time-period bias. Refer to subject g for details. Other examples of time-series data would be staff numbers at a particular institution taken on a monthly basis in order to assess staff turnover rates, weekly sales figures of ice cream sold during a holiday period at a seaside resort and the number of students registered for a particular course on a yearly basis. All of the above would be used to forecast likely data patterns in the future.
Univariate Distribution
To this point, the focus has been on distributions that involve only one variable, such as the binomial, uniform, and normal distributions. A univariate distribution describes a single random variable. For example, suppose that you would like to model the distribution of the return on an asset. Such a distribution is a univariate distribution.
Binomial Distribution
When a coin is flipped, the outcome is either heads or tails. When a magician guesses the card selected from a deck, the magician can either be correct or incorrect. When a baby is born, the baby is either born in the month of March or is not. In each of these examples, an event has two mutually exclusive possible outcomes. For convenience, one of the outcomes can be labeled "success" and the other outcome "failure." If an event occurs N times (for example, a coin is flipped N times), then the binomial distribution can be used to determine the probability of obtaining exactly r successes in the N outcomes.
Appropriate Sample Size
When a large sample size (generally, more than 30 observations) is used, a z-table can always be used to construct the confidence interval. It does not matter if the population distribution is normal or if the population variance is known. This is because the central limit theorem assures us that when the sample is large, the distribution of the sample mean is approximately normal. However, the t-statistic is more conservative because it tends to be greater than the z-statistic; therefore, using a t-statistic will result in a wider confidence interval. If there is only a small sample size, a t-table has to be used to construct the confidence interval when the population distribution is normal and the population variance is not known. If the population distribution is not normal, there is no way to construct a confidence interval from a small sample (even if the population variance is known). Therefore, if all other factors are equal, you should try to select a sample larger than 30. The larger the sample size, the more precise the confidence interval. In general, at least one of the following is needed:
1.) A normal distribution for the population
2.) A sample size that is greater than or equal to 30
If one or both of the above hold, a z-table or a t-table is used, depending on whether σ is known or unknown. If neither holds, the question cannot be answered. A summary of the situation is as follows (sketched in code after this list):
1.) If the population is normally distributed and the population variance is known, use a z-score (irrespective of sample size).
2.) If the population is normally distributed and the population variance is unknown, use a t-score (irrespective of sample size).
3.) If the population is not normally distributed and the population variance is known, use a z-score only if n >= 30; otherwise, the interval cannot be constructed.
4.) If the population is not normally distributed and the population variance is unknown, use a t-score only if n >= 30; otherwise, the interval cannot be constructed.
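A minimal sketch of the decision rules above (the function name and return strings are illustrative):

```python
# Choose the reliability factor per the four-case summary.
def reliability_factor(normal_population: bool, variance_known: bool, n: int) -> str:
    if normal_population:
        return "z-score" if variance_known else "t-score"
    if n >= 30:                  # central limit theorem applies
        return "z-score" if variance_known else "t-score"
    return "cannot construct the interval"

print(reliability_factor(False, False, 40))   # t-score
print(reliability_factor(False, True, 12))    # cannot construct the interval
```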