MAS-I Sections 2.1 - 2.2

Steps to Calculate the MLE

1. Calculate the likelihood function as the product of each observation's likelihood.
2. Take the natural log of the likelihood function.
3. To maximize the function, take its first derivative and set it equal to 0.

*Multiplicative constants do not impact the MLE.
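
As a rough illustration of these steps, here is a minimal Python sketch that fits an exponential distribution by maximizing the log-likelihood numerically. The data values and the exponential assumption are hypothetical, and the closed-form answer (the sample mean) is printed as a check.

```python
# Minimal sketch: MLE of an exponential theta via the three steps above.
# The observations below are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([3.0, 7.5, 1.2, 9.8, 4.4])

# Steps 1-2: log of the product of densities f(x; theta) = exp(-x/theta)/theta
def neg_loglik(theta):
    return -np.sum(-np.log(theta) - x / theta)

# Step 3: maximize the log-likelihood (equivalently, minimize its negative)
res = minimize_scalar(neg_loglik, bounds=(1e-6, 1e3), method="bounded")

print(res.x)       # numerical MLE
print(x.mean())    # exponential closed form: theta-hat equals the sample mean
```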

Smoothed Empirical Percentile - Unique Values

1. Let x(i) denote the ith observed value in ascending order, e.g. x(10) is the 10th smallest sample data point. Also define b = ⌊q(n+1)⌋, i.e. round q(n+1) down to the nearest integer.
2. Calculate q(n+1). For example, if q=0.65 and n=34, then q(n+1)=22.75. You may interpret this to loosely mean "the 65th sample percentile is the 22.75th observed value in ascending order."
3. If q(n+1) is a non-integer, calculate π^q by linearly interpolating between x(b) and x(b+1). Since 22.75 is between 22 and 23, we interpolate between x(22) and x(23), the 22nd and 23rd observed values in ascending order. Take the decimal part of q(n+1) as the weight applied to the larger value x(b+1); the smaller value x(b) gets the complement weight. In other words, π^0.65 = 0.25x(22) + 0.75x(23), where 0.75 is taken from 22.75, and 0.25 = 1 − 0.75.
4. If q(n+1) is an integer, then π^q = x(b). This is consistent with the interpolation technique above; integers have 0's after the decimal, so the larger value x(b+1) receives a weight of 0.

There are a couple of caveats worth mentioning:
- q(n+1) must be between 1 and n, inclusive. For this exam, we may take this for granted; otherwise, π^q is undefined.
- The interpolation technique in Step 3 assumes that all observed values from the sample are unique, i.e. there are no repeated values. We will address how to handle repeated values shortly.
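
A small Python sketch of this procedure for a sample with unique values; the helper name and the example data are made up for illustration.

```python
# Smoothed empirical percentile for a sample of unique values (a sketch).
import math

def smoothed_percentile(data, q):
    xs = sorted(data)              # x_(1), ..., x_(n) in ascending order
    n = len(xs)
    pos = q * (n + 1)
    if not (1 <= pos <= n):
        raise ValueError("q(n+1) must lie between 1 and n, else undefined")
    b = math.floor(pos)
    w = pos - b                    # decimal part = weight on the larger value x_(b+1)
    if w == 0:
        return xs[b - 1]           # integer case: exactly x_(b)
    return (1 - w) * xs[b - 1] + w * xs[b]

# Example: q = 0.25 with n = 7, so q(n+1) = 2 and the answer is x_(2).
print(smoothed_percentile([5, 1, 9, 3, 7, 11, 13], 0.25))   # 3
```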

Random Sample

A random sample describes a collection of random variables that are independent and identically distributed (i.i.d.).

Statistic

A statistic is a function of random variables from a sample. A statistic summarizes the n random variables by mapping them to one value.

Invariance Property

An appealing property of MLE is its invariance property: for a function g(⋅), the MLE estimate of g(θ) equals the function evaluated at the MLE estimate of θ, i.e. g(θ^). It is particularly helpful for determining the MLE estimate of a distribution's mean or variance, as they are functions of distribution parameters.
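
For instance (using the exponential distribution as an assumed example), since the variance of an exponential is θ^2 and the MLE of θ is the sample mean:

```latex
\hat{\theta} = \bar{x}
\quad\Rightarrow\quad
\widehat{\operatorname{Var}[X]} = g(\hat{\theta}) = \hat{\theta}^{\,2} = \bar{x}^{2},
\qquad \text{where } g(\theta) = \operatorname{Var}[X] = \theta^{2}.
```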

Estimator

An estimator is a rule (i.e. a function) that describes how to calculate a parameter estimate. In other words, an estimator is a statistic whose purpose is to estimate a parameter.

Consistency

Another way to examine an estimator is to see if it is consistent. By definition, θ^ is a consistent estimator of θ if lim(n→∞) Pr(|θ^ − θ| > ε) = 0 for all ε > 0. Consistency assesses whether θ^ converges (in probability) to θ as the sample size approaches ∞. In the context of this exam, using the definition to prove consistency tends to be challenging. On the other hand, if:

1. θ^ is asymptotically unbiased, and
2. lim(n→∞) Var[θ^] = 0,

then θ^ is consistent. Otherwise, it is inconclusive. The two sufficient conditions for consistency can be described as a single sufficient condition: lim(n→∞) MSE[θ^] = 0.
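
As a quick illustration of the sufficient conditions, consider the sample mean of an i.i.d. sample with mean μ and finite variance σ^2 (a standard example, not from the source):

```latex
\operatorname{E}[\bar{X}] = \mu
\;\Rightarrow\; \operatorname{Bias}[\bar{X}] = 0 \text{ for every } n,
\qquad
\operatorname{Var}[\bar{X}] = \frac{\sigma^{2}}{n} \xrightarrow[n \to \infty]{} 0
```

Both conditions hold (equivalently, MSE[X¯] = σ^2/n → 0), so X¯ is a consistent estimator of μ.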

Percentile Matching

Aside from moments, there are other characteristics that should ideally align between the sample data and distribution. The percentile matching method focuses on using percentiles to obtain parameter estimates.

Variance

By definition, variance measures the dispersion of a random variable from its mean. In math notation, the variance of θ^ is Var[θ^] = E[(θ^ − E[θ^])^2]. Hence, an estimator with a larger variance suggests that the estimate is prone to be a value further from the estimator's mean. As a result, even an unbiased estimator may be unhelpful when the variance is large.

MLE - Censored Data

Censored data is an example of observations for which the exact values are not known. An observation is right-censored at m when the value is known to be at least m but is only recorded as m. The likelihood of the observation is Pr(X≥m). Right censoring is commonly seen in insurance whenever there is a policy limit. A policyholder can claim at most the policy limit. Thus, for losses that exceed the maximum covered loss, records will likely only document them as the maximum covered loss. So while the exact amounts of the losses are not known, it is known that those losses are at least the maximum covered loss.
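
A minimal sketch of this idea, assuming exponential losses and a hypothetical policy limit (all numbers made up); each censored record contributes ln Pr(X ≥ limit) = −limit/θ to the log-likelihood instead of a density term.

```python
# Sketch: MLE with right-censored observations under an assumed exponential model.
import numpy as np
from scipy.optimize import minimize_scalar

losses = np.array([1.0, 2.5, 4.0])   # exact (uncensored) loss amounts, hypothetical
limit = 5.0                           # hypothetical policy limit
n_censored = 2                        # losses recorded only as "at least 5.0"

def neg_loglik(theta):
    uncensored = np.sum(-np.log(theta) - losses / theta)   # sum of ln f(x; theta)
    censored = n_censored * (-limit / theta)                # sum of ln Pr(X >= limit)
    return -(uncensored + censored)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1e3), method="bounded")
print(res.x)   # ~5.83 = (1.0 + 2.5 + 4.0 + 5.0 + 5.0) / 3, the exponential shortcut
```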

MLE - Binomial

For X∼Binomial (m,q) where m is fixed, the MLE estimate of q is the sample mean divided by m. E[X] = mq = x¯ ⇒ q^ = x¯ / m

MLE - Gamma

For X∼Gamma (α,θ) where α is fixed, the MLE estimate of θ is the sample mean divided by α. E[X] = αθ = x¯ ⇒ θ^ = x¯/α. The exponential distribution is a special case of the gamma distribution with α=1. Thus, the MLE estimate of θ would then be the sample mean.

MLE - Negative Binomial

For X∼Negative Binomial (r,β) where r is fixed, the MLE estimate of β is the sample mean divided by r. E[X] = rβ = x¯ ⇒ β^ = x¯/r. The geometric distribution is a special case of the negative binomial distribution with r=1. Thus, the MLE estimate of β would then be the sample mean.

MLE - Poisson

For X∼Poisson (λ), the MLE estimate of λ is the sample mean. λ^ = x¯

MLE - Laplace

For a Laplace or double exponential distribution with parameter θ, a possible MLE estimate is the sample median. θ^ = π^0.5

MLE - Uniform

For a uniform distribution on the interval [0,θ], the MLE estimate of θ is the largest observed value. θ^ = max(x1, x2, ... , xn) = x(n)

MLE - Normal

From the list, the normal distribution is the only one where we can match the first two moments to obtain the MLE estimates of two parameters. For X∼Normal (μ,σ^2), matching the moments produces E[X] = μ = x¯ and E[X^2] = σ^2 + μ^2 = ∑x^2/n. Therefore, the MLE estimates of μ and σ^2 are μ^ = x¯ and σ^2 = ∑x^2/n − x¯^2 = ∑(x−x¯)^2/n. ***If μ is fixed rather than estimated using MLE, then the MLE estimate of σ^2 is σ^2 = ∑(x−μ)^2/n, which does not equal ∑x^2/n − μ^2.
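
A quick numeric check of these shortcuts on made-up data (note the divisor is n, not n−1):

```python
# Normal MLE shortcuts on a hypothetical sample.
import numpy as np

x = np.array([2.0, 5.0, 5.0, 8.0])

mu_hat = x.mean()                           # MLE of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)     # MLE of sigma^2: divides by n, not n-1

print(mu_hat, sigma2_hat)                   # 5.0 4.5
```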

Smoothed Empirical Percentile - Repeated Values

If any of the observed values from the sample are repeated, then our process of computing π^q needs a minor adjustment. The index i should be updated such that every unique observed value corresponds to only one i, namely the largest one. To demonstrate, consider the following sample data: 2 4 4 5 7 9 11 11 11. Note that x(2)=x(3)=4 and x(7)=x(8)=x(9)=11. Even though there are 9 observed values, only 6 are unique. Thus, the index i is updated to have the following 6 numbers: i = 1, 3, 4, 5, 6, 9. In other words, 2, 7, and 8 are dropped from the index; we keep the indices 3 and 9 because they are the largest ones for the repeated values. Consequently, linear interpolation occurs between the x(i)'s given the updated i. As practice, let's solve for π^0.7. First, calculate q(n+1) = 0.7(9+1) = 7. Since 7 is not listed in the updated i, we linearly interpolate between the observed values whose updated i's enclose 7, i.e. interpolate between x(6) and x(9). We assign more weight to x(6) because 7 is closer to 6, and less weight to x(9) because 9 is farther from 7. Therefore, x(6) receives 2 out of the 3 units of weight, while x(9) receives 1 out of the 3 units. The calculation is completed below.
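
Carrying the arithmetic through with x(6)=9 and x(9)=11:

```latex
\hat{\pi}_{0.7}
= \tfrac{2}{3}\,x_{(6)} + \tfrac{1}{3}\,x_{(9)}
= \tfrac{2}{3}(9) + \tfrac{1}{3}(11)
= \tfrac{29}{3} \approx 9.67
```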

MVUE

The minimum variance unbiased estimator (MVUE) is an unbiased estimator with the smallest variance among all unbiased estimators, regardless of the true parameter value. To be clear, the MVUE does not necessarily have the smallest variance of all estimators; it is only smallest among the unbiased ones. Therefore, the variance of the MVUE could possibly be larger than the variance of a biased estimator. With Y denoting a statistic of the random sample X1, ..., Xn, the Lehmann-Scheffé theorem identifies three conditions that lead to a unique MVUE:

1. Y is a sufficient statistic for θ.
2. The distribution of Y comes from a complete family of distributions.
3. There is a function of Y, φ(Y), that is an unbiased estimator of θ.

When all three conditions are met, φ(Y) is the MVUE of θ. Collectively, the first two conditions can be described as Y being a complete sufficient statistic. We will study these conditions in detail.

Maximum Likelihood Estimation (MLE)

Maximum likelihood estimation (MLE) is another approach to point estimation. As the name suggests, MLE finds the parameters that maximize the likelihood of the observations. The likelihood function is the product of each observation's likelihood.

Unbiased Sample Variance

S^2 = ∑(Xi−X¯)^2 / (n−1)

MLE - Grouped Data

Similar to censored data, grouped data is an example of observations for which the exact values are not known. Grouped data is presented as counts of observations falling in distinct intervals. The likelihood of an observation in the interval (a,b] is Pr(a<X≤b) = F(b) − F(a) = S(a) − S(b). For discrete distributions, the likelihood of grouped data is more easily expressed as the sum of PMFs evaluated at every value within the range, e.g. Pr(a≤X≤b) = p(a)+...+p(b).
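
For instance, under an assumed exponential model with mean θ, an observation known only to fall in the hypothetical interval (1000, 2000] contributes

```latex
\Pr(1000 < X \le 2000) = S(1000) - S(2000) = e^{-1000/\theta} - e^{-2000/\theta}
```

to the likelihood.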

Bias

The bias of an estimator is calculated as Bias[θ^] = E[θ^] − θ. This describes how close the estimator's mean is to the true value of the parameter. θ^ is an unbiased estimator when its bias is 0, i.e. when E[θ^]=θ. Otherwise, θ^ is a biased estimator. Clearly, unbiasedness is a favorable quality, but remember this is only one aspect of an estimator.

Likelihood vs Probability

The likelihood function appears to be the same as a joint probability function evaluated at all the observed values. You are free to make that association, but the two functions are inherently different. One key distinction is that a likelihood function is a function of the parameter(s), whereas a joint probability function is a function of the random variable outcomes. The point is that likelihood is not inherently the same as probability.

Mean Squared Error (MSE)

The mean squared error is also commonly used to determine the quality of an estimator. By definition, MSE[θ^] = E[(θ^−θ)^2]. It can also be decomposed as MSE[θ^] = Var[θ^] + (Bias[θ^])^2. *Notice the definition of a variance is similar to the formula above, where the parameter θ is replaced with the mean of θ^. Thus, the MSE and variance of an estimator are similar measures; the MSE measures the average squared deviation from the true parameter value, while the variance measures the average squared deviation from the estimator's mean. The relation between MSE and variance is also seen in the decomposition. As mentioned, the distinction is whether θ or E[θ^] is the "point of reference" from which the squared deviation is measured. Therefore, when the estimator's bias is 0 (i.e. E[θ^]=θ), the MSE will equal the variance; they have the same "point of reference".
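
The variance-plus-bias-squared decomposition follows by adding and subtracting E[θ^] inside the square; the cross term vanishes because E[θ^ − E[θ^]] = 0:

```latex
\operatorname{MSE}[\hat{\theta}]
= \operatorname{E}\!\left[\left(\hat{\theta} - \operatorname{E}[\hat{\theta}]
   + \operatorname{E}[\hat{\theta}] - \theta\right)^{2}\right]
= \operatorname{E}\!\left[\left(\hat{\theta} - \operatorname{E}[\hat{\theta}]\right)^{2}\right]
  + \left(\operatorname{E}[\hat{\theta}] - \theta\right)^{2}
= \operatorname{Var}[\hat{\theta}] + \left(\operatorname{Bias}[\hat{\theta}]\right)^{2}
```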

Method of Moments

The sample data should ideally be similar to the presumed distribution. The method of moments applies this idea to the moments of a distribution. In short, this method determines the parameter values that cause the theoretical moments to equal the sample moments.

MLE - Special Cases

There are many handy formulas for MLE estimates when certain distributions are assumed. We begin by discussing the following distributions in the context of complete data:

- Gamma
- Normal
- Poisson
- Binomial
- Negative binomial

For all distributions on this list, the MLE estimate is equivalent to the method of moments estimate.

MLE - Truncated Data

Truncated data is an example of an incomplete dataset. A dataset that is left-truncated at d means the dataset does not include data below d. Thus, the likelihood of such an observation must be the probability function conditioned on being above d, i.e. f(x)/Pr(X>d) = f(x)/S(d) Left truncation is commonly seen in insurance in the form of a deductible. When the loss amount is below the deductible, the policyholder will not receive any reimbursement for it, so the loss likely will not be reported. Thus, an insurer will only record a loss when it exceeds the deductible.
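
For instance, under an assumed exponential model with mean θ and a hypothetical deductible d, each reported loss x contributes

```latex
\frac{f(x)}{S(d)} = \frac{(1/\theta)\,e^{-x/\theta}}{e^{-d/\theta}}
                  = \frac{1}{\theta}\,e^{-(x-d)/\theta},
\qquad\text{so maximizing gives}\qquad
\hat{\theta} = \frac{\sum_{i}(x_i - d)}{n},
```

the average amount by which the reported losses exceed the deductible.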

Asymptotically Unbiased

Ultimately, bias reveals whether an estimator is unbiased or not. If the bias is not zero, the resulting expression could tell us something about the behavior of the bias. For example, the expression may contain the sample size n, so if lim(n→∞) Bias[θ^]=0 then θ^ is asymptotically unbiased. Intuitively, this means the bias is less of an issue when a large sample is taken.
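
A standard example (tying back to the normal MLE of σ^2): the estimator that divides by n is biased, but the bias vanishes as n grows:

```latex
\operatorname{E}\!\left[\tfrac{1}{n}\sum (X_i - \bar{X})^{2}\right]
= \frac{n-1}{n}\,\sigma^{2}
\;\Rightarrow\;
\operatorname{Bias} = -\frac{\sigma^{2}}{n} \xrightarrow[n \to \infty]{} 0
```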

MLE - Lognormal

When asked to estimate the lognormal parameters, we can convert the lognormal data points to normal and apply the normal shortcuts. For X∼Lognormal (μ,σ^2), the MLE estimates are μ^ = ∑ln x / n and σ^2 = ∑(ln x)^2/n − (μ^)^2.

Sample Mean

X¯= ∑X / n

Sufficiency

Y is a sufficient statistic for θ if and only if f(x1, ..., xn | y) = h(x1, ..., xn), where h(x1, ..., xn) does not depend on θ. The equation says that conditioning on the value of Y leads to a joint distribution of the sample that is unaffected by θ. Intuitively, this means knowing Y=y would thoroughly or sufficiently capture how θ impacts the sample, making it unnecessary to know each of the n observed values.

