MS Core Exam Day 1
[The Delta Method] Let {X_n} be a sequence of random variables such that (sqrt(n))(X_n - θ) converges in distribution to N(0, σ^2). Suppose function g is differentiable at θ and g'(θ) != 0 [the derivative of g(θ) is not equal to zero]. Then we can conclude that ...
(sqrt(n))(g(X_n) - g(θ)) converges in distribution to N(0, (σ^2)(g'(θ))^2) Note that the sqrt(n) factor stays on the left-hand side; it cannot be folded into the variance of the limiting distribution, which must not depend on n.
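A quick numerical sanity check of the delta method (a minimal sketch; taking X_n to be a sample mean, choosing g = log, and all parameter values below are illustrative assumptions, not part of the theorem):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 400, 20_000
theta, sigma = 2.0, 1.5
# X_n is the mean of n iid N(theta, sigma^2) draws, so sqrt(n)(X_n - theta) ~ N(0, sigma^2) exactly
xbar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)
lhs = np.sqrt(n) * (np.log(xbar) - np.log(theta))   # sqrt(n)(g(X_n) - g(theta)) with g = log
print(lhs.std())                                    # close to sigma * |g'(theta)|
print(sigma * abs(1 / theta))                       # = 1.5 * 0.5 = 0.75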
Assume X_1, ..., X_n are observations from the same distribution with a mean μ and finite variance σ^2. Let xbar = (X_1 + ... + X_n)/n. The central limit theorem tells us ...
(xbar - μ)/(σ/sqrt(n)) converges in distribution to N(0, 1) as n approaches + infinity.
If A and B are disjoint (A ∩ B = ∅, so P(A ∩ B) = 0) and P(B) > 0, then P(A|B) = ...
0
P(empty set) = ...
0
... <= P(A) <= ... [based on rules of probability]
0 <= P(A) <= 1
P(A|A) = ...
1
P(complement of A) = ...
1 - P(A)
3 properties of joint cumulative distribution function F(x1,x2)
1) F(x1,x2) is nondecreasing in x1 and x2 2) F(-infinity,x2) = F(x1,-infinity) = F(-infinity,-infinity) = 0 3) F(+infinity,+infinity) = 1
4 properties of cumulative distribution functions
1) If a < b, then F(a) <= F(b). 2) The limit as x approaches negative infinity of F(x) = 0. 3) The limit as x approaches positive infinity of F(x) = 1. 4) The limit as x approaches x_0 from above (meaning x > x_0) of F(x) = F(x_0) [right-continuous].
3 axioms of probability
1) P(A) >= 0 for all events A 2) P(C) = 1 where C is the entire sample space 3) If A1, A2, ... are mutually exclusive (pairwise disjoint) events, then P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ... This is known as countable additivity.
random experiment definition (4 parts)
1) The experiment can be repeated under the same condition(s). 2) Each experiment terminates with an outcome. 3) The outcome cannot be predicted with certainty prior to the performance of the experiment. 4) The collection of all possible outcomes (sample space) can be described prior to the experiment.
[Neyman-Pearson Theorem] Let X_1, ..., X_n be a random sample from a distribution with pdf or pmf f(x; θ). The likelihood function L(θ) is the product f(x_1; θ)f(x_2; θ)...f(x_n; θ). C, a subset of the sample space, is the best critical region of size a for testing H0: θ = θ_0 vs H1: θ = θ_1 if which 3 conditions hold?
1) [L(θ_0, x)]/[L(θ_1, x)] <= k for all x in C, for some constant k > 0. 2) [L(θ_0, x)]/[L(θ_1, x)] >= k for all x in the sample space but not in C. 3) C is of size a, meaning that the probability under H_0 that X falls in C is equal to a.
2 properties of joint pdf's
1) f(x1,x2) >= 0 2) integrating f(x1,x2)dx1dx2 over all possible values (negative to positive infinity for both x1 and x2) gives 1, the total probability
2 properties of joint pmf's
1) p(x1,x2) >= 0 for any real (x1,x2) 2) The probabilities must sum to 1.
The Gauss-Markov theorem states that ordinary least squares regression is the best linear unbiased estimator when which 5 conditions are met?
1) the model is linear in the parameters 2) data come from a random sample 3) no perfect multicollinearity among the predictor variables 4) error terms have mean zero and are uncorrelated with the predictor variables (exogeneity) 5) homoscedasticity (equal error variance)
Bayes' Theorem: If C1,...,Cj,...,Cn are exhaustive (their union takes up the entire sample space) and mutually exclusive, then P(Cj|B) = ...
= P(Cj ∩ B)/P(B) = P(Cj)*P(B|Cj)/[P(C1)*P(B|C1)+...+P(Cn)*P(B|Cn)]
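A tiny worked example of the formula (all probabilities below are made up for illustration; three exhaustive, mutually exclusive events C1, C2, C3 and an event B):

# Hypothetical priors P(Cj) and conditional probabilities P(B|Cj)
prior = [0.5, 0.3, 0.2]
likelihood = [0.10, 0.40, 0.80]
p_b = sum(p * l for p, l in zip(prior, likelihood))          # law of total probability: P(B) = 0.33
posterior = [p * l / p_b for p, l in zip(prior, likelihood)]  # P(Cj|B) for j = 1, 2, 3
print(p_b, posterior)                                         # posteriors sum to 1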
Statistic T_n is an unbiased estimator of θ if ...
E[T_n] = θ for any value of θ
E[E(X2|X1)] = ...
E[X2]
If V~χ^2(r) and W~χ^2(s) [chi square distributions with r and s degrees of freedom] are independent, then (V/r)/(W/s) ~ ...
F distribution with r and s degrees of freedom
True or false: E[XY] = E[X]E[Y] when X and Y are both random variables.
False, however, it is true if X and Y are independent.
Assuming X is a random variable with a finite mean μ and variance σ^2, Chebyshev's inequality tells us that for all ε > 0, ...
For all ε > 0, P(|X - μ| >= ε) <= (σ^2)/(ε^2)
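A minimal simulation check of Chebyshev's inequality (the exponential distribution and the ε values are arbitrary choices made only for illustration):

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)    # mean mu = 2, variance sigma^2 = 4
mu, var = 2.0, 4.0
for eps in (2.0, 3.0, 4.0):
    empirical = np.mean(np.abs(x - mu) >= eps)  # estimate of P(|X - mu| >= eps)
    print(eps, empirical, var / eps**2)         # empirical probability never exceeds the bound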
How do you find P(a<X<=b) given X's cdf Fx(x)?
Fx(b)-Fx(a)
cumulative distribution function of X
Fx(x) = P(X<=x) where X is a discrete or continuous random variable. Note that Fx(x) is defined for all real values of x, not just values in the support of X.
The hypergeometric distribution is similar to the binomial distribution, but in the hypergeometric distribution draws are done (with/without) replacement, as opposed to the binomial distribution where draws are done (with/without) replacement.
Hypergeometric: without replacement Binomial: with replacement
Fisher information about θ in X ~ f(x; θ) I(θ) = ...
I(θ) = E[(ℓ'(θ))^2] = -E[ℓ''(θ)] where ℓ'(θ) and ℓ''(θ) are the first and second derivatives of [log f(X; θ)] ***Note that the last part of the equation (-E[ℓ''(θ)]) only applies when regularity conditions R3 (ℓ is twice differentiable with respect to θ) and R4 (the order of differentiation and integration, or summation in the discrete case, can be interchanged without changing the value) are satisfied.
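A worked symbolic check (a sketch assuming a single Bernoulli(p) observation, which satisfies the regularity conditions; substituting x -> p computes the expectation here only because -ℓ''(p) is linear in x):

import sympy as sp

p, x = sp.symbols('p x', positive=True)
logf = x * sp.log(p) + (1 - x) * sp.log(1 - p)        # log pmf of a Bernoulli(p) observation
info = sp.simplify(-sp.diff(logf, p, 2).subs(x, p))   # -E[l''(p)], using E[X] = p
print(info)                                           # an expression equal to 1/(p*(1 - p))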
E[g(x)] calculation when X is continuous, given pdf fx(x)
Integrate g(x)fx(x)dx from negative to positive infinity. Integration of |g(x)|fx(x)dx must be finite for expectation to be defined. Alternatively, you can let y=g(x) and integrate yfy(y)dy from negative to positive infinity.
Marginal Moment Generating Function for x1 from the random vector (x1, x2)^T (the vector is transposed to be vertical)
M(t1) = E[e^(t1X1)] = M(t1, 0), the joint moment generating function with t2 replaced by 0
Moment Generating Function for random vector (x1, x2)
M(t1, t2) = E[e^(t1X1 + t2X2)] assuming the expectation exists for t1 and t2 values very close to 0
The mean of X is equivalent to E[...]. The variance of X is equivalent to E[...].
Mean = E[X] Variance = E[(X-E[X])^2] = E[X^2] - (E[X])^2
If X_1~N(μ_1,σ_1^2) ... Xn~N(μ_n,σ_n^2) are all independent, and a_1 ... a_n are constants, then a_1X_1 +... + a_nX_n ~ ...
N(a_1μ_1 + ... + a_nμ_n, (a_1^2)(σ_1^2) + ... + (a_n^2)(σ_n^2)) In other words, a linear combination of independent normally-distributed variables is also normally distributed. Note that the constants are squared in the variance term, so each contribution to the variance is nonnegative.
P(A|B) = ...
P(A ∩ B)/P(B)
P(A ∪ B) = ...
P(A) + P(B) - P(A ∩ B)
What can be concluded about P(A) and P(B) if A is a subset of B?
P(A) <= P(B)
P(X ∪ Y ∪ Z) = ...
P(X) + P(Y) + P(Z) - P(X ∩ Y) - P(Y ∩ Z) - P(X ∩ Z) + P(X ∩ Y ∩ Z) With more events, the signs keep alternating by the number of events in the intersection (add single events, subtract pairwise intersections, add three-way intersections, subtract four-way intersections, and so on). This is the inclusion-exclusion principle.
joint probability mass function: p(x1,x2) = ...
P(X1=x1 and X2=x2)
P(X=x) is ... for a continuous X and ... for a discrete X, given the pdf fx(x) and the pmf px(x).
P(X=x) is zero for a continuous variable, because integrating from x to x gives 0. P(X=x) = px(x) for a discrete random variable.
joint cumulative distribution function Fx_1x_2(x_1,x_2) = ...
P[{X_1<=x_1} ∩ {X_2<=x_2}]
What are permutation tests and when are they appropriate to use?
Permutation tests can be used when the null hypothesis is that there is no association between an outcome and specified groups. The test statistic is recomputed many times under the null hypothesis by randomly reassigning the observed outcomes to the groups. If there is no association, the observed statistic should look similar to the statistics obtained from the randomly mixed groups. The proportion of permuted statistics that are at least as extreme as the observed statistic is the p value (see the sketch below). Note that the smallest possible p value is [1/(number of permutations performed)], which can cause problems if too few permutations are run.
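A minimal two-sample sketch of the idea (using the difference in means as the test statistic and the (count + 1)/(n_perm + 1) convention; both are illustrative choices, not the only options):

import numpy as np

def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups x and y."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                           # randomly reassign outcomes to groups
        diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
        if abs(diff) >= abs(observed):                # "as or more extreme" than observed
            count += 1
    # counting the observed arrangement itself keeps the p-value away from exactly 0
    return (count + 1) / (n_perm + 1)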
empirical definition of probability
Relative frequency (f/N) is the ratio between frequency, or number of times an event has occurred, and sample size, the number of times the experiment was repeated. As sample size increases and approaches infinity, the relative frequency will be approximately equal to a constant in [0,1], which is the probability of the event.
Assume X_1 and X_2 have unknown variances, but their ratio k is known. If k is a constant greater than 1, and it is known that σ_1^2 = kσ_2^2, then we estimate σ_2^2 using both groups with the formula S_p^2 = ... so that E[S_p^2] = σ_2^2. The standard error for Xbar_1 - Xbar_2 is ...
S_p^2 = [(n_1 - 1)((S_1^2)/k) + (n_2 - 1)(S_2^2)]/[n_1 + n_2 - 2] SE: (S_p)sqrt(k/n_1 + 1/n_2)
What is the difference between simple and composite hypotheses?
Simple hypotheses exactly specify each parameter, for example, mu = 7. Composite hypotheses have at least one parameter not specified exactly, for example, mu > 7 or mu != 7 (not equal). Sometimes composite tests do not have a uniformly most powerful test.
Statistic T_n is an efficient estimator of θ if ...
T_n achieves the Rao-Cramer lower bound. If T_n is an unbiased estimator of θ and regularity conditions are met, achieving the RC lower bound means that var(T_n) = 1/(nI(θ)) where I(θ), the Fisher information, is negative one times the expectation of the second derivative of log f(X; θ) (assuming regularity conditions R0-R4 are satisfied).
method of moments for point estimation
Set the sample moments (1/n)ΣX_i, (1/n)ΣX_i^2, ..., (1/n)ΣX_i^k equal to the corresponding population moments E[X], E[X^2], ..., E[X^k], which are functions of the unknown parameters, and solve the resulting equations for the parameters. Because the sample moments converge to the population moments as n approaches infinity, the estimators are consistent, but they may be biased.
definition of maximum likelihood estimate for θ
The point in the parameter space where L(θ;x) attains its maximum. L(θ;x) is the likelihood function, which gives the probability of sample values X_1 = x_1, ..., X_n = x_n as a function of an unknown θ.
The joint probability density function f(x1,x2) can be integrated with respect to both x1 and x2 over area A to find ...
The probability that (x1,x2) exists in area A
True or false: if θ maximizes the likelihood function L(θ), then it will also maximize the log likelihood function ℓ(θ).
True
True or false: order matters in permutations, but not in combinations.
True
True or false: The normal distribution [pdf = (1/(σ(sqrt(2π))))e^(-0.5((x-μ)/σ)^2)] is an example of the regular exponential class of distributions?
True
True or false: discrete random variables have cdfs that are step functions, while continuous random variables have cdfs that are continuous functions.
True
least squares estimation
Used with independent random variables with the same variances and the same higher-order moments, assuming the expectations of the random variables are linear functions of other variables. The squared differences between observed and expected values are minimized. Least squares estimation is considered the best linear unbiased estimator for this circumstance. Used in regression.
Var[E(X2|X1)] + E[Var(X2|X1)] = ...
Var(X2)
Assume X_1, ..., X_n are i.i.d. with a common pdf f(x; θ) and that regularity conditions R0-R4 hold. Let Y = u(X_1, ..., X_n) be a statistic with mean E(Y) = k(θ). What does the Rao-Cramer inequality tell us? What is the potential special case of the Rao-Cramer inequality?
Var(Y) >= [(k'(θ))^2]/[nI(θ)] where I(θ) is the Fisher information and k'(θ) is the first derivative of k(θ). If E(Y) = θ [meaning Y is an unbiased estimator of θ], then Var(Y) >= 1/[nI(θ)]
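A simulation sketch of the unbiased special case, assuming a Poisson(λ) sample where I(λ) = 1/λ and the sample mean attains the bound (the distribution and all values below are illustrative):

import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 3.0, 50, 20_000
xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)   # Y = Xbar, unbiased for lambda
print(xbar.var())                                      # simulated Var(Y)
print(lam / n)                                         # Rao-Cramer bound 1/(n*I(lambda)) = lambda/n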
X_1 ~ χ^2(r_1) [chi squared with r_1 degrees of freedom], X_2 ~ χ^2(r_2), and X_3 ~ χ^2(r_3) are independent. X_1 + X_2 + X_3 ~ ...
X_1 + X_2 + X_3 ~ χ^2(r_1 + r_2 + r_3)
X_1 ~ gamma(a_1, B), X_2 ~ gamma(a_2, B), and X_3 ~ gamma(a_3, B) are independent. X_1 + X_2 + X_3 ~ ...
X_1 + X_2 + X_3 ~ gamma(a_1 + a_2 + a_3, B) This only works when the variables are independent and all B values are the same.
Student's Theorem: If X1 ... Xn are independently and identically distributed such that X1, ... , Xn~N(μ,σ^2), then Xbar = (X1 + ... + Xn)/n ~ ... S^2 = (1/(n-1))((X1-Xbar)^2 + ... + (Xn-Xbar)^2) Xbar and S^2 (are/are not) independent. (n-1)(S^2)/(σ^2) ~ ... T = (Xbar - μ)/(S/sqrt(n)) follows a t distribution with how many degrees of freedom?
Xbar = (X1 + ... + Xn)/n ~ N(μ,(σ^2)/n) Xbar and S^2 ARE independent (n-1)(S^2)/(σ^2) ~ χ^2(n-1) T = (Xbar - μ)/(S/sqrt(n)) follows a t distribution with n-1 degrees of freedom
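A quick simulation check of Student's theorem (all parameter values are arbitrary, and independence is checked only informally through the sample correlation):

import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 50_000
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)               # sample variance with the n-1 divisor
print(np.corrcoef(xbar, s2)[0, 1])       # near 0, consistent with independence of Xbar and S^2
q = (n - 1) * s2 / sigma**2
print(q.mean(), q.var())                 # near n-1 = 9 and 2(n-1) = 18, matching chi^2(n-1)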
event definition (within the context of a random experiment)
a subset of outcomes from the sample space, typically the set of outcomes of interest (often denoted with A)
sample space definition
a set containing the collection of all possible outcomes of an experiment (often denoted with C)
E[a+b(g(x))+c(h(x))] where a, b, and c are constants and g(x) and h(x) are functions
a+b(E[g(x)])+c(E[h(x)])
If 3 objects must be chosen from a set of 5, how many combinations and permutations are possible?
combinations = 5!/(2!*3!) = 4*5/(2*1) = 10 permutations = 5!/(2!) = 3*4*5 = 60
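The same counts via the Python standard library (math.comb and math.perm, available in Python 3.8+):

import math

print(math.comb(5, 3))   # 10 combinations (order does not matter)
print(math.perm(5, 3))   # 60 permutations (order matters) = math.comb(5, 3) * 3!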
conditional pmf p(x_2|x_1) and conditional pdf f(x_2|x_1) basic structure of conditional pdf/pmf 2 properties of conditional pdf/pmf
conditional pmf = p(x_1,x_2)/p(x_1) conditional pdf = f(x_1,x_2)/f(x_1) basic structure: (joint pdf or pmf)/(marginal pdf or pmf of fixed variable) property 1) nonnegative property 2) integrates or sums to 1
How do you calculate E[X] when X is continuous given pdf fx(x)? How do you calculate E[X] when X is discrete given pmf px(x)?
continuous: integrate xfx(x)dx from negative to positive infinity. The expectation does not exist if |x|fx(x)dx integrated from negative to positive infinity is equal to infinity. discrete: sum xpx(x) over all possible x values, which is taking a weighted average of x values based on the pmf. If |x|px(x) sums to infinity, then the expectation does not exist.
Relationship between joint cumulative distribution function F(x_1,x_2) and joint pmf p(x_1,x_2) or joint pdf f(x_1,x_2)
discrete: F(x_1,x_2) = summation of p(x_1,x_2) over all values where both X_1<=x_1 and X_2<=x_2 continuous: F(x_1,x_2) = integration of f_x_1x_2(t_1,t_2)dt_2dt_1 over t_2 values from negative infinity to x_2 and over t_1 values from negative infinity to x_1
The regular exponential class of distributions has pdfs of pmfs of the form ... Which 3 conditions must be satisfied for a distribution to belong in the regular exponential class?
f(x; θ) = e^[p(θ)K(x) + S(x) + q(θ)] condition 1) The support of X does not depend on θ. condition 2) p(θ) is a nontrivial continuous function of θ on the parameter space. condition 3) If X is a continuous random variable, K'(x), the derivative of K(x), is not identically zero and S(x) is a continuous function of x. If X is a discrete random variable, K(x) must be a nontrivial function of x on the support of X.
Statistic T_n is a consistent estimator of θ if ...
for any θ, T_n converges in probability to θ. This means that for all ε > 0, the limit as n approaches + infinity of P[|T_n - θ| >= ε] is 0.
probability density function of X is the derivative of ...
fx(x) = F'x(x), the pdf is the derivative of the cdf for a continuous random variable x, defined from negative to positive infinity
Y_1, ..., Y_n are order statistics from an i.i.d. random sample from a continuous distribution with support (a, b), pdf f(x), and cdf F(x). The joint pdf of the order statistics, g(y_1, ..., y_n) = ... The pdf for an individual order statistic, g(y_i) = ... The pdf for a pair of order statistics where k > i, g(y_i, y_k) = ...
g(y_1, ..., y_n) = n!f(y_1)f(y_2)...f(y_n) for a < y_1 < y_2 < ... < y_n < b, and 0 otherwise g(y_i) = [n!/((n - i)!(i - 1)!)](F(y_i)^(i - 1))((1 - F(y_i))^(n - i))f(y_i) for a < y_i < b and 0 otherwise g(y_i, y_k) = [n!/((n - k)!(k - i - 1)!(i - 1)!)](F(y_i)^(i - 1))((F(y_k) - F(y_i))^(k - i - 1))((1 - F(y_k))^(n - k))f(y_i)f(y_k) for a < y_i < y_k < b and 0 otherwise
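A simulation sketch of the single-order-statistic formula, assuming a Uniform(0,1) sample where F(y) = y, so g(y_i) reduces to a Beta(i, n - i + 1) density with mean i/(n + 1) (the values of n and i are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
n, i, reps = 5, 2, 100_000
y = np.sort(rng.uniform(size=(reps, n)), axis=1)   # each row holds Y_(1) < ... < Y_(n)
print(y[:, i - 1].mean())                          # simulated E[Y_(i)]
print(i / (n + 1))                                 # Beta(i, n - i + 1) mean = 2/6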
Statistic T is considered complete if ...
the family {f_T(t;θ) : θ ∈ Ω} of its distributions is complete, meaning that the only function of T that is an unbiased estimator of 0 is the function that equals 0 with probability 1 (E_θ[u(T)] = 0 for all θ implies P_θ[u(T) = 0] = 1).
given joint pdf f(x_1,x_2), the marginal pdf f(x_1) can be found by ...
integrating f(x_1,x_2)dx_2 from negative to positive infinity
permutations
k objects are chosen from a set of n possible objects and order does matter. There are n!/[(n-k)!] permutations. Note that [# of combinations]*[k!] = [# of permutations] because k! is the number of ways each combination can be ordered.
combinations
k objects are selected from n possible objects and order does not matter. There are n!/[(n-k)!*k!] combinations. Note that [# of permutations]/[k!] = [# of combinations] because k! is the number of ways each combination can be ordered.
How do you construct a 95% asymptotic CI for μ if σ is known? How does this change if σ is unknown?
known σ: (Xbar - 1.96σ/sqrt(n), Xbar + 1.96σ/sqrt(n)) unknown σ: Use S, the sample standard deviation, in place of σ, and replace 1.96 with the 97.5th percentile of the t distribution with n-1 degrees of freedom.
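A minimal sketch of both intervals (the data and the "known" σ below are made up for illustration; scipy is assumed for the t critical value):

import numpy as np
from scipy import stats

x = np.array([4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 5.1])   # hypothetical sample
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

sigma = 0.5                                               # case 1: sigma treated as known
print(xbar - 1.96 * sigma / np.sqrt(n), xbar + 1.96 * sigma / np.sqrt(n))

t_crit = stats.t.ppf(0.975, df=n - 1)                     # case 2: sigma unknown, use S and t(n-1)
print(xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n))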
X~Beta(a,B) where a,B > 0 What are the mean and variance of X?
mean = a/(a + B) variance = aB/[((a + B)^2)(a + B + 1)]
X~Normal(μ,σ^2) What are the pdf and mgf?
pdf = (1/(σ(sqrt(2π))))e^(-0.5((x-μ)/σ)^2) mgf = e^(μt + 0.5(σ^2)(t^2))
X~continuous uniform distribution over (a,b) What is the pdf and mean?
pdf = 1/(b-a) for a<x<b, and 0 otherwise mean = (a+b)/2
X~gamma(a,B) where a > 0 and B > 0. What are the pdf, mgf, mean, and variance of X?
pdf = [B^(a)x^(a - 1)e^(-Bx)] / [Γ(a)] for 0<x<+ infinity, 0 otherwise mean = a/B variance = a/(B^2) mgf = (1 - (t/B))^(-a) for t < B ***Note that sometimes the second parameter is defined as 1/B (a scale rather than a rate), which changes all the above formulas.
X~χ^2(r) [chi squared distribution with r degrees of freedom] What are the pdf, mgf, mean, and variance of X?
pdf = [x^((r/2) - 1)e^(-x/2)]/[Γ(r/2)2^(r/2)] for 0<x<+ infinity, 0 otherwise mgf = (1 - 2t)^(-r/2) for t < 1/2 mean = r variance = 2r ***the chi squared distribution is a special case of the gamma distribution: χ^2(r) = gamma(a = r/2, B = 1/2)
X~Exponential(λ) where λ > 0 What are the pdf, mgf, mean, and variance of X?
pdf = λe^(-λx) for 0 <= x, 0 otherwise mgf = λ/(λ - t) for t < λ mean = 1/λ variance = 1/(λ^2) ***"memoryless" property: P(X > s + t | X > s) = P(X > t)
X~Binomial(n,p) What is the pmf, mean, variance, and mgf?
pmf = (n choose x)(p^x)(1-p)^(n-x) for x = 0, 1, 2 ... n and 0 otherwise mean = np variance = np(1-p) mgf = (1 - p + pe^t)^n
X~Bernoulli(p) What is the pmf, mean, variance, and mgf?
pmf = (p^x)(1-p)^(1-x) for x = 0, 1 and 0 elsewhere mean = p variance = p(1-p) mgf = 1 - p + pe^t
X~Poisson(λ) What are the pmf, mean, variance, and mgf?
pmf = (λ^x)(e^-λ)/x! for x = 0, 1, 2, ... and 0 elsewhere mean = λ variance = λ mgf = e^(λ((e^t) - 1))
X~hypergeometric(N, K, n) What is the pmf and mean? hint: N = population size, K = total successes in population, n = number of draws
pmf = P(X = k) = (K choose k)(N-K choose n-k)/(N choose n) mean = nK/N
X~Multinomial(n, p1, p2, ..., p(k-1)) What is the pmf?
pmf = P(X1=x1, X2=x2, ..., X(k-1)=x(k-1)) = [n!/(x1!x2!...x(k-1)!xk!)](p1^x1)(p2^x2)...(p(k-1)^x(k-1))(pk^xk) where xk = n - x1 - x2 - ... - x(k-1) and pk = 1 - p1 - p2 - ... - p(k-1)
probability mass function of X
px(di) = P(X=di) for i = 1, 2, 3, ... for a discrete random variable X
[Rao-Blackwell Theorem] Let X_1, ..., X_n be a random sample from a distribution with pdf or pmf f(x; θ). Let Y_1 = u_1(X_1, ..., X_n) be a sufficient statistic for θ. Let Y_2 = u_2(X_1, ..., X_n) be an unbiased estimator of τ(θ). Then the statistic ... will be an unbiased estimator of τ(θ) with variance no larger than that of Y_2. This is only useful when ...
statistic: φ(y_1) = E[Y_2|y_1] The Rao-Blackwell theorem is only useful when Y_2 is NOT already a function of Y_1 (otherwise φ(Y_1) = Y_2 and nothing is gained).
given joint pmf p(x_1,x_2), the marginal pmf p(x_1) can be found by ...
summing p(x_1,x_2) over all values of x_2 while keeping x_1 fixed
If Z~N(0,1) and V~χ^2(r) [chi square distribution with r degrees of freedom] are independent, then Z/sqrt(V/r) ~ ...
t distribution with r degrees of freedom
Likelihood ratio test statistic for H0: θ ∈ Ω_0 vs H1: θ ∈ Ω_1. Large values support ... Small values support ... What is the decision rule for alpha value a?
test statistic: Λ is equal to the maximum of the likelihood over all θ in Ω_0 divided by the maximum of the likelihood over all θ in the full parameter space Ω. Large values support H0. Small values support H1. Decision rule: reject H0 in favor of H1 if Λ <= c, where c is chosen so that the probability under H0 of Λ <= c is equal to a.
A uniformly most powerful test is defined by using ...
the uniformly most powerful critical region of size a, which is considered by the Neyman-Pearson theorem to be the "best critical region" for testing H0 against EACH simple hypothesis contained in H1 (assuming a simple H0 and a composite H1).
Gamma Function Γ(a) = ... where a > 0
Γ(a) = the integral from 0 to + infinity of y^(a - 1)e^(-y)dy
Let X be a random variable whose distribution comes from the regular exponential class, having the pdf or pmf f(x; θ) = e^[p(θ)K(x) + S(x) + q(θ)]. If a random sample is taken from this distribution, what statistic is guaranteed to be a complete sufficient statistic for θ?
Y_1 = ΣK(X_i), summed from i = 1 to i = n (over the entire sample). The family of distributions of this statistic is complete, and Y_1 is sufficient for θ.
Approximate 95% confidence interval for difference in means of X and Y when both populations have the same known standard deviation σ: ... How does this change when σ is unknown (but the variances are still assumed to be equal)?
σ is known: (xbar - ybar - 1.96σsqrt((1/n_1) + (1/n_2)), xbar - ybar + 1.96σsqrt((1/n_1) + (1/n_2))) σ is unknown: σ^2 is estimated with the pooled variance S_p^2 = [(n_1 - 1)(S_1^2) + (n_2 - 1)(S_2^2)]/[n_1 + n_2 - 2], and the t distribution with n_1 + n_2 - 2 degrees of freedom is used in place of the normal.
Approximate 95% confidence interval for difference in means of X and Y when σ_1 and σ_2 are known : ... How does this change when σ_1 and σ_2 are unknown? Assume unequal variances for both cases.
σ_1 and σ_2 are known : (xbar - ybar - 1.96sqrt([(σ_1^2)/n_1] + [(σ_2^2)/n_2]), xbar - ybar + 1.96sqrt([(σ_1^2)/n_1] + [(σ_2^2)/n_2])) σ_1 and σ_2 are unknown: Use S_1^2 and S_2^2, the sample variances, instead of σ_1^2 and σ_2^2, and use the t distribution with degrees of freedom approximated by the Welch-Satterthwaite equation.
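A sketch of the unequal-variance interval with unknown σ_1 and σ_2 (the data are made up for illustration; the Welch-Satterthwaite formula below is the usual approximation for the degrees of freedom):

import numpy as np
from scipy import stats

x = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.5])        # hypothetical group 1
y = np.array([10.2, 11.1, 10.8, 9.9, 10.5, 11.0, 10.4])   # hypothetical group 2
n1, n2 = len(x), len(y)
v1, v2 = x.var(ddof=1), y.var(ddof=1)                     # sample variances S_1^2, S_2^2
se = np.sqrt(v1 / n1 + v2 / n2)
# Welch-Satterthwaite approximation for the degrees of freedom
df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
t_crit = stats.t.ppf(0.975, df)
diff = x.mean() - y.mean()
print(diff - t_crit * se, diff + t_crit * se)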
When calculating the confidence interval for the difference in proportions p_1 and p_2, how are the variances estimated?
σ_1^2 is estimated with phat_1(1 - phat_1), where phat_1 is sample proportion 1, so the variance of phat_1 is estimated by phat_1(1 - phat_1)/n_1; likewise for σ_2^2 and phat_2. The two variances are not pooled or assumed equal.
If Z~N(0,1), then Z^2 ~ ...
χ^2(1), a chi square distribution with 1 degree of freedom
Product moment: E[(X1^r)(X2^s)] given the moment generating function M(t1, t2) of (X1, X2)^T (transposed)
E[(X1^r)(X2^s)] = ∂^(r+s)/(∂t1^r ∂t2^s) M(t1, t2) evaluated at t1 = t2 = 0. In words: differentiate the mgf r times with respect to t1 and s times with respect to t2, then set t1 = t2 = 0. *note that this can be modified to find E[X1^r] alone by setting s equal to 0
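A symbolic sketch using a hypothetical joint mgf: assuming independent X1 ~ N(0,1) and X2 ~ Exponential(rate 1), the joint mgf factors as M(t1, t2) = exp(t1^2/2)/(1 - t2), and the example computes E[X1^2 X2]:

import sympy as sp

t1, t2 = sp.symbols('t1 t2')
M = sp.exp(t1**2 / 2) / (1 - t2)   # hypothetical joint mgf (independence makes it a product)
r, s = 2, 1
moment = sp.diff(M, t1, r, t2, s).subs({t1: 0, t2: 0})   # evaluate derivatives at t1 = t2 = 0
print(moment)                      # E[X1^2 * X2] = E[X1^2] * E[X2] = 1 * 1 = 1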