Chapter 1: Review of Probability
Proposition 1.1.7 (Inclusion-Exclusion)
For any finite collection {A1, ..., An} ⊂F of sets in a measure space (Ω, F, P), we have ... where all the ik lie in {1, ..., n}. See Figure 1.1 for the n=3 case (pg. 7)
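The formula itself is elided above. Assuming it is the standard statement P(A1∪...∪An) = Sum[(-1)^(k+1) Sum[P(Ai1∩...∩Aik), i1<...<ik], {k, 1, n}], the sketch below checks it by brute force for n=3 on a finite space with equally likely outcomes. The sample space and the three events are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Equally likely outcomes on a finite sample space."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(sets, omega):
    """Alternating sum over all k-fold intersections with i1 < ... < ik."""
    total = Fraction(0)
    n = len(sets)
    for k in range(1, n + 1):
        sign = (-1) ** (k + 1)
        for idx in combinations(range(n), k):
            inter = set(omega)
            for i in idx:
                inter &= sets[i]
            total += sign * prob(inter, omega)
    return total

omega = set(range(1, 13))
A = [{1, 2, 3, 4, 5}, {4, 5, 6, 7}, {1, 5, 7, 8, 9}]  # hypothetical events, n = 3
lhs = prob(set().union(*A), omega)    # probability of the union, computed directly
rhs = inclusion_exclusion(A, omega)   # right-hand side of inclusion-exclusion
print(lhs, rhs)
```

Here the singles contribute 14/12, the pairwise intersections subtract 6/12, and the triple intersection adds back 1/12.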
Proposition 1.1.5: Discrete Probability space properties
Let (Ω, F, P) be a discrete probability space. If E, F∈2^Ω and E^c = Ω\E, then (i) P(E^c) = 1 - P(E), and, in particular, P(∅) = 0; (ii) E⊂F implies P(E) ≤ P(F); and (iii) P(E∪F) = P(E) + P(F) - P(E∩F) (pg. 6)
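A minimal sketch checking properties (i) through (iii) on a small discrete probability space. The loaded-die weights and the events E and F are hypothetical choices. They were picked only so that E⊂F holds and the point masses sum to 1.

```python
from fractions import Fraction

# Hypothetical loaded die: point masses on Ω = {1, ..., 6} summing to 1
weights = {1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8),
           4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 4)}
omega = set(weights)

def P(event):
    """Discrete probability measure: add up the point masses in the event."""
    return sum((weights[w] for w in event), Fraction(0))

E = {2, 4, 6}          # even roll
F = {2, 3, 4, 5, 6}    # roll at least 2, so E ⊂ F

assert P(omega - E) == 1 - P(E)            # (i) complement rule
assert P(set()) == 0                       # (i) P(∅) = 0
assert P(E) <= P(F)                        # (ii) monotonicity
assert P(E | F) == P(E) + P(F) - P(E & F)  # (iii) union rule
```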
Normal distribution
Support: R^n. Example: the distribution of heights of males in a population (pg. 34)
Definition 1.2.6: Independent Events
Two events E, F in a discrete probability space are independent if P(E∩F) = P(E)P(F). Note that if the outcome of E has no impact on the probability of F, then it also has no impact on the probability of the complement of F (pg. 11)
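A small sketch with two fair dice. It checks the product rule for a pair of independent events and for E paired with the complement of F, which is the point of the note above and of Proposition 1.2.7. The events chosen are hypothetical examples.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def P(event):
    """Equally likely outcomes."""
    return Fraction(len(event), len(omega))

E = {w for w in omega if w[0] % 2 == 0}    # first die is even
F = {w for w in omega if sum(w) == 7}      # total is 7
Fc = omega - F                             # complement of F

print(P(E & F) == P(E) * P(F))     # E and F are independent
print(P(E & Fc) == P(E) * P(Fc))   # so are E and F^c
```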
Definition 1.3.7: Independent random variables
Two univariate random variables X and Y on a discrete probability space are independent if their joint p.m.f. factors as the product of the marginals: gX,Y(x, y) = gX(x)gY(y) for all x and y (pg. 16)
Student's t-distribution
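Use when sampling from a normal distribution whose variance is unknown and the sample size is small. Its tails are heavier than those of the normal distribution, which accounts for the extra uncertainty from estimating the variance. When ν=1, this is called the Cauchy distribution (pg. 39)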
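A quick numerical check of two claims on this card. The first is that ν=1 gives the Cauchy density. The second is that the tails are heavier than the normal's. The sketch uses the standard t density Γ((ν+1)/2)/(√(νπ)Γ(ν/2)) (1 + x²/ν)^(-(ν+1)/2). The test points and the choice ν=3 for the tail comparison are arbitrary.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1 / (math.pi * (1 + x * x))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# nu = 1 reproduces the Cauchy density
for x in [-3.0, -0.5, 0.0, 1.0, 4.0]:
    assert math.isclose(t_pdf(x, 1), cauchy_pdf(x))

# Heavier tails: much more density far from 0 than the standard normal
print(t_pdf(4.0, 3), normal_pdf(4.0))
```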
Properties of Expected value
(i) Expectation is a linear operator on the space of random variables: E[αX + βY] = αE[X] + βE[Y]. (ii) Law of the unconscious statistician: if X:Ω→R^d is a discrete random variable and h:R^d→R^m is any function, then h∘X is a discrete random variable, and the expected value of h(X) is given by E[h(X)] = Sum[h(x)P(X=x), x] = Sum[h(x) gX(x), x]. (iii) Factorization of expectation for independent random variables: for any independent random variables X and Y on a discrete probability space Ω, the product XY, defined by XY(w) = X(w)Y(w), is also a random variable, with expectation E[XY] = E[X]E[Y] (pg. 17)
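A sketch that checks all three properties with exact arithmetic. The p.m.f.s gX and gY are hypothetical. The joint p.m.f. is built as their product, which is exactly the independence assumption that (iii) requires. The helper E applies the law of the unconscious statistician directly.

```python
from fractions import Fraction
from itertools import product

# Hypothetical p.m.f.s for two independent discrete random variables
gX = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
gY = {-1: Fraction(1, 4), 1: Fraction(3, 4)}

def E(pmf, h=lambda x: x):
    """E[h(X)] = Sum[h(x) g(x), x] (law of the unconscious statistician)."""
    return sum((h(x) * p for x, p in pmf.items()), Fraction(0))

# Joint p.m.f. under independence: g(x, y) = gX(x) gY(y)
gXY = {(x, y): gX[x] * gY[y] for x, y in product(gX, gY)}

a, b = 3, -2
assert E(gXY, lambda v: a * v[0] + b * v[1]) == a * E(gX) + b * E(gY)  # (i)
assert E(gX, lambda x: x ** 2) == 1                                     # (ii)
assert E(gXY, lambda v: v[0] * v[1]) == E(gX) * E(gY)                   # (iii)
```

Linearity (i) holds without independence. Only the factorization in (iii) depends on gXY being the product of the marginals.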
Covariance matrix
(pg. 18)
Binomial Distribution
(pg. 20)
Finite uniform distribution
(pg. 20)
Categorical Distribution
(pg. 21)
Multinomial Distribution
(pg. 23)
Poisson Distribution
(pg. 24)
Negative Binomial Distribution
(pg. 25)
Hypergeometric Distribution
(pg. 27)
List the notation, pmf, support, expectation, variance, etc. for each of the discrete probability distributions
(pg. 29)
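The table on pg. 29 is not reproduced here. As a partial self-check, the sketch below recovers the mean and variance of two of the listed distributions directly from their p.m.f.s. It uses the standard facts that Binomial(n, p) has mean np and variance np(1-p), and Poisson(λ) has mean and variance λ. The parameter values are hypothetical. The Poisson support is truncated at k=100, which is an approximation.

```python
import math

def moments(pmf_items):
    """Mean and variance from (value, probability) pairs."""
    items = list(pmf_items)
    mean = sum(x * p for x, p in items)
    var = sum((x - mean) ** 2 * p for x, p in items)
    return mean, var

def binomial_pmf(n, p):
    return ((k, math.comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1))

def poisson_pmf(lam, kmax=100):
    # Recurrence p_{k+1} = p_k * lam / (k + 1) avoids overflowing factorials;
    # the tail beyond kmax is negligible for moderate lam
    p = math.exp(-lam)
    for k in range(kmax + 1):
        yield k, p
        p *= lam / (k + 1)

n, p, lam = 10, 0.3, 4.0
bm, bv = moments(binomial_pmf(n, p))
pm, pv = moments(poisson_pmf(lam))
print(bm, bv)  # approximately n*p and n*p*(1-p)
print(pm, pv)  # approximately lam and lam
```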
Jensen's Inequality (All three forms)
(pg. 41-42)
Markov's Inequality
(pg. 42)
Chebyshev's Inequality
(pg. 43)
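The inequalities are not written out on these cards. This sketch assumes their standard forms. Markov's inequality states P(X ≥ a) ≤ E[X]/a for nonnegative X and a > 0. Chebyshev's inequality states P(|X - µ| ≥ a) ≤ Var(X)/a². Both are checked exactly for every integer threshold on a Binomial(20, 1/4) example, chosen arbitrarily.

```python
from fractions import Fraction
from math import comb

# Binomial(20, 1/4) p.m.f.: an arbitrary nonnegative example
n, p = 20, Fraction(1, 4)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
mu = sum(k * q for k, q in pmf.items())
var = sum((k - mu) ** 2 * q for k, q in pmf.items())

for a in range(1, n + 1):
    # Markov: P(X >= a) <= E[X] / a
    tail = sum(q for k, q in pmf.items() if k >= a)
    assert tail <= mu / a
    # Chebyshev: P(|X - mu| >= a) <= Var(X) / a^2
    dev = sum(q for k, q in pmf.items() if abs(k - mu) >= a)
    assert dev <= var / a**2
```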
Weak Law of Large Numbers
(pg. 44)
What is a σ-algebra (sigma-algebra)?
A collection of subsets F⊂2^Ω is called a σ-algebra if it satisfies a) ∅∈F; b) if E∈F, then E^c∈F (closed under complements); c) if E1, E2, ...∈F, then Union[Ei, {i, 1, ∞}]∈F (closed under countable unions).
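A sketch that tests the three axioms for a family of subsets of a finite sample space. The function name is_sigma_algebra and the example families are hypothetical. The sketch relies on one simplification: on a finite Ω, closure under countable unions reduces to closure under pairwise unions, because only finitely many distinct sets exist and pairwise closure extends to finite unions by induction.

```python
from itertools import combinations

def is_sigma_algebra(F, omega):
    """Check the σ-algebra axioms for a family F of subsets of a finite omega."""
    F = {frozenset(E) for E in F}
    omega = frozenset(omega)
    if frozenset() not in F:                          # a) contains ∅
        return False
    if any(omega - E not in F for E in F):            # b) closed under complements
        return False
    return all(E | G in F for E, G in combinations(F, 2))  # c) closed under unions

omega = {1, 2, 3, 4}
print(is_sigma_algebra([set(), {1, 2}, {3, 4}, omega], omega))  # True
print(is_sigma_algebra([set(), {1}, omega], omega))             # False: {2, 3, 4} missing
```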
Definition 1.6.2: Continuous random variable, multivariate random variable. Cumulative distribution function
A function X:Ω→R on a probability space (Ω, F, P) is a univariate random variable if X⁻¹((-∞, x])∈F for every x∈R. A function Y:Ω→R^n with Y=(Y1, ..., Yn) is a multivariate random variable if every coordinate function Yi:Ω→R is a univariate random variable. The cumulative distribution function (c.d.f.) of a univariate random variable X is the function FX:R→[0, 1] given by FX(a) = P(X≤a) = P(X⁻¹((-∞, a])). For a random variable Y = (Y1, ..., Yn) on a general probability space (Ω, F, P), the function FY(y) = P(Y≤y) = P(Y1≤y1, ..., Yn≤yn) is called the joint cumulative distribution function of the random variables Y1, ..., Yn, or just the cumulative distribution function (c.d.f.) of Y.
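A sketch of FX(a) = P(X≤a) for a univariate random variable with finitely many values. The support points and masses are hypothetical. The resulting c.d.f. is a nondecreasing step function that rises from 0 to 1.

```python
from fractions import Fraction
import bisect

# Hypothetical univariate X: sorted support points and their probabilities
support = [-1, 0, 2, 5]
masses = [Fraction(1, 5), Fraction(2, 5), Fraction(1, 10), Fraction(3, 10)]

def cdf(a):
    """F_X(a) = P(X <= a): total mass on support points not exceeding a."""
    i = bisect.bisect_right(support, a)  # number of support points <= a
    return sum(masses[:i], Fraction(0))

print([cdf(a) for a in [-2, -1, 0.5, 2, 10]])
```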
Dirichlet distribution
A multivariate generalization of the beta distribution (pg. 39)
Definition 1.6.4: Probability density function (p.d.f.)
A random variable X = (X1, ..., Xn) has a continuous distribution if the mixed partial derivative ∂^n/∂x1⋯∂xn FX(x) exists and is continuous on the support of X. Denote this derivative by fX. The function fX:R^n→R is called the joint probability density function of the random variables X1, ..., Xn, or the probability density function (p.d.f.) of the multivariate random variable X = (X1, ..., Xn).
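In the univariate case (n=1), the definition says fX = FX′. The sketch checks this numerically for an exponential random variable with a hypothetical rate λ=2. It uses the c.d.f. 1 - e^(-λx) and compares a central finite difference against the hand-computed density λe^(-λx) at a few points inside the support.

```python
import math

lam = 2.0  # hypothetical rate of an exponential random variable

def F(x):
    """c.d.f. of Exponential(lam): 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def f(x):
    """p.d.f. obtained by differentiating F by hand: lam * exp(-lam * x)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

h = 1e-5
for x in [0.1, 0.5, 1.0, 2.0]:
    numeric = (F(x + h) - F(x - h)) / (2 * h)   # central difference for dF/dx
    assert math.isclose(numeric, f(x), rel_tol=1e-6)
```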
Proposition 1.1.6: Counting measure. A way to create a discrete probability measure from any finite set. Equally likely outcomes
Assume Ω is a finite set. Define a map P: 2^Ω → [0, 1] by P(E) = |E|/|Ω| for every E⊆Ω. This map is a discrete probability measure; that is, it satisfies the conditions of Definition 1.1.3 (pg. 6)
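A brute-force sketch of the proposition for a fair die. It enumerates all 2^6 events and checks P(Ω)=1 and additivity on every pair of disjoint events. On a finite Ω this covers Definition 1.1.3, because a collection of mutually exclusive events has only finitely many nonempty members, and pairwise additivity extends to finite unions by induction.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # a fair die

def P(E):
    """Counting measure P(E) = |E| / |Ω| for E ⊆ Ω."""
    return Fraction(len(E), len(omega))

# Every event in 2^Ω
events = [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

assert P(omega) == 1
assert all(P(A | B) == P(A) + P(B) for A in events for B in events if not A & B)
```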
Three key properties of conditional probability
Chain rule, the law of total probability, and Bayes' rule (pg. 10)
Definition 1.1.3: Discrete Probability Measure, Discrete Probability space
Consider a countable sample space Ω, and let F=2^Ω be the power set of Ω. A function P:F→[0, 1] is called a discrete probability measure whenever the following conditions hold: i) P(Ω) = 1 ii) Additivity: If {Ei}i∈I ⊂ F is a collection of mutually exclusive events, indexed by a countable set I, then P(Union[Ei, i∈I]) = Sum[P(Ei), i∈I] In this case, the triple (Ω, F, P) is called a discrete probability space. We say that an event E∈F occurs with probability P(E). In the case that E = {w} is a singleton set, it is common to write P(w) instead of P({w}). (pg. 5)
Discrete Probability Space
Consists of a countable sample space Ω, corresponding to all possible outcomes of an experiment, the collection F=2^Ω of all events, and a probability measure P:F→[0, 1] (pg. 5)
Proposition 1.3.12: How to write variance and covariance in terms of two expectation terms
Cov(X, Y) = E[XY] - E[X]E[Y] Var(X) = E[X²] - E[X]² (pg. 18)
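A sketch verifying both identities with exact fractions. The joint p.m.f. g of two dependent 0/1-valued random variables is hypothetical. Covariance is computed from the standard definition E[(X - E[X])(Y - E[Y])] and compared with the shortcut. Variance is handled the same way.

```python
from fractions import Fraction

# Hypothetical joint p.m.f. g(x, y) of two dependent random variables
g = {(0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
     (1, 0): Fraction(1, 10), (1, 1): Fraction(1, 2)}

def E(h):
    """Expectation of h(X, Y) under the joint p.m.f. g."""
    return sum((h(x, y) * p for (x, y), p in g.items()), Fraction(0))

mx, my = E(lambda x, y: x), E(lambda x, y: y)
cov_def = E(lambda x, y: (x - mx) * (y - my))   # E[(X - E[X])(Y - E[Y])]
cov_short = E(lambda x, y: x * y) - mx * my     # E[XY] - E[X]E[Y]
var_def = E(lambda x, y: (x - mx) ** 2)         # E[(X - µ)^2]
var_short = E(lambda x, y: x ** 2) - mx ** 2    # E[X^2] - E[X]^2
assert cov_def == cov_short and var_def == var_short
print(cov_short, var_short)
```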
Gamma distribution
Describes the waiting time until a>0 events have occurred in a homogeneous Poisson process of rate b>0. pdf: fX(t) = b^a/Gamma[a] t^(a-1)Exp(-tb) for t>0; mean: a/b; variance: a/b² (pg. 36)
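A numerical sketch confirming that the stated pdf integrates to 1 and has the stated mean and variance. The shape a=3 and rate b=2 are hypothetical. Integration uses a plain midpoint rule on [0, 40]. That interval is an assumption, justified here because the density beyond t=40 is of order e^(-80).

```python
import math

def gamma_pdf(t, a, b):
    """Gamma density with shape a and rate b, for t > 0."""
    return b**a / math.gamma(a) * t ** (a - 1) * math.exp(-b * t)

def integrate(func, lo, hi, n=100_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

a, b = 3.0, 2.0  # hypothetical shape and rate
total = integrate(lambda t: gamma_pdf(t, a, b), 0, 40)
mean = integrate(lambda t: t * gamma_pdf(t, a, b), 0, 40)
second = integrate(lambda t: t * t * gamma_pdf(t, a, b), 0, 40)
print(total, mean, second - mean**2)  # approximately 1, a/b, and a/b**2
```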
Definition 1.3.3: Probability mass function, joint probability mass function
If (Ω, F, P) is a discrete probability space and X is a random variable on Ω, then the function gX(x) = P(X=x) is called the probability mass function of X. If X=(X1, ..., Xd) is multivariate, then gX(x) is often called the joint probability mass function of the univariate random variables X1, ..., Xd (pg. 15)
Proposition 1.2.7: Independent events, effect on complements
If E and F are independent events, then E^c and F are also independent (pg. 11)
Proposition 1.3.6
If X = (X1, ..., Xn) is a discrete random variable with probability mass function gX, then for each i∈{1, ..., n} the marginal probability mass function gi is the probability mass function of Xi: gi = gXi (pg. 16)
Definition 1.3.5: Marginal probability mass function
If X is a discrete multivariate random variable with probability mass function gX(x), then the marginal probability mass function gi(a) is the sum of the joint p.m.f. over all values of x with the ith coordinate equal to a: gi(a) = Sum[gX(x), x:xi=a] (pg. 16)
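A sketch of the marginalization formula gi(a) = Sum[gX(x), x:xi=a] for a hypothetical joint p.m.f. of X = (X1, X2). The last line also applies the factorization test from Definition 1.3.7. In this example, the joint p.m.f. is not the product of the marginals, so X1 and X2 are not independent.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint p.m.f. g_X(x) of X = (X1, X2)
gX = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
      (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}

def marginal(joint, i):
    """g_i(a): sum the joint p.m.f. over all x whose i-th coordinate equals a."""
    g = defaultdict(Fraction)
    for x, p in joint.items():
        g[x[i]] += p
    return dict(g)

g1, g2 = marginal(gX, 0), marginal(gX, 1)
independent = all(p == g1[x1] * g2[x2] for (x1, x2), p in gX.items())
print(g1, g2, independent)
```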
Definition 1.1.1: Mutually exclusive, collectively exhaustive
If a collection of events {Ei}i∈I is pairwise disjoint, meaning that Ei∩Ej = ∅ whenever i≠j, then we say the sets are mutually exclusive. If the union Union[Ei, i∈I] of all events is the entire sample space Ω, then we say that the events Ei are collectively exhaustive (pg. 5)
Remark 1.1.2
If the subsets Ei are all nonempty, then saying they are mutually exclusive and collectively exhaustive is another way of saying that they form a partition of Ω (pg. 5)
Proposition 1.2.2: Chain rule
If {Ei}i=1^n are events in a probability space (Ω, F, P) with P(E1, ..., En-1) > 0, then P(E1, ..., En) = Product[P(Ei|E1, ..., Ei-1), i=1, n] (pg. 10)
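A worked instance of the chain rule, assuming a standard 52-card deck and three draws without replacement. Let Ei be "the i-th card is an ace". The chain rule gives P(E1, E2, E3) = P(E1)P(E2|E1)P(E3|E1, E2), and the sketch compares that product with a direct count of 3-card hands.

```python
from fractions import Fraction
from math import comb

# Chain rule: P(E1) * P(E2 | E1) * P(E3 | E1, E2)
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count: choose 3 of the 4 aces among all 3-card hands
direct = Fraction(comb(4, 3), comb(52, 3))

assert chain == direct
print(chain)  # 1/5525
```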
Proposition 1.2.3: Law of Total Probability
If {Ei}i∈I is a countable collection of mutually exclusive and collectively exhaustive events in a probability space (Ω, F, P), then for any event G∈F, we have P(G) = Sum[P(G|Ei)P(Ei), i∈I]. Here we use the convention that P(G|Ei)P(Ei) = 0 whenever P(Ei) = 0, even though P(G|Ei) is undefined in that case. The law of total probability allows us to compute the probability of an event by first partitioning the sample space into several other events and then conditioning on those events (pg. 10)
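A two-urn illustration of the law of total probability. All numbers are hypothetical. The events "urn 1 was chosen" and "urn 2 was chosen" partition Ω, and G is the event that the drawn ball is red.

```python
from fractions import Fraction

P_urn = {1: Fraction(1, 3), 2: Fraction(2, 3)}        # P(E_i): a partition of Ω
P_red_given = {1: Fraction(3, 4), 2: Fraction(1, 5)}  # P(G | E_i)

# Law of total probability: P(G) = Sum[P(G | E_i) P(E_i), i]
P_red = sum(P_red_given[i] * P_urn[i] for i in P_urn)
print(P_red)
```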
Remark 1.1.4
It is common to write P(E, F) instead of P(E∩F) to indicate the probability that both E and F occur. (pg. 6)
Definition 1.3.1: Discrete Random Variable
Let (Ω, F, P) be a discrete probability space. Any function X: Ω→R^d is called a discrete random variable or a random variable on (Ω, F, P). It is common to denote random variables by capital letters. When d=1, we often call the random variable univariate. If d=2, we often call the random variable bivariate, and whenever d>1 we often call the random variable multivariate (pg. 15)
Theorem 1.2.4: Bayes' Rule
Let (Ω, F, P) be a probability space, and let E, G ∈ F with P(E) > 0 and P(G) > 0. We have P(E|G) = P(G|E)P(E)/P(G). Moreover, if {Ei}i=1^n is a collection of mutually exclusive and collectively exhaustive subsets of Ω with P(Ej) > 0 for each j, then for any choice of i we have P(Ei|G) = P(G|Ei)P(Ei)/Sum[P(G|Ej)P(Ej), j=1, n], where the denominator is P(G) expanded by the law of total probability. Bayes' rule allows us to compute the conditional probability P(A|B) using P(B|A) (pg. 11)
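A sketch of the partition form of Bayes' rule on a diagnostic-test example. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen only for illustration. The denominator is computed with the law of total probability, and the result shows how a small prior keeps the posterior small even with an accurate test.

```python
from fractions import Fraction

# Hypothetical: prevalence 1%, sensitivity 95%, false-positive rate 10%
prior = {"disease": Fraction(1, 100), "healthy": Fraction(99, 100)}        # P(E_i)
likelihood = {"disease": Fraction(95, 100), "healthy": Fraction(10, 100)}  # P(G | E_i)

def posterior(i):
    """P(E_i | G) with P(G) expanded by the law of total probability."""
    evidence = sum(likelihood[j] * prior[j] for j in prior)
    return likelihood[i] * prior[i] / evidence

print(posterior("disease"), float(posterior("disease")))
```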
Definition 1.2.8: What does it mean for a collection of events to be independent?
Let (Ω, F, P) be a probability space. A collection L = {Ei}i∈I of events is independent if for every finite subcollection {Eik}k=1^m of L, we have P(Ei1∩...∩Eim) = Product[P(Eik), k=1, m] (see 1.8, pg. 12)
Definition 1.2.1: Conditional Probability
Let A and B be events in a discrete probability space (Ω, F, P), and assume that P(B) > 0. The probability of A occurring given that B occurs, denoted P(A|B), is defined as P(A|B) = P(A∩B)/P(B). Alternatively, we say that the left side is the probability of A conditioned on B. If P(B) = 0, then P(A|B) is undefined. (pg. 10)
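A direct implementation of the definition on two fair dice, including the undefined case P(B)=0. The function name cond and the events A and B are hypothetical. Raising an error is just one way to model "undefined".

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # two fair dice

def P(event):
    return Fraction(len(event), len(omega))

def cond(A, B):
    """P(A | B) = P(A ∩ B) / P(B); raises an error when P(B) = 0."""
    if P(B) == 0:
        raise ValueError("P(B) = 0, so P(A | B) is undefined")
    return P(A & B) / P(B)

A = {w for w in omega if sum(w) == 8}    # total is 8
B = {w for w in omega if w[0] % 2 == 0}  # first die is even
print(cond(A, B), P(A))                  # conditioning changes 5/36 to 1/6
```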
Definition 1.3.10: Indicator Random Variable
Let E be any event in a probability space (Ω, F, P). The function IndE:Ω→{0, 1}, given by IndE(w) = 1 if w∈E, else 0, is called the indicator random variable of E. For any E⊂Ω, the indicator random variable IndE has expected value equal to the probability of E: E[IndE] = P(E). This is sometimes called the fundamental bridge between expectation and probability (pg. 17)
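A minimal check of the fundamental bridge E[IndE] = P(E) for a fair die. The helper ind and the event E are hypothetical.

```python
from fractions import Fraction

omega = range(1, 7)                          # a fair die
P_w = {w: Fraction(1, 6) for w in omega}     # point masses P(w)
E = {5, 6}                                   # "roll at least 5"

def ind(E):
    """Indicator random variable Ind_E: 1 on E and 0 elsewhere."""
    return lambda w: 1 if w in E else 0

expectation = sum(ind(E)(w) * P_w[w] for w in omega)   # E[Ind_E]
probability = sum(P_w[w] for w in E)                   # P(E)
assert expectation == probability == Fraction(1, 3)
```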
Definition 1.3.8: Expected value of a discrete random variable
Let X be a random variable on a discrete probability space. The expectation (or expected value) of X is given by E[X] = Sum[X(w)P(w), w∈Ω], provided the sum converges absolutely. If the sum does not converge absolutely, the expected value does not exist. It is immediate from the definition that E[X] = Sum[x P(X=x), x] = Sum[x gX(x), x], where the sums run over all values x in the image of X (pg. 16)
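A sketch showing that the two sums in the definition agree. It uses X = the total of two fair dice, a hypothetical example. The first sum runs over outcomes w∈Ω. The second first builds the p.m.f. gX(x) = P(X=x) and then sums over the image of X.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

omega = list(product(range(1, 7), repeat=2))  # two fair dice
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """The random variable: total of the two dice."""
    return w[0] + w[1]

# Sum over outcomes w in Ω
e_omega = sum(X(w) * P[w] for w in omega)

# Sum over the image of X, using g_X(x) = P(X = x)
g = Counter()
for w in omega:
    g[X(w)] += P[w]
e_pmf = sum(x * p for x, p in g.items())

assert e_omega == e_pmf == 7
```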
Definition 1.3.11: Variance of a random variable. Standard Deviation
Let X be a univariate random variable with E[X] = µ. The variance of X is the quantity Var(X) = E[(X-µ)²], provided the expectation is defined (absolutely convergent). The standard deviation is the square root of the variance. (pg. 18)
Definition 1.6.1: Probability Measure; Probability space (Continuous case)
Let Ω be a set, and let F⊂2^Ω be a collection containing Ω that is closed under complements and countable unions. A function P:F→[0, 1] is a probability measure on (Ω, F) if P(Ω)=1 and countable additivity (equation 1.1, pg. 6) holds. In this case, the triple (Ω, F, P) is called a probability space (pg. 29)
Bernoulli Distribution
Notation: X~Bernoulli(p). Support: {0, 1}. gX(x) = p^x(1-p)^(1-x), i.e., p if x=1 and 1-p if x=0. E[X] = p. Var(X) = p(1-p). Examples: a free throw (made or missed); getting heads on a coin flip (pg. 19)
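A check of the p.m.f. formula and of E[X] = p and Var(X) = p(1-p). The value p = 3/10 is hypothetical. The exponent form p^x(1-p)^(1-x) collapses to p at x=1 and to 1-p at x=0.

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """g_X(x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)

p = Fraction(3, 10)  # hypothetical success probability
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
assert mean == p and var == p * (1 - p)
```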
Proposition 1.6.3: The cdf Fx(x) of any univariate random variable X:Ω→R is nondecreasing and satisfies lim[Fx(x), {x→∞}] = 1 and lim[Fx(x), {x→-∞}] = 0
See exercise 1.37 for proof (pg. 30)
T/F: If Cov(X, Y) = 0, then X and Y are independent random variables
This is FALSE. See nota bene 1.3.15 (pg. 19)
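A standard counterexample, not necessarily the one in Nota Bene 1.3.15. Let X be uniform on {-1, 0, 1} and Y = X². Then Cov(X, Y) = E[X³] - E[X]E[X²] = 0, yet Y is a function of X. The joint p.m.f. does not factor at (0, 0), since 1/3 ≠ (1/3)(1/3).

```python
from fractions import Fraction

gX = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
joint = {(x, x * x): p for x, p in gX.items()}  # joint p.m.f. of (X, Y), Y = X^2

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
cov = EXY - EX * EY                              # Cov(X, Y) = E[XY] - E[X]E[Y]

gY = {0: Fraction(1, 3), 1: Fraction(2, 3)}      # marginal p.m.f. of Y
factorizes = joint[(0, 0)] == gX[0] * gY[0]      # 1/3 versus 1/9
print(cov, factorizes)  # 0 False
```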
Uniform distribution
This is the continuous version of equally likely outcomes. p.d.f.: fX(x) = 1/λ(A) if x∈A, and 0 otherwise, where λ(A) is the Lebesgue measure (length, area, or volume) of A. If A = [a, b]⊂R, then mean: (a+b)/2; variance: (b-a)²/12 (pg. 34)
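A numerical sketch confirming the mean (a+b)/2 and variance (b-a)²/12 for A = [a, b], where λ(A) = b - a. The interval [2, 5] is hypothetical. A simple midpoint rule does the integration.

```python
import math

def uniform_pdf(x, a, b):
    """f_X(x) = 1 / λ(A) on A = [a, b], with λ(A) = b - a; 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def integrate(func, lo, hi, n=100_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0  # hypothetical interval
mean = integrate(lambda x: x * uniform_pdf(x, a, b), a, b)
var = integrate(lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b)
print(mean, var)  # approximately (a+b)/2 and (b-a)**2/12
```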
Beta distribution
pdf: fX(x) = Gamma(a+b)/(Gamma(a)Gamma(b)) x^(a-1)(1-x)^(b-1) for x∈[0, 1], with a, b > 0. Mean: a/(a+b). Variance: ab/((a+b)²(a+b+1)). Given independent random variables A and B having gamma distributions with parameters (α, θ) and (β, θ) respectively, the random variable Z = A/(A+B) has beta distribution Beta(α, β) (pg. 37)
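A sketch that checks the mean and variance formulas by numerical integration and then checks the gamma-ratio construction by simulation. The shapes a=2, b=5, the common parameter θ=1.5, the seed, and the sample size are all hypothetical. Python's random.gammavariate(alpha, beta) takes a shape and a scale. The ratio result does not depend on the common scale, so the scale-versus-rate convention does not matter here.

```python
import math
import random

def beta_pdf(x, a, b):
    """Beta(a, b) density on [0, 1]."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x ** (a - 1) * (1 - x) ** (b - 1)

def integrate(func, lo, hi, n=100_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0  # hypothetical shape parameters
mean = integrate(lambda x: x * beta_pdf(x, a, b), 0, 1)
var = integrate(lambda x: (x - mean) ** 2 * beta_pdf(x, a, b), 0, 1)

# Gamma-ratio construction: A ~ Gamma(a, θ), B ~ Gamma(b, θ) independent
rng = random.Random(0)
theta = 1.5
zs = []
for _ in range(100_000):
    A, B = rng.gammavariate(a, theta), rng.gammavariate(b, theta)
    zs.append(A / (A + B))
sample_mean = sum(zs) / len(zs)
print(mean, var, sample_mean)  # mean and var near a/(a+b), ab/((a+b)^2(a+b+1))
```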
