Probability
Properties of conditional expectation
1. E[g(Y)X|Y] = g(Y)E[X|Y] 2. If h is an invertible function, then E[X|Y]=E[X|h(Y)].
Properties of correlation coefficient
1. A linear transformation of a random variable, Y = aX + b with a ≠ 0, does not change the absolute value of the correlation coefficient; the sign flips if a is negative.
Properties of covariance
1. cov(X, Y) = E[XY] − E[X]E[Y] 2. cov(X, X) = var(X) = σ²(X) 3. cov(X, Y) = cov(Y, X) 4. cov(aX, bY) = ab·cov(X, Y) 5. cov(X + a, Y + b) = cov(X, Y) 6. cov(aX + bY, cW + dV) = ac·cov(X, W) + ad·cov(X, V) + bc·cov(Y, W) + bd·cov(Y, V)
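A minimal numeric check of property 1, cov(X, Y) = E[XY] − E[X]E[Y] (Python sketch; the small joint PMF below is made up purely for illustration):

```python
# Hypothetical joint PMF over pairs (x, y); values are made up for illustration.
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 2): 0.4}

E_X  = sum(p * x     for (x, y), p in pmf.items())
E_Y  = sum(p * y     for (x, y), p in pmf.items())
E_XY = sum(p * x * y for (x, y), p in pmf.items())

# Definition: cov(X, Y) = E[(X - E[X])(Y - E[Y])]
cov_def = sum(p * (x - E_X) * (y - E_Y) for (x, y), p in pmf.items())

print(cov_def, E_XY - E_X * E_Y)  # both ≈ 0.17 (up to floating-point rounding)
```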
properties of exponential distribution
1. memoryless
Properties of CDF
1. non-decreasing 2. right continuous 3. lim F(x) = 0 as x→−∞ 4. lim F(x) = 1 as x→∞ Every function with these four properties is a CDF.
Axioms of Probability
1. nonnegativity axiom: P(A) ≥ 0. 2. normalization axiom: P(Ω) = 1. 3. additivity axiom: Given mutually exclusive events A₁, A₂, A₃, ... that is, where Ai ∩ Aj = Ø, for i ≠ j, A. the probability of a finite union of the events is the sum of the probabilities of the individual events, P(A₁∪A₂∪⋯∪Ak) = P(A₁) + P(A₂) + ⋯ + P(Ak) B. the probability of a countably infinite union of the events is the sum of the probabilities of the individual events, P(A₁∪A₂∪⋯) = P(A₁) + P(A₂) + ⋯
mutual independence
A finite set of events is mutually independent if and only if every event is independent of every intersection of the other events, i.e. for every subset {Ai} of the events, P(∩Ai) = ΠP(Ai)
pairwise independence
A finite set of events {Ai} is pairwise independent if and only if every pair of events is independent, i.e. for all distinct pairs of indices m, k P(Am∩Ak) = P(Am)P(Ak)
probability mass function (PMF)
A function that gives the probability that a discrete random variable is exactly equal to some value. Suppose that X: S → A (A ⊆ R) is a discrete random variable defined on a sample space S. Then the probability mass function fX: A → [0, 1] for X is defined as fX(x) = Pr(X=x) = Pr({s ∈ S : X(s)=x})
Probability density function (PDF)
A function whose integral over any given range gives the probability that the random variable falls within that range, as opposed to the probability of taking any one exact value (which is zero for a continuous random variable).
Probability distribution
A mathematical function that provides the probability of occurrence of different possible outcomes in an experiment. Probability distributions are generally divided into two classes: discrete probability distributions and continuous probability distributions.
Discrete probability distribution
A probability distribution characterized by a probability mass function. Examples: Poisson distribution, Bernoulli distribution, binomial distribution, geometric distribution, negative binomial distribution, discrete uniform distribution.
Continuous probability distribution
A probability distribution that has a cumulative distribution function that is continuous. Most often they are generated by having a probability density function.
Probability Models
A probability model is a mathematical representation of a random phenomenon. It is defined by its sample space, events within the sample space, and probabilities associated with each event.
Binomial experiment
A random experiment which consists of a fixed number n of statistically independent Bernoulli trials, each with a probability of success p, and counts the number of successes.
Bernoulli trial
A random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted. Named after Jacob Bernoulli. Given any probability space, for any event (set of outcomes), one can define a Bernoulli trial, corresponding to whether the event occurred or not (event or complementary event).
random variable
A random variable X : Ω→R is a function from the set of possible outcomes of the sample space Ω to the real numbers R. It describes some numerical property that outcomes in Ω may have, such as the height of a random person. The probability that X takes value ≤ 3 is the probability of the set of outcomes {ω ∈ Ω : X(ω) ≤ 3}, denoted P(X ≤ 3). The real number X(ω) associated to a sample point ω in Ω is called a realization of the random variable. The set of all possible realizations is called the support.
Exponential random variable
A random variable X describing the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. Denoted by X ∼ exp(λ). The continuous analogue of the geometric random variable.
Memorylessness
A random variable X is memoryless if for all numbers a and b in its range, we have P(X > a + b | X > b) = P(X > a). If the range of X is [0, ∞), then X must be exponential. Similarly, if the range of X is {0, 1, 2, ...}, then X must be geometric. It usually refers to the cases when the distribution of a "waiting time" until a certain event does not depend on how much time has elapsed already.
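A rough simulation check of this property for the exponential distribution (Python sketch; λ, a, b, and the sample size are arbitrary choices):

```python
import random

# X ~ Exp(lam): P(X > a + b | X > b) should match P(X > a) = e^(-lam * a).
lam, a, b, n = 0.5, 1.0, 2.0, 500_000
xs = [random.expovariate(lam) for _ in range(n)]

p_uncond = sum(x > a for x in xs) / n
survivors = [x for x in xs if x > b]                 # condition on X > b
p_cond = sum(x > a + b for x in survivors) / len(survivors)

print(f"P(X > a) ≈ {p_uncond:.3f}   P(X > a + b | X > b) ≈ {p_cond:.3f}")  # both ≈ 0.607
```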
Indicator random variable
A random variable associated with the occurrence of an event. It has a value of 1 if the event occurs and 0 otherwise. Useful in translating a manipulation of events to a manipulation of random variables.
Binomial random variable
A random variable that counts the number of successes in a binomial experiment; denoted by X ∼ B(n, p). It is said to have a binomial distribution.
discrete uniform random variable
A random variable with a discrete uniform distribution, i.e. each of the n values in its range, say x1, x2, ..., xn, has equal probability: ƒ(xi) = 1/n, where ƒ(x) is the probability mass function. Used to model cases where we have a range of possible values and complete ignorance: no reason to believe that one value is more likely than another.
Expected Value (expectation)
Also known as the expectation, average, mean value, mean, or first moment. The expected value of a random variable is the integral of the random variable with respect to its probability measure, or the probability-weighted average of all possible values. Intuitively, it is the long-run average value of repetitions of the experiment it represents.
Geometric random variable
Consider an experiment which consists of repeating independent Bernoulli trials until a success is obtained. Assume that probability of success in each independent trial is p. The geometric random variable, denoted by X ∼ geo(p), counts the number of attempts needed to obtain the first success.
Independent random variables
Consider n discrete random variables X₁, X₂, X₃,...,Xn. We say that X₁, X₂, X₃,...,Xn are independent if P(X₁ = x1, X₂ = x2, ... Xn = xn) = P(X₁ = x1)P(X₂ = x2)...P(Xn = xn), i.e. the joint probability equals the product of the marginal probabilities. For continuous random variables, the joint probability density function equals the product of the marginal pdfs. Intuitively, two random variables X and Y are independent if knowing the value of one of them does not change the probabilities for the other one; thus the conditional probability is the same as the unconditional probability, P(Y=y | X=x) = P(Y=y), for all x, y.
Covariance
Cov(X, Y) = E[(X - E[X])(Y - E[Y])] Covariance measures whether two random variables deviate from their means in a coordinated way. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, i.e., the variables tend to show similar behavior, the covariance is positive. Sometimes denoted by σ_XY, in analogy to the variance σ².
Law of iterated expectations
E[E[X|Y]] = E[X]
Expectation of independent random variables
E[X*Y] = E[X] * E[Y], and more generally E[g(X)*h(Y)] = E[g(X)] * E[h(Y)].
Expected value of discrete random variable
E[X] = ∑xi * p(xi)
Expected value of continuous random variable
E[X] = ∫x*f(x)dx
Linearity of expectations
E[a*X + b*Y + c] = a*E[X] + b*E[Y] + c
Boole's inequality (union bound)
For any finite or countable set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events. For a countable set of events A₁, A₂, A₃, ..., P(A₁∪A₂∪A₃∪...) ≤ ∑P(Ai)
Joint probability distribution
Given at least two random variables X, Y, ..., that are defined on a probability space, the joint probability distribution for X, Y, ... is a probability distribution that gives the probability that each of X, Y, ... falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The joint probability distribution can be expressed either in terms of a joint cumulative distribution function or in terms of a joint probability density function (in the case of continuous variables) or joint probability mass function (in the case of discrete variables).
Coupon collector's problem
Given n coupons, how many coupons do you expect you need to draw with replacement before having drawn each coupon at least once? E(T) = n/n + n/(n-1) + ... + n/1 = n(1 + 1/2 + 1/3 +... + 1/n) = n*Hn where Hn is the n-th harmonic number.
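A quick simulation of the coupon collector expectation against the n·Hn formula (Python sketch; n and the number of trials are arbitrary):

```python
import random

def coupon_collector_draws(n):
    """Uniform draws with replacement until all n coupon types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, trials = 10, 20_000
avg = sum(coupon_collector_draws(n) for _ in range(trials)) / trials
formula = n * sum(1 / k for k in range(1, n + 1))   # n * H_n
print(f"simulated mean: {avg:.2f}   n*H_n: {formula:.2f}")   # both ≈ 29.29
```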
Find the PDF of a function of a random variable
Given random variable X with a PDF f_X, and Y = g(X), to find the PDF f_Y of Y: 1. Find the CDF of Y 2. Take the derivative of the CDF of Y, which gives the PDF of Y. If g is monotonically increasing, then 1. F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g⁻¹(y)) = F_X(g⁻¹(y)) 2. Take the derivative of the CDF F_Y(y): f_Y(y) = f_X(g⁻¹(y)) · |d(g⁻¹(y))/dy| (the absolute value makes the formula valid for monotonically decreasing g as well).
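A worked instance of this recipe, checked numerically (Python sketch; the choice X ~ Uniform(0, 1), Y = X² is just an example):

```python
import math
import random

# X ~ Uniform(0, 1), Y = X².  g(x) = x² is increasing on (0, 1) with g⁻¹(y) = sqrt(y), so
#   f_Y(y) = f_X(sqrt(y)) * |d sqrt(y)/dy| = 1 / (2 * sqrt(y))   for 0 < y < 1.
def f_Y(y):
    return 1.0 / (2.0 * math.sqrt(y))

# Monte Carlo check: fraction of samples in a narrow bin around y, divided by the
# bin width, should approximate the density f_Y(y).
samples = [random.random() ** 2 for _ in range(200_000)]
y, width = 0.25, 0.01
empirical = sum(y - width / 2 < s < y + width / 2 for s in samples) / len(samples) / width
print(f"empirical density near y=0.25: {empirical:.2f}   formula: {f_Y(y):.2f}")  # both ≈ 1.0
```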
Conditional probability distribution
Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value.
Law of total expectation
If A₁, A₂,...,An is a partition of the whole outcome space, then E[X] = ∑E[X | Ai]*P(Ai)
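A small worked example of the formula (Python sketch; the fair-die partition into even and odd outcomes is chosen just for illustration):

```python
# X = outcome of a fair six-sided die; partition: A1 = "even", A2 = "odd".
outcomes = [1, 2, 3, 4, 5, 6]
E_X = sum(outcomes) / 6                                  # 3.5

evens, odds = [2, 4, 6], [1, 3, 5]
E_given_even = sum(evens) / len(evens)                   # E[X | A1] = 4
E_given_odd  = sum(odds) / len(odds)                     # E[X | A2] = 3
total = E_given_even * 0.5 + E_given_odd * 0.5           # P(A1) = P(A2) = 1/2

print(E_X, total)   # both 3.5, as the law of total expectation predicts
```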
Law of total variance
If X and Y are random variables on the same probability space, and the variance of Y is finite, then Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) We can think of different values of X as dividing Y into different groups, Var(Y|X) would be the variability within a group, and E[Var(Y|X)] would be the average of the within-group variability. Var(E[Y|X]) would be the variability between groups.
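A simulation sketch of the decomposition (Python; the two-group mixture below is a made-up example with equally likely groups):

```python
import random
from statistics import mean, pvariance

# X picks one of two equally likely groups; Y is drawn from a group-specific normal.
def sample():
    x = random.choice([0, 1])            # group label
    y = random.gauss(10 * x, 1 + x)      # group means 0 and 10, stdevs 1 and 2
    return x, y

data = [sample() for _ in range(100_000)]
ys = [y for _, y in data]
by_group = {g: [y for x, y in data if x == g] for g in (0, 1)}

within = mean(pvariance(by_group[g]) for g in (0, 1))     # E[Var(Y|X)], groups equally likely
between = pvariance([mean(by_group[g]) for g in (0, 1)])  # Var(E[Y|X]) for a fair two-point X
print(f"Var(Y) ≈ {pvariance(ys):.1f}   within + between ≈ {within + between:.1f}")  # both ≈ 27.5
```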
Properties of normal random variable
If X∼N(µ, σ²), then aX + b ∼ N(aµ + b, a²σ²)
continuous random variable
If the image of a random variable X is uncountably infinite then X is called a continuous random variable. In the special case that it is absolutely continuous, its distribution can be described by a probability density function, which assigns probabilities to intervals; in particular, each individual point must necessarily have probability zero for an absolutely continuous random variable.
Bernoulli random variable
If the random variable X has the following distribution P(X = 1) = p, P(X = 0) = 1 − p for some 0 < p < 1, then X is called a Bernoulli random variable and we write X ∼ Ber(p). Used to model Bernoulli trials and as an indicator random variable.
Law of total probability
If {Bn : n=1,2,3,...} is a finite or countably infinite partition of a sample space and each event Bn is measurable, then for any event A of the same probability space: Pr(A) =∑Pr(A ∩ Bn) or, alternatively, Pr(A) = ∑Pr(A | Bn)Pr(Bn)
Maximum a posteriori estimation
In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity, that equals the mode of the posterior distribution.
Birthday problem
In a set of n randomly chosen people, what is the probability that some pair of them will have the same birthday? More generally: given n random integers drawn from a discrete uniform distribution with range [1, d], what is the probability p(n; d) that at least two of the numbers are the same?
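The exact collision probability follows by multiplying the "all distinct so far" factors (Python sketch):

```python
def birthday_collision_prob(n, d=365):
    """Probability that at least two of n uniform draws from {1, ..., d} coincide."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (d - k) / d     # k-th person avoids the k birthdays already taken
    return 1.0 - p_all_distinct

print(round(birthday_collision_prob(23), 4))   # ≈ 0.5073: 23 people already pass 50%
```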
Conditional probability density function
Let (X, Y) be a continuous bivariate random vector with joint pdf f(x, y) and marginal pdfs fX(x) and fY(y). For any x such that fX(x) > 0, the conditional pdf of Y given that X = x is the function of y denoted by f(y|x) and defined by f(y|x) = f(x, y)/fX(x)
Sum of independent normal random variables
Let X and Y be two independent normal random variables, X∼N(µ₁, σ₁²) and Y∼N(µ₂, σ₂²), then Z = X + Y ∼N(µ₁+µ₂, σ₁²+σ₂²)
Sum of independent random variables
Let X and Y be two independent random variables with density functions f_X(x) and f_Y(y). Then the sum Z = X + Y is a random variable with density function f_Z(z), where f_Z is the convolution of f_X and f_Y.
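A numerical sketch of the convolution formula f_Z(z) = ∫ f_X(x) f_Y(z − x) dx, using two Uniform(0, 1) densities (chosen only as an example; the exact answer is the triangular density on (0, 2)):

```python
def f_uniform(x):
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_Z(z, steps=10_000):
    """Riemann-sum approximation of the convolution of two Uniform(0, 1) densities."""
    dx = 2.0 / steps                      # integrate x over (0, 2), which covers the support
    return sum(f_uniform(x) * f_uniform(z - x) * dx
               for x in (i * dx for i in range(steps)))

for z in (0.5, 1.0, 1.5):
    print(z, round(f_Z(z), 3))            # ≈ 0.5, 1.0, 0.5: the triangular density z, 2 - z
```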
Marginal probability mass function
Let X₁, X₂, ..., X_K be K discrete random variables forming a K×1 random vector. Then, for each i = 1, ..., K, the probability mass function of the random variable X_i is called its marginal probability mass function.
Conditional probability
P(A|B) : the conditional probability of A given B. Formally, P(A|B) is defined as the probability of A under a new probability function on the sample space in which outcomes not in B have probability 0 and the remaining outcomes keep probabilities proportional to their original ones; equivalently, P(A|B) = P(A∩B)/P(B) when P(B) > 0.
Multiplication Rule
P(A∩B) = P(A)P(B|A) = P(B)P(A|B)
Poisson random variable
Suppose an event has a small probability of occurring and a large number of independent trials take place. Suppose further that you know the average number of occurrences µ over a period of time. Then the Poisson random variable, denoted X ~ Poi(µ), counts the total number of occurrences during a given time period.
Cumulative distribution function (CDF)
The cumulative distribution function of a real-valued random variable X is the function given by F(x) = P(X ≤ x) It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions and probability mass functions.
Sample Space
The sample space of an experiment or random trial is the set of all possible outcomes or results of that experiment. A sample space is usually denoted using set notation, and the possible outcomes are listed as elements in the set. It is common to refer to a sample space by the labels S, Ω, or U (for "universal set").
Variance
The variance of a random variable X is the expected value of the squared deviation from the mean of X, µ = E[X]: Var(X) = E[(X - µ)²] Often denoted as Var(X), σ², or s²
Independence of two events
Two events A and B are independent (written as A⊥B) if their joint probability equals the product of their probabilities: P(A∩B) = P(A)P(B) Intuitively, the occurrence of one does not affect the probability of occurrence of the other.
Conditional independence
Two events R and B are conditionally independent given Y if and only if, given knowledge that Y occurs, knowledge of whether R occurs provides no information on the likelihood of B occurring, and knowledge of whether B occurs provides no information on the likelihood of R occurring. P(R∩B|Y) = P(R|Y)P(B|Y) Independence neither implies nor is implied by conditional independence.
Properties of variance
1. Var(aX + b) = a²Var(X) 2. Var(X) = E[X²] − (E[X])² (mean of square minus square of mean) 3. Var(X + Y) = Var(X) + Var(Y) if X, Y are independent
discrete random variable
When the image (or range) of a random variable X is finite or countably infinite, X is called a discrete random variable, and its distribution can be described by a probability mass function which assigns a probability to each value in the image of X.
Conditional expectation
Denoted as E(X | Y), the conditional expectation of a random variable X is another random variable equal to the average of X over each possible "condition" (value of Y). In the case when the random variable is defined over a discrete probability space, the "conditions" are a partition of this probability space. E(X | Y) : ω → E(X | Y = Y(ω))
Correlation coefficient
A measure of the linear dependence (correlation) between two variables X and Y. It has a value between +1 and −1 inclusive, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
Properties of geometric random variable
1. memoryless
Variance of the sum of random variables
var(X₁ + X₂) = var(X₁) + var(X₂) + 2cov(X₁, X₂)
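A simulation check of this identity with deliberately correlated variables (Python sketch; the particular construction of X₂ from X₁ is arbitrary):

```python
import random
from statistics import mean, pvariance

# X2 reuses half of X1 plus fresh noise, so cov(X1, X2) = 0.5 by construction.
n = 200_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]

m1, m2 = mean(x1), mean(x2)
cov = mean((a - m1) * (b - m2) for a, b in zip(x1, x2))
lhs = pvariance([a + b for a, b in zip(x1, x2)])          # var(X1 + X2)
rhs = pvariance(x1) + pvariance(x2) + 2 * cov

print(f"var(X1+X2) ≈ {lhs:.2f}   var(X1)+var(X2)+2cov ≈ {rhs:.2f}")   # both ≈ 3.25
```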