Probability: Midterm 2
Tail sum formula for expectation of a counting variable
For a random variable X with possible values {0, 1, ..., n}: E[X] = ∑(j = 1 to n) P(X ≥ j)
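A quick numerical sketch in Python (the distribution on {0, 1, 2, 3} is an arbitrary made-up example) showing that the tail sum agrees with the usual definition of E[X]:

    # Verify E[X] = sum_{j=1}^{n} P(X >= j) against E[X] = sum_x x*P(X = x)
    pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}   # hypothetical distribution
    n = max(pmf)

    direct = sum(x * p for x, p in pmf.items())
    tail = sum(sum(p for x, p in pmf.items() if x >= j) for j in range(1, n + 1))

    print(direct, tail)   # both ~1.9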
Which is typically better: Markov's inequality or Chebyshev's inequality?
Generally speaking, Chebyshev's inequality will produce tighter bounds than Markov's inequality, since it uses the additional information of the standard deviation.
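A sketch comparing the two bounds at one illustrative point (the mean, SD, and cutoff are made-up values; the Chebyshev step uses P(X ≥ a) ≤ P(|X - µ| ≥ a - µ), which requires a > µ):

    # Bound P(X >= a) for a nonnegative X with assumed mean 10 and SD 2
    mu, sigma, a = 10.0, 2.0, 20.0

    markov = mu / a                       # Markov: E[X] / a
    chebyshev = sigma**2 / (a - mu)**2    # Chebyshev: Var(X) / (a - mu)^2

    print(markov, chebyshev)   # 0.5 vs 0.04 -- Chebyshev is much tighter here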
Expectation of a function of a discrete random variable
If X is a discrete random variable and g(X) is a function of X, then E[g(X)] = ∑ₓ g(x)P(X = x)
Expectation of an indicator
If Iₐ is an indicator for event A, then E[Iₐ] = P(A)
Multiplication rule for expectation
If X and Y are independent random variables, then E[XY] = E[X]E[Y]
Linearity of variance
If X and Y are independent random variables, then Var(X + Y) = Var(X) + Var(Y)
Binomial random variable
If X follows a binomial distribution with probability p of success for any given trial, then the probability that there are k successes in n trials is given by: P(X = k) = C(n, k)⋅pᵏ(1-p)ⁿ⁻ᵏ
Population parameters of a binomial distribution
If X follows a binomial distribution with probability p of success, probability q = 1 - p of failure, and n total trials, then: E[X] = µ = np, Var(X) = σ² = npq, SD(X) = σ = √(npq)
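A minimal Monte Carlo check of these formulas (a Python sketch; n, p, and the repetition count are arbitrary choices):

    import random

    n, p, reps = 20, 0.3, 100_000
    samples = [sum(random.random() < p for _ in range(n)) for _ in range(reps)]

    mean = sum(samples) / reps
    var = sum((x - mean) ** 2 for x in samples) / reps

    print(mean, n * p)             # both near 6.0
    print(var, n * p * (1 - p))    # both near 4.2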
Population parameters of the Poisson distribution
If X is a Poisson random variable with parameter λ, then: E[X] = µ = λ, Var(X) = σ² = λ, SD(X) = σ = √(λ)
Expectation of a continuous random variable
If X is a continuous random variable with probability density function f(x), then E[X] = ∫(-∞ to ∞) x⋅f(x)dx
Variance of a continuous random variable
If X is a continuous random variable with mean µ and probability density function f(x), then Var(X) = ∫(-∞ to ∞) (x - µ)²⋅f(x)dx
Population parameters of a geometric distribution
If X is a geometric random variable with probability p of success and probability q = 1 - p of failure, then: E[X] = µ = 1/p, Var(X) = σ² = q / p², SD(X) = σ = √(q) / p
P(X > n) and P(X ≥ n) for a geometric random variable
If X is a geometric random variable with probability p of success, the chance that the first success happens after the nth trial is equal to the chance that the first n trials were all failures: P(X > n) = (1 - p)ⁿ, and P(X ≥ n) = (1 - p)ⁿ⁻¹
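A sketch confirming P(X > n) = (1 - p)ⁿ by summing the geometric pmf far enough out that the remainder is numerically negligible (p and n are illustrative values):

    p, n = 0.25, 4

    # P(X > n) = sum over k > n of P(X = k), with P(X = k) = (1-p)^(k-1) * p
    tail_from_pmf = sum((1 - p) ** (k - 1) * p for k in range(n + 1, 1000))
    closed_form = (1 - p) ** n

    print(tail_from_pmf, closed_form)   # both ~0.3164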
Population parameters of a negative binomial distribution
If X is a negative binomial random variable with probability p of success, probability q = 1 - p of failure, and r successes, then: E[X] = µ = r/p, Var(X) = σ² = rq / p², SD(X) = σ = √(rq) / p
Corollary to Markov's inequality
If X is a nonnegative random variable (X ≥ 0), then P(X < a) ≥ 1 - (E[X] / a) for every a > 0
Expectation of a linear function of a random variable
If X is a random variable and a, b are constants, then E[aX + b] = aE[X] + b; in other words, for any linear function g(x) = ax + b, E[g(X)] = g(E[X])
Normal random variable
A continuous random variable X is said to be normal if it has a probability density function of the form f(x) = 1/(σ√(2π))⋅e^(-0.5⋅((x - µ) / σ)²), where µ and σ are two scalar parameters characterizing the PDF, with σ assumed positive. µ specifies the mean of the normal distribution, and σ specifies the standard deviation of the normal distribution.
Probability density function
A function f(x) satisfying: 1) f(x) ≥ 0 2) f(x) is continuous 3) ∫(-∞ to ∞) f(x)dx = 1 (the total area under f(x) is 1) is called a probability density function if, for some random variable X, P(a ≤ X ≤ b) is given by ∫(a to b) f(x)dx
Relationship between geometric random variables and negative binomial random variables
A geometric random variable is just a negative binomial random variable with r = 1 (and the same success probability p).
Negative binomial experiment
A negative binomial experiment satisfies: 1) The experiment consists of a sequence of independent trials. 2) Each trial can result in either a success (S) or a failure (F). 3) The probability of success is constant from trial to trial, i.e., P(S on trial i) = p for i = 1, 2, 3, ... 4) The experiment continues (trials are performed) until a total of r successes have been observed, where r is a specified positive integer. 5) The random variable of interest is X = the number of trials required to achieve r successes (the number of successes is fixed and the number of trials is random)
Indicator random variable
A random variable Iₐ is an indicator random variable for event A if Iₐ = 1 when A occurs and Iₐ = 0 when A does not occur (when Aᶜ occurs)
Negative binomial random variable
A random variable X is a negative binomial RV if it represents the number of trials required to achieve r successes. If p is the probability of success for any trial, then the probability that it takes n trials to achieve r successes is given by: P(X = n) = C(n - 1, r - 1)⋅pʳ(1-p)ⁿ⁻ʳ
Continuous random variable
A random variable X is continuous if there is a function f(x) such that for any a ≤ b we have P(a ≤ X ≤ b) = ∫(a to b) f(x)dx
Geometric random variable
A random variable X is geometric if it represents the number of trials it takes to get the first success in a geometric setting (possible values are 1, 2, 3, ...). If p is the probability of success for any trial, then the probability that it takes n trials to achieve the first success is given by: P(X = n) = (1 - p)ⁿ⁻¹p
Uniform random variable
A random variable X is said to be uniformly distributed over the interval (α, β) if its probability density function is given by f(x) = 1 / (β - α) if α < x < β, and f(x) = 0 otherwise
Method of indicators
A random variable X that counts the number of events that occur among some collection of events A₁, ..., Aₙ can be represented as the sum of the indicators I₁, ..., Iₙ of these events. Then, E[X] = E[∑ⱼIⱼ] = ∑ⱼE[Iⱼ] = ∑ⱼP(Aⱼ)
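A classic illustration is the matching problem: X = the number of fixed points of a random permutation of n items. With Iⱼ the indicator that item j lands in position j, E[X] = ∑ⱼ P(Aⱼ) = n⋅(1/n) = 1. A small simulation sketch (n and the repetition count are arbitrary):

    import random

    n, reps = 10, 100_000
    total = 0
    for _ in range(reps):
        perm = list(range(n))
        random.shuffle(perm)
        total += sum(perm[j] == j for j in range(n))   # X for this permutation

    print(total / reps)   # close to the exact answer 1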
Poisson random variable
A random variable X that takes on one of the values 0, 1, 2, ... is said to be a Poisson random variable with parameter λ if, for some λ > 0: P(X = k) = e^(-λ)⋅λᵏ/k!. λ represents the average number of events that occur in a fixed interval (if approximating a binomial distribution, λ = np), and P(X = k) is the probability of exactly k events occurring in that interval.
Discrete random variable (informal definition)
A random variable that can take on at most a countable number of possible values
Random variable (informal definition)
A variable that assumes numerical values associated with the random outcome of an experiment, where one (and only one) numerical value is assigned to each sample point.
Relationship between binomial and normal distributions
As the number of observations n becomes larger, the binomial distribution gets close to a normal distribution with mean np and standard deviation √(npq).
Law of large numbers
As the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and the relative frequency approaches zero
Boole's inequality for counting random variables
If X is a counting random variable, then P(X ≥ 1) ≤ E[X]
Expectation of a discrete random variable
If X is a discrete random variable, the expectation of X, denoted E[X], is the mean of the distribution of X. E[X] = ∑ₓ x⋅P(X = x)
Markov's inequality
If X is a nonnegative random variable (X ≥ 0), then P(X ≥ a) ≤ E[X] / a for every a > 0
Population parameters of the normal distribution
If X is a normal random variable with parameters µ and σ, then: E[X] = µ Var(X) = σ² SD(X) = σ
Standard deviation of scaling/shifting
If X is a random variable and a, b are constants, then SD(aX + b) = |a|SD(X)
Variance of scaling/shifting
If X is a random variable and a, b are constants, then Var(aX + b) = a²Var(X)
Bound on one tail (Chebyshev's inequality)
If X is a random variable with expectation µ and standard deviation σ, then for all k > 0: P(X - µ ≥ k) ≤ P(|X - µ| ≥ k) ≤ σ² / k²
Chebyshev's inequality
If X is a random variable with expectation µ and standard deviation σ, then for all k > 0: P(|X - µ| ≥ k) ≤ σ² / k² = Var(X) / k². Equivalently, P(|X - µ| ≥ kσ) ≤ 1 / k². This inequality bounds both tails of X's distribution.
Variance
If X is a random variable, the variance of X, denoted Var(X) or σ², is the mean squared deviation of X from its expected value µ = E[X]: Var(X) = σ² = E[(X - µ)²] = ∑ₓ (x - µ)²P(X = x). Alternatively, Var(X) = E[X²] - E[X]² = E[X²] - µ² = ∑ₓ x²P(X = x) - µ²
Standard deviation
If X is a random variable, then the standard deviation of X, denoted SD(X), is the square root of the variance of X: SD(X) = σ = √(Var(X))
Population parameters of the uniform distribution
If X is a uniform random variable over the interval (α, β), then: E[X] = µ = (β + α) / 2, Var(X) = σ² = (β - α)² / 12, SD(X) = σ = (β - α) / √(12)
Standardization of a normal random variable
If X is normally distributed with mean µ and SD σ, then Z = (X - µ) / σ has the standard normal distribution
Rule of three sigmas for a normal random variable
If X is normally distributed, then P(µ - 3σ < X < µ + 3σ) ≈ 0.997, which is practically 1
Discrete random variable (formal definition)
If Ω is an outcome space, then a discrete random variable is a function X : Ω → ℝ that takes on a discrete (countable) set of values, assigning one numerical value X(ω) to each outcome ω ∈ Ω.
Weak law of large numbers
Let X₁, ..., Xₙ be a sequence of independent and identically distributed random variables, each having finite mean E[Xᵢ] = μ. Then, for any ε > 0, P(|(X₁ + ... + Xₙ)/n - µ| ≥ ε) → 0 as n → ∞
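A simulation sketch of this statement using fair-die rolls (µ = 3.5; ε, the sample sizes, and the repetition count are arbitrary): the fraction of sample means that land at least ε away from µ shrinks as n grows.

    import random

    eps, reps = 0.25, 2_000
    for n in (10, 100, 1000):
        misses = 0
        for _ in range(reps):
            mean = sum(random.randint(1, 6) for _ in range(n)) / n
            if abs(mean - 3.5) >= eps:
                misses += 1
        print(n, misses / reps)   # the fraction decreases toward 0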
P(X = a) if X is a continuous random variable
P(X = a) = ∫(a to a) f(x)dx = 0
Geometric setting
The goal is to repeat a chance behavior until a success occurs. A setting is geometric if: B (binary?): The possible outcomes of each trial are binary (success/failure) I (independent?): Trials are independent T (trials?): The goal is to count the number of trials until the first success occurs S (success?): On each trial, the probability of success is the same
Standard normal distribution
The standard normal distribution is the normal distribution with mean 0 and standard deviation 1.
Infinite series expansion for e^a
e^a = ∑(k = 0 to ∞) a^k / k!
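A quick check that the partial sums converge (a = 2 is an arbitrary choice):

    from math import exp, factorial

    a = 2.0
    partial = sum(a ** k / factorial(k) for k in range(15))
    print(partial, exp(a))   # both ~7.389056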
Binomial experiment
A binomial experiment is an experiment that satisfies: 1) Each trial results in one of two mutually exclusive outcomes (success/failure). 2) There are a fixed number of trials. 3) Outcomes of different trials are independent. 4) The probability that a trial results in success is the same for all trials.
Examples of non-binomial experiments
1) Deal 10 cards from a shuffled deck and count the number of red cards (not a binomial experiment since the probability does not remain constant for each trial) 2) Two parents with genes for O and A blood types have children, and you count the number of children with blood type O (not a binomial experiment since there is no fixed number of kids/trials) 3) You roll a die 10 times and note the number the die lands on (not a binomial experiment since there are more than 2 possible outcomes)
Properties of the normal distribution
1) The mean, median, and mode are equal. 2) The normal curve is bell-shaped and symmetric about the mean. 3) The total area under the curve is equal to one. 4) The normal curve approaches, but never touches the x-axis as it extends further and further away from the mean. 5) The inflection points are μ − σ and μ + σ (concave down between these points, and concave up outside of this range)
Differences between binomial and geometric distributions
1) There is NOT a fixed number of trials in geometric distributions! 2) Binomial random variables start at 0 while geometric random variables start at 1 3) Binomial distributions have finitely many possible values, while geometric distributions have infinitely many.
Examples of binomial experiments
1) Toss a coin 10 times and count the number of heads. 2) You randomly select a card from a deck of cards, and note if the card is an Ace. You then put the card back and repeat this process 8 times.
Calculating normal distribution probabilities using the standard normal cumulative distribution function
If X is normally distributed, then: P(a ≤ X ≤ b) = Φ((b - µ) / σ) - Φ((a - µ) / σ), where Φ is the standard normal cumulative distribution function, which gives the cumulative area to the left of a specified point for the standard normal distribution. Values of Φ are typically given in a table.
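A sketch of this computation in Python (the mean, SD, and interval are made-up values; Φ is taken from the standard library's statistics.NormalDist):

    from statistics import NormalDist

    mu, sigma, a, b = 100.0, 15.0, 85.0, 115.0
    Phi = NormalDist().cdf   # standard normal cumulative distribution function

    prob = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
    print(prob)   # ~0.6827, the usual "within one SD of the mean" probability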
Expected number of events that occur
If X is the number of events that occur among some collection of events A₁, ..., Aₙ, then E[X] = P(A₁) + ... + P(Aₙ)
Expectation/variance/standard deviation of the average of independent random variables
If X₁, ..., Xₙ are independent random variables, each with the same expectation µ and variance σ², and if we denote the average of these variables as X̄ = (X₁ + ... + Xₙ) / n, then: E[X̄] = µ, Var(X̄) = σ² / n, SD(X̄) = σ / √(n)
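A simulation sketch using averages of n fair-die rolls, where a single roll has µ = 3.5 and σ = √(35/12) ≈ 1.708 (n and the repetition count are arbitrary):

    import random
    from math import sqrt

    n, reps = 25, 50_000
    means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]

    grand = sum(means) / reps
    sd = sqrt(sum((m - grand) ** 2 for m in means) / reps)

    print(grand)                          # near 3.5
    print(sd, sqrt(35 / 12) / sqrt(n))    # both near 0.342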
Sums of independent Poisson random variables
If X₁, ..., Xⱼ are independent Poisson random variables with parameters λ₁, ..., λⱼ, then X₁ + ... + Xⱼ is a Poisson random variable with parameter λ₁ + ... + λⱼ
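A numerical sketch: convolving two Poisson pmfs and comparing against the Poisson pmf with the summed parameter (the λ values and range of k are illustrative):

    from math import exp, factorial

    def pois_pmf(k, lam):
        return exp(-lam) * lam ** k / factorial(k)

    lam1, lam2 = 2.0, 3.5
    for k in range(5):
        conv = sum(pois_pmf(i, lam1) * pois_pmf(k - i, lam2) for i in range(k + 1))
        print(k, conv, pois_pmf(k, lam1 + lam2))   # the two values agree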
Expectation of a constant
If c is a constant, then E[c] = c. A related scaling property: E[cX] = cE[X] for any random variable X.
Variance of a constant
If c is a constant, then Var(c) = 0
When to use the method of indicators
Prerequisite: X must count the number of events that occur among some collection of events A₁, ..., Aₙ. 1) The probabilities P(X = x) are known, but given by a formula that makes the expression for E[X] hard to simplify. 2) The nature of the dependence between the events is either unknown, or known but so complicated that it is difficult to obtain a formula for P(X = x).
Standard normal cumulative distribution function symmetry
Since the standard normal distribution is symmetric about 0, its cumulative distribution function satisfies: Φ(-z) = 1 - Φ(z)
When is the sum of indicators itself an indicator?
Suppose A₁, ..., Aₙ are events with indicators I₁, ..., Iₙ. Then ∑ⱼIⱼ is an indicator iff all events Aⱼ are mutually exclusive.
Boole's inequality
Suppose A₁, ..., Aₙ are events. The probability of the union of all events Aⱼ is less than or equal to the sum of the probabilities of all Aⱼ: P(A₁ ∪ ... ∪ Aₙ) ≤ P(A₁) + ... + P(Aₙ)
When can the Poisson distribution be utilized to approximate the binomial distribution?
When n is large and p is small. A good rule of thumb is that if np < 10, you can use the Poisson distribution as an approximation
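A sketch comparing the two pmfs for one large-n, small-p case (n and p are arbitrary illustrative values):

    from math import comb, exp, factorial

    n, p = 500, 0.01
    lam = n * p
    for k in range(6):
        binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
        pois = exp(-lam) * lam ** k / factorial(k)
        print(k, round(binom, 4), round(pois, 4))   # the two columns are close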
Linearity of expectation
For any two random variables X and Y, E[X + Y] = E[X] + E[Y] no matter whether X and Y are independent or not
Effect of varying µ for a normal distribution
µ controls the location of the mean, so the "hump" will shift to be centered at µ.
Effect of varying σ for a normal distribution
σ controls the spread of the distribution, since it is exactly the standard deviation for the normal distribution.
Sum of infinite geometric series rule
∑(n = 0 to ∞) arⁿ = a / (1 - r), valid when |r| < 1