320 Midterm 2


Cardinality of a powerset of a finite set S (pg 194)

|2^S| = 2^|S|

What is expected value of rolling a fair die?

1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5 (class)
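
As a sanity check, the same arithmetic in a couple of lines of Python (the faces and the uniform 1/6 probability are the only inputs):

```python
# Expected value of a fair six-sided die: sum of outcome * probability.
faces = range(1, 7)
expected_value = sum(x * (1 / 6) for x in faces)
print(expected_value)  # 3.5
```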

Gamma distribution

A continuous distribution that describes the waiting time until a>0 events have occurred in a homogeneous Poisson process of rate b>0. Type: Continuous Notation: X~Gamma(a, b) Parameters: a>0, b>0, shape and rate respectively Support: (0, ∞) p.d.f.: see picture c.d.f.: see picture (not in book) Expectation: a/b Variance: a/b² Special cases: when a=1, the gamma distribution is usually called the exponential distribution; the chi-squared distribution is also a special case of the gamma distribution (pg. 235)
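
A quick numerical sketch (a left Riemann sum on an arbitrarily chosen grid, not the book's method) checking the expectation formula E[X] = a/b for the shape/rate parameterization:

```python
import math

# Gamma(a, b) p.d.f. with shape a and rate b.
def gamma_pdf(x, a, b):
    return b ** a * x ** (a - 1) * math.exp(-b * x) / math.gamma(a)

a, b = 3.0, 2.0  # arbitrary shape and rate
dx = 0.001
# Integrate x * pdf(x) over (0, 40]; the tail beyond 40 is negligible here.
mean = sum(i * dx * gamma_pdf(i * dx, a, b) * dx for i in range(1, 40_000))
print(round(mean, 3))  # close to a/b = 1.5
```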

What is an event E?

An event E is a subset of the sample space Ω. It can also be thought of as any element of the powerset of Ω

The uniform distribution

Continuous analogue of equally likely outcomes. Type: Continuous Notation: X~Uniform([a, b]) Parameters: a, b (lower and upper bound) support: [a, b] p.d.f.: fX(x) = indicator([a, b])/(b-a) c.d.f.: FX(x) = 0 if x ≤ a, (x-a)/(b-a) if x∈[a, b], 1 if x > b (just think of the area under the p.d.f) Expectation: (a+b)/2 Variance: (b-a)²/12 Example: The angle (divided by 2pi) after spinning a spinner.
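A minimal sketch of these formulas in Python, with arbitrary endpoints a = 2 and b = 10:

```python
# Uniform([a, b]): piecewise c.d.f. plus the closed-form mean and variance.
a, b = 2.0, 10.0

def uniform_cdf(x):
    # Area under the flat density 1/(b - a) up to x.
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

mean = (a + b) / 2            # 6.0
variance = (b - a) ** 2 / 12  # 64/12 ≈ 5.333
print(uniform_cdf(6.0))  # 0.5
```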

Markov's inequality is used to prove Chebyshev's inequality, which is used to prove the weak law of large numbers

GO OVER FOR TEST

Proposition 5.4.23: The Variance of a linear combination of two random variables X and Y

If X and Y are random variables and α,β∈R are constants, then Var(αX + βY) = α²Var(X) + 2αβ(E[XY] - E[X]E[Y]) + β²Var(Y). Note that the middle term is 2αβCov(X, Y). If X and Y are independent, then the variance behaves like the square of a norm: Var(αX + βY) = α²Var(X) + β²Var(Y) PRACTICE FOR EXAM: see exercise 5.23 for the proof (pg. 220)
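
The identity can be checked exactly on a small hand-made joint p.m.f. (the values and probabilities below are arbitrary, and X and Y are deliberately dependent):

```python
# Verify Var(aX + bY) = a^2 Var(X) + 2ab Cov(X, Y) + b^2 Var(Y)
# on a tiny joint p.m.f. over (x, y) pairs.
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def E(f):
    # Expectation of f(X, Y) under the joint p.m.f.
    return sum(f(x, y) * p for (x, y), p in joint.items())

a, b = 2.0, -3.0
ex, ey = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
cov = E(lambda x, y: x * y) - ex * ey
lhs = E(lambda x, y: (a * x + b * y) ** 2) - E(lambda x, y: a * x + b * y) ** 2
rhs = a * a * var_x + 2 * a * b * cov + b * b * var_y
print(abs(lhs - rhs) < 1e-12)  # True
```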

What is the expected value of a random variable X with a Bernoulli distribution?

If X~Bernoulli(p), then E[X] = 1*p + 0*(1-p) = p (pg. 221)

Remark 5.7.17: What is the covariance of any univariate random variable Y with itself?

It's variance (pg. 242)

Definition 5.4.1: What is a discrete random variable?

Let (Ω, f, P) be a discrete probability space. Any function X: Ω→R is called a discrete random variable or a random variable on (Ω, f, P). (pg 214)

Definition 5.4.9: What is the expectation of a random variable X?

Let X be a random variable on a discrete probability space. The expectation of X is given by E[X] = ∑_(w∈Ω) X(w)P(w), provided this sum converges absolutely. If the sum does not converge absolutely, the expected value does not exist. We can also write E[X] = ∑_(i) i·P(X=i) = ∑_(i) i·gX(i) where the sums run over all values i in the image of X (pg. 216)

Remark 5.1.8: What is P(E, F)?

P(E, F) = P(E∩F) is the probability that event E and event F occur (pg. 195)

Explain why pairwise independence isn't strong enough for a collection of events to be independent

See example 5.3.7

Proposition 5.5.5: Give the expected value and variance of a random variable X with a binomial distribution

A binomially distributed variable X has the same p.m.f. as ∑_(i=1)^n X_i where each X_i ~Bernoulli(p). By linearity of expectation, E[X] = ∑_(i=1)^n E[X_i] = np, and since the X_i are independent, Var(X) = ∑_(i=1)^n Var(X_i) = np(1-p) (pg. 224)
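
Both formulas can also be verified directly from the binomial p.m.f. (n and p below are arbitrary choices):

```python
import math

# E[X] = np and Var(X) = np(1-p), computed straight from the p.m.f.
n, p = 10, 0.3
pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k * k * pk for k, pk in enumerate(pmf)) - mean ** 2
print(round(mean, 6), round(var, 6))  # 3.0 and 2.1
```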

What is the maximum likelihood estimate (MLE) of a parameter θ

The point θ_hat that maximizes the likelihood of θ (pg. 258)

Name four important continuous distributions

Uniform, normal, Gamma, and Beta (pg. 233)

Theorem 5.4.20: Another way to compute variance

Var(X) = E[X²]-E[X]² = E[X²]-µ² (pg. 219)
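
Checked on a fair die, the definition and the shortcut give the same value:

```python
# Var(X) = E[X^2] - E[X]^2, verified against the definition E[(X - mu)^2].
outcomes = range(1, 7)
p = 1 / 6
mu = sum(x * p for x in outcomes)                       # 3.5
var_def = sum((x - mu) ** 2 * p for x in outcomes)      # definition
var_short = sum(x * x * p for x in outcomes) - mu ** 2  # shortcut
print(round(var_def, 6), round(var_short, 6))  # both 35/12 ≈ 2.916667
```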

What is f?

f = 2^Ω, the set of all events of Ω (pg. 194)

Proposition 5.6.3: The cumulative distribution function of any random variable X is nondecreasing, approaches 1 as x approaches infinity, and approaches 0 as x approaches negative infinity

(pg. 231)

Law of Unconscious Statistician for a continuous random variable

E[h(X)] = Integrate[h(x) fX(x), {x, -∞, ∞}] (pg. 233)

Write the following results as used in a continuous setting The expectation of a random variable Linearity of expectation Law of Unconscious statistician Expectation of a product of independent random variables Variance of a random variable Variance of a linear combination of two independent random variables

All take the same form as in the discrete case, with sums over the p.m.f. replaced by integrals against the p.d.f.: E[X] = Integrate[x fX(x), {x, -∞, ∞}]; linearity E[αX + βY] = αE[X] + βE[Y]; E[h(X)] = Integrate[h(x) fX(x), {x, -∞, ∞}]; E[XY] = E[X]E[Y] for independent X and Y; Var(X) = E[X²] - E[X]²; and Var(αX + βY) = α²Var(X) + β²Var(Y) for independent X and Y (pg. 233)

What is a sample of a distribution?

A collection X1, ..., Xn of i.i.d. random variables, each having the given distribution (pg. 256)

Definition 5.1.11: Equally likely outcomes

Assume (Ω, F, P) is a finite probability space. We say that all outcomes of Ω are equally likely if P(w) = 1/|Ω| for every w∈Ω. An example is rolling a fair die (every number has a 1/6 chance of being landed on). In the continuous case we talk about intervals as being equally likely rather than distinct events (like a bus arriving between noon and 1 rather than a bus arriving at exactly noon). Recall P({w}) is written P(w) (pg. 196)

Definition 6.1.3: biased vs. unbiased estimators

The bias of an estimator θ_hat of a parameter θ is given by bias(θ_hat) = E[θ_hat] - θ. If bias(θ_hat) is zero, then the estimator θ_hat is unbiased; otherwise, it is biased (pg. 257)

Theorem 6.3.1: The Central Limit Theorem

The central limit theorem says that for i.i.d. random variables X1, X2, ..., Xn, with mean µ and variance σ², the sample mean is approximately normal with mean µ and variance σ²/n. This is both surprising and powerful because it holds no matter what the distribution of each Xi is. Let X1, ..., Xn be i.i.d. random variables each with mean µ and variance σ². Define the random variables Sn = X1 + ... + Xn and Yn = (Sn - nµ)/(√n σ). Then the c.d.f. FYn(y) converges pointwise to the c.d.f. of the standard (mean 0, variance 1) normal distribution. That is, P(Yn ≤ y) → 1/√(2π) Integrate[Exp[-x²/2], {x, -∞, y}] as n→∞ (pg. 269)
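
A small simulation sketch (standardizing sums of Uniform([0, 1]) draws; the sample size, trial count, and seed are arbitrary choices): the standardized sums Yn should have roughly mean 0 and variance 1.

```python
import random

# Standardize sums of n uniforms: Yn = (Sn - n*mu) / (sqrt(n) * sigma).
random.seed(0)
n, trials = 50, 10_000
mu, sigma = 0.5, (1 / 12) ** 0.5  # mean and std of Uniform([0, 1])
ys = []
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    ys.append((s - n * mu) / (n ** 0.5 * sigma))
m = sum(ys) / trials
v = sum((y - m) ** 2 for y in ys) / trials
print(abs(m) < 0.05, abs(v - 1) < 0.1)  # True True
```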

Prior distribution vs. posterior distribution

(In Bayesian Statistics) The initial distribution of the unknown parameter is called the prior distribution. After we use Bayes' rule to incorporate new data, the updated distribution is called the posterior distribution In equations (6.16) and (6.17) P(θ) is the prior distribution and P(θ|x) is the posterior distribution taking into account the observation x (pg. 275)

What is the maximum a posteriori estimate (MAP)?

(In Bayesian Statistics) The mode of (or the point that maximizes) the posterior distribution P(θ|x), if it exists and is unique, is called the maximum a posteriori estimate (MAP) (pg. 281)

Proposition 5.1.9: Given a discrete probability space (Ω, f, P), E, F∈f and E^c = Ω\E, (i) What is P(E^c)? (ii) What does E⊂F imply? (iii) What is P(E∪F)?

(i) P(E^c) = 1 - P(E) (ii) E⊂F implies P(E) ≤ P(F) (iii) P(E∪F) = P(E) + P(F) - P(E∩F) (pg. 195)

Fundamental bridge between expectation and probability

E[1_E] = P(E): the expectation of the indicator random variable of an event E equals the probability of E (pg. 264)

What is a Bernoulli distribution?

A random variable X has a Bernoulli distribution if its support is equal to {0, 1}. The Bernoulli distribution typically represents the result of a Bernoulli trial, where the result is always exactly one of two categories. Notation: X~Bernoulli(p) parameter: p support: {0, 1} p.m.f.: p^x(1-p)^(1-x) Expectation: p Variance: p(1-p) Example: Flipping a coin; shooting a free throw (pg. 221)

Definition 5.6.4: What does it mean for a random variable to have a continuous distribution? What is the probability density function of a random variable X?

A random variable X has a continuous distribution if its cumulative distribution function is a continuously differentiable function of x, when restricted to the range of X. The derivative of the cumulative distribution function is called the probability density function (p.d.f.) (pg. 231)

What is a discrete probability space?

A triple (Ω, f, P). We say that the event E∈f occurs with probability P(E). (pg. 195)

The normal distribution

Among the most important of all distributions due to its appearance in the central limit theorem Type: Continuous Notation: X~N(µ, σ²) Parameters: µ, σ² (mean and variance) Support: (-∞, ∞) p.d.f.: fX(x) = 1/(σ√(2π)) exp(-(x-µ)²/(2σ²)) c.d.f.: FX(x) = 1/(σ√(2π)) Integrate[exp(-(t-µ)²/(2σ²)), {t, -∞, x}] (integral of p.d.f. from -∞ to x) Expectation: µ Variance: σ² Example: The heights of one gender in a human population are approximately normally distributed. (pg. 234)

What are estimators? What is an estimate?

An estimator is a statistic that is used to estimate some quantity or parameter. An estimate is the result we get when we replace each random variable X_i in an estimator by the given data x_i. An estimator is a random variable, while an estimate is an evaluation (realization) of that random variable. Estimators, in this book, are denoted with a hat. (pg. 256)

Definition 5.1.2: What is the power set of a set S?

The set of all subsets of S. It's often denoted by the symbol 2^S (pg. 194)

Corollary 6.3.2: How are the sample mean and sample sum approximately distributed according to the central limit theorem?

µn ~ N(µ, σ²/n) Sn ~ N(nµ, nσ²) To prove this, start with P(µn ≤ z) and transform it to look like P(Yn ≤ "Stuff"), then apply the CLT. After a change of variable, we see µn is approximately distributed as N(µ, σ²/n). A similar calculation shows Sn is approximately distributed as N(nµ, nσ²) (pg. 269)

Definition: What is a sample space Ω of an experiment?

Ω is a set consisting of all possible outcomes of an experiment. E.g.: If a die is rolled twice then Ω = {(1,1), (1, 2), (1, 3), ... , (6, 6)}. The event that a double is rolled is E = {(1, 1), (2, 2), ..., (6, 6)} Class (pg. 193)

Definition 5.3.6: What does it mean for a collection of events to be independent?

A collection of events A is independent if for every finite subcollection B of A we have that the probability of all events in B occurring is the product of their individual probabilities. (pg. 207)

Definition 5.5.2: Indicator Random Variable

A function (random variable) mapping from the sample space to the set {0, 1} that we associate with any event E in a probability space. It maps w to 1 if w is in E and maps w to 0 otherwise. The indicator random variable is Bernoulli distributed (pg. 222)

Definition 5.1.6: What is a discrete probability measure?

A function P mapping from the set of all events to [0, 1] that satisfies the conditions (i) P(Ω) = 1 (ii) If we have a collection of mutually exclusive events, then the probability of their union is the sum of their individual probabilities. The triple (Ω, f, P) is called a discrete probability space (pg. 194-195)

Definition 5.7.1: What is a multivariate random variable? What is a bivariate random variable? What is a univariate random variable

A function X:Ω→R^n on a probability space (Ω, F, P) with X = (X1, ..., Xn) is a multivariate random variable if every coordinate function Xi: Ω→R is a random variable. If n = 2, we call the random variable bivariate, and when n = 1, we call the random variable univariate (pg. 238)

Definition 5.6.1: What is a continuous random variable? What is its corresponding cumulative distribution function (c.d.f)?

A function X:Ω→R on a probability space (Ω, f, P) is a random variable if X⁻¹((-∞, x]) ∈ f for every x ∈ R. That is, we require the preimage of (−∞, x] to be an element of f for all x. This is a generalization of the definition given in the discrete section. Essentially every function that we encounter in applications will satisfy these conditions and have a well-defined c.d.f (pg. 230)

Definition 5.7.13: What is the covariance of two univariate variate random variables X and Y? What is the covariance matrix of a multivariate random variable Z: Ω→ R^n? What is the covariance of any univariate random variable Y with itself?

Cov(X, Y) = E[(X-µx)(Y-µy)] ∑ = E[(Z-µz)(Z-µz)^T] (nxn matrix) where Z = [Z1, ... , Zn]^T and µz = [µ1, ... , µn]^T and the expected value is taken elementwise over ∑ The covariance of any univariate random variable Y with itself is its variance: Cov(Y, Y) = E[(Y-µy)(Y-µy)] = E[(Y-µy)²] = Var(Y) (pg. 241)

pmf vs. pdf vs. cdf

Discrete p.m.f.: P(X=a) = g_X(a) Continuous c.d.f.: P(X≤a) = F_X(a) Continuous p.d.f.: P(X≤a) = Integrate[f_X(u), {u, -∞, a}] The c.d.f. and p.d.f. are related by the fundamental theorem of calculus: fX(x) = d(FX(x))/dx e.g. E[X] = ∑_w X(w)P(w) = ∑_i i·g_X(i)

Proposition 5.4.13: What is the expectation value of a constant α∈R?

E[α]=α (pg. 217)

Covariance matrix is symmetrical and has real eigenvalues

The covariance matrix is symmetric because Cov(Xi, Xj) = Cov(Xj, Xi), and by the spectral theorem every real symmetric matrix has real eigenvalues

Definition 6.1.7: What is the likelihood of a parameter θ?

The likelihood of θ, given observed data x, is the probability (or probability density) of observing x, viewed as a function of the parameter: L(θ) = P(x | θ) (pg. 258)

Theorem 5.4.14: Expectation is linear

For any constants α,β ∈ R and any two random variables X and Y on the same probability space Ω, we have E[αX + βY] = αE[X]+βE[Y] This follows since the summation operator is linear (pg. 217)

Proposition 5.7.16: Property of the covariance

For any multivariate random variable X and for any i≠j the covariance satisfies Cov(Xi, Xj) = E[XiXj]-E[Xi]E[Xj] (pg. 242)

Lemma 6.2.1: Given X and Y are random variables on a probability space with X(w) ≤ Y(w) for all w, then how do the expected values of X and Y compare?

Given X and Y are random variables on a probability space (Ω, F, P) with X(w) ≤ Y(w) for all w∈Ω, then E[X]≤E[Y] (pg. 263)

Explain the birthday problem

Given a group of k people, what is the probability that two or more people share the same birthday? See example 5.1.14 (pg. 197)
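
The standard computation (the complement of the all-birthdays-distinct probability, assuming 365 equally likely days and ignoring leap years):

```python
# P(at least two of k people share a birthday)
# = 1 - P(all k birthdays are distinct).
def birthday_collision_prob(k):
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

print(round(birthday_collision_prob(23), 4))  # ≈ 0.5073
```

With just 23 people the probability already passes one half, which is the surprising part of the example.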

What is a statistic?

Given a sequence X1, ..., Xn of i.i.d random variables having the same distribution as X, a statistic is any function T(X1, ..., Xn) and is itself a random variable (pg. 256)

What is Maximum Likelihood Estimation?

Given a set of observations, we pick the parameter (or parameters) that is most likely to have generated those observations. For example, if we flipped a coin with an unknown weight and found that it landed on heads 6/10 times, we'd estimate that p = 0.6 because that is the parameter that would most likely generate the observations (pg. 258)
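
A sketch of this example: maximizing the binomial likelihood over a grid of candidate values of p recovers the estimate p_hat = 0.6 (the grid resolution is an arbitrary choice):

```python
import math

# MLE for a coin's weight after observing 6 heads in 10 flips.
heads, flips = 6, 10

def likelihood(p):
    # Binomial likelihood of the observed data as a function of p.
    return math.comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)

grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # 0.6
```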

Proposition 5.7.11: What is the expectation of any multivariate random variable in terms of its components?

Given any multivariate random variable X = (X1, ..., Xn) the expected value satisfies E[X]=(E[X1], ..., E[Xn]) (pg. 241)

Definition 6.1.1: The sample mean estimator and the biased sample variance estimator

Given the i.i.d. random variables X1, ..., Xn, the (unbiased) sample mean estimator is the average of all the Xi: µ_hat = (1/n)∑Xi (6.1), while the biased sample variance estimator is S² = (1/n)∑(Xi - µ_hat)² (6.2) (pg 256)

Definition 5.7.5: What is the marginal probability mass function? What is the marginal probability density function?

If X is a discrete multivariate random variable with probability mass function gX(x), then the marginal probability mass function gi is the sum of the joint p.m.f. over all values of x with the ith coordinate equal to a: gi(a) = ∑_{x:xi=a} gX(x) Similarly, if X is a continuous multivariate random variable with probability density function fX, then for each i∈{1, ..., n} the marginal probability density function fi is the integral of the joint p.d.f. fX(t1, ..., ti-1, a, ti+1, ..., tn) over all values with the ith coordinate equal to a: fi(a) = see picture (pg. 239)

Theorem 5.4.15: The law of the Unconscious Statistician. How to easily find the expected values of function compositions

If X is a discrete random variable and h: R→R is any function, then h∘X (the composition) is a discrete random variable, and the expected value of h(X) is given by E[h(X)] = ∑_(i)h(i)P(X=i) = ∑_(i)h(i)gX(i) where the sum runs over all values i in the image of X. See exercise 5.20 for proof (Study proof for test) (pg. 218)

Theorem 6.2.2: Markov's Inequality

If X is a nonnegative random variable, then for any a>0, P(X≥a) ≤ E[X]/a This is used to prove Chebyshev's Inequality. It is proven by constructing the composition 1_{X≥a} = 1_[a, ∞)∘X (1 if X ≥ a, 0 if X < a), observing that 1_{X≥a} ≤ X/a, then taking the expectation of both sides. E[1_{X≥a}] = P(X≥a) by the fundamental bridge between expectation and probability, and E[X/a] = E[X]/a by linearity of expectation. (pg. 263)
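
The inequality can be checked on a small nonnegative p.m.f. (chosen arbitrarily) for several thresholds a:

```python
# Check Markov's inequality P(X >= a) <= E[X]/a on a small p.m.f.
pmf = {0: 0.3, 1: 0.4, 2: 0.2, 5: 0.1}  # arbitrary nonnegative support
ex = sum(x * p for x, p in pmf.items())  # E[X] = 1.3

for a in (0.5, 1, 2, 4):
    tail = sum(p for x, p in pmf.items() if x >= a)
    assert tail <= ex / a + 1e-12  # Markov's bound holds
    print(a, round(tail, 3), round(ex / a, 3))
```

Note the bound is usually loose: for a = 0.5 it is 2.6, far above the actual tail probability 0.7.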

Corollary 6.2.5: Chebyshev's Inequality

Let X be any random variable with mean µ and variance σ². Then for all ε>0, P(|X - µ| ≥ ε) ≤ σ²/ε² We prove this using Markov's Inequality. Observe that |X-µ| is a nonnegative random variable. Thus P(|X - µ| ≥ ε) = P((X-µ)² ≥ ε²) ≤ E[(X-µ)²]/ε² = σ²/ε² This is then used to prove the weak law of large numbers. Go over proof for test (pg. 264)

What is the variance of a random variable X with a Bernoulli distribution?

If X~Bernoulli(p), then Var(X) = E[X²] - (E[X])² = E[X²] - p² = 1²*p + 0²*(1-p) - p² = p(1-p)

Definition 5.1.4: When are events mutually exclusive? When are events collectively exhaustive?

If a collection of events {E_i}_{i∈I} is pairwise-disjoint, meaning that E_i∩E_j = ∅ whenever i≠j, then we say the sets are mutually exclusive. If the union of the collection of events is equal to the sample space Ω, then we say that the events are collectively exhaustive (pg. 194)

Theorem 5.2.13: Bayes' Rule

Let (Ω, f, P) be a probability space, and let E, F ∈ f, with P(E)>0 and P(F)>0. Then P(E|F) = (P(F|E)P(E))/P(F) Use this to find conditional probabilities (pg. 205)

Definition 5.4.6: What is the event "X = a"? What is the probability mass function (p.f.m.) of a random variable X?

Let X be a random variable on a discrete probability space. Given a∈R, define the event "X=a" to be the set X⁻¹(a) = {w∈Ω | X(w) = a} Hence P(X=a) = P(X⁻¹(a)) The probability mass function (p.m.f.) of X is the function gX: R→[0,1] given by gX(a) = P(X=a) (pg. 215)

Definition 5.7.9: What is the expected value of a discrete multivariate random variable? What is the expected value of a continuous multivariate random variable?

The expectation is taken coordinatewise: E[X] = (E[X1], ..., E[Xn]), where each coordinate is computed with the usual sum (discrete case) or integral (continuous case). Notice that linearity of expectation follows immediately from linearity of summation and integration (pg. 240)

Definition 5.2.1: Given a probability space (Ω, f, and P) and events E, F ∈ F, what is P(E|F), the probability of E conditioned on F?

P(E|F) = P(E∩F)/P(F) One way to think about conditional probability: when the experiment is repeated a large number of times, P(E|F) is the fraction of the repetitions in which F occurs where E also occurs (pg. 200)

What's the big idea behind Bayesian Statistics?

Rather than making a single POINT estimate for an unknown parameter (as in Maximum likelihood estimation) Bayesian statistics is about estimating DISTRIBUTIONS for an unknown parameter or parameters. New data is incorporated into a model using Bayes' rule, and gives an improved estimate of the distribution of the parameter. (pg. 275)

Theorem 6.2.8: Weak Law of Large Numbers

The PROBABILITY that the sample mean is far from the true mean approaches zero as our sample size increases. Let X1, ..., Xn be a sequence of i.i.d random variables, each having mean µ and variance σ². For each positive integer n let µn be the sample mean estimator 1/n(X1+...+Xn). Then for all ε>0, P(|µn-µ|≥ε)→0 as n→∞. More specifically, for every ε>0, P(|µn-µ|≥ε) ≤ σ²/(nε²) This is proven using Chebyshev's inequality. First show that E[µn] = µ and Var(µn) = σ²/n. Then Chebyshev's inequality gives P(|µn-µ|≥ε) ≤ σ²/(nε²) which gives the result of the limit as n→∞ (pg. 265)
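
A simulation sketch of the statement (Uniform([0, 1]) draws, so µ = 0.5 and σ² = 1/12; the choices of n, ε, trial count, and seed are arbitrary): the empirical miss rate shrinks as n grows and stays below Chebyshev's bound σ²/(nε²).

```python
import random

# Fraction of runs where the sample mean misses mu = 0.5 by at least eps.
random.seed(1)
eps, trials = 0.05, 2000
sigma2 = 1 / 12

rates = {}
for n in (100, 1000):
    misses = sum(
        1 for _ in range(trials)
        if abs(sum(random.random() for _ in range(n)) / n - 0.5) >= eps
    )
    rates[n] = misses / trials
    print(n, rates[n], sigma2 / (n * eps ** 2))
```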

What is a Poisson distribution?

The Poisson distribution is used to describe the number X of occurrences of an event in a given interval of time or space, where the interval is made up of many small sub-intervals in which the probability of an occurrence is low, and the occurrence of an event in a given sub-interval is essentially independent of the occurrence of any event in any other sub-interval. For example, it is often used to describe situations like the number of radioactive particles that hit a detector in a second, the number of automobiles that arrive at an intersection in a minute, or the number of customers that come into a store in an hour. Notation: X~Poisson(λ) Support: {0, 1, 2, ...} (natural numbers) parameters: λ (the average rate of occurrence in a given amount of time or space) p.m.f.: e^(-λ)λ^x/(x!) Expectation: λ Variance: λ Example: The number of customers to call in the next 10 minutes given an average of 1 per minute. The number of chocolate chips in a cookie (space rather than time) (pg. 225)
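
The expectation and variance can be checked numerically from the p.m.f. (λ = 4 is an arbitrary choice; the sum is truncated where the tail is negligible):

```python
import math

# Check E[X] = Var(X) = lambda directly from the Poisson p.m.f.
lam = 4.0
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(100)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2
print(round(mean, 6), round(var, 6))  # both ≈ 4.0
```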

What is the cumulative distribution function?

The cumulative distribution function (c.d.f.) of a continuous random variable X gives the probability P(X≤x) that X will be no greater than a given amount x. It's represented as F_X in the book. Its derivative f_X = dF_X/dx is called the probability density function. (pg. 230)

Definition 5.6.6: The Expectation of a continuous random variable

The expectation of a continuous random variable X is the integral from negative infinity to positive infinity of x times the probability density function: E[X] = Integrate[x fX(x), {x, -∞, ∞}] (pg. 233)

Proposition 5.4.18: If X and Y are independent random variables on a discrete probability space (Ω, f, P), then what is the expectation of their product?

The expectation of the product of two independent variables is the product of their expectations: E[XY] = E[X]E[Y] (pg. 218)

Give an example of a binomially distributed random variable

The number of free throws made out of a specified number of attempts. In this case n is the number of shots taken and p is the probability a shot is made (pg. 223)

Definition 5.4.3: Given f: A→B and S⊂B, what is the preimage of S?

The preimage of S is the set f⁻¹(S) = {a ∈ A | f(a) ∈ S} For a single element b∈B, we often abuse notation and write f⁻¹(b) when we mean f⁻¹({b}) Note that the notation makes sense even if f has no inverse. (pg. 214)

What is the support of a distribution for a discrete random variable X?

The values of x for which the probability mass function (p.m.f) is nonzero. It is a subset of the range of the random variable X (pg. 221). The p.m.f is always 0 for values of x that do not lie in the support

Definition 5.7.3: What is the joint probability mass function? What is the joint cumulative distribution function? What is a joint probability density function?

These are generalizations for the p.m.f., p.d.f. and c.d.f. for the multivariate case (pg. 239)

Proposition 6.5.6: For any given probability density function f(x, θ), if θ is known to lie in the interval [a, b] and the prior distribution P(θ) is uniform on [a, b], then what can be said of the maximum a posteriori estimate and the maximum likelihood estimate?

They are the same See exercise 6.20 for the proof (pg. 281)

Proposition 5.2.6: The chain rule

This gives a way to write the probability of the intersection of several events in terms of conditional probabilities: P(E1∩E2∩...∩En) = P(E1)P(E2|E1)···P(En|E1∩...∩En-1) This is proven using the definition of P(E|F) and induction (pg. 201)

Multinomial distribution

This is a generalization of the binomial distribution that counts the outcomes of a sequence of n repeated i.i.d. trials of an experiment where exactly one of k outcomes occurs, and for each j, the jth outcome has probability pj, with ∑pj = 1 over all j. The ith coordinate Xi represents the number of experiments that had result i. The range of a random variable with multinomial distribution consists of k-tuples of nonnegative integers (x1, x2, ..., xk) that sum to n. p.m.f.: C(n, (x1, x2, ..., xk)) p1^x1 p2^x2 ··· pk^xk The coordinate Xi is binomially distributed with probability of success pi, so its marginal p.m.f. is C(n, xi) pi^xi (1-pi)^(n-xi). The expected value of X is E[X] = (E[X1], ..., E[Xk]) = (np1, ..., npk) (pg. 244)

Multivariate Normal Distribution

This is a generalization of the single-variable normal distribution. Parameters: µ = (µ1, ..., µn) ∈ R^n ∑ (nxn positive definite covariance matrix) p.d.f.: fX(x1, ..., xn) = det(2π∑)^(-1/2) exp(-(x-µ)^T ∑⁻¹(x-µ)/2) (pg. 245) E[X] = µ Covariance matrix of X is ∑

5.2.10: Law of total probability

This stems from the additive property of probability, combined with the definition of conditional probability. This allows us to condition on a partition in order to compute the probability of an event. If {E_i}i∈I is a countable collection of mutually exclusive and collectively exhaustive events in a probability space (Ω, f, P), then for any event F ∈ f, we have P(F) is the sum over all i∈I of P(F|E_i)P(E_i) (pg. 204)
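A worked sketch on the two-dice sample space, conditioning on the parity of the first die (the events chosen here are my own illustration, not the book's example):

```python
from fractions import Fraction

# Sample space: two rolls of a fair die; exact probabilities via Fractions.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond(A, B):  # P(A | B) = P(A and B) / P(B)
    return P(lambda w: A(w) and B(w)) / P(B)

F = lambda w: (w[0] + w[1]) % 2 == 0  # the sum is even
E1 = lambda w: w[0] % 2 == 0          # first die even
E2 = lambda w: w[0] % 2 == 1          # first die odd (E1, E2 partition Ω)

# Law of total probability: P(F) = P(F|E1)P(E1) + P(F|E2)P(E2).
total = cond(F, E1) * P(E1) + cond(F, E2) * P(E2)
print(P(F), total)  # both 1/2
```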

T/F: If the subsets E_i are all nonempty, then saying that they are mutually exclusive and collectively exhaustive is another way of saying that they form a partition of Ω

True. See remark 5.1.5 (pg. 194)

Definition 5.4.17: What does it mean for two discrete random variables X and Y to be independent?

Two discrete random variables X and Y are independent if the events X=a and Y=b are independent for all a,b∈R, that is P((X=a)∩(Y=b)) = P(X=a)P(Y=b) (pg. 218)

Definition 5.3.1: What does it mean for two events E and F to be independent?

Two events E, F in a probability space are independent if P(E∩F) = P(E)P(F) Informally, we say two events are independent if knowing the outcome of one event gives no information about the probability of the other. Said more carefully, events E and F are independent if P(E|F) = P(E) and P(F|E) = P(F) We use this when drawing independent random variables from a distribution (pg. 206)

Beta Distribution

Type: Continuous Notation: X~Beta(a, b) parameters: a>0, b>0 (both called shape) Support: [0, 1] p.d.f.: see picture c.d.f.: not given in book Expectation: a/(a+b) Variance: ab/((a+b)²(a+b+1)) Example: If A~Gamma(α, θ) and B~Gamma(β, θ), then the random variable Z = A/(A+B) has distribution Beta(α, β). Alternatively, a draw from a beta distribution Beta(k, n+1-k) comes from drawing n numbers from the uniform distribution on [0, 1], ordering them, and taking the kth smallest number (pg. 235)

Definition 5.4.19: Given X, a discrete random variable with E[X] = µ, what is the variance of X?

Var(X) = E[(X-µ)²] provided this expectation is defined (absolutely convergent). The standard deviation is the square root of the variance. (pg. 219)

Proposition 5.4.22: How does scaling and shifting a random variable affect its variance?

Var(αX + β) = α²Var(X) Scaling a random variable by α scales its variance by α². Shifting a random variable by β does not affect its variance. In other words, variances are invariant under horizontal shifts. (pg. 220)

Proposition 6.1.6: An unbiased estimator of the variance

We divide by n - 1 instead of n. See example 6.1.5 (pg. 257)
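
For a sample of size n = 2 from a fair die, the expectations of both estimators can be computed exactly by enumerating all 36 equally likely samples; only the n - 1 version matches the true σ² (the die and n = 2 are my own illustration):

```python
from itertools import product

# Exact expectations of the biased (divide by n) and unbiased (divide by
# n-1) sample variance estimators over every equally likely sample.
outcomes, n = range(1, 7), 2
mu = sum(outcomes) / 6                              # true mean 3.5
sigma2 = sum((x - mu) ** 2 for x in outcomes) / 6   # true variance 35/12

biased_avg = unbiased_avg = 0.0
samples = list(product(outcomes, repeat=n))
for s in samples:
    m = sum(s) / n
    ss = sum((x - m) ** 2 for x in s)
    biased_avg += (ss / n) / len(samples)
    unbiased_avg += (ss / (n - 1)) / len(samples)

print(round(unbiased_avg, 6), round(sigma2, 6))  # equal: 35/12 ≈ 2.916667
print(round(biased_avg, 6))  # only (n-1)/n = 1/2 of sigma^2
```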

Proposition 5.3.5: If E and F are independent events, then E^c and F are also independent

We have P(E∩F) = P(E)P(F) Thus, P(E^c∩F) = P(F) - P(E∩F) = P(F) - P(F)P(E) = P(F)(1-P(E)) = P(F)P(E^c) Thus E^c and F are independent (pg. 207)

What is a binomial distribution?

We say a random variable X has a binomial distribution with parameters n and p if the support is {0, 1, 2, 3, ..., n} and the p.m.f of X is g_X(x) = C(n, x)p^x(1-p)^(n-x) In this case we write X~Binomial(n, p) The sum of n independent Bernoulli random variables X1, ..., Xn, all with parameter p is a binomially distributed random variable with parameters n and p. Notation: X~Binomial(n, p) Support: {0, 1, ..., n} parameters: n, p p.m.f: C(n, x)p^x (1-p)^(n-x) Expectation: np Variance: np(1-p) Example: Number of free throws made out of 10 attempts. (pg. 222)

Conjugacy

When the prior and posterior distributions are of the same type. E.g. the beta distribution is conjugate to the Bernoulli likelihood (pg. 282)

What is a Discrete Distribution?

When we talk about a discrete random variable having a particular distribution, we mean it has a particular probability mass function (p.m.f.) (pg. 221)

Give an example of a random variable X that is Bernoulli distributed.

Whether Kevin Durant makes a free throw. See Example 5.5.1 (pg. 222)

