Bayesian Midterm


Normalizing Constant Question

What's the overall chance of observing data X?

Heavier Prior means we have

a small sample size n and/or an informative prior

Normalizing constant pdf

f(x)

Chain structure 'iter'

# of iterations *The first half are always thrown out as "warm-up" samples (it takes time before the chain starts to produce values that mimic a random sample...that's why we always double the desired sample size)

Chain structure 'chains'

# parallel chains to run

'stan()' arguments

1) model structure: must specify the structure, tuning parameter, and observed data 2) chain structure
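
A minimal sketch of a stan() call for a Beta-Binomial model; the model code, prior tuning (Beta(2, 2)), and data (x = 8 successes in n = 10 trials) are illustrative assumptions, not from these cards:

```r
library(rstan)

# 1) Model structure: parameters, tuning (prior) parameters, and observed data
bb_model <- "
  data {
    int<lower = 0> n;                 // number of trials
    int<lower = 0, upper = n> x;      // observed number of successes
  }
  parameters {
    real<lower = 0, upper = 1> pi;    // success probability
  }
  model {
    pi ~ beta(2, 2);                  // prior (illustrative tuning)
    x ~ binomial(n, pi);              // likelihood
  }
"

# 2) Chain structure: 4 parallel chains, 10000 iterations each
#    (the first half of each chain is discarded as warm-up)
bb_sim <- stan(model_code = bb_model, data = list(x = 8, n = 10),
               chains = 4, iter = 10000, seed = 84735)
```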

Monte Carlo Simulation:

1) Simulate π from the prior [data.frame(pi = ...)] 2) Simulate x from π [mutate(x = rbinom(...))] 3) Filter on the observed value of x [filter(x == 8)]
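
A minimal dplyr sketch of these three steps, assuming a Beta(2, 2) prior, n = 10 trials, and an observed x = 8 (all illustrative choices):

```r
library(dplyr)

set.seed(84735)

# 1) Simulate 10,000 pi values from the prior
prior_sim <- data.frame(pi = rbeta(10000, 2, 2))

# 2) Simulate x from the likelihood, given each pi
sim <- prior_sim %>%
  mutate(x = rbinom(10000, size = 10, prob = pi))

# 3) Keep only the pairs that match the observed data (x = 8);
#    the remaining pi values approximate the posterior
posterior_sim <- sim %>%
  filter(x == 8)
```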

Heavier Data means we have

A large sample size n and/or a vague prior

Bayesian Analysis

Combines our prior information with the observed data to construct updated or posterior information about a r.v θ

MCMC limitation

Complicated to implement

Sequential Bayesian Analysis

Continue updating our posterior understanding as new data arrive: each new posterior becomes our new prior, etc.

Frequentist interpretation of Data

Data alone should drive our outgoing information

Bayesian interpretation of Data

Data should be weighed against our incoming information

Posterior mean

E(π|X = x) = ∫ π f(π|x) dπ
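
With simulation output, this is approximated by averaging the draws; a one-liner using the illustrative posterior_sim data frame from the Monte Carlo sketch above:

```r
# Posterior mean approximated by averaging the simulated posterior draws
mean(posterior_sim$pi)
```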

Mode

Highest point so most plausible π

Likelihood Function question

How compatible is r.v θ with data x?

Shape of curve means

How spread out our values are, and hence our confidence • Peak = most plausible value of π • Spread = how certain/confident we are *Skinny = pretty certain

Likelihood Interpretation

L(θ|x) evaluated at, e.g., (x = 0, θ = 0) reads: the probability that X = 0 if θ = 0 is ...

Frequentist interpretation of Questions Asked

If my hypothesis isn't correct, what is the chance that I'd have observed these data?

THE CONTINUOUS UNIFORM DISTRIBUTION

If a continuous RV X is uniformly distributed across the interval [a,b], then X ~ Unif(a, b)

Bayesian interpretation of Questions Asked

In light of data, what are the chances that hypothesis is correct?

Likelihood Function PDF

L(θ|x) := f(x|θ), where x | θ ~ ...(), the data x is known, and θ is unknown

Posterior f(θ|x) will be large if

L(θ|x) is large or f(θ) is large or both *θ either needs to be highly plausible beforehand or strongly supported by the data

Posterior f(θ|x) will be small if

L(θ|x) is small or f(θ) is small or both *Small prior chance of happening or inconsistent data

THE NORMAL DISTRIBUTION

Let X be an RV with a bell-shaped distribution centered at μ and with variance σ²

THE EXPONENTIAL DISTRIBUTION

Let X be the waiting time until a given event occurs

THE BINOMIAL (& BERNOULLI) DISTRIBUTION

Let discrete RV X be the number of successes in n trials where: the trials are independent; each trial has an equal probability p of success

THE GEOMETRIC DISTRIBUTION

Let discrete RV X be the number of trials until the 1st success where: the trials are independent; each trial has an equal probability p of success

THE POISSON DISTRIBUTION

Let discrete RV X∈{0,1,2,...} be the number of events in a given time period. The outcome of X depends upon parameter λ>0, the rate at which the events occur

x | π ~ ...()

Likelihood

π | x ~ ...()

Posterior *Upon observing X

Informative Prior

Posterior will be more influenced by prior *Typically tuned using expert information

Vague Prior

Posterior will resemble Data X/likelihood *Large Variance

π ~ ...()

Prior

Never accept proposal

The trace plot is a straight line, because the chain never moves

THE BETA DISTRIBUTION

The Beta distribution can be used to model a continuous RV X that's restricted to values in the interval [0,1]

THE GAMMA DISTRIBUTION

The Gamma distribution is often appropriate for modeling positive RVs X (X>0) with right skew. For example, let X be the waiting time until a given event occurs s times. The outcome of X depends upon both the shape parameter s>0 (the number of events we're waiting for) and the rate parameter r>0 (the rate at which the events occur)

Interpreting 95% CI

There's a 95% posterior probability that λ is in my interval [], given my data
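
A minimal sketch of computing such an interval from simulation output, reusing the illustrative posterior_sim draws from the Monte Carlo sketch above:

```r
# Middle 95% of the simulated posterior draws
quantile(posterior_sim$pi, probs = c(0.025, 0.975))
```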

Step 1 in Bayesian Analysis

Tune Prior Model

Frequentist...making inferences about pi?

Using only data x

Bayes...making inferences about pi?

Using posterior which combines our prior and data x *Entire pdf is the estimate for π

Prior question

What do we understand about θ before observing data x?

Posterior Question

What do we understand about θ now that we've observed data x?

THE DISCRETE UNIFORM DISTRIBUTION

discrete RV X is equally likely to be any value in the discrete set S

Density + Histogram

distribution of chain values

Posterior predictive model

f(x'|x) = ∫ f(x'|θ) f(θ|x) dθ (continuous θ) f(x'|x) = Σ f(x'|θ) f(θ|x) (discrete θ, Σ over all θ) *Weighs the chance of observing x' under each θ by the posterior plausibility of θ given the original data
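
A minimal sketch of simulating the posterior predictive model, again reusing the illustrative posterior_sim draws (Beta(2, 2) prior, n = 10, observed x = 8 are assumptions):

```r
# For each posterior draw of pi, simulate a new observation x'
# (assumes the earlier Monte Carlo sketch has been run)
predictive_sim <- posterior_sim %>%
  mutate(x_new = rbinom(n(), size = 10, prob = pi))

# Approximate the posterior predictive probabilities f(x'|x)
table(predictive_sim$x_new) / nrow(predictive_sim)
```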

LTP

f(x) = Σ f(x,θ) = Σ f(x|θ)f(θ) = Σ L(θ|x)f(θ), with Σ over all θ; for continuous θ, f(x) = ∫ f(x|θ)f(θ) dθ (integral version)
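
A small numeric illustration of the LTP with a discrete prior; the three candidate θ values and their prior probabilities are made up for the example:

```r
# Discrete prior over three candidate values of theta
theta <- c(0.2, 0.5, 0.8)
prior <- c(0.25, 0.50, 0.25)

# Likelihood of observing x = 8 successes in n = 10 trials under each theta
likelihood <- dbinom(8, size = 10, prob = theta)

# LTP: f(x) = sum over all theta of f(x|theta) * f(theta)
f_x <- sum(likelihood * prior)
f_x
```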

Independent Variables

f(x|y) = f(x) or f(x,y) = f(x)f(y)

Conditional Models

f(x|y) = f(x,y) / f(y) *As a pdf in x, it integrates (or sums) to 1

Prior pdf

f(θ)

Posterior PDF

f(θ|x) = f(θ)L(θ|x) / f(x) or f(θ|x) = f(θ)f(x|θ) / f(x)
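
Continuing the discrete illustration from the LTP card above, the posterior pdf follows directly:

```r
# Posterior = prior * likelihood / normalizing constant f(x)
posterior <- prior * likelihood / f_x
posterior    # sums to 1 across the three candidate theta values
```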

Frequentist...what is pi?

fixed, unknown quantity

Prior definition

incoming/prior information

Likelihood Definition

is a function of θ that measures the relative likelihood of the model parameter being θ given that we observed data X

Trace plot

longitudinal behavior of chains
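
A sketch of the standard diagnostic plots via the bayesplot package, assuming a fitted stan() object like the illustrative bb_sim above:

```r
library(bayesplot)

mcmc_trace(bb_sim, pars = "pi")   # longitudinal behavior of the chains
mcmc_dens(bb_sim, pars = "pi")    # density of the chain values
mcmc_hist(bb_sim, pars = "pi")    # histogram of the chain values
```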

Normalizing Constant Definition

measures the overall chance of observing X = x across all possible values of θ, taking into account the prior plausibility of each possible θ. Specifically by the Law of Total Probability

MAP

the mode, argmax_π f(π|x), so where the posterior is maximized

Bigger sample size means

more confident *More data means more mathematical influence • If n is sufficiently large, we will see convergence to a general consensus

Making Inferences...hypothesis testing

posterior assessment about a claim regarding π

Making Inferences...prediction

predicting new observations from the model

Conjugate Prior

produces a posterior model in the same family as the prior
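
For example (the standard Beta-Binomial pairing, stated here as an illustration): if π ~ Beta(α, β) and X|π ~ Bin(n, π), then π|X = x ~ Beta(α + x, β + n − x), which is another Beta model.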

Bayes...what is pi?

r.v we can model using a pdf

Making Inferences...interval estimation

range of posterior plausible values of π

Frequentist interpretation of probability

relative frequency of repeatable event

Bayesian interpretation of probability

relative plausibility of an event

Monte Carlo: What it does

simulating (x, pi) pairs from prior/likelihood

Making Inferences...point estimation

single posterior estimate of π

Why simulation?

Some models become too difficult to derive by hand, so simulation is needed

Variance

spread *Small sample size = wide plausible range

Mean

the average (typical) value of π

Monte Carlo

{θ^(1), ... , θ^(N)} is a random sample of size N from the posterior f(θ|x), where the θ^(i) are i.i.d. • Independent • Identically Distributed (drawn directly from the posterior f(θ|x))

Markov Chain Monte Carlo (MCMC)

{θ^(1), ... , θ^(N)} is not a random sample from the posterior f(θ|x) but can be designed to mimic one: 1) Chain values are dependent: θ^(i+1) is drawn from a model that depends on the current state θ^(i), g(θ^(i+1) | θ^(i)) 2) Chain values are not drawn from the posterior but converge to it, which provides a good approximation
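
A minimal sketch of one MCMC flavor, a Metropolis algorithm for the same illustrative Beta-Binomial setting (the proposal width and all numbers are assumptions):

```r
set.seed(84735)

n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 0.5                     # starting value

# Unnormalized posterior: Beta(2, 2) prior times Binomial likelihood (x = 8, n = 10)
post <- function(pi) dbeta(pi, 2, 2) * dbinom(8, size = 10, prob = pi)

for (i in 2:n_iter) {
  current <- chain[i - 1]
  # Proposal depends on the current state: g(theta_(i+1) | theta_i)
  proposal <- runif(1, min = current - 0.1, max = current + 0.1)

  # Accept with probability min(1, post(proposal) / post(current));
  # proposals outside (0, 1) have posterior 0 and are always rejected
  if (proposal > 0 && proposal < 1 &&
      runif(1) < min(1, post(proposal) / post(current))) {
    chain[i] <- proposal            # accept: move to the proposed value
  } else {
    chain[i] <- current             # reject: the chain stays put
  }
}
```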

Monte Carlo Limitations

• Doesn't work if the data is uncommon • Computationally inefficient • Can break down when the sample space of x is continuous or large • Can't model more than one parameter θ

Monte Carlo Filtering Drawbacks

• Doesn't work if we have a small sample size • Can drastically cut down our sample size • Certain observed values produce more accurate results than others

A good MCMC

• Shows an even distribution without too much variance • Will tour around the state space of possible θ values in a way such that the visits to the different spots provide a good approximation of the true but unknown posterior

