Naive Bayes


MLE for multinomials

-Let X in {1, ..., k} be a discrete random variable with k values, where P(X = j) = theta_j -then P(X) is a multinomial distribution, P(X) = prod_j theta_j^I(X = j), where I(X = j) is an indicator function -given data with N_j occurrences of value j and N total observations, the likelihood is L(theta) = prod_j theta_j^N_j -the maximum likelihood estimate for each parameter is theta_j = N_j / N
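A minimal count-based sketch of this estimate (the function name and example data are illustrative, not from the slides):

```python
from collections import Counter

def multinomial_mle(samples, k):
    """Count-based MLE for P(X = j), j in {1, ..., k}: theta_j = N_j / N."""
    counts = Counter(samples)          # N_j for each observed value j
    n = len(samples)                   # total number of samples N
    return {j: counts.get(j, 0) / n for j in range(1, k + 1)}

# Example: 10 draws of a 3-valued variable
print(multinomial_mle([1, 2, 2, 3, 1, 1, 2, 3, 1, 1], k=3))
# {1: 0.5, 2: 0.3, 3: 0.2}
```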

Naive Bayes classifiers

-instead of learning a function f that assigns labels directly, learn a conditional probability distribution over the outputs of f -P(f(x) | x) = P(f(x) = y | x1, x2, ..., xp) -can use the probabilities for other tasks, such as classification and ranking

Maximum likelihood estimation

-most widely used method of parameter estimation -"learn" the best parameters by finding the values of theta that maximize the likelihood -often easier to work with the log-likelihood
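A standard one-line derivation (not taken from the slides) showing why the log-likelihood gives the same, easier maximization in the multinomial case:

```latex
\log L(\theta) = \sum_{j=1}^{k} N_j \log \theta_j
\quad \text{maximized subject to} \quad \sum_{j=1}^{k} \theta_j = 1
\;\Longrightarrow\; \hat{\theta}_j = \frac{N_j}{N}
```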

Numerical Stability

-multiplying many small probabilities can cause floating-point underflow -we need underflow prevention -better to sum the logs of the probabilities than to multiply the probabilities -the class with the highest final un-normalized log-probability score is still the most probable
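A small illustration of the underflow problem and the log-sum fix (the numbers are made up for demonstration):

```python
import math

probs = [1e-8] * 50                       # 50 small conditional probabilities

product = 1.0
for p in probs:
    product *= p
print(product)                            # 0.0 -- underflows to zero

log_score = sum(math.log(p) for p in probs)
print(log_score)                          # about -921.0 -- stays usable
# the argmax over classes is unchanged, since log is monotonic
```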

NBC learning model space

-parametric model with a specific form -models vary based on the parameter estimates in the conditional probability distributions (CPDs)

Naive Bayes classifier

-simplifying (naive) assumption: attributes are conditionally independent given the class -Strengths: easy to implement, often performs well even when the assumption is violated, can be learned incrementally -Weaknesses: the class-conditional independence assumption produces skewed probability estimates, dependencies among variables cannot be modeled
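A minimal sketch of a categorical Naive Bayes classifier under this assumption; the function names and toy data are illustrative, and Laplace smoothing (covered in a later card) is included to avoid zero counts:

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate class counts and per-class, per-attribute value counts."""
    class_counts = Counter(y)
    cond_counts = defaultdict(Counter)        # (label, attribute index) -> value counts
    feature_values = defaultdict(set)         # attribute index -> observed values
    for xs, label in zip(X, y):
        for i, v in enumerate(xs):
            cond_counts[(label, i)][v] += 1
            feature_values[i].add(v)
    return class_counts, cond_counts, feature_values, len(y)

def predict_nb(xs, class_counts, cond_counts, feature_values, n):
    """MAP prediction: argmax_y  log P(y) + sum_i log P(x_i | y), with Laplace smoothing."""
    best_label, best_score = None, -math.inf
    for label, cy in class_counts.items():
        score = math.log(cy / n)
        for i, v in enumerate(xs):
            k = len(feature_values[i])        # number of possible values of attribute i
            score += math.log((cond_counts[(label, i)][v] + 1) / (cy + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

X = [("sunny", "hot"), ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
y = ["yes", "no", "yes", "no"]
model = train_nb(X, y)
print(predict_nb(("sunny", "cool"), *model))  # -> "yes"
```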

Bayes rule for probabilistic classifier

-the learner considers a set of candidate labels and attempts to find the most probable one, y in Y, given the observed data -such a maximally probable assignment is called the maximum a posteriori (MAP) assignment; Bayes' theorem is used to compute it
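Written out, the MAP assignment is (standard form, consistent with the Bayes rule card below; the last step drops P(x) because it does not depend on y):

```latex
y_{\mathrm{MAP}} = \arg\max_{y \in Y} P(y \mid x)
                 = \arg\max_{y \in Y} \frac{P(x \mid y)\, P(y)}{P(x)}
                 = \arg\max_{y \in Y} P(x \mid y)\, P(y)
```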

Score Function: Likelihood

-Let D = {x(1), ..., x(n)} -assume the data D are independently sampled from the same distribution p(X | theta) -the likelihood function represents the probability of the data as a function of the model parameters
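In symbols (reconstructed from the definition above):

```latex
L(\theta) = p(D \mid \theta) = \prod_{i=1}^{n} p\big(x^{(i)} \mid \theta\big)
```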

Likelihood

-Likelihood is not a probability distribution -it gives the relative probability of the data given a parameter value -the numerical value of L is not relevant on its own; only the ratio of two scores is relevant

NBC learning search algorithm

MLE optimization of parameters (convex optimization results in exact solution)

Laplace correction

Numerator: add 1. Denominator: add k, where k = the number of possible values of X.
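A small sketch of the corrected estimate (function name and numbers are illustrative):

```python
def laplace_estimate(count_j, n, k):
    """Smoothed estimate of P(X = j): (N_j + 1) / (N + k)."""
    return (count_j + 1) / (n + k)

# An unseen value no longer gets probability 0:
print(laplace_estimate(0, n=10, k=3))   # ~0.0769
print(laplace_estimate(5, n=10, k=3))   # ~0.4615
```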

Likelihood function

allows us to determine unknown parameters based on known outcomes

Probability distribution

allows us to predict unknown outcomes based on known parameters

NBC learning scoring function

likelihood of data given NBC model form

Zero counts

Problem: -if an attribute value does not occur in the training examples, we assign zero probability to that value -that could make the overall conditional probability equal 0 -adjust zero counts by smoothing the probability estimates

Bayes rule

P(y | x) = P(x | y) P(y) / P(x); the terms are defined in the cards below

P(y|x)

the posterior probability of y: the probability that y is the target label, given that x has been observed

P(y)

the prior probability of a label y; reflects background knowledge before the data are observed. If there is no prior information, assume a uniform distribution

P(x|y)

the probability of observing the sample x, given that the label y is the target (the likelihood)

P(x)

the probability that this sample of the data is observed (no knowledge of the label)

