Naive Bayes Theorem

September 21, 2020

1 Conditional Probability - Finding the probability of an event given that another event has already occurred

2 Independent Events - e.g. tossing a coin, where one toss does not affect the next

3 Dependent Events - as shown in the image below

P(A) -> probability that A has already been picked

P(B) -> probability of B

Conditional Probability

These are all dependent events

P(B|A) -> probability of B given that A has already occurred: P(B|A) = P(A intersection B) / P(A)
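As a small illustration of conditional probability for dependent events, here is a sketch that uses P(B|A) = P(A intersection B) / P(A). The bag of marbles and its counts are hypothetical and are not taken from the post.

```python
from fractions import Fraction

# Hypothetical bag: 3 red and 2 blue marbles, drawn without replacement.
p_a = Fraction(3, 5)                         # P(A): the first draw is red
p_a_and_b = Fraction(3, 5) * Fraction(2, 4)  # P(A intersection B): red first, then blue

# Conditional probability: P(B|A) = P(A intersection B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 1/2
```

Because the first marble is not put back, the chance of drawing blue second depends on what was drawn first, which is what makes the two events dependent.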

This is Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)

P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, it is event B).

P(A|B) is the posterior probability of A given B, i.e. the probability of the event after the evidence is seen.
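The relationship between prior, evidence and posterior can be sketched in a few lines. The function name and the numbers below are hypothetical, chosen only to show the arithmetic of P(A|B) = P(B|A) * P(A) / P(B).

```python
def bayes_posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / evidence_b

# Hypothetical values: P(A) = 0.3, P(B|A) = 0.5, P(B) = 0.25
posterior = bayes_posterior(0.3, 0.5, 0.25)
print(round(posterior, 2))  # 0.6
```

Here the prior belief in A (0.3) rises to 0.6 once the evidence B is taken into account.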

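Let's see how the Naive Bayes algorithm works on a classification problem:

Let's say we have n features {x1, x2, x3, ..., xn} and an output y.

Then, assuming the features are independent of one another, our equation would be:

P(y | x1, x2, ..., xn) = [P(x1|y) * P(x2|y) * ... * P(xn|y) * P(y)] / [P(x1) * P(x2) * ... * P(xn)]

The denominator is the same for every class, so the classes can be compared by their numerators alone and normalized afterwards.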
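A minimal sketch of the numerator of this formula, P(y) multiplied by each P(xi|y), is shown here. The helper name `class_score` and all probability values are assumptions made for illustration.

```python
from math import prod

def class_score(prior, likelihoods):
    """Numerator of the Naive Bayes formula: P(y) times the product of P(xi|y)."""
    return prior * prod(likelihoods)

# Hypothetical values for two classes and two features.
score_yes = class_score(0.5, [0.5, 0.5])   # P(yes) * P(x1|yes) * P(x2|yes)
score_no = class_score(0.5, [0.5, 0.25])   # P(no)  * P(x1|no)  * P(x2|no)

# The class with the larger score is the prediction.
predicted = "yes" if score_yes > score_no else "no"
print(score_yes, score_no, predicted)  # 0.125 0.0625 yes
```

The denominator is left out because it is identical for both classes and does not change which score is larger.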

Let's understand this algorithm with an example.

I have features like Overcast and Temperature, and the output is Play.

Now I have to predict whether there will be play or not.

Now we have to normalize the scores obtained for Yes and No:

P(Yes) = 0.031 / (0.031 + 0.08571) = 0.27 (approx.)

P(No) = 1 - P(Yes) = 0.73
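The normalization step can be checked with a short sketch. The two scores are the ones quoted in the text; the variable names are only illustrative.

```python
# Unnormalized scores taken from the example in the text.
score_yes = 0.031
score_no = 0.08571

# Divide by the sum so that the two probabilities add up to 1.
total = score_yes + score_no
p_yes = score_yes / total
p_no = 1 - p_yes

print(round(p_yes, 2), round(p_no, 2))  # 0.27 0.73
```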

Now, how does Naive Bayes behave with text data?

Sent 1 -- The food is Bad

Sent 2 -- The food is Bad

Sent 3 -- Food Is Bad

Columns of the dataset: The, Food, Delicious, Bad (the features x1, x2, x3, ...) and Output

This is the binary dataset created from all these sentences

by applying various NLP methods:

Stop-word removal

Stemming

Bag of Words (BoW)

TF-IDF
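To make the binary dataset concrete, here is a rough sketch of a bag-of-words encoding in plain Python. The vocabulary is taken from the column names above; the helper `to_binary_row` is hypothetical, and stemming and TF-IDF are not shown.

```python
sentences = ["The food is Bad", "The food is Bad", "Food Is Bad"]
vocabulary = ["the", "food", "delicious", "bad"]

def to_binary_row(sentence, vocabulary):
    """Return 1 for each vocabulary word present in the sentence, else 0."""
    tokens = set(sentence.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]

rows = [to_binary_row(s, vocabulary) for s in sentences]
for row in rows:
    print(row)
# [1, 1, 0, 1]
# [1, 1, 0, 1]
# [0, 1, 0, 1]
```

Each row is one sentence and each column is one word, so a 1 simply records that the word occurs in that sentence.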

Now applying the Naive Bayes algorithm:

P(y=Yes | Sentence) = P(y=Yes | x1, x2, x3, ..., xn)

This is proportional to:

= P(y=Yes) * P(x1|y=Yes) * P(x2|y=Yes) * ... * P(xn|y=Yes)

= P(y=Yes) * P(x1|y=Yes) * P(x2|y=Yes)

= 2/5 * 1/2 * 2/4

= 1/10 = 0.1
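The arithmetic above can be verified exactly with fractions. The three values are the ones used in the text; the variable names are assumptions.

```python
from fractions import Fraction

prior_yes = Fraction(2, 5)                       # P(y=Yes)
likelihoods_yes = [Fraction(1, 2), Fraction(2, 4)]  # P(x1|y=Yes), P(x2|y=Yes)

# Multiply the prior by each conditional probability.
score_yes = prior_yes
for likelihood in likelihoods_yes:
    score_yes *= likelihood

print(score_yes, float(score_yes))  # 1/10 0.1
```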

P(y=No | x1, x2, x3, ..., xn)

The process is the same,

and we get the answer:

0.03

Now our actual prediction, after normalizing:

P(Yes) = 0.1 / (0.1 + 0.03) = 0.77 approx. (these values are all assumptions and can change)

P(No) = 1 - P(Yes) = 0.23

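Where does it fail?

Suppose sentence 1 contains one more word, like "tasty", that never appeared in the training data for a class. Then the conditional probability of that word is zero, and because the probabilities are multiplied together, the whole score for that class becomes zero no matter what the other words say. The sentence is then treated as not belonging to that class, which may not be true.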

This is often known as Zero Frequency. To solve this, we can use the smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.
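A minimal sketch of Laplace (add-one) smoothing is shown here, assuming word counts per class are available. The function name, the parameter `alpha` and the counts are all hypothetical.

```python
def smoothed_likelihood(word_count, class_total, vocab_size, alpha=1.0):
    """Estimate P(word|class) with additive (Laplace) smoothing.

    word_count:  times the word appears in documents of the class
    class_total: total number of words seen in that class
    vocab_size:  number of distinct words in the vocabulary
    """
    return (word_count + alpha) / (class_total + alpha * vocab_size)

# Hypothetical counts: the word was never seen in this class.
unsmoothed = 0 / 8
smoothed = smoothed_likelihood(word_count=0, class_total=8, vocab_size=4)

print(unsmoothed)          # 0.0
print(round(smoothed, 4))  # 0.0833
```

With smoothing, the unseen word gets a small but non-zero probability, so it no longer wipes out the whole product.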

The assumptions made by Naive Bayes are not generally correct in real-world situations. In fact, the independence assumption almost never holds exactly, but the algorithm often works well in practice.

On the other hand, Naive Bayes is known to be a poor probability estimator, so its probability outputs should not be taken too seriously.

Naive Bayes can handle missing data. Attributes are handled separately by the algorithm at both model construction time and prediction time. As such, if a data instance has a missing value for an attribute, it can be ignored while preparing the model, and ignored when a probability is calculated for a class value.

It does this by taking the possible outcomes.
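One way to picture how a missing value is ignored at prediction time is the sketch below, where a missing attribute is simply left out of the product. The likelihood table, the prior and the use of `None` to mark a missing value are assumptions for illustration.

```python
def score_with_missing(prior, likelihood_tables, instance):
    """Multiply the prior by P(value|class) for every attribute that is present."""
    score = prior
    for feature, value in instance.items():
        if value is None:   # missing attribute: skip it
            continue
        score *= likelihood_tables[feature][value]
    return score

# Hypothetical P(value | Play=Yes) tables.
tables_yes = {
    "outlook": {"overcast": 0.4, "sunny": 0.2},
    "temperature": {"hot": 0.5, "mild": 0.3},
}

# Temperature is unknown for this instance.
instance = {"outlook": "overcast", "temperature": None}
score = score_with_missing(0.6, tables_yes, instance)
print(round(score, 2))  # 0.24
```

Because each attribute contributes its own independent factor, dropping one factor still leaves a usable score for comparing classes.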
