Naive Bayes Theorem

1 Conditional Probability - The probability of an event given that another event has already occurred

2 Independent Events - For example, tossing a coin

3 Dependent Events - As shown in the image below

 

P(A) -> probability that A has already been picked

P(B) -> Probability 


Conditional Probability



These are all dependent events.

P(B|A) -> probability of B given that A has already occurred: P(B|A) = P(A intersection B) / P(A)
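As a rough illustration of a dependent event, the sketch below counts outcomes for drawing two marbles without replacement and computes P(B|A) as P(A intersection B) / P(A). The bag contents (3 red, 2 blue) and the variable names are assumptions made for this example, not values from the post.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical bag (an assumption, not from the post): 3 red and 2 blue marbles.
bag = ["red", "red", "red", "blue", "blue"]

# Every ordered way to draw two different marbles (no replacement).
draws = list(permutations(range(len(bag)), 2))

# Event A: the first marble is red. Event B: the second marble is red.
count_a = sum(1 for i, j in draws if bag[i] == "red")
count_b = sum(1 for i, j in draws if bag[j] == "red")
count_a_and_b = sum(1 for i, j in draws if bag[i] == "red" and bag[j] == "red")

p_a = Fraction(count_a, len(draws))
p_b = Fraction(count_b, len(draws))
p_a_and_b = Fraction(count_a_and_b, len(draws))

# Conditional probability: P(B|A) = P(A intersection B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b, p_b_given_a)  # 3/5 1/2
```

Because P(B|A) differs from P(B) here, the two draws are dependent.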


   


This is Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)

P(A) is the prior of A (the prior probability, i.e. the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, it is event B).

P(A|B) is the posterior probability of A given B, i.e. the probability of the event after the evidence is seen.
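A minimal sketch of going from prior to posterior with Bayes' theorem follows. The function name `posterior` and the three input probabilities are hypothetical, chosen only for illustration. The evidence P(B) is obtained from the law of total probability.

```python
from fractions import Fraction

def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    # Evidence P(B) via the law of total probability.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: P(A) = 1/5, P(B|A) = 3/4, P(B|not A) = 1/4.
p_a_given_b = posterior(Fraction(1, 5), Fraction(3, 4), Fraction(1, 4))
print(p_a_given_b)  # 3/7
```

Seeing the evidence B raises the probability of A from the prior 1/5 to the posterior 3/7.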

Let's see how the Naive Bayes algorithm works on a classification problem:

Let's say we have n features {x1, x2, x3, x4, ..., xn} and an output {y}.

Then, assuming the features are independent of each other given y, our equation would be:
P(y|x1,x2,x3,x4,...,xn) = P(x1|y) P(x2|y) P(x3|y) ... P(xn|y) * P(y) / (P(x1) P(x2) P(x3) ... P(xn))
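The numerator of this equation can be sketched in a few lines. The helper `class_score` and all the probabilities below are hypothetical; they only show how the prior and the per-feature likelihoods are multiplied for each class.

```python
import math

def class_score(prior, likelihoods):
    """Unnormalized naive Bayes score: P(y) * P(x1|y) * ... * P(xn|y)."""
    return prior * math.prod(likelihoods)

# Hypothetical probabilities for two classes and three features.
score_yes = class_score(0.6, [0.5, 0.4, 0.9])
score_no = class_score(0.4, [0.3, 0.7, 0.2])

# The denominator P(x1)...P(xn) is the same for every class, so comparing
# the unnormalized scores is enough to pick the predicted class.
prediction = "Yes" if score_yes > score_no else "No"
print(prediction)  # Yes
```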




Let's understand this algorithm with an example.
I have features like Overcast, Temperature and Play.
Now I have to predict whether there will be play or not.



Now we have to normalize these probabilities:

P(Yes) = 0.031 / (0.031 + 0.08571) = 0.27 approx.
P(No) = 1 - P(Yes) = 0.73
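The normalization step can be checked with a short sketch. The function name `normalize` is an assumption; the two scores are the ones quoted above.

```python
def normalize(scores):
    """Scale unnormalized class scores so that they sum to 1."""
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}

# Unnormalized scores quoted in the post for the play example.
probs = normalize({"Yes": 0.031, "No": 0.08571})
print({label: round(p, 2) for label, p in probs.items()})
# {'Yes': 0.27, 'No': 0.73}
```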


Now, how does Naive Bayes behave with text data?
Answer:
                x1    x2       x3
Sent 1 -- The food is Bad
Sent 2 -- The food is Bad
Sent 3 -- Food is Bad


F1     F2     F3          F4     Output
The    Food   Delicious   Bad
1      1      1           0      1
1      1      0           1      0
0      1      0           1      0
0      1      1           0      1
0      0      0           1      0


This is the binary dataset created from these sentences
by applying various NLP methods:
 Stop-word removal
 Stemming
 Bag of Words (BoW)
 TF-IDF
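A minimal sketch of how a sentence could be turned into one binary row is given below. It covers only lowercasing, a stop-word filter and a binary bag of words (no stemming or TF-IDF). The stop-word list is a hypothetical one-word list, and the vocabulary is the set of feature words F1 to F4 from the table.

```python
# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"is"}
# Feature words F1..F4 from the post, lowercased.
VOCABULARY = ["the", "food", "delicious", "bad"]

def binary_bow(sentence):
    """Return a 0/1 vector marking which vocabulary words occur in the sentence."""
    tokens = {word for word in sentence.lower().split() if word not in STOP_WORDS}
    return [1 if word in tokens else 0 for word in VOCABULARY]

for sentence in ["The food is Bad", "Food Is Bad"]:
    print(sentence, "->", binary_bow(sentence))
# The food is Bad -> [1, 1, 0, 1]
# Food Is Bad -> [0, 1, 0, 1]
```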

Now applying the Naive Bayes algorithm:

P(y=Yes | Sentence) = P(y=Yes | x1, x2, x3, ..., xn)

This is proportional to:
                     P(y=Yes) * Π(i=1..n) P(xi | y=Yes)
                 =   P(y=Yes) * P(x1 | y=Yes) * P(x2 | y=Yes) * ... * P(xn | y=Yes)
                 =   P(y=Yes) * P(x1 | y=Yes) * P(x2 | y=Yes)      (with two features)

                 =   2/5 * 1/2 * 2/4
                 =   1/10 = 0.1

For P(y=No | x1, x2, x3, ..., xn) the process is the same, and we get:
                     0.03
  
Now our actual prediction, after normalizing, is:
P(Yes) = 0.1 / (0.1 + 0.03) = 0.77 approx. (these values are all assumptions and can change)
P(No) = 1 - P(Yes) = 0.23 approx.
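The probabilities fed into the product come from counting rows of the binary dataset. The sketch below assumes the table is read row by row (four feature values, then the output). The names `prior` and `likelihood` are hypothetical. Since the post states that its own figures are assumptions, the counted values need not match them exactly.

```python
from fractions import Fraction

FEATURES = ["The", "Food", "Delicious", "Bad"]
# Rows of the binary table in the post: four feature values, then the output.
ROWS = [
    (1, 1, 1, 0, 1),
    (1, 1, 0, 1, 0),
    (0, 1, 0, 1, 0),
    (0, 1, 1, 0, 1),
    (0, 0, 0, 1, 0),
]

def prior(label):
    """P(y = label): fraction of rows with that output."""
    return Fraction(sum(1 for row in ROWS if row[-1] == label), len(ROWS))

def likelihood(feature, label):
    """P(feature = 1 | y = label), estimated by counting rows."""
    rows = [row for row in ROWS if row[-1] == label]
    index = FEATURES.index(feature)
    return Fraction(sum(row[index] for row in rows), len(rows))

print(prior(1), likelihood("The", 1), likelihood("Food", 1))  # 2/5 1/2 1
```

Counting gives P(y=Yes) = 2/5 and P(The | y=Yes) = 1/2, which agree with the first two factors used above.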

Where does it fail?
Suppose sentence 1 contains one more word, like "tasty", that is not among the features. The probability of that word would then be zero, and because the probabilities are multiplied together, the whole product becomes zero. The sentence would then be treated as negative, which is not true.

This is often known as Zero Frequency. To solve this, we can use the smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.
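A small sketch of Laplace (add-one) estimation follows. The function name `smoothed_probability` and its parameters are assumptions, and the counts (a word seen in 0 of 3 sentences of a class) are hypothetical.

```python
from fractions import Fraction

def smoothed_probability(count, total, alpha=1, n_values=2):
    """Laplace (add-alpha) estimate of P(feature value | class).

    count    -- rows of the class where the feature takes this value
    total    -- all rows of the class
    n_values -- number of values the feature can take (2 for a 0/1 feature)
    """
    return Fraction(count + alpha, total + alpha * n_values)

# A word seen in 0 of 3 sentences of a class (hypothetical counts).
unsmoothed = Fraction(0, 3)
smoothed = smoothed_probability(0, 3)
print(unsmoothed, smoothed)  # 0 1/5
```

With smoothing the factor is small but no longer zero, so a single unseen word cannot wipe out the whole product.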

The assumptions made by Naive Bayes are not generally correct in real-world situations. In fact, the independence assumption rarely holds exactly, but the classifier often works well in practice.

On the other hand, Naive Bayes is also known to be a bad estimator, so the probability outputs are not to be taken too seriously.

              

Naive Bayes can handle missing data. Attributes are handled separately by the algorithm at both model construction time and prediction time. As such, if a data instance has a missing value for an attribute, it can be ignored while preparing the model, and ignored when a probability is calculated for a class value.

It does this by taking into account only the attributes whose values are present.
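One way to picture this at prediction time is to skip any attribute whose value is missing when multiplying the likelihoods. The sketch below is an illustration under that assumption; the function name, the attribute names and all probabilities are hypothetical.

```python
import math

def score_ignoring_missing(prior, likelihoods, instance):
    """Naive Bayes score that skips attributes whose value is missing (None).

    likelihoods maps attribute -> {value: P(value | class)}.
    """
    factors = [
        likelihoods[attribute][value]
        for attribute, value in instance.items()
        if value is not None
    ]
    return prior * math.prod(factors)

# Hypothetical likelihoods for one class; "temperature" is missing below.
likelihoods_yes = {
    "outlook": {"sunny": 0.2, "overcast": 0.5},
    "temperature": {"hot": 0.3, "mild": 0.4},
}
instance = {"outlook": "overcast", "temperature": None}
score = score_ignoring_missing(0.6, likelihoods_yes, instance)
print(score)
```

Only the prior and the "outlook" factor contribute to the score, since the missing "temperature" value is left out of the product.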



