Support Vector Machine

SVM creates a hyperplane between the points so that the classes can be separated. That is why it is known as a classification algorithm.

It works well on smaller datasets, but it can also be a strong and powerful tool for building machine learning models on more complex ones.

It can be used for both classification and regression problems.

But it is mostly used for classification. In SVM, we plot each data item as a point in n-dimensional space, where n is the number of features you have. The value of each feature is the value of a particular coordinate.

SVM_1


Support vectors are the observations (data points) that lie closest to the hyperplane, and they determine where the hyperplane sits. The SVM classifier is a frontier (a hyperplane, or a line in two dimensions) that best segregates the two classes.


Now, how can we identify the right hyperplane?

Identify the right hyperplane (Scenario-1): Here we have three hyperplanes (A, B and C). Now identify the right hyperplane to classify the stars and circles.

SVM_2

Hyperplane B segregates the two classes best, so B is treated as the right hyperplane.

Identify the right hyper-plane (Scenario-2):

Here we have three hyperplanes (A, B and C), and all of them segregate the classes well. Now, how do we choose the right hyperplane?

SVM_3

Here, maximizing the distance between the nearest data point (of either class) and the hyperplane helps us decide on the right hyperplane. This distance is called the margin. Let's look at the snapshot below:
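To make the margin concrete, here is a small Python sketch (my own illustration, not from the original post) that measures it directly. It assumes the hyperplane is written as w·x + b = 0 and computes the perpendicular distance |w·x + b| / ||w|| from each point. The smallest of these distances is the margin. The points and the hyperplane (w = (1, 1), b = -5) are hypothetical values chosen for illustration.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm_w

def margin(w, b, points):
    """Margin: distance from the hyperplane to the nearest point of either class."""
    return min(distance_to_hyperplane(w, b, p) for p in points)

# Hypothetical points: two of one class near (1, 1), two of the other near (4.5, 4)
points = [(1, 1), (2, 1), (4, 4), (5, 4)]

# Hypothetical hyperplane x1 + x2 - 5 = 0, i.e. w = (1, 1), b = -5
print(margin((1, 1), -5, points))  # about 1.414 (the point (2, 1) is nearest)
```

Comparing this number across candidate hyperplanes is exactly the "highest margin wins" comparison in the snapshot.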

SVM_4

The margin of C is the highest, so C is chosen as the hyperplane.

Another reason is robustness: if we select a hyperplane with a low margin, there is a high chance of misclassification.

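Identify the right hyperplane (Scenario-3):

SVM_5

From this diagram, we might select hyperplane B because it has the higher margin. But one star lies on the wrong side of B, so B misclassifies it. To get a model that classifies every point correctly, we select A as the hyperplane, even though its margin is smaller. Here correct classification takes priority over a larger margin.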
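The rule used in Scenarios 1 to 3 can be sketched in Python. First, discard any hyperplane that misclassifies a training point. Then keep the one with the widest margin. This is only the strict ("hard margin") idea, and Scenario 4 relaxes it. The candidate hyperplanes "A" and "B" and the star/circle coordinates are hypothetical. Stars are labeled +1 and circles -1.

```python
import math

def score(w, b, x):
    """w.x + b; its sign says which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classifies_all(w, b, data):
    """True if every labeled point (x, y), y in {+1, -1}, is on its own class's side."""
    return all(y * score(w, b, x) > 0 for x, y in data)

def margin(w, b, data):
    """Distance from the hyperplane to the nearest point of either class."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(score(w, b, x)) / norm_w for x, _ in data)

def pick_hyperplane(candidates, data):
    """Drop hyperplanes with any classification error, then keep the widest margin.

    Assumes at least one candidate classifies every point correctly.
    """
    valid = {name: wb for name, wb in candidates.items() if classifies_all(*wb, data)}
    return max(valid, key=lambda name: margin(*valid[name], data))

# Hypothetical data: stars (+1) and circles (-1); the star at (4, 4) sits close to the circles
data = [((1, 1), 1), ((2, 1), 1), ((4, 4), 1),
        ((5, 5), -1), ((6, 5), -1)]

# Hypothetical candidates as (w, b); B puts the star at (4, 4) on the circles' side
candidates = {"A": ((-1, -1), 9), "B": ((-1, -1), 7)}

print(pick_hyperplane(candidates, data))  # A
```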


Can we classify the two classes (Scenario-4)? Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the region of the other class.

SVM_6

As I have mentioned, the star at the other end is like an outlier for the star class.

SVM has a feature to ignore outliers and find the hyperplane that has the maximum margin. This comes from the soft margin, which allows some points to violate the margin.

Hence, we can say that SVM is robust to outliers.

SVM_7

Find the hyperplane to segregate two classes (Scenario-5):

In this case, we can't have a linear hyperplane between the two classes. So how does SVM classify these two classes?

SVM_8

SVM can solve this problem. It does so by introducing a new feature, z = x^2 + y^2. Now let's plot the data points on the x and z axes.

SVM_9

Points to remember from the plot above:

z is always non-negative, because it is a sum of squared terms.

In the original plot, the red circles appear close to the origin of the x and y axes, which leads to lower values of z. The stars lie relatively far from the origin, which results in higher values of z. So in the x-z plane, a straight line (a threshold on z) can separate the two classes.
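Here is a minimal sketch of the z = x^2 + y^2 trick, using hypothetical coordinates. The circles sit near the origin and the stars surround them, so no single straight line in the x-y plane separates them. A simple threshold on z does separate them. The threshold value 1.0 is an assumption chosen for this toy data.

```python
def add_z(points):
    """Map each (x, y) to (x, z) with the new feature z = x^2 + y^2."""
    return [(x, x * x + y * y) for x, y in points]

# Hypothetical coordinates: circles near the origin, stars surrounding them
circles = [(0.5, 0.0), (-0.3, 0.4), (0.0, -0.6)]
stars = [(2.0, 0.0), (0.0, -2.5), (-1.5, 1.5)]

circle_z = [z for _, z in add_z(circles)]
star_z = [z for _, z in add_z(stars)]

# In the x-z plane the horizontal line z = 1.0 (assumed threshold) separates the classes
threshold = 1.0
print(max(circle_z) < threshold < min(star_z))  # True
```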

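An SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space. In other words, it converts a non-separable problem into a separable one. In practice, the kernel computes dot products of the transformed points directly, without building the new features explicitly. This is known as the kernel trick.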
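To see what a kernel buys us, here is a small Python check (my own illustration, not from the original post). It uses the degree-2 polynomial kernel K(a, b) = (a·b)^2 on 2-D points. This gives the same number as two other steps combined: mapping each point to the 3-D features (x1^2, √2·x1·x2, x2^2), then taking an ordinary dot product. So an SVM can work in the higher-dimensional space without ever building those features. The first and last features are the same squares that make up z = x^2 + y^2 in Scenario-5. The two input points are arbitrary.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def poly_kernel(a, b):
    """Degree-2 polynomial kernel computed in the original 2-D space."""
    return dot(a, b) ** 2

a, b = (1.0, 2.0), (3.0, -1.0)   # arbitrary example points
print(poly_kernel(a, b))         # 1.0
print(dot(phi(a), phi(b)))       # 1.0 up to floating-point rounding
```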



Example of SVM :

Let's suppose we have a population that is 50% male and 50% female.

The differentiating factors are height and hair length:

xyplot

In the plot, one set of markers represents boys and the other represents girls:

A few insights:

    Males in our population have a higher average height.

    Females in our population have longer scalp hair.

If we were to see an individual with a height of 180 cm and a hair length of 4 cm, our best guess would be to classify this individual as male. This is how SVM classifies: it finds the boundary that best separates the two groups. A new point is then assigned to the class on whose side it falls.


Basic Syntax:

from sklearn.svm import SVC  # classification
from sklearn.svm import SVR  # regression
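Here is a minimal usage sketch of SVC with scikit-learn, applied to the height/hair-length example above. The training data is hypothetical (made up for illustration, not read from the plot). The settings kernel="linear" and C=1.0 are assumed choices. SVR is used the same way (fit, then predict) when the target is a number instead of a class.

```python
from sklearn.svm import SVC

# Hypothetical training data: [height in cm, hair length in cm]
X = [[175, 5], [180, 3], [185, 4], [178, 6],
     [160, 30], [165, 25], [158, 35], [162, 28]]
y = ["male", "male", "male", "male",
     "female", "female", "female", "female"]

clf = SVC(kernel="linear", C=1.0)   # assumed settings: linear kernel, C = 1.0
clf.fit(X, y)

print(clf.predict([[180, 4]]))      # ['male']
print(clf.support_vectors_)         # the training points that define the margin
```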

MATH Behind SVM :

As we know, the equation of the hyperplane is w^T x + b = 0.

The corresponding decision function is y = w^T x + b, and in our example the slope of the hyperplane is -1. Now take two points, one on each side of the hyperplane, and plug each into this equation.


As we saw in Image 2, the value of y for the point below the hyperplane is positive. This means that every point in that region should also give a positive value of y. If a point there gives a negative y, it is a classification error.

Applying the same equation to the second point (4, 4) gives a negative result (-4). So every point in the region of this point should have a negative y value; if not, it is a classification error.
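The sign rule above can be sketched in Python. The post does not give w and b, so this sketch assumes w = (-1, -1) and b = 4, which is the hyperplane x1 + x2 = 4. That line has slope -1, gives -4 for the point (4, 4), and gives positive values for points below it, which matches the description. The helper names decision and predict are hypothetical.

```python
def decision(w, b, x):
    """y = w^T x + b for a point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """+1 if y is positive (the region below this line), -1 otherwise."""
    return 1 if decision(w, b, x) > 0 else -1

# Assumed hyperplane x1 + x2 = 4 (slope -1), written as w = (-1, -1), b = 4
w, b = (-1, -1), 4

print(decision(w, b, (4, 4)))  # -4  -> negative region
print(decision(w, b, (1, 1)))  # 2   -> positive region (below the line)
```

A training point whose label disagrees with predict is exactly the "classification error" described above.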


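Now we want to maximize the distance between x1 and x2. Here x1 is the nearest point on the negative side of the hyperplane and x2 is the nearest point on the positive side. We scale w and b so that these nearest points lie exactly on the margin boundaries, where the decision function equals -1 and +1.

So:

  w^T x1 + b = -1

  w^T x2 + b = 1

Subtracting the first equation from the second gives:

  w^T (x2 - x1) = 2

Projecting (x2 - x1) onto the unit normal w / ||w|| gives the width of the margin:

  (w^T / ||w||) (x2 - x1) = 2 / ||w||   (we have to maximize this)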
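Here is a quick numeric check of the width formula. It uses an assumed hyperplane w = (-1, -1), b = 4 (hypothetical values, not from the post). The points x1 = (2.5, 2.5) and x2 = (1.5, 1.5) are chosen so that w^T x1 + b = -1 and w^T x2 + b = 1. Projecting x2 - x1 onto w / ||w|| gives the same width as 2 / ||w||, which is √2 here.

```python
import math

# Assumed hyperplane x1 + x2 = 4, written as w = (-1, -1), b = 4
w, b = (-1.0, -1.0), 4.0
norm_w = math.sqrt(sum(wi * wi for wi in w))

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

x1 = (2.5, 2.5)   # on the margin line w^T x + b = -1
x2 = (1.5, 1.5)   # on the margin line w^T x + b = +1
assert score(x1) == -1 and score(x2) == 1

# Project (x2 - x1) onto the unit normal w / ||w||
diff = [p - q for p, q in zip(x2, x1)]
width = sum(wi * di for wi, di in zip(w, diff)) / norm_w

print(width, 2 / norm_w)  # both about 1.414 (= sqrt(2))
```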

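We want our points to be as separated as possible, i.e. we want the width 2 / ||w|| to be as large as possible. The width shrinks as ||w|| grows, so we can increase the width by minimizing ||w||. If we allow some points to violate the margin (the soft margin), we can write this as:

  minimize over w, b:   (1/2) ||w||^2 + C * Σ_i ζ_i

  subject to:   y_i (w^T x_i + b) >= 1 - ζ_i  and  ζ_i >= 0 for every training point i

where y_i is +1 or -1 depending on the class of point i.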



We converted the maximization into a minimization because optimization methods such as gradient descent work by minimizing a function until they converge.
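As a sketch of the "minimize with gradient descent" idea, the Python below trains a linear SVM on the soft-margin objective (1/2)||w||^2 + C·Σ max(0, 1 - y_i(w·x_i + b)). Here the max(...) term plays the role of the error ζ_i. Strictly speaking this is subgradient descent, because the max(...) term has a kink. The toy data, learning rate and number of epochs are hypothetical choices. Real libraries such as scikit-learn use dedicated solvers instead.

```python
def train_linear_svm(data, C=1.0, lr=0.001, epochs=1000):
    """Subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b))).

    data: list of ((x1, x2), y) pairs with y in {+1, -1}.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)   # gradient of 0.5*||w||^2 is w itself
        grad_b = 0.0
        for x, y in data:
            score = w[0] * x[0] + w[1] * x[1] + b
            if y * score < 1:   # inside the margin or misclassified: hinge term is active
                grad_w[0] -= C * y * x[0]
                grad_w[1] -= C * y * x[1]
                grad_b -= C * y
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

# Hypothetical, linearly separable toy data
data = [((2, 2), 1), ((3, 3), 1), ((2, 3), 1),
        ((-2, -2), -1), ((-3, -1), -1), ((-1, -3), -1)]

w, b = train_linear_svm(data)
print([predict(w, b, x) for x, _ in data])  # [1, 1, 1, -1, -1, -1]
```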

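Now, what are C and ζ (zeta)?

C -> a hyperparameter that controls how heavily errors are penalized. A large C punishes margin violations strongly, which gives fewer training errors and a narrower margin. A small C tolerates more errors in exchange for a wider margin.

ζ (zeta) -> the slack variables. ζ_i measures how far point i is from being on the correct side of its margin boundary: ζ_i = 0 for a correctly classified point outside the margin, and ζ_i > 1 for a misclassified point. Σ ζ_i is the total error.

We then multiply the two, C * Σ ζ_i, and add the product to the objective. So this is basically a regularization method, with C as its hyperparameter.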
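To see how C trades errors against margin width, this Python sketch computes the slack ζ_i = max(0, 1 - y_i(w·x_i + b)). It also computes the objective (1/2)||w||^2 + C·Σζ_i for two candidate hyperplanes. The data is hypothetical, lies along the x-axis, and includes one outlier star. The "wide" candidate (margin width 2/||w|| = 6) leaves the outlier on the wrong side. The "narrow" one (width 2.5) classifies every point correctly. With a small C the wide margin wins, and with a large C the narrow one wins.

```python
def slacks(w, b, data):
    """zeta_i = max(0, 1 - y_i * (w.x_i + b)); 0 for points outside the margin on the correct side."""
    return [max(0.0, 1 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
            for x, y in data]

def objective(w, b, data, C):
    """Soft-margin objective: 0.5 * ||w||^2 + C * sum of the slacks."""
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks(w, b, data))

# Hypothetical data on the x-axis: stars (+1) on the right, circles (-1) on the left,
# plus one outlier star at (-0.5, 0) close to the circles
data = [((3, 0), 1), ((4, 0), 1), ((-0.5, 0), 1),
        ((-3, 0), -1), ((-4, 0), -1)]

wide = ((1 / 3, 0), 0.0)   # wide margin, leaves the outlier on the wrong side
narrow = ((0.8, 0), 1.4)   # narrow margin, classifies every point correctly

def preferred(C):
    """Name of the candidate with the lower objective for this C."""
    return "wide" if objective(*wide, data, C) < objective(*narrow, data, C) else "narrow"

print(preferred(0.1))  # wide   (small C tolerates the outlier)
print(preferred(10))   # narrow (large C punishes the error)
```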

Now, to maximize the margin under these constraints, we use a constrained optimization technique called the method of Lagrange multipliers. It finds the optimal values of the parameters (here w and b) by introducing one multiplier per constraint.

Choosing the value of C is hyperparameter tuning, and we use it to reduce overfitting.
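For reference, here is the standard textbook form of the hard-margin Lagrangian, not taken from the original post. It introduces one multiplier α_i ≥ 0 per training point, where y_i is +1 or -1. The equations set its derivatives to zero and then derive the resulting dual problem:

```latex
L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2
  - \sum_i \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right], \qquad \alpha_i \ge 0

\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0

\max_{\alpha} \; \sum_i \alpha_i
  - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
  \quad \text{subject to } \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0
```

The points with α_i > 0 are the support vectors. For the soft-margin version, the only change is the bound 0 ≤ α_i ≤ C. The data enter only through the dot products x_i^T x_j, so these can be swapped for a kernel K(x_i, x_j). This is how the kernels described earlier fit in.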


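SVM tends to be most practical on small to medium-sized datasets, since training time grows quickly with the number of samples. It can still be effective even when there are many features.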












