Support Vector Machine

September 21, 2020

It creates a hyperplane between the points so that the classes can be told apart, which is why it is known as a classification algorithm.

It works well on smaller datasets, and on complex ones it can be a strong and powerful tool for building machine learning models.

It can be used for both classification and regression problems.

In practice it is mostly used for classification. We plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate.

SVM_1

Support vectors are the coordinates of the individual observations that lie closest to the separating boundary. The SVM classifier is the frontier (a hyperplane, or a line in two dimensions) that best segregates the two classes.
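To make the idea of support vectors concrete, here is a minimal sketch using scikit-learn. The six data points are hypothetical toy values chosen only for illustration. After fitting, the observations the model kept as support vectors can be read from the `support_vectors_` attribute.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two features per observation, two classes.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel gives a straight-line frontier between the classes.
# The large C is an assumption that makes the fit behave like a hard margin.
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

# The observations that ended up defining the frontier.
support_vectors = clf.support_vectors_
print(support_vectors)
print(clf.n_support_)  # number of support vectors per class
```

Only a few of the six points should appear in the printout: the ones nearest to the other class.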

Now, how can we identify the right hyperplane?

Identify the right hyperplane (Scenario-1): Here we have three hyperplanes (A, B and C). Now identify the right hyperplane to classify the stars and circles.

SVM_2

Hyperplane B segregates the two classes best, so B is treated as the right hyperplane.

Identify the right hyper-plane (Scenario-2):

Here we have three hyperplanes (A, B and C), and all of them segregate the classes well. How do we choose the right hyperplane now?

SVM_3

Here, maximizing the distance between the nearest data point (of either class) and the hyperplane helps us decide on the right hyperplane. This distance is called the margin. Let's look at the snapshot below.

SVM_4

The margin of C is the highest, so C is the right hyperplane.

Another reason is robustness: if we select a hyperplane with a low margin, there is a high chance of misclassification.

Identify the right hyperplane (Scenario-3):

SVM_5

From this diagram we might select hyperplane B because of its higher margin, but B leaves one star on the wrong side. Hyperplane A classifies every point correctly, so we select A as the hyperplane.

Can we classify two classes (Scenario-4)? Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other class.

SVM_6

As I have mentioned, the one star at the other end is like an outlier for the star class.

SVM has a feature to ignore outliers and find the hyperplane that has the maximum margin.

Hence we can say that SVM is robust to outliers.
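The outlier behaviour can be sketched with a small experiment. The points below are hypothetical: four circles, four stars, and one extra star placed in the middle of the circles. With a moderate value of `C` (an assumption, here the scikit-learn default of 1.0), the linear SVM should keep a wide-margin frontier between the two clusters and accept one misclassified point.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: circles (label 0) in the lower left, stars (label 1)
# in the upper right, plus one star sitting among the circles.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0],
              [6.0, 6.0], [6.0, 7.0], [7.0, 6.0], [7.0, 7.0],
              [1.5, 1.5]])  # the last row is the outlier star
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

# Soft-margin linear SVM: errors are allowed but penalized through C.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
pred = clf.predict(X)

# Compare predictions with the true labels, point by point.
for point, truth, guess in zip(X, y, pred):
    print(point, "true:", truth, "predicted:", guess)
```

The eight regular points are classified correctly, while the outlier star is assigned to the circle class instead of bending the frontier around it.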

SVM_7

Find the hyperplane to segregate two classes (Scenario-5):

Here we can't have a linear hyperplane between the two classes, so how does SVM classify them?

SVM_8

SVM can solve this problem. It does so by introducing a new feature, z = x^2 + y^2. Now let's plot the data points on the x and z axes.

SVM_9

Points to remember from the above plot:

z is never negative, because it is the sum of squared terms.

In the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z.

An SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one.
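The effect of the extra feature z = x^2 + y^2 can be checked with a rough sketch. It assumes scikit-learn's `make_circles` as a stand-in for the circles-inside-stars picture; the sample size, noise level and random seed are arbitrary choices. A linear SVM is fitted once on (x, y) and once on (x, z), and an RBF-kernel SVM is fitted on the original features for comparison.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Stand-in data: one class near the origin, the other on a ring around it.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The new feature from the text: z = x^2 + y^2.
z = X[:, 0] ** 2 + X[:, 1] ** 2
X_xz = np.column_stack([X[:, 0], z])

linear_xy = SVC(kernel="linear").fit(X, y)       # straight line in (x, y)
linear_xz = SVC(kernel="linear").fit(X_xz, y)    # straight line in (x, z)
rbf_xy = SVC(kernel="rbf").fit(X, y)             # kernel does the mapping

acc_xy = linear_xy.score(X, y)
acc_xz = linear_xz.score(X_xz, y)
acc_rbf = rbf_xy.score(X, y)
print("linear on (x, y):", acc_xy)
print("linear on (x, z):", acc_xz)
print("rbf on (x, y):   ", acc_rbf)
```

The linear model on (x, y) cannot do well here, while the same linear model on (x, z) separates the classes. The RBF kernel reaches a similar result without the feature being built by hand.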

Example of SVM :

Let's suppose we have data for a population that is 50% male and 50% female.

The differentiating factors are height and hair length:

xyplot

The two kinds of markers in the plot represent boys and girls:

A few insights:

Males in our population have a higher average height.

Females in our population have longer scalp hair.

If we were to see an individual with a height of 180 cm and a hair length of 4 cm, our best guess would be to classify this individual as male. This is how SVM classification works.
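The height and hair-length example can be sketched in code. The population below is synthetic: the means and spreads are made-up values chosen only to mimic the two insights above, not real measurements. The query point (180 cm, 4 cm) is the one from the text.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic population (assumed values): columns are height in cm and
# hair length in cm. Males are taller on average, females have longer hair.
males = np.column_stack([rng.normal(175, 6, 50), rng.normal(5, 2, 50)])
females = np.column_stack([rng.normal(162, 6, 50), rng.normal(25, 6, 50)])

X = np.vstack([males, females])
y = np.array(["male"] * 50 + ["female"] * 50)

clf = SVC(kernel="linear").fit(X, y)

# The individual from the text: height 180 cm, hair length 4 cm.
guess = clf.predict([[180.0, 4.0]])[0]
print(guess)
```

Because the query point sits well inside the male cluster, the classifier assigns it to the male class.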

Basic Syntax:

from sklearn.svm import SVC, SVR  # SVC for classification, SVR for regression
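A slightly fuller sketch of the basic syntax follows. It uses a tiny hypothetical one-feature dataset, with `SVC` for a classification target and `SVR` for a regression target. Both follow the usual scikit-learn pattern of `fit` and then `predict`.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Hypothetical single-feature inputs.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# Classification: labels are discrete classes.
y_class = np.array([0, 0, 0, 1, 1, 1])
classifier = SVC(kernel="linear").fit(X, y_class)
class_pred = classifier.predict([[0.5], [4.5]])
print(class_pred)

# Regression: targets are continuous values.
y_reg = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
regressor = SVR(kernel="linear").fit(X, y_reg)
reg_pred = regressor.predict([[2.5]])
print(reg_pred)
```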

Math behind SVM:

As we know, the equation of the hyperplane is W^T.X + b = 0

and the equation of the line is y = W^T.X + b, where the slope of our hyperplane is -1. Now take one point on each side of the hyperplane and plug it into this equation.

As we saw in Image 2, the value of y for the first point is positive. This says that every point in the same region as this point (the region below our hyperplane) should also give a positive value; if its y value is negative, there is a classification error.

Applying the same equation to the second point (4,4), the result is negative (-4), so every point that falls in the region of this point should have a negative y value; if not, this is a classification error.
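The sign rule can be checked numerically. In the sketch below, the weights `w` and bias `b` are assumed values, picked only so that the hyperplane has slope -1; they are not read from the figures. The helper `side` is a hypothetical name for evaluating W^T.X + b at a point.

```python
import numpy as np

# Assumed parameters: the hyperplane -x1 - x2 + 4 = 0, i.e. x2 = -x1 + 4,
# which is a line with slope -1.
w = np.array([-1.0, -1.0])
b = 4.0

def side(x):
    """Evaluate W^T.x + b; the sign tells us which side of the plane x is on."""
    return float(w @ np.asarray(x, dtype=float) + b)

below = side([0.0, 0.0])   # a point below the line
above = side([4.0, 4.0])   # the point (4,4) above the line
on_plane = side([2.0, 2.0])

print(below, above, on_plane)  # 4.0 -4.0 0.0
```

Every point on the same side as (0,0) gives a positive value, and every point on the same side as (4,4) gives a negative one. A label that disagrees with the sign is a classification error.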

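What we have to do now is maximize the distance between x1 and x2, where x1 is the nearest point on one side of the hyperplane and x2 is the nearest point on the other side.

W^T.x1 + b = -1

W^T.x2 + b = 1

Subtracting the first equation from the second gives W^T.(x2 - x1) = 2

Dividing both sides by ||W|| gives (W^T / ||W||).(x2 - x1) = 2 / ||W|| (we have to maximize this quantity)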
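The width 2 / ||W|| can be read off a fitted model. The sketch below uses hypothetical, linearly separable toy points and a very large `C` (an assumption that approximates a hard margin). It recovers W and b from `coef_` and `intercept_` and computes the margin width.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable data with labels -1 and +1.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Very large C: errors are so expensive that the fit is close to hard margin.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

# Width between the two margin lines W^T.x + b = -1 and W^T.x + b = +1.
margin_width = 2.0 / np.linalg.norm(w)

# Values of W^T.x + b; the nearest points should land close to -1 and +1.
scores = X @ w + b
print("margin width:", margin_width)
print("scores:", scores)
```

For these points the nearest observations are (2,1) and (1,2) on one side and (4,4) on the other, so the width should come out close to 5 / sqrt(2).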

Image for post

Image for post

We want our points to be as separated as possible, i.e. we want this width to be maximum. Here, ||w|| can be minimised to increase the width. In other words, we can write this as below:

Image for post

We converted this into a minimization because gradient descent works by minimizing a function until it converges, so this form is convenient.

Now, what are C (how much errors cost) and zeta (the value of each error)?

C -> How strongly we penalize errors in our model. A smaller C lets the model tolerate more misclassified points, and a larger C tolerates fewer.

Zeta -> the sum of the distances of these error points from the plane.

We then multiply the two, so this term acts as a regularization term and C is one of the model's hyperparameters.
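As a rough sketch of how the two pieces combine, the function below evaluates the commonly used soft-margin objective, 0.5 * ||w||^2 + C * (sum of zeta_i), where zeta_i = max(0, 1 - y_i * (W^T.x_i + b)). This standard form is an assumption about what the missing formula showed. The function name, data points, weights and `C` are all hypothetical.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slack), where slack holds zeta_i for each point."""
    margins = y * (X @ w + b)
    # zeta_i is zero for points on or beyond their margin, positive otherwise.
    slack = np.maximum(0.0, 1.0 - margins)
    objective = 0.5 * float(w @ w) + C * float(slack.sum())
    return objective, slack

# Hypothetical example: two well-placed points and one point on the wrong side.
X = np.array([[2.0, 2.0], [6.0, 6.0], [1.5, 1.5]])
y = np.array([-1.0, 1.0, 1.0])
w = np.array([0.25, 0.25])
b = -2.0

objective, slack = soft_margin_objective(w, b, X, y, C=1.0)
print(slack)      # [0.   0.   2.25]
print(objective)  # 2.3125
```

Only the third point contributes a non-zero zeta. Raising C would make that single error weigh more heavily against the margin term.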

Now, to maximize the margin we use a constrained optimization technique called the Lagrange multiplier technique. To find the optimal value of any parameters x and y, we can use the following equation.

Tuning the hyperparameter C is what we use to reduce overfitting.

Image for post
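One consequence of the Lagrange multiplier formulation can be checked numerically: at the optimum, W equals the sum of alpha_i * y_i * x_i over the support vectors, and the alpha_i * y_i sum to zero. The sketch below assumes scikit-learn, where `dual_coef_` stores the products alpha_i * y_i for the support vectors. The data points are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable toy data.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector (shape 1 x n_SV).
w_from_dual = clf.dual_coef_ @ clf.support_vectors_

# The multipliers, weighted by their labels, should cancel out.
multiplier_sum = float(clf.dual_coef_.sum())

print("W from multipliers:", w_from_dual)
print("W from coef_:      ", clf.coef_)
print("sum of alpha_i*y_i:", multiplier_sum)
```

The two versions of W agree, and the sum is zero up to floating-point error.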

SVM tends to be more effective when the number of features is small.
