
Linear Support Vector Machine - A Maximum Margin Classifier

Consider a two-class classifier (for the same problem described at the beginning of the chapter) with oriented hyper-planes. The decision function of the classifier is f(x) = sgn(w·x + b), where w ∈ R^n and b ∈ R. If the training data are linearly separable, then a pair (w, b) can be found such that the constraints in Eqn. gif are satisfied. Notice that there is ambiguity in the magnitude of w and b: they can be arbitrarily rescaled such that |w·x_s + b| = 1, where x_s is the training datum nearest to the decision plane.
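As a minimal sketch of this decision function (the weight vector, bias and test points below are illustrative, not taken from the text):

```python
import numpy as np

def decision(w, b, x):
    """Linear decision function f(x) = sgn(w . x + b)."""
    return np.sign(np.dot(w, x) + b)

# Illustrative (hypothetical) parameters for a separable 2-D problem.
w = np.array([1.0, 1.0])
b = -1.0
print(decision(w, b, np.array([2.0, 2.0])))   # 1.0  (the +1 side)
print(decision(w, b, np.array([0.0, 0.0])))   # -1.0 (the -1 side)
```

The sign of w·x + b tells which side of the oriented hyper-plane x lies on; its magnitude (before taking the sign) is proportional to the distance from the plane.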


Intuitively, the classifier with the largest margin will give the lowest expected risk (Eqn. gif), i.e. the best generalisation. Comparing the two decision planes in Figure gif, the classifier with the smaller margin has the higher expected risk. The margin of this linear classifier is simply 2/||w|| (Figure gif). Hence, to maximise the margin, one needs to minimise ||w|| subject to the constraints in Eqn. gif. In short, training this classifier amounts to solving a linearly constrained optimisation problem.
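The margin 2/||w|| is trivial to evaluate; a small sketch with an illustrative weight vector:

```python
import numpy as np

w = np.array([3.0, 4.0])          # illustrative weight vector, ||w|| = 5
margin = 2.0 / np.linalg.norm(w)  # distance between the +1 and -1 planes
print(margin)                     # 0.4
```

Shrinking ||w|| widens the gap between the planes w·x + b = +1 and w·x + b = -1, which is exactly why minimising ||w|| maximises the margin.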

Figure: Comparing linear classifiers with different margin sizes

Figure: Decision margin for an oriented hyper-plane classifier

This optimisation problem is solved using the Lagrangian formulation. The Lagrangian for minimising ||w|| is shown in Eqn. gif. It needs to be minimised with respect to w and b, while it is simultaneously required that α_i ≥ 0 and that Eqn. gif be satisfied. Applying the primal-dual formulation, this can be achieved by maximising the Lagrangian L_P subject to the constraints that ∂L_P/∂w = 0 and ∂L_P/∂b = 0, while also satisfying Eqn. gif.




From the constraints in the dual formulation, the conditions in Eqns. gif and gif are obtained: w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0. These can be substituted into Eqn. gif to give the dual Lagrangian in Eqn. gif. The training problem is thus transformed into maximising Eqn. gif subject to the constraints in Eqns. gif, gif and gif.
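The resulting dual problem can be handed to any QP solver. The sketch below uses scipy's general-purpose SLSQP routine on an illustrative four-point data set (both the solver choice and the data are assumptions, not from the text): it maximises Σ_i α_i − ½ Σ_ij α_i α_j y_i y_j x_i·x_j by minimising its negation, subject to α_i ≥ 0 and Σ_i α_i y_i = 0.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative linearly separable data (not from the text).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# G_ij = y_i y_j x_i . x_j, the matrix appearing in the dual Lagrangian.
G = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(a):
    # Negated dual: minimising this maximises sum(a) - 0.5 a' G a.
    return 0.5 * a @ G @ a - a.sum()

constraints = ({'type': 'eq', 'fun': lambda a: a @ y},)  # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * len(y)                          # alpha_i >= 0
res = minimize(neg_dual, np.zeros(len(y)), method='SLSQP',
               bounds=bounds, constraints=constraints)
alpha = res.x
print(np.round(alpha, 3))  # nonzero entries mark the support vectors
```

For this toy data set the optimiser should assign nonzero multipliers only to the two points nearest the separating plane, in line with the discussion of support vectors below.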

This is a convex QP problem, since ||w|| is convex (Figure gif shows that the L_2 norm of a vector is convex) and all the constraints are linear. For this particular problem, the Karush-Kuhn-Tucker (KKT) conditions are Eqns. gif, gif and gif, together with the additional condition stated in Eqn. gif. From these KKT conditions, the following conclusions can be drawn.

At this point it is worth considering the significance of the α_i value associated with each training datum. Training data with nonzero α_i values fall on the +1 or -1 plane (see Figure gif), i.e. these are the data that contribute to defining the decision boundary. If all the other data are removed and the classifier is retrained on the remaining data, the training will produce the same decision boundary. These data points are called the Support Vectors (SV). SVs with larger α_i are more important, since they have a stronger influence on the decision boundary.



Solving Eqn. gif gives the values of all the α_i. From Eqn. gif, w is a linear combination of the training data x_i (only those training points with nonzero α_i values contribute). b is found using Eqn. gif, by selecting any training datum with a nonzero α_i value. It is numerically wiser to average b over all such training data.
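A sketch of this recovery step, assuming multipliers α_i have already been obtained for an illustrative two-point problem: w = Σ_i α_i y_i x_i, and b is averaged over all support vectors using y_s(w·x_s + b) = 1, i.e. b = y_s − w·x_s (valid since y_s = ±1).

```python
import numpy as np

def recover_w_b(alpha, X, y, tol=1e-8):
    """w = sum_i alpha_i y_i x_i; b averaged over all support vectors."""
    w = (alpha * y) @ X
    sv = alpha > tol                   # support vectors: nonzero alpha_i
    b = np.mean(y[sv] - X[sv] @ w)     # from y_s (w . x_s + b) = 1
    return w, b

# Illustrative multipliers for a toy problem: x1 = (2,2) labelled +1,
# x2 = (0,0) labelled -1 (both are support vectors here).
alpha = np.array([0.25, 0.25])
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1.0, -1.0])
w, b = recover_w_b(alpha, X, y)
print(w, b)   # [0.5 0.5] -1.0
```

Averaging over all support vectors, rather than using a single one, smooths out the small numerical errors left in the individual α_i by the QP solver.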


K.K. Chin
Thu Sep 10 11:05:30 BST 1998