
Extending SVM to a Soft Margin Classifier

 

The above algorithm can be extended to non-separable data. The correct classification constraints of the separable case are relaxed by adding a slack variable $\xi_i$ to each constraint, which allows some points to be misclassified (see the figure at the end of this section); the relaxed constraints are restated below. The training algorithm must then minimise the cost function that follows them, i.e. a trade-off between maximising the margin and minimising the classification error. The selection of $C$ and $k$ defines the cost of constraint violation. [Cortes and Vapnik, 1995] define a more general error function. The choice of $C$ and its effect will be discussed in detail in a later chapter.
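In the standard soft-margin formulation the relaxed constraints read (using the usual notation: training points $\mathbf{x}_i$ with labels $y_i \in \{-1,+1\}$, weight vector $\mathbf{w}$ and bias $b$):

\begin{displaymath}
  y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \;\ge\; 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \ldots, l .
\end{displaymath}

A point with $0 < \xi_i \le 1$ lies inside the margin but is still correctly classified, while $\xi_i > 1$ corresponds to a misclassification.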

\begin{displaymath}
  \Phi(\mathbf{w}, \boldsymbol{\xi}) \;=\; \frac{1}{2}\,\|\mathbf{w}\|^{2} \;+\; C \left( \sum_{i=1}^{l} \xi_i \right)^{k}
\end{displaymath}

For positive integer $k$ the above optimisation problem is a convex programming problem (for $k=1$ or $k=2$ it is a QP problem). For simplicity, $k$ is usually set to 1. Following the same approach as in the separable case, training again reduces to a dual QP problem of the same form, but each $\alpha_i$ is now bounded above by $C$, as stated in the constraints below. This bound limits the search space of the QP problem, i.e. the possible range of each $\alpha_i$. A larger search space will generally slow down a QP optimiser, but the results presented in a later section show that this is not the case here: the optimiser used in this project (described in a later chapter) is able to home in on the optimal solution regardless of the size of the search space.
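To make the bounded dual concrete, the following is a minimal sketch rather than the optimiser used in this project: it assumes a linear kernel and lets scipy's general-purpose SLSQP routine stand in for a dedicated QP solver; the function name train_soft_margin_svm and the arguments X, y and C are illustrative.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def train_soft_margin_svm(X, y, C=1.0):
    """Maximise sum(a) - 0.5 a'Qa  subject to  0 <= a_i <= C  and  sum(a_i y_i) = 0."""
    n = X.shape[0]
    Q = (y[:, None] * y[None, :]) * (X @ X.T)      # Q_ij = y_i y_j (x_i . x_j)

    def neg_dual(a):                               # minimise the negated dual objective
        return 0.5 * a @ Q @ a - a.sum()

    def neg_dual_grad(a):
        return Q @ a - np.ones(n)

    bounds = [(0.0, C)] * n                        # the box constraint 0 <= alpha_i <= C
    constraints = {'type': 'eq', 'fun': lambda a: a @ y}   # sum_i alpha_i y_i = 0
    result = minimize(neg_dual, np.zeros(n), jac=neg_dual_grad,
                      bounds=bounds, constraints=constraints, method='SLSQP')
    alpha = result.x

    # Recover w and b; points with 0 < alpha_i < C lie exactly on the margin.
    w = (alpha * y) @ X
    on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)
    b = np.mean(y[on_margin] - X[on_margin] @ w) if on_margin.any() else 0.0
    return w, b, alpha
\end{verbatim}

Raising $C$ enlarges the feasible box $[0, C]^{l}$ for the $\alpha_i$, which is exactly the search-space effect discussed above; the equality constraint $\sum_i \alpha_i y_i = 0$ is unchanged.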

It can be proved that, for any misclassified training point $\mathbf{x}_i$, the corresponding $\alpha_i$ must be at the upper bound. This can be understood by imagining that such a point tries to assert a stronger influence on the boundary, by increasing its $\alpha_i$ value, in order to be classified correctly. Once $\alpha_i$ reaches its upper bound the point cannot increase its influence any further, and so it remains misclassified. This picture is consistent with the role of $C$, the upper bound on the $\alpha_i$, as the trade-off between maximum margin and classification error: a larger $C$ imposes a larger penalty on classification errors and allows each $\alpha_i$ to take a larger value, so that each misclassified point can assert a stronger influence on the boundary.
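The same conclusion follows from the standard Karush-Kuhn-Tucker conditions for the soft-margin problem with $k=1$ (stated here in the notation used above, with $\mu_i \ge 0$ the multiplier for the constraint $\xi_i \ge 0$):

\begin{eqnarray*}
  \alpha_i \left[ \, y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \, \right] & = & 0, \\
  \mu_i \, \xi_i & = & 0, \qquad \mu_i = C - \alpha_i .
\end{eqnarray*}

A misclassified point has $\xi_i > 1 > 0$, so the second condition forces $\mu_i = 0$ and hence $\alpha_i = C$.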

\begin{eqnarray*}
  0 \;\le\; \alpha_i \;\le\; C, & & i = 1, \ldots, l, \\
  \sum_{i=1}^{l} \alpha_i y_i & = & 0 .
\end{eqnarray*}

Figure: Handling the non-separable case with slack variables



K.K. Chin
Thu Sep 10 11:05:30 BST 1998