The above algorithm can be extended to non-separable data. The correct classification constraints in Eqn. is revised by adding a slack variable to Eqn. . This will allow some points to be misclassified (Figure ). The training algorithm will need to minimise the cost function in Eqn. , i.e. a trade-off between maximum margin and classification error. The selection of C and k define the cost of constraint violation. [Cortes and Vapnik, 1995] define a more general error function. The choice of C and its effect will be discussed in detail in Chapter .
For positive integers k, the above optimisation problem is a convex programming problem (if k=1 or 2, it is a QP problem). For simplicity, k is usually set to 1. Using a similar approach as in the separable case, the training results in a similar QP problem (Eqn. ) but now the value is bound by C, Eqn. . This bound will limit the search space for the QP problem , i.e. the possible range for the value. A larger search space will generally slow down the QP optimiser. Results in Section (Table ) show that this is not the case. The optimiser used in this project (Chapter ) is able to home in to the optimal solution regardless of the size of the search space.
It can be proved that, for any misclassified training data, , the corresponding must be at the upper bound. This can be understood by imagining that a particular data point is trying to assert a stronger influence on the boundary so that it can be classified correctly, by increasing the value. When the value reaches its maximum bound, it cannot increase its influence further, hence this point will stay misclassified. This analogy is consistent with the fact that C, the upper bound for , is the trade-off between maximum margin and classification error. A higher C value will give a larger penalty for classification error, and this means is allowed to have a larger value; hence, each misclassified data can assert a stronger influence on the boundary.
Figure: Handling non-separable case with slack variables