next up previous contents
Next: Selecting the Kernel function Up: Tuning the Support Vector Previous: Tuning the Support Vector

Trade-off between Maximum Margin and Classification Errors

 

The trade-off between the maximum margin and the classification errors made during training is controlled by the value C in Eqn. gif. The value C is called the Error Penalty. A high error penalty forces the SVM training to avoid classification errors (Section gif gives a brief overview of the significance of the value of C).
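The equation referred to above could not be recovered from the converted source; the standard soft-margin formulation, which matches the role C plays in this discussion, is

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\|w\|^2 \;+\; C\sum_{i=1}^{\ell}\xi_i
\qquad \text{subject to} \qquad
y_i\,(w\cdot x_i + b) \;\ge\; 1-\xi_i, \quad \xi_i \ge 0,
```

where the slack variables $\xi_i$ measure the training errors and C weights their total against the margin term $\frac{1}{2}\|w\|^2$.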

A larger C results in a larger search space for the QP optimiser. This generally increases the duration of the QP search, as the results in Table gif show. Other experiments with larger numbers of data points (1200) failed to converge when C was set higher than 1000. This is mainly due to numerical problems: the cost function of the QP does not decrease monotonically, and a larger search space contributes to these problems.

The number of SVs does not change significantly with different values of C. A smaller C does cause the average number of SVs to increase slightly. This could be due to more support vectors being needed to compensate for the bound on the other support vectors. The norm ||w|| decreases with smaller C. This is as expected, because if errors are allowed, the training algorithm can find a separating plane with a much larger margin. Figures gif, gif, gif and gif show the decision boundaries for two very different error penalties on two classifiers (2-to-rest and 5-to-rest). It is clear that with the higher error penalty, the optimiser gives a boundary that classifies all the training points correctly. This can give very irregular boundaries.
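The effect of C on ||w|| and on the number of points lying on or inside the margin can be illustrated with a toy linear SVM. The sketch below uses a simple subgradient solver for the soft-margin objective rather than the QP optimiser discussed in the text (the data, learning rate and epoch count are illustrative choices, not values from the experiments):

```python
import math

def train_linear_svm(data, C, epochs=500, lr=0.001):
    """Toy subgradient solver for the soft-margin objective
    0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)).
    A stand-in for the QP optimiser; not the method used in the text."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1.0:
                # hinge term active: decay w and step towards the violating point
                w = [(1 - lr) * wi + lr * C * y * xi for wi, xi in zip(w, x)]
                b += lr * C * y
            else:
                # only the regulariser 0.5*||w||^2 contributes
                w = [(1 - lr) * wi for wi in w]
    return w, b

def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))

# Linearly separable toy data; one positive point sits close to the boundary.
data = [((-2.0, 0.0), -1), ((-1.5, 0.5), -1),
        ((2.0, 0.0), 1), ((1.5, -0.5), 1), ((0.3, 0.1), 1)]

results = {}
for C in (0.01, 100.0):
    w, b = train_linear_svm(data, C)
    # points on or inside the margin (functional margin <= 1) play the
    # role of support vectors in this toy setting
    n_sv = sum(1 for x, y in data
               if y * (w[0] * x[0] + w[1] * x[1] + b) <= 1.0 + 1e-6)
    results[C] = (norm(w), n_sv)
    print(f"C={C}: ||w|| = {norm(w):.3f}, points on/inside margin = {n_sv}")
```

With the small C the regulariser dominates, so ||w|| stays small (a wide margin that tolerates errors) and every point lies inside the margin; with the large C the solver drives the margin violations to zero at the cost of a much larger ||w||, mirroring the behaviour reported above.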

One can easily conclude that the more regular boundaries (Figures gif and gif) will give better generalisation. This conclusion is also supported by the value of ||w||, which is lower for these two classifiers, i.e. they have a larger margin. One can also use the expected error bound to predict the best error penalty setting. First the expected error bound is computed using Eqns. gif and gif. This is shown in Figure gif. It predicts that the best settings are C=10 and C=100. The accuracy obtained from the testing data (Figure gif) agrees with this prediction.
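The exact bound equations were lost in conversion; a standard bound of this kind, which may be the one intended, is Vapnik's leave-one-out bound

```latex
E\!\left[P(\text{error})\right] \;\le\; \frac{E\!\left[\#\text{SV}\right]}{\ell},
```

where $\ell$ is the number of training examples, so that a setting of C yielding few support vectors suggests better expected generalisation.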

   table1242
Table: The results for different error penalty settings

   figure1251
Figure: The decision boundary for class 2 (C=10)

   figure1261
Figure: The decision boundary for class 2 (C=100000)

   figure1271
Figure: The decision boundary for class 5 (C=10)

   figure1281
Figure: The decision boundary for class 5 (C=100000)

   figure1291
Figure: The bound on expected error with different error penalties

   figure1301
Figure: The accuracy with different error penalty settings



K.K. Chin
Thu Sep 10 11:05:30 BST 1998