
Nonlinear Support Vector Machine

So far the SVM classifier can only have a linear hyperplane as its decision surface. This formulation can be further extended to build a nonlinear SVM. The motivation for this extension is that an SVM with a nonlinear decision surface can classify data that are not linearly separable.

Consider a mapping $\Phi$, defined in the equation below, which maps the training data from the input space $\mathbb{R}^d$ to some higher-dimensional Euclidean space $\mathcal{H}$, possibly of infinite dimension. In this high-dimensional space the data may become linearly separable, so the linear SVM formulation above can be applied. In that formulation the training data appear only in the form of dot products $x_i \cdot x_j$, e.g. in the dual optimisation problem above; the same is true of the decision function $f(x)$, which depends on the data only through $x_i \cdot x$. These dot products can therefore be replaced by dot products in $\mathcal{H}$, i.e. by $\Phi(x_i) \cdot \Phi(x_j)$.

$$ \Phi : \mathbb{R}^d \rightarrow \mathcal{H} $$
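To make the mapping concrete, consider the classic XOR problem. The sketch below (my own illustration in Python, not from the text; the map phi and the separating normal w are hypothetical choices) shows data that are not linearly separable in $\mathbb{R}^2$ becoming separable by a plane after an explicit degree-2 map into $\mathbb{R}^3$:

import math

def phi(x):
    # Explicit degree-2 feature map R^2 -> R^3:
    # (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# XOR-style data: not linearly separable in R^2 ...
data = [((1, 1), +1), ((-1, -1), +1), ((1, -1), -1), ((-1, 1), -1)]

# ... but in R^3 the plane with normal w = (0, 1, 0) separates it,
# since the middle feature sqrt(2)*x1*x2 carries the class's sign.
w = (0.0, 1.0, 0.0)
for x, y in data:
    assert (1 if dot(w, phi(x)) > 0 else -1) == y
print('XOR data are linearly separable after the mapping phi')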

The dot product in the high-dimensional space can in turn be replaced by a kernel function, defined in the first equation below. By computing the dot product directly through a kernel function, one avoids the mapping $\Phi$. This is desirable because $\mathcal{H}$ possibly has infinite dimensions and $\Phi$ can be awkward or impossible to compute explicitly; using a kernel function, one need not know what $\Phi$ is at all. In this way an SVM that operates in an infinite-dimensional space can be constructed. It also means that $w$ cannot be precomputed: the decision function takes the form of the second equation below, and for every new test point the kernel function must be re-evaluated against each support vector.

$$ K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) $$

$$ f(x) = \mathrm{sign}\left( \sum_{i=1}^{N_s} \alpha_i y_i K(s_i, x) + b \right) $$

where the sum runs over the $N_s$ support vectors $s_i$.
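As a minimal sketch of how this decision function is evaluated (my own illustration; the support vectors, multipliers alpha_i and bias b below are hypothetical values, not the result of actual training):

def poly_kernel(x, z, p=2):
    # Polynomial kernel K(x, z) = (x.z + 1)^p, computed without
    # ever forming the feature vectors Phi(x) and Phi(z).
    return (sum(a * b for a, b in zip(x, z)) + 1) ** p

def svm_decision(x, support_vectors, alphas, labels, b, kernel=poly_kernel):
    # f(x) = sign( sum_i alpha_i * y_i * K(s_i, x) + b ).
    # The kernel is re-evaluated against every support vector for each
    # new test point, since w is never formed explicitly.
    s = sum(a * y * kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b > 0 else -1

# Hypothetical trained parameters, purely for illustration.
svs, alphas, labels, b = [(1.0, 1.0), (-1.0, 1.0)], [0.5, 0.5], [+1, -1], 0.0
print(svm_decision((2.0, 1.5), svs, alphas, labels, b))   # -> 1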

The question to ask next is ``what kernel functions can be used?'', i.e. is there any constraint on the type of kernel function suitable for this task? For a kernel function to be suitable for an SVM, there must exist at least one pair $\{\Phi, \mathcal{H}\}$ such that the kernel computes the dot product of the mapped data in $\mathcal{H}$, as in the definition above. A kernel with this property satisfies Mercer's condition [Vapnik, 1995]: for any $g(x)$ with finite $L_2$ norm (first equation below), the second equation below must hold. Any positive definite kernel satisfies this condition, and it can be proved that the kernel functions in the table below satisfy it.

$$ \int g(x)^2 \, dx < \infty $$

$$ \iint K(x, y)\, g(x)\, g(y)\, dx\, dy \geq 0 $$
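Mercer's condition itself quantifies over all square-integrable $g$, but its finite-sample consequence is easy to check numerically: the Gram matrix $K_{ij} = K(x_i, x_j)$ of a valid kernel on any set of points must be positive semi-definite. A small sketch of such a check (my own illustration, using the Gaussian kernel and NumPy):

import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # Gaussian RBF kernel, a standard positive definite kernel.
    d = x - z
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

# Gram matrix on 20 arbitrary points in R^3; for a kernel satisfying
# Mercer's condition, all eigenvalues are >= 0 (up to numerical error).
points = np.random.default_rng(0).normal(size=(20, 3))
gram = np.array([[gaussian_kernel(a, b) for b in points] for a in points])
eigenvalues = np.linalg.eigvalsh(gram)
assert eigenvalues.min() > -1e-10
print('smallest eigenvalue:', eigenvalues.min())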

It is interesting to note that by choosing different kernel functions, the SVM can emulate some well-known classifiers [Osuna et al., 1997b], as shown in the table below.

Table: Some possible kernel functions and the types of decision surface they define

  Kernel function                                      Type of classifier
  $K(x, y) = (x \cdot y + 1)^p$                        Polynomial of degree $p$
  $K(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$         Gaussian radial basis function
  $K(x, y) = \tanh(\kappa (x \cdot y) - \delta)$       Two-layer perceptron
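For reference, the kernels in the table can be written directly as code (a sketch with illustrative default parameters p, sigma, kappa and delta; note that the sigmoid kernel satisfies Mercer's condition only for some choices of kappa and delta):

import numpy as np

def polynomial(x, z, p=2):
    # K(x, z) = (x.z + 1)^p  -- polynomial decision surface of degree p.
    return (np.dot(x, z) + 1.0) ** p

def gaussian_rbf(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) -- Gaussian RBF classifier.
    d = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def sigmoid(x, z, kappa=1.0, delta=1.0):
    # K(x, z) = tanh(kappa * (x.z) - delta) -- two-layer perceptron.
    return np.tanh(kappa * np.dot(x, z) - delta)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for k in (polynomial, gaussian_rbf, sigmoid):
    print(k.__name__, k(x, z))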


