So far the SVM classifier can only have a linear hyperplane as its decision surface. The formulation can be extended to build a nonlinear SVM, the motivation being that an SVM with a nonlinear decision surface can classify data that are not linearly separable.

Consider a mapping Φ which maps the training data from the input space to some higher-dimensional Euclidean space ℋ, possibly of infinite dimension. In this high-dimensional space the data may become linearly separable, so the linear SVM formulation above can be applied to the mapped data. In that formulation the training data appear only in the form of dot products x_i · x_j, both in the dual objective and in the decision function. These dot products can therefore be replaced by dot products in ℋ, i.e. Φ(x_i) · Φ(x_j).
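The role of the mapping can be illustrated with a small numerical check. This sketch assumes the explicit degree-2 polynomial map φ(x) = (x1², √2·x1·x2, x2²) from 2 to 3 dimensions, for which the mapped dot product reduces to (x · y)² computed entirely in the input space:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial map from R^2 to R^3:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

# Dot product in the mapped space equals (x . y)^2 in the original
# space -- the data only ever enter the SVM through dot products.
lhs = phi(x) @ phi(y)
rhs = (x @ y) ** 2
```

Here the mapped space is still small enough to construct explicitly; the point of the kernel trick below is that this is not required.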

The dot product in the high-dimensional space can in turn be replaced by a kernel function K(x_i, x_j) = Φ(x_i) · Φ(x_j). By computing the dot product directly through the kernel, one avoids the mapping Φ altogether. This is desirable because ℋ possibly has infinite dimensions, so Φ can be awkward or impossible to compute explicitly; with a kernel function one need not even know what Φ is. In this way an SVM that operates in an infinite-dimensional space can be constructed. It also means that *w* cannot be precomputed, and the decision function becomes f(x) = sign(Σ_i α_i y_i K(x_i, x) + b), where the sum runs over the support vectors; for every new test point the kernel function must be re-evaluated against each support vector.
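A minimal sketch of such a kernel decision function, assuming an RBF kernel and hand-picked (not trained) support vectors, multipliers, and bias purely for illustration:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # K(a, b) = exp(-gamma * ||a - b||^2); stands in for the dot
    # product Phi(a) . Phi(b) in an infinite-dimensional space.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(x, svs, alphas, labels, bias, kernel=rbf_kernel):
    # f(x) = sign( sum_i alpha_i * y_i * K(s_i, x) + b ).
    # w is never formed explicitly: the kernel is re-evaluated
    # against every support vector for each new test point.
    s = sum(a * y * kernel(s_i, x) for a, y, s_i in zip(alphas, labels, svs))
    return np.sign(s + bias)

# Illustrative (hand-picked) support set, one per class:
svs    = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
alphas = [1.0, 1.0]
labels = [+1, -1]
bias   = 0.0
```

A test point near the positive support vector is classified +1, one near the negative support vector is classified -1; in a real SVM the α_i and b would come from solving the dual problem.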

The question to ask next is ``what kernel functions can be used?'', i.e. is there any constraint on the type of kernel function suitable for this task? For a kernel to be suitable for an SVM there must exist at least one pair (Φ, ℋ) such that K(x_i, x_j) = Φ(x_i) · Φ(x_j), i.e. the kernel function represents the dot product of the data in ℋ. A kernel with this property satisfies Mercer's condition [Vapnik, 1995]: for any *g*(*x*) with finite norm, ∫ g(x)² dx < ∞, it must hold that ∫∫ K(x, y) g(x) g(y) dx dy ≥ 0. Any positive definite kernel satisfies this condition, and it can be proved that the kernel functions in the table below satisfy it.
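Mercer's condition is stated as an integral inequality, but a practical finite-sample consequence is easy to check: the Gram matrix K(x_i, x_j) of a positive definite kernel is symmetric positive semidefinite. A sketch of this check using the RBF kernel on random points (gamma is an assumed parameter name):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # 20 sample points in R^3

gamma = 0.5
# Gram matrix G[i, j] = K(x_i, x_j) for the RBF kernel.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
G = np.exp(-gamma * sq_dists)

# A Mercer kernel yields a symmetric positive semidefinite Gram
# matrix: all eigenvalues are >= 0 (up to floating-point noise).
eigvals = np.linalg.eigvalsh(G)
```

If a candidate kernel produced a Gram matrix with a clearly negative eigenvalue on some data set, it could not be a Mercer kernel.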

It is interesting to note that by choosing different kernel functions, the SVM can emulate some well-known classifiers [Osuna et al., 1997b], as shown in the table below.

**Table:** Some possible kernel functions and the types of decision
surface they define

| Kernel function | Type of classifier |
| --- | --- |
| K(x, y) = x · y | Linear SVM |
| K(x, y) = (x · y + 1)^d | Polynomial classifier of degree d |
| K(x, y) = exp(−‖x − y‖² / 2σ²) | Radial basis function classifier |
| K(x, y) = tanh(κ(x · y) − θ) | Two-layer neural network |
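The standard kernels (linear, polynomial, RBF, sigmoid) can be written directly as functions; this sketch assumes the parameter names d, gamma, kappa, and theta for the polynomial degree, RBF width, and sigmoid parameters:

```python
import numpy as np

# Kernels corresponding to the classifiers in the table; the
# default parameter values are illustrative, not canonical.
def linear(x, y):
    return x @ y                                  # linear SVM

def polynomial(x, y, d=2):
    return (x @ y + 1) ** d                       # polynomial classifier

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))  # RBF classifier

def sigmoid(x, y, kappa=1.0, theta=1.0):
    return np.tanh(kappa * (x @ y) - theta)       # two-layer neural network

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
```

Swapping one of these functions into the decision function above changes the shape of the decision surface without altering anything else in the SVM machinery.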
