So far the SVM classifier can only have a linear hyper-plane as its decision surface. This formulation can be further extended to build a nonlinear SVM. The motivation for this extension is that a SVM with nonlinear decision surface can classify nonlinearly separable data.
Consider a mapping , Eqn. , which maps the training data from to some higher Euclidean space , which possibly has infinite dimensions. In this high dimension space, the data can be linearly separable, hence the linear SVM formulation above can be applied to these data. In the SVM formulation, the training data only appear in the form of dot products, i.e. in Eqn. , the same is true in the decision function ( ). These can be replace by dot products in the Euclidean space , i.e. .
The dot product in the high dimension space can also be replace by a kernel function, Eqn. . By computing the dot product directly using a kernel function, one avoid the mapping . This is desirable because has possibly infinite dimensions and can be tricky or impossible to compute. Using a kernel function, one need not explicitly know what is. By using a kernel function, a SVM that operates in infinite dimensional space can be constructed. This also means that w cannot be precomputed, and that the decision function will be replaced by Eqn. , where for every new test data, the kernel function for each SV need to be recomputed.
The question to ask next is ``what kernel function can be used ?'' i.e. is there any constraint on the type of kernel function suitable for this task. For any kernel function suitable for SVM, there must exist at least one pair of , such that Eqn. is true, i.e. the kernel function represents the dot product of the data in . The kernel that has these properties satisfies the Mercer's condition [Vapnik, 1995], i.e. for any g(x) with finite norm (Eqn. ), Eqn. must hold. Any positive definite kernel satisfies this condition. It can be proved that the kernel functions in Table satisfy this condition.
It is interesting to note that by choosing different kernel functions, the SVM can emulate some well known classifiers [Osuna et al., 1997b], as shown in table .
Table: Some possible kernel functions and the types of decision surface they define