Polynomial kernel

[Figure] Illustration of the mapping φ. On the left, a set of samples in the input space; on the right, the same samples in the feature space, where the polynomial kernel K(x, y) (for some values of the parameters c and d) is the inner product. The hyperplane learned in feature space by an SVM is an ellipse in the input space.

In machine learning, the polynomial kernel is a kernel function commonly used with support vector machines (SVMs) and other kernelized models. It represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing the learning of non-linear models.

Intuitively, the polynomial kernel looks not only at the given features of input samples to determine their similarity, but also combinations of these.

Definition

For degree-d polynomials, the polynomial kernel is defined as[1]

    K(x, y) = (x^T y + c)^d

where x and y are vectors in the input space, i.e. vectors of features computed from training or test samples, and c ≥ 0 is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. When c = 0, the kernel is called homogeneous.[2] (A further generalized polykernel divides x^T y by a user-specified scalar parameter a.[3])
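The definition above can be sketched directly in NumPy (a minimal illustration; the function and variable names are my own):

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x . y + c)^d.

    c >= 0 trades off higher- versus lower-order terms;
    c = 0 gives the homogeneous kernel.
    """
    return (np.dot(x, y) + c) ** d

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
# x . y = 1*3 + 2*0.5 = 4, so (4 + 1)^2 = 25
print(poly_kernel(x, y, c=1.0, d=2))  # 25.0
```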

As a kernel, K corresponds to an inner product in a feature space based on some mapping φ:

    K(x, y) = \langle \varphi(x), \varphi(y) \rangle

The nature of φ can be seen from an example. Let d = 2, so we get the special case of the quadratic kernel. Then, for x, y ∈ R^n,

    K(x, y) = \left( \sum_{i=1}^n x_i y_i + c \right)^2
            = \sum_{i=1}^n (x_i^2)(y_i^2) + \sum_{i=2}^n \sum_{j=1}^{i-1} (\sqrt{2} x_i x_j)(\sqrt{2} y_i y_j) + \sum_{i=1}^n (\sqrt{2c}\, x_i)(\sqrt{2c}\, y_i) + c^2

From this it follows that the feature map is given by:

    \varphi(x) = \langle x_n^2, \ldots, x_1^2, \sqrt{2} x_n x_{n-1}, \ldots, \sqrt{2} x_n x_1, \sqrt{2} x_{n-1} x_{n-2}, \ldots, \sqrt{2} x_2 x_1, \sqrt{2c}\, x_n, \ldots, \sqrt{2c}\, x_1, c \rangle
When the input features are binary-valued (booleans), and the lower-order terms (those involving \sqrt{2c} and the constant c) are ignored, the mapped features correspond to logical conjunctions of input features.[4]

Practical use

Although the RBF kernel is more popular in SVM classification than the polynomial kernel, the latter is quite popular in natural language processing (NLP).[4][5] The most common degree is d=2, since larger degrees tend to overfit on NLP problems.
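As a sketch of why a degree-2 kernel helps, the classic XOR problem is not linearly separable, yet a simple kernelized learner solves it with a quadratic kernel. The kernel perceptron below stands in for an SVM solver to keep the example dependency-free; all names are illustrative:

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    # Degree-d polynomial kernel (x . y + c)^d.
    return (np.dot(x, y) + c) ** d

# XOR-style data: no linear model separates these labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

def decision(alpha, x):
    # Dual-form decision function: a weighted sum of kernel
    # evaluations against the training samples.
    s = sum(alpha[j] * y[j] * poly_kernel(X[j], x) for j in range(len(X)))
    return 1 if s > 0 else -1

# Kernel perceptron training: bump the dual weight of any
# misclassified sample.
alpha = np.zeros(len(X))
for _ in range(20):
    for i in range(len(X)):
        if decision(alpha, X[i]) != y[i]:
            alpha[i] += 1.0

print([decision(alpha, x) for x in X])  # [-1, 1, 1, -1]
```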

Various ways of computing the polynomial kernel (both exact and approximate) have been devised as alternatives to the usual non-linear SVM training algorithms, including:

  - full expansion of the kernel prior to training/testing with a linear SVM,[5] i.e. full computation of the mapping φ as in polynomial regression;
  - basket mining (using a variant of the apriori algorithm) for the most commonly occurring feature conjunctions in a training set to produce an approximate expansion;[6]
  - inverted indexing of support vectors.[6]

One problem with the polynomial kernel is that it may suffer from numerical instability: when x^T y + c < 1, K(x, y) = (x^T y + c)^d tends to zero as d increases, whereas when x^T y + c > 1, K(x, y) tends to infinity.[3]
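The instability is easy to observe numerically (illustrative values of my own choosing):

```python
import numpy as np

x = np.array([0.3, 0.4])
y = np.array([0.3, 0.4])

# x . y = 0.25 < 1: the homogeneous kernel (c = 0) decays
# toward zero as the degree d grows.
print([float(np.dot(x, y) ** d) for d in (2, 20, 200)])

# Rescaling the inputs so that x . y = 25 > 1 makes the
# kernel value explode with d instead.
x10, y10 = 10 * x, 10 * y
print([float(np.dot(x10, y10) ** d) for d in (2, 20, 200)])
```

This is one reason inputs are commonly normalized (e.g. to unit norm) before applying the polynomial kernel.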

References

  1. http://www.cs.tufts.edu/~roni/Teaching/CLT/LN/lecture18.pdf
  2. Template:Cite arXiv
  3. Chih-Jen Lin (2012). Machine learning software: design and practical use. Talk at Machine Learning Summer School, Kyoto.
  4. Yoav Goldberg and Michael Elhadad (2008). splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL-08: HLT.
  5. Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard and Chih-Jen Lin (2010). Training and testing low-degree polynomial data mappings via linear SVM. J. Machine Learning Research 11:1471–1490.
  6. T. Kudo and Y. Matsumoto (2003). Fast methods for kernel-based text analysis. Proc. ACL 2003.