Volume 3 Supplement 1

4th German Conference on Chemoinformatics: 22. CIC-Workshop

Open Access

A new approach to kernel based data analysis algorithms

HY Mussa (1) and RC Glen (1)
Chemistry Central Journal 2009, 3(Suppl 1):O6

https://doi.org/10.1186/1752-153X-3-S1-O6

Published: 05 June 2009

Kernel based methods (KBMs) [1, 2] are arguably among the best data analysis techniques currently available [3, 4]. Unlike neural networks, whose error surfaces typically contain several local minima in addition to the global minimum, a kernel-based fitting/classification problem is a convex optimization problem with a single minimum. However, finding this minimum (and thereby obtaining the optimal parameters of a given observational model) in practice requires the manipulation, such as inversion, of large matrices. This is challenging even when the number of data points is only a few thousand [5, 6].
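To make the role of the large matrix concrete, the sketch below shows one common kernel-based fit, kernel ridge regression with a Gaussian kernel. This is only an illustrative example under our own assumptions (a standard formulation, not the training scheme proposed in this abstract): the convex problem reduces to a single dense N × N linear system, and it is this system that direct methods must store and factorise.

import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    # RBF kernel K_ij = exp(-gamma * ||x_i - y_j||^2) from pairwise squared distances.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=1e-3, gamma=1.0):
    # The dense N x N kernel matrix is the object whose storage and
    # inversion dominate the cost for large N.
    K = gaussian_kernel(X, X, gamma)
    N = K.shape[0]
    # (K + lam*I) alpha = y is a convex problem with a unique solution,
    # but a direct dense solve costs O(N^3) time and O(N^2) memory.
    alpha = np.linalg.solve(K + lam * np.eye(N), y)
    return alpha

def kernel_ridge_predict(X_train, X_new, alpha, gamma=1.0):
    # Prediction for new points is a weighted sum of kernel evaluations.
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# Toy usage on synthetic (hypothetical) data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)
alpha = kernel_ridge_fit(X, y)
preds = kernel_ridge_predict(X, X[:10], alpha)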

The well-established direct methods for updating or inverting huge matrices become impractical, owing to their core-memory and CPU-time demands, even for moderately sized systems. The root of the problem is that direct methods require O(N^2) core-memory storage and O(N^3) CPU time, where N is the dimension of the matrix (here, the number of data points). Despite advances in computer power, "conventional" computers can therefore only solve relatively small problems (N ≈ 10^4 to 10^5).
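As a rough back-of-the-envelope illustration (our own arithmetic, not figures from the abstract), a dense double-precision kernel matrix for N = 10^5 points already requires about

\[
N^2 \times 8\ \text{bytes} = 10^{10} \times 8\ \text{bytes} \approx 80\ \text{GB}
\]

of storage, while a direct factorisation needs on the order of N^3 = 10^15 floating-point operations, which is consistent with the practical limit of N ≈ 10^4 to 10^5 quoted above.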

Another outstanding drawback of KBMs is the difficulty of choosing an appropriate kernel function for a given data set [4].

In this paper we propose a computationally efficient training scheme for KBMs that obtains the global minimum. We also present a systematic approach to selecting appropriate kernel functions. Preliminary results on chemical data sets are presented.

Authors’ Affiliations

(1)
Unilever Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road

References

  1. Nadaraya EA: Theory Prob Appl. 1964, 10: 186. doi:10.1137/1110024.
  2. Watson GS: Sankhya Ser A. 1964, 26: 359.
  3. Vapnik V: The Nature of Statistical Learning Theory. 1995, Springer-Verlag, New York.
  4. Shawe-Taylor J, Cristianini N: Kernel Methods for Pattern Analysis. 2004, Cambridge University Press.
  5. Chua KS: Pattern Recognition Letters. 2003, 24: 75. doi:10.1016/S0167-8655(02)00190-3.
  6. Mangasarian OL, Musicant DR: J Mach Learn Res. 2001, 1: 161. doi:10.1162/15324430152748218.

Copyright

© Mussa and Glen; licensee BioMed Central Ltd. 2009

This article is published under license to BioMed Central Ltd.