Beyond descriptor vectors: QSAR modelling using structural similarity
© Zell et al. 2008
Published: 26 March 2008
Kernel-based machine learning methods such as support vector machines and Gaussian processes have gained increasing attention for QSAR modelling in recent years. One of the most interesting aspects of these methods is the analogy between a kernel and a similarity measure: any similarity measure that fulfils the kernel properties (symmetry and positive semi-definiteness) can be used as a kernel. However, although structural and topological information can be incorporated directly into the similarity score, as state-of-the-art methods such as feature trees [1] do, most studies that use kernel methods are limited to classical descriptor representations [2].
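Since a similarity measure qualifies as a kernel when its similarity matrix is symmetric and positive semi-definite, both properties can be checked numerically on a set of molecules. The minimal sketch below uses the Tanimoto coefficient on binary fingerprints (represented as hypothetical sets of on-bit indices) purely as a familiar example; the abstract does not state which similarity its fingerprint kernels use. Positive semi-definiteness is approximated by attempting a Cholesky factorisation of the matrix plus a small diagonal `jitter`. The fingerprints, the `jitter` value and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def gram_matrix(fps: Sequence[Set[int]]) -> List[List[float]]:
    """Kernel (Gram) matrix with K[i][j] = tanimoto(fps[i], fps[j])."""
    return [[tanimoto(x, y) for y in fps] for x in fps]


def is_symmetric(K: List[List[float]], tol: float = 1e-12) -> bool:
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol for i in range(n) for j in range(n))


def is_positive_definite(K: List[List[float]], jitter: float = 1e-10) -> bool:
    """Try a Cholesky factorisation of K + jitter * I.

    Success means K + jitter * I is positive definite, a practical numerical proxy
    for the positive semi-definiteness a valid kernel matrix needs.
    """
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                s += jitter
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Hypothetical fingerprints (sets of on-bit indices) for four molecules.
fps = [{1, 2, 3, 8}, {2, 3, 5}, {1, 8, 13, 21}, {5, 13}]
K = gram_matrix(fps)
print(is_symmetric(K), is_positive_definite(K))
```

A similarity that fails this check on some data set cannot be a valid kernel, whereas passing it on one data set is only evidence, not a proof, that the measure is a kernel in general.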
In this work we introduce several structural kernels that overcome these limitations and compare them with QSAR models based on numerical encodings. The structural kernels capture the molecular topology either through atom environments (assignment kernels [3]) or through adapted fingerprint representations. Additionally, we compare the structural kernels with numerical kernels computed on a descriptor representation comprising 200 molecular descriptors.
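The idea behind an assignment kernel based on atom environments can be illustrated as follows: two molecules are scored by the best one-to-one matching of their atoms under an atom-level similarity, and the score is normalised by the self-similarities. The sketch below is a toy version under explicit assumptions: each atom is described only by its element and the elements of its neighbours, `atom_similarity` is a made-up stand-in rather than the similarity used in this work, and the matching is found by brute force, so it only suits molecules with a handful of atoms. All names are hypothetical.

```python
import itertools
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical atom representation: (element, neighbour elements).
Atom = Tuple[str, Tuple[str, ...]]


def atom_similarity(a: Atom, b: Atom) -> float:
    """Toy atom-environment similarity (an assumption, not the published one):
    0 for different elements, otherwise 0.5 + 0.5 * neighbour-multiset overlap."""
    if a[0] != b[0]:
        return 0.0
    if not a[1] and not b[1]:
        return 1.0
    shared = sum((Counter(a[1]) & Counter(b[1])).values())
    return 0.5 + 0.5 * shared / max(len(a[1]), len(b[1]))


def optimal_assignment(m1: List[Atom], m2: List[Atom]) -> float:
    """Best total similarity over one-to-one assignments of the atoms of the
    smaller molecule to atoms of the larger one (brute force: tiny inputs only)."""
    small, large = (m1, m2) if len(m1) <= len(m2) else (m2, m1)
    best = 0.0
    for perm in itertools.permutations(range(len(large)), len(small)):
        score = sum(atom_similarity(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, score)
    return best


def oa_kernel(m1: List[Atom], m2: List[Atom]) -> float:
    """Normalised score OA(A, B) / sqrt(OA(A, A) * OA(B, B))."""
    return optimal_assignment(m1, m2) / math.sqrt(
        optimal_assignment(m1, m1) * optimal_assignment(m2, m2)
    )


# Heavy-atom graphs of ethanol (C-C-O) and methanol (C-O).
ethanol = [("C", ("C",)), ("C", ("C", "O")), ("O", ("C",))]
methanol = [("C", ("O",)), ("O", ("C",))]
print(oa_kernel(methanol, ethanol))
```

For realistic molecules the brute-force search would be replaced by a polynomial-time assignment solver such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment` with `maximize=True`). Note also that an optimal assignment score is not guaranteed to be positive semi-definite in general, so its Gram matrix is worth checking before it is used as a kernel.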
Both the structural and the numerical kernels were used to train support vector machine and Gaussian process models on more than ten different QSAR datasets taken from BindingDB [4].
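Both model types access the molecules only through the kernel, so a structural kernel and a numerical descriptor kernel are interchangeable at training time. The minimal sketch below, in plain Python with made-up toy data, computes the Gaussian process posterior mean mu(x*) = k(x*, X) (K + noise * I)^-1 y for an arbitrary kernel function; `rbf` stands in for a numerical kernel on descriptor vectors and `tanimoto` for a structural fingerprint kernel. The function names, `gamma`, and the noise variance `noise` are illustrative assumptions, not the settings used in this study, and descriptor vectors would normally be scaled before an RBF kernel is applied.

```python
import math
from typing import Callable, List, Sequence, Set


def rbf(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel on numerical descriptor vectors; gamma is an assumed default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto kernel on fingerprints given as non-empty sets of on-bit indices."""
    return len(a & b) / len(a | b)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(train, y, test, kernel: Callable, noise: float = 0.1) -> List[float]:
    """GP regression mean mu(x*) = k(x*, X) (K + noise * I)^-1 y.

    `kernel` can be any valid kernel, numerical or structural;
    `noise` is the assumed observation-noise variance.
    """
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train)]
         for i, a in enumerate(train)]
    alpha = cho_solve(cholesky(K), y)
    return [sum(kernel(x, a) * w for a, w in zip(train, alpha)) for x in test]


# Toy data: one-dimensional "descriptor" vectors and made-up activities.
X = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 0.5]
print(gp_posterior_mean(X, y, [[1.5]], rbf))

# The same function with a structural kernel on hypothetical fingerprints.
fps = [{1, 2}, {2, 3}, {3, 4}]
print(gp_posterior_mean(fps, y, [{2, 4}], tanimoto))
```

A support vector machine follows the same pattern: scikit-learn's `SVR` and `SVC` accept `kernel="precomputed"`, in which case `fit` takes the n-by-n Gram matrix of the training molecules instead of feature vectors.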
The results clearly indicate that the structural kernels are generally superior to the numerical encodings, although no single kernel dominates. When no further information is available, it seems advisable to use the optimal assignment kernel, given its good overall performance.
- [1] Rarey M, Dixon JS: J Comp Aid Mol Design. 1998, 12: 471-490.
- [2] Schwaighofer A, et al: J Chem Inf Mod. 2007, 401-424.
- [3] Fröhlich H, Wegner JK, Zell A: Proc Int Joint Conf Neur Net (IJCNN). 2005, 913-918.
- [4] Chen X, Lin Y, Liu M, Gilson M: Bioinformatics. 2002, 18: 130-139.