Exploring benchmark dataset bias in ligand-based virtual screening
© Baumann and Rohrer 2008
Published: 26 March 2008
A common finding of reports evaluating virtual screening (VS) methods is that validation results vary considerably from dataset to dataset, i.e. with the region of chemical space occupied by the active ligands. These dataset-specific effects are assumed to be caused by the self-similarity and cluster structure inherent to the datasets.
As a first step, an experimental setup was developed that isolates dataset composition as the sole source of variance in VS performance. Various sampling strategies (D-optimum design, onion design, minimum distance design) were applied to the Hert-Willett benchmark datasets, which have been widely used to validate ligand-based VS protocols, to generate archetypal subsamples: (1) maximum diversity subsets, (2) space-filling samples and (3) subsets with minimum intra-set diversity. Analysis of VS performance on these prototype datasets showed that dataset composition does indeed exert a critical influence on VS validation. It also identified the local clustering of a dataset and its global spread relative to the set of decoys as the factors with the highest impact on VS performance.
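To make the idea of a maximum diversity subset concrete, the following minimal sketch uses the generic greedy MaxMin heuristic. It is not the D-optimum, onion or minimum distance design named above. It assumes fingerprints are given as sets of on-bit indices and that dissimilarity is measured as 1 minus the Tanimoto coefficient. The function names and the toy fingerprints are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def maxmin_subset(fps: List[Set[int]], k: int, seed_index: int = 0) -> List[int]:
    """Greedily pick k indices; each new pick maximises its distance
    to the nearest compound already selected (MaxMin heuristic)."""
    if not 0 < k <= len(fps):
        raise ValueError("k must be between 1 and len(fps)")
    selected = [seed_index]
    # Distance from every compound to its nearest selected compound so far.
    nearest = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]
    while len(selected) < k:
        candidate = max(
            (i for i in range(len(fps)) if i not in selected),
            key=lambda i: nearest[i],
        )
        selected.append(candidate)
        # Update nearest-selected distances with the newly added compound.
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_distance(fp, fps[candidate]))
    return selected


# Hypothetical toy data: two near-duplicates and two dissimilar compounds.
toy_fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {20, 21}]
diverse = maxmin_subset(toy_fps, k=3)
```

On this toy input the near-duplicate of the seed compound (index 1) is left out, which is the behaviour one expects from a diversity-driven selection. A subset with minimum intra-set diversity could be sketched analogously by picking the closest rather than the farthest candidate at each step.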
Keeping the concept of chemical space in mind, it is natural to draw on the field of spatial statistics, which offers a wealth of methods for analysing the clustering, patchiness and dispersion of datasets. Using these methods, we were able to analyse the spatial composition of the benchmark datasets in more detail and to derive several rules of thumb for choosing unbiased datasets for evaluating ligand-based VS methods.
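As an illustration of the kind of tool spatial statistics provides, the sketch below computes two empirical nearest-neighbour distributions. One is a G-type function: the fraction of actives whose nearest other active lies within distance t. The other is an F-type function: the fraction of decoys whose nearest active lies within t. Which statistics were actually used is not stated here, so this choice is an assumption. Using decoys as reference points, the Euclidean metric and the toy coordinates are likewise assumptions made only for the example.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclid(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nn_distances_within(points: Sequence[Point], dist: Metric) -> List[float]:
    """Distance from each point to its nearest neighbour in the same set."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    return [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def nn_distances_between(
    refs: Sequence[Point], points: Sequence[Point], dist: Metric
) -> List[float]:
    """Distance from each reference point to its nearest point in another set."""
    if not refs or not points:
        raise ValueError("both sets must be non-empty")
    return [min(dist(r, p) for p in points) for r in refs]


def ecdf(values: Sequence[float], t: float) -> float:
    """Empirical cumulative distribution: fraction of values <= t."""
    return sum(v <= t for v in values) / len(values)


# Hypothetical 2-D "chemical space": three tightly clustered actives, one outlier.
actives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
decoys = [(1.0, 1.0), (3.0, 3.0), (4.0, 0.0), (0.0, 4.0)]

g_at_t = ecdf(nn_distances_within(actives, euclid), t=0.5)           # actives to actives
f_at_t = ecdf(nn_distances_between(decoys, actives, euclid), t=0.5)  # decoys to actives
```

For these toy points, three of the four actives have another active within 0.5 while no decoy does, so the G-type value (0.75) far exceeds the F-type value (0.0). A gap of this kind indicates actives that are clustered among themselves and well separated from the decoys.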