Volume 2 Supplement 1
Finding new potential acetylcholine esterase inhibitors in SDFiles using CWM Lead Finder and PASS (Prediction of Activity Spectra for Substances)
© Himmler and Kos 2008
Published: 26 March 2008
Chemical Workflow Manager (CWM) gives non-computer specialists a graphical user interface (GUI) for executing predefined chemistry-related workflows. We provide several such predefined (and supported) workflows as products, such as CWM Lead Finder and CWM Structure Comparer. CWM products are ready-to-use ‘one-click’ workflows. Unlike most other workflow shells, CWM is not a toolbox. Our vision is to incorporate free scientific software (such as CDK, Weka and others) into ready-to-use, supported products that non-computer specialists can operate through a common user interface.
In the poster we present CWM Lead Finder. CWM Lead Finder reads an SDFile (in most cases rather small) containing structures with known biological activity and a second SDFile (normally rather large) containing structures of unknown activity. During execution of the workflow, the PASS (Prediction of Activity Spectra for Substances) program is used to calculate prediction coefficients for around 3000 biological activities for the structures of both known and unknown activity. Each coefficient is the difference between the predicted probability, in percent, of being active (Pa) and of being inactive (Pi). Using a cutoff factor, the user can create a ‘biological profile’ (the biological effects predicted with high probability for the structures with known activity) that is used in the rest of the workflow. The prediction coefficients for the activities in this profile are then calculated for the structures of both known and unknown activity and clustered. Structures from the SDFile with unknown activity that end up in clusters containing structures with known activity have a high probability of showing the same biological effects and are therefore potential candidates for testing against this activity. This application answers the question: “Which compounds are active?”
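The coefficient and profile steps described above can be illustrated with a minimal sketch. It is not the CWM Lead Finder implementation: the input layout (activity name mapped to a Pa/Pi pair in percent), the function names, the example values, and the rule that an activity enters the profile when its mean coefficient over the known structures reaches the cutoff are all assumptions made for illustration. The clustering step is left out here.

```python
from statistics import mean

# Hypothetical PASS-style result for one structure:
# activity name -> (Pa, Pi), both in percent. The layout is an assumption.
Prediction = dict[str, tuple[float, float]]


def coefficients(pred: Prediction) -> dict[str, float]:
    """Prediction coefficient per activity, defined as Pa - Pi."""
    return {activity: pa - pi for activity, (pa, pi) in pred.items()}


def biological_profile(known: dict[str, Prediction], cutoff: float) -> list[str]:
    """Activities predicted with high probability for the known structures.

    Assumed rule: keep an activity if its mean coefficient over all known
    structures is at least `cutoff`.
    """
    if not known:
        return []
    coeffs = [coefficients(p) for p in known.values()]
    shared = set.intersection(*(set(c) for c in coeffs))
    return sorted(a for a in shared if mean(c[a] for c in coeffs) >= cutoff)


def profile_vector(pred: Prediction, profile: list[str]) -> list[float]:
    """Coefficients restricted to the profile; this is the vector to cluster."""
    c = coefficients(pred)
    return [c.get(activity, 0.0) for activity in profile]


# Made-up example values, not real PASS output.
known = {
    "K1": {"AChE inhibitor": (90.0, 2.0), "Antiviral": (20.0, 40.0)},
    "K2": {"AChE inhibitor": (80.0, 5.0), "Antiviral": (30.0, 35.0)},
}
unknown = {"U1": {"AChE inhibitor": (70.0, 10.0), "Antiviral": (10.0, 60.0)}}

profile = biological_profile(known, cutoff=50.0)
vectors = {name: profile_vector(p, profile) for name, p in {**known, **unknown}.items()}
```

The vectors for known and unknown structures share the same activity order, so they can be passed together to whatever clustering method the workflow uses.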
The workflow goes one step further and helps to present the most interesting compounds. It selects the compounds from clusters that have the same biological profile as the ‘known’ compounds and calculates their chemical fingerprints, for which we use the CDK. The fingerprints are clustered in turn: some clusters are a mixture of compounds of known and unknown activity, some contain only ‘known’ compounds, and others only ‘unknowns’. The most interesting clusters to look at are those that contain exclusively structures with unknown activity, since these structures are structurally different from the structures of known activity but have a high probability of showing the same activity.
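The cluster classification described above can be sketched as follows. The abstract does not name the clustering method or similarity measure, so this sketch assumes single-linkage grouping at a Tanimoto similarity threshold. The fingerprints are made-up sets of on-bit positions, not CDK output, and all names are hypothetical.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def cluster(fps: dict[str, frozenset[int]], threshold: float) -> list[set[str]]:
    """Single-linkage clustering (an assumed method): two structures end up
    in the same cluster if a chain of pairs with similarity >= threshold
    connects them."""
    parent = {name: name for name in fps}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    names = list(fps)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(fps[a], fps[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def label(members: set[str], known_ids: set[str]) -> str:
    """Classify a cluster by whether it holds known and/or unknown structures."""
    has_known = bool(members & known_ids)
    has_unknown = bool(members - known_ids)
    if has_known and has_unknown:
        return "mixed"
    return "known only" if has_known else "unknown only"


# Made-up fingerprints for illustration only.
fingerprints = {
    "K1": frozenset({1, 2, 3, 4}),
    "U1": frozenset({1, 2, 3, 5}),
    "U2": frozenset({10, 11, 12}),
    "U3": frozenset({10, 11, 13}),
    "K2": frozenset({20, 21}),
}
known_ids = {"K1", "K2"}

clusters = cluster(fingerprints, threshold=0.5)
labels = {frozenset(c): label(c, known_ids) for c in clusters}
```

In this toy data the cluster holding U2 and U3 is labelled "unknown only", which corresponds to the structurally novel candidates the paragraph singles out.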
The aim of CWM Lead Finder is to allow users who are not computer specialists, or who lack in-depth knowledge of prediction software, to test their compounds for biological activity easily. Chemical Workflow Manager products are always predefined workflows that are (in the simplest case) ‘one-click’ workflows. For advanced users, the CWM user interface also allows new workflows to be defined and existing workflows to be modified.
Chemical Workflow Manager is based on Microsoft's new Windows Workflow Foundation (WF). WF is part of .NET and is therefore available free of charge. It provides a framework for developing workflow-centric solutions.
Workflows executed in Chemical Workflow Manager are tightly integrated with Microsoft SQL Server, which allows workflows to be run from the database (i.e. it eliminates the problem of local files). All relevant information about an executed workflow is stored in the SQL Server database, which makes it easy to generate reports on how the results were obtained.
- Chemistry Development Kit (CDK): [http://sourceforge.net/projects/cdk]
- Witten Ian, Frank Eibe: Data Mining: Practical machine learning tools and techniques. 2nd edition. 2005, Morgan Kaufmann, San Francisco. [http://www.cs.waikato.ac.nz/ml/weka/]
- Poroikov VV, Filimonov DA, Borodina YV, Lagunin AA, Kos A: Robustness of biological activity spectra predicting by computer program PASS for non-congeneric sets of chemical compounds. J Chem Inf Comput Sci. 2000, 40 (6): 1349-1355. 10.1021/ci000383k. [http://www.akosgmbh.eu/pass]
- Windows Workflow Foundation: [http://msdn2.microsoft.com/en-us/netframework/aa663328.aspx]