Parsers for SMILES and SMARTS
- Andrew Dalke
© Dalke 2008
Published: 26 March 2008
SMILES and SMARTS are two line notations developed by Daylight and implemented in a number of chemical informatics software tools. Most of these implementations use hand-written parsers, which are complicated and hard to read, modify, maintain, optimize, or reuse. A traditional computer science approach would use a parsing system like lex/yacc, but that approach has made few inroads in computational chemistry. Those tools have been difficult to use, especially for error reporting and recovery, and most developers in the field have a chemistry background and are unfamiliar with the language theory underlying this approach.
Modern programming languages and parser systems have made many of these difficulties disappear. I have been working with people from OpenSMILES and several of the existing open source toolkits (Open Babel, CDK, and RDKit) to develop valid, useful grammars for SMILES and SMARTS. I have also been evaluating how to implement those grammars using parsing systems like ANTLR, PLY, and Ragel. My plan is to fold that work back into the different projects so that there is broader and more consistent support for these two important notations. I also expect that the resulting code will be faster, more maintainable, and more flexible for trying new ideas. By documenting the different parts, I hope that knowledge of how to use parser frameworks spreads into the computational chemistry development community and helps to develop the next generation of chemistry toolkits and line notations like MQL.
This poster presents some of the preliminary results of that work, including a SMILES grammar, implementations for ANTLR and PLY, and an early performance analysis.
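To give a flavor of the lexer-plus-grammar style discussed above, here is a minimal sketch in Python using only the standard `re` module. It is not the grammar presented on the poster, nor the OpenSMILES grammar, and it does not use ANTLR, PLY, or Ragel. The token set is an assumed small subset of SMILES (organic-subset atoms, opaque bracket atoms, bonds, branches, ring closures, and the dot), and the names `TOKEN_RE`, `tokenize`, and `parse` are hypothetical.

```python
import re

# Token patterns for an assumed SMILES subset (illustration only).
# Two-letter symbols are listed before one-letter ones so "Cl" is not read as "C".
TOKEN_RE = re.compile(r"""
    (?P<atom>Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\[\]]+\])   # bracket atoms kept opaque
  | (?P<bond>[-=\#:/\\])
  | (?P<ring>%\d\d|\d)
  | (?P<open>\()
  | (?P<close>\))
  | (?P<dot>\.)
""", re.VERBOSE)


def tokenize(smiles):
    """Yield (kind, text, position), raising ValueError at the first bad character."""
    pos = 0
    while pos < len(smiles):
        m = TOKEN_RE.match(smiles, pos)
        if m is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        yield m.lastgroup, m.group(), pos
        pos = m.end()


def parse(smiles):
    """Return (atoms, bonds). Each bond is (i, j, symbol); symbol is None if implicit."""
    atoms, bonds = [], []
    prev = None      # index of the atom that the next atom bonds to
    pending = None   # explicit bond symbol waiting for its second atom
    stack = []       # saved `prev` values, one per open branch
    rings = {}       # ring-closure number -> (atom index, bond symbol)
    for kind, text, pos in tokenize(smiles):
        if kind == "atom":
            atoms.append(text)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, pending))
            prev, pending = len(atoms) - 1, None
        elif kind == "bond":
            if prev is None or pending is not None:
                raise ValueError(f"misplaced bond {text!r} at position {pos}")
            pending = text
        elif kind == "ring":
            if prev is None:
                raise ValueError(f"ring closure without an atom at position {pos}")
            label = int(text.lstrip("%"))
            if label in rings:
                start, symbol = rings.pop(label)
                bonds.append((start, prev, pending or symbol))
            else:
                rings[label] = (prev, pending)
            pending = None
        elif kind == "open":
            if prev is None:
                raise ValueError(f"branch without an atom at position {pos}")
            stack.append(prev)
        elif kind == "close":
            if not stack or pending is not None:
                raise ValueError(f"misplaced ')' at position {pos}")
            prev = stack.pop()
        else:  # "dot": start a new disconnected component
            if pending is not None:
                raise ValueError(f"bond before '.' at position {pos}")
            prev = None
    if pending is not None:
        raise ValueError("dangling bond at end of input")
    if stack:
        raise ValueError("unclosed branch")
    if rings:
        raise ValueError(f"unclosed ring closure(s): {sorted(rings)}")
    return atoms, bonds
```

For example, `parse("CC(=O)O")` yields four atoms and three bonds, one of them an explicit double bond, while `parse("C1CC")` raises a `ValueError` naming the unclosed ring. Keeping the token definitions in one table, separate from the structural rules, is the kind of separation a parser framework provides, and every error can report the position where it occurred.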
- Weininger D: J Chem Inf Comput Sci. 1988, 28: 31-36. 10.1021/ci00057a005.
- Proschak E, Wegner J, Schüller A, Schneider G, Fechner U: J Chem Inf Model. 2007, 47 (2): 295-301. 10.1021/ci600305h.