
Benchmark comparison tests between Ambit-SMIRKS and RDKit chemoinformatics tools

Indexed by NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/1322630
Description:
This archive contains benchmark code and results for the Ambit-SMIRKS software package (http://ambit.sf.net), described in the publication “Kochev N., Avramova S., Jeliazkova N. Ambit-SMIRKS: a Software Module for Reaction Representation, Reaction Search and Structure Transformation”.

We benchmarked the SMIRKS transformation algorithms of Ambit-SMIRKS and RDKit. The test used a set of 545 compounds (see file smiles-set.txt), comprising normal constituents of the body or common components of food, provided by Munro et al. [1], and a set of 84 reactions from RetroTransformDB [2] represented as SMIRKS linear notations (see file SMIRKS-RetroDB.txt). In both software tools (RDKit and Ambit-SMIRKS), each reaction was applied to all compounds at all possible sites, thus performing more than 46000 SMIRKS transformations. For the purpose of comparison, Ambit-SMIRKS was applied in mode ALL with a single copy of the products for each reaction site.

The Java code for the Ambit-SMIRKS test is available in file TestAmbitSmirks.java, and the Python code for the RDKit test is in rdkit-smirks-test-02.py. Running the tests requires the Ambit dependency modules (version 3.2.0; see more about Ambit at https://ambit.sf.net/) and an RDKit installation (release 2018.03; see http://www.rdkit.org/). The tests were performed on a PC (Intel Core i5-8250U, 1.6 GHz, 12 GB RAM) running Windows 10. The calculations took about 30 seconds for RDKit and about 40 seconds for Ambit-SMIRKS. The computational time for both tools includes SMIRKS parsing and reaction application as well as molecule preprocessing and file operations. Each algorithm was run 3 times. Detailed timing information is in file time-stat.txt.
The raw output data are stored in file rdkit-out.txt for RDKit and in file ambit-out-no-eq-filter.txt for Ambit-SMIRKS. The generated output files consist of one block per SMIRKS in the following format: ##smirks-number --> , , … --> , , … … --> , , …

Based on the generated raw test data, comparison statistics were summarized in file compare-ambit-rdkit.xlsx, which contains the following columns: SMILES – SMILES of the target molecule; smirks_num – the index of the reaction SMIRKS applied to the target; Ambit-NEF – number of reacted sites in the target molecule for the Ambit algorithm; RDKit – number of reacted sites in the target molecule for the RDKit algorithm; Diff – absolute difference between the numbers of reacted sites in Ambit and RDKit; FlagDiff – 1 (true) if Diff is non-zero; FlagRDKitReact – 1 (true) if at least one site in the target molecule is reacted by the RDKit tool (i.e. the RDKit column value is > 0); FlagAmbitReact – 1 (true) if at least one site in the target molecule is reacted by the Ambit-SMIRKS tool (i.e. the Ambit-NEF column value is > 0).

Out of 46410 tests, 6096 test reactions were successfully applied to at least one site in Ambit-SMIRKS (i.e. the value in column Ambit-NEF is not zero) and 5729 reactions were successfully applied to at least one site in RDKit (i.e. the value in column RDKit is not zero). The total number of reacted sites obtained for Ambit-SMIRKS and RDKit is 41453 and 40782 respectively. We compared the number of reacted sites for both software tools, and differences were observed for 436 reactions. From our analysis we infer that the observed differences are mainly due to different treatment of equivalent molecular sites and to some small differences in the internal representation of the molecules and the chemical reactions in the two software packages.

[1] Munro I., Ford R.A., Kennepohl E., Sprenger J. Correlation of structural class with no-observed-effect-levels: a proposal for establishing a threshold of concern. Food Chem Toxicol. 1996;34:829–867.
[2] https://doi.org/10.5281/zenodo.1209313
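The derived columns of compare-ambit-rdkit.xlsx described above (Diff, FlagDiff, FlagRDKitReact, FlagAmbitReact) and the summary totals can be illustrated with a minimal Python sketch. This is not the code from the archive: the function names `compare_counts` and `summarize`, the keys of the summary dictionary, and the three sample rows are all hypothetical and serve only to show how the columns relate to the two reacted-site counts.

```python
def compare_counts(smiles, smirks_num, ambit_nef, rdkit):
    """Build one comparison row from the reacted-site counts of both tools."""
    diff = abs(ambit_nef - rdkit)  # Diff: absolute difference of the counts
    return {
        "SMILES": smiles,
        "smirks_num": smirks_num,
        "Ambit-NEF": ambit_nef,
        "RDKit": rdkit,
        "Diff": diff,
        "FlagDiff": int(diff != 0),            # 1 if the tools disagree
        "FlagRDKitReact": int(rdkit > 0),      # 1 if RDKit reacted at >= 1 site
        "FlagAmbitReact": int(ambit_nef > 0),  # 1 if Ambit reacted at >= 1 site
    }


def summarize(rows):
    """Aggregate the per-test rows into the kind of totals quoted in the text."""
    return {
        "tests": len(rows),
        "ambit_reacted_tests": sum(r["FlagAmbitReact"] for r in rows),
        "rdkit_reacted_tests": sum(r["FlagRDKitReact"] for r in rows),
        "ambit_sites": sum(r["Ambit-NEF"] for r in rows),
        "rdkit_sites": sum(r["RDKit"] for r in rows),
        "tests_with_difference": sum(r["FlagDiff"] for r in rows),
    }


# Made-up counts for illustration only; these are not values from the dataset.
rows = [
    compare_counts("CCO", 1, 1, 1),
    compare_counts("OC(=O)CCC(=O)O", 2, 2, 1),
    compare_counts("c1ccccc1", 3, 0, 0),
]
summary = summarize(rows)
print(summary)
```

Applied to the full table, the same aggregation would correspond to the figures reported in the description: the number of tests, the tests with at least one reacted site per tool, the total reacted sites per tool, and the number of tests where the two tools differ.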
Created:
2020-01-21