all ECFP4 of ChEMBL25 and ZINC20 as JSON dicts
Source: Figshare. Updated 2023-02-28; indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/ChEMBL_and_ZINC_ECFP_dictionnaries_for_whitelisting/20937427/3
Description:
Two JSON dicts whose keys are the ECFP4 connectivity features (including the ECFP2 features) detected by the GetMorganFingerprint function of RDKit. The first file covers all 556,187 ECFP4 features found in the substances of ChEMBL25, as downloaded in September 2019 with 1,817,766 unique molecules. ChEMBL is a large curated database of bioactive molecules. In this dict, each value holds 5 ChEMBL references that can be used to represent the fingerprint.

The second dict includes the 1,156,416 ECFP features (ECFP2 and ECFP4) encountered in either ZINC20 or ChEMBL25. ZINC is larger than ChEMBL; it is based on commercially available compounds and is not restricted to bioactive molecules. It contains proportionally more inorganic and organometallic compounds than ChEMBL. We used ZINC20-ML, the already prepared version by Artem Cherkasov and Francesco Gentile, which contains all 1,006,651,037 ZINC20 molecules as of early March 2021. ZINC20-ML is available at https://files.docking.org/zinc20-ML/.
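As an illustration of how such a dict could be used for whitelisting, the following is a minimal sketch rather than the author's workflow. It assumes that the JSON keys are the decimal string form of the integer feature identifiers returned by RDKit; the file name `chembl25_ecfp4.json` and the helper names are hypothetical and should be replaced with the names of the downloaded files. A radius of 2 corresponds to ECFP4, and RDKit also reports the environments of smaller radius. Recent RDKit releases may emit a deprecation warning for `GetMorganFingerprint`.

```python
import json
from typing import Dict, Iterable, List


def load_feature_dict(path: str) -> Dict[str, object]:
    """Load one of the JSON dicts; JSON object keys are always strings."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def morgan_feature_ids(smiles: str, radius: int = 2) -> List[str]:
    """Return the Morgan feature identifiers of a molecule as strings.

    radius=2 corresponds to ECFP4. Assumption: the dict keys use the same
    integer identifiers, serialized as decimal strings.
    """
    # Imported lazily so the whitelist helper below works without RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    fingerprint = AllChem.GetMorganFingerprint(mol, radius)
    return [str(feature_id) for feature_id in fingerprint.GetNonzeroElements()]


def unknown_features(feature_ids: Iterable[str], whitelist: Dict[str, object]) -> List[str]:
    """Return the sorted, de-duplicated feature identifiers absent from the whitelist."""
    return sorted(fid for fid in set(feature_ids) if fid not in whitelist)


if __name__ == "__main__":
    # Hypothetical file name: use the actual name of the downloaded dict.
    whitelist = load_feature_dict("chembl25_ecfp4.json")
    missing = unknown_features(morgan_feature_ids("CCO"), whitelist)
    print("features not in whitelist:", missing)
```

A molecule whose `unknown_features` list is empty contains only connectivity features already seen in the reference database; any identifier that is returned marks an atom environment that the database does not contain.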
Provider:
Cauchy, Thomas
Created:
2023-02-28



