
all ECFP4 of ChEMBL25 and ZINC20 as JSON dicts

Source: Figshare. Updated: 2023-02-28. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/ChEMBL_and_ZINC_ECFP_dictionnaries_for_whitelisting/20937427/3
Description:
Two JSON dictionaries whose keys are the ECFP4 connectivity features (including the ECFP2 features) as detected by the GetMorganFingerprint function of RDKit. The first file covers all 556,187 ECFP4 features of the substances in ChEMBL25, as downloaded in September 2019 with 1,817,766 unique molecules. ChEMBL is a large curated database of bioactive molecules. In this file the values are 5 ChEMBL references that can be used to represent the fingerprint.

The second dictionary includes the 1,156,416 ECFP (2 and 4) features encountered in either ZINC20 or ChEMBL25. ZINC is larger than ChEMBL; it is based on commercially available compounds and is not restricted to bioactive molecules. It contains proportionally more inorganic and organometallic compounds than ChEMBL. We used the already prepared version ZINC20-ML by Artem Cherkasov and Francesco Gentile, with all 1,006,651,037 ZINC20 molecules as of early March 2021. ZINC20-ML is available at https://files.docking.org/zinc20-ML/.
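The whitelisting use suggested by the dataset's URL can be sketched as follows: compute the ECFP4 feature identifiers of a molecule and report those that are absent from one of the dictionaries. This is only a minimal sketch under several assumptions that the description does not confirm: that the JSON keys are the integer feature identifiers serialised as strings, that radius 2 was used without extra options, and that the values are lists of ChEMBL identifiers. The function names and the toy dictionary are hypothetical, and the toy keys are not real feature identifiers.

```python
import json


def load_whitelist(path):
    """Load one of the JSON dictionaries (the file path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def ecfp4_feature_ids(smiles):
    """Return the set of ECFP4 feature identifiers for a SMILES string.

    Requires RDKit. Radius 2 corresponds to ECFP4 and also yields the
    radius-0 and radius-1 (ECFP2) environments. Recent RDKit releases mark
    GetMorganFingerprint as deprecated in favour of the fingerprint
    generator API, so this call may emit a warning.
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fingerprint = AllChem.GetMorganFingerprint(mol, 2)
    # GetNonzeroElements() maps feature identifier -> occurrence count.
    return set(fingerprint.GetNonzeroElements())


def unknown_features(feature_ids, whitelist):
    """Return, sorted, the feature identifiers that are not keys of the whitelist.

    Assumption: keys are stored as strings, since JSON object keys always are.
    """
    return sorted(fid for fid in feature_ids if str(fid) not in whitelist)


# Toy stand-in for a real dictionary; keys and values are placeholders.
toy_whitelist = json.loads('{"101": ["PLACEHOLDER_A"], "202": ["PLACEHOLDER_B"]}')
missing = unknown_features({101, 303}, toy_whitelist)
```

With a real file, `unknown_features(ecfp4_feature_ids(smiles), load_whitelist(path))` would return an empty list when every feature of the molecule has already been seen in the reference database, under the assumptions above.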

Provided by:
Cauchy, Thomas
Created:
2023-02-28