Smiles2Dock
Source: arXiv · Updated: 2024-06-09 · Indexed: 2024-06-12
Download link:
https://huggingface.co/datasets/tlemenestrel/Smiles2Dock
Description:
Smiles2Dock is a large-scale, multi-task molecular docking dataset created by the Department of Computational Engineering at Stanford University to train and benchmark machine learning (ML)-based molecular docking algorithms. It contains docking results for 1.7 million ligands drawn from the ChEMBL database, each docked against 15 AlphaFold-predicted protein structures, for a total of more than 25 million protein-ligand binding scores. The dataset uses AlphaFold's high-accuracy protein models, covers a diverse set of biologically relevant compounds, and supports benchmarking of the main families of ML-based docking methods: graph-based, Transformer-based, and CNN-based models. The accompanying work also proposes a new Transformer-based architecture for docking score prediction, which serves as an initial baseline. Both the dataset and its code are publicly available to support the development of new ML methods for molecular docking.
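The score count follows from docking every ligand against every protein: 1.7 million × 15 ≈ 25.5 million. Because 1.7 million is a rounded figure, the product is approximate. The Python sketch below checks that arithmetic and shows one plausible way to turn long-format (ligand, protein, score) rows into one score vector per ligand, assuming each protein is treated as one prediction task. The row layout, the `to_multitask` helper, the protein IDs and the score values are hypothetical illustrations, not the dataset's actual schema; the Hugging Face dataset card is the place to confirm real column names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Counts quoted in the dataset description (rounded).
N_LIGANDS = 1_700_000  # ligands taken from ChEMBL
N_PROTEINS = 15        # AlphaFold-predicted protein structures


def total_scores(n_ligands: int, n_proteins: int) -> int:
    """One docking score per (ligand, protein) pair when every ligand meets every protein."""
    return n_ligands * n_proteins


# Hypothetical long-format row: (ligand SMILES, protein ID, docking score).
Record = Tuple[str, str, float]


def to_multitask(
    records: List[Record], proteins: List[str]
) -> Dict[str, List[Optional[float]]]:
    """Pivot long-format rows into one score vector per ligand (hypothetical helper).

    Slot i holds the score against proteins[i]; None marks a pair with no score.
    """
    slot = {protein: i for i, protein in enumerate(proteins)}
    table: Dict[str, List[Optional[float]]] = defaultdict(
        lambda: [None] * len(proteins)
    )
    for smiles, protein, score in records:
        table[smiles][slot[protein]] = score
    return dict(table)


if __name__ == "__main__":
    print(f"{total_scores(N_LIGANDS, N_PROTEINS):,} scores")  # 25,500,000 scores

    # Toy rows: placeholder protein IDs and made-up scores, not dataset values.
    rows = [
        ("CCO", "protein_A", -3.1),
        ("CCO", "protein_B", -2.8),
        ("c1ccccc1", "protein_A", -4.0),
    ]
    print(to_multitask(rows, ["protein_A", "protein_B"]))
    # {'CCO': [-3.1, -2.8], 'c1ccccc1': [-4.0, None]}
```

A per-ligand vector with explicit `None` slots keeps missing pairs visible, so a multi-task loss can mask them instead of treating them as zero scores.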
Provided by:
Department of Computational Engineering, Stanford University
Created:
2024-06-09



