
Data for: Advances and critical assessment of machine learning techniques for prediction of docking scores

Source: DataONE. Updated: 2023-09-05. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:1bd0fddc0333ddbe8908929d4fa3ed26a4a5bc1db0096a1103b699c2273f7376
Description:
Semi-flexible docking was performed with AutoDock Vina 1.2.2 on the SARS-CoV-2 main protease Mpro (PDB ID: 6WQF). Two data sets are provided in xyz format, each containing the AutoDock Vina docking scores. These files were used as input and/or reference for machine learning (ML) models built with TensorFlow, XGBoost, and SchNetPack, in order to study how well those models predict docking scores. The first data set originally contained 60,411 in-vivo labeled compounds selected for training the ML models. The second data set, denoted in-vitro-only, originally contained 175,696 compounds active or assumed to be active at 10 μM or less in a direct binding assay. Both sets were downloaded from the ZINC15 database on 10 December 2021. Four compounds in the in-vivo set and 12 in the in-vitro-only set were left out of consideration due to the presence of Si atoms. Compounds with no charges assigned in the mol2 files were excluded as well (523 compounds in the in-vivo and 1,666 in the in-vitro-only...

Molecular docking calculations and the machine learning approaches are described in the Computational details section of [1].

Reference
[1] Lukas Bucinsky, Marián Gall, Ján Matúška, Michal Pitoňák, Marek Štekláč. Advances and critical assessment of machine learning techniques for prediction of docking scores. Int. J. Quantum Chem. (2023). DOI: 10.1002/qua.27110.
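The description mentions xyz files that carry docking scores and a filter that drops Si-containing compounds. The following is a minimal Python sketch of how such a file might be read and filtered. It is not the authors' code. It assumes, without confirmation from the record, that molecules are stored as concatenated standard xyz blocks (atom count, comment line, one atom per line) and that the docking score is the only content of the comment line; the function names `read_xyz_frames` and `drop_silicon` are hypothetical, and the sample scores are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Atom = Tuple[str, float, float, float]
Frame = Tuple[str, List[Atom]]


def read_xyz_frames(text: str) -> Iterator[Frame]:
    """Yield (comment_line, atoms) for each molecule in concatenated xyz text."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between blocks
            i += 1
            continue
        n_atoms = int(lines[i].strip())
        comment = lines[i + 1].strip()
        atoms: List[Atom] = []
        for row in lines[i + 2 : i + 2 + n_atoms]:
            symbol, x, y, z = row.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        yield comment, atoms
        i += 2 + n_atoms


def drop_silicon(frames: Iterable[Frame]) -> List[Frame]:
    """Keep only molecules that contain no Si atom."""
    return [
        (comment, atoms)
        for comment, atoms in frames
        if not any(symbol.capitalize() == "Si" for symbol, *_ in atoms)
    ]


# Made-up two-molecule example; the second molecule contains Si.
example = """\
2
-5.3
C 0.0 0.0 0.0
O 1.2 0.0 0.0
1
-4.1
Si 0.0 0.0 0.0
"""

kept = drop_silicon(read_xyz_frames(example))
# ASSUMPTION: the comment line holds just the docking score.
scores = [float(comment) for comment, _ in kept]
```

If the actual files store the score differently (for example alongside a ZINC identifier), only the last line needs to change; the block reader and the element filter do not depend on the comment layout.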

Created:
2023-11-29