MOSES (Molecular Sets)
OpenDataLab · Updated 2026-05-24 · Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/MOSES
Description:
We present a benchmark dataset derived from the ZINC database.
The set is based on the ZINC Clean Leads collection. That collection contains 4,591,276 molecules in total, filtered by molecular weight in the range from 250 to 350 Daltons, no more than 7 rotatable bonds, and XlogP ≤ 3.5. We removed molecules containing charged atoms, atoms other than C, N, S, O, F, Cl, Br, and H, or rings larger than 8 atoms. The remaining molecules were additionally filtered with medicinal chemistry filters (MCFs) and PAINS filters.
The resulting dataset contains 1,936,962 molecular structures. For experiments, we split the dataset into training, test, and scaffold test sets containing approximately 1.6M, 176k, and 176k molecules, respectively. The scaffold test set contains molecules with unique Bemis-Murcko scaffolds that are not present in either the training or the test set. We use this set to assess a model's ability to generate previously unobserved scaffolds.
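The property and composition criteria above can be sketched as a simple predicate. This is a minimal illustration, not the MOSES authors' code: it assumes the descriptors have already been computed with a cheminformatics toolkit, the `MolDescriptors` record and its field names are hypothetical, the 250 to 350 Dalton range is assumed to be inclusive, and the MCF and PAINS filters are not covered.

```python
from dataclasses import dataclass

# Elements permitted by the description above.
ALLOWED_ELEMENTS = frozenset({"C", "N", "S", "O", "F", "Cl", "Br", "H"})


@dataclass(frozen=True)
class MolDescriptors:
    """Hypothetical record of precomputed properties for one molecule."""
    mol_weight: float        # molecular weight in Daltons
    rotatable_bonds: int     # number of rotatable bonds
    xlogp: float             # XlogP estimate
    elements: frozenset      # element symbols present in the molecule
    has_charged_atom: bool   # True if any atom carries a formal charge
    largest_ring_size: int   # atoms in the largest ring, 0 if acyclic


def passes_property_filters(m: MolDescriptors) -> bool:
    """Check the stated criteria (MCF and PAINS filters are not included)."""
    return (
        250.0 <= m.mol_weight <= 350.0      # inclusive bounds are an assumption
        and m.rotatable_bonds <= 7
        and m.xlogp <= 3.5
        and not m.has_charged_atom
        and m.elements <= ALLOWED_ELEMENTS  # subset test on element symbols
        and m.largest_ring_size <= 8
    )
```

A molecule that passes this predicate would still need to clear the MCF and PAINS filters before it could belong to the final set.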
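The idea behind a scaffold-based split can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used to build MOSES: each molecule is assumed to arrive with a precomputed Bemis-Murcko scaffold string, and the function name `scaffold_split` and the two fraction parameters are hypothetical.

```python
import random
from collections import defaultdict


def scaffold_split(records, holdout_fraction=0.1, test_fraction=0.1, seed=0):
    """Split (smiles, scaffold) pairs into train, test and scaffold-test lists.

    Every molecule whose scaffold is held out goes to the scaffold-test list,
    so none of the held-out scaffolds occur in the train or test lists.
    """
    rng = random.Random(seed)

    # Group molecules by their (precomputed) scaffold string.
    by_scaffold = defaultdict(list)
    for smiles, scaffold in records:
        by_scaffold[scaffold].append(smiles)

    # Choose which scaffolds to hold out entirely.
    scaffolds = sorted(by_scaffold)
    rng.shuffle(scaffolds)
    n_holdout = max(1, round(holdout_fraction * len(scaffolds)))
    held_out = set(scaffolds[:n_holdout])

    scaffold_test = [s for sc in scaffolds[:n_holdout] for s in by_scaffold[sc]]
    remaining = [s for sc in scaffolds[n_holdout:] for s in by_scaffold[sc]]

    # Random train/test split over the molecules that were not held out.
    rng.shuffle(remaining)
    n_test = round(test_fraction * len(remaining))
    test, train = remaining[:n_test], remaining[n_test:]
    return train, test, scaffold_test, held_out
```

Because the held-out scaffolds never reach the training data, a model that reproduces them at generation time must have produced them without having seen them.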
Provider:
OpenDataLab
Created:
2022-06-28
Collected Summary
Dataset Introduction

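Background and Challenges
Background Overview
MOSES is a molecular benchmark dataset derived from the ZINC database. Starting from roughly 4.6 million molecules, filtering by molecular weight, number of rotatable bonds, and other criteria yields about 1.9 million structures. The dataset is split into training, test, and scaffold test sets; the scaffold test set is used to evaluate the ability of molecular generative models to produce novel scaffolds.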
The summary above was collected and generated by 遇见数据集.



