SCScore: Synthetic Complexity Learned from a Reaction Corpus

Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/SCScore_Synthetic_Complexity_Learned_from_a_Reaction_Corpus/5826108
Description:
Several definitions of molecular complexity exist to facilitate prioritization of lead compounds, to identify diversity-inducing and complexifying reactions, and to guide retrosynthetic searches. In this work, we focus on synthetic complexity and reformalize its definition to correlate with the expected number of reaction steps required to produce a target molecule, with implicit knowledge about what compounds are reasonable starting materials. We train a neural network model on 12 million reactions from the Reaxys database to impose a pairwise inequality constraint enforcing the premise of this definition: that on average, the products of published chemical reactions should be more synthetically complex than their corresponding reactants. The learned metric (SCScore) exhibits highly desirable nonlinear behavior, particularly in recognizing increases in synthetic complexity throughout a number of linear synthetic routes.
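The pairwise inequality constraint described above can be illustrated with a hinge-style ranking loss. This is only a minimal sketch, not the authors' implementation: the function names, the `margin` value of 0.25, and the example scores are all hypothetical, and the scores stand in for the output of a trained model.

```python
def pairwise_hinge_loss(reactant_score, product_score, margin=0.25):
    """Penalty for one (reactant, product) pair.

    Zero when the product outscores the reactant by at least `margin`;
    grows linearly as that ordering is violated. `margin` is an assumed value.
    """
    return max(0.0, margin - (product_score - reactant_score))


def mean_pairwise_loss(pairs, margin=0.25):
    """Average hinge penalty over (reactant_score, product_score) pairs."""
    return sum(pairwise_hinge_loss(r, p, margin) for r, p in pairs) / len(pairs)


# Hypothetical model scores for three reactions:
#   1) product clearly more complex  -> no penalty
#   2) product only slightly higher  -> small penalty
#   3) product scored below reactant -> larger penalty
pairs = [(1.5, 3.0), (2.0, 2.1), (3.5, 3.0)]
loss = mean_pairwise_loss(pairs)
print(f"mean pairwise loss: {loss:.3f}")
```

Because the loss is averaged over pairs, individual reactions may violate the ordering while the model is still pushed toward the "on average" premise stated in the description.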
Created:
2018-01-25