Diverse-Expression Program

DataCite Commons · Updated 2025-04-27 · Indexed 2025-04-16
Download link:
https://www.scidb.cn/detail?dataSetId=fad6e0981bfb4336945a56d4f60b35cc
Description:
Text-guided molecule generation aims to produce molecules that match a given text description. Mainstream methods typically represent molecules as SMILES strings and model them with diffusion models or autoregressive architectures. However, a single molecule can be written as many different SMILES strings, and this one-to-many mapping forces existing methods to rely on complex model architectures and larger training datasets to improve performance, which reduces the efficiency of both training and generation. In this paper, we propose a Text-Guided Diverse-Expression Diffusion (TGDD) Model for Molecule Generation. TGDD combines SMILES and SELFIES into a novel Diverse-Expression molecular representation, enabling precise mapping from natural language to molecules. By leveraging this representation, TGDD simplifies the segmented diffusion generation process, training faster and using less memory while aligning more closely with natural language. TGDD outperforms both TGM-LDM and the autoregressive model MolT5-Base on most evaluation metrics.
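To make the one-to-many point concrete, the following minimal sketch shows several SMILES strings collapsing to the same molecular graph. It is an illustration only, not part of the dataset or of TGDD: the function names `parse_smiles_tree` and `canonical_key` are hypothetical, and the parser assumes a toy SMILES subset (single-letter uppercase atoms and parentheses for branches, with no rings, bond symbols, charges, or two-letter elements).

```python
def parse_smiles_tree(smiles):
    """Parse a toy SMILES subset into (atoms, neighbors).

    Assumption: only single-letter uppercase atoms and '(' / ')' branches
    are supported; anything else raises ValueError.
    """
    atoms, neighbors = [], []
    branch_stack = []  # atoms from which an open branch started
    prev = None        # index of the atom the next atom bonds to
    for ch in smiles:
        if ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isalpha() and ch.isupper():
            atoms.append(ch)
            neighbors.append([])
            idx = len(atoms) - 1
            if prev is not None:
                neighbors[prev].append(idx)
                neighbors[idx].append(prev)
            prev = idx
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    if not atoms:
        raise ValueError("empty SMILES string")
    return atoms, neighbors


def canonical_key(smiles):
    """Return a string that is identical for every spelling of the same acyclic molecule."""
    atoms, neighbors = parse_smiles_tree(smiles)

    def encode(node, parent):
        # Sorting child encodings makes the result independent of write order.
        children = sorted(encode(n, node) for n in neighbors[node] if n != parent)
        return atoms[node] + "(" + "".join(children) + ")"

    # Trying every atom as the root makes the result independent of the start atom.
    return min(encode(root, None) for root in range(len(atoms)))


# Three spellings of ethanol share one key; dimethyl ether gets a different one.
ethanol_spellings = ["CCO", "OCC", "C(O)C"]
ethanol_keys = {canonical_key(s) for s in ethanol_spellings}
ether_key = canonical_key("COC")
```

Here `ethanol_keys` contains a single element, while `ether_key` differs from it. A model trained directly on raw SMILES has to learn that all three ethanol spellings are valid targets for the same description, which is the redundancy the description above attributes to SMILES. For real molecules, a cheminformatics toolkit would be needed instead of this toy parser.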
Provider:
Science Data Bank
Created:
2025-02-25