zihaojing/MuMo-Pretraining
Source: Hugging Face. Updated 2025-10-29; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/zihaojing/MuMo-Pretraining
Overview:
The MuMo pretraining dataset accompanies MuMo, a multimodal fusion framework for molecular representation designed to address 3D conformer unreliability and modality collapse. The dataset is built from a filtered version of the ChEMBL database containing approximately 1.6 million molecules, and it is split into training and validation sets. MuMo combines 2D topology and 3D geometry into a unified, stable structural prior, which is asymmetrically integrated into the sequence stream through a Progressive Injection mechanism; this design supports long-range dependency modeling and robust information propagation. MuMo achieves an average improvement of 2.7% over the best-performing baseline on 29 benchmark tasks, ranking first on 22 of them.
Provider:
zihaojing



