UniParser/MolCap
Source: Hugging Face | Updated: 2025-12-24 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/UniParser/MolCap
Description:
MolCap is a large-scale multimodal molecular dataset with over 320k molecular images and detailed captions. The images are rendered with RDKit with random perturbations, and the captions, derived from PubChem descriptions, are cleaned and rewritten using GPT-4o. Each caption includes the canonical SMILES, the E-SMILES representation (introduced in “MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild”, ICCV 2025), as well as structural details, physicochemical properties, and other relevant descriptors.
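To make the image side of the description concrete, here is a minimal sketch of rendering a molecule with RDKit and applying a random perturbation. The dataset page does not say which perturbations MolCap uses, so the small random rotation below is purely an illustrative assumption, and the helper name `render_perturbed` is hypothetical. The sketch also returns RDKit's canonical SMILES, one of the fields each caption is said to include.

```python
import random

from rdkit import Chem
from rdkit.Chem import Draw


def render_perturbed(smiles: str, seed: int = 0, size: int = 256):
    """Return (canonical_smiles, PIL image) for a SMILES string.

    The perturbation (a small random rotation) is an assumption made
    for illustration; it is not MolCap's documented recipe.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Canonical SMILES as produced by RDKit.
    canonical = Chem.MolToSmiles(mol)

    # Render the 2D depiction to a PIL image.
    image = Draw.MolToImage(mol, size=(size, size))

    # Seeded random rotation, with exposed corners filled in white.
    rng = random.Random(seed)
    angle = rng.uniform(-15.0, 15.0)
    image = image.rotate(angle, fillcolor="white")

    return canonical, image


if __name__ == "__main__":
    canonical, image = render_perturbed("OCC", seed=42)
    print(canonical)
    image.save("ethanol.png")
```

Seeding the generator per molecule keeps the perturbation reproducible, which helps when images need to be regenerated and matched back to their captions.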
Provided by:
UniParser



