Inconsistency of LLMs in Molecular Representations
Source: NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14430368
Resource description:
This dataset was curated from the reaction prediction and property prediction subsets of the LlaSMol dataset. We augmented the original data by translating each SMILES string into its IUPAC name, so the resulting dataset consists of one-to-one mapped SMILES and IUPAC representations. The dataset statistics are as follows:
| Task                                 | #Train   | #Valid | #Test |
|--------------------------------------|----------|--------|-------|
| Forward reaction prediction (full)   | 963,567  | 1,956  | 300   |
| Forward reaction prediction (subset) | 76,379   | 1,956  | 300   |
| Retrosynthesis (full)                | 932,616  | 2,004  | 300   |
| Retrosynthesis (subset)              | 76,471   | 2,004  | 300   |
| Property - BBBP                      | 1,521    | 188    | 189   |
| Property - ClinTox                   | 1,063    | 127    | 131   |
| Property - HIV                       | 32,864   | 4,104  | 300   |
| Property - SIDER                     | 21,800   | 2,540  | 300   |
| Property - ESOL                      | 888      | 111    | 112   |
| Property - LIPO                      | 3,358    | 385    | 300   |
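Because the dataset is described as a one-to-one mapping between SMILES and IUPAC representations, a user may want to check that property on whatever records they load. The following is a minimal sketch, assuming the records have already been read into a list of `(smiles, iupac)` string tuples; the file layout and field names are not documented on this page, so loading is not shown, and the function name `check_one_to_one` is hypothetical. It flags any SMILES string paired with more than one IUPAC name and any IUPAC name paired with more than one SMILES string.

```python
from collections import defaultdict


def check_one_to_one(pairs):
    """Return (smiles_conflicts, iupac_conflicts) for (smiles, iupac) pairs.

    Hypothetical helper: the mapping is one-to-one when every SMILES string
    maps to exactly one IUPAC name and every IUPAC name maps back to exactly
    one SMILES string. Each returned dict holds only the offending keys.
    """
    smiles_to_iupac = defaultdict(set)
    iupac_to_smiles = defaultdict(set)
    for smiles, iupac in pairs:
        smiles_to_iupac[smiles].add(iupac)
        iupac_to_smiles[iupac].add(smiles)

    # Keep only keys that are paired with more than one distinct partner.
    smiles_conflicts = {s: n for s, n in smiles_to_iupac.items() if len(n) > 1}
    iupac_conflicts = {n: s for n, s in iupac_to_smiles.items() if len(s) > 1}
    return smiles_conflicts, iupac_conflicts


if __name__ == "__main__":
    # Illustrative pairs only, not taken from the dataset.
    pairs = [
        ("CCO", "ethanol"),
        ("CC(=O)O", "acetic acid"),
        ("c1ccccc1", "benzene"),
        ("OCC", "ethanol"),  # same molecule written as a different SMILES string
    ]
    smiles_conflicts, iupac_conflicts = check_one_to_one(pairs)
    print(len(smiles_conflicts))    # 0
    print(sorted(iupac_conflicts))  # ['ethanol']
```

The last example pair shows why such a check is string-based: two different SMILES strings for the same molecule collide on one IUPAC name. Canonicalizing SMILES first (for example with a cheminformatics toolkit such as RDKit) would be an assumption on top of this sketch, not something stated for this dataset.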
Created:
2024-12-18



