
Gelm: graph-based Tanimoto similarity grouping pretraining and entropy-guided conformer selection finetuning for large language models

Source: 中国科学数据 (China Scientific Data) | Updated: 2026-04-20 | Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.1007/s11432-025-4913-5
Description:
Molecular relationship learning (MRL) aims to understand interactions between molecular pairs, driving advances in biochemical research. In recent years, large language models (LLMs), with their vast knowledge bases and reasoning capabilities, have become important tools for MRL. However, existing LLMs rely primarily on SMILES strings and molecular graph representations and face three major challenges: a lack of relational awareness, which makes it difficult to associate molecules with similar structures; neglect of the structural diversity of molecules, which prevents the capture of key conformers in real-world reactions; and the lack of a systematic evaluation of different LLM backbone models. To address these challenges, we propose Gelm (graph-based Tanimoto similarity grouping pretraining and entropy-guided conformer selection finetuning for large language models), a novel framework that enhances relationship learning through structure-similarity-based pretraining and entropy-guided conformer selection. Additionally, we conduct extensive performance evaluations of various backbone models to provide scientific guidance on backbone selection. Our results demonstrate that Gelm, with DeepSeek as the backbone, achieves outstanding performance across 12 cross-domain datasets.
Created:
2026-04-14