Large Language Model-Empowered Compound Collision Cross-Section Prediction
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Large_Language_Model-Empowered_Compound_Collision_Cross-Section_Prediction/30020806
Description:
Collision cross section (CCS) is a crucial parameter in ion mobility-mass
spectrometry and plays a significant role in enhancing the precision
of compound annotation. Computational prediction methods aim to infer
the CCS value from molecular structure and have become a common strategy
for efficiently building large-scale CCS compound databases. However,
most currently available methods deliver suboptimal predictive
performance due to limited high-quality training data sets and inadequate
model architectures for handling multimodal features. To address these
issues, we present HyperCCS, a novel CCS prediction framework powered
by chemical large language models (CLLMs). Through fine-tuning a CLLM
that has been trained on billions of SMILES sequences, HyperCCS effectively
captures complex structural and chemical information that current
models trained on limited CCS data sets might miss. A cross-modal
feature fusion module is designed to dynamically integrate CLLM-derived
features with other heterogeneous features, effectively resolving
structural ambiguities commonly found in multimodal features. Benchmark
evaluation on the METLIN-CCS and AllCCS2 data sets shows that HyperCCS
achieves robust CCS prediction on molecules of various masses, adduct
types, and ion modes, outperforming other methods. Results on in-house
experimental data further show that HyperCCS adapts to a real-world
system, accurately resolving isomers and extrapolating to high-mass
analytes. In addition, SHAP analysis
and ablation studies confirm the crucial role of CLLM-derived features
and the cross-modal feature fusion mechanism in enhancing CCS prediction.
HyperCCS is anticipated to offer a high-throughput and cross-instrument
computational tool to aid experimental efforts in metabolomics and
structural biology.
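
The description does not specify how the cross-modal feature fusion module is built. As a rough illustration only, one common way to "dynamically integrate" two feature sources is a learned per-dimension gate. The sketch below assumes both feature vectors have already been projected to the same width; the function name `gated_fusion`, the weight shapes, and the random inputs are all hypothetical and are not taken from HyperCCS.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(llm_feat, aux_feat, w_gate, b_gate):
    """Blend two same-width feature matrices with a per-dimension gate.

    llm_feat, aux_feat: arrays of shape (batch, dim)
    w_gate: array of shape (2 * dim, dim); b_gate: array of shape (dim,)
    Hypothetical stand-in for a fusion module, not the published design.
    """
    joint = np.concatenate([llm_feat, aux_feat], axis=-1)
    gate = sigmoid(joint @ w_gate + b_gate)  # each entry lies in (0, 1)
    # Convex combination: the gate decides, per dimension, which source to trust.
    return gate * llm_feat + (1.0 - gate) * aux_feat

# Toy inputs standing in for language-model embeddings and other descriptors.
rng = np.random.default_rng(0)
dim = 8
llm_feat = rng.normal(size=(4, dim))
aux_feat = rng.normal(size=(4, dim))
w_gate = rng.normal(scale=0.1, size=(2 * dim, dim))
b_gate = np.zeros(dim)

fused = gated_fusion(llm_feat, aux_feat, w_gate, b_gate)
```

Because the output is a convex combination, every fused value lies between the two input values for that dimension; with all-zero gate weights the result reduces to a plain average of the two sources.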
Created:
2025-09-01



