
Towards Generalizable In Silico Predictions of Differential Ion Mobility Using Machine Learning and Customized Fingerprint Engineering

Source: Figshare. Updated: 2025-04-10. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Towards_Generalizable_i_In_Silico_i_Predictions_of_Differential_Ion_Mobility_Using_Machine_Learning_and_Customized_Fingerprint_Engineering/28768042
Description:
Differential mobility spectrometry (DMS), a tool for separating chemically similar species (including isomers), is readily coupled to mass spectrometry to improve selectivity in analytical workflows. DMS dispersion curves, which describe the dynamic mobility experienced by an ion in a gaseous environment, show the maximum ion transmission for an analyte through the DMS instrument as a function of the separation voltage (SV) and compensation voltage (CV) conditions. To date, there exists no fast, general prediction tool for the dispersion behavior of ions. Here, we demonstrate a machine learning (ML) model that achieves generalized dispersion prediction using an in silico feature addition pipeline. We employ a data set containing 1141 dispersion curve measurements of anions and cations recorded in pure N₂ environments and in N₂ environments doped with 1.5% methanol (MeOH). Our feature addition pipeline can compute 1591 RDKit and Mordred descriptors using only SMILES codes, which are then normalized to sampled molecular distributions (n = 100 000) using cumulative density functions (CDFs). This tool can be thought of as a “learned” feature fingerprint generation pipeline, which could be applied to almost any molecular (bio)cheminformatics task. Our best performing model, which for the first time considers solvent-modified environments, has a mean absolute error (MAE) of 2.1 ± 0.2 V for dispersion curve prediction, a significant improvement over the previous state-of-the-art work. We use explainability techniques (e.g., SHAP analysis) to show that this feature addition pipeline is a semideterministic process for feature sets, and we discuss “best practices” to understand feature sets and maximize model performance. We expect that this tool could be used for prescreening to accelerate or even automate the use of DMS in complex analytical workflows (e.g., 2D LC×DMS separation), to perform automated identification of transmission windows, and to increase the “self-driving” potential of the instrument. We make our models available as a free and accessible tool at https://github.com/HopkinsLaboratory/DispersionCurveGUI.
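The CDF normalization step mentioned in the description can be illustrated with a minimal sketch. It assumes an empirical CDF built from a reference sample of descriptor values; the function names (`fit_empirical_cdf`, `normalize_descriptors`), the descriptor names, and the toy numbers are hypothetical and are not taken from the authors' code or data. The published pipeline may fit its CDFs differently.

```python
from bisect import bisect_right


def fit_empirical_cdf(reference_values):
    """Build an empirical CDF from a reference sample of one descriptor.

    Hypothetical helper: the returned function maps a raw descriptor value
    to the fraction of reference values that are <= that value.
    """
    ordered = sorted(reference_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("reference sample must not be empty")

    def cdf(value):
        # bisect_right counts how many sorted reference values are <= value.
        return bisect_right(ordered, value) / n

    return cdf


def normalize_descriptors(raw, reference):
    """Map each raw descriptor onto [0, 1] using its reference distribution.

    raw:       dict of descriptor name -> value for one molecule
    reference: dict of descriptor name -> list of sampled values
    """
    return {
        name: fit_empirical_cdf(reference[name])(value)
        for name, value in raw.items()
    }


# Toy reference distributions (illustrative numbers only, not real data).
reference = {
    "mol_weight": [100.0, 150.0, 200.0, 250.0],
    "logp": [-1.0, 0.0, 1.0, 2.0],
}
molecule = {"mol_weight": 200.0, "logp": -2.0}

normalized = normalize_descriptors(molecule, reference)
print(normalized)
```

Mapping every descriptor through a CDF places all features on a common [0, 1] scale regardless of their original units, which is one plausible reason to normalize against a large sampled molecular distribution rather than against the training set alone.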
Created:
2025-04-10