Active Learning Improves Ionization Efficiency Predictions and Quantification in Nontargeted LC/HRMS

Source: Figshare | Updated: 2025-06-13 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Active_Learning_Improves_Ionization_Efficiency_Predictions_and_Quantification_in_Nontargeted_LC_HRMS/29316829
Description:
Liquid chromatography electrospray ionization high-resolution mass spectrometry (LC/ESI/HRMS) is frequently employed in nontargeted screening (NTS) due to its high selectivity and sensitivity. However, data interpretation is challenging since the number of chemical standards available for quantification is limited and the response of the chemicals vastly differs depending on their structure and analysis conditions. Therefore, machine learning (ML) models have been utilized to predict ionization efficiency (IE) and enable the quantification of detected chemicals. It has been observed that the error in the predictions is high for chemicals structurally different from the training data. To enlarge the training set and to accurately predict the IE given a limited labeling budget, active learning (AL) is proposed to acquire informative data points from the targeted chemical space. In the current study, four AL approaches (clustering-based, uncertainty-based, mix, and anticlustering) and a baseline approach (random) were evaluated for IE prediction. The RMSE of the IE in the targeted space dropped significantly (up to 0.3 log units) after a single AL iteration, highlighting the necessity of chemical space exploration before ML model execution. Clustering-based AL reduced the RMSE least, while the uncertainty-based AL was inefficient if ten or more chemicals were sampled in one iteration, thereby reducing its practicality. Finally, expanding the chemical space improved the quantification accuracy from a fold error of 4.13× to 2.94× for five natural products in Alpinia officinarum, thereby demonstrating the need for updating the chemical space coverage of the training set.
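The quantities named in the description (RMSE in log units, fold error, and uncertainty-based selection of chemicals to label) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and using the spread of an ensemble's predictions as the uncertainty measure is an assumption made for illustration; the description does not say how uncertainty was estimated.

```python
import math
import statistics


def fold_error(predicted, measured):
    """Fold error between two positive concentrations; always >= 1."""
    if predicted <= 0 or measured <= 0:
        raise ValueError("concentrations must be positive")
    ratio = predicted / measured
    return max(ratio, 1.0 / ratio)


def rmse_log(predicted_log_ie, measured_log_ie):
    """RMSE between predicted and measured log IE values, in log units."""
    if len(predicted_log_ie) != len(measured_log_ie) or not predicted_log_ie:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted_log_ie, measured_log_ie)]
    return math.sqrt(sum(squared) / len(squared))


def select_most_uncertain(ensemble_predictions, k):
    """Pick the k candidates whose ensemble predictions disagree most.

    ensemble_predictions maps a candidate name to the list of log IE
    predictions from each ensemble member (an assumed uncertainty proxy).
    """
    spread = {
        name: statistics.pstdev(preds)
        for name, preds in ensemble_predictions.items()
    }
    ranked = sorted(spread, key=lambda name: spread[name], reverse=True)
    return ranked[:k]
```

In an active learning iteration of this kind, the selected candidates would be measured, added to the training set, and the model retrained before the RMSE on the targeted chemical space is evaluated again.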
Created:
2025-06-13