Prediction of Retention Time by Combining Multiple Data Sets with Chromatographic Parameter Vectorization and Transfer Learning
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://figshare.com/articles/dataset/Prediction_of_Retention_Time_by_Combining_Multiple_Data_Sets_with_Chromatographic_Parameter_Vectorization_and_Transfer_Learning/29769672
Description:
Retention time (RT) can provide orthogonal information to mass
spectra, supporting the qualitative identification. However, RT is
influenced by experimental conditions and column parameters, and it
is difficult to have a large amount of RT data in the user’s
experimental conditions. Hence, various machine learning methods,
including advanced deep learning approaches, have been developed for
RT prediction. However, most of them were limited to a given column
and operational conditions. In the meantime, data sparsity often hinders
the prediction performance. In this study, we propose an MDL-TL method
that combines multiple data sets to jointly train the base model.
MDL-TL vectorizes the column and conditions (chromatographic parameters,
CPs) using word2vec and autoencoders, and distinguishes the data sets
from different chromatographic experiments by including the CPs in
the compound representation. This not only augments the data but
also introduces the CPs into the RT prediction, allowing the pretrained
model to be efficiently transferred to different target systems by
fine-tuning. MDL-TL was evaluated against five popular deep learning
approaches and four machine learning approaches on 14 reversed-phase
liquid chromatography data sets and 14 hydrophilic interaction liquid
chromatography data sets, respectively. The results show that our
method surpassed the compared methods, including transfer learning
methods based on the METLIN small molecule retention time (SMRT) data
set, in mean absolute error, median absolute error, mean relative
error, and R² in most cases, demonstrating
that MDL-TL is a promising approach for predicting RTs for various
chromatographic systems and operational conditions.
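
The four evaluation metrics named in the description can be illustrated with a minimal Python sketch. It uses the usual textbook definitions, which is an assumption: the listing does not state the exact formulas the authors used (for example, whether mean relative error is reported as a fraction or a percentage). The function name `rt_metrics` and the sample values are hypothetical and are not taken from the data set.

```python
from statistics import mean, median


def rt_metrics(y_true, y_pred):
    """Compute MAE, MedAE, MRE and R^2 for paired retention times.

    Assumed (textbook) definitions, not confirmed by the source:
      MAE   = mean(|t - p|)
      MedAE = median(|t - p|)
      MRE   = mean(|t - p| / |t|), returned as a fraction
      R^2   = 1 - SS_res / SS_tot
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    if any(t == 0 for t in y_true):
        raise ValueError("relative error is undefined for a true RT of zero")

    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    rel_err = [e / abs(t) for e, t in zip(abs_err, y_true)]

    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true RTs are identical")

    return {
        "mae": mean(abs_err),
        "medae": median(abs_err),
        "mre": mean(rel_err),
        "r2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    # Hypothetical retention times, illustrative values only.
    observed = [1.0, 2.0, 4.0]
    predicted = [1.5, 2.0, 3.0]
    print(rt_metrics(observed, predicted))
```

Lower values are better for the three error metrics, while R² is better when closer to 1, so a method that "surpasses" another on all four has smaller errors and a larger R².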
Created:
2025-08-01



