Retip: Retention Time Prediction for Compound Annotation in Untargeted Metabolomics
NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://figshare.com/articles/dataset/Retip_Retention_Time_Prediction_for_Compound_Annotation_in_Untargeted_Metabolomics/12350204
Resource summary:
Unidentified
peaks remain a major problem in untargeted metabolomics
by LC-MS/MS. Confidence in peak annotations increases by combining
MS/MS matching and retention time. Here we show how retention times
can be predicted from molecular structures. Two large, publicly available
data sets were used for model training in machine learning: the Fiehn
hydrophilic interaction liquid chromatography data set (HILIC) of
981 primary metabolites and biogenic amines, and the RIKEN plant specialized
metabolome annotation (PlaSMA) database of 852 secondary metabolites
that uses reversed-phase liquid chromatography (RPLC). Five different
machine learning algorithms have been integrated into the Retip R
package: the random forest, Bayesian-regularized neural network, XGBoost,
light gradient-boosting machine (LightGBM), and Keras algorithms for
building the retention time prediction models. A complete workflow
for retention time prediction was developed in R. It can be freely
downloaded from the GitHub repository (https://www.retip.app). Keras outperformed other machine learning
algorithms on the test set with minimal overfitting, as shown by the small
differences in error between the training, test, and validation sets. Keras
yielded a mean absolute error of 0.78 min for HILIC and 0.57 min for
RPLC. Retip is integrated into the mass spectrometry software tools
MS-DIAL and MS-FINDER, allowing a complete compound annotation workflow.
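
The mean absolute error (MAE) quoted above is the average absolute difference between observed and predicted retention times, and the overfitting check compares that error across the training, test, and validation sets. The Python sketch below illustrates both calculations under stated assumptions: the function names and all retention times are hypothetical and are not taken from Retip (an R package) or its data sets, and "gap" here simply means the largest MAE difference between any two splits.

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """MAE in minutes between observed and predicted retention times."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def overfitting_gap(mae_by_split):
    """Largest MAE difference between any two splits (hypothetical helper).

    A gap that is small relative to the MAE itself suggests the model
    generalizes instead of memorizing the training data.
    """
    values = list(mae_by_split.values())
    return max(values) - min(values)


# Hypothetical (observed, predicted) retention times in minutes, illustrative only
splits = {
    "train": ([1.0, 2.5, 4.0, 6.2], [1.2, 2.3, 4.4, 6.0]),
    "test": ([0.8, 3.1, 5.5], [1.1, 2.8, 5.8]),
    "validation": ([2.0, 3.6, 7.1], [2.3, 3.4, 6.7]),
}

maes = {name: mean_absolute_error(obs, pred) for name, (obs, pred) in splits.items()}
for name, value in maes.items():
    print(f"{name}: MAE = {value:.2f} min")
print(f"largest gap between splits: {overfitting_gap(maes):.2f} min")
```

With these made-up values the training MAE is 0.25 min, the test and validation MAEs are 0.30 min, and the gap is 0.05 min, which is the pattern a low-overfitting model would show.
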
In a test application on mouse blood plasma samples, we found a 68%
reduction in the number of candidate structures when searching all
isomers in the MS-FINDER compound identification software. Retention time
prediction increases the identification rate in liquid chromatography
and subsequently leads to improved biological interpretation of
metabolomics data.
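
The 68% figure counts how many candidate structures remain once retention time is taken into account. One simple way to use a predicted retention time is a hard tolerance window around the observed peak, sketched below in Python. This is an assumption for illustration only: the abstract does not say how MS-FINDER applies predicted retention times, and the candidate names, retention times, and the 1.0 min window are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate structure (e.g. one isomer) with its predicted retention time in minutes."""
    name: str
    predicted_rt: float


def filter_by_retention_time(candidates, observed_rt, tolerance):
    """Keep candidates whose predicted RT lies within +/- tolerance minutes of the observed RT."""
    return [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance]


def percent_reduction(n_before, n_after):
    """Percentage of candidates removed by the filter."""
    if n_before <= 0:
        raise ValueError("n_before must be positive")
    return 100.0 * (n_before - n_after) / n_before


# Hypothetical isomers sharing one molecular formula (illustrative values only)
isomers = [
    Candidate("isomer_A", 4.6),
    Candidate("isomer_B", 5.8),
    Candidate("isomer_C", 2.1),
    Candidate("isomer_D", 9.3),
    Candidate("isomer_E", 5.4),
]
observed_rt = 5.0  # minutes, retention time of the measured peak
tolerance = 1.0    # minutes, hypothetical window half-width

kept = filter_by_retention_time(isomers, observed_rt, tolerance)
print([c.name for c in kept])
print(f"{percent_reduction(len(isomers), len(kept)):.0f}% fewer candidates")
```

Here three of the five hypothetical isomers survive, a 40% reduction. The window width trades missed true annotations against leftover false candidates; tying it to the model's error (for example, a multiple of the MAE) is one possible choice, not one stated in the abstract.
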
Created:
2020-05-11



