DataSheet4_Predicting the microalgae lipid profile obtained by supercritical fluid extraction using a machine learning model.pdf
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/DataSheet4_Predicting_the_microalgae_lipid_profile_obtained_by_supercritical_fluid_extraction_using_a_machine_learning_model_pdf/27300156
Description:
In this study, a machine learning model was employed to predict the lipid profile obtained by supercritical fluid extraction (SFE) of the microalga Galdieria sp. USBA-GBX-832 under different temperature (40, 50, 60 °C), pressure (150, 250 bar), and ethanol flow rate (0.6, 0.9 mL min⁻¹) conditions. Six machine learning regression models were trained using 33 independent variables: 29 from RD-Kit molecular descriptors, three from the extraction conditions, and the infinite dilution activity coefficient (IDAC). The lipidomic characterization identified 139 features, 89 of which were annotated as lipids and used as the entries of the model; these were primarily glycerophospholipids and glycerolipids. A methodology was proposed for selecting representative lipids from the lipidomic analysis using an unsupervised learning method, and its results were compared with Tanimoto scores and with IDAC calculations using the COSMO-SAC-HB2 model. The models based on decision trees, particularly XGBoost, outperformed the others (RMSE: 0.035, 0.095, and 0.065; coefficient of determination (R²): 0.971, 0.933, and 0.946 for training, test, and experimental validation, respectively), accurately predicting lipid profiles for unseen conditions. Machine learning methods provide a cost-effective way to optimize SFE conditions and are applicable to other biological samples.
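The description reports model quality as RMSE and the coefficient of determination (R²). As a reading aid, the sketch below shows how these two metrics are conventionally computed. It is a minimal illustration and not the authors' code: the function names and the sample values are hypothetical and are not taken from the dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not data from the study).
observed = [0.10, 0.40, 0.25, 0.80]
predicted = [0.12, 0.35, 0.30, 0.78]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

A lower RMSE and an R² closer to 1 indicate a closer fit, which is how the training, test, and experimental validation figures quoted in the description can be read.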
Created:
2024-10-25



