
Interpretable Yield Prediction of Supercritical CO2 Extraction from Various Essential Oil Sources Using Optimized Machine Learning and PCA-Based Descriptors

Source: Figshare. Updated: 2025-12-15. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Interpretable_Yield_Prediction_of_Supercritical_CO_sub_2_sub_Extraction_from_Various_Essential_Oil_Sources_Using_Optimized_Machine_Learning_and_PCA-Based_Descriptors/30882638
Description:
Predicting essential oil yield in supercritical CO2 (SC–CO2) extraction remains difficult due to variations in plant composition and process conditions. Conventional models often assume uniform feedstock behavior, which limits their applicability across diverse species. This study develops machine learning models that integrate extraction parameters with principal component analysis (PCA)-based molecular descriptors representing the seven major compounds of each essential oil source. A data set of 1313 experimental records from 42 plant species was compiled to train three algorithms: LightGBM (LGBMR), HistGradientBoosting (HGBR), and Extra Trees (ETR). The models were optimized using four metaheuristic algorithms to improve their predictive accuracy. All models achieved high predictive performance (R² > 0.97). The ETR model optimized by a genetic algorithm (ETR-3PCs-GA) attained the highest performance (R² = 0.9808, root-mean-square error (RMSE) = 0.7802), while the HGBR model with two principal components and GA optimization (HGBR-2PCs-GA) demonstrated superior ability to predict dynamic extraction profiles (RMSE = 0.408). SHapley Additive exPlanations (SHAP) analysis identified pressure and selected PCA coordinates as the most influential features, revealing that both process parameters and molecular composition jointly determine extraction efficiency. The model successfully generalized yield prediction across species and reproduced known process trends, such as the positive effects of pressure and flow rate on yield. The findings also indicate a synergistic effect, whereby the entire molecular profile, not just the most abundant compounds, governs the final yield. This approach demonstrates that integrating molecular-level information with process data can provide transferable, interpretable models for optimizing SC–CO2 extraction of essential oils.
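The two performance figures quoted in the description, the coefficient of determination and the root-mean-square error, follow standard definitions. The sketch below shows one way to compute them in plain Python. The function names and the sample yields are hypothetical and chosen only for illustration; they are not taken from the data set or from the authors' code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted yields."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields (made-up numbers, NOT from the data set).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print("RMSE:", rmse(observed, predicted))
print("R^2:", r2_score(observed, predicted))
```

RMSE is expressed in the same units as the yield itself, so it depends on the scale of the target, whereas the coefficient of determination is dimensionless. This is presumably why the description reports both.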
Created:
2025-12-15