Table 1_Composition-centered prediction of kenaf core saccharification for next-generation bioethanol via machine learning.docx
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Table_1_Composition-centered_prediction_of_kenaf_core_saccharification_for_next-generation_bioethanol_via_machine_learning_docx/30673880
Description:
Introduction: Biomass pretreatment outcomes are heterogeneous across routes and severities, and condition-centered empirical models often fail to generalize beyond the settings on which they were trained. This limits early-stage decisions about where to focus costly wet-lab effort. This study evaluates a composition-centered surrogate for kenaf core that takes the post-pretreatment solid composition (cellulose, hemicellulose, and lignin) as the input space and predicts enzymatic glucose yield as the response.
Methods: Kenaf core solids subjected to water, dilute-acid, and alkaline pretreatments were characterized for post-pretreatment cellulose, hemicellulose, and lignin contents and hydrolyzed under a fixed enzymatic protocol to obtain glucose yield at 24 h. The curated dataset (n = 35) was used to train Random Forest regressors tuned by six hyperparameter optimizers (grid search, random search, Bayesian optimization, genetic algorithm, particle swarm optimization, and simulated annealing). Generalization performance was assessed using nested cross-validation and a held-out test split, with feature contributions examined via permutation importance and accumulated local effects.
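The tuning and evaluation loop described in Methods can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the study's code. The data are synthetic placeholders generated on the spot, the composition ranges, parameter grid, fold counts, and split ratio are all assumptions, and only grid search (one of the six optimizers named) is shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import r2_score
from sklearn.model_selection import (
    GridSearchCV, KFold, cross_val_score, train_test_split,
)

rng = np.random.default_rng(0)

# SYNTHETIC placeholder data (n = 35); NOT the study's measurements.
n = 35
cellulose = rng.uniform(35, 65, n)       # wt% (assumed range)
hemicellulose = rng.uniform(5, 30, n)    # wt% (assumed range)
lignin = rng.uniform(10, 30, n)          # wt% (assumed range)
X = np.column_stack([cellulose, hemicellulose, lignin])
# Made-up response, used only so the example runs end to end.
y = 0.6 * cellulose - 0.5 * lignin + 0.05 * hemicellulose + rng.normal(0, 3, n)
feature_names = ["cellulose", "hemicellulose", "lignin"]

# Held-out test split (the 80/20 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Inner loop: hyperparameter search. Outer loop: generalization estimate.
param_grid = {"max_depth": [2, None], "min_samples_leaf": [1, 3]}
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    cv=inner_cv,
    scoring="neg_root_mean_squared_error",
)

# Nested CV: the whole search is refit inside every outer fold.
nested_rmse = -cross_val_score(
    search, X_train, y_train, cv=outer_cv,
    scoring="neg_root_mean_squared_error",
)

# Final fit on the training split, then score once on the held-out split.
search.fit(X_train, y_train)
test_r2 = r2_score(y_test, search.predict(X_test))

# Permutation importance of each composition feature on the held-out split.
importances = permutation_importance(
    search.best_estimator_, X_test, y_test, n_repeats=20, random_state=0
)

print("nested RMSE per outer fold:", np.round(nested_rmse, 2))
print("held-out R^2:", round(test_r2, 2))
for name, mean in zip(feature_names, importances.importances_mean):
    print(f"{name}: {mean:.3f}")
```

Wrapping the search object inside `cross_val_score` keeps hyperparameter selection out of the outer validation folds, which is the point of nesting when the sample is this small.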
Results: Across optimizers, held-out performance clustered tightly (test R² ≈ 0.49–0.55; RMSE 4.42–4.69 GY%), indicating that attainable accuracy is governed more by model capacity and data coverage than by optimizer choice. Feature diagnostics converged on a cellulose-led mechanism, with cellulose showing a positive monotonic effect on yield, lignin a negative effect, and hemicellulose a weaker, context-dependent influence. Iso-yield maps in the cellulose–lignin plane delineated feasible composition windows that prioritize high-cellulose/low-lignin regions under different hemicellulose levels.
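One way to build an iso-yield map like those mentioned in Results is to evaluate a fitted surrogate on a cellulose–lignin grid at a fixed hemicellulose level. The sketch below assumes scikit-learn and uses synthetic placeholder data. The helper `iso_yield_grid`, the axis ranges, the hemicellulose level, and the top-quartile cutoff are hypothetical choices for illustration, not values from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# SYNTHETIC placeholder data (n = 35); NOT the study's measurements.
n = 35
X = np.column_stack([
    rng.uniform(35, 65, n),   # cellulose, wt% (assumed range)
    rng.uniform(5, 30, n),    # hemicellulose, wt% (assumed range)
    rng.uniform(10, 30, n),   # lignin, wt% (assumed range)
])
y = 0.6 * X[:, 0] - 0.5 * X[:, 2] + 0.05 * X[:, 1] + rng.normal(0, 3, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)


def iso_yield_grid(model, hemicellulose_level, n_points=25):
    """Predict yield over a cellulose-lignin grid at one hemicellulose level."""
    c_axis = np.linspace(35, 65, n_points)   # cellulose axis (assumed range)
    l_axis = np.linspace(10, 30, n_points)   # lignin axis (assumed range)
    cc, ll = np.meshgrid(c_axis, l_axis)     # rows follow lignin, columns cellulose
    grid = np.column_stack([
        cc.ravel(),
        np.full(cc.size, hemicellulose_level),
        ll.ravel(),
    ])
    return c_axis, l_axis, model.predict(grid).reshape(cc.shape)


c_axis, l_axis, yield_map = iso_yield_grid(model, hemicellulose_level=15.0)

# Hypothetical "feasible window": cells in the top quartile of predicted yield.
threshold = np.quantile(yield_map, 0.75)
window = yield_map >= threshold
print("cells inside the window:", int(window.sum()), "of", window.size)
```

The returned arrays can be passed directly to a contour plotter, for example `matplotlib.pyplot.contour(c_axis, l_axis, yield_map)`, and the function can be called again with other hemicellulose levels to compare windows.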
Discussion: Within this accuracy band, the composition-centered surrogate is best suited for uncertainty-aware screening to prune unproductive regions of composition space before targeted design-of-experiments, rather than replacing detailed process optimization. The workflow provides a transferable template for small-sample, composition-based modeling of lignocellulosic feedstocks and can be extended to other varieties and integrated with mechanistic descriptors as data accumulate.
Created:
2025-11-21



