A Three-Module Machine Learning Framework for Protein Sequence- and Temperature-Dependent kcat/Km Prediction in β‑Glucosidases
Source: Figshare | Updated: 2025-10-02 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/A_Three-Module_Machine_Learning_Framework_for_Protein_Sequence-_and_Temperature-Dependent_i_k_i_sub_cat_sub_i_K_i_sub_m_sub_Prediction_in_Glucosidases/30266737
Description:
The catalytic activity of enzymes is intricately determined by their amino acid sequences and assay conditions, particularly temperature. Navigating the complex interplay among sequence, temperature, and catalytic function is crucial for unlocking a multitude of enzyme applications. Machine learning (ML) has recently emerged as a tool for quantitative prediction of enzyme activity from protein sequences. Unfortunately, ML models designed to predict the comprehensive enzyme activity parameter, kcat/Km, from protein sequences are rare compared to those predicting kcat or Km alone. Combining both protein sequence and temperature as input features further challenges predictions; no current ML models capture the nonlinear relationship between kcat/Km and temperature for a protein sequence of interest. In this study, we developed a unique three-module ML framework that predicts β-glucosidase kcat/Km values based on protein sequence and temperature. Each module was designed to capture a distinct aspect of the interplay among protein sequence, temperature, and kcat/Km for β-glucosidase activity; when integrated, they formed an ML framework that maps the sequence and temperature spaces associated with β-glucosidase kcat/Km. This modular approach allowed the ML models within each module to be optimized separately, collectively achieving notable generalization performance when predicting temperature-dependent kcat/Km values for protein sequences not encountered during training. Our findings underscore the advantages of the three-module framework over traditional single-module methods, particularly in reducing prediction variability due to data splitting and in mitigating overfitting. We anticipate that our multimodule ML framework will be directly applicable to other complex systems, enabling quantitative exploration of their property domains.
Created:
2025-10-02
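
The description mentions evaluating predictions for protein sequences not encountered during training, and notes that results can vary with how the data are split. The sketch below is only an illustration of that idea, not the authors' code: it holds out whole sequences so that every temperature measurement of a test sequence stays out of the training set. The function name `split_by_sequence`, the record fields (`sequence`, `temperature_c`, `log_kcat_km`), and the toy records are all hypothetical placeholders and do not come from the dataset.

```python
import random
from collections import defaultdict


def split_by_sequence(records, test_fraction=0.2, seed=0):
    """Split records so that no sequence appears in both train and test.

    `records` is a list of dicts with a "sequence" key (hypothetical schema).
    """
    # Group all measurements (one per temperature) under their sequence.
    by_seq = defaultdict(list)
    for rec in records:
        by_seq[rec["sequence"]].append(rec)

    # Shuffle the unique sequences reproducibly, then hold out a fraction.
    sequences = sorted(by_seq)
    rng = random.Random(seed)
    rng.shuffle(sequences)
    n_test = max(1, round(len(sequences) * test_fraction))

    test = [r for s in sequences[:n_test] for r in by_seq[s]]
    train = [r for s in sequences[n_test:] for r in by_seq[s]]
    return train, test


# Toy placeholder records (not real measurements): 5 sequences x 3 temperatures.
records = [
    {"sequence": f"SEQ{i}", "temperature_c": t, "log_kcat_km": 0.0}
    for i in range(5)
    for t in (30, 40, 50)
]

train, test = split_by_sequence(records, test_fraction=0.2, seed=0)
train_seqs = {r["sequence"] for r in train}
test_seqs = {r["sequence"] for r in test}
```

Repeating the split with different `seed` values and comparing the resulting test scores is one simple way to gauge how much a model's reported performance depends on the particular split.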



