Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/Machine-Learning-Guided_Library_Design_Cycle_for_Directed_Evolution_of_Enzymes_The_Effects_of_Training_Data_Composition_on_Sequence_Space_Exploration/17049469
Description:
Machine learning (ML) is becoming an attractive tool in mutagenesis-based
protein engineering because of its ability to design a variant library
containing proteins with a desired function. However, it remains unclear
how ML guides directed evolution in sequence space depending on the
composition of training data. Here, we present an ML-guided directed
evolution study of an enzyme to investigate the effects of a known
“highly positive” variant (i.e., variant known to have
high enzyme activity) in training data. We performed two separate
series of ML-guided directed evolution of Sortase A with and without
a known highly positive variant called 5M in training data. In each
series, two rounds of ML were conducted: variants predicted by the
initial round were experimentally evaluated and used as additional
training data for the second round of prediction. The improvements
in enzyme activity were comparable between the two series, both achieving
enzyme activity 2.2–2.5 times higher than 5M. Intriguingly,
the sequences of the improved variants were largely different between
the two series, indicating that ML guided the directed evolution to
distinct regions of sequence space depending on the presence/absence
of the highly positive variant in the training data. This suggests
that the sequence diversity of improved variants can be expanded not
only by conventional ML using the whole training data but also by
ML using a subset of the training data even when it lacks highly positive
variants. In summary, this study demonstrates the importance of regulating
the composition of training data in ML-guided directed evolution.
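
The two-round cycle described above can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the authors' pipeline: the three-letter alphabet, the four-position sequences, the `toy_assay` function (a stand-in for the wet-lab activity measurement), the site-wise averaging model, and the variant named `DDDD` (playing the role of a known highly positive variant such as 5M) are all hypothetical. The abstract does not specify the actual model or library size.

```python
import itertools
from collections import defaultdict

ALPHABET = "ACD"  # toy residue alphabet (assumption, not from the study)
LENGTH = 4        # toy number of mutated positions (assumption)


def toy_assay(seq):
    """Hypothetical stand-in for the experimental activity measurement."""
    score = 1.0 + 0.5 * seq.count("D")
    if seq[0] == "A" and seq[-1] == "C":
        score += 1.0
    return score


def fit_sitewise_model(data):
    """Fit a mean-activity table per (position, residue).

    A deliberately simple placeholder for whatever ML model is used.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, activity in data.items():
        for pos, res in enumerate(seq):
            sums[(pos, res)] += activity
            counts[(pos, res)] += 1
    overall = sum(data.values()) / len(data)

    def predict(seq):
        # Unseen (position, residue) pairs fall back to the overall mean.
        per_site = [
            sums[(p, r)] / counts[(p, r)] if counts[(p, r)] else overall
            for p, r in enumerate(seq)
        ]
        return sum(per_site) / len(per_site)

    return predict


def run_series(initial_data, rounds=2, batch_size=5):
    """Train, pick top predictions, 'measure' them, add them to the data."""
    data = dict(initial_data)
    pool = ["".join(s) for s in itertools.product(ALPHABET, repeat=LENGTH)]
    for _ in range(rounds):
        predict = fit_sitewise_model(data)
        untested = [s for s in pool if s not in data]
        # Highest predicted activity first; ties broken alphabetically.
        picks = sorted(untested, key=lambda s: (-predict(s), s))[:batch_size]
        for seq in picks:
            data[seq] = toy_assay(seq)
    return data


# Two series that differ only in whether the highly positive variant is present.
base = {s: toy_assay(s) for s in ["AAAA", "CCCC", "ACAC", "CACA", "AACC"]}
with_positive = dict(base, DDDD=toy_assay("DDDD"))

series_without = run_series(base)
series_with = run_series(with_positive)

new_without = set(series_without) - set(base)
new_with = set(series_with) - set(with_positive)
print("explored only without the positive variant:", sorted(new_without - new_with))
print("explored only with the positive variant:", sorted(new_with - new_without))
```

Comparing the two sets of newly explored sequences gives a rough sense of how the composition of the training data can steer which region of sequence space a model proposes next.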
Created:
2021-11-19



