
Machine Learning to Predict Continuous Protein Properties from Binary Cell Sorting Data and Map Unseen Sequence Space

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP484190
Summary:
Proteins are a diverse class of biomolecules responsible for wide-ranging cellular functions, from catalyzing reactions to recognizing pathogens. The ability to evolve proteins rapidly and inexpensively towards improved properties is a common objective for protein engineers. Powerful high-throughput methods like fluorescence-activated cell sorting and next-generation sequencing have dramatically improved directed evolution experiments. However, it is unclear how best to leverage these data to characterize protein fitness landscapes more completely and identify lead candidates. In this work, we develop a simple yet powerful framework to improve protein optimization by predicting continuous protein properties from simple directed evolution experiments using interpretable, linear machine learning models. Importantly, we find that these models, which use data from simple but imprecise experimental estimates of protein fitness, have predictive capabilities that approach those obtained from more precise but expensive data. Across five diverse protein engineering tasks, continuous properties are consistently predicted from readily available deep sequencing data, demonstrating that protein fitness space can be reasonably well modeled by linear relationships among sequence mutations. To prospectively test the utility of this approach, we generated a library of stapled peptides and applied the framework to predict affinity and specificity from simple cell sorting data. We then coupled integer linear programming, used here to optimize protein fitness given linear weights, with mutation scores from machine learning to identify new variants in unseen sequence space that have improved and co-optimal properties. This approach represents a versatile tool for improved analysis and identification of protein variants across many domains of protein engineering.
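The summary does not specify the models or the optimization formulation, so the following is only a minimal sketch of the general idea (additive per-mutation weights learned from binary sort labels, then optimized over sequence space), not the authors' method. It makes several loudly stated assumptions: it swaps in a naive-Bayes-style per-mutation log-odds score in place of the regression models the summary refers to; it replaces integer linear programming with exhaustive enumeration, which returns the same optimum only because the hypothetical 3-position toy library is tiny; and the names `mutation_log_odds`, `predict`, `best_variant`, `TRUE_EFFECTS`, and the simulated sort data are all hypothetical.

```python
import itertools
import math

# Hypothetical toy library: 3 randomized positions, wild type "AAA",
# and each position may carry A (wild type), K, or L.
WILD_TYPE = "AAA"
ALPHABET = "AKL"


def mutation_log_odds(seqs, sorted_labels, pseudocount=1.0):
    """Naive-Bayes-style additive weights (an assumption, not the paper's model).

    For each (position, residue), take the log-odds of appearing in the
    sorted pool (label 1) vs the unsorted pool (label 0), then subtract the
    wild-type residue's log-odds at that position. Pool sizes cancel in the
    difference, so raw counts are enough.
    """
    counts = {}  # (position, residue) -> [unsorted_count, sorted_count]
    for seq, label in zip(seqs, sorted_labels):
        for i, aa in enumerate(seq):
            counts.setdefault((i, aa), [0, 0])[label] += 1

    def log_odds(i, aa):
        n0, n1 = counts.get((i, aa), [0, 0])
        return math.log((n1 + pseudocount) / (n0 + pseudocount))

    return {
        (i, aa): log_odds(i, aa) - log_odds(i, WILD_TYPE[i])
        for i in range(len(WILD_TYPE))
        for aa in ALPHABET
        if aa != WILD_TYPE[i]
    }


def predict(weights, seq):
    """Linear (additive) score: sum of weights of every non-wild-type residue."""
    return sum(weights[(i, aa)] for i, aa in enumerate(seq) if aa != WILD_TYPE[i])


def best_variant(weights, max_mutations):
    """Exact optimum of the linear score under a mutation budget.

    Brute-force enumeration stands in for an ILP solver here; it is only
    feasible because the toy sequence space has 3**3 = 27 members.
    """
    feasible = (
        "".join(p)
        for p in itertools.product(ALPHABET, repeat=len(WILD_TYPE))
        if sum(a != w for a, w in zip(p, WILD_TYPE)) <= max_mutations
    )
    return max(feasible, key=lambda s: predict(weights, s))


# Hypothetical ground truth, used only to simulate a binary sort:
# a variant is "sorted" (label 1) if its additive effect sum is at least 1.
TRUE_EFFECTS = {(0, "K"): 2, (0, "L"): -1, (1, "K"): -2,
                (1, "L"): 1, (2, "K"): 1, (2, "L"): -1}

library = ["".join(p) for p in itertools.product(ALPHABET, repeat=len(WILD_TYPE))]
labels = [int(sum(TRUE_EFFECTS.get((i, aa), 0) for i, aa in enumerate(s)) >= 1)
          for s in library]

weights = mutation_log_odds(library, labels)
print(best_variant(weights, max_mutations=3))  # KLK
print(best_variant(weights, max_mutations=2))  # KAK
print(best_variant(weights, max_mutations=1))  # KAA
```

Even though each training label is only "sorted" or "not sorted", the learned weights are continuous and recover the signs of the simulated effects, and the top-scoring variant under each mutation budget is read straight from those weights. In a realistic setting the sequence space is far too large to enumerate, which is where an integer linear program (one binary variable per position/residue, one residue per position, a linear objective, plus extra linear constraints such as a floor on a second property's score for co-optimization) would replace `best_variant`.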
Created:
2024-01-19