List of variables.

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/List_of_variables_/27974097
Description:
A key question in protein evolution and protein engineering is the prevalence of evolutionary paths between distinct proteins. An evolutionary path corresponds to a continuous path of functional sequences in sequence space leading from one protein to another. Natural selection could direct a mutating coding region in DNA along a continuous functional path (CFP), so a new protein could arise far more easily than if a coding region were mutating randomly without any constraints. The distribution and length of CFPs undergird theories on the origin of natural proteins and strategies for engineering artificial proteins. This study examined the distribution of long CFPs within the framework of percolation theory, which addresses the proportion of randomly filled sites in a lattice above which long continuous paths of neighboring filled sites become common (the percolation threshold). It also used a simulation to demonstrate that the percolation threshold in protein sequence space approximates the reciprocal of the average number of protein variants that could result from a single mutation. For diverse proteins, the ratio was calculated between the percolation threshold and the proportion of sequences reported to perform a protein’s function, relative to the total number of sequences of that protein’s length. This ratio measures the biasing in the distribution of functional sequences required for evolutionary paths to exist, so it provides a means to quantify the specificity in protein sequence and structure required for a protein to develop new catalytic functions. The consistently high ratio demonstrates that CFPs can only connect distinct proteins if the biasing in the distribution of functional sequences in sequence space is often extremely large. Regions in sequence space are identified where the biasing is sufficient to allow for extensive CFPs.
The calculated levels of required biasing and the identified regions of high biasing reinforce the conclusion of previous studies that some proteins are highly optimized, so mutations can enable or enhance catalytic functions while maintaining the protein’s structure. The conclusions of this study also challenge the results of a previous application of percolation theory to sequence space that did not properly incorporate the percolation threshold. Steps are outlined for integrating the percolation threshold and the biasing measure into studies of protein sequence space.
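The two quantities described above can be illustrated with a minimal sketch. It is not the study's own code, and it rests on stated assumptions: every position is taken to have 19 alternative amino acids (the study may instead count only variants reachable by a single nucleotide change), the ratio is taken as threshold divided by functional fraction, and the function names and the example functional fraction are hypothetical placeholders rather than values from the dataset.

```python
from fractions import Fraction


def percolation_threshold(length: int, variants_per_site: int = 19) -> Fraction:
    """Approximate the threshold as 1 / (number of single-mutation variants).

    ASSUMPTION: each of the `length` positions can change to
    `variants_per_site` other residues, so a sequence has
    length * variants_per_site single-mutation neighbors.
    """
    if length <= 0 or variants_per_site <= 0:
        raise ValueError("length and variants_per_site must be positive")
    return Fraction(1, length * variants_per_site)


def biasing_ratio(threshold: Fraction, functional_fraction: Fraction) -> Fraction:
    """Ratio of the percolation threshold to the fraction of functional sequences.

    ASSUMPTION: the ratio is threshold / functional_fraction, so a larger
    value means more biasing is needed for long functional paths.
    """
    if functional_fraction <= 0:
        raise ValueError("functional_fraction must be positive")
    return threshold / functional_fraction


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not values from the study.
    threshold = percolation_threshold(length=100)
    ratio = biasing_ratio(threshold, Fraction(1, 10**20))
    print(f"threshold ~ {float(threshold):.3e}")
    print(f"required biasing ratio ~ {float(ratio):.3e}")
```

Exact fractions are used so that very small functional fractions do not lose precision before the final conversion to floating point for display.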
Created:
2024-12-05