The Rapid Evolution of De Novo Proteins in Structure and Complex

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10712835
Description:
Recent studies have established that de novo genes, evolving from non-coding sequences, enhance protein diversity through a stepwise process. However, the pattern and rate of their evolution in protein structure over time remain unclear. Here, we addressed these issues within a surprisingly short evolutionary timescale (<1 million years for 97% of Oryza de novo genes), using comparative approaches with gene duplicates as the reference. We found that de novo genes evolve faster than gene duplicates in intrinsically disordered regions (IDRs, such as random coils), secondary structural elements (such as α-helices and β-strands), hydrophobicity, and molecular recognition features (MoRFs). Specifically, in de novo proteins we observed an 8-14% decay in random coils and IDR lengths per million years per protein, and a 2.3-6.5% increase in structured elements, hydrophobicity, and MoRFs. These patterns of structural evolution also align with changes in amino acid composition over time. We further found that de novo proteins carry higher positive charges but have smaller molecular weights than duplicates. Tertiary structure predictions demonstrated that most de novo proteins, though not typically well-folded on their own, readily form low-energy and compact complexes with other proteins, facilitated by extensive residue contacts and conformational flexibility. This suggests a faster-binding scenario in which de novo proteins promote interaction. These analyses illuminate a rapid evolution of protein structure in de novo genes in rice genomes, originating from noncoding sequences, and highlight their quick transformation into active, protein complex-forming components within a remarkably short evolutionary timeframe.

duplicated protein-complex.zip: PDB files for complexes of duplicated proteins.
175denovo.structure.tar.gz: raw AF2 results (including PDB files) for de novo proteins.
de novo protein-complex.zip: PDB files for complexes of de novo proteins.
usscore.gene.pair.value.br.type: TM-scores among models of duplicated protein complexes and de novo protein complexes.
denovo.dup.rrcontacts.ratioofshortpeplen.freeenergy: residue-residue contacts per amino acid (relative to the length of the shorter protein) and free energy among complexes.
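The residue-residue contact measure named in the last file can be illustrated with a small sketch. This is not the authors' pipeline: the function name `contacts_per_residue`, the use of one representative coordinate (for example, a C-alpha atom) per residue, and the 8 Å distance cutoff are all assumptions made for illustration, since the record does not state how contacts were defined. The only element taken from the description is the normalization by the length of the shorter protein.

```python
import math


def contacts_per_residue(chain_a, chain_b, cutoff=8.0):
    """Count inter-chain residue pairs within `cutoff` angstroms and
    divide by the length of the shorter chain.

    chain_a, chain_b: lists of (x, y, z) tuples, one per residue
    (assumed here to be C-alpha coordinates; hypothetical choice).
    cutoff: contact distance in angstroms (hypothetical default).
    """
    if not chain_a or not chain_b:
        raise ValueError("both chains must contain at least one residue")
    contacts = sum(
        1
        for a in chain_a
        for b in chain_b
        if math.dist(a, b) <= cutoff
    )
    # Normalize by the shorter protein, as the file name suggests.
    return contacts / min(len(chain_a), len(chain_b))


# Toy coordinates, not taken from the dataset.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(0.0, 5.0, 0.0), (30.0, 0.0, 0.0)]
ratio = contacts_per_residue(chain_a, chain_b)
```

In practice the coordinates would be parsed from the PDB files in the archives above, and the cutoff and atom selection should be matched to whatever definition the authors used.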
Created:
2024-03-10