Expansion of Protein Domain Repeats
Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/Expansion_of_Protein_Domain_Repeats/152840
Description:
Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose, we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins, which mainly proceeds through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.
Created:
2016-01-18



