
Data_Sheet_2_Identification and Analysis of Long Repeats of Proteins at the Domain Level.xlsx

Indexed from NIAID Data Ecosystem, 2026-03-11
Download link:
https://figshare.com/articles/dataset/Data_Sheet_2_Identification_and_Analysis_of_Long_Repeats_of_Proteins_at_the_Domain_Level_xlsx/9959303
Description:
Amino acid repeats play an important role in the structure and function of proteins. Analysis of long repeats in protein sequences helps clarify their abundance, structure and function across the protein universe. In the present study, amino acid repeats of length >50 (long repeats) were identified in a non-redundant set of UniProt sequences using the RADAR program. The underlying structures and functions of these long repeats were analyzed using Gene3D for structural domains, Pfam for functional domains, and an enzyme and non-enzyme functional classification for the catalytic and binding functions of the proteins. From a structural perspective, these long repeats occur predominantly in certain architectures such as sandwich, bundle, barrel, and roll, and within these architectures they are abundant in the superfolds. The lengths of the repeats within each fold are not uniform, exhibiting different structures for different functions. We also observed that long repeats lie in the domain regions of the family and are involved in the function of the proteins. After grouping by enzyme and non-enzyme classes, we observed an abundant occurrence of long repeats in specific catalytic and binding functions of the proteins. In this study, we have analyzed the occurrence of long repeats in the protein sequence universe, apart from the well-characterized short tandem repeats, together with their structures and functions at the domain level. The present study suggests that long repeats may play an important role in the structure and function of protein domains.
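The length criterion in the description (repeats of length >50) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `RepeatUnit` record, its 1-based inclusive coordinates, and the example identifiers are hypothetical and do not reflect RADAR's actual output format or the layout of the data sheet.

```python
from dataclasses import dataclass

# Threshold taken from the description: long repeats have length > 50.
LONG_REPEAT_MIN_LENGTH = 50


@dataclass(frozen=True)
class RepeatUnit:
    """Hypothetical record for one detected repeat unit (not RADAR's format)."""
    protein_id: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)

    @property
    def length(self) -> int:
        # Inclusive coordinates, so both endpoints count toward the length.
        return self.end - self.start + 1


def select_long_repeats(units):
    """Keep only repeat units strictly longer than the threshold."""
    return [u for u in units if u.length > LONG_REPEAT_MIN_LENGTH]


if __name__ == "__main__":
    # Made-up example records, not taken from the dataset.
    example = [
        RepeatUnit("P_EXAMPLE_1", 10, 59),   # length 50, excluded
        RepeatUnit("P_EXAMPLE_1", 60, 110),  # length 51, kept
        RepeatUnit("P_EXAMPLE_2", 5, 30),    # length 26, excluded
    ]
    for unit in select_long_repeats(example):
        print(unit.protein_id, unit.start, unit.end, unit.length)
```

The comparison is strict because the description says ">50", so a repeat of exactly 50 residues is not counted as long.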

Created:
2019-10-09