Identification and Analysis of Genes and Pseudogenes within Duplicated Regions in the Human and Mouse Genomes
Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/Identification_and_Analysis_of_Genes_and_Pseudogenes_within_Duplicated_Regions_in_the_Human_and_Mouse_Genomes/152895
Description:
The identification and classification of genes and pseudogenes in duplicated regions remain a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the total of 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and are therefore likely to be novel mammalian genes. Some of these are partial duplicates less than half the length of their original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication and on the fate of duplicated genes.
Created:
2006-06-30



