
Pervasive Transcription of the Human Genome Produces Thousands of Previously Unidentified Long Intergenic Noncoding RNAs

Source: NIAID Data Ecosystem (indexed 2026-03-07)
Download link:
https://figshare.com/articles/dataset/_Pervasive_Transcription_of_the_Human_Genome_Produces_Thousands_of_Previously_Unidentified_Long_Intergenic_Noncoding_RNAs_/727537
Description:
Known protein coding gene exons compose less than 3% of the human genome. The remaining 97% is largely uncharted territory, with only a small fraction characterized. The recent observation of transcription in this intergenic territory has stimulated debate about the extent of intergenic transcription and whether these intergenic RNAs are functional. Here we directly observed with a large set of RNA-seq data covering a wide array of human tissue types that the majority of the genome is indeed transcribed, corroborating recent observations by the ENCODE project. Furthermore, using de novo transcriptome assembly of this RNA-seq data, we found that intergenic regions encode far more long intergenic noncoding RNAs (lincRNAs) than previously described, helping to resolve the discrepancy between the vast amount of observed intergenic transcription and the limited number of previously known lincRNAs. In total, we identified tens of thousands of putative lincRNAs expressed at a minimum of one copy per cell, significantly expanding upon prior lincRNA annotation sets. These lincRNAs are specifically regulated and conserved rather than being the product of transcriptional noise. In addition, lincRNAs are strongly enriched for trait-associated SNPs suggesting a new mechanism by which intergenic trait-associated regions may function. These findings will enable the discovery and interrogation of novel intergenic functional elements.

Created:
2013-06-20