A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA613060
Description:
Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms, despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-length transcript discovery and quantification, each displays distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets in both simple studies and larger projects. These properties will enable TALON users to move beyond the limitations of short-read data and perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.

Overall design: ONT sequencing of 2 replicates of GM12878
Created:
2020-03-17



