Improving the diversity of captured full-length isoforms using a normalized single-molecule RNA sequencing method. Homo sapiens

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA524771

Description:

Human genes give rise to a large variety of isoforms after transcription, yielding distinct transcripts that exert different functions. Single-molecule RNA sequencing facilitates accurate identification of isoforms by significantly extending read length. However, the mRNA molecules captured by sequencing represent gene and isoform diversity poorly, because gene expression levels vary widely and sequence output is relatively low. Here, we present a modified protocol that adds cDNA normalization before library preparation for PacBio RS II sequencing, thereby generating an increased number of molecules representing different isoforms. Validation sequencing of blood cells, and of gastric cancer and adjacent non-malignant tissues, showed 1.8-2.3 fold and 1.8-4.7 fold increases, respectively, in high-quality isoform species per 100,000 raw reads with the new cDNA normalization-based capture procedure, as compared to the procedure without normalization. The normalized libraries detected a substantially increased number of low-abundance transcripts encoding functionally important proteins such as transcription factors and kinases. In addition, we developed an allele-specific isoform identification and quantification tool (ASIIQT) for non-normalized next-generation RNA sequencing, which sequentially corrects, phases, and quantifies the isoforms identified by normalized single-molecule sequencing. Finally, we aim to provide proof-of-concept data establishing the superiority of the new RNA sequencing protocol and ASIIQT over existing protocols by profiling and comparing the transcriptomes of gastric signet-ring cell carcinomas and paired non-malignant gastric tissues and identifying new cancer-specific transcriptome signatures, thereby demonstrating the utility of the newly developed protocols in gene expression data analyses.
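The description reports isoform gains per 100,000 raw reads, which is a depth-normalized comparison between libraries. A minimal sketch of that arithmetic follows. The function names and the example counts are hypothetical and are not taken from the project data; the actual pipeline used by the authors is not described here.

```python
def isoforms_per_100k(isoform_count: int, raw_reads: int) -> float:
    """Scale an isoform count to a common depth of 100,000 raw reads."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return isoform_count * 100_000 / raw_reads


def fold_increase(norm_isoforms: int, norm_reads: int,
                  ctrl_isoforms: int, ctrl_reads: int) -> float:
    """Ratio of depth-scaled isoform counts: normalized library vs. control."""
    normalized = isoforms_per_100k(norm_isoforms, norm_reads)
    control = isoforms_per_100k(ctrl_isoforms, ctrl_reads)
    return normalized / control


# Hypothetical counts for illustration only (not values from PRJNA524771).
print(fold_increase(9_000, 300_000, 6_000, 400_000))  # prints 2.0
```

Scaling both libraries to the same read depth before taking the ratio keeps a deeper sequencing run from looking more diverse simply because it produced more reads.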

Created:

2019-02-28
