Supporting data for "Fusion transcripts and their genomic breakpoints in poly(A)+ and rRNA-minus RNA sequencing data"
Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100939
Description:
Fusion genes are typically identified by RNA-seq without elucidating the causal genomic breakpoints. However, non poly(A)-enriched RNA-seq contains large proportions of intronic reads that also span genomic breakpoints.
We have developed an algorithm, Dr. Disco, that searches for fusion transcripts without being restricted to splice junctions or annotated exons or genes. Using 1,275 RNA-seq samples, we investigated to what extent genomic breakpoints can be extracted from RNA-seq data and their implications regarding poly(A)+ and rRNA-minus RNA-seq data. Comparison with WGS data revealed that most genomic breakpoints are not, or only minimally, transcribed while, in contrast, the genomic breakpoints of all 32 TMPRSS2-ERG positive tumors were present at the RNA level. This revealed tumors in which the ERG breakpoint was located before ERG, which co-existed with additional deletions and mRNA that incorporated intergenic cryptic exons. In breast cancer and glioma samples we identified rearrangement hotspots near CCND1 and MDM2 and could directly associate these with increased expression. Furthermore, in all datasets we find fusions to intergenic regions, often spanning multiple cryptic exons that potentially encode neo-antigens. Thus, fusion transcripts other than classical gene-to-gene fusions are prominently present and can be identified using RNA-seq.
By using the full potential of non poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects.
Provider:
GigaScience Database
Created:
2021-11-01



