ARTDeco Output

Source: Figshare. Updated 2023-12-18; indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/ARTDeco_Output/24848265/1
Description:
<b>Transcriptome profiles from healthy human tissues</b>

RNA samples (BAM files) were accessed on 2021/04/01 from the Genotype-Tissue Expression (GTEx; release v8) project, available through the NCBI database of Genotypes and Phenotypes (dbGaP) <sup>17–19</sup>. Authorization was granted for dbGaP accession phs000424.v8.p2, to which the NIH Genomic Data Sharing Policy applies to protect the privacy of patients (all information is anonymized). The GTEx platform includes approximately 948 postmortem donors, from whom RNA samples from several tissues were isolated on an ongoing basis as donors were enrolled in the study. We considered only paired-end samples with at least 60 million reads per sample, prepared with the Illumina TruSeq library construction protocol (non-strand-specific, polyA+ selected library). Cell culture samples and tissues with fewer than 50 samples were excluded. Healthy subjects were selected by filtering samples for "violent and fast deaths" and "no terminal diseases". We obtained 2778 samples from 23 healthy human tissues, which were used for downstream analyses.

<b>Transcription readthrough detection</b>

To detect transcription readthrough (TRT), we first converted the BAM files downloaded from dbGaP back to FASTQ using samtools (v1.10) <sup>44</sup> and then re-aligned them to the reference genome (GRCh38 assembly; release 37, GRCh38.p13) using STAR (v2.7.8a) <sup>45</sup>. We then ran ARTDeco <sup>20</sup>, a pipeline for analyzing and characterizing transcriptional readthrough that uses a rolling-window approach to search for continuous coverage over a minimal length downstream of the 3′ end of each gene locus (annotation version 37, Ensembl 103). The transcription level of a window must meet the thresholds for that window to be considered part of the readthrough tail. We used a rolling window of 500 bp, a minimum length of 2000 bp, and a minimum coverage of 0.15 FPKM.
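The rolling-window criterion described above can be illustrated with a short Python sketch. This is not ARTDeco's implementation: it assumes that coverage has already been summarized as one FPKM value per consecutive, non-overlapping 500 bp window starting at the 3′ end of a gene, and that a window passes when its FPKM is at least the minimum. The function name and its input format are hypothetical.

```python
from typing import Sequence

# Thresholds quoted in the description.
WINDOW_BP = 500        # rolling window size
MIN_LENGTH_BP = 2000   # minimum readthrough tail length
MIN_FPKM = 0.15        # minimum coverage per window


def readthrough_tail_length(
    window_fpkm: Sequence[float],
    window_bp: int = WINDOW_BP,
    min_length_bp: int = MIN_LENGTH_BP,
    min_fpkm: float = MIN_FPKM,
) -> int:
    """Return the length (bp) of the continuous covered tail, or 0 if too short.

    window_fpkm holds one value per consecutive window downstream of the
    3' end (hypothetical input format; ARTDeco derives coverage itself).
    """
    n_windows = 0
    for fpkm in window_fpkm:
        if fpkm < min_fpkm:
            break  # coverage drops below threshold: the tail ends here
        n_windows += 1
    length = n_windows * window_bp
    # Assumption: tails shorter than the minimum length are not reported.
    return length if length >= min_length_bp else 0


if __name__ == "__main__":
    # Four passing windows (4 x 500 bp) reach the 2000 bp minimum.
    print(readthrough_tail_length([0.20, 0.30, 0.50, 0.20, 0.10]))
    # Only two passing windows: too short to count as readthrough.
    print(readthrough_tail_length([0.20, 0.20, 0.05, 0.40]))
```

Under these assumptions, a gene needs at least four consecutive passing windows (4 × 500 bp) to reach the 2000 bp minimum.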
ARTDeco uses HOMER's tools <sup>46</sup> to select only uniquely mapped reads for downstream analysis and returns a variety of metrics to measure readthrough. We used the information in the "quantification" and "dogs" folders (expression levels and novel transcripts created as a result of readthrough, respectively) for downstream analysis.

As GTEx samples were profiled using non-stranded RNA-seq libraries, a significant number of reads identified as downstream transcripts corresponded to reads coming from genes expressed in the opposite direction. Because transcriptional signal can come from either direction, ARTDeco cannot always infer a true downstream transcript unambiguously. To eliminate these dubious cases created by the lack of strandedness (designated as undefined genes), we filtered the ARTDeco output to report only entries that did not overlap genes on the opposite strand, using the intersect function from bedtools (v2.30.0) <sup>47</sup>. This approach discards readthrough (RT) transcripts with close downstream neighbors on the opposite strand but ensures that our list of readthrough genes is robust. In addition, only RT transcripts from genes expressed in each given tissue were considered for downstream analysis. Expressed genes were defined as those with FPKM > 1 in at least 25% of the samples of a given tissue.
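The two post-filters described above (no overlap with an opposite-strand gene; expression above FPKM 1 in at least 25% of a tissue's samples) can be sketched in plain Python. This is a stand-in for illustration only: the dataset authors used bedtools intersect, and the description does not state the exact options. The `Interval` type, the function names, and the half-open BED-style coordinates are assumptions.

```python
from typing import Iterable, NamedTuple, Sequence


class Interval(NamedTuple):
    """Genomic interval with BED-style coordinates (assumed convention)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps_opposite_strand(rt: Interval, genes: Iterable[Interval]) -> bool:
    """True if the readthrough interval overlaps any gene on the other strand."""
    return any(
        g.chrom == rt.chrom
        and {g.strand, rt.strand} == {"+", "-"}   # strictly opposite strands
        and g.start < rt.end and rt.start < g.end  # half-open overlap test
        for g in genes
    )


def is_expressed(
    fpkm_by_sample: Sequence[float],
    min_fpkm: float = 1.0,
    min_fraction: float = 0.25,
) -> bool:
    """True if FPKM > min_fpkm in at least min_fraction of the samples."""
    if not fpkm_by_sample:
        return False
    n_above = sum(1 for v in fpkm_by_sample if v > min_fpkm)
    return n_above / len(fpkm_by_sample) >= min_fraction


def keep_readthrough(
    rt: Interval, genes: Iterable[Interval], fpkm_by_sample: Sequence[float]
) -> bool:
    """Apply both filters to one readthrough transcript in one tissue."""
    return is_expressed(fpkm_by_sample) and not overlaps_opposite_strand(rt, genes)
```

A pure-Python scan like this is quadratic in the number of intervals; for genome-scale inputs an interval-aware tool such as bedtools is the practical choice, which is consistent with the approach the description reports.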
Provider:
Caldas, Paulo
Created:
2023-12-18