
Annotation- and Batch Effect Correction in TCGA IsomiR Expression Data [Third-party re-analysis]

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164767
Resource description:
The Cancer Genome Atlas (TCGA) Isoform Expression Quantification data is the largest publicly available resource of cancer sequencing data at isomiR level. Because the datasets were built up over years by different contributing institutions, they are not free of batch effects. We evaluated different batch correction approaches to remove these effects; details of the best-performing algorithm and batch variables are included in the supplementary file. Additionally, the annotated chromosomal end position of each isomiR feature was corrected by an offset of 1 to account for exclusive annotation.

Contents: limma/ComBat batch-corrected isomiR expression data for 16 TCGA projects. Please note that sample metadata is added as a Series supplementary file, which lists links to the GDC legacy portal/GDC harmonized data portal and the associated processed data files.

Re-analysis data processing steps:
Download of TCGA IsomiR Expression Quantification data (here, normalized expression [RPM] is used).
Feature annotation correction: end position -1.
Filtering for sequencing depth: exclude samples with a sequencing depth of < 1,000,000 total mapped reads.
For the *_tumor_FILTER.txt files: exclude normal samples from the analysis.
IsomiR expression filter: exclude isomiRs with no expression among all remaining samples in the cohort (sum 0 filter) or with a median expression below 15 RPM (median 15 filter).
Batch correction using the ComBat function (sva R package, version 3.32.1) or the removeBatchEffect function (limma R package, version 3.40.6). Different batch variables were used for sequential batch correction; the optimal combination of batch correction algorithm and batch variables was chosen after evaluating the success of the batch correction. Sample information for batch effect correction is available in the supplementary file.

Genome build: GRCh38
Processed data files: *.txt files containing batch-corrected isomiR expression data of different TCGA cohorts after (different) filtering steps
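
The annotation and filtering steps described above can be sketched in a few lines. This is an illustrative outline only, not the authors' pipeline (which was run in R): the coordinate string layout `build:chrom:start-end:strand`, the function names, and the toy values are all assumptions made for the example, and the batch correction step itself (ComBat or limma) is not reproduced here.

```python
from statistics import median

MIN_MAPPED_READS = 1_000_000  # depth threshold stated in the description
MEDIAN_RPM_CUTOFF = 15        # threshold of the "median 15 filter"


def correct_end_position(coords: str) -> str:
    """Shift the end coordinate by -1.

    Assumes a 'build:chrom:start-end:strand' string; this layout is an
    assumption for illustration, not taken from the description.
    """
    build, chrom, span, strand = coords.split(":")
    start, end = span.split("-")
    return f"{build}:{chrom}:{start}-{int(end) - 1}:{strand}"


def filter_samples(depth: dict[str, int]) -> list[str]:
    """Keep samples with at least MIN_MAPPED_READS total mapped reads."""
    return [s for s, n in depth.items() if n >= MIN_MAPPED_READS]


def filter_isomirs(rpm: dict[str, dict[str, float]],
                   samples: list[str],
                   mode: str = "sum0") -> dict[str, dict[str, float]]:
    """Apply the 'sum0' or 'median15' expression filter over kept samples."""
    kept = {}
    for feature, values in rpm.items():
        row = [values[s] for s in samples]
        if mode == "sum0":
            keep = sum(row) > 0                     # drop all-zero isomiRs
        elif mode == "median15":
            keep = median(row) >= MEDIAN_RPM_CUTOFF  # drop median < 15 RPM
        else:
            raise ValueError(f"unknown filter mode: {mode}")
        if keep:
            kept[feature] = dict(zip(samples, row))
    return kept


# Toy data (hypothetical values, RPM per isomiR and sample).
depth = {"S1": 2_500_000, "S2": 800_000, "S3": 1_200_000}
rpm = {
    "hg38:chr9:94175961-94175983:+": {"S1": 40.0, "S2": 10.0, "S3": 20.0},
    "hg38:chr1:1000-1022:-": {"S1": 0.0, "S2": 5.0, "S3": 0.0},
    "hg38:chr2:500-521:+": {"S1": 3.0, "S2": 0.0, "S3": 9.0},
}

samples = filter_samples(depth)  # S2 falls below the depth threshold
sum0 = filter_isomirs(rpm, samples, mode="sum0")
median15 = filter_isomirs(rpm, samples, mode="median15")
corrected = {correct_end_position(k): v for k, v in median15.items()}
```

Note that the expression filters are evaluated only after low-depth samples are removed, matching the "remaining samples in the cohort" wording; an isomiR expressed solely in an excluded sample is therefore dropped by the sum 0 filter.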
Created:
2021-08-03