Treehouse compendium of polyA selected RNA-Seq gene expression data from 932 cell lines
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE294350
Resource description:
We uniformly analyze sequence data to generate a resource for comparative gene expression studies. Specifically, we obtained access to primary RNA sequence data from repositories and clinical partners, consistently processed the data, harmonized metadata, and released the expression values and metadata without access restrictions. The data comprise 932 consistently processed gene expression datasets from 9 studies of cancer cell lines. TPM values are log transformed (specifically, log2(TPM+1)) and are associated with HUGO identifiers. Expected counts are not transformed and are associated with Ensembl gene identifiers. Transcript enrichment was performed via polyA selection. Gene expression in each sample is uniformly quantified using the dockerized TOIL RNA-Seq pipeline, versions 3.2 to 3.4.1 (Vivian et al., 2017); all of these versions produce bitwise identical RSEM gene expression outputs. The pipeline uses RSEM version 1.2.25 (Li and Dewey, 2011) for quantification after aligning reads with STAR v2.4.2a (Dobin et al., 2013), using indices generated from the human reference genome GRCh38 and the human gene models GENCODE 23, as described at https://github.com/UCSC-Treehouse/pipelines. Quality is assessed with the MEND pipeline, https://github.com/UCSC-Treehouse/mend_qc (Beale et al., 2021).
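Because the released TPM values are stored as log2(TPM+1), downstream analyses that need linear-scale TPM have to invert the transform first. The following is a minimal sketch of the forward and inverse transforms, not code from the Treehouse pipeline; the function names and the example values are hypothetical and chosen only for illustration.

```python
import math


def tpm_to_log2(tpm: float) -> float:
    """Forward transform described above: log2(TPM + 1)."""
    if tpm < 0:
        raise ValueError("TPM must be non-negative")
    return math.log2(tpm + 1.0)


def log2_to_tpm(value: float) -> float:
    """Inverse transform: recover linear-scale TPM from log2(TPM + 1)."""
    return 2.0 ** value - 1.0


# Hypothetical TPM values for illustration; not taken from the dataset.
example_tpm = [0.0, 1.0, 3.0, 1023.0]
transformed = [tpm_to_log2(x) for x in example_tpm]
recovered = [log2_to_tpm(v) for v in transformed]
```

The +1 pseudocount keeps genes with zero expression at exactly 0 after the transform, so a value of 0 in the TPM matrix means no detected expression. The expected counts, by contrast, are released untransformed and need no such conversion.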
Created:
2025-07-10



