Treehouse compendium of polyA selected RNA-Seq gene expression data from 932 cell lines

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE268098

Resource description:

We uniformly analyze sequence data to generate a resource for comparative gene expression studies. Specifically, we obtained access to primary RNA sequence data from repositories and clinical partners, consistently processed the data, harmonized metadata, and released the expression values and metadata without access restrictions. The data contains 43 consistently processed gene expression datasets from 1 study. Gene expression in each sample is uniformly quantified using the dockerized TOIL RNA-Seq pipeline, versions 3.2 to 3.4.1 (Vivian et al., 2017); all of these versions produce bitwise identical RSEM gene expression outputs. The pipeline uses RSEM version 1.2.25 (Li and Dewey, 2011) for quantification after aligning reads with STAR v2.4.2a (Dobin et al., 2013) using indices generated from the human reference genome GRCh38 and the human gene models GENCODE 23, as described at https://github.com/UCSC-Treehouse/pipelines. Quality is assessed with the MEND pipeline, https://github.com/UCSC-Treehouse/mend_qc (Beale et al., 2021).

Data processing steps were as follows:
Adapters are removed with CutAdapt v1.9 (Martin, 2011).
Reads are aligned by STAR v2.4.2a using indices generated from the human reference genome GRCh38 and the human gene models GENCODE 23 (Dobin et al., 2013).
RSEM 1.2.25 is used to quantify gene expression (Li and Dewey, 2011).
Gene-level expression in TPM is log transformed: log2(TPM+1).

Genome build: GRCh38

Processed data files format and content: gene-level expression in TPM, log transformed as log2(TPM+1)
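
Because the released values are stated to be log2(TPM+1), downstream users may want to convert them back to linear TPM before summing or averaging. The following is a minimal sketch of the forward and inverse transform, not part of the Treehouse pipeline; the function names and the gene labels are hypothetical and chosen only for illustration.

```python
import math


def tpm_to_log2(tpm: float) -> float:
    """Forward transform described in the text: log2(TPM + 1)."""
    if tpm < 0:
        raise ValueError("TPM must be non-negative")
    return math.log2(tpm + 1.0)


def log2_to_tpm(value: float) -> float:
    """Inverse transform: recover linear TPM from a log2(TPM + 1) value."""
    return 2.0 ** value - 1.0


# Hypothetical gene labels with values already in log2(TPM + 1) units.
example = {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 3.0}

# Convert each value back to linear TPM.
recovered = {gene: log2_to_tpm(v) for gene, v in example.items()}
```

The +1 pseudocount keeps genes with zero TPM at a transformed value of zero instead of negative infinity, which is why the inverse subtracts 1 after exponentiating.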

Created:

2024-05-28
