
Treehouse compendium of polyA selected RNA-Seq gene expression data from 932 cell lines

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE268098
Description:
We uniformly analyze sequence data to generate a resource for comparative gene expression studies. Specifically, we obtained access to primary RNA sequence data from repositories and clinical partners, consistently processed the data, harmonized the metadata, and released the expression values and metadata without access restrictions. The data contain 43 consistently processed gene expression datasets from 1 study.

Gene expression in each sample is uniformly quantified using the dockerized TOIL RNA-Seq pipeline, versions 3.2 to 3.4.1 (Vivian et al., 2017); all of these versions produce bitwise-identical RSEM gene expression outputs. The pipeline uses RSEM version 1.2.25 (Li and Dewey, 2011) for quantification after aligning reads with STAR v2.4.2a (Dobin et al., 2013), using indices generated from the human reference genome GRCh38 and the GENCODE 23 human gene models, as described at https://github.com/UCSC-Treehouse/pipelines. Quality is assessed with the MEND pipeline, https://github.com/UCSC-Treehouse/mend_qc (Beale et al., 2021).

Data processing steps were as follows:
1. Adapters are removed with CutAdapt v1.9 (Martin, 2011).
2. Reads are aligned with STAR v2.4.2a (Dobin et al., 2013), using indices generated from the human reference genome GRCh38 and the GENCODE 23 human gene models.
3. Gene expression is quantified with RSEM 1.2.25 (Li and Dewey, 2011).
4. Gene-level expression in TPM is log-transformed: log2(TPM+1).

Genome build: GRCh38
Processed data files format and content: gene-level expression in TPM, log-transformed as log2(TPM+1).
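The TPM values and the log2(TPM+1) transform described above can be illustrated with a small sketch. It assumes the standard TPM definition (each gene's count divided by its effective length, rescaled so all genes sum to one million) and treats each gene as a single unit; it is not RSEM's actual implementation, which works from expected counts and effective lengths per isoform. The gene names, counts, and lengths in the usage example are hypothetical.

```python
import math
from typing import Dict


def tpm_from_counts(counts: Dict[str, float],
                    eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Standard TPM: length-normalized rates rescaled to sum to 1e6.

    Simplified sketch: genes with non-positive effective length get rate 0.
    """
    rates = {
        g: (counts[g] / eff_lengths[g]) if eff_lengths[g] > 0 else 0.0
        for g in counts
    }
    total = sum(rates.values())
    if total == 0:
        # No expression at all: every gene is 0 TPM.
        return {g: 0.0 for g in rates}
    return {g: r / total * 1e6 for g, r in rates.items()}


def log2_tpm1(tpm: Dict[str, float]) -> Dict[str, float]:
    """Apply the log2(TPM + 1) transform used for the released values."""
    return {g: math.log2(v + 1.0) for g, v in tpm.items()}


def inverse_log2_tpm1(values: Dict[str, float]) -> Dict[str, float]:
    """Recover TPM from log2(TPM + 1) values: TPM = 2**x - 1."""
    return {g: 2.0 ** x - 1.0 for g, x in values.items()}


if __name__ == "__main__":
    # Hypothetical toy input: two genes with made-up counts and lengths.
    tpm = tpm_from_counts({"GENE_A": 100, "GENE_B": 300},
                          {"GENE_A": 1000, "GENE_B": 1500})
    logged = log2_tpm1(tpm)
    print(tpm)
    print(logged)
    print(inverse_log2_tpm1(logged))
```

The +1 pseudocount keeps genes with zero expression at 0 after the transform instead of mapping them to negative infinity, and `inverse_log2_tpm1` is the step needed to get linear-scale TPM back from the released files.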
Created: 2024-05-28