
Expression Data recompute of selected GEO-deposited RNA-Seq data of HMEC-1 cell lines

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14793941
Resource description:
We aligned and quantified the HMEC-1 cell line RNA-Seq data available in GEO with a standardized pipeline, to homogenize data preprocessing for downstream applications. All uploaded files are UTF-8, CSV-formatted matrices. The *_expected_count.csv.gz files contain raw, non-log-transformed expression counts, as reported by rsem-quantify-expression with the 'expected counts' feature. The associated *_metadata.csv.gz files contain metadata pertinent to each column of the corresponding expression matrix.

Some metadata files may have more rows than the associated expression matrix has columns. This happens for series that were only partially RNA-Seq based (e.g. combined RNA-Seq and miRNA-Seq samples under the same GEO accession ID). Metadata columns are derived from GEO series files and follow their definitions. See each GEO entry directly to determine the meaning of its metadata.

Each recomputed matrix has at least the gene_id column, which holds Ensembl gene IDs. The remaining columns are the ENA run accession IDs of the specific recomputed samples.

Each associated metadata file has at least the following columns:
geo_sample: The GEO sample ID of the sample.
geo_series: The GEO series ID of the sample.
ena_sample: The ENA sample ID of the sample.
ena_run: The ENA run accession ID of the sample, to be cross-referenced with the expression matrices.
The remaining columns are derived from GEO metadata files and other ENA-provided data. Please refer to the x.FASTQ package for more information (https://github.com/TCP-Lab/x.FASTQ).

The reference genome was downloaded from Ensembl, version hg38. STAR was used to build the genome index, with the overhang set to 149.

The datasets were generated over a long period of time, through a variety of x.FASTQ versions. However, the versions of the software that acted on the files themselves (e.g. STAR, RSEM) were unchanged.

Changelog
Version 2: Added GSE139947 and replaced the faulty GSE244042 and GSE199978, which were missing some samples.
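The cross-referencing between an expression matrix and its metadata file, as described above, could be sketched as follows. This is a minimal, standard-library-only illustration and not part of x.FASTQ: the function names (`read_csv_gz`, `align_metadata`, `counts_for_run`) and the file names in the usage comment are hypothetical, and the code assumes only the documented `gene_id` and `ena_run` columns. Counts are parsed as floats, since RSEM expected counts are generally not integers.

```python
import csv
import gzip


def read_csv_gz(path):
    """Read a UTF-8, gzip-compressed CSV file into a list of row dicts."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def align_metadata(count_rows, metadata_rows):
    """Return (run_ids, matched_metadata), ordered like the matrix columns.

    Metadata rows whose ena_run has no column in the matrix (for example
    miRNA-Seq samples from a mixed series) are dropped.
    """
    if not count_rows:
        return [], []
    # Every column other than gene_id is an ENA run accession ID.
    run_ids = [col for col in count_rows[0] if col != "gene_id"]
    by_run = {row["ena_run"]: row for row in metadata_rows}
    missing = [run for run in run_ids if run not in by_run]
    if missing:
        # Assumption of this sketch: every matrix column should have metadata.
        raise KeyError(f"runs without metadata: {missing}")
    return run_ids, [by_run[run] for run in run_ids]


def counts_for_run(count_rows, run_id):
    """Map each Ensembl gene ID to its expected count (float) for one run."""
    return {row["gene_id"]: float(row[run_id]) for row in count_rows}


# Hypothetical usage (the file names are placeholders, not actual archive names):
# counts = read_csv_gz("SERIES_expected_count.csv.gz")
# meta = read_csv_gz("SERIES_metadata.csv.gz")
# run_ids, matched = align_metadata(counts, meta)
# first_run_counts = counts_for_run(counts, run_ids[0])
```

Keeping the metadata in matrix-column order means that position i in `matched` describes column i of the expression matrix, which is usually what downstream tools expect. For large matrices, a dataframe library would likely be more convenient than row dicts.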
Created:
2025-02-11