
bioStream Reference Genomes for Human (hg38) and Mouse (mm10)

DataCite Commons | Updated 2026-05-06 | Indexed 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.20045739
Description:
Overview

This repository provides a standardized, high-performance collection of reference genomes and pre-built indices for the bioStream pipeline. It is designed to support a wide range of omics analyses, including Bulk RNA-seq, Bulk ATAC-seq, Bulk ChIP-seq, Single-cell RNA-seq (GEX), and Single-cell Multiome (ATAC + GEX).

Key Feature: To ensure cross-platform compatibility and prevent annotation mismatch, the genomic sequences (FASTA) and gene models (GTF) used in the standard indices are strictly identical to those in the 10x Genomics Cell Ranger reference packages.

Supported Species & Versions

1. Homo sapiens (GRCh38/hg38)
  Standard Reference: Includes indices for STAR, RSEM, and Bowtie2.
  10x Genomics (2024-A): Compatible with the latest Cell Ranger/Cell Ranger ARC workflows.
  Annotations: Includes BED12, RSeQC rRNA markers, and curated blacklists.

2. Mus musculus (mm10/GRCm38)
  Standard Reference: Includes indices for STAR, RSEM, and Bowtie2.
  10x Genomics (2020-A): Features refdata-gex-mm10-2020-A and refdata-cellranger-arc-mm10-2020-A-2.0.0.
  Annotations: Comprehensive BED files, gene annotations, and blacklists.

Data Components (Included in .tar.gz archives)

STAR Index: High-performance, splice-aware indices (including Genome, SA, and SAindex) for RNA-seq alignment.
RSEM Index: Comprehensive transcript-level quantification references (including .transcripts.fa, .ti, and .seq).
Bowtie2 Index: Full index set (.bt2) for DNA-level alignment tasks such as ATAC-seq or ChIP-seq.
10x Genomics GEX & ARC:
  GEX: Official Cell Ranger reference folders for Gene Expression.
  ARC: Specialized multi-ome references for Chromatin Accessibility + Gene Expression.
QC & Filtering Resources:
  BED12 & RSeQC: Full gene models in BED12 format and dedicated rRNA interval files for post-alignment quality control.
  Blacklist Regions: Curated v2 Blacklist BED files to filter out problematic genomic regions (e.g., telomeres, centromeres) that often cause false signals in NGS data.
Simplified Annotations: A consolidated gene_anno_*.csv file for easy mapping between Gene IDs, Symbols, and other metadata in downstream R/Python analysis.

File List

hg38.tar.gz / hg38_10x.tar.gz
mm10.tar.gz / mm10_10x.tar.gz

Technical Specifications

Consistency: The genome.fa and genes.gtf are synchronized across all sub-directories within the same species to ensure that bulk and single-cell data processed via bioStream remain directly comparable.
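
The consistency claim above can be checked after extracting an archive. The following is a minimal sketch, not part of bioStream: it assumes only the file names genome.fa and genes.gtf given in the description, makes no assumption about the directory layout, and treats a gzip-compressed copy (for example genes.gtf.gz) as equivalent to its decompressed content. The function names check_consistency and is_consistent are hypothetical.

```python
import gzip
import hashlib
from collections import defaultdict
from pathlib import Path

# File names come from the description; where they sit inside the
# extracted archive is NOT documented, so the whole tree is searched.
TARGETS = ("genome.fa", "genes.gtf")


def content_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 of a file's content, decompressing .gz copies."""
    opener = gzip.open if path.suffix == ".gz" else open
    digest = hashlib.sha256()
    with opener(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_consistency(root):
    """Group every copy of each target file under `root` by content hash.

    Returns {file name: {sha256: [paths with that content]}}.
    """
    groups = {name: defaultdict(list) for name in TARGETS}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Strip a trailing ".gz" so compressed copies are compared too.
        base = path.name[:-3] if path.name.endswith(".gz") else path.name
        if base in groups:
            groups[base][content_sha256(path)].append(path)
    return {name: dict(by_hash) for name, by_hash in groups.items()}


def is_consistent(report):
    """True when no target file name maps to more than one distinct hash."""
    return all(len(by_hash) <= 1 for by_hash in report.values())
```

Calling is_consistent(check_consistency("hg38")) on an extracted directory (the path "hg38" is a placeholder) would return True when every copy of each file has identical content. Hashing the decompressed stream rather than the raw bytes is a deliberate choice here, since two gzip files with the same content can differ byte for byte.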
Provider:
Zenodo
Created:
2026-05-06