
Genentech/GM12878_dnase-data

Hugging Face | Updated 2026-02-23 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/Genentech/GM12878_dnase-data
Resource description:
---
license: mit
task_categories:
- tabular-regression
tags:
- biology
- genomics
pretty_name: "GM12878 DNase regression data"
size_categories:
- 100K<n<1M
---

# GM12878_dnase-data

## Dataset Summary

This dataset contains genomic intervals used to train a regression model on GM12878 DNase data, described in Lal et al. 2025 (https://www.nature.com/articles/s41592-025-02868-z). Genome coordinates correspond to the hg38 reference genome.

## Repository Content

The repository includes one BED file and one Jupyter notebook:

1. `intervals.bed`: Genomic intervals stored in BED format.
2. `1_process_GM12878_data.ipynb`: Jupyter notebook containing the preprocessing steps used to generate the `intervals.bed` file.

## Dataset Structure

### Statistics

- **Number of intervals:** 435,055
- **Interval length:** 2,114 bp (all intervals)
- **Genome build:** hg38

### Intervals file (`intervals.bed`)

BED format (tab-separated). There are three columns with no header:

- Chromosome name
- Start position
- End position

## Usage

```python
from huggingface_hub import hf_hub_download
import pandas as pd

file_path = hf_hub_download(
    repo_id="Genentech/GM12878_dnase-data",
    filename="intervals.bed",
    repo_type="dataset"
)

df = pd.read_csv(file_path, sep='\t', header=None, names=['chrom', 'start', 'end'])
```
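
Once the file is loaded, it can be useful to confirm that the table matches the statistics listed in the dataset card (435,055 intervals, each 2,114 bp). The following is a minimal sketch rather than part of the original repository: the helper name `check_intervals` is hypothetical, and it assumes that interval length is computed as `end - start`, following the usual 0-based, half-open BED convention. The toy coordinates are made up so that the example runs without downloading anything.

```python
import pandas as pd

EXPECTED_LENGTH = 2114  # interval length stated in the dataset card


def check_intervals(df: pd.DataFrame, expected_length: int = EXPECTED_LENGTH) -> dict:
    """Summarize a BED-style table with columns chrom, start, end (hypothetical helper)."""
    # Assumption: BED coordinates are 0-based, half-open, so length = end - start
    lengths = df["end"] - df["start"]
    return {
        "n_intervals": len(df),
        "all_expected_length": bool((lengths == expected_length).all()),
        "n_chromosomes": int(df["chrom"].nunique()),
    }


# Tiny synthetic table standing in for intervals.bed (coordinates are made up)
toy = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr2"],
        "start": [1000, 5000, 200],
        "end": [3114, 7114, 2314],
    }
)
summary = check_intervals(toy)
print(summary)
```

Applied to the real data, `check_intervals(df)` should report `n_intervals` equal to 435,055 and `all_expected_length` as `True` if the file matches the card; any mismatch would suggest a truncated download or a different coordinate convention.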
Provider:
Genentech