
Test datasets for Hi-C scaffolding

NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/7079218
Description:
We provide two datasets for testing Hi-C scaffolding tools.

For the CHM13 test dataset, we randomly split the first 10 Mb of chr1, chr2 and chr3 of the T2T-CHM13v1.1 human genome assembly (Nurk et al. 2022) into 57 contigs. Hi-C data downloaded from the Telomere-to-Telomere consortium GitHub repository (https://github.com/marbl/CHM13) were mapped to the reference genome, and the reads mapped to these regions were extracted to generate the Hi-C alignment files.

For the LYZE01 test dataset, the Saccharomyces cerevisiae strain W303 genome assembly (Matheson et al. 2017) was split at gap positions ('N') to recover the original contigs. An independent Hi-C library was downloaded from the NCBI repository (GEO Accession GSM2417297) and downsampled to approximately 20X. The downsampled Hi-C data were mapped to the contigs to generate the Hi-C alignment files.

Each test dataset includes five files: the contig file in FASTA format, the FASTA index file generated with the SAMtools faidx command, and the Hi-C alignment file in three forms: BAM sorted by coordinate, BAM sorted by query name (with the identifier 'qn' in the file name), and BED.
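The two contig-generation steps described above (splitting an assembly at 'N' gaps, and randomly chunking a region into contigs) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `split_at_gaps` and `random_chunks`, the `min_gap` parameter, and the choice of uniformly random breakpoints are all assumptions made for illustration, since the description does not say how breakpoints were chosen.

```python
import random
import re


def split_at_gaps(sequence, min_gap=1):
    """Split a scaffold at runs of 'N' and return the non-empty contigs.

    min_gap is a hypothetical knob: runs shorter than this are kept
    inside the contig rather than treated as a gap.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    return [piece for piece in pattern.split(sequence) if piece]


def random_chunks(sequence, n_chunks, rng):
    """Cut a sequence into n_chunks non-empty pieces at random breakpoints.

    Assumption: breakpoints are drawn uniformly without replacement;
    the dataset description does not specify the scheme actually used.
    """
    if not 1 <= n_chunks <= len(sequence):
        raise ValueError("n_chunks must be between 1 and len(sequence)")
    # Pick n_chunks - 1 distinct interior breakpoints, then cut in order.
    cuts = sorted(rng.sample(range(1, len(sequence)), n_chunks - 1))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(split_at_gaps("ACGTNNNNGGCCNTTAA"))  # ['ACGT', 'GGCC', 'TTAA']
    pieces = random_chunks("ACGT" * 250, 5, random.Random(0))
    print([len(p) for p in pieces])
```

Passing an explicit `random.Random` instance keeps the chunking reproducible. Concatenating the pieces returned by `random_chunks` always restores the input sequence.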
Created:
2022-09-14