Dataset supporting the tool 'delfies: a Python package for the detection of DNA breakpoints with neo-telomere addition'
NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14101797
Resource summary:
Purpose
These data can be used to test my tool delfies on real data, to get a concrete sense of its inputs and outputs, and to check that it is properly installed.
Description
Genome
I downloaded the genome of Oscheius onirici, accession: GCA_932521025.
I subsampled the genome to the last 2kbp of chromosome I, which contains an elimination breakpoint, using `seqkit` v2.8.2, giving the FASTA file in this release.
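The subsampling step can be illustrated in plain Python. This is only a sketch of the idea (keep the final N bases of one named record), not the `seqkit` command that produced the FASTA file in this release; the function name `tail_of_record` and the record names in the usage note are hypothetical.

```python
def tail_of_record(fasta_text: str, record_id: str, n: int) -> str:
    """Return a one-record FASTA string holding the last n bases of record_id.

    record_id is matched against the first whitespace-delimited token
    of each header line.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of bases")
    header = None
    seq_parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # reached the next record: the one we want is complete
            if line[1:].split()[:1] == [record_id]:
                header = line[1:]
        elif header is not None and line:
            seq_parts.append(line)
    if header is None:
        raise KeyError(f"record {record_id!r} not found")
    seq = "".join(seq_parts)
    # Slicing with -n keeps the whole sequence if it is shorter than n.
    return f">{header}\n{seq[-n:]}\n"
```

For example, `tail_of_record(text, "chrI", 2000)` would keep the last 2 kbp of a record whose header starts with `chrI`. Note that coordinates in the subsampled record are relative to the start of the retained window, not to the full chromosome.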
Sequencing data
I then downloaded the following sequencing data for *O. onirici*, from the European Nucleotide Archive:
ERR5967937: Illumina NovaSeq 6000 paired end short reads. Reads are 2x150bp with average per-base quality of Q27.
ERR10796202: Oxford Nanopore PromethION long reads. Reads have average length 11.9kbp and average per-base quality Q11.4.
ERR7979900: Pacific Biosciences (PacBio) Sequel II long reads. Reads have average length 11.1kbp and average per-base quality Q28.
I aligned each read set to the subsampled genome above with `minimap2` version 2.26-r1175, using the following presets: "map-ont" for the Nanopore data, "map-hifi" for the PacBio data, and "sr" for the Illumina data.
After sorting with `samtools`, this gives the BAM files in this release.
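As a rough guide to how the three alignments differ, the sketch below builds one align-and-sort pipeline string per platform. The preset names come from the description above; the use of `-a` (SAM output), the pipe into `samtools sort`, the helper name `alignment_pipeline` and all file names are assumptions for illustration, not a record of the exact commands that were run.

```python
import shlex

# Preset per platform, as listed in the description above.
PRESETS = {"nanopore": "map-ont", "pacbio": "map-hifi", "illumina": "sr"}


def alignment_pipeline(platform: str, genome: str, reads: list[str], out_bam: str) -> str:
    """Build a shell pipeline string: minimap2 alignment piped into samtools sort.

    The command is only constructed, never executed.
    """
    preset = PRESETS[platform]
    # -a asks minimap2 for SAM output; -x selects the preset.
    align = ["minimap2", "-a", "-x", preset, genome, *reads]
    # "-" makes samtools sort read the SAM stream from standard input.
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    return shlex.join(align) + " | " + shlex.join(sort)


if __name__ == "__main__":
    # Hypothetical file names; paired-end Illumina data passes two read files.
    print(alignment_pipeline("nanopore", "genome.fa", ["ont.fq.gz"], "ont.bam"))
    print(alignment_pipeline("illumina", "genome.fa", ["r1.fq.gz", "r2.fq.gz"], "illumina.bam"))
```

Building the command as a list and joining it with `shlex.join` keeps file names with spaces or special characters safely quoted.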
Running delfies
I then ran `delfies` version 0.6.0 on each BAM and genome, as:
```sh
delfies --threads 16 \
  --telo_forward_seq TTAGGC \
  --breakpoint_type all \
  --min_mapq 20 \
  --min_supporting_reads 6 \
  ${genome} ${bam} ${odirname}
```
The three resulting output directories are in this release, prefixed with `delfies_`.
A single, identical breakpoint is found using all three BAMs (see files '*breakpoint_locations.bed').
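One way to confirm this agreement after downloading the release is to compare the BED files programmatically. The sketch below assumes only that the first three tab-separated columns of each file are the standard BED fields (chromosome, start, end); any further columns are ignored, and the function names are hypothetical.

```python
from pathlib import Path


def read_breakpoints(bed_path) -> set:
    """Collect (chrom, start, end) tuples from a BED file."""
    intervals = set()
    for line in Path(bed_path).read_text().splitlines():
        # Skip blank lines and the optional BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.add((chrom, int(start), int(end)))
    return intervals


def breakpoints_agree(release_dir) -> bool:
    """True if every delfies_* output directory reports the same intervals."""
    beds = sorted(Path(release_dir).glob("delfies_*/*breakpoint_locations.bed"))
    if not beds:
        raise FileNotFoundError("no breakpoint BED files found")
    calls = [read_breakpoints(p) for p in beds]
    return all(c == calls[0] for c in calls)
```

Calling `breakpoints_agree(".")` from the directory holding the unpacked release would be expected to return `True` if the three runs report the same breakpoint coordinates.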
Data source
The above raw data were produced and released by the Wellcome Sanger Institute as part of projects PRJEB51305 and PRJEB59023.
Created:
2024-12-05



