NanoVarBench variant truthset files
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10867170
Description:
These tarballs contain the variant truthsets used for each sample in our paper "Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data".
Each directory contains the following files:
.bed - A BED file of all regions in the genome
.repetitive_regions.bed - A BED file of all repetitive regions of the genome (see the paper for details of how these were identified).
.unique_regions.bed - Non-repetitive regions of the genome. This is the result of performing bedtools complement -i -g
ani.tsv - skani output from skani search for the sample's assembly against all of the downloaded genomes for that species. The last three columns are not from skani: they are the completeness_percentile, completeness, and contamination metrics, all obtained from NCBI for each assembly accession.
apply.vcf.gz - the variants that were applied to the sample's reference assembly.
apply.vcf.gz.csi - VCF index for the above VCF
dnadiff.vcf.gz - Variants between the sample and donor genome from mummer4
minimap2.vcf.gz - Variants between the sample and donor genome from minimap2
mutdonor.fna - the FASTA file of the selected variant donor
mutreference.fna - the sample's reference assembly with the variants in apply.vcf.gz applied to it. This is the genome that the sample's reads are aligned to for calling variants.
mutreference.fna.fai - the faidx of the above genome
reference.fna - the reference assembly of the sample. These are also available on GenBank, but are included here for interoperability
truth.vcf.gz - the truthset of variants. This is essentially apply.vcf.gz with the REF and ALT inverted and the POS adjusted for the difference in position between the sample and donor assemblies. (See this script)
vcfstats.txt - VCF statistics produced by paftools.js vcfstat on the truth VCF
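The relationship between apply.vcf.gz and truth.vcf.gz described above can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes that the variants do not overlap and that the position adjustment amounts to a cumulative shift by the length change of every earlier variant on the same contig. The Variant type and the invert_variants function are hypothetical names introduced for this illustration only.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal VCF record used only for this illustration."""
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def invert_variants(applied: List[Variant]) -> List[Variant]:
    """Swap REF and ALT and shift POS into mutated-genome coordinates.

    Assumption: variants are non-overlapping, so each position moves by the
    summed length change (len(ALT) - len(REF)) of all earlier variants on
    the same contig.
    """
    truth: List[Variant] = []
    offsets: Dict[str, int] = {}  # cumulative length change per contig
    for v in sorted(applied, key=lambda x: (x.chrom, x.pos)):
        shift = offsets.get(v.chrom, 0)
        # The applied ALT allele becomes the REF allele of the mutated genome.
        truth.append(Variant(v.chrom, v.pos + shift, v.alt, v.ref))
        offsets[v.chrom] = shift + len(v.alt) - len(v.ref)
    return truth


applied = [
    Variant("chr1", 10, "A", "G"),    # SNP: no length change
    Variant("chr1", 20, "AT", "A"),   # 1 bp deletion
    Variant("chr1", 30, "C", "CGG"),  # 2 bp insertion
    Variant("chr1", 40, "T", "C"),    # SNP after both indels
]
truth = invert_variants(applied)
for record in truth:
    print(record.chrom, record.pos, record.ref, record.alt)
```

In this toy input the deletion at position 20 moves the next variant one base to the left, and the insertion at position 30 moves the final SNP one base to the right overall.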
For information about each sample, refer to the samplesheet and paper.
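For readers unfamiliar with the complement operation used to derive .unique_regions.bed from the repetitive regions, the following Python sketch shows the idea on in-memory intervals. It is a simplified stand-in for bedtools complement, not a replacement for it. The complement function, the contig names, and the lengths are made up for illustration.

```python
from typing import Dict, List, Tuple

# (contig, start, end) with 0-based, half-open coordinates, as in BED
Interval = Tuple[str, int, int]


def complement(intervals: List[Interval], genome: Dict[str, int]) -> List[Interval]:
    """Return the parts of each contig in `genome` not covered by `intervals`.

    `genome` maps contig name to contig length. Overlapping or unsorted
    input intervals are handled by sorting and tracking the covered end.
    """
    by_contig: Dict[str, List[Tuple[int, int]]] = {}
    for contig, start, end in intervals:
        by_contig.setdefault(contig, []).append((start, end))

    result: List[Interval] = []
    for contig, length in genome.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_contig.get(contig, [])):
            if start > cursor:
                result.append((contig, cursor, start))
            cursor = max(cursor, end)
        if cursor < length:
            result.append((contig, cursor, length))
    return result


# Hypothetical repetitive regions on a 1000 bp contig, plus a contig with none
repetitive = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 900, 1000)]
unique = complement(repetitive, {"chr1": 1000, "plasmid1": 50})
print(unique)
```

A contig with no repetitive intervals is returned as a single interval spanning its whole length.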
Created:
2024-03-25



