NanoVarBench variant truthset files

Indexed from NIAID Data Ecosystem on 2026-05-01

Download link:

https://zenodo.org/record/10867170

Description:

These tarballs contain the variant truthsets used for each sample in our paper "Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data". Each directory contains the following files:

.bed - a BED file of all regions in the genome.
.repetitive_regions.bed - a BED file of all repetitive regions of the genome (see the paper for details of how these were identified).
.unique_regions.bed - non-repetitive regions of the genome. This is the result of performing bedtools complement -i -g
ani.tsv - skani output from skani search for the sample's assembly against all of the downloaded genomes for that species. The last three columns are not from skani; they are the completeness_percentile, completeness, and contamination metrics, all obtained from NCBI for each assembly accession.
apply.vcf.gz - the variants that were applied to the sample's reference assembly.
apply.vcf.gz.csi - the VCF index for apply.vcf.gz.
dnadiff.vcf.gz - variants between the sample and donor genome from mummer4.
minimap2.vcf.gz - variants between the sample and donor genome from minimap2.
mutdonor.fna - the FASTA file of the selected variant donor.
mutreference.fna - the sample's reference assembly with apply.vcf.gz applied to it. This is the genome that the sample's reads are aligned to for calling variants.
mutreference.fna.fai - the faidx index of mutreference.fna.
reference.fna - the reference assembly of the sample. These are also available on GenBank, but are included here for interoperability.
truth.vcf.gz - the truthset of variants. This is essentially apply.vcf.gz with the REF and ALT inverted and the POS adjusted for the difference in position between the sample and donor assemblies. (See this script)
vcfstats.txt - VCF statistics produced by paftools.js vcfstat on the truth VCF.

For information about each sample, refer to the samplesheet and paper.
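
As a rough illustration of how .unique_regions.bed relates to .repetitive_regions.bed, the sketch below computes the complement of a set of BED intervals in plain Python. It is not the authors' pipeline, which uses bedtools complement; the function name complement, the contig names, and all coordinates are made up for the example. It assumes BED-style 0-based, half-open intervals and a table of contig lengths.

```python
from collections import defaultdict


def complement(genome_sizes, regions):
    """Return the parts of each contig that are NOT covered by `regions`.

    genome_sizes: dict mapping contig name -> contig length
    regions: iterable of (contig, start, end), 0-based half-open (BED style)
    """
    by_contig = defaultdict(list)
    for contig, start, end in regions:
        by_contig[contig].append((start, end))

    result = []
    for contig, length in genome_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_contig[contig]):
            if start > cursor:
                # Gap between the covered prefix and this region
                result.append((contig, cursor, start))
            cursor = max(cursor, end)  # handles overlapping regions
        if cursor < length:
            # Uncovered tail of the contig
            result.append((contig, cursor, length))
    return result


# Hypothetical genome with one chromosome and one plasmid
genome = {"chr1": 100, "plasmid1": 20}
# Hypothetical repetitive regions (two of them overlap)
repeats = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 90, 100)]

unique = complement(genome, repeats)
print(unique)
# [('chr1', 0, 10), ('chr1', 30, 90), ('plasmid1', 0, 20)]
```

Contigs with no repetitive regions are returned whole, so every position of the genome ends up in exactly one of the two sets.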

Created:

2024-03-25
