Virus Finding Tools: current solutions and limitations - Synthetic datasets
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6424204
Description:
The simulated RNA-seq datasets consist of paired-end reads generated with FluxSimulator. The first dataset contains thirty viral genomes from NCBI and the human genome (version GRCh38). The second comprises five Human Rhinovirus A1 strains, five Human Papillomavirus 16 strains or isolates, and the human genome (version GRCh38).
For the first dataset, we chose genomes that mix human, animal, and plant viruses so that each tool's reliability could be analyzed fairly. The second dataset, by contrast, was created to assess the taxonomic level each tool can resolve and the specificity of its viral identification.
We downloaded the genome of each selected virus from NCBI Nucleotide as a FASTA file together with its GFF3 annotation. We then used AGAT to convert the GFF3 files to GTF and ran FluxSimulator to generate a synthetic RNA-seq dataset for each species. Finally, we merged the per-species simulated datasets into a single RNA-seq sample. The number of reads generated for each species was chosen to mimic a real sample.
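The final merging step can be illustrated with a minimal Python sketch. It is not the script used to build these datasets. It assumes each species' simulated reads are already available as two files named `<prefix>_1.fastq` and `<prefix>_2.fastq` (a hypothetical naming scheme), that every file ends with a newline, and that records span four lines. The function names `merge_paired_fastq` and `count_reads` are illustrative only.

```python
from pathlib import Path
import shutil


def merge_paired_fastq(prefixes, out_prefix):
    """Concatenate per-species paired-end FASTQ files into one sample.

    prefixes:   path prefixes; each is assumed to have '<prefix>_1.fastq'
                and '<prefix>_2.fastq' (hypothetical naming scheme).
    out_prefix: prefix for the merged '<out_prefix>_1.fastq' / '_2.fastq'.
    Returns the two output paths (mate 1, mate 2).
    """
    outputs = []
    for mate in ("1", "2"):
        out_path = Path(f"{out_prefix}_{mate}.fastq")
        with out_path.open("wb") as out:
            # Same species order for both mates, so read pairs stay aligned.
            for prefix in prefixes:
                with Path(f"{prefix}_{mate}.fastq").open("rb") as src:
                    shutil.copyfileobj(src, out)
        outputs.append(out_path)
    return tuple(outputs)


def count_reads(fastq_path):
    """Count reads, assuming exactly four lines per FASTQ record."""
    with open(fastq_path, "rb") as handle:
        return sum(1 for _ in handle) // 4
```

Comparing `count_reads` on the two merged files is a quick sanity check that both mates contain the same number of reads.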
For each dataset, we include two compressed archives. The "*_fastq_files.tar" files contain the synthetic fastq files generated by FluxSimulator. The "*_raw_results.tar" files contain all the raw output produced by the tools employed in our benchmarking.
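As a hedged illustration of how the archives might be inspected after download, the sketch below classifies an archive by the two naming patterns given above and lists the regular files it contains. It uses only the Python standard library. The helper names `classify_archive` and `list_members` are hypothetical, and the internal layout of the archives is not assumed.

```python
import fnmatch
import os
import tarfile


def classify_archive(path):
    """Classify an archive by the naming patterns in the description."""
    name = os.path.basename(path)
    if fnmatch.fnmatch(name, "*_fastq_files.tar"):
        return "fastq"
    if fnmatch.fnmatch(name, "*_raw_results.tar"):
        return "raw_results"
    return "other"


def list_members(path):
    """Return the names of regular files stored in a tar archive."""
    # Mode "r" lets tarfile detect compression transparently.
    with tarfile.open(path, "r") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]
```

Listing members before extracting avoids unpacking a large archive just to see which samples or tool outputs it holds.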
The selected viruses for the first dataset are Human rhinovirus 1 strain ATCC VR-1559, Human Rhinovirus 3, Tomato mosaic virus, Molluscum contagiosum virus subtype 1, Apple mosaic virus RNA 3, Encephalomyocarditis virus, Human papillomavirus 52 isolate 52HB20, Hepatitis C virus genotype 1, Human papillomavirus type 31, Human papillomavirus type 54, JC polyomavirus, Marine RNA virus SF-2, Marine RNA virus JP-B, Hepatitis A virus, Human immunodeficiency virus 1, Anguillid herpes virus strain UK N080, Apis mellifera virus 14 isolate BFH508NG, Human enterovirus, Escherichia phage T7 isolate T7, Human herpesvirus 6B, Human measles virus, Cyprinid herpesvirus 3, Rotavirus C segment 8, Japanese encephalitis virus, Human papillomavirus 116, Influenza A virus (A/New York/392/2004(H3N2)) segment 4, Rotavirus RCU, Human parvovirus B19, Rous Sarcoma, Human papillomavirus 16.
The selected strains and isolates for the second dataset are Human rhinovirus 1 strain ATCC VR-1559, Rhinovirus A1 strain 7A2, Rhinovirus A1 strain 5Q1, Rhinovirus A1 strain RvA1B/USA/2021/RJ9JKH, Rhinovirus A1 strain RvA1/USA/2021/L8MQLH, Human papillomavirus type 16, Human papillomavirus type 16 isolate 16CN37, Human papillomavirus type 16 isolate 16CN34, Human papillomavirus type 16 strain MML8, Human papillomavirus type 16 strain MML20.
The tools employed in our benchmarking are VirusFinder, VirusSeq, VirTect, viGEN, VirDetect, DAMIAN, Metamap, Kraken2, and Centrifuge.
Created:
2022-04-09



