Virus Finding Tools: current solutions and limitations - Synthetic datasets

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6424204
Description:
The simulated RNA-seq datasets were built with FluxSimulator, which generated paired-end reads. The first dataset contains thirty viral genomes from NCBI together with the human genome (version GRCh38). The second dataset comprises five Human Rhinovirus A1 genomes, five Human Papillomavirus 16 genomes, and the human genome (version GRCh38). For the first dataset, we chose genomes that mix human, animal, and plant viruses to allow a fair analysis of each tool's reliability. The second dataset, by contrast, was created to assess how well each tool resolves the taxonomic level and how specific its viral identification is. We downloaded the genomes of the selected viruses from NCBI Nucleotide as FASTA files with GFF3 annotations. Next, we used AGAT to convert the GFF3 files into GTF files. We then used FluxSimulator to generate a synthetic RNA-seq dataset for each species. Finally, we joined the simulated datasets into a single RNA-seq sample. The number of reads generated for each species was chosen to resemble a real sample. For each dataset, we include two compressed archives. The "*_fastq_files.tar" files contain the synthetic FASTQ files generated by FluxSimulator. The "*_raw_results.tar" files contain all the raw output produced by the tools employed in our benchmarking.
The selected viruses for the first dataset are Human rhinovirus 1 strain ATCC VR-1559, Human Rhinovirus 3, Tomato mosaic virus, Molluscum contagiosum virus subtype 1, Apple mosaic virus RNA 3, Encephalomyocarditis virus, Human papillomavirus 52 isolate 52HB20, Hepatitis C virus genotype 1, Human papillomavirus type 31, Human papillomavirus type 54, JC polyomavirus, Marine RNA virus SF-2, Marine RNA virus JP-B, Hepatitis A virus, Human immunodeficiency virus 1, Anguillid herpes virus strain UK N080, Apis mellifera virus 14 isolate BFH508NG, Human enterovirus, Escherichia phage T7 isolate T7, Human herpesvirus 6B, Human measles virus, Cyprinid herpesvirus 3, Rotavirus C segment 8, Japanese encephalitis virus, Human papillomavirus 116, Influenza A virus (A/New York/392/2004(H3N2)) segment 4, Rotavirus RCU, Human parvovirus B19, Rous Sarcoma, Human papillomavirus 16.

The selected strains and isolates for the second dataset are Human rhinovirus 1 strain ATCC VR-1559, Rhinovirus A1 strain 7A2, Rhinovirus A1 strain 5Q1, Rhinovirus A1 strain RvA1B/USA/2021/RJ9JKH, Rhinovirus A1 strain RvA1/USA/2021/L8MQLH, Human papillomavirus type 16, Human papillomavirus type 16 isolate 16CN37, Human papillomavirus type 16 isolate 16CN34, Human papillomavirus type 16 strain MML8, Human papillomavirus type 16 strain MML20.

The tools employed in our benchmarking are VirusFinder, VirusSeq, VirTect, viGEN, VirDetect, DAMIAN, Metamap, Kraken2, Centrifuge.
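
The final step of the pipeline described above, joining the per-species simulated reads into one RNA-seq sample, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' actual procedure: the function name `join_fastq` and the file names are hypothetical, and the sketch assumes plain, uncompressed four-line FASTQ records. It concatenates the inputs in order and reports how many reads each input contributed, which helps check that the per-species read counts match what was intended.

```python
import tempfile
from pathlib import Path


def join_fastq(inputs, output):
    """Concatenate FASTQ files into one file.

    Returns a dict mapping each input path (as a string) to the number
    of reads it contributed. Assumes four-line FASTQ records.
    """
    counts = {}
    with open(output, "w") as out:
        for path in inputs:
            n_lines = 0
            with open(path) as handle:
                for line in handle:
                    # Guard against a missing trailing newline so that
                    # records from consecutive files never run together.
                    out.write(line if line.endswith("\n") else line + "\n")
                    n_lines += 1
            if n_lines % 4 != 0:
                raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
            counts[str(path)] = n_lines // 4
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Hypothetical per-species files standing in for simulator output.
        human = tmp / "human.fastq"
        virus = tmp / "hpv16.fastq"
        human.write_text("@h1/1\nACGT\n+\nIIII\n@h1/2\nTGCA\n+\nIIII\n")
        virus.write_text("@v1/1\nGGCC\n+\nIIII\n")
        print(join_fastq([human, virus], tmp / "sample.fastq"))
```

Simple concatenation keeps each species' reads contiguous in the joined file. If a downstream tool is sensitive to read order, the joined records could be shuffled afterwards; that is a design choice the dataset description does not specify.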
Created:
2022-04-09