Virgo Benchmarking Datasets
Source: Figshare | Updated: 2025-04-13 | Indexed: 2026-04-08
Download link:
https://figshare.com/articles/dataset/Virgo_Benchmarking_Datasets/28730093/1
Description:
1. <b>Global Ocean Eukaryotic Viral (GOEV) Database</b><br>
Source: Extract from Gaïa M. et al. Nature (2023)<br>
Components:<br>
591 MAGs from Schulz, F. et al. (2020) [DOI: 10.1038/s41586-020-1957-x]<br>
445 MAGs from Sunagawa, S. et al. (2020) [DOI: 10.1038/s41579-020-0364-5]<br>
218 MAGs from Moniruzzaman, M. et al. (2020) [DOI: 10.1038/s41467-020-15507-2]<br>
158 reference viral assemblies<br>
Accessed: July 20, 2024<br>
Data File: GOEV_DB_CONTIGS.db.zip from Figshare<br>
Selection Criteria: Only contigs labeled at the Order taxonomic level were retained<br>
Sampling Method: Not applicable<br>
Final Sample Size: 1,412 viral contigs<br>
--<br><br>
2. <b>Known Viral Sequence Clusters (kVSCs)</b><br>
Source: Extract from Zolfo, M. et al. (2024) [DOI: 10.1101/2024.02.19.580813]<br>
Data Files:<br>
VSC5_rep_fnas_nr99_45k_metaphlanDB.fna.gz<br>
VSCs_groups.csv metadata<br>
Downloaded From: Zenodo, last accessed June 28, 2024<br>
Selection Criteria:<br>
Started from 45,872 representative sequences from MetaPhlAn 4.1<br>
Selected kVSCs (sequences clustering with a RefSeq representative)<br>
Verified RefSeq accessions against ICTV Release #39 for accurate labeling<br>
Sampling Method: RefSeq matching based on metadata<br>
Final Sample Size: 2,232 representative sequences<br>
--<br><br>
3. <b>ICTV Release #39</b><br>
Source: International Committee on Taxonomy of Viruses (ICTV) Release #39<br>
Downloaded Using: ICTVdump tool on July 17, 2024<br>
Selection Criteria:<br>
Viruses present in both VMR releases #37 and #39<br>
At least two representatives per family<br>
Sampling Method: Up to 5 genomes randomly sampled per family using pandas.sample(); 192 families represented<br>
Final Sample Size: 860 genomes<br>
--<br><br>
4. <b>RefSeq Viral Dataset (Random Iteration)</b><br>
Source: NCBI Virus Portal, accessed on January 27, 2025<br>
Selection Criteria:<br>
Viruses with an assigned family-level taxonomy, up to 43 viruses per family<br>
Sampling Method: Random uniform sampling<br>
Final Sample Size: 6,778 viral genomes<br>
--<br><br>
5. <b>RefSeq Viral Dataset (Prokaryote-Infecting)</b><br>
Source: NCBI Virus Portal, accessed on January 27, 2025<br>
Selection Criteria:<br>
Viruses with an assigned family-level taxonomy<br>
Prokaryote-infecting viruses only<br>
Sampling Method: Random uniform sampling<br>
Final Sample Size: 3,536 viral genomes<br>
--<br><br>
6. <b>ICTV Release #39 (Reduction Study Subset)</b><br>
Source: International Committee on Taxonomy of Viruses (ICTV) Release #39<br>
Downloaded Using: ICTVdump tool on July 17, 2024<br>
Selection Criteria: 1,000 viruses randomly sampled from the full release<br>
Sampling Method: Random uniform sampling<br>
Final Sample Size: 1,000 viral genomes<br>
Notes: Starting data for the reduction study. We provide the source code for generating the fragmented genomes.
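The capped per-family sampling described for datasets 3 and 4 (at most 5 or 43 genomes per family, with at least two representatives per family for dataset 3) could look roughly like the sketch below. It is an illustration only, not the authors' script: the column names "Accession" and "Family", the toy table, and the seed are assumptions, and the check against VMR releases #37 and #39 is not reproduced. The only call taken from the description is pandas' DataFrame.sample().

```python
import pandas as pd


def sample_per_family(df, family_col="Family", max_per_family=5,
                      min_per_family=2, seed=0):
    """Randomly keep at most `max_per_family` rows from each family.

    Families with fewer than `min_per_family` rows are dropped first.
    Column name, thresholds, and seed are illustrative assumptions.
    """
    # Keep only rows that carry a family-level label.
    labelled = df.dropna(subset=[family_col])

    # Per-row size of the family the row belongs to.
    family_sizes = labelled[family_col].map(labelled[family_col].value_counts())
    eligible = labelled[family_sizes >= min_per_family]

    # Sample each family independently, capped at max_per_family.
    parts = [
        group.sample(n=min(len(group), max_per_family), random_state=seed)
        for _, group in eligible.groupby(family_col)
    ]
    if not parts:
        return eligible.iloc[0:0]
    return pd.concat(parts).sort_index()


# Toy metadata table (hypothetical accessions and family names).
demo = pd.DataFrame({
    "Accession": [f"ACC{i:03d}" for i in range(10)],
    "Family": ["A"] * 7 + ["B"] * 2 + ["C"],
})

subset = sample_per_family(demo, max_per_family=5, min_per_family=2)
print(subset["Family"].value_counts())
```

Setting max_per_family=43 and min_per_family=1 would mimic the cap quoted for the RefSeq (Random Iteration) dataset. A fixed random_state makes the draw repeatable, though the seed used for the published subsets is not stated in the description.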
Provider:
Riccardi, Christopher
Created:
2025-04-13



