
Virgo Benchmarking Datasets

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Virgo_Benchmarking_Datasets/28730093
Description:
1. Global Ocean Eukaryotic Viral (GOEV) Database
   Source: Extract from Gaïa M. et al. Nature (2023)
   Components:
     591 MAGs from Schulz, F. et al. (2020) [DOI: 10.1038/s41586-020-1957-x]
     445 MAGs from Sunagawa, S. et al. (2020) [DOI: 10.1038/s41579-020-0364-5]
     218 MAGs from Moniruzzaman, M. et al. (2020) [DOI: 10.1038/s41467-020-15507-2]
     158 reference viral assemblies
   Accessed: July 20, 2024
   Data File: GOEV_DB_CONTIGS.db.zip from Figshare
   Selection Criteria: Only contigs labeled at the Order taxonomic level were retained
   Sampling Method: Not applicable
   Final Sample Size: 1,412 viral contigs

2. Known Viral Sequence Clusters (kVSCs)
   Source: Extract from Zolfo, M. et al. (2024) [DOI: 10.1101/2024.02.19.580813]
   Data Files:
     VSC5_rep_fnas_nr99_45k_metaphlanDB.fna.gz
     VSCs_groups.csv metadata
   Downloaded From: Zenodo, last accessed June 28, 2024
   Selection Criteria:
     Started from 45,872 representative sequences from MetaPhlAn 4.1
     Selected kVSCs (sequences clustering with a RefSeq representative)
     Verified RefSeq accessions against ICTV Release #39 for accurate labeling
   Sampling Method: RefSeq matching based on metadata
   Final Sample Size: 2,232 representative sequences

3. ICTV Release #39
   Source: International Committee on Taxonomy of Viruses (ICTV) Release #39
   Downloaded Using: ICTVdump tool on July 17, 2024
   Selection Criteria:
     Viruses present in both VMR releases #37 and #39
     At least two representatives per family
   Sampling Method: Up to 5 genomes randomly sampled per family using pandas.sample(); 192 families represented
   Final Sample Size: 860 genomes

4. RefSeq Viral Dataset (Random Iteration)
   Source: NCBI Virus Portal
   Accessed: January 27, 2025
   Selection Criteria: Viruses with an assigned family-level taxonomy, up to 43 viruses per family
   Sampling Method: Random uniform sampling
   Final Sample Size: 6,778 viral genomes

5. RefSeq Viral Dataset (Prokaryote-Infecting)
   Source: NCBI Virus Portal
   Accessed: January 27, 2025
   Selection Criteria:
     Viruses with an assigned family-level taxonomy
     Prokaryote-infecting viruses only
   Sampling Method: Random uniform sampling
   Final Sample Size: 3,536 viral genomes

6. ICTV Release #39 (Reduction Study Subset)
   Source: International Committee on Taxonomy of Viruses (ICTV) Release #39
   Downloaded Using: ICTVdump tool on July 17, 2024
   Selection Criteria: 1,000 viruses randomly sampled from the full release
   Sampling Method: Random uniform sampling
   Final Sample Size: 1,000 viral genomes
   Notes: Starting data for the reduction study. The source code for generating the fragmented genomes is provided.
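The per-family capped sampling described for datasets 3 and 4 can be illustrated with a minimal pandas sketch. It is not the authors' script: the column names (`accession`, `family`), the helper `sample_per_family`, the seed, and the toy table are all assumptions made for illustration. Only the use of pandas sampling and the caps (up to 5 per family with at least two representatives, or up to 43 per family) come from the description above.

```python
import pandas as pd


def sample_per_family(df, cap, min_per_family=1, family_col="family", seed=0):
    """Randomly keep at most `cap` rows per family (hypothetical helper).

    Rows without a family label are dropped, and families with fewer than
    `min_per_family` rows are excluded before sampling.
    """
    labeled = df.dropna(subset=[family_col])
    # Number of rows in each row's family, aligned to the row index.
    sizes = labeled[family_col].map(labeled[family_col].value_counts())
    eligible = labeled[sizes >= min_per_family]
    parts = [
        group.sample(n=min(len(group), cap), random_state=seed)
        for _, group in eligible.groupby(family_col)
    ]
    if not parts:
        return eligible.iloc[0:0]
    return pd.concat(parts).sort_index()


# Toy metadata table; accessions and family labels are made up.
toy = pd.DataFrame(
    {
        "accession": [f"ACC{i:03d}" for i in range(11)],
        "family": ["A"] * 7 + ["B"] * 2 + ["C"] + [None],
    }
)

# ICTV-style settings: at least two representatives, up to 5 per family.
picked = sample_per_family(toy, cap=5, min_per_family=2)
print(picked["family"].value_counts().to_dict())
```

For the RefSeq random iteration, the same helper would be called with `cap=43` and the default `min_per_family=1`, assuming the family label is stored in a single column.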
Created:
2025-04-13