
Trove of Gut Virus Genomes (TGVG)

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/8357020
Resource description:
TGVG_v1.1.genomes.all.fna

Sequences from the Gut Virome Database, the Cenote Human Virome Database, the Metagenomic Gut Virus catalog, and the Gut Phage Database were downloaded and dereplicated at 95% average nucleotide identity (ANI) across 85% alignment fraction (AF). This step used anicalc.py and aniclust.py from the CheckV (version 0.9.0) package, in line with community standards for metagenomic virus sequences. The exemplar sequence of each cluster, along with each singleton, was kept and run through Cenote-Taker 2 (version 2.1.5) to predict virus hallmark genes in each sequence using the ‘virion’ hallmark gene database.

Sequences were kept if they met either of two rules:
1) they encoded direct terminal repeats (a signature of a complete virus genome) and one or more virus hallmark genes, and were at least 1.5 kilobases long; or
2) they encoded two or more virus hallmark genes and were longer than 12 kilobases.

Sequences meeting these criteria were run through CheckV to remove flanking host (bacterial) sequences and to quantify the ratio of virus genes to bacterial genes for each contig. Sequences with 3 or fewer virus genes and 3 or more bacterial genes after pruning were discarded. Finally, the remaining sequences were dereplicated again with the CheckV scripts at 95% ANI and 85% AF. This yielded the Trove of Gut Virus Genomes: 110,296 genomes/genome fragments, each representing a viral species-level genome bin (SGB).

TGVG_v1.1_metadata.tsv

For each sequence in the Trove of Gut Virus Genomes, CheckV was used to estimate completeness, and ipHOP (version 1.1.0) was used to predict the bacterial/archaeal host genus. BACPHLIP (version 0.9.3) was run on each sequence predicted to be 90% or more complete to predict phage virulence. vConTACT2 (version 0.11.3) was used to cluster viral SGBs from the Trove of Gut Virus Genomes into virus clusters. Viral SGBs with the vConTACT2 “Singleton” label were treated as singletons in downstream analysis. So were viral SGBs with the labels “Unassigned”, “Outlier”, “Overlap”, and “Clustered/Singleton”.
The geNomad (version 1.5.2) taxonomy module was run on each sequence to obtain taxonomic assignments at the phylum, class, order, and family levels.
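The two retention rules and the CheckV-based contamination filter described above can be expressed as simple predicates. The following is a minimal illustration and is not part of the TGVG release. It assumes that per-contig annotations have already been extracted from Cenote-Taker 2 and CheckV output into the hypothetical `Contig` fields shown. It also assumes "at least 1.5 kb" and "longer than 12 kb" as the length boundaries.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Per-contig annotations (hypothetical field names, not TGVG columns)."""
    name: str
    length_bp: int
    has_dtr: bool    # direct terminal repeats detected
    n_hallmark: int  # number of virus hallmark genes detected


def passes_hallmark_filter(c: Contig) -> bool:
    # Rule 1: DTRs + at least one hallmark gene + length >= 1.5 kb (assumed inclusive)
    rule1 = c.has_dtr and c.n_hallmark >= 1 and c.length_bp >= 1_500
    # Rule 2: at least two hallmark genes + length strictly greater than 12 kb
    rule2 = c.n_hallmark >= 2 and c.length_bp > 12_000
    return rule1 or rule2


def passes_checkv_filter(n_viral_genes: int, n_host_genes: int) -> bool:
    # Discard only when BOTH conditions hold after host-region pruning:
    # 3 or fewer viral genes AND 3 or more host (bacterial) genes.
    return not (n_viral_genes <= 3 and n_host_genes >= 3)
```

For example, a 2 kb contig with DTRs and one hallmark gene passes rule 1. A 12,000 bp contig without DTRs fails both rules, even with two hallmark genes.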
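Dereplication at 95% ANI across 85% AF can be pictured as a greedy, length-ordered clustering. The following sketch is written in the spirit of CheckV's aniclust.py but does not reimplement it. It assumes that pairwise ANI and AF values (in percent) have already been computed, for example with anicalc.py. It also uses a hypothetical input format in which AF refers to the shorter sequence of each pair. At each step, the longest unassigned sequence becomes the exemplar of a new cluster.

```python
from collections import defaultdict


def greedy_dereplicate(lengths, hits, min_ani=95.0, min_af=85.0):
    """Greedy centroid clustering (illustrative sketch).

    lengths: {seq_id: length_bp}
    hits: iterable of (seq_a, seq_b, ani_pct, af_pct) tuples (hypothetical format)
    Returns {exemplar_id: [member ids, exemplar first]}.
    """
    # Keep only pairs that satisfy both thresholds, treated as undirected edges
    neighbors = defaultdict(set)
    for a, b, ani, af in hits:
        if a != b and ani >= min_ani and af >= min_af:
            neighbors[a].add(b)
            neighbors[b].add(a)

    clusters = {}
    assigned = set()
    # Longest first (ties broken by id), so each exemplar is its cluster's longest member
    for seq in sorted(lengths, key=lambda s: (-lengths[s], s)):
        if seq in assigned:
            continue
        members = [seq] + sorted(n for n in neighbors[seq] if n not in assigned)
        assigned.update(members)
        clusters[seq] = members
    return clusters


lengths = {"A": 40000, "B": 39000, "C": 20000, "D": 5000}
hits = [("A", "B", 98.0, 95.0), ("B", "C", 96.0, 90.0), ("A", "C", 80.0, 90.0)]
clusters = greedy_dereplicate(lengths, hits)
```

Here `A` and `B` collapse into one cluster represented by `A`. `C` stays separate even though it matches `B`, because `B` was already absorbed into `A`'s cluster and `C` does not meet the thresholds against `A`. `D` has no qualifying hits and is a singleton.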
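The singleton convention for vConTACT2 labels can be applied as a simple relabeling step. This sketch assumes that the status strings match the labels listed above, possibly followed by a parenthesized suffix (assumed here for `Overlap` entries). The helper names are hypothetical, and any status outside the listed set is counted as clustered.

```python
# Labels treated as singletons in the TGVG description
SINGLETON_LIKE = {"Singleton", "Unassigned", "Outlier", "Overlap", "Clustered/Singleton"}


def collapse_status(status: str) -> str:
    """Map a vConTACT2-style status string to 'Singleton' or 'Clustered'."""
    # Keep only the leading token, e.g. 'Overlap (VC_1_0/VC_2_0)' -> 'Overlap'
    head = status.strip().split(" ", 1)[0]
    return "Singleton" if head in SINGLETON_LIKE else "Clustered"


def count_by_status(statuses):
    """Tally collapsed statuses for a collection of viral SGBs."""
    counts = {"Clustered": 0, "Singleton": 0}
    for s in statuses:
        counts[collapse_status(s)] += 1
    return counts
```

For instance, `count_by_status(["Clustered", "Outlier", "Singleton", "Unassigned"])` counts one clustered SGB and three singletons.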
Created:
2023-09-19