skDER Representative Genomes for Select Bacterial Taxa
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8267522
Description:
Genomes belonging to a single genus or order were gathered using a loose search of the taxonomic classifications in GTDB R214. By "loose", we mean that the string 'g__{GENUSNAME}' was required to appear in the taxonomy column provided by GTDB. This also gathers associated genera (those that GTDB treats as distinct but that the literature and domain experts have yet to rename).
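The loose search described above amounts to a substring test on the taxonomy string. Below is a minimal sketch, assuming a tab-separated metadata table with columns named `accession` and `gtdb_taxonomy`; the column names, the function name `loose_genus_match`, and the sample rows are illustrative assumptions, not details taken from the dataset.

```python
import csv
import io


def loose_genus_match(tsv_text, genus, taxonomy_column="gtdb_taxonomy"):
    """Return accessions whose taxonomy string contains 'g__{genus}'.

    This is a plain substring test, so suffixed genus labels such as
    'g__Bacillus_A' also match a search for 'Bacillus'.
    Column names are assumptions made for this sketch.
    """
    needle = f"g__{genus}"
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["accession"] for row in reader if needle in row[taxonomy_column]]


# Made-up rows for illustration only (not real GTDB records).
sample = (
    "accession\tgtdb_taxonomy\n"
    "GCA_000000001.1\td__Bacteria;f__Bacillaceae;g__Bacillus;s__Bacillus subtilis\n"
    "GCA_000000002.1\td__Bacteria;f__Bacillaceae;g__Bacillus_A;s__Bacillus_A cereus\n"
    "GCA_000000003.1\td__Bacteria;f__Listeriaceae;g__Listeria;s__Listeria innocua\n"
)

matches = loose_genus_match(sample, "Bacillus")
print(matches)
```

Because the test is a substring match rather than an exact comparison of the genus field, the second row is kept even though its genus label carries a suffix.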
Genomes belonging to each taxon were dereplicated using skDER (v1.0.7) in "greedy" clustering mode with default values for parameters (99% ANI cutoff, 90% AF cutoff).
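To illustrate how the two cutoffs interact, here is a generic greedy dereplication sketch. It is not skDER's actual implementation: the processing order, the `similarity` callback, and the toy values are all assumptions made for illustration. A genome is treated as redundant only if both its ANI and its AF to an existing representative meet the cutoffs.

```python
def greedy_dereplicate(genomes, similarity, ani_cutoff=99.0, af_cutoff=90.0):
    """Generic greedy dereplication (illustrative, not skDER's code).

    genomes: genome IDs in priority order (best candidates first).
    similarity: callable (query, rep) -> (ANI %, AF %); hypothetical here.
    Returns (representatives, assignment of each genome to a representative).
    """
    representatives = []
    assignment = {}
    for genome in genomes:
        for rep in representatives:
            ani, af = similarity(genome, rep)
            # Redundant only when BOTH thresholds are satisfied.
            if ani >= ani_cutoff and af >= af_cutoff:
                assignment[genome] = rep
                break
        else:
            # No existing representative covers this genome.
            representatives.append(genome)
            assignment[genome] = genome
    return representatives, assignment


# Toy pairwise values (ANI %, AF %), invented for this example.
toy = {
    ("B", "A"): (99.5, 95.0),  # passes both cutoffs: B joins A
    ("C", "A"): (98.0, 95.0),  # ANI too low: C becomes a representative
    ("D", "A"): (99.9, 80.0),  # AF too low against A
    ("D", "C"): (99.2, 92.0),  # passes both cutoffs: D joins C
}

reps, assigned = greedy_dereplicate(["A", "B", "C", "D"], lambda q, r: toy[(q, r)])
print(reps, assigned)
```

In this toy run, D is not absorbed by A despite a very high ANI, because the aligned fraction falls below the 90% cutoff.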
Overview of Files:
- The 'Genome_Dereplication_Overview.tsv' contains details of all the genomes considered as potential representatives for each taxonomic group and their GTDB R214 taxonomic classifications.
- 18 _Clustering_Information.txt files, which contain the relationship of each non-representative genome to its nearest representative genome. These were generated using the `-n` argument in skDER v1.0.7.
- 18 tar.gz compressed directories are provided. Each compressed directory features representative genomes in FASTA format determined for a particular taxon using skDER with greedy clustering and default cutoffs. Genome assemblies are renamed to feature both the GTDB taxonomic classification and the GCA identifier.
  - Acinetobacter - 1,643 rep genomes (17.8% of 9,221 total genomes considered)
  - Bacillales - 3,150 rep genomes (35.9% of 8,766 total genomes considered)
  - Corynebacterium - 726 rep genomes (43.0% of 1,688 total genomes considered)
  - Cutibacterium - 27 rep genomes (5.4% of 502 total genomes considered)
  - Enterobacter - 878 rep genomes (19.9% of 4,408 total genomes considered)
  - Enterococcus - 937 rep genomes (14.6% of 6,426 total genomes considered)
  - Escherichia - 2,436 rep genomes (7.1% of 34,358 total genomes considered)
  - Klebsiella - 1,022 rep genomes (5.6% of 18,145 total genomes considered)
  - Lactobacillus - 541 rep genomes (30.9% of 1,747 total genomes considered)
  - Listeria - 353 rep genomes (6.9% of 5,062 total genomes considered)
  - Micromonospora - 211 rep genomes (73.3% of 288 total genomes considered)
  - Mycobacterium - 744 rep genomes (6.9% of 10,657 total genomes considered)
  - Neisseria - 414 rep genomes (12.8% of 3,235 total genomes considered)
  - Pseudomonas - 2,666 rep genomes (18.9% of 14,066 total genomes considered)
  - Salmonella - 308 rep genomes (2.2% of 14,109 total genomes considered)
  - Staphylococcus - 496 rep genomes (2.5% of 19,627 total genomes considered)
  - Streptococcus - 2,452 rep genomes (13.3% of 18,492 total genomes considered)
  - Streptomyces - 1,555 rep genomes (57.7% of 2,697 total genomes considered)
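The percentages listed for each taxon are the share of considered genomes that were kept as representatives. A short sketch of that arithmetic, using two of the counts listed above; the function name `percent_retained` is a hypothetical helper introduced for this example.

```python
def percent_retained(representatives, total):
    """Percentage of considered genomes kept as representatives."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * representatives / total


# Counts taken from the per-taxon list above.
counts = {
    "Salmonella": (308, 14109),
    "Micromonospora": (211, 288),
}

for taxon, (reps, total) in counts.items():
    print(f"{taxon}: {percent_retained(reps, total):.1f}% retained")
# Salmonella: 2.2% retained
# Micromonospora: 73.3% retained
```

A low percentage indicates that many of the considered genomes were near-identical at the chosen cutoffs, while a high percentage indicates that most genomes were distinct enough to be retained.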
Created:
2023-10-26



