
skDER Representative Genomes for Select Bacterial Taxa

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8267522
Description:
Genomes belonging to a single genus or order were gathered using a loose search of taxonomic classifications in GTDB R214. By "loose", we mean that the string 'g__{GENUSNAME}' only had to be found in the taxonomic information column provided by GTDB. This allows associated genera to be gathered as well (genera that GTDB suggests are different, but that literature/domain experts have yet to rename). Genomes belonging to a taxon were dereplicated using skDER (v1.0.7) in "greedy" clustering mode with default parameter values (99% ANI cutoff, 90% AF cutoff).

Overview of Files:
- 'Genome_Dereplication_Overview.tsv' contains details of all the genomes considered as potential representatives for each taxonomic group and their GTDB R214 taxonomic classifications.
- 18 '_Clustering_Information.txt' files, which contain the relationship information of non-representative genomes to their nearest representative genome. These were generated using the `-n` argument in skDER v1.0.7.
- 18 tar.gz compressed directories. Each compressed directory features representative genomes in FASTA format determined for a particular taxon using skDER with greedy clustering and default cutoffs. Genome assemblies are renamed to feature both the GTDB taxonomic classification and the GCA identifier.
    - Acinetobacter - 1,643 rep genomes (17.8% of 9,221 total genomes considered)
    - Bacillales - 3,150 rep genomes (35.9% of 8,766 total genomes considered)
    - Corynebacterium - 726 rep genomes (43.0% of 1,688 total genomes considered)
    - Cutibacterium - 27 rep genomes (5.4% of 502 total genomes considered)
    - Enterobacter - 878 rep genomes (19.9% of 4,408 total genomes considered)
    - Enterococcus - 937 rep genomes (14.6% of 6,426 total genomes considered)
    - Escherichia - 2,436 rep genomes (7.1% of 34,358 total genomes considered)
    - Klebsiella - 1,022 rep genomes (5.6% of 18,145 total genomes considered)
    - Lactobacillus - 541 rep genomes (30.9% of 1,747 total genomes considered)
    - Listeria - 353 rep genomes (6.9% of 5,062 total genomes considered)
    - Micromonospora - 211 rep genomes (73.3% of 288 total genomes considered)
    - Mycobacterium - 744 rep genomes (6.9% of 10,657 total genomes considered)
    - Neisseria - 414 rep genomes (12.8% of 3,235 total genomes considered)
    - Pseudomonas - 2,666 rep genomes (18.9% of 14,066 total genomes considered)
    - Salmonella - 308 rep genomes (2.2% of 14,109 total genomes considered)
    - Staphylococcus - 496 rep genomes (2.5% of 19,627 total genomes considered)
    - Streptococcus - 2,452 rep genomes (13.3% of 18,492 total genomes considered)
    - Streptomyces - 1,555 rep genomes (57.7% of 2,697 total genomes considered)
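
The loose taxonomic search described above can be illustrated with a minimal Python sketch. It is not the code used to build this dataset. The function names, the accessions, and the abbreviated taxonomy strings are hypothetical placeholders written in GTDB's general style, and the sketch assumes the search is a plain substring test for 'g__{GENUSNAME}'.

```python
from typing import Iterable


def loose_genus_match(taxonomy: str, genus: str) -> bool:
    """Return True if 'g__<genus>' occurs anywhere in the taxonomy string.

    Assumption: the loose search is a plain substring test, so suffixed
    genus labels such as 'g__Pseudomonas_E' also match 'Pseudomonas'.
    """
    return f"g__{genus}" in taxonomy


def select_genomes(records: Iterable[tuple[str, str]], genus: str) -> list[str]:
    """Keep the accessions whose taxonomy string loosely matches the genus."""
    return [acc for acc, tax in records if loose_genus_match(tax, genus)]


# Hypothetical (accession, taxonomy) pairs; not real entries from GTDB R214.
records = [
    ("GCA_000000001.1", "f__Pseudomonadaceae;g__Pseudomonas;s__Pseudomonas aeruginosa"),
    ("GCA_000000002.1", "f__Pseudomonadaceae;g__Pseudomonas_E;s__Pseudomonas_E fluorescens"),
    ("GCA_000000003.1", "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli"),
]

selected = select_genomes(records, "Pseudomonas")
print(selected)
```

Under this assumption, a substring test keeps genomes that GTDB places in a suffixed genus alongside those in the unsuffixed one. It would equally keep any other genus whose name merely begins with the same string, which is one reason the per-genome classifications in 'Genome_Dereplication_Overview.tsv' are worth checking.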
Created:
2023-10-26