
GTDB r214.1 Mash Database (UNOFFICIAL MIRROR)

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8048186
Description:
This is an UNOFFICIAL host for the GTDB Mash sketch based on GTDB r214.1. The intended use of this file is inclusion in the VEBA database for quicker GTDB-Tk analysis.

Created by running the following command using GTDB-Tk v2.3.0 on the S1 sample from Zenodo:7946802:

    gtdbtk classify_wf --genome_dir veba_output/binning/prokaryotic/S1/output/genomes/ --out_dir test_output -x fa --cpus 1 --mash_db ./gtdb_r214.msh

Source Files: gtdbtk_r214_data.tar.gz   RELEASE_NOTES.txt

Release Notes:

Release 214.1:
-------------
Correction regarding the classification of the genome "GB_GCA_902406375.1" in the 214.1 release. We have identified an error in the taxonomy assignment for this particular genome. The genome GB_GCA_902406375.1 was previously classified as Collinsella sp905215505 in some files. We have reevaluated the taxonomy and determined that the correct classification should be Collinsella sp002232035. We have rectified this error and made the necessary updates to the following files within the package:
- bac120_taxonomy_r214.tsv
- sp_clusters_r214.tsv
- ssu_all_r214.tar.gz

Notes:
------
- We thank Jan Mareš for his help in curating the Cyanobacteria
- Phylum names have been updated following the valid publication of 42 names in IJSEM (https://pubmed.ncbi.nlm.nih.gov/34694987/), including Bacillota and Pseudomonadota
- Fixed issue with SSU files where sequences started 2 bp after correct start and stopped 1 bp after correct end of sequence. Thanks to CX for bringing this issue to our attention: https://forum.gtdb.ecogenomic.org/t/16s-23s-and-ssu-all-r207/307/2
- SSU files now provide sequences in their 5' to 3' orientation
- Changed QC criterion for number of contigs from 1000 to 2000 in order to better align the GTDB criteria with RefSeq (https://www.ncbi.nlm.nih.gov/assembly/help/anomnotrefseq/)
- Changed QC criterion to use ar53 instead of ar122 marker set. The impact of this change was evaluated on the 353,569 genomes (~6,100 archaeal) considered for GTDB R207:
-- only 1 additional genome passed QC
-- only 21 additional genomes failed QC which included the following species representatives:
-- s__Methanoregula sp002497485
-- s__Methanobrevibacter_A sp017634055
-- s__Methanosphaera sp003266165
-- s__MGIIa-L1 sp002688825
-- s__MGIIb-N2 sp002503665
-- s__MGIIa-L2 sp002692685
-- s__MGIIb-O3 sp002730445
-- s__DTDI01 sp011334935
-- s__Methanosphaera sp017652595
-- s__Nitrosopelagicus sp902606945
-- s__Methanolinea sp002501965

If you have found this useful, please cite the original publications:

Chaumeil PA, et al. 2022. GTDB-Tk v2: memory friendly classification with the Genome Taxonomy Database. Bioinformatics, btac672.
Parks, D.H., et al. (2021). GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 50: D785–D794.
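
For pipelines that call GTDB-Tk from a script, the classify_wf invocation quoted in the description can be assembled programmatically. The following is a minimal sketch, not part of the original record: the function names `build_classify_cmd` and `run_classify` are hypothetical, and only the flags that appear in the quoted command are used. The sketch assumes that GTDB-Tk reuses an existing file passed to `--mash_db` instead of rebuilding it, which is the intended use of this mirror; check the GTDB-Tk documentation for your version before relying on that.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_classify_cmd(genome_dir, out_dir, mash_db, extension="fa", cpus=1):
    """Return the argument list for the classify_wf command quoted above.

    Hypothetical helper: it only arranges the flags shown in the record.
    """
    return [
        "gtdbtk", "classify_wf",
        "--genome_dir", str(genome_dir),
        "--out_dir", str(out_dir),
        "-x", extension,
        "--cpus", str(cpus),
        "--mash_db", str(mash_db),
    ]


def run_classify(genome_dir, out_dir, mash_db, **kwargs):
    """Run GTDB-Tk with a pre-downloaded Mash sketch (hypothetical wrapper).

    Design choice of this sketch: fail early when the .msh file is missing,
    so that a typo in the path does not silently trigger a fresh sketch build.
    """
    if shutil.which("gtdbtk") is None:
        raise RuntimeError("gtdbtk not found on PATH")
    if not Path(mash_db).is_file():
        raise FileNotFoundError(mash_db)
    cmd = build_classify_cmd(genome_dir, out_dir, mash_db, **kwargs)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command with the paths from the record, without executing it.
    cmd = build_classify_cmd(
        "veba_output/binning/prokaryotic/S1/output/genomes/",
        "test_output",
        "./gtdb_r214.msh",
    )
    print(shlex.join(cmd))
```

Building the command as a list and passing it to `subprocess.run` avoids shell quoting problems when genome directories contain spaces; `run_classify` is only useful on a machine where GTDB-Tk and its reference data are already installed.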
Created:
2023-06-17