
16S-ITS-23S-DB database

DataCite Commons, updated 2025-05-16, indexed 2025-04-16
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/APDPLQ
Description:
Extraction of 16S-ITS-23S sequences: All GenBank and RefSeq genomes of Archaea and Bacteria from GTDB v214 [1] were downloaded using genome_updater v0.6.2. A total of 402,695 assemblies were downloaded, excluding 4 assemblies with missing fna files and 77,032 assemblies with missing annotation files. These incomplete genomes were ignored, leaving 402,691 assemblies with sequences. The annotation file (GFF) of each assembly was parsed to identify the 16S and 23S genes. Identification was performed by applying regex patterns to the product or gene attributes of the rRNA annotations. Pairs of 16S and 23S genes were generated under the following conditions: (i) both the 16S and 23S genes must be on the same strand, (ii) the extracted portion must be between 3000 and 7000 nucleotides in length, and (iii) the region must begin with 16S and end with 23S. Genomic regions meeting these criteria were extracted. A total of 358,166 16S-ITS-23S regions were found in 142,377 of the 402,691 assemblies.

Preprocessing of the 16S-ITS-23S sequences: To remove redundant information among the 358,166 16S-ITS-23S regions, identical sequences with the exact same taxonomy were dereplicated, resulting in a total of 199,690 unique sequences. To identify and remove potential eukaryotic contamination, the sequences were blasted against an S. cerevisiae 35S sequence. Sequences with an identity of at least 70% and a coverage greater than 40% were removed (283 sequences).

1. Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil PA, Hugenholtz P. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 2022;50(D1):D785-D794.
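
The pairing conditions and the dereplication step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the regex patterns, the function names (`parse_rrna_genes`, `pair_regions`, `dereplicate`), the requirement that both genes lie on the same contig, and the reading of "begin with 16S and end with 23S" on the minus strand (23S at the lower coordinates) are all assumptions made for illustration.

```python
import re
from collections import namedtuple

# Hypothetical patterns: the regexes actually used for the database are not published here.
RE_16S = re.compile(r"16S", re.IGNORECASE)
RE_23S = re.compile(r"23S", re.IGNORECASE)

Gene = namedtuple("Gene", "contig start end strand kind")


def parse_rrna_genes(gff_lines):
    """Collect 16S and 23S rRNA features from GFF3 lines."""
    genes = []
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "rRNA":
            continue
        # Column 9 holds key=value attributes separated by ';'.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        label = attrs.get("product", "") + " " + attrs.get("gene", "")
        if RE_16S.search(label):
            kind = "16S"
        elif RE_23S.search(label):
            kind = "23S"
        else:
            continue
        genes.append(Gene(cols[0], int(cols[3]), int(cols[4]), cols[6], kind))
    return genes


def pair_regions(genes, min_len=3000, max_len=7000):
    """Return (contig, start, end, strand) for each valid 16S/23S pair."""
    regions = []
    for g16 in (g for g in genes if g.kind == "16S"):
        for g23 in (g for g in genes if g.kind == "23S"):
            # (i) same strand; same contig is an added assumption.
            if g16.contig != g23.contig or g16.strand != g23.strand:
                continue
            # (iii) 16S first in transcription order (assumed interpretation).
            if g16.strand == "+":
                start, end = g16.start, g23.end
                ordered = g16.end < g23.start
            else:
                start, end = g23.start, g16.end
                ordered = g23.end < g16.start
            # (ii) region length between 3000 and 7000 nucleotides.
            length = end - start + 1
            if ordered and min_len <= length <= max_len:
                regions.append((g16.contig, start, end, g16.strand))
    return regions


def dereplicate(records):
    """Keep the first occurrence of each (sequence, taxonomy) pair."""
    seen = set()
    unique = []
    for seq, taxonomy in records:
        key = (seq.upper(), taxonomy)
        if key not in seen:
            seen.add(key)
            unique.append((seq, taxonomy))
    return unique
```

Note that `dereplicate` only merges sequences whose taxonomy string is also identical, so the same sequence assigned to two different taxa is kept twice, matching the "exact same taxonomy" condition in the description.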
Provider:
Recherche Data Gouv
Created:
2023-09-28