rEGEN-B database

Name: rEGEN-B database
Creator: Dubois, Benjamin
Published: 2025-07-31 00:00:00
License: No description available

Figshare. Updated 2025-07-31; indexed 2026-04-08

Download link:

https://figshare.com/articles/dataset/rEGEN-B_database/26380702/2

Description:

The rEGEN-B (rrn operons Extracted from GENomes of Bacteria) database is dedicated to the ribosomal operon sequences of bacteria. The database contains 523,869 sequences, representing 16,217 species, with an average length of 4,580 bp.

The database was filtered according to the following “high-confidence curation” criteria:
(i) the sequences only come from genomes with confident assembly levels (i.e. “chromosome” or “complete genome” status, but not “contig” or “scaffold”);
(ii) only sufficiently recent genomes were retained for operon sequence extraction (nothing before 2005);
(iii) the database was curated using the DB4Q2 pipeline (Dubois et al., 2022) to discard low-quality and misidentified sequences.

To enable users with lower computational capabilities to use the rEGEN-B database more efficiently, a lighter version of the database has also been compiled by extracting only the first copy of the rrn operon in each genome (see the “uniq” label in the database files). This lighter database contains 115,032 rrn operon sequences.

Database update (2025-01-15):
rEGEN-B: 542,371 sequences, 15,903 species
rEGEN-B_uniq: 115,727 sequences, 15,903 species

The rEGEN-B database was constructed as part of the PRONAME pipeline, which has been developed to process Nanopore metabarcoding data and to significantly increase its accuracy and usability. Thanks to an innovative approach combining different quality filtering steps, read clustering, error correction with a tool specifically dedicated to Nanopore data, and the use of duplex reads, the generated consensus sequences display at least 99.5% accuracy with default settings.

Please refer to the project GitHub repository for detailed information: https://github.com/benn888/PRONAME

References

rEGEN-B databases
Dubois, B., Delitte, M., Lengrand, S., Bragard, C., Legrève, A., & Debode, F. (2024). PRONAME: a user-friendly pipeline to process long-read nanopore metabarcoding data by generating high-quality consensus sequences. Frontiers in Bioinformatics, 4, 1483255.

DB4Q2 pipeline
Dubois, B., Debode, F., Hautier, L., Hulin, J., Martin, G. S., Delvaux, A., et al. (2022). A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data. BMC Genom Data 23, 53. doi: 10.1186/s12863-022-01067-5
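Illustrative sketch: the description says the “uniq” version keeps only the first copy of the rrn operon in each genome. The Python below shows one way such a subset could be derived from a multi-copy FASTA file. It is not the authors' actual procedure. The header convention (`<genome accession>_<operon index>`), the helper names, and the example sequences are all hypothetical; the real rEGEN-B headers may be organized differently.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def genome_id(header: str) -> str:
    """ASSUMPTION: headers look like '<genome accession>_<operon index>'."""
    return header.split()[0].rsplit("_", 1)[0]


def first_copy_per_genome(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[str, str]]:
    """Keep only the first record encountered for each genome."""
    kept: Dict[str, Tuple[str, str]] = {}
    for header, seq in records:
        # setdefault leaves an existing entry untouched, so later copies are ignored
        kept.setdefault(genome_id(header), (header, seq))
    return kept


# Made-up toy input: two operon copies in one genome, one copy in another.
example = """>GCF_000001.1_1
ACGT
ACGT
>GCF_000001.1_2
TTTT
>GCF_000002.1_1
GGCC
""".splitlines()

uniq = first_copy_per_genome(read_fasta(example))
for header, seq in uniq.values():
    print(f">{header}\n{seq}")
```

For a real file, the toy `example` list would be replaced by an open file handle, and `genome_id` would need to be adapted to whatever identifier scheme the downloaded FASTA headers actually use.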

Provider:

Dubois, Benjamin

Created:

2025-02-18
