rEGEN-B database

Figshare | Updated 2025-07-31 | Indexed 2026-04-08
Download link:
https://figshare.com/articles/dataset/rEGEN-B_database/26380702/2
Description:
The rEGEN-B (rrn operons Extracted from GENomes of Bacteria) database is dedicated to the ribosomal operon sequences of bacteria. The database contains 523,869 sequences, representing 16,217 species, with an average length of 4,580 bp.

The database was filtered according to three "high-confidence curation" criteria: (i) sequences come only from genomes with a confident assembly level (i.e. "chromosome" or "complete genome" status, not "contig" or "scaffold"); (ii) only sufficiently recent genomes were retained for operon sequence extraction (nothing before 2005); and (iii) the database was curated using the DB4Q2 pipeline (Dubois et al., 2022) to discard low-quality and misidentified sequences.

To let users with limited computational resources work with rEGEN-B more efficiently, a lighter version of the database has also been compiled by extracting only the first copy of the rrn operon in each genome (see the "uniq" label in the database files). This lighter database contains 115,032 rrn operon sequences.

Database update (2025-01-15):
rEGEN-B: 542,371 sequences, 15,903 species
rEGEN-B_uniq: 115,727 sequences, 15,903 species

The rEGEN-B database was constructed as part of the PRONAME pipeline, which was developed to process Nanopore metabarcoding data and to significantly increase its accuracy and usability. By combining several quality filtering steps, read clustering, error correction with a tool specifically dedicated to Nanopore data, and the exploitation of duplex reads, the pipeline generates consensus sequences that display at least 99.5% accuracy with default settings.

Please refer to the project GitHub repository for detailed information: https://github.com/benn888/PRONAME

References

rEGEN-B databases
Dubois, B., Delitte, M., Lengrand, S., Bragard, C., Legrève, A., & Debode, F. (2024). PRONAME: a user-friendly pipeline to process long-read nanopore metabarcoding data by generating high-quality consensus sequences. Frontiers in bioinformatics, 4, 1483255.

DB4Q2 pipeline
Dubois, B., Debode, F., Hautier, L., Hulin, J., Martin, G. S., Delvaux, A., et al. (2022). A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data. BMC Genom Data 23, 53. doi: 10.1186/s12863-022-01067-5
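
To make curation criteria (i) and (ii) and the "uniq" selection from the description concrete, here is a minimal Python sketch. It is not the PRONAME or DB4Q2 implementation. The `OperonRecord` layout, its field names, and the reading of "first copy" as the lowest copy index are all assumptions made for illustration; the real metadata format of the rEGEN-B files may differ. Criterion (iii), the DB4Q2 quality curation, is not modeled.

```python
from dataclasses import dataclass

# Hypothetical record layout, for illustration only: the actual
# rEGEN-B / PRONAME metadata fields are not described in the source.
@dataclass(frozen=True)
class OperonRecord:
    genome_id: str        # identifier of the source genome (assumed)
    copy_index: int       # 1-based rank of the rrn operon copy in the genome (assumed)
    assembly_level: str   # e.g. "complete genome", "chromosome", "scaffold", "contig"
    release_year: int     # year the genome was released (assumed field)
    sequence: str

# Criterion (i): only confident assembly levels are accepted.
ACCEPTED_LEVELS = {"chromosome", "complete genome"}
# Criterion (ii): nothing before 2005.
MIN_YEAR = 2005

def passes_curation(rec: OperonRecord) -> bool:
    """Apply criteria (i) and (ii) to one record."""
    level_ok = rec.assembly_level.strip().lower() in ACCEPTED_LEVELS
    return level_ok and rec.release_year >= MIN_YEAR

def first_copy_per_genome(records):
    """Keep one operon per genome: the copy with the lowest copy_index.

    Interpreting "first copy" as the lowest index is an assumption.
    """
    first = {}
    for rec in records:
        kept = first.get(rec.genome_id)
        if kept is None or rec.copy_index < kept.copy_index:
            first[rec.genome_id] = rec
    return list(first.values())

# Toy input with made-up genome identifiers and sequences.
records = [
    OperonRecord("G1", 2, "complete genome", 2018, "ACGT"),
    OperonRecord("G1", 1, "complete genome", 2018, "ACGA"),
    OperonRecord("G2", 1, "contig", 2020, "ACGG"),       # fails criterion (i)
    OperonRecord("G3", 1, "chromosome", 2003, "ACGC"),   # fails criterion (ii)
    OperonRecord("G4", 1, "Chromosome", 2005, "ACTT"),
]

curated = [r for r in records if passes_curation(r)]   # analogous to the full database
uniq = first_copy_per_genome(curated)                  # analogous to the "uniq" version
```

In this toy input, the contig-level genome and the pre-2005 genome are dropped, and the two-copy genome contributes a single sequence to the lighter set.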
Provider:
Dubois, Benjamin
Created:
2025-02-18