
Table 2_PRONAME: a user-friendly pipeline to process long-read nanopore metabarcoding data by generating high-quality consensus sequences.xlsx

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://figshare.com/articles/dataset/Table_2_PRONAME_a_user-friendly_pipeline_to_process_long-read_nanopore_metabarcoding_data_by_generating_high-quality_consensus_sequences_xlsx/28069538
Description:
Background: The study of sample taxonomic composition has evolved from direct observations and labor-intensive morphological studies to different DNA sequencing methodologies. Most of these studies leverage the metabarcoding approach, which involves the amplification of a small taxonomically-informative portion of the genome and its subsequent high-throughput sequencing. Recent advances in sequencing technology brought by Oxford Nanopore Technologies have revolutionized the field, enabling portability, affordable cost and long-read sequencing, therefore leading to a significant increase in taxonomic resolution. However, Nanopore sequencing data exhibit a particular profile, with a higher error rate compared with Illumina sequencing, and existing bioinformatics pipelines for the analysis of such data are scarce and often insufficient, requiring specialized tools to accurately process long-read sequences.

Results: We present PRONAME (PROcessing NAnopore MEtabarcoding data), an open-source, user-friendly pipeline optimized for processing raw Nanopore sequencing data. PRONAME includes precompiled databases for complete 16S sequences (Silva138 and Greengenes2) and a newly developed and curated database dedicated to bacterial 16S-ITS-23S operon sequences. The user can also provide a custom database if desired, therefore enabling the analysis of metabarcoding data for any domain of life. The pipeline significantly improves sequence accuracy, implementing innovative error-correction strategies and taking advantage of the new sequencing chemistry to produce high-quality duplex reads. Evaluations using a mock community have shown that PRONAME delivers consensus sequences demonstrating at least 99.5% accuracy with standard settings (and up to 99.7%), making it a robust tool for genomic analysis of complex multi-species communities.

Conclusion: PRONAME meets the challenges of long-read Nanopore data processing, offering greater accuracy and versatility than existing pipelines. By integrating Nanopore-specific quality filtering, clustering and error correction, PRONAME produces high-precision consensus sequences. This brings the accuracy of Nanopore sequencing close to that of Illumina sequencing, while taking advantage of the benefits of long-read technologies.
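
To give a sense of what a consensus accuracy of 99.5% means in practice, the following is a minimal Python sketch that scores a consensus sequence against a known reference, as one might do with a mock community. It is not taken from PRONAME: the function names `edit_distance` and `percent_identity` are hypothetical, and the accuracy definition used here (one minus the edit distance divided by the length of the longer sequence) is an assumption, since the description above does not state how accuracy was computed.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def percent_identity(consensus: str, reference: str) -> float:
    """Assumed definition: 100 * (1 - edit distance / length of the longer sequence)."""
    longest = max(len(consensus), len(reference))
    if longest == 0:
        raise ValueError("both sequences are empty")
    return 100.0 * (1.0 - edit_distance(consensus, reference) / longest)


# Toy data (not from the study): a 200-base reference and a consensus
# that differs from it by a single substitution at position 100.
reference = "ACGT" * 50
consensus = reference[:100] + "T" + reference[101:]
print(f"{percent_identity(consensus, reference):.1f}%")  # prints 99.5%
```

Under this assumed definition, 99.5% identity over a full-length 16S sequence of about 1,500 bases corresponds to roughly 7 or 8 differences from the reference.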

Created:
2024-12-20