Table 4_PRONAME: a user-friendly pipeline to process long-read nanopore metabarcoding data by generating high-quality consensus sequences.xlsx
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Table_4_PRONAME_a_user-friendly_pipeline_to_process_long-read_nanopore_metabarcoding_data_by_generating_high-quality_consensus_sequences_xlsx/28069496
Description:
Background
The study of sample taxonomic composition has evolved from direct observations and labor-intensive morphological studies to different DNA sequencing methodologies. Most of these studies leverage the metabarcoding approach, which involves the amplification of a small taxonomically-informative portion of the genome and its subsequent high-throughput sequencing. Recent advances in sequencing technology brought by Oxford Nanopore Technologies have revolutionized the field, enabling portability, affordable cost and long-read sequencing, therefore leading to a significant increase in taxonomic resolution. However, Nanopore sequencing data exhibit a particular profile, with a higher error rate compared with Illumina sequencing, and existing bioinformatics pipelines for the analysis of such data are scarce and often insufficient, requiring specialized tools to accurately process long-read sequences.
Results
We present PRONAME (PROcessing NAnopore MEtabarcoding data), an open-source, user-friendly pipeline optimized for processing raw Nanopore sequencing data. PRONAME includes precompiled databases for complete 16S sequences (Silva138 and Greengenes2) and a newly developed and curated database dedicated to bacterial 16S-ITS-23S operon sequences. The user can also provide a custom database if desired, therefore enabling the analysis of metabarcoding data for any domain of life. The pipeline significantly improves sequence accuracy, implementing innovative error-correction strategies and taking advantage of the new sequencing chemistry to produce high-quality duplex reads. Evaluations using a mock community have shown that PRONAME delivers consensus sequences demonstrating at least 99.5% accuracy with standard settings (and up to 99.7%), making it a robust tool for genomic analysis of complex multi-species communities.
Conclusion
PRONAME meets the challenges of long-read Nanopore data processing, offering greater accuracy and versatility than existing pipelines. By integrating Nanopore-specific quality filtering, clustering and error correction, PRONAME produces high-precision consensus sequences. This brings the accuracy of Nanopore sequencing close to that of Illumina sequencing, while taking advantage of the benefits of long-read technologies.
Created:
2024-12-20



