Data_Sheet_1_Validation of a Bioinformatics Workflow for Routine Analysis of Whole-Genome Sequencing Data and Related Challenges for Pathogen Typing in a European National Reference Center: Neisseria meningitidis as a Proof-of-Concept.pdf
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_Validation_of_a_Bioinformatics_Workflow_for_Routine_Analysis_of_Whole-Genome_Sequencing_Data_and_Related_Challenges_for_Pathogen_Typing_in_a_European_National_Reference_Center_Neisseria_meningitidis_as_a_Proof-of-Concept_pdf/7807052
Description:
Despite being a well-established research method, the use of whole-genome sequencing (WGS) for routine molecular typing and pathogen characterization remains a substantial challenge due to the required bioinformatics resources and/or expertise. Moreover, many national reference laboratories and centers, as well as other laboratories working under a quality system, require extensive validation to demonstrate that the methods they employ are “fit-for-purpose” and provide high-quality results. However, a harmonized framework with guidelines for validating WGS workflows does not yet exist, despite several recent case studies highlighting the urgent need for one. We present a validation strategy focusing specifically on the exhaustive characterization of the bioinformatics analysis of a WGS workflow designed to replace conventionally employed molecular typing methods for microbial isolates in a representative small-scale laboratory, using the pathogen Neisseria meningitidis as a proof-of-concept. We adapted several classically employed performance metrics to three different bioinformatics assays: resistance gene characterization (based on the ARG-ANNOT, ResFinder, CARD, and NDARO databases), several commonly employed typing schemas (including, among others, core genome multilocus sequence typing), and serogroup determination. We analyzed a core validation dataset of 67 well-characterized samples, typed by classical genotypic and/or phenotypic methods and sequenced in-house, which allowed us to evaluate the repeatability, reproducibility, accuracy, precision, sensitivity, and specificity of the different bioinformatics assays. We also analyzed an extended validation dataset composed of publicly available WGS data for 64 samples by comparing the results of the different bioinformatics assays against results obtained from commonly used bioinformatics tools.
We demonstrate high performance, with all performance metrics exceeding 87%, 97%, and 90% for the resistance gene characterization, sequence typing, and serogroup determination assays, respectively, for both validation datasets. Our WGS workflow has been made publicly available as a “push-button” pipeline for Illumina data at https://galaxy.sciensano.be to showcase its implementation for non-profit and/or academic usage. Our validation strategy can be adapted to other WGS workflows for other pathogens of interest, and it demonstrates the added value and feasibility of integrating WGS into routine use in an applied public health setting.
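As a rough illustration of four of the metrics named above, the sketch below computes accuracy, precision, sensitivity, and specificity from confusion-matrix counts using their standard textbook definitions. This is an assumption for illustration only: the description says the authors adapted these metrics to each bioinformatics assay, and those adapted definitions are not given here. The `Confusion` class, the `metrics` function, and the counts in the example are all hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical confusion-matrix counts for one assay."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def metrics(c: Confusion) -> dict[str, float]:
    """Standard textbook definitions (assumed, not the paper's adapted ones)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    example = Confusion(tp=45, fp=5, tn=40, fn=10)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.1%}")
```

Repeatability and reproducibility are not covered by this sketch, since they compare results across repeated runs rather than against a reference classification.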
Created:
2019-03-06



