Data Sheet 1_Enhancing pathogen identification through AI-assisted metagenomic sequencing.pdf
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_Enhancing_pathogen_identification_through_AI-assisted_metagenomic_sequencing_pdf/30166885
Resource description:
Introduction: To address the limitations of current metagenomic identification approaches, we proposed a principled AI-assisted architecture that enhances accuracy, scalability, and biological interpretability through three core innovations.
Methods: First, we developed a structured probabilistic model that formulates pathogen detection as a hierarchical, compositional inference task under taxonomic and ecological constraints. This framework integrates phylogenetic priors and sparsity-aware mechanisms, reducing noise and ambiguity. By modeling taxonomic structure and ecological dependencies, the approach supports more accurate identification, especially in complex or low-abundance microbial communities. Second, we introduced the Taxon-aware Compositional Inference Network (TCINet), a deep learning model that processes sequencing reads to produce taxonomic embeddings. TCINet estimates abundance distributions via masked neural activations that enforce sparsity and interpretability, and it propagates uncertainty through log-normal variance modeling. Designed to respect microbial phylogeny and co-occurrence patterns, TCINet enables scalable, biologically plausible inference across diverse clinical and environmental datasets. Third, we presented the Hierarchical Taxonomic Reasoning Strategy (HTRS), a post-inference module that refines predictions by enforcing compositional constraints, propagating evidence across taxonomic hierarchies, and calibrating confidence using entropy- and variance-based metrics. HTRS includes context-aware thresholding and co-occurrence priors to adapt performance to dataset characteristics.
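The abstract does not give implementation details for TCINet or HTRS. The following is only a minimal Python sketch of three ideas named in the Methods: sparsity via thresholding, compositional renormalization, and propagating evidence up a taxonomic hierarchy with an entropy-based confidence signal. The toy taxonomy, the function names (`sparse_composition`, `propagate_up`, `normalized_entropy`), the scores, and the threshold value are all assumptions made for illustration and are not taken from the dataset or the underlying article.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not from the source.
PARENT = {
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "S. enterica": "Salmonella",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
}


def sparse_composition(scores, threshold):
    """Drop taxa scoring below `threshold`, then renormalize to sum to 1.

    A simple stand-in for a sparsity mask followed by a compositional
    (sum-to-one) constraint.
    """
    kept = {taxon: s for taxon, s in scores.items() if s >= threshold}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no taxon passes the threshold")
    return {taxon: s / total for taxon, s in kept.items()}


def propagate_up(leaf_abundance, parent):
    """Add each leaf's abundance to all of its ancestors."""
    totals = dict(leaf_abundance)
    for taxon, value in leaf_abundance.items():
        node = taxon
        while node in parent:
            node = parent[node]
            totals[node] = totals.get(node, 0.0) + value
    return totals


def normalized_entropy(composition):
    """Shannon entropy divided by log(n), where n counts nonzero entries.

    Values near 0 mean one taxon dominates; values near 1 mean the
    composition is close to uniform (a more ambiguous call).
    """
    probs = [p for p in composition.values() if p > 0]
    if len(probs) <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))


if __name__ == "__main__":
    # Made-up raw scores for illustration only.
    raw_scores = {"E. coli": 0.60, "E. albertii": 0.02, "S. enterica": 0.38}
    composition = sparse_composition(raw_scores, threshold=0.05)
    print(composition)
    print(propagate_up(composition, PARENT))
    print(normalized_entropy(composition))
```

In this sketch a fixed threshold plays the role of the context-aware thresholding described above; an adaptive version would choose the threshold per dataset, which the abstract mentions but does not specify.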
Results: Together, these innovations create a unified framework for metagenomic identification that combines probabilistic modeling, deep learning, and structured reasoning.
Discussion: The architecture delivers robust and interpretable results, making it suitable for applications in clinical diagnostics, environmental monitoring, and ecological research.
Created:
2025-09-19



