Systematic tracing of nitrogen sources in complex river catchments: Machine learning approach based on microbial metagenomics
NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1063879
Resource description:
Tracking nitrogen pollution sources is crucial for better management of water quality. This is an essential yet difficult task because of the complex contamination scenarios in freshwater systems. However, variations in these contamination patterns can induce rapid responses in aquatic microbes, which may serve as sensitive indicators of pollution origins. Here, we employed the Soil and Water Assessment Tool (SWAT), accompanied by a detailed pollution source database, to identify the main nitrogen pollution patterns in the watershed. We then constructed random forest models to predict the main pollution sources in three river ecosystems under distinct pollution disturbances: point source pollution-dominated areas (PA), crop cultivation pollution-dominated areas (CA), and septic tank pollution-dominated areas (SA). The models were built, respectively, on natural conditions, river physicochemical properties, 16S rRNA microbial taxonomic composition, microbial metagenomic data containing taxonomic and functional information, and their combination. Using metagenomic indices as inputs gave markedly better prediction of water nitrogen pollution sources than the other data sets. Among the metagenomic data-based models, combining the taxonomic information with the functional information of all the species achieved the highest accuracy (0.84) and an increased median kappa coefficient (0.70). Feature importance analysis suggested that the bacteria Rhabdochromatium marinum, Frankia, Actinomycetia, and Competibacteraceae were the most important taxa, although their relative abundances ranged only from 0.00042% to 0.1%. Among the top 30 important variables, functional variables made up more than half, demonstrating the remarkable variation in microbial functions among sites with distinct pollution sources and the key role of functionality in predicting pollution sources. Intriguingly, many functional indicators related to the metabolism of Mycobacterium tuberculosis, such as K25621, K19794, and K18958, emerged as significantly important factors for distinguishing nitrogen pollution origins. Given the shortage of pollution source statistics in developing regions, the proposed method provides an economical, rapid, and reliable way to identify water nitrogen pollution sources based on metagenomic data of microbial communities.
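The description reports model quality as accuracy and a kappa coefficient. As a minimal sketch of how these two metrics relate for a three-class problem (PA, CA, SA), the following Python computes both from scratch, assuming the kappa reported is Cohen's kappa. The function name `accuracy_and_kappa` and the label lists are hypothetical and purely illustrative; they are not data from the study.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction matches truth.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement expected from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Degenerate case (a single class everywhere); returning 1.0 is a
        # convention chosen for this sketch only.
        return p_observed, 1.0
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Hypothetical site labels, NOT taken from the dataset.
y_true = ["PA", "PA", "PA", "CA", "CA", "CA", "SA", "SA", "SA", "SA"]
y_pred = ["PA", "PA", "CA", "CA", "CA", "SA", "SA", "SA", "SA", "PA"]
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.70, kappa=0.55
```

Kappa discounts the agreement a classifier would reach by chance given the class proportions, which is why it sits below accuracy both in this toy example and in the values reported above (0.70 versus 0.84).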
Created:
2024-01-12



