
Systematic tracing of nitrogen sources in complex river catchments: Machine learning approach based on microbial metagenomics

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP482893
Description:
Tracking nitrogen pollution sources is crucial for better management of water quality. The task is essential yet difficult because contamination scenarios in freshwater systems are complex. However, variations in contamination patterns can induce rapid responses in aquatic microbes, which may serve as sensitive indicators of pollution origins. Here, we employed the Soil and Water Assessment Tool (SWAT), accompanied by a detailed pollution source database, to identify the main nitrogen pollution patterns in the watershed. Further, a random forest model was constructed to predict the main pollution sources in three river ecosystems with distinct pollution disturbances (i.e., point source pollution-dominated areas (PA), crop cultivation pollution-dominated areas (CA), and septic tank pollution-dominated areas (SA)) based on natural conditions, river physicochemical properties, 16S rRNA microbial taxonomic composition, microbial metagenomic data containing taxonomic and functional information, and their combination, respectively. Metagenomic indices as inputs predicted water nitrogen pollution sources markedly better than the other data sets. Among the metagenomic data-based models, the one combining taxonomic information with functional information for all species achieved the highest accuracy (0.84) and an increased median kappa coefficient (0.70). Feature importance analysis suggested that the bacteria Rhabdochromatium marinum, Frankia, Actinomycetia, and Competibacteraceae were the most important species, although their relative abundances ranged only from 0.00042% to 0.1%. Among the top 30 important variables, functional variables constituted more than half, demonstrating the remarkable variation in microbial functions among sites with distinct pollution sources and the key role of functionality in predicting pollution sources.
Intriguingly, many functional indicators related to the metabolism of Mycobacterium tuberculosis, such as K25621, K19794, and K18958, emerged as significantly important factors for distinguishing nitrogen pollution origins. Given the shortage of pollution source statistics in developing regions, the proposed method provides an economical, rapid, and reliable way to identify water nitrogen pollution sources based on metagenomic data from microbial communities.
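The description reports model quality as accuracy and a kappa coefficient. As a minimal sketch of how these two metrics relate for a three-class problem (PA, CA, SA), the example below computes accuracy and Cohen's kappa from label lists. The labels are hypothetical and for illustration only; they are not taken from the dataset, and the source does not state how its kappa values were computed.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of (true share * predicted share).
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Kappa is undefined when both label lists contain a single shared class.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for ten sites (illustration only, not real data).
y_true = ["PA", "PA", "PA", "PA", "CA", "CA", "CA", "SA", "SA", "SA"]
y_pred = ["PA", "PA", "PA", "CA", "CA", "CA", "SA", "SA", "SA", "PA"]

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", cohen_kappa(y_true, y_pred))
```

Kappa is lower than accuracy whenever some agreement would be expected by chance alone, which is why the two figures are usually reported together.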
Created:
2025-01-08