Predicting microbial community structure and dynamics in full-scale wastewater treatment plants by using graph neural network models
Source: NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP492741
Description:
Complex microbial communities are essential in biological wastewater treatment plants (WWTPs) to effectively remove, or even recycle, resources from human activities that would otherwise pollute the environment. The presence and abundance of process-critical microorganisms, such as nitrifiers, or of undesirably abundant bacteria that cause severe foaming and bulking problems, are important for the overall performance of the WWTP. The relative abundance of individual species varies greatly over time, and many species show no recurring abundance dynamics. Being able to predict future abundance dynamics can help WWTP operators prevent problems in time. Here we designed and tested a machine learning approach based on a graph neural network model that, for the first time, enables accurate prediction of future abundance dynamics in the microbial communities of activated sludge (AS). We trained and tested models individually on microbial community time series datasets from 10 Danish full-scale WWTPs, sampled 2-5 times a month over a period of 3-6 years, totalling 2896 environmental samples. To maximize prediction accuracy, we also tested the effect of pre-clustering multiple Amplicon Sequence Variants (ASVs) into groups before training a model on each group. A few different clustering methods were compared with grouping by known biological function, and predictions from the latter were the least accurate overall. When predicting 10 time points into the future (2-3 months, depending on sampling interval), the median Bray-Curtis dissimilarity of the predictions for the top 200 most abundant ASVs in each WWTP (52-62% of total DNA sequence reads) ranged from 0.11 to 0.14 for 8 of the WWTP datasets, indicating good overall prediction accuracy. In general, predictions were most accurate for the most abundant species and for known process-critical species, with a few exceptions. Lastly, the approach was implemented as a software pipeline suitable for any other longitudinal community dataset, not only those from the AS environment.
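For readers unfamiliar with the metric, the following is a minimal sketch of how a Bray-Curtis dissimilarity between an observed and a predicted abundance profile can be computed. It is not taken from the authors' pipeline: the function name `bray_curtis` and the example read counts are hypothetical, and the description does not state whether the authors applied any transformation to the abundances beforehand. A value of 0 means the two profiles are identical and 1 means they share no abundance at all.

```python
def bray_curtis(observed, predicted):
    """Bray-Curtis dissimilarity: sum|o_i - p_i| / sum(o_i + p_i).

    Hypothetical helper for illustration; inputs are non-negative
    abundances (counts or relative abundances), one entry per ASV.
    """
    if len(observed) != len(predicted):
        raise ValueError("profiles must cover the same ASVs")
    numerator = sum(abs(o - p) for o, p in zip(observed, predicted))
    denominator = sum(o + p for o, p in zip(observed, predicted))
    # Two all-zero profiles are treated as identical (an assumption of this sketch).
    return numerator / denominator if denominator else 0.0


# Made-up read counts for three ASVs at a single time point.
observed = [6, 3, 1]
predicted = [4, 4, 2]
print(bray_curtis(observed, predicted))  # 4 / 20 = 0.2
```

On this scale, the reported median values of 0.11 to 0.14 sit close to the "identical" end of the range.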
Created:
2025-03-03



