FT-GNN Tool for Bridging HRMS Features and Bioactivity: Uncovering Unidentified Estrogen Receptor Agonists in Sewage

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/FT-GNN_Tool_for_Bridging_HRMS_Features_and_Bioactivity_Uncovering_Unidentified_Estrogen_Receptor_Agonists_in_Sewage/28759236
Description:
Identifying primary estrogen receptor (ER) agonists in municipal sewage is essential for ensuring the health of aquatic environments. Given the complex and variable chemical composition of sewage, the predominant ER agonists remain unclear. High-resolution mass spectrometry (HRMS)-based models have been developed to predict compound bioactivity in complex matrices, but further optimization is needed to effectively bridge HRMS features with ER agonists. To address this challenge, an FT-GNN (fragmentation tree-based graph neural network) model was proposed. Given limited data and class imbalance, data augmentation was performed using model predictions within the applicability domain (AD) and oversampling technique (OTE). Model development results demonstrated that integrating the FT-GNN with data augmentation improved the balanced accuracy (bACC) value by 6%–31%. The developed model, with a high bACC to identify more true ER agonists, efficiently classified tens of thousands of unidentified HRMS features in sewage, reducing postprocessing workload in nontargeted screening. Analysis of ER agonist transformation during sewage treatment revealed the anaerobic stage as key to both their removal and formation. Estrogenic effect balance analysis suggests that α-E2 and 9,11-didehydroestriol may be two previously overlooked key ER agonists. Collectively, the development and application of the FT-GNN model are crucial advancements toward credible tracking and efficient control of estrogenic risks in water.
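The description reports model performance as balanced accuracy (bACC), which matters here because ER agonists are the minority class. As a minimal sketch of the standard binary definition (the mean of sensitivity and specificity), and not the authors' evaluation code, the hypothetical helper `balanced_accuracy` below shows why plain accuracy can mislead on imbalanced data. The labels are made up for illustration, with 1 meaning active and 0 meaning inactive.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of true actives recovered
    specificity = tn / (tn + fp)  # fraction of true inactives recovered
    return (sensitivity + specificity) / 2


# Illustrative, made-up labels: 2 actives among 10 features.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_inactive = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, always_inactive)) / len(y_true)
print(plain_accuracy)                              # 0.8
print(balanced_accuracy(y_true, always_inactive))  # 0.5
```

A classifier that never predicts an agonist scores 0.8 on plain accuracy in this toy case but only 0.5 on bACC, the same as random guessing.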

Created:
2025-04-09