Generative AI-Empowered Screening Strategy for Chemical Pollutants: A Case on Per- and Polyfluoroalkyl Substances
Source: Figshare. Updated: 2025-11-12. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Generative_AI-Empowered_Screening_Strategy_for_Chemical_Pollutants_A_Case_on_Per-_and_Polyfluoroalkyl_Substances/30604142
Description:
Identifying unknown chemical pollutants is essential for effective risk management. However, current analytical techniques are restricted to structures covered by existing databases, capturing only the tip of the iceberg in the vast pollutant chemical space. Generative artificial intelligence (AI) holds promise for exploring this space and revealing the structures of unknown pollutants. This study proposes a chemical language model (CLM)-assisted screening strategy and applies it to identify unknown perfluoroalkyl and polyfluoroalkyl substances (PFASs). Using CLMs, over 1.4 million PFAS structures were generated, expanding the PFAS chemical space by 21.6%. The generated structures were compiled into a suspect list for screening. This strategy achieved a Top-1 annotation accuracy of 87% on spiked samples. When applied to a fluorochemical wastewater influent sample, it successfully enabled the annotation of 88 previously overlooked PFAS features. To further demonstrate its utility, the generated suspect list was integrated with an existing computational tool for analyzing the effluent sample, leading to the annotation of 100 new PFAS features. These findings underscore the substantial potential of generative AI for elucidating the structures of unknown pollutants, providing a new approach for high-throughput pollutant annotation.
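The suspect-list screening and Top-1 accuracy mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: it matches features by accurate mass only, assumes negative-mode [M-H]- ions and a 5 ppm tolerance, and uses two well-known PFASs (PFOA and PFOS) as a stand-in suspect list. The names `Suspect`, `rank_candidates`, and `top1_accuracy` are hypothetical, and a real workflow would normally also use MS/MS fragments and retention behavior to rank candidates.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da; assumed adduct is [M-H]- (negative mode)


@dataclass(frozen=True)
class Suspect:
    name: str
    monoisotopic_mass: float  # neutral monoisotopic mass in Da


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def rank_candidates(observed_mz, suspects, tol_ppm=5.0):
    """Return names of suspects within tol_ppm of the feature, closest first."""
    hits = []
    for s in suspects:
        theoretical_mz = s.monoisotopic_mass - PROTON_MASS
        err = abs(ppm_error(observed_mz, theoretical_mz))
        if err <= tol_ppm:
            hits.append((err, s.name))
    return [name for _, name in sorted(hits)]


def top1_accuracy(ranked_lists, truths):
    """Fraction of features whose top-ranked candidate equals the known identity."""
    correct = sum(1 for ranked, truth in zip(ranked_lists, truths)
                  if ranked and ranked[0] == truth)
    return correct / len(truths)


# Illustrative stand-in for a suspect list (masses computed from the formulas).
suspects = [
    Suspect("PFOA", 413.973702),  # C8HF15O2
    Suspect("PFOS", 499.937494),  # C8HF17O3S
]

# Hypothetical spiked-sample features (m/z) with known identities.
features = [412.9664, 498.9302, 300.0000]
truths = ["PFOA", "PFOS", "PFBA"]

ranked = [rank_candidates(mz, suspects) for mz in features]
accuracy = top1_accuracy(ranked, truths)
```

In this toy run the third feature has no candidate within tolerance, so it counts as a miss when Top-1 accuracy is computed over the spiked features.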
Created:
2025-11-12



