Generative AI-Empowered Screening Strategy for Chemical Pollutants: A Case on Per- and Polyfluoroalkyl Substances
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Generative_AI-Empowered_Screening_Strategy_for_Chemical_Pollutants_A_Case_on_Per-_and_Polyfluoroalkyl_Substances/30604133
Description:
Identifying unknown chemical pollutants is essential for effective
risk management. However, current analytical techniques are restricted
to structures covered by existing databases, capturing only the tip
of the iceberg in the vast pollutant chemical space. Generative artificial
intelligence (AI) holds promise for exploring this space and revealing
the structures of unknown pollutants. This study proposes a chemical
language model (CLM)-assisted screening strategy and applies it to
identify unknown perfluoroalkyl and polyfluoroalkyl substances (PFASs).
Using CLMs, over 1.4 million PFAS structures were generated, expanding
the PFAS chemical space by 21.6%. The generated structures were compiled
into a suspect list for screening. This strategy achieved a Top-1
annotation accuracy of 87% on spiked samples. When applied to a fluorochemical
wastewater influent sample, it enabled the annotation
of 88 previously overlooked PFAS features. To further demonstrate
its utility, the generated suspect list was integrated with an existing
computational tool to analyze an effluent sample, leading to the
annotation of 100 new PFAS features. These findings underscore the
substantial potential of generative AI for elucidating the structures
of unknown pollutants, providing a new approach for high-throughput
pollutant annotation.
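
The description mentions compiling generated structures into a suspect list for screening. As a rough illustration only, the sketch below shows the generic first step of suspect screening: matching a measured m/z against the theoretical [M-H]- m/z of each suspect within a ppm tolerance. It is not the study's implementation. The function names, the 5 ppm tolerance, and the two-entry suspect list are assumptions made for this example, and a real workflow would typically add further evidence such as fragment spectra before ranking candidates.

```python
# Monoisotopic masses (u) of the elements needed for this example.
MONO = {"C": 12.0, "H": 1.00782503, "F": 18.99840316, "O": 15.99491462}
PROTON = 1.00727646  # proton mass, subtracted to form the [M-H]- ion

def neutral_mass(counts):
    """Monoisotopic mass of a neutral molecule given element counts."""
    return sum(MONO[element] * n for element, n in counts.items())

def match_suspects(mz, suspects, tol_ppm=5.0):
    """Return (name, ppm_error) for suspects whose [M-H]- m/z is within
    tol_ppm of the measured m/z, sorted by absolute ppm error."""
    hits = []
    for name, counts in suspects.items():
        theoretical = neutral_mass(counts) - PROTON
        error_ppm = (mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append((name, error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))

# Hypothetical two-entry suspect list (a generated list would hold far more).
suspects = {
    "PFOA": {"C": 8, "H": 1, "F": 15, "O": 2},
    "PFHxA": {"C": 6, "H": 1, "F": 11, "O": 2},
}

hits = match_suspects(412.9664, suspects)
for name, error_ppm in hits:
    print(f"{name}: {error_ppm:+.2f} ppm")
```

Sorting by absolute mass error is only one possible ranking; with a suspect list of over a million structures, many candidates would share a formula, so mass alone would not separate them.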
Created:
2025-11-12



