
Generative AI-Empowered Screening Strategy for Chemical Pollutants: A Case on Per- and Polyfluoroalkyl Substances

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Generative_AI-Empowered_Screening_Strategy_for_Chemical_Pollutants_A_Case_on_Per-_and_Polyfluoroalkyl_Substances/30604133
Description:
Identifying unknown chemical pollutants is essential for effective risk management. However, current analytical techniques are restricted to structures covered by existing databases, capturing only the tip of the iceberg in the vast pollutant chemical space. Generative artificial intelligence (AI) holds promise for exploring this space and revealing the structures of unknown pollutants. This study proposes a chemical language model (CLM)-assisted screening strategy and applies it to identify unknown perfluoroalkyl and polyfluoroalkyl substances (PFASs). Using CLMs, over 1.4 million PFAS structures were generated, expanding the PFAS chemical space by 21.6%. The generated structures were compiled into a suspect list for screening. This strategy achieved a Top-1 annotation accuracy of 87% on spiked samples. When applied to a fluorochemical wastewater influent sample, it successfully enabled the annotation of 88 previously overlooked PFAS features. To further demonstrate its utility, the generated suspect list was integrated with an existing computational tool for analyzing the effluent sample, leading to the annotation of 100 new PFAS features. These findings underscore the substantial potential of generative AI for elucidating the structures of unknown pollutants, providing a new approach for high-throughput pollutant annotation.
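The suspect-screening step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that measured features are matched to suspect-list entries purely by m/z within a ppm tolerance and ranked by absolute mass error, whereas the study may also use fragmentation or other evidence. The names `Suspect`, `screen`, and `top1_accuracy`, and the 5 ppm default, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    """One suspect-list entry (hypothetical structure for this sketch)."""
    name: str
    mz: float  # theoretical m/z of the expected ion, e.g. [M-H]-


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def screen(feature_mzs, suspects, tol_ppm=5.0):
    """Map each measured m/z to the suspects within tol_ppm, best match first.

    Assumption: ranking uses absolute mass error only.
    """
    hits = {}
    for mz in feature_mzs:
        candidates = [s for s in suspects if abs(ppm_error(mz, s.mz)) <= tol_ppm]
        candidates.sort(key=lambda s: abs(ppm_error(mz, s.mz)))
        hits[mz] = candidates
    return hits


def top1_accuracy(hits, truth):
    """Fraction of features whose top-ranked suspect equals the known identity.

    truth maps a measured m/z to the name of the spiked compound.
    """
    correct = sum(
        1
        for mz, name in truth.items()
        if hits.get(mz) and hits[mz][0].name == name
    )
    return correct / len(truth)
```

As a usage example, a suspect list containing PFOA and PFOS (approximate [M-H]- m/z of 412.9664 and 498.9302) could be screened against measured features such as 412.9660 and 498.9310; both fall within 5 ppm of their suspects, while an unrelated feature returns an empty candidate list.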
Created: 2025-11-12