
Leveraging Large Language Models for Contextual Prioritization of Contaminants of Emerging Concern in Chemical Mixtures

Indexed in NIAID Data Ecosystem: 2026-05-10
Download link:
https://figshare.com/articles/dataset/Leveraging_Large_Language_Models_for_Contextual_Prioritization_of_Contaminants_of_Emerging_Concern_in_Chemical_Mixtures/31971918
Description:
Effective management of chemical mixtures remains challenging because of the growing diversity and inadequate characterization of contaminants of emerging concern (CECs). Although recent advances in nontarget analysis can generate extensive chemical inventories, the key bottleneck has shifted to postidentification interpretation within heterogeneous data. Here, we present an agent-based workflow that integrates large language models (LLMs) with information on functional categories, potential sources, and toxicology to support risk prioritization. We established practical technical components and evaluation benchmarks for LLMs, showing that optimized prompts combined with the best-performing of seven candidate models (GPT-4-Turbo) achieved perfect alignment with the user-specified context. Integrating real-world data through retrieval-augmented generation (RAG) yielded 100% truthful retrieved content, and further fine-tuning nearly doubled response consistency, substantially reducing hallucination. The workflow was validated on two mixture scenarios to assess its applicability across matrices and chemical contexts. By querying the NORMAN Network, the agent achieved complete functional and source annotation of the chemicals, and by emulating NORMAN-aligned logic it reached ∼85% accuracy for substances absent from existing databases. This capability enabled mixture-level interpretation of the chemical inventory, revealing dominant categories and industrial sources, such as lubricants in shale gas flowback produced water and semiconductor-related industrial intermediates, which contributed to elevated risks in the studied scenarios.
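The abstract does not say how per-chemical annotations are rolled up into a mixture-level view. What follows is a minimal sketch under stated assumptions. It assumes each annotated chemical carries one functional category, one potential source, and a numeric risk score. The field names, the example records, their scores, and the `summarize_by` helper are all hypothetical and are not values or code from the study. Summing scores within each group is one simple additive choice and not necessarily the authors' method.

```python
from collections import defaultdict

# Hypothetical annotated inventory. Names, labels, and risk scores are
# illustrative only; they are not taken from the dataset.
inventory = [
    {"chemical": "chem_A", "category": "lubricant", "source": "oil and gas operations", "risk": 0.8},
    {"chemical": "chem_B", "category": "lubricant", "source": "oil and gas operations", "risk": 0.6},
    {"chemical": "chem_C", "category": "surfactant", "source": "domestic use", "risk": 0.3},
    {"chemical": "chem_D", "category": "industrial intermediate", "source": "semiconductor industry", "risk": 0.9},
    {"chemical": "chem_E", "category": "plasticizer", "source": "plastics", "risk": 0.2},
]


def summarize_by(records, key):
    """Group records by `key`, count chemicals and sum risk scores per group,
    and return the groups ranked by total risk (highest first).
    Additive aggregation is an assumption made for illustration."""
    totals = defaultdict(lambda: {"count": 0, "risk": 0.0})
    for rec in records:
        group = totals[rec[key]]
        group["count"] += 1
        group["risk"] += rec["risk"]
    return sorted(totals.items(), key=lambda item: item[1]["risk"], reverse=True)


if __name__ == "__main__":
    # Category-level and source-level views of the same hypothetical mixture.
    for label, stats in summarize_by(inventory, "category"):
        print(f"{label}: {stats['count']} chemicals, total risk {stats['risk']:.2f}")
    for label, stats in summarize_by(inventory, "source"):
        print(f"{label}: {stats['count']} chemicals, total risk {stats['risk']:.2f}")
```

Grouping by `"category"` surfaces the dominant functional categories. Grouping by `"source"` gives the analogous view of potential industrial sources. A weighted or toxicity-normalized score could replace the plain sum if the underlying risk metric calls for it.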
Created: 2026-04-09