Anonymous-07/ChemSafetyBench
Source: Hugging Face | Updated: 2026-04-29 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/Anonymous-07/ChemSafetyBench
Description:
ChemSafetyBench is a regulatory-grounded benchmark dataset of 32,614 chemical substances for multi-label GHS (Globally Harmonized System) hazard prediction and LLM safety reliability evaluation. Unlike prior molecular benchmarks constructed by querying pharmaceutical databases, ChemSafetyBench is seeded from a curated hazardous materials registry, ensuring coverage of real-world industrial and safety-critical chemicals, including solvents, reactive gases, agrochemicals, and heavy metal compounds. Each substance is annotated with molecular structure representations (canonical SMILES, InChI, InChIKey), eight physicochemical descriptors, a free-text physical description, and 30 binary GHS hazard classification labels spanning toxicological, physical, and environmental hazard categories. The dataset supports two benchmark tasks: multi-label hazard prediction and benchmarking of LLM safety hallucinations.
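As a rough illustration of how the multi-label hazard prediction task might be scored, the sketch below computes per-label, macro, and micro F1 over binary label vectors. It is not the dataset's official evaluation protocol: the choice of F1 as the metric, the function names, and the toy 3-label vectors (standing in for the 30 GHS labels) are all assumptions made for illustration.

```python
from typing import List, Sequence

Labels = Sequence[Sequence[int]]  # one row per substance, one 0/1 entry per hazard label


def _counts(y_true: Labels, y_pred: Labels, j: int):
    """Return (tp, fp, fn) for label column j."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
    return tp, fp, fn


def f1_per_label(y_true: Labels, y_pred: Labels) -> List[float]:
    """F1 for each label column; 0.0 when a label has no positives at all."""
    scores = []
    for j in range(len(y_true[0])):
        tp, fp, fn = _counts(y_true, y_pred, j)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1, so rare hazard classes count equally."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 over counts pooled across all labels, dominated by frequent classes."""
    tp = fp = fn = 0
    for j in range(len(y_true[0])):
        a, b, c = _counts(y_true, y_pred, j)
        tp, fp, fn = tp + a, fp + b, fn + c
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 3 substances, 3 labels instead of the real 30.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(f1_per_label(y_true, y_pred))
print(macro_f1(y_true, y_pred), micro_f1(y_true, y_pred))
```

Reporting both averages can be useful for hazard data, since macro F1 exposes weak performance on rare hazard classes that micro F1 tends to hide.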
Provided by:
Anonymous-07



