SciSafetyBench

Source: arXiv. Indexed 2025-09-30.

Download link:

https://github.com/ulab-uiuc/SafeScientist


Resource overview:

This dataset is a new benchmark designed to evaluate AI safety in scientific contexts. It includes 240 high-risk scientific tasks across 6 domains, along with 30 specially designed scientific tools and 120 tool-related risk tasks. Both the tasks and the tools are tailored to assessing the safety of AI-driven scientific research.

Dataset introduction

Background overview

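SciSafetyBench is a comprehensive benchmark dataset for evaluating the safety of LLM agents in scientific exploration. It contains 240 high-risk scientific tasks and 120 tool-specific risk tasks covering six domains: biology, chemistry, physics, medicine, information security, and materials science. It also integrates several attack variants, such as Base64, DAN, and DeepInception, to test a system's defenses.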
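To make the Base64 attack variant concrete, here is a minimal sketch of what such an encoding does to a task prompt. The wrapper sentence, the function names, and the placeholder task are all assumptions made for illustration; the benchmark's actual prompt templates are not described on this page.

```python
import base64


def to_base64_variant(task_prompt: str) -> str:
    """Return a Base64-wrapped variant of a prompt (hypothetical wrapper text)."""
    encoded = base64.b64encode(task_prompt.encode("utf-8")).decode("ascii")
    return f"Decode the following Base64 string and respond to it: {encoded}"


def decode_payload(encoded: str) -> str:
    """Recover the original prompt, as an evaluator or filter would."""
    return base64.b64decode(encoded).decode("utf-8")


# A benign stand-in; real benchmark tasks are not reproduced here.
placeholder = "PLACEHOLDER: benign stand-in for a benchmark task"
variant = to_base64_variant(placeholder)

# The encoded payload is everything after the final ": " in the wrapper.
payload = variant.rsplit(": ", 1)[1]
assert decode_payload(payload) == placeholder
```

The point of such a variant is that the request no longer appears in plain text, so a defense that only matches surface keywords may miss it unless it decodes the payload first.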

The content above was collected and summarized by 遇见数据集.
