SafeBench

arXiv, indexed 2025-09-30

Download link:

https://github.com/trust-ai/safebench

Resource description:

This dataset is a benchmark curated according to the usage policies of OpenAI and Meta for evaluating the effectiveness of jailbreak attacks against vision-language models (VLMs). It covers common safety topics and scenarios in which users are prohibited from using these models. The dataset comprises 46,500 model responses generated by three VLM families.
