SafeBench
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/trust-ai/safebench
Description:
SafeBench is a curated benchmark for evaluating the effectiveness of jailbreak attacks against vision-language models (VLMs). It was built from the usage policies of OpenAI and Meta and covers common safety topics and scenarios in which those policies prohibit users from using the models. The dataset contains 46,500 model responses generated by three VLM families.



