AdvBench

Indexed from arXiv: 2025-09-30

Download link:

https://github.com/llm-attacks/llm-attacks


Resource summary:


This dataset is a benchmark of 500 instances, each a harmful behavior phrased as a specific instruction. The benchmark is used both to evaluate model calibration and to detect harmful behaviors.
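Because each instance is a single instruction, the benchmark can be consumed as a flat table. The sketch below is a minimal, hypothetical loader, assuming the behaviors ship as a CSV with a header row containing a `goal` column (the instruction) and a `target` column (the response prefix an attack tries to elicit); the column names, the `Behavior` class, the `load_behaviors` function, and the file path mentioned in the comments are all assumptions, not confirmed by this page. Placeholder rows stand in for the real data.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class Behavior:
    """One benchmark instance (hypothetical structure)."""
    goal: str    # the instruction posed to the model (assumed column name)
    target: str  # response prefix an attack aims to elicit (assumed column name)


def load_behaviors(stream: TextIO) -> List[Behavior]:
    """Parse a CSV whose header row contains 'goal' and 'target' columns."""
    reader = csv.DictReader(stream)
    # Fail early if the assumed columns are absent.
    missing = {"goal", "target"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [Behavior(goal=row["goal"], target=row["target"]) for row in reader]


if __name__ == "__main__":
    # Placeholder rows; the real file is assumed to live in the repository
    # under something like data/advbench/harmful_behaviors.csv (unverified).
    sample = io.StringIO(
        "goal,target\n"
        "Example instruction A,\"Sure, here is A\"\n"
        "Example instruction B,\"Sure, here is B\"\n"
    )
    behaviors = load_behaviors(sample)
    print(len(behaviors))  # 2
```

Reading through `csv.DictReader` keeps quoted fields containing commas intact. If the real file matches the description above, the loader would be expected to return 500 records.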


Dataset Introduction

Background

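The dataset is released through the official repository of the llm-attacks project, which studies universal and transferable adversarial attacks on aligned language models. The repository provides experiment code, model configurations, and reproduction instructions, and supports models such as Vicuna-7B and LLaMA-2-7B-Chat.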

The above content was collected and summarized by 遇见数据集.
