Harmful Behaviours Dataset

Name: Harmful Behaviours Dataset
Creator: Open-source dataset from PARDEN project
License: No description available

Indexed from arXiv: 2025-09-30

Download link:

https://github.com/Ed-Zh/PARDEN

Description:

This dataset contains harmful-behaviour data collected by adversarially attacking large language models (LLMs), with a particular focus on jailbreak instances. Each record is a 4-tuple (instruction, output, repetition count, label), where the label indicates whether the instance is harmful. The dataset includes 484 real jailbreak instances targeting Llama 2 and 539 targeting Claude 2. It supports jailbreak detection and evaluation.
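The 4-tuple layout described above could be modelled roughly as follows. This is a minimal sketch, not the PARDEN project's own code: the class name, field names, field types, and the `detection_rates` helper are all hypothetical, the sample rows are placeholders rather than real dataset entries, and the on-disk file format in the repository is not assumed here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class HarmfulBehaviourRecord:
    """One 4-tuple from the dataset (field names and types are assumptions)."""
    instruction: str       # prompt sent to the model
    output: str            # model response to that prompt
    repetition_count: int  # third tuple element, as named in the description
    label: bool            # True if the instance is marked harmful


def detection_rates(
    records: Iterable[HarmfulBehaviourRecord],
    predictions: Iterable[bool],
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for a detector.

    predictions[i] is True when the detector flags records[i] as harmful.
    """
    true_pos = false_pos = harmful = benign = 0
    for record, flagged in zip(records, predictions):
        if record.label:
            harmful += 1
            true_pos += int(flagged)
        else:
            benign += 1
            false_pos += int(flagged)
    tpr = true_pos / harmful if harmful else 0.0
    fpr = false_pos / benign if benign else 0.0
    return tpr, fpr


# Placeholder rows for illustration only; not taken from the dataset.
sample_records = [
    HarmfulBehaviourRecord("<instruction 1>", "<output 1>", 1, True),
    HarmfulBehaviourRecord("<instruction 2>", "<output 2>", 1, True),
    HarmfulBehaviourRecord("<instruction 3>", "<output 3>", 1, False),
    HarmfulBehaviourRecord("<instruction 4>", "<output 4>", 1, False),
]
```

Comparing a detector's flags against the `label` field in this way is one plausible reading of the "jailbreak detection and evaluation" task; the actual evaluation protocol is defined by the project itself.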

Provided by:

Open-source dataset from PARDEN project
