Harmful Behaviours Dataset
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/Ed-Zh/PARDEN
Description:
This dataset contains harmful-behaviour data collected through adversarial attacks on large language models (LLMs), with a particular focus on jailbreak instances. Each record is a 4-tuple (instruction, output, repetition count, label), where the label indicates whether the instance is harmful. The dataset includes 484 real jailbreak instances targeting Llama 2 and 539 targeting Claude 2. It is intended for jailbreak detection and evaluation.
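To make the 4-tuple layout concrete, here is a minimal Python sketch of one way such records could be represented and summarised. The field names, the boolean encoding of the label (True meaning harmful), and the placeholder rows are all assumptions for illustration. The description above does not specify the on-disk format used in the PARDEN repository, nor the exact meaning of the repetition count, so this is not the project's own loader.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class JailbreakRecord:
    """One (instruction, output, repetition count, label) tuple.

    Field names are hypothetical; only the tuple order comes from the description.
    """
    instruction: str
    output: str
    repetition_count: int
    label: bool  # assumed encoding: True = harmful, False = benign


def from_tuple(row: Tuple[str, str, int, bool]) -> JailbreakRecord:
    """Convert a raw 4-tuple into a typed record."""
    instruction, output, repetition_count, label = row
    return JailbreakRecord(instruction, output, int(repetition_count), bool(label))


def harmful_fraction(records: Iterable[JailbreakRecord]) -> float:
    """Return the share of records labelled harmful (0.0 for an empty input)."""
    items = list(records)
    if not items:
        return 0.0
    return sum(1 for r in items if r.label) / len(items)


# Placeholder rows only; these are not entries from the actual dataset.
rows = [
    ("<instruction A>", "<model output A>", 1, True),
    ("<instruction B>", "<model output B>", 2, False),
]
records = [from_tuple(r) for r in rows]
fraction = harmful_fraction(records)
```

A frozen dataclass keeps each record immutable, which suits evaluation data that should not change after loading. A real loader would need to be adapted to whatever file format the repository actually ships.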
Provider:
Open-source dataset from PARDEN project



