Gandalf Ignore Instruction

Name: Gandalf Ignore Instruction
Creator: Lakera AI
License: No description provided

Indexed from arXiv on 2025-09-30

Download link:

https://huggingface.co/datasets/Lakera/gandalf_ignore_instructions
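A minimal sketch of how the link above might be used programmatically, assuming the Hugging Face `datasets` library is installed and the repository is still publicly available. The helper names `repo_id_from_url` and `load_prompts` are hypothetical and introduced only for illustration; the split and column names of the dataset are not stated on this page, so the sketch does not assume any.

```python
from urllib.parse import urlparse

# URL copied from the listing above.
DATASET_URL = "https://huggingface.co/datasets/Lakera/gandalf_ignore_instructions"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repository id from a Hub dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /datasets/<owner>/<name>
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_prompts(url: str = DATASET_URL):
    """Download the dataset (assumes network access and `pip install datasets`)."""
    # Imported lazily because `datasets` is a third-party dependency.
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(url))
```

Calling `load_prompts()` should return the dataset object; printing it is a quick way to inspect which splits and columns are actually present before relying on them.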

Description:

This dataset contains prompts collected from an educational game designed to inform people about the leakage risks that large language models (LLMs) face under prompt attacks. The prompts use role-play to try to reveal the game's secret password. The dataset comprises 1,000 prompts, and its task is to bypass a model's alignment defenses.

Provided by:

Lakera AI
