Gandalf Attack Dataset

Name: Gandalf Attack Dataset
Creator: Gandalf platform
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/lakeraai/dsec-gandalf

Description:

This dataset contains 279,000 prompt attacks collected through a crowdsourced, gamified red-teaming platform designed to evaluate defenses against prompt attacks in large language model (LLM) applications. It is supplemented with benign user data, so that the interplay between security and usability in LLM applications can be analyzed. Its intended task is evaluating the security-utility trade-off of LLM defenses.
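To make the security-versus-usability trade-off concrete, here is a minimal sketch of how it might be measured once attack and benign prompts have been run against a defense. Everything in it is an assumption for illustration: the `Interaction` record, its field names, the `tradeoff` helper, and the toy records are hypothetical and do not reflect the dataset's actual schema or contents.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One hypothetical record; field names are assumptions, not the dataset's schema."""
    is_attack: bool  # True for an adversarial prompt, False for a benign one
    blocked: bool    # True if the defense refused or filtered the prompt


def tradeoff(records):
    """Return (attack_block_rate, benign_block_rate) for a list of Interaction records."""
    attacks = [r for r in records if r.is_attack]
    benign = [r for r in records if not r.is_attack]
    # A higher attack block rate means better security; a higher benign
    # block rate means more legitimate users are turned away.
    attack_block_rate = sum(r.blocked for r in attacks) / len(attacks) if attacks else 0.0
    benign_block_rate = sum(r.blocked for r in benign) / len(benign) if benign else 0.0
    return attack_block_rate, benign_block_rate


# Toy, made-up records purely for illustration (not drawn from the dataset).
sample = [
    Interaction(is_attack=True, blocked=True),
    Interaction(is_attack=True, blocked=True),
    Interaction(is_attack=True, blocked=False),
    Interaction(is_attack=True, blocked=True),
    Interaction(is_attack=False, blocked=False),
    Interaction(is_attack=False, blocked=True),
    Interaction(is_attack=False, blocked=False),
    Interaction(is_attack=False, blocked=False),
]

security, utility_cost = tradeoff(sample)
print(f"attacks blocked: {security:.2f}, benign prompts blocked: {utility_cost:.2f}")
# prints "attacks blocked: 0.75, benign prompts blocked: 0.25"
```

A stricter defense would typically push both rates up, which is why benign data is needed next to the attacks: the attack block rate alone says nothing about how many legitimate requests are refused.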

Provided by:

Gandalf platform
