Gandalf Attack Dataset
收藏arXiv2025-09-30 收录
下载链接:
https://github.com/lakeraai/dsec-gandalf
下载链接
链接失效反馈官方服务:
资源简介:
该数据集包含了27.9万个通过一种众包、游戏化的红队平台生成的提示攻击,该平台旨在评估大型语言模型应用中对抗提示攻击的防御能力。此外,该数据集补充了良性用户数据,以便分析大型语言模型应用中安全性与可用性之间的相互作用。其规模达到了279,000个提示攻击,任务是对大型语言模型防御中的安全性与实用性权衡进行评估。
This dataset encompasses 279,000 adversarial prompts generated through a crowdsourced, gamified red teaming platform designed to evaluate defensive measures against prompt attacks in large language model (LLM) applications. It is further supplemented with benign user data to support analysis of the trade-off between security and usability in LLM-powered applications. With its collection of 279,000 adversarial prompts, this dataset aims to assess the security-usability tradeoffs inherent in LLM defense systems.
提供机构:
Gandalf platform



