AttaQ
Source: arXiv · Indexed: 2025-09-30
Download link:
https://huggingface.co/datasets/ibm/AttaQ
Description:
This dataset is a collection of adversarial examples, phrased as questions, designed to provoke harmful or inappropriate responses from large language models (LLMs). It also includes adversarial questions synthesized from human-curated examples and generated with several methods, so that the attacks hold up against model defenses. The dataset is intended for evaluating how vulnerable LLMs are to adversarial attacks.
Provider:
Anthropic



