AttaQ

Name: AttaQ
Creator: Anthropic
License: Not specified

Source: arXiv (indexed 2025-09-30)

Download link:

https://huggingface.co/datasets/ibm/AttaQ
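
A minimal sketch of fetching the data from the link above, assuming the Hugging Face `datasets` library is installed and the repository is publicly readable. The helper names `repo_id_from_url` and `load_attaq` are hypothetical, introduced here only for illustration; the split and column names are not stated on this page, so the sketch does not assume any.

```python
from urllib.parse import urlparse

# Download link as listed on this page.
DATASET_URL = "https://huggingface.co/datasets/ibm/AttaQ"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repository id from a Hugging Face dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def load_attaq():
    """Download the dataset (hypothetical helper; needs network access and `pip install datasets`)."""
    # Imported lazily so repo_id_from_url works without the optional dependency.
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(DATASET_URL))
```

Calling `load_attaq()` and printing the result lists whatever splits and columns the repository actually provides, which is a reasonable first check before writing any evaluation code against it.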


Description:

AttaQ is a corpus of adversarial examples phrased as questions, each designed to induce a large language model (LLM) to produce a harmful or inappropriate response. In addition to human-curated questions, it includes adversarial questions synthesized from those human-curated examples using several generation methods, so that the attacks remain robust against model defenses. The dataset is intended for evaluating how vulnerable LLMs are to adversarial attacks.

Provider:

Anthropic
