xstest-response

Name: xstest-response
Creator: maas
Published: 2025-12-04 16:36:22
License: 暂无描述

魔搭社区2025-12-04 更新2025-05-31 收录

下载链接：

https://modelscope.cn/datasets/allenai/xstest-response

下载链接

链接失效反馈

官方服务：

资源简介：

# Dataset Card for XSTest-Response ## Disclaimer: The data includes examples that might be disturbing, harmful or upsetting. It includes a range of harmful topics such as discriminatory language and discussions about abuse, violence, self-harm, sexual content, misinformation among other high-risk categories. The main goal of this data is for advancing research in building safe LLMs. It is recommended not to train a LLM exclusively on the harmful examples. ## Dataset Summary XSTest-Response is an artifact of WildGuard project, and the purpose of this dataset is to extend [XSTest](https://arxiv.org/abs/2308.01263) with model responses to directly evaluate moderator accuracy for scoring models on a real safety benchmark. `response_refusal` split contains 449 prompts for refusal detection (178 refusals, 271 compliances). `response_harmfulness` split contains 446 prompts for response harmfulness (368 harmful responses, 78 benign responses). Please check the paper for further details on data construction: [WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs](https://arxiv.org/abs/2406.18495). ## Usage ```python from datasets import load_dataset # Load the response_refusal split dataset = load_dataset("allenai/xstest-response", split="response_refusal") # Load the response_harmfulness split dataset = load_dataset("allenai/xstest-response", split="response_harmfulness") ``` ## Dataset Details The dataset contains the following columns: - `prompt`: str, indicates the user request. - `response`: str, or None for prompt-only items in WildGuardTrain. - `label`: str, indicates the label of the prompt. It can be "refusal" or "compliance" for `response_refusal` split, and "harmful" or "unharmful" for `response_harmfulness` split. - `prompt_type`: str ("prompt_harmful" or "prompt_safe"), indicates whether the prompt is harmful or safe. - `prompt_harm_category`: str, indicates the XSTest category of the prompt. If `contrast` is included in the category, it means the prompt is generated to contrast with prompts in the same category, for example, `figurative_language` <-> `contrast_figurative_language`. ## Citation ``` @misc{wildguard2024, title={WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs}, author={Seungju Han and Kavel Rao and Allyson Ettinger and Liwei Jiang and Bill Yuchen Lin and Nathan Lambert and Yejin Choi and Nouha Dziri}, year={2024}, eprint={2406.18495}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2406.18495}, } ```

# XSTest-Response 数据集卡片 ## 免责声明本数据集包含可能令人不适、具有危害性或引发情绪困扰的示例，涵盖歧视性语言、虐待、暴力、自残、色情内容、错误信息等多类高风险有害主题。本数据集的核心目标是推动面向安全大语言模型（Large Language Model，LLM）构建的相关研究。请勿仅使用本数据集中的有害示例进行大语言模型训练。 ## 数据集概述 XSTest-Response 是 WildGuard 项目的产出物，本数据集旨在为 [XSTest](https://arxiv.org/abs/2308.01263) 补充模型响应数据，以在真实安全基准测试中直接评估审核器的准确率，用于对模型进行安全评分。 `response_refusal`（拒绝检测）子集包含449条用于拒绝检测的提示词，其中178条为拒绝响应样本，271条为合规响应样本。 `response_harmfulness`（响应危害性）子集包含446条用于评估响应危害性的提示词，其中368条为有害响应样本，78条为无害响应样本。如需了解数据集构建的更多细节，请参阅相关论文：[WildGuard：面向大语言模型安全风险、越狱行为与拒绝响应的一站式开源审核工具](https://arxiv.org/abs/2406.18495)。 ## 使用方法 python from datasets import load_dataset # 加载 response_refusal 子集 dataset = load_dataset("allenai/xstest-response", split="response_refusal") # 加载 response_harmfulness 子集 dataset = load_dataset("allenai/xstest-response", split="response_harmfulness") ## 数据集详情本数据集包含以下字段： - `prompt`：字符串类型，表示用户请求。 - `response`：字符串类型，对于 WildGuardTrain 中的仅提示词条目，该字段值为 None。 - `label`：字符串类型，表示提示词对应的标签。对于 `response_refusal` 子集，标签可选值为`"refusal"`（拒绝响应）与`"compliance"`（合规响应）；对于 `response_harmfulness` 子集，标签可选值为`"harmful"`（有害响应）与`"unharmful"`（无害响应）。 - `prompt_type`：字符串类型，可选值为`"prompt_harmful"`或`"prompt_safe"`，用于标识提示词是否具有危害性。 - `prompt_harm_category`：字符串类型，表示提示词所属的 XSTest 分类。若分类中包含`"contrast"`字样，则表示该提示词是为与同分类下的其他提示词形成对比而生成，例如`"figurative_language"`（比喻语言）与`"contrast_figurative_language"`（对比式比喻语言）。 ## 引用信息 @misc{wildguard2024, title={WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs}, author={Seungju Han and Kavel Rao and Allyson Ettinger and Liwei Jiang and Bill Yuchen Lin and Nathan Lambert and Yejin Choi and Nouha Dziri}, year={2024}, eprint={2406.18495}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2406.18495}, }

提供机构：

maas

创建时间：

2025-05-29

5,000+

优质数据集

54 个

任务类型

进入经典数据集