xstest-response
ModelScope community · updated 2025-12-04 · indexed 2025-05-31
Download link: https://modelscope.cn/datasets/allenai/xstest-response
Resource description:
# Dataset Card for XSTest-Response
## Disclaimer:
The data includes examples that might be disturbing, harmful, or upsetting. It covers a range of harmful topics, such as discriminatory language and discussions of abuse, violence, self-harm, sexual content, and misinformation, among other high-risk categories. The main goal of this data is to advance research on building safe LLMs.
It is recommended not to train an LLM exclusively on the harmful examples.
## Dataset Summary
XSTest-Response is an artifact of the WildGuard project. It extends [XSTest](https://arxiv.org/abs/2308.01263) with model responses, so that the accuracy of moderators that score model outputs can be evaluated directly on a real safety benchmark.
The `response_refusal` split contains 449 prompts for refusal detection (178 refusals, 271 compliances).
The `response_harmfulness` split contains 446 prompts for response harmfulness (368 harmful responses, 78 benign responses).
Please check the paper for further details on data construction: [WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs](https://arxiv.org/abs/2406.18495).
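
To make the evaluation setup concrete, here is a minimal sketch of how a moderator's predictions might be scored against the `label` column of one split. The `score_moderator` helper, the `keyword_moderator` baseline, and the toy rows are hypothetical and introduced only for illustration; they are not part of the dataset or the WildGuard code, and the paper may follow a different metric protocol.

```python
from typing import Callable, Iterable, Mapping


def score_moderator(
    rows: Iterable[Mapping[str, str]],
    predict: Callable[[str, str], str],
    positive_label: str,
) -> dict:
    """Compare predicted labels with gold labels and return accuracy and F1.

    `predict` is a hypothetical moderator: it takes (prompt, response) and
    returns a label string such as "refusal" or "compliance".
    """
    tp = fp = fn = correct = total = 0
    for row in rows:
        gold = row["label"]
        pred = predict(row["prompt"], row["response"])
        total += 1
        correct += pred == gold
        if pred == positive_label and gold == positive_label:
            tp += 1
        elif pred == positive_label:
            fp += 1
        elif gold == positive_label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / total if total else 0.0, "f1": f1}


# Toy rows standing in for the real split (made up for this example).
toy_rows = [
    {"prompt": "p1", "response": "I can't help with that.", "label": "refusal"},
    {"prompt": "p2", "response": "Sure, here is how.", "label": "compliance"},
    {"prompt": "p3", "response": "Sorry, I can't do that.", "label": "refusal"},
    {"prompt": "p4", "response": "I can't say, but here you go.", "label": "compliance"},
]


def keyword_moderator(prompt: str, response: str) -> str:
    # Naive baseline: treat any response containing "can't" as a refusal.
    return "refusal" if "can't" in response else "compliance"


scores = score_moderator(toy_rows, keyword_moderator, positive_label="refusal")
print(scores)
```

Passing `positive_label="harmful"` would adapt the same helper to the `response_harmfulness` split, assuming the moderator returns the label strings used in that split.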
## Usage
```python
from datasets import load_dataset
# Load the response_refusal split
refusal_ds = load_dataset("allenai/xstest-response", split="response_refusal")
# Load the response_harmfulness split (separate variable, so the first split is not overwritten)
harmfulness_ds = load_dataset("allenai/xstest-response", split="response_harmfulness")
```
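
As a quick sanity check after loading, the label counts of each split can be compared with the figures given in the summary. The sketch below rests on a few assumptions: `label_counts` and `load_split_counts` are hypothetical helper names, and the download step needs network access (and possibly acceptance of the dataset's terms on the hosting hub).

```python
from collections import Counter
from typing import Iterable, Mapping


def label_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Tally the `label` column of an iterable of row dicts."""
    return Counter(row["label"] for row in rows)


def load_split_counts(split: str) -> Counter:
    """Download one split and tally its labels (needs network access)."""
    # Imported lazily so that label_counts can be used offline.
    from datasets import load_dataset

    return label_counts(load_dataset("allenai/xstest-response", split=split))


# Offline illustration with made-up rows; call
# load_split_counts("response_refusal") to check the real split.
example = label_counts(
    [{"label": "refusal"}, {"label": "compliance"}, {"label": "refusal"}]
)
print(example)
```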
## Dataset Details
The dataset contains the following columns:
- `prompt`: str, indicates the user request.
- `response`: str, indicates the model response to the prompt; the card notes it can be None for prompt-only items in WildGuardTrain.
- `label`: str, indicates the label of the item. It can be "refusal" or "compliance" for the `response_refusal` split, and "harmful" or "unharmful" for the `response_harmfulness` split.
- `prompt_type`: str ("prompt_harmful" or "prompt_safe"), indicates whether the prompt is harmful or safe.
- `prompt_harm_category`: str, indicates the XSTest category of the prompt. If `contrast` is included in the category, it means the prompt is generated to contrast with prompts in the same category, for example, `figurative_language` <-> `contrast_figurative_language`.
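
The `contrast` naming convention can be used to pair each category with its counterpart. Below is a minimal sketch, assuming that contrast categories are always formed by prefixing `contrast_` to the base name, as in the `figurative_language` example; the helper name `group_by_base_category` and the toy rows are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Mapping

# Assumed from the `contrast_figurative_language` example in the card.
CONTRAST_PREFIX = "contrast_"


def group_by_base_category(rows: Iterable[Mapping[str, str]]) -> dict:
    """Group rows by base category, separating original and contrast prompts."""
    groups = defaultdict(lambda: {"original": [], "contrast": []})
    for row in rows:
        category = row["prompt_harm_category"]
        if category.startswith(CONTRAST_PREFIX):
            base = category[len(CONTRAST_PREFIX):]
            groups[base]["contrast"].append(row)
        else:
            groups[category]["original"].append(row)
    return dict(groups)


# Made-up rows for illustration only.
toy = [
    {"prompt": "p1", "prompt_harm_category": "figurative_language"},
    {"prompt": "p2", "prompt_harm_category": "contrast_figurative_language"},
]
paired = group_by_base_category(toy)
print({name: {k: len(v) for k, v in g.items()} for name, g in paired.items()})
```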
## Citation
```
@misc{wildguard2024,
title={WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs},
author={Seungju Han and Kavel Rao and Allyson Ettinger and Liwei Jiang and Bill Yuchen Lin and Nathan Lambert and Yejin Choi and Nouha Dziri},
year={2024},
eprint={2406.18495},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.18495},
}
```
Provider: maas
Created: 2025-05-29



