five

xstest-response

收藏
魔搭社区2025-12-04 更新2025-05-31 收录
下载链接:
https://modelscope.cn/datasets/allenai/xstest-response
下载链接
链接失效反馈
官方服务:
资源简介:
# Dataset Card for XSTest-Response ## Disclaimer: The data includes examples that might be disturbing, harmful or upsetting. It includes a range of harmful topics such as discriminatory language and discussions about abuse, violence, self-harm, sexual content, misinformation among other high-risk categories. The main goal of this data is for advancing research in building safe LLMs. It is recommended not to train a LLM exclusively on the harmful examples. ## Dataset Summary XSTest-Response is an artifact of WildGuard project, and the purpose of this dataset is to extend [XSTest](https://arxiv.org/abs/2308.01263) with model responses to directly evaluate moderator accuracy for scoring models on a real safety benchmark. `response_refusal` split contains 449 prompts for refusal detection (178 refusals, 271 compliances). `response_harmfulness` split contains 446 prompts for response harmfulness (368 harmful responses, 78 benign responses). Please check the paper for further details on data construction: [WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs](https://arxiv.org/abs/2406.18495). ## Usage ```python from datasets import load_dataset # Load the response_refusal split dataset = load_dataset("allenai/xstest-response", split="response_refusal") # Load the response_harmfulness split dataset = load_dataset("allenai/xstest-response", split="response_harmfulness") ``` ## Dataset Details The dataset contains the following columns: - `prompt`: str, indicates the user request. - `response`: str, or None for prompt-only items in WildGuardTrain. - `label`: str, indicates the label of the prompt. It can be "refusal" or "compliance" for `response_refusal` split, and "harmful" or "unharmful" for `response_harmfulness` split. - `prompt_type`: str ("prompt_harmful" or "prompt_safe"), indicates whether the prompt is harmful or safe. - `prompt_harm_category`: str, indicates the XSTest category of the prompt. If `contrast` is included in the category, it means the prompt is generated to contrast with prompts in the same category, for example, `figurative_language` <-> `contrast_figurative_language`. ## Citation ``` @misc{wildguard2024, title={WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs}, author={Seungju Han and Kavel Rao and Allyson Ettinger and Liwei Jiang and Bill Yuchen Lin and Nathan Lambert and Yejin Choi and Nouha Dziri}, year={2024}, eprint={2406.18495}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2406.18495}, } ```

# XSTest-Response 数据集卡片 ## 免责声明 本数据集包含可能令人不适、具有危害性或引发情绪困扰的示例,涵盖歧视性语言、虐待、暴力、自残、色情内容、错误信息等多类高风险有害主题。本数据集的核心目标是推动面向安全大语言模型(Large Language Model,LLM)构建的相关研究。请勿仅使用本数据集中的有害示例进行大语言模型训练。 ## 数据集概述 XSTest-Response 是 WildGuard 项目的产出物,本数据集旨在为 [XSTest](https://arxiv.org/abs/2308.01263) 补充模型响应数据,以在真实安全基准测试中直接评估审核器的准确率,用于对模型进行安全评分。 `response_refusal`(拒绝检测)子集包含449条用于拒绝检测的提示词,其中178条为拒绝响应样本,271条为合规响应样本。 `response_harmfulness`(响应危害性)子集包含446条用于评估响应危害性的提示词,其中368条为有害响应样本,78条为无害响应样本。 如需了解数据集构建的更多细节,请参阅相关论文:[WildGuard:面向大语言模型安全风险、越狱行为与拒绝响应的一站式开源审核工具](https://arxiv.org/abs/2406.18495)。 ## 使用方法 python from datasets import load_dataset # 加载 response_refusal 子集 dataset = load_dataset("allenai/xstest-response", split="response_refusal") # 加载 response_harmfulness 子集 dataset = load_dataset("allenai/xstest-response", split="response_harmfulness") ## 数据集详情 本数据集包含以下字段: - `prompt`:字符串类型,表示用户请求。 - `response`:字符串类型,对于 WildGuardTrain 中的仅提示词条目,该字段值为 None。 - `label`:字符串类型,表示提示词对应的标签。对于 `response_refusal` 子集,标签可选值为`"refusal"`(拒绝响应)与`"compliance"`(合规响应);对于 `response_harmfulness` 子集,标签可选值为`"harmful"`(有害响应)与`"unharmful"`(无害响应)。 - `prompt_type`:字符串类型,可选值为`"prompt_harmful"`或`"prompt_safe"`,用于标识提示词是否具有危害性。 - `prompt_harm_category`:字符串类型,表示提示词所属的 XSTest 分类。若分类中包含`"contrast"`字样,则表示该提示词是为与同分类下的其他提示词形成对比而生成,例如`"figurative_language"`(比喻语言)与`"contrast_figurative_language"`(对比式比喻语言)。 ## 引用信息 @misc{wildguard2024, title={WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs}, author={Seungju Han and Kavel Rao and Allyson Ettinger and Liwei Jiang and Bill Yuchen Lin and Nathan Lambert and Yejin Choi and Nouha Dziri}, year={2024}, eprint={2406.18495}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2406.18495}, }
提供机构:
maas
创建时间:
2025-05-29
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作