ScaDSAI/XSB_and_MS-XSB

Name: ScaDSAI/XSB_and_MS-XSB
Creator: ScaDSAI
Published: 2025-10-12 16:13:48
License: 暂无描述

Hugging Face2025-10-12 更新2025-10-18 收录

下载链接：

https://hf-mirror.com/datasets/ScaDSAI/XSB_and_MS-XSB

下载链接

链接失效反馈

官方服务：

资源简介：

XSB和MS-XSB是专门设计用于系统评估大型语言模型中夸张安全行为的标准基准。XSB专注于单轮交互，而MS-XSB则评估更复杂的多轮对话场景。XSB包含12个类别中的580个安全提示语，这些提示语常被错误地分类为不安全。MS-XSB包含嵌入在30个对话场景中的600个提示语，单独看可能不安全，但在其上下文中是安全的。该数据集用于AI安全与对齐研究，评估LLM的有益性与无害性权衡，以及训练更校准和上下文感知的模型。

XSB and MS-XSB are benchmarks specifically designed to systematically evaluate exaggerated safety behaviors in large language models (LLMs). XSB focuses on single-turn interactions, while MS-XSB evaluates more complex multi-turn conversational scenarios. XSB contains 580 safe prompts across 12 categories that are often misclassified as unsafe. MS-XSB includes 600 prompts embedded within 30 conversational scenarios, which may appear unsafe on their own but are safe within their context. The dataset is intended for research on AI safety and alignment, evaluating the helpfulness-harmlessness trade-off in LLMs, and training more calibrated and context-aware models.

提供机构：

ScaDSAI

5,000+

优质数据集

54 个

任务类型

进入经典数据集