CoSApien
收藏魔搭社区2026-01-06 更新2025-07-26 收录
下载链接:
https://modelscope.cn/datasets/microsoft/CoSApien
下载链接
链接失效反馈官方服务:
资源简介:
# CoSApien: A Human-Authored Safety Control Benchmark
## Overview
**Paper**: [Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements](https://arxiv.org/abs/2410.08968), published at ICLR 2025.
**Purpose**: Evaluate the controllability of large language models (LLMs) aligned through natural language safety configs, ensuring both helpfulness and adherence to specified safety requirements.
**Description**: CoSApien is a human-authored benchmark comprising real-world scenarios where diverse safety standards are critical. Each scenario includes a detailed safety config describing acceptable and unacceptable content and a set of carefully curated evaluation prompts. Scenarios span various contexts, such as game development, regional publishing standards, and criminal investigations, highlighting nuanced, culturally-informed safety requirements.
**Evaluation**: CoSApien follows the CoSA-Score evaluation protocol, integrating judgments of response helpfulness and compliance with specified safety configs. Please see more details in our paper.
## Dataset Details
**Composition**:
- **5 Distinct Safety Configurations**: Each tailored to real-world LLM applications with specialized safety constraints.
- **200 Evaluation Prompts**: 40 per config, covering prompts that elicit fully allowed, fully disallowed, and partially allowed content.
**Explanation of columns**:
- Scenario: the safety config corresponding to the current scenario. This will be used as the system prompt.
- Prompt: the test prompt of the instance.
- Type: evaluation prompt type specified in Section 3.1 of our paper
**Applications**:
- Assessing safety controllability of LLMs
- Testing inference-time adaptability to varied user and cultural norms
**Authors**: Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme
**Project URL**: [aka.ms/controllable-safety-alignment](https://aka.ms/controllable-safety-alignment)
# CoSApien:人工编写的安全控制基准
## 概览
**论文**:[可控安全对齐:面向多样化安全需求的推理时适配](https://arxiv.org/abs/2410.08968),发表于ICLR 2025。
**研究目的**:评估基于自然语言安全配置对齐的大语言模型(Large Language Model,LLM)的可控性,确保模型输出兼具有用性与对指定安全要求的依从性。
**数据集描述**:CoSApien是一套人工编写的基准数据集,涵盖了多样化安全标准至关重要的真实世界场景。每个场景均包含一份详细的安全配置,用以明确可接受与不可接受的内容范畴,同时配套一组精心筛选的评估提示词。场景覆盖游戏开发、区域出版标准、刑事侦查等多个领域,凸显了精细化且贴合文化背景的安全需求。
**评估方式**:CoSApien采用CoSA-Score评估协议,整合了对模型输出有用性及对指定安全配置依从性的评判。更多细节请参阅我们的论文。
## 数据集详情
**数据集构成**:
- **5种独立安全配置**:每种均针对具备专属安全约束的真实世界大语言模型应用场景定制。
- **200条评估提示词**:每种配置对应40条提示词,涵盖可完全放行、完全禁止及部分允许的内容类型提示。
**字段说明**:
- Scenario:当前场景对应的安全配置,将被用作系统提示词(system prompt)。
- Prompt:该实例的测试提示词。
- Type:本文第3.1节中定义的评估提示词类型。
**应用场景**:
- 评估大语言模型的安全可控性
- 测试针对多样化用户与文化规范的推理时适配能力
**作者**:Jingyu Zhang、Ahmed Elgohary、Ahmed Magooda、Daniel Khashabi、Benjamin Van Durme
**项目网址**:[aka.ms/controllable-safety-alignment](https://aka.ms/controllable-safety-alignment)
提供机构:
maas
创建时间:
2025-07-22



