CoSApien

Name: CoSApien
Creator: maas
Published: 2026-01-06 16:39:38
License: 暂无描述

魔搭社区2026-01-06 更新2025-07-26 收录

下载链接：

https://modelscope.cn/datasets/microsoft/CoSApien

下载链接

链接失效反馈

官方服务：

资源简介：

# CoSApien: A Human-Authored Safety Control Benchmark ## Overview **Paper**: [Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements](https://arxiv.org/abs/2410.08968), published at ICLR 2025. **Purpose**: Evaluate the controllability of large language models (LLMs) aligned through natural language safety configs, ensuring both helpfulness and adherence to specified safety requirements. **Description**: CoSApien is a human-authored benchmark comprising real-world scenarios where diverse safety standards are critical. Each scenario includes a detailed safety config describing acceptable and unacceptable content and a set of carefully curated evaluation prompts. Scenarios span various contexts, such as game development, regional publishing standards, and criminal investigations, highlighting nuanced, culturally-informed safety requirements. **Evaluation**: CoSApien follows the CoSA-Score evaluation protocol, integrating judgments of response helpfulness and compliance with specified safety configs. Please see more details in our paper. ## Dataset Details **Composition**: - **5 Distinct Safety Configurations**: Each tailored to real-world LLM applications with specialized safety constraints. - **200 Evaluation Prompts**: 40 per config, covering prompts that elicit fully allowed, fully disallowed, and partially allowed content. **Explanation of columns**: - Scenario: the safety config corresponding to the current scenario. This will be used as the system prompt. - Prompt: the test prompt of the instance. - Type: evaluation prompt type specified in Section 3.1 of our paper **Applications**: - Assessing safety controllability of LLMs - Testing inference-time adaptability to varied user and cultural norms **Authors**: Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme **Project URL**: [aka.ms/controllable-safety-alignment](https://aka.ms/controllable-safety-alignment)

# CoSApien：人工编写的安全控制基准 ## 概览 **论文**：[可控安全对齐：面向多样化安全需求的推理时适配](https://arxiv.org/abs/2410.08968)，发表于ICLR 2025。 **研究目的**：评估基于自然语言安全配置对齐的大语言模型（Large Language Model，LLM）的可控性，确保模型输出兼具有用性与对指定安全要求的依从性。 **数据集描述**：CoSApien是一套人工编写的基准数据集，涵盖了多样化安全标准至关重要的真实世界场景。每个场景均包含一份详细的安全配置，用以明确可接受与不可接受的内容范畴，同时配套一组精心筛选的评估提示词。场景覆盖游戏开发、区域出版标准、刑事侦查等多个领域，凸显了精细化且贴合文化背景的安全需求。 **评估方式**：CoSApien采用CoSA-Score评估协议，整合了对模型输出有用性及对指定安全配置依从性的评判。更多细节请参阅我们的论文。 ## 数据集详情 **数据集构成**： - **5种独立安全配置**：每种均针对具备专属安全约束的真实世界大语言模型应用场景定制。 - **200条评估提示词**：每种配置对应40条提示词，涵盖可完全放行、完全禁止及部分允许的内容类型提示。 **字段说明**： - Scenario：当前场景对应的安全配置，将被用作系统提示词（system prompt）。 - Prompt：该实例的测试提示词。 - Type：本文第3.1节中定义的评估提示词类型。 **应用场景**： - 评估大语言模型的安全可控性 - 测试针对多样化用户与文化规范的推理时适配能力 **作者**：Jingyu Zhang、Ahmed Elgohary、Ahmed Magooda、Daniel Khashabi、Benjamin Van Durme **项目网址**：[aka.ms/controllable-safety-alignment](https://aka.ms/controllable-safety-alignment)

提供机构：

maas

创建时间：

2025-07-22

5,000+

优质数据集

54 个

任务类型

进入经典数据集