five

CoSApien

收藏
魔搭社区2026-01-06 更新2025-07-26 收录
下载链接:
https://modelscope.cn/datasets/microsoft/CoSApien
下载链接
链接失效反馈
官方服务:
资源简介:
# CoSApien: A Human-Authored Safety Control Benchmark ## Overview **Paper**: [Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements](https://arxiv.org/abs/2410.08968), published at ICLR 2025. **Purpose**: Evaluate the controllability of large language models (LLMs) aligned through natural language safety configs, ensuring both helpfulness and adherence to specified safety requirements. **Description**: CoSApien is a human-authored benchmark comprising real-world scenarios where diverse safety standards are critical. Each scenario includes a detailed safety config describing acceptable and unacceptable content and a set of carefully curated evaluation prompts. Scenarios span various contexts, such as game development, regional publishing standards, and criminal investigations, highlighting nuanced, culturally-informed safety requirements. **Evaluation**: CoSApien follows the CoSA-Score evaluation protocol, integrating judgments of response helpfulness and compliance with specified safety configs. Please see more details in our paper. ## Dataset Details **Composition**: - **5 Distinct Safety Configurations**: Each tailored to real-world LLM applications with specialized safety constraints. - **200 Evaluation Prompts**: 40 per config, covering prompts that elicit fully allowed, fully disallowed, and partially allowed content. **Explanation of columns**: - Scenario: the safety config corresponding to the current scenario. This will be used as the system prompt. - Prompt: the test prompt of the instance. - Type: evaluation prompt type specified in Section 3.1 of our paper **Applications**: - Assessing safety controllability of LLMs - Testing inference-time adaptability to varied user and cultural norms **Authors**: Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme **Project URL**: [aka.ms/controllable-safety-alignment](https://aka.ms/controllable-safety-alignment)

# CoSApien:人工编写的安全控制基准 ## 概览 **论文**:[可控安全对齐:面向多样化安全需求的推理时适配](https://arxiv.org/abs/2410.08968),发表于ICLR 2025。 **研究目的**:评估基于自然语言安全配置对齐的大语言模型(Large Language Model,LLM)的可控性,确保模型输出兼具有用性与对指定安全要求的依从性。 **数据集描述**:CoSApien是一套人工编写的基准数据集,涵盖了多样化安全标准至关重要的真实世界场景。每个场景均包含一份详细的安全配置,用以明确可接受与不可接受的内容范畴,同时配套一组精心筛选的评估提示词。场景覆盖游戏开发、区域出版标准、刑事侦查等多个领域,凸显了精细化且贴合文化背景的安全需求。 **评估方式**:CoSApien采用CoSA-Score评估协议,整合了对模型输出有用性及对指定安全配置依从性的评判。更多细节请参阅我们的论文。 ## 数据集详情 **数据集构成**: - **5种独立安全配置**:每种均针对具备专属安全约束的真实世界大语言模型应用场景定制。 - **200条评估提示词**:每种配置对应40条提示词,涵盖可完全放行、完全禁止及部分允许的内容类型提示。 **字段说明**: - Scenario:当前场景对应的安全配置,将被用作系统提示词(system prompt)。 - Prompt:该实例的测试提示词。 - Type:本文第3.1节中定义的评估提示词类型。 **应用场景**: - 评估大语言模型的安全可控性 - 测试针对多样化用户与文化规范的推理时适配能力 **作者**:Jingyu Zhang、Ahmed Elgohary、Ahmed Magooda、Daniel Khashabi、Benjamin Van Durme **项目网址**:[aka.ms/controllable-safety-alignment](https://aka.ms/controllable-safety-alignment)
提供机构:
maas
创建时间:
2025-07-22
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作