coconot

Name: coconot
Creator: maas
Published: 2025-12-05 16:36:09
License: 暂无描述

魔搭社区2025-12-05 更新2024-11-30 收录

下载链接：

https://modelscope.cn/datasets/allenai/coconot

下载链接

链接失效反馈

官方服务：

资源简介：

# 🥥 CoCoNot: Contextually, Comply Not! Dataset Card ## Dataset Details ### Dataset Description Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should *not* comply with user requests. We developed a new resource, 🥥 **CoCoNot**, for **benchmarking and enhancing noncompliance** behavior of chat-based language models. - **Curated by:** Allen Institute for AI - **Paper:** [ArXiv](https://arxiv.org/abs/2407.12043) - **Repository:** [https://github.com/allenai/noncompliance](https://github.com/allenai/noncompliance) - **Language(s) (NLP):** English - **License:** https://allenai.org/licenses/impact-lr - **Point of Contact:** [Faeze Brahman & Sachin Kumar](mailto:faezeb@allenai.org, sachink@allenai.org) ## Uses In this work, we aim to broaden the scope of noncompliance beyond its previous focus on safety. To achieve this, we develop a taxonomy of contextual noncompliance for language models, drawing inspiration from previous research in real-world content moderation, user safety and experience, and AI safety and risks. We outline the taxonomy below: it comprises five major categories, each of which includes several fine-grained sub-categories. ![](https://github.com/allenai/noncompliance/blob/main/taxonomy_figure_hf.png?raw=true) We create CoCoNot based on the proposed taxonomy shown above. Our dataset contains two subsets: (1) noncompliance queries (original) and (2) a contrast query set that should be complied with (contrast). Each subset is split into a human-verified evaluation set and a training set (with responses). We use the former to benchmarking and latter to enhance noncompliance in language models. ### Loading ```python from datasets import load_dataset # load original test set coconot_eval = load_dataset("allenai/coconot", "original", split="test") # load contrast test set coconot_contrast_eval = load_dataset("allenai/coconot", "contrast", split="test") # load preference training set coconot_train_pref = load_dataset("allenai/coconot", "pref", split="train") ``` ### Dataset Structure CoCoNot examples (original and contract set) contain the following fields: - id (str): a unique identifier - prompt (str): the instruction/query which should NOT be complied with (original set) or should be complied with (contrast) - response (str): the noncompliant or compliant response (only in train split) - category (str): a high-level noncompliance category defined in our taxonomy including: "incomplete requests", "unsupported requests", "indeterminate requests", "humanizing requests", and "requests with safety concerns" - subcategory (str): a fine-grained subcategory under each category Preference data subset (coconot_pre) has the following fields: - id (str): a unique identifier - prompt (`str`): the instruction/query which can be safely complied with - chosen (`str`): the compliant response from a stronger model - chosen_model (`str`): gpt-4 - rejected (`str`): the noncompliant response from a weaker model - rejected_model (`str`): where applicable This can easily be converted to the standard chosen/rejected list of messages format for DPO training (see [UltraFeedback for an example](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned)) ### Data Creation Please refer to our [paper](https://nbviewer.org/github/allenai/noncompliance/blob/main/paper.pdf) Section 3 for details on our dataset collection. ## Licensing Information CoCoNot is made available under the ODC-BY requiring the user to follow the licenses of the subsequent parts. ## Citation ``` @article{brahman-kumar2024, title={{The Art of Saying No: Contextual Noncompliance in Language Models}}, author={Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi}, journal={}, year={2024} } ```

# 🥥 CoCoNot：上下文合规拒绝数据集卡片 ## 数据集详情 ### 数据集概述基于对话的大语言模型（LLM）虽旨在为用户提供协助，但不应满足所有用户提出的请求。尽管现有研究大多仅聚焦于对"不安全"查询的拒绝响应，但我们认为合规拒绝的范畴应当得到拓展。本文提出了一套完整的上下文合规拒绝分类体系，用于阐明大语言模型应当在何种场景下、以何种方式拒绝响应用户请求。我们构建了全新的数据集资源🥥 **CoCoNot**，用于基准测试与优化基于对话的大语言模型的合规拒绝行为。 - **整理方**：艾伦人工智能研究所（Allen Institute for AI） - **相关论文**：[ArXiv](https://arxiv.org/abs/2407.12043) - **代码仓库**：[https://github.com/allenai/noncompliance](https://github.com/allenai/noncompliance) - **自然语言处理所用语言**：英语 - **许可协议**：https://allenai.org/licenses/impact-lr - **联系方式**：[Faeze Brahman 与 Sachin Kumar](mailto:faezeb@allenai.org, sachink@allenai.org) ## 使用场景本研究旨在将合规拒绝的范畴从以往仅聚焦于安全场景的范围中拓展出来。为此，我们借鉴了现实内容审核、用户安全与体验、AI安全与风险等领域的既有研究，构建了面向大语言模型的上下文合规拒绝分类体系。下文将对该分类体系进行说明，其包含5个大类，每个大类下又设有若干细粒度子类别。（分类体系示意图见：https://github.com/allenai/noncompliance/blob/main/taxonomy_figure_hf.png?raw=true）我们基于上述提出的分类体系构建了CoCoNot数据集。本数据集包含两个子集：(1) 合规拒绝查询集（原始集）与(2) 应当被满足的对照查询集（对照集）。每个子集均被划分为经人工验证的评测集与带响应的训练集。我们使用前者对大语言模型的合规拒绝能力进行基准测试，后者则用于优化模型的合规拒绝行为。 ### 加载方式 python from datasets import load_dataset # 加载原始测试集 coconot_eval = load_dataset("allenai/coconot", "original", split="test") # 加载对照测试集 coconot_contrast_eval = load_dataset("allenai/coconot", "contrast", split="test") # 加载偏好训练集 coconot_train_pref = load_dataset("allenai/coconot", "pref", split="train") ### 数据集结构 CoCoNot数据集样本（原始集与对照集）包含以下字段： - id（字符串类型）：唯一标识符 - prompt（字符串类型）：需被拒绝的指令/查询（原始集），或需被满足的指令/查询（对照集） - response（字符串类型）：拒绝响应或合规响应（仅训练子集包含该字段） - category（字符串类型）：本分类体系中定义的高阶合规拒绝类别，包含："不完整请求"、"不支持的请求"、"不确定请求"、"人性化请求"以及"存在安全顾虑的请求" - subcategory（字符串类型）：对应大类下的细粒度子类别偏好数据子集（coconot_pref）包含以下字段： - id（字符串类型）：唯一标识符 - prompt（字符串类型）：可被安全满足的指令/查询 - chosen（字符串类型）：更强模型生成的合规响应 - chosen_model（字符串类型）：生成该响应的模型为gpt-4 - rejected（字符串类型）：较弱模型生成的拒绝响应 - rejected_model（字符串类型）：生成该响应的模型（如适用）该子集可轻松转换为用于直接偏好优化（Direct Preference Optimization, DPO）训练的标准chosen/rejected消息列表格式，示例可参考[UltraFeedback](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned) ### 数据构建有关数据集收集的详细细节，请参阅我们的论文[论文链接](https://nbviewer.org/github/allenai/noncompliance/blob/main/paper.pdf)的第3节。 ## 许可信息 CoCoNot数据集采用ODC-BY许可协议发布，使用者需遵守相关附属部分的许可要求。 ## 引用格式 @article{brahman-kumar2024, title={{《拒答的艺术：大语言模型中的上下文合规拒绝》}}, author={Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi}, journal={}, year={2024} }

提供机构：

maas

创建时间：

2025-05-27

搜集汇总

数据集介绍