imadreamerboy/constitutional-safety-classifier-data

Name: imadreamerboy/constitutional-safety-classifier-data
Creator: imadreamerboy
Published: 2026-04-21 20:51:04
License: 暂无描述

Hugging Face2026-04-21 更新2026-04-26 收录

下载链接：

https://hf-mirror.com/datasets/imadreamerboy/constitutional-safety-classifier-data

下载链接

链接失效反馈

官方服务：

资源简介：

宪法安全分类器训练数据集，用于训练开源实现的宪法分类器。数据集包含65,578个示例，分为安全（32,140，49.0%）和不安全（33,438，51.0%）两类，比例接近平衡。数据来源于三个高质量的数据集：Aegis 2.0（30,496个示例，人类标记的多类别安全数据）、BeaverTails（30,000个示例，大规模提示+响应对安全数据，平衡子样本）和ToxicChat（5,082个示例，用户与聊天机器人交互中的真实越狱尝试）。数据集分为训练集（59,022个示例）、验证集（3,278个示例）和测试集（3,278个示例）。每个示例以ChatML格式存储，包含用户消息和助手回复。用户消息中嵌入了完整的宪法（12个有害和12个无害类别）作为上下文，助手回复则简单标注为“安全”或“不安全”。

Training dataset for the Constitutional Safety Classifier, an open-source implementation of Anthropics Constitutional Classifiers. The dataset contains **65,578 examples** formatted for next-token prediction (NTP) safety classification, drawn from three high-quality sources: Aegis 2.0 (30,496 examples, human-labeled multi-category safety data), BeaverTails (30,000 examples, large-scale prompt+response safety pairs, balanced subsample), and ToxicChat (5,082 examples, real-world jailbreak attempts from user-chatbot interactions). The label distribution is nearly perfectly balanced with 32,140 (49.0%) safe and 33,438 (51.0%) unsafe examples. The dataset is split into train (59,022 examples), validation (3,278 examples), and test (3,278 examples) sets. Each example contains a `messages` field in conversational ChatML format, where the user message embeds the full constitution (12 harmful + 12 harmless categories) as context, and the assistant response is simply "safe" or "unsafe".

提供机构：

imadreamerboy

5,000+

优质数据集

54 个

任务类型

进入经典数据集