five

imadreamerboy/constitutional-safety-classifier-data

收藏
Hugging Face2026-04-21 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/imadreamerboy/constitutional-safety-classifier-data
下载链接
链接失效反馈
官方服务:
资源简介:
宪法安全分类器训练数据集,用于训练开源实现的宪法分类器。数据集包含65,578个示例,分为安全(32,140,49.0%)和不安全(33,438,51.0%)两类,比例接近平衡。数据来源于三个高质量的数据集:Aegis 2.0(30,496个示例,人类标记的多类别安全数据)、BeaverTails(30,000个示例,大规模提示+响应对安全数据,平衡子样本)和ToxicChat(5,082个示例,用户与聊天机器人交互中的真实越狱尝试)。数据集分为训练集(59,022个示例)、验证集(3,278个示例)和测试集(3,278个示例)。每个示例以ChatML格式存储,包含用户消息和助手回复。用户消息中嵌入了完整的宪法(12个有害和12个无害类别)作为上下文,助手回复则简单标注为“安全”或“不安全”。

Training dataset for the Constitutional Safety Classifier, an open-source implementation of Anthropics Constitutional Classifiers. The dataset contains **65,578 examples** formatted for next-token prediction (NTP) safety classification, drawn from three high-quality sources: Aegis 2.0 (30,496 examples, human-labeled multi-category safety data), BeaverTails (30,000 examples, large-scale prompt+response safety pairs, balanced subsample), and ToxicChat (5,082 examples, real-world jailbreak attempts from user-chatbot interactions). The label distribution is nearly perfectly balanced with 32,140 (49.0%) safe and 33,438 (51.0%) unsafe examples. The dataset is split into train (59,022 examples), validation (3,278 examples), and test (3,278 examples) sets. Each example contains a `messages` field in conversational ChatML format, where the user message embeds the full constitution (12 harmful + 12 harmless categories) as context, and the assistant response is simply "safe" or "unsafe".
提供机构:
imadreamerboy
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作