
llm-semantic-router/jailbreak-detection-dataset

Source: Hugging Face. Updated 2026-01-21; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/llm-semantic-router/jailbreak-detection-dataset
Description:

A dataset for training AI safety classifiers, aligned with the MLCommons AI Safety taxonomy. It combines several sources for jailbreak and safety detection: nvidia/Aegis-AI-Content-Safety-Dataset-2.0, lmsys/toxic-chat, and jackhhao/jailbreak-classification. It also adds edge cases such as CSE (child sexual exploitation), medical/legal/financial advice, and vaccine/election/conspiracy-theory content. Labels come at two levels: Level 1 is binary (safe/unsafe), and Level 2 follows the MLCommons 9-class taxonomy: violent crimes, non-violent crimes, sex crimes, weapons/CBRNE, self-harm, hate speech, specialized advice, privacy, and misinformation. The reported totals are 18,000 samples for Level 1 and 18,164 for Level 2, and per-category distributions are provided.
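The two label levels in the description above can be pictured with a small Python sketch. This is only an illustration, not the dataset's actual schema: the enum name `Hazard`, the string values, the helper `level1_from_level2`, and the use of `None` for "no hazard category applies" are all hypothetical, and the description does not state that every Level 2 category maps to a Level 1 "unsafe" label.

```python
from enum import Enum
from typing import Optional


class Hazard(Enum):
    """The nine Level 2 categories named in the description (values are made up)."""
    VIOLENT_CRIMES = "violent_crimes"
    NON_VIOLENT_CRIMES = "non_violent_crimes"
    SEX_CRIMES = "sex_crimes"
    WEAPONS_CBRNE = "weapons_cbrne"
    SELF_HARM = "self_harm"
    HATE_SPEECH = "hate_speech"
    SPECIALIZED_ADVICE = "specialized_advice"
    PRIVACY = "privacy"
    MISINFORMATION = "misinformation"


def level1_from_level2(hazard: Optional[Hazard]) -> str:
    """Collapse a Level 2 category into a Level 1 binary label.

    Assumption (not from the source): None means no hazard category
    applies, and any hazard category implies "unsafe".
    """
    return "safe" if hazard is None else "unsafe"


if __name__ == "__main__":
    for hazard in (Hazard.PRIVACY, None):
        print(hazard, "->", level1_from_level2(hazard))
```

Since the reported Level 1 and Level 2 totals differ (18,000 versus 18,164), the two levels may well be separate sample sets rather than two labels on the same rows; check the actual column names and label values before relying on a mapping like this.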
Provider:
llm-semantic-router