llm-semantic-router/jailbreak-detection-dataset

Name: llm-semantic-router/jailbreak-detection-dataset
Creator: llm-semantic-router
Published: 2026-01-21 22:56:43
License: No description available

Source: Hugging Face. Updated 2026-01-21; indexed 2026-02-07.

Download link:

https://hf-mirror.com/datasets/llm-semantic-router/jailbreak-detection-dataset

Description:

A comprehensive dataset for training AI safety classifiers, aligned with the MLCommons AI Safety taxonomy. It combines several sources for jailbreak and safety detection (nvidia/Aegis-AI-Content-Safety-Dataset-2.0, lmsys/toxic-chat, and jackhhao/jailbreak-classification) and adds edge cases such as CSE (child sexual exploitation), medical/legal/financial advice, and vaccine, election, and conspiracy-theory content. Labels are provided at two levels: Level 1 is binary (safe/unsafe), and Level 2 follows the MLCommons 9-class taxonomy: violent crimes, non-violent crimes, sex crimes, weapons/CBRNE, self-harm, hate speech, specialized advice, privacy, and misinformation. The reported totals are 18,000 samples for Level 1 and 18,164 for Level 2, with a per-category distribution provided.
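The two-level labeling described above can be illustrated with a minimal Python sketch. It is not the dataset's actual schema: the category identifiers, the `SafetyExample` class, and the `level1_label` helper are hypothetical names chosen for illustration, and the rule that any Level 2 hazard category implies a Level 1 label of "unsafe" is an assumption rather than something the listing states.

```python
from dataclasses import dataclass
from typing import Optional

# The nine Level 2 categories named in the description.
# The exact label strings stored in the dataset are not given in the listing,
# so these identifiers are hypothetical.
LEVEL2_CATEGORIES = (
    "violent_crimes",
    "non_violent_crimes",
    "sex_crimes",
    "weapons_cbrne",
    "self_harm",
    "hate_speech",
    "specialized_advice",
    "privacy",
    "misinformation",
)


@dataclass(frozen=True)
class SafetyExample:
    """One prompt with an optional Level 2 hazard category (hypothetical layout)."""
    text: str
    level2: Optional[str] = None  # None means no hazard category applies


def level1_label(example: SafetyExample) -> str:
    """Collapse a Level 2 category into a binary Level 1 label.

    Assumption: any recognized hazard category maps to "unsafe".
    """
    if example.level2 is None:
        return "safe"
    if example.level2 not in LEVEL2_CATEGORIES:
        raise ValueError(f"unknown Level 2 category: {example.level2!r}")
    return "unsafe"
```

For instance, `level1_label(SafetyExample("What is the capital of France?"))` yields "safe", while an example tagged with the hypothetical "privacy" category yields "unsafe". Check the dataset's own card for the real column names and label values before relying on any such mapping.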

Provider:

llm-semantic-router
