JailEval

Indexed from arXiv on 2025-09-30

Download link:

https://docs.qq.com/sheet/DRFRtVGVValNQQk5h?tab=BB08J2


Resource overview:

JailEval is a dataset of 180 carefully designed malicious questions, some of which are drawn from an existing collection of malicious questions. The questions span nine malicious themes: "Harmful", "Privacy", "Adult Content", "Illegal", "Politics", "Unauthorized Practices", "Government", "Misinformation", and "National Security". Each malicious question is paired with a benign counterpart that is grammatically similar. The dataset is intended for evaluating the extent to which large language models can be jailbroken.
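The paired structure described above can be illustrated with a minimal Python sketch. The record layout here (`QuestionPair` with `theme`, `malicious`, and `benign` fields) and the helper names are assumptions made for illustration; the actual column layout of the linked spreadsheet is not stated in this listing, and the scoring function is only one plausible way to summarize results, not the dataset authors' metric.

```python
from collections import Counter
from dataclasses import dataclass

# The nine theme labels as listed in the dataset description.
THEMES = (
    "Harmful", "Privacy", "Adult Content", "Illegal", "Politics",
    "Unauthorized Practices", "Government", "Misinformation",
    "National Security",
)


@dataclass(frozen=True)
class QuestionPair:
    """Hypothetical record: one malicious question and its benign counterpart."""
    theme: str
    malicious: str
    benign: str


def count_by_theme(pairs):
    """Count pairs per theme, rejecting labels outside the nine listed themes."""
    for pair in pairs:
        if pair.theme not in THEMES:
            raise ValueError(f"unknown theme: {pair.theme!r}")
    return Counter(pair.theme for pair in pairs)


def jailbreak_rate(jailbroken_flags):
    """Fraction of malicious questions judged as answered (assumed metric)."""
    flags = list(jailbroken_flags)
    if not flags:
        raise ValueError("no results to score")
    return sum(1 for flag in flags if flag) / len(flags)
```

In use, the per-question flags would come from whatever judging procedure an evaluator chooses; comparing model behavior on the benign counterparts can help separate genuine refusals from over-refusal of harmless, similarly phrased questions.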


The content above was collected and summarized by 遇见数据集.
