Name: TrustAIRLab/in-the-wild-jailbreak-prompts
Creator: TrustAIRLab
Published: 2024-11-19 13:45:28
License: MIT

Download link:

https://hf-mirror.com/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts


Description:

---
license: mit
dataset_info:
- config_name: jailbreak_2023_05_07
  features:
  - name: platform
    dtype: string
  - name: source
    dtype: string
  - name: prompt
    dtype: string
  - name: jailbreak
    dtype: bool
  - name: created_at
    dtype: string
  - name: date
    dtype: string
  - name: community_id
    dtype: float64
  - name: community_name
    dtype: string
  splits:
  - name: train
    num_bytes: 1391612
    num_examples: 666
  download_size: 656975
  dataset_size: 1391612
- config_name: jailbreak_2023_12_25
  features:
  - name: platform
    dtype: string
  - name: source
    dtype: string
  - name: prompt
    dtype: string
  - name: jailbreak
    dtype: bool
  - name: created_at
    dtype: string
  - name: date
    dtype: string
  - name: community
    dtype: string
  - name: community_id
    dtype: float64
  - name: previous_community_id
    dtype: float64
  splits:
  - name: train
    num_bytes: 3799875
    num_examples: 1405
  download_size: 1871641
  dataset_size: 3799875
- config_name: regular_2023_05_07
  features:
  - name: platform
    dtype: string
  - name: source
    dtype: string
  - name: prompt
    dtype: string
  - name: jailbreak
    dtype: bool
  - name: created_at
    dtype: string
  - name: date
    dtype: string
  splits:
  - name: train
    num_bytes: 6534994
    num_examples: 5721
  download_size: 3264474
  dataset_size: 6534994
- config_name: regular_2023_12_25
  features:
  - name: platform
    dtype: string
  - name: source
    dtype: string
  - name: prompt
    dtype: string
  - name: jailbreak
    dtype: bool
  - name: created_at
    dtype: string
  - name: date
    dtype: string
  splits:
  - name: train
    num_bytes: 24345310
    num_examples: 13735
  download_size: 12560543
  dataset_size: 24345310
configs:
- config_name: jailbreak_2023_05_07
  data_files:
  - split: train
    path: jailbreak_2023_05_07/train-*
- config_name: jailbreak_2023_12_25
  data_files:
  - split: train
    path: jailbreak_2023_12_25/train-*
- config_name: regular_2023_05_07
  data_files:
  - split: train
    path: regular_2023_05_07/train-*
- config_name: regular_2023_12_25
  data_files:
  - split: train
    path: regular_2023_12_25/train-*
task_categories:
- text-generation
size_categories:
- 10K<n<100K
---

# In-The-Wild Jailbreak Prompts on LLMs

This is the official repository for the ACM CCS 2024 paper ["Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models](https://arxiv.org/abs/2308.03825) by [Xinyue Shen](https://xinyueshen.me/), [Zeyuan Chen](https://picodora.github.io/), [Michael Backes](https://michaelbackes.eu/), Yun Shen, and [Yang Zhang](https://yangzhangalmo.github.io/).

In this project, employing our new framework JailbreakHub, we conduct the first measurement study on jailbreak prompts in the wild, with **15,140 prompts** collected from December 2022 to December 2023 (including **1,405 jailbreak prompts**).

Check out our [website here](https://jailbreak-llms.xinyueshen.me/).

**Disclaimer. This repo contains examples of harmful language. Reader discretion is recommended. This repo is intended for research purposes only. Any misuse is strictly prohibited.**

## Data

## Prompts

Overall, we collected 15,140 prompts from four platforms (Reddit, Discord, websites, and open-source datasets) between December 2022 and December 2023. Among these prompts, we identified 1,405 jailbreak prompts. To the best of our knowledge, this dataset is the largest collection of in-the-wild jailbreak prompts.

> Statistics of our data source. (Adv) UA refers to (adversarial) user accounts.

| Platform  | Source                     | # Posts     | # UA      | # Adv UA | # Prompts  | # Jailbreaks | Prompt Time Range   |
| --------- | -------------------------- | ----------- | --------- | -------- | ---------- | ------------ | ------------------- |
| Reddit    | r/ChatGPT                  | 163549      | 147       | 147      | 176        | 176          | 2023.02-2023.11     |
| Reddit    | r/ChatGPTPromptGenius      | 3536        | 305       | 21       | 654        | 24           | 2022.12-2023.11     |
| Reddit    | r/ChatGPTJailbreak         | 1602        | 183       | 183      | 225        | 225          | 2023.02-2023.11     |
| Discord   | ChatGPT                    | 609         | 259       | 106      | 544        | 214          | 2023.02-2023.12     |
| Discord   | ChatGPT Prompt Engineering | 321         | 96        | 37       | 278        | 67           | 2022.12-2023.12     |
| Discord   | Spreadsheet Warriors       | 71          | 3         | 3        | 61         | 61           | 2022.12-2023.09     |
| Discord   | AI Prompt Sharing          | 25          | 19        | 13       | 24         | 17           | 2023.03-2023.04     |
| Discord   | LLM Promptwriting          | 184         | 64        | 41       | 167        | 78           | 2023.03-2023.12     |
| Discord   | BreakGPT                   | 36          | 10        | 10       | 32         | 32           | 2023.04-2023.09     |
| Website   | AIPRM                      | -           | 2777      | 23       | 3930       | 25           | 2023.01-2023.06     |
| Website   | FlowGPT                    | -           | 3505      | 254      | 8754       | 405          | 2022.12-2023.12     |
| Website   | JailbreakChat              | -           | -         | -        | 79         | 79           | 2023.02-2023.05     |
| Dataset   | AwesomeChatGPTPrompts      | -           | -         | -        | 166        | 2            | -                   |
| Dataset   | OCR-Prompts                | -           | -         | -        | 50         | 0            | -                   |
| **Total** |                            | **169,933** | **7,308** | **803**  | **15,140** | **1,405**    | **2022.12-2023.12** |

**Load Prompts**

You can use the Hugging Face [`Datasets`](https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts) library to easily load all collected prompts.

```python
from datasets import load_dataset

# Each snapshot is a separate config; uncomment the one you need.
dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'jailbreak_2023_05_07', split='train')
# dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'jailbreak_2023_12_25', split='train')
# dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'regular_2023_05_07', split='train')
# dataset = load_dataset('TrustAIRLab/in-the-wild-jailbreak-prompts', 'regular_2023_12_25', split='train')
```

The original CSV files are provided in our GitHub repo [jailbreak_llms](https://github.com/verazuo/jailbreak_llms/tree/main/data).

Note: If you plan to use this dataset to train models, we recommend preprocessing the `prompt` field to remove duplicates. For more details, see [this discussion](https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts/discussions/3).

## Question Set

To evaluate the effectiveness of jailbreak prompts, we construct a question set comprising 390 questions across 13 forbidden scenarios adopted from the [OpenAI Usage Policy](https://openai.com/policies/usage-policies).

We exclude the `Child Sexual Abuse` scenario from our evaluation and focus on the remaining 13 scenarios: `Illegal Activity`, `Hate Speech`, `Malware Generation`, `Physical Harm`, `Economic Harm`, `Fraud`, `Pornography`, `Political Lobbying`, `Privacy Violence`, `Legal Opinion`, `Financial Advice`, `Health Consultation`, and `Government Decision`.

```python
from datasets import load_dataset

# The question set is published as a separate dataset.
forbidden_question_set = load_dataset("TrustAIRLab/forbidden_question_set", split='train')
```

The original file of the question set is also provided in our GitHub repo [jailbreak_llms](https://github.com/verazuo/jailbreak_llms/tree/main/data).

## Code

### Evaluator - ChatGLMEval

```
cd code/ChatGLMEval
# Add the data paths to the df_path_list variable in run_evaluator.py
python run_evaluator.py
```

### Semantics Visualization

Check `code/semantics_visualization/visualize.ipynb`.

## Ethics

We acknowledge that data collected online can contain personal information.
Thus, we adopt standard best practices to ensure that our study follows ethical principles, such as not trying to deanonymize any user and reporting results in aggregate. Since this study only involved publicly available data and had no interactions with participants, it is not regarded as human subjects research by our Institutional Review Boards (IRB). Nonetheless, since one of our goals is to measure the risk of LLMs in answering harmful questions, it is inevitable that we disclose how a model can generate hateful content. This can raise concerns about potential misuse. However, we strongly believe that raising awareness of the problem is even more crucial, as it can inform LLM vendors and the research community to develop stronger safeguards and contribute to the more responsible release of these models.

We have responsibly disclosed our findings to related LLM vendors.

## Citation

If you find this useful in your research, please consider citing:

```
@inproceedings{SCBSZ24,
  author = {Xinyue Shen and Zeyuan Chen and Michael Backes and Yun Shen and Yang Zhang},
  title = {{``Do Anything Now'': Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models}},
  booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
  publisher = {ACM},
  year = {2024}
}
```

## License

`jailbreak_llms` is licensed under the terms of the MIT license. See LICENSE for more details.


Use cases:
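
One use the dataset card itself points to is preparing the prompts for model training, where it recommends removing duplicates from the `prompt` field. The following is a minimal sketch of that preprocessing step, not the authors' own procedure. The helper names `normalize_prompt` and `dedupe_prompts` are hypothetical, and the choice to treat prompts as equal after collapsing whitespace and lowercasing is an assumption; the card does not say how duplicates should be matched.

```python
import re
from typing import Dict, Iterable, List


def normalize_prompt(text: str) -> str:
    """Collapse whitespace runs and lowercase (assumed matching rule)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedupe_prompts(rows: Iterable[Dict], field: str = "prompt") -> List[Dict]:
    """Keep the first row for each normalized value of `field`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize_prompt(row[field])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Toy rows standing in for dataset records (illustrative, not real data).
sample_rows = [
    {"platform": "reddit", "prompt": "Hello  world"},
    {"platform": "discord", "prompt": "hello world "},
    {"platform": "website", "prompt": "A different prompt"},
]
unique_rows = dedupe_prompts(sample_rows)
print(len(sample_rows), "->", len(unique_rows))  # prints: 3 -> 2
```

Iterating over a `datasets.Dataset` yields one dict per row, so `dedupe_prompts(dataset)` should also work on the object returned by `load_dataset`. Whether case-insensitive matching is appropriate depends on the downstream task; exact string matching is the more conservative alternative.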