Latent Jailbreak Prompt Dataset

Source: arXiv | Updated 2023-08-28 | Indexed 2024-06-21

Download link:

https://github.com/qiuhuachuan/latent-jailbreak

Overview:

This study introduces the Latent Jailbreak Prompt Dataset, created jointly by Zhejiang University and the School of Engineering at Westlake University. The dataset contains 416 prompts, each embedding a latent malicious instruction, and is intended for studying both the text safety and the output robustness of large language models (LLMs). A latent jailbreak prompt is built by taking a conventional task, such as translation, and replacing the text to be processed with a malicious instruction. The authors also design a hierarchical annotation framework and use it to analyze how safety and robustness vary with the position of the explicit normal instruction, word replacement, and instruction replacement. The dataset is used mainly to evaluate how safely and robustly LLMs follow instructions when the input touches on sensitive topics.
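The construction described above, a normal task instruction wrapped around an embedded instruction and varied by position and wording, can be sketched roughly as follows. This is only an illustration under assumptions: the template text, the verb list, the `build_prompt` and `build_variants` helpers, and the `<EMBEDDED_INSTRUCTION>` placeholder are all hypothetical and are not taken from the dataset or its repository.

```python
from itertools import product

# Hypothetical placeholder standing in for the instruction embedded as task input.
PLACEHOLDER_INPUT = "<EMBEDDED_INSTRUCTION>"


def build_prompt(instruction: str, task_input: str, position: str) -> str:
    """Place the explicit normal instruction before or after the task input."""
    if position == "prefix":
        return f"{instruction}\n{task_input}"
    if position == "suffix":
        return f"{task_input}\n{instruction}"
    raise ValueError(f"unknown position: {position!r}")


def build_variants(verbs, positions, task_input=PLACEHOLDER_INPUT):
    """Enumerate prompts over verb replacements and instruction positions."""
    prompts = []
    for verb, position in product(verbs, positions):
        # Assumed template; the real dataset's wording may differ.
        instruction = f"{verb} the following sentence."
        prompts.append(build_prompt(instruction, task_input, position))
    return prompts


variants = build_variants(["Translate", "Paraphrase"], ["prefix", "suffix"])
for prompt in variants:
    print(repr(prompt))
```

Enumerating the Cartesian product of the variation axes keeps each prompt traceable to the exact combination that produced it, which is convenient when annotating model outputs per axis.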

Provided by:

Zhejiang University; School of Engineering, Westlake University

Created:

2023-07-17
