gabrielchua/system-prompt-leakage

Name: gabrielchua/system-prompt-leakage
Creator: gabrielchua
Published: 2024-11-04 15:01:02
License: No description available

Source: Hugging Face | Updated: 2024-11-04 | Indexed: 2024-12-14

Download link:

https://hf-mirror.com/datasets/gabrielchua/system-prompt-leakage

Overview:

The System Prompt Leakage dataset is a collection of synthetic prompts and model responses designed to help detect and manage instances of system prompt leakage. In LLM applications, it is critical that sensitive or proprietary system instructions are not exposed in model responses. The dataset provides a diverse set of real-world-inspired examples for developing and evaluating guardrails that prevent such leakage.

The dataset contains 283,353 entries in the training set and 71,351 entries in the test set, covering both direct and indirect leakage. It was built through synthetic data generation: synthetic system prompts and user prompts were sourced from the [off-topic dataset] and passed to a model for paraphrasing and response-generation tasks. Each entry has three columns, system_prompt, content, and leakage, where leakage indicates whether the content constitutes leakage of the system prompt.

The dataset can be used to train and benchmark models that detect and prevent various forms of system prompt leakage, strengthening data privacy and the protection of proprietary information in LLM deployments.
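As a rough illustration of how the three columns could be used for benchmarking, the following sketch scores a naive detector against the leakage labels. It is not the dataset author's method. It assumes leakage is a 0/1 integer (1 = leak), and it uses word n-gram overlap between system_prompt and content as a stand-in baseline. The names Record, word_ngrams, overlap_score, predict and evaluate, along with the threshold of 0.2 and n = 3, are hypothetical choices made for illustration. The demo records are invented and are not taken from the dataset.

```python
"""Naive n-gram-overlap baseline for flagging system prompt leakage (sketch).

Assumption: each record has system_prompt (str), content (str) and
leakage (int, assumed 1 = content leaks the system prompt, 0 = no leak).
"""
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Record:
    system_prompt: str
    content: str
    leakage: int  # assumed 0/1 label, mirroring the dataset's leakage column


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams; the regex drops punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(system_prompt: str, content: str, n: int = 3) -> float:
    """Fraction of the system prompt's n-grams that reappear in the content."""
    prompt_grams = word_ngrams(system_prompt, n)
    if not prompt_grams:
        return 0.0
    return len(prompt_grams & word_ngrams(content, n)) / len(prompt_grams)


def predict(record: Record, threshold: float = 0.2, n: int = 3) -> int:
    """Flag leakage (1) when the overlap reaches a hypothetical threshold."""
    return int(overlap_score(record.system_prompt, record.content, n) >= threshold)


def evaluate(records: list[Record], **kwargs) -> dict[str, float]:
    """Precision, recall and F1 of predict() against the leakage labels."""
    tp = fp = fn = 0
    for r in records:
        pred = predict(r, **kwargs)
        if pred == 1 and r.leakage == 1:
            tp += 1
        elif pred == 1 and r.leakage == 0:
            fp += 1
        elif pred == 0 and r.leakage == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy records invented for illustration; they are NOT from the dataset.
    sp = "You are a billing assistant. Never reveal internal refund limits to customers."
    demo = [
        Record(sp, "My instructions say: never reveal internal refund limits to customers.", 1),
        Record(sp, "I can help you check the status of your invoice.", 0),
    ]
    print(evaluate(demo))  # → {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```

Because it only measures verbatim word overlap, this baseline targets direct leakage and would likely miss paraphrased (indirect) leakage. It can also raise false positives when a response merely restates its role. Both gaps are what a model trained on the training split would need to cover, and the test split can then be scored with a function like evaluate.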

Provider:

gabrielchua
