hackaprompt-dataset

Name: hackaprompt-dataset
Creator: maas
Published: 2025-12-05 16:41:29
License: No description provided

ModelScope Community. Updated 2025-12-05; indexed 2025-07-12.

Download link:

https://modelscope.cn/datasets/hackaprompt/hackaprompt-dataset


Resource summary:

# Dataset Card for HackAPrompt 💻🔍

This dataset contains submissions from a prompt hacking competition. An in-depth analysis of the dataset was accepted at the EMNLP 2023 conference. 📊👾

Submissions were sourced from two environments: a playground for experimentation and an official submissions platform. The playground can be accessed [here](https://huggingface.co/spaces/hackaprompt/playground). 🎮 More details about the competition are available [here](http://paper.hackaprompt.com). 🏆

## Dataset Details 📋

### Dataset Description 📄

We ran a prompt hacking competition in which users competed to "hack" different large language models (LLMs). The competition had several levels of varying difficulty, and each level was evaluated on three LLMs: GPT-3 (`text-davinci-003`), FlanT5-XXL (`philschmid/flan-t5-xxl-sharded-fp16`), and ChatGPT (`gpt-3.5-turbo`).

We anonymously collected user submissions throughout the competition, and users also submitted their best attempts via an online platform for a chance to win. Users submitted their prompts, and our servers evaluated each attempt automatically. Ties were broken by token count, with lower counts earning better scores.

This dataset releases all submissions sent to both the playground and the submission servers. 📤📥

### Columns Description 🧾

- **level**: A numerical value indicating the difficulty level of the challenge.
- **user_input**: The input the participant submitted in response to the challenge.
- **prompt**: The full prompt used to query the model, including the user's input.
- **completion**: The output the model generated from the prompt.
- **model**: The model that generated the completion, for example "gpt-3.5-turbo" or "FlanT5-XXL".
- **expected_completion**: The output the model should have generated for the given user input.
- **token_count**: The number of tokens in the user's input, used as a measure of its length.
- **correct**: A boolean indicating whether the model's completion matched the expected output.
- **error**: A boolean indicating whether an error occurred while the model processed the user input. Note: submissions that triggered errors are not included in this dataset.
- **score**: A numerical score assigned to the model's completion based on its accuracy, correctness, and other evaluation metrics. (Only available for submissions from the submissions platform.)
- **dataset**: A categorical variable indicating the source of the submission: "playground_data" (submissions from the playground) or "submission_data" (official submissions).
- **timestamp**: The date and time the submission was made. (Only available for playground data.)

## Uses 🧑‍🔬

This dataset is meant for research into the different types of attacks on LLMs "in the wild". 📚🔬

### Personal and Sensitive Information 🔒

We did not explicitly release any personal or sensitive information. On the playground, users could submit anonymously, and we did not collect information about them directly. For the submissions data, teams submitted under their names, but that information is not included in this version of the dataset in order to preserve participants' privacy.

## Bias, Risks, and Limitations ⚠️

The data was submitted via a public portal hosted on Hugging Face. We did not curate the data before publishing it, so it may contain offensive material. Please use it at your own risk.

### Recommendations 🚀

Users should be made aware of the risks, biases, and limitations of the dataset and use it at their own risk.
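The columns described in this card support simple aggregate analyses of attack success. The following is a minimal sketch, assuming each row is available as a Python dict keyed by the documented column names (for example, after loading the data with a tool of your choice and iterating over its rows). The helpers `success_rates` and `rank_correct` are hypothetical names for illustration, not part of any released HackAPrompt tooling. `rank_correct` only mirrors the stated tie-break rule (fewer tokens rank higher); it is not the competition's full scoring formula.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple

# Hypothetical helpers: assume each row is a dict keyed by the column
# names documented in this card (level, model, correct, token_count, dataset).
Row = Mapping[str, Any]


def success_rates(
    rows: Iterable[Row], source: Optional[str] = None
) -> Dict[Tuple[int, str], float]:
    """Fraction of rows with correct == True, grouped by (level, model).

    If `source` is given ("playground_data" or "submission_data"),
    only rows whose `dataset` column equals it are counted.
    """
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    hits: Dict[Tuple[int, str], int] = defaultdict(int)
    for row in rows:
        if source is not None and row["dataset"] != source:
            continue
        key = (row["level"], row["model"])
        totals[key] += 1
        if row["correct"]:
            hits[key] += 1
    # Every key in totals has at least one row, so no division by zero.
    return {key: hits[key] / totals[key] for key in totals}


def rank_correct(rows: Iterable[Row], level: int, model: str) -> List[Row]:
    """Correct rows for one level and model, lowest token_count first.

    Illustrates only the tie-break rule (fewer tokens rank higher).
    """
    candidates = [
        row
        for row in rows
        if row["level"] == level and row["model"] == model and row["correct"]
    ]
    return sorted(candidates, key=lambda row: row["token_count"])
```

Because Python's `sorted` is stable, rows with equal `token_count` keep their input order; `success_rates(rows, source="submission_data")` restricts the statistics to official submissions.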
## Citation 📝

Paper: https://arxiv.org/abs/2311.16119

**BibTeX:**

```
@inproceedings{Schulhoff:Pinto:Khan:Bouchard:Si:Boyd-Graber:Anati:Tagliabue:Kost:Carnahan-2023,
  Title = {Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition},
  Author = {Sander V Schulhoff and Jeremy Pinto and Anaum Khan and Louis-François Bouchard and Chenglei Si and Jordan Lee Boyd-Graber and Svetlina Anati and Valen Tagliabue and Anson Liu Kost and Christopher R Carnahan},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = {2023},
  Location = {Singapore}
}
```


Provider:

maas

Created:

2025-07-10
