hmalik03054729558/prompt-injection-dataset

Name: hmalik03054729558/prompt-injection-dataset
Creator: hmalik03054729558
Published: 2026-05-29 11:53:04
License: no description provided

Source: Hugging Face. Updated 2026-05-29; indexed 2026-05-31.

Download link:

https://hf-mirror.com/datasets/hmalik03054729558/prompt-injection-dataset

Description:

A binary classification dataset for detecting prompt injection attacks in user inputs to LLM-based applications. It is designed for training encoder-only models (e.g., BERT, RoBERTa, DistilBERT) to classify a user input as either benign or a prompt injection attempt. There are two classes: label 0 is BENIGN (legitimate user queries) and label 1 is INJECTION (prompt injection attempts). Each record has a text column (the user input string) and a label column (the binary label, 0 or 1).

The injection class covers instruction override, role confusion, prompt extraction, obfuscation, social engineering, technical injection, embedded attacks, hypothetical framing, urgency/emergency scenarios, and context manipulation. The benign class includes hard negatives, such as AI/ML technical questions, prompt engineering discussion, and AI safety research, to reduce false positives.

Dataset characteristics: a roughly balanced training set (1.32:1 benign-to-attack ratio), varied content (the benign class is 30% questions and 70% statements), varied text length (from 20 to 150+ characters), and attack diversity (10+ attack categories). The dataset can be loaded with the Hugging Face datasets library. Evaluation metrics are accuracy, F1-score, precision, recall, and ROC-AUC.

Limitations: English only; text-based attacks only (no multi-modal inputs); emerging attack patterns may not be covered; the validation and test sets are manually curated and smaller than the training set.
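The loading step and the listed metrics can be sketched as follows. This is a minimal illustration, not the dataset author's own code. The repository id is taken from the download link above; the helper names `load_dataset_splits`, `binary_metrics`, and `roc_auc` are hypothetical, and the split names are not stated in this listing, so inspect the returned object before indexing into it. The metric helpers treat INJECTION (label 1) as the positive class, which is an assumption consistent with the label convention described above.

```python
from collections import Counter

# Label convention stated in the dataset description.
BENIGN, INJECTION = 0, 1


def load_dataset_splits(repo_id="hmalik03054729558/prompt-injection-dataset"):
    """Load the dataset with the Hugging Face `datasets` library.

    The import is local so the metric helpers below work even when
    `datasets` is not installed. Split names are not given in the
    listing, so print the returned object to see what is available.
    """
    from datasets import load_dataset  # requires `pip install datasets`

    return load_dataset(repo_id)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, with INJECTION as the positive class."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(INJECTION, INJECTION)]
    fp = counts[(BENIGN, INJECTION)]
    fn = counts[(INJECTION, BENIGN)]
    tn = counts[(BENIGN, BENIGN)]
    total = tp + fp + fn + tn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random injection example
    receives a higher score than a random benign one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == INJECTION]
    neg = [s for y, s in zip(y_true, scores) if y == BENIGN]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions, made up purely for illustration.
    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise ROC-AUC above is quadratic in the number of examples, which is fine for small validation and test sets; for larger evaluations, a library implementation such as scikit-learn's `roc_auc_score` would be the more usual choice.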

Provider:

hmalik03054729558
