
hmalik03054729558/prompt-injection-dataset

Source: Hugging Face. Updated 2026-05-29; indexed 2026-05-31.
Download link:
https://hf-mirror.com/datasets/hmalik03054729558/prompt-injection-dataset
Description:
This is a binary classification dataset for detecting prompt injection attacks in user inputs to LLM-based applications. It is designed for training encoder-only models (e.g., BERT, RoBERTa, DistilBERT) to classify each input as benign or as a prompt injection attempt.

Classes and features: label 0 is BENIGN (legitimate user queries) and label 1 is INJECTION (prompt injection attempts). Each record has a text column (the user input string) and a label column (0 or 1).

Attack coverage: instruction override, role confusion, prompt extraction, obfuscation, social engineering, technical injection, embedded attacks, hypothetical framing, urgency/emergency scenarios, and context manipulation. To reduce false positives, the benign class includes hard negatives such as AI/ML technical questions, prompt engineering topics, and AI safety research.

Characteristics: a near-balanced training set (1.32:1 benign-to-attack ratio), diverse benign content (30% questions, 70% statements), varied text lengths (from 20 to 150+ characters), and attack diversity (10+ attack categories).

Usage and evaluation: the dataset can be loaded with the Hugging Face datasets library. Evaluation metrics include accuracy, F1-score, precision, recall, and ROC-AUC.

Limitations: English only; text-based attacks only (no multi-modal); emerging attack patterns may not be covered; the validation and test sets are manually curated and smaller than the training set.
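The loading and evaluation steps mentioned in the description could look roughly like the sketch below. It is an illustration under assumptions, not the dataset author's code: the helper names (load_splits, binary_metrics, roc_auc) are hypothetical, the split names are not stated on this page and should be checked after loading, and the metrics are written in plain Python only to make the definitions explicit. Label 1 (INJECTION) is treated as the positive class.

```python
from typing import Dict, Sequence


def load_splits(repo_id: str = "hmalik03054729558/prompt-injection-dataset"):
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the metric helpers below work without the library.
    from datasets import load_dataset

    # Returns a DatasetDict; its split names are an assumption to verify.
    return load_dataset(repo_id)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with label 1 (INJECTION) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random injection outscores a random benign input.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Calling load_splits() and printing the result shows which splits are actually published. The metric helpers take the gold labels from the label column and either hard predictions (binary_metrics) or injection-class scores (roc_auc) from whatever classifier is being evaluated.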
Provider:
hmalik03054729558