
neuralchemy/prompt-injection-Threat-Matrix

Source: Hugging Face. Updated: 2026-04-20. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/neuralchemy/prompt-injection-Threat-Matrix
Description:
---
language:
- en
license: cc-by-nc-4.0
task_categories:
- text-classification
tags:
- prompt-injection
- jailbreak
- security
- llm-security
- ai-safety
- threat-intelligence
- adversarial-nlp
size_categories:
- 10K<n<100K
configs:
- config_name: binary
  data_files:
  - split: train
    path: binary/train-*
  - split: validation
    path: binary/validation-*
  - split: test
    path: binary/test-*
- config_name: multiclass
  default: true
  data_files:
  - split: train
    path: multiclass/train-*
  - split: validation
    path: multiclass/validation-*
  - split: test
    path: multiclass/test-*
---

# Neuralchemy Prompt Injection Threat Matrix

A prompt injection and jailbreak detection dataset of 32,320 curated samples across 7 intent classes (benign plus 6 attack intents). Each sample carries a threat intelligence schema: technique classification, severity scoring, attack surface detection, and ambiguity flagging. It is built for training production-grade LLM security classifiers.

---

## Why This Dataset

Most prompt injection datasets offer binary labels only. This dataset provides a full threat matrix: not just "is this malicious?" but "what does the attacker want, how are they doing it, and how dangerous is it?"

This enables:

- Multi-class threat classifiers
- Severity-aware filtering systems
- Technique-specific defenses
- Ambiguity-aware edge case handling

---

## Construction

**Sources:**

- HackaPrompt competition dataset
- Neuralchemy prompt injection dataset v1
- Synthetic threat generation
- Synthetic benign generation

**Labeling pipeline:**

- Qwen 14B running locally for 24 hours
- 7 intent classes labeled
- 10-level severity scoring
- Technique and surface classification
- Ambiguity flagging for borderline cases

**Quality control:**

- 50,000 raw samples generated
- Filtered to 32,320 high-quality samples
- Verified no data leakage between splits
- Clean 80/10/10 train/val/test split
- Duplicate and near-duplicate removal

---

## Configs

| Config | Train | Validation | Test | Purpose |
|--------|------:|-----------:|-----:|---------|
| binary | 25,856 | 3,232 | 3,232 | Benign vs malicious |
| multiclass | 25,856 | 3,232 | 3,232 | 7-way intent classification |

---

## Schema

### Binary config

| Column | Type | Description |
|--------|------|-------------|
| text | string | Input prompt |
| label | int | 0=benign, 1=malicious |
| ambiguity | bool | True if borderline case |

### Multiclass config

| Column | Type | Description |
|--------|------|-------------|
| text | string | Input prompt |
| label | int | 7-way intent class |
| binary_label | int | 0=benign, 1=malicious |
| intent | string | Intent name |
| intent_label | int | Intent class index |
| technique | string | Attack technique name |
| technique_label | int | Technique class index |
| severity | int | 1-10 severity score |
| surface | string | Attack surface |
| surface_label | int | Surface class index |
| source | string | Data source |
| ambiguity | bool | True if borderline |

---

## Label Mapping

### Intent Classes

| Label | Intent | Description |
|-------|--------|-------------|
| 0 | benign | Normal user input |
| 1 | direct_injection | Explicit instruction override |
| 2 | system_extraction | Attempts to leak system prompt |
| 3 | role_hijack | Persona or role manipulation |
| 4 | obfuscation | Encoded or disguised attacks |
| 5 | tool_abuse | Malicious tool/function calls |
| 6 | indirect_injection | Context-based injection |

### Severity Scale

| Score | Level | Description |
|-------|-------|-------------|
| 1-2 | Low | Unlikely to succeed |
| 3-4 | Moderate | Some risk |
| 5-6 | High | Likely effective |
| 7-8 | Severe | Dangerous |
| 9-10 | Critical | High impact |

---

## Quick Start

```python
from datasets import load_dataset

# Binary classification
binary_ds = load_dataset(
    "neuralchemy/prompt-injection-Threat-Matrix",
    "binary"
)

# Multiclass threat classification
multi_ds = load_dataset(
    "neuralchemy/prompt-injection-Threat-Matrix",
    "multiclass"
)

# Example: filter by severity (7 and above is "Severe" or "Critical")
high_severity = [
    x for x in multi_ds["train"]
    if x["severity"] >= 7
]

# Example: filter by technique
encoding_attacks = [
    x for x in multi_ds["train"]
    if x["technique"] == "encoding"
]
```

## Trained Model

A DistilBERT classifier trained on this dataset is available at:

👉 neuralchemy/distilbert-base-threat-matrix

## Related Work

- Neuralchemy Prompt Injection Dataset v1
- AITL: AI In The Loop Framework Paper (zenodo.org/records/19551173)
- OpenClay: Agentic Security Framework (coming soon)

## Citation

If you use this dataset, please cite:

    @dataset{jajoo2026threatmatrix,
      author    = {Sanskar Jajoo},
      title     = {Neuralchemy Prompt Injection Threat Matrix},
      year      = {2026},
      publisher = {Hugging Face},
      url       = {https://huggingface.co/datasets/neuralchemy/prompt-injection-Threat-Matrix}
    }

## License

CC BY-NC 4.0. Free for research use; commercial use requires permission.

Contact via GitHub: github.com/m4vic

---
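
As a convenience when reading multiclass rows, the intent label mapping and severity scale described above can be encoded as small lookup helpers. This is a minimal sketch, not part of the dataset: the names `INTENT_NAMES`, `SEVERITY_BANDS`, `severity_band`, and `describe` are hypothetical, the example row is made up for illustration, and it assumes `intent_label` uses the same indices as the Intent Classes table.

```python
# Intent indices copied from the "Intent Classes" table.
# Assumption: the `intent_label` column uses these same indices.
INTENT_NAMES = {
    0: "benign",
    1: "direct_injection",
    2: "system_extraction",
    3: "role_hijack",
    4: "obfuscation",
    5: "tool_abuse",
    6: "indirect_injection",
}

# Severity bands copied from the "Severity Scale" section: (low, high, level).
SEVERITY_BANDS = [
    (1, 2, "Low"),
    (3, 4, "Moderate"),
    (5, 6, "High"),
    (7, 8, "Severe"),
    (9, 10, "Critical"),
]


def severity_band(score: int) -> str:
    """Map a 1-10 severity score to its named band."""
    for low, high, level in SEVERITY_BANDS:
        if low <= score <= high:
            return level
    raise ValueError(f"severity score out of range 1-10: {score}")


def describe(row: dict) -> str:
    """Summarize a multiclass row as 'intent (severity level)'."""
    intent = INTENT_NAMES[row["intent_label"]]
    return f"{intent} ({severity_band(row['severity'])})"


# Hypothetical row for illustration only; real rows carry more columns.
example_row = {"intent_label": 2, "severity": 7}
print(describe(example_row))  # prints "system_extraction (Severe)"
```

The same `severity_band` helper could be used to bucket rows before training a severity-aware filter, for example by grouping on the returned level rather than on the raw score.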
Provider:
neuralchemy