
AdvRahul/Agentic-Safety

Source: Hugging Face · Updated 2026-04-18 · Indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/AdvRahul/Agentic-Safety
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
- text-classification
language:
- en
tags:
- cybersecurity
- agentic-ai
- security
- llm-security
- owasp
- synthetic-data
size_categories:
- 10K<n<100K
---

# agentic-safety-gguf: Training & Evaluation Datasets

**Model**: [guerilla7/agentic-safety-gguf](https://huggingface.co/guerilla7/agentic-safety-gguf)
**Paper**: https://arxiv.org/abs/2601.00848
**Total**: 80,992 examples (80,851 after deduplication)

## Overview

Complete training and evaluation datasets for agentic-safety-gguf, a specialized Llama 3.1 8B model for agentic AI security analysis. Supports iterative continuation training methodology (V2→V3→V4) for full reproducibility.

### Dataset Files

| File | Examples | Size | Purpose |
|------|----------|------|---------|
| **training_data_v2.jsonl** | 45,825 | 134MB | Base training (18 cybersecurity sources) |
| **training_data_v3_synthetic.jsonl** | 80,851 | 212MB | V2 training: Base + 35,026 synthetic traces |
| **continuation_v3_owasp.jsonl** | 111 | 101KB | V3 continuation: OWASP Top 10 + MS Taxonomy |
| **continuation_v4_adversarial.json** | 30 | 22KB | V4 continuation: Adversarial examples |
| **cybersecurity_questions.jsonl** | 75 | 21KB | Custom MCQA evaluation |
| **benign_traces.json** | 15 | 8.9KB | Legitimate workflows (testing) |
| **malicious_traces.json** | 15 | 8.9KB | Attack traces (testing) |
| **agentic_security_augmentation.jsonl** | - | - | Additional augmentation data |

## Training Data Composition

**V2 Base** (80,851 examples from `training_data_v3_synthetic.jsonl`):
- **18 Public Datasets** (45,825 examples): HelpSteer, cybersecurity base datasets, Agent-SafetyBench, UltraFeedback, TruthfulQA, and 13 others
- **Synthetic Traces** (35,026 examples): OpenTelemetry traces generated via Claude Sonnet 4.5 covering attack patterns (prompt injection, multi-agent attacks, tool manipulation) and benign workflows

**V3 Continuation** (+111 examples from `continuation_v3_owasp.jsonl`):
- OWASP Top 10 for Agentic Applications (2026)
- Microsoft Taxonomy of Failure Modes in Agentic AI Systems
- Targeted knowledge gap closure

**V4 Continuation** (+30 examples from `continuation_v4_adversarial.json`):
- Attack success rate definitions
- Multi-step attack chain analysis
- Adversarial examples targeting remaining weaknesses

See research paper for complete source attribution and deduplication methodology.

## Training Results

| Version | Training Data | MCQA Accuracy | Improvement |
|---------|---------------|---------------|-------------|
| **V2** | 80,851 base | 61.4% | Baseline |
| **V3** | +111 OWASP | 67.1% | +5.7 pts |
| **V4** | +30 adversarial | **74.3%** | +7.2 pts |

**Final Performance**: 74.29% overall (70% agentic, 76% traditional security)
**Base Model Comparison**: +31.43 points improvement over base model

## Quick Start

### Load Datasets

```python
from datasets import load_dataset

# V2 base training
train_v2 = load_dataset("guerilla7/agentic-safety-gguf", data_files="training_data_v3_synthetic.jsonl")

# V3 continuation
continuation_v3 = load_dataset("guerilla7/agentic-safety-gguf", data_files="continuation_v3_owasp.jsonl")

# V4 continuation
continuation_v4 = load_dataset("guerilla7/agentic-safety-gguf", data_files="continuation_v4_adversarial.json")

# Evaluation
mcqa = load_dataset("guerilla7/agentic-safety-gguf", data_files="cybersecurity_questions.jsonl")
benign = load_dataset("guerilla7/agentic-safety-gguf", data_files="benign_traces.json")
malicious = load_dataset("guerilla7/agentic-safety-gguf", data_files="malicious_traces.json")
```

### Reproduce Training Pipeline

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Step 1: Train V2 base model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
dataset_v2 = load_dataset("guerilla7/agentic-safety-gguf", data_files="training_data_v3_synthetic.jsonl")
# Train with QLoRA (see model repo for complete training script)

# Step 2: V3 continuation
dataset_v3 = load_dataset("guerilla7/agentic-safety-gguf", data_files="continuation_v3_owasp.jsonl")
# Continue training from V2 checkpoint (500 steps)

# Step 3: V4 continuation
dataset_v4 = load_dataset("guerilla7/agentic-safety-gguf", data_files="continuation_v4_adversarial.json")
# Continue training from V3 checkpoint (500 steps)
```

## Data Format

### Training Examples (JSONL)

```json
{
  "instruction": "Analyze this agentic workflow for security vulnerabilities...",
  "input": "[workflow description or trace data]",
  "output": "This workflow exhibits ASI01 (Prompt Injection)...",
  "source": "Agent-SafetyBench",
  "category": "security_analysis"
}
```

### MCQA Evaluation (JSONL)

```json
{
  "question": "What is the primary risk of ASI01?",
  "A": "Performance degradation",
  "B": "Prompt injection attacks",
  "C": "Data leakage",
  "D": "None",
  "answer": "B"
}
```

### Trace Data (JSON)

```json
{
  "trace_id": "unique_id",
  "spans": [...],
  "classification": "benign" | "malicious",
  "attack_type": "prompt_injection" | null
}
```

## Use Cases

- ✅ **Reproduce Research**: Complete V2/V3/V4 training pipeline
- ✅ **Train Alternative Models**: Llama 3.3, Qwen 2.5, Mistral
- ✅ **Develop Balanced Datasets**: Add benign workflow examples to address 66.7% FPR
- ✅ **Domain-Specific Security**: Fintech, healthcare, government specialization
- ✅ **Benchmark Evaluation**: Compare new security models

## Ethical Considerations

- **Synthetic Attack Patterns**: Research purposes only, not for malicious use
- **High FPR (66.7%)**: Model trained on this data requires human oversight in production
- **Synthetic Data Bias**: 43% synthetic data may not reflect real-world distributions
- **Defensive Research**: Designed for security improvement, not attack development

## Citation

```bibtex
@article{agentic-safety-gguf-2025,
  title={agentic-safety-gguf: Specialized Fine-Tuning for Agentic AI Security},
  year={2025},
  url={https://huggingface.co/datasets/guerilla7/agentic-safety-gguf}
}
```

## Resources

- **Model**: [guerilla7/agentic-safety-gguf](https://huggingface.co/guerilla7/agentic-safety-gguf)
- **Training Scripts**: See model repository for complete QLoRA implementation
- **Research Paper**: https://arxiv.org/abs/2601.00848

## License

Apache 2.0
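As a supplement to the MCQA Evaluation format shown under Data Format, the following is a minimal sketch of how accuracy over such records could be scored. The function name `mcqa_accuracy`, the case-insensitive letter comparison, and the second sample record are assumptions made for illustration; the source does not describe the actual evaluation harness.

```python
import json


def mcqa_accuracy(records, predictions):
    """Return the fraction of records whose predicted letter equals `answer`.

    Hypothetical helper: `records` follow the documented MCQA layout and
    `predictions` holds one answer letter (A-D) per record.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    if not records:
        return 0.0
    correct = sum(
        1
        for rec, pred in zip(records, predictions)
        if pred.strip().upper() == rec["answer"].strip().upper()
    )
    return correct / len(records)


# Two illustrative JSONL lines in the documented format.
# The second record is invented for this sketch, not taken from the dataset.
sample_lines = [
    '{"question": "What is the primary risk of ASI01?", "A": "Performance degradation", '
    '"B": "Prompt injection attacks", "C": "Data leakage", "D": "None", "answer": "B"}',
    '{"question": "Hypothetical second question", "A": "w", "B": "x", "C": "y", '
    '"D": "z", "answer": "D"}',
]
records = [json.loads(line) for line in sample_lines]

# One correct prediction ("b" matches "B") and one wrong ("A" vs "D").
accuracy = mcqa_accuracy(records, ["b", "A"])
print(f"accuracy = {accuracy:.2%}")
```

With the real evaluation file, `records` would instead come from `cybersecurity_questions.jsonl`, and `predictions` from whatever answer-extraction step is applied to the model output.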
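The false positive rate (FPR) cited under Ethical Considerations is, by the standard definition, the share of benign items that a classifier flags as malicious. The sketch below applies that definition to records shaped like the Trace Data format. The helper `false_positive_rate`, the sample trace IDs, and the predictions are all hypothetical; the source does not state how its 66.7% figure was computed.

```python
def false_positive_rate(traces, predictions):
    """Return FP / (FP + TN) over traces labeled "benign".

    Hypothetical helper: `traces` use the documented `classification` field,
    and `predictions` holds "benign" or "malicious" for each trace.
    """
    if len(traces) != len(predictions):
        raise ValueError("traces and predictions must have the same length")
    benign_predictions = [
        pred
        for trace, pred in zip(traces, predictions)
        if trace["classification"] == "benign"
    ]
    if not benign_predictions:
        raise ValueError("at least one benign trace is required")
    false_positives = sum(1 for pred in benign_predictions if pred == "malicious")
    return false_positives / len(benign_predictions)


# Illustrative traces only; IDs and labels are invented for this sketch.
traces = [
    {"trace_id": "t1", "spans": [], "classification": "benign", "attack_type": None},
    {"trace_id": "t2", "spans": [], "classification": "benign", "attack_type": None},
    {"trace_id": "t3", "spans": [], "classification": "benign", "attack_type": None},
    {"trace_id": "t4", "spans": [], "classification": "malicious",
     "attack_type": "prompt_injection"},
]
predictions = ["malicious", "benign", "malicious", "malicious"]

# Two of the three benign traces are flagged, so FPR is 2/3.
fpr = false_positive_rate(traces, predictions)
print(f"FPR = {fpr:.1%}")
```

Malicious traces are ignored here because they cannot contribute false positives; a fuller evaluation would report recall on them alongside the FPR.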

Provider: AdvRahul