AdvRahul/Agentic-Safety
Source: Hugging Face | Updated: 2026-04-18 | Indexed: 2026-04-26
Download link: https://hf-mirror.com/datasets/AdvRahul/Agentic-Safety
Description:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
- text-classification
language:
- en
tags:
- cybersecurity
- agentic-ai
- security
- llm-security
- owasp
- synthetic-data
size_categories:
- 10K<n<100K
---
# agentic-safety-gguf: Training & Evaluation Datasets
**Model**: [guerilla7/agentic-safety-gguf](https://huggingface.co/guerilla7/agentic-safety-gguf)
**Paper**: [arXiv:2601.00848](https://arxiv.org/abs/2601.00848)
**Total**: 80,992 examples (80,851 after deduplication)
## Overview
Complete training and evaluation datasets for agentic-safety-gguf, a specialized Llama 3.1 8B model for agentic AI security analysis. The files support the iterative continuation-training methodology (V2→V3→V4), so each stage can be reproduced in full.
### Dataset Files
| File | Examples | Size | Purpose |
|------|----------|------|---------|
| **training_data_v2.jsonl** | 45,825 | 134MB | Base training (18 cybersecurity sources) |
| **training_data_v3_synthetic.jsonl** | 80,851 | 212MB | V2 training: Base + 35,026 synthetic traces |
| **continuation_v3_owasp.jsonl** | 111 | 101KB | V3 continuation: OWASP Top 10 + MS Taxonomy |
| **continuation_v4_adversarial.json** | 30 | 22KB | V4 continuation: Adversarial examples |
| **cybersecurity_questions.jsonl** | 75 | 21KB | Custom MCQA evaluation |
| **benign_traces.json** | 15 | 8.9KB | Legitimate workflows (testing) |
| **malicious_traces.json** | 15 | 8.9KB | Attack traces (testing) |
| **agentic_security_augmentation.jsonl** | - | - | Additional augmentation data |
## Training Data Composition
**V2 Base** (80,851 examples from `training_data_v3_synthetic.jsonl`):
- **18 Public Datasets** (45,825 examples): HelpSteer, cybersecurity base datasets, Agent-SafetyBench, UltraFeedback, TruthfulQA, and 13 others
- **Synthetic Traces** (35,026 examples): OpenTelemetry traces generated via Claude Sonnet 4.5 covering attack patterns (prompt injection, multi-agent attacks, tool manipulation) and benign workflows
**V3 Continuation** (+111 examples from `continuation_v3_owasp.jsonl`):
- OWASP Top 10 for Agentic Applications (2026)
- Microsoft Taxonomy of Failure Modes in Agentic AI Systems
- Targeted knowledge gap closure
**V4 Continuation** (+30 examples from `continuation_v4_adversarial.json`):
- Attack success rate definitions
- Multi-step attack chain analysis
- Adversarial examples targeting remaining weaknesses
See the research paper for complete source attribution and the deduplication methodology.
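The counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures given in this card; the variable names are illustrative and nothing is read from the dataset itself.

```python
# Counts copied from the dataset card above.
public_examples = 45_825      # 18 public datasets
synthetic_examples = 35_026   # synthetic OpenTelemetry traces
v3_examples = 111             # OWASP / Microsoft taxonomy continuation
v4_examples = 30              # adversarial continuation

# V2 base = public + synthetic; the grand total adds both continuation sets.
v2_total = public_examples + synthetic_examples
grand_total = v2_total + v3_examples + v4_examples
synthetic_share = synthetic_examples / v2_total

print(v2_total)                  # 80851
print(grand_total)               # 80992
print(f"{synthetic_share:.1%}")  # 43.3%
```

The sums match the 80,851 and 80,992 figures quoted at the top of the card, and the synthetic share matches the 43% cited under Ethical Considerations.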
## Training Results
| Version | Training Data | MCQA Accuracy | Improvement |
|---------|---------------|---------------|-------------|
| **V2** | 80,851 base | 61.4% | Baseline |
| **V3** | +111 OWASP | 67.1% | +5.7 pts |
| **V4** | +30 adversarial | **74.3%** | +7.2 pts |
**Final Performance**: 74.29% overall (70% agentic, 76% traditional security)
**Base Model Comparison**: +31.43 points improvement over base model
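The "Improvement" column is the difference in percentage points between consecutive versions. A small sketch of that arithmetic follows. The base-model accuracy is not stated in this card, so the last value is only what the reported +31.43 points would imply if measured against the 74.29% final score.

```python
# MCQA accuracy (%) per version, copied from the results table.
accuracy = {"V2": 61.4, "V3": 67.1, "V4": 74.3}

# Percentage-point gain of each version over the one before it.
versions = list(accuracy)
gains = {
    later: round(accuracy[later] - accuracy[earlier], 1)
    for earlier, later in zip(versions, versions[1:])
}
print(gains)  # {'V3': 5.7, 'V4': 7.2}

# Assumption: the +31.43-point improvement is relative to the 74.29% final score.
implied_base = round(74.29 - 31.43, 2)
print(implied_base)  # 42.86
```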
## Quick Start
### Load Datasets
```python
from datasets import load_dataset

# V2 base training
train_v2 = load_dataset("guerilla7/agentic-safety-gguf",
                        data_files="training_data_v3_synthetic.jsonl")

# V3 continuation
continuation_v3 = load_dataset("guerilla7/agentic-safety-gguf",
                               data_files="continuation_v3_owasp.jsonl")

# V4 continuation
continuation_v4 = load_dataset("guerilla7/agentic-safety-gguf",
                               data_files="continuation_v4_adversarial.json")

# Evaluation
mcqa = load_dataset("guerilla7/agentic-safety-gguf",
                    data_files="cybersecurity_questions.jsonl")
benign = load_dataset("guerilla7/agentic-safety-gguf",
                      data_files="benign_traces.json")
malicious = load_dataset("guerilla7/agentic-safety-gguf",
                         data_files="malicious_traces.json")
```
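If a file has already been downloaded, it can also be inspected without the `datasets` library. The following is a minimal sketch, assuming the `.jsonl` files hold one JSON object per line. `read_jsonl` and the sample file are hypothetical helpers for illustration, not part of the repository.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Return a list of dicts, one per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


# Demo on a throwaway file; a real run would point at a downloaded file
# such as continuation_v3_owasp.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    sample_path.write_text(
        '{"instruction": "a", "output": "b"}\n\n{"instruction": "c", "output": "d"}\n',
        encoding="utf-8",
    )
    sample_records = read_jsonl(sample_path)

print(len(sample_records))
```

The `.json` files (traces and V4 continuation) are presumably single JSON documents rather than line-delimited, in which case `json.load` on the open file would be the counterpart.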
### Reproduce Training Pipeline
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Step 1: Train V2 base model
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
dataset_v2 = load_dataset("guerilla7/agentic-safety-gguf",
                          data_files="training_data_v3_synthetic.jsonl")
# Train with QLoRA (see model repo for complete training script)

# Step 2: V3 continuation
dataset_v3 = load_dataset("guerilla7/agentic-safety-gguf",
                          data_files="continuation_v3_owasp.jsonl")
# Continue training from V2 checkpoint (500 steps)

# Step 3: V4 continuation
dataset_v4 = load_dataset("guerilla7/agentic-safety-gguf",
                          data_files="continuation_v4_adversarial.json")
# Continue training from V3 checkpoint (500 steps)
```
## Data Format
### Training Examples (JSONL)
```json
{
  "instruction": "Analyze this agentic workflow for security vulnerabilities...",
  "input": "[workflow description or trace data]",
  "output": "This workflow exhibits ASI01 (Prompt Injection)...",
  "source": "Agent-SafetyBench",
  "category": "security_analysis"
}
```
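Before fine-tuning, each record has to be flattened into a single text sequence. The sketch below uses an Alpaca-style template purely as an illustration. The template actually used for training is not given in this card (see the model repository), and `format_example` is a hypothetical helper.

```python
def format_example(record):
    """Render one instruction/input/output record as a single training string."""
    sections = [f"### Instruction:\n{record['instruction']}"]
    # Assumption: "input" may be empty or absent, in which case it is skipped.
    if record.get("input"):
        sections.append(f"### Input:\n{record['input']}")
    sections.append(f"### Response:\n{record['output']}")
    return "\n\n".join(sections)


# The example record shown above in this card.
example = {
    "instruction": "Analyze this agentic workflow for security vulnerabilities...",
    "input": "[workflow description or trace data]",
    "output": "This workflow exhibits ASI01 (Prompt Injection)...",
    "source": "Agent-SafetyBench",
    "category": "security_analysis",
}
prompt_text = format_example(example)
print(prompt_text)
```

The `source` and `category` fields are left out of the rendered text here. They are more useful for filtering or stratified sampling than as model input.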
### MCQA Evaluation (JSONL)
```json
{
  "question": "What is the primary risk of ASI01?",
  "A": "Performance degradation",
  "B": "Prompt injection attacks",
  "C": "Data leakage",
  "D": "None",
  "answer": "B"
}
```
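Records in this shape can be scored with a short accuracy routine. The sketch below assumes the model's reply has already been reduced to a single option letter. `format_question` and `mcqa_accuracy` are hypothetical helpers, and the prompt layout is one reasonable choice rather than the evaluation harness used in the paper.

```python
def format_question(item):
    """Render one MCQA record as a prompt ending in 'Answer:'."""
    options = "\n".join(f"{letter}. {item[letter]}" for letter in "ABCD")
    return f"{item['question']}\n{options}\nAnswer:"


def mcqa_accuracy(items, predictions):
    """Fraction of predictions whose letter matches the 'answer' field."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(
        prediction.strip().upper() == item["answer"]
        for item, prediction in zip(items, predictions)
    )
    return correct / len(items)


# The example record shown above in this card.
sample_item = {
    "question": "What is the primary risk of ASI01?",
    "A": "Performance degradation",
    "B": "Prompt injection attacks",
    "C": "Data leakage",
    "D": "None",
    "answer": "B",
}
question_prompt = format_question(sample_item)
# Two hypothetical model replies for the same question: one right, one wrong.
sample_accuracy = mcqa_accuracy([sample_item, sample_item], ["b", "C"])
print(sample_accuracy)  # 0.5
```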
### Trace Data (JSON)
```json
{
  "trace_id": "unique_id",
  "spans": [...],
  "classification": "benign" | "malicious",
  "attack_type": "prompt_injection" | null
}
```
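The schema above is a template rather than literal JSON, so a quick structural check on loaded traces can catch malformed records. This is a minimal sketch. `validate_trace` is a hypothetical helper, and the rule that benign traces carry a null `attack_type` is an assumption that the card does not state explicitly.

```python
ALLOWED_CLASSIFICATIONS = {"benign", "malicious"}
REQUIRED_FIELDS = ("trace_id", "spans", "classification", "attack_type")


def validate_trace(trace):
    """Return a list of problems found in one trace record (empty if none)."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in trace
    ]
    if "spans" in trace and not isinstance(trace["spans"], list):
        problems.append("spans must be a list")
    classification = trace.get("classification")
    if "classification" in trace and classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification: {classification!r}")
    # Assumption (not stated in the card): benign traces have attack_type null.
    if classification == "benign" and trace.get("attack_type") is not None:
        problems.append("benign trace should have attack_type null")
    return problems


# Hypothetical records for illustration only.
benign_trace = {
    "trace_id": "t1", "spans": [], "classification": "benign", "attack_type": None,
}
mislabeled_trace = {
    "trace_id": "t2", "spans": [], "classification": "suspicious", "attack_type": None,
}
print(validate_trace(benign_trace))
print(validate_trace(mislabeled_trace))
```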
## Use Cases
✅ **Reproduce Research**: Complete V2/V3/V4 training pipeline
✅ **Train Alternative Models**: Llama 3.3, Qwen 2.5, Mistral
✅ **Develop Balanced Datasets**: Add benign workflow examples to reduce the 66.7% false positive rate (FPR)
✅ **Domain-Specific Security**: Fintech, healthcare, government specialization
✅ **Benchmark Evaluation**: Compare new security models
## Ethical Considerations
- **Synthetic Attack Patterns**: Research purposes only, not for malicious use
- **High FPR (66.7%)**: Models trained on this data require human oversight in production
- **Synthetic Data Bias**: 43% synthetic data may not reflect real-world distributions
- **Defensive Research**: Designed for security improvement, not attack development
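The false positive rate cited above is the share of benign items that a classifier flags as malicious, FP / (FP + TN). The sketch below shows the calculation on toy labels. The labels and predictions are invented for illustration and are not the evaluation data behind the 66.7% figure.

```python
def false_positive_rate(labels, predictions, positive="malicious"):
    """FP / (FP + TN): fraction of non-positive items predicted as positive."""
    negatives = [
        prediction
        for label, prediction in zip(labels, predictions)
        if label != positive
    ]
    if not negatives:
        raise ValueError("no negative (benign) examples to compute FPR on")
    false_positives = sum(prediction == positive for prediction in negatives)
    return false_positives / len(negatives)


# Hypothetical toy data: 3 benign traces (2 wrongly flagged) and 2 malicious.
labels = ["benign", "benign", "benign", "malicious", "malicious"]
predictions = ["malicious", "malicious", "benign", "malicious", "benign"]
fpr = false_positive_rate(labels, predictions)
print(f"{fpr:.1%}")  # 66.7%
```

Only the benign rows enter the calculation, which is why adding more benign workflow examples, as suggested under Use Cases, is the natural lever for bringing this number down.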
## Citation
```bibtex
@article{agentic-safety-gguf-2025,
  title={agentic-safety-gguf: Specialized Fine-Tuning for Agentic AI Security},
  year={2025},
  url={https://huggingface.co/datasets/guerilla7/agentic-safety-gguf}
}
```
## Resources
- **Model**: [guerilla7/agentic-safety-gguf](https://huggingface.co/guerilla7/agentic-safety-gguf)
- **Training Scripts**: See model repository for complete QLoRA implementation
- **Research Paper**: [arXiv:2601.00848](https://arxiv.org/abs/2601.00848)
## License
Apache 2.0
Provider: AdvRahul



