djoeyc/AgentSec-Toolkit-Bundle
Source: Hugging Face. Updated 2026-03-24. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/djoeyc/AgentSec-Toolkit-Bundle
Description:
# AgentSec Toolkit Datasets
Starter datasets, ready for upload to Hugging Face, for fine-tuning two narrow models: a Prompt Injection Classifier and a Prompt Hardener.
## Files
- `classifier-train.jsonl` — Prompt Injection Classifier training set
- `classifier-eval.jsonl` — Prompt Injection Classifier eval set
- `hardener-train.jsonl` — Prompt Hardener training set
- `hardener-eval.jsonl` — Prompt Hardener eval set
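
Since each file is JSONL (one JSON object per line), it can be read with the Python standard library alone. The following is a minimal sketch: the helper name `read_jsonl` is hypothetical, and nothing is assumed about the fields inside each record because the README does not document them.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one parsed JSON value per non-empty line of a JSONL file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the file and line so a bad record is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
    return records
```

For example, `read_jsonl("classifier-train.jsonl")` would return the training records as a list, assuming the file has been downloaded to the working directory.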
## Tasks
### 1. Prompt Injection Classifier
Input: untrusted text snippet + source type
Output:
- Injection Detected
- Attack Category
- Confidence
- Rationale
- Recommended Handling
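
Because the five output labels are fixed, a model response can be checked mechanically. The sketch below assumes each label appears on its own line as `Label: value`; that layout is an assumption, since the README lists the labels but not the exact serialization. The function name `parse_classifier_output` is hypothetical.

```python
CLASSIFIER_FIELDS = (
    "Injection Detected",
    "Attack Category",
    "Confidence",
    "Rationale",
    "Recommended Handling",
)


def parse_classifier_output(text):
    """Parse 'Label: value' lines into a dict; raise if a label is missing.

    Assumes one label per line (hypothetical layout, not confirmed by the
    dataset README). An optional leading '- ' bullet is ignored.
    """
    parsed = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        label, separator, value = line.partition(":")
        label = label.strip().lstrip("- ").strip()
        if separator and label in CLASSIFIER_FIELDS:
            parsed[label] = value.strip()
    missing = [name for name in CLASSIFIER_FIELDS if name not in parsed]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return parsed
```

A response that omits any of the five labels raises `ValueError`, which makes malformed generations easy to count during evaluation.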
### 2. Prompt Hardener
Input: insecure agent prompt + tools + scenario
Output:
- Risks Found
- Hardened Prompt
- Hardening Changes
- Remaining Risks
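
The four section names can likewise be used to verify that a hardener response is complete and correctly ordered. This sketch assumes each section name sits on its own line, optionally decorated with `#`, `-`, or a trailing colon; the real formatting in the dataset is not documented in the README, and the function names are hypothetical.

```python
HARDENER_SECTIONS = (
    "Risks Found",
    "Hardened Prompt",
    "Hardening Changes",
    "Remaining Risks",
)


def section_headings(text):
    """Return the known section names found as standalone lines, in order."""
    found = []
    for line in text.splitlines():
        # Drop heading/bullet markers and a trailing colon before comparing.
        candidate = line.strip().lstrip("#- ").rstrip(": ").strip()
        if candidate in HARDENER_SECTIONS:
            found.append(candidate)
    return found


def has_expected_sections(text):
    """True if every section appears exactly once and in the listed order."""
    return section_headings(text) == list(HARDENER_SECTIONS)
```

Matching whole lines rather than substrings avoids false positives when a section body happens to mention another section's name in running text.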
## Notes
- Files are JSONL and suitable for Hugging Face dataset upload.
- Keep eval sets separate from training.
- Labels and section names are intentionally rigid to improve consistency during fine-tuning.
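
One way to confirm that an eval set stays separate from its training set is to look for records that appear in both. The sketch below compares whole records by their canonical JSON form; the names `canonical` and `overlapping_records` are hypothetical, and the optional `key` argument exists because the README does not say which field holds the input text.

```python
import json


def canonical(record):
    """Serialize a record deterministically so key order does not matter."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


def overlapping_records(train_records, eval_records, key=canonical):
    """Return the eval records whose key also occurs in the training set.

    Pass a custom `key` (for example, one that extracts only the input
    field) to detect overlap on part of a record instead of the whole.
    """
    train_keys = {key(record) for record in train_records}
    return [record for record in eval_records if key(record) in train_keys]
```

An empty result means no eval record duplicates a training record under the chosen key. Near-duplicates, such as paraphrases, would need a fuzzier comparison than this exact match.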
Provider:
djoeyc



