hermes-agent-reasoning-traces
收藏魔搭社区2026-05-02 更新2026-05-03 收录
下载链接:
https://modelscope.cn/datasets/lambda-ai/hermes-agent-reasoning-traces
下载链接
链接失效反馈官方服务:
资源简介:
# Hermes Agent Reasoning Traces
Multi-turn tool-calling trajectories for training AI agents using the [Hermes Agent](https://github.com/nousresearch/hermes-agent) harness. Each sample is a real agent conversation with step-by-step reasoning (`<think>` blocks) and actual tool execution results.
This dataset has two configs, one per source model:
| Config | Model | Samples |
|--------|-------|---------|
| **kimi** | Moonshot AI Kimi-K2.5 | 7,646 |
| **glm-5.1** | ZhipuAI GLM-5.1-FP8 | 7,055 |
## Loading
```python
from datasets import load_dataset
# Kimi-K2.5 traces
ds = load_dataset("lambda/hermes-agent-reasoning-traces", "kimi", split="train")
# GLM-5.1 traces
ds = load_dataset("lambda/hermes-agent-reasoning-traces", "glm-5.1", split="train")
```
## Schema
Both configs share the same schema:
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | UUID identifier |
| `conversations` | list | Multi-turn dialogue (system, human, gpt, tool messages) |
| `tools` | string | JSON tool definitions available to the agent |
| `category` | string | High-level task category |
| `subcategory` | string | Fine-grained task type |
| `task` | string | Task description (from user prompt) |
Conversation messages use ShareGPT format:
```json
{"from": "system|human|gpt|tool", "value": "..."}
```
- `<think>` blocks contain chain-of-thought reasoning
- `<tool_call>` blocks contain function invocations
- `<tool_response>` blocks contain real execution results
## Statistics
| Metric | kimi | glm-5.1 |
|--------|------|---------|
| Samples | 7,646 | 7,055 |
| Total turns | 185,798 | 134,918 |
| Total tool calls | 106,222 | 68,328 |
| Avg turns per sample | 24.3 | 19.1 |
| Avg tool calls per sample | 13.9 | 9.7 |
| Avg `<think>` depth (words) | 414 | 70 |
## Categories
Both configs use a shared 9-category taxonomy:
| Category | kimi | glm-5.1 |
|----------|-----:|--------:|
| Terminal & Coding | 2,010 | 2,237 |
| Agent Tools | 1,474 | 2,775 |
| Repository Tasks | 1,109 | 1,022 |
| Browser Automation | 1,048 | 639 |
| Multi-Tool | 807 | 52 |
| File Operations | 757 | 134 |
| Scheduling | 204 | 104 |
| Planning & Organization | 201 | 92 |
| Conversational | 36 | 0 |
## Generation Details
### Kimi-K2.5
- **Model:** `moonshotai/Kimi-K2.5` (MoE)
- **Inference:** vLLM with `--tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --enable-auto-tool-choice`
### GLM-5.1
- **Model:** `zai-org/GLM-5.1-FP8`
- **Inference:** vLLM with `--tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice`
- **Serving:** 3x 8xH100 nodes via load-balanced gateway
- **Context:** 202,752 tokens max, MTP speculative decoding
Both datasets were generated using the [hermes-agent-generator](https://github.com/nousresearch/hermes-agent) pipeline with **real tool execution** (terminal commands, file operations, browser actions) — not synthetic outputs.
## Data Sources
Both datasets include trajectories across the same task categories:
- **Terminal & Coding** — script writing, debugging, environment setup, data processing, testing, documentation
- **Browser Automation** — Playwright-based navigation, scraping, form filling, screenshot analysis
- **Agent Tools** — Hermes-specific capabilities: memory persistence, task delegation, skill management, todo planning, code execution, session recall
- **Repository Tasks** — real codebase work across GitHub repos: bug fixes, feature implementation, test writing, code review, refactoring
## License
Apache 2.0
# Hermes智能体推理轨迹数据集
多轮工具调用轨迹数据集,用于基于[Hermes智能体(Hermes Agent)](https://github.com/nousresearch/hermes-agent)框架训练AI智能体。每条样本均为真实智能体对话,包含逐步推理过程(`<think>`模块)与实际工具执行结果。
本数据集包含两种配置,分别对应不同的源模型:
| 配置名称 | 模型 | 样本数量 |
|--------|-------|---------|
| **kimi** | 月之暗面Kimi-K2.5 | 7,646 |
| **glm-5.1** | 智谱AI GLM-5.1-FP8 | 7,055 |
## 加载方式
python
from datasets import load_dataset
# 加载Kimi-K2.5轨迹数据集
ds = load_dataset("lambda/hermes-agent-reasoning-traces", "kimi", split="train")
# 加载GLM-5.1轨迹数据集
ds = load_dataset("lambda/hermes-agent-reasoning-traces", "glm-5.1", split="train")
## 数据结构
两种配置共享同一数据结构(Schema):
| 字段名 | 数据类型 | 字段说明 |
|-------|------|-------------|
| `id` | 字符串 | UUID格式的唯一标识符 |
| `conversations` | 列表 | 多轮对话(包含系统、用户、智能体、工具消息) |
| `tools` | 字符串 | 智能体可用的JSON格式工具定义 |
| `category` | 字符串 | 高层级任务类别 |
| `subcategory` | 字符串 | 细粒度任务类型 |
| `task` | 字符串 | 任务描述(源自用户提示词) |
对话消息采用ShareGPT格式:
json
{"from": "system|human|gpt|tool", "value": "..."}
- `<think>` 模块包含思维链推理内容
- `<tool_call>` 模块包含函数调用请求
- `<tool_response>` 模块包含实际工具执行返回结果
## 统计指标
| 统计指标 | kimi配置 | glm-5.1配置 |
|--------|------|---------|
| 样本数量 | 7,646 | 7,055 |
| 总对话轮次 | 185,798 | 134,918 |
| 总工具调用次数 | 106,222 | 68,328 |
| 单样本平均对话轮次 | 24.3 | 19.1 |
| 单样本平均工具调用次数 | 13.9 | 9.7 |
| `<think>`模块平均推理深度(单词数) | 414 | 70 |
## 任务分类
两种配置采用统一的9类任务分类体系:
| 任务类别 | kimi配置 | glm-5.1配置 |
|----------|-----:|--------:|
| 终端与编码 | 2,010 | 2,237 |
| 智能体工具 | 1,474 | 2,775 |
| 代码仓库任务 | 1,109 | 1,022 |
| 浏览器自动化 | 1,048 | 639 |
| 多工具协同 | 807 | 52 |
| 文件操作 | 757 | 134 |
| 调度任务 | 204 | 104 |
| 规划与组织 | 201 | 92 |
| 对话交互 | 36 | 0 |
## 生成细节
### Kimi-K2.5
- **模型:** `moonshotai/Kimi-K2.5`(混合专家模型MoE)
- **推理部署:** 采用vLLM框架,配置参数为`--tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --enable-auto-tool-choice`
### GLM-5.1
- **模型:** `zai-org/GLM-5.1-FP8`
- **推理部署:** 采用vLLM框架,配置参数为`--tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice`
- **服务架构:** 通过负载均衡网关部署于3台8卡H100节点集群
- **上下文窗口:** 最大202,752个Token,支持MTP推测性解码
两个数据集均通过[hermes-agent-generator](https://github.com/nousresearch/hermes-agent)流水线生成,且采用**真实工具执行**(终端命令、文件操作、浏览器动作)而非合成输出。
## 数据来源
两个数据集覆盖了以下同类任务类别:
- **终端与编码任务** — 脚本编写、调试、环境搭建、数据处理、测试、文档撰写
- **浏览器自动化任务** — 基于Playwright的导航、爬取、表单填写、截图分析
- **智能体工具任务** — Hermes专属能力:记忆持久化、任务委派、技能管理、待办规划、代码执行、会话回溯
- **代码仓库任务** — 针对GitHub代码库的实际开发工作:缺陷修复、功能实现、测试用例编写、代码评审、代码重构
## 许可证
Apache 2.0
提供机构:
maas
创建时间:
2026-04-07



