hermes-agent-reasoning-traces

Name: hermes-agent-reasoning-traces
Creator: maas
Published: 2026-05-02 00:33:30
License: 暂无描述

魔搭社区2026-05-02 更新2026-05-03 收录

下载链接：

https://modelscope.cn/datasets/lambda-ai/hermes-agent-reasoning-traces

下载链接

链接失效反馈

官方服务：

资源简介：

# Hermes Agent Reasoning Traces Multi-turn tool-calling trajectories for training AI agents using the [Hermes Agent](https://github.com/nousresearch/hermes-agent) harness. Each sample is a real agent conversation with step-by-step reasoning (`<think>` blocks) and actual tool execution results. This dataset has two configs, one per source model: | Config | Model | Samples | |--------|-------|---------| | **kimi** | Moonshot AI Kimi-K2.5 | 7,646 | | **glm-5.1** | ZhipuAI GLM-5.1-FP8 | 7,055 | ## Loading ```python from datasets import load_dataset # Kimi-K2.5 traces ds = load_dataset("lambda/hermes-agent-reasoning-traces", "kimi", split="train") # GLM-5.1 traces ds = load_dataset("lambda/hermes-agent-reasoning-traces", "glm-5.1", split="train") ``` ## Schema Both configs share the same schema: | Field | Type | Description | |-------|------|-------------| | `id` | string | UUID identifier | | `conversations` | list | Multi-turn dialogue (system, human, gpt, tool messages) | | `tools` | string | JSON tool definitions available to the agent | | `category` | string | High-level task category | | `subcategory` | string | Fine-grained task type | | `task` | string | Task description (from user prompt) | Conversation messages use ShareGPT format: ```json {"from": "system|human|gpt|tool", "value": "..."} ``` - `<think>` blocks contain chain-of-thought reasoning - `<tool_call>` blocks contain function invocations - `<tool_response>` blocks contain real execution results ## Statistics | Metric | kimi | glm-5.1 | |--------|------|---------| | Samples | 7,646 | 7,055 | | Total turns | 185,798 | 134,918 | | Total tool calls | 106,222 | 68,328 | | Avg turns per sample | 24.3 | 19.1 | | Avg tool calls per sample | 13.9 | 9.7 | | Avg `<think>` depth (words) | 414 | 70 | ## Categories Both configs use a shared 9-category taxonomy: | Category | kimi | glm-5.1 | |----------|-----:|--------:| | Terminal & Coding | 2,010 | 2,237 | | Agent Tools | 1,474 | 2,775 | | Repository Tasks | 1,109 | 1,022 | | Browser Automation | 1,048 | 639 | | Multi-Tool | 807 | 52 | | File Operations | 757 | 134 | | Scheduling | 204 | 104 | | Planning & Organization | 201 | 92 | | Conversational | 36 | 0 | ## Generation Details ### Kimi-K2.5 - **Model:** `moonshotai/Kimi-K2.5` (MoE) - **Inference:** vLLM with `--tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --enable-auto-tool-choice` ### GLM-5.1 - **Model:** `zai-org/GLM-5.1-FP8` - **Inference:** vLLM with `--tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice` - **Serving:** 3x 8xH100 nodes via load-balanced gateway - **Context:** 202,752 tokens max, MTP speculative decoding Both datasets were generated using the [hermes-agent-generator](https://github.com/nousresearch/hermes-agent) pipeline with **real tool execution** (terminal commands, file operations, browser actions) — not synthetic outputs. ## Data Sources Both datasets include trajectories across the same task categories: - **Terminal & Coding** — script writing, debugging, environment setup, data processing, testing, documentation - **Browser Automation** — Playwright-based navigation, scraping, form filling, screenshot analysis - **Agent Tools** — Hermes-specific capabilities: memory persistence, task delegation, skill management, todo planning, code execution, session recall - **Repository Tasks** — real codebase work across GitHub repos: bug fixes, feature implementation, test writing, code review, refactoring ## License Apache 2.0

# Hermes智能体推理轨迹数据集多轮工具调用轨迹数据集，用于基于[Hermes智能体（Hermes Agent）](https://github.com/nousresearch/hermes-agent)框架训练AI智能体。每条样本均为真实智能体对话，包含逐步推理过程（`<think>`模块）与实际工具执行结果。本数据集包含两种配置，分别对应不同的源模型： | 配置名称 | 模型 | 样本数量 | |--------|-------|---------| | **kimi** | 月之暗面Kimi-K2.5 | 7,646 | | **glm-5.1** | 智谱AI GLM-5.1-FP8 | 7,055 | ## 加载方式 python from datasets import load_dataset # 加载Kimi-K2.5轨迹数据集 ds = load_dataset("lambda/hermes-agent-reasoning-traces", "kimi", split="train") # 加载GLM-5.1轨迹数据集 ds = load_dataset("lambda/hermes-agent-reasoning-traces", "glm-5.1", split="train") ## 数据结构两种配置共享同一数据结构（Schema）： | 字段名 | 数据类型 | 字段说明 | |-------|------|-------------| | `id` | 字符串 | UUID格式的唯一标识符 | | `conversations` | 列表 | 多轮对话（包含系统、用户、智能体、工具消息） | | `tools` | 字符串 | 智能体可用的JSON格式工具定义 | | `category` | 字符串 | 高层级任务类别 | | `subcategory` | 字符串 | 细粒度任务类型 | | `task` | 字符串 | 任务描述（源自用户提示词） | 对话消息采用ShareGPT格式： json {"from": "system|human|gpt|tool", "value": "..."} - `<think>` 模块包含思维链推理内容 - `<tool_call>` 模块包含函数调用请求 - `<tool_response>` 模块包含实际工具执行返回结果 ## 统计指标 | 统计指标 | kimi配置 | glm-5.1配置 | |--------|------|---------| | 样本数量 | 7,646 | 7,055 | | 总对话轮次 | 185,798 | 134,918 | | 总工具调用次数 | 106,222 | 68,328 | | 单样本平均对话轮次 | 24.3 | 19.1 | | 单样本平均工具调用次数 | 13.9 | 9.7 | | `<think>`模块平均推理深度（单词数） | 414 | 70 | ## 任务分类两种配置采用统一的9类任务分类体系： | 任务类别 | kimi配置 | glm-5.1配置 | |----------|-----:|--------:| | 终端与编码 | 2,010 | 2,237 | | 智能体工具 | 1,474 | 2,775 | | 代码仓库任务 | 1,109 | 1,022 | | 浏览器自动化 | 1,048 | 639 | | 多工具协同 | 807 | 52 | | 文件操作 | 757 | 134 | | 调度任务 | 204 | 104 | | 规划与组织 | 201 | 92 | | 对话交互 | 36 | 0 | ## 生成细节 ### Kimi-K2.5 - **模型：** `moonshotai/Kimi-K2.5`（混合专家模型MoE） - **推理部署：** 采用vLLM框架，配置参数为`--tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --enable-auto-tool-choice` ### GLM-5.1 - **模型：** `zai-org/GLM-5.1-FP8` - **推理部署：** 采用vLLM框架，配置参数为`--tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice` - **服务架构：** 通过负载均衡网关部署于3台8卡H100节点集群 - **上下文窗口：** 最大202,752个Token，支持MTP推测性解码两个数据集均通过[hermes-agent-generator](https://github.com/nousresearch/hermes-agent)流水线生成，且采用**真实工具执行**（终端命令、文件操作、浏览器动作）而非合成输出。 ## 数据来源两个数据集覆盖了以下同类任务类别： - **终端与编码任务** — 脚本编写、调试、环境搭建、数据处理、测试、文档撰写 - **浏览器自动化任务** — 基于Playwright的导航、爬取、表单填写、截图分析 - **智能体工具任务** — Hermes专属能力：记忆持久化、任务委派、技能管理、待办规划、代码执行、会话回溯 - **代码仓库任务** — 针对GitHub代码库的实际开发工作：缺陷修复、功能实现、测试用例编写、代码评审、代码重构 ## 许可证 Apache 2.0

提供机构：

maas

创建时间：

2026-04-07

搜集汇总

数据集介绍