sroecker/hermes-agent-traces-chatml
收藏Hugging Face2026-04-21 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/sroecker/hermes-agent-traces-chatml
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: messages
list:
- name: role
dtype: string
- name: content
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 940104210
num_examples: 18487
download_size: 939906839
dataset_size: 940104210
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- tool-calling
- function-calling
- agent
- hermes
- reasoning
- chatml
- sft
size_categories:
- 10K<n<100K
---
# Hermes Agent Traces — ChatML Format
A ready-to-train dataset of **18,487 multi-turn tool-calling conversations** in ChatML `messages` format, combining Hermes Agent reasoning traces with NousResearch function-calling data.
Built for SFT training of tool-calling / agentic LLMs with [TRL's SFTTrainer](https://huggingface.co/docs/trl/sft_trainer).
## Quick Start
```python
from datasets import load_dataset
from trl import SFTTrainer
dataset = load_dataset("sroecker/hermes-agent-traces-chatml", split="train")
trainer = SFTTrainer(
model="Qwen/Qwen3-0.6B",
train_dataset=dataset,
)
trainer.train()
```
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `messages` | `list[{role, content}]` | Multi-turn conversation in ChatML format |
| `source` | `string` | Origin dataset: `"hermes-traces"` or `"nous-fc"` |
Message roles: `system`, `user`, `assistant`, `tool`
## Source Datasets
| Source | Config | Samples | Description |
|--------|--------|---------|-------------|
| [lambda/hermes-agent-reasoning-traces](https://huggingface.co/datasets/lambda/hermes-agent-reasoning-traces) | `kimi` | 7,646 | Multi-turn agentic traces from Kimi-K2.5, avg 24.3 turns, 13.9 tool calls per sample |
| [lambda/hermes-agent-reasoning-traces](https://huggingface.co/datasets/lambda/hermes-agent-reasoning-traces) | `glm-5.1` | 7,055 | Multi-turn agentic traces from GLM-5.1, avg 19.1 turns, 9.7 tool calls per sample |
| [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | `func_calling_singleturn` | 1,893 | Single-turn function calling across diverse domains |
| [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | `func_calling` | 1,893 | Multi-turn function calling conversations |
## Processing Steps
The dataset was created by the following pipeline:
### 1. Format conversion (ShareGPT → ChatML)
All source datasets use ShareGPT format (`from`/`value` keys). These were converted to ChatML (`role`/`content`):
| ShareGPT `from` | ChatML `role` |
|-----------------|---------------|
| `system` | `system` |
| `human` | `user` |
| `gpt` | `assistant` |
| `tool` | `tool` |
### 2. System prompt condensation (Hermes traces only)
The original Hermes Agent system prompts are **~25,000 chars / ~6,200 tokens** each because they embed full tool JSON schemas inline. These were replaced with a condensed ~90-token instruction:
```
You are a function calling AI model. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
For each function call return a JSON object with the following schema:
{"name": <function-name>, "arguments": <args-dict>}
Each function call should be enclosed within <tool_call> </tool_call> XML tags.
Function results will be provided within <tool_response> </tool_response> XML tags.
```
**Why?** The original system prompts consumed ~75% of a typical training window (8,192 tokens), leaving almost no room for the actual tool-calling conversation. By condensing the system prompt, the model sees far more of the multi-turn interaction patterns during training. The tool-calling format (`<tool_call>`, `<tool_response>`, `<think>`) is learned from the conversation turns themselves, not from the schema in the system prompt.
### 3. Filtering
Examples were filtered to require:
- At least 3 messages
- At least one `assistant` turn
### 4. Concatenation & shuffling
All four source splits were concatenated and shuffled with `seed=42`.
## Conversation Format
Assistant messages contain inline XML blocks for reasoning and tool use:
```xml
<think>
The user wants me to search for files. Let me use the search tool.
</think>
<tool_call>
{"name": "search_files", "arguments": {"query": "payment processing"}}
</tool_call>
```
Tool responses appear as:
```xml
<tool_response>
{"tool_call_id": "call_123", "name": "search_files", "content": {"results": [...]}}
</tool_response>
```
These special tokens (`<tool_call>`, `</tool_call>`, `<tool_response>`, `</tool_response>`, `<think>`, `</think>`) are natively supported by Qwen3's tokenizer as dedicated token IDs.
## Task Categories
The dataset covers a wide range of agentic tasks:
- **Terminal & Coding** — script writing, debugging, environment setup
- **Agent Tools** — memory persistence, task delegation, skill management, todo planning
- **Repository Tasks** — bug fixes, feature implementation, code review, refactoring
- **Browser Automation** — Playwright-based navigation, scraping, form filling
- **File Operations** — reading, writing, patching files
- **Scheduling & Planning** — task organization, time management
- **IoT & Home Automation** — smart device control (from NousResearch data)
- **Multi-Tool** — complex tasks requiring multiple tool types
## Token Length Distribution
With the condensed system prompts (measured with Qwen3 tokenizer):
| Percentile | Tokens |
|-----------|--------|
| P10 | ~1,200 |
| P25 | ~4,900 |
| P50 (median) | ~17,000 |
| P75 | ~49,700 |
| P90 | ~85,400 |
Recommended `max_length` settings:
- `4096`: captures ~21% of examples fully
- `8192`: captures ~31% of examples fully
- `16384`: captures ~49% of examples fully
Longer examples are truncated from the right. With `assistant_only_loss=True`, the truncated system/user prefix tokens don't contribute to loss anyway.
## License
Apache 2.0 (inherited from source datasets)
提供机构:
sroecker



