hermes_reasoning_tool_use
收藏魔搭社区2025-12-05 更新2025-11-15 收录
下载链接:
https://modelscope.cn/datasets/interstellarninja/hermes_reasoning_tool_use
下载链接
链接失效反馈官方服务:
资源简介:
## TL;DR
**51 004 ShareGPT conversations** that teach LLMs *when*, *how* and **whether** to call tools.
Built with the **Nous Research Atropos** RL stack in [Atropos](https://github.com/NousResearch/atropos) using a custom `MultiTurnToolCallingEnv`, and aligned with **BFCL v3** evaluation scenarios.
Released by **@interstellarninja** under **Apache-2.0**.
---
## 1 Dataset Highlights
| Count | Split | Scenarios covered | Size |
|-------:|:------:|-------------------------------------------------|------|
| 51 004 | train | single-turn · multi-turn · multi-step · relevance | 392 MB |
* Each row: OpenAI-style `conversations`, per-episode `tools` schema, scenario label, source tag.
* Stored as ShareGPT conversations format for finetuning for tool-use with libraries such as axolotl.
---
## 2 Scenario Taxonomy (BFCL v3)
| `scenario_category` | Definition (BFCL) | Manifestation here |
|---------------------|-----------------------------------------------------------------|--------------------|
| `single_turn` | 1 user request → **1** valid tool call | Assistant emits exactly one `<tool_call>` block |
| `multi_turn` | Back-and-forth multiple tool calls with user follow-up | Alternating user / assistant turns with at least 2 tool calls |
| `multi_step` | ≥ 2 sequential tool calls after a **single** user turn | No user interruptions between calls |
| `relevance` | No tool suitable → assistant must *refuse* | Ground-truth trace is empty, correct answer is apology / info-request |
---
## 3 Data Preparation Pipeline
| Step | What we did |
|------|-------------|
| **1 · Seed data** | Loaded several open tool-calling corpora (Hermes-Tools, Glaive-FC, ToolAce, Nvidia-When2Call etc.) via 🤗 Datasets. |
| **2 · Scenario routing** | Regex + heuristic checks assigned each conversation to `single_turn`, `multistep`, `multiturn`, or `relevance`. |
| **3 · Environment** | Wrapped each episode in `MultiTurnToolCallingEnv` (sub-class of `BaseEnv`) from the Atropos library. Helpers like `SEQ_TOOL_HELPER`, `APOLOGY_HELPER` and `NARRATION_THINK_HELPER` were injected into the system prompt. |
| **4 · GRPO roll-outs** | Roll-outs with `NousResearch/DeepHermes-3-Llama-3-8B-Preview` for **GRPO** advantage; environment validated `<think>` / `<tool_call>` blocks`. |
| **5 · Reward shaping** | Dense accuracy + sparse bonus (+λ if all calls correct) − 0.2 penalty on first mismatch. Relevance episodes gained extra credit for explicit apologies and clarification requests. |
| **6 · Validation filters** | Functions `_validate_think_plus_calls`, `_validate_think_only`, and `_check_sequential_tools` enforced schema correctness; only roll-outs with ≥ 2 validated calls (or perfect refusals) were kept. |
---
## 4 Intended Uses
* **Supervised fine-tuning** or SFT warmup for **GRPO** for tool-calling models (e.g. Llama-3, Qwen-2).
* Finetuning LLMs for agentic tool-use with various scenarios common in agent applications
* Research on **relevance detection** and **refusal behaviour**.
---
## 5 Loading Example
```python
from datasets import load_dataset
ds = load_dataset(
"interstellarninja/hermes_reasoning_tool_use",
split="train",
streaming=True
)
sample = next(iter(ds))
print(sample["scenario_category"], sample["conversations"][0])
```
# How to cite:
```bibtex
@misc{Hermes_Reasoning_Tool_Use,
title = {Hermes Tool Use Reasoning},
author = {interstellarninja},
year = {2025},
howpublished = {\url{https://huggingface.co/datasets/interstellarninja/hermes_reasoning_tool_use}},
note = {Apache-2.0}
}
```
## TL;DR
**51004条ShareGPT对话**,用于教授大语言模型(Large Language Model,LLM)何时、如何以及是否调用工具。
本数据集基于Nous Research Atropos强化学习(Reinforcement Learning,RL)栈构建,依托[Atropos](https://github.com/NousResearch/atropos)仓库中的自定义`MultiTurnToolCallingEnv`环境,并与BFCL v3评估场景对齐。
由**@interstellarninja**以Apache-2.0开源协议发布。
---
## 1 数据集亮点
| 样本量 | 划分集 | 覆盖场景 | 大小 |
|-------:|:------:|---------|------|
| 51004 | train | 单轮、多轮、多步骤、相关性 | 392 MB |
* 每条数据均包含OpenAI格式的`conversations`(对话)字段、每轮对话的`tools`(工具)模式、场景标签与来源标签。
* 数据集采用ShareGPT对话格式存储,可用于基于axolotl等工具库的工具调用模型微调。
---
## 2 场景分类体系(BFCL v3)
| `scenario_category` | 定义(BFCL) | 本数据集实现形式 |
|---------------------|-------------|--------------------|
| `single_turn` | 1个用户请求 → **1**次有效工具调用 | 助手仅输出单个`<tool_call>`块 |
| `multi_turn` | 伴随用户后续提问的多轮交替工具调用 | 用户与助手交替对话,且至少包含2次工具调用 |
| `multi_step` | 单次用户提问后,≥2次连续工具调用 | 工具调用过程中无用户打断 |
| `relevance` | 无适配工具 → 助手必须拒绝调用 | 真实调用轨迹为空,正确响应为致歉或请求补充信息 |
---
## 3 数据制备流程
| 步骤 | 操作内容 |
|------|-------------|
| **1 · 种子数据** | 通过🤗 Datasets加载多个开源工具调用语料库,包括Hermes-Tools、Glaive-FC、ToolAce、Nvidia-When2Call等。 |
| **2 · 场景路由** | 通过正则表达式与启发式检查,将每条对话划分至`single_turn`、`multistep`、`multiturn`或`relevance`类别。 |
| **3 · 环境封装** | 将每个对话回合封装至Atropos库中的`MultiTurnToolCallingEnv`(`BaseEnv`的子类)中,并向系统提示词中注入`SEQ_TOOL_HELPER`、`APOLOGY_HELPER`与`NARRATION_THINK_HELPER`等辅助工具。 |
| **4 · GRPO采样推演** | 使用`NousResearch/DeepHermes-3-Llama-3-8B-Preview`模型开展GRPO优势采样推演;环境将验证`<think>`与`<tool_call>`块的格式正确性。 |
| **5 · 奖励塑形** | 采用稠密准确率奖励+稀疏bonus奖励(所有调用均正确时加λ),首次匹配错误则扣除0.2分。相关性场景的对话若包含明确致歉或请求澄清的内容,可获得额外加分。 |
| **6 · 验证过滤** | 通过`_validate_think_plus_calls`、`_validate_think_only`与`_check_sequential_tools`函数强制约束格式正确性;仅保留至少包含2次有效验证调用(或完美拒绝调用)的采样结果。 |
---
## 4 预期用途
* 用于工具调用模型的监督微调(Supervised Fine-Tuning,SFT)或GRPO预热训练,例如Llama-3、Qwen-2等模型。
* 针对AI智能体(AI Agent)应用中的常见场景,微调用于智能体工具调用的大语言模型。
* 用于相关性检测与拒绝行为的相关研究。
---
## 5 加载示例
python
from datasets import load_dataset
ds = load_dataset(
"interstellarninja/hermes_reasoning_tool_use",
split="train",
streaming=True
)
sample = next(iter(ds))
print(sample["scenario_category"], sample["conversations"][0])
# 引用格式:
bibtex
@misc{Hermes_Reasoning_Tool_Use,
title = {Hermes Tool Use Reasoning},
author = {interstellarninja},
year = {2025},
howpublished = {url{https://huggingface.co/datasets/interstellarninja/hermes_reasoning_tool_use}},
note = {Apache-2.0}
}
提供机构:
maas
创建时间:
2025-08-18



