
hermes_reasoning_tool_use

ModelScope community · updated 2025-12-05 · indexed 2025-11-15
Download link:
https://modelscope.cn/datasets/interstellarninja/hermes_reasoning_tool_use
Description:
## TL;DR

**51 004 ShareGPT conversations** that teach LLMs *when*, *how* and **whether** to call tools.
Built with the **Nous Research** [Atropos](https://github.com/NousResearch/atropos) RL stack using a custom `MultiTurnToolCallingEnv`, and aligned with **BFCL v3** evaluation scenarios.
Released by **@interstellarninja** under **Apache-2.0**.

---

## 1 Dataset Highlights

| Count | Split | Scenarios covered | Size |
|-------:|:------:|-------------------------------------------------|------|
| 51 004 | train | single-turn · multi-turn · multi-step · relevance | 392 MB |

* Each row contains OpenAI-style `conversations`, a per-episode `tools` schema, a scenario label, and a source tag.
* Stored in ShareGPT conversation format, ready for tool-use fine-tuning with libraries such as axolotl.

---

## 2 Scenario Taxonomy (BFCL v3)

| `scenario_category` | Definition (BFCL) | Manifestation here |
|---------------------|-----------------------------------------------------------------|--------------------|
| `single_turn` | 1 user request → **1** valid tool call | Assistant emits exactly one `<tool_call>` block |
| `multi_turn` | Back-and-forth multiple tool calls with user follow-up | Alternating user / assistant turns with at least 2 tool calls |
| `multi_step` | ≥ 2 sequential tool calls after a **single** user turn | No user interruptions between calls |
| `relevance` | No tool suitable → assistant must *refuse* | Ground-truth trace is empty; correct answer is an apology or information request |

---

## 3 Data Preparation Pipeline

| Step | What we did |
|------|-------------|
| **1 · Seed data** | Loaded several open tool-calling corpora (Hermes-Tools, Glaive-FC, ToolAce, Nvidia-When2Call etc.) via 🤗 Datasets. |
| **2 · Scenario routing** | Regex + heuristic checks assigned each conversation to `single_turn`, `multistep`, `multiturn`, or `relevance`. |
| **3 · Environment** | Wrapped each episode in `MultiTurnToolCallingEnv` (sub-class of `BaseEnv`) from the Atropos library. Helpers like `SEQ_TOOL_HELPER`, `APOLOGY_HELPER` and `NARRATION_THINK_HELPER` were injected into the system prompt. |
| **4 · GRPO roll-outs** | Generated roll-outs with `NousResearch/DeepHermes-3-Llama-3-8B-Preview` for **GRPO** advantage computation; the environment validated `<think>` / `<tool_call>` blocks. |
| **5 · Reward shaping** | Dense accuracy + sparse bonus (+λ if all calls correct) − 0.2 penalty on first mismatch. Relevance episodes gained extra credit for explicit apologies and clarification requests. |
| **6 · Validation filters** | Functions `_validate_think_plus_calls`, `_validate_think_only`, and `_check_sequential_tools` enforced schema correctness; only roll-outs with ≥ 2 validated calls (or perfect refusals) were kept. |

---

## 4 Intended Uses

* **Supervised fine-tuning**, or SFT warm-up before **GRPO**, for tool-calling models (e.g. Llama-3, Qwen-2).
* Fine-tuning LLMs for agentic tool use across scenarios common in agent applications.
* Research on **relevance detection** and **refusal behaviour**.

---

## 5 Loading Example

```python
from datasets import load_dataset

ds = load_dataset(
    "interstellarninja/hermes_reasoning_tool_use",
    split="train",
    streaming=True
)
sample = next(iter(ds))
print(sample["scenario_category"], sample["conversations"][0])
```

## 6 How to Cite

```bibtex
@misc{Hermes_Reasoning_Tool_Use,
  title = {Hermes Tool Use Reasoning},
  author = {interstellarninja},
  year = {2025},
  howpublished = {\url{https://huggingface.co/datasets/interstellarninja/hermes_reasoning_tool_use}},
  note = {Apache-2.0}
}
```
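The reward shaping described in step 5 of the pipeline can be sketched roughly as below. This is an illustration only, not the Atropos implementation: the function name `shaped_reward`, the default `lam=1.0`, exact-equality matching of calls, and the handling of relevance episodes are all assumptions. Only the overall shape (dense accuracy, a sparse bonus λ when every call is correct, and a 0.2 penalty on the first mismatch) comes from the card.

```python
from typing import Any, Dict, List

ToolCall = Dict[str, Any]


def shaped_reward(
    predicted: List[ToolCall],
    expected: List[ToolCall],
    lam: float = 1.0,               # assumed value; the card only names it λ
    mismatch_penalty: float = 0.2,  # penalty stated in the card
) -> float:
    """Hypothetical reward: dense accuracy + sparse bonus - mismatch penalty."""
    if not expected:
        # Relevance episode (empty ground-truth trace). Assumption: a refusal
        # with no tool calls earns the bonus, any call is penalised.
        return lam if not predicted else -mismatch_penalty

    # Dense term: fraction of expected calls matched before the first mismatch.
    matched = 0
    for pred, exp in zip(predicted, expected):
        if pred != exp:
            break
        matched += 1
    dense = matched / len(expected)

    if predicted == expected:
        return dense + lam           # sparse bonus: every call correct
    return dense - mismatch_penalty  # penalty applied once, at first mismatch


if __name__ == "__main__":
    a = {"name": "get_weather", "arguments": {"city": "Paris"}}
    b = {"name": "get_time", "arguments": {"tz": "CET"}}
    print(shaped_reward([a, b], [a, b]))  # all calls correct
    print(shaped_reward([a], [a, b]))     # second call missing
```

Comparing calls by exact dictionary equality is the simplest possible matcher; a real environment may normalise argument order or types before comparing.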

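As a rough sanity check of the scenario taxonomy in Section 2, one can count `<tool_call>` blocks in the assistant turns of a conversation. The sketch below rests on assumptions that the card does not spell out: that each `<tool_call>` block wraps a JSON object, and that turns use the common ShareGPT keys `from` / `value` with `gpt` as the assistant role. The helper names are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Assumed block layout: <tool_call>{...json...}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every <tool_call> payload in `text` that parses as a JSON object."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        if isinstance(parsed, dict):
            calls.append(parsed)
    return calls


def count_assistant_tool_calls(conversations: List[Dict[str, str]]) -> int:
    """Count tool calls over all assistant turns (assumed role name: 'gpt')."""
    return sum(
        len(extract_tool_calls(turn.get("value", "")))
        for turn in conversations
        if turn.get("from") == "gpt"
    )


def matches_scenario(conversations: List[Dict[str, str]], scenario: str) -> bool:
    """Check only the call-count condition from the taxonomy table."""
    n = count_assistant_tool_calls(conversations)
    if scenario == "single_turn":
        return n == 1
    if scenario == "relevance":
        return n == 0
    if scenario in ("multi_turn", "multi_step"):
        return n >= 2
    raise ValueError(f"unknown scenario: {scenario}")
```

This checks call counts only; it does not distinguish `multi_turn` from `multi_step`, which additionally depends on whether user turns occur between calls.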
Provider: maas
Created: 2025-08-18