ToolScale
Source: ModelScope community (魔搭社区). Updated 2026-01-09; indexed 2025-12-06.
Download link: https://modelscope.cn/datasets/nv-community/ToolScale
# ToolScale Dataset
[Paper](https://arxiv.org/abs/2511.21689) | [Code](https://github.com/NVlabs/ToolOrchestra/) | [Model](https://huggingface.co/nvidia/Orchestrator-8B) | [Dataset](https://huggingface.co/datasets/nvidia/ToolScale) | [Project page](https://research.nvidia.com/labs/lpr/ToolOrchestra/)
The **ToolScale dataset** is a key component of the [ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration](https://arxiv.org/abs/2511.21689) project. It provides synthetic environment and tool-call tasks specifically generated to aid the reinforcement learning (RL) training of small orchestrator models. These orchestrators are designed to effectively manage and coordinate diverse intelligent tools and other models for solving complex, multi-turn agentic tasks.
### Description
The `ToolScale` dataset is instrumental in teaching AI agents how to reason, plan, and utilize a heterogeneous set of tools (e.g., web search, code interpreters, specialized LLMs) to achieve user-defined goals. It supports the development of efficient and robust tool-augmented reasoning systems.
### Dataset Structure
The `ToolScale` dataset contains detailed information structured to facilitate training and evaluation of tool-orchestration agents. Key features include:
* `id`: A unique identifier for each sample.
* `description`: Provides context about the task, including its `purpose`.
* `user_scenario`: Details the user's interaction scenario, comprising elements such as `persona`, `task_instructions`, `reason_for_call`, `known_info`, `unknown_info`, and the `domain`.
* `initial_state`: The starting conditions or state for the given task.
* `evaluation_criteria`: Specifies the expected actions and assertions for successful task completion, detailing `actions` with their `arguments`, `name`, and `action_id`, as well as `communicate_info` and `nl_assertions`.
For a full schema of the dataset, please refer to the YAML metadata at the top of this card.
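To make the role of `evaluation_criteria` concrete, the following is a minimal sketch of how the expected `actions` could be compared against the tool calls an agent actually made. It is an illustration only, not the scoring used by the ToolOrchestra project: the helper `missing_actions`, the shape of `executed_calls`, the exact-match comparison, and the sample values are all assumptions, and the real records may nest or encode these fields differently.

```python
from typing import Any


def missing_actions(
    evaluation_criteria: dict[str, Any],
    executed_calls: list[dict[str, Any]],
) -> list[str]:
    """Return the action_id of every expected action with no matching executed call.

    Hypothetical helper: a call matches when both `name` and `arguments`
    are exactly equal to those of the expected action.
    """
    missing = []
    for expected in evaluation_criteria.get("actions") or []:
        matched = any(
            call.get("name") == expected["name"]
            and call.get("arguments") == expected["arguments"]
            for call in executed_calls
        )
        if not matched:
            missing.append(expected["action_id"])
    return missing


# Hand-written fragment that only mirrors the field names listed above
# (assumed values, not taken from the dataset).
criteria = {
    "actions": [
        {"action_id": "a1", "name": "lookup_order", "arguments": {"order_id": "42"}},
        {"action_id": "a2", "name": "cancel_order", "arguments": {"order_id": "42"}},
    ],
    "communicate_info": [],
    "nl_assertions": ["The agent confirms the cancellation."],
}
calls = [{"name": "lookup_order", "arguments": {"order_id": "42"}}]

print(missing_actions(criteria, calls))  # ['a2']
```

Exact matching on arguments is the simplest possible rule; `communicate_info` and `nl_assertions` describe natural-language requirements and would need a different kind of check.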
### Sample Usage
You can load the `ToolScale` dataset using the Hugging Face `datasets` library:
```python
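from datasets import load_dataset
# Download the dataset from the Hugging Face Hub (or reuse the local cache).
dataset = load_dataset("nvidia/ToolScale")
# Show the available splits with their columns and row counts.
print(dataset)
# Inspect the first sample of the "train" split.
print(dataset["train"][0])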
```
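Once the samples are loaded, a common first step is to see how tasks are distributed across domains. The sketch below tallies the `domain` value over any iterable of sample dictionaries. It assumes `domain` sits directly under `user_scenario`; the actual key path in the dataset may differ, so adjust the lookup after inspecting a real sample. The helper name `count_domains` and the toy records are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def count_domains(samples: Iterable[dict[str, Any]]) -> Counter:
    """Count samples per domain, using "unknown" when the key is absent.

    Assumes the layout sample["user_scenario"]["domain"].
    """
    counts: Counter = Counter()
    for sample in samples:
        scenario = sample.get("user_scenario") or {}
        counts[scenario.get("domain", "unknown")] += 1
    return counts


# Toy records that only mirror the field names on this card (assumed values).
toy_samples = [
    {"id": "s1", "user_scenario": {"domain": "retail"}},
    {"id": "s2", "user_scenario": {"domain": "airline"}},
    {"id": "s3", "user_scenario": {"domain": "retail"}},
    {"id": "s4", "user_scenario": {}},
]

domain_counts = count_domains(toy_samples)
print(domain_counts)
```

With the real data, the analogous call would be `count_domains(dataset["train"])`, provided the samples follow the assumed layout.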
### Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this dataset meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
### License/Terms of Use
[NVIDIA License](LICENSE)
### Citation
If you find this dataset useful, please cite our [paper](https://arxiv.org/abs/2511.21689):
```bibtex
@misc{toolorchestra,
title={ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration},
author={Hongjin Su and Shizhe Diao and Ximing Lu and Mingjie Liu and Jiacheng Xu and Xin Dong and Yonggan Fu and Peter Belcak and Hanrong Ye and Hongxu Yin and Yi Dong and Evelina Bakhturina and Tao Yu and Yejin Choi and Jan Kautz and Pavlo Molchanov},
year={2025},
eprint={2511.21689},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.21689},
}
```
Provider: maas
Created: 2025-11-29



