diaforge-utc-r-0725
收藏魔搭社区2025-12-31 更新2025-11-08 收录
下载链接:
https://modelscope.cn/datasets/SAP/diaforge-utc-r-0725
下载链接
链接失效反馈官方服务:
资源简介:
## DiaFORGE UTC: Unified Tool-Calling Conversations Dataset
[]() [](https://arxiv.org/abs/2507.03336)
Dataset for our paper [Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky](https://arxiv.org/abs/2507.03336) which includes 5000 enterprise tools and the corresponding dialogues generated using DiaFORGE UTC data engine.
The dataset is generated with the data generation engine described in Figure 1. The engine simulates a user agent and an assistant agent in a dialogue, where the user agent has a persona and the assistant agent has access to a set of tools. Detailed information about the data generation process can be found in the paper.
> [!WARNING]
> The dataset has suffix "-r" to indicate that the dataset MUST ONLY be used for research purposes. The dataset is not intended for commercial use and should not be used to train models that are intended for commercial use.
<figure>
<div align="center">
<img style="text-align: center;" src="./images/data_generation_engine.png" width="80%" style="align: center;" alt="Tool Disambiguation Turns">
</div>
<div align="center">
<figcaption style="text-align: center;">Figure 1: Data Generation Engine for Disambiguation-Centric Unified Tool-Calling Conversations (UTC-GEN)</figcaption>
</div>
</figure>
### Dataset Structure
Each entry in the dataset contains:
- `seed`: Gold tool for the generated dialogue.
- `user_persona`: Persona of the simulated user agent.
- `messages`: List of messages in the dialogue, where each message is a dictionary containing:
- `role`: Role of the message sender (e.g., `user`, `assistant`).
- `content`: Content of the message.
- `thought`: Thought process of the sender. (only for `assistant` messages)
- `tool_calls`: List of tool calls made in the message, if any. (only for `assistant` messages)
- `distractor_tools`: List of distractor tools that are not the gold tool but are relevant to the dialogue. These tools are used by user agent to generate utterances that are hard to disambiguate from the gold tool.
- `retrieved_tools`: List of tools retrieved by the assistant agent and used by the assistant agent in order to ask clarifying questions to the user agent.
### Usage
You can use the dataset to train and evaluate LLMs for tool-calling tasks. The dialogues are designed to be realistic and challenging, with a focus on disambiguation and user intent understanding.
### About the Enterprise Tools
The enterprise tools in this dataset correspond to wide range of APIs. The name of the tool in the dataset is of the form `fn_<id>_<api_name>`. Note that the dataset contain many tools with the same `api_name` but different `id`. The `id` is used to distinguish between different tools that have the same `api_name`. These different tools (having the same `api_name`) correspond to different API endpoints or different configurations of the same API. Tool descriptions are provided in the dataset to help understand the functionality of the API as well as the endpoint.
### Data Distribution
We present the distribution of the training data available in this dataset. We use the dataset to finetune LLMs for tool-calling tasks. The dataset is designed to be realistic and challenging, with a focus on disambiguation and user intent clarification & understanding.
<figure>
<div align="center">
<img style="text-align: center;" src="./images/data_num_turns.png" width="50%" style="align: center;" alt="Dustribution of Turns in the Dataset">
</div>
<div align="center">
<figcaption style="text-align: center;">Figure 2: Conversation length distribution: number of
dialogue turns per sample</figcaption>
</div>
</figure>
Figure 2 illustrates the distribution of conversation lengths, measured by the number of dialogue turns. The majority of conversations contain fewer than five turns, aligning with typical session lengths observed in real-world enterprise tool-use scenarios.
<figure>
<div align="center">
<img style="text-align: center;" src="./images/data_num_params.png" width="50%" style="align: center;" alt="Dustribution of Turns in the Dataset">
</div>
<div align="center">
<figcaption style="text-align: center;">Figure 3: Parameter count distribution: number of
parameters per seed tool</figcaption>
</div>
</figure>
Figure 3 shows the distribution of the number of parameters associated with the seed tools for which the conversations were generated. We observe that most of the tools has fewer than 5 parameters, with a few tools having more than 10 parameters. This distribution is typical for enterprise tools, where most APIs are designed to be simple and easy to use.
<figure>
<div style="display: flex;">
<img src="./images/data_num_tool_disamb_turns.png" width="49%" alt="Tool Disambiguation Turns">
<img src="./images/data_num_param_disamb_turns.png" width="49%" alt="Prameter Disambiguation Turns">
</div>
<div align="center">
<figcaption style="text-align: center;">Figure 4: Turn distribution for tool disambiguation (left) and parameter specification (right).</figcaption>
</div>
</figure>
Figure 4 depicts the number of dialogue turns dedicated to tool disambiguation and parameter filling. In most cases, tool selection is completed within two turns, followed by a single turn for parameter specification. Notably, some samples contain zero turns for parameter filling: this occurs when the tool either requires no parameters or when parameters are implicitly provided during the tool selection phase, which reflects common patterns observed in real-world multi-turn enterprise interactions.
### Authors
- [Ashutosh Hathidara](https://www.linkedin.com/in/ashutosh1919/)
- [Julien Yu](https://www.linkedin.com/in/yue-yu-415995a6/)
- [Sebastian Schreiber](https://www.linkedin.com/in/sebastian-schreiber-00b091159/)
### Citation
If you use this dataset in your research or want to refer to our work, please cite:
```bibtex
@misc{hathidara2025disambiguationcentricfinetuningmakesenterprise,
title={Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky},
author={Ashutosh Hathidara and Julien Yu and Sebastian Schreiber},
year={2025},
eprint={2507.03336},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2507.03336},
}
```
### License
Please see our [LICENSE](LICENSE) for copyright and license information.
### Copyright Statement
Copyright 2025 SAP SE or an SAP affiliate company and diaforge-utc-r-0725 contributors.
# DiaFORGE UTC:统一工具调用对话数据集(Unified Tool-Calling Conversations Dataset)
[]() [](https://arxiv.org/abs/2507.03336)
本数据集配套论文为《以消歧为核心的微调使企业级工具调用大语言模型(Large Language Model, LLM)更贴合真实场景且风险更低》,其中包含5000个企业级工具及使用DiaFORGE UTC数据引擎生成的对应对话。
本数据集通过图1所述的数据生成引擎构建。该引擎在对话中模拟用户AI智能体(AI Agent)与助手AI智能体:用户智能体拥有预设人设,助手智能体可调用指定工具集。数据生成流程的详细说明请参见原论文。
> [!WARNING]
> 本数据集后缀带有"-r",表示该数据集**仅可用于研究用途**,不得用于商业场景,亦不得用于训练面向商业应用的模型。
<figure>
<div align="center">
<img style="text-align: center;" src="./images/data_generation_engine.png" width="80%" style="align: center;" alt="Tool Disambiguation Turns">
</div>
<div align="center">
<figcaption style="text-align: center;">图1:以消歧为核心的统一工具调用对话数据生成引擎(UTC-GEN)</figcaption>
</div>
</figure>
### 数据集结构
数据集中的每条样本包含以下字段:
- `seed`:生成对应对话的金标准工具。
- `user_persona`:模拟用户智能体的人设信息。
- `messages`:对话消息列表,每条消息为包含以下键值的字典:
- `role`:消息发送者的角色(例如`user`、`assistant`)。
- `content`:消息的具体内容。
- `thought`:发送者的思考过程(仅助手角色消息包含此字段)。
- `tool_calls`:当前消息中发起的工具调用列表(若存在,仅助手角色消息包含此字段)。
- `distractor_tools`:干扰工具列表,即非金标准工具但与对话主题相关的工具。此类工具用于辅助用户智能体生成难以与金标准工具区分的对话表述,提升消歧任务的挑战性。
- `retrieved_tools`:助手智能体检索得到的工具列表,助手将使用这些工具向用户智能体发起澄清询问。
### 使用场景
本数据集可用于训练与评估面向工具调用任务的大语言模型。对话设计贴合真实场景且具备一定挑战性,核心聚焦于消歧任务与用户意图理解。
### 企业级工具说明
本数据集收录的企业级工具对应各类广泛应用的API。数据集中工具的命名格式为`fn_<id>_<api_name>`。请注意,数据集中存在多个`api_name`相同但`id`不同的工具:`id`用于区分同名API下的不同工具,此类工具对应不同的API端点或同一API的不同配置。数据集中附带工具描述,以帮助理解API功能与对应端点信息。
### 数据分布
本数据集提供训练数据的分布统计,我们使用该数据集对面向工具调用任务的大语言模型进行微调。数据集设计贴合真实场景且具备挑战性,核心聚焦于消歧任务与用户意图澄清及理解。
<figure>
<div align="center">
<img style="text-align: center;" src="./images/data_num_turns.png" width="50%" style="align: center;" alt="对话轮次分布">
</div>
<div align="center">
<figcaption style="text-align: center;">图2:对话长度分布:单样本的对话轮次统计</figcaption>
</div>
</figure>
图2展示了以对话轮次为统计维度的对话长度分布。绝大多数对话的轮次少于5轮,这与真实企业级工具使用场景中的典型会话时长相符。
<figure>
<div align="center">
<img style="text-align: center;" src="./images/data_num_params.png" width="50%" style="align: center;" alt="参数数量分布">
</div>
<div align="center">
<figcaption style="text-align: center;">图3:参数数量分布:单金标准工具的参数数量统计</figcaption>
</div>
</figure>
图3展示了生成对话所依赖的金标准工具的参数数量分布。我们观察到,多数工具的参数数量少于5个,仅有少量工具的参数数量超过10个。该分布符合企业级工具的普遍特征:绝大多数API均设计为简洁易用的形态。
<figure>
<div align="center">
<img src="./images/data_num_tool_disamb_turns.png" width="49%" alt="工具消歧轮次分布">
<img src="./images/data_num_param_disamb_turns.png" width="49%" alt="参数指定轮次分布">
</div>
<div align="center">
<figcaption style="text-align: center;">图4:工具消歧(左)与参数指定(右)的轮次分布</figcaption>
</div>
</figure>
图4展示了对话中用于工具消歧与参数补全的轮次占比。多数场景下,工具选择环节可在2轮对话内完成,随后仅需1轮对话完成参数指定。值得注意的是,部分样本的参数补全轮次为0:这要么是因为该工具无需参数,要么是参数已在工具选择阶段隐式提供,这一特征贴合真实多轮企业级交互的常见模式。
### 作者
- [Ashutosh Hathidara](https://www.linkedin.com/in/ashutosh1919/)
- [Julien Yu](https://www.linkedin.com/in/yue-yu-415995a6/)
- [Sebastian Schreiber](https://www.linkedin.com/in/sebastian-schreiber-00b091159/)
### 引用规范
若您在研究中使用本数据集或参考本工作,请引用如下文献:
bibtex
@misc{hathidara2025disambiguationcentricfinetuningmakesenterprise,
title={Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky},
author={Ashutosh Hathidara and Julien Yu and Sebastian Schreiber},
year={2025},
eprint={2507.03336},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2507.03336},
}
### 许可协议
有关版权与许可的详细信息,请参见本仓库中的[LICENSE](LICENSE)文件。
### 版权声明
Copyright 2025 SAP SE或其关联公司及diaforge-utc-r-0725贡献者。
提供机构:
maas
创建时间:
2025-11-05



