Compumacy/toolcall_bench
Hugging Face · Updated 2025-05-11 · Indexed 2025-10-18
Download link:
https://hf-mirror.com/datasets/Compumacy/toolcall_bench
Description:
When2Call is a benchmark for evaluating the tool-calling decisions of large language models (LLMs): when to generate a tool call, when to ask a follow-up question, when to admit that the question can't be answered with the tools provided, and what to do when a question seems to require a tool but no tool call can be made. The dataset also provides a training set for When2Call and uses the benchmark's multiple-choice format to develop a preference optimization training regime, which shows considerable improvement over traditional fine-tuning for tool calling. This dataset is ready for commercial use.
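As a rough illustration of how a multiple-choice item could feed preference optimization, the sketch below turns one item into (chosen, rejected) pairs by pairing the correct option with each incorrect one. The field names (`question`, `answers`, `correct_answer`), the option labels, and the example text are all hypothetical and are not taken from the actual dataset schema; check the dataset files before relying on them.

```python
from typing import Dict, List


def build_preference_pairs(item: Dict) -> List[Dict[str, str]]:
    """Pair the correct option with every incorrect option.

    Assumes a hypothetical item layout:
      item["question"]        -> the user prompt
      item["answers"]         -> dict mapping option label to response text
      item["correct_answer"]  -> label of the preferred option
    """
    chosen = item["answers"][item["correct_answer"]]
    return [
        {"prompt": item["question"], "chosen": chosen, "rejected": text}
        for label, text in item["answers"].items()
        if label != item["correct_answer"]
    ]


# Hypothetical example item: the request lacks a required argument,
# so asking a follow-up question is the preferred behavior.
example_item = {
    "question": "Book me a flight to Paris.",
    "answers": {
        "tool_call": '{"name": "book_flight", "arguments": {"destination": "Paris"}}',
        "follow_up": "Which date would you like to travel?",
        "unable_to_answer": "I can't help with that using the tools I have.",
    },
    "correct_answer": "follow_up",
}

pairs = build_preference_pairs(example_item)
for pair in pairs:
    print(pair["chosen"], "|", pair["rejected"])
```

Each resulting record has the prompt/chosen/rejected shape commonly used by preference optimization trainers, so one multiple-choice item with N options yields N-1 training pairs under this scheme.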
Provided by:
Compumacy



