ToolRet

Source: GitHub | Updated: 2025-03-12 | Indexed: 2025-03-02

Download link:

https://github.com/mangopy/tool-retrieval-benchmark

Resource overview:

ToolRet is the first comprehensive tool retrieval benchmark, designed to systematically evaluate how existing information retrieval models perform on the tool retrieval task. The project also provides a large-scale training dataset for improving information retrieval models on this task.

Created:

2025-02-11

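Summary of Original Information

ToolRet Dataset Overview

Dataset Introduction

ToolRet is a comprehensive benchmark dataset for tool retrieval, intended to systematically evaluate the performance of information retrieval models on this task. It also includes a large-scale training dataset for improving information retrieval models on tool retrieval.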

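Dataset Composition

ToolRet: the evaluation dataset, containing queries, relevant tools, and instructions.
ToolRet-train: the training dataset, containing queries, positive tools, and negative tools.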

Dataset Examples

Evaluation dataset example (JSON):

    {
      "id": "apigen_query_5",
      "query": "Given an initial population of 500 bacteria with a growth rate of 0.3 per minute and a doubling time of 20 minutes, what will be the population after 45 minutes?",
      "labels": [
        {
          "id": "apigen_tool_272",
          "doc": {
            "name": "bacterial_growth",
            "description": "Calculates the bacterial population after a given time based on the initial population and growth rate.",
            "parameters": {
              "initial_population": {
                "description": "The initial bacterial population.",
                "type": "int",
                "default": 20
              },
              "growth_rate": {
                "description": "The growth rate per unit time.",
                "type": "float",
                "default": 20
              },
              "time": {
                "description": "The time elapsed.",
                "type": "float"
              },
              "doubling_time": {
                "description": "The doubling time of the bacteria in minutes. Defaults to 20.",
                "type": "float, optional"
              }
            }
          },
          "relevance": 1
        }
      ],
      "instruction": "Given a bacterial population prediction task, retrieve tools that calculate population growth by processing parameters such as initial population, growth rate, elapsed time, and doubling time to provide the projected population size."
    }

Training dataset example (plain text; the tool descriptions are stored as flat strings, not nested JSON):

    {
      "query": "Is https://www.apple.com available in the Wayback Machine on September 9, 2015?",
      "pos": [
        "{name: availability, description: Checks if a given URL is archived and currently accessible in the Wayback Machine., parameters: {url: {description: The URL to check for availability in the Wayback Machine., type: str, default: http://mashape.com}, timestamp: {description: "The timestamp to look up in Wayback. If not specified, the most recent available capture is returned. The format of the timestamp is 1-14 digits (YYYYMMDDhhmmss). Defaults to 20090101.", type: str, optional, default: 20090101}, callback: {description: An optional callback to produce a JSONP response. Defaults to None., type: str, optional, default: }}}"
      ],
      "neg": [
        "{name: top_grossing_mac_apps, description: Fetches a list of the top-grossing Mac apps from the App Store., parameters: {category: {description: "The category ID for the apps to be fetched. Defaults to 6016 (general category).", type: str, default: 6016}, country: {description: "The country code for the App Store. Defaults to us.", type: str, default: us}, lang: {description: "The language code for the results. Defaults to en.", type: str, default: en}, num: {description: The number of results to return. Defaults to 100. Maximum allowed value is 200., type: int, default: 100}}}"
      ]
    }
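
To make the evaluation record layout above concrete, here is a minimal Python sketch that turns one record into the two things a retrieval evaluation typically needs: a relevance mapping (qrels) and a flat text form of each tool. The helper names `tool_to_text` and `record_to_qrels` are hypothetical and not part of the ToolRet repository, and the record is an abridged copy of the example (only one parameter kept).

```python
import json

# Abridged copy of the evaluation record shown above (one parameter kept).
record = json.loads("""
{
  "id": "apigen_query_5",
  "query": "Given an initial population of 500 bacteria with a growth rate of 0.3 per minute and a doubling time of 20 minutes, what will be the population after 45 minutes?",
  "labels": [
    {
      "id": "apigen_tool_272",
      "doc": {
        "name": "bacterial_growth",
        "description": "Calculates the bacterial population after a given time based on the initial population and growth rate.",
        "parameters": {
          "time": {"description": "The time elapsed.", "type": "float"}
        }
      },
      "relevance": 1
    }
  ]
}
""")


def tool_to_text(doc: dict) -> str:
    """Flatten a tool document into one string a text retriever can index.

    Hypothetical helper: the flattening format is an assumption,
    not the format used by the ToolRet authors.
    """
    params = doc.get("parameters", {})
    param_text = "; ".join(
        f"{name} ({spec.get('type', 'unknown')}): {spec.get('description', '')}"
        for name, spec in params.items()
    )
    return f"{doc['name']}: {doc['description']} Parameters: {param_text}"


def record_to_qrels(rec: dict) -> dict:
    """Map query id -> {tool id: relevance}, the usual qrels layout."""
    return {rec["id"]: {label["id"]: label["relevance"] for label in rec["labels"]}}


qrels = record_to_qrels(record)
tool_text = tool_to_text(record["labels"][0]["doc"])
print(qrels)
print(tool_text)
```

Keeping the qrels separate from the tool text means the same relevance labels can be reused with any retriever, whatever text format it indexes.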

Dataset Release

Evaluation dataset: published on HuggingFace, including the tool set (ToolRet-Tools) and the query set (ToolRet-Queries).
Training dataset: details are given in the README file.

Python Environment Setup

Create the Python environment with conda: conda env create -f requirements.yml

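Model Evaluation

Evaluation configurations are provided for a range of information retrieval models and reranking models.
Example evaluation code is provided in example/embedding.py.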
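
The listing does not say which metrics the benchmark reports, so the following is only a sketch of how a ranked list of tool IDs could be scored against the `labels` of an evaluation record, using two common information retrieval metrics (Recall@k and nDCG@k) as an assumption. The function names and all tool IDs except `apigen_tool_272` are hypothetical, and this is not the code in example/embedding.py.

```python
import math


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of relevant tools that appear in the top-k results."""
    n_relevant = sum(1 for rel in relevant.values() if rel > 0)
    if n_relevant == 0:
        return 0.0
    hits = sum(1 for tool_id in ranked_ids[:k] if relevant.get(tool_id, 0) > 0)
    return hits / n_relevant


def ndcg_at_k(ranked_ids, relevant, k):
    """Normalized discounted cumulative gain over the top-k results."""
    dcg = sum(
        relevant.get(tool_id, 0) / math.log2(rank + 2)
        for rank, tool_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevant.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Relevance labels for one query, in the {tool id: relevance} form.
relevant = {"apigen_tool_272": 1}
# Hypothetical retriever output: the relevant tool is ranked second.
ranked = ["hypothetical_tool_a", "apigen_tool_272", "hypothetical_tool_b"]

print(recall_at_k(ranked, relevant, 1))  # 0.0
print(recall_at_k(ranked, relevant, 3))  # 1.0
print(round(ndcg_at_k(ranked, relevant, 3), 4))  # 0.6309
```

Averaging these per-query scores over all queries gives a single benchmark number per model.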

Dataset Uses

Evaluating and training information retrieval models on the tool retrieval task.

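Collected Summary

Dataset Introduction

Construction

ToolRet was built systematically: the evaluation metrics for the tool retrieval task were defined first, and the training and evaluation datasets were then built on top of a large tool collection. The data contains queries, positive tools, and negative tools, where positive tools are relevant to the query and negative tools are not. This gives information retrieval models ample material for both training and evaluation on tool retrieval.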
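
The query / positive / negative layout described above is the usual input for contrastive retriever training. The listing does not state the training objective, so the sketch below assumes an InfoNCE-style loss purely for illustration; `expand_triples`, `info_nce`, the temperature value, and the similarity scores are hypothetical and not taken from the ToolRet code.

```python
import math
from itertools import product


def expand_triples(record):
    """Pair every positive tool with every negative tool for one query."""
    return [
        (record["query"], pos, neg)
        for pos, neg in product(record["pos"], record["neg"])
    ]


def info_nce(pos_score, neg_scores, temperature=0.05):
    """InfoNCE loss for one query: -log softmax of the positive score.

    Scores are query-tool similarities (for example cosine similarity).
    Uses the log-sum-exp trick for numerical stability.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Abridged training record: tool strings shortened to their names.
train_record = {
    "query": "Is https://www.apple.com available in the Wayback Machine on September 9, 2015?",
    "pos": ["{name: availability}"],
    "neg": ["{name: top_grossing_mac_apps}"],
}

triples = expand_triples(train_record)
good = info_nce(pos_score=0.9, neg_scores=[0.1])  # positive scored higher
bad = info_nce(pos_score=0.1, neg_scores=[0.9])   # negative scored higher
print(len(triples), good < bad)
```

The loss is small when the model scores the positive tool above the negatives and grows when the order is reversed, which is what pushes a retriever toward ranking relevant tools first.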

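Characteristics

ToolRet is the first comprehensive evaluation benchmark for tool retrieval and provides a systematic evaluation standard for the task. The dataset is large and covers diverse queries and tools, so it tests the tool retrieval ability of information retrieval models broadly. Tool descriptions are detailed, including each tool's name, functional description, and parameters, which gives models rich information for both training and evaluation.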

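Usage

Users can obtain ToolRet from the HuggingFace platform, where both the tool set and the query set are available. Using the dataset involves two stages: model training and model evaluation. In the training stage, the queries together with their positive and negative tools are used to train an information retrieval model. In the evaluation stage, the provided evaluation configurations are used to test model performance; they cover a variety of advanced retrieval models, and users can select whichever model suits their needs.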
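
As a rough illustration of the evaluation stage, the sketch below ranks a few tools for one query with a toy bag-of-words cosine retriever. This is a stand-in for a real embedding model, not one of the provided evaluation configurations. The tool texts are taken from the dataset examples, but the function names and the tool IDs other than `apigen_tool_272` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_tools(query, tools, instruction=""):
    """Return tool ids sorted by similarity to the instruction plus query."""
    query_vec = Counter(tokenize(f"{instruction} {query}"))
    scored = [
        (cosine(query_vec, Counter(tokenize(text))), tool_id)
        for tool_id, text in tools.items()
    ]
    # Highest score first; ties broken by tool id for determinism.
    return [tool_id for _, tool_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


tools = {
    "apigen_tool_272": "bacterial_growth: Calculates the bacterial population "
    "after a given time based on the initial population and growth rate.",
    "hypothetical_tool_mac_apps": "top_grossing_mac_apps: Fetches a list of the "
    "top-grossing Mac apps from the App Store.",
    "hypothetical_tool_wayback": "availability: Checks if a given URL is archived "
    "and currently accessible in the Wayback Machine.",
}
query = (
    "Given an initial population of 500 bacteria with a growth rate of 0.3 per "
    "minute and a doubling time of 20 minutes, what will be the population "
    "after 45 minutes?"
)

ranking = rank_tools(query, tools)
print(ranking[0])  # apigen_tool_272
```

Swapping the `cosine` scorer for similarities from a trained embedding model leaves the rest of the loop unchanged, and the resulting ranking can be compared against the `labels` field of each evaluation record.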

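Background and Challenges

Background

The ToolRet dataset was created on December 15, 2024 and contributed by the AutoTools project team. It aims to provide the first comprehensive benchmark for retrieving tools from large tool collections, together with a large-scale training dataset for improving information retrieval models on this task. The motivation is that retrieving useful tools from a large tool collection is an important step in tool learning for large language models. The dataset is therefore significant for advancing research in this area.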

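Current Challenges

On the problem side, the challenge ToolRet addresses is how to retrieve task-relevant tools effectively from a large tool collection. On the construction side, the main challenge is building a large-scale training dataset that covers a broad range of tools and accurately reflects their functions and parameters. For model evaluation, the challenge is designing evaluation protocols and metrics that accommodate varied tool descriptions and query characteristics.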

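Common Scenarios

Classic Use Case

As a comprehensive tool retrieval benchmark, ToolRet is typically used in tool learning for large language models (LLMs), where useful tools must be retrieved from a large tool collection. With the dataset, researchers can systematically evaluate existing information retrieval models on tool retrieval and then improve those models on the task.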

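Derived Work

Work building on ToolRet includes evaluations of existing tool retrieval models, the development of new tool retrieval algorithms, and the optimization of tool-use strategies, all of which have further advanced tool learning and application automation.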

Recent Research on the Dataset