ToolRet
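ToolRet Dataset Overview
Dataset Introduction
ToolRet is a comprehensive benchmark for tool retrieval, designed to systematically evaluate how information retrieval (IR) models perform on tool retrieval tasks. It also includes a large-scale training dataset for improving IR models' tool retrieval capability.
Dataset Composition
- ToolRet: the evaluation dataset, containing queries, relevant tools, and instructions.
- ToolRet-train: the training dataset, containing queries, positive tools, and negative tools.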
Dataset Examples
- Evaluation dataset example (json):

      {
        "id": "apigen_query_5",
        "query": "Given an initial population of 500 bacteria with a growth rate of 0.3 per minute and a doubling time of 20 minutes, what will be the population after 45 minutes?",
        "labels": [
          {
            "id": "apigen_tool_272",
            "doc": {
              "name": "bacterial_growth",
              "description": "Calculates the bacterial population after a given time based on the initial population and growth rate.",
              "parameters": {
                "initial_population": { "description": "The initial bacterial population.", "type": "int", "default": 20 },
                "growth_rate": { "description": "The growth rate per unit time.", "type": "float", "default": 20 },
                "time": { "description": "The time elapsed.", "type": "float" },
                "doubling_time": { "description": "The doubling time of the bacteria in minutes. Defaults to 20.", "type": "float, optional" }
              }
            },
            "relevance": 1
          }
        ],
        "instruction": "Given a `bacterial population prediction` task, retrieve tools that calculate population growth by processing parameters such as initial population, growth rate, elapsed time, and doubling time to provide the projected population size."
      }

- Training dataset example (txt):

      {
        "query": "Is https://www.apple.com available in the Wayback Machine on September 9, 2015?",
        "pos": [
          "{name: availability, description: Checks if a given URL is archived and currently accessible in the Wayback Machine., parameters: {url: {description: The URL to check for availability in the Wayback Machine., type: str, default: http://mashape.com}, timestamp: {description: "The timestamp to look up in Wayback. If not specified, the most recent available capture is returned. The format of the timestamp is 1-14 digits (YYYYMMDDhhmmss). Defaults to 20090101.", type: str, optional, default: 20090101}, callback: {description: An optional callback to produce a JSONP response. Defaults to None., type: str, optional, default: }}}"
        ],
        "neg": [
          "{name: top_grossing_mac_apps, description: Fetches a list of the top-grossing Mac apps from the App Store., parameters: {category: {description: "The category ID for the apps to be fetched. Defaults to 6016 (general category).", type: str, default: 6016}, country: {description: "The country code for the App Store. Defaults to us.", type: str, default: us}, lang: {description: "The language code for the results. Defaults to en.", type: str, default: en}, num: {description: The number of results to return. Defaults to 100. Maximum allowed value is 200., type: int, default: 100}}}"
        ]
      }
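As a rough illustration of how an evaluation record could be prepared for a retriever, the sketch below assumes the field names shown in the evaluation example above (`query`, `instruction`, `labels`, `doc`, `relevance`). The helper names `build_query_text`, `tool_to_text`, and `extract_qrels` are hypothetical and not part of the ToolRet code base, and prepending the instruction to the query is only one plausible way to combine them. The record is an abbreviated copy of the example.

```python
import json


def build_query_text(record, use_instruction=True):
    """Join instruction and query into one retrieval input (an assumed convention)."""
    if use_instruction and record.get("instruction"):
        return f"{record['instruction']} {record['query']}"
    return record["query"]


def tool_to_text(doc):
    """Serialize a tool document (name, description, parameters) to a JSON string."""
    return json.dumps(doc, ensure_ascii=False, sort_keys=True)


def extract_qrels(record):
    """Map each labeled tool id to its relevance grade."""
    return {label["id"]: label["relevance"] for label in record["labels"]}


# Abbreviated copy of the evaluation example; long strings are shortened.
record = {
    "id": "apigen_query_5",
    "query": "What will be the population after 45 minutes?",
    "labels": [
        {
            "id": "apigen_tool_272",
            "doc": {
                "name": "bacterial_growth",
                "description": "Calculates the bacterial population after a given time.",
            },
            "relevance": 1,
        }
    ],
    "instruction": "Given a bacterial population prediction task, retrieve tools that calculate population growth.",
}

query_text = build_query_text(record)
qrels = extract_qrels(record)
tool_text = tool_to_text(record["labels"][0]["doc"])

print(query_text)
print(qrels)
print(tool_text)
```

Keeping the relevance labels in a plain `{tool_id: relevance}` mapping makes it easy to score any ranked list of tool ids later, whichever retriever produced it.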
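Dataset Release
- Evaluation dataset: released on HuggingFace, comprising the tool set (ToolRet-Tools) and the query set (ToolRet-Queries).
- Training dataset: its contents are shown in the README file.
Related Paper
- The paper has been submitted to arXiv; see the paper for details.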
Python Environment Setup
- Create the Python environment with `conda`: `conda env create -f requirements.yml`
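Model Evaluation
- Evaluation configurations are provided for a range of IR models and reranking models.
- Example evaluation code is available in `example/embedding.py`.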
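To make the evaluation step more concrete, here is a minimal sketch of scoring one ranked list of tool ids against the relevance labels of a query. The source does not state which metrics the benchmark reports; Recall@k and nDCG@k are used here as common IR metrics, which is an assumption. The function names and every tool id other than `apigen_tool_272` are hypothetical, and the code assumes each tool id appears at most once in the ranking.

```python
import math


def recall_at_k(ranked_ids, qrels, k):
    """Fraction of relevant tools that appear in the top-k results."""
    relevant = {tool_id for tool_id, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    hits = sum(1 for tool_id in ranked_ids[:k] if tool_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids, qrels, k):
    """nDCG@k with linear gain and a log2 rank discount."""
    dcg = sum(
        qrels.get(tool_id, 0) / math.log2(rank + 2)
        for rank, tool_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


# Relevance labels for one query, in the {tool_id: relevance} form.
qrels = {"apigen_tool_272": 1}

# Hypothetical retriever output: the relevant tool is ranked second.
ranked = ["apigen_tool_9", "apigen_tool_272", "apigen_tool_3"]

print(recall_at_k(ranked, qrels, k=1))
print(recall_at_k(ranked, qrels, k=3))
print(ndcg_at_k(ranked, qrels, k=3))
```

Averaging these per-query values over all queries gives a single score per model, which is how such metrics are usually aggregated.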
Dataset Usage
- Evaluating and training IR models on tool retrieval tasks.
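For the training side, the sketch below shows one way a ToolRet-train style record (`query`, `pos`, `neg`) could feed a contrastive objective. The source does not describe the training objective, so the InfoNCE-style loss, the temperature value, and the similarity scores are all assumptions for illustration; `to_triplets` and `info_nce_loss` are hypothetical helpers, and the tool strings are shortened.

```python
import math


def to_triplets(record):
    """Expand one record into (query, positive, negative) triplets."""
    return [
        (record["query"], pos, neg)
        for pos in record["pos"]
        for neg in record["neg"]
    ]


def info_nce_loss(pos_score, neg_scores, temperature=0.05):
    """Negative log-softmax of the positive score over all candidates."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    return log_denominator - logits[0]


# Shortened training record; the second negative is made up for illustration.
record = {
    "query": "Is https://www.apple.com available in the Wayback Machine on September 9, 2015?",
    "pos": ["{name: availability, description: Checks if a given URL is archived ...}"],
    "neg": [
        "{name: top_grossing_mac_apps, description: Fetches a list of the top-grossing Mac apps ...}",
        "{name: hypothetical_tool, description: Placeholder negative tool.}",
    ],
}

triplets = to_triplets(record)
print(len(triplets))

# Assumed similarity scores that an embedding model might produce.
print(info_nce_loss(pos_score=0.9, neg_scores=[0.1, 0.2]))
```

The loss shrinks as the positive tool scores higher than the negatives, which is the behavior a retriever trained on query, positive, and negative tools is meant to learn.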




