HiTZ/TOOLtifruti

Source: Hugging Face. Updated 2026-02-04; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/HiTZ/TOOLtifruti
Description:
The Basque evaluation ecosystem still lacks standardized datasets and protocols for assessing agentic behavior, in particular tool selection and tool use in end-to-end Agentic RAG settings. To address this gap, we introduce TOOLtifruti, a purpose-built dataset for evaluating whether an LLM can identify when a tool is needed and select the appropriate tool among multiple domain-specific options in our use case. This setup makes tool-calling evaluation straightforward and reproducible. It also provides a common benchmark for end-to-end Agentic RAG evaluation, since each query is paired with a reference tool call and a reference answer.

TOOLtifruti contains queries from five domains: BOPV/EHAA, Basque Parliament, Berria Newspaper, Basque Wikipedia, and No-tool. Each query is associated with one of these domains and therefore with the corresponding tool. No-tool queries can be answered without using any tool and span various categories, such as translation, mathematics, education, and programming.

Each dataset instance includes the query, its type (i.e., the tool/domain it maps to), the reference context required to answer it (the source passage from which the query was created), and the corresponding reference answer derived from that context.
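The tool-selection part of the evaluation can be illustrated with a minimal sketch. It assumes each instance carries the four pieces of information described above and that a model's output has already been reduced to a single tool label per query. The field names, the label strings, and the function `tool_selection_accuracy` are hypothetical and may not match the actual dataset schema; the sample records are placeholders, not real dataset rows.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label strings for the five query types (assumption, not the dataset's own).
QUERY_TYPES = ["bopv_ehaa", "basque_parliament", "berria", "basque_wikipedia", "no_tool"]


@dataclass
class Instance:
    query: str       # the user query
    query_type: str  # tool/domain the query maps to (one of QUERY_TYPES)
    context: str     # reference passage the query was created from
    answer: str      # reference answer derived from the context


def tool_selection_accuracy(instances, predictions):
    """Return (overall accuracy, per-type accuracy) for predicted tool labels."""
    if len(instances) != len(predictions):
        raise ValueError("instances and predictions must have the same length")
    total, correct = Counter(), Counter()
    for inst, pred in zip(instances, predictions):
        total[inst.query_type] += 1
        if pred == inst.query_type:
            correct[inst.query_type] += 1
    overall = sum(correct.values()) / len(instances) if instances else 0.0
    per_type = {t: correct[t] / total[t] for t in total}
    return overall, per_type


# Placeholder records: only the type matters for tool-selection scoring.
sample = [
    Instance("q1", "berria", "", ""),
    Instance("q2", "no_tool", "", ""),
    Instance("q3", "basque_wikipedia", "", ""),
    Instance("q4", "no_tool", "", ""),
]
# Hypothetical model outputs; the second one calls a tool where none is needed.
preds = ["berria", "berria", "basque_wikipedia", "no_tool"]

overall, per_type = tool_selection_accuracy(sample, preds)
print(overall, per_type)
```

Reporting accuracy per type alongside the overall figure makes it visible when a model over-calls tools on no-tool queries, which a single aggregate number would hide.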
Provider:
HiTZ