HubBench

Name: HubBench
Creator: Floworks
License: Not specified

Indexed from arXiv on 2025-09-30

Download link:

https://huggingface.co/datasets/floworks/HubBench-queries

Overview:

HubBench is a benchmark dataset built on HubSpot customer relationship management (CRM) operations, designed to evaluate the ThorV2 model against leading models from OpenAI and Anthropic. It covers a wide range of real-world CRM tasks and reports accuracy, reliability, latency, and cost. The dataset contains 200 queries (142 single-API queries and 58 complex queries) and is intended for assessing the function-calling capabilities of large language models.
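As a rough illustration of how accuracy on a function-calling benchmark of this kind might be scored, the sketch below computes exact-match accuracy separately for single-call and multi-call queries. Everything in it is an assumption made for illustration: the `Call` record, the `make_call` and `score` helpers, the function names such as `create_contact`, and the exact-match criterion are hypothetical and are not taken from the HubBench schema or from the authors' evaluation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One function call (hypothetical schema, not the HubBench format)."""
    name: str
    args: tuple  # sorted (key, value) pairs, so argument order does not matter


def make_call(name, **kwargs):
    """Build a Call whose arguments compare equal regardless of keyword order."""
    return Call(name, tuple(sorted(kwargs.items())))


def score(examples):
    """Return exact-match accuracy per category.

    `examples` is a list of (category, expected_calls, predicted_calls).
    A prediction counts as correct only if the whole call sequence matches.
    """
    totals, correct = {}, {}
    for category, expected, predicted in examples:
        totals[category] = totals.get(category, 0) + 1
        if expected == predicted:
            correct[category] = correct.get(category, 0) + 1
    return {c: correct.get(c, 0) / totals[c] for c in totals}


# Toy data with made-up function names; real queries would come from the dataset.
examples = [
    ("single",
     [make_call("create_contact", email="a@example.com")],
     [make_call("create_contact", email="a@example.com")]),
    ("single",
     [make_call("get_deal", deal_id="42")],
     [make_call("get_deal", deal_id="43")]),  # wrong argument value
    ("complex",
     [make_call("search_contacts", query="Acme"),
      make_call("update_contact", contact_id="7", phone="555-0100")],
     [make_call("search_contacts", query="Acme"),
      make_call("update_contact", phone="555-0100", contact_id="7")]),
]

result = score(examples)
print(result)
```

Exact match on the full call sequence is the strictest reasonable criterion; a real evaluation may instead allow partial credit or compare the effect of the calls rather than their text.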

Provided by:

Floworks
