HubBench
Source: arXiv. Indexed 2025-09-30.
Download link:
https://huggingface.co/datasets/floworks/HubBench-queries
Description:
HubBench is a benchmark dataset built on HubSpot customer relationship management (CRM) operations. It is designed to evaluate the ThorV2 model against leading models from OpenAI and Anthropic. The evaluation covers accuracy, reliability, latency, and cost across a wide range of real-world CRM tasks. The dataset contains 200 queries (142 single-API queries and 58 complex queries) and targets the function-calling capabilities of large language models.
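As an illustration of how function-calling accuracy on such a query set could be scored, here is a minimal sketch. It assumes exact-match comparison of call sequences, which may differ from the scoring used by the benchmark authors. The `FunctionCall` type, the `make_call` helper, and the call names in the usage example are hypothetical and are not taken from the dataset's schema.

```python
from dataclasses import dataclass

# Composition stated in the dataset description.
SINGLE_API_QUERIES = 142
COMPLEX_QUERIES = 58


@dataclass(frozen=True)
class FunctionCall:
    """Hypothetical representation of one API call (not the dataset's schema)."""
    name: str
    arguments: tuple  # sorted (key, value) pairs, so calls compare by value


def make_call(name, **arguments):
    """Build a FunctionCall whose arguments compare independently of order."""
    return FunctionCall(name, tuple(sorted(arguments.items())))


def exact_match_accuracy(predicted, expected):
    """Fraction of queries whose predicted call sequence equals the expected one.

    Each element is a list of FunctionCall objects: one call for a
    single-API query, several calls for a complex query.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)
```

For example, with one single-call query answered correctly and one two-call query answered with only its first call, `exact_match_accuracy` returns 0.5. Exact match is a strict criterion; a partial-credit variant would need a different comparison.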
Provided by:
Floworks



