HubBench

Indexed from arXiv on 2025-09-30
Download link:
https://huggingface.co/datasets/floworks/HubBench-queries
Description:
HubBench is a benchmark dataset built on HubSpot customer relationship management (CRM) operations. It is designed to evaluate the ThorV2 model against leading models from OpenAI and Anthropic on accuracy, reliability, latency, and cost, and it covers a wide range of real-world CRM tasks. The dataset contains 200 queries (142 single-API queries and 58 complex queries) and targets the evaluation of function-calling capabilities in large language models.
Provider:
Floworks
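
The query counts given in the description can be checked with a short Python sketch. The repository ID is taken from the download link above. Everything else is an assumption made for illustration: the helper names `load_hubbench` and `composition`, the category labels `single_api` and `complex`, and the idea that the repository loads with the default configuration of the Hugging Face `datasets` library. The dataset's actual split and column names are not stated on this page, so the sketch does not rely on them.

```python
def load_hubbench(repo_id="floworks/HubBench-queries"):
    """Download the dataset from the Hugging Face Hub.

    Requires `pip install datasets` and network access. Assumes the
    repository loads with its default configuration (not confirmed here).
    """
    from datasets import load_dataset  # lazy import so the rest runs offline
    return load_dataset(repo_id)


def composition(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts as stated in the description; the labels are hypothetical.
counts = {"single_api": 142, "complex": 58}
total = sum(counts.values())
shares = composition(counts)
print(total, shares)
```

Calling `load_hubbench()` and printing the result would show the real splits and columns, which should be inspected before writing any evaluation code against them.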