ModaLabs/GatewayBench-v1
Source: Hugging Face | Updated: 2025-11-16 | Indexed: 2025-11-30
Download link:
https://hf-mirror.com/datasets/ModaLabs/GatewayBench-v1
Summary:
GatewayBench v1 is a synthetic benchmark dataset for evaluating LLM gateway systems and their routing decisions. It provides 2,000 test cases with ground-truth labels across four task types: tool-heavy, retrieval, chat, and stress. Each task type targets a different aspect of gateway performance: selecting tools from large tool sets, information retrieval, pure conversation, and high-complexity scenarios, respectively. The dataset includes ground-truth tool relevance labels, ideal tool subsets for evaluation, reference answers for quality assessment, a model pool with cost and latency metrics, multi-domain coverage, and difficulty ratings. It is in English and formatted as JSONL, which allows streaming and partial reads. The dataset is 100% synthetically generated with OpenAI's GPT-4o model and contains no real user data, copyrighted material, or personal information.
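Because JSONL stores one JSON object per line, records can be read lazily and a run can stop after the first N test cases without loading the whole file. The sketch below illustrates that, plus one way to score a gateway's tool choice against an ideal tool subset (precision, recall, F1). It is a minimal sketch under assumptions: the field names `task_type` and `ideal_tools`, the tool names, and the three sample records are hypothetical placeholders, not taken from the dataset, so check the actual schema before adapting it.

```python
import io
import json
from itertools import islice
from typing import IO, Dict, Iterator, Optional, Set


def stream_records(fh: IO[str], limit: Optional[int] = None) -> Iterator[dict]:
    """Lazily parse one JSON object per non-blank line.

    With a limit, reading stops after `limit` records (a partial read);
    with limit=None, every record in the file is yielded.
    """
    records = (json.loads(line) for line in fh if line.strip())
    return islice(records, limit)


def tool_selection_scores(selected: Set[str], ideal: Set[str]) -> Dict[str, float]:
    """Precision, recall and F1 of the tools a gateway picked vs. an ideal subset."""
    if not selected and not ideal:
        # Nothing was needed and nothing was chosen: treat as a perfect match.
        return {"precision": 1.0, "recall": 1.0, "f1": 1.0}
    hits = len(selected & ideal)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(ideal) if ideal else 0.0
    # hits > 0 implies precision and recall are both > 0, so no division by zero.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sample records for illustration only (field names are assumptions).
SAMPLE = "\n".join([
    json.dumps({"id": "t1", "task_type": "tool-heavy", "ideal_tools": ["search_flights", "book_hotel"]}),
    json.dumps({"id": "t2", "task_type": "chat", "ideal_tools": []}),
    json.dumps({"id": "t3", "task_type": "retrieval", "ideal_tools": ["kb_lookup"]}),
])

if __name__ == "__main__":
    # Partial read: only the first two records are parsed.
    first_two = list(stream_records(io.StringIO(SAMPLE), limit=2))
    print([r["task_type"] for r in first_two])  # ['tool-heavy', 'chat']
    # A gateway that picked one correct tool and one irrelevant tool.
    picked = {"search_flights", "get_weather"}
    print(tool_selection_scores(picked, set(first_two[0]["ideal_tools"])))
```

For a real file, `open(path, encoding="utf-8")` can be passed in place of `io.StringIO`. Treating an empty selection against an empty ideal set as a perfect score is a design choice for tool-free tasks such as chat; other conventions are equally defensible.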
Provided by:
ModaLabs



