OneMillion-Bench
ModelScope community. Updated 2026-04-21; indexed 2026-05-03.
Download link:
https://modelscope.cn/datasets/HiWorld2024/OneMillion-Bench
Description:
A bilingual (Global/Chinese) realistic expert-level benchmark for evaluating language agents across 5 professional domains. The benchmark contains 400 entries with detailed, weighted rubric-based grading criteria designed for fine-grained evaluation of domain expertise, analytical reasoning, and instruction following.
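To illustrate what weighted rubric-based grading can look like in practice, here is a minimal Python sketch. The actual OneMillion-Bench schema and scoring formula are not given in this listing, so the `RubricItem` fields, the `weighted_rubric_score` function, and the normalization (earned weight divided by total weight) are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricItem:
    # Hypothetical fields; the real dataset schema may differ.
    description: str
    weight: float      # assumed to be a positive importance weight
    satisfied: bool    # whether a grader judged the criterion as met


def weighted_rubric_score(items: List[RubricItem]) -> float:
    """Return the share of total rubric weight that was earned, in [0, 1]."""
    total = sum(item.weight for item in items)
    if total <= 0:
        return 0.0  # no usable criteria: treat the score as zero
    earned = sum(item.weight for item in items if item.satisfied)
    return earned / total


# Made-up example rubric for a single benchmark entry.
example = [
    RubricItem("Identifies the governing regulation", 5.0, True),
    RubricItem("Quantifies the financial impact", 3.0, False),
    RubricItem("Follows the requested output format", 2.0, True),
]
score = weighted_rubric_score(example)
```

Under these assumptions, a response that meets the first and third criteria earns 7 of 10 weight points. A real grader might also use negative weights for penalties or partial credit per criterion; neither is modeled here.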
Provider:
maas
Created:
2026-03-10



