ufukkaraca/ody-bench
Source: Hugging Face. Updated: 2026-04-28. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/ufukkaraca/ody-bench
Description:
Ody Bench is a benchmark suite for evaluating whether enterprise AI agents are ready to deploy. It covers retrieval quality, cross-source entity resolution, contradiction detection, single-step action correctness, calibration, multi-step workflow decomposition, and the handling of security-sensitive requests. The benchmark aims to provide an integrated, shared corpus and a trust-aligned meta-metric, and to disclose all results openly, negative ones included.
Provider:
ufukkaraca



