mercor/apex-agents
Hugging Face · Updated 2026-02-06 · Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/mercor/apex-agents
Summary:
APEX–Agents is a benchmark from Mercor for evaluating whether AI agents can execute long-horizon, cross-application professional services tasks. Tasks were created by investment banking analysts, management consultants, and corporate lawyers, and require agents to navigate realistic work environments with files and tools (e.g., docs, spreadsheets, PDFs, email, chat, calendar). The dataset includes 480 total tasks (160 per job category), 33 total worlds (10 banking, 11 consulting, 12 law), rubric criteria with a mean of ~4 criteria per task, gold outputs for every task, and world assets (files + metadata). The dataset is licensed under CC-BY 4.0 and is intended exclusively for model evaluation. Any use of this dataset for training, fine-tuning, or parameter fitting is expressly forbidden.
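The composition figures above are internally consistent: three job categories of 160 tasks each give 480 tasks, and 10 + 11 + 12 worlds give 33. The minimal sketch below cross-checks that arithmetic. The counts are copied from the description. The constant names, the helper functions, and the category keys are hypothetical labels for illustration. They are not field names from the dataset files.

```python
# Hypothetical constants holding the figures stated in the dataset description.
# The keys are illustrative labels only, not the dataset's actual schema.
TASKS_PER_CATEGORY = 160
WORLDS_PER_CATEGORY = {"banking": 10, "consulting": 11, "law": 12}


def total_tasks() -> int:
    """Tasks across all job categories (160 per category)."""
    return TASKS_PER_CATEGORY * len(WORLDS_PER_CATEGORY)


def total_worlds() -> int:
    """Worlds summed over the three job categories."""
    return sum(WORLDS_PER_CATEGORY.values())


if __name__ == "__main__":
    # Prints: tasks: 480, worlds: 33
    print(f"tasks: {total_tasks()}, worlds: {total_worlds()}")
```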
Provided by:
mercor



