mercor/apex-agents

Source: Hugging Face. Updated 2026-02-06; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/mercor/apex-agents
Description:
APEX–Agents is a benchmark from Mercor for evaluating whether AI agents can execute long-horizon, cross-application professional services tasks. Tasks were created by investment banking analysts, management consultants, and corporate lawyers, and require agents to navigate realistic work environments with files and tools (e.g., docs, spreadsheets, PDFs, email, chat, calendar). The dataset includes 480 total tasks (160 per job category), 33 total worlds (10 banking, 11 consulting, 12 law), rubric criteria with a mean of ~4 criteria per task, gold outputs for every task, and world assets (files + metadata). The dataset is licensed under CC-BY 4.0 and is intended exclusively for model evaluation. Any use of this dataset for training, fine-tuning, or parameter fitting is expressly forbidden.
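
The composition figures quoted above can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the dataset's own tooling: the category keys (`banking`, `consulting`, `law`) and the helper names `totals` and `download` are illustrative assumptions. The `download` helper additionally assumes the `huggingface_hub` package is installed and that the repository is reachable from your account; it is never called automatically.

```python
"""Cross-check the task and world counts quoted in the description."""

REPO_ID = "mercor/apex-agents"  # dataset name as shown on the listing

# Figures quoted in the description; the dictionary keys are illustrative labels.
TASKS_PER_CATEGORY = {"banking": 160, "consulting": 160, "law": 160}
WORLDS_PER_CATEGORY = {"banking": 10, "consulting": 11, "law": 12}
STATED_TOTAL_TASKS = 480
STATED_TOTAL_WORLDS = 33


def totals():
    """Return (total tasks, total worlds) summed over the job categories."""
    return sum(TASKS_PER_CATEGORY.values()), sum(WORLDS_PER_CATEGORY.values())


def download(local_dir="apex-agents"):
    """Fetch the dataset files for evaluation use only (hypothetical helper).

    Assumes huggingface_hub is installed and the repository is accessible.
    The import is local so the checks above run without the package.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=local_dir)


if __name__ == "__main__":
    tasks, worlds = totals()
    # The per-category figures should add up to the stated totals.
    assert tasks == STATED_TOTAL_TASKS
    assert worlds == STATED_TOTAL_WORLDS
    print(f"{tasks} tasks across {worlds} worlds")
```

Because the license restricts the data to model evaluation, any files fetched this way should be kept out of training, fine-tuning, and parameter-fitting pipelines.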
Provider:
mercor