xbench/AgentIF-OneDay
Source: Hugging Face. Updated: 2026-01-29. Indexed: 2026-02-07.
Download link:
https://hf-mirror.com/datasets/xbench/AgentIF-OneDay
Description:
AgentIF-OneDay is a comprehensive benchmark designed to evaluate AI agents on diverse daily tasks across work, life, and learning scenarios. Unlike evaluations focused solely on task difficulty, this dataset emphasizes the breadth of general user needs, requiring agents to handle complex attachments, infer implicit instructions, and deliver tangible file-based outputs. It comprises 104 tasks structured around Open Workflow Execution, Latent Instruction, and Iterative Refinement.
Provider:
xbench



