THU-KEG/AgentIF
Source: Hugging Face. Updated: 2025-10-24. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/THU-KEG/AgentIF
Description:
AgentIF is a benchmark for evaluating how well large language models follow instructions in agentic scenarios. It comprises 707 human-annotated instructions drawn from 50 real-world agentic applications (tasks). Instructions average 1,723 words, with the longest reaching 15,630 words, and each contains 11.9 constraints on average, spanning multiple constraint types. Every instruction is annotated with its constraints and the corresponding evaluation metrics.
Provider:
THU-KEG



