CriticAgent/Submission
Source: Hugging Face. Updated: 2025-05-16. Indexed: 2025-11-29.
Download link:
https://hf-mirror.com/datasets/CriticAgent/Submission
Resource overview:
The CriticAgent test set is a dataset for evaluating reward models for agents that use tools and reasoning. It consists of 5,000 annotated steps spanning 10 environment categories and 39 task types. Each step carries the following annotations: step ID, trajectory ID, step category, task type, history, the agent's reasoning and action at the current step, the observation after the step (optional), the outcome of the whole trajectory (optional), the ground truth for the whole task (optional), an annotation, and a quality score for the step.
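
The per-step fields listed above can be pictured as a simple record type. The following is a minimal sketch only: the field names, their types, and the helper `group_by_trajectory` are hypothetical choices made for illustration and are not the dataset's actual column names or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepRecord:
    """One annotated step. All field names and types here are assumptions."""
    step_id: str
    trajectory_id: str
    step_category: str
    task_type: str
    history: list[str]           # earlier steps of the trajectory (assumed to be text)
    reasoning: str               # the agent's reasoning at the current step
    action: str                  # the agent's action at the current step
    annotation: str
    quality_score: float         # assumed numeric; the scale is not stated in the description
    observation: Optional[str] = None    # optional: observation after the step
    outcome: Optional[str] = None        # optional: outcome of the whole trajectory
    ground_truth: Optional[str] = None   # optional: ground truth for the whole task


def group_by_trajectory(steps):
    """Collect steps by trajectory ID, preserving input order within each group."""
    grouped = defaultdict(list)
    for step in steps:
        grouped[step.trajectory_id].append(step)
    return dict(grouped)
```

Grouping by trajectory ID is one plausible way to reassemble full trajectories from individual steps, for instance when comparing step-level quality scores against the optional trajectory outcome.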
Provider:
CriticAgent



