yoonsanglee/healthbench-react
Source: Hugging Face | Updated: 2026-04-29 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/yoonsanglee/healthbench-react
Description:
AggAgent is an agentic aggregation framework that scales long-horizon agents at test time by sampling multiple parallel rollouts from a base agent and then aggregating their evidence and solutions. This dataset releases the ReAct base rollouts that AggAgent consumes, i.e. the single-agent trajectories produced before any aggregation step. Each rollout was generated by running a ReAct-style deep-research agent (reasoning → tool call → observation → ... → final solution) against the benchmark prompts. The agent scaffold is adapted from Tongyi DeepResearch. Each trajectory includes the full message stream, the extracted prediction, tool and rollout cost accounting, and an auto-judge verdict, so the data can be used directly for Best-of-N selection, aggregator training, or behavioural analysis of the base policy. This release covers three open-weights backbones: GLM-4.7-Flash, MiniMax-M2.5, and Qwen3.5-122B-A10B. Each backbone is shipped as a single Parquet file. Eight parallel rollouts are stored per benchmark instance (see the metadata field).
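As a rough illustration of the Best-of-N use case mentioned above, the sketch below groups rollouts by benchmark instance and compares the average single-rollout score with an oracle Best-of-N upper bound. The records are synthetic, and the field names `instance_id`, `rollout_idx`, and `judge_score` are assumptions made for this example; the actual Parquet schema may differ and should be checked before adapting it.

```python
from collections import defaultdict
from statistics import mean

# Synthetic stand-in records. Field names are hypothetical, not the
# dataset's confirmed schema; three rollouts per instance keep it short.
rollouts = [
    {"instance_id": "q1", "rollout_idx": 0, "judge_score": 0.2},
    {"instance_id": "q1", "rollout_idx": 1, "judge_score": 0.8},
    {"instance_id": "q1", "rollout_idx": 2, "judge_score": 0.5},
    {"instance_id": "q2", "rollout_idx": 0, "judge_score": 0.6},
    {"instance_id": "q2", "rollout_idx": 1, "judge_score": 0.4},
    {"instance_id": "q2", "rollout_idx": 2, "judge_score": 0.2},
]


def group_by_instance(records):
    """Collect rollouts per benchmark instance, ordered by rollout index."""
    groups = defaultdict(list)
    for record in records:
        groups[record["instance_id"]].append(record)
    for group in groups.values():
        group.sort(key=lambda r: r["rollout_idx"])
    return dict(groups)


def mean_single_rollout_score(groups):
    """Expected score of one randomly chosen rollout, averaged over instances."""
    return mean(mean(r["judge_score"] for r in g) for g in groups.values())


def oracle_best_of_n(groups, n):
    """Upper bound for Best-of-N: best judged score among the first n rollouts."""
    return mean(max(r["judge_score"] for r in g[:n]) for g in groups.values())


groups = group_by_instance(rollouts)
baseline = mean_single_rollout_score(groups)
oracle_3 = oracle_best_of_n(groups, 3)
print(f"single-rollout mean: {baseline:.3f}, oracle best-of-3: {oracle_3:.3f}")
```

The oracle number uses the judge verdict itself to pick the winner, so it only bounds what a selector or aggregator could achieve; a real Best-of-N method would have to choose without access to that verdict. With the released files, the same functions could be applied to records loaded from one backbone's Parquet file, for example via `pandas.read_parquet(...).to_dict("records")`, after mapping the real column names onto the hypothetical ones used here.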
Provider:
yoonsanglee



