yoonsanglee/healthbench-react

Name: yoonsanglee/healthbench-react
Creator: yoonsanglee
Published: 2026-04-29 18:00:29
License: No description provided

Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/yoonsanglee/healthbench-react

Resource overview:

AggAgent is an agentic aggregation framework that scales long-horizon agents at test time by sampling multiple parallel rollouts from a base agent and then aggregating their evidence and solutions. This dataset card releases the ReAct base rollouts that AggAgent consumes, i.e. the single-agent trajectories produced before any aggregation step. Each rollout was generated by running a ReAct-style deep-research agent (reasoning → tool call → observation → ... → final solution) against the benchmark prompts. The agent scaffold is adapted from Tongyi DeepResearch. Each trajectory includes the full message stream, the extracted prediction, tool and rollout cost accounting, and an auto-judge verdict, so the data can be used directly for Best-of-N selection, aggregator training, or behavioural analysis of the base policy. This release covers three open-weights backbones: GLM-4.7-Flash, MiniMax-M2.5, and Qwen3.5-122B-A10B. Each backbone is shipped as its own single Parquet file. Eight parallel rollouts are stored for each benchmark instance (see the metadata field).
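As a rough illustration of the Best-of-N use case mentioned above, the sketch below groups rollouts by benchmark instance and compares the average single-rollout score with an oracle Best-of-N upper bound (the best-judged rollout per instance). The field names `instance_id` and `judge_score` are hypothetical placeholders, not the confirmed schema of this dataset; check the actual Parquet columns before adapting it.

```python
from collections import defaultdict
from statistics import mean


def summarize_rollouts(rollouts, id_key="instance_id", score_key="judge_score"):
    """Compare mean single-rollout score with an oracle Best-of-N bound.

    `id_key` and `score_key` are assumed (hypothetical) field names; the
    real dataset columns may differ.
    """
    # Collect the judge scores of all rollouts that share an instance.
    scores_by_instance = defaultdict(list)
    for rollout in rollouts:
        scores_by_instance[rollout[id_key]].append(float(rollout[score_key]))

    # Average rollout score per instance, then averaged over instances.
    mean_scores = [mean(scores) for scores in scores_by_instance.values()]
    # Oracle Best-of-N: assume a perfect selector picks the top rollout.
    best_scores = [max(scores) for scores in scores_by_instance.values()]

    return {
        "instances": len(scores_by_instance),
        "mean_single_rollout": mean(mean_scores),
        "oracle_best_of_n": mean(best_scores),
    }


# Toy records standing in for real rollouts (two instances, two rollouts each).
toy_rollouts = [
    {"instance_id": "a", "judge_score": 0.0},
    {"instance_id": "a", "judge_score": 1.0},
    {"instance_id": "b", "judge_score": 0.5},
    {"instance_id": "b", "judge_score": 0.5},
]
summary = summarize_rollouts(toy_rollouts)
print(summary)
```

With a downloaded Parquet file, the same function could be fed records from `pandas.read_parquet(path).to_dict("records")`, passing the real column names through `id_key` and `score_key`. The gap between the two numbers gives a sense of how much headroom a selector or aggregator has over a single rollout.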

Provider:

yoonsanglee
