
yoonsanglee/researchrubrics-react

Source: Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/yoonsanglee/researchrubrics-react
Description:
AggAgent is an agentic aggregation framework that scales long-horizon agents at test time: it samples multiple parallel rollouts from a base agent, then aggregates their evidence and solutions. This dataset card releases the ReAct base rollouts that AggAgent consumes, i.e. the single-agent trajectories produced before any aggregation step.

Each rollout was generated by running a ReAct-style deep-research agent (reasoning → tool call → observation → ... → final solution) against the benchmark prompts. The agent scaffold is adapted from Tongyi DeepResearch. Each trajectory includes the full message stream, the extracted prediction, tool and rollout cost accounting, and an auto-judge verdict, so it can be used directly for Best-of-N selection, aggregator training, or behavioural analysis of the base policy.

This release covers three open-weights backbones: GLM-4.7-Flash, MiniMax-M2.5, and Qwen3.5-122B-A10B. Each backbone is shipped as a single Parquet file, and 8 parallel rollouts are stored per benchmark instance (roll_out_count = 8; see metadata).

Note: Rollouts for BrowseComp and BrowseComp-Plus are intentionally not distributed on Hugging Face. To limit web-crawl contamination of these evals, they are released only as tar archives via the GitHub repo. Rollouts for the remaining benchmarks (DeepSearchQA, HealthBench, HLE, ResearchRubrics) are released here.
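As a rough illustration of the Best-of-N use mentioned above, the sketch below groups rollouts by benchmark instance and compares a majority-vote pick against the oracle "any rollout correct" upper bound. The field names `instance_id`, `prediction`, and `judge_correct` are hypothetical stand-ins; the actual Parquet column names are not given here and should be checked against the dataset's metadata. Majority voting is only a simple baseline and is not AggAgent's aggregation method.

```python
from collections import Counter, defaultdict

# Toy records standing in for rows loaded from one backbone's Parquet file.
# All field names here are assumptions, not the dataset's real schema.
demo_rollouts = [
    {"instance_id": "q1", "prediction": "Paris", "judge_correct": True},
    {"instance_id": "q1", "prediction": "Paris", "judge_correct": True},
    {"instance_id": "q1", "prediction": "Lyon", "judge_correct": False},
    {"instance_id": "q2", "prediction": "42", "judge_correct": False},
    {"instance_id": "q2", "prediction": "41", "judge_correct": False},
]


def group_by_instance(rollouts):
    """Collect the parallel rollouts that belong to each benchmark instance."""
    groups = defaultdict(list)
    for rollout in rollouts:
        groups[rollout["instance_id"]].append(rollout)
    return dict(groups)


def majority_vote(group):
    """Return the most frequent extracted prediction among a group's rollouts."""
    counts = Counter(rollout["prediction"] for rollout in group)
    return counts.most_common(1)[0][0]


def oracle_best_of_n(group):
    """True if at least one rollout in the group was judged correct."""
    return any(rollout["judge_correct"] for rollout in group)


def mean_rollout_accuracy(group):
    """Fraction of rollouts in the group that were judged correct."""
    return sum(rollout["judge_correct"] for rollout in group) / len(group)


groups = group_by_instance(demo_rollouts)
for instance_id, group in groups.items():
    print(instance_id, majority_vote(group), oracle_best_of_n(group))
```

With the real data, each group would hold the 8 stored rollouts for an instance, and the gap between mean rollout accuracy and the oracle bound gives a sense of how much headroom a selector or aggregator has.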
Provider:
yoonsanglee