
PersonalAILab/O-Researcher-RL-Dataset

Source: Hugging Face. Updated: 2026-01-09. Indexed: 2026-02-07.
Download link:
https://hf-mirror.com/datasets/PersonalAILab/O-Researcher-RL-Dataset
Description:
This dataset is the core training data for O-Researcher, designed to elicit end-to-end, multi-turn, multi-tool deep research capabilities in large language models. It is built on a multi-agent data synthesis paradigm: collaborative AI agents simulate complex tool-integrated reasoning, and the resulting multi-agent research workflows are converted into trajectory data suitable for supervised fine-tuning (SFT). The trajectories cover dynamic web search, page crawling, and information synthesis in deep research scenarios.

The dataset consists of two core components:
SFT Data: high-quality trajectories synthesized from multi-agent research simulations, from which models learn problem-solving logic involving multi-turn tool invocation, information retrieval, and research report generation.
RL Data: verifiable multi-domain research task scenarios, used to further improve the model's robustness and performance through agentic reinforcement learning.

The dataset has helped O-Researcher achieve new state-of-the-art performance on major deep research benchmarks. As part of a fully open-source release, it provides a high-quality data foundation for researchers exploring deep research agents, agentic reinforcement learning, and related fields.
Provider:
PersonalAILab