
PersonalAILab/O-Researcher-SFT-Dataset

Source: Hugging Face | Updated: 2026-01-09 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/PersonalAILab/O-Researcher-SFT-Dataset
Overview:
This dataset is the core training data for O-Researcher. It is designed to elicit end-to-end, multi-turn, multi-tool deep research capabilities in large language models. It is built on the Multi-Agent Data Synthesis paradigm: collaborative AI agents simulate complex tool-integrated reasoning, and the resulting multi-agent research workflows are converted into trajectory data suitable for supervised fine-tuning (SFT). These trajectories cover dynamic web search, page crawling, and information synthesis in deep research scenarios.

The dataset has two core components:
- SFT Data: high-quality trajectories synthesized from multi-agent research simulations. From these, models learn problem-solving logic that involves multi-turn tool invocation, information retrieval, and research report generation.
- RL Data: verifiable, multi-domain research tasks. These are used to further improve the model's robustness and performance through agentic reinforcement learning.

According to the authors, this dataset helped O-Researcher achieve new state-of-the-art performance on major deep research benchmarks. It is released as part of a fully open-source set of resources and provides a high-quality data foundation for researchers working on deep research agents, agentic reinforcement learning, and related fields.
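
The following is a minimal sketch of how a multi-turn, tool-using research trajectory might be turned into an SFT training example. It is not taken from the dataset or from O-Researcher. The `Turn` record, the role names, the `<tool_call>` markup and the tool names `web_search` and `crawl_page` are all hypothetical stand-ins, and the real field names in this dataset may differ. The sketch applies a common SFT choice: compute the loss only on assistant turns (reasoning, tool calls, final report), while user prompts and tool outputs serve as context. It is an assumption that O-Researcher uses this masking scheme.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical schema: the real O-Researcher-SFT-Dataset fields may differ.
ALLOWED_ROLES = {"user", "assistant", "tool"}


@dataclass
class Turn:
    role: str     # "user", "assistant", or "tool" (assumed role names)
    content: str  # message text, tool-call markup, or tool output


def build_sft_example(turns: List[Turn]) -> Tuple[List[Dict[str, str]], List[bool]]:
    """Flatten a trajectory into chat messages plus a per-message loss mask.

    The mask is True only for assistant turns, so a trainer that honours it
    learns the agent's reasoning, tool calls and report, but not the user
    prompt or the (externally produced) tool results.
    """
    messages: List[Dict[str, str]] = []
    mask: List[bool] = []
    for turn in turns:
        if turn.role not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {turn.role!r}")
        messages.append({"role": turn.role, "content": turn.content})
        mask.append(turn.role == "assistant")
    return messages, mask


# Usage with a toy trajectory: search, crawl, then write a report.
# Tool names and <tool_call> markup are illustrative only.
trajectory = [
    Turn("user", "Summarize recent work on X."),
    Turn("assistant", '<tool_call>{"name": "web_search", "query": "X"}</tool_call>'),
    Turn("tool", "Result 1 ... Result 2 ..."),
    Turn("assistant", '<tool_call>{"name": "crawl_page", "url": "https://example.com"}</tool_call>'),
    Turn("tool", "Page text ..."),
    Turn("assistant", "Final report: ..."),
]
messages, mask = build_sft_example(trajectory)
print(sum(mask))  # 3 assistant turns contribute to the loss
```

Keeping the mask at the message level leaves tokenization to whichever chat template the target model uses. A real pipeline would expand the mask to token level after applying that template.
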
Provided by:
PersonalAILab