veggiebird/MATPO-rollout

Name: veggiebird/MATPO-rollout
Creator: veggiebird
Published: 2025-10-08 08:53:16
License: 暂无描述

Hugging Face2025-10-08 更新2025-10-25 收录

下载链接：

https://hf-mirror.com/datasets/veggiebird/MATPO-rollout

下载链接

链接失效反馈

官方服务：

资源简介：

MATPO数据集是一种用于在单个大型语言模型中训练多个代理角色（规划代理和工人代理）的强化学习框架。它解决了当前单代理方法在多轮工具集成规划中的局限性，例如上下文长度瓶颈和嘈杂的工具响应。MATPO引入了一种多代理架构，其中规划代理负责高级规划和任务委派，而工人代理则处理特定的浏览和搜索任务。该框架在单个LLM中训练这两种角色，并通过强化学习进行角色特定的提示。MATPO具有多个关键特性，包括多代理模型、原则性信用分配、易于集成、稳健的训练和基础设施效率。

MATPO is a novel reinforcement learning framework designed for training multiple specialized agent roles (planner and worker agents) within a single large language model. It addresses limitations of current single-agent approaches for multi-turn tool-integrated planning, such as context length bottleneck and noisy tool responses. MATPO introduces a multi-agent-in-one-model architecture where a planner-agent orchestrates high-level planning and delegates subtasks, while worker-agents handle specific browsing and search tasks with isolated contexts. Both roles are trained within a single LLM using role-specific prompts via reinforcement learning. The framework offers key features like multi-agent-in-one-model, principled credit assignment, easy integration, robust training, and infrastructure efficiency.

提供机构：

veggiebird

5,000+

优质数据集

54 个

任务类型

进入经典数据集