veggiebird/MATPO-rollout
Hugging Face · Updated 2025-10-08 · Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/veggiebird/MATPO-rollout
Description:
MATPO is a reinforcement learning framework for training multiple specialized agent roles (a planner agent and worker agents) within a single large language model. It addresses limitations of current single-agent approaches to multi-turn, tool-integrated planning, such as the context-length bottleneck and noisy tool responses. MATPO introduces a multi-agent-in-one-model architecture in which a planner agent orchestrates high-level planning and delegates subtasks, while worker agents handle specific browsing and search tasks in isolated contexts. Both roles are trained within a single LLM via reinforcement learning, using role-specific prompts. Key features include the multi-agent-in-one-model design, principled credit assignment, easy integration, robust training, and infrastructure efficiency.
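The planner/worker split described above can be illustrated with a minimal sketch. This is not MATPO's implementation: the prompts, the `Rollout` class, `run_worker`, `run_planner`, and `stub_llm` are all hypothetical names chosen for illustration, and the subtasks are passed in directly rather than generated by the planner. It only shows the idea of one model serving both roles through role-specific prompts, with each worker seeing an isolated context.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical role prompts; the real MATPO prompts are not given in this listing.
PLANNER_PROMPT = "You are the planner. Break the task into subtasks and delegate them."
WORKER_PROMPT = "You are a worker. Complete the given subtask and report a short result."

LLM = Callable[[List[str]], str]  # one shared model: messages in, reply out


@dataclass
class Rollout:
    """Message history for a single role; one instance per isolated context."""
    role: str
    messages: List[str] = field(default_factory=list)


def run_worker(llm: LLM, subtask: str) -> Tuple[str, Rollout]:
    # Each worker starts from a fresh context: its role prompt plus one subtask.
    ctx = Rollout("worker", [WORKER_PROMPT, subtask])
    answer = llm(ctx.messages)
    ctx.messages.append(answer)
    return answer, ctx


def run_planner(llm: LLM, task: str, subtasks: List[str]) -> Tuple[str, Rollout, List[Rollout]]:
    # Simplification (assumption): subtasks are supplied by the caller,
    # whereas a real planner would produce them with the model.
    planner = Rollout("planner", [PLANNER_PROMPT, task])
    worker_rollouts = []
    for subtask in subtasks:
        result, worker_ctx = run_worker(llm, subtask)
        # Only the worker's short result enters the planner context,
        # not the worker's full history.
        planner.messages.append(f"{subtask} -> {result}")
        worker_rollouts.append(worker_ctx)
    final = llm(planner.messages)
    planner.messages.append(final)
    return final, planner, worker_rollouts


def stub_llm(messages: List[str]) -> str:
    # Stand-in for the single shared LLM; echoes the last message.
    return f"reply to: {messages[-1]}"
```

Calling `run_planner(stub_llm, "Find X", ["search A", "search B"])` returns the final answer together with one planner rollout and two worker rollouts, which is the kind of per-role trajectory a reinforcement learning step could then score.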
Provided by:
veggiebird



