query-MAS pairs dataset

Name: query-MAS pairs dataset
Creator: Shanghai Jiao Tong University, Shanghai AI Laboratory, University of Oxford, University of Sydney
Published: 2025-03-06 01:27:59
License: Not specified

Source: arXiv. Updated: 2025-03-06. Indexed: 2025-03-07.

Download link:

https://github.com/rui-ye/MAS-GPT

Description:

This dataset comprises query-multi-agent system (MAS) pairs and was built jointly by Shanghai Jiao Tong University, Shanghai AI Laboratory, the University of Oxford, and the University of Sydney. Every MAS is represented in a unified form as an executable Python code snippet. Each snippet defines the forward function of one MAS, in which agent prompts are variables, LLM calls are functions, and the relationships between agents are expressed through string concatenation. The dataset is intended for training LLMs to generate a MAS tailored to a given user query, which simplifies and streamlines the construction of multi-agent systems.
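To make the representation concrete, the following is a minimal sketch of what one such forward function might look like. It is not taken from the dataset: the two-agent solver/reviewer layout, the prompt wording, and the `call_llm` parameter are all assumptions made for illustration, and `echo_llm` is a fake model so the snippet runs offline.

```python
from typing import Callable

# Hypothetical LLM interface: takes a prompt string, returns the model's reply.
LLM = Callable[[str], str]


def forward(query: str, call_llm: LLM) -> str:
    """Toy two-agent MAS: a solver drafts an answer, a reviewer refines it."""
    # Agent prompts are plain variables.
    solver_prompt = "You are a careful problem solver. Answer the question.\n"
    reviewer_prompt = "You are a reviewer. Check the draft and return a final answer.\n"

    # An LLM call is an ordinary function call.
    draft = call_llm(solver_prompt + "Question: " + query)

    # The solver -> reviewer relationship is expressed by concatenating
    # the solver's output into the reviewer's input.
    final = call_llm(
        reviewer_prompt + "Question: " + query + "\nDraft: " + draft
    )
    return final


def echo_llm(prompt: str) -> str:
    """Deterministic fake LLM (illustration only, no network access)."""
    return f"[reply to {len(prompt)} chars]"


if __name__ == "__main__":
    print(forward("What is 2 + 2?", echo_llm))
```

Swapping `echo_llm` for a real client changes nothing in `forward`, which is the point of treating the LLM call as a function argument in this sketch.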

Provided by:

Shanghai Jiao Tong University, Shanghai AI Laboratory, University of Oxford, University of Sydney

Created:

2025-03-06

Summary

Dataset introduction

Construction

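The dataset was built to address two problems that multi-agent systems (MAS) face on diverse tasks: lack of adaptability and high inference cost. The researchers reframe MAS construction as a generative language task whose input is a user query and whose output is the corresponding MAS. To that end, they unify the representation of MAS as executable code and propose a consistency-oriented data-construction pipeline that yields a high-quality dataset of coherent and consistent query-MAS pairs. Using this dataset, they trained MAS-GPT, an open-source, medium-sized language model that generates a query-adaptive MAS in a single LLM inference. The generated MAS can be applied directly to process the user query and deliver a high-quality response. Extensive experiments on 9 benchmarks and 5 LLMs show that MAS-GPT consistently outperforms more than 10 baseline MAS methods across diverse settings, indicating its effectiveness, efficiency, and strong generalization.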

Usage

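Using the query-MAS pairs dataset mainly involves the training and inference of MAS-GPT. First, the query-MAS pairs are used for supervised fine-tuning to train MAS-GPT. At inference time, the user enters a query and MAS-GPT generates a query-specific MAS, which can be applied directly to process the query and produce the final answer. This simplifies MAS construction, making it as simple and efficient as querying ChatGPT.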
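The inference step described above, where generated MAS code is applied to the query, could be sketched as follows. This is an assumption-laden illustration rather than the MAS-GPT codebase: `run_generated_mas`, the `GENERATED` string (a stand-in for model output), and `fake_llm` are hypothetical names, and the sketch assumes the generated code defines `forward(query)` and reaches the model through a `call_llm` function placed in its namespace.

```python
from typing import Callable


def run_generated_mas(
    mas_code: str, query: str, call_llm: Callable[[str], str]
) -> str:
    """Execute generated MAS source code and apply its forward() to a query."""
    # The generated code sees call_llm as a global name.
    namespace = {"call_llm": call_llm}
    # WARNING: exec runs arbitrary code. Only use it on trusted output
    # or inside a sandbox.
    exec(mas_code, namespace)
    forward = namespace.get("forward")
    if not callable(forward):
        raise ValueError("generated code does not define a callable forward()")
    return forward(query)


# Stand-in for text a MAS-generating model might return (hypothetical).
GENERATED = '''
def forward(query):
    prompt = "Answer concisely: "
    return call_llm(prompt + query)
'''


def fake_llm(prompt: str) -> str:
    """Fake LLM for offline runs: simply upper-cases the prompt."""
    return prompt.upper()


answer = run_generated_mas(GENERATED, "hi", fake_llm)
print(answer)
```

Checking that `forward` exists before calling it gives a clear error when the model emits malformed code, which is one plausible failure mode of generating a MAS as text.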

Background and challenges

Background overview

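Large language models (LLMs) show great potential across a wide range of tasks, yet a single LLM often falls short on the diverse and complex tasks found in real applications. To overcome this limitation, research has turned to LLM-based multi-agent systems (MAS), in which multiple LLMs (agents) with specific capabilities work together to reach more effective solutions. The paper introduces MAS-GPT, an LLM trained specifically to generate an executable MAS from a user query. To train MAS-GPT, the researchers built a high-quality dataset of coherent and consistent query-MAS pairs. MAS-GPT simplifies MAS construction, making it adaptive, low-cost, and generalizable, and it gives a further boost to the already strong reasoning capabilities of LLMs.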

Current challenges

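The main challenges in building MAS are adaptability and cost. First, existing MAS such as MetaGPT, ChatDev, and AgentVerse are manually designed and therefore lack adaptability and generality. Second, although there have been efforts to design adaptive MAS, they shift the cost from human effort to computation: GPTSwarm and DyLAN, for example, require multiple LLM inferences. To address these issues, the paper reframes the construction of a per-query MAS as a generative language task, unifies the representation of MAS, and proposes a consistency-oriented data-construction pipeline.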

Common scenarios

Typical use cases

The query-MAS pairs dataset is primarily used to train LLMs to generate LLM-based multi-agent systems (MAS). It simplifies MAS construction by reframing it as a generative language task: the input is a user query, and the output is a corresponding MAS represented as executable code. The dataset is the training basis for LLMs such as MAS-GPT, which generate a query-adaptive MAS in a single LLM inference, making MAS construction more efficient and adaptable.

Academic problems addressed

The query-MAS pairs dataset addresses the inadaptability and high cost of existing MAS approaches. Traditional methods require either manual configuration, which limits adaptability, or multiple calls to advanced LLMs, which raises inference cost. Together with MAS-GPT, the dataset reduces MAS construction to a single LLM inference, improving adaptability and lowering cost. It also helps fill the lack of training data for MAS generation, a significant limitation of current LLMs.

Practical applications

The query-MAS pairs dataset applies to domains such as mathematics, coding, and general QA, where MAS-GPT can generate an executable MAS to process user queries and deliver high-quality responses. This can significantly reduce the time and computational resources needed to build a suitable MAS, making MAS construction more accessible and efficient. It can also be applied in real-world scenarios that call for a MAS, such as complex problem solving, decision making, and data analysis.

Recent research on the dataset