PARTNR

Name: PARTNR
Creator: FAIR Meta
Published: 2024-11-01 01:53:12
License: no description provided

Source: arXiv. Updated 2024-11-01; indexed 2024-11-06.

Download link:

https://github.com/facebookresearch/partnr-planner

Resource overview:

PARTNR is a benchmark dataset created by FAIR Meta for studying planning and reasoning in human-robot collaboration. It contains 100,000 natural language tasks spanning 60 houses and 5,819 unique objects, and simulates collaborative scenarios drawn from everyday household activities. The dataset is generated through a semi-automated pipeline that combines large language models (LLMs) with a simulated environment, and its tasks emphasize spatial, temporal, and heterogeneous agent capability constraints. PARTNR is aimed at improving how robots collaborate with humans on complex tasks, and at addressing the shortcomings of current models in coordination, task tracking, and error recovery.

Provider:

FAIR Meta

Created:

2024-11-01

Original Information Summary

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-Agent Tasks

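Overview

Dataset name: PARTNR
Domain: planning and reasoning in embodied multi-agent tasks
Main abstractions:
- Agent: represents a robot or a human that can act in the environment.
- Planner: represents centralized and decentralized planners.
- Tool: represents the abstractions that let an agent perceive or interact with the environment.
- Skill: low-level skills that an agent can use to interact with the environment.
Dataset generation: the PARTNR dataset is generated with large language models (LLMs).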
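
To make the relationship between Agent, Planner, Tool, and Skill concrete, here is a minimal sketch. Every class, method, and tool name in it (`Skill.execute`, `Tool.run`, `Agent.act`, `ToyCentralizedPlanner`, `"Clean"`) is an illustrative assumption and not the habitat-llm API; it only shows one plausible way the four pieces can fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Skill:
    """Low-level behaviour, e.g. navigating or picking (hypothetical)."""
    name: str
    execute: Callable[[str], str]  # target -> textual outcome


@dataclass
class Tool:
    """What a planner calls; in this sketch it simply wraps one skill."""
    name: str
    skill: Skill

    def run(self, target: str) -> str:
        return self.skill.execute(target)


@dataclass
class Agent:
    """A robot or human that acts in the environment through its tools."""
    uid: int
    tools: Dict[str, Tool] = field(default_factory=dict)

    def act(self, tool_name: str, target: str) -> str:
        if tool_name not in self.tools:
            return f"agent_{self.uid} has no tool '{tool_name}'"
        return self.tools[tool_name].run(target)


class ToyCentralizedPlanner:
    """Toy centralized planner: one fixed plan whose steps are assigned to agents."""

    def __init__(self, plan: List[Tuple[int, str, str]]):
        self.plan = plan  # (agent uid, tool name, target)

    def run(self, agents: Dict[int, Agent]) -> List[str]:
        return [agents[uid].act(tool, target) for uid, tool, target in self.plan]


navigate = Tool("Navigate", Skill("nav", lambda t: f"moved to {t}"))
clean = Tool("Clean", Skill("clean", lambda t: f"cleaned {t}"))

# Assumed capability split, for illustration only.
robot = Agent(0, {"Navigate": navigate})
human = Agent(1, {"Navigate": navigate, "Clean": clean})

planner = ToyCentralizedPlanner(
    [(0, "Navigate", "table"), (1, "Clean", "plate"), (0, "Clean", "plate")]
)
log = planner.run({0: robot, 1: human})
print(log)
```

The last step fails on purpose: giving the two agents different tool sets is one simple way to model heterogeneous capabilities, so a planner has to assign each step to an agent that can actually carry it out.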

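Code Organization

habitat-llm:
- Agent: represents a robot or a human.
- Tools: represents the abstractions for perceiving or interacting with the environment.
- Planner: represents centralized and decentralized planners.
- LLM: contains abstractions over the Llama and GPT APIs.
- WorldGraph: contains a hierarchical world graph representing rooms, furniture, and objects.
- Perception: contains the simulated perception pipeline, which sends local detections to the world model.
- Examples: contains demo and evaluation programs for showcasing or analyzing planner performance.
- EvaluationRunner: represents the abstraction that runs the planners.
- Conf: contains the Hydra configuration files for all classes.
- Utils: contains utility methods used throughout the codebase.
- Tests: contains unit tests.
scripts:
- hitl_analysis: contains scripts for analyzing and replaying human-in-the-loop traces.
- prediviz: contains visualization and annotation tools for PARTNR tasks.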
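
The WorldGraph entry describes a hierarchy of rooms, furniture, and objects. A minimal sketch of such a hierarchy follows; `ToyWorldGraph`, `place`, and `path` are hypothetical names chosen for this example, and the real WorldGraph class may be organized quite differently.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ToyWorldGraph:
    """Tree of house -> rooms -> furniture -> objects, stored as parent links."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {"house": None}
        self.children: Dict[str, List[str]] = defaultdict(list)

    def place(self, node: str, parent: str) -> None:
        """Add a node, or move it if it is already in the graph."""
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        old = self.parent.get(node)
        if old is not None:
            self.children[old].remove(node)
        self.parent[node] = parent
        self.children[parent].append(node)

    def path(self, node: str) -> List[str]:
        """Return the chain of nodes from the root down to the given node."""
        chain: List[str] = []
        current: Optional[str] = node
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain[::-1]


graph = ToyWorldGraph()
graph.place("kitchen", "house")
graph.place("living_room", "house")
graph.place("counter", "kitchen")
graph.place("sofa", "living_room")
graph.place("mug", "counter")

before = graph.path("mug")
graph.place("mug", "sofa")  # e.g. after a detection reports the mug elsewhere
after = graph.path("mug")
print(before, after)
```

Storing a single parent per node keeps a move down to one re-link, which is convenient when the graph is updated repeatedly from new detections.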

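Information Flow

EnvironmentInterface: reads each agent's observations and sends them to the perception module.
Perception Module: processes the observations and updates the world graph.
Planner: uses the world graph and the task description to select a tool for interacting with the environment.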
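
The three stages above form a loop: observe, update the world model, then choose a tool. The sketch below mimics that loop with plain dictionaries. The function names, the scripted observations, and the "explore until every task object is known" rule are all assumptions made for illustration, not the behaviour of the actual EnvironmentInterface, perception module, or planners.

```python
from typing import Dict, List, Tuple

Observation = List[Tuple[str, str]]  # (object, furniture it was detected on)


def read_observations(step: int) -> Dict[int, Observation]:
    """Stand-in for an environment interface: per-agent local detections."""
    scripted = [
        {0: [("mug", "counter")], 1: []},
        {0: [], 1: [("plate", "table")]},
    ]
    return scripted[step]


def update_world(world: Dict[str, str], observations: Dict[int, Observation]) -> None:
    """Stand-in for perception: merge every agent's detections into one world model."""
    for detections in observations.values():
        for obj, furniture in detections:
            world[obj] = furniture


def choose_tool(world: Dict[str, str], task_objects: List[str]) -> Tuple[str, str]:
    """Stand-in for a planner: explore until all task objects are known, then act."""
    for obj in task_objects:
        if obj not in world:
            return ("Explore", obj)
    return ("Navigate", world[task_objects[0]])


world: Dict[str, str] = {}
decisions = []
for step in range(2):
    update_world(world, read_observations(step))
    decisions.append(choose_tool(world, ["mug", "plate"]))
print(decisions)
```

In the first step only the mug has been seen, so the toy planner keeps exploring; once the second agent reports the plate, the shared world model is complete and the planner moves on to acting.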

Installation

See INSTALLATION.md for installation instructions.

Quick Start

Dataset splits: train_2k, val, train, val_mini
Examples:
- Decentralized Multi Agent React Summary:
    python -m habitat_llm.examples.planner_demo \
      --config-name baselines/decentralized_zero_shot_react_summary.yaml \
      habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
      evaluation.agents.agent_0.planner.plan_config.llm.inference_mode=hf \
      evaluation.agents.agent_1.planner.plan_config.llm.inference_mode=hf \
      evaluation.agents.agent_0.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct \
      evaluation.agents.agent_1.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
- Centralized Multi Agent React Summary:
    python -m habitat_llm.examples.planner_demo \
      --config-name baselines/centralized_zero_shot_react_summary.yaml \
      habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
      evaluation.planner.plan_config.llm.inference_mode=hf \
      evaluation.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
- Single Agent React Summary:
    python -m habitat_llm.examples.planner_demo \
      --config-name baselines/single_agent_zero_shot_react_summary.yaml \
      habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
      evaluation.agents.agent_0.planner.plan_config.llm.inference_mode=hf \
      evaluation.agents.agent_0.planner.plan_config.llm.generation_params.engine=meta-llama/Meta-Llama-3-8B-Instruct
- Heuristic Planner:
    python -m habitat_llm.examples.planner_demo \
      --config-name baselines/heuristic_full_obs.yaml \
      habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz"
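
The decentralized example repeats the same two LLM overrides for each agent. As a convenience, a small helper can assemble that argument list. This is a sketch: the function `decentralized_command` is hypothetical, while the module name, config name, and override keys are copied from the decentralized command above. It only builds and prints the command; running it still requires an installed habitat-llm environment.

```python
import shlex
from typing import List

DATA_PATH = "data/datasets/partnr_episodes/v0_0/val_mini.json.gz"
ENGINE = "meta-llama/Meta-Llama-3-8B-Instruct"


def decentralized_command(num_agents: int) -> List[str]:
    """Build argv for the decentralized baseline, one set of LLM overrides per agent."""
    argv = [
        "python", "-m", "habitat_llm.examples.planner_demo",
        "--config-name", "baselines/decentralized_zero_shot_react_summary.yaml",
        f"habitat.dataset.data_path={DATA_PATH}",
    ]
    for i in range(num_agents):
        prefix = f"evaluation.agents.agent_{i}.planner.plan_config.llm"
        argv.append(f"{prefix}.inference_mode=hf")
        argv.append(f"{prefix}.generation_params.engine={ENGINE}")
    return argv


argv = decentralized_command(2)
command_line = shlex.join(argv)
print(command_line)
# To launch from the repository root, something like:
#   import subprocess; subprocess.run(argv, check=True)
```

Building the list in code keeps the per-agent overrides in sync, so switching the engine for every agent means changing a single constant.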

Computing Results

Use python scripts/read_results.py <output_dir>/<dataset_name> to check the progress and results of a run.

Testing the Dataset

Use the following command to check that the episodes of a dataset are runnable and to measure the success rate of their initial steps:
    HYDRA_FULL_ERROR=1 python -m habitat_llm.examples.verify_episodes \
      --config-name examples/planner_multi_agent_demo_config.yaml \
      hydra.run.dir="." \
      evaluation=centralized_evaluation_runner_multi_agent \
      habitat.dataset.data_path="data/datasets/partnr_episodes/v0_0/val_mini.json.gz" \
      mode=data \
      world_model.partial_obs=False \
      evaluation.type="centralized" \
      num_proc=5

License

License: MIT

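Curated Summary

Dataset Introduction

Construction

The PARTNR dataset is built through a semi-automated task generation pipeline that uses large language models (LLMs) to generate natural language tasks and a simulator to verify and filter them. First, an LLM generates tasks and evaluation functions, which are grounded in the objects and furniture of the simulated houses. Next, simulation-in-the-loop filtering removes hallucinated and infeasible instructions, complemented by human annotation to improve diversity and accuracy. Finally, 1,000 verified instructions and evaluation functions serve as seeds for in-context prompting, with which the LLM generates the 100,000 tasks.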
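
The filtering step can be illustrated with a toy grounding check. This is only a sketch under strong assumptions: the scene inventory, the candidate task format, and `is_grounded` are invented for this example, and the real pipeline verifies tasks inside a simulator rather than by set membership.

```python
from typing import Dict, List, Set

# Hypothetical scene inventory: what actually exists in one simulated house.
SCENE: Dict[str, Set[str]] = {
    "objects": {"mug", "plate", "book"},
    "furniture": {"counter", "table", "shelf"},
}

# Hypothetical LLM proposals: each lists the entities its instruction refers to.
candidates: List[Dict] = [
    {"instruction": "Move the mug to the table.",
     "objects": ["mug"], "furniture": ["table"]},
    {"instruction": "Put the toaster on the counter.",
     "objects": ["toaster"], "furniture": ["counter"]},
    {"instruction": "Place the book on the piano.",
     "objects": ["book"], "furniture": ["piano"]},
]


def is_grounded(task: Dict, scene: Dict[str, Set[str]]) -> bool:
    """Keep a task only if every entity it references exists in the scene."""
    return (set(task["objects"]) <= scene["objects"]
            and set(task["furniture"]) <= scene["furniture"])


kept = [t["instruction"] for t in candidates if is_grounded(t, SCENE)]
rejected = [t["instruction"] for t in candidates if not is_grounded(t, SCENE)]
print("kept:", kept)
print("rejected:", rejected)
```

The two rejected instructions mention a toaster and a piano that the scene does not contain, which is the kind of hallucinated reference a grounding filter is meant to catch.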

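Features

The PARTNR dataset has the following features: 1. it contains 100,000 natural language tasks spanning 60 houses and 5,819 unique objects; 2. its task types are constraint-free, spatial, temporal, and heterogeneous, and they emphasize collaboration dynamics such as dividing up a task and tracking a partner's progress; 3. tasks are generated with simulation-in-the-loop, which keeps them realistic and executable; 4. it provides a large-scale, systematic evaluation methodology that reveals significant limitations of current state-of-the-art models.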
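
To show what a temporal constraint adds to a task, the following sketch scores a trajectory against two placement goals that must be reached in order. The proposition format, the function names, and the scoring rule (a later goal counts only if the earlier goal was reached first) are assumptions for illustration and are not PARTNR's evaluation functions.

```python
from typing import Dict, List, Optional, Tuple

Proposition = Tuple[str, str]  # (object, furniture it should end up on)


def first_satisfied(states: List[Dict[str, str]], prop: Proposition) -> Optional[int]:
    """Index of the first state in which the object rests on the target, else None."""
    obj, target = prop
    for i, state in enumerate(states):
        if state.get(obj) == target:
            return i
    return None


def evaluate(states: List[Dict[str, str]],
             props: List[Proposition],
             before: List[Tuple[int, int]]) -> float:
    """Fraction of propositions satisfied while respecting the temporal order."""
    times = [first_satisfied(states, p) for p in props]
    done = [t is not None for t in times]
    for earlier, later in before:
        # The later proposition only counts if the earlier one was reached first.
        if done[later] and (times[earlier] is None or times[earlier] >= times[later]):
            done[later] = False
    return sum(done) / len(props)


# Toy task: "Move the plate to the counter, then put the mug on the table."
props = [("plate", "counter"), ("mug", "table")]
order = [(0, 1)]  # proposition 0 must be satisfied before proposition 1

in_order = [
    {"plate": "table", "mug": "shelf"},
    {"plate": "counter", "mug": "shelf"},
    {"plate": "counter", "mug": "table"},
]
out_of_order = [
    {"plate": "table", "mug": "shelf"},
    {"plate": "table", "mug": "table"},
    {"plate": "counter", "mug": "table"},
]

score_in_order = evaluate(in_order, props, order)
score_out_of_order = evaluate(out_of_order, props, order)
print(score_in_order, score_out_of_order)
```

Both trajectories end in the same final state, yet only the first respects the requested order. A check on the final state alone would miss that difference.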

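Usage

The PARTNR dataset can be used to evaluate and train embodied AI agents on planning and reasoning in multi-agent tasks. Typical uses include: 1. reproducing experiments and running benchmarks with the provided codebase and dataset; 2. developing and testing new planning and reasoning algorithms with the dataset's natural language tasks and evaluation functions; 3. using the human-in-the-loop evaluation tools to assess how well agents collaborate with real humans and to collect data for improving models.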

Background and Challenges

Background Overview

PARTNR, a benchmark for Planning and Reasoning in Embodied Multi-agent Tasks, was introduced in 2024 by a team of researchers from FAIR Meta. The dataset aims to study human-robot coordination in household activities, featuring 100,000 natural language tasks across 60 houses and 5,819 unique objects. The tasks exhibit characteristics of everyday activities, such as spatial, temporal, and heterogeneous agent capability constraints. The dataset was created using a semi-automated task generation pipeline with Large Language Models (LLMs) and simulation-in-the-loop for grounding and verification. PARTNR's significance lies in its ability to systematically evaluate planning approaches and highlight the challenges facing collaborative embodied agents, driving research in this direction.

Current Challenges

The primary challenge addressed by PARTNR is the lack of realistic benchmarks that evaluate robots in collaborative settings. The dataset tackles this by providing a large-scale, diverse set of tasks that require effective collaboration dynamics, such as task division and tracking partner’s progress. The creation of such a benchmark involves significant challenges, including the need for a semi-automated generation method using LLMs with simulation-in-the-loop grounding to filter out hallucinations and infeasible instructions. Additionally, the analysis of state-of-the-art LLMs on PARTNR tasks reveals significant limitations in coordination, task tracking, and recovery from errors, underscoring the potential for improvement in these models.

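Common Scenarios

Typical Use Cases

The typical use of the PARTNR dataset is to evaluate planning and reasoning in embodied multi-agent tasks. By simulating collaborative tasks from everyday household activities, with spatial, temporal, and heterogeneous agent capability constraints, the dataset offers a rich set of natural language tasks and evaluation functions for research on human-robot collaboration.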

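Derived Related Work

The release of the PARTNR dataset has prompted a line of related work, including the application of improved LLMs to multi-agent collaboration tasks, the optimization of planning algorithms on PARTNR, and research on deploying human-robot collaboration systems in practice. This work further advances embodied AI, particularly in multi-agent collaboration and natural language interaction.

Recent Research on the Dataset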