Curriculum learning and reward shaping for multi-robot navigation via deep reinforcement learning

Name: Curriculum learning and reward shaping for multi-robot navigation via deep reinforcement learning
Creator: Thammasat University
Published: 2026-01-23 10:00:13
License: No description available

Source: DataCite Commons (updated 2026-01-23, indexed 2026-05-04)

Download link:

http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2025.66

Description:

Autonomous multi-robot navigation in unknown environments remains a challenging task, as robots must make real-time decisions under dynamic and partially observable conditions without relying on global information or inter-robot communication. In recent years, the use of mobile robots has expanded significantly across diverse fields, transforming automation and efficiency in applications such as manufacturing, warehousing, logistics, healthcare, and service industries. This study presents a deep reinforcement learning–based (DRL-based) framework for multi-robot navigation and exploration in unknown environments, addressing the limitations of traditional and hybrid path planning approaches, which rely on predefined maps and consequently incur high computational demands. The proposed framework employs the twin delayed deep deterministic policy gradient algorithm with prioritized experience replay (TD3-PER) to enhance learning efficiency and adaptability without requiring prior environmental information. To mitigate suboptimal navigation behaviors such as oscillation, stagnation, and convergence to local optima, a structured reward function was designed through reward shaping. Furthermore, a curriculum learning strategy was introduced to improve scalability and robustness as the number of robots increases, followed by a policy-switching mechanism that enables adaptive decision-making without inter-robot communication. Experimental results in a virtual environment show that the proposed framework reduces training time and improves the success rate across various types of environments and scenarios. In exploration tasks, the switching model demonstrated the best balance between speed, stability, and fault tolerance. Overall, the findings establish that the integration of reinforcement learning, reward shaping, curriculum learning, and policy switching provides a scalable and robust foundation for multi-robot systems operating in unknown and dynamic environments.
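
The description names prioritized experience replay (PER) as the replay component paired with TD3 but gives no implementation details. As a rough illustration only, the sketch below shows the commonly used proportional variant of PER in plain Python. The class name, the hyperparameters (alpha, beta, eps), and the choice of the proportional variant are all assumptions made for this example and are not taken from the thesis.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay. Illustrative sketch, not the thesis code."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # assumed value: how strongly priorities skew sampling
        self.eps = eps          # assumed value: keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # which makes them likely to be sampled soon after insertion.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias from non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a critic update, the priority becomes |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a TD3 training loop, one would typically call `sample` before each critic update, scale the per-sample critic loss by the returned weights, and then pass the new TD errors to `update_priorities`. How the thesis wires this into its own training procedure is not stated in the description.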

Provider:

Thammasat University

Created:

2026-01-23
