"Delay-Aware Lyapunov-Enhanced Multi-Agent Soft Actor-Critic Algorithm"

Name: "Delay-Aware Lyapunov-Enhanced Multi-Agent Soft Actor-Critic Algorithm"
Creator: IEEE DataPort
Published: 2025-12-04 14:52:58
License: No description available

Source: DataCite Commons. Updated: 2025-12-04. Indexed: 2026-05-03.

Download link:

https://ieee-dataport.org/documents/delay-aware-lyapunov-enhanced-multi-agent-soft-actor-critic-algorithm

Resource description:

"Ensuring stability in cooperative multi-agent systems remains a principal challenge for multi-agent reinforcement learning, particularly when agents operate over communication networks with delays and external disturbances. This paper proposes the Lyapunov-enhanced Soft Actor-Critic Algorithm with Delay Compensation (LSAC-delay), which integrates Lyapunov stability theory and algebraic graph theory into the maximum-entropy Multi-Agent Reinforcement Learning (MARL) framework. The algorithm defines a consensus-based error over a communication graph and employs delay-aware Lyapunov penalties that adaptively scale based on current communication conditions, enabling fully decentralized execution while maintaining provable stability guarantees. We establish exponential stability in mean square for the ideal case without delays or disturbances. Extending this result, we prove exponential ultimate boundedness for the realistic case with bounded communication delays and external disturbances, providing explicit bounds on the system's ultimate error in terms of delay and disturbance magnitudes. Experimental results show that LSAC-delay achieves better performance and robustness than the baseline algorithm under communication delays, supporting its effectiveness for secure and reliable cooperative control in networked multi-agent systems."

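The description mentions a consensus-based error defined over a communication graph and a Lyapunov penalty that scales with communication conditions, but it gives no formulas. The following is a minimal illustrative sketch only, not the LSAC-delay method itself. It assumes the error is the graph Laplacian applied to the agent states, a quadratic Lyapunov candidate, and a penalty weight that shrinks with the observed delay. The function names, the `decay` and `base_weight` parameters, and the direction of the delay scaling are all hypothetical.

```python
import numpy as np


def laplacian(adjacency):
    """Graph Laplacian L = D - A of an undirected communication graph."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency


def consensus_error(adjacency, states):
    """Assumed error: e = L x, which is zero when all agents agree."""
    return laplacian(adjacency) @ np.asarray(states, dtype=float)


def lyapunov_value(error):
    """Assumed quadratic Lyapunov candidate V(e) = ||e||^2."""
    return float(np.sum(np.asarray(error, dtype=float) ** 2))


def delay_aware_penalty(v_now, v_next, delay_steps, decay=0.1, base_weight=1.0):
    """Penalize violations of the decrease condition V' <= (1 - decay) * V.

    The weight base_weight / (1 + delay_steps) is a hypothetical choice
    of how the penalty might adapt to the current delay.
    """
    violation = max(0.0, v_next - (1.0 - decay) * v_now)
    weight = base_weight / (1.0 + delay_steps)
    return weight * violation


# Three agents on a path graph: 0 - 1 - 2
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
error = consensus_error(adjacency, [1.0, 2.0, 4.0])
value = lyapunov_value(error)
penalty = delay_aware_penalty(value, value, delay_steps=1)
```

In a maximum-entropy actor-critic setting, a term like `penalty` would typically be subtracted from the reward or added to the actor loss. How the original work combines it with the SAC objective is not stated in the description.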

Provider:

IEEE DataPort

Created:

2025-12-04
