Path Following Method of Six-DOF Fixed-Wing UAV Based on Hierarchical Deep Reinforcement Learning

Source: 中国科学数据 (China Scientific Data). Updated 2026-04-13; indexed 2026-04-25.

Download link:

https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0070197


Resource description:

Path following is a core capability for fixed-wing Unmanned Aerial Vehicles (UAVs). Under six-Degrees-of-Freedom (six-DOF) dynamics, a fixed-wing UAV is a nonlinear system whose high-dimensional continuous state and action spaces make it difficult to control and guide. This work proposes a hierarchical reinforcement learning framework that decomposes path following into two separate problems: control and guidance. For the control problem, it introduces Proximal Policy Optimization with Differential Compensator (PPO-DC), which adds a differential compensator to PPO and yields faster convergence and better control stability. Experimental results show that PPO-DC improves convergence speed by approximately 2.5 times over the standard PPO algorithm and achieves better control accuracy. Moreover, models trained for one specific control task adapt well to other control tasks. For the guidance problem, fixed-wing UAV guidance is modeled and a guidance strategy is proposed. A cumulative reward design is also proposed to handle the sequential learning of multiple objectives in reinforcement learning tasks, so that training converges effectively. Experimental results show that the hierarchical framework performs well across a variety of complex path-following scenarios, keeping the average path-following error of the fixed-wing UAV below 20 meters.
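The description reports an average path-following error but does not say how that error is computed. As an illustration only, the sketch below assumes one common definition: the mean, over logged UAV positions, of the distance to the nearest point on a piecewise-linear reference path. The function names, the waypoint representation, and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (3-D tuples, meters)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_len_sq = sum(c * c for c in ab)
    if ab_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.dist(p, a)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = sum(x * y for x, y in zip(ap, ab)) / ab_len_sq
    t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def mean_path_following_error(trajectory, waypoints):
    """Mean distance from each logged position to the nearest path segment.

    Assumed definition for illustration; the paper may define the error differently.
    """
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints to define a path")
    if not trajectory:
        raise ValueError("trajectory is empty")
    segments = list(zip(waypoints, waypoints[1:]))
    errors = [
        min(point_to_segment_distance(p, a, b) for a, b in segments)
        for p in trajectory
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical straight reference path at 100 m altitude.
    path = [(0.0, 0.0, 100.0), (1000.0, 0.0, 100.0)]
    # Hypothetical logged positions with 10 m, 20 m, and 30 m deviations.
    flown = [(0.0, 10.0, 100.0), (500.0, -20.0, 100.0), (1000.0, 0.0, 130.0)]
    print(mean_path_following_error(flown, path))
```

Under this assumed definition, a result below 20 would correspond to the "less than 20 meters" figure quoted in the description; other definitions (for example, horizontal cross-track error only) would give different numbers for the same flight log.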

Created: 2026-04-13
