Code underlying the publication: Safe, Efficient, Comfort, and Energy-Saving Automated Driving Through Roundabout Based on Deep Reinforcement Learning

Name: Code underlying the publication: Safe, Efficient, Comfort, and Energy-Saving Automated Driving Through Roundabout Based on Deep Reinforcement Learning
Creator: Yuan, Henan; van Arem, Bart; Kang, Liujiang; Li, Penghui
Published: 2025-02-20
License: Not specified

Source: 4TU.ResearchData. Updated 2025-02-20; indexed 2026-04-23.

Download link:

https://data.4tu.nl/datasets/c1020a3f-0053-491f-8ead-35d18819d37e/1

Description:

This is the code related to the publication: H. Yuan, P. Li, B. Van Arem, L. Kang, H. Farah and Y. Dong, "Safe, Efficient, Comfort, and Energy-Saving Automated Driving Through Roundabout Based on Deep Reinforcement Learning," 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 2023, pp. 6074-6079, doi: 10.1109/ITSC57777.2023.10422488.

Keywords: Road transportation; Deep learning; Energy consumption; Merging; Reinforcement learning; Safety; Testing.

The implementation is based on Python, Stable-Baselines3 (https://stable-baselines3.readthedocs.io/en/master/) and the Highway_env simulation environment (https://github.com/Farama-Foundation/HighwayEnv).

Abstract: Traffic scenarios in roundabouts pose substantial complexity for automated driving. Manually mapping all possible scenarios into a state space is labor-intensive and challenging. Deep reinforcement learning (DRL), with its ability to learn from interacting with the environment, emerges as a promising solution for training such automated driving models. This study explores, employs, and implements three DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO), to guide automated vehicles through roundabouts. The driving state space, action space, and reward function are designed. The reward function considers safety, efficiency, comfort, and energy consumption to align with real-world requirements. All three tested DRL algorithms succeed in enabling automated vehicles to drive through the roundabout. To holistically evaluate the performance of these algorithms, this study establishes an evaluation methodology considering multiple indicators, i.e., safety, efficiency, comfort, and energy consumption level. A method employing the Analytic Hierarchy Process is also developed to weigh these evaluation indicators. Experimental results on various testing scenarios reveal that the TRPO algorithm outperforms DDPG and PPO in terms of safety and efficiency, while PPO performs best in terms of comfort level and energy consumption. Lastly, to verify the model's adaptability and robustness in other driving scenarios, this study also deploys the model trained by TRPO to a range of different testing scenarios, e.g., highway driving and merging. Experimental results demonstrate that the TRPO model trained only on roundabout driving scenarios exhibits a certain degree of proficiency in highway driving and merging scenarios. This study provides a foundation for the application of automated driving with DRL.
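The description mentions that the Analytic Hierarchy Process is used to weigh the four evaluation indicators (safety, efficiency, comfort, energy consumption). As a rough illustration of how such weights can be derived, the sketch below applies the row geometric-mean approximation to a pairwise comparison matrix and checks its consistency ratio. The comparison values, the names `INDICATORS`, `PAIRWISE`, `ahp_weights` and `consistency_ratio`, and the choice of the geometric-mean approximation are all assumptions made for this example; they are not taken from the publication or its code.

```python
import math

# Indicator names follow the description; the judgments below are HYPOTHETICAL
# values on Saaty's 1-9 scale, chosen only for illustration.
INDICATORS = ["safety", "efficiency", "comfort", "energy"]
PAIRWISE = [
    [1.0, 3.0, 5.0, 5.0],   # safety compared with each indicator
    [1 / 3, 1.0, 3.0, 3.0], # efficiency
    [1 / 5, 1 / 3, 1.0, 1.0],  # comfort
    [1 / 5, 1 / 3, 1.0, 1.0],  # energy consumption
]

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Estimate lambda_max from (A w)_i / w_i and return CI / RI."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


weights = ahp_weights(PAIRWISE)
cr = consistency_ratio(PAIRWISE, weights)

for name, w in zip(INDICATORS, weights):
    print(f"{name}: {w:.3f}")
print(f"consistency ratio: {cr:.3f}")
```

A consistency ratio below 0.1 is the usual rule of thumb for accepting a set of pairwise judgments; the resulting weights could then be combined with per-indicator scores into a single evaluation score.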


Provided by:

Yuan, Henan; van Arem, Bart; Kang, Liujiang; Li, Penghui

Created:

2025-02-20
