Traffic Signal Control Based on Improved Deep Reinforcement Learning Strategy

DOI: 10.12677/MOS.2024.131014, pp. 136-150

Keywords: Deep Reinforcement Learning, Traffic Signal Control, SUMO, Intelligent Transportation, Machine Learning


Abstract:

Traffic signal control at intersections plays a crucial role in managing traffic congestion. However, the fixed-cycle control used by most existing traffic signals is inefficient and often leads to significant vehicle delays and energy waste. To address this issue, a deep reinforcement learning algorithm was employed to interact with the environment and learn the optimal control strategy. In the initial stage of the agent's learning, an action-value evaluation network was established to enrich the agent's learning experience and help it acquire congestion-mitigation skills more quickly. The proposed model is based on the Double Dueling Deep Q-Network (3DQN) algorithm, using vehicle position information as the input, the four phases of the intersection as the action space, and the difference in cumulative waiting time before and after executing an action as the reward. The model was evaluated in the Simulation of Urban Mobility (SUMO) traffic simulator. Experimental results show that the proposed model increases the cumulative reward by 58.9%, 51.9%, 51.3%, and 48% over DQN, Double DQN, Dueling DQN, and 3DQN, respectively, demonstrating that the improved learning strategy effectively enhances various traffic indicators.
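The abstract names the three MDP ingredients (vehicle positions as state, four phases as actions, the waiting-time difference as reward) and the 3DQN architecture, but the paper's code is not reproduced on this page. The sketch below illustrates, under stated assumptions, what those pieces might look like in Python with PyTorch and SUMO's TraCI interface; the class name `DuelingQNet`, the layer sizes, the lane list, and the traffic-light ID `"tls0"` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): the 3DQN ingredients named
# in the abstract, i.e. a dueling Q-network, a double-DQN bootstrap target,
# and a reward equal to the drop in cumulative waiting time after an action.
# Layer sizes, lane lists, and the traffic-light ID "tls0" are assumptions.
import torch
import torch.nn as nn
import traci  # SUMO's Python control interface; requires a running simulation

NUM_PHASES = 4  # the four intersection phases form the action space

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)         # state-value stream V(s)
        self.adv = nn.Linear(hidden, NUM_PHASES)  # advantage stream A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, s_next: torch.Tensor,
                      gamma: float = 0.99) -> torch.Tensor:
    """Double-DQN: the online net selects the next action, the slowly updated
    target net evaluates it, which curbs Q-value overestimation."""
    with torch.no_grad():
        best = online(s_next).argmax(dim=1, keepdim=True)
        return reward + gamma * target(s_next).gather(1, best).squeeze(1)

def total_waiting(lanes):
    """Cumulative waiting time over the incoming lanes of the intersection."""
    return sum(traci.lane.getWaitingTime(lane) for lane in lanes)

def step_with_reward(lanes, action, green_steps=10, tls_id="tls0"):
    """Apply a phase and return the reward described in the abstract:
    waiting time before the action minus waiting time after it."""
    before = total_waiting(lanes)
    traci.trafficlight.setPhase(tls_id, action)
    for _ in range(green_steps):
        traci.simulationStep()
    return before - total_waiting(lanes)
```

The "double" and "dueling" pieces together are what the 3DQN name abbreviates; the paper's stated improvement, the extra action-value evaluation network used early in training to seed the agent's experience, is described only at this level of detail in the abstract and is therefore not sketched above.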

References

[1]  Webster, F.V. (1958) Traffic Signal Settings.
https://trid.trb.org/view/113579
[2]  Vincent, R.A. and Peirce, J.R. (1988) “MOVA”: Traffic Responsive, Self-Optimising Signal Control for Isolated Intersections.
https://trid.trb.org/view/295257
[3]  Kronborg, P. and Davidsson, F. (1993) MOVA and LHOVRA: Traffic Signal Control for Isolated Intersections. Traffic Engineering and Control, 34, 195-200.
[4]  Kronborg, P. and Davidsson, F. (1996) Development and Field Trials of the New SOS Algorithm for Optimising Signal Control at Isolated Intersections. IEE Conference Publication, 42, 80-84.
https://doi.org/10.1049/cp:19960295
[5]  Sims, A.G. (1979) The Sydney Coordinated Adaptive Traffic System. Engineering Foundation Conference on Research Directions in Computer Control of Urban Traffic Systems, Pacific Grove, 1979, 12-27.
[6]  Hunt, P.B., Robertson, D.I., Bretherton, R.D., et al. (1981) SCOOT-A Traffic Responsive Method of Coordinating Signals.
https://trid.trb.org/view/179439
[7]  Wang, S., Xie, X., Huang, K., et al. (2019) Deep Reinforcement Learning-Based Traffic Signal Control Using High-Resolution Event-Based Data. Entropy, 21, 744.
https://doi.org/10.3390/e21080744
[8]  Luo, J., Li, X. and Zheng, Y. (2020) Researches on Intelligent Traffic Signal Control Based on Deep Reinforcement Learning. 2020 16th International Conference on Mobility, Sensing and Networking (MSN), Tokyo, 17-19 December 2020, 729-734.
https://doi.org/10.1109/MSN50589.2020.00124
[9]  Tang, M., Zhou, D. and Li, T. (2022) Deep Reinforcement Learning Traffic Signal Control Combined with State Prediction. Application Research of Computers, 39, 2311-2315. (In Chinese)
https://doi.org/10.19734/j.issn.1001-3695.2021.12.0704
[10]  Ren, A., Zhou, D., Feng, J., et al. (2023) Deep Reinforcement Learning Traffic Signal Control Based on Attention Mechanism. Application Research of Computers, 40, 430-434. (In Chinese)
https://doi.org/10.19734/j.issn.1001-3695.2022.06.0334
[11]  Wang, X., Wang, S., Liang, X., et al. (2022) Deep Reinforcement Learning: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 33, 1-15.
https://doi.org/10.1109/TNNLS.2022.3207346
[12]  Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015) Human-Level Control through Deep Reinforcement Learning. Nature, 518, 529-533.
https://doi.org/10.1038/nature14236
[13]  Van Hasselt, H., Guez, A. and Silver, D. (2016) Deep Reinforcement Learning with Double Q-Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 30, 2094-2100.
https://doi.org/10.1609/aaai.v30i1.10295
[14]  Hasselt, H. (2010) Double Q-Learning. Advances in Neural Information Processing Systems, 23, 2613-2621.
[15]  Wang, Z., Schaul, T., Hessel, M., et al. (2016) Dueling Network Architectures for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning (ICML'16), New York, 19-24 June 2016, 1995-2003.
[16]  Liu, Z., Ye, B., Zhu, Y., et al. (2022) Traffic Signal Control Method Based on Deep Reinforcement Learning. Journal of Zhejiang University (Engineering Science), 56, 1249-1256. (In Chinese)
[17]  Mousavi, S.S., Schukat, M. and Howley, E. (2017) Traffic Light Control Using Deep Policy-Gradient and Value-Function-Based Reinforcement Learning. IET Intelligent Transport Systems, 11, 417-423.
https://doi.org/10.1049/iet-its.2017.0153
[18]  Haji, S.H. and Abdulazeez, A.M. (2021) Comparison of Optimization Techniques Based on Gradient Descent Algorithm: A Review. PalArch’s Journal of Archaeology of Egypt/Egyptology, 18, 2715-2743.
[19]  Monga, R. and Mehta, D. (2022) Sumo (Simulation of Urban Mobility) and OSM (Open Street Map) Implementation. 2022 11th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, 16-17 December 2022, 534-538.
https://doi.org/10.1109/SMART55829.2022.10046720
