Toward Trustworthy Decision-Making for Autonomous Vehicles: A Robust Reinforcement Learning Approach with Safety Guarantees

Xiangkun He; Wenhui Huang; Chen Lv

doi:10.1016/j.eng.2023.10.005

Engineering, 2024, Vol. 33, Issue 2: 77-89. DOI: 10.1016/j.eng.2023.10.005

Research Article

Toward Trustworthy Decision-Making for Autonomous Vehicles: A Robust Reinforcement Learning Approach with Safety Guarantees

Abstract

Autonomous vehicles are vital components of intelligent transportation systems, yet ensuring the trustworthiness of their decision-making remains a substantial challenge in realizing autonomous driving. To address this challenge, we present a novel robust reinforcement learning approach with safety guarantees to attain trustworthy decision-making for autonomous vehicles. The proposed technique ensures decision trustworthiness in terms of policy robustness and collision safety. Specifically, an adversary model is learned online to simulate the worst-case uncertainty by approximating the optimal adversarial perturbations on the observed states and environmental dynamics. In addition, an adversarial robust actor-critic algorithm is developed to enable the agent to learn policies that are robust against perturbations in observations and dynamics. Moreover, we devise a safety mask that guarantees the collision safety of the autonomous driving agent during both training and testing, using an interpretable knowledge model known as the Responsibility-Sensitive Safety model. Finally, the proposed approach is evaluated through both simulations and experiments. The results indicate that the autonomous driving agent can make trustworthy decisions and drastically reduce the number of collisions through robust safety policies.

Keywords

Autonomous vehicle / Decision-making / Reinforcement learning / Adversarial attack / Safety guarantee
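
The safety mask mentioned in the abstract can be illustrated with a minimal sketch. It assumes the longitudinal minimum-gap rule of the Responsibility-Sensitive Safety formulation and a small discrete action set, where each action leads to a lane with a known gap to the lead vehicle. The function names, the parameter values (response time `rho`, acceleration and braking limits), and the fallback action are hypothetical choices for illustration only; the abstract does not specify how the authors construct their mask.

```python
import numpy as np


def rss_min_gap(v_rear, v_front, rho=0.5, a_accel_max=3.0, b_min=4.0, b_max=8.0):
    """RSS minimum safe longitudinal gap in metres.

    v_rear, v_front: speeds (m/s) of the following and leading vehicle.
    rho: response time (s); a_accel_max: worst-case acceleration of the
    rear vehicle during rho; b_min: its guaranteed braking afterwards;
    b_max: hardest braking of the front vehicle.
    All default values are illustrative assumptions.
    """
    v_rear_after = v_rear + rho * a_accel_max
    gap = (v_rear * rho
           + 0.5 * a_accel_max * rho ** 2
           + v_rear_after ** 2 / (2.0 * b_min)
           - v_front ** 2 / (2.0 * b_max))
    return max(gap, 0.0)


def safety_mask(gaps, v_ego, v_fronts):
    """Return a boolean array, True where the action keeps an RSS-safe gap.

    gaps[i] and v_fronts[i] describe the lead vehicle the ego car would
    follow after taking discrete action i (hypothetical action semantics).
    """
    required = np.array([rss_min_gap(v_ego, vf) for vf in v_fronts])
    return np.asarray(gaps, dtype=float) >= required


def masked_greedy_action(q_values, mask, fallback=0):
    """Choose the highest-valued safe action; use `fallback` if none is safe."""
    if not mask.any():
        return fallback
    masked = np.where(mask, np.asarray(q_values, dtype=float), -np.inf)
    return int(np.argmax(masked))


if __name__ == "__main__":
    mask = safety_mask(gaps=[30.0, 60.0, 50.0], v_ego=20.0,
                       v_fronts=[20.0, 20.0, 25.0])
    action = masked_greedy_action([5.0, 1.0, 2.0], mask)
    print(mask, action)
```

In this sketch the first action has the highest value but is masked out because its 30 m gap is below the required RSS distance, so the agent selects among the remaining safe actions. Applying the same mask during both training and testing mirrors the abstract's description, although the actual interaction with the actor-critic update is not covered here.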

Cite this article

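Xiangkun He, Wenhui Huang, Chen Lv. Toward Trustworthy Decision-Making for Autonomous Vehicles: A Robust Reinforcement Learning Approach with Safety Guarantees. Engineering. 2024, 33(2): 77-89. https://doi.org/10.1016/j.eng.2023.10.005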
