Toward Trustworthy Decision-Making for Autonomous Vehicles: A Robust Reinforcement Learning Approach with Safety Guarantees

Engineering (English edition), 2024, Vol. 33, Issue 2: 86-99. DOI: 10.1016/j.eng.2023.10.005

Research Article

Article history

Received: 2022-10-15
Accepted: 2023-10-18
Issue Date: 2024-06-12


Abstract

Although autonomous vehicles are vital components of intelligent transportation systems, ensuring the trustworthiness of their decision-making remains a substantial challenge to the large-scale deployment of autonomous driving. We therefore present a novel robust reinforcement learning approach with safety guarantees to attain trustworthy decision-making for autonomous vehicles. The proposed technique ensures decision trustworthiness in terms of both policy robustness and collision safety. Specifically, an adversary model is learned online to simulate worst-case uncertainty by approximating the optimal adversarial perturbations on the observed states and environmental dynamics. In addition, an adversarial robust actor-critic algorithm is developed to enable the agent to learn policies that are robust to perturbations in both observations and dynamics. Moreover, we devise a safety mask, based on an interpretable knowledge model known as the Responsibility-Sensitive Safety (RSS) model, to guarantee the collision safety of the autonomous driving agent during both training and testing. Finally, the proposed approach is evaluated through both simulations and experiments. The results indicate that, with the learned robust and safe policy, the autonomous driving agent can make trustworthy decisions and significantly reduce the number of collisions.
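To make the idea of a safety mask built on the Responsibility-Sensitive Safety model more concrete, the following is a minimal sketch and not the authors' implementation. It uses the standard RSS minimum longitudinal gap for a rear vehicle following a front vehicle in the same direction. The function names, the three-action set, the parameter values (response time `rho`, acceleration and braking bounds), and the "brake only when the gap is unsafe" rule are all assumptions chosen for illustration; the abstract does not specify how the mask is constructed.

```python
def rss_min_gap(v_rear, v_front, rho=1.0, a_accel_max=2.0, b_min=4.0, b_max=8.0):
    """Standard RSS minimum safe longitudinal gap in meters.

    v_rear, v_front: speeds (m/s) of the following and leading vehicle.
    rho: response time (s). a_accel_max: worst-case acceleration of the
    rear vehicle during rho. b_min: minimum braking the rear vehicle
    applies afterwards. b_max: maximum braking of the front vehicle.
    All default values are illustrative assumptions.
    """
    v_rear_after = v_rear + rho * a_accel_max  # rear speed after the response time
    gap = (
        v_rear * rho
        + 0.5 * a_accel_max * rho ** 2
        + v_rear_after ** 2 / (2.0 * b_min)
        - v_front ** 2 / (2.0 * b_max)
    )
    return max(gap, 0.0)


# Hypothetical discrete action set for the ego (rear) vehicle.
ACTIONS = ("accelerate", "keep", "brake")


def safety_mask(gap, v_ego, v_front):
    """Return one boolean per action: True if the action is allowed.

    Assumed rule: if the current gap is below the RSS minimum,
    only braking is allowed; otherwise every action is allowed.
    """
    if gap >= rss_min_gap(v_ego, v_front):
        return [True] * len(ACTIONS)
    return [action == "brake" for action in ACTIONS]


def masked_action(scores, mask):
    """Pick the highest-scoring action among those the mask allows."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    return max(allowed, key=lambda i: scores[i])


if __name__ == "__main__":
    scores = [0.9, 0.05, 0.05]  # policy prefers "accelerate"
    mask = safety_mask(gap=30.0, v_ego=20.0, v_front=20.0)
    print(ACTIONS[masked_action(scores, mask)])
```

In this sketch the mask is applied to the policy's action scores before an action is selected, so the same filter can be used unchanged during training and testing, which is the property the abstract attributes to the safety mask.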