
Engineering >> 2023, Volume 21, Issue 2. doi: 10.1016/j.eng.2022.05.017

A Human-in-the-Loop Deep Reinforcement Learning Algorithm and Its Application to Intelligent Decision-Making in Autonomous Driving

School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore

Received: 2021-10-09; Revised: 2022-04-04; Accepted: 2022-05-10; Published: 2022-07-20


Abstract

Owing to its limited intelligence and capability, machine learning currently cannot handle the full variety of real-world situations and therefore cannot fully replace humans in practical applications. Because humans exhibit robustness and adaptability in complex scenarios, it is crucial to bring humans into the training loop of artificial intelligence (AI) and leverage human intelligence to further improve machine learning algorithms. This study develops a real-time human-guidance-based (Hug) deep reinforcement learning (DRL) method for policy training in an end-to-end autonomous driving case. Through a newly designed mechanism for control transfer between the human and the automation, the human can intervene in real time during model training and correct the agent's unreasonable actions. Building on this human-in-the-loop guidance mechanism, an improved actor-critic architecture with modified policy and value networks is developed. The fast convergence of the proposed Hug-DRL allows real-time human guidance to be fused into the agent's training loop, further improving the efficiency and performance of DRL. The developed method is validated through human-in-the-loop experiments with 40 subjects and compared against other state-of-the-art learning approaches. The results indicate that the proposed method effectively improves the training efficiency and performance of the DRL algorithm under human guidance, without imposing specific requirements on the participants' expertise or experience.
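To make the two mechanisms summarized above more concrete, the sketch below illustrates (i) real-time control transfer, in which a human override replaces the agent's action and the transition is flagged in the replay buffer, and (ii) an actor-critic update whose policy loss adds an imitation term on the human-guided steps. This is a minimal illustrative sketch, not the authors' implementation: the helper read_human_input, the imitation weight beta, the network sizes, and the Gym-style environment interface are assumptions introduced for illustration, and the update follows a generic deterministic actor-critic form rather than the paper's specific modified policy and value networks.

```python
# Minimal human-in-the-loop actor-critic sketch (illustrative only, not the
# authors' released code). Assumptions: `read_human_input` returns a human
# action when the operator intervenes and None otherwise; the environment
# follows the classic Gym step signature; `beta` weights the imitation term.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)


class Critic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def collect_step(env, actor, obs, buffer, read_human_input):
    """Control transfer: a human action, when present, overrides the agent's."""
    with torch.no_grad():
        agent_act = actor(torch.as_tensor(obs, dtype=torch.float32)).numpy()
    human_act = read_human_input()             # None when the human does not intervene
    act = human_act if human_act is not None else agent_act
    next_obs, rew, done, info = env.step(act)  # classic Gym step signature assumed
    buffer.append((obs, act, rew, next_obs, float(done),
                   float(human_act is not None)))
    return next_obs, done


def update(actor, critic, actor_opt, critic_opt, batch, gamma=0.99, beta=1.0):
    """One update; human-guided samples also shape the policy directly."""
    # All entries are tensors; rew, done, human_flag have shape (B, 1).
    obs, act, rew, next_obs, done, human_flag = batch

    # Critic: one-step TD target bootstrapped with the current policy.
    with torch.no_grad():
        target_q = rew + gamma * (1.0 - done) * critic(next_obs, actor(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize Q, plus an imitation penalty toward the recorded human
    # actions on the steps where the human intervened (human_flag == 1).
    pred_act = actor(obs)
    rl_loss = -critic(obs, pred_act).mean()
    imitation = (human_flag * (pred_act - act).pow(2).sum(dim=-1, keepdim=True)).mean()
    actor_opt.zero_grad()
    (rl_loss + beta * imitation).backward()
    actor_opt.step()
```

In this form, steps without human intervention reduce to a standard off-policy actor-critic update, while intervened steps additionally pull the policy toward the recorded human action, which is one simple way to fuse real-time guidance into the training loop.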

Supplementary Material

Figures: Fig. 1 – Fig. 7 (images not included in this extract)

