Embodied Interactive Intelligence Towards Autonomous Driving

Nan Ma, Jia Pan, Yongjin Liu, Yajue Yang, Yiheng Han, Jiacheng Guo, Zhixuan Wu, Zecheng Yang, Zhiwei Yang, Deyi Li

Engineering ›› 202509032 | DOI: 10.1016/j.eng.2025.09.032

Research Article

Abstract

Autonomous driving depends on successful interactions among humans, vehicles, and roads. However, people often lack an understanding of autonomous vehicle (AV) behaviours and decisions, and AVs in turn have difficulty aligning with human intentions during interactions. To overcome this lack of interactive intelligence, especially in complex and uncertain environments, we introduce the concept of embodied interactive intelligence towards autonomous driving (EIIAD), which establishes representation and learning methods aligned with the physical world and enhances human-machine integration. Building on this concept, we propose an end-to-end unified constrained vehicle-environment interaction (UniCVE) model, which constructs an end-to-end perception-cognition-behaviour closed-loop feedback paradigm and learns continuously from accumulated split driving scenarios. The model realizes interaction cognition through networks designed separately for pedestrians and vehicles, and it unifies this cognition as a value network that enables AVs to generate socially compatible behaviours. The UniCVE model is implemented on Dongfeng autonomous buses, which have successfully travelled 22 000 kilometres and completed 45 000 navigation tasks in Xiong'an New Area, China, demonstrating its general applicability across driving scenarios. In addition, we highlight the high-level interactive intelligence of the UniCVE model in selected simulated complex interaction scenarios, showing that it makes AVs more intelligent, more reliable, and more attuned to human relationships. Furthermore, the UniCVE model's capacity for self-learning and self-growth allows it to progressively approach true intelligence, even with limited experience.
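The abstract describes a perception-cognition-behaviour closed loop: perception splits the scene into pedestrian and vehicle streams, separate interaction networks assess each stream, their outputs are unified into a value network, and the resulting behaviour feeds accumulated scenarios back for continual learning. The sketch below illustrates only the shape of that loop; every class, function, and field name is invented for illustration and is not from the UniCVE implementation, and the hand-written scoring stands in for the learned networks.

```python
# Minimal sketch of a perception-cognition-behaviour closed loop.
# All names (Observation, ClosedLoopAgent, "risk") are hypothetical
# placeholders, not UniCVE APIs; the scoring rules stand in for
# the learned pedestrian/vehicle interaction and value networks.
from dataclasses import dataclass, field


@dataclass
class Observation:
    pedestrians: list  # per-pedestrian features (e.g. pose, motion)
    vehicles: list     # per-vehicle features (e.g. position, heading)


@dataclass
class ClosedLoopAgent:
    memory: list = field(default_factory=list)  # accumulated scenarios for continual learning

    def perceive(self, raw: dict) -> Observation:
        # Perception: split raw sensor input into pedestrian and vehicle streams.
        return Observation(pedestrians=raw.get("pedestrians", []),
                           vehicles=raw.get("vehicles", []))

    def cognize(self, obs: Observation) -> float:
        # Cognition: separate scores for pedestrians and vehicles,
        # unified into a single scalar (stand-in for the value network).
        ped_score = sum(p.get("risk", 0.0) for p in obs.pedestrians)
        veh_score = sum(v.get("risk", 0.0) for v in obs.vehicles)
        return -(ped_score + veh_score)  # higher value = safer interaction

    def act(self, value: float) -> str:
        # Behaviour: yield when the unified interaction value signals risk.
        return "yield" if value < -0.5 else "proceed"

    def step(self, raw: dict) -> str:
        # One pass of the closed loop, with feedback into memory.
        obs = self.perceive(raw)
        value = self.cognize(obs)
        action = self.act(value)
        self.memory.append((obs, value, action))
        return action
```

In the actual model the cognition stage would be learned networks rather than summed risk scores, but the loop structure (perceive, cognize, act, accumulate) matches the paradigm the abstract names.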

Keywords

Embodied interactive intelligence / Autonomous driving / Cognition behaviour / Continuous learning / Hypergraph learning

Cite this article

Nan Ma, Jia Pan, Yongjin Liu, Yajue Yang, Yiheng Han, Jiacheng Guo, Zhixuan Wu, Zecheng Yang, Zhiwei Yang, Deyi Li. Embodied Interactive Intelligence Towards Autonomous Driving. Engineering 202509032. DOI: 10.1016/j.eng.2025.09.032


