Robot Cognitive Learning by Considering Physical Properties

Fuchun Sun; Wenbing Huang; Yu Luo; Tianying Ji; Huaping Liu; He Liu; Jianwei Zhang

doi:10.1016/j.eng.2024.10.013


Engineering, 2025, Vol. 47, Issue 4: 168-179.

Research Article


Abstract

Humans achieve cognitive development through continuous interaction with their environment, enhancing both perception and behavior. However, current robots lack the capacity for human-like action and evolution, posing a bottleneck to improving robotic intelligence. Existing research predominantly models robots as one-way, static mappings from observations to actions, neglecting the dynamic processes of perception and behavior. This paper introduces a novel approach to robot cognitive learning that considers physical properties. We propose a theoretical framework in which a robot is conceptualized as a three-body physical system comprising a perception-body (P-body), a cognition-body (C-body), and a behavior-body (B-body). Each body engages in physical dynamics and operates within a closed-loop interaction. Three crucial interactions connect these bodies. First, the C-body relies on the states extracted by the P-body and, in return, provides long-term rewards that optimize the P-body’s perception policy. Second, the C-body directs the B-body’s actions through sub-goals, and the states subsequently derived by the P-body facilitate the learning of the C-body’s cognition dynamics. Finally, the B-body follows the sub-goal generated by the C-body and performs actions conditioned on the perceptive state from the P-body, which leads to the next interactive step. These interactions foster the joint evolution of the three bodies, culminating in an optimal design. To validate our approach, we use a navigation task with a four-legged robot, D’Kitty, equipped with a movable global camera. Successful navigation demands intricate coordination of sensing, planning, and D’Kitty’s motion. Our framework yields superior task performance compared with conventional methods. In conclusion, this paper establishes a paradigm shift in robot cognitive learning by integrating physical interactions across the P-body, C-body, and B-body while considering physical properties. The framework’s successful application to a navigation task underscores its efficacy in enhancing robotic intelligence.
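The closed loop among the three bodies can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: the class names `PBody`, `CBody`, and `BBody`, the function `run_episode`, the 2-D point world, the straight-line sub-goal rule, and the use of negative distance to the goal as a stand-in for the long-term reward are all hypothetical choices made here for illustration. Learning of the perception policy and of the cognition dynamics is omitted; only the direction of information flow described in the abstract is shown.

```python
import numpy as np


class PBody:
    """Perception-body (toy): extracts a state and collects rewards from the C-body."""

    def __init__(self):
        self.rewards = []  # long-term rewards fed back by the C-body

    def perceive(self, world_position):
        # Assumption: perception is perfect; a real P-body would process camera images.
        return np.asarray(world_position, dtype=float).copy()

    def receive_reward(self, reward):
        # Placeholder for perception-policy optimization, which is not modeled here.
        self.rewards.append(reward)


class CBody:
    """Cognition-body (toy): proposes sub-goals and scores perceived states."""

    def __init__(self, goal, horizon=0.5):
        self.goal = np.asarray(goal, dtype=float)
        self.horizon = horizon  # hypothetical maximum distance to a sub-goal

    def subgoal(self, state):
        delta = self.goal - state
        dist = np.linalg.norm(delta)
        if dist <= self.horizon:
            return self.goal.copy()
        return state + self.horizon * delta / dist

    def long_term_reward(self, state):
        # Assumption: negative distance to the goal stands in for a learned value.
        return -float(np.linalg.norm(self.goal - state))


class BBody:
    """Behavior-body (toy): acts toward the sub-goal given the perceived state."""

    def __init__(self, max_step=0.2):
        self.max_step = max_step  # hypothetical per-step motion limit

    def act(self, state, subgoal):
        delta = subgoal - state
        dist = np.linalg.norm(delta)
        if dist <= self.max_step:
            return delta
        return self.max_step * delta / dist


def run_episode(start, goal, steps=50):
    p_body, c_body, b_body = PBody(), CBody(goal), BBody()
    position = np.asarray(start, dtype=float)
    for _ in range(steps):
        state = p_body.perceive(position)          # P-body -> C-body and B-body
        subgoal = c_body.subgoal(state)            # C-body -> B-body
        position = position + b_body.act(state, subgoal)
        next_state = p_body.perceive(position)     # next interactive step
        p_body.receive_reward(c_body.long_term_reward(next_state))  # C-body -> P-body
    return position, p_body.rewards
```

Calling `run_episode(start=(0.0, 0.0), goal=(1.0, 1.0))` moves the point toward the goal while the P-body accumulates one reward per step. In a fuller treatment, each body would update its own parameters from these signals rather than follow fixed rules.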

Keywords

Robot learning / Physical basis / Cognitive learning

Cite this article

Fuchun Sun, Wenbing Huang, Yu Luo, et al. Robot Cognitive Learning by Considering Physical Properties. Engineering. 2025, 47(4): 168-179. https://doi.org/10.1016/j.eng.2024.10.013
