
Development of Mobile Manipulator Robot System with Embodied Intelligence
Embodied intelligence is a strategic technology in the new round of scientific and technological revolution and industrial transformation, and one of the frontier areas of intense international competition. Owing to their excellent mobility, planning, and execution capabilities, mobile manipulator robot systems have become the preferred hardware carrier for embodied intelligence; as autonomous platforms capable of cross-domain, multi-scenario, and multi-functional operation, embodied-intelligence-based mobile manipulator robot systems will be key to the development of next-generation information technology and artificial intelligence. Starting from the development needs of such systems, this study summarizes the current state of the field, analyzes the problems and challenges it faces, and identifies the key common technologies involved, including multimodal perception, world cognition and understanding, intelligent autonomous decision-making, and joint motion and manipulation planning. On this basis, the study offers recommendations covering national policy support, breakthroughs in common technologies, interdisciplinary development and talent cultivation, and the construction of comprehensive verification platforms, with the aim of promoting the sustained development of China's mobile manipulator robot field amid the wave of embodied intelligence.
embodied intelligence / mobile manipulator robot / joint task and motion planning / intelligent decision-making