Multimodal Feature Representation Mechanism for 3D Detection of Agricultural Obstacles with Few or Zero Samples

Tianhai Wang; Ning Wang; Shunda Li; Zhiwen Jin; Jianxing Xiao; Yanlong Miao; Yifan Sun; Han Li; Man Zhang

doi:10.1016/j.eng.2026.01.030

Engineering, 2026, Vol. 60, Issue 5: 77-89. DOI: 10.1016/j.eng.2026.01.030

Research Article

Multimodal Feature Representation Mechanism for 3D Detection of Agricultural Obstacles with Few or Zero Samples

Tianhai Wang ^a ,
Ning Wang ^b ,
Shunda Li ^a ,
Zhiwen Jin ^b ,
Jianxing Xiao ^a ,
Yanlong Miao ^c ,
Yifan Sun ^d ,
Han Li ^a^,^b ,
Man Zhang ^a^,^b^,^*

Abstract

Deep learning (DL) methods, particularly those that fuse camera and light detection and ranging (LiDAR) data, have demonstrated remarkable accuracy in three-dimensional (3D) obstacle detection, which is crucial for reliable autonomous navigation of agricultural machinery. However, recent approaches rely heavily on large-scale labeled datasets for training, which hinders their application in agriculture, where samples are scarce and differ markedly from one another. To overcome this limitation, this paper proposes a novel 3D detection method for agricultural obstacles with few or zero samples, based on a multimodal feature representation mechanism. Image and point cloud attitude adjusters are integrated to increase the accuracy, reliability, and consistency of the multimodal data. Semantic and geometry-intensity feature encoders are integrated to capture essential relationships among categories. A Bird's Eye View (BEV) fusion decoder is designed to discern intracategory similarities and intercategory distinctions. Multicategory experiments in various field scenarios show that the proposed method reduces the dependence on training samples by 30%–40% and achieves a precision, recall, F₁ score, and detection speed of 95.03%, 97.01%, 96.01%, and 16.56 frames per second (FPS), respectively. Even in completely unknown scenarios (i.e., obstacle categories that lack any corresponding training samples), the proposed method still achieves an acceptable F₁ score of 81.63%. These results indicate that the proposed method strikes a favorable balance among detection performance, operational efficiency, and data dependency, providing an effective safety guarantee for the autonomous navigation of agricultural machinery.
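As a quick consistency check on the reported metrics, the sketch below recomputes the F₁ score from the precision and recall quoted in the abstract, using the standard harmonic-mean definition. The function name `f1_score` is an illustrative choice, not taken from the paper, and the paper may compute or average its metrics differently (for example, per category), so agreement here is only a sanity check.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, in percent.
reported_precision = 95.03
reported_recall = 97.01

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 96.01%"
```

Under this definition the result matches the 96.01% stated in the abstract.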

Key words

3D obstacle detection / Multimodal representation / Camera-LiDAR fusion / Autonomous navigation / Smart agriculture

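Cite this article

Tianhai Wang, Ning Wang, Shunda Li, Zhiwen Jin, Jianxing Xiao, Yanlong Miao, Yifan Sun, Han Li, Man Zhang. Multimodal Feature Representation Mechanism for 3D Detection of Agricultural Obstacles with Few or Zero Samples[J]. Engineering, 2026, 60(5): 77-89. DOI: 10.1016/j.eng.2026.01.030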