Multimodal Feature Representation Mechanism for 3D Detection of Agricultural Obstacles with Few or Zero Samples

Tianhai Wang, Ning Wang, Shunda Li, Zhiwen Jin, Jianxing Xiao, Yanlong Miao, Yifan Sun, Han Li, Man Zhang

Engineering 202601030. DOI: 10.1016/j.eng.2026.01.030

Research Article

Abstract

Deep learning (DL) methods, particularly those that combine camera and light detection and ranging (LiDAR) data, have demonstrated remarkable accuracy in three-dimensional (3D) obstacle detection, which is crucial for the rigorous and reliable autonomous navigation of agricultural machinery. However, these approaches rely heavily on large-scale labeled datasets during training, which hinders their application in agriculture, where samples are scarce and highly distinctive. To overcome this limitation, this paper proposes a novel 3D detection method for agricultural obstacles with few or zero samples based on a multimodal feature representation mechanism. Image and point cloud attitude adjusters are integrated to improve the accuracy, reliability, and uniformity of multimodal data. Semantic and geometry-intensity feature encoders are integrated to capture essential relationships among categories. A bird's eye view (BEV) fusion decoder is designed to discern intracategory similarities and intercategory distinctions. Multicategory experiments in various field scenarios reveal that the proposed method reduces the dependence on training samples by 30%-40%, while the precision, recall, F1 score, and detection speed reach 95.03%, 97.01%, 96.01%, and 16.56 frames per second (FPS), respectively. Even in completely unknown scenarios (i.e., obstacle categories that lack any corresponding training samples), the proposed method still achieves an acceptable F1 score of 81.63%. These results indicate that the proposed method achieves a sophisticated trade-off among detection performance, operational efficiency, and data dependency, providing an effective safety guarantee for the autonomous navigation of agricultural machinery.
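The abstract names a BEV fusion decoder as one of the method's components. As background only (the paper's own implementation is not available here), the sketch below shows one common way a LiDAR point cloud is pooled into a BEV grid before fusion; the function name, grid extents, and cell size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def points_to_bev(points: np.ndarray,
                  x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                  cell=0.25) -> np.ndarray:
    """Illustrative sketch: pool an (N, 4) point cloud (x, y, z, intensity)
    into a 2D BEV grid by max-pooling intensity per cell. All grid
    parameters here are assumptions for demonstration."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((nx, ny), dtype=np.float32)
    # Keep only points that fall inside the grid extents.
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[m]
    # Map metric coordinates to integer cell indices.
    ix = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    # Unbuffered per-cell maximum, so repeated indices are handled correctly.
    np.maximum.at(bev, (ix, iy), pts[:, 3])
    return bev
```

A grid like this gives camera and LiDAR features a shared top-down coordinate frame, which is what makes BEV a convenient space for multimodal fusion.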

Keywords

3D obstacle detection / Multimodal representation / Camera-LiDAR fusion / Autonomous navigation / Smart agriculture
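The reported metrics above are internally consistent: the F1 score is the harmonic mean of precision and recall. A quick arithmetic check in plain Python (the helper below is local, not code from the paper):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in %)."""
    return 2 * precision * recall / (precision + recall)

# Reported values from the abstract: precision 95.03%, recall 97.01%.
print(round(f1_score(95.03, 97.01), 2))  # → 96.01, matching the reported F1
```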

Cite this article

Tianhai Wang, Ning Wang, Shunda Li, Zhiwen Jin, Jianxing Xiao, Yanlong Miao, Yifan Sun, Han Li, Man Zhang. Multimodal Feature Representation Mechanism for 3D Detection of Agricultural Obstacles with Few or Zero Samples. Engineering 202601030. DOI: 10.1016/j.eng.2026.01.030


