Multimodal Feature Representation Mechanism for 3D Detection of Agricultural Obstacles with Few or Zero Samples
Tianhai Wang, Ning Wang, Shunda Li, Zhiwen Jin, Jianxing Xiao, Yanlong Miao, Yifan Sun, Han Li, Man Zhang
Engineering: 202601030
Deep learning (DL) methods, particularly those that combine camera and light detection and ranging (LiDAR) data, have demonstrated remarkable accuracy in three-dimensional (3D) obstacle detection, which is crucial for safe and reliable autonomous navigation of agricultural machinery. However, recent approaches rely heavily on large-scale labeled datasets during training, which hinders their application in agriculture, where samples are scarce and highly distinctive. To overcome this limitation, this paper proposes a novel 3D detection method for agricultural obstacles with few or zero training samples, based on a multimodal feature representation mechanism. Image and point-cloud attitude adjusters are integrated to improve the accuracy, reliability, and uniformity of the multimodal data. Semantic and geometry-intensity feature encoders capture the essential relationships among obstacle categories. A bird's-eye view (BEV) fusion decoder is designed to discern intracategory similarities and intercategory distinctions. Multicategory experiments in various field scenarios show that the proposed method reduces the dependence on training samples by 30%-40% and achieves a precision of 95.03%, a recall of 97.01%, an F1 score of 96.01%, and a detection speed of 16.56 frames per second (FPS). Even in completely unknown scenarios (i.e., obstacle categories without any corresponding training samples), the method still achieves an acceptable F1 score of 81.63%. These results indicate that the proposed method strikes a balanced trade-off among detection performance, operational efficiency, and data dependency, providing an effective safety guarantee for the autonomous navigation of agricultural machinery.
3D obstacle detection / Multimodal representation / Camera-LiDAR fusion / Autonomous navigation / Smart agriculture
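As a quick consistency check on the figures reported in the abstract, the sketch below recomputes the F1 score from the stated precision and recall and converts the detection speed into an average per-frame latency. It assumes the standard harmonic-mean definition of F1 (the abstract does not give the formula) and assumes that the FPS figure covers the whole detection pipeline; the helper names `f1_score` and `frame_latency_ms` are hypothetical, chosen only for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frame_latency_ms(fps: float) -> float:
    """Average processing time per frame, in milliseconds, for a given FPS rate."""
    return 1000.0 / fps


# Values reported in the abstract (precision and recall in percent, speed in FPS).
precision, recall, fps = 95.03, 97.01, 16.56

print(f"F1: {f1_score(precision, recall):.2f}%")        # F1: 96.01%
print(f"Latency: {frame_latency_ms(fps):.1f} ms/frame")  # Latency: 60.4 ms/frame
```

Because F1 is a ratio of products and sums of the same scale, it can be computed directly on percentages; the recomputed 96.01% matches the reported value, and 16.56 FPS corresponds to roughly 60 ms per frame under the stated assumption.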