Advance on Agricultural Robot Hand–Eye Coordination for Agronomic Task: A Review

Liang He , Yuhuan Sun , Liping Chen , Qingchun Feng , Yajun Li , Jiewen Lin , Yicheng Qiao , Chunjiang Zhao

Engineering ›› 2025, Vol. 51 ›› Issue (8) : 279 -296.

Engineering ›› 2025, Vol. 51 ›› Issue (8) :279 -296. DOI: 10.1016/j.eng.2025.01.022

Abstract

To address the rising global agricultural labor costs, there is an urgent need for robots to accomplish some complex agronomic tasks and break through the limitations of traditional machinery. Thus, robots are considered an essential support for the future smart agriculture. Given that agronomic targets, such as plants and animals, are living organisms with diverse growth patterns and physical characteristics, effective hand–eye coordination is crucial for robots to interact with these targets proficiently. This paper reviews the developments in hand–eye coordination technology for agricultural robots, focusing on its configuration, principles, and applications in target detection and manipulation, based on a review of research literature and technical specifications of commercial products. Furthermore, the ongoing challenges in hand–eye coordination for robotic operations in complex agronomic tasks are analyzed and summarized, and the potential trends for overcoming these challenges are predicted. Finally, this review aims to deepen understanding of agricultural robots’ capabilities and inspire ongoing innovation to further their agricultural applications.


Keywords

Agricultural robot / Hand–eye coordination / Agronomic task / Target perception / Collision-free operation

Cite this article

Liang He, Yuhuan Sun, Liping Chen, Qingchun Feng, Yajun Li, Jiewen Lin, Yicheng Qiao, Chunjiang Zhao. Advance on Agricultural Robot Hand–Eye Coordination for Agronomic Task: A Review. Engineering, 2025, 51(8): 279-296 DOI:10.1016/j.eng.2025.01.022


1. Introduction

The aging population and rising labor costs have become common challenges for global agricultural development [1]. According to estimates from the United Nations Department of Economic and Social Affairs, Population Division, the global population is projected to reach approximately 10.3 billion by the mid-21st century [2], doubling the demand for agricultural products. However, the aging and decline of the current agricultural workforce is an ongoing trend in global demographics. Thus, enhancing agricultural mechanization by replacing manual labor with smart machines is essential for maintaining production efficiency and promoting sustainable development in modern agriculture.

In recent years, mechanization in the production of typical field crops in China, including corn, wheat, rice, and soybeans, has increased significantly, with the mechanization rate across the entire production process surpassing 80% [3]. However, the comprehensive mechanization rate for crops such as fruits and vegetables is less than 30% [4]. Complex agronomic operations such as fruit and vegetable harvesting, weed management, plant pruning, and precise pollination require the identification, localization, and selective, precise handling of animal and plant tissues of varying forms, and they remain highly dependent on manual labor [5], [6], [7]. At present, labor accounts for more than 42% of the total cost of fruit and vegetable production [8], which seriously restricts improvements in the industry's profitability.

Given the unique ability of robots to autonomously perceive, make decisions, and execute, they are capable of achieving mechanized operations for complex agricultural tasks. In recent years, with the support of artificial intelligence technology and modern agricultural production technology, agricultural robot products such as milking robots, driverless tractors, and picking robots continue to emerge, and it is expected that the agricultural robot market will reach 35.9 billion units in 2030 [9].

Traditional agricultural tasks are predominantly performed by human operators. These activities typically involve living plants and animals, which exhibit characteristics such as intertwined growth, irregular postures, varying sizes, and delicate fragility. Consequently, these complex tasks require humans to execute them using precise and gentle movements. Fig. 1 illustrates several traditional agronomic tasks that depend on human operators. As demonstrated in Fig. 1(a), manual milking requires a high level of skill and experience. The milker employs flexible finger movements to perform two main actions: repetitive squeezing and up-and-down stretching. Additionally, the milker must continuously monitor the milk flow to assess whether adjustments in pressure, frequency, or termination of the extraction are necessary. In the tomato picking task depicted in Fig. 1(b), the picker must first locate ripe fruits and then focus on the stem’s position. Using both hands, one hand holds the fruit securely while the other precisely cuts the stem with scissors. Moreover, when handling delicate and fragile items, ensuring flexibility and softness in operation is crucial. As shown in Figs. 1(c) and (d), when picking strawberries [10] and collecting eggs [11], it is vital to handle these items gently and apply an appropriate level of grip force to prevent damage [12].

Thus, robots tasked with executing complex agricultural operations encounter several significant challenges: ① The targets are unevenly spread over vast areas, which requires the robots to possess both high accessibility and precise maneuvering capabilities; ② the irregular and overlapping growth patterns of the targets require reliable hand–eye coordination to identify obscured objects and perform tasks without collisions; ③ the fragility of the targets demands fine motor skills and sensitivity, even requiring the robots to mimic the dexterity and gentleness typical of human handling.

Hand–eye coordination technology, a key component in robotic systems, has been extensively implemented in various complex agronomic tasks. For instance, the sweet pepper harvesting robot employs a hand–eye coordination unit mounted on a lifting platform to actively search for and pick mature fruits irregularly distributed on tall plants [13]. Tomatoes grow in unpredictable orientations and are often concealed by branches and leaves. By integrating hand–eye coordination technology with deep reinforcement learning algorithms, the end-effector can be driven to perform collision-free operations with the optimal grasping posture [14]. For delicate fruits such as grapes, strawberries, and other easily damaged berries, precise segmentation and stem localization by cameras are crucial during mechanical harvesting. The robot arm is then guided to perform accurate harvesting without causing harm. In these scenarios, hand–eye coordination technology plays a vital role in ensuring non-damaging harvests [15]. Despite variations in design, functionality, and environmental conditions, all of these robotic systems adhere to the fundamental principle of hand–eye coordination, where the hand operates under the guidance of the eye. Nevertheless, current agricultural robots still suffer from large errors in perceiving target information, low operating efficiency and success rates, and insufficient guarantees of damage-free, safe operation. Innovation in hand–eye coordination technology is therefore an important way to improve the performance of agricultural robots.

As illustrated in Fig. 2, the research and application of the hand–eye coordination system integrates multiple foundational disciplines. These include mechanical engineering, which provides robust structures for agricultural robots; electronic engineering, which establishes the critical bridge between software and hardware; control engineering, which enables precise manipulation of robots; artificial intelligence, which improves the robots’ decision-making and adaptive learning abilities; computer vision, which allows robots to perceive the environment accurately; and agricultural science, which promotes the integration of machinery and agronomy, optimizes agronomic conditions to simplify robotic operations, and supports the modernization of agricultural production.

With the support and integration of these technologies, agricultural robots for different agronomic tasks such as selective weeding, harvesting, pollination, and pruning are used to assist or replace humans in labor-intensive tasks, thereby alleviating labor shortages. In contrast to industrial robots, agricultural robots operate in more complex and unpredictable environments.

This paper starts with an introduction to the fundamental concepts and core functions of agricultural robots. The second section introduces the architecture, calibration methods, and control strategies of the hand–eye coordination system in detail. The third section summarizes the application of hand–eye coordination in the target perception of agricultural robots, including passive perception and active perception. The fourth section summarizes the application of hand–eye coordination in the target operation of agricultural robots. The fifth section analyzes the main challenges and development trends of hand–eye coordination in agricultural robots. The paper concludes with a summary of the key findings in the final section.

2. Configuration and principle of agricultural robot hand–eye system

2.1. Hand–eye system configuration

2.1.1. Robots based on eye-in-hand units

In the eye-in-hand configuration, the camera is typically positioned at the robot arm joint, allowing it to move in tandem with the arm and capture images from various angles. For instance, the broccoli harvesting robot [16], depicted in Fig. 3(a), utilizes an eye-in-hand unit. The camera is incorporated into the end-effector and connected to the robot arm. The arm guides the camera to observe, search for, and identify the broccoli from a top-down perspective. Once the target is correctly positioned in the camera’s field of view, the arm adjusts the end-effector to carry out the harvesting task. In contrast, strawberries, being fragile and susceptible to damage, require careful handling, often involving manipulation of the stem during harvest. As shown in Fig. 3(b), cameras on the multi-arm parallel strawberry harvesting robot [17] are mounted on the end-effector, allowing the robot to observe, recognize, locate, grasp, and cut the stem, ensuring non-destructive harvesting. Given the complexity of agricultural environments, occlusions can impede robotic operations. To address this, the tomato harvesting robot [18] employs the eye-in-hand unit to assess the relative positions of both targets and obstacles. By adjusting the camera’s angle, the robot can optimize its view, reducing the effects of occlusions and lighting fluctuations.

Milking in livestock farming is a labor-intensive process, but the use of robotic systems can greatly reduce the need for manual labor, improve milking efficiency, and maintain consistent milk quality. As illustrated in Fig. 3(c), the end-effector of the milking robot [19] is equipped with four teat cups and a camera. Guided by the camera, the robot arm adjusts its angle to navigate around teats that may obstruct each other, ensuring that the teat cups are accurately positioned and attached to the teats in sequence for automatic milking.

These examples clearly demonstrate that the eye-in-hand unit offers a flexible field of view, allowing it to detect and recognize targets from various perspectives and positions. This capability enhances the system’s perception range and overall performance by incorporating multi-scale information [20], [21].

2.1.2. Robots based on eye-to-hand unit

In the eye-to-hand unit, the camera is positioned on the robot’s frame rather than being mounted on the arm, offering a consistent perspective to monitor the operational environment in a global context. This configuration ensures that the relative position between the camera and the end-effector remains constant, allowing the hand–eye relative position to estimate the target’s location and guiding the hand to perform the task. For different tasks, the “hands” are not limited to robot arms but can also include lasers [22], weed knives [23], spraying systems [24], air delivery systems [25], and egg-picking mechanisms [26].

For example, as shown in Fig. 4(a), the Laserweeder is a laser-based weeding robot [22]. Its cameras and laser generators are installed on the robot platform, moving in unison with it. The perception system, equipped with 42 high-resolution cameras, scans the environment and provides real-time feedback on weed locations as the robot advances. This data is then used to activate 30 lasers for weed elimination. The hand–eye system delivers a response time of just 50 ms, enabling the Laserweeder to eradicate 99% of weeds and achieve a weeding rate of 300 000 weeds per hour. In addition to laser-based methods, selective weeding can also be accomplished through chemical and physical means, as shown in Figs. 4(b) and (c). These robots are also based on the eye-to-hand unit, where cameras identify weeds, which are subsequently controlled using sprayers [27] or weed cutters [28].

The pollination robot [25], depicted in Fig. 4(d), is equipped with four vertically arranged eye-to-hand units, each containing a camera and a wind-assisted pollination device. During operation, the cameras scan the tomato flowers at different heights while the wind-assisted devices perform selective pollination. Similarly, the egg collection robot [26], designed for free-range chicken farms, employs a front-mounted camera to detect and navigate toward eggs. Once the eggs enter the collection zone, they are systematically directed into a storage tank, guided by the egg-collection channel and picking mechanism. As shown in Fig. 4(e), the tomato harvesting robot [29] is equipped with a three-dimensional (3D) vision camera mounted on one side. While the robot moves forward, the 3D vision camera captures images of the tomatoes from a fixed perspective, providing crucial data for the robot arm’s operations.

The eye-to-hand unit is characterized by its simple design, quick response time, and extensive field of view, making it well-suited for handling targets arranged on relatively flat surfaces. Examples include kiwis grown on flat trellises [30], kiwi flowers awaiting pollination [31], weeds and crops dispersed on the ground [32], [33], pineapple and tea plants with similar growth heights [34], [35], and apples on two-dimensional (2D) standardized tree structures [36].

2.1.3. Robots based on multi-eye/arm

To combine the advantages of both hand–eye system configurations, several studies have fused these two structures by installing cameras on the robot arm and its base, thereby providing global and local perspectives simultaneously [37]. In citrus harvesting, the robot [38] first uses the global camera to assess the citrus distribution and plan the harvesting sequence. This information is then used to direct the eye-in-hand servo system, enabling the robot to approach and examine the citrus in closer detail. As shown in Fig. 5, precise identification and cutting of fruit stems are critical for harvesting cucumbers [39] and tomatoes [40]. The integration of global and local vision information allows effective detection and sequencing of spatially dispersed targets. It also facilitates the search and observation of detailed information, ensuring a higher operational accuracy and success rate. In addition, the sequence of operations can be planned in advance through vision information to improve efficiency. However, this approach also increases the complexity of the system structure and control algorithms, posing a significant challenge to decision systems.

Eye-to-hand and eye-in-hand structures can also be used for dual-arm and multi-arm collaborative work [41]. In the case of the multi-arm parallel strawberry picking robot, as shown in Fig. 3(b), each visual servo system comprises a camera and a robot arm, with each unit operating independently. In contrast, dual-arm and multi-arm robot systems are guided by a single camera whose observations simultaneously direct two or more arms. More importantly, this configuration enables collaborative task execution [42], allowing actions such as active obstacle avoidance, mimicking human operations, and performing other more complex tasks.

As shown in Fig. 6(a), the aubergine picking robot [43] is a representative example of dual-arm coordination, where one arm is tasked with clearing obstacles, and the other is dedicated to the picking process. Fig. 6(b) depicts a robot with three robot arms collaborating to harvest sweet peppers [44]. The central arm adjusts the camera to actively search for sweet peppers and identify their stem location. Once the target is locked, the right robot arm uses a flexible gripper to grasp the sweet pepper, while the left robot arm executes the stem cutting, thereby achieving coordinated harvesting.

The number of arms does not directly correlate with the performance of a robot. Increasing the number of arms can lead to interference between them, greater structural complexity, and a higher computational burden on the control system [45], [46].

2.2. Hand–eye relationship calibration

The purpose of calibration is to determine the spatial relationship between the eye and the hand. This procedure ensures accurate mapping of image data to the robot arm’s workspace, enabling the camera to direct the manipulator in executing precise visual servo tasks.

2.2.1. Offline calibration

Offline calibration refers to the process of establishing the hand–eye relationship using calibration tools when the system is not operational. Fig. 7(a) demonstrates the calibration procedure for an eye-in-hand system. In this configuration, the robot arm positions the camera at various angles to observe the calibration board. For each pose $i$, the relative pose $T_{1i}$ between the camera and the calibration board, as well as the pose $T_{2i}$ of the end-effector, are recorded. Because the calibration board stays fixed relative to the robot base, the transformation matrix $X$ that describes the relationship between the camera and the end-effector is then computed from the following constraint between consecutive poses:

$$T_{2i}\,X\,T_{1i} = T_{2(i-1)}\,X\,T_{1(i-1)}$$
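
The eye-in-hand constraint can be rearranged into the familiar $AX = XB$ form. The short derivation below is a sketch under stated assumptions that go beyond the text: the calibration board is fixed relative to the robot base, $T_{2i}$ is the end-effector pose in the base frame, $T_{1i}$ is the board pose in the camera frame, and $A_i$, $B_i$ are names introduced here for the relative motions.

$$
\begin{aligned}
T_{2i}\,X\,T_{1i} &= T_{2(i-1)}\,X\,T_{1(i-1)} \\
T_{2(i-1)}^{-1}\,T_{2i}\,X &= X\,T_{1(i-1)}\,T_{1i}^{-1} \\
A_i\,X &= X\,B_i, \qquad A_i = T_{2(i-1)}^{-1}\,T_{2i}, \quad B_i = T_{1(i-1)}\,T_{1i}^{-1}
\end{aligned}
$$

Each pair of consecutive poses contributes one such equation, so several poses with sufficiently different rotation axes are needed before $X$ is fully determined.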

Similarly, in the eye-to-hand system depicted in Fig. 7(b), the robot arm manipulates the calibration board to present the camera with various poses. For each pose $i$, the relative transformation $T_{1i}$ between the camera and the calibration board, as well as the pose $T_{2i}$ of the end-effector relative to the calibration board, are determined. The transformation matrix $X$, representing the relationship between the camera and the robot arm, is then calculated using the following equation:

$$X = T_{1i}\,T_{2i}^{-1}$$

The kiwifruit harvesting robot [47] initially employs a checkerboard pattern to calibrate the vision system, thereby computing the intrinsic and extrinsic parameters of the camera. Subsequently, a circular dot calibration board is utilized to calibrate the stereo vision system, establishing the image matching relationship for the binocular vision system. Furthermore, through offline calibration, the coordinate system of the stereo vision system is aligned with the base coordinate system of the robotic arm.

In addition to checkerboard and circle grid calibration boards, ArUco markers are also commonly used offline calibration tools. For instance, in the case of the tomato picking robot with an eye-in-hand configuration [48], the ArUco marker is fixed, so the transformation matrix ${}^{b}T_{t}$, which represents the relationship between the base and the calibration board (tool), remains constant. Since the camera is mounted on the hand, the relative transformation matrix ${}^{h}T_{e}$ between the hand and the eye is constant but unknown. For each pose $i$, the transformation matrix ${}^{b}T_{h_i}$ from the end-effector to the base and the transformation matrix ${}^{e_i}T_{t}$ from the ArUco marker calibration board to the camera are both known. This leads to the following equation:

$${}^{b}T_{h_i}\;{}^{h}T_{e}\;{}^{e_i}T_{t} = {}^{b}T_{t}$$

This equation can be used to gather multiple calibration data points, and the least squares method can be applied to compute the relative transformation matrix ${}^{h}T_{e}$. Similarly, the peach packaging robot [49] employed an eye-to-hand coordination system, where hand–eye calibration was conducted to establish the relative spatial relationship between the camera and the base. During the calibration process, three methods were evaluated: Tsai–Lenz [50], Park [51], and Horaud [52]. Among these, the method proposed by Park utilizes global optimization to estimate the hand–eye relationship, while the approach by Horaud employs geometric constraints to compute the hand–eye relationship.

In addition to calculating the hand–eye relationship directly through formulas, artificial neural networks have also been shown to be capable of computing this relationship [53], [54]. Neural networks excel at modeling nonlinear functions and have been reported to offer better stability and accuracy than traditional approaches for solving transformation matrices. However, irrespective of whether the system is eye-to-hand or eye-in-hand, offline calibration remains the most widely used method across various fields [49], [55], [56], [57].

Offline calibration is also applicable in robotic systems featuring multiple cameras or arms. For instance, in robots equipped with a “global–local” vision system, the global camera is used to estimate the approximate position of the target, while the local camera and robot execute precise tasks according to the relationship established through offline hand–eye calibration [39], [58]. In dual-arm robot systems, each arm can be calibrated separately with the camera to ensure an accurate operational foundation. However, determining the spatial relative positions of the arms is essential for preventing potential collisions when operating simultaneously [45]. Similarly, in a three-arm collaborative robot, offline calibration is used to establish the hand–eye relationship between the camera and one of the arms [44]. Since the bases of these arms are relatively fixed, the operational coordinates of the other two arms can be determined using basic matrix operations.
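
The "basic matrix operations" mentioned for the three-arm case amount to chaining homogeneous transforms. The sketch below is illustrative only: the frame names, mounting offsets, and target coordinates are hypothetical values introduced here, not taken from the cited robots.

```python
import numpy as np

def make_transform(rotation_z_deg, translation):
    """Build a 4x4 homogeneous transform: rotation about z plus a translation."""
    a = np.deg2rad(rotation_z_deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a),  np.cos(a), 0.0],
                 [0.0,        0.0,       1.0]]
    T[:3, 3] = translation
    return T

def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    return (T @ np.append(p, 1.0))[:3]

# Hypothetical values (metres, degrees), for illustration only.
# Camera pose in arm A's base frame, as obtained from offline hand-eye calibration.
T_baseA_cam = make_transform(90.0, [0.5, 0.0, 0.8])
# Fixed mounting poses of arm B's and arm C's bases in arm A's base frame.
T_baseA_baseB = make_transform(0.0, [0.0, 0.6, 0.0])
T_baseA_baseC = make_transform(180.0, [0.0, -0.6, 0.0])

p_cam = np.array([0.1, 0.2, 1.0])  # target position measured by the camera

# Camera frame -> arm A base frame, then into the other two base frames.
p_baseA = transform_point(T_baseA_cam, p_cam)
p_baseB = transform_point(np.linalg.inv(T_baseA_baseB), p_baseA)
p_baseC = transform_point(np.linalg.inv(T_baseA_baseC), p_baseA)
```

Only one hand–eye calibration is needed in this arrangement; the other two arms reuse it through the fixed base-to-base transforms, so any error in those mounting offsets carries over directly to the target coordinates.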

2.2.2. Online calibration

The hand–eye pose relationship is generally considered to remain fixed throughout the robot’s operation. Consequently, the accuracy of the robot’s performance is heavily dependent on the precision of its hand–eye calibration. However, in practical agricultural scenarios, factors such as unavoidable vibrations, external impacts, and fluctuations in load can disrupt calibration accuracy [59]. To overcome these challenges, researchers have developed more adaptive, efficient, and autonomous online calibration algorithms. These techniques allow for real-time estimation of the hand–eye relationship, even while the robotic arm is actively in use.

For example, in surgical robotics, the frequent replacement of end-effectors requires rapid recalibration. Online calibration enables the swift re-establishment of the camera–robot arm relationship without halting the surgical procedure [60], thus overcoming the inflexibility of traditional offline calibration methods. In harsh environments like deep sea or space, where extreme temperature or pressure conditions may hinder offline calibration, online calibration algorithms enable the robot to self-calibrate autonomously, without human intervention [61]. Similarly, in situations where calibration tools are unavailable or the robot’s range of motion is limited, the hand–eye relationship can be estimated using the Kronecker product. Sparse bundle adjustment can then be applied to refine the estimates, yielding more accurate and robust calibration results [62].

Online calibration can also face challenges due to unexpected occlusions of the calibration board. To address this, Lin et al. [63] proposed a method that combines ArUco markers with a chessboard. This approach ensures that camera decoding remains unaffected, even when the calibration board is partially occluded, thus effectively overcoming the occlusion interference encountered in traditional calibration methods. During the calibration process, the robot arm adjusts the camera’s pose to capture sufficient calibration board images while simultaneously collecting the joint data of the robot arm. The AX = XB equation is then solved for online hand–eye calibration, where A represents the transformation matrix from the camera coordinate system to the world coordinate system, B represents the transformation matrix from the robot end-effector coordinate system to the world coordinate system, and X represents the transformation matrix from the camera coordinate system to the robot end-effector coordinate system.
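
To make the AX = XB step concrete, the following is a minimal NumPy sketch of one common way to solve it. It is not the implementation used in [63]. It assumes that each A and B is a relative motion between a pair of poses (the usual formulation of the problem), recovers the rotation by aligning the rotation axes of the paired motions with an SVD, and then recovers the translation by linear least squares. All function names are hypothetical.

```python
import numpy as np

def rot_log(R):
    """Rotation matrix -> rotation vector (valid for angles well below pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-9:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * w / (2.0 * np.sin(theta))

def rot_exp(r):
    """Rotation vector -> rotation matrix (Rodrigues formula)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def solve_ax_xb(As, Bs):
    """Least-squares estimate of X from motion pairs satisfying A_i X = X B_i."""
    # Rotation part: R_A R_X = R_X R_B implies log(R_A) = R_X log(R_B),
    # so R_X is the rotation that best aligns the two sets of rotation vectors.
    alphas = np.array([rot_log(A[:3, :3]) for A in As])
    betas = np.array([rot_log(B[:3, :3]) for B in Bs])
    U, _, Vt = np.linalg.svd(betas.T @ alphas)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R_X = Vt.T @ D @ U.T
    # Translation part: (R_A - I) t_X = R_X t_B - t_A, stacked over all pairs.
    M = np.vstack([A[:3, :3] - np.eye(3) for A in As])
    d = np.concatenate([R_X @ B[:3, 3] - A[:3, 3] for A, B in zip(As, Bs)])
    t_X = np.linalg.lstsq(M, d, rcond=None)[0]
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R_X, t_X
    return X

# Synthetic check: build noise-free motion pairs from a known X and recover it.
rng = np.random.default_rng(0)

def random_transform():
    T = np.eye(4)
    T[:3, :3] = rot_exp(rng.uniform(-0.8, 0.8, 3))
    T[:3, 3] = rng.uniform(-0.5, 0.5, 3)
    return T

X_true = random_transform()
Bs = [random_transform() for _ in range(5)]
As = [X_true @ B @ np.linalg.inv(X_true) for B in Bs]
X_est = solve_ax_xb(As, Bs)
```

At least two motions with non-parallel rotation axes are required for a unique solution. With noisy measurements, a closed-form estimate like this is often used as the starting point for a nonlinear refinement.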

To improve the accuracy of online calibration, Zhang et al. [64] proposed a method based on a discrete view quality field (DVQF). This method assigns a score to each AprilTags image, quantifying its contribution to reducing uncertainty in the calibration results and improving the diversity of the robot’s pose. Using this approach, the system can select and guide the robot arm to the next-best-view (NBV), capturing the most valuable images to enhance calibration accuracy.

Online calibration offers significant advantages, including automation [65] and real-time operation [66]. In agricultural robotics, for instance, online calibration helps mitigate precision loss due to prolonged use or collisions, enhancing the automation process and making these systems more accessible to non-technical users. Despite these advantages, research on online calibration for agricultural robots remains limited, primarily due to the need for real-time processing of large data volumes and the high-quality data requirements, which present significant challenges for the system’s computational power and data filtering capabilities.

2.2.3. Uncalibrated

To reduce the dependence of robot performance on hand–eye calibration accuracy and to avoid frequent recalibration, some studies have proposed uncalibrated methods, which fall into two main categories. The first category focuses on constructing a Jacobian matrix to define the local linear relationship between two vector spaces [67]. In this approach, the Jacobian matrix maps the robot’s joint velocities (or end-effector velocities) to the velocities of image feature points. This allows visual information to be directly translated into control signals, which guide the robot to achieve the desired target position or orientation [68]. For example, a depth-independent composite Jacobian matrix [69] can be used to represent the robot’s visual and physical parameters in a linear form. By combining this matrix with adaptive control laws and innovative estimation algorithms, uncalibrated visual servoing control can be realized. The infinite homography matrix (IHM) [70] is often employed to convert the motion of image features into rotational and translational movements of the robot’s end-effector. Additionally, using the Kronecker product to derive linear equations and employing Kalman filtering for online estimation of the Jacobian matrix facilitates precise control in uncalibrated conditions. While estimating the Jacobian matrix using adaptive algorithms requires high accuracy in mathematical modeling, it simplifies the calibration process and enhances the adaptability and robustness of the system.
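
The Jacobian-based idea can be illustrated with a small simulation. The sketch below estimates the image Jacobian online with a Broyden rank-one update, which is a simpler stand-in for the Kalman-filter estimators mentioned above and is not the method of any cited work. The two-joint, two-feature linear "camera" and all numeric values are hypothetical.

```python
import numpy as np

def broyden_update(J, dq, df):
    """Rank-one secant update so that the new estimate satisfies J_new @ dq == df."""
    denom = dq @ dq
    if denom < 1e-12:  # skip the update when the motion is negligible
        return J
    return J + np.outer(df - J @ dq, dq) / denom

def servo(feature_fn, q0, f_target, J0, gain=0.5, steps=50):
    """Drive image features to f_target using only an estimated Jacobian."""
    q, J = np.array(q0, dtype=float), np.array(J0, dtype=float)
    f = feature_fn(q)
    for _ in range(steps):
        dq = -gain * np.linalg.pinv(J) @ (f - f_target)  # joint-space step
        q = q + dq
        f_new = feature_fn(q)
        J = broyden_update(J, dq, f_new - f)             # refine from observed motion
        f = f_new
    return q, J, np.linalg.norm(f - f_target)

# Hypothetical system: the true mapping is unknown to the controller.
J_true = np.array([[120.0, 30.0], [-20.0, 100.0]])       # pixels per radian
feature_fn = lambda q: J_true @ q
J0 = np.array([[100.0, 0.0], [0.0, 100.0]])              # rough initial guess
f_target = feature_fn(np.array([0.3, -0.2]))

q_final, J_est, err = servo(feature_fn, [0.0, 0.0], f_target, J0)
```

The controller never uses `J_true` or any hand–eye transform; it relies only on observed feature changes. On a real robot the mapping is nonlinear and noisy, which is one reason filtered estimators are preferred over a raw secant update.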

The second category involves constructing a reinforcement learning model to learn control strategies from image data. For example, Luo et al. [71] proposed an efficient two-stage keypoint detection network to achieve real-time inference of sparse image plane features. This method integrates robust keypoints with the visual servoing control loop in image space without needing hand–eye calibration, enabling precise real-time grasping. Furthermore, the deep deterministic policy gradient (DDPG) algorithm [72] has been used to learn the mapping from current image features and camera pose to optimal control commands, allowing robots to move from initial positions to target positions efficiently. DDPG optimizes policy and value networks through repeated trial-and-error in simulation environments, learning to adjust critical parameters in hybrid visual servoing models to achieve smooth robot movement. The hand–eye servo control method based on reinforcement learning avoids the complicated mathematical modeling and parameter calibration process, enabling the system to autonomously complete visual servoing tasks in unknown environments while maintaining high performance in practical applications.

2.3. Servo control strategy

Visual servoing is a control approach that integrates computer vision with control theory, widely applied in robotics and automation systems. Its main objective is to utilize visual feedback for regulating, modifying, and optimizing the movement trajectory of actuators. This facilitates the robot’s ability to sense and interpret its surroundings, thereby enabling precise positioning, tracking, and task execution. Based on the underlying control methodologies, visual servoing techniques are typically categorized into image-based visual servo (IBVS), position-based visual servo (PBVS), hybrid visual servo control (HVSC), and end-to-end control.

2.3.1. IBVS

IBVS is a control strategy that exploits the visual features of a target, such as color, contour, centroid, or texture, to regulate the movements of the actuator. This approach offers several advantages, including simplicity of implementation and high computational efficiency. IBVS is especially effective for tasks requiring continuous operation, such as weeding, spraying, and fertilizing, as it allows robots to analyze and operate targets in real time while moving.

For example, the weeding robot [73] advances at a speed of V (cm∙s−1) under ideal conditions, with the distance between the weeding device and the camera center being S (cm). When the centroid of the weed nears the center of the image, the control system waits for T = S/V seconds before activating the weeding device. However, the system cannot accurately track the robot’s speed in real time without speed feedback sensors. This leads to a mismatch between the vehicle’s actual velocity and the timing for activating the weeding mechanism, causing the weeding success rate to drop to 64% when the robot’s speed exceeds 20 cm∙s−1. Adding sensors for speed feedback can avoid this hand–eye coordination mismatch [28].
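
The timing rule T = S/V and the effect of a speed mismatch can be worked through numerically. The sketch below uses hypothetical values for S and the two speeds, chosen only for illustration, and the function names are introduced here.

```python
def actuation_delay(spacing_cm, speed_cm_s):
    """Delay T = S / V between detection at the image centre and actuation."""
    if speed_cm_s <= 0:
        raise ValueError("speed must be positive")
    return spacing_cm / speed_cm_s

def along_track_error(spacing_cm, assumed_speed_cm_s, actual_speed_cm_s):
    """Miss distance when the delay is computed from an assumed speed
    but the robot actually travels at a different speed."""
    delay = actuation_delay(spacing_cm, assumed_speed_cm_s)
    return actual_speed_cm_s * delay - spacing_cm

# Hypothetical numbers: camera-to-tool spacing S = 30 cm, nominal V = 20 cm/s.
delay_s = actuation_delay(30.0, 20.0)            # 1.5 s
miss_cm = along_track_error(30.0, 20.0, 22.0)    # robot is really moving at 22 cm/s
```

In this example a 10% speed error shifts the tool by 3 cm relative to the weed at the moment of actuation, which shows why open-loop timing degrades without speed feedback.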

Planar IBVS typically requires additional sensors to determine the specific location of the target, which may include photoelectric position sensors [74], infrared sensors [75], and tactile sensors [76]. The tomato harvesting robot [40] shown in Fig. 8 calculates the deviation between the center of the tomato and the center of the field of view in real time, guiding the end effector to move toward the fruit based on closed-loop and forward kinematics control. When the tomato enters the picking range, the photoelectric sensor on the end effector is triggered. To achieve accurate IBVS-based leaf-picking operations without relying on other sensors in 3D space, Ahlin et al. [77] adopted a 3D IBVS-based approach. By using multi-view images for monocular depth analysis, this method estimates the position of feature points in the Cartesian coordinate system [77], enabling spatial positioning of the target, and employs IBVS to operate on the target. Wang et al. [78] collected images and depth information in real time using a fixed depth camera and an ordinary camera mounted on a clamp, calculating image errors and 3D errors via multi-view fusion technology to drive the robotic arm for tomato picking.

Since IBVS relies directly on feature information extracted from images for control, the visual system’s ability to distinguish features and handle environmental interference significantly influences the robot’s success rate. Given these factors, IBVS is more suited for applications with minimal environmental interference and obstacles, such as weed management [28], [79], [80] and vegetable fertilization [33], [81].

2.3.2. PBVS

PBVS employs cameras to determine the spatial coordinates and orientation of the target, which are then used to specify the desired pose of the end effector [11]. The decision-making system subsequently calculates the required parameters for each robot arm joint based on the hand–eye relationship. Depending on whether the camera provides real-time feedback to the controller, PBVS can be categorized as either open-loop or closed-loop control.
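The role of the hand–eye relationship in PBVS can be illustrated with homogeneous transforms. The sketch below assumes an eye-in-hand configuration and uses hypothetical names (`T_base_ee` for the end-effector pose in the base frame, `T_ee_cam` for the calibrated hand–eye transform); it only converts a target point from the camera frame to the base frame and leaves inverse kinematics out.

```python
def mat_mul(A, B):
    """Multiply two 4x4 homogeneous transforms given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T to a 3D point p."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][c] * h[c] for c in range(4)) for r in range(3))


def target_in_base(T_base_ee, T_ee_cam, p_cam):
    """Eye-in-hand chain: p_base = T_base_ee * T_ee_cam * p_cam."""
    return transform_point(mat_mul(T_base_ee, T_ee_cam), p_cam)
```

Any error in `T_ee_cam` propagates directly into the computed base-frame position, which is why the accuracy of PBVS depends so strongly on hand–eye calibration.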

Fig. 9(a) illustrates the open-loop control process of a kiwi pollination robot based on PBVS [82], [83]. In this process, the camera captures global image data to identify the target. Once the camera calculates the 3D coordinates and orientation of the flowers, no further visual feedback is required. Based on the established hand–eye relationship, the decision-making system drives the robot arm to complete the pollination. This feedback-free mode is recognized for its simple structure, rapid response, and stability, making it ideal for straightforward tasks such as apple harvesting [36] and weed removal [84] in unobstructed static environments. However, open-loop control lacks real-time feedback, meaning that the robot arm cannot adjust accordingly if the target’s pose changes during the operation. External factors like wind or branch disturbances can cause dynamic changes in the target, and detecting the target accurately from a distance may be difficult. To address these challenges, closed-loop control systems offer enhanced precision and adaptability, which involve real-time tracking of target position and dynamic adjustment of the robot arm’s pose.

As shown in Fig. 9(b), the pruning robot requires high precision in locating the cutting points. To ensure precise hand–eye coordination, the robot arm adjusts the camera’s position closer to the cutting point before the actuator reaches it. Simultaneously, the system calculates the real-time coordinates of the cutting point and feeds this information back to the decision-making system, which continuously adjusts and fine-tunes the robot arm’s position for accurate targeting [85].

The closed-loop PBVS control method exhibits strong robustness to environmental variations and demonstrates excellent compatibility with complex agricultural operations, such as selective harvesting [35], [56], [86], pollination [31], and vine pruning [85]. However, the closed-loop control requires substantial computational resources, and the operational accuracy of PBVS is highly dependent on the precision of target localization and hand–eye calibration.

2.3.3. HVSC

HVSC can be regarded as a control strategy integrating PBVS and IBVS, enhancing the camera’s field of view and positioning accuracy simultaneously. This integration addresses limitations related to the robot’s vision and the accuracy of its positioning system. For instance, a soft robot [87] equipped with cameras on both the base and the robot arm utilizes HVSC for servo control. If the target is outside the local camera’s field of view, the IBVS conditions are no longer valid. In such instances, PBVS employs the target’s position information to recalibrate the robot arm’s pose until IBVS conditions are re-established. Even a single-camera setup can implement HVSC. The cherry tomato picking robot [88] was designed with an eye-in-hand unit and an HVSC algorithm. Initially, PBVS is used to position the end-effector roughly. Then, IBVS guides the end-effector to the desired operational position. Finally, the robotic manipulator switches back to PBVS to perform the actual cherry tomato picking.
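The switching logic described for the two-camera case can be sketched as a simple mode selector. This is a hypothetical illustration: the `Mode` enum, the `select_mode` function, and the pixel margin are assumptions and do not reproduce the controllers of [87] or [88].

```python
from enum import Enum


class Mode(Enum):
    PBVS = "pbvs"
    IBVS = "ibvs"


def select_mode(target_px, image_size, margin_px=20):
    """Choose the servo mode from the local camera's detection.

    target_px is the (u, v) pixel position of the target in the local image,
    or None when the local camera does not see it. IBVS is used only while
    the target stays inside the image with a safety margin; otherwise PBVS
    re-positions the arm from the target's estimated 3D position.
    """
    if target_px is None:
        return Mode.PBVS
    u, v = target_px
    w, h = image_size
    inside = margin_px <= u <= w - margin_px and margin_px <= v <= h - margin_px
    return Mode.IBVS if inside else Mode.PBVS
```

The margin is a design choice in this sketch: it hands control back to PBVS slightly before the target leaves the image, so that IBVS is never run on features that are about to be lost.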

HVSC also addresses the intricate hand–eye calibration issues commonly encountered in traditional visual servo systems [89]. The hybrid system employs a global RGB-Depth (RGB-D) camera to capture global image and depth information, alongside a local camera for closed-loop control. Then, an adaptive tracking controller dynamically estimates the robot’s parameters in real-time, thus avoiding the cumbersome and error-prone process of hand–eye calibration.

In conclusion, HVSC provides several advantages for agricultural robots. By leveraging PBVS, it broadens the field of view, allowing for the pre-planning of operations and reducing the time spent on target detection. At the same time, the incorporation of IBVS allows the robot arm to bring the camera closer to the target, facilitating more accurate operations [39]. HVSC thus balances the operating accuracy and perceptual efficiency of PBVS and IBVS. Its disadvantage is increased servo system complexity: multiple view information sources must be processed in real time and different operating modes coordinated, which places higher demands on computing resources and response speed.

2.3.4. End-to-end control strategy

End-to-end control is a strategy that utilizes deep learning to map sensor data to the robot’s control commands directly. In autonomous driving, for example, image sequences are input into convolutional neural networks (CNNs), which predict steering angles directly [90]. This approach facilitates the development of control strategies that are both robust and reliable, making it particularly well-suited for visual navigation tasks in agricultural robotics [91], [92]. For instance, an end-to-end control strategy has been developed that utilizes RGB-D data to identify workers and autonomously navigate to their location, thus facilitating transportation tasks [93].

End-to-end control strategies can also be applied to robot arm picking tasks, enabling the robot arm to imitate human grasping actions [94]. This strategy utilizes a CNN model to extract features from human grasping movements by analyzing human actions. The trained CNN then maps stereo visual information to optimal picking commands for the robot arm, eliminating the need for traditional hand–eye calibration. Yu et al. [95] proposed an end-to-end control strategy based on hyper-networks, which consists of two key components: The hyper-network generates the parameters for the final layer of a low-level controller based on keypoint information of the desired pose, and the neural controller computes the camera’s velocity by comparing the error between the current and target poses, guiding the camera toward the intended position. This architecture preserves the accuracy and robustness of traditional visual servoing while enhancing the system’s flexibility and generalization capabilities through self-supervised learning. Compared to conventional IBVS and other neural controllers, the hyper-network based neural controller (HPN-NC) exhibits notable advantages in terms of success rate, efficiency, network size, inference time, and adaptability.

Despite the potential of end-to-end strategies, research on their application for controlling robot arms in agricultural tasks is still limited. Most studies have focused on end-to-end detection technologies, including flower sorting [96], grape picking point detection [97], and disease recognition [98]. However, the advancement of end-to-end servo control could reduce the reliance on conventional calibration techniques. This approach empowers robots to learn from experience, optimize their functions, and make autonomous decisions. As such, the adoption of end-to-end servo control strategies is expected to play a crucial role in the future development of agricultural robots.

3. Hand–eye coordination for target perception

In complex agricultural environments, accurately and efficiently detecting randomly distributed targets is essential for effective hand–eye coordination in agricultural robotics. Recent research has concentrated on addressing several obstacles present in these unstructured settings, such as occlusions caused by branches and leaves, overlapping fruits, and light interference, all of which complicate visual perception [99]. To examine these challenges, this section offers a comprehensive review of target perception techniques, including a thorough comparison of the strengths and limitations associated with both passive and active perception approaches.

3.1. Passive perception

In passive perception mode, the camera’s pose remains fixed, and the hand–eye system passively receives the image data from the field of view. The decision-making system then utilizes this passive perception information to guide the robot arm during task execution. However, since the camera cannot adjust its position to view obstructed targets, it may be necessary to disregard specific targets that present operational risks [30]. As a result, passive perception is more suited for tasks involving targets distributed on relatively flat surfaces, such as weeds, seedlings, or in standardized planting scenarios. In these cases, the camera can continuously monitor most targets within the global field of view from a fixed perspective, enabling efficient robot operation.

For instance, in a vine pruning platform, three cameras are mounted on the side of the machine [100]. As the platform moves forward, these cameras perform global scanning and reconstruction of the vines from fixed angles, enabling real-time detection of pruning points within the global image. Similarly, a spectral camera on a grape disease control robot maintains a constant viewing angle as it moves, allowing for real-time disease detection and identification, which guides the mechanical arm for precise and continuous spraying. Fig. 10(a) shows the robot’s disease perception results during movement [101]. Although the camera is mounted at the robot arm’s end-effector, the arm resets to the home position during each sensing, maintaining a constant sensing posture [102], [103].

Another typical example of passive perception is weeding robots, which capture images of weeds from a fixed overhead view while in motion. These robots rely on IBVS for continuous weed analysis, location, and efficient removal [73], [104]. As shown in Fig. 10(b), a single camera and actuator are no longer sufficient for efficient sensing and operation as the robot’s working width increases. To overcome this limitation, cameras are installed at the front of each weeding actuator, providing weed information for each actuator and enabling parallel weeding across multiple rows based on IBVS [105]. Similarly, the apple harvesting robot shown in Fig. 10(c) employs four cameras and four robotic arms to expand its perception range and improve picking efficiency [99]. These cameras capture images of the apple trees from fixed angles, and by combining their respective fields of view, the sensing range is significantly expanded.

In summary, in the passive perception mode, the robot observes the global image of the target from a fixed viewpoint. If the perception range is limited, both the perception range and the operating range can be extended by adding further hand–eye units (cameras paired with actuators). Passive perception therefore has a significant advantage in perception range and can effectively meet operational needs in large scenes.

3.2. Active perception

In scenarios involving discretely distributed targets or tasks with complex operational demands, robots must often scan large areas or identify specific local features (e.g., fruit stems). To achieve this, the robot follows predetermined paths to locate the target or adjusts its orientation to collect detailed information about its surroundings. This requires the hand–eye coordination system to explore the environment actively, a process commonly referred to as active perception.

Active perception typically involves manipulating the viewpoint and position of sensors, particularly cameras, to optimize the quality of perception [106]. For instance, a sweet pepper harvesting robot can manage peppers distributed irregularly across space by adjusting its platform’s height [107]. Once a pepper is detected, the visual servoing system performs a close inspection, guiding the robot arm to rotate around the stem following the peduncle, thereby enhancing the visibility of the fruit and the peduncle [13], as shown in Fig. 11(a). This strategy helps overcome challenges such as light interference and occlusion, achieving a 38% improvement in harvesting success rate compared to the original method [108]. Additionally, a multi-sensor fusion technique can mitigate light interference by integrating images from various angles, enhancing the servo system’s perception and reducing the impact of lighting fluctuations [109]. Therefore, using multiple cameras to search for targets by capturing images from diverse viewpoints and orientations is also considered an active perception strategy.

Strawberry harvesting demands careful handling to avoid surface damage, as even minor abrasions can considerably impact the fruit’s quality during storage and transport. To overcome this challenge, the strawberry harvesting robot [110] employs active perception to locate the fruit stalks, enabling the end-effector to execute precise actions with the appropriate orientation, as illustrated in Fig. 11(b). The robot’s camera is mounted on the end-effector and surrounded by light emitting diodes (LEDs) to ensure a stable light source, allowing the robot to work in low-light conditions or at night.

In addition to light interference, working with occluded targets presents another challenge the active perception system must overcome. As illustrated by the green arrows in Fig. 11(c), the robot adjusts the direction and distance of the camera based on visual feedback from the occlusion of the tomatoes. This active adjustment process continues until the system’s confidence reaches a predefined threshold, at which point the harvesting action is initiated. Experiments have demonstrated that with the assistance of active perception, the robot’s precision in harvesting occluded tomatoes increased by 33%, while overall harvesting efficiency improved by 43% [18]. Similarly, the citrus harvesting robot [111] employs a “global–local” visual system and an HVSC strategy for active perception. Initially, the global camera determines the approximate location of the fruit, and subsequently, it guides the robot arm to approach the fruit. Finally, the local camera captures detailed information about the fruit, as depicted in Fig. 11(d).
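The adjust-until-confident loop described for occluded tomatoes can be sketched as follows. The callbacks `observe` and `move_camera`, the threshold of 0.8, and the move budget are hypothetical placeholders for the vision and motion subsystems and are not the interface used in [18].

```python
def active_perception(observe, move_camera, threshold=0.8, max_moves=10):
    """Reposition the camera until detection confidence reaches the threshold.

    observe() returns (confidence, suggested_move); move_camera(move)
    repositions the camera. Returns (success, moves_used). The move budget
    bounds the repeated search that makes active perception slower.
    """
    for moves in range(max_moves + 1):
        confidence, suggestion = observe()
        if confidence >= threshold:
            return True, moves      # confident enough: start the harvesting action
        if moves < max_moves:
            move_camera(suggestion)
    return False, max_moves         # budget exhausted: skip this target
```

The explicit budget makes the trade-off visible: a larger `max_moves` recovers more occluded targets at the cost of longer search time per fruit.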

Active perception systems can overcome challenges such as light interference and restricted fields of view, allowing the detection of occluded targets. These systems offer superior perception capabilities and stability, which has led to their widespread adoption in complex tasks that require precise identification and manipulation, such as pruning and picking [112]. However, a potential limitation is that locating the optimal operating position may necessitate repetitive search processes, decreasing overall efficiency. Therefore, developing efficient search strategies is essential for enhancing the accuracy and robustness of active perception systems.

3.3. Comparison and analysis

Table 1 [28], [31], [36], [73], [82], [83], [84], [99], [100], [102], [103], [109], [110], [113], [114], [115], [116], [117], [118], [119], [120], [121], [122], [123], [124], [125], [126], [127] compares the operational performance of various agricultural robots under passive and active perception modes. Only experimental results conducted in unstructured environments are included to ensure data comparability. From the analysis of the cases listed, the following patterns can be observed:

(1) Most agricultural robots opt for passive perception under favorable agronomic conditions, such as evenly distributed targets and minimal occlusion. In this mode, the robots can maintain both operational efficiency and precision. Kiwifruit pollination and weeding are typical examples of passive perception cases. The accuracy requirements for pollination and weeding positioning are relatively low, as the operating range of the actuators (such as the pollen or mist droplets sprayed) is often sufficient to cover the target area, including flowers or weeds.

(2) While active perception significantly enhances the robots’ perception accuracy and stability, it also increases the complexity of the robot’s structure and the demands on the vision and control systems. This added complexity is a critical factor that must be addressed in future research and development.

4. Hand–eye coordination for target handling

4.1. Single-target handling

When agricultural robots operate with targets of complex shapes, simply acquiring the center point or centroid information is inadequate to fulfill operational requirements. For tasks such as plant pruning, fruit harvesting, and dairy cow milking, considering the growth state and characteristics of the objects is essential. This allows for the formulation of customized operational poses and trajectories, reducing damage and collision risks and ensuring the robot’s reachability. This approach of customizing the robot’s operational posture for each target is referred to as single-target handling.

Taking the milking robot as an example, cows and their teats exhibit varying postures, requiring the robot to recognize and assess the posture of the teats during milking. This information guides the robotic arm to position the teat cups at the appropriate angle for attachment to the udder, enabling a gentle and damage-free milking operation [128]. Similarly, to achieve damage-free tomato harvesting, the harvesting robot [129] shown in Fig. 12(a) employs a single-target handling strategy. The robot’s camera captures the current image information before each operation. Although multiple targets are visible within the field of view, the decision-making system selects and generates a trajectory for only one target at a time. Similar to human operators, the end-effector adjusts the gripping force based on feedback within a closed-loop control system to avoid damaging the fruit.

In contrast to harvesting individual tomatoes, harvesting tomato clusters presents additional complexities. In such cases, the robot must consider the growth pose of the cluster stem and avoid collisions between the end-effector and the main stem to prevent harvesting failure. Deep reinforcement learning (DRL) methods are well-suited to address these challenges. The end-to-end output capability of DRL algorithms enables the robot to dynamically generate the optimal harvesting pose and path, ensuring that the end-effector approaches the root of the cluster stem at the ideal angle while avoiding collisions [14]. As shown in Fig. 12(b), this end-to-end reinforcement learning-based harvesting strategy increases the success rate by 43% compared to the traditional parallel-to-main-stem operation methods.

Reachability is a critical performance indicator for pruning robots, especially in greenhouse settings with limited operational range. In the tomato de-leafing task, the robot arm needs to function effectively close to the root of the leaves. This ensures minimal damage to the main stems while achieving optimal cuts that reduce the likelihood of disease transmission. To address these needs, researchers [130] have developed advanced optimization algorithms to refine the operational trajectories of the robot arm. This involves refining the kinematic model to optimize reachability while ensuring that the arm remains compact and highly maneuverable. As a result, the system has achieved an impressive reachability rate of 89.98% for lateral branch positions within the designated pruning area. This success highlights the importance of tailoring operational strategies for each target based on its unique pose, which enables the robot arm to adapt to the varying spatial orientations of tomato branches.

Single-target handling, including target detection, path planning, visual feedback, and collision avoidance, enables precise control for each target. This approach significantly enhances operational accuracy and damage-free performance, making it particularly suitable for agricultural robots that require high-precision operations.

4.2. Multi-target handling

Certain robots typically rely on the centroid of the target for operations, such as harvesting kiwifruit, picking apples, and weeding. In these cases, the robot does not need to identify the fruit's peduncle or require a specific pose for the target. Instead, the main objective of the robotic vision and manipulation system is to navigate efficiently and interact with the targets while minimizing both operational time and energy expenditure. This approach is commonly referred to as multi-target handling.

In multi-target handling, one of the core tasks of the decision-making system is to plan the execution sequence for multiple targets. Proper planning improves operational efficiency and minimizes the risk of collisions with other targets during operation [131]. For instance, in the kiwifruit harvesting task [30], as illustrated in Fig. 13(a), the robot’s camera employs semantic segmentation to identify the calyxes of the kiwifruits, which are then used as operational points. These points are processed by the decision-making system, which categorizes, labels, clusters, and ranks them accordingly. Subsequently, a scheduling program assigns tasks to four groups of robot arms in an orderly manner for harvesting. In environments with dwarf and densely planted apple trees, the apples are distributed to minimize occlusion, making them ideal for multi-target handling. The multi-arm collaborative apple harvesting robot [99] uses semantic segmentation to calculate and locate these operation points. To improve the picking efficiency of the four robot arms, a multi-arm task planning method based on the Markov game framework is employed, which optimizes the harvesting sequence and reduces the task planning time by 33.3%.
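The effect of sequence planning on travel distance can be illustrated with a deliberately simple heuristic. The sketch below uses greedy nearest-neighbor ordering for a single arm; it is a stand-in chosen for brevity, not the clustering and scheduling of [30] or the Markov game planner of [99], and the function names are hypothetical.

```python
import math


def greedy_order(start, targets):
    """Order targets by repeatedly visiting the nearest unvisited one."""
    remaining = list(targets)
    order, current = [], start
    while remaining:
        nearest = min(remaining, key=lambda t: math.dist(current, t))
        remaining.remove(nearest)
        order.append(nearest)
        current = nearest
    return order


def path_length(start, order):
    """Total straight-line travel when visiting the targets in the given order."""
    total, current = 0.0, start
    for t in order:
        total += math.dist(current, t)
        current = t
    return total
```

For targets at (5, 0), (1, 0), and (2, 0) with the arm starting at the origin, visiting them in detection order costs 10 units of travel, whereas the greedy order costs 5. Greedy ordering is not optimal in general, and multi-arm systems additionally have to avoid conflicts between arms.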

The aforementioned robots typically function intermittently, halting after reaching the harvesting zone and only resuming when the robot arms have completed the multi-target tasks. Such frequent starts and stops can significantly diminish the overall efficiency of the operation. To address this issue, the tomato harvesting robot [132] employs the deep deterministic policy gradient (DDPG) algorithm to generate optimized stopping points, thereby reducing the frequency of halts and the idle time of the robot arms by up to 46.5% and 42.9%, respectively, while ensuring that no fruits are overlooked. In contrast, weeding robots generally do not experience the challenge of intermittent operation, as most are designed for continuous, multi-target handling [28], [84], [126], [133]. A typical weeding strategy is illustrated in Fig. 13(b) [134]. As the robot moves forward, it simplifies its perception of crops and weeds into center points, representing their root systems’ central position. The robot delineates protection and weeding areas based on the center points of crops and weeds.
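The delineation of protection and weeding areas from center points can be sketched as a distance test. The circular protection zone, its radius, and the function name are assumptions made for illustration; the actual zone geometry in [134] may differ.

```python
import math


def weeds_to_remove(crops, weeds, protect_radius_cm):
    """Return the weeds whose center lies outside every crop's protection circle.

    crops and weeds are lists of (x, y) center points in the same ground frame.
    Weeds inside a protection circle are skipped to avoid damaging the crop.
    """
    return [w for w in weeds
            if all(math.dist(w, c) > protect_radius_cm for c in crops)]
```

With crops at (0, 0) and (30, 0), a 5 cm radius, and weeds at (3, 0), (15, 0), and (30, 4), only the weed at (15, 0) is selected for removal.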

Multi-target operation eliminates the need for repetitive identification processes by leveraging the centroid of targets to simplify task requirements. The decision-making system plans the operation sequence and trajectory in a single instance after acquiring all target information. Although this approach sacrifices some degree of positioning accuracy, it significantly enhances operational efficiency, making it particularly suitable for scenarios with densely clustered fruits and minimal obstacles.

4.3. Collision-free handling

In environments where plants grow randomly and densely, both single-target and multi-target handling systems must address the challenge of target occlusion. These obstructions impair the robot’s capacity to detect targets accurately and complicate the manipulator’s path planning, increasing the risk of collisions when approaching the targets. As a result, integrating hand–eye coordination technology to improve obstacle avoidance and manipulation capabilities has become a key research focus in agricultural robotics.

4.3.1. Active obstacle avoidance

Active obstacle avoidance involves using sensors to detect the target and surrounding obstacles, guiding the robot arm to navigate around them [135]. For instance, to address the challenges posed by dense foliage during sweet pepper harvesting, a multi-view visual servoing method called 3D move to see (3DMTS) has been employed to achieve active obstacle avoidance [136]. In cases where a sweet pepper is partially occluded and only a fraction of its features are visible, the 3DMTS algorithm leverages a camera mounted on the end-effector to guide the robot arm in real-time, allowing it to search for and analyze the hidden portion of the pepper. Fig. 14(a) illustrates the robot arm’s obstacle avoidance trajectories as it maneuvers around obstacles from multiple directions [136]. The 3DMTS method is an example of an active obstacle avoidance strategy that continuously scans the environment with cameras and adjusts the robot arm’s trajectory in response to obstacles. This technique enhances the visibility of the fruit, increasing it from 1.69% to 30.19%. Similarly, the guava harvesting robot [137], [138] employs a recurrent neural network to store information about fruits and obstacles, which is then processed using the DDPG algorithm to support environmental perception and enable collision-free path planning.

The rapidly-exploring random tree (RRT) algorithm is a commonly employed approach for trajectory planning. To address the collision issues arising from the increased volume of the end-effector after clamping the fruit, the tomato harvesting robot [139] adopts a spatial partitioning method to minimize the sampling space for obstacle avoidance. This method overcomes the challenges posed by the expanded volume of the end-effector by prioritizing the planning of the return path before the picking path. Additionally, it integrates the position posture map (PPM) and RRT algorithm for unobstructed path planning. Experimental results indicate that this method achieves a nearly 100% success rate and reduces path planning time by more than 70% compared to traditional algorithms.
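For readers unfamiliar with RRT, a minimal 2D version is sketched below. It is the plain algorithm with a goal bias, not the spatially partitioned PPM variant of [139]; circular obstacles, the step size, and all parameter names are assumptions. As a simplification, collisions are checked only at each new node, which is adequate only when the step is small relative to the obstacles.

```python
import math
import random


def rrt(start, goal, obstacles, bounds, step=0.5, goal_tol=0.5,
        max_iter=3000, seed=0):
    """Plain 2D RRT. obstacles: list of (cx, cy, radius) circles.
    bounds: (xmin, xmax, ymin, ymax). Returns a list of points or None."""
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}

    def is_free(p):
        return all(math.dist(p, (cx, cy)) > r for cx, cy, r in obstacles)

    for _ in range(max_iter):
        # Sample the goal 10% of the time to pull the tree toward it.
        if rng.random() < 0.1:
            sample = goal
        else:
            sample = (rng.uniform(bounds[0], bounds[1]),
                      rng.uniform(bounds[2], bounds[3]))
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0:
            continue
        scale = min(step, d) / d          # extend by at most one step
        new = (near[0] + (sample[0] - near[0]) * scale,
               near[1] + (sample[1] - near[1]) * scale)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:          # walk back to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

Because samples are drawn from the whole bounded region, most of the tree grows in directions that are irrelevant to the goal. Restricting the sampling space is one way to reduce that wasted effort.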

To address collisions between apple-picking robots and surrounding branches, researchers [113] employed an RGB-D camera to capture point cloud data of apples and nearby branches, which facilitated the initial determination of the apple’s picking direction. Following the spatial modeling of the obstacle data, an enhanced particle swarm optimization (PSO) algorithm was employed to plan the end-effector trajectory in 3D space, ensuring that the picking process remained free of collisions. Similarly, Zhuang et al. [140] applied the artificial potential field (APF) method to simulate the physical fields of apples and obstacles, guiding the robot to avoid collisions. The A* algorithm was also used to identify the most efficient path. To improve the smoothness of the path, the study introduced an improved RRT algorithm for re-planning the path, which generates smoother and more stable trajectories. The stereo vision system is not only used for fruit recognition, but also employed to perform stereo matching for the 3D reconstruction of both fruits and obstacles, as illustrated in Fig. 14(b) [141]. Finally, a DRL algorithm is applied to plan the apple-picking trajectory within the reconstructed model. Regarding average path planning time, the DDPG algorithm requires only 0.27% of the time required by the RRT algorithm.
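The artificial potential field idea can be sketched as a single motion step along the net force, with the goal attracting and nearby obstacles repelling. This is the textbook 2D formulation with point obstacles, not the implementation of [140]; the gains, the influence distance, and the fixed step length are assumptions.

```python
import math


def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, influence=1.0, step=0.1):
    """Move one fixed-length step along the net potential-field force.

    The attractive force is proportional to the vector toward the goal.
    Each point obstacle closer than `influence` adds a repulsive force
    that grows rapidly as the distance d shrinks.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        d = math.dist(pos, (ox, oy))
        if 0 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
            fx += mag * (pos[0] - ox) / d
            fy += mag * (pos[1] - oy) / d
    norm = math.hypot(fx, fy)
    if norm == 0:
        return pos                      # forces cancel: a local minimum
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

A well-known limitation of plain potential fields is that attraction and repulsion can cancel or oscillate, for example when an obstacle lies directly between the robot and the goal, so the method is usually paired with a global planner.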

These strategies guide the robot’s operation by perceiving target and obstacle information and integrating it with path planning, enabling active obstacle avoidance. This strategy increases operational success and safety and expands the potential applications of agricultural robots in complex environments. However, the processes of environmental perception and the generation of collision-free paths substantially elevate the computational demands on decision-making systems, including tasks like 3D reconstruction and real-time path planning, which pose significant challenges to the efficiency of decision algorithms.

4.3.2. Passive obstacle avoidance

In contrast to active obstacle avoidance strategies, passive obstacle avoidance focuses on the visible area of the target and plans the motion path based on this area. For example, the kiwifruit harvesting robot [30] employs a pre-defined “U-move” motion trajectory for harvesting actions. During harvesting operations, the robot arm picks only visible kiwifruits. If obstacles, such as branches or wires, obstruct a target kiwifruit, the system instructs the robot arm to abandon the target rather than adapting its strategy to environmental changes [142]. Although this passive approach can only harvest 55.8% of the kiwifruits, it ensures that the robot arm can safely and efficiently pick fruit at a rate of 2.78 s∙fruit−1 in complex environments [117]. Similarly, the actuator structure of most weeding robots is relatively simple and lacks the degrees of freedom to perform active obstacle avoidance. During physical weeding, the weeding knife may accidentally damage the crop if the weed is too close to the crop. To prevent such damage, the robot only clears weeds that pose no threat to the crops [134].

The humanoid apple-picking robot [143] first performs image segmentation using the hue, saturation, and value (HSV) color space to identify regions in the image that correspond to apples, mainly focusing on the red areas. The Canny edge detection algorithm is then applied to extract and fit the contours of the visible apple regions. Once the center of the contour is determined, stereo vision technology is used for spatial localization, providing depth information to drive the robot arm accurately. Finally, precise picking is performed using the PBVS algorithm. Importantly, throughout this process, the robot does not actively avoid obstacles such as leaves or branches but instead focuses only on the visible apples.
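The first stage of this pipeline, HSV thresholding of red regions, can be sketched with the Python standard library. The thresholds are hypothetical, and the sketch replaces the Canny contour fitting of [143] with a plain centroid of the red pixels, so it is a simplification rather than the cited method.

```python
import colorsys


def is_red(rgb, s_min=0.5, v_min=0.3, hue_window=0.05):
    """Classify one (R, G, B) pixel with 0-255 channels as red in HSV space.

    Red sits at both ends of the hue circle, so hues near 0 and near 1 count.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    near_red = h <= hue_window or h >= 1.0 - hue_window
    return near_red and s >= s_min and v >= v_min


def red_centroid(image):
    """Return the (column, row) centroid of red pixels, or None if there are none.

    image is a list of rows, each row a list of (R, G, B) tuples.
    """
    pts = [(x, y) for y, row in enumerate(image)
           for x, px in enumerate(row) if is_red(px)]
    if not pts:
        return None
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)
```

Only visible red pixels contribute to the centroid, which mirrors the passive strategy: a partly hidden apple is localized from whatever region remains visible, and no attempt is made to look around the occluder.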

From the above examples, it is evident that the passive obstacle avoidance strategy focuses on planning the robot’s operation path based on the visible area of the target. When the robot encounters obstacles it cannot overcome, it will abandon the target, potentially reducing the overall success rate of the task. However, this approach provides significant safety benefits, simplifies the trajectory planning process, and reduces the computational complexity of scene reconstruction and path planning.

5. Challenges and trends

5.1. Challenges

(1) Challenges from stabilizing hand–eye relations in complex environments. Offline calibration is a widely adopted technique with extensive applications across various domains, including agricultural robotics. However, maintaining consistent hand–eye calibration results presents significant challenges due to the inherent complexity and unpredictability of agricultural environments, which are influenced by various uncontrollable factors. Elements such as vibrations, impacts, or collisions can disrupt the relationship between the robot arm and the camera, affecting the accuracy and reliability of robotic operations.

In addition, certain objective factors also affect calibration accuracy, including variations in gear mesh accuracy, wear on the robot arm joints, and minor deformation. Over time, such errors tend to accumulate, gradually reducing the precision of servo control. Additionally, maintenance or recalibration carried out by farmers without professional training or experience may inadvertently affect the positioning accuracy of the visual servo system, leading to operational errors and potential economic losses. Consequently, maintaining precise calibration remains a key challenge for agricultural robots, necessitating effective solutions to ensure the stability and accuracy of hand–eye coordination in these dynamic environments.

(2) Challenges from collision-free operation. Taking apple orchards as an example, traditional planting models typically feature spherical tree crowns, with fruits distributed across the surface or occasionally concealed within the foliage. This conventional approach poses significant challenges for robotic harvesting, as robots must navigate the scattered arrangement of fruits and the dense foliage that obstructs their path. However, recent advancements in agronomy have led to the introduction of new dwarf rootstock varieties and innovative cultivation techniques, such as “V”-shaped, “Y”-shaped, “wall”-shaped, and spindle-shaped configurations. Among these, the spindle-shaped, close-planting dwarf rootstock model has become particularly popular due to its significant yield advantages [144]. These agronomic improvements boost production efficiency and reduce the complexity of robotic operations. Despite these advancements, some obstacles remain unavoidable, and achieving entirely unobstructed operations remains a key challenge for robotic systems.

The highly irregular morphology of naturally growing animals and plants, along with the complexity of the operating environment, poses significant challenges to ensuring the collision-free operation of robots. Another major challenge in achieving collision-free operation is addressing the impact of occlusion on visual recognition. Occlusions can obscure essential information, reducing accuracy in robot operations and increasing the risk of system malfunctions. Once the issue of object detection under occlusion is resolved, the next critical task is to implement collision-free path planning for the robot arm. During operation, it is crucial to ensure that the end-effector reaches the operation point smoothly and to prevent collisions between the robot arm joints and surrounding animals or plants. Such collisions can damage both the robot and the plants, causing substantial economic losses.
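
As a hedged illustration of the geometric test that underlies collision-free path planning, the sketch below approximates each arm link as a capsule (a line segment between consecutive joints with radius `link_radius`) and each obstacle, such as a branch segment or fruit, as a sphere. These shape approximations and all names are assumptions made for the example, not a method taken from the cited works.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (3D tuples)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = 0.0 if denom == 0.0 else max(
        0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / denom))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def arm_in_collision(joint_positions, obstacles, link_radius):
    """Return True if any link capsule touches any obstacle sphere.

    joint_positions: ordered 3D joint positions, base to end-effector.
    obstacles: list of (center, radius) spheres in the same frame.
    """
    for a, b in zip(joint_positions, joint_positions[1:]):
        for center, radius in obstacles:
            if point_segment_distance(center, a, b) <= radius + link_radius:
                return True
    return False
```

A planner would call a check of this kind for every candidate configuration along a path; checking the whole arm rather than only the end-effector reflects the requirement above that the joints, too, stay clear of plants and animals.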

(3) Challenges from reproducing manual skills. Humans exhibit exceptional adaptability and precision in agricultural activities, but transferring these skills to agricultural robots presents a considerable challenge. For example, tasks such as picking delicate berries and milking require robots to demonstrate high levels of operational compliance. Moreover, more intricate actions, such as bending, twisting, and trimming side branches or peduncles, necessitate even greater precision and flexibility. Using reinforcement learning to mimic the actions of surgical experts [145] and employing dual-arm robots for folding clothes [146] have demonstrated impressive precision and dexterity. These examples provide valuable insights for advancing agricultural robot technology, but they also present formidable challenges.

Furthermore, the coordinated operation of dual-arm or multi-arm robots introduces additional challenges, particularly in enhancing their hand–eye coordination capabilities. Despite the remarkable technical potential demonstrated in these tasks, most agricultural robots cannot yet replicate them, and related research remains at a relatively early stage.

5.2. Trends

The evolution of hand–eye coordination technology heralds promising developments for agricultural robotics. Looking ahead, several trends are expected to shape the future of the field:

(1) End-to-end control from vision to action. Surgical and space robots necessitate significantly higher levels of precision and safety than agricultural robots, owing to their critical roles in human health and high economic costs. In the medical domain, surgical robots often operate within the confined and delicate spaces of the human body, imposing stringent demands on the robot’s size, dexterity, and responsiveness [147]. For example, intestinal inspection robots [148] employ deep learning models to map image data directly into motor control commands, allowing the robot arm to move precisely. Compared to IBVS, this end-to-end control framework significantly reduces inspection time and the risk of intestinal damage. Agricultural robots stand to gain from adopting such an end-to-end control approach, as it would help mitigate the risk of collisions when the robot arm approaches branches or other obstacles.

In the aerospace field, robots are employed to collect floating debris in space. However, due to the microgravity environment, both the debris and the robot experience free-floating motion, creating relative movement between them. This relative motion renders traditional offline trajectory planning methods ineffective [149]. To address this challenge, space robots employ an end-to-end control strategy that utilizes multiple multi-layer neural networks [150], which are optimized through the soft actor-critic (SAC) algorithm. During operation, the controller employs this strategy to map RGB images in real time into joint angular velocities for the robot arm. These velocities are then fed into a proportional-derivative controller, which precisely controls the torque at each joint. This end-to-end control strategy demonstrates excellent resistance to interference and robustness, effectively handling the relative motion between the target and the robot’s base. This method offers valuable insights for mobile agricultural robots, particularly for tasks such as weeding, spraying, and pollination that require continuous operation.
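
The last stage of this pipeline, where commanded joint velocities are converted into joint torques by a proportional-derivative controller, can be sketched as follows. The source does not give the control law, so the form used here is an assumption: the velocity command is integrated into a position reference, and the torque combines position and velocity errors. The learned policy itself is not modeled, and the gains `kp` and `kd` are hypothetical.

```python
def pd_joint_torques(q_ref_prev, qd_cmd, q, qd, kp, kd, dt):
    """One PD update that tracks commanded joint velocities.

    q_ref_prev: previous position reference per joint (rad)
    qd_cmd:     joint velocities commanded by the policy (rad/s)
    q, qd:      measured joint positions (rad) and velocities (rad/s)
    Returns (torques, new position reference).
    """
    # Integrate the velocity command to obtain a position reference.
    q_ref = [qr + v * dt for qr, v in zip(q_ref_prev, qd_cmd)]
    # Proportional term on position error, derivative term on velocity error.
    tau = [kp * (qr - qi) + kd * (v - qdi)
           for qr, qi, v, qdi in zip(q_ref, q, qd_cmd, qd)]
    return tau, q_ref
```

In use, the returned reference would be passed back in as `q_ref_prev` on the next control cycle, while `qd_cmd` is refreshed from the policy each time a new image is processed.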

In summary, the end-to-end control strategy [151] utilizes deep learning or reinforcement learning to map control commands directly from visual data, providing an effective solution for guiding agricultural robots. By eliminating the need for conventional control steps, such as hand–eye calibration and path planning, this approach is particularly effective in addressing the challenges posed by dynamic variations in the hand–eye relationship. Specifically, a reinforcement learning model is employed, where a reward function is designed to optimize the robot’s action strategy based on successful task execution. Throughout the learning process, the robot gains experience through the reward function and tries to find the optimal path. Once training is complete, the robot no longer requires hand–eye calibration, enabling it to directly generate the motion trajectory of the robot arm from captured images, thereby achieving seamless end-to-end hand–eye coordination.
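
To make the role of the reward function concrete, the following sketch shows one plausible shaping for a picking task. It is an illustrative assumption rather than the reward used in any cited work: the episode ends with a bonus when the end-effector reaches the target within `grasp_tol`, ends with a penalty on collision, and otherwise receives a small negative reward that grows with the remaining distance. All weights are hypothetical.

```python
import math


def picking_reward(ee_pos, target_pos, collided, grasp_tol=0.01,
                   step_cost=0.01, dist_weight=1.0,
                   success_bonus=10.0, collision_penalty=5.0):
    """Return (reward, done) for one step of a hypothetical picking episode.

    ee_pos, target_pos: 3D positions in the same frame (m).
    collided: True if the arm touched a branch or other obstacle this step.
    """
    if collided:
        # Terminate and penalize contact with the plant or environment.
        return -collision_penalty, True
    distance = math.dist(ee_pos, target_pos)
    if distance <= grasp_tol:
        # Task success: end-effector is within grasping tolerance.
        return success_bonus, True
    # Dense shaping: constant time cost plus distance-proportional cost.
    return -step_cost - dist_weight * distance, False
```

The balance between the collision penalty and the success bonus determines how cautiously the learned policy approaches occluded targets, which is one reason reward design is usually tuned per crop and per task.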

(2) Integration of machinery and agronomy. Improving agronomic conditions is one of the most direct and effective ways to reduce the complexity of robotic operations. For example, vine suspension, pruning, and topping in greenhouse tomato cultivation help maintain consistent plant height and growth form. This consistency ensures that fruits ripen simultaneously and are evenly distributed, thereby reducing the time spent by harvesting robots on active perception tasks and improving overall operational efficiency. In elevated strawberry cultivation, the fruits tend to cluster along the sides of the beds, with fewer obstacles and a more straightforward background, which facilitates the robot’s search process and obstacle avoidance. Additionally, the compact structure of spindle-shaped dwarf apple trees, characterized by small and orderly crowns, allows robots to harvest directly between rows without navigating around the trees. The orderly canopy structure further minimizes interference from branches, leaves, and background, enhancing the robot’s precision and reducing the risk of collisions.

In conclusion, improving agronomic conditions, such as canopy pruning, breeding new varieties, standardizing planting patterns, and adopting factory-style cultivation, can significantly reduce the complexity of the agricultural environment. These modifications help minimize the likelihood of target occlusion and robot collisions, thereby increasing the efficiency and safety of mechanized harvesting. Consequently, these advancements are anticipated to play a pivotal role in the future development of agricultural automation.

(3) Skill transfer and learning between humans and robots. Replacing human labor with robots in industry, healthcare, services, and agriculture has become a prominent research direction. Many tasks within these fields require high levels of precision, compliance, and flexibility, making it practical to mimic human skills for performing these tasks. For example, industrial robots [152] have adopted various functions, such as part recognition, positioning, packaging, and transportation, by emulating the collaborative capabilities of human arms. Through effective dual-arm coordination and autonomous operation, these robots can perform fully automated workflows, spanning from order processing to shipping. Multi-arm collaborative assembly robots [153] are gaining traction in manufacturing, given their capacity to quickly and flexibly construct complex structures with minimal human intervention.

In contrast, agricultural robots are still relatively limited in their capabilities. However, if these robots could perform a broader array of tasks, such as harvesting, weeding, pruning, and de-leafing, with dexterous, human-like hand movements, this wider skill set would significantly expand their range of applications, reduce their idle rate, and lower the cost to farmers of purchasing agricultural machinery.

In the medical field, surgical robots [145] utilize discrete reinforcement learning to master expert-level suturing techniques. By generating sparse rewards for desired motion trajectories, these robots assist surgeons with various types of wound suturing. This capacity to mimic human actions is equally beneficial for agricultural robots, as it could improve the accuracy and consistency of farm tasks by emulating expert farmers’ movements, ultimately ensuring higher-quality produce. Furthermore, discrete reinforcement learning enables agricultural robots to adapt to diverse environments and tasks, such as allowing a single robot to harvest different types of fruit, like apples, citrus, and peaches. In the service industry, the dual-arm robot [146] utilizing deep P-network and dueling deep P-network techniques has demonstrated exceptional gentleness and flexibility in tasks such as folding clothes. These advanced techniques could provide valuable insights for agricultural robots, particularly for non-destructive operations such as delicate fruit picking and placement. For example, mimicking human actions to place harvested grapes carefully could minimize damage risk [154].

For robots to attain or even surpass human-level operational capabilities, it is crucial to integrate human skill sets into their functionality. By establishing inverse reinforcement learning models to analyze human postures and actions, robots can extract and learn critical operational experiences, which are central to mastering human skills. Optimizing the hand–eye coordination system by integrating mechanical design and control models will further enhance the robot’s ability to replicate and improve human skills.

6. Conclusions

As labor costs continue to rise, agricultural robots have become integral to smart agriculture, particularly those capable of replacing or even outperforming humans in complex agronomic tasks. Hand–eye coordination, which enables a robot to operate dynamically on a target based on real-time visual information, is crucial for intricate agricultural operations such as fruit harvesting, weeding, pollination, and pruning. In contrast to industrial robots, which typically work in highly standardized production environments, agricultural robots require hand–eye coordination systems designed for complex agronomic tasks in unstructured conditions. Additionally, it is essential to strike a balance between accuracy and efficiency in handling these tasks. Although hand–eye coordination technology in agricultural robots has advanced considerably and is increasingly being implemented in practical agricultural applications, the robots still face various challenges, mainly in target perception and obstacle avoidance. Future research should focus on key breakthroughs in areas such as robotic imitation of human perception and manipulation skills, error identification and correction under natural conditions, and multi-eye/hand collaborative systems.

CRediT authorship contribution statement

Liang He: Writing – original draft, Conceptualization. Yuhuan Sun: Writing – review & editing. Liping Chen: Methodology. Qingchun Feng: Writing – review & editing, Funding acquisition. Yajun Li: Methodology. Jiewen Lin: Visualization, Data curation. Yicheng Qiao: Investigation. Chunjiang Zhao: Writing – review & editing, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Beijing Natural Science Foundation (6252007), the Beijing Nova Program, China (20220484023), the BAAFS Innovation Capacity Building Project (KJCX20240502), and the BAAFS International Science and Technology Cooperation Platform (2024-08).

References

[1]

Li A, Reimer JJ. The US market for agricultural labor: evidence from the national agricultural workers survey. Appl Econ Perspect Policy 2021; 43(3):1125-1139.

[2]

World Population Prospects 2024: Summary Report. New York City: United Nations; 2024 [cited 2024 Sep 15]. Available from:

[3]

Liao W, Zeng F, Chanieabate M. Mechanization of small-scale agriculture in China: lessons for enhancing smallholder access to agricultural machinery. Sustainability 2022; 14(13):7964.

[4]

Fang D, Chen J, Wang S, Chen B. Can agricultural mechanization enhance the climate resilience of food production? Evidence from China. Appl Energy 2024; 373:123928.

[5]

Verbiest R, Ruysen K, Vanwalleghem T, Demeester E, Kellens K. Automation and robotics in the cultivation of pome fruit: where do we stand today? J Field Robot 2021; 38(4):513-531.

[6]

Assirelli A, Giovannini D, Cacchi M, Sirri S, Baruzzi G, Caracciolo G. Evaluation of a new machine for flower and fruit thinning in stone fruits. Sustainability 2018; 10(11):4088.

[7]

N’cho SA, Mourits M, Rodenburg J, Lansink AO. Inefficiency of manual weeding in rainfed rice systems affected by parasitic weeds. Agric Econ 2019; 50(2):151-163.

[8]

Li Y, Huang KM, Guan Z. Mechanization and farm profit: model and application to specialty crops. In: Agricultural and Applied Economics Association (AAEA) Conferences; 2024 Jul 28–30; New Orleans, LA, USA. AgEcon Search; 2024.

[9]

Shahbandeh M. Agricultural robots: global market unit volume 2020–2030 [Internet]. New York City: Statista; 2024 Nov 14 [cited 2024 Dec 15]. Available from: https://www.statista.com/statistics/1290013/agricultural-robot-global-market-unit-volume/.

[10]

De Preter A, Anthonis J, De Baerdemaeker J. Development of a robot for harvesting strawberries. IFAC-PapersOnLine 2018; 51(17):14-19.

[11]

Li G, Chesser GD, Huang Y, Zhao Y, Purswell JL. Development and optimization of a deep-learning-based egg-collecting robot. Trans ASABE 2021; 64(5):1659-1669.

[12]

Ranasinghe HNB, Kawshan C, Himaruwan S, Kulasekera AL, Dassanayake P. Soft pneumatic grippers for reducing fruit damage during strawberry harvesting. In: Proceedings of the 2022 Moratuwa Engineering Research Conference (MERCon); 2022 Jul 27–29; Moratuwa, Sri Lanka. Piscataway: IEEE; 2022.

[13]

Arad B, Balendonck J, Barth R, Ben-Shahar O, Edan Y, Hellström T, et al. Development of a sweet pepper harvesting robot. J Field Robot 2020; 37(6):1027-1039.

[14]

Li Y, Feng Q, Zhang Y, Peng C, Ma Y, Liu C, et al. Peduncle collision-free grasping based on deep reinforcement learning for tomato harvesting robot. Comput Electron Agric 2024; 216:108488.

[15]

Fujinaga T. Strawberries recognition and cutting point detection for fruit harvesting and truss pruning. Precis Agric 2024; 25(3):1262-1283.

[16]

roboveg.com [Internet]. Leicestershire: KMS Projects Limited; c2024 [cited 2024 Sep 23]. Available from:

[17]

agrobot.com [Internet]. c2024 [cited 2024 Sep 23]. Available from:

[18]

Sun T, Zhang W, Miao Z, Zhang Z, Li N. Object localization methodology in occluded agricultural environments through deep learning and active sensing. Comput Electron Agric 2023; 212:108141.

[19]

lely.com [Internet]. Maassluis: Lely; c2024 [cited 2024 Oct 20]. Available from:

[20]

Chen M, Chen Z, Luo L, Tang Y, Cheng J, Wei H, et al. Dynamic visual servo control methods for continuous operation of a fruit harvesting robot working throughout an orchard. Comput Electron Agric 2024; 219:108774.

[21]

Lee G, Yonrith P, Yeo D, Hong A. Enhancing detection performance for robotic harvesting systems through RandAugment. Eng Appl Artif Intell 2023; 123:106445.

[22]

carbonrobotics.com [Internet]. Seattle: Carbon Robotics; c2025 [cited 2024 Sep 23]. Available from:

[23]

Wang Y, Ye Y, Wu H, Tao K, Qian M. In different weed distributions, the dynamic coverage algorithm for mechanical selective weeding robot. Comput Electron Agric 2024; 226:109486.

[24]

Nasir FE, Tufail M, Haris M, Iqbal J, Khan S, Khan MT. Precision agricultural robotic sprayer with real-time tobacco recognition and spraying system based on deep learning. PLoS One 2023; 18(3):e0283801.

[25]

arugga.com [Internet]. Australia: Arugga; c2021 [cited 2024 Sep 23]. Available from:

[26]

Chang C, Xie B, Wang C. Visual guidance and egg collection scheme for a smart poultry robot for free-range farms. Sensors 2020; 20(22):6624.

[27]

press.ecorobotix.com [Internet]. Yverdon-les-Bains: ecorobotix; c2021 [cited 2024 Sep 23]. Available from:

[28]

Quan L, Jiang W, Li H, Li H, Wang Q, Chen L. Intelligent intra-row robotic weeding system combining deep learning technology with a targeted weeding mode. Biosyst Eng 2022; 216:13-31.

[29]

metomotion.com [Internet]. Israel: GRoW; c2019 [cited 2024 Sep 23]. Available from:

[30]

Williams HAM, Jones MH, Nejati M, Seabright MJ, Bell J, Penhall ND, et al. Robotic kiwifruit harvesting using machine vision, convolutional neural networks, and robotic arms. Biosyst Eng 2019; 181:140-156.

[31]

Gao C, He L, Fang W, Wu Z, Jiang H, Li R, et al. A novel pollination robot for kiwifruit flower based on preferential flowers selection and precisely target. Comput Electron Agric 2023; 207:107762.

[32]

Mary MF, Yogaraman D. Neural network based weeding robot for crop and weed discrimination. J Phys Conf Ser 2021; 1979:012027.

[33]

Ulloa CC, Krus A, Barrientos A, del Cerro J, Valero C. Robotic fertilization in strip cropping using a CNN vegetables detection-characterization method. Comput Electron Agric 2022; 193:106684.

[34]

Liu T, Qiu J, Liu Y, Li J, Chen S, Lai J, et al. Research on an intelligent pineapple pre-harvest anti-lodging method based on deep learning and machine vision. Comput Electron Agric 2024; 218:108706.

[35]

Li Y, Wu S, He L, Tong J, Zhao R, Jia J, et al. Development and field evaluation of a robotic harvesting system for plucking high-quality tea. Comput Electron Agric 2023; 206:107659.

[36]

Hu G, Chen C, Chen J, Sun L, Sugirbay A, Chen Y, et al. Simplified 4-DOF manipulator for rapid robotic apple harvesting. Comput Electron Agric 2022; 199:107177.

[37]

Jin Y, Yu C, Yin J, Yang SX. Detection method for table grape ears and stems based on a far-close-range combined vision system and hand-eye-coordinated picking test. Comput Electron Agric 2022; 202:107364.

[38]

Mehta SS, MacKunis W, Burks TF. Robust visual servo control in the presence of fruit motion for robotic citrus harvesting. Comput Electron Agric 2016; 123:362-375.

[39]

Park Y, Seol J, Pak J, Jo Y, Kim C, Son HI. Human-centered approach for an efficient cucumber harvesting robot system: harvest ordering, visual servoing, and end-effector. Comput Electron Agric 2023; 212:108116.

[40]

Shi Y, Zhang W, Li Z, Wang Y, Liu L, Cui Y. A “global–local” visual servo system for picking manipulators. Sensors 2020; 20(12):3366.

[41]

Bai Q, Li P, Tian W, Shen J, Li B, Hu J. Vision guided dynamic synchronous path tracking control of dual manipulator cooperative system. J Manuf Sci Eng 2023; 145(12):121003.

[42]

Jiang Y, Liu J, Wang J, Li W, Peng Y, Shan H. Development of a dual-arm rapid grape-harvesting robot for horizontal trellis cultivation. Front Plant Sci 2022; 13:881904.

[43]

Sepulveda D, Fernandez R, Navas E, Armada M, Gonzalez-De-Santos P. Robotic aubergine harvesting using dual-arm manipulation. IEEE Access 2020; 8:121889-121904.

[44]

Lenz C, Menon R, Schreiber M, Jacob MP, Behnke S, Bennewitz M. HortiBot: an adaptive multi-arm system for robotic horticulture of sweet peppers. 2024. arXiv: 2403.15306.

[45]

He Z, Ma L, Wang Y, Wei Y, Ding X, Li K, et al. Double-arm cooperation and implementing for harvesting kiwifruit. Agriculture 2022; 12(11):1763.

[46]

Pu Q, Xu X, Zhang H, Li Q, Rodić A, Petrovich PB, et al. The algorithm of multiple obstacle avoidance tasks for dual-arm robots. IEEE Access 2023; 11:79190-79202.

[47]

Au CK, Lim SH, Duke M, Kuang YC, Redstall M, Ting C. Integration of stereo vision system calibration and kinematic calibration for an autonomous kiwifruit harvesting system. Int J Intell Robot Appl 2023; 7(2):350-369.

[48]

Song C, Wang K, Wang C, Tian Y, Wei X, Li C, et al. TDPPL-Net: a lightweight real-time tomato detection and picking point localization model for harvesting robots. IEEE Access 2023; 11:37650-37664.

[49]

Wang Q, Wu D, Sun Z, Zhou M, Cui D, Xie L, et al. Design, integration, and evaluation of a robotic peach packaging system based on deep learning. Comput Electron Agric 2023; 211:108013.

[50]

Tsai RY, Lenz RK. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans Robot Autom 1989; 5(3):345-358.

[51]

Park FC, Martin BJ. Robot sensor calibration: solving AX=XB on the Euclidean group. IEEE Trans Robot Autom 1994; 10(5):717-721.

[52]

Dornaika F, Horaud R. Simultaneous robot-world and hand–eye calibration. IEEE Trans Robot Autom 1998; 14(4):617-622.

[53]

Sato J. Hand–eye calibration using a tablet computer. Math Comput Appl 2023; 28(1):22.

[54]

Hua J, Zeng L. Hand–eye calibration algorithm based on an optimized neural network. Actuators 2021; 10(4):85.

[55]

Chen C, Lu J, Zhou M, Yi J, Liao M, Gao Z. A YOLOv3-based computer vision system for identification of tea buds and the picking point. Comput Electron Agric 2022; 198:107116.

[56]

Li J, Tang Y, Zou X, Lin G, Wang H. Detection of fruit-bearing branches and localization of litchi clusters for vision-based harvesting robots. IEEE Access 2020; 8:117746-117758.

[57]

Zhang X, Yao M, Cheng Q, Liang G, Fan F. A novel hand–eye calibration method of picking robot based on TOF camera. Front Plant Sci 2022; 13:1099033.

[58]

Peng Y, Liu J, Xie B, et al. Research progress of urban dual-arm humanoid grape harvesting robot. In: Proceedings of the 2021 IEEE 11th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER); 2021 Jul 27–31; Jiaxing, China. Piscataway: IEEE; 2021.

[59]

Warren M, Mckinnon D, Upcroft B. Online calibration of stereo rigs for long-term autonomy. In: 2013 IEEE International Conference on Robotics and Automation; 2013 May 6–10; Karlsruhe, Germany. Piscataway: IEEE; 2013.

[60]

Zhang Z, Zhang L, Yang G. A computationally efficient method for hand–eye calibration. Int J CARS 2017; 12(10):1775-1787.

[61]

Du G, Zhang P. Online robot calibration based on vision measurement. Robot Comput-Integr Manuf 2013; 29(6):484-492.

[62]

Li W, Dong M, Lu N, Lou X, Sun P. Simultaneous robot–world and hand–eye calibration without a calibration object. Sensors 2018; 18(11):3949.

[63]

Lin W, Liang P, Luo G, Zhao Z, Zhang C. Research of online hand–eye calibration method based on ChArUco board. Sensors 2022; 22(10):3805.

[64]

Zhang X, Xi Y, Huang Z, Zheng L, Huang H, Xiong Y, et al. Active hand–eye calibration via online accuracy-driven next-best-view selection. Vis Comput 2023; 39:381-391.

[65]

Zhang X, Song Y, Yang Y, Pan H. Stereo vision based autonomous robot calibration. Robot Auton Syst 2017; 93:43-51.

[66]

Mikhelson IV, Lee PG, Sahakian AV, Wu Y, Katsaggelos AK. Automatic, fast, online calibration between depth and color cameras. J Vis Commun Image Represent 2014; 25(1):218-226.

[67]

Han T, Zhu H, Yu D. Data-driven model predictive control for uncalibrated visual servoing. Symmetry 2024; 16(1):48.

[68]

Shademan A, Farahmand AM, Jägersand M. Robust Jacobian estimation for uncalibrated visual servoing. In: 2010 IEEE International Conference on Robotics and Automation; 2010 May 3–7; Anchorage, AK, USA. Piscataway: IEEE; 2010.

[69]

Lai G, Liu A, Yang W, Chen Y, Zhao L. Uncalibrated adaptive visual servoing of robotic manipulators with uncertainties in kinematics and dynamics. Actuators 2023; 12(4):143.

[70]

Lei X, Fu Z, Spyrakos-Papastavridis E, Pan J, Li M, Chen X. IHUVS: infinite homography-based uncalibrated methodology for robotic visual servoing. IEEE Trans Ind Electron 2024; 71(4):3822-3831.

[71]

Luo J, Zhu L, Li L, Hong P. Robot visual servoing grasping based on top-down keypoint detection network. IEEE Trans Instrum Meas 2024; 73:5000511.

[72]

Liu Z, Wang K, Liu D, Wang Q, Tan J. A motion planning method for visual servoing using deep reinforcement learning in autonomous robotic assembly. IEEE/ASME Trans Mechatron 2023; 28(6):3513-3524.

[73]

Chang CL, Xie BX, Chung SC. Mechanical control with a deep learning method for precise weeding on a farm. Agriculture 2021; 11(11):1049.

[74]

De-an Z, Jidong L, Wei J, Ying Z, Yu C. Design and control of an apple harvesting robot. Biosyst Eng 2011; 110(2):112-122.

[75]

Xiong Y, From PJ, Isler V. Design and evaluation of a novel cable-driven gripper with perception capabilities for strawberry picking robots. In: 2018 IEEE International Conference on Robotics and Automation (ICRA); 2018 May 21–25; Brisbane, QLD, Australia. Piscataway: IEEE; 2018.

[76]

Dischinger LM, Cravetz M, Dawes J, Votzke C, VanAtter C, Johnston ML. Towards intelligent fruit picking with in-hand sensing. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2021 Sep 27–Oct 1; Prague, Czech Republic. Piscataway: IEEE; 2021.

[77]

Ahlin K, Joffe B, Hu AP, McMurray G, Sadegh N. Autonomous leaf picking using deep learning and visual-servoing. IFAC-PapersOnLine 2016; 49(16):177-183.

[78]

Wang L, Chu Y, Huang Y, Liang F. Enhancement on target-gripper alignment: a tomato harvesting robot with dual-camera image-based visual servoing. In: 2024 IEEE International Conference on Robotics and Automation (ICRA); 2024 May 13–17; Yokohama, Japan. Piscataway: IEEE; 2024.

[79]

Machleb J, Peteinatos GG, Sökefeld M, Gerhards R. Sensor-based intrarow mechanical weed control in sugar beets with motorized finger weeders. Agronomy 2021; 11(8):1517.

[80]

Ji Y, Kumar R, Singh D, Singh M. Performance analysis of target information recognition system for agricultural robots. Int J Agric Environ Inf Syst 2021; 12(2):49-60.

[81]

Valero C, Krus A, Cruz Ulloa C, Barrientos A, Ramírez-Montoro JJ, del Cerro J, et al. Single plant fertilization using a robotic platform in an organic cropping environment. Agronomy 2022; 12(6):1339.

[82]

Li K, Zhai L, Pan H, Shi Y, Ding X, Cui Y. Identification of the operating position and orientation of a robotic kiwifruit pollinator. Biosyst Eng 2022; 222:29-44.

[83]

Li K, Huo YJ, Liu YA, Shi Y, He Z, Cui Y. Design of a lightweight robotic arm for kiwifruit pollination. Comput Electron Agric 2022; 198:107114.

[84]

Zhu H, Zhang Y, Mu D, Bai L, Zhuang H, Li H. YOLOX-based blue laser weeding robot in corn field. Front Plant Sci 2022; 13:1017803.

[85]

You A, Sukkar F, Fitch R, Karkee M, Davidson JR. An efficient planning and control framework for pruning fruit trees. In: Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA); 2020 May 31–Aug 31; Paris, France. Piscataway: IEEE; 2020.

[86]

Gharakhani H, Thomasson JA, Lu Y. Integration and preliminary evaluation of a robotic cotton harvester prototype. Comput Electron Agric 2023; 211:107943.

[87]

Albeladi A, Ripperger E, Hutchinson S, Krishnan G. Hybrid eye-in-hand/eye-to-hand image based visual servoing for soft continuum arms. IEEE Robot Autom Lett 2022; 7(4):11298-11305.

[88]

Li YR, Lien WY, Huang ZH, Chen CT. Hybrid visual servo control of a robotic manipulator for cherry tomato harvesting. Actuators 2023; 12(6):253.

[89]

Li T, Yu J, Qiu Q, Zhao C. Hybrid uncalibrated visual servoing control of harvesting robots with RGB-D cameras. IEEE Trans Ind Electron 2023; 70(3):2729-2738.

[90]

Yang Z, Zhang Y, Yu J, Cai J, Luo J. End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions. In: 2018 24th International Conference on Pattern Recognition (ICPR); 2018 Aug 20–24; Beijing, China. Piscataway: IEEE; 2018.

[91]

Li D, Li B, Kang S, Feng H, Long S, Wang J. E2CropDet: an efficient end-to-end solution to crop row detection. Expert Syst Appl 2023; 227:120345.

[92]

Huang P, Zhu L, Zhang Z, Yang C. An end-to-end learning-based row-following system for an agricultural robot in structured apple orchards. Math Probl Eng 2021; 2021(1):6221119.

[93]

Pal A, Leite AC, From PJ. A novel end-to-end vision-based architecture for agricultural human–robot collaboration in fruit picking operations. Robot Auton Syst 2024; 172:104567.

[94]

Tsai CY, Huang CC, Chou YS. Data-driven visual picking control of a 6-DoF manipulator using end-to-end imitation learning. In: Proceedings of the 2018 International Automatic Control Conference (CACS); 2018 Nov 4–7; Taoyuan, China. Piscataway: IEEE; 2018.

[95]

Yu H, Chen A, Xu K, Zhou Z, Jing W, Wang Y, et al. A hyper-network based end-to-end visual servoing with arbitrary desired poses. IEEE Robot Autom Lett 2023; 8(8):4769-4776.

[96]

Duan Z, Liu W, Zeng S, Zhu C, Chen L, Cui W. Research on a real-time, high-precision end-to-end sorting system for fresh-cut flowers. Agriculture 2024; 14(9):1532.

[97]

Zhao R, Zhu Y, Li Y. An end-to-end lightweight model for grape and picking point simultaneous detection. Biosyst Eng 2022; 223:174-188.

[98]

Khan K, Khan RU, Albattah W, Qamar AM. End-to-end semantic leaf segmentation framework for plants disease classification. Complexity 2022; 2022(1):1168700.

[99]

Li T, Xie F, Zhao Z, Zhao H, Guo X, Feng Q. A multi-arm robot system for efficient apple harvesting: perception, task plan and control. Comput Electron Agric 2023; 211:107979.

[100]

Botterill T, Paulin S, Green R, Williams S, Lin J, Saxton V, et al. A robot system for pruning grape vines. J Field Robot 2017; 34(6):1100-1122.

[101]

Oberti R, Marchi M, Tirelli P, Calcante A, Iriti M, Tona E, et al. Selective spraying of grapevines for disease control using a modular agricultural robot. Biosyst Eng 2016; 146:203-215.

[102]

Bulanon DM, Burr C, DeVlieg M, Braddock T, Allen B. Development of a visual servo system for robotic fruit harvesting. AgriEngineering 2021; 3(4):840-852.

[103]

Parsa S, Debnath B, Khan MA, Ghalamzan EA. Modular autonomous strawberry picking robotic system. J Field Robot 2023; 41(7):2226-2246.

[104]

Bo B, Zhang S, Liu W, Liu L, Shi Y. Simulation of workspace and trajectory of a weeding mechanism. Alex Eng J 2022; 61(2):1133-1143.

[105]

farmwise.io [Internet]. Salinas: FarmWise; c2024 [cited 2024 Sep 24]. Available from:

[106]

Li S, Hendrich N, Liang H, Ruppel P, Zhang C, Zhang J. A dexterous hand–arm teleoperation system based on hand pose estimation and active vision. IEEE Trans Cybern 2022; 54(3):1417-1428.

[107]

Bac CW, Hemming J, van Tuijl BAJ, Barth R, Wais E, van Henten EJ. Performance evaluation of a harvesting robot for sweet pepper. J Field Robot 2017; 34(6):1123-1139.

[108]

Barth R, Hemming J, Van Henten EJ. Angle estimation between plant parts for grasp optimisation in harvest robots. Biosyst Eng 2019; 183:26-46.

[109]

Zhang K, Lammers K, Chu P, Dickinson N, Li Z, Lu R. Algorithm design and integration for a robotic apple harvesting system. In: Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2022 Oct 23–27; Kyoto, Japan. Piscataway: IEEE; 2022.

[110]

Dogtooth. Strawberry harvesting robots [Internet]. Melbourn: Dogtooth; [cited 2024 Sep 25]. Available from:

[111]

Mehta SS, Burks TF.Vision-based control of robotic manipulator for citrus harvesting.Comput Electron Agric 2014; 102:146-158.

[112]

Magalhães SA, Moreira AP, Santos FN, Dias J.Active perception fruit harvesting robots—a systematic review.J Intell Robot Syst 2022; 105:14.

[113]

Gao R, Zhou Q, Cao S, Jiang Q.Apple-picking robot picking path planning algorithm based on improved PSO.Electronics 2023; 12(8):1832.

[114]

Zhang K, Lammers K, Chu P, Li Z, Lu R.An automated apple harvesting robot-From system design to field evaluation.J Field Robot 2023; 41(7):2384-2400.

[115]

Ma L, He Z, Zhu Y, Jia L, Wang Y, Ding X, et al.A method of grasping detection for kiwifruit harvesting robot based on deep learning.Agronomy 2022; 12(12):3096.

[116]

Fu M, Guo S, Chen A, et al.Design and experimentation of multi-fruit envelope-cutting kiwifruit picking robot.Front Plant Sci 2024; 15.

[117]

Williams H, Ting C, Nejati M, Jones MH, Penhall N, Lim JY, et al.Improvements to and large-scale evaluation of a robotic kiwifruit harvester.J Field Robot 2020; 37(2):187-201.

[118]

Ren G, Wu T, Lin T, Yang L, Chowdhary G, Ting KC, et al.Mobile robotics platform for strawberry sensing and harvesting within precision indoor farming systems.J Field Robot 2023; 41(7):2047-2065.

[119]

Xiong Y, Peng C, Grimstad L, From PJ, Isler V.Development and field evaluation of a strawberry harvesting robot with a cable-driven gripper.Comput Electron Agric 2019; 157:392-402.

[120]

Tituaña L, Gholami A, He Z, Xu Y, Karkee M, Ehsani R.A small autonomous field robot for strawberry harvesting.Smart Agric Technol 2024; 8:100454.

[121]

Shi Y, Jin S, Zhao Y, Huo Y, Liu L, Cui Y.Lightweight force-sensing tomato picking robotic arm with a “global–local” visual servo.Comput Electron Agric 2023; 204:107549.

[122]

Feng Q, Zou W, Fan P, Zhang C, Wang X.Design and test of robotic harvesting system for cherry tomato.Int J Agric Biol Eng 2018; 11(1):96-100.

[123]

Rong J, Wang P, Wang T, Hu L, Yuan T.Fruit pose recognition and directional orderly grasping strategies for tomato harvesting robots.Comput Electron Agric 2022; 202:107430.

[124]

Miao Z, Yu X, Li N, Zhang Z, He C, Li Z, et al.Efficient tomato harvesting robot based on image processing and deep learning.Precis Agric 2023; 24(1):254-287.

[125]

Williams H, Nejati M, Hussein S, Penhall N, Lim JY, Jones MH, et al.Autonomous pollination of individual kiwifruit flowers: toward a robotic kiwifruit pollinator.J Field Robot 2020; 37(2):246-262.

[126]

Chang CL, Chen HW, Ke JY.Robust guidance and selective spraying based on deep learning for an advanced four-wheeled farming robot.Agriculture 2024; 14(1):57.

[127]

Silwal A, Yandun F, Nellithimaru A, Bates T, Kantor G.Bumblebee: a path towards fully autonomous robotic vine pruning.2021. arXiv: 2112.00291.

[128]

Lu Z, Zhao M, Luo J, Wang G, Wang D.Automatic teat detection for rotary milking system based on deep learning algorithms.Comput Electron Agric 2021; 189:106391.

[129]

Gao J, Zhang F, Zhang J, Yuan T, Yin J, Guo H, et al.Development and evaluation of a pneumatic finger-like end-effector for cherry tomato harvesting robot in greenhouse.Comput Electron Agric 2022; 197:106879.

[130]

Ma Y, Feng Q, Sun Y, Guo X, Zhang W, Wang B, et al.Optimized design of robotic arm for tomato branch pruning in greenhouses.Agriculture 2024; 14(3):359.

[131]

Kurtser P, Edan Y.Planning the sequence of tasks for harvesting robots.Robot Auton Syst 2020; 131:103591.

[132]

Li Y, Feng Q, Zhang Y, Peng C, Zhao C.Intermittent stop-move motion planning for dual-arm tomato harvesting robot in greenhouse based on deep reinforcement learning.Biomimetics 2024; 9(2):105.

[133]

Robot de désherbage mécanique ANATIS [Internet]. Saint Martin des Noyers: CARRE; 2024 [cited 2024 Sep 25]. Available from:

[134]

Jiang W, Quan L, Wei G, Chang C, Geng T.A conceptual evaluation of a weed control method with post-damage application of herbicides: a composite intelligent intra-row weeding robot.Soil Tillage Res 2023; 234:105837.

[135]

Au W, Zhou H, Liu T, Kok E, Wang X, Wang M, et al.The Monash apple retrieving system: a review on system intelligence and apple harvesting performance.Comput Electron Agric 2023; 213:108164.

[136]

Zapotezny-Anderson P, Lehnert C.Towards active robotic vision in agriculture: a deep learning approach to visual servoing in occluded and unstructured protected cropping environments.IFAC-PapersOnLine 2019; 52(30):120-125.

[137]

Lin G, Zhu L, Li J, Zou X, Tang Y.Collision-free path planning for a guava-harvesting robot based on recurrent deep reinforcement learning.Comput Electron Agric 2021; 188:106350.

[138]

Lin G, Tang Y, Zou X, Wang C.Three-dimensional reconstruction of guava fruits and branches using instance segmentation and geometry analysis.Comput Electron Agric 2021; 184:106107.

[139]

Zhang Q, Liu F, Li B.A heuristic tomato-bunch harvest manipulator path planning method based on a 3D-CNN-based position posture map and rapidly-exploring random tree.Comput Electron Agric 2023; 213:108183.

[140]

Zhuang M, Li G, Ding K.Obstacle avoidance path planning for apple picking robotic arm incorporating artificial potential field and A* algorithm.IEEE Access 2023; 11:100070-100082.

[141]

Li Z, Xiong Z, Tian K, Gao T, Cai K.Deep reinforcement learning for robotic arm path planning in multi-obstacle environments.In: Proceedings of the 2024 9th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS); 2024 Jul 18–20; Dalian, China. Piscataway: IEEE; 2024.

[142]

Suo R, Gao F, Zhou Z, Fu L, Song Z, Dhupia J, et al.Improved multi-classes kiwifruit detection in orchard to avoid collisions during robotic picking.Comput Electron Agric 2021; 182:106052.

[143]

Yu X, Fan Z, Wang X, Wan H, Wang P, Zeng X, et al.A lab-customized autonomous humanoid apple harvesting robot.Comput Electr Eng 2021; 96:107459.

[144]

Yan B, Quan J, Yan W.Three-dimensional obstacle avoidance harvesting path planning method for apple-harvesting robot based on improved ant colony algorithm.Agriculture 2024; 14(8):1336.

[145]

Varier VM, Rajamani DK, Goldfarb N, Tavakkolmoghaddam F, Munawar A.Collaborative suturing: a reinforcement learning approach to automate hand-off task in suturing for surgical robots.In: 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN); 2020 Aug 31–Sep 4; Naples, Italy. Piscataway: IEEE; 2020.

[146]

Tsurumine Y, Cui Y, Uchibe E, Matsubara T.Deep reinforcement learning with smooth policy update: application to robotic cloth manipulation.Robot Auton Syst 2019; 112:72-83.

[147]

Razjigaev A, Pandey AK, Howard D, Roberts J, Wu L.End-to-end design of bespoke, dexterous snake-like surgical robots: a case study with the RAVEN II.IEEE Trans Robot 2022; 38(5):2827-2840.

[148]

Nguyen VS, Hwang B, Kim B, Jung JH.An end-to-end learning-based control signal prediction for autonomous robotic colonoscopy.IEEE Access 2024; 12:1280-1290.

[149]

Wu Y, Yu Z, Li C, He M, Hua B, Chen Z.Reinforcement learning in dual-arm trajectory planning for a free-floating space robot.Aerosp Sci Technol 2020; 98:105657.

[150]

Wang S, Cao Y, Zheng X, Zhao T.An end-to-end trajectory planning strategy for free-floating space robots.In: 2021 40th Chinese Control Conference (CCC); 2021 Jul 26–28; Shanghai, China. Piscataway: IEEE; 2021.

[151]

Li S, Nguyen HT, Cheah CC.A theoretical framework for end-to-end learning of deep neural networks with applications to robotics.IEEE Access 2023; 11:21992-22006.

[152]

Papadopoulos G, Andronas D, Kaliakatsos-Georgopoulos D, Kampourakis E, Kavvathas K, Theodoropoulos N, et al.Reconfigurable manufacturing systems through intelligent workpiece handling and artificial intelligence.Procedia CIRP 2024; 128:793-798.

[153]

Chen J, Li J, Huang Y, Garrett C, Sun D, Fan C, et al.Cooperative task and motion planning for multi-arm assembly systems.2022. arXiv: 2203.02475.

[154]

Sun J, Feng Q, Zhang Y, Ru M, Li Y, Li T, et al.Fruit flexible collecting trajectory planning based on manual skill imitation for grape harvesting robot.Comput Electron Agric 2024; 225:109332.
