《1. Introduction》

1. Introduction

Intelligent robots are becoming increasingly important in both industry and everyday life. In industry, rising labor costs are motivating manufacturers to consider using more robots in factories. For example, in China the average minimum wage increased by more than 20% in 2012, while the supply of manufacturing robots also increased by 51% [1]. Europe and USA exhibit similar trends: Intelligent robots are being designed in order to make workers more productive and make manufacturers more competitive in terms of price and quality. One recent example among these intelligent robots is the new “Baxter” robot [2], which is equipped with software that enables the robot to learn various tasks from human demonstration, recognize different objects, and react intelligently to external forces. Intelligent robots are expected to assist people in everyday life. In the future, such robots are expected to perform various tasks, including ① household and care support, such as cooking and laundry; ② healthy life support, such as chatting with the elderly and taking care of people with disabilities; and ③ labor support in unsafe working conditions such as chemical plants [3]. Several successful prototypes for assistant robots exist. For example, the PR2 robot from Willow Garage has been shown to assist people with severe physical disabilities such as quadriplegia [4]; and humanoid robots such as the HRP-4 can perform human-like actions, and can communicate with people using speech [5]. In addition to their applications in industry and everyday life, modern intelligent robots can be helpful in other areas, including autonomous vehicles [6], medical and surgical intervention [7], emergency and disaster rescue [8], and military tasks [9].

The tremendous improvement in the design and availability of intelligent robots over the last decade is based on progress in many related areas, including computer vision, artificial intelligence, machine learning, control, sensor systems, and mechanical systems, which correspond to different components of an intelligent robot system (Figure 1). For example, the simultaneous localization and mapping (SLAM) algorithm enables a robot to accurately track its position in an unknown environment [10]. In addition, with the help of advanced vision techniques, robots can now recognize and segment objects from background point clouds [11]. Compared to traditional industrial robots, one important feature of the modern intelligent robot system is high-level planning and navigation. Its main purpose is to compute low-level instructions based on high-level descriptions for the tasks to be executed; these low-level instructions are then provided to the robot actuator system. This planning and navigation component is composed of many different sub-components (Figure 1) and there has been extensive work in this area, such as task planning [12], feedback from observation [13,14], optimal control [15], and adaptive control [16].

《Fig. 1》

Fig.1 Important hardware and software subsystems in a robot system. ① Feed-forward system (F), including task planning, navigation strategy, motion planning, and trajectory generation; ② control system (C), including kinematics, dynamics, and control algorithms; ③ actuator system (A), including motors, servos, transmissions, and so forth; ④ sensor system (S), including various sensors such as camera, laser, IMU, and related low-level sensor data processing algorithms such as signal processing, estimation, and fusion; ⑤ sensor post-processing system (S+), including localization, mapping, etc. The main software component of a robot system is the high-level planning and navigation, which determines the instructions sent to the actuator system, given the desired tasks to be executed. One important component of high-level planning and navigation is motion planning, which focuses on computing the trajectory from the environment description.

One of the most important sub-systems of the high-level planning and navigation component is the motion planning system, which enables the robot to move safely from an initial position to a goal position without colliding with any static or moving obstacles in the environment. A fast online motion planning algorithm is critical for many applications. For instance, there exists an increasing demand to incorporate mobile robots to assist humans in repetitive, non-value added tasks in the manufacturing domain [17]. To ensure a safe collaboration between robots and human beings, a robot needs to react to the changing environment in time, and this requires a real-time motion planning to avoid any accidents. Online motion planning is also critical while improving the level of automation for many traditional manufacturing process. For instance, for automatic bin picking, an efficient online motion planning is critical for the grasp performance [18]. In addition, a fast planning is also desirable for service robots working in hospitals and human homes. Motion planning problems can be directly formalized and solved in the 3D workspace, for instance with the widely-used potential field algorithms [19]. However, these workspace solutions cannot easily handle robots with different geometries and mechanical constraints. To overcome these difficulties, motion planning may be formalized and solved in a new space called the configuration space [20¯22]. In the configuration space, a robot with a complex geometric shape in a 3D workspace is mapped to a point robot, and the robot’s trajectory corresponds to a continuous curve in the high-dimensional configuration space (Figure 5). Based on the configuration space formulation, the motion planning problem can be solved in two steps:

(1) Construct a representation of the configuration space;

(2) Perform optimization based on the computed represen-tation.

This motion planning pipeline based on configuration spaces is very successful and is adopted by many real-world planning applications that require optimal planning solu-tions. Many different representations for the configuration space have been proposed, including polyhedrons [23], semi-algebraic sets [24,25], graphs [26], and trees/forests [27]. Different optimization approaches have been proposed for different configuration space representations, including com-puting a shortest path, computing the minimum distance to the boundary of a closed set inside the configuration space, and so forth. Moreover, the same pipeline is also implicitly used in some motion planning algorithms for only computing a feasible path (i.e., a collision-free path that does not violate other constraints). For example, many variants of rapidly exploring random tree (RRT) [27] use different heuristics to guide the search toward the goal configuration while growing a search tree structure as an approximate representation of the configuration space. Such a strategy can be viewed as a variant of the above pipeline, in which the configuration space construction alternates with the optimization computation.

However, this algorithmic pipeline based on configuration space still has many computational challenges. Two of the most important challenges are described here:

(1)Efficiently computing an approximate or exact representation for the configuration space is difficult, especially for high-DOF (degree of freedom) robots with high-dimensional configuration spaces. Such configuration space approximation problems would have exponential complexity.

(2)Many robotics applications require real-time planning in order to work reliably and efficiently in human environments with moving obstacles, but performing optimization in the computed representation for the configuration space can be time consuming.

In this paper, we will discuss our recent work for solving these challenges related to the configuration space. In par-ticular, we first demonstrate how to convert the configuration space construction problem into a machine learning problem, and then use active learning to compute an approximate configuration space efficiently and robustly (Section 4). Second, we provide parallel GPU-based algorithms to accelerate the optimization computations in the configuration space, which can allow for real-time planning computation in many challenging environments (Section 5).

《2. Background and related work》

2. Background and related work

《2.1. Configuration space》

2.1. Configuration space

The configuration space is a key concept used in classical mechanics to describe and analyze the motion of many important systems [28]. Generally, a configuration q is a vector of independent parameters uniquely specifying the state of a system; and a configuration space or C-space is a collection of all possible configurations for a given system. For example, for a system of n point particles, the configuration is a vector describing the positions of all the particles, and the corresponding C-space is R3n; the configuration of a 3D rigid body consists of its position and orientation, and the configuration space is SE(3) if both rotation and translation are allowed, and R3 if only translation is allowed; and the configuration of an articulated object is the vector of all its joint angles.

The configuration space of a robot A is composed of two components: collision-free space Cfree={q:A(q)∩B=∅} and in-collision space or obstacle space Cobs={q:A(q)∩B≠∅}, where B corresponds to the geometric representation of obstacles in the environment and A( q) corresponds to A with the configuration q. Cobs is a closed set and its boundary is denoted as the contact space Ccont = ∂ Cobs, which corresponds to the set of configurations where A and B just touch each other without penetration. Figure 2 shows an example of the C-space of two objects where Ccont is highlighted with an orange curve.

《Fig. 2》

     

Fig.2 The configuration space of two objects. The orange curve highlights the contact space Ccont of A and B. A point inside/on the orange curve belongs to Cobs and a point outside the orange curve belongs to Cfree. The red and green points denote configurations in Cobs and Cfree, respectively.

 《Fig. 3》

    

Fig.3 Cobs between 2D rigid objects A and B. For each rotation angle θ∈ [0, 2π), we can compute the Minkowski sum between A( θ) and − B, where A( θ) is the resulting shape after rotating A about the origin with θ degrees. When stacking the Minkowski sums for all angles θ, we obtain the Cobs between A and B.

In the special case when A and B are both rigid objects and robot A can only perform translational motion, Cobs is equal to the well-known Minkowski sum between A and B:Cobs=A⊕(−B)={x=xA+xB/xA∈A,xB∈−B}. One example of the Minkowski sum is shown in Figure 2. When robot A can perform general motion (i.e., both translation and rotation), the geometry of Cobs is much more complicated, as shown by the 2D example in Figure 3.

Based on the notion of configuration space, the motion planning problem in 3D workspace can be reduced to path planning for a point robot in C-space, that is, finding a curve in Cfree connecting the given initial and goal configurations of the robot.

《2.2. Configuration space construction》

2.2. Configuration space construction

Before performing the motion planning computation in the configuration space, one prerequisite is to compute the geome-try of C-space in an appropriate representation (e.g., a graph or a surface). Since C-space= Cfree∪ Cobs and Cfree ∩ Cobs = Ø, we only need to construct the representation for either Cfree or Cobs. Another equivalent solution is to compute Ccont, the boundary between Cfree and Cobs.

Previous work on configuration space construction can be categorized into two different methods: geometry-based and topology-based. Geometry-based methods compute the exact geometric representation of the configuration space, while topology-based methods capture the connectivity of the configuration space.

Geometry-based methods are usually limited to low-dimensional configuration spaces, due to the combinatorial complexity involved in computing the boundary of Cobs for high-dimensional configuration spaces. Most previous work has focused on the special case when objects A and B are rigid bodies only performing translational motion. As mentioned in Section 2.1, the resulting Cobs is the Minkowski sum between A and – B. Even for this special case, the computational complexity involved in computing Cobs is still high: The complexity is O( mn) when A and B are both convex-objects and is O( m3 n3) when A and B are both non-convex objects [29], where m and n are the number of triangles in A and B, respectively. In addition to the high complexity, most existing implementations for computing the Minkowski sum are prone to challenges that arise in the context of 3D geometric algorithms. In particular, these implementations are ① not robust to numerical errors, and ② susceptible to degeneracies (i.e., cannot reliably handle polygon soups or meshes with holes). Recent work has proposed methods [30¯32] for computing the approximate Minkowski sum efficiently and reliably, but these methods are also prone to robustness issues and can have high complexity in terms of dealing with complex objects. Options other than the Minkowski sum exist for computing Cobs . For example, Varadhan et al. [33] computed the Cobs for 2D objects with rotation and translation by approximating the Cobs with an adaptive grid; and Zhang et al. [34] computed an approximation to 4D C-space using cell decomposition.

Topology-based methods capture the connectivity of the configuration space. Most previous approaches attempt to capture the connectivity of Cfree using sampling techniques [26,27]. The basic idea is first to generate random samples (called milestones) in Cfree and then organize these samples using a graph structure or a forest of tree structures (Figure 4). As the topology of Cfree can be rather complex, and may consist of multiple components or small, narrow passages, it is hard to capture the full connectivity of Cfree using random sampling. There is extensive work on improving the connectivity computation by using different sampling strategies [35–40]. Recent work attempts to capture the topology of both Cfree and Cobs [41]. Topology-based methods can compute an approximate C-space representation much faster than geometry-based methods. However, these methods do not work well with narrow passages and can be slow for high-DOF robots.

《2.3. Optimization in configuration space》

2.3. Optimization in configuration space

Once an exact or approximate representation for the configuration space is computed, we next need to perform optimization in this C-space representation. For example, the goal of motion planning is to compute a trajectory in C-space, as shown in Figure 5. The trajectory should satisfy the following constraints: ① It should be completely inside Cfree; and ② it should be feasible, e.g., for humanoid robots, the robot should not fall down when following the trajectory. Moreover, it is preferable for the trajectory to be optimal under some metric. For instance, the optimal trajectory could be the shortest, take the least time to execute, or maintain the maximum distance from obstacles. As a result, motion planning can be formalized as a constrained optimization problem in C-space. Similar formulation can be applied to different applications, such as penetration depth computation [39,42¯44].

《Fig. 4》

     

Fig.4 Topology-based methods for configuration space computation. (a) Capture Cfree using a graph structure; (b) capture Cfree using a forest of tree structures.

Optimization in C-space is usually computationally expen-sive, especially for a high-dimensional C-space with a com-plicated structure and topology. To illustrate the computa-tional challenge for C-space optimization problems, we take motion planning in C-space as an example. Theoretically, motion planning using the exact representation of C-space has high computational complexity. Planning algorithms are considered to be “complete” if for any planning problem instance, the algorithm will either find a solution or will correctly report that no solution exists. Complete planning algorithms have been proved to be PSPACE-hard [45] and PSPACE-complete [24], and kinodynamic motion planning (i.e., motion planning with simple kinematic or dynamic constraints) has been shown to be NEXPTIME-hard [25]. The decidability is still unknown for motion planning with general differential constraints [46]. When the approximate representation of C-space is used (e.g., using a graph or forest to approximate the connectivity of Cfree), there exist approximate motion planning algorithms that provide guarantees of probabilistic completeness [26,27] and/or asymptotic optimality [47]. The complexity of these approximate motion planning algorithms is usually bounded by O( nln n) where n is the number of configuration samples used in the approximate representation of C-space [47]. Since n can be very large when Cfree has narrow passages and/or high dimensionality [48], the performance of these approximate algorithms is still far from real-time.

《Fig. 5》

     

Fig.5 Motion planning in workspaces and in configuration spaces. The left figure shows obstacles (with different colors) and a 2-linked robot in the workspace (both the initial and goal settings). The right figure shows the configuration space corresponding to the workspace in the left figure, where different colors describe the correspondence between obstacles in the workspace and obstacles in the configuration space. The blue curve is a trajectory connecting the initial and goal configurations, and is the result of a motion planning algorithm.

Various planning methods related to C-space have been proposed in the past decades, including optimization-based planning algorithms such as CHOMP [49] and TrajOPT [50], and search-based algorithms such as Anytime A* [51]. For motion planning of high-DOF robots, most of the practical methods are based on randomized algorithms, including probabilistic roadmap (PRM) [26] and RRT [27].

《3. Overview》

3. Overview

Our solutions to the challenges related to the configuration space include two parts, for the configuation space construction and the optimization in configuration spaces, respectively.

《3.1. Efficient configuration space construction》

3.1. Efficient configuration space construction

In the first part [52], we present a novel algorithm to efficiently approximate a high-dimensional configuration space using machine learning techniques. The main idea is to generate samples in the configuration space and then use these samples to approximate the contact space Ccont by a separating surface that can correctly separate all the in-collision and collision-free samples. This separating surface is computed using support vector machine (SVM) classification. Our method greatly reduces the required number of samples by leveraging incremental and active learning techniques. When the number of samples increases, the approximate contact space computed by our method can quickly converge to the exact contact space; we also provide bounds on the expected error in the approximate contact space. We evaluate the performance of our algorithm on high-dimensional benchmarks.

To construct a representation of the configuration space, we use an offline learning algorithm, as shown in the left box in Figure 6. We first generate a small set of uniform samples in a subspace of C-space for two given objects. Next, we justify whether these configurations lie in Cfree or in Cobs by performing exact collision checking between the two objects. We use the notation c( q)∈ {–1, +1} to denote the collision state of a configuration q, that is, c( q) = +1 if q∈ Cobs and c( q) = –1 if q∈ Cfree. Given the collision states of all configuration samples, a coarse approximation of the contact space, LCS0 (Figure 6(b)), is computed using classifiers, where LCS stands for learned contact space. Next, we select new samples in C-space to further improve the accuracy of the initial representation LCS0 using active learning. During active learning, we either select samples that are far away from prior samples ( exploration) (Figure 6(c)) or samples that are near LCS0 ( exploitation) (Figure 6(d)). After the new samples are generated, we compute an updated approximation LCS1 (Figure 6(e)) based on incremental machine learning techniques. We repeat this process, generating a sequence of approximate representations LCS0, LCS1, ..., with increasing accuracy. This iterative process is repeated until the collision states of all the new samples can be correctly predicted by the current approximation. The final result LCS (Figure 6(f)) corresponds to a smooth surface approximation of the contact space.

《Fig. 6》

     

Fig.6 Offline computation pipeline for Ccont approximation. The different approximations of LCS are shown below the corresponding stages. We use green points to indicate collision-free configuration samples and red points to indicate in-collision samples. (a) Exact contact space (for reference); (b) initial model; (c) exploration; (d) exploitation; (e) solution after i-th iteration; (f) final approximate contact space.

《3.2. Efficient optimization in configuration spaces》

3.2. Efficient optimization in configuration spaces

The approximate contact space computed by the first part of our work can potentially be directly used for motion planning, for instance, by using it as the collision-free constraint in the trajectory optimization framework [53]. However, this constraint is highly non-convex. Thus, solving the resulting optimization problem is still non-trivial, and this is one focus of our ongoing work. Another way to solve the motion planning problem is using a sample-based approximation to the configuration space, which has been adopted by many traditional planning algorithms such as the PRM. In the se-cond part [54] of this paper, we present a novel parallel algorithm for real-time motion planning of high-DOF robots that exploits the computational capability of a $400 USD commodity graphics processing unit (GPU). Current GPUs are programmable many-core processors that can support thousands of concurrent threads. We use them for real-time computation of a PRM and a lazy planner. We describe efficient parallel strategies for constructing the roadmap that include sample generation, collision detection, connecting nearby samples, and local planning. The query phase is also performed in parallel based on a graph search. In order to design an efficient single query planner, we use a lazy strategy that defers collision checking and local planning. We also describe new hierarchy-based collision detection algorithms, to accelerate the overall performance.

We choose the PRM algorithm as the underlying method for parallel planning, because it is most suitable to exploit multiple cores and data parallelism on GPUs. The PRM algo-rithm has two phases: roadmap construction and querying. The roadmap construction phase includes four main steps: ① Generate samples in the configuration space; ② compute milestones that correspond to the samples in the free space by performing discrete collision queries; ③ for each milestone, find other milestones that are nearest to it; and ④ connect nearby milestones using local planning, and form a roadmap. The query phase includes two parts: ① Connect initial and goal configurations of the query to the roadmap, and ② execute a graph search algorithm on the roadmap and find collision-free paths.

Parts of the PRM algorithm, such as the collision queries, are embarrassingly parallel [55]. We can use a many-core GPU to significantly enhance the performance of the other components as well. The framework of our PRM algorithm on the GPU is shown in Figure 7. We parallelize each of the six steps of the PRM algorithm efficiently. First, each thread of a multi-core GPU generates a random robot configuration, and some of these configurations will collide with obstacles. All of the collision-free samples are milestones and become vertices of the roadmap graph. Next, each GPU thread computes the k-nearest neighbors of a single milestone and collects all the neighborhood pairs. Each thread then checks whether it is possible to connect these adjacent pairs by performing local planning. If there is a collision-free path between that neighborhood pair of milestones, we add the edge to the roadmap. Once the roadmap is built, queries are connected to the roadmap in parallel and we use a parallel graph search algorithm to find paths.

《Fig. 7》

     

Fig.7 PRM overview and parallel components in our algorithm.

The resulting GPU-based framework is very efficient for a multi-query version of the planning problem. The most expensive step in this computation is the local planning algorithm; thus, we use new collision detection algorithms to improve its performance. In order to accelerate the single-query algorithm, we introduce a solution that uses a lazy strategy and defers collision checking for local planning. In other words, the algorithm connects all the edges corresponding to the nearest neighbors and searches for paths between the initial and final configurations. After that, it performs local planning on the edges that constitute these paths.

《4. Learning-based approximate contact space construction》

4. Learning-based approximate contact space construction

In this section, we present our algorithm for the offline learning of the contact space and the computation of LCS. Diffe-rent stages of this algorithm are shown in Figure 6.

《4.1. Initial sampling》

4.1. Initial sampling

We perform uniform sampling in C-space to obtain a set of configuration points. Rather than sampling the entire C-space, we generate samples in a subspace that contains Ccont. Given two objects A and B, the contact space Ccont is contained in the in-collision space of their bounding volumes BV( A) and BV ( B).

《4.2. Compute LCS0》

4.2. Compute LCS0

Given a set of k samples from Cobs( BV( A), BV( B)), we perform exact collision queries between A and B to check whether these samples are within in-collision space or not. Our goal is to learn an approximate representation LCS0 from these configurations. In particular, LCS0 corresponds to a decision function f( q) = 0 that is fully determined by a set of configurations S in C-space. We refer to f( q) as the classifier and use it to predict whether a given configuration q is collision-free ( f( q)<0) or in-collision ( f( q)>0). S corresponds to the support vectors, which are a small subset of configuration samples used in learning. Intuitively, S are the samples that are closest to Ccont.

We use the SVM classifier [56] to learn LCS0 from the initial sampling of k configurations. An SVM generates a decision function that is a smooth nonlinear surface. We use the hardmargin SVM, as the underlying samples can always be separated into collision-free and in-collision spaces. Intuitively, an SVM uses a function to map the given samples { q i} from the input space into a higher (possibly infinite) dimensional feature space. An SVM computes a linear separating hyperplane characterized by parameters w and b. The hyperplane’s maximal margin is in the higher dimensional feature space. The hyperplane corresponds to a nonlinear separating surface in the input space. The w is the normal vector to the hyperplane, and the parameter b determines the offset of the hyperplane from the origin along the normal vector. In the feature space, the distance between a hyperplane and the closest sample point is called the “margin,” and the optimal separating hyperplane should maximize this distance. The maximal margin can be achieved by solving the following optimization problem:

where ci∈{−1;+1} is the collision state of each sample qi. We define K(qi,qi)=φ(qi)Tφ(qj) as the kernel function (i.e., a function used to calculate inner products in the feature space). We use radial basis function (RBF) kernel in our results.

The solution of Eq. (1) is a nonlinear surface in the input space (and a hyperplane in the feature space) that separates collision-free and in-collision configurations. This solution can be formulated as:

where w* and b* are the solutions of Eq. (1) and αi≥0. The vectors qi corresponding to the non-zero αi are called the support vectors, which we denote as S. Intuitively, the support vectors are those samples closest to the separating hyperplane f( q) = 0, as shown by the larger red and green points in Figure 6(b). Thus, LCS0 consists of an implicit function fLCS0(q)=f(q) and a set of samples SLCS0=S (i.e., the support vectors), which are used to approximate the exact contact space.

《4.3. Refine LCS0 using active learning》

4.3. Refine LCS0 using active learning

We refine LCS0 using active learning. The goal is to actively select new samples so that a better approximate contact space representation, LCS1 , can be obtained by in­corporating these samples into LCS0. We use a combination of exploration and exploitation [57]. The idea is to determine whether to explore or to exploit by flipping a biased coin with a certain probabili-ty for landing on heads (initially 0.5). If the result is heads, we apply exploration; if it is tails, we apply exploitation. The probability of landing on heads is adjusted according to the fraction of exploration samples’ collision states that are correctly predicted by the current LCSi. The new samples are used to update LCS0 and generate a new approximation LCS1 (or refine from LCSi to LCS i+1). We repeat the active learning step until all the new samples can either be correctly predicted by the current LCSi, or the final result (represented as LCS) has sufficient accuracy to approximate Ccont.

《4.4. Incremental learning》

4.4. Incremental learning

Instead of computing a new decision function from scratch using all the previous samples, we apply incremental learning techniques to efficiently compute LCSi+1 from LCSi. Incre­mental learning utilizes a small set of new samples to update LCSi. The decision function of LCSi serves as the initial guess for generating LCSi+1. The incremental SVM [58] can update the current result generated using SVMs; the key is to retain the optimality condition of Eq. (1) (i.e., the Kuhn-Tucker condition) on all prior samples while adding new samples. This is achieved by adjusting the coefficients αi and bin Eq. (2) and by adjusting support vector set S. The coefficient adjustment and the support vector changes are guided by the gradient of the objective function in Eq. (2).

《4.5. Terminating active learning》

4.5. Terminating active learning

Active learning terminates when either of these conditions has been satisfied:

(1)The collision states of all the new samples generated during exploration and exploitation can be correctly predicted by the current approximation LCSi.

(2)The total number of samples used in active learning iterations is more than a user-specified threshold.

The first condition guarantees that all the configurations used for learning LCS are consistent (i.e., they can be correctly predicted by LCS). This implies that the current LCSis a close approximation of the underlying contact space. The second condition controls the accuracy of the approximate Ccont: As more samples are used, we get a better approximation to Ccont.

《5. GPU-based real-time optimization in configuration spaces》

5. GPU-based real-time optimization in configuration spaces

In this section, we present the details about how to use GPU to accelerate the optimization speed in configuration spaces.

《5.1. Hierarchy computation》

5.1. Hierarchy computation

We construct a bounding volume hierarchy (BVH) for the robot and one for each of the obstacles in the environment, to accelerate the collision queries. We use the GPU-based construction algorithm introduced in Ref. [59], which can construct the hierarchy of axis-aligned bounding boxes or oriented bounding boxes(OBB) in parallel on the GPU, for given triangle representation. For collision detection, we use the OBB hierarchy, as it provides higher culling efficiency and improved performance on GPU-like architectures. These hierarchies are stored in the GPU memory and we apply appropriate transformations for different configurations.

《5.2. Roadmap construction》

5.2. Roadmap construction

The roadmap construction phase tries to capture the con­nectivity of the free configuration space, which is the main computationally intensive part of the PRM algorithm.

(1) Sample generation: We first need to generate random samples within the configuration space. Since samples are independent, we schedule enough parallel threads to utilize the GPU and use MD5 cryptographic hash function [60], which in practice provides good randomness without a shared seed.

(2) Milestone computation: For each configuration gen­erated in the previous step, we need to check whether it is a milestone: i.e., a configuration that lies in the free space and does not collide with obstacles. We use a hierarchical collision detection approach using BVHs to test for overlap between the obstacles and the robot in the configuration defined by the sample. The collision detection is performed in each thread by using a traversal algorithm in the two BVHs. The traversal algorithm starts with the two BVH root nodes and tests the OBB nodes for overlap in a recursive manner. If two nodes overlap, then all possible pairings of their children should be recursively tested for intersection.

We also use GPUs to compute the actual BVH structure for both the robot and obstacles by using a parallel hierarchy construction algorithm [59]. Since the robot’s geometric objects move depending on the configuration, its BVH is only valid for the initial configuration. In order to avoid recomputing a BVH for each configuration, we instead transform each node of the robot’s BVH with the current configuration sample before performing overlap tests. Thus, only nodes that are actually needed during collision testing are transformed.

(3) Proximity computation: For each milestone computed, we need to find its k-nearest neighbors ( k-NN). In general, there are two types of k-NN algorithms: exact k-NN and approximate k-NN, which is faster by allowing a small relaxation. Our proximity algorithm is based on a range query that uses a BVH structure of the points in configuration space.

For 3-DOF Euclidean space, we first construct the BVH structure for all the milestones using a parallel algo­rithm [59]. For each configuration q, we enclose it within an axis-aligned box: A box with q as the center and with 2 ε as the edge length. Next, we traverse the BVH tree to find all leaf nodes (i.e., configurations) that are within the ε box. This reduces to a range-query for q. For a non-Euclidean DOF, we duplicate samples to transform it into a Euclidean space locally. For example, suppose one DOF is the rotation angle α∈ [0, 2π]. We add another sample α*∈ [−π, 3π] with a distance 2π to α. If all 3-DOF are rotations, we need to add another 7 samples for each milestone. Once the range query finishes, we choose the k-nearest ones from all the query results; this gives us the exact nearest neighbors. This approach can be extended to handle k-NN search in high dimensional space by using a decomposition strategy; the details are in our recent work [61].

To further improve the performance of proximity computation in high dimensional space, we have developed a new k-NN algorithm, which uses locality sensitive hashing (LSH) and cuckoo hashing to efficiently compute approximate k-NN in parallel on the GPU. For more details, please refer to our recent work [62].

(4) Local planning: Local planning checks whether there is a local path between two milestones, which corresponds to an edge on the roadmap. It is the most expensive part of the PRM algorithm. Suppose we have n m milestones, and each milestone has at most nk nearest neighbors. Then the algorithm performs local planning at most nm· nk times. If we use DCD, then we need to perform at most nl = n m· n k· n i collision queries, which can be very high for a complex benchmark. For multi-query problems, this cost can be amortized over multiple queries, as the roadmap is constructed only once. For a single-query problem, computing the whole roadmap is too expensive.

Therefore, in the single-query case, we use a lazy strategy to defer local planning until absolutely necessary. Given a query, we compute several different candidate paths in the roadmap graph from the initial to final configuration and only check local planning for roadmap edges on the candidate paths. Local planning may conclude that some of these edges are not valid, and in that case, we delete them from the roadmap. If there exists one candidate path without invalid edges, the algorithm has found a collision-free solution. Otherwise, we compute candidate paths again on the updated roadmap and repeat the above process. This lazy strategy can greatly improve performance for single queries.

《5.3. Query phase》

5.3. Query phase

The query phase includes two parts: connecting queries to the roadmap and executing graph searches to find paths.

(1) Query connection: Given the initial-goal configura­tions in a single query, we connect them to the roadmap. For both of these configurations, we find the k-nearest milestones on the roadmap and add edges between the query and milestones that can be connected by local planning. We use the same algorithm from the roadmap construction phase, except that the kused is 2–3 times larger in order to increase the proba­bility of finding a path.

(2) Graph search: The search algorithm tries to find a path on the roadmap connecting initial and goal configurations. We use depth-first search (DFS) or breadth-first search (BFS) for the graph search. For the multi-query case, each GPU thread traverses the roadmap for one query using DFS, and the final results are collision-free paths. For the single-query case, we exploit all the GPU threads to find the path for one query using a BFS search: For nodes that are the same number of steps away from the initial node, we add their unvisited neighbors into the queue in parallel. In other words, different GPU cores traverse different parts of the graph. The main challenge of this method is that work is gene-rated dynamically as the BFS traverse progresses, and the computational load on different cores can change significantly. To address the problem of load balancing and work distribution so that parallelism for all cores is maintained, we use the light-weight load balancing strategy in Ref. [63].

《6. Experiments》

6. Experiments

In this section, we investigate the performance of our approaches while solving two challenges related to the confi-guration space.

《6.1. Configuration space construction》

6.1. Configuration space construction

We have used a set of complex benchmarks to evaluate the contact space approximation results of our algorithm.

We first compute the contact space for 2D objects shown as the input of the learning pipeline in Figure 6. In this experiment, object A can only undergo translational motion and object B is fixed. The resulting configuration space has 2 degrees of freedom, and the series of LCS surfaces com­puted by our learning-based approach is shown in Figure 8. We can see that after a few iterations with several hundred samples, the learned LCS can well approximate the exact contact space between these two objects.

《Fig. 8》

     

Fig.8 LCS computation using active learning for the 2D contact space between 2D star and room models shown as the input of learning pipeline in Figure 6. We highlight the number of support vectors corresponding to LCS i. (a) i = 0, | S| = 37; (b) i = 4, | S| = 75; (c) i = 8, | S| = 198; (d) i = 14, | S| = 327; (e) i = 20; | S| = 654.

Next, we compute the contact space for the same pair of 2D objects, but we now allow object A to perform rotation in the 2D plane. The resulting configuration space has 3 degrees of freedom, and the series of LCS surfaces generated by our approach is shown in Figure 9. Similar to the case of the 2D contact space, the learned contact space can quickly converge to a good approximation of the exact contact space.

《Fig. 9》

     

Fig.9 LCS computation using active learning for the 3D contact space between 2D star and room models shown as the input of learning pipeline in Figure 6. We highlight the number of support vectors corresponding to LCSi. The vertical axis represents the rotational component of the C-space. (a) LCS0, | S| = 88; (b) LCS5, | S| = 174; (c) LCS9, | S| = 237; (d) LCS12, | S| = 248.

《Fig. 10》

     

Fig.10 LCS computation using active learning for the 3D contact space between 3D cup and spoon. We provide the number of support vectors corresponding to LCS i. As shown, the algorithm can compute a good approximation in a few iterations. (a) Objects A and B; (b) LCS0, | S| = 231; (c) LCS5, | S| = 869; (d) LCS9, | S| = 1350; (e) LCS12, | S| = 1572.

Finally, we compute the contact space for a pair of 3D objects shown in Figure 10(a), where the object A can only undergo translation and thus the corresponding configuration space is three-dimensional. Similarly, we can observe that the learned contact space sequence quickly converges to the exact contact space.

《6.2. Optimization in configuration spaces》

6.2. Optimization in configuration spaces

We present details of our implementation and evaluate the performance of our algorithm on a set of benchmarks. All the timings reported here were taken on a machine using a Intel Core i7 CPU (~$600) at 3.2 GHz CPU and 6 GB memory. We implemented our algorithms using CUDA on a NVIDIA GTX 285 GPU (~$380) with 1 GB of video memory.

We implement the PRM algorithm on the GPU (G-PRM) for multi-query planning problems, and implement its lazy version (GL-PRM) for single-query problems. We compare these with the PRM and RRT algorithms implemented in the OOPSMP library [64], which is a popular library for motion planning algorithms on CPU. The benchmarks used are shown in Figure 11. Our comparisons are designed as follows: For each benchmark, we find a suitable setting where CPU-PRM (C-PRM) finds a solution, and then we run G-PRM with a comparable number of samples. After that, we run GL-PRM with the same setting as G-PRM, and run CPU-RRT (C-RRT) with the same setting as C-PRM.

Table 1 shows the comparison of timings between algorithms. In general, G-PRM is about 10 times faster than C-PRM, and GL-PRM can provide another 10-fold acceleration for single-query problems. G-PRM is faster than C-PRM even for dynamic scenes. The current C-RRT and C-PRM are both single-core versions. However, even a multi-core version of PRM would only improve the timing by 4-fold at most, because on an 8-core CPU it is hard to scale the hierarchy computations and nearest neighbor computations linearly. Our GPU algorithms can still provide performances 1¯2 orders of magnitude higher than CPU algorithms.

《Tab.1》

Tab.1 The left two columns evaluate the performance of the PRM and RRT algorithms in the OOPSMP. The right two columns evaluate the performance of our GPU-based algorithms.

《Fig. 11》

     

Fig.11 The benchmark scenes used for our algorithms in the following order: Piano (2484 triangles), helicopter (2484 triangles), maze3d1 (40 triangles), maze3d2 (40 triangles), maze3d3 (970 triangles), and alpha puzzle (2016 triangles).

Figure 12 shows the timing breakdown between various steps for G-PRM and GL-PRM. The difference between the performance of the two algorithms is clear: In G-PRM, local planning is the bottleneck and dominates the timing, while in GL-PRM the graph search takes longer because local planning is performed in a lazy or output-sensitive manner. In GL-PRM, three components take most timing: milestone construction, proximity computation, and graph search, because all of them may perform collision queries heavily. If the environment is cluttered and the model has complex geometry, milestone construction will be slow (alpha puzzle in Figure 12). If the environment is an open space and has many milestones, proximity computation will be the bottleneck (maze3d2 in Figure 12). If the lazy strategy cannot guess a correct path, then the graph search will be computationally intensive due to the large number of collision queries (maze3d3 in Figure 12). However, in all these environments, GL-RPM outperforms all other methods.

《Fig. 12》

     

Fig.12 Split-up of timings: the fraction of time spent in different parts of the G-PRM and GL-PRM.

We tested the scalability of G-PRM and GL-PRM on the maze3d3 benchmark, and the result is shown in Figure 13. It is obvious that GL-PRM is generally faster than G-PRM, and both algorithms achieve near-linear scaling on the benchmark. However, observe that as the number of samples increases, GL-PRM slows down faster than G-PRM. This is because when the number of samples increases, proximity computation becomes increasingly expensive and dominates the timing when the number of samples is near 1 million.

《Fig. 13》

     

Fig.13 The scalability of the G-PRM and GL-PRM.

Our method can be extended for efficient planning for articulated bodies, and can achieve real-time performance for the PR2 grasping operation, as shown in Figure 14.

《Fig. 14》

   

Fig.14 Our GPU-based motion planner can compute a collision-free path for PR2 in less than 1.(a) PR2 simulation; (b) real PR2.

《7. Conclusions》

7. Conclusions

In this paper, we have addressed two computational chal­lenges related to configuration spaces, that is, configuration space construction and efficient optimization in configuration space. As research in configuration space continues, we expect progress in all these areas and more. While there is always more to do, the work presented in this paper has addressed many of the important issues in this field.

To summarize the main results presented in this paper, we first presented a novel approach to the approximation of configuration spaces. The main idea is to sample the configuration space and approximate the contact space based on machine learning classifiers, particularly support vector machines. Furthermore, we use active learning techniques to select the samples during precomputation. Next, we in­troduced a whole motion planning algorithm on GPUs. Our algorithm can exploit all the parallelism within the PRM algorithm, including the high-level parallelism provided by the PRM framework and the low-level parallelism within different components of the PRM algorithm, such as collision detection and graph search. This makes our work the first to perform real-time motion planning and global navigation in general environments using GPUs.

《Acknowledgements》

Acknowledgements

This research was partially supported by the Army Research Office, the National Science Foundation, Willow Garage, and the Seed Funding Programme for Basic Research at the University of Hong Kong.

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Jia Pan and Dinesh Manocha declare that they have no conflict of interest or financial conflicts to disclose.