A Co-Point Mapping-Based Approach to Drivable Area Detection for Self-Driving Cars

Ziyi Liu; Siyu Yu; Nanning Zheng

doi:10.1016/j.eng.2018.07.010

Engineering ›› 2018, Vol. 4 ›› Issue (4) :479 -490. DOI: 10.1016/j.eng.2018.07.010

Research

Research Robotics—Article

A Co-Point Mapping-Based Approach to Drivable Area Detection for Self-Driving Cars

Ziyi Liu ^a^,^b
, Siyu Yu ^a^,^b
, Nanning Zheng ^a^,^b

Author information +

History +

PDF (4008KB)

Abstract

The randomness and complexity of urban traffic scenes make it a difficult task for self-driving cars to detect drivable areas. Inspired by human driving behaviors, we propose a novel method of drivable area detection for self-driving cars based on fusing pixel information from a monocular camera with spatial information from a light detection and ranging (LIDAR) scanner. Similar to the bijection of collineation, a new concept called co-point mapping, which is a bijection that maps points from the LIDAR scanner to points on the edge of the image segmentation, is introduced in the proposed method. Our method positions candidate drivable areas through self-learning models based on the initial drivable areas that are obtained by fusing obstacle information with superpixels. In addition, a fusion of four features is applied in order to achieve a more robust performance. In particular, a feature called drivable degree (DD) is proposed to characterize the drivable degree of the LIDAR points. After the initial drivable area is characterized by the features obtained through self-learning, a Bayesian framework is utilized to calculate the final probability map of the drivable area. Our approach introduces no common hypothesis and requires no training steps; yet it yields a state-of-art performance when tested on the ROAD-KITTI benchmark. Experimental results demonstrate that the proposed method is a general and efficient approach for detecting drivable area.

Keywords

Drivable area / Self-driving / Data fusion / Co-point mapping

Cite this article

Download citation ▾

Ziyi Liu, Siyu Yu, Nanning Zheng. A Co-Point Mapping-Based Approach to Drivable Area Detection for Self-Driving Cars. Engineering, 2018, 4(4): 479-490 DOI:10.1016/j.eng.2018.07.010

登录浏览全文

4963

注册一个新账户忘记密码

1. Introduction

Road detection has long been considered to be a decisive component of self-driving cars, and attracts wide research attention. Thus far, remarkable progress has been achieved in road detection [1–3]. However, automatic driving decisions that are based on road detection alone may fail to deal with certain emergencies in which the detected road becomes undrivable due to suddenly turning vehicles or pedestrians. In fact, when driving a car, a human driver understands scenarios by classifying obstacles versus non-obstacles, rather than merely identifying the road. Thus, a human driver can choose to drive on flat areas that are not normally viewed as roads for safety reasons during an emergency. For self-driving cars, instead of the detection of road areas, the detection of such “flat areas” can provide a more comprehensive knowledge for the decision-making process, allowing self-driving cars to act more like human drivers.

Although most existing road-detection methods are already available for well-marked roads through sample training, the problem of detecting a road surface on weakly marked roads and lanes in urban and rural environments remains unsolved, due to the high variability of the scene layout, illumination, and weather conditions. Thus far, no reliable solution exists; therefore, a robust and efficient method is urgently needed.

In image segmentation, the boundaries of objects generally appear in the area of depth discontinuities. Therefore, image segmentation should be fused with depth discontinuities. In projection geometry, a homograph is an isomorphism of projective spaces; it is a bijection that maps lines to lines, in what is known as collineation. Here, we introduce a new concept that is similar to the bijection of a collineation: co-point mapping. Co-point mapping is a bijection that maps points from a laser sensor to the points on the edges of image segmentation. The projective space is constructed as a set of points of normal vectors over a given field. Some co-points are not homographs owing to changes in illumination, unevenness of the road, and shadows in two-dimensional (2D) images. In order to overcome these problems, we simply use the normal vector of the point cloud data instead of using the raw point cloud data, which will be described later. In this context, pixel-depth data fusion can be clearly described, and the homographs of pixel depth are defined as co-point mapping.

We thus propose a self-adaptive method for drivable area detection by leveraging co-point mapping to fuse the pixel information from a monocular camera with the spatial information from a laser sensor, as shown in Fig. 1. By combining the image pixels’ coordinates with the spatial location of each laser point, a Delaunay triangulated graph [4] is built to establish the spatial relationship among the laser points, and the normal vectors of the triangles are used in the obstacle classification task of the laser points. Next, initial drivable areas are located by fusing obstacle classification results with image superpixels through self-learning. Candidate drivable areas in different feature spaces can then be obtained. These features are: the drivable degree (DD) feature, the normal vector (NV) feature, the color feature, and the strength feature. Finally, a Bayesian framework is utilized to fuse the candidate areas in order to obtain the final drivable area. In our evaluations, we tested our method using the ROAD-KITTI benchmark [5]. Our results, when compared with other fusing methods, demonstrated that the proposed method achieves state-of-the-art results without requiring training or assumptions about shape or height; this result validates our method as being robust and having a high generalization ability.

The key contributions of this paper are as follows:

	• We propose an unsupervised detection method based on data fusion without the need for a strong hypothesis, which ensures our method’s generalization ability in different urban traffic scenes.
	• We introduce the new concept of co-point mapping, which describes a novel kind of constraint in the fusion of data from the laser sensor and camera.
	• We design a new feature called DD in order to describe the drivable degree of the laser points.

2. Related work

A robust road area detection method is central to self-driving cars. Many methods have been proposed over the past decades to deal with this problem. These methods can be categorized by the sensors that are used to acquire data, which include monocular cameras, stereo vision, laser sensors, and the fusion of multi-sensors.

Monocular vision-based approaches have been widely used in road detection. Compared with other sensors, a visual sensor is small in size, low in cost, and easy to install. Moreover, rich visual information is available from a visual sensor, which has a wide range of detection. In addition, the concealment capability of a visual sensor is better than those of other sensors. Above all, the principle and structure of a visual sensor are similar to those of human sensory organization. In road detection, 2D information on the visual scene, such as color, corner points, texture, edge, and shape, are utilized. Regarding color cue, it is common to process segmentation in the RGB color space [6], HSI color space [7], or other color spaces. Jau et al. [8] compared RGB and HSI color segmentation under different lighting conditions. Finlayson et al. [9] presented a physics-based illumination invariant space that achieves a shadow-free image representation, which was used in this paper. Moreover, by exploiting the spectral properties of the camera that is used to capture raw color images, Maddern et al. [10] proposed another illumination invariant color space that reduces the effects of illumination variation caused by sunlight.

Another hot research topic is convolutional neural network (CNN)-based methods, which have achieved great success in this field [11,12]. Originally, CNNs were used to solve classification problems [13,14]; however, with the emergence of recent work [15,16], the utilization of CNN-based methods for sematic segmentation has entered a period of upsurge.

However, the conception of a road is different from other vision conceptions because the pixel appearance in vision is not the only criterion to detect a road area. Physical attributes, such as flatness, contribute more to the conception of a road, which indicates that methods that rely only on monocular vision are not reliable enough for self-driving cars. Although CNN-based methods can achieve good performance, they heavily depend on training, may fail to deal with unseen scenarios, and can have overfitting problems. Unlike problems such as scene categorization [17] or similarity learning [18], road area detection is an ill-conditioned problem that requires using 2D information to solve a task in a three-dimensional (3D) real-world scene. Although many 3D cues, such as the horizon line and vanishing point, are used to alleviate this problem [19–21], the detection of these 3D cues is in itself an unsolved problem [22,23]; some geometric assumptions may result in failure or may reduce the generalization ability of the methods, as shown in Fig. 2.

In recent years, the advent of sensors has inspired the development of many road-detection methods based on laser sensors that can offer supplementary depth measurement of real-world 3D scenes. These methods, which use the spatial locations of laser points to analyze a scene and identify flat areas as road, can be classified as follows:

(1) Grid-based methods. Since point clouds contains a large quantity of data, 2D grid-based methods [24–26] are commonly used to reduce the data size, while statistics of the points within a grid are calculated in order to characterize each grid, and may include the average height and the maximum height difference. Although these methods are straightforward, noise-robust, and efficient, the selection of appropriate thresholds is difficult.

(2) Plane-fitting-based methods. The basic assumption in these methods is that the road is flat and smooth, so that it can be fitted by a plane with several parameters [27,28]. Typical plane-estimation methods are well developed, such as random sample consensus (RANSAC) [29]. However, these methods maybe suffer in heavy traffic scenarios because of the lack of true ground laser points.

(3) Methods based on the spatial relationship between neighboring points. These methods [30,31] take advantage of the spatial relationship between neighboring points to extract features (such as the normal vector) or probabilistic models in order to estimate the ground laser points or obstacle laser points (such as those of the curb).

All methods that are based on laser sensors suffer from the sparsity of point cloud data; as a result, it is difficult to reconstruct details from the laser points.

Detecting the road area can be regarded as a two-class labeling problem, and the conditional random field (CRF) framework is popular in this area [32,33]. The CRF framework formulates a labeling problem as a calculation of the maximum posterior probability of the overall labeling result, given the observations from all aspects. This is a general framework in which different observations can be defined by designing different entries of the energy function and potential function. Thus, CRF-based methods are widely used in fusion methods in order to both balance the data from different sources and obtain optimal fusion results [34,35]. However, the computation and memory consumption of CRF-based methods are large, and manually labeled data are required.

To overcome the drawbacks mentioned above, this paper proposes a co-point mapping-based self-adaptive method of drivable area detection by fusing data from a laser sensor with data from a monocular camera. First, our method conducts several preprocessing steps, which include: conducting superpixel segmentation to obtain the minimum image-processing units for subsequent steps; projecting the laser points onto an RGB image via cross-calibration and co-point mapping; and utilizing the Delaunay triangulation properties to preprocess the laser points in order to establish a spatial relationship. Next, pixel-depth data fusion is processed by leveraging the preprocessing results, and obstacle classification can be performed by utilizing the data fusion results. By combining superpixels with the obstacle classification results, an initial drivable area is located. This step is followed by feature-extraction processing, by which we obtain the DD feature, NV feature, color feature, and strength feature. All these features can be easily transformed into probabilities in a self-learning manner. Finally, a joint probability can be calculated superpixel by superpixel, by leveraging the Bayesian framework to obtain a joint probability map of the drivable areas.

Our proposed method distinguishes itself from other detection techniques in three main aspects:

	• Our method requires no strong hypothesis, training processing, or labeled data.
	• By leveraging the co-point perspective and fusing data from both a laser sensor and a monocular camera, our method is robust to variation in illumination and can cope with complex scenarios.
	• Our method adopts superpixel segmentation, and superpixels are then taken as the minimum processing elements. The advantages of replacing pixels with superpixels will be detailed later.

Unlike methods that lack feature-level fusion [36,37], our proposed method combines monocular vision with laser sensors to obtain abundant information on both the data level and the feature level. Our method extracts and fuses features in a self-learning way. In addition, co-point perspective and superpixel representation are utilized to make the method more robust and efficient. As demonstrated by the experimental results, our method achieves higher accuracy than other methods. Thus, we consider our method to be a general, practical, and self-adaptive approach to the detection of drivable areas for self-driving cars.

3. Preliminary knowledge

In this section, we provide the preliminary knowledge that is required by our method, including superpixel representation of images, the projection of laser points onto an RGB image, and the establishment of log-chromaticity space.

3.1. Superpixel representation

The idea of superpixels was originally developed by Ren and Malik [38]; a superpixel is a group of pixels that is coherent in color or texture, such that a superpixel representation preserves most of the structural information of the original image.

In this paper, superpixels, rather than pixels, are taken as the minimum processing units in image-processing steps, and assist in shaping candidate areas. Because of the performance improvement of superpixel methods, replacing pixels with superpixels reduces the computation and memory costs without sacrificing much accuracy. In addition, the usage of superpixels takes color information into account and achieves robust results when dealing with situations with complex illumination.

In order to segment the original images better, the superpixel method should meet two requirements: First, the speed of superpixel generation should be fast; and second, the generated superpixels should “stick” to the edges.

As proposed in Refs. [39,40], we utilized sticky-edge adhesive superpixels, which meet our requirements. This method is an improved version of the simple line interface calculation (SLIC) method [41], by the addition of an edge term. With this added edge term, the generated superpixels better adhere to the edges, thus preserving more image structure and resulting in better object boundaries.

3.2. Laser points project processing

As shown in Fig. 3, laser sensor coordinates are intended to be projected into the camera coordinates, as presented in Ref. [42]. The projection of a 3D point

p_laser = (x l, y l, z l, 1) T

in the laser sensor coordinate to a point

p_camera = (x c, y c, z c, 1) T

in the camera coordinate is given as:

(1)

p_camera = R rect 0 T velo cam p_laser

where

R rect 0

is the rotation matrix. Here,

R rect 0

is expanded into a 4 × 4 matrix by appending a fourth zero row and column, and by setting

R rect 0 (4, 4) = 1

T velo cam

is the transformation matrix, and is obtained by:

(2)

T velo cam = R velo cam t velo cam 01

where

R velo cam

and

t velo cam

are the rotation matrix and the translation vector, respectively, as given in Ref. [42]:

(3)

(u, v, 1) T = P rect (i) (x c, y c, z c, 1) T

where

P rect (i)

is the projection matrix from the

i th

image plane (we use a second camera in our work) for all points

p c

, from which the location information of the camera coordinates,

(u i, v i)

, is obtained.

After the projection and rectification, we obtained a set of laser points,

P = {P i} i = 1 N

, where

P i = (x i, y i, z i, u i, v i)

3.3. Log-chromaticity color space

As presented in Ref. [36], in order to obtain color features that are independent of shadows and lighting conditions, we transformed RGB color images (

I

) into log-chromaticity spaces (

I log

), in order to generate an illuminant-invariant image

I l - c

. Each

{log (R / G), log (B / G)}

pixel value in the log-chromaticity space corresponds to a

{R, G, B}

pixel value in the original RGB image. As shown in Fig. 4, we obtain a grayscale image

I l - c

by projecting the

{log (R / G), log (B / G)}

pixel values along an orthogonal axis defined with the angle

θ

. The angle

θ

is defined as the invariant direction orthogonal to the lighting change lines; it is device dependent and can be calibrated. We set

θ

45 ∘

, as suggested in Ref. [43].

I log

can be calculated as follows:

(4)

I log (u, v) = log [I R (u, v) / I G (u, v)] + log [I B (u, v) / I G (u, v)] · tan θ 1 + tan 2 θ

4. Pixel-depth data fusion

Rather than using sensors that can directly detect obstacles [44], we try to find obstacles by leveraging the fusion of the spatial location information provided by a laser sensor with the appearance information provided by a monocular camera. This section presents a series of processes that are designed for pixel-depth data fusion. The first process fuses superpixels with laser points for efficiency and robustness. The second process, co-point mapping, involves fusing points from the laser sensor with points on the edge of the image segmentation, in order to eliminate laser points within the flat area. The third and final process involves the fusion between the spatial information and the image coordinates; this step models the spatial relationship of the laser points and the obstacles by generating an undirected graph.

4.1. Image processing with superpixel representation

Our method adopts the superpixel representation described in Section 3.1. In addition to the advantages mentioned above, it was found that superpixels and laser points are complementary. First, laser points contain spatial location information that cannot be obtained from a monocular camera; second, superpixels are dense, so that robust local statistics of both pixels and laser points can be calculated. Superpixels also contain color information that cannot be captured by a laser sensor. Therefore, instead of pixels, superpixels were taken as the minimum units in the image-processing steps. In Section 5, superpixel segmentation will be used to learn features of the initial drivable area.

4.2. Bijection of edge information via co-point mapping

Since the laser points and image pixels are both provided through observation of the same scene at the same time, they are reflections of the same structure. Therefore, the projection between them should satisfy certain constraints. Inspired by the conception of collineation in projection geometry, we introduce a new concept, co-point mapping, to describe this kind of constraint. Similar to the bijection of collineation, co-point mapping represents a bijection that maps points from a laser sensor to points on the edge of the image segmentation.

Using co-point mapping, we can improve the alignment performance and eliminate undesired laser points. More specifically, edges can be regarded as the basic element that affects both the appearance of pixels and the structure of laser points. Bijection of edge information between pixels and laser points is the key in data fusion. When aligning pixels and laser points, edges must be aligned correctly. Using cross-calibration, an initial alignment can be obtained. Next, we can improve that alignment through co-point mapping. Similarly, bijection of edge information can eliminate redundant laser points that are located in a flat area. It can accelerate the whole process and improve the robustness of our algorithm, since there may be noisy laser points within the flat area.

In this paper, we only use co-point mapping to eliminate laser points within the flat area, as shown in Fig. 5. First, we use the edges of all superpixels to form an edge pool of the whole image. Obviously, there are many redundant edges. However, this is a reasonable practice in order to retain all the true edges in the pool. In practice, image dilation is used to overcome alignment errors. Next, laser points located in the edge pool are kept for further processing, while others are discarded. The experimental results showed that 27% of the laser points were eliminated without sacrificing too much accuracy; this finding demonstrates that co-point mapping is feasible and efficient.

4.3. Obstacle classification

The obstacle classification step can be formulated as finding the mapping function,

ob (P i)

, where:

(5)

ob (P i) = 1, if P i is an obstacle point 0, otherwise

The classification result is shown in Fig. 6.

We assume that the value of

ob (P i)

only depends on the flatness of the surface around

P i

, which in the real world is reflected by the laser points. Thus, the obstacle classification problem is decomposed into two sub-problems: how to find the surface around

P i

and how to determine whether it is flat or not.

The first sub-problem is solved by leveraging a Delaunay triangulation [4]. A Delaunay triangulation for a set of points in a plane is a triangulation for which no point in the set is inside the circumcircle of any triangle in the triangulation. Its properties are such that each point has six surrounding triangles, on average. The nearest neighbor graph is a subgraph of the Delaunay triangulation, so it can be used to build a spatial relationship among the laser points. For each

P i = (x i, y i, z i, u i, v i)

, we use its image coordinate frame

(u i, v i)

in a planar Delaunay triangulation to build an undirected graph

G = {P, E}

, where

E

is the set of edges representing the spatial relationships among

P i

. Then, the surface around

P i

is composed of surfaces (triangles) determined by

{(u j, v j) j = i or P j ∈ Nb (P i)}

, where

Nb (P i)

is the set of points connected to

P i

. The edge

(P i, P j)

is eliminated if it does not satisfy the following:

(6)

‖ (P i - P j) ‖ < ε

where

‖ (P i - P j) ‖

is the Euclidean distance of

(x i, y i, z i)

and

(x j, y j, z j)

, and

ε

is the maximum length of any edge.

For the second sub-problem, the flatness of the surface around

P i

can be measured by calculating the normal vector of its neighbor triangles. Next, the normal vector

N (P i) = (x i n, y i n, z i n)

is calculated by averaging the normal vectors of these triangles. Then

ob (P i)

is obtained by the following:

(7)

ob (P i) = 1, if arcsin (z i n / ‖ Nb (P i) ‖) > c 0, otherwise

where

c

is a manually set parameter,

Nb (P i)

is the set of points connected to

P i

, and the normal vector of

P i

N (P i) = (x i n, y i n, z i n)

. Eq. (7) can be explained as follows: If the angle between

Nb (P i)

and the horizontal plane is larger than

c

P i

will be classified as an obstacle point; otherwise, it will not be classified so. Thus,

c

denotes the maximum angle threshold, which is set to 60° in the experiment. The whole process is visualized in Fig. 7.

5. Detecting drivable areas

This section explains the key step of our proposed method. First, a direction ray map (

I DRM

) is obtained from the obstacle classification result. Next, an initial drivable area is obtained by combining

I DRM

with superpixels. Afterward, the initial drivable area is characterized by features from different observations. The probabilities of each superpixel within the initial drivable area being foreground are then computed in a self-learning way. Finally, a Bayesian framework is utilized to derive the probability map of drivable areas.

5.1. Generating an initial drivable area

5.1.1. Detecting ray

The direction ray map

I DRM

is generated as shown in Algorithm 1.

First, every

P i

is transformed to polar coordinates in order to better represent the drivable area. That is,

(u i, v i)

is transformed into the polar coordinate whose origin point is the middle bottom pixel of the image (noted as

P base

). Thus,

P

is represented as

{P (h)} h = 1 H

, where

P (h) = {P i (h)} i = 1 N (h)

and

P i (h)

represents a point belongs to the

h th

angle range. Because of the sparsity of

P

, it is necessary to address two problems: first, how to get over the “leakage” problem, as shown in Fig. 8; and second, how to obtain dense pixel areas from sparse rays.

For the former problem, the solution is provided by filtering the rays’ length calculated in the image coordinate frame, as shown in Fig. 8(d). Since the width of a car is not negligible, the question of whether a region that is represented by one ray is drivable or not depends on how wide that region is. In other words, if a region is too narrow for a car to pass, no matter how flat that region may be, that region cannot be regarded as drivable. Thus, minimum filtering is adopted and the “leakage” problem is solved, as shown in Fig. 8(c).

For the latter problem, the most straightforward solution is to increase the number of

H

; however, doing so will aggravate the former problem. Therefore, we combine

I DRM

with superpixels to obtain the initial drivable area. As mentioned in Section 4.1, this solution has two advantages: First, it greatly reduces the amount of data by replacing pixels with superpixels; and second, it fuses depth and color information.

After the combination of

I DRM

with superpixels, the initial drivable area is represented by a set of superpixels defined as

S int = {S i S i ∪ I DRM ≠ ∅}

, and

P S i

represents the set of laser points located in

S i

. Therefore, all of the following features can be computed based on superpixels instead of pixels, thereby introducing robust local statistics and accelerating the entire algorithm.

5.1.2. Generating the DD feature

To detect drivable areas, features that can describe the drivable degree must be defined and well designed. In this paper, the DD feature is proposed, as shown in Algorithm 2.

Points in

P (h)

are arranged by their distances to

P base

in the image coordinate frame, which signifies that all points in

P (h)

satisfy the following:

(8)

dist (P i (h), P base) > dist (P i - 1 (h), P base), i = 2 → N (h)

Fig. 9 provides a schematic diagram of the DD value calculation. Fig. 10(c) visualizes what occurs when the DD feature is taken advantage of [45].

5.2. Obtaining self-adaptive feature models

Based on the obtained initial drivable area, candidate drivable areas are self-adaptively learned from four features in different probability spaces: the DD, NV, color, and strength features.

5.2.1. The DD feature

The DD feature of each superpixel (

D (S i)

) is calculated as follows:

(9)

D (S i) = 1 ‖ P S i ‖ ∑ P S i D (P i)

Since

D (P i)

is related to the difference in height, it is considered that the smaller

D (P i)

is, the more drivable the corresponding area that

P i

represents will be. The self-adaptive probability space of the DD feature used to locate the candidate drivable area is built through a Gaussian-like model as follows:

(10)

D prob (S i) = exp {- [D (S i) - μ D] 2 / 2 σ D 2}, if D (S i) ≥ μ D 0, otherwise

where

μ D

and

σ D 2

are the parameters of the Gaussian-like distribution that can be obtained by utilizing

S int

without the need for manual setting or training.

D prob (S i)

represents the probability of

S i

belonging to the drivable area, given the DD feature observation.

5.2.2. The NV feature

According to Section 4.3, it is believed that the larger the value the normal vector has in

z i n

, the more drivable the surface is. Therefore, the NV feature of each superpixel,

N (S i)

, is calculated using the minimum value of

z i n

Among

P S i

. Similar to

D prob (S i)

, a self-adaptive probability feature space based on a Gaussian-like model with parameters

μ n

and

σ n 2

is generated as follows:

(11)

N prob (S i) = exp {- [N (S i) - μ n] 2 / 2 σ n 2}, if N (S i) ≥ μ n 1, otherwise

where

N prob (S i)

is the probability of

S i

belonging to the drivable area, given the NV feature observation. The parameters

μ n

and

σ n 2

are estimated using the same steps as those used to estimate

μ D

and

σ D 2

, as mentioned above. Therefore, this model is self-adaptive and no manual setting is involved.

5.2.3. The color feature

As mentioned in Section 3.3, the illuminant-invariant image is utilized to obtain the color feature of

S int

. Similar to

D (S i)

and

N (S i)

, a parametric probability model is built with the Gaussian parameters

μ c

and

σ c 2

as follows:

(12)

C prob (S i) = exp - I log (S i) - μ c 2 2 σ c 2

where

C prob (S i)

is the probability of

S i

belonging to the drivable area, given the color observation, and

I log (S i)

represents the transformed pixel value of

S i

5.2.4. The strength feature

The strength feature (

Sg (S i)

) of

S i

provides a measure of the smoothness of each superpixel, which is actually the overlap of

I DRM

with each superpixel. The probability of

S i

being the drivable area is modeled as follows:

(13)

Sg prob (S i) = Sg (S i) dist (S i, P base) A (S i)

where

A (S i)

is the area of

S i

and

dist (S i, P base)

is the distance between

S i

and

P base

in the image coordinate.

5.3. The Bayesian framework

The fusion of features is conducted in the Bayesian framework. The objective is to find the posterior probability that a superpixel belongs to the drivable area, given the observations from the camera and laser sensor,

p (S i = R Obs)

, where

Obs

represents all the observations detailed above:

(14)

obs = {D (S i), C (S i), N (S i), Sg (S i) S i ∈ S int}

Next, the probability maps obtained from the above features are interpreted as the prior conditional probability that a superpixel belongs to the drivable area. Assuming that the superpixels are conditionally independent, the posterior probability of each superpixel is obtained as follows:

(15)

p (S i = R Obs) ∝ ∏ j = 1 4 p S i = R Obs j p (S i = R)

where

p (S i = R)

is the probability of a superpixel belonging to the drivable area, and is obtained by averaging all

S int

across the image set.

6. Experimental results and discussion

To test our proposed method, experiments were carried out using the ROAD-KITTI benchmark, which includes 289 training images and 290 testing images [5]. All experimental results were evaluated in bird’s eye view (BEV) with the following metrics: max F-measure (MaxF), average precision (AP), precision (PRE) and recall (REC), false positive rate (FPR), and false negative rate (FNR). Three datasets were used: Urban Marked (UM), Urban Multiple Marked (UMM), and Urban Unmarked (UU). To demonstrate the efficiency of our proposed approach (see “Our method with co-point” in the tables below), we compared it with the top three laser-sensing methods from the ROAD-KITTI benchmark’s website: HybridCRF, MixedCRF, and LidarHisto. We also listed the result from Ref. [36] (RES3D-Velo) below, since the data fusion procedures our method adopts are similar to those used in that reference. To show the impact that results from eliminating laser points by means of co-point mapping, the performances of our method with and without co-point mapping were compared. Moreover, we listed the performance of our method with co-point mapping on a training set, in order to demonstrate that our method is unsupervised.

As shown in Tables 1–4, our method yields the best performances in PRE and FPR in the UMM and UU datasets, which demonstrates that our method covers the road area well. Table 4 averages the performance; our method yields the best AP, PRE, and FPR, which reveals the robustness of our method in different scenarios. Compared with our method without co-point mapping, our method with co-point mapping eliminates around 30% of the laser data (as shown in Table 5), yet yields similar performances; this finding indicates that co-point mapping can successfully preserve the sematic structure of an image. Above all, although our method is unsupervised, it yields a competitive performance when compared with supervised methods.

In addition, in order to verify how much the feature fusion improves the performance, we compared the feature fusion results with each individual feature’s probability map, as well as with

S int

and the baseline, in Tables 6–9. All these experimental results were obtained using the training set. As Tables 6–9 show, significant improvement was achieved in MaxF and AP by means of feature fusion.

When compared with the method that used all the laser points (see “Fusion without co-point” in the tables below), the method with co-point mapping cut off around 30% of the laser points, on average, in the training set (as shown in Table 5), but still yielded a similar performance.

S int

shows exceptional performance in REC and FNR with a similar FPR as “Baseline” in the tables below, so it is a feasible choice to use

S int

to estimate the parameters, as detailed in Section 5.

As shown in Fig. 11, the image areas bounded by red lines are not road areas in the ground truth in the ROAD-KITTI dataset; however, our method tends to identify these areas as foreground, since detecting drivable areas is our main concern. Thus, the FPR value of our results is higher in the situation shown in Fig. 11. In fact, these bounded areas have ambiguous semantics, and include the transition zone between a sidewalk and road, the entrance of parking lots, and slopes for vehicles to drive onto the sidewalk. Many of these areas are designed for vehicles to drive on when necessary for convenience. In real-life application, self-driving cars are expected to choose these kinds of flat areas as candidate roads in an emergency (such as when avoiding a suddenly turning vehicle); therefore, these areas should be considered by planning algorithms.

7. Conclusion and future work

This paper proposes a self-adaptive method for drivable area detection by fusing pixel information with spatial information from laser points based on co-point mapping. Four features (the DD, NV, color, and strength features) are fused in a Bayesian framework. This method, which is based on data fusion, overcomes the disadvantages of using a single sensor when dealing with highly random and complex urban traffic scenes. Our method requires no strong hypothesis, training process, or labeled data. In addition, experiments conducted using the ROAD-KITTI benchmark testify to the efficiency and robustness of our method. Regarding future work, the first task is to divide the road into drivable areas and drivable area candidates for emergency use. Next, a dataset that can better deal with the problem of ambiguous semantics is needed. Finally, a field-programmable gate array (FPGA) implementation of our method is required in order to realize the real-time application of this method for self-driving cars.

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Bar Hillel A., Lerner R., Levi D., Raz G.. Recent progress in road and lane detection: a survey. Mach Vis Appl. 2014; 25(3): 727-745.

[2]	Zhang G, Zheng N, Cui D, Yang G. An efficient road detection method in noisy urban environment. In: 2009 IEEE Intelligent Vehicles Symposium Proceedings; 2009. P. 556–561.

[3]	Cong Y., Peng J.J., Sun J., Zhu L.L., Tang Y.D.. V-disparity based UGV obstacle detection in rough outdoor terrain. Acta Autom Sin. 2010; 36(5): 667-673.

[4]	Lee D.T., Schachter B.J.. Two algorithms for constructing a delaunay triangulation. Int J Comput Inf Sci. 1980; 9(3): 219-242.

[5]	Fritsch J., Kuehnl T., Geiger A.. A new performance measure and evaluation benchmark for road detection algorithms. In: Proceedings of International Conference on Intelligent Transportation Systems (ITSC); 2013 Oct 6–9; The Hague, the Netherlands. New York: IEEE; 2013.

[6]	Tan C., Hong T., Chang T., Shneier M.. Color model-based real-time learning for road following. In: Proceedings of Intelligent Transportation Systems Conference. New York: IEEE; 2006. p. 939-944.

[7]	Rotaru C., Graf T., Zhang J.. Color image segmentation in HSI space for automotive applications. J Real-Time Image Process. 2008; 3(4): 311-322.

[8]	Jau U.L., Teh C.S., Ng G.W.. A comparison of RGB and HSI colour segmentation in real-time video images: a preliminary study on road sign detection. In: Proceedings of the 2008 International Symposium on Information Technology; 2008 Aug 26–28; Kuala Lumpur, Malaysia. New York: IEEE; 2008.

[9]	Finlayson G.D., Hordley S.D., Lu C., Drew M.S.. On the removal of shadows from images. IEEE Trans Pattern Anal Mach Intell. 2006; 28(1): 59-68.

[10]	Maddern W., Stewart A., McManus C., Upcroft B., Churchill W., Newman P.. Illumination invariant imaging: applications in robust vision-based localisation, mapping and classification for autonomous vehicles.

[11]	Alvarez J.M., Gevers T., LeCun Y., Lopez A.M.. Road scene segmentation from a single image. In: Proceedings of the 12th European Conference on Computer Vision: Volume Part VII; 2012 Oct 7–13; Florence, Italy. Heidelberg: Springer-Verlag Berlin; 2012. p. 376-389.

[12]	Teichmann M, Weber M, Zoellner M, Cipolla B, Urtasun R. MultiNet: real-time joint semantic reasoning for autonomous driving. 2016. arXiv:1612.07695.

[13]	Krizhevsky A., Sutskever I., Hinton G.E.. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2017; 64(6): 64-90.

[14]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv:1409.1556.

[15]	Badrinarayanan V, Handa A, Cipolla R. Segnet: a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling. 2015. arXiv:1505.07293.

[16]	Long J., Shelhamer E., Darrell T.. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015 Jun 7–12; Boston, MA, USA. New York: IEEE; 2015. p. 3431-3440.

[17]	Cong Y., Liu J., Yuan J., Luo J.. Self-supervised online metric learning with low rank constraint for scene categorization. IEEE Trans Image Process. 2013; 22(8): 3179-3191.

[18]	Cong Y., Liu J., Fan B., Zeng P., Yu H., Luo J.. Online similarity learning for big data with overfitting. IEEE Trans Big Data. 2017; 4(1): 78-89.

[19]	Alvarez JM. Gevers T. Lopez AM. 3D scene priors for road detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2010; 2010 Jun 13–18; San Francisco, CA, USA. New York; 2010. P. 57–64.

[20]	Nan Z., Wei P., Xu L., Zheng N.. Efficient lane boundary detection with spatial-temporal knowledge filtering. Sensors. 2016; 16(8): 1276.

[21]	Hoiem D., Efros A.A., Hebert M.. Recovering surface layout from an image. Int J Comput Vis. 2007; 75(1): 151-172.

[22]	Kong H., Audibert J., Ponce J.. Vanishing point detection for road detection.

[23]	Sivic J., Kaneva B., Torralba A., Avidan S., Freeman W.T.. Creating and exploring a large photorealistic virtual space. In: Proceedings of 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; 2008 Jun 23–28; Anchorage, AK, USA. New York: IEEE; 2008.

[24]	Montemerlo M., Becker J., Bhat S., Dahlkamp H., Dolgov D., Ettinger S., . Junior: the Stanford entry in the urban challenge. J Field Robot. 2008; 25(9): 569-597.

[25]	Thrun S., Montemerlo M., Dahlkamp H., Stavens D., Aron A., Diebel J., . Stanley: the robot that won the DARPA grand challenge. J Field Robot. 2006; 23(9): 661-692.

[26]	Urmson C., Anhalt J., Bagnell D., Baker C., Bittner R., Clark M.N., . Autonomous driving in urban environments: boss and the urban challenge. J Field Robot. 2008; 25(8): 425-466.

[27]	Neidhart H., Sester M.. Extraction of building ground plans from LiDAR data. Int Arch Photogramm Remote Sens Spat Inf Sci. 2008; 37(Pt 2): 405-410.

[28]	Hu X., Rodriguez F.S.A., Gepperth A.. A multi-modal system for road detection and segmentation. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings; 2014 Jun 8–11; Dearborn, MI, USA. New York: IEEE; 2014. p. 1365-1370.

[29]	Fischler M.A., Bolles R.C.. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM. 1981; 24(6): 381-395.

[30]	Wellington C., Courville A., Stentz A.T.. A generative model of terrain for autonomous navigation in vegetation. Int J Robot Res. 2006; 25(12): 1287-1304.

[31]	Klasing K., Wollherr D., Buss M.. Realtime segmentation of range data using continuous nearest neighbors. In: Proceedings of 2009 IEEE International Conference on Robotics and Automation; 2008 May 13–19; Kobe, Japan. New York: IEEE; 2009. p. 2431-2436.

[32]	Diebel J., Thrun S.. An application of Markov random fields to range sensing. Adv Neural Inf Process Syst. 2005; 18: 291-298.

[33]

Shotton J., Winn J., Rother C., Criminisi A.. Textonboost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Proceedings of the 9th European Conference on Computer Vision: Volume Part I; 2006 May 7–13; Graz, Austria. Berlin: Springer; 2006. p. 1-15.

[34]	Xiao L., Dai B., Liu D., Tingbo H., Tao W.. CRF based road detection with multi-sensor fusion. In: Proceedings of 2015 IEEE Intelligent Vehicles Symposium (IV); 2015 Jun 28–Jul 1; Seoul, Korea. New York: IEEE; 2015. p. 192-198.

[35]	Huang W., Gong X., Yang M.Y.. Joint object segmentation and depth upsampling. IEEE Signal Process Lett. 2015; 22(2): 192-196.

[36]	Alvarez A.J.M., Lopez A.M.. Road detection based on illuminant invariance. IEEE Trans Intell Transp Syst. 2011; 12(1): 184-193.

[37]	Shinzato P.Y., Wolf D.F., Stiller C.. Road terrain detection: avoiding common obstacle detection assumptions using sensor fusion. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings; 2014 Jun 8–11; Dearvorn, MI, USA. New York: IEEE; 2014. p. 687-692.

[38]	Ren X., Malik J.. Learning a classification model for segmentation. In: Proceedings of the 9th IEEE International Conference on Computer Vision; 2003 Oct 13–16; Nice, France. New York: IEEE; 2003. p. 10-17.

[39]	Dollár P., Zitnick C.L.. Structured forests for fast edge detection. In: Proceedings of 2013 IEEE International Conference on Computer Vision; 2013 Dec 1–8; Sydney, NSW, Australia. New York: IEEE; 2013. p. 1841-1848.

[40]	Zitnick C.L., Dollár P.. Edge boxes: locating object proposals from edges. In: editor. Computer Vision—ECCV 2014; 2014 Sep 6–12; Zurich, Switzerland. Cham: Springer; 2014. p. 391-405.

[41]	Achanta R., Shaji A., Smith K., Lucchi A., Fua P., Süsstrunk S.. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans Pattern Anal Mach Intell. 2012; 34(11): 2274-2282.

[42]	Geiger A., Lenz P., Stiller C., Urtasun R.. Vision meets robotics: the KITTI dataset. Int J Robot Res. 2013; 32(11): 1231-1237.

[43]	Hu X., Rodriguez F.S.A, Gepperth A.. A multi-modal system for road detection and segmentation. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings. New York: IEEE; 2014. p. 1365-1370.

[44]	Wang T., Zheng N., Xin J., Ma Z.. Integrating millimeter wave radar with a monocular vision sensor for on-road obstacle detection applications. Sensors. 2011; 11(9): 8992-9008.

[45]	Liu J., Gong X.. Guided depth enhancement via anisotropic diffusion. In: Proceedings of the 14th Pacific-Rim Conference on Multimedia; 2013 Dec 13–16; Nanjing, China. Berlin: Springer; 2013. p. 408-417.

Funding

()

PDF (4008KB)

2539

Accesses

Citation

Detail

Sections

Recommended

Browse

Authors & reviewers

About the journal

Abstract

Keywords

Cite this article

1. Introduction

2. Related work

3. Preliminary knowledge

3.1. Superpixel representation

3.2. Laser points project processing

3.3. Log-chromaticity color space

4. Pixel-depth data fusion

4.1. Image processing with superpixel representation

4.2. Bijection of edge information via co-point mapping

4.3. Obstacle classification

5. Detecting drivable areas

5.1. Generating an initial drivable area

5.1.1. Detecting ray

5.1.2. Generating the DD feature

5.2. Obtaining self-adaptive feature models

5.2.1. The DD feature

5.2.2. The NV feature

5.2.3. The color feature

5.2.4. The strength feature

5.3. The Bayesian framework

6. Experimental results and discussion

7. Conclusion and future work

References

Funding