The forward design of trajectory planning strategies requires preset trajectory optimization functions, resulting in poor adaptability of the strategy and an inability to accurately generate obstacle avoidance trajectories that conform to real driver behavior habits. In addition, owing to the strong time-varying dynamic characteristics of obstacle avoidance scenarios, it is necessary to design numerous trajectory optimization functions and adjust the corresponding parameters. Therefore, an anthropomorphic obstacle-avoidance trajectory planning strategy for adaptive driving scenarios is proposed. First, numerous expert-demonstrated trajectories are extracted from the HighD natural driving dataset. Subsequently, a trajectory expectation feature-matching algorithm is proposed that uses maximum entropy inverse reinforcement learning theory to learn the extracted expert-demonstrated trajectories and achieve automatic acquisition of the optimization function of the expert-demonstrated trajectory. Furthermore, a mapping model is constructed by combining the key driving scenario information that affects vehicle obstacle avoidance with the weight of the optimization function, and an anthropomorphic obstacle avoidance trajectory planning strategy for adaptive driving scenarios is proposed. Finally, the proposed strategy is verified based on real driving scenarios. The results show that the strategy can adjust the weight distribution of the trajectory optimization function in real time according to the “emergency degree” of obstacle avoidance and the state of the vehicle. Moreover, this strategy can generate anthropomorphic trajectories that are similar to expert-demonstrated trajectories, effectively improving the adaptability and acceptability of trajectories in driving scenarios.
Jian Wu, Yang Yan, Yulong Liu, Yahui Liu.
Research on Anthropomorphic Obstacle Avoidance Trajectory Planning for Adaptive Driving Scenarios Based on Inverse Reinforcement Learning Theory.
Engineering, 2024, 33(2): 147-160 DOI:10.1016/j.eng.2023.07.018
With the development of information and communications technology (ICT) and artificial intelligence (AI) technologies, intelligence has become an inevitable trend in automobile development [1]. Planning a collision-free trajectory from an initial state to a target state while driving is an important aspect of vehicle intelligence [2].
Trajectory planning, as a link between the decision-making and control execution layers of intelligent driving technology, has been less studied because of the lack of abundant and accurate vehicle trajectories. Existing trajectory planning techniques can be roughly divided into four types: graph search [3], [4], [5], sampling [6], [7], [8], curve interpolation [9], [10], [11], and numerical optimization [12], [13], [14]. Based on the linear time-varying model predictive control method, the vehicle lateral obstacle avoidance trajectory planning problem was defined as a constrained optimal control problem [15]. A real-time solution to the trajectory planning problem was realized based on the spatial–temporal grid method [16]. Based on the potential energy function and vehicle-reachable set, a robust model predictive control method was adopted to generate feasible vehicle trajectories [17]. Based on vehicle–vehicle communication technology, the trajectory planning problem was transformed into a constrained optimization problem to obtain an obstacle avoidance trajectory that satisfies driving safety and ride comfort [18]. A time-independent polynomial equation was used to construct a dynamic obstacle-avoidance trajectory planning model [19]. A constrained iterative linear quadratic method was proposed to solve the trajectory planning problem by transforming it into a nonlinear programming problem [20]. The aforementioned strategies satisfy the vehicle dynamics constraints and effectively address the safety and efficiency of obstacle avoidance trajectories.
However, these strategies mostly require the trajectory optimization functions to be designed in advance and cannot be extended to complex driving scenarios. The design of the trajectory optimization function plays a key role in trajectory planning [21]. Considering the complexity of driving scenarios, the dynamics of traffic flow, the randomness of driving behavior, and the game between traffic participants, it is a significant challenge for a vehicle decision system to realize autonomous obstacle avoidance trajectory planning while meeting safety and comfort requirements. Establishing a trajectory optimization function requires experienced engineers not only to carefully design each component but also to formulate the tradeoff strategy among the components. Highly complex and dynamic driving scenarios often require a large number of trajectory optimization functions, which makes their design more tedious and difficult, and the resulting functions may fail to yield the optimal solution [22].
To avoid the numerous complications associated with manual design, scholars have proposed various methods for directly or indirectly recovering planning strategies from expert-demonstrated obstacle avoidance trajectories (EDOATs). By combining imitation learning and optimization, a long time-domain model predictive control method was adopted to obtain expert obstacle avoidance trajectories [23]. A backpropagation neural network model was used to predict obstacle avoidance time to design a constant-speed offset obstacle avoidance trajectory [24]. A hyperbolic-tangent trajectory model was proposed to reconstruct the actual trajectory using a large number of real driver obstacle avoidance scenarios [25]. A trajectory optimization function was designed by constructing the relationship between the obstacle avoidance decisions of real drivers and the trajectory selection probability [26]. Although the aforementioned studies generated effective obstacle avoidance trajectories, obstacle avoidance on structured roads involving surrounding dynamic traffic entities presents a complex challenge. This necessitates a high degree of scenario adaptability for the optimization function of the obstacle avoidance trajectory. The process of establishing a mapping relationship between driving scenarios and obstacle avoidance trajectory optimization functions has not been widely explored.
In addition to safety considerations, acceptability is a crucial factor in obstacle avoidance trajectory planning [27], [28]. An anthropomorphic trajectory that conforms to the driving habits of drivers can enable vehicles to avoid obstacles more smoothly. This can reduce the tension of drivers and passengers, and improve ride comfort and acceptability. The Gaussian mixture model has been used as a statistical method to model drivers and can effectively describe the differences in drivers’ behavioral habits [29]. Remote-learning control modes and real vehicle test data have been used to imitate human driving behavior and develop different styles of driver modeling [30]. To simulate the behavior habits of real drivers, a recurrent neural network-based long short-term memory network architecture was developed, and a trajectory planning strategy based on vision and imitation learning was proposed [31]. By integrating personalized driving habits into obstacle avoidance trajectory planning, a safe driving trajectory that adapts to different driver styles was planned [32]. The state grid method was combined with model predictive control to achieve automatic driving with personalized trajectory characteristics of the driver [33]. A convolutional neural network was used to extract trajectory features from road images collected by cameras, and trajectory planning of intelligent vehicles was realized through classification [34]. A Gaussian mixture model based on neural network parameterization was proposed to predict the trajectories of vehicles at the entrance of an expressway [35]. A lane-change prediction algorithm adapted to the driver’s personalized style was proposed, in which a trained support-vector-machine decision model was integrated into the model predictive control framework to predict the lane-change behaviors of different drivers [36].
These strategies have ensured the acceptability of obstacle avoidance trajectories and have made important contributions to the development of human-like driving.
Therefore, considering both the acceptability and adaptability of obstacle avoidance trajectory planning strategies, and avoiding the complicated process of manual parameter tuning, this study proposes anthropomorphic obstacle avoidance trajectory (AOAT) planning for adaptive driving scenarios using the HighD natural driving dataset [37]. The main contributions of this study are summarized as follows.
(1) To avoid complicated parameter adjustment of the optimization function, this study applies inverse reinforcement learning theory for the offline learning of EDOATs. A trajectory expectation feature-matching algorithm is proposed to achieve automatic recovery of the optimization function.
(2) To enhance the adaptability of the trajectory optimization function to driving scenarios, this study uses the speed of the ego vehicle and the speed difference between the ego and lead vehicles as the primary driving scenario parameters influencing obstacle avoidance trajectory planning. A mapping relationship between the driving scenario information and the obstacle avoidance trajectory optimization function is established.
The trajectory planning problem is described mathematically in Section 2. The trajectory expectation feature matching algorithm based on maximum entropy inverse reinforcement learning theory is introduced in Section 3. The mapping relationship between the trajectory cost function and key driving scenarios is established and verified in Section 4. The conclusions and future work are presented in Section 5.
2. Related work
2.1. Problem description
The problem of vehicle obstacle avoidance trajectory planning can be roughly described as follows. After receiving an obstacle avoidance instruction from the decision-making layer, the system preselects a feasible trajectory set $\Xi = \{\xi_1, \xi_2, \ldots, \xi_n\}$ based on control strategies according to the current driving scenario, where $\xi_1, \xi_2, \ldots, \xi_n$ represent the first, second, …, $n$th feasible trajectories, respectively. Subsequently, the optimal trajectory is selected by minimizing the predesigned optimization function.
$$\xi^{*} = \arg\min_{\xi \in \Xi} J(\xi) \qquad (1)$$
where $\xi^{*}$ is the final optimized obstacle avoidance trajectory and $J$ is the trajectory optimization function.
To balance the efficiency, comfort, and safety of the trajectory planning, the optimization function of the obstacle avoidance trajectory is expressed as
$$J(\xi) = \theta^{\mathrm{T}} f(\xi) \qquad (2)$$
where $f$ is a vector composed of trajectory features, and $\theta$ is the weight matrix balancing these trajectory features, $\theta \in \{\theta_1, \theta_2, \ldots, \theta_n\}$, where $\theta_1, \theta_2, \ldots, \theta_n$ represent the weight matrices of the trajectory features for the first, second, …, $n$th trajectories, respectively. It should be noted that there is a complex mapping relationship between feature weights and driving scenarios, which leads to different obstacle avoidance trajectories.
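The selection step described above can be illustrated with a minimal Python sketch. It assumes that the optimization function is a cost to be minimized and that each candidate trajectory is a list of sampled lateral positions; the names `cost`, `select_trajectory`, and `toy_features`, as well as the two toy features and the 3.5 m target offset, are hypothetical and are not taken from the study.

```python
from typing import Callable, List, Sequence

Trajectory = Sequence[float]  # lateral positions sampled at a fixed time step


def cost(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of trajectory features (a linear optimization function)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def select_trajectory(
    candidates: Sequence[Trajectory],
    feature_fn: Callable[[Trajectory], Sequence[float]],
    weights: Sequence[float],
) -> Trajectory:
    """Return the candidate trajectory with the lowest weighted feature cost."""
    return min(candidates, key=lambda traj: cost(feature_fn(traj), weights))


def toy_features(traj: Trajectory, target: float = 3.5) -> List[float]:
    """Hypothetical features: mean squared offset from the target lateral
    position, and mean squared change between consecutive samples."""
    deviation = sum((y - target) ** 2 for y in traj) / len(traj)
    steps = [b - a for a, b in zip(traj, traj[1:])]
    roughness = sum(s ** 2 for s in steps) / len(steps)
    return [deviation, roughness]
```

With the same candidate set, a weight vector that emphasizes the position deviation favors an abrupt maneuver, whereas one that emphasizes the step-to-step change favors a gradual maneuver, which is the dependence on feature weights noted in the text.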
Fig. 1 shows the driving-scenario information of intelligent vehicles for obstacle avoidance on structured highways. In Fig. 1, $v_x$ represents the longitudinal speed of the vehicle during collision avoidance. In the same driving scenario, different weights of trajectory features result in different planned obstacle avoidance trajectories. If the speed of the ego vehicle is much higher than that of the lead vehicle when it receives an obstacle avoidance command, the red trajectory may be better than the green and blue trajectories, based on the degree of obstacle avoidance urgency. Therefore, the driving scenario information, including the states of the ego vehicle and surrounding vehicles, is defined as $E$, $E \in \{E_1, E_2, \ldots, E_n\}$, where $E_1, E_2, \ldots, E_n$ represent the driving scenario information for the first, second, …, $n$th trajectories, respectively.
Eq. 2 can be rewritten as
$$J(\xi) = g(E; \omega)^{\mathrm{T}} f(\xi) \qquad (3)$$
where $f(\xi)$ is the feature vector of trajectory $\xi$ and $\omega$ is the weights parameter of the mapping model $g$ from the driving scenario information $E$ to the trajectory feature weights $\theta$.
To balance the efficiency, comfort, and safety of trajectory planning, the feature weights of the trajectory optimization function must be manually adjusted by engineers based on their own practical experience. Adjusting the parameters is generally tedious and requires considerable repetition. To solve this problem, this study proposes a new trajectory-planning scheme.
As shown in Fig. 2, the strategy adopted in this study is divided into two stages: offline training and online optimization. In the offline training stage, abundant expert obstacle avoidance trajectories and the corresponding driving scenario information are first extracted from the HighD natural driving dataset. Then, using maximum entropy inverse reinforcement learning, the feature weights of the trajectory optimization function are extracted from the EDOATs. Subsequently, the key driving scenario information that affects the obstacle avoidance trajectory is extracted, and a mapping model from the driving scenarios to the trajectory feature weights is constructed. In this study, the parameters of the mapping model are obtained using multivariate nonlinear fitting. In the online optimization stage, the optimization function of the obstacle avoidance trajectory is reconstructed based on the driving scenario information and the mapping model obtained in the offline training stage.
2.2. Natural driving dataset selection and trajectory extraction
As mentioned above, to avoid a series of problems caused by manual adjustment of the feature weights, this research proposes an automatic recovery method for the feature weights based on inverse reinforcement learning theory, which has been widely used as a tool to obtain optimization functions from expert examples. The core idea of inverse reinforcement learning theory is to generate strategies that match expert examples by obtaining the weights of the optimization function. Therefore, the obstacle avoidance trajectories of real drivers must be extracted to obtain the feature weights of the trajectory optimization function.
In intelligent driving research, natural driving trajectory datasets provide an effective tool for the development and verification of intelligent driving algorithms. Currently, researchers mainly use the next-generation simulation (NGSIM), KITTI, Cityscapes, and HighD datasets. The KITTI and Cityscapes datasets mainly focus on urban road conditions. The trajectories in the NGSIM dataset show poor smoothness of lateral displacement and lateral velocity. This study focuses on obstacle avoidance trajectory planning for autonomous vehicles when driving on structured highways. Through a simple comparison, the HighD dataset was selected to develop and verify the obstacle-avoidance trajectory algorithm. The following is a description of the data used in the study.
Each frame of the HighD dataset provides 25 groups of data centered on the vehicle [37], as shown in Fig. 3. Based on the HighD dataset, obstacle avoidance trajectory planning for vehicles on structured highways was considered. For the main research objective of this study, an excellent obstacle avoidance trajectory in the dataset should first be extracted. In the process of obstacle avoidance trajectory extraction, the following items were considered:
(1) The HighD dataset utilized drones to capture typical road sections, including ordinary highway and ramp inflow scenarios. This study focuses on the obstacle avoidance behavior of vehicles on ordinary straight roads. Therefore, free lane-change trajectories with no obstacle ahead and ramp inflow trajectories are discarded in the trajectory extraction process.
(2) To improve the effectiveness of the designed obstacle avoidance trajectory, trajectory data in which surrounding vehicles were present but outside the captured field of vision were discarded. Trajectory data in which the surrounding vehicles left the captured field of vision before obstacle avoidance was complete were also discarded.
(3) The moment when the absolute value of the vehicle lateral velocity reaches 0.1 m·s−1 is taken as the starting moment of obstacle avoidance.
(4) Because the focus is on high-speed obstacle avoidance scenarios, obstacle avoidance trajectories with a vehicle speed of less than 30 km·h−1 were discarded.
Based on the above four points, 262 groups of EDOAT data were extracted from the HighD dataset. Fig. 4 shows the extracted EDOATs.
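Some of the extraction criteria above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Frame` record and its `has_lead` flag are hypothetical simplifications of the per-frame data, the field-of-vision checks are omitted, and only the two numerical thresholds (0.1 m·s−1 and 30 km·h−1) come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

LATERAL_SPEED_START = 0.1  # m/s, start-of-avoidance threshold stated in the text
MIN_SPEED_KMH = 30.0       # km/h, minimum vehicle speed stated in the text


@dataclass
class Frame:
    lon_speed: float  # longitudinal speed in m/s
    lat_speed: float  # lateral speed in m/s
    has_lead: bool    # hypothetical flag: a lead vehicle is present in this frame


def avoidance_start_index(frames: Sequence[Frame]) -> Optional[int]:
    """Index of the first frame whose absolute lateral speed reaches the threshold."""
    for i, frame in enumerate(frames):
        if abs(frame.lat_speed) >= LATERAL_SPEED_START:
            return i
    return None


def keep_track(frames: Sequence[Frame]) -> bool:
    """Apply the lead-vehicle and minimum-speed criteria to one vehicle track."""
    start = avoidance_start_index(frames)
    if start is None:
        return False  # no lateral maneuver was detected
    if not frames[start].has_lead:
        return False  # free lane change with no obstacle ahead
    # Discard maneuvers in which the speed drops below 30 km/h.
    return all(f.lon_speed * 3.6 >= MIN_SPEED_KMH for f in frames[start:])
```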
To facilitate the design of the trajectory optimization function, the longitudinal velocity of the EDOAT was first analyzed.
The longitudinal velocity at the start time of obstacle avoidance is denoted by $v_0$, and the maximum or minimum longitudinal velocity from the start time to the end of obstacle avoidance is denoted by $v_{\mathrm{m}}$. The maximum rate of change of the longitudinal velocity during obstacle avoidance is defined in Eq. 4.
$$\delta = \frac{|v_{\mathrm{m}} - v_0|}{v_0} \times 100\% \qquad (4)$$
where $\delta$ reflects the degree of change in longitudinal velocity during obstacle avoidance. Statistical analysis of the EDOATs was performed to obtain a statistical histogram of $\delta$, as shown in Fig. 5.
A total of 96.56% of the vehicles had $\delta$ values of less than 10.00%, and 75.95% of the vehicles had $\delta$ values of less than 5.00%. Therefore, it can be inferred that the degree of longitudinal velocity change is very small during obstacle avoidance. This is closely related to the ride comfort and safety of vehicles. Therefore, it is assumed that the longitudinal velocity of the vehicle is constant during obstacle avoidance.
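The statistic discussed above can be computed with a short Python sketch. It assumes that the rate of change is the largest absolute deviation from the initial longitudinal speed, expressed as a percentage of that initial speed; the function names are hypothetical.

```python
from typing import Sequence


def max_relative_speed_change(lon_speeds: Sequence[float]) -> float:
    """Largest deviation from the initial longitudinal speed, in percent.

    lon_speeds holds the longitudinal speed samples of one maneuver,
    starting at the first frame of obstacle avoidance.
    """
    v0 = lon_speeds[0]
    if v0 <= 0.0:
        raise ValueError("initial longitudinal speed must be positive")
    return max(abs(v - v0) for v in lon_speeds) / v0 * 100.0


def share_below(values: Sequence[float], threshold: float) -> float:
    """Percentage of values that are strictly below the threshold."""
    return 100.0 * sum(1 for v in values if v < threshold) / len(values)
```

Applying `max_relative_speed_change` to every extracted maneuver and then `share_below` with thresholds of 5 and 10 would yield shares of the kind reported in the text.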
3. Methodology
3.1. Trajectory expectation feature matching algorithm based on inverse reinforcement learning
The trajectory optimization function plays a key role in the design of an obstacle-avoidance trajectory planning strategy. Designing the optimization function manually, based on an engineer’s experience, is time consuming and laborious. Although the resulting function can achieve obstacle avoidance, it ignores the driving habits of real drivers, which leads to poor acceptability.
3.1.1. Principle of the inverse reinforcement learning algorithm
The key to inverse reinforcement learning is to find the desired parameter $\theta$ such that the obstacle avoidance trajectory generated by the trajectory optimization function is similar to the EDOAT. A trajectory $\xi$ comprises a continuous sequence of states.
$$\xi = \{s_1, s_2, \ldots, s_N\} \qquad (5)$$
where $s_t$ is the state vector of the agent at moment $t$, $t = 1, 2, \ldots, N$. The trajectory optimization function is the mapping of the state feature vector $f$ to the state feature weights $\theta$. The state feature vector is defined as the related features that affect the driving state of the vehicle and the surrounding vehicles during obstacle avoidance trajectory planning.
In this study, a linear relationship between the optimization function $J$ and the state feature vector $f$ is preliminarily assumed.
$$J(\xi) = \theta^{\mathrm{T}} f(\xi) \qquad (6)$$
Based on the above definition, the objective of inverse reinforcement learning can be described as follows. Given a set of expert-demonstrated trajectories, the state feature weights are sought such that the optimization function parameterized by these weights generates obstacle avoidance trajectories similar to the expert-demonstrated trajectories. That is, a probability distribution over trajectories is sought such that the expectation of the trajectory features under this distribution is consistent with the empirical features of the demonstration trajectories.
where is the number of EDOATs. Because the only constraint condition introduced is given by Eq. 7, the maximum entropy principle best reflects existing information. Therefore, the trajectory probability distribution obtained in this study can be expressed as
where is defined as the distribution entropy of trajectory probability distribution model . Based on the Lagrange multiplier method, the Lagrange function is constructed as expressed by Eq. 9.
where and are Lagrange multipliers. Thus, Eq. 9 is transformed into the optimization problem of Lagrange function .
According to the variational method:
where is the partition function.
Finally, the trajectory probability distribution model is given by Eq. 13.
Maximizing the distribution entropy of the system under the feature-matching constraint is equivalent to maximizing the likelihood of the expert-demonstrated trajectories under the exponential probability distribution model. Therefore, the distribution parameter of the probability distribution model is given by Eq. 14.
where represents the set of EDOATs, represents the th EDOAT.
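For reference, the standard maximum entropy argument summarized above can be written compactly as follows. The notation is assumed for illustration and may differ from that of the numbered equations: $P(\xi)$ is the trajectory probability, $f(\xi)$ the feature vector, $\tilde{f}$ the empirical feature mean of the $m$ demonstrations $\tilde{\xi}_i$, $\theta$ and $\lambda$ the Lagrange multipliers, and sums over $\xi$ stand in for integrals over a continuous trajectory space. The sign convention is chosen so that the optimization function acts as a cost, that is, a larger weighted feature sum lowers the trajectory probability.

```latex
\begin{aligned}
&\max_{P}\; H(P) = -\sum_{\xi} P(\xi)\,\log P(\xi)
\quad \text{s.t.} \quad
\sum_{\xi} P(\xi)\, f(\xi) = \tilde{f} = \frac{1}{m}\sum_{i=1}^{m} f(\tilde{\xi}_i),
\qquad \sum_{\xi} P(\xi) = 1, \\[4pt]
&\mathcal{L}(P,\theta,\lambda) = -\sum_{\xi} P(\xi)\log P(\xi)
 - \theta^{\mathrm{T}}\Big(\sum_{\xi} P(\xi)\, f(\xi) - \tilde{f}\Big)
 - \lambda\Big(\sum_{\xi} P(\xi) - 1\Big), \\[4pt]
&\frac{\partial \mathcal{L}}{\partial P(\xi)}
 = -\log P(\xi) - 1 - \theta^{\mathrm{T}} f(\xi) - \lambda = 0
\;\Longrightarrow\;
P(\xi \mid \theta) = \frac{\exp\!\big(-\theta^{\mathrm{T}} f(\xi)\big)}{Z(\theta)},
\qquad
Z(\theta) = \sum_{\xi} \exp\!\big(-\theta^{\mathrm{T}} f(\xi)\big), \\[4pt]
&\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{m} \log P(\tilde{\xi}_i \mid \theta).
\end{aligned}
```

The normalization constraint fixes $\exp(1+\lambda) = Z(\theta)$, which is why only $\theta$ remains as a free parameter of the distribution.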
Eq. 14 does not generally admit an analytical solution. Therefore, this study obtains the value of the distribution parameter through numerical iteration. To achieve this, the gradient of the likelihood of the expert-demonstrated trajectories with respect to the distribution parameter must be determined.
Taking the partial derivative with respect to the distribution parameter yields Eqs. 16 and 17.
The gradient of the distribution parameter is equal to the difference between the expected trajectory features and empirical features of the expert-demonstrated trajectory under the exponential distribution model, as given by Eq. 16.
When the feature of a trajectory in the exponential distribution model is large, the corresponding weight coefficient of the feature increases, as given by Eq. 17. Combined with Eq. 13, the probability of the probability distribution model choosing this trajectory is “exponentially” reduced. Finally, the trajectory features are reduced to the empirical characteristics of the expert-demonstrated trajectory.
Obtaining the gradient requires the expectations of the trajectory features to be calculated using an exponential probability distribution model. Calculating the expectations of trajectory features using a probabilistic model for high-dimensional continuous space problems is difficult. Similar to Ref. [38], inverse optimal control theory is adopted in this study. Therefore, the feature of the trajectory with the highest probability is used to approximate the expectation of the trajectory features under the exponential probability distribution model.
The final gradient, Gr, is expressed by Eq. 19.
Based on the abovementioned inverse reinforcement learning process, the trajectory expectation feature matching algorithm is given as follows.
In contrast to the global path planning algorithm, the trajectory provided by the local path planning algorithm is generally a continuous function of space coordinates over a certain time horizon in the future.
$$\xi = \{(x(t), y(t)) \mid t \in [t_0, t_0 + T]\} \qquad (20)$$
where $x(t)$ and $y(t)$ are the abscissa and ordinate of the obstacle avoidance trajectory, respectively, relative to a certain coordinate system; $t_0$ is the initial obstacle avoidance time; and $T$ is the total duration of obstacle avoidance.
According to Fig. 5, the change in the longitudinal velocity during obstacle avoidance can be ignored; thus, $x(t)$ can be expressed as
$$x(t) = x(t_0) + v_0\,(t - t_0) \qquad (21)$$
where $v_0$ represents the longitudinal velocity at the starting time of obstacle avoidance.
The lateral obstacle avoidance trajectory is a continuous function of time; therefore, the obstacle avoidance trajectory optimization problem is a typical infinite-dimensional optimization problem, which must be transformed into a finite-dimensional optimization problem. By introducing “trajectory support points,” the vehicle’s obstacle avoidance trajectory is represented by a quintic spline curve, and the optimization variables reduce to the lateral positions at these points.
According to the extracted EDOATs, most vehicles completed the obstacle avoidance behavior within 9 s, as shown in Fig. 4. In this study, the starting time of obstacle avoidance was taken as the origin of the time axis, and seven discrete points were taken at equal intervals of 1.5 s, as shown in Fig. 6. The lateral position at the initial moment is the lateral position of the vehicle at the start of obstacle avoidance, which is assumed to be known in this study. Finally, the point set composed of the remaining six “trajectory support points” is selected as the set of interpolation points for the spline interpolation of the obstacle avoidance trajectory.
The point set divides the vehicle’s lateral obstacle avoidance trajectory into six segments. Based on the principle of quintic spline interpolation, the th segment of the lateral obstacle avoidance trajectory is represented by Eq. 23.
where , , is the fourth derivative of the curve, and is the second derivative of the curve.
According to the continuity of the first and third derivatives at the interpolation points at both ends of the quintic spline interpolation curve, the following can be obtained:
where
Thus, the optimization function of the obstacle avoidance trajectory is defined as a linear combination of the trajectory feature and feature weight vector . The features of the obstacle avoidance trajectories must be designed in advance. The following three trajectory features are used to describe the trajectory of the vehicle obstacle avoidance:
where , , and represent the lateral position deviation, lateral velocity, and lateral acceleration characteristics of the trajectory, respectively; , , and are the maximum lateral position difference, lateral velocity, and lateral acceleration, respectively, during the obstacle avoidance process. is the abscissa of the vehicle when the expected obstacle avoidance is complete, and respectively represent the first-order and second-order derivatives of with respect to time.
The vehicle obstacle avoidance process can be divided into obstacle avoidance risk assessment, obstacle avoidance target lane selection, and trajectory planning. The first two points are the content of obstacle avoidance behavior decisions, and the focus of this study is vehicle lateral obstacle avoidance trajectory planning. Therefore, it is assumed that the obstacle avoidance instructions and expected obstacle avoidance lanes are known.
The trajectory feature in Eq. 26 describes the distance between the lateral positions of the vehicle and the target during obstacle avoidance. The trajectory feature in Eq. 27 describes the speed at which the vehicle approaches the target position. The trajectory feature in Eq. 28 describes the acceleration of the vehicle as it approaches the target position, which reflects the comfort of the obstacle avoidance trajectory to a certain extent. Combining the three proposed trajectory features, the optimization function of the obstacle avoidance trajectory can be written as
$$J = \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \qquad (29)$$
where $\theta_1$, $\theta_2$, and $\theta_3$ are the weight coefficients of the lateral position deviation feature $f_1$, the lateral velocity feature $f_2$, and the lateral acceleration feature $f_3$, respectively.
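The three features and their weighted sum can be evaluated numerically for a sampled lateral trajectory, as in the following Python sketch. The feature form used here, namely the time integral of the squared signal after division by its peak magnitude, is an assumption made for illustration and is not necessarily the definition given in Eqs. 26 to 28; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def derivative(values: Sequence[float], dt: float) -> List[float]:
    """Central finite differences, one-sided at the two end points."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out


def normalized_energy(signal: Sequence[float], dt: float) -> float:
    """Trapezoidal integral of (signal / peak magnitude)^2; 0 for a zero signal."""
    peak = max(abs(s) for s in signal)
    if peak == 0.0:
        return 0.0
    squared = [(s / peak) ** 2 for s in signal]
    return dt * (sum(squared) - 0.5 * (squared[0] + squared[-1]))


def trajectory_features(
    y: Sequence[float], y_target: float, dt: float
) -> Tuple[float, float, float]:
    """Assumed position-deviation, lateral-velocity, and lateral-acceleration features."""
    deviation = [yi - y_target for yi in y]
    velocity = derivative(y, dt)
    acceleration = derivative(velocity, dt)
    return (
        normalized_energy(deviation, dt),
        normalized_energy(velocity, dt),
        normalized_energy(acceleration, dt),
    )


def weighted_cost(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of the features with their weight coefficients."""
    return sum(w * f for w, f in zip(weights, features))
```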
Next, the inverse reinforcement learning algorithm is combined with the obstacle avoidance trajectory based on the quintic spline curve proposed in this study. The trajectory expectation feature matching algorithm based on inverse reinforcement learning is illustrated in Fig. 7.
In Fig. 7, the algorithm terminates when the 2-norm of the update gradient of the trajectory feature weights is less than a preset threshold. During the learning process, the feature weight vector is always kept positive. When is negative, is first multiplied by the coefficient . Then, we take the exponent of and multiply it by to achieve the update.
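The overall learning loop can be sketched in Python under simplifying assumptions. Instead of optimizing the spline support points, the most probable trajectory is picked from a small, finite set of candidate feature vectors; the gradient is approximated by the difference between the features of that trajectory and the expert features; and the weights are updated multiplicatively through an exponential so that they stay positive. The step size, tolerance, and function names are hypothetical, and the exact update rule of the study may differ.

```python
import math
from typing import List, Sequence, Tuple

Features = Tuple[float, ...]


def most_likely(candidates: Sequence[Features], weights: Sequence[float]) -> Features:
    """Under P proportional to exp(-cost), the most probable candidate has the lowest cost."""
    return min(candidates, key=lambda f: sum(w * x for w, x in zip(weights, f)))


def learn_weights(
    expert: Features,
    candidates: Sequence[Features],
    init_weights: Sequence[float],
    step: float = 0.2,
    tol: float = 1e-3,
    max_iter: int = 500,
) -> Tuple[List[float], int]:
    """Feature-matching loop; returns the weights and the number of updates performed."""
    weights = list(init_weights)
    for iteration in range(max_iter):
        best = most_likely(candidates, weights)
        # Approximate gradient: features of the most probable trajectory
        # minus the empirical features of the expert demonstration.
        grad = [b - e for b, e in zip(best, expert)]
        if math.sqrt(sum(g * g for g in grad)) < tol:
            return weights, iteration
        # Multiplicative update: a feature that is too large gets a larger
        # weight, and positive weights remain positive.
        weights = [w * math.exp(step * g) for w, g in zip(weights, grad)]
    return weights, max_iter
```

Because the candidate set is finite, this loop is only guaranteed to stop early when the expert features can be reproduced by one of the candidates; with a continuous spline parameterization the gradient instead approaches zero gradually.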
3.2. Obstacle avoidance trajectory planning for adaptive driving scenarios
The obstacle avoidance behaviors of experienced drivers on a structured expressway are closely related to the surrounding driving scenarios. Based on the above model, this section explores the quantitative relationship between the optimization function of the obstacle avoidance trajectory and the surrounding driving scenarios, and an AOAT planning strategy for adaptive driving scenarios is proposed.
To study the quantitative model of the optimization function and driving scenarios, we define the driving scenario information, which includes the states of the ego vehicle and surrounding vehicles. First, the mapping relationship between the obstacle avoidance environment and the weight coefficients of the obstacle avoidance trajectory features is obtained.
where is used to represent the mapping relationship between trajectory feature weights and critical driving scenarios, .
The mapping model must be interpretable and transferable. Therefore, a polynomial regression model is used to study the mapping relationship between driving scenarios and weight coefficients .
where –, –, and – are the fitting parameters of , respectively, and and denote the key environmental information that affect the weight coefficients, .
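A polynomial mapping of this kind can be evaluated as in the following Python sketch. It assumes a full second-order polynomial in two scenario variables, here taken to be the ego vehicle speed and the speed difference between the ego and lead vehicles, with six coefficients per weight; the polynomial order, the coefficient layout, and the function names are assumptions for illustration, and no fitted values from the study are reproduced.

```python
from typing import Sequence, Tuple


def quadratic_terms(v_ego: float, dv: float) -> Tuple[float, ...]:
    """Monomials of a full second-order polynomial in two variables."""
    return (1.0, v_ego, dv, v_ego ** 2, v_ego * dv, dv ** 2)


def mapped_weight(coeffs: Sequence[float], v_ego: float, dv: float) -> float:
    """Evaluate one trajectory feature weight from the scenario variables.

    coeffs holds six hypothetical fitting parameters, ordered as in
    quadratic_terms.
    """
    terms = quadratic_terms(v_ego, dv)
    if len(coeffs) != len(terms):
        raise ValueError("expected six coefficients")
    return sum(c * t for c, t in zip(coeffs, terms))
```

One coefficient set per feature weight would be fitted offline; at run time the planner only needs to evaluate the polynomial, which keeps the mapping cheap and easy to inspect.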
4. Strategy validation and results analysis
4.1. Verification of inverse reinforcement learning algorithm based on feature matching
To verify the effectiveness of the algorithm, an EDOAT (demonstration trajectory 1) was randomly selected from the HighD dataset. The initial trajectory feature weight in the maximum entropy inverse reinforcement learning algorithm was set to .
After 76 iterations, the 2-norm of the update gradient of the trajectory feature weights is less than the set value . The optimal feature weight vector obtained by inverse reinforcement learning theory is .
The blue line represents the lateral position, velocity, and acceleration of demonstration trajectory 1, as shown in Fig. 8. The arrow indicates the change in the direction of the features during the learning process. The final learned obstacle avoidance trajectory has a good degree of feature matching with demonstration trajectory 1, which verifies the effectiveness of the learning algorithm.
To further quantify the learning effect of the algorithm, the change in the feature gradient during the learning process is given. Fig. 9(a)–(c) show the gradient changes of the features given in Eqs. 26–28 during the learning process. As the learning continued, the gradients of the three trajectory features gradually approached zero, as shown in Fig. 9; that is, the difference between the features of the demonstration trajectory and the feature expectations of the learned trajectory decreased.
Next, another expert trajectory (demonstration trajectory 2) was randomly selected from the HighD dataset for algorithm verification. The initial trajectory feature weight was set to .
After 86 iterations, the 2-norm of the update gradient of the trajectory feature weights is less than the set value , as shown in Figs. 10 and 11. The optimal feature weight vector obtained by inverse reinforcement learning theory is .
To explore the influence of the initial trajectory feature weight vector on the final results, we set the initial trajectory feature weights to for the same demonstration trajectory (demonstration trajectory 2).
After 38 iterations, the 2-norm of the update gradient of the trajectory feature weights is less than the set value , as shown in Figs. 12 and 13. The optimal feature weight vector obtained using inverse reinforcement learning theory is .
It can be seen that under different initial weights of the trajectory features, the final learned trajectories are very similar, but there are significant differences in the numerical weights of the trajectory features. This is because the trajectory features are determined not by the absolute value of each feature weight coefficient in the trajectory optimization function but by the relative values of the coefficients. The first weighting coefficient is therefore normalized to 1. Under completely different initial values of the two feature weights, the ratios of and to are and , respectively. Under the two different initial values of the feature weights, the relative changes in the feature weight coefficients were less than 3%, as shown in Fig. 14.
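The scale invariance noted above follows directly from the linear form of the optimization function: multiplying all weights by the same positive constant multiplies every cost by that constant and leaves the ranking of trajectories unchanged. A minimal Python sketch, with hypothetical function names and made-up feature values, illustrates this.

```python
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Divide all weights by the first one, so that the first weight becomes 1."""
    first = weights[0]
    if first <= 0.0:
        raise ValueError("the first weight must be positive")
    return [w / first for w in weights]


def best_index(candidates: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the candidate feature vector with the lowest weighted cost."""
    costs = [sum(w * f for w, f in zip(weights, feats)) for feats in candidates]
    return costs.index(min(costs))
```

Comparing learned weight vectors after `normalize` therefore removes the arbitrary scale introduced by different initial values.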
To simplify the calculations, the ratio of the feature weights was used to describe the trajectory optimization function.
Based on the HighD dataset, the obstacle-avoidance trajectories of 262 groups of vehicles on structured roads were extracted. The extracted trajectories are learned to obtain the statistical results of feature weight vector , .
4.2. Validation of the mapping model
To effectively select the key environmental information that affects the vehicle obstacle avoidance trajectory planning behavior, the driving scenario information was analyzed from a practical perspective. Generally speaking, the actual length of the driver’s obstacle avoidance time is mainly affected by “obstacle avoidance emergency degree,” “obstacle avoidance risk,” and the state of the vehicle. When the time headway (THW) between the ego vehicle and lead vehicle is small, the driver is more inclined to complete obstacle avoidance over a shorter time. However, if the speed is high, the driver may be more inclined to complete the obstacle avoidance task over a longer time to ensure the comfort of obstacle avoidance and reduce the tension of the passengers.
Based on practical experience, drivers usually perform obstacle avoidance operations to leave the congested lane and improve driving efficiency.
An analysis of the 262 groups of obstacle avoidance information extracted from the HighD dataset showed that, in approximately 83.97% of the cases, the ego vehicle was faster than the lead vehicle when the obstacle was avoided. Approximately 85.50% of the vehicles avoided obstacles with a distance headway (DHW) of less than 100 m, approximately 91.61% with a THW of less than 4 s, and approximately 90.84% with a time-to-collision (TTC) of less than 50 s.
After the vehicle receives the obstacle avoidance command, the speed difference between the ego and lead vehicles, DHW, and TTC can be used as key driving scenario information to represent the urgency of obstacle avoidance, as shown in Fig. 15.
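The urgency indicators named above can be computed from the gap and the two speeds. The sketch below uses the common definitions THW = DHW / ego speed and TTC = DHW / closing speed; the function name, argument names, and example values are hypothetical, and the gap is taken as a given input because its exact measurement convention (for example, bumper to bumper) is not stated here.

```python
import math


def urgency_indicators(dhw_m, ego_speed_mps, lead_speed_mps):
    """Return (THW [s], TTC [s]) for an ego vehicle following a lead vehicle.

    dhw_m: distance headway between the two vehicles in meters.
    Speeds are in meters per second. THW is infinite for a stationary ego
    vehicle, and TTC is infinite when the ego vehicle is not closing in.
    """
    thw = dhw_m / ego_speed_mps if ego_speed_mps > 0 else math.inf
    closing_speed = ego_speed_mps - lead_speed_mps
    ttc = dhw_m / closing_speed if closing_speed > 0 else math.inf
    return thw, ttc


# Hypothetical example: 60 m gap, ego at 30 m/s, lead at 25 m/s.
thw, ttc = urgency_indicators(60.0, 30.0, 25.0)
```

A larger speed difference shortens the TTC for the same gap, which is why the speed difference can serve as a proxy for obstacle avoidance urgency.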
Considering the above situation, the speed difference between the ego and lead vehicles is regarded as one of the key factors affecting vehicle obstacle avoidance trajectory planning. It is worth noting that the vehicle speed also affects the choice of obstacle avoidance trajectory. When the speed is high, the driver usually chooses a trajectory with a smaller curvature to reduce passenger tension and avoid vehicle instability caused by urgent obstacle avoidance. Therefore, to make the obstacle avoidance trajectory more anthropomorphic and acceptable, the vehicle speed is also used as one of the key factors affecting vehicle trajectory planning.
In summary, the speed difference between the ego and lead vehicles and the ego vehicle speed are the key scenario factors affecting obstacle avoidance trajectory planning. The relative weight coefficients of each feature in the trajectory optimization function determine the trajectory features. Therefore, the mapping relationship between the key scenario factors and the weight coefficients of trajectory features is redefined.
Nonlinear fitting technology was adopted for the expert-demonstrated trajectories to obtain the mapping relation from the driving scenario information to the feature weight of the trajectory optimization function:
The fitted curves corresponding to Eq. 34 are shown in Fig. 16(a) and (b), respectively.
Table 1 lists the fitting results of the mapping model between the driving scenarios and the weights of the trajectory features. According to Table 1, the estimation model explains the mapping relationship between the key information of obstacle avoidance scenarios and the feature weights well.
As shown in Fig. 16, both mapped feature weights are decreasing functions of the speed difference between the ego and lead vehicles: they decrease as this speed difference increases. The greater the speed difference, the higher the urgency of obstacle avoidance and the collision risk. Therefore, the corresponding obstacle avoidance completion time should be shorter, and the corresponding feature weights should be smaller. Both feature weights are also increasing functions of the ego vehicle speed: the higher the speed of the ego vehicle when avoiding obstacles, the greater they are. This can be explained in terms of obstacle avoidance risk. When the vehicle is moving at high speed, an obstacle avoidance time that is too short increases the risk of dangerous conditions, such as rollover, thereby increasing the driver’s nervousness. Therefore, when the speed is high, the corresponding obstacle avoidance time should be longer, and the corresponding feature weights should be higher.
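To make the idea of a scenario-to-weight mapping concrete, the following sketch fits a simple surrogate model to (speed difference, ego speed, weight) samples. The functional form w = exp(c0 + c1·Δv + c2·v) is an assumption made only for illustration and is not the mapping of Eq. 34; it is fitted by ordinary least squares on log(w). The coefficient values and sample grid are hypothetical synthetic data. With c1 < 0 and c2 > 0, the surrogate reproduces the monotonic trends described above.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_log_linear(samples):
    """Fit w = exp(c0 + c1*dv + c2*v) by least squares on log(w).

    samples: list of (dv, v, w) tuples with w > 0 (assumed surrogate model).
    """
    rows = [(1.0, dv, v) for dv, v, _ in samples]
    targets = [math.log(w) for _, _, w in samples]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(ata, atb)


def predict_weight(coeffs, dv, v):
    """Evaluate the fitted surrogate mapping for one driving scenario."""
    c0, c1, c2 = coeffs
    return math.exp(c0 + c1 * dv + c2 * v)


# Hypothetical synthetic samples generated from known coefficients.
true_coeffs = (0.5, -0.03, 0.01)
samples = [
    (dv, v, predict_weight(true_coeffs, dv, v))
    for dv in (5.0, 20.0, 40.0)
    for v in (90.0, 110.0, 130.0)
]
fitted = fit_log_linear(samples)
```

Once fitted, `predict_weight(fitted, dv, v)` plays the role of the online mapping step: the planner would query it with the current scenario information to obtain a feature weight.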
At this point, the offline learning phase is completed. The effectiveness of the obstacle avoidance trajectory planning strategy is verified as follows.
4.3. Simulation and experimental verification
In this study, 262 groups of real vehicle obstacle avoidance trajectory data points were extracted from the HighD dataset. A total of 220 groups of trajectory data were used as samples to train the mapping relationship between the key information of the driving scenarios and feature weights of the trajectory optimization function. The effectiveness of the proposed trajectory-planning strategy was verified based on the remaining 42 groups of real obstacle-avoidance trajectory data.
First, a trajectory optimization function for comparison (Scheme 1) was constructed by taking the average of the feature weights learned from the 220 training obstacle avoidance trajectories. The trajectory optimization function constructed from the feature weights obtained through the mapping of driving scenario information is referred to as Scheme 2. To quantify the advantages and disadvantages of the two strategies more intuitively, the trajectory feature distance vector is introduced, .
where – are, respectively, the features of the obstacle avoidance trajectory generated by the trajectory planning algorithm; and – are, respectively, the features of the EDOAT. A specific description is provided in Eqs. 26, 27, and 28. According to its definition, the trajectory feature distance vector describes the distance between the features of the obstacle avoidance trajectory generated by the planning algorithm and those of the expert-demonstrated trajectory. The smaller is, the closer the generated obstacle avoidance trajectory is to the expert-demonstrated trajectory.
The proposed algorithm (Scheme 2) and the comparison scheme (Scheme 1) were used to generate trajectories for the 42 groups of vehicle obstacle avoidance test scenarios. The values of obtained by each scheme in the 42 test scenarios were averaged to obtain the “average” distance between the features of the obstacle avoidance trajectories generated by the two schemes and those of the expert-demonstrated trajectories, as shown in Fig. 17.
The obstacle avoidance trajectories generated by the two trajectory planning schemes both differed to some extent from the expert-demonstrated trajectories in their trajectory features. This is because neither scheme considered the different driving styles of drivers in the HighD dataset. Compared to Scheme 1, Scheme 2 significantly improves the similarity between the generated and expert-demonstrated trajectories. Specifically, the average feature difference in the lateral position of the obstacle avoidance trajectory generated in Scheme 2 was only 49.04% of that generated in Scheme 1. The average feature difference of the lateral velocity in Scheme 2 was only 42.91% of that in Scheme 1. The average feature difference of the lateral acceleration in Scheme 2 was only 55.35% of that in Scheme 1.
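The comparison above can be sketched in a few lines. Because the defining equation of the trajectory feature distance vector is not reproduced here, the sketch assumes it is the element-wise absolute difference between the features of the generated and expert-demonstrated trajectories; the function names and all numerical values are hypothetical and are not the paper's data.

```python
def feature_distance(generated, expert):
    """Element-wise absolute feature difference (assumed distance definition)."""
    return [abs(g - e) for g, e in zip(generated, expert)]


def mean_feature_distance(generated_list, expert_list):
    """Average the feature distance vector over all test scenarios."""
    dists = [feature_distance(g, e) for g, e in zip(generated_list, expert_list)]
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]


def relative_distance_percent(scheme2_mean, scheme1_mean):
    """Scheme 2 mean distance as a percentage of the Scheme 1 mean distance."""
    return [100.0 * s2 / s1 for s2, s1 in zip(scheme2_mean, scheme1_mean)]


# Hypothetical features (lateral position, velocity, acceleration) for two scenarios.
expert = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
scheme1 = [[2.0, 4.0, 3.5], [3.0, 2.0, 2.5]]
scheme2 = [[1.5, 2.5, 3.25], [2.5, 2.5, 2.25]]

mean1 = mean_feature_distance(scheme1, expert)
mean2 = mean_feature_distance(scheme2, expert)
percent = relative_distance_percent(mean2, mean1)
```

A value below 100 in `percent` means that the corresponding feature of the Scheme 2 trajectories is, on average, closer to the expert demonstration than that of Scheme 1.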
To further verify the effectiveness of the proposed scheme, three groups of real obstacle avoidance scenarios were randomly selected from the 42 groups of test samples. The proposed trajectory planning strategy was used to generate an AOAT, as shown in Fig. 18. The EDOATs in Scenarios 1, 2, and 3 are denoted as EDOAT1, EDOAT2, and EDOAT3, respectively. The AOATs generated based on Scenarios 1, 2, and 3 are denoted as AOAT1, AOAT2, and AOAT3, respectively. The lateral velocity and acceleration of the obstacle avoidance trajectory generated based on real driving scenarios are denoted as LV and LA, respectively. The real lateral velocity and acceleration in this scenario are denoted as LVT and LAT.
Fig. 18(a)–(f) show the lateral displacement, velocity, and acceleration of the trajectories generated for Scenarios 1, 2, and 3. The speed differences between the ego and lead vehicles in the three scenarios are 14.40, 21.30, and 41.76 km·h⁻¹, respectively. According to the driving scenario information, Scenario 3 was a more urgent condition than Scenarios 1 and 2. To prevent a collision, the vehicle chooses an obstacle avoidance trajectory with high lateral acceleration in Scenario 3.
To verify the real-time performance and acceptability of the strategy, a hardware-in-the-loop (HIL) test platform was built based on the aforementioned obstacle avoidance scenarios, as shown in Fig. 19.
The HIL platform includes a ground resistance moment simulation system, an automotive steering assembly, a dSPACE MicroLabBox, and a National Instruments peripheral component interconnect extensions for instrumentation (NI PXI) real-time system. The HIL experimental process is shown in Fig. 20. The AOAT planning strategy for driving scenario adaptation was run in the MicroLabBox to ensure real-time performance. The vehicle and road models from CarSim were embedded in the NI PXI real-time system using NI LabVIEW software. The servomotor simulated the ground resistance moment. These components communicate via a controller area network (CAN) bus. The working process is as follows. After the MicroLabBox receives the driving scenario information from the NI PXI through the CAN bus, it plans the optimal AOAT and calculates the optimal control angle. The actuator receives control instructions through the CAN bus and performs the corresponding steering action to follow the optimal trajectory. The NI PXI software receives the actual control actions of the platform to complete the closed-loop control.
Figs. 21 and 22 show the trajectory planning and following process of the vehicles for 10 s from the start of obstacle avoidance.
In Scenario 1, the speed difference between the ego and lead vehicles at the beginning of obstacle avoidance was 14.4 km·h⁻¹, and the ego vehicle speed was 121.9 km·h⁻¹. In Scenario 2, the speed difference between the ego and lead vehicles at the beginning of obstacle avoidance was 21.3 km·h⁻¹, and the ego vehicle speed was 118.3 km·h⁻¹. In Scenario 3, the speed difference between the ego and lead vehicles at the beginning of obstacle avoidance was 41.76 km·h⁻¹, and the ego vehicle speed was 94.50 km·h⁻¹. Compared with Scenarios 1 and 2, the speed of the vehicle in Scenario 3 was lower, but the speed difference between the ego and lead vehicles was large. Therefore, the obstacle avoidance urgency in Scenario 3 was relatively high and the obstacle avoidance time was relatively short.
As shown in Fig. 22(a), the maximum lateral position errors in the three scenarios are 0.185, 0.130, and 0.090 m, respectively. By comparison, the tracking accuracy of Scenario 2 was slightly worse than that of Scenario 1, whereas that of Scenario 3 was the worst. The reasons for this phenomenon are as follows. Compared with Scenario 1, the trajectory in Scenario 3 is more urgent, and the system may sacrifice some trajectory tracking accuracy to ensure vehicle safety. Fig. 22(b) shows the motor control current required to track the three obstacle avoidance trajectories. The HIL experiment shows that the AOAT planned in this study can be tracked well. The data in the HighD dataset are all from real traffic scenarios. The simulation and experimental results show that the proposed trajectory planning strategy can automatically adjust its weight coefficients according to the degree of urgency and safety in different driving scenarios, so that an AOAT suitable for the current driving scenario is planned.
5. Summary
Trajectory optimization functions are difficult to design and exhibit poor adaptability to various scenarios. Moreover, a single trajectory optimization function cannot generate an obstacle avoidance trajectory that is consistent with driver behavioral habits. Thus, based on maximum entropy inverse reinforcement learning theory, automatic acquisition of the optimization functions of real obstacle avoidance trajectories was realized, and the acceptability of the trajectory optimization function was improved. Then, combined with the key driving scenario information affecting vehicle obstacle avoidance behavior, an AOAT planning strategy for adaptive driving scenarios was proposed. This strategy significantly enhances the adaptability to driving scenarios and improves driving safety during obstacle avoidance.
In future work, more complex and reliable mapping models will be designed to describe the relationship between driving scenarios and the feature weights of the optimization functions, and the proposed algorithm will be fully validated by real car tests. For inverse reinforcement learning, the obstacle avoidance trajectories of humans with different driving styles will be studied in depth. In addition, the vehicle trajectory planning in this study was limited to rectilinear road conditions. Obstacle avoidance trajectory planning for commercial vehicles under typical nonrectilinear road conditions will be an important research direction in the future.
Acknowledgments
The work was supported by the National Natural Science Foundation of China (51875302).
Compliance with ethics guidelines
Jian Wu, Yang Yan, Yulong Liu, and Yahui Liu declare that they have no conflict of interest or financial conflicts to disclose.
References
[1]
C.Gao, G.Wang, W.Shi, Z.Wang, Y.Chen. Autonomous driving security: state of the art and challenges. IEEE Internet Things J, 9 (10) (2022), pp. 7572-7579
[2]
A.Benloucif, A.T.Nguyen, C.Sentouh, J.C.Popieul. Cooperative trajectory planning for haptic shared control between driver and automation in highway driving. IEEE Trans Ind Electron, 66 (12) (2019), pp. 9846-9857
[3]
D.Dolgov, S.Thrun, M.Montemerlo, J.Diebel. Practical search techniques in path planning for autonomous driving. Ann Arbor, 1001 (48105) (2009), pp. 18-80
[4]
IslamF, NarayananV, LikhachevM. Dynamic multi-heuristic A*. In:Proceedings of IEEE International Conference on Robotics and Automation (ICRA); 2015 May 26-30; Seattle, WA, USA. IEEE; 2015. p. 2376-82.
[5]
KushleyevA, LikhachevM. Time-bounded lattice for efficient planning in dynamic environments. In:Proceedings of IEEE International Conference on Robotics and Automation (ICRA); 2009 May 12-17; Kobe, Japan. IEEE; 2009. p. 1662-8.
[6]
ArslanO, BerntorpK, TsiotrasP. Sampling-based algorithms for optimal motion planning using closed-loop prediction. In:Proceedings of IEEE International Conference on Robotics and Automation (ICRA); 2017 May 29-Jun 3; Singapore. IEEE; 2017. p. 4991-6.
[7]
LaValleSM, KuffnerJJ. Randomized kinodynamic planning. In:Proceedings of IEEE International Conference on Robotics and Automation (ICRA); 1999 May 10-15; Detroit, MI, USA. IEEE; 1999. p. 473-9.
[8]
ZuckerM, KuffnerJ, BranickyM. Multipartite RRTs for rapid replanning in dynamic environments. In:Proceedings of IEEE International Conference on Robotics and Automation (ICRA); 2007 Apr 10-14; Rome, Italy. IEEE; 2007. p. 1603-9.
[9]
T.Berglund, A.Brodnik, H.Jonsson, M.Staffanson, I.Soderkvist. Planning smooth and obstacle-avoiding B-spline paths for autonomous mining vehicles. IEEE Trans Autom Sci Eng, 7 (1) (2009), pp. 167-172
[10]
J.Wu, J.Zhang, B.Nie, Y.Liu, X.He. Adaptive control of PMSM servo system for steering-by-wire system with disturbances observation. IEEE Trans Transp Electrification, 8 (2) (2022), pp. 2015-2028
[11]
RastelliJP, LattaruloR, NashashibiF. Dynamic trajectory generation using continuous-curvature algorithms for door to door assistance vehicles. In:Proceedings of IEEE Intelligent Vehicles Symposium Proceedings (IV); 2014 Jun 8-11; Dearborn, MI, USA. IEEE; 2014. p. 510-5.
[12]
GuT, DolanJM. On-road motion planning for autonomous vehicles. In:Proceedings of International Conference on Intelligent Robotics and Applications (ICIRA); 2012 Oct 3-5; Montreal, QC, Canada; 2012. p. 588-97.
[13]
LattaruloR, GonzálezL, PerezJ. Real-time trajectory planning method based on n-order curve optimization. In:Proceedings of International Conference on System Theory, Control and Computing (ICSTCC); 2020 Oct 8-10; Sinaia, Romania. IEEE; 2020. p. 751-6.
[14]
W.Lim, S.Lee, M.Sunwoo, K.Jo. Hybrid trajectory planning for autonomous driving in on-road dynamic scenarios. IEEE Trans Intell Transp Syst, 22 (1) (2019), pp. 341-355
[15]
B.Gutjahr, L.Gröll, M.Werling. Lateral vehicle trajectory optimization using constrained linear time-varying MPC. IEEE Trans Intell Transp Syst, 18 (6) (2016), pp. 1586-1595
[16]
McNaughtonM, UrmsonC, DolanJM, LeeJW. Motion planning for autonomous driving with a conformal spatiotemporal lattice. In:Proceedings of IEEE International Conference on Robotics and Automation (ICRA); 2011 May 9-13; Shanghai, China. IEEE; 2011. p. 4889-95.
[17]
S.Dixit, U.Montanaro, M.Dianati, D.Oxtoby, T.Mizutani, A.Mouzakitis, et al.. Trajectory planning for autonomous high-speed overtaking in structured environments using robust MPC. IEEE Trans Intell Transp Syst, 21 (6) (2019), pp. 2310-2323
[18]
Y.Luo, Y.Xiang, K.Cao, K.Li. A dynamic automated lane change maneuver based on vehicle-to-vehicle communication. Transport Res C Emer, 62 (2016), pp. 87-102
[19]
D.Yang, S.Zheng, C.Wen, P.J.Jin, B.Ran. A dynamic lane-changing trajectory planning model for automated vehicles. Transport Res C Emer, 95 (2018), pp. 228-247
[20]
J.Chen, W.Zhan, M.Tomizuka. Autonomous driving motion planning with constrained iterative LQR. IEEE Trans Intell Vehicles, 4 (2) (2019), pp. 244-254
[21]
LiuY, LiuY, JiX, SunL, TomizukaM, HeX. Learning from demonstration: situation-adaptive lane change trajectory planning for automated highway driving. In:Proceedings of IEEE International Conference on Mechatronics and Automation (ICMA); 2020 Oct 13-16; Beijing, China. IEEE; 2020. p. 376-382.
[22]
ZieglerJ, BenderP, DangT, StillerC. Trajectory planning for Bertha—a local, continuous method. In:Proceedings of IEEE International Vehicles Symposium Proceedings (IVSP); 2014 Jun 8-11; Dearborn, MI, USA. IEEE; 2014. p. 450-7.
[23]
SunL, PengC, ZhanW, TomizukaM. A fast integrated planning and control framework for autonomous driving via imitation learning. In: Dynamic Systems and Control Conference (DSCC); 2018 Sep 30-Oct 3; Atlanta, GA, USA; 2018.
[24]
WangY, PanD, LiuZ, FengR. Study on lane change trajectory planning considering of driver characteristics. SAE Technical Paper 2018;2018-01-1627.
[25]
B.Zhou, Y.Wang, G.Yu, X.Wu. A lane-change trajectory model from drivers’ vision view. Transport Res C, 85 (2017), pp. 609-627
[26]
HeX, XuD, ZhaoH, MozeM, AiounF, GuillemardF. A human-like trajectory planning method by learning from naturalistic driving data. In:Proceedings of IEEE Intelligent Vehicles Symposium (IV); 2018 Jun 26-30; Changshu, China. IEEE; 2018. p. 339-46.
[27]
J.Wu, Q.Kong, K.Yang, Y.Liu, D.Cao, Z.Li. Research on the steering torque control for intelligent vehicles co-driving with the penalty factor of human-machine intervention. IEEE Trans Syst Man Cybern, 53 (1) (2023), pp. 59-70
[28]
A.T.Nguyen, J.Rath, T.M.Guerra, R.Palhares, H.Zhang. Robust set-invariance based fuzzy output tracking control for vehicle autonomous driving under uncertain lateral forces and steering constraints. IEEE Trans Intell Transp Syst, 22 (9) (2020), pp. 5849-5860
[29]
C.Miyajima, Y.Nishiwaki, K.Ozawa, T.Wakita, K.Itou, K.Takeda, et al.. Driver modeling based on driving behavior and its evaluation in driver identification. Proc IEEE, 95 (2) (2007), pp. 427-437
[30]
L.Xu, J.Hu, H.Jiang, W.Meng. Establishing style-oriented driver models by imitating human driving behaviors. IEEE Trans Intell Transp Syst, 16 (5) (2015), pp. 2522-2530
[31]
CaiP, SunY, ChenY, LiuM. Vision-based trajectory planning via imitation learning for autonomous vehicles. In:Proceedings of IEEE Intelligent Transportation Systems Conference (ITSC); 2019 Oct 27-30; Auckland, New Zealand. IEEE; 2019. p. 2736-42.
[32]
H.Li, C.Wu, D.Chu, L.Lu, K.Cheng. Combined trajectory planning and tracking for autonomous vehicle considering driving styles. IEEE Access, 9 (2021), pp. 9453-9463
[33]
ZhangC, ChuD, LyuN, WuC. Trajectory planning and tracking for autonomous vehicle considering human driver personality. In:Proceedings of Conference on Vehicle Control and Intelligence (CVCI); 2019 Sep 21-22; Hefei, China; 2019. p. 1-6.
[34]
WuP, CaoY, HeY, LiD. Vision-based robot path planning with deep learning. In:Proceedings of International Conference on Computer Vision Systems (ICVS); 2017 Jul 10-13; Shenzhen, China; 2017.
[35]
LenzD, DiehlF, LeM, KnollA. Deep neural networks for Markovian interactive scene prediction in highway scenarios. In:Proceedings of IEEE Intelligent Vehicles Symposium (IV); 2017 Jun 11-14; Los Angeles, CA, USA. IEEE; 2017. p. 685-92.
[36]
VallonC, ErcanZ, CarvalhoA, BorrelliF. A machine learning approach for personalized autonomous lane change initiation and control. In:Proceedings of IEEE Intelligent Vehicles Symposium (IV); 2017 Jun 11-14; Los Angeles, CA, USA. IEEE; 2017. p. 1590-5.
[37]
KrajewskiR, BockJ, KloekerL, EcksteinL. The HighD dataset: a drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems. In:Proceedings of International Conference on Intelligent Transportation Systems (ITSC); 2018 Nov 4-7; Maui, HI, USA. IEEE; 2018. p. 2118-25.
[38]
KudererM, GulatiS, BurgardW. Learning driving styles for autonomous vehicles from demonstration. In:Proceedings of IEEE International Conference on Robotics and Automation (ICRA); 2015 May 26-30; Seattle, WA, USA. IEEE; 2015. p. 2641-6.