Safe and Human‐Like Trajectory Planning of Self‐Driving Cars: A Constraint Imitative Method

Safe and human-like trajectory planning is crucial for self-driving cars. While model-based planning has demonstrated reliability, it is beneficial to incorporate human demonstrations and align the planning results with human behaviors. This work aims at bridging the gap between model-based planning and driver imitation by proposing a constraint imitative trajectory planning (CITP) method. CITP integrates the artificial potential field and dynamic movement primitives, achieving both the ability to imitate human demonstrations and the enforcement of safety constraints. During the planning process, CITP first encodes human demonstrations, the local driving target, and traffic obstacles as attractive or repulsive effects, and then solves the trajectory planning problem through model predictive optimization. To address the dynamics of traffic scenarios, a hierarchical planning strategy is proposed based on a division of the planning process. CITP is designed with five modules: LSTM-based target generation, normalization and selection of demonstrations, encoding of demonstrations as attractive effects, encoding of obstacles as repulsive effects, and trajectory planning with model predictive optimization. Data collection and experiments are carried out with the CARLA driving simulator, and the effectiveness in terms of both safety and consistency with human behavior is reported.


Introduction
Ensuring safety and human-like driving is crucial for developing self-driving cars. [1] Through modeling of safety constraints, model-based planning methods, such as the artificial potential field (APF), have shown effectiveness in collision avoidance and reaching local targets. Nonetheless, to align the planning results with human behaviors, a complicated design and tuning process is necessary for diverse scenarios. [2] On the other hand, recent advances in intelligent transportation have facilitated better access to human driving data, [3] and planning by imitating human experts holds good potential for human-like planning. [4,5] Therefore, this research aims to bridge the gap between the APF model and imitative planning, and develops a constraint imitative trajectory planning (CITP) method for dynamic traffic environments.
Model-based methods have demonstrated notable efficacy in trajectory planning. A trajectory is a two-part entity comprising a path and a velocity profile. The path can be designed using various approaches such as splines, [6] rapidly exploring random trees (RRT), [7] or optimal control strategies. [8] The velocity profile can be crafted by applying speed curves [9] or optimal control with a time-distance graph. [8] Among the planning methods, APF holds the advantage of synchronously planning both path and velocity. [12] Through the design of potential functions (PFs), obstacles and other risk factors are encoded as repulsive effects, while the target or a prior reference is encoded as an attraction. Based on APF, NVIDIA [13] introduced a safety force field (SFF) model to analyze the impacts of other vehicles. Zheng [14] proposed a driving safety field (DSF) to model risk factors according to driver-vehicle-road relationships. Tang [15] applied APF to model the uncertainty of collision risk. To address moving obstacles, researchers [15,16] have integrated APF with model predictive control (MPC) to generate predictive planning results. Figure 1 illustrates an APF in which the obstacle vehicles and lane boundaries generate repulsive effects.
Despite their effectiveness, APF and other model-based methods come with inherent drawbacks. Notably, achieving human-like planning requires complex model design and meticulous parameter tuning. For APF in particular, aligning the planning results with human behavior requires a thoughtful design of the attractive effects or prior reference trajectories. [11,15] A human driver may adjust their path or speed for various reasons and situations, which complicates the development of a comprehensive model. [14] On the other hand, imitative planning holds an advantage in achieving human-like planning. Within the field of robotics, dynamic movement primitives (DMP) is a commonly applied imitative motion planning method. [17,18] Similar to APF, DMP encodes obstacles as PFs to generate repulsive effects, while it differs from traditional APF in that the attractive effect is encoded using local targets and human demonstrations. DMP encodes the target and demonstration through a successive or concurrent execution of primitive movements. Among the various types of DMP, discrete DMP is utilized for point-to-point trajectory planning. In obstacle-free scenarios, it has been mathematically proven that the trajectory converges to the target position at the target time. [19] DMP has been applied to mobile robots [20] and surgical robots. [21] Recent research integrates DMP and MPC, improving two aspects: learning from multiple demonstrations and avoiding dynamic obstacles. [22,23] Figure 2 demonstrates the effect of DMP-based trajectory planning. With four input demonstrations, the robot can reach the target from different initial states while avoiding collisions. Additionally, access to multiple demonstrations enables the robot to adjust its planning results by learning more from the demonstration that most closely resembles its current state.
Applying point-to-point DMP to imitate driver behavior, however, presents its own set of challenges due to the dynamic nature of traffic scenarios. During the driving process, a driver may have varied goals or intentions at different periods of time, and their trajectories are often affected by uncertainty factors. Additionally, traffic scenarios incorporate moving obstacles, traffic rules, and various types of targets. Consequently, it becomes necessary to understand the driver's intention and bridge the gap between driving behavior and the dynamic traffic environment, requiring the planner to tackle these complexities effectively.
To address the dynamics of traffic scenarios, other methods have been studied to imitate driving behaviors. In the field of intelligent driving, behavior cloning (BC) has been studied for driver imitation without a prior reference trajectory. [24,25] BC dates back to the ALVINN system, [26] which aims to match the learned policy to human demonstrations. With the development of deep neural networks (DNN), DNN-based planning methods have been proposed. [24,27,28] However, safety concerns still hinder the application of related methods. [29,30] Hard constraints on the target, obstacles, and traffic rules are difficult to integrate and guarantee. Besides, unfamiliar situations are also difficult to handle.
Hierarchical planning offers a practical solution to combine the strengths of various planning methods. This process divides planning into upper and lower layers. The upper layer is responsible for generating semantic behaviors such as turning, merging, and stopping, while the lower layer produces the corresponding trajectory. In this manner, it ensures a comprehensive and efficient approach to handling the complexity of driving scenarios. For the upper layer, various methods have been proposed for BC, including HMM, [31] SVM, [32] and DNN. [33] The lower layer generates trajectories with BC or model-based methods. Hierarchical planning simplifies trajectory planning by dividing the planning problem into two subproblems. As reported in ref. [33], the success rate of the DNN-based BC method was significantly improved by adding behavioral instructions. However, it is important to note that the hierarchical planning approach is flexible, and its design should be adapted to the specifics of the traffic scenarios and the planning methods in each layer.
This paper introduces a novel CITP method, which combines APF and DMP to formulate a planning problem integrating human demonstrations, local targets, and obstacles. We propose a hierarchical planning strategy to address the challenges posed by dynamic traffic environments. The framework, illustrated in Figure 3, consists of an upper-layer behavioral planner for target generation and a lower-layer trajectory planner for target reaching. The upper layer employs a long short-term memory (LSTM) network to generate static or dynamic targets, while the lower layer solves the trajectory planning problem through model predictive optimization. The method's efficacy, regarding safety and alignment with human behavior, is validated through experiments and comparative tests in a virtual driving environment built on the CARLA simulator, covering two traffic scenarios: intersection and lane change.
The contributions of this work are twofold: 1) A constraint imitative trajectory planning (CITP) method is developed by incorporating DMP into the APF algorithm, which achieves the ability to imitate human behaviors while guaranteeing safety constraints. Different from classical APF, we consider human demonstrations by encoding them as attractive effects. CITP also extends the application of DMP to planning problems in dynamic traffic environments. 2) To implement the above idea and address the dynamics of traffic, a hierarchical planning algorithm is designed, which consists of five modules: LSTM-based target generation, normalization and selection of demonstrations, encoding of demonstrations as attractive effects, encoding of obstacles as repulsive effects, and model predictive optimization. The experiments and comparative results indicate that CITP achieves the expected trajectory planning performance in dynamic traffic environments.
The rest of this paper is structured as follows: Section 2 lays out the formulation of the imitative planning problem. Section 3 elaborates on the details of CITP. To substantiate the effectiveness of our approach, experiments and results using the CARLA simulator are presented in Section 4. The paper concludes with a summary in Section 5.

Problem Statement
Trajectory planning aims to reach the local target without collision. Nonetheless, the challenges of the dynamic traffic environment must be addressed to achieve imitative planning. Due to the existence of moving obstacles, the states of local targets are influenced by the traffic, while the driver's behavior is subject to different targets at different periods of time. Besides, drivers' behaviors are often fraught with uncertainty.
To formulate the trajectory planning problem, our method classifies the local target into two categories, static and dynamic, and divides the planning process into upper and lower layers. Furthermore, the trajectory planning problem is solved with prediction, and the human demonstrations are normalized before imitation.

Classification of Target State
The states of drivers' local targets are influenced by the existence of obstacle vehicles. This work classifies the target states as static or dynamic: 1) Static target: the target state is not related to obstacles and is located at a fixed position, e.g., the stop line and the lane entrance at the intersection. 2) Dynamic target: the target is located at a relative position based on a moving obstacle, e.g., following the front vehicle at a distance of 10 m.
To formulate the planning problem for these target states, an intersection scenario and a lane-change scenario are modeled and illustrated in Figure 4.
The intersection scenario includes five targets: the waiting-lane targets T_1 and T_2 and the lane-entrance targets T_3, T_4, and T_5. All of them are static with fixed locations. The lane-change scenario includes two targets: the terminal of the current lane T_a and the car-following position after merging into the target lane T_b. The former target is static, and the latter is dynamic. To maintain a concise narrative, this study primarily considers vehicles as obstacles, while disregarding the behaviors after merging or passing through the intersection.

Formulation of Hierarchical Planner
During the driving process, the driver may have different targets at different periods of time. For instance, a driver should wait for a feasible chance before merging. Consequently, a hierarchical planner is devised, consisting of two layers: the first determines the target, and the second plans how to reach that target.

Upper-Layer Planner
The upper layer of CITP generates the target states, which includes the selection of the target (Equation (1) for the intersection, and Equation (2) for the lane change), as well as the position, velocity, and time at the target: T ⇐ {S, Des}; T ∈ {T_a, T_b}, where Des refers to the long-term driving destination, and S is the safety status estimated from the traffic environment, which will be introduced in Section 3. In both scenarios, the driving process involves two stages. In the intersection scenario, the ego vehicle first heads to the target T_1 or T_2 at the stop line. Once it is estimated safe to enter the intersection, the target is switched to T_3, T_4, or T_5 at the lane entrance. In the lane-change scenario, the ego vehicle first drives toward the current lane's terminal T_a. Once the opportunity for merging occurs, the target is switched to the car-following position T_b. At each target, the ego vehicle's motion state is represented as M_T = [p_T, v_T, t_T], denoting the position, velocity, and time, respectively.
It should be noted that target generation is performed throughout the driving process, allowing targets to be switched without necessarily reaching the previous target. For instance, in the intersection scenario, provided the conditions are safe, the ego vehicle can enter the intersection without stopping at the stop line.
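The switching behavior described above can be sketched as a small selection rule. This is an illustrative sketch, not the paper's implementation; the `Target` container and function names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Target:
    position: tuple   # (x, y) in meters
    velocity: float   # target speed [m/s]
    time: float       # target time [s]

def select_target(stage_one: Target, stage_two: Target, is_safe: bool) -> Target:
    """Switch to the second-stage target as soon as safety is confirmed,
    without requiring the first-stage target to be reached."""
    return stage_two if is_safe else stage_one

# Example: intersection scenario with hypothetical coordinates.
stop_line = Target((0.0, 50.0), 0.0, 8.0)
lane_entrance = Target((10.0, 70.0), 0.0, 8.0)
assert select_target(stop_line, lane_entrance, False) is stop_line
assert select_target(stop_line, lane_entrance, True) is lane_entrance
```

The key property is that the switch depends only on the estimated safety status, so the vehicle may bypass the stop-line target entirely when conditions are safe.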

Lower-Layer Planner
The lower layer of CITP generates a collision-free trajectory toward the target by imitating human demonstrations, where [p_T, v_T, t_T] refers to the ego vehicle's target position, velocity, and time, M_ego and M_o are the initial motion states of the ego vehicle and obstacle vehicles at t_0, B denotes the position of the lane boundary, and Demo refers to the selected human demonstrations for imitative planning. Considering the uncertainty of human behavior (e.g., different car-following distances), we unify the target states for safety and consistency: for static targets, the ego vehicle comes to a stop at the target position, and for dynamic targets, the ego vehicle maintains a fixed car-following distance from the front vehicle. The target time t_T is calculated based on the selected demonstrations and will be further introduced in Section 3.1.
The trajectory planning problem is subject to two types of constraints: reaching the target and avoiding collision, where the constraints in Equation (4) denote the initial and terminal states of the planning problem. The constraints in Equation (5) specify the minimum distance required between the ego vehicle and obstacle vehicles or the road boundary. The ego vehicle (width W_e) has potential conflicts with N_o obstacle vehicles (i ∈ [1, 2, …, N_o]), and R_i is calculated based on collision risk. The states of obstacles are predicted within the planning horizon, and the planning problem is solved with model predictive optimization. The collision risk is influenced by the relative position and velocity of the ego vehicle and obstacles. In this study, an APF is utilized to represent the collision risk, with a greater repulsive effect generated for higher collision risks.
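The minimum-distance constraint of Equation (5) can be illustrated with a simple feasibility check. The function below is a hypothetical sketch (the paper gives no code), treating each obstacle as a point with a risk-dependent margin R_i:

```python
import math

def satisfies_clearance(ego_xy, obstacles, margins, w_e=1.8):
    """Check the minimum-distance constraint: the gap between the ego
    vehicle and each obstacle must exceed the risk-dependent margin R_i
    plus half the ego width (a simplifying assumption)."""
    for (ox, oy), r_i in zip(obstacles, margins):
        if math.hypot(ego_xy[0] - ox, ego_xy[1] - oy) < r_i + w_e / 2.0:
            return False
    return True

assert satisfies_clearance((0, 0), [(10, 0)], [3.0])        # far obstacle: feasible
assert not satisfies_clearance((0, 0), [(3, 0)], [3.0])     # too close: infeasible
```

In the actual planner, this check is applied at every predicted step within the horizon rather than at a single instant.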

Methodologies
This section introduces the design of CITP, which is composed of five modules. The first module represents the upper-layer planner, while the remaining four modules detail the design of the lower-layer planner. 1) The target is generated by the upper-layer planner. 2) Human demonstrations are normalized and selected. 3) Human demonstration trajectories are encoded with DMP as attractive effects. 4) Obstacles (vehicles and road boundaries) are encoded with the potential field as repulsive effects. 5) Imitative trajectory planning is conducted through model predictive optimization.
In addition, safety is the priority of trajectory planning. Section 3.6 introduces the adaptation of CITP to situations where sufficient demonstrations are lacking.

Target Generation with the Upper-Layer Planner
Based on the planning problem, the upper layer generates target states as behavioral instructions. The planning process includes target switching and target generation.
In both scenarios, the planning process is two-staged. For the intersection scenario, the initial target is located at the stop line, and the ego vehicle should confirm safety before entering the intersection (switching the target from T_1 or T_2 to T_3–T_5). For the lane-change scenario, the initial target is located within the current lane, and merging can be performed after confirming safety (switching the target from T_a to T_b). We apply an LSTM to estimate the safety status S, as illustrated in Figure 5. In the intersection scenario, the target positions p_T are predetermined based on the road structure, the target velocity v_T is set to 0, and the target time t_T refers to the maximum time duration of the selected demonstrations. In the lane-change scenario, the dynamic target is located at a particular position behind the front vehicle for car following. The car-following distance is D_f, and the target time t_T is the maximum time duration of the selected human demonstrations. The location of target T_b in the Cartesian coordinate system, denoted as p_Tb, is generated based on the constant-velocity (CV) assumption of the front vehicle. Given the widespread application of LSTM-based classification methods, this paper mainly focuses on introducing the input-output architecture of the LSTM. To estimate the safety status, the first step is to identify the potential conflicts and the involved obstacle vehicles. As shown in Figure 6, the typical traffic conflicts comprise crossing and merging conflicts. In each conflict, a collision may occur at the conflict point with the front vehicle or the rear vehicle. Moreover, in the case of car following, the conflict can also be modeled as a merging conflict, neglecting the rear vehicle.
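The dynamic car-following target under the CV assumption can be sketched in one line of longitudinal arithmetic. This is an illustrative reading of the text (1D longitudinal coordinates; the function name is an assumption):

```python
def dynamic_target_position(front_pos, front_vel, d_f, t_t):
    """Longitudinal position of the dynamic target T_b: under the
    constant-velocity (CV) assumption, the front vehicle advances
    front_vel * t_t by the target time t_t, and the target sits the
    car-following distance d_f behind it."""
    return front_pos + front_vel * t_t - d_f

# Front vehicle at 50 m moving at 10 m/s; D_f = 15 m, t_T = 4 s.
assert dynamic_target_position(50.0, 10.0, 15.0, 4.0) == 75.0
```

In the full planner the target position would be expressed in the Cartesian frame of the target lane; the 1D form above only conveys the CV extrapolation.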
As shown in Figure 4, if the ego vehicle turns left at the intersection, it encounters two crossing conflicts, whereas if it goes straight, one merging conflict may occur. In the lane-change scenario, the ego vehicle encounters one merging conflict. To estimate the safety status, the inputs of the LSTM are the motion states of the ego vehicle M_ego and the motion states of the front and rear obstacle vehicles M_oi for each related conflict. While collecting human demonstration data, the ground truth of the safety status {True, False} is given by a button signal: once the test driver feels safe to enter the intersection (or to merge), a button on the steering wheel is pressed.
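The LSTM input described above can be sketched as a per-step feature vector concatenating the ego state with the front/rear vehicle states of each related conflict. The exact feature layout is an assumption (the paper only names the inputs):

```python
import numpy as np

def build_lstm_input(ego_states, conflict_states):
    """ego_states: (T, 4) array of assumed ego features [x, y, vx, vy].
    conflict_states: list of N_c arrays, each (T, 8), holding the front
    and rear vehicle states ([x, y, vx, vy] each) for one conflict.
    Returns a (T, 4 + 8*N_c) sequence to feed the LSTM classifier,
    whose output is the binary safety status S."""
    return np.concatenate([ego_states] + list(conflict_states), axis=1)

# Left turn at the intersection: two crossing conflicts, as in the text.
T, n_conflicts = 20, 2
ego = np.zeros((T, 4))
conflicts = [np.zeros((T, 8)) for _ in range(n_conflicts)]
seq = build_lstm_input(ego, conflicts)
assert seq.shape == (T, 4 + 8 * n_conflicts)
```

The LSTM layers and the binary classification head on top of this sequence are standard and omitted here.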

Normalization and Selection of Human Demonstration
The DMP-based trajectory planner produces a collision-free trajectory to reach the target.
During human driving, the target can shift over different periods of time.The normalization process splits the demonstration trajectory according to the temporal driving target.Besides, each human trajectory is connected to the target position for the consistency and safety of the demonstration.
1) Splitting the demonstration trajectory: The temporal target is indicated by the driver's long-term destination and the button signal. According to the two-staged driving process, the target is located at the stop line (intersection) or the current lane's terminal (lane change) before the button is pressed. Once the button is pressed, the target is switched to the lane entrance (intersection) or the car-following position behind the front vehicle (lane change).
2) Connecting to the target: As shown in Figure 7, the demonstration trajectories are connected to the target through a rule-based method. 1) Static target: after entering the transition area, the ego vehicle decelerates with constant acceleration and stops at the target position. 2) Dynamic target: in the lane-change scenario, once the lateral speed drops within the transition region, the trajectory is connected to the target in two steps: the ego vehicle decelerates to a standstill and then reaches the target with a symmetrical acceleration-deceleration process.
Note that in the lane-change scenario, the origin of the coordinate system is placed at the target state of car following.
The constant-acceleration (CA) model plans the vehicle trajectory as follows (a denotes the initial acceleration). 3) Selection of demonstrations: To select proper demonstrations for imitative trajectory planning, the demonstrations are first labeled in three aspects: the initial and terminal motion states of the ego vehicle, M_es and M_et; the time duration t_d of the demonstration; and the status of the potential conflicts M_obs. For each related conflict, the motion states of the front and rear vehicles are recorded. To perform imitative trajectory planning from the current state M_ec, demonstrations are selected based on the position and velocity similarity to M_es. Furthermore, a demonstration is discarded in the following situations: its time duration t_d or terminal state M_et differs significantly from the other demonstrations, or its collision risk is significantly lower than that of the current state. In this research, up to four demonstrations could be selected for imitative trajectory planning.
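The CA connection for a static target can be sketched as follows: given the entry speed and the remaining distance to the target, the uniform deceleration a = -v0²/(2d) brings the vehicle to a stop exactly at the target. This is a minimal sketch of the rule described above, with an assumed sampling interval:

```python
def ca_stop_profile(v0, d, dt=0.1):
    """Samples (t, x, v) of a constant-acceleration stop: initial speed v0,
    distance d to the static target, deceleration a = -v0**2 / (2*d)."""
    a = -v0 * v0 / (2.0 * d)
    t_end = -v0 / a  # time to standstill = 2*d / v0
    samples, t = [], 0.0
    while t <= t_end + 1e-9:
        samples.append((t, v0 * t + 0.5 * a * t * t, v0 + a * t))
        t += dt
    return samples

profile = ca_stop_profile(v0=10.0, d=25.0)   # a = -2 m/s^2, stop after 5 s
t_f, x_f, v_f = profile[-1]
assert abs(x_f - 25.0) < 0.5 and abs(v_f) < 0.5   # stops at the target
```

The symmetrical acceleration-deceleration connection for dynamic targets follows the same kinematics applied twice with opposite signs.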

Encoding Human Demonstrations as Attractive Effects
Demonstrations are encoded as attractive effects in DMP by generating a stable dynamic system modulated with nonlinear components. A commonly applied and simplified dynamic system for DMP is the damped spring model, where m, ṁ, and m̈ refer to the position, velocity, and acceleration of the vehicle, respectively. g denotes the target state; based on the assumptions introduced in Section 2.2, the target is set as m = g, ṁ = 0, and g is the origin of coordinates. τ is a temporal scaling constant, and k_p and k_v are constants of the spring-damper model. u denotes the external term encoded from the demonstration, which contributes part of the attractive effect, and G is the repulsive effect generated by the obstacles. To promote application, the attractive and repulsive effects are encoded independently.
A dynamical system (DS) is formulated as an ordinary differential equation (ODE) to encode the demonstrations. This ODE represents the DS with state x(t), attractive effect u(t), phase variable s(t), and parameters p:

ẋ(t) = Ax(t) + Bu(t) = Φ(x(t), s(t); p)  (10)

Derived from Equation (9), the ODE is written in this form. The application of the phase variable s avoids time-variant problems through temporal scaling. s ranges from 0 to 1, decreasing from 1 to 0 as time goes from the start t = 0 to the end t = t_end = τ. We define the canonical system such that parameter τ denotes the duration of the demonstration, and parameter α_s is a constant of the canonical system that affects the convergence speed. This research sets the parameters as τ = 1 and α_s = 0.1.
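A common DMP canonical system, consistent with the description above, is the first-order decay ds/dt = -α_s·s/τ with s(0) = 1. The paper's exact equation is not reproduced here, so this closed-form sketch is an assumption:

```python
import math

def phase(t, tau=1.0, alpha_s=0.1):
    """Phase variable of the assumed canonical system ds/dt = -alpha_s*s/tau,
    i.e., s(t) = exp(-alpha_s * t / tau). s(0) = 1 and s decays
    monotonically toward 0 (reaching 0 asymptotically)."""
    return math.exp(-alpha_s * t / tau)

assert phase(0.0) == 1.0
assert phase(1.0) < phase(0.5) < phase(0.0)  # monotonically decreasing
```

Replacing time with the phase variable makes the forcing term u(s) time-scale invariant: stretching τ stretches the whole trajectory without re-learning the weights.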
To generate the target-attractor dynamics for the term u(s), DMP designs u(s) as a normalized linear combination of basis functions. In this research, Gaussian basis functions (GBFs) are applied to formulate the basis functions Ψ_i(s), where the constants c_n and σ_n are the centers and widths of the GBFs. These parameters are coupled as the vector ϑ = [c, σ], where s_t denotes the phase variable at time step t (T is the length of the demonstration); x_t = (m_t, ṁ_t) is the vehicle state at time t; m̈_t^d refers to the acceleration of the demonstration trajectory; and Φ(x_t, s_t; ϑ) is the DS's output acceleration. Note that, considering the lack of information before the first time frame of the demonstration, DMP commonly samples a short supplementary trajectory segment at the front to improve accuracy. Without being blocked by obstacles, the planning result based on the attractive effects converges to the target (in both state and time), as proved in ref. [19].
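The normalized GBF combination can be sketched numerically. The form below (weighted GBFs, normalized by their sum, and gated by s so the forcing term vanishes at the target) is a standard DMP construction; the weight-fitting step from demonstration accelerations is omitted:

```python
import numpy as np

def forcing_term(s, centers, widths, weights):
    """u(s) as a normalized linear combination of Gaussian basis functions.
    centers/widths are c_n and sigma_n; weights are the learned parameters."""
    psi = np.exp(-((s - centers) ** 2) / (2.0 * widths ** 2))   # GBF activations
    return s * np.dot(psi, weights) / (np.sum(psi) + 1e-12)     # normalized, gated by s

centers = np.linspace(0.0, 1.0, 10)
widths = np.full(10, 0.1)
weights = np.ones(10)
u = forcing_term(0.5, centers, widths, weights)
assert np.isclose(u, 0.5)  # equal weights: the normalized sum is 1, so u = s
```

Because u is multiplied by s, the nonlinear term fades out as s → 0, leaving the spring-damper dynamics to guarantee convergence to the target.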

Encoding Obstacles as Potential Field
The obstacles in traffic scenarios are obstacle vehicles and lane boundaries. The collision risk is modeled as a repulsive effect G(m, ṁ, t) based on the potential field, where a greater collision risk generates a greater repulsive effect for collision avoidance. Basic models of potential fields have been studied in related research. [34,35] In addition, obstacle vehicles move according to their own driving targets, so the collision risk should be estimated with motion prediction. To model the repulsive effect of an obstacle vehicle, we employ an ellipse-shaped model. The potential field and the corresponding repulsive effect are depicted in Figure 9. It is assumed that once the ego vehicle enters the elliptic area, a collision takes place. (l_x, l_y) denotes the lengths of the horizontal and vertical axes: l_x = L_e + L_o, and l_y = w_e + w_o. The nearest point refers to the point on the ellipse that is closest to the ego vehicle, and R_i denotes the distance between the position of the i-th obstacle vehicle and the corresponding nearest point. In this research, the repulsive effect is considered only when the obstacle vehicle approaches closely; vehicles driving in the opposite direction while separated from the ego vehicle by a fence or solid lane line are excluded. Considering the relative motion states and traffic rules, the likelihood of collision with such obstacles is low, and the influence of irrelevant obstacle vehicles should therefore be avoided.
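The ellipse-shaped collision region can be checked with the standard ellipse inequality. Interpreting (l_x, l_y) as full axis lengths (so the semi-axes are l_x/2 and l_y/2, an assumption consistent with l_x = L_e + L_o), a point inside the ellipse counts as a collision:

```python
def inside_ellipse(px, py, ox, oy, l_x, l_y):
    """True if point (px, py) lies inside the collision ellipse centered
    at the obstacle (ox, oy), with full axis lengths l_x (longitudinal)
    and l_y (lateral), i.e., semi-axes l_x/2 and l_y/2."""
    return (2.0 * (px - ox) / l_x) ** 2 + (2.0 * (py - oy) / l_y) ** 2 < 1.0

# Hypothetical sizes: l_x = L_e + L_o = 8 m, l_y = w_e + w_o = 4 m.
assert inside_ellipse(1.0, 0.0, 0.0, 0.0, 8.0, 4.0)       # collision
assert not inside_ellipse(5.0, 0.0, 0.0, 0.0, 8.0, 4.0)   # clear
```

The nearest point on the ellipse boundary (used to define R_i) has no closed form in general and is typically found numerically, e.g., by a 1D parameter search over the boundary.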
The potential field is built with two principles: 1) The obstacle's repulsive effect increases with the decrease of relative distance.
2) The repulsive effect increases with the increase of relative velocity toward the obstacle, which implies the repulsive effect is zero if the ego vehicle is leaving the obstacle vehicle.
According to the two principles, the potential field U(X, Y) is formulated with constant gains α, β, and a, where θ and ‖V‖ denote the direction and magnitude of the relative velocity toward the obstacle. θ is calculated as the angle between the direction of the relative velocity and the direction toward the nearest point on the ellipse. E(X, Y) is the potential function of the elliptic obstacle, where (m_x, m_y) is the longitudinal-lateral position of the ego vehicle at time t, and (m_x^o, m_y^o) is the position of the obstacle vehicle. The repulsive effect G(m, ṁ, t) from the obstacle is calculated as the negative gradient of the potential function at the ego vehicle's position: G(m, ṁ, t) = −∇_X(U(X, Y)), where the term ∇_X(cos θ) is calculated accordingly. The repulsive effect of the road boundary is designed with the following principles: 1) the direction of the repulsive effect is always normal to the boundary; 2) the repulsive effect increases with the relative velocity toward the road boundary and with decreasing distance to the boundary.
According to these two principles, the repulsive effect from the lane boundary is designed such that v_l ≥ 0 denotes the vehicle velocity toward the road boundary, D_l measures the distance to the boundary, and W_e is the width of the ego vehicle. The constant parameter b is tuned to scale the repulsive effect. Figure 10 illustrates the safety field generated by the road boundaries.
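The two boundary principles can be captured by a simple scalar sketch; the exact functional form used in the paper is not reproduced, so the ratio below is an assumption that merely satisfies both monotonicity requirements:

```python
def boundary_repulsion(v_l, d_l, w_e, b=1.0):
    """Magnitude of the lane-boundary repulsion (direction is normal to
    the boundary). v_l: velocity toward the boundary; d_l: distance to
    the boundary; w_e: ego vehicle width; b: tuning gain."""
    if v_l <= 0.0:
        return 0.0                          # moving away: no repulsion
    clearance = max(d_l - w_e / 2.0, 1e-3)  # remaining lateral margin
    return b * v_l / clearance

assert boundary_repulsion(0.0, 2.0, 1.8) == 0.0
# Grows as the clearance shrinks, and as the approach speed rises:
assert boundary_repulsion(2.0, 1.5, 1.8) > boundary_repulsion(2.0, 3.0, 1.8)
assert boundary_repulsion(3.0, 2.0, 1.8) > boundary_repulsion(1.0, 2.0, 1.8)
```

Any form with these monotonicity properties (zero when leaving, increasing with approach speed, diverging as the margin closes) satisfies the stated design principles.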
During the driving process, the ego vehicle is affected by multiple obstacles (including vehicles and road boundaries). The repulsive effect from each obstacle is assumed to be independent, and the total repulsive effect is the vector sum of all effects, as illustrated in Figure 1. [12]

Trajectory Planning with Model Predictive Optimization
After encoding the demonstrations and obstacles as attractive and repulsive effects, respectively, the trajectory planning problem is solved with model predictive optimization.
The demonstrations are first used to drive the vehicle to the target, which is formulated as a predictive optimal control problem. The loss of the optimal control problem is designed with two considerations: the first loss term measures the distance between the state of performance and the reference states, and the second loss term is designed to avoid redundancy of the solution.

The First Loss Term
To generate a human-like trajectory, the first loss term is the residual Δx[k], derived from the state of performance x_p[k] and the convex hull of the demonstration states at time step k, where N[k] = [x_1[k], …, x_D[k]] ∈ ℝ^(2×D) denotes the human demonstration states at time step k. The first loss term is illustrated in Figure 11.
To minimize the residual, the loss term is formulated as the norm of Δx[k]: ‖Δx‖_H² = Δxᵀ H Δx, where H is a positive semi-definite symmetric matrix.

The Second Loss Term
When Δx[k] = 0, λ[k] may have multiple equivalent solutions. To avoid this redundancy, the second loss term is introduced, where c_d measures the Euclidean distance between the state of performance and the reference state of the d-th demonstration. We adopt η ≥ 0 to scale the value of the second loss term. By adding the second loss term, a demonstration nearer to the state of performance gains a bigger weight, which means the vehicle learns more from nearer demonstrations.
Based on the definition of the first and second loss terms, the loss function of the predictive optimal control problem follows, where c[k] = (c_1[k], …, c_D[k])ᵀ and λ[k] = (λ_1[k], …, λ_D[k])ᵀ.
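A small numerical sketch illustrates the two loss terms together. Here H = I and η = 0.1 are assumed values, and the convex-hull combination is given directly rather than optimized:

```python
import numpy as np

def loss(x_p, N, lam, eta=0.1, H=None):
    """x_p: (2,) state of performance; N: (2, D) demonstration states at
    step k; lam: (D,) convex coefficients (lam >= 0, sum(lam) = 1).
    First term: hull residual ||x_p - N lam||_H^2. Second term: eta * c^T lam,
    with c_d the Euclidean distance to the d-th demonstration state."""
    H = np.eye(2) if H is None else H
    dx = x_p - N @ lam
    c = np.linalg.norm(N - x_p[:, None], axis=0)
    return dx @ H @ dx + eta * c @ lam

N = np.array([[0.0, 1.0, 2.0],      # three collinear demonstration states
              [0.0, 0.0, 0.0]])
x_p = np.array([1.0, 0.0])          # state of performance, inside the hull
lam_near = np.array([0.0, 1.0, 0.0])   # all weight on the nearest demo
lam_far = np.array([0.5, 0.0, 0.5])    # same hull point via the outer demos
assert np.isclose(loss(x_p, N, lam_near), 0.0)
assert loss(x_p, N, lam_far) > loss(x_p, N, lam_near)
```

Both coefficient vectors give zero residual (Δx = 0), so without the second term they would be equivalent; the distance penalty breaks the tie in favor of the nearer demonstration, exactly the weighting behavior described above.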
To generate a predictive trajectory planning result, the planning problem is then modeled and solved as an MPC problem.
First, the continuous dynamic system of DMP is reformulated as a discrete system, whose transition matrix is A_d = e^(AT), with a corresponding transition vector. Second, to generate predictive planning results P steps forward, the residuals for P steps are predicted. For a future time step k + p, p ∈ [1, P] from the current time step k, the residual is calculated accordingly. Then, to represent x_p[k + p] in terms of x_p[k], an iterative operation is designed, and the residuals for the P steps are reformulated as matrix equations, where the matrix B_d ∈ ℝ^(2(P+1)×(D+1)(P+1)) is assembled from the blocks −N[k]. The loss function for the MPC is derived from Equation (23), and at each time step k the resulting optimization problem is solved. During trajectory planning, obstacles also generate repulsive effects for collision avoidance. Besides, the motion of obstacles can be addressed through the receding-horizon control scheme. In this research, we assume that the motion predictions of obstacle vehicles are predetermined. Note that the trajectory prediction of obstacles is an open problem in intelligent driving, and various methods have been proposed. For simplification, we assume that each obstacle moves along a predetermined path at constant velocity (CV).
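The discretization step A_d = e^(AT) can be sketched with a truncated matrix-exponential series (in practice `scipy.linalg.expm` would be the usual choice; the series is shown here only to stay self-contained):

```python
import numpy as np

def expm_series(A, T, terms=20):
    """Zero-order-hold transition matrix A_d = e^(A*T) via a truncated
    power series: sum_n (A*T)^n / n!."""
    Ad = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ (A * T) / n
        Ad = Ad + term
    return Ad

# Double-integrator example (position/velocity state, as in the DMP state x):
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
Ad = expm_series(A, T=0.1)
assert np.allclose(Ad, [[1.0, 0.1], [0.0, 1.0]])  # exact: A is nilpotent
```

The DMP system matrix additionally carries the spring-damper terms k_p and k_v; the discretization procedure is identical.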
Taking obstacles into consideration, the planning process of DMP is demonstrated in Figure 12. During the state update of the ego vehicle from x_k to x_(k+1), the state transition is calculated under the joint effect of the demonstrations (attractive effects) and the obstacles (repulsive effects), where m_obs[k] and ṁ_obs[k] are the position and velocity offsets caused by the repulsive effect, Δt denotes the length of a time step, and the parameter M is the virtual mass of the ego vehicle.
During the state transition, it is assumed that the attractive and repulsive effects remain unchanged within the time step. In this research, the trajectory is generated over 50 steps, as time goes from the start t = 0 to the end t = t_end = τ. The optimization method of ref. [36] is applied to solve the optimal planning problem.
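The per-step update can be sketched as an Euler integration under the piecewise-constant-effect assumption stated above. This simplified sketch sums an attractive acceleration u and a repulsive force G scaled by the virtual mass M, omitting the spring-damper terms of the full DMP dynamics:

```python
def step(m, m_dot, u, G, dt=0.02, M=1.0):
    """One state update x_k -> x_{k+1}: the attractive acceleration u (from
    the demonstrations) and repulsive force G (from the obstacles, divided
    by the virtual mass M) are held constant within the time step dt."""
    m_ddot = u + G / M
    m_dot_next = m_dot + m_ddot * dt
    m_next = m + m_dot * dt + 0.5 * m_ddot * dt * dt
    return m_next, m_dot_next

# 50 planning steps, as in the text; constant attraction, no obstacle.
m, v = 0.0, 1.0
for _ in range(50):
    m, v = step(m, v, u=0.2, G=0.0)
assert m > 0.0 and v > 1.0  # constant attraction accelerates the vehicle
```

A nonzero G would offset the position and velocity increments at each step, which is exactly the m_obs[k] and ṁ_obs[k] correction described above.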

Trajectory Planning without Demonstration
Safety is the foremost priority of trajectory planning. In the absence of proper human demonstrations, CITP generates a backup strategy based on a reference trajectory. Without a proper demonstration, the attractive effect u(t) (as depicted in Equation (9)) can be encoded using a predefined reference trajectory rather than a human trajectory. In this case, the planning result of CITP aligns with that of traditional APF methods, eliminating the need to switch to an alternative trajectory planning method. For both scenarios, the reference path lies in the middle of the target lane, and the velocity profile consists of two stages: a CA procedure up to the speed limit and a CV procedure for the rest. Once the trajectory enters the transition area, it is linked to the target through the method introduced in Section 3.2. The design of the reference is adopted from related APF methods. [11,12] Besides, for the upper-layer planner, a rule-based safety estimation is designed based on the motion states of the front and rear vehicles. Thresholds are set to determine the proper time to merge or to enter the intersection.

Experiments
The effectiveness of CITP was validated in the CARLA driving simulator. [37] This section presents the experimental results from four perspectives. First, the experimental conditions are introduced. Second, the planning results of the hierarchical planner are demonstrated for the overall driving process. Then, the planning results of the upper- and lower-layer planners are analyzed individually. Finally, we compare the trajectory planning results of DMP with those of DNN-based imitation learning (BC) and APF-based planning methods regarding safety and consistency with human driving.
A video of the experiment is available at the following link: https://www.youtube.com/watch?v=u0glkUwLjEw.

Experimental Conditions
We utilized the CARLA driving simulator to create intersection and lane-change scenarios for collecting demonstration data and deploying the planning algorithms. CARLA is a widely used open-source simulator for intelligent driving research, offering realistic traffic scenarios, obstacle agents, and practical APIs for integration with external devices. The ego vehicle was controlled by a human driver using a Logitech G29 steering wheel and pedals.
During the experiment, we conducted 100 demonstration drives for the intersection scenario and 80 drives for the lane-change scenario. The obstacle vehicles were randomly placed and controlled by the built-in controllers of the CARLA simulator. To simulate the uncertainty of the lane-change scenario, we incorporated yielding logic into the obstacle vehicles, and their decision to yield or not was randomized. The maximum number of vehicles in conflict with the ego vehicle was four.
Figure 13 demonstrates the conditions of the experiment. The driver controlled the ego vehicle with the Logitech G29; when the driver decided to switch the target, a button on the steering wheel was pressed to record the time. Table 1 presents the design of parameters.

Experiment Results of Hierarchical Planning
In this research, the planning problem tackled by CITP is separated into two stages, depending on whether the vehicle has started to enter the intersection (or to merge). The upper-layer planner determines the target, while the lower layer generates the trajectory accordingly.
Figure 14 demonstrates the planning result of left-turning in the intersection. The planning process is divided into two stages: first, the ego vehicle approaches the stop line; then, the ego vehicle enters the intersection after confirming safety. Figure 14a displays the complete path (x-y graph) of both stages, while Figure 14b presents the path of each stage separately. The human demonstrations are indicated with dotted lines, and the target position and velocity of each stage are defined as $(0, 0)$, as introduced in Section 2.2. Figure 14c,d demonstrates the position-time curves of the two stages: Figure 14c shows the x-t graph, and Figure 14d shows the y-t graph.
The collaboration between the upper- and lower-layer planners is demonstrated in Figure 14. During the planning process, the upper layer sets the target, and the lower layer generates the imitative trajectory accordingly. In Figure 14c,d, at the end of stage 1, the ego vehicle waits at the stop line until the conditions become safe for entering the intersection. Note that in some other cases, the ego vehicle can enter the intersection without waiting: the target can be switched whenever the traffic condition is estimated to be safe.
Figure 15 illustrates the planning results of the lane-change scenario, depicting the two-stage process of going straight within the initial lane and merging into the adjacent lane. In the first stage, the target $T_a$ is set at the virtual lane terminal. Once it is safe to merge, the target is switched to $T_b$, a dynamic target located at the car-following position. In the second stage, the dynamic target $T_b$ is set as the origin of the coordinate system. Figure 15b-d demonstrates the x-y graph, x-t graph, and y-t graph of the two stages, respectively.
During the first stage, the upper-layer planner is active to identify a safe merging opportunity. Once it appears, the target is switched from $T_a$ to $T_b$, the demonstrations are changed accordingly (demonstrations are indicated by the dotted lines in Figure 15b-d), and the current motion state becomes the initial state of the second stage.
The experimental results illustrate the workflow of the hierarchical planner. Furthermore, the x-t curves and y-t curves demonstrate the performance of DMP: the ego vehicle always reaches the target position at the target time (except when stopped by obstacles), and the output trajectories exhibit good consistency with human demonstrations.
To better demonstrate the trajectory planning results of CITP, a video is provided online (a PID controller was applied for vehicle control): https://www.youtube.com/watch?v=u0glkUwLjEw.

Experiment Results of the Upper-Layer Planner
The upper-layer planner generates the driving target based on the states of the ego vehicle and the traffic. The LSTM is trained to estimate the safety status of traffic conflicts, $S \in \{\text{True}, \text{False}\}$, and the ego vehicle should enter the intersection or conduct merging once $S = \text{True}$.
To verify the performance of the LSTM planner, we split the data into training and testing sets at a 2:1 ratio for each scenario and used the button signal as the ground truth for target switching. We calculated the precision $Pr$, recall rate $Rr$, and average time difference $dT$ of target switching relative to the human driver.
For all cases that are viewed as "safe to merge/enter the intersection" by a human driver, the metrics are
$$Pr = \frac{TP}{TP + FP}, \qquad Rr = \frac{TP}{TP + FN}$$
where $TP$ is the number of true positives, $TP + FP$ the number of predicted positives, and $TP + FN$ the number of all positives. The precision indicates whether the output "True" is viewed as "safe" by the human. The recall rate indicates whether all "safe" conflicts are identified by the LSTM. For each safe conflict, the LSTM outputs True at time $T_{LSTM}$, while the human driver presses the button at time $T_{Human}$. The time difference $dT = T_{LSTM} - T_{Human}$, averaged over the safe conflicts, indicates whether the decision is made in time, and a positive $dT$ means that the LSTM makes the decision after the human driver.
Table 2 presents the experimental results. The precision for both scenarios was 1, indicating that all "True" outputs corresponded to conditions identified as "safe" by the human driver. The recall rates for the intersection and lane-change scenarios were 0.92 and 1, respectively, indicating that the LSTM planner produced more cautious results. The average decision-making times were 0.32 and 0.20 s slower than the human driver. Based on the experimental results, we conclude that the LSTM planner is effective in identifying safe conditions and is slightly more conservative than the human driver.
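The evaluation metrics above can be computed as in the following sketch; the tuple-based event format is an illustrative assumption for how LSTM outputs and button presses might be paired per conflict.

```python
def planner_metrics(events):
    """Compute precision, recall, and mean switching-time difference.
    `events` is a list of (lstm_positive, human_safe, t_lstm, t_human)
    tuples, one per traffic conflict; a time entry is None when the
    corresponding side made no switching decision."""
    tp = sum(1 for p, h, *_ in events if p and h)
    fp = sum(1 for p, h, *_ in events if p and not h)
    fn = sum(1 for p, h, *_ in events if not p and h)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # dT averages T_LSTM - T_Human over conflicts both sides accepted;
    # positive dT means the LSTM decided after the human driver.
    diffs = [tl - th for p, h, tl, th in events
             if p and h and tl is not None and th is not None]
    dT = sum(diffs) / len(diffs) if diffs else 0.0
    return precision, recall, dT
```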

Experiment Results of the Lower Layer Planner
The DMP-based planner outputs a collision-free trajectory to reach the target. To verify and compare the effectiveness of the proposed method, we focused on evaluating three key subscenarios: left-turn (intersection), left-merge (intersection), and the cut-in process (lane-change). Figure 16 demonstrates the subscenarios as well as the planning results. The dotted lines are the demonstration trajectories of a human driver, and the red lines are the trajectory planning results. For each scenario, the same set of demonstration trajectories was provided, and the vehicle was navigated to the target from three different initial states. Convergence to the target (including the target's state and time) has been proven, [19] barring being blocked by obstacles.
In the presence of obstacles, Figures 17 and 18 demonstrate the output trajectories considering collision avoidance. In Figure 17a, the ego vehicle conducted merging after passing an obstacle vehicle; Figure 17b,c shows the x-t and y-t graphs of the driving process. During the driving process, the repulsive effects from obstacles pushed the ego vehicle away from the demonstrations. After the collision was avoided, the ego vehicle resumed its motion toward the target, and the convergence performance to the target remained satisfactory. Note that, as illustrated in Equation (15), if the ego vehicle is driving away from the obstacle ($\theta \in (0, \pi/2)$), the repulsive effect from the corresponding obstacle is 0.
Figure 18 illustrates the planning results under three different initial states of the ego vehicle and the obstacles. The results suggest that the ego vehicle can adapt to various situations, including different initial states and obstacle states, by utilizing multiple demonstrations and learning more from the nearer human demonstrations. Conversely, if only a single demonstration is provided, once the ego vehicle moves far away from the demonstration, it needs to perform a rapid change of velocity to follow the demonstration. Note that in Figures 17 and 18, the conditions are considered safe by the upper-layer planner. However, due to the limited nature of behavior modeling, it is possible for the trajectory planner to encounter unsafe conditions that necessitate collision avoidance. Incorporating collision avoidance capabilities into the trajectory planner therefore significantly enhances the safety of the planning process.
In this study, we compared our trajectory planning results with two baseline methods: a BC-based method, conditional imitative learning (CIL), [33] and an APF-based planning method. [15] The CIL was designed with a DNN, whose behavioral target was input as a prior to improve the success rate. The APF method solved the trajectory planning problem with MPC to avoid collision with dynamic obstacles. The performance of each method was evaluated with two metrics: the success rate of passing the scenarios and the trajectory deviation (Dev) from the human driver. The success rate was determined by conducting 50 tests for each scenario with randomly assigned initial states (the initial state should be estimated as safe for entering an intersection/merging), and the deviation was calculated based on all human demonstrations with identical initial states for planning:
$$Dev = \frac{1}{T_N} \sum_{t=1}^{T_N} \lVert H_t - P_t \rVert$$
where $T_N$ denotes the total number of frames of the human trajectory, and $H_t$ and $P_t$ are the positions at each frame of the human demonstration and planning result, respectively. Table 3 and Figure 19 illustrate the test results. On average, the success rate of CITP and APF was 1, whereas that of BC was 0.87. The deviation of CITP from the human trajectory was 0.85 m, a reduction of 45.4% and 53.1% compared to BC (1.56 m) and APF (1.82 m), respectively. The performance advantages of CITP stem from its basic characteristic of integrating both constraints and demonstration data for trajectory planning. As demonstrated, the DMP-based method achieved the same level of safety as traditional APF while exhibiting better consistency with human behavior in most scenarios. The reference states of the APF method were designed according to the human driver's average path and velocity, which was not effective enough to reflect human behavior throughout the whole driving process. BC performed well on shorter trajectories (e.g., left-merge), but for a longer trajectory involving more steps of decision-making (e.g., left-turn), the lack of constraints and the accumulation of error enlarged the deviation and increased the probability of collision. In addition, the constraints for trajectory planning include collision avoidance and target reaching. While both APF and DMP take these constraints into account, the advantage of DMP is that it considers not only the target position but also the target time. The target time is also important for safety, because the target position may become unsafe if it is not reached on time. It should be noted that DMP is a trajectory-planning method; the planning result must be tracked by the ego vehicle through an additional vehicle control method.
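The trajectory-deviation metric used in the comparison can be sketched as follows; it assumes the human and planned trajectories are sampled at the same $T_N$ frames, as described in the text.

```python
import numpy as np

def trajectory_deviation(human, planned):
    """Mean Euclidean deviation between a human demonstration and a
    planning result, both given as T_N x 2 arrays of positions
    sampled at matching frames."""
    human = np.asarray(human, dtype=float)
    planned = np.asarray(planned, dtype=float)
    assert human.shape == planned.shape, "trajectories must align frame-wise"
    return float(np.mean(np.linalg.norm(human - planned, axis=1)))
```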

Conclusion
In this article, we proposed the CITP method for safe and human-like intelligent driving. CITP comprises two layers: an upper layer that utilizes an LSTM network for target generation and a lower layer that integrates DMP and APF for imitative trajectory planning. Five modules were proposed to address the challenges presented by dynamic traffic scenarios, including target generation, demonstration standardization and selection, encoding of attractive and repulsive effects, and trajectory planning with model predictive optimization. As demonstrated by the experimental results, CITP is effective in generating safe and human-like driving trajectories. Compared to the baseline BC-based and APF-based methods, the proposed method achieves a success rate equivalent to that of the APF method (both are 1). Besides, it exhibits reductions of 45.4% and 53.1% in deviation from human trajectories compared to the BC and APF methods, respectively. Our planning method utilizes human demonstrations and local targets to model the attractive effects and encodes obstacles as repulsive effects. By effectively integrating both demonstrations and constraints, CITP offers a safe and human-like trajectory planning approach that combines the strengths of both model-based and imitation-based methods. Future studies should apply CITP to more complex scenarios, considering multiple types of obstacles, interaction with road users, and long-term planning tasks. The planning method could also be integrated with recent breakthroughs in driver behavior and interaction modeling.

Figure 1 .
Figure 1. Artificial potential field (APF) methods model the obstacles as repulsive effects for collision avoidance.

Figure 2 .
Figure 2. Trajectory planning with dynamic movement primitives (DMP). The robot adapts to different situations by learning from multiple demonstrations.

Figure 3 .
Figure 3. The structure of CITP. The upper layer generates target states with LSTM; the lower layer applies DMP for imitative trajectory planning.

Figure 4 .
Figure 4. An intersection and a lane-change traffic scenario are modeled to study the static and dynamic targets. a) Intersection: static target; b) Lane-change: dynamic target.

Figure 5 .
Figure 5. Illustration of the input-output information of the LSTM.

Figure 6 .
Figure 6. The conflicts are classified as a) Crossing conflict and b) Merging conflict.

Figure 7 .
Figure 7. Normalization results of human demonstration trajectories to reach static and dynamic targets. a) Demonstration trajectories for static targets; b) Demonstration trajectories for dynamic targets.
The parameters of the Gaussian basis functions (GBFs) are $\vartheta = [c_1, \sigma_1, \ldots, c_N, \sigma_N]^T \in \mathbb{R}^{2N}$. The target and the initial position of each demonstration are $g$ and $m_0$. The weights of the basis functions are $w = [\omega_1, \ldots, \omega_N]^T \in \mathbb{R}^N$. $N = 20$ GBFs are applied and are illustrated in Figure 8. According to the definition of the DS, the encoding problem of the attractive effect is transformed into a learning problem of the weight vector $w$ based on the demonstration. The weight of each GBF, $\omega_n$, is learned through locally weighted regression (LWR). During training, the learned weight vector should minimize the Euclidean norm of the acceleration residual between the demonstration trajectory and the DS's output:
$$\underset{\omega_1, \ldots, \omega_N}{\text{minimize}} \; \sum_{t=1}^{T} \left[\Phi(x_t, s_t; \vartheta) - \ddot{m}^d_t\right]^2 \quad (14)$$
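The per-basis LWR fit can be sketched as below. The closed-form weight update $\omega_n = \sum_t \psi_n(s_t)\, s_t f_t / \sum_t \psi_n(s_t)\, s_t^2$ is the canonical DMP formulation and is an assumption here; the paper's exact forcing-term definition may differ.

```python
import numpy as np

def fit_dmp_weights(s, f_target, centers, widths):
    """Fit Gaussian-basis-function weights by locally weighted
    regression: each weight omega_n minimizes the psi_n-weighted
    residual to the target forcing term f_target over the canonical
    phase samples s (standard DMP LWR closed form)."""
    w = np.zeros(len(centers))
    for n, (c, sig) in enumerate(zip(centers, widths)):
        psi = np.exp(-((s - c) ** 2) / (2.0 * sig ** 2))  # GBF activation
        w[n] = np.sum(psi * s * f_target) / (np.sum(psi * s ** 2) + 1e-12)
    return w
```

For a forcing term that is exactly linear in the phase variable, every fitted weight recovers the same slope, which gives a quick correctness check.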

Figure 9
Figure 9 demonstrates the ego vehicle (length $L_e$ and width $W_e$) with motion state $X = (m, \dot{m}, t)$ and an obstacle vehicle (length $L_o$ and width $W_o$) with state $(m_o, \dot{m}_o, t)$. An elliptic obstacle is modeled to represent the collision area: $Y = (m_o, \dot{m}_o, t, l_x, l_y)$. It is assumed that once the ego vehicle enters the elliptic area, a collision takes place. $(l_x, l_y)$ denote the lengths of the horizontal and vertical axes: $l_x = L_e + L_o$ and $l_y = W_e + W_o$. The nearest point refers to the point on the ellipse that is closest to the ego vehicle, and $R_i$ denotes the distance between the position of the $i$-th obstacle vehicle and the corresponding nearest point. In this research, the repulsive effect is considered when the obstacle vehicle approaches closely, excluding obstacles that drive in the opposite direction while separated from the ego vehicle by a fence or solid lane line; considering the relative motion states and traffic rules, the likelihood of collision with such obstacles is low, and the influence of irrelevant obstacle vehicles should be avoided. The potential field is built on two principles: 1) the obstacle's repulsive effect increases as the relative distance decreases; 2) the repulsive effect increases with the relative velocity toward the obstacle, which implies the repulsive effect is zero if the ego vehicle is leaving the obstacle vehicle. According to these two principles, the potential field $U(X, Y)$ is formulated as
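The two stated principles can be sketched as a simple repulsive-magnitude function; the gain `eta`, the influence range `d0`, and the specific functional form are illustrative assumptions, not the paper's formulated potential field $U(X, Y)$.

```python
def repulsive_magnitude(dist, closing_speed, eta=1.0, d0=10.0):
    """Repulsive-effect magnitude following the two principles: it
    grows as the distance to the obstacle's nearest ellipse point
    shrinks, grows with the closing speed toward the obstacle, and is
    zero when the ego vehicle is moving away (closing_speed <= 0) or
    the obstacle is beyond the influence range d0. eta and d0 are
    illustrative gain/range parameters."""
    if closing_speed <= 0 or dist >= d0:
        return 0.0
    return eta * closing_speed * (1.0 / dist - 1.0 / d0)
```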

Figure 9 .
Figure 9. Illustration of the repulsive effect from an obstacle vehicle, based on the ellipse-shaped potential field of the obstacle.

Figure 10 .
Figure 10. Illustration of the repulsive effects from road boundaries.

(27)
where the matrix $A_d \in \mathbb{R}^{2(P+1)\times D}$ is given as

Figure 11 .
Figure 11. Illustration of the residual from the convex hull.

Figure 12 .
Figure 12. Multistep trajectory planning based on the joint effect of demonstration (attractive effect) and obstacle (repulsive effect).

Figure 13 .
Figure 13. Demonstration of the CARLA driving simulation environment. a) Logitech G29; b) Lane-change scenario; c) Intersection scenario.

Figure 14 .
Figure 14. Two-stage trajectory planning results in the intersection scenario. a) Driving process of left-turn; b) Imitative planning: x-y graph; c) Imitative planning: x-t graph; d) Imitative planning: y-t graph.

Figure 15 .
Figure 15. Two-stage trajectory planning results in the lane-change scenario. a) Driving process of lane-change; b) Imitative planning: x-y graph; c) Imitative planning: x-t graph; d) Imitative planning: y-t graph.

Figure 16 .
Figure 16. Demonstration of trajectory planning in three subscenarios. a) Three representative processes; b) Imitative planning for left-turn; c) Imitative planning for left-merge; d) Imitative planning for cut-in.

Figure 17 .
Figure 17. Collision avoidance with a moving obstacle vehicle. a) Collision avoidance with moving obstacle vehicle; b) x-t graph; c) y-t graph.

Figure 18 .
Figure 18. Collision avoidance with different initial states and obstacles.

Table 1 .
Design of parameters.

Table 2 .
Test results of LSTM planner.