Evolution towards optimal driving strategies for large-scale autonomous vehicles

With rapidly developing autonomous vehicle (AV) technologies, the optimal driving strategy should consider multi-objective optimization problems of large-scale transportation systems, including safety and efficiency. Different driving strategies perform differently, and vehicles with different strategies interact with one another. Since the Nash equilibrium is hard to find for an n-player game, it is difficult to obtain an analytical solution to this multi-objective optimization problem. Therefore, a coevolutionary algorithm is proposed to explore the interactions between populations with different strategies and to investigate the cooperation and competition among vehicles. By combining the multi-objective optimization algorithm with the incentive mechanism of the survival-of-the-fittest reproduction law, the system structure reaches a stable equilibrium state and the optimal group driving strategy evolves. Simulation results, with 40,000 vehicles driving in the Luxembourg SUMO Traffic Scenario, demonstrate that the rational-rational strategy performs best among six typical strategies. Meanwhile, the accident rate drops by 56%, while the overall average speed increases by 30%. The results of multi-vehicle and multi-objective coevolution are instructive for designing optimal driving strategies for AVs.


INTRODUCTION
Traffic congestion and accidents are two critical problems in urban traffic systems, especially for megacities like Shenzhen and Beijing [1]. With the development and application of autonomous vehicle (AV) systems and intelligent transportation systems (ITS), urban traffic conditions are expected to improve through better driving strategies and traffic control strategies [2]. Vehicles on the road interact with each other; therefore, the actions taken by a vehicle are affected by the surrounding vehicles. Moreover, based on V2X technology and ITS, strategies with superior performance will be imitated and adopted by more AVs. Autonomous vehicle systems support or automate driving tasks in a safe and efficient way [3]. The optimal driving strategy for a large-scale urban traffic network should consider multi-objective optimization, such as the overall accident rate and efficiency. The urban transportation system is faced with the ...
To solve this multi-objective optimization problem, we propose a driving strategy coevolution algorithm based on coevolution theory. Combining the multi-objective optimization algorithm with the incentive mechanism of the survival-of-the-fittest reproduction law, the algorithm explores the interaction between multiple populations (with different driving strategies). Appropriate evolution rules, optimization objectives, and a penalty term are set up. In the process of continuous evolution, the system structure reaches a stable equilibrium state, in which the traffic network has the highest traffic efficiency and the lowest accident rate. Therefore, the optimal group driving strategy evolves under different population distributions.
The remainder of this article is organized as follows. Section 2 introduces the background and related work. Section 3 formulates the multi-objective optimization problem. Section 4 provides the problem-solving procedure based on the proposed coevolutionary algorithm. Section 5 presents the experimental results and discussion, and Section 6 gives a brief conclusion.

Evolutionary theory
Evolutionary theory comes from Darwin's theory of biological evolution and is widely used to solve automatic problems [4]. In order to balance the distribution of traffic flow, Lu et al. proposed a distributed cooperative routing algorithm based on evolutionary game theory, which guides vehicles to routes with small traffic flows to get the shortest travel time [5]. Fogel proposed evolutionary programming to seek the optimal solution for numerous engineering problems [6,7]. However, those evolutionary algorithms are usually applied to a single species. The coevolutionary paradigm, inspired by the reciprocal evolutionary change driven by the cooperative or competitive interaction between different species, has recently been extended successfully to multi-objective optimization [8]. Coevolution algorithms were introduced to explore the interactions between two or more species. Individuals of different populations interact with each other through competition and cooperation, which in turn promotes the evolution of the whole [9]. At present, evolutionary theory is sometimes used for macro-control of traffic, but it has not been used to optimize driving strategies. With the popularization of intelligent networked vehicles, vehicles can learn from and imitate each other. If optimal driving strategies are learned by more vehicles while strategies with low fitness are discarded, the overall performance of large-scale traffic will be enhanced. This inspires us to study group strategy optimization based on the coevolutionary paradigm.

Multi-objective optimization problems of large-scale vehicles
Kalupová et al. stated that the aim of using ITS is to make transport systems function more efficiently, improve safety, increase the productivity and economic efficiency of transport, and contribute to improving the environment [10]. The transportation system is a complex system in which various problems are intertwined and affect each other. Optimizing a single index may degrade other aspects of performance. Therefore, it is necessary to coordinate the various optimization indicators and optimize them jointly to balance the system. There are currently two main methods for solving the multi-objective optimization problem of large-scale urban traffic: one is to optimize the dispatch system, such as traffic light timing optimization, public travel mode, and route optimization [11,12]; the other is to optimize the driving strategy of the vehicle [13][14][15][16]. Optimizing the dispatch system provides strong macro-control, but its influence on road safety is limited and the control complexity is high. Driving strategy optimization has low control complexity: it makes vehicles cooperate with each other through collaborative control, improves driving efficiency, and creates a benign road environment.
Hu et al. proposed a multi-vehicle cooperative bypassing algorithm for connected and autonomous vehicles, which helps to improve the traffic efficiency of the road and the driving speed of the vehicle formation [14]. Wang et al. presented a game theoretic approach based on the expected behaviour of other vehicles [15]. This approach can produce efficient lane-changing manoeuvres while obeying safety and comfort requirements. There is also the potential for a negative impact on traffic flow due to the excessive movement of autonomous vehicles. Tanimoto et al. established a new lane-changing protocol, which considers the traffic density in the next 50 m ahead of a car and restricts vehicles from changing from low-density lanes to high-density lanes [16]. This new protocol can improve road traffic efficiency and reduce traffic fluctuations.
However, existing methods mainly focus on cooperative adaptive cruise control, cooperative lane changing, cooperative obstacle avoidance, etc. Little research has addressed the interaction between different populations in large-scale traffic.

Car-following model
The car-following model uses the method of dynamics to understand the macroscopic phenomena of traffic flow by analysing the microscopic behaviour of vehicles. It is widely used in microscopic traffic simulation, capacity analysis, adaptive cruise control, and traffic safety evaluation. Pipes [17] was the first to propose a car-following model based on the safety distance paradigm. He assumed that the driver expects to maintain a prescribed safe following distance from the leading car, that is, for every 4.47 m/s increase in the speed of the leading car, the safe following distance needs to increase by one car length (about 4.57 m). After him, more car-following models based on different paradigms were proposed, such as the Gipps model [18], the Krauss model [19], and the Intelligent Driver Model (IDM) [20]. However, the car-following model has an obvious disadvantage: it cannot represent overtaking. Therefore, in order to simulate lane-changing behaviour in real traffic, Gipps comprehensively considered the three main factors of lane changing (necessity, desire, and safety) and proposed the first lane-changing decision model [21]. Based on Gipps' work, many scholars have studied lane-changing models, such as the MITSIM model, the CORSIM model, the SITRAS model, and the LC2013 model [22][23][24][25]. Thanks to the introduction of lane-changing rules, the car-following model has been improved to simulate the interaction between vehicles in different lanes.
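Pipes' safety-distance rule quoted above can be written as a one-line formula. The following is a minimal sketch, assuming the safe distance grows linearly with the leading car's speed; the function name and the `standstill_gap_m` parameter are hypothetical and are not taken from [17].

```python
CAR_LENGTH_M = 4.57    # one car length, as quoted in the text
SPEED_STEP_MPS = 4.47  # speed increase that adds one car length of distance


def pipes_safe_distance(leader_speed_mps: float, standstill_gap_m: float = 0.0) -> float:
    """Safe following distance (m) under Pipes' rule.

    standstill_gap_m is a hypothetical extra gap kept at zero speed;
    it is an assumption of this sketch, not part of the quoted rule.
    """
    if leader_speed_mps < 0:
        raise ValueError("speed must be non-negative")
    # One car length for every SPEED_STEP_MPS of leader speed.
    return standstill_gap_m + CAR_LENGTH_M * (leader_speed_mps / SPEED_STEP_MPS)
```

For example, a leading car at 4.47 m/s requires one car length of distance, and doubling the speed doubles the required distance.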
In comparison to other microscopic lane-changing models, LC2013 discriminates between four different motivations for lane-changing explicitly: strategic change, cooperative change, tactical change, and obligatory change.
In this article, we adopt LC2013 as the lane-changing model and combine it with three car-following models (the Gipps model, the Krauss model, and IDM) to form our candidate strategies. Vehicles with LC2013 have both competitive and cooperative relations, which is closer to the real situation than other models. In order to explore the impact of lane changing on traffic, we also adopt the static lane-change model. Vehicles with the static lane-change model always stay in one lane, except when they must change lanes before an intersection in order to continue their route.

PROBLEM FORMULATION
In this section, we define the multi-objective optimization problem of driving strategy for large-scale urban traffic.
Our goal is to achieve optimal performance for large-scale urban traffic, including driving efficiency, accident rate, comfort, and so on. This goal is in line with the CARMA Program [26] published by the United States Department of Transportation Federal Highway Administration (FHWA). The program encourages collaborative research on transportation systems management and operations (TSMO). It aims to develop a traffic management system and operation strategy to reduce traffic accidents by 70-80%, reduce traffic jams by 60%, and increase the capacity of existing roads by 2 to 3 times. Varaiya also stated that intelligent vehicle (IV) systems support or automate driving tasks in a safe and efficient way [2].
In the urban traffic system, several candidate driving strategies form a strategy set $A$. Different strategies perform differently in terms of efficiency and safety. The autonomous vehicles (AVs) choose their strategies from the strategy set $A = \{a_1, a_2, a_3, a_4, a_5, a_6\}$. The AVs that choose the same strategy $a_k \in A$ form a population $V_k$. For the whole traffic network, the population state $X$ is the vector of the proportions of the six populations.
For simplicity, let us take two optimization goals as examples. The multi-objective optimization problem of driving strategy for large-scale urban traffic is formulated in Equation (8). $f_1(x)$ is the first optimization component, which evaluates the traffic efficiency of driving strategies, with the aim of maximizing the average speed $\bar{v}$. $f_2(x)$ is the second optimization component, which evaluates the safety of driving strategies, with the aim of minimizing the accident rate $col$. The average speed is the average speed of all vehicles in the entire network. We use the frequency of transport accidents as the evaluation index of the accident rate, and express it in times per million vehicle kilometres. Let $\bar{v}_k$ and $col_k$ denote the average speed and the accident rate of population $V_k$, respectively.
Accident rate $col_k$ (times per $10^6$ km), also called the frequency of transport accidents, is the ratio of the number of traffic accidents in a certain area to the total distance driven by the vehicles in a certain period of time. The traffic accident rate not only indicates the level of comprehensive management of road traffic, but also serves as an evaluation index of traffic safety. It is calculated by Equation (13):

$$col_k = \frac{C_k}{L_k} \qquad (13)$$

where $C_k$ is the number of accidents of population $V_k$, and $L_k$ ($10^6$ km) is the driving distance of all vehicles of population $V_k$. $n$ denotes the total number of vehicles in population $V_k$.
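The two per-population metrics defined above can be computed directly from per-vehicle records. The following is a minimal sketch, assuming the population's average speed is the arithmetic mean of the per-vehicle mean speeds and that accidents are attributed to individual vehicles; the `VehicleRecord` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VehicleRecord:
    """Per-vehicle statistics for one simulation loop (hypothetical layout)."""
    mean_speed_mps: float  # average speed of the vehicle over the loop
    distance_km: float     # distance driven by the vehicle
    accidents: int         # accidents attributed to the vehicle


def population_metrics(records: List[VehicleRecord]) -> Tuple[float, float]:
    """Return (average speed in m/s, accident rate in times per 10^6 km)."""
    n = len(records)
    if n == 0:
        raise ValueError("population is empty")
    avg_speed = sum(r.mean_speed_mps for r in records) / n
    # L_k: total driving distance of the population, in units of 10^6 km.
    total_distance_mkm = sum(r.distance_km for r in records) / 1e6
    # C_k: total number of accidents of the population.
    total_accidents = sum(r.accidents for r in records)
    rate = total_accidents / total_distance_mkm if total_distance_mkm > 0 else 0.0
    return avg_speed, rate
```

Returning a rate of zero for a population that has not driven any distance is a design choice of this sketch, made only to avoid a division by zero.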

PROPOSED COEVOLUTION PROCEDURE TO SOLVE THE OPTIMIZATION PROBLEM
In order to achieve optimal performance for large-scale urban traffic, we established a multi-objective optimization equation concerning average speed and accident rate in Section 3. In this section, we introduce our coevolution procedure to solve this multi-objective optimization problem.
It is difficult to get an analytical solution to Equation (8). Aiming for multi-vehicle and multi-objective optimization, we integrate the coevolution mechanism into large-scale urban traffic to realize the overall optimal performance, including driving efficiency and accident rate. The coevolution algorithm has the characteristics of self-organization, self-learning, and self-adaptation when solving large-scale global optimization problems. Moreover, its advantages include high robustness and wide applicability in high-dimensional, multi-objective, and large-scale problems. The multi-objective optimization problem above is treated as a coevolution of self-driving strategies among numerous populations, considering the constraints of the vehicle kinematics model and driving strategy diversity. Based on V2X technology and ITS, strategies with superior performance will be imitated and adopted by AVs. The better the strategy, the more vehicles will adopt it. Therefore, the evolution rules of individuals and populations are proposed as follows:
1. The average speed is taken as the fitness of each individual, and the accident rate is taken as the penalty item of the population evolution.
2. Individuals with low fitness will be eliminated, while individuals with high fitness will be retained and even given the chance to reproduce (replicate).
FIGURE 1 Velocity distribution appears to be Gaussian distribution in a simulation
Furthermore, statistics show that the vehicles' velocity distribution in a simulation is approximately Gaussian, as shown in Figure 1. We therefore propose the following replicator dynamics rules, where $\mu$ and $\sigma$ are the mean and standard deviation of all vehicles' velocities: vehicles with average speed $v \in [\mu + \sigma, \mu + 2\sigma)$ will have two offspring; vehicles with average speed $v \in [\mu, \mu + \sigma)$ will have one offspring; vehicles with average speed $v \in (0, \mu)$ will not reproduce. The replicator dynamic equation is as follows:

$$off(v) = \begin{cases} 2, & v \in [\mu + \sigma, \mu + 2\sigma) \\ 1, & v \in [\mu, \mu + \sigma) \\ 0, & v \in (0, \mu) \end{cases} \qquad (15)$$

where $off$ is the number of offspring of a vehicle.
Because accidents are probabilistic events that involve only a small number of vehicles, they cannot be used directly as a penalty on individual fitness. Instead, we use the accident rate as a statistically significant parameter. The accident rate of each population is calculated as the penalty item when updating the population proportion. The system flow is illustrated in Figure 2. In the process of continuous evolution, the population state is updated until it converges.
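The reproduction rule described above (two offspring for speeds between one and two standard deviations above the mean, one offspring for speeds between the mean and one standard deviation above it, none below the mean) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the population standard deviation is assumed, and speeds at or above the mean plus two standard deviations, which the stated rule does not cover, are assumed here to also yield two offspring.

```python
import statistics
from typing import Dict, List


def offspring_count(v: float, mu: float, sigma: float) -> int:
    """Number of offspring for a vehicle with average speed v."""
    if v < mu:
        return 0  # v in (0, mu): no reproduction
    if v < mu + sigma:
        return 1  # v in [mu, mu + sigma): one offspring
    # v in [mu + sigma, mu + 2*sigma): two offspring, as stated in the text.
    # ASSUMPTION: v >= mu + 2*sigma is not covered by the rule; treated as two.
    return 2


def next_generation_shares(speeds_by_strategy: Dict[str, List[float]]) -> Dict[str, float]:
    """Share of offspring produced by each strategy, before any accident penalty."""
    all_speeds = [v for speeds in speeds_by_strategy.values() for v in speeds]
    mu = statistics.fmean(all_speeds)
    sigma = statistics.pstdev(all_speeds)  # ASSUMPTION: population std. deviation
    counts = {
        name: sum(offspring_count(v, mu, sigma) for v in speeds)
        for name, speeds in speeds_by_strategy.items()
    }
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no vehicle reproduced")
    return {name: c / total for name, c in counts.items()}
```

The mean and standard deviation are computed over all vehicles in the network rather than per population, so a strategy gains share only if its vehicles are fast relative to the whole network.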
The procedure to investigate the candidate strategies is:
Step 1) Initialize the road network H, the state X, the strategy set A, the inflow rate R, the vehicle routes L, and the number of time steps N.
Step 2) Run the simulation and update the road environment at each time step.
Step 3) Record the performance of individual vehicles, including velocity and accidents.
Step 4) After the simulation, compute the mean and standard deviation of all vehicles' velocities in the traffic network. Generate offspring from parents following the replicator dynamics rules shown in Equation (15).
Step 5) Compute every population's accident rate $col_k$ and the overall accident rate of the traffic network $col$. Adjust the proportion of each population to punish populations with a high accident rate and reward populations with a low accident rate.
Step 6) End when the state X converges. Otherwise, reinitialize the traffic network and run the next loop of the simulation.
The detailed algorithm is given in Algorithm 1 (see Appendix 1).
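The outer loop of Steps 1 to 6 can be sketched as below. This is not Algorithm 1 itself: the text does not give the exact form of the accident-rate penalty or the convergence threshold, so the sketch assumes a multiplicative penalty (share scaled by overall rate divided by the population's own rate) and a fixed tolerance on the largest change in any share. All names, including the `run_generation` callback that stands in for the simulation, are hypothetical.

```python
from typing import Callable, Dict, Tuple

Shares = Dict[str, float]
# One simulation loop (Steps 1-4): maps the current shares to
# (offspring shares, per-population accident rates, overall accident rate).
GenerationRunner = Callable[[Shares], Tuple[Shares, Dict[str, float], float]]

EPS = 1e-9  # avoids division by zero for accident-free populations


def penalized_update(offspring: Shares, rates: Dict[str, float], overall: float) -> Shares:
    """Step 5 with an ASSUMED penalty: weight = share * overall rate / own rate."""
    weights = {k: s * (overall + EPS) / (rates[k] + EPS) for k, s in offspring.items()}
    total = sum(weights.values())
    if total == 0:
        return dict(offspring)
    return {k: w / total for k, w in weights.items()}


def has_converged(old: Shares, new: Shares, tol: float) -> bool:
    """Step 6: the state has converged when no share moves by tol or more."""
    return max(abs(new[k] - old.get(k, 0.0)) for k in new) < tol


def coevolve(initial: Shares, run_generation: GenerationRunner,
             tol: float = 1e-3, max_generations: int = 50) -> Tuple[Shares, int]:
    """Repeat simulate -> reproduce -> penalize until the population state settles."""
    shares = dict(initial)
    for generation in range(1, max_generations + 1):
        offspring, rates, overall = run_generation(shares)     # Steps 1-4
        updated = penalized_update(offspring, rates, overall)  # Step 5
        if has_converged(shares, updated, tol):                 # Step 6
            return updated, generation
        shares = updated
    return shares, max_generations
```

Passing the simulation in as a callback keeps the update rule testable without a traffic simulator; a real run would wrap the SUMO loop in `run_generation`.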

Simulation platform
Our simulation platform is built on Simulation of Urban MObility (SUMO). SUMO is an open-source, portable, microscopic, and continuous multi-modal traffic simulation package designed to simulate large networks [27]. It is one of the most widely used platforms for traffic simulation and autonomous driving simulation. SUMO includes an API called Traffic Control Interface (TraCI), which allows users to read and control the simulation process and to extend existing SUMO functionality. The simulation platform architecture is shown in Figure 3. After importing the road network and initializing the simulation environment and the population state, the proposed coevolution algorithm passes a command via TraCI to start the simulation in SUMO. Controllers output actions after synthesizing the vehicle state, and these actions are then applied to the controlled vehicles to advance the simulation. In each step, the proposed coevolution algorithm collects vehicle information from SUMO via TraCI. After running for a given number of steps, the proposed coevolution algorithm generates offspring from parents following the replicator dynamics rules and updates the population state. Vehicles with low fitness are reassigned their driving strategies so that the population state in the system meets the updated target value. After that, the algorithm passes a command via TraCI to continue the simulation.
FIGURE 3 Simulation platform architecture. The proposed coevolution algorithm collects simulation data via TraCI to evaluate strategies
The collision detection module in SUMO is used to detect whether a vehicle has collided during the simulation. At each time step, the number of collisions and the colliding vehicles are obtained via TraCI (tc.VAR_COLLIDING_VEHICLES_NUMBER and tc.VAR_COLLIDING_VEHICLES_IDS, respectively). By aggregating these data, we obtain the number of collisions $C_k$ of each population in a loop of simulation.
SUMO tracks gaps between vehicles that are on the same edge either fully or partially. Whenever these gaps fall below a limit value, a collision is registered. Collisions between vehicles on the same intersection are also checked by detecting overlap of the vehicles' shapes [28]. In addition, in order to avoid a rear-end collision, the rear vehicle will in some cases brake hard (warning: emergency braking), and its deceleration will exceed the specified limit. This is also registered as a collision.
Similarly, the driving distance of each vehicle can also be obtained via TraCI.
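The data collection described above can be sketched with the TraCI Python client. This is a minimal sketch, assuming each colliding vehicle reported in a step counts as one collision for its population; the function names and the `population_of` mapping from vehicle ID to population are hypothetical. The TraCI import is placed inside the function so that the aggregation helper can be used without a SUMO installation.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def count_collisions_by_population(colliding_ids: Iterable[str],
                                   population_of: Callable[[str], str]) -> Counter:
    """Aggregate colliding vehicle IDs into per-population collision counts."""
    return Counter(population_of(veh_id) for veh_id in colliding_ids)


def run_and_collect(sumo_cmd: List[str], steps: int,
                    population_of: Callable[[str], str]) -> Tuple[Counter, Dict[str, float]]:
    """Run SUMO for a number of steps; return collisions per population and
    the last observed driving distance (m) of every vehicle."""
    import traci  # requires a SUMO installation

    collisions: Counter = Counter()
    distance_m: Dict[str, float] = {}
    traci.start(sumo_cmd)
    try:
        for _ in range(steps):
            traci.simulationStep()
            # ASSUMPTION: every colliding vehicle reported in a step is counted once.
            colliding = traci.simulation.getCollidingVehiclesIDList()
            collisions.update(count_collisions_by_population(colliding, population_of))
            for veh_id in traci.vehicle.getIDList():
                distance_m[veh_id] = traci.vehicle.getDistance(veh_id)
    finally:
        traci.close()
    return collisions, distance_m
```

A call might look like `run_and_collect(["sumo", "-c", "scenario.sumocfg"], 3600, population_of)`, where the configuration file name is a placeholder. Keeping the last observed distance per vehicle retains the value for vehicles that have already left the network.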

Traffic network
To reproduce realistic mobility patterns, we import a realistic traffic scenario named "Luxembourg SUMO Traffic (LuST) Scenario", as shown in Figure 4. Codeca et al. built this scenario using information from a real city with a topology typical of mid-size European cities, together with realistic traffic demand and mobility patterns [29]. Many European cities share a characteristic pattern: a central downtown area surrounded by different neighbourhoods, which are linked by arterial roads [30]. The LuST scenario covers an area of almost 156 km² with 950 km of roads of different types, multiple intersections, roundabouts, and traffic light intersections. Table 1 lists the simulation scenario description.

Vehicles
The overall inflow rate of vehicles is set to 55,000 vehicles/h, and the inflow rate of vehicles on the highway is set to 4000 vehicles/h, which represents a typical urban traffic situation [1]. In the simulation initialization phase, each vehicle randomly chooses from several predetermined routes, and the vehicles are divided into multiple populations with different strategies. All vehicle types and their defaults are shown in Table 2.

Strategy set
In this experiment, we adopt three car-following strategies (the Krauss model, the IDM, and the Gipps model) and two lane-change strategies (LC2013 and the static lane-change model). All of them are classic models with excellent performance. They are combined to form six driving strategy profiles. There is a jump in the acceleration process of the Gipps model: the vehicle reaches the desired speed with the maximum acceleration, and then the acceleration suddenly changes to 0. We therefore refer to the Gipps model as an aggressive strategy. The default parameters of IDM result in rather conservative lane-changing gap acceptance, so we refer to it as a conservative strategy. Theoretically, the Krauss model improves the acceleration and deceleration process of the Gipps model and takes into account the conditions of multi-lane traffic. Therefore, we refer to the Krauss model as a rational strategy.
The static lane-change model, which seldom changes lanes for higher speed, is a conservative lane-changing strategy. LC2013 contains four different motivations for lane changing: strategic change, cooperative change, tactical change, and obligatory change. It gives cars both competitive and cooperative relations, which is closer to the real situation than other models. We refer to the LC2013 model as a rational strategy. The candidate driving strategies in the experiment are listed in Table 3.
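The six strategy profiles follow from pairing each car-following label with each lane-change label. The sketch below enumerates them, assuming the profile names used in this article put the car-following label first and the lane-change label second (for example, rational-conservative is the Krauss model with the static lane-change model); the dictionary and function names are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

# Labels assigned to each model in the text.
CAR_FOLLOWING = {"rational": "Krauss", "aggressive": "Gipps", "conservative": "IDM"}
LANE_CHANGE = {"rational": "LC2013", "conservative": "static"}


def build_strategy_profiles() -> Dict[str, Tuple[str, str]]:
    """Map each profile name to its (car-following model, lane-change model) pair.

    ASSUMPTION: names are "<car-following label>-<lane-change label>".
    """
    return {
        f"{cf}-{lc}": (CAR_FOLLOWING[cf], LANE_CHANGE[lc])
        for cf, lc in product(CAR_FOLLOWING, LANE_CHANGE)
    }
```

Under this naming assumption, the rational-rational profile that dominates in the experiments corresponds to the Krauss model combined with LC2013.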
The speed, the desire to change lanes, the acceptable lane-change gap, and the safety time gap when passing an intersection are set to different values for different strategies. First, the aggressive-rational strategy has a stronger desire to change lanes and a narrower acceptable lane-change gap, whereas the conservative-rational strategy has a lower desire to change lanes and a wider acceptable lane-change gap. Second, the safety time gap of the aggressive-rational and aggressive-conservative strategies is shorter when passing an intersection without priority, while that of the conservative strategies is longer.
The AVs choose their strategies from these candidate driving strategies. During the simulation, vehicles learn and imitate each other's better strategies.

Experiment results
Based on SUMO and LuST, we perform collision statistics and performance evaluation in the process of simulation. Vehicles with six driving strategy profiles are added into the LuST scenario.
On the road, vehicles change lanes to gain speed, to turn left or right before an intersection, or to allow others to change lanes. A strong desire to change lanes or a short acceptable lane-change gap may cause unsafe lateral movement, which may lead to side collisions or rear-end collisions [31,32]. As shown in Figure 5, when a vehicle changes lanes, it may collide with the rear vehicle or force the rear vehicle to slow down. Collisions may also occur at intersections [33,34]. As shown in Figure 6, since conflicting streams are allowed to drive at the same time, a vehicle with a shorter safety time gap when passing an intersection without priority has a greater probability of colliding with conflicting vehicles.
FIGURE 6 Two vehicles (inside the green box) collide at the intersection
In order to survey the evolution law of the state under different population distributions, we carry out experiments with different initial states: the initial state with the same proportion and initial states with one single population.

Initial state with the same proportion
In this experiment, we consider an initial state in which each population has the same proportion, that is, $X_0 = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6)$. Through coevolution and the law of survival of the fittest, the population distribution of the different strategies changes over the generations. Figure 7 indicates that the optimal strategy emerges after 12 generations. Finally, the convergence condition is met, that is, the proportion of each population hardly changes.
FIGURE 9 Average speed increases after the optimal strategy emerges
The experimental results demonstrate that:
• The population with the rational-rational strategy stands out from the six candidates. It always performs best in both low accident rate and high average speed, so its proportion quickly increases and gradually approaches almost 90%.
• Populations with the conservative-rational, aggressive-rational, conservative-conservative, and aggressive-conservative strategies show the worst performance. Although they started out with similar accident rates, the change of their accident rates in the later period varies greatly.
• Although the accident rate of the population with the rational-conservative strategy is as low as that of the population with the rational-rational strategy, the average speed of the former is much lower than that of the latter. Therefore, its proportion does not increase as rapidly as that of the rational-rational population at the beginning, and in the end its proportion gradually approaches 6%.
• Figure 8 indicates that the accident rate of the whole traffic network drops by 49% after one evolution, and the change of the accident rate slows down in the following evolution. In the end, the accident rate of the whole traffic network drops by 56% and approaches 12.1 times/$10^6$ km.
• Figure 9 indicates that the average speed of the whole traffic network increases rapidly at the beginning of the ninth generation, and the increase slows down in the following evolution. In the end, the average speed of the traffic network increases by 30% and approaches 28.4 m/s.
• Finally, the proportion of each population has the following relationship:

Initial states with one single population
In the second experiment, we consider six different initial states in which there is only one single population. $X_i$ ($i = 1, 2, \ldots, 6$) represents the six different initial states. An entry of 1 means that the corresponding population proportion is 100% at the beginning, and an entry of 0 means that the corresponding population proportion is 0%.
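The initial states used in the two experiments (equal proportions, and a single population at 100%) can be generated as below. This is an illustrative sketch; the function names are hypothetical, and the ordering of populations within each vector is an assumption.

```python
from typing import List


def uniform_state(n_populations: int) -> List[float]:
    """Initial state in which every population has the same proportion."""
    return [1.0 / n_populations] * n_populations


def single_population_states(n_populations: int) -> List[List[float]]:
    """One initial state per population, with that population at 100%."""
    return [
        [1.0 if j == i else 0.0 for j in range(n_populations)]
        for i in range(n_populations)
    ]
```

With six populations, `single_population_states(6)` yields six vectors, each containing a single 1 and five 0s.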

FIGURE 10 Evolutionary process of traffic system: the traffic system always converges to the state with low accident rate and high speed
By setting up an activator, other populations have a chance to emerge and reproduce. Figure 10 indicates the evolutionary directions of the transportation system under different initial states. No matter what the initial state is, the system converges in the direction of lower accident rate and higher speed (bottom right in the diagram). Simulation results are shown in Figures 10 and 11. Figure 10a, b, d, and e indicate that in the first 6-8 generations, the proportion of the rational-conservative strategy is higher than that of the rational-rational strategy. After the eighth generation, the proportion of the rational-conservative strategy is maintained at a relatively stable level. It then declines slightly until the rational-rational strategy becomes dominant in the end.
The results in Figure 11 demonstrate that, no matter what the initial state is, the rational-rational strategy emerges from the six candidate driving strategies, and the traffic network converges to the same stable state as that found in the previous experiment.
The rational-conservative strategy still exists, although not as the champion, which indicates that changing lanes does not always improve traffic efficiency.

Discussion
No matter what the initial state is, the system reaches the state given by Equation (16). In the large-scale transportation system, the rational-rational strategy always performs best in both low accident rate and high average speed. At the same time, the rational-conservative strategy also keeps a low accident rate and a high average speed, so changing lanes frequently is not always beneficial for the overall network efficiency. Under different traffic conditions, the motivation for changing lanes and the impact of lane changing on traffic flow are different. When the density is low, the main purpose of changing lanes is to overtake a slow car. In this case, because the front and rear clearances are large, a lane change has little impact on surrounding vehicles and has a positive effect on the efficiency of the entire road. However, when the density is high, vehicles in different lanes travel at similar speeds. Lane-changing behaviour then has a greater impact on the surrounding vehicles and causes the traffic flow to change from order to disorder.

CONCLUSION
The traditional single-vehicle decision-making process of autonomous vehicles does not consider vehicle incentives and coevolution, so it cannot form the optimal decision for the system. Here, to solve the multi-objective optimization problem, we propose an algorithm based on coevolution theory to investigate the cooperative and competitive interaction among populations with different strategies. The multi-objective optimization algorithm and the incentive mechanism of the survival-of-the-fittest reproduction law are combined. Experimental results demonstrate that the system structure always reaches a stable equilibrium state under different population distributions. When the optimal group driving strategy emerges, the overall accident rate drops by 56%, and the average speed increases by 30%. The results are instructive for designing self-driving strategies with future ITS and intelligent vehicles. Our method can be extended to explore optimal group driving strategies for a city containing millions of self-driving vehicles.