Human-Like Interactive Behavior Generation for Autonomous Vehicles: A Bayesian Game-Theoretic Approach with Turing Test

Interacting with surrounding road users is a key feature of autonomous
vehicles and is critical for their intelligence testing. Existing
interaction modalities in autonomous vehicle simulation and testing are
not sufficiently smart and can hardly reflect human-like behaviors in
real-world driving scenarios. To further improve the
technology, in this work we present a novel hierarchical
game-theoretical framework to represent naturalistic multi-modal
interactions among road users in simulation and testing, which is then
validated by the Turing test. Given that human drivers have no access to
the complete information of the surrounding road users, Bayesian game
theory is utilized to model the decision-making process. Then, a probing
behavior is generated by the proposed game-theoretic model, and is
further applied to control the vehicle via a Markov chain. To validate
the feasibility and effectiveness, the proposed method is tested through
a series of experiments and compared with existing approaches. In
addition, Turing tests are conducted to quantify the human-likeness of
the proposed algorithm. The experiment results show that the proposed
Bayesian game theoretic framework can effectively generate
representative scenes of human-like decision-making during autonomous
vehicle interactions, demonstrating its feasibility and effectiveness.
Corresponding author email: lyuchen@ntu.edu.sg

DOI: 10.1002/aisy.202100211. An interactive preprint version of the article can be found at: https://www.authorea.com/doi/full/10.22541/au.163482630.00283278.
considering the aggressiveness should be explored along with the representation and quantification of human-likeness.
To be more specific, we list some representative interactions and possible conflicts in Figure 1. The first situation is vehicle-vehicle interaction, occurring during lane changes and merging. The second is vehicle-pedestrian interaction, which is extremely important, especially in unstructured or unsignalized areas. The third modality, pedestrian-pedestrian interaction, imposes more uncertainties on autonomous driving scenarios. The most challenging situation is when road users experience conflict in their expected trajectories due to their noncooperative behaviors. For instance, in vehicle-pedestrian interaction, the optimal solution for each agent (blue and yellow lines, respectively, shown in Figure 1) is to maintain the current speed. However, if neither of them decelerates, a collision is inevitable.

Decision-Making behind the Scenes
Decision-making logic for autonomous vehicles and other road users can be similar and mutually beneficial for research. Many scholars have studied decision-making in autonomous vehicles [17] and pedestrians. [18] Among them, learning-based approaches are promising and gaining popularity. [17] Some studies have realized the generation of human driving behaviors using machine learning algorithms, such as deep learning, [12,19] imitation learning, [20] and inverse reinforcement learning. [21] However, owing to the inherent black-box nature of neural networks, the interpretability of learning-based methods is not ideal. Inspired by the game-like essence of road-user interactions, interpretable game-theoretic approaches that are more reasonable and practical were investigated. Some studies formulated the decision-making process as a Stackelberg game, [3][4][5]15] imposing a strong assumption on the availability of the leader and their utility function during the game. Furthermore, as the opponent vehicles may not always act as the formulated Stackelberg game expects, an online estimation algorithm using historical data was proposed to improve game-based interactions. [22,23] Additionally, there exists a problem in finding the Nash equilibrium: there may be more than one, conflicting with one another. [24,25] Apart from the abovementioned methods, Minimizing Overall Braking Induced by Lane Change-Intelligent Driver Model (MOBIL-IDM) [26] widely dominates the field of traffic scene generation. Though originally designed to be collision-free, by modifying some key parameters, such as politeness, acceleration, and grid distance estimation, the algorithm can generate adversarial behaviors for testing. [1,27] Other methods, including the risk field, [28] artificial potential field, [29,30] constrained Delaunay triangulation, [4,14] and scene prediction, [31] can model human drivers' cognitive states while considering safety.
In general, the aforementioned methods either make strong assumptions on the availability of data and information, or lack integrity in representing human behaviors, resulting in nonideal scenes for driving tests. Moreover, current learning- and game-based approaches require heavy computational resources, limiting the implementation of advanced scene generation algorithms.

Estimation of Driving Aggressiveness
Aggressiveness is an important factor for vehicles competing for the right of way, [14] and can be considered a trade-off between driving safety and travel efficiency. [22] Owing to the complexity of the problem, there is no unified method for measuring aggressiveness. [11] Intuitively, relative speed, acceleration, and distance between vehicles can be utilized to quantify aggressiveness. [14,32-34] Nevertheless, using vehicle dynamics alone to measure driving styles is not comprehensive. Thus, many studies shifted to driver behavior-oriented and scene-specific methods. [35-37] In recent years, new elements were introduced in evaluating aggressiveness, with explorations from the aspects of scenes (straight roads, [38] curves, [28] and roundabouts [39]) and human factors (hand, [40] eye, [41] and EEG [42] signals).
However, these methods have two main drawbacks. First, the aggressiveness of a driver may not be consistent, owing to varying scenarios and travel demands. Second, from the energy management perspective, although driving style recognition has been proven beneficial for long-term strategy optimization, [43] obtaining the exact value of aggressiveness instantly may not be necessary. Therefore, instead of realizing an accurate and continuous value for the estimation, we maintain that identifying the relative competitiveness is a more feasible and pragmatic way for autonomous driving.

Human-Like Behaviors
One of the challenges that distinguishes autonomous driving from other mobile platforms is traffic uncertainty. From this point of view, representing human-like behaviors of road users is essential for scene generation. The results reported in Feng et al. [1] indicate that generating rare cases, i.e., using the Markov model to generate initial scenes randomly and the IDM-MOBIL model for adversarial behavior, can shorten the overall testing time. However, the Markov model is only used when the vehicle is cruising, during which the background vehicle is hardly relevant to the tested vehicle, and IDM-MOBIL cannot completely replicate human behavior during the interaction. Generating aggressive behavior and possible accidents is essential, but only the collision-avoidance functionality can be tested by inserting a nonhuman-like movement, which is merely one part of the driving intelligence test. Learning-based approaches are also exploited as approximators for human-like driving. [44,45] However, these methods suffer from the black-box nature of neural networks, which can hardly be customized and interpreted for logic analysis. Learning from datasets is an interesting and promising methodology for realizing human-like driving, [46] but the diversities and uncertainties of human driver behaviors should be further considered. Therefore, it is difficult to formulate all human decisions as a unified optimization problem, especially for solving a global optimum. Besides, human drivers have limited but varying sensing and motor abilities, and their control performances are inherently imperfect.
The contributions of this work are summarized as follows: 1) a novel Bayesian game theory-based decision-making framework that does not require complete information about the surrounding road users is established; 2) an online aggressiveness estimation method is presented and validated using human-in-the-loop experiments; 3) human-like moving behavior is generated using a Markov chain, with the above processes customized for autonomous driving simulation and testing and their details presented; 4) the proposed method is compared with state-of-the-art approaches, and its human-likeness is validated through Turing tests.

Methodology
According to the aforementioned analysis, the formulated problem consists of two main elements: conflict and alternatives. First, for the conflict, there will be severe consequences if neither of the two interacting agents deviates from their originally expected choices, and one of them has to yield eventually. Second, for the alternatives, each participant should have at least two options, i.e., fight or yield. The conflict is defined as multiple agents' predicted positions breaking the minimum distance threshold. The expected motion trajectory is predicted assuming the current velocity and yaw angle are held constant. This is in line with the human-like concept, as humans do not make complicated and precise predictions. The prediction is given by Equation (1), where X_t and Y_t are the longitudinal and lateral displacements at time t, respectively, v_t is the current velocity, and φ_t is the yaw angle:

X_t = X_0 + ∫ v_τ cos φ_τ dτ,  Y_t = Y_0 + ∫ v_τ sin φ_τ dτ  (1)

Equation (1) can be further discretized as Equation (2):

X_{t+1} = X_t + v_t cos φ_t Δt,  Y_{t+1} = Y_t + v_t sin φ_t Δt  (2)

A conflict at step t is flagged when the predicted relative displacements fall below the combined footprint of the two agents plus tolerances:

|ΔX_t| < (L_sv + L_bv)/2 + Δs  and  |ΔY_t| < (D_sv + D_bv)/2 + Δw

where D_sv, L_sv and D_bv, L_bv are the width and length of the subject agent and the surrounding obstacle agent, respectively, with the heading angles α and β (geometry defined in Figure 2) used to project each footprint onto the longitudinal and lateral axes; Δw and Δs are tolerances. The collision-checking algorithm can be replaced by other algorithms, such as calculating the Euclidean distance for simplicity. This article primarily focuses on the alternatives.
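The prediction and conflict check above can be sketched as follows; the axis-aligned footprint test is a simplification of the angle-aware geometry in Figure 2, and all sizes and tolerances are assumed values:

```python
import math

def predict_trajectory(x0, y0, v, phi, dt=0.1, horizon=30):
    """Constant-velocity, constant-yaw prediction (discretized Equation (2))."""
    traj = [(x0, y0)]
    x, y = x0, y0
    for _ in range(horizon):
        x += v * math.cos(phi) * dt
        y += v * math.sin(phi) * dt
        traj.append((x, y))
    return traj

def in_conflict(traj_a, traj_b, size_a=(2.0, 4.5), size_b=(2.0, 4.5),
                dw=0.3, ds=0.5):
    """Axis-aligned stand-in for the collision check: flag a conflict when
    the predicted footprints (width D, length L) plus the tolerances
    dw and ds overlap at the same time step."""
    (Da, La), (Db, Lb) = size_a, size_b
    for (xa, ya), (xb, yb) in zip(traj_a, traj_b):
        if abs(xa - xb) < (La + Lb) / 2 + ds and abs(ya - yb) < (Da + Db) / 2 + dw:
            return True
    return False
```

For example, an agent cruising at 10 m s^-1 toward a stopped agent 30 m ahead is flagged, while one on a lane 10 m away is not.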

Proposed Framework
The complete framework for interactive driving scene generation is shown in Figure 3a; it is explicit and inspired by the human decision-making process. Within the framework, the scheduler is used to select players of interest, using the Euclidean distance or a predefined Gaussian function. If an agent is not selected, it follows its own expected waypoints; how it moves has no direct impact on the driving situation of the tested vehicle. However, when an agent is selected, the candidate trajectory generator and the algorithm in Figure 3b are activated. Trajectory generation algorithms, such as Rapidly-exploring Random Tree (RRT) [47] and semireactive trajectory generation, [48] produce multiple possible trajectories for the player. The algorithm then determines whether there exists a conflict between the expected trajectory and the predicted trajectory. If there is no conflict, the best solution is selected from the candidates. If there exists a conflict, the algorithm determines whether there is still room to fight. If the ego player is not blocked, it estimates the aggressiveness of the other surrounding agents using the proposed method. Then, the aggressiveness is updated for the decision-making module, which is based on Bayesian game theory. If the decision is explicit, i.e., to fight or to yield, then the agent follows the respective trajectory. However, when the decision is not clear, the agent takes a few small steps for probing, inspired by human behaviors. The small-step action is of low risk in terms of collision but sufficient to demonstrate the agent's aggressiveness. The proposed framework is applicable to decision-making and interactions among multimodal agents, including vehicles, cyclists, and pedestrians, but in this work, we primarily focus on the interactions between vehicles.
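The decision flow of the framework can be condensed into a short sketch, with each module's output passed in as a plain value; all argument names are hypothetical simplifications of the framework's modules:

```python
def interactive_scene_step(selected, candidates, cost, conflicts, blocked,
                           decision):
    """One pass of the hierarchical framework (Figure 3), simplified:
    selected   -- scheduler output: is this agent a player of interest?
    candidates -- trajectories from the candidate generator (e.g., RRT)
    cost       -- the agent's utility for one trajectory (lower is better)
    conflicts  -- predicate: does this trajectory conflict with others?
    blocked    -- is there no room left to fight?
    decision   -- Bayesian-game output: 'fight', 'yield', or 'unclear'
    """
    if not selected:
        return "follow_own_waypoints"       # unselected agents keep their route
    safe = [c for c in candidates if not conflicts(c)]
    if len(safe) == len(candidates):
        return min(candidates, key=cost)    # no conflict: best candidate
    if blocked or decision == "yield":
        return min(safe, key=cost)          # best non-conflicting (yield) option
    if decision == "fight":
        return min(candidates, key=cost)    # best overall (fight) option
    return "probe"                          # unclear: take small probing steps
```

The per-agent calls are independent, which is also what keeps the framework's complexity proportional to the number of interactive participants.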

Bayesian Game Based Decision-Making
As the core algorithm for the interaction is the decision-making process, and the aggressiveness observation and update module is designed to serve this process, we discuss the decision-making algorithm here. This module is based on Bayesian game theory, which makes no assumptions on the accessibility of the cost function of the opponent or on a game leader. However, to simulate the subject for different scenes, the cost function of player 1 is still required. Numerous scholars have studied game-theoretic approaches, in which an N-player game for N = 2, 3, ..., n can be simplified into a two-player game, i.e., N = 2. Each player i ∈ N has at least two alternative solutions based on the problem definition, represented by a discrete set A_i = {a_i,1, a_i,2, ..., a_i,k, ..., a_i,K}, K ∈ [2, ∞), and a utility function u_i(a_i,k, a_i',k'). Based on the conflict definition, A_i comprises two clusters of strategies, i.e., the fight cluster A_i,F and the yield cluster A_i,Y. The best fight strategy a*_i,F ∈ A_i,F is the optimal or expected trajectory that minimizes the utility function among all the alternatives to fight; the mechanism is the same for the yield decision a*_i,Y ∈ A_i,Y. The best solution always dominates the remaining choices; thus, it is not necessary to list all the choices.
Considering the lack of information about other agents, relative aggressiveness becomes important for decision-making and interaction. Assuming two players, player 1 and player 2, from player 1's perspective, the aggressiveness of player 2 can be classified into three types: equally aggressive, less aggressive, and more aggressive. The probability distribution of these three types, p_j (Σ_j p_j = 1, j = 1, 2, 3), follows a multinomial distribution.
Different types of player 2 have different utility functions, as summarized in Table 1.
For clarity, u_i is short for u_i(a_i,F/Y, a_i',Y/F). In the table, u'_1 and u'_2 refer to the more aggressive player's cost if it chooses to fight, which are supposed to be much smaller than u_i, because aggressive road users tend to underestimate collisions. For simplification, we define u'_1 = u'_2. Meanwhile, if player 1 chooses to yield, player 2's choice (to yield or to fight) makes no difference to player 1's cost, and vice versa. To find the Bayesian Nash equilibrium, we extend the table (see Note S2, Supporting Information). Let the expected utility be formed from U_1, the utility function shown in Table 1, and the probability distribution P = [p_1, p_2, p_3]. Player 1 is confident to fight when the fight strategy forms the unique equilibrium (Equation (5)), and will choose to yield when the yield strategy does (Equation (6)); in both cases there is only one equilibrium. However, player 1 will be confused if there is more than one equilibrium (Equation (7)). As can be seen from the aforementioned algorithm, the final decision is based on the probability of the relative aggressiveness of the opponent and the cost function of the subject vehicle. From both the engineering and optimization perspectives, the cost function should be bounded. Thus, the cost functions are normalized using the exponential kernel. Exponential kernels may change the shape of the utility functions, but humans are more sensitive to certain variations. To further illustrate this problem, the following sections take vehicle-vehicle interactions as an example. Travel efficiency is defined by Equation (8).
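A numerical sketch of this decision rule follows; the belief vector, the per-type cost entries, and the probe margin are hypothetical stand-ins for Table 1 and the equilibrium conditions (costs are minimized, so lower is better):

```python
def expected_utilities(P, U_fight, U_yield):
    """Expected cost of fighting vs. yielding for player 1: P = [p1, p2, p3]
    is the belief over player 2's type (equally / less / more aggressive);
    U_fight[j] and U_yield[j] are player 1's costs against a type-j opponent
    at that opponent's best response (stand-ins for the Table 1 entries)."""
    ef = sum(p * u for p, u in zip(P, U_fight))
    ey = sum(p * u for p, u in zip(P, U_yield))
    return ef, ey

def decide(P, U_fight, U_yield, margin=0.0):
    """Fight when the expected fight cost is clearly lower, yield when it is
    clearly higher, otherwise probe (more than one plausible equilibrium)."""
    ef, ey = expected_utilities(P, U_fight, U_yield)
    if ef + margin < ey:
        return "fight"
    if ey + margin < ef:
        return "yield"
    return "probe"
```

A belief concentrated on a less aggressive opponent yields "fight"; certainty of a more aggressive opponent yields "yield"; ties fall through to "probe".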
where f_te is the cost of travel efficiency. The maximum velocity takes the vehicle physical limitation v_max,p, traffic rules v_max,r, and vehicle stability limitation v_max,t into consideration. The safety cost f_sf is given by Equation (10).
We also consider the urgency of lane changing, f_ug, as in Equation (11).
where (X_int, Y_int) is the coordinate of the intersection and (X_term, Y_term) is the terminal coordinate of the selected strategy; ϵ avoids a zero denominator. Additionally, only the longitudinal control effort is estimated, as in Equation (12).
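Equations (8) through (12) do not survive this extraction, so the following sketch shows one plausible set of bounded, exponential-kernel cost terms consistent with the definitions in the text; all functional forms and the d0, a0, and eps defaults are assumptions:

```python
import math

def travel_efficiency_cost(v, v_max_p, v_max_r, v_max_t):
    """Stand-in for Equations (8)-(9): the admissible maximum speed is the
    tightest of the physical, rule, and stability limits; the exponential
    kernel keeps the cost bounded in (0, 1]."""
    v_max = min(v_max_p, v_max_r, v_max_t)
    return math.exp(-v / v_max)

def safety_cost(gap, d0=5.0):
    """Stand-in for Equation (10): cost rises toward 1 as the predicted gap
    to the other agent closes (d0 is a hypothetical length scale)."""
    return math.exp(-gap / d0)

def urgency_cost(term, intersection, eps=1e-3):
    """Stand-in for Equation (11): urgency grows as the strategy's terminal
    point (X_term, Y_term) nears the lane-change point (X_int, Y_int);
    eps avoids a zero denominator."""
    d = math.hypot(term[0] - intersection[0], term[1] - intersection[1])
    return min(1.0, 1.0 / (d + eps))

def control_effort_cost(ax, a0=3.0):
    """Stand-in for Equation (12): only the longitudinal acceleration ax
    is penalized (a0 is a hypothetical normalizer)."""
    return 1.0 - math.exp(-abs(ax) / a0)
```

A strategy's total cost would then be a weighted sum of these bounded terms.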

Aggressiveness Estimation and Belief Updating
The road user's decision is based on their observation of the aggressiveness of the surrounding road users. Aggressiveness is defined as a trade-off between safe distance and travel efficiency. Based on this definition, the observed aggressiveness is given as a sum of the longitudinal and lateral components. This aligns with the concept of the risk field: a driver's reaction to surrounding obstacles is a function of relative distance. However, we also have to mitigate the distance variations that the subject driver generates. Thus, the risk field is simplified with the Gaussian model in Equation (14), whose parameters can be obtained via the proposed experiments.
where α bv is the estimated aggressiveness of the obstacle vehicle and σ defines the shape of the aggressiveness field. The velocity in the numerator eliminates the distance variation created by the subject vehicle. To accurately measure the parameters, we first assume A f ðΔv, ΔφÞ, σ X ðv sv Þ, and σ Y ðΔφ sv Þ are linear as below.
where θ_1 through θ_7 are the parameters of these linear equations. The relative velocity is not included in Equation (15) because, in the questionnaires, drivers reported not estimating the relative velocity, which is, however, reflected in the distance variations. After estimating the obstacle player's strategy, the subject player updates its belief about the driving-style probability distribution, i.e., p_j in Table 1. p follows a Dirichlet distribution, p(p) ~ D(β_1, ..., β_3), whose normalization involves the gamma function Γ(·). Based on the observation of the obstacle vehicle, the algorithm updates the hyperparameters β_k using Equation (17).
where α_th is the sensitivity threshold that the driver can perceive. α_sv and α_bv are the aggressiveness of the subject and background vehicles, respectively. κ_1, κ_2, and κ_3 are the sensitivity parameters.
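The aggressiveness observation of Equations (14) and (15) can be sketched as a Gaussian field; the linear arrangement of the terms and the default theta values here are illustrative assumptions, not the paper's fitted parameters:

```python
import math

def aggressiveness_field(dx, dy, v_sv, dphi, dv,
                         theta=(0.05, 0.8, 1.0, 0.4, 2.0, 3.0, 1.0)):
    """Hedged sketch of Equations (14)-(15): the observed aggressiveness is
    a Gaussian of the relative position (dx, dy), with amplitude A_f and
    spreads sigma_X, sigma_Y linear in the relative motion, as assumed in
    the text."""
    t1, t2, t3, t4, t5, t6, t7 = theta
    A_f = t1 * dv + t2 * abs(dphi) + t3       # amplitude (assumed linear form)
    sigma_x = max(t4 * v_sv + t5, 1e-3)       # longitudinal spread vs. speed
    sigma_y = max(t6 * abs(dphi) + t7, 1e-3)  # lateral spread vs. relative yaw
    return A_f * math.exp(-(dx ** 2 / (2 * sigma_x ** 2)
                            + dy ** 2 / (2 * sigma_y ** 2)))
```

As in the text, a larger relative yaw widens the field laterally, and a higher speed elongates it longitudinally.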
For the decision-making algorithm, the probability in Equation (4) is replaced by the expectation of the Dirichlet distribution, E[p_k] = β_k / Σ_j β_j (Equation (18)).
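The belief maintenance of Equations (16) through (18) can be sketched as follows; the simplified update rule, which omits the α_th gating of Equation (17), is an assumption:

```python
def update_belief(beta, observed_type, kappa=(1.0, 1.0, 1.0)):
    """Simplified sketch of the hyperparameter update behind Equation (17):
    after observing the obstacle behave like type j (0: equally, 1: less,
    2: more aggressive), increase the matching Dirichlet hyperparameter by
    its sensitivity kappa_j. The paper's rule additionally gates the update
    on the sensitivity threshold alpha_th, omitted here."""
    beta = list(beta)
    beta[observed_type] += kappa[observed_type]
    return beta

def expected_distribution(beta):
    """Expectation of the Dirichlet distribution (Equation (18)):
    E[p_k] = beta_k / sum_j beta_j, used in place of the raw probability
    in the decision rule of Equation (4)."""
    s = sum(beta)
    return [b / s for b in beta]
```

Repeated observations of one type concentrate the expected distribution on that type, which is exactly what the decision module consumes.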

Parameter Identification for Personalized Aggressiveness Estimation
A series of experiments were conducted to obtain the personalized parameters in Equation (15). The experiment platform is a 64-bit Windows 10 machine with an Intel Core i7 CPU, 32 GB memory, and two Logitech G29s. The experiment environment is based on Simulink and the Unreal engine with a 10 Hz sampling rate. Because aggressiveness is an abstract idea and is difficult to quantify, we aim to measure the extremity of aggressiveness (aggressiveness = 1) experimentally first. Hence, this experiment is designed to find the participants' limits and then fit them into Equation (15). In the experiment, the participants can observe the subject and surrounding vehicles but have no control over the subject vehicle except a stop button to terminate the experiment (details can be found in Note S6, Supporting Information). The subject and obstacle vehicles were initialized with a constant steering angle and velocity, designed to ensure a collision if the participant does not terminate the experiment. The participants are asked to stop when they find the obstacle too aggressive to tolerate. We then record the current relative position and yaw angle. The subject vehicle is controlled by a PID controller with white Gaussian noise to enrich the dataset.
After five repeated sets of experiments, the vehicle starts at a new position as in Figure 4. A total of six different collision angles are considered. The experiments are again repeated for different speeds. The obstacle maintains constant velocity and yaw angle. There are a total of five different velocity references.
The experiment results can be found in Note S3, Supporting Information. Particle swarm optimization is used to estimate the parameters of the nonlinear problem in Equation (19). Given that a single value, aggressiveness = 1, is not sufficient to determine the shape of the function, extra points must be added. Determining accurate intermediate values between 0 and 1 is difficult; however, we can determine when aggressiveness is 0. Hence, the position 5 s ahead (assuming constant velocity) and the adjacent lane center are set as the non-aggressive points, as indicated in red in Figure 4. Considering the optimization cost function, we balance the number of aggressiveness = 0 and aggressiveness = 1 points: instead of 1 point, we manually add 15 points for each position.
The parameters are obtained by minimizing the squared fitting error (Equation (19)). The fitting results are θ_1 = −0.0176, θ_2 = 1.9921, θ_3 = −0.4016, θ_4 = 1.2912, θ_5 = 5.2985, θ_6 = 27.5751, and θ_7 = −30. We tested the fitted parameters and visualized them in Figure 5. The aggressiveness estimation is shaped by both velocity and relative yaw angle. In Cases I and II, the speed is maintained at 7 m s^−1, but the relative yaw angles are 0 and π/6 rad, respectively. When the yaw angle is larger, the aggressiveness field is wider along the Y axis and shorter along the X axis, imitating the merging case, which differs from two vehicles driving in parallel. To the best of our knowledge, few previous studies have considered relative yaw angles for aggressiveness estimation. The red plane refers to aggressiveness = 1, as in the experiment above. Compared with v = 7 m s^−1, at v = 13 m s^−1 the field is elongated along the X axis (Figure 5). The relation of Case IV to Case III mirrors that of Case II to Case I. The experiment can be repeated with different participants, thereby individualizing the observed behaviors.
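For illustration, the fitting step can be reduced to a closed-form least-squares fit of one assumed linear relation from Equation (15); the paper itself solves the full nonlinear problem of Equation (19) with particle swarm optimization:

```python
def fit_linear(xs, ys):
    """Closed-form least squares for y = a*x + b. This sketch fits one of
    the assumed linear relations in Equation (15) (e.g., sigma_X as a
    function of v_sv) on labeled aggressiveness points, standing in for
    the PSO solver used in the paper."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx          # slope minimizing the squared error
    b = my - a * mx        # intercept through the means
    return a, b
```

On synthetic points generated from y = 2x + 1 the fit recovers the slope and intercept exactly.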

Probing Behavior Generation
If the subject vehicle is confused by its aggressiveness relative to the surrounding vehicles, it takes a few small steps to test the real aggressiveness of the obstacle vehicles. These steps can be neither too large, which would trigger collisions, nor too small, which would show no sign of aggression. This part is what distinguishes this work from the extant studies. When dealing with uncertainties, current methods tend to give up or fight outright according to fixed thresholds, such as time-to-collision. In reality, however, people tend to fight or bluff to a certain degree. If the obstacles are indeed more competitive, they will eventually give up; if not, they can win the right of way, which matters especially in a traffic jam, where human drivers incline toward more advantageous positions. The small-step probing behavior is given by the algorithm in Table 2.
where α_sv,last is an interim variable and rand is a random number generator between 0 and 1 to simulate randomness in the tests. The result of the above algorithm can be viewed as the subject vehicle's strategy, or the aggressiveness it wants to impose on the surrounding road users, and should remain bounded. Ideally, implanting the strategy interim α_sv into the Markov transitional probability directly would be best. However, as α_sv is an abstract variable, the transitional probability above is not used directly to connect the small-step strategy. Inspired by Shin and Sunwoo, [49] the joint Markov chain is defined by two equations that can be viewed as a combination of behavior and strategy: the displacement follows a normal distribution N with mean 2a_gridX(α_sv − α_M) or 2a_gridY(α_sv − α_M) and standard deviation δ_x or δ_y, tuned according to the grid distance of the Markov transitional matrix. α_M is the middle point of the transitional probability, which equals 0.5 in this work. a_gridX and a_gridY are given by the shape of the transitional probability matrices. Ideally, the direction of fight or yield can be defined according to the gradient of the risk field. [28] For simplicity, the direction of fight is given by the direction shortening the distance between the two subjects, and the direction of yield is the opposite.
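The probing rule of Table 2 and the aggressiveness-conditioned Markov step can be sketched as follows; the grid size, α_M, and δ defaults are assumed values:

```python
import random

def probing_step(alpha_bv, alpha_sv_th, alpha_sv_last, rng=random.random):
    """The small-step probing rule of Table 2: edge toward the own threshold
    while the opponent appears no more aggressive, back off otherwise;
    rng plays the role of rand, injecting human-like variability."""
    if alpha_bv <= alpha_sv_th:
        return alpha_sv_last + (alpha_sv_th - max(alpha_bv, alpha_sv_last)) * rng()
    return alpha_sv_last * (1.0 - rng())

def markov_move(alpha_sv, a_grid, alpha_m=0.5, delta=0.2, rng=None):
    """Sketch of the joint Markov-chain step: sample a grid displacement
    from N(2 * a_grid * (alpha_sv - alpha_m), delta), so motion drifts
    toward the fight direction when alpha_sv exceeds the midpoint alpha_m
    and toward the yield direction when it falls below."""
    rng = rng or random.Random()
    return rng.gauss(2.0 * a_grid * (alpha_sv - alpha_m), delta)
```

With α_sv above the midpoint, the sampled displacements average out to a drift toward the opponent, i.e., a fight tendency.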

Human-Likeness Evaluation of the Autonomous Driving Algorithms Using Turing Test
To further evaluate the human-likeness of the proposed algorithm, Turing tests are designed and conducted. The experiment environment is given in Figure 6. Two participants sit in two simulators separated by a barrier so that they cannot see each other. The scene and parameters are presented in Note S1, Supporting Information. Vehicle 1 can be manipulated by participant 2 or by our algorithm. Twenty rounds of tests were carried out. The participant details are as follows: male: 16. The detailed procedures of the experiments are as follows: 1) Before the test, we inform the participants of the following items: a) The goal of the experiment is to distinguish (driver B: V2) whether the opponent driver (driver A: V1) is a human or an algorithm based on the interaction. b) The task of the scene is as depicted in Note S1, Supporting Information. c) The driver should follow the traffic rules. d) While the driver should behave reasonably, it is also important to put pressure on vehicle 1; when driving very cautiously, without any conflict of interest, it is hard to evaluate the performance of vehicle 1. Furthermore, the proposed algorithm is for scene generation, and a human-like manner of triggering an accident is sometimes needed. e) The driving style of the algorithm is randomly generated before each test. f) One single test takes 20 s and there will be a total of ten tests for each participant. 2) The two participants have a few practice rounds until they are familiar with the simulator and the dynamic performance of the vehicles. 3) After enough practice, driver B chooses to drive in autonomous mode or fully manual mode. If driver B chooses autonomous mode, a random driving style parameter (α_sv,th) will also be generated; driver A cannot view this process. 4) Driver B initiates one test; before the start, driver B informs driver A so that driver A is well prepared for the experiment. 5) Complete one test. 6) Fill in a questionnaire.
7) Repeat step 3) to 6) nine times. 8) Score the entire experiment using table in Note S5, Supporting Information. 9) After one set of tests, driver B will be asked their judgment criteria.
The questionnaire for driver A comprises only one question with five alternatives for each test (Question: rate the performance of driver B. Choices: Robot driver; Somewhat robot-like; Not sure; Somewhat human-like; Human driver). The questionnaire for driver B comprises three questions (Question 1: the ground truth of whether this test was driven by a human or an algorithm. Question 2: the driving style of driver A. Choices: Aggressive; Cautious; Normal. Question 3: was there a collision in this test, and which driver is responsible for the accident?). As intensive interactions are required for a better judgment, to distinguish results and make them more persuasive, accidents caused by fierce interactions are unavoidable owing to the physical limitations of both the algorithm and the human. Therefore, collisions are not deemed a standard for human-likeness evaluation.

Table 2. Probing behavior generation:
1: if α_bv ≤ α_sv,th then
2:   α_sv ← α_sv,last + (α_sv,th − max(α_bv, α_sv,last)) · rand
3: else
4:   α_sv ← α_sv,last · (1 − rand)
5: end
6: return α_sv

Results
Three major results, i.e., customizing the driving behavior, comparing with existing methods, and the Turing test results, are reported in this section.

Driving Behavior Customization
To demonstrate how the parameters in our algorithm affect the interactive behavior, ways to manipulate the driving behavior by adjusting α_sv,th and κ are reported and compared in this subsection. First, different thresholds α_sv,th are compared in Figure 7. For comparison, the surrounding vehicle (V1) strictly follows fixed predefined waypoints, which, along with other information, are unknown to the subject vehicle (V2). Furthermore, the costs in Table 1 are set to constants for the comparisons. Other parameters are given in Note S1, Supporting Information. The work by Werling et al. [48] is used as the candidate trajectory generation algorithm.
We also compare the influence of different κ values in Equation (17), which serves different test requirements: some people might be more cautious under certain circumstances and reckless under others. For the comparison, we simplify the cases by assuming κ = κ_1 = κ_2 = κ_3 (Figure 8).

Comparisons with Existing Methods
To verify whether the proposed algorithm is reasonable and human-like, it is first compared with extant decision-making methodologies; comparisons with studies proven to be human-like are then conducted to test the human-likeness. We compare our Bayesian game-based approach with two widely accepted lane-change decision-making methodologies: 1) IDM and MOBIL, the most frequently used methods to imitate human drivers' behavior, as in Table 3, Figure 1a; 2) Xuemin's [50] method, which generates multiple alternatives and chooses one based on the cost function, as in Table 3, Figure 1b.
To evaluate the human-likeness, the proposed method is compared with methods [3,51] that are proven to be human-like. Pedestrian trajectories are generated using game theory as the first comparison in Table 3, Figure 2a. The two pedestrians' shortest trajectories to their respective goals are contradictory. Using the RRT method with B-splines, multiple trajectories can be generated. The best fight trajectory in Table 1 is set to the shortest trajectory among all generated trajectories, while the expected yield trajectory is set to the shortest trajectory without possible collision. Meanwhile, a comparison with the game-theoretic approach [3] is also reported.

Human-Likeness Results with Turing Test
To verify the human-likeness of the proposed method, we conducted a Turing test. The experiment results are summarized in Table 4 and Figure 9. If the proposed algorithm were evidently different from human driving behavior, the score would be close to 0 or 10.

Algorithm Efficiency
The average calculation time is 2.0105 ms, with a standard deviation of 5.4302 ms. Furthermore, the decision-making process is independent, which allows simultaneous deployment of a large number of surrounding road users. The road users only need to broadcast their current locations, velocities, shapes, and yaw angles (all directly observable in reality) to the surrounding road users. The algorithm for each of them is essentially decoupled from the other participants; hence, the computational complexity is only proportional to the number of interactive participants. For more complex scene generation, such as curved roads, intersections, or more participants, all that is needed is an adequate candidate generation algorithm. Both candidate trajectory generation algorithms mentioned in this work (Frenet polynomials and RRT) suit this purpose.

Discussion
As seen in Figure 7, all three cases start in uncertain states because of the incomplete-information assumption, where V2 has no prior on V1. When α_th^sv = 0.5, the subject vehicle (V2) assumes that it is less aggressive than the obstacle vehicle (V1), as can be seen in Figure 6, but it still accelerates slightly to test this assumption. After a short period of acceleration, it ascertains that it is indeed less aggressive and decelerates to give the right of way to V1, as shown in Figure 6. In contrast, when α_th^sv = 0.7, V2 believes it is more aggressive than V1 (see Figure 6), where the green region is the largest at the beginning. V2 then accelerates longer to demonstrate its will to fight. At ≈3 s, V2 doubts whether it is really more aggressive than the surrounding vehicle. Because V1 follows predefined waypoints regardless of what happens, it can be considered extremely aggressive. V2 eventually yields until there is no driving conflict. However, when α_th^sv = 0.9, V2 assumes it is more aggressive than the surrounding vehicle. Thus, it accelerates intensively, as shown by the green block in Figure 6. After a short period of probing, it chooses to fight. However, for the same reason, the obstacle vehicle can also be seen as extremely aggressive, so V2 doubts itself even when driving parallel to V1. After a while, V2 recognizes that it is less aggressive and accelerates (in the yield direction) to get out of the situation. Here, V2 is not more aggressive than V1, but when the subject vehicle probes, it accelerates and finally blocks the surrounding road user.
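The probe-then-commit loop described above can be illustrated with a minimal Bayesian belief update over the opponent's aggressiveness. The observation likelihoods and the comparison against the threshold α_th^sv are illustrative assumptions for exposition, not the paper's exact observation model.

```python
def update_belief(p_aggr, obs, p_obs_given_aggr=0.8, p_obs_given_timid=0.3):
    """One Bayes step on P(opponent is aggressive).

    obs=True means the opponent kept accelerating during our probe,
    which is more likely for an aggressive driver than a timid one.
    """
    like_a = p_obs_given_aggr if obs else 1 - p_obs_given_aggr
    like_t = p_obs_given_timid if obs else 1 - p_obs_given_timid
    num = like_a * p_aggr
    return num / (num + like_t * (1 - p_aggr))

ALPHA_TH = 0.7  # subject vehicle's aggressiveness threshold (alpha_th^sv)
belief = 0.5    # uninformative prior: no information about the opponent

# The opponent (like V1 above, which follows fixed waypoints) keeps
# pushing through every probe, so the belief climbs toward certainty.
for obs in [True, True, True]:
    belief = update_belief(belief, obs)

action = "yield" if belief > ALPHA_TH else "fight"
```

After three probes the belief exceeds the threshold and the subject yields, mirroring the α_th^sv = 0.7 case where V2 stops fighting once V1's persistence reveals its aggressiveness.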
As shown in Figure 8, when κ = 1, the driver does not make a rash decision compared to κ = 2, and the relative aggressiveness varies less intensively. By tuning these white-box parameters, various complex behaviors can be generated.
As in Table 3, Figure 1a, when the ego vehicle is less polite, or more aggressive as defined in this article, the driver intends to start a lane change earlier. However, our method is not strictly monotonic. This is because, when the vehicle is probing, the subject might misjudge the obstacle vehicle's intention and block it; thus, the obstacle vehicle has to yield. This means that the proposed method concurs with MOBIL, which has been proven effective in modeling large traffic flows, while our method can generate micro-level and more complicated human behavior. In addition, the comparison with Xuemin's method [50] in Table 3, Figure 1b indicates that our method can be more aggressive. As shown in the early phase of the lane change, the compared method is more conservative because, when there is a conflict of interest, the subject vehicle chooses a suboptimal candidate trajectory without conflict. In contrast, the proposed method is adversarial because the algorithm assumes that the obstacle driver will yield eventually. When the collision weight is set to infinity, the trajectories selected by the algorithm are the same as those of the compared method, as is validated in Figure 3(2a). The value of P does not matter because, when the collision cost is too high, our method always chooses a conservative alternative, which aligns with the literature. Furthermore, to fully generate human pedestrians' behavior, the transition matrix needs to be constructed. A case in point is shown in Note S4, Supporting Information, demonstrating that the technique can be generalized to other road users.
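The effect of the collision weight can be sketched with a toy cost-based selector. The candidate set and cost terms are illustrative assumptions; a very large (effectively infinite) collision weight is used instead of a literal infinity to keep the arithmetic well defined.

```python
def select_trajectory(candidates, w_collision):
    """Pick the candidate minimizing progress cost plus a weighted
    collision-risk term. As w_collision grows without bound, any
    candidate with nonzero collision risk is effectively excluded,
    recovering the conservative behavior of the compared method."""
    return min(candidates,
               key=lambda c: c["progress_cost"] + w_collision * c["collision_risk"])

candidates = [
    # Adversarial option: fast progress but some collision risk.
    {"name": "cut_in", "progress_cost": 1.0, "collision_risk": 0.2},
    # Conservative option: slower but collision-free.
    {"name": "yield",  "progress_cost": 3.0, "collision_risk": 0.0},
]

aggressive = select_trajectory(candidates, w_collision=1.0)
conservative = select_trajectory(candidates, w_collision=1e9)
```

With a moderate weight the adversarial `cut_in` wins (cost 1.2 versus 3.0); with the near-infinite weight the risky candidate's cost explodes and `yield` is chosen regardless of the other parameters, matching the observation that the value of P no longer matters.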
Moreover, we compared our method with the existing method proposed by Hang et al. [3] In Table 3, Figure 2b, vehicle 2 is assumed to be an aggressive driver. In our case, with α_th,v1^sv = 0.1 and α_th,v2^sv = 1, both methods indicate that the subject vehicle maintains a relatively low velocity. However, there are two major differences. In Table 3, Figure 2d, the subject vehicle of Hang et al.'s method accelerates and maintains its speed at ≈21 m s⁻¹. In contrast, to enlarge the space for the lane change, our vehicle 1 decelerates and then starts to accelerate at 5 s to restart the lane change. As can be seen, our method outperforms in lane-changing time: proactive deceleration increases the gap distance for a faster lane change. As for driver 2's behavior, although Hang's method accelerates intensively and maintains ≈22 m s⁻¹, our vehicle 2 does not accelerate much because vehicle 1 is faster, and can therefore be seen as more aggressive, from 0 to 1 s. Although the aggressiveness threshold of our vehicle 2 is 1, it accelerates conservatively to ascertain driver 1's true aggressiveness. However, from 4 to 8 s, the vehicle using our method accelerates to a relatively high speed in order to get rid of the lane-changing vehicle. The other situation is the opposite of the first case, as can be found in Table 3.

Figure 9. Turing test results.

According to the above comparisons, we observe that, at the decision level, the proposed method concurs with the state-of-the-art literature and is more flexible and intelligent with respect to scene generation. Although we compare the proposed method with methodologies that have been shown to be human-like, to further evaluate the human-likeness of this method, a variation of the Turing test is conducted.
In the Turing test, most people cannot distinguish whether the opponent is a human or an algorithm, as 50% of them achieve ≈50% accuracy. Moreover, the other participants' scores are close to 50%. This indicates that our method can confuse the participants so that they cannot distinguish whether the opponent is algorithm-generated or controlled by a real human driver. Thus, the proposed method is human-like and an effective scheme for scene generation in driving intelligence tests. Meanwhile, the accident rate is 14% (mostly caused by participant 1, via rear-end collisions), which is higher than usual, [1] indicating that the participants were actively testing the subject vehicle. This is higher than in normal driving but aligns with what we conveyed to the drivers before the test: drive normally as the primary task but also test the subject vehicle, which makes our results more reliable. However, there are still some drawbacks in the proposed work. One major drawback, as reported by the participants, is that there are cases in which driver A acts indecisively. This is because, when the random driving style (α_th^sv) is too small, the algorithm behaves extremely cautiously, which does not occur in reality. There is also an overshoot problem for both the participants and the algorithm. Algorithm overshoot usually takes place in the early stage of lane changing because, when α_th^sv is too small, the driver tries to move away from the obstacle driver. For real human drivers, the overshoot usually happens in the late stage of lane changing because, when they accelerate too heavily, they cannot control the vehicle properly. These cases are rare; thus, we consider the results valid. In future work, we may take the control level into consideration to generate more human-like behavior. In addition, we will increase the number of participants and set a baseline for the test.
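The chance-level criterion used above can be sketched as follows; the trial data are illustrative, not the actual experimental records.

```python
def accuracy(judgments):
    """Fraction of trials in which a participant correctly identified
    whether the opponent was human- or algorithm-controlled."""
    return sum(j["guess"] == j["truth"] for j in judgments) / len(judgments)

# Illustrative trials for one participant (truth is "human" or "algo").
trials = [
    {"guess": "human", "truth": "algo"},
    {"guess": "algo",  "truth": "algo"},
    {"guess": "human", "truth": "human"},
    {"guess": "algo",  "truth": "human"},
]

acc = accuracy(trials)
# Accuracy near 0.5 is chance level: the participant cannot tell the
# algorithm apart from a human driver, the indistinguishability the
# Turing test is designed to detect.
indistinguishable = abs(acc - 0.5) < 0.1
```

A consistent accuracy far above 0.5 would instead indicate that the generated behavior carries detectable non-human cues.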

Conclusions
This article presents a human-like decision-making algorithm for driving intelligence tests. The interaction model of road users is first established using Bayesian game theory. In addition to an extremely conservative or aggressive choice, a probing behavior is generated using the proposed method based on the cost and the relative aggressiveness probability. To evaluate the opponent's aggressiveness, an observation model is established and customized experimentally. Additionally, the driver's probing strategy generation method is developed to test the true aggressiveness of the background vehicle. The strategy is reflected in the vehicle's behavior through a proposed Markov method. Next, the proposed methodology is compared with commonly used approaches and the state-of-the-art literature. The comparison indicates that our method concurs with previous studies and can generate more complex and human-like behavior. Finally, the human-likeness of our algorithm is evaluated using the Turing test. The test results indicate that the participants cannot distinguish human behavior from that generated by our algorithm.
Although the proposed method is designed for scene generation, it may also shed some light on autonomous driving algorithms. One of the major challenges in autonomous driving is the uncertainty of traffic. Instead of passively accepting the probability, we may actively take a few small steps to reduce the entropy without compromising safety, as specified in this article. Current research focuses on prediction accuracy and learning convergence, which is essentially a trade-off between perception/computation burden and accuracy: the more data available and the more powerful the computer, the better the decision. Thus, we might eventually predict the future and obtain the best decision; however, this demand is endless. The proposed decision algorithm, together with the prediction and aggressiveness estimation methodology, is simple, direct, and computationally efficient because we do not insist on the globally best decision. The same holds for normal human drivers: when they are confused, they attempt small corrections, which are simple but powerful.
Additionally, the proposed Turing test framework might be applied to the evaluation of autonomous driving algorithms. This unified and objective method can be used by other researchers and manufacturers to assess human-likeness when developing self-driving algorithms.
Our future work will focus on the human control level. Other human behaviors, such as distraction and control latency, will also be considered to achieve more human-like behavior in autonomous driving tests. Furthermore, the Markov method will be replaced with a better approximator that can be even more tightly connected to the strategy. Moreover, we will develop a more general Turing test procedure with more participants.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.