Social Interaction-Aware Dynamical Models and Decision Making for Autonomous Vehicles

Interaction-aware Autonomous Driving (IAAD) is a rapidly growing field of research that focuses on the development of autonomous vehicles (AVs) capable of interacting safely and efficiently with human road users. This is a challenging task, as it requires the autonomous vehicle to understand and predict the behaviour of human road users. This work surveys the current state of IAAD research. Commencing with an examination of terminology, attention is drawn to challenges and existing models employed for modelling the behaviour of drivers and pedestrians. Next, a comprehensive review is conducted of various techniques proposed for interaction modelling, encompassing cognitive methods, machine learning approaches, and game-theoretic methods. The survey concludes with a discussion of the potential advantages and risks associated with IAAD, highlighting key research questions that warrant future exploration.


Introduction
In the past few years, there has been an increasing interest in the development of technology for AVs, as recent advancements in Robotics and Machine Learning have enabled Autonomous Driving (AD) engineers to develop algorithms that can tackle the complexity of the autonomous driving task. AVs have the potential to improve traffic quality, reduce traffic accidents, and improve the quality of time spent while travelling [1]. Nowadays, more and more AVs are being deployed into the real world, sharing the environment with other human road users. This has raised concerns that AVs may not be able to understand and interact smoothly with other human road users, potentially leading to traffic dilemmas and safety issues [2]. In order to operate in an efficient and safe manner, AVs need to behave in a human-like fashion and generate optimal behaviours that take the interactions with other human road users into account [3]. This is critical for the reduction of potential traffic conflicts. For example, cautious but unnecessary stops at intersections might cause rear-end accidents. In order to develop fully automated vehicles, advances in many aspects of AV technology are required, ranging from perception and decision-making to planning and control [4,5]. When it comes to predicting the behaviour of surrounding human road users and making decisions accordingly, the interactions with surrounding human road users become increasingly important, as the AV's actions affect their behaviour and vice versa [6]. The purpose of this paper is to provide an exhaustive survey of state-of-the-art techniques in interaction-aware motion planning and decision-making in the context of autonomous driving. In particular, the text first covers human road user behavioural models to highlight what factors influence the decisions that human road users make on the road. Driver and Pedestrian Behavioural Models are relevant to AVs for multiple reasons. Firstly, they can be used to assess and predict what road users that surround the AV will do. Secondly, they can aid in the development of human-like AV behaviour. Therefore, they hold both predictive value and add relevant insights for model/system design. This review consists of 5 main sections which cover different areas in interaction-aware autonomous driving. The terminology used in interaction-aware autonomous driving is introduced in Section 2. Please refer to Figure 1 for an overview of the paper structure. Section 3 will cover human factors studies on what affects human decision making while driving, as well as pedestrian behavioural studies. Section 4 gives a broad overview and classification of existing techniques that are used in interaction modelling. Finally, Sections 5 and 6 cover state-of-the-art techniques used for motion-planning and decision-making in interactive scenarios.
While autonomous driving has been an active research area in recent years, most work focuses on scenarios involving only vehicles. Less work addresses heterogeneous scenarios, which include both vehicles and pedestrians. In this paper, the focus is on heterogeneous scenarios, but Sections 5 and 6 will also cover related work that deals with scenarios without pedestrians. This is because the techniques used in these papers can be easily adapted to mixed traffic scenarios, or they can offer important insights into how to deal with the general problem of mixed traffic scenarios.

Terminology in Interaction-Aware Autonomous Driving
Before discussing the recent advances in interaction-aware motion-planning and decision-making, the paper first defines some of the terminology used in this field. In the field of autonomous driving, the term ego-vehicle refers to the specific vehicle whose behaviour is to be controlled and studied. All other vehicles, cyclists, pedestrians, etc., that occupy a region of space around the ego-vehicle are treated as interactive obstacles and are referred to as surrounding traffic participants, see Figure 2a. Since road traffic is unlikely to become fully automated in the near future, AVs will inevitably operate in environments mixed with human road users (HRUs), such as human drivers and pedestrians. Therefore, interaction-aware autonomous driving is a field of research that focuses on developing AVs that can safely and efficiently interact with surrounding HRUs. Traditional autonomous driving approaches often treat surrounding HRUs as dynamic obstacles. However, this is not a realistic approach, as HRUs constantly change their behaviour to adapt to the current situation.

Figure 2: a) the ego-vehicle is controlled by the autonomous system, whereas surrounding traffic participants act on their own will; b) two agents interacting with each other determine an area of conflict. Reproduced with permission. [Ref.] Copyright Year, Publisher.
Generally, multiple surrounding HRUs can give rise to space-sharing conflicts amongst themselves or with the ego-vehicle: a situation from which it can be reasonably inferred that two or more road users intend to occupy the same region of space at the same time in the near future, see Figure 2b. The agents involved in the conflict are said to display an interactive behaviour, which implies that their behaviour would have been different if the space-sharing conflict had not occurred [7]. Moreover, interaction does not necessarily involve conflict. It can also take the form of explicit or implicit communication that indicates a road user's intentions and affects other HRUs. For instance, drivers could plan their driving strategy based on the turn signals of vehicles in front, even though the ego-vehicle and the vehicles in front are not in the same lane and there will be no conflict in the near future. Hence, interactive behaviour refers to the different courses of action road users take, adapting to the behaviour of others or making requests for reactions and taking actions to achieve their desired goals [8]. Since interactions happen all the time when driving, it is crucial that the algorithms developed for AVs be aware of the dynamics of the interactions between agents. Such algorithms are said to be interaction-aware and are often the focus of recent autonomous driving research [9].
The development of safe and socially acceptable interaction-aware autonomous driving systems is currently hampered by a number of challenges [10]. One challenge is a lack of innovative theories about how HRUs interact [11]. This is a difficult task, as the theories to be developed are not limited to predicting and modelling HRUs' behaviour but must also explore behaviour patterns and their underlying mechanisms. Integrating AVs into road traffic as seamlessly as humans would require more advanced behaviour theories and models. Another challenge is the need to develop algorithms that can safely and efficiently interact with other HRUs and produce AV behaviour that conforms to human-like standards. Figure 3 shows the main parts that make up an AV system. Raw data from sensors is processed by a Perception Module, which detects the surrounding environment and performs localisation, which allows the generation of a global route plan for the ego-vehicle to reach its target destination. The scene can be further interpreted, and predictions regarding surrounding traffic participants can be performed. Interaction-aware models play a major role in prediction tasks, as agents affect each other's trajectories and decisions.
Decision-making and path-planning are two of the most important tasks in autonomous driving. They are responsible for determining how the vehicle will move through its environment. Decision-making is the process of choosing an action from a set of possible options. For example, the vehicle may need to decide whether to change lanes, slow down, or stop. Path-planning is the process of generating a safe and feasible trajectory for the vehicle to follow. Decision-making and path-planning are closely related. The decision-making process typically outputs a high-level plan, such as "change lanes to the left." The path-planning process then takes this plan and generates a detailed trajectory that the vehicle can follow. Both tasks must take into account the vehicle's current position, the vehicle's capabilities, and the surrounding traffic, which is why interaction-aware models are highly relevant to these two tasks. From a control system perspective, the dynamics of the vehicle are represented by its states, i.e. position and orientation, and their time derivatives. The state of the environment is determined by the states of all dynamic and static entities. The strictly physical state-space can also be augmented with additional latent-space variables that capture, for example, the intentions [12] or the behavioural preferences of surrounding users [13], which are part of the Scene Understanding system.
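As a minimal sketch of this state augmentation, a road user's physical state can be extended with latent variables; the field names below (`p_yield`, `aggressiveness`) are hypothetical placeholders for the intention and behavioural-preference variables discussed in [12,13], not names taken from those works:

```python
from dataclasses import dataclass

@dataclass
class PhysicalState:
    # Strictly physical state: position, orientation, and speed
    x: float
    y: float
    heading: float
    speed: float

@dataclass
class AugmentedState(PhysicalState):
    # Latent variables (hypothetical names) that a Scene Understanding
    # module might estimate rather than observe directly
    p_yield: float = 0.5         # belief that the agent will yield
    aggressiveness: float = 0.0  # behavioural-preference score

# Example: a surrounding driver moving at 8 m/s, believed likely to yield
s = AugmentedState(x=0.0, y=0.0, heading=0.0, speed=8.0, p_yield=0.8)
```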

Human Behaviour Studies on Interactions
This section synthesizes empirical and modelling research findings on HRU behaviour, including that of human drivers and pedestrians interacting with AVs or conventional vehicles, especially from a communication perspective. The focus is on research involving road interactions, with the aim of discovering insights that may facilitate the development of interaction-aware AVs. Studies that look at macro-traffic conditions, such as the influences of route choice, weather, or regulation, are beyond this paper's scope.

Driver Behaviour Studies
Driver behaviour models are used to predict and understand how drivers will behave in different driving scenarios. These models can be used to improve the safety and efficiency of transportation systems and aid in the process of designing AVs. Many different factors can affect driving behaviour, including individual characteristics (age, gender, personality, experience), environmental factors, i.e. road and weather conditions, and social factors, which include the driver's interactions with HRUs [14]. A comprehensive overview of Driver Behavioural Models (DBM) in vehicle-vehicle interactions can be found in [15]. Here, the focus will be on DBMs that are relevant to vehicle-pedestrian interactions.
The most common driver behavioural models include:

• Driver's Risk Field Models: (Figure 4a) This model predicts how drivers will perceive risk in different driving situations. The DRF model is based on the idea that drivers make decisions based on their perception of risk. The results of [16] suggest that driving behaviour is governed by a cost function that takes into account the effects of noise on human perception and actions, similar to how motor-control tasks are governed by cost functions. Risk perception on board AVs has also been analysed in [19] in driving simulator scenarios.
• Theory-Based Models: (Figure 4b) perceptual and cognitive models. Models based on perceptual information describe driver behaviour based on perceptual cues, e.g. distance, vehicle speed, acceleration, expansion angle, reaction times, etc. [17,20]. Cognitive models outline the internal state flow and motives that regulate the driver's behaviour as a psychological human being [21,11].
• Data-Driven Models: (Figure 4c) this set of methods relies on analysing naturalistic driving data with machine learning to analyse driver behaviour. Data-driven models can learn generative or discriminative models of human behaviour [22,23,18] to make predictions about the driver's future decisions or preferred driving style. Model validation can be done by comparing predictions with real data and by human-in-the-loop simulations.
Existing research based on naturalistic driving data analyses highlights how drivers behave in the presence of pedestrians. In [24], the authors found that drivers tend to maintain smaller minimum lateral clearance and lower overtaking speed when overtaking pedestrians who are walking in the opposite direction, on the lane edge, or when oncoming traffic is present. Minimum lateral clearance and time-to-collision were only weakly correlated with overtaking speed. The results in [25] show that vehicle deceleration behaviour is related to the initial Time To Collision (TTC), the subjective judgment of pedestrian crossing intention, vehicle speed, pedestrian position, and crossing direction.
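The TTC measure referenced above is simply the current gap divided by the closing speed between the two road users; a minimal sketch (function name and units are our own):

```python
def time_to_collision(gap_m: float, v_vehicle_mps: float,
                      v_other_mps: float = 0.0) -> float:
    """TTC in seconds: remaining gap divided by the closing speed.
    Returns infinity when the gap is not closing (no projected collision)."""
    closing_speed = v_vehicle_mps - v_other_mps
    if closing_speed <= 0.0:
        return float('inf')
    return gap_m / closing_speed

# A vehicle 30 m away closing at 10 m/s has a TTC of 3 s.
```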
Less attention has been paid to multi-agent settings where multiple vehicles and pedestrians interact with each other. In [26], the authors develop a Multi-agent adversarial Inverse Reinforcement Learning (IRL) framework, based on data collected at a road intersection, to simulate driver and pedestrian behaviour at intersections.
Overall, DBMs are a promising area of research with the potential to significantly improve the safety and efficiency of transportation systems. However, there is still much work to be done in developing and validating these models. Future research should focus on developing more comprehensive models that take into account a wider range of factors, such as the driver's internal state, the environment, and the interactions with other HRUs.

Pedestrian Behaviour Studies
Since pedestrians are considered the most vulnerable road users, lacking protective equipment and moving more slowly than other road users [27], investigating pedestrian behaviour is clearly relevant to the safety and acceptance of AVs interacting with pedestrians. Pedestrian behaviour has been the subject of extensive research for decades [28]. The emergence of AVs has recently prompted many new research questions about pedestrian behaviour. Given the large body of work in this area and our aims, this Section examines major studies rather than providing an exhaustive survey. The review covers pedestrian behaviour studies regarding interactions with vehicles from three perspectives: communications, theories and models of crossing behaviour, and AV-involved applications. The aim is to identify and summarize their value for developing interaction-aware AVs.

Communications
In dynamic traffic environments, road users intentionally or unintentionally convey signalling information to one another through their movements and spatial cues, giving rise to both explicit and implicit communication. Research findings concur that AVs' kinematics and signalling information significantly impact pedestrian road behaviour due to the absence of a driver role [7,29,30]. Therefore, the identification of critical motion cues and signals affecting pedestrian road behaviour holds substantial research significance (see Figure 5a). Implicit communication signals, such as vehicle kinematic cues, involve road user behaviour that affects its own movement but can be interpreted as cues of intention or movement by another road user [7]. The distance or TTC between approaching vehicles and pedestrians is the most critical implicit information influencing pedestrian behaviour [31,32]. Evidence showed that pedestrians tend to rely more on distance than TTC [33]. That is, for the same TTC, more pedestrians crossed when vehicles approached at higher speeds. A recent study showed pedestrians used multiple sources of information from vehicle kinematics instead of relying on one; the impacts of speed, distance, and TTC on pedestrian behaviour were mutually coupled [34].
Braking manoeuvres are another critical source of implicit information influencing pedestrian behaviour in interactions. Vehicle movements correlate with pedestrian trust toward vehicles and emotion, and impact pedestrian decisions [35,36,37]. Pedestrians felt comfortable and started crossing quickly when approaching vehicles slowed down early and braked lightly. Harsh braking led to pedestrian avoidance behaviour [35,38,39]. On the other hand, early braking manoeuvres and strong pitching reduced the time required for pedestrians to understand the vehicle's intentions [40,41]. Vehicles that approached pedestrians slowly while yielding could hinder understanding [35,37].
Traffic characteristics, such as traffic volume and gap sizes, provide implicit information to pedestrians. High traffic volume forced pedestrians to accept small traffic gaps [42], as the increased time cost increased their propensity for risk-taking [43]. However, substantial evidence indicated that pedestrians who tended to wait were more cautious and less likely to accept risky gaps [44,45,46]. The relationship between traffic volume and pedestrian crossing behaviour is context-dependent, potentially influenced by the size and order of gaps in traffic [47].
Furthermore, pedestrian movement toward the road, presence at the curb, and pedestrian head orientation can convey key implicit information to approaching vehicles [48,49]. Pedestrians often assert their right of way by stepping onto the road or looking at approaching vehicles [30].
Explicit communication signals involve road user behaviour that conveys signalling information to other road users without affecting one's own movement or perception [7]. A common case is vehicles transmitting information to pedestrians through an external human-machine interface (eHMI). In the context of AVs, where there is no human driver, eHMIs have gained importance. Substantial evidence supports eHMIs' benefits in pedestrian interactions with AVs [50,35,51]. Various types of eHMI prototypes have been proposed, such as headlights [52], light bands [36], and anthropomorphic symbols [53], but consensus on the best eHMI form and information to convey remains elusive.
Numerous studies demonstrated that eHMI performance depends on various factors. Pedestrians' familiarity with, trust in, and interpretation of eHMIs can significantly impact eHMIs' effectiveness in communicating information to pedestrians. For instance, pedestrians better understood conventional eHMIs (flashing headlights) as signals for vehicles yielding than novel eHMIs (light bands) [36]. Pedestrians over-trusting eHMIs might lead them to under-rely on vehicle motion cues, which is dangerous if eHMIs fail [54]. Egocentric information transmitted by the eHMI, like "OK TO CROSS", was more compelling to pedestrians than allocentric information like "STOPPING" [55]. Moreover, the reliability of eHMIs has been questioned, as it might be affected by weather [56], light conditions [29], and vehicle behaviour [35]. For example, in poor weather, pedestrians could not always read vehicle signs [57]. Pedestrian willingness to cross was unaffected by eHMIs when vehicles did not yield or decelerated aggressively [35,50]. Other concepts, e.g., installing eHMIs on road infrastructure instead of vehicles [58] and combining eHMIs with vehicle motion cues [59,60], could outperform pure eHMIs.
Additionally, from the vehicles' perspective, although less frequently, pedestrians also use explicit signals to communicate with AVs. These signals include eye contact and hand gestures, which are used by pedestrians to ensure they are seen by AVs and to request the right of way [61,7,62]. To address the absence of a human driver, AVs could utilise a human-like visual embodiment in the driver's seat and wireless communication technology to enhance vehicle-pedestrian communication [63,64,65].

Theories and models of crossing behaviour
Pedestrian crossing behaviour involves various cognitive processes. Previous studies [57,66,47] suggested that there might be three levels of processes involved in constructing pedestrian crossing behaviour in interactions, i.e., perception, decision, and initiation and motion. In light of this hypothesis, the following sections synthesise the theories and models of pedestrian crossing behaviour regarding these three cognitive processes (Figure 5b).
Perception The visual perception theory, as established by Gibson [71], explains that as an object approaches an observer, its image on the retina expands, forming the basis for human collision perception. In crossing scenarios, when the rate of image expansion of a vehicle on the retina reaches a certain threshold, pedestrians perceive that the vehicle is approaching, known as the visual looming phenomenon [72]. A psychophysical model simplifies this expansion rate as the change in the visual angle subtended by the approaching vehicle at the pedestrian's pupil, denoted as θ (Figure 6a) [73,34]. Recent research suggested that pedestrians use θ as a crucial visual cue to observe approaching vehicles [34,11]. However, while θ provides spatial information, it does not convey when the vehicle arrives at the pedestrian's position [74]. In crossing scenarios with yielding vehicles, pedestrians need temporal information to estimate whether the vehicle can stop in time. Lee's mathematical demonstration [73] showed that the visual cue τ, representing the ratio of θ to its rate of change θ̇, may specify the TTC of the approaching vehicle. Moreover, the first temporal derivative of τ, denoted as τ̇, is relevant for detecting whether the current deceleration rate is sufficient to avoid a collision [75]. Furthermore, it was found that pedestrians might visually perceive oncoming collision events under a given angle, i.e., the bearing angle, which is the angle between the vehicle and the pedestrian's line of regard [76] (Figure 6b).
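The looming cues above can be made concrete with a small sketch. The geometry (a vehicle of width w approaching the pedestrian head-on at constant speed) and the function names are illustrative assumptions rather than the formulation in [73]; note that for small visual angles Lee's τ (θ divided by its rate of change) approximates the true TTC:

```python
import math

def visual_angle(width_m: float, distance_m: float) -> float:
    # theta: angle (rad) subtended by a vehicle of given width at the pupil
    return 2.0 * math.atan(width_m / (2.0 * distance_m))

def tau(width_m: float, distance_m: float, speed_mps: float) -> float:
    # tau = theta / theta_dot; for a head-on approach at constant speed,
    # theta_dot = width * speed / (distance**2 + width**2 / 4)
    theta = visual_angle(width_m, distance_m)
    theta_dot = width_m * speed_mps / (distance_m ** 2 + width_m ** 2 / 4.0)
    return theta / theta_dot

# For a 1.8 m wide vehicle 40 m away at 10 m/s, tau is close to the
# true TTC of 4 s, since the visual angle is still small.
```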
In addition to visual cues, pedestrian perception may depend on perceptual strategies. Research by Tian et al. [37] indicated that pedestrian estimation of vehicle behaviour might be a separate process or a sub-process of crossing decision-making. When there is a large traffic gap, pedestrians tend not to rely on vehicle driving behaviour but rather on gap size. Similarly, Delucia [74] indicated that when collision events are distant, humans tend to use 'heuristic' visual cues, such as θ and θ̇. However, as collisions become imminent, optical invariants like τ dominate perception, providing richer spatiotemporal information.
Besides perception mechanisms, various factors can influence pedestrians' perception. Studies showed that elderly or child pedestrians face a higher collision risk due to age-related perceptual limitations [31,77]. Elderly pedestrians tended to rely more on distance than TTC to judge approaching vehicles, while children struggled to detect vehicles approaching at high speeds [33,78]. Distractions, particularly those involving visual and manual components like smartphone usage, diverted significant attention resources and affected pedestrian observation of traffic conditions [79]. In comparison, cognitive distractions, such as listening to music, might not significantly impact pedestrian perception [44].
Decision At uncontrolled crossings without signal lights, pedestrians often interact with yielding or non-yielding vehicles [36,31,80]. In non-yielding scenarios, pedestrians usually make crossing decisions by evaluating gaps between approaching vehicles, known as gap acceptance behaviour (GA) [81]. This concept led to the development of critical gap models, including the models by Raff [82], HCM2010 [83], and Rasouli [84]. Alternatively, binary logit models treat crossing decisions as binary variables, utilising machine learning algorithms like Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Logistic Regression (LR) [85]. For example, Kadali et al. [69] used an ANN to predict crossing decisions based on various independent variables (Figure 6c), while Sun et al. [86] employed LR with variables such as pedestrian age, gender, group size, and vehicle type.
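A binary logit crossing-decision model of the kind described above can be sketched as follows; the predictor set and coefficient values here are illustrative placeholders, whereas in the cited studies they are fitted to observed pedestrian data:

```python
import math

def p_cross(gap_s: float, speed_mps: float,
            b0: float = -4.0, b_gap: float = 1.2, b_speed: float = -0.1) -> float:
    """Logistic (binary logit) crossing-decision model: probability of
    accepting a traffic gap given its duration and the vehicle speed.
    Coefficients b0, b_gap, b_speed are made-up illustrative values."""
    z = b0 + b_gap * gap_s + b_speed * speed_mps
    return 1.0 / (1.0 + math.exp(-z))

# Larger gaps raise the predicted crossing probability; higher approach
# speeds lower it (given the sign of the illustrative coefficients).
```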
In scenarios involving yielding vehicles, crossing decisions tend to follow a bimodal pattern referred to as bimodal crossing behaviour (BC) [36,87,37]. Pedestrians prefer to cross when traffic gaps are sufficiently large or when vehicles are about to stop [87]. However, making decisions in such scenarios can be challenging due to the contrasting relationships between decision cues and collision risk, with collision risk negatively correlated with traffic gap and positively correlated with vehicle speed [37]. Zhu et al. [70] clustered crossing decisions into three groups: crossing, dilemma condition, and waiting, based on vehicle speed and distance (Figure 6d). Moreover, Tian et al. [67] assumed pedestrians applied different decision-making strategies according to BC behaviour and modelled crossing decisions as responses to different visual cues.
While the mentioned approaches model crossing decisions based on observed behaviour patterns, other models delve into the psychological mechanisms that underlie these decisions. Specifically, Tian et al. modelled pedestrian GA behaviour based on pedestrian visual cues [34] and extended it to yielding scenarios with a more complex visual perception mechanism [67]. Wang et al. utilised a Reinforcement Learning (RL) model to capture pedestrian crossing behaviour based on limited perception mechanisms [88]. Furthermore, a class of models, namely Evidence Accumulation (EA) models, such as the drift-diffusion model, proposed that crossing decisions result from the accumulation of visual evidence and noise, with the decision determined once a certain threshold is reached [87,89,90,91]. The work in [11] integrated large-scale psychological theories to explain pedestrian crossing decisions in detail (Figure 6e). Additionally, game theory has also been applied to model crossing decisions when pedestrians negotiate the right of way with vehicles. Conventional game theory [92], the Sequential Chicken (SC) game [93], and the Dual Accumulator (DA) game [94] were utilised to characterise dynamic crossing decisions.
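A single trial of a drift-diffusion (EA) crossing decision can be sketched as below; the two-boundary cross/wait framing and all parameter values are illustrative assumptions, not the calibrated models of [87,89,90,91]:

```python
import random

def ddm_cross_decision(drift: float, threshold: float = 1.0,
                       noise_sd: float = 0.3, dt: float = 0.05,
                       max_t: float = 10.0, rng=random):
    """One simulated drift-diffusion trial: noisy evidence accumulates
    until it hits +threshold ('cross') or -threshold ('wait').
    Returns (decision, decision_time_in_seconds)."""
    evidence, t = 0.0, 0.0
    while t < max_t:
        # Euler-Maruyama step: deterministic drift plus Gaussian noise
        evidence += drift * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        t += dt
        if evidence >= threshold:
            return "cross", t
        if evidence <= -threshold:
            return "wait", t
    return "wait", t  # time out: treat as waiting

# With a strongly positive drift (clear crossing opportunity), most
# trials terminate at the 'cross' boundary.
```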
Environmental variability and pedestrian heterogeneity further complicate crossing decision modelling. For example, crossing multiple lanes often involves pedestrians waiting at lane lines and accepting traffic gaps successively, known as rolling gap behaviour [95,96]. Pedestrians waiting at lane lines might be more likely to accept smaller traffic gaps than those waiting at curbs [95,43]. Another complex scenario is crossing a two-way road, which is physically and cognitively demanding, as pedestrians need to consider vehicles on both sides [97]. Similarly, crossing at intersections with dense continuous traffic is also challenging, as pedestrians need to anticipate crossing gaps upstream of traffic and make trade-offs between safety and time efficiency [98]. Typically, it was assumed that as waiting time increases, pedestrians tend to accept riskier crossing opportunities [43,84]. However, the latest evidence suggested that pedestrians who tended to wait were more cautious and less likely to accept risky gaps [45,44,46]. Regarding pedestrian heterogeneity, ANN and LR models were applied to characterise the impact of age on crossing decisions in [99,69]. Distractions, such as cellphone usage, can also influence pedestrian crossing decisions [100,101]; [69] applied ANNs to model the impact of cellphone usage on crossing decisions. Furthermore, pedestrians often cross the road in a group, exhibiting herd behaviour. This behaviour was described by the tendency of group members to maintain a certain distance from the group centre [102]. An EA model was used in [103] to characterize information cascades in group decision-making, taking into account the influence of previous agents' decisions.
Initiation and motion Crossing initiation time (CIT) represents the time it takes for pedestrians to begin crossing the road, reflecting the dynamic nature of their decisions [111]. Generally, CIT is the duration between the moment when a crossing opportunity becomes available and when the pedestrian begins to move [34,31]. Drift-diffusion theory posits that CIT is influenced by the accumulation of noisy evidence in the cognitive system, reflecting the efficiency of pedestrian cognitive and locomotor systems [112]. Various factors can affect CIT, including vehicle kinematics, age, gender, and distractions. Pedestrians tended to initiate crossing more slowly when faced with higher vehicle speeds [34]. Furthermore, female pedestrians tended to initiate crossings more quickly than males [113], and the elderly tend to initiate sooner than young pedestrians [45]. The impact of distractions on CIT depends on their components [44].
In scenarios where pedestrians face non-yielding vehicles, the risk of collision increases as the distance between the vehicle and the pedestrian decreases. Therefore, pedestrians typically make rapid decisions by assessing "snapshots" of approaching vehicles [36,31]. The distribution of CIT in these scenarios is often concentrated and right-skewed [89]. Response time models, such as the Ex-Gaussian and Shifted Wald (SW) distributions, were used to model CITs in these situations [114]. For instance, [115] modelled CITs as variables following the SW distribution (Figure 7a).
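A shifted Wald CIT can be sampled as a fixed non-decision shift plus a Wald (inverse-Gaussian) draw, which yields the concentrated, right-skewed shape described above; the parameter values below are illustrative, not fitted to any of the cited datasets:

```python
import numpy as np

def sample_cit_shifted_wald(n: int, shift_s: float = 0.3,
                            mean_s: float = 0.8, scale: float = 2.0,
                            seed: int = 0) -> np.ndarray:
    """Sample n crossing-initiation times (seconds) from a shifted Wald
    distribution: a non-decision shift plus a Wald-distributed decision
    component. All parameter values are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    return shift_s + rng.wald(mean_s, scale, size=n)

cits = sample_cit_shifted_wald(1000)
# The samples are bounded below by shift_s and right-skewed.
```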
In vehicle-yielding scenarios, as discussed in Section 3.2.2, CITs exhibit a bimodal distribution [87]. For the early group of CITs, the distribution is similar to that in non-yielding scenarios, as pedestrians employ similar decision-making strategies [37]. However, for the late group, the distribution is complex and cannot be described by standard response time distributions [87]. EA models with time-varying evidence have been proposed to address this complexity, allowing for the generation of CIT distributions with intricate shapes [87,89] (Figure 7b). Moreover, [67] modelled CITs in vehicle-yielding scenarios using the joint distribution of response time models. Additionally, [88] applied an RL model to learn the crossing initiation patterns of pedestrians.
After pedestrians initiate their crossings, they need to traverse the road. Walking is a key part of crossing behaviour and is influenced by many factors, such as the presence of approaching vehicles, infrastructure, pedestrian age, and distractions. Pedestrians adjust their walking trajectories to avoid vehicles [105]. In multi-lane crossings, they tend to move to and wait at lane lines, accepting traffic gaps in each lane sequentially [95]. Pedestrian walking speeds at crossings are typically faster than normal walking speeds in other scenarios [116]. While gender has no significant effect on walking speeds, teenagers and the elderly walk slower than young and middle-aged adults [116,117]. Distractions, such as cellphone use, can reduce pedestrian walking speeds [44].
Walking behaviour can be simulated using microscopic pedestrian motion models, including Cellular Automata (CA) models, Social Force (SF) models, and learning-based approaches. CA models are discrete in space, time, and state, making them suitable for simulating complex dynamic systems such as pedestrian-vehicle interactions [109,108]. SF models, based on Newton's second law, were utilised to simulate pedestrian-vehicle interactions and large-scale pedestrian flows [118,105,102] (Figure 7c). An SF model was used in [106] to simulate the crossing behaviour of pedestrian crowds in complex interaction scenarios involving low-speed vehicles.
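A one-step update of a minimal social-force model can be sketched as follows; the goal-attraction and exponential-repulsion terms follow the generic Helbing-style formulation, and all parameter values are illustrative defaults rather than calibrated ones:

```python
import numpy as np

def social_force_step(pos, vel, goal, others, dt=0.1,
                      desired_speed=1.4, relax_t=0.5, A=2.0, B=0.3):
    """One Euler step of a minimal social-force model: a driving force
    relaxing the velocity toward the goal direction at the desired
    speed, plus exponential repulsion from other agents' positions.
    A, B, relax_t are illustrative parameter values."""
    direction = goal - pos
    direction = direction / np.linalg.norm(direction)
    f_drive = (desired_speed * direction - vel) / relax_t
    f_rep = np.zeros(2)
    for other in others:
        diff = pos - other
        dist = np.linalg.norm(diff)
        f_rep += A * np.exp(-dist / B) * diff / dist
    vel = vel + (f_drive + f_rep) * dt
    return pos + vel * dt, vel
```

Iterating this step from rest moves the simulated pedestrian toward the goal at roughly the desired speed while it is deflected by nearby agents.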
In contrast to the above white-box models, there are black-box models based on learning-based approaches, which learn pedestrian walking behaviour from naturalistic datasets or in pre-defined environments. For example, [107] employed ANNs to learn pedestrian walking behaviour by incorporating the relative spatial and motion relationships between pedestrians and other objects extracted from videos. [119] used the outputs of an SF model as inputs to ANNs to simulate multiple pedestrians' walking behaviours. [110] proposed a Long Short-Term Memory (LSTM) pedestrian trajectory prediction model (Figure 7d). Additionally, RL and IRL models were also applied to model pedestrian walking behaviour: [120] applied an RL model to learn multiple pedestrians' walking behaviour in an SF environment, and [26] developed an IRL model to learn pedestrian walking behaviour from video datasets.

AV-involved applications
In recent years, there has been a growing interest in studying the interactions between Autonomous Vehicles (AVs) and pedestrians. This interest has led to a multitude of studies that apply pedestrian crossing behaviour theories and models to enhance or assess the performance of AVs in these interactions (Table 2).
One prevalent approach is the use of learning-based methods, which learn pedestrian intention and trajectory from real-world datasets to aid AVs' decision-making. For instance, [121] proposed a Graph Convolutional Neural Network-based pedestrian trajectory prediction model, which considered past pedestrian trajectories to predict both deterministic and probabilistic future trajectories for a range of AV use cases. Other similar models aimed to improve prediction accuracy by considering the social context of interactions. For example, [110] proposed an LSTM pedestrian trajectory prediction model, which considered past trajectories, pedestrian head orientations, and distance to the approaching vehicle as inputs [29]. In addition, there are studies aiming to anticipate pedestrian crossing intentions. [122] applied SVM, LSTM, and ANN separately to predict pedestrian crossing intentions.
Learning-based approaches have proven effective in predicting pedestrian trajectories and intentions. However, these models demand substantial data for robust performance, limiting their scalability when dealing with interaction cases lacking sufficient data. Moreover, the black box nature of these models can make it challenging to interpret the generated trajectories and intents, which poses a challenge for AV decision-making modelling [68]. To address these issues, expert models have been developed. For example, the SF model has been modified to predict pedestrian trajectories for AVs by incorporating more interaction details, such as TTC and the interaction angle between vehicles and pedestrians [99,68]. Moreover, SF and CA models have also been embedded in AV decision modules to represent pedestrian crossing behaviour and guide AV decisions in interactions with pedestrians [123,124,125].
Furthermore, crossing decision models have also been applied in AV research. For example, [124] employed crossing critical gap models to characterise pedestrian crossing decisions in their AV decision module. [70] applied their speed-distance model to design defensive and competitive interaction behaviour for AVs. [99,123] used LR models as pedestrian crossing decision models in their proposed AV decision-making modules. To enhance the dynamic and interactive nature of crossing decisions, game theoretical models were used to model crossing decisions when negotiating the right of way with AVs [93,125].
Researchers also attempted to use pedestrian perception theories or models to design AV decision-making strategies. For instance, [17] simulated AV-pedestrian coupling behaviour using visual cues, τ and bearing angle, based on control theory. [68] modelled the right of way of AVs and pedestrians using the bearing angle.

Interaction Modelling
Interaction modelling techniques are relevant to a huge variety of autonomous driving tasks, ranging from traffic forecasting to AV planning and decision-making. Understanding and modelling social interactions in autonomous driving is essential for predicting scene dynamics and ensuring safe AV behaviour. Accurate predictions enhance safety, while misunderstood AV behaviour can lead to accidents. Additionally, comprehending the social impact of AV actions can influence surrounding traffic, such as encouraging pedestrian crossings through early stopping. As interaction modelling techniques can be utilised across different task domains, the focus here is on categorising existing techniques regardless of the specific driving task they were designed for.
An introduction to the classification of interaction modelling techniques is now presented. The first distinction can be made between learning-based methods and model-based methods. There has been extensive research in autonomous driving that makes use of machine-learning and deep-learning based techniques [126]. In a learning-based approach, a model is learned from an extensive dataset. This set of methods does not require any prior knowledge of the system. Data-driven methods are trained on a dataset of examples and are then used to make predictions or decisions. On the opposite side of the spectrum, model-based methods start with a theoretical understanding of the system. This a priori knowledge is used to create a mathematical model of the system. Empirical data is then used to validate the model or adjust its parameters to minimise the discrepancy between the model predictions and the data.
A further distinction can be made between methods that explicitly utilise cognitive features of the human mind, which try to explain the rationale behind human actions, and methods that only implicitly try to model interactions, mapping environmental inputs to decisions/actions. Human behaviour studies introduced in Section 3 can serve as a guideline for the development of explicit methods. For instance, game theoretic methods (see Section 6.2) take a more explicit approach by considering traffic participants as rational agents who actively consider each other's actions. On the other hand, as an example of non-cognitive approaches, social force methods offer a more empirical perspective, capturing the impact of one participant on another without explicitly detailing the reasoning that explains the agent's behaviour during the interaction. We propose to distinguish existing modelling approaches based on whether they explicitly or implicitly model the interactions.
Based on these two criteria, four major categories of interaction modelling are identified, and they are reported in Figure 8:

Learning-based Implicit Methods
These types of methods rely on machine learning or deep learning techniques. The interactions are implicitly modelled, which means that the agent's behaviour cannot be explained by the model. The model only learns an input-output mapping from the data. Model learning can be facilitated by exploiting interactive model architectures [127,128,129,130]. In general, deep learning methods that use interaction-specific neural network architectures fall into this category.
In this type of method, the aim is to learn a probabilistic generative model that predicts the agent's future actions a. The model is a probability distribution conditioned on the environment state x, which includes the state of surrounding agents, and a set of learnable parameters θ:

a ∼ p_θ(a | x)    (1)

Learning-based Methods with Cognitive Features
This set of methods relies on explicitly handcrafted interactive features that are used as inputs for a learning-based system. Such interactive features can include TTC, relative distance [131], and looming, reflecting some of the cognitive processes behind human reasoning. For example, in [132], an LSTM that utilises inter-vehicle interactions has been developed to classify surrounding vehicles' lane change intentions. The interaction features are composed of risk matrices which account for the worst-case TTC with vehicles in surrounding lanes and the relative distance. Graph Convolutional Networks also fall into this category, as interaction features can be explicitly modelled in the adjacency matrix of the graph [133,134].
In this type of method, the aim is to learn a probabilistic generative model that predicts the agent's future actions a, similarly to Eq. (1). In this case, the probability distribution is conditioned on the environment state x and on explicitly handcrafted interactive features I(x), which have the purpose of facilitating learning: a ∼ p_θ(a | x, I(x)).

Model-based Non-Cognitive Methods
The modelling is non-cognitive in the sense that it does not actively reason about the cognitive process behind the agent's actions. Methods of this group include SF [106] and potential fields. The interactions are described by potential functions (or social forces), which contain a set of learnable parameters that can be fitted from empirical data. Another set of methods includes driver risk fields, which are based on the hypothesis that driver behaviour emerges from a risk-based field [16,135]. The advantage of model-based non-cognitive methods is that they can be easily interpreted and they can embed domain knowledge, such as traffic regulations and scene context. Some models define a potential field U(x) and take the agent's action to be proportional to its negative gradient, a ∝ −∇U(x). Alternatively, the forces can be modelled directly, so that the gradient operation is not required: a ∝ F(x).
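A minimal sketch of a potential-field model can make this concrete: a quadratic attractive well at the goal plus a Gaussian repulsive bump around an obstacle (both functional forms and gains are illustrative), with the action obtained as the negative finite-difference gradient of the field.

```python
import math

def potential(x, y, goal, obs, k_att=0.5, k_rep=5.0, sigma=1.0):
    """Illustrative field: attractive quadratic well at the goal
    plus a Gaussian repulsion bump centred on the obstacle."""
    att = 0.5 * k_att * ((x - goal[0]) ** 2 + (y - goal[1]) ** 2)
    rep = k_rep * math.exp(-((x - obs[0]) ** 2 + (y - obs[1]) ** 2)
                           / (2 * sigma ** 2))
    return att + rep

def field_action(x, y, goal, obs, eps=1e-5):
    """a ∝ -∇U(x): gradient estimated by central finite differences."""
    ax = -(potential(x + eps, y, goal, obs)
           - potential(x - eps, y, goal, obs)) / (2 * eps)
    ay = -(potential(x, y + eps, goal, obs)
           - potential(x, y - eps, goal, obs)) / (2 * eps)
    return ax, ay
```

Far from the obstacle, the action points almost directly at the goal; near the obstacle, the repulsive gradient deflects it.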

Model-based Cognitive Methods
Model-based cognitive methods describe the reasoning behind human decision-making. Two main sets of methods can be distinguished: utility maximisation models and cognitive models.
In utility maximisation methods, humans are modelled as optimizers that select their actions so as to maximise their future utility:

a* = arg max_a U(a, x)

These methods include game theory and Markov Decision Processes (MDPs). In game-theoretic approaches, agents are modelled as players competing or cooperating with each other, thereby taking into account how they react to each other [136,137]. The framework of game theory offers a transparent and clear-cut solution for modelling the dynamic interactions among human drivers, allowing for an understandable explanation of the decision process. However, it remains hard to maintain computational tractability, as this approach does not scale well with an increasing number of agents. Another possible solution is to model human behaviour as an agent of an MDP, which provides an excellent framework to model decision-making in scenarios where results are influenced by both chance and the decisions made by a decision-maker. Solutions to MDPs can be found with learning-based methods, e.g. DRL algorithms or Monte Carlo Tree Search [138], or with dynamic programming techniques [139].
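The utility-maximisation view can be illustrated with a toy gap-acceptance example; the utility function below is hypothetical and chosen only to make the argmax concrete.

```python
def utility(action, gap_s):
    """Toy utility over a crossing decision given a traffic gap (in
    seconds). Crossing saves time but is penalised for small gaps;
    the functional form and constants are purely illustrative."""
    if action == "go":
        return 1.0 - (5.0 / gap_s if gap_s > 0 else float("inf"))
    return 0.0  # "yield" is the neutral baseline

def best_action(gap_s, actions=("go", "yield")):
    # a* = argmax_a U(a, x), with the gap as the relevant state
    return max(actions, key=lambda a: utility(a, gap_s))
```

With a 10 s gap the utility of crossing is positive and "go" is selected; with a 3 s gap the penalty dominates and "yield" wins.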
The second set of methods aims to capture behavioural motivations behind agents' actions with psychological cognitive processes. This set of methods can include:
• Stimulus-response models [34], where driver or pedestrian actions are determined, for example, by visual stimuli in the retina;
• Evidence accumulation [87], where decisions are described as a result of accumulated evidence;
• Theory of mind, which suggests that humans use their understanding of others' thoughts and behaviours to make decisions. By predicting others' actions and inferring their knowledge, humans can drive effectively and safely [140,141].
In the next sections, each of these classes of interaction modelling will be analysed in greater detail. In particular, cognitive and non-cognitive learning-based methods will be discussed in Section 5. Model-based cognitive methods have already been thoroughly discussed in Section 3, where Social Force and Potential Fields, Driver Risk Field models, Theory of Mind, Stimulus-Response, and Evidence Accumulation models were included. Section 6 will cover Utility-Based methods, which comprise MDPs (Section 6.1) and Game Theory (Section 6.2).

Learning Based Methods
Machine Learning (ML) methods are widely used in autonomous driving for a variety of tasks, including object detection [142], scene understanding [143], path planning and control [144]. By learning from large amounts of data, ML methods can learn to make decisions that are more accurate and efficient than those made by humans [145]. This Section will comprise both implicit and explicit learning-based methods identified in the previous Section and give a more detailed view of relevant papers. An overview of some learning-based methods is shown in Figure 9.
Thanks to recent improvements in neural network learning representations, it is now possible to use end-to-end driving approaches that take as input the raw sensor readings and output control commands, such as steering and throttle, to solve path-planning and control problems [146]. However, it is challenging to learn the entirety of the driving task from high-dimensional raw sensory data (e.g. LiDAR point clouds, camera images), as this involves learning perception and decision-making at the same time. Examples shown in Figure 9 include the model in [147], the probabilistic graphical model in [148], and the end-to-end imitation learning network in [149]. In most of the works, the how-to-act learning process assumes that a scene representation is available to the motion-planning and decision-making module. This effectively splits end-to-end driving into two main blocks: one in which the AV learns how-to-see and one in which it learns how-to-act.
There are two main approaches to end-to-end self-driving for planning and control tasks (how-to-act):
• Imitation Learning: in which an agent learns to mimic the behaviour of an expert [150,151,152].
• Deep Reinforcement Learning (DRL): in which an agent tries to learn how to act in a trial-anderror process that typically takes place in a simulated environment.DRL methods will be analysed in greater detail in Section 6.
Imitation learning is a machine learning paradigm in which an agent learns to perform tasks by imitating the behaviour of expert demonstrators, making it a valuable approach for training autonomous systems and robots. In [151], interactive features are learned by means of a Graph Attention Network (GAT). The input to this network consists of surrounding agents' kinematic information as well as a feature vector that encodes a scene representation coming from a Bird's Eye View. The model is trained on synthetic data generated by an expert driver in the CARLA simulator. Imitation learning methods tend to perform well in scenarios that are similar to the training scenarios but typically fail when the scenarios diverge from the training distribution. Algorithms like Dataset Aggregation (DAgger) [153] can improve the performance of imitation learning policies by augmenting the initial training dataset with human-labelled data for unseen situations. However, asking an expert to label new training samples can be expensive or unfeasible.
In the context of scene understanding and motion prediction, deep neural networks have been extensively used. The authors of [127] proposed a social-pooling operation in their neural network architecture to account for surrounding neighbours in crowd motion prediction. Similarly, [154] made use of a star-topology network with a max-pooling operation to account for interaction features in multi-agent forecasting. CIDNN [128] uses LSTMs to track the movement of each pedestrian in a crowd and assigns a weight to each pedestrian's motion feature based on their proximity to the target pedestrian for location prediction. The study in [129] created a dataset and proposed a framework called VP-LSTM to predict the trajectories of vehicles and pedestrians together in crowded mixed scenes by exploiting different LSTM architectures for heterogeneous agents. A Generative Adversarial Network (GAN) is applied in [130] to sample plausible predictions for any agent in the scene. The shared feature of these methods is the usage of Recurrent Neural Networks that capture spatio-temporal interaction features in conjunction with pooling operations. The pooling operation allows one to account for surrounding agents by mixing the hidden states extracted by the LSTMs. During the social-pooling operation, the hidden states of surrounding agents become features that are used to predict the current agent's motion. Diffusion models are another set of deep-learning techniques with increasing popularity in modelling spatio-temporal trajectories, which can be used for predicting both pedestrian and car trajectories [155].
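The social-pooling idea can be sketched as follows: hidden states of nearby agents are summed into the cells of a spatial grid centred on the target agent, and the flattened grid becomes an interaction feature for the predictor. This is a simplified, dependency-free version; the models cited above pool LSTM hidden states and learn the downstream mapping.

```python
def social_pool(target_pos, neighbours, cell=2.0, grid=2):
    """Sum neighbour hidden states into a (grid x grid) spatial grid
    centred on the target agent, then flatten the grid.

    neighbours: list of ((x, y), hidden_state) pairs; hidden states
    are plain lists here instead of learned LSTM outputs.
    """
    h_dim = len(neighbours[0][1])
    pooled = [[0.0] * h_dim for _ in range(grid * grid)]
    half = grid * cell / 2.0
    for (nx, ny), h in neighbours:
        # shift into grid coordinates; drop agents outside the grid
        rx, ry = nx - target_pos[0] + half, ny - target_pos[1] + half
        if 0 <= rx < grid * cell and 0 <= ry < grid * cell:
            idx = int(ry // cell) * grid + int(rx // cell)
            pooled[idx] = [p + v for p, v in zip(pooled[idx], h)]
    return [v for cell_vec in pooled for v in cell_vec]
```

A neighbour close to the target contributes its hidden state to the corresponding cell, while a distant one is ignored.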
Graph Convolutional Networks (GCNs) have been widely used in trajectory prediction tasks with interacting agents. In these methods, the road structure is represented as a graph, with each node representing a traffic participant. Each node can carry information such as the traffic participant's class (car, truck, pedestrian, etc.), its location, or speed. Explicit interaction can be modelled in the adjacency matrix of the graph, whereas the implicit part consists of the graph convolutional layers. GCNs are widely used in traffic forecasting [156,157,158,159], and have also been recently used in motion planning [160,161,162,163], especially in combination with DRL.
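A single graph-convolution layer can be sketched in a few lines: the adjacency matrix A encodes which agents interact, and each node's features are mixed with its neighbours' before a linear map and nonlinearity. The row-normalisation scheme below is one common choice, shown with pure-Python matrices for illustration.

```python
def gcn_layer(A, H, W):
    """One graph-convolution layer H' = ReLU(D^-1 (A + I) H W).

    A: adjacency matrix (interactions between agents), H: node
    feature matrix, W: weight matrix; all lists of lists of floats.
    """
    n = len(A)
    # add self-loops so each node keeps its own features
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in A_hat]
    # neighbourhood averaging: row-normalised adjacency times H
    AH = [[sum(A_hat[i][k] / deg[i] * H[k][f] for k in range(n))
           for f in range(len(H[0]))] for i in range(n)]
    # linear map followed by a ReLU nonlinearity
    return [[max(0.0, sum(AH[i][k] * W[k][f] for k in range(len(W))))
             for f in range(len(W[0]))] for i in range(n)]
```

With two connected nodes carrying features 1 and 3 and an identity weight, each node's output becomes the neighbourhood mean, illustrating how interaction information flows along graph edges.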
Other machine learning techniques that can be used to model interactions include Gaussian Processes [164] and probabilistic graphical models, including Hidden Markov Models [165,166].

Utility Based Methods
Utility-based agents [167] employ utility functions to guide decision-making, assigning values to possible world states and selecting actions leading to the highest utility. In contrast to goal-based agents, which evaluate states based on goal satisfaction, utility-based agents can handle multiple goals and factor in probability and action cost. Utility-based methods encompass Markov Decision Processes (MDPs) and Game Theoretic Models, which will be analysed in Sections 6.1 and 6.2.

Markov Decision Processes
MDPs are a mathematical framework used to model decision-making problems where the outcomes are partly random and partly under the control of a decision-maker. The modelling framework for MDPs is illustrated in Figure 10. Two main methods exist to solve MDPs: dynamic programming and reinforcement learning [139]. The latter is more commonly used in autonomous driving, as it is more suitable for high-dimensional state spaces.

Reinforcement Learning
Reinforcement Learning (RL) leverages Markov Decision Processes (MDPs) to model complex environments and comprises a set of algorithms to learn policies that maximise the expected reward [139].
Traditionally, Dynamic Programming is a reliable approach for this purpose, iteratively calculating the value of each state, commencing from terminal states and working backwards to the initial state. This method excels in scenarios with modest state spaces. However, it can be computationally burdensome when confronting RL challenges characterised by vast state spaces, such as the domain of Autonomous Driving. More commonly, RL augmented with deep neural networks (DRL) is used. DRL algorithms can be more sample-efficient and scalable than dynamic programming algorithms, but they can also be more complex and difficult to train. For a more detailed survey on DRL applications to autonomous driving, please refer to [144].
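As a reference point for the dynamic-programming approach described above, value iteration on a toy MDP can be written in a few lines; the transition and reward structure in the test is purely illustrative.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Value iteration for a small tabular MDP.

    P[s][a]: list of (prob, next_state) pairs for action a in state s.
    R[s][a]: immediate reward. Returns the converged value function
    and the greedy policy (action index per state).
    """
    n = len(P)
    V = [0.0] * n
    while True:
        # Bellman optimality backup for every state
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            break
        V = V_new
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a] + gamma * sum(p * V[s2]
                                                      for p, s2 in P[s][a]))
              for s in range(n)]
    return V, policy
```

On a two-state example where action 1 in state 0 yields reward 1 and leads to an absorbing zero-reward state, the method recovers V(0) = 1 and the greedy policy picks action 1.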
DRL solutions in autonomous driving will be classified based on the scenario used, the state space representation, the action space, and the algorithm used. Typical state representations used in DRL include (see Figure 11):
• Vector-based representation: information regarding surrounding vehicles, such as position and velocity, is included in a vector of fixed length [168];
• Bird's Eye View (BEV) image: a 2D image representation of the environment surrounding the ego-vehicle from a top-down perspective [169];
• Occupancy grid representation: similar to a BEV image, it is a discrete representation of the environment that surrounds the ego-vehicle. It is a 2D or 3D grid of cells, each of which is assigned a probability of being occupied by an obstacle, as well as segmentation information regarding what type of entity is occupying the cell [170,171].
• Graph representation: a way of representing the state of the environment around an AV as a graph. The nodes in the graph represent objects in the environment, such as vehicles, pedestrians, and traffic lights. The edges in the graph represent the relationships between objects, such as proximity or potential for collision. Graph representations are compact and efficient and are a promising approach to representing the state of the environment [163,162].
Vector-based representation offers a compact and efficient representation of objects at the expense of limiting traffic information to a fixed-size subset of surrounding vehicles. BEV images and occupancy grids offer a simple way to represent the environment with fixed dimensions and can be easily updated. However, they can be inaccurate in environments with high clutter or uncertainty. Graph representation can easily represent the relationships between agents in a compact way. On the other hand, it can be complex and computationally expensive to update the graph as the number of surrounding agents increases.
The action space can be continuous or discrete. Continuous actions usually include the ego vehicle's longitudinal acceleration and steering angle [172]. Discrete actions usually depend on the specific task being solved. For example, in a lane change scenario, discrete actions include a left-lane change, keeping the current lane, or a right-lane change. A lower-level controller regulates the steering and acceleration of the vehicle to execute the manoeuvre [171,173].
Whilst most DRL papers focus on vehicle-only traffic scenes, the number of papers that deal with mixed traffic scenarios or with vehicle-pedestrian interaction is more limited. Some works exist in mobile robot crowd navigation. In [174], DRL is used to navigate a robot in a crowd in a multi-agent setting. In [175], the model in [174] is improved by using attention-based neural networks and social pooling. An autonomous braking system was developed in [176] with a DQN agent. The authors implement a Trauma Memory, which is used to sample from collision scenarios in a way similar to Prioritized Experience Replay (PER) [177]. In [178], a DQN agent is trained to avoid collisions with a crossing pedestrian and is further used to develop an ADAS system to aid drivers in pedestrian collision avoidance scenarios. Deshpande et al. [171,179] used a grid-state representation with four layers. In a similar scenario, the authors in [180] developed a SAC agent with continuous actions. By integrating an SVO component in the reward function, the vehicle can be trained to exhibit different socially compliant behaviours, from pro-social to more aggressive ones.
Deploying Deep Reinforcement Learning (DRL) in real-world scenarios poses a significant challenge and is an open research field. Some studies, like [181], implement DRL policies directly in real-world applications without additional fine-tuning, showcasing their effectiveness in scenarios like unsignalised intersections. Transfer learning, a sub-field of deep learning, is currently being explored to transfer knowledge from simulation to the real world. Two main techniques are domain adaptation and domain randomisation [182]. With domain randomisation, the approach aims to have a sufficiently large training dataset that encompasses the real world as a specific case [183]. With domain adaptation [184], the aim is to learn from a source distribution a model that performs well on a target distribution.
Another issue related to DRL is that learning-based strategies have high training costs and their policies are difficult to interpret semantically. Recently, some researchers have focused on interpretable learning algorithms and lifelong learning algorithms to address these shortcomings [185].

Multi Agent Reinforcement Learning
When multiple RL agents are deployed into the real world and interact with each other, the problem becomes Multi-Agent Reinforcement Learning (MARL). In order to deal with multi-agent systems, multiple approaches are possible. The first approach is to have a centralised controller that manages the entire fleet. By increasing the state dimension to include all vehicles and having a joint action vector, the problem can again become a single-agent problem [186]. The drawback is the increased dimensionality of the state and action spaces, which can make learning more complex. Recently, graph-based representations have been employed to overcome the curse of dimensionality of the problem [186].
Another approach, which takes inspiration from level-k game theory, is to have a single DRL learner but replace some of the surrounding agents with previous copies of itself [173]. This technique is similar to self-play, which is used in competitive DRL scenarios [90]. Finally, the last approach is to formulate the problem with a MARL approach, where multiple learners are trained in parallel. A multi-agent deep deterministic policy gradient (MADDPG) method is proposed in [187], which learns a separate centralised critic for each agent, allowing each agent to have a different reward function. See [188] for an extensive review of MARL. Other applications of MARL in autonomous driving can be found in [170,169,160,189].

Partially Observable Markov Decision Processes
Partially Observable Markov Decision Processes (POMDPs) are a generalisation of MDPs. If the process state s cannot be directly observed by the decision-maker, the MDP is said to be Partially Observable. POMDPs are computationally expensive but provide a general framework that can model a variety of real-world decision-making processes. POMDP applications for Autonomous Driving are becoming increasingly popular thanks to hardware improvements. In [190], a POMDP has been used to navigate a mobile robot in a crowd. The robot keeps a belief over possible future goals for pedestrians. POMDPs have also been utilised for car decision-making in the presence of pedestrians [191]. In a POMDP, the agents surrounding the ego-vehicle are modelled as part of the environment, and a belief vector is used to model their intentions. In [189], the authors develop a multi-agent interaction-aware decision-making policy with DRL. The problem is modelled as a POMDP, and the interaction is modelled with an attention-based neural network mechanism. POMDPs have also been used to solve decision-making problems under environmental occlusions at intersections [192]. Other applications of POMDPs to interactive decision-making can be found in [193,194]. Traditional control methods usually deal with sensor uncertainty and planning in a sequential manner, where the sensor noises and uncertainties are dealt with by a state estimator, and then a deterministic policy is used to determine the action based on the estimated state. POMDPs, on the other hand, do not make such a separation, and the policy is determined based on a belief state. Surrounding agents can be either explicitly modelled as decision-makers (MARL) or can be treated as part of the environment in which a single agent operates (RL or DRL) [195,172].
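At the core of any POMDP approach is the Bayesian belief update b'(s') ∝ O(o|s') Σ_s T(s'|s,a) b(s). A minimal discrete filter, applied below to a hypothetical pedestrian-intention example, can be sketched as:

```python
def belief_update(belief, obs_model, trans_model, action, obs):
    """Discrete Bayesian belief update for a small POMDP.

    belief: dict state -> prob; trans_model[(s, a)]: dict s' -> prob;
    obs_model[s']: dict observation -> prob.
    """
    new_belief = {}
    for s2 in obs_model:
        # predict step: propagate the belief through the transition model
        pred = sum(trans_model[(s, action)].get(s2, 0.0) * p
                   for s, p in belief.items())
        # correct step: weight by the observation likelihood
        new_belief[s2] = obs_model[s2].get(obs, 0.0) * pred
    z = sum(new_belief.values())
    return {s: p / z for s, p in new_belief.items()}
```

With hidden pedestrian intentions "cross"/"wait" (static here for simplicity), observing an "approach" cue shifts the belief toward "cross".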

Game Theoretic Models
Game theory is the study of mathematical models of strategic interactions between rational agents [197]. Game theory primarily applies to economics but has also emerged in autonomous driving. In particular, dynamic non-cooperative game theory is of particular importance for Autonomous Driving [198]. A game is dynamic if it involves multiple decisions whose order is important, and it is non-cooperative if each player pursues their own interest, which partly conflicts with the others' interests. Dynamic non-cooperative game theory comprises both discrete- and continuous-time games, and it provides a natural extension of optimal control to multi-agent settings [197].
Game Theory examines equilibrium solutions under optimal-player assumptions, with multiple equilibrium concepts applicable to trajectory games. Dynamic games are classified into open-loop and feedback games based on the available information: open-loop games assume that the only information available to each player is the initial state of the game, whereas in feedback games the information available to each agent is the current state of the game. Although the latter describes the AD setting more accurately, open-loop solutions are often preferred for their simplicity. Common equilibria in Autonomous Driving include Open-Loop Nash, Open-Loop Stackelberg, Closed-Loop Nash, and Closed-Loop Stackelberg equilibria. For more details on the topic, see [197].
When the agents' dynamics have to comply with a set of constraints, such as collision-avoidance constraints, the equilibria are called generalized. Generalized equilibria are studied in [220]. Numerical solutions to the problems of Open-Loop Nash Equilibria can be found in [217,221,222]. The drawback of the Open-Loop Nash equilibrium formulation is that the players cannot reason directly on how their actions influence the behaviour of surrounding agents. A first simplification of this is the Open-Loop Stackelberg equilibrium, which has been applied, for instance, in [203] in the context of autonomous drone racing. In a Stackelberg competition, the leader makes the first move, and subsequent players follow sequentially, allowing those with higher precedence to consider how those with lower precedence will plan their actions.
In [207], the authors propose a sequential bimatrix game approach to autonomous racing based on an Open-Loop Stackelberg game formulation. Other applications of the Stackelberg formulation can be found in [216,208,209]. A formulation for solving generalized Feedback Nash equilibria can be found in [223].
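For finite action sets, a Stackelberg solution can be computed by simple enumeration: the leader commits to the action whose induced follower best response yields the leader the highest payoff. The payoff matrices in the example below describe a hypothetical merging scenario and are not taken from the cited works.

```python
def stackelberg(leader_payoff, follower_payoff):
    """Solve a bimatrix Stackelberg game by enumeration.

    Payoffs are matrices indexed [leader_action][follower_action].
    Returns the (leader, follower) action-index pair at equilibrium.
    """
    best = None
    for l in range(len(leader_payoff)):
        # the follower best-responds to the leader's committed action
        f = max(range(len(follower_payoff[l])),
                key=lambda j: follower_payoff[l][j])
        if best is None or leader_payoff[l][f] > leader_payoff[best[0]][best[1]]:
            best = (l, f)
    return best
```

In the toy merging game below (leader: merge/yield, follower: accelerate/brake, with collisions heavily penalised), committing to merge induces the follower to brake, which is the leader's preferred outcome.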
Other methods can be found in [201,202,205]. Sadigh et al. [198] model AV-human interaction as a Partially Observable Stochastic Game in a Stackelberg competition. The human estimates the AV's plan and acts accordingly, while the AV optimizes its own actions, assuming indirect control over the human's actions.
Typically, game-theoretic approaches suffer from the following problems: (1) the computational complexity grows exponentially in the number of agents and with the increasing temporal horizon; (2) they assume that the utility function that justifies other agents' actions is known to the ego-vehicle and that the agents act rationally with respect to those reward functions; however, it is a known fact in behavioural game theory that humans often do not act rationally [20]; (3) the behaviour of the agents might be stochastic, and solving for mixed or behavioural strategies makes the computations even more intractable. Naturally, game theory also has the great advantage of capturing the interdependence of actions, and some exact solutions exist for a restricted number of problems. Many papers in the field of game-theoretic autonomous driving try to alleviate those issues by further simplifying the problem or by finding approximate solutions. Some of these papers are now examined to analyse their simplifying hypotheses.

Ref.       Research Scenario         Game Theory Model             Agents   Action
[199]      Lane Change               Nash Equilibrium              Up to 4  Discrete (lane change) and continuous (a)
[200]      Autonomous Racing         Nash Equilibrium              2        Continuous
[201,202]  Intersection              Nash Equilibrium              3        Continuous (ω and a)
[203]      Drone Racing              Nash Equilibrium              2        Continuous (ω and v)
[204]      Autonomous Racing         Nash Equilibrium              2        Continuous (a and δ)
[205]      Autonomous Racing         Nash Equilibrium              2        Continuous (a and δ)
[13]       Merging                   Nash Equilibrium, SVO         2        Continuous (a and δ)
[206]      Highway, Lane Change      Nash Equilibrium, DRL         2        Discrete (a and lane change)
[207]      Autonomous Racing         Nash Eq. and Stackelberg Eq.  2        Discrete trajectories
[198]      Highway navigation,       Stackelberg Eq.               2        Continuous (a and δ)
           Intersection
[208]      Lane Change               Stackelberg Eq.

Level-k theories break down the Nash-equilibrium rational-expectations logic by assuming people see others as being less sophisticated than themselves. This is level-k reasoning, in which the iteration process is stopped after k steps, and other agents are modelled as level-(k-1) actors. A level-k agent assumes that all the other agents are level-(k-1), makes predictions based on this assumption, and responds accordingly. In [219], level-k reasoning is applied to a roundabout scenario. This approach has also been incorporated in an RL framework in [206]: the authors restrict the problem to two interacting agents and solve the Markov Game with two vehicles with a DQN RL-based approach. In [218], level-k reasoning is adopted to resolve conflicts at intersections. The authors showed that conflicts can be resolved easily when the ego-vehicle is a level-k agent and all surrounding vehicles are level-(k-1) or lower. However, the number of collisions increases when both agents are of the same level, which indicates that further improvements are needed to tackle scenarios with agents of the same kind, which would be crucial in the case of multiple AVs.
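Level-k reasoning can be sketched recursively: a level-0 agent plays a fixed heuristic, and a level-k agent best-responds to a level-(k-1) model of the other agent. The intersection payoff table below is hypothetical and chosen only for illustration.

```python
def payoff(a, b):
    """Hypothetical symmetric intersection game: collisions are
    costly, going first is rewarded, mutual yielding wastes time."""
    table = {("go", "go"): -10.0, ("go", "yield"): 2.0,
             ("yield", "go"): 0.0, ("yield", "yield"): -1.0}
    return table[(a, b)]

def level_k_action(k, actions=("go", "yield"), level0="yield"):
    """Level-0 plays a fixed heuristic; level-k best-responds to a
    level-(k-1) model of the other agent (symmetric payoffs here)."""
    if k == 0:
        return level0
    other = level_k_action(k - 1, actions, level0)
    return max(actions, key=lambda a: payoff(a, other))
```

A level-1 agent expects the other to yield and therefore goes; a level-2 agent expects the other to be level-1 (and go), and therefore yields, mirroring the same-level conflicts noted in [218].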
In order to keep the computational complexity tractable, the number of agents can be reduced by identifying the subset of agents that actually interact with the ego-vehicle [210,214]. The time horizon can also be limited by using a receding-horizon controller or by employing hierarchical game-theoretic planning. The latter combines a short-horizon tactical planner with a long-horizon strategic planner: the former is responsible for accurate modelling of the problem dynamics, while the latter decides the strategy using approximate dynamics. Examples of this approach can be found in [210,212].
Iterative linear-quadratic (LQ) methods are increasingly common in the robotics and control communities. The authors of [201] formulate the problem as a general-sum differential game characterised by nonlinear system dynamics, and in [202] they extend their methods to systems with feedback-linearisable dynamics.
Another method for solving game-theoretic problems is to use Iterative Best Response (IBR) to calculate pure Nash equilibria, i.e. Nash equilibria in pure strategies. The authors of [216] propose a "sensitivity-enhanced" iterative best response solver. In [204], an online game-theoretic trajectory planner based on IBR is presented; the planner is suitable for online planning and exhibits complex behaviours in competitive racing scenarios. Williams et al. [200] propose an IBR algorithm together with an information-theoretic planner for controlling two ground vehicles in close proximity.
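The IBR idea is simple to illustrate on a two-player game with quadratic costs, where each best response has a closed form. In the hypothetical example below, each player tracks its own target value while paying a coupling cost for deviating from the other player (a stand-in for, e.g., speed matching); the costs and targets are illustrative, not from any cited paper:

```python
def best_response(target, other, c=1.0):
    """Analytic minimiser of (a - target)^2 + c * (a - other)^2 over a."""
    return (target + c * other) / (1.0 + c)

def iterative_best_response(t1, t2, c=1.0, iters=50, tol=1e-9):
    """Alternate best responses until a fixed point (a pure Nash equilibrium)."""
    a, b = t1, t2  # start from each player's selfish optimum
    for _ in range(iters):
        a_new = best_response(t1, b, c)
        b_new = best_response(t2, a_new, c)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b

a, b = iterative_best_response(t1=0.0, t2=10.0)
print(a, b)  # converges to the pure Nash equilibrium (10/3, 20/3)
```

Because each best-response map is a contraction here, the alternation converges; in the general nonconvex driving problems cited above, convergence is not guaranteed and each best response is itself a trajectory optimisation.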
In [13], Schwarting et al. propose an alternative to iterative best response for solving the Nash equilibrium problem, based on reformulating the bilevel optimisation as a local single-level optimisation using the Karush-Kuhn-Tucker (KKT) conditions. In [137], game theory is used to model other vehicles' decision-making: the authors propose a Parallel-Game Interaction Model (PGIM) to enable active and socially compliant driving interactions. To address uncertainty in the environment, [205] extend the game-theoretic Nash equilibrium concept to POMDPs. In [215], the authors take uncertainty about the intentions of other agents into account by constructing multiple hypotheses about the objectives and constraints of the other agents in the scene.
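The KKT-based single-level reformulation can be sketched in generic notation (the symbols below are not taken from [13]): each opponent's optimisation problem is replaced by its KKT conditions, which then appear as constraints of the ego-vehicle's problem,

```latex
\begin{align}
\min_{u_0,\,\{u_i,\lambda_i\}} \quad & J_0(u_0, u_1, \dots, u_N) \\
\text{s.t.} \quad & \nabla_{u_i} J_i(u_0, \dots, u_N)
                    + \nabla_{u_i} g_i(u_i)^{\top} \lambda_i = 0, \\
& g_i(u_i) \le 0, \quad \lambda_i \ge 0, \quad
  \lambda_i^{\top} g_i(u_i) = 0, \qquad i = 1, \dots, N,
\end{align}
```

where $u_i$ are the agents' control trajectories, $J_i$ their cost functions, $g_i$ their constraints, and $\lambda_i$ the corresponding multipliers. The complementarity conditions make the single-level problem nonconvex, so the solution obtained is in general only a local Nash equilibrium.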

Discussion and Future Challenges
In this comprehensive review, two sections central to the advancement of autonomous driving were introduced: Human Behaviour Studies and Interaction Modelling. These sections form the foundation for understanding and optimising the intricate dynamics of interactions within autonomous driving scenarios. In this Section, challenges and research directions for future autonomous vehicle research in interaction scenarios are highlighted.

Human Behaviour Studies
Driven by society's strong desire for autonomous driving, human behaviour research has once again become a hot topic in recent years, especially in AV contexts. Many challenges still need to be overcome in order to better understand pedestrian behaviour during AV interactions.
In general, the exploration of Driver Behavioural Models holds promise as a research area with the potential to bring substantial enhancements in the safety and efficiency of transportation systems. Nevertheless, a substantial amount of work remains in the development and validation of these models. Future research should prioritise the creation of more holistic models that encompass a broader spectrum of factors, including the driver's psychological state, the surrounding environment, and interactions with other human participants on the road.
For pedestrian behaviour studies, one critical challenge is communication. Firstly, although most researchers agree on the effectiveness of eHMIs, a consensus is still lacking regarding their contents, forms, and perspectives [224]. An open question persists regarding whether eHMIs should be anthropomorphic or non-anthropomorphic; a similar question arises for textual versus non-textual eHMIs. Furthermore, given the presence of multiple pedestrians on the road, current eHMIs are primarily designed for one-to-one encounters, which may mislead other pedestrians [225]. Numerous comparable problems persist, which impede the standardisation of the eHMI. On the other hand, since implicit signals, such as vehicle kinematics, are widely accepted, pervasive, common, and reliable communication methods, their critical role cannot be ignored [226]. Although researchers have made initial attempts to influence pedestrians by manipulating implicit signals, such as vehicle deceleration rate, lateral distance, and pitch, these efforts are insufficient to ensure safe and efficient communication [35,227,228]. These communication methods lack the theoretical support needed to demonstrate accurate and effective delivery of the communicated information. Additionally, there is a dearth of reliable research paradigms to guide the research methodology, including vehicle driving behaviour design, subjective and objective experimental design, and more. Finally, how to combine eHMIs and implicit signals efficiently and smoothly, so as to exploit the advantages of both, is also an interesting research direction.
Another challenge lies in pedestrian behaviour studies themselves. Pedestrian decision-making and behavioural patterns are influenced by the diversity of interaction situations, traffic environments, and participants; however, these aspects currently lack sufficient research attention. Existing studies often focus on specific and simple interaction scenarios to control variables or reduce research complexity, whereas real-life situations involve a plethora of complex scenarios, including crossings at multi-lane, two-way, or unstructured roads, crossings facing dense continuous traffic flow, multi-pedestrian crossing scenarios, and more. Moreover, pedestrian heterogeneity, such as gender, age, distraction, and group effects, also plays a significant role in interactions. It is noteworthy that many influential factors, such as waiting time and distraction, still lack a consensus. Accordingly, due to the dearth of sufficient and reliable results, research conclusions mostly rely on assumptions, highlighting the inadequate understanding of the underlying mechanisms of pedestrian road behaviour.
Regarding pedestrian behaviour modelling, learning-based methods have become appealing in recent years. End-to-end deep neural networks can effectively capture complex behavioural mechanisms and have made considerable progress in pedestrian intention prediction and trajectory prediction. However, their black-box nature cannot be ignored. These approaches require a significant amount of data to achieve robust performance, which limits their scalability to sporadic cases with insufficient data. Additionally, black-box models have difficulty explaining their decision-making and behaviour logic, which introduces new problems for modelling. In contrast, expert models, such as Social Force models, Evidence Accumulation models, or Game Theoretical models, have solid psychological and behavioural foundations, and their behavioural decision-making logic is clear and interpretable. However, most of these models have only been validated on limited datasets or are still in the stage of laboratory validation, lacking extensive engineering practice. Hence, the theories behind expert models need to be further refined and extensively verified on large numbers of real datasets in the future. In addition, expert models and data-driven models have advantages in different aspects; a possible future trend is to find a balance point where the two are used together.
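To illustrate the interpretability of expert models, the following is a minimal sketch of an Evidence Accumulation (drift-diffusion) model of a pedestrian crossing decision. All parameter values are hypothetical, not fitted to any dataset: noisy evidence (e.g. the perceived safety of a gap) is accumulated over time until it hits a "cross" or "wait" threshold.

```python
import random

# Drift-diffusion sketch of a crossing decision. Parameters are illustrative.
def crossing_decision(drift, threshold=1.0, noise=0.1, dt=0.05,
                      seed=0, max_steps=10_000):
    """Accumulate noisy evidence; return the decision and the decision time."""
    rng = random.Random(seed)
    evidence, t = 0.0, 0.0
    for _ in range(max_steps):
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion.
        evidence += drift * dt + rng.gauss(0.0, noise) * dt ** 0.5
        t += dt
        if evidence >= threshold:
            return "cross", t
        if evidence <= -threshold:
            return "wait", t
    return "undecided", t

# Strong positive drift (large safe gap) -> a fast "cross" decision.
print(crossing_decision(drift=1.5))
# Negative drift (vehicle approaching fast) -> "wait".
print(crossing_decision(drift=-1.5))
```

Every quantity in the model (drift, threshold, noise) has a behavioural interpretation, which is precisely the transparency that end-to-end networks lack.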
Finally, considering that only a small fraction of the overall literature on autonomous driving explicitly considers pedestrian behaviour, there is a need to increase the application of pedestrian behaviour models, including but not limited to pedestrian behaviour prediction, AV behaviour design, and virtual AV validation.

Interaction Modelling
As autonomous driving technology continues to evolve, research in interaction modelling will play a critical role in addressing challenges and driving the development of safer and more reliable autonomous vehicles.
One prominent approach that has garnered attention in autonomous driving research is the use of learning-based methods. These methods offer the appeal of end-to-end solutions [5], directly mapping sensory inputs and destination knowledge to the ego-vehicle's actions. However, such systems can behave like black boxes, leading to issues with interpretability in case of faults and with the validation of the models. Furthermore, the enormity of the task, i.e. learning the entire driving process, poses a significant challenge. Hence, current research endeavours break it down into sub-tasks, including route planning, perception, motion planning, and control, and utilise learning-based methods to address these partial challenges.
Learning interactive behaviours through imitation learning or through simulation in Deep Reinforcement Learning (DRL) approaches is also gaining momentum. Nevertheless, challenges persist. Most deep-learning-based decision-making assumes ideal road scenarios and perfect perception of the surrounding environment, yet real-world conditions often involve occlusions, sensor noise, and environmental anomalies. Maintaining system performance in these sporadic events and handling partial or noisy information is an ongoing research challenge. Uncertainty seeps in from the unpredictable behaviours of surrounding traffic participants, as well as from sensor noise and vehicle models. Furthermore, models trained in simulated environments (such as DRL models) raise the question of how to bridge the gap between simulation and reality. Several strategies have been proposed, including making simulations more realistic, domain randomisation, and domain adaptation [183]. These approaches aim to prepare models for the unpredictability and complexity of the real world, ensuring that what they have learned in simulation can be applied effectively on the road.
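Domain randomisation, one of the sim-to-real strategies mentioned above, amounts to resampling simulator parameters every episode so the policy never overfits a single idealised world. The sketch below is a minimal illustration: the parameter names and ranges are hypothetical, and `make_sim`/`collect_episode` are placeholder functions, not a real API.

```python
import random

def randomised_sim_params(rng):
    """Draw a fresh set of simulator parameters. Ranges are illustrative."""
    return {
        "friction": rng.uniform(0.4, 1.0),          # road surface variation
        "sensor_noise_std": rng.uniform(0.0, 0.3),  # per-step observation noise
        "actuation_delay": rng.randint(0, 3),       # control latency in steps
        "pedestrian_speed": rng.uniform(0.8, 2.0),  # m/s, agent heterogeneity
    }

def train(num_episodes, seed=0):
    """Training loop skeleton: each episode runs in a differently
    randomised environment."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        params = randomised_sim_params(rng)
        # env = make_sim(**params)                 # hypothetical simulator factory
        # rollout = collect_episode(env, policy)   # hypothetical rollout collection
        # policy.update(rollout)
        yield params  # yielded here only to make the sketch inspectable

for params in train(3):
    print(params)
```

The design intent is that a policy performing well across the whole parameter distribution is more likely to treat the real world as just another sample from that distribution.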
An alternative to learning-based methods is model-based methods. This set of methods includes Game Theoretic models, Behavioural Models (discussed in the previous Section), Social Forces, and Potential Fields.
Game theory offers versatility and adaptability and can effectively handle various scenarios without relying on specific data distributions. One of its key advantages is the ability to address planning and prediction jointly for the agents in a given situation. However, there is a trade-off in terms of computation: as the number of agents and the time horizon grow, the computational burden increases. Researchers have put forth several strategies to enhance game-theoretic solutions, including hierarchical game-theoretic formulations [212], limiting the optimisation problem for surrounding agents to approximate solutions [198], level-k game theory [219], and improving the performance of nonlinear optimisation solvers [217,202,201].
On the other hand, Social Force or Potential Field methods offer a fast-computing solution. They can be used to predict surrounding agents' behaviour but also for ego-vehicle control. However, Social Force Models (SFMs) rely on simplified assumptions about human behaviour: they often treat pedestrians as particles or agents with fixed characteristics, neglecting the cognitive aspects of human decision-making, which can lead to unrealistic representations of complex and dynamic human behaviours. Future research directions for these methods include incorporating cognitive elements or contextual information, such as road rules and traffic signals.
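The speed of SFMs comes from their closed-form force terms. The following is a minimal sketch in the spirit of the classic Social Force Model: a driving force relaxes the pedestrian's velocity toward the goal, and exponentially decaying repulsive forces push it away from other agents. All parameter values are illustrative, not calibrated.

```python
import math

# Minimal Social Force Model sketch. Parameter values are illustrative only.
def social_force(pos, vel, goal, others,
                 v_desired=1.3, tau=0.5, A=2.0, B=0.3):
    """Resultant 2D force on a pedestrian at `pos` with velocity `vel`."""
    # Driving force: relax the current velocity toward the desired velocity
    # (speed v_desired along the unit vector toward the goal) over time tau.
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy) or 1e-9
    fx = (v_desired * dx / dist - vel[0]) / tau
    fy = (v_desired * dy / dist - vel[1]) / tau
    # Repulsive forces: magnitude A * exp(-d / B), directed away from each agent.
    for ox, oy in others:
        rx, ry = pos[0] - ox, pos[1] - oy
        d = math.hypot(rx, ry) or 1e-9
        mag = A * math.exp(-d / B)
        fx += mag * rx / d
        fy += mag * ry / d
    return fx, fy

# Pedestrian at the origin walking toward (10, 0), with an agent just ahead
# and slightly above the path: the force points forward and slightly downward.
print(social_force((0, 0), (1.0, 0.0), (10, 0), [(0.5, 0.2)]))
```

Because each term is an explicit function of positions and velocities, the model evaluates in microseconds per agent, which is what makes SFMs attractive for both prediction and control, at the cost of the behavioural simplifications noted above.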
Exploring the integration of machine learning techniques to improve the adaptability and predictive power of SFMs can also be a possible future research direction.
The research predominantly concentrates on vehicle-vehicle interactions, which undoubtedly play a crucial role in autonomous driving. However, there is a pressing need to develop methods that address interactions with other human road users, particularly pedestrians. As the realm of autonomous driving continues to evolve, the elucidation of theories and models that govern communication and interaction with diverse road users becomes increasingly relevant, promising to improve safety and efficiency within autonomous driving scenarios.

Figure 1 :
Figure 1: Flow chart: from human behaviour to social-interaction-aware AVs.

Figure 3 :
Figure 3: Architecture of AV systems. Solid-line boxes identify modules that are closely related to interaction-aware models.

Figure 4 :
Figure 4: Illustration of Driver Behaviour Models. a) driver risk field from [16], b) joint theory-based model in [17], c) data-driven model in [18].

Figure 5 :
Figure 5: a) Communication between pedestrians and automated vehicles. b) Theories and models for pedestrian crossing perception, decision, initiation, and motion.

Figure 8 :
Figure 8: A map of state-of-the-art techniques in interaction-aware autonomous driving.

Figure 9 :
Figure 9: Overview of deep learning methods in interaction-aware tasks. a) GCNs can be used both for node-level predictions of surrounding agents' behaviour and for ego-vehicle motion generation (graph-level output), b) social-pooling operation in [147], c) probabilistic graphical model in [148], d) end-to-end imitation learning network in [149].

Figure 10 :
Figure 10: MDP framework. An agent takes an action that affects the environment state. The updated environment state is used to take the next action, and the cycle repeats. The reward function defines the objective of the MDP: to maximise the expected cumulative reward over time.

Figure 11 :
Figure 11: Illustration of state representations typically used in AD. a) vector representation, b) grid-based, c) Bird's Eye View, d) graph.

Table 1 :
Pedestrian models and theories

Table 2 :
Applications of pedestrian theories and models in AV contexts

Table 3 :
DRL overview table

Table 4 :
Game Theory Models for Decision Making in Various Scenarios