Deep adaptive learning for safe and efficient navigation of pedestrian dynamics

Efficient and safe evacuation of passengers is important during emergencies. Over-capacity on a route can increase evacuation time. Decision making is essential to guide and distribute pedestrians optimally across multiple routes while ensuring safety. Developing an optimal pedestrian path plan while accounting for learning dynamics and the environmental uncertainties generated by pedestrian behaviour is challenging. While previous evacuation planning studies have focused on either simulation of realistic behaviours or simple route planning, the best route decisions with several intermediate decision points, especially in environments that change in real time, have not been considered. This paper develops an optimal navigation model that provides additional navigation guidance in emergency evacuations to minimize the total evacuation time while considering the influence of other passengers based on the social-force model. Integrating the optimal navigation model reduced the overall evacuation time in multiple scenarios with two different pedestrian totals. The maximum evacuation time saving was 10.6%.


INTRODUCTION
Pedestrian evacuation in emergencies is a significant and ubiquitous problem, especially when non-optimal evacuation strategies are used. Non-optimal evacuations can cause unnecessarily prolonged evacuation times and increased injuries or fatalities. When an emergency occurs, pedestrians are often required to make quick decisions under time-sensitive pressure while dealing with uncertainty and unfamiliar surroundings [1][2][3]. Successful decision-making depends largely on the extent and accuracy of the information available and on the limited processing time [4][5][6][7][8][9], as well as on several interdependent human factors (e.g. decision-making, cognition and perception) and environmental factors (e.g. layout, pedestrian density, visual range and obstacles, type of emergency) [10][11][12][13][14][15][16]. The uniqueness of each emergency situation makes computational modelling an essential tool for policy analysis and design [17,18].
While there are numerous successful models, such as social force models [19,20] and agent-based models [21,22], that address high-density crowds, there is a glaring lack of effective modelling techniques targeted at low- to medium-density pedestrian situations. Furthermore, previous studies have focused on either pedestrians' strategic route planning or pedestrians' physical movements without considering the interactions between these two levels. Most previous studies have focused on simulating realistic pedestrian dynamics, but the decision-making process has been static and hard to adapt to dynamic changes in the environment. In particular, when there are multiple stationary obstacles (e.g. chairs, walls) and moving obstacles (e.g. other pedestrians), the optimal strategy keeps changing accordingly.
To safely reduce congestion among agents and ultimately produce a shorter evacuation time, an adaptive routing strategy for each individual passenger is required. In this study, a previously developed social force model [22], trained on real-world data, provides a realistic simulation of pedestrian behaviour in emergency situations, and a dynamic routing model is developed to suggest the best options for evacuating faster. Evacuation times are compared between our optimally guided path planning model and the social force model without optimal path guidance.
We present a computational modelling framework expanded from a previously developed social force model applied to an evacuation simulation. The pedestrian dynamics patterns are integrated into the path planning model, which is solved as a Markov Decision Process (MDP). Given the limitations of previous approaches, the contributions of this paper are summarized as follows:
1. Model the process by which pedestrians make choices, by identifying a realistic decision and integrating it with local pedestrian movements in the complex environment of an emergency evacuation.
2. Integrate social force modelling with the optimally guided path planning model for emergency evacuations.
3. Simulate the newly integrated optimally guided path planning model applied to a pedestrian evacuation.
4. Evaluate and compare the average pedestrian evacuation time with and without optimal path guidance.

LITERATURE REVIEW
In prior related work [20,23][24][25][26][27], we developed a particle dynamics pedestrian movement model that tracks the movement of individual passengers in airplanes and airports, and combined it with stochastic infection dynamics. In Liu and Namilae et al. (2018) [22], human panic behaviours were considered, and several simulation models were developed based on a review of previous studies, using social force models and agent-based models. The factors considered include pedestrian density, environmental barriers, layout, number of evacuees, age, and gender. The significant effects of these factors on the efficiency of evacuations were identified, and these results laid a solid foundation for data collection, mathematical formulation, and simulation analysis addressing human behaviour in emergencies. Other recent pedestrian evacuation studies using social force models, such as Zhou et al. [28], have studied the route choice of pedestrians based on density, distance, and capacity factors. That study provided valuable insight into effective strategies that result in quick evacuation response times depending on the number of evacuating pedestrians.
The MDP framework has shown promise in solving path planning problems [29][30][31][32][33][34][35]. MDPs have also been used for modelling problems such as crowd simulation [36], optimal dispatching [37], dynamic vehicle routing [38], e-bike drivers' decision-making during signal changes [39], and evacuation planning and dispatching [40,41]. Sidran et al. (2018) solved an MDP using Mixed Integer Programming for an evacuation route planning problem in a closed loop. This method exhibited 90% of the performance of an optimal MDP solution. Jeong et al. (2014) solved a mission planning problem involving multi-UAV surveillance by formulating an MDP [42]. Burlet et al. (2004) present an MDP-based planning tool for mobile robots. Nardi et al. (2019) proposed an uncertainty-augmented MDP for path planning on road networks that considers the uncertainty of a robot's location. Their approach was able to trade off safety and travel time using the information on the robot's uncertainty. Achour et al. (2010) achieved improved MDP execution time in a robot path planning example. Allamraju et al. (2014) provided a non-stationary MDP-based model for Unmanned Aerial Systems that reduces the chance of a UAV being seen by a human, for privacy reasons. The goal was to plan a route that avoided the locations where the human population was anticipated to be, based on time of day, year, and season.
This study integrates a social force model and our optimally guided path planning model. The optimally guided path planning model navigates a group of pedestrians on a separate guided route that avoids the social force model path. The information on the local environment and human behavioural characteristics are formulated into a reward matrix to re-plan pedestrians' path or adapt to the changes in the environments. This approach incorporates modelling of the nonlinear characteristics of the human decision-making processes beyond simple rule-based models. The main goal is to decrease the overall evacuation time with the incorporation of the optimally guided path planning model.

Social force model
The social force model describes each pedestrian's individual movement within a group and an environment as if the pedestrian were subjected to "social forces". These forces are a measure of an individual's internal motivation and goal seeking to take certain actions, such as avoiding a collision or taking a detour. Regardless of which path planning they use, individuals always use the social force model to guide their movement. We model each mobile pedestrian as a particle and immobile objects such as walls as groups of stationary particles. The evolution of pedestrian i and their interaction with other pedestrians and stationary objects are modelled by a molecular-dynamics-like social force model [19]. The net force acting on pedestrian i can be defined as:

$$\bar{f}_i = \frac{m_i}{\tau}\left(\bar{v}_0^i(t) - \bar{v}_i(t)\right) + \sum_{j \neq i} \bar{f}_{ij}(t) \qquad (1)$$

with pedestrian i's position at a given time obtained by integrating $\bar{r}_i(t) = \int \bar{v}_i(t)\,dt$. Here $\bar{v}_0^i(t)$ is the desired velocity of pedestrian i, $\bar{v}_i(t)$ is the actual velocity, $m_i$ is the mass of pedestrian i, $\tau$ is the evolution time constant, and $\bar{f}_{ij}$ is the repulsive interaction force. The momentum generated by pedestrian i's intention results in a self-propulsion force that is balanced by a repulsion force to obstacles in the direction of motion. The implementation of the social force model in AnyLogic used in this work includes a repulsive force for collision avoidance, used previously in [23]. We introduce location dependence to the desired velocity in the self-propulsion term as:

$$\bar{v}_0^i \cdot \hat{e}_v = \left(v_A + \gamma_i v_B\right)\left(1 - \frac{\delta}{r_{ik}}\right) \qquad (2)$$

where $\hat{e}_v = \cos(\theta)\,\hat{i} + \sin(\theta)\,\hat{j}$ is the direction of desired motion, $v_A$ and $\gamma_i v_B$ are the deterministic and stochastic components of the desired velocity, and $\delta$ is a distance constant such that at distance $r_{ik} = \delta$ between the i-th and k-th pedestrians the desired velocity of the i-th pedestrian is zero. The active particle evolution based on these equations will in certain instances be modified to match targeted behaviour, as in agent-based models. The walking speeds of individuals vary by age, group, and sex [43].
In crowded locations like airports and mass gatherings, pedestrians often need to circumvent stationary and mobile obstacles. While the social force model works in simple situations, modifications to the equations of motion, as shown in Equation (2), are required for trajectories involving obstacle avoidance and the formation of pedestrian queues.
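As a minimal sketch of how the driving term and the location-dependent desired speed described in this section might be evaluated, the Python fragment below assumes a driving force of the form (m/τ)(v0 − v) and a desired speed of the form (v_A + γ_i v_B)(1 − δ/r_ik). Both forms, the function names, the clamp to zero for gaps smaller than δ, and the time-stepping scheme are illustrative assumptions; this is not the AnyLogic implementation used in this work.

```python
def desired_speed(v_a, v_b, gamma_i, delta, r_ik):
    """Desired speed that falls to zero when the gap r_ik shrinks to delta.

    v_a: deterministic component, gamma_i * v_b: stochastic component.
    Clamping to zero for r_ik <= delta is an assumption of this sketch.
    """
    if r_ik <= delta:
        return 0.0
    return (v_a + gamma_i * v_b) * (1.0 - delta / r_ik)


def self_propulsion_force(mass, tau, desired_velocity, actual_velocity):
    """Driving term (m / tau) * (v0 - v) for 2-D velocities given as (x, y)."""
    return tuple(mass / tau * (v0 - v)
                 for v0, v in zip(desired_velocity, actual_velocity))


def euler_step(position, velocity, force, mass, dt):
    """Semi-implicit Euler step: update velocity first, then position."""
    new_velocity = tuple(v + f / mass * dt for v, f in zip(velocity, force))
    new_position = tuple(r + v * dt for r, v in zip(position, new_velocity))
    return new_position, new_velocity


if __name__ == "__main__":
    # Hypothetical values: an 80 kg pedestrian at rest, 1.0 m behind the next one.
    speed = desired_speed(v_a=1.2, v_b=0.3, gamma_i=0.5, delta=0.5, r_ik=1.0)
    force = self_propulsion_force(80.0, 0.5, (speed, 0.0), (0.0, 0.0))
    print(euler_step((0.0, 0.0), (0.0, 0.0), force, 80.0, 0.1))
```

The repulsive interaction terms are omitted here, so the fragment covers only the self-propulsion part of the net force.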

Pedestrian dynamic agent based simulation
We develop a simulation model of a midsize airport (KTAR) by incorporating the social force model (Section 3.1) in the agent-based simulation software AnyLogic [14,16]. As seen in Figure 1, KTAR has six doors for evacuation, and in the model it was assumed that the middle exit is blocked by danger and not accessible. All the passengers would use the other five doors equally (i.e. each door had a 16.67% chance to be chosen by passengers to evacuate). In our experiment, we compare this equal/random exit selection strategy with the shortest queue strategy. In an emergency situation, pedestrians might not choose the right exit rationally, so this random policy may well occur, especially for those who are not familiar with the environment. Passengers were expected to evacuate in the following six steps: (1) leave the aircraft and enter the terminal through the aerobridge at the second floor; (2) proceed to the escalator or stairs that connect the second floor with the first floor; (3) make a choice between using the escalator or stairs; (4) reach the first floor via the choice they made; (5) choose one of the available doors to exit; (6) proceed to the selected door and leave the terminal. However, it should be noted that only the passengers inside the airport were considered; consequently, the time taken by the passengers to evacuate the aircraft and reach the airport was not considered in this study.
The walking speed of the pedestrians in this study was set between 1.2 and 1.8 m/s. According to [44], in the emergency evacuation simulation, the passenger's walking speed was uniformly distributed between 1.2 and 1.8 m/s. In addition to the sources of data mentioned above, the average speeds of pedestrians on the escalator and stairs were also provided by previous studies [22]. For data collection, this study recorded the total evacuation time from when the first passenger emerged on the second floor to when the last passenger left the terminal.

FIGURE 2
The framework for our Optimally Guided Path Planning Model: Safe and Efficient Navigation of Pedestrians for Evacuations. The social force model (Pedestrian Dynamics Model) provides density input for the optimally guided path planning model. The two models integrated together provide a realistic evacuation model for crowds. The integrated model can be expanded for policy designs for Information Diffusion and Epidemic Spread

One thousand passengers were generated in the gate area on the second floor of the airport; these passengers then ran to the escalator and stairs and down to the first floor to evacuate, simulating an emergency scenario on the first floor of the airport terminal. There are six doors in the first-floor terminal; it was assumed that the middle exit is blocked and not accessible, so passengers can only use the other five doors to evacuate the airport. The authors of this paper previously worked on this airport evacuation simulation. The airport scenario in this section illustrates that the social force dynamics model has previously been applied successfully to pedestrian evacuations in airport scenarios and that earlier research justifies its use in evacuation scenarios. The simulation presented later in this work uses the same social force dynamics model in a new environment, together with the new optimally guided modelling technique, to further reduce the evacuation time. Our current work expands the previous simulation by incorporating the optimally guided path planning model to better navigate evacuating pedestrians.

Optimally guided path planning model
A Markov chain is a random process in discrete time, that is, a sequence of random variables. A Markov Decision Process can be described by the tuple <S, A, P, R>. S is a finite set of states, S = {S_1, S_2, …, S_N}, where N is the number of possible states. A is a finite set of actions, A = {A_1, A_2, …, A_n}, where n is the number of possible actions. Actions allow agents to transition from one state to another. The state transition function P : S × A × S′ → [0, 1] gives the probability of transitioning from state S to state S′. R is the reward function. The transition probability matrix (P_ij) is an (I × J) matrix that can be denoted as:

$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1J} \\ p_{21} & p_{22} & \cdots & p_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ p_{I1} & p_{I2} & \cdots & p_{IJ} \end{bmatrix}$$

R(S, A, S′) is the reward received for a transition from state S to state S′. In this model, the reward of each state is determined by the pedestrian movements produced by the social force model (this is discussed in detail in Section 4.1).
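To make the tuple <S, A, P, R> concrete, the following Python sketch encodes the transition function P(s′ | s, a) for a small grid world. The grid size, the four-action set, the obstacle handling, and the `slip` parameter (the probability that a different move than the intended one occurs) are hypothetical choices for illustration and are not taken from the model in this paper.

```python
# Four grid actions as (row, column) offsets; an assumed action set.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def transition(state, action, n_rows, n_cols, obstacles, slip=0.0):
    """Return {next_state: probability} for P(s' | s, a).

    The intended move happens with probability 1 - slip; each other action
    happens with probability slip / 3. A move that would leave the grid or
    enter an obstacle cell keeps the agent in its current state.
    """
    def move(s, a):
        d_row, d_col = ACTIONS[a]
        nxt = (s[0] + d_row, s[1] + d_col)
        inside = 0 <= nxt[0] < n_rows and 0 <= nxt[1] < n_cols
        return nxt if inside and nxt not in obstacles else s

    others = [a for a in ACTIONS if a != action]
    weighted = [(action, 1.0 - slip)] + [(a, slip / len(others)) for a in others]
    probabilities = {}
    for a, p in weighted:
        if p > 0.0:
            nxt = move(state, a)
            probabilities[nxt] = probabilities.get(nxt, 0.0) + p
    return probabilities


if __name__ == "__main__":
    # One row of the transition matrix: all outcomes of "right" from (0, 0).
    row = transition((0, 0), "right", 10, 10, obstacles={(1, 0)}, slip=0.3)
    print(row, sum(row.values()))
```

Each returned dictionary corresponds to one row of the transition matrix for a fixed action, so its probabilities sum to one.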
We developed a reinforcement learning technique to learn local navigation behaviours and simulate dynamic pedestrian behaviours when there is an emergency at airport buildings. This model determines intermediate goals for each pedestrian, which is a key input for the time evolution of pedestrian trajectories. Previous reinforcement learning studies have used desired velocity and the proximity to other pedestrians; when the intermediate goal is moved along with the environment, previously learned information interferes with the task of finding the new goal. The sub-agent model was developed separately. This introduces stochastic and dynamic terms (e.g. self-propulsion) to provide a more realistic learning process and simulation.
To guarantee global goal-seeking in complex dynamic environments, we formulate the problem with the main objective of maximizing the expected cumulative reward r over paired states and actions (s, a). The Q look-up table is trained to give the best action a ∈ A in a finite action space, based on the pedestrian's state s ∈ S, for optimal learning. The discount factor, γ, balances the immediate reward against the future reward. A negative reward is given for colliding with stationary objects or other pedestrians at the airport in an evacuation scenario, so that a proper action is learned.
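The role of the Q look-up table and the discount factor can be illustrated with the two standard tabular temporal-difference updates. The sketch below shows the textbook Q-learning and SARSA update rules; the learning rate `alpha`, the dictionary-based table, and the toy state and action names are assumptions for illustration and do not reproduce the training setup of this study.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])


if __name__ == "__main__":
    actions = ["left", "right"]
    Q = defaultdict(float)           # unseen (state, action) pairs start at 0
    Q[("s1", "left")] = 10.0         # hypothetical values already learned
    Q[("s1", "right")] = 2.0
    q_learning_update(Q, "s0", "right", -1.0, "s1", actions, 0.5, 0.9)
    sarsa_update(Q, "s0", "left", -1.0, "s1", "right", 0.5, 0.9)
    print(Q[("s0", "right")], Q[("s0", "left")])
```

A larger `gamma` weights future reward more heavily in both rules; the only difference between them is which next-state value enters the target.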
The pedestrian's state $S = (v_p, a_v, d_g, v_p^i, d_p^i, a_p^i, d_p^j, a_p^j)$ is a high-dimensional space with features: $v_p$ (velocity of the current pedestrian, an output of the social force model); $a_v$ (angle of the velocity vector relative to the reference line); $d_g$ (distance to the goal); $v_p^i$ (velocity of the nearest pedestrian i in the current pedestrian [...]). As shown in Figure 2, the initial pedestrian dynamic model provides the velocity $v_p(t=1)$ as an input to the reinforcement learning model in the initial stage $(s_{t=1}, a_{t=1})$. In each time step, the optimal action of the reinforcement learning model serves as an input to the velocity model for more accurate velocity information $v_p(t=2)$ in the dynamic environment. The velocity is used to update the state and action of the next time stage $(s_{t=2}, a_{t=2})$ in the reinforcement learning model [45].
The SARSA (State-Action-Reward-State-Action) algorithm [46] is similar to the Q-Learning algorithm except that it is an on-policy algorithm for Temporal Difference (TD) learning. When updating Q-values, SARSA does not always use the maximum-value action in the next state. Rather, it uses the action selected for the next step by the same policy that generated the previous action. This is accomplished by completing the Q function updates in Equation (5):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\right] \qquad (5)$$

where $\alpha$ is the learning rate. The discount factor, $\gamma$, balances the immediate reward against the future reward, similar to the Q-Learning approach.

FIGURE 3
Pedestrian density map. The blue shaded areas show the areas that are not highly dense. The upper limit of the density range is set by default to 1.5 p/m² by AnyLogic software. The pedestrian density map enables us to determine the most densely populated areas and incorporate them into our optimal path guidance model for path planning

RESEARCH APPROACH

Environment
The Optimally Guided Path Planning model environment created for reinforcement learning is a 10 × 10 grid world (500 sq. in. × 500 sq. in.) with eleven obstacles and a terminal state (exit; 100 sq. in. opening), shown in Figure 6. The pedestrian icon represents the starting location and the exit signs represent the terminal states. Currently, the reward function's value is determined by the state's pedestrian density value and by whether the goal of reaching the terminal state is met. The density map data are provided by the social force model and are a direct input into our optimal path planning model's reward map. Highly dense areas have a larger negative reward and less dense areas have a smaller negative reward. The reward for reaching the terminal state is +100. The social force pedestrian movement thus provides an input to the magnitude of the reward values of the optimally guided path planning model. A pedestrian density map (Figure 3) was produced to identify the highly traversed and congested areas. To generate the pedestrian density map, the scenario was simulated following social force dynamics using AnyLogic, and the results were recorded after the entire simulation was complete. A density value was associated with each square inch. AnyLogic has a built-in function "maximumDensity" that will display and record the maximum observed density values. This value refers to the maximum total density of that location (per square inch) for the entire simulation. Following the completion of the simulation, the results were stored in a csv file via AnyLogic. MATLAB software was used to generate a density map based on the maximum density values recorded (as seen in Figure 3). The default and recommended upper limit of density is set to 1.5 p/m² by AnyLogic. Figures 4 and 5 have increased upper limit values of 3 and 5 p/m² to illustrate analysis of the density maps with a larger density value.

FIGURE 4
A Pedestrian Density Map generated from our evacuation scenario produced by AnyLogic, with an upper limit density range set to 3.0 p/m². The simulation is of 50 pedestrians navigating throughout the environment to the exit location. The red shaded areas pictured show the most populated areas of the environment. The blue shaded areas show the areas that are not highly dense. The pedestrian density map enables us to determine the most densely populated areas and incorporate them into our optimal path guidance model for path planning

FIGURE 5
A Pedestrian Density Map generated from our evacuation scenario produced by AnyLogic, with an upper limit density range set to 5.0 p/m². The simulation is of 50 pedestrians navigating throughout the environment to the exit location. The red shaded areas pictured show the most populated areas of the environment. The blue shaded areas show the areas that are not highly dense. The pedestrian density map enables us to determine the most densely populated areas and incorporate them into our optimal path guidance model for path planning

FIGURE 6
The destination (exit) is the bottom left of the environment. The optimally guided path is the shaded gray area. The pedestrians that are not following the social force model will use this optimally guided path to travel throughout the environment

The highly traversed areas are the areas shown in red on the pedestrian density map. The most densely populated areas are recorded and converted to the optimally guided path planning model's environment. The highly traversed areas have a reward of -10. These strongly negative rewards discourage navigation through highly populated areas and avoid congestion as much as possible. The orange squares represent the obstacles present. The obstacles are similar to walls, and these states cannot be traversed. All remaining states are set to a reward of -1. This avoids taking any unnecessary steps, further reducing the time it takes to exit.
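The reward scheme described here (+100 at the exit, -10 in highly traversed cells, -1 elsewhere, obstacles not traversable) can be sketched as follows. The density threshold that marks a cell as highly traversed, the deterministic moves, the discount value, and the use of value iteration as the solver are assumptions made for this illustration; the text does not specify them, and the tiny 3 × 3 grid is a hypothetical stand-in for the 10 × 10 environment.

```python
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def build_reward_map(density, obstacles, goal, threshold,
                     dense_reward=-10.0, step_reward=-1.0, goal_reward=100.0):
    """Map a grid of maximum densities to rewards; obstacle cells become None."""
    rewards = []
    for r, row in enumerate(density):
        reward_row = []
        for c, value in enumerate(row):
            if (r, c) in obstacles:
                reward_row.append(None)
            elif (r, c) == goal:
                reward_row.append(goal_reward)
            elif value >= threshold:      # assumed rule for "highly traversed"
                reward_row.append(dense_reward)
            else:
                reward_row.append(step_reward)
        rewards.append(reward_row)
    return rewards


def neighbours(rewards, r, c):
    """Yield in-grid, non-obstacle cells adjacent to (r, c)."""
    for d_row, d_col in MOVES:
        nr, nc = r + d_row, c + d_col
        if (0 <= nr < len(rewards) and 0 <= nc < len(rewards[0])
                and rewards[nr][nc] is not None):
            yield nr, nc


def value_iteration(rewards, goal, gamma=0.9, tol=1e-6):
    """State values when the reward of a cell is collected on entering it."""
    values = [[0.0] * len(rewards[0]) for _ in rewards]
    while True:
        change = 0.0
        for r in range(len(rewards)):
            for c in range(len(rewards[0])):
                if rewards[r][c] is None or (r, c) == goal:
                    continue  # obstacles and the terminal state keep value 0
                best = max((rewards[nr][nc] + gamma * values[nr][nc]
                            for nr, nc in neighbours(rewards, r, c)), default=0.0)
                change = max(change, abs(best - values[r][c]))
                values[r][c] = best
        if change < tol:
            return values


def greedy_path(rewards, values, start, goal, gamma=0.9):
    """Follow the highest-valued move from start until the goal is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= len(rewards) * len(rewards[0]):
        options = list(neighbours(rewards, *path[-1]))
        if not options:
            break
        path.append(max(options, key=lambda cell: rewards[cell[0]][cell[1]]
                        + gamma * values[cell[0]][cell[1]]))
    return path


if __name__ == "__main__":
    density = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
    rewards = build_reward_map(density, set(), goal=(2, 0), threshold=1.5)
    values = value_iteration(rewards, goal=(2, 0))
    print(greedy_path(rewards, values, start=(0, 2), goal=(2, 0)))
```

With the start in the top right and the exit in the bottom left, the resulting path goes around the congested centre cell instead of through it, which is the behaviour the -10 penalty is meant to produce.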
It is important to note that in an airport evacuation scenario, police officers and security guards would be placed along the path to guide pedestrians. This is needed to ensure pedestrians will follow the specified optimal guided path.

Agent-based evacuation simulation I AnyLogic-50 pedestrians
A baseline example of 50 pedestrians exiting the environment is created and simulated following social force model dynamics (Figure 7). The parameters for social force models have been validated by comparison with empirical data in a number of previous publications [19,20,47,48]. We utilize these parameters in our simulation models. The path planning model changes the destination, but the social force parameters are not altered. Five scenarios are investigated regarding average evacuation time by the number of pedestrians following each model. The initial average evacuation time of 50 pedestrians following the social force model only was 84.95 s. The goal is to incorporate both modelling techniques to reduce the overall time it takes to evacuate the environment. The benefit of the MDP decision making is to provide an additional optimally guided route that avoids obstacles and highly congested areas, reducing the time it takes for all pedestrians to exit the room. The pedestrian walking speed is uniformly distributed between 0.5 and 1.0 m/s. The five additional scenarios divide the pedestrians between the two modelling techniques.

FIGURE 7
Pedestrian Evacuation Simulation developed in AnyLogic software. The arrival rate of all the pedestrians is set to 6.67 per second. The reach tolerance for the optimally guided path pedestrians is set to 0.25 m. The yellow agents (SF pedestrians) are the pedestrians following the social force model and the blue agents are the pedestrians following the optimally guided path planning model. Notice the social force model pedestrians are able to freely travel throughout the environment. The remaining pedestrians are guided using the optimally guided path to navigate throughout the environment
Each of the 5 scenarios will increase the number of pedestrians following the optimally guided path planning model by 5 until there is an even split of 25 pedestrians following each modelling technique's path. The first scenario consists of 5 pedestrians following the optimally guided planned path and the remaining 45 pedestrians will follow the social force model dynamics. The pedestrians are divided into the following groups for the five scenarios: (45/5, 40/10, 35/15, 30/20, 25/25).

Agent-based evacuation simulation II AnyLogic-200 pedestrians
This subsection presents an additional evacuation simulation (Figure 12), similar to Section 4.2, but with an increased number of pedestrians. The number of pedestrians was increased fourfold, to a total of 200. To generate this simulation with more pedestrians while staying as faithful as possible to the initial simulation, a few adjustments were made. First, the environment size was increased to 1000 square inches × 1000 square inches. Second, due to the increase in pedestrians, the density map was updated (refer to Figure 8). The default and recommended upper density limit is 1.5 p/m². Supplementary density maps are shown in Figure 9 (upper limit increased to 3.0 p/m²) and Figure 10 (upper limit increased to 5.0 p/m²).

FIGURE 8
A Pedestrian Density Map generated from our evacuation scenario produced by AnyLogic, with an upper limit density range set to 1.5 p/m². The simulation is of 200 pedestrians navigating throughout the environment to the exit location. The red shaded areas pictured show the most populated areas of the environment. The blue shaded areas show the areas that are not highly dense. The pedestrian density map enables us to determine the most densely populated areas and incorporate them into our optimal path guidance model for path planning

FIGURE 9
A Pedestrian Density Map generated from our evacuation scenario produced by AnyLogic, with an upper limit density range set to 3.0 p/m². The simulation is of 200 pedestrians navigating throughout the environment to the exit location. The red shaded areas pictured show the most populated areas of the environment. The blue shaded areas show the areas that are not highly dense. The pedestrian density map enables us to determine the most densely populated areas and incorporate them into our optimal path guidance model for path planning

FIGURE 10
A Pedestrian Density Map generated from our evacuation scenario produced by AnyLogic, with an upper limit density range set to 5.0 p/m². The simulation is of 200 pedestrians navigating throughout the environment to the exit location. The red shaded areas pictured show the most populated areas of the environment. The blue shaded areas show the areas that are not highly dense. The pedestrian density map enables us to determine the most densely populated areas and incorporate them into our optimal path guidance model for path planning

As a result of the change in the density map, a new path was generated (Figure 11) to incorporate the changes in pedestrian movement and behaviour. Lastly, the arrival rate was reduced by 50% to reduce the bottleneck created (near the origin) at the beginning of the simulation. Four scenarios are simulated to determine the average evacuation time when varying the number of pedestrians following each model. The initial average evacuation time of 200 pedestrians following the social force model only was 165.3 s. The pedestrian walking speed is uniformly distributed between 0.5 and 1.0 m/s. The four additional scenarios divide the pedestrians between the two modelling techniques. Each scenario increases the number of pedestrians following the optimally guided path planning model by 10% of the total (20 pedestrians) until a maximum of 80 pedestrians follow the optimally guided path model. We set the maximum at 80 pedestrians to ensure that we do not overload a single path. Overloading a single path can create additional bottlenecks and congestion, possibly causing more delay. The first scenario consists of 20 pedestrians following the optimally guided planned path, with the remaining 180 pedestrians following the social force model dynamics. The pedestrians are divided into the following groups for the four scenarios: (180/20, 160/40, 140/60, 120/80).

RESULTS
FIGURE 11
The Emergency Evacuation Environment. Eleven objects are placed throughout the environment. The obstacle states (orange squares) cannot be accessed by the pedestrians. The origin of all pedestrians is the top right of the environment, labelled Starting Location. The destination (exit) is the bottom left of the environment. The optimally guided path is the shaded gray area. The pedestrians that are not following the social force model will use this optimally guided path to travel throughout the environment

The baseline example for simulation I (Section 4.2) has an average evacuation time of 84.95 s. In the remaining scenarios, we aim to improve performance by reducing the evacuation times. The evacuation times of the remaining examples are shown in Table 1. In Scenario 1, the incorporation of the five optimally guided path pedestrians is shown to reduce the overall evacuation time to 80.64 s; this is a 5.1% decrease in total evacuation time. In Scenario 2, five more pedestrians (10 total) follow the optimally guided path planning model. This further reduces the evacuation time to an average of 79.71 s. In Scenario 3, 30% of the pedestrians use the optimally guided path. This raises the decrease in average evacuation time to 9.2%, for an overall average evacuation time of 77.16 s. In the final scenario, 50% (maximum capacity) of the pedestrians are chosen to follow the optimally guided path. This split does not overload the guided path, so it creates no additional bottlenecks or delay. The final scenario reduced the average evacuation time to 75.9 s, a 10.6% decrease.
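The percentage decreases quoted for simulation I follow from the baseline and scenario averages by simple arithmetic. The short Python check below recomputes them from the times stated in the text; the dictionary labels and the helper name `percent_decrease` are illustrative only, and small differences in the last digit can arise from rounding of the reported times.

```python
BASELINE_S = 84.95  # average evacuation time, social force model only

# Average evacuation times reported in the text for simulation I (seconds).
REPORTED_S = {
    "Scenario 1": 80.64,
    "Scenario 2": 79.71,
    "Scenario 3": 77.16,
    "Final scenario": 75.9,
}


def percent_decrease(baseline, value):
    """Relative reduction of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


# For each scenario: (seconds saved, percentage decrease).
savings = {name: (BASELINE_S - t, percent_decrease(BASELINE_S, t))
           for name, t in REPORTED_S.items()}

if __name__ == "__main__":
    for name, (seconds, percent) in savings.items():
        print(f"{name}: {seconds:.2f} s saved ({percent:.2f}%)")
```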

FIGURE 12
Pedestrian Evacuation Simulation developed in AnyLogic software. The arrival rate of all the pedestrians is set to 3.335 per second. The reach tolerance for the optimally guided path pedestrians is set to 0.25 m. The yellow agents (SF Pedestrians) are the pedestrians following the social force model and the blue agents are the pedestrians following the optimally guided path planning model. Notice the social force model pedestrians are able to freely travel throughout the environment. The remaining pedestrians are guided using the optimally guided path to navigate throughout the environment

The results for simulation II (Section 4.3) are shown in Table 2. The baseline example for simulation II has an average evacuation time of 165.3 s. In Scenario 5, 10% of pedestrians follow the optimally guided path. Incorporating the optimally guided path reduced the evacuation time by an average of 8.8 s, to an average evacuation time of 156.5 s. In Scenario 6, 10% more pedestrians (40 total) follow the optimally guided path planning model. The evacuation time saving for this scenario was 6.9 s. In the last two scenarios (Scenarios 7 and 8), the results were very similar, differing by 0.1 s. The average evacuation time savings were 5.7 and 5.8 s, respectively. Our evacuation time saving results indicated the

DISCUSSION AND CONCLUSION
Based on the work conducted in this study, this research successfully incorporates a social force model into an optimally guided path planning model, for the evacuation of a group of pedestrians. Pedestrian navigation safety is critical to saving lives and preventing injury in a multitude of emergencies such as active shooter incidents and fires. By integrating the two models, we can further optimize mobility in emergency evacuation situations. Our optimally guided path planning model incorporates the social force model's density map and assigns highly congested areas to negative reward states. The negative reward states are used to avoid congested areas and pathways and advise pedestrians to follow the optimally guided path. For simulation I, the baseline example evacuation time was an average 84.95 s. Utilizing the two integrated models, we were able to show a reduction in the average evacuation time of at least 4.31 s or a 5.1% evacuation time decrease. The total evacuation time was able to be reduced to a total average of 75.9 s and a 10.6% decrease in average evacuation time compared to the baseline example. For simulation II, in a more dense environment, the baseline example's evacuation time was an average of 165.3 s. Integrating the two modelling techniques, we were able to provide a total evacuation time savings of 8.8 s. In simulation I, where 50 pedestrians are present, the optimal effect of path optimization is splitting the pedestrians evenly. Fifty percent of the pedestrians followed the social force model, and the remaining 50% follow the optimally guided path. This provides a 10.1% evacuation time savings. In simulation II, where 200 pedestrians are present, the optimal effect of path optimization is to allow 10% of pedestrians to follow the optimally guided path. Following this approach provided an average time savings of 8.8 s.
In the future, we will expand the current MDP framework to a Time-Dependent MDP (TDMP). The time-dependent MDP will update rewards dynamically based on the movement of the social force model pedestrians at the current time. Using a TDMP may reduce the overall evacuation time by providing optimal routing choices in real time, with rewards updated at each time step. Expanding our current modelling framework will allow a more robust and dynamic reward function to be incorporated. This reward function will update the reward map based on the pedestrian density map of the current state and the agent's distance from the goal. In addition, we would like to simulate more scenarios with an increased number of pedestrians, which will provide more realistic results for emergent situations in the real world. To incorporate an increased number of pedestrians, other factors would need to be taken into consideration, such as the dimensions of the environment and the number, location and size of the exits, as these factors affect the overall efficiency of the evacuation process, especially with a large number of pedestrians. Another factor to consider with increased pedestrian numbers is the addition of a secondary optimally guided path. Please note that we currently implemented this integrated modelling technique using one optimally guided path (provided by MDP path planning). With an increase in the number of optimally guided paths, there is potential for increased time savings. This will be an important factor to consider as the size of the pedestrian population increases. For example, 1000 pedestrians could be distributed over a series of rooms (e.g. 5), and the evacuation of each room could be addressed by the approach in this paper. Multiple path planning is certainly an interesting extension of this paper for future research. Additionally, complicated pedestrian behaviours were not considered and are beyond the scope of the current work.
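As a rough illustration of the planned reward update, the sketch below recomputes a reward map at each time step from the current density map and each cell's distance to the goal. The linear form, the weights `W_DENSITY` and `W_DISTANCE`, and the use of Manhattan distance are all assumptions made for this example; the paper does not specify the functional form of the future reward function.

```python
# Hypothetical weights, not taken from the study.
W_DENSITY = 10.0   # penalty per unit of local pedestrian density
W_DISTANCE = 1.0   # penalty per cell of distance to the exit


def manhattan(a, b):
    """Grid distance between two cells (an assumed distance measure)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def reward_map(density, goal):
    """Reward of every cell for the density map observed at the current step."""
    return [[-W_DENSITY * d - W_DISTANCE * manhattan((r, c), goal)
             for c, d in enumerate(row)]
            for r, row in enumerate(density)]


# Two density snapshots: the crowd moves from the top row to the bottom row.
density_t0 = [[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.0]]
density_t1 = [[0.0, 0.0, 0.0],
              [0.0, 0.8, 0.0]]
goal = (0, 2)

# The reward map is rebuilt at every time step from the latest snapshot.
rewards_over_time = [reward_map(d, goal) for d in (density_t0, density_t1)]
for t, rewards in enumerate(rewards_over_time):
    print(f"t={t}", rewards)
```

Under these assumptions, a cell that is penalized heavily while crowded becomes attractive again once the crowd has moved on, so a planner re-solved at each step could route pedestrians through it.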
However, the model can be adapted, and the integration of these two models can accommodate other scenarios with richer pedestrian behaviours in future studies. We plan to apply this integrated modelling technique to a simulated airport emergency evacuation. We expect that applying this modelling technique will reduce the total evacuation time of pedestrians in an airport emergency. The findings of this project will lead to a multidisciplinary computational framework for understanding and modelling the human decision-making process and the resulting actions in emergency evacuations.