Dynamic Motion Planning Model for Multirobot Using Graph Neural Network and Historical Information

In order to effectively improve the path-finding capability of a multirobot system under a decentralized control approach, a dynamic motion planning model based on graph neural networks and historical information (GNNHIM) is proposed. Due to the limited sensing range of the robots' onboard sensors, GNNHIM uses convolutional neural networks to extract features from the heterogeneous environmental information collected and analyzes the motion trends of other robots in conjunction with the historical path information stored locally. After feature fusion, each robot exchanges its feature vectors with other robots still within sensing range to obtain the next action through graph neural network processing. Herein, imitation learning is used to train robots to choose behavioral strategies that maximize team benefit. The experimental results show that GNNHIM makes path planning for multirobot systems more efficient and reliable, enabling all robots to reach their goals within a limited time in 95.1% of the experimental dataset. Moreover, GNNHIM generalizes well when applied to unseen scenarios and larger-scale robot systems. In unseen scenarios, GNNHIM improves the success rate by 6%-34% and reduces the detour percentage by 3.8% on average compared with the other scheduling models.

DOI: 10.1002/aisy.202300036
aid of the rest of the information can trigger deadlock easily. [13] However, by transferring information, robots can obtain data from others that cannot be observed in their local field of view, so communication-based methods are often better than noncommunication methods. [14] Among the communication-based methods, many studies use broadcast-based transmission of information. [15][16][17][18] Although the broadcast style is easy to implement and deploy in real scenes, it often generates a large amount of redundant information that leads to unnecessary consumption of network bandwidth and data-processing resources. [19] Therefore, we designed a path planning model that communicates only with neighboring robots to minimize the network load. Due to the superior self-learning capability of neural networks, Yu et al. introduced them to the field of human-robot collaboration and proposed a hybrid framework based on vision and force sensing, which allows robots to perform complex human-robot cooperative transportation tasks. [20,21] In order to improve the generalization of the model, we use neural networks to extract features from the sensor data so that the robot can decide its next action in real time based on local observations.
The major contributions of this paper are: 1) A dynamic motion planning model based on graph neural networks and historical information (GNNHIM) is proposed. The mobile robot can combine the surrounding environment information obtained from onboard sensors and feature information communicating with others to plan the next action in real time; 2) The planning model only communicates with neighboring robots to avoid information flooding in making the necessary communication and considers the surrounding robots' historical trajectories to analyze their future movement trends; and 3) We show that GNNHIM can improve the real-time decision-making ability of each robot and the efficiency of the whole swarm of robots, including seen and unseen scenarios.
The remaining sections of this paper are organized as follows. In Section 2, related work on centralized and decentralized methods in the field of path planning for mobile robots is introduced. Section 3 presents the research problem and the knowledge of graph neural networks used. In Section 4, we propose a dynamic motion planning model GNNHIM and describe its training process in detail. Section 5 describes the experimental setup and performance metrics and introduces the simulation results in Section 6. Finally, in Section 7, we summarize the research results.

Centralized Approach
Traditional centralized planning methods mainly include A*-based algorithms, Conflict-Based Search (CBS), and Increasing Cost Tree Search (ICTS). The A* algorithm is representative of heuristic search algorithms. When it is applied directly to the path-finding problem, the search space and branching factor grow exponentially with the number of robots, so the efficiency and quality of A* in solving multirobot systems are not high. In order to prevent the number of branching factors from proliferating, Wagner et al. proposed the M* algorithm, which realizes dynamic search. [22] On this basis, they further introduced operator decomposition and proposed the ODrM* algorithm to obtain better solutions. [23] Currently, an advanced centralized planning method is CBS, [24] which uses a two-stage search to guarantee optimal solutions. The low-level search finds the optimal path for each robot independently, and the high-level search uses a constraint tree to resolve conflicts among the robots. However, CBS performs poorly with highly coupled robots and has difficulty meeting the time limit in large scenarios. [25] Similar to CBS, ICTS also has two levels of search. [26] The difference is that ICTS finds all robots' individually optimal paths at the high-level search and verifies each state node at the low-level search to check a set of optimal solutions.
In addition, Silver proposed the HCA* algorithm, which computes each robot's path according to a predefined priority and records all path and time nodes in a reservation table when a path computation is completed. [27] Path and time nodes already in the reservation table cannot be used during subsequent path computations, which guarantees that no conflicts occur between paths. To speed up the processing time, it is also possible to map all possible conflicting scenarios and corresponding solutions into numbers and store them in a database. For example, the DDM approach builds a solution database for all 2 × 3 and 3 × 3 subgraphs and ensures quick local conflict resolution via database retrievals. [28] However, DDM can only be applied to warehouse-like environments and cannot be used in narrow aisle-type maps.

Decentralized Approach
With the increasing intelligence of end devices, robots have sufficient environment perception and autonomous decision-making ability to finish tasks. Global replanning (Replan) is an interaction-aware path planning method. [29] Each robot independently searches for a feasible path using the A* algorithm based on global information and searches again for a new alternative path when there is a conflict with others while driving. ORCA continuously adjusts a robot's motion attitude according to the relative positions and velocities of neighboring robots to prevent collisions. [30] However, the Replan and ORCA methods consider only their own tasks and are prone to deadlock in multirobot systems. To enhance the collaboration capability among robot clusters, Ge and Chen proposed a one-way broadcast communication framework in which the robots can obtain the optimal path after receiving a message through the approximation theory of neural networks and gradient optimization techniques; its shortcomings are large bandwidth consumption and long communication delays. [31] Many studies in recent years have shown that learning-based approaches are fruitful in decentralized control strategies for robots. [32] Most methods combine deep learning or reinforcement learning to achieve path finding with good generalization and high solution quality. Liu et al. fed the local environment information collected by sensors into a residual convolutional neural network-based scheduling model for analysis and finally obtained an approximately globally optimal feasible path. [33] However, the method is more applicable to single-robot systems. The PRIMAL method uses reinforcement learning (RL) and imitation learning (IL) to train a decentralized decision model. [12] The robot gets the next action command after inputting the observation and reward values obtained in the local environment into the model.
Since reinforcement learning typically yields sparse rewards over long navigation times, to mitigate the decline in planning effectiveness as the navigation distance increases, the MAPPER method uses global information to plan a reference path from the start to the target position and generates several waypoints with reward values. [34] Unlike other RL-based methods, the robot considers how to reach the next waypoint instead of using the endpoint location as an input value at each decision.
The earlier methods either do not consider historical data in the decision-making process or do not consider the computational overhead associated with the processing of redundant information by a multirobot system. In contrast, to get a better planning solution, our proposed GNNHIM combines historical data to analyze the next potential motion direction of surrounding robots and limits the transmission of redundant information in the communication process.

Problem Formulation
Mobile robots have many applications in industrial manufacturing. Since robots can replace humans in heavy physical and high-intensity handling work, many companies use mobile robots to assist workers in the inbound and outbound storage of goods, thereby increasing the throughput per unit of time. The working environment of the robot is shown in Figure 1. Each robot needs to carry items to a specified location to complete the sorting of goods. The robot completes its positioning by dynamically recognizing the QR codes arranged on the ground, which are distributed in a grid. Therefore, a mobile robot in working status is similar to a four-way drive vehicle in that it can travel in four directions: front, back, left, and right.
To better solve the path planning problem of the mobile robot, we build a mathematical model for the above scenario. Consider a two-dimensional discrete grid environment D ⊂ R^2 of size H × W. There are N_s static obstacles C_s = {s_1, s_2, ..., s_{N_s}} in the environment, with each obstacle s_i ∈ C_s occupying a unit grid cell. The idle region C_f = D \ C_s can be visited by any robot, but each cell f_i ∈ C_f can be occupied by only one robot at any moment t ∈ Z. Let R = {r_1, r_2, ..., r_N} be a team of N robots; each robot r_i ∈ R can execute five discrete actions (up, down, left, right, and wait) and must formulate the action instruction to be executed at the beginning of each moment. After executing one action instruction, the robot moves from its original position to an adjacent position within one unit of time. The position of r_i at moment t is denoted by P_i(t), the makespan is the maximum path cost over all robots, and ||·|| denotes the Euclidean distance. A path of robot r_i is defined as {P_i(0), P_i(1), ..., P_i(t), ..., P_i(makespan)} satisfying ∀t, ||P_i(t+1) − P_i(t)|| ≤ 1. In addition, a robot performs emergency braking to avoid a collision if it comes too close to an obstacle or another robot while driving.
Each robot r_i has a starting position start_i and a target position goal_i of a task satisfying ∀i ≠ j, start_i ≠ start_j and goal_i ≠ goal_j. We follow the standard multirobot system setup, [35] where robots are considered to have finished the task when they reach their end points, after which they stay in place rather than disappear. In this article, multirobot systems do not allow vertex, edge, or swapping conflicts during path planning, but cycle and following conflicts are allowed.
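The conflict rules above can be made concrete with a small sketch. The helper names below are hypothetical (not from the paper); they check vertex and swapping conflicts on discrete grid paths, where each path is a list of (row, col) cells indexed by time step:

```python
def has_vertex_conflict(paths, t):
    """True if two robots occupy the same cell at time step t."""
    cells = [tuple(p[t]) for p in paths]
    return len(cells) != len(set(cells))

def has_swap_conflict(paths, t):
    """True if two robots exchange cells between time steps t and t+1."""
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            if (tuple(paths[i][t]) == tuple(paths[j][t + 1])
                    and tuple(paths[j][t]) == tuple(paths[i][t + 1])):
                return True
    return False

# Two robots swapping cells between t = 0 and t = 1: forbidden.
paths = [[(0, 0), (0, 1)], [(0, 1), (0, 0)]]
```

A planner that enforces these two predicates at every time step rules out the disallowed conflict classes while still permitting one robot to follow another into a cell it has just vacated.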

Graph Neural Network
The graph G = (R, U, W) represents the communication relationship between robots, where R denotes the robots in the network, and U and W represent the edges and weights in the communication network topology graph, respectively. The edges indicate whether two robots have the authority to communicate with each other, and the weights represent the degree of noise interference during communication. With the edge relationship, we can define 1-hop neighbors and k-hop neighbors. The 1-hop neighbors of a node are those directly connected to it in the network, e.g., the 1-hop neighbors of node i can be defined as N_i^1 = {j ∈ R : (i, j) ∈ U_t}. The k-hop neighbors are those nodes reachable from the current node by passing through no more than k edges. As shown in Figure 2, if communication is limited to 2-hop neighbors, r_1 can communicate with r_3, but it cannot send data to r_4.
Assume that each robot r_i transmits feature data x_i ∈ R^F of length F, i.e., each node in the graph carries a graph signal of length F. The graph signals of all nodes can be represented by the matrix X_t = [x_1^T, ..., x_N^T]^T ∈ R^{N×F}. The GNN continuously aggregates the graph signal and updates the node states using the graph shift operator (GSO) S_t ∈ R^{N×N}. The aggregation process must follow the structure of the current graph; e.g., r_1 cannot directly receive the information of r_3 because they are not 1-hop neighbors.
A GSO can be represented in various ways, such as adjacency, [36] Laplacian, [37] and Markov matrices. [38] Using different GSOs in the same network will lead to different final results. Since there is no definite conclusion as to which form obtains the best results, [39] we adopt the adjacency matrix, which is easy to understand, as the GSO.
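A minimal sketch of the adjacency-matrix GSO and k-hop reachability, assuming planar robot positions and a sensing radius R_FOV (the function names are illustrative, not from the paper):

```python
import numpy as np

def build_gso(positions, r_fov):
    """Adjacency-matrix GSO: edge (i, j) iff robots i and j are within
    sensing range of each other (no self-loops)."""
    pos = np.asarray(positions, dtype=float)
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return ((dist <= r_fov) & ~np.eye(n, dtype=bool)).astype(float)

def k_hop_reachable(S, i, j, k):
    """True if node j is reachable from node i in at most k hops."""
    n = len(S)
    reach = np.eye(n)
    for _ in range(k):
        reach = reach @ (S + np.eye(n))   # allow staying put or one hop
    return bool(reach[i, j] > 0)

# Chain r1-r2-r3 with r4 out of range, mirroring the Figure 2 discussion.
S = build_gso([(0, 0), (1, 0), (2, 0), (4, 0)], r_fov=1.5)
```

With a 2-hop limit, node 0 can reach node 2 through node 1, but not the isolated node 3, matching the r_1/r_3/r_4 example in the text.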
The graph convolution operation containing K filter taps is given by

Z = Σ_{k=0}^{K−1} S^k X H_k

where H_k ∈ R^{F×G} is the kth filter. Each node in the graph aggregates the information shared K times by its 1-hop neighbors and then weights this information by the filter taps contained in H_k. [40] Since the filter mainly maps a graph signal in R^{N×F} into another in R^{N×G}, its values can be assigned arbitrarily and stored in a distributed manner in each robot's memory. [41] After the convolution operation, the results require a point-wise nonlinear transformation using the activation function σ: R → R to generate the final output of the GNN layer.
When a GNN module contains ℒ layers of graph convolution, the GSO of each layer is the same, but the output features of the previous layer are fed into the next layer. For example, assume that the output dimensions of the layers are {G_1, G_2, ..., G_ℒ} and Q_l is the output feature of layer l. The output of each layer is given by

Q_l = σ(Σ_{k=0}^{K−1} S^k Q_{l−1} H_{l,k}), with Q_0 = X

Generally, the number of layers of a graph neural network is at most three. Otherwise, too many layers will make the features of all nodes within the same connected component converge to the same value, affecting the performance of subsequent neural networks. [42]
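Under these definitions, one graph-convolution layer of the form σ(Σ_k S^k X H_k) can be sketched in NumPy as follows; the shapes (N = 2 robots, F = 4, G = 8, K = 3) are illustrative assumptions:

```python
import numpy as np

def gnn_layer(S, X, H):
    """One graph-convolution layer: relu(sum_k S^k @ X @ H[k]).

    S: (N, N) graph shift operator (here, the adjacency matrix).
    X: (N, F) input graph signal (one length-F feature row per robot).
    H: (K, F, G) filter taps; tap k weights features aggregated over k hops.
    """
    Z = np.zeros((X.shape[0], H.shape[2]))
    Sk_X = X                      # S^0 @ X
    for k in range(H.shape[0]):
        Z += Sk_X @ H[k]
        Sk_X = S @ Sk_X           # advance to S^(k+1) @ X
    return np.maximum(Z, 0.0)     # point-wise ReLU activation

rng = np.random.default_rng(0)
S = np.array([[0.0, 1.0], [1.0, 0.0]])   # two robots within sensing range
X = rng.standard_normal((2, 4))          # F = 4 input features per robot
H = rng.standard_normal((3, 4, 8))       # K = 3 taps, G = 8 output features
out = gnn_layer(S, X, H)
```

Stacking ℒ such layers with the same S, feeding each layer's output into the next, gives the multi-layer module described above.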

Methodology
In the following sections, we first introduce the necessary data and the corresponding encoding format in GNNHIM, followed by an elaboration of the architectural details and computational process of GNNHIM, and finally, a detailed description of how to train the model using imitation learning.

Local Observation
In the real world, robots can perform environment perception through inexpensive and widely available hardware such as cameras and infrared distance sensors. [43] Therefore, in this paper, we assume that each robot can obtain local observations within a certain range centered on itself. In order to extract environmental features, we use different encoding methods for different types of information. We encode the instantaneously observed information into three channels of image data Z_i^F = {z_1, z_2, z_3} and encode the historical path information that requires continuous observation into vector data Z_i^P. For example, if the environment at a certain moment is as Figure 3 shows, the robot will center itself on the region of size R_FOV × R_FOV, with the obstacle and other robots' position information written into the first channel z_1 and second channel z_2, each of size (R_FOV + 2) × (R_FOV + 2). If the goal position of this robot lies within the sensing area, z_3 is drawn in the same way as z_1 and z_2. Otherwise, as Figure 3 shows, we project the goal position onto the edge of z_3 to indicate the target's orientation.
Let N_h be the upper limit of historical location records and N_p be the upper bound on the number of peers. Then robot r_i's historical path trajectory can be represented by A_i(t) = {P_i(t), P_i(t−1), ..., P_i(t−N_h+1)}. Finally, we define the vector data Z_i^P, which contains not only the robot's own past path but also those of the nearest N_p robots within the perceptual range, i.e., Z_i^P(t) = {A_i(t)} ∪ {A_j(t) : j ∈ N_i(t)}, where |Z_i^P(t)| ≤ N_p + 1. If the number of other robots in the sensing range is less than N_p, the vector data Z_i^P is filled with a dummy value (e.g., −1) to maintain the same dimension.
Thus, the module is well suited to decentralized control of multirobot systems, as each robot communicates only with others within its sensing range and can make the next action decision independently based on the local environmental information {Z_i^F, Z_i^P} obtained through its sensors.
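A rough sketch of how this observation encoding might look in code. The FOV window size, padding, and dummy value −1 follow the text; everything else (the function name, coordinate conventions, history layout) is an assumption for illustration:

```python
import numpy as np

def encode_observation(grid, robots, me, goal, r_fov=9, n_p=3, n_h=4,
                       history=None):
    """Hypothetical sketch: build the 3-channel image Z_F and the padded
    history vector Z_P for the robot at cell `me`."""
    side = r_fov + 2                       # padded window, (R_FOV+2)^2 cells
    z = np.zeros((3, side, side))
    cy, cx = me
    half = side // 2
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < grid.shape[0] and 0 <= x < grid.shape[1]:
                z[0, dy + half, dx + half] = grid[y, x]           # obstacles
                z[1, dy + half, dx + half] = float((y, x) in robots)
    # Goal channel: clip the goal onto the window edge when out of view.
    gy, gx = goal
    iy = int(np.clip(gy - cy + half, 0, side - 1))
    ix = int(np.clip(gx - cx + half, 0, side - 1))
    z[2, iy, ix] = 1.0
    # History of self + nearest peers, padded with -1 to a fixed shape.
    zp = np.full((n_p + 1, n_h * 2), -1.0)
    for row, traj in enumerate((history or [])[: n_p + 1]):
        flat = np.asarray(traj, dtype=float).ravel()[: n_h * 2]
        zp[row, : len(flat)] = flat
    return z, zp
```

With R_FOV = 9 this produces a 3 × 11 × 11 image and an (N_p + 1) × (N_h × 2) history matrix, matching the dimensions used later in the model architecture.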

Model Architecture
The architecture of GNNHIM is shown in Figure 4; it mainly contains convolutional layers, normalization layers, pooling layers, etc., where all kernels use zero-padding. The detailed calculation process is summarized in Algorithm 1. In line 1, robot r_i obtains the sensor information, including image data Z_i^F of size 3 × (R_FOV + 2) × (R_FOV + 2) and vector data Z_i^P of size (N_p + 1) × (N_h × dim), where dim represents the dimensionality of the coordinate system. We assume that the environment is a grid world D ⊂ R^2, so dim = 2 in this paper.
In line 2, each robot builds a network topology graph based on the Euclidean distances between robots, and the communication network is described by G = (R, U, W). If another robot r_j is within r_i's sensing region, an undirected edge (r_i, r_j) with weight 1 is added to G, i.e., (r_i, r_j) ∈ U and w_ij = 1. Otherwise, (r_i, r_j) ∉ U, meaning the distance between r_i and r_j exceeds the sensing range R_FOV. After constructing the graph G, each robot r_i can obtain the GSO and its 1-hop neighbors N_i.

Figure 4. Framework of GNNHIM. The whole model is divided into four regions according to function to facilitate the presentation: feature extractor regions F_1 and F_2, feature fusion region F_3, and action decision region F_4. Each robot can combine the image data Z_i^F obtained by the sensor and the vector data Z_i^P recorded in the onboard memory to get the next execution action. Conv(3 × 3, 1, 64) denotes a convolution layer with kernel size 3, stride 1, and 64 output channels. BN and ReLU represent the normalization layer and the activation layer, respectively. MaxPooling represents the pooling layer, where both the kernel size and stride are 2. FC x represents the fully connected layer containing x channels.
In lines 3-4, Z_i^F and Z_i^P are fed into the convolutional and fully connected layers to obtain the features h_1 and h_2. After feature fusion, we obtain the feature data h_3 of length F = 128 (line 5). Subsequently, robot r_i sends h_3 to its 1-hop neighbors and collects graph signals via k-hop communication exchanges (lines 6-10). When h_3 and the other robots' graph signals are fed into the action decision region F_4, robot r_i obtains the execution probability of each action. Finally, the model chooses the action with the highest probability as the command for the next moment, a_i^{t+1} (line 11). Note that if a robot r_j, rather than a static obstacle, is in the forward direction of robot r_i, r_i will still execute the action a_i^{t+1} as planned, because r_j may move to another location at the next moment, allowing r_i to follow r_j without colliding. If robots are found to be too close to each other (e.g., within half a robot's body length) while driving, they apply emergency braking to prevent collisions.
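The action selection in line 11, choosing the highest-probability action from the decision region's output logits, amounts to the following sketch (the action ordering is an assumption):

```python
import numpy as np

ACTIONS = ["up", "down", "left", "right", "wait"]

def select_action(logits):
    """Softmax over the decision region's 5 output scores, then argmax."""
    probs = np.exp(logits - logits.max())   # subtract max for stability
    probs /= probs.sum()
    return ACTIONS[int(np.argmax(probs))]
```

Because softmax is monotonic, the argmax of the probabilities equals the argmax of the raw logits; the normalization only matters if the probabilities themselves are needed.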

Training Model
Because whether a robot reaches its destination results from a series of action decisions, it is impossible to train the model directly using raw, unprocessed map data. When there is no explicitly class-labeled training set, IL is an effective way to learn control strategies. [44] To generate expert demonstration data that can be used for training, we use expert algorithms to preprocess the map data. Enhanced conflict-based search (ECBS) improves the high-level and low-level searches of CBS and considers the potentially feasible paths of surrounding robots. [45] Therefore, this paper chooses ECBS as the expert algorithm for IL to enhance the implicit collaboration between robots.
In the preprocessing stage, we first use ECBS to generate each robot's "ground-truth paths" A* and the set of expected actions U*. Then, we divide the expected actions according to time steps and combine them with the local observations at the corresponding times to form the expert demonstration data. Thus, in the expert demonstration data, each robot r_i ∈ R at any time t has the local environment information (Z_i^F, Z_i^P) and the expected action U_i^*. After training, when the robot encounters the same environment again at run time, the model should be able to predict that the next action to be executed is U_i^*. We take the environment information in the expert sample dataset as the input of the model F and the expected action as the expected output to construct the training set T. During the training process, we look for a set of parameters θ that minimizes the cross-entropy loss between the model output and the expected action:

θ* = argmin_θ Σ_{(Z_i^F, Z_i^P, U_i^*) ∈ T} CE(F(Z_i^F, Z_i^P; θ), U_i^*)
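A minimal imitation-learning training step consistent with this description might look as follows in PyTorch. The model interface, tensor shapes, and toy network are illustrative assumptions; the essential point is the cross-entropy between predicted action logits and the ECBS expert actions:

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, optimizer):
    """One imitation-learning epoch: minimize the cross-entropy between the
    model's action logits and the expert (ECBS) actions."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    total = 0.0
    for z_f, z_p, expert_action in loader:
        optimizer.zero_grad()
        logits = model(z_f, z_p)            # (batch, 5) action scores
        loss = criterion(logits, expert_action)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

class _ToyNet(nn.Module):
    """Stand-in for GNNHIM with the same (Z_F, Z_P) -> logits interface."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 5)
    def forward(self, z_f, z_p):
        return self.fc(z_f)                  # ignores z_p in this toy sketch

net = _ToyNet()
loader = [(torch.randn(8, 4), torch.randn(8, 2), torch.randint(0, 5, (8,)))]
loss = train_epoch(net, loader, torch.optim.Adam(net.parameters(), lr=1e-3))
```

In the actual pipeline, the loader would yield the (Z_i^F, Z_i^P, U_i^*) triples from the expert demonstration data described above.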

Experiment
Our model is implemented in PyTorch and conducted using a 16-core, Intel Xeon Gold 5218 CPU. The program source code and datasets of this paper can be obtained freely from https:// gitee.com/bupt_htl/gnnhim.

Dataset
We randomly generated ten scenarios with 10% static obstacle density as our dataset, and the detailed parameters are shown in Table 1. Each scenario contains 30 000 cases, and each case contains different obstacle positions and different start and target positions for all robots, as shown in Figure 5. It is guaranteed that the start and target of the same robot are located in the same connectivity graph, to prevent a robot from failing to accomplish its task because no "ground-truth path" to its target exists. The training, validation, and test sets comprise 70%, 15%, and 15% of the dataset, respectively.

Experiment Design
We designed two experiments to examine the performance of different models under different scenarios: the effectiveness experiment and the generalization experiment. The effectiveness experiment focuses on the ability of the models to solve path-planning problems for multirobot systems when they are trained and tested using the corresponding training and test sets of each scenario. The generalization experiment tests the decentralized control models' performance in unseen scenarios and with larger team sizes. In this experiment, all models are trained only on a training set with 20 × 20 maps and 10 robots and then tested directly on the test sets of the different scenarios. In order to evaluate the performance of each model objectively, all reported results are obtained over ten replicate trials. The perceptual range R_FOV is set to 9 and the batch size is 64 during training. We update the network parameters using the Adam optimizer with exponential decay rates of 0.9 and 0.999 for the first- and second-moment estimates, respectively, and the learning rate decays from 1e-3 to 1e-6. The success rate is calculated on the validation set after each epoch, and we choose the neural network parameters with the highest success rate as the final model.
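The optimizer settings above could be configured as follows. The placeholder network and the epoch budget (300 here) are assumptions used only to show how an exponential schedule takes the learning rate from 1e-3 to 1e-6:

```python
import torch
import torch.nn as nn

# Placeholder network standing in for GNNHIM (illustrative only).
model = nn.Linear(128, 5)

# Adam with the stated exponential decay rates for the moment estimates.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Decay the learning rate from 1e-3 toward 1e-6. With an assumed budget of
# 300 epochs, gamma = (1e-6 / 1e-3) ** (1 / 300) reaches 1e-6 at the end.
gamma = (1e-6 / 1e-3) ** (1.0 / 300)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

for _ in range(300):          # one scheduler step per epoch
    scheduler.step()
```

Any schedule that interpolates between the two stated learning rates would fit the description; the exponential form is one natural choice.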

Performance Metrics
We evaluate the algorithms using success rate, detour percentage, and decision time. The specific calculation rules are given below.
1. Success rate

Success Rate = n_success / n_total (7)

where n_success and n_total represent the number of successful cases and of all cases, respectively. A case is considered successful only if all robots finish their tasks within the maximum allowed time steps. Since differences in map size and assigned tasks change the time needed to complete all tasks, it would be unfair to use a constant as the specified time for all cases. Therefore, we first use ECBS to calculate the maximum path cost over all robots (makespan) for each case and then set the specified time of the corresponding case to 3 × makespan. The solution quality for the whole robot team is directly related to the success rate. Even if the model can plan a practical path for most robots, the success rate will be low if one robot in each case cannot reach its goal position in the specified time.
2. Detour percentage

Detour Percentage = (T(start, goal) − T*_expert(start, goal)) / T*_expert(start, goal)

where T*_expert(start, goal) and T(start, goal) denote the total sum of all robots' travel path lengths under scheduling by the expert algorithm (ECBS) and by the model to be evaluated, respectively. The detour percentage reflects the total cost of the scheduling model to some extent, and lower values indicate that the paths planned by the model are closer to the optimal paths.

3. Decision time

Decision time represents the time required for a robot to formulate its next execution action, considering the current environment, after completing the previous one. A shorter decision time allows the robot to adapt quickly to environmental changes and perform complex tasks in real time.
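The first two metrics reduce to simple ratios; as a sketch:

```python
def success_rate(n_success, n_total):
    """Fraction of cases in which every robot reached its goal in time."""
    return n_success / n_total

def detour_percentage(model_total, expert_total):
    """(T - T*_expert) / T*_expert over the summed path lengths of all robots."""
    return (model_total - expert_total) / expert_total
```

For example, 951 successful cases out of 1000 give a success rate of 0.951, and total path lengths of 110 versus an expert total of 100 give a 10% detour.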

Results of the Effectiveness Experiment
In the effectiveness experiment, we select the GNN-based decentralized control model [41] and the centralized offline scheduling model HCA* [27] as baselines. The results are shown in Tables 2 and 3. We find that the success rate of GNNHIM is higher than that of the GNN-based model in all scenarios. The average success rate of the GNN model is 88.9%, while that of GNNHIM is 95.1%, an improvement of 6.2% on average. It is also found that using historical information as a deciding factor in the action planning process can also reduce the robots' total path length. When the robot density is constant, the HCA* model with centralized control is hardly affected by changes in map size, with the success rate consistently in the range of 99% to 99.8%. However, when trying to schedule more robots on a map of the same size, the success rate of the HCA* model begins to decline. Robots cannot occupy the paths of robots with higher priority under the HCA* model, so more free areas become locked as robot density increases, and some robots eventually cannot find a viable path to their targets, causing the success rate to drop from 91.2% to 76.6% on 50 × 50 maps.

Figure 5. Simulation environment demonstration. The numbers embedded in each block indicate each robot's unique identifier, and each robot is required to navigate autonomously from its starting point (green) to the goal point (orange). Notably, each robot's starting and ending points are unique and distinct from those of the other robots. However, it is possible for one robot's starting point to coincide with another robot's goal point.
HCA* has the smallest detour percentage in most scenarios for two main reasons: 1) the detour percentage of a centralized model is generally lower because the central controller assigns paths to individual robots in advance, avoiding the meaningless actions and path-jitter problems of decentralized models; and 2) if no feasible solution exists due to deadlock, the HCA* model tends to stop the robot in place, whereas, owing to its limited sensing range and local observations, a decentralized model can only stop after moving into the location where the deadlock occurs.

Results of Generalization Experiment
We use two decentralized models, the GNN-based and Replan models, [29] as the generalization experiment's baselines. The success rates of the different models are shown in Figure 6. When the robot team is expanded to fivefold and sixfold size (e.g., the scenarios of 45 × 45 maps with 50 robots and 50 × 50 maps with 60 robots), the GNN model has success rates of only 48.9% and 46.4%, while GNNHIM achieves 80.5% and 80.4% in the same scenarios.
The detour percentages of each model are shown in Figure 7. The detour percentages of GNN and Replan are higher than that of GNNHIM across the different scenarios. This poor performance of the GNN-based model is mainly because too few elements influence decision-making, leading some robots to visit a position repeatedly within a short period when the model cannot make appropriate decisions. In contrast, Replan reselects a new path whenever it encounters a conflict and does not consider waiting for others to leave the conflict area before moving along the initially planned route. Hence, the detour percentage of Replan is always higher than that of the other two models. The average detour percentages for Replan, GNN, and GNNHIM are 20.1%, 13.9%, and 10.1%, respectively. The relatively better combined performance of GNNHIM is due to its ability to analyze the actions that nearby robots may perform at the next moment based on their historical information. A higher success rate means fewer robots have yet to reach their goal positions in the later stages of scheduling, which lowers the probability of triggering path jitter and loops and reduces the total distance traveled by all robots. Therefore, the detour percentage of GNNHIM is also lower than that of the other two models.

Figure 6. The success rates of all decentralized control models decrease as the map size expands or the robot density increases. However, the GNNHIM model is more stable and has a high success rate in different scenarios. On the other hand, the success rate of the GNN model drops faster, and it performs worse than the nonlearning-based Replan model in several scenarios. a) Same robot density; b) Different robot densities.
Compared to GNN and Replan, GNNHIM generalizes better. Even though it is trained only in the 20 × 20 scenario with 10 robots, it still maintains a reasonable success rate when applied to other scenarios. Excellent generalization ability means the model has many potential implementations and applications in real life. On the one hand, the GNNHIM proposed in this paper applies to cases where the number of robots may change at any time and is well suited to accurate production scheduling in enterprises. For example, if the number of orders in the logistics industry increases quickly over a short time, managers can add robots to complete work such as crating and packing goods quickly, preventing late delivery. On the other hand, it also applies to scheduling in dynamic environments. When some shelves are temporarily added or removed in the warehouse and managers move around the robot work area, GNNHIM can still ensure the stable operation of the multirobot system because it generates actions by considering local observations from onboard sensors.

Time Consumption of Different Models
The average decision times for the different planning models are shown in Figure 8. For all models, the decision time excludes the period during which the sensors measure the surrounding environment. Decentralized planning models such as Replan, GNN, and GNNHIM let robots generate plans in real time from local environment perception.

Figure 7. The detour percentage of the generalization experiment. GNNHIM makes more correct decisions than the other models, reducing the excess total distance traveled due to wandering. Replan performs worse than the learning-based models, and its robots must travel longer distances in all scenarios. a) Same robot density; b) different robot densities.

Centralized models such as HCA* and ECBS instead compute offline to schedule all robots uniformly and finally obtain the actions every robot should perform at each moment. To determine the average decision time per action, we use the 20 × 20 scenario with 10 robots as a representative example. Assuming every robot can reach its target location within 30 executed actions, we derive the average decision time per action by dividing the total decision time by 300 (30 × 10). Replan consumes the least time because it only re-executes the A* algorithm dynamically without considering other robots; however, its success rate may not be optimal in complex, dynamic environments. GNNHIM, as a graph neural network-based algorithm, analyzes the historical trajectories of surrounding robots and considers their potential movement trends before making decisions. Its decision time is longer than Replan's and GNN's, but it achieves a higher success rate. ECBS is a complete algorithm that can assign feasible paths to all robots through a complex and time-consuming computation.
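The per-action averaging described above is simple bookkeeping; a sketch is given below. Function and parameter names are illustrative, not from the paper.

```python
def avg_decision_time(total_decision_time_s, num_robots=10, max_actions=30):
    """Average decision time per action for the 20 x 20 / 10-robot example:
    total decision time divided by num_robots * max_actions (here 300)."""
    return total_decision_time_s / (num_robots * max_actions)

# Example: 1.5 s of total decision time over 300 actions -> 0.005 s per action
print(avg_decision_time(1.5))  # -> 0.005
```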
However, although it is also centralized, the HCA* model does not guarantee solutions for all robots, resulting in a relatively shorter computation time than the ECBS model.
It is noteworthy that although GNNHIM's decision time is not the lowest, it is still within an acceptable range. Its high success rate reduces task delays caused by robots arriving late at their target positions and improves the work efficiency of the whole robot team. As centralized path-planning models, HCA* and ECBS see their decision-making time grow gradually with the number of robots and the map size. In contrast, GNNHIM makes decisions from a limited sensor sensing range, independent of robot count and scenario size, and can therefore be applied more flexibly in different environments.

Conclusions
In this article, we propose GNNHIM, a novel decentralized motion planning model that continuously senses the surrounding environment and records the historical path information of other robots within sensing range in onboard memory. To generate the next action to execute, GNNHIM extracts features from the local observation and then shares the fused features with other robots still within communication range to collaborate. The experimental results show that GNNHIM achieves better scheduling, with significant improvements in success rate and detour percentage over the other models. Introducing graph neural networks and historical information also significantly improves the model's generalization.
In this research, we assume the robots work and communicate ideally, i.e., 1) all robots make their decisions simultaneously; and 2) communication between robots is synchronous and free of delay. These assumptions are difficult to satisfy in actual scenarios. In future work, we will investigate how the motion planning model should handle asynchronous decision-making and communication in swarm robot systems under real-world conditions. In addition, we will study maximizing the overall benefit by dynamically changing the layout of obstacles (e.g., bins and shelves) in the environment according to task requirements.