A novel divergence measure–based routing algorithm in large-scale satellite networks

In recent years, large-scale satellite networks have been studied emphatically due to its advantages in high throughput and low latency. Many companies and institutions are keen to build large-scale low earth orbit satellite constellations to provide space-based Internet services, which poses a great demand for the design of efﬁcient and reliable routing schemes. Three attribute indexes to evaluate the performance of the satellites are proposed. With the Dempster–Shafer theory, the attributes can be fused for the routing decision-making process. Based on the Message Identiﬁcation divergence (M-I divergence) measure method, a Belief Message Importance divergence based Routing algorithm is proposed. By adjusting the characteristic parameter, it can amplify the divergence between distance-based evidence and others properly. In this way, Belief Message Importance divergence-based Routing attempts to ﬁnd the optimal shortest path according to the status of intermediate nodes. Compared with existing satellite routing schemes, the algorithm that is proposed can approach the optimal routing performance, increasing the throughput by


INTRODUCTION
In recent years, the research on satellite networks is attracting increasing attention due to the rapid development of wireless technology and the continuous reduction of the cost of space materials [1,2]. Compared with other data acquisition methods, the satellite networks can achieve global wireless coverage, which plays an indispensable role in remote sensing and telemetry, meteorological forecast and mobile communication. As their functional connectivity and coverage are not limited by the geographical environment, satellite networks effectively extend the application scope of the ground network, integrate the ground and space network resources and can provide diversified data services for global users [3,4].
With the development of hardware integration technology, low earth orbit (LEO) satellites can be quickly deployed in space networks. LEO satellite has low round-trip delay, low propagation loss, which can support a higher data transmission rate and provide more reliable communications and data return services [5]. Given these outstanding features, with the continu-communication congestion and satellite node failure, and so forth. To solve the above defects, the virtual node routing algorithms have been proposed. The most representative of the virtual node routing strategy is the distributed routing algorithm [12,13]. Since global information does not need to be stored, this kind of algorithm can significantly reduce the cost of satellite computing and save a lot of storage space. However, this kind of distributed algorithm only considers local information and cannot solve the impact of satellite traffic congestion and node failure from a global perspective. If few neighbouring satellites fail at the same time, it is difficult to guarantee the success rate. In [14], the authors proposed a distributed contact graph routing (CGR) design for LSSNs. The basic idea of CGR is to utilize a series of pre-scheduled time-varying contact graphs to calculate the optimal routing [15,16]. When a message is sent from a source satellite node, the CGR algorithm will analyse the contact graph and calculate the set of next-hop nodes, then select the transmission path in the set. When the data reaches the next-hop node, the CGR will continue to run the process on the next-hop node until it is to the end.
Because the satellites are mainly used for data transmitting and receiving, the traditional routing schemes mainly focus on how to find the shortest path from the source satellite to the destination. However, the 'many to one' data transmission mode may occur at certain points where the orbitals intersect. For instance, when the neighbour nodes of a node located at the intersection of two orbits are all processing several vital events, the congestion may occur this time, resulting in congestion. Congestion can cause a large number of data packets to be discarded, reducing the processing ability and reliability of data transmission. When considering the status of satellites [17], those satellites in areas with high traffic density are obviously not good at data transfer service. Moreover, a regional service will fail if a satellite is subjected to a physical attack or a cyberattack [18]. Meanwhile, there will be unnecessary energy consumption and transmission delay due to data retransmission and link closure caused by packet loss. In the energy-constrained satellite network, those frequently used satellite nodes will have significant energy fluctuations, especially those that do not have sufficient solar energy supply, will have significant energy decline, affecting the processing ability and may even reduce the lifetime. Thus, it is essential to analyse the state of neighbour satellites and avoid satellites with poor attributes.
Based on the above discussions, a novel node state evaluation distributed routing scheme for LSSNs is in need. By adopting multi-attribute decision-making method, to design such a routing scheme, the most important step is to fuse the multiple state attribute values in order to determine the best selection. Considering the fundamentals of data transmission, the routing scheme should still focus on the shortest path. Then an efficient and reliable routing strategy (Belief Message Importance divergencebased Routing (BMIR)) is proposed for satellites with different status to compute and analyse the next-hop path autonomously.
The main contributions of this paper are as follows: (1) From the perspective of evaluating multiple status attributes, BMIR establishes the node evaluation function.
That is, the attributes of the nodes are considered comprehensively and abstracted into three evidence indexes: transmission distance ratio, packet processing efficiency and residual energy. (2) To better describe the divergence between two evidence distributions and focus on the shortest path, we propose a belief message importance (BMI) divergence measure scheme with an adjustable parameter, denoted by . (3) Based on Dempster-Shafer (DS) evidence theory and BMI, we propose a flexible and efficient routing algorithm for the next-hop decision-making process. We innovatively define 1 , 2 and 3 to adjust the divergence between every two evidences. By increasing the parameter properly, the divergence between two evidences can be amplified to highlight the distinction before fusing. Considering the operation of each round of the network, we objectively determine the weight of each index according to the divergence values. We also analyse the effect of two extreme cases on the value of the parameter and gave the experience thresholds. (4) Significantly, we evaluate the proposed routing strategy and compare it with other traditional algorithms and similar entropy-based routing algorithms. Simulation results show that our proposed algorithms can achieve the optimal routing performance with lower delay, lower packet drop rate (PDR) and higher throughput. The rest of the paper is given in seven sections. Section 2 gives a brief introduction to the motivation of the routing design and provides the network model. Then, the most important fusion theory is discussed. Based on these, three evidences are proposed in Section 3. In Section 4, we review the Belief Jensen Shannon (BJS) divergence measure scheme and propose the BMI scheme. Some properties are given here to show its superiority. Then Section 5 analyses the adjustable parameter with its effect and experience interval. The simulations and conclusions are drawn in Section 6 and Section 7, respectively.

Motivation
The LSSNs mainly consist of thousands of LEO satellites that are evenly distributed from the poles to the equator. The highspeed movement of LEO satellite networks leads to the constant change of topology. Compared with terrestrial networks, its routing mechanism will face the following challenges: (1) With the increasing number of LEO satellites, massive data packets of multiple services will inevitably pass through the same satellite node, especially those in the traffic-intensive area.
Thus, it will affect the performance of end-to-end transmission delay and throughput. (2) The on-board processing and storage capacity of resources is limited, so it is imperative to make the full use of remaining resources. (3) The increase of adjacent satellite nodes results in the diversity of routing strategies. To improve satellite routing performance, it needs to consider Motivated by these facts, a novel routing strategy is proposed to better utilize the above limited conditions and provide comprehensive node information for routing calculation.

Network model
The topology of LSSNs is shown in Figure 1. Satellites are distributed in different orbits, that is, the main constellation of the StarLink system contains 32 orbits with 50 satellites in each orbit [19]. Orbits cross each other as planned. Satellites communicate with their neighbours via ISLs. All satellites are isomorphic, with broadcasting, processing, forwarding and receiving functions. Due to different service demand provided to neighbour satellites and earth stations, the initial status and the energy cost of the satellites are not the same, which will be detailed in Section 3. Suppose all satellites in the system have the ephemeris of the others, it means one satellite can independently get the trajectory and the meeting interval of each approaching and departing satellites. Based on the above, the packets delivery process can be described as follows: Once a source satellite node needs to send a data packet to the destination satellite node, it determines its neighbour nodes set by analysing the ephemeris. Several conditions shall be considered in the routing selection process. Once the best forward neighbour satellite node is determined, the data packet will be sent. Due to the extremely fast transmitting rate and on-board processing rate, the routing scheme can be seen as static routing in the transmission process. Despite this, the topology does change slightly. The information of the next-hop nodes set needs to be updated in real time as the sending satellite node changes.
The LSSNs studied are mainly used for event discovery and information collection. In the routing decision, the selection of the next satellite node is always affected by the distance and the status of the satellites in the next neighbour's set. The optimal routings determined under multiple evidences are different or even conflicted. As a classical multi-evidence decision-making method, DS evidence theory can effectively deal with this uncertain and incomplete information, which can provide a theoretical basis and fusion rules for the comprehensive judgement of satellite nodes performance. Hence in the following part, we will introduce the fusion rules of DS evidence theory.

DS evidence theory
DS evidence theory, as a generalization of Bayes' probability theory [20], was first proposed by Dempster in 1967, and then perfected by his student Shafer. With the development of multisource information, DS evidence theory has been applied in many fields, such as target recognition [21,22], artificial intelligence [23,24] and fault diagnosis [25]. We mainly adopt the basic probability assignment (BPA) function and Dempster's rule of combination.
(1) BPA function The frame of discernment ℍ is defined as a set of mutually exclusive and collectively exhaustive hypotheses of the proposition A, m(A) is the BPA function in the frame of discernment ℍ, that is the degree of support in the propo- The BPA function satisfies the following conditions: (2) Dempster's rule of combination Let m 1 and m 2 be two mutually independent BPAs over the same frame of discernment ℍ, Dempster's rule of combination, represented in the form m = m 1 ⊕ m 2 , is defined as follows: with where B, C ∈ 2 ℍ and K is the coefficient of conflict between BPAs m 1 and m 2 . K demonstrates the degree of conflict between two evidences.
The above is an example of two evidences. It is obvious to see that the DS combination rule satisfies associate law. When more than two evidences are needed as decision-making indexes, do fuse the first two evidences according to Equations (2) and (3), and then fuse the result as new evidence, so as to get the final fusion result by analogy [26].

ATTRIBUTE INDEXES DECISION METHOD IN LSSNs
DS evidence theory is applied in the routing decision-making process by fusing multi-dimensional evidences. The evidence can be the distance between the transmitting and receiving satellite nodes, the processing ability of intermediate forward nodes, the traffic congestion of neighbour nodes set, residual energy and so on. To simplify our algorithm, the above evidences can be summarized as three attribute indexes: transmission distance ratio, packet processing efficiency and residual energy.

Transmission distance ratio
This paper considers the spatial positional relationship between the current satellite node i, the destination satellite node d and the forward neighbour node n. It is readily seen that the closer the forward neighbour satellite node to the transmitting node i, the closer the straight line distance dis id is, the faster the data packet can be transmitted to d . As it can be seen from Figure 1, satellite node a is closer to the straight line from the transmitting node to the destination node than b, so node a under this index will be the preferred node as the next selection. As shown in Figure 2, satellite node a is selected to be the next hop of satellite node i, the distance should be calculated with the straight line from i to a, denoted by dis ia . However, there are multi-hops in the whole transmission process, even if one-hop distance is the straight line distance, with the next-hop changing, the trajectory should be a broken line. From the aspect of shortest path transmitting, the distance can be regarded as the spherical distance, denoted by ⌢ dis id . According to the spherical distance calculation formula, ⌢ dis id can be calculated in the following equation: (see Equation 4).
In Equation (4), R is the mean radius of the earth, height is the satellite height related to the ground, lon and lat are the longitude and latitude of the satellite, respectively. Similarly, we can have the distance between a and d , denoted by ⌢ dis ad . Hence, the expression of transmission distance ratio is as follows: Note that the larger d (a) is, the shorter the total distance will be. So the data packet is forwarded along the shortest path as much as possible.

Packet processing efficiency
In our model, the end-to-end transmission delay includes linkwaiting delay, queuing delay and free-space propagation delay. Though queuing delay may have only a small portion of the path delay, considering the queue size of intermediate nodes and packet processing ability, when packet stream bursting, queuing delay may lead to traffic congestion, resulting in the increase of the end-to-end transmission delay. In this case, those satellite nodes near the destination satellite node and the satellite nodes in areas with a high demand of services (e.g. shown in Figure 1, coloured by red) are prone to reach the limit of processing capacity, which seriously affects the routing selection. We assume that the satellite node transmits and receives packets at a constant rate. The expression of packet processing efficiency is as follows: where Q len is the queue length of the intermediate node, Q res is the residual queue length, R s is the service rate. The larger q(a) the satellite has, the higher packet processing efficiency will be.

Residual energy
Under the condition of limited satellite node energy, active intermediate nodes will consume a large amount of energy to forward data to neighbour nodes, resulting in relatively fast energy consumption. Thus, it is unable to carry out data forwarding in future time. It is obvious that the routing strategy needs to consider the facts of node activity and energy availability, so as to select the relay node for the data carried by the node more reasonably. Therefore, from the perspective of energy, the residual energy of the node can be abstractly represented as follows: where E ini is the initial state energy, E sup is the extra energy supplied by solar energy, E base is the fixed energy for processing onboard event, E trans is the energy consumption for data packet transmission, E scan is the energy for scanning neighbour nodes.
Considering the model we proposed, when E sup is continuously supplied by the sun, the residual energy seems to be sufficient. That is, if the intermediate satellite nodes that are in the next-hop selection set all fully receive solar energy supplies, they tend to have balanced (maximum) residual energy, denoted by E full . As a result, energy-based evidence has a less decisive role in the decision-making process.
However, satellites cannot be constantly illuminated by the sun, so E sup is not always replenished. When a satellite is on the dark side of the earth, E sup equals to zero, at this time, the residual energy decreases along with packets processing and forwarding. To make the energy model more clear, two thresholds are set to divide the residual energy intervals, and the residual energy distribution intervals are expressed as follows: can be treated as nodes that receive sufficient solar energy; when < e lower th , if the energy of the node continues to decrease, it shall lead to harmful effects to the satellite's lifetime, which further reduces the connectivity of the network, and the data carried by intermediate nodes will be discarded along with the low power of the node. Under this circumstance, these nodes should be considered invalid and not included in the nexthop selection. And when e lower th ≤ < e upper th , e(a) is a linear distribution with respect to E (a), reflecting the relationship between the residual energy and the selection probability. Furthermore, e(a) is a benefit index, the larger e(a) is, the higher residual energy node will be preferred when choosing the next hop.

BMI DIVERGENCE MEASURE
DS evidence theory has been successfully applied in data fusion, intelligent optimization, decision analysis and so on. Here, it is used in three evidences fusion and uncertainty inference. Actually, fusion is a kind of information processing. From the perspective of information theory, entropy is the measurement of the amount of information that an event contains. By evaluating the entropy of three evidences and giving the entropy weight, we can intuitively see how much information the different index contributes to the fusion result. Furthermore, by considering both of the credibility degrees between the evidence and the uncertainty measure of the evidence on the weight, we can obtain a more appropriately weighted average evidence before using the Dempster's combination rule.

Belief Jensen-Shanon (BJS) divergence measure
Let m 1 and m 2 be two BPAs in the frame of discernment ℍ, where A i is a hypothesis of m and ℍ contains h independent and collectively exhaustive events. The BJS divergence between m 1 and m 2 is expressed as: where S (m 1 , can also be expressed in the following form: where Three properties of the BJS divergence [26] are as follows: √ BJS(m 1 , m 2 ) satises triangle inequality.

BMI divergence measure
Note that ) that appears in BJS is a kind of relative entropy. The relative entropy as a special case of K-L divergence is a superior tool of measuring information distance. Moreover, Shannon entropy can also be regarded as a special case of K-L divergence with a uniform distribution. To better describe the divergence between two evidence distributions, a special divergence measurement [27] is given with an adjustable parameter.
By integrating the DS theory with M-I divergence, a novel divergence measure which is designed for the belief function is defined as follows: Definition 1. Let A i be a hypothesis of the belief function m, and let m 1 and m 2 be two belief functions in the same frame of discernment ℍ, containing h independent and collectively exhaustive hypotheses. The BMI divergence is defined as: where Furthermore, BMI(m 1 , m 2 ) can also be expressed in the following form:

Properties of BMI divergence measure
(1) Symmetric and always well defined.
, the relationship between BMI(m 1 , m 2 ) and BJS(m 1 , m 2 ) satisfy the following form: The proof of the above properties is given in Appendices.
Note that the m j ( j = 1, 2, 3) in our proposed model are homogeneous belief functions whose hypotheses consist of singleton sets.
Compared with BJS(m 1 , m 2 ) which can only describe the distance in the fusion process, the distance between two adjacent evidence distributions can be amplified in BMI(m 1 , m 2 ). Moreover, BMI(m 1 , m 2 ) is more sensitive than the other measurements to measure the distance between two non-adjacent distributions. Thus, it is more efficient for BMI(m 1 , m 2 ) to distinguish two distributions by adjusting the parameter properly.

BMIR ALGORITHM
In this section, we focus on a divergence measure-based routing algorithm. Three adjustable parameters 1 , 2 and 3 are proposed at first. By adjusting dynamically, we can amplify the divergence between each two evidence, hence adjust the weight of each evidence in the fusion process. In order to prove that the BMI is still inclined to select the shortest path with proper , we give the relationship between parameter and distance by analysing the extreme cases. Then, a divergence measure-based routing algorithm is proposed.

Adjustable parameter and weight allocation
BMI divergence is proposed to measure the discrepancy and conflict among the multiple attribute indexes of satellites. Considering the three evidences proposed in our model, the processing ability-based evidence and distance-based evidence are randomly distributed, and the residual energy-based evidence tends to be almost uniformly distributed. However, for distance-based evidence, the evidence values are almost equal when the source node is far from the destination node, but will vary considerably when the intermediate nodes are closed to the destination node. It is not difficult to imagine that as the routing process continues, the divergence between distance and processing efficiency becomes smaller, and the divergence between distance and residual energy becomes larger, resulting in the evidence that dominates the fusion process changes from the processing efficiency-based evidence to both two evidences.
Based on the above facts, it is necessary to increase the proportion of distance-based evidence in the fusion process. Let E 1 be the distance-based evidence, E 2 and E 3 are processing-based evidence and energy-based evidence, respectively. Definition 2 (Adjustable parameter). For these three evidences in our fusion process, define 1 to be the parameter to adjust the divergence between E 1 and E 2 , define 2 to be the parameter to adjust the divergence between E 1 and E 3 , define 3 to be the parameter to adjust the divergence between E 2 and E 3 .

Definition 3 (Weight allocation
It is readily seen that BMI 11 , BMI 22 and BMI 33 are zero. The average BMI divergence BMI i of evidence E i is calculated as: The characteristic weight under evidence E i is defined as: According to Definition 3, the greater the divergence of E i with the others, the larger the amount of information reflected by this evidence. Moreover, according to Proposition 3, with the increasing of 1 , BMI 12 becomes larger. Besides, as the node selection process proceeds, the distribution of E 1 is gradually uneven, leading to an increase on BMI 13 . Both aspects above are beneficial to BMI 1 , which satisfies our need to consider distance as the primary determinant.

Extreme cases
Since the routing strategy we proposed is still based on the shortest path, it does not ignore the influence of other factors in the selecting process. Hence, it should be noted that when adjusting parameters 1 , 2 and 3 , it should not exceed a certain threshold, in which case distance is absolutely dominant while other evidence is too weak to be effective.
Based on the spatial topology information of intermediate satellite nodes, two models are given to illustrate the extreme cases. It should be emphasized that the two cases only represent the ideal extreme cases, which do not exist in reality. Our strategy is to find the appropriate mapping relationship between and distance from these cases. A special spatial relationship of distance is shown in Figure 3.
Satellite nodes a, b and c are neighbour nodes of the current node i, and d is the destination node. The dotted line in red represents the maximum communication radius R max of node i, and is set to 1000 km. The dotted line in green represents the minimum radius of the selection range for the next hop R min . In our proposed method, the open-angle of neighbour nodes selection range is set to 90 • , in order to avoid the cases that the alternative nodes of the next-hop still contains the alternative nodes of the previous hop, R min must satisfy the relationship below: As we can see in Figure 3, from the perspective of the shortest path, a always has an advantage over other selected nodes (e.g. b and c). However, when considering the value of processingbased evidence E 2 and the value of energy-based evidence E 3 , if two values of node a are almost zero, respectively, it is unwise to still take a as the next hop. To give a numerical analysis of distance-based adjustable parameter 1 and the distance, two extreme cases are discussed below. Case 1: Satellite node b has the maximum processing ability and residual energy, the value of E 2 and E 3 are almost full, while satellite node a and c are poor in these. In this case, node b has the absolute advantage and is more likely to be the next hop, although its distance-based evidence is not as good as node a.
Case 2: Both satellite nodes b and c have the same processing ability and residual energy, and the attributes of node a are the same as in Case 1. In this case, nodes b and c are all likely to be the next hop and a still behaves worse.
Based on the above, we calculate the 1 when the distance between the current satellite node i and destination satellite node d (denoted by ⌢ dis id ) increases. As shown in Figure 4, the above surface corresponds to Case 1, and the bottom surface corresponds to Case 2. It is readily seen that as the distance dis id increases, the limit 1 is also increasing. Under these constraints, when the mapping of 1 and ⌢ dis id is close to Case 1, the selection strategy tends to be the shortest path selection strategy; when it is closed to Case 2, it still tends to be the shortest path dominant, but not as strong as the former. It should be noted that when 1 exceeds the calculated threshold, the node with the shortest distance (e.g. node a shown in Figure 3) will be chosen even if the other nodes perform better.
Case 1 ′ and Case 2 ′ will be discussed in Section 6.

Design of BMI-based routing (BMIR)
To more clearly describe the routing process, the overall procedures of the proposed BMIR are summarized in Algorithm 1. When the routing search requirements are established, the trans-mission path of data packets needs the satellites with decisionmaking capability to calculate according to DS evidence theory and BMI divergence measure. According to the searching range shown in Figure 3, the neighbour nodes set is confirmed. Then, by collecting the status of neighbour nodes, the three evidences can be established and calculated. Note that the values of different evidences can be different dimensions, that is, the processing ability and residual energy. In order to eliminate the different dimensions of each evidence, a normalization step shall be implemented: where min v a and max v a are the minimum and maximum values of each evidence, respectively. As proposed in the extreme cases, the 1 → (dis id , 2 3 ) relationship function can be used in BMI characteristic weight allocation process. All three weights are calculated and the weight of distance increases moderately according to demand. Finally, based on the data fusion rule of DS evidence theory, three values are fused with the characteristic weight calculated before. The fusion result shows the numerical comparison and the node with the maximum value should be selected.

SIMULATION RESULTS
In this section, we perform a series of simulations to evaluate the performance of the proposed routing algorithm with ALGORITHM 1 BMI Divergence Measure based Routing traditional methods. Simulation results show that our routing scheme can approach the optimal routing performance by adjusting the parameter properly.

Configuration
In this simulation, a Walker constellation is adopted to construct satellite networks by STK 11 and the hardware and software configurations of the simulation platform are: Core i9-9880H CPU, 2.30GHz, 16GB RAM, O.S. Windows 10 Professional 64 bits. We also assume that all satellites have omni-directional antennas, and each satellite can connect to multiple satellites which are distributed in its communication range at the same time. The other specific simulation parameters are shown in Table 1. Comparison routing schemes are given as follows: • Proposed routing (BMIR): BMIR algorithm based on BMI divergence measure is performed. BMIR will be executed at the source satellite node and the intermediate nodes by evaluating the distance and the status of neighbour nodes to find the optimal shortest routing path. DS-EERA adopts an entropy weight method to objectively determine the weight of evidence indexes before fusing each index value [28]. It executes the routing algorithm at each intermediate node during the transmission. • Belief Jensen-Shannon based routing (BJSR): As a similar algorithm in the previous work [26]. BJS also gives the divergence by adopting BJS measurement. To justify the improvement of BMIR, the BJS-based routing algorithm is also implemented in the simulation to compare the performance. • Analytic hierarchy process (AHP): AHP is a decision analysis method with compound criteria that are used to lower the ratio of the paired comparisons of criteria and alternatives, both discrete and continuous that will be arranged in a multilevel hierarchy. This comparison can be drawn by using the basic scale that shows interest/relative strength based on the preferences of deciders.

Routing performance
The performance comparison between BMIR and CGR versus different packet sizes is shown in Figure 5 and Figure 6. To make the performance curves easier to follow, solid lines in the figure represent the performance of PDR, and the dotted lines with throughput label in the figure represent the performance of throughput. Since CGR only calculates the shortest path without considering the status of forward nodes, it may choose the nodes with poor processing ability, which is manifested as traffic congestion, resulting in high PDR and low throughput. We can see that our BMIR routing schemes which consider the impact of multiple factors, are close to Case 1 and Case 2, named Case 1 ′ and Case 2 ′ , respectively, and can approach the optimal performance. When the packet size is small, BMIR in Case 1 ′ can FIGURE 5 PDR and throughput performance comparison versus different packet size between BMIR and CGR

FIGURE 6
End-to-end transmission delay comparison versus different packet size between BMIR and CGR offer similar performance with BMIR in Case 2 ′ . When the packet size exceeds 400 kb, the performance of BMIR in Case 1 ′ degrades compared with BMIR in Case 2 ′ . This is due to the fact that BMIR in Case 1 ′ focuses too much on distance and ignores other facts. At this point, the algorithm gradually evolves to CGR, and the nodes with poor performance may be selected.
Since the average end-to-end transmission delay increases as the packet size increases, the processing ability of intermediate nodes will highly influence the performance of delay. For CGR scheme, it executes the routing algorithm at each intermediate node, which can adapt to the complex topology while packets are transmitting, from this aspect, it should reach the best performance in transmission delay. However, those contacts in contact plan still lack flexibility. CGR is designed for global optimization. During the transmission of users' data packets, each intermediate node needs to wait for these contacts in the contact plan to set up, and then the transmission can start. Moreover, when considering the processing ability, the nodes which are chosen by executing CGR algorithm may have a short residual task queue, this further leads to the increase of queuing delay. In this case, there is no doubt that the delay performance of CGR is worse than BMIR. For BMIR algorithm, both of them weigh the shortest path and the status of forward neighbour nodes, though BMIR in Case 2 ′ may select more hops from a global perspective, the total transmission delay is even lower than that in Case 1 ′ . In general, Figure 5 and Figure 6 validate the performance of CGR and BMIR. Since BMIR, BJSR, DS-EERA and AHP all execute weight allocation methods before the fusion process, it is necessary to make a performance comparison between them. In the assessment of the relative importance of the two elements, we assumed that distance-based evidence is more important than the other two evidences. For instance, in AHP (1-2-3), distance-based evidence rated three times more important than the other two evidences, and processing-based evidence rated two times more important than energy-based evidence. The simulation results are shown in Figures 7-9. We can see BMIR in Case 2 ′ outperforms other algorithms as packet size varies. The performance gap between BMIR in Case 1 ′ , BJSR and AHP (1-5-5) are small. The reason is described as follows. Although BJSR considers the status of neighbour nodes, the weight of distance it allocates is too even to make a short path. Thus, transmission delay and excessive interference are inevitable. However, BMIR in Case 1 ′ is nearly approaching the limit case, because the mapping relationships it uses are too extreme to achieve a satisfactory result. Similarly, AHP (1-5-5) allocates too large a proportion of distance-based evidence without noticing the influence of other factors on transmission, leading to traffic congestion at intermediate nodes. Hence, these three algorithms are similar in performance. Unlike AHP (1-5-5), the importance of three evidences is considered relatively evenly without lack of inclination in AHP (1-2-3). It should be emphasized that although AHP utilizes the weight allocation strategy by making the End-to-end transmission delay comparison versus different packet size among six routing schemes pairwise comparison matrix, it does determine the weight allocation values before the beginning of global decision. And the distribution of these weight values is determined only by the preference of the decision maker or some actual measurements values. On the contrary, in the whole selection process of intermediate nodes, BMI calculates the local optimal solutions and pays attention to the distance to keep the global sub-optimal. In addition, BMI is devised to measure the discrepancy and conflict degree among the evidences, then the credibility degree deriving from BMI divergence measure is obtained to denote the reliability of the evidences. It is an objective measurement and allocation of information contained in evidence from the perspective of information entropy, not just from a subjective point of view. From the results, we can see AHP (1-2-3) and BMIR in Case 2 ′ , which both pay attention to reason- able weight allocations, outperform the others, and BMIR in Case 2 ′ performed slightly better than the former. Particularly, DS-EERA, BJSR and AHP simply give the weight of distance, without noticing the importance of one-hop distance, the selections may become inaccurate and sub-optimal in shortest path routing strategies. As we have mentioned before, considering E sup , which represents the extra energy supplied by sun, it is necessary to discuss the effects of residual energy and energy thresholds on the selection results. To evaluate the impact of the residual energy thresholds, we set the packet size as 300 kb, residual energy is not always sufficient, that is, there are insufficient and extreme cases (e.g. lower than 20%). Figures 10-12 illustrate the routing performance comparison versus different energy thresholds. As the upper threshold and lower threshold decrease, the transmission delay and throughput of BMIR in Case 2 ′ and AHP (1-2-3) change slightly because these two schemes give reasonable weight values of residual energy while ensuring that distance dominates. Therefore, these schemes are better at avoiding nodes of insufficient energy. However, the performance change in PDR is still evident. The reason is that the global weight values in AHP are assigned at the beginning, node attribute evaluation cannot be carried out flexibly with the progress of the process. Hence, the node selected may not be the optimal solution. The results of BMIR in Case 1 ′ , AHP (1-5-5) and BJS are not ideal compared with the previous two algorithms, as they did not allocate the weight of the evidences properly and paid too much attention to distance. DS-EERA still behaves the worst because it did not take into account the discrepancy and conflict among the evidence. From the results in Figures 10-12, an overall decline in the linear interval between the two thresholds will result in a deterioration in overall performance. It results from the fact that the decrease of the upper threshold would cause more nodes to be considered sufficient so as to be equally selective, but which in fact do not perform as well as nodes that are truly sufficient. And the rise of the lower threshold will result in fewer nodes with almost exhausted energy being discarded, that is, the discarding rate will be lower, resulting in the overall performance decline.

CONCLUSION
We developed an efficient and reliable routing scheme for LSSNs. Firstly, we proposed the topology description model and described the important three evidences in satellite routing problems. Thanks to DS evidence theory, multiple evidences can be fused in the routing decision-making process. Based on the DMI divergence measure method, we proposed a BMI divergence-based routing algorithm. By amplifying the divergence between distance-based evidence and others properly, BMIR can find the optimal path according to the status of intermediate nodes. Compared with the traditional routing algorithm (CGR and AHP) and other entropy-based routing algorithms, our analyses showed that the proposed routing algorithm could approach the optimal routing performance with lower delay, lower PDR and higher throughput.  Proof. where (i = 1, 2, … , 2 h ). □

A.2 Appendix B: Non-negative property Proposition 2. BMI(m 1 , m 2 ) with
> 0 is non-negative for any x with > 0. It is readily seen that the second-order derivative of f (x) with respect to x is positive, namely, x > 0. Then, we know that f (x) is a convex function for x ∈ R. According to Jensen's inequality and the concavity of log(x), we have: In particular, the equality holds if and only if m 1 (A i ) = m 2 (A i ), where (i = 1, 2, … , 2 h ).

A.3
Appendix C: Monotonicity Proposition 3. For the parameter ∈ (0, +∞), the BMI(m 1 , m 2 ) Proof. Calculate the first-order partial derivative of BMI(m 1 , m 2 ) with respect to : (see (C.2)). . Otherwise, BMI(m 1 , m 2 ) is increasing in . According to this property, it can be obviously deduced that is an adjustable parameter for BMI(m 1 , m 2 ) to amplify the divergence between different m(A i ) (i = 1, 2, … , 2 h ). Note that BMI(m 1 , m 2 ) is composed of D (m 1 , m 2 ) according to Equation (17), the above method is also suitable for calculating the concavity of D (m 1 , m 2 ), it is not difficult to see that D (m 1 , m 2 ) is also monotonically nondecreasing in .

A.4
Appendix D: Convexity Proposition 4. For any ∈ (0, +∞), BMI(M, N ) is jointly convex with an exponential form. That is, for two given pairs of probability distributions (M 0 , N 0 ) and (M 1 , N 1 ) without zero elements, and any 0 < < 1, according to Equation (12)  Proof. Define f (x) = xe x with > 0 and x ∈ R. It is easy to see that the first-order derivative of f ′ (x) = e x (1 + ) and considering the concavity of logarithmic function, we have: which implies that D (m 1 , m 2 ) ≥ D(m 1 , m 2 ) does work for ≥ 1 due to the monotonicity of D (m 1 , m 2 ). Furthermore, BMI(m 1 , m 2 ) and BJS(m 1 , m 2 ) are both composed of the above two parts, respectively. It is readily seen that BMI(m 1 , m 2 ) ≥ BMI(m 1 , m 2 ) when ≥ 1.