A network-decomposition-based multi-rate parallel transient simulation technique for active distribution networks

Emerging power sources and load demands penetrating the power system on demand side lead to diversiﬁed dynamic processes of distribution networks and thus create challenges for transient simulation of large-scale active distribution networks. This paper proposes a network-decomposition-based multi-rate parallel transient simulation technique for active distribution networks. Prominent advantages of the proposed method are accentuated by balancing computational complexities of CPU cores and devised core-oriented adaptive step sizes in terms of the simulated elements with various dynamic characteristics. First, an optimised decomposition strategy of active distribution networks is proposed with the goals of balancing the computational burden of each core and minimising interactive data between cores. On that basis, coordinating models of multi-port networks and equivalent models of interfaces are established. Then, a parallel simulation method of coordinated adaptive step sizes is proposed to accelerate the transient simulation. Case studies on eight feeders connecting to a common 10-kV bus are carried out to verify the simulation speed and precision of the proposed parallel simulation method.


INTRODUCTION
Over recent decades, large-scale penetration of new facilities, such as distributed generations (DGs), electric vehicles and energy storage systems on demand side, gives rise to the emerging active distribution networks [1][2][3]. Due to increasing amount of elements with different dynamic response characteristics, the new generation of distribution networks can essentially be characterised by the operation mode of multi-end multisource power supply and, thereby, have gradually evolved into increasingly complex infrastructures [4], [5]. Accordingly, the opportunity of decoupling simulation and challenge of real-time dynamic simulation prompt the development of multi-rate parallel transient simulation methods for active distribution networks. Significant concerns have come to the fore, over funda- of speeding up the simulation while guaranteeing the simulation accuracy [6]. In order to achieve efficacious transient simulation of largescale active distribution networks, the existing research works generally come into play from two aspects of simplifying element models and resorting to parallel simulation, respectively. Belhocine and Marinescu [7] and Ha et al. [8] used the method of dimension reduction to simplify simulation models so as to reduce the computational burden of the transient simulation; however, the dimension reduction can hardly be controlled completely and, thus, cannot always justify the loss of detailed element dynamics [9]. On the other hand, parallel simulation technology based on hardware acceleration seems to possess great advantages of accelerating the simulation without major sacrifices of the modelling accuracy. Therefore, many efforts have been paid to the realm of distribution network decomposition, which purposely caters for parallel simulation advantages. Nowadays, the common parallel simulation platforms include multi-core CPUs, CPU clusters, field-programmable gate arrays, GPUs and supercomputers [6], [10]. Be that as it may, it has been found in [11] that the hardware-in-the-loop parallel simulation systems are effective to simulate the large-scale distribution networks; however, there is still large room for simulation efficiency to further increase by means of network decomposition, subnet coordination and simulation step size optimisation.
As for large-scale active distribution network decomposition, the existing research can roughly be divided into two types: equation-level method and element-level method. Lee et al. [12] and Peng et al. [13] proposed empirical mode decomposition technologies to achieve explicit parallel simulation and implicit parallel simulation of distribution networks with large sparse matrices. However, due to the large amount and high complexity and coupling relations of equations in transient simulation of active distribution networks, it is still difficult to gain the optimal equation-level decomposition results. An element-level network decomposition method was proposed in [14], and an optimisation model was established based on the principle of balancing the element number of each subnet. Wang et al. [15] put forward an optimised decomposition strategy taking into account calculation ability of parallel processors to optimise network decomposition by balancing the element number of each planned subnet and the amount of necessary interface data. However, in the absence of modelling element dynamic responses, the proposed strategy in [15] is restricted to be applicable to steady-state scenarios.
Subnet coordination is defined for synchronising inter-media simulation information between the decomposed sub-networks. Some researchers have focused on data interaction and coordination between subnets. On the one hand, in traditional parallel simulation methods as in [16], to achieve data interaction between subnets, power supply equivalent models often adopt an identical coordination model to govern subnet intercommunication, which utilise current ports or voltage ports, respectively. However, with neglection of the special connection topology of subnets, the amount of interactive data cannot be reduced as expected. On the other hand, the existing equivalent models, as proposed in [17], make all elements equivalent to controlled voltage sources for the coordination between subnets without considering various external electrical characteristics of different elements, so as to obtain the complete simulation results of each feeder. As a consequence, the coordination models cannot be adjusted flexibly towards external electrical characteristics of different elements in active distribution networks.
The existing methods of determining simulation step sizes fall into two coarse categories of fixed step sizes and varying ones. For instance, Aristidou et al. [18] adopted the same sampling step size for all subnets, which cannot reflect different step size requirements of various elements in the transient simulation. On the other hand, Duan et al. [19] adopted a fixed step size for each element in the whole simulation process, which proposed a multi-rate parallel method for different subnets and realised data interaction between subnets with various simulation time steps. However, the weakness lies in that the subnet with small time step always needs to extrapolate the state information at the same simulation moment of the other sub-nets with larger time steps. The interpolation method described in [20] is intended for overcoming the above shortcoming by forecasting intermediate states of large-step subnets; thus, it is difficult to ensure the extrapolation accuracy. Fortunately, simulation performance is validated to be able to be improved by orchestrating simulation step sizes for certain transient characteristics of simulated elements. For instance, an adaptive variable step method proposed in [21] was applied to transient simulation of active distribution networks. The adaptive adjustment of simulation step sizes of subnets was decided by leveraging local truncation errors.
In summary, the shortcomings of the existing parallel transient simulation methods for active distribution networks are threefold.
(1) The optimised decomposition models only consider the balance of element number of each subnet and the minimum amount of interface data, without considering the computational complexity of different elements belonging to each subnet. (2) Subnet coordination models lack the consideration of connection configuration of subnets and the requirement of data interaction between the subnets, such as element models, interface types and interactive relations. (3) The parallel simulation method does not realise multi-rate parallel communication among subnets of various elements with distinct complexity needing to be simulated with inhomogeneous step sizes.
To overcome the above shortcomings, this paper proposes a network-decomposition-based multi-rate parallel transient simulation technique for active distribution networks. The main contributions lie in the following.
• In order to solve shortcoming (1), an optimised decomposition strategy is proposed, which balances the computational burden of subnets and minimising the amount of entailed communication data between the subnets, with consideration of the subnet-inside element type. • In order to solve shortcoming (2), equivalent coordination models are selected according to connection configuration of subnets and elements, which contain element models, interface types and interactive relations. • In order to solve shortcoming (3), the proposed multi-rate parallel simulation method with subnet-oriented adaptive step sizes is featured by an interpolation method to coordinate subnet simulation with different time steps, in terms of different interaction data requirements as well as element type. In addition, another salient aspect is that an adaptive adjustment of simulation step sizes for different subnets is realised by controlling truncation errors. Moreover, the accuracy of an employed variable step size technology with controlled truncation errors is ensured by proportional-integral (PI) controllers [22] fused into the proposed method.
The remainder of this paper is organised as follows. Section 2 describes the architecture of the network-decomposition-based FIGURE 1 Parallel simulation framework of active distribution networks multi-rate parallel simulation technique. The dedicated network decomposition strategy is presented in Section 3. Section 4 elaborates the proposed multi-rate parallel simulation method with subnet-oriented adaptive step sizes. Section 5 presents a case study on a testing system of eight feeders in a 110/10-kV substation. Finally, conclusions are drawn in Section 6.

NOTIONS AND FRAMEWORK
As shown in Figure 1, the architecture of the proposed multirate parallel simulation method with subnet-oriented adaptive step sizes is composed of three layers. For thoroughly understanding this architecture, some pivotal notions are clarified as follows.
First of all, basic elements, such as power line, load, DG and diesel generator, which comprise an active distribution network are classified and modelled in detail to cater for accurate simulation. To this end, a constituent element can be viewed as either a long-term dynamic element or an electromechanical dynamic element or an electromagnetic dynamic element, in terms of its dynamic response speed to voltage and current fluctuation. To facilitate the narrative, the long-term dynamic elements refer to traditional elements (type I), while electromechanical or electromagnetic dynamic elements are both termed dynamic elements (type II). Note that the elements grouped in type II can further be identified by its computational burden, e.g. photovoltaic generation system as type II.1, synchronous generator as type II.2, and so on. Each type of elements is then assigned to several subnets in the runtime according to predefined optimisation philosophy. As such, a subnet is always guaranteed to be simulated by the only designated CPU core. Based on these definitions, to balance the computational burden among all the available CPU cores, whose memory allocation are maintained the same, it is required to quantify the computational burden of each CPU core responsible for all the allocated subnets. Herein, the computational burden is simply defined to be the number of equivalent traditional elements designated to the CPU core. To meet this quantification requirement, this paper proposes to equate a dynamic element to several virtual traditional elements, and the specific equivalence principle will be discussed later.
By the way, since the overall parallel simulation efficiency could significantly be affected by the amount of interactive data necessary to exchange between CPU cores, thus, a sensible network decomposition strategy should try to allocate communication-intensive elements into the same subnet as much as possible. Fortunately, the radial operating structure of an active distribution network creates such a good opportunity to divide all the constituent elements by feeders of which commination-less nature is fully exploited.
Accordingly, the layer relationship of the proposed multi-rate parallel simulation method is explained as follows.

Between elements and subnets:
Realise the decomposition of active distribution networks. The influence of element type on transient simulation and the amount of interactive data between cores will be considered in the optimised decomposition strategy. This is because the element type and interactive data can affect the parallel simulation rate by impacting on the computing time of each core and the data quantity at the interface, respectively. To balance the computational burden, the computational complexity of traditional elements and dynamic elements is organised into different subnets for simulation. 2. Between cores and subnets: Multi-rate parallel simulation is accomplished between subnets by means of adaptive step sizes for each corresponding core. According to the network connection architecture and the external electrical characteristics of constituent elements, subnet coordination models and element equivalent models are established, and any proceeding simulated voltage or current by each core is modelled as a virtual controlled source that is the interface data employed by other subnets for respective independent simulation. On the basis of network decomposition and data interaction between subnets, different step sizes series are specialised for corresponding core to the subnet encompassing various element types, and any specialised step size series is designed for adaptively coordinating subnet simulation. That is why the proposed simulation architecture is highlighted with the feature of multi-rate simulation. It should be accentuated that the step sizes for different subnet should alter in compliance with an "integer multiple" condition, more detailed explanations will be presented later.

AN OPTIMISED DECOMPOSITION STRATEGY OF ACTIVE DISTRIBUTION NETWORKS
From a simulation viewpoint, an active distribution network possesses many distinguishing characteristics as compared to high-voltage transmission networks and micro-grids. Most remarkably amongst them are: a) the active distribution networks are easy to decouple for simulation owing to their radial topology, which is different from high-voltage transmission networks; b) compared with the centralised access of renewable energy sources in high-voltage transmission networks, decentralised access of DGs with large scales in active distribution networks leads to a larger computation complexity; c) to achieve data coordination between several subnets, the node splitting method in active distribution networks is different from the transmission line decoupling method in transmission network simulations; and d) there is no necessary of decomposition for micro-grids due to their small scales and computation complexity.
In this section, the proposed decomposition strategy will be explicated, which is devised in terms of computational complexity of each CPU core and required interactive data through interfaces. Two principles of the decomposition, i.e. balancing the computational burden between CPU cores and meanwhile minimising the interactive data between them, are first explained to help in understanding the following exhaustive mathematical expressions.

3.1.1
Balancing the computational burden of each core Electrical elements, from type to type by definition, may have very different computational complexity in the transient simulation. Thus, the computational complexity of each type of electrical element should be an important factor to be considered into the decomposition strategy. In this regard, the first principle is employed that the total computational complexity of different electrical elements is evenly shared by the CPU cores.

Reducing the interactions between cores
For two naturally connected elements falling into two subnets, a data interaction interface is entailed to represent the natural electrical connection for the parallel simulation. Thus, an interaction interface is defined to transmit necessary interactive data between communication-intensive elements, e.g. directly connected. For this reason, the other principle is introduced that communication-intensive elements are grouped into the same core as much as possible. This principle is realised as an optimisation philosophy of minimising imperative interactive data between CPU cores. Since more than two elements could directly be connected, multi-port network models (see Section 4) are established for the separated elements into different CPU cores.

Mathematical expressions of optimising decomposition
During the simulation, the dynamic response speeds of different elements to fluctuations of voltage or current could vary significantly. For instance, the pulse width modulation (PWM) signals of power electronic elements response at a frequency magnitude of kilohertz to meet the rapid response of DC/AC inverters; we have introduced the PWM to clarify the switching function of the inverter controller. In contrast, the dynamic response speeds of motors, generators and other rotating equipment mainly depend on the moment of inertia, so their responses are usually slower than power electronic elements, but faster than circuits, traditional ZIP loads and other elements. Considering the influence of element type on dynamic response speed, different simulation steps are crucial for different types of dynamic element in the interest of simulation efficiency. Therefore, during the optimised decomposition strategy of an active distribution network, it is better for a CPU core to tackle the same type elements (see Figure 1). In order to achieve the optimal computational burden of each core and the balance of different cores, multiple parallel subnets are correspondingly formulated.
In the optimised decomposition strategy proposed in this paper, the number of CPU cores involved in the calculation of traditional elements and that of dynamic elements is determined as the following devised equation: where n CPU is the total number of CPU cores available in the calculation, n T CPU is the number of CPU cores allocated for the calculation of traditional elements, n D CPU is the number of CPU cores for the calculation of dynamic elements, c T denotes the computational complexity of traditional load elements, c D stands for the computational complexity of dynamic elements, N represents the average multiple of the computational complexity of a dynamic element with respect to that of a traditional element in the transient simulation and round( ) rounds to the nearest integer.
Then, the dynamic elements are classified according to their types [8]. If the simulation system contains q different types of dynamic elements, and the computational burden of dynamic elements k is c D k (k = 1, 2, … , q), then the computational burden of CPU cores involved in the calculation of each type of dynamic element is defined by where n D,k CPU is the number of CPU cores assigned for calculation of the dynamic element k, c D k represents the computational complexity of dynamic elements k and N k is the multiple relations between the computational complexity of a dynamic element k and that of a traditional element in the transient simulation.
Since the optimised decomposition strategy aims at balancing the computational burden [15] and reducing the interactions between cores [23], the objective function is established as where F T is the optimised objective function related to traditional elements, F D k is the optimised objective function pertinent to the kth dynamic elements, 1 and 2 are the weight coefficients of the optimal objective, which represents different effects of computational burdens and interactions between cores on decomposition results and I i is equal to the interaction number of the ith CPU core, which is determined by the naturally connection between elements simulated in the ith CPU core and other cores. For example, I i increases with dispersion degree of traditional elements on the same feeder among different cores. Note that the interaction amount requirement between a dynamic element and its connected traditional element is a given constant and does not exist among dynamic elements. The reason is obvious that dynamic elements are singleport elements connected to traditional elements and are weakly coupled to each other. c T i is the computational complexity of the ith CPU core used in the simulation of traditional elements, i = 1, 2, … , n T CPU , to be precise, c T i * quantifies the rated computational capability of the ith CPU core to deal with equivalent traditional elements, which relies on the instruction amount of the s-functions constituting simulation models. c D k, j and c D k, j * are the computational burden and reference optimal computational burden of the jth CPU core used in the calculation of dynamic element k, respectively, j = 1, 2, … , n D,k CPU . To choose proper weighted factors for the optimised decomposition strategy proposed in this paper, a sensitivity analysis of the weighted factors in (3) is indicated, as shown in Figure 2, where simulation times are obtained from a 2-s parallel transient simulation with a step size of 5 × 10 −5 s in both the large-scale benchmark system and the realistic system. By distributing the ratio between 1 and 2 , curves approximated to concave functions are gained, and then, the interval of log 10 ( 1 ∕ 2 ) is determined as [-0.067, 0.135]. So, 1 ∕ 2 is set to 1 for its optimal interval is [0.857, 1.365].
In terms of the complicated nonlinear feature of the above optimisation problem, a classical genetic algorithm (GA) is adopted to resolve the optimised decomposition strategy, where c T i and c D k, j are taken as the decision variables. Since the performance of an employed GA for doing the pre-processing only  (3) depends on the amount of element, network topology and hardware environment, it is irrelevant to run time measurement. In particular, this optimisation problem, since resolved before running the simulation, has no effect on simulation time. It is to say, when simulating an active distribution network with various simulation time and scenarios repeatedly under the condition of keeping the amount of element, the topology of networks and computational capabilities of CPUs the same or similar, the optimised decomposition strategy is executed only once, which is similar to the time consumed in building a parallel simulation model, and they should not be counted in the simulation time.

PARALLEL SIMULATION METHOD OF ACTIVE DISTRIBUTION NETWORKS
Based on the above optimised decomposition strategy, transient simulation of multiple elements will be coped with through the formulated subnets by different cores. Therefore, to ensure the simulation accuracy, we proposed a subnet coordination model to capture element connection mode, and also a power supply equivalent model accounting for element types, to help realise an effective data interaction for parallel simulation.

4.1
Interface equivalent model for data coordination Coordination model of multi-port networks A multi-port equivalent circuit of a network can be employed to realise data exchange of terminal voltage or currents between subnets in the parallel simulation. If the currents flowing into multiple subnet interfaces are equal, using current ports can be expected to greatly reduce the amount of interactive data. Similarly, if multiple subnet interfaces have the same voltage to FIGURE 3 Subnet coordination models considering connection mode ground, it takes for granted that using voltage ports can significantly reduce the amount of interactive data. In this paper, different subnet coordination models are adopted for different connection modes. According to the port type, the calculation results of other subnets are equivalents of controlled sources to subnets to be resolved, and then, the overall result of the parallel simulation is obtained.
As shown in Figure 3, a multi-port network composed of multiple feeders [24] can be replaced by a passive linear network N especially connecting with voltage sources or current sources at the equivalent ports. As can be noted that ports 1-k share a common feature of being in series with a voltage source, and each port k + 1 to port m can easily be recognised by a shunt current source, values of k and m depend on the number of the external port-linking networks with whether concerned injected currents or terminal voltage, respectively. As per the superposition theorem, it holds that where all the bold symbols signify column-wise vectors; specifically, In an active distribution network, multiple feeders are generally connected to a common bus in parallel. Therefore, in order to reduce the interactive data, a multi-port network model with k = 0 is leveraged to model multiple feeders.

Equivalent model of element interfaces
In the transient simulation model of an active distribution network, feeders can be expressed as multiple single-phase impedances in series, while loads are usually equivalent to constant power models. The constant power control mode is generally adopted by the inverter of DG, whose structure is doubleloop. A two-layer PI controller can ensure the dynamic response speed of the DG by the application of the outer power loop and the inner current loop. It can be seen that the external charac- teristics of different types of elements are quite different, so the computational burden of traditional elements and dynamic elements on the same feeder is proposed to be allocated into different cores in simulation. In this paper, different power equivalent models are adopted for the elements with different external characteristics, and the simulation voltage or currents of other elements are equivalent to controlled power sources connected to the element to be simulated, so as to obtain the complete simulation results of each feeder. With the latest voltage and current results released from the last simulation step, a parallel subnet needs to know the boundary condition to proceed the next-step simulation. So, the latest simulation results can be regarded as interaction data to help establish the boundary conditions. We use an aforementioned multi-port network model with all the ports defined to be the CS ports as an example, to illustrate the philosophy of simulation data interaction between the equivalent multi-port network and the external connected elements with it. As shown in Figure 4(a), decoupling points are introduced to define the boundaries in the form of network ports; a dynamic element with the latest current result plugged into the multi-port network at port i (i = 1, 2, … , k)will be regarded as an ideal current source I S,i in accordance with the substitution theorem, so that the interaction I i is equal to I S,i . On the other hand, in Figure 4(b), with respect to lumped external elements, the same multi-port network can be regarded as an ideal voltage source U S,i , which is essentially U OP i , in turn, taken as the interaction voltage with the external element.

Parallel simulation method of multiple rates and variable step sizes
This section emphasises a proposed adaptive multi-rate parallel simulation method with variable step sizes to accelerate transient response without significantly increasing the computational burden, meanwhile, to ensure the transient simulation precision of each subnet.

Variable step size technology in subnet
In order to adjust the step size of each subnet, the PI controller on the basis of traditional extrapolation variable step size technology is introduced to reduce the truncation error in every step size, so as to make the system rapidly converge to the appropriate step size.
The variable step size technology adopted reduces the step size in big truncation error simulation and increases the step size in small truncation error simulation.
For the p-order algorithm, the method for adjusting the simulation step size is as follows: where h (k+1) is the step size initialised by the kth adjustment based on h (k) , e.g. the initial step size is h (0) , Tol represents the permissible tolerance, (k) is the estimated value of truncation error and is the safety margin coefficient, which value is set to be 0.8 in this paper. Let t (k) 0 denote the initial time of the kth step; then, the kth estimated truncation error based on extrapolation can be calculated by where X t (k) 0 is the state of a subnet at the initial time of the kth step; u and vare both natural numbers to satisfy uh (k−1) = vh (k) ; in other words, u and vrepresent the least common multiples of h (k−1) and h (k) . X t (k) 0 +uh (k−1) and X t (k) 0 +vh (k) stand for two states of the subnet at the same instant, but simulated with different step sizes of h (k−1) and h (k) , respectively; F T +h (k) (X T , h (k) ) is a state transition function, which quantifies the state increment in second from the moment T to T + h (k) .
Furthermore, a PI controller [23] is introduced to enhance the robustness of step size adjustment method, and (5) can be modified into where K P is the proportional coefficient and K I is the integral coefficient.
In order to ensure the integral multiple relationships between step sizes of two subnets, we can adjust the step sizes of an arbitrary subnet while leaving the other's unchanged. For instance, given the step sizes of subnet 2 chosen to keep pace with those of subnet 1, for any kth step, it should follow that

4.2.2
Multi-rate parallel simulation between subnets Based on (5)-(8), the multi-rate parallel simulation is proposed as follows, which includes a schema of the integration technique between the variable step sizes and the multi-rate parallel simulation to guarantee the simulation precision. For easy representation, simulated states of two parallel subnets will be differentiated by notations of x and y, respectively, and the initial moment of the kth simulation step is denoted by t k , and x k and y k stand for the simulated subnet states at moment t k .
Without loss of generality, as shown in Figure 5, starting with the assumption of h and present calendar moment t k , the execution steps of multi-rate parallel simulation between exemplary subnet 1 and subnet 2 read as follows.
Step 1: Initialise step sizes of each subnet at t k , then perform Steps 2 and 3 in parallel, after that, continue to Step 4.
Step 2: Subnet 1 transfers its states x k−1 and x k to subnet 2.
Step 3: Subnet 2 transfers its state y k to subnet 1. In the time interval of [t k , t k + h (k) 1 ], Steps 4 and 5 are performed in parallel.
Step 5: First, [t k , t k + h (k) 1 ] is cut into m equal intervals. Then, as shown by (10), an interpolation algorithm is adopted to predict intermediate states of x k,1 , x k,2 , … , x k,m for each interval based on x k−1 and x k by (11). On top of that, m steps simulations can be performed in subnet 2 Step 6: Proceed into t k+1 , repeat the above Steps 2-5 until the end of the simulation. Particularly, the generic expression of h must be satisfied in every step size to fulfil the requirement of the integration technique between the variable step sizes and the so-called multi-rate parallel simulation.

Simulation background
The following simulation experiments are conducted on hypothetical eight feeders connecting to a common 10-kV bus, as shown in Figure 6, which contains 336 traditional elements, 21 diesel generators and 13 DGs. A variety of dynamic ele-ments are added to each feeder on the basis of the IEEE 33-bus system and the IEEE 69-bus system; the topological parameters are listed in Table 1. To validate the proposed simulation method, the adopted simulation models of long-term dynamic/electromechanical transient/electromagnetic dynamic elements are restricted to traditional loads and impedance [25]/synchronous generators [26]/DGs [27], respectively. Note that DGs, such as PVs, are modelled as inverter grid-connected power supplies, where the generator is simplified as a DC power supply and then connected to AC distribution networks through a DC/AC inverter. Because a double-loop structure is selected for the constant power control of the inverter, the effect of controller and accuracy of simulation are ensured, so the above model of DGs can be adopted by the parallel simulation. In the cases, the DGs each with nominal power 100 kW are connected to feeder s1-6, and the synchronous generators each with rated power 50 kW are connected to feeders 7 and 8. Simulink is adopted for its convenience on multi-rate parallel simulation based on the abundant toolbox and powergui block. Common settings of the simulation experiments include: 2 s of total simulation time, a fault is supposed to exist from 0.5 to 1 s on node 1 of L3. On that basis, special experimental cases are designed as follows.
Case 1-1: Serial simulation mode is performed with a fixed step size of 5 × 10 −5 s in Simulink.
Case 1-2: Parallel simulation with eight CPU cores is performed with the same step size of 5 × 10 −5 s. An artificial decomposition strategy is supposed to balance the computational burden of each core. Case 1-3: Parallel simulation with eight CPU cores is performed with the same step size of 5 × 10 −5 svia the optimised decomposition strategy proposed in this paper; pre-determined parameters for the optimised decomposition strategy are listed in Table 2. Among them, N 1 and N 2 are determined by the multiple relations between the computational complexity of a synchronous generator or a DG and that of a traditional element in the transient simulation, whose differences are determined by the complexity of solving their simulation models.  Case 1-4: Parallel simulation with eight CPU cores is performed with the same decomposition strategy as Case 1-3. The step sizes of CPU #1 and #2 vary from 5 × 10 −5 s to 5 × 10 −3 s, and those of CPU #3 to #6 vary from 5 × 10 −5 s to 5 × 10 −4 s, and those of CPU #7 and Core #8 are all 5 × 10 −5 s.
The decomposition results of the above cases are shown in Table 3, where CPUs #1 and #2 are used to simulate traditional elements, CPUs #3 to #6 are used for synchronous generators and CPUs #7 and #8 are for DGs. The optimal decomposition results of Cases 1-3 and 1-4 are shown in Figure 7.

Analysis of speed indexes
Single phase to earth fault is assumed for this test, and the simulation speeds of these cases are compared with commonly defined indexes of where £ c is the balance rateof the computing time of subnets, it belongs to [0, 1] and the optimal value is 100%; i is the speedup ratio, and the bigger the better; T 1 , T 2 , … , T n CPU are the com-  puting time of CPU cores, respectively; T ∞ signifies the sum of computing time of each core in parallel simulation; T t is the communication time of CPU cores and is negligible. As shown in Table 4, computing time, balance rate and speedup ratio in different cases can be compared. If the computational ability of a single CPU core satisfies the simulation requirement by the computational burden, i.e., the number of instructions in element simulation models that need to be processed in a single-core per second does not exceed the maximum instruction throughput per second of this type of CPU core, the multiple of simulation speed improvement by parallel simulation will not be more than the amount of CPU. It is shown that the computational burden of Case 1-1 goes much beyond the computational ability of a single CPU core; the simulation time increases exponentially in traditional serial simulation, which is defined as "timeout"; meanwhile, the simulation time of Case 1-2 is rapidly reduced than Case 1-1. In these scenario, where the computational ability of a single CPU core cannot satisfy the simulation requirement by the computational burden, the results prove that the parallel simulation method can significantly speedup the simulation, as compared with the traditional serial simulation.
It should be noted in advance that solving the optimised decomposition strategy using GA consumes 22.46 s, which is greater than the computing time in Cases 1-2-1-4. However, the above process has been executed before running the simulation. Hence, simulation time of Cases 1-3 and 1-4 can be counted independently from the pre-processing time of the simulation. On this basis, the computing time of Case 1-3 is 88.46% of Case 1-2, the balance rate of Case 1-3 is 141.62% of Case 1-2 and the speedup ratio of Case 1-3 is 116.30% of Case 1-2; hence, it is proved that the proposed optimised decomposition strategy can further optimise the parallel simulation compared with artificially balancing the computational burden of each core. The computing time of Case 1-4 is 95.57% of Case 1-3, the balance rate of Case 1-4 is 98.09% of Case 1-3 and the speedup ratio of Case 1-4 is 101.21% of Case 1-3, which thus proves that the multi-rate parallel simulation method with adaptive variable step sizes can effectively reduce the computing time and increase the speedup ratio on the basis of maintaining the balance rate. Moreover, the memory occupancy of CPUs is analysed, as shown in Figure 8, where the radar chart and box chart show the superiority of Case 1-4 in balancing every CPU's load.
In conclusion, the proposed parallel simulation strategy can balance the computational burden of subnets and accelerate the transient simulation of the active distribution network. The precision of Case 1-1 is the highest for its series simulation mode, whose calculation results can be taken as true values [19], [22]. Therefore, differences between the results of Cases 1-2-1-4 and Case 1-1 are defined as the simulation errors, and ratios between the average of simulation errors and measurement values are defined as percentages of simulation errors.  Table 5, the average voltage error and current error of Case 1-3 are 0.093% and 0.13%, respectively, and the average voltage error and current error of Case 1-4 are 0.12% and 0.18%, respectively. Although the simulation errors of Case 1-4 are larger than those of Case 1-3 due to the truncation error step size control and interpolation method, they can still be limited within  Figure 10. In small transient durations after the start time (0.5 s) and the end time (1 s) of the fault, simulation errors get larger than those in other steady processes. As shown in Table 6, the average voltage error and current error of Case 1-3 are 0.30% and 0.46%, respectively, and the average voltage error and current error of Case 1-4 are 0.39% and 0.51%, respectively. Although the simulation errors of Case 1-4 are larger than those of Case 1-3 due to the truncation error step size control and interpolation method, they can  Topology of a 10-kV realistic distribution network still be limited within 0.51%. Therefore, it suggests that the proposed parallel simulation method can improve the computing speed while ensuring the accuracy under the most severe faults. Based on the above simulation results, simulation errors are generated by the optimised decomposition strategy and the multi-rate simulation technology with adaptive variable step sizes in the parallel simulation, but the average errors can be limited to less than 0.51% in all fault scenarios.

Simulation background
To further verify the superiority of the proposed multi-rate parallel transient simulation technique in engineering practice, the following case studies are organised based on a realistic distribution network in Xiamen, Fujian Province, China. As shown in Figure 11, 13 feeders from Baisha 110/10-kV substation are taken with specifications listed in Table 7 Table 8, and those of Cases 2-3 and 2-4 are yet shown in Figure 12. The other parameters of Cases 2-1-2-4 are kept in line with Cases 1-1-1-4. CPUs #1-#4 are used to simulate traditional elements with the step size varying from 5 × 10 −5 s to 5 × 10 −3 s, CPUs #5-#7 are budgeted for diesel generators with the step size varying from 5 × 10 −5 s to 5 × 10 −4 s and CPU #8 is for DGs with the step size 5 × 10 −5 s.

Analysis of speed indexes
As shown in Table 9, single-phase earthing fault is simulated to prove the speed superiority of the proposed method applied to realistic networks. The computing time of Case 2-2 is much less than that of Case 2-1, proving that the parallel simulation method can significantly improve the simulation speed compared with traditional serial simulation. Without considering the offline execution time, the computing time, balance rate and speedup ratio of Case 2-3 are 89.00%, 144.71% and 115.58% of Case 2-2, respectively, which hence proves that the proposed optimised decomposition strategy can further optimise the parallel simulation in realistic networks. The computing time, balance rate and speedup ratio of Case 2-4 are 97.46%, 100.67% and 103.49% of Case 2-3, respectively, which thus proves that the multi-rate parallel simulation method with adaptive variable step sizes can effectively improve the characteristics of simulation speed. Moreover, the memory occupancy of CPUs is analysed as shown in Figure 13, where the radar chart and the box

Analysis of precision indexes
Based on the same preconditions as benchmark network tests, a multi-fault realistic test is conducted as follows. The simulation starts from operating the distribution network under normal states; then, a single-phase earthing fault occurs at 0.5 s and sustains at 0.5 s. After a sequel time span from 1.0 to 1.5 s, a three-phase short-circuit fault is hypothesised to happen and last until the end of simulation. As shown in Figure 14, the voltage and current errors of the fault point increase rapidly at the moment of state transition, but they can also be cut down by the proposed parallel simulation method owing to multiple rates and variable step sizes. As shown in Table 10, the average voltage error and current error of Case 2-3 are 0.22% and 0.27%, respectively, and the average voltage error and current error of Case 2-4 are 0.27% and  0.29%, respectively. Although the simulation errors of Case 2-4 are larger than those of Case 2-3 due to the truncation error step size control and interpolation method, they can still be limited within 0.29%. As a consequence, the proposed parallel simulation method can improve the computing speed while ensuring the accuracy under similar-sized distribution networks to the employed realistic network in Xiamen.

CONCLUSION
This paper proposes a network-decomposition-based multi-rate parallel transient simulation technique for active distribution networks. As validated by the experiments, we have the following conclusion.
1. The optimised decomposition strategy can improve the speed of parallel simulation and greatly improve indexes such as the computing time, balance rate and speedup ratio, owing to accounting for the influence of element type on transient simulation. 2. The speed of parallel simulation is improved due to the interface equivalent model and the multi-rate simulation technology with adaptive variable step sizes. The interpolation method is used to reduce errors on the interface of multi-rate parallel simulation, and the truncation error controlling strategy is used to ensure the accuracy of the simulation by adjusting step sizes adaptively.