An improved resource allocation method for mapping service function chains based on A3C

Network function virtualization (NFV) technology deploys network functions as software on a general-purpose hardware platform and provides customised network services in the form of service function chains (SFCs), which improves the flexibility and scalability of network services and reduces their cost. However, irrational resource allocation during service function chain mapping causes problems such as low resource utilisation, long service request processing time and a low mapping rate. To address unreasonable resource allocation in service mapping, we propose an improved service function chain mapping resource allocation method (SA3C) based on the asynchronous advantage actor-critic (A3C) algorithm. Firstly, the SFC mapping model is established, the SFC mapping process is divided into two layers of physical


Introduction
With the rapid development of communication network technology, the number of network users is increasing year by year, and the network functions and services required by users are becoming more and more diversified. The strong coupling between network functions and dedicated hardware in traditional network architectures cannot provide users with flexible and customisable network services. Network functions virtualization (NFV) [1] transforms network functions into virtualized network functions (VNFs) [2] and deploys them on general-purpose hardware platforms. It provides users with customized network services in the form of service function chains (SFCs) [3], offers a new architecture for processing network function services, and improves the flexibility and scalability of network services. In NFV, an SFC is the logical chain that completes the processing of a network service: network traffic is directed over the network links to the corresponding network function nodes in sequence, following the order of the VNFs in the user-issued SFC request (SFCR) [4]. SFC mapping is the key process in handling an SFCR [5]. It maps the VNFs in the SFCR to physical nodes and links that can provide the corresponding function instances, supplies the computing, operation and communication resources that each service request needs from those nodes and links, and makes the traffic pass through the corresponding network functions in order.
In the SFC mapping process, the main network resources involved are node computing resources and link bandwidth communication resources, and the weight with which the two resources are allocated to an SFCR fundamentally affects the response and processing efficiency of that SFCR. Therefore, this paper formulates the SFC resource allocation problem over these two key resources. The SFC resource allocation problem is an instance of the SFC mapping problem, which is known to be NP-hard [6] and is therefore computationally expensive to solve efficiently. According to the mapping environment, the SFC resource allocation problem can be classified into resource allocation for static scenarios and for dynamic scenarios [7]. Static scenarios are offline scenarios in which all SFC requests are known before mapping, so solving the static problem usually requires an overall optimal solution for all SFC requests [8]. Moreover, the network functions within an SFC are interdependent (the requested functions are sequential), so SFC mapping is not a single resource allocation but must also consider the relationships among the individual resources to be allocated. Dynamic scenarios are online scenarios in which service requests arrive dynamically and randomly, and the data volume, service requirements, service priority and other attributes carried by the service requests are also uncertain. Therefore, solving the resource allocation problem for dynamic scenarios usually requires finding the optimal resource allocation scheme for each arriving SFC based on the current network state and the remaining resources [9].
A reasonable SFC mapping resource allocation strategy can improve network resource utilisation and speed up network and service processing, so sound and effective SFC resource allocation decisions have become a key means of ensuring the efficient operation of network services [10]. Since SFC requests tend to arrive unpredictably and randomly in real network scenarios, research has gradually shifted from static scenarios to SFC resource allocation in dynamic scenarios [11]. Meanwhile, the rapid development of deep reinforcement learning (DRL) has had a great impact on SFC applications: its combination of the perceptual ability of deep learning and the decision-making ability of reinforcement learning can greatly improve SFC resource allocation, so introducing DRL into the SFC resource allocation problem is of considerable research significance. However, most existing research on SFC resource allocation [12] focuses on allocating either node computational resources or link bandwidth communication resources alone, which tends to leave the other resource under-utilised and the allocation unbalanced; in addition, the complexity of SFC mapping actions leads to slow convergence.
In this paper, the goal is to allocate service function chain mapping resources more rationally: by rationally allocating node computing resources and link bandwidth communication resources to each SFC, we aim to improve resource utilisation, enhance service efficiency and reduce the total service processing time. The main contributions of this paper are as follows:
(1) We analyse the correlation between resource allocation and service processing efficiency during SFC mapping and express it in formulas, which provides theoretical support for formulating a more reasonable allocation of node computational resources and link bandwidth communication resources. We also model the resource allocation of the SFC mapping process as a Markov decision process and introduce the asynchronous training of the A3C method to accelerate convergence, which effectively improves the resource utilisation rate and the SFC mapping rate and reduces the total processing time of SFCs. In parallel training, we treat each SFC request as an agent trained in a sub-network to find the optimal resource allocation strategy, and synchronise it with the main network parameters by means of differential updating.
(2) We compare SA3C with the Actor-Critic (AC) and Policy Gradient (PG) algorithms and find that, relative to AC, it improves resource utilisation by 9.85%, reduces the total processing time by 10.72% and improves the mapping rate by 6.72%, with larger gains over PG. These results indicate that resource utilisation and service efficiency can be effectively improved by rational allocation of node and link resources.
The rest of the paper is organised as follows: Section 2 provides an overview of related work by other researchers; Section 3 describes the network architecture, optimisation objective and formulation of the SFC mapping; Section 4 describes in detail the algorithmic model and update steps of SA3C; Section 5 presents simulation experiments and the performance evaluation; and Section 6 concludes and discusses future work.

SFC, and uses these predictions to estimate the future QoS and formulate SFC reconfiguration strategies, which effectively reduces the average delay of the SFC. However, the study sets a small learning rate, and the convergence time of the DQN model is long. Tang et al. [16] proposed a multi-timescale online adaptive virtual resource allocation algorithm based on the autoregressive moving average (ARMA) prediction method, which allocates cache space and time-frequency resources in the network to meet the quality of service (QoS) requirements of each virtual network in terms of overflow probability while minimising the network overhead, and addresses the irrational allocations that traditional methods make under traffic uncertainty and delayed information feedback. Zhang et al. [17] proposed a deep learning-based predictive model for dynamic SFC deployment that predicts the future resource requirements of each VNF instance with the goal of minimising the overall end-to-end delay, and transforms the dynamic deployment problem into an integer nonlinear programming (INLP) problem. However, the above methods, which predict future service requests, rely heavily on the training dataset. When the training dataset differs from or even conflicts with the service requests that arrive dynamically in the real network, the predictions deviate or even point in the opposite direction, which leads to worse resource utilisation and quality of service. Unfortunately, requests for dynamic network services are often real-time and random.
Li et al. [18] proposed an online coordinated resource allocation method for NFV, which completes SFC resource allocation in an online coordinated way by combining parallel multi-agent deep reinforcement learning with graph convolutional neural networks and reinforcement learning training techniques, using CPU, memory, bandwidth, etc. as resources. Zhang et al. [19] considered the different characteristics of nodes and resource loads and proposed a federated learning-based SFC resource allocation algorithm that effectively balances resource consumption and allows SFC reconfiguration to reduce the service blocking rate. Liu et al. [20] proposed a multi-objective optimal service function chain mapping method based on reinforcement learning (RL), which optimises the resource allocation model, uses the information matrix extracted from the physical network as the training environment for the agents, obtains the mapping probability of each physical node, and completes the SFC mapping based on that probability.
In summary, SFC resource allocation can save network resources and improve service quality, and has become a research hotspot in the field of SFC mapping. With its capacity for continuous learning from data, self-optimisation and improvement, machine learning has produced a large body of results on the SFC resource allocation problem, which shows that it is feasible to optimise the SFC resource allocation strategy with DRL methods. Compared with existing research, we improve on A3C by considering the dynamic arrival of SFC requests and the complexity of resource allocation actions when applying DRL, and the experiments show improvements in resource utilisation, total processing time and mapping rate.

System model and problem description

3.1 System model
In this paper, only the mapping between the set of virtual network function requests and the physical topology resource layer is considered when describing the service function chain mapping resource allocation, and the SFC mapping system model is shown in Fig. 1.
As shown in Fig. 1, the physical topology resource layer can be represented by an undirected graph $G = \{N, L\}$, where $N = \{n_1, n_2, \ldots, n_m\}$ denotes the set of physical service nodes, consisting of general-purpose high-performance servers, and $L = \{l_{i,j} = (n_i, n_j) \mid i, j \le m\}$ denotes the set of physical links. Multiple VNF instances may be deployed in each physical service node to provide network services for users, and different VNF instances fulfil different network service requirements. The set of VNF instances deployed in physical service node $n_i$ is denoted by $VNFIs_i = \{vnf_1, vnf_2, \ldots\}$, and the available computational resources of $n_i$ are denoted by $c_i$. The physical link bandwidth resource between physical service nodes $n_i$ and $n_j$ is denoted by $B_{i,j}$; $B_{i,j} = 0$ means either that the available link bandwidth is 0 or that no physical link exists between the two nodes.

Fig. 1 SFC mapping system model (layers: set of virtual network function requests; physical topology resource layer)

The set of virtual network function requests is denoted by $SFCs = \{f_1, f_2, \ldots\}$. Each request $f$ specifies a sequence of virtual network function requests and the logical links of the SFC mapping; $E_f = \{e_1, e_2, \ldots, e_i\}$ denotes the set of virtual links, where $e_i = \{v_i, v_j\}$ denotes the virtual link between the virtual network functions $v_i$ and $v_j$ and indicates the direction in which the data or traffic flow passes; $Data_f$ denotes the amount of data to be transmitted by $f$; and $P_f = \{p_1, p_2, \ldots, p_5\}$ denotes the processing priority of $f$, with $p_1$ the lowest and $p_5$ the highest priority.
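The model above can be held in a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `PhysicalNode` and `Topology`, the field names, and the VNF type strings are all hypothetical and chosen only for illustration. It shows the graph $G$, the per-node VNF instance sets, and the convention that a missing link is treated like $B_{i,j} = 0$.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalNode:
    """A physical service node n_i (names here are illustrative assumptions)."""
    compute: float                                   # available compute c_i
    vnf_instances: set = field(default_factory=set)  # deployed VNF instance types


@dataclass
class Topology:
    """Undirected graph G = {N, L} with per-link bandwidth B_{i,j}."""
    nodes: dict                                    # node id -> PhysicalNode
    bandwidth: dict = field(default_factory=dict)  # frozenset({i, j}) -> B_{i,j}

    def link_bandwidth(self, i, j):
        # A missing entry is treated like B_{i,j} = 0: no usable link.
        return self.bandwidth.get(frozenset((i, j)), 0.0)

    def candidate_nodes(self, vnf_type, demand):
        # Nodes that host the requested VNF type and still have enough compute.
        return sorted(
            node_id
            for node_id, node in self.nodes.items()
            if vnf_type in node.vnf_instances and node.compute >= demand
        )


topo = Topology(
    nodes={
        1: PhysicalNode(compute=150.0, vnf_instances={"firewall", "nat"}),
        2: PhysicalNode(compute=120.0, vnf_instances={"nat"}),
        3: PhysicalNode(compute=40.0, vnf_instances={"firewall"}),
    },
    bandwidth={frozenset((1, 2)): 100.0, frozenset((2, 3)): 180.0},
)
```

Using a `frozenset` as the link key makes the lookup symmetric, which matches the undirected graph in the model.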

Description of the problem
In this paper, the processing of service requests is divided into time slots, denoted by the set $T = \{1, 2, \ldots, t\}$, and one time slot $t$ is the mapping processing cycle of the set of SFC requests. The node mapping state in time slot $t$ is denoted by $c^{f,i}_j(t) \in \{0, 1\}$: $c^{f,i}_j(t) = 1$ indicates that the $i$th virtual network function in the $f$th SFC is successfully mapped to the $j$th physical service node in the physical topology, and $c^{f,i}_j(t) = 0$ indicates that it is not. When all the virtual network functions in the $f$th SFC request have been mapped, their mapping states form a binary matrix. The link mapping state is denoted by $b^{f,e}_{p,q}(t)$: $b^{f,e}_{p,q}(t) = 1$ means that the $e$th virtual link in the $f$th SFC is successfully mapped to the physical link between the physical service nodes $n_p$ and $n_q$, and $b^{f,e}_{p,q}(t) = 0$ means that the mapping failed. After all the virtual links in the $f$th SFC request have been mapped, the binary matrix formed by the mapping of its virtual links can be represented by $B(t) = [b^{f,e}_{p,q}(t)]$.

During the mapping process, the processing rate of the $f$th SFC request in a physical service node at time slot $t$ is proportional to the computational resources allocated to it by that node, which can be described by Eq. (1), where $\phi$ is the transformation factor with $\phi > 1$, and $c^f_c(t)$ is the computational resource allocated to the request by the physical service node to which it is successfully mapped at time slot $t$. The time spent by the $f$th SFC request at that physical node can then be expressed by Eq. (2), and the total node processing time of the $f$th SFC request by Eq. (3).

Similarly, the processing rate of the $f$th SFC request on a physical link at time slot $t$ is proportional to the bandwidth resource allocated to it on that link, which can be described by Eq. (4), where $\eta$ is the transformation factor with $\eta > 1$, and $b^f_e(t)$ is the bandwidth resource allocated to the request by the physical link to which it is successfully mapped at time slot $t$, with $b^f_e(t) \le B_{p,q}$. The time that the $f$th SFC request stays on that physical link can be expressed by Eq. (5), and the total link processing time of the $f$th SFC request by Eq. (6). The total processing time of the $f$th SFC request is therefore the sum of its total node processing time and its total link processing time, which can be expressed by Eq. (7).
In the mapping process, the priority $p_i$ also affects the order of mapping. If the priority of an SFC is not set, the default is $p_3$. The higher the priority, the greater the proportion of computing resources and link bandwidth resources allocated, and the shorter the total processing time for the same data volume.
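The relationship between allocation and processing time can be sketched numerically. The code below is an illustration under stated assumptions, not the paper's Eqs. (1) to (7): it assumes the rate is exactly the transformation factor times the allocated resource, and that the time at a node or link is the request's data volume divided by that rate. The function names and the sample values are hypothetical.

```python
def node_time(data, compute_alloc, phi):
    """Time at one node, ASSUMING rate = phi * compute_alloc."""
    if compute_alloc <= 0:
        raise ValueError("compute allocation must be positive")
    return data / (phi * compute_alloc)


def link_time(data, bandwidth_alloc, eta):
    """Time on one link, ASSUMING rate = eta * bandwidth_alloc."""
    if bandwidth_alloc <= 0:
        raise ValueError("bandwidth allocation must be positive")
    return data / (eta * bandwidth_alloc)


def total_processing_time(data, compute_allocs, bandwidth_allocs, phi, eta):
    """Sum of node and link times along the chain, as the text describes."""
    t_nodes = sum(node_time(data, c, phi) for c in compute_allocs)
    t_links = sum(link_time(data, b, eta) for b in bandwidth_allocs)
    return t_nodes + t_links


# Two VNFs (two node allocations) joined by one virtual link.
baseline = total_processing_time(10.0, [5.0, 5.0], [10.0], phi=2.0, eta=2.0)
# A higher-priority request receiving twice the resources finishes sooner.
boosted = total_processing_time(10.0, [10.0, 10.0], [20.0], phi=2.0, eta=2.0)
```

Under these assumptions, doubling every allocation halves the total time, which is consistent with the statement that higher-priority requests see shorter processing times for the same data volume.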
In summary, the SFC resource allocation problem in this paper is formulated as a processing-time minimisation model for the joint allocation of node computing resources and link bandwidth resources, subject to constraints C1 to C3. The optimisation objective is to minimise the total processing time of the set of virtual network function requests. C1 requires that each VNF be mapped to a physical service node, where $I$ is a matrix of all 1's; C2 requires that the computational resources allocated to a VNF mapped to a physical service node stay within the available resource capacity of that node; C3 requires that the bandwidth resources allocated to a virtual link mapped to a physical link stay within the available bandwidth capacity of that link.
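A feasibility check for the three constraints can be sketched as follows. This is a hedged illustration: the function name `is_feasible` and the argument layout are hypothetical, and C1 is read here as "each VNF is mapped to exactly one node", which is an assumption about the lost formula.

```python
def is_feasible(node_map, compute_alloc, compute_cap, bandwidth_alloc, bandwidth_cap):
    """Check a candidate mapping against constraints C1 to C3.

    node_map[i][j]  -- 1 if VNF i is mapped to physical node j, else 0
    compute_alloc   -- compute allocated to each VNF i
    compute_cap     -- available compute of each physical node j
    bandwidth_alloc -- dict: physical link -> bandwidth allocated on it
    bandwidth_cap   -- dict: physical link -> available bandwidth
    """
    # C1 (assumed form): every VNF row selects exactly one physical node.
    if any(sum(row) != 1 for row in node_map):
        return False
    # C2: compute allocated on each node must not exceed its capacity.
    for j, cap in enumerate(compute_cap):
        used = sum(row[j] * alloc for row, alloc in zip(node_map, compute_alloc))
        if used > cap:
            return False
    # C3: bandwidth allocated on each link must not exceed its capacity.
    for link, used in bandwidth_alloc.items():
        if used > bandwidth_cap.get(link, 0.0):
            return False
    return True


ok = is_feasible([[1, 0], [0, 1]], [30.0, 40.0], [100.0, 50.0],
                 {(0, 1): 20.0}, {(0, 1): 25.0})
too_much_compute = is_feasible([[1, 0], [0, 1]], [30.0, 60.0], [100.0, 50.0],
                               {(0, 1): 20.0}, {(0, 1): 25.0})
```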

Mapping strategy

4.1 Mapping problem transformation
The research objective of this paper is to improve resource utilisation during network service, enhance service efficiency and reduce the total service processing time. To this end, the physical service node computational resources and link bandwidth communication resources allocated when processing SFCs are regulated so as to influence the computation and transmission time of the SFCs and thereby improve service processing efficiency. The state of the mapping environment in the resource allocation process described in Section 3.2 changes dynamically and can be modelled as a Markov decision process (MDP). This MDP can be defined as a triple $M = \langle S, A, R \rangle$, where $S$ is a finite state space, $A$ is a finite action space, and $R$ is a reward space.
The state space $S$ is composed of the system states of the individual SFCs mapped to the physical topology at time slot $t$. The system state at time slot $t$ consists of the elements $s_i(t) = (s^N_t, s^{f_i}_t)$, where $s^N_t$ denotes the physical topology resource information at time slot $t$ and $s^{f_i}_t$ denotes the characteristics of the $i$th SFC request. The action space $A$ describes how the SFC allocates each type of resource in the current system state at mapping time; the action at time slot $t$ consists of $a_c(t)$, which denotes how the computational resources of the physical nodes are allocated, and $a_b(t)$, which denotes how the bandwidth resources of the physical links are allocated. When an action $a_t$ is taken at time slot $t$ in state $s_t$, the system enters the next state $s_{t+1}$ and obtains an immediate reward $r_t$; the immediate rewards of all time slots constitute the reward space $R$. Since the optimisation objective of this paper is to minimise the SFC processing time, the reward function is set to the negative of the processing time.
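The MDP can be illustrated with a deliberately tiny environment. The sketch below is an assumption-laden toy, not the paper's environment: it collapses the topology to one node and one link, treats the action as the share of free compute and bandwidth to hand to the request, and reuses the assumed linear rate model (time = data / (factor × allocation)). All names (`State`, `Action`, `step`) are hypothetical.

```python
from typing import NamedTuple


class State(NamedTuple):
    free_compute: float    # stands in for s^N_t (node side)
    free_bandwidth: float  # stands in for s^N_t (link side)
    data: float            # stands in for the request features s^{f_i}_t


class Action(NamedTuple):
    compute_share: float    # a_c(t): share of free compute to allocate
    bandwidth_share: float  # a_b(t): share of free bandwidth to allocate


def step(state, action, phi=2.0, eta=2.0):
    """Apply an action; return (next_state, reward = -processing_time)."""
    if not (0.0 < action.compute_share <= 1.0 and 0.0 < action.bandwidth_share <= 1.0):
        raise ValueError("shares must lie in (0, 1]")
    compute = action.compute_share * state.free_compute
    bandwidth = action.bandwidth_share * state.free_bandwidth
    # Assumed time model: data volume divided by (factor * allocation).
    processing_time = state.data / (phi * compute) + state.data / (eta * bandwidth)
    next_state = State(state.free_compute - compute,
                       state.free_bandwidth - bandwidth,
                       state.data)
    return next_state, -processing_time


s0 = State(free_compute=100.0, free_bandwidth=100.0, data=10.0)
s1, r0 = step(s0, Action(compute_share=0.5, bandwidth_share=0.25))
```

Because the reward is the negative processing time, an agent that maximises return is pushed towards allocations that shorten processing, while the shrinking free resources in the next state capture the cost of generous allocations.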

SA3C Description
Since SFC arrivals are dynamic, the emphasis of mapping resource allocation needs to be adjusted automatically according to SFC demands, and the mapping actions of an SFC are continuous. Therefore, this paper introduces the A3C method to optimise the joint allocation strategy of node computing resources and link bandwidth resources for SFCs, and proposes SA3C, an improved SFC mapping resource allocation algorithm based on A3C. The SA3C method uses the idea of asynchronous training; using multi-threading technology, it can be executed

Fig. 2 SA3C Resource Allocation Model
As shown in Fig. 2, this paper treats a single SFC as a sub-network agent that learns by interacting with the underlying physical network topology. The set of sub-networks is denoted by $M$. The parameter vector $\theta^{\mu}$ is used to generate the action selection policy function $\pi(s)$, and $\theta^{Q}$ is used to generate the state-value function $V(s)$. Therefore, both the main network and the sub-networks of SA3C maintain a stochastic action policy $\pi(s_t, a_t \mid \theta^{\mu})$ and a state-value function $V(s_t \mid \theta^{Q})$. In SA3C, it is first necessary to define the state-value function of policy $\pi(s)$, which is essentially the expected cumulative discounted reward and can be expressed as Eq. (12), where $\gamma \in (0, 1)$ is the discount factor, representing the degree of influence of future decisions on the current state. Secondly, the action-value function needs to be defined to evaluate the value of the current action relative to the average value, which can be described by Eq. (13).
In addition, an advantage function is introduced in SA3C to represent the gain due to an action: the baseline value of the corresponding state is subtracted from the action value, which reduces the variance caused by changes in the baseline value of the state. The formula is shown in Eq. (14).
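The discounted return and the advantage can be computed in a few lines. The following is a generic sketch of the standard multi-step advantage estimate used in actor-critic methods, offered as an illustration rather than the paper's exact Eqs. (12) to (14); the function name and the sample numbers are hypothetical. The bootstrap value is 0 when the rollout ends in a terminal state, matching the end-state case in Algorithm 1.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma):
    """Advantage estimates R_t - V(s_t) over one sampled rollout.

    rewards         -- r_t for each step of the rollout
    values          -- critic estimates V(s_t) for the same steps
    bootstrap_value -- V(s_T) after the last step (0 if s_T is terminal)
    gamma           -- discount factor in (0, 1)
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it:
    # R_t = r_t + gamma * R_{t+1}
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return [ret - value for ret, value in zip(returns, values)]


advantages = n_step_advantages(
    rewards=[1.0, 1.0, 1.0],
    values=[1.0, 1.0, 1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
```

A positive advantage means the sampled action did better than the critic expected from that state, so the policy update makes it more likely; subtracting the critic's estimate is what keeps the variance of that update down.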
In SA3C, N-step sampling is used to accelerate convergence, so the advantage function is expressed in N-step form. In SA3C, both the main network and the sub-networks are essentially Actor-Critic networks, which need to update both the policy network parameters and the value network parameters. The policy network and value network parameters in the main network are denoted by $\theta^{\mu}$ and $\theta^{Q}$, and those in a sub-network are denoted by $\theta^{\mu'}$ and $\theta^{Q'}$. The update of parameter $\theta^{\mu}$ in the Actor policy network follows the policy gradient formula, where $\delta$ is the entropy hyperparameter, which is set larger at the beginning and gradually decreases during the training phase, and $H(\cdot)$ represents the entropy of the policy. The update of parameter $\theta^{Q}$ in the Critic value network follows its own gradient formula. The SA3C algorithm is shown in Algorithm 1.
Algorithm 1 The SA3C algorithm
Input: physical topology resource layer $G = \{N, L\}$, set of service function requests $SFCs = \{f_1, f_2, \ldots\}$
Output: main network parameters $\theta^{\mu}$ and $\theta^{Q}$
1: Initialise the main network parameters $\theta^{\mu}$ and $\theta^{Q}$, the maximum number of iterations of the main network $T_{max}$ and the main network counter $T = 1$
2: Initialise $M$ sub-network threads and set the sub-network parameters $\theta^{\mu'} = \theta^{\mu}$ and $\theta^{Q'} = \theta^{Q}$, the maximum number of iterations in the sub-network $T_{local}$ and the sub-network counter $t = 1$
3: For $T$ in $\{1, 2, \ldots, T_{max}\}$ Do
4:   For $u$ in $\{1, 2, \ldots, M\}$ Do
5:     Reset the sub-network gradients $d\theta^{\mu} \leftarrow 0$, $d\theta^{Q} \leftarrow 0$
6:     Synchronise the main network parameters to update the sub-network parameters
       For $t$ in $\{1, 2, \ldots, T_{local}\}$ Do
10:      Execute $a_t$ to obtain $r_t$, $s_{t+1}$
12:    End For
14:    If $s_t$ is the end state: $Q(s_t, a_t) = 0$
15:    Else:
19:    $u \leftarrow u + 1$
21:  End For
22: End For
23: Update main network parameters:

5 Simulation experiment design and validation

Simulation experiment environment design
In order to verify the validity of the models and methods proposed in this paper, the simulation experiments are run on a device with an i5-11400KF CPU, 16 GB of DDR4 RAM and two GTX-3050 8G graphics cards, based on Python 3.6 and the tensorflow-gpu 2.4 platform. The physical network topology is the France network topology from SNDlib, which has 25 nodes and 45 links, an average node degree of 3.6 and a link density of 15%. The physical node computational resources and link bandwidth resources follow a uniform distribution over [100, 200] units. The number of SFCs ranges over [10, 100]; 70% of the SFCs have priority $p_3$, 10% each have $p_2$ and $p_4$, and 5% each have $p_1$ and $p_5$. The length of the VNF sequence in a single $f$ is uniformly distributed over [2, 10], and the amount of data carried by each $f$ is normally distributed in [10, 20] units.
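A workload with these distributions can be generated with the standard library. The sketch below is illustrative only: the text does not give the mean or standard deviation of the data volume, so a normal distribution with mean 15 and standard deviation 2.5, resampled until it falls inside [10, 20], is an assumption, as are the function and field names.

```python
import random

# Priority mix stated in the text: 70% p3, 10% each p2/p4, 5% each p1/p5.
PRIORITY_WEIGHTS = {1: 0.05, 2: 0.10, 3: 0.70, 4: 0.10, 5: 0.05}


def generate_requests(count, rng):
    """Draw `count` synthetic SFC requests from the stated distributions."""
    levels = list(PRIORITY_WEIGHTS)
    weights = list(PRIORITY_WEIGHTS.values())
    requests = []
    for _ in range(count):
        length = rng.randint(2, 10)  # VNF sequence length, uniform on [2, 10]
        # ASSUMPTION: mean 15, std 2.5, resampled until inside [10, 20].
        data = rng.gauss(15.0, 2.5)
        while not 10.0 <= data <= 20.0:
            data = rng.gauss(15.0, 2.5)
        priority = rng.choices(levels, weights=weights, k=1)[0]
        requests.append({"length": length, "data": data, "priority": priority})
    return requests


def generate_capacities(count, rng):
    """Node compute or link bandwidth capacities, uniform on [100, 200]."""
    return [rng.uniform(100.0, 200.0) for _ in range(count)]


rng = random.Random(0)
requests = generate_requests(100, rng)
node_caps = generate_capacities(25, rng)
link_caps = generate_capacities(45, rng)
```

Passing an explicit `random.Random` instance keeps each simulated scenario reproducible from its seed.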
In order to improve the accuracy of the algorithm evaluation, the Monte Carlo method was used: 100 sets of simulation tests were conducted in each experimental scenario, and their average was taken as the test result. The parameters of the SA3C training process include the number of sub-networks, the learning rate, the transformation factor, the discount factor, the entropy hyperparameter, and the maximum number of iterations of the main network and of the sub-networks. The specific configurations of the parameters of the SA3C training network are shown in Table 1.
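The averaging over repeated runs can be sketched as below. This is a generic illustration with hypothetical names; `toy_scenario` merely stands in for one full simulation run and its numbers are invented for the example, not taken from the experiments.

```python
import random
import statistics


def monte_carlo_mean(run_once, runs=100, seed=0):
    """Average a metric over independent, reproducibly seeded runs.

    run_once -- callable taking a random.Random and returning one metric value
    """
    master = random.Random(seed)
    samples = []
    for _ in range(runs):
        # Give each run its own generator so runs do not share random state.
        run_rng = random.Random(master.getrandbits(32))
        samples.append(run_once(run_rng))
    return statistics.mean(samples), statistics.stdev(samples)


def toy_scenario(rng):
    # Placeholder for one simulation run; returns a made-up processing time.
    return 1000.0 + rng.uniform(-50.0, 50.0)


mean_time, spread = monte_carlo_mean(toy_scenario)
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two methods exceeds the run-to-run noise.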

Analysis of experimental results
In this paper, we validate the effectiveness of the proposed method in terms of resource utilisation rate, total processing time and mapping rate, and compare it with the Policy Gradient (PG) method and the AC method.
(1) Resource utilisation rate. SFC requests occupy a certain amount of resources when they are mapped into the physical network for processing, and the resource utilisation for the current number of SFC requests can be defined as Eq. (18). A comparison of the resource utilisation rates of the three methods is shown in Fig. 3. The resource utilisation rates of all three methods increase, because as the number of SFCs to be processed grows, the allocatable physical node computational resources and physical link bandwidth resources gradually decrease. Since an SFC must select a node on which the corresponding VNF is deployed, the mapping fails if no such node is available, and the failed SFC does not consume the resources of the remaining nodes that lack the required VNF; hence resource utilisation, although rising, generally does not reach 100%.
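The utilisation metric can be sketched as a simple ratio. The code below assumes, based on the quantities the paper names for this definition, that utilisation is the allocated node and link resources divided by the total node and link resources; the function name and sample values are hypothetical.

```python
def resource_utilisation(node_allocs, link_allocs, node_caps, link_caps):
    """Allocated resources over total resources (assumed reading of the metric).

    node_allocs / link_allocs -- resources currently allocated to mapped SFCs
    node_caps / link_caps     -- total node compute and link bandwidth
    """
    total = sum(node_caps) + sum(link_caps)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return (sum(node_allocs) + sum(link_allocs)) / total


# 80 units of compute and 20 units of bandwidth allocated out of 400 in total.
rate = resource_utilisation([50.0, 30.0], [20.0], [100.0, 100.0], [200.0])
```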
As shown in Fig. 3, the resource utilisation rate of SA3C is 69.31% when mapping 100 SFCs, an improvement of 9.85% over AC and 17.08% over PG. This is because SA3C takes the allocation of physical node computing resources and physical link bandwidth resources as a joint optimisation objective, adjusts the weight of resource allocation during SFC mapping, and speeds up the processing of SFCs, thus releasing resources for processing more SFCs.

(2) Total processing time. According to Eq. (7), the total processing time for the current number of SFC requests can be calculated as shown in Eq. (19). The comparison of the total processing time of the three methods is shown in Fig. 4. The total processing time of all three methods rises, because as the number of SFC requests increases, the amount of data to be processed and hence the time needed to process it also increase. However, the total processing time of SA3C rises significantly more slowly than that of AC and PG. To complete the processing of 100 SFC requests, SA3C requires a total processing time of 1025.86 time units, a reduction of 10.72% compared to AC and 18.43% compared to PG. This is because SA3C sends each SFC to a sub-network for learning and training in order to minimise the total processing time.

(3) Mapping rate. The mapping rate, at which SFC requests are successfully mapped into the physical network for processing, can be defined as Eq. (20), where $Num(SFCs)$ denotes the total number of current SFC requests and $\sum_{t=0}^{T_{max}} Num(f)$ denotes the number of successful mappings. The comparison of the mapping rates of the three methods is shown in Fig. 5. In the initial stage, the mapping rates of the three methods gradually decrease from 100%, because as the number of SFC requests increases, the node computational resources and link bandwidth resources in the physical network are gradually occupied and some SFCs fail to map for lack of resources. However, the mapping rate of SA3C decreases significantly more slowly than that of PG and AC. At the end of mapping, with 100 SFC requests, SA3C has a mapping rate of 75.74%, an improvement of 6.72% over AC and 16.7% over PG. This is because SA3C uses multiple sub-networks to learn in parallel, which reduces the training time and speeds up convergence.
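The mapping rate described above is the count of successful mappings, summed over time slots, divided by the number of requests. A minimal sketch, with a hypothetical function name and made-up sample counts:

```python
def mapping_rate(successes_per_slot, total_requests):
    """Successful mappings summed over all time slots, over total requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    successes = sum(successes_per_slot)
    if successes > total_requests:
        raise ValueError("more successes than requests")
    return successes / total_requests


# Three time slots mapping 3, 2 and 1 requests out of 8 submitted.
rate = mapping_rate([3, 2, 1], total_requests=8)
```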

Conclusion
Virtual link mapping is one of the main problems that service function chaining technology must solve. Thanks to network function virtualisation, function instances can be placed on any node in the network as needed, but network resources such as the computing capacity of physical nodes and the bandwidth of links are limited. Most existing research optimises only one kind of resource, which wastes the other kind during optimisation and depletes network resources excessively. Deep reinforcement learning combines the perceptual ability of deep learning with the decision-making ability of reinforcement learning. On this basis, this paper improves the A3C algorithm and proposes SA3C, an improved SFC resource allocation method based on A3C, to address the irrational allocation of node and link resources during SFC mapping, which results in a low SFC mapping rate and long processing time. The method jointly optimises the allocation of node computing resources and link bandwidth resources. Using asynchronous training and multi-threading, it generates multiple sub-networks for parallel training, treats each SFC request as an agent trained in a sub-network, finds the optimal resource allocation strategy, and synchronises it with the parameters of the main network by means of differential updating, which accelerates the convergence of the algorithm. Simulation results show that the proposed method effectively improves the SFC mapping rate and resource utilisation, reduces the total processing time of SFCs, and improves service efficiency. However, although mapping resource allocation is a key factor affecting the quality of user service, different network functions in different user requests demand resources to different degrees: for example, instruction scheduling is biased towards CPU resources, image computing towards GPU resources, and file reading towards disk resources. In addition, the experiments in this paper use at most 100 service requests, which falls short of real network service scenarios. Therefore, future work will examine node resources specifically, analysing the impact of CPU, memory, disk and other node resources on the efficiency of service mapping, and will study the optimal solution of the resource allocation problem in large-scale service mapping based on machine learning methods.
Authors' contributions. Conceptualization by WH and SL; methodology by WH, SL and HL; software by SL and HL; validation by WH, CZ, GL and YL; formal analysis by WH, SL and CZ; investigation by SL, HL and GL; data curation by SL, GL and YL; writing (original draft preparation) by WH and SL; writing (review and editing) by WH, SL, HL, CZ, GL and YL. All authors read and approved the final manuscript.

In the definition of the resource utilisation rate, $\sum_{i=1}^{24} c_i$ denotes the physical node computational resources, $\sum_{i=1}^{24} \sum_{j=1}^{45} b_{i,i}$ denotes the link bandwidth resources, and $c^f_c(t) + b^f_e(t)$ denotes the sum of the physical node and link bandwidth resources allocated to the SFC.

Fig. 5 Mapping rate

in parallel multiple training, can effectively accelerate the convergence efficiency, and use the optimisation function instead of the state-action-value function to reduce the training empirical variance. The SA3C resource allocation model is shown in Fig. 2.

Table 1 The specific configurations of the parameters of the SA3C training network