Intelligent flow control algorithm for microservice system

In microservice systems, availability can be ensured through a variety of measures, such as fault tolerance and flow limiting, which are collectively called flow control. In current mainstream system designs, the flow control rules are usually fixed and set manually, so they cannot be dynamically adjusted to the shape of the incoming flow, and the performance of the system is therefore not fully exploited. To mitigate this problem, an adaptive dynamic flow control algorithm is proposed. Based on the system's monitoring data and the current flow, the algorithm calculates the flow-limiting threshold in real time and then applies fine-grained, service-level adaptive flow control to improve resource utilization. Experimental results show that the adaptive automatic flow control outperforms the traditional static method in resource utilization.


| INTRODUCTION
With the increasing complexity of computer software, the performance requirements of programs have exceeded the limits of a single computer, and a distribution-friendly architecture, such as the microservice architecture [1], is urgently needed. Programs designed with a microservice architecture are usually divided into multiple functional modules. Each module is implemented as a small yet independent system, and communication between modules usually relies on technology-agnostic protocols such as HTTP [2].
The microservice architecture brings many benefits to the software industry: it makes applications easier to understand, develop and maintain, and brings faster delivery, improved scalability and greater autonomy [3]. However, the microservice approach is also criticised for a number of issues, especially its stability. A distributed microservice architecture introduces more servers and less stable infrastructure, so network problems may easily cause service failures [4]. Owing to the complex calling paths between services, the failure of one service on a path may propagate downstream and cause the entire system to crash. When facing highly concurrent service requests, sudden large traffic may cause the system's CPU usage and load to soar, so that requests cannot be processed in a timely manner. A series of measures to address these issues and ensure service availability have been proposed.
Fault-tolerant mechanisms, such as flow limiting and circuit breaking, are popular methods to maintain the availability of microservice systems [5]. Flow limiting controls the number of requests for a service per unit time and prevents a sudden request flood from exceeding the system's processing capacity. Circuit breaking detects failures and encapsulates the logic of preventing a failure from constantly recurring during maintenance, temporary external system failure or unexpected system difficulties.
While flow control libraries such as Hystrix [6] and Sentinel [7] can help developers quickly implement flow limiting and circuit breaking, it is still challenging to determine the capacity of a large dynamic distributed system whose flow and latency characteristics are constantly changing. Historically, flow control mechanisms have been configured manually as fixed limit rules with the help of performance testing and profiling, a complicated and arduous process. It is difficult even for experienced engineers to accurately fit the rules to the system performance. Moreover, fixed static rules cannot adapt to changing flow conditions and sometimes limit throughput. Some researchers have proposed simplifying the testing process through automated testing [8,9].
In our work, we simplify the testing process through dynamic flow control to improve system performance. To achieve this goal, we introduce an adaptive, auto-scaling solution that improves system resource utilization.

| MODELING MICROSERVICE FLOW
Setting a flow threshold is a resource allocation problem; the key is to allocate an appropriate amount of resources to each frequently called service.

| Flow characteristics in microservices
In a microservice system, modules are deployed and run independently, and call each other through network interfaces. We consider each module that runs separately as the minimum unit to calculate the flow-limit threshold.
Inside a module, we divide services into downstream and upstream services, where an upstream service needs to call its downstream services when it is itself called. The call path of the module can therefore be represented as a directed acyclic graph (DAG), G = (V, E), where the node set V represents the service flow entries and the edge set E represents the calling relationships. The weight of node i, the real-time request flow x_i of the service, is calculated as the sum of the weights of all its predecessor nodes plus the flow arriving from external calls to the module, which we denote x_i^+. An example of such a DAG is shown in Figure 1.
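As a concrete illustration of this flow model, the sketch below propagates external flow x^+ through a call DAG in topological order; the function and parameter names (`service_flows`, `external`, `edges`) are ours, not from the original system:

```python
from collections import defaultdict

def service_flows(external, edges):
    """Propagate flow through a service-call DAG.

    external: dict node -> externally arriving flow (x_i^+)
    edges: list of (upstream, downstream) call edges
    Returns dict node -> total flow x_i = x_i^+ + sum of upstream flows.
    Assumes the call graph is acyclic, as in the paper's model.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set(external)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes |= {u, v}
    flow = {n: external.get(n, 0.0) for n in nodes}
    # Kahn-style topological pass: push each node's flow to its callees.
    queue = [n for n in nodes if indeg[n] == 0]
    while queue:
        u = queue.pop()
        for v in succ[u]:
            flow[v] += flow[u]
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return flow
```

For example, if service 1 receives 10 external QPS and calls services 2 and 3, and service 2 additionally receives 5 external QPS, then service 2 carries 15 QPS in total.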

| Optimization model
As can be seen from Figure 1, the module's throughput is determined by the flow of the externally accessible upstream services. Therefore, the goal of the flow-limit rules is to maximize the sum of the x_i^+ on the premise that resources are available. Let the flow-limit threshold of service i be y_i; the actual passing flow of the node is then min(x_i, y_i). The total throughput can be expressed as

Throughput = Σ_i min(x_i, y_i).

The flow X = (x_1, x_2, ..., x_n) needs to be fitted by Y = (y_1, y_2, ..., y_n) to obtain higher throughput. The mean-square distance (MD) can be used to assess the fitness:

MD(X, Y) = sqrt( Σ_i (x_i - y_i)^2 ).

The flow-limiting rules Y should obey the following linear constraints:

• System resource constraints
The system resources required to process the current number of requests should not be greater than the total node resources:

Σ_i c_i · y_i ≤ m,

where c_i is the load factor of service i, which depends on the complexity of its task, and m is the total amount of node resources, which can be set smaller than the actual value in order to play a protective role.

• Call path constraints
When there is a call path between services, a service's flow limit should be no less than the sum of the flow arriving from its upstream services. For example, if service 2 and service 3 are downstream services of service 1, their flow limits y_2 and y_3 should each cover the flow passed down from service 1.

• Maximum-minimum constraints
To reduce the MD, the maximum constraints prevent the threshold from being loosened too much. Since the calculation of the flow limit always lags behind the actual flow, adding a relaxation factor α ∈ [1, +∞) can prevent jitter, giving y_i ≤ α · x_i. To prevent low-priority services from being squeezed out during burst flow peaks, a minimum value N_i is set for the flow-limit rules, giving y_i ≥ N_i.
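The optimization model above can be summarised in code. The sketch below (names are ours) computes the throughput and MD objectives and checks the resource and maximum-minimum constraints; call path constraints are omitted for brevity:

```python
import math

def throughput(x, y):
    """Total passing flow: each service passes min(x_i, y_i)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y))

def mean_square_distance(x, y):
    """MD between the real flow X and the flow-limit rules Y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def feasible(y, x, c, m, n_min, alpha):
    """Check the linear constraints: resource budget and per-service bounds."""
    within_budget = sum(ci * yi for ci, yi in zip(c, y)) <= m
    within_bounds = all(n <= yi <= alpha * xi
                        for yi, xi, n in zip(y, x, n_min))
    return within_budget and within_bounds
```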
| FLOW CONTROL RULES

| Real-time monitoring
In order to adaptively estimate the parameters of the above model and to auto-scale the microservice system, real-time metrics need to be collected. Monitoring includes service monitoring and system monitoring. Service monitoring mainly records service-quality metrics, including response time and queries per second (QPS). System monitoring covers the operating status of the computing devices, such as CPU usage, thread queues and physical device attributes. Table 1 shows the attributes of interest in real-time monitoring.
Based on the above monitoring information, we can judge whether the current number of requests exceeds the processing capacity of the system and take the corresponding flow-limiting measures. Monitoring data is sampled every second to update the flow control threshold.
minRT indicates that a request can be processed directly without queuing, so minRT approximates the CPU time required to perform the service. In a unit time (1 s), the time required to process the given requests is therefore QPS × minRT. LoadAvg is the length of the queue waiting for CPU (the run queue), which represents the average system load over a period of time. When LoadAvg > CpuNum, the process queue is longer than the number of cores; in this case the system is busy and service flow should be limited.
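A minimal sketch of this busy-detection rule, assuming (our reading, not stated explicitly in the source) that the system is considered busy either when the run queue exceeds the core count or when the required CPU seconds per second exceed the available cores:

```python
def system_busy(qps, min_rt, load_avg, cpu_num):
    """Decide whether service flow should be limited.

    qps      : requests handled in the last second
    min_rt   : minimum response time (seconds), approximating
               the CPU time needed per request
    load_avg : run-queue length averaged over a recent window
    cpu_num  : number of CPU cores
    """
    busy_time = qps * min_rt          # CPU seconds needed per wall-clock second
    overloaded = load_avg > cpu_num   # run queue longer than the core count
    return overloaded or busy_time > cpu_num
```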
To exclude the effects of processes unrelated to the service module running in the system, LoadLevel is defined to indicate the relationship between the current flow and the system capacity:

| Flow-limit threshold
The flow-limit threshold is obtained by solving the optimization model. The input of the model is the current microservice system topology and real-time monitoring data, and the output is the flow-limit threshold Y for the next moment. The problem could be solved with heuristic algorithms or machine learning methods, such as the ant colony algorithm or recurrent neural networks, which can automatically learn parameters and capture non-linear relationships. However, as this task is delay-sensitive, we prefer a lightweight method.
Since the constraints are all linear, a linear optimization algorithm can be used to solve the problem. Since the optimization goal of the model is to maximize system throughput, the sum of downstream service flow is used as the objective.
For different business systems, the priority of each service differs. For example, requests for important services such as ordering and payment should be processed first. Adding a benefit coefficient k_i enables the model to respond preferentially to requests for important services. The objective function is

maximize Σ_i k_i · y_i,

and the problem can be expressed as the linear program

maximize   Σ_i k_i · y_i
subject to Σ_i c_i · y_i ≤ m,
           N_i ≤ y_i ≤ α · x_i,

together with the call path constraints above. There are multiple methods for solving linear programming problems, such as the simplex method and the ellipsoid algorithm; the latter obtains the optimal solution in polynomial time.
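If the call path constraints are set aside, this linear program reduces to a fractional knapsack: spend the resource budget m on the services with the highest benefit density k_i / c_i, up to each service's cap α·x_i. The greedy sketch below (our illustration; the paper itself solves the full LP with the simplex method) is optimal for that simplified problem:

```python
def flow_limits(x, k, c, m, alpha=1.2, n_min=0.0):
    """Greedy thresholds for a simplified version of the LP:

        maximize   sum k_i * y_i
        subject to sum c_i * y_i <= m,  n_min <= y_i <= alpha * x_i

    Call path constraints are omitted (our simplification), which makes
    the problem a continuous knapsack solvable by benefit density.
    """
    n = len(x)
    y = [n_min] * n                        # grant every service its minimum
    budget = m - sum(c[i] * y[i] for i in range(n))
    # Highest benefit per unit of resource first.
    for i in sorted(range(n), key=lambda i: k[i] / c[i], reverse=True):
        if budget <= 0:
            break
        room = max(alpha * x[i] - y[i], 0.0)   # head-room up to the cap
        grant = min(room, budget / c[i])
        y[i] += grant
        budget -= grant * c[i]
    return y
```

With two services of equal load factor but benefit coefficients 2 and 1, the budget is spent on the high-benefit service first and the remainder goes to the other.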

| Circuit breaking
Circuit breaking degrades a service whose responses time out, to avoid failures of its upstream services. The response times of different services differ, so a mechanism is needed to determine whether a service's response time is abnormal. We could simply set a timeout threshold to trip the breaker, but we prefer a more adaptive method to achieve better results.
We collect RT monitoring data in a sliding time window (m seconds) and calculate the gradient at each moment:

g_t = (RT_t - RT_{t-1}) / (RT_{t-1} + β),

where β is a constant used to prevent gradient explosions caused by small denominators. When the gradient exceeds a certain value, the RT has increased sharply over a short period; the service can be considered abnormal and should be downgraded. In this case, the y_i corresponding to the service is removed from the linear optimization goal, and its flow is automatically set to 0. A countdown is started at the same time, and after the countdown the optimization goal is restored.
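A sketch of this circuit-breaking check, assuming the gradient takes the relative-difference form g_t = (RT_t - RT_{t-1}) / (RT_{t-1} + β), which is one plausible reading of the β term:

```python
def rt_gradient(rt_window, beta=1.0):
    """Gradient of response time at each step of a sliding window.

    beta keeps the denominator away from zero for very small RTs,
    preventing gradient explosions.
    """
    return [(rt_window[t] - rt_window[t - 1]) / (rt_window[t - 1] + beta)
            for t in range(1, len(rt_window))]

def should_break(rt_window, threshold, beta=1.0):
    """Trip the breaker when any recent gradient exceeds the threshold."""
    return any(g > threshold for g in rt_gradient(rt_window, beta))
```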

Algorithm 1 Dynamic flow control algorithm
Input: the set of current flows of each service x_n; real-time monitoring data; the set of service load factors c_n; the set of benefit coefficients k_n; the circuit-breaking gradient threshold a
Output: the set of flow control rules for the next time window, y_n
1:  for each i ∈ n do
2:      calculate gradient_i;
3:      if gradient_i > a then
4:          model the linear programming problem with the monitoring data, c_n and k_n;
12:     solve the model to calculate y_n;
13: end if
14: return y_n

Algorithm 1 presents the procedure of our dynamic flow control.
The time complexity of Algorithm 1 is O(N), where N is the number of services. When calculating the threshold through the linear programming model, the main computational cost comes from the two-phase simplex method, which has exponential complexity in the worst case. However, manual rules can be introduced to reduce the average number of simplex iterations to 2n + 1.

| Leaky bucket algorithm
Counting the number of requests requires a time window; for example, QPS refers to the number of requests within 1 s. This time window is not fixed and may be longer or shorter. Algorithm 1 calculates the load based on the average processing capacity, so over a long time window, if requests accumulate in a short burst while other periods are idle, flow control may fail to protect the system. Ideally, no matter how long the time window is, we would like requests to arrive at a fixed frequency.
We use the leaky bucket algorithm to shape the flow so that requests can pass at a uniform rate. The leaky bucket algorithm maintains a bucket with fixed capacity. When a request arrives, it is put into the bucket first. The requests in the bucket leak to the services at a fixed rate. The maximum request volume can be controlled by setting the capacity of the bucket.
The leaky bucket is essentially a simple FIFO buffer to remove burstiness or jitter, which was previously used in the packet-switched networks [10] and asynchronous transfer mode (ATM) networks [11]. Here, we regard the service as a control unit, use it in the microservice system to ensure availability, and improve the stability of flow control.
The setting of the leaky bucket corresponding to Algorithm 1 is as follows: the leak rate is the flow-limit threshold y_n, so a request is passed from the bucket every 1/y_n seconds, and the bucket capacity is y_n · maxRT, where maxRT is usually set by the user and refers to the highest tolerable RT.
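The leaky bucket described above can be sketched as follows. For testability the current time is injected as a parameter rather than read from the system clock; the class and method names are ours:

```python
class LeakyBucket:
    """A leaky bucket that releases queued requests at a fixed rate.

    rate     : permitted requests per second (y_n from Algorithm 1);
               one request leaks every 1/rate seconds
    capacity : maximum queued requests; overflow is rejected
    """
    def __init__(self, rate, capacity):
        self.interval = 1.0 / rate
        self.capacity = capacity
        self.queue = []
        self.last_leak = 0.0

    def offer(self, request, now):
        """Try to enqueue `request` at time `now`; False means rejected."""
        self.leak(now)
        if len(self.queue) >= self.capacity:
            return False          # bucket full: reject (or degrade) the request
        self.queue.append(request)
        return True

    def leak(self, now):
        """Release queued requests at the fixed rate; returns released items."""
        released = []
        while self.queue and now - self.last_leak >= self.interval:
            released.append(self.queue.pop(0))
            self.last_leak += self.interval
        return released
```

With rate 2 req/s and capacity 2, two requests arriving at t = 0 are queued, a third is rejected, and by t = 1 s both queued requests have leaked out at the uniform rate.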

| Adaptive time interval
Although Algorithm 1 has polynomial average time complexity, we still want to reduce its computational overhead as much as possible. In most cases the flow does not change drastically, so there is no need to change the flow control rules frequently. The frequency of executing Algorithm 1 should therefore be adjusted according to the trend of the flow: when the flow is stable, the flow-limit rules can be updated at longer intervals, but when the flow changes greatly, the update frequency needs to be increased to avoid a drop in throughput. Slow start is a congestion control strategy with an additive-increase/multiplicative-decrease (AIMD) scheme, which has been used in the TCP transport protocol [12]. Based on slow start, we propose an adaptive time interval algorithm to determine the frequency of updating the flow control rules.
We use a variable T to indicate the time to wait before the next execution of Algorithm 1; Algorithm 2 presents the process of calculating T.

Algorithm 2 Adaptive time interval
Input: the current flow control rules y_n; the previous flow control rules y_n'; the time elapsed since the last calculation, T (seconds); a global variable k recording the maximum value of T at the last reset; the update frequency factor s
Output: the time interval before the next calculation of the flow control rules, T'
1: flag ← 0;
2: for each i ∈ n do
3:     if |y_n - y_n'| / y_n > s then

According to the characteristics of the microservice scenario, the interval time first increases multiplicatively and then additively, until the flow jitters; in that case, the interval time is reset to the minimum.
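A compact sketch of this AIMD-style interval update. The reset-on-jitter test follows the relative-change condition |y_n - y_n'| / y_n > s above; the point at which growth switches from multiplicative to additive (here half of the maximum interval) is our assumption:

```python
def next_interval(y_new, y_old, t, t_min=1.0, t_max=60.0, s=0.2):
    """AIMD-style update of the rule-recalculation interval (seconds).

    If any service's threshold changed by more than the fraction `s`
    since the last calculation, the flow is jittering: reset to t_min.
    Otherwise grow the interval, multiplicatively while it is small and
    additively once it passes t_max / 2 (assumed switch point).
    """
    jitter = any(abs(yn - yo) / yn > s
                 for yn, yo in zip(y_new, y_old) if yn > 0)
    if jitter:
        return t_min
    t = t * 2 if t < t_max / 2 else t + 1
    return min(t, t_max)
```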

| Generating simulated flow
Service flow is a random process and can be expressed as a function of time {x(t), t ∈ T}. Previous research found that self-similarity is a characteristic of Internet traffic dominated by network technologies such as Ethernet and the TCP/IP protocol stack [13]. Statistically, the traffic autocorrelation structure is maintained over several time scales, and the traffic exhibits long-range dependence (LRD) [14]. A self-similar process that captures the LRD nature is characterised by a single parameter, the Hurst parameter H. An LRD process has H larger than 0.5, with higher degrees of self-similarity as H approaches 1 [15]. There are multiple models for self-similar traffic generation, for example fractional Brownian motion (FBM), fractional Gaussian noise (FGN) and the On/Off model. Here we use fractional Gaussian noise to generate the simulated flow, for its adjustable H parameter [16,17].
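A sketch of FGN generation by random midpoint displacement (RMD), the construction used in the experiments below: build a fractional-Brownian-motion path and return its increments. The displacement-variance schedule shown is one common formulation; with RMD the resulting Hurst parameter is only approximate:

```python
import math
import random

def fgn_rmd(h, levels, sigma=1.0, seed=None):
    """Fractional Gaussian noise via random midpoint displacement.

    Builds an fBm path of 2**levels + 1 points by recursively displacing
    midpoints, shrinking the displacement scale by 2**(-h) per level,
    then returns the 2**levels increments (the noise).
    """
    rng = random.Random(seed)
    path = [0.0, rng.gauss(0.0, sigma)]
    scale = sigma * math.sqrt(1.0 - 2.0 ** (2.0 * h - 2.0))
    for _ in range(levels):
        scale /= 2.0 ** h
        refined = []
        for a, b in zip(path, path[1:]):
            # Midpoint = average of endpoints plus a Gaussian displacement.
            refined += [a, (a + b) / 2.0 + rng.gauss(0.0, scale)]
        refined.append(path[-1])
        path = refined
    return [b - a for a, b in zip(path, path[1:])]
```

A simulated flow with a given average (e.g. 100 QPS) can then be obtained by shifting and scaling the noise, as done for Figure 2.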

| Comparison with static method
We use the random midpoint displacement (RMD) method to generate a 1024-point fractional-Gaussian-noise simulated flow with a Hurst parameter of 0.7. For better illustration, we set the average values to 50, 100 and 150, respectively, and show the flow curves in Figure 2.
The service load factors are set as 0.06, 0.04 and 0.02, respectively, and the total resource m is set as 10. At this time, the static flow-limiting threshold is the average value of the flow, which is the theoretical optimal solution of the static flow-limiting method. Figure 3 shows the actual passing traffic after applying static and dynamic flow control on the flow scenario in Figure 2.
It can be seen from Figure 3 that the actual passing flow under the dynamic flow control rules closely follows the arriving flow. When a flow peak occurs, the flow-limit thresholds of all services are dynamically adjusted to adapt to the current flow scenario and maximize resource utilization. The static flow control rules flatten the peaks while the flows of other services remain unsaturated, resulting in lower resource utilization. Table 2 lists the statistics of the comparison experiments, where throughput is the sum of passed QPS, resource utilization is the product of traffic and load factor divided by the total resource amount, and the request arrival rate is the ratio of actually passed requests to total requests. The experimental results show that under the same resource conditions, the dynamic flow control performs about 1.5% better than the static method on the main indicators (throughput and resource utilization), and the dynamic rules are closer to the real-time incoming flow (MD improved by about 158%).

| Influence of self-similarity on dynamic flow control
In order to compare the effect of flow self-similarity on dynamic flow control algorithm, the fractal Gaussian noise is used to generate flow with different H parameters (Figure 4). The H parameters are 0.2, 0.4, 0.6 and 0.8, respectively.
There may be random error in the Hurst parameters of flow simulated by fractional Gaussian noise. We repeat each experiment 100 times and take the average to reduce the effect of this error on the results. Table 3 presents the performance of the dynamic flow control method for the different self-similar flows.
It can be seen from Table 3 that the dynamic rules perform better on flow with stronger self-similarity. Considering that the Hurst parameter of microservice flow is usually between 0.7 and 0.9, the adaptive dynamic flow control algorithm can achieve a performance improvement in microservice flow scenarios.

| Flow control on different scenarios
In different traffic scenarios, the magnitude of the traffic change differs. We simulate this by adjusting the variance of the generated flow. Based on the standard FGN simulated flow (variance 10, Hurst parameter 0.7), we compared the effects of different flow control strategies when the variance is scaled. As shown in Table 4, the variance of the original flow is scaled from 50% to 130%, and we measured the indicators under static and dynamic flow control. Table 5 shows the improvement of the dynamic method over the static method on each indicator in the different flow scenarios. Among them, MD represents the Euclidean distance between the flow control rules and the actual flow; its value indicates how well the flow control protection fits the flow. In all scenarios, the improvement in MD of the dynamic method over the static method exceeds 100%. For resource utilization, throughput, arrival rate and the other indicators, the benefits of dynamic flow control decrease as the variance of the flow decreases. This shows that the dynamic method is more effective in drastically changing flow scenarios.

| Time interval for updating flow control rules
From an intuitive point of view, the higher the frequency of updating the flow-limiting rules, the better the fit with the actual flow, and the higher the utilization of resources. Updating flow control rules, especially solving linear programming problems, requires computing resources. Therefore, we need to adjust the update frequency to balance the computational costs with the benefits obtained.
In this part, we designed experiments to compare the impact of updating the rules at different frequencies. It can be seen from Table 6 that even the lowest frequency greatly improves the resource utilization rate, which improves further as the frequency increases. The actual passed flow corresponding to the different update frequencies is shown in Figure 5.
It can be seen that although the indicators improve as the update frequency increases, the extent of the improvement decreases. Therefore, we hope to find an adaptive time interval that balances the benefit of adjusting the rules against the cost of computation. To better show the gap between the methods, we constructed a simulated flow at 90% load, which leaves more room for adjustment since the load is not full.
In Table 7, we compare a fixed time interval, a random time interval and the adaptive time interval (Algorithm 2). We control the adaptive time interval by adjusting the value of s. Compared with the other methods, it obtains better results with the same or fewer updates. Since flow scenarios differ across real businesses, we recommend re-running the experiment on the actual business flow to set an appropriate value of s.

| IMPLEMENTATION
Sentinel is an open-source framework providing high-availability components such as flow control, traffic shaping, circuit breaking and system adaptive protection, guaranteeing reliability and resilience for microservices. It implements the basic functions of microservice flow control and provides interfaces to facilitate secondary development. We implemented adaptive flow control on Sentinel as a functional module, which enables adaptive flow control for multiple services.

| Resources and rules in Sentinel
Sentinel regards resources as the control unit, and developers only need to pay attention to the definition of resources. Various rules for accessing a resource can be added as soon as the resource is defined. Here we define services as resources to achieve flow control for the microservice system. Rules can be changed dynamically and take effect in real time.
Sentinel provides pre-set rule types to decide whether incoming requests should be controlled. Developers need to set rules manually based on the characteristics of the service flow, such as limiting the QPS or the number of concurrent calls to a service. Rules can also be loaded dynamically from data sources; however, setting different rules for different time periods still requires manual configuration.

| Adaptive dynamic flow control
Sentinel provides a number of infrastructures for our implementation, such as resource run-time statistics, flow control, system monitoring and establishment of the invocation chain. The monitoring information is updated in real time. Based on this information, we construct the linear optimization model described above. We built an AutomaticRuleManager, which builds and solves the model to generate rules based on the monitoring information and Algorithm 1. The two-phase simplex method [18] is used to solve the linear programming model, and we implemented automatic construction of the simplex tableau.
Users only need to use the AutomaticSlot when accessing Sentinel and set parameters, such as the highest CPU usage and the highest system load, for the AutomaticRuleManager, which then performs flow control automatically according to the real-time flow and system status, without additional manual configuration. The dynamic flow control is calculated in real time and needs no additional data sources, which simplifies its use. We also provide a code example [19].

| CONCLUSION
We propose a flow control algorithm that dynamically calculates flow-limiting thresholds based on real-time monitoring data. The method has been experimentally verified to deliver a significant performance improvement over traditional flow control and has been implemented as an open-source module within Sentinel. The algorithm reduces manual intervention during microservice system deployment, improves system resource utilization, and has great application potential for current distributed microservice architectures. In future work, we will study flow control under non-linear resource consumption to improve the robustness of the algorithm.