Learning‐based containment control in the absence of leaders' information

This article considers the containment control problem for continuous-time dynamic agents without the leaders' state information. The leaders have first- or second-order dynamics, while first-order dynamics always govern the followers. Each agent has inherent nonlinear dynamics and can only measure the output information of its neighbors. The output of each leader is the product of an unknown coefficient and a position-like state, while the output of each follower equals its position-like state. To steer the position-like states of the followers into the convex hull spanned by the leaders, the unknown coefficients are asymptotically tracked by leveraging reinforcement learning based on the inherent dynamics and the output information. Two distributed learning-based containment protocols are proposed, one for each leader model. It is proved that if the directed communication topology has a spanning forest and certain conditions on the inherent nonlinear dynamics are satisfied, then the proposed controllers with proper control gains solve the containment control problem asymptotically under arbitrary initial states. An interesting conclusion is that the learning algorithms' convergence rate plays an important role in achieving containment control. Numerical simulations validate the effectiveness of the theoretical results.

With different distributed protocols, mobile autonomous agents can accomplish various tasks. The most studied issues include consensus,9-12 containment,13-18 flocking,19,20 rendezvous,21 formation maintenance,20 and distributed optimization and games,22-25 among others. As one of these challenging topics, the containment problem concerns a collection of agents that autonomously move into a convex hull spanned by several other stationary or dynamic agents. For example, a team of vehicles carrying flammable materials must be protected by several robots from venturing into areas of elevated temperature, and it is often desirable that the vehicles remain within a specific area enclosed by these robots during transit. The robots normally take on leader roles while the vehicles to be secured take on follower roles, since the positions of the vehicles are determined by those of the robots. In the study of containment control of multi-agent systems, the leaders are often distinguished from the rest of the agents by the fact that they have no neighbors in the communication graph; in other words, leaders never measure information from other agents.
Many experimental and theoretical efforts have been devoted to containment control of multiple autonomous agents; see, for example, References 13-16, 26, and 27. In Reference 13, the authors propose a hybrid Stop-Go policy to drive a group of leaders to a target area while ensuring that all followers stay in the convex polytope spanned by the leaders. In References 14 and 15, the authors show that a consensus-like protocol can solve the containment problem when more than two leaders are present among the agents; they prove that containment can be achieved for agents with either first- or second-order dynamics if the underlying directed communication graph contains a spanning forest. However, all these works assume that all the agents have identical dynamics and that relative state information can be transmitted among the agents. In reality, leaders and followers ubiquitously have different dynamics. Moreover, usually only an agent's output information can be measured, and relative state information cannot be transmitted directly between two agents if their output coefficients are nonuniform. Reference 16 studied the containment problem of multiple heterogeneous agents; however, it rests on the fundamental implicit assumption that all the followers can directly measure relative state information from the leaders.
In this article, we study the containment problem of multiple autonomous agents in the absence of the leaders' state information. More specifically, the followers can only measure the output information of the leaders. In addition, the leaders' dynamics may differ from those of the followers. In contrast to Reference 11, where the output of each leader is equal to its position-like state, we consider the case where the output of a leader is the product of a constant (called the output coefficient) and its position-like state. Since the output coefficient of each leader is unavailable to the followers connected to it, a follower can no longer recover the state information of the leaders. Observer-based methods in the literature can often estimate unavailable state information. For a linear system, the state is usually reconstructed by developing an observer based on the output coefficient, input, and output information.28,29 For a multi-agent system with integrator agents, a typical approach to estimating unavailable state measurements is to design a state observer using some measured state information.11 An observer for learning the state can be developed for a linear multi-agent system based on local relative output information and the system coefficients.30 Unfortunately, these methods usually require each agent to know either the position-like states of its neighbors or the system coefficients; they therefore do not apply to the problem in this article. Although achieving containment control of outputs is also interesting,31 it is sometimes not what practice requires.
Motivated by the situation above, in our study the outputs of the leaders involve unknown coefficients. Both first- and second-order dynamics are considered for the leaders, while first-order dynamics always govern the followers. Similar to References 11 and 12, we consider the case where each agent has a common inherent nonlinear dynamics. Two learning-based protocols are proposed, one for each leader model. For the multi-agent system with first-order leaders, observers that estimate the output coefficients of the leaders are designed for the followers connected to leaders, utilizing the nonlinear dynamics and the output information measured from neighbors. For the multi-agent system with second-order leaders, an extra observer is developed for each follower to estimate the velocity-like state of the leaders. In both cases, the observer is viewed as a local reinforcement learning algorithm: the estimated coefficient is the policy, the inherent nonlinear dynamics is the action, and the leader's output is the reward from the environment. It is shown that if the underlying communication graph contains a spanning forest and the inherent nonlinear term satisfies some mild conditions, the proposed protocols drive the position-like states of all followers into the convex hull spanned by the leaders asymptotically. In particular, if there is only one leader in the network, the containment problem solved in this article reduces to the tracking problem.
Moreover, in the case of first-order leaders, by designing proper reference inputs as the inherent dynamics, the position-like states of the leaders can be either fixed or time-varying at steady state, and the containment problem is solved once the estimation errors converge to zero asymptotically. In contrast, in the case of second-order leaders, the position-like states of the leaders are always time-varying, and achieving containment requires the estimation errors to converge to zero exponentially.
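A heuristic numerical illustration of this last point (this example is ours, not the article's): under constant acceleration a leader's position grows like t², so a residual term proportional to (estimation error) × (leader position) vanishes only when the error decays faster than t⁻², which an exponential rate provides but a merely asymptotic 1/t rate does not.

```python
import numpy as np

# Leader position under constant unit acceleration grows like t^2, so a
# residual of the form (estimation error) * (leader position) survives a
# merely asymptotic 1/t error decay but dies under exponential decay.
t = np.linspace(1.0, 100.0, 1000)
x_leader = 0.5 * t**2

slow = x_leader / (1.0 + t)        # error ~ 1/t: residual keeps growing
fast = x_leader * np.exp(-t)       # error ~ exp(-t): residual -> 0

print(slow[-1])                    # large (about 49.5)
print(fast[-1])                    # essentially zero
```

This is why the second-order case below demands exponentially convergent observers while the first-order case does not.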
The remainder of this article is organized as follows. Some basic concepts of graph theory, the detailed problem statement, definitions, and useful lemmas are introduced in Section 2. The containment problem with first-order leaders is analyzed in Section 3, and the containment problem with second-order leaders is discussed in Section 4. Numerical simulations are presented in Section 5, and Section 6 concludes the article. Notation: Let R, R^n, and R^{m×n} denote the set of real numbers, n-dimensional real vectors, and m × n real matrices, respectively; |·| is the Euclidean norm; |𝒮| denotes the cardinality of a set 𝒮; 1 denotes the all-ones vector of compatible dimension; diag(c_1, …, c_n) defines a diagonal matrix with c_1, …, c_n on the diagonal; ⊗ denotes the Kronecker product; X^T is the transpose of matrix X; ∞ means +∞.

Preliminaries of graph theory
The interaction topology among the agents is modeled as a weighted directed graph 𝒢 = (𝒱, ℰ, 𝒜), where 𝒱 = {1, …, n + m} denotes the set of agents, ℰ denotes the set of communication links, and 𝒜 = [a_ij] is the adjacency matrix with a_ij > 0 if (j, i) ∈ ℰ and a_ij = 0 otherwise. A communication link (j, i) ∈ ℰ means that agent i can measure the information of agent j. The set of neighbors of agent i is denoted by 𝒩_i = {j : a_ij > 0}. The Laplacian matrix L = [l_ij] is defined by l_ii = Σ_{j∈𝒩_i} a_ij and l_ij = −a_ij for i ≠ j. A directed path from agent i to agent j is a sequence of edges (i, i_1), (i_1, i_2), …, (i_l, j), where (i_k, i_{k+1}) ∈ ℰ for k ∈ {1, …, l}. A directed graph is said to contain a spanning tree if there exists a vertex i_0 such that every other vertex in the graph is connected to i_0 by a directed path starting from i_0. A graph contains a directed spanning forest if each of its connected components contains a directed spanning tree.
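To make the definitions concrete, the following sketch (the five-agent topology is hypothetical, not the article's simulation example) builds the adjacency and Laplacian matrices and shows the block structure induced by the leaders having no neighbors:

```python
import numpy as np

# Hypothetical 5-agent example: agents are indexed 0-4; agents 3 and 4 have
# no neighbors and are therefore leaders, agents 0-2 are followers.
# a_ij > 0 means agent i measures agent j.
n, m = 3, 2
A = np.zeros((n + m, n + m))
A[0, 3] = 1.0          # follower 0 measures leader 3
A[1, 0] = 1.0          # follower 1 measures follower 0
A[2, 1] = 1.0          # follower 2 measures follower 1 ...
A[2, 4] = 1.0          # ... and leader 4

# Laplacian: l_ii = sum_j a_ij and l_ij = -a_ij for i != j.
L = np.diag(A.sum(axis=1)) - A

# Leaders have no neighbors, so the last m rows of L vanish and L takes the
# block form [[L1, L2], [0, 0]] used later in the article.
L1, L2 = L[:n, :n], L[:n, n:]
print(L1)
print(L2)
```

Each Laplacian row sums to zero by construction, and the zero leader rows are exactly what makes the leaders autonomous.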
Throughout this article, we make the following assumption.
Assumption 1. The communication graph 𝒢 contains a directed spanning forest.

Problem statement
Consider a group of n + m autonomous agents composed of n followers and m leaders. An agent is said to be a follower if it has at least one neighbor; an agent with no neighbors is called a leader. The dynamics of the ith follower are given by

ẋ_i = u_i, y_i = x_i, i ∈ ℱ = {1, …, n}, (1)

where x_i ∈ R^N denotes the position-like state of follower i, u_i is the protocol to be designed, and y_i is the measured output of follower i; f(t) denotes the inherent dynamics shared by all agents and enters the followers' motion through the protocol.
Since each leader measures no information from other agents, its motion is driven only by the inherent dynamics f(t). We consider leaders with either first-order dynamics

ẋ_i = f(t), y_i = c_i x_i, i ∈ ℒ = {n + 1, …, n + m}, (2)

or second-order dynamics

ẋ_i = v_0, v̇_0 = f(t), y_i = c_i x_i, i ∈ ℒ, (3)

where x_i ∈ R^N denotes the position-like state of leader i, v_0 is the common velocity-like state shared by all leaders, and c_i is the unknown output coefficient of leader i. In this article, we consider the simple case where c_i is a scalar. Note that c_i may be a matrix in practice; if c_i is an unknown matrix, the problem becomes more difficult to address, and we leave it for future work. The inherent dynamics f(t) can be viewed as a reference input steering the agents to a desired velocity or acceleration, similar to References 11 and 12. Our objective is to propose a distributed protocol for the followers to solve the containment problem, which is defined as follows.
Definition 1. The containment problem is solved if the position-like states of all followers asymptotically converge to the convex hull spanned by the leaders.
In this article, we consider that each agent can only measure the output information of its neighbors. As a result, the followers cannot obtain the real state measurements of the leaders. To solve the containment problem, we will establish distributed observers for the followers based on f(t) and the outputs of the leaders. To achieve this goal, the following assumptions on f(t) may be used in different scenarios in this article.
There exist two positive constants δ, T and a constant vector g ∈ R^N such that ∫_s^{s+T} g^T f(τ) dτ ≥ δ for any s ≥ 0.
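The windowed condition above can be probed numerically. In the following sketch (our illustration, not from the article), f = (sin t, 0)^T fails the condition for the constant choice g = (1, 0)^T, because the integral of sin over any full period vanishes, while the time-varying choice g = f yields the integrand sin² t, whose windowed integral stays bounded away from zero:

```python
import numpy as np

def min_window_integral(beta, dt, T):
    """Smallest value of the moving integral of beta over windows of length T."""
    w = int(round(T / dt))
    c = np.concatenate(([0.0], np.cumsum(beta) * dt))
    return (c[w:] - c[:-w]).min()

dt = 1e-3
t = np.arange(0.0, 200.0, dt)
f1 = np.sin(t)                      # first component of f = (sin t, 0)^T
T = 2 * np.pi                       # window length

const_g = min_window_integral(f1, dt, T)       # g = (1, 0)^T, integrand sin t
varying_g = min_window_integral(f1**2, dt, T)  # g = f, integrand sin^2 t

print(const_g)     # near 0: no positive delta works for this constant g
print(varying_g)   # near pi: the windowed condition holds with delta = pi
```

This is exactly the distinction the discussion below draws between the weaker and stronger excitation assumptions.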
Assumptions 2 and 4 guarantee boundedness and continuity of the inherent dynamics. Note that if f(t) vanishes in finite time, it is difficult for the followers to utilize it to estimate the real position-like states of the leaders. Note also that Assumption 4 is stronger than Assumption 2: for example, f = (sin t, 0)^T satisfies Assumption 2 by choosing g = f, but does not satisfy Assumption 4. These two assumptions will be used in different cases in this article. Assumptions 3 and 5 are conditions for the exponential convergence of the observers developed in this article. In our work, exponential convergence of the estimation errors is required for the case of second-order leaders but not for the case of first-order leaders. Moreover, the differentiability of f(t) in Assumption 4 is used only when different dynamics govern leaders and followers; it allows us to design time-varying control gains associated with the derivative of the inherent nonlinear dynamics. In practice, these assumptions can be satisfied by artificially designing f(t), and thus are reasonable from a practical point of view. Since the leaders have no neighbors, the adjacency matrix 𝒜 and the Laplacian matrix L can be partitioned as 𝒜 = [[𝒜_1, 𝒜_2], [0, 0]] and L = [[L_1, L_2], [0, 0]], with L_1 ∈ R^{n×n} and L_2 ∈ R^{n×m}. Before entering into the details of the main results, we list the following lemmas.
Lemma 1 (Reference 14). If the communication graph 𝒢 has a directed spanning forest, then all the eigenvalues of L_1 have positive real parts, each entry of −L_1^{-1} L_2 is nonnegative, and each row of −L_1^{-1} L_2 sums to 1.

Lemma 2 (Reference 8). Suppose that a symmetric matrix is partitioned as E = [[E_11, E_12], [E_12^T, E_22]], where E_11 and E_22 are square. Then E is positive definite if and only if E_11 and E_22 − E_12^T E_11^{-1} E_12 are both positive definite.
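The standard containment lemma is easy to verify numerically; the sketch below (hypothetical 3-follower, 2-leader topology with a spanning forest) checks the sign of the eigenvalues of L_1 and the row-stochasticity of −L_1^{-1}L_2:

```python
import numpy as np

# Numerical check of the containment lemma on a hypothetical topology that
# contains a spanning forest (3 followers, 2 leaders).
A = np.zeros((5, 5))
A[0, 3] = A[1, 0] = A[2, 1] = A[2, 4] = 1.0
L = np.diag(A.sum(axis=1)) - A
L1, L2 = L[:3, :3], L[:3, 3:]

eigs = np.linalg.eigvals(L1)
W = -np.linalg.inv(L1) @ L2
print(eigs.real)          # all positive
print(W)                  # entrywise nonnegative
print(W.sum(axis=1))      # each row sums to 1
```

The rows of W being convex-combination weights is precisely why the followers' equilibria lie in the convex hull of the leaders.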

Lemma 3. Let β(t) ≥ 0 be piecewise continuous, and suppose there exist constants δ, T > 0 such that ∫_{kT}^{(k+1)T} β(s) ds ≥ δ for any k = 0, 1, 2, …. Then there exists a constant K* > 0 such that, for any K ≥ K*, exp(−K ∫_0^t β(s) ds) converges to zero exponentially as t → ∞.

Lemma 4. Let p(t) ≤ exp(−at)b with constants a, b > 0, let x(t) be bounded, and let y(t) = ∫_0^t |z(s)| ds + y(0) with z bounded. Then lim_{t→∞} p x = 0 and lim_{t→∞} p y = 0.

Proof. Let q = exp(−at)b; we first show lim_{t→∞} qx = 0 and lim_{t→∞} qy = 0. Since x is bounded and q → 0, the first limit is immediate. For the second, y(t) either converges to a bounded constant or goes to infinity. If y is bounded for all t ≥ 0, then obviously lim_{t→∞} qy = 0; otherwise, L'Hospital's rule gives lim_{t→∞} qy = lim_{t→∞} b|z(t)|/(a exp(at)) = 0, since z is bounded. The claims for p follow from 0 ≤ p ≤ q.

CONTAINMENT OF MULTIPLE HOMOGENEOUS AGENTS
This section considers the case where first-order dynamics govern all agents. To achieve containment, the followers need the position-like states of the leaders; however, since c_i, i ∈ ℒ, is unknown, the position-like state of leader i is unavailable. We therefore estimate c_i during the evolution. If follower i is connected to leader j, we use ^i â_j to denote follower i's estimate of c_j. The distributed containment protocol (4) for follower i is built from f(t) and the outputs of its neighbors. It suffices to develop observers for the followers connected to at least one leader to learn the corresponding coefficients, which implies that for any i ∈ ℱ and j ∈ ℒ, ^i â_j exists if and only if a_ij ≠ 0. The corresponding observer (5) involves a vector g(t) ∈ R^N such that ∫_0^∞ g^T(s) f(s) ds = ∞ and a constant control gain k > 0 to be designed appropriately. The existence of such a g(t) is guaranteed once Assumption 2 holds.
The observer (5) can be viewed as a local reinforcement learning algorithm for agent i. Here, ^i â_j is the policy to be updated and is employed in the containment control protocol, f(t) is the action, and y_j is the reward returned from the leader, which can be viewed as the environment.
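The exact form of observer (5) is not reproduced in this excerpt, so the sketch below substitutes a standard recursive least-squares estimator for the same learning task (the RLS form, gains, and signals are our assumptions). A follower that knows f(t) and measures y_j(t) = c_j x_j(t), with ẋ_j = f(t), can write y_j(t) = c_j x_j(0) + c_j ∫₀ᵗ f(s) ds, which is linear in the unknowns θ = (c_j x_j(0), c_j):

```python
import numpy as np

# Stand-in for observer (5): recursive least squares on the regression
# y(t) = theta1 + theta2 * F(t), with F(t) the integral of f and
# theta2 = c_j. Leader data, gains, and the RLS form are illustrative.
dt, steps = 1e-2, 20000
c_true, x = 2.0, 1.5                 # unknown coefficient, leader state
theta = np.zeros(2)                  # estimate of (c*x(0), c)
P = 1e3 * np.eye(2)                  # RLS covariance
F = 0.0                              # running integral of the known f

for i in range(steps):
    t = i * dt
    f = 1.0 + np.sin(t)              # persistently exciting inherent dynamics
    y = c_true * x                   # measured leader output (the "reward")
    phi = np.array([1.0, F])         # regressor built from known f
    err = y - theta @ phi            # prediction error
    gain = P @ phi / (1.0 + phi @ P @ phi)
    theta += gain * err              # learning step (policy update)
    P -= np.outer(gain, phi @ P)
    x += dt * f                      # leader integrates f
    F += dt * f                      # follower integrates the known f

print(theta[1])                      # -> close to c_true = 2.0
```

The same ingredients appear in the article's observer: the known action f(t), the measured reward y_j, and a coefficient estimate updated from the output prediction error.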
Note also that for any neighbor j ∈ 𝒩_i, the information measured by follower i is y_j = x_j if j ∈ ℱ and y_j = c_j x_j if j ∈ ℒ. As a result, protocol (4) designed for follower i employs only the information received from its neighbors and is therefore decentralized.
Denote ^i p_j = ^i â_j − c_j, ^i q_j = g^T(y_j − ^i x̂_j), and β = g^T f. Then the system for follower i to estimate c_j can be transformed into the error system (6). To show the effectiveness of the observer (5), it suffices to show lim_{t→∞} ^i p_j = 0.

Lemma 5. For any i ∈ ℱ and j ∈ ℒ satisfying a_ij ≠ 0, by implementing observer (5) with k ≥ √2 fg, the following statements hold.

By Assumption 2, we have ∫_0^∞ β(s) ds = ∞.

Consider the function V = k (^i p_j)² + 2 ^i p_j ^i q_j + (2/k)(^i q_j)². The matrix [[k, 1], [1, 2/k]] of this quadratic form has determinant 1 and positive trace, so V is positive definite. Differentiating V along the solution to (6) yields a derivative bounded above by a negative multiple of β V.

1. Due to Assumption 2, we have V(t) → 0 as t → ∞. Together with the positive definiteness of V, we obtain ^i p_j, ^i q_j → 0, which results in lim_{t→∞} ^i â_j = c_j.
2. By Lemma 3, V(t) decays exponentially, which implies exponential stability. ▪

Remark 1. For a first-order agent, the inherent nonlinear dynamics f(t) can be viewed as a reference velocity-like input; that is, all agents achieve the common velocity f(t) at steady state. Lemma 5 shows that Assumption 2 is sufficient for f(t) to ensure the effectiveness of the observer (5). Note that this assumption can hold even if lim_{t→∞} f(t) = 0, for example f(t) = 1/(1 + t); in this case, all leaders can be stationary at steady state.
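The quadratic form used in this proof can be checked directly; the sketch below assumes the reconstruction V = k p² + 2pq + (2/k)q² of the garbled formula and verifies positive definiteness of its matrix for several gains k:

```python
import numpy as np

# V = k*p^2 + 2*p*q + (2/k)*q^2 corresponds to the symmetric matrix
# [[k, 1], [1, 2/k]], whose determinant is 2 - 1 = 1 for every k > 0,
# so V is positive definite regardless of the gain.
spectra = []
for k in (0.1, 1.0, 10.0):
    M = np.array([[k, 1.0], [1.0, 2.0 / k]])
    spectra.append(np.linalg.eigvalsh(M))
    print(k, spectra[-1])    # both eigenvalues positive
```

The gain condition on k therefore comes from the sign of the derivative of V, not from the positive definiteness of V itself.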
Remark 2. In some circumstances, each leader requires an extra control input to stabilize the leaders into a specific geometric shape or guide them in a desired direction. In the literature (e.g., References 3 and 13), this control input can be written as a gradient flow, ũ_i = −∇_{x_i} V, i ∈ ℒ, where V is a nonnegative potential function. As a result, the dynamic equation of leader i becomes ẋ_i = −∇_{x_i} V + f(t). In this scenario, the output coefficients of the leaders can still be observed if the velocity v_j of each leader j can be obtained by the followers connected to it: by replacing f with v_j, observer (5) can be employed to estimate c_j for follower i.

Theorem 1. Suppose Assumptions 1 and 2 hold. If k ≥ √2 fg, then protocol (4) with observer (5) solves the containment problem for homogeneous agents under arbitrary initial states.
Proof. Under protocol (4), the dynamic equation of follower i can be written as (7), where C = diag(c_{n+1}, …, c_{n+m}); hence (7) can be expressed in a compact form for the stacked follower state x_ℱ. Together with Lemma 1, we obtain that −L_1 is stable.

It remains to prove that x_ℱ converges to the convex hull spanned by the leaders. It is easy to see that for every i ∈ ℱ there exist nonnegative constants μ_j with Σ_{j∈ℒ} μ_j = 1 such that the target point of follower i is the convex combination Σ_{j∈ℒ} μ_j x_j. We now show that the estimation-induced error term ε satisfies lim_{t→∞} ε = 0. From Lemma 5, ^i p_j → 0 as t → ∞ for any i ∈ ℱ. Note that x_i = x_i(0) + ∫_0^t f(s) ds; letting x_i^(h) and ε_i^(h) denote the hth components of x_i and ε_i, an application of L'Hospital's rule together with Lemma 5 and the boundedness of f, followed by the squeeze theorem, gives lim_{t→∞} ε = 0. Since −L_1 is stable, there exists a positive definite matrix P such that P(−L_1) + (−L_1)^T P is negative definite. Consider the Lyapunov candidate V = ξ^T (P ⊗ I_N) ξ, and let λ_1 and λ_n be the smallest and largest eigenvalues of P, respectively. Since the error terms vanish, for any given ϵ > 0 there exists T > 0 such that they are bounded by ϵ whenever t ≥ T; moreover, there exists T_1 > T such that the exponentially decaying transient is also bounded by ϵ for t > T_1. From the arbitrariness of ϵ and the positive definiteness of V, we have lim_{t→∞} V = 0. As a result, x_ℱ converges to the convex hull spanned by the leaders as t → ∞. ▪
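The convergence argument can be visualized with a toy simulation. The protocol form below is an assumption (protocol (4) itself is not reproduced in this excerpt), and the coefficient estimates are taken as already converged, so each follower recovers x̂_j = y_j / c_j for its leader neighbors:

```python
import numpy as np

# Toy containment run (the protocol form is an assumption; protocol (4)
# itself is not reproduced in this excerpt). Coefficient estimates are taken
# as already converged, so each follower recovers xhat_j = y_j / c_j.
A = np.zeros((5, 5))                       # followers 0-2, leaders 3-4
A[0, 3] = A[1, 0] = A[2, 1] = A[2, 4] = 1.0
c = np.array([1.0, 1.0, 1.0, 2.0, 0.5])    # output coefficients
x = np.array([10.0, -8.0, 6.0, 1.0, 3.0])  # scalar position-like states
dt = 1e-2

for i in range(20000):                     # simulate t in [0, 200]
    f = np.sin(i * dt)                     # common inherent dynamics
    y = c * x                              # measured outputs
    xhat = y / c                           # exact state recovery (chat = c)
    u = np.array([f + A[k, :] @ (xhat - x[k]) for k in range(3)])
    x[:3] += dt * u                        # followers
    x[3:] += dt * f                        # first-order leaders

print(x[:3], x[3:])   # followers lie between the two leader positions
```

Both leaders drift with the same f, so the convex hull is a moving interval; the followers settle onto convex combinations of the leader positions, exactly as the theorem states.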
Remark 3. From (9), one can observe that the convergence rate of multi-agent system (1)-(2) under protocol (4) depends on ||ε||. By Lemma 5, if Assumption 3 holds, then ^i p_j converges to zero exponentially. Therefore ε_i ≤ exp(−a_i t) b_i for some positive scalars a_i, b_i, and consequently ε vanishes exponentially. In this scenario, the corresponding term in (10) can be replaced by exp(−at)b for some scalars a, b > 0, and exponential convergence of the multi-agent system is obtained. This implies that the system is input-to-state stable (Lemma 4.6 of Reference 28). As a result, if f(t) is not completely known to the followers, for example if each follower only has access to f(t) + d(t), where d(t) is unknown but bounded by d̄, then containment can still be achieved in the sense that the containment error of x_ℱ is ultimately bounded by an increasing function of d̄ and vanishes if d̄ = 0.

CONTAINMENT OF MULTIPLE HETEROGENEOUS AGENTS
In this section, we consider the case where first-order dynamics govern all followers while all leaders have second-order dynamics and share a common velocity-like state, that is, v_i = v_j = v_0 for all i, j ∈ ℒ. Then f(t) can be viewed as an acceleration-like reference input known to all agents. In this circumstance, we face two challenges: (1) the position-like and velocity-like states of the leaders cannot be measured directly; (2) the dynamics of the followers differ from those of the leaders. To solve the containment problem, we develop an observer for each follower to estimate the common velocity-like state of the leaders; moreover, for any follower connected to at least one leader, extra observers are developed to estimate the leaders' output coefficients. The distributed containment protocol (11) for follower i uses a velocity estimate v̂_i produced by observer (12). Under Assumption 4, the observer (13) is designed for follower i to estimate c_j; with β = g^T f, observer (13) can be transformed into the error system (14). Similar to (5), (14) is a local reinforcement learning algorithm for agent i as well.

Lemma 6. For any i ∈ ℱ and j ∈ ℒ with a_ij ≠ 0, let k_1 = β + 1 and k_2 = −β̇ + β + 1. Then the following results hold.
1. Denote ^i w_j = ^i r_j − k_1(t) ^i q_j. Then

^i q̇_j = ^i w_j,
^i ẇ_j = −^i p_j − (β + 1) ^i w_j − (β + 1) ^i q_j.

It follows that

^i ẇ_j = −^i w_j − ^i q_j + ^i η̇_j, ^i η̇_j = −β ^i η_j,

where ^i η_j = ^i p_j + ^i q_j + ^i w_j. Hence ^i η_j(t) = exp(−∫_0^t β(s) ds) ^i η_j(0). By Assumption 4, we have lim_{t→∞} ^i η_j = 0 and lim_{t→∞} ^i η̇_j = 0. Consider the function V = 2(^i w_j)² + 2 ^i w_j ^i q_j + 3(^i q_j)², which is positive definite. Hence lim_{t→∞} V = 0; as a result, lim_{t→∞} ^i w_j = 0 and lim_{t→∞} ^i q_j = 0. Therefore lim_{t→∞} ^i p_j = lim_{t→∞}(^i η_j − ^i q_j − ^i w_j) = 0; that is, lim_{t→∞} ^i â_j = c_j.
2. By following lines similar to the proof of Lemma 5, we can deduce that ^i η_j converges to zero exponentially.
Proof. From (1), the protocol u_i is the velocity-like state of follower i. Hence the dynamic equation of follower i under protocol (11) with observer (12) can be written in closed-loop form. For simplicity of notation, we adopt the shorthand Ax to denote (A ⊗ I_N)x for A ∈ R^{p×q} and x ∈ R^{qN}, where p, q ∈ N. Then we obtain the compact closed-loop dynamics.
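The shorthand introduced here, writing Ax for (A ⊗ I_N)x, simply applies A across stacked N-dimensional blocks; a quick check:

```python
import numpy as np

# Ax as shorthand for (A kron I_N) x: applying A blockwise to a stacked
# vector of q blocks in R^N equals the full Kronecker-product form.
A = np.array([[1.0, 2.0], [0.0, 1.0]])      # p = q = 2
N = 3
x = np.arange(6.0)                           # two stacked vectors in R^3

full = np.kron(A, np.eye(N)) @ x
blockwise = (A @ x.reshape(2, N)).ravel()
print(np.allclose(full, blockwise))          # True
```

This is why scalar-agent arguments lift verbatim to N-dimensional position-like states.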

SIMULATIONS
In this section, we present two examples to illustrate the effectiveness of the theoretical results. In the following figures, the red and blue dash-dotted lines denote the trajectories of leaders and followers, respectively.
From Figure 4, we can see that the control input, that is, the velocity of each agent, finally converges to zero; each agent is stabilized at a fixed position at steady state.

In the second example, the inherent dynamics f(t) has first component sin²(t/2), and g(t) = (1, 0)^T. It is easy to see that β = g^T f = sin²(t/2) is a periodic function with period 2π; hence, for any t ≥ 0, ∫_t^{t+2π} β(s) ds = ∫_0^{2π} sin²(s/2) ds = π. That is, Assumption 5 is valid. The trajectories of ^i â_j − c_j and of the velocity states v̂_i in each direction under observer (12) are described in Figures 6 and 8, respectively. It is shown that the tracking errors ^i â_j − c_j and v̂_i − v_0 both converge to zero asymptotically, which is consistent with Lemma 6. By Theorem 2, the containment problem is solved by implementing protocol (11) with observer (12). The agents' position states are shown in Figures 5 and 7.
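The claimed excitation level in this example is easy to confirm numerically (a Riemann sum over one period is exact up to floating point here, because the cosine term of sin²(s/2) = (1 − cos s)/2 averages out on a uniform full-period grid):

```python
import numpy as np

# Integral of sin^2(s/2) over one period [0, 2*pi] equals pi, since
# sin^2(s/2) = (1 - cos s)/2 and cos integrates to zero over a period.
n = 100000
dt = 2.0 * np.pi / n
s = np.arange(n) * dt
val = np.sum(np.sin(s / 2.0) ** 2) * dt
print(val)                # -> pi
```

So δ = π works as the windowed excitation constant for this choice of f and g.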

CONCLUSION
This article solved the containment problem of multi-agent systems in the presence of communication constraints between leaders and followers. The leaders have first- or second-order dynamics, while first-order dynamics always govern the followers. By developing distributed observers, we proposed a learning-based protocol for solving the containment problem in each case. When the underlying communication graph contains a spanning forest, we proved that containment can be achieved for leaders with first-order dynamics if the estimation errors converge to zero asymptotically, and for leaders with second-order dynamics if the estimation errors converge to zero exponentially. Two simulation examples illustrated the effectiveness of our control strategies. In the future, we will try to solve the containment problem when time delays exist in communications and the interaction topology is time-varying.

FIGURE 1 A communication topology containing a spanning forest. The red and blue nodes denote the leaders and the followers, respectively.
FIGURE 2 Trajectories of agents under protocol (4).
FIGURE 3 Evolution of the tracking error â − c under observer (5).
FIGURE 4 All agents with protocol (4) finally stop moving.
FIGURE 5 Trajectories of agents under protocol (11).
FIGURE 6 Evolution of the tracking error â − c under observer (13).
FIGURE 7 Trajectories of position states of agents in each direction under protocol (11).
∫_t^{t+T} β(s) ds ≥ δ for any t ≥ 0 if and only if there exist constants δ̄ > 0, c ∈ R such that ∫_0^t β(s) ds ≥ δ̄ t − c for all t ≥ 0. Proof. Sufficiency: it is obvious that ∫_t^{t+T} β(s) ds ≥ … Necessity: let δ̄ = δ/(2T), and note that ∫_{kT}^{(k+1)T} β(s) ds ≥ …