A proximity moving horizon estimator for a class of nonlinear systems

In this article, we present a proximity-based formulation for moving horizon estimation (MHE) of a class of constrained discrete-time nonlinear systems. The cost function of the proposed estimator includes a convex stage cost as well as a Bregman distance centered at the a priori estimate. This rather general formulation of the cost function allows for a flexible design of the estimator depending on the setting of the problem at hand. We derive sufficient conditions for the global exponential stability of the underlying estimation error. We also investigate the robustness properties of the proposed MHE scheme applied to linear systems subject to unknown process and measurement disturbances in terms of an input-to-state stability analysis. The estimator performance is illustrated by means of simulation examples.


INTRODUCTION
Moving horizon estimation (MHE) is a powerful online optimization-based approach that estimates the state of a dynamical system by solving at each time instant a suitable optimization problem. Hereby, MHE takes a fixed and limited number of recent measurements into account, and the considered horizon of measurements is shifted forward in time (in a receding horizon manner) as a new measurement becomes available. The main advantages of MHE include the ability to specify a performance criterion tailored to the problem at hand, the ability to handle constraints on the estimated states in the underlying optimization problem, and a tractable online computational complexity compared to the full information estimation problem. 1 The stability properties of the estimator depend on how the information contained in the old measurements that are discarded by the receding-horizon optimization problem is incorporated. Several design methods have been proposed in the literature in order to preserve stability for linear systems [2][3][4] and for nonlinear systems 5,6 as well as for large-scale systems. [7][8][9] We also mention works where robust stability of MHE is analyzed for linear 10,11 and nonlinear systems, 12,13 as well as methods for the fast implementation of MHE with stability guarantees. [14][15][16] Typically, a quadratic prior weighting centered at a successively updated a priori estimate is introduced in the cost function in order to ensure stability. One particularly appealing strategy 4 relevant to our approach solves a least-squares estimation problem in which a stabilizing preestimating observer is employed in the a priori estimate and forward prediction equations, which ensures stability of the estimation error by design.
In our previous work, 17 we introduced for the problem class of discrete-time linear systems a novel proximal MHE formulation whose stability is inherited from the a priori estimate and in which a rather general and not necessarily quadratic performance criterion makes it possible to handle outliers and other specific data characteristics. More specifically, the cost function of the underlying optimization problem consists of general convex stage costs, which allow including, for example, sparsity-promoting ℓ1 penalties, which have been highly successful in many signal processing applications. 18,19 Furthermore, a quadratic prior weighting is included as a proximity measure between the MHE estimate and the a priori estimate, which is chosen as a stabilizing Luenberger observer. Recently, we generalized this MHE formulation by introducing the so-called Bregman distance as a more general proximity measure for the stabilizing a priori estimate. 20 Within this Bregman formulation, it is possible to transform a constrained estimation problem into an unconstrained one with a specific choice of the Bregman distance, in particular with so-called relaxed barrier functions, which can therefore yield a simple and efficient numerical implementation. 21,22 The goal of this work is to generalize these recent results. 17,20 In particular, the contributions of this article and the main differences with respect to our recent results are twofold. First, we extend these results by investigating the inherent robustness properties of the proximity-based MHE scheme in terms of an input-to-state stability analysis. Second, we extend them to nonlinear systems. More specifically, we consider a class of nonlinear discrete-time systems that can be transformed into systems that are affine in the unmeasured state.
We design a proximal MHE scheme based on Bregman distances for the resulting class of systems and derive sufficient conditions for the global exponential stability of the estimation error based on a particular assumption on the Bregman distance. Furthermore, we investigate for a special case the input-to-state stability properties of the employed estimator, in which the Bregman distance is chosen as a weighted quadratic function. The main methodological concepts for the design and analysis of the proposed proximity-based MHE approaches stem from state estimation and observability theory as well as from the powerful class of proximal point algorithms [23][24][25] and proximal methods based on Bregman distances. 26,27 The remainder of this article is structured as follows. In Section 2, we introduce the considered class of nonlinear systems as well as the proposed MHE scheme. In Section 3, we analyze the nominal stability properties of the underlying estimation error. In Section 4, we restrict our attention to a special case of the considered class of systems, for which input-to-state stability of the MHE strategy is established under rather mild assumptions. In Section 5, we illustrate our results by means of simulation examples.
Notation: Let R_+ and R_{++} be the sets of nonnegative real and positive real numbers, respectively, and S^n_+ and S^n_{++} be the sets of symmetric positive-semidefinite and positive-definite matrices of dimension n ∈ N_+, respectively. For a vector v ∈ R^n and P ∈ S^n_{++}, ‖v‖_P := (v^⊤ P v)^{1/2} denotes the weighted Euclidean norm, and ‖M‖ refers to the spectral norm of a matrix M. For given i, k ∈ N, the norm of the stacked column vector is denoted by ‖v‖ := ‖col(v_i, …, v_k)‖.

FORMULATION OF THE ESTIMATOR
We consider a class of discrete-time nonlinear systems of the form
z_{k+1} = f(z_k, u_k), y_k = h(z_k), (1)
where z_k ∈ R^n denotes the state vector, u_k ∈ R^m is the input vector, and y_k ∈ R^p is the output vector. We assume that (1) can be transformed into the following system:
x_{k+1} = A(y_k) x_k + B(y_k) u_k + φ(y_k), y_k = C x_k, (2)
where φ denotes a nonlinearity that depends on the output y_k. We state this more formally in the following assumption. Assumption 1. We assume that system (1) is globally equivalent to a linear observable system (2) via a diffeomorphism x = T(z).
Necessary and sufficient conditions for Assumption 1 to hold can be found in Reference 28. We also assume that the state x_k of the new system lies in the interior of the following polytopic set
𝒳 := {x ∈ R^n : C_x x ≤ d_x}, (3)
where C_x ∈ R^{q_x × n} and d_x ∈ R^{q_x}, with q_x ∈ N_+ referring to the number of affine state constraints. While considering nonlinear systems for which such transformations exist might be regarded as rather restrictive, it is a common assumption in nonlinear observer design. In fact, this class of nonlinear systems is extensively studied in the literature. [28][29][30][31][32] For instance, the transformation into system (2) includes the case of nonlinear state transformations in which a nonlinear system (1) is transformed into the so-called nonlinear observer form. 28,31 For the example of a single output and no input, this form corresponds to system (2) with constant matrices A and C in the canonical observer structure and all nonlinear terms collected in φ(y_k). The transformation of the nonlinear system allows obtaining system dynamics that are affine in the unmeasured state x_k and, hence, an MHE formulation in which the underlying optimization problem is convex. 17,20 More specifically, we design an MHE scheme in which the cost function consists of convex stage costs that penalize the output residuals and the model mismatch, as well as a strongly convex Bregman distance centered around an a priori estimate. Here, we choose a stabilizing Luenberger observer for system (2) as the a priori estimate. 4,17 For the sake of completeness and ease of analysis, we also center the Bregman distances associated with the model mismatch around zero. We handle the MHE problem by solving the following constrained optimization problem at each time instant.
Estimation problem P: At the time instant k, given the inputs u_{k−N}, …, u_{k−1}, the measurements y_{k−N}, …, y_{k−1}, and the a priori estimate z̄ = (x̄_{k−N}, 0), where 0 = {0, …, 0} denotes N zero vectors of dimension n, minimize the cost (5b) subject to the forward prediction equations (5c) and the constraints (5d). We consider ẑ = (x̂_{k−N}, ŵ) as the decision variable, where ŵ = {ŵ_{k−N}, …, ŵ_{k−1}} refers to the model mismatch, so that the predicted estimated states x̂_i can be eliminated from the optimization problem.
Let ẑ* = (x̂*_{k−N}, ŵ*) refer to the unique MHE estimate of (5), which yields the state estimate x̂_k and the estimation error (7). Here, D_ψ denotes a Bregman distance induced by a strongly convex function ψ and is defined as
D_ψ(x, y) := ψ(x) − ψ(y) − ∇ψ(y)^⊤ (x − y).
More detail on Bregman distances as well as some of their useful properties can be found in Appendix A1. The Bregman distance D_ψ acts in the proposed MHE scheme as a measure of the proximity between the estimated state x̂_{k−N} and the stabilizing a priori estimate x̄_{k−N} and is a generalization of the commonly used quadratic prior weighting. More importantly, its usage enables the MHE scheme to inherit the stability properties offered by the a priori estimate. Moreover, it offers a rather flexible design that allows, for instance, eliminating the constraints (5d) and incorporating them into the cost function by choosing so-called relaxed barrier functions as Bregman distances. [20][21][22] From a technical point of view, and as we will see in the next section, the main advantage of the Bregman distance is that it makes it possible to prove stability without an explicit representation of the error dynamics and independently of the rather general stage cost functions used in the optimization problem. Note that the proximity-based MHE formulation with convex stage costs and Bregman distances is inspired by the powerful class of proximal methods with Bregman distances. The basic proximal method iteratively finds a minimizer of a convex function by appending a Bregman distance centered at the current iterate and solving the resulting approximate problem instead. The proximal MHE formulation possesses a similar structure. For this reason, we will adapt, in the following section, tools from the convergence analysis of proximal methods in order to investigate the stability properties of the estimation error.
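To make the proximal iteration concrete, the following Python sketch illustrates the basic proximal point method described above on a hypothetical scalar ℓ1-type objective (the objective, step size, and solver choice are illustrative assumptions, not taken from this article): each step appends a quadratic proximity term centered at the current iterate and minimizes the regularized problem.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical convex objective (an l1-type cost, minimized at x = 3).
f = lambda x: np.sum(np.abs(x - 3.0))

def prox_step(f, x_prev, lam=1.0):
    """One proximal-point step: minimize f plus a quadratic proximity
    term centered at the current iterate x_prev."""
    obj = lambda x: f(x) + 0.5 / lam * np.sum((x - x_prev) ** 2)
    return minimize(obj, x_prev, method="Nelder-Mead").x

x = np.zeros(1)
for _ in range(30):
    x = prox_step(f, x)
print(x)  # approaches the minimizer 3.0
```

Replacing the quadratic term by a general Bregman distance D_ψ(x, x_prev) gives the Bregman proximal method referenced above.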

STABILITY ANALYSIS
In this section, we investigate the stability properties of the estimation error of the MHE strategy (5). We impose the following assumptions.

Assumption 2.
Suppose that the proposed MHE problem (5) satisfies the following assumptions.

A1:
The stage costs r: R^p → R and q: R^n → R are convex, nonnegative, and achieve their corresponding minimum at zero. A2: The Bregman distance D_ψ satisfies for any ẑ = (x̂, ŵ) and z̄ = (x̄, 0) the bounds
(1/2)‖x̂ − x̄‖²_{P₁} + (c₁/2)‖ŵ‖² ≤ D_ψ(ẑ, z̄) ≤ (1/2)‖x̂ − x̄‖²_{P₂} + (c₂/2)‖ŵ‖² (9)
for some 0 ≺ P₁ ⪯ P₂ ∈ S^n_{++} and 0 < c₁ ≤ c₂. Note that the strong convexity of ψ gives the quadratic lower bound in A2 of Assumption 2, while the upper bound comes from the additional assumption that the gradient of ψ is Lipschitz continuous. Even though the resulting class of Bregman distances might seem rather restrictive, it includes the important special case of quadratic distances D_ψ(x, y) = ‖x − y‖²_P induced by the function ψ(x) = ‖x‖²_P, with P positive definite. In fact, quadratic distances are widely used in the MHE literature in order to replace the arrival cost, for which, typically, a closed form cannot be computed. In our setting, we can in addition consider any function of the form ψ(x) = ‖x‖²_P + B(x), where B: R^n → R is a convex and strongly smooth function.
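As a quick numerical sanity check of this special case (a sketch under the definitions above, with an arbitrarily chosen weight matrix), one can verify that the Bregman distance induced by ψ(x) = ‖x‖²_P indeed reduces to the weighted quadratic distance:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.diag([2.0, 5.0])            # an arbitrary positive-definite weight matrix

psi = lambda x: x @ P @ x          # psi(x) = ||x||_P^2
grad_psi = lambda y: 2.0 * P @ y   # gradient of psi

def bregman(x, y):
    """Bregman distance D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    return psi(x) - psi(y) - grad_psi(y) @ (x - y)

x, y = rng.standard_normal(2), rng.standard_normal(2)
# For psi(x) = ||x||_P^2, the Bregman distance is the weighted quadratic ||x - y||_P^2.
assert np.isclose(bregman(x, y), (x - y) @ P @ (x - y))
```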
We will show that, if the weight matrices P₁ and P₂ from the quadratic bounds (9) satisfy a system of linear matrix inequalities (LMIs), then the estimation error (7) is guaranteed to be globally exponentially stable (GES). This will be proven by employing the Bregman distance D_ψ as a Lyapunov function. Before stating the first main result, we need the following lemma, which is obtained by analyzing problem (5) at some fixed time instant k. Lemma 1. Consider system (2) with the constraints (3) and the MHE scheme (5) at the time instant k, and let A1 of Assumption 2 hold true. Then the following inequality holds:
D_ψ(z, ẑ*) ≤ D_ψ(z, z̄), (10)
where ẑ* = (x̂*_{k−N}, ŵ*) refers to the unique solution of (5), z̄ = (x̄_{k−N}, 0) to the a priori estimates, and z = (x_{k−N}, 0), in which x_{k−N} denotes the state of system (2) with zero model mismatch.
Proof. We rely in this proof on useful properties of Bregman distances, which are summarized in Appendix A1. Let us first eliminate the forward prediction equations (5c) and rewrite the estimation problem in terms of the decision variable ẑ = (x̂_{k−N}, ŵ). For given k and N, each state in the horizon window can be written in terms of ẑ. Fixing the time instant k, we obtain for the estimation problem (5b), (5c), and (5d) the condensed formulation (13), in which f_k accounts for the sum of stage costs and 𝒵_k represents the constraint set associated with the reformulation of (5d) in terms of ẑ = (x̂_{k−N}, ŵ). Since the cost of (13) is strongly convex, a unique solution ẑ* exists. By optimality of ẑ*, it holds for all ẑ ∈ 𝒵_k that
f_k(ẑ*) + D_ψ(ẑ*, z̄) ≤ f_k(ẑ) + D_ψ(ẑ, z̄).
Thus, for all ẑ ∈ 𝒵_k with f_k(ẑ) ≤ f_k(ẑ*), we deduce that D_ψ(ẑ*, z̄) ≤ D_ψ(ẑ, z̄). In other words, ẑ* is the unique D_ψ-projection of z̄ onto the convex set Ω_k, that is,
ẑ* = arg min_{ẑ ∈ Ω_k} D_ψ(ẑ, z̄), where Ω_k := {ẑ ∈ 𝒵_k : f_k(ẑ) ≤ f_k(ẑ*)}.
Since f_k(z) = 0 due to the absence of model mismatch and measurement noise, since z lies in the polytopic set 𝒵_k, and since the stage costs q and r are nonnegative, it holds that z ∈ Ω_k. Based on (A5) from Lemma 3 in the Appendix, we get
D_ψ(ẑ*, z̄) ≤ D_ψ(z, z̄) − D_ψ(z, ẑ*).
Since the left-hand side of the above inequality is nonnegative, we obtain (10). ▪ What we have shown is that, if we solve the optimization problem (5) at a given time instant k, the Bregman distance between the true state of system (2) and the optimal solution is guaranteed to be smaller than the distance between this state and the stabilizing a priori estimate. On the basis of this lemma, we are able to establish GES of the estimation error in the following theorem.

Theorem 1. Consider the proposed MHE scheme (5) for system (2) with the constraints (3) and let Assumption 2 hold true.
If the weight matrices P₁, P₂ ∈ S^n_{++} from the quadratic bounds (9) on the Bregman distance satisfy the following LMI for each output y_k and some positive-definite matrix Q:
(A(y_k) − L(y_k)C)^⊤ P₂ (A(y_k) − L(y_k)C) − P₁ ⪯ −Q, (22)
then the dynamics of the estimation error (7) are GES.
Proof. We introduce in this proof the subindex k to denote the MHE errors at the time instant k. More precisely, the errors are given by e*_k := x̂*_{k−N} − x_{k−N} and ē_k := x̄_{k−N} − x_{k−N}, where x̂*_{k−N} refers to the solution of the MHE problem (5) and x_{k−N} to the true state of system (2) with zero model mismatch. We prove this result by employing a Lyapunov function V that satisfies
α₁‖e‖² ≤ V(e) ≤ α₂‖e‖² (23a)
and
V(e*_{k+1}) − V(e*_k) ≤ −α₃‖e*_k‖² (23b)
for some positive constants α₁, α₂, and α₃. We choose the Bregman distance in (5b) as a candidate Lyapunov function. Given the bounds (9) on D_ψ in Assumption 2, the function V satisfies (23a) with α₁ = min(λ_min(P₁), c₁)/2 and α₂ = max(λ_max(P₂), c₂)/2. Moreover, we have in view of (10) in Lemma 1 that V(e*_k) ≤ V(ē_k). Using again the quadratic bounds (9), we obtain V(ē_k) ≤ (1/2)‖ē_k‖²_{P₂}. This is due to the fact that there is no model mismatch in (2) and that the associated a priori estimate of the mismatch equals zero. We show in the following that the stability of the MHE scheme can be inherited from the stabilizing a priori estimate x̄_{k−N} by employing the LMI in (22). In fact, using the dynamics of system (2) and the a priori estimate (5a), we get
ē_{k+1} = (A(y_{k−N}) − L(y_{k−N})C) e*_k,
since the input terms as well as the nonlinearities φ(y_{k−N}), which depend on the output, cancel out. Given that P₁ and P₂ satisfy (22), substituting the latter equation into the above bound yields
V(e*_{k+1}) ≤ (1/2)‖e*_k‖²_{P₁} − (1/2)‖e*_k‖²_Q ≤ V(e*_k) − (λ_min(Q)/2)‖e*_k‖²,
that is, (23b) holds with α₃ = λ_min(Q)/2. This proves that the estimation error (7) is GES. ▪ Observe that the LMI (22) implies that each resulting pair (A(y_k), C) is detectable for each y_k ∈ R^p. Moreover, the above result states that the existence of matrices P₁ and P₂ satisfying the LMI (22) for each output y_k is required to guarantee stability, since arbitrary weight matrices might lead to instability, as discussed later in Example 3. In general, finding these matrices in accordance with this condition and with A2 of Assumption 2 might be difficult. Nevertheless, we mention special cases for which this difficulty can be circumvented and simpler conditions can be achieved.
First, one could check whether the observer gain L(y_k) in (22) can be constructed such that A(y_k) − L(y_k)C is constant for all time instants k and does not depend on the output y_k. This would reduce the problem to a single LMI instead of a total of T − N + 1 LMIs, with T referring to the simulation time. Second, if A(y_k) is polynomial in y_k and the output trajectory is available offline, one could solve the LMI (22) for the weight matrices P₁ and P₂ using sum-of-squares techniques. 33 Third, we mention the case where the resulting system matrices in (2) do not depend on the output y_k. This corresponds, for instance, to transformations of system (1) into the nonlinear observer form discussed earlier. We will study this case in the next section to show input-to-state stability of the proposed estimator.
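For the first special case, where A − LC is constant, a weight matrix satisfying a Lyapunov-type LMI of this form can be computed directly from the discrete Lyapunov equation. The following sketch uses illustrative matrices (not those of this article) and assumes P = P₁ = P₂:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative system and observer gain (not taken from the article).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.4]])       # chosen so that A - L C is Schur stable

A_cl = A - L @ C
assert np.max(np.abs(np.linalg.eigvals(A_cl))) < 1.0  # Schur stability

# Solve A_cl^T P A_cl - P = -Q for a chosen Q > 0; the resulting P > 0
# then satisfies the Lyapunov-type LMI with equality.
Q = np.eye(2)
P = solve_discrete_lyapunov(A_cl.T, Q)

residual = A_cl.T @ P @ A_cl - P + Q
print(np.max(np.abs(residual)))    # numerically zero
```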
Remark 1. It is worth pointing out that any stabilizing a priori estimate for which the Bregman distance constitutes a Lyapunov function can be used to ensure stability. For instance, one can construct the a priori estimate based on the Kalman filter and choose the Bregman distance as a quadratic distance weighted with the inverse of the error covariance matrix. 34

INPUT-TO-STATE STABILITY
In this section, we focus on a special case of transformations of system (1) into system (2) in which the system matrices A and B do not depend on the output y_k. More specifically, consider
x_{k+1} = A x_k + B u_k + φ(ỹ_k) + G w_k, y_k = ỹ_k + v_k, (29)
where ỹ_k := C x_k refers to the nominal output and w_k ∈ R^n and v_k ∈ R^p denote the unknown process and measurement disturbances, respectively. We assume that the pair (A, C) is detectable and that x_k satisfies the constraints in (3). Moreover, we choose a quadratic prior weighting as the Bregman distance D_ψ in (5b), that is, D_ψ(x, x̄) = ‖x − x̄‖²_P, where P ∈ S^n_{++}. Note that w_k represents, in this case, an external disturbance that we do not consider in the optimization problem and that only the first state x̂_{k−N} in the horizon window constitutes the decision variable. The MHE scheme for system (29) is given as follows.
Estimation problem P′: At the time instant k, given the inputs u_{k−N}, …, u_{k−1}, the measurements y_{k−N}, …, y_{k−1}, and the a priori estimate x̄_{k−N}, minimize the cost (30b) with respect to x̂_{k−N}. Let x̂*_{k−N} refer to the unique MHE estimate of (30), which yields the state estimate x̂_k and the estimation error e_k. In the following, we will show that, if the weight matrix P satisfies a simple LMI, then the estimation error is guaranteed to be input-to-state stable (ISS) with respect to the process and measurement disturbances.
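The structure of problem P′ can be sketched in a few lines of Python. The system matrices, observer gain, horizon, and quadratic stage cost below are illustrative assumptions (not the article's example); the scheme minimizes the sum of output residuals plus the proximity term ‖x̂_{k−N} − x̄_{k−N}‖²_P over the single decision variable x̂_{k−N}, and propagates the a priori estimate with a Luenberger update:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative 2-state system (not the article's example).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.4]])       # observer gain, A - L C Schur stable
P = np.eye(2)                      # quadratic prior weight
N = 5                              # horizon length

def pmhe_step(x_bar, ys):
    """Solve problem P': minimize over x0 the output residuals plus
    the proximity term ||x0 - x_bar||_P^2; ys holds y_{k-N}, ..., y_{k-1}."""
    def cost(x0):
        J = (x0 - x_bar) @ P @ (x0 - x_bar)
        x = x0
        for y in ys:
            r = y - C @ x
            J += r @ r                  # quadratic stage cost
            x = A @ x                   # forward prediction (no input here)
        return J
    return minimize(cost, x_bar).x

# Simulate the system and run the estimator.
rng = np.random.default_rng(1)
x_true = np.array([1.0, -0.5])
xs, ys = [], []
for k in range(60):
    ys.append(C @ x_true + 0.01 * rng.standard_normal(1))
    xs.append(x_true)
    x_true = A @ x_true

x_bar = np.zeros(2)                     # a priori estimate at the window start
for k in range(N, 60):
    window = ys[k - N:k]
    x_hat = pmhe_step(x_bar, window)    # estimate of x_{k-N}
    # Luenberger update of the a priori estimate for the next window:
    x_bar = A @ x_hat + L @ (window[0] - C @ x_hat)

print(np.linalg.norm(x_hat - xs[60 - N - 1]))  # small final estimation error
```

In this unconstrained quadratic case the problem could equally be solved in closed form; a general-purpose solver is used here to mirror the formulation with arbitrary convex stage costs.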

Assumption 3.
Suppose that the MHE problem (30) satisfies the following assumptions.

A1:
The stage cost r: R^p → R in (30b) is convex, nonnegative, and achieves its minimum at zero. Moreover, it is differentiable and its gradient is globally Lipschitz continuous with parameter L_r, that is, ‖∇r(a) − ∇r(b)‖ ≤ L_r ‖a − b‖ for all a, b ∈ R^p. A2: The nonlinearity φ is globally Lipschitz continuous with parameter L_φ, that is, ‖φ(a) − φ(b)‖ ≤ L_φ ‖a − b‖ for all a, b ∈ R^p. A3: In the a priori estimate (30a), L is designed such that all eigenvalues of A − LC lie strictly within the unit circle.
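As a concrete instance of A1, the quadratic stage cost r(x) = R‖x‖² used later in Example 2 has gradient ∇r(x) = 2Rx, which is globally Lipschitz with L_r = 2R; a quick numerical check:

```python
import numpy as np

# For the quadratic stage cost r(x) = R ||x||^2 (as in Example 2 below),
# the gradient is grad r(x) = 2 R x, globally Lipschitz with L_r = 2 R.
R = 0.01
grad_r = lambda x: 2.0 * R * x
L_r = 2.0 * R

rng = np.random.default_rng(2)
for _ in range(100):
    a, b = rng.standard_normal(2), rng.standard_normal(2)
    assert np.linalg.norm(grad_r(a) - grad_r(b)) <= L_r * np.linalg.norm(a - b) + 1e-12
```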
Note that the subsequent analysis requires the additional Lipschitz continuity assumptions on the gradient of the stage cost r and on the nonlinearity φ, which are given in A1 and A2, respectively. As in the previous case, we first present the following auxiliary result.

Lemma 2. Consider the MHE scheme (30) for system (29) with the constraints (3) at the time instant k and let A1 of Assumption 3 hold true. Then, the inequality (35) holds, where x_{k−N} denotes the state of (29), x̄_{k−N} is the a priori estimate, and Δφ_j := φ(ỹ_j) − φ(y_j).
Proof. Let us rewrite problem (30) in terms of the decision variable x̂_{k−N}.
We reformulate the estimation problem (30b), (30c), and (30d) in a condensed form with the constraints collected in (36a). The remainder of the proof consists of two parts. First, we show that, given the operator p: R^n → R^n defined in (37), which maps an a priori estimate z to the corresponding minimizer, we can derive for a given time instant k a nonexpansiveness property of p as developed for proximal point algorithms. 25,35 Second, we use this property to obtain the result of the lemma. Part 1. Since the overall cost function is strongly convex, the minimizer p(z) is unique for each z.
We derive the Karush-Kuhn-Tucker (KKT) optimality conditions corresponding to problem (37) at the minimum p(z) for an a priori estimate z. We obtain the conditions (38), where μ ∈ R^{(N+1)q_x} denotes the associated Lagrange multiplier. For an a priori estimate z′, we can write down the corresponding KKT conditions at the minimum p(z′) with the Lagrange multiplier μ′. Subtracting the gradient condition (38a) at p(z′) from the condition at p(z) and then multiplying from the left by (p(z) − p(z′))^⊤ yields (39). Each ith term of the sum in the latter equality can be expressed in terms of the output residuals associated with p(z) and p(z′), as in (40). Since r is convex, its gradient ∇r is monotone, and hence (40) is nonnegative. We therefore obtain in view of (39) the inequality (41). Given the complementary slackness condition (38b), the second term on the right-hand side of (41) can be written as in (42), where the fact that the latter expression is nonnegative holds as a result of (38c). Hence, (41) becomes (43). Since P^{1/2} is also symmetric positive definite, we obtain in view of the Cauchy-Schwarz inequality the bound (44). By taking the square of (44), we get (45). If z ≠ z′, then p(z) ≠ p(z′) and we can divide by ‖p(z) − p(z′)‖²_P to get the following nonexpansiveness property of the operator p defined in (37):
‖p(z) − p(z′)‖_P ≤ ‖z − z′‖_P. (46)
Otherwise, p(z) = p(z′) and we would get 0 ≤ 0 in (45). Part 2. The idea is to choose z such that p(z) = x_{k−N} in order to employ the nonexpansiveness property (46) for the subsequent Lyapunov analysis. Substituting the true state x_{k−N} = p(z) of system (29) into (38a) yields (47). This holds since the true state x_{k−N} strictly satisfies the constraints (36a), which yields in view of the complementary slackness condition (38b) that the associated Lagrange multiplier equals zero. The true measurements in (29) are given by y_j = ỹ_j + v_j. Substituting the latter expression into (47) and solving for z, which is possible since P is invertible, yields (49). Hence, evaluating the property (46) at z computed in (49) with p(z) = x_{k−N}, and at z′ = x̄_{k−N} with p(z′) = x̂*_{k−N}, yields the desired inequality (35).
▪ If we set the disturbances to zero, the result (35) can be regarded as a special case of (10) with D_ψ(x, x̄) = ‖x − x̄‖²_P. However, observe that the corresponding proofs are quite different. This is due to the fact that, in contrast to the proof of Lemma 1, the presence of disturbances in system (29) does not allow us to take advantage of the central property of Bregman distances introduced in Lemma 3 in Appendix A1. More precisely, the proof of Lemma 1 relies on the crucial step that the cost function evaluated at the true state achieves its minimal value of zero, which does not hold in the perturbed case. We establish in the following theorem that the estimation error is ISS with respect to the disturbances. Theorem 2. Consider the proposed MHE scheme (30) for system (29) with constraints (3) such that Assumption 3 holds true. If we choose the weight matrix P ∈ S^n_{++} in (30b) such that it satisfies the following LMI
(A − LC)^⊤ P (A − LC) − P ⪯ −Q (50)
for some positive-definite matrix Q ∈ S^n_{++}, then the corresponding estimation error dynamics are ISS with respect to the process and measurement disturbances.
Proof. Let p̄ := ‖P‖, a := ‖A‖, ã := ‖A − LC‖, c := ‖C‖, l := ‖L‖, g := ‖G‖, p := λ_min(P), and q := λ_min(Q). Following the lines of the proof of Theorem 1, we choose the employed Bregman distance as a candidate Lyapunov function, that is, V(e_{k−N}) = ‖e_{k−N}‖²_P. The conditions under which V constitutes an ISS-Lyapunov function are given in Definition 1 in Appendix B1. Note that condition (B3a) is already fulfilled. Moreover, we have in view of (35) an upper bound on ‖e*_k‖_P in terms of ‖ē_k‖_P and the disturbances. Substituting the state of system (29) and the a priori estimate (30a) into this bound, we get an inequality in which d_{k−N−1}, defined in (54), accounts for the noise terms. Exploiting the LMI in (50), we bound the first term on its right-hand side. By using Young's inequality with an arbitrary s ∈ R_{++}, we bound the second term, and the third term is bounded directly. Hence, we finally obtain the desired decrease condition, where c₁ := q − s and c₂ := p̄(1 + ã/s) are strictly positive if we choose s such that 0 < s < q. Thus, condition (B3b) is fulfilled and V is an ISS-Lyapunov function for the estimation error with respect to the disturbances in d_{k−N−1} defined in (54). For the sake of completeness, we also compute the ISS gains; the computations can be found in Appendix C1. In particular, we show that there exist λ_e ∈ (0, 1) and c_e, c_w, c_v ∈ R_{++} such that, for all e_0 ∈ R^n and any k,
‖e_k‖ ≤ c_e λ_e^k ‖e_0‖ + c_w sup_j ‖w_j‖ + c_v sup_j ‖v_j‖. (58)
▪ Remark 2. Note that an ISS result with matrices A and B depending on the output might be difficult to achieve. This is due to the fact that, in the estimation problem, the measurement noise would also enter the system matrix A. More specifically, in the Lyapunov analysis, we would not be able to separate the disturbances from the estimation error in order to arrive at the desired ISS property of the form (58).

SIMULATION EXAMPLES
We illustrate the stability results obtained in the previous two sections by means of three simulation examples. More precisely, we demonstrate the global exponential stability established in Theorem 1 in the first example and the input-to-state stability established in Theorem 2 in the second and third examples.

Example 1
In this section, we consider an example of nonlinear systems in which states are multiplied by a (potentially nonlinear) function of states that can be measured. By reformulating these nonlinearities in terms of the outputs, we can obtain a system of the form (2). For the sake of illustration, we consider a simple nonlinear system (61) of the form (1), which can be transformed into a system (62) of the form (2). Observe that the state x_k obtained from this transformation is equivalent to the state z_k of system (61), that is, the evolutions of both systems are identical. Also note that the new system can be viewed as a discrete-time linear time-varying system in which A(y_k) varies at each time instant k based on a given output trajectory. We estimate the state x_k of system (62) by solving the MHE problem (5), where the weight matrix satisfies the LMIs (22) with P = P₁ = P₂. We compare the results of the MHE scheme (5) with those obtained by employing a Luenberger observer. For a fair comparison, we design the Luenberger observer with the same L matrices as in (5a), which gives the corresponding observer dynamics. We set the initial condition z_0 = x_0 = [0.5 1.5]^⊤ and choose the initial guess x̂_0 = [0 1.9]^⊤ for both estimators. The state estimation results are presented in Figure 1. The resulting estimation errors for both estimators are shown in Figure 2. We can see that the proximity MHE scheme (5) reconstructs the states of the nonlinear system (61) and exhibits a good estimation performance. In fact, its performance is superior to that of the Luenberger observer even with a small horizon length of 2. This observation is to be expected, since the proposed MHE strategy employs the Luenberger observer as an a priori estimate and performs an additional optimization based on the available measurements; that is, the MHE scheme should perform at least as well as the Luenberger observer designed accordingly.

Example 2
We consider a discrete-time nonlinear system (66) taken from Reference 28, which can be transformed via the diffeomorphism x = T(z) into a system in the nonlinear observer form given by (29) with zero disturbances and initial condition x_0 = T(z_0) = [1, 0.5, 0.3, 0.4]^⊤. Note that the resulting pair (A, C) is observable. We design for the new system the proposed proximity MHE scheme (30) with horizon length N = 3 and stage cost r(x) = R‖x‖² with R = 0.01. We choose the weight matrix P in (30b) such that the LMI (50) is satisfied for the designed observer gain L. In order to illustrate the established ISS results, we assume that the new system is subject to process and measurement disturbances, which are chosen as independent, zero-mean, Gaussian random variables with variances 0.01 · [1, 1, 1, 1]^⊤ and 0.1 · [1, 1]^⊤, respectively. As in the previous example, we compare the results of the proximal MHE scheme (30) with a Luenberger observer designed with the same observer gain L and the same initial guess. The estimation results are depicted in Figures 3 and 4. We can see that, in the presence of process and measurement disturbances, the proximal MHE approach yields ISS estimation errors. Moreover, the estimated states converge to a neighborhood of the true state of system (29), (66). This is also the case for the Luenberger observer. However, observe that the proximal MHE scheme with a quadratic stage cost exhibits a better estimation performance and is more robust in the presence of the Gaussian noise.

Example 3
In this section, we focus on an example of nonlinear systems in which every nonlinearity can be expressed in terms of the output. This yields a system with a constant matrix A, where all nonlinear terms are stacked into the nonlinearity φ(y_k).
More specifically, we consider the example of a flexible joint robot link where the joint flexibility is modeled by a stiffening torsional spring. 36 The associated continuous-time nonlinear model 36,37 is given by (67), where the states denote the motor and link positions and velocities, respectively, and the torsional spring torque is computed as in (68), with both spring constants equal to 1 due to the stiffening spring. The parameters of the model are presented in Table 1. The input is given by u(t) = sin(t) and we assume that the motor and link positions are measured, that is, y = [y₁ y₂]^⊤ contains the motor and link positions. We define the state vector x by stacking the motor and link positions and velocities and transform system (67) into the form (69). We discretize system (69) with a sampling time h and design the proximal MHE schemes with a quadratic stage cost and with an ℓ1 stage cost, and consider two noise scenarios: (a) measurement outliers and (b) Gaussian measurement noise with an additional outlier that occurs at t₂ = 0.5 second. The resulting estimation errors are depicted in Figure 5 for scenario (a) and Figure 6 for scenario (b), where t = kh. Figure 5 shows that both proximal MHE schemes (denoted by pMHE) exhibit a much better performance than the Luenberger observer. Nevertheless, we can see that the MHE scheme with a quadratic stage cost is affected by the outliers and shows rather large estimation errors. In contrast to this behavior, observe that the estimator with the ℓ1-norm as a stage cost is insensitive to the outliers and that the corresponding estimation errors are GES. The results of scenario (b) in Figure 6 confirm this observation and highlight the performance of both estimators under different noise characteristics. More precisely, in the presence of Gaussian noise, the MHE scheme with the ℓ2-norm performs in this example much better than the MHE scheme with the ℓ1-norm. However, when the outlier takes place, the estimation errors of the quadratic estimator become larger than those of the pMHE with the ℓ1-norm. These observations are validated by the computed root-mean-square errors (RMSEs) associated with each proximal MHE scheme, where T_sim = 100 denotes the simulation time for system (69).
These are reported in Tables 2 and 3 for scenarios (a) and (b), respectively, along with the performance of the MHE schemes with other values of the horizon length N.
In scenario (a), since the MHE scheme with the ℓ1-norm is robust against outliers and exhibits GES of the estimation errors, its performance improves with an increasing value of the horizon length N, as depicted in Table 2. In fact, for N = 30, its RMSE is much smaller than the one corresponding to the MHE scheme with the ℓ2-norm. In scenario (b), we can see in Table 3 that both schemes achieve a better performance with a larger horizon length. Moreover, their performances are more comparable, which can be explained by the fact that, even though the ℓ1-norm is more robust against the presence of the outlier, the ℓ2-norm is more suitable when white noise is present. Table 4 reports the required computational effort of the proximal MHE schemes in comparison with the Luenberger observer. Obviously, the least computationally involved scheme is the Luenberger observer. Since the pMHE problem with the ℓ2-norm constitutes a quadratic program, its solution requires less time than that of the pMHE scheme with the ℓ1-norm. Note that both optimization problems are solved with the same nonlinear programming solver of MATLAB.
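The differing outlier sensitivity of the ℓ2 and ℓ1 stage costs reported above can be reproduced on a toy problem (an illustrative location-estimation example, unrelated to the robot model): estimating a constant from measurements containing one outlier.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Measurements of a constant signal (true value 1.0) with one large outlier.
y = np.array([1.02, 0.98, 1.01, 0.99, 1.00, 8.0])

l2 = minimize_scalar(lambda x: np.sum((y - x) ** 2)).x   # mean: pulled by the outlier
l1 = minimize_scalar(lambda x: np.sum(np.abs(y - x))).x  # median: insensitive

print(l2)  # about 2.17 (biased toward the outlier)
print(l1)  # about 1.0
```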
For scenario (a), we also compare the estimation results with those provided by the extended Kalman filter (EKF). The EKF estimates the state of the nonlinear system obtained by discretizing system (67) using the Euler method with a sampling time of h = 0.01. Note that tuning the process and measurement noise matrices Q_ekf and R_ekf is by no means trivial. More specifically, R_ekf has to be chosen large enough that the influence of the outliers is attenuated, but not so large that the information provided by the measurements is neglected. After tuning, we obtain the results depicted in Figure 7, which show that the EKF is highly affected by the presence of outliers and that both MHE schemes exhibit a much better performance.
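For reference, one predict/update cycle of a discrete-time EKF can be sketched as follows; this is a generic implementation with numerically approximated Jacobians, since the article's discretized model and its tuned matrices Q_ekf and R_ekf are not reproduced here.

```python
import numpy as np

def ekf_step(x, P, y, f, h_meas, Q, R, eps=1e-6):
    """One predict/update cycle of a discrete-time EKF.
    f: state transition x_{k+1} = f(x_k); h_meas: measurement map y_k = h(x_k).
    Jacobians are approximated by central differences (generic sketch)."""
    def jac(fun, z):
        J = np.zeros((fun(z).size, z.size))
        for i in range(z.size):
            dz = np.zeros_like(z)
            dz[i] = eps
            J[:, i] = (fun(z + dz) - fun(z - dz)) / (2 * eps)
        return J

    # predict: propagate estimate and covariance through the model
    F = jac(f, x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q

    # update: correct with the innovation y - h(x_pred)
    H = jac(h_meas, x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h_meas(x_pred))
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new
```

A large R de-weights the innovation term (attenuating outliers), which is exactly the tuning trade-off described above.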
It is important to mention that, if we treat the weight matrix P as a free tuning parameter and choose it, for instance, as the identity matrix, the estimation error diverges in this example. In fact, this choice of P does not satisfy the LMI (50). This demonstrates that, even though the proximal MHE scheme is based on a stabilizing Luenberger observer, the weights have to be chosen accordingly, that is, such that the LMI (22) is fulfilled in the generalized Bregman case or the LMI (50) is satisfied in the quadratic case, in order to guarantee stability.
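Since the LMIs (22) and (50) are not reproduced in this excerpt, the following sketch only illustrates the general idea of verifying a candidate weight P numerically against a discrete-time Lyapunov-type decrease condition for the observer error dynamics; the system matrices and observer gain below are hypothetical examples, not taken from the article.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical example system and observer gain (assumed stabilizing).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5],
              [1.0]])

A_L = A - L @ C  # observer error dynamics matrix

def lyapunov_decrease(P):
    """True iff (A - L C)^T P (A - L C) - P is negative definite,
    i.e., V(e) = e^T P e decreases along the error dynamics."""
    M = A_L.T @ P @ A_L - P
    return bool(np.all(np.linalg.eigvalsh(M) < 0))

# Solving the discrete Lyapunov equation A_L^T P A_L - P = -I yields a
# valid weight, whereas the naive choice P = I fails the condition here.
P_valid = solve_discrete_lyapunov(A_L.T, np.eye(2))
print(lyapunov_decrease(P_valid))   # True
print(lyapunov_decrease(np.eye(2))) # False
```

This mirrors the observation above: a stabilizing observer alone is not enough; the weight must be checked against the relevant matrix inequality.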

CONCLUSION
In this article, we presented a proximity-based MHE formulation for a class of discrete-time nonlinear systems that can be transformed into systems that are affine in the unmeasured state. The convex cost function of the estimation problem consists of two terms. The first term specifies the performance of the estimator and comprises the sum of convex stage costs, which take the recent online measurements into account. The second term is designed to ensure stability and consists of Bregman distances that penalize the deviation from a stabilizing a priori estimate. As a first main result, we showed the global exponential stability of the resulting estimation error. For a special case that includes the class of linear systems with additive disturbances, we established as a second main result the input-to-state stability of the underlying estimation error with respect to process and measurement disturbances under rather mild assumptions. Future research can focus on deriving weaker conditions for stability and on modifying the design of the estimator accordingly. While the imposed system class may seem rather restrictive, it opens the way to employing an EKF as the a priori estimate instead of the Luenberger observer by replacing the output-dependent matrices in (2) with time-varying ones. This might have the potential benefit of enhancing the local stability and performance properties of the widely used EKF.

ORCID
Meriem Gharbi https://orcid.org/0000-0002-0466-5101

Definition 1. System (B1) is ISS with respect to w_k if there exist a KL-function β and a K-function γ such that, for each bounded input sequence w and any k ∈ N,

‖x_k‖ ≤ β(‖x_0‖, k) + γ(sup_{0≤j≤k} ‖w_j‖),

where x_k refers to the solution of (B1) at time instant k with initial condition x_0.
In particular, the state of an ISS system decays for a decaying input, and the origin x = 0 is globally asymptotically stable whenever w_k = 0 for all k ∈ N.

Definition 2. A continuous function V : R^n → R is an ISS-Lyapunov function for (B1) if there exist K∞-functions α_1, α_2, α_3 and a K-function σ such that

α_1(‖x‖) ≤ V(x) ≤ α_2(‖x‖)

and

V(x_{k+1}) − V(x_k) ≤ −α_3(‖x_k‖) + σ(‖w_k‖)

for any k and any inputs w.
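As a simple illustration of the two definitions (not taken from the article), consider the scalar system x_{k+1} = a x_k + w_k with |a| < 1; then V(x) = |x| is an ISS-Lyapunov function:

```latex
% Scalar system x_{k+1} = a x_k + w_k with |a| < 1; candidate V(x) = |x|.
% Bounds: with \alpha_1(s) = \alpha_2(s) = s we have
%   \alpha_1(|x|) \le V(x) \le \alpha_2(|x|).
% Decrease condition:
V(x_{k+1}) = |a x_k + w_k|
           \le |a|\,|x_k| + |w_k|
           = V(x_k) - \underbrace{(1 - |a|)\,|x_k|}_{\alpha_3(|x_k|)}
                    + \underbrace{|w_k|}_{\sigma(|w_k|)}.
```

Hence the conditions of Definition 2 hold with α_3(s) = (1 − |a|)s and σ(s) = s, and the standard ISS-Lyapunov argument yields the bound of Definition 1, so the system is ISS.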