A Critical Comparison on Attitude Estimation: From Gaussian Approximate Filters to Coordinate-free Dual Optimal Control

This paper conveys attitude and rate estimation without rate sensors by performing a critical comparison, validated by extensive simulations. The two dominant approaches to facilitate attitude estimation are based on stochastic and set-membership reasoning. The first one mostly utilizes the commonly known Gaussian-approximate filters, namely the EKF and UKF. Although more conservative, the latter seems to be more promising as it considers the inherent geometric characteristics of the underline compact state space and accounts -- from first principles -- for large model errors. We address the set-theoretic approach from a control point of view, and we show that it can overcome reported deficiencies of the Bayesian architectures related to this problem, leading to coordinate-free optimal filters. Lastly, as an example, we derive a modified predictive filter on the tangent bundle of the special orthogonal group $\mathbb{TSO}(3)$.


Introduction
Attitude and rate estimation is an important aspect of aerial robotics. Throughout the decades, it has proven very accurate and versatile in applications from the first Low Earth Orbit (LEO) satellites [47] to Unmanned Aerial Vehicles (UAVs) [8] and from the Unmanned Aerial Systems [25] to recent Aerial Robotic Workers [1]. At the same time, technological and technical advances allow for increased specifications of autonomy in conjunction with precise and agile maneuvering. Consequently, position and orientation (attitude) control constitutes a field of research that is vital component of aerial robotics. In many cases, the model can be decoupled and attitude control can be implemented independently from position control [8]. Lately, more focus has been given to attitude controllers due to the increased difficulty and complexity of the specific control problem [2]; the success of these controllers relies upon the accurate knowledge of the real orientation and the angular rate of the aerial robot. Thus, it is imperative to develop efficient attitude filters, to deal not only with the measurement noise but also with the model errors.
When a-priori statistical information is available, such uncertainties are represented by utilization of the stochastic framework. Subsequently, model errors and measurement noise are then expressed as stochastic inputs to provide a Alternatively, the Unscented Kalman Filter (UKF) has the advantage of handling nonlinearities through the Unscented Transform (UT) more efficiently compared to the EKF. This makes reasonable the choice for using it in conjunction with the Euler angles coordinate system and the shifted frame of reference method of [19]. An attempt towards this direction can be found in [14]. On the other hand, when the quaternion representation is used, the UKF in a standard format cannot be implemented straightforwardly. The reason is again the quaternion's unit constraint. There is no guarantee that the predicted quaternion mean of the UKF will satisfy this constraint and express an orientation. In [16], the authors tackle this obstacle by the use of the Generalized Rodrigues Parameters (GRP) [58] to represent an attitude-error quaternion. Lastly, a comparison between the EKF and the UKF under the quaternion representation can be found in [43]. The conclusion is that the UKF shows better performance compared with the EKF, when the kurtosis and the higher order moments in the state error distributions are significant. A compelling discussion on the application of the mentioned Kalman-based filters for gyro-less attitude and rate estimation can be found in [26]. The EKF and UKF are local methods and are characterized by relatively small computational complexity. However, they are strictly suboptimal and, thus, they at most constitute efficient heuristics, but without explicit theoretical guarantees [39].
An attempt to set the state estimation problem within the dual optimal control framework [36] was made in [52]. This method determines the corrections added to the assumed model, such that the model and corrections yield an accurate representation of the system's behavior. The model uncertainty is considered as an unknown but deterministic signal within a Hilbert space. The goal is to estimate the states for the resulting measurements to approximate the measured observations, while keeping the considered model as valid as possible. This is done by minimizing the total norm of the augmented measurement-model uncertainty vector. The optimization problem incorporates a covariance constraint in order to ensure that the state estimates remain statistically consistent. However, the above filter is based on a twopoint boundary condition problem and is, essentially, an offline optimal state estimator. In [54], the modal trajectory estimator is derived. This approach is based entirely on the Hamiltonian formulation of optimal control and results in a recursive filter.
The importance of the dual optimal control formulation for the problem of attitude and rate estimation stems from the nature of orientation itself. Euler's theorem [18] indicates that the set of orientations is the special orthogonal group SO(3), which is a compact Lie group associated with the Lie algebra so(3) of the 3 × 3 skew symmetric matrices. A Lie group is a differentiable manifold equipped with the algebraic structure of a group [22,60]. Therefore, instead of relying on the prefabricated Bayesian architectures, the problem can be directly set and solved in a coordinatefree fashion as a dual optimal control problem by applying tools from differential geometry. The approach of [54], commonly known as minimum energy filtering, was utilised in [56] where the second-order-optimal minimum energy filter on Lie groups was derived.
Conclusively, we observe that within the -Bayesian framework-the success of a gyroless attitude estimation scheme depends primarily on the chosen coordinate system. Essentially, there is an incompatibility between the Bayesian architectures and the space of orientation. This incompatibility is justified by the fact that the Gaussian approximate filters are primarily built to approximate the conditional mean, rather than comply with the geometric characteristics of the underlying state-space.
Furthermore, based on stochastic modelling, the Bayesian strategies assume second-order statistical knowledge for both the measurement noise and the model's uncertainty. Although aggregating second order statistics for the measurement noise is feasible through (offline) experimentation, for the case of the model error -usually referred to as "process noise"-the assumption that it is a symmetrically-distributed white noise process of known covariance has no theoretical basis. For physical systems, model uncertainty represents environmental phenomena; therefore, it is more reasonably expressed by smooth functions within a Hilbert space.
With regards to the dual optimal control formulation, deterministic filtering originates from set membership estimation, where the prior and the underlying uncertainties are expressed as assigned -from the modeler-sets. On the one hand, although intuitively the set-membership reasoning seems compatible with the compact nature of the space of orientations, it does not provide any accuracy about the belief degree. On the other hand, dual optimal control provides the machinery to formulate the estimation problem as a well-defined optimization problem [66].
In this paper, we consider sensors that measure only vector directions and we incorporate both the kinematic and the dynamic models for the attitude motion. The set of observations, are made w.r.t. the inertial frame and obtained from sensors that measure w.r.t. the body frame. The contribution of this work is the critical assessment of the reasons governing the superiority of deterministic modelling over stochastic, for the problem of orientation and rate estimation from vector measurements. Although many works study various attitude filters' performances in terms of attitude and rate error accuracy, none of them is motivated by the fact that deterministic modelling naturally leads to a coordinatefree problem formulation. To this extent, the present paper is motivated by the dual optimal control approach, that accounts directly both for the underlying state-space and the environmental phenomena affecting the existing system without ad-hoc simplification assumptions. To this direction, we also derive the modified predictive filter on TSO(3). Extensive simulations are used to compare the second-order-optimal minimum energy filter (MEF) [56] and predictive filter (PF) performance versus the EKF and UKF. Both the analysis and the simulations' results conclusively indicate that coordinate-free deterministic filtering tackles the vices of the stochastic approach.
Bayesian formulation of attitude estimation and Gaussian approximate filters (Section 2). After establishing a coordinate system map for the orientation we end up with a system of nonlinear differential equations. We may then attack the problem under stochastic reasoning and in particular through the Bayesian approach. We present the Bayesian formulation of attitude estimation and employ the Kushner equation [41] for analysis. Set membership state estimation and dual optimal control formulation (Section 3). A more conservative point of view considers set-membership reasoning for the problem at hand. We show that set-membership estimation approach naturally leads to a control formulation of estimation, which is optimally implemented by the minimum energy filter. In the same section, we derive the modified predictive filter on TSO(3) by proposing a new error function. Algorithms and Numerical Implementation (Section 4). We provide the algorithm summaries for each filter providing exceptional care to the numerical integration. On the one hand, explicit integration methods add, artificially, energy into the system. On the other hand, implicit integration schemes operate as (artificial) dampers removing energy from the system. To this direction, we utilize the Lie group symplectic integration, which essentially is produced under the machinery of the variational principle of mechanics by discretizing directly the cost functional [50]. Simulation Results (Section 5). We present the results for two case studies regarding UAVs and two for satellite attitude filtering. In particular, we demonstrate how the filters operate under the presence of process noise and significant deterministic model errors. Simulation Results (Section 6). The paper concludes with remarks that are drawn based on the obtained results. Notation: The following notation is used throughout the paper: R is the set of real numbers. With rod : q → R we declare the Rodrigues formula which maps the quaternion q (or the principal rotation vector) to the Directional Cosine Matrix (DCM) R ∈ SO(3). The matrix expm(X) is the exponential of X ∈ R n×n . The map ( ) × :R 3 → so(3) is an isomorphism from the arrays in R 3 to the Lie algebra of the 3 × 3 skew symmetric matrices so(3). The Euclidean norm is denoted by || ||. ∇V is the gradient of the real-valued function V : R n → R and ∂ X (f ) denotes the partial derivative of f w.r.t. X. Further, X, Y : R n × R n → R denotes the inner product ∀ X, Y ∈ R n . Lastly, the estimate of X is denoted by X, while the optimal estimate of X by X 2 Bayesian formulation of attitude estimation and Gaussian approximate filters Let (Ω, F, P) be the filtered probability space and the filtration F t with respect to which all processes will be adapted. After establishing a coordinate system map, we consider the following processes of interest: where the state process X = X t = [X ⊤ 1,t X ⊤ 2,t ] ⊤ , t ≥ 0 is defined to be the solution of the stochastic differential equation (1) and equation (2) defines the observation process Y = Y t , t ≥ 0 . Furthermore, W ∈ R n 2 and V ∈ R q express environmental effects and the measurement noise respectively and are assumed to be independent Brownian motions. The coefficients f = [f ⊤ 1 f ⊤ 2 ] ⊤ : R n 1 +n 2 → R n 1 +n 2 and h : R n 1 → R q are assumed to be Lipschitz continuous mappings. Lastly, the control u ∈ U ⊆ R n 2 is considered as known input torques and G 1 2 is the square root of G ∈ R n 2 ×n 2 . The coefficients f 1 and f 2 express the kinematics and the dynamics of the physical motion respectively, as indicated by the Euler's equations of motion [57]. The process X comprises the orientation and angular rate respectively, where n 1 depends on the chosen coordinate system map.
We denote by F Y t the σ-algebra generated by {Y τ , 0 ≤ τ ≤ t}. The optimal estimate X * t is then given as the solution to the following optimization problem: where ρ(x|F Y t ) is the conditional probability density of the state, given the noisy measurements up to and including time t. Therefore, knowledge of the posterior density for each t, constitutes the complete solution of the problem (3). For C(x, x) = ||x − x|| 2 2 , the optimal Mean Square Error (MSE) estimate is given by In order to derive a differential equation for the optimal MSE estimate, we can differentiate (4) w.r.t. time. By utilizing the generalized Leibniz rule [21] we obtain: Furthermore, the posterior density ρ = ρ(x|F Y t ) evolves according to the Kushner equation [41]: Thus, the optimal nonlinear filter is given by: with k = 1, ..., n 1 + n 2 and V k,s satisfying the stochastic differential equation: Nevertheless, actual evaluations of the terms E{f k |F Y s }, E{h k |F Y s } are possible only in case where the system is linear and the noise distributions are Gaussian, resulting in the well known Kalman filter [37]. In the nonlinear case, both terms require knowledge of the entire posterior density, yielding an infinite dimensional filter [51]. The EKF and the UKF are proposed to tackle this issue. The EKF applies the Kalman filter framework to nonlinear systems, by first linearizing the system model using a first-order truncated Taylor series expansion around the current estimates [65]. This linearization step affects the accuracy of the posterior predictions and often leads to divergence of the filter [65]. On the contrary, the UKF [35] makes explicit use of the scaled Unscented Transformation (UT) (stochastic linearization) [34] which is based on the idea that, it is preferable to approximate a probability distribution instead of an arbitrary nonlinear function. Since both filters utilize only the Gaussian parameters (first and second statistical moments), these methods belong to a broader class entitled as Gaussian approximate filters.
Implementing both the EKF and UKF with the quaternion representation yields bilinear kinematic equations. However, there is no guarantee that the quaternion mean of the EKF and UKF will satisfy the unit-norm constraint due to the addition operator in the correction step of the filters. To overcome this issue, in this work we use the Euclidean normalization approach [4]. Although this normalization step leads to meaningful results, it is an external intervention on both the EKF and UKF algorithms and affects the unbiasedness of both the quaternion and rate estimates. This phenomenon is discussed and analysed further in the (Appendix 7.3).
3 Set membership state estimation and dual optimal control formulation Orientation belongs in a compact space, the special orthogonal group. An assumption that is valid in many applications is that the angular rate lies within a bounded space. The same can be inferred for both the model and measurement uncertainty. Thus, instead of modeling uncertainties utilizing stochastic reasoning, we use the more elementary concept of a set [62]. This section shows how deterministic filtering naturally recasts as a control problem adopting set-theoretic reasoning.
Consider the system described by the state-space model of the form: x Without loss of generality, t i ∈ I, where I is a partition of time. Thus, the measurement equation is given by where the functions f k , k = 1, 2, G and h are defined as in Section 2. Regarding (9), (10), the model and measurement uncertainties are considered as unknown and deterministic signals where δ ∈ D and ǫ ∈ E, with D ⊂ R n 2 , E ⊂ R 6 . The system's state x ∈ S × W, where S declares the space of orientation and W ⊂ R n 2 a properly chosen C-set [5]. Assuming complete lack of knowledge regarding the initial state estimate, we can write X 0 = S ×W. Set-membership state estimation repeats the following two steps [42]: The guess X i regarding the state x at time t i is projected forward in time, resulting the set which consists of all the states in S × W compatible with the measurement y i+1 for some ǫ i+1 ∈ E. The prediction and correction of the state are then given as: and respectively. Note that (13) and (14) correspond to the prediction and correction step in the optimal Bayesian update [51] respectively. However, in this case we can go a step further by defining the input pair where φ is the solution of (9) and δ [0,i] , u [0,i] declare the model error and input values respectively within [t 0 , t i ]. Then, the second step of the method is equivalently modified by defining the set of all input pairs which produce observations compatible with the measurements. In other words, the goal is to actually determine the set of different decisions δ [0,i] and the initial state X 0,i that produce -throughout the dynamicsthe received measurements for ǫ i ∈ E. Lastly, it is possible to ask for the pair and expect X 0,i ↓ X * 0 , i.e. X 0,i to be a decreasing sequence with limit the optimal estimate X * 0 . Nonetheless, the settheoretic algorithm in its general form accounts for some difficulties: The performance of the above method depends on the initial guess X 0 , as well as on our knowledge regarding the sets D and E. Subsequently, representing the sets X 0 , D, E, R D,u i and C E,y i+1 (D E,y i+1 ) in practical applications -at least approximately-by a finite set of parameters, is not a trivial task [27]. Finally, the method does not provide any accuracy about the belief degree regarding the state estimates.

Minimum energy filtering
The set-theoretic approach determines possible sequences of decisions. The minimum energy filter considers the sequence with the minimum norm which creates observations compatible with the obtained measurements, and constitutes one of the first implementations of this approach. It was first introduced by Mortensen [54], and consists of a method for deriving nonlinear estimators, based on the value function of the optimal estimation problem.
Consider the system described by (9). The signals δ(·), ǫ(·) and the initial condition x 0 are now modeled as arbitrary disturbances within a Hilbert space. Thus, consider the cost where δ [t 0 ,t] and ǫ [t 0 ,t] refer to the model and measurement error values within the interval [t 0 , t]. Furthermore, Φ : R n 2 → R + and Q : R 6 → R + are two quadratic forms that measure the instantaneous energy of the error signals.
In addition, S 0 : R n 1 +n 2 → R + is the initial cost encapsulating the a-priori knowledge regarding the state at time t 0 and is a function with a global minimum [56]. Since ǫ is deterministic, (17) can be written as Note that in order for the filter to track the actual measurements, the uncertainties δ(·) and S 0 (x 0 ) should be minimal; within the estimation context, minimizing t t 0 Φ(δ)dτ is essential rather than an additional requirement as it is posed in classic optimal control theory. The minimization of the uncertainty regarding the actual system is equivalent to the information gain. It is impossible to track the actual system or equivalently estimate the system's state without minimizing the uncertainty for the actual system. Therefore, the goal is to minimize the model uncertainty, while tracking the given measurements. This will yield an optimal minimum energy pair ( Thus, the following optimization problem min has to be solved for each t as new observations arrive online, since the optimal decisions δ * [t 0 ,t] are affected from the incoming information at each time instant t. At this point, we follow [55] where (19) is tackled by first assuming fixed x 0 and finding the optimal δ * [t 0 ,t] with the Hamiltonian formulation of optimal control providing the necessary conditions for optimality for δ * [t 0 ,t] [3]. The value function is defined as In order to completely solve the optimization problem of (19), the minimum of the value function w.r.t. the initial condition x 0 for each t must be considered. The necessary condition for optimality yields: since determining the optimal end point x * t -and given the optimal control decisions δ * [t 0 ,t] -fully specifies the optimal initial condition x 0 for each t, by running time backwards. Essentially, this equivalence allows us to express the value function w.r.t. the optimal estimate x * t and, therefore, to derive the minimum energy filter [54].

Predictive filter on TSO(3)
The predictive filter on TSO (3) is a deterministic filter that predicts the model error and drives the rate and attitude estimate towards the real state under the presence of significant model errors. The filter emerges from the continuoustime nonlinear controller of [12] along with the covariant constraint from [53]. A predictive quaternion attitude filter based on the nonlinear controller of [46] was derived in [15]. However, our derivation is based on a different cost function which leads to a faster transient response. Furthermore, the output Jacobians are determined intrinsically, directly on the TSO(3) as shown in the (Appendix 7.1).
Consider the state space model:Ṙ where R ∈ SO (3), Ω ∈ R 3 , α i : R + → R 3 . The vector ǫ represents the unknown measurement error with D being block diagonal, namely Given that the term y × i y i forms an error axis between the estimated output y i and the system's output y i , it is reasonable to target for the model error that minimises the predicted mean error axis formed by the two measurements. Based on this observation, the predictive filter on TSO(3) results from the following minimization problem: where the matrices Q ∈ R 3×3 and Σ ∈ R 3×3 penalise the prediction error and the correction term respectively. The value of the uncertainty term δ at time t influences the state (R, Ω) at a posterior instant of time t+h and, subsequently, the same is true for the output since the state-output relation is expressed via a memoryless system. The constrained optimization problem of (25) recasts into an unconstrained one by using the expansion [9,24]: where and Term L ξ f ( y i ), ξ = 1, 2 denotes the ξ-th order Lie derivative of y i w.r.t. the system. After substituting (26) in the cost, the necessary condition for optimality yields the optimal correction term: where B is a function of R, Ω, and h, given by: γ(t) is given by: and w k is given by: Lastly, by substituting the Lie derivative terms (Appendix 7.1), equation (27) results in The main advantage of this method is that the correction is performed only through the dynamics, while the kinematic equation remains isolated; consequently, it can be integrated geometrically. This was not the case in the Gaussian approximate filters, where the addition operator in the correction step violates the space's geometry. Furthermore, there is no need to initialise the filter with prior information.
Until now, the problem has been treated as a tracking problem of optimal control. However, the estimates should be statistically consistent. As can be seen from (29), by decreasing the model error penalty matrix Σ, the estimates are based more on the measurements, so the output estimates get closer to the noisy observations. Assuming white measurement noise, a limit must be set w.r.t. how much the estimated outputs should match the noisy observations. This is accomplished by choosing the model error penalty matrix Σ such that it approximately achieves the balance expressed by referred as the covariant constraint [53]. For our application, we estimate the output error covariance by where N is the total number of samples. To examine (34), we utilise the L 2,2 matrix norm. Since the measurement noise covariance matrix is of the form D = σI 6×6 , (34) is satisfied when: which after some calculations, yields

Algorithms and Numerical Implementation
An extensive simulation study is carried out to compare the performance of the second-order-optimal MEF and PF against the EKF and UKF. In this section, the model that is utilised in the simulations and the error functions used to assess the efficiency of the methods are presented. Also the algorithmic summaries are given for each of these four filters and some aspects relating to the numerical implementation are presented.
The EKF and UKF use the quaternion representation, whereas the MEF and PF are set directly on the special orthogonal group. Although many works study the performance of various attitude filters in terms of attitude and rate error accuracy, none of them does so by considering dynamics with significant model errors. Attitude and rate estimation from vector measurements should take into account environmental phenomena which affect the actual system. In [17], the 4-th order Runge-Kutta method is employed for simulation. Nevertheless, these methods do not preserve the continuous-time motion's essential features like kinetic energy and momentum. The main contributions to address these gaps are: Algorithm summaries for each of the aforementioned attitude filters, as well as a comprehensive simulation study that compares the selected stochastic attitude filters against the deterministic ones. The comparison considers measurement errors, initialization errors, and model errors that typically appear in attitude and angular rate filtering for UAVs and satellite missions.

Model and Error function
For expressing the orientation of the rigid body, we use the quaternion representation q ∈ S 3 and the matrix representation R ∈ SO(3). Then, the rigid body kinematics are given as:q Expressed in the body-fixed frame, we denote by I ∈ R 3×3 the inertia tensor, by Ω ∈ R 3 the angular rate of the rigid body and by T ∈ R 3 the applied torques. The angular rate Ω evolves according to Euler's equation [50]: up to model uncertainty δ(t) ∈ R 3 , with G ∈ R 3×3 . Two time varying directions α 1 (t) and α 2 (t) are measured on board, as y 1 (t) and y 2 (t) according to: where r ∈ SO(3) is the Directional Cosine Matrix (D.C.M.) parameterised w.r.t the unit quaternion q(t), and ǫ(t) is the measurement noise. We assume that the two sensors operate independently, so the matrix D is chosen block diagonal: The attitude estimation error is given by the following functions: and w.r.t. the quaternion and matrix representation, respectively. The angular rate estimation error is calculated as e Ω = Ω(t) − Ω(t) where both Ω(t) and Ω(t) are expressed w.r.t. the inertial frame.

Numerical implementation
In this section, discrete-time implementations of the continuous-time filters are presented via algorithm summaries. Discretization should be addressed carefully as the Lie group structure of the underlying state space; the motion's energy and momentum have to be preserved under any numerical calculation. Proper discretization of the continues-time differential equations requires Lie group variational (symplectic) integration [50]. The numerical integration of the kinematic equation is made by assuming a short-time step h. Since the attitude motion is instantaneously a rotation, the discrete orientation update is obtained using the exponential map as: w.r.t. the quaternion representation, and as in terms of R ∈ SO(3). The angular velocity update emerges by employing a Newton solver for where U is the control vector and C exp (X) = I 3×3 − In this work, the physical motion, the prediction step of the EKF, and state propagation of the sigma points in the UKF are made using the Lie group symplectic integration [6,7,40].
An important flaw related with the EKF's and UKF's implementation is the singularity of the state estimation error covariance matrix, when the orientation is expressed by the unit quaternions. The unit norm constraint results in the singularity of the latter covariance matrix [44]. Three solutions to this problem exist [44]. In this work, regarding the UKF's implementation, we utilise the approach that deletes one of the quaternion components in order to obtain a truncated state error covariance expression. Per contra, because the EKF consists of second order terms only, it does not compute an ill-conditioned covariance matrix. This claim is mathematically justified in [10].

Simulation Results
In this section, we describe a series of simulations for two distinct cases. We demonstrate attitude and rate estimation from vector measurements for UAVs and LEO satellites. For both case studies, the measurement noise and model uncertainty are initially modeled as white Gaussian noises. Subsequently, to stress the significance of the dual optimal control formulation, we replace the model error with an unknown deterministic disturbance that exerts on the existing system.

Simulation Cases
5.1.1 Case 1: Attitude and rate estimation for UAVs n this case, the measurement noise is Gaussian zero mean random process and it is set relatively large aiming to express poor sensor quality. In particular, matrix D is chosen so that the signals and the angular rate Ω 0 = [0.3 0.2 0.1] ⊤ rad/sec. The initial orientation matrix is obtained by using the Rodrigues formula R 0 = rod(q 0 ) [28]. The control torques are given by T (t) = [sin 2π 3 t − sin 2π 1 t cos 2π 5 t ] ⊤ . Lastly, we assume that the two reference vectors α i (t), i = 1, 2 are orthogonal for every t. Table 1    In this case, we consider smaller measurement noise levels. The input torques are also assumed of lower frequency and the inertia tensor is increased resulting a slow satellite's motion. The initial orientation deviates significantly from the identity since the spacecraft can be oriented arbitrarily around its center of mass. The initial angular rate is set smaller compared to the previous experimental study declaring the much slower motion of the satellite. The parameters of the system and the initialization parameters of the filters are summarised in Table 3 and Table 4, respectively.

Case 1: Attitude and rate estimation for a UAV
In Fig. 1, the three components (X, Y, Z) of the angular rate estimation error are shown for each case, respectively. In the case of the MEF (depicted in yellow), the estimation errors converge after a very brief transient response providing with the smallest steady state error value of all filters. In the case of the PF (depicted in grey), Σ = 0.3 · 10 −3 was selected, which satisfies the covariance constraint imposed by (34)-(37), with trace(M ) = 0.63. Note that the covariance constraint induces a trade-off regarding the PF's transient response, due to the fact that increased measurement noise levels require larger value for Σ; this results in slower transient response. As in the case of the MEF, the PF has also a short transient response but presents an oscillatory behavior in the steady state. This is attributed to the non-adaptive nature of the filter, and to the fact that measurement error in the optimal correction term δ * (t) is scaled by an almost constant matrix, as can be seen from (29). Regarding the stochastic filters, the EKF (depicted in red) seems to outperform the UKF (depicted in blue) as the former has a fast convergence, while the latter presents an oscillatory behavior. This is because the rate estimates depend on the orientation estimates, which are re-projected many times within the algorithm. Another reason for the UKF's noisy asymptotic performance is the ad-hoc fine-tuning in our experiments. Nevertheless, such an approach is necessary to balance efficiency and extreme computational burden. For the orientation error presented in Fig. 2, the MEF again shows its superiority by having the fastest transient response and smallest asymptotic error, while the predictive filter has a small angle bias due to the remaining angular velocity error. This is because the PF's orientation correction is made exclusively through the axis of rotation, and the kinematics remain isolated for geometric integration. However, the PF achieves the second-lowest estimation error with the lowest computational cost. The downside of the PF is that it needs precise tuning and many iterations in order for the estimates to be statistically consistent. Thus, on the one hand, the PF architecture avoids an additional re-projection step and an expensive implementation; on the other hand, it leads to a constant deviation of around 0.056 o due to the lack of additional kinematic correction. Worth noticing, however, is that the kinematics express Euler's theorem and thus cannot be considered uncertain. The EKF converges smoothly towards zero, whereas the UKF appears to have an additional peak. This is because the rotation axis has not been estimated well up to that time step (Fig. 1). At the same time, the EKF outperforms the UKF during the steady-state. The reason is that the latter employs a stochastic linearization including mainly addition operations to produce the prediction and correction state, thus requiring intermediate normalization steps. Another compelling observation is that for the PF, higher scaled gain can result in faster convergence at the cost of increasing the asymptotic estimation error. Fig. 3 depict the angular rate estimation in the case where a deterministic model error acts on the system dynamics. Both deterministic filters perform very well since they determine Figure 2: Attitude estimation error e q,R (t) (process noise-case 1) the necessary model error that drives the actual system and produces the obtained observations. In particular, by reducing the measurement noise, both the MEF's and the PF's rate estimation error converge fast to zero, and the same holds for the attitude estimation error (Fig. 4). For the PF however, the oscillatory behavior that has been observed previously, appears only within the transient state in this case.
On the other hand, the stochastic filters' rate is affected significantly as the model error frequencies are transferred in the angular velocity error. In terms of the orientation error (Fig. 4), the deterministic model uncertainty influences only the EKF's response, while the UKF's remains unaffected, due to the stochastic linearization process. The deterministic model error is canceled out in the predicted state covariance step of the algorithm (proof in Appendix 7.1). In particular, it is shown that there are sigmapoint distributions that block the model error influence in the orientation estimate; hence, the model error appears only in the angular velocity's correction step. Thus, the UT -for particular sigma point distributions-potentially recasts the angle estimates uncontrollable from the model error. However, this does not mean that the UKF estimates the orientation correctly since the estimated rotation axis deviates significantly from its nominal trajectory; the filter remains blind w.r.t. model errors and trusts more its angle estimates. Additionally, the fact that the corrected angular velocity is a linear combination of the model error and the scaled output error, preserves the frequencies of all three components which appear unaltered in the angular velocity estimation error (Fig. 3). The exact opposite is true in the EKF; both the rate and angle estimates are affected by the model uncertainty, as the Jacobian matrix (prediction update) is a function of the predicted rate estimate, and its components appear to all its entries. By utilizing the matrix inversion lemma for the output covariance matrix, it can be shown explicitly that the model error vector appears both in the upper and in the lower part of the gain matrix. rate. The MEF, once again, outperforms the rest of the filters by showcasing a similar behaviour as in the UAV case. Furthermore, the Gaussian approximate filters can be re-tuned to converge faster. In addition, the decreased measurement noise, allows us to opt Σ = 0.3 · 10 −4 , for the PF and thus achieving a fastest transient and an improved asymptotic error.

Case 2: Attitude and rate estimation for a satellite
Regarding the model error case we observe once again that the deterministic filters outperform the stochastic ones, since both the axis of rotation and angle estimates present low transient and asymptotic error. The EKF as well as the UKF transfer the model error unhurt in the rate error as it is imposed by their architecture. Once again the UKF's orientation estimates are not affected by the model uncertainty for the same structural reasons mentioned previously.

Conclusions
This work performed a critical assessment of the reasons governing the superiority of deterministic modelling over stochastic, for the problem of orientation and rate estimation from vector measurements. The distinction between the two approaches was emphasised and investigated, with the state space's geometry and characteristics being the main criterion. By the analysis and the results of extensive simulations, the deterministic approach was shown to overcome important deficiencies imposed by the Bayesian architectures, and to handle large model errors. As an example, the second-order-optimal Minimum Energy Filter (MEF) [56] was presented, and a modified predictive filter (PF) on the TSO(3) was derived. Both of these filters were compared versus the most commonly used representatives of the Gaussian Approximate Filters, the EKF and the UKF. Two different simulation cases were considered, for a UAV and for a satellite, respectively. The simulations revealed that the deterministic filters, and in particular the MEF, outperform the Gaussian approximate solutions especially in the realistic scenario, where a deterministic model error exerts on the actual plant. The reason is fundamental -from first principles-and originates in the set-theoretic approach for estimation, when seen as a dual optimal control problem. The stochastic filters require at least one re-projection step and are affected by model errors. In particular, we address that quaternion normalization leads to unbiasedness of the orientation and rate estimates. In addition, for certain sigma point distributions, the UKF's estimation angle is uncontrollable from model errors. While a more efficient implementation of the UKF for attitude estimation exists [13] where the stochastic linearization is performed by utilizing intrinsic gradient descent algorithms, it is not robust w.r.t. deterministic model errors and also requires one re-projection step.
Another remark is that both stochastic filters require the initial prior information in contrast with the deterministic ones. In practice this information may not be available. For example, satellite missions are placed in environments that are not fully known beforehand, which makes it impossible to obtain data in advance. Deterministic filters do not require any prior initialization, providing exceptional flexibility for the problem at hand. From the deterministic filters presented in this work, the predictive filter has to be tuned to provide statistically consistent results. However, this tuning is based on the measurement noise statistics, which can be determined offline by experimentation. Having the disadvantage of an almost fixed gain, the predictive filter is still to be investigated as future research under adapted gain scaling. The EKF and UKF have roughly the same accuracy. Thus, due to the computational overhead of the UKF, the simplicity of the Jacobian matrix calculations, and the quasi-linear nature of the quaternion kinematics the EKF is considered preferable compared to the UKF for the task. Per contra, both deterministic filters -and especially the MEF-perform better as they achieve lower errors in both cases. Finally, a further analysis for examining the filters' limitations is a potential future research objective. Such operating factors include eclipse conditions, event-triggered change of dynamics [29], co-linearity of measurements, extreme measurement noise, risk-averse events [38], etc.
In order to calculate the first term in the brackett, we consider a deviation δR from R with δR = exp( Ω × ) and Ω a tangent vector attached on the identity. Then The latter results using the Taylor expansion of the exponential matrix, and from the fact that with X = x × ∈ so(3) Finally, since ∂ Ω ( R ⊤ α k ) = 0 we obtain Furthermore, the second-order Lie derivative of (51) reads where the term ∂ R Ω × R ⊤ α k is computed using the product rule:

UKF with deterministic model error
Let us declare with u the nominal input to the filter. Then, we can writeũ = u + Iδ that is, the input torques corrupted by the model error δ as they applied to the model. The filter utilises the equations where the first n 1 rows ofG refer to the kinematics and therefore are zero. The predicted sigma-points are then calculated according to x δ k|k−1 = f (x k−1 ,ũ k−1 ) − GIδ k−1 = x k|k−1 − GIδ k−1 (61) and the predicted state according to By utilizing the scaled UT where λ = α 2 (L + k) − L [64]. From here we observe that for k = 0 and α = 1 Therefore, the predicted state covariance The same applies to the cross covariance matrix where the model error term is canceled out. Thus, the adaptive gain of the filter is not affected by the model error. The only step where the error applies is the correction step through the rate-part of x * k .

Bias due to quaternion re-projection
The last step for both the EKF and UKF is a re-projection of the corrected quaternion q k|k = q k|k−1 + K u ǫ k|k−1 (66) where K u = K [1:4,1:6] and ǫ k|k−1 = y k|k−1 − y k|k−1 . The normalised corrected quaternion is given by We are interested to examine the function d : R 4 → R with d(X) = ||X|| −1 in a neighborhood of q k|k ∈ S 3 ⊂ R 4 in the direction of K u ǫ k|k−1 .