Time‐limited balanced truncation within incremental four‐dimensional variational data assimilation

Four‐dimensional variational data assimilation (4D‐Var) is a data assimilation method often used in weather forecasting. Based on a numerical model and observations of a system, it predicts the system state beyond the last time of measurement. This requires the minimisation of a functional. At each step of the optimisation algorithm, a full nonlinear model evaluation and its adjoint is required. This quickly becomes very costly, especially in high dimensions. For this reason, a surrogate model is needed that approximates the full model well, but requires significantly less computational effort. In this paper, we propose time‐limited balanced truncation to build such a reduced‐order model. Our approach is able to deal with unstable system matrices. We demonstrate its performance in experiments and compare it with α‐bounded balanced truncation, which is another reduction approach for unstable systems.

derived [6]. The quantity of interest is the maximum a posteriori estimate $\mathbf{x}_0^* = \arg\min_{\mathbf{x}_0} J(\mathbf{x}_0)$, where the cost functional that needs to be minimised is given by
$$J(\mathbf{x}_0) = \tfrac{1}{2}\|\mathbf{x}_0 - \mathbf{x}_0^b\|^2_{\Gamma_{\mathrm{pr}}^{-1}} + \tfrac{1}{2}\sum_{k=1}^{K}\|\mathbf{y}_k - H(\mathbf{x}_k)\|^2_{\Gamma_k^{-1}}, \qquad (2)$$
subject to the forward model dynamics (1a) for $\mathbf{x}_k$, the state at time $k$, and $\|\mathbf{v}\|^2_{A} := \mathbf{v}^T A \mathbf{v}$. This setting is called strong constraint 4D-Var and has its origins in weather forecasting [1], where the name 4D-Var refers to the three spatial dimensions and the additional evolution in time. The minimisation of the functional $J(\mathbf{x}_0)$ requires running the possibly nonlinear model (1) and its adjoint multiple times and is thus very costly to compute. In data assimilation, the incremental 4D-Var approach [3] is therefore often used. It uses successive linearised versions of the operators and then minimises a quadratic functional, just like the inexact Gauss-Newton method [5]. The linearisation of the operators $M$ and $H$ around $\mathbf{x}_k$ leads to the Jacobians $\mathbf{M}_k$ and $\mathbf{H}_k$ in the so-called tangent linear model:
$$\delta\mathbf{x}_{k+1} = \mathbf{M}_k\,\delta\mathbf{x}_k, \qquad (3a)$$
$$\delta\mathbf{y}_k = \mathbf{H}_k\,\delta\mathbf{x}_k, \qquad (3b)$$
where $\delta\mathbf{x}_k$ denotes the state perturbation and $\mathbf{d}_k := \mathbf{y}_k - H(\mathbf{x}_k)$ is the distance between the measurement and the output of the full model at time $k$. Problems where 4D-Var is applied appear especially in weather forecasting [1,7], where one typically has to deal with unstable matrices $\mathbf{M}_k$ after linearisation. This will be addressed by our model order reduction approach later. The quadratic cost functional derived from (3) that has to be minimised for $\delta\mathbf{x}_0$ is given by
$$J(\delta\mathbf{x}_0) = \tfrac{1}{2}\|\delta\mathbf{x}_0 + \mathbf{x}_0^{(i)} - \mathbf{x}_0^b\|^2_{\Gamma_{\mathrm{pr}}^{-1}} + \tfrac{1}{2}\sum_{k=1}^{K}\|\mathbf{H}_k\,\delta\mathbf{x}_k - \mathbf{d}_k\|^2_{\Gamma_k^{-1}}, \qquad (4)$$
subject to the forward model dynamics (3a). The minimisation of (2) is now done as an inner-outer iteration: in the outer loop, the current guess for the initial condition is updated by $\mathbf{x}_0^{(i+1)} = \mathbf{x}_0^{(i)} + \delta\mathbf{x}_0^{(i)}$, where the minimum $\delta\mathbf{x}_0^* =: \delta\mathbf{x}_0^{(i)}$ of (4) is used in this update; the minimisation of (4) is called the inner loop. The nonlinear model is run from the current initial condition $\mathbf{x}_0^{(i)}$ to obtain the trajectory and its expected observations for the computation of (4). The idea of obtaining $\mathbf{x}_0^*$ incrementally is promising, but the minimisation of (4) is very costly for high dimensions, even in the linear setting. Instead of using the full model, we want to project the system (3) or its system matrices $\mathbf{M}_k$ and $\mathbf{H}_k$ onto a lower-dimensional subspace. We need a projection operator $\mathbf{T}^T \in \mathbb{R}^{r\times n}$ projecting the increment $\delta\mathbf{x} \in \mathbb{R}^n$ onto $\delta\hat{\mathbf{x}} := \mathbf{T}^T \delta\mathbf{x} \in \mathbb{R}^r$, with $r \ll n$. The state variables are reduced in dimension by this projection, hence $\delta\hat{\mathbf{x}}_k := \mathbf{T}^T \delta\mathbf{x}_k$ for all time and iteration indices. The operator $\mathbf{S}$ with $\mathbf{T}^T\mathbf{S} = \mathbf{I}_r$ gives the reduced model and observation operators as
$$\hat{\mathbf{M}}_k := \mathbf{T}^T \mathbf{M}_k \mathbf{S} \quad\text{and}\quad \hat{\mathbf{H}}_k := \mathbf{H}_k \mathbf{S}. \qquad (5)$$
The projected version of the prior covariance matrix is thus given as $\hat{\Gamma}_{\mathrm{pr}} := \mathbf{T}^T \Gamma_{\mathrm{pr}} \mathbf{T}$. The projection does not affect the observations $\mathbf{y}_k$ and the corresponding error covariance $\Gamma_k$. The projected functional (4) reads
$$\hat{J}(\delta\hat{\mathbf{x}}_0) = \tfrac{1}{2}\|\delta\hat{\mathbf{x}}_0 + \hat{\mathbf{x}}_0^{(i)} - \hat{\mathbf{x}}_0^b\|^2_{\hat{\Gamma}_{\mathrm{pr}}^{-1}} + \tfrac{1}{2}\sum_{k=1}^{K}\|\hat{\mathbf{H}}_k\,\delta\hat{\mathbf{x}}_k - \mathbf{d}_k\|^2_{\Gamma_k^{-1}} \qquad (6)$$
and is now less costly to compute because the dimension was reduced from $n$ to the much smaller $r$. Suitable projection matrices $\mathbf{S}$ and $\mathbf{T}$ must be found so that the reduced system still approximates the behaviour of the full system well. The remainder of this paper discusses the choice of projection operators and demonstrates our approach in numerical examples.
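To make the inner-outer iteration concrete, the following is a minimal Python sketch (the paper's own code is in MATLAB) of incremental 4D-Var for a toy linear model, where the tangent linear model is exact and the quadratic functional (4) is minimised in closed form via its normal equations. All dimensions, operators and data below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, K = 8, 3, 5                       # toy dimensions (illustrative only)
M = 0.9 * np.eye(n)                     # tangent linear model, M_k = M
H = rng.standard_normal((m, n))         # observation operator, H_k = H
Gpr = 0.1 * np.eye(n)                   # prior covariance Gamma_pr
Gy = 0.01 * np.eye(m)                   # observation error covariance

x0_b = rng.standard_normal(n)           # background state x_0^b
# synthetic observations y_k of a perturbed truth, k = 1, ..., K
x0_t = x0_b + 0.1 * rng.standard_normal(n)
y = [H @ np.linalg.matrix_power(M, k) @ x0_t for k in range(1, K + 1)]

def inner_loop(x0_i):
    """Minimise the quadratic functional (4) for the increment dx0 via its
    normal equations (exact for a linear model):
    (Gpr^-1 + sum_k Hk^T Gy^-1 Hk) dx0 = Gpr^-1 (x0_b - x0_i) + sum_k Hk^T Gy^-1 d_k.
    """
    Gpr_inv, Gy_inv = np.linalg.inv(Gpr), np.linalg.inv(Gy)
    Hess = Gpr_inv.copy()
    rhs = Gpr_inv @ (x0_b - x0_i)
    for k in range(1, K + 1):
        Hk = H @ np.linalg.matrix_power(M, k)   # H M^k
        d_k = y[k - 1] - Hk @ x0_i              # innovation d_k = y_k - H(x_k)
        Hess += Hk.T @ Gy_inv @ Hk
        rhs += Hk.T @ Gy_inv @ d_k
    return np.linalg.solve(Hess, rhs)

# outer loop: x0^{(i+1)} = x0^{(i)} + dx0^{(i)}
x0 = x0_b.copy()
for _ in range(3):
    x0 = x0 + inner_loop(x0)
```

Because the toy model is linear here, the first outer iteration already reaches the minimiser and subsequent increments vanish; with a nonlinear model, the outer loop would relinearise around the updated state instead.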

Model reduction by balanced truncation
Model reduction for dynamical systems often relies on projection-based methods [8,9], and one such method that has been used in data assimilation [2,10,11] is balanced truncation (BT) [12,13]. A more detailed explanation of BT can be found in the book [14], on which this subsection is based. The original BT method considers linear time-invariant (LTI) systems with a control input $\mathbf{u}(k)$. For this work, we focus on the setting in discrete time ($k \in \mathbb{Z}_+$):
$$\mathbf{x}(k+1) = \mathbf{A}\mathbf{x}(k) + \mathbf{B}\mathbf{u}(k), \qquad \mathbf{y}(k) = \mathbf{C}\mathbf{x}(k), \qquad \mathbf{x}(0) = \mathbf{x}_0. \qquad (7)$$
The main idea of BT is to keep only the subspace of states which are simultaneously easy to reach and easy to observe. The computations rely on the so-called reachability and observability Gramians, which are associated with the reachability and the observability energy of a state. The Gramians are functions of time and defined as follows:
$$\mathbf{P}_\tau = \sum_{k=0}^{\tau-1} \mathbf{A}^k \mathbf{B}\mathbf{B}^T (\mathbf{A}^T)^k, \text{ the time-limited reachability Gramian, and} \qquad (8a)$$
$$\mathbf{Q}_\tau = \sum_{k=0}^{\tau-1} (\mathbf{A}^T)^k \mathbf{C}^T \mathbf{C}\, \mathbf{A}^k, \text{ the time-limited observability Gramian.} \qquad (8b)$$
The time-limited Gramians describe the amount of energy necessary for a state to be reached or observed up to a fixed time $\tau$. The system behaviour in the entire time span from zero to infinity gives the energy needed to reach or observe a state at any point of time. This justifies taking the limits of the finite Gramians (8a) and (8b). These limits only exist for systems (7) which are stable, meaning the eigenvalues of $\mathbf{A}$ have to lie inside the unit circle so that $\mathbf{A}^k$ is bounded for $k \to \infty$ and consequently $\mathbf{P}_\tau$ and $\mathbf{Q}_\tau$ are bounded for $\tau \to \infty$. The limits $\mathbf{P}_\infty = \lim_{\tau\to\infty} \mathbf{P}_\tau$ and $\mathbf{Q}_\infty = \lim_{\tau\to\infty} \mathbf{Q}_\tau$ for a stable LTI system (7) are:
$$\mathbf{P}_\infty = \sum_{k=0}^{\infty} \mathbf{A}^k \mathbf{B}\mathbf{B}^T (\mathbf{A}^T)^k, \text{ the infinite reachability Gramian, and} \qquad (9a)$$
$$\mathbf{Q}_\infty = \sum_{k=0}^{\infty} (\mathbf{A}^T)^k \mathbf{C}^T \mathbf{C}\, \mathbf{A}^k, \text{ the infinite observability Gramian.} \qquad (9b)$$
Infinite Gramians have the advantage that they can be computed efficiently as the solutions of discrete-time Lyapunov (or Stein) equations. For a stable discrete-time system (7), the infinite Gramians (9a) and (9b) are the unique, positive-definite solutions to the discrete-time Lyapunov equations $\mathbf{A}\mathbf{P}_\infty\mathbf{A}^T + \mathbf{B}\mathbf{B}^T = \mathbf{P}_\infty$ and $\mathbf{A}^T\mathbf{Q}_\infty\mathbf{A} + \mathbf{C}^T\mathbf{C} = \mathbf{Q}_\infty$. The downside of using infinite Gramians is the limitation of BT to stable systems. In data assimilation applications, unstable systems are very common. For this reason we propose the time-limited BT approach for unstable systems and rather use the time-limited Gramians evaluated at the last time of observation, namely $\mathbf{P}_\tau$ and $\mathbf{Q}_\tau$. From now on, we denote the reachability Gramian by $\mathbf{P}$ and the observability Gramian by $\mathbf{Q}$, no matter whether they are time-limited or infinite. The Gramians are necessary to find a low-dimensional subspace that contains states with low reachability energy and high observability energy, so that the states in this reduced subspace are both easily reachable and easily observable. Reachability and observability must be made comparable by choosing a basis in which both concepts are equivalent. This is done by making sure that the directions obtained by BT maximise the Rayleigh quotient $(\|\mathbf{x}\|_{\mathbf{Q}} / \|\mathbf{x}\|_{\mathbf{P}^{-1}})^2$.
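The relation between the summed Gramians and the Stein equations can be checked numerically. The following Python sketch (illustrative, not the paper's code) computes the Gramians (8) by direct summation for a randomly generated stable system and verifies that the approximately infinite Gramians solve the discrete-time Lyapunov equations.

```python
import numpy as np

def time_limited_gramians(A, B, C, tau):
    """Time-limited Gramians (8) by direct summation over k = 0, ..., tau-1.
    The finite sums exist even when A is unstable."""
    n = A.shape[0]
    P = np.zeros((n, n))
    Q = np.zeros((n, n))
    Ak = np.eye(n)                       # A^k, starting at k = 0
    for _ in range(tau):
        P += Ak @ B @ B.T @ Ak.T
        Q += Ak.T @ C.T @ C @ Ak
        Ak = A @ Ak
    return P, Q

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A /= 1.1 * np.max(np.abs(np.linalg.eigvals(A)))  # enforce spectral radius < 1
B = rng.standard_normal((n, 2))
C = rng.standard_normal((1, n))

# for stable A, the limits P_inf, Q_inf exist; a long summation approximates them
P_inf, Q_inf = time_limited_gramians(A, B, C, tau=2000)

# ... and they solve the discrete-time Lyapunov (Stein) equations
assert np.allclose(A @ P_inf @ A.T + B @ B.T, P_inf)
assert np.allclose(A.T @ Q_inf @ A + C.T @ C, Q_inf)
```

For stable systems, a Stein-equation solver (e.g. `scipy.linalg.solve_discrete_lyapunov`) is much cheaper than summation; for unstable $\mathbf{A}$ only the finite sums are available, which is the computational price of the time-limited Gramians.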
This holds for the dominant eigendirections $\mathbf{v}_i$ of the matrix pencil $(\mathbf{Q}, \mathbf{P}^{-1})$ associated with the dominant eigenvalues $\sigma_i^2$. They satisfy $\mathbf{Q}\mathbf{v}_i = \sigma_i^2\, \mathbf{P}^{-1}\mathbf{v}_i$, and the eigenvalue square roots $\sigma_1 > \sigma_2 > \dots > \sigma_n$ are called Hankel singular values. The matrix of the $r$ dominant eigendirections $\mathbf{V} = [\mathbf{v}_1, \dots, \mathbf{v}_r]$ is used to build rank-$r$ projection operators $\mathbf{S}$ and $\mathbf{T}$ with $\mathbf{T}^T\mathbf{S} = \mathbf{I}_r$, which project the state variables and matrices of the system (7) onto a suitable low-dimensional subspace, just as in (5). The algorithm for BT of systems of the form (7) is based on the singular value decomposition. Details can be found in the book [14].
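A compact sketch of the SVD-based (square-root) BT algorithm, assuming symmetric positive-definite Gramians $\mathbf{P}$ and $\mathbf{Q}$ are given; the function name and the demo data are illustrative. The projectors satisfy $\mathbf{T}^T\mathbf{S} = \mathbf{I}_r$ and diagonalise both Gramians to the leading Hankel singular values.

```python
import numpy as np

def balanced_truncation_projectors(P, Q, r):
    """Square-root BT: build rank-r projectors S, T with T^T S = I_r from
    SPD Gramians P (reachability) and Q (observability)."""
    L_P = np.linalg.cholesky(P)          # P = L_P L_P^T
    L_Q = np.linalg.cholesky(Q)          # Q = L_Q L_Q^T
    U, s, Vt = np.linalg.svd(L_Q.T @ L_P)
    # s holds the Hankel singular values sigma_1 > sigma_2 > ...
    Sig = np.diag(s[:r] ** -0.5)
    S = L_P @ Vt[:r].T @ Sig             # right projector (reduces states)
    T = L_Q @ U[:, :r] @ Sig             # left projector, T^T S = I_r
    return S, T, s

rng = np.random.default_rng(1)
n, r = 6, 3
X = rng.standard_normal((n, n)); P = X @ X.T + np.eye(n)   # random SPD Gramians
Y = rng.standard_normal((n, n)); Q = Y @ Y.T + np.eye(n)
S, T, hsv = balanced_truncation_projectors(P, Q, r)

assert np.allclose(T.T @ S, np.eye(r))
# the projectors balance the system: both Gramians become diag(sigma_1..sigma_r)
assert np.allclose(T.T @ P @ T, np.diag(hsv[:r]))
assert np.allclose(S.T @ Q @ S, np.diag(hsv[:r]))
```

A reduced model then follows as in (5), e.g. $\hat{\mathbf{M}} = \mathbf{T}^T\mathbf{M}\mathbf{S}$ and $\hat{\mathbf{H}} = \mathbf{H}\mathbf{S}$.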
Remark 2.1. BT can be applied without further assumptions on the system (7), as long as suitable Gramians can be defined. The standard application is stable systems with infinite Gramians (9), which can be obtained by solving Lyapunov equations. Time-limited BT works with the harder-to-compute time-limited Gramians (8) but makes the approach applicable to unstable systems. For both choices of Gramians, output error bounds are known [14,15].

Remark 2.2. Another approach for unstable systems is $\alpha$-bounded balanced truncation [2], which we will use for comparison. In the discrete setting, all eigenvalues of the finite matrix $\mathbf{A}$ in (7) lie inside a circle of radius $\rho(\mathbf{A})$ (the maximal absolute value of an eigenvalue of $\mathbf{A}$) around the origin. Once $\alpha > \rho(\mathbf{A})$ is chosen, the system can be shifted by $\mathbf{A}_\alpha := \mathbf{A}/\alpha$, $\mathbf{B}_\alpha := \mathbf{B}/\sqrt{\alpha}$ and $\mathbf{C}_\alpha := \mathbf{C}/\sqrt{\alpha}$. BT is then performed on the shifted system, which is stable.
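The $\alpha$-shift of Remark 2.2 is straightforward to sketch. The matrices below are random illustrative data, and $\alpha = \rho(\mathbf{A}) + 10^{-5}$ follows the heuristic used later in the experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))          # generally unstable
B = rng.standard_normal((n, 2))
C = rng.standard_normal((2, n))

rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius rho(A)
alpha = rho + 1e-5                           # any alpha > rho(A) works
A_a = A / alpha                              # shifted system ...
B_a = B / np.sqrt(alpha)
C_a = C / np.sqrt(alpha)

# ... is stable, so standard BT with infinite Gramians applies to (A_a, B_a, C_a)
assert np.max(np.abs(np.linalg.eigvals(A_a))) < 1.0
```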

BT within strong constraint incremental 4D-Var
The idea of model reduction by BT for 4D-Var first appeared in the work of Lawless et al. [4] and was subsequently applied to strong constraint 4D-Var [2,16] and to the weak constraint setting (including model errors, which we ignore here) [10].
Within this work, we demonstrate the use of time-limited balanced truncation (TLBT) for the tangent linear model (3), as we proposed in our work [17]. We assume that $\mathbf{M}_k = \mathbf{M}$ and $\mathbf{H}_k = \mathbf{H}$ for all $k$ to have an LTI system. Up to now we have ignored noise in our explanation of 4D-Var, but we want to take the prior and measurement uncertainty into account in the Gramian computations. The tangent linear model including prior uncertainty is then given by
$$\delta\mathbf{x}_0 \sim \mathcal{N}(\mathbf{0}, \Gamma_{\mathrm{pr}}), \qquad \delta\mathbf{x}_{k+1} = \mathbf{M}\,\delta\mathbf{x}_k \ \text{for } k \geq 0, \qquad \delta\mathbf{y}_k = \mathbf{H}\,\delta\mathbf{x}_k, \qquad (10)$$
so that the covariance of the initial increment $\delta\mathbf{x}_0$ corresponds to the prior covariance. We additionally assume time-invariance of the observation error, that is, $\Gamma_k = \Gamma_y$ for all $k$.
We now apply BT, which is defined for systems of type (7), to the inner-loop LTI system (10) and thus need to define suitable Gramians. For $k \geq 1$ the systems are the same, with the input $\mathbf{B} = \mathbf{0}$ in (10). For $k = 0$ we have to take into account the prior distribution in (10) and hence a different input operator at the initial time. This is resolved by using a different summand at $k = 0$ in the computations of the Gramians (8a) and (9a), namely $\mathbf{A}^k\mathbf{B}\mathbf{B}^T(\mathbf{A}^T)^k = \mathbf{M}^k\,\mathbf{0}\,(\mathbf{M}^T)^k = \mathbf{0}$ for $k \geq 1$ and $\mathbf{M}^0\,\Gamma_{\mathrm{pr}}\,(\mathbf{M}^T)^0 = \Gamma_{\mathrm{pr}}$ at time $k = 0$ (see [10]). The Gramians for system (10) are given by:
$$\mathbf{P}^{\text{4D-Var}} = \Gamma_{\mathrm{pr}}, \text{ the time-limited empirical reachability Gramian, and} \qquad (11a)$$
$$\mathbf{Q}_\tau^{\text{4D-Var}} = \sum_{k=1}^{\tau} (\mathbf{M}^T)^k \mathbf{H}^T \Gamma_y^{-1} \mathbf{H}\, \mathbf{M}^k, \text{ the time-limited empirical observability Gramian.} \qquad (11b)$$
Infinite Gramians are obtained for $\tau \to \infty$, and it always holds that $\mathbf{P}^{\text{4D-Var}} = \Gamma_{\mathrm{pr}}$. These empirical Gramians are found in our work [17] and similarly in a paper by Qian et al. [11] for the continuous setting. The observation error of (1b) was included in the observability Gramian as proposed in the literature [11,16]. This choice is reasonable because of an argument by Bernstein et al. [18]: the $\Gamma_y^{-1}$-weighted expected distance error between the observations of the full system (10) and the observations of its reduced projected version with operators (5) is minimised by the reduced model obtained using (11b). This is explained in more detail in our paper [17].
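A direct-summation sketch of the empirical Gramians (11) in Python (illustrative code; the observability sum is taken over the observation times $k = 1, \dots, \tau$). The finite sum is well defined even when $\mathbf{M}$ is unstable, which is the case of interest here.

```python
import numpy as np

def empirical_gramians_4dvar(M, H, Gpr, Gy, tau):
    """Time-limited empirical Gramians (11) for the inner-loop system (10):
    P = Gpr and Q_tau = sum_{k=1}^{tau} (M^T)^k H^T Gy^{-1} H M^k.
    Works for unstable M, since only finite sums are involved."""
    n = M.shape[0]
    Gy_inv = np.linalg.inv(Gy)
    Q = np.zeros((n, n))
    Mk = np.eye(n)
    for _ in range(tau):
        Mk = M @ Mk                       # M^k for k = 1, ..., tau
        Q += Mk.T @ H.T @ Gy_inv @ H @ Mk
    return Gpr.copy(), Q

rng = np.random.default_rng(3)
n, m = 8, 3
M = 0.5 * rng.standard_normal((n, n))     # generally unstable example matrix
H = rng.standard_normal((m, n))
Gpr = 0.1 * np.eye(n)
Gy = 0.01 * np.eye(m)

P, Q = empirical_gramians_4dvar(M, H, Gpr, Gy, tau=50)
# Q is symmetric positive semi-definite by construction
assert np.allclose(Q, Q.T)
assert np.min(np.linalg.eigvalsh(Q)) >= -1e-12 * np.linalg.norm(Q)
```

These two Gramians can then be handed to the balanced-truncation projector construction of Section 2.1.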
Time-limited balanced truncation can now be applied to system (10) using the Gramians (11). Projection matrices $\mathbf{S}$ and $\mathbf{T}^T$ are built by TLBT as described in Section 2.1 and used to reduce (10) and obtain the corresponding reduced cost functional (6) of the inner loop. The minimisation in the inner loop and the update of the outer loop can then proceed as described in Section 1. TLBT within the inner loop of incremental 4D-Var is particularly effective if the projectors $\mathbf{S}$ and $\mathbf{T}^T$ do not change at each call of the inner loop but can be computed once for the entire incremental 4D-Var algorithm. This is the case if it can be assumed that the linearised system matrices of (3) do not change during the iterations of the outer loop. The nonlinear model trajectories are not affected by the projection and need to be re-evaluated at each outer-loop iteration.

2.3 Posterior precision as constant Hessian in a Quasi-Newton method

The typical framework of incremental 4D-Var is the linearisation around the current iterate initial state $\mathbf{x}_0^{(i)}$ and then finding the minimum of (4). This minimisation requires the computation of the gradient $\nabla_{\delta\mathbf{x}_0} J$ and the Hessian $\nabla\nabla_{\delta\mathbf{x}_0} J$. The authors of [5] demonstrate that for an exact tangent linear model (3) this corresponds to a Gauss-Newton step. We want to make this computationally tractable and use an approximate and reduced model. We thus propose a further simplification, namely to determine linearisations $\mathbf{M}_k = \mathbf{M} \in \mathbb{R}^{n\times n}$ and $\mathbf{H}_k = \mathbf{H} \in \mathbb{R}^{m\times n}$ in (3) uniformly for all possible (initial) states; that is, we choose one linearisation for the whole model and do not linearise at each step of the outer loop. We also assume time-invariance of the observation error, that is, $\Gamma_k = \Gamma_y$ for all $k$. Then, a system of type (1) with $\mathbf{x}_0^b = \mathbf{0}$ reads:
$$\mathbf{x}_{k+1} = \mathbf{M}\mathbf{x}_k, \qquad \mathbf{y}_k = \mathbf{H}\mathbf{x}_k + \boldsymbol{\eta}_k, \qquad \boldsymbol{\eta}_k \sim \mathcal{N}(\mathbf{0}, \Gamma_y) \ \text{for } k = 1, \dots, K.$$
We apply the Bayesian statistical approach with the Gaussian prior distribution for $\mathbf{x}_0$ and the Gaussian likelihood derived from the observation noise. It is well known that the posterior distribution of $\mathbf{x}_0$ is again Gaussian (see e.g., [19]), with posterior covariance
$$\Gamma_{\mathrm{pos}} = (\Gamma_{\mathrm{pr}}^{-1} + \mathbf{F})^{-1}. \qquad (12)$$
The Fisher matrix $\mathbf{F} \in \mathbb{R}^{n\times n}$ is defined as
$$\mathbf{F} := \sum_{k=1}^{K} (\mathbf{M}^T)^k \mathbf{H}^T \Gamma_y^{-1} \mathbf{H}\, \mathbf{M}^k.$$
Note that these summands also appear in the empirical observability Gramian (11b). It can be verified that $\Gamma_{\mathrm{pos}}$ corresponds to the inverse Hessian for the minimisation of (4) for constant $\mathbf{M}_k$, $\mathbf{H}_k$ and $\Gamma_k$. Using $\mathbf{y}_k = \mathbf{H}\mathbf{M}^k\mathbf{x}_0$, we obtain the corresponding gradient, shifted by the weighted distance to the non-zero background error. We can use the reduced quantities (5) to compute reduced posterior quantities (12) and thus the (inverse) Hessian and gradient of the reduced cost functional (6) for a given initial state (and the corresponding trajectory). Qian et al. [11] have explained how BT is used for the efficient computation of a good approximation of (12), and we generalised the approach to arbitrary priors and unstable systems using TLBT [17]. Standard methods [4,5] use the adjoint model to find the increment $\delta\mathbf{x}_0^{(i)}$ for the $i$-th step of the outer loop. Our approach is to use the described TLBT-based computations to build a low-resolution model in the inner loop and compute an approximation of $\delta\mathbf{x}_0^{(i)}$ as the minimiser of (6) efficiently.
The assumption $\mathbf{M}_k = \mathbf{M}$ and $\mathbf{H}_k = \mathbf{H}$ for the whole model is particularly effective because we only need to build the reduced model by TLBT once and can then perform computations in the lower-dimensional setting at each iterate of the outer loop. Additionally, the computation of $\Gamma_{\mathrm{pos}}$ in (12) does not depend on the current trajectory but only on the (reduced) system matrices. With our assumptions, they are constant throughout the optimisation. Consequently, we only need to compute the reduced posterior precision as approximate Hessian once in the beginning and can use it at each update step. Only the gradient computation needs to be performed for each iterate to determine the descent direction. Hence, this results in a Quasi-Newton method [20] and saves computational effort.
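The resulting Quasi-Newton scheme can be sketched as follows for a toy linear system (illustrative Python, not the paper's MATLAB code): $\Gamma_{\mathrm{pos}}$ is precomputed once from the constant system matrices and then used as constant inverse Hessian, so each update only requires a gradient evaluation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, K = 6, 2, 4                        # illustrative toy sizes
M = 0.4 * rng.standard_normal((n, n))    # constant linearisation M
H = rng.standard_normal((m, n))          # constant observation operator H
Gpr = 0.1 * np.eye(n)
Gy = 0.01 * np.eye(m)
Gy_inv = np.linalg.inv(Gy)

# Fisher matrix F and posterior covariance (12); the summands are the same
# as those in the empirical observability Gramian (11b)
F = np.zeros((n, n))
for k in range(1, K + 1):
    Hk = H @ np.linalg.matrix_power(M, k)
    F += Hk.T @ Gy_inv @ Hk
Gpos = np.linalg.inv(np.linalg.inv(Gpr) + F)   # constant inverse Hessian

x0_t = rng.standard_normal(n)                  # synthetic truth and data
y = [H @ np.linalg.matrix_power(M, k) @ x0_t for k in range(1, K + 1)]
x0_b = np.zeros(n)

def grad(x0):
    """Gradient of the quadratic cost at x0 (linear model, so exact)."""
    g = np.linalg.inv(Gpr) @ (x0 - x0_b)
    for k in range(1, K + 1):
        Hk = H @ np.linalg.matrix_power(M, k)
        g -= Hk.T @ Gy_inv @ (y[k - 1] - Hk @ x0)
    return g

# Quasi-Newton iteration with the precomputed constant inverse Hessian
x0 = x0_b.copy()
for _ in range(2):
    x0 = x0 - Gpos @ grad(x0)
```

For this quadratic cost, $\Gamma_{\mathrm{pos}}$ is the exact inverse Hessian, so a single step already minimises the cost; in the reduced setting the same structure applies with the hatted quantities from (5).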
FIGURE 1: RMS errors for 4D-Var as described above, with $\Gamma_{\mathrm{pr}}$ a Gaussian exponential matrix and all $n$ variables observed. RMS, root-mean-square.
In this section, we have proposed TLBT for model reduction within the inner loop of incremental 4D-Var. With fixed linear system matrices $\mathbf{M}$ and $\mathbf{H}$ as approximate system and observation operators, we only need to compute an approximate Hessian once. This leads to a significant speed-up in the computations for solving the initial value problem (1), at least approximately. The speed-up with our approach and the quality of the approximate solution will now be demonstrated in experiments.

NUMERICAL EXPERIMENTS WITH THE LORENZ-95 SYSTEM
The Lorenz-95 system was introduced by Edward Lorenz in 1995 [21] and is a nonlinear system of ODEs that is known for behaviour similar to numerical weather models. It is characterised by its dimension $n$ and the forcing $F$:
$$\frac{\mathrm{d}x_j}{\mathrm{d}t} = -x_{j-2}\,x_{j-1} + x_{j-1}\,x_{j+1} - x_j + F \quad \text{for } j = 1, \dots, n \text{ and cyclic boundary conditions.} \qquad (13)$$
For our experiments with the Lorenz-95 model, we have chosen the parameters $n = 500$ and $F = 8$. We consider an assimilation window of 50 steps with incoming measurements and then compute a forecast of 100 steps. The step size is set to $h = 0.01$, leading to the end-time $\tau = 0.5$ for the last measurement. The chaotic and unstable nature of the system only allows for short-term predictions. We provide the MATLAB code for reproducing the presented results at https://github.com/joskoUP/TLBT4DVar. In our setup, one can choose between $\mathbf{H} = \mathbf{I}_n$ for the constant observation matrix or partial observations of only some of the $n$ dimensions. The prior covariance can be chosen to be $\Gamma_{\mathrm{pr}} = 0.1\,\mathbf{I}_n$ or to be a Gaussian exponential kernel with variance $0.1$. The observation error covariance is $\Gamma_y = 0.01\,\mathbf{I}_n$.
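For illustration, the Lorenz-95 right-hand side and a fourth-order Runge-Kutta step can be written compactly, handling the cyclic boundary conditions via `np.roll` (a Python sketch; the reproduction code linked above is MATLAB):

```python
import numpy as np

def lorenz95(x, F=8.0):
    """Right-hand side of the Lorenz-95 system (13) with cyclic boundaries:
    dx_j/dt = -x_{j-2} x_{j-1} + x_{j-1} x_{j+1} - x_j + F."""
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F

def rk4_step(f, x, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

n, h = 500, 0.01                        # parameters as in the experiments
rng = np.random.default_rng(0)
x = rng.standard_normal(n)              # random initial condition
for _ in range(50):                     # assimilation window of 50 steps
    x = rk4_step(lorenz95, x, h)
```

A quick sanity check: the constant state $x_j \equiv F$ is an equilibrium, since the quadratic terms cancel and $-x_j + F = 0$.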
We create a true initial condition $\mathbf{x}_0^t = [x_1^t(0), \dots, x_n^t(0)]^T$ by sampling from a standard normal distribution and then compute the background state $\mathbf{x}_0^b$ (which is the starting value for the outer loop in our optimisation) by disturbing $\mathbf{x}_0^t$ with $\mathcal{N}(\mathbf{0}, \Gamma_{\mathrm{pr}})$-distributed additive noise. All trajectories of the full model are computed from a given initial condition with the system equations (13) by a fourth-order Runge-Kutta method. The trajectory starting from $\mathbf{x}_0^t$ is considered the truth, whereas the trajectory starting from $\mathbf{x}_0^b$ is the first guess before assimilation. We use incremental 4D-Var to infer the initial condition $\mathbf{x}_0^*$ as the best guess of $\mathbf{x}_0^t$. For the full model, we linearise at each iteration of the outer loop to obtain the time-varying tangent linear model (3). In the reduced setups (TLBT and $\alpha$-BT), we assume $\mathbf{M}_k = \mathbf{M}$ constant over all iterations and construct $\mathbf{M}$ once in the beginning via linearisation in the middle of the assimilation window, at $t_{25} = 0.25$. We compare the trajectory starting from $\mathbf{x}_0^*$ (obtained after assimilation) to the truth in the root-mean-square error. This is plotted in Figure 1 with the Gaussian exponential prior and full observations, and in Figure 2 for the case with $\Gamma_{\mathrm{pr}} = 0.1\,\mathbf{I}_n$ and partial observations (only every 9th spatial point).

FIGURE 2: RMS errors after assimilation for observed and unobserved variables in 4D-Var as described above, with $\Gamma_{\mathrm{pr}} = 0.1\,\mathbf{I}_n$ and observations every 9 points. RMS, root-mean-square.

TABLE 1: Runtimes and root-mean-square error over the whole assimilation window for the full model compared to the reduced model with time-limited balanced truncation and $\alpha$-balanced truncation in the inner loop.

It can be observed that TLBT performs well compared to the full model and that the results for $\alpha$-BT depend strongly on the chosen setting.

In Table 1, the runtimes and root-mean-square (RMS) errors for several settings are given. We can observe a significant speed-up even though we have only reduced the dimension from $n = 500$ to $r = 375$. This is mainly due to the fact that we use a constant Hessian and can precompute $(\mathbf{M}^T)^k \mathbf{H}^T \Gamma_y^{-1}$ for all $k$ for the gradients. Building the projection matrices and the reduced model operators (5) requires a lot of effort. The Quasi-Newton steps are comparably cheap, so that we can allow our approximate model to need more steps than the full model and still be faster. Model reduction by $\alpha$-BT is cheaper than by TLBT, because we can compute the Gramians as the solutions of Stein equations instead of the direct summation approach that is necessary with unstable system matrices. The big advantage of TLBT compared to $\alpha$-BT is that with TLBT, there is no need to solve an eigenvalue problem and determine a heuristic value for the parameter $\alpha$. With $\alpha = \rho(\mathbf{M}) + 10^{-5}$ we have made a good choice for most of our settings. TLBT and $\alpha$-BT lead to assimilation errors of the same order of magnitude, and $\alpha$-BT can outperform TLBT in the setting with the Gaussian exponential prior and partial observations.