Spectral estimates for saddle point matrices arising in weak constraint four‐dimensional variational data assimilation

We consider the large sparse symmetric linear systems of equations that arise in the solution of weak constraint four‐dimensional variational data assimilation, a method of high interest for numerical weather prediction. These systems can be written as saddle point systems with a 3 × 3 block structure but block eliminations can be performed to reduce them to saddle point systems with a 2 × 2 block structure, or further to symmetric positive definite systems. In this article, we analyse how sensitive the spectra of these matrices are to the number of observations of the underlying dynamical system. We also obtain bounds on the eigenvalues of the matrices. Numerical experiments are used to confirm the theoretical analysis and bounds.


INTRODUCTION
Data assimilation estimates the state of a dynamical system by combining observations of the system with a prior estimate. The latter is called a background state and it is usually an output of a numerical model that simulates the dynamics of the system. The impact that the observations and the background state have on the state estimate depends on their errors, whose statistical properties we assume are known. Data assimilation is used to produce initial conditions in numerical weather prediction (NWP), 1,2 as well as in other areas, for example, flood forecasting, 3 research into atmospheric composition, 4 and neuroscience. 5 In operational applications, the process is made more challenging by the size of the system, for example, the numerical model may be operating on 10^8 state variables and 10^5–10^6 observations may be incorporated. 6,7 Moreover, there is usually a constraint on the time that can be spent on calculations.
The solution, called the analysis, is obtained by combining the observations and the background state in an optimal way. One approach is to solve a weighted least-squares problem, which requires minimizing a cost function. An active research topic in this area is the weak constraint four-dimensional variational (4D-Var) data assimilation method. [8][9][10][11][12][13][14] It is employed in the search for states of the system over a time period, called the assimilation window. This method uses a cost function that is formulated under the assumption that the numerical model is not perfect and penalizes the weighted discrepancy between the analysis and the observations, the analysis and the background state, and the difference between the analysis and the trajectory given by integrating the dynamical model.
Effective minimization techniques need evaluations of the cost function and its gradient that involve expensive operations with the dynamical model and its linearized variant. Such approaches are impractical in operational applications. One way to approximate the minimum of the weak constraint 4D-Var cost function is to use an inexact Gauss-Newton method, 15 in which a series of linearized quadratic cost functions with a low resolution model are minimized, 16 and the minima are used to update the high resolution state estimate. The state estimate update is found by solving sparse symmetric linear systems using an iterative method. 17 To increase the potential of using parallel computations when computing the state update with weak constraint 4D-Var, Fisher and Gürol 12 introduced a symmetric 3 × 3 block saddle point formulation. The resulting large symmetric linear systems are solved using Krylov subspace solvers. 14,17,18 One criterion that affects their convergence is the spectrum of the coefficient matrices. 18 We derive bounds for the eigenvalues of the 3 × 3 block matrix using the work of Rusten and Winther. 19 In addition, inspired by the practice in solving saddle point systems that arise from interior point methods, 20,21 we reduce the 3 × 3 block system to a 2 × 2 block saddle point formulation and derive eigenvalue bounds for this system. We also consider a 1 × 1 block formulation with a positive definite coefficient matrix, which corresponds to the standard method. 8,9 Some of the blocks in the 3 × 3 and 2 × 2 block saddle point coefficient matrices, and the 1 × 1 block positive definite coefficient matrix, depend on the available observations of the dynamical system. We present a novel examination of how adding new observations influences the spectrum of these coefficient matrices.
In Section 2, we formulate the data assimilation problem and introduce weak constraint 4D-Var with the 3 × 3 block and 2 × 2 block saddle point formulations and the 1 × 1 block symmetric positive definite formulation. Eigenvalue bounds for the saddle point and positive definite matrices and results on how the extreme eigenvalues and the bounds depend on the number of observations are presented in Section 3. Section 4 illustrates the theoretical considerations using numerical examples, and concluding remarks and future directions are presented in Section 5.

VARIATIONAL DATA ASSIMILATION
The state of the dynamical system of interest at times t_0 < t_1 < ⋯ < t_N is represented by the state vectors x_0, x_1, …, x_N with x_i ∈ R^n. A nonlinear model m_i that is assumed to have errors describes the transition from the state at time t_i to the state at time t_{i+1}, that is,

x_{i+1} = m_i(x_i) + η_i,    (1)

where η_i represents the model error at time t_i. It is further assumed that the model errors are Gaussian with zero mean and covariance matrix Q_i ∈ R^{n×n}, and that they are uncorrelated in time, that is, there is no relationship between the model errors at different times. In NWP, the model comes from the discretization of the partial differential equations that describe the flow and thermodynamics of a stratified multiphase fluid in interaction with radiation. 1 It also involves parameters that characterize processes arising at spatial scales that are smaller than the distance between the grid points. 22 Errors due to the discretization of the equations, errors in the boundary conditions, inaccurate parameters, and so on are components of the model error. 23 The background information about the state at time t_0 is denoted by x_b ∈ R^n. It usually comes from a previous short range forecast and is chosen to be the first guess of the state. It is assumed that the background state has errors that are Gaussian with zero mean and covariance matrix B ∈ R^{n×n}.
Observations of the dynamical system at time t_i are given by y_i ∈ R^{p_i}. In NWP, there are considerably fewer observations than state variables, that is, p_i ≪ n. In addition, the observations may be indirect observations of the variables in the state vector, and a comparison is obtained by mapping the state variables to the observation space using a nonlinear operator ℋ_i. For example, satellite observations used in NWP provide top of the atmosphere radiance data, whereas the model operates on different meteorological variables, for example, temperature, pressure, wind, and so on. 24 Hence, values of meteorological variables are transformed into top of the atmosphere radiances in order to compare the model output with the observations. In this case, the operator ℋ_i is nonlinear and complicated. Approximations made when mapping the state variables to the observation space, different spatial and temporal scales between the model and some observations (observations may give information at a finer resolution than the model), and preprocessing, or quality control, of the observations (see, eg, section 5.8 of Kalnay 1 ) comprise the representativity error. 25 The observation error is made up of the representativity error combined with the error arising due to the limited precision of the measurements. It is assumed to be Gaussian with zero mean and covariance matrix R_i ∈ R^{p_i×p_i}. The observation errors are assumed to be uncorrelated in time. 7

Weak constraint 4D-Var
In weak constraint 4D-Var, the analysis x_0^a, x_1^a, …, x_N^a is obtained by minimizing the nonlinear cost function

J(x_0, x_1, …, x_N) = 1/2 (x_0 − x_b)^T B^{−1} (x_0 − x_b) + 1/2 Σ_{i=0}^{N} (y_i − ℋ_i(x_i))^T R_i^{−1} (y_i − ℋ_i(x_i)) + 1/2 Σ_{i=0}^{N−1} (x_{i+1} − m_i(x_i))^T Q_i^{−1} (x_{i+1} − m_i(x_i)).    (2)

This cost function is referred to as the state control variable formulation. Here, the control variables are defined as the variables with respect to which the cost function is minimized, that is, x_0, x_1, …, x_N in (2). Choosing different control variables and obtaining different formulations of the cost function is possible. 8 If the model is assumed to have no errors (ie, x_{i+1} = m_i(x_i)), the cost function simplifies as the last term in (2) is removed; this is called strong constraint 4D-Var. Rejecting this perfect model assumption and using weak constraint 4D-Var may lead to a better analysis. 9 Iterative gradient-based optimization methods are used in practical data assimilation. 7,26 These require the cost function and its gradient to be evaluated at every iteration. In operational applications, integrating the model over the assimilation window to evaluate the cost function is computationally expensive. The gradient is obtained by the adjoint method (see, eg, section 2 of Lawless 7 and section 3.2 of Talagrand 26 for an introduction), which is a few times more computationally expensive than evaluating the cost function. This makes the minimisation of the nonlinear weak constraint 4D-Var cost function impractical. Hence, approximations have to be made. We introduce such an approach in the following section.

Incremental formulation
Minimisation of the 4D-Var cost function in an operational setting is made feasible by employing an iterative Gauss-Newton method, as first proposed by Courtier et al 16 for the strong constraint 4D-Var. In this approach, the solution of the weak constraint 4D-Var problem is approximated by solving a sequence of linearised problems, that is, the (l+1)th approximation of the state is

x_i^{(l+1)} = x_i^{(l)} + δx_i^{(l)},  i = 0, …, N,    (3)

where the increment δx_i^{(l)} is obtained as the minimizer of the linearised cost function

J^{(l)}(δx_0, …, δx_N) = 1/2 (δx_0 − b^{(l)})^T B^{−1} (δx_0 − b^{(l)}) + 1/2 Σ_{i=0}^{N} (H_i^{(l)} δx_i − d_i^{(l)})^T R_i^{−1} (H_i^{(l)} δx_i − d_i^{(l)}) + 1/2 Σ_{i=0}^{N−1} (δx_{i+1} − M_i^{(l)} δx_i − η_i^{(l)})^T Q_i^{−1} (δx_{i+1} − M_i^{(l)} δx_i − η_i^{(l)}),    (4)

with b^{(l)} = x_b − x_0^{(l)}, d_i^{(l)} = y_i − ℋ_i(x_i^{(l)}) and η_i^{(l)} = m_i(x_i^{(l)}) − x_{i+1}^{(l)}, and where M_i^{(l)} and H_i^{(l)} are the model m_i and the observation operator ℋ_i, respectively, linearised at x_i^{(l)}. Minimisation of (4) is called the inner loop. The lth outer loop consists of updating the approximation of the state (3), linearizing the model m_i and the observation operator ℋ_i, and computing the values b^{(l)}, d_i^{(l)}, and η_i^{(l)}. This cost function is quadratic, which allows the use of effective minimisation techniques, such as conjugate gradients (see Chapter 5 of Nocedal and Wright 27 ). In NWP, the computational cost of minimizing the 4D-Var cost function is reduced by using a version of the inner loop cost function that is defined for a model with lower spatial resolution, that is, on a coarser grid. 28 We do not consider such an approach in the subsequent work, because our results on the change of the spectra of the coefficient matrices and the bounds (that are introduced in the following section) hold for models with any spatial resolution. For ease of notation, we introduce the following 4D (in the sense that they contain information in space and time) vectors and matrices; these are indicated in bold. The vectors are δx = (δx_0^T, …, δx_N^T)^T ∈ R^{(N+1)n}, b = (b^{(l)T}, η_0^{(l)T}, …, η_{N−1}^{(l)T})^T ∈ R^{(N+1)n}, and d = (d_0^{(l)T}, …, d_N^{(l)T})^T ∈ R^p, where p = Σ_{i=0}^{N} p_i.
We also define the matrices

L^{(l)} = [ I
            −M_0^{(l)}   I
                         ⋱      ⋱
                                −M_{N−1}^{(l)}   I ],    H^{(l)} = diag(H_0^{(l)}, H_1^{(l)}, …, H_N^{(l)}),

where I ∈ R^{n×n} is the identity matrix, L^{(l)} ∈ R^{(N+1)n×(N+1)n} and H^{(l)} ∈ R^{p×(N+1)n}. We define the block diagonal covariance matrices D = diag(B, Q_0, …, Q_{N−1}) ∈ R^{(N+1)n×(N+1)n} and R = diag(R_0, …, R_N) ∈ R^{p×p}. The state update (3) may then be written as x^{(l+1)} = x^{(l)} + δx^{(l)} and the quadratic cost function (4) becomes

J^{(l)}(δx) = 1/2 ||L^{(l)} δx − b||^2_{D^{−1}} + 1/2 ||H^{(l)} δx − d||^2_{R^{−1}},    (5)

where ||a||^2_{A^{−1}} = a^T A^{−1} a. We omit the superscript (l) for the outer iteration in the subsequent discussions. The minimum of (5) can be found by solving linear systems. We discuss different formulations of these in the next three subsections.
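For small problems the block structure of these 4D matrices can be assembled explicitly. The following is a minimal sketch (not the authors' code; all sizes, block names, and randomly generated blocks are illustrative assumptions) of how L, H, D, and R could be built from the per-time-step blocks using SciPy sparse matrices.

```python
import numpy as np
import scipy.sparse as sp

n, N = 4, 3                       # state size and number of time steps (toy values)
p_i = [2] * (N + 1)               # observations per time t_0, ..., t_N

rng = np.random.default_rng(0)
M = [rng.standard_normal((n, n)) for _ in range(N)]        # linearised models M_0, ..., M_{N-1}
H_blocks = [rng.standard_normal((pi, n)) for pi in p_i]    # linearised observation operators H_i
B = np.eye(n)                                              # background error covariance
Q = [0.1 * np.eye(n) for _ in range(N)]                    # model error covariances Q_0, ..., Q_{N-1}
R_blocks = [0.01 * np.eye(pi) for pi in p_i]               # observation error covariances R_i

# L is block lower bidiagonal: identity blocks on the diagonal, -M_i below it.
L = sp.block_diag([sp.identity(n)] * (N + 1), format="lil")
for i in range(N):
    L[(i + 1) * n:(i + 2) * n, i * n:(i + 1) * n] = -M[i]
L = L.tocsr()

H = sp.block_diag(H_blocks, format="csr")    # p x (N+1)n
D = sp.block_diag([B] + Q, format="csr")     # (N+1)n x (N+1)n
R = sp.block_diag(R_blocks, format="csr")    # p x p

print(L.shape, H.shape, D.shape, R.shape)
```

In large applications these matrices would not be formed explicitly; only operators that apply the individual blocks (and their adjoints) are needed, as discussed for the matrix-vector products below.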

3 × 3 block saddle point formulation
In pursuance of exploiting parallel computations in data assimilation, Fisher and Gürol 12 proposed obtaining the state increment δx by solving a saddle point system (see also Freitag and Green 14 ). New variables are introduced,

λ = D^{−1}(b − L δx),    (6)
μ = R^{−1}(d − H δx).    (7)

The gradient of the cost function (5) with respect to δx provides the optimality constraint

L^T λ + H^T μ = 0.    (8)

Multiplying (6) by D and (7) by R, together with (8), yields a coupled linear system of equations:

( D     0     L  ) ( λ  )   ( b )
( 0     R     H  ) ( μ  ) = ( d ),    (9)
( L^T   H^T   0  ) ( δx )   ( 0 )

where the coefficient matrix is given by

𝒜_3 = ( D  0  L ;  0  R  H ;  L^T  H^T  0 ) ∈ R^{(2(N+1)n+p)×(2(N+1)n+p)}.    (10)

𝒜_3 is a sparse symmetric indefinite saddle point matrix that has a 3 × 3 block form. Such systems are explored in the optimization literature. 20,21,29 When solving these systems iteratively, it is usually assumed that calculations involving the blocks on the diagonal are computationally expensive, while the off-diagonal blocks are cheap to apply and easily approximated. However, in our application, operations with the diagonal blocks are relatively cheap and the off-diagonal blocks are computationally expensive, particularly because of the integrations of the model and its adjoint in L and L^T.
Recall that the sizes of the blocks R, H, and H^T depend on the number of observations p. Thus, the size of 𝒜_3 and possibly some of its characteristics are also affected by p. The saddle point systems that arise in different outer loops vary in the right-hand sides and the linearization states of L and H.
Because of the memory requirements of sparse direct solvers, they cannot be used to solve the 3 × 3 block saddle point systems that arise in an operational setting. Iterative solvers (such as MINRES, 30 SYMMLQ, 30 GMRES 31 ) need to be used. These Krylov subspace methods require matrix-vector products with 𝒜_3 at each iteration. Note that the matrix-vector product 𝒜_3 q, with q^T = (q_1^T, q_2^T, q_3^T), q_1, q_3 ∈ R^{(N+1)n}, q_2 ∈ R^p, involves multiplying D and L^T by q_1, R and H^T by q_2, and L and H by q_3. These matrix-vector products may be performed in parallel. Furthermore, the multiplication of each component of each block matrix with the respective part of the vector q_i can be performed in parallel. The possibility of multiplying a vector with the blocks in L and L^T in parallel is particularly attractive, because the expensive operations involving the linearised model M_i and its adjoint M_i^T can be performed at the same time for every i ∈ {0,1,…,N−1}.
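As an illustration, the product 𝒜_3 q can be written directly in terms of the blocks, so that the six block products are independent of one another and could be evaluated concurrently. The sketch below (illustrative Python, not the authors' implementation; D, R, L, H may be sparse matrices or operators, and the splitting of q is the only assumption) makes this structure explicit.

```python
import numpy as np

def apply_A3(q, D, R, L, H):
    """Matrix-vector product with the 3 x 3 block matrix A_3 in (10).

    q is split as (q1, q2, q3) with q1, q3 of length (N+1)n and q2 of length p.
    The six block products below are mutually independent, so they could be
    evaluated in parallel; the N applications of M_i and M_i^T hidden inside
    L and L^T are themselves independent of each other as well.
    """
    m, p = D.shape[0], R.shape[0]
    q1, q2, q3 = q[:m], q[m:m + p], q[m + p:]
    r1 = D @ q1 + L @ q3        # first block row:  D q1 + L q3
    r2 = R @ q2 + H @ q3        # second block row: R q2 + H q3
    r3 = L.T @ q1 + H.T @ q2    # third block row:  L^T q1 + H^T q2
    return np.concatenate([r1, r2, r3])
```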

2 × 2 block saddle point formulation
The saddle point systems with 3 × 3 block coefficient matrices that arise from interior point methods are often reduced to 2 × 2 block systems. 20,21 The 2 × 2 block formulation has not been used in data assimilation before. Because of its smaller size, it may be advantageous. Therefore, we now explore using this approach in the weak constraint 4D-Var setting. Multiplying Equation (6) by D and using (7) to eliminate μ in (8) gives the following equivalent system of equations:

( D     L             ) ( λ  )   ( b             )
( L^T   −H^T R^{−1} H ) ( δx ) = ( −H^T R^{−1} d ),    (11)

where the coefficient matrix is

𝒜_2 = ( D  L ;  L^T  −H^T R^{−1} H ) ∈ R^{2(N+1)n×2(N+1)n}.    (12)

The reduced matrix 𝒜_2 is a sparse symmetric indefinite saddle point matrix with a 2 × 2 block form. Unlike the 3 × 3 block matrix 𝒜_3, its size is independent of the number of observations. 𝒜_2 involves the matrix R^{−1}, which is usually available in data assimilation applications. The computationally most expensive blocks L and L^T are again the off-diagonal blocks.
Solving (11) in parallel might be less appealing compared with solving (9) in parallel: for a Krylov subspace method, the (2, 2) block −H^T R^{−1} H need not be formed separately, that is, only operators to perform the matrix-vector products with H^T, R^{−1}, and H need to be stored. Hence, a matrix-vector product 𝒜_2 q, with q^T = (q_1^T, q_3^T), q_1, q_3 ∈ R^{(N+1)n}, requires multiplying D and L^T by q_1 and L and H by q_3 (which may be done in parallel), and subsequently multiplying R^{−1} by H q_3, followed by −H^T by R^{−1} H q_3. Hence, the cost of matrix-vector products for the 3 × 3 and 2 × 2 block formulations differs in that the former needs matrix-vector products with R while the latter requires products with R^{−1}, and the 2 × 2 block formulation requires some sequential calculations. However, notice that the expensive calculations that involve applying the operators L and L^T (the linearised model and its adjoint) can still be performed in parallel.

1 × 1 block formulation
The 2 × 2 block system can be further reduced to a 1 × 1 block system, that is, to the standard formulation (see, eg, Trémolet 8 and A. El-Said 10 for a more detailed consideration):

(L^T D^{−1} L + H^T R^{−1} H) δx = L^T D^{−1} b + H^T R^{−1} d,    (13)

with the coefficient matrix

𝒜_1 = L^T D^{−1} L + H^T R^{−1} H ∈ R^{(N+1)n×(N+1)n}.    (14)

Observe that the coefficient matrix 𝒜_1 is the negative Schur complement of D in 𝒜_2.
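For a small dense example it is straightforward to check numerically that the three formulations (9), (11), and (13) produce the same increment δx and that 𝒜_1 is the negative Schur complement of D in 𝒜_2. The following sketch (illustrative Python with random stand-in blocks, not the authors' code) performs this check.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 6, 4                                # m plays the role of (N+1)n; both sizes are illustrative
L = np.tril(rng.standard_normal((m, m))) + 2.0 * np.eye(m)   # nonsingular stand-in for L
H = rng.standard_normal((p, m))
D = np.diag(rng.uniform(0.5, 1.5, m))      # SPD stand-ins for the covariances
R = np.diag(rng.uniform(0.1, 0.5, p))
b = rng.standard_normal(m)
d = rng.standard_normal(p)

# 3 x 3 block system (9).
A3 = np.block([[D, np.zeros((m, p)), L],
               [np.zeros((p, m)), R, H],
               [L.T, H.T, np.zeros((m, m))]])
dx3 = np.linalg.solve(A3, np.concatenate([b, d, np.zeros(m)]))[m + p:]

# 2 x 2 block system (11).
Rinv = np.linalg.inv(R)
A2 = np.block([[D, L], [L.T, -H.T @ Rinv @ H]])
dx2 = np.linalg.solve(A2, np.concatenate([b, -H.T @ Rinv @ d]))[m:]

# 1 x 1 block system (13); A1 is the negative Schur complement of D in A2.
Dinv = np.linalg.inv(D)
A1 = L.T @ Dinv @ L + H.T @ Rinv @ H
dx1 = np.linalg.solve(A1, L.T @ Dinv @ b + H.T @ Rinv @ d)

print(np.allclose(dx3, dx2), np.allclose(dx2, dx1))    # both True
```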

EIGENVALUES OF THE SADDLE POINT FORMULATIONS
One factor that influences the rate of convergence of Krylov subspace iterative solvers for symmetric systems is the spectrum of the coefficient matrix (see, eg, section 9 in the survey article 18 and Lectures 35 and 38 in the textbook 33 for a discussion). Simoncini and Szyld 34 have shown that any eigenvalues of the saddle point systems that lie close to zero can cause the iterative solver MINRES to stagnate for a number of iterations while the rate of convergence can improve if the eigenvalues are clustered.
In the following, we examine how the eigenvalues of the block matrices 𝒜_3, 𝒜_2, and 𝒜_1 change when new observations are added. This is done by considering the shift in the extreme eigenvalues of these matrices, that is, the smallest and largest positive and negative eigenvalues. We then present bounds for the eigenvalues of these matrices.

Preliminaries
In order to determine how changing the number of observations influences the spectra of 𝒜_3, 𝒜_2, and 𝒜_1, we explore the extreme singular values and eigenvalues of some blocks in 𝒜_3, 𝒜_2, and 𝒜_1. We state two theorems that we will require. Here, we employ the notation λ_k(A) to denote the kth largest eigenvalue of a matrix A, and subscripts min and max are used to denote the smallest and largest eigenvalues, respectively.

Theorem 1 (Weyl's inequalities). If A and E are n × n Hermitian matrices, then λ_k(A) + λ_min(E) ≤ λ_k(A + E) ≤ λ_k(A) + λ_max(E), k = 1, …, n.
Theorem 2 (Cauchy's Interlace Theorem, see Theorem 4.2 in Chapter 4 of Stewart and Sun 36 ). If A is an n × n Hermitian matrix and C is a (n−1) × (n−1) principal submatrix of A (a matrix obtained by eliminating a row and a corresponding column of A), then λ_k(A) ≥ λ_k(C) ≥ λ_{k+1}(A), k = 1, …, n−1. In the following lemmas we describe how the smallest and largest singular values of (L^T H^T) (here L and H are as defined in Section 2.2) and the extreme eigenvalues of the observation error covariance matrix R change when new observations are introduced. The same is done for the largest eigenvalue of H^T R^{−1} H assuming that R is diagonal. In these lemmas the subscript k ∈ {0,1,…,(N+1)n−1} denotes the number of available observations and the subscript k+1 indicates that a new observation is added to the system with k observations, that is, matrices R_k ∈ R^{k×k} and H_k ∈ R^{k×(N+1)n} correspond to a system with k observations and R_{k+1} and H_{k+1} correspond to the system with an additional observation.
Lemma 1. Let σ_min(L^T H_k^T) and σ_max(L^T H_k^T) be the smallest and largest singular values of (L^T H_k^T), and σ_min(L^T H_{k+1}^T) and σ_max(L^T H_{k+1}^T) be the smallest and largest singular values of (L^T H_{k+1}^T). Then σ_min(L^T H_k^T) ≤ σ_min(L^T H_{k+1}^T) and σ_max(L^T H_k^T) ≤ σ_max(L^T H_{k+1}^T).

Proof. The squared singular values of (L^T H_{k+1}^T) are the eigenvalues of (L^T H_{k+1}^T)(L^T H_{k+1}^T)^T = L^T L + H_k^T H_k + h_{k+1} h_{k+1}^T, where h_{k+1}^T ∈ R^{1×(N+1)n} is the row of H_{k+1} corresponding to the new observation. Then by Theorem 1, and since h_{k+1} h_{k+1}^T is a rank 1 symmetric positive semidefinite matrix, λ_min(h_{k+1} h_{k+1}^T) = 0, the smallest squared singular value cannot decrease. The proof for the largest singular values is analogous. ▪

Lemma 2. Consider the observation error covariance matrices R_k ∈ R^{k×k} and R_{k+1} ∈ R^{(k+1)×(k+1)}. Then

λ_min(R_{k+1}) ≤ λ_min(R_k)  and  λ_max(R_{k+1}) ≥ λ_max(R_k),

that is, the largest (respectively, smallest) eigenvalue of R increases (respectively, decreases), or is unchanged when new observations are introduced.

Proof. When adding an observation, a row and a corresponding column are appended to R_k while the other entries of R_k are unchanged. The result is immediate by applying Theorem 2. ▪

Lemma 3. If the observation errors are uncorrelated, that is, R is diagonal, then λ_max(H_{k+1}^T R_{k+1}^{−1} H_{k+1}) ≥ λ_max(H_k^T R_k^{−1} H_k).

Proof. The proof is similar to that of Lemma 1. For diagonal R_{k+1}:

H_{k+1}^T R_{k+1}^{−1} H_{k+1} = H_k^T R_k^{−1} H_k + r_{k+1}^{−1} h_{k+1} h_{k+1}^T,

where r_{k+1} > 0 is the error variance of the new observation. Then r_{k+1}^{−1} h_{k+1} h_{k+1}^T is a rank 1 symmetric positive semidefinite matrix with λ_min(r_{k+1}^{−1} h_{k+1} h_{k+1}^T) = 0. Hence, by Theorem 1, the largest eigenvalue cannot decrease. ▪
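Lemmas 1–3 are easy to verify numerically on a small random example. The sketch below (illustrative Python with toy sizes; the new observation row h_new and its variance r_new are hypothetical) appends one observation and checks that the extreme singular values of (L^T H^T), the extreme eigenvalues of R, and the largest eigenvalue of H^T R^{−1} H behave as stated.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 8, 5                                    # m plays the role of (N+1)n, k observations so far
L = np.tril(rng.standard_normal((m, m))) + 2 * np.eye(m)
Hk = rng.standard_normal((k, m))
Rk = np.diag(rng.uniform(0.1, 1.0, k))

h_new = rng.standard_normal(m)                 # row of H for the new observation
r_new = 0.5                                    # its error variance
Hk1 = np.vstack([Hk, h_new])
Rk1 = np.diag(np.append(np.diag(Rk), r_new))

def extreme_svals(H):
    s = np.linalg.svd(np.hstack([L.T, H.T]), compute_uv=False)   # singular values of (L^T H^T)
    return s.min(), s.max()

print(extreme_svals(Hk), extreme_svals(Hk1))                     # both extremes grow or stay (Lemma 1)
print(np.diag(Rk).min() >= np.diag(Rk1).min(),
      np.diag(Rk).max() <= np.diag(Rk1).max())                   # Lemma 2 (R diagonal here)
lam_max = lambda H, R: np.linalg.eigvalsh(H.T @ np.linalg.inv(R) @ H).max()
print(lam_max(Hk, Rk) <= lam_max(Hk1, Rk1) + 1e-12)              # Lemma 3
```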

Notation
In the following, we use the notation given in Table 1 for the eigenvalues of 𝒜_3, 𝒜_2, and 𝒜_1, and the eigenvalues and singular values of the blocks within them. We use subscripts min and max to denote the smallest and largest eigenvalues or singular values, respectively. In particular, λ_min(D) and λ_max(D) denote the extreme eigenvalues of D, λ_min(R) and λ_max(R) those of R, μ_min and μ_max the extreme eigenvalues of H^T R^{−1} H, and s_min and s_max the extreme singular values of L. We also use

σ_min = σ_min(L^T  H^T),    (15)
σ_max = σ_max(L^T  H^T),    (16)

the smallest and largest singular values of (L^T H^T). The smallest nonzero singular value of (L^T H^T) is also used. In addition, ||⋅|| denotes the L_2 norm.

Bounds for the 3 × 3 block formulation
To determine the numbers of positive and negative eigenvalues of 𝒜_3 given in (10), we write 𝒜_3 as a congruence transformation

𝒜_3 = ( I  0 ;  C^T A^{−1}  I ) ( A  0 ;  0  −C^T A^{−1} C ) ( I  A^{−1} C ;  0  I ),

where A = diag(D, R) and C^T = (L^T  H^T). Since D and R are symmetric positive definite and C has full column rank (L is nonsingular), C^T A^{−1} C = L^T D^{−1} L + H^T R^{−1} H is symmetric positive definite, and by Sylvester's law of inertia 𝒜_3 has (N+1)n + p positive and (N+1)n negative eigenvalues. To account for an additional observation, a row and a corresponding column are added to 𝒜_{3,k}, where 𝒜_{3,k} denotes 𝒜_3 corresponding to k observations; that is, 𝒜_{3,k} is a principal submatrix of 𝒜_{3,k+1}. This gives the following result.

Theorem 3. Let λ_j(𝒜_{3,k}), j = 1, …, 2(N+1)n + k, be the eigenvalues of 𝒜_{3,k}, and λ_j(𝒜_{3,k+1}), j = 1, …, 2(N+1)n + k + 1, be the eigenvalues of 𝒜_{3,k+1}. Then by Theorem 2:

λ_j(𝒜_{3,k+1}) ≥ λ_j(𝒜_{3,k}) ≥ λ_{j+1}(𝒜_{3,k+1}),  j = 1, …, 2(N+1)n + k.

To obtain information on not only how the eigenvalues of 𝒜_3 change because of new observations, but also on where the eigenvalues lie when the number of observations is fixed, we formulate intervals for the negative and positive eigenvalues of 𝒜_3.

Theorem 4. The negative eigenvalues of 𝒜_3 lie in the interval

[ 1/2 ( λ_min − √(λ_min^2 + 4σ_max^2) ),  1/2 ( λ_max − √(λ_max^2 + 4σ_min^2) ) ],    (17)

and the positive eigenvalues lie in the interval

[ λ_min,  1/2 ( λ_max + √(λ_max^2 + 4σ_max^2) ) ],    (18)

where σ_min and σ_max are defined in (15) and (16), and λ_min = min{λ_min(D), λ_min(R)} and λ_max = max{λ_max(D), λ_max(R)} are the extreme eigenvalues λ_i of the block diagonal matrix diag(D, R) (Table 1).

Proof. The intervals follow by applying the bounds of Rusten and Winther 19 to 𝒜_3, whose leading block diag(D, R) is symmetric positive definite and whose off-diagonal block (L^T H^T) has full row rank, because L is nonsingular. ▪

Corollary 1. When new observations are added, the upper bound for the positive eigenvalues of 𝒜_3 in (18) increases or is unchanged, and the lower bound decreases or is unchanged.

Proof. By Lemma 1, σ_max^2 increases or is unchanged when additional observations are included. If λ_max = λ_max(R), the same holds for λ_max (by Lemma 2). If λ_max = λ_max(D), changing the number of observations does not affect λ_max. Hence, the positive upper bound increases or is unchanged. The positive lower bound λ_min is unaltered if λ_min = λ_min(D). If λ_min = λ_min(R), the bound decreases or is unchanged by Lemma 2. ▪

Corollary 2. If λ_max = λ_max(D), the upper bound for the negative eigenvalues of 𝒜_3 in (17) is either unchanged or moves away from zero when new observations are added. If λ_min = λ_min(D), the same holds for the lower bound for negative eigenvalues in (17).

Proof. The results follow from the facts that λ_max(D) and λ_min(D) do not change if observations are added, whereas σ_min and σ_max increase or are unchanged by Lemma 1. ▪

If λ_max = λ_max(R) or λ_min = λ_min(R), it is unclear how the interval for the negative eigenvalues in (17) changes when new observations are added.
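The Theorem 4 intervals are straightforward to evaluate numerically. The sketch below (illustrative Python with random stand-in blocks; this is not the experimental setup of Section 4) computes the bounds from the extreme eigenvalues of diag(D, R) and the extreme singular values of (L^T H^T) and compares them with the eigenvalues of a small dense 𝒜_3.

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 6, 4
L = np.tril(rng.standard_normal((m, m))) + 2 * np.eye(m)
H = rng.standard_normal((p, m))
D = np.diag(rng.uniform(0.5, 1.5, m))
R = np.diag(rng.uniform(0.1, 0.5, p))

A = np.block([[D, np.zeros((m, p))], [np.zeros((p, m)), R]])   # leading block diag(D, R)
C = np.vstack([L, H])                                          # (L^T H^T)^T
A3 = np.block([[A, C], [C.T, np.zeros((m, m))]])

lam = np.linalg.eigvalsh(A)
lam_min, lam_max = lam.min(), lam.max()
s = np.linalg.svd(C, compute_uv=False)
s_min, s_max = s.min(), s.max()

neg_lo = 0.5 * (lam_min - np.sqrt(lam_min**2 + 4 * s_max**2))  # lower end of (17)
neg_up = 0.5 * (lam_max - np.sqrt(lam_max**2 + 4 * s_min**2))  # upper end of (17)
pos_lo = lam_min                                               # lower end of (18)
pos_up = 0.5 * (lam_max + np.sqrt(lam_max**2 + 4 * s_max**2))  # upper end of (18)

eig = np.linalg.eigvalsh(A3)
neg, pos = eig[eig < 0], eig[eig > 0]
print(neg_lo <= neg.min(), neg.max() <= neg_up, pos_lo <= pos.min(), pos.max() <= pos_up)
```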

Bounds for the 2 × 2 block formulation

𝒜_2 given in (12) is equal to the following congruence transformation:

𝒜_2 = ( I  0 ;  L^T D^{−1}  I ) ( D  0 ;  0  −L^T D^{−1} L − H^T R^{−1} H ) ( I  D^{−1} L ;  0  I ),

where I ∈ R^{(N+1)n×(N+1)n} is the identity matrix. Then by Sylvester's law, 𝒜_2 has (N+1)n positive and (N+1)n negative eigenvalues. The change of the extreme negative and positive eigenvalues of 𝒜_2 due to the additional observations is analyzed in the subsequent theorem. However, the result holds only in the case of uncorrelated observation errors, unlike the general analysis for 𝒜_3 in Theorem 3.

Theorem 5. If the observation errors are uncorrelated, that is, R is diagonal, then every eigenvalue of 𝒜_2 decreases or is unchanged when a new observation is added; in particular, the negative eigenvalues move away from zero or are unchanged and the positive eigenvalues approach zero or are unchanged.

Proof. For a diagonal R_{k+1},

𝒜_{2,k+1} = 𝒜_{2,k} + ( 0  0 ;  0  −r_{k+1}^{−1} h_{k+1} h_{k+1}^T ),

where 𝒜_{2,k} and 𝒜_{2,k+1} denote 𝒜_2 with k and k + 1 observations, r_{k+1} > 0 is the error variance of the new observation, and h_{k+1}^T is the corresponding row of H_{k+1}. The second term is a rank 1 symmetric matrix, which has negative and zero eigenvalues. Let λ_j(𝒜_{2,k}) be the eigenvalues of 𝒜_{2,k}, and λ_j(𝒜_{2,k+1}) be the eigenvalues of 𝒜_{2,k+1}. By Theorem 1, λ_j(𝒜_{2,k+1}) ≤ λ_j(𝒜_{2,k}). ▪

We further search for the intervals in which the negative and positive eigenvalues of 𝒜_2 lie. We follow a similar line of thought as in Silvester and Wathen, 37 with the energy arguments

λ_min(D) ||w||^2 ≤ w^T D w ≤ λ_max(D) ||w||^2,    (19)
s_min ||w|| ≤ ||L^T w|| ≤ s_max ||w||,    (20)
μ_min ||w||^2 ≤ w^T H^T R^{−1} H w ≤ μ_max ||w||^2,    (21)
σ_min ||w|| ≤ ||(L^T  H^T)^T w|| ≤ σ_max ||w||,    (22)

for any nonzero vector w ∈ R^{(N+1)n}.

Theorem 6. The negative eigenvalues of 𝒜_2 lie in the interval

[ 1/2 ( λ_min(D) − μ_max − √((λ_min(D) + μ_max)^2 + 4 s_max^2) ),  min{ τ_1, max{τ_2, τ_3} } ],    (23)

where

τ_1 = 1/2 ( λ_max(D) − μ_min − √((λ_max(D) + μ_min)^2 + 4 s_min^2) ),    (24)
τ_2 = −σ_min^2 / λ_max(R),    (25)
τ_3 = 1/2 ( λ_max(D) − √(λ_max(D)^2 + 4 σ_min^2) ),    (26)

and the positive ones lie in the interval

[ 1/2 ( λ_min(D) − μ_max + √((λ_min(D) + μ_max)^2 + 4 s_min^2) ),  1/2 ( λ_max(D) − μ_min + √((λ_max(D) + μ_min)^2 + 4 s_max^2) ) ].    (27)

Proof. Assume that (u^T, v^T)^T, u, v ∈ R^{(N+1)n}, is an eigenvector of 𝒜_2 with an eigenvalue λ. Then the eigenvalue equations are

D u + L v = λ u,    (28)
L^T u − H^T R^{−1} H v = λ v.    (29)

We note that if u = 0 then v = 0 by (28) and if v = 0 then u = 0 by (29). Hence, u, v ≠ 0. First, we consider λ > 0. Equation (29) gives v = (λ I + H^T R^{−1} H)^{−1} L^T u, where I ∈ R^{(N+1)n×(N+1)n}. The matrix λ I + H^T R^{−1} H is positive definite, hence nonsingular. We multiply (28) by u^T and use the previous expression for v to get

λ ||u||^2 = u^T D u + u^T L (λ I + H^T R^{−1} H)^{−1} L^T u.    (30)

The eigenvalues of (λ I + H^T R^{−1} H)^{−1} in increasing order are (λ + μ_max)^{−1}, …, (λ + μ_min)^{−1}. Then, by (20) and (21),

(λ + μ_max)^{−1} s_min^2 ||u||^2 ≤ u^T L (λ I + H^T R^{−1} H)^{−1} L^T u ≤ (λ + μ_min)^{−1} s_max^2 ||u||^2.
Hence, this inequality together with the lower bound in (19) and (30) gives

λ ||u||^2 ≥ λ_min(D) ||u||^2 + (λ + μ_max)^{−1} s_min^2 ||u||^2,

whose solution gives the lower bound for the positive eigenvalues in (27). Similarly, using the upper bound from (19) and employing (30) yields the upper bound in (27).

Now consider the case λ < 0. Since D − λ I is positive definite, from (28) u = −(D − λ I)^{−1} L v. Using this expression and multiplying (29) by v^T gives

−λ ||v||^2 = v^T L^T (D − λ I)^{−1} L v + v^T H^T R^{−1} H v.    (31)

Then using (20), (21) and the fact that the smallest eigenvalue of (D − λ I)^{−1} is (λ_max(D) − λ)^{−1} results in the inequality

−λ ||v||^2 ≥ s_min^2 ||v||^2 (λ_max(D) − λ)^{−1} + μ_min ||v||^2,    (32)

which can be expressed as λ^2 + (μ_min − λ_max(D)) λ − (s_min^2 + μ_min λ_max(D)) ≥ 0, and its solution gives the upper bound τ_1 in (24). Notice that the bound (32) takes into account information on observations only if the system is fully observed. Otherwise, p < (N+1)n and μ_min = 0.

We obtain an alternative upper bound for the negative eigenvalues that depends on the observational information and might be useful for the fully observed case, too. Equation (31) may be written as

−λ ||v||^2 = ( (L^T  H^T)^T v )^T ( (D − λ I)^{−1}  0 ;  0  R^{−1} ) ( (L^T  H^T)^T v ).

Eigenvalues of the 2 × 2 block matrix in the previous equation are the eigenvalues of (D − λ I)^{−1} and R^{−1}. Thus, by an energy argument using (19) and (22),

−λ ||v||^2 ≥ min{ (λ_max(D) − λ)^{−1}, λ_max(R)^{−1} } σ_min^2 ||v||^2.    (33)

Hence, if the minimum is attained by λ_max(R)^{−1}, then λ ≤ −σ_min^2/λ_max(R) = τ_2; if it is attained by (λ_max(D) − λ)^{−1}, then λ^2 − λ_max(D) λ − σ_min^2 ≥ 0 and λ ≤ τ_3. Hence, λ ≤ max{τ_2, τ_3}.

The required upper bound follows from (32) and (33). Next, we obtain the lower bound for the negative eigenvalues. Using Equation (31) with the largest eigenvalue of (D − λ I)^{−1}, which is (λ_min(D) − λ)^{−1}, and the other parts of (20) and (21), we obtain

−λ ||v||^2 ≤ (λ_min(D) − λ)^{−1} s_max^2 ||v||^2 + μ_max ||v||^2,

whose solution gives the lower bound for the negative eigenvalues in (23).

▪
We observe that if the system is not fully observed, then p < (N+1)n and μ_min = 0, and the upper bound for the positive eigenvalues and the upper bound (24) for the negative eigenvalues in Theorem 6 reduce to (2.11) and (2.13) of Silvester and Wathen. 37 We are interested in how the bounds in Theorem 6 change if additional observations are introduced. The change to the upper negative bound in (23) depends on which of (24), (25), or (26) gives the bound. Hence, in Corollary 3 we comment on when (26) is larger than (25), and Corollary 4 describes a setting when the negative upper bound is given by (26).
Corollary 5. If the upper bound for the negative eigenvalues of 𝒜_2 in (23) is given by (24) or (26), it moves away from zero or stays the same when additional observations are introduced.

Proof. τ_1 does not change while the system is not fully observed. When the system becomes fully observed, μ_min > 0 and τ_1 decreases. τ_3 decreases or stays the same by Lemma 1. ▪

Note that if the negative upper bound in (23) is given by τ_2 in (25), it is unclear how the bound changes with the number of observations, since both λ_max(R) and σ_min^2 increase or stay the same. The same is true for the positive bounds in (27): only μ_max and μ_min depend on the available observations, and they appear in terms with positive and negative signs.
The result in Corollary 5, which applies for 𝒜_2 with a general R, is consistent with the result in Theorem 5, which considers 𝒜_2 with a diagonal R. The same holds for the result in the following corollary, which determines how the lower bound for the negative eigenvalues of 𝒜_2 changes in the special case of uncorrelated observational errors.

Corollary 6. If the observation error covariance matrix R is diagonal, the negative lower bound in (23) moves away from zero or stays the same when additional observations are introduced.

Proof. The result follows by applying Lemma 3 to see how μ_max changes. ▪

In the following corollary, we consider the intervals for the positive eigenvalues of 𝒜_3 and 𝒜_2 with a fixed number of observations. It suggests that we may expect the positive eigenvalues of 𝒜_2 to be more clustered than those of 𝒜_3.

Corollary 7.
The interval for the positive eigenvalues of 𝒜_2 in (27) is contained in the interval for the positive eigenvalues of 𝒜_3 in (18).

Proof. We first show that the upper bound for the positive eigenvalues of 𝒜_3 is larger than the upper bound for the positive eigenvalues of 𝒜_2. Squaring both sides and simplifying reduces this to an inequality that always holds, because its left-hand side is positive and its right-hand side is negative. We then show that the lower bound for the positive eigenvalues of 𝒜_3 is smaller than the lower bound for the positive eigenvalues of 𝒜_2. Note that by definition λ_min ≤ λ_min(D), and the lower bound of the interval in (27) is at least λ_min(D), which gives the result. ▪

Bounds for the 1 × 1 block formulation
The system matrix  1 given by (14) is symmetric positive definite and so its eigenvalues are positive. We determine how these change due to additional observations when the observation errors are uncorrelated (as for the extreme eigenvalues of  2 in Theorem 5).

Theorem 7.
If the observation errors are uncorrelated, that is, R is diagonal, then the eigenvalues of 𝒜_1 move away from zero or are unchanged when new observations are added.
Proof. For a diagonal R_{k+1}, adding an observation adds the rank 1 symmetric positive semidefinite term r_{k+1}^{−1} h_{k+1} h_{k+1}^T to 𝒜_1, as in the proof of Lemma 3. The result follows by applying Theorem 1. ▪ We formulate spectral bounds for 𝒜_1 that depend on the largest and smallest eigenvalues of D and R, and the largest and smallest singular values of (L^T H^T).

Theorem 8. The eigenvalues of  1 lie in the interval
where i and i are defined in Table 1, and (15) and (16).
Proof. Assume that u ∈ R (N+1)n is an eigenvector of  1 . Then the eigenvalue equation premultiplied by u T can be written as where is an eigenvalue of  1 . The smallest and largest eigenvalues of are −1 max and −1 min , respectively. The bounds follow from the following inequalities that are obtained using (22): The following corollary explains how the upper bound for the eigenvalues of  1 changes with the addition of new observations. This result that applies for  1 with a general R is consistent with Theorem 7 that considers  1 with a diagonal R.

Corollary 8. The upper bound in Theorem 8 moves away from zero or is unchanged when new observations are added.
Proof. If λ_min = λ_min(R), λ_min decreases or is unchanged by Lemma 2. Otherwise λ_min does not change. The result follows by applying Lemma 1 to determine the change to σ_max. ▪ It is unclear how the lower bound in Theorem 8 changes with respect to the number of observations, because both the numerator and denominator grow or stay unchanged by Lemmas 1 and 2, respectively.
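The Theorem 8 interval is also easy to evaluate for a small example. The sketch below (illustrative Python with random stand-in blocks, not the authors' code) forms 𝒜_1 explicitly and checks that its spectrum lies in [σ_min^2/λ_max, σ_max^2/λ_min].

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 6, 4
L = np.tril(rng.standard_normal((m, m))) + 2 * np.eye(m)
H = rng.standard_normal((p, m))
D = np.diag(rng.uniform(0.5, 1.5, m))
R = np.diag(rng.uniform(0.1, 0.5, p))

A1 = L.T @ np.linalg.inv(D) @ L + H.T @ np.linalg.inv(R) @ H
eig = np.linalg.eigvalsh(A1)

lam = np.concatenate([np.diag(D), np.diag(R)])        # eigenvalues of diag(D, R) (both diagonal here)
s = np.linalg.svd(np.vstack([L, H]), compute_uv=False) # singular values of (L^T H^T)
lower, upper = s.min()**2 / lam.max(), s.max()**2 / lam.min()
print(lower <= eig.min() + 1e-12, eig.max() <= upper + 1e-12)   # both True
```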

Theorem 9 (From Theorem 1 (c) of Axelsson and Neytcheva 38 ). The negative and positive eigenvalues of 𝒜_3 lie in intervals that are expressed in terms of the extreme eigenvalues of D and R and the extreme singular values of (L^T H^T).
Note that the lower bound for the positive eigenvalues in Theorem 9 is the same as in Theorem 4.

Theorem 10 (From Theorem 1 (a) and (b) of Axelsson and Neytcheva 38 ). The negative and positive eigenvalues of 𝒜_2 lie in intervals that are expressed in terms of the extreme eigenvalues of D and of H^T R^{−1} H and the singular values of L.
We observe that the bound for the positive eigenvalues in Theorem 10, unlike our bound in Theorem 6, is independent of the number of observations. In addition, in practical applications it may not be possible to compute the upper bound for the negative eigenvalues because of the term involving the smallest nonzero singular value.

NUMERICAL EXPERIMENTS
System setup
We present results of numerical experiments using the Lorenz 96 model, 39 where the state of the system at time t_i is x_i = (X_1^i, X_2^i, …, X_n^i)^T and the evolution of its components X_j, j ∈ {1,2,…,n}, is governed by a set of n coupled ODEs:

dX_j/dt = −X_{j−2} X_{j−1} + X_{j−1} X_{j+1} − X_j + F,

where X_{−1} = X_{n−1}, X_0 = X_n and X_{n+1} = X_1. This model is continuous in time and discrete in space. We assume that X_1, X_2, …, X_n are equally spaced on a periodic domain of length one and take the space increment to be ΔX = 1/n. We require the linearization of this model, M_i^{(l)}, i ∈ {0,…,N−1}, to define 𝒜_3, 𝒜_2, and 𝒜_1. In our experiments, we set n = 40 and F = 8, since the system shows chaotic behavior with the latter value. The equations are integrated using a fourth-order Runge-Kutta scheme. 40 The time step is set to Δt = 2.5×10^{−2} and the system is run for N = 15 time steps.
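For reference, the model and the time stepping are simple to reproduce. The following is a minimal Python sketch (the experiments in this article use Matlab; this is an illustrative re-implementation, and the initial perturbation is an assumption) of the Lorenz 96 right-hand side and the fourth-order Runge-Kutta step with Δt = 2.5×10^{−2}.

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Right-hand side dX_j/dt = (X_{j+1} - X_{j-2}) X_{j-1} - X_j + F (periodic in j)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=2.5e-2, F=8.0):
    """One fourth-order Runge-Kutta step of length dt."""
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x + 0.5 * dt * k2, F)
    k4 = lorenz96(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

n, N = 40, 15
x = 8.0 + np.zeros(n)
x[0] += 0.01                      # small perturbation of the F = 8 equilibrium (assumed start)
traj = [x]
for _ in range(N):
    traj.append(rk4_step(traj[-1]))
traj = np.array(traj)             # (N+1) x n array of states x_0, ..., x_N
```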
The assimilation system is set up for so-called identical twin experiments, by which synthetic data are generated using the same model as is used in the assimilation. We generate a reference, or "true," model trajectory x^t by running the Lorenz 96 model over the time window from prescribed initial conditions and with prescribed Gaussian model errors η_i. An initial background state x_b and observations y_i at each time t_i are then generated by adding Gaussian noise to x^t. Assimilation experiments are run using this background state and observations, assuming that the true state is unknown. The error covariance matrices that are used to generate the model error in x^t and the observation error in y_i are also used for the assimilation, that is, in the 3 × 3 block, 2 × 2 block, and 1 × 1 block matrices. These error covariance matrices do not change over time. The observation error covariance matrix is R_i = σ_o^2 I_{p_i}, where p_i is the number of observations at time t_i (diagonal R_i is a common choice in data assimilation experiments 13,14 ), and the model error covariance matrix is equal to the background error covariance matrix, that is, Q_i = B = σ_b^2 C_b, where C_b is a second-order auto-regressive correlation matrix 41 with correlation length scale 1.5×10^{−2}. We have also performed numerical experiments with Q_i = σ_q^2 C_q ≠ B, where C_q is a Laplacian correlation matrix, 42 and σ_q and σ_b vary by a factor of two. We observed similar results to those presented here. In our experiments, the parameters are chosen so that the observations are close to the real values of the variables, and the background and the model errors are low; in particular, we set σ_o = 10^{−1}, which is about 5% of the mean of the values in x^t, and σ_b = 5×10^{−2}. y_i consists of direct observations of the variables X_j, j ∈ {1,2,…,n}, at time t_i, hence the observation operator ℋ_i is linear.
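A sketch of this covariance setup is given below (illustrative Python; the particular SOAR-type correlation function, the jitter, and the Cholesky sampling are assumptions for the sketch and need not match the article's exact construction of C_b).

```python
import numpy as np

n = 40
dX, Lscale = 1.0 / n, 1.5e-2
sigma_o, sigma_b = 1e-1, 5e-2

# Distances on the periodic unit domain and a SOAR-type correlation C_b (assumed form).
idx = np.arange(n)
gap = np.abs(idx[:, None] - idx[None, :])
dist = np.minimum(gap, n - gap) * dX
C_b = (1.0 + dist / Lscale) * np.exp(-dist / Lscale)

B = sigma_b**2 * C_b                          # background error covariance
Q = B.copy()                                  # model error covariance, Q_i = B in the main experiments
R_i = lambda p_i: sigma_o**2 * np.eye(p_i)    # observation error covariance at time t_i

rng = np.random.default_rng(5)
chol_B = np.linalg.cholesky(B + 1e-10 * np.eye(n))   # small jitter for numerical definiteness
eta = chol_B @ rng.standard_normal(n)                # one draw of Gaussian model error
```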
All computations are performed using Matlab R2016b. In particular, the eigenvalues are computed using the Matlab function eig. If only extreme eigenvalues are needed, eigs is used, and the extreme singular values are given by svds.

Eigenvalue bounds
We present numerically calculated eigenvalue bounds and eigenvalues of 𝒜_3, 𝒜_2, and 𝒜_1, and illustrate how they change with the number of observations, as well as the quality of the spectral estimates presented in Section 3. We consider the following observation networks that have different numbers of observations (p = Σ_{i=0}^{N} p_i):
a. 1 observation at the final time t_15,
b. 20 observations, observing every eighth model variable at every fourth time step (at times t_3, t_7, t_11, t_15),
c. 80 observations, observing every fourth model variable at every second time step (at times t_1, t_3, t_5, t_7, t_9, t_11, t_13, t_15),
d. 160 observations, observing every second model variable at every second time step (at the same times as in observation network c),
e. 320 observations, observing every second model variable at every time step,
f. 640 observations, fully observed system.
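For networks of this regular form, the linear observation operator H is a selection matrix. The sketch below (an illustrative, hypothetical helper in Python; build_H and its parameters are not from the article) assembles H for networks such as b–f.

```python
import numpy as np
import scipy.sparse as sp

def build_H(n=40, N=15, space_skip=4, time_skip=2):
    """Return the p x (N+1)n operator for direct observation of every
    space_skip-th variable at every time_skip-th time step."""
    rows, cols = [], []
    r = 0
    for i in range(N + 1):
        if (i + 1) % time_skip == 0:            # observe at t_{time_skip-1}, t_{2*time_skip-1}, ...
            for j in range(0, n, space_skip):
                rows.append(r)
                cols.append(i * n + j)
                r += 1
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=(r, (N + 1) * n))

H_c = build_H(space_skip=4, time_skip=2)        # network c: 80 observations
print(H_c.shape)                                # (80, 640)
```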
In Figure 1, we plot the eigenvalues of the matrices 𝒜_3, 𝒜_2, and 𝒜_1 together with the bounds from Theorems 4, 6, and 8, respectively, for each of the observation networks a–f. In these experiments, as expected from Theorem 3, as the number of observations increases, the smallest and largest negative and the largest positive eigenvalues of 𝒜_3 move away from zero and the smallest positive eigenvalue approaches zero. In addition, as determined in Corollary 1, the upper bound for the positive eigenvalues of 𝒜_3 presented in Figure 1(I) grows and the lower bound stays the same (because the eigenvalues of R do not change) when more observations are added. The change is too small to observe in the plots, hence we report the extreme eigenvalues of 𝒜_3 and the intervals from Theorem 4 for the networks a, c, e, and f in Table 2. Moreover, the negative bounds for the eigenvalues of 𝒜_3 in Figure 1(II) move away from zero. This agrees with Corollary 2, because here λ_min = λ_min(D). However, in this setting λ_max = λ_max(R) and the same Corollary cannot be used to explain the change to the upper bound. In general, the outer bounds (the largest positive and the smallest negative) for the eigenvalues of 𝒜_3 are tight and the inner bounds (the smallest positive and the largest negative) get tighter as the number of observations increases.
The positive eigenvalues of  2 displayed in Figure 1(III) approach zero as observations are added, whereas the negative eigenvalues in Figure 1(IV) move away from it. This is consistent with Theorem 5, which holds for this experiment F I G U R E 1 Semilogarithmic plots of the positive and negative eigenvalues of the matrices  3 (I and II) and  2 (III and IV), and the positive eigenvalues of  1 in V for the different observation networks (a-f). Eigenvalues are denoted with merged blue dots. The filled black squares mark the bounds for eigenvalues of  3 in Theorem 4,  2 in Theorem 6, and  1 in Theorem 8. Note that the smallest negative eigenvalues of  2 coincide with the bounds because we have chosen diagonal R. The lower bounds for the positive and negative eigenvalues of  2 stay the same when the observation network is changed. In these bounds only max (the largest eigenvalue of H T R −1 H) depends on the observations. In our experiments, max does not change because of our choice of H and R. The constant negative lower bound is consistent with Corollary 6. The numerical values of the intervals from Theorem 6 and of the extreme eigenvalues of  2 for the networks a, c, d, and f are presented in Table 3. The upper positive bound moves toward zero when the system becomes fully observed and is constant for the other networks, because the smallest eigenvalue min of H T R −1 H is nonzero only for the fully observed system. The negative upper bound for the spectrum of  2 is given by 1 in (24) for the fully observed system and 3 in (26) otherwise, and moves away from zero, in agreement with Corollary 5. Notice that the eigenvalue bounds are tight. In addition, the numerical results confirm the statement of Corollary 7 that the interval for the positive eigenvalues of  3 contains the bounds for positive eigenvalues of  2 .
Note that  2 has p distinct eigenvalues that coincide with the negative lower bound in the plots. The distinct eigenvalues are explained by the bounds for individual eigenvalues in Corollary 9 in Appendix A, because in our experiments The eigenvalues of  1 and their bounds presented in Figure 1(V) move away from zero when more observations are used. This is as expected, because Theorem 7 holds for our choice of diagonal R. The variation of the bounds is explained by the fact that with our choice of R values of min and max do not change, and min and max grow. Such behavior of the upper bound agrees with Corollary 8. However, as can be seen in Table 4 the upper value of the intervals in Theorem 8 are too pessimistic.
Better eigenvalue clustering away from zero when more observations are used can speed up the convergence of iterative solvers when solving the 1 × 1 block formulation. However, nothing definite can be said about the 3 × 3 block and 2 × 2 block formulations: the negative eigenvalues become more clustered, but the smallest positive eigenvalues approach zero when new observations are introduced.
We also calculate the alternative eigenvalue bounds given in Theorems 9 and 10. With the choice of parameters and observations considered in this section, the bounds given in these theorems are not as sharp as those in Theorems 4 and 6. However, this is not always the case, as is illustrated in Tables 5 and 6. Here, σ_o = 1.5, σ_b = 1 and the observation network d is used.

Solving the systems
We solve the 3 × 3 block, 2 × 2 block, and 1 × 1 block systems with the coefficient matrices discussed in the previous subsection, and the right-hand sides defined in (9), (11), and (13), respectively. The saddle point systems are solved with MINRES and the symmetric positive definite systems are solved with CG. The relative residual at the jth iteration of the iterative method is defined as ||r_j||/||r_0||, where ||⋅|| is the L_2 norm and r_j is the residual at iteration j. The iterative method terminates after 400 iterations or when the relative residual reaches 10^{−4}. The initial guess is taken to be the zero vector. In Figure 2, we plot the relative residuals. Note that the relative residual reaches 10^{−4} in the fully observed case (observation network f) when solving each of the systems, and convergence is most rapid in this case. This is expected because of the clustering of the eigenvalues. The convergence rates are similar for networks d and e, which is consistent with Figure 1. The convergence of MINRES for the observation network a with a single observation is not explained by the spectra of 𝒜_3 and 𝒜_2. However, the convergence in other cases agrees with our eigenvalue analysis.
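A compact way to reproduce this comparison on a toy problem is sketched below (illustrative Python using SciPy's MINRES and CG; a recent SciPy, where the tolerance keyword is rtol, is assumed, and the random blocks are stand-ins rather than the Lorenz 96 setup).

```python
import numpy as np
from scipy.sparse.linalg import minres, cg

rng = np.random.default_rng(6)
m, p = 60, 20
L = np.tril(rng.standard_normal((m, m))) + 2 * np.eye(m)
H = rng.standard_normal((p, m))
D = np.diag(rng.uniform(0.5, 1.5, m))
R = np.diag(rng.uniform(0.1, 0.5, p))
b, d = rng.standard_normal(m), rng.standard_normal(p)

# 3 x 3 block system (9), solved with MINRES.
A3 = np.block([[D, np.zeros((m, p)), L],
               [np.zeros((p, m)), R, H],
               [L.T, H.T, np.zeros((m, m))]])
x3, info3 = minres(A3, np.concatenate([b, d, np.zeros(m)]), rtol=1e-4, maxiter=400)

# 2 x 2 block system (11), solved with MINRES.
Rinv = np.linalg.inv(R)
A2 = np.block([[D, L], [L.T, -H.T @ Rinv @ H]])
x2, info2 = minres(A2, np.concatenate([b, -H.T @ Rinv @ d]), rtol=1e-4, maxiter=400)

# 1 x 1 block system (13), solved with CG.
Dinv = np.linalg.inv(D)
A1 = L.T @ Dinv @ L + H.T @ Rinv @ H
x1, info1 = cg(A1, L.T @ Dinv @ b + H.T @ Rinv @ d, rtol=1e-4, maxiter=400)

print(info3, info2, info1)                                             # 0 means the tolerance was reached
print(np.linalg.norm(x3[m + p:] - x1), np.linalg.norm(x2[m:] - x1))    # increments agree approximately
```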

CONCLUSIONS
Weak constraint 4D-Var data assimilation requires the minimisation of a cost function in order to obtain an estimate of the state of a dynamical system. Its solution can be approximated by solving a series of linear systems. We have analyzed three different formulations of these systems, namely, the standard system with a 1 × 1 block symmetric positive definite coefficient matrix 𝒜_1, a new system with a 2 × 2 block saddle point coefficient matrix 𝒜_2, and the version with a 3 × 3 block saddle point coefficient matrix 𝒜_3 of Fisher and Gürol. 12 We have focused on the dependency of the coefficient matrices on the number of observations. We have found that the spectra of 𝒜_3, 𝒜_2, and 𝒜_1 are sensitive to the number of observations and examined how they change when new observations are added. The results hold with any choice of the blocks in 𝒜_3, whereas we can only make inference about the change of the spectra of 𝒜_2 and 𝒜_1 for uncorrelated observation errors (diagonal R). We have shown that the negative eigenvalues of both 𝒜_3 and 𝒜_2 move away from zero or are unchanged when observations are added. The smallest and largest positive eigenvalues of 𝒜_2, as well as the smallest positive eigenvalue of 𝒜_3, approach zero or are unchanged, whereas the largest positive eigenvalue of 𝒜_3 moves away from zero or is unchanged. The smallest and largest eigenvalues of 𝒜_1 move away from zero or are unchanged. The extreme eigenvalues may cause convergence problems for Krylov subspace solvers, hence we may expect the small positive eigenvalues of 𝒜_2 and 𝒜_3 to cause these issues when new observations are added. We summarise these results together with the properties of the three systems in Table 7. We have used the work of Rusten and Winther 19 to determine the bounds for the spectrum of 𝒜_3 and derived novel bounds for the spectral intervals of the saddle point matrix 𝒜_2 and the positive definite matrix 𝒜_1. We have observed that the change to the intervals due to new observations is consistent with the change of the extreme eigenvalues of the matrices. Our numerical experiments agree with these findings. In general, the bounds for the saddle point matrices are tight, whereas the upper bounds for the positive definite matrix are too pessimistic.
Our numerical experiments show slow convergence, particularly with a few observations, and the need for preconditioning is evident. Previous work on the 3 × 3 block saddle point system considered iteratively solving the problem when inexact constraint preconditioners of Bergamaschi et al 43 are used (see Fisher and Gürol, 12 Freitag and Green, 14 Gratton et al 13 ). It was shown that such a preconditioning approach is not optimal, and further research into effective preconditioning is still an open question. Preconditioning may transform the coefficient matrix into a nonnormal one, with GMRES as an iterative solver of choice. Although the spectrum of a nonnormal matrix may not be enough to describe the convergence of GMRES, 44 Benzi et al 18 claim that fast convergence often appears if the spectrum is clustered away from the origin. Hence, a better understanding of the spectrum of 𝒜_3, 𝒜_2, and 𝒜_1 may help in finding a suitable preconditioner for these matrices. We suggest that including the information on observations coming from the observation error covariance matrix R and the linearised observation operator H could be beneficial for preconditioning, given that the spectra of all the considered matrices depend on the observations. The design of such preconditioners that are cheap to construct and apply is an interesting area for future research.

APPENDIX A

We derive bounds for the individual eigenvalues of 𝒜_3 and 𝒜_2 (Theorems 13 and 14, respectively). First, we state two theorems that are used in deriving these bounds. The notation of Table 1 is used.

Theorem 13. The eigenvalues of 𝒜_3 satisfy λ_k(S_D^{3×3}) − σ_max ≤ λ_k(𝒜_3) ≤ λ_k(S_D^{3×3}) + σ_max, where S_D^{3×3} = diag(D, R, 0).

Proof. We can write 𝒜_3 as a sum of two symmetric matrices:

𝒜_3 = S_D^{3×3} + S_L^{3×3},  S_D^{3×3} = diag(D, R, 0),  S_L^{3×3} = ( 0  0  L ;  0  0  H ;  L^T  H^T  0 ).

The nonzero eigenvalues of S_L^{3×3} are plus and minus the singular values of (L^T H^T), so λ_min(S_L^{3×3}) = −σ_max and λ_max(S_L^{3×3}) = σ_max. The result follows from applying Theorem 1 to the matrices S_D^{3×3} and S_L^{3×3}. ▪

Theorem 14. The eigenvalues of 𝒜_2 are bounded using the eigenvalues of S_D^{2×2} = diag(D, −H^T R^{−1} H) and the largest singular value of L: λ_k(S_D^{2×2}) − s_max ≤ λ_k(𝒜_2) ≤ λ_k(S_D^{2×2}) + s_max.