Conditioning of hybrid variational data assimilation

In variational assimilation, the most probable state of a dynamical system under Gaussian assumptions for the prior and likelihood can be found by solving a least-squares minimization problem. In recent years, hybrid variational data assimilation methods have gained popularity in Numerical Weather Prediction. In these methods, the prior error covariance matrix is a weighted sum of a climatological part and a flow-dependent ensemble part, the latter being rank deficient. The nonlinear least squares problem of variational data assimilation is solved using iterative numerical methods, and the condition number of the Hessian is a good proxy for the convergence behavior of such methods. In this article, we study the conditioning of the least squares problem in a hybrid four-dimensional variational data assimilation (Hybrid 4D-Var) scheme by establishing bounds on the condition number of the Hessian. In particular, we consider the effect of the ensemble component of the prior covariance on the conditioning of the system. Numerical experiments show that the bounds obtained can be useful in predicting the behavior of the true condition number and the convergence speed of an iterative algorithm.


INTRODUCTION
In weather forecasting, we use mathematical and numerical models to describe the system dynamics of the ocean and atmosphere. These models are highly nonlinear and often sensitive to noise in initial conditions. Because of the nonlinearity and instability, random perturbations in the initial data and errors in the model amplify rapidly through time, producing unreliable predictions.1,2 In variational data assimilation, the goal is to find the maximum Bayesian a posteriori estimate of the system state from which to initialize the model. Major operational centers worldwide have adopted the four-dimensional variational assimilation scheme (4D-Var) for environmental forecasting in recent years. Similar applications arise in other fields such as physics, biology, and economics.1,3,4 In 4D-Var, we aim to obtain an optimal initial state variable (conventionally called the "analysis") by solving a nonlinear least squares problem, in which we try to find the best fit between a set of observations over a time window and an a priori estimate of the state at the start of the window, known as the background. In the case where observations are only given at one time, the method becomes three-dimensional variational assimilation, or 3D-Var. We assume that the background state and observations have Gaussian, unbiased errors, with covariance matrices B and R respectively. Traditional variational data assimilation methods have used a climatological estimate4 for the background error covariance matrix B, where flow-dependent ensemble information is not incorporated into the system.5,6 Recent developments utilize a hybrid approach, in which an ensemble background error covariance matrix is estimated from ensemble members and then combined with the climatological part. This has the clear advantage of bringing in the variability of the system and updating the statistics in each prediction window, giving flow-dependent information that can improve the accuracy of predictions.
However, due to computational restrictions, the affordable number of ensemble members is normally small, which leads to a rank-deficient ensemble background error covariance matrix. The method of combining the ensemble part with conventional 4D-Var is called Hybrid 4D-Var. In this method, B is given by B = (1 − β)B_0 + βP_f. Here, B_0 and P_f are the climatological background error covariance matrix and the ensemble error covariance matrix, and (1 − β) and β are their weights, where β ∈ [0, 1] is a scalar. For the details of the hybrid method, we refer readers to the review paper of Bannister.4 In this article, we are especially interested in establishing the relationship between the conditioning of Hybrid 4D-Var and the weight β on P_f. The 4D-Var problem is usually solved using iterative gradient methods, such as conjugate gradient or quasi-Newton methods.1,4 The condition number of the Hessian can be used to estimate the number of iterations required for convergence.7,8 In addition, the condition number also reveals the sensitivity of the minimization problem with respect to random noise.7 Here we establish that, in Hybrid 4D-Var, a transition point exists where the condition number of B sharply increases with the weight on P_f. Since P_f is rank deficient,1,4-6 adding the ensemble background error covariance matrix may cause difficulties for solving the nonlinear least-squares minimization problem, and an adequate preconditioning scheme becomes desirable. This is typically achieved through a Control Variable Transformation (CVT).1,4 CVT uses a decomposition of B_0 and P_f to transform the state variable such that the conditioning of the Hessian matrix is improved.1,4 Implementation details of CVT are frequently described in previous works on 4D-Var3,5,6,9 and Hybrid 4D-Var.4
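As a concrete illustration of the rank deficiency described above, the following sketch (illustrative sizes n, m and weight β = 0.5; not the paper's experimental setup) forms a hybrid covariance B = (1 − β)B_0 + βP_f from a full-rank climatological matrix and an m-member ensemble:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, beta = 50, 10, 0.5          # state size, ensemble size, hybrid weight (illustrative)

# Full-rank climatological covariance B0 (SPD by construction).
A = rng.standard_normal((n, n))
B0 = A @ A.T / n + np.eye(n)

# Ensemble covariance Pf from m members: removing the mean leaves rank at most m - 1.
X = rng.standard_normal((n, m))
Xp = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)
Pf = Xp @ Xp.T

B = (1 - beta) * B0 + beta * Pf   # hybrid background error covariance
print("rank(Pf) =", np.linalg.matrix_rank(Pf), "<= m - 1 =", m - 1)
```

For β < 1 the climatological part keeps B symmetric positive definite despite the rank-deficient ensemble part.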
In terms of practical applications of Hybrid 4D-Var, we note that the nonlinear least-squares problem is often linearized and solved as a sequence of linear least-squares minimizations. This is known as the incremental 4D-Var method1,3,4 and is equivalent to an approximate Gauss-Newton method.10 In practice, it is useful to understand the contribution of each component of DA, in such a way that the impact on the conditioning can be predicted when these components change. This motivates a comprehensive theory that can predict the conditioning while separating the contribution of each error component (B_0, P_f, R), and relating it to the parameters that characterize these components. We note that both preconditioned and unpreconditioned 4D-Var are implemented by major operational centers (such as the UK Met Office). Such theories have previously been constructed for conventional 4D-Var. Haben et al.3 established a theory to estimate the conditioning of a preconditioned 4D-Var system. This work was later developed by Tabeart et al.8,9 for both unpreconditioned 3D-Var and preconditioned 4D-Var with Control Variable Transformation (CVT). In Tabeart et al.'s studies, bounds are proved for the conditioning of the system, in which the contributions of the background error covariance matrix and observation error covariance matrix are separated. The impact of each component is then associated with its characterizing parameters. The research of Haben et al.3 and Tabeart et al.8,9 is tested and analyzed using small-scale examples. However, it is extrapolated to justify observations in large-scale real-life applications. To give a few examples, Mercier et al.11 used the analysis of Haben et al.3 to explain the convergence behavior of a block Krylov method for a 3D-Var application; Desroziers et al.12 cited the same result of Haben et al.3 to guide their design of a preconditioned Lanczos/Conjugate Gradient algorithm; Hatfield et al.13 cited Haben et al.'s analysis3 to explain the effect of an increasing model error on the convergence of an incremental 4D-Var; and Aabaribaoune et al.14 used the result of Tabeart et al.8 to analyze the convergence speed of a BFGS algorithm for solving a 3D-Var system, in the application of ozone profiling.
In this article, we aim to extend the previous studies of Haben et al.3 and Tabeart et al.8,9 to a hybrid system. In particular, we study the impact of P_f on the condition number of the Hessian matrix, for both unpreconditioned cases and preconditioned cases with CVT.
We outline this article as follows. In Section 2, we briefly formulate the problem of Hybrid 4D-Var and introduce the Hessian matrix; in Section 3, we establish the theory of conditioning for unpreconditioned Hybrid 4D-Var and preconditioned Hybrid 4D-Var with CVT; in Sections 4 to 6, we provide multiple numerical experiments to illustrate the theories and analyze the conditioning; in Section 7, we show how the behavior of the condition number of the Hessian predicted by our theory is reflected in the convergence speed of a conjugate gradient algorithm. Section 8 gives a general summary of the results.

PROBLEM FORMULATION
In this section, we introduce the Hybrid 4D-Var method, the incremental method, the preconditioning technique of Control Variable Transformation, and the Hessian matrix associated with the unpreconditioned and preconditioned least-squares problems.

A general formulation of the Hybrid 4D-Var
A general formulation of the 4D-Var is given by Problem 1.
Problem 1. Solve for the optimal initial state x_a through the minimization problem

x_a = arg min_{x_0} J(x_0),  J(x_0) = (1/2)(x_0 − x_b)^T B^{-1} (x_0 − x_b) + (1/2) Σ_{i=0}^{N} (y_i − ℋ_i(x_i))^T R_i^{-1} (y_i − ℋ_i(x_i)),    (1)

where ℋ_i is the nonlinear observation operator; y_i is the vector of observational measurements taken at time t_i; and x_i is the state variable at time t_i, given by x_i = ℳ_{i,0}(x_0), where ℳ_{i,0} denotes the nonlinear model propagating the state from time t_0 to t_i. The vector x_b ∈ R^n is the prior information given by the model at t_0, known as the background state. A simplified problem, given by 3D-Var, is a special case of the general 4D-Var problem where N = 0, meaning there are only observations at time t_0.
In Hybrid 4D-Var and 3D-Var, we follow the formulation given by (1), but replace the background error covariance matrix B with B = (1 − β)B_0 + βP_f. We note that the problem involves solving a nonlinear least squares problem, which is difficult to implement directly when the problem is large scale. Instead, it is often replaced with a linearized incremental formulation, which we describe next.

Incremental Hybrid 4D-Var, control variable transformation and the Hessian matrix
In practice, especially in NWP, Problem 1 is often solved using the incremental method. In incremental 4D-Var, the nonlinear least squares problem is replaced with a sequence of linear least squares problems with cost function

J(δx_0^k) = (1/2)(δx_0^k − δx_b^k)^T B^{-1} (δx_0^k − δx_b^k) + (1/2) Σ_{i=0}^{N} (H_i M_{i,0} δx_0^k − d_i^k)^T R_i^{-1} (H_i M_{i,0} δx_0^k − d_i^k),    (3)

where the vector d_i^k is known as the innovation, defined by d_i^k = y_i − ℋ_i(x_i^k); the linear operator H_i is the Jacobian of ℋ_i; and the vector δx_b^k = x_b − x_0^k is the background increment. The model operator M_{i,0} is given by M_{i,0} = ∏_{j=1}^{i} M_j, where M_j is the Jacobian of ℳ_{j,j−1}. At each outer iteration k (the outer loop), we minimize (3) to solve for the increment, and then x_0^k is updated for the next iteration. We now introduce the Hessian matrix of the cost function J in Problem 1, which is given by3,9

S_4D = B^{-1} + Σ_{i=0}^{N} M_{i,0}^T H_i^T R_i^{-1} H_i M_{i,0}.

The sum of matrix products above can be written in a simple compact form.
Notation 1. Let Ĥ = [H_0^T, (H_1 M_{1,0})^T, … , (H_N M_{N,0})^T]^T, and let R̂ be the block diagonal matrix with its ith diagonal block given by R_i.
Following Notation 1, we can rewrite the Hessian matrix as3,9

S_4D = B^{-1} + Ĥ^T R̂^{-1} Ĥ.

In the case of Hybrid 3D-Var, we recall that it is a special case of 4D-Var with N = 0, and its Hessian is given by S_3D = B^{-1} + H_0^T R_0^{-1} H_0. In applications, preconditioning techniques are often applied to improve the conditioning of the system. Control Variable Transformation (CVT) is one of the popular preconditioning techniques. The details of CVT are described by Nichols,1 Bannister,4 and Buehner.15 In this approach, we utilize factorizations of B_0 and P_f, given by B_0 = U_0 U_0^T and P_f = U_ens U_ens^T, with U_ens = (1/√(m − 1)) [x_1 − x̄, … , x_m − x̄], where x_1, x_2, … , x_m are ensemble members, m is the number of samples, and x̄ is the ensemble mean. These matrices are then used to transform the state variable as follows:4 U_h δv = δx, where U_h = [√(1 − β) U_0, √β U_ens]. Here δx ∈ R^n is the increment of the state variable and δv is the increment of the control variable. Applying this transform to (3) leads to a new cost function J(δv_0^k). A direct calculation then yields the Hessian of J(δv_0^k) with respect to δv_0^k as

S_P4D = I + U_h^T Ĥ^T R̂^{-1} Ĥ U_h.

In the following sections, we will demonstrate that CVT prevents the condition number from going to infinity when β (the weight of the ensemble part) approaches 1. In addition, we also note that the adjoint of U_h does not need to be computed explicitly. This is discussed in detail by Smith et al.5 and Bannister.4
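The unpreconditioned and CVT-preconditioned Hessians can be compared on a toy Hybrid 3D-Var problem. The sketch below (all sizes, the observation operator, and the noise level are illustrative assumptions, not the paper's setup) builds S = B^{-1} + H^T R^{-1} H and S_P = I + U_h^T H^T R^{-1} H U_h with U_h = [√(1 − β)U_0, √β U_ens]:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p, beta = 40, 8, 20, 0.6    # illustrative sizes and hybrid weight

A = rng.standard_normal((n, n))
B0 = A @ A.T / n + np.eye(n)                                 # climatological covariance
X = rng.standard_normal((n, m))
Uens = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)  # Pf = Uens @ Uens.T
B = (1 - beta) * B0 + beta * (Uens @ Uens.T)

H = np.eye(p, n)                  # observe the first p state components (assumption)
Rinv = np.eye(p) / 0.1            # R = 0.1 * I (assumption)

S = np.linalg.inv(B) + H.T @ Rinv @ H                        # unpreconditioned Hessian

U0 = np.linalg.cholesky(B0)                                  # B0 = U0 @ U0.T
Uh = np.hstack([np.sqrt(1 - beta) * U0, np.sqrt(beta) * Uens])
SP = np.eye(n + m) + Uh.T @ H.T @ Rinv @ H @ Uh              # CVT-preconditioned Hessian

print("cond(S)  =", np.linalg.cond(S))
print("cond(SP) =", np.linalg.cond(SP))
```

Note that the preconditioned Hessian is an identity plus a positive semidefinite term, so all of its eigenvalues are at least 1.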

THEORY OF THE CONDITIONING OF HYBRID 4D-VAR
To better analyze the convergence of the nonlinear least squares problem of Hybrid 4D-Var, we aim to establish a set of theories that can predict changes in the conditioning of the system as parameters vary (such as correlation length scale, error variance, and the weight β). In particular, we are keen to understand the impact of the rank-deficient ensemble part on the conditioning. This motivates us to develop an estimate of the condition number that is informative of the actual conditioning of the system. In order to achieve this goal, we use spectral theory to establish bounds for the condition number of the Hessian matrix.
We outline the structure of this section as follows. We start by introducing some pre-established results, then extend them to the hybrid method. We discuss the unpreconditioned cases and the preconditioned cases separately.

Eigenvalues and conditioning
We begin with a brief review of previous work by Tabeart et al.8 and some fundamental eigenvalue inequalities.16 We will extend these results to Hybrid 4D-Var. In the scope of this article, we always assume that B and R are symmetric.
Notation 2. Let λ_k(A) be the kth largest eigenvalue of a matrix A ∈ R^{n×n}, with λ_1(A) and λ_n(A) the largest and smallest eigenvalues of A, and let κ(A) be the condition number of A.
Definition 1. For a symmetric positive definite matrix A ∈ R^{n×n}, its condition number is defined by κ(A) = λ_1(A)/λ_n(A).

For the eigenvalues of the sum of two Hermitian matrices, Weyl17 proved the following theorem:

Theorem 1. Let A_1, A_2 ∈ R^{n×n} be two symmetric matrices. Then the eigenvalues of A = A_1 + A_2 satisfy

λ_k(A_1) + λ_n(A_2) ≤ λ_k(A) ≤ λ_k(A_1) + λ_1(A_2).

The inequality for products is given by Wang and Zhang as follows:18

Theorem 2. Let A_1, A_2 ∈ R^{n×n} be positive semidefinite Hermitian matrices. Then

λ_n(A_1) λ_k(A_2) ≤ λ_k(A_1 A_2) ≤ λ_1(A_1) λ_k(A_2).

Applying these results to the Hessian matrix of 3D-Var, Tabeart et al.8 established two-sided bounds on the condition number of the Hessian, which we refer to below as Theorem 3 and Inequality (12).
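Both inequalities are easy to check numerically. The snippet below verifies Weyl's sum inequality for the extreme eigenvalues and the corresponding product bounds for positive semidefinite matrices, on random examples (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
A1 = rng.standard_normal((n, n)); A1 = A1 @ A1.T   # positive semidefinite
A2 = rng.standard_normal((n, n)); A2 = A2 @ A2.T   # positive semidefinite

eig = np.linalg.eigvalsh                           # ascending eigenvalues

# Weyl (Theorem 1) for the extreme eigenvalues of the sum.
s = eig(A1 + A2)
assert s[-1] <= eig(A1)[-1] + eig(A2)[-1] + 1e-8
assert s[0] >= eig(A1)[0] + eig(A2)[0] - 1e-8

# Product bounds (Theorem 2): eigenvalues of A1 @ A2 are real for PSD factors.
prod = np.sort(np.linalg.eigvals(A1 @ A2).real)
assert prod[-1] <= eig(A1)[-1] * eig(A2)[-1] + 1e-6
assert prod[0] >= eig(A1)[0] * eig(A2)[0] - 1e-6
print("all eigenvalue inequalities verified")
```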

Conditioning of unpreconditioned Hybrid 4D-Var
In Hybrid 4D-Var, the background error covariance matrix B is replaced by a weighted sum of the static and ensemble background error covariance matrices, such that B = (1 − β)B_0 + βP_f. Although we can still use (12) to estimate the condition number of the Hessian, we note that it does not have the capacity to separate the contribution of P_f and predict the condition number as the weight β of P_f grows. This motivates us to establish theories that are specific to the hybrid case of 4D-Var.
Lemma 1. Assume that P_f is rank deficient and let B = (1 − β)B_0 + βP_f. Then the extreme eigenvalues of B satisfy

(1 − β)λ_n(B_0) ≤ λ_n(B) ≤ (1 − β)λ_1(B_0),
(1 − β)λ_n(B_0) + βλ_1(P_f) ≤ λ_1(B) ≤ (1 − β)λ_1(B_0) + βλ_1(P_f).

Proof. The bounds on λ_1(B) and the lower bound on λ_n(B) follow from Theorem 1 and the fact that λ_n(P_f) = 0. For the upper bound on λ_n(B), evaluating the Rayleigh quotient of B at a unit vector in the null space of P_f (which exists since P_f is rank deficient) gives λ_n(B) ≤ (1 − β)λ_1(B_0). ▪

Lemma 2. Given that B_0, P_f are two symmetric matrices, let B = (1 − β)B_0 + βP_f, and assume that P_f is rank deficient. The condition number of B is then bounded as follows:

((1 − β)λ_n(B_0) + βλ_1(P_f)) / ((1 − β)λ_1(B_0)) ≤ κ(B) ≤ ((1 − β)λ_1(B_0) + βλ_1(P_f)) / ((1 − β)λ_n(B_0)).

Proof. By the definition of κ(B) and Lemma 1, we find that

κ(B) = λ_1(B)/λ_n(B) ≥ ((1 − β)λ_n(B_0) + βλ_1(P_f)) / ((1 − β)λ_1(B_0)),

and similarly for the upper bound. ▪

We highlight that the condition number of B diverges to infinity as β → 1. This is reflected by the lemma above, as the lower bound on κ(B) goes to infinity.
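These bounds can be checked numerically. The sketch below (illustrative matrix sizes, not the paper's experimental configuration) compares κ(B) with the two-sided bound on κ(B) derived above as β approaches 1, showing the condition number and its lower bound both diverging:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 40, 8                                   # illustrative sizes, m - 1 < n

A = rng.standard_normal((n, n))
B0 = A @ A.T / n + np.eye(n)                   # full-rank climatological covariance
X = rng.standard_normal((n, m))
Xp = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)
Pf = Xp @ Xp.T                                 # rank deficient: rank <= m - 1

l1_B0, ln_B0 = np.linalg.eigvalsh(B0)[-1], np.linalg.eigvalsh(B0)[0]
l1_Pf = np.linalg.eigvalsh(Pf)[-1]

for beta in [0.5, 0.9, 0.99]:
    B = (1 - beta) * B0 + beta * Pf
    kappa = np.linalg.cond(B)
    lower = ((1 - beta) * ln_B0 + beta * l1_Pf) / ((1 - beta) * l1_B0)
    upper = ((1 - beta) * l1_B0 + beta * l1_Pf) / ((1 - beta) * ln_B0)
    assert lower <= kappa * (1 + 1e-9) and kappa <= upper * (1 + 1e-9)
    print(f"beta={beta}: {lower:.1f} <= kappa(B)={kappa:.1f} <= {upper:.1f}")
```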
We now use these results to obtain a bound on the condition number in Theorem 4. We emphasize that the conclusion of these results cannot be directly derived from Theorem 3 by simply replacing B with (1 − β)B_0 + βP_f in the upper and lower bounds. In fact, such an effort leads to a less sharp bound that does not provide analytical or theoretical insight on how the bound changes with β. Moreover, this approach does not reveal how the bound responds to changes in physical parameters related to B_0 and P_f. Using different strategies leads to a bound that clearly separates the contribution of each component (B_0, P_f) and their weights, making it possible to analyze their impact on the condition number and their interaction as well.

Notation 3. For simplicity of presentation, we introduce γ_z, Γ_z to represent the lower and upper bounds of any z ∈ R.
Theorem 4. Given that B and R̂ are symmetric, the condition number of S_4D is then bounded as follows.

Proof. In Theorem 3, replacing H_0^T R_0^{-1} H_0 with Ĥ^T R̂^{-1} Ĥ, the left-hand side of Inequality (12) can be written as a max function, which gives the lower bound. For the upper bound, we note that it can be written in terms of κ(B) and λ_1(B); substituting κ(B) and λ_1(B) with their upper bounds then produces the upper bound on κ(S_4D). ▪

We note that the same bound can be derived by using eigenvalue inequalities. Crucially, the upper bound in Theorem 4 does not require any explicit computation of S_4D. We only need to compute the largest eigenvalues λ_1(B_0), λ_1(P_f) and the condition number of B_0.
In the case of a special observation operator, we can simplify the bounds in the 3D-Var case as follows.
Corollary 1. Assume that each row of H_0 has one unit entry, with all other entries being zero, and that R_0 is diagonal such that R_0 = σ²_{R_0} I_p, where I_p ∈ R^{p×p} is the identity matrix. Then the bounds in Theorem 4 in the case of 3D-Var simplify accordingly.

Proof. Given that R_0 = σ²_{R_0} I_p, and with the assumption on H_0, the largest eigenvalue of H_0^T R_0^{-1} H_0 is σ_{R_0}^{-2}. Replacing the relevant terms in Bounds (15), we then reach the conclusion. ▪

We emphasize that the upper bound on κ(S_4D) diverges to infinity as β → 1. In such a case, the background error covariance matrix is dominated by P_f. However, the theory does not predict the same behavior for the lower bound. In terms of the observation part, we observe that the number p of observation points does not alter the bounds; thus the theory cannot predict the behavior of κ(S_4D) when p varies.
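The special observation operator of Corollary 1 is easy to construct explicitly. In the sketch below (illustrative grid size, observed indices, and variance), each row of H_0 selects one distinct grid point and R_0 = σ²_{R_0} I_p, so H_0^T R_0^{-1} H_0 is diagonal with entries 0 or σ_{R_0}^{-2}:

```python
import numpy as np

n, p, sigma_R2 = 20, 5, 0.25                 # grid size, observation count, obs variance
idx = np.array([0, 4, 8, 12, 16])            # distinct observed grid points (assumption)
H0 = np.zeros((p, n))
H0[np.arange(p), idx] = 1.0                  # one unit entry per row

K = H0.T @ H0 / sigma_R2                     # H0^T R0^{-1} H0 with R0 = sigma_R2 * I_p
lam1 = np.linalg.eigvalsh(K)[-1]
print(lam1, 1 / sigma_R2)                    # largest eigenvalue equals 1 / sigma_R2
```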
Although unpreconditioned 4D-Var is still in use for real-life applications,8 it is now common practice to implement CVT for Hybrid 4D-Var. In the next section, we focus on developing similar theories for such cases.

Conditioning of preconditioned Hybrid 4D-Var method with CVT
The preconditioning of Hybrid 4D-Var is broadly adopted in major operational centers, and the impact of different error components on the conditioning is important in determining the convergence of numerical schemes for incremental Hybrid 4D-Var, or more generally, for solving the nonlinear least squares problem in Hybrid 4D-Var. In this section, we prove two versions of the bounds for κ(S_P4D) using different strategies. First, we remind readers of some useful notation: let U_ens = (1/√(m − 1)) [x_1 − x̄, … , x_m − x̄], where x_1, x_2, … , x_m is the set of ensemble members and x̄ is the ensemble mean. We recall that the motivation behind our approach to deriving the bound is to separate the contribution of each component in the Hessian, particularly the components of B_0 and P_f and their weights. We are especially keen to find out how the ratio (1 − β)||B_0||_2 / (β||P_f||_2) influences the bound, as this ratio is linked to the balancing of the climatological part and the ensemble part. Driven by this motivation, we discovered a decomposition of the matrix product U_h^T Ĥ^T R̂^{-1} Ĥ U_h that yields a useful max function which is directly related to this ratio. The details are given in the proof of Theorem 5.

Theorem 5. The condition number of the Hessian matrix S_P4D of Hybrid 4D-Var with CVT satisfies the following bounds.
Proof. We know that the Hessian matrix of Hybrid 4D-Var with CVT can be expressed as

S_P4D = I + U_h^T Ĥ^T R̂^{-1} Ĥ U_h.

Substituting U_h into S_P4D and expanding the product, then applying Theorem 2 to the matrix products, we obtain the upper bound. For the lower bound, we apply Theorem 2 again, noting that K = Ĥ^T R̂^{-1} Ĥ is rank deficient, so that λ_n(K) = 0. Substituting the resulting max function then yields the conclusion. ▪
Another version of the bounds can be derived by directly applying Theorem 2 to the product U_h^T K U_h, instead of using a decomposition such as that in Theorem 5.

Theorem 6. Let S_P4D be the Hessian matrix of Hybrid 4D-Var with CVT. Then the condition number of S_P4D satisfies the following bounds.
Proof. Applying the eigenvalue inequality of the product, and using the symmetry of B, we obtain (35). Furthermore, the eigenvalue inequality of the sum yields (36). Applying (35) and (36) directly to (19), we obtain the upper bound. For the lower bound, we note that applying Theorem 2 to λ_1(B^T K) only yields a weaker estimate. ▪

In terms of the effectiveness of the preconditioning, we note that the upper bounds given by Theorems 5 and 6 are not controlled by the condition number of B, but only by the largest eigenvalues of B_0, P_f, and K. This is different from the upper bound of the unpreconditioned case. In addition, for the unpreconditioned case, B = P_f at β = 1, indicating that B becomes closer to a singular matrix as β approaches 1. In such a case, the theory predicts that the condition number of S_4D diverges to infinity as β → 1. However, by implementing CVT, this divergence is eliminated.
On the other hand, we note that the upper bound given by Theorem 5 is distinctively different from all the other versions, including the ones derived for the unpreconditioned cases. Namely, the upper bound of Theorem 5 contains a max function that selects the larger of (1 − β)λ_1(B_0) and βλ_1(P_f). We can thus anticipate a switching point of the upper bound as a function of β, which does not exist in the other versions of the upper bound. In fact, this switching point provides useful information about the behavior of κ(S_P4D) with respect to varying β. We will illustrate this with numerical examples in Section 4. Similar to the unpreconditioned scenario, we find that these bounds do not reflect any contribution of the number p of observation points, which means that they cannot predict the trend of κ(S_P4D) with respect to a changing p.
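The switching behavior of the max function can be located directly. Assuming two illustrative values for λ_1(B_0) and λ_1(P_f) (not taken from the paper's experiments), the sketch below finds the branch switch of max{(1 − β)λ_1(B_0), βλ_1(P_f)} at β* = λ_1(B_0)/(λ_1(B_0) + λ_1(P_f)):

```python
import numpy as np

l1_B0, l1_Pf = 3.0, 1.5                       # assumed largest eigenvalues (illustrative)
beta = np.linspace(0.0, 1.0, 1001)
term = np.maximum((1 - beta) * l1_B0, beta * l1_Pf)

beta_star = l1_B0 / (l1_B0 + l1_Pf)           # analytic branch-switch location
print("numerical minimum at beta =", beta[np.argmin(term)])
print("analytic switching point  =", beta_star)
```

The max term decreases on one branch and increases on the other, so its minimum coincides with the switching point, mirroring the inflection behavior of the upper bound discussed above.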

EXPERIMENTAL DESIGN
Recalling the motivation of the theoretical studies in Section 3, we note that the purpose of the bounds is to provide useful information on the actual condition number of the Hessian without having to compute the Hessian explicitly. However, for the bounds to be informative, we require two properties: the estimate given by the bounds should reflect the condition number, and the bounds should change with β similarly to the condition number. The first property ensures that we can use the bound directly to estimate the condition number in practice; the second property guarantees that the bound is useful in analyzing the impact of the ensemble part on the condition number. To demonstrate the theories, we use the special case of 3D-Var.

Computing error covariance matrices and the observational matrix
In this section, we give details of computing the error covariance matrices B_0, P_f, R and the observation operator H on a one-dimensional grid.
Notation 4. Let L_0, L_ens denote the correlation length scales associated with B_0, P_f, and let σ²_{R_0}, σ²_{B_0}, σ²_{P_f} be the variances of R_0, B_0, P_f. Let r denote the radius of a two-dimensional disk D_2(0, r), θ the angular location of a point on the boundary of D_2(0, r), and θ_{i,j} the angular difference between two grid points on the boundary of D_2(0, r).
The linear observation operator is chosen to be H_0 = H^{(4)}.
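As an illustrative stand-in for the construction above (a Gaussian correlation on a periodic one-dimensional grid, not necessarily the paper's exact correlation model), the sketch below builds B_0 from the chordal distance between grid points on a circle and samples an m-member ensemble covariance P_f from it:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, L0, sigma_B0 = 60, 20, 0.1, 1.0        # grid size, ensemble size, length scale, std

theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
chord = 2 * np.sin(np.abs(theta[:, None] - theta[None, :]) / 2)  # chordal distance on the unit circle
B0 = sigma_B0**2 * np.exp(-0.5 * (chord / L0) ** 2)              # Gaussian correlation, length scale L0

# Sample m ensemble members from N(0, B0) via an eigenvalue factorization,
# clipping tiny negative eigenvalues introduced by round-off.
w, V = np.linalg.eigh(B0)
members = (V * np.sqrt(np.clip(w, 0.0, None))) @ rng.standard_normal((n, m))

Xp = (members - members.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)
Pf = Xp @ Pf.T if False else Xp @ Xp.T                           # ensemble covariance, rank at most m - 1
print("rank(Pf) =", np.linalg.matrix_rank(Pf))
```

The chordal (straight-line) distance is used rather than the arc length because the Gaussian kernel of a Euclidean distance is guaranteed positive semidefinite.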
Figure 1a shows that the behavior of the condition number in the unpreconditioned case is captured by both versions of the upper bound. The upper bounds computed from Theorems 3 and 4 do not differ significantly. This is a general result for all choices of parameters we have used to conduct numerical case studies. Based on these observations, we focus only on the upper bound from here onward.
In the preconditioned case (Figure 1b), we note that the condition number is drastically reduced by the CVT compared to the unpreconditioned case. We observe an inflection point in the upper bound of Theorem 5. This coincides with the transition point of the condition number (from decreasing to increasing with β) and so predicts the minimum of κ(S_3D). The bound produced by Theorem 6 does not capture such behavior, while not being significantly closer to the actual condition number either.
These results are valid for all the different choices of matrices and parameters. We therefore focus our study on the bounds produced by Theorems 4 and 5 only, because of their superiority in separating the impact of different matrices and capturing the shape of the actual condition number (see Figure 1b). We note that the condition numbers are plotted in log_10 scale throughout this paper.

CASE STUDIES FOR UNPRECONDITIONED 3D-VAR
In this section, we illustrate the upper bound given by Theorem 4 with multiple case studies. In these case studies we change parameters associated with B_0, P_f, Ĥ, R̂ and observe the responses of the condition number of S_4D and its upper bound. In particular, we are especially interested in the relationship between the conditioning of the system and the weight of the ensemble part (i.e., β). We note again that the case studies in the following sections use 3D-Var as a special case of 4D-Var to demonstrate the theories. Therefore, Ĥ and R̂ are replaced with H_0 and R_0, respectively. Additionally, we will denote the Hessian as S_3D and consider it as a special case of S_4D.
As the ensemble error covariance matrix is rank deficient, we can expect that the balance of B_0 and P_f is particularly important for the conditioning of the system. For example, when P_f is the dominant component of the Hessian matrix S_3D, the condition number of S_3D is likely to be large and the system ill-conditioned. In addition, as β → 1, the Hessian matrix S_3D formally tends to P_f^{-1} + H_0^T R_0^{-1} H_0; this limit is clearly ill-defined, as P_f^{-1} does not exist. Meanwhile, the balance of B_0 and P_f is also controlled by the relative sizes of the physical length scales L_0, L_ens and the variances σ_{B_0}, σ_{P_f}.
Recalling the upper bound in Theorem 4, it is predominantly controlled by the largest eigenvalues of B_0 and P_f. Meanwhile, the largest eigenvalues of B_0 and P_f vary with their correlation length scales.3 As Figure 2 shows, the largest eigenvalue of B_0 increases with the correlation length scale; the general trend is similar for P_f, except for random fluctuations caused by sampling noise. Consequently, we expect κ(S_3D) and its bound to change when the physical length scales (L_0, L_ens) alter.
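The monotone growth of λ_1(B_0) with the length scale can be reproduced with a small computation (again using an illustrative Gaussian correlation on a periodic grid rather than the paper's exact model):

```python
import numpy as np

n = 60
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
chord = 2 * np.sin(np.abs(theta[:, None] - theta[None, :]) / 2)  # chordal distance

lam1 = []
for L0 in [0.2, 0.5, 1.0, 2.0]:
    B0 = np.exp(-0.5 * (chord / L0) ** 2)     # unit-variance Gaussian correlation
    lam1.append(np.linalg.eigvalsh(B0)[-1])

print(lam1)
assert all(a < b for a, b in zip(lam1, lam1[1:]))  # largest eigenvalue grows with L0
```

For a circulant correlation matrix with nonnegative entries, the largest eigenvalue equals the row sum, which strictly increases with the length scale, consistent with the trend in Figure 2.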
As an important justification of the effectiveness of the upper bound, we observe that the shape of the upper bound is similar to that of the condition number in all four case studies presented in Figure 3. We also note that in this set of experiments, the impact of L_ens on the conditioning and the bounds is less significant than that of L_0 (compare Figure 3a,b). In Figure 3c, we observe that the trend of the condition number with respect to σ_{P_f} is well captured by the upper bounds. Also, in Figure 3d, we observe that the upper bound does reveal a general trend of the condition number with respect to σ_{B_0}, but it does not reflect a significant reduction of the condition number at β ∼ 0. The reason for this discrepancy is unknown.
It is an important finding that the upper bound starts to sharply increase and diverge to infinity, and that this transition occurs at about the same β as for the condition number of the Hessian. This indicates that the bound is informative from a numerical point of view; for example, it informs a constraint on β if one seeks to avoid a sudden deterioration of the conditioning of the system in Hybrid 4D-Var. On the other hand, we also note that in the three cases of changing L_0, σ_{B_0}, and σ_{P_f}, the upper bounds reflect the direction of change in the condition number of the Hessian. This shows promise that the upper bound can be used to qualitatively analyze the conditioning of the system when these parameters alter with time. Furthermore, the upper bound is about two orders of magnitude above the condition number. We find similar conclusions for the cases of changing σ_{R_0}, the observation operator H_0, and the number of observations p (see Figure 4).
We note that the upper bound for fixed B_0 and P_f does not change insofar as the largest eigenvalue of H_0^T R_0^{-1} H_0 stays the same (see Theorem 4). Comparing λ_1(H_0^T R_0^{-1} H_0) for different versions of H_0, we find that H^{(1)}, H^{(2)}, and H^{(4)} have the same largest eigenvalue. It is slightly smaller for H^{(3)}, but not significantly. We also point out that previous investigations in 4D-Var3,8 indicate that a small change in σ²_{R_0} does not change the bound noticeably. Thus we anticipate that the upper bound remains similar for different versions of H_0. This is confirmed by the cases in Figure 4b. (The case studies in Figure 4 showcase the unpreconditioned case with parameters m = 100 and L_ens = L_0 = 0.1.) Meanwhile, the condition number of the Hessian is largest for the choice H_0 = H^{(1)}, while choosing H_0 = H^{(2)} or H_0 = H^{(3)} produces a similar result, and they yield the best conditioning of the Hessian; the condition number associated with the randomized version H_0 = H^{(4)} lies between those of H^{(1)} and H^{(2)}, H^{(3)}. These observations indicate that evenly distributed observation points produce better conditioning of the system, while partial observations concentrated in a small region lead to the opposite.
The upper bound in (16) also implies that changing the number of observations does not change the upper bound (as the case study in Figure 4c confirms). Lastly, we also verified that the upper bound remains effective when the number of ensemble members changes. We tested cases where the number of ensemble members increases from 50 to 400. In all these cases, we find that the upper bounds have shapes similar to the condition number, and they provide a good estimate of its value.

NUMERICAL EXPERIMENTS FOR THE PRECONDITIONED HYBRID 3D-VAR WITH CVT
Following the case studies for the unpreconditioned cases, we now conduct similar experiments to illustrate Theorem 5 and examine the predictions made using Theorem 5 for preconditioned cases with CVT. In the experiments, we focus on three major issues: first, we validate the correctness of the bound; second, we illustrate that the bound reflects the behavior of the conditioning in general (with a few exceptions); last, we make a comparison with the unpreconditioned cases. We note that each error covariance matrix and observation operator is constructed as in Section 4.
An immediate observation from Theorem 5 is that the switching point in the upper bound shifts with the relative sizes of λ_1(B_0) and λ_1(P_f). More specifically, the max function in the upper bound in (17) switches at a larger value of β when λ_1(B_0) increases, and the opposite occurs when λ_1(P_f) increases. Furthermore, we know that λ_1(B_0) and λ_1(P_f) become larger as L_0 and L_ens increase (see Figure 2), and the same is true for increasing σ_{B_0}, σ_{P_f}. As Figure 5 shows, the results of these four case studies justify the observations implied by Theorem 5, and the switching point of the upper bound also predicts the minimum of the condition number. (Figure 5 uses the same settings as Figure 3, but for the CVT preconditioned cases.) On the other hand, as λ_1(B_0), λ_1(P_f) increase with L_0, L_ens, σ_{B_0}, σ_{P_f}, we anticipate that the upper bound will increase with these parameters. This is confirmed by the case studies (see Figure 5). Crucially, we find that the upper bound predicts the trend of the condition number of the Hessian with respect to β in all four cases. For example, in Figure 5a, we observe that the inflection points of the upper bounds predict the minima of the condition numbers, and both move rightward when L_0 increases. In Figure 5b, we find that the inflection points of the upper bounds and the minima of the condition numbers move leftward when L_ens increases. In Figure 5c, the inflection points of the upper bounds and the minima of the condition numbers move rightward when σ_{B_0} increases, and in Figure 5d we find that the trend of the condition number reacting to σ_{P_f} is well captured by the upper bounds; the inflection points of the upper bounds and the minima of the condition numbers move leftward when σ_{P_f} increases. However, we point out that the changes in the conditioning are very limited in these preconditioned cases, normally less than one order of magnitude.
Furthermore, compared with the unpreconditioned case, these results show a clear improvement of the conditioning in Hybrid 4D-Var with CVT (we note that this was also found in standard 4D-Var in previous work9). We observe that CVT leads to a maximum reduction of six orders of magnitude in the condition number of the Hessian. We also note that in all cases presented for the preconditioned Hybrid 4D-Var, the condition number of the Hessian does not diverge to infinity at β = 1, a crucial difference from the unpreconditioned Hybrid 4D-Var.
Theorem 5 also shows that the upper bound takes a larger value when the largest eigenvalue of K = H0ᵀR0⁻¹H0 is bigger. Furthermore, since σ_R0⁻² is a scaling factor of K, we can anticipate that the upper bound grows as σ²_R0 decreases. The case presented in Figure 6a not only confirms this observation but also shows that the condition number itself follows the same trend.
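This scaling can be seen directly: if the observation errors are uncorrelated with common variance, R0 = σ²_R0 · I, then K = H0ᵀR0⁻¹H0 = σ_R0⁻² H0ᵀH0, so λ1(K) is inversely proportional to σ²_R0. A minimal numerical check (with a random stand-in for H0; the names and sizes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
H0 = rng.standard_normal((50, 100))        # stand-in observation operator (p=50, n=100)

def lambda_max_K(sigma2_R0):
    """Largest eigenvalue of K = H0^T R0^{-1} H0 with R0 = sigma2_R0 * I."""
    K = H0.T @ H0 / sigma2_R0
    return np.linalg.eigvalsh(K)[-1]

# Halving the observation-error variance doubles lambda_1(K)
ratio = lambda_max_K(0.5) / lambda_max_K(1.0)
print(ratio)  # → 2.0
```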
On the impact of choosing different versions of H0, the trend is similar to the unpreconditioned case: we find that spreading the observations evenly across the whole domain leads to better conditioning, while skewing the observations into a local region results in worse conditioning. We note that the upper bound remains the same for H0 = H(2), H(3), and H(4) (see Figure 6b,c). This is because K = H0ᵀR0⁻¹H0 shares the same largest eigenvalue for these three versions, whereas H(1) produces a smaller maximum eigenvalue of K and therefore a smaller upper bound. On the other hand, the impact of the number p of observations on the condition number is opposite to that of the unpreconditioned case (this is similar to previous reports for 4D-Var 8,9 ), but the bound does not reflect this trend, as the largest eigenvalue of K does not change with p. We note that in both unpreconditioned and preconditioned cases the upper bounds cannot predict the impact of p on the conditioning of the system. The effect of sampling noise in Pf is similar to the unpreconditioned case: its impact on the conditioning and on the upper bound is insignificant. We find that this is true even with a small sample size.

FIGURE 6: Same settings as Figure 4, but for the CVT preconditioned cases.
As an important justification of the effectiveness of the upper bound given by Theorem 5, we observe that the changes of the upper bound with respect to α provide valuable information about the transition of the condition number from decreasing to increasing; the inflection point of the upper bound predicts the minimum of the condition number. This is particularly important because it informs an optimal choice of α from the numerical perspective of obtaining the best conditioning of the system. We therefore conclude that the upper bound is useful for providing qualitative information about the actual conditioning of the system.
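A numerical scan over α illustrates how such an optimal weight can be located. The sketch below is an assumption-laden toy, not the paper's experiment: it takes the preconditioned Hessian in the form S_P(α) = I + U(α)ᵀKU(α) with U(α)U(α)ᵀ = (1 − α)B0 + αPf, using a symmetric square root for U (the paper's CVT may define U differently), a Gaussian B0, a rank-deficient ensemble Pf, and R0 = I:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 40, 10, 20

# Climatological covariance B0: Gaussian correlations, length scale 0.1
x = np.arange(n) / n
d = np.abs(x[:, None] - x[None, :])
d = np.minimum(d, 1.0 - d)
B0 = np.exp(-0.5 * (d / 0.1) ** 2)

# Rank-deficient ensemble covariance Pf built from m < n members
X = rng.standard_normal((n, m))
Pf = X @ X.T / (m - 1)

H0 = np.eye(n)[:p]                 # observe the first p grid points
K = H0.T @ H0                      # K = H0^T R0^{-1} H0 with R0 = I

def spd_sqrt(A):
    """Symmetric square root of a (near-)SPD matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def cond_precond(alpha):
    """Condition number of the CVT-preconditioned Hessian S_P(alpha)."""
    U = spd_sqrt((1.0 - alpha) * B0 + alpha * Pf)
    S = np.eye(n) + U.T @ K @ U
    return np.linalg.cond(S)

alphas = np.linspace(0.0, 1.0, 21)
best = min(alphas, key=cond_precond)
print(best, cond_precond(best))
```

Note that even at α = 1 the eigenvalues of S_P are bounded below by 1, so the condition number stays finite, mirroring the behavior reported above for the preconditioned system.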

CONVERGENCE TEST OF A CONJUGATE GRADIENT ROUTINE
We note that there are well-known situations where the condition number provides a pessimistic indication of convergence speed (e.g., in the case of repeated or clustered eigenvalues). 9 Here we use hybrid 3D-Var as a special case of Hybrid 4D-Var to illustrate that, for hybrid variational assimilation, the convergence speed follows a similar trend to the condition number as the weight of the ensemble part increases. Following a similar method to Section 5.3.2 of Tabeart et al., 8 we study how the speed of convergence of a conjugate gradient method applied to the linear system S3D x = b changes with the weight α of the ensemble part. In the first test, the matrix S3D is given by the Hessian of the unpreconditioned 3D-Var (Section 5), and the vector b is constructed as in Haben, where the vectors x_b − x_0 and d are chosen randomly at the beginning of the trial. For the computation of S3D, we choose the parameters L0 = 0.1, Lens = 0.05, σ_B0 = σ_Pf = σ_R0 = 1, p = 100, and H0 = H(4), in line with our previous case studies.
As Figure 7 shows, the condition number of the Hessian follows the same trend as the number of iterations performed to reach the tolerance threshold; the conditioning is thus a good proxy for the convergence speed in this case. For the preconditioned case with CVT, shown in Figure 8, we find that the difference in the condition number as α varies is marginal relative to the unpreconditioned case. One of the reasons for this is that the Hessian has more eigenvalues clustered around 1, a result of CVT. However, the number of iterations taken to converge still follows the general pattern of the condition number, with higher values when α is close to 0 or 1. Finally, we conducted experiments with stopping criteria of different orders of magnitude to investigate the impact on the convergence-speed trend. For tolerances larger than 10⁻⁶ there is very little variation in the number of iterations as α changes. For tolerances smaller than 10⁻⁶, the number of iterations increases slightly, but the trend in Figure 8b remains consistent.
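A minimal version of such a convergence test can be sketched as follows (a hypothetical setup: a hand-written conjugate gradient routine and two synthetic SPD matrices standing in for well- and ill-conditioned Hessians; the paper's S3D is not reproduced here):

```python
import numpy as np

def cg_iterations(S, b, tol=1e-6, maxiter=1000):
    """Conjugate gradients on S x = b; return the iteration count at convergence."""
    x = np.zeros_like(b)
    r = b - S @ x
    p = r.copy()
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for k in range(1, maxiter + 1):
        Sp = S @ p
        a = rs / (p @ Sp)
        x += a * p
        r -= a * Sp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * bnorm:   # relative-residual stopping criterion
            return k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return maxiter

# Two SPD matrices with the same eigenvectors but different spectra
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
well = Q @ np.diag(np.linspace(1.0, 1e1, 100)) @ Q.T   # condition number 10
ill  = Q @ np.diag(np.linspace(1.0, 1e4, 100)) @ Q.T   # condition number 10^4
b = rng.standard_normal(100)

print(cg_iterations(well, b), "iterations vs", cg_iterations(ill, b))
```

The well-conditioned system converges in markedly fewer iterations, consistent with the role of the condition number as a proxy for convergence speed.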

SUMMARY
In this article, we established a set of theoretical results for the conditioning of Hybrid 4D-Var. These results provide effective upper bounds for the condition number of the Hessian and are illustrated by numerical case studies using the special case of Hybrid 3D-Var. In numerical experiments we verified that the upper bounds have similar shapes to the condition number with respect to the weight of the ensemble part (i.e., α), so they can provide a useful estimate of the behavior of the condition number of the Hessian. In addition, the upper bound enabled us to study the condition number through parameters associated with different components and to observe their interactions. We conclude that the upper bound is effective in explaining the conditioning of the Hessian matrix in general. However, the lower bound does not provide useful information about the condition number in any of the cases studied. We summarize our findings as follows:
• For the unpreconditioned cases, the condition number of the Hessian increases gradually with α at first, then quickly diverges to infinity as α → 1. This transition is predicted by the upper bounds.
• For the preconditioned cases with CVT, the general trend is that the condition number first decreases and then increases as α increases. There is an optimal α at which the condition number attains its minimum, and the upper bound predicts this optimal α effectively.
• Preconditioning with CVT improves the conditioning drastically and eliminates the divergence of the condition number of the Hessian at α = 1.
• For preconditioned cases, the upper bound changes in the same direction as the condition number of the Hessian with respect to changes in L0, Lens, σ_B0, σ_Pf, and σ_R0. In unpreconditioned cases, we reach a similar conclusion for L0, σ_B0, and σ_Pf.
In preconditioned cases, we find that the upper bound reveals the trend of the condition number with respect to changing parameters such as the correlation length scale and the variance.However, the theories in Section 3 cannot explain the impact of the four different choices of H 0 that we presented in this paper.Furthermore, the bounds do not change with the number of observations p and the sample size m, although p does directly influence the actual condition number.
Meanwhile, the tests show that these theoretical results do provide useful predictions of the impact of the balance of variances and correlation length scales for the preconditioned cases. In unpreconditioned cases, the upper bound predicts well the influence of L0 and σ_Pf. The bounds can thus indicate the impact of these components on the convergence of iterative numerical algorithms.
It is well known that the condition number of the Hessian matrix is a useful proxy for the convergence speed of the least-squares minimization in Hybrid 4D-Var. The results presented in this paper could therefore inform applications in terms of restricting the weight of the ensemble part (for the unpreconditioned cases) so that extreme ill-conditioning is avoided. For the preconditioned Hybrid 4D-Var with CVT, we established a theory that effectively predicts the weight of the ensemble part for which the conditioning is optimal.

FIGURE 1: A comparison of the variation of the bounds with α for (a) κ(S3D) and (b) κ(SP3D).

FIGURE 2: The largest eigenvalues of B0 and Pf as functions of the correlation length scale.

FIGURE 3: The variation of the condition number and the upper bound with α for different values of (a) L0, (b) Lens, (c) σ_B0, and (d) σ_Pf, for the unpreconditioned case with m = 100, p = 100, σ²_B0 = σ²_Pf = σ²_R0 = 1, and H0 = H(4). (a) shows the effect of L0 (fixing Lens = 0.1); (b) the effect of Lens (fixing L0 = 0.1); (c) the effect of σ²_B0 (fixing σ²_Pf = 1); (d) the effect of σ²_Pf (fixing σ²_B0 = 1).

FIGURE 4: The variation of the condition number and the upper bound with α for (a) different values of σ²_R0 (with p = 100 and H0 = H(4)), (b) different observation operators, and (c) different numbers of observations (fixing σ²_R0 = 1).

FIGURE 7: Convergence test for an unpreconditioned case. (a) The condition number of S3D and its upper bound; the parameters used to compute S3D are L0 = 0.1, Lens = 0.05, σ_B0 = σ_Pf = σ_R0 = 1, p = 100, and H0 = H(4). (b) The number of iterations.

FIGURE 8: Convergence test for a case preconditioned with CVT. (a) The condition number of SP3D (as a special case of SP4D) and its upper bound; the parameters used to compute SP3D are L0 = 0.1, Lens = 0.05, σ_B0 = σ_Pf = σ_R0 = 1, p = 100, and H0 = H(4). (b) The number of iterations.