A data assimilation algorithm for predicting rain

Convective-scale data assimilation uses high-resolution numerical weather prediction models and temporally and spatially dense observations of relevant atmospheric variables. In addition, it requires a data assimilation algorithm that can provide initial conditions for a state vector of large size, one third or more of whose components contain prognostic hydrometeor variables whose non-negativity needs to be preserved. The algorithm also needs to be fast, as the state vector requires a high updating frequency in order to catch fast-changing convection. A computationally efficient algorithm for quadratic optimization (QO, formerly known as quadratic programming, QP) is presented here, which preserves physical properties in order to represent features of the real atmosphere. Crucially for its performance, it exploits the fact that the resulting linear constraints may be disjoint. Numerical results on a simple model designed for testing convective-scale data assimilation show accurate results and promising computational cost. In particular, if the constraints on physical quantities are disjoint and their rank is small, a further reduction in computational cost can be achieved.


INTRODUCTION
Convection-permitting models, with horizontal resolutions of a few kilometres, partly resolve highly nonlinear dynamics and physics at a wide range of spatial and temporal scales. However, the quality of their predictions also depends on new advances in data assimilation algorithms, which provide suitable initial conditions. In particular, there is a need for more frequent assimilation of data in order to catch fast-changing convection, for better representation of time-evolving multivariate covariances, and for efficient algorithms for updating prognostic hydrometeor values such as rain, graupel or snow.
Unfortunately, a number of algorithms used in operational practice, such as incremental three-dimensional variational assimilation (3D-Var; Courtier et al., 1998), four-dimensional variational assimilation (4D-Var; Rabier et al., 2000) or the localized ensemble transform Kalman filter (LETKF; Bishop et al., 2001; Hunt et al., 2007), either do not update prognostic hydrometeors or update them using the latent heat nudging approach or a cloud analysis technique (Gustafsson et al., 2018). A more coherent approach would be to estimate these values directly in the state vector, allowing hydrometeors to be modified consistently with other variables. However, due to the underlying Gaussian assumption, sparse observations and other approximations in these methods, non-physical values can appear and a posteriori physical consistency checks are still needed. These checks include, for example, setting negative values of rain, graupel and snow to zero. Moreover, if an ensemble of simulations is used to construct covariances, as in ensemble data assimilation algorithms or hybrid variational algorithms, then consistency checks must be made for each ensemble member, thereby artificially modifying the representation of uncertainty. Importantly, these techniques clearly interfere with the optimality of the computed result and the quality of the resulting prediction, especially given the ill-conditioning of the problem.
The ensemble Kalman filter (EnKF; Evensen, 1994; 2003; Burgers et al., 1998) is capable of handling complex and highly nonlinear processes, and the adjoint of the model is not needed for its application. This makes it attractive for convective-scale applications. Since proper estimates of hydrometeors are crucial for prediction on convective scales, the question arises whether the EnKF method can be modified to improve these estimates. If such a modification could be found, one could, for example, optimize the use of radar observations to initialize numerical weather prediction (NWP) models, a real advantage given the importance of this dataset for the prediction of convective storms (Sun et al., 2014). Various approaches can be taken within the EnKF framework in order to deal with non-Gaussian errors. For example, variables can be transformed by assuming that the relevant state variables follow appropriate pre-specified non-Gaussian distributions, such as the lognormal (Cohn, 1997) and truncated Gaussian (Lauvernet et al., 2009) distributions or, more generally, by carrying out a parametrized change of the state variables known as Gaussian anamorphosis (Simon and Bertino, 2009). Lien et al. (2013), for example, apply Gaussian anamorphosis to the assimilation of precipitation with the LETKF.
As an alternative to the above statistically based approaches, Janjić et al. (2014) modified the EnKF algorithm to include constraints directly on the state vector. The Janjić et al. (2014) algorithm, the Quadratic Programming Ensemble (QPEns), combines the EnKF and quadratic optimization (QO), exactly preserves feasibility with respect to linear equality and inequality constraints in each of the ensemble members, and produces multivariate estimates for all fields. This method was shown to outperform both the EnKF and the EnKF with a lognormal change of control variables. As argued in that paper, this is because each of these statistical methods preserves mass (EnKF) or positivity (lognormal EnKF) but not both. This approach was extended by Ruckstuhl and Janjić (2018), who showed that similar conclusions hold for a nonlinear multivariable model designed to test convective-scale data assimilation applications. Both these papers used an active-set or interior-point quadratic programming algorithm for solving the constrained minimization, as implemented in MATLAB (Gill et al., 1981; 1984) and Python (Andersen et al., 2010). Although these results convincingly illustrate the benefits of including the constraints directly in the minimization, the QO algorithm has turned out to be difficult to implement in practice for applications such as weather forecasting at the convective scale, as explained next.
Let us consider the problem

  minimize over (x, y):  q(x, y) = (1/2) xᵀ P_xx x + xᵀ P_xy y + (1/2) yᵀ P_yy y + g_xᵀ x + g_yᵀ y,   (1)
  subject to  A x = b,  y ≥ 0,   (2)

where x and g_x belong to ℝ^(n−p), y and g_y to ℝ^p, P_xx is an (n−p) × (n−p) symmetric real matrix, P_yy a p × p symmetric real matrix, P_xy an (n−p) × p real matrix, A an m × (n−p) real matrix of rank m with m ≤ (n−p), b ∈ ℝ^m, and the inequality is understood component-wise. It is easy to extend our discussion to more general bound constraints where one requires ℓ ≤ y ≤ u for some ℓ, u ∈ ℝ^p with ℓ ≤ u, both of them possibly having infinite components. We refer to a problem of the type (1)-(2) as having disjoint constraints in the sense that the two sets of constraints in (2) involve disjoint sets of variables. In what follows, we focus on the convex case and assume that

  P ≝ ( P_xx  P_xy ; P_xyᵀ  P_yy )

is positive definite. Typical data assimilation algorithms produce initial conditions for the model by minimizing, perhaps iteratively in a Gauss-Newton framework, an objective function of type (1). In convective-scale applications, the minimization is usually performed every hour using new measurements of the atmosphere. The result of the minimization is a correction to a prediction of the numerical model for a given time. In this application, the vector z = (xᵀ, yᵀ)ᵀ to be estimated consists of variables describing the state of the atmosphere at a given time (such as pressure, temperature, wind direction and speed, …) at all grid points of the numerical model. Its size n ranges between 10⁶ and 10⁹, resulting in truly large-scale problems. The vector y usually describes different water phases, such as rain, graupel and snow, at all grid points. The physical nature of these variables implies that they have to be non-negative. In this case, p is approximately one third of n, which remains very large, and could be even larger if, for example, a two-moment microphysics scheme is used instead of a one-moment scheme in the convection-permitting model.
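As an illustration, the following is a minimal sketch of an instance of (1)-(2) in Python; all sizes and data are toy values generated at random, not taken from the paper. It shows the disjoint structure (the equality constraints act only on x, the bounds only on y) and how a feasible starting point can be obtained from a least-squares solve.

```python
import numpy as np

# Toy instance of problem (1)-(2) with the disjoint-constraint structure:
# the equality constraints act only on x, the bounds only on y.
rng = np.random.default_rng(0)
n, p, m = 5, 2, 1                     # state size, bounded part, equality rows

M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)           # symmetric positive definite, as assumed
g = rng.standard_normal(n)
A = np.ones((m, n - p))               # e.g. a single mass-conservation row
b = np.array([1.0])

def q(x, y):
    """Objective (1): 0.5 z^T P z + g^T z with z = (x^T, y^T)^T."""
    z = np.concatenate([x, y])
    return 0.5 * z @ P @ z + g @ z

def feasible(x, y, tol=1e-10):
    """Constraints (2): A x = b and y >= 0, checked component-wise."""
    return bool(np.all(np.abs(A @ x - b) < tol) and np.all(y >= -tol))

# A feasible starting point: least-squares solution of A x = b, any y >= 0.
x0 = np.linalg.lstsq(A, b, rcond=None)[0]
y0 = np.zeros(p)
print(feasible(x0, y0))  # True
```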
Both the size of the vectors x and y and the frequency of their estimation (usually an hour or less) significantly limit the number of iterations of the minimization algorithm that can be performed. Moreover, in an ensemble setting, not one but an ensemble of estimates is produced in order to correctly specify uncertainty, further increasing the computational burden. The computational cost also depends on the number and the nature of the constraints, with inequality constraints often being more challenging than equality constraints. It is the purpose of this short paper to show that particular optimization algorithms can be combined in a way that is less expensive in computer time, thereby making a practical application affordable. While these considerations are based on data assimilation for weather forecasting, the methods discussed here are also applicable to similar contexts in a wide variety of problems, including chemistry, ecosystem and ocean data assimilation (Bertino et al., 2003; Simon and Bertino, 2009; Bühner et al., 2013, to mention a few).
The paper is organized as follows. Section 2 introduces two algorithms which exploit the fact that the constraints are disjoint to solve problems of type (1)-(2). The application of these optimization algorithms is explored using a known convective-scale example first introduced by Würsch and Craig (2014). The model and the data assimilation framework are described in Section 3. Section 4 first analyses the convergence of both algorithms for the minimization problems appearing in the QPEns. This is followed by an investigation of how the performance of the QPEns is affected when the proposed minimization algorithms are terminated early. Here, the data assimilation results are compared against the EnKF as a benchmark, since the EnKF solution would be the same as the QPEns solution if no constraints were imposed. A final discussion and some perspectives are presented in Section 5.

A QO algorithm for disjoint constraints
For solving problem (1)-(2), we propose an 'active-set' algorithm whose key feature is to maintain feasibility with respect to the linear equality constraints on x at each iteration k, while at the same time using classical projection techniques (Conn et al., 2000, chapter 12) to enforce feasibility of y. Active-set algorithms aim at iteratively finding the set of constraints satisfied as equalities at the solution by temporarily choosing some 'active constraints' at a given iteration and solving the corresponding equality-constrained problem. This set of active constraints is then updated as the iterations proceed. This type of method is in contrast with 'interior-point' methods, where an adaptive logarithmic barrier prevents iterates from satisfying the constraints exactly, keeping them 'interior' to the feasible domain. Chapters 12 and 13 of Conn et al. (2000) provide a description of these classes of methods.
Interior-point methods perform well relative to active-set methods when the combinatorial aspect of selecting the correct active bounds dominates. However, active-set methods are superior when identification of the active constraints is efficient. Our active-set QO algorithm for disjoint constraints uses efficient classical projection techniques, suitable for bound-constrained optimization problems, to identify the active constraints. These techniques are in general not applicable when the optimization problem is subject to other constraints as well, in which case more generic but less efficient techniques have to be used. Here, however, we exploit during the minimization the fact that moving along the projected search direction in y does not influence the constraints on x, thereby allowing the use of these efficient projection techniques to find the active set.
The active-set QO algorithm for disjoint constraints is stated as Algorithm 1. In this description, we use the notation that, if 𝒮 is a subset of {1, …, p} and v a vector in ℝ^p, then the vector v_𝒮 of dimension |𝒮| (the number of elements in 𝒮) is the vector v reduced to its components in 𝒮. Analogously, M_𝒮 is the matrix M reduced to its columns (and rows, if it is symmetric) whose indices are in 𝒮.
We now discuss the successive steps of this algorithm. If not available at the onset, a feasible point can be computed in Step 0 by solving a linear least-squares problem for x_0 and choosing any y_0 ≥ 0. The set 𝒜_k defined in (3), Step 1, is the 'active set' for iteration k and contains the indices of all variables that are exactly at their bound and would leave the feasible region if moved along the steepest-descent direction. The algorithm fixes them for the duration of the kth iteration, making the corresponding bound constraints 'active'. The variables indexed by the complementary set ℱ_k are called 'free variables' and are allowed to vary within their bounds at iteration k. Since, as we observe below, (4) ensures optimality of the x variables, the algorithm terminates in Step 2 at an approximate solution when all the free components of y are optimized to a certain accuracy, that is, when the gradient of q with respect to the free components y_ℱₖ at (x_k, y_k) is sufficiently small. A search direction is found in Step 3 by solving (in the free variables) the standard Karush-Kuhn-Tucker (KKT) optimality conditions for problem (1) restricted to (x, y_ℱₖ), which are given by the system (4). Note that the vector w_k then contains the Lagrange multipliers associated with the equality constraints. The third block row of (4) imposes that A s_k = 0, and it is important that this equation be satisfied with high accuracy if exact feasibility with respect to the linear equality constraints is to be preserved. We refer the reader to Gould and Toint (2002) for an in-depth discussion of this point. With this caveat, the system (4) can be solved using a Krylov solver like MINRES or GMRES (Saad, 1996, gives a description of these methods), or by a 'constrained preconditioned' conjugate gradient method (Gould et al., 2001; Gould and Toint, 2002). In the latter case, any preconditioner must also be reduced (in its y part) to the subset of currently free variables indexed by ℱ_k.
If the dimension and sparsity of P allow it (which is typically not the case in weather forecasting), a stable factorization can also be used to solve (4) accurately. Because the equality and bound constraints are disjoint, the step-size computation in Step 4 ensures feasibility of y without altering feasibility of x with respect to the linear equality constraints (ensured by the third block row of (4)). Observe that the trajectory of the vector max[y_k + α v_k, 0] for α ≥ 0 is the projection of the half-line y_k + α v_k onto the feasible domain of the bound constraints. Thus Step 4 simply finds the first minimizer of a quadratic function along a piecewise linear path. The next iteration, k + 1, starts in Step 1, where a new active set 𝒜_(k+1) and a new complementary set ℱ_(k+1) are defined, possibly freeing some variables and activating new ones.
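The steps discussed above can be sketched in Python as follows. This is a simplified, illustrative implementation of one possible active-set iteration on a randomly generated toy problem: the projected search of Step 4 is replaced by a single cut-back at the first bound encountered rather than the full piecewise search, so it is a sketch of the idea, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, p, m = 3, 2, 1                    # sizes of x, y and the equality block
n = nx + p
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)           # positive-definite Hessian
g = 5.0 * rng.standard_normal(n)
A = np.ones((m, nx))
b = np.array([1.0])

def q(z):
    return 0.5 * z @ P @ z + g @ z

x = np.linalg.lstsq(A, b, rcond=None)[0]   # feasible start: A x = b
y = np.ones(p)                              # feasible start: y >= 0
q_start = q(np.concatenate([x, y]))

for k in range(25):
    z = np.concatenate([x, y])
    grad = P @ z + g
    gx, gy = grad[:nx], grad[nx:]
    # Step 1: active set = bound variables at zero that the steepest-descent
    # direction would push out of the feasible region.
    active = (y <= 1e-12) & (gy > 0)
    free = np.where(~active)[0]
    nf = free.size
    # Step 3: KKT system in (x, y_free, multipliers w).
    K = np.zeros((nx + nf + m, nx + nf + m))
    K[:nx, :nx] = P[:nx, :nx]
    K[:nx, nx:nx + nf] = P[:nx, nx + free]
    K[nx:nx + nf, :nx] = P[nx + free, :nx]
    K[nx:nx + nf, nx:nx + nf] = P[np.ix_(nx + free, nx + free)]
    K[:nx, nx + nf:] = A.T
    K[nx + nf:, :nx] = A               # third block row: A s = 0
    rhs = np.concatenate([-gx, -gy[free], np.zeros(m)])
    sol = np.linalg.solve(K, rhs)
    sx = sol[:nx]
    v = np.zeros(p)
    v[free] = sol[nx:nx + nf]
    d = np.concatenate([sx, v])
    if np.linalg.norm(d) < 1e-10:
        break                          # Step 2: (approximate) optimality
    # Step 4 (simplified): step to the minimizer along d, cut back at the
    # first bound instead of the full piecewise search.
    alpha = -(grad @ d) / (d @ P @ d)
    neg = v < -1e-14
    if np.any(neg):
        alpha = min(alpha, np.min(y[neg] / -v[neg]))
    x = x + alpha * sx
    y = np.maximum(y + alpha * v, 0.0)

print(np.allclose(A @ x, b), bool(np.all(y >= 0)))  # True True
```

Note how the disjointness shows up: since A s_k = 0 by the third block row of the KKT system, the update of x never violates the equality constraints, while the bounds are enforced on y alone by the cut-back and the clipping.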

A projected algorithmic variant
Often the matrix A has a very simple form, for example when representing conservation of mass. In the example presented in Section 3, it is of size 1 × (n − p) and therefore of rank one. We may then easily project the problem into the nullspace of A by choosing Z to be the symmetric orthogonal projection matrix onto that nullspace and applying the change of variable x = Z x̄, which leads to problem (5)-(6).

Algorithm 1. QO algorithm for disjoint constraints
Step 0: Initialization. A feasible starting point (x_0, y_0) is given (i.e., A x_0 = b, y_0 ≥ 0), as well as an accuracy threshold ε > 0. Set k = 0.
Step 1: Active-set update. Define the active set 𝒜_k as in (3).
Step 2: Termination test. Terminate if the gradient with respect to the free components of y is sufficiently small.
Step 3: Search direction computation. Solve the KKT system (4).
Step 4: Projected search. Determine α_k > 0 minimizing the objective along the projected search path, subject to y ≥ 0.
Problem (5)-(6) is now a bound-constrained quadratic problem. Note that, without the disjoint-constraints assumption, the inequality constraints (6) would not be simple bounds and therefore more complex algorithms than classical projection techniques would be required. It should be mentioned that problem (5)-(6) is no longer strictly convex. However, it remains strictly convex in the nullspace of A, which is the range space of Z and thus the range space of the problem's Hessian matrix. As a consequence, standard Krylov methods such as Conjugate Gradients (CG; Hestenes and Stiefel, 1952) may still be applied, including for large-scale instances (e.g. Conn et al., 1992; Dostál, 1997; Lin and Moré, 1999). Indeed, such methods are known to operate only in the range space of the problem's Hessian when started in this range space (with Z g_x in our case). The use of the CG method also presents further advantages in our context. The first is that, when suitably preconditioned, it is well known to be efficient for large problems whose size precludes methods using matrix factorizations and where the number of iterations is often limited by computational considerations. The second is that it only requires products of the Hessian matrix with vectors, which can be cheaply computed without explicitly forming Z P_xx Z. The third is that it can be 'warm-started', in the sense that it is able to exploit a good guess of the active set (which is expected when convergence occurs) instead of returning to the full space at each major iteration. We therefore choose to apply the CG method with diagonal preconditioning to minimize the quadratic cost function in the current face of the feasible domain, that is, the subspace spanned by the x̄ and the free y variables at the current iterate, restarting the procedure as needed when new bound constraints become active during the calculation and a new (lower-dimensional) face must be explored. Note that the first face contains the negative gradient of the free variables, and the first step of CG performs a (in this case projected) line search along this direction. The resulting minimizer is known as the generalized Cauchy point, and the decrease of the objective function obtained at this point is known to guarantee overall convergence of the algorithm (Conn et al., 2000, chapter 12). Methods differ essentially by their face-changing mechanisms, but require that constraints active at the Cauchy point are not made inactive during the rest of the restarted CG steps. A simple version of the resulting algorithm (based on Conn et al., 1992) is stated as Algorithm 2.

Algorithm 2. Projected QO algorithm for disjoint constraints
Step 0: Initialization. A feasible starting point (x̄_0, y_0) is given (i.e., y_0 ≥ 0), as well as an accuracy threshold ε > 0. Compute a basis Z of the null space of A and set k = 0.
Step 1: Active-set update. Define the active set 𝒜_k.
Step 2: Termination test. Terminate if the convergence conditions hold within the prescribed accuracy.
Step 3: Cauchy point. Find the Cauchy point and determine its active constraints.
Step 4: Minimization beyond the Cauchy point. Apply the CG algorithm to find an approximate minimizer in the current face. Terminate the CG iterations when one (or more) bound(s) of indices j_1, …, j_s are encountered, or if convergence occurred within the prescribed accuracy. If CG was terminated because bounds were encountered, perform a line search similar to Step 3 along the search direction to update the iterate, and repeat Step 4 after redefining ℱ_k = ℱ_k \ {j_1, …, j_s} and using the new iterate as a starting point.
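The fact that CG stays in the range space of the Hessian when started there can be checked numerically. The sketch below uses toy sizes and assumes the rank-one case A = eᵀ (a single mass-conservation row, as in Section 3); it runs a plain CG on Z P_xx Z with starting residual −Z g_x, computing Hessian-vector products without forming Z P_xx Z, and verifies that the solution lies in the null space of A.

```python
import numpy as np

rng = np.random.default_rng(2)
nx = 6
M = rng.standard_normal((nx, nx))
Pxx = M @ M.T + nx * np.eye(nx)
gx = rng.standard_normal(nx)
a = np.ones(nx)                             # A = a^T: one mass-conservation row
Z = np.eye(nx) - np.outer(a, a) / (a @ a)   # orthogonal projector onto null(A)

def hess_vec(v):
    # product with Z Pxx Z without forming the matrix explicitly
    return Z @ (Pxx @ (Z @ v))

# plain CG started in range(Z): every iterate then stays in null(A)
xbar = np.zeros(nx)
r = -(Z @ gx)                # residual = negative gradient at xbar = 0
d = r.copy()
for _ in range(50):
    Hd = hess_vec(d)
    alpha = (r @ r) / (d @ Hd)
    xbar = xbar + alpha * d
    r_new = r - alpha * Hd
    if np.linalg.norm(r_new) < 1e-12:
        break
    beta = (r_new @ r_new) / (r @ r)
    d = r_new + beta * d
    r = r_new

print(abs(a @ xbar))  # ≈ 0: the CG iterates never leave the null space of A
```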

NUMERICAL EXPERIMENTS
In order to illustrate the behaviour of the algorithms presented in Section 2 in a data assimilation context, we use the modified shallow-water model of Würsch and Craig (2014) and twin experiments. In this section we describe the details of our experimental set-up.

Model
The modified shallow-water model has been used for testing different data assimilation algorithms in Haslehner et al. (2016) and Ruckstuhl and Janjić (2018). The model is based on the shallow-water equations, which have long been used for testing both numerical discretization schemes (Sadourny, 1975; Arakawa and Lamb, 1980; Cullen et al., 1997; Sommer and Névir, 2009; Ketefian and Jacobson, 2009) and data assimilation algorithms (Cohn and Parrish, 1991; Zeng and Janjić, 2016; Zeng et al., 2017). As the name suggests, the shallow-water equations were altered in Würsch and Craig (2014) in order to mimic key aspects of convection. To that end, a third variable, rain r, was introduced in addition to the velocity (or wind) u and fluid height h fields. The one-dimensional modified shallow-water model consists of prognostic equations for u, h and r (Würsch and Craig, 2014, give the full set), in which D_u, D_r and D_h are the diffusion coefficients in the equations for momentum, rain mass and the continuity equation. Here, g is the gravity constant and √(g h_0) is the gravity wave speed for the absolute fluid depth h_0. To initiate convection, convective updrafts are modelled by a stochastic Gaussian forcing added to u at random locations in each model time step. These random wind perturbations, when they create convergence, elevate the fluid surface. Conditional instability is modelled by modifying the geopotential φ, which is set to a low constant value φ_c once the height exceeds a threshold h_c representing the level of free convection. The gradient of the geopotential then forces the fluid into the region of decreased potential. The fluid level continues to rise until diffusion prevents further growth. When h reaches the rain threshold h_r (h_r > h_c) and convergence is positive, rain is 'produced'. One model parameter sets the production rate of rain and another its removal rate. The value of h_r ensures that the production of rain is delayed relative to the onset of clouds (height field).
In our numerical implementation of the model, the one-dimensional domain, representing 125 km, is discretized with 250 points using standard second-order centred differences on a staggered grid. To compute the evolution of the dynamical system, the time variable is discretized into time steps of 5 s. Periodic boundary conditions are used. The model conserves mass, so the spatial integral over h is constant in time, and the rain r cannot be negative. If rain becomes negative during the model integration, the negative values are set to zero. The Gaussian stochastic forcing of u has a half-width of four grid points and an amplitude of 0.002 m⋅s⁻². The fields produced by running this model starting from identical initial conditions are illustrated in Figure 1 after 120 model time steps (equivalent to 10 min in real time). As illustrated in Figure 1, starting with exactly the same initial conditions, the positions of clouds (height field) and rain can differ after 120 model time steps owing to the stochastic forcing, mimicking fast-changing convective storms whose intermittency is one of the challenges of data assimilation on the convective scale.
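A small Python sketch of two implementation details described above: the periodic Gaussian wind forcing and the clipping of negative rain. Interpreting the half-width of four grid points as the Gaussian standard deviation is our assumption, and the rain values are purely illustrative.

```python
import numpy as np

nx = 250           # 125 km domain discretized with 250 points (500 m spacing)
half_width = 4     # half-width of the Gaussian bump, in grid points (assumed = sigma)
amp = 0.002        # forcing amplitude, m s^-2

def gaussian_forcing(rng):
    """Gaussian wind forcing at a random location, periodic in x."""
    centre = rng.integers(nx)
    idx = np.arange(nx)
    # periodic (wrap-around) distance from the centre
    dist = np.minimum(np.abs(idx - centre), nx - np.abs(idx - centre))
    return amp * np.exp(-0.5 * (dist / half_width) ** 2)

rng = np.random.default_rng(3)
f = gaussian_forcing(rng)

r = np.array([0.01, -0.002, 0.0])   # rain with a spurious negative value
r = np.maximum(r, 0.0)              # negative rain is set to zero, as in the model
print(f.max(), r.min())             # 0.002 0.0
```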

Data assimilation experiments
The data assimilation set-up follows Ruckstuhl and Janjić (2018). To investigate the performance of the two algorithms presented in Section 2, we conduct a twin experiment, in which we consider a model run to be the true state z = (uᵀ, hᵀ, rᵀ)ᵀ, which we call the nature run. At each assimilation time, a vector of synthetic observations z_obs ∈ ℝ^o is created by randomly perturbing the nature run such that z_obs = Hz + ε_obs, where H is the o × n matrix that determines the location and physical nature of the observations, while ε_obs ∈ ℝ^o is a random noise whose components depend on the observed variable and are computed as follows. For observations of the wind and height fields, Gaussian observation noise with zero mean and standard deviations 0.001 m⋅s⁻¹ and 0.02 m, respectively, is added to the wind u and height h fields. A lognormal noise is added to the rain field with parameters −8 and 1.8 (the mean and variance of the underlying Gaussian), yielding a very small observation bias of 0.000825 and a standard deviation of 0.00185. For this choice of parameters, the observation error for each field is approximately 10% of the maximum deviation from the field mean. We chose to observe every 120 model time steps, which is equivalent to 10 min in real time. The locations of the observations are chosen to mimic radar data, and observations are simulated at locations where it rains in the nature run. As illustrated in Figure 1, it takes only 120 time steps (10 min) for the location of the observations in the nature run to differ from that of rain in each ensemble member. In addition, an extra 25% of observations are assimilated for the u field at other locations, mimicking observations from a different measurement system. The observation-error covariance matrix R is taken to be diagonal, with diagonal values corresponding to the variances of the distributions used for generating the observation error vector ε_obs.
Although the number of observations, and therefore the matrix sizes for the linear observation operator H, and R will change from one assimilation cycle to next, we omit the time index for notational simplicity.
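The quoted lognormal moments can be reproduced with the standard formulas for the mean and standard deviation of a lognormal variable. Note that the quoted bias (0.000825) and standard deviation (0.00185) match only if the parameters −8 and 1.8 are read as the mean and the *variance* (not the standard deviation) of the underlying Gaussian, which is the interpretation assumed in this sketch.

```python
import numpy as np

# Moments of the lognormal observation noise: mean = exp(mu + var/2),
# std = sqrt((exp(var) - 1) * exp(2*mu + var)).
mu, var = -8.0, 1.8
bias = np.exp(mu + var / 2)
std = np.sqrt((np.exp(var) - 1) * np.exp(2 * mu + var))

# Drawing the noise: numpy's lognormal takes the *standard deviation*
# of the underlying Gaussian, hence sqrt(var) here.
rng = np.random.default_rng(4)
noise = rng.lognormal(mean=mu, sigma=np.sqrt(var), size=100_000)

print(round(bias, 6), round(std, 5))  # 0.000825 0.00185
```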
We employ the Quadratic Programming Ensemble (QPEns) algorithm (Janjić et al., 2014) for data assimilation. Like the stochastic EnKF, the QPEns uses an ensemble of predictions (backgrounds) {z_b,i}, i = 1, …, N, to calculate the sample background-error covariance matrix P_b, with z̄_b representing the ensemble mean. Since the sample covariance P_b is singular when (N − 1) < n, in our experiments it is localized with a blockwise matrix where each block uses a fifth-order polynomial function (Ruckstuhl and Janjić, 2018, give details) with cut-off at eight grid points. The QPEns algorithm then calculates the analysis ensemble from the background ensemble, the perturbed observations, the localized background-error covariance matrix and the observation-error covariance matrix. However, instead of using the Kalman filter formulae to obtain the analyses, it solves the quadratic programming problem (15) for each member i = 1, …, N, subject to the physical constraints (16). Here δz_i = z_a,i − z_b,i is the analysis increment and f_i = z_obs,i − H z_b,i − H δz_i. The measurements appearing in f_i are perturbed using the same strategy as when generating the observations, described at the beginning of this section. The solutions of the N constrained optimization problems form the analysis ensemble {z_a,i}, i = 1, …, N. The physical constraints for the modified shallow-water model are conservation of mass and preservation of the non-negativity of rain. In the QPEns framework, these constraints take the form (16), where e_hᵀ = (1, …, 1) is of the same size as h, and r and h are the corresponding components of z. Note that the inequality in (16) is equivalent to r_a,i ≥ 0, and the equality to conservation of the total mass of the analysis. In the notation of Section 2, x collects the wind and height components, y = r, N = 50, P = (P_b)⁻¹ + Hᵀ R⁻¹ H, and for each ensemble member i the gradient is calculated as (g_xⁱ, g_yⁱ)ᵀ = −Hᵀ R⁻¹ (z_obs,i − H z_b,i). Data are assimilated every 10 min for a total of 36 cycles.
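The following toy sketch shows how the per-member QP could be assembled from the quantities above: the Hessian P = (P_b)⁻¹ + Hᵀ R⁻¹ H, the gradient g = −Hᵀ R⁻¹ (z_obs − H z_b), a mass-conservation row e_h, and a lower bound on the rain increment. The sizes, observation locations and the stand-in background covariance are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(5)
ng = 4                                   # grid points (toy)
n = 3 * ng                               # z = (u, h, r)

M = rng.standard_normal((n, n))
Pb = M @ M.T / n + np.eye(n)             # stand-in for the localized P_b
H = np.zeros((2, n))
H[0, ng] = 1.0                           # observe one h point
H[1, 2 * ng] = 1.0                       # observe one r point
R = np.diag([0.02**2, 0.00185**2])       # observation-error variances

zb = 0.01 * rng.standard_normal(n)
zb[2 * ng:] = np.abs(zb[2 * ng:])        # background rain is non-negative
zobs = H @ zb + 0.001 * rng.standard_normal(2)

Rinv = np.linalg.inv(R)
P = np.linalg.inv(Pb) + H.T @ Rinv @ H   # Hessian of the per-member QP
g = -H.T @ Rinv @ (zobs - H @ zb)        # gradient at dz = 0

e_h = np.zeros(n)
e_h[ng:2 * ng] = 1.0                     # equality: e_h^T dh = 0 (mass)
r_lower = -zb[2 * ng:]                   # inequality: dr >= -r_b, i.e. r_a >= 0

dz = -np.linalg.solve(P, g)              # unconstrained increment: ignores (16)
# Without the constraints, this increment generally changes the total mass
# and may drive the analysed rain negative, motivating the QPEns formulation.
```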
In order to obtain robust results, twin experiments were repeated many times for different random seeds.

RESULTS
In the data assimilation experiments discussed in this section, the minimization of the problem (15)-(16) is carried out using the algorithms described in Section 2. First, the convergence of both algorithms is evaluated in Section 4.1. Next, the effect on the data assimilation results of only partially solving the minimization problems in the QPEns is investigated in Section 4.2.

Convergence of the minimization algorithms
To apply Algorithm 1 we use an LU decomposition with pivoting to solve (4) accurately; the lower block of this system contains only one equation in this case. Table 1 illustrates the performance of the algorithm for one ensemble member during a first data assimilation cycle and during a tenth cycle (to illustrate the accuracy, the difference is calculated between the result of each major iteration z_k and the solution z*). In this example we set ε = 0.1, and we see (in the last column of Table 1) that the RMSE compared to the solution decreases at each iteration, reaching an accuracy of ∼10⁻³ after five iterations in the first-cycle example and ten digits of accuracy after four iterations in the tenth-cycle example. The minimization stops when the ratio of the gradient to the initial gradient reaches ε. The third column of Table 1 shows the number of inactive constraints, that is, the kth guess of the set of grid points where it rains, which determines the free variables in the computation of the search direction (4). After the search direction is calculated, the size of the allowed step is computed in Step 4 of Algorithm 1 and reported in the fifth column of Table 1. For completeness, the value of the cost function (1) at each iteration is included in Table 1 (second column). As illustrated in Table 1, Algorithm 1 converges in only five iterations on this example, while a more general interior-point method like the CVXOPT package (Andersen et al., 2010) typically requires more iterations to solve the same problem (not shown). Figure 2 shows the dependence of the RMSE with respect to the solution on the specified accuracy ε and on the number of iterations performed for the first ten cycles of data assimilation. By the tenth cycle, the background is relatively close to the solution and high accuracy is reached within a small number of iterations. Note that the stopping criterion of the minimization algorithm is relative to the gradient of the cost function at the starting point (x_0, y_0), which in our case is the background. It makes sense that this gradient decreases as the background becomes more skilful in later cycles, leading to a higher accuracy requirement for a given ε, as is clear from Figure 2a,c,e. To illustrate the cost of this algorithm, the number of iterations is plotted as a function of the specified relative accuracy ε in Figure 3. The number of iterations increases to six only for very small ε.
In Algorithm 2, minimization is done in the projected space with Z = I − Aᵀ A∕(e_hᵀ e_h). We found that a suitable preconditioner for the CG is an important ingredient for efficiency. In our case, the diagonal of the matrix entering the CG algorithm is used. We refer the reader to Gratton et al. (2013) for a discussion of suitable preconditioning strategies in the context of data assimilation. It is also known that Algorithm 2 could be implemented without Step 4 if mere convergence is wanted, but performing conjugate gradient iterations, as suggested in Conn et al. (1992), very often significantly reduces the number of outer iterations. This was also observed for our test problem. The CG subproblems are terminated when the residual is smaller than a fraction ε̃ of the residual computed from the initial quantities entering the CG algorithm. We somewhat arbitrarily fixed ε̃ = ε for all the experiments presented here. As in Table 1, Table 2 illustrates the performance of the algorithm for one ensemble member during the first and tenth data assimilation cycles. As before, we choose ε = 0.1 in this example, and we see that in this case the error with respect to the solution decreases with each iteration, reaching an accuracy of 10⁻³ after three outer iterations and a total of 131 CG iterations in the first cycle. In the tenth data assimilation cycle, four digits of accuracy are achieved after four outer iterations and a total of 214 CG iterations. Note that the minimization terminates only once all conditions in Step 2 of Algorithm 2 are satisfied. The third column of Table 2 shows the number of free variables at each iteration.

Figure 3: Number of iterations k required for Algorithm 1 as a function of the specified relative accuracy ε. Different shades of blue represent results for each of the first ten data assimilation cycles. The smallest ε used in the experiments is 0.01, then values 0.1 to 1 in steps of 0.1. Results were obtained by averaging over 374 experiments.
This number decreases, contrary to the increase over the iterations in Table 1. In Table 2, 'CG its' denotes the number of CG iterations at major iteration k and 'faces' is the number of explored faces at iteration k; to illustrate the accuracy, the difference is calculated between the result of each major iteration z_k and the solution z*. The table also reports the step size computed in Step 3; the fact that this step size is very small illustrates the poor conditioning of the problem. We also note that the number of successive, progressively smaller subspaces in which the CG is applied (listed under the 'faces' heading) is quite small and decreases as the guess of the optimal active set improves (and the associated number of CG iterations, under the heading 'CG its', decreases). For completeness, the value of the cost function at each iteration is also included in the table. The dependence of the RMSE on the specified accuracy ε and the number of total CG iterations performed is shown in Figure 4 for the first ten cycles of data assimilation. The total CG iterations refers to the number of CG iterations accumulated over all outer iterations. In contrast to Algorithm 1, a clear correlation between the cycle number and the accuracy as a function of ε (Figure 4a,c,e) is not observed. We believe this is due to the stopping criterion of the CG subproblems, which depends on the latest iterate. In addition, preconditioning is involved, which affects the stopping criterion as well. Figure 5a shows the number of accumulated CG iterations as a function of ε. Note that the CG iterations dominate the cost of the algorithm, thereby allowing a natural cost comparison between Algorithm 2 and any other variational data assimilation algorithm that applies the CG for minimization. Figure 5b is a direct result of our termination choice for the CG.
Generally, when the subproblem in Step 4 is solved to a high accuracy, few outer iterations are needed and vice versa. We found that the number of CG iterations accumulated over all outer iterations is fairly insensitive to the exact choice of termination test for the CG, as long as it is not too lenient (not shown). For̃= , Figure 5b indicates that the average ratio of outer iterations to CG iterations per outer iteration is larger for lower accuracy of the solution.
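The projected, diagonally preconditioned CG described above can be sketched as follows, assuming for illustration that the only equality constraint is conservation of total mass, so that A = eₕᵀ has rank one and applying Z = I − AᵀA/(eₕᵀeₕ) costs only O(n). Function names and the dense-matrix setting are ours, not the authors' implementation; the stopping rule mirrors the ε̃-fraction test used in the experiments.

```python
import numpy as np

def project(v, e_h):
    """Apply Z = I - e_h e_h^T / (e_h^T e_h): remove the component of v
    along the constraint normal e_h. Because the constraint is rank-1,
    this costs only O(n)."""
    return v - e_h * (e_h @ v) / (e_h @ e_h)

def projected_pcg(H, g, e_h, z0, eps_tilde=0.1, max_it=200):
    """Minimize 0.5 z^T H z - g^T z over the affine set e_h^T z = e_h^T z0,
    using CG projected onto the constraint null space and preconditioned
    with the diagonal of H (Jacobi). The CG is terminated when the
    preconditioned residual norm drops below eps_tilde times its initial
    value, mimicking the fraction-based test in the text."""
    M_inv = 1.0 / np.diag(H)              # diagonal (Jacobi) preconditioner
    z = z0.copy()
    r = project(g - H @ z, e_h)           # projected residual
    y = project(M_inv * r, e_h)           # preconditioned, re-projected
    p = y.copy()
    rho0 = r @ y
    rho = rho0
    for _ in range(max_it):
        if rho <= (eps_tilde ** 2) * rho0:
            break                          # residual below fraction eps_tilde
        Hp = H @ p
        alpha = rho / (p @ Hp)
        z += alpha * p                     # p lies in the null space of e_h^T,
        r -= alpha * project(Hp, e_h)      # so e_h^T z is conserved exactly
        y = project(M_inv * r, e_h)
        rho_new = r @ y
        p = y + (rho_new / rho) * p
        rho = rho_new
    return z
```

Because every search direction is projected, the mass constraint eₕᵀz = eₕᵀz₀ holds at every CG iterate, not just at convergence.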

Data assimilation results
When the minimization problems of type (15)-(16) are solved to high accuracy, both algorithms yield the same data assimilation results as obtained, for example, with the interior-point method of Python used in Ruckstuhl and Janjić (2018). However, it is interesting to investigate the effect of terminating the algorithms early, as this might be necessary due to the size of the minimization problems encountered in practice.

FIGURE 5 Number of iterations required for Algorithm 2 as a function of the specified relative accuracy ε. Different shades of blue represent results for each of the first ten data assimilation cycles. The smallest ε used in the experiments is 0.01, then values 0.1 to 1 in steps of 0.1. Results were obtained by averaging over 500 experiments.

An advantage of the presented algorithms is that they guarantee that the constraints are satisfied at each iteration. Interior-point methods, on the other hand, either violate the constraints throughout the minimization process or only allow approaching the bounds near the solution (depending on the precise formulation of the algorithm), making them unsuitable for early termination in our context. We performed an experiment where Algorithm 1 is terminated after one iteration and Algorithm 2 once a total of ten CG iterations is reached. The results are presented in Figures 6 and 7. As reference, both the QPEns with fully solved minimization problems (red line) and the EnKF (blue line), which is equivalent to the QPEns without constraints, are plotted as well. As thoroughly discussed in Ruckstuhl and Janjić (2018), the QPEns produces smaller RMSE than the EnKF. The differences between the QPEns and the EnKF persist during 6 hr forecasts and can especially be seen in the height field. The difference in the rain prediction is most pronounced during the first 3 hr.
The approximate QPEns for Algorithm 1 (orange line) is very close to the exact QPEns, indicating that performing one iteration for each minimization problem already yields the desired results. The approximate QPEns for Algorithm 2 (green line) yields worse analyses for the wind variable u than the EnKF, although this is corrected throughout the forecast. The other variables clearly benefit from the imposed constraints, even when the minimization problems are not fully solved. For the rain variable r, the RMSE is even lower than for the exact QPEns. The RMSE of forecasts starting with a perfect initial ensemble (black line) is also plotted for comparison, in order to illustrate the intrinsic predictability of this model. Although all ensemble members of the perfect ensemble initially start from the same true state, they diverge during the forecast due to stochastic perturbations in the wind field (11).
To understand the differences in RMSE, let us consider one snapshot of the results in Figure 7, with the truth plotted in black. One notices that, due to the small localization radius and the observations that mimic radar data, the differences in the results stem primarily from the spurious convection that arises for the EnKF. Although the differences between the solutions of the constrained minimization problems (corresponding to the QPEns) and the unconstrained minimization problems (corresponding to the EnKF) are generally small, Ruckstuhl and Janjić (2018) show that data assimilation errors of the EnKF accumulate over time, leading to large errors in total mass and total rain after repeating data assimilation 250 times, that is, in less than one day. We refer the interested reader to Ruckstuhl and Janjić (2018) for the investigation of the differences between the QPEns and the EnKF analyses for different data assimilation settings, including ensemble size. The approximate QPEns for Algorithm 1 closely follows the exact QPEns. This does not hold for the approximate QPEns of Algorithm 2, where large differences are detected in the wind (u) variables. In the other variables, too, the differences from the exact solution are more pronounced than for the approximate QPEns of Algorithm 1.
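The feasibility property underlying the early-termination experiments (every iterate satisfies the bounds, so stopping at any point still returns non-negative fields) can be illustrated with a toy projected-gradient sketch on a bound-constrained quadratic. This is an illustrative stand-in with hypothetical names, not the QPEns code:

```python
import numpy as np

def projected_gradient(H, g, z0, step, n_iter):
    """Projected gradient descent for min 0.5 z^T H z - g^T z s.t. z >= 0.
    Because the bound constraints are disjoint (one per component), the
    projection is an elementwise clip, so EVERY iterate is feasible and
    early stopping is safe -- unlike interior-point iterates."""
    z = np.maximum(z0, 0.0)                  # project the starting point
    history = [z.copy()]
    for _ in range(n_iter):
        z = np.maximum(z - step * (H @ z - g), 0.0)  # gradient step + clip
        history.append(z.copy())
    return z, history
```

Truncating the loop at any iteration count returns a physically admissible (non-negative) state, which is the property exploited when Algorithm 2 is cut off after ten CG iterations.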

CONCLUSION
In NWP, we can expect that even global models will reach convection-permitting resolution in the forthcoming decade. Data assimilation for such geophysical models, which resolve many scales of motion, and for observations of higher temporal and spatial density will require that we re-evaluate and improve the methodology. Improved data assimilation systems will therefore require the development of novel approaches that account for non-Gaussianity and model errors, as well as a better understanding of how different existing methods can represent the uncertainty of both observations and numerical models in the data assimilation and forecasting steps. Data assimilation methods will also require a better way to estimate prognostic hydrometeors multivariately. This is a challenging task, due to the computational cost of producing non-negative estimates, the frequency of their estimation, and the need to initialize probabilistic prediction. We have presented two projection algorithms which exploit the disjoint nature of the constraints typically occurring in weather forecasting applications. While projection methods may be inefficient when the combinatorial aspect of selecting the correct active bounds dominates and many faces need to be explored at each major iteration (in which case interior-point algorithms often perform better), they do perform well compared with interior-point algorithms when the gradient quickly provides a good identification of the active constraints. This appears to be the case in our (representative) application. In addition, in contrast to interior-point methods, the presented algorithms remain in the feasible region, thereby allowing early termination of the minimization process without violation of the constraints. We showed that, after imposing a termination test of a total of ten CG iterations for Algorithm 2, the QPEns still significantly outperformed the EnKF.
The first of our methods, Algorithm 1, is more efficient than an interior-point approach on a representative example, but still requires solving the Karush-Kuhn-Tucker system (4), which is impractical in weather forecasting applications due to the frequency and size of the problems. By contrast, Algorithm 2 exploits the low rank of the linear equality constraints and uses a well-known iterative approach to compute a possibly approximate solution while ensuring satisfaction of the constraints. If the size and conditioning of the problem are such that the conjugate gradient algorithm is allowed to converge, the number of outer iterations required by Algorithm 2 is smaller than or comparable to that required by Algorithm 1. If the number of conjugate gradient iterations is limited from the start (as is often the case in weather forecasting applications), the number of outer iterations typically increases, and finding the optimal balance between accuracy and cost then depends on the problem at hand.
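The contrast drawn here between the two approaches can be made concrete on a small equality-constrained QP: the KKT route factorizes an (n+m)×(n+m) system, while the null-space route solves only a reduced (n−m)-dimensional system, which is what makes an iterative CG treatment attractive when the constraints have low rank. The dense-linear-algebra sketch below is illustrative only and does not reflect the operational problem sizes:

```python
import numpy as np

def solve_kkt(H, A, g, b):
    """Equality-constrained QP min 0.5 x^T H x - g^T x s.t. A x = b,
    solved via the full KKT system [H A^T; A 0][x; lam] = [g; b].
    Exact, but requires factorizing an (n+m) x (n+m) matrix."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    return np.linalg.solve(K, np.concatenate([g, b]))[:n]

def solve_null_space(H, A, g, b):
    """Same QP via the null-space approach: write x = x_p + Z y with
    A x_p = b and A Z = 0, so only the reduced (n-m)-dimensional system
    Z^T H Z y = Z^T (g - H x_p) remains -- a natural target for CG when
    m (the constraint rank) is small."""
    x_p = np.linalg.lstsq(A, b, rcond=None)[0]   # particular solution
    _, _, Vt = np.linalg.svd(A)                  # null-space basis via SVD
    Z = Vt[A.shape[0]:].T                        # columns span null(A)
    y = np.linalg.solve(Z.T @ H @ Z, Z.T @ (g - H @ x_p))
    return x_p + Z @ y
```

Both routes return the same minimizer; only the size of the system being solved differs.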
The observations made in this short paper are encouraging, but the authors are aware that adapting the proposed method(s) to a real environment remains a significant task. Both algorithms could be applied in a pure 3D-Var or an ensemble/hybrid setting, and for Algorithm 2 already existing implementations of the CG could be modified. However, our current implementation is in physical space. Further computational savings would be possible by reformulating the constrained problem in observation space and assimilating data in batches (similar to what is done for the unconstrained problem in Cohn et al., 1998), combined with algorithms that draw inspiration from Gratton and Tshimanga (2009) and Gratton et al. (2011). An alternative would be more elaborate preconditioning, which would allow transformation of the data assimilation problem into a smaller control space. For example, in an ensemble setting, the square root of the inverse of the Hessian is easy to calculate and could be used for preconditioning (Zupanski, 2005). However, the current challenge with this approach is that inequality constraints are then more difficult to handle. Finally, more thought should be given to possible improvements of the details of the face-changing mechanism, such as exploiting the suggestions in Dostál (1997).

FIGURE 7 Snapshot of the modified shallow-water variables (a) u, (b) h and (c) r after 36 cycles of data assimilation. Depicted are the true fields (black), the background ensemble mean of the EnKF (blue), the QPEns (red), and the approximate QPEns for Algorithm 1 (orange) and Algorithm 2 (green).
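To illustrate the preconditioning idea mentioned in the conclusion: in an ensemble setting with k members, the ensemble-space Hessian is a small k×k matrix whose inverse square root is cheap to form from one eigendecomposition (cf. Zupanski, 2005). The sketch below uses our own notation and assumes a diagonal observation-error covariance R; it is a minimal illustration, not the operational computation:

```python
import numpy as np

def inv_sqrt_hessian_ensemble(Y, R_diag):
    """In ensemble space the cost-function Hessian is the k x k matrix
    J'' = I + Y^T R^{-1} Y, where Y (m obs x k members) holds the
    observation-space ensemble perturbations and R is assumed diagonal.
    Its inverse square root J''^{-1/2}, a candidate preconditioner,
    follows from one small symmetric eigendecomposition."""
    k = Y.shape[1]
    Jpp = np.eye(k) + Y.T @ (Y / R_diag[:, None])  # I + Y^T R^{-1} Y
    w, V = np.linalg.eigh(Jpp)                     # SPD: real, positive w
    return V @ np.diag(w ** -0.5) @ V.T            # J''^{-1/2}
```

Because k (the ensemble size) is far smaller than the state dimension, this factorization costs O(k³) regardless of the model resolution; the open difficulty noted in the text is that the inequality constraints do not transform as conveniently into this control space.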