A robust numerical method for the potential vorticity based control variable transform in variational data assimilation


  • S. Buckeridge, Department of Mathematical Sciences, University of Bath, BA2 7AY
  • M.J.P. Cullen (corresponding author), Met Office, Exeter, UK
  • R. Scheichl, Department of Mathematical Sciences, University of Bath, BA2 7AY
  • M. Wlasak, Met Office, Exeter, UK

The contribution of these authors was written in the course of their employment at the Met Office, UK, and is published with the permission of the Controller of HMSO and the Queen's Printer for Scotland.

The potential vorticity based control variable transformation for variational data assimilation, proposed in Cullen (2003), is a promising alternative to the currently more common vorticity based transformation. It leads to a better decorrelation of the control variables, but it involves solving a highly ill-conditioned elliptic partial differential equation (PDE) with a constraint. This PDE has so far been impossible to solve to any reasonable accuracy for realistic grid resolutions in finite difference formulations. Following on from the work in Buckeridge and Scheichl (2010), we propose a numerical method for it based on a Krylov subspace method with a multigrid preconditioner. The problem of interest includes a constraint in the form of two-dimensional elliptic solves embedded within the main three-dimensional problem. Thus the discretised problem cannot be formulated as a simple linear equation system with a sparse system matrix (as usual in elliptic PDEs). Therefore, in order to precondition the system we apply the multigrid method in Buckeridge and Scheichl (2010) to a simplified form of the three-dimensional operator (without the embedded two-dimensional problems), leading to an asymptotically optimal convergence of the preconditioned Krylov subspace method. The solvers used at the Met Office typically take over 100 iterations to converge to a residual tolerance of 10⁻¹ and fail to converge to a tolerance of 10⁻². The method proposed in this paper, in contrast, can converge to a tolerance of 10⁻² within 15 iterations on all typical grid resolutions used at the Met Office, and is convergent to a tolerance of 10⁻⁶. In addition, the method demonstrates almost optimal parallel scalability. Copyright © 2011 Royal Meteorological Society and British Crown Copyright, the Met Office

1. Introduction

Two- and three-dimensional elliptic partial differential equations (PDEs) play a major role in numerical weather prediction (NWP) and their solution is often the main bottleneck.

In NWP, the state of the atmosphere is described using a set of ‘model variables’ such as wind velocity, pressure, moisture and temperature. Data assimilation (DA) is the process of finding the best estimate of the current state of the atmosphere by combining a previous forecast and the knowledge of atmospheric dynamics with observational data and statistical data that measure the accuracy of the forecast and of the observations. Uncertainties are quantified with probability density functions, and a model state is found that is the statistically optimal estimate of the truth given the previous forecast and new observations, as well as estimates of the errors in each of them. Due to the chaotic nature of the governing equations in the model, any errors in the initial conditions will be amplified in the forecast. Thus, despite the continuous advancements in computational power and in numerical methods, these benefits cannot be fully realised in NWP without accurate data assimilation techniques.

A very successful and popular data assimilation technique, used e.g. in the UK Met Office's DA software VAR, is incremental 4D-VAR, described by Lorenc (1986), Rawlins et al. (2007), and Katz et al. (2011). This method attempts to minimise a cost function with respect to the forecast error (or increment) x′ = x − xf of the model variables, where x is the true model state and xf is the previous forecast. It relies on the inversion of the ‘background error covariance’ matrix B, which measures the uncertainties of the background state, i.e. an approximation of the current state of the atmosphere produced from a previous forecast. Unfortunately, B is a dense matrix due to the strong correlations between errors in the model variables. Moreover, B is a large matrix of size O(10⁷ × 10⁷), hence it is too large to store explicitly, and attempting to invert it is operationally impractical.
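Schematically, the incremental cost function takes the standard quadratic form of the 4D-VAR literature (the notation here follows common textbook presentations rather than the VAR code itself):

```latex
J(\mathbf{x}') \;=\; \tfrac12\,\mathbf{x}'^{\mathsf T}\mathbf{B}^{-1}\mathbf{x}'
\;+\; \tfrac12\,\bigl(\mathbf{H}\mathbf{x}'-\mathbf{d}\bigr)^{\mathsf T}\mathbf{R}^{-1}\bigl(\mathbf{H}\mathbf{x}'-\mathbf{d}\bigr),
```

where H is the linearised observation operator, d the vector of innovations (observations minus the background equivalents) and R the observation error covariance matrix. The B⁻¹ term is precisely the one that the control variable transformation addresses.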

However, the VAR problem can be expressed in terms of new variables that simplify the background error covariance matrix, in a process known as the ‘control variable transformation’ (CVT) or parameter transform (Bannister, 2008). The CVT is the transformation between the model variables and new ‘control variables’, whose errors may be assumed uncorrelated to a reasonable approximation. By expressing the VAR problem in terms of the control variables, it is possible to approximate the B-matrix by a block-diagonal matrix, neglecting correlations between different control variables. Further transformations can then be used to spatially decorrelate the variables so that the B-matrix essentially becomes a diagonal matrix (see Ingleby, 2001).

The idea of the CVT is to partition errors into ‘balanced’ and ‘unbalanced’ components which are assumed mutually uncorrelated. ‘Balanced’ flows are those which are close to geostrophic and hydrostatic balance. Following Hoskins et al. (1985), it is known that the evolution of such flows can be described by the transport of the potential vorticity (PV) together with a diagnostic calculation of the mass and wind fields. This calculation involves solving two- and three-dimensional elliptic problems which can be computationally very costly. For example, an option in the current CVT is the solution of the quasi-geostrophic Omega (QG-Ω) equation for finding the vertical velocity increment (see Fisher, 2003). A robust numerical method for this equation has recently been developed by Buckeridge and Scheichl (2010).

Operationally at the UK Met Office, the control variables used are the streamfunction ψ, the unbalanced pressure pu and the velocity potential χ. The streamfunction is used to represent the balanced component of the flow. However, that assumption breaks down on scales larger than the Rossby radius of deformation, or for aspect ratios greater than f/N, where f is the Coriolis parameter and N the Brunt-Väisälä frequency, meaning that the balance operator has to be disabled on those scales.

A better decorrelation of the variables can be obtained by a CVT based on PV, described and formulated by Cullen (2003), Wlasak et al. (2006) and Bannister et al. (2007). The balanced components of flow are described by the PV, whilst the unbalanced components of flow have no PV and are therefore associated with so-called ‘anti-PV’. Using this premise, a new control variable that is related to PV is chosen to represent the balanced component of the forecast error. The control variables used in this new ‘PV-based’ CVT (see Bannister and Cullen, 2009) are the balanced streamfunction ψb, the unbalanced pressure pu and the velocity potential χ. The new formulation recognises the presence of an unbalanced streamfunction and exploits the association between PV and the balanced component of the flow. Hence, it should not suffer the shortcomings of the vorticity-based CVT. However, it introduces a new highly ill-conditioned PDE, namely the ‘balanced PV-equation’ (see Bannister and Cullen, 2007):

equation image(1)

where ∇h² = (1/(r² sin ϕ)) ∂/∂ϕ(sin ϕ ∂/∂ϕ) + (1/(r² sin² ϕ)) ∂²/∂λ² is the 2D Laplacian in spherical polar coordinates (r,ϕ,λ). The coefficients α0(·,·), β0(·,·), γ0(·,·), ε0(·,·) and m0(·,·) are reference state values, specified in Appendix A (see also Bannister and Cullen, 2009), that only vary with latitude and radius. PV′(·,·,·) is the potential vorticity increment. The balanced PV-equation (1) has to be solved for the balanced streamfunction increment ψb′(·,·,·). However, it requires the balanced pressure increment pb′(r,·,·), which is the solution to the linear balance equation (see Bannister and Cullen, 2007):

equation image(2)

where ρ0 is a reference value of the density and f is the Coriolis parameter. Note that for a constant Coriolis parameter, i.e. f(ϕ) = f0, (2) reduces to equation image. In this case, (1) simplifies to a standard elliptic PDE, similar to the QG-Ω equation, that can be solved optimally using a novel multigrid method based on a conditional semi-coarsening strategy in Buckeridge and Scheichl (2010). Appendix C briefly explains some of the mathematical jargon related to the iterative solution of elliptic linear systems.

For global computations it is not reasonable to assume that f(ϕ) is constant, and so, due to the embedded 2D elliptic solves for pb′, the multigrid method in Buckeridge and Scheichl (2010) cannot be applied directly to the 3D problem (1). Instead we have to resort to a Krylov subspace method (such as Conjugate Gradients) and precondition it by applying multigrid to a simplified form of (1), similar to the QG-Ω equation. This preconditioning system can be solved optimally (i.e. robustly with respect to grid refinement) using the method in Buckeridge and Scheichl (2010). Since, asymptotically, as the mesh size goes to zero, the simplified system has essentially the same spectrum as the full system (1), the optimal convergence of the multigrid preconditioner for the simplified system (demonstrated already in Buckeridge and Scheichl (2010)) translates into an optimal convergence of the resulting preconditioned Krylov method for (1). Numerical tests confirm that the number of iterations does not grow with problem size.
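The structure of such a multigrid-preconditioned Krylov solver can be illustrated in a few lines. The sketch below is a minimal one-dimensional stand-in, not the VAR solver: a Poisson problem plays the role of the simplified operator, a weighted-Jacobi V-cycle plays the role of the multigrid preconditioner of Buckeridge and Scheichl (2010), and all function names are our own.

```python
import numpy as np

def apply_A(u, h):
    """Matrix-free 1D Poisson operator -u'' (3-point stencil, Dirichlet BCs)."""
    Au = 2.0 * u
    Au[:-1] -= u[1:]
    Au[1:] -= u[:-1]
    return Au / h**2

def smooth(u, b, h, nsweeps=2, omega=2.0/3.0):
    """A few weighted-Jacobi smoothing sweeps."""
    for _ in range(nsweeps):
        u = u + omega * (h**2 / 2.0) * (b - apply_A(u, h))
    return u

def v_cycle(b, h):
    """One multigrid V-cycle for the 1D Poisson problem (n = 2^k - 1 points)."""
    n = b.size
    if n <= 3:                                           # coarsest grid: direct solve
        A = (np.diag(np.full(n, 2.0)) - np.diag(np.ones(n-1), 1)
             - np.diag(np.ones(n-1), -1)) / h**2
        return np.linalg.solve(A, b)
    u = smooth(np.zeros(n), b, h)                        # pre-smoothing
    r = b - apply_A(u, h)
    rc = 0.25*r[:-2:2] + 0.5*r[1::2] + 0.25*r[2::2]      # full-weighting restriction
    ec = v_cycle(rc, 2.0*h)                              # coarse-grid correction
    e = np.zeros(n)                                      # linear prolongation
    e[1::2] = ec
    e[0], e[-1] = 0.5*ec[0], 0.5*ec[-1]
    e[2:-1:2] = 0.5*(ec[:-1] + ec[1:])
    return smooth(u + e, b, h)                           # post-smoothing

def pcg(b, h, tol=1e-8, maxit=50):
    """Conjugate gradients, preconditioned by one V-cycle per iteration."""
    x = np.zeros_like(b)
    r = b.copy()
    z = v_cycle(r, h)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxit + 1):
        Ap = apply_A(p, h)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = v_cycle(r, h)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

# solve -u'' = pi^2 sin(pi x) on (0,1), exact solution u = sin(pi x)
n = 2**9 - 1
h = 1.0 / (n + 1)
xs = h * np.arange(1, n + 1)
u, nits = pcg(np.pi**2 * np.sin(np.pi * xs), h)
err = np.max(np.abs(u - np.sin(np.pi * xs)))
```

With a robust V-cycle as preconditioner the iteration count stays essentially flat as n grows, which is the behaviour reported here for the 3D solver; dropping the preconditioner makes the count grow with the problem size.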

The Krylov method is only optimal when preconditioned with a robust multigrid method. With standard preconditioners, such as the ADI-type preconditioners used at the Met Office (see Birkhoff et al., 1962), the number of iterations grows with problem size and the method actually fails to converge beyond a residual tolerance of 10⁻¹.

The rest of the paper is organised as follows. In Section 2, the individual steps in the PV-based CVT are described. Section 3 describes the discretisation of a general class of elliptic problems in spherical polar coordinates by means of a finite volume method on the type of grids used in VAR. In particular, this will cover the simplified preconditioning system. Section 4 then gives the results for solving this system numerically using the multigrid method in Buckeridge and Scheichl (2010) confirming its optimality in solving elliptic problems in spherical polar coordinates. In Section 5, we use a Krylov subspace method to devise an optimal solver with a multigrid preconditioner for (1). We test the performance of the method and make comparisons with the method currently employed at the Met Office using actual case studies in VAR. We then use this solver to test the accuracy of the complete PV-based CVT in Section 6. Finally, in Section 7 we confirm the parallel scalability of the code on a typical multi-core architecture.

2. The potential vorticity based transformation

In this section we describe the transformation from model to control variables, as well as the inverse transformation from control to model variables for the PV-based CVT. This follows the methods set out in Cullen (2003) and Bannister et al. (2007). The discrete representation follows that used in the Met Office Unified Model (Davies et al., 2005). The domain Ω is parametrised using spherical polar coordinates,

Ω = {(r,ϕ,λ) : a ≤ r ≤ a + d, 0 ≤ ϕ ≤ π, 0 ≤ λ < 2π},

where a ≈ 6.3 × 10⁶ m and d ≈ 6.3 × 10⁴ m denote the radius of the Earth and the height of the atmosphere, respectively. For simplicity, in the description of the CVT let us neglect orography and assume a ≤ r ≤ a + d for all ϕ and λ. The domain is partitioned into staggered grids, namely the Arakawa C-grid in the horizontal (cf. Arakawa et al., 1977) and the Charney-Phillips grid in the vertical (cf. Arakawa et al., 1996). The two grids in the vertical are labelled as θ- and ρ-levels, where the θ-levels occupy the top and bottom vertical boundary and the ρ-levels are located halfway between adjacent θ-levels. Each model and control variable is located at a specific point on the staggered grid, as shown in Figures 1 and 2.

Figure 1.

Arakawa C-grid used in the horizontal

Figure 2.

Charney-Phillips grid used in the vertical.

Firstly the transformation from the model to control variables is compactly formulated by the T-transform, which is defined by

T : x′ = (u′,v′,p′) ↦ v′ = (ψb′,χ′,pu′).

Given the model variables, the T-transform consists of a succession of

  • finite difference calculations,

  • interpolations of values on the staggered grids,

  • solves of 2D and 3D elliptic problems.

It is achieved via an intermediate set of variables, namely the horizontal divergence D′, the potential vorticity increment PV′ and the anti-PV increment. In the definition of the intermediate variables, we use the shallow atmosphere approximation (White et al., 2005), and assume a constant radius a in the definition of the horizontal derivatives. The detailed sequence of steps in the T-transform is as follows:

  • Step 1: (u′,v′,p′) → (D′,ψ′,p′)
    The model variables u′ and v′ are located on the edges of each cell in the Arakawa C-grid and on the ρ-levels in the vertical direction on the Charney-Phillips grid. p′ is found in the centre of each cell (including the pole nodes) and also on the ρ-levels, and we call these p′-points (see Figure 3(a)). The horizontal divergence increment, D′, is calculated as the divergence of the horizontal velocity field, i.e.
    D′ = ∇r · u′,
    where u′ = (u′,v′) and the derivatives are calculated using finite differences. Since u′ and v′ are located on the edges of each cell, D′ is found naturally at the p′-points. Since the poles are also located at cell centres, D′ is derived at the poles using the divergence theorem. The vorticity ζ′ is the curl of the velocity field, i.e. ζ′ = ∇r × u′, and is found naturally on the corners of each cell and on ρ-levels. We call these ψ′-points (see Figure 3(b)). By the Helmholtz decomposition, the horizontal velocities can be decomposed into rotational and divergent parts. Thus, using the equation for the vorticity, the streamfunction ψ′ (located at the ψ′-points) is obtained by solving
    ∇h²ψ′ = ζ′,(3)
    where ∇h² is the 2D Laplacian in spherical polar coordinates. The right-hand-side in (3) is calculated using finite differences, and this 2D Poisson problem with periodic and polar boundary conditions (see Buckeridge and Scheichl, 2010) must be solved on each ρ-level. As a result the system is rank-deficient and the solution is unique only up to a constant. The choice of constant does not affect the implied wind increments and for the experiments in this paper it is set to zero. There are no equations at the poles in the case of variables located at ψ′-points.
  • Step 2: (ψ′,p′) → (PV′, anti-PV′)
    The potential vorticity increment PV′ is calculated from p′ and ψ′ by a finite difference calculation:
    equation image(4)
    α0, β0, γ0, ε0 and m0 are reference state values defined in Appendix A following Bannister and Cullen (2009), Section 5.1. These are all separable functions of r and ϕ. It is natural to define PV′ at the ψ′-points because it will behave like ψ′ on small horizontal scales. Therefore p′ must be interpolated to the ψ′-points before (4) can be applied. The anti-PV increment is calculated from p′ and ψ′ via finite differences from the linear unbalance equation in Bannister and Cullen (2007):
    equation image(5)
    where ρ0 = ρ0(r,ϕ) is a reference density value and f(ϕ) = −2ωcos(ϕ) is the Coriolis parameter calculated using the angular velocity ω = 7.292 × 10⁻⁵ rad/s of the Earth. The anti-PV increment is defined at the p′-points so ψ′ is interpolated before (5) is applied.
  • Step 3: (D′,PV′,anti-PV′) → (ψb′,pu′,χ′)
    This is the key step. Only balanced components have PV, and so the ‘balanced PV equation’ (1) is solved for the balanced component of the streamfunction, i.e. ψb′. We recall from Section 1 that in order to solve (1) we require the balanced pressure, pb′, which is found by solving the family of linear balance equations (2). Problem (1) is solved using periodic and polar boundary conditions respectively on the ϕ-λ plane, and Neumann boundary conditions at the top and bottom boundaries of the atmosphere. The solutions to the 2D Poisson solves in (2) on each ρ-level are unique only up to a constant. However, if this constant is fixed, then (1) has a unique solution despite the Neumann boundary conditions, thanks to the presence of the zeroth order term in the 3D problem. Note that the value of the constant on each ρ-level is irrelevant, since only derivatives of ψb′ are of relevance. ψb′ is found at the ψ′-points. To obtain pu′, we firstly find ψu′ by subtraction, i.e. ψu′ = ψ′ − ψb′. Then pu′ can be found at the p′-points by interpolating ψu′ and solving
    equation image(6)
    on each ρ-level. There is now an arbitrary constant in the definition of pu′. This has to be treated as a separate control variable for each model level so that all the degrees of freedom in the model variables are accounted for. Finally, we deduce from the Helmholtz decomposition that
    ∇h²χ′ = D′,(7)
    i.e. another 2D Poisson solve on each ρ-level to find χ′ at the p′-points.

This completes the T-transform. To recapitulate, we have to solve one 3D system, (1), and three sets of 2D problems, namely (3), (6) and (7), on each ρ-level.
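The Poisson solves in Steps 1 and 3 can be made concrete with a small self-contained experiment. The sketch below is a deliberate simplification: a doubly periodic plane stands in for the spherical grid, spectral inversion stands in for the per-level multigrid Poisson solves of equations (3) and (7), zeroing the mean mode plays the role of fixing the arbitrary constant discussed above, and all names are our own.

```python
import numpy as np

n = 64
x = np.linspace(0.0, 2.0*np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
k = np.fft.fftfreq(n, d=1.0/n)             # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing='ij')

def ddx(f): return np.real(np.fft.ifft2(1j*KX*np.fft.fft2(f)))
def ddy(f): return np.real(np.fft.ifft2(1j*KY*np.fft.fft2(f)))

def poisson_solve(f):
    """Solve lap(u) = f on the periodic plane; fix the constant by zero mean."""
    K2 = KX**2 + KY**2
    fh = np.fft.fft2(f)
    K2[0, 0] = 1.0                          # dummy value at the mean mode
    uh = -fh / K2
    uh[0, 0] = 0.0                          # the 'arbitrary constant' set to zero
    return np.real(np.fft.ifft2(uh))

# a synthetic wind increment built from known psi' and chi'
psi = np.sin(X) * np.cos(2.0*Y)
chi = np.cos(3.0*X) * np.sin(Y)
u = -ddy(psi) + ddx(chi)                    # rotational + divergent parts
v =  ddx(psi) + ddy(chi)

D    = ddx(u) + ddy(v)                      # horizontal divergence increment
zeta = ddx(v) - ddy(u)                      # vorticity increment

psi_rec = poisson_solve(zeta)               # analogue of the solve in (3)
chi_rec = poisson_solve(D)                  # analogue of the solve in (7)

err_psi = np.max(np.abs(psi_rec - psi))
err_chi = np.max(np.abs(chi_rec - chi))
```

Recovering ψ′ and χ′ from ζ′ and D′ to machine precision confirms that, on this simplified geometry, the decomposition and its inverse are consistent; in VAR the same solves are performed with finite volumes on the sphere.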

Figure 3.

Horizontal slices of the finite volume grid in latitude/longitude with the North pole at the top and the South pole at the bottom: (a) p′-points as cell centres and (b) ψ′-points as cell centres.

In order to test whether the full cycle of the PV-based CVT works, we must be certain that the 3D solver is accurate and that no errors are carried over between variables. This will not necessarily be the case if we use ψu′ = ψ′ − ψb′ in Step 3. To confirm the accuracy of the CVT transformation we find ψu′ directly with a separate 3D solve of the ‘unbalanced equation’

equation image(8)

This is of the same form as (1) with the same boundary conditions and with similar embedded 2D solves given in (6). It can be solved in the same way as (1) and we will do this when testing the accuracy of the complete PV-based CVT in Section 6. To use the PV-based CVT operationally this second 3D solve is not required.

Let us finally present the inverse problem, the U-transform

U : v′ = (ψb′,χ′,pu′) ↦ x′ = (u′,v′,p′),

which proceeds as follows to find the model variables x′ = (u′,v′,p′) given the control variables v′ = (ψb′,χ′,pu′):

  • Step 1: (ψb′,pu′) → (ψ′,p′)
    Given ψb′ we can find pb′ by interpolating ψb′ at the p′-points and solving
    equation image(9)
    on each ρ-level. This gives pb′ at the p′-points and so we find
    p′ = pb′ + pu′.
    Also, given pu′, we obtain ψu′ by interpolating pu′ at the ψ′-points and solving
    equation image(10)
    on each ρ-level. Again
    ψ′ = ψb′ + ψu′.
  • Step 2: (ψ′,χ′,p′) → (u′,v′,p′)
    The Helmholtz decomposition yields
    u′ = ∇rχ′ + k × ∇rψ′,
    which we use to obtain u′ and v′ at the edges of each cell using finite differences. u′ is set to zero at the poles.

This completes the U-transform. Here we had to solve two further sets of 2D problems, namely (9) and (10).

Note that both the 3D problems (1) and (8) are solved at the ψ′-points, but some of the 2D problems have to be solved at the p′-points. Thus, in order to carry out the CVT in practice, two grids are required: one whose cell centres are at the p′-points and another whose cell centres are at the ψ′-points (see Figure 3). Both p′- and ψ′-points are located on the ρ-levels, so interpolation between these points only requires averaging in the horizontal direction.

3. Finite volume discretisation

Each of the 2D and 3D elliptic problems from Section 2 is formulated in spherical polar coordinates and can be written in the general form

−∇ · (K(ξ)∇u(ξ)) + a(ξ) · ∇u(ξ) + c(ξ)u(ξ) = R(ξ),(11)

for all ξ = (r,ϕ,λ) ∈ Ω, where Ω is the usual domain of spherical polar coordinates defined in Section 2 with boundary ∂Ω. The coefficients of (11) are assumed to be

K(ξ) = diag(K1(ξ), K2(ξ), K3(ξ)) and a(ξ) = (a1(ξ), a2(ξ), a3(ξ))ᵀ,

with separable functions Ki(ξ), for i ∈ {1,2,3}. For the problems of interest in this paper, a first order term is only present with respect to the first (the radial) coordinate direction, so we typically have a2 = a3 = 0. The differential operators in (11) are the usual gradient and divergence operators in polar form, i.e.

∇u = (∂u/∂r, (1/r) ∂u/∂ϕ, (1/(r sin ϕ)) ∂u/∂λ)ᵀ and
∇ · v = (1/r²) ∂(r²v1)/∂r + (1/(r sin ϕ)) ∂(sin ϕ v2)/∂ϕ + (1/(r sin ϕ)) ∂v3/∂λ.

K is assumed to be positive definite almost everywhere (a.e.) in Ω, i.e. Ki(ξ) > 0 for all i ∈ {1,2,3} and almost all ξ ∈ Ω.

Let us now describe how to discretise (11) on Ω on the grids defined in Section 2. We must first subdivide Ω into cells known as control volumes.

For the grid with p′-points as cell centres, we subdivide Ω into nr × nϕ × nλ cells with cell centres

(ri,ϕj,λk), i = 1,…,nr, j = 1,…,nϕ, k = 1,…,nλ,

and edge lengths hλ = 2π/nλ, hϕ = π/(nϕ + 1) and hr,i, i = 1,…,nr, as well as 2 × nr cells at the poles with edge lengths 2π, π/(2nϕ + 2) and hr,i. We have ϕj = jhϕ, and the λk are uniformly spaced with spacing hλ. At the poles we use half cells, so that the poles themselves are located at the centres of the cells in the physical domain (in Euclidean coordinates), and so that a discrete equation can be derived at these points in the same fashion as at all the other points. In the radial direction, the mesh is graded as shown in Figure 2 with the cell centres located at the ρ-levels, and with the mesh widths hr,i increasing with i. Thus, the total number of unknowns (including the unknowns at the poles) is (nλ × nϕ + 2) × nr.

For the grid with ψ′-points as cell centres, there are nr × (nϕ + 1) × nλ unknowns, with none of the unknowns located at the poles.
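A hypothetical helper (not part of VAR) makes this bookkeeping concrete:

```python
import numpy as np

def pgrid_layout(n_lambda, n_phi, n_r):
    """Sizes and spacings for the grid with p'-points as cell centres."""
    h_lambda = 2.0*np.pi / n_lambda
    h_phi = np.pi / (n_phi + 1)
    phi = h_phi * np.arange(1, n_phi + 1)        # interior cell-centre colatitudes
    n_unknowns = (n_lambda * n_phi + 2) * n_r    # includes the 2 pole cells per level
    return h_lambda, h_phi, phi, n_unknowns

h_lam, h_phi, phi, n_p = pgrid_layout(n_lambda=8, n_phi=3, n_r=2)
n_psi = 2 * (3 + 1) * 8                          # psi'-grid: nr*(n_phi+1)*n_lambda, no pole unknowns
```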

As in Buckeridge and Scheichl (2010) we discretise (11) by means of the finite volume method, integrating the PDE over the control volume

Vijk = [ri−1/2, ri+1/2] × [ϕj−1/2, ϕj+1/2] × [λk−1/2, λk+1/2]

for each grid point (ri,ϕj,λk), where ri±1/2 = ri ± hr,i/2, ϕj±1/2 = ϕj ± hϕ/2 and λk±1/2 = λk ± hλ/2.

We impose periodic boundary conditions on the lateral boundary. In addition, for the upper and lower boundaries of the atmosphere (corresponding to r = a and r = a + d), we impose homogeneous Neumann boundary conditions, i.e.

∂u/∂r = 0 at r = a and r = a + d.

For the second order term, i.e. −∇ · (K(ξ)∇u(ξ)), after integrating over each control volume, we apply the divergence theorem and use central differences to approximate the derivatives across cell faces. The midpoint rule is then used to approximate the integrals, resulting in a matrix L. The dimension of the problem is n = (nϕ × nλ + 2) × nr on the p′-grid and n = nr × (nϕ + 1) × nλ on the ψ′-grid. The matrix L is represented by the following 7-point stencil at the interior nodes (i.e. nodes whose control volume does not intersect ∂Ω):

equation image(12)


equation image

Similar stencils are obtained at the lateral, as well as at the top and bottom boundaries. They are given in Buckeridge and Scheichl (2010), Chapter 3. The stencils for pole cells are larger, with nλ + 2 off-diagonal entries corresponding to the nλ neighbours in the λ-direction and the two neighbours in the radial direction (with simple modifications at the top and bottom of the atmosphere).

For the first order term, i.e. a(ξ) · ∇u(ξ), the average of the forward and backward differences is used to approximate the first order derivatives across each cell, and the midpoint rule to approximate the integral. This produces a matrix B which is represented by the following 7-point stencil

equation image(13)


equation image

Now, given that a2 = a3 = 0 for typical problems in VAR, we obtain again very similar stencils at all the boundaries, with only a slight variation at the upper and lower boundaries of the atmosphere. Note, however, that the matrix B is non-symmetric.

Finally, the integrals of the zeroth order term and the right-hand-side are simply approximated by the midpoint rule and lead to a matrix C and a vector R.

Combining the three matrices, i.e. A := L + B + C, the complete discretisation of (11) results in a system of linear equations

Au = R,(14)

for the vector of unknowns u, corresponding to the values of the solution u at the cell centres.

4. Solving elliptic systems in polar coordinates

In this section we describe an optimal solver for the elliptic problem (1) based on a non-uniform multigrid method devised in Buckeridge and Scheichl (2010).

Standard geometric multigrid methods for simple isotropic problems use full coarsening (i.e. coarsening in each coordinate direction) and point relaxation smoothers (see Briggs and McCormick (2000) for details), and the optimal convergence of this method has been proven both experimentally and theoretically (cf. Hackbusch, 1985; Trottenberg et al., 2001). There are several variants of the method, but here we focus only on the V-cycle.

It requires a sequence of matrices Aℓ, ℓ = 1,…,F, corresponding to the PDE (11) discretised on a sequence of grids, where usually the grid on level ℓ is a uniform refinement of the grid on level ℓ − 1. It requires a smoother Sℓ on each grid (which is commonly a simple relaxation scheme like Gauss−Seidel), as well as prolongation and restriction matrices Pℓ and Rℓ, e.g. linear interpolation and its transpose as the restriction. The number of pre- and post-smoothing steps is denoted by ν1 and ν2.

This standard method, however, is not robust for problems with anisotropy. Problem (11) discretised on the grid described in Section 2 contains two sources of anisotropy: one due to the large aspect ratio between the horizontal and vertical grid spacings; the second due to the accumulation of grid points near the pole in spherical polar type grids. Thus alternative ingredients are needed to solve it optimally. If the anisotropy has the convenient feature of being grid–aligned, i.e. aligned with the coordinate directions, two standard ways to retain optimality of multigrid are to use semi-coarsening and/or line relaxation. Line relaxation involves collectively relaxing all unknowns on an entire grid line by solving a tridiagonal system corresponding to the unknowns on that line. Semi-coarsening uses a family of coarse grids that are not coarsened equally in all coordinate directions, thus reducing the strength of the anisotropy on the coarser grids.
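To illustrate what line relaxation buys for grid-aligned anisotropy, the sketch below applies y-line Gauss−Seidel sweeps to the model problem −ε uxx − uyy = b (unit mesh width, zero Dirichlet boundary); each strongly coupled line is solved at once as a tridiagonal system. This is our own illustrative code, not the Met Office implementation.

```python
import numpy as np
from scipy.linalg import solve_banded

def apply_A(u, eps):
    """5-point stencil for -eps*u_xx - u_yy, unit mesh, zero Dirichlet BCs."""
    r = (2.0 + 2.0*eps) * u
    r[1:, :]  -= eps * u[:-1, :]
    r[:-1, :] -= eps * u[1:, :]
    r[:, 1:]  -= u[:, :-1]
    r[:, :-1] -= u[:, 1:]
    return r

def line_relax_sweep(u, b, eps):
    """One Gauss-Seidel sweep relaxing whole y-lines via tridiagonal solves."""
    nx, ny = u.shape
    ab = np.zeros((3, ny))                # banded storage for solve_banded
    ab[0, 1:] = -1.0                      # super-diagonal (y-coupling)
    ab[1, :] = 2.0 + 2.0*eps              # diagonal
    ab[2, :-1] = -1.0                     # sub-diagonal
    for i in range(nx):
        rhs = b[i].copy()
        if i > 0:
            rhs += eps * u[i-1]           # already-updated neighbour (Gauss-Seidel)
        if i < nx - 1:
            rhs += eps * u[i+1]
        u[i] = solve_banded((1, 1), ab, rhs)
    return u

rng = np.random.default_rng(0)
eps = 1e-3                                # strong coupling in y, weak in x
b = rng.standard_normal((16, 16))
u = np.zeros_like(b)
r0 = np.linalg.norm(b - apply_A(u, eps))
for _ in range(10):
    u = line_relax_sweep(u, b, eps)
reduction = np.linalg.norm(b - apply_A(u, eps)) / r0
```

For small ε the residual collapses within a handful of sweeps because the weakly coupled direction is the only part left to iterate on; pointwise relaxation on the same problem converges far more slowly.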

4.1. 2D Poisson equation in polar coordinates

Let us first consider two dimensions, where problems with grid aligned anisotropy can be written in the form

−∂/∂ϕ(Kϕϕ(ϕ)Kλϕ(λ) ∂u/∂ϕ) − ∂/∂λ(Kϕλ(ϕ)Kλλ(λ) ∂u/∂λ) = R(ϕ,λ),(15)

with Kϕϕ(ϕ),Kλϕ(λ),Kϕλ(ϕ),Kλλ(λ) being uniformly positive almost everywhere.

If we ignore for a moment the r-dependency in our problem (11) and restrict to the two-dimensional Poisson equation in spherical polar coordinates in each r-layer of the domain, we see that this is exactly of the type (15), i.e.

−∂/∂ϕ(sin ϕ ∂u/∂ϕ) − ∂/∂λ((1/sin ϕ) ∂u/∂λ) = R(ϕ,λ).(16)

An optimal multigrid method for (15) is to combine line relaxation in one coordinate direction with (semi) coarsening in the other direction. This gives optimal convergence rates that can also be proven theoretically (cf. Börm and Hiptmair, 2001). However, line relaxation is more expensive than simple point relaxation (such as the Gauss-Seidel method) and so it seems possible to improve the efficiency of this method. We will now use problem (16) to motivate the key idea of a multigrid method in two dimensions that avoids the use of line relaxation by using a conditional semi-coarsening strategy.

Assuming a quasi-uniform grid such that hλ ≈ hϕ, the finite volume discretisation of problem (16) on the grid introduced in Section 2 results in the following stencil at the interior nodes:

⎡        0          −(hλ/hϕ) sin ϕj+1/2           0        ⎤
⎢ −hϕ/(hλ sin ϕj)           Σj               −hϕ/(hλ sin ϕj) ⎥(17)
⎣        0          −(hλ/hϕ) sin ϕj−1/2           0        ⎦

where Σj = (hλ/hϕ)(sin ϕj+1/2 + sin ϕj−1/2) + 2hϕ/(hλ sin ϕj).

Since ϕj ∈ (0,π), we observe a strong anisotropy near the poles caused by the spherical polar grid, where sinϕj → 0. Thus, near the poles, the entries in the λ-direction are significantly larger than the entries in the ϕ-direction. In this case we say that there is a large connection in the λ-direction and a weak connection in the ϕ-direction (a notion coined in the algebraic multigrid literature). Near the equator, on the other hand, the problem is close to isotropic. So while semi-coarsening would be effective near the poles, it would not work near the equator. This motivates the key idea which we propose, i.e. to introduce a conditional semi-coarsening strategy that uses full uniform coarsening near the equator and semi-coarsening (in the λ-direction only) near the poles.

More specifically, we compare the ratio of the off-diagonal entries in the ϕ- and λ-directions at each line of latitude. We fully coarsen that line only if the ratio is sufficiently close to 1. We observe from (17) that on a uniform mesh with hλ ≈ hϕ the ratio is about sin²(ϕj). On subsequent grids this gets compensated by the factor (hλ/hϕ)². In practice, since 0 ≤ sin²(ϕj) ≤ 1, we fully coarsen only if (hλ/hϕ)² sin²(ϕj) is greater than a fixed threshold, whose value was determined to be optimal in numerical experiments. This gives an optimal method for solving (16) on the unit sphere, and a heuristic explanation of the optimality is given in Buckeridge and Scheichl (2010).
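The coarsening decision itself is cheap to express in code. In this sketch the threshold value is a placeholder parameter (the experimentally tuned constant from the paper is not reproduced here):

```python
import numpy as np

def full_coarsen_flags(phi, h_lambda, h_phi, threshold=0.5):
    """Flag each latitude line True for full coarsening, False for
    semi-coarsening in the lambda-direction only.  `threshold` is a
    hypothetical stand-in for the tuned constant."""
    return (h_lambda / h_phi)**2 * np.sin(phi)**2 > threshold

n_phi, n_lambda = 63, 128
h_phi = np.pi / (n_phi + 1)
h_lambda = 2.0*np.pi / n_lambda
phi = h_phi * np.arange(1, n_phi + 1)     # cell-centre colatitudes
flags = full_coarsen_flags(phi, h_lambda, h_phi)
```

Near the poles the lines are only semi-coarsened, while near the equator the grid is coarsened in both directions; the (hλ/hϕ)² factor automatically re-activates full coarsening on coarser levels once semi-coarsening has rebalanced the anisotropy.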

We confirm this claim with a simple test for the two-dimensional problem. The Poisson equation (16) on the unit sphere is solved using a standard multigrid V-cycle with pointwise Gauss−Seidel smoother combined with the conditional semi-coarsening described above. The stopping criterion is the relative reduction of the residual norm by a factor 10⁻⁸. We give the CPU times and the number of iterations Nits required for convergence. Table I shows that the time taken to solve (16) increases linearly with problem size, and that the number of iterations remains constant, which shows that the method is robust and performs optimally. Note that the coarsening factor from grid level to grid level is approximately 3.

Table I. Two-dimensional Poisson equation on the unit sphere solved using a multigrid method with conditional semi-coarsening. CPU time in seconds on a 2GHz 64-bit Intel Xeon E5462 processor with 2GB memory and 3MB Cache.
Problem size | Solve time (s) | Nits

The optimality of this method relies not only on the anisotropy in the problem being grid aligned, but also on the fact that the direction of anisotropy remains unchanged throughout the domain. In other words, the off-diagonal entry in the λ-direction, i.e. −hϕ/(hλ sin ϕj), has a larger magnitude than the off-diagonal entry in the ϕ-direction, i.e. −(hλ/hϕ) sin ϕj±1/2, for all ϕ ∈ (0,π) and λ ∈ [0,2π]. If the off-diagonal entry in the ϕ-direction were significantly larger than that in the λ-direction at any point in the domain, then the method would not be optimal any longer. In this case it would be necessary to resort to line relaxation combined with semi-coarsening, as outlined above.

A theoretical proof of the robustness of multigrid with conditional semi-coarsening is still missing. It seems amenable to standard proof techniques, e.g. in Hackbusch (1985), at least for the two-grid case. However, we were unable to locate such a proof in the literature, even for the simpler anisotropic model problem −∂²u/∂ϕ² − ε ∂²u/∂λ² = R with ε ≪ 1, with a point smoother and standard semi-coarsening.

4.2. 3D problems in polar coordinates

Let us now come back to the original 3D problem. The anisotropy introduced by the spherical polar grid is of exactly the same type as in two dimensions and so we adopt the same coarsening strategy in the ϕ-λ plane. However, there is a second source of anisotropy in three dimensions due to the large grid aspect ratio between the radial direction and the horizontal directions. In typical computations at the Met Office, the mesh widths hλ and hϕ (in radians) are O(10⁻²), whereas in the radial direction we have (in m)

equation image

This implies that hr,k ≪ rkhλ and hr,k ≪ rkhϕ, for all k.

Let us first consider (11) with ac ≡ 0, K1 ≈ 10−5K2 and K2 = K3. Note that equation image, for all i, and that near the equator sin(ϕ) = 𝒪(1). Thus, near the equator, the entries in stencil (12) at the top and bottom of the atmosphere have relative sizes

equation image

respectively. Hence the problem is anisotropic because of the size of the off-diagonal entries in the radial direction compared with those in the ϕ- and λ-directions. The off-diagonal entry in the radial direction is smallest near the top of the atmosphere and largest near the surface of the Earth. In contrast, near the poles, where sin(ϕ) = 𝒪(10−2) for common grid resolutions, the relative entries at the top and bottom of the atmosphere are

equation image

respectively. Hence the off-diagonal entries in the radial direction are much smaller than those in the λ-direction, in particular near the top of the atmosphere.

Thus, in our 3D problem the anisotropy switches between the r- and λ-directions. For this reason (as outlined at the end of Section 4.1) it is not possible to use a point smoother and to devise a simple conditional semi-coarsening strategy for 3D that also incorporates the r-direction. Instead, we resort to r-line relaxation and do not coarsen in the radial direction. To deal with the anisotropy introduced by the spherical polar coordinates, we use the conditional semi-coarsening strategy described in Section 4.1 above in the ϕλ plane. The r-line relaxation takes care of the strong coupling in the r-direction, whilst we already know from the 2D experiments that conditional semi-coarsening takes care of the anisotropy in the ϕλ plane, leading to optimal convergence rates for the resulting method. An outline of the theoretical proof, first given in Buckeridge and Scheichl (2010), that links the optimality of this method in 3D directly to the optimality of conditional semi-coarsening with point smoother in 2D is given in Appendix B. The proof is based on a separation of the r-dependence from the ϕλ-dependence, using a tensor product analysis as suggested for two dimensions by Börm and Hiptmair (2001).
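The r-line smoother at the heart of this strategy amounts to a block Gauss−Seidel sweep in which every vertical column of unknowns is solved exactly by a tridiagonal (Thomas) solve. The following Python sketch illustrates this for a simple model problem −u_rr − ε u_xx = f with zero Dirichlet data; it is a stand-in for, not the discretisation of, the operator (11).

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal."""
    n = len(d)
    b = b.astype(float).copy()
    d = d.astype(float).copy()
    for i in range(1, n):                     # forward elimination
        w = a[i - 1] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    u = np.empty(n)
    u[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        u[i] = (d[i] - c[i] * u[i + 1]) / b[i]
    return u

def r_line_sweep(u, f, eps, h):
    """One lexicographic sweep of line Gauss-Seidel for the model
    problem -u_rr - eps*u_xx = f with zero boundary data: each
    vertical line (first index = r) is solved exactly, so the strong
    coupling in the r-direction never limits the smoother."""
    nr, nx = u.shape
    sub = -np.ones(nr - 3)
    diag = (2.0 + 2.0 * eps) * np.ones(nr - 2)
    sup = -np.ones(nr - 3)
    for j in range(1, nx - 1):
        # neighbouring lines enter the right-hand side (Gauss-Seidel)
        rhs = h * h * f[1:-1, j] + eps * (u[1:-1, j - 1] + u[1:-1, j + 1])
        u[1:-1, j] = thomas(sub, diag, sup, rhs)
    return u
```

Because each line is solved exactly, the smoothing factor of this sweep is insensitive to how strongly the r-direction is coupled, which is exactly the property required here.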

Let us confirm this numerically and consider (11) with the first and zeroth order terms included. The presence of the zeroth-order term means that (11) is a Helmholtz-type problem. For typical problems at the Met Office, c(ξ) > 0. Thus the zeroth-order term is uniformly positive, which improves the conditioning of the system. In contrast, the presence of a first-order term is usually detrimental to the conditioning of a system, because we see from stencil (13) that it intrinsically yields a non-symmetric operator. However, typically at the Met Office a2 = a3 = 0, and so the r-line relaxation will in fact ‘eliminate’ the first-order term, so that we expect to achieve optimal convergence with our multigrid algorithm despite the presence of the non-symmetric first-order term. We will now confirm this claim by testing the method on a particular 3D problem that will be used later to precondition (1).

In order to simplify (1) we set

equation image(18)

as if f(ϕ) and ρ0(r,ϕ) were constant in the linear balance equation (2). It is no longer necessary to solve (2) in that case and we can substitute (18) into (1) to obtain the preconditioning system

equation image(19)

As stated in the introduction, for small mesh sizes this system has essentially the same spectrum as the original system (1), and is thus a good preconditioner. The reason is that, in order to arrive at (18), we have only introduced some artificial low-order terms in (2). To be more precise, expand

equation image(20)

Therefore, pb′ in (18) satisfies a linear balance equation similar to (2), except for some additional low-order terms in ψb′. The highest-order term in (20) is the same as the right-hand side of (2), and so for small mesh sizes their discretisations will have similar spectra.

Since the coefficients in (19) are separable, we have equation image and equation image. Therefore, we can divide (19) by equation image, such that it takes the general form (11) with

equation image

If β0 = γ0 = 0, then (19) takes a very similar form to the QG-Ω equation studied in Buckeridge and Scheichl (2010).

We now test the multigrid method proposed for (19). We employ the standard V-cycle with linear interpolation and full-weighting restriction, with an r-line relaxation smoother (block Gauss−Seidel). We employ no coarsening in the radial direction and conditional semi-coarsening (as described above) in the ϕλ plane. For the solution on the coarsest grid we use line relaxation again and iterate until the relative residual on that grid is reduced by a factor 10−2. The stopping criterion on the original (finest) grid is a relative residual reduction by a factor 10−4. We use the same Intel Xeon processor as in Section 4.1. Table II shows that the CPU time required by the solver increases linearly with problem size and that the number of iterations remains constant for all problem sizes, indicating the robustness of the method for solving the preconditioning system (19).

Table II. Solving the preconditioning system (19) using multigrid with conditional semi-coarsening and r-line relaxation. CPU time in seconds on a 2GHz Intel Xeon processor.
Problem size        Solve time (s)        Nits

5. An optimal solver for the balanced PV-equation

A novel and optimal solver for the balanced PV-equation can now be devised on the basis of a Krylov subspace method for (1) by preconditioning each iteration with a multigrid solve of the simplified problem (19). System (1) is non-symmetric due to the first-order term, which means that the standard conjugate gradient method is not applicable, but we can use other Krylov subspace methods suitable for non-symmetric systems, such as the generalised conjugate residual (GCR) method, cf. Eisenstat et al. (1983), or the stabilised biconjugate gradient (Bi-CGSTAB) method, cf. van der Vorst (1992). See Appendix C for some background on Krylov subspace methods.

In addition to a preconditioner, Krylov subspace methods also require vector operations and the application of the discretised system operator. In the context of (1) this involves a sequence of finite volume calculations of the zeroth, first and second order terms, as well as the inversion of the 2D linear balance equation (2) to compute pb′ on each ρ-level. The preconditioned Bi-CGSTAB method, which we will use, requires two applications of the operator and two solves of the preconditioning system at each iteration. We solve the preconditioning system to within a relative residual tolerance of 10−4, which typically takes four multigrid iterations, as shown in Table II. Experiments showed that stricter tolerances do not influence the overall number of iterations required for the Bi-CGSTAB method to converge.
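The structure of the resulting solver, a Krylov method for the full non-symmetric operator preconditioned by a solve with a simplified operator, can be illustrated with SciPy's Bi-CGSTAB on a small model problem. Here a 1D convection−diffusion matrix stands in for the discretisation of (1), and an exact solve with its symmetric (diffusion) part stands in for the multigrid V-cycle on (19); all sizes and coefficients are illustrative choices, not values from the paper.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 1-D convection-diffusion matrix standing in for the non-symmetric
# discretised operator; n, h and the convection strength are arbitrary.
n = 50
h = 1.0 / (n + 1)
diffusion = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n),
                      -np.ones(n - 1)], [-1, 0, 1]) / h**2
convection = sp.diags([-np.ones(n - 1), np.ones(n - 1)], [-1, 1]) / (2 * h)
A = (diffusion + 10.0 * convection).tocsc()   # non-symmetric system
b = np.ones(n)

# Preconditioner: an exact solve with the symmetric (diffusion) part,
# standing in for a multigrid V-cycle on the simplified operator.
solve_P = spla.factorized(diffusion.tocsc())
M = spla.LinearOperator((n, n), matvec=solve_P)

its = []
u, info = spla.bicgstab(A, b, M=M, callback=lambda xk: its.append(1))
print("converged:", info == 0, "after", len(its), "iterations")
```

Because the preconditioner captures the dominant (second-order) part of the operator, the preconditioned iteration converges in a handful of steps, mirroring the behaviour reported in Table III.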

Table III gives the total number of Bi-CGSTAB iterations required to reduce the relative residual in (1) by a factor 10−4, together with the total CPU time required. The number of iterations does not increase with problem size. In fact, we observe that the number of iterations decreases initially. The reason for this is that the second-order term in the finite volume discretisation of (20) is less dominant for larger mesh sizes, and so the preconditioner is less effective at clustering the spectrum. Once we are in the asymptotic regime the method converges in a constant number of iterations, and consequently the CPU time increases approximately linearly with problem size. This shows that the method performs optimally. A similar behaviour can be expected for the GCR method (cf. Eisenstat et al., 1983) with the multigrid preconditioner.

Table III. Number of iterations and total CPU time (in seconds) for solving (1) using Bi-CGSTAB preconditioned with multigrid.
Problem size        Nits        CPU time

In comparison, the GCR method with the current preconditioner used at the Met Office, an alternating direction implicit (ADI) preconditioner (see Appendix C for more details), with otherwise identical components converges extremely slowly. Typical convergence factors are greater than 0.99 and the Krylov method stalls after reaching a relative residual tolerance of only 0.1 for most typical grid resolutions.

To finish this section, let us demonstrate the performance of our method for solving the balanced PV-equation (1) within the developmental version of the VAR code at the Met Office with PV-based CVT. The only difference from the model described above is that the radial positions of the mesh points depend on ϕ and λ, to take the orography into account. Table IV gives the results for solving (1) on two different grids used for test problems in VAR: the N48 grid with a resolution of 96 × 73 × 70, and the N108 grid with a resolution of 216 × 163 × 70. For both test problems the relative residual tolerance was 10−2, measured in the infinity norm. We compare Bi-CGSTAB preconditioned with our novel multigrid method and GCR preconditioned with ADI. On both grids, the Krylov method with the ADI preconditioner performs very poorly compared to that preconditioned with multigrid. On the N108 grid it stalls after reaching a relative residual of about 0.1.

Table IV. Solving (1) within the VAR code on the N48 grid (above) and on the N108 grid (below). Comparing Bi-CGSTAB preconditioned with multigrid and GCR preconditioned with ADI. CPU time in seconds.

                          Bi-CGSTAB + multigrid    GCR + ADI
N48 grid:
  Iterations              15                       176
  CPU time / iteration    4.22                     1.99
  CPU time (incl. setup)  68.14                    350
N108 grid:
  Iterations              11                       Stalled after 30
  CPU time / iteration    66.1                     16.4
  CPU time (incl. setup)  765                      N/A

6. Accuracy of the CVT

Now that we can solve (1) to any level of accuracy, it is possible to implement the complete set of transformations. Given values for a set of control variables v′, we test the accuracy of the transformations by applying the U-transform followed by the T-transform, and compare the final values of the control variables after the full cycle of the CVT with the original values. In other words, we determine how close TU is to the identity operator I.

The values we choose for the control variables for our test are

equation image

Table V shows the results of the CVT, where error(X) denotes the relative difference between the initial value Xinit of the control variable X and its final value Xfinal after the full cycle of the transformations, measured in the Euclidean norm, i.e. error(X)=∥Xfinal − Xinit∥2/∥Xinit2. The results indicate that the CVT is implemented accurately, with the error between the initial and final values of each control variable converging slightly faster than linearly with respect to the mesh width. Note that this is only the expected rate of convergence of the errors, since the interpolation and discretisation operators used are only second-order accurate with respect to the mesh width.
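The error measure used in this test, and the observed convergence order between successive grid refinements, can be computed with two small helpers; the function names are ours and serve illustration only.

```python
import numpy as np

def relative_error(x_final, x_init):
    """error(X) = ||X_final - X_init||_2 / ||X_init||_2, as in Table V."""
    return np.linalg.norm(x_final - x_init) / np.linalg.norm(x_init)

def observed_order(errors, ratio=2.0):
    """Observed convergence order between successive grid refinements,
    assuming the mesh width shrinks by `ratio` at each refinement."""
    e = np.asarray(errors, dtype=float)
    return np.log(e[:-1] / e[1:]) / np.log(ratio)
```

An observed order slightly above 1 in `observed_order` corresponds to the slightly-faster-than-linear convergence reported above.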

Table V. The magnitude of errors, measured using the Euclidean norm, in the PV-based CVT for each control variable.
Problem size        error(ψb′)        error(pu′)        error(χ′)

7. Parallel results

The parallelisation of the method is described in detail in Buckeridge and Scheichl (2010). A ghost point (or halo) strategy is used for communication between processors. It is implemented using the message passing interface (MPI). To partition the domain we subdivide it in the latitudinal and longitudinal directions, as currently done in VAR, to ensure an efficient application of the r-line smoother (without communication), which is the most costly component of the multigrid method. Any number of subdivisions in the latitudinal and longitudinal directions is admissible.

A commonly used test for how well the implementation performs on large numbers of processors is a weak scalability test, where the problem size per processor is kept fixed as the number of processors is increased. A method has optimal parallel scalability if the CPU time in this test remains constant as the number of processors is increased. In Buckeridge and Scheichl (2010), we tested the parallel, non-uniform multigrid code for a general form of the QG-Ω equation on a cluster of 2GHz 64-bit Intel Xeon E5462 processors connected via an Infinipath network (Aquila, University of Bath). We observed close to optimal parallel scalability up to 256 processors, as shown in Figure 4. The weak scalability for the preconditioning system (19) is identical.
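The quantity plotted in such a test is the scaled (weak) efficiency, simply the single-processor time divided by the P-processor time at fixed per-processor problem size. A minimal sketch, with placeholder timings that are not the measurements behind Figure 4:

```python
def scaled_efficiency(times):
    """Scaled (weak) efficiency E(P) = T(1) / T(P), where T(P) is the
    wall-clock time on P processors with the problem size per
    processor held fixed."""
    t1 = times[1]
    return {p: t1 / t for p, t in times.items()}

# placeholder timings (illustrative only, NOT the Figure 4 data)
times = {1: 100.0, 4: 104.0, 16: 110.0, 64: 118.0, 256: 125.0}
eff = scaled_efficiency(times)
```

An efficiency that stays close to 1 as P grows is what "close to optimal parallel scalability" means in this context.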

Figure 4.

Scaled efficiency (weak scalability) on Aquila (University of Bath), solving the QG-Ω equation (similar to (19)) with parallel multigrid. Problem size 192 × 120 × 50 on each processor.

8. Conclusions

In this paper, we are for the first time able to carry out a PV-based control variable transformation for finite volume discretisations of atmospheric flow with operationally realistic grid resolutions. The proposed numerical method, which is based on a preconditioned Krylov method and a novel multigrid preconditioner, is very efficient, fully robust to grid refinement and shows optimal parallel scalability up to 256 processors. The method could also be used for other three-dimensional elliptic problems that arise in atmospheric data assimilation or modelling. An example would be a general form of the QG-Ω equation solved by Buckeridge and Scheichl (2010).


A. Definitions of variables used in PV equation

The coefficients in the balanced PV-equation (1) are:

equation image

where R, cp and g are the gas constant, the specific heat capacity and the gravitational acceleration, respectively. κ is a dimensionless constant given by equation image. The variables θ, Π and p denote potential temperature, Exner pressure and pressure, respectively. The ‘0’ subscripts denote reference-state quantities that are averaged zonally, such that they become functions of latitude and radius only. Variables with a ‘hat’ denote quantities that have been vertically interpolated onto an intermediate level.

B. Theoretical considerations

In this section, we outline the theory developed in Buckeridge and Scheichl (2010), Section 5.4. It establishes a rigorous link between the robustness of the non-uniform multigrid method for 3D problems of the form (11) with r-line smoother and no radial coarsening discussed in Section 4.2 and the robustness of the corresponding non-uniform multigrid method for 2D elliptic problems with point smoother discussed in Section 4.1. It exploits the tensor product structure of the problem and of the grids to extend the 2D theory in Börm and Hiptmair (2001) to three dimensions. The rigorous proof relies on a discretisation with trilinear finite elements, but using quadrature, together with an estimate of the quadrature error based on a theoretical result sometimes referred to as ‘Strang's 2nd Lemma’ (cf. Ciarlet (1978)), the results apply also to the above finite volume discretisation of (11) on the same mesh.

Because of the tensor product structure, we can expand the r-dependent part of the finite element approximation of the solution u of (11) in an eigenbasis, leading to a natural splitting into nr subspaces (similar to spectral methods). Since we do not coarsen in the r-direction, we can expand in the same basis on all of the grids, leading to a nested sequence of subspaces for each eigenvector. This block-diagonalises the system matrix on each grid into a family of 2D problems in the ϕλ plane. Since we employ r-line relaxation, it also block-diagonalises the smoother, leading to the following theorem (see Buckeridge and Scheichl (2010) for a more precise statement and a complete proof).

Theorem. Using r-line smoothing and no coarsening in the radial direction, a tensor product multigrid method for finite element approximations of (11) converges uniformly if, and only if, the corresponding 2D multigrid method with point smoother on the ϕλ plane converges uniformly.

We have seen experimentally in Section 4.1 that the 2D multigrid method with point smoother and conditional semi-coarsening in the ϕλ plane converges uniformly. A rigorous proof of this uniform convergence of the 2D method is still lacking, but Buckeridge and Scheichl (2010) contains some convincing heuristic explanations. This suggests that the conditions of the above theorem are satisfied, thus guaranteeing uniform convergence of the 3D multigrid method with conditional semi-coarsening in the ϕλ plane as well.

C. Iterative methods for linear systems of equations

Consider the linear system

equation image(21)

to be solved for equation image, given equation image and equation image. Let u(0) be an initial guess to the solution.

  • Krylov subspace methods. A Krylov subspace method is an iterative method for (21) that seeks, at the mth iteration, an approximation u(m) to u in the space u(0) + 𝒦m(A,r(0)), where

    𝒦m(A,r(0)) = span{r(0), Ar(0), A2r(0), …, Am−1r(0)},

    and r(m) = b − Au(m) is the mth residual. The particular iterate u(m) is chosen such that

    b − Au(m) ⟂ ℒm,

    for some (usually different) m-dimensional subspace ℒm, where ⟂ denotes orthogonality. The subspace 𝒦m(A,r(0)) is known as the Krylov subspace. There are many different Krylov subspace methods, each arising from a different choice of ℒm (see Saad, 2003, Chapters 6 and 7, for details).
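For a concrete illustration, an orthonormal basis of 𝒦m(A,r(0)) can be built with the Arnoldi process (repeated multiplication by A followed by Gram−Schmidt orthogonalisation). The matrix and vector below are random test data, used for illustration only.

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Orthonormal basis of K_m(A, r0) = span{r0, A r0, ..., A^(m-1) r0},
    built with the Arnoldi process (modified Gram-Schmidt)."""
    n = len(r0)
    Q = np.zeros((n, m))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(1, m):
        w = A @ Q[:, k - 1]
        for j in range(k):                 # orthogonalise against K_k
            w -= (Q[:, j] @ w) * Q[:, j]
        Q[:, k] = w / np.linalg.norm(w)
    return Q

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 6 * np.eye(6)   # generic non-symmetric
r0 = rng.standard_normal(6)
Q = krylov_basis(A, r0, 4)
```

A Krylov subspace method then picks its iterate from the affine space u(0) plus the span of these basis vectors.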

  • Conjugate gradient method. This is a Krylov subspace method for symmetric positive definite (SPD) systems. It arises from the choice ℒm = 𝒦m(A,r(0)). The conjugate gradient (CG) method is the best known Krylov subspace method and one of the most popular iterative techniques for SPD systems.

  • Biconjugate gradient stabilised (Bi-CGSTAB). This is a Krylov subspace method for general, non-symmetric linear systems, where ℒm = 𝒦m(AT, r̃(0)) for some initial ‘shadow’ residual r̃(0). The space ℒm is associated with the dual system involving AT. Bi-CGSTAB is a stabilised version of the biconjugate gradient (Bi-CG) method, proposed in van der Vorst (1992). It avoids the use of AT and achieves fast convergence rates for non-symmetric linear systems. However, it does require two applications of A and two applications of the preconditioner per iteration.

  • Generalised conjugate residual (GCR). This is an alternative Krylov subspace method for non-symmetric linear systems, currently used operationally at the Met Office (see Davies et al. (2005) for details). Its convergence is usually slightly slower than that of Bi-CGSTAB, but it requires only one application of A and one application of the preconditioner per iteration. It is mathematically equivalent to another popular Krylov subspace method, the generalised minimal residual (GMRES) method.

  • Preconditioning. Krylov subspace methods are usually applied in conjunction with a preconditioner since their convergence depends on the distribution of the spectrum of A. A preconditioner is a matrix P, such that P−1A has a more clustered spectrum than A and such that P−1 is relatively cheap to apply. The Krylov subspace method is then applied to one of the two systems (P−1A)u = P−1b or (AP−1)(Pu) = b.
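The effect of a good preconditioner on the spectrum can be seen on a small model problem: preconditioning a variable-coefficient 1D diffusion matrix A by its constant-coefficient simplification P (a toy analogue of preconditioning (1) by (19)) collapses the condition number from O(h−2) to O(1). The coefficient profile and the grid size below are arbitrary illustrative choices.

```python
import numpy as np

n = 100
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
k = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)   # smooth, positive coefficient

def diffusion_matrix(coeff):
    """Symmetric finite difference matrix for -(k u')' with zero
    boundary values; k is averaged at the cell interfaces."""
    A = np.zeros((n, n))
    for i in range(n):
        kl = 0.5 * (coeff[i - 1] + coeff[i]) if i > 0 else coeff[i]
        kr = 0.5 * (coeff[i] + coeff[i + 1]) if i < n - 1 else coeff[i]
        A[i, i] = (kl + kr) / h**2
        if i > 0:
            A[i, i - 1] = -kl / h**2
        if i < n - 1:
            A[i, i + 1] = -kr / h**2
    return A

A = diffusion_matrix(k)           # "full" variable-coefficient operator
P = diffusion_matrix(np.ones(n))  # simplified, constant-coefficient operator
ev_A = np.sort(np.linalg.eigvalsh(A))
ev_PA = np.sort(np.linalg.eigvals(np.linalg.solve(P, A)).real)
```

Here the eigenvalues of P−1A all lie in a narrow band determined by the range of the coefficient, whereas those of A spread over several orders of magnitude.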

  • Relaxation methods. Relaxation methods, such as the Jacobi or the Gauss-Seidel method, are the simplest iterative methods for linear systems of equations. They are usually very slow to converge, especially for large systems. However, they are often used as preconditioners in conjunction with Krylov subspace methods. Block variants, such as line relaxation, are very popular for highly anisotropic linear systems, such as the elliptic problems arising in 3D atmospheric flow.
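A pointwise Gauss−Seidel sweep, the basic relaxation used as a smoother in this paper, can be written in a few lines; the dense-matrix form below is for illustration only.

```python
import numpy as np

def gauss_seidel_sweep(A, u, b):
    """One lexicographic Gauss-Seidel sweep for A u = b: each unknown is
    updated in turn, immediately using the latest values of the others."""
    n = len(b)
    for i in range(n):
        s = A[i, :i] @ u[:i] + A[i, i + 1:] @ u[i + 1:]
        u[i] = (b[i] - s) / A[i, i]
    return u
```

Repeated sweeps converge (slowly for large systems), but a single sweep already damps the high-frequency error components, which is what multigrid exploits.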

  • Alternating direction implicit (ADI) method. This is an iterative method, or a preconditioner, built on line relaxation. It alternates the direction of the line relaxations to achieve better preconditioning in cases where the anisotropy is not grid-aligned and/or varies throughout the domain. As described in Davies et al. (2005), it is the preconditioner currently used in the Met Office model.

  • Two-grid methods. The basic strategy for two grids is summarised as follows (cf. Briggs and McCormick, 2000; Trottenberg et al., 2001):

    • 1. Apply a simple iterative method, a smoother, to an initial approximation to eliminate the high-frequency components in the error.
    • 2. Restrict the residual onto a coarser grid and solve the residual equation directly or iteratively on this grid to obtain a coarse-grid correction for the error.
    • 3. Interpolate the coarse-grid correction back onto the original (fine) grid and update the approximation to the solution.
    • 4. Repeat steps 1–3 until a desired convergence criterion is satisfied.
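Steps 1−3 above can be sketched for the 1D Poisson equation −u″ = f with a weighted-Jacobi smoother and a direct coarse solve. This is purely illustrative: the solver in this paper uses r-line relaxation and conditional semi-coarsening instead of the point smoother and full coarsening shown here.

```python
import numpy as np

def two_grid_poisson(u, f, h, nu=2):
    """One two-grid cycle for -u'' = f on a uniform 1-D grid with zero
    boundary values (steps 1-3 of the list above); the coarse problem
    is solved directly. The grid must have an odd number of points."""
    n = len(u)
    # 1. pre-smoothing: nu sweeps of weighted Jacobi (omega = 2/3)
    for _ in range(nu):
        u[1:-1] = u[1:-1] / 3.0 + (2.0 / 3.0) * 0.5 * (
            u[:-2] + u[2:] + h * h * f[1:-1])
    # residual of the fine-grid equation
    r = np.zeros(n)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    # 2. full-weighting restriction onto every other grid point
    rc = np.zeros((n - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    # direct coarse solve with the tridiagonal Poisson matrix (mesh 2h)
    nc = len(rc) - 2
    Ac = (2.0 * np.eye(nc) - np.eye(nc, k=1) - np.eye(nc, k=-1)) / (2 * h)**2
    ec = np.zeros(len(rc))
    ec[1:-1] = np.linalg.solve(Ac, rc[1:-1])
    # 3. linear interpolation of the coarse-grid correction and update
    e = np.zeros(n)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return u + e
```

Repeating the cycle (step 4) reduces the error by a mesh-independent factor per iteration, which is the defining property of a multigrid method.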
  • Multigrid V-cycle. Recursively applying the above two-grid strategy on a sequence of coarse grids leads to the multigrid method. The V-cycle is the cheapest version, where each grid is only visited once per iteration. On the coarsest grid, the residual equation is solved either directly or by applying several iterations of the smoother. Multigrid methods are also very effective preconditioners for Krylov subspace methods. Detailed descriptions of the multigrid V-cycle are given in Briggs and McCormick (2000) and Trottenberg et al. (2001), whilst a rigorous convergence analysis of the method is found in Hackbusch (1985).

  • * The notation for the 7-point stencil is taken from Barros (1991). Values in square brackets give the 5-point stencil in the ϕλ plane. The numbers outside the brackets denote the off-diagonal entries corresponding to the two neighbours in the radial direction. equation image denotes the sum of all the off-diagonal entries in the stencil.