On the diffusion equation and its application to isotropic and anisotropic correlation modelling in variational assimilation

Authors


Abstract

Differential operators derived from the explicit or implicit solution of a diffusion equation are widely used for modelling background-error correlations in geophysical applications of variational data assimilation. Key theoretical results underpinning the diffusion method are reviewed. Solutions to the isotropic diffusion problem on both the spherical space $\mathbb{S}^2$ and the d-dimensional Euclidean space $\mathbb{R}^d$ are considered first. In $\mathbb{R}^d$ the correlation functions implied by explicit diffusion are approximately Gaussian, whereas those implied by implicit diffusion belong to the larger class of Matérn functions which contains the Gaussian function as a limiting case. The Daley length-scale, defined as $D = \sqrt{-d/\nabla^2 c|_{r=0}}$ where $\nabla^2$ is the d-dimensional Laplacian operator and $r = |\mathbf{r}|$ is Euclidean distance, is used as a standard parameter for comparing the different isotropic functions c(r). Diffusion on $\mathbb{S}^2$ is shown to be well approximated by diffusion on $\mathbb{R}^2$ for length-scales of interest. As a result, fundamental parameters that define the correlation model on $\mathbb{S}^2$ can be specified using more convenient expressions available on $\mathbb{R}^2$.

Anisotropic Gaussian or Matérn correlation functions on $\mathbb{R}^d$ can be represented by a diffusion operator furnished with a symmetric and positive-definite diffusion tensor. For anisotropic functions $c(\mathbf{r})$, the tensor $-(\nabla\nabla^{\mathrm{T}} c|_{\mathbf{r}=\mathbf{0}})^{-1}$, where ∇ is the d-dimensional gradient operator, is a natural generalization of the square of the Daley length-scale for characterizing the spatial scales of the function. Relationships between this tensor, which we call the Daley tensor, and the diffusion tensor of the explicit and implicit diffusion operators are established. Methods to estimate the elements of the local Daley tensor from a sample of simulated background errors are presented and compared in an idealized experiment with spatially varying covariance parameters. Since the number of independent parameters needed to specify the local diffusion tensor is of the order of the total number of grid points N, sampling errors are inherently much smaller than those involved in the order $N^2$ estimation problem of the full correlation function. While the correlation models presented in this paper are general, the discussion is slanted to their application to background-error correlation modelling in ocean data assimilation. Copyright © 2012 Royal Meteorological Society

1. Introduction

Various methods have been proposed for modelling background-error correlations in geophysical applications of variational data assimilation (VDA) (see Bannister, 2008, for example, for a thorough review of methods used in atmospheric VDA). In ocean VDA, background-error correlation models based on the diffusion equation are popular. The method has its origins in the work of Derber and Rosati (1989), who proposed the use of an iterative Laplacian grid-point filter in order to approximate a Gaussian correlation operator. Egbert et al. (1994) described a close variant of the algorithm in which the Laplacian grid-point filter could be interpreted as a pseudo-time-step integration of a diffusion equation with an explicit scheme. Weaver and Courtier (2001) (hereafter WC01) described the algorithm in more detail and proposed various extensions to account for more general correlation functions than the quasi-Gaussian of the original Derber and Rosati (1989) algorithm. Correlation models based on explicit diffusion methods have been used in various VDA systems in oceanography (Weaver et al., 2003; Di Lorenzo et al., 2007; Muccino et al., 2008; Daget et al., 2009; Kurapov et al., 2009; Moore et al., 2011), meteorology (Bennett et al., 1996), and atmospheric chemistry (Geer et al., 2006; Elbern et al., 2010).

An explicit diffusion scheme is appealing because of its simplicity, but can be expensive if many iterations are required to keep the scheme numerically stable. This can occur when the local diffusion scale is ‘large’ relative to the local grid size. To keep the explicit scheme affordable, the correlation length-scales must be bounded even if statistics or physical considerations suggest that larger values would be more appropriate. This limitation can be overcome by reformulating the diffusion model using an implicit scheme which has the advantage of being unconditionally stable.

One-dimensional (1D) implicit diffusion operators have been used for representing temporal and vertical correlation functions (Bennett et al., 1997; Chua and Bennett, 2001; Ngodock, 2005) and products of 1D implicit diffusion operators have been used for constructing two-dimensional (2D) and three-dimensional (3D) correlation models (Chua and Bennett, 2001; Zaron et al., 2009). The correlation kernels associated with the 1D implicit diffusion operator belong to the family of Mth-order autoregressive (AR) functions where M is the number of implicit iterations (Mirouze and Weaver, 2010; hereafter MW10). As discussed by MW10, the 1D implicit diffusion operator is closely linked to the recursive filter (Lorenc, 1992; Hayden and Purser, 1995), which has been developed extensively in meteorology for constructing correlation models in multiple dimensions (Wu et al., 2002; Purser et al., 2003a, 2003b; Liu et al., 2007). The recursive filter has also been employed in ocean data assimilation systems (Martin et al., 2007; Dobricic and Pinardi, 2008; Liu et al., 2009).

The 1D implicit diffusion approach for constructing 2D and 3D correlation models can be convenient for computational reasons, but has limitations. For example, with few iterations, the product of 1D implicit diffusion operators produces a well-known spurious anisotropic response (Purser et al., 2003a). Unphysical features can also appear near complex boundaries, such as coastlines or islands in an ocean model, where correlation functions cannot always be reasonably represented by a product of separable functions of the model's coordinates. Correlation models based on 2D or 3D implicit diffusion operators can overcome these limitations but are more difficult to implement since they involve the solution of a large linear system (matrices of dimension equation image or larger in VDA). Some progress in the development of this approach has been made by Weaver and Ricci (2004) and Massart et al.(2012), who used sparse matrix methods to solve a 2D implicit diffusion problem directly, and by Carrier and Ngodock (2010) and S. Gratton (2011, personal communication), who used iterative methods based on conjugate gradient or multi-grid to approximate the solution of a 2D or 3D implicit diffusion problem.

Multidimensional implicit diffusion correlation operators can be interpreted in terms of smoothing norm splines, which were introduced to atmospheric data assimilation by Wahba and Wendelberger (1982) and Wahba (1982), and discussed within an oceanographic context by McIntosh (1990). In the norm spline approach, the background term of the cost function in VDA is formulated in terms of a linear combination of weighted derivative operators that penalize explicitly the amplitude and curvature of the solution. When the weighting coefficients are given by binomial coefficients, the inverse of the background-error correlation operator implied by the norm spline can be expressed as the inverse of an implicit diffusion operator. The direct penalty approach was popular in some of the early studies of four-dimensional VDA (Thacker, 1988; Sheinbaum and Anderson, 1990) but generally leads to a poorly conditioned minimization problem (Lorenc et al., 2000). Effective preconditioning techniques for VDA require access to the background-error covariance operator itself. An interesting exception is the recent study of Yaremchuk et al. (2011), who propose a variational formulation in which the inverse of the background-error covariance is modelled directly using the inverse of a low-order (two-iteration) 3D implicit diffusion operator. No apparent conditioning problems were reported in their examples from an ocean VDA system.

The present paper has a dual purpose: first, to provide a review of the diffusion equation as a basis for constructing anisotropic and inhomogeneous correlation models for data assimilation; and second, to illustrate how fundamental parameters that control spatial smoothness properties of these models can be estimated using ensemble methods. Section 2 brings together key results from data assimilation and geostatistics on the isotropic diffusion problem. Diffusion is considered both on the sphere and in the d-dimensional Euclidean space. Analytical expressions for the isotropic correlation functions implied by appropriately normalized explicit and implicit diffusion in these spaces are presented and compared. The Daley length-scale is used as a standard parameter for comparing the different functions, and expressions relating it to the parameters of the diffusion-model are established.

The results from section 2 provide the foundation for building anisotropic correlation models with the diffusion equation. This is discussed in sections 3 and 4. The Daley tensor is introduced, which is defined as the negative inverse of the tensor of second derivatives of the correlation function evaluated at zero distance (the Hessian tensor). The Daley tensor is an anisotropic generalization of the Daley length-scale. Expressions relating the Daley tensor to the diffusion tensor of the diffusion models are given. Section 4 discusses techniques for estimating the Daley tensor from statistics of a sample of simulated errors such as those that would be available from an ensemble data assimilation system. Idealized experiments are then presented to compare the effectiveness of two of the estimation techniques. Conclusions are given in section 5. Appendix A provides a derivation of the relationship between the Daley and diffusion tensors for the correlation functions represented by the implicit diffusion equation in $\mathbb{R}^2$. Appendix B provides a derivation of the key formulae for estimating the Daley tensor.

2. Isotropic diffusion

Coordinate systems of global atmospheric and ocean models refer to the spherical-shell geometry of the atmosphere and ocean. From a mathematical perspective, this leads naturally to consideration of 2D ‘horizontal’ correlation functions on the spherical space $\mathbb{S}^2$. The product of a 2D correlation function on $\mathbb{S}^2$ and a 1D correlation function on the bounded subset of the Euclidean space $\mathbb{R}^1$ is commonly used to construct 3D correlation functions on the spherical-shell subspace of $\mathbb{R}^3$ that defines the model domain. This approach of separating the horizontal and vertical correlation functions is usually justified by the fact that the global atmospheric and ocean circulations are characterized by scales that are much larger in the horizontal direction (along geopotential surfaces) than in the vertical direction (perpendicular to geopotential surfaces). In the remainder of this section, the correlation functions that can be represented by isotropic diffusion on $\mathbb{S}^2$ and the general Euclidean space $\mathbb{R}^d$ are described. Table 1 provides a brief description of the main symbols used in this section.

Table 1. A list of the main generic symbols used in section 2. The specification of the superscripts α and β is summarized in the bottom table. A quantity in $\mathbb{R}^d$ is supplemented with a subscript d if it depends explicitly on the dimension of the space; otherwise it is omitted.

| Symbol | Description |
| --- | --- |
| $\mathcal{L}^\alpha$, $\mathcal{L}^\beta$ | Correlation operators on $\mathbb{R}^d$ and $\mathbb{S}^2$ |
| $C^\alpha(\mathbf{x},\mathbf{x}')$ | General correlation function on $\mathbb{R}^d$ |
| $\mathbf{x}$ | Vector of Cartesian coordinates |
| $c^\alpha(r)$ | Isotropic correlation function on $\mathbb{R}^d$ |
| $r$ | Euclidean distance |
| $\hat{c}^\alpha(\hat{r})$ | Fourier transform of $c^\alpha(r)$ |
| $\hat{\mathbf{x}}$ | Vector of spectral wave numbers |
| $c^\beta(\theta)$ | Isotropic correlation function on $\mathbb{S}^2$ |
| $\theta$ | Angular separation |
| $\hat{c}^\beta_n$ | Legendre coefficients for $c^\beta(\theta)$ |
| $n$ | Total wave number |
| $D^\alpha$, $D^\beta$ | Daley length-scale of $c^\alpha(r)$ and $c^\beta(\theta)$ |
| $\gamma^\alpha_d$, $\gamma^\beta$ | Normalization constants on $\mathbb{R}^d$ and $\mathbb{S}^2$ |

| Superscript | Value | Description |
| --- | --- | --- |
| α | g | Explicit diffusion on $\mathbb{R}^d$ |
| α | w | Implicit diffusion on $\mathbb{R}^d$ |
| β | s | Explicit diffusion on $\mathbb{S}^2$ |
| β | h | Implicit diffusion on $\mathbb{S}^2$ |

2.1. Explicit diffusion on $\mathbb{S}^2$

Consider the 2D diffusion equation applied to the scalar field η(λ,ϕ,s):

$$\frac{\partial \eta}{\partial s} - \kappa \nabla^2 \eta = 0 \tag{1}$$

where κ > 0 is a diffusion coefficient, and

$$\nabla^2 = \frac{1}{a^2\cos^2\phi}\frac{\partial^2}{\partial\lambda^2} + \frac{1}{a^2\cos\phi}\frac{\partial}{\partial\phi}\left(\cos\phi\,\frac{\partial}{\partial\phi}\right)$$

is the Laplacian operator in geographical coordinates (λ,ϕ), λ denoting longitude (0 ≤ λ ≤ 2π), ϕ latitude (−π/2 ≤ ϕ ≤ π/2), and a the radius of the sphere (the Earth's radius in our case). In the context of this paper, s is to be interpreted as a dimensionless pseudo-time coordinate. The diffusion coefficient then has physical units of length squared. The solution of Eq. (1) on $\mathbb{S}^2$ can be interpreted as a covariance operator (e.g. see WC01). Let

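$$\eta(\lambda,\phi,0) = \gamma^s\,\tilde{\eta}(\lambda,\phi) \tag{2}$$

denote the initial condition, where $\gamma^s$ is a normalization constant. The solution at some s > 0 can be expressed as the integral operator $\mathcal{L}^s$,

$$\eta(\lambda,\phi,s) = \int_{\mathbb{S}^2} c^s(\theta)\,\tilde{\eta}(\lambda',\phi')\,a^2\cos\phi'\,\mathrm{d}\lambda'\,\mathrm{d}\phi' \tag{3}$$

where $c^s(\theta)$ is an isotropic function that depends on the angular separation θ, 0 ≤ θ ≤ π, between points (λ,ϕ) and (λ′,ϕ′) on the sphere:

$$\cos\theta = \cos\phi\cos\phi'\cos(\lambda-\lambda') + \sin\phi\sin\phi' \tag{4}$$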

The normalization constant $\gamma^s$ in Eq. (2) has been absorbed into the function $c^s(\theta)$, which has the specific form

$$c^s(\theta) = \sum_{n=0}^{\infty} \hat{c}^s_n\,P^0_n(\cos\theta) \tag{5}$$

where

$$\hat{c}^s_n = \frac{\gamma^s}{4\pi a^2}\,\sqrt{2n+1}\,\exp\!\left(-\kappa s\,\frac{n(n+1)}{a^2}\right) \tag{6}$$

n being the total wave number, and $P^0_n(\cos\theta)$ the Legendre polynomials, normalized such that $\tfrac{1}{2}\int_{-1}^{1}[P^0_n(\mu)]^2\,\mathrm{d}\mu = 1$ (so that $P^0_n(1) = \sqrt{2n+1}$), following the usual convention in meteorology (Courtier et al., 1998). All isotropic covariance functions on $\mathbb{S}^2$ can be expressed, as in Eq. (5), as an expansion in terms of the Legendre polynomials (Weber and Talkner, 1993; Theorem 2.11 of Gaspari and Cohn, 1999). They are positive-definite functions on $\mathbb{S}^2$ if the spectral coefficients are positive, which is clearly the case for all of the coefficients $\hat{c}^s_n$. Equation (3) is thus a valid covariance operator on $\mathbb{S}^2$.

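The covariance function is readily transformed into a correlation function ($c^s(0) = 1$) by defining the normalization constant as

$$\gamma^s = 4\pi a^2\left[\sum_{n=0}^{\infty}(2n+1)\exp\!\left(-\kappa s\,\frac{n(n+1)}{a^2}\right)\right]^{-1} \tag{7}$$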

The fundamental parameter controlling the shape of the correlation function is the product κs in Eq. (6). To define the length-scale of $c^s(\theta)$, we use the standard definition from Daley (1991, p. 110), the geometrical interpretation of which is discussed by Pannekoucke et al. (2008). For $c^s(\theta)$, the Daley length-scale reads

$$D^s = a\,\sqrt{\frac{2\sum_{n}\hat{c}^s_n\sqrt{2n+1}}{\sum_{n}\hat{c}^s_n\sqrt{2n+1}\;n(n+1)}} \tag{8}$$

Equations (6) and (8) provide a relationship between κs and $D^s$ that allows us to control the correlation shape (length-scale).
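As a numerical illustration of the relationship between κs and the Daley length-scale, the sketch below evaluates the Legendre coefficients, the normalization constant and the length-scale of the explicit-diffusion kernel for a given value of κs. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the truncation `n_max`, the value of the Earth's radius, and the convention that the coefficients carry the factor γ/(4πa²) of the Green's function of diffusion on a sphere of radius a are all choices made here for illustration.

```python
import numpy as np

A_EARTH = 6.371e6  # assumed Earth radius in metres


def explicit_coefficients(kappa_s, n_max=500, a=A_EARTH):
    """Legendre coefficients of the explicit-diffusion kernel, scaled so that c(0) = 1."""
    n = np.arange(n_max + 1)
    weight = np.sqrt(2 * n + 1)  # value of the normalized Legendre polynomial at theta = 0
    decay = np.exp(-kappa_s * n * (n + 1) / a**2)
    # Normalization constant chosen so that the kernel has unit amplitude at the origin.
    gamma = 4 * np.pi * a**2 / np.sum(weight**2 * decay)
    c_hat = gamma / (4 * np.pi * a**2) * weight * decay
    return n, c_hat, gamma


def daley_length(n, c_hat, a=A_EARTH):
    """Daley length-scale from the Legendre coefficients (d = 2 on the sphere)."""
    weight = np.sqrt(2 * n + 1)
    return a * np.sqrt(2 * np.sum(c_hat * weight) / np.sum(c_hat * weight * n * (n + 1)))


D_target = 500e3  # metres
n, c_hat, gamma = explicit_coefficients(0.5 * D_target**2)
c0 = np.sum(c_hat * np.sqrt(2 * n + 1))  # kernel value at zero separation
D_s = daley_length(n, c_hat)
```

With κs set to half the square of the target length-scale, `D_s` should come out close to the target, which is the planar relationship between κs and the Gaussian length-scale.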

Now consider a discretized version of Eq. (1) in which the first-order derivative is approximated using a forward-Euler (explicit) scheme. This yields

$$\eta(\lambda,\phi,s_m) = \eta(\lambda,\phi,s_{m-1}) + \kappa\,\Delta s\,\nabla^2\eta(\lambda,\phi,s_{m-1}) \tag{9}$$

where m is a positive integer, $\Delta s = s_m - s_{m-1}$ is the step size, and $\nabla^2$ is understood to be the Laplacian operator in discretized form. For convenience, we can assume that $s_m = m$ so that the step size Δs = 1. This parameter can thus be ignored hereafter without loss of generality. Repeated application of Eq. (9) on the interval 0 < m ≤ M leads to the linear operator

$$\eta(\lambda,\phi,M) = \left(1 + \kappa\nabla^2\right)^M \eta(\lambda,\phi,0) \tag{10}$$

where η(λ,ϕ,0) is given by Eq. (2). For clarity we let

$$\kappa = L^2 \tag{11}$$

to emphasize that the coefficient is positive and can be interpreted as the square of a scale parameter.

The key idea is that, on a numerical grid, the effect of the integral correlation operator (3) on an arbitrary scalar field $\tilde{\eta}$ can be approximated by applying a discretized differential operator (10). This is the essence of the original Derber and Rosati (1989) scheme. The parameter κs of the correlation function $c^s(\theta)$ can be related to the parameters M and κ of Eq. (10) by noticing that $\kappa s = \kappa M = ML^2$. In practice, it is customary to prescribe the Daley length-scale ($D^s$). Given $D^s$, the product $ML^2$ can be determined by a non-trivial inversion of Eq. (8). This has been done by trial and error for the illustrative examples presented in this paper. To determine M and $L^2$ from the product $ML^2$, we have an additional requirement that M must be sufficiently large ($L^2$ sufficiently small) in order to maintain the numerical stability of the explicit scheme. Provided M is not too ‘large’, applying the discretized operator (10) is an efficient way of evaluating the integral operator (3). What defines an acceptable value of M will depend on the application.
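The inversion of the length-scale relationship can also be automated. The following is a minimal sketch that replaces trial and error with a bisection search on the product ML², assuming the length-scale grows monotonically with that product over the search bracket. The function names, the bracket, the truncation and the grid spacing `h` are hypothetical, and the stability bound L² ≤ h²/4 is the standard forward-Euler limit for a five-point Laplacian on a uniform 2D grid, used here only as a stand-in for the stability criterion of a real model grid.

```python
import math
import numpy as np

A_EARTH = 6.371e6  # assumed Earth radius in metres


def daley_length_explicit(kappa_s, n_max=500, a=A_EARTH):
    """Daley length-scale of the explicit-diffusion kernel (normalization cancels)."""
    n = np.arange(n_max + 1)
    spec = (2 * n + 1) * np.exp(-kappa_s * n * (n + 1) / a**2)
    return a * np.sqrt(2 * np.sum(spec) / np.sum(spec * n * (n + 1)))


def solve_kappa_s(D_target, lo, hi, rel_tol=1e-10):
    """Bisection for the value of kappa*s = M*L^2 whose length-scale equals D_target."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if daley_length_explicit(mid) < D_target:
            lo = mid
        else:
            hi = mid
        if hi - lo < rel_tol * hi:
            break
    return 0.5 * (lo + hi)


D_target = 500e3                     # metres
kappa_s = solve_kappa_s(D_target, lo=1e-3 * D_target**2, hi=10 * D_target**2)

h = 100e3                            # hypothetical uniform grid spacing in metres
M = math.ceil(kappa_s / (h**2 / 4))  # smallest M satisfying the assumed stability bound
L2 = kappa_s / M                     # per-step coefficient, so that M * L2 = kappa_s
```

Splitting the product this way makes the cost explicit: halving the grid spacing multiplies the required number of iterations by roughly four.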

To represent a larger family of correlation functions than Eqs (5) and (6), WC01 proposed a generalized diffusion model in which the scaled Laplacian in Eq. (1) is replaced by a linear combination of powers of scaled Laplacians:

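$$\kappa\nabla^2 \;\longrightarrow\; -\sum_{p=1}^{P}\kappa_p\left(-\nabla^2\right)^p \tag{12}$$

where the diffusion coefficients $\kappa_p > 0$ can be related to a general set of scale parameters $L_p$ via the equation

$$\kappa_p = L_p^{2p} \tag{13}$$

The resulting correlation functions have the same basic form as Eq. (5) but with the $\hat{c}^s_n$ given by

$$\hat{c}^s_n = \frac{\gamma^s}{4\pi a^2}\,\sqrt{2n+1}\,\exp\!\left(-s\sum_{p=1}^{P}\kappa_p\left(\frac{n(n+1)}{a^2}\right)^{p}\right) \tag{14}$$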

and the appropriate modification to γs to produce a unit-amplitude function. Equation (6) is a special case of Eq. (14) with P = 1. Unlike the standard diffusion model, the generalized diffusion model can be used to represent correlation functions that change sign, as illustrated in Figure 1 of WC01. This is an appealing feature if there is compelling evidence of negative correlations in the error fields, although representing them with powers of Laplacian operators would clearly increase the cost of the correlation model.

Figure 1.

The grid-point values $c^h(r(\theta))$ (upper panel) and the variance-power spectra $\hat{c}^h_n\sqrt{2n+1}$ (lower panel) of sample correlation functions generated with Eqs (5) and (30) using different values of M. The scale parameters have been set to L = 353 km, L = 250 km and L = 125 km for the functions corresponding to M = 3 (dashed-dotted curves), M = 4 (dashed curves) and M = 10 (dotted curves), respectively, in order to achieve a common Daley length-scale of $D^h$ = 500 km (see Eq. (8)). The Gaussian correlation function $c^g(r(\theta))$ (Eq. (24)) with $D^g$ = 500 km is shown for reference (thick solid curves). The correlation functions in the upper panel are plotted as a function of chordal distance (Eq. (23)). A spectral truncation at n = 500 has been used. The lower panel is plotted on a log-log scale.

2.2. Explicit diffusion on $\mathbb{R}^d$

Now consider the diffusion equation (1) on the d-dimensional Euclidean space $\mathbb{R}^d$, where $\nabla^2$ now represents the Laplacian operator in Cartesian coordinates $\mathbf{x} = (x_1,\ldots,x_d)$. While our particular interest concerns the spaces $\mathbb{R}^1$, $\mathbb{R}^2$ and $\mathbb{R}^3$, it is easier to consider them as special cases of the general diffusion problem in $\mathbb{R}^d$. The initial condition of the diffusion problem can be written as

$$\eta(\mathbf{x},0) = \gamma^g_d\,\tilde{\eta}(\mathbf{x}) \tag{15}$$

where $\gamma^g_d$ is a normalization constant and $\tilde{\eta}(\mathbf{x})$ is assumed to be bounded at infinity. Using the Fourier transform (FT), the solution at ‘time’ s > 0 can be written as a convolution operator $\mathcal{L}^g$:

$$\eta(\mathbf{x},s) = \int_{\mathbb{R}^d} C(\mathbf{x},\mathbf{x}')\,\tilde{\eta}(\mathbf{x}')\,\mathrm{d}\mathbf{x}' \tag{16}$$

where $C(\mathbf{x},\mathbf{x}') = c^g(r)$ is the Gaussian function

$$c^g(r) = \frac{\gamma^g_d}{(4\pi\kappa s)^{d/2}}\exp\!\left(-\frac{r^2}{4\kappa s}\right) \tag{17}$$

$r = |\mathbf{x} - \mathbf{x}'|$ being the Euclidean distance between points $\mathbf{x}$ and $\mathbf{x}'$ in $\mathbb{R}^d$. Setting the normalization factor to

$$\gamma^g_d = (4\pi\kappa s)^{d/2} \tag{18}$$

ensures that $c^g(0) = 1$.

The Daley length-scale for any twice differentiable, isotropic correlation function c(r) in d dimensions is given by

$$D = \sqrt{\frac{-d}{\nabla^2 c\,|_{r=0}}} = \sqrt{\frac{-d}{\mathrm{Tr}\!\left(\nabla\nabla^{\mathrm{T}} c\,|_{r=0}\right)}} \tag{19}$$

where $\nabla\nabla^{\mathrm{T}}$ is the outer product of the d-dimensional gradient operator $\nabla = (\partial/\partial x_1,\ldots,\partial/\partial x_d)^{\mathrm{T}}$ and its transpose. The quantity within the trace operator is the correlation Hessian tensor (Chorti and Hristopulos, 2008). The Hessian tensor plays a fundamental role in characterizing the anisotropic correlation functions described later in this paper (sections 3 and 4). For the d-dimensional Gaussian function, the Daley length-scale is

$$D^g = \sqrt{2\kappa s} \tag{20}$$

In terms of $D^g$, the normalization factor is

$$\gamma^g_d = (2\pi)^{d/2}\,(D^g)^d \tag{21}$$

As before, we can approximate Eq. (16) with a differential operator based on a discretization of the diffusion equation using an M-step explicit scheme. In terms of the parameters M and $L^2$ of the explicit diffusion operator, Eqs (20) and (21) become

$$D^g = \sqrt{2M}\,L, \qquad \gamma^g_d = (4\pi M)^{d/2}\,L^d \tag{22}$$
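The link between the M-step explicit operator and the Gaussian kernel can be checked numerically in one dimension. The sketch below is illustrative only: it assumes a uniform periodic grid, a three-point Laplacian, and the d = 1 relations D = √(2M) L and γ = √(4πM) L, and the function name, grid size and parameter values are hypothetical. It applies the operator to a scaled discrete delta function and compares the result with a Gaussian of the prescribed length-scale.

```python
import numpy as np


def explicit_diffusion_1d(field, L2, M, dx):
    """Apply (1 + L^2 d2/dx2)^M using a three-point periodic Laplacian."""
    if L2 / dx**2 > 0.5:
        raise ValueError("unstable: this stencil needs L^2 / dx^2 <= 1/2")
    out = field.copy()
    for _ in range(M):
        lap = (np.roll(out, 1) - 2.0 * out + np.roll(out, -1)) / dx**2
        out = out + L2 * lap
    return out


dx, N = 1.0, 400
D = 10.0 * dx                          # prescribed Daley length-scale
M = 400                                # number of explicit iterations
L2 = D**2 / (2 * M)                    # from D = sqrt(2M) L
gamma = np.sqrt(4.0 * np.pi * M * L2)  # d = 1 normalization factor

delta = np.zeros(N)
delta[N // 2] = 1.0 / dx               # discrete approximation of a Dirac delta
kernel = explicit_diffusion_1d(gamma * delta, L2, M, dx)

r = (np.arange(N) - N // 2) * dx
gauss = np.exp(-r**2 / (2.0 * D**2))
max_err = np.max(np.abs(kernel - gauss))
```

The guard in `explicit_diffusion_1d` makes the trade-off visible: for a fixed length-scale, a smaller M forces a larger L² and eventually violates the stability limit of the stencil.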

Let us consider now the interpretation of the Gaussian function on $\mathbb{S}^2$. First, since $\mathbb{S}^2$ is embedded in $\mathbb{R}^3$, a valid isotropic correlation function on $\mathbb{S}^2$ can always be constructed from a valid isotropic correlation function in $\mathbb{R}^3$ by restricting $\mathbf{x} = (x_1,x_2,x_3)$ and $\mathbf{x}' = (x_1',x_2',x_3')$ to be points on the sphere. Expressing these points in geographical coordinates $\mathbf{x} = (a\cos\phi\cos\lambda,\ a\cos\phi\sin\lambda,\ a\sin\phi)$ and $\mathbf{x}' = (a\cos\phi'\cos\lambda',\ a\cos\phi'\sin\lambda',\ a\sin\phi')$ leads to the chordal distance measure

$$r = |\mathbf{x} - \mathbf{x}'| = a\sqrt{2(1-\cos\theta)} \tag{23}$$

where cos θ is given by Eq. (4). The Gaussian correlation function on $\mathbb{R}^3$ confined to the subspace $\mathbb{S}^2$ is thus

$$c^g(r(\theta)) = \exp\!\left(-\frac{a^2}{(D^g)^2}\,(1-\cos\theta)\right)$$

From Eq. (23) we notice that r depends only on cos θ, or alternatively θ, and that $\cos\theta = 1 - r^2/2a^2$, where 0 ≤ r ≤ 2a. We also recall that all isotropic correlation functions on $\mathbb{S}^2$ can be expressed as a Legendre expansion that depends only on cos θ (Eq. (5)). It is then possible to represent any isotropic correlation function on $\mathbb{S}^2$ as a function of either r or θ.

In particular, consider the representation of the Gaussian on $\mathbb{S}^2$ in terms of the Legendre polynomials. As shown in WC01:

$$c^g(\theta) = \sum_{n=0}^{\infty}\hat{c}^g_n\,P^0_n(\cos\theta) \tag{24}$$

where

$$\hat{c}^g_n = \sqrt{2n+1}\,\sqrt{\frac{\pi}{2\omega}}\;e^{-\omega}\,I_{n+1/2}(\omega)$$

$I_{n+1/2}(\omega)$ denoting the modified Bessel function of fractional order n + 1/2, and $\omega = (a/D^g)^2$. In view of the results in $\mathbb{R}^d$, one might expect that the correlation kernel $c^s(\theta)$ implied by diffusion on $\mathbb{S}^2$ (Eq. (5)) is similar to the Gaussian correlation function (24) on $\mathbb{S}^2$. Indeed, for a given length-scale $D^g$, it is possible to find a corresponding parameter κs in Eq. (6) such that the difference between $c^s(\theta)$ and $c^g(\theta)$ is ‘small’ (Roberts and Ursell, 1960; Hartman and Watson, 1974). In particular, consider the scales of interest in atmospheric and ocean data assimilation for which ω ≫ 1. Matching the n = 0 coefficients $\hat{c}^s_0$ and $\hat{c}^g_0$ of the Legendre polynomials and noting that

$$I_{n+1/2}(\omega) \approx \frac{e^{\omega}}{\sqrt{2\pi\omega}}\left(1 - \frac{n(n+1)}{2\omega}\right)$$

for large ω, we obtain the approximation to the normalization factor

$$\gamma^s \approx 2\pi\,(D^g)^2 \tag{25}$$

Now matching the n = 1 coefficients $\hat{c}^s_1$ and $\hat{c}^g_1$ and using (25) leads to the approximation

$$\kappa s \approx \tfrac{1}{2}\,(D^g)^2 \tag{26}$$

WC01 illustrate the excellent agreement between $c^s(\theta)$ and $c^g(\theta)$, particularly for the large scales, for a given length-scale $D^g$ and with κs approximated according to Eq. (26) (see their Figure A1).

Equations (25) and (26) are none other than those derived earlier for the diffusion problem in $\mathbb{R}^2$ (cf. Eqs (21) and (20) with d = 2). In other words, for length-scales small compared to the radius of the Earth, we obtain the somewhat intuitive result that diffusion on the sphere (the $\mathcal{L}^s$ operator) is well approximated by diffusion on the 2D Cartesian plane (the $\mathcal{L}^g$ operator). For calibrating the correlation model, it is then possible to employ the simple expressions (26) and (25) for the length-scale and normalization factor in place of the more complicated expressions (8) and (7).

There are two main drawbacks with the generalized explicit diffusion model of WC01. First, the correlation functions that can be represented by the model have limited flexibility in the spectral domain, especially at high wave numbers where their decay rates are at least as fast as that of the Gaussian function. In data assimilation, this can result in excessive smoothing of small-scale features in the analysis (Purser et al., 2003b). Second, the explicit scheme is subject to a stability criterion that depends on the ratio of the length-scale and grid size, raised to the power of 2P. As a result, many iterations may be required when the length-scale is large compared with the grid resolution. With the variable coefficient and anisotropic versions of the model discussed later, the computational cost of the algorithm can be especially high. A diffusion model based on an implicit formulation can overcome these limitations, as described next.

2.3. Implicit diffusion on $\mathbb{S}^2$

Consider again the diffusion equation (1) but this time discretized using a backward-Euler (implicit) scheme:

$$\eta(\lambda,\phi,s_m) = \eta(\lambda,\phi,s_{m-1}) + \kappa\,\Delta s\,\nabla^2\eta(\lambda,\phi,s_m) \tag{27}$$

where, as in Eq. (10), we can assume $s_m = m$ and hence Δs = 1, and interpret κ as the square of a scale parameter (Eq. (11)). Rearranging Eq. (27) and applying it repeatedly on the interval 0 < m ≤ M leads to the ‘reverse-time’ or inverse diffusion operator

$$\eta(\lambda,\phi,0) = \left(1 - L^2\nabla^2\right)^M \eta(\lambda,\phi,M) \tag{28}$$

Equation (28) can be interpreted as a roughening operator as opposed to the diffusion operator itself, which is a smoothing operator.

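Following Eq. (2), we define the initial condition as

$$\eta(\lambda,\phi,0) = \gamma^h\,\tilde{\eta}(\lambda,\phi) \tag{29}$$

where $\gamma^h$ is a normalization constant. Weaver and Ricci (2004) show that the differential operator $(\mathcal{L}^h)^{-1} = (\gamma^h)^{-1}(1 - L^2\nabla^2)^M$ is the inverse of a correlation operator $\mathcal{L}^h$, where the latter is given by an integral equation of the form (3), with isotropic correlation function $c^h(\theta)$ of the Legendre form (5) as its kernel. The spectral coefficients of $c^h(\theta)$ are strictly positive and given by

$$\hat{c}^h_n = \frac{\gamma^h}{4\pi a^2}\,\sqrt{2n+1}\left(1 + L^2\,\frac{n(n+1)}{a^2}\right)^{-M} \tag{30}$$

The normalization factor is

$$\gamma^h = 4\pi a^2\left[\sum_{n=0}^{\infty}(2n+1)\left(1 + L^2\,\frac{n(n+1)}{a^2}\right)^{-M}\right]^{-1} \tag{31}$$

and the Daley length-scale is given by Eq. (8) with $\hat{c}^s_n$ replaced by $\hat{c}^h_n$.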

In the explicit diffusion model, the only free parameter was the product $\kappa s_M = ML^2$ which controls the spatial scale of the quasi-Gaussian correlation kernel (Eq. (6) with $s = s_M$). The implicit diffusion model, on the other hand, allows for greater control of the shape characteristics of the associated correlation kernels since both $L^2$ and M are free parameters. Numerically, this extra flexibility is reflected by the important property of unconditional stability of the implicit scheme. In the limiting case of M → ∞, with $ML^2$ held fixed, the spectral coefficients (30) reduce to those of the quasi-Gaussian solution which is the only correlation function that can be represented by solving the diffusion equation explicitly.
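The limiting behaviour described above can be illustrated numerically. The sketch below assumes that the implicit spectrum is proportional to (2n+1)(1 + L²n(n+1)/a²)^(−M) and the explicit spectrum to (2n+1)exp(−κs n(n+1)/a²), each normalized to sum to one. The function names, the truncation and the Earth radius are choices made here for illustration. It checks that a small M with L = 250 km gives a length-scale near 500 km, and that a very large M with ML² held fixed approaches the quasi-Gaussian spectrum.

```python
import numpy as np

A_EARTH = 6.371e6  # assumed Earth radius in metres


def implicit_spectrum(L, M, n_max=2000, a=A_EARTH):
    """Normalized variance spectrum of the implicit-diffusion kernel (sums to one)."""
    n = np.arange(n_max + 1)
    spec = (2 * n + 1) * (1.0 + L**2 * n * (n + 1) / a**2) ** (-float(M))
    return n, spec / np.sum(spec)


def explicit_spectrum(kappa_s, n_max=2000, a=A_EARTH):
    """Normalized variance spectrum of the explicit (quasi-Gaussian) kernel."""
    n = np.arange(n_max + 1)
    spec = (2 * n + 1) * np.exp(-kappa_s * n * (n + 1) / a**2)
    return n, spec / np.sum(spec)


def daley_length(n, spec, a=A_EARTH):
    """Daley length-scale from a normalized variance spectrum."""
    return a * np.sqrt(2 * np.sum(spec) / np.sum(spec * n * (n + 1)))


# Low-order implicit kernel: M = 4 with L = 250 km.
n, spec_h = implicit_spectrum(250e3, 4)
D_h = daley_length(n, spec_h)

# Large-M limit with M * L^2 held fixed at kappa_s.
kappa_s = 0.5 * (500e3) ** 2
M_big = 10000
_, spec_big = implicit_spectrum(np.sqrt(kappa_s / M_big), M_big)
_, spec_s = explicit_spectrum(kappa_s)
gap = np.max(np.abs(spec_big - spec_s))
```

Plotting `spec_h` against `spec_s` on log-log axes should show the slower high-wave-number decay of the low-order implicit kernel.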

The upper panel in Figure 1 displays correlation functions $c^h(r(\theta))$ for different values of M and a constant Daley length-scale (500 km). The values are plotted as a function of chordal distance r(θ). The Gaussian function $c^g(r(\theta))$ is also shown for reference. Increasing the value of M decreases the ‘fatness’ of the tail of $c^h(r(\theta))$, with the Gaussian providing the limiting case as M → ∞. The total variance of $c^h(r(\theta))$ and $c^g(r(\theta))$ is given by their value at the origin, which is equal to one. The coefficients $\hat{c}^h_n\sqrt{2n+1}$ and $\hat{c}^g_n\sqrt{2n+1}$ give the contribution of each wave number n to the total variance of $c^h(r(\theta))$ and $c^g(r(\theta))$, respectively, and thus define the variance-power spectra. The lower panel in Figure 1 shows a log-log plot of these spectra as a function of n. Here we see that the increased fatness in correlation shape for low values of M is associated with higher variance and a reduced damping rate in the small scales, slightly less variance in the intermediate scales, and increased variance in the large scales.

As with the generalized diffusion equation, a linear combination of powers of scaled Laplacian operators (12) can be introduced in Eq. (28) to yield a larger family of correlation functions, but at extra cost. The spectral coefficients of this larger family are given by

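$$\hat{c}^h_n = \frac{\gamma^h}{4\pi a^2}\,\sqrt{2n+1}\left(1 + \sum_{p=1}^{P} L_p^{2p}\left(\frac{n(n+1)}{a^2}\right)^{p}\right)^{-M} \tag{32}$$

with $\gamma^h$ modified accordingly so that $c^h(0) = 1$. The smoothing spline functions introduced by Wahba (1982) correspond to the special case of Eqs (5) and (32) for which M = 2.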

Increasing the degree P of the polynomial of the Laplacian leads to correlation functions that oscillate about the zero axis. This is illustrated in the upper panel of Figure 2, where the generalized ch (r(θ)) are displayed with different values of P but a fixed value of M = 4. The amplitude of the negative lobes increases with increasing value of P. In spectral space, the negative lobes are associated with a decrease in variance in the large scales and an increase in variance in the intermediate scales. Increasing the value P also leads to a steepening of the decay rate of the variance in the smaller scales.

Figure 2.

As Figure 1 but for sample correlation functions generated with Eqs (5) and (32) with a fixed value of M = 4 and different values of P. The scale parameters have been adjusted to yield a common Daley length-scale of 500 km: L1 = 250km with P = 1 (dashed-dotted curves), L1 = 0 and L2 = 206km with P = 2 (dashed curves), L1 = L2 = 0, L3 = 209km with P = 3 (dotted curves).

A straightforward variant of Eq. (32) that can be used to enhance the oscillations while maintaining a gradual spectral decay rate at high wave numbers is

$$\hat{c}^h_n = \frac{\gamma^h}{4\pi a^2}\,\sqrt{2n+1}\left(1 + \sum_{p=1}^{P} \rho_p\,L_p^{2p}\left(\frac{n(n+1)}{a^2}\right)^{p}\right)^{-M} \tag{33}$$

where $\rho_p$ is a dimensionless coefficient that can take on both negative and positive values. This is equivalent to redefining the diffusion coefficients (13) as $\kappa_p = \rho_p L_p^{2p}$. Equation (33) yields positive coefficients by restricting M to be even. Examples are shown in Figure 3 for the case P = 2 and M = 2, and a single scale parameter $L_1 = L_2 = L$. Here $\rho_2$ has been set to one and negative values have been used for $\rho_1$. Increasing the magnitude of $\rho_1$ results in a significant increase in the amplitude of the oscillations and a much sharper spectral peak at intermediate scales. Notice that by setting $\rho_1 = 2$ we recover the non-oscillatory correlation function governed by Eq. (30) with M = 4, which is displayed in Figure 1 (dashed curves).

Figure 3.

As Figure 1 but for sample correlation functions generated with Eqs (5) and (33) with a fixed value of M = 2, P = 2 and ρ2 = 1, and different values of ρ1. A single scale parameter L1 = L2 = L has been used and adjusted to yield a common Daley length-scale of 500 km: L = 308km with ρ1 = −1 (dashed-dotted curves), L = 326km with ρ1 = −1.5 (dashed curves), and L = 340km with ρ1 = −1.8 (dotted curves).

On a numerical grid, $\mathcal{L}^h$ can be approximated by a discrete operator that solves the linear system (28)–(29) for a given right-hand side $\tilde{\eta}$. We refer to $\mathcal{L}^h$ as an implicit diffusion correlation operator. Although the cost of each iteration of an implicit diffusion operator will generally increase relative to that of the explicit scheme, the total cost of the implicit algorithm can easily decrease through the possibility of performing significantly fewer iterations.
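To make the idea of solving a linear system per iteration concrete, the sketch below works through a deliberately simplified one-dimensional periodic analogue rather than the spherical problem. The grid, the dense matrix solve, the function name and the parameter values are all assumptions made for illustration. For M = 2 in one dimension the expected kernel is the second-order autoregressive function (1 + r/L) exp(−r/L), whose integral over the real line, 4L, is used as the normalization factor.

```python
import numpy as np


def implicit_diffusion_1d(rhs, L, M, dx):
    """Solve (1 - L^2 d2/dx2)^M eta = rhs on a periodic grid, one linear solve per step."""
    N = rhs.size
    eye = np.eye(N)
    # Three-point periodic Laplacian as a dense matrix (adequate for a small example).
    lap = (np.roll(eye, 1, axis=1) - 2.0 * eye + np.roll(eye, -1, axis=1)) / dx**2
    A = eye - L**2 * lap
    eta = rhs.copy()
    for _ in range(M):
        eta = np.linalg.solve(A, eta)
    return eta


dx, N = 1.0, 400
L, M = 10.0, 2
gamma = 4.0 * L               # integral of (1 + r/L) exp(-r/L) over the real line

delta = np.zeros(N)
delta[N // 2] = 1.0 / dx      # discrete approximation of a Dirac delta
kernel = implicit_diffusion_1d(gamma * delta, L, M, dx)

r = np.abs(np.arange(N) - N // 2) * dx
ar2 = (1.0 + r / L) * np.exp(-r / L)
max_err = np.max(np.abs(kernel - ar2))
```

Only two solves are needed here, and L can be made as large as desired relative to `dx` without any stability restriction. A realistic system would replace the dense solve with a sparse direct or iterative solver.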

2.4. Implicit diffusion on $\mathbb{R}^d$

The starting point is the following general fractional differential operator $(\mathcal{L}^w)^{-1}$ (Whittle, 1954, 1963; Guttorp and Gneiting, 2006):

$$\left(1 - L^2\nabla^2\right)^{\nu + d/2}\psi(\mathbf{x}) = \gamma^w_d\,\tilde{\psi}(\mathbf{x}) \tag{34}$$

where $\tilde{\psi}(\mathbf{x})$ is assumed to be bounded at infinity, ν > 0 is a smoothness parameter, and $\gamma^w_d$ is a normalization constant that depends on the dimension of the space. The FT of Eq. (34) gives the relation

$$\mathcal{F}[\psi](\hat{\mathbf{x}}) = \hat{c}^w_d(\hat{r})\;\mathcal{F}[\tilde{\psi}](\hat{\mathbf{x}}) \tag{35}$$

where $\mathcal{F}[\psi]$ and $\mathcal{F}[\tilde{\psi}]$ denote the FTs of $\psi(\mathbf{x})$ and $\tilde{\psi}(\mathbf{x})$, respectively, and

$$\hat{c}^w_d(\hat{r}) = \frac{\gamma^w_d}{\left(1 + L^2\hat{r}^2\right)^{\nu + d/2}} \tag{36}$$

$\hat{r} = |\hat{\mathbf{x}}|$ being the norm of the vector of spectral wave numbers associated with $\mathbf{x}$ (see also Yaglom, 1987, p. 363, Eq. (4.130); Stein, 1999, p. 49, Eq. (32); or Gneiting et al., 2009, p. 16, Eq. (20)). Setting

$$\gamma^w_d = 2^d\,\pi^{d/2}\,\frac{\Gamma(\nu + d/2)}{\Gamma(\nu)}\,L^d \tag{37}$$

where Γ(ν) denotes the Gamma function, and applying the inverse FT to Eq. (35) leads to an integral solution of the general form (16), where $C(\mathbf{x},\mathbf{x}') = c^w(r)$ is a unit-amplitude isotropic function given by

$$c^w(r) = \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{r}{L}\right)^{\nu} K_\nu\!\left(\frac{r}{L}\right) \tag{38}$$

$K_\nu(r/L)$ denoting the modified Bessel function of the second kind of order ν, and $r = |\mathbf{x} - \mathbf{x}'|$. Since $\hat{c}^w_d(\hat{r})$ is strictly positive, $c^w(r)$ is a valid correlation function in $\mathbb{R}^d$ (Bochner's theorem; see Theorem 2.10 in Gaspari and Cohn, 1999). Notice that the power spectrum $\hat{c}^w_d(\hat{r})$ depends on d but the correlation function itself $c^w(r)$ is independent of d.

Equation (38) defines a class of correlation functions well known in the geostatistical literature as the Whittle–Matérn or Matérn family (Gneiting, 1999; Stein, 1999; Guttorp and Gneiting, 2006). The link between this correlation family and the fractional differential operator (34) is attributed to Whittle (1954, 1963). Of particular interest here is the subclass of Matérn functions that correspond to (positive) integer values of the parameter M = ν + d/2. For this subclass, the inverse correlation operator equation image has a greatly simplified representation for numerical applications and can be interpreted as an M-step implicitly formulated diffusion operator (MW10), where (cf. Eqs (15) and (16))

equation image

The correlation kernels and their associated FT are given by

equation image(39)

and

equation image

Equation (39) yields valid correlation functions if M > d/2 (ν > 0; Guttorp and Gneiting, 2006). Notice also that, in contrast to the full Matérn family, the implicit-diffusion kernels depend on d (which has been made explicit by adding the subscript d in equation image) but their normalized power spectrum equation image is independent of d. For odd values of d, Eq. (39) reduces to a polynomial of order M − (d + 1)/2 times an exponential function; this is the well-known class of AR functions.
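The polynomial-times-exponential structure for odd d can be made concrete with a short Python sketch. It uses the standard closed-form expression for a Matérn function with half-integer smoothness ν = p + 1/2, taking p = M − (d + 1)/2, and assumes the unit-amplitude normalization c(0) = 1. The function name `ar_correlation` and its argument names are illustrative and do not appear in the original text.

```python
import math

def ar_correlation(r, L, M, d):
    """Matern correlation with nu = M - d/2 for odd d, in closed form.

    Assumption: standard half-integer Matern expression with c(0) = 1.
    """
    if d % 2 != 1:
        raise ValueError("the closed form used here requires odd d")
    p = M - (d + 1) // 2              # order of the polynomial factor
    if p < 0:
        raise ValueError("need M >= (d + 1)/2 so that nu > 0")
    x = abs(r) / L
    poly = sum(
        math.factorial(p + k) / (math.factorial(k) * math.factorial(p - k))
        * (2.0 * x) ** (p - k)
        for k in range(p + 1)
    )
    return math.factorial(p) / math.factorial(2 * p) * poly * math.exp(-x)
```

With d = 1 and M = 2 this reduces to the SOAR function (1 + r/L) exp(−r/L), and with d = 1 and M = 1 to the exponential function exp(−r/L).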

Of relevance here are the spaces equation image, equation image and equation image. The implicit-diffusion kernels on these spaces can be written explicitly as

equation image(40)
equation image(41)
equation image(42)

where

equation image

From Eq. (37), the expressions for the normalization constants become

equation image(43)

Using Eq. (19), the Daley length-scale of the implicit diffusion kernels in equation image can be evaluated as

equation image(44)

Equation (44) is derived in MW10 for d = 1 and in Appendix A for d = 2. The generalization to d > 2 follows by noting that the correlation functions associated with odd d all have the form (40) with equation image, while those functions with even d all have the form (41) with equation image. Equation (44) imposes further restrictions on the choice of M where now we require

equation image

In equation image, for example, we require M > 2. Finally, even values of M are more convenient than odd values of M since they greatly simplify the derivation of a ‘square-root’ factor of the diffusion operator, which is important for estimating normalization factors and for preconditioning in variational assimilation (WC01).
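As a hedged illustration of how the scale parameter L might be set from a target Daley length-scale, the sketch below assumes that Eq. (44) takes the form D^2 = (2M − d − 2) L^2, which is the form implied by the isotropic relation quoted in section 3.1 and by the requirement M > 2 for d = 2. The function name `scale_from_daley` is hypothetical.

```python
import math

def scale_from_daley(daley, M, d):
    """Return L such that D^2 = (2M - d - 2) L^2 (assumed form of Eq. (44))."""
    factor = 2 * M - d - 2
    if factor <= 0:
        # The Daley length-scale is not defined unless M > d/2 + 1.
        raise ValueError("need M > d/2 + 1")
    return daley / math.sqrt(factor)
```

For example, under this assumption a Daley length-scale of 500 km with M = 4 and d = 2 corresponds to L = 250 km.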

The explicit diffusion kernels are the limiting case of the implicit diffusion kernels as M → ∞ with equation image fixed. This is easily deduced from Eqs (36), (37) and (44) where, for M = ν + d/2 large, equation image (Eq. (22)), equation image (Eq. (21)), and equation image, where equation image is the FT of the d-dimensional Gaussian function (17). Based on the similarity of the explicit diffusion kernels on equation image and equation image when (Dg/a)^2 ≪ 1, one would expect similar agreement between the implicit diffusion kernels on equation image and equation image when equation image.
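The limiting behaviour can be checked numerically in 1D. The following sketch is only an illustration under stated assumptions: it applies M implicit steps of the discrete operator (1 − L^2 ∂²/∂x²) to a unit impulse on a unit-spaced grid, with L chosen from a fixed Daley length-scale D through the assumed 1D relation D^2 = (2M − 3) L^2, and compares the peak-normalized response with the Gaussian exp(−r^2/(2D^2)). The grid size, the value of D and the function name are arbitrary choices.

```python
import numpy as np

def implicit_kernel_1d(n, L, M):
    """Peak-normalized response of (1 - L^2 d2/dx2)^(-M) to a central impulse."""
    A = np.eye(n) * (1.0 + 2.0 * L**2)
    idx = np.arange(n - 1)
    A[idx, idx + 1] = -L**2          # second-difference stencil, unit spacing
    A[idx + 1, idx] = -L**2
    psi = np.zeros(n)
    psi[n // 2] = 1.0
    for _ in range(M):               # one linear solve per implicit step
        psi = np.linalg.solve(A, psi)
    return psi / psi[n // 2]

n, daley = 401, 10.0
r = np.arange(n) - n // 2
gauss = np.exp(-r**2 / (2.0 * daley**2))
errors = {}
for M in (2, 4, 16):
    L = daley / np.sqrt(2 * M - 3)   # assumed d = 1 form of Eq. (44)
    errors[M] = np.max(np.abs(implicit_kernel_1d(n, L, M) - gauss))
```

The maximum departure from the Gaussian stored in `errors` shrinks as M increases, in line with the limit described above. The domain is taken wide enough that the boundary treatment does not affect the comparison.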

Figure 4 shows the correlation function ch (r(θ)) for M = 4 plotted as a function of chordal distance r(θ) for four different Daley length-scales Dh (solid curves). For comparison, the correlation functions equation image for M = 4 are also shown (dashed curves). The Daley scales equation image have been set to Dh and then the corresponding L have been computed from Eq. (44). As expected, the curves are virtually indistinguishable for length-scales of primary interest (<1000 km). Only when the length-scale exceeds 2000 km do noticeable differences appear, and these mainly occur in the tail of the function. In other words, the differential operator (equation imageh)−1 on equation image can be well approximated locally by the differential operator equation image acting on the tangent plane equation image. The simple relations (43) and (44) can then be used in place of the spectral expansions (31) and (8) to provide a good approximation of the normalization factor and length-scale. This is convenient especially with grid-point ocean models where spectral expansions cannot be readily computed due to the presence of complex land boundaries.

Figure 4.

A comparison of the correlation functions ch (r(θ)) (Eqs (5) and (30)), equation image (Eq. (41)) and equation image (Eq. (42)) for M = 4. The four sets of curves correspond to Daley length-scales of 500 km, 1000 km, 2000 km and 4000 km (curves from left to right, respectively). Correlations are plotted as a function of chordal distance. A spectral truncation at n = 500 has been used for ch (r(θ)).

It is important to stress, however, that equation image itself is not a valid correlation function on equation image. A valid correlation function on equation image from the Matérn family is the AR function equation image. For example, Gaspari and Cohn (1999) discuss the second-order AR (SOAR) function on equation image (see their Eq. (2.36)). Figure 4 shows the fourth-order AR function for different length-scales (dashed-dotted curves). The differences between equation image and ch (r(θ)) are larger than those between equation image and ch (r(θ)) but still quite small for length-scales less than 1000 km.

A more general set of correlation functions on equation image can be modelled using a linear combination of implicit diffusion operators or a generalized implicit diffusion operator constructed from the inverse of a polynomial of Laplacian operators raised to the power of M (MW10; Yaremchuk and Smith, 2011; or see Purser et al., 2003b, for related approaches involving the recursive filter). The correlation functions generated by the first approach are described by a linear combination of Matérn functions where the weighting coefficients for each function are specified such that the combined function is positive definite. Gregori et al. (2008) provide general conditions on the model parameters for achieving this. MW10 provide an example in equation image in which two SOAR functions are combined to produce a correlation function with negative lobes.

The second approach is analogous to the one outlined in section 2.3 for the problem on equation image (Eqs (32) and (33)). Hristopulos (2003), Hristopulos and Elogne (2007) and Yaremchuk and Smith (2011) have studied extensively the special case M = 1 and P = 2 on equation image for which the parameter settings κ_1 = ρL^2 and κ_2 = L^4 with ρ < 0 and satisfying ρ^2 < 4 yield a family of positive-definite, oscillatory functions such as those illustrated in Figure 3 on equation image. With all of these approaches, however, the advantages of increasing the flexibility in the correlation model have to be carefully measured against the increase in computational cost that results from the need to solve additional or more complicated large linear systems, and the difficulty of having to estimate additional parameters.

3. Anisotropic diffusion

Isotropic correlation models are commonly used in data assimilation algorithms because of their simplicity and computational convenience. There is no reason, however, to expect actual background-error correlations to be isotropic in geophysical fluids such as the ocean. On the contrary, one would expect them to be strongly anisotropic, particularly near coastlines, bathymetry, or ocean fronts. General anisotropic correlation models allow for preferential stretching or shrinking of the correlation functions along arbitrary directions. With a diffusion-based correlation model this can be done using a diffusion tensor, as outlined in this section. To fix the concepts and definitions, we focus mainly on the homogeneous and anisotropic problem. Methods for estimating the parameters of a general inhomogeneous and anisotropic diffusion model are described in section 4.

3.1. Homogeneity and anisotropy

Consider the 2D diffusion equation on equation image,

equation image(45)

where equation image is an anisotropic, but constant diffusion tensor

equation image(46)

which is assumed to be symmetric (κyx = κxy) and positive definite (equation image) so that κ is guaranteed to be invertible. The diagonal terms of the tensor determine the strength of the diffusion in the coordinate directions x and y, while the off-diagonal elements allow the principal axes of the diffusion to be rotated relative to x and y.

The solution of Eq. (45) is a straightforward extension of the solution to the isotropic problem (Pannekoucke and Massart, 2008; Pannekoucke, 2009). Given the initial condition equation image, the solution can be expressed as Eq. (16) (with d = 2) where the kernel is given by the Gaussian function

equation image(47)

|κ| denoting the determinant of κ, and equation image the non-dimensional distance measure

equation image(48)

with x = (x, y)^T. From this definition, κ can also be interpreted as the aspect tensor of the Gaussian function (47) (Purser et al., 2003b). The elements of κ have physical units of length squared. Setting

equation image

ensures that cg (0) = 1.

For a homogeneous and at least twice-differentiable correlation function, we can define the Hessian tensor (Swerling, 1962; Hristopulos, 2002; Chorti and Hristopulos, 2008), which for the 2D Gaussian function is

equation image(49)

where ∇∇T is the outer product of the 2D gradient operator ∇ = (∂/∂x ∂/∂y)T and its transpose. The correlation Hessian tensor is of interest here since it is a quantity that can be estimated from sample statistics of background error (see section 4). Following the basic procedure described in Appendix A, it is straightforward to verify that

equation image(50)

In the isotropic case, κ = κI and hence Hg = (Dg)^−2 I where (Dg)^2 = 2κs is the square of the Daley length-scale. The inverse of the tensor (49)

equation image(51)

can thus be considered as a generalization of the (square of the) Daley length-scale to the anisotropic case. We will thus refer to this quantity as a Daley tensor.
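The relation between the Hessian tensor, the Daley tensor and the diffusion tensor can be verified numerically. The sketch below assumes that the Gaussian kernel has the form exp(−r^T κ^−1 r/(4s)), which is consistent with the isotropic result (Dg)^2 = 2κs quoted above, and evaluates −∇∇^T c at r = 0 by centred finite differences. The tensor values, the pseudo-time s and the function names are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_corr(r, kappa, s):
    """Assumed Gaussian kernel: exp(-r^T kappa^{-1} r / (4 s))."""
    return np.exp(-r @ np.linalg.solve(kappa, r) / (4.0 * s))

def hessian_at_origin(corr, h=1e-3):
    """H = -grad grad^T c evaluated at r = 0 by centred differences."""
    H = np.zeros((2, 2))
    e = np.eye(2) * h
    for i in range(2):
        for j in range(2):
            H[i, j] = -(corr(e[i] + e[j]) - corr(e[i] - e[j])
                        - corr(-e[i] + e[j]) + corr(-e[i] - e[j])) / (4.0 * h**2)
    return H

kappa = np.array([[4.0, 1.5], [1.5, 2.0]])   # symmetric, positive definite
s = 3.0
H = hessian_at_origin(lambda r: gaussian_corr(r, kappa, s))
daley_tensor = np.linalg.inv(H)              # should approximate 2 s kappa
```

Under the assumed kernel, `daley_tensor` agrees with 2sκ to within the finite-difference truncation error.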

For the 3D diffusion equation, the diffusion tensor equation image contains six independent elements:

equation image(52)

where κyx = κxy, κzx = κxz and κzy = κyz. In direct analogy with the 2D problem, the integral solution involves a 3D Gaussian kernel with aspect tensor given by (52), and normalization constant given by γg3 = (4πs)^(3/2) |κ|^(1/2) (cf. Eq. (18)). The relationships (49)–(51) hold for the 3D problem with ∇ now interpreted as the 3D gradient operator.

To approximate a 2D or 3D anisotropic and homogeneous Gaussian correlation operator numerically, we can solve Eq. (45) with an explicit scheme,

equation image

where from Eqs (50) and (51)

equation image(53)

and the operator 1 + ∇ · κ∇ is understood to be in discrete form. If the non-diagonal tensor elements of κ are zero, which can always be achieved by rotating the model coordinates to be aligned with the principal axes of the ellipse or ellipsoid implied by Eq. (48) (see, for example, Xu, 2005), then the 2D or 3D Gaussian operator can be replaced by a product of 1D Gaussian operators acting independently along each direction x, y and z. Ignoring boundary conditions, each 1D Gaussian operator can in turn be approximated by a 1D diffusion operator discretized using an M-step explicit scheme.
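A minimal 1D sketch of the M-step explicit scheme is given below. It assumes a periodic, unit-spaced grid (so that boundary conditions can be ignored) and the 1D analogue κ = D^2/(2M) of the relation between the diffusion coefficient and the Daley length-scale D; both the relation and the parameter values are assumptions made for illustration.

```python
import numpy as np

def explicit_diffusion_1d(psi, kappa, M):
    """Apply (1 + kappa d2/dx2)^M on a periodic grid with unit spacing."""
    if kappa > 0.5:
        raise ValueError("explicit scheme is unstable for kappa > 1/2")
    for _ in range(M):
        psi = psi + kappa * (np.roll(psi, 1) - 2.0 * psi + np.roll(psi, -1))
    return psi

n, daley, M = 401, 5.0, 100
kappa = daley**2 / (2.0 * M)          # assumed 1D analogue of Eq. (53)
impulse = np.zeros(n)
impulse[n // 2] = 1.0
kernel = explicit_diffusion_1d(impulse, kappa, M)

r = np.arange(n) - n // 2
corr = kernel / kernel[n // 2]        # peak-normalized response
gauss = np.exp(-r**2 / (2.0 * daley**2))
max_diff = np.max(np.abs(corr - gauss))
```

The second moment of `kernel` equals D^2 by construction, and `max_diff` gives a measure of how closely the M-step response follows the target Gaussian.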

Extending these results to the d-dimensional implicit case, we can define a set of anisotropic and homogeneous Matérn correlation operators, with ν = M − d/2, as solutions to the following linear system (cf. Eq. (34)):

equation image

where

equation image(54)

The associated correlation functions are given by

equation image(55)

with equation image defined by Eq. (48). As for the Gaussian, we can derive the following relationships between the Hessian tensor of equation image and diffusion tensor κ (see Appendix A):

equation image(56)

The solution described in section 2.4 corresponds to the isotropic case κ = L^2 I with L^2 = (Dwd)^2/(2M − d − 2).

3.2. Inhomogeneity and anisotropy

Analytical expressions for the correlation kernels of the anisotropic diffusion operators in equation image with spatially varying diffusion tensors κ(x) are not known in general. Paciorek and Schervish (2006) describe a family of anisotropic and inhomogeneous correlation functions that generalize the standard isotropic and homogeneous Gaussian and Matérn family. These correlation functions have the form

equation image(57)

for the Gaussian-like function, and

equation image(58)

for the Matérn-like functions (ν = M − d/2) where

equation image

and

equation image

A(x) and A(x′) denoting the (symmetric and positive definite) aspect tensors at points x and x′, respectively. Equations (57) and (58) with A(x) ≈ 2sκ(x) for the Gaussian-like function and A(x) ≈ κ(x) for the Matérn-like functions can be considered as the approximate kernels of the explicit and implicit forms of the anisotropic diffusion operator when the diffusion tensors κ(x) vary slowly and smoothly in space. This is illustrated in MW10 who provide examples in 1D comparing a two-step implicit-diffusion kernel and an inhomogeneous version of the SOAR function for different spatial distributions of the length-scale parameter.

4. Specifying the anisotropic tensor

The elements κxz, κzx, κyz and κzy of the 3D diffusion tensor account for anisotropy between the horizontal and vertical directions. The importance of these terms compared to the diagonal terms is related to the choice of vertical coordinate in the correlation model. In an ocean model, for example, a natural vertical coordinate is a hybrid coordinate involving a standard geopotential (z) coordinate in unstratified regions such as the mixed layer, an isopycnal (ρ) coordinate in strongly stratified regions, and a terrain-following (s) coordinate near the ocean bottom, the latter being particularly important in shallow coastal regions (Haidvogel and Beckmann, 1999). In this hybrid coordinate system, the flow is more naturally decoupled into ‘horizontal’ and ‘vertical’ processes. If the same coordinate system is adopted for a background-error correlation model then it is reasonable to assume, at least from a physical viewpoint, that the non-diagonal tensor elements κxz, κzx, κyz and κzy, and possibly κxy and κyx, are small and can be neglected. However, anisotropy in background-error correlations can also arise from the assimilation of data, especially when the data coverage is irregular. In general, the relative importance of the diagonal and non-diagonal terms of the tensor can only be determined after a thorough diagnostic study involving, for instance, the direct estimation of the elements of the Daley tensor.

Many ocean models used for global- and basin-scale circulation studies employ a z coordinate. WC01 illustrated how a standard isopycnal diffusion tensor used to parametrize mixing of unresolved processes in a z-coordinate ocean model could also be used to transform the coordinates of a background-error correlation model formulated as an explicit 3D diffusion operator. An analogous coordinate transformation was proposed within the framework of Optimal Interpolation by Balmaseda et al. (2008). While the isopycnal correlation model has appealing features, the implementation based on the explicit scheme proposed by WC01 is too expensive for routine applications since a prohibitively high number of iterations is required to maintain numerical stability in regions of strong isopycnal gradients. Moreover, the specification of the Daley length-scales must be performed in isopycnal space, which makes estimating them more difficult in a z-coordinate model. In the remainder of this section we explore alternative methods for defining anisotropic and inhomogeneous correlations, which involve estimating the Daley tensor directly in the model coordinate system.

4.1. Ensemble estimation methods

Given an estimate of the Daley tensor, the anisotropic response of the explicit diffusion operator can be calibrated using Eq. (53), which relates the Daley tensor of the Gaussian function to the diffusion tensor. Alternatively, the anisotropic response of the implicit diffusion operator can be calibrated using the third expression in Eq. (56), which relates the Daley tensors of the ν = M − d/2 Matérn functions to the diffusion tensor. Several authors have proposed methods for estimating the Daley tensor using perturbations from an ensemble of model states (Belo Pereira and Berre, 2006; Pannekoucke and Massart, 2008; Pannekoucke, 2009; Sato et al., 2009). The basic procedure is outlined below. Two of the methods will then be compared in idealized experiments using the diffusion equation. For simplicity, we focus on the 2D case.

Assume that an ensemble of Ne model states is available and that the distribution of these states about their mean is a good approximation of the true probability distribution function (pdf) of the model-state (background) error equation image. In variational assimilation, this pdf is assumed to be Gaussian and thus fully described by its mean E[equation image and covariance function

equation image(59)

where E[ · ] denotes the expectation operator. The associated correlation function C(x,x′) can be determined from the factorization

equation image(60)

where

equation image

is the standard deviation of equation image at x. Assuming that C(x,x′) is at least twice differentiable then we can define the symmetric tensor

equation image

where ∇ = (∂/∂x ∂/∂y)^T and ∇′ = (∂/∂x′ ∂/∂y′)^T. The local correlation Hessian tensor is the value of T at x = x′. Assume further that, in a neighbourhood of x, C(x,x′) can be well approximated by a homogeneous function equation image where r = x − x′ = (x − x′, y − y′)^T = (equation image, equation image)^T. Letting equation image, then we can define

equation image(61)

such that equation image (see Appendix B).

Let H(x) = T(x,x) denote the local correlation Hessian tensor at x. Pannekoucke and Massart (2008) and Pannekoucke (2009) assume a Gaussian form for the correlation function and then invert this function at each point x to estimate H(x) in terms of sample correlation estimates with neighbouring points. Belo Pereira and Berre (2006) propose an alternative method for estimating H(x), which does not require a prior assumption on the functional form of the correlations. Their method leads to the expression

equation image(62)

where

equation image

equation image

The gradient terms can be estimated numerically using finite differences. When the correlation function is strictly homogeneous, equation image is equivalent to the constant tensor equation image (Eq. (61)) if the sampling operator is replaced by the expectation operator (see Appendix B). Note that the derivation of Eq. (62) is based on the rather general assumptions of differentiability and local homogeneity of the correlation function. A specific assumption about the actual form of the correlation function is implied only when equation image is employed with a particular diffusion operator (explicit or M-step implicit scheme). Hristopulos (2002) and Chorti and Hristopulos (2008) describe a related approach in geostatistics which involves estimating the aspect ratios and orientation angle required to transform the local covariance Hessian tensor into isotropic form.
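The estimation procedure can be illustrated with a 1D analogue. The sketch below is not the experiment of this paper: it assumes that the 1D form of Eq. (62) is the sample variance of the gradient of the perturbations, minus the squared gradient of the sample standard deviation, all divided by the sample variance, and it compares this with the same expression without the standard-deviation term. The ensemble is drawn from a periodic Gaussian-correlated process with a prescribed Daley length-scale and a spatially varying standard deviation; all names and parameter values are illustrative.

```python
import numpy as np

def gaussian_ensemble(n, n_ens, daley, rng):
    """Unit-variance periodic fields with correlation exp(-r^2 / (2 daley^2))."""
    k = 2.0 * np.pi * np.fft.fftfreq(n)
    filt = np.exp(-(k * daley) ** 2 / 4.0)       # square root of the spectrum
    white = rng.standard_normal((n_ens, n))
    fields = np.fft.ifft(np.fft.fft(white, axis=1) * filt, axis=1).real
    return fields / np.sqrt(np.mean(filt ** 2))  # normalize to unit variance

def hessian_estimates(eps):
    """Return estimates (with, without) the standard-deviation gradient term."""
    n_ens = eps.shape[0]
    eps = eps - eps.mean(axis=0)                 # remove the ensemble mean
    var = (eps ** 2).sum(axis=0) / (n_ens - 1)
    sigma = np.sqrt(var)
    d_eps = np.roll(eps, -1, axis=1) - eps       # derivative at i + 1/2
    d_sigma = np.roll(sigma, -1) - sigma
    grad_var = (d_eps ** 2).sum(axis=0) / (n_ens - 1)
    var_half = 0.5 * (var + np.roll(var, -1))    # variance at i + 1/2
    return (grad_var - d_sigma ** 2) / var_half, grad_var / var_half

rng = np.random.default_rng(0)
n, n_ens, daley = 256, 200, 8.0
x = np.arange(n)
sigma_true = 3.0 + 2.0 * np.sin(2.0 * np.pi * x / 32.0)
eps = sigma_true * gaussian_ensemble(n, n_ens, daley, rng)
h_corrected, h_uncorrected = hessian_estimates(eps)
print(h_corrected.mean(), h_uncorrected.mean(), 1.0 / daley ** 2)
```

In this setting the corrected estimate averages close to the true value 1/D^2, whereas omitting the standard-deviation gradient term gives a positively biased estimate when the standard deviation varies on scales comparable to D.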

Multidimensional Gaussian correlation operators can be applied efficiently using a combination of 1D recursive filters or 1D diffusion operators (Purser et al., 2003a, 2003b; MW10). For anisotropic Gaussian operators, the so-called triad (hexad) algorithm (Purser et al., 2003b; Purser, 2005) allows one to determine from the aspect tensor of the 2D (3D) Gaussian function, the three (six) generalized grid-lines along which the 1D filters should be applied. Within that framework, various flow-dependent formulations of the aspect tensor have been proposed (De Pondeca et al., 2006; Liu et al., 2007, 2009; Sato et al., 2009). Of particular interest here is the hybrid formulation of Sato et al. (2009) where the inverse of the aspect tensor of the Gaussian function is defined as a linear combination of a ‘conventional term’ based on a quasi-isotropic, static formulation (equation image) and an ‘ensemble term’ formed from the sample covariance of the gradient of the ensemble-generated perturbations, normalized by the sample variance of the perturbations:

equation image(63)

where

equation image(64)

and α and β are weighting coefficients. Sato et al. (2009) provide a heuristic derivation of Eqs (63)–(64). They are equivalent to Eq. (62) when α = 0 and β = 1, and when the standard deviations are constant. As with the hybrid covariance formulations that involve combining static and ensemble-based expressions of the full covariance matrix (Wang et al., 2008), the static term in the aspect tensor is intended to give the estimate more robustness especially when the ensemble size is small, although accounting for it requires extra parameters that must be tuned empirically.

Finally, with small ensemble sizes it can be advantageous to apply a local spatial averaging or filtering operator to the estimated variances and covariances in order to reduce the effects of sampling error (see, for example, Berre and Desroziers (2010) for a thorough review of recent work in this area). Letting F denote a particular filtering operator then the expressions for the filtered estimate of the inverse tensor are given by Eqs (62) and (63)–(64) with

equation image

equation image

4.2. Numerical experiments

In this section we perform idealized experiments to evaluate and compare the effectiveness of Eq. (62) and Eqs (63)–(64) for estimating the parameters of an anisotropic tensor. For simplicity, we focus on the 2D anisotropic diffusion problem and the solution algorithm based on the explicit scheme. Furthermore, for the tensor estimated using Eqs (63)–(64), only the special case α = 0 and β = 1 is considered.

The experimental design is as follows. First, we define the ‘true’ covariance matrix of the problem as

equation image

where L is the M-step explicit diffusion operator (1 + ∇ · κ∇)^M discretized using a standard centred finite-difference scheme on a uniform grid, Γ = Γ^(1/2) Γ^(1/2) is a diagonal matrix of normalization factors, and Σ is a diagonal matrix of standard deviations σ. With constant parameters and ignoring the influence of boundaries, B defines a Gaussian covariance matrix.

Next, a sample of Ne spatially uncorrelated random vectors equation image, l = 1,…,Ne, are produced on the grid, where the distribution of each equation image is taken to be Gaussian with equation image and equation image. Each vector equation image is then transformed into a new vector equation image such that equation image. This is done using the ‘square-root’ of the B-operator,

equation image

where L = L^(1/2) L^(1/2), the exponent 1/2 implying M/2 iterations of the explicit diffusion operator (with M taken to be even). The sample covariance matrix constructed from the ensemble of equation image vectors provides an estimate of the true covariance matrix:

equation image(65)

where equation image.
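The sampling procedure can be sketched in 1D with explicit matrices. The example below assumes a symmetric discretization of 1 + κ ∂²/∂x² with no-flux boundaries, takes the normalization factors as the inverse of the diagonal of L (so that the implied correlations have unit amplitude), and uses a simple 1/Ne sample covariance without mean removal; the exact form of Eq. (65) may differ. All names and parameter values are illustrative.

```python
import numpy as np

def explicit_step_matrix(n, kappa):
    """Matrix of 1 + kappa d2/dx2 with no-flux boundaries on a unit grid."""
    A = np.eye(n)
    for i in range(n - 1):
        # Exchange between cells i and i + 1; keeps the matrix symmetric.
        A[i, i] -= kappa
        A[i + 1, i + 1] -= kappa
        A[i, i + 1] += kappa
        A[i + 1, i] += kappa
    return A

n, M, kappa = 60, 10, 0.4
sigma = 1.0 + 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)

L_half = np.linalg.matrix_power(explicit_step_matrix(n, kappa), M // 2)
L_full = L_half @ L_half                       # M explicit steps
gamma = 1.0 / np.diag(L_full)                  # normalization factors
B_half = (sigma * np.sqrt(gamma))[:, None] * L_half
B = B_half @ B_half.T                          # 'true' covariance matrix

rng = np.random.default_rng(1)
n_ens = 100
w = rng.standard_normal((n, n_ens))            # uncorrelated, unit variance
eps = B_half @ w                               # correlated perturbations
B_sample = eps @ eps.T / n_ens                 # assumed form of the sample estimate
```

Because L^(1/2) is symmetric in this discretization, B_half B_half^T reproduces Σ Γ^(1/2) L Γ^(1/2) Σ exactly, and the diagonal of B equals the prescribed variances.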

Our interest here is not to try to reconstruct the full covariance matrix from (65) but rather, with the help of Eq. (62) or Eq. (64), to try to reconstruct the anisotropic tensor used in L to generate the equation image. Indeed, the sampling errors resulting from estimating the local anisotropic tensor can be expected to be much smaller than those resulting from estimating the full covariance matrix. For the 2D problem, the tensor estimation requires, at each grid point, sample estimates of the standard deviation of equation image and of the three independent tensor elements involving the gradient of equation image, i.e. a total of 4N elements where N is the total number of grid points. This is much smaller than the (N^2 + N)/2 independent elements required to determine the full covariance matrix.
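To give a sense of scale, the two counts can be evaluated for the grid size quoted for the experiments in this section (N = 200 × 60). The variable names in this short sketch are illustrative.

```python
N = 200 * 60                      # total number of grid points
local_params = 4 * N              # standard deviation + three tensor elements per point
full_cov = (N**2 + N) // 2        # independent elements of a symmetric N x N matrix
ratio = full_cov / local_params   # how many times larger the full problem is
print(local_params, full_cov, ratio)
```

For this grid the local tensor problem involves 48 000 parameters, against roughly 7.2 × 10^7 for the full covariance matrix.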

The numerical experiments are performed in a rectangular domain on a 2D grid xi,j = (xi,yj) where xi = iΔx, equation image, and yj = jΔy, equation image. Here, Δx = Δy = 1 unit and N = 200 × 60, and thus the effective size of the B matrix is (1.2 × 10^4) × (1.2 × 10^4). Neumann boundary conditions are employed at the solid walls located at the domain edges. As a result, the implied correlation function near the boundary is slightly modified from the target Gaussian (MW10).

At each grid point xi,j, the ‘true’ diffusion tensor κ for L is defined according to (cf. Eq. (53))

equation image(66)

where Di,j is the local Daley tensor which is formulated as

equation image

equation image being a diagonal matrix and Ri,j a rotation matrix (equation image). The elements (Dxx)i,j, (Dyy)i,j and (Dxy)i,j = (Dyx)i,j of Di,j are thus determined by the diagonal elements equation image and equation image of equation image and by the rotation angle θi,j of Ri,j. For the experiments described here, θi,j = θ is constant, while the parameters equation image and equation image are specified as a simple oscillatory function of the spatial coordinates xi,j:

equation image

where

equation image

equation image, equation image, and X = Y = 20. Similarly, the variances are specified as

equation image

where equation image and equation image. Experiments are performed with different values of the parameters θ, equation image, equation image, equation image and equation image (see Table 2).
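The construction of the local Daley tensor from principal length-scales and a rotation angle can be sketched as follows. The example assumes that the diagonal matrix holds the squares of the principal Daley length-scales and that the explicit-scheme diffusion tensor is obtained as κ = D/(2M); both are assumptions about the form of the equations referred to above, and the function names and the value of M are hypothetical.

```python
import numpy as np

def daley_tensor(d1, d2, theta):
    """D = R diag(d1^2, d2^2) R^T for principal length-scales d1 and d2."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])            # rotation by angle theta
    return R @ np.diag([d1**2, d2**2]) @ R.T

def diffusion_tensor(D, M):
    """kappa = D / (2M): assumed relation for the M-step explicit scheme."""
    return D / (2.0 * M)

D = daley_tensor(6.0, 3.0, np.pi / 4)
kappa = diffusion_tensor(D, M=10)
H = np.linalg.inv(D)                           # corresponding Hessian tensor
```

For θ = π/4 the diagonal elements of D are equal and the off-diagonal element is half the difference of the squared principal scales, so the anisotropy appears entirely in Dxy.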

Table 2. Bias and RMSE of the estimates of the correlation Hessian tensor elements (Hxx,Hyy,Hxy) using expressions (62) (equation image) and (64) (equation image). The second column lists the parameter settings in the ‘true’ covariance model and the third column indicates the choice of ensemble size (Ne) and spatial filtering scale (Navg) used in the estimation process. The RMS of the true values of (Hxx,Hyy,Hxy) are (5.3,5.3,0) × 10^−2 when θ = 0, and (5.0,5.0,1.7) × 10^−2 when θ = π/4.

Exp | (θ, Dmin, Dmax, σmin, σmax) | (Ne, Navg) | Method | Bias ×10^−2 (Hxx, Hyy, Hxy) | RMSE ×10^−2 (Hxx, Hyy, Hxy)
--- | --- | --- | --- | --- | ---
1 | (0, 3, 6, 1, 1) | (100, 0) | equation image | (0.06, 0.20, 0.0) | (1.2, 1.6, 0.46)
2 | (0, 3, 6, 1, 1) | (100, 0) | equation image | (−0.01, 0.15, 0.0) | (1.2, 1.6, 0.47)
3 | (0, 3, 6, 1, 1) | (10, 0) | equation image | (0.33, 0.39, 0.07) | (4.6, 4.8, 2.1)
4 | (0, 3, 6, 1, 1) | (10, 0) | equation image | (0.92, 0.99, 0.10) | (4.9, 5.2, 2.2)
5 | (0, 3, 6, 1, 5) | (100, 0) | equation image | (−0.07, −0.22, 0.0) | (1.2, 1.6, 0.47)
6 | (0, 3, 6, 1, 5) | (100, 0) | equation image | (1.2, 1.1, −0.01) | (2.5, 2.8, 1.2)
7 | (π/4, 3, 6, 1, 5) | (100, 0) | equation image | (−0.22, −0.40, 0.01) | (1.4, 1.7, 0.85)
8 | (π/4, 3, 6, 1, 5) | (100, 0) | equation image | (1.1, 0.88, 0.0) | (2.6, 2.7, 1.4)
9 | (π/4, 3, 6, 1, 5) | (10, 0) | equation image | (0.17, 0.10, −0.02) | (4.0, 4.5, 2.3)
10 | (π/4, 3, 6, 1, 5) | (10, 1) | equation image | (0.33, 0.28, 1.9) | (3.3, 3.6, 1.9)
11 | (π/4, 3, 6, 1, 5) | (10, 3) | equation image | (0.51, 0.45, −0.36) | (2.0, 2.2, 2.8)
12 | (π/4, 3, 6, 1, 5) | (100, 1) | equation image | (0.02, 0.14, −0.07) | (1.2, 1.4, 0.86)
13 | (π/4, 3, 6, 1, 5) | (100, 3) | equation image | (0.31, 0.19, −0.32) | (0.98, 1.0, 1.1)

The normalization factors γi,j of the diagonal matrix Γ are approximately given by the expression

equation image(67)

This approximation was used by Pannekoucke and Massart (2008), for example, and is reasonable if the diffusion tensor varies in space on a scale much larger than the local correlation scale and with a proper treatment of the boundary conditions (MW10). The factors can be estimated to a higher accuracy using more refined analytical approximations (Purser et al., 2003b; Purser, 2008a, 2008b; MW10; Yaremchuk and Carrier, 2012) or randomization methods (WC01; Yaremchuk and Carrier, 2012). They can also be computed exactly using the δ-function method (WC01; MW10). In this idealized study, we employ the exact normalization method in order to avoid introducing a bias in the ensemble perturbations and thus complicating the interpretation of the results. In practice, however, the exact computation is generally not affordable and hence the representation of covariances using the diffusion equation will also be affected by approximations in the normalization factors. The errors that can result from using the approximate expression (67) are illustrated in the experiments below.

The estimation of the tensor via the statistical relationships (62) and (64) is achieved using centred finite differences. Estimates of the first derivatives of the error and its standard deviation produce values at the interface of the grid cells, i.e. at the half-integer points (i + 1/2,j) for the x-component and (i,j + 1/2) for the y-component. The sample variance of these quantities is computed directly at these points to evaluate the numerator in the expressions for the diagonal elements of the tensor. The off-diagonal elements involve estimates of the cross-product of the x- and y-components of the derivatives. This requires interpolation of one of the component derivatives to the point where the other component derivative is defined. To estimate the cross-product at (i + 1/2,j), the x-component derivative that is defined there is multiplied with an estimate of the y-component derivative obtained by averaging its values from the four surrounding points (i,j + 1/2), (i + 1,j + 1/2), (i,j − 1/2) and (i + 1,j − 1/2), and vice-versa for estimating the cross-product at (i,j + 1/2). To compute the denominator in the expressions for the tensor elements, the sample variance of the error is interpolated from (i,j) points to (i + 1/2,j) or (i,j + 1/2) points. In order to use the estimated tensor elements in the diffusion equation, the elements are first averaged to the (i,j) points and then the off-diagonal elements averaged to force symmetry. The estimated tensor is then inverted at each point and used with the relation (66) to define the diffusion tensor at each point. Finally, interpolation is used to define the values of the tensor elements at the half-integer points (i + 1/2,j) or (i,j + 1/2) where they are required with the centred-difference formulation of ∇ · κ∇.

Table 2 summarizes the results from several experiments with different parameter settings equation image. Three cases are considered. In the first case, the principal axes of the anisotropic correlations are aligned with the grid-lines, and the variance is constant: equation image. The second case extends the first case by allowing the variances to vary in space: equation image. Finally, the third case extends the second case by rotating the principal axes of the anisotropic correlations relative to the grid-lines: equation image. The quality of the estimation is measured in terms of the domain-averaged bias and root-mean-square error (RMSE) of the estimates of the elements (Hxx,Hyy,Hxy). (The results for the estimates of Hyx are not given since they are almost identical to those of Hxy). For reference, the domain-averaged RMS of the true values of (Hxx,Hyy,Hxy) are (5.3,5.3,0) × 10−2 when θ = 0, and (5.0,5.0,1.7) × 10−2 when θ = π/4.

With equation image (Eq. (62)) and equation image (Eq. (64)) produce similar results with a relatively large ensemble (Ne = 100) as one might expect since the true variances are constant (Exps 1–2 in Table 2). Interestingly, however, equation image is noticeably more accurate than equation image with a small ensemble size (Ne = 10; Exps 3–4). When the variances are spatially varying (equation image and equation image), the errors for equation image become significantly larger, whereas those for equation image are similar to the constant variance case (Exps 5–8). This illustrates the importance of the second term in Eq. (62).

Local spatial filtering is beneficial for reducing the RMSE especially when the ensemble size is small (Ne = 10; Exps 9–11). With the raw ensemble estimates (Navg = 0), the RMSE is comparable to the RMS of the true signal (Exps 3–4, 9). Here, a very simple filtering procedure has been used in which the estimate at xi,j is obtained by averaging estimates at points within Navg grid points of (i,j) where in the examples considered Navg = 1 or 3. This increases the size of the averaging sample at each point to Neff = (2Navg + 1)^2 × Ne, except near the boundary where fewer points are used in the averaging process. While increasing the value of Navg reduces the RMSE, it does so at the expense of increasing the bias in the estimates. With a larger ensemble (Ne = 100), good results are obtained when a ‘light’ filtering is applied (Navg = 1), with both the bias and RMSE being reduced relative to the no-filtering case for all but the off-diagonal elements, which are very slightly degraded (Exps 7, 12–13). The filter in this example is very simple and the choice of filtering scale is somewhat ad hoc. More sophisticated (objective) filters could be expected to perform better as discussed by Raynaud et al. (2009) and Berre and Desroziers (2010), and recently by Raynaud and Pannekoucke (2012) within the context of filters based on diffusion.
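A simple averaging filter of the kind described above might be sketched as follows. The implementation is an assumption about the details: it averages over the square of points within Navg grid points in each direction and, near the boundary, uses only the points that lie inside the domain. The function name `box_average` is hypothetical.

```python
import numpy as np

def box_average(field, navg):
    """Average each point with its neighbours within navg grid points.

    Returns the filtered field and the number of points used at each location.
    """
    ny, nx = field.shape
    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    for dj in range(-navg, navg + 1):
        for di in range(-navg, navg + 1):
            # Destination points that have an in-domain neighbour at (dj, di).
            j0, j1 = max(0, -dj), min(ny, ny - dj)
            i0, i1 = max(0, -di), min(nx, nx - di)
            total[j0:j1, i0:i1] += field[j0 + dj:j1 + dj, i0 + di:i1 + di]
            count[j0:j1, i0:i1] += 1
    return total / count, count
```

In the interior, `count` equals (2Navg + 1)^2, so that with Ne members the effective sample size at each point is `count * Ne`; at a corner only (Navg + 1)^2 points contribute.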

The correlations obtained using the ‘true’ tensor with the parameter settings equation image are illustrated at selected points in Figure 5(a). Figure 5(b) shows the corresponding correlations estimated directly from the sample covariance matrix (Eq. (65)) with a 100-member ensemble. Sampling errors are large and manifest themselves as spurious non-local correlations. In contrast, the diffusion-based correlation model is localized by construction. The correlations resulting from estimating the diffusion tensor from the 100-member ensemble are shown in Figure 5(c). The estimated correlations are in good agreement with the true correlations and notably capture prominent anisotropic features such as the rotation of the principal axes relative to the grid lines. The third correlation pattern from the left boundary is computed with respect to a point that is located midway between maximum and minimum values of Dxx, Dyy and σ^2 where the spatial derivative of these parameters is maximum and thus where one would expect the local homogeneous assumption to be least valid. At this location the estimated errors are largest, reaching up to 20% (Figure 5(d)). The breakdown of homogeneity also affects the accuracy of the approximate expression (67) for the normalization factors. This can be seen in Figure 5 parts (e) and (f), which show the estimated correlations and associated error when approximate normalization factors from Eq. (67) are used in place of the exact factors that were used to produce Figure 5(c). The amplitude of the error now reaches 50% for the third correlation pattern (the colour bar is truncated at 30%) and is noticeably larger for the other correlation patterns as well. Finally, Figure 5 parts (g) and (h) show the correlations and associated errors obtained using the tensor estimated with Ne = 10 combined with local spatial averaging. While the correlations are not as accurate as those with Ne = 100, they are still reasonable approximations. The maximum error for the third correlation pattern is approximately 25% and reaches 36% when the approximate normalization factors are used (not shown).
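The size of the spurious correlations that arise when correlations are estimated directly from a sample can be illustrated with a small Monte Carlo sketch. It is not the experiment of the paper: it assumes two truly uncorrelated, Gaussian-distributed errors, for which the sample correlation computed from Ne members is known to have a standard deviation of 1/√(Ne − 1). The function names and the number of trials are illustrative choices.

```python
import math
import random


def sample_correlation(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def rms_spurious_correlation(n_ens, n_trials, rng):
    """RMS sample correlation between two independent Gaussian variables."""
    total = 0.0
    for _ in range(n_trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_ens)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_ens)]
        total += sample_correlation(a, b) ** 2
    return math.sqrt(total / n_trials)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
rms_100 = rms_spurious_correlation(100, 1000, rng)  # theory: 1/sqrt(99)
rms_10 = rms_spurious_correlation(10, 1000, rng)    # theory: 1/sqrt(9)
```

Under these assumptions, a 100-member ensemble still yields spurious correlations of order 0.1 between points whose true correlation is zero, which is consistent with the noisy non-local structure described for Figure 5(b).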

Figure 5.

(a) The ‘true’ correlations at selected points in the domain. (b) The correlations estimated directly from the sample covariance matrix (Eq. (65)) with 100 ensemble members. The correlations produced using the diffusion equation with the diffusion tensor estimated with (c) 100 members, no local spatial averaging, and exact normalization; (e) 100 members, no local spatial averaging, and normalization factors approximated using Eq. (67); and (g) 10 members, local spatial averaging with a 7 × 7 grid-point window, and exact normalization. The differences between the correlations obtained using the estimated diffusion tensor and the true correlations are illustrated in panels (d), (f) and (h).

5. Summary and conclusions

Accounting for general background-error correlations effectively and efficiently is a considerable challenge in geophysical data assimilation. In VDA, general background-error correlation models can be defined using differential operators constructed numerically from the explicit or implicit solution of a diffusion equation. Theoretical results underpinning the diffusion approach to correlation modelling were reviewed in this paper. First, the isotropic, constant-coefficient diffusion problem was considered both on the sphere and in the d-dimensional Euclidean space equation image. The covariance functions (kernels) of the integral solution operators implied by explicit and implicit diffusion in these spaces were identified. The solutions on the sphere were shown to be well approximated by the solutions on equation image for scales of interest in ocean and meteorological data assimilation. Expressions relating the diffusion model parameters to the parameters that control the length-scale and amplitude (normalization factor) of the covariance function were also given. These results provided the basis for constructing more general correlation operators via anisotropic diffusion, which was the focus of the second part of the paper.

Anisotropic diffusion was considered in equation image. The anisotropic diffusion problem is characterized by a diffusion tensor that controls the direction of the covariance response, as well as its scale and amplitude. Solutions to the anisotropic, constant-tensor diffusion problem are integral operators that involve covariance kernels with the same basic form as those of the isotropic, constant-coefficient problem. With the explicit scheme, these functions are approximately Gaussian, whereas with the implicit scheme they are members of the larger Matérn family (e.g., in equation image they are AR functions). For the anisotropic functions, distance is defined by a norm whose metric is given by the inverse of the diffusion tensor. This metric can in turn be related to the correlation Hessian tensor which is defined by the tensor of second-derivatives of the correlation function evaluated at zero separation. The importance of this tensor is that it can be related to quantities that can be estimated directly from ensemble statistics. The inverse of the correlation Hessian tensor was referred to as the Daley tensor in this paper in view of its close connection to the conventional Daley length-scale in the isotropic case.
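The link between the metric of an anisotropic function, its correlation Hessian tensor and the Daley tensor can be checked numerically in the Gaussian case. The sketch below makes several assumptions that are not taken from the text above: the Gaussian is written as c(r) = exp(−r^T κ^−1 r / 2) with a symmetric positive-definite tensor κ (called `scale_tensor` here, a hypothetical name), the correlation Hessian tensor is taken with the sign convention H = −∇∇^T c at r = 0, and second derivatives are approximated by central differences. Tensors are stored as tuples (xx, yy, xy).

```python
import math


def rotated_tensor(l1, l2, theta):
    """Symmetric 2x2 tensor (xx, yy, xy) with principal scales l1, l2 rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = l1 ** 2, l2 ** 2
    return (c * c * a + s * s * b, s * s * a + c * c * b, c * s * (a - b))


def inverse_sym2(t):
    """Inverse of a symmetric 2x2 tensor stored as (xx, yy, xy)."""
    xx, yy, xy = t
    det = xx * yy - xy * xy
    return (yy / det, xx / det, -xy / det)


def gaussian_correlation(x, y, scale_tensor):
    """Anisotropic Gaussian whose distance metric is the inverse of scale_tensor."""
    ixx, iyy, ixy = inverse_sym2(scale_tensor)
    return math.exp(-0.5 * (ixx * x * x + 2.0 * ixy * x * y + iyy * y * y))


def hessian_tensor_at_zero(corr, h=1e-3):
    """H = -(second derivatives of corr) at zero separation, by central differences."""
    c0 = corr(0.0, 0.0)
    cxx = (corr(h, 0.0) - 2.0 * c0 + corr(-h, 0.0)) / h ** 2
    cyy = (corr(0.0, h) - 2.0 * c0 + corr(0.0, -h)) / h ** 2
    cxy = (corr(h, h) - corr(h, -h) - corr(-h, h) + corr(-h, -h)) / (4.0 * h ** 2)
    return (-cxx, -cyy, -cxy)


# Principal axes rotated by pi/4 relative to the grid-lines.
kappa = rotated_tensor(2.0, 1.0, math.pi / 4.0)
hessian = hessian_tensor_at_zero(lambda x, y: gaussian_correlation(x, y, kappa))
daley = inverse_sym2(hessian)  # inverse of the Hessian tensor; should recover kappa
```

For this Gaussian the inverse of the Hessian tensor reproduces κ to within the finite-difference error, including the off-diagonal element that encodes the rotation of the principal axes.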

Ensemble data assimilation methods can be used to provide flow-dependent estimates of the background-error covariances. In realistic applications, the number of independent background-error covariances that need to be estimated is huge and the number of ensemble members that can be affordably run is very limited. Methods are then required to synthesize the ensemble-covariance information to avoid manipulating huge covariance matrices, on the one hand, and to reduce the effects of sampling error, on the other.

The correlation information in the ensemble can be synthesized using a diffusion model with an anisotropic and spatially varying tensor. Procedures for estimating the local Hessian tensor (which in turn can be related to the diffusion tensor) from ensemble perturbations were described and compared in idealized numerical experiments. The method of Belo Pereira and Berre (2006), which assumes local homogeneity of the correlation function but accounts for spatially varying variances, was shown to work particularly well, and is well suited for the automated computations required in a cycled ensemble data assimilation system. Local spatial filtering of the tensor was critical with small ensemble sizes (order 10), but the raw ensemble with 100 members gave good results without spatial filtering in our example. In general, a carefully designed objective filter would be beneficial in order to maximize the signal-to-noise ratio of the ensemble estimates of the tensor elements, much as it has been shown to benefit the ensemble estimation of background-error variances (Raynaud et al., 2009; Berre and Desroziers, 2010).

In realistic applications, the numerical stability condition associated with explicit diffusion schemes can severely limit their computational efficiency. In particular, many iterations are likely to be needed with general anisotropic and inhomogeneous diffusion models that employ ensemble-estimated tensors. Implicit diffusion schemes are more robust but require solving a large linear system, for which efficient methods that are well suited to massively parallel machines are required. This important practical aspect of the problem was not addressed in this paper and should be the subject of further research.
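To give a feel for the iteration count implied by the stability condition, the following sketch considers the simplest possible setting, which is an assumption made here for illustration only: 1D constant-coefficient diffusion on a uniform periodic grid, integrated with a forward-Euler, centred-difference scheme. For that scheme the stability condition is r = κΔt/Δx^2 ≤ 1/2, and diffusing over a total pseudo-time T produces a kernel of variance 2κT. Requiring 2κT = L^2 for a target length-scale L then implies at least (L/Δx)^2 steps. The function names are hypothetical, and the expressions given for the explicit scheme in the main text take precedence over this simplified analysis.

```python
import math


def min_explicit_steps(length_scale, dx):
    """Smallest step count satisfying r <= 1/2 for the 1D forward-Euler scheme."""
    return math.ceil((length_scale / dx) ** 2)


def explicit_diffusion(u, r, n_steps):
    """Apply n_steps of u_i <- u_i + r*(u_{i+1} - 2*u_i + u_{i-1}) on a periodic grid."""
    if r > 0.5:
        raise ValueError("unstable: r = kappa*dt/dx**2 must not exceed 1/2")
    n = len(u)
    for _ in range(n_steps):
        u = [u[i] + r * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) for i in range(n)]
    return u


def diffused_variance(length_scale, dx, n_steps, n_points=201):
    """Second moment of the kernel obtained by diffusing a unit spike."""
    r = length_scale ** 2 / (2.0 * n_steps * dx ** 2)  # from 2*kappa*T = L**2
    mid = n_points // 2
    u = [0.0] * n_points
    u[mid] = 1.0
    u = explicit_diffusion(u, r, n_steps)
    return sum(u[i] * ((i - mid) * dx) ** 2 for i in range(n_points))


steps = min_explicit_steps(5.0, 1.0)              # length-scale of 5 grid points
variance = diffused_variance(5.0, 1.0, 2 * steps)  # twice the minimum, so r = 1/4
```

In this 1D setting the number of iterations grows with the square of the length-scale measured in grid points, so a tensor with locally large scales or fine grid spacing dictates the cost for the whole domain.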

Acknowledgements

Financial support from the French National Research Agency (ANR) COSINUS programme (VODA project, no. ANR-08-COSI-016), the RTRA STAE foundation (ADTAO project), the European Framework Programme 7 (COMBINE project, GA 226520), and the French LEFE-ASSIM programme is gratefully acknowledged. This work benefited from discussions with Loik Berre, Serge Gratton, Sébastien Massart, Olivier Pannekoucke and Andrea Piacentini. The anonymous reviewers and the Associate Editor Martin Leutbecher provided many helpful remarks for improving the presentation of the paper.

Appendix A

The Daley tensor of the 2D implicit-diffusion kernels

In this appendix we show that the expression for the Daley tensor of the 2D implicit-diffusion kernels equation image (Eq. (55) with d = 2) is related to the diffusion tensor by Eq. (56). For clarity of notation the subscript and superscript of equation image will be dropped hereafter. The relationships between the Daley and diffusion tensors for the implicit-diffusion kernels in higher dimensions and for the Gaussian function (Eqs (49)–(51)) are straightforward to verify following the basic procedure outlined here.

From the chain rule, the three independent elements of the outer product in the first equation of (56) can be written as

equation image(68)

Expressing Eq. (55) with d = 2 as

equation image(69)

where αM = 2^(2−M)/(M−2)! and M > 2, and using the following recurrence relation for the modified Bessel functions of the second kind of integer order n (Eq. 9.6.26 of Abramowitz and Stegun, 1970),

equation image

where Kn = Kn(equation image), allows us to write

equation image(70)

The inverse of the symmetric diffusion tensor (46) can be written as

equation image

where τ = 1/(μ − 1) and equation image. In expanded form the nondimensional distance measure (48) then reads

equation image(71)

From Eq. (71) we can derive the following relations

equation image(72)

where

equation image

Substituting Eqs (70) and (72) in Eq. (68) yields

equation image

Since c(0) = 1, for all allowable M, we have from Eq. (69) the general relation

equation image

Hence

equation image(73)
equation image(74)

The first term on the right-hand side of each of the above equations vanishes since X = Y = 0 at equation image = 0, while the common coefficient of the second term in each equation is

equation image

Thus we obtain the relationship governed by (56) with d = 2.

In the isotropic case, κxx = κyy = L^2 and τκxy^−1 = 0. Equations (73) and (74) can then be averaged and inverted to yield the standard definition of the square of the 2D Daley length-scale involving the Laplacian operator (Eq. (44) with d = 2).

Appendix B

Estimating the Hessian tensor from an ensemble of simulated errors

In this appendix we provide a derivation of Eq. (62) for the special case of a homogeneous correlation function. The derivation is similar to the one given in the Appendix of Belo Pereira and Berre (2006) except for notational changes and greater emphasis here on some of the underlying assumptions.

The starting point is the general expression (59) for the covariance function B(x,x′) of the ensemble of model-state errors. We consider here the 2D case where x = (x,y)^T and x′ = (x′,y′)^T, and assume that B(x,x′) is at least twice differentiable. We can express the covariance function of the derivatives of the ensemble errors as follows (Swerling, 1962; Daley, 1991, p. 156):

equation image

Using Eq. (60) the derivatives on the right-hand side of the above equations can be evaluated in terms of the standard deviations σ(x) and correlation function C(x,x′). Focusing on the first of these equations this yields

equation image

Under the assumption of homogeneous correlations, we can write C(x,x′) = c(r) where r = x − x′ = (x − x′, y − y′)^T (Gaspari and Cohn, 1999). Using the chain rule, the derivatives of C with respect to x, x′, y and y′ can be rewritten in terms of derivatives of c with respect to equation image and equation image. For equation image this gives

equation image

Evaluating the above equation at x = x′ (r = 0), and noting that equation image since c(r) is maximum at r = 0 and that c(0) = 1, yields

equation image

which can be rearranged as

equation image(75)

where σ = σ(x). A similar analysis for the other covariance functions yields

equation image(76)
equation image(77)
equation image(78)

In tensor notation, the left-hand side of Eqs (75)–(78) is equivalent to Eq. (61) evaluated at r = 0, while the right-hand side of the equations can be identified with the right-hand side of Eq. (62).
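The estimator derived in this appendix can be exercised on a 1D analogue. The sketch below rests on assumptions that should be read as such: the xx element is taken to have the form Hxx = [E((∂ε/∂x)^2) − (∂σ/∂x)^2]/σ^2, which is what the derivation above yields at r = 0; the errors are assumed to have zero mean; the synthetic errors are built as ε(x) = σ(x)g(x), where g is a random-phase sum of a few Fourier modes on a periodic domain (an illustrative construction, not the experiment of Section 4), so that the true value is the weighted sum of squared wavenumbers; and derivatives are approximated by centred differences. All function names are hypothetical.

```python
import math
import random


def simulate_ensemble(n_ens, n_grid, sigma, weights, rng):
    """Zero-mean fields eps = sigma(x)*g(x) on a periodic grid over [0, 2*pi)."""
    dx = 2.0 * math.pi / n_grid
    ensemble = []
    for _ in range(n_ens):
        modes = [(math.sqrt(w), k, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
                 for k, w in weights.items()]
        field = []
        for i in range(n_grid):
            x = i * dx
            g = sum(a * (xi * math.cos(k * x) + eta * math.sin(k * x))
                    for a, k, xi, eta in modes)
            field.append(sigma(x) * g)
        ensemble.append(field)
    return ensemble


def estimate_hessian(ensemble, dx, subtract_sigma_gradient=True):
    """Point-wise estimate of H; optionally drop the (d sigma/dx)^2 term."""
    n_ens, n = len(ensemble), len(ensemble[0])
    var = [sum(m[i] ** 2 for m in ensemble) / n_ens for i in range(n)]
    sig = [math.sqrt(v) for v in var]
    h = []
    for i in range(n):
        ip, im = (i + 1) % n, (i - 1) % n
        d_eps_sq = sum(((m[ip] - m[im]) / (2.0 * dx)) ** 2 for m in ensemble) / n_ens
        d_sig = (sig[ip] - sig[im]) / (2.0 * dx)
        second = d_sig ** 2 if subtract_sigma_gradient else 0.0
        h.append((d_eps_sq - second) / var[i])
    return h


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = {1: 0.5, 2: 0.3, 3: 0.2}  # squared mode amplitudes, summing to one
h_true = sum(w * k ** 2 for k, w in weights.items())
n_grid = 128
dx = 2.0 * math.pi / n_grid
ens = simulate_ensemble(400, n_grid, lambda x: 1.0 + 0.5 * math.sin(3.0 * x), weights, rng)
mean_full = sum(estimate_hessian(ens, dx)) / n_grid
mean_no_term = sum(estimate_hessian(ens, dx, subtract_sigma_gradient=False)) / n_grid
```

In this construction, dropping the gradient-of-σ term inflates the estimate wherever the standard deviation varies rapidly, whereas the full expression remains close to the true value, in line with the behaviour reported for spatially varying variances in Section 4.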

  • * Isotropic correlation functions on equation image will be written explicitly as a function of r(θ) whenever the context requires an interpretation in terms of chordal distance.

  • It is common in VDA to use multivariate balance operators to transform the background-error variables into a set of new variables whose cross-covariances can be effectively neglected. The background-error covariances of the transformed variables can then be treated as univariate functions and represented via a diffusion model.
