Differential operators derived from the explicit or implicit solution of a diffusion equation are widely used for modelling background-error correlations in geophysical applications of variational data assimilation. Key theoretical results underpinning the diffusion method are reviewed. Solutions to the isotropic diffusion problem on both the spherical space 𝒮² and the d-dimensional Euclidean space ℝᵈ are considered first. In ℝᵈ, the correlation functions implied by explicit diffusion are approximately Gaussian, whereas those implied by implicit diffusion belong to the larger class of Matérn functions, which contains the Gaussian function as a limiting case. The Daley length-scale, defined as D = [−d/(∇²c)|_{r=0}]^{1/2}, where ∇² is the d-dimensional Laplacian operator and r = |r| is Euclidean distance, is used as a standard parameter for comparing the different isotropic functions c(r). Diffusion on 𝒮² is shown to be well approximated by diffusion on ℝ² for length-scales of interest. As a result, fundamental parameters that define the correlation model on 𝒮² can be specified using more convenient expressions available on ℝ².
Various methods have been proposed for modelling background-error correlations in geophysical applications of variational data assimilation (VDA); see, for example, Bannister (2008) for a thorough review of the methods used in atmospheric VDA. In ocean VDA, background-error correlation models based on the diffusion equation are popular. The method has its origins in the work of Derber and Rosati (1989), who proposed the use of an iterative Laplacian grid-point filter to approximate a Gaussian correlation operator. Egbert et al. (1994) described a close variant of the algorithm in which the Laplacian grid-point filter could be interpreted as a pseudo-time-step integration of a diffusion equation with an explicit scheme. Weaver and Courtier (2001) (hereafter WC01) described the algorithm in more detail and proposed various extensions to account for more general correlation functions than the quasi-Gaussian function of the original Derber and Rosati (1989) algorithm. Correlation models based on explicit diffusion methods have been used in various VDA systems in oceanography (Weaver et al., 2003; Di Lorenzo et al., 2007; Muccino et al., 2008; Daget et al., 2009; Kurapov et al., 2009; Moore et al., 2011), meteorology (Bennett et al., 1996) and atmospheric chemistry (Geer et al., 2006; Elbern et al., 2010).
An explicit diffusion scheme is appealing because of its simplicity, but it can be expensive if many iterations are required to keep the scheme numerically stable. This can occur when the local diffusion scale is ‘large’ relative to the local grid size. To keep the explicit scheme affordable, the correlation length-scales must be bounded, even if statistics or physical considerations suggest that larger values would be more appropriate. This limitation can be overcome by reformulating the diffusion model using an implicit scheme, which has the advantage of being unconditionally stable.
The 1D implicit diffusion approach for constructing 2D and 3D correlation models can be convenient for computational reasons, but it has limitations. For example, with few iterations, the product of 1D implicit diffusion operators produces a well-known spurious anisotropic response (Purser et al., 2003a). Unphysical features can also appear near complex boundaries, such as coastlines or islands in an ocean model, where correlation functions cannot always be reasonably represented by a product of separable functions of the model's coordinates. Correlation models based on 2D or 3D implicit diffusion operators can overcome these limitations, but they are more difficult to implement since they involve the solution of a large linear system (in VDA, the matrices are typically of very large dimension). Some progress in the development of this approach has been made by Weaver and Ricci (2004) and Massart et al. (2012), who used sparse matrix methods to solve a 2D implicit diffusion problem directly, and by Carrier and Ngodock (2010) and S. Gratton (2011, personal communication), who used iterative methods based on conjugate gradient or multi-grid to approximate the solution of a 2D or 3D implicit diffusion problem.
Multidimensional implicit diffusion correlation operators can be interpreted in terms of smoothing norm splines, which were introduced to atmospheric data assimilation by Wahba and Wendelberger (1982) and Wahba (1982), and discussed within an oceanographic context by McIntosh (1990). In the norm spline approach, the background term of the cost function in VDA is formulated in terms of a linear combination of weighted derivative operators that penalize explicitly the amplitude and curvature of the solution. When the weighting coefficients are given by binomial coefficients, the inverse of the background-error correlation operator implied by the norm spline can be expressed as the inverse of an implicit diffusion operator. The direct penalty approach was popular in some of the early studies of four-dimensional VDA (Thacker, 1988; Sheinbaum and Anderson, 1990) but generally leads to a poorly conditioned minimization problem (Lorenc et al., 2000). Effective preconditioning techniques for VDA require access to the background-error covariance operator itself. An interesting exception is the recent study of Yaremchuk et al. (2011), who propose a variational formulation in which the inverse of the background-error covariance is modelled directly using the inverse of a low-order (two-iteration) 3D implicit diffusion operator. No apparent conditioning problems were reported in their examples from an ocean VDA system.
The present paper has a dual purpose: first, to provide a review of the diffusion equation as a basis for constructing anisotropic and inhomogeneous correlation models for data assimilation; and second, to illustrate how fundamental parameters that control spatial smoothness properties of these models can be estimated using ensemble methods. Section 2 brings together key results from data assimilation and geostatistics on the isotropic diffusion problem. Diffusion is considered both on the sphere and in the d-dimensional Euclidean space. Analytical expressions for the isotropic correlation functions implied by appropriately normalized explicit and implicit diffusion in these spaces are presented and compared. The Daley length-scale is used as a standard parameter for comparing the different functions, and expressions relating it to the diffusion-model parameters are established.
The results from section 2 provide the foundation for building anisotropic correlation models with the diffusion equation. This is discussed in sections 3 and 4. The Daley tensor is introduced, which is defined as the negative inverse of the tensor of second derivatives of the correlation function evaluated at zero distance (the Hessian tensor). The Daley tensor is an anisotropic generalization of the Daley length-scale. Expressions relating the Daley tensor to the diffusion tensor of the diffusion models are given. Section 4 discusses techniques for estimating the Daley tensor from statistics of a sample of simulated errors, such as those that would be available from an ensemble data assimilation system. Idealized experiments are then presented to compare the effectiveness of two of the estimation techniques. Conclusions are given in section 5. Appendix A provides a derivation of the relationship between the Daley and diffusion tensors for the correlation functions represented by the implicit diffusion equation in ℝ². Appendix B provides a derivation of the key formulae for estimating the Daley tensor.
2. Isotropic diffusion
Coordinate systems of global atmospheric and ocean models refer to the spherical-shell geometry of the atmosphere and ocean. From a mathematical perspective, this leads naturally to consideration of 2D ‘horizontal’ correlation functions on the spherical space 𝒮². The product of a 2D correlation function on 𝒮² and a 1D correlation function on a bounded subset of the Euclidean space ℝ is commonly used to construct 3D correlation functions on the spherical-shell subspace of ℝ³ that defines the model domain. This approach of separating the horizontal and vertical correlation functions is usually justified by the fact that the global atmospheric and ocean circulations are characterized by scales that are much larger in the horizontal direction (along geopotential surfaces) than in the vertical direction (perpendicular to geopotential surfaces). In the remainder of this section, we describe the correlation functions that can be represented by isotropic diffusion on 𝒮² and on the general Euclidean space ℝᵈ. Table 1 provides a brief description of the main symbols used in this section.
Table 1. A list of the main generic symbols used in section 2. The specification of the superscripts α and β is summarized in the bottom table. A quantity in ℝᵈ is supplemented with a subscript d if it depends explicitly on the dimension of the space; otherwise the subscript is omitted.
Correlation operators on ℝᵈ and 𝒮²
General correlation function on ℝᵈ
Vector of Cartesian coordinates
Isotropic correlation function on ℝᵈ
Fourier transform of the isotropic correlation function on ℝᵈ
Vector of spectral wave numbers
Isotropic correlation function on 𝒮²
Legendre coefficients for cβ(θ)
Total wave number
Daley length-scales of the isotropic correlation functions on ℝᵈ and of cβ(θ)
Normalization constants on ℝᵈ and 𝒮²
Superscripts α and β:
Regular diffusion on ℝᵈ
Implicit diffusion on ℝᵈ
Regular diffusion on 𝒮²
Implicit diffusion on 𝒮²
2.1. Explicit diffusion on 𝒮²
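Consider the 2D diffusion equation applied to the scalar field η(λ,ϕ,s):
$$\frac{\partial \eta}{\partial s} - \kappa\,\nabla^{2}\eta = 0, \tag{1}$$
where κ > 0 is a diffusion coefficient, and
$$\nabla^{2} = \frac{1}{a^{2}\cos^{2}\phi}\frac{\partial^{2}}{\partial\lambda^{2}} + \frac{1}{a^{2}\cos\phi}\frac{\partial}{\partial\phi}\left(\cos\phi\,\frac{\partial}{\partial\phi}\right)$$
is the Laplacian operator in geographical coordinates (λ,ϕ), λ denoting longitude (0 ≤ λ ≤ 2π), ϕ latitude (−π/2 ≤ ϕ ≤ π/2), and a the radius of the sphere (the Earth's radius in our case). In the context of this paper, s is to be interpreted as a dimensionless pseudo-time coordinate. The diffusion coefficient then has physical units of length squared. The solution of Eq. (1) on 𝒮² can be interpreted as a covariance operator (e.g. see WC01). Let
$$\eta(\lambda,\phi,0) = \gamma^{s}\,\eta_{0}(\lambda,\phi) \tag{2}$$
denote the initial condition, where γs is a normalization constant. The solution at some s > 0 can be expressed as the integral operator
$$\eta(\lambda,\phi,s) = \int_{0}^{2\pi}\!\int_{-\pi/2}^{\pi/2} c^{s}(\theta)\,\eta_{0}(\lambda',\phi')\,a^{2}\cos\phi'\,d\phi'\,d\lambda', \tag{3}$$
where cs(θ) is an isotropic function that depends on the angular separation θ, 0 ≤ θ ≤ π, between points (λ,ϕ) and (λ′,ϕ′) on the sphere:
$$\cos\theta = \sin\phi\,\sin\phi' + \cos\phi\,\cos\phi'\,\cos(\lambda-\lambda'). \tag{4}$$
The normalization constant γs in Eq. (2) has been absorbed into the function cs(θ), which has the specific form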
n being the total wave number and P_n the Legendre polynomials, normalized following the usual convention in meteorology (Courtier et al., 1998). All isotropic covariance functions on 𝒮² can be expressed, as in Eq. (5), as an expansion in terms of the Legendre polynomials (Weber and Talkner, 1993; Theorem 2.11 of Gaspari and Cohn, 1999). They are positive-definite functions on 𝒮² if the spectral coefficients are positive, which is clearly the case for all of the coefficients of cs(θ). Equation (3) thus defines a valid covariance operator on 𝒮².
The covariance function is readily transformed into a correlation function (cs(0) = 1) by defining the normalization constant as
The fundamental parameter controlling the shape of the correlation function is the product κs in Eq. (6). To define the length-scale of cs(θ), we use the standard definition from Daley (1991, p. 110), the geometrical interpretation of which is discussed by Pannekoucke et al. (2008). For cs(θ), the Daley length-scale reads
Equations (6) and (8) provide a relationship between κs and Ds that allows us to control the correlation shape (length-scale).
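To illustrate how Ds depends on κs, the sketch below evaluates the explicit-diffusion spectrum on 𝒮² numerically. It assumes standard Legendre polynomials with P_n(1) = 1, which may differ from the meteorological normalization used in Eq. (5). Under that assumption the unit-amplitude kernel has coefficients proportional to (2n + 1)exp[−κs n(n + 1)/a²]. Because ∇²P_n(cosθ) = −n(n + 1)P_n(cosθ)/a², the Daley length-scale is then Ds = a[2/Σ_n b_n n(n + 1)]^{1/2}, where the b_n are the coefficients normalized to sum to one. The function names and the truncation nmax are hypothetical choices.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def explicit_sphere_coeffs(kappa_s, a=EARTH_RADIUS_KM, nmax=2000):
    """Legendre coefficients b_n of the unit-amplitude explicit-diffusion kernel on the sphere.

    Assumes standard Legendre polynomials with P_n(1) = 1, so c(0) = sum(b_n) = 1.
    """
    n = np.arange(nmax + 1)
    b = (2 * n + 1) * np.exp(-kappa_s * n * (n + 1) / a**2)
    return b / b.sum()

def daley_length_sphere(b, a=EARTH_RADIUS_KM):
    """D = (-2 / lap c(0))^(1/2), using lap P_n(cos theta) = -n(n+1) P_n(cos theta) / a^2."""
    n = np.arange(b.size)
    return a * np.sqrt(2.0 / np.sum(b * n * (n + 1)))

target_km = 500.0
b = explicit_sphere_coeffs(kappa_s=target_km**2 / 2)   # planar relation kappa*s = D^2 / 2
D_km = daley_length_sphere(b)
print(f"Daley length-scale on the sphere: {D_km:.1f} km")
```

If κs is taken from the planar relation κs = Ds²/2, the computed length-scale lands well within 1% of 500 km. This anticipates the approximation discussed later in this section.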
Now consider a discretized version of Eq. (1) in which the first-order derivative is approximated using a forward-Euler (explicit) scheme. This yields
$$\frac{\eta(\lambda,\phi,s_{m}) - \eta(\lambda,\phi,s_{m-1})}{\Delta s} - \kappa\,\nabla^{2}\eta(\lambda,\phi,s_{m-1}) = 0, \tag{9}$$
where m is a positive integer, Δs = s_m − s_{m−1} is the step size, and ∇² is understood to be the Laplacian operator in discretized form. For convenience, we can assume that s_m = m, so that the step size Δs = 1. This parameter can thus be ignored hereafter without loss of generality. Repeated application of Eq. (9) on the interval 0 < m ≤ M leads to the linear operator
$$\eta(\lambda,\phi,s_{M}) = \left(I + \kappa\,\nabla^{2}\right)^{M}\eta(\lambda,\phi,0), \tag{10}$$
where η(λ,ϕ,0) is given by Eq. (2). For clarity we let
$$\kappa = L^{2} \tag{11}$$
to emphasize that the coefficient is positive and can be interpreted as the square of a scale parameter.
The key idea is that, on a numerical grid, the effect of the integral correlation operator (3) on an arbitrary scalar field can be approximated by applying the discretized differential operator (10). This is the essence of the original Derber and Rosati (1989) scheme. The parameter κs of the correlation function cs(θ) can be related to the parameters M and κ of Eq. (10) by noticing that κs_M = κM = ML². In practice, it is customary to prescribe the Daley length-scale Ds. Given Ds, the product ML² can be determined by a non-trivial inversion of Eq. (8); for the illustrative examples presented in this paper, this was done by trial and error. To determine M and L² from the product ML², we have the additional requirement that M must be sufficiently large (L² sufficiently small) to maintain the numerical stability of the explicit scheme. Provided M is not too ‘large’, applying the discretized operator (10) is an efficient way of evaluating the integral operator (3). What defines an acceptable value of M will depend on the application.
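The explicit scheme and the choice of M can be sketched on a periodic 1D grid with Δx = 1. This is a simplifying assumption, since the paper works on 𝒮². A von Neumann analysis of one step of Eq. (10) gives the amplification factor 1 − 4(L²/Δx²)sin²(kΔx/2), so stability requires L²/Δx² ≤ 1/2. The hypothetical helper choose_steps uses a more conservative bound of 1/4, which also damps the grid-scale (2Δx) mode completely. It sets ML² = D²/2, following the explicit-scheme relation of section 2.2.

```python
import numpy as np

def explicit_diffusion_1d(field, L2, M):
    """Apply (I + L2 * delta^2)^M: M forward-Euler steps on a periodic grid with dx = 1."""
    eta = np.asarray(field, dtype=float).copy()
    for _ in range(M):
        eta = eta + L2 * (np.roll(eta, 1) - 2.0 * eta + np.roll(eta, -1))
    return eta

def choose_steps(D, courant=0.25):
    """Hypothetical helper: smallest M such that L2 = (D^2 / 2) / M does not exceed `courant`."""
    kappa_s = D**2 / 2.0                    # M * L2 = (Daley length-scale)^2 / 2
    M = int(np.ceil(kappa_s / courant))
    return M, kappa_s / M

D = 10.0                                    # Daley length-scale in grid lengths
M, L2 = choose_steps(D)
n = 801
impulse = np.zeros(n)
impulse[n // 2] = 1.0
corr = explicit_diffusion_1d(impulse, L2, M)
corr /= corr[n // 2]                        # unit amplitude at zero separation
r = np.arange(n) - n // 2
gaussian = np.exp(-r**2 / (2.0 * D**2))     # Gaussian with Daley length-scale D
print(M, L2, np.abs(corr - gaussian).max())
```

For D = 10 grid lengths this choice gives M = 200 steps of L² = 0.25. The normalized impulse response is then close to the Gaussian with the same Daley length-scale. Because M grows like D², doubling D would quadruple the number of iterations.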
To represent a larger family of correlation functions than Eqs (5) and (6), WC01 proposed a generalized diffusion model in which the scaled Laplacian in Eq. (1) is replaced by a linear combination of powers of scaled Laplacians:
$$\frac{\partial \eta}{\partial s} + \sum_{p=1}^{P}\kappa_{p}\left(-\nabla^{2}\right)^{p}\eta = 0, \tag{12}$$
where the diffusion coefficients κp > 0 can be related to a general set of scale parameters Lp via the equation
$$\kappa_{p} = L_{p}^{2p}. \tag{13}$$
The resulting correlation functions have the same basic form as Eq. (5), but with the spectral coefficients given by
and the appropriate modification to γs to produce a unit-amplitude function. Equation (6) is a special case of Eq. (14) with P = 1. Unlike the standard diffusion model, the generalized diffusion model can be used to represent correlation functions that change sign, as illustrated in Figure 1 of WC01. This is an appealing feature if there is compelling evidence of negative correlations in the error fields, although representing them with powers of Laplacian operators would clearly increase the cost of the correlation model.
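The sign-changing behaviour of the generalized model can be checked in a 1D Euclidean analogue. The sketch assumes that Eq. (12) damps a Fourier mode of wave number k by the factor exp(−sΣ_p κ_p k^{2p}). For simplicity it keeps only the p = 2 term, with κ₂s = L⁴; this is an illustrative choice, not one of the paper's configurations. spectral_corr_1d is a hypothetical helper that inverts a spectrum with NumPy's FFT on a periodic grid.

```python
import numpy as np

def spectral_corr_1d(damping, n=4096):
    """Hypothetical helper: unit-amplitude correlation on a periodic grid (dx = 1) from a spectral damping factor."""
    k = 2.0 * np.pi * np.fft.fftfreq(n)          # wave numbers in radians per grid length
    c = np.real(np.fft.ifft(damping(k)))
    return np.fft.fftshift(c / c[0])             # zero separation moved to the centre

L = 20.0                                          # scale parameter in grid lengths (illustrative)
p1 = spectral_corr_1d(lambda k: np.exp(-(L * k) ** 2))   # P = 1: Gaussian kernel
p2 = spectral_corr_1d(lambda k: np.exp(-(L * k) ** 4))   # p = 2 term only
print(p1.min(), p2.min())
```

The P = 1 (Gaussian) response stays non-negative. The quartic damping factor, by contrast, produces a negative lobe of several percent of the peak value.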
2.2. Explicit diffusion on ℝᵈ
Now consider the diffusion equation (1) on the d-dimensional Euclidean space ℝᵈ, where ∇² now represents the Laplacian operator in Cartesian coordinates x = (x1,…,xd). While our particular interest concerns the spaces ℝ, ℝ² and ℝ³, it is easier to consider them as special cases of the general diffusion problem in ℝᵈ. The initial condition of the diffusion problem can be written as
$$\eta(\mathbf{x},0) = \gamma^{g}_{d}\,\eta_{0}(\mathbf{x}), \tag{15}$$
where γg_d is a normalization constant and η0(x) is assumed to be bounded at infinity. Using the Fourier transform (FT), the solution at ‘time’ s > 0 can be written as a convolution operator:
$$\eta(\mathbf{x},s) = \int_{\mathbb{R}^{d}} C(\mathbf{x},\mathbf{x}')\,\eta_{0}(\mathbf{x}')\,d\mathbf{x}', \tag{16}$$
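where C(x,x′) = cg(r) is the Gaussian function
$$c^{g}(r) = \frac{\gamma^{g}_{d}}{(4\pi\kappa s)^{d/2}}\exp\left(-\frac{r^{2}}{4\kappa s}\right), \tag{17}$$
r = |x − x′| being the Euclidean distance between points x and x′ in ℝᵈ. Setting the normalization factor to
$$\gamma^{g}_{d} = (4\pi\kappa s)^{d/2} \tag{18}$$
ensures that cg(0) = 1.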
The Daley length-scale for any twice-differentiable isotropic correlation function c(r) in d dimensions is given by
$$D = \left(-\frac{d}{\operatorname{tr}\left(\nabla\nabla^{\mathrm{T}}c\right)\big|_{r=0}}\right)^{1/2}, \tag{19}$$
where ∇∇ᵀ is the outer product of the d-dimensional gradient operator ∇ = (∂/∂x1 … ∂/∂xd)ᵀ and its transpose. The quantity within the trace operator is the correlation Hessian tensor (Chorti and Hristopulos, 2008). The Hessian tensor plays a fundamental role in characterizing the anisotropic correlation functions described later in this paper (sections 3 and 4). For the d-dimensional Gaussian function, the Daley length-scale is
$$D^{g} = (2\kappa s)^{1/2}. \tag{20}$$
In terms of Dg, the normalization factor is
$$\gamma^{g}_{d} = \left(2\pi (D^{g})^{2}\right)^{d/2}. \tag{21}$$
As before, we can approximate Eq. (16) with a differential operator based on a discretization of the diffusion equation using an M-step explicit scheme. In terms of the parameters M and L² of the explicit diffusion operator, Eqs (20) and (21) become
$$D^{g} = (2ML^{2})^{1/2}, \qquad \gamma^{g}_{d} = (4\pi M L^{2})^{d/2}. \tag{22}$$
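The Gaussian length-scale and normalization relations can be checked numerically. For an isotropic c(r), each diagonal entry of the Hessian at the origin equals c″(0), so the Daley definition reduces to D² = −d/(d c″(0)) = −1/c″(0) in any dimension. The sketch estimates this by a central difference. It also verifies, in d = 2, that the normalization constant equals the integral of the unit-amplitude kernel, (2π(Dg)²)^{d/2}. This holds because the solution is the normalization constant times a heat kernel of unit mass. The step size and the quadrature grid are hypothetical choices.

```python
import numpy as np

def daley_length_fd(c, h=1e-3):
    """Daley length-scale of an isotropic kernel: D^2 = -1 / c''(0), with c''(0) by central difference.

    For isotropic c(r), every diagonal Hessian entry at r = 0 equals c''(0), so the
    definition D^2 = -d / lap c(0) reduces to D^2 = -1 / c''(0) in any dimension d.
    """
    c2 = 2.0 * (c(h) - c(0.0)) / h**2            # uses c(-h) = c(h)
    return np.sqrt(-1.0 / c2)

kappa_s = 3.0

def gauss(r):
    """Unit-amplitude Gaussian exp(-r^2 / (4 kappa s))."""
    return np.exp(-np.asarray(r) ** 2 / (4.0 * kappa_s))

D = daley_length_fd(gauss)                       # expected: sqrt(2 kappa s)

# d = 2: the integral of the unit-amplitude kernel should equal (2 pi D^2)^(d/2)
x = np.linspace(-30.0, 30.0, 601)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x)
integral = gauss(np.hypot(X, Y)).sum() * h**2
print(D, np.sqrt(2.0 * kappa_s), integral / (2.0 * np.pi * D**2))
```

Both checks agree with the closed-form expressions to within the finite-difference and quadrature errors, which are negligible here.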
Let us now consider the interpretation of the Gaussian function on 𝒮². First, since 𝒮² is embedded in ℝ³, a valid isotropic correlation function on 𝒮² can always be constructed from a valid isotropic correlation function in ℝ³ by restricting x = (x1,x2,x3) and x′ = (x′1,x′2,x′3) to be points on the sphere. Expressing these points in geographical coordinates, x = (a cosϕ cosλ, a cosϕ sinλ, a sinϕ) and x′ = (a cosϕ′ cosλ′, a cosϕ′ sinλ′, a sinϕ′), leads to the chordal distance measure
$$r = |\mathbf{x}-\mathbf{x}'| = a\left[2\,(1-\cos\theta)\right]^{1/2}, \tag{23}$$
where cosθ is given by Eq. (4). The Gaussian correlation function on ℝ³ confined to the subspace 𝒮² is thus
$$c^{g}(\theta) = \exp\left[-\frac{r^{2}}{2(D^{g})^{2}}\right] = \exp\left[-\left(\frac{a}{D^{g}}\right)^{2}(1-\cos\theta)\right].$$
From Eq. (23) we notice that r depends only on cosθ, or alternatively θ, and that cosθ = 1 − r²/(2a²), where 0 ≤ r ≤ 2a. We also recall that all isotropic correlation functions on 𝒮² can be expressed as a Legendre expansion that depends only on cosθ (Eq. (5)). It is then possible to represent any isotropic correlation function on 𝒮² as a function of either r or θ.
In particular, consider the representation of the Gaussian on 𝒮² in terms of the Legendre polynomials. As shown in WC01:
I_{n+1/2}(ω) denoting the modified Bessel function of fractional order n + 1/2, and ω = (a/Dg)². In view of the results on ℝᵈ, one might expect that the correlation kernel cs(θ) implied by diffusion on 𝒮² (Eq. (5)) is similar to the Gaussian correlation function (24) on 𝒮². Indeed, for a given length-scale Dg, it is possible to find a corresponding parameter κs in Eq. (6) such that the difference between cs(θ) and cg(θ) is ‘small’ (Roberts and Ursell, 1960; Hartman and Watson, 1974). In particular, consider the scales of interest in atmospheric and ocean data assimilation, for which ω ≫ 1. Matching the n = 0 Legendre coefficients of cs(θ) and cg(θ), and noting that
$$I_{n+1/2}(\omega) \approx \frac{e^{\omega}}{(2\pi\omega)^{1/2}}$$
for large ω, we obtain the approximation to the normalization factor
$$\gamma^{s} \approx 2\pi (D^{g})^{2}. \tag{25}$$
Now matching the n = 1 coefficients of cs(θ) and cg(θ) and using (25) leads to the approximation
$$\kappa s \approx \tfrac{1}{2}(D^{g})^{2}. \tag{26}$$
WC01 illustrate the excellent agreement between cs(θ) and cg(θ), particularly for the large scales, for a given length-scale Dg and with κs approximated according to Eq. (26) (see their Figure A1).
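The agreement between cs(θ) and cg(θ) can be reproduced with a short calculation. The sketch assumes standard Legendre polynomials (P_n(1) = 1) and kernel coefficients proportional to (2n + 1)exp[−κs n(n + 1)/a²]. It sets κs from Eq. (26) and sums the series with numpy.polynomial.legendre.legval. The function names and the truncation nmax are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

EARTH_RADIUS_KM = 6371.0

def explicit_sphere_corr(theta, kappa_s, a=EARTH_RADIUS_KM, nmax=2000):
    """c_s(theta) summed from coefficients (2n+1) exp(-kappa_s n(n+1)/a^2), assuming P_n(1) = 1."""
    n = np.arange(nmax + 1)
    b = (2 * n + 1) * np.exp(-kappa_s * n * (n + 1) / a**2)
    b /= b.sum()                                  # unit amplitude: c_s(0) = 1
    return legendre.legval(np.cos(theta), b)

def chordal_gaussian(theta, D, a=EARTH_RADIUS_KM):
    """Gaussian of chordal distance r^2 = 2 a^2 (1 - cos theta), i.e. exp(-r^2 / (2 D^2))."""
    return np.exp(-(a / D) ** 2 * (1.0 - np.cos(theta)))

D = 500.0                                         # Daley length-scale in km
theta = np.linspace(0.0, 0.5, 201)                # angular separation in radians
cs = explicit_sphere_corr(theta, kappa_s=D**2 / 2)  # kappa*s from Eq. (26)
cg = chordal_gaussian(theta, D)
print(np.abs(cs - cg).max())
```

For Dg = 500 km the maximum absolute difference between the two curves stays well below 0.01 over all separations.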
Equations (25) and (26) are none other than those derived earlier for the diffusion problem in ℝ² (cf. Eqs (21) and (20) with d = 2). In other words, for length-scales small compared with the radius of the Earth, we obtain the somewhat intuitive result that diffusion on the sphere (the integral operator (3)) is well approximated by diffusion on the 2D Cartesian plane (the convolution operator (16) with d = 2). For calibrating the correlation model, it is then possible to use the simple expressions (26) and (25) for the length-scale and normalization factor in place of the more complicated expressions (8) and (7).
There are two main drawbacks of the generalized explicit diffusion model of WC01. First, the correlation functions that can be represented by the model have limited flexibility in the spectral domain, especially at high wave numbers, where their decay rates are at least as fast as that of the Gaussian function. In data assimilation, this can result in excessive smoothing of small-scale features in the analysis (Purser et al., 2003b). Second, the explicit scheme is subject to a stability criterion that depends on the ratio of the length-scale to the grid size, raised to the power of 2P. As a result, many iterations may be required when the length-scale is large compared with the grid resolution. With the variable-coefficient and anisotropic versions of the model discussed later, the computational cost of the algorithm can be especially high. A diffusion model based on an implicit formulation can overcome these limitations, as described next.
2.3. Implicit diffusion on 𝒮²
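Consider again the diffusion equation (1), but this time discretized using a backward-Euler (implicit) scheme:
$$\frac{\eta(\lambda,\phi,s_{m}) - \eta(\lambda,\phi,s_{m-1})}{\Delta s} - \kappa\,\nabla^{2}\eta(\lambda,\phi,s_{m}) = 0, \tag{27}$$
where, as in Eq. (10), we can assume s_m = m and hence Δs = 1, and interpret κ as the square of a scale parameter (Eq. (11)). Rearranging Eq. (27) and applying it repeatedly on the interval 0 < m ≤ M leads to the ‘reverse-time’ or inverse diffusion operator
$$\eta(\lambda,\phi,0) = \left(I - L^{2}\nabla^{2}\right)^{M}\eta(\lambda,\phi,s_{M}). \tag{28}$$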
Equation (28) can be interpreted as a roughening operator as opposed to the diffusion operator itself, which is a smoothing operator.
Following Eq. (2), we define the initial condition as
$$\eta(\lambda,\phi,0) = \gamma^{h}\,\eta_{0}(\lambda,\phi), \tag{29}$$
where γh is a normalization constant. Weaver and Ricci (2004) show that the differential operator (γh)⁻¹(I − L²∇²)^M is the inverse of a correlation operator, where the latter is given by an integral equation of the form (3), with isotropic correlation function ch(θ) of the Legendre form (5) as its kernel. The spectral coefficients of ch(θ) are strictly positive and given by
The normalization factor is
and the Daley length-scale is given by Eq. (8), with the coefficients of cs(θ) replaced by those of ch(θ).
In the explicit diffusion model, the only free parameter was the product κs_M = ML², which controls the spatial scale of the quasi-Gaussian correlation kernel (Eq. (6) with s = s_M). The implicit diffusion model, on the other hand, allows greater control of the shape characteristics of the associated correlation kernels, since both L² and M are free parameters. Numerically, this extra flexibility is reflected by the important property of unconditional stability of the implicit scheme. In the limiting case M → ∞, with ML² held fixed, the spectral coefficients (30) reduce to those of the quasi-Gaussian solution, which is the only correlation function that can be represented by solving the diffusion equation explicitly.
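The limit M → ∞ with ML² fixed can be checked directly on the spectral coefficients. The sketch assumes standard Legendre polynomials (P_n(1) = 1). For that normalization, the unit-amplitude implicit kernel has coefficients proportional to (2n + 1)[1 + L²n(n + 1)/a²]^{−M}, and the explicit kernel has coefficients proportional to (2n + 1)exp[−ML²n(n + 1)/a²]. Both follow from the eigenvalues −n(n + 1)/a² of ∇² on 𝒮². The normalization convention, function names and truncation are our assumptions, not the exact form of Eq. (30).

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def implicit_coeffs(ML2, M, a=EARTH_RADIUS_KM, nmax=2000):
    """Normalized coefficients proportional to (2n+1) (1 + L^2 n(n+1)/a^2)^(-M), with L^2 = ML2 / M."""
    n = np.arange(nmax + 1)
    b = (2 * n + 1) * (1.0 + (ML2 / M) * n * (n + 1) / a**2) ** (-M)
    return b / b.sum()

def explicit_coeffs(ML2, a=EARTH_RADIUS_KM, nmax=2000):
    """Normalized coefficients proportional to (2n+1) exp(-ML2 n(n+1)/a^2)."""
    n = np.arange(nmax + 1)
    b = (2 * n + 1) * np.exp(-ML2 * n * (n + 1) / a**2)
    return b / b.sum()

ML2 = 500.0**2 / 2                                   # M L^2 held fixed (km^2, illustrative)
reference = explicit_coeffs(ML2)
rel_diff = {M: np.abs(implicit_coeffs(ML2, M) - reference).max() / reference.max()
            for M in (2, 4, 10_000)}
print(rel_diff)
```

The relative difference shrinks steadily as M grows. It is large for M = 2, whereas for M = 10 000 it falls below 0.1% of the largest coefficient.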
The upper panel in Figure 1 displays correlation functions ch(r(θ)) for different values of M and a constant Daley length-scale (500 km). The values are plotted as a function of chordal distance r(θ). The Gaussian function cg(r(θ)) is also shown for reference. Increasing the value of M decreases the ‘fatness’ of the tail of ch(r(θ)), with the Gaussian providing the upper limit as M → ∞. The total variance of ch(r(θ)) and cg(r(θ)) is given by their value at the origin, which is equal to one. The Legendre coefficients of ch(r(θ)) and cg(r(θ)) give the contribution of each wave number n to their total variance, and thus define their variance power spectra. The lower panel in Figure 1 shows a log–log plot of these spectra as a function of n. Here we see that the increased fatness in correlation shape for low values of M is associated with higher variance and a reduced damping rate in the small scales, slightly less variance in the intermediate scales, and increased variance in the large scales.
As with the generalized diffusion equation, a linear combination of powers of scaled Laplacian operators (12) can be introduced in Eq. (28) to yield a larger family of correlation functions, but at extra cost. The spectral coefficients of this larger family are given by
with γh modified accordingly so that ch(0) = 1. The smoothing spline functions introduced by Wahba (1982) correspond to the special case of Eqs (5) and (32) for which M = 2.
Increasing the degree P of the polynomial of the Laplacian leads to correlation functions that oscillate about the zero axis. This is illustrated in the upper panel of Figure 2, where the generalized ch(r(θ)) are displayed for different values of P and a fixed value of M = 4. The amplitude of the negative lobes increases with P. In spectral space, the negative lobes are associated with a decrease in variance in the large scales and an increase in variance in the intermediate scales. Increasing P also steepens the decay rate of the variance in the smaller scales.
A straightforward variant of Eq. (32) that can be used to enhance the oscillations while maintaining a gradual spectral decay rate at high wave numbers is
where ρp is a dimensionless coefficient that can take both negative and positive values. This is equivalent to redefining the diffusion coefficients (13) as κp = ρp L^{2p}. Restricting M to be even guarantees that Eq. (33) yields positive coefficients. Examples are shown in Figure 3 for the case P = 2 and M = 2, with a single scale parameter L1 = L2 = L. Here ρ2 has been set to one and negative values have been used for ρ1. Increasing the magnitude of ρ1 results in a significant increase in the amplitude of the oscillations and a much sharper spectral peak at intermediate scales. Notice that by setting ρ1 = 2 we recover the non-oscillatory correlation function governed by Eq. (30) with M = 4, which is displayed in Figure 1 (dashed curves).
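The effect of ρ1 can be illustrated in a 1D Euclidean analogue of Eq. (33). The sketch assumes the normalized spectrum (1 + ρ1(Lk)² + (Lk)⁴)^{−M}, with P = 2, M = 2 and ρ2 = 1 as in Figure 3. implicit_spectral_corr is a hypothetical helper, and L = 10 grid lengths is an arbitrary choice. For ρ1 = −1.8 the polynomial stays positive (its minimum value is 0.19), but it has a pronounced minimum at (Lk)² = 0.9. That minimum produces a spectral peak at intermediate scales.

```python
import numpy as np

def implicit_spectral_corr(rho1, L=10.0, M=2, n=4096):
    """Hypothetical 1D analogue of Eq. (33): spectrum (1 + rho1 (Lk)^2 + (Lk)^4)^(-M), periodic grid, dx = 1."""
    u2 = (L * 2.0 * np.pi * np.fft.fftfreq(n)) ** 2      # (L k)^2 for each Fourier mode
    spectrum = (1.0 + rho1 * u2 + u2**2) ** (-M)
    c = np.real(np.fft.ifft(spectrum))
    return np.fft.fftshift(c / c[0])                     # unit amplitude, zero separation centred

smooth = implicit_spectral_corr(rho1=2.0)   # equals (1 + (Lk)^2)^(-4): no negative lobes
wavy = implicit_spectral_corr(rho1=-1.8)    # spectral peak near (Lk)^2 = 0.9
print(smooth.min(), wavy.min())
```

Setting ρ1 = 2 reproduces the non-oscillatory spectrum (1 + (Lk)²)^{−4}, mirroring the remark above about recovering Eq. (30) with M = 4. With ρ1 = −1.8, the correlation function develops pronounced negative lobes.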
On a numerical grid, the correlation operator can be approximated by a discrete operator that solves the linear system (28)–(29) for a given right-hand side. We refer to this discrete operator as an implicit diffusion correlation operator. Although each iteration of an implicit diffusion operator will generally cost more than an iteration of the explicit scheme, the total cost of the implicit algorithm can easily be lower because significantly fewer iterations are needed.
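To make the linear-system view concrete, the sketch applies M backward-Euler steps of Eq. (27) on a periodic 1D Cartesian grid with Δx = 1. Using a 1D grid rather than 𝒮² is a simplifying assumption. At each step it solves (I − L²δ²)η_m = η_{m−1} with a dense solver, where δ² is the standard three-point second difference. For M = 2, the normalized impulse response should approach the 1D second-order auto-regressive (SOAR) function (1 + r/L)exp(−r/L). The function name is hypothetical.

```python
import numpy as np

def implicit_diffusion_1d(field, L, M):
    """Apply M backward-Euler steps, i.e. (I - L^2 delta^2)^(-M), on a periodic grid with dx = 1."""
    n = field.size
    identity = np.eye(n)
    second_diff = np.roll(identity, 1, axis=1) - 2.0 * identity + np.roll(identity, -1, axis=1)
    A = identity - L**2 * second_diff          # symmetric positive-definite system matrix
    eta = np.asarray(field, dtype=float)
    for _ in range(M):
        eta = np.linalg.solve(A, eta)         # solve A @ eta_m = eta_{m-1}
    return eta

n, L, M = 512, 20.0, 2
impulse = np.zeros(n)
impulse[n // 2] = 1.0
corr = implicit_diffusion_1d(impulse, L, M)
corr /= corr[n // 2]                          # unit amplitude at zero separation
r = np.abs(np.arange(n) - n // 2)
soar = (1.0 + r / L) * np.exp(-r / L)         # continuous 1D kernel for M = 2 (SOAR)
print(np.abs(corr - soar).max())
```

A dense solver is used only for brevity. A practical implementation would exploit the sparsity of the matrix, for example with a banded or iterative solver, as in the studies cited in the introduction.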
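where Γ(ν) denotes the Gamma function, and applying the inverse FT to Eq. (35) leads to an integral solution of the general form (16), where C(x,x′) = cw(r) is a unit-amplitude isotropic function given by
$$c^{w}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{r}{L}\right)^{\nu}K_{\nu}\!\left(\frac{r}{L}\right), \tag{38}$$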
K_ν(r/L) denoting the modified Bessel function of the second kind of order ν, and r = |x − x′|. Since the FT of cw(r) is strictly positive, cw(r) is a valid correlation function in ℝᵈ (Bochner's theorem; see Theorem 2.10 in Gaspari and Cohn, 1999). Notice that the power spectrum depends on d, but the correlation function cw(r) itself is independent of d.
Equation (38) is a class of correlation function well known in the geostatistical literature as the Whittle–Matérn or Matérn family (Gneiting, 1999; Stein, 1999; Guttorp and Gneiting, 2006). The link between this correlation family and the fractional differential operator (34) is attributed to Whittle (1954, 1963). Of particular interest here is the subclass of Matérn functions that correspond to (positive) integer values of the parameter M = ν + d/2. For this subclass, the inverse correlation operator has a greatly simplified representation for numerical applications and can be interpreted as an M-step implicitly formulated diffusion operator (MW10), where (cf. Eqs (15) and (16))
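The correlation kernels and their associated FT are given by
$$c_{d}(r) = \frac{2^{1-M+d/2}}{\Gamma(M-d/2)}\left(\frac{r}{L}\right)^{M-d/2}K_{M-d/2}\!\left(\frac{r}{L}\right), \qquad \frac{\hat{c}_{d}(\mathbf{k})}{\hat{c}_{d}(\mathbf{0})} = \left(1+L^{2}|\mathbf{k}|^{2}\right)^{-M}. \tag{39}$$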
Equation (39) yields valid correlation functions if M > d/2 (i.e. ν > 0; Guttorp and Gneiting, 2006). Notice also that, in contrast to the full Matérn family, the implicit-diffusion kernels depend on d (which has been made explicit by adding the subscript d to the kernel), but their normalized power spectrum is independent of d. For odd values of d, Eq. (39) reduces to a polynomial of order M − (d + 1)/2 times an exponential function; this is the well-known class of autoregressive (AR) functions.
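For odd d the kernels can be coded in closed form. The sketch assumes the standard half-integer Matérn expression, with ν = M − d/2 = p + 1/2 and p = M − (d + 1)/2:

c(r) = e^{−x} p!/(2p)! Σ_{i=0}^{p} (p + i)!/[i!(p − i)!] (2x)^{p−i}, where x = r/L.

The sketch also checks the Daley length-scale with the finite-difference rule D² = −1/c″(0), which follows from Eq. (19) for any isotropic kernel. ar_kernel and daley_length_fd are hypothetical names.

```python
import math
import numpy as np

def ar_kernel(r, L, M, d=1):
    """Implicit-diffusion (Matern) kernel for odd d, nu = M - d/2 = p + 1/2 (closed form assumed)."""
    if d % 2 == 0:
        raise ValueError("this closed form only covers odd d")
    p = M - (d + 1) // 2                          # polynomial order M - (d + 1)/2
    if p < 0:
        raise ValueError("need M > d/2 for a valid correlation function")
    x = np.abs(np.asarray(r, dtype=float)) / L
    poly = sum(math.factorial(p + i) / (math.factorial(i) * math.factorial(p - i))
               * (2.0 * x) ** (p - i) for i in range(p + 1))
    return math.factorial(p) / math.factorial(2 * p) * poly * np.exp(-x)

def daley_length_fd(c, h=1e-3):
    """D^2 = -1 / c''(0) for an isotropic kernel, using a central difference with c(-h) = c(h)."""
    return math.sqrt(-h**2 / (2.0 * (c(h) - c(0.0))))

r = np.linspace(0.0, 5.0, 11)
soar = (1.0 + r) * np.exp(-r)                     # d = 1, M = 2, L = 1
D3 = daley_length_fd(lambda s: float(ar_kernel(s, 1.0, 4, d=3)))
print(np.allclose(ar_kernel(r, 1.0, 2), soar), D3, math.sqrt(2 * 4 - 3 - 2))
```

For d = 1 and M = 2 the kernel reduces to the SOAR function. For d = 3 and M = 4, the estimated Daley length-scale matches L(2M − d − 2)^{1/2} = √3 L.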
Of relevance here are the spaces ℝ, ℝ² and ℝ³. The implicit-diffusion kernels on these spaces can be written explicitly as
From Eq. (37), the expressions for the normalization constants become
Using Eq. (19), the Daley length-scale of the implicit diffusion kernels in ℝᵈ can be evaluated as
$$D = L\,(2M-d-2)^{1/2}. \tag{44}$$
Equation (44) is derived in MW10 for d = 1 and in Appendix A for d = 2. The generalization to d > 2 follows by noting that the correlation functions associated with odd d all have the form (40) with M replaced by M − (d − 1)/2, while those with even d all have the form (41) with M replaced by M − (d − 2)/2 (in both cases ν = M − d/2 is unchanged). Equation (44) imposes further restrictions on the choice of M, since we now require
$$M > \frac{d}{2} + 1.$$
In ℝ², for example, we require M > 2. Finally, even values of M are more convenient than odd values, since they greatly simplify the derivation of a ‘square-root’ factor of the diffusion operator, which is important for estimating normalization factors and for preconditioning in variational assimilation (WC01).
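Equation (44) can also be obtained directly from the spectrum. The derivation below assumes only that the normalized spectrum is (1 + L²|k|²)^{−M} and that c(0) = 1. It writes the integrals in radial form and uses ∫₀^∞ u^{a−1}(1 + u²)^{−M} du = ½B(a/2, M − a/2), where B is the Beta function. This is one possible route, not necessarily the derivation followed in MW10 or Appendix A.

```latex
\nabla^{2}c\big|_{r=0}
  = -\frac{\int_{\mathbb{R}^{d}} |\mathbf{k}|^{2}\,(1+L^{2}|\mathbf{k}|^{2})^{-M}\,d\mathbf{k}}
          {\int_{\mathbb{R}^{d}} (1+L^{2}|\mathbf{k}|^{2})^{-M}\,d\mathbf{k}}
  = -\frac{1}{L^{2}}\,
     \frac{B\!\left(\tfrac{d}{2}+1,\; M-\tfrac{d}{2}-1\right)}
          {B\!\left(\tfrac{d}{2},\; M-\tfrac{d}{2}\right)}
  = -\frac{1}{L^{2}}\,\frac{d}{2M-d-2},
\qquad
D^{2} = -\frac{d}{\nabla^{2}c\big|_{r=0}} = (2M-d-2)\,L^{2}.
```

The Beta function in the numerator is finite only if M − d/2 − 1 > 0, which is the restriction M > d/2 + 1 stated above.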
The explicit diffusion kernels are the limiting case of the implicit diffusion kernels as M → ∞ with ML² fixed. This is easily deduced from Eqs (36), (37) and (44). For large M = ν + d/2, L²(2M − d − 2) tends to 2ML² (cf. Eq. (22)), and the normalization constant tends to its Gaussian counterpart (cf. Eq. (21)). In addition, the normalized spectrum (1 + L²|k|²)^{−M} tends to exp(−ML²|k|²), which is the normalized FT of the d-dimensional Gaussian function (17) with κs = ML². Based on the similarity of the explicit diffusion kernels on 𝒮² and ℝ² when (Dg/a)² ≪ 1, one would expect similar agreement between the implicit diffusion kernels on 𝒮² and ℝ² when (D/a)² ≪ 1.
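The normalization constants and the M → ∞ limit can be checked numerically. The sketch assumes that the normalization constant is the factor γ that gives the kernel of γ(I − L²∇²)^{−M} a unit value at the origin. Fourier analysis then gives γ = (4π)^{d/2}L^dΓ(M)/Γ(M − d/2), which is our own derivation under that assumption. The result is compared with the explicit-scheme constant (4πML²)^{d/2} of Eq. (22). The function names are hypothetical.

```python
import math

def gamma_implicit(L, M, d):
    """Assumed normalization: 1 / (value at the origin of the kernel of (I - L^2 lap)^(-M) in R^d).

    gamma = (4 pi)^(d/2) L^d Gamma(M) / Gamma(M - d/2); requires M > d/2.
    """
    log_gamma = (0.5 * d * math.log(4.0 * math.pi) + d * math.log(L)
                 + math.lgamma(M) - math.lgamma(M - 0.5 * d))
    return math.exp(log_gamma)

def gamma_explicit(L, M, d):
    """M-step explicit scheme: (4 pi M L^2)^(d/2)."""
    return (4.0 * math.pi * M * L**2) ** (0.5 * d)

print(gamma_implicit(1.0, 2, 1))                  # integral of (1 + |x|) exp(-|x|) over R
print(gamma_implicit(1.0, 4, 2), 12.0 * math.pi)  # 4 pi L^2 (M - 1) for d = 2
for M in (4, 40, 4000):                           # M L^2 = 1 held fixed
    L = math.sqrt(1.0 / M)
    print(M, gamma_implicit(L, M, 3) / gamma_explicit(L, M, 3))
```

For d = 1 and M = 2 the constant equals 4L, the integral of the SOAR function. For d = 2 it reduces to 4πL²(M − 1). The ratio to the explicit constant tends to one as M grows with ML² fixed.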
Figure 4 shows the correlation function ch(r(θ)) for M = 4, plotted as a function of chordal distance r(θ) for four different Daley length-scales Dh (solid curves). For comparison, the corresponding implicit-diffusion correlation functions on ℝ² for M = 4 are also shown (dashed curves). The Daley length-scales of these functions have been set to Dh, and the corresponding values of L have been computed from Eq. (44). As expected, the curves are virtually indistinguishable for the length-scales of primary interest (<1000 km). Noticeable differences appear only when the length-scale exceeds 2000 km, and these occur mainly in the tail of the function. In other words, the inverse correlation operator on 𝒮² can be well approximated locally by its counterpart acting on the tangent plane ℝ². The simple relations (43) and (44) can then be used in place of the spectral expansions (31) and (8) to provide a good approximation of the normalization factor and length-scale. This is especially convenient for grid-point ocean models, where spectral expansions cannot be readily computed owing to the presence of complex land boundaries.
It is important to stress, however, that the ℝ² implicit-diffusion kernel itself is not a valid correlation function on 𝒮². A valid correlation function on 𝒮² from the Matérn family is the AR function obtained by restricting an implicit-diffusion kernel on ℝ³ to the sphere. For example, Gaspari and Cohn (1999) discuss the second-order AR (SOAR) function on 𝒮² (see their Eq. (2.36)). Figure 4 shows the fourth-order AR function for different length-scales (dash-dotted curves). The differences between this AR function and ch(r(θ)) are larger than those between the ℝ² kernel and ch(r(θ)), but they are still quite small for length-scales less than 1000 km.
A more general set of correlation functions on ℝᵈ can be modelled using a linear combination of implicit diffusion operators, or a generalized implicit diffusion operator constructed from the inverse of a polynomial of Laplacian operators raised to the power of M (MW10; Yaremchuk and Smith, 2011; see also Purser et al., 2003b, for related approaches involving the recursive filter). The correlation functions generated by the first approach are described by a linear combination of Matérn functions, where the weighting coefficients for each function are specified such that the combined function is positive definite. Gregori et al. (2008) provide general conditions on the model parameters for achieving this. MW10 provide an example in ℝ in which two SOAR functions are combined to produce a correlation function with negative lobes.
The second approach is analogous to the one outlined in section 2.3 for the problem on 𝒮² (Eqs (32) and (33)). Hristopulos (2003), Hristopulos and Elogne (2007) and Yaremchuk and Smith (2011) have studied extensively the special case M = 1 and P = 2 on ℝᵈ. In this case the parameter settings κ1 = ρL² and κ2 = L⁴, with ρ < 0 and ρ² < 4, yield a family of positive-definite, oscillatory functions such as those illustrated in Figure 3 on 𝒮². With all of these approaches, however, the advantages of increasing the flexibility of the correlation model have to be carefully weighed against two costs: the additional computation needed to solve additional or more complicated large linear systems, and the difficulty of estimating additional parameters.
3. Anisotropic diffusion
Isotropic correlation models are commonly used in data assimilation algorithms because of their simplicity and computational convenience. There is no reason, however, to expect actual background-error correlations to be isotropic in geophysical fluids such as the ocean. On the contrary, one would expect them to be strongly anisotropic, particularly near coastlines, bathymetry, or ocean fronts. General anisotropic correlation models allow for preferential stretching or shrinking of the correlation functions along arbitrary directions. With a diffusion-based correlation model this can be done using a diffusion tensor, as outlined in this section. To fix the concepts and definitions, we focus mainly on the homogeneous and anisotropic problem. Methods for estimating the parameters of a general inhomogeneous and anisotropic diffusion model are described in section 4.
3.1. Homogeneity and anisotropy
Consider the 2D diffusion equation on ℝ²,

$$\frac{\partial \eta}{\partial s} - \nabla\cdot\boldsymbol{\kappa}\nabla\eta = 0, \quad (45)$$
where κ is an anisotropic but constant diffusion tensor,

$$\boldsymbol{\kappa} = \begin{pmatrix} \kappa_{xx} & \kappa_{xy} \\ \kappa_{yx} & \kappa_{yy} \end{pmatrix}, \quad (46)$$

which is assumed to be symmetric (κyx = κxy) and positive definite (κxx > 0 and |κ| = κxxκyy − κxy² > 0), so that κ is guaranteed to be invertible. The diagonal elements of the tensor determine the strength of the diffusion in the coordinate directions x and y, while the off-diagonal elements allow the principal axes of the diffusion to be rotated relative to x and y.
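The solution of Eq. (45) is a straightforward extension of the solution to the isotropic problem (Pannekoucke and Massart, 2008; Pannekoucke, 2009). Given the initial condition η(x, 0), the solution can be expressed as Eq. (16) (with d = 2), where the kernel is given by the Gaussian function

$$g(\mathbf{x},\mathbf{x}') = \frac{1}{\gamma_g}\exp\left(-\frac{\tilde r^{2}}{4s}\right), \qquad \gamma_g = 4\pi s\,|\boldsymbol{\kappa}|^{1/2}, \quad (47)$$

with |κ| denoting the determinant of κ, and with the non-dimensional distance measure

$$\tilde r = \left[(\mathbf{x}-\mathbf{x}')^{\mathrm T}\boldsymbol{\kappa}^{-1}(\mathbf{x}-\mathbf{x}')\right]^{1/2}, \quad (48)$$

where x = (x, y)ᵀ. From this definition, 2sκ can be interpreted as the aspect tensor of the Gaussian function (47) (Purser et al., 2003b). The elements of κ have physical units of length squared. Setting c_g(r̃) = γ_g g = exp(−r̃²/(4s)), which equals one at zero separation, we define the correlation Hessian tensor

$$\mathbf{H}_g = -\nabla\nabla^{\mathrm T} c_g\,\big|_{\mathbf{x}=\mathbf{x}'}, \quad (49)$$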
where ∇∇ᵀ is the outer product of the 2D gradient operator ∇ = (∂/∂x ∂/∂y)ᵀ and its transpose. The correlation Hessian tensor is of interest here since it is a quantity that can be estimated from sample statistics of background error (see section 4). Following the basic procedure described in Appendix A, it is straightforward to verify that

$$\mathbf{H}_g = (2s\boldsymbol{\kappa})^{-1}. \quad (50)$$
In the isotropic case, κ = κI and hence H_g = D_g⁻² I, where D_g² = 2κs is the square of the Daley length-scale. The inverse of the tensor (49),

$$\mathbf{D}_g = \mathbf{H}_g^{-1} = 2s\boldsymbol{\kappa}, \quad (51)$$

can thus be considered as a generalization of the (square of the) Daley length-scale to the anisotropic case. We will thus refer to this quantity as a Daley tensor.
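A quick numerical check of (50)–(51) can be sketched as follows: the code evaluates c_g = exp(−r̃²/(4s)) with r̃² = rᵀκ⁻¹r, approximates −∇∇ᵀc_g at r = 0 by central differences, and compares the result with (2sκ)⁻¹. The tensor κ and the pseudo-time s are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math


def inv2(a):
    """Inverse of a 2x2 matrix stored as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def gaussian_corr(r, kappa, s):
    """c_g = exp(-r~^2 / (4 s)) with r~^2 = r^T kappa^{-1} r."""
    ki = inv2(kappa)
    q = sum(r[i] * ki[i][j] * r[j] for i in range(2) for j in range(2))
    return math.exp(-q / (4.0 * s))


def hessian_at_zero(c, dim, h=1e-3):
    """-grad grad^T c at r = 0, from central differences with step h along each axis."""
    H = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            def at(a, b):
                v = [0.0] * dim
                v[i] += a * h
                v[j] += b * h
                return c(v)
            H[i][j] = -(at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4.0 * h * h)
    return H


kappa = [[2.0, 0.6], [0.6, 1.0]]   # illustrative symmetric positive-definite tensor
s = 3.0                            # illustrative pseudo-time
H = hessian_at_zero(lambda r: gaussian_corr(r, kappa, s), 2)
expected = inv2([[2.0 * s * k for k in row] for row in kappa])   # (2 s kappa)^{-1}
daley = inv2(H)                                                  # should be close to 2 s kappa
print(H, expected, daley, sep="\n")
```

The off-diagonal element of κ produces a non-zero off-diagonal Hessian element, which is the quantity the ensemble methods of section 4 attempt to recover.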
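For the 3D diffusion equation, the diffusion tensor contains six independent elements:

$$\boldsymbol{\kappa} = \begin{pmatrix} \kappa_{xx} & \kappa_{xy} & \kappa_{xz} \\ \kappa_{yx} & \kappa_{yy} & \kappa_{yz} \\ \kappa_{zx} & \kappa_{zy} & \kappa_{zz} \end{pmatrix}, \quad (52)$$

where κyx = κxy, κzx = κxz and κzy = κyz. In direct analogy with the 2D problem, the integral solution involves a 3D Gaussian kernel with aspect tensor 2sκ, where κ is given by (52), and normalization constant $\gamma_{g3} = (4\pi s)^{3/2}|\boldsymbol{\kappa}|^{1/2}$ (cf. Eq. (18)). The relationships (49)–(51) hold for the 3D problem with ∇ now interpreted as the 3D gradient operator.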
To approximate a 2D or 3D anisotropic and homogeneous Gaussian correlation operator numerically, we can solve Eq. (45) with an explicit scheme in which the pseudo-time step is absorbed into κ,

$$\eta_M = (1 + \nabla\cdot\boldsymbol{\kappa}\nabla)^{M}\,\eta_0,$$

where from Eqs (50) and (51)

$$\boldsymbol{\kappa} = \frac{\mathbf{D}_g}{2M}, \quad (53)$$

and the operator 1 + ∇·κ∇ is understood to be in discrete form. If the off-diagonal elements of κ are zero, which can always be achieved by rotating the model coordinates to align them with the principal axes of the ellipse or ellipsoid implied by Eq. (48) (see, for example, Xu, 2005), then the 2D or 3D Gaussian operator can be replaced by a product of 1D Gaussian operators acting independently along each direction x, y and z. Ignoring boundary conditions, each 1D Gaussian operator can in turn be approximated by a 1D diffusion operator discretized using an M-step explicit scheme.
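The 1D building block mentioned above can be sketched as follows: M steps of η ← (1 + κδ²)η, with δ² the periodic three-point second difference (unit grid spacing), are applied to a unit impulse with κ = D²/(2M), and the normalized response is compared with exp(−r²/(2D²)). The grid size, D and M are illustrative; κ = 0.25 here satisfies the usual stability bound κ ≤ 1/2 of this scheme.

```python
import math


def explicit_diffusion_1d(field, kappa, steps):
    """Apply (1 + kappa * d2)^steps, d2 the periodic three-point second difference (unit spacing)."""
    n = len(field)
    eta = list(field)
    for _ in range(steps):
        eta = [eta[i] + kappa * (eta[(i + 1) % n] - 2.0 * eta[i] + eta[i - 1]) for i in range(n)]
    return eta


def normalized_impulse_response(n, daley, m):
    """Response to a unit impulse at the grid centre, scaled to one there; kappa = D^2 / (2M)."""
    kappa = daley ** 2 / (2.0 * m)
    centre = n // 2
    delta = [0.0] * n
    delta[centre] = 1.0
    eta = explicit_diffusion_1d(delta, kappa, m)
    return [v / eta[centre] for v in eta], centre


corr, centre = normalized_impulse_response(201, daley=5.0, m=50)   # kappa = 0.25 (stable)
max_err = max(abs(corr[centre + r] - math.exp(-r * r / (2.0 * 5.0 ** 2))) for r in range(-30, 31))
print(f"max deviation from exp(-r^2 / 2D^2): {max_err:.4f}")
```

With κ = 1/4 each step is the stencil (1/4, 1/2, 1/4), so the M-step response is a binomial distribution whose variance 2κM equals D²; this is why it approaches the target Gaussian as M grows.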
Extending these results to the d-dimensional implicit case, we can define a set of anisotropic and homogeneous Matérn correlation operators, with ν = M − d/2, as solutions to the following linear system (cf. Eq. (34)):

$$(1 - \nabla\cdot\boldsymbol{\kappa}\nabla)^{M}\,\eta = \eta_0. \quad (54)$$
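The associated correlation functions are given by

$$c(\tilde r) = \frac{2^{1-\nu}}{\Gamma(\nu)}\,\tilde r^{\,\nu} K_\nu(\tilde r), \quad (55)$$

where K_ν is the modified Bessel function of the second kind of order ν and r̃ is defined by Eq. (48). As for the Gaussian, we can derive the following relationships between the Hessian tensor of c and the diffusion tensor κ (see Appendix A):

$$\mathbf{H} = -\nabla\nabla^{\mathrm T} c\,\big|_{\mathbf{x}=\mathbf{x}'} = \frac{\boldsymbol{\kappa}^{-1}}{2M-d-2}, \qquad \mathbf{D} = \mathbf{H}^{-1} = (2M-d-2)\,\boldsymbol{\kappa}. \quad (56)$$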
The solution described in section 2.4 corresponds to the isotropic case κ = L²I with L² = $D_{w_d}^2$/(2M − d − 2); this requires ν > 1, so that the Matérn function is twice differentiable at the origin.
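Relationship (56) can be checked numerically in the same way as for the Gaussian. For a closed form, the sketch takes ν = 5/2, which corresponds to d = 3 and M = 4, so that c(r̃) = (1 + r̃ + r̃²/3)e^{−r̃} and (56) predicts H = κ⁻¹/(2M − d − 2) = κ⁻¹/3. The 3 × 3 diffusion tensor is an arbitrary symmetric positive-definite example, and the helper names are hypothetical.

```python
import math


def inverse(a):
    """Gauss-Jordan inverse of a small symmetric positive-definite matrix (no pivoting needed)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matern52(r, kappa_inv):
    """Matern function with nu = 5/2: (1 + r~ + r~^2 / 3) exp(-r~), with r~^2 = r^T kappa^{-1} r."""
    q = sum(r[i] * kappa_inv[i][j] * r[j] for i in range(3) for j in range(3))
    x = math.sqrt(q)
    return (1.0 + x + x * x / 3.0) * math.exp(-x)


def hessian_at_zero(c, dim, h=1e-3):
    """-grad grad^T c at r = 0, from central differences with step h along each axis."""
    H = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            def at(a, b):
                v = [0.0] * dim
                v[i] += a * h
                v[j] += b * h
                return c(v)
            H[i][j] = -(at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4.0 * h * h)
    return H


kappa = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]   # illustrative SPD tensor
kappa_inv = inverse(kappa)
M, DIM = 4, 3                                                  # nu = M - d/2 = 5/2
H = hessian_at_zero(lambda r: matern52(r, kappa_inv), DIM)
expected = [[v / (2 * M - DIM - 2) for v in row] for row in kappa_inv]
print(H, expected, sep="\n")
```

Because c depends on r only through r̃, the Hessian at the origin is always a scalar multiple of κ⁻¹; the scalar 1/(2ν − 2) = 1/(2M − d − 2) is the curvature of the Matérn function at zero.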
3.2. Inhomogeneity and anisotropy
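Analytical expressions for the correlation kernels of anisotropic diffusion operators in ℝᵈ with spatially varying diffusion tensors κ(x) are not known in general. Paciorek and Schervish (2006) describe a family of anisotropic and inhomogeneous correlation functions that generalize the standard isotropic and homogeneous Gaussian and Matérn families. These correlation functions have the form

$$C(\mathbf{x},\mathbf{x}') = P(\mathbf{x},\mathbf{x}')\,\exp\left(-\tfrac{1}{2}Q(\mathbf{x},\mathbf{x}')\right) \quad (57)$$

for the Gaussian-like function, and

$$C(\mathbf{x},\mathbf{x}') = P(\mathbf{x},\mathbf{x}')\,\frac{2^{1-\nu}}{\Gamma(\nu)}\,Q^{\nu/2}\,K_\nu\!\left(Q^{1/2}\right) \quad (58)$$

for the Matérn-like functions (ν = M − d/2), where

$$P(\mathbf{x},\mathbf{x}') = \frac{|\mathbf{A}(\mathbf{x})|^{1/4}\,|\mathbf{A}(\mathbf{x}')|^{1/4}}{\left|\tfrac12\left(\mathbf{A}(\mathbf{x})+\mathbf{A}(\mathbf{x}')\right)\right|^{1/2}}, \qquad Q(\mathbf{x},\mathbf{x}') = (\mathbf{x}-\mathbf{x}')^{\mathrm T}\left[\tfrac12\left(\mathbf{A}(\mathbf{x})+\mathbf{A}(\mathbf{x}')\right)\right]^{-1}(\mathbf{x}-\mathbf{x}'),$$

with A(x) and A(x′) denoting the (symmetric and positive-definite) aspect tensors at points x and x′, respectively. Equations (57) and (58), with A(x) ≈ 2sκ(x) for the Gaussian-like function and A(x) ≈ κ(x) for the Matérn-like functions, can be considered as the approximate kernels of the explicit and implicit forms of the anisotropic diffusion operator when the diffusion tensors κ(x) vary slowly and smoothly in space. This is illustrated in MW10, who provide examples in 1D comparing a two-step implicit-diffusion kernel and an inhomogeneous version of the SOAR function for different spatial distributions of the length-scale parameter.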
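The Matérn-like form can be illustrated in the simplest setting, 1D with ν = 3/2 (the SOAR shape, M = 2), where the aspect tensor reduces to A(x) = L(x)². The sketch builds the correlation matrix on a grid and checks that it is symmetric, has a unit diagonal and admits a Cholesky factorization, i.e. that it is numerically positive definite. The length-scale profile is a hypothetical example, and using √Q as the Matérn argument follows the convention A ≈ κ adopted in this section (other scalings of Q change only the effective length-scale).

```python
import math


def length_scale(x):
    """Hypothetical smoothly varying length-scale profile, in grid units (between 2 and 6)."""
    return 4.0 + 2.0 * math.sin(2.0 * math.pi * x / 60.0)


def inhomogeneous_soar(x, xp):
    """Nonstationary SOAR-like correlation in 1D with aspect 'tensors' A = L(x)^2."""
    a, ap = length_scale(x) ** 2, length_scale(xp) ** 2
    abar = 0.5 * (a + ap)
    prefactor = (a * ap) ** 0.25 / math.sqrt(abar)   # |A|^{1/4} |A'|^{1/4} / |Abar|^{1/2}
    rt = abs(x - xp) / math.sqrt(abar)               # sqrt(Q)
    return prefactor * (1.0 + rt) * math.exp(-rt)


def cholesky(a):
    """Plain Cholesky factorization; raises ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


grid = list(range(60))
C = [[inhomogeneous_soar(x, xp) for xp in grid] for x in grid]
low = cholesky(C)   # succeeds only if C is positive definite
```

The prefactor P is what keeps the construction positive definite when the length-scale varies; dropping it and simply inserting a local L(x) into the SOAR formula does not carry the same guarantee.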
4. Specifying the anisotropic tensor
The elements κxz, κzx, κyz and κzy of the 3D diffusion tensor account for anisotropy between the horizontal and vertical directions. The importance of these terms compared to the diagonal terms is related to the choice of vertical coordinate in the correlation model. In an ocean model, for example, a natural vertical coordinate is a hybrid coordinate involving a standard geopotential (z) coordinate in unstratified regions such as the mixed layer, an isopycnal (ρ) coordinate in strongly stratified regions, and a terrain-following (s) coordinate near the ocean bottom, the latter being particularly important in shallow coastal regions (Haidvogel and Beckmann, 1999). In this hybrid coordinate system, the flow is more naturally decoupled into ‘horizontal’ and ‘vertical’ processes. If the same coordinate system is adopted for a background-error correlation model then it is reasonable to assume, at least from a physical viewpoint, that the non-diagonal tensor elements κxz, κzx, κyz and κzy, and possibly κxy and κyx, are small and can be neglected. However, anisotropy in background-error correlations can also arise from the assimilation of data, especially when the data coverage is irregular. In general, the relative importance of the diagonal and non-diagonal terms of the tensor can only be determined after a thorough diagnostic study involving, for instance, the direct estimation of the elements of the Daley tensor.
Many ocean models used for global- and basin-scale circulation studies employ a z coordinate. WC01 illustrated how a standard isopycnal diffusion tensor used to parametrize mixing of unresolved processes in a z-coordinate ocean model could also be used to transform the coordinates of a background-error correlation model formulated as an explicit 3D diffusion operator. An analogous coordinate transformation was proposed within the framework of Optimal Interpolation by Balmaseda et al. (2008). While the isopycnal correlation model has appealing features, the implementation based on the explicit scheme proposed by WC01 is too expensive for routine applications since a prohibitively high number of iterations is required to maintain numerical stability in regions of strong isopycnal gradients. Moreover, the specification of the Daley length-scales must be performed in isopycnal space, which makes estimating them more difficult in a z-coordinate model. In the remainder of this section we explore alternative methods for defining anisotropic and inhomogeneous correlations, which involve estimating the Daley tensor directly in the model coordinate system.
4.1. Ensemble estimation methods
Given an estimate of the Daley tensor, the anisotropic response of the explicit diffusion operator can be calibrated using Eq. (53), which relates the Daley tensor of the Gaussian function to the diffusion tensor. Alternatively, the anisotropic response of the implicit diffusion operator can be calibrated using the third expression in (56), which relates the Daley tensors of the ν = M − d/2 Matérn functions to the diffusion tensor. Several authors have proposed methods for estimating the Daley tensor using perturbations from an ensemble of model states (Belo Pereira and Berre, 2006; Pannekoucke and Massart, 2008; Pannekoucke, 2009; Sato et al., 2009). The basic procedure is outlined below. Two of the methods will then be compared in idealized experiments using the diffusion equation. For simplicity, we focus on the 2D case.
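Assume that an ensemble of Ne model states is available and that the distribution of these states about their mean is a good approximation of the true probability density function (pdf) of the model-state (background) error ε. In variational assimilation, this pdf is assumed to be Gaussian and thus fully described by its mean E[ε] and covariance function

$$B(\mathbf{x},\mathbf{x}') = E\left[\big(\epsilon(\mathbf{x}) - E[\epsilon(\mathbf{x})]\big)\big(\epsilon(\mathbf{x}') - E[\epsilon(\mathbf{x}')]\big)\right], \quad (59)$$

where E[·] denotes the expectation operator. The associated correlation function C(x, x′) can be determined from the factorization

$$B(\mathbf{x},\mathbf{x}') = \sigma(\mathbf{x})\,C(\mathbf{x},\mathbf{x}')\,\sigma(\mathbf{x}'), \quad (60)$$

where σ(x) = B(x, x)^{1/2} is the standard deviation of ε at x. Assuming that C(x, x′) is at least twice differentiable, we can define the symmetric tensor

$$\mathbf{T}(\mathbf{x},\mathbf{x}') = \nabla\nabla'^{\mathrm T} C(\mathbf{x},\mathbf{x}'),$$

where ∇ = (∂/∂x ∂/∂y)ᵀ and ∇′ = (∂/∂x′ ∂/∂y′)ᵀ. The local correlation Hessian tensor is the value of T at x = x′. Assume further that, in a neighbourhood of x, C(x, x′) can be well approximated by a homogeneous function c(r), where r = x − x′ = (x − x′, y − y′)ᵀ = (r_x, r_y)ᵀ. Letting ∇_r = (∂/∂r_x ∂/∂r_y)ᵀ, we can then define

$$\mathcal{H}(\mathbf{r}) = -\nabla_{\mathbf r}\nabla_{\mathbf r}^{\mathrm T}\, c(\mathbf{r}) \quad (61)$$

such that T(x, x′) = 𝓗(r) (see Appendix B).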
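Let H(x) = T(x, x) denote the local correlation Hessian tensor at x. Pannekoucke and Massart (2008) and Pannekoucke (2009) assume a Gaussian form for the correlation function and then invert this function at each point x to estimate H(x) in terms of sample correlation estimates with neighbouring points. Belo Pereira and Berre (2006) propose an alternative method for estimating H(x), which does not require a prior assumption on the functional form of the correlations. Their method leads to the expression

$$\widehat{\mathbf H}(\mathbf x) = \frac{\widehat{E}\left[\nabla\epsilon\,(\nabla\epsilon)^{\mathrm T}\right] - \nabla\hat\sigma\,(\nabla\hat\sigma)^{\mathrm T}}{\hat\sigma^{2}}, \quad (62)$$

where ε denotes the ensemble perturbations about the ensemble mean, Ê[·] is the sample (ensemble) average and σ̂² = Ê[ε²] is the sample variance.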
The gradient terms can be estimated numerically using finite differences. When the correlation function is strictly homogeneous, the estimate (62) is equivalent to the constant tensor defined by Eq. (61) at r = 0 if the sampling operator is replaced by the expectation operator (see Appendix B). Note that the derivation of Eq. (62) relies only on the rather general assumptions of differentiability and local homogeneity of the correlation function. A specific assumption about the actual form of the correlation function is implied only when Eq. (62) is employed with a particular diffusion operator (explicit or M-step implicit scheme). Hristopulos (2002) and Chorti and Hristopulos (2008) describe a related approach in geostatistics, which involves estimating the aspect ratios and orientation angle required to transform the local covariance Hessian tensor into isotropic form.
Multidimensional Gaussian correlation operators can be applied efficiently using a combination of 1D recursive filters or 1D diffusion operators (Purser et al., 2003a, 2003b; MW10). For anisotropic Gaussian operators, the so-called triad (hexad) algorithm (Purser et al., 2003b; Purser, 2005) allows one to determine, from the aspect tensor of the 2D (3D) Gaussian function, the three (six) generalized grid-lines along which the 1D filters should be applied. Within that framework, various flow-dependent formulations of the aspect tensor have been proposed (De Pondeca et al., 2006; Liu et al., 2007, 2009; Sato et al., 2009). Of particular interest here is the hybrid formulation of Sato et al. (2009), where the inverse of the aspect tensor of the Gaussian function is defined as a linear combination of a ‘conventional term’ based on a quasi-isotropic, static formulation (Hᶜ) and an ‘ensemble term’ formed from the sample covariance of the gradient of the ensemble-generated perturbations ε, normalized by their sample variance σ̂²:

$$\widehat{\mathbf H}^{\mathrm s} = \alpha\,\mathbf{H}^{\mathrm c} + \beta\,\widehat{\mathbf H}^{\mathrm e}, \quad (63)$$

where

$$\widehat{\mathbf H}^{\mathrm e} = \frac{\widehat{E}\left[\nabla\epsilon\,(\nabla\epsilon)^{\mathrm T}\right]}{\hat\sigma^{2}}, \quad (64)$$
and α and β are weighting coefficients. Sato et al. (2009) provide a heuristic derivation of Eqs (63)–(64). They are equivalent to Eq. (62) when α = 0 and β = 1, and when the standard deviations are constant. As with the hybrid covariance formulations that involve combining static and ensemble-based expressions of the full covariance matrix (Wang et al., 2008), the static term in the aspect tensor is intended to give the estimate more robustness especially when the ensemble size is small, although accounting for it requires extra parameters that must be tuned empirically.
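The difference between (62) and (64) (with α = 0 and β = 1) can be made concrete in 1D, where the tensors reduce to scalars. The sketch applies both estimators to a hypothetical four-member ensemble ±√2σ(x)cos(kx), ±√2σ(x)sin(kx), which has zero mean, a standard deviation proportional to σ(x) and correlation cos(kr), so that the exact Hessian is k² everywhere. Derivatives use centred differences on a collocated grid, which is a simplification of the staggered scheme described in section 4.2.

```python
import math


def hessian_estimates(members, dx):
    """Return the 1D estimates of Eq. (62) and of Eq. (64) (alpha = 0, beta = 1) at interior points."""
    ne, n = len(members), len(members[0])
    mean = [sum(m[i] for m in members) / ne for i in range(n)]
    anom = [[m[i] - mean[i] for i in range(n)] for m in members]
    var = [sum(a[i] ** 2 for a in anom) / (ne - 1) for i in range(n)]
    sig = [math.sqrt(v) for v in var]
    eq62, eq64 = [], []
    for i in range(1, n - 1):
        var_grad = sum(((a[i + 1] - a[i - 1]) / (2.0 * dx)) ** 2 for a in anom) / (ne - 1)
        dsig = (sig[i + 1] - sig[i - 1]) / (2.0 * dx)
        eq62.append((var_grad - dsig ** 2) / var[i])   # Eq. (62): removes the std-gradient term
        eq64.append(var_grad / var[i])                 # Eq. (64): gradient variance / variance only
    return eq62, eq64


# Hypothetical ensemble with correlation cos(k r): the exact Hessian is k^2.
K, DX = 2.0, 0.01
xs = [i * DX for i in range(629)]
sigma = [1.0 + 0.5 * math.sin(x) for x in xs]
ensemble = []
for sign in (1.0, -1.0):
    for trig in (math.cos, math.sin):
        ensemble.append([sign * math.sqrt(2.0) * s * trig(K * x) for s, x in zip(sigma, xs)])

h62, h64 = hessian_estimates(ensemble, DX)
print(f"max |Eq. (62) - k^2| = {max(abs(h - K * K) for h in h62):.2e}")
print(f"max [Eq. (64) - Eq. (62)] = {max(b - a for a, b in zip(h62, h64)):.3f}")
```

Both estimators share the same sample normalization, which cancels in the ratios. Their difference is exactly (∂σ̂/∂x)²/σ̂², so (64) is biased upward wherever the standard deviation varies in space, which is the mechanism behind the degradation of (64) when the variances are spatially varying.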
Finally, with small ensemble sizes it can be advantageous to apply a local spatial averaging or filtering operator to the estimated variances and covariances in order to reduce the effects of sampling error (see, for example, Berre and Desroziers (2010) for a thorough review of recent work in this area). Letting F denote a particular filtering operator, the filtered estimates of the inverse tensor are then given by Eqs (62) and (63)–(64), with each sample variance and covariance replaced by its filtered counterpart, obtained by applying F to it.
4.2. Numerical experiments
In this section we perform idealized experiments to evaluate and compare the effectiveness of Eq. (62) and Eqs (63)–(64) for estimating the parameters of an anisotropic tensor. For simplicity, we focus on the 2D anisotropic diffusion problem and the solution algorithm based on the explicit scheme. Furthermore, for the tensor estimated using Eqs (63)–(64), only the special case α = 0 and β = 1 is considered.
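The experimental design is as follows. First, we define the ‘true’ covariance matrix of the problem as

$$\mathbf{B} = \boldsymbol{\Sigma}\,\boldsymbol{\Gamma}^{1/2}\,\mathbf{L}\,\boldsymbol{\Gamma}^{1/2}\,\boldsymbol{\Sigma},$$

where L is the M-step explicit diffusion operator (1 + ∇·κ∇)^M discretized using a standard centred finite-difference scheme on a uniform grid, Γ = Γ^{1/2}Γ^{1/2} is a diagonal matrix of normalization factors, and Σ is a diagonal matrix of standard deviations σ. With constant parameters and ignoring the influence of boundaries, B defines a Gaussian covariance matrix.

Next, a sample of Ne spatially uncorrelated random vectors w_l, l = 1, …, Ne, is produced on the grid, where each w_l is drawn from a Gaussian distribution with E[w_l] = 0 and E[w_l w_lᵀ] = I. Each vector is then transformed into a new vector ε_l such that E[ε_l ε_lᵀ] = B. This is done using the ‘square-root’ of the B-operator,

$$\boldsymbol{\epsilon}_l = \boldsymbol{\Sigma}\,\boldsymbol{\Gamma}^{1/2}\,\mathbf{L}^{1/2}\,\mathbf{w}_l,$$

where L = L^{1/2}L^{1/2}, the exponent 1/2 implying M/2 iterations of the explicit diffusion operator (with M taken to be even). The sample covariance matrix constructed from the ensemble of vectors ε_l provides an estimate of the true covariance matrix:

$$\widehat{\mathbf B} = \frac{1}{N_e - 1}\sum_{l=1}^{N_e}\left(\boldsymbol{\epsilon}_l - \bar{\boldsymbol{\epsilon}}\right)\left(\boldsymbol{\epsilon}_l - \bar{\boldsymbol{\epsilon}}\right)^{\mathrm T}, \quad (65)$$

where ε̄ is the ensemble mean.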
Our interest here is not to reconstruct the full covariance matrix from (65) but rather, with the help of Eq. (62) or Eq. (64), to reconstruct the anisotropic tensor used in L to generate the ensemble. Indeed, the sampling errors resulting from estimating the local anisotropic tensor can be expected to be much smaller than those resulting from estimating the full covariance matrix. For the 2D problem, the tensor estimation requires, at each grid point, sample estimates of the standard deviation of the ensemble perturbations and of the three independent tensor elements involving their gradient, i.e. a total of 4N elements, where N is the total number of grid points. This is much smaller than the (N² + N)/2 independent elements required to determine the full covariance matrix.
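A reduced, 1D version of this experimental design can be sketched in a few lines. The sketch generates ε_l = ΣΓ^{1/2}L^{1/2}w_l with L^{1/2} taken as M/2 explicit steps, obtains the exact normalization from the diagonal of L (on a homogeneous periodic grid one impulse response suffices), and estimates the Hessian with the 1D analogue of Eq. (62) using differences at the half-integer points. The grid, D, M, σ(x), Ne and the random seed are illustrative choices; unit grid spacing and periodic boundaries replace the Neumann conditions used in the experiments.

```python
import math
import random

N, DALEY, M, NE = 100, 4.0, 32, 400          # illustrative grid size, D, steps, ensemble size
KAPPA = DALEY ** 2 / (2.0 * M)                # 1D form of kappa = D^2 / (2M); 0.25 here


def diffuse(field, steps):
    """Apply (1 + KAPPA * d2)^steps with a periodic three-point Laplacian (unit spacing)."""
    eta = list(field)
    for _ in range(steps):
        eta = [eta[i] + KAPPA * (eta[(i + 1) % N] - 2.0 * eta[i] + eta[i - 1]) for i in range(N)]
    return eta


# Exact normalization: all diagonal elements of L are equal here, so gamma = 1 / L_ii.
impulse = [0.0] * N
impulse[0] = 1.0
gamma = 1.0 / diffuse(impulse, M)[0]

# Hypothetical standard deviations and white-noise inputs w_l ~ N(0, I).
sigma = [1.0 + 0.5 * math.sin(2.0 * math.pi * i / N) for i in range(N)]
rng = random.Random(1234)
members = []
for _ in range(NE):
    w = [rng.gauss(0.0, 1.0) for _ in range(N)]
    half = diffuse(w, M // 2)                                                  # L^{1/2} w
    members.append([sigma[i] * math.sqrt(gamma) * half[i] for i in range(N)])  # Sigma Gamma^{1/2} L^{1/2} w


def estimate_hessian(ens):
    """1D analogue of Eq. (62), evaluated at the half-integer points i + 1/2."""
    ne = len(ens)
    mean = [sum(m[i] for m in ens) / ne for i in range(N)]
    anom = [[m[i] - mean[i] for i in range(N)] for m in ens]
    var = [sum(a[i] ** 2 for a in anom) / (ne - 1) for i in range(N)]
    sig = [math.sqrt(v) for v in var]
    est = []
    for i in range(N):
        j = (i + 1) % N
        var_grad = sum((a[j] - a[i]) ** 2 for a in anom) / (ne - 1)
        var_half = 0.5 * (var[i] + var[j])          # variance interpolated to i + 1/2
        est.append((var_grad - (sig[j] - sig[i]) ** 2) / var_half)
    return est


h = estimate_hessian(members)
h_mean = sum(h) / N
print(f"domain-mean Hessian estimate {h_mean:.4f} vs 1/D^2 = {1.0 / DALEY ** 2:.4f}")
```

On a grid the estimator targets the discrete quantity 2(1 − c(Δx))/Δx² rather than 1/D²; for these settings c(1) = 32/33, so the target is about 3% below 1/D² = 0.0625.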
The numerical experiments are performed in a rectangular domain on a 2D grid x_{i,j} = (x_i, y_j), where x_i = iΔx and y_j = jΔy. Here, Δx = Δy = 1 unit and N = 200 × 60, and thus the effective size of the B matrix is (1.2 × 10⁴) × (1.2 × 10⁴). Neumann boundary conditions are employed at the solid walls located at the domain edges. As a result, the implied correlation function near the boundary is slightly modified from the target Gaussian (MW10).
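As a worked check of the element counts quoted above, for the grid used here:

$$N = 200 \times 60 = 1.2\times 10^{4}, \qquad 4N = 4.8\times 10^{4}, \qquad \frac{N^{2}+N}{2} = \frac{1.44\times 10^{8} + 1.2\times 10^{4}}{2} \approx 7.2\times 10^{7},$$

so estimating the local tensor requires roughly 1.5 × 10³ times fewer sample estimates than estimating the full covariance matrix.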
At each grid point x_{i,j}, the ‘true’ diffusion tensor κ_{i,j} for L is defined according to (cf. Eq. (53))

$$\boldsymbol{\kappa}_{i,j} = \frac{\mathbf{D}_{i,j}}{2M}, \quad (66)$$

where D_{i,j} is the local Daley tensor, which is formulated as

$$\mathbf{D}_{i,j} = \mathbf{R}_{i,j}\,\boldsymbol{\Lambda}_{i,j}\,\mathbf{R}_{i,j}^{\mathrm T},$$

with Λ_{i,j} being a diagonal matrix and R_{i,j} a rotation matrix (R_{i,j}ᵀR_{i,j} = I). The elements (Dxx)_{i,j}, (Dyy)_{i,j} and (Dxy)_{i,j} = (Dyx)_{i,j} of D_{i,j} are thus determined by the two diagonal elements of Λ_{i,j} and by the rotation angle θ_{i,j} of R_{i,j}. For the experiments described here, θ_{i,j} = θ is constant, while the diagonal elements of Λ_{i,j} are specified as a simple oscillatory function of the spatial coordinates x_{i,j}, with X = Y = 20. The variances σ²_{i,j} are specified in a similar way. Experiments are performed with different values of θ and of the other parameters of these functions (see Table 2).
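The construction of a local Daley tensor from its principal values and rotation angle, followed by the conversion to a diffusion tensor with Eq. (66), can be sketched as follows. The convention R = [[cos θ, −sin θ], [sin θ, cos θ]] with D = RΛRᵀ is an assumption (the transposed convention flips the sign of Dxy); λ1 and λ2 denote the diagonal elements of Λ (squared length-scales along the principal axes), and the numbers are illustrative.

```python
import math


def daley_tensor(lam1, lam2, theta):
    """D = R diag(lam1, lam2) R^T, with R a counter-clockwise rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    dxx = lam1 * c * c + lam2 * s * s
    dyy = lam1 * s * s + lam2 * c * c
    dxy = (lam1 - lam2) * s * c
    return [[dxx, dxy], [dxy, dyy]]


def diffusion_tensor(daley, m):
    """Eq. (66): kappa = D / (2M) for the M-step explicit scheme."""
    return [[v / (2.0 * m) for v in row] for row in daley]


def hessian(daley):
    """Correlation Hessian tensor H = D^{-1} (2x2 inverse of a symmetric matrix)."""
    (a, b), (_, d) = daley
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]


D = daley_tensor(36.0, 9.0, math.pi / 4.0)   # illustrative principal values
KAPPA = diffusion_tensor(D, 50)
H = hessian(D)
print(D, KAPPA, H, sep="\n")
```

With θ = π/4 the diagonal elements are equal and the off-diagonal elements are non-zero, which is why Hxy has a non-zero true RMS in Table 2 only in the rotated case.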
Table 2. Bias and RMSE of the estimates of the correlation Hessian tensor elements (Hxx, Hyy, Hxy) using expressions (62) and (64). The second column lists the parameter settings in the ‘true’ covariance model and the third column indicates the choice of ensemble size (Ne) and spatial filtering scale (Navg) used in the estimation process. The RMS of the true values of (Hxx, Hyy, Hxy) are (5.3, 5.3, 0) × 10⁻² when θ = 0, and (5.0, 5.0, 1.7) × 10⁻² when θ = π/4.

| Exp | Parameter settings | (Ne, Navg) | Bias (Hxx, Hyy, Hxy) (× 10⁻²) | RMSE (Hxx, Hyy, Hxy) (× 10⁻²) |
|---|---|---|---|---|
| 1 | (0, 3, 6, 1, 1) | (100, 0) | (0.06, 0.20, 0.0) | (1.2, 1.6, 0.46) |
| 2 | (0, 3, 6, 1, 1) | (100, 0) | (−0.01, 0.15, 0.0) | (1.2, 1.6, 0.47) |
| 3 | (0, 3, 6, 1, 1) | (10, 0) | (0.33, 0.39, 0.07) | (4.6, 4.8, 2.1) |
| 4 | (0, 3, 6, 1, 1) | (10, 0) | (0.92, 0.99, 0.10) | (4.9, 5.2, 2.2) |
| 5 | (0, 3, 6, 1, 5) | (100, 0) | (−0.07, −0.22, 0.0) | (1.2, 1.6, 0.47) |
| 6 | (0, 3, 6, 1, 5) | (100, 0) | (1.2, 1.1, −0.01) | (2.5, 2.8, 1.2) |
| 7 | (π/4, 3, 6, 1, 5) | (100, 0) | (−0.22, −0.40, 0.01) | (1.4, 1.7, 0.85) |
| 8 | (π/4, 3, 6, 1, 5) | (100, 0) | (1.1, 0.88, 0.0) | (2.6, 2.7, 1.4) |
| 9 | (π/4, 3, 6, 1, 5) | (10, 0) | (0.17, 0.10, −0.02) | (4.0, 4.5, 2.3) |
| 10 | (π/4, 3, 6, 1, 5) | (10, 1) | (0.33, 0.28, 1.9) | (3.3, 3.6, 1.9) |
| 11 | (π/4, 3, 6, 1, 5) | (10, 3) | (0.51, 0.45, −0.36) | (2.0, 2.2, 2.8) |
| 12 | (π/4, 3, 6, 1, 5) | (100, 1) | (0.02, 0.14, −0.07) | (1.2, 1.4, 0.86) |
| 13 | (π/4, 3, 6, 1, 5) | (100, 3) | (0.31, 0.19, −0.32) | (0.98, 1.0, 1.1) |
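The normalization factors γ_{i,j} of the diagonal matrix Γ are approximately given by the expression

$$\gamma_{i,j} \approx \frac{2\pi\,|\mathbf{D}_{i,j}|^{1/2}}{\Delta x\,\Delta y} = \frac{4\pi M\,|\boldsymbol{\kappa}_{i,j}|^{1/2}}{\Delta x\,\Delta y}. \quad (67)$$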
This approximation was used by Pannekoucke and Massart (2008), for example, and is reasonable if the diffusion tensor varies in space on a scale much larger than the local correlation scale and with a proper treatment of the boundary conditions (MW10). The factors can be estimated to a higher accuracy using more refined analytical approximations (Purser et al., 2003b; Purser, 2008a, 2008b; MW10; Yaremchuk and Carrier, 2012) or randomization methods (WC01; Yaremchuk and Carrier, 2012). They can also be computed exactly using the δ-function method (WC01; MW10). In this idealized study, we employ the exact normalization method in order to avoid introducing a bias in the ensemble perturbations and thus complicating the interpretation of the results. In practice, however, the exact computation is generally not affordable and hence the representation of covariances using the diffusion equation will also be affected by approximations in the normalization factors. The errors that can result from using the approximate expression (67) are illustrated in the experiments below.
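The effect of approximating the normalization factors can be explored in a 1D analogue. The sketch uses a symmetric flux-form explicit scheme with a spatially varying coefficient κ_{i+1/2}, computes the exact factors γ_i = 1/L_ii with one impulse per grid point (the δ-function method), and compares them with the 1D counterpart of the local Gaussian approximation, γ_i ≈ √(2π)D_i with D_i² = 2Mκ_i and unit grid spacing. The 1D formula, the κ profile and the periodic boundaries are illustrative assumptions, not Eq. (67) itself.

```python
import math

N, M = 80, 40                      # illustrative grid size and number of explicit steps


def kappa_face(i, amplitude):
    """Hypothetical diffusion coefficient at face i + 1/2 (periodic grid); amplitude = 0 is homogeneous."""
    return 0.25 * (1.0 + amplitude * math.cos(2.0 * math.pi * (i + 0.5) / N))


def apply_l(field, amplitude):
    """M steps of the flux-form explicit scheme; the resulting matrix L is symmetric."""
    k = [kappa_face(i, amplitude) for i in range(N)]
    eta = list(field)
    for _ in range(M):
        flux = [k[i] * (eta[(i + 1) % N] - eta[i]) for i in range(N)]   # flux at i + 1/2
        eta = [eta[i] + flux[i] - flux[i - 1] for i in range(N)]
    return eta


def exact_gamma(amplitude):
    """Delta-function method: gamma_i = 1 / L_ii, one impulse per grid point."""
    gam = []
    for i in range(N):
        e = [0.0] * N
        e[i] = 1.0
        gam.append(1.0 / apply_l(e, amplitude)[i])
    return gam


def approx_gamma(amplitude):
    """Local 1D Gaussian approximation sqrt(2 pi) D_i, with D_i^2 = 2 M kappa_i (unit spacing)."""
    out = []
    for i in range(N):
        kap = 0.5 * (kappa_face(i, amplitude) + kappa_face(i - 1, amplitude))   # kappa at point i
        out.append(math.sqrt(2.0 * math.pi * 2.0 * M * kap))
    return out


if __name__ == "__main__":
    for amp in (0.0, 0.5):
        ratios = [a / e for a, e in zip(approx_gamma(amp), exact_gamma(amp))]
        print(f"amplitude {amp}: approx/exact in [{min(ratios):.3f}, {max(ratios):.3f}]")
```

In the homogeneous case the two agree to within a fraction of a percent; varying the amplitude of κ shows how the discrepancy depends on the spatial variation of the diffusion coefficient. The exact method costs one application of L per grid point, which is why it is rarely affordable in realistic systems.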
The estimation of the tensor via the statistical relationships (62) and (64) is achieved using centred finite differences. Estimates of the first derivatives of the error and its standard deviation produce values at the interfaces of the grid cells, i.e. at the half-integer points (i + 1/2, j) for the x-component and (i, j + 1/2) for the y-component. The sample variance of these quantities is computed directly at these points to evaluate the numerator in the expressions for the diagonal elements of the tensor. The off-diagonal elements involve estimates of the cross-product of the x- and y-components of the derivatives. This requires interpolation of one of the component derivatives to the point where the other component derivative is defined. To estimate the cross-product at (i + 1/2, j), the x-component derivative that is defined there is multiplied by an estimate of the y-component derivative obtained by averaging its values from the four surrounding points (i, j + 1/2), (i + 1, j + 1/2), (i, j − 1/2) and (i + 1, j − 1/2), and vice versa for estimating the cross-product at (i, j + 1/2). To compute the denominator in the expressions for the tensor elements, the sample variance of the error is interpolated from (i, j) points to (i + 1/2, j) or (i, j + 1/2) points. In order to use the estimated tensor elements in the diffusion equation, the elements are first averaged to the (i, j) points, and the off-diagonal elements are then averaged to enforce symmetry. The estimated tensor is then inverted at each point and used with the relation (66) to define the diffusion tensor at each point. Finally, interpolation is used to define the values of the tensor elements at the half-integer points (i + 1/2, j) or (i, j + 1/2) where they are required by the centred-difference formulation of ∇·κ∇.
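The staggered procedure can be sketched for a single interior grid point. The code below computes the three Hessian elements on the cell faces as described (Δx = Δy = 1), averages them to (i, j) and symmetrizes the off-diagonal element; it stops before the inversion and the interpolation back to the faces. The test ensemble ±√2σ cos(k·x), ±√2σ sin(k·x) and the function name are hypothetical, chosen because the exact correlation Hessian of that ensemble is kkᵀ.

```python
import math


def interior_hessian(members, i, j):
    """Estimate (Hxx, Hyy, Hxy) at an interior point (i, j) with staggered differences (dx = dy = 1)."""
    ne, nx, ny = len(members), len(members[0]), len(members[0][0])
    mean = [[sum(m[p][q] for m in members) / ne for q in range(ny)] for p in range(nx)]
    anom = [[[m[p][q] - mean[p][q] for q in range(ny)] for p in range(nx)] for m in members]

    def var(p, q):                    # sample variance at (p, q)
        return sum(a[p][q] ** 2 for a in anom) / (ne - 1)

    def sig(p, q):
        return math.sqrt(var(p, q))

    def ex(a, p, q):                  # x-derivative of one member at (p + 1/2, q)
        return a[p + 1][q] - a[p][q]

    def ey(a, p, q):                  # y-derivative of one member at (p, q + 1/2)
        return a[p][q + 1] - a[p][q]

    def hxx(p, q):                    # Hxx at (p + 1/2, q)
        num = sum(ex(a, p, q) ** 2 for a in anom) / (ne - 1) - (sig(p + 1, q) - sig(p, q)) ** 2
        return num / (0.5 * (var(p, q) + var(p + 1, q)))

    def hyy(p, q):                    # Hyy at (p, q + 1/2)
        num = sum(ey(a, p, q) ** 2 for a in anom) / (ne - 1) - (sig(p, q + 1) - sig(p, q)) ** 2
        return num / (0.5 * (var(p, q) + var(p, q + 1)))

    def hxy(p, q):                    # at (p + 1/2, q): y-derivatives averaged from four faces
        faces = [(p, q), (p + 1, q), (p, q - 1), (p + 1, q - 1)]
        cov = sum(ex(a, p, q) * sum(ey(a, u, v) for u, v in faces) / 4.0 for a in anom) / (ne - 1)
        sy = sum(sig(u, v + 1) - sig(u, v) for u, v in faces) / 4.0
        return (cov - (sig(p + 1, q) - sig(p, q)) * sy) / (0.5 * (var(p, q) + var(p + 1, q)))

    def hyx(p, q):                    # at (p, q + 1/2): x-derivatives averaged from four faces
        faces = [(p, q), (p, q + 1), (p - 1, q), (p - 1, q + 1)]
        cov = sum(ey(a, p, q) * sum(ex(a, u, v) for u, v in faces) / 4.0 for a in anom) / (ne - 1)
        sx = sum(sig(u + 1, v) - sig(u, v) for u, v in faces) / 4.0
        return (cov - (sig(p, q + 1) - sig(p, q)) * sx) / (0.5 * (var(p, q) + var(p, q + 1)))

    h_xx = 0.5 * (hxx(i - 1, j) + hxx(i, j))      # average the face values to (i, j)
    h_yy = 0.5 * (hyy(i, j - 1) + hyy(i, j))
    h_xy = 0.5 * (hxy(i - 1, j) + hxy(i, j))
    h_yx = 0.5 * (hyx(i, j - 1) + hyx(i, j))
    return h_xx, h_yy, 0.5 * (h_xy + h_yx)        # symmetrize the off-diagonal element


KX, KY = 0.10, 0.05                               # hypothetical wavenumbers: exact H = k k^T
NX = NY = 20
sigma = [[1.0 + 0.2 * math.sin(0.1 * p) * math.cos(0.05 * q) for q in range(NY)] for p in range(NX)]
ensemble = []
for sign in (1.0, -1.0):
    for trig in (math.cos, math.sin):
        ensemble.append([[sign * math.sqrt(2.0) * sigma[p][q] * trig(KX * p + KY * q)
                          for q in range(NY)] for p in range(NX)])
hxx_est, hyy_est, hxy_est = interior_hessian(ensemble, 10, 10)
print(hxx_est, hyy_est, hxy_est, "vs", KX * KX, KY * KY, KX * KY)
```

Because both factors of each cross-product are centred at the same face, the estimate is second-order accurate; for constant σ the discrete off-diagonal value is exactly sin(kx)sin(ky) for this ensemble.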
Table 2 summarizes the results from several experiments with different parameter settings. Three cases are considered. In the first case, the principal axes of the anisotropic correlations are aligned with the grid-lines and the variance is constant: (0, 3, 6, 1, 1). The second case extends the first case by allowing the variances to vary in space: (0, 3, 6, 1, 5). Finally, the third case extends the second case by rotating the principal axes of the anisotropic correlations relative to the grid-lines: (π/4, 3, 6, 1, 5). The quality of the estimation is measured in terms of the domain-averaged bias and root-mean-square error (RMSE) of the estimates of the elements (Hxx, Hyy, Hxy). (The results for the estimates of Hyx are not given since they are almost identical to those of Hxy.) For reference, the domain-averaged RMS of the true values of (Hxx, Hyy, Hxy) are (5.3, 5.3, 0) × 10⁻² when θ = 0, and (5.0, 5.0, 1.7) × 10⁻² when θ = π/4.
With constant variances, Eq. (62) and Eq. (64) produce similar results with a relatively large ensemble (Ne = 100), as one might expect since the true variances are constant (Exps 1–2 in Table 2). Interestingly, however, Eq. (62) is noticeably more accurate than Eq. (64) with a small ensemble (Ne = 10; Exps 3–4). When the variances vary in space ((0, 3, 6, 1, 5) and (π/4, 3, 6, 1, 5)), the errors for Eq. (64) become significantly larger, whereas those for Eq. (62) remain similar to those of the constant-variance case (Exps 5–8). This illustrates the importance of the second term in Eq. (62).
Local spatial filtering is beneficial for reducing the RMSE especially when the ensemble size is small (Ne = 10; Exps 9–11). With the raw ensemble estimates (Navg = 0), the RMSE is comparable to the RMS of the true signal (Exps 3–4, 9). Here, a very simple filtering procedure has been used in which the estimate at xi,j is obtained by averaging estimates at points within Navg grid points of (i,j) where in the examples considered Navg = 1 or 3. This increases the size of the averaging sample at each point to Neff = (2Navg + 1)2 × Ne, except near the boundary where fewer points are used in the averaging process. While increasing the value of Navg reduces the RMSE, it does so at the expense of increasing the bias in the estimates. With a larger ensemble (Ne = 100), good results are obtained when a ‘light’ filtering is applied (Navg = 1), with both the bias and RMSE being reduced relative to the no filtering case for all but the off-diagonal elements which are very slightly degraded (Exps 7, 12–13). The filter in this example is very simple and the choice of filtering scale is somewhat ad hoc. More sophisticated (objective) filters could be expected to perform better as discussed by Raynaud et al. (2009) and Berre and Desroziers (2010), and recently by Raynaud and Pannekoucke (2012) within the context of filters based on diffusion.
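The simple local averaging described above can be sketched as follows: each estimate is replaced by the mean of the values within Navg grid points in each direction, so that interior points average (2Navg + 1)² values and points near the boundary use only the neighbours that exist. The function names and the list-of-lists layout are illustrative.

```python
def box_average(field, navg):
    """Replace each value by the mean over the (2*navg + 1)^2 window, truncated at the domain edges."""
    nx, ny = len(field), len(field[0])
    out = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            total, count = 0.0, 0
            for p in range(max(0, i - navg), min(nx, i + navg + 1)):
                for q in range(max(0, j - navg), min(ny, j + navg + 1)):
                    total += field[p][q]
                    count += 1
            out[i][j] = total / count
    return out


def effective_sample_size(nx, ny, i, j, navg, ne):
    """Number of ensemble-based values entering the average at (i, j): window size times Ne."""
    wx = min(nx, i + navg + 1) - max(0, i - navg)
    wy = min(ny, j + navg + 1) - max(0, j - navg)
    return wx * wy * ne


if __name__ == "__main__":
    print(effective_sample_size(200, 60, 100, 30, 1, 10))   # 90 = (2*1 + 1)^2 * 10
    print(effective_sample_size(200, 60, 0, 0, 1, 10))      # 40: corner point, truncated window
```

Such a box filter is linear and leaves linear trends unchanged in the interior, but it smooths genuine small-scale structure in the tensor as well as sampling noise, which is consistent with the trade-off between RMSE and bias noted above.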
The correlations obtained using the ‘true’ tensor with the parameter settings (π/4, 3, 6, 1, 5) are illustrated at selected points in Figure 5(a). Figure 5(b) shows the corresponding correlations estimated directly from the sample covariance matrix (Eq. (65)) with a 100-member ensemble. Sampling errors are large and manifest themselves as spurious non-local correlations. In contrast, the diffusion-based correlation model is localized by construction. The correlations resulting from estimating the diffusion tensor from the 100-member ensemble are shown in Figure 5(c). The estimated correlations are in good agreement with the true correlations and notably capture prominent anisotropic features such as the rotation of the principal axes relative to the grid lines. The third correlation pattern from the left boundary is computed with respect to a point located midway between the maximum and minimum values of Dxx, Dyy and σ², where the spatial derivatives of these parameters are largest and thus where one would expect the local homogeneity assumption to be least valid. At this location the errors are largest, reaching up to 20% (Figure 5(d)). The breakdown of homogeneity also affects the accuracy of the approximate expression (67) for the normalization factors. This can be seen in Figure 5(e) and (f), which show the estimated correlations and associated error when approximate normalization factors from Eq. (67) are used in place of the exact factors used to produce Figure 5(c). The amplitude of the error now reaches 50% for the third correlation pattern (the colour bar is truncated at 30%) and is noticeably larger for the other correlation patterns as well. Finally, Figure 5(g) and (h) show the correlations and associated errors obtained using the tensor estimated with Ne = 10 combined with local spatial averaging. While these correlations are not as accurate as those obtained with Ne = 100, they are still reasonable approximations: the maximum error for the third correlation pattern is approximately 25%, and reaches 36% when the approximate normalization factors are used (not shown).
5. Summary and conclusions
Accounting for general background-error correlations effectively and efficiently is a considerable challenge in geophysical data assimilation. In VDA, general background-error correlation models can be defined using differential operators constructed numerically from the explicit or implicit solution of a diffusion equation. Theoretical results underpinning the diffusion approach to correlation modelling were reviewed in this paper. First, the isotropic, constant-coefficient diffusion problem was considered both on the sphere and in the d-dimensional Euclidean space ℝᵈ. The covariance functions (kernels) of the integral solution operators implied by explicit and implicit diffusion in these spaces were identified. The solutions on the sphere were shown to be well approximated by the solutions on ℝ² for scales of interest in ocean and meteorological data assimilation. Expressions relating the diffusion model parameters to the parameters that control the length-scale and amplitude (normalization factor) of the covariance function were also given. These results provided the basis for constructing more general correlation operators via anisotropic diffusion, which was the focus of the second part of the paper.
Anisotropic diffusion was considered in ℝᵈ. The anisotropic diffusion problem is characterized by a diffusion tensor that controls the direction of the covariance response, as well as its scale and amplitude. Solutions to the anisotropic, constant-tensor diffusion problem are integral operators that involve covariance kernels with the same basic form as those of the isotropic, constant-coefficient problem. With the explicit scheme, these functions are approximately Gaussian, whereas with the implicit scheme they are members of the larger Matérn family (e.g. in ℝ they are AR functions). For the anisotropic functions, distance is defined by a norm whose metric is given by the inverse of the diffusion tensor. This metric can in turn be related to the correlation Hessian tensor, which is defined by the tensor of second derivatives of the correlation function evaluated at zero separation. The importance of this tensor is that it can be related to quantities that can be estimated directly from ensemble statistics. The inverse of the correlation Hessian tensor was referred to as the Daley tensor in this paper in view of its close connection to the conventional Daley length-scale in the isotropic case.
Ensemble data assimilation methods can be used to provide flow-dependent estimates of the background-error covariances. In realistic applications, the number of independent background-error covariances that need to be estimated is huge and the number of ensemble members that can be affordably run is very limited. Methods are then required to synthesize the ensemble-covariance information to avoid manipulating huge covariance matrices, on the one hand, and to reduce the effects of sampling error, on the other.
The correlation information in the ensemble can be synthesized using a diffusion model with an anisotropic and spatially varying tensor.† Procedures for estimating the local Hessian tensor (which in turn can be related to the diffusion tensor) from ensemble perturbations were described and compared in idealized numerical experiments. The method of Belo Pereira and Berre (2006), which assumes local homogeneity of the correlation function but accounts for spatially varying variances, was shown to work particularly well, and is well suited for the automated computations required in a cycled ensemble data assimilation system. Local spatial filtering of the tensor was critical with small ensemble sizes (order 10), but the raw ensemble with 100 members gave good results without spatial filtering in our example. In general, a carefully designed objective filter would be beneficial in order to maximize the signal-to-noise ratio of the ensemble-estimates of the tensor elements in a similar way that it has been shown to be beneficial to the ensemble estimation of background-error variances (Raynaud et al., 2009; Berre and Desroziers, 2010).
In realistic applications, the numerical stability condition associated with explicit diffusion schemes can severely limit their computational efficiency. In particular, many iterations are likely to be needed with general anisotropic and inhomogeneous diffusion models that employ ensemble-estimated tensors. Implicit diffusion schemes are more robust but require solving a large linear system for which efficient methods that are well-suited to massively parallel machines are required. This important practical aspect of the problem was not addressed in this paper and should be the subject of further research.
Financial support from the French National Research Agency (ANR) COSINUS programme (VODA project, no. ANR-08-COSI-016), the RTRA STAE foundation (ADTAO project), the European Framework Programme 7 (COMBINE project, GA 226520), and the French LEFE-ASSIM programme is gratefully acknowledged. This work benefited from discussions with Loik Berre, Serge Gratton, Sébastien Massart, Olivier Pannekoucke and Andrea Piacentini. The anonymous reviewers and the Associate Editor Martin Leutbecher provided many helpful remarks for improving the presentation of the paper.
Appendix A. The Daley tensor of the 2D implicit-diffusion kernels
In this appendix we show that the expression for the Daley tensor of the 2D implicit-diffusion kernels (Eq. (55) with d = 2) is related to the diffusion tensor by Eq. (56). For clarity of notation, the subscript and superscript of the correlation function will be dropped hereafter. The relationships between the Daley and diffusion tensors for the implicit-diffusion kernels in higher dimensions and for the Gaussian function (Eqs (49)–(51)) are straightforward to verify following the basic procedure outlined here.
From the chain rule, the three independent elements of the outer product in the first equation of (56) can be written as
Expressing Eq. (55) with d = 2 as
where $\alpha_M = 2^{2-M}/(M-2)!$ and M > 2, and using the following recurrence relation for the modified Bessel functions of the second kind of integer order n (Eq. 9.6.26 of Abramowitz and Stegun, 1970),
where $K_n = K_n(\tilde r)$, allows us to write
The inverse of the symmetric diffusion tensor (46) can be written as
where τ = 1/(μ − 1) and μ = κxxκyy/κxy². In expanded form the nondimensional distance measure (48) then reads
From Eq. (71) we can derive the following relations
Substituting Eqs (70) and (72) in Eq. (68) yields
Since c(0) = 1, for all allowable M, we have from Eq. (69) the general relation
The first term on the right-hand side of each of the above equations vanishes since X = Y = 0 at r̃ = 0, while the common coefficient of the second term in each equation is
Thus we obtain the relationship governed by (56) with d = 2.
In the isotropic case, κxx = κyy = L² and $\tau\kappa_{xy}^{-1} = 0$. Equations (73) and (74) can then be averaged and inverted to yield the standard definition of the square of the 2D Daley length-scale involving the Laplacian operator (Eq. (44) with d = 2).
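The central result of this appendix, −c″(0) = 1/(2M − 4) for c(r̃) = α_M r̃^{M−1}K_{M−1}(r̃) in 2D, can be checked numerically. The sketch evaluates K_n with the standard integral representation K_n(z) = ∫₀^∞ exp(−z cosh t) cosh(nt) dt by the trapezoidal rule, and approximates −c″(0) by 2(1 − c(h))/h², using c(0) = 1 and c′(0) = 0. The step h and the quadrature settings are illustrative choices.

```python
import math


def bessel_k(n, z, tmax=12.0, steps=4800):
    """Modified Bessel function K_n(z) from the integral of exp(-z cosh t) cosh(n t) over [0, inf)."""
    dt = tmax / steps
    total = 0.5 * math.exp(-z)   # half-weight t = 0 end point; the t = tmax end point is negligible
    for i in range(1, steps):
        t = i * dt
        total += math.exp(-z * math.cosh(t)) * math.cosh(n * t)
    return total * dt


def implicit_kernel_2d(rt, m):
    """c(r~) = alpha_M r~^(M-1) K_(M-1)(r~), with alpha_M = 2^(2-M) / (M-2)!  (d = 2, M > 2)."""
    alpha = 2.0 ** (2 - m) / math.factorial(m - 2)
    return alpha * rt ** (m - 1) * bessel_k(m - 1, rt)


def curvature_at_origin(m, h=0.02):
    """Approximate -c''(0) by 2 (1 - c(h)) / h^2."""
    return 2.0 * (1.0 - implicit_kernel_2d(h, m)) / h ** 2


if __name__ == "__main__":
    for m in (3, 4, 5):
        print(m, curvature_at_origin(m), 1.0 / (2 * m - 4))
```

Multiplying −c″(0) by κ⁻¹ (through the chain rule applied to r̃) gives the first relation in (56); the case M = 2 is excluded because the kernel is then not twice differentiable at the origin.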
Appendix B. Estimating the Hessian tensor from an ensemble of simulated errors
In this appendix we provide a derivation of Eq. (62) for the special case of a homogeneous correlation function. The derivation is similar to the one given in the Appendix of Belo Pereira and Berre (2006) except for notational changes and greater emphasis here on some of the underlying assumptions.
The starting point is the general expression (59) for the covariance function B(x,x′) of the ensemble of model-state errors. We consider here the 2D case where x = (x,y)T and x′ = (x′,y′)T, and assume that B(x,x′) is at least twice differentiable. We can express the covariance function of the derivatives of the ensemble errors as follows (Swerling, 1962; Daley, 1991, p. 156):
Using Eq. (60) the derivatives on the right-hand side of the above equations can be evaluated in terms of the standard deviations σ(x) and correlation function C(x,x′). Focusing on the first of these equations this yields
Under the assumption of homogeneous correlations, we can write C(x, x′) = c(r), where r = x − x′ = (x − x′, y − y′)ᵀ (Gaspari and Cohn, 1999). Using the chain rule, the derivatives of C with respect to x, x′, y and y′ can be rewritten in terms of derivatives of c with respect to r_x = x − x′ and r_y = y − y′. For ∂²B/∂x∂x′ this gives
Evaluating the above equation at x = x′ (r = 0), and noting that ∂c/∂r_x = ∂c/∂r_y = 0 at r = 0 since c(r) is maximum there, and that c(0) = 1, yields
which can be rearranged as
where σ = σ(x). A similar analysis for the other covariance functions yields
In tensor notation, the left-hand sides of the above equations are equivalent to Eq. (61) evaluated at r = 0, while the right-hand sides can be identified with the right-hand side of Eq. (62).
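For readers who want the intermediate step, the xx component of the calculation can be written out as follows. This is a sketch in the notation of this appendix, assuming zero-mean errors ε so that the covariance of the derivatives equals the mixed derivative of B, and writing σ′ = σ(x′) and r = (r_x, r_y)ᵀ = x − x′:

$$
\frac{\partial^{2}B}{\partial x\,\partial x'} = \frac{\partial\sigma}{\partial x}\frac{\partial\sigma'}{\partial x'}\,c + \frac{\partial\sigma}{\partial x}\,\sigma'\,\frac{\partial c}{\partial x'} + \sigma\,\frac{\partial\sigma'}{\partial x'}\,\frac{\partial c}{\partial x} + \sigma\sigma'\,\frac{\partial^{2}c}{\partial x\,\partial x'},
\qquad
\frac{\partial c}{\partial x} = -\frac{\partial c}{\partial x'} = \frac{\partial c}{\partial r_x},
\quad
\frac{\partial^{2}c}{\partial x\,\partial x'} = -\frac{\partial^{2}c}{\partial r_x^{2}}.
$$

At x = x′ the two middle terms vanish because ∂c/∂r_x = 0, and c = 1, so that

$$
E\!\left[\left(\frac{\partial\epsilon}{\partial x}\right)^{2}\right] = \left(\frac{\partial\sigma}{\partial x}\right)^{2} - \sigma^{2}\left.\frac{\partial^{2}c}{\partial r_x^{2}}\right|_{\mathbf r=\mathbf 0}
\quad\Longrightarrow\quad
H_{xx} = -\left.\frac{\partial^{2}c}{\partial r_x^{2}}\right|_{\mathbf r=\mathbf 0} = \frac{E\!\left[(\partial\epsilon/\partial x)^{2}\right] - (\partial\sigma/\partial x)^{2}}{\sigma^{2}},
$$

which is the xx element of Eq. (62) once the expectations are replaced by sample estimates.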
Isotropic correlation functions on 𝕊² will be written explicitly as a function of r(θ) whenever the context requires an interpretation in terms of chordal distance.
† It is common in VDA to use multivariate balance operators to transform the background-error variables into a set of new variables whose cross-covariances can be effectively neglected. The background-error covariances of the transformed variables can then be treated as univariate functions and represented via a diffusion model.